We are seeking a skilled and proactive Data Engineer to join our team and collaborate closely with the Solution Architect to review and enhance our current system. This role involves modifying existing code to implement new features, delivering data privacy enhancements, and maintaining the reliability and performance of our data systems. The Data Engineer will contribute actively throughout the Agile development lifecycle, participating in planning, refinement, and review ceremonies.
Key Responsibilities:
* Develop and maintain ETL pipelines in Databricks, leveraging Apache Spark and Delta Lake.
* Design, implement, and optimize data transformations and treatments for structured and unstructured data.
* Work with Hive Metastore and Unity Catalog for metadata management and access control.
* Implement State Store mechanisms for maintaining stateful processing in Spark Structured Streaming (see the streaming sketch after this list).
* Handle DataFrames efficiently for large-scale data processing and analytics.
* Schedule, monitor, and troubleshoot Databricks pipelines for automated workflow execution.
* Enable pause/resume functionality in pipelines based on responses from external API calls.
* Ensure scalability, reliability, and performance optimization for distributed computing environments.
* Collaborate with Data Scientists, Analysts, and DevOps teams to streamline data access and governance.
* Maintain data integrity and security standards in compliance with enterprise data governance policies.
* Review the existing system architecture and its functionalities in collaboration with the Solution Architect and Lead Data Engineer.
* Modify and extend existing code to implement new features and improvements.
* Perform thorough unit testing to verify system functionality and data accuracy.
* Document all changes made, including technical impact assessments and rationales.
* Work within GitLab repository structures and adhere to project-specific processes.
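
To illustrate the kind of stateful streaming work this role involves, here is a minimal PySpark sketch of a watermarked, windowed streaming aggregation whose intermediate state is kept in Spark's state store under a checkpoint location. The source table, its schema (an `event_time` timestamp and an `event_type` column), the target table, and the checkpoint path are illustrative assumptions, not details of our actual system.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# On Databricks a SparkSession already exists; creating one here keeps the sketch self-contained.
spark = SparkSession.builder.appName("stateful-stream-sketch").getOrCreate()

# Read a stream of events from a Delta table (hypothetical source).
events = spark.readStream.table("raw.events")

# Watermarked, windowed aggregation: Spark keeps the running counts per window
# in the state store, persisted under the checkpoint location so the query can
# recover its state after a restart.
counts = (
    events
    .withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "5 minutes"), "event_type")
    .count()
)

query = (
    counts.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/mnt/checkpoints/event_counts")  # state store lives here
    .toTable("curated.event_counts")
)

query.awaitTermination()
```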
Required Skills and Experience:
* Strong expertise in Databricks, Apache Spark, and Delta Lake.
* Experience with Hive Metastore and Unity Catalog for data governance.
* Proficiency in Python, SQL, Scala, or other relevant languages.
* Familiarity with structured streaming, event-driven architectures, and stateful processing.
* Ability to design, schedule, and optimize Databricks workflows.
* Knowledge of REST APIs for integrating external services into pipeline execution (see the API-gating sketch after this list).
* Experience with cloud platforms like Azure Databricks, AWS, or Google Cloud.
* Experience with Databricks Notebooks for development and testing.
* Familiarity with AWS S3 for data storage and management.
* Understanding of platform migrations, including dependency requirements, challenges, and deliverables.
* Understanding of data privacy principles and ability to implement privacy-aware solutions.
* Experience in unit testing for data pipelines or systems.
* Proficient in version control using GitLab.
* Solid understanding of Agile methodologies and experience working in Scrum or Kanban environments.
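
As one example of what integrating external services into pipeline execution can look like, the following sketch gates a pipeline run on the response of an external control API, pausing while the service reports a paused status and resuming once it does not. The endpoint URL, the JSON shape of its response, and the polling and timeout values are hypothetical assumptions for illustration only.

```python
import time
import requests

CONTROL_URL = "https://example.com/pipeline/control/status"  # hypothetical control endpoint


def wait_until_resumed(poll_seconds: int = 60, timeout_seconds: int = 3600) -> None:
    """Block while the external service reports the pipeline as paused."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        resp = requests.get(CONTROL_URL, timeout=10)
        resp.raise_for_status()
        # Assumed response shape: {"status": "paused" | "running"}.
        if resp.json().get("status") != "paused":
            return  # resumed (or never paused): continue the pipeline
        time.sleep(poll_seconds)
    raise TimeoutError("Pipeline remained paused beyond the allowed window")


if __name__ == "__main__":
    wait_until_resumed()
    # ... downstream transformation steps would run here ...
```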
Preferred (Nice to Have):
* Understanding of data warehousing, lakehouse architectures, and modern data platforms.
* Strong analytical and problem-solving skills with a focus on automation and efficiency.
* Knowledge of making and handling API calls.
* Experience working with Docker containers and Ansible.
Soft Skills:
* Strong analytical and problem-solving skills.
* Clear communication and collaboration abilities.
* Ability to work independently and with cross-functional teams.
Seniority level
* Associate
Employment type
* Full-time
Job function
* Information Technology
Industries
* IT Services and IT Consulting