You will play a crucial role in designing, building, and maintaining our data platform, with a strong emphasis on streaming data, cloud infrastructure, and machine learning operations.
Key Responsibilities:
1. Architect and Implement Data Pipelines:
o Design, develop, and maintain scalable and efficient data pipelines.
o Optimize ETL processes to ensure seamless data ingestion, processing, and integration across various systems.
2. Streaming Data Platform Development:
o Lead the development and maintenance of a real-time data streaming platform using tools such as Apache Kafka, Databricks, and Kinesis.
o Ensure the integration of streaming data with batch processing systems for comprehensive data management.
3. Cloud Infrastructure Management:
o Use AWS data engineering services (such as S3, Redshift, Glue, Kinesis, and Lambda) to build and manage our data infrastructure.
o Continuously optimize the platform for performance, scalability, and cost-effectiveness.
4. Communications:
o Collaborate with cross-functional teams, including data scientists and BI developers, to understand data needs and deliver solutions.
o Work with the project management team to coordinate project requirements, timelines, and deliverables, so that you can concentrate on technical excellence.
5. ML Ops and Advanced Data Engineering:
o Establish ML Ops practices within the data engineering framework, focusing on automation, monitoring, and optimization of machine learning pipelines.
6. Data Quality and Governance:
o Implement and maintain data quality frameworks, ensuring the accuracy, consistency, and reliability of data across the platform.
o Drive data governance initiatives, including data cataloguing, lineage tracking, and adherence to security and compliance standards.
Requirements
Experience:
* 3+ years of experience in data engineering, with a proven track record in building and maintaining data platforms, preferably on AWS.
* Strong proficiency in Python and experience with SQL and PostgreSQL. PySpark, Scala, or Java is a plus.
* Familiarity with Databricks and the lakehouse architecture built on Delta Lake.
* Experience mentoring or leading junior engineers is highly desirable.
Skills:
* Deep understanding of cloud-based data architectures and best practices.
* Proficiency in designing, implementing, and optimizing ETL/ELT workflows.
* Strong database and data lake management skills.
* Familiarity with ML Ops practices and tools, with a desire to expand skills in this area.
* Excellent problem-solving abilities and a collaborative mindset.
Nice to Have:
* Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes).
* Knowledge of machine learning pipelines and their integration with data platforms.