About the role
As a Data Engineer, you will be responsible for designing, building, and maintaining efficient, scalable data pipelines and infrastructure. You will work closely with data scientists, analysts, and other stakeholders to ensure that the organization’s data systems are robust, reliable, and meet the needs of various business units. You will apply your expertise to integrate diverse data sources, implement data models, and ensure data quality, security, and performance.
Key Responsibilities:
1. Design, develop, and maintain scalable ETL/ELT data pipelines to process large volumes of structured and unstructured data.
2. Integrate data from multiple sources (databases, APIs, cloud storage, etc.) into a unified data warehouse or data lake.
3. Collaborate with the DevOps team to manage cloud infrastructure (AWS, Azure, GCP) for data storage and processing.
4. Optimize data storage solutions, including relational databases, NoSQL databases, and data lakes.
5. Design and implement efficient data models (Star Schema, Snowflake Schema, etc.) to support analytics and reporting requirements.
6. Ensure data models are scalable and aligned with business objectives.
7. Implement data validation, monitoring, and error-handling processes to maintain high data quality.
8. Collaborate with Data Governance teams to ensure compliance with data privacy regulations (e.g., GDPR, CCPA).
9. Work closely with data analysts, data scientists, product managers, and business stakeholders to understand data requirements.
10. Provide support in troubleshooting data-related issues and ensure timely resolution.
11. Monitor and optimize the performance of data pipelines and queries for efficiency.
12. Use indexing, partitioning, and other techniques to enhance database performance.
13. Create and maintain comprehensive documentation of data pipelines, data architecture, and processes.
14. Follow best practices for data engineering, including code versioning, testing, and automation.
Technical Skills:
* Proficiency in SQL and experience with relational databases (e.g., PostgreSQL, MySQL, SQL Server).
* Experience with ETL tools (e.g., Apache Airflow, AWS Glue, Informatica) and data warehousing (e.g., Snowflake, Redshift, BigQuery).
* Proficiency in programming languages such as Python, Java, or Scala.
* Knowledge of big data technologies (e.g., Apache Spark, Hadoop).
* Experience with cloud platforms (AWS, Azure, GCP) and related services (e.g., S3, Azure Data Lake, Google Cloud Storage).
* Familiarity with NoSQL databases (e.g., MongoDB, Cassandra).
* Experience with data modeling, schema design, and data pipeline orchestration.
Preferred:
* Experience with containerization and orchestration (Docker, Kubernetes).
* Familiarity with data governance and privacy regulations.
* Knowledge of CI/CD practices and version control systems (e.g., Git).
* Experience with monitoring and logging tools (e.g., Datadog, Splunk).
Soft Skills:
* Strong problem-solving and analytical skills.
* Excellent communication skills to collaborate with cross-functional teams.
* Attention to detail and a commitment to data quality.
* Ability to work in a fast-paced, agile environment.
Tools and Technologies:
Examples; not all are required:
* Data Warehousing: Snowflake, AWS Redshift, Google BigQuery
* Programming Languages: Python, SQL, Scala, Java
* Big Data Frameworks: Apache Spark, Hadoop
* Version Control: Git, GitHub, GitLab
Hybrid Work
Morningstar’s hybrid work environment gives you the opportunity to work remotely and collaborate in-person each week. We’ve found that we’re at our best when we’re purposely together on a regular basis, at least three days each week. A range of other benefits are also available to enhance flexibility as needs change. No matter where you are, you’ll have tools and resources to engage meaningfully with your global colleagues.