We are seeking a Data Engineer to join our team. This role will involve designing, building, and optimizing data pipelines and infrastructure that enable efficient storage, processing, and analysis of large datasets. You’ll work closely with data scientists, analysts, and other engineering teams to deliver clean, reliable data for business insights.
Key Responsibilities:
* Design, build, and optimize ETL/ELT pipelines using AWS services such as Glue, Lambda, S3, and Redshift to support data processing needs (a minimal sketch of this kind of job follows this list).
* Architect scalable storage solutions using data lakes (S3), data warehouses (Redshift), and relational databases (RDS) to ensure efficient querying and data availability.
* Integrate data from various internal and external sources, ensuring consistency, reliability, and availability for analytics.
* Continuously monitor and optimize data processing workflows for speed, reliability, and cost efficiency.
* Partner with data scientists, business analysts, and other teams to enable self-service analytics and support data-driven decision-making.
* Implement automation for data workflows, deployment, and monitoring using tools such as AWS CloudFormation, Terraform, or AWS CDK.
* Ensure data security and compliance with regulatory standards, implementing proper access controls, encryption, and governance policies.
* Maintain clear documentation on data pipelines, workflows, and architecture to ensure smooth operations and knowledge sharing.
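To make the pipeline responsibilities above concrete, here is a minimal sketch of the kind of Glue ETL job this role involves. It is illustrative only: the catalog database, table, column names, and S3 bucket are hypothetical placeholders, not an actual implementation.

```python
import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions

# Standard Glue job bootstrap.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw events from the S3 data lake via the Glue Data Catalog
# (database and table names are hypothetical).
raw = glue_context.create_dynamic_frame.from_catalog(
    database="analytics_raw",
    table_name="events",
)

# Keep only the columns downstream analysts need.
clean = raw.select_fields(["event_id", "user_id", "event_type", "occurred_at"])

# Write curated, partitioned Parquet back to S3 (bucket is a placeholder),
# ready for Redshift Spectrum or a COPY into Redshift.
glue_context.write_dynamic_frame.from_options(
    frame=clean,
    connection_type="s3",
    connection_options={
        "path": "s3://example-curated-bucket/events/",
        "partitionKeys": ["event_type"],
    },
    format="parquet",
)
job.commit()
```

Writing partitioned Parquet keeps the curated layer cheap to scan from Redshift Spectrum, the sort of cost/performance trade-off this role would weigh.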
Required Skills & Experience:
* Hands-on experience with AWS data services like S3, Redshift, Glue, RDS, Lambda, and DynamoDB.
* Strong background in building and optimizing ETL/ELT pipelines using AWS Glue, Apache Spark, or Python.
* Experience in designing and managing data lakes, data warehouses, and databases for efficient storage and querying.
* Ability to integrate diverse data sources, including APIs, databases, and flat files, ensuring consistency for analytical purposes.
* Experience automating data workflows, deployment, and monitoring using AWS CloudFormation, Terraform, or AWS CDK.
* Proficiency in Python, SQL, or Java for developing custom data solutions and processing large datasets (a loading sketch follows this list).
* Familiarity with Hadoop, Spark, or Kafka for processing large-scale datasets.
* Knowledge of data security best practices, including encryption, IAM roles, and GDPR compliance.
* Strong communication and teamwork skills to work effectively with cross-functional teams, ensuring data solutions meet business needs.
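As an illustration of the Python/SQL proficiency above, here is a minimal, hypothetical sketch of a custom loading step that copies curated Parquet from S3 into Redshift. The cluster endpoint, credentials, table, bucket, and IAM role ARN are all placeholders; in practice credentials would come from a secrets store.

```python
import psycopg2  # standard PostgreSQL driver; Redshift speaks the same protocol

# Hypothetical COPY statement: table, bucket, and IAM role ARN are placeholders.
COPY_SQL = """
    COPY analytics.events
    FROM 's3://example-curated-bucket/events/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load-role'
    FORMAT AS PARQUET;
"""

conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder
    port=5439,
    dbname="analytics",
    user="loader",
    password="<fetch from AWS Secrets Manager in practice>",
)
# psycopg2 commits on clean exit from the connection context manager.
with conn, conn.cursor() as cur:
    cur.execute(COPY_SQL)
conn.close()
```

Server-side COPY keeps the load parallel across the cluster rather than pushing rows through the client, which is why it is the standard bulk-load path into Redshift.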
Preferred Qualifications:
* AWS Certified Solutions Architect – Associate, AWS Certified Data Analytics – Specialty (formerly Big Data – Specialty), or other relevant AWS certifications.
* Experience with Amazon Kinesis, Kafka, or other real-time data streaming technologies (a brief producer sketch follows this list).
* Familiarity with AWS Glue Data Catalog or Apache Atlas for data governance.
* Experience with preparing data for machine learning workflows, supporting data scientists with clean and structured data.
* Experience with Amazon EMR, Redshift Spectrum, or AWS Data Pipeline.
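For the streaming qualification above, a brief sketch of publishing an event to an Amazon Kinesis stream with boto3; the region, stream name, and payload are hypothetical.

```python
import json
import boto3

# Hypothetical client and stream; region and stream name are placeholders.
kinesis = boto3.client("kinesis", region_name="us-east-1")

# Publish a single event; the partition key controls shard routing.
kinesis.put_record(
    StreamName="example-events",
    Data=json.dumps({"event_id": "e-1", "event_type": "click"}).encode("utf-8"),
    PartitionKey="e-1",
)
```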
If you are a talented Data Engineer with experience in building scalable data pipelines and managing cloud infrastructure, we want to hear from you!