Role Overview [UAE-Based]
We are seeking a skilled Data Engineer to join our Generative AI team. You will play a critical role in designing, building, and maintaining robust data pipelines and infrastructure that support the development, training, and deployment of cutting-edge Generative AI models. This role requires a blend of technical expertise, strong problem-solving skills, and a passion for working with data at scale.
Key Responsibilities
* Data Infrastructure Development: Design, implement, and maintain scalable data pipelines and ETL processes to support Generative AI applications.
* Data Preparation: Collaborate with AI researchers and data scientists to preprocess, clean, and transform large datasets for training and evaluation of AI models.
* Model Integration: Support the deployment and monitoring of Generative AI models, ensuring efficient data flow and integration with production systems.
* Database Management: Optimize and manage data storage solutions, ensuring high availability, security, and performance.
* Automation: Develop tools and scripts to automate data workflows, monitoring, and reporting processes.
* Collaboration: Work closely with cross-functional teams, including AI researchers, software engineers, and product managers, to meet project goals.
* Data Governance: Ensure compliance with data privacy and security regulations, and implement best practices for data quality and lineage.
Qualifications
Must-Have:
* Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field.
* Proven experience in building and maintaining large-scale data pipelines and infrastructure.
* Proficiency in programming languages such as Python, Scala, or Java.
* Hands-on experience with big data technologies (e.g., Hadoop, Spark, Kafka).
* Strong knowledge of SQL and NoSQL databases (e.g., PostgreSQL, MongoDB).
* Familiarity with cloud platforms (e.g., AWS, Azure, GCP) and data-related services (e.g., S3, Redshift, BigQuery).
* Understanding of Generative AI concepts and familiarity with frameworks like TensorFlow, PyTorch, or Hugging Face.
Nice-to-Have:
* Experience working with unstructured data (e.g., text, images, audio) for AI applications.
* Knowledge of MLOps practices and tools (e.g., MLflow, Kubeflow, Docker).
* Familiarity with version control systems (e.g., Git) and CI/CD pipelines.
* Experience with real-time data processing and streaming technologies.
* Contributions to open-source projects in AI or data engineering.