In short
As a Senior Machine Learning Engineer at On, you'll play a critical role in the full lifecycle of our machine learning models. Besides being responsible for training and deploying models, you will spearhead our MLOps initiatives to ensure their seamless and efficient integration and operation in production. This includes championing MLOps best practices, enhancing deployment processes, developing essential tooling and automation to maximize the impact of our AI solutions, and implementing robust monitoring to optimize performance and reliability.
Your Mission
1. Lead the implementation and continuous improvement of our MLOps strategy, establishing best practices for model development, deployment, and monitoring.
2. Create and train machine learning models to solve specific business problems, such as product recommendations, customer segmentation, and demand forecasting. Deploy these models to production systems to make predictions, drive real-time personalization, and support decision-making.
3. Design and build the necessary infrastructure and tooling to support efficient and scalable model deployment, including CI/CD pipelines and automated testing.
4. Implement and own Terraform to manage and provision our cloud infrastructure for machine learning operations.
5. Oversee the transition to a real-time streaming architecture for our machine learning applications, ensuring efficient data ingestion, feature engineering, and model serving in a streaming context.
6. Develop and implement a comprehensive monitoring framework to track model performance, identify potential issues, and ensure optimal model health in production. Update models as needed to adapt to new data and changing conditions.
7. Collaborate closely with data scientists and engineers to ensure seamless integration of models into our existing systems and workflows. Stay abreast of the latest MLOps trends and technologies to continuously improve our processes and tools.
Your story
1. You have 5+ years of experience as a Machine Learning Engineer with a strong focus on MLOps. You have a proven track record of successfully deploying and managing machine learning models in production environments.
2. You possess deep knowledge of MLOps principles, tools, and best practices.
3. You are proficient in cloud platforms (Google Cloud Platform is preferred) and infrastructure-as-code tools like Terraform.
4. You have experience with CI/CD pipelines, containerization technologies (e.g., Docker), and container orchestration tools (e.g., Kubernetes). You also have experience using workflow orchestration tools such as Kubeflow (our preferred tool) or similar frameworks like Apache Airflow to manage and automate ML workflows.
5. You have experience with real-time data streaming technologies such as Kafka and Confluent, and with feature stores in streaming settings.
6. You are skilled in building and maintaining monitoring systems for machine learning models.
7. You have excellent communication and collaboration skills, enabling you to effectively work with cross-functional teams.
Bonus:
1. Knowledge of frameworks such as LangChain used to orchestrate LLMs.
2. Experience with LLM evaluation, debugging, and monitoring using tools such as Langfuse or LangSmith.