Job Title: Staff MLOps Engineer
We are seeking a highly skilled Staff MLOps Engineer to join our Machine Learning team. In this role, you will work closely with engineers to build a platform on top of our data that will be used by virtually every product and system we have built or will build.
Key Responsibilities:
* Design, implement, and maintain robust MLOps platforms and tooling for both batch and streaming ML pipelines.
* Develop and manage monitoring and observability solutions for ML systems.
* Lead DevOps practices, including CI/CD pipelines and Infrastructure as Code (IaC).
* Architect and implement cloud-based solutions on AWS.
* Collaborate with ML Engineers and Data Scientists to develop, train, and deploy machine learning models.
* Engage in feature engineering and model optimization to improve ML system performance.
* Participate in the full ML lifecycle, from data preparation to model deployment and monitoring.
* Optimize and refactor existing systems for improved performance and reliability.
* Drive technical initiatives and best practices in both MLOps and ML Engineering.
Requirements:
* Strong Python Proficiency: Excellent Python skills for developing, deploying, and maintaining our machine learning systems.
* Language Versatility: Experience with statically typed or JVM languages. Willingness to learn Scala is highly desirable.
* Cloud Engineering Skills: Extensive experience with cloud platforms and services, ideally AWS (e.g., Lambda, ECS, ECR, CloudWatch, MSK, SNS, SQS).
* Infrastructure as Code: Proficiency in IaC, particularly Terraform.
* Kubernetes Expertise: Strong hands-on experience with managing clusters and deploying services.
* Data Orchestration: Experience with ML orchestration tools (e.g., Flyte, Airflow, Kubeflow, Luigi, or Prefect).
* CI/CD: Expertise in pipelines, especially GitHub Actions and Jenkins.
* Networking: Knowledge of networking concepts and their implementation.
* Streaming: Experience with Kafka and other streaming technologies.
* ML Monitoring: Familiarity with observability tools (e.g., Arize AI, Weights & Biases).
* NLP/LLMs: Experience with NLP, LLMs, and RAG systems in production, or strong desire to learn.
* CLI & Shell Scripting: Proficiency in scripting and command-line tools.
* APIs: Experience with deploying and managing production APIs.
* Software Engineering Best Practices: Knowledge of industry standards and practices.
Preferred Qualifications:
* AWS AI Services: Hands-on experience with Amazon SageMaker and/or Amazon Bedrock.
* Data Processing: Experience with high-volume, unstructured data processing.
* ML Applications: Familiarity with NLP, Computer Vision, and traditional ML applications.
* System Migration: Previous work in refactoring and migrating complex systems.
* AWS Certification: AWS Certified Solutions Architect (Professional or Associate).
* Advanced Degree: Master's degree in ML, AI, or Computer Science.
Personal Qualities:
* Passionate about building developer-friendly platforms and tools.
* Thrives in a terminal-based development environment.
* Enthusiastic about creating production-grade, robust, reliable, and performant systems.
* Not afraid to dive into and improve complex existing solutions.
* Team player who works well with ML Engineers, Data Scientists, and management.
* Strong technical mentoring skills.
* Excellent problem-solving and communication skills.
Salary: $120,000 - $180,000 per year