Lead MLOps SRE

London

Posted: 24 November

Offer description

Description

There’s nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems. The Aumni team at JPMorgan Chase is looking for a Lead MLOps engineer to help us build out a team to manage our core model hosting, deployment, and monitoring infrastructure in AWS.

A Lead MLOps Engineer within the Digital Private Markets department will help us solve complex and broad business problems with practical and straightforward solutions. Through code and cloud infrastructure, you will configure, maintain, and monitor the systems and models produced by our data science teams. You are a significant contributor to your team by sharing your knowledge of end-to-end operations, availability, reliability, and scalability in the AI/ML space. You also will serve as a leader and mentor to junior engineers as the team works to enable the downstream Data Science and ML Engineering teams as they execute on our product roadmap.

Job responsibilities

- Develops and maintains infrastructure as code to support Data Science and Machine Learning initiatives
- Designs and implements automated continuous integration and continuous delivery pipelines for the Data Science teams to develop and train AI/ML models
- Mentors junior MLOps engineers and Data Scientists, setting standards for model deployment and maintenance
- Leads technical discussions with developers, key stakeholders, and team members to resolve complex technical problems
- Builds technical roadmaps in collaboration with senior leadership and identifies risks or design optimizations
- Proactively resolves issues before they impact internal and external stakeholders of deployed models
- Champions the adoption of MLOps best practices within your team
- Optimizes workloads for production and manages performance and observability for these workloads

Required qualifications, capabilities, and skills

- Experience serving as a technical leader and mentor for teams in the MLOps space
- Formal training or certification on MLOps concepts and/or 5 years applied experience
- Has managed the deployment of models in production environments
- Excellent communication skills and the ability to explain technical concepts to non-technical audiences
- Practical knowledge of MLOps culture and principles; familiarity with how to scale these ideas to support multiple data science teams
- Can articulate the importance of monitoring and observability in the AI/ML space, and enforces its implementation and use across an organization
- Domain knowledge of machine learning applications and technical processes within the AWS ecosystem
- Extensive expertise with Terraform, containers, and container orchestration, especially Kubernetes
- Knowledge of continuous integration and continuous delivery tools like Jenkins, GitLab, or GitHub Actions, and associated best practices
- Expert level in the following programming languages: Python, Bash
- Deep working knowledge of DevOps best practices, Linux, and networking internals
- Understanding of the different roles served by data engineers, data scientists, machine learning engineers, and system architects, and how MLOps contributes to each of these workstreams
- Ability to work with a geographically distributed team across multiple timezones

Preferred qualifications, capabilities, and skills

- Comfortable with team management, fostering collaboration, promoting design patterns, and presenting technical concepts to non-technical audiences
- Understands how to break down large concepts and goals into smaller requirements and train junior engineers on how to execute against these requirements
- Experience with ML model training and deployment pipelines, managing scoring endpoints in the financial industry
- Familiarity with observability concepts and telemetry collection using tools such as Datadog, Grafana, Prometheus, Splunk, and others
- Experience working with ML engineering platforms such as Databricks and SageMaker
- Experience working with Data Engineering technologies such as Snowflake and Airflow
- Comfortable troubleshooting common containerization technologies and issues
- Ability to identify new technologies and relevant solutions to improve design patterns where appropriate
- An understanding of the nuances of managing GPU-specific workloads
- AWS Solutions Architect certification or equivalent experience
