Description

Following the successful launch of Chase in 2021, our new team is dedicated to creating customer-centric products that address real-world problems. We foster an environment that encourages skill development and realization of potential, valuing collaboration, curiosity, and commitment.

As a Site Reliability Engineer III at JPMorgan Chase within the Accelerators Engineering team, you are the heart of this venture, focused on getting smart ideas into the hands of our customers. You have a curious mindset, thrive in collaborative squads, and are passionate about new technology. By nature, you are also solution-oriented, commercially savvy, and have a head for fintech. You thrive working in tribes and squads that focus on specific products and projects, and depending on your strengths and interests, you'll have the opportunity to move between them.

While we're looking for professional skills, culture is just as important to us. We understand that everyone's unique, and that diversity of thought, experience, and background is what makes a good team great. By bringing people with different points of view together, we can represent everyone and truly reflect the communities we serve.
This way, there's scope for you to make a huge difference to us as a company, and to our clients and business partners around the world.

Job responsibilities

Guides and assists others in building appropriate-level designs and gaining consensus from peers where appropriate
Collaborates with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines
Collaborates with other software engineers and teams to design, develop, test, and implement availability, reliability, and scalability solutions in their applications
Implements infrastructure, configuration, and network as code for the applications and platforms in your remit
Collaborates with technical experts, key stakeholders, and team members to resolve complex problems
Understands service level indicators and utilizes service level objectives to proactively resolve issues before they impact customers
Supports the adoption of site reliability engineering best practices within your team
Proactively recognizes roadblocks and demonstrates interest in learning technology that facilitates innovation
Identifies new technologies and relevant solutions to ensure design constraints are met by the software team

Required qualifications, capabilities, and skills

Formal training or certification in site reliability engineering concepts and proficient applied experience
Proficiency in site reliability culture and principles, and familiarity with how to implement site reliability within an application or platform
Proven public or private cloud experience (GCP is our priority)
Proficiency in at least one programming language such as Python, Java, or Go
Proficient knowledge of software applications and technical processes within a given technical discipline
Experience in observability, such as white-box and black-box monitoring, service level objective alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, and others
Experience with continuous integration and continuous delivery tools like Jenkins, GitHub, or Terraform
Experience managing, configuring, and troubleshooting operating system issues, storage (block and object), and networking (VPCs, proxies, and CDNs), and administering high-availability CockroachDB, PostgreSQL, and Redis clusters
Monitoring and instrumentation: implementing metrics in Prometheus and Grafana, log management and related systems, and Slack/PagerDuty integrations
Extensive Kubernetes operational experience (ideally including Istio and ArgoCD)
Familiarity with containers and container orchestration, such as ECS, Kubernetes, Helm charts, and Docker