Description
Join a globally recognized financial organization and advance your career by contributing to revolutionary projects. You've discovered the perfect environment to have a major impact.

As Principal Site Reliability Engineer Lead at JP Morgan Chase within Corporate Oversight and Governance Technology, you draw upon your advanced knowledge to identify new opportunities to influence critical incident management and improve the end-to-end software development lifecycle for the firm. You will have the opportunity to lead, manage, design, and implement infrastructure components to improve reliability and ensure operational efficiency. You are a key influencer in the space of SRE Excellence, leading the SRE function and driving continual improvement in customer experience, resiliency, security, scalability, monitoring, instrumentation, and automation of the software in your area. You act in a blameless, data-driven manner and navigate difficult situations with composure and tact.

Corporate Oversight and Governance Technology is responsible for developing solutions that support the Compliance, Controls Management, Resiliency, Legal, Regulatory, and Audit lines of business. The solutions support 1st, 2nd, and 3rd line independent review, monitoring, and oversight of business operations, with a focus on legal and regulatory obligations related to the firm's products and services.

Job responsibilities
Leads the SRE function, providing ownership and accountability for raising the bar on SRE Excellence
Influences and creates new designs, architectures, standards, and best practices in support of service level objectives
Troubleshoots priority incidents, conducts objective post-mortems, and ensures permanent closure of incidents
Defines and drives adoption of a best-in-class monitoring framework to accomplish end-to-end flow monitoring and effective alerting
Identifies and solves problems of high complexity
Works with development teams throughout the Software Development Life Cycle to ensure sustainable software releases
Leads medium to large projects by bringing together the proper perspective, identifying roadblocks, and integrating feedback from team members and subject matter experts at the firm
Participates in support responsibilities for coverage of critical applications
Sees problems as opportunities to improve

Required qualifications, capabilities, and skills
Advanced knowledge of software applications and technical processes with considerable depth in one or more technical disciplines
Ability to determine how systems relate to one another and use a breadth of tools to build automation that improves reliability for the firm
Experience detecting opportunities to automate, combine, or simplify control points and executing solutions
Experience leading complex projects supporting site reliability engineering design, scaling, resilience, and system performance assessments for critical applications
Understands and leads partnerships across job functions (e.g., Cybersecurity and Data) to develop efficient and developer-friendly systems
Demonstrated prior experience developing SLOs/SLIs, maturity uplifts, observability strategies, and toil reduction at an enterprise scale
Demonstrated prior experience managing high-severity production incidents to resolution
Strong experience with observability tools, both on-premises and in the cloud
Influences the teams' culture by championing innovation and change for group-wide success
Ability to balance and be accountable for the work of multiple architects and designers
Experience translating research, analysis, and tests into business recommendations

Preferred qualifications, capabilities, and skills
Experience with SRE for SaaS applications
Experience running SRE maturity uplift programs