An opportunity has arisen to join our team as a Principal SRE. This is a dynamic role that will produce and deliver scalable software solutions as part of a multidisciplinary Scrum team.

As a Principal Site Reliability Engineer (SRE), you will lead and drive SRE best practices and DevOps adoption across the organisation. You will be responsible for enhancing the resilience, scalability, and performance of our cloud-based applications while working alongside business verticals (Health & Care, Legal, Education, Accelerator, etc.) to align SRE strategy with business needs.

You will play a key leadership role in developing the DevOps Toolchain Center of Excellence (CoE), fostering a culture of automation, observability, and continuous improvement. Working closely with the SRE Engineering Manager and Product Engineering teams, you will help define strategies to improve the reliability and efficiency of applications that support the business.

Key Responsibilities

SRE & DevOps Leadership
- Lead SRE strategy across multiple business verticals to ensure alignment with company objectives.
- Define and implement best-in-class reliability engineering practices, ensuring high availability, scalability, and performance of applications.
- Spearhead incident management, post-mortems, and reliability reviews to improve system resilience.
- Drive adoption of SLOs (Service Level Objectives) and SLIs (Service Level Indicators) across teams.

DevOps Toolchain & Automation CoE
- Work with the DevOps Toolchain Center of Excellence (CoE) to drive best practices in CI/CD, monitoring, security, and automation across applications.
- Evaluate, implement, and maintain modern DevOps tools and technologies for software delivery, observability, and incident response.
- Promote Infrastructure as Code (IaC), GitOps, and Kubernetes best practices to standardize environments.
- Develop and advocate self-service platforms for engineering teams to improve efficiency.

Collaboration with Business Verticals
- Work closely with teams across Health & Care, Legal, Education, and Accelerator business units to tailor SRE and DevOps strategies that align with their specific requirements.
- Understand unique application challenges in each business vertical and design solutions to improve scalability, security, and reliability.
- Serve as a trusted advisor to engineering leads, ensuring their product applications align with SRE and DevOps standards.

Cross-Team Collaboration
- Partner with the SRE Engineering Manager, Software Engineers, Platform Engineers, and Security Teams to build reliable and scalable cloud-native applications.
- Influence architecture and engineering decisions to ensure reliability and cost-efficiency at scale.
- Collaborate with Product and Business stakeholders to define system priorities and drive alignment.

Observability, Incident Response & Continuous Improvement
- Implement and enhance monitoring, logging, and observability across systems.
- Drive incident response processes, blameless post-mortems, and root cause analysis.
- Improve system scalability, latency, and overall performance through proactive engineering practices.
- Lead capacity planning, fault tolerance, and disaster recovery initiatives.

As a Principal SRE, you will have:
- 10 years of experience in Site Reliability Engineering (SRE), DevOps, or Cloud Infrastructure roles.
- Strong cloud experience with AWS, Azure, or GCP, including serverless and containerized architectures (Kubernetes, Docker, Terraform, Helm, etc.).
- Proven ability to drive SRE best practices, automation, and DevOps culture across multiple teams.
- Expertise in CI/CD pipelines (GitHub Actions, ArgoCD, Jenkins, Spinnaker, etc.) and Infrastructure as Code (IaC).
- Deep understanding of monitoring, logging, and observability tools (Prometheus, Grafana, Datadog, New Relic, OpenTelemetry).
- Experience with SLOs, SLIs, Error Budgets, and Incident Management best practices.
- Strong leadership and collaboration skills, with experience influencing engineering and product teams.

Wellbeing focused – Our people are our greatest assets, and ensuring everyone can bring their best self to work is integral.
- Annual Leave – 25 days of annual leave, plus public holidays and the ability to buy additional days.
- Employee Assistance Programme – Free advice, support, and confidential counselling available 24/7 through Care First.
- Endometriosis Friendly Employer – We are proud to confirm our commitment to developing an environment and culture that allows those with endometriosis to thrive in the workplace.

Personal Growth – Regardless of where you are in your career, we're committed to enabling your growth personally and professionally.
- Development Programmes – From Future Managers to Leadership Training, our development programmes help you get where you need to go.
- Performance Bonus – Our Group-wide bonus scheme enables you to reap the rewards of your success.

Financial wellbeing – We understand that, as well as your mental wellbeing, your financial wellbeing is really important.
- Pension Scheme – Our plan with Scottish Widows offers 5% matched contribution by the company.
- Income protection insurance – Providing you with support and assistance when you need it most.
- Discounted Parking – We have partnered with QPark to provide an exclusive discounted rate for OneAdvanced employees when purchasing digital season tickets.

Recognition – Highlighting and rewarding the great work our people do.
- Performance & Talent – Our own technology platform that gives you real-time feedback, conversations and goals to help you become your best self.

Making a Difference – We provide opportunities to help our people make a difference to the causes they care about.
- MatchIt – Fundraise for a cause close to your heart and Advanced will match part of the funding.
- Volunteering Time – Our volunteering leave scheme allows you to use your time to help those who need it.
- Pennies from Heaven – Donate the pennies from your pay to help make a difference without lifting a finger.

OneAdvanced is one of the UK's largest providers of business software and services, serving 20,000 global customers with an annual turnover of £330M. We manage 1.5 million 111 calls per month, support over 2 million Further Education learners across the UK, handle over 10 million wills, and so much more. Our mission is to power the world of work and, as you can see, our software underpins some of the UK's most critical sectors.

We invest in our brilliant people. They are at the heart of our success as we strive to be a diverse, inclusive and engaging place to work that not only powers the world of work, but empowers the growth, ambitions and talent of our people.