Role: Site Reliability Engineer
Reporting to: Senior Platform Engineers

This is a hybrid role with 2 days a week at our HQ in East Croydon and 3 days working from home. However, at Easter 2025 we will be moving to our new HQ in Clapham Junction, with 3 days a week in the office.

Here at The Gym Group we believe we are simply the best in our industry. Our amazing teams have skills, abilities and can-do attitudes that make this a great place to work. We have strong, embedded values and an easy-going culture which ensures we put our people first. We pride ourselves on being fun, innovative, inclusive and engaging.

We are now on the lookout for a Site Reliability Engineer. So, what's stopping you? Apply today and be YOU with us.

What you need to know about us

We're number one in our industry when it comes to our values, our culture and our people; check out Glassdoor if you don't believe us. We offer a fantastic place to work in a great working culture. We have been recognised by The Sunday Times as one of The Best Places to Work, which is all down to our great leaders and exceptional teams. We may be one of the leading budget gym chains in the UK, but there's nothing budget about our investment in our people. We have retained Gold in our Investors in People award for the last 6 years and achieved Silver in wellbeing last year. We're recognised as a Disability Confident and inclusive employer, which is something we are truly proud of. We have a brilliant team and opportunities for development and growth, with support for success. We have recently undertaken some huge digital projects, and we plan to keep being innovative, creative and agile in all that we do.

What you need to know about the role

As a Site Reliability Engineer (SRE) at The Gym Group, you'll play a pivotal role in ensuring our digital channels deliver fast, reliable, and delightful experiences for every visitor.
Collaborating with a talented team of engineers (across Development, Platform/DevOps, InfoSec, QA, and SRE), alongside Technical Architects and our Digital Ops Manager, you'll ensure our cloud infrastructure and applications are always available, high-performing, and highly observable. You'll be at the forefront of driving improvements to our deployment strategies, system monitoring, logging, and alerting capabilities. Additionally, you'll enhance our readiness to rapidly detect, diagnose, and recover from production issues.

Working with a modern tech stack (including Terraform, Kubernetes, GitHub, Azure DevOps, Service Bus, Cosmos DB, Redis, and Cloudflare), you'll help shape our transition to a fully microservice-based, observability-led architecture that supports continuous delivery.

While this role primarily operates during standard office hours, occasional late work may be required for system tests, upgrades, or major incident response. Importantly, no formal on-call commitment is required.

Let us tell you what we are looking for

Essential Skills:

Personal & Professional
Clear, concise communication skills.
Team-oriented, with a strong commitment to collaboration.
Calm and effective under pressure.
Advocate for best practices with exceptional analytical and problem-solving abilities.

Site Reliability Engineering Expertise
Solid understanding of core SRE principles (e.g., Golden Signals, SLIs/SLOs, SRE metrics, release engineering, blameless retrospectives).
Expertise in log analysis and incident triage.
Performance monitoring, dashboard creation, and alerting rule management.
Proficient in scripting and coding (e.g., Bash, PowerShell, Python).
Experience with Root Cause Analysis (RCA), Fault Tree Analysis, FMEA, or similar reliability engineering methods.
Strong knowledge of DevSecOps tools, methodologies, and Infrastructure-as-Code (IaC).

Cloud Computing
Extensive experience with a major public cloud platform (e.g., Azure, AWS, or GCP).
Proficiency in containerisation technologies (e.g., Docker, Helm).
Awareness of network security and networking protocols.
Solid general computing knowledge (e.g., hardware performance, software fault modes, vulnerability patching, system hardening).

Desirable Skills:

Experience with eCommerce applications.
Advanced knowledge of Microsoft Azure (e.g., VNETs, Storage Containers, App Gateway, APIM, App Service).
Proficiency in Kubernetes and Azure DevOps (YAML Pipelines).
Familiarity with Azure monitoring tools (e.g., Monitor, Application Insights).
Expertise in Terraform, FinOps, and cloud infrastructure optimisation.
Knowledge of Cloudflare and Azure Active Directory / Entra ID.
Experience deploying and supporting Node.js and .NET stack applications.
Familiarity with GitOps, Policy-as-Code, and distributed system design patterns (e.g., event-driven, microservices).
Strong understanding of relational and NoSQL database management systems.

So, we've told you all about us and our amazing new opportunity; now it's your turn to hit 'Apply' and tell us about YOU.

If you have a disability or condition that makes it difficult for you to complete your application online, please email your CV to recruitment@thegymgroup.com or alternatively call the TGG Recruitment team on 0203 319 4838 and someone will be more than happy to support you.

We also want to put it out there that we actively encourage applications from a diverse demographic and we are passionate about your culture and value alignment. We want this to be a match that challenges your limits and works for you as much as for us. When we say We're With You, we really do mean it.