Role: SRE Lead Architect
Location: Birmingham/Sheffield (Hybrid, 3 days a week)

Lead Technical SME - Capacity Planning & Control Uplift

Role Overview:
We are seeking a Lead Technical Subject Matter Expert (SME) with strong systems thinking and a solid grasp of SRE principles to drive the technical uplift of capacity and observability controls across our technology estate. This role blends hands-on engineering depth with architectural oversight and focuses on enhancing performance, resilience, and control effectiveness across services and platforms. The ideal candidate brings both operational sensibility and the ability to drive scalable solutions, aligning technical capabilities with internal control frameworks and regulatory expectations.

Key Responsibilities:
• Lead the design and technical evaluation of capacity management, utilisation monitoring, and observability controls across platforms.
• Apply SRE-aligned practices to identify control gaps, performance risks, and areas for automation.
• Assess existing tooling, data flows, and operational practices to identify control gaps and propose remediation strategies.
• Collaborate with engineering, infrastructure, architecture, and risk teams to validate technical designs and implementation plans.
• Define reusable technical patterns and tooling strategies that enhance operational readiness and control sustainability.
• Support roadmap shaping, tooling assessment, and documentation for governance and operational readiness.

Required Skills & Experience:
• 10 years in engineering, infrastructure, or technical architecture roles in complex technology environments.
• Solid understanding of compute, storage, and network capacity planning across mixed deployment models.
• Familiarity with SRE disciplines such as observability, service-level indicators/objectives (SLIs/SLOs), and automation of operational tasks.
• Demonstrated ability to interpret and apply control requirements in technical design contexts.
• Hands-on experience with performance monitoring, alerting systems, and diagnostic tooling (e.g., Geneos, Prometheus, Grafana, AppDynamics, or similar tools).
• Strong communication skills, with the ability to convey technical concepts to senior stakeholders and control partners.

Desirable:
• Experience in implementing or uplifting operational controls (capacity, performance, availability).
• Exposure to internal risk frameworks or external regulatory requirements (e.g., DORA, EBA, PRA).
• Background in service reliability, system diagnostics, or incident response.