Position Title: Azure Site Reliability Engineer (SRE)
EMPLOYMENT TYPE: Contract / Long Term Project
LOCATION: Flexible (Hybrid – 2 days a week onsite in Reading)
Job Summary:
We are seeking an experienced Azure Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of Azure-based systems and services. The ideal candidate will have a deep technical understanding of Azure, strong problem-solving skills, and the ability to manage operational challenges effectively. This role requires a balance of technical expertise and operational excellence to drive continuous improvement.
Key Responsibilities:
· Monitor and maintain the health, availability, and performance of Azure-based systems and services.
· Develop and implement automated solutions to reduce operational burden and improve system reliability.
· Troubleshoot and resolve incidents, minimizing downtime and performing root cause analysis.
· Collaborate with cross-functional teams to design, deploy, and refine scalable Azure architectures.
· Drive operational excellence through the implementation of metrics, monitoring, and reporting systems.
· Optimize cloud resources to ensure cost efficiency while maintaining high performance.
Qualifications:
· Proven experience in Azure cloud infrastructure, with expertise in services like Azure Kubernetes Service (AKS), Azure DevOps, and monitoring tools.
· Strong scripting and automation skills using PowerShell, Python, or similar languages.
· Familiarity with CI/CD pipelines, containerization (Docker), and orchestration technologies.
· Solid understanding of cloud architecture, networking, and security best practices.
· Excellent problem-solving and incident management skills.
· Azure certifications (e.g., Azure Solutions Architect, Azure Administrator) are a plus.
This position requires candidates to hold a valid permit to work in the UK. Sponsorship is not provided.