Job Description
We're looking for an accomplished and motivated Site Reliability Engineer (SRE) to join our Experian Data Quality team in London, on a hybrid working pattern.
Reporting to the QA Director, you will ensure the reliability, performance, and scalability of our market-leading suite of data management products. Your initial focus will be on observability, supporting incident resolution and leading ongoing performance and stability improvements.
You will develop and implement monitoring solutions to resolve issues; maintain dashboards, alerts, and visualisations that provide real-time insights into system health and performance; and analyse and interpret telemetry data to identify and address potential issues before they affect customers.
Qualifications
What you'll need to bring to the role & Experian
1. Experience with incident management and SRE best practices.
2. Proficiency in observability tools such as Prometheus, Grafana, Splunk, or similar.
3. Experience with Linux/Unix systems, networking, and cloud infrastructure.
4. Experience with containerisation tools such as Docker, Kubernetes, and Amazon EKS.
5. Scripting and IaC automation; experience with Terraform or CloudFormation desirable.
6. Familiarity with CI/CD pipelines and tools (Azure DevOps desirable).