Job Description
Our client, a global digital consultancy, is seeking a Site Reliability Engineer to safeguard their production environments in central London. As part of a team of engineers, you will design and implement architecture, automation, observability tooling, and service level objectives (SLOs), and you will share responsibility for production support and incident management.
You will:
* Balance feature development velocity and reliability with well-defined SLOs.
* Monitor the production environment's availability and take a holistic view of system health.
* Drive the incident management process and support a blameless post-mortem culture.
* Partner with development teams to improve services through rigorous testing and release procedures.
* Participate in system design consulting, platform management, and capacity planning.
* Create sustainable systems and services through automation and uplifts.
To succeed, you will need:
* A degree in Computer Science or a related technical field involving coding and/or systems engineering.
* Proficiency in one or more programming languages, such as Go, Python, C, C++, Java, Perl, Ruby, or shell scripting.
* Experience with algorithms, data structures, software design, UNIX operating system internals, and/or networking.
* Excellent communication skills to act as a bridge between internal teams and external stakeholders.
* Excellent problem-solving skills.
Desirable qualifications include prior experience designing, maintaining, and troubleshooting distributed systems; hands-on experience with debugging and optimizing code and with automation; and strong interpersonal skills, drive, and ownership.
Salary: £90,000 - £110,000 per year.