Who we are looking for
A Site Reliability Engineer who will develop software solutions, consult with development teams and work with modern telemetry data to maintain and improve the performance of key systems.
The site reliability team provide an increasingly important service to our technology department.
Focusing on application performance, reliability, availability, capacity and health, you will work with other teams across the platform department to help ensure our critical systems are reliable and observable. You will build solutions that minimise toil and deliver operational efficiency at scale for the people who operate those systems.
You will work with a wide range of technologies developing solutions, consulting with development teams and working with contemporary observability and incident management tools to assist the Business. You will be required to make effective decisions to improve the health and maintain the availability and performance of some of our most critical systems.
This role is eligible for inclusion in the Company’s hybrid working from home policy.
Preferred skills and experience
* Excellent knowledge of SRE principles, including the creation and management of effective SLIs and SLOs for reliability and customer satisfaction.
* Knowledge of contemporary observability tools, techniques and best practice, including Splunk, New Relic, Grafana and PagerDuty.
* Excellent knowledge of programming languages including Python, Golang and JavaScript.
* Knowledge and experience of modern software development techniques and lifecycles.
* Experience with automation and orchestration platforms such as Ansible and Jenkins.
* Prior experience working in a large-scale, 24/7 enterprise where system uptime and stability are of paramount importance to the business.
* Keen interest in industry trends, particularly DevOps.
Main responsibilities
* Developing bespoke in-house tooling using a range of technologies to provide effective operational support capabilities for our colleagues in IT Operations.
* Working with automation and orchestration platforms to automate manual activity and reduce toil.
* Building sophisticated dashboards using a range of telemetry data and dashboarding technologies such as Grafana, Splunk and New Relic.
* Maintaining and administering existing monitoring and analytics toolsets.
* Mentoring colleagues in the use of new technologies and practices.
* Contributing to the evolution of team processes and approaches.
* Collaborating with colleagues in the wider platform teams to determine requirements and solutions, to solve problems and progress work.
* Working with IT Operations to provide and support critical tooling that delivers increasing value to the Business.
By applying to us, you are agreeing to share your Personal Data in accordance with our Recruitment Privacy Policy.