Site Reliability Engineering Manager
Posting Date: 19 Sep 2024
Function: Software Engineering
Unit: Digital
Location: Martlesham Heath Business Park, Ipswich, United Kingdom
Salary: Competitive with great benefits
This is a hybrid role (3 days a week in the office) based in Ipswich.
The successful candidate will ensure the operational stability and performance of OR systems across CRM, workflow, IIP and Field operations to deliver the expected business benefits. You will focus on driving the adoption of operational best practices across OR platforms and optimising service levels across OR, and you will represent OR Technology at senior stakeholder level on operational performance, trading and service reliability.
As a Site Reliability Engineer (SRE), you will be required to build a meaningful engineering discipline, combining software and systems to develop creative engineering solutions to operations problems. Much of our support and software development focuses on optimising existing systems and reducing work through automation.
What you’ll be doing
* SRE
* Operational Readiness
* Operational Stability
* Operational Assurance - ensure all delivered solutions are fit to run
* Trading Reliability - SLAs met for P1/P2/P3 incident management
* Incident Turnaround and resolution – lead on end-to-end (E2E) ownership of incidents, even though the resolution may be owned by other teams
* Automation - reduction in levels of ASG Service Requests/manual workarounds and automation of incident handling
* Security coverage of the estate along with application of latest security patches
* Customer and stakeholder management
* Building key critical skills across the team
* Support cultural change by demonstrating in practice how systems need to be supported
* Process change – continuous improvement of ASM processes
* Innovation to find new ways to provide operational system support whilst maximising efficiencies and cost savings (using AI/ML)
* Maintain oversight of key technology transformation programmes in your area of expertise, monitoring performance against business objectives and scorecards, responding to trends with recommended actions and taking executive action where needed
Skills and Experience
Qualifications:
* May have an engineering or science degree from a Tier 1 institution, or have served a technical apprenticeship and/or obtained an NVQ and/or further education technical qualifications (e.g. HND)
* Qualified to be, and possibly already, a member of a professional engineering/science institution and working towards chartered engineer accreditation
* Relevant professional experience
Skills/Experience:
* Experience working in a Software Development, DevOps, Site Reliability Engineering, Support or Infrastructure position or team
* Hands-on experience in ideating, implementing and delivering SRE practices across a mid- or large-tier organisation
* Demonstrable knowledge of continuous integration and/or continuous deployment tools and scripting
* Ability to conduct thorough investigations, including a deep dive, into reliability and scaling issues from both a code and infrastructure perspective
* Experience working with source management tools like GitHub
* Experience with Microservices architecture
* Strong knowledge of Change and Incident Management processes
* Python
* Monitoring and logging tools like AppDynamics (AppD), Datadog, Dynatrace and ELK
* Shell scripting (Bash or similar)
* Helm and Kubernetes manifests
* CI/CD pipelines on GitLab
* Scalable Docker containers, ideally on Kubernetes
* Dockerfiles for Node.js and PHP applications
About us
BT is part of BT Group, along with EE, Openreach, and Plusnet. Millions of people rely on us every day to help them live their lives, power their businesses, and keep their public services running. We connect friends to family, clients to colleagues, people to possibilities. We keep the wheels of business spinning, and the emergency services responding.
We value diversity and celebrate difference. ‘We embed diversity and inclusion into everything that we do. It’s fundamental to our purpose: we connect for good.’
This is your chance to make a real difference to the world: to be part of the digital transformation of countless lives and businesses. Grab it.