The CoE Lead - Observability & Tools at JD Sports Fashion Plc is a critical, hands-on technical role focused on designing, building, and maintaining the company's Observability platform. This role ensures that our technology platforms operate efficiently and reliably, providing early insights for Engineering, Service Reliability, Service Delivery, and DevOps teams.
The CoE Lead will manage the contract with third-party providers responsible for the execution layer, ensuring adherence to service-level agreements (SLAs) and key performance indicators (KPIs). The position involves a 75% focus on the design of frameworks and a 25% focus on implementation and adoption.
Job Title: Centre of Excellence Lead - Observability & Tooling
Location: BL9 8RR
Working rota: Monday - Friday
Working hours: 40
What You'll Be Doing:
We are looking for an experienced CoE Lead to design, build, and maintain our Observability platform. The CoE Lead will work closely with DevOps, Engineering, Service Reliability, and Service Delivery teams to continuously improve our Observability capabilities.
This role is a technical, hands-on position with a 75% focus on framework design and 25% on implementation and adoption.
You will contribute to pipeline design, enabling observability from the first deployment in test environments and providing early insights for Engineering, Service Reliability, Service Delivery, and DevOps teams. The role involves building frameworks for intelligent alerts to help Service Delivery teams quickly triage incidents and enable automated runbooks. Additionally, you will identify and deploy tools to automate incident detection, notifications, triage, and resolution.
Key Responsibilities:
* Adopt a pipeline approach to enable observability of services deployed across multiple environments, balancing monitoring, logging, and tracing based on service classification.
* Design and build intelligent alerts using pipelines, and onboard automated runbooks whose triggers leave clear audit logs in service management tools such as Jira Service Management.
* Create and maintain dashboards for proactive monitoring of services to help teams resolve incidents quickly.
* Continuously improve monitoring capabilities to identify key alerts and thresholds for early warnings before services fail.
* Enable intelligent alerts that give fine-grained detail on the underlying services causing issues, and extend them to trigger automated runbook execution with clear audit logs.
* Work closely with DevOps, Service Reliability, and Service Delivery teams to identify and deploy tools that automate incident detection, notifications, triage, and resolution.
What We're Looking For:
Skills:
* Leadership and Collaboration:
o Strong leadership skills with the ability to mentor, coach, and develop high-performing teams.
o Excellent communication and interpersonal skills, capable of building strong relationships with both technical and business stakeholders.
o Proven ability to collaborate effectively with cross-functional teams, including DevOps, Engineering, Service Reliability, and Service Delivery teams.
* Technical Expertise:
o In-depth knowledge of open-source and commercial observability tools (e.g., Prometheus, Grafana, New Relic).
o Expertise in cloud environments (e.g., AWS, Azure) and infrastructure as code (IaC) tools like Terraform.
* Monitoring and Observability:
o Experience in creating and maintaining dashboards for proactive monitoring of services.
o Ability to design and build intelligent alerts using pipelines, enabling early detection of issues and automated incident response.
o Knowledge of the latest technology trends in the monitoring landscape, such as OpenTelemetry.
* Contract Management:
o Experience in managing third-party provider contracts, including negotiating terms, monitoring performance, and ensuring adherence to SLAs and KPIs.
o Ability to integrate third-party providers seamlessly into the organisation's workflows, aligning with the overall strategic vision.
Experience:
* Professional Experience:
o 5 to 8 years of experience in technology service delivery and management, focusing on observability, monitoring, and tooling.
* Service Management:
o Practical experience in building and maintaining a Service Catalogue, assigning service level objectives (SLOs), and measuring service level indicators (SLIs).
o Experience in operating production services during peak trading periods without service degradation.
* Automation and Tooling:
o Knowledge of automation tools to simplify alert notifications and extend to automated runbook execution.
o Experience in implementing observability solutions for retail stores or similar environments.
The Company:
The JD Group is a leading omnichannel retailer of Sports Fashion, Street & Premium Fashion, Outdoors and Gyms, with over 90,000 colleagues and over 4,500 stores across several retail fascias in over 36 countries around the world.
We are an equal opportunities employer who embraces and values differences. We recognise the importance of an inclusive workplace culture in which everyone can thrive irrespective of their background or identity.
To be a part of this successful and continuously growing company, you will want to embed our strategic goals in your day-to-day work: being a people-first, customer-focused organisation and a digital leader that provides operational excellence and continually identifies new areas of growth.
We know our employees work tirelessly to make JD Sports the success it is today and in turn, we offer them some amazing benefits:
* Incremental Holiday Allowance
* Staff Discount on qualifying purchases across Group retail stores and online
* Exclusive Colleague Bike Discount scheme
* Discounted Gym membership
* Personal development opportunities to learn and develop at work
* Access to Apprenticeships and accredited qualifications
Interested?
If you are interested in this position, then press the Apply Now button.
Due to the high volumes of applications our opportunities attract, it takes time to review them all. If you don't hear back within two weeks of your application, please consider your application to have been unsuccessful on this occasion.
Applicants who meet the skills criteria will be contacted for a first-stage meeting with the talent team. Shortlisted candidates will then be invited to interview with the hiring manager.
Thank you again for your time.