Site Reliability Engineering within Documents & Biometrics is responsible for ensuring GBG delivers a world-class experience for all our customers and team members globally. The Site Reliability Engineering Team is a 2nd line technical function, providing a gateway service between 1st line Customer Support and Technology 3rd line Engineering for supported products and services consumed by GBG Customers. The team is formed of customer-oriented, product knowledgeable, process driven, and technically skilled professionals. Their purpose is to proactively support, maintain, and monitor live service to identify, prevent and resolve customer impacting issues while providing a feedback loop to engineering to ensure continual service improvement for an outstanding customer service experience.
The role of the Site Reliability Engineering Team Lead is to provide people management and technical leadership for GBG Documents & Biometrics production systems 24/7/365 (outside of core working hours), ensuring customer impacting incidents and problems are resolved quickly, thoroughly, and professionally. As a lead in 2nd line application support, you will have a deep understanding of GBG Documents & Biometrics products, applications, and components hosted within our cloud environments. You will lead the implementation of monitoring to ensure high levels of observability to protect service availability and service performance for customer user journeys, as well as responding to system events, trends, and alerts. You will be responsible for the work activity, skills, and capability of the Site Reliability Engineering team.
What you will do
* As a people manager, you will be responsible for performance management and development of your team and resourcing where required.
* Take a leading role in delivering a best-in-class 2nd line application and operational support service to GBG 1st line Customer Support and GBG Documents & Biometrics 3rd line Product Delivery functions.
* Co-ordinate a 24/7/365 (outside of core working hours) support service and participate in a call out rota to protect GBG Documents & Biometrics products and customers against service outages and degradation.
* Troubleshoot system and customer-impacting incidents, effectively utilising logs and information from multiple sources to identify root causes, restoring service via a fix or workaround, and where needed escalating to 3rd line teams.
* Proactively analyse incident and monitoring trends to identify availability and performance risks to services, including improvement opportunities.
* Deliver observability of our platforms by implementing and maintaining monitoring tooling and by creating monitors, dashboards, and alerts, ensuring service performance is accurately measured to achieve SLOs and customer SLAs.
* Ensure maintenance of a knowledge base, including the creation of technical documentation where required.
* Maintain application and service configurations, adhering to GBG Change Control process and deployment procedures.
* Contribute to service design and implementation of best practice to ensure successful transition of new products and services into production systems.
* Maintain availability and security operations to agreed standards and schedules.
* Build relationships with internal stakeholders to ensure technical, operational, and customer support needs are met.
* As a self-starter, continually develop your skills, competencies, and knowledge to support your personal development.
* Be a customer champion, proactively recommending client enhancements to promote growth, retention, and customer satisfaction.
Minimum Requirements
* Experienced in a Site Reliability, Technical Operations, or Application Support role.
* Good experience of bespoke business solutions and products, including the ability to demonstrate and troubleshoot the relationship between business customer services and underpinning technology components (web, Middleware/Applications, Databases, Infrastructure) and data flows.
* Excellent application of support processes, including Incident, Problem, Request, Event and Change Management.
* Excellent application of technical troubleshooting, procedures, technical knowledge, and personal skillset to rapidly resolve issues.
* Excellent understanding of observability solutions, and deployment, integration and configuration of operational monitoring tooling to capture events to inform performance objectives (SLOs/SLAs & APM, Synthetics, Web, Infrastructure etc).
* Good experience supporting Cloud platforms (AWS, Azure, Google), cloud native technologies (Kubernetes, PaaS services) and operating systems (Windows and/or Linux).
* Good experience querying cloud hosted databases (RDS, Azure SQL and others).
* Performance-driven, ensuring business and technology performance can be evidenced against agreed SLAs, SLOs, and relevant customer metrics.
* Experience of Release and Deployment Tooling (Azure DevOps).
* Experience of scripting and automation (Terraform, PowerShell).
* Experience adhering to security standards and securing systems (ISO27001, PCI-DSS, SSL & encryption, WAF & attack protection).
* Experience of ITIL operational support working practices.
* Understanding of QA test and Development coding practices.
Desirable Skills
* Experience of Agile delivery working practices.