Mthree Glasgow City, Scotland, United Kingdom
Site Reliability Engineer
A fantastic opportunity to work with a leading international investment bank as an SRE within its Cyber Security team.
As part of the Data Loss Prevention (DLP) cybersecurity team, you will help drive a shift from classical monitoring towards an observability model built on metrics, diagnostic logging, distributed tracing, scalability, and SLO/SLI-based alerting. Using the available SRE tooling stacks, your primary focus will be building functional, usable telemetry dashboards for the DLP cybersecurity stack.
This is an opportunity both for those who wish to change career track into cybersecurity and for those with a cybersecurity background who wish to build on their skills.
Duties will include, but are not limited to:
* Review, write, and optimise PromQL queries for Prometheus.
* Operate, troubleshoot, and optimise Prometheus in agent mode.
* Review and craft Grafana dashboards following best practices, such as the Four Golden Signals or RED methodology.
* Review and craft Splunk dashboards following best practices, such as the Four Golden Signals or RED methodology.
* Revise alerting to reduce noise and false positives, and identify any alerting gaps.
* Revise PagerDuty alerting rules and orchestration to reduce noise and false positives, and identify any alerting gaps.
* Collaborate with the DLP squads on enhancing current alerting standards that follow SRE best practices.
* Innovate and apply practical, continuous enhancements to our monitoring systems.
* Build actionable insights for the DLP squads from telemetry data.
* Be part of a rota for the 24/7 support of DLP products.
Must-Have Skills:
* Critical thinking ability and a proactive approach to identifying and resolving issues.
* A track record of establishing microservice SLOs and managing error budgets.
* Excellent communication and collaboration skills to work effectively with the squads.
* Experienced in the application of SRE principles.
* 3+ years of Prometheus experience, including Prometheus architecture, Prometheus exporters, and PromQL.
* 3+ years of Grafana skills.
* 3+ years on Splunk.
* Excellent knowledge of observability, especially metrics and dashboarding.
* Fluent in a programming or scripting language.
* Experienced in the use of CI/CD tools (e.g. Bitbucket, Jenkins).
* Experienced in UNIX/Linux-based environments.
Would Be Nice to Have:
* Experience with any product that deals with incident, problem, and change management.
* Automation.
* Cybersecurity.
* DLP product.
* Working in an Agile Environment.
* Operational Environment.
* OpenTelemetry skills.
Seniority level: Associate
Employment type: Full-time
Job function: Information Technology
Industries: IT Services and IT Consulting, Financial Services, and Investment Banking