Senior Site Reliability Engineer, London
Client: loveholidays
Location: London, United Kingdom
Job Category: -
EU work permit required: Yes
Job Reference: 0a5d73f59faa
Job Views: 96
Posted: 22.01.2025
Expiry Date: 08.03.2025
Job Description:
About us
We are a rapidly growing online travel agency with technology at the heart of our success. In 2022, we sent millions of people on their dream holiday.
With a million visitors a day, our 100+ services handle 8k requests per second while maintaining a p95 search latency of 150 ms. Our observability stack captures and processes 1 TB of logs a day and 350k metric samples a second.
We focus on differentiation by relying heavily on open source, while also giving back through contributions to public repositories.
Responsibilities
As our first Site Reliability Engineer, you will contribute to the evolution of SRE practices like incident management, blameless postmortems, SLOs and error budgets. You will contribute to building reliable, performant, auto-scalable and highly available systems with the support of the existing Platform Infrastructure team.
* Review our services through an SRE lens.
* Level up SRE practices across the teams.
* Improve the platform's reliability KPIs.
* Help balance reliability with feature delivery using SLOs and error budgets.
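To illustrate the last point, here is a minimal sketch of how an error budget can be derived from an SLO. The function name, the request-based SLO and the example numbers are assumptions made for illustration, not a description of our actual tooling.

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the fraction of the error budget still unspent.

    Hypothetical helper: assumes a request-based availability SLO,
    e.g. slo_target=0.999 means 99.9% of requests must succeed.
    A negative result means the budget has been overspent.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        return 1.0  # no traffic yet, so nothing has been spent

    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    # Illustrative numbers only: 1,000,000 requests, 250 failures, 99.9% SLO.
    remaining = error_budget_remaining(0.999, 1_000_000, 250)
    print(f"Error budget remaining: {remaining:.1%}")
```

A team with plenty of budget left can ship features faster, while a team that has spent its budget would typically prioritise reliability work until it recovers.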
Our engineering teams own the lifecycle of services from first commit to high-load operation in production. Your responsibility will be to help engineering teams succeed at operations, not to run their services for them.
What you'll be working on
* Exposing slow-running code paths in critical applications using tools like Java Flight Recorder or Go’s pprof.
* Writing tools or modifying existing applications with reliability and performance in mind.
* Ensuring our systems and their individual components can withstand 10x load by improving our architecture.
* Shortening mean time to discovery and recovery with improvements to observability and alerting.
* Exposing system weaknesses with performance testing.
Our runtime architecture is service-based. Our engineering teams provision and manage their services' infrastructure using modern cloud technologies.
We place a strong focus on observability, continually evolving our monitoring and alerting stack, currently centred around the cloud-native ecosystem. Our service mesh provides uniform observability of all production services at 10s intervals.
Performance and scalability are integral to our software and infrastructure development process, achieved by combining computer science fundamentals and cutting-edge cloud technologies.
You should have a good understanding of
* HTTP, web services, REST.
* Containers, cloud.
* Testing, reliability, monitoring.
* Low-level debugging and troubleshooting.
What we'll give back to you
* Company pension contributions at 5%.
* Training budget for you to learn on the job and level yourself up.
* Discounted holidays for you, your family and friends.
* 25 days of holidays per annum (plus 8 public holidays), increasing by 1 day for every second year of service, up to a maximum of 30 days per annum.
* Ability to buy and sell annual leave.
* Cycle to work scheme, season ticket loan and eye care vouchers.
Created on 22/01/2025 by TN United Kingdom