Senior site reliability engineer

London

Permanent

Randstad

Engineering

Posted: 9 February

Offer description

Job Title: Site Reliability Engineer

Location: Remote (UK)

Type: Full-Time (1-Year Contract)

Working Hours: 11 AM - 7 PM

Are you passionate about building and managing reliable, large-scale cloud systems? We're looking for a Senior Site Reliability Engineer to join a high-performing Observability team. In this role, you'll play a critical part in ensuring our cloud services remain performant and scalable, supporting billions of daily requests.

Key Responsibilities
* Scale and optimize Prometheus architecture to manage millions of active metrics.
* Operate and maintain large Elasticsearch clusters (2,000 TB+).
* Build and manage high-throughput Kafka pipelines processing hundreds of thousands of events per second.
* Develop self-service APIs and robust alerting systems, and deploy infrastructure with Terraform.
* Support observability initiatives to monitor and improve critical cloud services.

What We're Looking For
1. 5+ years of experience managing distributed systems on Linux (Debian/Ubuntu preferred).
2. 2+ years of development experience with Ruby, Python, Go, or similar languages.
3. Expertise in technologies such as Elasticsearch, Kafka, Prometheus, Terraform, and Ansible.
4. A strong passion for solving complex ...