Job Title: Site Reliability Engineer
Location: Remote (UK)
Type: Full-Time (1-Year Contract)
Working Hours: 11 AM - 7 PM
Are you passionate about building and managing reliable, large-scale cloud systems? We're looking for a Senior Site Reliability Engineer to join a high-performing Observability team. In this role, you'll play a critical part in ensuring our cloud services remain performant and scalable, supporting billions of daily requests.
Key Responsibilities
* Scale and optimize Prometheus architecture to manage millions of active metrics.
* Operate and maintain large Elasticsearch clusters (2,000 TB+).
* Build and manage high-throughput Kafka pipelines processing hundreds of thousands of events per second.
* Develop self-service APIs and robust alerting systems, and deploy infrastructure with Terraform.
* Support observability initiatives to monitor and improve critical cloud services.
What We're Looking For
1. 5+ years of experience managing distributed systems on Linux (Debian/Ubuntu preferred).
2. 2+ years of development experience with Ruby, Python, Go, or similar languages.
3. Expertise in technologies such as Elasticsearch, Kafka, Prometheus, Terraform, and Ansible.
4. A strong passion for solving complex ...