Nov 11, 2024 - OnBuy is hiring a remote Senior Site Reliability Engineer. Salary: £65,000 - £80,000 per annum. Location: UK.
Who are OnBuy?
OnBuy is an online marketplace on a mission to be the best choice for every customer, everywhere.
We have recently been named one of the UK's fastest-growing tech companies in Deloitte's Technology Fast 50 for the third year in a row.
Working at OnBuy:
We are a team of driven and motivated people who thrive when working at pace. To succeed at OnBuy, you need to take charge and fully own your responsibilities. Working at OnBuy means being surrounded by opportunities while staying focused and prioritising ruthlessly.
At OnBuy, you're not just a number. We are creating something special, and you have the opportunity to effect meaningful change and have your voice heard.
We work in a flexible way, meaning we can prioritise our health and relationships, but when we are working, we graft.
Job overview:
As a Senior Site Reliability Engineer, you will play a critical role in ensuring our systems and environments are robust and reliable, with a high degree of observability, monitoring, and alerting.
You will help OnBuy build and maintain scalable, reliable systems while ensuring that our services meet the high standards of availability and performance expected by our users. Your expertise will be invaluable in automating and enhancing our operational processes, monitoring application performance, and troubleshooting complex issues.
You will collaborate closely with software engineers to design reliable and efficient systems, participating in reliability reviews and driving best practices. Additionally, you will be responsible for creating and managing infrastructure as code, leveraging modern cloud technologies and tools.
Key Responsibilities:
1. Design and implement scalable systems to ensure high availability and performance.
2. Develop automated solutions for monitoring, scaling, and system health management.
3. Collaborate with software development teams to identify and resolve reliability issues.
4. Create and maintain documentation related to system architecture, processes, and configurations.
5. Perform incident response and postmortem analysis to improve site reliability and performance.
6. Monitor system performance and make necessary adjustments to ensure optimal functionality.
7. Implement and manage infrastructure as code using tools like Terraform or Ansible.
8. Provide out-of-hours support (via a rota) to address urgent DevOps issues, ensuring the reliability and availability of critical systems.
Requirements
Essential
1. Proven experience as a Senior Site Reliability Engineer or in a similar role.
2. Strong proficiency in programming languages such as Python, Go, or Java.
3. Experience with cloud service providers (AWS, Azure, Google Cloud) and with containerisation and orchestration tools (Docker, Kubernetes).
4. Solid understanding of networking, distributed systems, and microservices architecture.
5. Familiarity with monitoring and logging tools (New Relic, Prometheus, Grafana, ELK stack, GCP logging).
6. Excellent problem-solving skills and the ability to work effectively in a team.
7. Strong communication and interpersonal skills, including the ability to collaborate effectively with cross-functional teams.
Benefits
The salary range on offer for this role is £65,000 - £80,000 per annum, depending on experience.
In return for helping us to grow, we’ll offer you company equity, meaning you own a piece of this business we are all working so hard to build.