Site Reliability Engineer (Network Automation)
Location: Fully Remote
Salary: £95,000 - £110,000 + Bonus
A leading cybersecurity company is looking for a Site Reliability Engineer to join their expanding Infrastructure and Cloud Operations team. This group is focused on transforming the company's core infrastructure through modern technologies, automation, and Infrastructure-as-Code.
The Role
As a Site Reliability Engineer, you'll work closely with the Network Automation Team to design and build infrastructure for a new automation platform. This platform will need to scale with the company's growing global network and customer base. You'll play a key role in shaping decisions that impact how the infrastructure serves customers, with an emphasis on ensuring reliability, performance, and security.
In this role, you'll solve complex infrastructure problems, optimize performance, and use data to drive improvements. The tools you help build will improve operational efficiency and shape how the company's infrastructure evolves.
Key Responsibilities:
* Collaborate with the Network Automation Team to build and deploy infrastructure for a new automation platform.
* Apply core SRE practices (such as defining SLIs, SLOs, and SLAs) to improve reliability and reduce toil (repetitive manual work).
* Set up metrics that support data-driven decisions to improve reliability, availability, and speed.
* Develop and maintain baselines for service level objectives (SLOs) and indicators (SLIs).
* Analyze and test network and system integrity to ensure smooth operations.
* Work with internal teams to troubleshoot and resolve critical infrastructure issues.
* Participate in incident response, root cause analysis, and post-incident reviews to improve future reliability.
* Join a 24x7 on-call rotation to support continuous infrastructure availability.
What You'll Need as a Site Reliability Engineer:
* At least 3 years of experience working within large-scale cloud or CDN infrastructures.
* Proficiency with Python and Go (C/C++ is a plus).
* Strong knowledge of Linux systems, network protocols (TCP, UDP, DNS, HTTP), and network programming.
* Experience with BGP and Anycast routing is an advantage.
* Familiarity with configuration management tools (Ansible, SaltStack), CI/CD tooling (GitLab, Jenkins), and monitoring tools (Prometheus, Grafana).
* Experience with containers and container orchestration (Docker, Kubernetes).
* Strong analytical and troubleshooting skills for large-scale distributed systems.
* Excellent collaboration and communication skills, with the ability to work cross-functionally.
* A degree in Computer Science, Engineering, or a related technical discipline, or equivalent experience.
This is an excellent opportunity for a Site Reliability Engineer who is passionate about infrastructure automation and eager to make an impact on a global scale. The company offers a collaborative environment and the chance to work on cutting-edge technologies while maintaining a reliable and secure infrastructure for customers. If you're excited about building scalable solutions, this role is your next step.