We are seeking a Site Reliability Engineer (SRE) to join an innovative and fast-growing company in Belfast. This role focuses on ensuring the reliability, scalability, and performance of critical infrastructure and services while working with cutting-edge cloud-native technologies. You'll collaborate with engineering teams to enhance system resilience, streamline deployments, and drive automation.
What You'll Do
* Design, build, and optimise resilient, high-performance infrastructure.
* Develop and maintain CI/CD pipelines to enhance deployment efficiency.
* Implement cloud-native and open-source solutions to improve reliability and scalability.
* Proactively monitor and troubleshoot production systems, ensuring uptime and performance.
* Automate infrastructure provisioning and configuration using Infrastructure as Code (IaC).
* Drive continuous improvements in developer experience (DevEx) and operational efficiency.
* Participate in incident response, root cause analysis, and post-mortem reviews.
What You'll Need
* 3+ years of experience in an SRE, DevOps, or Infrastructure Engineering role in a high-scale environment.
* Strong experience with Kubernetes and container orchestration.
* Deep knowledge of cloud platforms and distributed systems.
* Proficiency with databases and messaging systems (e.g., Elasticsearch, Postgres, Neo4j, RabbitMQ, Redis).
* Expertise in CI/CD tooling (e.g., GitHub Actions, ArgoCD).
* Hands-on experience with Infrastructure as Code (IaC) using Terraform, Terragrunt, or CDK.
* Solid scripting and automation skills (Python, Bash, or Go).
* A proactive, problem-solving mindset with a focus on reliability, automation, and scalability.
If you're passionate about reliability, automation, and cloud-native technologies, we'd love to hear from you. Apply now or reach out to Andrew Harrison for more details.