Site Reliability Engineer (SRE) Contract Role, Platform, Chaos Engineering | FinTech, Enterprise | Fully Remote, UK | £600 - £700 per day (Outside IR35), 6 months+
The Client:
Owen Thomas has partnered with a globally renowned FinTech company that is looking for exceptional engineers with a genuine interest in working with cutting-edge technology.
Technical Requirements
Infrastructure Expertise:
-Advanced experience with cloud platforms (AWS, GCP, Azure), including designing, deploying, and maintaining scalable infrastructure.
-Strong knowledge of container orchestration tools like Kubernetes and Docker.
-Expertise in Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, or Pulumi.
Chaos Engineering Proficiency:
-Hands-on experience with chaos engineering tools like Gremlin, Chaos Monkey, or LitmusChaos to design and execute fault injection experiments.
-Proven track record of implementing resilience testing strategies across distributed systems.
Monitoring and Observability:
-Experience with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, New Relic, Elastic Stack).
-Strong understanding of metrics, logging, and tracing in distributed systems.
Automation and Scripting:
-Proficiency in scripting and automation languages (e.g., Python, Go, Shell, Ruby, or Java).
-Demonstrated ability to automate infrastructure and operational processes.
Incident Management and Root Cause Analysis:
-Expertise in incident response processes, including triage, mitigation, and communication.
-Familiarity with incident management tools like PagerDuty or Opsgenie.
Resilience and Scalability Design:
-Advanced understanding of system design principles, scalability, and high-availability architectures.
-Practical experience with load testing and performance benchmarking tools (e.g., JMeter, Locust, k6).
Soft Skills and Additional Qualities
Strong Problem-Solving Skills:
-Ability to debug and resolve complex issues in production environments.
Cross-Team Collaboration:
-Experience working closely with development, DevOps, and QA teams to implement best practices in reliability and availability.
Proactive Communication:
-Clear and concise communication skills to collaborate with diverse stakeholders and write detailed documentation.
Mentorship and Knowledge Sharing:
-Willingness to mentor other team members in chaos engineering principles and SRE best practices.
Desirable Extras
Certifications:
-Relevant certifications (e.g., AWS Certified DevOps Engineer, CKA, CKAD, or Google Professional Cloud DevOps Engineer).
Experience in Highly Regulated Industries:
-Familiarity with compliance frameworks (e.g., PCI DSS, GDPR, ISO 27001) is advantageous.
Exposure to Emerging Tools and Practices:
-Knowledge of modern chaos engineering trends, such as adaptive resilience testing or AI-driven fault detection.
Performance Monitoring in Legacy Systems:
-Ability to apply SRE and chaos engineering principles in legacy system environments.
If you are interested, please apply here and we will get back to you if you are a good match for the client. We appreciate your patience :)