Site Reliability Engineer (SRE) Contract Role, Platform, Chaos Engineering | FinTech, Enterprise | Fully Remote, UK | £600 - £700pd (Outside IR35), 6 months

The Client:
Owen Thomas has partnered with a company that is looking for exceptional engineers who have a genuine interest in working with cutting-edge technology at a globally renowned FinTech company.

Technical Requirements

Infrastructure Expertise:
- Advanced experience with cloud platforms (AWS, GCP, Azure), including designing, deploying, and maintaining scalable infrastructure.
- Strong knowledge of container orchestration tools like Kubernetes and Docker.
- Expertise in Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, or Pulumi.

Chaos Engineering Proficiency:
- Hands-on experience with chaos engineering tools like Gremlin, Chaos Monkey, or LitmusChaos to design and execute fault injection experiments.
- Proven track record of implementing resilience testing strategies across distributed systems.

Monitoring and Observability:
- Experience with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, New Relic, Elastic Stack).
- Strong understanding of metrics, logging, and tracing in distributed systems.

Automation and Scripting:
- Proficiency in scripting and automation languages (e.g., Python, Go, Shell, Ruby, or Java).
- Demonstrated ability to automate infrastructure and operational processes.

Incident Management and Root Cause Analysis:
- Expertise in incident response processes, including triage, mitigation, and communication.
- Familiarity with incident management tools like PagerDuty or Opsgenie.

Resilience and Scalability Design:
- Advanced understanding of system design principles, scalability, and high-availability architectures.
- Practical experience with load testing and performance benchmarking tools (e.g., JMeter, Locust, k6).

Soft Skills and Additional Qualities

Strong Problem-Solving Skills:
- Ability to debug and resolve complex issues in production environments.

Cross-Team Collaboration:
- Experience working closely with development, DevOps, and QA teams to implement best practices in reliability and availability.

Proactive Communication:
- Clear and concise communication skills to collaborate with diverse stakeholders and write detailed documentation.

Mentorship and Knowledge Sharing:
- Willingness to mentor other team members in chaos engineering principles and SRE best practices.

Desirable Extras

Certifications:
- Relevant certifications (e.g., AWS Certified DevOps Engineer, CKA, CKAD, or Google Professional Cloud DevOps Engineer).

Experience in Highly Regulated Industries:
- Familiarity with compliance frameworks (e.g., PCI DSS, GDPR, ISO 27001) is advantageous.

Exposure to Emerging Tools and Practices:
- Knowledge of modern chaos engineering trends, such as adaptive resilience testing or AI-driven fault detection.

Performance Monitoring in Legacy Systems:
- Ability to apply SRE and chaos engineering principles in legacy system environments.

If you are interested in applying, please apply here and we will get back to you if it's a good match for the client. We appreciate your patience :)