Site Reliability Engineer
Location: London/Remote
Duration: Contract - 6 Months - Day Rate Negotiable
We are currently seeking a highly skilled Site Reliability Engineer (SRE) to join our financial services client. This is a contract role offering the flexibility to work remotely or from our client’s London office. The ideal candidate will have strong experience in monitoring and observability platforms, particularly Datadog, and be well-versed in Kubernetes and automation technologies.
Key Responsibilities:
1. Datadog Expertise: Implement, maintain, and enhance monitoring solutions using Datadog, ensuring optimal performance and real-time observability across client environments.
2. Kubernetes & OpenShift (OCP): Leverage extensive experience with OCP and Kubernetes to manage, scale, and optimize containerized applications and infrastructure.
3. Automation & Testing: Apply your automation skills to streamline operations and testing workflows using industry-specific tools, ensuring efficiency, reliability, and scalability.
4. Disaster Recovery & Operational Excellence: Develop and maintain Disaster Recovery (DR) strategies and ensure the adoption of Operational Excellence best practices within client infrastructure.
5. Cloud & Container Certification: Apply AWS and Kubernetes expertise, backed by relevant certifications, to client projects.
6. Client Engagement: Collaborate directly with clients, bringing your consulting experience to deliver technical solutions that meet their unique needs and business objectives.
Required Skills and Experience:
1. Datadog Experience: Proven track record of implementing and managing Datadog in production environments.
2. OCP/Kubernetes: Strong experience in managing Kubernetes and OpenShift (OCP) platforms in high-availability environments.
3. Automation Tools Knowledge: Hands-on experience with automation tools and frameworks, such as Terraform, Ansible, or similar, to optimize infrastructure as code.
4. Certifications: Certification in AWS (Solutions Architect, SysOps Administrator, or similar) and/or Kubernetes (CKA/CKAD).
5. Disaster Recovery & Best Practices: Strong knowledge of DR strategies, coupled with expertise in Operational Excellence frameworks and best practices.
6. Consulting & Client-Facing Experience (preferred): Background in a consulting or client-facing role, with the ability to communicate effectively with both technical and business stakeholders.
What You’ll Bring:
1. A proactive, solutions-driven mindset with a focus on automation and resilience.
2. The ability to work independently and manage multiple projects in a fast-paced, client-driven environment.
3. Strong communication skills and the ability to collaborate across teams and with clients.
This is an excellent opportunity for an experienced Site Reliability Engineer to make a significant impact on a dynamic financial services organization, utilizing cutting-edge technology and best practices.