A leading financial services company in Central London is seeking a Site Reliability Engineer to join its growing Infrastructure team on a permanent basis, with hybrid working.
Responsibilities:
* Expand and fortify the IT architecture for optimal availability.
* Implement continuous integration and deployment practices for seamless development workflows.
* Leverage state-of-the-art technology for streamlined automation and repeatability.
* Engage in Agile practices, including pair programming and daily standups.
* Establish a competitive edge by constructing robust infrastructure.
* Reduce toil efficiently and make informed trade-offs when necessary.
* Foster strong relationships with business counterparts to gain deep insights into client needs.
* Contribute to shaping a culture centered around Service Level Objectives in the engineering domain.
* Proactively address challenges by identifying and confidently mitigating risks, issues, or control weaknesses in day-to-day operations.
Skills and Experience:
* Proven track record of working with cloud-based infrastructure, particularly in AWS environments.
* In-depth knowledge and hands-on experience with Terraform for infrastructure provisioning and management.
* Extensive expertise in building, managing, and maintaining Kubernetes clusters in a high-availability, high-traffic production setting.
* Proficiency in one or more programming languages, with a preference for Go, Python, Ruby, or JavaScript (Node.js).
* Comfortable troubleshooting in complex environments using a range of monitoring and logging tools, including but not limited to Grafana and Prometheus.