SENIOR SRE
We are currently looking for a dedicated Site Reliability Engineer (SRE) to join my client's team.
Job Description: As an SRE, you will manage the global AWS infrastructure, ensuring the seamless deployment and operation of new features and products. You will be an integral part of the on-call rotation, addressing application and service issues, and communicating effectively with management and engineering teams.
Key Responsibilities:
* Oversee our global AWS services and infrastructure, supporting the deployment and roll-out of new features and products.
* Participate in on-call rotations to troubleshoot and resolve application and service issues, providing clear status updates to management and engineering teams.
* Embrace DevOps culture, with a strong focus on automation.
* Develop and implement CI/CD solutions in an AWS environment using Jenkins and ArgoCD, aiming for one-click deployments, rollbacks, and parameterized builds.
* Collaborate with the development team to integrate SRE/DevOps processes, ensuring new architectures are designed for operability, stability, and scalability.
* Act as a key member of the SRE team, managing the overall health, performance, and capacity of our internal and client-facing systems.
Required Qualifications:
* Extensive hands-on experience with AWS services (e.g., EC2, EKS, Aurora MySQL, S3, DynamoDB, Lambda).
* Proficiency with containers and Kubernetes.
* Experience with Infrastructure-as-Code tools (e.g., Terraform, Ansible).
* Strong Unix/Linux administration and scripting skills (e.g., Bash, Python).
* Familiarity with logging and monitoring tools (e.g., Datadog, Splunk).
* Excellent communication skills and the ability to thrive in a fast-paced environment.
* Experience with CI/CD tools such as GitHub, Jenkins, and ArgoCD.
* Knowledge of Google Cloud Platform.
* Networking experience.
* ITIL certification.
* Security expertise.
For more information, please contact Alice Armstrong at Hayward Hawk.