SRE Engineer - Site Reliability Engineer - Azure and Kubernetes
Position: Permanent, remote, with one day a month in the Manchester office
Salary: Up to £88,000, depending on experience
Hours: 35 hours a week; core hours are 8-6, with flexible working within those hours
Requirements:
1. 4+ years' experience with Azure and Kubernetes
2. Strong networking concept knowledge
3. Azure and Kubernetes experience is crucial
4. Experience working in a startup company is ideal
5. Senior candidates are needed to lead the platform and act as an escalation point
Benefits:
1. Private medical plan covering eye tests and dental care
2. 40 days holiday (including bank holidays) plus your birthday off
3. Company pension scheme
4. Retailer discounts and other perks
5. Life and Wellbeing support
6. Enhanced parental leave
Skills:
1. Detailed knowledge of public cloud, primarily Azure
2. Experience with modern hosting options such as Functions, Logic Apps, ADF, Container Apps, etc.
3. Experience with Kubernetes, preferably AKS, and Azure resource connections
4. Programming/scripting skills in bash/PowerShell
5. Strong networking concept knowledge (TCP/IP, DNS, load balancing, routing)
6. In-depth knowledge of monitoring and logging tools (Datadog, Grafana, ELK stack, etc.)
7. Knowledge of CI/CD practices, preferably with Azure DevOps
8. Confidence to work independently and guide/train inexperienced engineers
9. Enjoyment of solving technical problems
10. Experience with Windows and Linux operating systems
11. Ability to coach other engineers on infrastructure engineering principles
12. Good communication skills (written and verbal)
13. Good collaboration skills with all levels of technical ability
14. Good understanding of industry best practices for SRE
15. Exposure to Terraform is a plus
Working Behaviour:
1. Collaborate with development teams to define and implement infrastructure solutions ensuring reliability, scalability, and performance
2. Design and develop automated tools and scripts for continuous monitoring, deployment, and management of production systems
3. Troubleshoot and resolve complex production issues in a timely manner, applying root cause analysis
4. Manage and maintain cloud-based infrastructure and services
5. Develop and maintain documentation for processes and procedures for knowledge sharing and upskilling
6. Help projects implement better practices using DevOps principles
7. Assist the team in identifying engineering tasks and prioritizing the backlog based on project and support needs
8. Enhance operational reliability and scalability of existing products
9. Identify simple, innovative technical solutions to complex engineering problems
10. Improve Infrastructure as Code testing capabilities through examples and documentation
11. Manage time to ensure project deadlines are met while also completing support tasks
12. Take part in the on-call rota with the rest of the teams
Due to high demand, we are only able to respond to applications that meet the required criteria.