Application Deadline: Monday 3rd March
Hybrid Working Pattern - 3 days in the office and 2 days working from home
About us
Cynergy Bank is the UK’s human digital bank, serving ‘scale-up’ (medium-sized, fast-growing) SMEs, professionals, and high net worth and mass affluent individuals: in essence, the market segments that still value human service enabled by great technology.
We recognise that professional and personal lives often overlap and our mission is to help empower our customers to achieve their ambitions by serving all their interdependent banking needs. We provide a comprehensive range of digitally enabled products and services to meet the property finance, business and commercial banking, private banking and personal savings needs of our customers.
Our human and digital model transforms banking for customers who still value a face-to-face relationship that is enabled by the latest digital technology.
We partner with firms such as Google Cloud, Cigniti and Slalom as we continue to innovate in the human digital space.
Cynergy Bank plc is authorised by the Prudential Regulation Authority and regulated by the Financial Conduct Authority and the Prudential Regulation Authority. Eligible deposits with Cynergy Bank plc are protected by the UK Financial Services Compensation Scheme.
For more information on Cynergy Bank visit www.cynergybank.co.uk
Company Benefits
* Competitive Salary and Company Bonus
* Competitive holiday allowance plus bank holidays
* Option to purchase an additional 10 days’ holiday
* Pension contribution and Life Assurance
* Income Protection Scheme and Season Ticket Loan
* Medical Cover (After Probation)
* Electric Car Scheme and Money Coach (After Probation)
The Role:
The Site Reliability Engineer (SRE) is responsible for ensuring the reliability, availability, and performance of the bank’s critical systems and infrastructure. The SRE focuses on building scalable solutions to improve system resiliency, automating repetitive operational tasks, and collaborating with engineering teams to enhance system reliability. This role balances operational responsibilities with engineering innovation to align with the bank’s strategic goals of delivering seamless and secure services.
Key Performance Indicators (KPIs)
* System uptime and availability meeting or exceeding agreed Service Level Objectives (SLOs).
* Mean Time to Recovery (MTTR) and Mean Time Between Failures (MTBF) metrics for critical incidents.
* Reduction in manual intervention through automation and tooling improvements.
* Compliance with security and operational standards for all deployed systems.
* Adoption and integration of SRE best practices across teams and functions.
Responsibilities:
Reliability and Performance
* Monitor and maintain the reliability, uptime, and performance of production systems and services.
* Design and implement tools and frameworks to proactively identify and mitigate potential issues.
* Conduct performance tuning and capacity planning to ensure systems scale with the bank’s needs.
Incident Management and Root Cause Analysis
* Lead the response to critical incidents, ensuring swift resolution to minimise business impact.
* Conduct detailed root cause analyses to identify and resolve underlying issues.
* Collaborate with Engineering and IT Operations teams to implement preventive measures.
Automation and Efficiency
* Develop and maintain automation tools for deployment, monitoring, and infrastructure management.
* Automate repetitive operational tasks to improve team efficiency and reduce errors.
* Implement CI/CD pipelines to ensure fast, reliable, and secure code deployments.
Collaboration and Stakeholder Engagement
* Work closely with Engineering, IT Operations, and Change Management teams to support service delivery goals.
* Collaborate with Information Security to ensure systems meet security and compliance standards.
* Partner with Architecture to ensure reliability is built into system design.
Continuous Improvement and Innovation
* Identify and drive initiatives to improve system resiliency, reduce downtime, and enhance performance.
* Stay up to date with industry trends, tools, and best practices for site reliability engineering.
* Develop and document operational processes to ensure consistency and knowledge sharing.
Essential Knowledge & Experience:
* Bachelor’s degree in Computer Science, Engineering, or a related field.
* Experience with monitoring and observability tools (e.g., Prometheus, Grafana, Splunk, Datadog).
* Strong expertise in scripting and automation using Python, Bash, or similar languages.
* Proficiency with infrastructure as code (e.g., Terraform, Ansible) and with containers and container orchestration (e.g., Docker, Kubernetes).
* Experience in building and managing CI/CD pipelines.
* Strong knowledge of cloud platforms (e.g., AWS, Azure, Google Cloud) and Linux/Unix systems.
Desirable Knowledge & Experience:
* Experience in the banking or financial services industry.
* Knowledge of security standards and regulatory compliance (e.g., ISO 27001, GDPR).
* Familiarity with disaster recovery and business continuity planning.
* Understanding of database performance tuning and optimisation.
Reporting and Relationships:
* Reports to the Engineering Lead or IT Operations Lead.
* Collaborates with teams across Engineering, IT Operations, Architecture, and Information Security.
* Works with Change Management to ensure reliable deployments and incident-free transitions.