Site reliability engineer

London

Posted: 8h ago

Offer description

Summary: Site Reliability Engineers help build, operate and maintain high-performance, scalable and reliable services for in-house operations. In this role, you will be responsible for overseeing the maintenance of applications. You will work closely with engineers to advocate for and participate in sensible, scalable systems design, and share responsibility with them for diagnosing, resolving and preventing issues.

Responsibilities:

Teamwork, Documentation and Development: design, document and share specialist knowledge with other members of the team, including delivering training sessions when required, and take responsibility for all relevant documentation (updates, storage and roll-out).

Security: ensure high levels of security by design, and architect a platform that supports monthly patching and vulnerability management to meet company-approved information security policies and procedures.

Lifecycle Support: support the management of IT assets to ensure they are fully supported, including planning upgrades or replacements before end of life, to avoid increased risk or service interruption.

Availability: achieve SLAs by building and maintaining services with no single points of failure, identifying weak or failing components for replacement before they cause incidents.

Capacity Support: configure and monitor infrastructure usage over time, with alerts, to ensure we are always "one step ahead" of demand.

Incident Support: configure and respond to monitoring alerts for issues with any devices, supporting incidents and escalating when required.

Problem Resolution: provide recommendations to avoid future incidents, including timely delivery of agreed solutions.

Configuration and Assets: maintain configuration repositories, including network diagrams, the IT asset management system and agreed documentation.

Change Management: support the wider project and change programme; design and deliver agreed improvements following governance processes and industry best practices, including documentation.

Releases: ensure all changes are released or made into controlled environments following agreed and repeatable processes, including roll-back to a known working state.

Reporting: provide agreed reporting and updates to the CTO and wider team, including accurate status of tickets being worked on.

Horizon Scanning and Strategy: keep abreast of relevant new technologies, security threats and regulatory changes to support the Site Reliability strategy.

Stay Updated: keep abreast of industry trends, best practices and emerging technologies in data engineering, analytics and data management to suggest improvements and innovations.

Essential Knowledge & Experience:

Bachelor’s degree in Computer Science or a related field.

Minimum of 7 years of experience in data engineering or a related field.

Experience with Google Cloud Platform (GCP) delivery and support using IaC (Terraform), most of the working week, for most of the working year, for years.

Experience working in multi-cloud environments.

Experience working with GKE, Kubernetes and Terraform is essential.

Prior experience designing, building and maintaining core services and infrastructure.

Confidence in troubleshooting complex system issues independently.

Ability to work with a high level of autonomy and responsibility in a rapidly changing environment with dynamic objectives and iteration with users.

Demonstrated ability to continuously learn and drive ongoing improvements within and across teams.

Experience working in financial services or banking.

Demonstrated ability to identify and troubleshoot data quality issues.

Excellent communication and collaboration skills, with strong attention to detail.

Certifications in GCP DevOps Engineering and Terraform would be an added advantage (desirable).

System administration: understanding of and experience with Windows and Unix are a basic foundation for a DevOps Engineer / SRE / GCP IaC support role.

This deployment is 100% IaC, with no manual configuration.

Nice to have: Helm charts.

Behavioural Attributes:

Key communicator: strong stakeholder management skills across all business and technology areas.

Bias towards action: ready to implement.

Measure to improve: utilising analytics across all types of activity to drive and demonstrate progress.

Resilient and empathetic.

Customer-centric thinker.

A team player who assumes personal responsibility.

Independent.

