Driven by a passion for the maritime industry, Ocean Technologies Group (OTG) stands as a leader in maritime software solutions, rooted in a legacy of iconic maritime brands with over 60 years of collective expertise. We have been a guiding light for the industry's top ship operators, helping them achieve unmatched safety and operational excellence.
At OTG, we're not just a company; we're a vibrant community of maritime enthusiasts, tech innovators, and visionaries. Our comprehensive offerings range from fleet management to exceptional learning resources, positioning us at the forefront of maritime solutions. We forge strategic alliances with renowned global organizations, empowering our customers with the tools to drive sustainable success and enhance visibility across their operations.
Exciting changes are on the horizon! In January 2025, Lloyd's Register will acquire Ocean Technologies Group from Oakley Capital, further solidifying our position as a key player in the maritime sector. With this acquisition, we will deliver advanced solutions to a combined fleet of over 30,000 vessels, supporting over 1,000 shipowners and more than a million seafarers worldwide.
Our portfolio includes Learning & Assessment, Fleet Management, and Crew Management, uniting seven iconic maritime brands with over a century of collective experience.
The Infrastructure and Operations team are seeking an experienced Senior Site Reliability Engineer to take a lead role in a new and upcoming product from OTG, built upon the AWS platform.
As a Senior Engineer, you will work alongside the Product, Development and DevOps teams to monitor and maintain the platform and ensure AWS best practices are implemented where possible. You will be responsible for the overall reliability of the platform: monitoring and responding to issues and alarms, regularly testing BC/DR processes, and carrying out maintenance tasks such as system updates and security audits.
You will be joining a team of experienced Cloud Engineers in a lead role, so you will be expected to build and document processes and runbooks, and to give guidance to other engineers as we transition to the new platform.
What will you be doing?
Platform Maintenance
* Ensure overall maintenance and reliability of the new platform on AWS.
* Define and build maintenance processes, automating them where possible.
* Implement appropriate system and application monitoring and alarms where necessary.
Collaboration and Best Practices
* Coordinate with the DevOps team on new deployments and upgrades.
* Work with the Development and Product teams to advise on AWS best practices in line with OTG requirements.
Incident Management
* Act as the escalation route for incidents and alarms, working with Development to resolve recurring problems.
* Attend critical incidents 24/7, where appropriate, following on-call escalation.
Qualifications and Experience
* AWS Certification at Professional level, with a preference for Networking and/or Databases (DocumentDB).
* At least 5 years' experience working as a Cloud Engineer or in another SRE-related role.
* Understanding of BCDR (Business Continuity and Disaster Recovery) principles and experience of implementing and testing them.
Leadership and Management
* Ability to take the lead on projects and guide other engineers and stakeholders to the required outcomes.
* Willingness to train and mentor team members to follow processes and practices that you have defined.
Skills and Abilities
* Strong desire for problem-solving and for diving into issues that are difficult to resolve.