Ripjar specialises in the development of software and data products that help governments and organisations combat serious financial crime. Our technology is used to identify criminal activity such as money laundering and terrorist financing and enables organisations to enforce sanctions at scale to help combat rogue entities and state actors.
Team Mission:
The core infrastructure team at Ripjar is responsible for commissioning and maintaining the underlying IT infrastructure that supports the company's data analytics and intelligence solutions. These systems are provisioned in a hybrid public/private cloud environment and include the underlying clusters used for large-scale analytics as well as internal tooling and customer-facing SaaS services.
Position Overview:
The DevOps Team Lead will oversee the day-to-day management of the core infrastructure team (currently 5 people), ensuring the efficient provisioning, monitoring, maintenance, and troubleshooting of our mixed public and private cloud environment. This role requires a strategic mindset to design and implement infrastructure improvements while managing performance, capacity, and cost. The role holder will collaborate closely with Product, Delivery, Engineering, and Security to align infrastructure capabilities with business needs and regulatory requirements.
Key Responsibilities:
Team Leadership
* Coordination: Oversee the day-to-day activities of the operations team, ensuring that processes run smoothly and efficiently. This includes assigning tasks, monitoring progress, and addressing any issues that arise.
* Technical Oversight: Design and implement improvements to existing infrastructure as well as new services. Evaluate the benefits of third-party managed solutions vs internal provision.
* Performance Management: Assess and improve the performance of core-infrastructure team members, fostering a culture of continuous development.
Operations Management
* Process Management: Establish and optimise processes that enable the team to independently handle routine tasks.
* Jira Service Desk: Operate an internal-facing service desk, ensuring prompt triage and timely ticket management, and evolve ticket types to streamline support requests.
* Out-of-Hours Support: Coordinate out-of-hours support activities, ensuring a collective knowledge base for non-trivial SaaS support issues.
* Incident Response: Manage and contribute to incident response efforts for infrastructure-related issues, ensuring timely resolutions.
Capacity & Cost Management
* Capacity Planning: Conduct infrastructure capacity planning, utilising metrics to inform decisions and ensure readiness for business scaling.
* Cost Tracking & Optimisation: Monitor and optimise costs associated with infrastructure and services, ensuring alignment with budgetary goals.
Compliance & Audits
* Compliance: Manage and contribute to recurring annual compliance activities, including ISO 27001 and SOC 2 audits, in collaboration with the respective audit teams and third-party advisors.
* Security: Uphold security best practice, including identifying potential threats and vulnerabilities, designing secure software systems, and implementing robust security measures.
* Disaster Recovery Testing: Participate in disaster recovery testing, ensuring robust recovery processes are in place.
In addition to the above, the role holder should remain technically proficient so that they can contribute to the daily activities of the team, including provisioning, monitoring, maintenance, and troubleshooting of our core services.
Requirements:
* Minimum of 5 years' experience in operations management, particularly within a platform / core infrastructure team (or equivalent).
* Proven ability to lead, mentor, and develop team members, fostering a culture of continuous improvement.
* Proficiency in managing hybrid cloud environments (both public and private) and familiarity with relevant technologies and platforms (e.g., AWS, Azure, Google Cloud). Our production workloads are currently hosted in AWS.
* Proficiency in infrastructure provisioning, systems administration and monitoring tools. We use Terraform, Ansible, Kubernetes and Datadog to manage a range of RHEL/Rocky 9 hosts. Our analytics clusters make use of Spark, HBase and HDFS.
* Experience in designing and implementing scalable infrastructure solutions, ideally with some exposure to parallel processing environments used for large-scale analytics.
* An appreciation of security best practice in areas such as network security, threat modelling, vulnerability assessment, IAM, SIEM and incident response.
* Skills in system monitoring, performance tuning, and troubleshooting infrastructure and micro-service-based architectures.
* Understanding of compliance frameworks like ISO 27001 and SOC 2, and experience in managing audits and compliance activities.
* Familiarity with incident response processes and tools, ensuring timely resolution of issues.
Benefits:
* Competitive salary DOE
* 25 days annual leave, rising to 30 days after 5 years of service.
* Flexible hybrid working: 2 days in the office and 3 days at home.
* 35 hour working week.
* Company Share Scheme.
* Private Family Healthcare.
* Employee Assistance Programme.
* Company contributions to your pension (Salary exchange scheme)
* Enhanced maternity/paternity pay.
* The latest tech, including a top-of-the-range MacBook Pro.
* Free food and drink
* Hybrid working from our Cheltenham, Bristol or London offices
Ripjar’s Commitment to Diversity
“Diversity is essential in the way we operate. Having people from different backgrounds, genders and experiences ensures that we make decisions with a truly global perspective. Diversity gives us strength in our technology, analysis and relationships.” - Maria Cox, Head of People Operations