Senior Site Reliability Engineer - Trusty
Hybrid: London
Stacklok is an innovative software supply chain security startup founded by Kubernetes co-founder Craig McLuckie and Sigstore founder Luke Hinds. Our mission is to make it easier to securely develop software. With our deep expertise in open source technologies and commitment to enhancing software security, we are seeking highly skilled and motivated individuals to join our team. This is a rare opportunity to join a startup at an early stage, and to be part of a team that is committed to building something truly innovative and impactful. Learn more about Stacklok’s mission, virtues, and leadership here.
Location
This is a hybrid role that requires on-site work at our London office three (3) days a week. Our office is conveniently located in WeWork at 1 Mark Square, London, EC2A 4EG.
Elevator Pitch
Stacklok Cloud is a comprehensive security platform that combines open source package intelligence with a policy platform built on the open source project Minder. It allows developers to securely consume open source software while enabling security teams to manage and maintain a robust security posture across the entire software supply chain.
We are seeking a Senior Site Reliability Engineer (SRE) to support Trusty, our package intelligence service that empowers developers to make safer open source dependency choices. Embedded within the OSS insights product team, this role focuses on driving essential initiatives in automation, system monitoring, configuration management, continuous delivery improvements, and incident response to ensure exceptional service performance and reliability.
In addition, this role will be part of a company-wide guild dedicated to unifying platform automation, observability, and reliability practices across all product lines, building a cohesive, high-performance SaaS platform with seamless observability and reliability throughout the Stacklok ecosystem.
If site reliability engineering is your passion and you’re ready to make a lasting impact on the future of open source security, we want to hear from you!
In This Role You Will Have The Opportunity To
* Shape The Future of Stacklok Cloud: As a senior site reliability engineer, you’ll be instrumental in developing innovative solutions that enhance our platform’s reliability and performance. Your focus will include regular platform upgrades and the instrumentation of production systems to ensure active reliability and performance monitoring.
* Embrace an Automate Everything Mindset: Champion a culture of automation across all operational tasks. You’ll lead initiatives for environment automation and incident management tooling to streamline incident response and enhance operational efficiency.
* Monitor and Improve Service Performance: Take charge of end-to-end service KPI monitoring to drive continuous improvements and ensure optimal performance.
* Uphold Standards of Excellence: Champion the reliability and quality of our systems by establishing clear Service Level Objectives and advocating for robust monitoring and incident management strategies.
Desired Skills & Experience
* Strong background in site reliability engineering, with a robust understanding of observability and distributed tracing tools such as Jaeger, Prometheus, and Grafana.
* Proficient in programming languages, particularly Python; Go experience is a big plus.
* Comprehensive knowledge of Infrastructure as Code (IaC) principles, with proficiency in automation tools like Terraform.
* Experience with at least one major cloud provider (AWS, Azure, or Google Cloud), preferably AWS.
* In-depth understanding of cloud-native application deployment and management using technologies like Docker and Kubernetes.
* Extensive experience in automating incident response processes using platforms such as PagerDuty.
* Proficient in log aggregation and analysis tools such as AWS Athena and CloudWatch.
* Experience in defining and implementing Service Level Objectives (SLOs) and key performance indicators (KPIs).
* Knowledge of security best practices in site reliability.
* Impact-Driven and Collaborative: Track record of delivering solutions that drive business outcomes.
* Versatile and Self-Starting: Adaptable in dynamic, startup environments.
At Stacklok, you will be part of a culture that values open communication, collaboration, and innovation. We offer a competitive salary package and flexible work hours. If you’re a self-motivated and results-driven individual with a passion for designing and building secure, scalable, distributed systems, and you want to be part of the most exciting startup in the secure supply chain space, come and join us!
Stacklok Inc. is proud to be an equal opportunity employer. We are committed to providing equal employment opportunities for all people and place great value on both diversity and inclusiveness.