Site Reliability Engineer – 6-Month Contract – Fully Remote – Outside IR35
The ideal candidate excels in a fast-paced, innovative environment and is collaborative, accountable, and empathetic. We seek individuals who believe in teamwork and in creating lasting growth for themselves and others. We hire based on attitude, skills, and commitment.
Responsibilities:
* Collaborate with development teams to define infrastructure and deployment needs.
* Actively contribute to and support automation and observability initiatives.
* Partner closely with team members to ensure a high-performing data platform for our observability SaaS product.
* Learn, build, and maintain operational tools for cloud (AWS & Azure) infrastructure deployment, monitoring, and analysis.
* Take part in 24/7 on-call rotations: lead the response to production incidents, conduct postmortems, and drive continuous improvement, gaining exposure to critical issue resolution.
* Contribute to incident response playbooks and documentation for on-call processes.
* Drive operational performance by establishing and monitoring SLOs.
* Adhere to development best practices that boost development velocity, including continuous integration/deployment and code reviews via GitHub pull requests.
* Commit to continuous learning and professional growth by seeking mentorship and opportunities within the team.
Experience:
* 7+ years of experience designing, building, and maintaining SaaS environments.
* 5+ years of experience designing, building, and maintaining AWS/Azure infrastructure with Terraform.
* Experience with data platforms, especially high-volume data ingestion and processing platforms.
* Experience with ClickHouse and Kafka.
* Experience building and running Kubernetes clusters.
* Experience with observability and monitoring (logging, tracing, metrics).
* Experience with GitOps CI/CD processes.
* Experience scripting in Python, Go (Golang), Bash, or PowerShell, and using AWS CLI tools.
* Experience with security operations: security policies, infrastructure, key management, and setup of encryption at rest and in transit.