Job description
RemoteStar is looking to hire a Senior Site Reliability Engineering Manager on behalf of our UK-based client, which has a fully remote work policy.
About Client:
The client is building an industry-leading B2B marketplace for diamonds and gemstones, connecting jewelry retailers to gemstone suppliers. They have had a presence in London, Hong Kong, Amsterdam, Mumbai, and New York since 2001.
About the role:
As the SRE Manager, you will play a critical role in ensuring the reliability, scalability, and performance of our infrastructure and services, through both direct technical contribution and team building and management.
* Take full ownership of the production estate from both a technical and process perspective.
* Ensure the consistently smooth operation of live systems and drive the resolution of all on-call support issues.
* Design and operate a new incident tracking process to ensure root causes are found and remediated in a timely fashion by the development team.
* Create and maintain high-end monitoring and automation tooling. Drive automation initiatives to streamline operational workflows and improve efficiency.
* Develop and maintain tools, scripts, and dashboards to monitor system health, performance, and reliability.
* Build a first-class SRE team. Through a combination of leading by example, coaching, and mentoring, mold the team you want to have around you. Provide leadership and guidance to the SRE team, fostering a culture of collaboration, innovation, and continuous improvement.
REQUIREMENTS:
* Proven experience in a senior or lead SRE role, with a strong track record of building and maintaining highly reliable infrastructure and services.
* Expertise in incident management, including incident response, resolution, and post-mortem analysis.
* Proficiency in monitoring, alerting, and observability tools such as Prometheus, Grafana, ELK stack, or Datadog.
* Experience with cloud platforms such as AWS, Azure, or GCP, including infrastructure as code tools like Terraform or CloudFormation.
* Strong scripting and automation skills, with proficiency in languages such as Python, Bash, or Go.
* Excellent communication and collaboration skills, with the ability to work effectively with cross-functional teams in a remote environment.
* Demonstrated leadership capabilities, with a passion for mentoring and developing team members.
WHAT THEY OFFER:
* Dynamic working environment in an extremely fast-growing company.
* Work in an international environment.
* Work in a pleasant environment with very little hierarchy.
* Intellectually challenging role, playing a massive part in the client’s success and scalability.
* Flexible working hours.