1. Collaborate with cross-functional teams to ensure the reliability, availability, and performance of our client-facing services
2. Maintain and configure observability platforms such as Datadog
3. Proactively monitor production and other environments to ensure stability, availability, security, and integrity
4. Design and implement automation and processes to improve the efficiency and effectiveness of engineering teams and other support functions
5. Engage with business stakeholders to gather requirements, address concerns, and provide updates on projects and system status
6. Contribute to the design, build, and operational management of our services
7. Lead incident response, troubleshooting, and root cause analysis to mitigate current issues and prevent their recurrence
8. Work closely with engineering, support, and operations teams to upskill colleagues and promote knowledge transfer by producing training materials and articles
9. Participate in on-call rotation to provide support and ensure system uptime