Your new company
I have partnered exclusively with a pioneering company that is shaping the future of cloud infrastructure. Their innovative, high-performance, and GPU-optimized platform not only drives advancements in AI and HPC but also champions sustainability for a greener, more efficient world.
This role is fully remote with no expectation to ever be in an office. You'll also enjoy the fantastic perk of unlimited holiday, allowing you to recharge and thrive.
Your new role
As a Mid-level GPU Cloud Support Engineer, you will be responsible for providing support to customers on a GPU cloud platform as well as customer-dedicated GPU clusters. This role involves working closely with cross-functional teams, external vendors, and partners to uphold SLA commitments and maintain operational excellence.
Key Responsibilities:
1. Incident Management: Receive and triage support enquiries, and investigate complex unresolved issues related to storage (e.g. Vast, Weka), networking (e.g. InfiniBand, RoCE), and GPU optimisation.
2. GPU Cloud Support: Triage issues and provide timely resolutions, working within defined SLAs for critical incidents, including system outages and performance issues.
3. Cluster Monitoring: Conduct health checks of multi-node clusters to ensure that node performance, GPU utilisation, and service availability are optimal.
4. Documentation: Maintain detailed records of incidents, troubleshootin...