Your new company
I'm excited to partner with a trailblazing company that's revolutionising the future of cloud infrastructure! Their cutting-edge, high-performance, GPU-optimised platform is pushing the boundaries of AI and HPC while also making strides towards a greener, more sustainable world.
This is a fully remote position, so you can work from anywhere without ever needing to step into an office. Plus, you'll love the fantastic perk of unlimited holiday, giving you the freedom to recharge and thrive whenever you need it.
Your new role
As a Mid-level GPU Cloud Support Engineer, you'll provide top-notch support to customers on a GPU cloud platform and customer-dedicated GPU clusters. You'll collaborate closely with cross-functional teams, external vendors, and partners to uphold SLA commitments and maintain operational excellence.
Key Responsibilities:
1. Incident Management: Handle support enquiries and investigate complex issues related to storage (e.g. Vast, Weka), networking (e.g. InfiniBand, RoCE), and GPU optimisation.
2. GPU Cloud Support: Resolve issues promptly, adhering to SLAs for critical incidents, including system outages and performance problems.
3. Cluster Monitoring: Perform health checks on multi-node clusters, ensuring optimal node performance, GPU utilisation, and service availability.
4. Documentation: Keep detailed records of incidents, troubleshooting steps, r...