Job Title: AI Engineer - Performance Optimization Specialist
About the Role
Are you obsessed with pushing the boundaries of AI model performance? Do you thrive on optimizing every aspect of AI systems — from shaving milliseconds off inference times to maximizing GPU utilization and reducing power consumption? ⚡️
We are seeking an AI Engineer to join our team and focus on building cutting-edge solutions that enhance the efficiency, scalability, and accessibility of AI systems. You’ll work with state-of-the-art models, including LLMs and multimodal systems, and deploy them across large GPU clusters. This is your chance to make a significant impact by driving innovation in AI performance optimization.
Key Responsibilities 📋
* Optimize AI Systems: Design and implement performance enhancements for large-scale AI models, ensuring minimal latency and maximum throughput.
* Distributed Inference: Develop and tune systems for distributed inference, enabling seamless operation across multi-GPU and multi-node setups.
* Hardware Efficiency: Leverage advanced hardware capabilities, such as GPU acceleration and high-performance networking, to improve system efficiency and reduce energy consumption.
* Model Optimization: Research and apply techniques like quantization, pruning, and sparsity to improve model performance and resource utilization.
* Pipeline Development: Create robust deployment pipelines for AI model serving, monitoring, and continuous optimization in production environments.
* Collaborative Innovation: Work closely with cross-functional teams to drive advancements in AI infrastructure and share insights into best practices for performance engineering.
What We’re Looking For 🔍
Core Skills:
* Experience deploying and optimizing AI models in multi-GPU and multi-node systems.
* Proficiency with AI frameworks and runtimes such as PyTorch, TensorRT, or ONNX Runtime.
* Knowledge of distributed inference systems such as Ray Serve or Triton Inference Server, and cluster schedulers such as SLURM.
* Familiarity with AI compilers, including OpenXLA, torch.compile, MLIR, or TVM.
* Understanding of high-performance networking and interconnect technologies, such as RDMA, InfiniBand, or NVLink.
* Expertise in model optimization techniques like quantization and sparsity.
Preferred Qualifications:
* A growth mindset with a passion for AI innovation and efficiency.
* Experience contributing to open-source projects or showcasing work through personal blogs or GitHub repositories.
* Familiarity with experimental hardware setups for AI model serving and optimization.
Why Join Us?
This is an opportunity to work on cutting-edge AI infrastructure, where you’ll help redefine what’s possible in model performance optimization. By joining our team, you’ll be at the forefront of innovation, ensuring that the next generation of AI systems is faster, smarter, and more efficient than ever.
Let’s shape the future of AI together!