Key Responsibilities:

Collaborative Engineering: Work within a larger team to rapidly develop proof-of-concept prototypes that validate research ideas, and integrate them into production systems and infrastructure.

Performance Analysis: Conduct in-depth profiling and tuning of operating systems and large-scale distributed systems, leveraging heterogeneous hardware (CPU, NPU).

Documentation and Reporting: Maintain clear technical documentation of research findings, design decisions, and implementation details to ensure reproducibility and facilitate knowledge transfer within the team.

Research & Technology Exploration: Stay current with the latest advancements in AI infrastructure, cloud-native technologies, and operating systems. Examples include techniques that execute inference workloads efficiently through SW/HW co-design, and techniques that exploit workload characteristics to prefetch memory or minimize communication.

Stakeholder Communication: Present project milestones, performance metrics, and key findings to internal stakeholders.

Knowledge, Skills, Experience and Qualifications:

Required:
Bachelor's or Master's degree in Computer Science or a related technical field.
A solid background in operating systems, distributed systems, and/or ML systems.
Excellent programming skills, with mastery of at least one language such as C/C++.
Good communication and teamwork skills.
Comfort with research methodology.

Desired:
Familiarity with current LLM architectures (e.g. Llama 3, DeepSeek-V3).
Familiarity with production LLM serving systems and inference optimizations (e.g. vLLM).
Experience with accelerator programming (e.g. CUDA, Triton) and communication libraries (e.g. NCCL).