Job Overview:
High-performance ML workloads on Arm CPUs require the co-development of algorithms and highly optimized CPU kernels. In CT-ML (Central Technology, Machine Learning), rapid kernel prototyping is crucial for exploring algorithms and assessing trade-offs between model accuracy and performance. Successful prototypes are essential for driving future CPU architecture development and also serve as deliverables to Central Engineering for final production.
Responsibilities:
This position is part of a dedicated team within the CT-ML group that focuses on analyzing ML workloads and rapidly prototyping highly optimized CPU kernels to improve model performance and accuracy.
Required Skills and Experience:
* Strong interest and passion for implementing high-performance kernel code in a dynamic environment.
* 4+ years of experience implementing high-performance CPU kernels using vector and matrix extensions.
* Experience measuring and understanding performance.
* Experience creating an efficient framework for kernel code development, including tools and testing.
* Deep understanding of CPU architecture.
“Nice To Have” Skills and Experience:
* Knowledge of ML models and algorithms is a plus.
* An advanced degree or equivalent experience in Computer Architecture and Software is a plus.
In Return:
Arm is committed to global talent acquisition and offers an attractive relocation package. With offices around the world, Arm is a diverse organization of dedicated, creative and highly talented engineers. By enabling a dynamic, inclusive, meritocratic, and open workplace where all our people can grow and succeed, we encourage our people to share their unrivaled contributions to Arm's success in the global marketplace.