Artificial Intelligence Engineer - Distributed Inference
Danucore — Birmingham, England, United Kingdom
About the Role
This role is for those obsessed with pushing the boundaries of AI model performance.
We're looking for someone who gets excited about shaving milliseconds off inference time, gaining every percentage point of GPU utilisation, and counting the watts consumed to achieve it.
You'll work directly with cutting-edge models — from LLMs to multimodal systems — and large GPU clusters, finding innovative ways to make them run faster, more efficiently, and more accessibly on diverse hardware setups.
What We're Looking For
In team members:
* Passion for AI: A strong desire to influence the future of technology and its societal impact.
* Willingness to Learn: We're looking for future experts with curious minds and a growth mindset.
* Open-Mindedness: A readiness to challenge the norm and think outside the box.
And for the role:
* Evidence of deploying and optimising AI models on multi-GPU and multi-node systems.
* Experience with distributed inference engines such as Ray Serve, Triton Inference Server, and vLLM, and with cluster schedulers such as SLURM.
* Knowledge of AI compilers: OpenXLA, torch.compile, OpenAI's Triton, MLIR, Mojo, TVM, MLC-LLM.
* Good working knowledge of inter-process communication: message queues, MPI, NCCL, gRPC.
* Good working knowledge of high-performance networking: RDMA, RoCE, InfiniBand, NVIDIA GPUDirect, NVLink, NVIDIA DOCA, Magnum IO, DPDK, SPDK.
* Experience with model quantisation, pruning, and sparsity techniques for performance optimisation.
And bonus points if you have:
* A homelab, blog, or collection of Git repos showcasing your talents and interests.
* Contributions to open-source projects or publications in the field of AI/ML systems optimisation.
Key Responsibilities
* Design and implement high-performance distributed inference systems for running large language models and multimodal AI models at scale.
* Optimise model serving infrastructure for maximum throughput, minimal latency, and optimal power efficiency.
* Develop and maintain deployment pipelines for efficient model serving and monitoring in production.
* Research and implement cutting-edge techniques in model optimisation, including pruning, quantisation, and sparsity methods.
* Design, build and configure experimental hardware setups for model serving and optimisation.
* Design and implement robust testing frameworks to ensure reliable model serving.
* Collaborate with the team to build and improve our distributed inference platform, making it more accessible and efficient for users.
* Monitor, optimise, and document system performance metrics, including latency, throughput, power consumption, and benchmark scores.
How to Apply
Email your cover letter and CV to jobs@danucore.com with the subject "AI Engineer - Distributed Inference".
In your cover letter, please include details of:
* Which of the skills or technologies mentioned in this job advert you have experience with, and where you can add value.
* Links to any public work, e.g. a GitHub profile, blog posts, or papers.
Seniority level: Director
Employment type: Full-time
Job function: Engineering and Information Technology
Industries: IT Services and IT Consulting