Machine Learning Engineer | Omics | RNA | DNA | PyTorch | Hybrid, London
While gene editing is becoming increasingly efficient, identifying which genes to edit, and how, remains a significant challenge. To overcome this bottleneck, we use cutting-edge deep learning to identify high-value genetic targets for gene editing accurately and efficiently.
Our approach draws inspiration from recent advances in drug discovery, incorporating large language models (LLMs), transformers, and graph-based methods to build a best-in-class discovery platform for the plant sciences.
Team
Our team currently has 12 members, including ML engineers, data engineers, and bioinformaticians, as well as a remote, part-time intern conducting ML research. The team works together in person at our London office four days per week.
Position
As part of the core ML team, you will help us build genomic foundation models. Your responsibilities could range from model training to data curation to evaluation. We welcome applicants with specific expertise who feel they could make a unique contribution to the training lifecycle of large, complex models.
The ideal applicant will have experience using genomic data in a machine learning context. We are particularly interested in individuals with experience working with generative foundation models of DNA or transcriptomic data. Because our modelling efforts have a strong focus on multi-modality, experience with or interest in other data modalities (e.g., text) is a plus.
Core Responsibilities
* Contribution to the development of proprietary -omics models, including model training and the development of evaluations.
* Recreation of state-of-the-art models from the scientific literature and benchmarking them against internal models and evaluations.
Additional / Development Areas
* Model deployment to give the wider Data Science team flexible, scalable inference access.
* Collaboration with the bioinformatics team to ingest, standardize, and QC data from multiple sources (internal and external) for use in training pipelines.
* Support for the wider ML team on model development and commercial projects.
Core Requirements
* A postgraduate degree (MSc or PhD) in ML with a demonstrated application to a biological domain.
* Experience building modern ML architectures (e.g., transformers, diffusion models) from scratch and applying them to real biological datasets.
* Experience working with large-scale transcriptomic datasets, ideally (though not necessarily) from non-human organisms.
* Experience with PyTorch and the Hugging Face Transformers and Diffusers libraries.
* Experience working with ML accelerators (e.g., GPUs or TPUs).
Nice-to-have
* Relevant publications in reputable journals or contributions to open-source projects.
* Exposure to and interest in probabilistic ML, causal ML, or active learning.
* Experience with distributed model training (data and model parallelism).
* Experience working on biological data curation, including data cleansing and preprocessing of -omics datasets.
* Exposure to cloud-based ML orchestration platforms such as Amazon SageMaker and Google Cloud Vertex AI.
* Experience with model deployment in an enterprise setting.
For immediate consideration, please send your most up-to-date CV to jason@enigma-rec.ai.
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Information Technology
Industries: Staffing and Recruiting and Biotechnology Research