Role – Bioinformatics Data Engineer
Scope – Outside IR35
Start – ASAP
Notice – 1 week to both sides
Location – Remote (UK)
Duration – 6 months initially
Rate – circa £600 per day
We are seeking a highly skilled and experienced Bioinformatics Data Engineer to join our dynamic team. In this role, you will significantly impact the delivery of bioinformatics data engineering and visualizations to the Oncology R&D organisation. Your work will be central to advancing our data stack, as well as our automation and observability capabilities.
Main Duties and Responsibilities
Develop, run, and maintain ETL pipelines that supply data to cBioPortal and other bioinformatics analyses and visualizations
Ensure the reliability, scalability, and performance of ETL pipelines and data systems
Troubleshoot and resolve issues related to data loading and integration into downstream systems
Collaborate with bioinformaticians, data scientists and other stakeholders to understand and meet the data needs and requirements of the organization
Stay up-to-date with new technologies and best practices in bioinformatics data engineering
Essential Requirements
A background in Computer Science, Engineering, or Bioinformatics (Master's level) with 5 years of relevant experience
Familiarity with bioinformatics visualizations across omics domains, including genomics, transcriptomics, proteomics, DNA methylation, etc.
Extensive experience with Python and its data/scientific libraries, such as pandas, numpy/scipy, polars, etc.
Proven experience with bioinformatics visualization systems like cBioPortal, including data loading and troubleshooting
Strong understanding of ETL processes and data pipeline development
Ability to work with a variety of data sources, both structured and unstructured (e.g. HDFS, SQL, NoSQL)
Experience working across multiple scientific compute environments to create data workflows and pipelines (e.g. HPC, cloud, Unix/Linux systems)
Desirable
Experience deploying data pipelines with orchestration services like Airflow, Prefect, AWS Glue, Dagster, etc.
Experience using AWS services such as S3/EBS, EC2, CloudWatch, SNS, and Lambda
Understanding of software development, testing, and quality processes, including experience with testing frameworks and documentation
Expertise with biological/health data, especially genomics and other *omics technologies
Ability to understand, map, integrate, and document complex data relationships and business rules