Role: Bioinformatics Data Engineer
Scope: Outside IR35
Start: ASAP
Notice: 1 week on either side
Location: Remote (UK)
Duration: Initially 6 months
Rate: Circa GBP 600 per day
We are seeking a highly skilled and experienced Bioinformatics Data Engineer to join our dynamic team. In this role, you will have a significant impact on the delivery of bioinformatics data engineering and visualizations to the Oncology R&D organization. Your work will be central to advancing our data stack, as well as our automation and observability capabilities.
Main Duties and Responsibilities:
1. Develop, execute, and maintain ETL pipelines that extract, transform, and load data for use in cBioPortal and other bioinformatics analyses and visualizations.
2. Ensure the reliability, scalability, and performance of ETL pipelines and data systems.
3. Troubleshoot and resolve issues related to data loading and integration into downstream systems.
4. Collaborate with bioinformaticians, data scientists, and other stakeholders to understand and meet the organization's data requirements.
5. Stay up-to-date with new technologies and best practices in bioinformatics data engineering.
Essential Requirements:
1. A background in Computer Science, Engineering, or Bioinformatics (Master's level) with 5 years of relevant experience.
2. Familiarity with bioinformatics visualizations across omics domains, including genomics, transcriptomics, proteomics, DNA methylation, etc.
3. Extensive experience with Python and its data/scientific libraries, such as pandas, numpy/scipy, polars, etc.
4. Proven experience with bioinformatics visualization systems like cBioPortal, including data loading and troubleshooting.
5. Strong understanding of ETL processes and data pipeline development.
6. Ability to interact with various data sources, both structured and unstructured (e.g. HDFS, SQL, NoSQL).
7. Experience working across multiple scientific compute environments to create data workflows and pipelines (e.g. HPC, cloud, Unix/Linux systems).
Desirable:
1. Experience with deploying data pipelines using orchestration services like Airflow, Prefect, AWS Glue, Dagster, etc.
2. Experience using AWS services such as S3/EBS, EC2, CloudWatch, SNS, and Lambda.
3. Understanding of software development, testing, and quality processes, with experience in testing frameworks and documentation.
4. Expertise with biological/health data, especially genomics and other omics technologies.
5. Ability to understand, map, integrate, and document complex data relationships and business rules.