At GSK, we have bold ambitions for patients, aiming to positively impact the health of 2.5 billion people by the end of the decade. R&D is committed to discovering and delivering transformational vaccines and medicines to prevent and change the course of disease. Science and technology are coming together in a way they never have before, and we have strong tech-enabled capabilities that allow us to build a deeper understanding of the patient, human biology and disease mechanisms, and transform medical discovery. We are revolutionising the way we do R&D. We’re uniting science, technology and talent to get ahead of disease together.
Senior Data Engineer (m/f/d)
A key objective for the Quality Engineering and Lab Data Engineering team within GSK's R&D Tech organisation is delivering the data that matters: designing and implementing data flows and data products that leverage internal and external data assets and tools to drive discovery and development. Five key drivers shape this approach, closely aligned with GSK's corporate priorities of Innovation, Performance and Trust:
* Automation of end-to-end data flows: faster, more reliable ingestion of data from high-throughput biomedical techniques such as flow cytometry, sequencing or imaging, to realise the value of investments in new technologies and approaches (instrument to analysis-ready data in under 12 hours).
* Innovative, domain-expert-specific data products: enabling rapid, agile optimization and visibility into data streams, so that scientists gain key insights faster into experimental setups, novel approaches or data acquisition parameters, leading to faster biopharmaceutical development cycles.
* Enabling governance by design of external and internal data: through practical engineered solutions for controlled use and monitoring.
* Supporting end-to-end code traceability and data provenance: increasing assurance of data integrity through automation and integration.
* Improving engineering efficiency: Extensible, reusable, scalable, maintainable, testable, deployable, traceable data and code in a cloud native context.
The Data Streams and Operation Engineering team accelerates biopharmaceutical drug discovery and development by designing and developing orchestrated pipelines that use existing or newly developed microservices on Kubernetes (k8s) to surface QC- and analytics-ready data. These orchestrated pipelines and products provide automated, scalable, end-to-end processing of data from instruments or external data sources to analysis-ready data, driving the drug discovery and development process.
Key responsibilities
We are looking for a highly skilled and experienced Senior Data Engineer (m/f/d) to join us and help make this vision a reality. This software practitioner will work with a team of talented data, cloud and software engineers focused on knowledge and data systems for drug discovery and development. The team works with the Staff Engineer and is accountable for designing, building and testing new end-to-end data flows and creating data products in the cloud.
A Senior Data Engineer is a highly technical individual contributor whose responsibilities are to:
* develop, test and deploy cloud-native processing nodes, including parameter harmonization for hybrid cloud pipelines.
* contribute to the development of design patterns for our processing node framework and the microservices supporting it.
* contribute to the development of cloud-native workflows for high-throughput biology and chemistry, taking data from instrument to analysis-ready in under 12 hours.
* contribute to the harmonization between processing nodes, workflows and microservices so that the processing node framework is used to its fullest extent.
* practice “systems”-level thinking in sync with senior staff and architects, and manage interfaces across packages, data structures and microservices.
* exercise judgement and evaluate external packages, pipeline orchestration tools and microservices in order to accelerate the development of end-to-end data pipelines.
* play an active role in the development team and help drive our culture.
Specific requirements
* Competitive candidates will have a proven track record of excellent coding and software development within high-performing engineering teams, and of shipping products; extensive experience in collaborative coding is very important.
* Strong software development record in Python or at least one other common programming language, e.g. JavaScript, Rust or C++. Candidates should follow software best practices, starting with low-level design documentation, and produce clean, readable code that is well documented and appropriately tested.
* Comfortable working in an Agile software development environment using e.g. ADO, JIRA and Confluence
* Prior cloud development experience (e.g. AWS, Google Cloud, Azure) is a plus.
* She/he/they should have knowledge of Kubernetes (k8s); experience with k8s on different clouds, e.g. AKS or GKE, is a plus.
* She/he/they should be very comfortable with GitOps processes, ranging from automated testing at different scales and automated deployment of packages and containers to infrastructure as code, using e.g. Argo CD or FluxCD.
* She/he/they should have background knowledge in the life sciences and in tools for high-throughput analysis of such data, and thus be able to lead discussions with scientists and record acceptance criteria accordingly.
* A team player, eager to invest in personal and team growth
* Excellent communication and writing skills in English
Basic Qualifications
* BS in Computer Science, Software Engineering, Biomedical Engineering, Engineering, or Bioinformatics/Computational Biology with extensive experience (or an MS with several years of experience, or a PhD) in the biotech/pharmaceutical/healthcare/diagnostics/health insurance space
* Extensive architecture, coding and testing experience, excellent teamwork
Preferred Qualifications
If you have the following characteristics, it would be a plus:
* Background in biomedical data processing is a plus, especially in but not limited to the fields of flow cytometry, proteomics or imaging.
* Experience in GenAI is a plus
* AI/ML experience using e.g. TensorFlow, PyTorch, Keras or scikit-learn is a plus.
* Experience in building products with modern Cloud architectures, platforms, and back-end systems