The Onyx Research Data Tech organization is GSK’s Research data ecosystem, which brings together, analyzes, and powers the exploration of data at scale. We partner with scientists across GSK to define and understand their challenges and develop tailored solutions that meet their needs. The goal is to ensure scientists have the right data and insights when they need them, giving them a better starting point for medical discovery and helping to accelerate it. Ultimately, this helps us get ahead of disease in more predictive and powerful ways.
Onyx is a full-stack shop consisting of product and portfolio leadership, data engineering, infrastructure and DevOps, data / metadata / knowledge platforms, and AI/ML and analysis platforms, all geared toward:
* Building a next-generation, metadata- and automation-driven data experience for GSK’s scientists, engineers, and decision-makers, increasing productivity and reducing time spent on “data mechanics”
* Providing best-in-class AI/ML and data analysis environments to accelerate our predictive capabilities and attract top-tier talent
* Aggressively engineering our data at scale, as one unified asset, to unlock the value of our unique collection of data and predictions in real time
We are looking for a skilled and experienced Data Platform Engineer II to join our growing team. Data Platform Engineers take full ownership of delivering high-performing, high-impact data platform products and services, from a description of the problem that customer Data Engineers are trying to solve all the way through to final delivery (and ongoing monitoring and operations). They are standard bearers for software engineering and quality coding practices within the team, are expected to mentor more junior engineers, and may coordinate the work of more junior engineers on a large project. They devise useful metrics to ensure their services are meeting customer demand and having an impact, and they iterate in an agile fashion to deliver and improve on those metrics.
The Data Platform team builds and manages reusable components and architectures designed to make it both fast and easy to build robust, scalable, production-grade data products and services in the challenging biomedical data space.
In this role, you will
* Be a technical individual contributor, building modern, cloud-native systems for standardizing and templatizing data engineering, such as:
  * Standardized physical storage and search/indexing systems
  * Schema management (data + metadata + versioning + provenance + governance)
  * API semantics and ontology management
  * Standard API architectures
  * Kafka + standard streaming semantics
  * Standard components for publishing data to file-based, relational, and other data stores
  * Metadata systems
  * Tooling for QA/evaluation
* Know the metrics desired for your tools and services and iterate to deliver and improve on those metrics in an agile fashion.
* Given a well-specified data framework problem, implement end-to-end solutions using appropriate programming languages (e.g. Python, Java, Scala, Bash), open-source tools (e.g. Spark, Elasticsearch, ...), and cloud vendor-provided tools (e.g. AWS boto3, gcloud cli)
* Leverage tools provided by Tech (e.g. infrastructure as code, cloudOps, DevOps, logging/alerting, ...) in delivery of solutions
* Write proper documentation in code as well as in wikis/other documentation systems
* Write fantastic code along with proper unit, functional, and integration tests for code and services to ensure quality
* Stay up-to-date with developments in the open-source community around data engineering, data science, and similar tooling
Why you?
Qualifications & Skills:
We are looking for professionals with these required skills to achieve our goals:
* Bachelor’s degree in Computer Science, Software Engineering, or related discipline.
* Significant experience working in a relevant role in industry.
* Experience with DevOps and/or cloud infrastructure.
* Strong Python skills.
Preferred Qualifications & Skills:
If you have the following characteristics, it would be a plus:
* Master’s degree in Computer Science, Software Engineering, or related discipline.
* Experience with open-source tools such as Spark, Elasticsearch, etc.
* Experience with cloud vendor-provided tools such as AWS boto3, gcloud cli, etc.
Closing Date for Applications: Wednesday 5th March 2025 (COB)
Please take a copy of the Job Description, as this will not be available post closure of the advert. When applying for this role, please use the ‘cover letter’ of the online application or your CV to describe how you meet the competencies for this role, as outlined in the job requirements above. The information that you have provided in your cover letter and CV will be used to assess your application.