The Onyx Research Data Tech organization is GSK’s Research data ecosystem, bringing together, analyzing, and powering the exploration of data at scale. We partner with scientists across GSK to define and understand their challenges and develop tailored solutions that meet their needs. The goal is to ensure scientists have the right data and insights when they need them, giving them a better starting point for medical discovery and accelerating it. Ultimately, this helps us get ahead of disease in more predictive and powerful ways.
Onyx is a full-stack shop consisting of product and portfolio leadership, data engineering, infrastructure and DevOps, data / metadata / knowledge platforms, and AI/ML and analysis platforms, all geared toward:
* Building a next-generation, metadata- and automation-driven data experience for GSK’s scientists, engineers, and decision-makers, increasing productivity and reducing time spent on “data mechanics”
* Providing best-in-class AI/ML and data analysis environments to accelerate our predictive capabilities and attract top-tier talent
* Aggressively engineering our data at scale, as one unified asset, to unlock the value of our unique collection of data and predictions in real time
Data Engineering is responsible for the design, delivery, support, and maintenance of industrialized, automated, end-to-end data services and pipelines. They apply standardized data models and mappings to ensure data is accessible to end users through end-to-end user tools and APIs. They define and embed best practices, ensure compliance with Quality Management practices, and maintain alignment with automated data governance. They also acquire and process internal and external, structured and unstructured data in line with Product requirements.
This role is responsible for building and leading a scrum team of world-class NLP data engineers focused on building automated, scalable, and sustainable pipelines to account for evolving scientific needs. They support the head of Data Engineering in building a strong culture of accountability and ownership in their team, as well as instilling best-in-class engineering practices (e.g., testing, code reviews, DevOps-forward ways of working). They work in close partnership with our Platforms teams to ensure we have the right tools and ways of working, and with our Bioinformatics teams to ensure the use of appropriate schemas, vocabularies, and ontologies.
Key responsibilities for the Staff NLP Data Engineer and Team Lead:
* Lead a team of NLP data engineers in delivering data and knowledge products that advance GSK R&D
* Architect the data delivery and operational strategy for the NLP data engineering team; deconstruct a complex and ambiguous data or knowledge request into a detailed strategy to make decisions, anticipate future issues, and drive engineering efficiencies
* Partner with the AI/ML and knowledge graph platform teams to build, test, and deploy NLP pipelines, systems, and solutions
* Partner closely with other data engineering leads to conceptualize the design of new data flows aimed at maximizing reuse and aligning with an event-driven, microservice-enabled architecture
* Partner with other data engineering leads to architect an engagement model and optimal ways of working with the product management teams
* Apply graph-based data modelling techniques for efficient organization, integration, and data retrieval to ensure system flexibility and maintainability
* Design innovative strategies beyond the current enterprise ways of working to create a better environment for end users, and construct a coordinated, stepwise plan to bring others along the change curve
* Create standards for proper ways of working and engineering discipline, including the QMS framework and CI/CD best practices, and proactively spearhead improvement within their engineering area
* Act as an exemplary leader in their field of technical knowledge, continually deepening their understanding and serving as the knowledge holder for the organization
Why you?
Basic Qualifications:
We are looking for professionals with these required skills to achieve our goals:
* Bachelor’s degree in Data Engineering, Computer Science, Software Engineering or related field.
* 8+ years of Data Engineering experience
* Experience in Natural Language Processing algorithms and deep learning methods
* Experience with building end-to-end systems based on machine learning or deep learning methods
* Cloud experience (e.g., AWS, Google Cloud, Azure, Kubernetes)
Preferred Qualifications:
If you have the following characteristics, it would be a plus:
* Demonstrable experience overcoming high-volume, high-compute challenges
* Familiarity with orchestration tooling
* Experience in automated testing and design
* Experience with DevOps-forward ways of working
* Good understanding of ontologies and semantic harmonization of data across sources
* Deep knowledge and use of at least one common programming language: e.g., Python, Scala, Java
* Deep experience with common big data tools (e.g., Spark, Kafka, Storm)
* Proven experience with machine learning algorithms and NLP frameworks such as PyTorch, TensorFlow, and spaCy
* Proven track record of working with knowledge graphs and graph databases, and in general good understanding of database concepts
* Proficiency in semantic web technologies (SPARQL, RDF, OWL) and harmonization of data
* Applied experience with CI/CD implementations using Git and a common CI/CD stack (e.g., Jenkins, CircleCI, GitLab, Azure DevOps)
* Experience with agile software development environments using tools like Jira and Confluence
* Experience with Infrastructure as Code and automation tools (e.g., Terraform)