The Onyx Research Data Tech organization is GSK’s Research data ecosystem, bringing together, analyzing, and powering the exploration of data at scale. We partner with scientists across GSK to define and understand their challenges and develop tailored solutions that meet their needs. The goal is to ensure scientists have the right data and insights when they need them, giving them a better starting point for medical discovery and helping to accelerate it. Ultimately, this helps us get ahead of disease in more predictive and powerful ways.
Onyx is a full-stack shop consisting of product and portfolio leadership, data engineering, infrastructure and DevOps, data / metadata / knowledge platforms, and AI/ML and analysis platforms, all geared toward:
* Building a next-generation, metadata- and automation-driven data experience for GSK’s scientists, engineers, and decision-makers, increasing productivity and reducing time spent on “data mechanics”
* Providing best-in-class AI/ML and data analysis environments to accelerate our predictive capabilities and attract top-tier talent
* Aggressively engineering our data at scale, as one unified asset, to unlock the value of our unique collection of data and predictions in real time
A Data Ops Engineer II is a technical contributor who can take a well-defined specification for a function, pipeline, service, or other component, devise a technical solution, and deliver it to a high standard. They have a strong focus on the operability of their tools and services, and develop, measure, and monitor key metrics for their work, seeking opportunities to improve those metrics. They are aware of, and adhere to, best practices for software development in general (and their specialization in particular), including code quality, documentation, DevOps practices, and testing. They ensure the robustness of our services and serve as an escalation point in the operation of existing services, pipelines, and workflows.
A Data Ops Engineer II is a key contributor delivering high-performing, high-impact biomedical and scientific data ops products and services, from a description of a pattern that customer Data Engineers are trying to use all the way through to final delivery (and ongoing monitoring and operations) of a templated project and all associated automation. They should be deeply familiar with the most common tools (languages, libraries, etc.) within their specialization, and aware of the open-source communities that revolve around these tools. They should constantly seek feedback and guidance to further develop their technical skills and expertise, and take feedback well from all sources in the spirit of professional development.
Key responsibilities
* Build modular code, libraries, and services using tools appropriate to their area of specialization
* Produce well-engineered software, including appropriate automated test suites and technical documentation
* Develop, measure, and monitor key metrics for all tools and services, and consistently seek to iterate on and improve them
* Partner with the Infra and DevOps team where modifications to underlying tooling (e.g., infrastructure as code, CloudOps, DevOps, logging / alerting) are needed to serve new use cases, and to ensure operations are planned
* Advise scientific users on application scalability to petabytes of data, drawing on a deep understanding of software engineering, algorithms, and underlying hardware infrastructure and their impact on performance
* Apply platform abstractions consistently to maintain quality and consistency in logging and lineage
* Apply coding best practices and agreed ways of working, and participate in code reviews and partnering to improve the team’s standards
* Adhere to the QMS framework and CI/CD best practices
* Provide L3 support for existing tools, pipelines, and services
Why you?
Basic Qualifications:
We are looking for professionals with these required skills to achieve our goals:
* Bachelor’s degree in Computer Science, Software Engineering or related field.
* 4+ years of relevant work experience
* Cloud experience (e.g., AWS, Google Cloud, Azure, Kubernetes)
* Experience with DevOps principles and tools (e.g., GitOps, Azure DevOps, GitHub Actions, GitFlow)
* Programming experience in Python, Scala or Go
* Experience with agile software development environments using Jira and Confluence
Preferred Qualifications:
If you have the following characteristics, it would be a plus:
* Experience with external engagements, technical architecture forums, etc.
* Experience in workflow orchestration with tools such as Argo Workflows or Airflow, and with scientific workflow tools such as Nextflow, Snakemake, VisTrails, or Cromwell
* Experience with specialized data architecture (e.g., optimizing physical data layout for access patterns, including the use of Bloom filters, and optimizing for self-describing formats such as ORC or Parquet)
* Demonstrated experience building reusable components on top of the CNCF ecosystem including Kubernetes (or similar ecosystem)
* Experience with common distributed data tools (e.g., Spark, Hive)
* Hands-on experience with CI/CD implementations using Git and a common CI/CD stack (e.g., Jenkins, CircleCI, GitLab, Azure DevOps)