Salary for this Role: From £70,600 with benefits, subject to skills and experience.
Job Title: Senior HPC & Research Data Systems Engineer
Reports to: Wei Xing
Closing Date: 08/Dec/2024 23:59 GMT

Job Description: Senior HPC & Research Data Systems Engineer
Reports to: Research Computing Platform

This is a full-time, permanent position on Crick Terms and Conditions of Employment.

SUMMARY

The Crick requires state-of-the-art Scientific Computing systems and services to enable world-leading scientific research. Research at the Crick is data-intensive in all experimental, theoretical and computational dimensions, so efficient and effective management and high-performance processing of its data are critical to its success.

The Senior HPC & Research Data Systems Engineer role was created to contribute directly to the design, implementation, development and service delivery of the institute’s HPC, Cloud and Research Data Storage hardware and software through a mix of on-premise and cloud systems, ensuring that the Crick provides state-of-the-art and innovative Scientific Computing capability to its researchers, collaborators and partners, including commercial relationships.

The role is part of the Research Computing Platforms/HPC team, which delivers services to the Crick’s research community and works closely with Research Labs and Science Technology Platforms (STPs) across the institute. The team works as a specialist part of the Infrastructure and Cloud Platforms group within the IT Office, with excellent opportunities for cross-technology working.

The Crick has a large (currently 24 PB) on-premise IBM Spectrum Scale high-performance storage system, with an offsite daily backup and long-term archive service. We also have an evolving hybrid of on-site and Cloud systems to support research.
This role will play a key part in identifying and delivering the continuous improvement of the Crick’s research computing platforms to support our Research Data Management Strategy.

KEY RESPONSIBILITIES

These include but are not limited to:

Technical leadership and management:
Lead the design and delivery of research data platforms installed on-site and hosted in public/private clouds.
Coach and mentor more junior members of the team and, on occasion, members of other teams; under the direction of the Team Lead, the role may also carry direct line management responsibility.
Assist other team members in supporting and improving research computing systems, including system configuration management, parallel file systems and object stores.
Engage with other ITO teams, including Architecture and Design, Information Security, Project Management and Service Delivery, through well-defined work packages, outcomes and departmental governance processes.

Systems Administration:
Manage storage systems on-site and in the cloud (accessed from Windows, Mac and Linux clients), including quotas, directory structures, and distribution of data across storage systems and tiers.
Monitor health, security and performance of systems software/hardware and scientific applications, working actively with vendors and members of the enterprise IT team to troubleshoot and quickly restore services when required.
Automate operational tasks and perform changes to systems software/hardware required to improve management and service delivery.
Produce documentation for internal systems and support processes.
Mentor and support existing and new staff regarding operational and project work.
Create reports for presentation to management, governance groups and other key stakeholders regarding key research computing services managed by the Crick.
Systems Engineering:
Working with junior colleagues, put in place proof-of-concept systems and services to meet evolving scientific requirements and provide innovative solutions to move ambitious scientific projects forward.
Work with scientific teams to integrate scientific instruments and laboratory information management systems with research data analysis and management platforms.
Share solutions and best practice with colleagues and the wider Research Technologist community in conferences, publications and training events.
Keep knowledge and skills up to date in order to deliver state-of-the-art technologies, data management and processing techniques, and best practice.

User Support:
Develop documentation and provide training and advice to research teams regarding best practice in research data management using research computing platforms managed by the Crick.
Deploy scientific applications on Cloud and HPC platforms on Linux, Windows and containers.
Provide support and guidance to colleagues of all experience levels regarding known issues and solutions to problems related to use of the Crick’s research computing platforms.

KEY EXPERIENCE AND COMPETENCIES

The post holder should embody and demonstrate our core Crick values: Bold, Imaginative, Open, Dynamic and Collegial, in addition to the following:

Essential:
Demonstrable experience in administration of high-performance parallel and object storage systems, services and data for research - e.g. IBM Spectrum Scale, Lustre, Ceph, iRODS, etc., ideally in a mixed client environment (Windows/Mac/Linux).
Experience in using/managing/deploying HPC systems and scheduler policies - SLURM, SGE, etc.
Advanced experience in Unix/Linux systems administration and integration, including:
  Scripting and automation using at least one high-level language, preferably Python.
  Networking - Ethernet, TCP/IP, ideally InfiniBand (e.g. Mellanox).
  Monitoring/logging tools - e.g.
Splunk, Grafana, ELK Stack, etc.
Excellent interpersonal and communication skills, and demonstrable ability to work collaboratively and flexibly as part of a deeply technical engineering team.
Excellent time management and prioritisation skills.
High attention to detail and accuracy, with the ability to analyse and interpret complex data and use it to solve complex technical problems quickly and effectively.

Desirable (one or more will be advantageous):
Virtualisation and public or private cloud computing and storage infrastructure administration - e.g. oVirt, OpenStack, AWS, Microsoft Azure, GCP, etc.
Experience in using OS deployment, configuration management and continuous integration and testing tools - e.g. xCAT, Vagrant, Ansible, Terraform, Git, GitHub, Jenkins, etc.
Experience with common, modern systems for managing and deploying Scientific Computing applications - e.g. Lmod, Conda, EasyBuild, Spack, Singularity, Docker, etc.
Experience in developing and maintaining hybrid cloud infrastructures in a multi-cloud environment.
PhD or master's in a computational research field, or equivalent professional experience in a Research & Development work environment.

Find out what benefits the Crick has to offer:
For more information on our great pay and benefits package, see: https://www.crick.ac.uk/careers-and-study/life-at-the-crick/pay-and-benefits

Equality, Diversity & Inclusion:
We welcome applications from all backgrounds. We are committed to providing equal employment opportunities, regardless of ethnicity, nationality, gender, sexual orientation, gender identity, religion, pregnancy, age, disability, or civil partnership, marital or family status. We particularly welcome applications from people who are Minority Ethnic as they are currently underrepresented in the Crick at this level.

Diversity is essential to excellence in scientific endeavour. It increases breadth and perspective, leading to more innovation and creativity.
We want the Crick to be a place where everyone feels valued and where diversity is celebrated and seen as part of the foundation for our Institute’s success. The Crick is committed to creating equality of opportunity and promoting diversity and inclusivity. We all share in the responsibility to actively promote dignity, respect, inclusivity and equal treatment and it is our aim to ensure that these principles are reflected and implemented in all strategies, policies and practices. Read more on our website: https://www.crick.ac.uk/careers-and-study/life-at-the-crick/equality-diversity-and-inclusion