Job Title: Linux, HPC, and Kubernetes Systems Engineer
Location: Remote, with onsite attendance in Wallingford as required
Job Type: Contract, 3 months, inside IR35
Job Summary: We are looking for a highly skilled Linux, HPC, and Kubernetes Systems Engineer to join our growing team. The role is responsible for maintaining and troubleshooting High-Performance Computing (HPC) environments, with a focus on Lenovo and Ubiquity platforms, and for managing Kubernetes clusters. The ideal candidate will have strong experience in Linux administration, HPC systems, and Kubernetes, along with a proven ability to solve complex technical issues and optimize infrastructure performance.
Key Responsibilities:
* Manage and maintain HPC environments with a primary focus on Lenovo and Ubiquity platforms.
* Install, configure, and troubleshoot Kubernetes clusters in a production environment.
* Monitor and optimize Linux-based systems, ensuring reliability and performance for HPC and containerized applications.
* Troubleshoot complex issues in HPC clusters and Kubernetes infrastructure, including hardware, software, networking, and performance-related problems.
* Manage resource allocation, workload scheduling, and performance tuning for HPC environments.
* Implement and manage container orchestration using Kubernetes, ensuring scalability and high availability.
* Automate system processes and improve operational efficiency using scripting (Bash, Python, etc.).
* Perform system upgrades, apply patches, and monitor security vulnerabilities in Linux, HPC, and Kubernetes environments.
* Collaborate with cross-functional teams to design, deploy, and optimize infrastructure solutions for both HPC and Kubernetes-based workloads.
* Provide documentation, training, and technical support to end-users and internal stakeholders.
* Ensure that backup and recovery strategies are effectively implemented for both HPC and Kubernetes environments.
* Monitor system health and performance using appropriate tools (e.g., Prometheus, Grafana) and take proactive measures to address potential issues.
Qualifications:
* Bachelor's degree in Computer Science, Engineering, or related field, or equivalent work experience.
* Proven experience in Linux system administration (Red Hat, CentOS, or Ubuntu).
* Strong experience managing HPC systems, particularly with Lenovo and Ubiquity platforms.
* Extensive hands-on experience with Kubernetes cluster deployment, maintenance, and troubleshooting.
* Deep understanding of containerization and orchestration technologies such as Docker and Kubernetes.
* Strong troubleshooting skills across Linux, HPC, and Kubernetes environments.
* Proficiency in scripting languages (Bash, Python) for automation and process improvement.
* Knowledge of cluster management and workload scheduling software (e.g., Slurm, PBS) for HPC environments.
* Familiarity with networking protocols, server hardware, storage solutions, and system monitoring tools.
* Ability to work independently in a fast-paced environment, managing multiple tasks and priorities.
Preferred Skills:
* Experience with cloud-based Kubernetes deployments (AWS, Azure, GCP).
* Familiarity with container networking, service discovery, and load balancing (e.g., Istio, Envoy).
* Knowledge of DevOps tools and methodologies (e.g., Ansible, Terraform).
* Experience with virtualization and container security practices.
* Experience working in research, academic, or enterprise-level environments.
Benefits:
* Competitive salary and benefits package.
* Health, dental, and vision insurance.
* Paid time off, holidays, and professional development opportunities.
* Opportunity to work in a cutting-edge technological environment.