High Performance Computing Engineer (HPC Engineer)
Posted by Gazelle Global
About the Role:
We are seeking a skilled and passionate HPC Engineer to join our Infrastructure Engineering team. This role focuses on managing, supporting, and evolving our High Performance Computing (HPC) environment while driving innovation through automation and modern cloud technologies.
You’ll play a key role in ensuring the performance, availability, and scalability of HPC systems used by engineering teams across the organization. From managing workload schedulers to strengthening security and tuning performance, you will be at the heart of our mission to deliver world-class compute infrastructure.
Key Responsibilities:
* Operate and support HPC clusters, including job scheduling, resource management, and performance tuning.
* Work closely with engineering teams to ensure optimal usage of HPC systems for compute-intensive workloads.
* Manage and support HPC workload management tools, such as IBM Spectrum LSF.
* Automate common administrative and maintenance tasks using shell (e.g., Bash) or Python scripting.
* Ensure the HPC environment remains secure, patched, and compliant with evolving security standards.
* Support HPC-related license management tools (e.g., FlexLM) and EDA licenses.
* Monitor system performance and availability, and proactively resolve bottlenecks and failures.
* Provide technical support to engineers and researchers using the HPC platform.
* Collaborate with infrastructure and DevOps teams to integrate HPC solutions with cloud platforms (e.g., AWS, GCP).
* Document architecture, configurations, and procedures for knowledge sharing and operational transparency.
Required Skills & Qualifications:
* 4–8 years of experience in HPC system administration or engineering.
* Degree in Computer Science, Engineering, or a related technical field.
* Strong Linux (Red Hat preferred) system administration skills – RHCE certification is a plus.
* In-depth knowledge of HPC infrastructure, including cluster management and optimization.
* Hands-on experience with HPC job schedulers such as IBM Spectrum LSF, Slurm, or similar.
* Strong scripting skills in shell (e.g., Bash), Python, or Perl.
* Experience with cloud platforms (AWS, GCP, Azure) is a plus.
* Familiarity with tools for infrastructure monitoring, remote access, and interactive technologies (e.g., ETX).
* Understanding of vulnerability management, security patching, and system hardening.
* Experience working in global, distributed teams and supporting technical end users.
Nice to Have:
* Experience with Infrastructure as Code tools like Terraform and Ansible.
* Familiarity with Continuous Integration (CI) tools and artifact management (e.g., Artifactory).
* Exposure to EDA tools and software license management systems.
Why Join Us?
* Take ownership of cutting-edge HPC infrastructure powering world-class engineering workloads.
* Be part of a global team solving complex performance and scalability challenges.
* Drive innovation and automation in a technically rich and collaborative environment.
* Hybrid working model with flexibility and work-life balance.
Seniority level: Mid-Senior level
Employment type: Contract
Job function: Information Technology
Industries: Staffing and Recruiting