Job Title: HPC Platform & Security Engineer
Experience: 4+ years
Overview
We are seeking a highly skilled and experienced HPC Platform & Security Engineer to join the Onix team. This role is essential for ensuring the stability, security, and seamless operation of High-Performance Computing (HPC) clusters on Google Cloud Platform (GCP). You will play a key role in delivering a robust and secure HPC platform for our clients’ most demanding computational workloads.
Key Responsibilities
1. Cluster Platform Stability & Maintenance
Own the stability, security, and consistent release of HPC Virtual Machine (VM) images and essential software tools.
Develop and maintain automation scripts using Python, Bash, and Terraform for cluster lifecycle management, health checks, and system provisioning.
Perform regular maintenance, performance tuning, and optimization of the HPC environment.
2. Security and Patch Management
Manage, deploy, and triage security patches and updates across all HPC cluster components and VM images.
Monitor and maintain system security configurations in alignment with GCP and client security policies.
Diagnose and resolve complex integration issues related to security and monitoring tools.
3. Release Management
Implement and manage a robust release pipeline to deliver consistent VM image updates and bi-weekly software patches with minimal downtime.
Collaborate with security teams to validate, certify, and sign off on all VM image and tooling releases.
4. Monitoring and Troubleshooting
Use Google Cloud Monitoring, Cloud Logging, and other GCP tools to ensure platform health and proactively detect anomalies.
Troubleshoot complex system-level, performance-related, and network-related issues.
Maintain comprehensive documentation for all operational, release, and security processes.
Required Qualifications
Education
Bachelor’s degree in Computer Science, Engineering, or a related technical field.
Experience
4+ years of hands-on experience in cloud infrastructure, DevOps, or system administration, with strong exposure to Linux and HPC environments.
Technical Skills
Deep expertise in Linux administration and scripting (Bash/Python).
Strong practical experience with major cloud platforms, preferably Google Cloud Platform (GCP).
Experience with Infrastructure as Code (IaC) tools such as Terraform or Cloud Deployment Manager.
Proven knowledge of security best practices, patch management, and vulnerability remediation in cloud and Linux environments.
Familiarity with HPC workload managers like Slurm or similar job schedulers (highly desirable).
Platform Engineer • Pune, Maharashtra, India