Description :
Job Title : HPC Admin
Experience : 5+ Years
Location : Bangalore (Onsite)
Responsibilities :
- Administration of HPC and VDI clusters
- User Account management for HPC onboarding and offboarding
- Creation and Maintenance of AMI Images in AMI accounts
- Install, configure, and maintain Linux operating systems on HPC clusters
- Support necessary HPC components and native platform services by coordinating with the respective providers, e.g., EFPortal, AWS RES, CycleCloud, AWS ParallelCluster, etc.
- AWS Managed Active Directory support and Management
- Continuous upgrades to the HPC platform and related components (OS, Java, Python, EFPortal, etc.)
- Implement and maintain necessary compliance controls, i.e., US Export Control, Confidentiality
- Conduct regular audits, share the findings and implement corrective actions as required
- Co-ordinate with other teams like v-drive team in testing and migrating / installing engineering applications to the platform
- Manage job schedulers such as Slurm or LSF
- Utilize node provisioning tools like Warewulf
- Troubleshoot system issues and provide technical support to users
- Monitor system performance and ensure optimal operation of the HPC environment
- Collaborate with other IT professionals to integrate new technologies into the existing infrastructure
- Progressive experience in HPC system administration, preferably in a Red Hat / CentOS Linux environment
- Experience with AWS CloudFormation templates to build infrastructure for HPC and storage (Amazon FSx for NetApp and Lustre)
- Experience with parallel file systems and storage solutions
- Strong knowledge of job schedulers such as Slurm or LSF
- Familiarity with node provisioning tools like Warewulf
- Proficiency in Linux OS administration
- Knowledge of job scheduling tools (e.g., Slurm)
- Understanding of node provisioning tools (e.g., Warewulf)
- Excellent problem-solving abilities
- Linux+ certification preferred
- Top Secret Clearance : TS / SCI preferred
- On-site presence at customer location in Stennis, MS
- Availability for some on-call / weekend work
- Hands-on experience setting up HPC compute clusters
- Set up the PBS job scheduler and support PBS servers
- Experience with Red Hat and Rocky Linux; bash scripting
- Nice to have Docker, Kubernetes experience
- Nice to have Storage knowledge
- Nice to have networking and devops knowledge
(ref : hirist.tech)