HPC System Engineer

Metasys Technologies, Inc • Delhi, IN
11 days ago
Job type
  • Remote
Job description

Role : HPC System Engineer

Full Time

Location : Hyderabad (REMOTE)

Notice Period : 30 Days

Job Description : Responsibilities :

  • Administration of HPC and VDI clusters
  • User Account management for HPC onboarding and offboarding
  • Creation and maintenance of AMIs in AMI accounts
  • Install, configure, and maintain Linux operating systems on HPC clusters
  • Support the necessary HPC components and native platform services by coordinating with the respective providers, e.g., EFPortal, AWS RES, CycleCloud, AWS ParallelCluster
  • AWS Managed Active Directory support and management
  • Continuous upgrades to the HPC platform and related components (OS, Java, Python, EFPortal, etc.)
  • Implement and maintain the necessary compliance controls, i.e., US Export Control and confidentiality
  • Conduct regular audits, share the findings and implement corrective actions as required
  • Coordinate with other teams, such as the v-drive team, in testing and migrating / installing engineering applications on the platform
  • Manage job schedulers such as Slurm or LSF
  • Use node provisioning tools like Warewulf
  • Troubleshoot system issues and provide technical support to users
  • Monitor system performance and ensure optimal operation of the HPC environment
  • Collaborate with other IT professionals to integrate new technologies into the existing infrastructure

Requirements :

  • Progressive experience in HPC system administration, preferably in a Red Hat / CentOS Linux environment
  • AWS CloudFormation templates to build infrastructure for HPC and storage (Amazon FSx for NetApp ONTAP and FSx for Lustre)
  • Experience with parallel file systems and storage solutions
  • Strong knowledge of job schedulers such as Slurm or LSF
  • Familiarity with node provisioning tools like Warewulf
  • Proficiency in Linux OS administration
  • Knowledge of job scheduling tools (e.g., Slurm)
  • Understanding of node provisioning tools (e.g., Warewulf)
  • Excellent problem-solving abilities
  • Linux+ certification preferred
  • Top Secret Clearance : TS / SCI preferred
  • On-site presence at customer location in Stennis, MS
  • Availability for some on-call / weekend work
  • Hands-on experience setting up HPC compute clusters
  • Set up the PBS job scheduler and support PBS servers
  • Experience with Red Hat and Rocky Linux; Bash scripting
  • Nice to have: Docker and Kubernetes experience
  • Nice to have: storage knowledge
  • Nice to have: networking and DevOps knowledge

Qualifications :

Minimum Qualifications / Skills :

  • Bachelor's degree required, preferably in Computer Science, Information Systems, or a related field

Preferred Qualifications / Skills :

  • Very good written, presentation, and verbal communication skills, with experience in a customer-facing role
  • Ability to understand requirements in depth, with good analytical and problem-solving ability, interpersonal effectiveness, and a positive attitude

Must-Have Experience :

  • Administration of HPC and VDI clusters
  • Deployed and configured AWS ParallelCluster for HPC workload orchestration with CloudFormation templates (CFT)
  • Deployed and configured AWS Managed Active Directory
  • Provisioned Amazon FSx for NetApp ONTAP and FSx for Lustre storage for HPC

(ref : hirist.tech)
