Talent.com
Senior HPC Systems Specialist

Netweb Technologies India Ltd., Faridabad, Republic of India, IN
Posted 30+ days ago
Job description

Job Title: Senior Engineer-HPC

Department: Production & Support

Location: Faridabad

Position Summary:

Accomplished HPC Systems Engineer with 8–10 years of enterprise Linux administration experience and over 5 years of hands-on experience managing large-scale HPC clusters (exceeding 500 cores) and multi-petabyte storage environments. Proven expertise in designing, implementing, and optimizing HPC infrastructure, including compute, storage, and high-speed networking, to deliver maximum performance for demanding workloads.

Key Responsibilities:

HPC Cluster Management & Optimization

  • Design, implement, and maintain HPC environments, including compute, storage, and network components.
  • Configure and optimize Slurm, PBS Pro, or other workload managers/schedulers for efficient job scheduling and resource allocation.
  • Implement performance tuning for CPU, GPU, memory, I/O, and network subsystems to meet workload demands.
  • Manage HPC filesystem solutions such as Lustre, BeeGFS, or GPFS/Spectrum Scale.

Linux Administration

  • Administer enterprise-grade Linux distributions (RHEL, CentOS, Rocky, Ubuntu) in large-scale compute environments.
  • Manage kernel upgrades, patching, and security hardening.
  • Troubleshoot kernel-level and system-level issues for performance and stability.

Automation & Configuration Management

  • Develop and maintain Ansible playbooks/roles for automated provisioning, configuration, and patching of HPC systems.
  • Integrate Ansible with CI/CD pipelines to support infrastructure-as-code (IaC) practices.
  • Automate cluster deployment and environment consistency across hundreds of nodes.

Monitoring, Troubleshooting & Support

  • Implement and maintain monitoring tools (e.g., Grafana, Prometheus, Nagios, Ganglia).
  • Troubleshoot complex HPC workloads, MPI communication issues, and application performance bottlenecks.
  • Provide Tier-3 escalation support for Linux/HPC-related incidents.

Collaboration & Documentation

  • Work closely with research teams, DevOps engineers, and system architects to deliver high-performance solutions.
  • Document architecture, SOPs, troubleshooting guides, and performance tuning methodologies.

Requirements:

Required Skills & Experience

  • 8–10 years of hands-on Linux system administration experience in production environments.
  • 5+ years managing HPC clusters at scale (500+ cores and multi-petabyte storage).
  • Strong Ansible automation skills (complex playbooks, roles, variables, templates).
  • Deep understanding of MPI, OpenMP, and GPU/accelerator integration in HPC workloads.
  • Proficiency with HPC job schedulers (Slurm, PBS Pro, LSF).
  • Experience with HPC storage (Lustre, BeeGFS, GPFS).
  • Strong knowledge of TCP/IP networking, InfiniBand, and RDMA technologies.
  • Experience with performance tuning and benchmarking tools (perf, HPCToolkit, Intel VTune, iperf, fio).
  • Scripting proficiency in Bash, Python, or Perl for automation and tooling.

Preferred Qualifications

  • Experience with containerized HPC (Singularity, Apptainer, or Podman).
  • Familiarity with cloud-HPC integration (AWS ParallelCluster, Azure CycleCloud, GCP HPC).
  • Knowledge of security compliance standards (CIS benchmarks, STIG).
  • Contributions to HPC community tools or open-source projects.

Soft Skills

  • Strong problem-solving and analytical thinking.
  • Ability to mentor junior engineers and collaborate across teams.
  • Excellent communication skills for technical and non-technical stakeholders.

    Related jobs
    • Configuration Management Specialist
      PeoplePlusTech Inc., Delhi, India
      Hardware Asset Management & CMDB Specialist. Costa Rica, Mexico, Argentina, South Africa, Eastern Europe, India, Philippines, Vietnam, Malaysia, Indonesia. We are looking for an experienced Hardware...
      Last updated: 3 days ago
    • Senior System Engineer
      Resonating Mindz Pvt Ltd, Delhi, India
      Best Opportunity for Automation Engineers to Build a Rewarding Career in the IIoT Domain! Are you an Industrial Automation Engineer who wants to take on new challenges in the emerging field of Indu...
      Last updated: 3 days ago
    • Channels Systems Engineer
      Palo Alto Networks, Delhi, India
      At Palo Alto Networks®, everything starts and ends with our mission: being the cybersecurity partner of choice, protecting our digital way of life. Our vision is a world where each day is safer and m...
      Last updated: 30+ days ago
    • Senior HPC Engineer
      Netweb Technologies India Ltd., Faridabad, Haryana, India
      Accomplished HPC Systems Engineer with 8–10 years of enterprise Linux administration and over 5 years of hands-on experience managing large-scale HPC clusters exceeding 500 cores and multi-petabyte...
      Last updated: 30+ days ago
    • Senior HPC Engineer
      Netweb Technologies India Ltd., Faridabad, Republic of India, IN
      Accomplished HPC Systems Engineer with 8–10 years of enterprise Linux administration and over 5 years of hands-on experience managing large-scale HPC clusters exceeding 500 cores and multi-petabyte...
      Last updated: 30+ days ago
    • Senior HPC Engineer
      Confidential, India, Faridabad
      Accomplished HPC Systems Engineer with 8–10 years of enterprise Linux administration and over 5 years of hands-on experience managing large-scale HPC clusters exceeding 500 cores and multi-petabyte...
      Last updated: 7 days ago
    • Freelance Role: FPGA Engineer (Embedded/Control Systems)
      ThreatXIntel, Gurgaon, Haryana, IN
      ThreatXIntel is a startup cyber security company focused on protecting businesses and organizations from cyber threats. Our experienced team offers a range of services, including cloud security, web...
      Last updated: 4 days ago
    • Linux System Administrator (AWS Specialist)
      MGT-COMMERCE GmbH, Ghaziabad, IN
      Do you live and breathe Linux? Do you enjoy building and managing servers in the cloud? Linux-focused System Administrator. AWS infrastructure and keep systems running at peak performance. Setting up...
      Last updated: 30+ days ago
    • Senior Systems Engineer HPC - R-21841
      Confidential, Gurgaon/Gurugram, India
      System Administration & Maintenance: Install, configure, and maintain HPC clusters (hardware, software, operating systems), perform regular updates/patching, manage user accounts and permissions, a...
      Last updated: 7 days ago
    • Senior HPC Engineer
      Netweb Technologies India Ltd., Faridabad, Haryana, India
      Job Title: Senior Engineer-HPC. Department: Production & Support. Accomplished HPC Systems Engineer with 8–10 years of enterprise Linux administration and over 5 years of hands-on experience managing...
      Last updated: 2 hours ago
    • HPC System Administrator - Active Directory
      NVISH SOLUTIONS PRIVATE LIMITED, Gurgaon
      Responsibilities: administration of HPC and VDI clusters; user account management for HPC onboarding and offboarding...
      Last updated: 30+ days ago
    • System Integration Specialist
      Alp Consulting Ltd., Ghaziabad, IN
      AI Automation & Integration Developer. AI Automation & Integration Developers. You’ll design and implement automation workflows using ... APIs and enhancing business productivity with AI-driven solutions...
      Last updated: 2 days ago
    • Configuration Senior Specialist
      Confidential, Noida, India
      NTT DATA strives to hire exceptional, innovative and passionate individuals who want to grow with us. If you want to be part of an inclusive, adaptable, and forward-thinking organization, apply now...
      Last updated: 7 days ago
    • Systems & Automation Specialist
      White Tiger Connections Inc., Ghaziabad, IN
      We’re looking for someone who thrives at the intersection of IT, systems design, and automation — someone who can help us build, connect, and maintain the tools that keep our business running smoot...
      Last updated: 2 days ago
    • System Administrator
      MGT-COMMERCE GmbH, Faridabad, Haryana, IN
      MGT-Commerce is a Berlin-based company founded in 2010 that specializes in providing managed cloud hosting services for Magento e-commerce shops on top of Amazon Web Services (AWS). As an AWS Advanc...
      Last updated: 30+ days ago
    • Senior Specialist, IPTX
      Rakuten Symphony, Delhi, India
      Rakuten Symphony is reimagining telecom, changing supply chain norms and disrupting outmoded thinking that threatens the industry's pursuit of rapid innovation and growth. Based on proven modern inf...
      Last updated: 24 days ago
    • System Integration Sr. Specialist Advisor
      Confidential, Noida, India
      NTT DATA strives to hire exceptional, innovative and passionate individuals who want to grow with us. If you want to be part of an inclusive, adaptable, and forward-thinking organization, apply now...
      Last updated: 6 days ago
    • Senior System Administrator
      People Prime Worldwide, Delhi, India
      Position: VMware Engineer – VCF/NSX/vSAN/Aria Deployment & Migrations. Location: PAN India | Experience: 5+ years | Immediate joiner or 15 days. Design & deploy VCF, NSX-T/NSX-V, vSAN, and VMwa...
      Last updated: 23 days ago