Senior HPC Engineer

Netweb Technologies India Ltd.
Faridabad, Haryana, India
22 days ago
Job description

Job Title : Senior Engineer-HPC

Department : Production & Support

Location : Faridabad

Position Summary :

Accomplished HPC Systems Engineer with 8–10 years of enterprise Linux administration and over 5 years of hands-on experience managing large-scale HPC clusters exceeding 500 cores and multi-petabyte storage environments. Proven expertise in designing, implementing, and optimizing HPC infrastructure, including compute, storage, and high-speed networking, to deliver maximum performance for demanding workloads.

Key Responsibilities :

HPC Cluster Management & Optimization

  • Design, implement, and maintain HPC environments, including compute, storage, and network components.
  • Configure and optimize Slurm, PBS Pro, or other workload managers / schedulers for efficient job scheduling and resource allocation.
  • Implement performance tuning for CPU, GPU, memory, I/O, and network subsystems to meet workload demands.
  • Manage HPC filesystem solutions such as Lustre, BeeGFS, or GPFS / Spectrum Scale.

Linux Administration

  • Administer enterprise-grade Linux distributions (RHEL, CentOS, Rocky, Ubuntu) in large-scale compute environments.
  • Manage kernel upgrades, patching, and security hardening.
  • Troubleshoot kernel-level and system-level issues for performance and stability.

Automation & Configuration Management

  • Develop and maintain Ansible playbooks / roles for automated provisioning, configuration, and patching of HPC systems.
  • Integrate Ansible with CI/CD pipelines for infrastructure as code (IaC) practices.
  • Automate cluster deployment and environment consistency across hundreds of nodes.

Monitoring, Troubleshooting & Support

  • Implement and maintain monitoring tools (e.g., Grafana, Prometheus, Nagios, Ganglia).
  • Troubleshoot complex HPC workloads, MPI communication issues, and application performance bottlenecks.
  • Provide Tier-3 escalation support for Linux / HPC-related incidents.

Collaboration & Documentation

  • Work closely with research teams, DevOps engineers, and system architects to deliver high-performance solutions.
  • Document architecture, SOPs, troubleshooting guides, and performance tuning methodologies.

Requirements :

Required Skills & Experience

  • 8–10 years of hands-on Linux system administration experience in production environments.
  • 5+ years managing HPC clusters at scale (500+ cores / multiple petabytes of storage).
  • Strong Ansible automation skills (complex playbooks, roles, variables, templates).
  • Deep understanding of MPI, OpenMP, and GPU / accelerator integration in HPC workloads.
  • Proficient with HPC job schedulers (Slurm, PBS Pro, LSF).
  • Experience with HPC storage (Lustre, BeeGFS, GPFS).
  • Strong knowledge of TCP/IP networking, InfiniBand, and RDMA technologies.
  • Experience with performance tuning and benchmarking tools (perf, HPCToolkit, Intel VTune, iperf, fio).
  • Scripting proficiency in Bash, Python, or Perl for automation and tooling.

Preferred Qualifications

  • Experience with containerized HPC (Singularity, Apptainer, or Podman).
  • Familiarity with cloud-HPC integration (AWS ParallelCluster, Azure CycleCloud, GCP HPC).
  • Knowledge of security compliance standards (CIS benchmarks, STIG).
  • Contribution to HPC community tools or open-source projects.

Soft Skills

  • Strong problem-solving and analytical thinking.
  • Ability to mentor junior engineers and collaborate across teams.
  • Excellent communication skills for technical and non-technical stakeholders.