Talent.com
No longer accepting applications
GPU Infrastructure & Data Center Engineer

PhoQtek labs, Hyderabad, Telangana, India
18 hours ago
Job description

About the Role

We are seeking a highly skilled IT Solutions & GPU Infrastructure Lead to take complete ownership of our GPU-based server infrastructure. This role focuses on next-generation GPU systems for AI/ML workloads and covers every aspect, from data center colocation and setup to GPU slicing, MIG management, resource allocation, optimization, and compliance. You will lead the end-to-end lifecycle of our GPU infrastructure, ensuring that all servers are optimized, secure, and production-ready for both internal and customer use.

Key Responsibilities

1. Colocation & Infrastructure Setup

You will have full ownership of GPU colocation and end-to-end infrastructure setup.

  • Coordinate with data centers for rack installation, power, and cooling.
  • Deploy and configure GPU-based servers for production readiness.

2. GPU & AI/ML Infrastructure

  • Manage GPU slicing and MIG (Multi-Instance GPU) for multi-tenant workloads.
  • Install and maintain the NVIDIA software stack — CUDA, cuDNN, NCCL, and DCGM.
  • Optimize GPU infrastructure for AI/ML workloads (TensorFlow, PyTorch, RAPIDS).
  • Support multi-GPU scaling using NVLink and PCIe passthrough.

3. Systems & Virtualization

  • Administer Linux environments (Ubuntu, CentOS, Rocky Linux) and other operating systems as required.
  • Manage virtualization platforms such as VMware, KVM, or Proxmox with GPU passthrough.
  • Handle container orchestration with Docker and Kubernetes, including the NVIDIA GPU Operator.
  • Integrate high-performance storage (NFS, Ceph, SAN/NAS) for large-scale datasets.

4. Monitoring & Performance Optimization

  • Monitor GPU and system performance using Prometheus, Grafana, NVIDIA DCGM, and nvidia-smi.
  • Proactively detect, analyze, and resolve GPU or system bottlenecks.
  • Optimize GPU nodes for training and inference performance.
  • Implement structured logging, alerts, and usage reporting.
  • Administer, manage, monitor, and maintain GPU infrastructure for AI workloads.

5. Security & Compliance

  • Harden GPU servers for multi-tenant workloads.
  • Manage driver, firmware, and software license compliance.
  • Ensure infrastructure security and audit readiness with periodic patching and updates.

6. Networking & High-Performance I/O

  • Configure and maintain high-speed network fabrics (InfiniBand, RDMA, RoCE).
  • Optimize low-latency interconnects for distributed GPU workloads.
  • Troubleshoot and enhance data transfer performance.

7. Customer & Infrastructure Ownership

  • Serve as the primary contact for GPU resource allocation.
  • Provision GPU slices or MIG instances for internal and external teams.
  • Troubleshoot, document, and optimize workload performance.

Qualifications

  • Proven experience in data center server setup and colocation.
  • Deep expertise in GPU server administration (NVIDIA A100/H100 or equivalent).
  • Strong working knowledge of GPU slicing, MIG, CUDA, NCCL, and NVIDIA drivers.
  • Experience with Linux administration, virtualization (VMware/KVM/Proxmox), and containers (Docker/Kubernetes).
  • Hands-on experience with AI/ML frameworks such as TensorFlow and PyTorch.
  • Familiarity with monitoring tools (Prometheus, Grafana, DCGM).
  • Knowledge of storage systems (NFS, Ceph) and high-performance networking.
  • Strong vendor coordination and infrastructure management skills.

Why This Role Matters

This position owns the entire lifecycle of GPU-based infrastructure, from colocation to slicing, monitoring, and optimization. You will build and maintain the backbone of our AI/ML infrastructure, ensuring that all systems are efficient, scalable, and production-grade.
