Position: Infrastructure Architect - AI Systems (NVIDIA HGX & Dell PowerEdge)
Experience: 5+ years
Location: India
Job Summary:
We are seeking a seasoned Infrastructure Architect for AI Systems with more than 5 years of experience designing and managing high-performance computing environments. The ideal candidate will have extensive hands-on expertise with NVIDIA HGX platforms and Dell PowerEdge servers. This role is critical for evaluating, optimizing, and scaling our infrastructure to support evolving AI use cases and workloads while ensuring maximum performance, reliability, and security.
Key Responsibilities:
- Assess the scalability and performance of the existing infrastructure architecture, and review current systems against enterprise infrastructure standards and best practices for AI workloads.
- Evaluate and recommend compute strategies, including CPU versus GPU acceleration trade-offs, to optimize performance for various machine learning and deep learning models.
- Identify and propose infrastructure changes based on evolving AI use cases, business adoption roadmaps, and the need for high-performance data processing.
- Collaborate with cross-functional teams, including hardware engineers, software developers, and QA, to align infrastructure with product goals and ensure seamless deployment.
- Design and implement systems with a focus on high availability, redundancy, and power management optimization to support mission-critical AI operations.
Required Skills & Qualifications:
- 5+ years of experience in infrastructure architecture or systems engineering roles.
- Hands-on experience managing Dell PowerEdge servers (R660, XE8640) integrated with NVIDIA HGX platforms.
- Proven ability to tune hardware for optimal AI performance, including memory bandwidth, I/O optimization, and thermal management.
- Strong background in infrastructure security protocols, including secure boot, firmware validation, and network segmentation.
- Expertise in power management optimization and in designing for system redundancy and uptime.
- Familiarity with firmware lifecycle management, including flashing, validation, and rollback strategies.
(ref: hirist.tech)