Job Description :
We are seeking an experienced Lead Solutions Architect with deep expertise in AI/ML infrastructure, High Performance Computing (HPC), and container platforms to join our dynamic team focused on delivering HPE Private Cloud AI and Enterprise AI Factory Solutions. This role is instrumental in architecting, deploying, and optimizing private cloud environments that leverage HPE's co-developed solutions with NVIDIA, as well as validated HPE reference architectures, to support enterprise-grade AI workloads at scale.
The ideal candidate will bring strong technical expertise in AI infrastructure, container orchestration platforms, and hybrid cloud environments, and will play a key role in delivering scalable, secure, and high-performance AI platform solutions powered by HPE GreenLake and NVIDIA AI Enterprise technologies.
Key Responsibilities :
Leadership and Strategy :
Provide delivery assurance and serve as the lead design authority to ensure seamless execution of enterprise-grade container platforms (including Red Hat OpenShift and SUSE Rancher), HPE Private Cloud AI, and HPC/AI solutions, fully aligned with customer AI/ML strategies and business objectives.
Solution Planning and Design :
Architect and optimize end-to-end solutions across container orchestration and HPC workload management domains, leveraging platforms such as Red Hat OpenShift and SUSE Rancher and/or workload schedulers such as Slurm and Altair PBS Pro.
Opportunity Assessment :
Lead technical responses to RFPs, RFIs, and customer inquiries, ensuring alignment with business and technical requirements.
Innovation and Research :
Stay current with emerging technologies, industry trends, and best practices across HPC, Kubernetes, container platforms, hybrid cloud, and security to inform solution design and innovation.
Customer-Centric Mindset :
Act as a trusted advisor to enterprise customers, ensuring alignment of AI solutions with business goals.
Team Collaboration :
Collaborate with cross-functional teams, including subject matter experts in infrastructure components (such as HPE servers, storage, and networking) and data science teams, to ensure cohesive and integrated solution delivery.
Mentor technical consultants and contribute to internal knowledge sharing through tech talks and innovation forums.
Required Skills :
1. HPC & AI Infrastructure
Extensive knowledge of HPC technologies and workload schedulers such as Slurm and/or Altair PBS Pro.
Proficiency and hands-on experience with HPC cluster management tools, including HPE Cluster Management (HPCM) and/or NVIDIA Base Command Manager.
Solid understanding of high-speed networking stacks and technologies (InfiniBand, Mellanox, Ethernet) and performance tuning of HPC components.
2. Containerization & Orchestration
Extensive hands-on experience with containerization technologies such as Docker, Podman, and Singularity.
Proficiency with at least two container orchestration platforms: CNCF Kubernetes, Red Hat OpenShift, SUSE Rancher (RKE / K3s), Canonical Charmed Kubernetes.
Strong understanding of GPU technologies, including the NVIDIA GPU Operator for Kubernetes-based environments and DCGM (Data Center GPU Manager) for GPU health and performance monitoring.
3. Operating Systems & Virtualization
Extensive experience in Linux system administration, including package management, boot process troubleshooting, performance tuning, and network configuration.
AI Solution Architect • KA, India