Job description

- Design and implement large-scale AI/ML infrastructure solutions using NVIDIA GPU clusters, SMCI server platforms, and high-performance computing architectures to support enterprise AI workloads
- Lead the architecture and deployment of distributed AI training environments, including multi-node GPU clusters, InfiniBand networking, and high-speed GPU interconnects such as NVIDIA NVLink and NVSwitch
- Develop comprehensive infrastructure strategies for AI model training, inference, and deployment across on-premises, cloud, and hybrid environments using SMCI hardware solutions and the NVIDIA AI Enterprise software stack
- Oversee the integration of AI accelerators, including NVIDIA A100, H200, and emerging GPU architectures, with SMCI SuperServer platforms to optimize performance and resource utilization
- Collaborate with data science teams to design scalable MLOps pipelines, model-serving infrastructure, and automated deployment systems using containerization and orchestration technologies
- Establish monitoring, logging, and performance-optimization frameworks for AI workloads, implementing solutions such as NVIDIA Triton Inference Server and GPU monitoring tools
- Lead technical due diligence for AI hardware procurement decisions, including SMCI server configurations, NVIDIA GPU selection, and storage solutions for large-scale AI datasets
- Mentor junior infrastructure engineers and provide technical leadership in AI infrastructure best practices, including security, compliance, and cost optimization
- Drive automation initiatives for AI infrastructure provisioning, configuration management, and lifecycle management using Infrastructure as Code principles
- Partner with stakeholders to define AI infrastructure roadmaps, capacity planning, and technology adoption strategies aligned with business objectives