Job Title: Lead Solutions Architect – AI Infrastructure & Private Cloud
Location: Bengaluru (Electronic City)
Experience: 10–15 Years (Lead / Architect Level)
Position Type: Full-Time | Immediate Joiners Preferred
Criticality: High
Role Overview:
We are seeking a Lead Solutions Architect specializing in AI Infrastructure and Private Cloud to design and deliver scalable, high-performance compute environments for machine learning, deep learning, and AI workloads. The ideal candidate will have deep expertise in Kubernetes, container orchestration, GPU/TPU acceleration, and High Performance Computing (HPC) architectures, enabling AI-driven innovation across enterprise platforms.
Key Responsibilities:
Architect, design, and implement AI/ML infrastructure solutions across private and hybrid cloud environments.
Lead setup and optimization of Kubernetes Landing Zones, including cluster design, multi-tenancy, and security.
Manage containerized workloads using container runtimes and orchestration platforms (Docker, Podman, Kubernetes, OpenShift).
Integrate AI accelerators (NVIDIA GPUs, TPUs) for ML/DL model training and inference.
Enable deployment of deep learning models with a focus on hardware acceleration, scalability, and performance tuning.
Build and maintain edge and cloud-native deployment pipelines for AI workloads.
Collaborate with AI/ML and DevOps teams to ensure robust CI/CD workflows for model deployment.
Drive HPC architecture design, including compute, storage, networking, and scheduling (SLURM, PBS, etc.).
Optimize HPC and AI infrastructure for cost, performance, and resource utilization.
Provide technical leadership in evaluating and integrating emerging technologies (AI frameworks, MLOps platforms, accelerator hardware).
Define standards, documentation, and best practices for AI infrastructure operations.
Required Technical Skills:
Containerization & Orchestration: Kubernetes, Docker, Helm, OpenShift, Rancher
Cloud Platforms: AWS, Azure, GCP (Private & Hybrid Cloud expertise preferred)
AI/ML Infrastructure: NVIDIA GPU integration, CUDA, TensorRT, TPUs, PyTorch/TensorFlow deployment
High Performance Computing (HPC): HPC architecture, schedulers (SLURM, PBS), parallel computing, storage & network optimization
DevOps & CI/CD: GitHub Actions, Jenkins, ArgoCD, Terraform, Ansible
Monitoring & Observability: Prometheus, Grafana, ELK Stack
Scripting / Programming: Python, Bash, YAML, Go (preferred)
Desired Skills:
Experience with RAG/LLM deployment pipelines or AI workload orchestration
Knowledge of edge computing and distributed inference systems
Exposure to AI model lifecycle management (MLOps)
Strong problem-solving, leadership, and cross-functional collaboration skills