Job Title: Lead Solutions Architect – AI Infrastructure & Private Cloud
Location: Bengaluru (Electronic City)
Experience: 10–15 Years (Lead / Architect Level)
Position Type: Full-Time | Immediate Joiners Preferred
Criticality: High
Role Overview: We are seeking a Lead Solutions Architect specializing in AI Infrastructure and Private Cloud to design and deliver scalable, high-performance compute environments for machine learning, deep learning, and AI workloads. The ideal candidate will have deep expertise in Kubernetes, container orchestration, GPU/TPU acceleration, and HPC (High Performance Computing) architectures, enabling AI-driven innovation across enterprise platforms.
Key Responsibilities:
Architect, design, and implement AI/ML infrastructure solutions across private and hybrid cloud environments.
Lead setup and optimization of Kubernetes Landing Zones, including cluster design, multi-tenancy, and security.
Manage containerized workloads using container and orchestration tools (Kubernetes, Docker, Podman, OpenShift).
Integrate AI accelerators (NVIDIA GPUs, TPUs) for ML/DL model training and inference.
Enable deployment of deep learning models with a focus on hardware acceleration, scalability, and performance tuning.
Build and maintain edge and cloud-native deployment pipelines for AI workloads.
Collaborate with AI/ML and DevOps teams to ensure robust CI/CD workflows for model deployment.
Drive HPC architecture design, including compute, storage, networking, and scheduling (SLURM, PBS, etc.).
Optimize HPC and AI infrastructure for cost, performance, and resource utilization.
Provide technical leadership in evaluating and integrating emerging technologies (AI frameworks, MLOps platforms, accelerator hardware).
Define standards, documentation, and best practices for AI infrastructure operations.
Required Technical Skills:
Containerization & Orchestration: Kubernetes, Docker, Helm, OpenShift, Rancher
Cloud Platforms: AWS, Azure, GCP (Private & Hybrid Cloud expertise preferred)
AI/ML Infrastructure: NVIDIA GPU integration, CUDA, TensorRT, TPUs, PyTorch/TensorFlow deployment
High Performance Computing (HPC): HPC architecture, schedulers (SLURM, PBS), parallel computing, storage & network optimization
DevOps & CI/CD: GitHub Actions, Jenkins, ArgoCD, Terraform, Ansible
Monitoring & Observability: Prometheus, Grafana, ELK Stack
Scripting / Programming: Python, Bash, YAML, Go (preferred)
Desired Skills:
Experience with RAG/LLM model deployment pipelines or AI workload orchestration
Knowledge of edge computing and distributed inference systems
Exposure to AI model lifecycle management (MLOps)
Strong problem-solving, leadership, and cross-functional collaboration skills