Key Responsibilities:
- Design and implement scalable deployment pipelines for open-source Gen AI models (LLMs, diffusion models, etc.).
- Fine-tune and optimize models using techniques such as LoRA, quantization, and distillation (a minimal LoRA sketch follows this list).
- Manage inference workloads, optimize latency, and improve GPU utilization.
- Build CI/CD pipelines for model training, validation, and deployment.
- Integrate observability, logging, and alerting for model and infrastructure monitoring.
- Automate resource provisioning using Terraform, Helm, or similar tools on GCP, AWS, or Azure.
- Ensure model versioning, reproducibility, and rollback using tools like MLflow, DVC, or Weights & Biases.
- Collaborate with data scientists, backend engineers, and DevOps teams to ensure smooth production rollouts.
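To illustrate the fine-tuning work mentioned above, here is a minimal LoRA sketch using Hugging Face PEFT. The checkpoint name and hyperparameters are placeholder assumptions for illustration, not requirements of the role:

```python
# Minimal LoRA fine-tuning setup with Hugging Face PEFT.
# The base checkpoint and hyperparameters below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # assumed example checkpoint
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# LoRA freezes the base weights and trains small low-rank update
# matrices injected into the attention projections.
config = LoraConfig(
    r=8,                                  # rank of the update matrices
    lora_alpha=16,                        # scaling factor for the updates
    target_modules=["q_proj", "v_proj"],  # projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of base parameters
```

The adapted model can then be passed to a standard training loop; only the LoRA parameters receive gradients, which keeps fine-tuning cheap enough for single-GPU runs.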
Required Skills & Qualifications:
- 5+ years of total experience in software engineering or cloud infrastructure.
- 3+ years in MLOps with direct experience deploying large Gen AI models.
- Hands-on experience with open-source models (e.g., LLaMA, Mistral, Stable Diffusion, Falcon).
- Strong knowledge of Docker, Kubernetes, and cloud compute orchestration.
- Proficiency in Python and familiarity with model-serving frameworks (e.g., FastAPI, Triton Inference Server, Hugging Face Accelerate, vLLM; see the sketch after this list).
- Experience with cloud platforms (GCP preferred; AWS or Azure acceptable).
- Familiarity with distributed training, checkpointing, and model parallelism.
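As an illustration of the serving frameworks named above, a minimal vLLM offline-inference sketch; the model checkpoint is an assumed placeholder:

```python
# Minimal vLLM batch-inference example.
# The checkpoint name is an illustrative assumption.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")  # assumed example checkpoint
params = SamplingParams(temperature=0.7, max_tokens=128)

# vLLM batches prompts and uses paged attention to keep GPU utilization high.
outputs = llm.generate(["Summarize MLOps in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```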
Good to Have:
- Experience with low-latency inference systems and token-streaming architectures (a minimal streaming sketch follows this list).
- Familiarity with cost optimization and scaling strategies for GPU-based workloads.
- Exposure to LLMOps tools (LangChain, BentoML, Ray Serve, etc.).
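For illustration, a minimal token-streaming sketch over HTTP using FastAPI; the token generator below is a stand-in assumption for a real model stream:

```python
# Minimal token-streaming endpoint with FastAPI.
# fake_token_stream is a placeholder; a production system would
# stream tokens from the model server instead.
import asyncio
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def fake_token_stream(prompt: str):
    for token in ("Hello", " ", "world", "!"):
        yield token
        await asyncio.sleep(0.01)  # simulate per-token generation latency

@app.get("/generate")
async def generate(prompt: str):
    # Tokens are flushed to the client as they are produced,
    # so time-to-first-token stays low even for long completions.
    return StreamingResponse(fake_token_stream(prompt), media_type="text/plain")
```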
Why Join Us:
- Opportunity to work on cutting-edge Gen AI applications across industries.
- Collaborative team with deep expertise in AI, cloud, and enterprise software.
- Flexible work environment with a focus on innovation and impact.