Job Title: AI Ops Engineer
Location: Pan India
Experience: 5-8 Years
About the Role
We are looking for an AI Ops Engineer to support and optimize Agentic AI, GenAI, and LLM applications in production environments. This role involves working with cutting-edge AI architectures, operational workflows, and automation strategies to ensure reliability and scalability of AI-driven systems.
Key Responsibilities
- Build and manage CI/CD pipelines for Agentic AI deployments.
- Deploy and monitor LLMs, RAG pipelines, and agentic AI systems in containerized environments.
- Design and fine-tune prompts for various AI use cases.
- Operate, troubleshoot, and optimize agentic AI applications across multi-step pipelines and asynchronous agents.
- Develop diagnostics and playbooks for LLM-driven failures, including fallback strategies and human-in-the-loop workflows.
- Collaborate with architects and AI developers to optimize orchestration across platforms like LangGraph, AWS Bedrock Agents, CrewAI, and AutoGen.
- Integrate agentic systems with enterprise apps (Jira, ServiceNow, Confluence) using REST APIs and webhooks.
- Implement observability and logging best practices for model outputs, latency, and agent performance.
- Contribute to self-healing mechanisms and alerting strategies for production-grade AI workflows.
Must-Have Skills
- Experience supporting Agentic AI, GenAI, or LLM applications in production.
- Exposure to agentic AI architectures (LangGraph, AWS Bedrock Agents, AutoGen, CrewAI).
- Strong foundation in prompt engineering and experience with LLMs (GPT, Claude, LLaMA).
- Hands-on experience in DevOps/MLOps and AI workflows.
- Familiarity with Python, Bash, RESTful APIs, MCP, and AWS cloud environments.
- Knowledge of vector databases (OpenSearch/FAISS), SQL (Postgres), NoSQL (DynamoDB), and Graph DBs (Neptune).
- Understanding of containerization, orchestration, and monitoring tools (Prometheus, Grafana, ELK).
- Exposure to security and compliance standards in model lifecycle management.
Good-to-Have
- Experience with Angular deployments and advanced automation strategies.