8+ years of experience (with a minimum of 2 years of hands-on experience designing and architecting production-grade Agentic AI and LLM-based systems, with all non-functional requirements (NFRs) covered).
About the Role
Our organization is developing an Agentic AI Platform designed to orchestrate intelligent, autonomous workflows that drive decision-making, automation, and innovation across diverse domains.
As the Lead AI Platform Engineer, you will be responsible for architecting and implementing the foundational elements of this platform. This includes designing, deploying, and maintaining production-grade multi-agent systems using the most effective frameworks, such as CrewAI, LangGraph, or equivalent systems, and deploying these solutions at scale on modern cloud infrastructure. You will collaborate closely with data engineers, data scientists, ML engineers, and product teams to realize our vision of adaptive, reasoning-driven systems.
Key Responsibilities :
- Architect & Build Agentic Systems : Design, develop, and deploy multi-agent workflows leveraging frameworks such as CrewAI, LangGraph, or custom-built orchestration layers.
- Platform Engineering : Construct the underlying agentic runtime, encompassing message routing, memory management, and context-sharing mechanisms between agents.
- LLM & Tool Integration : Integrate Large Language Models (e.g., OpenAI, Anthropic, or open-source alternatives), vector databases, retrieval systems, and external APIs for agent tool-use and reasoning capabilities.
- Workflow Design & Optimization : Collaborate with AI researchers and solution engineers to design dynamic agent workflows that adapt based on contextual information and analytical results.
- Cloud & Scalability : Architect scalable, cost-efficient deployments utilizing AWS, GCP, or Azure, leveraging cloud-native components (e.g., Lambda, ECS, Kubernetes, Pub/Sub) to ensure high availability and performance.
- Observability & Governance : Implement comprehensive monitoring, evaluation metrics, and safety checks for autonomous agents to ensure reliability and compliance.
- Team Leadership & Mentorship : Provide guidance to a team of engineers, establish best practices for agent development, and cultivate a high-performance engineering culture. Run training sessions for the team and build top-grade skills in this area across the company.
Required Qualifications :
- Strong background in Python, with demonstrated experience in asynchronous programming, API development, and distributed systems.
- Proven experience in building LLM-powered multi-agent systems using CrewAI, LangGraph, LangChain, or similar orchestration frameworks.
- Deep understanding of prompt engineering, RAG pipelines, and tool-calling mechanisms.
- Hands-on experience with cloud infrastructure (AWS/GCP/Azure) and MLOps components (e.g., Kubernetes, Docker, CI/CD, API Gateway).
- Solid understanding of state management, context persistence, and memory architecture for agents.
- Experience integrating vector stores (e.g., FAISS, Pinecone, Chroma, Weaviate) and LLM APIs (e.g., OpenAI, Claude, Gemini).
- Demonstrated ability to architect scalable AI systems from prototype to production.
- Excellent communication and collaboration skills, with the ability to translate complex technical concepts into actionable platform features.
- Experience with graph-based orchestration or advanced LangGraph workflows.
- Familiarity with CrewAI crew and task paradigms or other agent coordination paradigms.
- Exposure to event-driven architectures, message queues, or knowledge graphs.
- Understanding of AI safety, alignment, and governance principles.
- Contributions to open-source agent frameworks or AI orchestration tools.
- Ability to customize models for specific purposes.
What We Offer :
- An unparalleled opportunity to build the next-generation Agentic AI Platform from inception.
- The chance to work with cutting-edge LLM orchestration technologies and cloud-native AI systems.
- Competitive compensation and potential equity options.
- A collaborative, innovation-driven culture with a strong focus on real-world impact.
(ref : iimjobs.com)