We are seeking an experienced AI Solution Architect to design, develop, and scale next-generation GenAI-powered microservices.
This role involves architecting multi-agent systems, building RAG pipelines, and deploying large-scale LLM applications using Google Cloud services.
You will play a key role in shaping the architecture of AI-driven solutions while ensuring security, performance, and scalability.
Key Responsibilities:
API & Microservices Development:
- Design and implement robust asynchronous APIs using FastAPI for GenAI microservices.
- Ensure request routing, rate limiting, error tracking, and observability for production-grade systems.
Multi-Agent Orchestration:
- Architect multi-agent systems using LangGraph, CrewAI, or similar frameworks.
- Implement dynamic workflows with LangChain Expression Language (LCEL) and tool/function calling for complex task orchestration.
RAG & Knowledge Systems:
- Build retrieval-augmented generation (RAG) pipelines with advanced chunking, metadata tagging, and vector search integration.
- Work with vector databases such as FAISS, Pinecone, and GCP Matching Engine.
Caching & State Management:
- Develop session management layers and caching mechanisms using Redis (pub/sub, aioredis) to enable memory and persistence in real-time chat systems.
Cloud Deployment & LLM Optimization:
- Deploy and optimize LLM applications on Google Cloud Platform (Vertex AI, Cloud Run, Storage, IAM, Matching Engine).
- Integrate embedding models from OpenAI, Cohere, and Gemini.
Security & Compliance:
- Implement API key management, JWT-based authentication, and audit logging.
- Maintain industry-standard security best practices across deployments.
Required Skills & Qualifications:
- 5+ years of backend engineering experience in Python.
- Strong expertise in FastAPI, including async/await, background tasks, dependency injection, and exception handling.
- Hands-on experience with LangChain, LangGraph, LCEL, and multi-agent systems.
- Proficiency in Redis (pub/sub, async clients, caching layers) for conversation state and memory.
- Strong knowledge of Google Cloud Platform (Vertex AI, Cloud Run, IAM, Storage, Matching Engine).
- Familiarity with vector databases (FAISS, Pinecone, GCP Matching Engine) and embedding models (OpenAI, Cohere, Gemini).
- Experience with tool/function calling, session tracking, and context management in LLMs.
- Proficiency with Docker and building scalable microservice architectures.
Preferred Skills (Nice to Have):
- Exposure to observability tools (Prometheus, Grafana, OpenTelemetry).
- Familiarity with CI/CD pipelines and automated deployments.
- Experience in fine-tuning or custom training of LLMs.
- Knowledge of MLOps practices for AI/ML model lifecycle management.
(ref : hirist.tech)
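For candidates unfamiliar with the RAG retrieval step mentioned in the responsibilities, a minimal self-contained sketch follows. The `embed` function here is a deliberately toy bag-of-characters stand-in (not a real embedding model), and `retrieve` is a hypothetical helper; a production pipeline would call OpenAI, Cohere, or Gemini embeddings and query a vector store such as FAISS, Pinecone, or GCP Matching Engine instead of scanning a Python list.

```python
# Toy sketch of RAG retrieval: embed a query, rank chunks by cosine
# similarity, return the top-k as context for the LLM prompt.
import math


def embed(text: str) -> list[float]:
    # Toy bag-of-characters "embedding" over 26 letters (NOT a real model).
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank all chunks against the query embedding; a vector database
    # replaces this linear scan with an approximate nearest-neighbor index.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


chunks = ["redis caching layer", "vertex ai deployment", "jwt authentication"]
print(retrieve("redis cache", chunks, k=1))  # → ['redis caching layer']
```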