Build production-ready Generative AI applications using Large Language Models (LLMs) and AI agents. Implement RAG systems, prompt engineering, and multi-agent workflows for intelligent automation.

Key Responsibilities
- Design and implement LLM-powered applications and AI agents
- Build RAG (Retrieval Augmented Generation) systems with vector databases
- Develop advanced prompt engineering strategies and templates
- Create multi-agent systems with tool integration and orchestration
- Implement document processing pipelines and knowledge base ingestion
- Optimize LLM inference for cost, latency, and quality
- Integrate LLMs with business workflows and APIs
- Evaluate LLM outputs and implement guardrails and safety measures

Required Skills

LLM & Generative AI:
- Deep understanding of Large Language Models (GPT, Claude, Llama)
- Prompt engineering techniques (zero-shot, few-shot, chain-of-thought)
- RAG architecture and implementation patterns
- Context management and token optimization
- Fine-tuning and parameter-efficient methods (LoRA, QLoRA)
- Understanding of transformer architecture and attention mechanisms

AI Agents & Orchestration:
- Agent frameworks and autonomous systems
- Tool calling and function integration
- Multi-agent communication and coordination
- Planning, reasoning, and reflection patterns
- Memory management for conversational AI

Vector Search & Embeddings:
- Embedding models and semantic search
- Vector database operations and optimization
- Similarity search and retrieval strategies
- Chunking strategies and document preprocessing

Required Tech Stack

LLM Frameworks & APIs:
- LLM Providers: OpenAI API (GPT-4, GPT-3.5), Anthropic (Claude), OpenRouter
- Frameworks: LangChain, LlamaIndex, LiteLLM, Haystack
- Agent Frameworks: AutoGPT, LangGraph, CrewAI, etc.
- Open-Source LLMs: Llama 3, Mistral, Mixtral (via Hugging Face), etc.

Vector Databases & Search:
- Vector DBs: Pinecone, Weaviate, Chroma, Qdrant, Milvus (any one of these)
- Embeddings: OpenAI Embeddings, Sentence Transformers, Cohere
- Search: Elasticsearch, OpenSearch

Development Tools:
- Languages: Python (expert)
- Web Frameworks: FastAPI, Flask, Streamlit
- Document Processing: LangChain Document Loaders, Unstructured, PyPDF2
- NLP Libraries: spaCy, NLTK, Hugging Face Transformers

MLOps & Deployment:
- Model Serving: vLLM, Ray Serve, TGI (Text Generation Inference)
- Monitoring: LangSmith, Weights & Biases
- Containerization: Docker, Kubernetes
- Version Control: Git

Cloud & Infrastructure:
- Cloud Providers: AWS (Bedrock, Lambda), Azure (OpenAI Service), GCP
- APIs: REST, WebSocket, GraphQL
- Caching: Redis

Preferred Qualifications
- Bachelor's or Master's degree in Computer Science, AI, NLP, or a related field
- Experience fine-tuning LLMs (LoRA, full fine-tuning)
- Knowledge of LLM evaluation metrics (ROUGE, BLEU, BERTScore)
- Contributions to LLM / GenAI open-source projects
- Experience with multi-modal models (vision, audio)

What Success Looks Like
- Production GenAI applications handling real user traffic
- High-quality LLM outputs with low hallucination rates
- Cost-optimized inference with acceptable latency
- RAG systems providing accurate, relevant context
- Robust agent systems completing complex multi-step tasks
- Well-structured prompts and reusable templates