Design and implement end-to-end data pipelines for training and fine-tuning LLMs, including dataset creation, cleaning, augmentation, and labeling workflows.
Apply advanced RAG techniques, prompt engineering, and parameter-efficient fine-tuning (e.g., LoRA and adapters via PEFT) for domain-specific use cases.
Integrate AI models with backend and frontend systems via APIs, batching, caching, and streaming responses.
Deploy and optimize LLMs and embeddings using hosted APIs and/or open-source models (OpenAI, Anthropic, LLaMA-family, Mistral, etc.).
Develop and maintain secure AI APIs using FastAPI/gRPC, deployed on Kubernetes with CI/CD pipelines.
Implement safety and compliance layers including prompt injection defenses, hallucination reduction, and PII redaction.
Collaborate with MLOps and platform engineering teams to ensure scalable deployment using Docker, Kubernetes, and Ray Serve.
Leverage frameworks such as LangChain, Hugging Face Transformers, and Azure OpenAI for model orchestration and integration.
Work with vector databases (FAISS, Pinecone, Milvus) to build efficient retrieval-augmented generation pipelines.
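To illustrate the retrieval-augmented generation work described above, here is a minimal, self-contained sketch of the retrieval-and-prompting step. The `embed` function is a toy bag-of-words stand-in for a real embedding model, and the brute-force search stands in for a vector database such as FAISS, Pinecone, or Milvus; everything here is illustrative, not a production design.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding" -- a stand-in for a real embedding
    # model (e.g., a Hugging Face or OpenAI encoder).
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    # Brute-force nearest-neighbour search; a vector database
    # (FAISS, Pinecone, Milvus) replaces this step at scale.
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, corpus):
    # Ground the LLM prompt in retrieved context -- the core RAG step.
    context = "\n".join(retrieve(query, corpus))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

corpus = [
    "LoRA adds low-rank adapter matrices to frozen model weights.",
    "FAISS performs approximate nearest-neighbour vector search.",
    "FastAPI serves Python APIs with async request handling.",
]
print(build_prompt("How does LoRA fine-tuning work?", corpus))
```

In a real pipeline, the retrieved context would be passed to an LLM via its API, with batching, caching, or streaming layered on top as the responsibilities above describe.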
Mandatory Skills:
Strong experience with LLMs, embeddings, and fine-tuning techniques.
Proficiency in Python and experience with LangChain and Hugging Face.
Hands-on experience with MLOps (Docker, Kubernetes, CI/CD).
Strong knowledge of model integration and scalable API development.
Familiarity with safety and compliance mechanisms in AI systems.
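As a concrete example of the safety and compliance mechanisms listed above, the sketch below shows two simple layers: regex-based PII redaction and a naive prompt-injection marker check. The patterns and marker strings are hypothetical; production systems would combine vetted libraries or NER models with such checks rather than rely on regexes alone.

```python
import re

# Hypothetical patterns for two common PII types; real deployments
# would use a vetted detection library or NER model, not regexes alone.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{8,}\d"),
}

# Illustrative marker phrases for a first-pass injection screen.
INJECTION_MARKERS = ("ignore previous instructions", "disregard the system prompt")

def redact_pii(text):
    # Replace each PII match with a typed placeholder before the text
    # is logged or forwarded to a model.
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

def looks_like_injection(user_input):
    # Naive substring check -- one cheap layer of a defense-in-depth
    # strategy against prompt injection, not a complete solution.
    lowered = user_input.lower()
    return any(marker in lowered for marker in INJECTION_MARKERS)

msg = "Contact me at jane@example.com or +1 555 123 4567."
print(redact_pii(msg))  # → Contact me at [EMAIL] or [PHONE].
```

Hallucination reduction, by contrast, is typically handled upstream (grounding prompts in retrieved context) and downstream (answer verification), rather than by filters like these.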
Desirable / Plus:
Exposure to speech-to-text models (e.g., Whisper) with emphasis on Indian languages.