Title : AI / ML Engineer
Location : Sector 63, Noida
About the Role :
We are seeking a talented and hands-on AI / ML Engineer with experience in LLM-based architectures, vector search (e.g., Pinecone), and end-to-end model deployment. You'll be working closely with our product and research teams to develop scalable NLP / NLU applications, including RAG pipelines, LLM integrations, and custom model deployments.
Key Responsibilities :
- Design and implement Retrieval-Augmented Generation (RAG) pipelines using LLMs and vector databases like Pinecone.
- Integrate with OpenAI, LLaMA, and Hugging Face models to build conversational AI solutions.
- Work with vector databases (e.g., Pinecone, Weaviate, FAISS) for embedding-based retrieval.
- Fine-tune and serve LLMs (LLaMA, GPT, etc.) locally or via cloud deployments.
- Implement NLP / NLU tasks such as summarization, classification, and entity extraction.
- Build and deploy ML pipelines using TensorFlow or PyTorch (preferred but not mandatory).
- Evaluate and optimize models, and monitor their post-deployment performance.
- Collaborate with backend and DevOps teams to deploy models using Docker, FastAPI, or other modern tools.
Required Skills :
- 3+ years of experience in AI / ML or Data Science roles.
- Strong experience with LLMs (e.g., GPT-4, LLaMA, Falcon).
- Hands-on experience with RAG architectures and embedding pipelines.
- Familiarity with OpenAI APIs, LangChain, or LLM tooling frameworks.
- Working knowledge of vector stores like Pinecone, FAISS, or Weaviate.
- Proficient in Python and libraries like transformers, scikit-learn, spaCy, etc.
- Exposure to model serving & deployment (FastAPI, Flask, Docker, TorchServe, etc.).
- Familiarity with the NLP / ML lifecycle from training to inference and monitoring.
Good to Have :
- Experience with TensorFlow or PyTorch.
- Experience in deploying LLMs locally (LLaMA with llama.cpp or Ollama).
- Experience in managing Hugging Face Spaces, datasets, or model hub.
- MLOps experience : CI / CD, model versioning, cloud deployment.
(ref : hirist.tech)