About the Role
We are looking for an AI Agent Developer to build intelligent conversational systems that can listen, understand, and respond in natural language. The ideal candidate has experience with speech recognition, language models, and AI agent frameworks, and can combine them to create robust, real-time voice-enabled experiences.
Key Responsibilities
Design and develop speech-to-text (ASR) and text-to-speech (TTS) pipelines for real-time AI interaction.
Integrate and, where applicable, fine-tune transcription models and services such as Whisper, Deepgram, Vosk, or OpenAI’s APIs.
Build LLM-powered AI agents capable of contextual reasoning, task automation, and conversational flow.
Develop end-to-end multimodal AI pipelines, combining voice, text, and API-based actions.
Work with LangChain, OpenAI Assistants API, or similar frameworks for orchestration.
Optimize latency, accuracy, and performance in live voice agent environments.
Collaborate with backend and frontend teams to integrate speech capabilities into web and mobile products.
Continuously research and implement advancements in ASR, NLP, and conversational AI.
Required Skills
Strong experience in Python or JavaScript / TypeScript.
Hands-on experience with ASR systems such as Whisper, DeepSpeech, Deepgram, or AssemblyAI.
Solid understanding of LLM frameworks and APIs: LangChain, LlamaIndex, or the OpenAI API.
Experience developing agentic systems or voice-based assistants.
Knowledge of RESTful APIs, WebSockets, and real-time streaming architectures.
Good grasp of prompt engineering, vector databases (Pinecone, Weaviate, FAISS), and RAG pipelines.
Experience with cloud platforms (AWS, GCP, Azure) for model deployment and scaling.
Nice to Have
Experience with WebRTC, audio streaming, or real-time voice pipelines.
Exposure to multimodal AI (speech + vision + text).
Familiarity with React or Next.js for integrating voice agents in UIs.
Contributions to open-source ASR / NLP projects.
Key Tools & Technologies
Languages: Python, TypeScript / JavaScript
AI / ML: Whisper, Deepgram, Hugging Face Transformers, OpenAI GPT models
Frameworks: LangChain, LlamaIndex, FastAPI, Node.js
Databases: Vector DBs (Pinecone, Weaviate), MongoDB
Cloud / Infra: AWS Lambda, GCP Speech APIs, Docker
Audio / Streaming: FFmpeg, WebRTC
Artificial Intelligence Engineer • Mumbai City, Maharashtra, IN