About the Role:
We’re seeking a highly skilled AI Engineer who can bring real-time, emotionally intelligent voice and chat interactions to life. You’ll integrate state-of-the-art speech-to-text (STT), text-to-speech (TTS), and large language model (LLM) systems to create seamless conversational experiences for millions of users worldwide.
Key Responsibilities:
- Build and optimize real-time AI pipelines using speech-to-text, natural language understanding, and text-to-speech systems.
- Fine-tune and integrate LLMs for contextual dialogue and personality consistency.
- Implement real-time voice streaming tools.
- Develop prompt-management and memory systems for dynamic user conversations.
- Optimize model inference and latency for low-bandwidth or mobile environments.
- Work closely with product and design teams to align technical capability with user experience goals.
- Experiment with multimodal models (voice + vision + text) and continuously improve accuracy, emotional tone, and realism.
Requirements:
- Strong background in Python, Node.js, or related languages and frameworks.
- Hands-on experience with OpenAI APIs, Gemini, NVIDIA Riva, or Whisper.
- Experience with real-time streaming technologies (WebRTC, LiveKit, or similar).
- Understanding of machine learning, NLP, and voice AI architectures.
- Familiarity with prompt engineering, model evaluation, and fine-tuning.
- Experience integrating AI into mobile or web applications.
- Self-starter who thrives in a fast-paced startup environment.
Nice to Have:
- Experience with vector databases, embeddings, and conversational memory.
- Familiarity with emotional AI, sentiment analysis, or multilingual LLMs.
- Exposure to tools like ElevenLabs, Play.ht, or Coqui for voice cloning.