We’re seeking a highly skilled AI Engineer who can bring real-time, emotionally intelligent voice and chat interactions to life. You’ll work on integrating state-of-the-art speech recognition (STT), text-to-speech (TTS), and large-language-model (LLM) systems to create seamless conversational experiences for millions of users worldwide.
Key Responsibilities:
- Build and optimize real-time AI pipelines using speech-to-text, natural language understanding, and text-to-speech systems.
- Fine-tune and integrate LLMs for contextual dialogue and personality consistency.
- Implement real-time voice streaming using tools like LiveKit, Whisper, Riva, or Ultravox.
- Develop prompt-management and memory systems for dynamic user conversations.
- Optimize model inference and latency for low-bandwidth or mobile environments.
- Work closely with product and design teams to align technical capability with user experience goals.
- Experiment with multimodal models (voice + vision + text) and continuously improve accuracy, emotional tone, and realism.
Requirements:
- Strong background in Python.
- Hands-on experience with OpenAI APIs, Gemini, NVIDIA Riva, or Whisper.
- Experience with real-time streaming protocols (LiveKit).