Description :
We are looking for a highly skilled Real-Time Voice AI Engineer with hands-on experience in building real-time voice bots, speech pipelines, and streaming AI systems using open-source technologies. This role involves working across the stack: speech recognition, voice generation, WebRTC streaming, ML deployment, and scalable Python microservices.
Key Responsibilities :
Real-Time Voice & Streaming Systems :
- Develop and maintain real-time voice bots using WebRTC, LiveKit, WebSockets, and telephony integrations.
- Build and optimize low-latency voice streaming pipelines for conversational AI applications.
- Ensure secure, robust, and scalable communication channels across all voice systems.
Speech & Generative AI Pipelines :
- Build ASR pipelines using Whisper, VAD, turn detection, and custom inference modules.
- Implement, fine-tune, and deploy Generative AI models using Ollama, vLLM, HuggingFace, etc.
- Optimize TTS streaming using open-source models such as Orpheus, Spark, Fish Audio, or similar.
Backend & ML Infrastructure :
- Develop Python-based inference microservices and MLOps pipelines for scalable deployment.
- Optimize models via quantization, caching, and GPU-aware serving.
- Ensure real-time performance with low-latency audio, fast inference, and efficient resource utilization.
Required Skills & Experience :
- 3+ years of relevant experience in real-time AI / voice streaming development (must be open-source focused).
- Strong proficiency in Python, including multiprocessing, async programming, and microservice architecture.
- Hands-on experience with :
  - ASR / TTS systems (Whisper, VAD, diarization, TTS models)
  - WebRTC, LiveKit, WebSockets for real-time voice applications
  - Generative AI, model fine-tuning, quantization, and HuggingFace ecosystem
  - MLOps tools, scalable inference systems, and production deployment
- Experience with telephony integration (SIP, PSTN, Twilio, Asterisk, etc.) is a strong advantage.
- Strong understanding of low-latency system design, GPU optimization, and secure streaming.
(ref : hirist.tech)