Data Scientist (Voice)

Tanla Platforms Limited
Hyderabad, Telangana, India
2 days ago
Job description

About the Role:

We are seeking a highly experienced Voice AI / ML Engineer to lead the design and deployment of real-time voice intelligence systems. This role focuses on ASR, TTS, speaker diarization, wake word detection, and building production-grade modular audio processing pipelines to power next-generation contact center solutions, intelligent voice agents, and telecom-grade audio systems.

You will work at the intersection of deep learning, streaming infrastructure, and speech / NLP technology, creating scalable, low-latency systems across diverse audio formats and real-world applications.

Key Responsibilities:

Voice & Audio Intelligence:

  • Build, fine-tune, and deploy ASR models (e.g., Whisper, wav2vec 2.0, Conformer) for real-time transcription.
  • Develop and fine-tune high-quality TTS systems using VITS, Tacotron, and FastSpeech for lifelike voice generation and cloning.
  • Implement speaker diarization for segmenting and identifying speakers in multi-party conversations using embeddings (x-vectors / d-vectors) and clustering (AHC, VBx, spectral clustering).
  • Design robust wake word detection models with ultra-low latency and high accuracy in noisy conditions.

Real-Time Audio Streaming & Voice Agent Infrastructure:

  • Architect bi-directional real-time audio streaming pipelines using WebSocket, gRPC, Twilio Media Streams, or WebRTC.
  • Integrate voice AI models into live voice agent solutions, IVR automation, and AI contact center platforms.
  • Optimize for latency, concurrency, and continuous audio streaming with context buffering and voice activity detection (VAD).
  • Build scalable microservices to process, decode, encode, and stream audio across common codecs (e.g., PCM, Opus, μ-law, AAC, MP3) and containers (e.g., WAV, MP4).

Deep Learning & NLP Architecture:

  • Utilize transformers, encoder-decoder models, GANs, VAEs, and diffusion models for speech and language tasks.
  • Implement end-to-end pipelines including text normalization, G2P mapping, NLP intent extraction, and emotion / prosody control.
  • Fine-tune pre-trained language models for integration with voice-based user interfaces.

Modular System Development:

  • Build reusable, plug-and-play modules for ASR, TTS, diarization, codecs, streaming inference, and data augmentation.
  • Design APIs and interfaces for orchestrating voice tasks across multi-stage pipelines with format conversions and buffering.
  • Develop performance benchmarks and optimize for CPU / GPU, memory footprint, and real-time constraints.

Engineering & Deployment:

  • Write robust, modular, and efficient Python code.
  • Work with Docker, Kubernetes, and cloud deployment platforms (AWS, Azure, GCP).
  • Optimize models for real-time inference using ONNX, TorchScript, and CUDA, including quantization, context-aware inference, and model caching.
  • Deploy voice models on-device.

Why join us?

  • Impactful Work: Play a pivotal role in safeguarding Tanla's assets, data, and reputation in the industry.
  • Tremendous Growth Opportunities: Be part of a rapidly growing company in the telecom and CPaaS space, with opportunities for professional development.
  • Innovative Environment: Work alongside a world-class team in a challenging and fun environment, where innovation is celebrated.

Tanla is an equal opportunity employer. We champion diversity and are committed to creating an inclusive environment for all employees.
