Data Scientist (Voice)

Tanla Platforms Limited, Hyderabad, Telangana, India

3 days ago

Job description

About the Role:

We are seeking a highly experienced Voice AI/ML Engineer to lead the design and deployment of real-time voice intelligence systems. This role focuses on ASR, TTS, speaker diarization, wake word detection, and building production-grade modular audio processing pipelines to power next-generation contact center solutions, intelligent voice agents, and telecom-grade audio systems.

You will work at the intersection of deep learning, streaming infrastructure, and speech/NLP technology, creating scalable, low-latency systems across diverse audio formats and real-world applications.

Key Responsibilities:

Voice & Audio Intelligence:

Build, fine-tune, and deploy ASR models (e.g., Whisper, wav2vec 2.0, Conformer) for real-time transcription.
Develop and fine-tune high-quality TTS systems using VITS, Tacotron, and FastSpeech for lifelike voice generation and cloning.
Implement speaker diarization for segmenting and identifying speakers in multi-party conversations using embeddings (x-vectors/d-vectors) and clustering (AHC, VBx, spectral clustering).
Design robust wake word detection models with ultra-low latency and high accuracy in noisy conditions.

Real-Time Audio Streaming & Voice Agent Infrastructure:

Architect bi-directional real-time audio streaming pipelines using WebSocket, gRPC, Twilio Media Streams, or WebRTC.

Integrate voice AI models into live voice agent solutions, IVR automation, and AI contact center platforms.

Optimize for latency, concurrency, and continuous audio streaming with context buffering and voice activity detection (VAD).

Build scalable microservices to process, decode, encode, and stream audio across common codecs (e.g., PCM, Opus, μ-law, AAC, MP3) and containers (e.g., WAV, MP4).

Deep Learning & NLP Architecture:

Utilize transformers, encoder-decoder models, GANs, VAEs, and diffusion models for speech and language tasks.

Implement end-to-end pipelines including text normalization, G2P mapping, NLP intent extraction, and emotion/prosody control.

Fine-tune pre-trained language models for integration with voice-based user interfaces.

Modular System Development:

Build reusable, plug-and-play modules for ASR, TTS, diarization, codecs, streaming inference, and data augmentation.

Design APIs and interfaces for orchestrating voice tasks across multi-stage pipelines with format conversions and buffering.

Develop performance benchmarks and optimize for CPU/GPU, memory footprint, and real-time constraints.

Engineering & Deployment:

Write robust, modular, and efficient Python code.

Work with Docker, Kubernetes, and cloud deployment (AWS, Azure, GCP).

Optimize models for real-time inference using ONNX, TorchScript, and CUDA, including quantization, context-aware inference, and model caching.

Deploy voice models on-device.

Why join us?

Impactful Work: Play a pivotal role in safeguarding Tanla's assets, data, and reputation in the industry.

Tremendous Growth Opportunities: Be part of a rapidly growing company in the telecom and CPaaS space, with opportunities for professional development.

Innovative Environment: Work alongside a world-class team in a challenging and fun environment, where innovation is celebrated.

Tanla is an equal opportunity employer. We champion diversity and are committed to creating an inclusive environment for all employees.
