Speech Recognition Consultant

Sony Research India, Thane, IN

Job description

Sony Research India is driving cutting-edge research and development in various locations around the globe, including laboratories in Japan, the United States, Europe, and Asia. We endeavor to create new technology, products, and services while sustaining Sony Group’s diverse businesses in electronics, entertainment, and financial fields. For our research centre to blaze a trail in the latest technologies, we seek to foster the growth of a diverse pool of research and engineering talent and create a technology talent bank to drive research excellence worldwide. Sony Research India is offering outstanding career opportunities around frontline technologies such as AI and data analytics.

Sony Research India is seeking a dynamic and motivated Speech Recognition Consultant to join our innovative research team. As a Consultant, you will work on real-world problems in automatic speech recognition (ASR), focusing on improving noise robustness and minimizing code-switching errors in transcription outputs. You'll gain hands-on experience with state-of-the-art tools and datasets, and contribute to impactful projects alongside experienced researchers and engineers.

Key Responsibilities :

Explore and develop techniques to enhance ASR robustness under noisy, low-resource, and domain-shifted conditions.
Investigate code-switching errors in end-to-end ASR models (e.g., Whisper, Wav2Vec2) and propose mitigation strategies.
Conduct experiments using large-scale speech datasets and evaluate ASR performance across varying noise levels and linguistic diversity.
Contribute to publications, technical reports, or open-source tools as outcomes of the research.

Work Location :

Remote within India.

Duration of the paid contractual role :

This is a paid, direct contract with a one-year tenure, which is extendable.

Ideally, this position will start in the first week of November 2025.

This is a full-time role; working hours are 9:00 to 18:00, Monday to Friday.

Essential Education :

A completed Ph.D., Bachelor’s, or Master’s (Research) degree, with some industry experience in deep learning or machine learning and hands-on expertise with Transformer models applied to audio or speech tasks.

Must Have Skills & Abilities :

Excellent coding skills, especially in Python and PyTorch.

Experience with speech processing libraries (e.g., Torchaudio, ESPnet, Hugging Face Transformers).

Prior experience with ASR models like Wav2Vec2, Whisper, or RNN-T is a plus.

Ability to read and implement academic papers.

Strong foundation in machine learning and signal processing.

Good to Have Skills :

Familiarity with prompt tuning, contrastive learning, or multi-modal architectures.

Experience with multilingual ASR.

Publications at top-tier conferences such as ICASSP, Interspeech, NeurIPS, AAAI, or ACL.

Our Values :

Dreams & Curiosity : Pioneer the future with dreams and curiosity.

Diversity : Pursue the creation of the very best by harnessing diversity and varying viewpoints.

Integrity & Sincerity : Earn the trust for the Sony brand through ethical and responsible conduct.

Sustainability : Fulfil our stakeholder responsibilities through disciplined business practices.

Sony Research India is committed to equal opportunity in all its employment practices, policies and procedures and to ensuring that no worker or potential worker will receive less favourable treatment due to any characteristic protected under applicable local laws.
