About The Opportunity :
Join a high-velocity engineering team building robust, low-latency speech and voice solutions for large-scale deployments.
You will design and ship state-of-the-art ASR models and production pipelines, bridging classical signal-processing foundations with modern transformer-based speech models to drive measurable product impact.
Role & Responsibilities :
- Lead design, training and optimisation of ASR systems (end-to-end and hybrid) using transformer and sequence modeling (Wav2Vec 2.0, Whisper, CTC, attention-based encoders/decoders).
- Develop and evaluate speech pre-processing and DSP pipelines (feature extraction, augmentation, denoising, VAD) to improve robustness across noisy, multilingual inputs.
- Prototype and productionise model-serving solutions: containerised inference, latency optimisation, batching, and autoscaling for cloud and edge deployments.
- Collaborate with data engineers and linguists to curate datasets, define annotation guidelines, and run rigorous evaluation (WER, CER, streaming metrics) and error-analysis cycles.
- Implement reproducible training workflows, CI/CD for models, monitoring for drift and performance, and automation for retraining and A/B evaluation.
- Mentor peers, author engineering-excellence patterns (testing, observability), and present technical results to product and stakeholder teams.
Skills & Qualifications :
Must-Have :
- 5+ years in speech recognition or related audio ML roles with proven production impact.
- Strong DSP and audio analysis fundamentals (feature engineering, spectrograms, filtering, VAD).
- Hands-on experience with PyTorch and/or TensorFlow for building and training ASR models.
- Practical knowledge of transformer-based speech models (Wav2Vec 2.0, Whisper) and sequence losses (CTC), plus RNN/CNN architectures.
- Proficient in Python; experience with C++/Java for production deployments is highly desirable.
- Experience deploying models in cloud environments (AWS/GCP) and container orchestration (Docker/Kubernetes); familiar with MLOps tooling and CI/CD.
Preferred :
- Background in multilingual ASR, low-resource languages, or on-device/edge inference optimisation.
- Experience with large-scale data pipelines, annotation platforms, and semi-supervised/self-supervised learning workflows.
- Familiarity with production monitoring (Prometheus/Grafana), model explainability, and privacy-preserving ML techniques.
Benefits & Culture Highlights :
- High-autonomy engineering culture with strong emphasis on ownership, mentorship, and career growth.
- Opportunity to influence product direction and work on state-of-the-art speech models at scale.
- Competitive compensation, flexible hybrid work, and learning budget for conferences and training.
We are seeking a results-oriented Speech Scientist who thrives on technical ownership and delivering dependable voice AI in real-world settings.
Apply if you want to push ASR boundaries and build production-grade speech systems that scale.
(ref : hirist.tech)