About The Role
We are seeking a highly experienced Lead Machine Learning Engineer with deep technical expertise in Speech AI, Natural Language Processing (NLP), and Generative AI (GenAI). In this role you will architect and scale a production-grade, speech-based virtual assistant powered by Large Language Models (LLMs), advanced audio signal processing, and multimodal intelligence. You will lead a team of ML engineers and collaborate closely with product, research, and DevOps stakeholders to develop and deploy cutting-edge AI solutions.
Key Responsibilities
ML Architecture & Modeling
- Architect, design, and implement advanced machine learning models across speech recognition (ASR), text-to-speech (TTS), NLP, and multimodal tasks.
- Lead the development and fine-tuning of Transformer-based LLMs, including encoder-decoder architectures for audio and text tasks.
- Build custom audio-LLM interaction frameworks, including techniques like modality fusion, speech understanding, and language generation.
Virtual Assistant Development
- Design and deploy LLM-powered virtual assistants with real-time speech interfaces for dialog, voice commands, and assistive technologies.
- Integrate speech models with backend NLP pipelines to handle complex user intents, contextual understanding, and response generation.
MLOps & Scalable Pipelines
- Design and implement end-to-end ML pipelines covering data ingestion, preprocessing, feature extraction, model training, evaluation, and deployment.
- Develop reproducible and scalable training pipelines using MLOps tools (e.g., MLflow, Kubeflow, Airflow) with robust monitoring and model versioning.
- Drive CI/CD for ML workflows, containerization of models (Docker), and orchestration using Kubernetes or serverless infrastructure.
Research & Innovation
- Stay up to date with state-of-the-art publications in Speech AI, LLMs, and GenAI; evaluate applicability and drive adoption of novel techniques.
- Experiment with cutting-edge self-supervised learning, prompt tuning, parameter-efficient fine-tuning (PEFT), and zero-shot and multilingual speech models.
Required Technical Skills
- 10+ years of hands-on experience in machine learning, with a deep focus on audio (speech) and NLP applications.
- Expertise in Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) systems, including tools like Wav2Vec, Whisper, Tacotron, and FastSpeech.
- Strong knowledge of Transformer architectures such as BERT, GPT, T5, and encoder-decoder LLM variants, including training and fine-tuning at scale.
- Solid programming expertise in Python, with proficiency in deep learning frameworks like PyTorch and TensorFlow.
- In-depth understanding of audio signal processing concepts: MFCCs, spectrograms, wavelets, sampling, filtering, etc.
- Experience with multimodal machine learning, including fusion of speech, text, and contextual signals.
- Proficiency in deploying ML services with Docker and Kubernetes, and experience with distributed training setups on GPU clusters or cloud platforms (AWS, GCP, Azure).
- Proven experience in building production-grade MLOps frameworks and maintaining model lifecycle management.
- Experience with real-time inference, latency optimization, and efficient decoding techniques for audio and NLP systems.
Preferred Qualifications
- Master's or Ph.D. in Computer Science, Machine Learning, Signal Processing, or a related technical discipline.
- Publications or open-source contributions in speech, NLP, or GenAI.
- Familiarity with LLM alignment techniques, RLHF, prompt engineering, and fine-tuning using LoRA, QLoRA, or adapters.
- Prior experience deploying voice-based conversational AI products at scale.
(ref: hirist.tech)
Skills Required
Machine Learning, TensorFlow, PyTorch, GCP, MLOps, Docker, Azure, Python, Kubernetes, AWS