About VolumX
VolumX is India’s first and leading company for digital humans, trusted by Oscar-winning studios and global brands. Using some of the most advanced technologies, we create lifelike digital doubles of celebrities that power face replacement, de-aging, and autonomous digital humans.
Now we’re pushing into AI-driven performance synthesis, and we’re inviting talented engineers to join us in shaping the future of content and immersive interactions.
Requirements
- Strong experience in AI / ML model development (PyTorch, TensorFlow).
- Practical experience with video generation, face replacement, and digital doubles.
- Understanding of Gaussian splats / NeRF / neural rendering concepts.
- Knowledge of emotion recognition & sentiment analysis models.
- Experience with real-time inference optimization (ONNX, TensorRT, quantization).
- Background in speech processing (ASR, vocoders, prosody control) for training facial expression models.
- Familiarity with lip-sync engines (e.g., Rhubarb, Wav2Lip, custom phoneme alignment).
- Strong programming skills in Python.
- Experience in multimodal AI (audio + video + text).
- Experience deploying AI in cloud environments (AWS / GCP / Azure, Docker, Kubernetes).
- Hands-on expertise with STT / TTS frameworks (LiveKit, Whisper, Riva, Coqui TTS, Tacotron, FastSpeech, VITS, etc.).
Responsibilities
Face & Video Models
- Train person-specific models for face reenactment, face swapping, and de-aging.
- Build high-resolution, temporally consistent face replacement pipelines.
- Test and implement neural rendering pipelines using Gaussian splats, NeRFs, and diffusion models.
Lip Sync & Emotions
- Implement or adapt lip-sync models (e.g., Wav2Lip-style, phoneme / viseme-based).
- Research and integrate emotion recognition models (from audio / text input).
- Map emotion states into facial rigs and lip-sync engines.
- Interface outputs with rigs / MetaHumans via blendshapes or bone controls.
Pipeline Engineering
- Design low-latency inference pipelines for STT → NLU → TTS.
- Optimize models for real-time streaming (GPU / TPU / cloud deployment).
- Work with backend engineers to expose AI services via APIs / WebSockets.
Collaboration & Integration
- Partner with Unreal engineers to sync AI outputs with Pixel Streaming.
- Ensure smooth coordination between voice, facial animation, and emotional response.