Job Title: VLM Research Engineer
Location: Vapi, Gujarat
Employment Type: Full-Time
Overview
We are seeking a highly skilled VLM Research Engineer to build multimodal (vision-language-action) models for instruction following, scene grounding, and tool use across platforms. The role involves developing advanced models that bridge perception and language understanding for autonomous systems.
Key Responsibilities
- Pretrain and finetune VLMs, aligning them with robotics data including video, teleoperation, and language.
- Build perception-to-language grounding for referring expressions, affordances, and task graphs.
- Develop Toolformer / actuator interfaces to convert language intents into actionable skills and motion plans.
- Create evaluation pipelines for instruction following, safety filters, and hallucination control.
- Collaborate with cross-functional teams for integration of models into robotics platforms.
Must-Haves
- Master’s or PhD in a relevant field.
- 1–2+ years of experience in Computer Vision / Machine Learning.
- Strong proficiency in PyTorch or JAX; experience with LLMs and VLMs.
- Familiarity with multimodal datasets, distributed training, and RL / IL.
Nice-to-Haves
- Experience with world models, diffusion-policy integration, and speech interfaces.
- Familiarity with sim-to-real transfer in robotics applications.
Success Metrics
- Success@k on language-based tasks.
- Grounding precision and latency.
- Sim-to-real performance retention.
Domain Notes
- Humanoids: Language-guided manipulation and tool use.
- AGVs (Autonomous Ground Vehicles): Natural language tasking for warehouse operations; semantic maps.
- Cars: Gesture and sign interpretation; driver interaction.
- Drones: Natural language mission specification; target search and inspection.
Application Instructions
Interested candidates may apply by sending their resume and cover letter to parijat.patel@merai.co with the subject line “VLM Research Engineer Application”.