Job Title: VLM Research Engineer
Location: Vapi, Gujarat
Employment Type: Full-Time
Overview
We are seeking a highly skilled VLM Research Engineer to build multimodal (vision-language-action) models for instruction following, scene grounding, and tool use across robotic platforms. The role involves developing advanced models that bridge perception and language understanding for autonomous systems.
Key Responsibilities
Pretrain and fine-tune VLMs, aligning them with robotics data including video, teleoperation, and language.
Build perception-to-language grounding for referring expressions, affordances, and task graphs.
Develop Toolformer-style tool-use and actuator interfaces that convert language intents into executable skills and motion plans.
Create evaluation pipelines for instruction following, safety filtering, and hallucination control.
Collaborate with cross-functional teams to integrate models into robotics platforms.
Must-Haves
Master’s or PhD in a relevant field.
1–2+ years of experience in Computer Vision / Machine Learning.
Strong proficiency in PyTorch or JAX; experience with LLMs and VLMs.
Familiarity with multimodal datasets, distributed training, and reinforcement/imitation learning (RL/IL).
Nice-to-Haves
Experience with world models, diffusion-policy integration, and speech interfaces.
Familiarity with sim-to-real transfer.