Role : Multimodal Embedding Engineer
Job Type : Full-time
Location : Bangalore (remote possible for strong candidates)
Core Technical Work :
- Design and implement multi-modal embedding models that create unified vector representations across text, images, video, audio, and other modalities.
- Build and optimize contrastive learning frameworks (e.g., CLIP-style architectures) for cross-modal alignment (an illustrative sketch follows this list).
- Develop embedding spaces that preserve semantic relationships within and across modalities.
- Create efficient indexing and retrieval systems for multi-modal data at scale.
- Fine-tune and adapt foundation models for domain-specific multi-modal applications.
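For a concrete sense of the contrastive-learning work referenced above, here is a minimal sketch of a CLIP-style symmetric InfoNCE loss (PyTorch; the function name, temperature value, and embedding dimensions are illustrative assumptions, not a prescribed implementation):

```python
# Minimal sketch of a CLIP-style symmetric contrastive (InfoNCE) loss.
# Assumes paired image/text embeddings already produced by separate encoders;
# names (image_emb, text_emb, temperature) are illustrative.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    # L2-normalize so dot products are cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise similarity matrix: logits[i, j] = sim(image_i, text_j).
    logits = image_emb @ text_emb.t() / temperature

    # Matching pairs lie on the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)

    # Symmetric cross-entropy: image-to-text and text-to-image.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

# Example: a batch of 8 paired 512-d embeddings.
if __name__ == "__main__":
    img = torch.randn(8, 512)
    txt = torch.randn(8, 512)
    print(clip_contrastive_loss(img, txt).item())
```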
Data & Infrastructure :
- Build data pipelines for processing and aligning multi-modal datasets.
- Implement quality control mechanisms for multi-modal training data.
- Design evaluation frameworks to measure embedding quality across modalities.
- Optimize inference performance for real-time multi-modal search and retrieval (a minimal retrieval sketch follows this list).
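As an illustration of the retrieval side, a minimal sketch of exact cosine-similarity search over precomputed embeddings, assuming FAISS as one possible library (index type, dimensionality, and data are placeholders, not a prescribed stack):

```python
# Illustrative sketch of exact nearest-neighbour retrieval over normalized embeddings.
import numpy as np
import faiss

dim = 512
corpus = np.random.randn(10_000, dim).astype("float32")   # e.g., image embeddings
queries = np.random.randn(5, dim).astype("float32")       # e.g., text query embeddings

# Normalize so inner-product search is equivalent to cosine similarity.
faiss.normalize_L2(corpus)
faiss.normalize_L2(queries)

index = faiss.IndexFlatIP(dim)   # exact search; IVF/HNSW variants are options at scale
index.add(corpus)

scores, ids = index.search(queries, 5)   # top-5 neighbours per query
print(ids.shape)                         # (5, 5)
```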
Research & Innovation :
- Stay current with the latest research in multi-modal learning, vision-language models, and embedding techniques.
- Experiment with novel architectures for improved cross-modal understanding.
- Benchmark different approaches and conduct ablation studies (an evaluation sketch follows this list).
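As an example of the kind of benchmarking involved, a hedged sketch of image-to-text Recall@K over matched embedding pairs (function and variable names are illustrative; a full evaluation harness would cover both retrieval directions and larger candidate pools):

```python
# Sketch of a simple cross-modal evaluation metric: image-to-text Recall@K.
import torch
import torch.nn.functional as F

def image_to_text_recall_at_k(image_emb: torch.Tensor,
                              text_emb: torch.Tensor,
                              k: int = 5) -> float:
    # Row i of image_emb and text_emb is assumed to be a matched pair.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    sims = image_emb @ text_emb.t()                  # cosine similarities
    topk = sims.topk(k, dim=-1).indices              # top-k text candidates per image
    targets = torch.arange(sims.size(0)).unsqueeze(-1)
    hits = (topk == targets).any(dim=-1).float()     # 1 if the true text is in the top k
    return hits.mean().item()

# Example: 100 paired 512-d embeddings.
if __name__ == "__main__":
    img = torch.randn(100, 512)
    txt = torch.randn(100, 512)
    print(image_to_text_recall_at_k(img, txt, k=5))
```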