Job Title : Computer Vision Research Engineer
Location : Noida
Experience : 2-5 Years
Qualification : B.Tech
Job description
Core Responsibilities :
- Architect and develop novel Deep Learning (DL) and Machine Learning (ML) neural network architectures, including those for generative AI.
- Design, develop, and refine Vision-Language Model (VLM) architectures, and integrate them with generative AI techniques for advanced multimodal understanding and generation.
- Spearhead the New Product Development (NPD) process, delivering innovative, differentiated products / solutions that provide distinct consumer advantages and exceed market benchmarks.
- Engineer and implement advanced computer vision and deep learning algorithms to resolve intricate scene understanding challenges and their multimodal interpretations.
- Evaluate experimental visual and contextual data to discern and execute performance enhancement strategies.
- Facilitate the transition of algorithms into operational prototypes and / or real-time system demonstrations.
- Establish and manage a cohesive team of resources to ensure seamless integration with related functions.
Job Requirements :
- B.Tech in AI and Computer Science, Engineering, or a related discipline.
- 2-5 years of experience in computer vision, generative AI, Vision-Language Models (VLMs), multimodal systems, and image processing.
- A strong background in designing image understanding, multimodal, and cross-domain learning algorithms.
- Hands-on experience with deployable, real-time Vision-Language Models (VLMs).
- Practical experience with acceleration frameworks such as NVIDIA TensorRT, DeepStream, and Intel OpenVINO is preferred.
- Proficiency in programming languages such as Python or C++.
- Experience with Docker and Kubernetes.
- Knowledge of pattern recognition and Edge AI product development.
- Ability to review code, design architectures, and document core components.
- Familiarity with research and development practices and industry standards.
- Preferred : Hands-on experience with Natural Language Processing (NLP) and the integration of visual and textual signals for multimodal systems.
(ref : hirist.tech)