Computer Vision & Multimodal LLM Intern (Engineering Drawing Analysis Agent)

doAZ
Visakhapatnam, IN
15 hours ago
Job description

Computer Vision & Multimodal LLM Intern (Drawing Change Analysis Agent)

About Doaz

Doaz turns fragmented industrial knowledge into instant, actionable insight. We build LLM- and Vision-AI solutions for construction, heavy industry, and finance—helping teams convert drawings, specs, and regulations into real-time decisions. We’re expanding our GeoAI programs (incl. joint work with POSCO E&C) and launching drawing-change detection services that compare plan versions, detect deltas, and explain design impacts.

Why You’ll Love Working Here

  • Ship real things: Your models and tools can reach production pilots in weeks.
  • Mentorship, not bureaucracy: Learn directly from senior CV / LLM engineers and domain SMEs.
  • Global crew: 30 teammates across KR 🇰🇷 / PK 🇵🇰 / IN 🇮🇳; English-first collaboration.
  • Tech playground: Hands-on work with YOLO / RT-DETR, Gemma-VL / Qwen-VL / LLaVA, PaddleOCR, LayoutLMv3, and Triton.

Role Overview

As a CV & Multimodal LLM Intern, you’ll support the end-to-end development of a version-aware drawing-diff engine (PDF / DWG raster & vector), symbol / text extraction, and change-impact narratives powered by RAG / LLM. You’ll prototype, evaluate, and iterate with fast feedback from real engineering users.

What You’ll Do (Intern Scope)

  • Drawing Change Analysis (CV): assist in rasterization, layer parsing, vector geometry ops; train / evaluate detectors (YOLOv8 / RT-DETR / SAM); implement geometry-aware post-processing (IoU / topology / snapping).
  • Document & Layout Understanding: combine OCR (PaddleOCR / Tesseract) with layout models (DocFormer / LayoutLMv3 / Donut); normalize to structured JSON; help with version-aware entity tracking (gridlines, BH IDs, coordinates).
  • GeoAI & LLM / RAG: set up retrieval (BM25 + vector with reranking); ground LLM answers with citations and clickable evidence; draft change-impact summaries with rule prompts + LLM verification.
  • Productization Basics: package prototypes as FastAPI services or notebooks; write READMEs; contribute datasets, labeling guides, and simple A / B or ablation tests.

Minimum Qualifications

  • BS / MS student or recent graduate in CS / EE / CE / Geoinformatics / Civil (or similar).
  • Solid Python (3.x); foundations in DS / algorithms, linear algebra, probability.
  • Coursework / projects in CV and / or document AI (detection, segmentation, OCR, layout).
  • Familiar with PyTorch or TensorFlow; Git, Linux, Jupyter.
  • Clear written English; high learning velocity and ownership.

Nice to Have

  • Hands-on with YOLO / RT-DETR / Detectron2 / SAM; PaddleOCR / Tesseract; LayoutLMv3 / Donut.
  • Exposure to VLMs (Gemma-VL, Qwen-VL, LLaVA), CLIP, rerankers.
  • Experience with engineering drawings / CAD / PDF toolchains.
  • Basic FastAPI, Docker, ONNX / TensorRT / Triton.
  • Frontend (TypeScript / React) for quick review UIs.

Internship Details & Benefits

  • Type / Duration: Paid internship, 4 months (full-time preferred).
  • Compensation (India): Stipend prorated from 6 LPA (INR 600,000 annualized), paid monthly (≈ INR 50,000 / month during the internship).
  • For candidates outside India, compensation will be benchmarked to local market equivalents.
  • Conversion: High performers will receive a full-time offer upon successful completion of the 4-month internship.
  • Perks: Mentorship, cloud / GPU credits, real production impact.

Hiring Process (fast)

  • Intro call (15–20 min).
  • 48-hour mini task: simple drawing diff or OCR / layout extraction + short README (clarity > polish).
  • Tech chat (45–60 min): approach, trade-offs, evaluation.
  • Founder chat on culture & goals.
  • Offer.

How to Apply

Email doaz@doaz.ai with subject [CV / LLM Intern – Your Name] and include:

  • Résumé / CV (highlight courses / projects; metrics if available).
  • GitHub or demo links (CV / doc-AI / RAG preferred).
  • Availability (start date, weekly hours).
  • (Optional) A one-page diagram of your “Drawing Revision → Detection → Evidence → LLM Narrative” pipeline.

Ready to learn fast and turn messy drawings into trusted intelligence? Join Doaz and build with us.