We are looking for someone obsessed with turning messy real-world documents into perfectly structured, actionable data.
If you live and breathe document layout analysis, OCR post-processing, and vision-language models, this role is custom-built for you.
Core Responsibilities
- Design and build production-grade Document Intelligence pipelines (invoices, contracts, forms, reports, handwritten notes, tables, multilingual documents, etc.)
- Train, fine-tune, and deploy layout-aware document models (LayoutLMv3, Donut, Nougat, etc.), including adapter-based approaches such as LLaMA-Adapter
- Build and optimize vision-language models (VLLMs) on custom enterprise document datasets
- Improve OCR accuracy using layout context, post-correction with LLMs, and geometric reasoning
- Own the full stack: data preparation → model training (PyTorch) → evaluation → ONNX / TensorRT optimization → FastAPI deployment
- Push boundaries on table extraction, key-value pairing, nested hierarchies, and multi-page document understanding
Must-Have Skills & Experience
- Very strong understanding of document layout analysis (bounding boxes, reading order, logical blocks, nested tables, headers / footers, multi-column detection)
- Hands-on experience with modern Document AI architectures:
  - LayoutLMv1 / v2 / v3, DocFormer, LayoutXLM, Donut, Pix2Struct, Nougat, UDOP, etc.
  - Vision-language models (LLaVA, Qwen-VL, InternVL, PaliGemma, etc.)
- Deep experience fine-tuning LLMs & VLLMs (Llama-3, Mistral, Phi-3-vision, Qwen, etc.) with PEFT (LoRA / QLoRA) and serving them with vLLM, TGI, or Ollama
- Strong PyTorch proficiency (custom trainers, distributed training with DDP / FSDP, torch.compile, mixed precision)
- Solid grasp of OCR ecosystems and post-processing (Tesseract, EasyOCR, PaddleOCR), including the limitations of AWS Textract / Google Document AI and how to beat them
- Experience building datasets from real enterprise documents (labelling tools: UBIAI, Label Studio, Doccano, custom UIs)
- Good applied math: Transformers, attention mechanisms, positional encodings (especially 2D layout encodings), RoPE, ALiBi
Nice-to-Have (Big Bonus)
- Published research or open-source contributions in the Document AI / VLLM space
- Experience with multimodal RAG over documents
- ONNX / TensorRT / DeepSpeed optimization for low-latency inference
- Kubernetes + GPU scheduling (we run our own bare-metal cluster)
Who thrives here?
You get excited when you see a 50-page scanned purchase order with overlapping stamps and handwritten notes — because you already know exactly how you’re going to destroy it.
Perks
- Work directly on enterprise deals worth crores: your models have real revenue impact
- Unlimited GPU access (A100s & H100s in-house)