We are looking for someone obsessed with turning messy real-world documents into perfectly structured, actionable data.
If you live and breathe document layout analysis, OCR post-processing, and visual + language models, this role is custom-built for you.
Core Responsibilities
Design and build production-grade Document Intelligence pipelines (invoices, contracts, forms, reports, handwritten documents, tables, multi-language content, etc.)
Train, fine-tune, and deploy layout-aware models (LayoutLMv3, Donut, LLaMA-Adapter, Nougat, etc.)
Build and optimize Vision-Language models (VLLMs) on custom enterprise document datasets
Improve OCR accuracy using layout context, post-correction with LLMs, and geometric reasoning
Own the full stack: data preparation → model training (PyTorch) → evaluation → ONNX / TensorRT optimization → FastAPI deployment
Push boundaries on table extraction, key-value pairing, nested hierarchies, and multi-page document understanding
Must-Have Skills & Experience
Very strong understanding of document layout analysis (bounding boxes, reading order, logical blocks, nested tables, headers / footers, multi-column detection)
Hands-on experience with modern Document AI architectures:
LayoutLMv1 / v2 / v3, DocFormer, LayoutXLM, Donut, Pix2Struct, Nougat, UDOP, etc.
Vision-Language models (LLaVA, Qwen-VL, InternVL, PaliGemma, etc.)
Deep experience fine-tuning and serving LLMs & VLLMs (Llama-3, Mistral, Phi-3-vision, Qwen, etc.) using PEFT (LoRA / QLoRA), vLLM, TGI, or Ollama
Strong PyTorch proficiency (custom trainers, distributed training with DDP / FSDP, torch.compile, mixed precision)
Solid grasp of OCR ecosystems and post-processing (Tesseract, EasyOCR, PaddleOCR, AWS Textract / Google Document AI limitations and how to beat them)
Experience building datasets from real enterprise documents (labelling tools: UBIAI, Label Studio, Doccano, custom UI)
Good applied math: Transformers, attention mechanisms, positional encodings (especially 2D layouts), RoPE, ALiBi
Nice-to-Have (Big Bonus)
Published research or open-source contributions in Document AI / VLLM space
Experience with multimodal RAG over documents
ONNX / TensorRT / DeepSpeed optimization for low-latency inference
Kubernetes + GPU scheduling (we run our own bare-metal cluster)
Who thrives here?
You get excited when you see a 50-page scanned purchase order with overlapping stamps and handwritten notes — because you already know exactly how you’re going to destroy it.
Perks
Work directly on enterprise deals worth crores — your model = real revenue impact
Unlimited GPU access (A100s & H100s in-house)
Senior AI Engineer • Hyderabad, Telangana, India