Junior ML Engineer – LLM Infrastructure & Orchestration
About Us
We are a legal AI platform that ingests entire contracts and runs long-context, multimodal LLM pipelines on AWS Bedrock (Claude) and Vertex AI (Gemini).
We operate schema-constrained LLM systems: prompts define intent, and Pydantic models enforce structure, validation, and reliability across production workflows.
We’re hiring an ML Engineer (~1 year of experience) to own LLM orchestration, latency, and scaling for workflows already live with customers. Candidates should be available to join immediately or within 1 month.
This role is production ML systems engineering, not model training.
What You’ll Do
- Build and operate end-to-end LLM pipelines for full-document analysis (100–500+ page contracts)
- Implement schema-first LLM inference using Pydantic to produce deterministic, typed outputs (see the sketch after this list)
- Own LLM orchestration logic: prompt routing, validation, retries, fallbacks, and partial re-execution
- Optimize latency, throughput, and cost for long-context inference (batching, streaming, async execution)
- Build and scale OCR → document parsing → LLM inference pipelines for scanned leases (Textract)
- Develop streaming and async APIs using FastAPI
- Manage distributed background workloads with Celery (queues, retries, idempotency, backpressure)
- Productionize report generation (DOCX / Excel) as deterministic pipeline outputs
- Deploy, monitor, and scale inference workloads on AWS (Bedrock, EC2, S3, Lambda)
- Debug production issues: timeouts, schema failures, partial extractions, cost spikes
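To make the schema-first pattern above concrete, here is a minimal, illustrative sketch of how Pydantic validation and retries wrap a non-deterministic LLM call. Names such as ClauseExtraction and call_llm are hypothetical placeholders, not our production code:

```python
from pydantic import BaseModel, ValidationError

# Illustrative output schema: every LLM response must parse into this model.
class ClauseExtraction(BaseModel):
    clause_type: str
    page_number: int
    summary: str

def call_llm(prompt: str) -> str:
    """Placeholder for the actual Bedrock/Vertex call; returns raw model text."""
    raise NotImplementedError

def extract_clause(prompt: str, max_retries: int = 3) -> ClauseExtraction:
    """Treat the LLM as a non-deterministic service: validate output, retry on schema failure."""
    last_error: Exception | None = None
    for _ in range(max_retries):
        raw = call_llm(prompt)
        try:
            # model_validate_json enforces the prompt <-> schema contract
            return ClauseExtraction.model_validate_json(raw)
        except ValidationError as err:
            last_error = err  # fall through and retry (optionally with a repair prompt)
    raise RuntimeError(f"LLM output failed schema validation after {max_retries} attempts") from last_error
```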
What You’ll Own Technically
- Pydantic-based schemas for all LLM outputs
- Prompt ↔ schema contracts and versioning
- Validation, retry, and fallback mechanisms
- Latency and cost optimization for long-context inference
- Reliability of OCR + LLM pipelines at scale
Must Have
- Strong Python and async programming fundamentals
- ~1 year of experience working on production ML or LLM systems
- Hands-on experience with Claude, Gemini, and AWS Bedrock
- Experience with schema-constrained LLM outputs (Pydantic, JSON Schema, or similar)
- Experience with OCR and document-heavy pipelines
- Experience with Celery or distributed async job systems
- Comfort treating LLMs as non-deterministic services requiring validation and retries
- Individual contributor mindset in a lean startup
- Available to join immediately or within 1 month
Nice to Have (Strong ML Signals)
- Experience with streaming LLM responses
- Familiarity with long-context failure modes and truncation issues
- Experience with LLM output evaluation or regression testing
- Cost monitoring and optimization for LLM inference
Why Join Us
- Work on real production ML systems, not demos
- Own core LLM infrastructure end-to-end
- Direct exposure to long-context, document-scale AI
- Fully remote, fast-paced startup
- CTC: ₹9,00,000 – ₹12,00,000 (based on experience & impact)