Machine Learning Engineer III – LLM Training (RL + PEFT)
📍 On-site, Bangalore
🏢 LatentForce
About the Role
We are building specialized LLMs that understand and reason over massive enterprise codebases. This is real model training — RL loops, PEFT, verifiable rewards, long-context modeling — not API integration. You’ll own end-to-end experimentation and work directly with founders.
Responsibilities
- Train LLMs using RL (PPO / GRPO / RLHF / RLVR) and PEFT (LoRA, QLoRA, DoRA, IA3).
- Build custom training loops with PyTorch, HuggingFace, TRL, Unsloth.
- Design reward functions and verifiers for code-understanding tasks (a toy version is sketched after this list).
- Run full-stack ML experiments: data → training → eval → iteration.
- Develop scalable training infra (FSDP / DeepSpeed, distributed training).
- Build evaluation suites for reasoning and code comprehension.
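To give a flavor of the reward-design and PEFT work above, here is a minimal, illustrative sketch using TRL's GRPOTrainer with a LoRA adapter and a toy verifiable reward (checking that a generated completion parses as Python). The base model, dataset, and reward below are placeholders for illustration, not our actual training setup.

```python
# Illustrative sketch only: a toy GRPO + LoRA loop with a verifiable reward.
# Base model, dataset, and reward are placeholders, not our production pipeline.
import ast

from datasets import Dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# Tiny placeholder prompt set (real work uses enterprise-codebase data).
train_dataset = Dataset.from_dict({
    "prompt": [
        "Write a Python function that returns the sum of a list.",
        "Write a Python function that reverses a string.",
    ]
})

def parses_as_python(completions, **kwargs):
    """Verifiable reward: 1.0 if the completion is syntactically valid Python, else 0.0."""
    rewards = []
    for completion in completions:
        try:
            ast.parse(completion)
            rewards.append(1.0)
        except SyntaxError:
            rewards.append(0.0)
    return rewards

# LoRA adapter instead of full fine-tuning (PEFT).
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder base model
    reward_funcs=parses_as_python,
    args=GRPOConfig(output_dir="grpo-lora-demo", num_generations=4),
    train_dataset=train_dataset,
    peft_config=peft_config,
)
trainer.train()
```

In the actual role, the rewards and verifiers target real code-understanding tasks and training runs under FSDP / DeepSpeed across nodes; the sketch only shows the shape of the loop.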
Minimum Qualifications
- 3+ years of real deep learning experience (actual model training).
- Strong fundamentals: linear algebra, probability, optimization, statistics.
- Proven experience training transformers or large DNNs from scratch or from checkpoints.
- Proficiency in PyTorch, HuggingFace, TRL, Unsloth.
- Experience implementing RL algorithms or custom training pipelines.
- Research exposure (publications / preprints) or strong open-source work.
- Ability to debug training issues (NaNs, KL drift, reward hacking, etc.).
- Startup mindset; comfortable with fast, on-site, high-performance execution.
Nice to Have
- DeepSpeed / FSDP, model parallelism, vLLM.
- Program analysis / AST tooling.
- Long-context modeling experience.
Why Join Us
- Build specialized LLMs at a well-funded early-stage company.
- Direct work with founders; high ownership and technical depth.
- High-impact role shaping core training architecture.
Apply here: https://forms.gle/KUyohXyBjbU8gFC69