Why this role
We're building agentic AI for recruitment workflows: sourcing, screening, interview assistance, and offer orchestration. You'll own LLM/agent design, retrieval, evaluation, safety, and targeted traditional ML models where they outperform or complement LLMs.
What you'll do
- Hands-on AI (70–80%): design and build agent workflows (tool use, planning/looping, memory, self-critique) using multi-agent frameworks (e.g., LangChain, LangGraph; experience with similar ecosystems such as AutoGen or CrewAI is a plus).
- Retrieval & context (RAG): chunking, metadata, hybrid search, query rewriting, reranking, and context compression.
- Traditional ML: design and ship supervised/unsupervised models for ranking, matching, dedup, scoring, and risk/quality signals.
- Feature engineering, leakage control, CV strategy, imbalanced learning, and calibration.
- Model families: Logistic/Linear, Tree ensembles, kNN, SVMs, clustering, basic time-series.
- Evaluation & quality: offline/online evals (goldens, rubrics, A/B), statistical testing, human-in-the-loop; build small, high-signal datasets.
- Safety & governance: guardrails (policy/PII/toxicity), prompt hardening, hallucination containment; bias/fairness checks for ML.
- Cost/perf optimization: model selection/routing, token budgeting, latency tuning, caching, semantic telemetry.
- Light MLOps (in-collab): experiment tracking, model registry, reproducible training; coordinate batch/real-time inference hooks with platform team.
- Mentorship: guide 2–3 juniors on experiments, code quality, and research synthesis.
- Collaboration: pair with full-stack/infra teams for APIs/deploy; you won't own K8s/IaC.
What you've done (must-haves)
- 8–10 years in software/AI, with a recent deep focus on LLMs/agentic systems, plus delivered traditional ML projects.
- Strong Python; solid stats/ML fundamentals (bias-variance, CV, A/B testing, power, drift).
- Built multi-agent or tool-using systems with LangChain and/or LangGraph (or equivalent), including function/tool calling and planner/executor patterns.
- Delivered RAG end-to-end with vector databases (pgvector/FAISS/Pinecone/Weaviate), hybrid retrieval, and cross-encoder re-ranking.
- Trained and evaluated production ML models using scikit-learn and tree ensembles (XGBoost/LightGBM/CatBoost); tuned via grid/Bayes/Optuna.
- Set up LLM and ML evals (RAGAS/DeepEval/OpenAI Evals or custom), with clear task metrics and online experiments.
- Implemented guardrails & safety and measurable quality gates for both LLM and ML features.
- Product sense: translate use-cases into tasks/metrics; ship iteratively with evidence.
Nice to have
- Re-ranking (bi-encoders/cross-encoders), ColBERT; semantic caching; vector DBs (pgvector/FAISS/Pinecone/Weaviate).
- Light model serving (vLLM/TGI) and adapters (LoRA); PyTorch experience for small finetunes.
- Workflow engines (Temporal/Prefect); basic time-series forecasting; causal inference/uplift modeling for experiments.
- HRTech exposure (ATS/CRM, interview orchestration, assessments).
Skills Required
XGBoost, Python