Talent.com
Software Engineer - LLM Applications

Talent Corner, Gurgaon
21 days ago
Job description

What you'll do :

  • Build LLM apps : Design APIs, microservices, and UIs that use function calling, tools, and streaming responses.
  • RAG pipelines : Ingest and clean data, chunk documents and generate embeddings, set retrieval strategies (BM25 / hybrid), and tune for relevance & latency.
  • Prompt & policy engineering : Craft prompts, guardrails, and safety checks (PII redaction, jailbreak defense).
  • Model ops : Integrate managed (Azure OpenAI) and open-source (Llama, Mistral) models; choose / optimize runtimes (vLLM / Triton).
  • Evaluation & quality : Establish automatic evals (correctness, toxicity, hallucination, latency, cost / token); build golden test sets and CI gates.
  • Observability : Add tracing, metrics, and logs (OpenTelemetry); set error budgets & SLOs.
  • Security & compliance : Secrets / RBAC, data residency, audit trails; align to SOC2 / GDPR.
  • Cost control : Token budgeting, caching, batching, quantization / LoRA where appropriate.
  • Collaboration : Partner with Product, SecOps, and FinOps; review PRs and mentor juniors.

Minimum qualifications :

  • 4-5 years of software engineering experience (Python or TypeScript) shipping production services.
  • Hands-on LLM experience (1-2 years) : built at least one production feature using OpenAI / Azure OpenAI / Bedrock / Vertex or OSS models.
  • RAG with a vector DB (Pinecone, Redis, pgvector, Weaviate, Milvus) and embedding models.
  • Solid with APIs (REST / GraphQL), Git, testing, and CI / CD (GitHub Actions / Azure DevOps).
  • Cloud fundamentals on Azure / AWS / GCP, containers (Docker, Kubernetes basics).
  • Clear written & verbal communication; comfort with docs and design reviews.

Nice to have :

  • Agent frameworks (LangChain, LlamaIndex, Semantic Kernel, OpenAI Assistants), tools & MCPs.
  • Eval frameworks (Ragas, DeepEval, Promptfoo), A/B testing, offline / online metrics.
  • Fine-tuning / LoRA, distillation, quantization; DSPy; retrieval re-ranking.
  • Event systems (Kafka), queues (SQS), and caching layers.
  • Frontend familiarity (React / Next.js) for rapid prototyping.

Tech stack (example) :

  • Models : Azure OpenAI (GPT-4.x)
  • Orchestration : OpenAI Assistants
  • Data / RAG : Azure Cognitive Search
  • Pipelines : GitHub Actions, Docker, Kubernetes / AKS, Terraform (AVM)
  • Observability : OpenTelemetry, Grafana
  • Testing / Evals : PyTest, SonarCloud

(ref : hirist.tech)
