Responsibilities
- Deploy and scale LLM inference workloads on Kubernetes (K8s) with 99.9% uptime.
- Build agentic tools and services for fraud investigations with complex reasoning capabilities.
- Work with Platform Engineers to set up monitoring and observability (e.g., Prometheus, Grafana) to track model performance and system health.
- Fine-tune open-source LLMs using TRL or similar libraries.
- Use Terraform for infrastructure-as-code to support scalable ML deployments.
- Contribute to tech blogs, especially technical deep dives into the latest research on reasoning.
Requirements
- Strong programming skills (Python, etc.) and problem-solving abilities.
- Hands-on experience with open-source LLM inference and serving frameworks such as vLLM.
- Deep expertise in Kubernetes (K8s) for orchestrating LLM workloads.
- Some familiarity with fine-tuning and deploying open-source LLMs using GRPO, TRL, or similar frameworks.
- Knowledge of high-availability systems.
Skills Required
Python, Kubernetes, Problem Solving