Lead ML Engineer

Fission Labs, Hyderabad, Telangana, India
Job description

Role : AI / ML Lead

Experience : 7-11 years

Designation : Associate Architect

Location : Hyderabad & Pune

______________________________________________

GEN AI & FINE-TUNING OF LLMs IS A MUST

______________________________________________

Key Responsibilities :

Architecture & Infrastructure

  • Design, implement, and optimize end-to-end ML training workflows including infrastructure setup, orchestration, fine-tuning, deployment, and monitoring.
  • Evaluate and integrate multi-cloud and single-cloud training options across AWS and other major platforms.
  • Lead cluster configuration, orchestration design, environment customization, and scaling strategies.
  • Compare and recommend hardware options (GPUs, TPUs, accelerators) based on performance, cost, and availability.

Technical Expertise Requirements

  • At least 4-5 years of experience in AI / ML infrastructure and large-scale training environments.
  • Expert in AWS cloud services (EC2, S3, EKS, SageMaker, Batch, FSx, etc.) and familiar with Azure, GCP, and hybrid / multi-cloud setups.
  • Strong knowledge of AI / ML training frameworks (PyTorch, TensorFlow, Hugging Face, DeepSpeed, Megatron, Ray, etc.).
  • Proven experience with cluster orchestration tools (Kubernetes, Slurm, Ray, SageMaker, Kubeflow).
  • Deep understanding of hardware architectures for AI workloads (NVIDIA, AMD, Intel Habana, TPU).
LLM Inference Optimization

  • Expert knowledge of inference optimization techniques including speculative decoding, KV cache optimization (MQA / GQA / PagedAttention), and dynamic batching.
  • Deep understanding of prefill vs decode phases, memory-bound vs compute-bound operations.
  • Experience with quantization methods (INT4 / INT8, GPTQ, AWQ) and model parallelism strategies.
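As a hedged illustration of the quantization work referenced above, the following is a minimal sketch of loading a model in 4-bit precision with Hugging Face Transformers and bitsandbytes (NF4 here; GPTQ and AWQ checkpoints use their own dedicated loaders). The model name is a placeholder.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "meta-llama/Llama-2-7b-hf"  # illustrative model; substitute the one actually used

# 4-bit NF4 quantization config (one of several INT4 routes; GPTQ / AWQ
# ship pre-quantized checkpoints and are loaded through their own tooling).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across available GPUs
)

inputs = tokenizer("Quantization reduces memory usage by", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```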
Inference Frameworks

  • Hands-on experience with production inference engines : vLLM, TensorRT-LLM, DeepSpeed-Inference, or TGI.
  • Proficiency with serving frameworks : Triton Inference Server, KServe, or Ray Serve.
  • Familiarity with kernel optimization libraries (FlashAttention, xFormers).
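For context on the serving side, here is a minimal offline-inference sketch with vLLM, which applies PagedAttention-based KV cache management and continuous (dynamic) batching internally; the model name is illustrative.

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-7b-hf",   # illustrative model
    gpu_memory_utilization=0.90,        # fraction of GPU memory for weights + KV cache
    max_model_len=4096,
)

sampling = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=128)

prompts = [
    "Explain KV cache paging in one sentence.",
    "What limits decode-phase throughput?",
]

# Requests across the prompt list are batched together automatically.
for output in llm.generate(prompts, sampling):
    print(output.prompt, "->", output.outputs[0].text.strip())
```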
Performance Engineering

  • Proven ability to optimize inference metrics : TTFT (time to first token), ITL (inter-token latency), and throughput.
  • Experience profiling and resolving GPU memory bottlenecks and OOM issues.
  • Knowledge of hardware-specific optimizations for modern GPU architectures (A100 / H100).
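A minimal, framework-agnostic sketch of measuring TTFT and ITL from a streaming response; `stream_tokens` below is a hypothetical stand-in for whatever streaming client the serving stack exposes (e.g. an OpenAI-compatible completions stream from vLLM or TGI).

```python
import time
from typing import Iterable, Iterator


def measure_latency(token_stream: Iterator[str]) -> dict:
    """Compute TTFT, mean ITL, and rough throughput from a token iterator."""
    start = time.perf_counter()
    arrival_times = []
    for _ in token_stream:
        arrival_times.append(time.perf_counter())
    if not arrival_times:
        raise ValueError("stream produced no tokens")
    ttft = arrival_times[0] - start
    itl = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    total = arrival_times[-1] - start
    return {
        "ttft_s": ttft,
        "mean_itl_s": sum(itl) / len(itl) if itl else 0.0,
        "tokens_per_s": len(arrival_times) / total,
    }


if __name__ == "__main__":
    def stream_tokens() -> Iterable[str]:
        # Hypothetical placeholder for a real streaming client.
        for tok in ["Hello", " ", "world", "!"]:
            time.sleep(0.02)  # stand-in for model decode time
            yield tok

    print(measure_latency(stream_tokens()))
```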
Fine-Tuning

  • Drive end-to-end fine-tuning of LLMs, including model selection, dataset preparation / cleaning, tokenization, and evaluation with baseline metrics.
  • Configure and execute fine-tuning experiments (LoRA, QLoRA, etc.) on large-scale compute setups, ensuring optimal hyperparameter tuning, logging, and checkpointing.
  • Document fine-tuning outcomes by capturing performance metrics (loss, BERTScore / ROUGE, training time, resource utilization) and benchmarking against baseline models.
  • If you have only built POCs and not production-ready ML models that scale, please do not apply.
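As an illustrative sketch of a LoRA-style fine-tuning setup of the kind described above, assuming Hugging Face Transformers and PEFT; the model name and hyperparameters are placeholders meant to be swept per experiment.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

MODEL_ID = "meta-llama/Llama-2-7b-hf"  # illustrative base model

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
base = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

# LoRA adapter config: rank, scaling, dropout, and which projection matrices
# to adapt are the main hyperparameters to sweep across experiments.
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # confirms only adapter weights will train

# From here the model drops into a standard training loop or transformers.Trainer,
# with checkpointing and metric logging (loss, ROUGE / BERTScore on a held-out set).
```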
