Talent.com
No longer accepting applications
Machine Learning Infrastructure Engineer

CareerXperts Consulting
Hyderabad, Republic Of India, IN
10 days ago
Job description

Role Focus: Production ML Systems | GPU Orchestration | Inference at Scale

What You'll Actually Do (Not Buzzwords)

Infrastructure That Doesn't Break

  • Design and maintain the backbone for training, fine-tuning, and deploying ML models that actually work in production
  • Orchestrate GPU workloads on Kubernetes (EKS) with node autoscaling, intelligent bin-packing, and cost-aware scheduling (spot instances, preemptibles—you know the drill)
  • Build CI/CD pipelines that handle ML code, data versioning, and model artifacts like a well-oiled machine (GitHub Actions, Argo Workflows, Terraform)

Production ML, Not Science Projects

  • Partner with Data Scientists and ML Engineers to turn Jupyter notebooks into production-grade systems
  • Deploy and scale inference backends (vLLM, Hugging Face, NVIDIA Triton) that serve real traffic
  • Optimize GPU utilization because every idle A100 hour is money burning
  • Build observability that actually tells you why things broke (Prometheus, Grafana, OpenTelemetry)

Ship Fast, Sleep Well

  • Create tooling for seamless model deployment, instant rollback, and A/B testing
  • Lead incident response when production AI systems decide to have opinions
  • Work with security and compliance teams to implement best practices without slowing down innovation

What We're Really Looking For

Must-Haves (No Negotiation)

  • 5+ years in MLOps, infrastructure, or platform engineering: you've been in the trenches
  • Production ML experience: At least one project that's serving real users, not a Kaggle competition
  • Kubernetes expertise with GPUs: You understand taints, tolerations, affinity rules, and why GPU scheduling is its own special hell
  • Cloud-native architecture (AWS preferred): You think in VPCs, IAM roles, and cost optimization
  • Training pipeline experience: You've set up or scaled training/fine-tuning for ML models in production (PyTorch Lightning, Hugging Face Accelerate, DeepSpeed)
  • IaC fluency: Terraform, Helm, Kustomize are second nature
  • Python engineering skills: You can debug a distributed training failure and fix it
  • Inference scaling: You've deployed and scaled inference workloads and lived to tell the tale

The "We're Very Interested" Signals

  • You mention scaling inference and we can see the fire in your eyes
  • You've used MLflow, W&B, or SageMaker Experiments and have opinions on which is best
  • You understand CI/CD for ML and why it's different from regular software
  • You've built monitoring systems that caught issues before users did

Nice to Have (But Seriously Nice)

  • GPU scheduling wizardry in Kubernetes
  • Model drift monitoring and versioning tools
  • Low-latency inference optimization (quantization, FP8, TensorRT—the good stuff)
  • Experience in compliance or regulated industries where "just ship it" isn't an option

What Makes This Role Different

Ownership. You're not a ticket-taker or a consultant passing through. You'll own infrastructure that powers real AI products, make architectural decisions that matter, and have the autonomy to build things the right way.

Impact. Your work directly affects model training speed, inference latency, GPU costs, and system reliability. You'll see the results of your optimizations in dollars saved and milliseconds gained.

Quality over speed. We value security, operational excellence, and sustainable systems. No "move fast and break things" chaos here; we move deliberately and build things that last.

The Reality Check

This role is not for you if:

  • You prefer working on proofs-of-concept over production systems
  • You think "it works on my machine" is an acceptable answer
  • You haven't shipped ML systems to production
  • You're looking for pure research or pure DevOps (this is the intersection)

This role is for you if:

  • You get excited about making GPUs go brrr efficiently
  • You've been oncall for ML systems and learned hard lessons
  • You believe infrastructure is a product, not an afterthought
  • You want to build the foundation for AI that actually works

Write to mlops@careerxperts.com to get connected!
