Talent.com
Site Reliability Engineer

Datum Technologies Group, Davanagere, IN
18 hours ago
Job description

Job Title: Site Reliability Engineer (SRE) – Azure & AI

Experience: 7+ years

Work Mode: Hybrid

Work Location: Chennai / Mumbai / Gurgaon

Job Summary:

We are looking for an experienced Site Reliability Engineer (SRE) with strong expertise in Microsoft Azure, AI infrastructure, and automation. The ideal candidate will have a solid background in managing cloud environments using GitHub or Azure DevOps and hands-on experience in AI model deployment and scaling. This role involves working closely with engineering teams to deliver reliable, secure, and scalable cloud infrastructure that supports AI workloads and enterprise applications.

Key Responsibilities:

  • Design, build, and maintain scalable cloud infrastructure on Microsoft Azure.
  • Automate infrastructure provisioning and deployment using Terraform, Argo, and Helm.
  • Manage and optimize Azure Kubernetes Service (AKS) for AI and microservices workloads.
  • Support AI model hosting on Azure OpenAI, VMs, or GPUs, using frameworks such as Hugging Face Transformers, vLLM, or llama.cpp.
  • Implement CI/CD pipelines using GitHub Actions and integrate them with JFrog Artifactory.
  • Monitor system performance and reliability using Grafana, ensuring issues are resolved proactively.
  • Collaborate with development teams to align infrastructure with application requirements.
  • Enforce networking and information security best practices.
  • Manage and optimize caching and data-layer performance using Redis.

Required Skills & Technologies:

  • Azure Cloud Services (including Azure OpenAI)
  • AI Model Hosting & Infrastructure
  • GitHub (CI/CD, workflows)
  • Azure Kubernetes Service (AKS)
  • Argo, Helm, Terraform
  • Docker, JFrog, Grafana
  • Networking & Security, Redis