SRE / DevOps Team Lead

Infinite Computer Solutions • Bengaluru, Republic Of India, IN
7 days ago
Job description

We are looking for a Site Reliability / Cloud Engineer (DevOps Lead / SSE).

Experience : 6 to 12 years

Joining : Immediate to 30 days

Shift timing : Regular

Location : Bangalore / Hyderabad / Chennai / Noida / Pune / Gurgaon / Visakhapatnam

Interested candidates, please share your profile and the details below with:

Email ID : shanmukh.varma@infinite.com

Total experience :

Relevant Experience :

Current CTC :

Expected CTC :

Notice Period :

If serving notice period, last working day :

Job Title : Site Reliability / Cloud Engineer

Job Type : Full-time

Department : Engineering

Job Summary

We're seeking a motivated and passionate Site Reliability Engineering (SRE) leader with strong expertise in programming, distributed systems, and Kubernetes. In this role, you'll help evolve our SRE team's Kubernetes and microservices architecture, while also supporting the integration of Agentic AI workloads both within Kubernetes and via managed services.

The SRE function plays a critical role in maintaining system visibility, ensuring platform scalability, and enhancing operational efficiency. As part of this, you'll help drive AIOps initiatives, using AI tools and automation to proactively detect, diagnose, and remediate issues, improving the reliability and performance of Zyter’s global platform. As a cloud practitioner, you’ll have the opportunity to apply your technical strengths, shape platform reliability strategies, and collaborate closely with engineering teams across the organization. You’ll work as part of a globally distributed, inclusive team focused on AWS-based cloud infrastructure.

Key Responsibilities

Core SRE :

  • Collaborate with development teams, product owners, and stakeholders to define, enforce, and track SLOs and manage error budgets.
  • Improve system reliability by designing for failure, testing edge cases, and monitoring key metrics.
  • Boost performance by identifying bottlenecks, optimizing resource usage, and reducing latency across services.
  • Build scalable systems that handle growth in traffic or data without compromising performance.
  • Stay directly involved in technical work, contributing to the codebase and leading by example in solving complex infrastructure challenges.

AI Ops :

  • Design and implement scalable deployment strategies optimized for large language models such as Llama, Claude, Cohere, and others.
  • Set up continuous monitoring for model performance, ensuring robust alerting systems are in place to catch anomalies or degradation.
  • Stay current with advancements in MLOps and Generative AI, proactively introducing innovative practices to strengthen AI infrastructure and delivery.

Monitoring and Alerting :

  • Set up monitoring and observability using Prometheus, Grafana, and CloudWatch, and logging with OpenSearch / ELK.
  • Proactively identify and resolve issues by leveraging monitoring systems to catch early signals before they impact operations.
  • Design and maintain alerting mechanisms that are clear, actionable, and tuned to avoid unnecessary noise or alert fatigue.
  • Continuously improve system observability to enhance visibility, reduce false positives, and support faster incident response.
  • Apply best practices for alert thresholds and monitoring configurations to ensure reliability and maintain system health.

Cost Management :

  • Monitor infrastructure usage to identify waste and reduce unnecessary spending.
  • Optimize resource allocation by using right-sized instances, auto-scaling, and spot instances where appropriate.
  • Implement cost-aware design practices during architecture and deployment planning.
  • Track and analyze monthly cloud costs to ensure alignment with budget and forecast.
  • Collaborate with teams to increase cost visibility and promote ownership of cloud spend.

Required Skills & Experience :

  • Strong experience as an SRE with a proven track record of managing large-scale, highly available systems.
  • Knowledge of core operating system principles, networking fundamentals, and systems management.
  • Strong understanding of cloud deployment and management practices.
  • Hands-on experience with Terraform / OpenTofu, Helm, Docker, Kubernetes, Prometheus, and Istio.
  • Hands-on experience with tools and techniques to diagnose and uncover container performance issues.
  • Skilled with AWS services from both technology and cost perspectives.
  • Skilled in DevOps / SRE practices and build / release pipelines.
  • Experience working with mature development practices and tools for source control, security, and deployment.
  • Hands-on experience with Python / Golang / Groovy / Java.
  • Excellent communication skills, written and verbal.
  • Strong analytical and problem-solving skills.

Preferred Qualifications

  • Experience scaling Kubernetes clusters and managing ingress traffic.
  • Familiarity with multi-environment deployments and automated workflows.
  • Knowledge of AWS service quotas, cost optimization, and networking nuances.
  • Strong troubleshooting skills and effective communication across teams.
  • Prior experience in regulated environments (HIPAA, SOC 2, ISO 27001) is a plus.