Talent.com
Lead Site Reliability Engineer

Lead Site Reliability Engineer

ConfidentialBengaluru / Bangalore, India
6 days ago
Job description

Job Title : SRE Lead (Engineering & Reliability)

Job Summary :

We are seeking an experienced and dynamic Site Reliability Engineering (SRE) Lead to oversee the reliability, scalability, and performance of our critical systems. As an SRE Lead, you will play a pivotal role in establishing and implementing SRE practices, leading a team of engineers, and driving automation, monitoring, and incident response strategies. This position combines software engineering and systems engineering expertise to build and maintain high-performing, reliable systems.

Experience : 6+ years

Key Responsibilities :

Reliability & Performance :

  • Lead efforts to maintain high availability and reliability of critical services.
  • Define and monitor SLIs, SLOs, and SLAs to ensure business requirements are met.
  • Proactively identify and resolve performance bottlenecks and system inefficiencies.

Incident Management & Response :

  • Establish and improve incident management processes and on-call rotations.
  • Lead incident response and root cause analysis for high-priority outages.
  • Drive post-incident reviews and ensure actionable insights are implemented.
  • Automation & Tooling :

  • Develop and implement automated solutions to reduce manual operational tasks.
  • Enhance system observability through metrics, logging, and distributed tracing tools (e.g., Prometheus, Grafana, Elastic APM).
  • Optimize CI / CD pipelines for seamless deployments.
  • Collaboration :

  • Partner with software engineering teams to improve the reliability of applications and infrastructure.
  • Work closely with product / engineering teams to design scalable and robust systems.
  • Ensure seamless integration of monitoring and alerting systems across teams.
  • Leadership & Team Building :

  • Manage, mentor, and grow a team of SREs.
  • Promote SRE best practices and foster a culture of reliability and performance across the organization.
  • Drive performance reviews, skills development, and career progression for team members.
  • Capacity Planning & Cost Optimization :

  • Perform capacity planning and implement autoscaling solutions to handle traffic spikes.
  • Optimize infrastructure and cloud costs while maintaining reliability and performance.
  • Skills & Qualifications :

    Required Skills :

  • Technical Expertise :
  • Experience with cloud platforms (AWS / Azure / GCP) and Kubernetes.
  • Hands-on knowledge of infrastructure-as-code tools like Terraform / Helm / Ansible.
  • Proficiency in Java
  • Expertise in distributed systems, databases, and load balancing.
  • Monitoring & Observability :
  • Proficient with tools like Prometheus, Grafana, Elastic APM, or new relic.
  • Understanding of metrics-driven approaches for system monitoring and alerting.
  • Automation & CI / CD :
  • Hands-on experience with CI / CD pipelines (e.g., Jenkins, Azure Pipelines etc.).
  • Skilled in automation frameworks and tools for infrastructure and application deployments.
  • Incident Management :
  • Proven track record in handling incidents, post-mortems, and implementing solutions to prevent recurrence.
  • Leadership & Communication Skills :

  • Strong people management and leadership skills with the ability to inspire and motivate teams.
  • Excellent problem-solving and decision-making skills.
  • Clear and concise communication, with the ability to translate technical concepts for non-technical stakeholders.
  • Preferred Skills :

  • Experience with database optimization, Kafka, or other messaging systems.
  • Knowledge of autoscaling techniques
  • Previous experience in an SRE, DevOps, or infrastructure engineering leadership role.
  • Understanding of compliance and security best practices in distributed systems.
  • Why Join Us

  • Be a key driver in building and scaling reliable systems in a fast-paced environment.
  • Work with cutting-edge technologies and influence the evolution of the infrastructure.
  • Lead a high-impact team and foster a culture of reliability and innovation.
  • Skills Required

    Databases, Java, Prometheus, Grafana, New Relic, Jenkins, Gcp, Terraform, Distributed Systems, Ansible, Load Balancing, Helm, Azure, Kubernetes, Aws

    Create a job alert for this search

    Site Reliability Engineer • Bengaluru / Bangalore, India

    Related jobs
    • Promoted
    Site Reliability Engineer

    Site Reliability Engineer

    JRD SystemsBengaluru, Karnataka, India
    Site Reliability Engineer (Windows / Cloud / Automation).We are seeking an experienced Site Reliability Engineer with a strong background in managing Windows infrastructure and cloud environments.T...Show moreLast updated: 21 days ago
    • Promoted
    Site Reliability Engineer

    Site Reliability Engineer

    CodeKarmahosur, tamil nadu, in
    Site Reliability Engineer (Multi-Cloud Deployments).CodeKarma is redefining how engineering teams understand and evolve complex systems — bringing production context directly into the developer’s w...Show moreLast updated: 23 days ago
    • Promoted
    Lead Site Reliability Engineer

    Lead Site Reliability Engineer

    Tata Consultancy ServicesBengaluru, Republic Of India, IN
    Senior Site Reliability Engineer (SRE).Senior Site Reliability Engineer (SRE).Desired Experience Range : 7 - 10 yrs.Notice Period : Immediate to 90Days only. We are currently planning to do a Virtual....Show moreLast updated: 13 days ago
    • Promoted
    Site Reliability Engineer

    Site Reliability Engineer

    VIDA Digital IdentityBengaluru, Republic Of India, IN
    DevOps Engineer - VIDA Digital Identity.We are looking for an experienced and driven DevOps Engineer to join our dynamic team at VIDA Digital Identity. The successful candidate will play a pivotal r...Show moreLast updated: 24 days ago
    • Promoted
    Site Reliability Engineer

    Site Reliability Engineer

    CitNOW GroupBengaluru, IN
    Founded in 2008, CitNOW is an innovative, enterprise-level software product suite that allows automotive dealerships globally to sell more vehicles and parts more profitably.CitNOW’s app-based plat...Show moreLast updated: 1 day ago
    • Promoted
    Site Reliability Engineer

    Site Reliability Engineer

    SynechronBengaluru, Karnataka, India
    We have immediate opportunity for Senior Site Reliability Engineer.Senior Site Reliability Engineer.At Synechron, we believe in the power of digital to transform businesses for the better.Our globa...Show moreLast updated: 30+ days ago
    • Promoted
    Senior Site Reliability Engineer

    Senior Site Reliability Engineer

    IntraEdgeBengaluru, IN
    Strong leadership and people management skills.Exceptional technical proficiency in Pearson's technology stack.Strategic thinking with a focus on long-term operational excellence.Champion operation...Show moreLast updated: 15 days ago
    • Promoted
    Senior Site Reliability Engineer- ELK Expert

    Senior Site Reliability Engineer- ELK Expert

    iVedha Inc.hosur, tamil nadu, in
    Senior Site Reliability Engineer (SRE) – ELK Expert | Platform Engineering Practice.Must be available to work in the EST (US / Canada) Time Zone. Are you a Senior Site Reliability Engineer (SRE) with ...Show moreLast updated: 30+ days ago
    • Promoted
    Site Reliability Engineer

    Site Reliability Engineer

    ACL DigitalBengaluru, India
    Service Management : Maintain application uptime / performance, manage system enhancements and defects, oversee daily operational activities, and ensure continuous improvement and adherence to ITIL be...Show moreLast updated: 30+ days ago
    • Promoted
    Lead Site Reliability Engineer

    Lead Site Reliability Engineer

    Palo Alto NetworksBengaluru, Republic Of India, IN
    At Palo Alto Networks® everything starts and ends with our mission : .Being the cybersecurity partner of choice, protecting our digital way of life. Our vision is a world where each day is safer and m...Show moreLast updated: 30+ days ago
    • Promoted
    Site Reliability Engineer

    Site Reliability Engineer

    Sonata SoftwareBengaluru, Republic Of India, IN
    In today's market, there is a unique duality in technology adoption.On one side, extreme focus on cost containment by clients, and on the other, deep motivation to modernize their Digital storefron...Show moreLast updated: 24 days ago
    • Promoted
    Site Reliability Engineer

    Site Reliability Engineer

    greytHRBengaluru, Republic Of India, IN
    We are looking for a passionate and detail-oriented.Site Reliability Engineer (SRE).As an SRE, you will play a critical role in ensuring the reliability, scalability, and performance of our infrast...Show moreLast updated: 24 days ago
    • Promoted
    Lead - Cloud Reliability Engineer

    Lead - Cloud Reliability Engineer

    Searce Inchosur, tamil nadu, in
    The ‘process-first’ AI-native modern tech consultancy that's rewriting the rules.As an engineering-led consultancy, we are dedicated to relentlessly improving the real business outcomes.Our solvers...Show moreLast updated: 30+ days ago
    • Promoted
    Site Reliability Engineer

    Site Reliability Engineer

    ImpetusBengaluru, Republic Of India, IN
    You will be a key contributor in the implementation of CI / CD pipelines, managing infrastructure, container orchestration, and system monitoring. Good hands-on experience on Azure Cloud and leading t...Show moreLast updated: 15 days ago
    • Promoted
    • New!
    Site Reliability Engineer (SRE) – Infrastructure & Automation

    Site Reliability Engineer (SRE) – Infrastructure & Automation

    InstaServicehosur, tamil nadu, in
    InstaService is revolutionizing the home services industry through AI-driven technology, connecting customers with trusted professionals instantly. We’re growing fast across 23+ states and expanding...Show moreLast updated: 19 hours ago
    • Promoted
    Site Reliability Engineer

    Site Reliability Engineer

    super.moneyBengaluru, Karnataka, India
    Site Reliability Engineer (SRE) Level 3.A Site Reliability Engineer (SRE) Level 3 is a senior technical leadership role focused on designing, implementing, and maintaining large-scale, complex, and...Show moreLast updated: 3 days ago
    • Promoted
    Site Reliability Engineer

    Site Reliability Engineer

    CapgeminiBengaluru, IN
    Choosing Capgemini means choosing a company where you will be empowered to shape your career in the way you’d like, where you’ll be supported and inspired by a collaborative community of colleagues...Show moreLast updated: 12 days ago
    • Promoted
    Staff Site Reliability Engineer (Observability)

    Staff Site Reliability Engineer (Observability)

    Palo Alto NetworksBengaluru, Karnataka, India
    At Palo Alto Networks® everything starts and ends with our mission : .Being the cybersecurity partner of choice, protecting our digital way of life. Our vision is a world where each day is safer and m...Show moreLast updated: 30+ days ago