Role : Manager, Site Reliability Engineering
Required Technical Skill Set : Site Reliability Engineering, Kubernetes, CI / CD, Infrastructure as Code (IaC), Linux, TCP / IP networking
Desired Experience Range : 12 - 18 years
Notice Period : Immediate to 90 days only
Location of Requirement : Bangalore
We are currently planning to conduct virtual interviews.
Job Description :
As the Manager of Site Reliability Engineering on the Infrastructure Reliability team, you will build and lead a high-performing team dedicated to keeping our infrastructure reliable, scalable, and efficient. Your primary focus will be people management, strategic planning, and technical leadership. You will mentor and guide your team members, fostering their professional growth and creating a culture of ownership and operational excellence. You will define the team's vision and roadmap, align them with the company's broader goals, and work with cross-functional partners to prioritize and execute projects. You will oversee the development of SRE solutions across our globally distributed environments and empower your team to improve service resiliency, automate processes, and run effective incident response and capacity planning, ensuring the highest level of uptime and Quality of Service (QoS) for our internal customers.
Responsibilities and Duties of the Role :
- Lead, mentor, and grow a team of software and infrastructure automation engineers.
- Develop and execute the roadmap for the Infrastructure Reliability Engineering team.
- Collaborate with engineering and operations teams to identify and prioritize reliability improvements.
- Drive the design and implementation of tools and automation for infrastructure testing and self-healing.
- Establish and monitor key performance indicators (KPIs) for infrastructure reliability.
Basic Qualifications
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience.
- 12+ years of experience in a software engineering or infrastructure role.
- 5+ years of experience in a leadership or management role.
- Lead a team of Infrastructure Reliability Engineers on projects for users and take direct responsibility for uptime.
- Own the end-to-end availability and performance of key services, and build automation to prevent problems from recurring.
- Automate the response to all non-exceptional service conditions.
- Set the standard for excellence by mentoring team members and building trust through superior technical delivery.
- Proficiency in Kubernetes administration, modern CI / CD techniques, and Infrastructure as Code (IaC).
- Deep understanding of Linux operating systems and TCP / IP fundamentals.
- Experience with monitoring, metrics gathering, APM, container management, and log collection tools.
- Creative problem solver with excellent debugging skills and strong documentation abilities.
- Strong understanding of networking, storage, security, and compute technologies.

Preferred Qualifications
- Experience building and leading a Site Reliability Engineering (SRE) or Infrastructure Reliability team.
- Expertise with complex system architectures and infrastructures.
- Proficiency in one or more programming languages (e.g., Python, Go, Java).
- Passion for automation, scalability, and building reliable systems from the ground up.