Talent.com
Lead Site Reliability Engineer - Cloud Computing

Neemtree, Mumbai
30+ days ago
Job description

Responsibilities :

  • Team Leadership : Manage and mentor a team of SREs, assigning tasks, providing technical guidance, and fostering a culture of collaboration and continuous learning.
  • System Design and Implementation : Lead the implementation of reliable, scalable, and fault-tolerant systems, including infrastructure, monitoring, and alerting.
  • Incident Management : Manage incident response processes, including root cause analysis, post-mortem reviews, and proactive mitigation strategies to minimize system downtime and impact.
  • Monitoring and Alerting : Develop and maintain comprehensive monitoring systems to identify potential issues early, set appropriate alerting thresholds, and optimize system performance.
  • Automation and Tooling : Drive automation initiatives to streamline operational tasks, including deployments, scaling, and configuration management, utilizing relevant tools and technologies.
  • Capacity Planning : Proactively assess system capacity needs, plan for future growth, and implement scaling strategies to ensure optimal performance under load.
  • Performance Optimization : Analyze system metrics to identify bottlenecks, implement performance improvements, and optimize resource utilization.
  • Collaboration : Work closely with development teams, product managers, and other stakeholders to ensure alignment on reliability goals and smooth integration of new features.
  • Technical Strategy : Develop and implement the SRE roadmap, including technology adoption, standards, and best practices to maintain a high level of system reliability.

Requirements :

  • Technical Expertise : Strong proficiency in system administration, cloud computing (AWS, Azure), networking, distributed systems, and containerization technologies (Docker, Kubernetes).
  • Programming Skills : Expertise in scripting languages (Python, Bash) and the ability to develop automation tools. A basic understanding of Java is good to have.
  • Monitoring and Alerting : Deep understanding of monitoring systems (Prometheus, Grafana), alerting configurations, and log analysis.
  • Incident Management : Proven experience in managing critical incidents, performing root cause analysis, and coordinating response efforts.
  • Leadership and Communication : Excellent communication skills to convey technical concepts to both technical and non-technical audiences, and the ability to lead and motivate a team.
  • Problem-Solving : Strong analytical and troubleshooting skills to identify and resolve complex technical issues.

(ref : hirist.tech)
