Lead Site Reliability Engineer - Cloud Computing

Neemtree, Mumbai
3 days ago
Job description

Responsibilities :

  • Team Leadership : Manage and mentor a team of SREs, assigning tasks, providing technical guidance, and fostering a culture of collaboration and continuous learning.
  • System Design and Implementation : Lead the design and implementation of reliable, scalable, and fault-tolerant systems, including infrastructure, monitoring, and alerting.
  • Incident Management : Manage incident response processes, including root cause analysis, post-mortem reviews, and proactive mitigation strategies to minimize system downtime and impact.
  • Monitoring and Alerting : Develop and maintain comprehensive monitoring systems to identify potential issues early, set appropriate alerting thresholds, and optimize system performance.
  • Automation and Tooling : Drive automation initiatives to streamline operational tasks, including deployments, scaling, and configuration management, utilizing relevant tools and technologies.
  • Capacity Planning : Proactively assess system capacity needs, plan for future growth, and implement scaling strategies to ensure optimal performance under load.
  • Performance Optimization : Analyze system metrics to identify bottlenecks, implement performance improvements, and optimize resource utilization.
  • Collaboration : Work closely with development teams, product managers, and other stakeholders to ensure alignment on reliability goals and smooth integration of new features.
  • Technical Strategy : Develop and implement the SRE roadmap, including technology adoption, standards, and best practices to maintain a high level of system reliability.

Requirements :

  • Technical Expertise : Strong proficiency in system administration, cloud computing (AWS, Azure), networking, distributed systems, and containerization technologies (Docker, Kubernetes).
  • Programming Skills : Expertise in scripting languages (Python, Bash) and the ability to develop automation tools. A basic understanding of Java is good to have.
  • Monitoring and Alerting : Deep understanding of monitoring systems (Prometheus, Grafana), alerting configurations, and log analysis.
  • Incident Management : Proven experience in managing critical incidents, performing root cause analysis, and coordinating response efforts.
  • Leadership and Communication : Excellent communication skills to convey technical concepts to both technical and non-technical audiences, and the ability to lead and motivate a team.
  • Problem-Solving : Strong analytical and troubleshooting skills to identify and resolve complex technical issues.
  • (ref : hirist.tech)
