Team Leadership : Manage and mentor a team of SREs, assigning tasks, providing technical guidance, and fostering a culture of collaboration and continuous learning.
System Design and Implementation : Lead the design and implementation of reliable, scalable, and fault-tolerant systems, including infrastructure, monitoring, and alerting.
Incident Management : Manage incident response processes, including root cause analysis, post-mortem reviews, and proactive mitigation strategies to minimize system downtime and impact.
Monitoring and Alerting : Develop and maintain comprehensive monitoring systems to identify potential issues early, set appropriate alerting thresholds, and optimize system performance.
Automation and Tooling : Drive automation initiatives to streamline operational tasks, including deployments, scaling, and configuration management, utilizing relevant tools and technologies.
Capacity Planning : Proactively assess system capacity needs, plan for future growth, and implement scaling strategies to ensure optimal performance under load.
Performance Optimization : Analyze system metrics to identify bottlenecks, implement performance improvements, and optimize resource utilization.
Collaboration : Work closely with development teams, product managers, and other stakeholders to ensure alignment on reliability goals and smooth integration of new features.
Technical Strategy : Develop and implement the SRE roadmap, including technology adoption, standards, and best practices to maintain a high level of system reliability.
Requirements :
Technical Expertise : Strong proficiency in system administration, cloud computing (AWS, Azure), networking, distributed systems, and containerization technologies (Docker, Kubernetes).
Programming Skills : Expertise in scripting languages (Python, Bash) and the ability to develop automation tools. A basic understanding of Java is a plus.
Monitoring and Alerting : Deep understanding of monitoring systems (Prometheus, Grafana), alerting configurations, and log analysis.
Incident Management : Proven experience in managing critical incidents, performing root cause analysis, and coordinating response efforts.
Leadership and Communication : Excellent communication skills to convey technical concepts to both technical and non-technical audiences, and the ability to lead and motivate a team.
Problem-Solving : Strong analytical and troubleshooting skills to identify and resolve complex technical issues.
(ref : hirist.tech)