Talent.com
Site Reliability Engineer

Reyika • Bengaluru, Karnataka, India
1 day ago
Job description

Role: Senior Site Reliability Engineer / Reliability Architect

Locations: Pune, Bengaluru, Chennai, Noida

Job Description:

Reliability Architect with over 9 years of experience in proactive monitoring, automation, and observability. Skilled in AIOps/MLOps, infrastructure management, and performance optimization using modern tools and practices. Adept at leading incident response, mentoring support teams, and driving cross-functional collaboration to ensure system reliability and scalability.

Key Responsibilities:

  • Monitoring and Automation: Proactively monitor software systems to prevent incidents and automate routine operational tasks.
  • Effective Monitoring: Design monitoring systems that trigger alerts based on symptoms rather than outages, ensuring early detection and resolution.
  • Application Performance Monitoring (APM): Implement and manage APM tools like New Relic or Dynatrace to track application performance, identify bottlenecks, and optimize resource usage.
  • Log Analysis with Splunk: Use Splunk to analyze logs for troubleshooting, anomaly detection, and improving system reliability.
  • Dashboards Preparation: Build intuitive dashboards to visualize system health, performance metrics, and operational KPIs.
  • Alerts Setup: Configure intelligent alerts based on thresholds and anomalies to ensure timely incident response.
  • Reports Scheduling: Automate regular reporting to provide insights into system performance, reliability, and trends.
  • Reliability Metrics: Define and track metrics such as SLOs, SLIs, and error budgets to measure and maintain system reliability.
  • Observability Skills: Apply observability practices including distributed tracing, logging, and metrics collection to gain deep insights into system behavior.
  • AI-Driven Monitoring & Automation: Utilize AIOps techniques to proactively detect anomalies, automate incident response, and enable self-healing systems through intelligent alerting and predictive analytics.
  • Observability & ML Integration: Integrate machine learning models with observability tools to enhance system insights, optimize performance, and ensure reliability of AI-powered services in production.
  • Cross-Team Collaboration: Work closely with development and support teams to enhance service reliability through rigorous testing and release procedures.
  • Capacity Planning: Participate in system design reviews and capacity planning to ensure scalability and performance.
  • Debugging and Incident Response: Lead incident response efforts, analyze debugging information, and manage rollbacks of faulty software deployments.
  • Mentoring Support Teams: Guide and mentor L1/L2 support teams to establish best practices in monitoring and observability.
  • Infrastructure Management: Manage infrastructure using tools like Chef, Ansible, Terraform, GitLab CI/CD, and Kubernetes.
  • Documentation: Maintain comprehensive documentation of processes and procedures to ensure operational consistency and reduce redundancy.
  • Proactive Mindset: Approach challenges with enthusiasm, ownership, and a continuous improvement mindset.

Related jobs
  • Site Reliability Engineer
    Synamedia • Bengaluru, Karnataka, India
    At Synamedia, the world’s most talented innovators and trailblazers are shaping the way the world is entertained and informed. We are backed by the Permira funds and Sky. This is the age of infinite ...
    Last updated: 9 days ago
  • Site Reliability Engineer
    London Stock Exchange Group • Bangalore, India
    Engineer, Site Reliability Engineering. We are evolving our Reliability Engineering team to move beyond support and operations. As a Senior Engineer in Site Reliability, you will be part of a diverse...
    Last updated: 30+ days ago
  • Site Reliability Engineer
    JRD Systems • Bengaluru, Karnataka, India
    Site Reliability Engineer (Windows / Cloud / Automation). Job Summary: We are seeking an experienced Site Reliability Engineer with a strong background in managing Windows infrastructure and cloud e...
    Last updated: 30+ days ago
  • Site Reliability Engineer
    Synechron • Bengaluru, Karnataka, India
    We have an immediate opportunity for a Senior Site Reliability Engineer. At Synechron, we believe in the power of digital to transform businesses for the better. Our globa...
    Last updated: 30+ days ago
  • Site Reliability Engineer
    Karix • Hosur, Tamil Nadu, IN
    We are seeking an experienced professional Site Reliability Engineer who acts as a bridge between development and IT operations, taking operational tasks to ensure the efficient functioning of Serv...
    Last updated: 4 hours ago
  • Senior Site Reliability Engineer
    o9 Solutions, Inc. • Bengaluru, Karnataka, India
    Be part of something revolutionary. At o9 Solutions, our mission is clear: be the Most Valuable Platform (MVP) for enterprises. With our AI-driven platform, the o9 Digital Brain, we integrate globa...
    Last updated: 6 days ago
  • Site Reliability Engineer
    GREYTIP SOFTWARE PRIVATE LIMITED • Bengaluru, Karnataka, India
    About the Role: We are looking for a skilled Site Reliability Engineer II to join our SRE team. The ideal candidate will have hands-on experience in production monitoring, alert handling, and L1 pro...
    Last updated: 4 days ago
  • Site Reliability Engineer
    WhiteLotus Talent Partners • Bengaluru, Karnataka, India
    L0 and L1 Site Reliability Engineer (SRE) Support. Krutrim Cloud Site Reliability operations team and ensure the smooth functioning of our cloud infrastructure powered by. In this role, you will focu...
    Last updated: 30+ days ago
  • Lead Site Reliability Engineer
    Delta Air Lines • Bengaluru, India
    Execute on the Incident, Change Management, Problem Management processes. Building and supporting reliable applications that meet development and maintenance requirements. Provide consultation and di...
    Last updated: 30+ days ago
  • Site Reliability Engineer
    super.money • Bengaluru, Karnataka, India
    Site Reliability Engineer (SRE) Level 3. A Site Reliability Engineer (SRE) Level 3 is a senior technical leadership role focused on designing, implementing, and maintaining large-scale, complex, and...
    Last updated: 16 days ago
  • Site Reliability Engineer
    People Prime Worldwide • Bengaluru, IN
    Our client is a French multinational information technology (IT) services and consulting company, headquartered in Paris, France. Founded in 1967, it has been a leader in business transformation for...
    Last updated: 30+ days ago
  • Site Reliability Engineer
    Awign Expert • Bangalore, IN
    Position: SRE Observability Engineer. Mandatory Skills: Observability, Grafana and Writing queries using Prometheus and Loki. We are seeking a highly experienced and driven Senior Observability Engin...
    Last updated: 10 hours ago
  • Senior Site Reliability Engineer (SRE)
    Voya India • Hosur, Tamil Nadu, IN
    We are seeking a strategic and technically adept leader to drive the scalability, resilience, and operational excellence of our enterprise systems. This role will set the vision for site reliability...
    Last updated: 4 hours ago
  • Site Reliability Engineer
    Datum Technologies Group • Hosur, Tamil Nadu, IN
    Job Title: Site Reliability Engineer (SRE) – AWS. AWS, Terraform, Kubernetes, Docker, Grafana, Prometheus, Datadog. We are looking for a skilled Site Reliability Engineer (SRE) with strong AWS experi...
    Last updated: 8 days ago
  • Site Reliability Engineer
    Media.net • Bengaluru, Karnataka, India
    Our proprietary contextual technology is at the forefront of enhancing Programmatic buying, the latest industry standard in ad buying for digital platforms. HQ is based in New York, and the Global H...
    Last updated: 30+ days ago
  • Site Reliability Engineer
    Landmark Group • Bengaluru, India
    Ensure reliability and high availability of Java and microservices-based applications through proactive monitoring and automation. Define and track SLIs/SLOs to maintain service performance and stab...
    Last updated: 7 days ago
  • Site Reliability Engineer
    PhonePe • Hosur, Tamil Nadu, IN
    SRE: We are looking for engineers who are passionate about reliability, performance, and efficiency, and with experience in building tools, services, and automation to manage and improve production ...
    Last updated: 16 days ago
  • Site Reliability Engineer (SRE) / DevOps Engineer
    Stoopa AI • Hosur, Tamil Nadu, IN
    AI is building next-generation AI-driven platforms for ports and is focused on reliability, speed, and intelligent automation. As we scale our next generation smart port product Turi, we are hiring ...
    Last updated: 4 hours ago