Talent.com
Systems Reliability Engineer
Reyika • Bengaluru, Republic Of India, IN
13 hours ago
Job description

Role: Senior Site Reliability Engineer / Reliability Architect

Locations: Pune, Bengaluru, Chennai, Noida

Job Description:

We are seeking a Reliability Architect with over 9 years of experience in proactive monitoring, automation, and observability. The ideal candidate is skilled in AIOps/MLOps, infrastructure management, and performance optimization using modern tools and practices, and is adept at leading incident response, mentoring support teams, and driving cross-functional collaboration to ensure system reliability and scalability.

Key Responsibilities:

  • Monitoring and Automation: Proactively monitor software systems to prevent incidents and automate routine operational tasks.
  • Effective Monitoring: Design monitoring systems that trigger alerts based on symptoms rather than outages, ensuring early detection and resolution.
  • Application Performance Monitoring (APM): Implement and manage APM tools like New Relic or Dynatrace to track application performance, identify bottlenecks, and optimize resource usage.
  • Log Analysis with Splunk: Use Splunk to analyze logs for troubleshooting, anomaly detection, and improving system reliability.
  • Dashboards Preparation: Build intuitive dashboards to visualize system health, performance metrics, and operational KPIs.
  • Alerts Setup: Configure intelligent alerts based on thresholds and anomalies to ensure timely incident response.
  • Reports Scheduling: Automate regular reporting to provide insights into system performance, reliability, and trends.
  • Reliability Metrics: Define and track metrics such as SLOs, SLIs, and error budgets to measure and maintain system reliability.
  • Observability Skills: Apply observability practices including distributed tracing, logging, and metrics collection to gain deep insights into system behavior.
  • AI-Driven Monitoring & Automation: Utilize AIOps techniques to proactively detect anomalies, automate incident response, and enable self-healing systems through intelligent alerting and predictive analytics.
  • Observability & ML Integration: Integrate machine learning models with observability tools to enhance system insights, optimize performance, and ensure reliability of AI-powered services in production.
  • Cross-Team Collaboration: Work closely with development and support teams to enhance service reliability through rigorous testing and release procedures.
  • Capacity Planning: Participate in system design reviews and capacity planning to ensure scalability and performance.
  • Debugging and Incident Response: Lead incident response efforts, analyze debugging information, and manage rollbacks of faulty software deployments.
  • Mentoring Support Teams: Guide and mentor L1/L2 support teams to establish best practices in monitoring and observability.
  • Infrastructure Management: Manage infrastructure using tools like Chef, Ansible, Terraform, GitLab CI/CD, and Kubernetes.
  • Documentation: Maintain comprehensive documentation of processes and procedures to ensure operational consistency and reduce redundancy.
  • Proactive Mindset: Approach challenges with enthusiasm, ownership, and a continuous improvement mindset.