Talent.com
Systems Reliability Engineer
Reyika • Bengaluru, Republic Of India, IN
6 days ago
Job description

Role : Senior Site Reliability Engineer / Reliability Architect

Locations : Pune, Bengaluru, Chennai, Noida

Job Description :

Reliability Architect with over 9 years of experience in proactive monitoring, automation, and observability. Skilled in AIOps / MLOps, infrastructure management, and performance optimization using modern tools and practices. Adept at leading incident response, mentoring support teams, and driving cross-functional collaboration to ensure system reliability and scalability.

Key Responsibilities :

  • Monitoring and Automation: Proactively monitor software systems to prevent incidents and automate routine operational tasks.
  • Effective Monitoring: Design monitoring systems that trigger alerts based on symptoms rather than outages, ensuring early detection and resolution.
  • Application Performance Monitoring (APM): Implement and manage APM tools like New Relic or Dynatrace to track application performance, identify bottlenecks, and optimize resource usage.
  • Log Analysis with Splunk: Use Splunk to analyze logs for troubleshooting, anomaly detection, and improving system reliability.
  • Dashboards Preparation: Build intuitive dashboards to visualize system health, performance metrics, and operational KPIs.
  • Alerts Setup: Configure intelligent alerts based on thresholds and anomalies to ensure timely incident response.
  • Reports Scheduling: Automate regular reporting to provide insights into system performance, reliability, and trends.
  • Reliability Metrics: Define and track metrics such as SLOs, SLIs, and error budgets to measure and maintain system reliability.
  • Observability Skills: Apply observability practices including distributed tracing, logging, and metrics collection to gain deep insights into system behavior.
  • AI-Driven Monitoring & Automation: Utilize AIOps techniques to proactively detect anomalies, automate incident response, and enable self-healing systems through intelligent alerting and predictive analytics.
  • Observability & ML Integration: Integrate machine learning models with observability tools to enhance system insights, optimize performance, and ensure reliability of AI-powered services in production.
  • Cross-Team Collaboration: Work closely with development and support teams to enhance service reliability through rigorous testing and release procedures.
  • Capacity Planning: Participate in system design reviews and capacity planning to ensure scalability and performance.
  • Debugging and Incident Response: Lead incident response efforts, analyze debugging information, and manage rollbacks of faulty software deployments.
  • Mentoring Support Teams: Guide and mentor L1/L2 support teams to establish best practices in monitoring and observability.
  • Infrastructure Management: Manage infrastructure using tools like Chef, Ansible, Terraform, GitLab CI/CD, and Kubernetes.
  • Documentation: Maintain comprehensive documentation of processes and procedures to ensure operational consistency and reduce redundancy.
  • Proactive Mindset: Approach challenges with enthusiasm, ownership, and a continuous improvement mindset.