Senior Associate - Reliability Operations

Confidential • Hyderabad / Secunderabad, Telangana
30+ days ago
Job description

The Senior Associate, Reliability Operations role is critical to ensuring the continuous, reliable, and secure operation of our SaaS products in a 24x7 support capacity. The role involves proactive monitoring, incident response, and collaboration with teams across the organization to maintain optimal service levels. The Senior Associate will participate in a rotating shift schedule to ensure high availability, rapid issue resolution, and support for key reliability initiatives. The Senior Associate will also serve as a key escalation point, mentor junior team members, and lead critical efforts to optimize operational workflows and systems.

Responsibilities :

  • 24x7 Monitoring and Support : Oversee the health, performance, and availability of cloud-based SaaS infrastructure and applications, using monitoring tools like Prometheus and Grafana, and respond to alerts during assigned shifts. Align with and adhere to organizational processes to maintain the SLA.
  • Incident Management : Act as the first responder in a 24x7 rotation, managing and mitigating service disruptions, following standard incident procedures, and escalating issues to SMEs as needed.
  • Deployments and Change Management : Manage the deployment lifecycle of applications. Proactively engage with SMEs to resolve issues or challenges in the deployment process.
  • Troubleshooting and Resolution : Use diagnostic tools and scripts to resolve common issues in real time, and collaborate with cross-functional teams to analyze and address root causes.
  • Service Health and Reliability : Assist in defining and refining SLAs, SLOs, and SLIs; perform routine checks and follow established runbooks to maintain consistent service reliability.
  • Analysis and Reporting : Regularly review incident data to identify patterns, improve service resilience, and produce shift reports summarizing system health and resolved incidents.
  • Documentation and Knowledge Base : Document incident resolutions, update runbooks, and contribute to an internal knowledge base to improve team response and efficiency.
  • Continuous Improvement Initiatives : Participate in reliability enhancement projects, including automation, configuration management, and tooling improvements.
  • Collaboration : Communicate effectively with SMEs to relay critical incident information, insights, and preventive recommendations.
  • Mentorship : Work closely with team members to provide guidance during shifts and share insights on improving incident response.

Experience and Qualifications

  • Education : B.Sc IT, B.Sc Computers, BCA or equivalent.
  • Experience : 2-4 years of experience in reliability operations or a related 24x7 support role within SaaS or cloud environments.
  • Skills :
      • Proficiency in monitoring and alerting tools, such as Prometheus, Grafana, Datadog, or Splunk.
      • Ability to remain composed in high-stakes situations and resolve incidents promptly.
      • Strong verbal and written communication skills to document and relay incident information effectively.

Shift Information

  • 24x7 Rotational Shifts : This role requires availability to work rotating shifts, including nights, weekends, and holidays, to ensure 24x7 support coverage.

  • Role : Technical Support - Non Voice
  • Industry Type : IT Services & Consulting
  • Department : Customer Success, Service & Operations
  • Employment Type : Full Time, Permanent
  • Role Category : Non Voice
  • Education : UG : Any Graduate; PG : Any Postgraduate
  • Skills Required : Change Management, Operations
