No longer accepting applications
Site Reliability Engineer

GREYTIP SOFTWARE PRIVATE LIMITED • Pushkar, India
4 days ago
Job description

About the Role

We are looking for a skilled Site Reliability Engineer II to join our SRE team. The ideal candidate will have hands-on experience in production monitoring, alert handling, and L1 production support. You will play a key role in ensuring the reliability, availability, and performance of our production systems.

Key Responsibilities

  • Monitor production systems using enterprise monitoring tools and dashboards.
  • Respond to alerts promptly and take appropriate first-level actions.
  • Provide L1 production support, including initial triage, log analysis, and escalation to relevant teams as needed.
  • Participate in incident management, including documentation, communication, and coordination during production incidents.
  • Perform basic troubleshooting for application, infrastructure, and platform issues.
  • Ensure adherence to SLAs, SLOs, and operational best practices.
  • Contribute to runbooks, knowledge base articles, and incident postmortems.
  • Collaborate with engineering and DevOps teams for incident resolution and improvements.
  • Participate in on-call rotations as required.

Required Skills & Qualifications

  • 2–5 years of experience in SRE, Production Support, DevOps, or similar roles.
  • Hands-on experience with production monitoring tools (e.g., Prometheus, Grafana, Datadog, New Relic, Splunk, CloudWatch).
  • Strong understanding of alerting systems, incident lifecycle, and on-call processes.
  • Basic troubleshooting knowledge in Linux/Unix, networking fundamentals, and cloud environments.
  • Familiarity with logging tools (e.g., ELK, Splunk, Cloud Logging).
  • Ability to communicate clearly during incidents and coordinate with cross-functional teams.
  • Strong analytical, problem-solving, and time-management skills.

Good to Have

  • Experience with cloud platforms (AWS/Azure/GCP).
  • Basic scripting skills (Python, Shell, Bash).
  • Exposure to CI/CD pipelines and DevOps practices.
  • Understanding of SLOs, SLIs, and reliability engineering principles.