Site Reliability Engineer

GREYTIP SOFTWARE PRIVATE LIMITED
Bengaluru, Karnataka, India
4 days ago
Job description

About the Role

We are looking for a skilled Site Reliability Engineer II to join our SRE team. The ideal candidate will have hands-on experience in production monitoring, alert handling, and L1 production support. You will play a key role in ensuring the reliability, availability, and performance of our production systems.

Key Responsibilities

Monitor production systems using enterprise monitoring tools and dashboards.

Respond to alerts promptly and take appropriate first-level actions.

Provide L1 production support, including initial triage, log analysis, and escalation to relevant teams as needed.

Participate in incident management, including documentation, communication, and coordination during production incidents.

Perform basic troubleshooting for application, infrastructure, and platform issues.

Ensure adherence to SLAs, SLOs, and operational best practices.

Contribute to runbooks, knowledge base articles, and incident postmortems.

Collaborate with engineering and DevOps teams for incident resolution and improvements.

Participate in on-call rotations as required.

Required Skills & Qualifications

2–5 years of experience in SRE, Production Support, DevOps, or similar roles.

Hands-on experience with production monitoring tools (e.g., Prometheus, Grafana, Datadog, New Relic, Splunk, CloudWatch).

Strong understanding of alerting systems, incident lifecycle, and on-call processes.

Basic troubleshooting knowledge in Linux/Unix, networking fundamentals, and cloud environments.

Familiarity with logging tools (e.g., ELK, Splunk, Cloud Logging).

Ability to communicate clearly during incidents and coordinate with cross-functional teams.

Strong analytical, problem-solving, and time-management skills.

Good to Have

Experience with cloud platforms (AWS/Azure/GCP).

Basic scripting skills (Python, Shell, Bash).

Exposure to CI/CD pipelines and DevOps practices.

Understanding of SLOs, SLIs, and reliability engineering principles.
