No longer accepting applications
Site Reliability Engineer (SRE II)

greytHRIndia
5 hours ago
Job description

About the Role

We are looking for a passionate and detail-oriented Site Reliability Engineer (SRE) to join our engineering team. As an SRE, you will play a critical role in ensuring the reliability, scalability, and performance of our infrastructure and services. You’ll work closely with development and QA teams to build, maintain, and scale production systems while implementing best practices for monitoring, automation, and incident management.

This position is ideal for engineers who thrive in complex distributed environments, have strong database and Kubernetes skills, and enjoy improving system reliability through automation and modern tooling.

Key Responsibilities

  • Infrastructure Reliability & Performance
      • Maintain, monitor, and improve uptime and performance of production systems.
      • Design and implement scalable, reliable, and secure infrastructure on cloud platforms (AWS/GCP).
  • Kubernetes & Containerization
      • Deploy, manage, and optimize containerized workloads using Kubernetes and Helm.
      • Troubleshoot Kubernetes clusters, pods, and networking issues.
      • Manage CI/CD pipelines integrated with Kubernetes-based deployments.
  • Database Administration
      • Manage and optimize databases (PostgreSQL, MongoDB, or other DBs).
      • Perform database tuning, backups, restores, and replication management.
      • Automate DB monitoring and implement high availability (HA) strategies.
  • Monitoring & Incident Response
      • Participate in on-call rotations for production support and incident response.
      • Conduct post-incident reviews and drive preventive improvements.
  • Security & Compliance
      • Implement and enforce security best practices in infrastructure and application deployments.
      • Manage access controls, secrets, and network policies in production environments.
  • Collaboration & Continuous Improvement
      • Work with development teams to design systems with reliability and scalability in mind.
      • Drive automation and self-healing capabilities for common operational tasks.
      • Contribute to SRE playbooks, runbooks, and documentation.

Required Skills & Qualifications

  • Education: Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience).
  • Experience: 2–5 years of experience as an SRE, DevOps engineer, or DBA.
  • Core Skills:
      • Strong experience with Kubernetes, Docker, and container orchestration.
      • Hands-on experience with databases (MySQL, PostgreSQL, MongoDB, or similar).
      • Proficiency in Linux system administration and shell scripting.
      • Good knowledge of cloud platforms (AWS/GCP/Azure) and related services.
      • Basic understanding of networking concepts (DNS, load balancing, firewalls, etc.).
      • Programming experience in Python, Go, or Bash for automation.