Talent.com
No longer accepting applications
Site Reliability Engineer

HireAlpha • Kolkata, India
19 hours ago
Job description

Job Description: Site Reliability Engineer

Experience: 8+ years

Responsibilities:

  • Ensure high availability, performance, and scalability of mission-critical systems and services.
  • Lead the design and implementation of resilient and fault-tolerant infrastructure.
  • Drive incident response, root cause analysis, and a postmortem culture. Mentor others in incident-response practices.
  • Write and maintain operational documentation, runbooks, and architecture diagrams.
  • Drive and promote production-readiness and operational-excellence practices.
  • Own and evolve infrastructure automation using Terraform or similar tools to minimize manual intervention.
  • Help automate infrastructure provisioning and other engineering processes by building automations on top of an engineering platform built on GitHub Actions.
  • Build internal platforms, tools, and frameworks to improve developer productivity and service reliability.
  • Work closely with software engineers, platform teams, and product managers to align on company goals.
  • Coach and upskill other engineering team members.

Skills and Qualifications:

  • 8–12+ years in SRE, DevOps, or related infrastructure-focused roles.
  • Understanding of large-scale, complex systems from a reliability perspective.
  • Ability to design, implement, and maintain processes and tools.
  • Passion for producing clean, standards-compliant, secure code.
  • A developer mindset applied to infrastructure.
  • Strong experience with Linux / Unix systems.
  • Deep experience with Kubernetes.
  • Deep experience with tools like Terraform, Ansible, Helm.
  • Strong scripting skills for automating tasks in Python, Bash, or a similar scripting language.
  • Experience with at least one relational and one non-relational database (e.g., PostgreSQL, MySQL, MongoDB, Redis, Elasticsearch).
  • Ability to identify time-consuming, error-prone manual tasks and build or leverage tooling to automate them.
  • Ability to identify root causes of instability in large-scale distributed systems across stacks.
  • Experience leading high-severity incident responses and postmortems.

Nice to Haves / Pluses:

  • Experience with cloud platforms such as Amazon Web Services (AWS), Google Cloud, or Microsoft Azure.
  • Experience supporting scalable databases such as PostgreSQL or MongoDB in production.
  • Understanding of cost