Role: Site Reliability Engineer
6+ Years
Permanent / Bangalore - Hybrid
Job Description
We are looking for an engineer who will focus on Developer Experience and help us design, build, and maintain high-performance, scalable, and reliable services. Because Company provides a Contact Center service, we play a critical role in our customers’ business operations and therefore must provide a highly available, fault-tolerant service.
We believe in a DevOps philosophy: every engineering team should be responsible for the software it builds and deploys, and SREs play a critical role in ensuring that teams have the tools, practices, and expertise to make that happen in a blame-free culture.
Our mission is to improve developers’ experience by giving them the tools to manage the entire software lifecycle and to be self-sufficient.
To support this, we are building our own internal PaaS using the latest technologies, such as Kubernetes, Prometheus, and Kotlin. This platform is an important pillar of our engineering effort and helps us deliver better, faster, and more reliable solutions for our customers.
Responsibilities:
Design, build, harden, and maintain key parts of our internal platform, from CI/CD to developer tools that aim to increase R&D productivity
Help migrate to industry-leading CI/CD tools such as GitHub Actions
Help automate safe deployment practices using industry-leading tools such as GitHub Actions, ArgoCD, Argo Rollouts, and Helm charts
Help automate infrastructure provisioning and other engineering processes by working on automations built on top of an engineering platform implemented with GitHub Actions
Coach and up-skill other engineering team members
Solve challenging technical problems that put your skills to the test every day, and see the immediate impact of your work and the value you create for other engineers
Automate every aspect of our infrastructure to minimize human intervention
Develop effective tooling, alerts, and response processes to identify and address reliability risks
Drive and promote protocols for production readiness and operational excellence
Partner with product engineering teams to debug production outages and carry out action items to improve reliability of those systems
Advocate for automated testing, continuous integration and delivery, feature toggles and progressive rollouts
Plan for the growth of Company infrastructure
Skills and Qualifications:
6+ years of experience
Understanding of large-scale, complex systems from a reliability perspective
Ability to design, implement, and maintain CI/CD processes and tools
Passion for producing clean, standards-compliant, secure code
A developer mindset applied to infrastructure
Working knowledge of Linux/Unix systems
Experience with Kubernetes
Experience with infrastructure-as-code tools such as Terraform and Ansible
Experience building software in a programming language such as Java, Kotlin, Scala, or another JVM-based language
Experience writing scripts to automate tasks in a language such as Ruby, Python, Bash, or another scripting language
Experience with at least one relational and one non-relational database (e.g., PostgreSQL, MySQL, MongoDB, Redis, Elasticsearch)
Ability to identify time-consuming and error-prone manual tasks and then build or leverage tooling to automate them
Ability to identify root causes of instability in a large-scale distributed system across stacks
Nice to Haves:
Experience with cloud-based solutions such as Amazon AWS, Google Cloud, or Microsoft Azure
Experience with CI/CD platforms (e.g., Jenkins, GitLab), containers (Docker, Kubernetes), and artifact management tools (e.g., Nexus, ECR)
Experience with the Go programming language