As a Senior Site Reliability Engineer, you will play a critical role in supporting application developers by providing expert guidance on application and infrastructure best practices from a reliability perspective. Your role covers the entire life cycle of a product or application. Your primary focus will be automation, observability, reliability, and release management with CI/CD, with an emphasis on solving operations issues.
- Must have at least 5 years of SRE experience in large programs, with a focus on release engineering, observability tasks, and reliability
- Must have a good understanding of Site Reliability Engineering (SRE) and release management processes
- Should possess strong analytical and troubleshooting skills
- Should be a strong team player who enjoys collaborating with different people and profiles, shares knowledge, and strives for continuous development and learning
- Excellent communication skills along with leadership skills
Preferred candidate profile
- Reliability practices
- Chaos engineering
- Strong experience with one or more observability tools such as New Relic, AppDynamics, Prometheus, Dynatrace, Datadog, or Splunk
- Experience in event correlation using observability or other tools like BigPanda
- Experience in observability dashboard creation, custom metrics, synthetic monitoring, and Real User Monitoring (RUM)
- Understanding of automation avenues
- Good experience in scripting or development languages, with expertise in any one of Python, Ruby, JSON, Java, Node.js, or PHP
- Experience with scripting in PowerShell (M) and any one of Bash, Shell, or Perl
- Strong knowledge of application design and architecture including microservices architecture
- Experience in CI/CD tooling and best practices
- Experience with cloud platforms such as AWS, Azure, and Google Cloud
- Good communication skills
Good to have
- AIOps and related tools
- Experience in container orchestration and practices, including Kubernetes and Docker Swarm
- Experience in infrastructure automation tools such as Terraform, CloudFormation, Ansible, or Puppet (any one)
- Knowledge of SQL and NoSQL databases (Oracle, Couchbase)
- Experience working with ITSM tools such as Remedy, ServiceNow, Confluence, and Jira
- Experience with Cloud cost optimization / FinOps
Skills Required
Site Reliability Engineer, CI/CD, Azure, AWS