Description :
An SRE spends just as much of their time working on systems as they do writing code. You'll be tasked with all manner of work: building operational tooling, automating operational workflows, performing architecture and design reviews, investigating system failures and complex outages, improving our monitoring infrastructure, defining service level objectives and agreements for products and flows, and much more.
Job Responsibility :
- Work with development partners to shape the architecture, design, and implementations of new and existing systems to enhance their reliability, performance, efficiency, and scalability
- Ensure all key services are measured, monitored, and raising alerts when needed
- Automate deployment and configuration processes
- Develop reliability tools and frameworks for use by all engineers
- Share on-call duty for the most critical systems, and lead incident response and no-blame post-mortem analysis and review
- Drive efficiencies in systems and processes: capacity planning, configuration management, performance tuning, monitoring, and root cause analysis
Qualification :
- A dynamic persona with grit, drive, and a deep sense of ownership
- BS or MS in Computer Science or a related technical discipline; equivalent practical experience is a reasonable substitute
- Expertise or deep working knowledge in cloud networking and next-gen security services
- Product and working knowledge of Palo Alto products (next-gen firewalls, Panorama, GlobalProtect VPN) and HashiCorp products (Vault, Terraform, etc.)
- Expertise with coding infrastructure, automation, and orchestration
- Working knowledge of Kubernetes, Terraform, Prometheus, Elastic, Jenkins (or a similar toolset)
- Well versed in multiple cloud flavours (AWS and Azure)
- Good understanding of IAM (Identity and Access Management) in the cloud
- Good programming skills in one of C / C++, Java, JavaScript, Python, or Go, and an ability to pick up new ones
- A good understanding of large-scale distributed systems in practice, including multi-tier architectures, application security, monitoring, and storage systems
- Good understanding of the DevOps and SAFe / Scrum ways of working
(ref : hirist.tech)