Experience - 6-10 years
Design and architect SRE elements into all existing and new apps and services, and define the controls and processes that ensure SLAs and KPIs are met.
Define SLA / SLI / SLO metrics at a technical level and ensure 100% adherence.
Proactively maintain services once they are live by measuring and monitoring availability, latency and overall system health.
Respond quickly to issues and mobilise the responsible individuals to achieve the fastest possible resolution.
Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
Scale systems and services sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and speed of service resolution.
Continually analyse the service delivered to end customers with a view to enhancing customer experience, eradicating issues, fixing root causes and driving quality into everything we do.
Educate support operations and customer help desks to adapt to new ways of working by increasing skills and knowledge.
Perform RCAs, publish reports and follow through by devising short- and long-term fixes and additional runbooks.
Take part in Agile delivery of work products through backlog planning, sprint planning, design reviews, peer reviews and retrospectives.
Site Reliability Engineer • Bengaluru, Karnataka, India