Key Responsibilities:
- Support daily cloud operations, ensuring continuous availability, durability, and performance of JFrog's multi-cloud SaaS services.
- Monitor system health across cloud environments (AWS, Azure, GCP) and Kubernetes-based containerized workloads.
- Analyze incidents and events, perform root cause analysis, and drive issues to resolution while maintaining proper communication with internal stakeholders.
- Develop, maintain, and update Standard Operating Procedures (SOPs) and documentation for monitoring and automation activities.
- Collaborate closely with Site Reliability Engineering (SRE), Production Engineering, and Cloud Engineering teams to drive service improvements.
- Participate in a 24x7x365 shift rotation covering India, EMEA, and US regions.
- Leverage enterprise monitoring tools to ensure proactive detection and resolution of performance or availability issues.
- Contribute to the automation of repetitive operational tasks using scripting and CI/CD tools.
Skills Required:
Cloud Operations, AWS, Azure, GCP, Kubernetes, Linux Administration