Key Deliverables:
- Ensure 100% uptime by designing, deploying, and maintaining highly available, scalable global systems.
- Lead incident response, perform root cause analysis, and conduct blameless post-mortems to improve system resilience.
- Automate deployment, monitoring, and maintenance processes to enhance platform stability and reliability.
- Oversee production environments, including applications, middleware, infrastructure, and databases such as PostgreSQL, MongoDB, and MySQL.
- Develop and implement CI/CD pipelines and configuration management using Jenkins, Ansible, and shell scripting.
Role Responsibilities:
- Drive architecture design and implementation for containerized, cloud-native applications using Docker and Kubernetes.
- Collaborate with Agile teams to define technical requirements and best practices for reliability engineering.
- Monitor system health using tools like Prometheus, ELK, AppDynamics, and Nagios; ensure proactive scaling and performance tuning.
- Manage middleware environments (WebLogic, Tomcat, JBoss) and distributed systems (RabbitMQ, Kafka, Redis).
- Participate in planning sessions and architecture/code reviews, and mentor team members on SRE practices and tooling.
Skills Required:
Jenkins, Docker, Linux, Ansible, Kubernetes