We are seeking SRE / Ansible Developers to join our Enterprise SRE Center of Excellence (COE) team. This team defines development standards, ensures compliance, and builds automation frameworks that reduce downtime and improve system reliability across the organization. In this role, you will be instrumental in automating complex failover processes using Ansible, Ansible Tower, and modern DevOps practices. You'll collaborate with cross-functional teams to build a robust, push-button failover system that supports real-time disaster recovery across critical systems. Responsibilities include:
- Design and maintain Ansible playbooks and Ansible Tower workflows for disaster recovery and failover automation.
- Automate failover processes across relational databases (Oracle, MySQL, PostgreSQL, SQL Server).
- Integrate with tools like Pronghorn for DNS failover and routing logic.
- Build self-healing scripts and reusable automation patterns for large-scale, asynchronous systems.
- Develop a centralized failover dashboard with visual indicators and dependency mapping.
- Collaborate with DBAs, application owners, and network engineers to ensure seamless failover orchestration.
- Support Kubernetes-based scaling strategies and CI/CD integration using GitLab.
- Contribute to operational readiness frameworks including blue-green deployments and observability.
Required Skills & Experience:
- 5+ years in DevOps / SRE roles within enterprise environments.
- Strong scripting skills in Bash and Python.
- Expertise in Ansible and Ansible Tower for infrastructure automation.
- Experience with CI/CD tools such as Jenkins and GitLab.
- Proficiency in Git-based version control and release strategies.
- Familiarity with Kubernetes and AWS cloud services.
- Deep understanding of relational databases and failover strategies.
- Knowledge of networking, load balancing, and asynchronous messaging systems.
- Experience with observability tools and monitoring systems.
- Excellent problem-solving and cross-functional collaboration skills.

(Ref: hirist.tech)