Director of Cloud, DevOps, and SRE: Emphasis on Execution
We are looking for a Director of Cloud, DevOps, and Site Reliability Engineering (SRE) who will be a hands-on, execution-focused leader responsible for driving the technical strategy, implementation, and continuous operation of our cloud infrastructure and services. This role demands a pragmatic leader capable of translating strategic vision into tangible, high-quality, and scalable results.
Key Responsibilities and Execution Focus
The primary responsibility of the Director is to execute on the cloud, DevOps, and SRE strategy, ensuring immediate and long-term operational excellence.
1. Delivery and Implementation (Execution)
- Lead the migration and deployment of core business applications and services to cloud platforms (e.g., AWS, Azure, GCP), ensuring projects are delivered on time and within budget and meet defined non-functional requirements (security, scalability, performance).
- Direct the implementation of Continuous Integration/Continuous Delivery (CI/CD) pipelines across all engineering teams, focusing on fully automated, reliable, and repeatable deployments.
- Drive Infrastructure as Code (IaC) adoption (e.g., Terraform, Ansible), establishing a 100% code-driven infrastructure environment with clear governance and review processes.
- Establish and enforce Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for all critical services, immediately implementing monitoring and alerting to measure against these targets.
2. Operational Excellence and Reliability (SRE Execution)
- Direct the SRE function to minimize operational toil by developing and deploying automation tools and services for routine tasks, incident response, and capacity management.
- Lead major incident response and post-mortem processes, ensuring effective root cause analysis and implementing immediate, execution-driven solutions to prevent recurrence.
- Execute a robust cost management strategy for cloud resources, implementing FinOps practices to optimize spending without compromising reliability or performance.
- Own the security posture of the cloud environment, working hands-on with security teams to implement and automate compliance and security controls (DevSecOps).
3. Team Leadership and Mentorship (Pragmatic Leadership)
- Recruit, develop, and mentor a high-performing team of Cloud Engineers, DevOps Engineers, and SREs, setting clear, execution-focused goals and metrics.
- Foster a culture of ownership, accountability, and execution within the team, emphasizing rapid iteration, collaboration, and bias for action.
- Act as a hands-on leader by actively participating in design reviews, critical deployments, and troubleshooting efforts.
Qualifications and Requirements
Required Skills & Experience (Execution-Driven)
- Minimum of 10 years of progressive experience in infrastructure, operations, or software engineering, with at least 3 years in a Director or Senior Management role overseeing Cloud, DevOps, or SRE teams.
- Deep, demonstrable expertise in a major cloud provider (AWS, Azure, or GCP), including advanced networking, security services, and serverless architectures. Certification at the Professional/Specialty level is a plus.
- Extensive experience implementing and scaling IaC and configuration management tools (e.g., Terraform, Ansible, SaltStack) in a production environment.
- Proven track record of establishing and running SRE practices (SLOs, error budgets, toil reduction) with tangible results in improving service reliability and availability.
- Proficiency in modern scripting/programming languages (e.g., Python, Go, Bash) for automation and tool development.
Education
Bachelor’s degree in Computer Science, Engineering, or a related field; equivalent practical experience is accepted.