Senior Cloud Ops Engineer, Azure (Infrastructure & Operations)
We are seeking a highly skilled and experienced Senior Cloud Ops Engineer to join our client's Infrastructure & Operations team. The successful candidate will play a critical role in managing and optimizing the client's diverse cloud services to ensure high availability, performance, and security across the organization. This role requires deep expertise in cloud services, cloud security, cloud identity, cloud governance, and automation, with a strong focus on infrastructure and operational excellence.
Key Responsibilities
- Lead the design, implementation, and ongoing evolution of the Azure Landing Zone architecture, including platform-level services and automated provisioning of application landing zones using Terraform.
- Architect and implement secure, scalable virtual networks, storage strategies, identity, and security configurations aligned to enterprise standards and regulatory requirements.
- Own the design and delivery of robust CI/CD pipelines using GitHub Actions and other tools; implement and champion Infrastructure-as-Code (IaC) and configuration management best practices.
- Define and deploy platform engineering tooling, automation, and patterns that enable repeatable, secure, and accelerated workload onboarding and application migration.
- Design and operationalize GitOps workflows and pipelines to provide declarative, auditable, and self-healing infrastructure delivery.
- Establish and enforce cloud security controls, identity & access management, and compliance guardrails; perform threat modeling and collaborate with security teams on remediation.
- Drive observability and reliability: define monitoring, logging, alerting, SLOs/SLIs, and incident response practices to ensure production stability and performance.
- Optimize cloud cost, capacity planning, tagging, and governance; recommend and implement cost-optimization strategies and guardrails.
- Lead operational excellence for cloud infrastructure: automate runbooks, manage change control, and reduce toil through continuous improvement.
- Mentor and coach engineers, provide technical leadership across cross-functional teams, and influence cloud strategy and roadmap decisions.
- Partner with product, development, and enterprise architecture teams to translate business requirements into secure, resilient, and scalable cloud solutions.
Essential Functions
- Demonstrate deep expertise in core Azure services and platform capabilities, including compute, networking, storage, identity, and PaaS offerings.
- Author, maintain, and govern Infrastructure-as-Code using Terraform and/or ARM templates; create reusable modules, enforce standards, and drive automated provisioning.
- Design, implement, and optimize DevOps practices and CI/CD pipelines (e.g., GitHub Actions, Azure DevOps) to enable secure, repeatable, and auditable deployments.
- Lead containerization and orchestration efforts using Docker and Kubernetes (AKS); define patterns for packaging, deployment, scaling, and lifecycle management of containerized workloads.
- Define and enforce cloud security best practices: identity & access management, network security, encryption, secrets management, and policy-as-code for continuous compliance.
- Troubleshoot complex infrastructure and application issues using structured root-cause analysis; implement long-term mitigations and proactive monitoring to reduce incidents.
- Communicate clearly and persuasively with technical and non-technical stakeholders; produce design documentation, runbooks, and executive-level updates as needed.
- Drive platform engineering initiatives: design platform patterns, implement self-service tooling, automate onboarding, and improve developer experience across the organization.
- Advocate for operational excellence by establishing observability (monitoring, logging, tracing), SLOs/SLIs, incident response processes, and cost governance practices.
- Highly desirable: combined AWS and Azure expertise in landing zone architecture, cloud policy management, governance, monitoring, identity solutions, system integrations, application migration strategies, application CI/CD best practices, Terraform module catalogs and release management, and infrastructure pipelines.
Required Qualifications
- Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent practical experience.
- 7+ years of professional experience in Azure Cloud Management & Operations, specifically within an Infrastructure and Operations context.
- Proven mastery of landing zone deployment and management.
- Excellent scripting and automation skills (e.g., ARM, Terraform, Shell, Python, PowerShell).
- Strong analytical, problem-solving, and communication skills, with the ability to convey complex technical information clearly.
- Experience with DevOps practices and tools (e.g., Terraform, Ansible, Kubernetes, Docker).
- Experience with database migration projects, including cross-platform migrations.
- Knowledge of monitoring tools such as Prometheus, Grafana, or equivalent.
Technologies
Azure Core Platform
o Hands-on experience with ARM, Terraform, Azure CLI, kubectl, PowerShell, and Python for automation
o Compute: Azure Virtual Machines, Azure App Service, Azure Functions
o Networking: Azure Virtual Network (VNet), Azure Load Balancer, Azure Application Gateway, Azure Firewall, Azure Bastion
o Storage: Azure Blob Storage, Azure Files, Azure Managed Disks
o Identity: Azure Active Directory (AAD), Azure AD Connect, Conditional Access, Managed Identities
Infrastructure-as-Code (IaC)
o Primary: ARM and Terraform (authoring modules, state management, workspaces)
DevOps / CI-CD / GitOps
o GitHub (Actions, repos, packages), Azure DevOps (Pipelines, Repos)
Containerization and orchestration
o Docker (image authoring, registries)
o Azure Kubernetes Service (AKS) and Kubernetes expertise (Helm, manifests, operators)
o Container registries: Azure Container Registry (ACR)
Security & compliance
o Azure Policy, Azure Blueprints, Microsoft Defender for Cloud (formerly Security Center)
Observability & reliability
o Azure Monitor, Log Analytics, Application Insights
o Tracing & metrics tools, alerting, SLO/SLA concepts
Governance, cost & operations
o Tagging strategies, cost management (Azure Cost Management), subscription/tenant strategy, management groups
Operational Support & Security
o Serve as an escalation point for critical issues, providing 24/7 on-call support on a rotational basis.
o Enforce security policies and procedures, managing user access, roles, and privileges in compliance with internal and external audit requirements.
o Troubleshoot and resolve complex production incidents, providing root cause analysis and implementing permanent fixes.