Aptean is seeking an experienced and hands-on Manager, SRE (Cloud Infrastructure & Operations) to lead a team of 15 engineers. You will be responsible for managing the infrastructure layer of our multi-tenant, cloud-hosted ERP products. This critical role covers platform reliability, product upgrades, cloud security, incident management, preventive maintenance, disaster recovery, and compliance audits. You will also act as a stage-gate for all production deployments, ensuring release readiness, rollback capability, and platform stability.
Roles and Responsibilities
- Cloud Infrastructure Oversight : Oversee provisioning, monitoring, and scaling of cloud environments (primarily Azure) for ERP products. Ensure optimal performance, cost control, and platform stability.
- SaaS Product Operations : Own product environment availability (Dev, UAT, Prod), plan platform upgrades, apply security patches, and manage certificates and access.
- Incident Management : Lead incident response for outages and degradation. Perform Root Cause Analysis (RCA), document learnings, and implement post-mortem action items.
- Preventive Maintenance : Define and execute regular health checks, patching schedules, environment cleanups, and alert tuning.
- Disaster Recovery Planning : Develop and test Disaster Recovery (DR) and Business Continuity Plans (BCP). Ensure business continuity across all cloud-hosted environments.
- Security & Compliance : Lead infrastructure-level compliance activities for SOC 2, ISO 27001, and secure deployment pipelines. Coordinate with infosec and audit teams.
- Production Deployment Stage-Gate : Review and approve all deployment tickets. Validate readiness, rollback strategy, and impact analysis before production cutover.
- Team Leadership : Lead, coach, and upskill a team of cloud and DevOps engineers. Foster a learning culture aligned with platform reliability and innovation.
Skills
- Cloud Platform : Advanced proficiency in Azure (App Services, VM, Networking, Storage, Defender).
- ERP Infrastructure : Advanced understanding of multi-tenant ERP hosting, Cloud DB tuning, and PaaS scaling.
- DevOps : Intermediate knowledge of CI / CD (Azure DevOps, GitHub Actions) and automation.
- Infrastructure as Code (IaC) : Intermediate experience with Terraform / Bicep / ARM Templates.
- Monitoring & Logging : Advanced proficiency in Azure Monitor, Application Insights, Log Analytics.
- Incident Management : Expert in ITIL, on-call runbooks, and RCA writing.
- Preventive Operations : Expert in scheduled health checks and capacity management.
- Security & Access : Advanced understanding of IAM, Azure AD, role-based access, and secret rotation.
- Disaster Recovery : Advanced knowledge of DR drills, geo-redundancy, and RTO / RPO.
- Audit & Compliance : Advanced understanding of SOC 2, ISO 27001, and risk registers.
- Release Stage-Gate : Expert in deployment approvals and go / no-go criteria.
- Collaboration : Expert in working with Product, Security, and Development teams.
- Tools : Intermediate proficiency with Azure DevOps, Jira, ServiceNow, Salesforce (case management).
- Leadership : Expert in people development, shift planning, and mentoring.
- Strong hands-on knowledge of Azure (VMs, PaaS, Networking, Monitoring, Identity).
- Experience with ERP platforms (SAP Cloud, Infor, Oracle Cloud, or custom-built ERP solutions).
- Good grasp of DevOps practices, CI / CD pipelines, and infrastructure as code (IaC).
- Familiarity with SOC 2, ISO 27001, and data privacy compliance.
Qualifications
- Education : Bachelor's degree (Required). Master's degree (Preferred).
- Work Experience : 10+ years of experience in Cloud Infrastructure / SaaS Operations, with 3+ years managing teams in a cloud product environment (preferably multi-tenant SaaS).
- Certifications : ITIL or SRE certification preferred.
Skills Required
Cloud Platform, ERP System, DevOps, Incident Management, Azure Database