The Lead Platform & DevOps Engineer position requires heavy hands-on work on one or more public cloud platforms, leveraging several PaaS and marketplace services. This position plays a key role in the configuration, implementation, monitoring, and troubleshooting of platform components, which include (but are not limited to) networking, VMs, API management services, Application Gateways, Load Balancers, and Kubernetes clusters. The lead engineer is expected to write and review code that bridges the gap between application and infrastructure using tools like Terraform, and to test that the system is reliable, scalable, and resilient. The individual is expected to collaborate and operate in parallel with the application engineers.
He / she will be the first level of escalation for resolving technical blockers.
ESSENTIAL JOB RESPONSIBILITIES :
List and describe the position's key responsibilities in order of importance, and indicate the approximate percentage of time spent on each responsibility. (Percentages must add up to 100%.) For each, describe in simple terms what the job holder must do to accomplish the main purpose of the job and the amount of direction that is required to perform the job duties. If the job manages others, describe the management duties (including authority to hire / fire / recommend pay increases / manage overall work product / schedule, etc.). Insert additional rows as needed.
Note : These statements are not intended to be an exhaustive list of all responsibilities and duties.
Responsibility : Technical
- Designing, implementing, and managing Azure-based cloud infrastructure to ensure scalability, performance, and security.
- Collaborating with cross-functional teams to deploy and maintain cloud services and resources.
- Monitoring, optimizing, and troubleshooting Azure environments to ensure high availability and cost-effectiveness.
- Applying technical expertise in Microsoft Azure, including foundational services such as Virtual Networks, Virtual Machines, Storage, Load Balancer, Azure Active Directory, App Service, Azure SQL Database, Azure Service Bus, Application Gateway, Azure Redis Cache, and Cosmos DB.
- Developing and maintaining GitLab CI / CD and Azure DevOps pipelines for continuous integration and deployment.
- Automating software build, testing, and deployment processes to accelerate software delivery.
- Collaborating with development teams to enhance CI / CD practices and tools.
- Setting up, configuring, and managing AKS clusters for container orchestration and management.
- Implementing best practices for deploying and scaling containerized applications.
- Monitoring and ensuring the reliability and performance of AKS environments.
- Integrating Cloudflare services to enhance the security, performance, and scalability of web applications.
- Configuring and managing DNS, CDN, and DDoS protection solutions using Cloudflare.
- Collaborating with cybersecurity teams to implement robust security measures.
- Implementing and configuring monitoring solutions using tools such as New Relic, Splunk, and Dynatrace.
- Creating dashboards and alerts to proactively identify and address performance and security issues.
- Analyzing monitoring data to optimize system performance and resource utilization.
- Utilizing Infrastructure as Code tools such as Terraform and ARM templates, together with Python, Bash, and PowerShell scripting, for infrastructure provisioning and automation.
- Maintaining version-controlled IaC scripts for infrastructure changes and updates.
Mentoring / Technical :
- Mentor and guide junior members of the team.
- Troubleshoot and triage technical roadblocks for scheduled deliverables.
KNOWLEDGE, SKILLS AND ABILITIES :
Indicate the education level, previous experience, specific knowledge, skills and abilities required to meet minimum requirements for this position.
Education :
- 4-year degree in Computer Science, Information Technology, or a related field.
Experience :
- 8+ years of experience in designing and managing Azure-based cloud infrastructure.
- 5+ years of experience in GitLab CI / CD pipelines and version control.
- In-depth knowledge of AKS and containerization technologies.
- Experience with Cloudflare or similar CDN and security services.
- Proficiency in Infrastructure as Code tools (Terraform, ARM templates, etc.).
- Expert knowledge of monitoring and observability tools like New Relic, Splunk, and Dynatrace.
- Excellent problem-solving and troubleshooting skills.
- Strong communication and teamwork abilities.
Knowledge and skills (general and technical)
Technical Skills :
Level of competency 4 on a scale of 5 for skills mentioned below.
Cloud Provider : Azure
Core Services : Elasticpool, SQL, Application Gateway, API Management (APIM), Key Vaults, AKS (Azure Kubernetes Service), VMSS (Virtual Machine Scale Sets), VM
Networking : NSG (Network Security Groups), Private Endpoints, Private Linked Service, VNet, Subnets, WAF (Web Application Firewall), GeoReplication
Storage : Storage Accounts
Messaging and Events : EventHub, EventGrid, Azure Service Bus (Namespaces, Queues, Topics)
Identity and Security : Managed Identities / Workload Identities, Private DNS, Auth0
Containerization and Orchestration :
- Kubernetes (K8s) : For container orchestration
- Helm : For Kubernetes package management
- Docker : For containerization
Infrastructure as Code (IaC) :
- Terraform
CI / CD :
- GitLab : CI / CD pipeline management for continuous integration and deployment
- Azure DevOps
Monitoring and Observability :
- New Relic / Splunk
Automation and Scripting :
- PowerShell
- Python
Other requirements (licenses, certifications, specialized training)
Good to have certifications :
- Certified Kubernetes Administrator
- AZ-104 (Microsoft Certified : Azure Administrator Associate)
- AZ-700 (Microsoft Certified : Azure Network Engineer Associate)
- HashiCorp Certified : Terraform Associate (003)
- AZ-305 : Designing Microsoft Azure Infrastructure Solutions
WORKING RELATIONSHIPS
Indicate the primary internal and external contacts with whom the position interacts on a regular and recurring basis, and the purpose / nature of the relationship. If not applicable, indicate n / a.
Internal contacts (and purpose of relationship) :
- Digital Engineering
- Restaurant Engineering
- Tech Ops
- Data Engineering
- Quality Engineering
- Database Engineering
- Enterprise Architecture
- Platform Architecture and Governance
- Cyber Security / Security
- Cloud Infrastructure
- Network Engineering
The purpose of interaction is directly or indirectly related to the project and engineering work the Platform & DevOps Team is expected to accomplish.
External contacts (and purpose of relationship) :
- OEMs (Microsoft, GitLab, Cloudflare, Okta)