DevOps 3 – Lead DevOps
Overview
We are seeking a DevOps 3 – Lead DevOps Engineer / Infrastructure Architect to own and scale our cloud-native infrastructure, with a deep focus on Kubernetes, platform reliability, and automation. This role requires strong architectural thinking, hands-on technical depth, and the ability to lead DevOps initiatives across multiple engineering teams. You will drive infrastructure design, build platform-level capabilities, and ensure our environments are secure, observable, cost-efficient, and highly scalable.
This role sits at the intersection of DevOps, SRE, cloud architecture, and platform engineering, and requires expert-level hands-on execution.
What You’ll Do
Kubernetes Architecture & Platform Ownership
- Architect, design, and operate large-scale Kubernetes clusters (EKS / GKE / AKS) for microservices and data workloads.
- Define cluster topology, autoscaling strategies (HPA / VPA / Karpenter), node groups, and networking models.
- Build and maintain service mesh frameworks (Istio / Linkerd), ingress controllers, and API gateways.
- Develop internal platform tooling for deployments, traffic management, rollback, and cluster governance.
- Own cluster security: pod security standards, network policies, admission controllers, RBAC.
Infrastructure Architecture & Automation
- Design end-to-end cloud infrastructure architectures ensuring security, reliability, and cost optimization.
- Implement and enforce Infrastructure-as-Code using Terraform / Pulumi at org-wide scale.
- Define patterns for multi-account architecture, VPC design, load balancing, secret management, and zero-trust networking.
- Lead cloud cost optimization initiatives and implement FinOps practices.
CI / CD Platform & Deployment Engineering
- Architect highly reliable CI / CD workflows with GitHub Actions / GitLab CI / Jenkins / ArgoCD.
- Build automated release pipelines for microservices, operators, and stateful workloads.
- Set standards for blue / green, canary, shadow deploy, and progressive rollouts.
- Build reusable deployment templates and developer self-service mechanisms.
Reliability, Observability & Operations
- Own organization-wide monitoring, logging, and alerting platforms (Prometheus, Loki, Grafana, New Relic).
- Define SLOs / SLIs, reliability targets, and implement automated remediation workflows.
- Lead incident response for high-severity issues and drive long-term fixes through RCAs.
- Build platform-wide health dashboards, cluster insights, cost dashboards, and performance metrics.
Security & Compliance
- Implement Kubernetes and cloud-native security best practices: IAM hardening, OPA / Gatekeeper policies, secret lifecycle, container scanning, and runtime security.
- Automate compliance validations and enforce organization-wide DevSecOps policies.
- Partner with security teams for penetration tests, threat modeling, and vulnerability management.
Technical Leadership & Collaboration
- Mentor engineers and guide teams on DevOps, Kubernetes, and cloud architecture.
- Lead technical design reviews, platform roadmap discussions, and cross-team initiatives.
- Influence engineering decisions by providing architectural recommendations and operational insights.
What You’ll Bring
Core Technical Strengths
- 6–12 years of DevOps / SRE / Platform experience with deep Kubernetes expertise.
- Hands-on experience designing, operating, and scaling production-grade Kubernetes clusters.
- Expert-level understanding of:
  - Kubernetes internals (scheduler, controller manager, CRDs, operators)
  - Container runtime fundamentals
  - Cluster networking (CNI, service mesh, ingress, DNS)
  - Security (PSPs, policies, certificate mgmt)
- Strong cloud architecture experience (AWS / GCP / Azure), especially:
  - VPC design, interconnects, NAT, security groups
  - Load balancing, autoscaling, resilience patterns
  - Monitoring stack setup and automation
Automation & Tooling
- Strong proficiency in IaC (Terraform, Pulumi) and GitOps (ArgoCD / FluxCD).
- Programming / scripting proficiency in Python, Go, or Bash.
- Experience building internal dev platforms, tools, and automation frameworks.
Soft Skills
- Ability to drive architecture discussions and make high-impact technical decisions.
- Leadership mindset with strong ownership and ability to influence without authority.
- Strong communication and stakeholder management across engineering, security, and product teams.
Preferred Qualifications
- Experience with multi-cluster, multi-region, or hybrid-cloud Kubernetes setups.
- Prior work with distributed systems, high-scale environments, or data platforms.
- Exposure to eBPF, Cilium, or advanced CNI plugins.
- Certifications such as CKA / CKAD / CKS or AWS Solutions Architect.