Key Responsibilities
- Design, deploy, and manage infrastructure on AWS (EC2, VPC, ALB, IAM, Route53)
- Operate and maintain Kubernetes clusters (EKS and kubeadm) using Helm and ArgoCD
- Build, optimize, and maintain CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI)
- Automate infrastructure provisioning using Terraform, with modular and version-controlled setups
- Implement and monitor observability systems using Prometheus, Grafana, Loki, or the ELK stack
- Manage production incidents, perform root cause analysis, and implement preventive actions
- Enforce security best practices with IAM, HashiCorp Vault, TLS, and access controls
- Collaborate with engineering teams to ensure deployment hygiene, cost efficiency, and system scalability
Technical Requirements
- Cloud & Infrastructure: AWS (EC2, VPC, IAM, ALB, CloudWatch, Route53), DNS, NAT, routing
- Containers & Orchestration: Kubernetes (EKS preferred), kubeadm, Helm, ArgoCD, GitOps workflows
- Infrastructure as Code & Automation: Terraform (modular, environment-specific), Bash scripting, YAML, JSON, basic Python
- CI/CD: GitHub Actions, GitLab CI, Jenkins
- Monitoring & Observability: Prometheus, Grafana, Loki, ELK stack, SLO/SLA implementation, latency/P99 tracking
- Security: IAM, Vault, network security groups, TLS, least-privilege access enforcement
Preferred Experience & Traits
- Prior experience operating production-grade systems and Kubernetes clusters
- Strong understanding of cloud networking, VPC/subnet design, and security configurations
- Ability to debug real-time incidents and proactively optimize system reliability
- Independent ownership mindset with strong collaboration and communication skills
- Exposure to hybrid or co-located infrastructure environments is a plus
Skills Required
EC2, VPC, IAM, CloudWatch, Route53, AWS