Job Description :
1. AWS Cloud Infrastructure :
- Design, deploy, and manage scalable, secure, and highly available systems on AWS.
- Optimize cloud costs, enforce tagging, and implement security best practices (IAM, VPC, GuardDuty, etc.).
- Automate infrastructure provisioning using Terraform or AWS CDK.
- Ensure backup, disaster recovery, and high availability (HA) strategies are in place.
2. Kubernetes (EKS preferred) :
- Manage and scale Kubernetes clusters (preferably Amazon EKS).
- Implement CI/CD pipelines with GitOps (e.g., ArgoCD or Flux) or traditional tools (e.g., Jenkins, GitLab).
- Enforce RBAC policies, namespace isolation, and pod security policies.
- Monitor cluster health and optimize pod scheduling, autoscaling, and resource limits/requests.
3. Monitoring and Observability (Datadog) :
- Build and maintain Datadog dashboards for real-time visibility across systems and services.
- Set up alerting policies, SLOs, SLIs, and incident response workflows.
- Integrate Datadog with AWS, Kubernetes, and applications for full-stack observability.
- Conduct post-incident reviews using Datadog analytics to reduce MTTR.
4. Automation and DevOps :
- Automate manual processes (e.g., server setup, patching, scaling) using Python, Bash, or Ansible.
- Maintain and improve CI/CD pipelines (Jenkins) for faster and more reliable deployments.
- Drive Infrastructure-as-Code (IaC) practices using Terraform to manage cloud resources.
- Promote GitOps and version-controlled deployments.
5. Linux Systems Administration :
- Administer Linux servers (Ubuntu, RHEL, Amazon Linux) for stability and performance.
- Harden OS security, configure SELinux and firewalls, and ensure timely patching.
- Troubleshoot system-level issues: disk, memory, network, and processes.
- Optimize system performance using tools like top, htop, iotop, netstat, etc.
(ref: hirist.tech)