Talent.com
Site Reliability Engineer

Volansys (An ACL Digital Company) • Ahmedabad
6 days ago
Job Description:

1. AWS Cloud Infrastructure:

  • Design, deploy, and manage scalable, secure, and highly available systems on AWS.
  • Optimize cloud costs, enforce tagging, and implement security best practices (IAM, VPC, GuardDuty, etc.).
  • Automate infrastructure provisioning using Terraform or AWS CDK.
  • Ensure backup, disaster recovery, and high availability (HA) strategies are in place.

2. Kubernetes (EKS preferred):

  • Manage and scale Kubernetes clusters (preferably Amazon EKS).
  • Implement CI/CD pipelines with GitOps (e.g., ArgoCD or Flux) or traditional tools (e.g., Jenkins, GitLab).
  • Enforce RBAC policies, namespace isolation, and pod security policies.
  • Monitor cluster health and optimize pod scheduling, autoscaling, and resource requests and limits.

3. Monitoring and Observability (Datadog):

  • Build and maintain Datadog dashboards for real-time visibility across systems and services.
  • Set up alerting policies, SLOs, SLIs, and incident response workflows.
  • Integrate Datadog with AWS, Kubernetes, and applications for full-stack observability.
  • Conduct post-incident reviews using Datadog analytics to reduce MTTR.

4. Automation and DevOps:

  • Automate manual processes (e.g., server setup, patching, scaling) using Python, Bash, or Ansible.
  • Maintain and improve CI/CD pipelines (Jenkins) for faster and more reliable deployments.
  • Drive Infrastructure-as-Code (IaC) practices using Terraform to manage cloud resources.
  • Promote GitOps and version-controlled deployments.

5. Linux Systems Administration:

  • Administer Linux servers (Ubuntu, RHEL, Amazon Linux) for stability and performance.
  • Harden OS security, configure SELinux and firewalls, and ensure timely patching.
  • Troubleshoot system-level issues: disk, memory, network, and processes.
  • Optimize system performance using tools such as top, htop, iotop, and netstat.

(ref: hirist.tech)
