Senior Site Reliability Engineer

Idox plc, Pune, Maharashtra, India
30+ days ago
Job description

Key responsibilities :

Infrastructure Management :

  • Design, build, and maintain scalable, highly available, and secure AWS environments.
  • Manage and automate infrastructure as code (IaC) using tools such as Terraform, OpenTofu, CloudFormation, and Ansible.
  • Optimize existing systems to improve efficiency, reliability, and performance.

Monitoring & Incident Response :

  • Implement comprehensive monitoring solutions using AWS CloudWatch, Prometheus, Grafana, or similar tools.
  • Respond to incidents, troubleshoot issues, and conduct root cause analysis to prevent future occurrences.
  • Develop and maintain runbooks and post-incident reviews to enhance incident response processes.

Automation & CI/CD :

  • Develop and maintain automated deployment pipelines using tools such as GitHub Actions, GitLab CI/CD, and Jenkins.
  • Automate routine tasks, including monitoring, backups, and scaling operations.
  • Collaborate with development teams to ensure smooth CI/CD processes and effective change management.

Security & Compliance :

  • Ensure systems meet security and internal governance standards, including data encryption, access controls, and logging.
  • Collaborate with the security operations team to conduct regular security audits and vulnerability assessments, implementing remediation as needed.
  • Enforce best practices in AWS identity and access management (IAM).

Performance Optimization :

  • Perform capacity planning and load testing to ensure systems scale efficiently.
  • Optimize application performance, identifying and eliminating bottlenecks in the system.
  • Utilize AWS services such as CloudFront, RDS, and ElastiCache to enhance performance.

Collaboration & Communication :

  • Work closely with internal teams to understand application requirements and provide infrastructure solutions.
  • Advocate for best practices in infrastructure management and cloud operations.
  • Communicate clearly with stakeholders, providing updates on system performance, incidents, and project status.

To be successful, you should bring :

Education :

  • Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field (or equivalent experience).

Experience :

  • 3+ years of experience as a Site Reliability Engineer, DevOps Engineer, Systems Administrator, or in a similar role.
  • Extensive experience managing and automating AWS environments.
  • Solid experience with infrastructure as code (IaC) tools like Terraform, OpenTofu, CloudFormation, or Ansible.
  • Proficiency in scripting languages such as Python, Bash, Golang, or similar.

Technical Skills :

  • Good understanding of AWS services (EC2, S3, RDS, ElastiCache, EKS, VPC, etc.).
  • Experience with containerization and orchestration (Docker, Kubernetes).
  • Solid understanding of CI/CD pipelines and tools (GitHub Actions, GitLab CI/CD, Jenkins).
  • Knowledge of networking, security principles, and best practices in a cloud environment.
  • Familiarity with monitoring and logging tools (Prometheus, Grafana, Loki, Promtail, CloudWatch, PRTG).

Soft Skills :

  • Strong problem-solving skills with a focus on root cause analysis.
  • Excellent communication and collaboration abilities.
  • Ability to work in a fast-paced, 24/7 production environment.
  • Passion for continuous learning and improvement.