Key responsibilities :
Infrastructure Management :
- Design, build, and maintain scalable, highly available, and secure AWS environments.
- Manage and automate infrastructure as code (IaC) using tools like Terraform, OpenTofu, CloudFormation, and Ansible.
- Optimize existing systems to improve efficiency, reliability, and performance.
Monitoring & Incident Response :
- Implement comprehensive monitoring solutions using AWS CloudWatch, Prometheus, Grafana, or similar tools.
- Respond to incidents, troubleshoot issues, and conduct root cause analysis to prevent future occurrences.
- Develop and maintain runbooks and post-incident reviews to enhance incident response processes.
Automation & CI/CD :
- Develop and maintain automated deployment pipelines using tools like GitHub Actions, GitLab CI/CD, and Jenkins.
- Automate routine tasks, including monitoring, backups, and scaling operations.
- Collaborate with development teams to ensure smooth CI/CD processes and effective change management.
Security & Compliance :
- Ensure systems meet security and internal governance standards, including data encryption, access controls, and logging.
- Collaborate with the security operations team to conduct regular security audits and vulnerability assessments, implementing remediation as needed.
- Enforce best practices in AWS Identity and Access Management (IAM).
Performance Optimization :
- Perform capacity planning and load testing to ensure systems scale efficiently.
- Optimize application performance by identifying and eliminating bottlenecks in the system.
- Use AWS services such as CloudFront, RDS, and ElastiCache to enhance performance.
Collaboration & Communication :
- Work closely with internal teams to understand application requirements and provide infrastructure solutions.
- Advocate for best practices in infrastructure management and cloud operations.
- Communicate clearly with stakeholders, providing updates on system performance, incidents, and project status.
To be successful, you should bring :
Education :
- Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field (or equivalent experience).
Experience :
- 3+ years of experience as a Site Reliability Engineer, DevOps Engineer, Systems Administrator, or in a similar role.
- Extensive experience managing and automating AWS environments.
- Solid experience with infrastructure as code (IaC) tools like Terraform, OpenTofu, CloudFormation, or Ansible.
- Proficiency in scripting languages such as Python, Bash, Golang, or similar.
Technical Skills :
- Good understanding of AWS services (EC2, S3, RDS, ElastiCache, EKS, VPC, etc.).
- Experience with containerization and orchestration (Docker, Kubernetes).
- Solid understanding of CI/CD pipelines and tools (GitHub Actions, GitLab CI/CD, Jenkins).
- Knowledge of networking, security principles, and best practices in a cloud environment.
- Familiarity with monitoring and logging tools (Prometheus, Grafana, Loki, Promtail, CloudWatch, PRTG).
Soft Skills :
- Strong problem-solving skills with a focus on root cause analysis.
- Excellent communication and collaboration abilities.
- Ability to work in a fast-paced, 24/7 production environment.
- Passion for continuous learning and improvement.