Job Summary :
We are seeking a seasoned Site Reliability Engineer (SRE) to join our growing team.
This is a critical role in ensuring the reliability, scalability, and performance of our cloud infrastructure on AWS. You will leverage your expertise in automation, infrastructure management, and cost optimization to build and maintain resilient systems that support our business objectives. This role requires a proactive, results-oriented individual with a passion for building and maintaining robust, scalable systems.
Responsibilities :
- Design, deploy, and manage highly available and scalable infrastructure on AWS.
- Automate infrastructure provisioning and configuration using tools like Terraform and Ansible.
- Develop and implement monitoring and alerting systems to proactively identify and troubleshoot incidents.
- Optimize infrastructure costs on AWS through resource management and utilization analysis.
- Collaborate with development teams to implement DevOps practices and ensure smooth deployments.
- Participate in on-call rotations and respond diligently to incidents to minimize downtime.
- Continuously improve infrastructure reliability and performance through automation and best practices.
- Stay up to date with the latest trends and technologies in cloud computing and SRE.
Requirements :
- 4+ years of experience in Site Reliability Engineering or a related field such as DevOps
- Proven expertise in deploying and managing infrastructure on AWS (EC2, S3, VPC, etc.)
- Experience with Linux is a must; prior experience as a Linux administrator is a plus
- Strong understanding of networking fundamentals is a must
- Strong knowledge of infrastructure automation tools like Terraform and Ansible
- Experience with DevOps methodologies and CI/CD pipelines
- A keen understanding of cost optimization principles in AWS
- Excellent problem-solving and analytical skills
- Ability to work independently and as part of a cross-functional team
- Diligent and proactive approach to incident response
- Willingness to participate in on-call rotations
Nice to have :
- Experience with compliance frameworks (SOC 2, HIPAA, etc.)
- Experience with container orchestration tools (Kubernetes)
(ref : hirist.tech)