Key Responsibilities:
- System Reliability: Monitor, maintain, and improve system uptime and availability to minimize downtime.
- Infrastructure as Code (IaC): Design, implement, and manage infrastructure using tools such as CloudFormation, Terraform, Ansible, or Puppet.
- Automation: Develop and maintain CI/CD pipelines and deployment scripts to streamline software releases.
- Containerization: Manage and orchestrate application containers using Docker Swarm and AWS ECS.
- Monitoring and Alerting: Set up and maintain monitoring and alerting tools such as CloudWatch, Datadog, Zenduty, and New Relic to detect and resolve issues proactively.
- Scalability and Performance: Work with development teams to optimize application and infrastructure performance and scalability.
- Security: Implement and maintain security best practices across development and operations pipelines.
- Incident Management: Participate in incident response and root cause analysis, and implement preventive measures.
- Documentation: Maintain clear documentation of system architecture, deployment processes, and best practices.
- Collaboration: Facilitate communication and knowledge sharing between development, operations, and other teams.
Qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- Minimum of 2 years of experience in DevOps, System Operations, or SRE roles.
- Strong Linux knowledge and shell scripting skills.
- Proficiency with the AWS cloud platform and its services.
- Understanding of networking, security, and secure infrastructure best practices.
- Knowledge of the ELK stack and Kafka.
- Experience with Docker Swarm or AWS ECS for containerization.
- Hands-on experience with CloudFormation, Terraform, and Ansible.
- Familiarity with CI/CD pipelines and version control (GitLab CI, Jenkins, Git).
- Working knowledge of databases (MySQL, Postgres).
- Willingness to participate in the L1 incident response rotation.
- AWS certifications (e.g., AWS Certified DevOps Engineer, AWS Certified Solutions Architect) are a plus.

Skills Required:
AWS, SQL, Cloud, DevOps, AWS ECS