Your Responsibilities:
- Cloud Infrastructure Ownership: Design, deploy, and manage secure, scalable cloud infrastructure on AWS, ensuring high availability, fault tolerance, and cost optimization for business-critical applications.
- Monitoring & Observability: Implement and enhance observability and monitoring solutions using tools like DataDog, ensuring proactive detection of issues and continuous performance improvements across AWS environments.
- CI/CD Pipeline Optimization: Lead the development, optimization, and maintenance of robust CI/CD pipelines using Jenkins and other automation tools, improving deployment velocity, reliability, and consistency.
- Containerization & Orchestration: Manage and optimize containerized applications using Kubernetes, leading efforts to improve scalability, efficiency, and deployment processes.
- Performance & Cost Management: Continuously monitor AWS infrastructure and application performance, identify optimization opportunities, and drive initiatives for both cost management and improved resource utilization.
- Disaster Recovery & Backup Planning: Architect and implement disaster recovery and backup strategies on AWS, balancing cost-efficiency with robust data protection and system availability.
- Advanced Troubleshooting: Act as a key point of contact for complex troubleshooting across multiple platforms and applications. Resolve critical incidents swiftly to minimize service disruptions and downtime.
- Security & Compliance: Ensure cloud infrastructure and operations follow AWS security best practices, data protection regulations (e.g., HIPAA, GDPR), and organizational security requirements.
- Automation & Infrastructure as Code: Champion the use of Infrastructure as Code (IaC) tools like Terraform or Ansible, leading the automation of cloud operations and driving improvements in operational efficiency.
- Mentorship & Team Leadership: Mentor and guide junior team members in technical skills, cloud technologies, and operational best practices. Foster a collaborative and continuous learning environment.
- Collaboration Across Teams: Work closely with cross-functional teams (e.g., DevOps, software engineering, product) to understand business requirements and deliver cloud-based solutions that align with strategic goals.
Skills & Qualifications:
- Education: Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent practical experience.
- Experience: 5-8 years of experience in Site Reliability Engineering, Cloud Engineering, or a similar technical role, with a strong focus on AWS infrastructure, monitoring, automation, and cloud-native technologies.
- Technical Skills:
  - Extensive experience in managing and optimizing AWS environments, including EC2, S3, Lambda, RDS, etc.
  - Proficiency in cloud monitoring tools like DataDog or similar platforms.
  - Strong experience with Jenkins for CI/CD pipeline management and automation.
  - In-depth knowledge of container orchestration and management with Kubernetes.
  - Strong scripting skills in languages such as Python or Bash for automation and system management.
  - Expertise in Infrastructure as Code (IaC) with tools like Terraform or Ansible.
  - Knowledge of cloud security best practices, compliance frameworks (e.g., HIPAA, SOC 2), and performance optimization.
  - Strong troubleshooting and problem-solving skills, particularly in complex cloud-based environments.
- Leadership & Mentorship: Demonstrated ability to lead technical initiatives, mentor junior engineers, and promote a culture of collaboration and continuous improvement.
- Communication Skills: Excellent verbal and written communication skills, with the ability to explain complex technical issues to both technical and non-technical audiences.
- Problem-Solving & Ownership: A proactive, results-driven attitude toward solving complex technical problems and owning the reliability of critical systems.
Nice-to-Haves:
- AWS Certifications (e.g., AWS Certified Solutions Architect, AWS Certified DevOps Engineer).
- Experience with serverless architectures (e.g., AWS Lambda, API Gateway).
- Familiarity with cloud-native tools like Kafka, Kinesis, or Redshift.
- Exposure to container security and advanced Kubernetes management practices.
Skills Required:
SOC, Automation, Python, Information Technology, AWS