Responsible for the overall system / framework design, working from client requirements.
Set standards for resiliency, high availability, multi-region strategy, and DR strategy for applications and services on AWS / Cloud
Run test suites / frameworks to find non-compliant services / applications
Plan the strategy to migrate non-compliant applications / services so that they adhere to resiliency standards
Know how best to monitor systems and react when things go wrong, constantly writing and rewriting response playbooks to reduce the time to fix any breakdown that may occur
Ensure software applications remain reliable amidst frequent updates from development teams
Collaborate with development teams to optimize application performance & resiliency on AWS platforms
REQUIRED SKILL SETS
5+ years of experience in AWS, CI/CD and DevOps tools
Strong understanding of cloud-based architecture & cloud operations
Working understanding of infrastructure and application monitoring platforms such as Datadog, OpenSearch, ELK Stack, etc.
Good understanding of performance and capacity monitoring, including its configuration & optimization
5+ years of experience in setting up strategy, processes and checks for resiliency on AWS
Knowledge of Linux, shell scripting, and Python is preferred
Excellent problem-solving skills and attention to detail.
Ability to work independently as well as collaboratively in a team environment