As a Senior Site Reliability Engineer, you will be responsible for:
- Demonstrating best practices in Cloud DevOps development, along with a willingness to continually learn Cloud-native technologies.
- Following security guidelines and working with Risk and Security teams to develop secure, compliant Cloud services.
- Monitoring configuration management, platform layout, and hosting infrastructure.
- Automating deployment of applications and infrastructure.
- Working independently and in a team environment, managing a range of customers and technical situations.
- Providing technical application support for enterprise-level systems
- Running our infrastructure with Chef, Ansible, Terraform, GitHub CI/CD, and Kubernetes
- Participating in capacity planning, system performance monitoring, resource utilization trending, and incident and change management.
- Coordinating with Cloud infrastructure partners on server, network, database, and other service-related incidents and projects
- Deploying application upgrades and patches in production and test environments
- Troubleshooting application alerts and Azure and AWS policy alerts raised by monitoring tools or found through code inspection, and performing root cause analyses (RCAs)
- Writing tutorials, how-to videos, technical articles, and knowledge base articles for the customer community, and keeping them up to date
- Working on critical, complex customer problems that may span multiple services
- Participating in 24x7 on-call rotation and working with global teams
- Collaborating with cross-functional stakeholders
- Providing mentorship and guidance to team members
- Ensuring security best practices are integrated into the development lifecycle, including compliance with data protection regulations.
- Collaborating with stakeholders to understand requirements, set priorities, and communicate progress and challenges.
Fuel your passion
To be successful in this role, you will:
- Have a bachelor's degree in computer science or a STEM major (science, technology, engineering, or math), with 7-10 years of total experience.
- Have 5-8 years of experience with cloud infrastructure platforms such as AWS and Azure.
- Have prior experience setting up, running, and configuring Cloud applications.
- Have 5+ years of hands-on experience with public cloud-based applications, technologies, and tools for deployment, monitoring, and operations, such as Docker and Kubernetes.
- Have 5+ years of experience with Linux (RHEL) performance monitoring, including the commands used for monitoring and the interpretation of performance parameters.
- Have mastery of collaborative software development using Git, Jira, Confluence, etc.
- Have experience with infrastructure optimization in the Cloud.
- Have a deep understanding of operating and monitoring Java applications and Docker containers.
- Have hands-on experience with CI/CD tools (AWS CodePipeline, Azure DevOps, GitLab CI/CD, Jenkins) and IaC tools (Terraform, AWS CloudFormation, Ansible, etc.).
- Be an expert in performance monitoring and capacity management of enterprise systems using various tools.
- Have experience with observability, which is essential: APM tools (Dynatrace, AppDynamics, etc.), metrics and log consolidation (Splunk), and monitoring and logging tools such as Prometheus, Grafana, and the ELK stack.
- Have knowledge of application design patterns, J2EE application architectures, microservices, Spring Boot, and Cloud-native architectures.
- Have proficiency in Java runtimes, Core Java, garbage collection, and JVM parameter tuning.
- Have experience with RDBMS and NoSQL database technologies.
- Have knowledge of automation scripting languages such as Python, Linux shell scripting, or Windows PowerShell.
- Have experience with change management and incident management processes.
Skills Required
RDBMS, Core Java, Linux, Automation, Application Support, Python