Take ownership and be available 24x7 for crisis management
Perform trend analysis on the daily, weekly and monthly KPI reports to identify potential risks and issues, and guide the L1 / L2 teams in addressing them through continuous improvement plans.
Review the business-specific KPI dashboards and suggest value-based improvements
Review the incident, problem and change reports and guide the L1 and L2 teams toward closure within the SLA / OLA.
Perform root cause analysis (RCA) and drive the identified actions through to permanent resolution of the issues.
Report on training actions, including delivery of the training materials created
Periodically review the knowledge articles and suggest improvements and updates
Perform periodic audits, participate in system improvement audits and contribute to the continuous improvement strategy
Collaborate on and contribute to sprint review / planning
Collaborate with the product owner and leadership to drive future innovation
Automate, stabilize and sustain recurring tasks to improve overall efficiency in the RTM run operations.
Qualifications :
Science graduate with 5+ years of experience in infrastructure monitoring, including Splunk administration and integrations, working with multiple teams in a matrix organization.
Required Skills :
Strong sense of ownership of technical issues and proven ability to drive complex issues to closure
Splunk infrastructure, deployment, integration and administration, with certifications
Certified Red Hat Linux administrator with a strong understanding of the Windows OS
Amazon : AWS EC2, S3, Route 53
Expert in Scripting : Shell / Python
Strong understanding of networking : DHCP / DNS / SSL
DevOps : Git / Ansible / Jenkins
JIRA / Confluence
Authentication : SAML / OAuth
ITIL methods for Incident / Change / Problem / Escalation Management