Knowledge & Skills (Indicate which criteria are mandatory)
- Must have a minimum of 6 years of recent experience in Application Support / Technology Support / DevOps / CloudOps, and should be ready to work in a 24x7 support environment
- Must have managed one or more applications single-handedly and worked as an L2 / L3 support engineer for 2 to 3 years
- Must be hands-on with Unix commands, shell scripting, PL/SQL, NoSQL, JCL, and programming languages such as Java and Python
- Must be hands-on with observability tools like ELK, Kibana, Grafana, AppD, Splunk or other similar tools
- Must have domain knowledge in E-commerce, Retail, Consumer Goods, Supply Chain or any equivalent domain with direct customer-facing web or mobile applications
- Must be hands-on in analyzing logs, thread dumps, heap dumps, GC logs etc.
- Working / Functional knowledge of SAP Hybris, IBM Sterling, Magento Commerce, SAP or any other E-commerce platform would be an added advantage
- ITIL Foundation certification would be an added advantage
- Good understanding of microservices architecture
- Working knowledge of Docker, Kubernetes and cloud platforms would be an added advantage
- Strong written and verbal communication skills are a must
Job Role : SME
Key Responsibilities
Application Operations & Management :
- Study and perform capacity planning to ensure that adequate application capacity is available as per present and future projections across all environments (Replica and Prod)
- Study volumetrics / traffic / routing patterns and perform business KPI trending to identify abnormal patterns / deviations that may cause system issues in future. Propose and make changes towards closure.
- Perform continuous checks on the E2E application w.r.t. functionality, sequence flows and system load management
- Handle all escalations on issues not resolved or only partially resolved by L2
- Keep track of all existing defects in the application and review the closure status with the app lead / platform lead
- Lead and participate in all Sev1 / Sev2 issue and resolution activity by way of:
  - Issue analysis, fixing and RCA identification
  - Log extraction and sharing with the Dev / SRE teams
  - Coordination with the Dev / SRE support team for a workaround / fix to resolve the Sev1 / Sev2 issue
  - RCA preparation and closure of action points
- Assist in timely reporting of critical issues to management
- Assist in generating KPI reports and business metrics for MIS reporting
Alert Configuration and Monitoring :
- Ensure all failure points are captured as part of monitoring and alert notifications, and assist in configuration
- Optimize existing alerts based on how the application works
- Identify and record known gaps based on alerts and track them to closure
- Monitor the alerts in the NGO Portal on an ongoing basis for any exceptions
- Assist the App Lead in working on the alert reduction plan
Change Management :
- Review changes and assess end-to-end impact and limitations that might destabilize or impact production
- Ensure changes are thoroughly tested in Replica environments and meet all production standards
Application Onboarding & New Projects :
- Participate in and support project activities (upgrades, migrations, new product implementations)
- Lead the functional and regression testing activities
- Perform performance and stress testing, ensuring completeness
Learning, Training and Documentation :
- Create / change the technical documentation (runbooks, configuration, design docs) as per the review cycle
- Create Standard Operating Procedures to be shared with all team members for immediate actions
- Prepare a training calendar in coordination with the App Lead, prepare the training material and train the team's resources for operations
Information Security & Audit Compliance :
- Lead on application security concerns (InfoSec observations, BAVAMA tasks) and ensure they are actioned and closed on a priority basis