Ensures the successful installation, integration, and deployment of monitoring tools, applications, and solutions in various production and non-production environments.
Identifies opportunities for streamlining processes through automation.
Prepares and maintains technical documentation to assist with the operation, maintenance, and development of the monitoring tools.
Analyses reported incidents and problems for the applications, assessing patterns, making recommendations, and implementing monitoring solutions.
Installs monitoring tools and applications in collaboration with the infrastructure teams, and reports application metrics with the ultimate goal of preventing problems and improving IT effectiveness.
Creates and executes application and system test procedures after tool implementation.
Identifies critical business transactions and implements tools to manage alerting and response.
Manage and ensure implementation of security remediation for all findings.
Lead problem management and resolution.
Provide instrumentation for proactive diagnostic insight into the health of applications.
Take end-to-end ownership of managing the monitoring tools and ensuring on-time updates, hotfixes, and upgrades.
Keep up to date with new features and versions of the monitoring tool products.
Build custom monitors to support new requirements.
Perform service maintenance employing ITIL principles.
Provide knowledge transfer to team members and IT teams on monitoring systems.
Ensure and maintain HLD, LLD, and SOP documentation across all technology and tools support.
Continually learn, share knowledge gained, and promote industry best practices.
Contribute to our long-term vision and strategy for our engineering processes.
Participate in the interview process for new engineers.
Work with application and system owners to clarify monitoring requirements and business needs.
Provide solution demonstrations.
Collaborate with teams throughout IT to ensure that the proper use of the monitoring system is understood.
Identify and create methods of solving complex monitoring issues.
Provide continuous suggestions for improved alert configuration and operational processes in relation to monitoring.
Serves as an escalation point or resource to system administrators for monitoring issues.
Mandatory Skillset:
6-7 or more years of experience in the installation, maintenance, and working use of infrastructure and application monitoring tools such as SolarWinds, OpsRamp, Dynatrace, New Relic, Prometheus and Grafana, ServiceNow ITOM, or equivalent.
Working knowledge of the holistic monitoring of applications with drill-down on the platform layer (infra, data, middleware, apps, etc.).
2 or more years of experience writing code (including Python, Shell Script, and PowerShell; databases like MSSQL / Oracle). Basic understanding of modern software development methodologies (Object).
Basic knowledge of server, cloud, virtualization, database, network, and container administration.