Description
GSPANN is hiring AI Operations Engineers to build monitoring solutions, automate workflows, and apply AIOps practices for reliable, high-performing systems. Expertise in AppDynamics, Sumo Logic, automation, and Site Reliability Engineering (SRE) is crucial.
Role and Responsibilities
- Architect and deploy monitoring solutions across applications, infrastructure, and networks using tools such as AppDynamics, Sumo Logic, Grafana, and LogicMonitor.
- Define and enforce observability best practices, including infrastructure monitoring, Application Performance Monitoring (APM), log analytics, and synthetic monitoring.
- Build and maintain dashboards, alerts, and Key Performance Indicator (KPI) reports that provide actionable insights for technical and business stakeholders.
- Write and optimize log queries (e.g., Sumo Logic queries, Dynatrace DQL, Grafana Loki LogQL, Splunk SPL) to extract meaningful data and support root cause analysis.
- Monitor and improve performance in cloud environments such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP), as well as containerized platforms like Kubernetes and Docker.
- Develop automation scripts and workflows using Python, PowerShell, or Shell scripting to enable self-healing systems and streamline monitoring operations.
- Integrate monitoring tools with Information Technology Service Management (ITSM) platforms, including ServiceNow, Ivanti, and FreshService, to automate incident detection, ticketing, and resolution workflows.
- Perform deep-dive troubleshooting and analysis by correlating data across multiple monitoring sources to uncover performance issues and anomalies.
- Collaborate with DevOps, infrastructure, and application teams to ensure full monitoring coverage and continuous improvement of observability practices.
Skills and Experience
- 5–8 years of experience in monitoring, observability, or AIOps engineering roles.
- Experience designing, implementing, and configuring monitoring solutions across applications, infrastructure, and networks using AppDynamics, Sumo Logic, and Grafana.
- Strong understanding of monitoring methodologies, including infrastructure monitoring, APM, log analytics, and synthetic monitoring.
- Hands-on experience with LogicMonitor, particularly for infrastructure monitoring.
- Expertise in dashboard creation, alerting, and KPI reporting for business and technical audiences.
- Proficiency in log query languages such as Sumo Logic queries, Dynatrace DQL, Grafana Loki LogQL, or Splunk SPL.
- Familiarity with AWS, Azure, GCP, and containerized workloads on Kubernetes and Docker.
- Working knowledge of scripting and automation with Python, PowerShell, or Shell scripting for integrations and self-healing automation.
- Strong analytical and troubleshooting abilities to correlate data across monitoring sources.
- Experience integrating monitoring tools with ITSM platforms such as ServiceNow, Ivanti, or FreshService to support automated workflows.