Job Description
REQUIREMENTS :
- Total experience 10+ years.
- Strong working expertise with observability and monitoring tools : Splunk, Datadog, ELK, Prometheus, or similar.
- Proven experience in anomaly detection, alert tuning, event correlation, and custom dashboards.
- Deep understanding of alert deduplication, incident impact scoring, and automation frameworks.
- Hands-on with automation platforms (Rundeck, StackStorm, Jenkins, or custom scripting).
- Strong Python expertise (scripting & automation) and proficiency in Bash or other scripting languages.
- Experience in leveraging AI / ML for Ops : log analysis, chatbot incident assistance, predictive alerts.
- Knowledge of multi-cloud platforms and tools like PolyCloud, Terraform, or CloudFormation.
- Strong experience with ITSM tools (ServiceNow, Remedy) and their integration into AIOps pipelines.
- Expertise in integrating ServiceNow via REST / SOAP APIs for incident automation, CMDB sync, and workflow orchestration.
- Working knowledge of ITIL processes and how AIOps enhances Incident, Problem, and Change Management.
- Exposure to CMDB integration, dependency graphs, and service maps for contextual alerting and automation.
- Excellent communication and collaboration skills, with the ability to interact effectively with senior stakeholders.
RESPONSIBILITIES :
- Understanding the client’s business use cases and technical requirements and converting them into a technical design that elegantly meets the requirements.
- Mapping decisions to requirements and translating them for developers.
- Identifying different solutions and narrowing down the best option that meets the client’s requirements.
- Defining guidelines and benchmarks for NFR considerations during project implementation.
- Writing and reviewing design documents that explain the overall architecture, framework, and high-level design of the application for the developers.
- Reviewing architecture and design on various aspects such as extensibility, scalability, security, design patterns, user experience, and NFRs, and ensuring that all relevant best practices are followed.
- Developing and designing the overall solution for defined functional and non-functional requirements, and defining the technologies, patterns, and frameworks to materialize it.
- Understanding and relating technology integration scenarios and applying these learnings in projects.
- Resolving issues that are raised during code / review through exhaustive, systematic analysis of the root cause, and being able to justify the decisions taken.
- Carrying out POCs to make sure that the suggested design / technologies meet the requirements.
QUALIFICATIONS :
- Bachelor’s or master’s degree in Computer Science, Information Technology, or a related field.