We are looking for talented, creative, and proactive individuals who are passionate about solving complex business problems and contributing to the next generation of modern applications. Our goal is to help our customers understand the connections between application performance, user experience, and business outcomes, thereby creating exceptional customer experiences. Join us in shaping the future of Observability Engineering within our Intelligent Operations team with innovative data and integration solutions.
Experience
- Minimum 6 years of hands-on experience with Application Performance Management tools such as Datadog, New Relic, AppDynamics, Dynatrace, Splunk ITSI, Honeycomb, Chronosphere, Riverbed Aternity / Alluvio, ExtraHop, and LogicMonitor
- Hands-on experience with cloud-native, open-source solutions like Prometheus, Grafana, ELK stack / Elastic, and OpenTelemetry (OTEL)
- Experience with public cloud monitoring solutions like AWS CloudWatch, Azure Application Insights, etc.
- Strong understanding of network & system management solutions, distributed systems, networking, and database technologies
- Operational background and familiarity with ITIL, ITSM, SRE, or DevOps best practices and principles
- Excellent problem-solving, organizational, project management, and communication skills
- Eagerness to collaborate and contribute to team success, with a continuous learning mindset
- Experience with containerization and orchestration technologies like Docker and Kubernetes
- Broad background in software engineering with, at a minimum, generalist-level expertise in programming languages and platforms such as Python, Java, Go, .NET, Node.js, Ruby, and PHP
- Familiarity with microservices architecture, service mesh technologies, and end-user technologies (iOS, Android, JavaScript, HTML5)
- Knowledge of infrastructure provisioning and configuration management tools such as Terraform and Ansible
Roles and Responsibilities
- Implement and maintain cutting-edge Observability solutions utilizing tools like New Relic, Datadog, AppDynamics, or Dynatrace for our large-scale enterprise customers
- Develop and maintain systems for effective monitoring, logging, and tracing, ensuring scalability and reliability
- Collaborate with cross-functional teams, including software engineers, product managers, and data scientists, to build resilient systems
- Integrate observability practices into different engineering workflows and lead the adoption, optimization, and integration of products within the customer's business infrastructure
- Create custom dashboards, set up alerts, and develop AIOps rules, ensuring effective tracking against goals / KPIs
- Provide technical support in post-sales processes, including installation, deployment, training, technical check-ups, and escalation management
- Identify performance bottlenecks and anomalous system behavior and resolve root causes of service issues
- Stay updated with the latest trends in observability, logging, monitoring, and cloud technologies and introduce innovative solutions and best practices
- Participate in strategic technology planning, focusing on scalability, cost-effectiveness, and risk management in observability infrastructure
- Document observability systems and processes comprehensively and prepare reports for management on system performance and reliability
- Utilize Infrastructure as Code (IaC) principles for efficient infrastructure provisioning and management
Skills Required
Kubernetes, Python, Datadog, AppDynamics, Prometheus, Terraform