Job Description

AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple Best Place to Work awards.

WHY JOIN US
If you're looking for a place to grow, make an impact, and work with people who care, we'd love to meet you!

ABOUT THE ROLE
We are looking for a Senior Site Reliability Engineer to strengthen our platform reliability and observability capabilities. You will own the design and operation of monitoring infrastructure, including Datadog APM, alerting, and distributed tracing, across Kubernetes-based microservices on AWS. The role spans backend engineering and SRE practice in roughly a 65/35 split, with direct involvement in CI/CD integration and observability automation. You will also support internal teams in adopting monitoring best practices as we modernize our R&D platform.

WHAT YOU WILL DO
- Design, build, and maintain scalable backend and platform components;
- Implement and manage observability solutions across distributed systems;
- Configure dashboards, alerts, and APM for tracing, metrics, and logging;
- Monitor and improve system reliability, scalability, and performance;
- Deploy, operate, and maintain services in Kubernetes environments;
- Integrate observability tools into CI/CD pipelines and cloud infrastructure;
- Automate monitoring and operational workflows using scripting;
- Provide operational and training support for observability platforms, especially Datadog;
- Collaborate with engineering teams to improve system visibility and reliability practices.

MUST HAVES
- 4+ years of experience with Python, Node.js, or Java;
- Hands-on experience with API integrations;
- Strong experience in Kubernetes environments;
- Experience with Datadog or similar tools such as Prometheus and Grafana;
- Ability to configure dashboards, alerts, and APM;
- Experience monitoring containerized and microservices architectures;
- Hands-on experience with AWS;
- Experience integrating observability tools into cloud environments;
- Experience with CI/CD integrations for observability;
- Ability to automate monitoring and operational tasks using scripting;
- Upper-intermediate English level.

NICE TO HAVES
- Experience owning and operating an internal engineering platform, especially observability platforms;
- Demonstrated ownership of reliability, scalability, and performance;
- Ability to proactively lead maintenance and platform improvements;
- Experience installing and configuring Datadog agents and integrations;
- Experience managing API keys and secure configurations;
- Experience managing user roles and access controls;
- Familiarity with Go (Golang);
- Experience with additional observability tools such as New Relic, Dynatrace, Elastic Stack, or Splunk.

PERKS AND BENEFITS
- Remote work & Local connection: Work where you feel most productive and connect with your team in periodic meet-ups to strengthen your network and connect with other top experts.
- Legal presence in India: We ensure full local compliance with a structured, secure work environment tailored to Indian regulations.
- Competitive Compensation in INR: Fair compensation in INR with dedicated budgets for your personal growth, education, and wellness.
- Innovative Projects: Leverage the latest tech and create cutting-edge solutions for world-recognized clients and the hottest startups.