Talent.com
Agentic Infrastructure Observability Engineer

Confidential
Nagar, Sahibzada Ajit Singh Nagar, India
5 days ago
Job description

About XenonStack

XenonStack is the fastest-growing Data and AI Foundry for Agentic Systems, enabling enterprises to gain real-time and intelligent business insights.

We Deliver Innovation Through

  • Agentic Systems for AI Agents: akira.ai
  • Vision AI Platform: xenonstack.ai
  • Inference AI Infrastructure for Agentic Systems: nexastack.ai

Our mission is to accelerate the world's transition to AI + Human Intelligence by building platforms that are scalable, reliable, and observable by design.

THE OPPORTUNITY

We are seeking an Agentic Infrastructure Observability Engineer to design and implement end-to-end observability frameworks for AI-native and multi-agent systems.

This role sits at the heart of AgentOps and Reliability Engineering — ensuring that agents, pipelines, and infrastructure are monitored, measurable, and continuously optimized.

If you thrive on metrics, monitoring, and making complex systems transparent and reliable, this role offers a chance to define observability for the next generation of enterprise AI.

Key Responsibilities

  • Observability Frameworks
      • Design and implement observability pipelines covering metrics, logs, traces, and cost telemetry for agentic systems.
      • Build dashboards and alerting systems to monitor reliability, performance, and drift in real time.
  • Agentic AI Monitoring
      • Track LLM usage, context windows, token allocation, and multi-agent interactions.
      • Build monitoring hooks into LangChain, LangGraph, MCP, and RAG pipelines.
  • Reliability & Performance
      • Define and monitor SLOs, SLIs, and SLAs for agentic workflows and inference infrastructure.
      • Conduct root cause analysis of agent failures, latency issues, and cost spikes.
  • Automation & Tooling
      • Integrate observability into CI/CD and AgentOps pipelines.
      • Develop custom plugins and scripts to extend observability for LLMs, agents, and data pipelines.
  • Collaboration & Reporting
      • Work with AgentOps, DevOps, and Data Engineering teams to ensure system-wide observability.
      • Provide executive-level reporting on reliability, efficiency, and adoption metrics.
  • Continuous Improvement
      • Implement feedback loops to improve agent performance and reduce downtime.
      • Stay updated with state-of-the-art observability and AI monitoring frameworks.
Skills & Qualifications

    Must-Have

  • 3–6 years of experience in SRE, DevOps, or Observability Engineering.
  • Strong knowledge of observability tools (Prometheus, Grafana, ELK, OpenTelemetry, Jaeger).
  • Experience with cloud-native infrastructure (AWS, GCP, Azure) and Kubernetes monitoring.
  • Proficiency in Python, Go, or Bash for scripting and automation.
  • Understanding of AI/LLM pipelines, RAG systems, and vector databases.
  • Hands-on experience with CI/CD pipelines and monitoring-as-code.
    Good-to-Have

  • Experience with AgentOps tools (LangSmith, PromptLayer, Arize AI, Weights & Biases).
  • Exposure to AI-specific observability (token usage, model latency, hallucination tracking).
  • Knowledge of Responsible AI monitoring frameworks.
  • Background in BFSI, GRC, SOC, or other regulated industries.
WHY SHOULD YOU JOIN US

  • Agentic AI Product Company
      • Build observability frameworks for next-gen enterprise AI systems.

  • A Fast-Growing Category Leader
      • Be part of one of the fastest-growing AI Foundries, powering mission-critical agent deployments.

  • Career Mobility & Growth
      • Advance into roles like Reliability Architect, AgentOps Lead, or Head of Observability.

  • Global Exposure
      • Work on observability challenges across Fortune 500 enterprises and global innovators.

  • Create Real Impact
      • Ensure transparency, trust, and resilience in production-grade AI systems.

  • Culture of Excellence
      • Our values (Agency, Taste, Ownership, Mastery, Impatience, and Customer Obsession) give you autonomy to innovate and accountability to deliver.

  • Responsible AI First
      • Help enterprises adopt AI that is not just powerful, but explainable and auditable.

    XENONSTACK CULTURE – JOIN US & MAKE AN IMPACT!

    At XenonStack, we believe in shaping the future of intelligent systems. We foster a culture of cultivation built on bold, human-centric leadership principles, where deep work, simplicity, and adoption define everything we do.

    Our Cultural Values

  • Agency – Be self-directed and proactive.
  • Taste – Sweat the details and build with precision.
  • Ownership – Take responsibility for outcomes.
  • Mastery – Commit to continuous learning and growth.
  • Impatience – Move fast and embrace progress.
  • Customer Obsession – Always put the customer first.
    Our Product Philosophy

  • Obsessed with Adoption – Making observability and trust an integral part of enterprise AI.
  • Obsessed with Simplicity – Turning complex monitoring into seamless, actionable insights.

    Be part of our mission to accelerate the world's transition to AI + Human Intelligence by making agentic AI systems transparent, observable, and reliable at scale.

    Skills Required

    Go, Grafana, ELK, AWS, Prometheus, Bash, Python, Azure, GCP
