Talent.com
Site Reliability Engineer
AION • Bengaluru, KA, IN
30+ days ago
Job description

About AION

AION is building a next-generation AI cloud platform, transforming the future of high-performance computing (HPC) through its decentralized AI cloud. Purpose-built for bare-metal performance, AION democratizes access to compute power for AI training, fine-tuning, inference, data labeling, and beyond.

By leveraging underutilized resources such as idle GPUs and data centers, AION provides a scalable, cost-effective, and sustainable solution tailored for developers, researchers, and enterprises. The platform's innovative Proof of Compute Contribution (PoCC) protocol rewards contributors based on performance, creating a transparent and efficient ecosystem.

Integrated with Tether (USD₮ & USD₮0) for stability and regulatory clarity, AION eliminates volatility, ensuring predictable costs and seamless transactions. With cutting-edge partnerships and a USD-backed economy, AION is pioneering the commoditization of high-performance compute, empowering global innovation and bridging the AI wealth gap.

Led by high-pedigree founders with previous exits, AION is well-funded by major VCs and has strategic global partnerships. Headquartered in the US with a global presence, the company is building its initial core team in India.

Who you are

You are a reliability-focused engineer with deep expertise in cloud-native systems and infrastructure automation. You thrive on building robust monitoring solutions and creating self-healing infrastructure. You understand the challenges of maintaining high availability across distributed systems and have experience implementing SRE best practices. You're passionate about creating production-ready environments that can scale efficiently and recover automatically from failures.

Technical Skills & Experience

  • 3-8 years of experience in Site Reliability Engineering or DevOps (exceptional candidates with different experience profiles will be considered)
  • A Tier 1 college education or previous work experience at FAANG / top startups is preferred but not required
  • Cloud Platforms: Deep expertise with AWS, GCP, or Azure infrastructure services
  • Kubernetes: Advanced knowledge of Kubernetes operations, cluster management, and troubleshooting
  • Infrastructure as Code: Strong experience with Terraform, Pulumi, or similar IaC tools
  • Observability: Expertise implementing comprehensive monitoring using Prometheus, Grafana, and the ELK stack
  • Service Mesh: Experience with Istio, Linkerd, or similar service mesh technologies
  • Networking: Understanding of network architectures, DNS, load balancing, and security groups
  • CI/CD: Knowledge of automated deployment pipelines and GitOps workflows
  • Scripting: Proficiency in Bash, Python, or Go for automation scripts
  • Container Technologies: Deep understanding of Docker, containerd, and OCI specifications
  • Security: Knowledge of infrastructure security best practices and compliance requirements
  • Incident Management: Experience with incident response, post-mortems, and developing SOP documentation

Key Responsibilities

  • Design and implement comprehensive monitoring and alerting systems across all AION platforms.
  • Develop automation for infrastructure provisioning, scaling, and recovery using Terraform and Kubernetes.
  • Create and maintain runbooks and playbooks for handling common operational scenarios and incidents.
  • Implement service mesh solutions for observability, traffic management, and security.
  • Design and implement logging systems that provide visibility into complex distributed systems.
  • Own capacity planning and resource optimization across cloud environments.
  • Implement CI/CD pipelines for reliable and consistent deployments across all environments.
  • Design and build self-healing systems that automatically recover from common failure modes.
  • Develop infrastructure for both the compute platform and data annotation services with consistent reliability practices.
  • Design and implement disaster recovery strategies and testing procedures.
  • Create and maintain production, staging, and development environments with appropriate isolation.
  • Collaborate with security teams to implement infrastructure security best practices and compliance requirements.

Location

Individuals in this role are expected to relocate to Bangalore, though exceptions can be made. We offer a hybrid working setup with 3 days in-office. Employees have the flexibility to work from anywhere for a few months each year.

Why Join Us

  • Be part of a mission-driven team at the intersection of web3 and AI, tackling some of the most exciting challenges in the industry.
  • Join the ground floor of an AI startup, with the opportunity to make a significant impact on the company and the industry.
  • Collaborate with top-tier talent from the tech industry.
  • Competitive salary and benefits package.
  • Flexible work environment with opportunities for professional growth and development.

If you are a skilled and motivated Site Reliability Engineer with a passion for building reliable, scalable infrastructure for cutting-edge compute systems, we would love to hear from you.
