Talent.com
Veryon - Technical Lead / Site Reliability Engineer - AWS Platform

Veryon, Chennai
19 days ago
Job description

Description:

Why We Need You: The Mission & Our Vision

Veryon is a leading software and technology company that enables aviation teams around the world to improve efficiency and safety.

Our products maximize uptime for aircraft maintenance teams through customer-driven innovation and world-class service.

With over 7,500 customers across 137 countries, we serve general and business aviation, military / defense, commercial aviation, and OEMs.

Our values (Fueled by Customers, Win Together, Make It Happen, Innovate to Elevate) are the foundation of everything we do.

As a hands-on Technical Lead in Site Reliability Engineering, you will be directly responsible for designing, building, and implementing modern reliability practices to ensure uptime, resilience, and production excellence across Veryon's systems.

You'll work closely with Engineering, DevOps, and Support teams to streamline software delivery to both internal and client environments, troubleshoot production issues, and build observability using Datadog, Dynatrace, and AWS-native tools.

You will also be a mentor on best practices and a key contributor to reliability-focused architecture and deployment design.

What You'll Accomplish: Your Performance Objectives

Objective #1: First 30 Days

  • Complete onboarding and gain a deep understanding of Veryon's systems, release processes, and deployment environment on AWS.
  • Review existing application architecture, CI / CD flows, and monitoring implementations.
  • Begin implementing improvements to observability using Datadog and Dynatrace.
  • Collaborate with engineers and DevOps to identify bottlenecks in production releases and issue resolution.

Objective #2: First 90 Days

  • Build or enhance monitoring dashboards and alerts for critical infrastructure and applications.
  • Define and begin implementing Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets.
  • Own and improve release workflows and ensure reliable software delivery to customer environments.
  • Take ownership of investigating production issues, ensuring timely resolution and coordination across teams.
  • Begin documenting Root Cause Analyses (RCAs) for production incidents and drive preventive improvements.
  • Partner with DevOps to optimize and automate CI / CD pipelines using GitLab or equivalent.

Objective #3: First 12 Months

  • Deliver measurable improvements in system uptime, MTTR, and deployment success rate.
  • Build self-healing automation and rollback mechanisms for high-risk services.
  • Standardize and own the RCA process for production incidents to ensure continuous learning.
  • Implement robust controls and metrics to monitor software delivery health.
  • Support production readiness of new services through performance baselining and fault testing.
  • Establish and track health KPIs that inform operational decisions and product improvements.

Requirements:

Key Job Responsibilities:

  • Implement and manage observability, alerting, and dashboards using Datadog, Dynatrace, and AWS tools.
  • Take ownership of production deployments, ensuring successful delivery to client environments with minimal disruption.
  • Troubleshoot and resolve production issues across the stack (infrastructure, application, integration).
  • Lead Root Cause Analysis (RCA) documentation, follow-ups, and remediation planning.
  • Define and maintain service SLOs, SLIs, and error budgets with product and engineering teams.
  • Build automation for deployment, monitoring, incident response, and recovery.
  • Design CI / CD workflows that support safe and reliable delivery across distributed environments.
  • Partner with developers to ensure observability and reliability are part of the application design.
  • Mentor engineers in SRE principles, monitoring strategy, and scalable operations.

Experience and Skills We Seek:

  • 6+ years of experience in SRE, DevOps, or platform engineering roles.
  • Strong hands-on experience with AWS services (e.g., EC2, ECS / EKS, RDS, IAM, CloudWatch, Route 53, ELB) is required.
  • Deep familiarity with CI / CD pipelines and deployment strategies using GitLab CI, Jenkins, or equivalent.
  • Expertise in observability tools such as Datadog and Dynatrace for APM, logging, and alerting.
  • Solid experience troubleshooting distributed systems in production environments.
  • Proficiency in scripting and infrastructure as code (e.g., Python, Bash, Terraform, Ansible).
  • Working knowledge of containers and orchestration (Docker, Kubernetes).
  • Understanding of SRE principles (SLIs, SLOs, MTTR, incident response, etc.).
  • Excellent communication and documentation skills, especially for RCA and runbook creation.
  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field.

(ref: hirist.tech)

    Related jobs

    Site Reliability Engineer
    Zyoin Group, Chennai, Tamil Nadu, India
    Site Reliability Engineer (SRE). Chennai (Hybrid – 2 days in office). We are seeking a Site Reliability Engineer (SRE) responsible for leading reliability practices, ensuring scalable systems, and co...
    Last updated: 30+ days ago

    Senior Site Reliability Engineer
    Poshmark, Chennai, Tamil Nadu, India
    We’re looking for an experienced Site Reliability Engineer to fill the mission-critical role of ensuring that our complex, web-scale systems are healthy, monitored, automated, and designed to scale...
    Last updated: 3 days ago

    Site Reliability Engineer
    Amicon Hub Services, Chennai, Tamil Nadu, IN
    Manage and scale production systems hosted on. Automate operational tasks using. Improve system reliability and reduce manual interventions through automation. Collaborate with development teams to en...
    Last updated: 5 days ago

    Site Reliability Engineer
    Xebia, Chennai, IN
    AWS Engineer with strong Python development and Chaos Engineering expertise. The ideal candidate will combine cloud engineering, DevOps, and chaos experimentation to improve reliability, fault toler...
    Last updated: 25 days ago

    System Engineer
    Netsmore Technologies, Chennai, Tamil Nadu, IN
    Systems Engineer – Level 3 (Internal). Mandatory skills: AWS cloud infrastructure + OKTA administration. The L3 Systems Engineer role is more engineering-focused than traditional system admin roles. I...
    Last updated: 3 days ago

    Site Reliability Engineer - AWS / Azure
    Funic Tech, Chennai
    Job Title: Site Reliability Engineer (SRE). Experience Required: 7+ Years. Location: Bangalore / Chennai...
    Last updated: 1 hour ago

    Site Reliability Engineer - AWS / Azure Cloud Services
    Deqode, Chennai
    Profile: Site Reliability Engineer (SRE). Experience Required: 6+ Years. Locations: Mumbai, Gurgaon, Ch...
    Last updated: 30+ days ago

    Site Reliability Engineer
    Uplers, Chennai, IN
    Uplers is hiring for one of its clients. SRE (Oracle Cloud Infrastructure). Remote | Mon–Fri | 10:30 AM – 7:30 PM IST. Use of personal device required. OCI cloud infrastructure using Terraform and GitL...
    Last updated: 23 days ago

    Site Reliability Engineer - Chaos Management
    Xebia, Chennai, Tamil Nadu, IN
    AWS Engineer with strong Python development and Chaos Engineering expertise. The ideal candidate will combine cloud engineering, DevOps, and chaos experimentation to improve reliability, fault toler...
    Last updated: 7 days ago

    Site Reliability Engineer
    Concord, Chennai, IN
    Engineers (Individual Contributors). Strong SRE (Site Reliability Engineering). CI / CD, monitoring, automation, infrastructure as code, etc.
    Last updated: 17 days ago

    Senior Site Reliability Engineer - ELK Expert
    iVedha Inc., Chennai, IN
    Senior Site Reliability Engineer (SRE) – ELK Expert | Platform Engineering Practice. Must be available to work in the EST (US / Canada) Time Zone. Are you a Senior Site Reliability Engineer (SRE) with ...
    Last updated: 30+ days ago

    Poshmark - Senior Site Reliability Engineer - Cloud Infrastructure
    POSHMARK, Chennai
    Job Description: We're looking for an experienced Site Reliability Engineer to fill the mission-critical role of ensuring that our complex, web-scale systems ...
    Last updated: 18 days ago

    Senior Site Reliability Engineer
    WSO2, Chennai, Tamil Nadu, IN
    Founded in 2005, WSO2 is the largest independent software vendor providing open-source API management, integration, and identity and access management (IAM) to thousands of enterprises in over 90 c...
    Last updated: 7 days ago

    Senior Site Reliability Engineer
    Tata Consultancy Services, Chennai, Tamil Nadu, India
    TCS is looking for Senior Site Reliability Engineer – AWS. Design, implement, and maintain scalable, secure, and highly available infrastructure on AWS. Develop and improve CI / CD pipelines, Infrastru...
    Last updated: 4 days ago

    Site Reliability Engineer - Cloud Platforms
    LanceSoft, Inc, Chennai
    Role and Responsibilities: Reporting to Engineering, the Site Reliability Engineer will play a critical role in driving innovation and growth for the Banking Soluti...
    Last updated: 18 days ago

    Infrastructure Lead – AWS
    Qruize Inc, Chennai, Tamil Nadu, India
    We are seeking an experienced and dynamic professional for the role of. The ideal candidate will have a strong background in IT infrastructure, hands-on AWS expertise, and proven leadership skills t...
    Last updated: 25 days ago

    Senior Site Reliability Engineer
    Loyalytics AI, Chennai
    Site Reliability / DevOps Engineer to be our first hire in this function, responsible for owning and scaling the reliability, observability, and infrastructure of our platform running entirely on M...
    Last updated: 14 days ago

    RELX - Site Reliability Engineer - IAC Terraform
    REED ELSEVIER INDIA (a part of RELX India Pvt Ltd), Chennai
    Job Description: Lead initiatives to identify and eliminate manual, repetitive tasks through automation and tooling. Develop s...
    Last updated: 18 days ago