Talent.com
Sr SRE Advanced (Data)

EMBARKGCC SERVICES PRIVATE LIMITED
Bangalore North, KA, IN
3 days ago
Job description

Job Summary

As a Data SRE Lead, you will architect and oversee end-to-end reliability strategies for data infrastructure. You’ll work with cloud, data engineering, and analytics teams to ensure that critical data services are secure, automated, observable, and resilient.

Responsibilities

  • Lead design and automation of resilient data platforms across multi-cloud environments.
  • Drive data reliability frameworks (SLO / SLI metrics, incident prevention, SLA tracking).
  • Oversee monitoring, logging, and cost optimization for big-data environments.
  • Manage data platform scalability, versioning, and fault tolerance.
  • Build recovery and rollback mechanisms for critical data pipelines.
  • Guide teams in observability, data validation automation, and data governance integration.
  • Mentor junior engineers and define SRE standards for data reliability.

Requirements

Skills & Tools

  • 5–8 years of experience in Data Ops / Data Engineering / SRE.
  • Expertise in Databricks, Kafka, Airflow, Snowflake, or Spark.
  • Strong understanding of the AWS / Azure data ecosystem (S3, Glue, Redshift, ADF, Synapse).
  • Automation using Terraform, Ansible, and Python scripting.
  • Familiarity with monitoring and observability tools (CloudWatch, Grafana, Datadog).
  • Experience implementing SLA / SLO frameworks for data reliability.

Eligibility

  • Bachelor’s or Master’s in Computer Science, Data Science, or Information Systems.
  • Proven record of leading data reliability or platform stability projects.
  • Strong communication and leadership skills across technical and business teams.