Talent.com
Site Reliability Engineer
Sails Software Inc • Delhi, India
19 days ago
Job description

SRE - AWS

Job Summary

We are looking for an experienced and driven Senior Site Reliability Engineer (SRE) to architect, implement, and maintain robust cloud infrastructure. This role demands a deep understanding of AWS, Kubernetes, and ECS, along with the ability to build scalable, secure, and highly available infrastructure from scratch. The ideal candidate will be a strong advocate for DevOps principles, automation, and reliability, with the skills to support and optimize complex microservices-based architectures.

Key Responsibilities

Infrastructure Design & Implementation

  • Design and build highly scalable, fault-tolerant, and secure cloud infrastructure using AWS, Kubernetes, and ECS.
  • Lead efforts in infrastructure as code (IaC) using tools like Terraform or CloudFormation.
  • Develop and enforce best practices for infrastructure provisioning, security, and cost optimization.

System Reliability & Performance

  • Ensure availability, performance, scalability, and security of production systems.
  • Implement observability strategies including monitoring, logging, and alerting using tools such as Prometheus, Grafana, ELK, or Datadog.
  • Analyse system performance metrics and proactively identify potential issues and bottlenecks.

DevOps & Automation

  • Build and maintain CI/CD pipelines to streamline code deployments across environments.
  • Drive automation in infrastructure provisioning, configuration management, and operational tasks.
  • Ensure repeatable and reliable deployments using containers and orchestration tools like Kubernetes and ECS.

Service Management

  • Own the SRE lifecycle, including incident management, postmortems, root cause analysis, and runbook creation.
  • Collaborate closely with development and QA teams to ensure seamless microservices integration, deployment, and lifecycle management.
  • Maintain service-level objectives (SLOs), service-level agreements (SLAs), and error budgets.

Security & Compliance

  • Implement and enforce cloud security best practices for networking, identity and access management, and data protection.
  • Support audits, compliance assessments, and vulnerability remediation.
  • Monitor for security anomalies and work with security teams to respond to threats.

Technical Skills

  • 6+ years of hands-on experience in Site Reliability Engineering, DevOps, or Cloud Engineering.
  • Expertise in AWS services such as EC2, S3, RDS, IAM, VPC, Lambda, and CloudWatch.
  • Strong knowledge of Kubernetes and container orchestration best practices.
  • Experience managing services on Amazon ECS (Fargate or EC2).
  • Proficient in infrastructure-as-code tools like Terraform, CloudFormation, or Pulumi.
  • Skilled in scripting languages such as Python, Bash, or Go.
  • Solid grasp of networking, load balancing, DNS, and firewall rules in cloud environments.
  • Deep understanding of microservices architectures, API gateways, and service meshes.

Soft Skills

  • Proven leadership and cross-functional collaboration skills.
  • Strong problem-solving and incident-resolution mindset.
  • Clear communication, documentation, and stakeholder reporting abilities.
  • Passion for continuous improvement and automation.

Preferred Qualifications

  • AWS certifications such as AWS Certified DevOps Engineer – Professional, AWS Certified Solutions Architect – Professional, or equivalent.
  • Familiarity with service meshes like Istio or Linkerd.
  • Experience with serverless architectures and event-driven systems.
  • Knowledge of regulatory compliance (SOC 2, ISO 27001, GDPR) in cloud environments.

Skills – AWS Cloud, CI/CD, EC2, Kubernetes, Grafana, Datadog, Python

Key Responsibilities

Cloud Platform: GCP

  • Infrastructure Automation: Design, implement, and manage infrastructure as code using Terraform to provision and manage GCP resources.
  • Container Orchestration: Deploy and manage Kubernetes clusters, ensuring efficient operation of containerized applications.
  • Continuous Integration / Continuous Deployment (CI/CD): Develop and maintain CI/CD pipelines using Jenkins to automate application build, test, and deployment processes.
  • Containerization: Collaborate with development teams to containerize applications using Docker and manage deployments with Helm Charts.
  • Code Quality Assurance: Integrate and manage SonarQube to ensure code quality and security standards are met.
  • Monitoring and Logging: Implement and manage monitoring solutions using Datadog to ensure system health, performance, and security.
  • Collaboration: Work closely with cross-functional teams, including developers, QA, and operations, to streamline processes and improve productivity.

Requirements

  • Experience: 5+ years in DevOps or cloud engineering roles, with at least 3 years of relevant experience in the specified technologies.
  • Technical Proficiency:
    o Hands-on experience with GCP services and architecture.
    o Proficiency in Terraform for infrastructure as code implementations.
    o Strong understanding and experience with Kubernetes and Docker.
    o Experience in setting up and managing CI/CD pipelines using Jenkins.
    o Familiarity with Helm Charts for application deployment.
    o Experience with SonarQube for code quality analysis.
    o Proficiency in monitoring and logging tools, particularly Datadog.
  • Scripting Skills: Proficiency in scripting languages such as Bash or Python is an added advantage.
  • Strong problem-solving abilities and analytical thinking.
  • Excellent communication skills, both verbal and written.
  • Ability to work collaboratively in a team environment.
  • Strong organizational and time management skills.

Skills – Terraform, Kubernetes, Cluster, Docker, GCP, Sonar
