SRE Developer (IAM)

Confidential, Noida
30+ days ago
Job description

Key Responsibilities:

  • Monitoring & Alerting: Develop, maintain, and enhance monitoring and alerting systems using Datadog to proactively identify and address potential issues, ensuring optimal system performance.
  • CI/CD Pipelines: Participate in the design and implementation of CI/CD pipelines using Azure DevOps, enabling automated and reliable software delivery.
  • Incident Response: Lead efforts in incident response and troubleshooting to quickly diagnose and resolve production incidents, minimizing downtime and impact on users.
  • Reliability Initiatives: Take ownership of reliability initiatives by identifying areas for improvement, conducting root cause analysis, and implementing solutions to prevent recurrence of incidents.
  • Collaboration: Collaborate with cross-functional teams to ensure security, compliance, and performance standards are met throughout the development lifecycle.
  • On-call Support: Participate in on-call rotations and provide 24/7 support for critical incidents, ensuring rapid response and resolution.
  • SLOs & SLIs: Work with development teams to define and establish Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to measure and maintain system reliability.
  • Documentation: Contribute to the documentation of processes, procedures, and best practices to enhance knowledge sharing within the team.

Qualifications:

  • Education: Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent work experience.
  • Experience: Minimum of 4 years of experience in a Site Reliability Engineer or similar role, managing cloud-based infrastructure on AWS with EKS.
  • AWS Expertise: Strong expertise in AWS services, especially EKS, including cluster provisioning, scaling, and management.
  • Monitoring & Observability: Proficiency in using monitoring and observability tools, with hands-on experience in Datadog or similar tools for tracking system performance and generating meaningful alerts.
  • CI/CD Experience: Experience in implementing CI/CD pipelines using Azure DevOps or similar tools to automate software deployment and testing.
  • Containerization & Orchestration: Solid understanding of containerization and orchestration technologies (e.g., Docker, Kubernetes) and their role in modern application architectures.
  • Troubleshooting: Excellent troubleshooting skills and the ability to analyze complex issues, determine root causes, and implement effective solutions.
  • Scripting & Automation: Strong scripting and automation skills (e.g., Python, Bash).
  • Infrastructure as Code (IaC): Familiarity with IaC tools such as Terraform or CloudFormation.
  • Incident Management: Experience with incident management, post-incident analysis, and implementing improvements based on lessons learned.
  • Security & Compliance: Good understanding of security best practices and compliance standards in cloud environments.
  • Communication: Exceptional communication skills and the ability to collaborate effectively with cross-functional teams.
  • On-call Rotations: Willingness to participate in on-call rotations and provide off-hours support when necessary.

Preferred Qualifications:

  • Relevant certifications such as:
      • AWS Certified DevOps Engineer
      • AWS Certified SRE
      • Kubernetes certifications
  • Experience with other cloud platforms (e.g., Azure, Google Cloud Platform).
  • Familiarity with microservices architecture and service mesh technologies.
  • Prior experience with application performance tuning and optimization.

Skills Required:

  • AWS, SRE, DevOps, Azure, cloud platform
