Description :
Job Title : Site Reliability Engineer (SRE) - DataDog / AWS Lambda / DynamoDB / Serverless
Location : Bangalore / Pune / Hyderabad
Experience : 5-10 Years
About the Role :
We are seeking an experienced Site Reliability Engineer (SRE) with strong expertise in DataDog integration, AWS Lambda, DynamoDB, and Serverless architectures. The ideal candidate will be responsible for building, monitoring, and maintaining highly reliable, scalable, and secure cloud-based systems.
Key Responsibilities :
- Design, implement, and maintain monitoring and observability solutions using DataDog (metrics, logs, traces, dashboards, and alerts).
- Develop and optimize serverless applications using AWS Lambda and related AWS services.
- Manage and optimize DynamoDB for scalability, reliability, and cost efficiency.
- Automate deployment and infrastructure provisioning using AWS CDK / CloudFormation / Terraform.
- Implement reliability engineering practices including performance tuning, auto-scaling, and fault tolerance.
- Collaborate with development teams to design and implement highly available, resilient, and secure architectures.
- Troubleshoot production issues and drive root cause analysis (RCA) to ensure long-term stability.
- Continuously improve CI / CD pipelines and observability frameworks.
Required Skills & Experience :
- 5-10 years of total experience, with at least 3 years in SRE / DevOps roles.
- Hands-on experience with DataDog setup and integrations (custom metrics, APM, log management).
- Strong experience with AWS Lambda, DynamoDB, and other Serverless services (API Gateway, Step Functions, SQS, SNS).
- Proficiency in Python / Node.js / Bash scripting for automation.
- Experience with IaC tools like Terraform, CloudFormation, or AWS CDK.
- Solid understanding of AWS architecture, networking, and security best practices.
- Working knowledge of CI / CD tools (GitHub Actions, Jenkins, CodePipeline, etc.).
- Experience with incident management, monitoring dashboards, and alerting automation.
Good to Have :
- Experience with Kubernetes / ECS / EKS for container orchestration.
- Familiarity with CloudWatch, Prometheus, or Grafana.
- AWS Certification (Solutions Architect / DevOps Engineer) preferred.
(ref : hirist.tech)