Description:
We're looking for a highly skilled Site Reliability Engineer (SRE) to ensure reliability, scalability, and performance across our systems. As a key technical leader, you'll own mission-critical services, drive automation, and work closely with Engineering, Product, and DevOps teams to build and maintain world-class systems.
Responsibilities:
- Design, implement, and maintain scalable and reliable infrastructure on AWS.
- Build and optimize CI/CD pipelines for faster, safer deployments.
- Implement and manage monitoring and alerting systems using Prometheus, Grafana, or Datadog.
- Develop infrastructure automation scripts using Python and Terraform.
- Manage Kubernetes clusters and ensure high availability of cloud-native applications.
- Lead incident response, root cause analysis, and reliability improvements.
- Mentor junior engineers and champion SRE best practices across teams.
- Maintain a strong focus on security, compliance, and cost.
Requirements:
- 8-15 years of experience in software or infrastructure engineering roles.
- Hands-on experience with AWS Cloud Platform (GCP or Azure experience is a plus).
- Strong expertise in CI/CD tools (GitHub Actions, GitLab CI, Jenkins, etc.).
- Proficiency in Python or similar scripting languages.
- Experience with Kubernetes, Terraform, and Prometheus-based monitoring.
- Strong understanding of networking, observability, and automation principles.
- Excellent problem-solving, communication, and collaboration skills.
- Willingness to participate in on-call rotations for critical services.
(ref: hirist.tech)