We are recruiting an experienced Site Reliability Engineer to join our newly established TechOps division within the Technology department. We maintain the systems that keep our products running smoothly around the world, 24x7, supporting everything from cloud infrastructure and CI/CD pipelines to observability and incident response.
How you will contribute in this role:
- Define and implement best practices for system reliability, observability, monitoring, and alerting.
- Build and manage automation for our AWS cloud-based services and SaaS stack. Continuously reduce operational toil.
- Drive end-to-end observability across our web and mobile applications, cloud infrastructure, firewalls and CDNs.
- Diagnose infrastructure failures, performance bottlenecks, and production issues through strong debugging skills.
- Work closely with Service Delivery Managers to drive incident management processes, including postmortems and root cause analysis, and with application teams and platform engineers to improve reliability and performance.
- Participate in on-call rotations, ensuring rapid incident response across our stack.
- Take ownership of SLAs/SLOs/SLIs and commit to continuous improvement of service levels across all platforms.
- Improve system resilience and minimize MTTR (mean time to recovery) through incident response automation.
What we're looking for:
- 4+ years of professional experience as a Site Reliability Engineer or in a Cloud Operations / DevOps role.
- 3+ years in a production environment supporting large-scale, mission-critical applications, including web, mobile, and e-commerce / payment applications.
- Proficiency in one or more programming / scripting languages (e.g., Python, Golang, TypeScript).
- In-depth knowledge of observability tools (e.g., New Relic, Prometheus, Grafana).
- Professional experience with cloud platforms (AWS strongly preferred), including serverless functions, API gateways, relational and NoSQL databases, and caching.
- Strong experience with container orchestration (ECS, Kubernetes), CI/CD pipelines, and infrastructure-as-code (AWS CDK, Terraform, Pulumi, etc.).
- An advanced degree in software / data engineering, computer / information science, or a related quantitative field, or equivalent work experience.
- Strong verbal and written communication skills and the ability to work well with a wide range of stakeholders.
- A strong sense of ownership; scrappy and biased toward action.
(ref: hirist.tech)