WHO WE ARE
Sapaad is a global leader in unified commerce platforms, delivering world-class software solutions for the food and beverage industry. Our flagship product, also named Sapaad, has achieved remarkable success over the past decade, empowering thousands of F&B businesses across 40+ countries, with many more coming on board each day.
Driven by a passionate team of developers, designers, and product experts, Sapaad is constantly evolving—introducing innovative, industry-defining features that set the benchmark for F&B tech. Headquartered in Singapore, with offices across five countries, Sapaad is backed by seasoned technology veterans with deep expertise in web, mobility, and e-commerce.
JOB OVERVIEW
Sapaad Software Private Limited is seeking a Senior Site Reliability Engineer (SRE) to lead our infrastructure reliability efforts and mentor a growing SRE team.
This is a strategic, hands-on leadership position responsible for ensuring the reliability, scalability, and performance of our global cloud-based restaurant management platform serving thousands of customers worldwide.
As a senior member of our engineering organization, you will take ownership of system availability, drive automation initiatives, and establish SRE best practices across the company. You’ll work at the intersection of development and operations—embedding reliability into every layer of our technology stack while building and leading a team focused on operational excellence.
This role is ideal for an experienced SRE professional who is passionate about building resilient systems at scale, mentoring engineering talent, and shaping the reliability culture of a fast-growing SaaS organization.
WHAT YOU’LL DO
- Own the reliability, availability, and performance of all production systems supporting our multi-tenant SaaS platform.
- Define and manage SLIs, SLOs, and error budgets across critical services.
- Architect and implement highly available, fault-tolerant systems on AWS and Heroku.
- Proactively monitor and analyze performance to predict capacity needs and prevent issues.
- Lead incident management and postmortem processes, driving root cause analysis and preventive actions.
- Develop incident response playbooks, implement chaos engineering, and reduce MTTD and MTTR.
- Design and implement comprehensive observability solutions (monitoring, logging, and alerting) for microservices and distributed systems.
- Enforce security and compliance standards, including access controls, vulnerability management, and patching.
- Mentor and lead SRE and infrastructure engineers, driving team growth, knowledge sharing, and operational maturity.
- Collaborate with development, DevOps, and product teams to embed reliability practices into every stage of the software lifecycle.
YOU’RE A STRONG FIT IF YOU HAVE
- 5–8 years of experience in SRE, DevOps, or Systems Engineering roles within SaaS or cloud-based environments.
- 2+ years in a technical leadership or mentoring capacity.
- Proven experience maintaining large-scale, high-availability systems (99.9%+ uptime).
- Expertise with AWS (EC2, RDS, S3, ECS/EKS, Lambda) and Heroku.
- Proficiency in Infrastructure as Code (Terraform, CloudFormation) and containerization (Docker, Kubernetes).
- Strong scripting and automation skills in Python, Bash, or PowerShell.
- Experience with CI/CD pipelines (Jenkins, GitLab CI, GitHub Actions) and configuration management tools (Chef, Ansible, Puppet).
- Deep understanding of SRE principles: SLIs, SLOs, toil reduction, blameless postmortems, and incident management frameworks.
- Hands-on experience with monitoring tools (Prometheus, Grafana, Datadog, New Relic, CloudWatch, ELK).
- Excellent leadership, analytical, and communication skills with a customer-first mindset.
PREFERRED QUALIFICATIONS
- AWS Certified Solutions Architect – Associate or Professional certification.
- Experience with SOC 2, ISO 27001, GDPR, or PCI DSS compliance frameworks.
- Background in microservices architectures, disaster recovery planning, or cost optimization.
- Experience in the restaurant, hospitality, or retail technology sectors.