Position Summary :
We are seeking a Senior Service Reliability Engineer (SSRE) to join our technology team. This role focuses on ensuring the stability, scalability, and reliability of cloud-based services and applications.
The ideal candidate will bring deep expertise in Linux systems, distributed systems, and modern infrastructure technologies, along with the ability to mentor junior engineers and influence architecture decisions.
Roles, Responsibilities & Duties :
- Take a leadership role in improving system reliability and scalability.
- Work closely with SRE management to define KPIs, processes, and continuous improvement strategies.
- Drive architectural decisions and provide operational input into solution design.
- Mentor and guide junior SREs, enabling their success.
- Represent operational scalability and reliability considerations in the wider organization.
- Lead small-scale projects from inception to implementation.
- Provide technical leadership for platform-wide solutions.
- Contribute to incident response, monitoring, and operational readiness throughout the software lifecycle.
- Participate in the on-call rotation to support production.
Skills and Qualifications :
Mandatory :
- Bachelor's degree in engineering / technology or a related discipline.
- 7-9 years of experience in Software Development and / or Linux Systems Administration.
- Strong interpersonal, written, and verbal communication skills.
- Expertise as a Linux Production Systems Engineer managing large-scale Web Services infrastructure.
- Development experience in Python (preferred) or one of Bash, Go, Java, C++.
Expertise (at least 3 areas) :
- Distributed data storage at scale (Hadoop, Ceph).
- NoSQL databases (MongoDB, Redis, Cassandra).
- Data aggregation (Elasticsearch, Kafka).
- RDBMS scaling & HA (PostgreSQL, MySQL).
- Monitoring & Alerting tools (Prometheus, Grafana) and Incident Management.
- Kubernetes and / or AWS (deployment & management).
- Software Distribution and Package Management.
- Configuration Management (Ansible, SaltStack, Puppet, Chef).
- Software Performance Analysis & Load Testing (QA / SDET experience is a ...).
We Use :
Linux, Python, Java, Go, C++, Rust, Hadoop, Ceph, MongoDB, Redis, Cassandra, Elasticsearch, Kafka, PostgreSQL, MySQL, Prometheus, Grafana, AWS, Kubernetes, Ansible, Puppet, Chef.
We Offer :
- Bootstrapped and financially stable, with a high pre-money valuation.
- Above-industry remuneration.
- Additional compensation tied to Renewal and Pilot Project Execution.
- Additional lucrative business development compensation.
- Firm-building opportunities that offer a stage for holistic professional development, growth, and branding.
- An empathetic, excellence- and results-driven organization.
- A belief in mentoring and growing a team, with constant emphasis on learning.
(ref : hirist.tech)