Your Role :
We are seeking a Senior Site Reliability Engineer (Infrastructure & Site Reliability Engineering) with experience in AWS, GCP, Kubernetes, and GitOps to work with our Site Reliability Engineering (SRE) team.
The successful candidate will understand SRE practices and have a track record of implementing them to a high standard (SLAs, SLOs, proactive alert management, incident response and review, postmortems, etc.).
In this role, you will work with our SRE and cross-functional engineering teams to build and operate our development and production infrastructure.
Responsibilities :
- Work collaboratively with software engineering teams to define infrastructure and deployment requirements.
- Contribute actively to our automation and observability initiatives.
- Learn, develop, and maintain operational tools for deployment, monitoring, and analysis of cloud (AWS & GCP) infrastructure and systems.
- Work closely with team members to lead the response to production incidents, conduct postmortems, and drive continuous improvement as part of a 24/7 on-call rotation that provides exposure to critical issue resolution.
- Contribute to on-call documentation and incident response playbooks.
- Establish SLOs and use them to drive operational performance.
- Embrace and adhere to development best practices, including continuous integration/deployment and code review.
- Demonstrate a strong commitment to continuous learning and professional development by seeking opportunities for mentorship and learning within the team.
Our team uses practices that maximize development velocity, including but not limited to continuous integration/deployment and code review via GitHub pull requests.
Ideal Attributes :
- Strong customer orientation.
- Excellent interpersonal and organizational skills.
- Attention to detail and focus on quality.
- Strong communication skills to effectively liaise with both technical and non-technical staff.
- Ability to act decisively and work well under pressure.
- Must be a collaborative problem solver.
- Strong bias for ownership and action.
Qualifications :
- 5+ years of experience designing, building, and maintaining SaaS environments.
- 4+ years of experience designing, building, and maintaining AWS/GCP infrastructure with Terraform.
- Experience building and running Kubernetes clusters.
- Experience with observability (monitoring, logging, tracing, metrics).
- Experience with GitOps CI/CD processes.
- Experience scripting in Python, Go (Golang), Bash, or PowerShell, and with AWS CLI tools.
- Experience with security operations: security policies, infrastructure, key management, and setup of encryption at rest and in transit.
(ref: hirist.tech)