We are looking for
- An experienced SRE & DevOps Engineer with deep expertise in cloud infrastructure, automation, and observability
- A hands-on engineer who ensures the reliability, performance, and scalability of systems
- A proactive problem solver with a strong focus on operational excellence and continuous improvement
- A collaborator who bridges development and operations through modern DevOps and SRE practices
- An effective communicator who thrives in cross-functional teams and drives best practices
This role matters to us
The Senior SRE & DevOps Engineer plays a vital role in ensuring the resilience, scalability, and reliability of our systems. By applying modern SRE principles, automation, and incident management practices, you will enable faster, more reliable delivery of business value while safeguarding system stability and customer trust.
Key Responsibilities
- Design, implement, and maintain scalable, secure, cloud-native infrastructure
- Set up and maintain observability solutions, including monitoring, alerting, logging, and tracing (e.g. Prometheus, Grafana, ELK, DataDog)
- Continuously improve CI/CD pipelines and automate deployment workflows to increase delivery efficiency
- Lead structured incident response and root cause analysis, and drive a culture of post-mortem learning
- Collaborate closely with developers, QA, and architects to ensure seamless integration and performance optimization
- Apply SRE principles (SLIs, SLOs, SLAs, error budgets) to guide operational decisions and system reliability
- Champion Infrastructure-as-Code practices using Terraform, Helm, or Ansible
- Ensure security, compliance, and reliability are embedded into operations
- Mentor team members and foster a culture of operational excellence and continuous improvement
Qualifications
Education
- Bachelor's or Master's degree in Computer Science, Engineering, or equivalent practical experience
Work Experience
- 6 to 8 years of proven experience in Site Reliability Engineering, DevOps, or Cloud Engineering roles
- Hands-on expertise with Kubernetes (preferably GKE), Docker, and service mesh technologies like Istio
- Strong background in CI/CD practices and tools (GitHub Actions, Jenkins X, ArgoCD, or similar)
- Experience with observability solutions (Prometheus, Grafana, ELK, Jaeger, DataDog, GCP Dashboards)
- Proficiency with at least one major cloud platform (GCP, AWS, Azure)
- Scripting or programming experience (Python, Go, Bash, or similar)
- Practical knowledge of Infrastructure-as-Code tools like Terraform, Helm, or Ansible
- Hands-on experience managing incidents, troubleshooting, and performing root cause analysis
- Familiarity with SRE practices (SLIs, SLOs, SLAs, error budgets)
Other Requirements
- Strong communication and collaboration skills across cross-functional teams
- Ability to balance short-term operational needs with long-term scalability and system health
- Analytical and proactive mindset with a focus on continuous improvement
- Fluency in English (written and spoken)
Nice-to-Have
- Experience with security best practices in distributed systems (OAuth2, mTLS, RBAC)
- Knowledge of cost optimization and cloud governance practices
- Familiarity with Camunda / CIB7 environments
- Contributions to open-source DevOps / SRE communities
Remote Work : No
Employment Type : Full-time
Experience : 6 to 8 years
Vacancy : 1