We are looking for a passionate Site Reliability Engineer (SRE) to join our team and help build scalable, resilient, and secure systems.
As an SRE, you will bridge the gap between software engineering and infrastructure operations.
You will focus on automation, reliability, performance, and security to ensure that our applications and services run smoothly in production.
Title : Site Reliability Engineer.
Location : Remote Work.
Key Responsibilities :
- Automation & Tooling : Develop scripts and tools (Python, Go, Bash, etc.) to automate manual tasks, reduce operational toil, and improve system reliability.
- Cloud & Containerization : Design, deploy, and manage infrastructure on AWS / GCP and containerized environments using Docker and Kubernetes.
- CI / CD Ownership : Implement and optimize CI / CD pipelines (Jenkins, GitLab CI, GitHub Actions) to enable safe, frequent, and automated deployments.
- Monitoring & Observability : Build and maintain monitoring systems (Prometheus, Grafana, ELK Stack, OpenTelemetry) to proactively detect, troubleshoot, and resolve issues.
- Infrastructure as Code (IaC) : Manage infrastructure using Terraform, Ansible, or equivalent tools for repeatable and version-controlled deployments.
- Incident Management : Lead troubleshooting and incident response efforts, following through with root cause analysis and long-term fixes.
- Networking : Design and optimize network configurations (VPCs, Load Balancing, DNS, Service Mesh) for the performance and resilience of distributed systems.
- Security & Compliance : Integrate DevSecOps best practices into CI / CD, ensuring secrets management, vulnerability scanning, and secure-by-design operations.
- Capacity Planning & Performance Tuning : Forecast resource needs, conduct load testing, and optimize system performance for cost-effective scaling.
Required Skills & Qualifications :
- Strong programming / scripting experience (Python, Go, Bash, or similar).
- Hands-on experience with at least one major cloud provider (AWS, GCP, or Azure).
- Expertise in Kubernetes, Docker, and container orchestration.
- Experience with CI / CD pipelines and tools (Jenkins, GitLab CI, GitHub Actions, etc.).
- Proficiency in monitoring / observability platforms (Prometheus, Grafana, ELK, OpenTelemetry).
- Experience with Infrastructure as Code (Terraform, Ansible, or similar).
- Solid troubleshooting and incident response skills under pressure.
- Knowledge of networking fundamentals (VPC, DNS, Load Balancers, Service Mesh).
- Familiarity with security best practices, DevSecOps, and secrets management.
- Strong analytical and problem-solving skills with a proactive mindset.
Preferred Qualifications :
- Previous experience in a high-availability, large-scale production environment.
- Exposure to performance benchmarking, load testing, and capacity planning.
- Contributions to open-source SRE / DevOps tools or frameworks.
- Certifications in cloud (AWS / GCP / Azure) or Kubernetes.
If you believe you are qualified and are looking forward to setting your career on a fast-track, apply by submitting a few paragraphs explaining why you believe you are the right person for this role.
About Techolution :
Techolution is a next-gen AI consulting firm on track to become one of the most admired brands in the world for "AI Done Right".
Our purpose is to harness our expertise in novel technologies to deliver more profits for our enterprise clients while helping them deliver a better human experience for the communities they serve.
At Techolution, we build custom AI solutions that produce revolutionary outcomes for enterprises worldwide.
Specializing in "AI Done Right," we leverage our expertise and proprietary IP to transform operations and help achieve business goals efficiently.
We are honored to have recently received the prestigious Inc 500 Best In Business award, a testament to our commitment to excellence.
(ref : hirist.tech)