Our Site Reliability Engineers (SREs) play a crucial role in ensuring our systems are reliable, scalable, and efficient. We are looking for an experienced SRE to join our team and help us maintain and improve our infrastructure.
Responsibilities
- Monitor and Maintain Systems: Ensure the availability, performance, and reliability of our production environment by monitoring system health and responding to incidents.
- Automation: Develop and implement automation tools to reduce manual intervention and improve system efficiency.
- Collaboration: Work closely with development teams to design and implement scalable and reliable systems.
- Performance Tuning: Analyze system metrics to identify performance bottlenecks and optimize system performance.
- Incident Management: Lead incident response efforts, conduct root cause analysis, and implement preventive measures.
- Documentation: Create and maintain comprehensive documentation for system architecture, processes, and procedures.
- Capacity Planning: Conduct capacity planning and ensure systems can handle future growth.
Qualifications
- Experience: 6+ years of experience in site reliability engineering, operations, or software engineering.
- Education: Bachelor's degree in Computer Science, Engineering, or a related field.
- Technical Skills: Proficiency in scripting languages (e.g., Python, Ruby), experience with containerization (Docker, Kubernetes), and familiarity with cloud platforms (AWS, GCP, Azure).
- System Knowledge: Strong understanding of Linux/Unix systems, networking, and infrastructure components.
- Problem-Solving: Excellent troubleshooting and problem-solving skills.
- Communication: Strong communication and collaboration skills to work effectively with cross-functional teams.
- Certifications: Relevant certifications (e.g., AWS Certified Solutions Architect, Certified Kubernetes Administrator) are a plus.
Preferred Skills
- Experience with configuration management tools (e.g., Ansible, Chef, Puppet).
- Knowledge of CI/CD pipelines and tools (e.g., Jenkins, GitLab CI).
- Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
Why Join Us
- Innovative Environment: Work on cutting-edge technologies and projects.
- Growth Opportunities: Opportunities for professional development and career advancement.
- Collaborative Culture: Join a team that values collaboration, diversity, and inclusion.
- Competitive Benefits: Comprehensive benefits package including health insurance, retirement plans, and more.
Skills Required
Unix, Chef, Prometheus, ELK Stack, Grafana, Jenkins, GCP, Linux, Docker, Ansible, Ruby, Puppet, Azure, Kubernetes, Python, AWS