About Roku:
The #1 platform for streaming television, Roku wants to revolutionize the way the world watches TV.
Our Roku-branded TVs, Roku TV models, Smart Home system, streaming players, audio equipment, and the purpose-built operating system that powers it all can turn any home into a home theater, with seamless integration of hardware and software.
Our commitment to our users extends to our brand studio, which creates innovative Roku Originals exclusively for The Roku Channel, a free channel that reaches approximately 80 million households in the U.S. and Mexico.
Join us, and you'll have the chance to delight millions of TV streamers around the world while gaining meaningful experience across a variety of disciplines.
Job Description:
We are seeking a talented and experienced DevOps / SRE (Site Reliability Engineering) Team Lead to join our dynamic team.
The ideal candidate will have a strong background in DevOps practices, cloud infrastructure management, automation, and team leadership skills.
If you have a consistent track record of architecting and building large-scale systems, enjoy solving intriguing system challenges at internet scale, are innovative at heart, and strike a good balance between learning, organizing, and building while making an impact, this role might be a great fit for you!
What you will be doing:
- Provide leadership and guidance to a team of DevOps / SRE engineers, fostering a collaborative and high-performing work environment.
- Mentor team members in best practices, technologies, and methodologies.
- Oversee the design, implementation, and maintenance of scalable and resilient cloud infrastructure on platforms spanning AWS and GCP.
- Ensure high availability, reliability, and performance of critical systems.
- Collaborate with your peers to own the entire software lifecycle, identify the right problems to solve, and strive for excellence.
- Manage individual project priorities, deadlines, and deliverables related to your technical expertise and assigned domains.
- Lead incident response efforts, working closely with cross-functional teams to resolve issues quickly and minimize downtime.
- Implement effective incident management processes and post-incident reviews.
- Collaborate with security teams to ensure the integrity and security of infrastructure and applications.
- Implement security best practices and compliance standards.
- Identify performance bottlenecks and optimize system resources for maximum efficiency.
- Conduct regular performance tuning and capacity planning exercises.
- Drive continuous improvement initiatives within the team and across the organization.
- Proactively identify areas for enhancement and implement solutions to address them.
- Maintain comprehensive documentation of systems, processes, and procedures.
- Foster a culture of knowledge sharing and contribute to the collective learning of the team.
- Participate in a 24x7 on-call rotation and be available to work with global teams in the event of critical outages.
We're excited if you have:
- Experience with a number of the following: ECS, Docker, Kubernetes, Envoy, Istio.
- Experience with infrastructure as code (IaC) tools such as Terraform, Ansible, or CloudFormation.
- Strong understanding of distributed systems, microservices architecture, and cloud-native technologies.
- The drive and self-motivation to understand the intricate details of a complex infrastructure environment.
- 10+ years of experience in DevOps / SRE roles, with at least 2 years in a leadership capacity.
- Strong proficiency in cloud platforms such as AWS, Azure, or GCP.
- Solid understanding of networking, security, and compliance principles.
- Proven track record of driving results and delivering high-quality solutions in a fast-paced environment.
- Demonstrated ability to communicate clearly with both technical and non-technical project stakeholders and to work effectively in a cross-functional team environment.
- BS degree in Computer Science or equivalent.
- Certifications in relevant technologies such as Certified Kubernetes Administrator (CKA), AWS Certified DevOps Engineer, or Certified Information Systems Security Professional (CISSP).
- Certified Scrum Master is a plus.