Pearson is looking for a dynamic and experienced Manager - Site Reliability Engineering (SRE) to join our team. This individual will play a critical role in ensuring the stability, performance, and scalability of our infrastructure. If you have excellent leadership skills, deep technical expertise, and the ability to thrive in a fast-paced, collaborative environment, we encourage you to apply.
Key Responsibilities
Leadership and Team Management
- Lead, mentor, and develop a team of highly skilled Site Reliability Engineers.
- Promote a culture of continuous improvement and high performance.
- Foster collaboration and communication within the team and with other departments.
- Monitor team performance and provide constructive feedback.
Technical Expertise
- Oversee the design, implementation, and maintenance of reliable and scalable infrastructure.
- Develop and enforce best practices for system reliability, monitoring, and incident management.
- Ensure the availability, performance, and security of our services.
- Collaborate with software engineering teams to design and implement solutions that improve system reliability and performance.
- Use automation and DevOps practices to streamline operations and increase productivity.
- Experience with Terraform is required.
- Extensive knowledge of multi-cloud environments is an advantage.
Collaboration and Communication
- Work closely with cross-functional teams, including engineering, product management, and operations, to ensure alignment and successful project execution.
- Communicate effectively with stakeholders at all levels, providing regular updates on SRE initiatives and performance metrics.
- Facilitate incident response and post-mortem meetings, ensuring thorough analysis and follow-up on action items.
Qualifications
- Education: Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Experience: Proven experience in a leadership role within a Site Reliability Engineering or DevOps team.
- Strong technical background with extensive knowledge of cloud infrastructure, containerization, automation, and monitoring tools.
- Proficiency in scripting languages such as Python, Bash, or similar.
- Excellent problem-solving skills and a proactive approach to identifying and mitigating risks.
- Exceptional communication and interpersonal skills.
Skills Required
Team Management, Continuous Improvement, Scalable Architecture, DevOps, Automation, Scripting Languages