About the job

Position Purpose:
At Brambles, platforms built on cloud hypervisors must run smoothly and scale to meet demand. The SRE Lead will monitor, maintain, and drive the software engineering required to ensure the performance, scalability, and reliability of cloud-based applications and infrastructure.
This role will proactively use observability data to identify improvement opportunities, not only across cloud services but also in the platform itself. The role will also drive a self-healing approach across the global estate so that it can scale seamlessly.
The SRE Lead will work alongside the Cloud Platform Engineering Team Lead(s) and others. They may assist in creating modules, but their primary focus is delivering performance and optimisation for production services.
Major / Key Accountabilities:
- Use Brambles observability tools to detect platform and workload issues
- Work closely with the native platform management team, product groups, and technical leads to design systems that troubleshoot issues proactively and automatically
- Support cloud operations in postmortem reviews to identify mitigations for future failures
- Evaluate key workloads and implement strategies to mitigate the risk of failure
- Monitor continuously to review the effectiveness of these strategies
- Minimise mean time to respond (MTTR)
- Support the maintenance of bug-tracking tools
- Ensure documentation and designs are kept up to date
Experience:
- Significant experience in a technology automation / SRE role
- 10+ years working with scripting languages
- Proven success in improving the customer experience
- Experience working within a matrix structure

Qualifications:
Essential Qualifications:
- Extensive experience with Python
- Strong experience with Bash
- Strong experience with process automation
- Experience with Kubernetes
- Strong knowledge of CI/CD

Desirable Qualifications:
- Site Reliability Engineering (SRE) Practitioner certification
- Site Reliability Engineering (SRE) Foundation certification
- Bachelor's degree in Computer Science, Information Systems, Business, or a related field (Master's preferred), or an equivalent combination of education and experience

Skills and Knowledge:
Python:
- Can guide others to write clean, reusable, scalable code
- Build pipelines for continuous improvement, writing Python scripts that automate testing, deployment, and rollback to keep the CI/CD pipeline smooth and reliable
- Advanced monitoring, logging, and custom tooling
- Write scripts that interact with cloud APIs, handling authentication and errors and maximising availability

System Programming Languages:
- Can guide and support others in the development, testing, and deployment of cloud-native applications, services, and infrastructure
- Troubleshoot issues with guidance from senior team members
- Support the integration of cloud applications with edge devices, using system programming languages for low-level interaction and communication
- Develop and implement networking protocols

- Understanding and use of event-based, object-oriented, functional, multi-tenant, and domain-driven design, knowing which design approach best suits a particular problem, and using abstraction to solve complex problems
- Ability to design at both the high level (the forest) and the low level (the trees), including an understanding of current design approaches used in the field and when they suit the use cases of the platform being built
- Use of well-established tools such as databases and Structured Query Language (SQL), and of leading-edge tools such as Kubernetes and the ecosystem around a particular language or programming environment, with continuous research into emerging tools in a rapidly changing computing landscape
- Ability to think abstractly and incorporate multiple perspectives; work in a space where the boundary or scope of a problem or system may be fuzzy; understand the diverse operational contexts of the system; identify inter- and intra-relationships and dependencies; understand complex system behaviour; and reliably predict the impact of changes to the system
- Ability to navigate cloud platforms such as AWS and Azure, and to use them effectively as the technical landscape for building Brambles-specific platforms (both multi-tenant and purely internal). Platforms built within Brambles Digital need to be "cloud-native" and run securely, effectively, and correctly at scale.

(ref: hirist.tech)