Talent.com
Senior Manager - Site Reliability Engineering

Standard Chartered Bank, India
7 days ago
Job description

This job is with Standard Chartered Bank, an inclusive employer and a member of myGwork – the largest global platform for the LGBTQ+ business community. Please do not contact the recruiter directly.

JOB SUMMARY

RESPONSIBILITIES

Lead the implementation of, and advocacy for, SRE (Site Reliability Engineering) principles to improve the reliability and availability of our applications

Drive work on setting and maintaining SLIs, SLOs, and error budgets for our applications

Responsible for developing and executing on the Chapter Vision together with the other Chapter Leads

Drive technology strategy, technology stack selection, and implementation for a future-ready technology stack, to achieve highly scalable, robust, and resilient systems.

Experienced former practitioner with leadership ability.

Oversees the execution of functional standards and best practices

Provide thought leadership on the craft, and inspire and retain talent by developing and nurturing an extensive internal and external network of practitioners.

This role centres on capability building; it does not own applications or delivery.

Creates a strategy roadmap of technical work

Works to drive technology convergence and simplification across their chapter area

Technical Responsibilities

Service Reliability: Monitor and maintain the reliability, availability, and performance of production services and infrastructure.

Automation and Tooling: Develop and maintain automation tools and processes to streamline system provisioning, configuration management, deployment, and monitoring.

Incident Management: Respond to and troubleshoot incidents, outages, and performance issues in production environments, ensuring timely resolution and minimal impact on users.

Blameless Postmortems and Learning from Incidents: Participate in the wider root cause analysis, and support and drive collaborative actions.

Capacity Planning: Analyse system performance and capacity trends to forecast future resource requirements and optimize infrastructure utilization.

Performance Optimization: Identify and address performance bottlenecks and optimization opportunities across the software stack, from application code to underlying infrastructure.

Security and Compliance: Implement security best practices and ensure compliance with regulatory requirements, collaborating with security and compliance teams as needed.

Continuous Improvement: Continuously evaluate and improve system reliability, scalability, and performance through automation, process refinement, and technology upgrades.

Documentation and Knowledge Sharing: Document system designs, configurations, and procedures, and share knowledge with team members through documentation, training, and mentoring.

Strategy

Reliability Engineering Strategy - Develop and execute a comprehensive reliability engineering strategy to ensure high availability, fault tolerance and disaster recovery capabilities for critical systems and services

Scalability Planning - Design and implement scalable architecture solutions that can accommodate growth in user traffic and data volume over time

Monitoring and Alerting Strategy - Define and implement monitoring and alerting strategies to proactively identify and address issues before they reach end users

Capacity Planning Strategies - Develop capacity planning strategies to ensure that systems have sufficient resources to handle current and future workloads

Business

Experienced practitioner who contributes hands-on to squad delivery for their craft (e.g. SRE).

Responsible for balancing skills and capabilities across teams (squads) and hives in partnership with the Chief Product Owner & Hive Leadership, and in alignment with the fixed capacity model.

Responsible for evolving the craft towards greater automation, simplification, and innovative use of the latest market trends.

Trusted advisor to the business. Work hand in hand with the Business, taking product programs from investment decisions, into design, specification, and solution phases, all the way to operations on the ground and securing support services from other teams.

Provide leadership and technical expertise for the subdomain to achieve goals and outcomes

Support respective businesses in the commercialisation of capabilities, bid teams, monitoring of usage, improving client experience, and collecting defects for future improvements.

Manage business partner expectations. Ensure delivery to the business on time, within cost, and with high quality.

Processes

The Chapter Lead role may vary based on the specific chapter domain it is leading.

Define standards to ensure that applications are designed with scale, resilience, and performance in mind

Enforce and streamline sound development practices and establish and maintain effective governance processes including training, advice, and support, to assure the platforms are developed, implemented, and maintained aligning with the Group's standards

Responsible for overall governance of the subdomain that includes risk management, representation in steering committee reviews and engagement with business for strategy, change management and timely course correction as required

Ensure compliance with the highest standards of business conduct, regulatory requirements and practices defined by internal and external requirements. This includes compliance with local banking laws and anti-money laundering stipulations

People & Talent

Accountable for people management and capability development of their Chapter members.

Reviews metrics on capabilities and performance across their area, maintains an improvement backlog for their Chapter, and drives its continual improvement.

Focuses on the development of people and capabilities as the highest priority.

Ensure that the organisation works in a proactive way to upgrade capacity well in advance and predict future capacity needs

Responsible for building an engineering culture where application and infrastructure scalability is paramount for on-going capacity management with an aim to reduce the need for capacity reviews using monitoring and auto-scale properties

Empower the engineers so that they can provide economy of scale focused on delivering value, speed to market, availability, monitoring & system management

Foster a culture of innovation, transparency, and accountability end to end in the subdomain while promoting a "business-first" mentality at all levels

Develop and maintain a plan that provides for succession and continuity in the most critical delivery and management positions

Risk Management

Responsible for effective capacity risk management across the Chapter with regard to attrition and leave plans.

Ensures the chapter follows the standards with respect to risk management as applicable to their chapter domain.

Adheres to common practices to mitigate risk in their respective domain.

Effectively and collaboratively identify, escalate, mitigate, and resolve risk, conduct and compliance matters.

Incident Response Planning - Develop incident response plans and procedures to effectively mitigate and manage risks when they materialize

Risk monitoring and alerting - Implement monitoring and alerting systems to detect early warning signs of potential risks

Root Cause analysis - Conduct thorough root cause analysis of incidents and outages to understand the underlying causes and contributing factors

Regulatory & Governance

Ensure all artefacts and assurance deliverables are as per the required standards and policies (e.g., SCB Governance Standards, ESDLC etc.).

Display exemplary conduct and live by the Group's Values and Code of Conduct.

Take personal responsibility for embedding the highest standards of ethics, including regulatory and business conduct, across Standard Chartered Bank. This includes understanding and ensuring compliance with, in letter and spirit, all applicable laws, regulations, guidelines and the Group Code of Conduct.

Key Stakeholders

WRB Application Teams

Chief Product Owner, Hive Lead, Product Owners, Engineering Leads

Other Responsibilities

Embed Here for good and the Group's brand and values in the digital sales / commerce team

Perform other responsibilities assigned under Group, Country, Business or Functional policies and procedures

Requirements & Skills

Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent experience).

10+ years of experience as a Site Reliability Engineer or in a similar role, with a proven track record of leadership.

Strong understanding of SRE principles and practices.

Proficiency in troubleshooting complex issues and exceptional problem-solving skills.

Deep knowledge of a wide array of software applications and infrastructure.

Experience with monitoring and observability tools (e.g., Prometheus, Grafana, AppDynamics, Splunk, PagerDuty).

Proficiency in scripting and automation (e.g., Python, Bash, Ansible).

Familiarity with cloud platforms (e.g., AWS, Azure) and containerization technologies (e.g., Docker, Kubernetes).

Excellent communication and collaboration skills.

Ability to work in a fast-paced, dynamic environment.

Strong attention to detail and a commitment to delivering high-quality results.

Ability to debug and troubleshoot Java applications.

Proficiency in using Splunk for log management and analysis.

Familiarity with CI/CD tools and practices.

Experience in the banking or financial services industry.

Certification in relevant technologies (e.g., AWS Certified Solutions Architect, Google Cloud Professional Cloud DevOps Engineer).

Knowledge of security best practices and compliance requirements.

Ability to articulate the overall vision for the Chapters and ensure upskilling of the organisation holistically

Experience in identifying skill gaps and mitigating risks to deliverables

Ensure all solutions comply with Architecture Standards

Strong experience in software development, system administration, or a related technical field.

Proficiency in programming / scripting languages such as Python, Go, Java, or Shell scripting.

Experience with containerization and orchestration technologies such as Docker, Kubernetes, or similar.

Deep understanding of Linux / Unix systems and networking fundamentals.

Experience with cloud platforms such as AWS, GCP, or Azure.

Strong analytical and problem-solving skills, with a keen attention to detail.

Excellent communication and collaboration skills, with the ability to work effectively in a cross-functional team environment.

Prior experience with DevOps practices, continuous integration / continuous delivery (CI/CD) pipelines, and infrastructure as code (IaC) is a plus.

Role Specific Technical Competencies

Software Engineering

Systems Software Infrastructure

Platform Architecture

Programming & Scripting (Java / Python or Similar Programming Language)

Cloud (AWS, Azure, GCP)

Database Development

Service Excellence

Agile Application Delivery Process

Operating Systems

Network Fundamentals

Security Fundamentals

Core Banking Domain Knowledge

About Standard Chartered

We're an international bank, nimble enough to act, big enough for impact. For more than 170 years, we've worked to make a positive difference for our clients, communities, and each other. We question the status quo, love a challenge and enjoy finding new opportunities to grow and do better than before. If you're looking for a career with purpose and you want to work for a bank making a difference, we want to hear from you. You can count on us to celebrate your unique talents and we can't wait to see the talents you can bring us.

Our purpose, to drive commerce and prosperity through our unique diversity, together with our brand promise, to be here for good are achieved by how we each live our valued behaviours. When you work with us, you'll see how we value difference and advocate inclusion.

Together we:

Do the right thing and are assertive, challenge one another, and live with integrity, while putting the client at the heart of what we do

Never settle, continuously striving to improve and innovate, keeping things simple and learning from doing well, and not so well

Are better together, we can be ourselves, be inclusive, see more good in others, and work collectively to build for the long term

What we offer

In line with our Fair Pay Charter, we offer a competitive salary and benefits to support your mental, physical, financial and social wellbeing.

Core bank funding for retirement savings, medical and life insurance, with flexible and voluntary benefits available in some locations.

Time-off including annual leave, parental / maternity (20 weeks), sabbatical (12 months maximum) and volunteering leave (3 days), along with minimum global standards for annual and public holiday, which is combined to 30 days minimum.

Flexible working options based around home and office locations, with flexible working patterns.

Proactive wellbeing support through Unmind, a market-leading digital wellbeing platform, development courses for resilience and other human skills, global Employee Assistance Programme, sick leave, mental health first-aiders and all sorts of self-help toolkits

A continuous learning culture to support your growth, with opportunities to reskill and upskill and access to physical, virtual and digital learning.

Being part of an inclusive and values driven organisation, one that embraces and celebrates our unique diversity, across our teams, business functions and geographies - everyone feels respected and can realise their full potential.
