Talent.com
SITA - Lead Site Reliability Engineer/Expert
SITA INFORMATION NETWORKING COMPUTING INDIA • Delhi, IN
8 days ago
Job description

PURPOSE :

Responsible for proactive product support that keeps product performance high and continuously improving.

Responsible for identifying and resolving the root causes of operational incidents, and for implementing solutions to improve stability and prevent recurrence.

Manages the creation and maintenance of the event catalog to trigger events and develops both manual remediation approaches and automated workflows to resolve alerts.

Oversees the deployment of IT services and solutions, ensuring successful integration with minimal disruption.

Focuses on operational automation and integration to enhance efficiency and collaboration between development and operations within service operations.

KEY RESPONSIBILITIES :

  • Define, build, and maintain support systems to ensure high availability and performance.
  • Handle complex cases for the Operations team.
  • Build events to add to the event catalog for the relevant product or application.
  • Implement automation for system provisioning, self-healing, auto recovery, deployment, and monitoring.
  • Perform incident response and root cause analysis for critical system failures.
  • Monitor system performance and establish service-level indicators (SLIs) and objectives (SLOs).
  • Collaborate with development and operations to integrate reliability best practices, including moving to zero downtime architecture.
  • Proactively identify and remediate performance issues.
  • Work closely with Product, Software & Infra Engineering and Service support architects for new product productization.
  • Ensure Operations readiness to support new products.
  • Coordinate with internal and external stakeholders to gather feedback for continual service improvement of in-scope products, and drive the plan through to successful closure.
  • Remain accountable for the high availability and performance of in-scope products.

Problem Management :

  • Conduct thorough problem investigations and root cause analyses (RCA) to diagnose recurring incidents and service disruptions.
  • Coordinate with incident management teams and operations experts, and collaborate with different Service Operations and Engineering teams to develop and implement permanent solutions.
  • Monitor the effectiveness of problem resolution activities, provide regular reports on problem management activities, and ensure continuous improvement.

Event Management :

  • Define and maintain an event catalog, specifying active events, thresholds, and relevant remediation, and optimize it for efficiency.
  • Develop event response protocols, provide training to teams, and ensure quick and efficient handling of incidents.
  • Collaborate with stakeholders to define events, ensure coverage across the Service Operations, and drive improvements based on post-event reviews and feedback.

Deployment Management :

  • Own the quality of new release deployment for the Service Operations, ensuring a clear process and responsibilities are assigned for smooth implementation.
  • Develop and maintain deployment schedules, conduct operational readiness assessments, and manage deployment risk assessments to ensure service stability.
  • Oversee the execution of deployment plans, coordinate resources and processes with delivery and lifecycle engineering, communicate with stakeholders, and continuously improve deployment processes based on their feedback.

DevOps Management :

  • Manage continuous integration and deployment (CI / CD) pipelines, ensuring smooth integration between development and operational teams.
  • Automate operational processes, monitor system performance, and resolve issues related to automation scripts to increase efficiency.
  • Implement and manage infrastructure as code, provide ongoing support for automation tools, and continuously improve DevOps practices.

EXPERIENCE :

  • 8+ years of experience in IT operations service management, infrastructure management, or application management, including roles such as Site Reliability Engineering lead or DevOps engineer / lead.
  • Proven experience in managing high-availability systems and ensuring operational reliability.
  • Extensive experience in root cause analysis (RCA), incident management, and developing permanent solutions for recurring service disruptions.
  • Extensive expertise in monitoring and observability implementation.
  • Hands-on experience with CI / CD pipelines, automation, system performance monitoring, and the implementation of infrastructure as code.
  • Strong background in collaborating with cross-functional teams (development, operations, engineering, etc.) to improve operational processes and service delivery.
  • Experience in managing deployments, conducting risk assessments, and optimizing event and problem management processes.
  • Familiarity with cloud technologies, containerization, and scalable architecture, including experience with zero-downtime deployment strategies.

KNOWLEDGE & SKILLS :

Skills :

  • Collaboration.
  • Communication.
  • Problem Solving.
  • Incident Management.
  • Change Management.

Technical Skills :

  • Cloud Infrastructure (AWS, Azure).
  • Linux Administration.
  • Windows Administration.
  • Monitoring & Observability.
  • DevOps (CI / CD).
  • Programming & Scripting Languages.
  • Application Support.

PROFESSION COMPETENCIES :

  • Business Acumen.
  • Consultancy.
  • Financial Acumen.
  • Organisational Awareness.
  • Quality Orientation.

CORE COMPETENCIES :

  • Collaboration.
  • Communication.
  • Problem Solving.
  • Incident Management.
  • Change Management.
  • Innovation.

EDUCATION & QUALIFICATIONS :

Background :

  • Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field.
  • Advanced degree (Master's or equivalent) is often preferred for senior positions.

Qualifications :

  • Relevant certifications such as Linux Administration, Certified Kubernetes Administrator (CKA).
  • Certifications in cloud platforms (AWS, Azure, Google Cloud) or DevOps methodologies (e.g., Certified DevOps Professional).
  • Certification in Windows Administration, Linux Administration.

(ref : hirist.tech)
