Site Reliability Engineer

Talent Worx, Chennai, TN, IN
13 days ago
Job description

Experience required: 5 to 8 years.

Role and Responsibilities

Reporting to Engineering, the Site Reliability Engineer will play a critical role in driving innovation and growth for the Banking Solutions, Payments and Capital Markets business. In this role, the candidate will have the opportunity to make a lasting impact on the company's transformation journey, drive customer-centric innovation and automation, and position the organization as a leader in the competitive banking, payments and investment landscape. Specifically, the Site Reliability Engineer will be responsible for the following:

  • Design and maintain monitoring solutions and alerting mechanisms for infrastructure, application performance, and user experience metrics, enabling proactive issue detection and mitigation.
  • Implement automation tools and processes to automate routine tasks, scale infrastructure, and ensure seamless deployments, updates, and rollbacks with minimal user impact.
  • Ensure the reliability, availability, and performance of applications and services, focusing on minimizing downtime, optimizing response times, and maintaining high availability for users.
  • Lead incident response, including identification, triage, resolution, and post-incident analysis, to prevent recurrence and improve system resilience.
  • Conduct capacity planning, performance tuning, and resource optimization for environments, collaborating with development and operations teams to meet scalability and performance goals.
  • Collaborate with security teams to implement security best practices, perform vulnerability assessments, and ensure compliance with security standards and regulatory requirements for applications.
  • Manage deployment pipelines, release processes, and configuration management for app deployments, ensuring consistency, reliability, and version control across environments.
  • Identify areas for improvement in reliability, performance, and efficiency through data analysis, root cause analysis, and trend analysis, and drive initiatives to enhance system reliability and operational efficiency.
  • Create and maintain documentation, runbooks, and knowledge base articles for operational procedures, troubleshooting guides, and best practices, and promote knowledge sharing within the team.
  • Develop and test disaster recovery plans, backup strategies, and failover mechanisms for app services, ensuring business continuity and data integrity in case of failures or disasters.
  • Collaborate with development, QA, DevOps, and product teams to ensure alignment on reliability goals, performance metrics, release schedules, and incident response processes.
  • Participate in on-call rotations and provide 24/7 support for critical incidents, troubleshoot issues, and coordinate with teams for resolution, escalation, and follow-up actions as per defined SLAs.

Professional Qualifications

  • Proficient in development technologies, architectures, and platforms (web, API) to understand system complexities and performance considerations.
  • Experience in cloud platforms (e.g., AWS, Azure, Google Cloud) and infrastructure as code (IaC) tools for managing app infrastructure and deployments.
  • Knowledge of monitoring tools (e.g., Prometheus, Grafana, DataDog, New Relic) and logging frameworks (e.g., Splunk, SumoLogic, ELK Stack) for real-time visibility into system health, performance metrics, and user experience.
  • Experience in incident management, including incident response, triage, root cause analysis (RCA), and post-mortem reviews to prevent recurring issues.
  • Strong troubleshooting skills to diagnose complex technical issues in app environments, infrastructure, networking, and performance bottlenecks.
  • Proficiency in scripting languages (e.g., Python, Bash) and automation tools (e.g., Terraform, Ansible) for automating routine tasks, deployments, and infrastructure management.
  • Experience in implementing continuous integration/continuous deployment (CI/CD) pipelines for apps using tools like Jenkins, GitLab CI/CD, or Azure DevOps.
  • Expertise in setting up monitoring solutions, configuring alerts, and creating dashboards to monitor system performance, application metrics, and user experience.
  • Familiarity with APM (Application Performance Monitoring) tools to analyze app performance, identify bottlenecks, and optimize resource utilization.
  • Familiarity with RUM (Real User Monitoring) for tracking and analyzing user interaction and system performance.
  • Commitment to continuous learning, staying updated with industry trends, new technologies, and best practices in app reliability, performance, and operations.
  • Adaptability to evolving requirements, technologies, and business needs, with a focus on driving continuous improvement and operational excellence.

Personal Characteristics

  • Demonstrates judgment and flexibility; thinks about issues and develops solutions that thoughtfully take the broader context into account; deals positively with shifting demands on time and priorities and with rapidly changing environments.
  • Takes an ownership approach to engineering and product outcomes.
  • Action-oriented self-starter who can set strategy and drive execution with a “roll up the sleeves” approach.
  • Excellent interpersonal communication, negotiation and influencing skills to work effectively with all stakeholders (internal & external), making information-based decisions.
  • Penchant for excellence, both personally and professionally, demonstrated by intellectual curiosity, record of accomplishment, and reputation; shows strong attention to detail and implementation of best practices with an inclination for continuous improvement.
  • Ability to quickly establish strong credibility with employees, business partners and external resources.
  • Embodies and delivers the firm's values and culture towards colleagues, clients, and communities:
    o   Win as one team
    o   Lead with integrity
    o   Be the change

Benefits

Talent Worx is an emerging recruitment firm. We are hiring for our client, which is advancing the way the world pays, banks, and invests. With decades of expertise, we provide financial technology solutions to financial institutions, businesses, and developer
