Talent.com
Principal Site Reliability Engineer I / II

Confidential, Hyderabad / Secunderabad, Telangana, India
5 days ago
Job description

About Zeta

Zeta is a next-gen banking tech company that empowers banks and fintechs to launch banking products for the future. It was co-founded by Ramki Gaddipati in 2015.

Our flagship processing platform, Zeta Tachyon, is the industry's first modern, cloud-native, and fully API-enabled stack that brings together issuance, processing, lending, core banking, fraud & risk, and many other capabilities in a single-vendor stack. More than 20M cards have been issued on our platform globally.

Zeta actively works with the largest banks and fintechs in multiple global markets, transforming the customer experience for multi-million-card portfolios.

Zeta has more than 1,700 employees, with over 70% of roles in R&D, across locations in the US, EMEA, and Asia. In 2021, we raised $280 million at a $1.5 billion valuation from SoftBank, Mastercard, and other investors.

About the Role:

The role of a Site Reliability Engineer (SRE) is to bridge the gap between development and operations, focusing on building and maintaining reliable, scalable, and efficient systems. The ultimate goal is to ensure a seamless, reliable user experience while promoting a culture of automation, collaboration, and continuous improvement across the organization.

Responsibilities:

  • System Reliability: Ensuring the reliability of software systems by designing, implementing, and maintaining scalable and reliable infrastructure.
  • Automation: Developing automation tools and scripts to streamline operational tasks, reduce manual intervention, and improve overall system efficiency.
  • Incident Response and Resolution: Monitoring system performance and responding promptly to incidents to minimize downtime and ensure high availability.
  • Capacity Planning: Analyzing system usage patterns and forecasting capacity needs so that the infrastructure can handle both current and future demand.
  • Performance Optimization: Identifying and addressing performance bottlenecks in software systems through optimization and tuning.
  • Infrastructure as Code (IaC): Implementing infrastructure-as-code practices, using tools such as Terraform or Ansible, to define and manage infrastructure in a version-controlled, automated manner.
  • Monitoring and Logging: Implementing and maintaining monitoring and logging solutions to gain insight into system behavior, troubleshoot issues, and proactively address potential problems.
  • Security: Collaborating with security teams to implement and maintain security best practices across infrastructure and applications.
  • Disaster Recovery Planning: Developing and maintaining disaster recovery plans so that systems can recover quickly from major outages or failures.
  • Continuous Improvement: Continuously analyzing system performance, reliability, and incidents to identify areas for improvement, and implementing changes that enhance overall system resilience.
  • Mentorship and Coaching: Providing mentorship and coaching to team members to foster their professional development.

Skills:

  • Programming Languages: Proficiency in one or more programming languages, commonly Python, Go, or Shell/Bash.
  • Automation and Scripting: Strong automation skills using tools such as Ansible, Puppet, Chef, or custom scripts, and knowledge of Infrastructure as Code (IaC) tools such as Terraform.
  • Containerization and Orchestration: Experience with containerization technologies such as Docker and container orchestration platforms such as Kubernetes.
  • Cloud Computing: Proficiency in at least one cloud platform, such as AWS, Azure, or Google Cloud Platform, and knowledge of managing infrastructure in the cloud.
  • Monitoring and Logging: Familiarity with monitoring tools (e.g., Prometheus, Grafana, ELK stack) and logging frameworks to track system performance and troubleshoot issues.
  • Networking: Understanding of networking concepts and protocols, along with network troubleshooting skills.
  • Security: Knowledge of security best practices, including encryption, access controls, and vulnerability management.
  • Continuous Integration / Continuous Deployment (CI/CD): Understanding of CI/CD pipelines for automated testing and deployment, and experience implementing them.
  • Load Balancing and Incident Response: Experience with load balancing, as well as with incident response, troubleshooting, and resolution.
  • Version Control: Proficient use of version control systems such as Git.

Experience & Qualifications:

  • 10-15 years of experience in site reliability engineering.
  • A degree in computer science, information technology, or a related field.
  • Experience working in a product organization is a plus.
  • Cloud certifications such as AWS Certified DevOps Engineer, Google Cloud Professional DevOps Engineer, or a Microsoft Certified credential are a plus.

Skills Required:

Git, Load Balancing, Puppet, Chef, Go, Grafana, Ansible, Shell, Continuous Integration, Continuous Deployment, AWS, Prometheus, Networking, Bash, Kubernetes, Python, Azure, Terraform, Docker, Security, ELK Stack, Google Cloud Platform

Related jobs

  • Engineer, Site Reliability [T500-20517]
    TMUS Global Solutions, Hyderabad, Telangana, India
    NASDAQ: TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mo...
    Last updated: 26 days ago
  • Sr Engineer, Site Reliability Engineer [T500-20464]
    TMUS Global Solutions, Hyderabad, Telangana, India
    NASDAQ: TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mo...
    Last updated: 26 days ago
  • Engineer, Site Reliability [T500-20521]
    TMUS Global Solutions, Hyderabad, Telangana, India
    NASDAQ: TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mo...
    Last updated: 26 days ago
  • Site Reliability Engineer II
    Confidential, Hyderabad / Secunderabad, Telangana
    Our purpose is to help a billion people find the right work! Phenom is an AI-Powered talent experience platform that is redefining the HR tech space. We have grown into a global organization with of...
    Last updated: 30+ days ago
  • Principal Engineer, Site Reliability - Accounting Technology T500-20232
    ANSR, Hyderabad, Republic of India, IN
    ANSR is hiring for one of its clients. NASDAQ: TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flags...
    Last updated: 30+ days ago
  • Senior Site Reliability Engineer
    IntraEdge, Hyderabad, IN
    Strong leadership and people management skills. Exceptional technical proficiency in Pearson's technology stack. Strategic thinking with a focus on long-term operational excellence. Champion operation...
    Last updated: 14 days ago
  • Principal Engineer, Site Reliability - Accounting Technology [T500-20232]
    ANSR, Hyderabad, Telangana, IN
    ANSR is hiring for one of its clients. NASDAQ: TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flags...
    Last updated: 30+ days ago
  • Site Reliability Engineer
    Talent Sutra, Hyderabad, Telangana, IN
    The position exists to deploy the products and their updates ensuring smooth infrastructure and configuration management for robust project delivery. Operating System (Linux & Windows), Ansible, Doc...
    Last updated: 1 day ago
  • Principal Engineer, Site Reliability T500-20295
    TMUS Global Solutions, Hyderabad, Republic of India, IN
    NASDAQ: TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mo...
    Last updated: 26 days ago
  • Site Reliability Engineer
    Capgemini, Hyderabad, IN
    Choosing Capgemini means choosing a company where you will be empowered to shape your career in the way you’d like, where you’ll be supported and inspired by a collaborative community of colleagues...
    Last updated: 11 days ago
  • Principal Site Reliability Engineer - IAC Terraform
    Tidyhire, Hyderabad
    Description: This is a pure individual contributor role. Core Responsibilities: Infrastructure Design &...
    Last updated: 23 days ago
  • Principal Engineer, Site Reliability [T500-20295]
    TMUS Global Solutions, Hyderabad, Telangana, India
    NASDAQ: TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mo...
    Last updated: 26 days ago
  • Engineer, Site Reliability [T500-20515]
    TMUS Global Solutions, Hyderabad, Telangana, India
    NASDAQ: TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mo...
    Last updated: 26 days ago
  • Site Reliability Engineer [T500-21132]
    Inspire, Hyderabad, Telangana, India
    About Inspire Brands: Inspire Brands is disrupting the restaurant industry through digital transformation and operational efficiencies. The company’s technology hub, Inspire Brands Hyderabad Suppor...
    Last updated: 1 day ago
  • Engineer, Site Reliability [T500-20266]
    TMUS Global Solutions, Hyderabad, Telangana, India
    NASDAQ: TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mo...
    Last updated: 26 days ago
  • Site Reliability Engineer - II
    Confidential, Hyderabad / Secunderabad, Telangana, India
    LivePerson (NASDAQ: LPSN) is a leading customer engagement company, creating digital experiences powered by Curiously Human AI. Every person is unique, and our technology makes it possible for compa...
    Last updated: 3 days ago
  • Site Reliability Engineer III
    Confidential, Hyderabad / Secunderabad, Telangana, India
    As a Site Reliability Engineer III at JPMorgan Chase within the Chief Technology Office, you will collaborate with engineering, support, and operations teams to maintain and improve the reliability...
    Last updated: 30+ days ago
  • Lead Site Reliability Engineer
    S&P Global, Hyderabad, Telangana, India
    This job is with S&P Global, an inclusive employer and a member of myGwork – the largest global platform for the LGBTQ+ business community. Please do not contact the recruiter directly. About the Rol...
    Last updated: 7 days ago