Talent.com
Senior Systems Reliability Engineer

Blue Spire Inc, Hyderabad, Republic Of India, IN
30+ days ago
Job description

About the Role :

We are seeking a highly skilled Senior L2 Ops Engineer to join our dynamic team. You will play a critical role in maintaining the stability, performance, and reliability of our systems through robust observability practices, incident response readiness, and a commitment to operational excellence.

This role focuses on payment solutions and requires hands-on experience with platforms like Fiserv Enterprise Payments Platform, ACI Universal Payments (UP) Framework, SWIFT, and Message Translation Service (MTS). A strong foundation in the BFSI Payments Domain is essential.

Key Responsibilities :

  • Act as a subject matter expert in system recovery processes, ensuring rapid resolution and minimal business impact using technologies such as Java, AWS Cloud Platform / Infra, API Engineering, Mainframe, and observability tools.
  • Design, implement, and maintain monitoring, alerting, and logging solutions using observability tools such as Datadog and Splunk, with automation / custom integration support via Python scripting.
  • Proactively identify risks and implement preventive measures to ensure system stability across distributed and cloud-native environments.
  • Leverage domain expertise in payment protocols (SWIFT, MTS, ACH), standards like ISO 20022, and regulatory frameworks including SEPA and PCI-DSS.
  • Improve incident response workflows, lead critical incident triage, and drive blameless postmortems. Keep documentation current and actionable.
  • Analyze recurring issues, perform deep-dive investigations, and collaborate across teams to implement long-term fixes and resiliency strategies.
  • Develop automation scripts (Python, Bash, PowerShell) for routine tasks, system health checks, and self-healing mechanisms to reduce manual intervention.
  • Work closely with Engineering, DevOps, and Business stakeholders to align operations with business goals and coordinate production readiness for deployments.
  • Use tools like Istio and LaunchDarkly to support traffic management and controlled feature rollouts.
  • Participate in release planning and coordination, pre / post-deployment validations, and production cutover support.

Required Qualifications :

  • Bachelor’s degree in Computer Science, IT, or a related field.
  • 7+ years of experience in L2 operations, incident management, and system recovery.
  • Deep knowledge of the Enterprise Payments Platform (EPP) and modern payment processing protocols.
  • Experience with payment gateways, fintech APIs, and ISO 8583 / ISO 20022.
  • Proficiency in Java, AWS Cloud Platform / Infra, API Engineering, and Mainframe systems.
  • Advanced experience with observability tooling (Datadog, Splunk) and Python scripting for integrations and automation.
  • Strong Linux / Unix systems knowledge and understanding of cloud environments (AWS, Azure, or GCP).
  • Familiarity with CI / CD pipelines, Docker, Kubernetes, and ITSM tools.
  • Experience with SQL / NoSQL database monitoring and troubleshooting.
  • Understanding of distributed systems and microservices architectures.
  • Excellent problem-solving skills and ability to perform under pressure.

Preferred Qualifications :

  • Experience developing or troubleshooting APIs using Spring Boot.
  • Familiarity with service meshes (Istio) and feature flagging tools (LaunchDarkly).
  • Exposure to secrets management tools like Vault or Consul.
  • Prior experience in highly regulated environments with a focus on compliance (e.g., PCI-DSS).
  • Strong documentation and communication skills.