Talent.com
Principal Site Reliability Engineer (SRE)

Creyente Infotech, Hyderabad, Republic Of India, IN
8 hours ago
Job Description – AWS Platform Engineer

About the Role

We are seeking an AWS Platform Engineer to take full ownership of mission-critical financial systems within an investment banking environment. This role is not a traditional “support” function; it requires a hands-on engineer who can ensure system resilience, reliability, and operational excellence across on-premises and AWS-hosted environments.

The ideal candidate is a hardcore technical engineer who can automate, troubleshoot, and optimize platforms end-to-end while driving operational maturity.

Key Responsibilities

System Ownership & Reliability

  • Take complete responsibility for assigned financial systems and ensure high availability.
  • Troubleshoot complex issues across infrastructure, applications, and integrations.
  • Conduct root cause analysis and implement permanent fixes to prevent recurrences.

Operational Excellence

  • Design and manage Disaster Recovery (DR) strategies; conduct periodic DR drills.
  • Ensure timely patching, upgrades, and compliance with security standards.
  • Define and manage backup and restore strategies.
  • Build monitoring, logging, and alerting frameworks to proactively detect issues.

SRE & Automation

  • Apply site reliability engineering principles to improve system performance and resilience.
  • Automate operational tasks (deployments, failover tests, log analysis, scaling).
  • Develop tooling and scripts (Python, Shell, Ansible) for efficiency and reliability.
  • Implement self-healing mechanisms and runbooks for predictable operations.

Cloud & Hybrid Environments

  • Engineer and operate systems deployed across on-premises and AWS platforms.
  • Leverage key AWS services such as EC2, ECS / Fargate, RDS, S3, CloudWatch, IAM, Lambda, and VPC networking.
  • Work closely with infrastructure teams to optimize scalability and performance.

Required Skills & Experience

Technical Skills

  • 5–10 years in platform engineering within financial services or capital markets.
  • Strong SRE (Site Reliability Engineering) experience focused on automation, observability, and resilience.
  • Expertise in Linux / Windows environments.
  • Hands-on with key AWS services (EC2, ECS / Fargate, RDS, S3, IAM, Lambda, VPC).
  • Strong automation and scripting skills (Python, Shell, Ansible, and Terraform preferred).
  • Proficiency in monitoring / observability tools (CloudWatch, Prometheus, Grafana, ELK, etc.).

Operational Expertise

  • Proven experience in patching, DR, backup, monitoring, and system hardening.
  • Strong troubleshooting skills across applications, middleware, and databases.
  • Familiarity with incident, problem, and change management frameworks (ITIL preferred).

Preferred

  • Exposure to financial applications (Front Arena, Calypso, Murex, or similar).
  • Strong background in automation-driven operations and performance tuning.

Soft Skills

  • Strong sense of ownership and accountability.
  • Calm, decisive, and resilient in high-pressure financial environments.
  • Collaborative, with excellent communication skills across business and technical teams.
  • Passionate about automation and continuous improvement.

Why Join Us?

  • This is an opportunity for a tech champ who wants to own platforms end-to-end, applying SRE principles to financial systems that are critical to investment banking operations. You will play a pivotal role in making these systems highly reliable, scalable, and secure.
  • Excellent salary and other benefits.
  • Growing FinTech Startup.