Talent.com
Site Reliability Engineer 2
No longer accepting applications

PhonePe • Chennai, IN
2 days ago
Job description

About PhonePe Limited :

Headquartered in India, PhonePe launched its flagship product, the PhonePe digital payments app, in August 2016. As of April 2025, PhonePe has over 60 crore (600 million) registered users and a digital payments acceptance network spanning over 4 crore (40 million) merchants. PhonePe also processes over 33 crore (330 million) transactions daily, with an annualized Total Payment Value (TPV) of over INR 150 lakh crore.

PhonePe’s portfolio of businesses includes the distribution of financial products (insurance, lending, and wealth) as well as new consumer tech businesses in India (Pincode, a hyperlocal e-commerce platform, and Indus AppStore, a localized app store for the Android ecosystem). These businesses are aligned with the company’s vision to offer every Indian an equal opportunity to accelerate their progress by unlocking the flow of money and access to services.

Culture :

At PhonePe, we go the extra mile to make sure you can bring your best self to work every day, and that starts with creating the right environment for you. We empower people and trust them to do the right thing. Here, you own your work from start to finish, right from day one. PhonePe-rs solve complex problems and execute quickly, often building frameworks from scratch. If you’re excited by the idea of building platforms that touch millions, ideating with some of the best minds in the country, and executing on your dreams with purpose and speed, join us!

Minimum Experience : 3 Years

About the Role :

This role is responsible for managing and maintaining complex, distributed big data ecosystems. It ensures the reliability, scalability, and security of large-scale production infrastructure. Key responsibilities include automating processes, optimizing workflows, troubleshooting production issues, and driving system improvements across multiple business verticals.

Roles and Responsibilities :

  • Manage, maintain, and support incremental changes to Linux / Unix environments.
  • Lead on-call rotations and incident response, conduct root cause analyses, and drive postmortem processes.
  • Design and implement automation systems for managing infrastructure, including provisioning, scaling, upgrades, and patching clusters.
  • Troubleshoot and resolve complex production issues while identifying root causes and implementing mitigation strategies.
  • Design and review scalable and reliable system architectures.
  • Collaborate with teams to optimize overall system / cluster performance.
  • Enforce security standards across systems and infrastructure.
  • Set technical direction, drive standardization, and operate independently.
  • Ensure availability, performance, and scalability of systems and services through proactive monitoring, maintenance, and capacity planning.
  • Respond to, analyze, and resolve system outages and disruptions, and implement measures to prevent similar incidents from recurring.
  • Develop tools and scripts to automate operational processes, reducing manual workload, increasing efficiency and improving system resilience.
  • Monitor and optimize system performance and resource usage, identify and address bottlenecks, and implement best practices for performance tuning.
  • Collaborate with development teams to integrate best practices for reliability, scalability, and performance into the software development lifecycle.
  • Stay informed of industry technology trends and innovations, and actively contribute to the organization's technology communities.
  • Develop and enforce SRE best practices and principles.
  • Align with cross-functional teams on priorities and deliverables.
  • Drive automation to enhance operational efficiency.
  • Adopt new technologies as the need arises and define architectural recommendations for new tech stacks.

Skills Required :

  • 3 to 7 years of experience managing and maintaining distributed big data ecosystems.
  • Strong expertise in Linux, MySQL, networking, system setup, and Azure.
  • Proficiency in scripting or programming in at least one backend language.
  • Familiarity with open-source configuration management and deployment tools.
  • Solid understanding of networking, open-source technologies, and related tools.
  • Excellent communication and collaboration skills.
  • On-premises experience is mandatory.
  • DevOps tools : SaltStack, Ansible, Docker, Git.
  • SRE logging and monitoring tools : ELK Stack, Grafana, Prometheus, OpenTSDB, OpenTelemetry.

Good to Have :

  • Experience managing infrastructure on public cloud platforms.
  • Experience in designing and reviewing system architectures for scalability and reliability.
  • Experience with observability tools to visualize and alert on system performance.
  • Experience with petabyte-scale data migrations and large-scale upgrades.

PhonePe Full-Time Employee Benefits (not applicable to intern or contract roles) :

  • Insurance Benefits - Medical Insurance, Critical Illness Insurance, Accidental Insurance, Life Insurance
  • Wellness Program - Employee Assistance Program, Onsite Medical Center, Emergency Support System
  • Parental Support - Maternity Benefit, Paternity Benefit Program, Adoption Assistance Program, Day-care Support Program
  • Mobility Benefits - Relocation benefits, Transfer Support Policy, Travel Policy
  • Retirement Benefits - Employee PF Contribution, Flexible PF Contribution, Gratuity, NPS, Leave Encashment
  • Other Benefits - Higher Education Assistance, Car Lease, Salary Advance Policy

Our inclusive culture promotes individual expression, creativity, innovation, and achievement, and in turn helps us better understand and serve our customers. We see ourselves as a place for intellectual curiosity, ideas, and debate, where diverse perspectives lead to deeper understanding and better-quality results. PhonePe is an equal opportunity employer and is committed to treating all its employees and job applicants equally, regardless of gender, sexual preference, religion, race, color, or disability. If you have a disability or special need that requires assistance or reasonable accommodation during the application and hiring process, including support for the interview or onboarding process, please fill out this form.
