What you'll do
Engineer reliability: Identify potential system issues early, implement preventive measures, and boost system resilience.
Automate for speed: Build tools, pipelines, and scripts that eliminate manual effort and enable rapid, secure deployments.
Solve problems at scale: Participate in root cause analysis (RCA), drive corrective and preventive actions, and prioritize improvements in the SRE backlog.
Optimize performance: Apply programming and scripting expertise (Python, Bash, etc.) to strengthen infrastructure and services.
Mentor & grow: Share knowledge with peers, champion best practices, and push for innovation.
Continuously evolve: Stay ahead of new technologies, adapt quickly, and help shape the future of T-Mobile's IT systems.
What you bring
Education: Bachelor's degree in Computer Science, Engineering, or a related field (master's preferred).
Experience: 2-5 years in operations, DevOps, or software engineering, with expertise in troubleshooting, automation, and customer support.
Must Have Skills:
Proficiency in Java, Python, Bash, or similar programming languages.
Hands-on experience designing & maintaining CI/CD pipelines.
Strong incident response & problem-solving mindset.
Ability to adapt to changing environments and drive innovation.
Familiarity with cloud-native platforms and container orchestration (Kubernetes, Docker).
Monitoring/APM tools.
Nice to Have
Azure Certified DevOps Engineer, Certified Kubernetes Administrator, or Microsoft Cloud Certified Professional DevOps Engineer.
Experience with AppDynamics, Splunk, or other monitoring/APM tools.
A passion for creating a culture of automation, resilience, and learning.
Site Reliability Engineer • Hyderabad, India