Engineer - Site Reliability - FPT
About the Role :
As a Site Reliability Engineer, you'll play a crucial role in keeping our digital backbone running seamlessly for millions of customers. Your mission : reduce incidents, automate everything, and help us dream big while delivering at scale.
What you will do :
- Engineer reliability : Identify potential system issues early, implement preventive measures, and boost system resilience.
- Automate for speed : Build tools, pipelines, and scripts that eliminate manual effort and enable rapid, secure deployments.
- Solve problems at scale : Participate in root cause analysis (RCA), drive corrective and preventive actions, and prioritize improvements in the SRE backlog.
- Optimize performance : Apply programming and scripting expertise (Python, Bash, etc.) to strengthen infrastructure and services.
- Mentor & grow : Share knowledge with peers, champion best practices, and push for innovation.
- Continuously evolve : Stay ahead of new technologies, adapt quickly, and help shape the future of T-Mobile's IT systems.
What you bring :
- Education : Bachelor's degree in Computer Science, Engineering, or related field (Master's preferred).
- Experience : 3-5 years in operations, DevOps, or software engineering, with expertise in troubleshooting, automation, and customer support.
Skills :
- Proficiency in Java, Python, Bash, or similar programming languages.
- Hands-on experience designing & maintaining CI / CD pipelines.
- Strong incident response & problem-solving mindset.
- Ability to adapt to changing environments and drive innovation.
- Familiarity with cloud-native platforms and container orchestration (Kubernetes, Docker).
Must Have Skills :
- Java
- Monitoring / APM tools
Nice To Have :
- Azure Certified DevOps Engineer, Certified Kubernetes Administrator, or Microsoft Cloud Certified Professional DevOps Engineer.
- Experience with AppDynamics, Splunk, or other monitoring / APM tools.
- A passion for creating a culture of automation, resilience, and learning.