Are you passionate about cloud computing, obsessed with customer experience, and skilled at translating complex technical issues into clear, transparent communication? Do you thrive in high-stakes, fast-paced environments and want to play a pivotal role in how Microsoft shows up for customers during the moments that matter most? If so, the Azure Customer Experience (CXP) team has an opportunity for you.
Microsoft Azure is one of the most exciting and strategic products at Microsoft—powering mission-critical workloads for enterprises, governments, and startups around the world. Azure delivers on-demand, hyper-scale infrastructure and platforms via Microsoft's global data centers, enabling customers to build, host, and scale their applications with confidence.
The Customer Reliability Engineering (CRE) team within Azure CXP is a top-level pillar of Azure Engineering responsible for world-class live-site management, customer reliability engagements, and modern customer-first experiences at scale. Our 'no dead-ends' philosophy ensures that every customer, regardless of size or scale, can realize their full potential through the Microsoft Cloud.
We are seeking a decisive, detail-oriented Service Engineer who will serve as the customer's voice and advocate during high-severity incidents across Microsoft Azure. While predominantly focused on live-site customer communications, this hybrid role will also support service engineering, program and project management, and continual service improvement. You will work closely with incident managers, engineering responders, and field stakeholders to shape and deliver clear, timely, and action-oriented communications during outages, security events, service retirements, and other high-impact scenarios.
This is a critical, customer-facing role requiring exceptional writing skills, calm leadership during ambiguity, and a passion for building customer trust through transparency and clarity. You'll work at the intersection of customer support, technical operations, and communications—and you'll help shape how Microsoft communicates during crises, preemptively and retrospectively.
Responsibilities
As part of the Azure CXP CRE team, your responsibilities include:
On-call Communication Management during regular on-call rotations
Problem Management & Data Analytics
Tooling & Automation
Qualifications
Required Qualifications
Preferred Qualifications
5+ years of demonstrated experience as an Incident Commander or Crisis Manager for critical, high-severity incidents in high-availability, distributed environments.
Experience with SRE (Site Reliability Engineering) principles and practices.
Exposure to chaos engineering, fault injection, or high availability architecture.
AI/ML Experience: Beginner to Intermediate
Familiarity with how AI/ML models are integrated into cloud infrastructure and their potential failure modes.
Experience using AI-powered tools for incident analysis, log correlation, or predictive alerting.
An understanding of the challenges and risks associated with AI/ML systems in a production environment.
Certifications
Relevant cloud certifications (e.g., AWS Certified DevOps Engineer, Azure Solutions Architect, GCP Professional Cloud Architect).
Certifications in ITIL, SRE, or other relevant frameworks.
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.
Skills Required
Java, JavaScript, Power BI, Power Automate, Python
Service Engineer • Hyderabad / Secunderabad, Telangana, India