Senior Service Engineer

Microsoft

IT
Posted on Oct 27, 2025

Hyderabad, Telangana, India

Job number: 1901482
Work site: 3 days / week in-office
Travel: None
Role type: Individual Contributor
Profession: Software Engineering
Discipline: Service Engineering
Employment type: Full-Time

Overview

Are you passionate about cloud computing, obsessed with customer experience, and skilled at translating complex technical issues into clear, transparent communication? Do you thrive in high-stakes, fast-paced environments and want to play a pivotal role in how Microsoft shows up for customers during moments that matter most? If so, the Azure Customer Experience (CXP) team has the opportunity for you.

Microsoft Azure is one of the most exciting and strategic products at Microsoft—powering mission-critical workloads for enterprises, governments, and startups around the world. Azure delivers on-demand, hyper-scale infrastructure and platforms via Microsoft's global data centers, enabling customers to build, host, and scale their applications with confidence.

The Customer Reliability Engineering (CRE) team within Azure CXP is a top-level pillar of Azure Engineering responsible for world-class live-site management, customer reliability engagements, and modern customer-first experiences at scale. Our “no dead-ends” philosophy ensures that every customer, regardless of size or scale, can realize their full potential through the Microsoft Cloud.

We are seeking a decisive, detail-oriented Service Engineer who will serve as the customer’s voice and advocate during high-severity incidents across Microsoft Azure. While predominantly focused on live-site customer communications, this hybrid role also supports service engineering, program and project management, and continual service improvement. You will work closely with incident managers, engineering responders, and field stakeholders to shape and deliver clear, timely, and action-oriented communications during outages, security events, service retirements, and other high-impact scenarios.

This is a critical, customer-facing role requiring exceptional writing skills, calm leadership amid ambiguity, and a passion for building customer trust through transparency and clarity. You’ll work at the intersection of customer support, technical operations, and communications, and you’ll help shape how Microsoft communicates before, during, and after a crisis.

Qualifications

Required Qualifications
• Bachelor’s degree in Computer Science, Information Technology, Data Science, Cybersecurity, or a related field AND 5+ years of technical experience in software engineering, network engineering, service engineering, systems engineering, or industrial controls; OR equivalent hands-on experience.
• Hands-on experience implementing AI-driven solutions and automation, with proficiency in one or more programming/automation languages (e.g., C#, Java, JavaScript, Python) or equivalent expertise, is a plus.
• Certifications in cloud technologies (Azure, AWS, GCP), ITIL, or SRE frameworks are desirable.
• Strategic thinking and a customer-first mindset; able to advocate for improvements in platform transparency and experience.
• Excellent problem-solving, judgment, decision-making, communication, and collaboration skills.
• Understanding of SRE principles, including SLAs/SLOs, telemetry, and monitoring.
• Proven experience in cloud operations, incident and crisis management, or large-scale systems engineering, ideally on platforms such as Azure, AWS, or GCP.
• Contribute to a data-driven culture as well as a culture of experimentation across the organization.
• Own and drive projects and features by working towards the team’s defined goals and milestones.
• Create prototypes and proofs of concept for iterative development.
• Be curious and willing to learn and grow.

Preferred Qualifications
• 5+ years of demonstrated experience as an Incident Commander or Crisis Manager for critical, high-severity incidents in high-availability, distributed environments.
• Experience with Site Reliability Engineering (SRE) principles and practices.
• Exposure to chaos engineering, fault injection, or high-availability architecture.
AI/ML experience (beginner to intermediate):
• Familiarity with how AI/ML models are integrated into cloud infrastructure and their potential failure modes.
• Experience using AI-powered tools for incident analysis, log correlation, or predictive alerting.
• An understanding of the challenges and risks associated with AI/ML systems in a production environment.
Certifications:
• Relevant cloud certifications (e.g., AWS Certified DevOps Engineer, Azure Solutions Architect, GCP Professional Cloud Architect).
• Certifications in ITIL, SRE, or other relevant frameworks.

Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Responsibilities

As part of the Azure CXP CRE team, your responsibilities include:

On-call Communication Management during regular on-call rotations
• Join incident bridges and work with engineering to obtain real-time outage details.
• Understand incident scope, impact, and mitigation to translate complex technical findings into clear, professional, and decisive updates for customers and stakeholders.
• Keep communications consistent and fact-based throughout the incident; confirm information with engineering and leadership before sharing.
• Assist with publishing Public Incident Reports and root cause analysis (RCA) summaries.
• Support live site incident (LSI) operations, including triage, resolution, and post-incident analysis.
• Share details about incidents and their resolution through post-mortem reports and in regular review meetings.
Problem Management & Data Analytics
• Design and implement automated detection systems to identify impacted resources in real time.
• Collaborate with engineering and operations teams to enhance telemetry, monitoring, and alerting accuracy while reducing false positives.
• Develop dashboards and visualizations in Power BI and Azure Data Explorer to support data-driven insights.
• Build scalable data collection and analysis frameworks to improve service reliability and incident response.
• Participate in incident resolution workflows and provide actionable insights to drive platform and process improvements.
• Communicate technical findings and recommendations to stakeholders through clear, data-backed reporting.
Tooling & Automation
• Develop tools and analytics pipelines to automatically assess incident impact and blast radius across services, regions, and customers in real time.
• Design and maintain automation solutions that enhance incident detection, monitoring, communication, and remediation while reducing operational toil and repeat issues.
• Identify recurring problems, propose preventive solutions, and collaborate with engineers and teams to implement fixes.
• Design and orchestrate automation workflows using Microsoft Copilot Studio, Power Automate, and Azure AI Foundry.
• Build and support no-code/low-code solutions to optimize operations and improve team efficiency.
• Collaborate with product, infrastructure, and operations teams to align automation initiatives with organizational reliability and customer trust goals.


Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
Industry leading healthcare
Educational resources
Discounts on products and services
Savings and investments
Maternity and paternity leave
Generous time away
Giving programs
Opportunities to network and connect

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.