Senior Software Engineer
Microsoft
As a Senior Software Engineer in the Azure Reliability team, you will be part of a multidisciplinary organization dedicated to making Azure the safest and most reliable cloud platform in the world. This role is at the heart of our mission to deliver world-class reliability for Azure’s most critical services and products.
You will apply Site Reliability Engineering (SRE) principles to design and implement solutions that enhance availability, observability, and operability across planet-scale systems. Beyond coding, you’ll collaborate with product teams to drive long-term platform improvements, automate repetitive tasks, and leverage AI to scale reliability across Azure. Your work will influence architecture, create reusable solutions, and contribute to industry-leading practices.
We are the Azure Reliability team, a multidisciplinary engineering organization committed to making Azure the world’s safest and most reliable cloud. For Azure’s most critical services and products, we apply a Site Reliability Engineering (SRE) approach. Our software engineers work closely with product teams to enhance availability, reliability, observability, and operability across our planet-scale systems.
Every day, our customers stake their business and reputation on our cloud. You can help #AzCXP provide our customers with the world-class cloud services they need to succeed.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Responsibilities
Billions of users across the world rely on our products, and to meet this demand we design and implement world-class distributed systems. As a Senior Software Engineer in one of our Azure SRE teams, you will be responsible for improving the reliability of key Azure products.
The Azure SRE key focus areas are:
- Defining system reliability goals through Service Level Objectives (SLOs).
- Enhancing production posture with targeted improvements in observability and operability (telemetry, alerting, incident/change management, safe deployment practices).
- Building reusable automation and processes that help multiple teams meet their reliability goals.
- Influencing product architecture and roadmaps to ensure customer-experienced reliability is a core design principle.
- Contributing directly to product code to achieve reliability outcomes.
- Leveraging AI to proactively detect anomalies, predict incidents, and automate operational workflows, scaling reliability efforts across complex systems.
We are looking for engineers passionate about the above areas who are also interested in:
- Providing technical leadership across multiple Azure teams.
- Mentoring others on SRE principles, practices, and tools, as well as AI usage to boost software development productivity.
- Designing and developing large-scale distributed software services and solutions.
- Delivering “best-in-class” engineering by ensuring services are modular, secure, reliable, testable, diagnosable, observable, and reusable.
- Collaborating with internal and external partners to support team goals.
- Balancing pragmatism with vision, driving continuous improvements in process and codebase.
- Building automation to prevent or remediate service issues before they impact users.
- Driving innovation in large-scale operations by applying cutting-edge AI tools and techniques to reduce operational toil and scale reliability engineering across complex systems.
- Gaining a working understanding of Microsoft businesses and contributing to cohesive, end-to-end user experiences.
Qualifications
- Bachelor's Degree in Computer Science or related technical field AND technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python.
- OR Master's Degree in Computer Science or related technical field AND technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python.
- OR equivalent experience working with large-scale distributed systems (e.g., cloud computing providers, SaaS services, etc., ideally with millions or billions of users) or similarly complex environments.
- Awareness of, and ability to reason about, modern distributed software design patterns and cloud systems architecture, including microservices, containers, load-balancing, queuing, caching.
- Experience with C#/Java/C/C++/Golang.
- Experience in building, shipping and operating reliable solutions.
Preferred Qualifications
- Familiarity with modern distributed software design patterns and cloud systems architecture, including microservices, containers, load balancing, queuing, caching.
- Experience working on large and unfamiliar codebases (millions of lines of code).
- Experience with open-source projects, Kubernetes, Linux, and containers.
- Proficiency in programming languages like C#/Java/Python.
- Experience in AI adoption with tools like GitHub Copilot, Azure OpenAI and custom copilots to streamline development and reduce toil.
Other Qualifications
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
#AzRel #AzCXP
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.