Principal Engineer Manager
Microsoft
Multiple Locations, United States
Overview
The HPC/AI (High-Performance Computing and Artificial Intelligence) organization is on a mission to build the next generation of distributed AI supercomputers—systems that deliver unprecedented computational power, scalability, and reliability to accelerate breakthroughs in artificial intelligence. Our teams design and develop world-class AI infrastructure that enables large-scale model training and inference, forming the backbone of Microsoft’s AI innovation.
As a Principal Software Engineering Manager, you will lead a team building foundational components of Azure’s AI networking infrastructure—powering some of the largest and most complex distributed training systems in the world. This is a rare opportunity to work at the intersection of AI, cloud infrastructure, and high-performance networking, driving innovation across hardware and software boundaries. With the explosive growth of generative AI and the demand for low-latency, high-bandwidth systems, your work will directly impact the scale, performance, and reliability of Microsoft’s AI platforms.
You will lead the design, development, and deployment of high-performance, scalable, and observable networking systems that connect AI accelerators at massive scale. The role requires deep technical acumen, strategic thinking, and a passion for engineering excellence. You’ll collaborate across Microsoft teams to define architecture, deliver solutions to complex infrastructure challenges, and ensure our systems meet the evolving needs of AI workloads.
If you’re passionate about building large-scale distributed systems, pushing the boundaries of AI infrastructure, and leading teams that shape the future of supercomputing, we invite you to join us on this journey to define the next era of AI at Microsoft.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Qualifications
Required Qualifications:
- Bachelor's Degree in Computer Science or related technical discipline AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
- OR equivalent experience.
- 4+ years of professional software design and development experience in large-scale distributed systems.
- 4+ years of experience leading and managing engineering teams.
Other Qualifications:
- Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Preferred Qualifications:
- Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
- OR Master's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
- OR equivalent experience.
- 4+ years people management experience.
- 4+ years of experience building and operating networking infrastructure for hyperscale datacenters or AI clusters, including scalable and fault-tolerant systems.
- Hands-on expertise with AI-specific networking technologies (e.g., InfiniBand, RoCE, NVLink) and protocols such as Ethernet, TCP/IP, RDMA, and gRPC.
- Familiarity with network virtualization, SDN, performance tuning, and telemetry/observability tools for large-scale monitoring.
- Understanding of AI accelerators (GPUs, TPUs) and their interaction with networking infrastructure in distributed environments.
Software Engineering M5 - The typical base pay range for this role across the U.S. is USD $139,900 - $274,800 per year. A different range applies to specific work locations within the San Francisco Bay area and New York City metropolitan area, where the base pay range for this role is USD $188,000 - $304,200 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay
Microsoft will accept applications and process offers for these roles on an ongoing basis.
Responsibilities
- Hire, manage, and grow a high-performing team of software engineers, fostering a culture of excellence, inclusion, and innovation.
- Lead the design and development of large-scale distributed systems and services that power Azure’s AI infrastructure.
- Drive engineering planning and execution while ensuring alignment with organizational OKRs and long-term strategy.
- Establish lean, scalable, and efficient processes that promote innovation and engineering rigor.
- Deliver best-in-class engineering by ensuring services and components are modular, secure, reliable, diagnosable, observable, and reusable.
- Improve test coverage, automation, and integration testing to proactively identify and resolve reliability gaps.
- Ensure live-site reliability and service health through robust monitoring, telemetry, and automation.
- Collaborate across Microsoft and partner organizations to deliver cohesive, end-to-end infrastructure solutions.
- Apply data-driven insights to optimize performance, scalability, and customer satisfaction.