Senior Software Engineer
Microsoft
Multiple Locations, United States
Overview
The HPC/AI (High-Performance Computing and Artificial Intelligence) team is driving the creation of next-generation distributed AI supercomputers—delivering unmatched computational power, scalability, and reliability to enable breakthroughs in artificial intelligence. We design and build advanced infrastructure for large-scale AI model training, setting the stage for innovations that redefine what AI can achieve.
We’re seeking Senior Software Engineers who are passionate about high-performance systems and eager to tackle complex challenges in backend network design, RDMA-based communication libraries, and new network transport protocol development. In this role, you’ll develop networking solutions that ensure high throughput, ultra-low latency, and minimal jitter for distributed AI workloads—critical for enabling state-of-the-art AI systems to reach their full potential.
You’ll develop next-generation network transport protocols and build RDMA-based communication libraries that deliver ultra-low latency and high throughput. You’ll collaborate across diverse network architectures, processors, and accelerator technologies to deliver end-to-end solutions with a relentless focus on performance, scalability, and observability.
Generative AI and large-scale distributed systems are transforming technology. As a Senior Software Engineer on our team, you’ll work at the intersection of AI and high-performance computing, shaping the networking backbone that powers Azure’s AI supercomputing platform. This is your chance to influence the future of AI infrastructure and make an impact at global scale.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. We embrace a growth mindset, innovate to empower others, and collaborate to achieve shared goals. Our values—respect, integrity, and accountability—guide us as we build a culture of inclusion where everyone can thrive.
Qualifications
Required Qualifications:
- Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, Rust, or Python
- OR equivalent experience.
- 3+ years of experience in software design and development.
- Experience with high performance networking hardware/architecture.
Other Requirements:
- The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Preferred Qualifications:
- Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
- OR Master's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
- OR equivalent experience.
- Advanced degree, preferably in Computer Science or related fields, OR 3+ years equivalent professional work experience in Software Development with a demonstrated history of success in building scalable software for data center networks.
- 3+ years of experience with High Performance Computing / Machine Learning middleware and Communication Runtime.
- 3+ years of experience with Hardware-Software co-design.
- 3+ years of experience with Profiling and Performance Analysis Tools.
Software Engineering IC4 - The typical base pay range for this role across the U.S. is USD $119,800 - $234,700 per year. A different range applies to specific work locations within the San Francisco Bay Area and New York City metropolitan area; the base pay range for this role in those locations is USD $158,400 - $258,000 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay.
Microsoft will accept applications and process offers for these roles on an ongoing basis.
Responsibilities
- Design, develop, and optimize networking solutions tailored for large-scale AI training infrastructure.
- Benchmark, analyze, and enhance the scalability and reliability of networking systems to handle petabyte-scale data transfer.
- Debug and resolve complex networking issues in large-scale, high-performance environments.
- Drive identification of dependencies and the development of design documents for a product, application, service, or platform.
- Create, implement, optimize, debug, refactor, and reuse code to establish and improve performance, maintainability, effectiveness, and return on investment (ROI).
- Proactively seek new knowledge and adapt to new AI trends, technical solutions, and patterns that will improve availability, reliability, efficiency, observability, and performance.