Principal Product Manager - M365 CPU/GPU Capacity Management
Microsoft
Redmond, Washington, United States
Overview
As Copilot adoption accelerates, so does the demand on our infrastructure. We are approaching a multi-billion-dollar infrastructure footprint, managing millions of Central Processing Units (CPUs) and a growing Graphics Processing Unit (GPU) fleet.
As Principal Product Manager for M365 CPU/GPU Capacity Management, you will lead the strategy and execution for scaling Copilot’s CPU & GPU fleet. Your mission: deliver innovation at scale while reducing marginal Cost of Goods Sold (COGS) and accelerating time-to-value for new features, experiments, and model deployments.
Are You...
- A strategic thinker who thrives at the intersection of infrastructure, product, business metrics, and AI innovation?
- Passionate about driving efficiency at hyperscale across hardware, software, and operations?
- A proactive, AI-focused product manager who thrives on extreme ownership and drives outcomes?
- Someone who operates without boundaries, navigating across teams, domains, and ambiguity to get things done?
- Energized by scaling AI infrastructure to deliver real-world impact?
- A natural collaborator who brings urgency, accountability, and clarity to complex, cross-functional efforts?
If yes, then the M365 Core Platform Capacity Management team is just the place for you. We are looking for a Principal Product Manager who will lead the strategy and execution for scaling our Copilot CPU & GPU fleet, powering one of the most ambitious AI workloads in the world. You’ll be at the forefront of designing value-based, COGS-aware capacity systems, driving multi-layered efficiency across hardware and software, and partnering across engineering, finance, and infrastructure to ensure we scale with precision and purpose.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Qualifications
Required Qualifications:
- Bachelor's Degree AND 8+ years of experience in product/service/program management or software development, OR equivalent experience.
- Experience with cloud infrastructure, AI/ML platforms, or large-scale distributed systems.
- Solid analytical and modeling skills; experience with forecasting, telemetry, and financial modeling.
- Familiarity with Azure, M365, GPU-based workloads, and AI model deployment.
Preferred Qualifications:
- Bachelor's Degree AND 12+ years of experience in product/service/program management or software development, OR equivalent experience.
- 4+ years of experience taking a product, feature, or experience to market (e.g., design, addressing product-market fit, and launch, internal tool/framework).
- 6+ years of experience improving product metrics for a product, feature, or experience in a market (e.g., growing customer base, expanding customer usage, avoiding customer churn).
- 6+ years of experience disrupting a market for a product, feature, or experience (e.g., competitive disruption, taking the place of an established competing product).
Product Management IC5 - The typical base pay range for this role across the U.S. is USD $139,900 - $274,800 per year. A different range applies to specific work locations within the San Francisco Bay Area and New York City metropolitan area; the base pay range for this role in those locations is USD $188,000 - $304,200 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay
Microsoft will accept applications for the role until October 28, 2025.
#M365Core
Responsibilities
- Design and operationalize a COGS-aware capacity demand & supply allocation framework for Copilot features, experimentation, and training workloads.
- Drive a multi-layered efficiency roadmap from hardware Stock Keeping Unit (SKU) to workload orchestration and fleet operations, ensuring every GPU delivers maximum value.
- Own the end-to-end capacity signal lifecycle, including demand forecasting, supply planning, and alignment with finance and operations.
- Evolve the Copilot COGS model, identify top marginal cost drivers, and feed insights into the efficiency roadmap.
- Partner with the Copilot Infra team to enhance control plane capabilities for faster experimentation, model selection, benchmarking, and production deployment.
- Act as a unifying force across the Foundation Fleet & Capacity team, Copilot Infra, and Azure AI, ensuring shared goals, aligned execution, and transparent communication.