Senior AI Network System Architect
NVIDIA
Our technology has no boundaries! NVIDIA is building the world’s most groundbreaking and state-of-the-art accelerated computing platforms. Because of our work, scientists, researchers, and engineers can advance their ideas. We pioneered a supercharged form of computing loved by the fastest-paced computer users in the world—scientists, designers, artists, and gamers.
We seek a highly motivated Senior AI Network System Architect to join our team of experts and help shape the foundational infrastructure for the AI revolution. Our next-generation networking systems are at the forefront of connecting and powering the world's most advanced AI clusters. As a key member of our architecture team, you will be responsible for a wide range of critical activities, from deep technical analysis and performance modeling to strategic architectural studies, ensuring NVIDIA continues to innovate and lead.
What You’ll Be Doing:
Define, develop, and execute cutting-edge benchmarks and workloads to analyze system performance, identify bottlenecks, and drive optimizations across our hardware and software stack.
Drive the direction of our future products by performing deep-dive analysis of system architectures and solutions to assess their performance, efficiency, and value proposition.
Develop and validate sophisticated performance and network simulation models, correlating them with real-world hardware to predict and analyze the behavior of future systems.
Analyze and optimize the entire AI stack, from communication libraries (such as NCCL) and system software down to the underlying network fabric, and develop Proof-of-Concepts (POCs) for new features and improvements.
Conceptualize next-generation networking architectures driven by emerging DL and AI technologies.
Collaborate with multi-functional teams, including other architecture teams, logic design, system software, firmware, and DL research teams, to ensure the successful execution of our vision.
What We Need To See:
M.Sc. or Ph.D. degree in Computer Science, Computer Engineering, or Electrical Engineering, or equivalent experience.
6+ years of relevant industry or research experience in high-performance computing, computer architecture, or computer networks.
Excellent understanding of large-scale system behavior and the effect of distributed computing workloads on network and system performance.
Proven experience in simulation-based performance analysis or benchmarking.
Exceptional analytical, problem-solving, and systems-thinking skills, with the ability to translate complex technical data into strategic architectural insights.
Hands-on programming skills in Python and/or AI frameworks for system analysis, automation, and modeling.
Ability to thrive in a fast-paced, dynamic environment and work concurrently with multiple groups across the organization.
Ways To Stand Out From The Crowd:
Expertise in the architecture and system-level requirements of large-scale, distributed DL workloads (e.g., LLMs, Generative AI for vision).
Deep understanding of communication libraries such as NCCL, UCX, or UCC.
Expertise in network protocols (Ethernet, InfiniBand, RoCE) and large-scale network topologies.
Experience with industry-standard AI benchmarks (e.g., MLPerf) and NVIDIA's frameworks (e.g., NeMo) on large-scale clusters.
NVIDIA employs some of the most forward-thinking and hardworking people in the world, and due to unprecedented growth, our world-class engineering teams are growing fast. If you're a creative and autonomous engineer with a real passion for technology, we want to hear from you.
We are committed to fostering a diverse work environment and are proud to be an equal-opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law. We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, perform essential job functions, and receive other benefits and privileges of employment. Please contact us to request accommodation.