Software Architect, Enterprise AI Software

NVIDIA

Software Engineering, Data Science, IT
Shanghai, China
Posted on Sep 19, 2025

NVIDIA is the platform upon which every new AI-powered application is built. We are seeking a Software Architect to define and lead the technical vision for the NVIDIA Inference Microservices (NIM) Factory. You will set the architectural direction for how we build, deploy, and scale enterprise-grade AI services to delight customers, while staying hands-on to guide our most critical implementations. The scope spans day-0 launches and the follow-through to harden them into enterprise-grade software, ensuring reliability, performance, and security across thousands of GPUs. You will shape our strategy for emerging challenges like disaggregated LLM inference and safeguard the long-term technical health of the platform.

What you'll be doing:

  • Define the end-to-end technical architecture for the NIM Factory, from container build systems and CI/CD to Kubernetes deployment patterns and runtime optimization.

  • Drive technical strategy and roadmap, making high-impact decisions on frameworks, technologies, and standards that empower dozens of engineering teams.

  • Architect and influence the design of workflow orchestration systems that underpin the NIM Factory.

  • Coach and mentor senior engineers across the organization, fostering a culture of technical excellence, innovation, and knowledge sharing.

  • Champion best practices in software development, including API design, automation, observability, and secure supply chain management.

  • Collaborate with leadership across research, backend, SRE, and product to align technical vision with product goals and influence technical roadmaps.

What we need to see:

  • 12+ years of experience designing and building large-scale, production distributed systems.

  • Proven track record in a technical leadership or architect role, setting technical direction while staying hands-on with implementation.

  • Deep architectural expertise in cloud-native technologies, including Kubernetes, containers, and microservices.

  • Exceptional ability to coach, teach, and influence senior engineers; a passion for raising the technical bar of the entire organization.

  • Strong foundation in modern software development practices, with proficiency in languages such as Python for building tooling and services.

  • Experience architecting solutions for GPU-accelerated or other high-performance computing workloads.

  • Excellent communication and collaboration skills, with the ability to articulate complex technical concepts to diverse audiences and drive consensus.

  • BS or MS degree in Computer Science, Computer Engineering, or a related field, or equivalent experience.

Ways to stand out from the crowd:

  • Hands-on experience with LLM inference stacks (Triton Inference Server, TensorRT-LLM, vLLM, FasterTransformer, KServe).

  • Experience optimizing large-model serving (KV cache sharding/paging, tensor/sequence parallelism, speculative decoding, dynamic batching).

  • Experience architecting next-generation container build systems or CI/CD platforms at scale.

  • Background with workflow orchestration engines (e.g., Temporal, Airflow) for complex, distributed processes.

  • Expertise in designing multi-tenant, multi-cluster, or edge/air-gapped deployment architectures.

We are widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and creative people in the world working for us. If you're creative and autonomous with a real passion for technology, we want to hear from you.