Software Architect, Enterprise AI Software
NVIDIA
NVIDIA is the platform upon which every new AI-powered application is built. We are seeking a Software Architect to define and lead the technical vision for the NVIDIA Inference Microservices (NIM) Factory. You will set the architectural direction for how we build, deploy, and scale enterprise-grade AI services to delight customers, while staying hands-on to guide our most critical implementations. The scope spans day-0 launches and the follow-through to harden them into enterprise-grade software, ensuring reliability, performance, and security across thousands of GPUs. You will shape our strategy for emerging challenges like disaggregated LLM inference and safeguard the long-term technical health of the platform.
What you'll be doing:
Define the end-to-end technical architecture for the NIM Factory, from container build systems and CI/CD to Kubernetes deployment patterns and runtime optimization.
Drive technical strategy and roadmap, making high-impact decisions on frameworks, technologies, and standards that empower dozens of engineering teams.
Architect and influence the design of workflow orchestration systems that underpin the NIM Factory.
Coach and mentor senior engineers across the organization, fostering a culture of technical excellence, innovation, and knowledge sharing.
Champion best practices in software development, including API design, automation, observability, and secure supply chain management.
Collaborate with leadership across research, backend, SRE, and product to align technical vision with product goals and influence technical roadmaps.
What we need to see:
12+ years of experience designing and building large-scale, production distributed systems.
Proven track record in a technical leadership or architect role, setting technical direction while staying hands-on with implementation.
Deep architectural expertise in cloud-native technologies, including Kubernetes, containers, and microservices.
Exceptional ability to coach, teach, and influence senior engineers; a passion for raising the technical bar of the entire organization.
Strong foundation in modern software development practices, with proficiency in languages like Python for building tooling and services.
Experience architecting solutions for GPU-accelerated or other high-performance computing workloads.
Excellent communication and collaboration skills, with the ability to articulate complex technical concepts to diverse audiences and drive consensus.
BS or MS degree in Computer Science, Computer Engineering, or a related field, or equivalent experience.
Ways to stand out from the crowd:
Hands-on experience with LLM inference stacks (Triton Inference Server, TensorRT-LLM, vLLM, FasterTransformer, KServe).
Experience optimizing large-model serving (KV cache sharding/paging, tensor/sequence parallelism, speculative decoding, dynamic batching).
Experience architecting next-generation container build systems or CI/CD platforms at scale.
Background with workflow orchestration engines (e.g., Temporal, Airflow) for complex, distributed processes.
Expertise in designing multi-tenant, multi-cluster, or edge/air-gapped deployment architectures.
We are widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and creative people in the world working for us. If you're creative and autonomous with a real passion for technology, we want to hear from you.