Senior System Software Engineer, AI HW Infra System

NVIDIA

Software Engineering, Data Science
Santa Clara, CA, USA
Posted on Dec 4, 2025

For more than 25 years, NVIDIA has changed the landscape of digital imaging, personal gaming, and high-performance computing. Our success depends on reliable, informative telemetry and data systems that provide real-time insight into our sophisticated, distributed infrastructure. As an engineer on our team, you will play a key role in building the next generation of observability for a diverse set of complex workloads. You will transform raw telemetry data into actionable insights, and you will architect, develop, and maintain infrastructure that monitors workload health, performance, and usage in critical engineering systems, so that our global teams can work at peak efficiency. This role offers an outstanding mix of core software engineering, data management, and workload observability.

What you'll be doing:

  • Collaborate closely with internal chip design teams to understand their workflows and determine observability needs to help improve the overall efficiency of our chip development process.

  • Design, build, and maintain robust, scalable platforms and infrastructure for capturing, storing, processing, and visualizing the data collected from chip build workflows.

  • Maintain and update the observability tools and systems to meet the needs of new/evolving chip design workflows.

  • Keep up to date with recent developments in observability tools, frameworks, and strategies, and advocate for their adoption within the organization.

What we need to see:

  • BS degree or higher in Computer Science, or equivalent experience.

  • 4+ years of professional experience developing and managing observability infrastructure.

  • Familiarity with EDA (Electronic Design Automation) workflows and tools used in the semiconductor industry.

  • Proficiency in programming and scripting with Python and Perl. Familiarity with databases, containerized applications, and observability stack components. Experience building data pipelines for a compute cluster using open-source technologies, and building custom components as needed. Experience with C++ is a plus.

  • Solid grasp of software engineering principles and methodologies such as OOP and CI/CD. Ability to break ambiguous problems down into concrete, solvable pieces.

  • Excellent communication and collaboration skills. Ability to adapt in a fast-paced environment with evolving requirements.

Ways to stand out from the crowd:

  • Background knowledge in accelerated computing (parallel programming) or experience running CPU-vectorized or GPU-based workloads, even if not directly tied to observability.

  • Hands-on experience in developing user interfaces using technologies such as HTML, CSS, JS, ReactJS or VueJS.

  • A passion for improving engineering productivity and efficiency with a data-driven philosophy.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 148,000 USD - 235,750 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until December 7, 2025.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.