Principal GPU/CPU Systems Engineer
Oracle
- 10 or more years of experience in hardware design, system engineering, and platform bring-up.
- Hands-on experience with market-leading GPUs or AI platforms spanning development, bring-up, test, and characterization.
- Strong knowledge of AI/GPU and/or AI/CPU platform architectures and capabilities.
- Experience evaluating system architectures, platform definitions, and implementation paths.
- Ability to balance hardware performance, power, cost, regulatory, and cross-functional requirements.
- Experience with modern server platforms across x86 and ARM architectures.
- Hardware development experience at the system, board, and FPGA levels.
- Proficiency reviewing hierarchical schematics, advanced multilayer board layouts, and end-to-end interconnects.
- Strong understanding of firmware and system diagnostics using BMC firmware, UEFI or BIOS, and Linux tools.
- Experience scripting and customizing diagnostics, validation, and test workflows.
- Experience with GPU supplier test code and open-source AI test and characterization tools.
- Experience with system integration, validation, and performance characterization.
- Strong understanding of high-speed buses and interconnects used in modern AI and compute platforms.
- Demonstrated ability to debug and root-cause complex hardware and software issues.
- Ability to document design intent and technical specifications clearly.
- Strong communication skills with the ability to explain complex technical topics across engineering teams and executive audiences.
- Proven ability to provide cross-functional technical leadership and collaborate effectively with internal teams and external partners.
- Experience using hardware debuggers.
- Experience with PCIe, DDR, Ethernet, USB, SPI, and related interfaces.
- Experience with platform-level security technologies.
- Experience with power circuit design and signal integrity.
As a world leader in cloud solutions, Oracle uses tomorrow’s technology to tackle today’s challenges. We’ve partnered with industry leaders in almost every sector, and we continue to thrive after 40+ years of change by operating with integrity.
We know that true innovation starts when everyone is empowered to contribute. That’s why we’re committed to growing an inclusive workforce that promotes opportunities for all.
Oracle careers open the door to global opportunities where work-life balance flourishes. We offer competitive benefits based on parity and consistency and support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.
We’re committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing accommodation-request_mb@oracle.com or by calling +1 888 404 2494 in the United States.
Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.
Oracle hardware platform development engineering is seeking a Sr. Principal GPU/CPU Systems Engineer to help define, develop, and support next-generation AI and compute platforms for Oracle Cloud Infrastructure (OCI). This role focuses on platform architecture, system integration, performance characterization, and in-service support of large-scale Cloud AI systems. You will work closely with internal hardware, firmware, software, security, manufacturing, and cloud operations teams, as well as external GPU and AI silicon partners, to deliver highly performant, secure, and scalable Cloud AI solutions.
Disclaimer:
Certain US customer or client-facing roles may be required to comply with applicable requirements, such as immunization and occupational health mandates.
Range and benefit information provided in this posting is specific to the stated locations only.
US: Hiring Range in USD from: $120,100 to $251,600 per annum. May be eligible for bonus, equity, and compensation deferral.
Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle's differing products, industries and lines of business.
Candidates are typically placed into the range based on the preceding factors as well as internal peer equity.
Oracle US offers a comprehensive benefits package which includes the following:
1. Medical, dental, and vision insurance, including expert medical opinion
2. Short-term disability and long-term disability
3. Life insurance and AD&D
4. Supplemental life insurance (Employee/Spouse/Child)
5. Health care and dependent care Flexible Spending Accounts
6. Pre-tax commuter and parking benefits
7. 401(k) Savings and Investment Plan with company match
8. Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.
9. 11 paid holidays
10. Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.
11. Paid parental leave
12. Adoption assistance
13. Employee Stock Purchase Plan
14. Financial planning and group legal
15. Voluntary benefits including auto, homeowner and pet insurance
The role will generally accept applications for at least three calendar days from the posting date or as long as the job remains posted.
Career Level - IC5
Platform Architecture and Definition
- Participate in platform definition, architecture evaluation, and analysis for existing and next-generation Cloud AI platforms.
- Evaluate system architectures, proposed implementations, and scaling and optimization strategies.
- Review and assess third-party merchant silicon used for AI accelerator modules and GPU/CPU platforms.
- Balance hardware performance priorities against power, cost, regulatory, and cross-functional requirements.
- Drive definition, development, integration, debug, characterization, and tuning of AI hardware platforms.
- Provide platform development oversight for internal teams and third-party partners.
- Work with in-house engineering experts on design reviews, schematics, board layout, and implementation decisions.
- Document and specify design intent and technical details in collaboration with engineering teams.
- Guide and support system integration, system test, qualification, and characterization.
- Define and oversee system validation plans, diagnostics features, and test strategies.
- Develop and expand system characterization and performance testing capabilities.
- Use supplier-provided tools and approved open-source tools for AI platform qualification and test.
- Support definition of in-service system monitoring, error reporting, and operational health visibility.
- Collaborate with GPU and AI chip suppliers, system architects, firmware developers, and hardware engineers.
- Partner with storage, networking, compute, quality, security, cloud orchestration, and manufacturing teams.
- Support development program managers with technical assessments and planning.
- Assist manufacturing teams to ensure hardware is secure, robustly evaluated, and production-ready.
- Participate in hardware platform security evaluations.
- Guide internal teams and partners on scaling, monitoring, and deploying AI platforms into the cloud.
- Serve as a senior technical advisor to Oracle hardware, software, cloud, and support teams.
- Act as the final level of engineering support for complex deployed product issues.
- Assist with root-cause analysis through lab replication, remote debug, and cross-team collaboration.