Sr. Manufacturing Engineer, Trainium Manufacturing, Quality and Reliability
Amazon
DESCRIPTION
Lead the manufacturing of AI servers and systems based on Trainium chips across geographically distributed ODMs and CMs, as part of the Manufacturing, Quality and Reliability Team in AWS Annapurna Labs. The team focuses on Machine Learning products and designs cutting-edge AI platforms for the world’s largest Cloud Services provider.
AWS Utility Computing (UC) provides product innovations, from foundational services such as Amazon’s Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2) to consistently released new product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services.
Within AWS, the Annapurna Labs team is building the next-generation cloud server infrastructure. Our success depends on delivering world-class server infrastructure; we're handling massive scale and rapid integration of emergent technologies. Our servers include accelerators such as AWS Trainium and AWS Inferentia, which are machine learning products designed to deliver high performance at low cost.
The Trainium Manufacturing, Quality and Reliability Team is part of AWS Annapurna Labs. It focuses on Machine Learning products and designs cutting-edge AI platforms for the world’s largest Cloud Services provider. We are seeking a talented and motivated Manufacturing Engineer with a proven track record of implementing best-in-class test techniques and processes within a complex supply chain. As a member of the Cloud-Scale Machine Learning Acceleration team, you will be the interface between the system engineering team and the ODM and CM partners.
As a Senior Manufacturing Engineer, you will engage with an experienced cross-disciplinary staff to conceive and design infrastructure technologies. You will work closely with an internal interdisciplinary team and outside partners to drive key aspects of product definition, execution and test in manufacturing. A successful candidate will be responsive, flexible and able to succeed within an open, collaborative peer environment. You will:
* Be responsible for the test validation of future technologies.
* Drive manufacturing process improvements to address reliability issues and concerns.
* Qualify manufacturing lines and mechanisms for mass production.
* Lead the identification and validation of product and component risks, work with design teams to mitigate them, and define the test methodology and test coverage to assure product quality.
* Provide technical leadership and mentor engineers.
* Work with multiple vendors and ODMs to standardize component manufacturing and reliability expectations.
The successful candidate will be capable of making wide-ranging business decisions on behalf of the organization and willing to “roll up sleeves and do what needs to get done” to consistently deliver results. We’re changing an industry, and we want individuals who are ready for this challenge.
Key job responsibilities
- Work with system engineering teams to identify and escalate manufacturing challenges by enforcing DFM, DFA and DFS principles
- Evaluate, investigate and introduce new manufacturing technology and methodology to enhance product quality and production efficiency at ODM and CM
- Develop or adapt manufacturing processes at the ODM and CM, including defining fixture requirements, critical assembly requirements, test methodology, and signal integrity, power and heat management requirements
- Drive resolution of all factory operational issues related to assembly processes during pre-production builds; ensure effective closure to enable a successful new product introduction cycle and the transition of products into mass production
- Manage product lifecycle changes, lead product quality and reliability improvement projects, and drive technical root cause for supplier defects
- Improve and optimize manufacturing first-pass yield and efficiency from prototype through product ramp
- Work with engineering teams to clearly represent manufacturing processes and reviews to enable smooth New Product Introduction and product changes
- Support cost reduction and sustaining activities
About the team
Annapurna Labs is a wholly owned subsidiary of AWS, focused on developing custom silicon and servers including the Nitro (K2), Graviton, Inferentia, and Trainium families of processors.
Machine Learning Annapurna (MLA) functions as a vertically integrated team including software, firmware, hardware, and silicon design in a single organization.
We are the Trainium Servers and Systems organization under MLA, focused on Hardware Development, Software Development, Fleet Ops Systems, and Manufacturing, Quality, and Reliability.
This position is in the Manufacturing, Quality and Reliability team.