Software Development Engineer, EC2 Trainium AI Infra

Amazon

Software Engineering, Data Science

Seattle, WA, USA

Posted on May 12, 2026

Description

The EC2 Infrastructure Services organization is responsible for making EC2 instances available to our customers at all times. We are a key part of what makes EC2 elastic. AI infrastructure has become a central part of EC2, and we are building the systems, services, and automation to operate it at scale.

The Software Development Engineer will design, build, and maintain cloud-based provisioning and recovery systems for AWS Trainium-based AI UltraServers. This role requires expertise in AWS services, system architecture, and cross-functional collaboration with Capacity Management, Hardware Engineering, and Datacenter Operations to manage AI/ML infrastructure.

Key job responsibilities
- Build and maintain scalable microservices
- Design systems that solve the business problem efficiently
- Work in environments where the technology strategy is defined but the solution design is not
- Build cloud-based solutions using AWS native services for scaling infrastructure frameworks
- Create observable systems with appropriate metrics and alarming
- Collaborate with customers and stakeholders to convert business needs into technical designs
- Participate in code reviews and technical assessments

About the team
The EC2 UltraServer Provisioning team is a high-performing engineering organization responsible for delivering AWS Trainium-based UltraServer infrastructure at scale. We manage end-to-end provisioning workflows from host ingestion through testing, repair, and recovery.