Senior Software Development Engineer, Annapurna Labs, Elastic Collectives
Amazon
Description
We seek an experienced engineer to work on distributed Artificial Intelligence/Machine Learning (AI/ML) systems. This role focuses on developing high-performance collective operations: the fundamental communication operations that let AI workloads scale efficiently across multiple accelerators and servers. Most of our stack is written in C/C++ at a relatively low level, so the role requires knowledge of Linux systems and of performance-optimized code.
We value experience with ML frameworks, performance tuning and optimization techniques, embedded systems, and high-speed networking interconnects. Experience optimizing ML workloads is particularly valuable for this role.
If you enjoy solving complex performance challenges, want to work with ML customers, like to iterate quickly, and want to deliver optimized solutions at scale, join us! You'll work at the forefront of AI/ML, developing high-throughput, low-latency features for the largest clusters, with the largest customers, for the largest AI models.
Key job responsibilities
You'll work across the stack, from ML collective frameworks down to the libfabric and Elastic Fabric Adapter (EFA) stacks. Your focus will be designing and implementing Application Programming Interfaces (APIs) and features, as well as optimizing performance at every layer: reducing latency and maximizing throughput for ML workloads on AWS.
A day in the life
Annapurna Labs, a crucial part of AWS, is responsible for developing hardware and software components for EC2 infrastructure. Our team focuses on building networking solutions for Machine Learning (ML) and High-Performance Computing (HPC) workloads on AWS.
We have mixed-discipline organizations: you'd be working side by side with infrastructure experts, hardware engineers, RTL engineers, scientists, and architects. Our workforce spans the globe and is truly international; you'll find yourself working alongside individuals from numerous countries. We take mentorship seriously: you can expect mentorship from senior engineers, and you will also be expected to mentor new and junior engineers.
The pace is fast as we work on the latest advancements in AI/ML, but we take the time to bond as a team and enjoy our successes. We offer flexible working hours and treat work-life balance (WLB) as a core organizational tenet. The team works with numerous principal-level engineers and closely with directors, so career growth opportunities are available. Because the AI/ML field is fast-moving and constantly evolving, this is a role where you will always be encouraged to keep learning.
About the team
Annapurna Labs, an integral part of Amazon Web Services (AWS), develops hardware and software components that serve as critical building blocks for Elastic Compute Cloud (EC2) infrastructure. Every instance in EC2 runs hardware designed by Annapurna Labs. We specialize in designing performance-optimized software, systems, and chips that enhance the AWS customer experience.