Software Development Engineer II - Amazon MSK, Managed Streaming Kafka (MSK), MSK Infrastructure Management
Amazon
Description
Come build the future of data streaming with the Amazon Managed Streaming for Apache Kafka (MSK) Infrastructure team!
We are seeking a Software Development Engineer II for our Amazon MSK service, a fully managed service that makes it easy for customers to build and run applications that use Apache Kafka to process streaming data. You will join a team that owns fleet infrastructure, patching automation, and region expansion for MSK.
As a member of the Amazon MSK Infrastructure team, you will work on systems that maintain fleet health across 500,000 hosts spanning 37 regions. Your work will include building automation for fleet patching to keep RED hosts under 1% at any given time, developing region build automation to support MSK launches in new AWS regions, and ensuring feature parity across all regions. The scale of this fleet presents unique challenges in coordination, rollout strategies, and failure handling that require sophisticated automation and monitoring systems.
You will design and build scalable infrastructure services, implement monitoring and alerting systems, and develop tools that enable hands-off fleet maintenance with minimal customer impact. Your solutions must handle the complexity of coordinating updates across hundreds of thousands of hosts while maintaining service availability.
The ideal candidate has experience designing large-scale distributed systems, enjoys solving infrastructure challenges at scale, and possesses strong analytical and problem-solving skills. You should have experience with fleet management, automated deployment systems, and monitoring at scale. Knowledge of streaming data technologies like Apache Kafka and experience with infrastructure-as-code tools will be valuable.
Your responsibilities will include collaborating with other engineers to build reliable infrastructure for a large-scale AWS service, working with senior leaders to define infrastructure roadmaps, and ensuring MSK can scale globally while maintaining high availability standards.
Utility Computing (UC)
AWS Utility Computing (UC) provides product innovations, from foundational services such as Amazon Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2) to consistently released new product innovations that continue to set AWS's services and features apart in the industry. As a member of the UC organization, you'll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS.
About AWS
Amazon Web Services (AWS) is the world's most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating — that's why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.