Senior Machine Learning Engineer, WebIR - ML Infrastructure
Amazon
Description
Unlock the future of AI at Amazon AGI (Artificial General Intelligence). At Amazon, we're at the forefront of transformative AI, shaping the next generation of intelligent technologies. For over 25 years, we've developed state-of-the-art AI solutions that transform how businesses serve their customers. Today, as AI stands ready to reshape society, we're pushing beyond current breakthroughs in generative AI toward the next frontier. Join our team of scientists, engineers, and experts to help define the future of artificial intelligence. AGI is dedicated to pushing the boundaries of what's possible, using Amazon's unparalleled ML infrastructure, computing resources, and commitment to responsible AI principles. We're looking for the brightest minds from a wide range of backgrounds and experiences to help create transformative AI solutions that will improve lives, solve global challenges, and open up new realms of possibility, from reinventing commerce and accelerating enterprise productivity to advancing universal agents and shaping the future of robotics.
We are looking for a talented Senior Machine Learning Engineer to help us develop state-of-the-art, next-generation web search capabilities within Amazon AGI.
A day in the life
You will work with a multidisciplinary team across multiple programs to:
(i) build and automate training data generation: You will build a data pipeline that produces high-quality training data sets for our web information retrieval and ranking models, with a direct and significant impact on our search quality. You will help improve data quality, including mining for hard negatives and incorporating dimensions of quality (e.g. relevance, content freshness, page trustworthiness), as well as scaling the pipeline to billions of examples. You will work closely with scientists to address their specific modeling needs and help develop metrics to communicate your progress on data quality and scale.
(ii) accelerate experimental velocity: develop high-leverage systems that enable the team to experiment faster. This includes centralizing evaluation workflows and developing tools and systems to streamline production model optimization and deployment.
(iii) improve system understandability: develop advanced analytics and automate failure space analysis processes to help the team debug and understand search quality issues. You will also partner with the broader AGI analytics team to coordinate metrics for our information retrieval engine with user signals and downstream dependencies, so that issues can be debugged across systems.
(iv) push model performance to its limits: optimize model inference to maximize hardware utilization and reduce GPU inference latency, balancing the trade-offs between quality and performance.