Senior ML Engineer - Evaluation Automation
Apple
Software Engineering, Data Science
Cupertino, CA, USA
Posted on May 30, 2025
Apple has an extraordinary reputation for product quality. Help shape the future of Siri by leading the evaluation of next-generation experiences that redefine how people interact with their devices. Work at the intersection of product, AI, and user experience, ensuring Siri not only works, but works naturally, intelligently, and delightfully for millions of users worldwide.

As part of the Siri Evaluation team, you’ll drive the strategy and execution of large-scale evaluation efforts for cutting-edge features powered by Apple’s latest language models. You’ll collaborate closely with engineering, design, and product teams to define quality standards, build automated evaluation pipelines, and deliver actionable insights that guide Siri’s development across iPhone, iPad, Apple Watch, HomePod, Vision Pro, and more.

We are looking for a talented Machine Learning Engineer with a strong background in Large Language Models (LLMs) to build the next generation of ML evaluation frameworks and tools. In this role, you will leverage LLMs and other ML techniques to help automate large-scale data generation and the execution of evaluation jobs on server or on device, build LLM judges, detect anomalies, and streamline ML evaluation workflows. This is a high-impact role where you’ll work at the intersection of AI/ML, conversational agents, information retrieval, software engineering, and ML evaluation, helping us push the boundaries of how AI can transform ML evaluation.