AIML - Sr Machine Learning Engineer, Data and ML Innovation

Apple

Software Engineering, Data Science

Cupertino, CA, USA

USD 181,100-318,400 / year + Equity

Posted on Jun 9, 2026
We are looking for talented machine learning engineers who are excited to tackle some of the most meaningful and technically challenging problems in building and deploying foundation model–based products for our customers. As a Machine Learning Engineer focused on foundation model evaluation, you will play a critical role in assessing the capabilities of the models that power Apple Intelligence features. You will work closely with machine learning researchers to translate evaluation insights into actionable improvements that advance future model performance.
As a foundation model evaluation Machine Learning Engineer, you will be entrusted with ensuring that foundation model performance can be measured quickly and reliably to support crucial model shipping decisions. You will design, implement, and maintain the evaluation infrastructure those decisions depend on. You will collaborate extensively with ML researchers both on model hill-climbing and on developing novel methodologies for measuring model performance. Your responsibilities will span several high-impact parts of the Apple product and foundation model lifecycle.
  • Ensure the stability, reliability, and performance of Apple's foundation model evaluation system.
  • Design and implement novel evaluation methodologies.
  • Help design and implement tooling to simplify metrics generation, ingestion, and reporting.
  • Leverage agentic LLM systems to facilitate and improve model evaluations.
  • 5+ years of hands-on ML engineering experience, with at least 1 year working directly on large language models or generative AI.
  • Bachelor’s, Master’s, or PhD in Computer Science, Machine Learning, or a related technical field, or equivalent practical experience.
  • Strong software engineering fundamentals: debugging, testing, code reviews, and production reliability and scalability.
  • Hands-on experience with LLM training and/or evaluation workflows, including any of the following: pre-training, post-training, online evaluation, offline evaluation, automated evaluation, human evaluation.
  • Hands-on experience evaluating large language models at scale or designing large language model benchmarks.
  • Strong communication skills, with the ability to convey important information clearly and concisely.
  • Self-motivated and curious, with a drive to keep learning on the job.
  • Strong creative and critical thinking skills, with an innate drive to improve how things work, a high tolerance for ambiguity, and the ability to identify the most important problems to solve.