Machine Learning Engineer - Visual Agents - Special Projects
Apple
Software Engineering, Data Science
Cupertino, CA, USA
USD 126,800-220,900 / year + Equity
Posted on Apr 17, 2026
Apple is where individual imaginations gather together, committing to the values that lead to great work. Every new product we build, service we create, or experience we deliver is the result of us making each other’s ideas stronger. The diversity of our people and their thinking inspires the innovation that runs through everything we do. When we bring everybody in, we can do the best work of our lives. Here, you’ll do more than join something — you’ll add something.
The Special Projects team at Apple is developing novel experiences powered by state-of-the-art agentic vision-language models that incorporate visual context into conversational interaction. We are looking for a Machine Learning Engineer to help us build, fine-tune, and rigorously evaluate these systems. A successful candidate has hands-on experience with vision-language models, knows how to translate ambiguous product requirements into measurable evaluation criteria, and is excited to work at the intersection of multimodal modeling and agentic AI.
- Build and evaluate vision-language agents that perceive real-world scenes and incorporate that context into conversational models
- Curate, annotate, and build multimodal datasets to support model training and evaluation
- Develop automated evaluation pipelines, including LLM-as-judge frameworks, human evaluation protocols, and domain-specific benchmarks
- Fine-tune Large Language Models (LLMs) and Vision-Language Models (VLMs) to improve performance for specific use cases
- Work closely with other ML Researchers to define evaluation criteria and methodology to systematically evaluate foundation models
- Design controlled experiments to measure model capabilities, identify failure modes, and drive iterative model improvements
- Conduct robust statistical analysis to identify model deficiencies, failure modes, and performance gaps
- BA or Master’s degree in Computer Science or Machine Learning
- 2+ years of hands-on experience building and evaluating generative AI or multimodal models
- Experience working with vision-language models or multimodal systems
- Proficiency in Python and ML frameworks (PyTorch or TensorFlow)
- PhD in Computer Science, Machine Learning, Statistics, or other STEM field
- Prior industry internship or research experience applying ML to product use cases
- Experience with video understanding, temporal reasoning, or activity recognition
- Familiarity with agentic system design, including tool use, grounding, or perceive-act loops
- Experience building or working with large-scale multimodal data and annotation pipelines
- Proficiency in training, fine-tuning, and evaluation of foundation models and frameworks
- Publications or technical presentations in Machine Learning journals or conferences
- Excellent communication and cross-functional collaboration skills
Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.