ML Engineer, FM Training Integration - ML Platform Technologies
Apple
Software Engineering, Data Science
San Francisco, CA, USA · New York, USA · San Francisco Bay Area, CA, USA · Multiple locations
USD 147,400-220,900 / year + Equity
Posted on Dec 17, 2025
We are a group of engineers supporting the training of foundation models at Apple! We build infrastructure for training foundation models with general capabilities, such as understanding and generating text, images, speech, video, and other modalities, and we apply these models to Apple products. We are looking for engineers who are passionate about building systems that push the frontier of deep learning in scaling, efficiency, and flexibility, and that delight millions of users of Apple products.
We are looking for an ML Engineer to join our ML Compute team and help improve the efficiency, scalability, and reliability of model training and inference workloads in the cloud. In this role, you will work closely with senior ML engineers, infrastructure engineers, and researchers to integrate ML workloads with cloud infrastructure, tune performance, and ensure effective utilization of accelerators.
- Support the integration of model training and inference workloads with accelerator-based cloud infrastructure.
- Assist with performance tuning of ML workloads to improve throughput, latency, and hardware utilization.
- Help identify and debug performance bottlenecks across data loading, model execution, and distributed training/inference.
- Collaborate with senior engineers to benchmark models and infrastructure configurations.
- Contribute to tooling, scripts, or pipelines that improve observability, reliability, and efficiency of ML workloads.
- Bachelor’s degree in Computer Science, Engineering, or a related field.
- Basic understanding of machine learning workflows (training, evaluation, inference).
- Familiarity with Python and at least one ML framework (e.g., PyTorch, TensorFlow, JAX).
- Basic knowledge of cloud computing concepts (e.g., VMs, containers, storage, networking).
- Interest in performance optimization, systems efficiency, and scalable ML infrastructure.
- Strong problem-solving skills and willingness to learn complex systems.
- Exposure to GPU/TPU computing or accelerator-based workloads.
- Familiarity with distributed training or inference concepts (e.g., data parallelism, model parallelism).
- Experience with containerization or orchestration tools (e.g., Docker, Kubernetes).
- Basic understanding of profiling or benchmarking tools for ML workloads.
- Coursework or projects related to systems, cloud infrastructure, or performance engineering.
Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.