Software Engineer, ML Platform and Infrastructure

Apple

Software Engineering, Other Engineering, Data Science

Austin, TX, USA

Posted on Apr 9, 2026
The Applied Machine Learning team has been at the forefront of accelerating digital transformation through machine learning across Apple's enterprise ecosystem. Our ML Platforms, Solutions, and Services deliver a comprehensive suite of capabilities that drive efficiency, agility, and innovation at Apple scale—serving business-critical needs across the enterprise. We are looking for talented Software Engineers who are passionate about distributed systems and large-scale infrastructure to build and operate world-class ML platforms and products across cloud environments.
Join Apple's Applied Machine Learning Team as a Machine Learning Platform Engineer and play a central role in designing and building the systems that power our Data, Machine Learning, and Generative AI initiatives. You will architect and engineer robust, high-performance, massively scalable platforms that serve as the foundation for groundbreaking ML workloads across the enterprise. In this role, you will apply software engineering depth to solve the hardest challenges in large-scale distributed systems—designing for reliability, performance, and efficiency from the ground up. You will own the technical direction of ML/Data/Inference platform capabilities, leading the evaluation and integration of cutting-edge open-source technologies and building innovative internal solutions that raise the bar for scalability and resilience across our ML ecosystem. You'll collaborate closely with cross-functional engineering and business teams, influencing technical strategy and contributing meaningfully to the broader platform roadmap.
  • Highly proficient in Python, Java, or Go, with a strong track record of building production-grade automation, tooling, and system-level software.
  • Deep understanding of LLM infrastructure requirements—including GPUs, TPUs, and Inferentia—with hands-on experience engineering systems that optimize their utilization and performance.
  • Experience designing and building agents and MCP servers, with hands-on expertise in frameworks such as LangGraph and LangChain.
  • Solid background in software engineering for complex, large-scale distributed systems, with strong familiarity with DevOps and reliability engineering practices.
  • Expert-level proficiency with AWS/GCP and deep, hands-on experience architecting and engineering containerized workloads using Kubernetes in production environments.
  • Proven ability to read, understand, and make meaningful contributions to complex open-source codebases in the ML infrastructure space.
  • Strong command of operating system internals, networking protocols, and security principles, applied to building highly available and resilient systems.
  • Exceptional analytical and problem-solving skills, with a demonstrated ability to identify and resolve critical system bottlenecks and failures in high-stakes environments.
  • 5+ years of experience in software development, with a strong focus on backend systems and APIs.
  • 2+ years of experience working with LLMs and agent frameworks.
  • 5+ years of experience with cloud platforms such as AWS or GCP.
  • Experience engineering scalable solutions for data processing and model training/fine-tuning workflows.
  • Hands-on experience building with distributed data technologies for ML training such as Spark, Flink, Iceberg, or Snowflake, with a deep understanding of their architectural trade-offs at scale.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.

Apple accepts applications to this posting on an ongoing basis.