Data Scientist - Health AIML

Apple

Data Science

Cupertino, CA, USA

Posted on May 29, 2026
The Health AI team is at the forefront of machine learning and health science at Apple. We are a close-knit team of highly accomplished, deeply technical research scientists, software engineers, and machine learning engineers passionate about delivering innovative technologies that impact millions of users. We are looking for a senior engineer excited about solving real-world problems in the health domain that make a difference in our customers' lives.
We are looking for a highly technical and experienced data scientist who can work embedded with engineering to help design, execute, and analyze manual and LLM-based evaluations of health AI models and agentic experiences.
  • Design and Own End-to-End Evaluation Frameworks for Health AI: Develop rigorous evaluation methodologies for AI-enabled health systems, including metric definition, sampling strategy, experiment design, and statistical validity checks. Build scalable, reproducible pipelines that produce trustworthy and interpretable results across product surfaces and model iterations — ensuring that quality is measured against the high bar required for technologies that impact users' health.
  • Build High-Quality Evaluation Datasets & Human-in-the-Loop Systems: Create and maintain gold-standard datasets for offline and online assessment of generative AI and ML models in the health domain. Lead data generation and annotation workflows (e.g., clinical and expert ratings, red teaming, preference data, domain-specific evals), ensuring coverage, data quality, bias mitigation, and alignment with product and safety goals.
  • Partner Daily with Quality & Training Teams to Drive Product and Model Development: Work hand-in-hand with quality and model training teams as an embedded collaborator throughout the development cycle. Help turn product hypotheses into concrete evaluation sets, then analyze whether proposed fixes genuinely resolve the targeted problem. A typical loop: the quality team raises a hypothesis, you help design an eval set to test it, a fix is proposed, and you rigorously assess whether that fix moves the needle. Translate these findings into actionable recommendations for model training, tuning, and product launches.
  • 5+ years of experience in data science, machine learning, and analytics, including statistical data analysis and A/B testing.
  • Experience articulating business questions and translating them into statistical analyses that answer them using available data.
  • Strong programming skills, including data-querying skills (SQL and/or Spark, etc.) and experience with a scripting language for data processing and development (e.g., Python, R, or Scala).
  • Excellent collaboration skills to achieve impactful results by working effectively with diverse cross-functional teams, including PMs, engineers, data scientists, and others.
  • B.S. in Machine Learning, Computer Science, Statistics, Operations Research, or another quantitative field.
  • A good understanding of large language models (LLMs), including their architecture, training methods, prompt engineering, and fine-tuning for specific tasks.
  • Hands-on experience applying LLMs to solve technical problems, such as data analysis, data automation, and synthetic data generation, with a proven ability to optimize model performance for accuracy and efficiency.
  • Ph.D. in Machine Learning, Computer Science, Statistics, Operations Research, or another quantitative field.
  • 10+ years of relevant work experience.