Apple’s push to embed AI-driven health insights into its ecosystem hit a major stumbling block after a detailed investigation exposed ChatGPT Health’s inability to deliver consistent or reliable medical assessments—even when analyzing the same user data.
The flaw surfaced when a user shared 29 million steps and 6 million heart-rate measurements from Apple Health with ChatGPT Health, only to receive a cardiac risk grade of F. The user's own doctor dismissed the result as so far off-base that insurers would likely refuse to cover additional testing to disprove it. Worse, when the same query was repeated, the AI oscillated between a B and an F, a level of volatility that undermines any potential utility.
The instability isn’t just a technical quirk—it’s a systemic failure. ChatGPT Health was designed to help users interpret medical data, prepare for doctor visits, or navigate insurance tradeoffs by pulling from connected health apps like Apple Health, Function, and MyFitnessPal. But the investigation suggests the system lacks the precision required for even basic diagnostic support, let alone advanced health coaching.
For Apple, the findings complicate its broader strategy of weaving AI deeper into health services. The company has already signaled plans to expand AI features in Apple Health, potentially through iOS 27, while new Apple Watch models are adding sensors such as body temperature monitoring. Yet the inconsistency in ChatGPT's output raises critical questions about whether such tools are ready for prime time, especially when the stakes involve personal health decisions.
Regulators and medical professionals have long warned about the risks of AI misdiagnosis, but this case highlights a new danger: not just wrong answers, but inconsistent answers that could lead users to act on conflicting advice. With no clear path to validation or oversight, the issue may force Apple to rethink how, or whether, it deploys AI in health-related applications.
The investigation follows earlier reports of ChatGPT Health’s limitations, including its inability to provide medically accurate interpretations in some cases. While OpenAI has framed the service as a supplementary tool for understanding health data, the volatility in risk assessments suggests it may instead serve as a cautionary tale about the gaps between AI’s potential and its current capabilities.
