Health “Stress Test” finds 70% of AI medical advice is problematic

INDIANA — A new study published in BMJ Open has issued a stark warning for patients using AI: the “doctor” in your pocket may be providing dangerous misinformation.

Researchers tested five of the world’s most popular AI models—ChatGPT, Gemini, Grok, Meta AI, and DeepSeek—by asking 50 medical questions across topics like cancer, vaccines, and nutrition. The results revealed a systemic failure to provide accurate health data, with roughly 70% of responses flagged as problematic by independent experts.

The study utilized a “stress test” to see if chatbots would provide unfounded claims or suggest unproven alternative clinics for serious diagnoses like early-stage cancer.

  • High Failure Rates: Nearly 20% of answers were rated as “highly problematic.”
  • Hallucinated Citations: None of the chatbots could reliably produce a fully accurate reference list. Footnotes often led to nonexistent studies or dead links.
  • Lack of Guardrails: Despite the high-stakes nature of the questions, the AI models declined to answer only two of the 250 queries.

The “Worst” Performers

No chatbot escaped criticism, but some struggled with accuracy more than others. Elon Musk’s Grok performed the worst, with 58% of its answers flagged, followed by ChatGPT (52%) and Meta AI (50%).

Performance also varied by topic:

  • Vaccines & Cancer — Best (though still 25% problematic): these topics have large, structured bodies of clinical research.
  • Nutrition & Athletics — Worst: these topics carry a high volume of conflicting online advice and thinner evidence bases.

The study highlights a growing concern in the medical community regarding AI hallucinations—instances where a chatbot generates confident but entirely false information. Unlike a human doctor, an AI model lacks common sense and often fails to tell a user that their question rests on a dangerous premise.

Safely Using AI for Health

Medical professionals advise that while AI can be a starting point for general health literacy, it should never replace clinical consultation. To stay safe:

  • Verify with reputable sources: Cross-reference AI claims with established institutions like the Mayo Clinic, Cleveland Clinic, or the NIH.
  • Check the URL: If an AI provides a link, ensure it leads to a .gov or .edu domain, or to a peer-reviewed journal.
  • Consult your doctor: Always bring AI-generated questions to a licensed professional before making any changes to treatment or diet.

As the industry moves toward dedicated medical large language models (“Med-LLMs”), researchers hope to see stricter guardrails that force chatbots to defer to human experts on life-altering medical advice.