Study Urges Caution on AI Health Advice
Researchers from the University of Waterloo have revealed that OpenAI's advanced large language model, ChatGPT-4o, struggled significantly with open-ended diagnostic questions in a simulated study. The findings indicated that the AI provided incorrect answers nearly two-thirds of the time.
AI's Diagnostic Accuracy Questioned in New Study
Troy Zada, a doctoral student at the University of Waterloo involved in the research, emphasized the need for public caution. "LLMs continue to improve, but right now there is still a high risk of misinformation," Zada stated.
How the Study Tested ChatGPT's Medical Know-How
The research methodology involved adapting nearly 100 multiple-choice questions from a medical licensing examination. These questions were transformed into an open-ended format mirroring the types of health-related queries users might pose to ChatGPT.
Assessments of ChatGPT-4o's responses, conducted by medical students, found a mere 37 percent to be accurate. Furthermore, approximately two-thirds of all answers, regardless of their factual correctness, were considered unclear by both expert and non-expert evaluators.
The Dangers of Relying on AI for Health Advice
To illustrate the model's shortcomings, one case presented involved a man with a rash on his wrists and hands. His background included weekend farm work, mortuary science studies, raising homing pigeons, and using a new cost-saving laundry detergent. ChatGPT incorrectly identified the new detergent as the most probable cause of the rash, suggesting a type of skin inflammation. However, the correct diagnosis was an allergic reaction to the latex gloves he used during his mortuary science studies.
"It's very important for people to be aware of the potential for LLMs to misinform," reiterated Zada. His work on this paper was supervised by Dr. Sirisha Rambhatla, an assistant professor of management science and engineering at Waterloo.
Zada further highlighted the risks: "The danger is that people trying to self-diagnose will get reassuring news and dismiss a serious problem, or be told something is very bad when it's really nothing to worry about."
While ChatGPT-4o showed improvement over previous versions and avoided outlandish errors, the researchers concluded that LLMs are not yet sufficiently accurate for reliable medical advice.
Dr. Rambhatla, who also directs the Critical ML Lab at Waterloo, pointed out a specific concern: "Subtle inaccuracies are especially concerning. Obvious mistakes are easy to identify, but nuances are key for accurate diagnosis."
AI Self-Diagnosis: A Growing Trend and Expert Recommendations
The extent of LLM use for medical diagnosis among Canadians remains unknown, but a recent Australian study indicated that one in ten people there has used ChatGPT for such purposes.
"If you use LLMs for self-diagnosis, as we suspect people increasingly do, don't blindly accept the results," Zada advised. "Going to a human healthcare practitioner is still ideal."
The collaborative research team also featured experts in law and psychiatry from the University of Toronto and St. Michael's Hospital in Toronto.
The full study, titled "Medical Misinformation in AI-Assisted Self-Diagnosis: Development of a Method (EvalPrompt) for Analyzing Large Language Models," was published in JMIR Formative Research.
Source: University of Waterloo media relations