UNIVERSITY OF WATERLOO
AI's medical diagnostic skills still need a check-up

May 26, 2025

You may want to think twice about using powerful artificial intelligence (AI) programs such as ChatGPT to self-diagnose health problems. 

A team led by researchers at the University of Waterloo found in a simulated study that ChatGPT-4o, the well-known large language model (LLM) created by OpenAI, answered open-ended diagnostic questions incorrectly nearly two-thirds of the time. 

"People should be very cautious," said Troy Zada, a doctoral student at Waterloo. "LLMs continue to improve, but right now there is still a high risk of misinformation." 

The study used almost 100 questions from a multiple-choice medical licensing examination. The questions were modified to be open-ended, resembling the kinds of symptoms and concerns real users might describe to ChatGPT.

Medical students who assessed the responses found just 37 per cent of them were correct. About two-thirds of the answers, whether factually right or wrong, were also deemed unclear by expert and non-expert assessors.

One question involved a man with a rash on his wrists and hands. The man was said to work on a farm every weekend, study mortuary science, raise homing pigeons, and use a new laundry detergent to save money.

ChatGPT incorrectly said the most likely cause of the rash was a type of skin inflammation caused by the new detergent. The correct diagnosis? His rash was caused by the latex gloves the man wore as a mortuary science student. 
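
To make the study's setup concrete, here is a minimal sketch, assuming the OpenAI Python SDK, of how an open-ended question like the one above might be posed to GPT-4o. The prompt wording and the ask_open_ended helper are illustrative assumptions, not the study's actual EvalPrompt protocol or question set.

# Minimal sketch: posing an open-ended diagnostic question to GPT-4o.
# Assumptions: the openai Python package (v1+) is installed and the
# OPENAI_API_KEY environment variable is set; the prompt below is a
# hypothetical rewrite, not a question from the study itself.
from openai import OpenAI

client = OpenAI()

def ask_open_ended(symptom_description: str) -> str:
    """Send a free-text symptom description, the way a real user might."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": symptom_description}],
    )
    return response.choices[0].message.content

question = (
    "I have a rash on my wrists and hands. I work on a farm on weekends, "
    "study mortuary science, raise homing pigeons, and recently switched "
    "to a cheaper laundry detergent. What is the most likely cause?"
)
print(ask_open_ended(question))

As the study found, a free-text answer returned this way still has to be judged for correctness and clarity by human assessors; nothing in the response itself signals whether the diagnosis is right.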

"It's very important for people to be aware of the potential for LLM's to misinform," said Zada, who was supervised by Dr. Sirisha Rambhatla, an assistant professor of management science and engineering at Waterloo, for this paper. 

"The danger is that people trying to self-diagnose will get reassuring news and dismiss a serious problem or be told something is very bad when it's really nothing to worry about." 

Although the model didn't get any questions spectacularly or ridiculously wrong, and it performed significantly better than a previous version of ChatGPT the researchers also tested, the study concluded that LLMs just aren't accurate enough to rely on for any medical advice yet.

"Subtle inaccuracies are especially concerning," added Rambhatla, director of the Critical ML Lab at Waterloo. "Obvious mistakes are easy to identify, but nuances are key for accurate diagnosis."

It is unclear how many Canadians turn to LLMs to help with a medical diagnosis, but a recent study found that one in 10 Australians have used ChatGPT to help diagnose their medical conditions.

"If you use LLMs for self-diagnosis, as we suspect people increasingly do, don't blindly accept the results," Zada said. "Going to a human health-care practitioner is still ideal." 

The study team also included researchers in law and psychiatry at the University of Toronto and St. Michael's Hospital in Toronto.

The study, "Medical Misinformation in AI-Assisted Self-Diagnosis: Development of a Method (EvalPrompt) for Analyzing Large Language Models," appeared in JMIR Formative Research.

For more information

University of Waterloo
200 University Avenue West
Waterloo, Ontario
Canada N2L 3G1
uwaterloo.ca/

