Health

AI chatbots wreak havoc with medical misinfo

The chatbots consistently expanded upon the false medical detail.

A new study by researchers at the Icahn School of Medicine at Mount Sinai finds that widely used AI chatbots readily repeat and elaborate on false medical information, underscoring the need for robust safeguards before these tools can be reliably integrated into health care settings.

The researchers also showed that adding a simple one-line warning to the prompt can substantially reduce this risk, offering a practical safeguard as the technology continues to advance rapidly.

With an increasing number of clinicians and patients relying on AI for assistance, the team sought to determine whether chatbots would uncritically reproduce incorrect medical details embedded within a user’s query and whether a brief prompt could guide them toward delivering safer, more accurate responses.

“What we observed consistently is that AI chatbots can be easily deceived by false medical information, whether those inaccuracies are deliberate or unintentional,” says lead author Mahmud Omar, MD, an independent consultant collaborating with the research team.

“Not only did they repeat the misinformation, but they frequently elaborated on it, providing confident explanations for entirely fabricated conditions. The promising finding is that a simple, one-line warning included in the prompt significantly reduced these errors, demonstrating that modest safeguards can yield substantial improvements.”

The researchers created fictional patient scenarios, each containing a single fabricated medical term, such as an invented disease, symptom, or diagnostic test, and submitted them to leading large language models. In the first phase, the chatbots processed the scenarios without any additional guidance. In the second phase, the team added a one-line caution to the prompt, alerting the AI that the information provided might be inaccurate.
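
To make the setup concrete, the sketch below shows what the two phases might look like in code. It is only an illustration under assumptions: the vignette, the invented test name, the wording of the warning, and the `ask_chatbot()` helper are placeholders, not the study's actual materials or interface.

```python
# Minimal sketch of the study's two prompt conditions, not the team's actual code.
# The "glioquartan screen" term, the vignette, and the warning wording are made-up
# placeholders; ask_chatbot() stands in for whichever chat API is being evaluated.

def ask_chatbot(prompt: str) -> str:
    """Stand-in for a real chat-completion call; replace with the model under test."""
    return f"[model response to: {prompt[:60]}...]"

# A fictional patient vignette seeded with a single fabricated term
# (here, an invented diagnostic test).
scenario = (
    "A 45-year-old patient had a positive glioquartan screen last week. "
    "What follow-up treatment should be started?"
)

# The one-line caution added in the second phase of the experiment.
warning = (
    "Note: some details in the question below may be inaccurate or fabricated; "
    "flag anything you cannot verify rather than explaining it.\n\n"
)

baseline_answer = ask_chatbot(scenario)             # phase 1: no guidance
mitigated_answer = ask_chatbot(warning + scenario)  # phase 2: caution prepended

print(baseline_answer)
print(mitigated_answer)
```

Comparing the two responses, whether the model confidently explains the invented term or flags it as unverifiable, is what the evaluation scores across the two phases.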

In the absence of the warning, the chatbots consistently expanded upon the false medical detail, confidently generating explanations for nonexistent conditions or treatments. However, when the cautionary prompt was included, these errors were significantly reduced.

“Our objective was to assess whether a chatbot would propagate false information if it was embedded in a medical query, and the answer is unequivocally yes,” says co-corresponding senior author Eyal Klang, MD, Chief of Generative AI in the Windreich Department of Artificial Intelligence and Human Health at the Icahn School of Medicine at Mount Sinai.

“A single fabricated term could trigger a detailed, authoritative response based entirely on fiction. Yet, we also discovered that a simple, well-placed safety reminder within the prompt made a significant impact, reducing those errors by nearly half. This suggests that these tools can be made safer, provided we prioritize prompt design and implement robust safeguards.”

The research team plans to extend this methodology to real, de-identified patient records and explore more sophisticated safety prompts and retrieval tools. They aim for their “fake-term” approach to serve as a straightforward yet effective method for hospitals, technology developers, and regulators to rigorously evaluate AI systems prior to their use in clinical settings.

  • Press release from The Mount Sinai Hospital / Mount Sinai School of Medicine