ChatGPT has received a lot of praise for its ability to produce well-written papers. However, cancer patients are discouraged from turning to the AI-powered chatbot for treatment advice and are instead encouraged to talk to their physicians.

ChatGPT Shows Limited Ability to Recommend Guideline-Concordant Cancer Treatments

The internet is useful for many patients who want to educate themselves on medical matters. With ChatGPT now available to patients, researchers from Brigham and Women's Hospital, a founding member of the Mass General Brigham healthcare system, evaluated how consistently the artificial intelligence chatbot recommends cancer treatments that are in line with National Comprehensive Cancer Network (NCCN) guidelines.

According to the recent study, ChatGPT 3.5 made inappropriate ("non-concordant") recommendations in about one-third of cases, underscoring the importance of being aware of the technology's limitations.

Study author Danielle Bitterman, MD, of the Mass General Brigham Department of Radiation Oncology and the Artificial Intelligence in Medicine (AIM) Program, and her colleagues set out to assess how closely ChatGPT's recommendations matched the NCCN guidelines, which are used by doctors at institutions across the nation. They concentrated on the three most prevalent cancers (breast, prostate, and lung cancer) and asked ChatGPT to suggest a course of treatment for each malignancy based on its severity. By pairing different diagnosis descriptions with four slightly different ways of asking for a treatment plan, the researchers created a total of 104 prompts.
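To illustrate how such a prompt set might be assembled and submitted to a model, the minimal Python sketch below pairs a few diagnosis descriptions with four prompt templates and sends each combination to the OpenAI chat API. This is not the study's actual code; the diagnosis wording, prompt templates, and model name are assumptions for demonstration only.

```python
# Illustrative sketch only -- not the study's code. Diagnosis descriptions,
# prompt wording, and model name are hypothetical stand-ins.
from itertools import product

from openai import OpenAI  # pip install openai

# A few example diagnosis descriptions (the study covered breast, prostate,
# and lung cancer at varying extents of disease).
diagnoses = [
    "localized prostate cancer",
    "locally advanced breast cancer",
    "metastatic non-small cell lung cancer",
]

# Four slightly different ways of asking for a treatment plan.
templates = [
    "What is a recommended treatment for {dx}?",
    "How should {dx} be treated?",
    "What treatment options exist for a patient with {dx}?",
    "Provide a treatment plan for {dx}.",
]

client = OpenAI()  # reads OPENAI_API_KEY from the environment

responses = {}
for dx, template in product(diagnoses, templates):
    prompt = template.format(dx=dx)
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",  # the study used GPT-3.5-turbo-0301
        messages=[{"role": "user", "content": prompt}],
    )
    responses[prompt] = completion.choices[0].message.content
```

In the study itself, the responses gathered this way were then graded against the NCCN guidelines, which is where the concordance figures below come from.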

Almost all responses (98%) included at least one treatment strategy that complied with NCCN recommendations. However, the researchers discovered that 34% of these responses also contained one or more non-concordant suggestions, which were occasionally challenging to spot among otherwise sound advice.

A treatment suggestion was classified as non-concordant if it was only partially correct, such as advising surgery alone for locally advanced breast cancer without mentioning any other forms of therapy. Notably, the graders fully agreed on their scores in only 62% of cases, highlighting both the complexity of the NCCN guidelines and the potential ambiguity of ChatGPT's output.

About 12.5% of the time, ChatGPT generated "hallucinations," meaning treatment recommendations that did not appear in the NCCN guidelines at all. These included suggestions for novel therapies or curative treatments for incurable cancers.

The authors noted that this kind of false information may give patients unrealistic expectations about their care and may adversely affect the doctor-patient relationship.


Study Used GPT-3.5

GPT-3.5-turbo-0301 was one of the largest models available when the study was conducted, and that model class is still used in ChatGPT's open-access version (a newer version, GPT-4, is only accessible with a paid subscription).

GPT-3.5-turbo-0301 was trained on data available up to September 2021, so the researchers also applied the 2021 NCCN guidelines.

The researchers stress that although outcomes may differ if other LLMs or clinical guidelines are used, many LLMs are built in similar ways and therefore share similar abilities and limitations.

According to lead author Shan Chen, MS, of the AIM Program, how consistently LLMs produce logically sound responses remains an open research question, since "hallucinations" are frequently observed. Much as they have done with Google searches, users are likely to ask LLMs for information to educate themselves on health-related matters.

Chen reminded everyone that LLMs are not the same as licensed medical doctors.


Check out more news and information on AI and GPT-3 in Science Times.