
ChatGPT Falls Short On Specific Hematology Cancer Queries

2025-10-16 · Chris Ryan · 3 minutes read

Tags: Artificial Intelligence · Healthcare · ChatGPT

A study published in Future Science OA found that ChatGPT 3.5 often struggles to provide current and accurate information for patient-specific queries and questions about novel therapies in hematologic malignancies, performing better only on more general inquiries.

To evaluate the AI, researchers had hematology-oncology experts rate ChatGPT's responses to 10 different questions on a scale of 1 (strongly disagree) to 5 (strongly agree). The questions were split between five general queries and five highly specific questions related to new treatments and mutations.

Key Findings: General vs. Specific Queries

The study's findings highlighted a clear performance gap:

  • General questions received a higher average score of 3.38 from the clinician reviewers.
  • Highly specific questions scored lower, with an average of 3.06.

Notably, none of the 10 questions received a perfect score of 5, meaning no answer was considered fully accurate, clear, and comprehensive enough for a physician to recommend directly to a patient. The question "How can I lower my measurable residual disease?" scored the lowest at 2.25.

"Due to the fact that AI is malleable and these studies have shown that AI does not present 100% accurate or updated information needed to effectively and safely educate patients, a physician will always be needed, at least at this time, to approve AI information," wrote lead author Tiffany Nong and her colleagues.

Why Was ChatGPT Version 3.5 Selected for this Study?

The research was conducted in July 2024, when ChatGPT 3.5 was the freely available version. The study's authors noted that the model's knowledge cutoff date of September 2021 was a likely contributor to its shortcomings. This outdated information base means it lacks knowledge of recent therapeutic breakthroughs, such as certain FLT3 inhibitors.

"Machine learning models rely on training data, and when ChatGPT only has a small number of sources available, it may pull information from less reliable sources," the authors explained.

How Was this Study of ChatGPT Conducted?

Researchers formulated questions that mirrored the evolving needs of patients throughout their treatment journey. The questions were created with input from a hematology oncologist and based on information from trusted sources like the National Cancer Institute and the American Cancer Society.

To ensure consistency and avoid bias, four reviewers—all hematology oncologists specializing in leukemias—submitted the questions in separate, private chat sessions.

What Were the Limitations of the Study?

The authors acknowledged that their 10 questions represented a small sample of potential patient inquiries. The results are also limited to ChatGPT version 3.5 and cannot be generalized to other AI chatbots or newer versions. Furthermore, the answers were generated at a single point in time, which doesn't capture how AI models evolve with ongoing training.

Despite these limitations, the authors concluded that while chatbots could become useful for triaging routine patient questions, their use requires strict oversight. "Successful implementation requires protocols for physicians to effectively vet and approve chatbot-generated responses before they reach the patient," they stated.

References

  1. Nong T, Britton S, Bhanderi V, Taylor J. ChatGPT’s role in the rapidly evolving hematologic cancer landscape. Future Sci OA. 2025;11(1):2546259. doi:10.1080/20565623.2025.2546259
  2. Introducing GPT-5. OpenAI. August 7, 2025. Accessed October 15, 2025. https://openai.com/index/introducing-gpt-5/