ChatGPT Fails To Spot Flawed Scientific Research
A new study reveals a significant blind spot in the popular AI chatbot, ChatGPT. The large language model often fails to identify or flag scientific papers that have been retracted or have had their validity questioned, potentially leading to the spread of inaccurate information.
The Alarming Findings of a New Study
A recent analysis published in Learned Publishing investigated how well the AI tool recognizes problematic scholarly articles. The research examined GPT-4o mini's ability to identify problems in 217 studies that were listed as retracted or flagged for concerns in the Retraction Watch Database.
The results were startling. Researchers tasked the text-oriented version of the AI with evaluating each of the 217 papers 30 times, generating a total of 6,510 reports. Not one of these reports mentioned that the paper in question had been retracted or had any documented validity issues.
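As a rough illustration of what such a repeated-evaluation protocol involves, here is a minimal sketch in Python, assuming the OpenAI client library. The prompt wording, placeholder papers, and closing keyword check are illustrative assumptions, not the study's actual materials.

```python
# Minimal sketch of a repeated-evaluation protocol like the one described
# above. Assumptions: the OpenAI Python client, a hypothetical list of
# (title, abstract) pairs, and an illustrative prompt -- the study's actual
# prompt and scoring rubric are not reproduced here.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

papers = [
    ("Example title", "Example abstract ..."),  # placeholder entries
]

RUNS_PER_PAPER = 30  # the study evaluated each of its 217 papers 30 times

reports = []
for title, abstract in papers:
    for _ in range(RUNS_PER_PAPER):
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": (
                    "Assess the quality of the following study based on "
                    f"its title and abstract.\n\nTitle: {title}\n\n"
                    f"Abstract: {abstract}"
                ),
            }],
        )
        reports.append(response.choices[0].message.content)

# A replication would then scan each report for any mention of a
# retraction or validity concern, e.g. with a simple keyword check.
flagged = [r for r in reports if "retract" in r.lower()]
print(f"{len(flagged)} of {len(reports)} reports mention retraction")
```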
How Widespread Is the Problem?
Instead of flagging the problematic research, ChatGPT often praised it. In 190 instances, the AI described the flawed papers as “world leading,” “internationally excellent,” or close to that standard. Only a small fraction of the papers received low scores for being weak, and just five, among them a study of hydroxychloroquine as a COVID-19 treatment, were labeled as controversial.
In a follow-up test, the study authors took 61 specific claims from the retracted papers and asked the AI to verify each claim's accuracy 10 times. Two-thirds of the time, ChatGPT either affirmed the discredited claims as true or gave a similarly positive response.
Experts Weigh In on the Implications
“We were surprised that, at the time, ChatGPT didn’t deal very well with retractions at all, so it didn’t mention them and reported retracted information as true,” said study coauthor Mike Thelwall, a metascience researcher at the University of Sheffield. He expressed concern that this flaw could have serious consequences, noting, “One of the main ways in which people get information about science nowadays is through large language models.”
Thelwall warns that if researchers use tools like ChatGPT for tasks such as literature reviews, they could easily and unknowingly incorporate retracted articles into their work. He recommends that the algorithms powering these AI chatbots be updated to take retractions seriously.
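One pragmatic safeguard, at least on the researcher's side, is to screen reference lists against the Retraction Watch data directly. Below is a minimal sketch, assuming a local CSV export of the database; the file name and column names (OriginalPaperDOI, RetractionNature, Reason) are assumptions about the export format rather than a documented schema.

```python
# Minimal sketch of a retraction check against a local copy of the
# Retraction Watch Database, exported as CSV. The file name and the
# column names used below are assumptions, not a documented schema.
import csv

def load_retraction_index(path):
    """Map lowercased DOIs of retracted/flagged papers to their records."""
    index = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            doi = row.get("OriginalPaperDOI", "").strip().lower()
            if doi:
                index[doi] = row
    return index

def check_citations(dois, index):
    """Warn about any DOI in a reference list that appears in the index."""
    for doi in dois:
        record = index.get(doi.strip().lower())
        if record:
            print(f"WARNING: {doi} is flagged "
                  f"({record.get('RetractionNature', 'unknown')}): "
                  f"{record.get('Reason', 'no reason recorded')}")

if __name__ == "__main__":
    index = load_retraction_index("retraction_watch.csv")  # hypothetical path
    check_citations(["10.1000/example.doi"], index)        # placeholder DOI
```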
Debora Weber-Wulff, a computer scientist at the HTW Berlin University of Applied Sciences, stated she was not surprised by the findings. “People are relying too much on these text-extruding machines, and that will corrupt the scientific record,” she warned.
A Systemic Challenge for AI and Humans
However, Weber-Wulff also raised questions about the study's methodology, noting that it lacked a control group of non-retracted papers that would show whether the AI treats those any differently. She also highlighted a broader issue: retractions are often not clearly marked in the scientific literature, making them difficult for anyone to identify.
“They are only using the title and the abstract for the evaluations and are assuming that there is some sort of method to determine if a paper is retracted that ChatGPT can apply,” she explained. “The problem is that HUMANS have a very difficult time determining if a paper or a dissertation has been retracted because of the reluctance of journals and universities to properly mark them!”