AI Is Flooding Science With Copycat Research Papers

2025-09-24 · Miryam Naddaf · 3 minute read
Artificial Intelligence
Scientific Publishing
Research Integrity

Open data sets and AI tools can be used to mass-produce low-quality, redundant papers. Credit: Tutatama/Alamy

A startling new analysis reveals that text-generating artificial intelligence tools like ChatGPT and Gemini are being used to rewrite existing scientific papers, creating hundreds of 'copycat' versions that are then published as new research. These AI-generated studies are sophisticated enough to evade the standard anti-plagiarism checks used by academic journals.

A recent preprint study identified over 400 such papers published across 112 journals in the last four and a half years, highlighting a growing threat to the integrity of scientific literature.

The Rise of Redundant Research

The study's authors warn that this trend is likely being driven by individuals and even professional paper mills—companies that produce fraudulent papers for a fee. These groups appear to be exploiting large, publicly available health databases and using large language models (LLMs) to mass-produce papers that offer no new scientific value.

To find these copycat papers, researchers screened studies that used data from the US National Health and Nutrition Examination Survey (NHANES), a vast repository of health and lifestyle data. They specifically looked for what they termed 'redundant' studies—research that tested the exact same hypothesis as a previously published paper but used a slightly different slice of the data, such as different survey years or participant demographics.

Their search uncovered 411 redundant studies, with some topics having as many as six nearly identical papers published, sometimes within the same year. "This shouldn’t be happening, and it doesn’t help the health of the scientific literature," stated co-author Matt Spick, a biomedical scientist at the University of Surrey.

Bypassing Plagiarism Detectors with Ease

To confirm their suspicions, the research team conducted an experiment. They prompted ChatGPT and Google's Gemini to rewrite three of the most duplicated articles they found, using the original papers and the NHANES data as source material. Their goal was to create new manuscripts that could pass plagiarism checks.

"We were shocked that it worked straight away," Spick commented. Although the AI-generated drafts contained errors that required about two hours of cleanup per manuscript, the results were alarming: when run through a plagiarism-detection tool commonly used by publishers, the AI-generated papers were not flagged as problematic.

This demonstrates that LLMs can produce derivative work that adds nothing new to the scientific conversation yet successfully appears original to automated checks. As a result, it is increasingly difficult for editors and reviewers to distinguish genuine researchers using public data from those deliberately flooding the system with low-quality, AI-generated papers.

Csaba Szabó, a pharmacologist at the University of Fribourg, warned of the potential consequences, stating, “If left unaddressed, this AI-based approach can be applied to all sorts of open-access databases, generating far more papers than anyone can imagine. This could open up Pandora’s box [and] the literature may be flooded with synthetic papers.”
