
Why ChatGPT Fails at Summarizing Science

2025-09-21 · Nadeem Sarwar · 2-minute read

Tags: ChatGPT, Artificial Intelligence, Science Communication

The Allure of AI-Powered Simplification

The popular prompt, "Explain it to me like a fifth grader," highlights a key benefit of AI: making complex topics accessible. This capability has been particularly praised in education. When it comes to summarizing dense scientific research, however, experts are sounding the alarm: relying solely on AI tools like ChatGPT could be a mistake.


A Year-Long Scientific Experiment

Scientific research papers are often dense with technical jargon, making them challenging for the general public to understand. Traditionally, science journalists bridge this gap by translating complex findings into accessible language. Now, a team from the Science journal's press office, known as SciPak, has put ChatGPT to the test. In a revealing blog post, they detailed a year-long experiment using ChatGPT Plus to see if the AI could effectively and accurately simplify scientific papers.


The Verdict on ChatGPT's Performance

The results of the year-long evaluation were mixed. The team discovered that summaries from ChatGPT often "sacrifice accuracy for simplicity" and are prone to hyperbole, requiring significant human editing. The AI showed a particular fondness for buzzwords like "groundbreaking," a common trait among AI chatbots that is beginning to influence everyday language.


Accuracy vs. Simplicity: A Dangerous Trade-off

The methodology involved generating three different ChatGPT summaries for two research papers each week, which human writers then reviewed. While not a complete failure, ChatGPT struggled significantly with the critical nuances of scientific communication.

The official white paper from the study concludes that the AI "cannot synthesize, or translate, findings for non-expert audiences." It highlights the chatbot's tendency to overhype results, its inability to properly explain the limitations of the research, and its poor performance when comparing multiple findings.

This lack of nuance is a serious concern. One human evaluator noted that the AI-generated summaries could "break a lot of trust." Abigail Eisenstadt of the AAAS Press Package added that when faced with particularly complex research, ChatGPT often reverted to using the very jargon it was supposed to simplify.
