
ChatGPT Caught Ignoring Website Blocks and Copying Content

2025-07-24 | Jonathan Bailey | 4 minutes read
Tags: AI Ethics, ChatGPT, Content Scraping


For years, I've published a daily "3 Count" column on copyright news, a task that has become more challenging due to the decline of Google News and the rise of paywalls. Facing a slow news night, I decided to test AI's ability to help with research.

First, I tried Google Gemini, which I'm now paying for as a Google Workspace user. I prompted it, “What is the latest copyright news from the past 24 hours?” The results were poor, offering stories that were weeks old.

Next, I gave the same prompt to the free version of ChatGPT. The response was astonishingly worse. While the information was more current, it was also a rehashed version of my own work. ChatGPT had simply rewritten and served back my most recent "3 Count" column.

[Image: ChatGPT's response]

This wasn't just an embarrassing failure for the AI; it pointed to a much more significant problem. ChatGPT should never have been on my site in the first place.

An Ineffective Block and a Worse Answer

Back in August 2023, I wrote an article on how to block ChatGPT and followed OpenAI's own instructions to add GPTBot to my site's robots.txt file. This block, which has been in place for nearly two years, is supposed to prevent OpenAI from using my site's content for training its AI models.

A quick check with a robots.txt tester confirmed that the page in question was, in fact, supposed to be blocked for ChatGPT.

[Image: robots.txt check showing ChatGPT is supposed to be blocked]
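For readers who want to reproduce that kind of check, here is a minimal Python sketch using the standard library's urllib.robotparser. It assumes a robots.txt containing the GPTBot rule from OpenAI's instructions plus a catch-all group that leaves the site open to everyone else. The file contents, the example URL, and the is_allowed helper are hypothetical stand-ins, not a copy of this site's actual robots.txt or of the tester used above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents (an assumption, not the site's real file):
# a group that disallows everything for GPTBot, then a catch-all group
# whose empty Disallow line allows every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder URL; the path is illustrative, not a real post.
url = "https://www.plagiarismtoday.com/example-3-count/"
print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", url))        # GPTBot allowed: False
print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot", url))  # Googlebot allowed: True
```

The same call can be repeated for any user-agent name, which is roughly what an online robots.txt tester reports for a given page.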

Despite this, it's clear that ChatGPT is continuing to train on my site's content. The AI even cited plagiarismtoday.com for each item and provided a link back to my column, ruling out the possibility that it scraped a pirate copy.

[Image: ChatGPT providing attribution]

What's worse is that while ChatGPT presented the same three stories I did and in the same order, it failed to credit the original sources I had carefully cited in my column, such as Deadline, TorrentFreak, and Seeking Alpha. This behavior is a disservice to me as a creator, to the original sources, and to the end user, who is left with incomplete information.

Weak Attribution and Wrong Answers

The issue of AI and attribution is a major concern for publishers, who are seeing traffic decline as AI summarizes their work. While I was surprised that ChatGPT provided four links back to my column, the summary it generated was deeply flawed.

The AI added a “Why it Matters” section to my points, but two of the three explanations it provided were either misleading or completely wrong.

For example, it claimed a trademark was rejected due to "skepticism of IP violations," when in reality, the company had abandoned the trademark. It also incorrectly linked a story about a company's stock rising after a legal settlement to infrastructure issues. In short, ChatGPT's only original contribution was inaccurate information, layered on top of content taken from a site it wasn't supposed to access.

Bottom Line

I recognize the limitations here. This was a single prompt, and some might call it a fluke. Others might argue I should use newer, experimental robots.txt protocols. It's also known that AI models, especially free ones, struggle with up-to-the-minute news.

However, from my perspective, the experiment was a total failure. I tried to use AI to save time, and instead, it plagiarized my work, introduced factual errors, and failed to direct users to the original sources. Everyone would have been better served if ChatGPT had simply provided a link to my column.

This incident leaves me wondering why ChatGPT was on my site at all when a robots.txt block had been in place for almost two years. My next step is to update my robots.txt file with more comprehensive AI-blocking rules, but I remain skeptical that any AI system will truly be prevented from indexing my site if it chooses not to honor the protocol.
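As one possible shape for more comprehensive rules, here is a hedged Python sketch that generates a robots.txt group covering several AI-related user-agent tokens and then checks the result with urllib.robotparser. robots.txt groups are matched by user-agent name, so a rule written for one crawler does not cover another; that is why each token is listed. The token list is an illustrative assumption, neither exhaustive nor the author's actual rules, and build_robots_txt and blocked_agents are hypothetical helper names. Like any robots.txt rule, the output is only a request that compliant crawlers choose to follow.

```python
from urllib.robotparser import RobotFileParser

# Illustrative, non-exhaustive list of AI-related user-agent tokens (an
# assumption). Vendors add and rename crawlers, so check current vendor docs.
AI_USER_AGENTS = [
    "GPTBot",           # OpenAI
    "ChatGPT-User",     # OpenAI
    "OAI-SearchBot",    # OpenAI
    "Google-Extended",  # Google
    "CCBot",            # Common Crawl
    "ClaudeBot",        # Anthropic
    "PerplexityBot",    # Perplexity
]


def build_robots_txt(blocked: list[str]) -> str:
    """Emit one group disallowing everything for the listed agents,
    followed by a catch-all group that allows all other crawlers."""
    lines = [f"User-agent: {agent}" for agent in blocked]
    lines.append("Disallow: /")
    lines.append("")
    lines.append("User-agent: *")
    lines.append("Disallow:")
    return "\n".join(lines) + "\n"


def blocked_agents(robots_txt: str, agents: list[str], url: str) -> list[str]:
    """Return the agents that robots_txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


robots_txt = build_robots_txt(AI_USER_AGENTS)
print(robots_txt)

# Placeholder URL; Googlebot is included to confirm ordinary search
# crawling is left alone by the generated rules.
url = "https://www.plagiarismtoday.com/example-3-count/"
print(blocked_agents(robots_txt, AI_USER_AGENTS + ["Googlebot"], url))
# → ['GPTBot', 'ChatGPT-User', 'OAI-SearchBot', 'Google-Extended', 'CCBot', 'ClaudeBot', 'PerplexityBot']
```

Grouping all the tokens under one Disallow line keeps the file short; separate groups per crawler would work the same way if different agents need different rules.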
