The Arms Race To Block AI Web Crawlers
For decades, the internet has been home to bots. Search engines like Google have long used automated "crawlers" to index websites, helping you find what you're looking for. But a new generation of crawlers, designed to gather training data for generative AI, is posing a significant threat to the web's ecosystem. These crawlers threaten the business models of countless websites and raise major privacy concerns. Fortunately, there are emerging ways to stop them from absorbing your content.
Fighting AI with AI: The Rise of Data Poisoning
One of the most innovative strategies is to poison the data. This involves altering your content in a way that makes it difficult for AI models to learn from. Researchers have developed tools such as image filters that add a layer of "noise" to a picture. This noise confuses AI models, while the content still appears normal to human eyes.
To an AI, protected images are drowned out with "noise". (Supplied: CSIRO)
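The basic idea of a small, bounded change to each pixel can be sketched in a few lines. This is only an illustration and not the CSIRO team's method: plain random noise like this does not reliably stop a model from learning, and real protection tools compute carefully optimized perturbations. The function names and the `epsilon` bound are assumptions made for this example, and the image is represented as a simple grid of 0-255 grayscale values.

```python
import random


def add_bounded_noise(pixels, epsilon=4, seed=None):
    """Return a copy of a grayscale image with each pixel shifted by at most epsilon.

    `pixels` is a list of rows, each row a list of ints in the range 0..255.
    NOTE: random noise is a stand-in here; it is NOT an effective poisoning
    technique on its own.
    """
    rng = random.Random(seed)
    noisy = []
    for row in pixels:
        noisy_row = []
        for value in row:
            delta = rng.randint(-epsilon, epsilon)
            # Clamp so the result is still a valid 8-bit pixel value.
            noisy_row.append(min(255, max(0, value + delta)))
        noisy.append(noisy_row)
    return noisy


def max_abs_difference(original, changed):
    """Largest per-pixel change between two images of the same shape."""
    return max(
        abs(a - b)
        for row_a, row_b in zip(original, changed)
        for a, b in zip(row_a, row_b)
    )


if __name__ == "__main__":
    image = [[0, 128, 255], [10, 20, 30]]
    protected = add_bounded_noise(image, epsilon=4, seed=0)
    print("largest pixel change:", max_abs_difference(image, protected))
```

Keeping `epsilon` small is what leaves the picture looking normal to a person; the hard part, which this sketch does not attempt, is choosing the changes so that they actually disrupt training.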
However, Salil Kanhere, a computer scientist at the University of New South Wales, notes that this is an ongoing battle. "The fear with technological solutions is they might work today, but then in a month, they may no longer work," he says. This constant back-and-forth is often described as a "cat-and-mouse game."
An Australian research team from CSIRO is working to gain an edge in this game by creating "provably unlearnable" content. Collaborator Derek Wang explains their algorithm assesses how learnable any piece of content is for any type of AI. This information helps defenders create stronger blocking tools. By applying their algorithm to images, Dr. Wang says they can "guarantee" an image will be impenetrable to about 90 percent of AI attacks. The team presented their work at a conference and has received significant interest from online creators.
"People are very interested in this possible solution for their creative rights."
Dr. Wang says the tool could eventually be embedded into websites for automatic protection. (Supplied: CSIRO)
A Simpler Approach: Just Saying No to Crawlers
There's also a more straightforward method to block crawlers: simply asking them not to access your content. Websites traditionally use a file called robots.txt to give instructions to crawlers. Compliance is voluntary, however, so whether the crawlers obey is another matter. To address this, new standards such as Really Simple Licensing (RSL) are being developed to let sites specify what content AI crawlers may scrape and under what licensing terms.
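As a rough sketch of how a robots.txt request works in practice, the example below defines a small set of rules and checks them with Python's standard `urllib.robotparser` module, much as a well-behaved crawler would before fetching a page. The rules, the `example.com` URL, and the helper names are assumptions chosen for illustration; `GPTBot` and `CCBot` are user-agent tokens published by OpenAI and Common Crawl, but which crawlers a site lists is up to its owner.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: refuse two AI crawlers, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_text):
    """Parse robots.txt content supplied as a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def is_allowed(parser, user_agent, url):
    """True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for agent in ("GPTBot", "CCBot", "Googlebot"):
        allowed = is_allowed(parser, agent, "https://example.com/article")
        print(agent, "allowed" if allowed else "disallowed")
```

Nothing in this check is enforced by the website: it only tells a crawler what the owner wants, and the crawler decides whether to honor it.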
Some major internet companies are taking action on a larger scale. Cloudflare, which provides services for about 20 percent of the web, announced in July that it would block AI crawlers by default for its customers. Will Allen, Cloudflare's vice president of product, says this move gives website owners a choice.
"You as a site owner can decide, 'I'm going to allow them or block them.'"
This allows content creators, such as news sites, to negotiate payment models with AI companies in exchange for access to their work.
Will AI Companies Play by the Rules?
This all depends on AI crawlers acting in good faith. Historically, this hasn't always been the case, with many of the largest AI companies having used vast amounts of copyrighted material without permission. However, Will Allen of Cloudflare believes the bigger players are starting to play fair, noting that well-behaved crawlers self-identify. If a crawler tries to bypass these rules by pretending to be a human user, it risks being flagged as a malicious bot and blocked entirely.
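A block-by-default policy for crawlers that identify themselves could look roughly like the sketch below. It is not Cloudflare's implementation: the token list, function names, and the "allow"/"block" return values are assumptions for illustration, and a real service would combine many more signals, because a User-Agent header is trivial to fake.

```python
# Illustrative tokens only; a real deployment would maintain a much longer list.
AI_CRAWLER_TOKENS = ("GPTBot", "CCBot", "ClaudeBot")


def identify_ai_crawler(user_agent):
    """Return the matching crawler token, or None if the agent does not self-identify."""
    lowered = user_agent.lower()
    for token in AI_CRAWLER_TOKENS:
        if token.lower() in lowered:
            return token
    return None


def decide(user_agent, allowed_crawlers=frozenset()):
    """Block self-identified AI crawlers unless the site owner has allowed them."""
    token = identify_ai_crawler(user_agent)
    if token is None:
        # Not a declared AI crawler. Detecting bots that pose as humans
        # needs other signals and is out of scope for this sketch.
        return "allow"
    return "allow" if token in allowed_crawlers else "block"


if __name__ == "__main__":
    crawler_agent = "Mozilla/5.0 (compatible; GPTBot/1.0)"  # simplified example string
    print(decide(crawler_agent))                                # blocked by default
    print(decide(crawler_agent, allowed_crawlers={"GPTBot"}))   # owner opted in
```

The `allowed_crawlers` set stands in for the site owner's choice: a publisher that has struck a licensing deal with one AI company could allow that crawler while leaving the default block in place for the rest.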
The Future of the Open Internet Hangs in the Balance
The rise of AI-generated summaries has led to significant drops in website page views, and fewer page views mean less advertising income. The resulting loss of ad revenue is forcing many sites to put up paywalls, threatening the fundamental structure of the open web.
"The internet's an amazing, amazing invention and one of the most amazing parts of it is the fact that large parts have been open. That's phenomenal. It is at risk."
Professor Kanhere believes that while Cloudflare's model is a good start, it's not foolproof. If AI companies decide that paying for content isn't worth it, the cat-and-mouse game of technical blocks will likely escalate. However, the hope is that licensing deals, similar to those OpenAI has made with news publishers, will become more common. This could create a sustainable model without having to completely rethink the internet, even as direct traffic to websites continues to decline.

