Illegal Images Found In Widely Used AI Research Dataset
A significant image dataset, NudeNet, which has been instrumental in developing AI tools for nudity detection, has been found to contain illegal child sexual abuse material (CSAM). The discovery was made by the Canadian Centre for Child Protection (C3P), raising serious legal and ethical questions about the sourcing of data for AI development.
The NudeNet Dataset and Its Widespread Use
The NudeNet dataset is a massive collection of over 700,000 images scraped from the internet, designed to train AI classifiers to automatically identify nudity. Since its release on the research data-sharing platform Academic Torrents in June 2019, it has become a foundational tool in the academic community. According to a C3P announcement, the dataset has been cited or used in more than 250 academic works. A review of just 50 of these projects revealed that 13 directly used the NudeNet data, while 29 relied on the AI model trained with it.
Disturbing Findings Within the Data
C3P's investigation uncovered more than 120 images of identified or known victims of CSAM within the dataset. The findings were deeply troubling, including nearly 70 images that focused on the genital or anal areas of children who appeared to be pre-pubescent. The organization noted that, in some cases, the images depicted sexual or abusive acts involving children and teenagers, such as fellatio or penile-vaginal penetration.
Following a removal notice from C3P, Academic Torrents has taken the dataset offline. Lloyd Richardson, C3P's director of technology, explained that the investigation began after a tip from the public flagged concerns, prompting a closer look into the dataset's contents.
Legal and Ethical Alarms for Researchers
The presence of this material creates a severe predicament for the hundreds of researchers and organizations that downloaded the data. Without actively searching for it, they would have no way of knowing their systems contained CSAM, yet possession of such material is a criminal offense.
Hany Farid, a professor at UC Berkeley and a leading expert on digital image analysis, highlighted the profound ethical breach. “CSAM is illegal and hosting and distributing creates huge liabilities for the creators and researchers,” he stated. “There is also a larger ethical issue here in that the victims in these images have almost certainly not consented to have these images distributed and used in training. Even if the ends are noble, they don’t justify the means in this case.”
A Widespread Problem in AI Data Collection
This incident is not an isolated one. It mirrors the findings of a 2023 study from Stanford University’s Cyber Policy Center, which discovered that LAION-5B, another massive dataset used for AI image generation, also contained CSAM. The LAION-5B dataset was subsequently removed and cleaned before being redistributed.
Lloyd Richardson of C3P emphasized that this is a systemic issue. “Many of the AI models used to support features in applications and research initiatives have been trained on data that has been collected indiscriminately or in ethically questionable ways,” he said. “This lack of due diligence has led to the appearance of known child sexual abuse and exploitation material in these types of datasets, something that is largely preventable.”
A Call for Responsible AI Development
The rush to innovate in the AI space has often overshadowed the critical need for ethical data sourcing and vetting. As Richardson noted, unvetted datasets are promoted and distributed to researchers and companies who may use them for commercial products, with little consideration for the potential harm or exploitation underpinning the data.
“We also can’t forget that many of these images are themselves evidence of child sexual abuse crimes,” he added. “In the rush for innovation, we’re seeing a great deal of collateral damage, but many are simply not acknowledging it — ultimately, I think we have an obligation to develop AI technology in responsible and ethical ways.” The incident with NudeNet serves as another stark reminder of the urgent need for stringent oversight and ethical standards in the collection and use of data for training AI models.

