How AI Flaws Accidentally Create Art
We were once promised a future of self-driving cars and robotic maids. Instead, we have been met with one of the great surprises of the modern era: while physical tasks remain difficult for machines, artificial intelligence systems are becoming increasingly adept at mimicking our intellect. They can master complex games, analyze vast amounts of text, and even compose poetry.
Even more perplexing to researchers has been their strange and unexpected knack for creativity.
The Paradox of AI Creativity
Diffusion models, the technology powering popular image generators like DALL·E, Imagen, and Stable Diffusion, are fundamentally designed to create perfect copies of the images in their training data. Yet, in practice, they appear to improvise, blending concepts and elements to create something entirely new and coherent.
This is the paradox that has puzzled experts. Giulio Biroli, an AI researcher and physicist, notes, “If they worked perfectly, they should just memorize. But they don’t—they’re actually able to produce new samples.”
To create images, diffusion models employ a process called denoising. Imagine grinding a painting down to dust, then painstakingly piecing the particles back together. The models do this digitally: they convert an image into random pixels (noise) and then learn to reverse the process, reassembling an image from that noise. For years, the question has been: If the models are just reassembling, how does novelty enter the picture? How can you reassemble a pulverized painting and end up with a completely new work of art?
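The denoising loop can be sketched with a toy one-dimensional example. Everything here is an illustrative simplification, not any production model: the "training set" is just two points, the denoiser is the exact posterior mean for that tiny dataset, and the reverse process is a simple deterministic (DDIM-style) update.

```python
import numpy as np

# Toy 1-D "diffusion" sampler. The training set is just two points, -1 and +1.
data = np.array([-1.0, 1.0])

def denoise(x, sigma):
    """Ideal denoiser: posterior mean E[clean | noisy] for this tiny dataset."""
    w = np.exp(-(x - data) ** 2 / (2 * sigma ** 2))
    return np.sum(w * data) / np.sum(w)

rng = np.random.default_rng(0)
sigmas = np.linspace(3.0, 0.01, 100)   # noise schedule, high to low
x = rng.normal(0, sigmas[0])           # start from pure noise

# Deterministic reverse process: repeatedly move toward the denoised
# estimate while shrinking the remaining noise level.
for s_hi, s_lo in zip(sigmas[:-1], sigmas[1:]):
    x0_hat = denoise(x, s_hi)
    x = x0_hat + (s_lo / s_hi) * (x - x0_hat)

print(round(float(x), 3))  # lands near one of the training points, -1 or +1
```

Run from pure noise, the loop reliably lands on (a point near) one of the two training samples, which is exactly the "perfect memorization" behavior the article describes; novelty only appears once the denoiser is constrained, as discussed below.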
From Imperfection to Innovation
A new paper from two physicists makes a startling claim: the creativity of diffusion models stems directly from the technical imperfections in the denoising process itself. In their research, presented at the International Conference on Machine Learning 2025, they use a mathematical model to show that this creativity isn't random magic but a deterministic and inevitable consequence of the AI's architecture.
This work helps illuminate the black box of AI and could have significant implications for future research. “The real strength of the paper is that it makes very accurate predictions of something very nontrivial,” says Luca Ambrogioni, a computer scientist at Radboud University.
A Lesson from Biology: The Bottom-Up Approach
Mason Kamb, a Stanford University graduate student and the paper's lead author, drew inspiration from morphogenesis—the process by which living things self-assemble. In an embryo, cells organize into limbs and organs by following local rules, responding only to their immediate neighbors. This bottom-up system, sometimes described by a Turing pattern (named for mathematician Alan Turing), has no central CEO or master blueprint. Occasionally, this local process goes awry, resulting in anomalies like hands with extra fingers.
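This kind of bottom-up self-assembly can be simulated directly. Below is a minimal Gray-Scott reaction-diffusion sketch, one standard way to produce Turing-like patterns; the grid size, seeding, and parameter values are illustrative choices, not anything from the paper.

```python
import numpy as np

# Gray-Scott reaction-diffusion: every cell updates using ONLY its four
# neighbors, yet global spots and stripes (Turing-like patterns) emerge
# with no master blueprint.
def laplacian(a):
    return (np.roll(a, 1, 0) + np.roll(a, -1, 0) +
            np.roll(a, 1, 1) + np.roll(a, -1, 1) - 4 * a)

n = 80
u = np.ones((n, n))          # "substrate" chemical
v = np.zeros((n, n))         # "activator" chemical
u[35:45, 35:45] = 0.50       # small seed perturbation in the center
v[35:45, 35:45] = 0.25

Du, Dv, f, k = 0.16, 0.08, 0.060, 0.062   # a classic pattern-forming regime
for _ in range(3000):
    uvv = u * v * v
    u += Du * laplacian(u) - uvv + f * (1 - u)
    v += Dv * laplacian(v) + uvv - (f + k) * v

# Structure spreads outward from the seed purely through local interactions.
print(v.max(), v.std())
```

Each cell sees only its immediate neighbors, yet coherent large-scale structure appears, and, as in morphogenesis, small local accidents can propagate into global anomalies.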
When Kamb saw early AI-generated images of humans with extra fingers, it reminded him of morphogenesis. “It smelled like a failure you’d expect from a [bottom-up] system,” he said.
Researchers already knew that diffusion models take two key technical shortcuts. First is locality: they only focus on one small “patch” of pixels at a time. Second is translational equivariance: a rule that ensures if an input is shifted, the output shifts in the same way, preserving the image’s structure. These were long seen as mere limitations, not as the source of a higher-order phenomenon like creativity.
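Both shortcuts are easy to demonstrate concretely. The sketch below uses a toy 3-by-3 box blur (standing in for the convolutions inside real diffusion models, with periodic boundaries as a simplifying assumption): each output pixel depends only on a small patch, and shifting the input shifts the output identically.

```python
import numpy as np

# A 3x3 box blur built only from local neighbor lookups (periodic boundary).
# Locality: each output pixel depends on a 3x3 patch.
# Translational equivariance: the operation commutes with shifts.
def local_blur(img):
    out = np.zeros_like(img, dtype=float)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += np.roll(img, shift=(dy, dx), axis=(0, 1))
    return out / 9.0

rng = np.random.default_rng(1)
img = rng.random((8, 8))

shifted_then_blurred = local_blur(np.roll(img, shift=(2, 3), axis=(0, 1)))
blurred_then_shifted = np.roll(local_blur(img), shift=(2, 3), axis=(0, 1))

# Shifting before or after the local operation gives the same result.
print(np.allclose(shifted_then_blurred, blurred_then_shifted))  # True
```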
Proving the Theory: The ELS Machine
Working in the lab of Stanford physicist Surya Ganguli, Kamb hypothesized that locality and equivariance were the direct causes of creativity. To test this, he and Ganguli built a system to do nothing but optimize for these two principles.
They called their system the equivariant local score (ELS) machine. It is not a trained AI but a set of mathematical equations designed to predict the output of a denoising process based only on the mechanics of locality and equivariance. They fed a series of noisy images to both the ELS machine and several powerful, fully trained diffusion models.
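The spirit of such a machine can be sketched in a few lines. The toy below is my own simplification, not the authors’ published equations: it denoises each pixel using only its local patch, softly matching that patch against every patch in a tiny training set and averaging the matching patches’ center pixels. The patch size, Gaussian weighting, and stripe images are all illustrative choices.

```python
import numpy as np

def extract_patches(img, k=3):
    """All k-by-k patches (periodic boundaries), one flattened row per pixel."""
    r = k // 2
    shifts = [np.roll(img, (-dy, -dx), axis=(0, 1))
              for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
    return np.stack(shifts, axis=-1).reshape(-1, k * k)

def local_denoise(noisy, train_imgs, k=3, sigma=0.5):
    """Predict each pixel from its LOCAL patch alone, by soft-matching
    against every training patch (a purely local, shift-equivariant rule)."""
    train = np.concatenate([extract_patches(t, k) for t in train_imgs])
    centers = train[:, (k * k) // 2]          # clean center pixel of each patch
    out = np.empty(noisy.size)
    for i, p in enumerate(extract_patches(noisy, k)):
        d2 = np.sum((train - p) ** 2, axis=1)
        w = np.exp(-(d2 - d2.min()) / (2 * sigma ** 2))  # stabilized weights
        out[i] = (w @ centers) / w.sum()
    return out.reshape(noisy.shape)

rng = np.random.default_rng(2)
v = np.tile([0.0, 1.0], (8, 4))   # 8x8 vertical stripes
h = v.T.copy()                    # 8x8 horizontal stripes
noisy = v + rng.normal(0, 0.3, v.shape)
clean = local_denoise(noisy, [v, h])
print(clean.shape)
```

Because every prediction is a convex combination of training-patch centers chosen purely locally, such a rule can stitch pieces of different training images into a globally novel output, which is, loosely, the mechanism the paper points to.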
The results, Ganguli said, were “shocking.” The simple ELS machine was able to match the outputs of the complex, trained AI models with an average accuracy of 90 percent—a figure described as “unheard of in machine learning.”
This confirmed Kamb’s hypothesis. The very constraints that force the models to focus on local patches, without any broader context, are what enable them to be creative. The extra-fingers phenomenon is a direct side effect of this hyperfixation on local details.
What This Means for Human and Machine Creativity
For the first time, researchers have demonstrated how the creativity of diffusion models can be understood as a predictable, mathematical by-product of their own limitations. While experts agree this is a major piece of the puzzle, it is not the whole story, as it doesn’t explain creativity in other systems like large language models.
Still, the work could offer insights into our own minds. “Human and AI creativity may not be so different,” suggests Benjamin Hoover, a machine learning researcher who studies diffusion models. “We assemble things based on what we experience, what we’ve dreamed, what we’ve seen, heard, or desire. AI is also just assembling the building blocks from what it’s seen and what it’s asked to do.”
According to this view, creativity—both human and artificial—may be fundamentally rooted in an incomplete understanding of the world. It is in the effort of filling those gaps that we sometimes generate something both new and valuable.