What is a Diffusion Model?
It's the technology behind stunning AI art generators like Midjourney and Stable Diffusion. But how does it create a masterpiece from a simple text prompt? The core idea is surprisingly simple: during training, images are gradually buried in random noise, and the model learns to reverse that process.
Step 1: The Block of Marble (Pure Noise)
A diffusion model doesn't start with a blank canvas. It starts with a canvas filled with pure, random static, like an old TV screen. This is its "block of marble"—a space of pure potential, containing no information at all.
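As a rough illustration of that starting point, the sketch below fills a small grid with independent Gaussian noise using only Python's standard library. The function name `sample_noise`, the 64 x 64 size, and the single-channel grid are assumptions made for the example; real systems typically sample a multi-channel tensor, often in a compressed latent space.

```python
import random

def sample_noise(height, width, seed=None):
    """Return a height x width grid of independent standard-normal values."""
    rng = random.Random(seed)  # a private generator keeps the result reproducible
    return [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]

# The "block of marble": every value is random, so no picture is encoded yet.
canvas = sample_noise(64, 64, seed=0)
```

Fixing the seed is only a convenience for reproducibility here; with a different seed the same prompt would usually lead to a different final image.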
Step 2: The Sculptor's Vision (The Prompt)
Next, we give the model its instructions. The text prompt, like "a photorealistic cat in a library," acts as the sculptor's vision. A text encoder converts the prompt into a list of numbers (an embedding) that the model consults as a guide at every denoising step.
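To make "a mathematical representation" concrete, here is a deliberately crude sketch that turns a prompt into a fixed-length vector. It is a toy stand-in, not how production models work: they use a learned neural text encoder, whereas the hypothetical `toy_embed` below just hashes each word into one of 16 buckets and normalizes the counts.

```python
import hashlib
import math

def toy_embed(prompt, dim=16):
    """Hypothetical stand-in for a learned text encoder: a hashed bag of words."""
    vec = [0.0] * dim
    for word in prompt.lower().split():
        # A stable hash, so the same word lands in the same bucket on every run.
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    # Scale to unit length; an empty prompt stays all zeros.
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

guide = toy_embed("a photorealistic cat in a library")
```

The only point carried over to real systems is the shape of the idea: text goes in, a vector of numbers comes out, and that vector is what the image model actually sees.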
Step 3: The First Chisel (Denoising)
The core process begins. In a series of steps, the AI looks at the noisy image and makes a prediction: "What noise can I remove to make this look slightly more like 'a cat in a library'?" It removes a small amount of noise, revealing a very blurry, abstract shape.
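A single "chisel stroke" can be sketched as follows. This is a minimal example of one deterministic DDIM-style update, which is one common sampling rule and not necessarily the one any particular product uses. The names `ddim_step`, `abar_t`, and `abar_prev` are assumptions for illustration, and the true noise is passed in where a trained, prompt-conditioned network would normally supply its prediction.

```python
import math

def ddim_step(x_t, eps_hat, abar_t, abar_prev):
    """One deterministic DDIM-style step from signal level abar_t to abar_prev.

    x_t       : noisy pixel values (flat list of floats)
    eps_hat   : the model's guess of the noise contained in x_t
    abar_t    : current signal level, 0 < abar_t <= 1 (1.0 means no noise)
    abar_prev : the slightly higher signal level this step moves to
    """
    out = []
    for x, e in zip(x_t, eps_hat):
        # Estimate the clean pixel implied by the noise guess.
        x0_hat = (x - math.sqrt(1.0 - abar_t) * e) / math.sqrt(abar_t)
        # Re-mix that estimate with less noise than before.
        out.append(math.sqrt(abar_prev) * x0_hat + math.sqrt(1.0 - abar_prev) * e)
    return out

# Demo with made-up numbers: build a noisy input, then take one step.
clean = [1.0, -1.0, 0.5]
noise = [0.3, -0.2, 0.1]
abar_t, abar_prev = 0.5, 0.9
x_t = [math.sqrt(abar_t) * c + math.sqrt(1.0 - abar_t) * n for c, n in zip(clean, noise)]
x_prev = ddim_step(x_t, noise, abar_t, abar_prev)  # true noise stands in for the network
```

After the step, `x_prev` contains the same underlying image mixed with less noise, which is the "slightly less blurry" result the text describes.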
Step 4: The Refinement Loop (Iteration)
This "denoising" step is repeated many times: typically a few dozen steps with modern samplers, and up to a thousand in early diffusion models. With each iteration, the AI refines the previous output, removing more noise and adding detail that more closely matches the prompt. The abstract blur begins to take on recognizable forms.
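The loop itself might look like the sketch below. It repeats a deterministic DDIM-style update over a rising schedule of signal levels. Everything named here is an assumption for illustration: `make_schedule` uses a simple linear schedule, `num_steps=50` is an arbitrary choice, and `oracle_noise` cheats by computing the noise from a known target image, standing in for the trained, prompt-guided network that a real system would call.

```python
import math
import random

def make_schedule(num_steps, abar_start=0.01):
    """Signal levels rising from mostly noise (abar_start) to fully clean (1.0)."""
    rising = [abar_start + (1.0 - abar_start) * i / num_steps for i in range(num_steps)]
    return rising + [1.0]  # end at exactly 1.0 so the final step leaves no noise

def oracle_noise(x, abar, target):
    """Stand-in for a trained network: derives the noise from a known target."""
    return [(xi - math.sqrt(abar) * ti) / math.sqrt(1.0 - abar) for xi, ti in zip(x, target)]

def denoise(x, target, num_steps=50):
    """Run the refinement loop, one small denoising step per schedule interval."""
    abars = make_schedule(num_steps)
    for abar_t, abar_prev in zip(abars[:-1], abars[1:]):
        eps = oracle_noise(x, abar_t, target)
        x = [
            # Clean-image estimate, re-mixed with a smaller share of noise.
            math.sqrt(abar_prev) * (xi - math.sqrt(1.0 - abar_t) * e) / math.sqrt(abar_t)
            + math.sqrt(1.0 - abar_prev) * e
            for xi, e in zip(x, eps)
        ]
    return x

rng = random.Random(0)
target = [1.0, -1.0, 0.5, 0.0]                 # tiny made-up "image"
start = [rng.gauss(0.0, 1.0) for _ in target]  # pure noise
result = denoise(start, target, num_steps=50)
```

Because the oracle is perfect, the loop lands on the target exactly. A real network only approximates the noise, which is why its output sharpens gradually over the steps instead of snapping into place.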
Step 5: The Masterpiece Emerges
In the final steps, the noise is almost completely gone. The AI adds the last intricate details—the texture of the fur, the pages of the books, the glint in the cat's eyes—resulting in a coherent and often stunning image that fulfills the prompt.