Textual Inversion
A method that learns new text embeddings to represent specific visual concepts for image generation.
Overview
Textual Inversion, introduced by Gal et al. in 2022, teaches a frozen diffusion model to understand new visual concepts by learning new word embeddings. Given a few example images of a concept (typically 3-5 images of a specific style, object, or texture), it optimizes a single new token embedding — a "pseudo-word" — so that prompts containing that token reproduce the concept. The rest of the model, including the text encoder, stays frozen.
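The core optimization can be sketched in a few lines of PyTorch. This is a toy illustration, not the real training loop: the linear layer stands in for the frozen text encoder, the random target stands in for the gradient signal the diffusion denoising loss would provide from the example images, and all sizes are made up. Only the new token's embedding vector receives gradient updates.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, dim = 100, 32  # hypothetical toy sizes

# Embedding table with one extra row appended for the new "<concept>" token.
embeddings = torch.nn.Embedding(vocab_size + 1, dim)
encoder = torch.nn.Linear(dim, dim)  # stand-in for the frozen text encoder

# Freeze everything pretrained; only the new row will be optimized.
embeddings.weight.requires_grad_(False)
for p in encoder.parameters():
    p.requires_grad_(False)
encoder_weight_before = encoder.weight.clone()

new_token_id = vocab_size
concept_vec = embeddings.weight[new_token_id].clone().requires_grad_(True)

# Stand-in target: in the real method this signal comes from the
# diffusion loss on the user's example images.
target = torch.randn(dim)

optimizer = torch.optim.Adam([concept_vec], lr=1e-2)
losses = []
for _ in range(200):
    optimizer.zero_grad()
    out = encoder(concept_vec)      # encode a prompt containing "<concept>"
    loss = F.mse_loss(out, target)  # real method: denoising (noise-prediction) loss
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

After training, the optimized vector is written back into the embedding table, and `"<concept>"` can be used in prompts like any other word; the artifact saved to disk is just this one vector, which is why learned concepts are so small.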
Key Details
Unlike DreamBooth, which fine-tunes the model weights, Textual Inversion modifies only the text embedding space. This makes each learned concept lightweight (a few KB per concept) and composable: multiple learned concepts can be combined in a single prompt. It is widely used for capturing artistic styles, specific textures, and character appearances.