Sora
OpenAI's text-to-video generation model that creates photorealistic videos from text descriptions, demonstrating emergent understanding of physics and 3D consistency.
Capabilities
Generates up to 60-second videos with realistic motion, lighting, and physics. Can extend existing videos, fill in missing frames, and generate from still images. Demonstrates an implicit understanding of 3D space and object permanence.
Significance
Sora showed that scaling diffusion transformers to video data produces emergent physical understanding. It represents a major step toward AI world models that understand how the physical world works.