Whisper
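OpenAI's open-source automatic speech recognition (ASR) model. It transcribes speech in nearly 100 languages and translates speech from those languages into English, with accuracy on English benchmarks that approaches that of human transcribers.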
Architecture
Whisper uses a transformer encoder-decoder trained on 680,000 hours of multilingual audio-text pairs collected from the internet. A single model handles transcription, translation into English, language identification, and voice activity detection; special tokens fed to the decoder select which task it performs.
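As a rough illustration of how one model can cover several tasks, the sketch below assembles the sequence of special tokens that conditions the decoder. The token strings follow the format described in the Whisper paper; the function name build_decoder_prompt and its parameters are hypothetical and are not part of any Whisper library. The sketch only builds strings and does not run the model.

```python
from typing import List, Optional


def build_decoder_prompt(
    language: Optional[str],
    task: str = "transcribe",
    timestamps: bool = False,
) -> List[str]:
    """Hypothetical helper: return the special tokens that start a decode.

    language:   a language code such as "en" or "fr", or None when the
                language is unknown and should be predicted by the model.
    task:       "transcribe" (same-language text) or "translate" (English text).
    timestamps: when False, append the token that disables timestamp output.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unknown task: {task!r}")

    tokens = ["<|startoftranscript|>"]

    # With no language given, the prompt stops here. The next token the model
    # predicts is then a language token (language identification) or
    # <|nospeech|> (voice activity detection).
    if language is None:
        return tokens

    tokens.append(f"<|{language}|>")
    tokens.append(f"<|{task}|>")
    if not timestamps:
        tokens.append("<|notimestamps|>")
    return tokens


if __name__ == "__main__":
    print(build_decoder_prompt("en"))
    print(build_decoder_prompt("fr", task="translate"))
    print(build_decoder_prompt(None))
```

Under these assumptions, switching from transcription to translation changes a single token in the prompt rather than the model itself, which is what lets one set of weights serve all four tasks.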
Impact
Released in September 2022 with its code and model weights under the MIT license, Whisper became the foundation for many speech applications. Its multilingual coverage and robustness to background noise made high-quality speech recognition freely available to developers and researchers.