X Bookmarks — 2023 KW46: Distil-Whisper drops Whisper latency by 6x

November 16, 2023


by Florian Narr


@LiorOnAI — distil-whisper cuts Whisper latency by 6x

A team just made OpenAI Whisper 6x faster and 49% smaller while keeping 99% of the accuracy.

The model is already available through the Hugging Face Transformers library:

model_id = "distil-whisper/distil-large-v2"
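A minimal sketch of how that model ID could be used with the Transformers `pipeline` API; the function name and audio path are illustrative, and running it requires `transformers` and `torch` to be installed (plus a model download on first use):

```python
MODEL_ID = "distil-whisper/distil-large-v2"

def transcribe(audio_path: str) -> str:
    """Transcribe an audio file with Distil-Whisper via the Transformers
    automatic-speech-recognition pipeline (a sketch, not the official recipe)."""
    # Deferred import so this module loads even without transformers installed.
    from transformers import pipeline  # pip install transformers torch

    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    result = asr(audio_path)  # accepts a local file path or a URL
    return result["text"]
```

Calling `transcribe("meeting.mp3")` would then return the transcript as a plain string.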

You can also use their web UI to transcribe from URLs, files, or microphone.

That's a meaningful compression result: a 49% size reduction with roughly 1% accuracy loss is the kind of tradeoff that makes a model actually deployable in latency-sensitive contexts where the full large-v2 would be too slow. A 6x speedup on the same hardware changes what you can build. Saving this to run against our transcription pipeline.