Audio Transcription
Drop in an audio or video file and get a transcript with timestamps. Whisper runs locally in your browser: nothing is uploaded, nothing is logged, and there is no length limit on a desktop.
100% in-browser
No upload
Multilingual
SRT, VTT, JSON
Supported formats: MP3, WAV, M4A, OGG, MP4, WEBM, MOV. All files are processed locally.
How it works
This tool uses OpenAI Whisper, ported to ONNX and run inside your browser via transformers.js. The audio decoder reads any common format, resamples it to 16 kHz mono, and feeds 30-second chunks into the model. The first time you pick a model size, it is downloaded (Tiny is about 80 MB, Base 150 MB, Small 480 MB) and cached locally, so later runs skip the download.
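The preprocessing step described above (mixing down to mono and cutting the 16 kHz signal into 30-second windows) can be sketched roughly as follows. This is an illustrative sketch, not the tool's actual code: the function names `downmixToMono` and `splitIntoChunks` are hypothetical, resampling is assumed to have happened already (in a browser this is typically handled by the Web Audio API), and the chunks here do not overlap, which a real pipeline may do differently.

```typescript
// Target format described in the text: 16 kHz mono, 30-second windows.
const SAMPLE_RATE = 16_000;
const CHUNK_SECONDS = 30;

/** Average all channels into a single mono channel (hypothetical helper). */
function downmixToMono(channels: Float32Array[]): Float32Array {
  if (channels.length === 0) return new Float32Array(0);
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (const channel of channels) {
    for (let i = 0; i < length; i++) mono[i] += channel[i];
  }
  for (let i = 0; i < length; i++) mono[i] /= channels.length;
  return mono;
}

/**
 * Split mono samples into consecutive, non-overlapping windows.
 * The final window may be shorter than chunkSeconds.
 */
function splitIntoChunks(
  samples: Float32Array,
  sampleRate: number = SAMPLE_RATE,
  chunkSeconds: number = CHUNK_SECONDS,
): Float32Array[] {
  const chunkSize = sampleRate * chunkSeconds;
  const chunks: Float32Array[] = [];
  for (let start = 0; start < samples.length; start += chunkSize) {
    // subarray creates a view, so no audio data is copied.
    chunks.push(samples.subarray(start, start + chunkSize));
  }
  return chunks;
}
```

With these defaults, a 70-second clip yields three chunks: two full 30-second windows and one 10-second remainder.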
Which model size to pick
- Tiny: fastest; great for short, clear audio. English works best.
- Base: good balance for most clips, works well in many languages.
- Small: highest accuracy, especially for accented speech and noisy audio. Larger download and slower per minute of audio.
Tips
- Pick the language explicitly when you know it. Auto-detect works but adds a small overhead.
- "Translate to English" uses the same model with the translate task, so the output is always English, even when the source is not.
- Long files: a 30-minute podcast can take a few minutes on a desktop, longer on mobile.
- Switch to the SRT or VTT tab to get subtitles ready for video editors.
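For readers curious how timestamped segments turn into subtitle files, here is a minimal sketch of SRT and VTT serialization. The `Segment` shape (start and end in seconds, plus text) and the function names are assumptions made for illustration; the tool's own JSON output may be structured differently.

```typescript
// Hypothetical segment shape: times in seconds, text as transcribed.
interface Segment {
  start: number;
  end: number;
  text: string;
}

/** Format seconds as HH:MM:SS<sep>mmm. SRT uses "," and WebVTT uses ".". */
function formatTimestamp(seconds: number, separator: "," | "."): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  const pad = (n: number, width: number) => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}${separator}${pad(ms, 3)}`;
}

/** SRT: numbered cues separated by blank lines. */
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) => {
      const range = `${formatTimestamp(seg.start, ",")} --> ${formatTimestamp(seg.end, ",")}`;
      return `${i + 1}\n${range}\n${seg.text.trim()}\n`;
    })
    .join("\n");
}

/** WebVTT: a "WEBVTT" header, then cues separated by blank lines. */
function toVtt(segments: Segment[]): string {
  const cues = segments.map((seg) => {
    const range = `${formatTimestamp(seg.start, ".")} --> ${formatTimestamp(seg.end, ".")}`;
    return `${range}\n${seg.text.trim()}\n`;
  });
  return `WEBVTT\n\n${cues.join("\n")}`;
}
```

The two formats differ mainly in the millisecond separator, the cue numbering in SRT, and the mandatory `WEBVTT` header line, which is why one timestamp helper can serve both.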