Question 1

Does VBR encoding cause timing problems?

Accepted Answer

It can with naive decoders, but not with ours. In a variable bitrate (VBR) MP3, byte position is not proportional to playback time, so a decoder that seeks by byte offset lands at the wrong moment. We parse the Xing or VBRI header to build a seek table, and fall back to a full frame scan when those headers are missing. Timestamps are therefore accurate regardless of encoding mode.
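The frame-scan fallback can be sketched as follows. This is a minimal illustration, not the decoder described above: it assumes MPEG-1 Layer III frames only, a constant sample rate, and input with any metadata already stripped. The names `parse_frame_header` and `build_seek_table` are hypothetical.

```python
# Sketch only: handles MPEG-1 Layer III; MPEG-2/2.5 and free-format frames are skipped.
MPEG1_L3_BITRATES_KBPS = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]
MPEG1_SAMPLE_RATES_HZ = [44100, 48000, 32000]
SAMPLES_PER_FRAME = 1152  # fixed for MPEG-1 Layer III


def parse_frame_header(data, pos):
    """Return (frame_length_bytes, sample_rate_hz), or None if no valid header at pos."""
    if pos + 4 > len(data):
        return None
    b0, b1, b2 = data[pos], data[pos + 1], data[pos + 2]
    if b0 != 0xFF or (b1 & 0xE0) != 0xE0:  # 11-bit frame sync
        return None
    version = (b1 >> 3) & 0x03  # 3 means MPEG-1
    layer = (b1 >> 1) & 0x03    # 1 means Layer III
    if version != 3 or layer != 1:
        return None
    bitrate_index = b2 >> 4
    rate_index = (b2 >> 2) & 0x03
    if bitrate_index in (0, 15) or rate_index == 3:  # free-format or reserved values
        return None
    padding = (b2 >> 1) & 0x01
    bitrate_bps = MPEG1_L3_BITRATES_KBPS[bitrate_index] * 1000
    sample_rate = MPEG1_SAMPLE_RATES_HZ[rate_index]
    return 144 * bitrate_bps // sample_rate + padding, sample_rate


def build_seek_table(data):
    """Walk every frame and return a list of (start_time_seconds, byte_offset)."""
    table = []
    pos = 0
    time_s = 0.0
    while pos < len(data):
        parsed = parse_frame_header(data, pos)
        if parsed is None:
            pos += 1  # not a frame boundary, so resync one byte at a time
            continue
        frame_length, sample_rate = parsed
        table.append((time_s, pos))
        time_s += SAMPLES_PER_FRAME / sample_rate  # every frame lasts the same time
        pos += frame_length                        # but frame size varies with bitrate
    return table
```

Each frame covers the same amount of audio while its byte length changes with the bitrate, which is why the table has to be built frame by frame rather than computed from a single ratio.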

Question 2

What's the minimum bitrate that works?

Accepted Answer

We reliably transcribe MP3 files down to 64 kbps for clear speech. At 32 kbps the quality degrades significantly (speech becomes muffled and sibilants disappear), but we can still extract usable text from reasonably clean recordings at that rate. For best results, we recommend 96 kbps or higher.

Question 3

Does mono vs stereo make a difference for transcription?

Accepted Answer

For a single speaker, no. Mono and stereo produce equivalent results. Stereo helps when different speakers are panned to different channels: our engine processes both channels and can use the spatial separation as an additional signal for speaker diarization.

Question 4

How do you handle compression artifacts in low-bitrate MP3?

Accepted Answer

MP3 compression introduces predictable artifacts: pre-echo before transients, a low-pass cutoff that removes frequencies above roughly 10 to 16 kHz depending on bitrate, and stereo imaging issues in joint stereo mode. Our speech model is trained on audio with these specific degradation patterns, so it doesn't mistake artifacts for speech sounds.

Question 5

Should I convert my MP3 to WAV before uploading?

Accepted Answer

No. Converting MP3 to WAV only decodes the already-degraded audio and stores it uncompressed in a much larger file. The information discarded during MP3 encoding can't be recovered. Upload the MP3 directly: it is smaller and produces identical results.
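To put rough numbers on the size difference, here is a small arithmetic sketch. It assumes uncompressed PCM WAV with the canonical 44-byte header and ignores MP3 metadata and frame padding; both function names are hypothetical.

```python
def mp3_size_bytes(bitrate_kbps, duration_s):
    """Approximate MP3 audio size: bitrate (bits per second) times duration, in bytes."""
    return bitrate_kbps * 1000 * duration_s // 8


def wav_size_bytes(sample_rate_hz, channels, bits_per_sample, duration_s):
    """PCM WAV size: one sample per channel per tick, plus an assumed 44-byte header."""
    return 44 + sample_rate_hz * channels * (bits_per_sample // 8) * duration_s
```

For a one-hour recording, `mp3_size_bytes(128, 3600)` comes to about 57.6 MB, while `wav_size_bytes(44100, 2, 16, 3600)` comes to about 635 MB, roughly eleven times larger with no additional audio information.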

Question 6

Do ID3 tags or album art cause problems?

Accepted Answer

No. MP3 files often contain ID3v1 tags at the end and ID3v2 tags at the beginning, sometimes with large embedded album art. Our decoder identifies and skips these metadata blocks before processing, so they never interfere with the audio or timestamps.
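The skipping step can be illustrated with a short sketch based on the published ID3 layouts: an ID3v2 tag starts with "ID3" and stores its size as four 7-bit "syncsafe" bytes, and an ID3v1 tag is a fixed 128-byte block starting with "TAG" at the end of the file. The function names are hypothetical, and this is not necessarily how the decoder described above is written.

```python
def syncsafe_to_int(four_bytes):
    """Decode an ID3v2 syncsafe integer: four bytes carrying 7 bits each."""
    value = 0
    for b in four_bytes:
        value = (value << 7) | (b & 0x7F)
    return value


def audio_span(data: bytes) -> tuple[int, int]:
    """Return (start, end) byte offsets of the audio, excluding ID3v2 and ID3v1 tags."""
    start, end = 0, len(data)

    # ID3v2: 10-byte header at the very beginning of the file.
    if len(data) >= 10 and data[:3] == b"ID3":
        size_bytes = data[6:10]
        if all(b < 0x80 for b in size_bytes):       # syncsafe bytes never set the top bit
            footer = 10 if data[5] & 0x10 else 0    # optional footer flag (ID3v2.4)
            start = min(10 + syncsafe_to_int(size_bytes) + footer, len(data))

    # ID3v1: fixed 128-byte block at the very end of the file.
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128

    return start, end
```

Because embedded album art lives inside the ID3v2 tag, its size is already included in the declared tag length, so a large cover image is skipped in the same step.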

Question 7

Why does my MP3 show a different duration in different players?

Accepted Answer

This is a common VBR issue. Some players estimate duration from file size assuming constant bitrate, which gives wrong results for VBR files. Our decoder reads the actual frame count from the Xing or VBRI header (or scans all frames when neither is present), so the duration and timestamps we report are accurate regardless of what your media player shows.
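The gap between the two approaches is easy to see with a little arithmetic. The sketch below assumes MPEG-1 Layer III (1152 samples per frame); the function names and the example figures are hypothetical.

```python
def duration_from_frame_count(frame_count, sample_rate_hz, samples_per_frame=1152):
    """Exact duration: every frame holds the same number of samples."""
    return frame_count * samples_per_frame / sample_rate_hz


def duration_assuming_cbr(audio_bytes, bitrate_kbps):
    """Naive estimate: treat the first frame's bitrate as constant for the whole file."""
    return audio_bytes * 8 / (bitrate_kbps * 1000)
```

Take a made-up VBR file of 2,500,000 bytes with 10,000 frames at 44.1 kHz whose first frame happens to be 128 kbps. The frame count gives about 261 seconds, while the constant-bitrate estimate gives about 156 seconds, an error of well over a minute.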

Transcribe any MP3, from 64 kbps voice memos to 320 kbps podcasts

MP3 transcription that understands MP3 encoding

How it works

Upload your MP3 file

Decoding and transcription

Review and export

Features

VBR timestamp accuracy

Low-bitrate artifact tolerance

Mono and stereo channel handling

ID3 tag and metadata handling

Podcast chapter awareness

Why choose Vocova

Turn podcast episodes into written content

Transcribe compressed interview recordings

Process audio downloaded from the web

Archive voice recorder files as text

Who can benefit

Podcast producers

Journalists with field recordings

Researchers doing qualitative analysis

Audio archivists
