Question 1

Does the video codec affect transcription quality?

Accepted Answer

No. Transcription works on the audio track only. We never decode the video stream. A file encoded with H.264, H.265, VP9, or AV1 will produce identical transcripts if the audio track is the same. Video resolution, frame rate, and codec choice are completely irrelevant.

Question 2

My MP4 has multiple audio tracks. Which one gets transcribed?

Accepted Answer

We transcribe the primary (first) audio track in the MP4 container. This is typically the default track that plays when you open the file in a media player. If you need a different track transcribed, use a tool like FFmpeg to extract the specific track as a separate file and upload that.
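As a rough sketch of that extraction step, the Python helper below builds an FFmpeg command that copies a single audio track into its own file. The function name 'build_track_extract_cmd' and the file names are hypothetical, chosen only for illustration. FFmpeg's '-map 0:a:N' selects the N-th audio stream (zero-based) of the first input, and '-c:a copy' keeps the audio as-is without re-encoding. The sketch assumes FFmpeg is installed and that the chosen track is AAC, so an .m4a output is valid.

```python
import shlex


def build_track_extract_cmd(src, track_index, dst):
    """Build an FFmpeg argv list that copies one audio track out of src.

    track_index is zero-based: 0 is the first audio track, 1 the second.
    """
    if track_index < 0:
        raise ValueError("track_index must be >= 0")
    return [
        "ffmpeg",
        "-i", src,
        "-map", f"0:a:{track_index}",  # select the N-th audio stream of input 0
        "-c:a", "copy",                # copy the audio stream without re-encoding
        dst,
    ]


# Hypothetical file names: pull the second audio track out of a recording.
cmd = build_track_extract_cmd("meeting.mp4", 1, "second_track.m4a")
print(shlex.join(cmd))
```

The printed command can be pasted into a terminal, or the list can be passed to 'subprocess.run(cmd, check=True)'. The resulting file contains only the selected track, so it becomes the first (and only) audio track when uploaded.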

Question 3

Can I transcribe a screen recording with system audio?

Accepted Answer

Yes. Screen recordings often mix microphone input with system audio (music, notifications, UI sounds). Our speech model focuses on human voice frequencies and ignores system sounds. For best results, make sure the microphone level is higher than the system audio level in your recording.

Question 4

Why do Zoom MP4 recordings transcribe worse than DSLR footage?

Accepted Answer

Zoom compresses audio significantly — local recordings use a low-bitrate AAC encode, and cloud recordings compress even further. Combined with the already-lossy WebRTC audio from the call itself, this creates double compression. DSLR cameras capture audio directly from high-quality microphones without network compression. Our model handles Zoom-quality audio, but the gap in source quality is real.

Question 5

My MP4 file is several gigabytes. Will it work?

Accepted Answer

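The upload limit is 500 MB, so a multi-gigabyte file cannot be uploaded as-is. Large MP4 files are usually large because of high-resolution video, not because of audio. Since we only use the audio track, you can extract just the audio with FFmpeg ('ffmpeg -i video.mp4 -vn -acodec copy audio.m4a') and upload that much smaller file with identical transcription results. The '-acodec copy' option copies the audio stream without re-encoding, so the command assumes the track is AAC, which is the usual case for MP4.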
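A minimal Python sketch of that decision: check the file size against the limit and, if the file is too large, build the audio-only FFmpeg command quoted above. All names here ('UPLOAD_LIMIT_BYTES', 'needs_audio_extraction', 'build_audio_only_cmd', 'plan_upload') are hypothetical, and reading 500 MB as 500 * 1024 * 1024 bytes is an assumption, since the exact byte count is not stated.

```python
import os

# Assumption: "500 MB" is read as 500 * 1024 * 1024 bytes.
UPLOAD_LIMIT_BYTES = 500 * 1024 * 1024


def needs_audio_extraction(size_bytes, limit=UPLOAD_LIMIT_BYTES):
    """Return True when a file of this size exceeds the upload limit."""
    return size_bytes > limit


def build_audio_only_cmd(src, dst="audio.m4a"):
    """Build the FFmpeg argv list that drops video and copies the audio."""
    return ["ffmpeg", "-i", src, "-vn", "-acodec", "copy", dst]


def plan_upload(path):
    """Return an FFmpeg command if the file is too large, else None."""
    if needs_audio_extraction(os.path.getsize(path)):
        return build_audio_only_cmd(path)
    return None  # small enough to upload directly
```

Calling 'plan_upload' on a real file returns either None or a command list that can be run with 'subprocess.run(cmd, check=True)', assuming FFmpeg is installed.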

Question 6

Does it handle MP4 files from DJI or GoPro cameras?

Accepted Answer

Yes. DJI and GoPro cameras produce standard MP4 files with AAC audio. The wind noise common in drone and action camera footage is handled by our speech model, though accuracy depends on how much the wind obscures the actual speech.

Question 7

Can I get subtitles timed to match the video?

Accepted Answer

Yes. When you export as SRT or VTT, the timestamps correspond to the original video timeline. Import the subtitle file into your video editor or upload it to YouTube, Vimeo, or any platform that accepts external subtitle files.
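For readers who post-process exported subtitles, here is a small Python sketch of how SRT and VTT timestamps encode a position on the video timeline. The helper names 'format_timestamp' and 'srt_cue' are hypothetical and not part of any export API. The timestamp layouts themselves are the standard ones: SRT uses a comma before the milliseconds ('HH:MM:SS,mmm') and VTT uses a period ('HH:MM:SS.mmm').

```python
def format_timestamp(seconds, sep=","):
    """Format an offset in seconds as HH:MM:SS<sep>mmm.

    Use sep="," for SRT and sep="." for VTT.
    """
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def srt_cue(index, start, end, text):
    """Render one SRT cue: index line, time range line, then the text."""
    return f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"


print(srt_cue(1, 0, 2.5, "Hello"))
print(format_timestamp(75.25, sep="."))  # VTT-style timestamp
```

Because the offsets are measured from the start of the original video, a cue that starts at 75.25 seconds lines up with that same moment in any editor or player that loads the file.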

Transcribe MP4 video — any codec, any source

MP4 is a container — what's inside it matters

How it works

Upload your MP4 file

Audio extraction and transcription

Export your transcript

Features

Container-aware processing

Multiple audio track handling

Screen recording optimization

Zoom and meeting recording handling

Video codec is irrelevant

Why choose Vocova

Subtitle any video without an editor

Transcribe meeting recordings from any platform

Extract dialogue from camera footage

Turn screen recordings into documentation

Who can benefit

Video editors and post-production teams

Remote teams with meeting recordings

YouTubers and content creators

Educators recording screen tutorials

Frequently asked questions

Related tools

Video to text

MP4 to SRT

Video to PDF

Subtitle generator

Audio to text

MOV to text

Start transcribing for free