Transcribe Cantonese audio and video to text

Upload Cantonese audio or video and get a transcript that handles what makes Cantonese uniquely challenging: pervasive English code-mixing (我send咗個email畀你), a sentence-final particle system richer than that of any other Chinese variety, and the fundamental question of whether to output written Cantonese or standard written Chinese.

Drop your file here or click to browse

.mp3, .wav, .m4a, .aac, .ogg, .flac, .mp4, .mov, .avi, .mkv, .webm · up to 500MB

Cantonese transcription that understands Cantonese — not just 'Chinese with different sounds'

Cantonese is not Mandarin with different pronunciation: it has its own grammar, vocabulary, and a written form that diverges significantly from standard written Chinese. Hong Kong Cantonese speakers routinely mix English at the word level (我要book個room), use sentence-final particles (喎, 囉, 㗎, 嘅) that encode nuances no other Chinese variety expresses this way, and write with characters like 嘅 (possessive), 咗 (completed action), and 唔 (negation) that standard written Chinese does not use. Vocova's AI is built for Cantonese specifically, handling these features natively rather than forcing Cantonese speech through a Mandarin-shaped pipeline.

How it works

1

Upload Cantonese audio or video

Drag and drop or select a file containing Cantonese speech. Handles everything from Hong Kong business calls to TVB dramas to diaspora community recordings.

  • MP3, WAV, M4A, MP4, MOV, MKV, and the other supported audio and video formats
  • Files up to 500MB supported
  • No format conversion needed
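
As a rough illustration, a pre-upload check for the format and size limits on this page could look like the Python sketch below. The function name `check_upload` is hypothetical, the extension set is copied from the upload box at the top of this page, and reading "500MB" as 500 × 1024 × 1024 bytes is an assumption. This is not Vocova's actual validation code.

```python
from pathlib import Path

# Extensions copied from the upload box on this page.
SUPPORTED_EXTENSIONS = {
    ".mp3", ".wav", ".m4a", ".aac", ".ogg", ".flac",
    ".mp4", ".mov", ".avi", ".mkv", ".webm",
}
# Assumption: "500MB" is read as 500 * 1024 * 1024 bytes.
MAX_BYTES = 500 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append("file is larger than 500MB")
    return problems
```

Returning a list of problems rather than a single boolean lets a caller report every issue with a file at once.
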
2

AI processes Cantonese grammar, particles, and code-mixing

The engine recognizes Cantonese-specific grammar and vocabulary, preserves sentence-final particles, and handles English-Cantonese code-mixing at the word and phrase level.

  • Written Cantonese output with characters like 嘅, 咗, 唔, 冇
  • English-Cantonese code-mixing preserved naturally
  • 30+ sentence-final particles (喎, 囉, 㗎, 喇, 嘛) captured
  • Speaker diarization for multi-person recordings
3

Export your transcript

Review the Cantonese transcript, edit inline if needed, and export in your preferred format.

  • Export as TXT, SRT, VTT, DOCX, or PDF
  • Timestamps on every segment
  • Edit directly in the browser before exporting
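
For readers curious what a timestamped SRT export contains, here is a minimal Python sketch that renders segments as SRT cues. The `(start, end, text)` segment shape and the function names are assumptions made for illustration, not Vocova's export code; only the SRT cue layout itself (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) is standard.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as numbered SRT cues."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)


# Hypothetical segments; the text reuses example sentences from this page.
segments = [
    (0.0, 2.5, "我send咗個email畀你"),
    (2.5, 5.04, "我哋要review下個proposal先"),
]
print(to_srt(segments))
```
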

Features

English-Cantonese code-mixing handled natively

Hong Kong Cantonese doesn't just borrow English words — it integrates them into Cantonese grammar. Speakers say 我send咗個email畀你 (I sent you an email) with Cantonese aspect markers wrapping English verbs. The AI preserves this code-mixing pattern faithfully, outputting English words where the speaker used them rather than attempting to translate everything into Chinese characters.
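
To make the code-mixing pattern concrete, the Python sketch below splits a transcript line into English and non-English runs. It is a simple regex heuristic for inspecting transcripts, not a description of how the transcription engine works; the function name and the `en`/`other` tags are hypothetical.

```python
import re

# Runs of ASCII letters are treated as English; everything else is kept as-is.
# Assumption: this ignores digits, apostrophes, and romanized Cantonese.
TOKEN = re.compile(r"[A-Za-z]+|[^A-Za-z]+")


def split_code_mixed(text: str) -> list[tuple[str, str]]:
    """Split text into (tag, run) pairs, where tag is 'en' or 'other'."""
    return [
        ("en" if run.isascii() and run.isalpha() else "other", run)
        for run in TOKEN.findall(text)
    ]


print(split_code_mixed("我send咗個email畀你"))
# [('other', '我'), ('en', 'send'), ('other', '咗個'), ('en', 'email'), ('other', '畀你')]
```

The output shows the English verb send sitting directly before the Cantonese aspect marker 咗, which is the kind of word-level mixing the transcript keeps intact.
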

Sentence-final particle system captured

Cantonese has over 30 sentence-final particles that encode mood, attitude, and social meaning: 㗎 (assertion/emphasis), 喎 (surprise/new information), 囉 (obviousness), 嘛 (it should be clear), 啦 (suggestion/urging). These aren't optional: dropping them changes the meaning. The AI captures the particles actually spoken rather than omitting them or substituting a generic marker.
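
As a small illustration of working with particles in a finished transcript, the Python sketch below looks up the last character of an utterance in a table built from the five particles glossed above. It is a deliberately naive check under stated assumptions: the table is far from the full inventory, it ignores multi-character particle clusters, and a matching final character is not guaranteed to be functioning as a particle. The names are hypothetical.

```python
# Particles and glosses copied from the paragraph above; the full inventory is larger.
PARTICLE_GLOSSES = {
    "㗎": "assertion/emphasis",
    "喎": "surprise/new information",
    "囉": "obviousness",
    "嘛": "it should be clear",
    "啦": "suggestion/urging",
}
# Punctuation and spaces to ignore at the end of an utterance.
TRAILING_PUNCTUATION = "。！？!?…，,. "


def final_particle(utterance: str):
    """Return (particle, gloss) if the utterance ends in a listed particle, else None."""
    stripped = utterance.rstrip(TRAILING_PUNCTUATION)
    if stripped and stripped[-1] in PARTICLE_GLOSSES:
        return stripped[-1], PARTICLE_GLOSSES[stripped[-1]]
    return None


print(final_particle("走啦！"))  # ('啦', 'suggestion/urging')
```
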

Cantonese-specific characters used correctly

Written Cantonese uses characters that standard written Chinese does not: 嘅 (possessive, = Mandarin 的), 咗 (completed action, = Mandarin 了), 唔 (negation, = Mandarin 不), 冇 (not have, = Mandarin 没有), 嘢 (thing, = Mandarin 东西), 佢 (he/she, = Mandarin 他/她). The AI uses these characters naturally because it processes Cantonese as its own language system.
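
One practical use of these characters is checking whether a transcript came out in written Cantonese or in standard written Chinese. The Python sketch below scans text for the six characters listed above. It is only a heuristic with a hypothetical function name; a real check would need a much longer character list, and a short Cantonese sentence may happen to contain none of them.

```python
# Characters copied from the paragraph above; a real check would need a longer list.
CANTONESE_MARKERS = "嘅咗唔冇嘢佢"


def find_cantonese_markers(text: str) -> list[str]:
    """Return the listed Cantonese characters found in text, in order of first appearance."""
    found = []
    for char in text:
        if char in CANTONESE_MARKERS and char not in found:
            found.append(char)
    return found


print(find_cantonese_markers("佢唔喺度，佢冇嘢做"))  # ['佢', '唔', '冇', '嘢']
print(find_cantonese_markers("他不在这里"))          # []
```
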

Six to nine tones distinguished

Cantonese has 6 contrastive tones (9 in the traditional count, which lists the three checked-syllable tones separately), more than Mandarin's 4, plus changed tones that function as morphology (e.g., the high-rising changed tone marks familiarity or diminutive meaning). The AI uses this tonal information alongside context to select the correct characters.

Hong Kong speech patterns recognized

The AI understands Hong Kong-specific vocabulary (巴士 for bus, 的士 for taxi, 冷氣 for air conditioning), institutional terms (立法會, 廉政公署), and the characteristic speech rhythm of Hong Kong Cantonese. It also handles Guangzhou Cantonese, which has its own vocabulary preferences.

Why choose Vocova

Transcribe Hong Kong business communications

Hong Kong business Cantonese mixes English terminology freely — 我哋要review下個proposal先 is standard office speech. The transcript captures this naturally so your meeting minutes reflect how the conversation actually happened.

Create subtitles for Cantonese video content

Export as SRT or VTT for Hong Kong dramas, YouTube content, and social media videos. The subtitles include Cantonese-specific characters and code-mixed English, matching how Cantonese audiences expect to read their language.

Process Hong Kong media and news

Turn Cantonese news broadcasts, radio programs, and talk shows into searchable text for media monitoring, journalism, and content analysis across Hong Kong's media landscape.

Preserve Cantonese language and culture

Cantonese oral histories, community recordings, and cultural content deserve transcription in written Cantonese — not a Mandarin approximation. The AI produces text that reflects how Cantonese is actually spoken, preserving linguistic features that standard Chinese transcription would erase.

Who can benefit

Hong Kong businesses and professionals

Capture Cantonese meetings and calls with English code-mixing preserved. Get written records that reflect the actual language of Hong Kong business communication.

Cantonese content creators

Generate subtitles and written content from Cantonese videos and podcasts using written Cantonese characters (嘅, 咗, 唔) that your Hong Kong audience recognizes.

Researchers studying Cantonese

Get transcripts that preserve sentence-final particles, code-mixing patterns, and Cantonese-specific grammar for linguistic analysis and sociolinguistic research.

Cantonese diaspora communities

Transcribe family recordings, community media, and cultural content in written Cantonese for language preservation across generations in North America, Europe, and Southeast Asia.

Start transcribing for free

Upload a file or paste a link from YouTube, TikTok, or 1,000+ other platforms and get an accurate transcript in minutes. No credit card required.