Transcribe Cantonese audio and video to text
Upload Cantonese audio or video and get a transcript that handles what makes Cantonese uniquely challenging: pervasive English code-mixing (我send咗個email畀你), one of the richest sentence-final particle systems of any Chinese variety, and the fundamental question of whether to output written Cantonese or standard written Chinese.
Supported formats: .mp3, .wav, .m4a, .aac, .ogg, .flac, .mp4, .mov, .avi, .mkv, .webm (up to 500 MB)
Cantonese transcription that understands Cantonese — not just 'Chinese with different sounds'
Cantonese is not Mandarin with different pronunciation: it has its own grammar, vocabulary, and a written form that diverges significantly from standard written Chinese. Hong Kong Cantonese speakers routinely mix English at the word level (我要book個room, meaning "I need to book a room"), use sentence-final particles (喎, 囉, 㗎, 嘅) that encode fine shades of mood and attitude, and write with characters like 嘅 (possessive), 咗 (completed action), and 唔 (negation) that standard written Chinese does not use. Vocova's AI is built for Cantonese specifically, handling these features natively rather than forcing Cantonese speech through a Mandarin-shaped pipeline.
How it works
Upload Cantonese audio or video
Drag and drop or select a file containing Cantonese speech. The tool handles recordings ranging from Hong Kong business calls to TVB dramas to diaspora community recordings.
- MP3, WAV, M4A, AAC, OGG, FLAC, MP4, MOV, AVI, MKV, and WebM
- Files up to 500MB supported
- No format conversion needed
AI processes Cantonese grammar, particles, and code-mixing
The engine recognizes Cantonese-specific grammar and vocabulary, preserves sentence-final particles, and handles English-Cantonese code-mixing at the word and phrase level.
- Written Cantonese output with characters like 嘅, 咗, 唔, 冇
- English-Cantonese code-mixing preserved naturally
- 30+ sentence-final particles (喎, 囉, 㗎, 喇, 嘛) captured
- Speaker diarization for multi-person recordings
Export your transcript
Review the Cantonese transcript, edit inline if needed, and export in your preferred format.
- Export as TXT, SRT, VTT, DOCX, or PDF
- Timestamps on every segment
- Edit directly in the browser before exporting
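To illustrate what a timestamped SRT export contains, here is a minimal sketch in Python. The `Segment` class and the function names are hypothetical and are not Vocova's export code; only the SRT layout itself (cue number, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) is standard.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical transcript segment: start and end in seconds, plus text."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as the HH:MM:SS,mmm form used by SRT."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text}\n"
        )
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [
        Segment(0.0, 2.5, "我send咗個email畀你"),
        Segment(2.5, 5.0, "我哋要review下個proposal先"),
    ]
    print(to_srt(demo))
```

VTT output would differ mainly in starting with a `WEBVTT` header line and using a period instead of a comma before the milliseconds.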
Features
English-Cantonese code-mixing handled natively
Hong Kong Cantonese doesn't just borrow English words; it integrates them into Cantonese grammar. Speakers say 我send咗個email畀你 (I sent you an email), attaching the Cantonese aspect marker 咗 to the English verb "send" and placing the classifier 個 before the English noun "email". The AI preserves this code-mixing pattern faithfully, outputting English words where the speaker used them rather than attempting to translate everything into Chinese characters.
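For readers who want to work with code-mixed transcripts programmatically, the following sketch splits a sentence into English and Chinese runs based purely on script. It is an illustrative assumption about how one might post-process a transcript, not a description of how the AI itself works; the function name and the `en`/`zh` labels are made up for the example.

```python
import re

# Runs of ASCII letters are labelled "en"; any other run of non-space
# characters is labelled "zh". This is a script-based heuristic only, so
# digits and punctuation end up in the "zh" runs.
_RUN = re.compile(r"(?P<en>[A-Za-z]+)|(?P<zh>[^A-Za-z\s]+)")


def split_code_mixed(text: str) -> list[tuple[str, str]]:
    """Return (label, run) pairs in the order they appear in the text."""
    return [(m.lastgroup, m.group()) for m in _RUN.finditer(text)]


if __name__ == "__main__":
    for label, run in split_code_mixed("我send咗個email畀你"):
        print(label, run)
```

A split like this makes it easy to measure how much of a recording is in English, or to apply different fonts to each script in subtitles.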
Sentence-final particle system captured
Cantonese has over 30 sentence-final particles that encode mood, attitude, and social meaning: 㗎 (assertion/emphasis), 喎 (surprise/new information), 囉 (obviousness), 嘛 (it should be clear), 啦 (suggestion/urging). These aren't optional: dropping them changes the meaning. The AI captures the particle actually spoken rather than omitting it or substituting a generic marker.
Cantonese-specific characters used correctly
Written Cantonese uses characters that standard Chinese does not: 嘅 (possessive, = Mandarin 的), 咗 (completed action, = Mandarin 了), 唔 (negation, = Mandarin 不), 冇 (not have, = Mandarin 没), 嘢 (thing, = Mandarin 东西), 佢 (he/she, = Mandarin 他/她). The AI uses these characters naturally because it processes Cantonese as its own language system.
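One practical use of these characters is checking whether a piece of text is written Cantonese or standard Chinese. The sketch below uses only the six character pairs listed above; the function names and the one-marker threshold are assumptions, and so small a list will miss Cantonese text that happens to use none of these characters.

```python
# Pairs taken from the list above: Cantonese character -> Mandarin equivalent.
CANTONESE_TO_MANDARIN = {
    "嘅": "的",
    "咗": "了",
    "唔": "不",
    "冇": "没",
    "嘢": "东西",
    "佢": "他/她",
}


def cantonese_markers(text: str) -> list[str]:
    """Return the Cantonese-specific characters in text, in order of first use."""
    seen: list[str] = []
    for ch in text:
        if ch in CANTONESE_TO_MANDARIN and ch not in seen:
            seen.append(ch)
    return seen


def looks_like_written_cantonese(text: str, min_markers: int = 1) -> bool:
    """Heuristic: text with at least min_markers distinct markers is Cantonese."""
    return len(cantonese_markers(text)) >= min_markers


if __name__ == "__main__":
    sample = "佢唔喺度，佢冇嘢做"
    for ch in cantonese_markers(sample):
        print(ch, "=", CANTONESE_TO_MANDARIN[ch])
```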
Six to nine tones distinguished
Cantonese has six contrastive tones, compared with Mandarin's four. The traditional count of nine treats the three checked tones, found on syllables ending in -p, -t, or -k, as separate tones. Cantonese also has changed tones that function as morphology (e.g., the high-rising changed tone marks familiarity or diminutive meaning). The AI uses this tonal information alongside context to choose between characters that differ only in tone.
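A common teaching example shows why tone matters for character choice: the syllable "si" maps to a different character under each of the six tones. The sketch below writes the syllables in Jyutping romanization (tone number appended), which is used here only for illustration and is not something this page says the product outputs; the function names are hypothetical.

```python
# The syllable "si" under the six tones, in Jyutping (illustrative only).
SI_BY_TONE = {
    "si1": "詩",  # poem
    "si2": "史",  # history
    "si3": "試",  # to try
    "si4": "時",  # time
    "si5": "市",  # market
    "si6": "事",  # matter
}


def split_tone(syllable: str) -> tuple[str, int]:
    """Split a Jyutping syllable such as 'si2' into ('si', 2)."""
    base, tone = syllable[:-1], int(syllable[-1])
    if not 1 <= tone <= 6:
        raise ValueError(f"Jyutping tone must be 1-6, got {tone}")
    return base, tone


def candidates_without_tone(base: str) -> list[str]:
    """All characters in the table sharing a base syllable, ignoring tone."""
    return [ch for syl, ch in SI_BY_TONE.items() if split_tone(syl)[0] == base]


if __name__ == "__main__":
    print(candidates_without_tone("si"))  # six candidates when tone is ignored
    print(SI_BY_TONE["si6"])              # one candidate once tone is known
```

In real speech many characters share the same syllable and tone, which is why tone has to be combined with context.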
Hong Kong speech patterns recognized
The AI understands Hong Kong-specific vocabulary (巴士 for bus, 的士 for taxi, 冷氣 for air conditioning), institutional terms (立法會, 廉政公署), and the characteristic speech rhythm of Hong Kong Cantonese. It also handles the Guangzhou Cantonese variety with its own vocabulary preferences.
Why choose Vocova
Transcribe Hong Kong business communications
Hong Kong business Cantonese mixes English terminology freely: 我哋要review下個proposal先 (we need to review the proposal first) is standard office speech. The transcript captures this naturally so your meeting minutes reflect how the conversation actually happened.
Create subtitles for Cantonese video content
Export as SRT or VTT for Hong Kong dramas, YouTube content, and social media videos. The subtitles include Cantonese-specific characters and code-mixed English, matching how Cantonese audiences expect to read their language.
Process Hong Kong media and news
Turn Cantonese news broadcasts, radio programs, and talk shows into searchable text for media monitoring, journalism, and content analysis across Hong Kong's media landscape.
Preserve Cantonese language and culture
Cantonese oral histories, community recordings, and cultural content deserve transcription in written Cantonese — not a Mandarin approximation. The AI produces text that reflects how Cantonese is actually spoken, preserving linguistic features that standard Chinese transcription would erase.
Who can benefit
Hong Kong businesses and professionals
Capture Cantonese meetings and calls with English code-mixing preserved. Get written records that reflect the actual language of Hong Kong business communication.
Cantonese content creators
Generate subtitles and written content from Cantonese videos and podcasts using written Cantonese characters (嘅, 咗, 唔) that your Hong Kong audience recognizes.
Researchers studying Cantonese
Get transcripts that preserve sentence-final particles, code-mixing patterns, and Cantonese-specific grammar for linguistic analysis and sociolinguistic research.
Cantonese diaspora communities
Transcribe family recordings, community media, and cultural content in written Cantonese for language preservation across generations in North America, Europe, and Southeast Asia.
Start transcribing for free
Upload a file or paste a link from YouTube, TikTok, and 1,000+ platforms — get an accurate transcript in minutes. No credit card required.