Vocova
PricingBlog

Product

  • Pricing
  • Blog
  • View all tools

Solutions

  • For podcasters
  • For video creators
  • Multilingual interviews

Company

  • About
  • FAQ
  • Terms of service
  • Privacy policy
  • Contact

Transcription

  • Audio to text
  • Video to text
  • Podcast transcription
  • Interview transcription
  • Lecture transcription

Platform

  • YouTube transcription
  • Apple Podcasts transcription
  • Zoom transcription
  • Google Meet transcription
  • TikTok transcription
  • Loom transcription
  • Bilibili transcription
  • Vimeo transcription
  • Instagram transcription
  • Facebook transcription
  • X (Twitter) transcription
  • SoundCloud transcription
  • Reddit transcription
  • Dailymotion transcription

Language

  • Japanese transcription
  • Spanish transcription
  • French transcription
  • German transcription
  • Portuguese transcription
  • Korean transcription
  • Chinese transcription
  • Arabic transcription
  • Hindi transcription
  • Italian transcription
  • Russian transcription
  • Thai transcription
  • Vietnamese transcription
  • Turkish transcription
  • Indonesian transcription
  • Dutch transcription
  • Polish transcription
  • Swedish transcription
  • Cantonese transcription
  • Tagalog transcription

Translation

  • Audio translation
  • Bilingual subtitles
  • Video translation
  • Japanese to English
  • Chinese to English
  • Spanish to English
  • Korean to English
  • French to English

Format

  • MP4 to text
  • MP3 to text
  • WAV to text
  • M4A to text
  • MOV to text
  • SRT generator
  • VTT generator
  • Subtitle generator

Converter

  • Audio converter
  • Video converter
  • MP4 to MP3

Summarize

  • Podcast summarizer
  • YouTube summarizer
Vocova

© 2026 NOWGIC LTD. All rights reserved.

Featured on Product Hunt

Subtitle file formats explained: SRT, WebVTT, ASS, TTML compared (2026)

All 6 major subtitle formats explained with platform compatibility, code samples, and a decision guide. Choose between SRT, WebVTT, ASS/SSA, SBV, STL, and TTML/DFXP for streaming, broadcast, or social media in 2026.

Apr 2, 2026 · 12 min read
Tags: subtitles, captions, formats, reference

A subtitle file is a document that tells a video player what text to show, when to show it, and -- optionally -- how to style and position it. Most subtitle formats are plain text; EBU-STL is the binary exception. The six formats that matter in 2026 are SRT (universal baseline), WebVTT (web-native, HTML5), ASS/SSA (advanced styling for anime and karaoke), SBV (YouTube's internal format), STL (European broadcast standard), and TTML/DFXP (W3C XML standard used by Netflix and broadcast workflows). Each has a specific job, and picking the wrong one usually means a rejected upload or lost styling.

This reference covers the technical spec, a minimal example, platform support, and a decision tree so you can pick the right format the first time. If you only need a two-format comparison, the SRT vs VTT post is shorter. This guide is the full map.

Quick comparison

| Format | Extension | Styling | Positioning | Primary use | Platform coverage |
|---|---|---|---|---|---|
| SRT | .srt | Minimal (italic, bold, underline) | None | Universal video playback | Near-universal |
| WebVTT | .vtt | CSS-based | Full (line, position, align) | HTML5 video, web | All modern browsers |
| ASS / SSA | .ass, .ssa | Rich (fonts, colors, effects) | Full | Anime, karaoke, styled subs | VLC, MPV, Aegisub |
| SBV | .sbv | None | None | YouTube uploads | YouTube Studio only |
| STL (EBU) | .stl | Broadcast-safe | Yes | European TV broadcast | Professional broadcast |
| TTML / DFXP | .ttml, .dfxp, .xml | XML + CSS | Full | OTT, broadcast, Netflix | Netflix, SMPTE workflows |

Every format here is human-readable plain text except EBU-STL, which is a binary format. Any of them can be converted to another, though you lose styling when going from a richer format to a simpler one.

SRT (SubRip Text)

SRT is the lowest common denominator of subtitle formats. It was designed for the SubRip DVD-ripping tool in the early 2000s, and its simplicity is exactly why it became universal -- virtually every video player, video editor, and streaming platform supports it.

Structure. An SRT file is a sequence of cues, each with a numeric index, a start and end timestamp separated by -->, and one or more lines of text. Cues are separated by a blank line. Timestamps use HH:MM:SS,mmm (comma as decimal separator).

Minimal example:

1
00:00:01,000 --> 00:00:03,500
Welcome to the video.

2
00:00:04,000 --> 00:00:07,200
Subtitles make content accessible
to global audiences.
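
The structure described above is simple enough to parse by hand. The following is a minimal sketch in Python, not a complete SRT implementation: the names `Cue`, `parse_srt`, and `to_ms` are invented for illustration, and the parser assumes well-formed numeric indices and two-digit hour fields as in the example.

```python
import re
from dataclasses import dataclass

# HH:MM:SS,mmm --> HH:MM:SS,mmm (SRT uses a comma before the milliseconds)
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)

@dataclass
class Cue:  # hypothetical container, not part of any library
    index: int
    start_ms: int
    end_ms: int
    text: str

def to_ms(h, m, s, ms):
    """Convert the four captured timestamp fields to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)

def parse_srt(source: str) -> list[Cue]:
    cues = []
    # Normalise line endings, then split on the blank lines between cues.
    blocks = re.split(r"\n\s*\n", source.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = block.split("\n")
        if len(lines) < 2:
            continue  # not a complete cue
        match = TIMING.match(lines[1].strip())
        if not match:
            continue  # skip malformed cues instead of failing
        g = match.groups()
        cues.append(Cue(int(lines[0]), to_ms(*g[:4]), to_ms(*g[4:]), "\n".join(lines[2:])))
    return cues

SAMPLE = """1
00:00:01,000 --> 00:00:03,500
Welcome to the video.

2
00:00:04,000 --> 00:00:07,200
Subtitles make content accessible
to global audiences.
"""

cues = parse_srt(SAMPLE)
```

Keeping times as integer milliseconds avoids floating-point drift when cues are later shifted or re-serialized into another format.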

Styling. SRT supports a tiny subset of HTML-like tags: <i>italic</i>, <b>bold</b>, <u>underline</u>, and <font color="#ff0000">colored</font>. Tag support varies by player. Anything beyond these is not portable.

Limitations. No positioning, no vertical text, no animation, no precise CSS control. Unicode is supported but some older players assume Windows-1252 or Latin-1, so save as UTF-8 without BOM for widest compatibility.

When to use. Default choice for uploads to video platforms, local playback, and anywhere you need maximum compatibility.

WebVTT (Web Video Text Tracks)

WebVTT is the W3C standard for HTML5 video captions. It was designed to be SRT-compatible on the surface while adding the features the web actually needs: CSS styling, positioning, metadata cues, and chapter markers.

Structure. Begins with a WEBVTT header, followed by cues. Timestamps use HH:MM:SS.mmm (period as decimal separator, not comma). Cues can carry styling and positioning hints inline.

Minimal example:

WEBVTT

1
00:00:01.000 --> 00:00:03.500
Welcome to the video.

2
00:00:04.000 --> 00:00:07.200 line:80% position:50% align:center
Subtitles make content accessible
to global audiences.

Styling. Supports CSS via ::cue and ::cue(selector) pseudo-elements in a stylesheet, or STYLE blocks directly in the VTT file. You get control over color, background, font, font size, weight, and shadow effects.

Positioning. Cue settings (line, position, size, align, vertical) control where the text appears. This is the main functional advantage over SRT.
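
To make the cue settings concrete, here is a small Python sketch that pulls the settings off a timing line such as the one in the example above. The function name `parse_cue_settings` is hypothetical, and the sketch does not validate values against the WebVTT specification; it only splits `name:value` pairs.

```python
def parse_cue_settings(timing_line: str) -> dict[str, str]:
    """Return the cue settings that follow the end timestamp, as a dict."""
    _, _, rest = timing_line.partition("-->")
    tokens = rest.split()
    # tokens[0] is the end timestamp; everything after it is a name:value setting.
    settings = {}
    for token in tokens[1:]:
        name, sep, value = token.partition(":")
        if sep and name and value:
            settings[name] = value
    return settings

line = "00:00:04.000 --> 00:00:07.200 line:80% position:50% align:center"
settings = parse_cue_settings(line)
```

For the sample line, the result maps `line` to `80%`, `position` to `50%`, and `align` to `center`. A cue without settings yields an empty dict.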

Extensions. Supports NOTE blocks for comments, STYLE blocks for embedded CSS, and chapter/metadata tracks via the kind attribute on the HTML <track> element.

When to use. HTML5 video, web players, chapter markers, and anywhere you need CSS-level control over caption appearance.

ASS / SSA (Advanced SubStation Alpha)

ASS (Advanced SubStation Alpha) and its predecessor SSA are the heavyweight format of the subtitle world. Originally developed for the SubStation Alpha karaoke and anime subtitling tool, ASS provides the richest styling options of any widely-used subtitle format.

Structure. INI-like sections: [Script Info], [V4+ Styles], [Events]. Events are the actual subtitle cues, each with a layer, start/end time, style name, and text. Text can contain inline override tags in curly braces ({\b1}bold{\b0}, {\c&H00FFFF&}yellow, {\pos(100,200)}positioned).

Minimal example:

[Script Info]
Title: Example
ScriptType: v4.00+
PlayResX: 1920
PlayResY: 1080

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,48,&H00FFFFFF,&H000000FF,&H00000000,&H80000000,0,0,0,0,100,100,0,0,1,2,2,2,10,10,10,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:01.00,0:00:03.50,Default,,0,0,0,,Welcome to the video.
Dialogue: 0,0:00:04.00,0:00:07.20,Default,,0,0,0,,{\b1}Subtitles{\b0} matter.
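
Reading an event line from the example above can be sketched in Python as follows. This is an illustration under assumptions: `FIELDS` hard-codes the field order from the Format line shown here, whereas a robust reader would take the order from each file's own Format line, and the helper names are invented.

```python
FIELDS = ["Layer", "Start", "End", "Style", "Name",
          "MarginL", "MarginR", "MarginV", "Effect", "Text"]

def parse_dialogue(line: str) -> dict[str, str]:
    prefix, _, payload = line.partition(":")
    if prefix.strip() != "Dialogue":
        raise ValueError("not a Dialogue line")
    # Split at most 9 times so commas inside the Text field survive.
    values = payload.lstrip().split(",", len(FIELDS) - 1)
    if len(values) != len(FIELDS):
        raise ValueError("too few fields")
    return dict(zip(FIELDS, values))

def ass_time_to_ms(stamp: str) -> int:
    # Assumes the H:MM:SS.cc form used in the example (two-digit centiseconds).
    hours, minutes, seconds = stamp.split(":")
    whole, _, centi = seconds.partition(".")
    return ((int(hours) * 60 + int(minutes)) * 60 + int(whole)) * 1000 + int(centi or 0) * 10

event = parse_dialogue("Dialogue: 0,0:00:04.00,0:00:07.20,Default,,0,0,0,,Hello, world")
```

Limiting the split count is the important detail: Text is the last field, so any commas in the subtitle itself must not be treated as separators.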

Styling. Named styles defined once and applied to many cues. Inline overrides can animate properties (\t(start,end,\fscx120) scales horizontally between two times), rotate text, apply shadows and outlines, and draw vector graphics using \p1 ... \p0 commands.
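
One detail that trips people up is that ASS colour values are written blue-green-red, optionally preceded by an alpha byte, which is why &H00FFFF& renders as yellow. The sketch below decodes such a value; `ass_colour_to_rgba` is a hypothetical helper, and it assumes the common convention that an alpha of 00 is opaque and FF is fully transparent.

```python
def ass_colour_to_rgba(value: str) -> tuple[int, int, int, int]:
    """Decode &HBBGGRR& or &HAABBGGRR into (red, green, blue, alpha)."""
    digits = value.strip().strip("&").lstrip("Hh")
    number = int(digits, 16)
    red = number & 0xFF             # lowest byte is red
    green = (number >> 8) & 0xFF
    blue = (number >> 16) & 0xFF
    alpha = (number >> 24) & 0xFF   # assumed: 0 = opaque, 255 = fully transparent
    return red, green, blue, alpha

yellow = ass_colour_to_rgba("&H00FFFF&")
shadow = ass_colour_to_rgba("&H80000000")
```

Here `yellow` has full red and green with no blue, and `shadow` (the BackColour from the style line above) is black with an alpha byte of 128.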

When to use. Anime fansubs, karaoke lyrics, heavily stylized captions, and any time you need production-grade control over typography and positioning. Overkill for most use cases.

Compatibility. VLC, MPV, mpv.net, and most anime-community players support ASS fully. Web players generally do not. YouTube strips ASS styling on upload.

SBV (YouTube format)

SBV is YouTube's historical internal subtitle format. It is essentially a stripped-down SRT without indices or styling. YouTube Studio still accepts SBV alongside SRT, VTT, TTML, and several other formats.

Structure. Timestamps separated by a comma, followed by subtitle text. Cues separated by blank lines. Timestamps use H:MM:SS.mmm.

Minimal example:

0:00:01.000,0:00:03.500
Welcome to the video.

0:00:04.000,0:00:07.200
Subtitles make content accessible
to global audiences.
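
Because SBV is so close to SRT, converting it is mostly a matter of adding indices and rewriting the timing line. A minimal Python sketch, assuming timestamps in the H:MM:SS.mmm form shown above (the function name `sbv_to_srt` is invented for illustration):

```python
import re

# H:MM:SS.mmm,H:MM:SS.mmm on a single line
SBV_TIMING = re.compile(
    r"^(\d+):(\d{2}):(\d{2})\.(\d{3}),(\d+):(\d{2}):(\d{2})\.(\d{3})$"
)

def sbv_to_srt(source: str) -> str:
    blocks = re.split(r"\n\s*\n", source.replace("\r\n", "\n").strip())
    out = []
    for block in blocks:
        lines = block.split("\n")
        match = SBV_TIMING.match(lines[0].strip())
        if not match:
            continue  # skip anything that is not a cue
        h1, m1, s1, ms1, h2, m2, s2, ms2 = match.groups()
        # SRT wants zero-padded hours and a comma before the milliseconds.
        timing = f"{int(h1):02d}:{m1}:{s1},{ms1} --> {int(h2):02d}:{m2}:{s2},{ms2}"
        out.append(f"{len(out) + 1}\n{timing}\n" + "\n".join(lines[1:]))
    return "\n\n".join(out) + "\n"

SBV_SAMPLE = """0:00:01.000,0:00:03.500
Welcome to the video.

0:00:04.000,0:00:07.200
Subtitles make content accessible
to global audiences.
"""

srt_text = sbv_to_srt(SBV_SAMPLE)
```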

When to use. Almost never, outside of the narrow case of uploading directly to YouTube where you already have SBV exports from a tool. For new workflows, use SRT or VTT -- YouTube accepts both.

STL (EBU Subtitling data exchange format)

EBU-STL is the European Broadcasting Union's binary subtitle exchange format, standardized in EBU Tech 3264. It is the dominant format in European broadcast television and is required by many public broadcasters for delivery.

Structure. Binary container with a 1024-byte general subtitle information (GSI) header followed by a sequence of text and timing information (TTI) blocks, each 128 bytes. The GSI block encodes metadata like language, character set, frame rate, and aspect ratio. Each TTI block carries one cue (a long cue can continue across several blocks) with in/out timecodes at frame precision and styling attributes.
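
The fixed block sizes make the container easy to walk with a binary unpacker. The Python sketch below reads one TTI block. Treat the field layout as an assumption to verify against EBU Tech 3264 rather than as a reference: it assumes a 1024-byte GSI header, a little-endian subtitle number, timecodes stored as four bytes (hours, minutes, seconds, frames), 0x8F as the filler for unused text bytes, and 25 fps. Text decoding is left out because it depends on the character table declared in the GSI block.

```python
import struct

GSI_SIZE = 1024   # assumed size of the GSI header block in bytes
TTI_SIZE = 128    # size of each TTI block

# Assumed TTI layout:
# SGN(1) SN(2) EBN(1) CS(1) TCI(4) TCO(4) VP(1) JC(1) CF(1) TF(112)
TTI = struct.Struct("<BHBB4s4sBBB112s")

def timecode_to_ms(tc: bytes, fps: int = 25) -> int:
    hours, minutes, seconds, frames = tc
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + frames * 1000 // fps

def parse_tti(block: bytes, fps: int = 25) -> dict:
    if len(block) != TTI_SIZE:
        raise ValueError("a TTI block must be exactly 128 bytes")
    sgn, sn, ebn, cs, tci, tco, vp, jc, cf, tf = TTI.unpack(block)
    return {
        "subtitle_number": sn,
        "start_ms": timecode_to_ms(tci, fps),
        "end_ms": timecode_to_ms(tco, fps),
        "vertical_position": vp,
        # Trailing 0x8F bytes are treated as unused space in the text field.
        "raw_text": tf.rstrip(b"\x8f"),
    }

def tti_block_count(file_size: int) -> int:
    """Number of TTI blocks implied by the file size."""
    return (file_size - GSI_SIZE) // TTI_SIZE

# Synthetic block for illustration only, not taken from a real file.
demo_block = TTI.pack(0, 1, 0xFF, 0, bytes([0, 0, 1, 0]), bytes([0, 0, 3, 12]),
                      20, 2, 0, b"Welcome".ljust(112, b"\x8f"))
demo = parse_tti(demo_block)
```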

Styling. Supports teletext-style color and positioning attributes, double-height characters, and box backgrounds. Output is visually constrained to match traditional broadcast caption capabilities.

When to use. Broadcast delivery to European TV networks (BBC, ZDF, France Télévisions, etc.). If you are not working in professional broadcast, you will not touch this format.

Compatibility. Professional broadcast software (EZTitles, WinCAPS, Subtitle Workshop) handles STL. Consumer video players do not.

TTML and DFXP (W3C Timed Text Markup Language)

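TTML (Timed Text Markup Language) is the W3C XML-based format that has become the backbone of professional OTT (over-the-top) and streaming delivery. DFXP (Distribution Format Exchange Profile) is the name under which the W3C originally published TTML 1, and the .dfxp extension is still common. SMPTE-TT and IMSC are tighter, TTML-based profiles: SMPTE-TT is maintained by SMPTE for broadcast workflows, and IMSC is the W3C profile used by Netflix and other streaming services.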

Structure. XML document with a root <tt> element containing <head> (styles, regions, metadata) and <body> (divisions containing paragraphs, each representing a subtitle cue with begin/end timing).

Minimal example:

<?xml version="1.0" encoding="UTF-8"?>
<tt xmlns="http://www.w3.org/ns/ttml" xmlns:tts="http://www.w3.org/ns/ttml#styling" xml:lang="en">
  <head>
    <styling>
      <style xml:id="default" tts:color="white" tts:fontFamily="Arial" tts:fontSize="100%"/>
    </styling>
  </head>
  <body>
    <div style="default">
      <p begin="00:00:01.000" end="00:00:03.500">Welcome to the video.</p>
      <p begin="00:00:04.000" end="00:00:07.200">Subtitles make content accessible<br/>to global audiences.</p>
    </div>
  </body>
</tt>
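
Reading cues out of a document like this needs a namespace-aware XML parser. The following Python sketch uses the standard library's `xml.etree.ElementTree`. It embeds its own copy of the sample with the `tts` styling namespace declared on the root element, since a namespace-aware parser rejects an undeclared prefix. The function name `read_cues` is invented, and the sketch ignores styles and regions.

```python
import xml.etree.ElementTree as ET

TT = "http://www.w3.org/ns/ttml"  # default TTML namespace

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:tts="http://www.w3.org/ns/ttml#styling" xml:lang="en">
  <head>
    <styling>
      <style xml:id="default" tts:color="white" tts:fontFamily="Arial" tts:fontSize="100%"/>
    </styling>
  </head>
  <body>
    <div style="default">
      <p begin="00:00:01.000" end="00:00:03.500">Welcome to the video.</p>
      <p begin="00:00:04.000" end="00:00:07.200">Subtitles make content accessible<br/>to global audiences.</p>
    </div>
  </body>
</tt>"""

def read_cues(document: bytes) -> list[tuple[str, str, str]]:
    root = ET.fromstring(document)
    cues = []
    for p in root.iter(f"{{{TT}}}p"):
        parts = [p.text or ""]
        for child in p:
            # <br/> is a line break; other inline elements contribute their text.
            parts.append("\n" if child.tag == f"{{{TT}}}br" else "".join(child.itertext()))
            parts.append(child.tail or "")
        cues.append((p.get("begin"), p.get("end"), "".join(parts)))
    return cues

ttml_cues = read_cues(SAMPLE)
```

Element names must be looked up with the namespace in braces, which is the main reason regex-based approaches that work for SRT or VTT tend to break on TTML.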

Styling. Full CSS-like styling inline or via style definitions. Supports regions for positioning, animations via <set> elements, ruby annotations for East Asian typography, and rich text semantics.

When to use. Netflix delivery, OTT platforms, broadcast workflows that require SMPTE-TT or IMSC profiles, and anywhere you need precise styling that survives processing pipelines.

Compatibility. Netflix requires IMSC 1.1. Amazon Prime, Hulu, and Disney+ accept TTML variants. Apple TV uses iTunes Timed Text (iTT), a profile of TTML. Consumer players generally prefer SRT or VTT.

Platform compatibility matrix

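| Platform | SRT | VTT | ASS/SSA | SBV | STL | TTML/DFXP |
|---|---|---|---|---|---|---|
| YouTube (upload) | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ |
| Vimeo | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Netflix (delivery) | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ (IMSC) |
| Amazon Prime (delivery) | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| HTML5 <track> | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ |
| VLC | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
| MPV | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
| Adobe Premiere Pro | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
| DaVinci Resolve | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
| Final Cut Pro | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ (iTT) |
| TikTok / Instagram Reels | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |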

"Delivery" means the platform accepts that format in its ingest pipeline, not that it plays back directly in the consumer app.

Decision tree: which format should you use?

Answer these in order. The first yes is your format.

  1. Are you delivering to Netflix or another major OTT service? Use TTML / IMSC 1.1. This is a hard requirement, not a preference.
  2. Are you delivering to European broadcast TV? Use EBU-STL. Check the specific broadcaster's delivery spec for the exact STL variant.
  3. Do you need stylized subtitles for anime, karaoke, or typography-heavy content? Use ASS / SSA. No other format gives you comparable control.
  4. Are you embedding in HTML5 video on the web? Use WebVTT. It is the native format for the <track> element.
  5. Are you uploading to YouTube? Use SRT (YouTube's preferred input) or VTT. Skip SBV unless you have a legacy workflow.
  6. Do you need maximum compatibility across unknown players? Use SRT. Nothing is more universally supported.
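
The ordered questions above translate directly into a chain of checks where the first match wins. A small Python sketch, with a hypothetical function name and keyword flags chosen only for illustration:

```python
def choose_format(*, ott_delivery: bool = False, european_broadcast: bool = False,
                  heavy_styling: bool = False, html5_web: bool = False,
                  youtube_upload: bool = False) -> str:
    """Walk the decision tree in order; the first yes decides the format."""
    if ott_delivery:
        return "TTML / IMSC 1.1"
    if european_broadcast:
        return "EBU-STL"
    if heavy_styling:
        return "ASS / SSA"
    if html5_web:
        return "WebVTT"
    if youtube_upload:
        return "SRT"
    # Question 6: maximum compatibility across unknown players.
    return "SRT"

web_choice = choose_format(html5_web=True)
```

The order of the checks matters: a project that is both an OTT delivery and a YouTube upload resolves to TTML, because the stricter requirement is tested first.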

For most content creators -- podcasters, YouTubers, course creators -- the answer is almost always SRT or WebVTT. The exotic formats are relevant only when a specific platform or client mandates them.

Converting between formats

All six formats are convertible, but conversions are lossy in one direction. Going from a rich format (ASS, TTML) to a simple format (SRT, SBV) strips styling and positioning. Going the other way preserves text and timing but cannot recreate styling that was never in the source.
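
To illustrate what "strips styling" means in practice, the Python sketch below reduces the Text field of an ASS event to plain text by dropping override blocks in curly braces. It is a simplification with an invented function name: it treats \N as a line break and \n and \h as spaces, and it does not handle vector drawings between \p1 and \p0.

```python
import re

OVERRIDE_BLOCK = re.compile(r"\{[^}]*\}")  # anything inside {...}

def ass_text_to_plain(text: str) -> str:
    text = OVERRIDE_BLOCK.sub("", text)
    # Assumed mapping: \N hard line break, \n soft break, \h hard space.
    return text.replace(r"\N", "\n").replace(r"\n", " ").replace(r"\h", " ")

plain = ass_text_to_plain(r"{\b1}Subtitles{\b0} matter.")
```

Everything inside the braces (bold, position, colour, animation) is discarded, which is exactly the information that cannot be recovered when converting back.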

Common conversion tools:

  • FFmpeg: ffmpeg -i input.ass output.srt converts between the common text-based subtitle formats, dropping any styling the target format cannot represent.
  • Subtitle Edit (Windows, free): GUI for converting between a very large number of subtitle formats (the project lists 300+) with visual preview.
  • Aegisub (cross-platform, free): Specialized ASS editor that imports and exports to SRT and VTT.
  • Online converters: Useful for one-off conversions, but avoid them for sensitive content (uploads leave your control).

Programmatic conversion is straightforward for format pairs that share a cue-based model (SRT, VTT, SBV, ASS events). XML formats (TTML/DFXP) need a proper parser because of namespaces and nested elements.
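
As an example of the cue-based case, SRT to WebVTT needs little more than a header and a change of decimal separator on the timing lines. A minimal Python sketch, with an invented function name: it leaves cue text untouched, so SRT-only markup such as font tags is carried over as-is and may need separate cleanup.

```python
import re

# Match only timing lines, anchored at the start of a line.
SRT_TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})",
    re.MULTILINE,
)

def srt_to_vtt(source: str) -> str:
    body = source.lstrip("\ufeff").replace("\r\n", "\n").strip()
    # Only timing lines are rewritten, so commas in the subtitle text are left alone.
    body = SRT_TIMING.sub(r"\1.\2 --> \3.\4", body)
    return "WEBVTT\n\n" + body + "\n"

vtt_text = srt_to_vtt("1\n00:00:01,000 --> 00:00:03,500\nHello, world.\n")
```

Anchoring the pattern to timing lines is the design choice that matters: a blanket replacement of commas with periods would also corrupt the dialogue.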

Character encoding and Unicode

All modern subtitle formats support UTF-8, and it is the only encoding you should use in 2026. Legacy files may be in Windows-1252, Latin-1, Shift-JIS, or GB2312 -- if your text renders as ?????? or Ã© instead of é, the file is being read with the wrong encoding. Most editors let you reopen the file with the correct encoding and re-save it as UTF-8.

One mistake to watch for: do not save UTF-8 with a byte-order mark (BOM). The BOM is three invisible bytes (EF BB BF) at the start of the file that confuse older SRT parsers and some streaming pipelines. In VS Code, use "Save with Encoding → UTF-8" rather than "UTF-8 with BOM".
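
Both problems can be fixed in a few lines of Python. The sketch below strips a UTF-8 BOM if present and re-encodes legacy files as UTF-8. The function name is invented, and the Windows-1252 fallback is an assumption: for Shift-JIS or GB2312 sources you would pass that codec name instead, since the encoding cannot be detected reliably from the bytes alone.

```python
import codecs

def to_utf8_without_bom(raw: bytes, fallback: str = "cp1252") -> bytes:
    # Drop the three-byte UTF-8 BOM if the file starts with one.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: assume the legacy encoding named by `fallback`.
        text = raw.decode(fallback, errors="replace")
    return text.encode("utf-8")

cleaned = to_utf8_without_bom(codecs.BOM_UTF8 + "café".encode("utf-8"))
legacy = to_utf8_without_bom("café".encode("cp1252"))

# How the classic mojibake arises: UTF-8 bytes read as Windows-1252.
mojibake = "é".encode("utf-8").decode("cp1252")
```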

Generating subtitles from audio

Modern transcription services output directly to most subtitle formats. The typical pipeline is:

  1. Upload or paste the source audio/video
  2. Choose the output format(s): SRT, VTT, TXT, or DOCX
  3. Download the generated file and attach to your video

Vocova supports export to SRT, VTT, DRCX (Descript), plain text, and timestamped PDF, covering every practical need for content creators and most professional workflows. If you need TTML, ASS, or STL, the standard approach is to export to SRT first and then convert using the tools listed above.

For a deeper walkthrough of generating subtitles from video, see the AI subtitle generators guide.

Frequently asked questions

What is the most widely used subtitle format?

SRT is the most widely used subtitle format in 2026. It is supported by essentially every video player, video editor, and streaming platform, and its simplicity makes it the default output of most transcription tools.

What is the difference between SRT and VTT?

SRT is the legacy universal format with minimal styling and no positioning. WebVTT is the modern HTML5 standard with full CSS styling, positioning, and chapter markers. WebVTT uses periods in timestamps (.), while SRT uses commas (,).

Does YouTube support WebVTT?

Yes. YouTube Studio accepts WebVTT, SRT, SBV, TTML, SAMI, and several other formats on upload. SRT is the most common choice because it is the simplest to generate and edit.

Can I use subtitle files for accessibility compliance?

Yes. All formats listed can serve as closed captions when they include speaker identification and non-speech sounds ([music playing], [door slams]). The Transcription for accessibility post covers the specific WCAG requirements.

What format does Netflix require?

Netflix requires IMSC 1.1, a profile of TTML. Delivery specifications mandate specific styling, timing, and metadata constraints that go beyond generic TTML. Netflix publishes its Timed Text Style Guide for vendors who need to meet the spec.

Is ASS still used in 2026?

Yes, ASS remains the standard for anime fansubs, karaoke-style subtitles, and any use case needing typography control beyond what VTT offers. It has not been deprecated and continues to receive community tooling updates.

How do I add styling to SRT?

SRT supports a small set of inline HTML tags: <i>, <b>, <u>, and <font color="...">. Anything more advanced requires switching to VTT or ASS.

Summary

The right subtitle format depends on where your file is going, not on personal preference. SRT for universal compatibility, WebVTT for the web, ASS for styled typography, TTML for OTT delivery, STL for European broadcast, and SBV almost never. Encode everything as UTF-8 without BOM, and convert between formats using FFmpeg or Subtitle Edit when a platform requires a specific input.

If you are starting a transcription workflow, generate SRT or VTT first -- they cover the large majority of content creator needs, and every other format is one conversion away.

Sources and further reading

  • W3C Timed Text Markup Language 2 specification
  • W3C WebVTT specification
  • EBU Tech 3264: Subtitling data exchange format
  • SubRip Text (SRT) format overview, Matroska docs
  • IMSC 1.1 specification (W3C)
  • SRT vs VTT detailed comparison
  • Closed captions vs subtitles
