diff --git a/playbooks/subtitles/STOPGAP-SUBS.md b/playbooks/subtitles/STOPGAP-SUBS.md index 860cf7c..6d3d0b2 100644 --- a/playbooks/subtitles/STOPGAP-SUBS.md +++ b/playbooks/subtitles/STOPGAP-SUBS.md @@ -6,6 +6,16 @@ v4 WhisperX (ROADMAP H5) lands on the friend RTX 4080 node, **regenerate every show on this list** with proper-noun-prompted transcription and replace the sidecars in place. Keep this file as the v4 worklist. +**NOT a stop-gap** (do NOT log here): embedded original-release bitmap +subs (PGS, VobSub, `dvd_subtitle`). Per [`STYLE.md`](STYLE.md) tier 2, +those are first-class — they're the original studio render and ship +as-is. Examples currently in library that are correct, not stop-gap: +- Lilo & Stitch (2002) — 2× embedded English PGS +- Archer (2009) S02 — 3× embedded DVD-bitmap (eng/spa/fre) + +Optional `pgsrip` OCR sidecar for those is a UX nicety, not a +correctness fix — see STYLE.md "OCR bitmap → text". + ## Active stop-gaps | Show | Eps subbed | Source path | Why stop-gap | Owner verdict | Logged | diff --git a/playbooks/subtitles/STYLE.md b/playbooks/subtitles/STYLE.md index ac7dd69..998eead 100644 --- a/playbooks/subtitles/STYLE.md +++ b/playbooks/subtitles/STYLE.md @@ -3,17 +3,41 @@ The bar every fetch should hit. If a recipe step would violate any of these, stop and ask before proceeding. +## Source priority (highest → lowest) + +Accuracy beats format. Use this tier ladder before reaching for OCR/AI: + +1. **Original release text subs** (`.srt`/`.ass` from disc/streamer rip, + embedded or sidecar). Ground truth — ship as-is. +2. **Original release bitmap subs** (PGS, VobSub, `dvd_subtitle`, + embedded). **Acceptable in their native form** — they ARE the original + words from the source master, just rendered as images. Jellyfin server + burns them in for clients that can't render natively. Optionally + OCR-extract a `.srt` sidecar alongside (do NOT replace the embedded + stream) when client-side styling, search, or mobile rendering matters. +3. **Trusted text rips** from OpenSubtitles (verified uploads, hash-match + or high-download-count + frame-rate-match). +4. **WhisperX rebuild** with `--initial_prompt` proper-nouns yaml — only + when no original exists (e.g. user-uploaded YT content with auto-CC). + Logged in [`STOPGAP-SUBS.md`](STOPGAP-SUBS.md) until cleared. + +Tier 1 and 2 are first-class. Tier 3 is a fallback. Tier 4 is a stop-gap. + ## What lands on disk -- **Exactly one** English subtitle file per episode. -- Filename: `.eng.srt` — no language-region tags (`en-US`), - no flag stack on regular subs (no `.sdh`, no `.forced`, no `.cc` unless - there genuinely is no plain-English option). -- Format: `.srt` (SubRip text). Skip `.ass`, `.ssa`, `.vtt`, `.sup`, `.idx` - unless the source has nothing else; convert with `ffmpeg -map 0:s:0 -c:s srt` - in that case. -- Encoding: UTF-8. Re-encode with `iconv` if a sidecar comes back as cp1252 - / windows-1250. +- **At least one** English subtitle stream per episode (embedded OR + sidecar — not both required). +- Sidecar filename when used: `.eng.srt` — no + language-region tags (`en-US`), no flag stack on regular subs (no + `.sdh`, no `.forced`, no `.cc` unless there genuinely is no + plain-English option). +- Sidecar format: `.srt` (SubRip text). For embedded: keep the original + codec (`subrip`, `ass`, `pgs`, `vobsub`, `dvd_subtitle`) — do NOT + re-mux just to convert format. Convert only when extracting to disk: + text codecs via `ffmpeg -map 0:s:0 -c:s srt`, bitmap codecs via OCR + (see "OCR bitmap → text" below). +- Encoding (sidecar): UTF-8. Re-encode with `iconv` if a sidecar comes + back as cp1252 / windows-1250. ## What gets picked @@ -40,12 +64,41 @@ After fetch, **eyeball-verify one sample episode per show** plays in sync ## What doesn't ship - Multiple language tracks per episode (no German/French alternatives — - English-only library). + English-only library). Drop non-English embedded streams via + `mkvpropedit` only if user complains about client picker clutter; do + NOT silently strip them on import. - Director's commentary, behind-the-scenes, song-only subs. - Subs that cover only a partial runtime (the partial-cover heuristic isn't scripted yet; spot-check duration vs episode runtime if a srt looks short). - "All-episodes-in-one" mega-packs treated as a single episode's sidecar. +## OCR bitmap → text (optional, tier-2 augmentation) + +Embedded PGS/VobSub/`dvd_subtitle` are acceptable as-is (tier 2). OCR +becomes worthwhile when: (a) client repeatedly transcodes due to bitmap +burn-in (CPU pressure on nullstone — no GPU transcode available), (b) +user wants to restyle font/size on a specific show, (c) mobile client +renders bitmap subs poorly. + +Recipe (`pgsrip`, batch-friendly, Tesseract-backed): + +```bash +pip install pgsrip +# PGS: extract embedded to .sup +ffmpeg -i input.mkv -map 0:s:0 -c copy subs.sup +pgsrip --language eng subs.sup # -> subs.srt + +# VobSub/dvd_subtitle: extract to .idx + .sub +mkvextract input.mkv tracks 2:subs.idx +pgsrip subs.idx # -> subs.srt +``` + +OCR accuracy ~90–95 % raw, ~95–98 % after Subtitle Edit cleanup. Source +words are correct (it's transcription of original render, not Whisper +hallucination) — only font recognition fights you. Resulting `.srt` +ships as sidecar **alongside** the embedded bitmap stream, not as a +replacement. + ## How the UI presents subs The detail-page subtitle dropdown is shimmed via @@ -62,6 +115,14 @@ Revert: `bin/revert-sub-label-shim.sh`. - Boutique-release-group quality bar from [`README.md`](../../README.md): "every show and film is the best version I could put together." +- **Original-release subs > pretty format.** The DVD/BD/streamer master + is the canonical script — bitmap or text, those are the words the + studio shipped. An OCR'd or AI-rebuilt sidecar is a derivative that + introduces error (font confusion, mistranscription); the original + doesn't. Especially true for older shows (Futurama S1–S3, Archer + early seasons) where the master is the only authoritative source and + upscale artifacts already dominate the visual experience — bitmap + subs match the source vibe. - One-language library = one stream per ep = no need to expose codec or source in UI. - SDH/CC adds `[door slams]`, `[music]` etc. — distracting on first watch