Original-release bitmap subs (PGS, VobSub, dvd_subtitle) are first-class, not stop-gaps. They're the canonical studio render: bitmap encoding is just a format choice, not a quality compromise. OCR'd or AI-rebuilt sidecars introduce transcription error that the source doesn't have.

STYLE.md changes:
- New "Source priority" section with 4 tiers: original text > original bitmap > trusted text rips > WhisperX rebuild.
- "What lands on disk" loosened: at least one English stream (embedded OR sidecar), keep embedded codec as-is, sidecar still .srt.
- New "OCR bitmap -> text" section documenting pgsrip recipe as an optional UX-nicety augmentation, not a correctness fix.
- "Why these rules" now explains why original > pretty (esp. for older shows like Futurama S1-3 / early Archer where the master is the only authoritative source and upscale artifacts already dominate).

STOPGAP-SUBS.md: header note clarifying bitmap-from-disc is NOT a stop-gap; lists Lilo & Stitch (2002) and Archer (2009) S02 as examples of correct-as-shipped library entries.
Stop-gap subs — pending Whisper cross-ref
Shows whose current subtitles ship from a path that explicitly violates
STYLE.md. Quality is "acceptable, not great" (~85 %). When
v4 WhisperX (ROADMAP H5) lands on the friend RTX 4080 node, regenerate
every show on this list with proper-noun-prompted transcription and
replace the sidecars in place. Keep this file as the v4 worklist.
NOT a stop-gap (do NOT log here): embedded original-release bitmap
subs (PGS, VobSub, dvd_subtitle). Per STYLE.md tier 2,
those are first-class — they're the original studio render and ship
as-is. Examples currently in library that are correct, not stop-gap:
- Lilo & Stitch (2002) — 2× embedded English PGS
- Archer (2009) S02 — 3× embedded DVD-bitmap (eng/spa/fre)
Optional pgsrip OCR sidecar for those is a UX nicety, not a
correctness fix — see STYLE.md "OCR bitmap → text".
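A minimal sketch of how the "not a stop-gap" check could be automated before logging a show here. It assumes the input is the JSON text produced by `ffprobe -v error -show_streams -of json <file>`, and that PGS and VobSub/DVD bitmap streams appear under the codec names `hdmv_pgs_subtitle` and `dvd_subtitle`. The function name `bitmap_sub_streams` is hypothetical and not part of this repo.

```python
import json

# Codec names as ffprobe reports them for image-based subtitle streams.
# Assumption: these two cover the PGS and VobSub / dvd_subtitle cases named above.
BITMAP_CODECS = frozenset({"hdmv_pgs_subtitle", "dvd_subtitle"})


def bitmap_sub_streams(ffprobe_json: str) -> list[tuple[str, str]]:
    """Return (language, codec) for every embedded bitmap subtitle stream.

    A non-empty result means the file already carries original-release
    bitmap subs, so it should not be logged as a stop-gap on that basis.
    """
    data = json.loads(ffprobe_json)
    found = []
    for stream in data.get("streams", []):
        if stream.get("codec_type") != "subtitle":
            continue  # skip video, audio and attachment streams
        codec = stream.get("codec_name", "")
        if codec in BITMAP_CODECS:
            # Untagged streams fall back to "und" (undetermined).
            lang = stream.get("tags", {}).get("language", "und")
            found.append((lang, codec))
    return found
```

For an entry like Archer (2009) S02 this would be expected to list three `dvd_subtitle` streams; text-based streams such as `subrip` are ignored because the check only covers the bitmap tier.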
Active stop-gaps
| Show | Eps subbed | Source path | Why stop-gap | Owner verdict | Logged |
|---|---|---|---|---|---|
| Sassy the Sasquatch (2022) | S01 5/5 | v3.5 YouTube auto-CC | lowercase, no punctuation, names mangled (Sassy → sasha), profanity = [ __ ] | "85 % the way there, acceptable, fine"; keep until v4 | 2026-05-10 |
When more Big Lez universe shows ship via v3.5
Same channel hosts these — when subbed via the v3.5 yt-dlp path, append to the table above:
- The Donny & Clarence Show (2024)
- The Big Lez Saga (2022)
- The Mike Nolan Show (2016) — but try the YT "COMPLETE SEASON | SUBTITLES" upload first for hand-typed CCs before falling back to auto-CC
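The "hand-typed CCs first, auto-CC as fallback" order for these shows could be sketched as two yt-dlp invocations. This is an illustration only: the actual flags used by the v3.5 path are not recorded in this file, the helper names are hypothetical, and the `en` language selector is an assumption (auto-captions may need a broader pattern).

```python
def ytdlp_sub_args(url: str, auto: bool) -> list[str]:
    """Build a yt-dlp argv that fetches subtitles only, no video.

    auto=False asks for uploader-provided (hand-typed) subtitles,
    auto=True asks for YouTube's automatic captions.
    """
    sub_flag = "--write-auto-subs" if auto else "--write-subs"
    return [
        "yt-dlp",
        "--skip-download",         # subtitles only
        sub_flag,
        "--sub-langs", "en",       # assumption: plain "en" matches the track
        "--convert-subs", "srt",   # sidecars stay .srt
        url,
    ]


def fetch_order(url: str) -> list[list[str]]:
    """Commands to try in order: hand-typed CCs, then auto-CC."""
    return [ytdlp_sub_args(url, auto=False), ytdlp_sub_args(url, auto=True)]
```

A caller would run the first command, check whether an .srt file appeared, and only run the second if none did. Anything sourced from the second command still belongs in the table above.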
v4 WhisperX rebuild plan
When the friend node (100.64.0.3, per memory project_friend_gpu.md) is
back online:
- Install WhisperX on the node (CUDA 12 + cuDNN 9 + faster-whisper + pyannote VAD).
- For each show in the table above, write playbooks/subtitles/prompts/<show>.yaml with the recurring proper nouns the YT auto-CC mangled.
- Run lib/sub-whisperx-fetch.py (TBD, ROADMAP H5) per show. Each episode: pull mkv → ffmpeg extract 16k mono wav → WhisperX large-v3 with --initial_prompt from the yaml → SRT → SSH push to nullstone with library filename, overwriting the v3.5 sidecar in place.
- Tick off the row from the table; move it to a "Cleared via v4" archive section below this one (kept as record).
- Library scan; verify Jellyfin still reports 1 external eng sub stream per ep (no dupes from v3.5 + v4 stacking).
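Since lib/sub-whisperx-fetch.py is still TBD, the per-episode steps above could be sketched roughly as follows. Everything here is an assumption rather than the planned implementation: the helper names are hypothetical, the WhisperX CLI flags (`--model`, `--language`, `--initial_prompt`, `--output_format`, `--output_dir`) should be checked against the installed version, and the duplicate check looks at filenames on disk as a proxy for what Jellyfin reports.

```python
from pathlib import Path


def ffmpeg_extract_cmd(mkv: Path, wav: Path) -> list[str]:
    """argv that extracts 16 kHz mono 16-bit PCM audio from an episode."""
    return [
        "ffmpeg", "-y", "-i", str(mkv),
        "-vn",                 # drop video
        "-ac", "1",            # mono
        "-ar", "16000",        # 16 kHz sample rate
        "-c:a", "pcm_s16le",
        str(wav),
    ]


def whisperx_cmd(wav: Path, out_dir: Path, proper_nouns: list[str]) -> list[str]:
    """argv for a WhisperX large-v3 run prompted with the show's proper nouns."""
    prompt = ", ".join(proper_nouns)  # nouns would come from the per-show yaml
    return [
        "whisperx", str(wav),
        "--model", "large-v3",
        "--language", "en",
        "--initial_prompt", prompt,
        "--output_format", "srt",
        "--output_dir", str(out_dir),
    ]


def srt_sidecars(episode_name: str, filenames: list[str]) -> list[str]:
    """Sidecar .srt files that belong to one episode file.

    More than one hit suggests a v3.5 sidecar was left next to the v4 one
    instead of being overwritten in place.
    """
    stem = Path(episode_name).stem
    return sorted(
        name for name in filenames
        if name.startswith(stem + ".") and name.endswith(".srt")
    )
```

Building the commands as argv lists keeps them easy to log and to pass to `subprocess.run` without shell quoting issues, which matters for show names containing spaces or ampersands.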
Cleared via v4 (archive)
(empty — populate as v4 rebuilds land)