From fba9a5bfebbb0a5e3fab5a76f0592da946885d3d Mon Sep 17 00:00:00 2001
From: s8n
Date: Sun, 10 May 2026 01:18:27 +0100
Subject: [PATCH] =?UTF-8?q?processes/subtitles:=20STOPGAP-SUBS.md=20tracke?=
 =?UTF-8?q?r=20for=20v3.5=20=E2=86=92=20v4=20cross-ref?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Owner accepted Sassy the Sasquatch S01 v3.5 YouTube-auto-CC subs as
'85 % acceptable, fine, not great' but flagged them for v4 WhisperX
rebuild. Adds a single worklist file (STOPGAP-SUBS.md) so every show
that ships via the v3.5 path gets logged for the eventual v4 sweep
instead of being silently forgotten.

Sassy run log gets a STOP-GAP banner at the top pointing to the new
worklist. README.md gets a stop-gap-exception note alongside the
STYLE.md hard-prereq paragraph. ROADMAP H5 now points at the worklist
file as the canonical source of which shows v4 needs to regenerate.
---
 ROADMAP.md                                    |  2 +-
 processes/subtitles/README.md                 |  5 ++
 processes/subtitles/STOPGAP-SUBS.md           | 46 +++++++++++++++++++
 .../subtitles/runs/sassy-the-sasquatch.md     |  4 ++
 4 files changed, 56 insertions(+), 1 deletion(-)
 create mode 100644 processes/subtitles/STOPGAP-SUBS.md

diff --git a/ROADMAP.md b/ROADMAP.md
index 2ee145b..41fa188 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -28,7 +28,7 @@ Last revised: **2026-05-08**
 | H2 | GPU transcode (nvidia driver kernel module + container toolkit + SecureBoot signing) | L | **owner sudo + reboot** |
 | H3 | Apply `bin/force-english-all-users.sh` (German Play button breaks UX for non-English browsers) | S | none — owner runs |
 | H4 | Backup `/home/docker/jellyfin/config/` off-host (no automated backup yet) | M | strategy decision |
-| H5 | **v4 subtitle path: WhisperX large-v3 on friend RTX 4080 node**. Regenerate Sassy + Big Lez Saga + Donny & Clarence + Mike Nolan with proper-noun prompts (replaces v3.5 YT auto-CC stop-gap). New helper at `processes/subtitles/lib/sub-whisperx-fetch.py`. WhisperX install on 100.64.0.3 (per memory `project_friend_gpu.md`, currently offline 2d); per-show prompt yaml at `processes/subtitles/prompts/.yaml` (recurring proper nouns). Expected 4–6 % WER vs ~12 % for YT auto-CC; restores STYLE.md "best quality" bar. See `processes/subtitles/runs/sassy-the-sasquatch.md` for context. | M | friend node back online + WhisperX setup |
+| H5 | **v4 subtitle path: WhisperX large-v3 on friend RTX 4080 node**. Worklist = `processes/subtitles/STOPGAP-SUBS.md` (currently Sassy 5/5, will grow as more Big Lez universe shows ship via v3.5). Replaces v3.5 YT auto-CC stop-gap with proper-noun-prompted transcription. New helper at `processes/subtitles/lib/sub-whisperx-fetch.py`. WhisperX install on `100.64.0.3` (per memory `project_friend_gpu.md`, currently offline 2d); per-show prompt yaml at `processes/subtitles/prompts/.yaml`. Expected 4–6 % WER vs ~12 % for YT auto-CC; restores STYLE.md "best quality" bar. | M | friend node back online + WhisperX setup |
 
 ## 🟨 Open — Medium value
 
diff --git a/processes/subtitles/README.md b/processes/subtitles/README.md
index 03af4c3..e738eaa 100644
--- a/processes/subtitles/README.md
+++ b/processes/subtitles/README.md
@@ -13,6 +13,11 @@ how Jellyfin and the OpenSubtitles plugin work together lives in
 > AI / no Forced), best-quality release. The picker logic in v1/v2/v3
 > mirrors that bar; if a step would violate it, stop and ask before
 > downloading.
+>
+> Stop-gap exception: when the only available source is the v3.5 YouTube
+> auto-CC path (lowercase, censored, mangled names), ship the sub but
+> **add the show to [`STOPGAP-SUBS.md`](STOPGAP-SUBS.md)** so v4 WhisperX
+> picks it up later.
 
 ---
 
diff --git a/processes/subtitles/STOPGAP-SUBS.md b/processes/subtitles/STOPGAP-SUBS.md
new file mode 100644
index 0000000..e3a74fb
--- /dev/null
+++ b/processes/subtitles/STOPGAP-SUBS.md
@@ -0,0 +1,46 @@
+# Stop-gap subs — pending Whisper cross-ref
+
+Shows whose current subtitles ship from a path that explicitly violates
+[`STYLE.md`](STYLE.md). Quality is "acceptable, not great" (~85 %). When
+v4 WhisperX (ROADMAP H5) lands on the friend RTX 4080 node, **regenerate
+every show on this list** with proper-noun-prompted transcription and
+replace the sidecars in place. Keep this file as the v4 worklist.
+
+## Active stop-gaps
+
+| Show | Eps subbed | Source path | Why stop-gap | Owner verdict | Logged |
+|---|---|---|---|---|---|
+| Sassy the Sasquatch (2022) | S01 5/5 | v3.5 YouTube auto-CC | lowercase, no punctuation, names mangled (`Sassy → sasha`), profanity = `[ __ ]` | "85 % the way there, acceptable, fine" — keep until v4 | 2026-05-10 |
+
+## When more Big Lez universe shows ship via v3.5
+
+Same channel hosts these — when subbed via the v3.5 yt-dlp path, append
+to the table above:
+
+- The Donny & Clarence Show (2024)
+- The Big Lez Saga (2022)
+- The Mike Nolan Show (2016) — but **try the YT "COMPLETE SEASON | SUBTITLES"
+  upload first** for hand-typed CCs before falling back to auto-CC
+
+## v4 WhisperX rebuild plan
+
+When the friend node (`100.64.0.3`, per memory `project_friend_gpu.md`) is
+back online:
+
+1. Install WhisperX on the node (CUDA 12 + cuDNN 9 + faster-whisper +
+   pyannote VAD).
+2. For each show in the table above, write
+   `processes/subtitles/prompts/.yaml` with the recurring proper
+   nouns the YT auto-CC mangled.
+3. Run `lib/sub-whisperx-fetch.py` (TBD, ROADMAP H5) per show. Each
+   episode: pull mkv → ffmpeg extract 16k mono wav → WhisperX large-v3
+   with `--initial_prompt` from the yaml → SRT → SSH push to nullstone
+   with library filename, **overwriting the v3.5 sidecar in place**.
+4. Tick off the row from the table; move it to a "Cleared via v4" archive
+   section below this one (kept as record).
+5. Library scan; verify Jellyfin still reports 1 external eng sub stream
+   per ep (no dupes from v3.5 + v4 stacking).
+
+## Cleared via v4 (archive)
+
+(empty — populate as v4 rebuilds land)
diff --git a/processes/subtitles/runs/sassy-the-sasquatch.md b/processes/subtitles/runs/sassy-the-sasquatch.md
index b341e1e..c9ff21d 100644
--- a/processes/subtitles/runs/sassy-the-sasquatch.md
+++ b/processes/subtitles/runs/sassy-the-sasquatch.md
@@ -1,5 +1,9 @@
 # Subtitle run — `Sassy the Sasquatch (2022)`
 
+> ⚠ **STOP-GAP — needs v4 WhisperX cross-ref.** Owner accepted current
+> subs as "85 %, acceptable" but tracked for full rebuild when v4 lands
+> (ROADMAP H5). See [`STOPGAP-SUBS.md`](../STOPGAP-SUBS.md).
+
 Recipe version: v3.5 — YouTube auto-CC via yt-dlp + cleaner (v4 WhisperX planned, see ROADMAP)
 Run date: 2026-05-10
 Operator: Claude Code @ onyx session, ai-lab cwd
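
Review note on step 3 of the v4 WhisperX rebuild plan in `STOPGAP-SUBS.md`: the helper `lib/sub-whisperx-fetch.py` is still TBD, so the following is only a rough sketch of how the per-episode commands might be assembled, not the planned implementation. The function names, the `.en.srt` sidecar suffix, the `--language en` choice, and passing the proper nouns as a plain list (rather than reading the prompt yaml, whose schema is not defined in this patch) are all assumptions made for illustration. The sketch only builds argument lists; it does not run ffmpeg or WhisperX, and it leaves out the pull and SSH push steps.

```python
from pathlib import Path


def ffmpeg_extract_cmd(mkv: Path, wav: Path) -> list[str]:
    """Build the ffmpeg call that extracts 16 kHz mono PCM audio from an mkv."""
    return [
        "ffmpeg", "-y",        # overwrite a stale wav from an earlier run
        "-i", str(mkv),
        "-vn",                 # drop the video stream
        "-ac", "1",            # mono
        "-ar", "16000",        # 16 kHz sample rate
        "-c:a", "pcm_s16le",   # 16-bit PCM wav
        str(wav),
    ]


def whisperx_cmd(wav: Path, out_dir: Path, proper_nouns: list[str]) -> list[str]:
    """Build the WhisperX call; the prompt is the show's proper nouns, comma-joined."""
    prompt = ", ".join(proper_nouns)  # assumed prompt shape, not defined by the patch
    return [
        "whisperx", str(wav),
        "--model", "large-v3",
        "--language", "en",            # assumption: all listed shows are English
        "--initial_prompt", prompt,
        "--output_format", "srt",
        "--output_dir", str(out_dir),
    ]


def sidecar_path(mkv: Path, lang: str = "en") -> Path:
    """Name the SRT after the library file so it replaces the v3.5 sidecar.

    The '.en.srt' suffix is an assumption; it has to match whatever the v3.5
    path wrote, otherwise Jellyfin would see two external streams.
    """
    return mkv.with_suffix(f".{lang}.srt")
```

Keeping the command builders free of side effects means the exact flags can be checked on onyx before the friend node is back online; the real helper would pass the lists to `subprocess.run(cmd, check=True)`.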