legacy-arrflix/processes/subtitles/STOPGAP-SUBS.md
s8n fba9a5bfeb processes/subtitles: STOPGAP-SUBS.md tracker for v3.5 → v4 cross-ref
Owner accepted Sassy the Sasquatch S01 v3.5 YouTube-auto-CC subs as
'85 % acceptable, fine, not great' but flagged them for v4 WhisperX
rebuild. Adds a single worklist file (STOPGAP-SUBS.md) so every show
that ships via the v3.5 path gets logged for the eventual v4 sweep
instead of being silently forgotten.

Sassy run log gets a STOP-GAP banner at the top pointing to the new
worklist. README.md gets a stop-gap-exception note alongside the
STYLE.md hard-prereq paragraph. ROADMAP H5 now points at the worklist
file as the canonical source of which shows v4 needs to regenerate.
2026-05-10 01:18:27 +01:00


# Stop-gap subs — pending Whisper cross-ref
Shows whose current subtitles ship from a path that explicitly violates
[`STYLE.md`](STYLE.md). Quality is "acceptable, not great" (~85 %). When
v4 WhisperX (ROADMAP H5) lands on the friend RTX 4080 node, **regenerate
every show on this list** with proper-noun-prompted transcription and
replace the sidecars in place. Keep this file as the v4 worklist.
## Active stop-gaps
| Show | Eps subbed | Source path | Why stop-gap | Owner verdict | Logged |
|---|---|---|---|---|---|
| Sassy the Sasquatch (2022) | S01 5/5 | v3.5 YouTube auto-CC | lowercase, no punctuation, names mangled (`Sassy → sasha`), profanity = `[ __ ]` | "85 % the way there, acceptable, fine" — keep until v4 | 2026-05-10 |
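
Since this file doubles as the v4 worklist, a rebuild script could read the table above instead of keeping its own show list. A minimal sketch, assuming the `## Active stop-gaps` heading and the column names stay as they are here; `parse_active_stopgaps` is a hypothetical helper, not something that exists in `lib/` today:

```python
def parse_active_stopgaps(markdown: str) -> list[dict[str, str]]:
    """Return one dict per data row of the table under '## Active stop-gaps'."""
    rows: list[dict[str, str]] = []
    header = None
    in_section = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            # Only rows under the active heading count; the archive is skipped.
            in_section = stripped == "## Active stop-gaps"
            continue
        if not in_section or not stripped.startswith("|"):
            continue
        cells = [c.strip() for c in stripped.strip("|").split("|")]
        if header is None:
            header = cells  # first table line carries the column names
        elif all(set(c) <= set("-: ") for c in cells):
            continue  # the |---|---| separator row
        else:
            rows.append(dict(zip(header, cells)))
    return rows
```

Calling it on the contents of this file and reading `row["Show"]` from each result would give the list of shows still waiting on v4. Cells containing a literal `|` are not handled.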
## When more Big Lez universe shows ship via v3.5
The same channel hosts the shows below. When any of them is subbed via the
v3.5 yt-dlp path, append it to the table above:
- The Donny & Clarence Show (2024)
- The Big Lez Saga (2022)
- The Mike Nolan Show (2016) — but **try the YT "COMPLETE SEASON | SUBTITLES"
upload first** for hand-typed CCs before falling back to auto-CC
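
The "hand-typed first, auto-CC as fallback" rule can be checked per video before anything is downloaded. A rough sketch, assuming the input is the info dict that yt-dlp's `YoutubeDL.extract_info(url, download=False)` returns, where uploader-provided tracks sit under `subtitles` and machine-generated ones under `automatic_captions`; `pick_caption_source` is a hypothetical helper name:

```python
from typing import Optional


def _has_lang(tracks: dict, lang: str) -> bool:
    # Tracks are keyed by language code; accept variants such as "en-US".
    return any(code == lang or code.startswith(lang + "-") for code in tracks)


def pick_caption_source(info: dict, lang: str = "en") -> Optional[str]:
    """Prefer uploader-provided captions; fall back to auto-CC; else None."""
    if _has_lang(info.get("subtitles") or {}, lang):
        return "manual"
    if _has_lang(info.get("automatic_captions") or {}, lang):
        return "auto"
    return None
```

Only a result of `"auto"` would need a row in the table above; a `"manual"` hit means the upload carries hand-typed CCs.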
## v4 WhisperX rebuild plan
When the friend node (`100.64.0.3`, per memory `project_friend_gpu.md`) is
back online:
1. Install WhisperX on the node (CUDA 12 + cuDNN 9 + faster-whisper +
pyannote VAD).
2. For each show in the table above, write
`processes/subtitles/prompts/<show>.yaml` with the recurring proper
nouns the YT auto-CC mangled.
3. Run `lib/sub-whisperx-fetch.py` (TBD, ROADMAP H5) per show. Each
episode: pull mkv → ffmpeg extract 16 kHz mono WAV → WhisperX large-v3
with `--initial_prompt` from the yaml → SRT → SSH push to nullstone
with library filename, **overwriting the v3.5 sidecar in place**.
4. Tick the row off in the table, then move it to the "Cleared via v4"
archive section below (kept as a record).
5. Run a library scan, then verify Jellyfin still reports exactly 1 external
eng sub stream per ep (no dupes from v3.5 + v4 stacking).
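
Steps 3 and 5 could be sketched as below. This is not the contents of `lib/sub-whisperx-fetch.py` (still TBD); the function names are hypothetical, the prompt is assumed to be a plain comma-separated list of proper nouns taken from the per-show yaml (whose schema is not fixed yet), and the sidecar check assumes files are named `<video stem>.<lang>.srt` with `en` or `eng` as the language tag. The ffmpeg and `whisperx` flags are the commonly documented ones, so check them against the versions installed on the node.

```python
from pathlib import PurePosixPath


def build_initial_prompt(proper_nouns: list[str]) -> str:
    # Assumption: a comma-separated name list is enough to bias the decoder.
    return ", ".join(proper_nouns)


def ffmpeg_extract_cmd(mkv: str, wav: str) -> list[str]:
    # Drop video, downmix to mono, resample to 16 kHz, write 16-bit PCM WAV.
    return ["ffmpeg", "-y", "-i", mkv, "-vn", "-ac", "1",
            "-ar", "16000", "-c:a", "pcm_s16le", wav]


def whisperx_cmd(wav: str, out_dir: str, prompt: str) -> list[str]:
    # Argument list for the whisperx CLI; SRT output lands in out_dir.
    return ["whisperx", wav, "--model", "large-v3", "--language", "en",
            "--initial_prompt", prompt,
            "--output_format", "srt", "--output_dir", out_dir]


def eng_sidecars(video: str, filenames: list[str]) -> list[str]:
    """List English-tagged .srt sidecars for one video (step 5 dupe check)."""
    stem = PurePosixPath(video).stem
    hits = []
    for name in filenames:
        if not (name.startswith(stem + ".") and name.lower().endswith(".srt")):
            continue
        # Whatever sits between the stem and ".srt" is the tag part.
        tags = name[len(stem) + 1 : -len(".srt")].lower().split(".")
        if "en" in tags or "eng" in tags:
            hits.append(name)
    return hits
```

Each command list can be handed to `subprocess.run(cmd, check=True)`. After the push, `len(eng_sidecars(video, listing)) == 1` for every episode would be the on-disk counterpart of the Jellyfin check in step 5; more than one hit means a v3.5 file was left beside the v4 one instead of being overwritten.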
## Cleared via v4 (archive)
(empty — populate as v4 rebuilds land)