# Subtitle run: `Sassy the Sasquatch (2022)`

- Recipe version: v3.5, YouTube auto-CC via yt-dlp + cleaner (v4 WhisperX planned, see ROADMAP)
- Run date: 2026-05-10
- Operator: Claude Code @ onyx session, ai-lab cwd

## Source

| Field | Value |
|---|---|
| Episodes | 5 (S01 only) |
| Container | mkv |
| Video | AV1 Main, 1920×1080, 29.97 fps |
| Audio | `eng` Opus stereo (default) |
| Embedded subs | none (only font / cover-art attachments) |
| Existing sidecars | none |
| Runtime | ~11:20 per episode |
| Distribution | YouTube (THE BIG LEZ SHOW OFFICIAL channel, creator: Jarrad Wright) |

Niche-show indie animation. The same channel hosts Donny & Clarence Show, Mike Nolan Show, and Big Lez Saga; all four shows in our library are Jarrad Wright productions distributed YouTube-first.

## Series + library context

- Series Id: `b2d1afd8a4a30c59adb42ccaf47376c2`
- Library: `767bffe4f11c93ef34b805451a696a4e` (TV Shows, `/media/tv`)
- IMDB series: `tt21209936`
- TVDB series: `421839`
- Per-episode IMDB ids: only S01E01 (`tt21215354`); the rest are blank in TVDB

## Coverage probe: paid + free providers

Three parallel research agents (2026-05-10) checked every realistic source before falling back to YouTube:

| Provider | Hits |
|---|---|
| OpenSubtitles.com REST (`parent_imdb_id=21209936`) | 1: `SASSY THE SASQUATCH.Web-DL.1080p.en` (S01E01, **HI-flagged**) |
| OpenSubtitles.org legacy XML-RPC | 0 (account login 401 anyway) |
| Addic7ed | 0 |
| SubDL | 0 (`subtitles_count: 0`) |
| SubSource (Subscene successor) | 0 |
| Podnapisi | 0 |
| OS VIP upgrade | **would not unlock anything**: VIP is download-cap relief, not coverage. Same catalog as free. |

Conclusion: nothing exists outside YouTube. Buying VIP would not help; the honest path is auto-generated subs.

## Outcome

| Season | Eps | Subs fetched | Quality | Notes |
|---|---|---|---|---|
| S01 | 5 | 5 / 5 | YT auto-CC stop-gap (lowercase, no punctuation, names mangled) | Cleaned via `lib/yt-clean.py`. v4 WhisperX rebuild planned |

Net: **5 / 5 (100 %)**, but at the lowest tier of the USER-G quality bar.

## Pipeline used

1. `yt-dlp --skip-download --write-auto-subs --sub-langs en-orig` against the official Sassy playlist (`PLGMC7oz7XpmDMGrALMQiNXCi9p7aqkWbj`) → raw VTT per episode in `/tmp/sassy-research/`.
2. `lib/yt-clean.py` collapses the rolling-window VTT (each cue carries 2-3 stale lines plus the freshly-spoken bottom line) into deduplicated SRT.
3. Each cleaned `.srt` is written over SSH (`cat` redirect) to nullstone at `/home/user/media/tv/Sassy the Sasquatch (2022)/Season 01/.eng.srt`, using the library filename.
4. Validation-only library refresh; verified that all 5 eps show exactly 1 external eng sub stream.

The reusable pipeline now lives at `lib/sub-yt-fetch.sh` (wrapper) + `lib/yt-clean.py` (cleaner). The same one-liner handles Donny & Clarence, Mike Nolan, and Big Lez Saga (all on the same channel).

## Known quality issues

- **Lowercase, no punctuation**: YT ASR output verbatim
- **Proper-noun mishears**: "Sassy" → `sasha`, "Big Lez" → `Big Less`
- **Profanity censored as `[ __ ]`**: passthrough from YT
- **Sentence segmentation absent**: cues split on word boundaries

These violate the STYLE.md "best quality" and "clean" rules. This run is documented as an explicit stop-gap; the v4 WhisperX rebuild is meant to restore the quality bar.

## Mike Nolan special-case (deferred)

A YouTube upload titled "MIKE NOLAN SHOW | COMPLETE SEASON | SUBTITLES", posted Oct 2025, carries hand-typed CC tracks. When subbing Mike Nolan, prefer that single video (rip its CC tracks) over the per-episode auto-CC playlist path. Note added to v4 roadmap.

## Followups

- [ ] visually verify one Sassy episode plays in sync (recipe §6); YT auto-cap timing is usually tight but worth a sanity check
- [ ] when v4 WhisperX lands, regenerate Sassy + Donny & Clarence + Big Lez Saga + Mike Nolan in one batch on the 4080 friend node
- [ ] for Mike Nolan, try the "COMPLETE SEASON | SUBTITLES" YT upload before falling back to Whisper
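
## Sketch: rolling-window collapse (pipeline step 2)

For reference, here is a minimal sketch of the kind of collapse that pipeline step 2 describes. It is not the contents of `lib/yt-clean.py`. The function names (`parse_cues`, `collapse`, `to_srt`) and the sample cues are hypothetical, and it assumes the usual YouTube auto-caption layout: no cue identifiers or `NOTE` blocks, inline `<…>` word-timing tags, and the freshly spoken text on the bottom line of each cue.

```python
import re

# Cue timing line, e.g. "00:00:00.160 --> 00:00:02.389 align:start position:0%"
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})")
# Inline tags such as <00:00:00.480> and <c>...</c>
TAG = re.compile(r"<[^>]+>")

# Hypothetical sample in the rolling-window shape (not taken from a real episode).
SAMPLE_VTT = """WEBVTT
Kind: captions
Language: en

00:00:00.160 --> 00:00:02.389 align:start position:0%

hello<00:00:00.480><c> everyone</c>

00:00:02.389 --> 00:00:02.399 align:start position:0%
hello everyone

00:00:02.399 --> 00:00:04.550 align:start position:0%
hello everyone
welcome<00:00:02.800><c> back</c>

00:00:04.550 --> 00:00:04.560 align:start position:0%
welcome back
"""


def parse_cues(vtt):
    """Return [(start, end, [text lines])]; tags stripped, blank lines dropped."""
    cues = []
    current = None
    for raw in vtt.splitlines():
        match = TIMING.match(raw)
        if match:
            current = (match.group(1), match.group(2), [])
            cues.append(current)
        elif current is not None:  # header lines before the first cue are skipped
            text = TAG.sub("", raw).strip()
            if text:
                current[2].append(text)
    return cues


def collapse(cues):
    """Keep only the bottom (fresh) line of each cue, merging stale repeats."""
    entries = []
    for start, end, lines in cues:
        if not lines:
            continue
        fresh = lines[-1]
        if entries and entries[-1][2] == fresh:
            # Same text as the previous entry: extend its end time instead.
            entries[-1] = (entries[-1][0], end, fresh)
        else:
            entries.append((start, end, fresh))
    return entries


def to_srt(entries):
    """Render entries as SRT: 1-based index, comma decimal separator."""
    blocks = []
    for index, (start, end, text) in enumerate(entries, 1):
        timing = f"{start.replace('.', ',')} --> {end.replace('.', ',')}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt(collapse(parse_cues(SAMPLE_VTT))))
```

One limitation of this approach: a line that is genuinely spoken twice in a row is merged into a single entry, because it is indistinguishable from a stale repeat. The sketch also leaves casing, punctuation, and the `[ __ ]` censoring untouched.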