Adds lib/sub-yt-fetch.sh (yt-dlp wrapper) and lib/yt-clean.py (collapses YouTube's rolling-window auto-caption VTT into a flat SRT), for shows distributed YouTube-first that have no community subs anywhere.

Three parallel research agents verified this for the 5 niche shows in the library, covering OpenSubtitles REST, OS legacy, Addic7ed, SubDL, SubSource, and Podnapisi. They also ran a price-vs-coverage analysis of OpenSubtitles VIP. Findings: OS VIP would not have helped on the niche shows, because it is download-cap relief rather than a coverage unlock and uses the same catalog as the free tier. All 4 Jarrad Wright shows in the library (Sassy, Big Lez Saga, Donny & Clarence, Mike Nolan) live on the same channel and have only YouTube auto-CC available.

v3.5 ships those, explicitly violating the STYLE.md 'best quality' rule as a tracked stop-gap. Sassy the Sasquatch S01: 5/5 episodes subbed with cleaned auto-CC. Mike Nolan special case noted: a 'COMPLETE SEASON | SUBTITLES' YT upload from Oct 2025 carries hand-typed CCs. It should be preferred over per-episode auto-CC when subbing that show.

ROADMAP H5 added: v4 WhisperX large-v3 on the friend's RTX 4080 node will regenerate the v3.5 stop-gap with proper-noun-prompted transcription (~4-6% WER vs ~12% for YT auto-CC) and restore the STYLE.md quality bar. H1 (OpenSubtitles credentials) marked done; it was completed 2026-05-09.
# Subtitle run: Sassy the Sasquatch (2022)
- Recipe version: v3.5, YouTube auto-CC via yt-dlp + cleaner (v4 WhisperX planned, see ROADMAP)
- Run date: 2026-05-10
- Operator: Claude Code @ onyx session, ai-lab cwd
## Source
| Field | Value |
|---|---|
| Episodes | 5 (S01 only) |
| Container | mkv |
| Video | AV1 Main, 1920×1080, 29.97 fps |
| Audio | eng Opus stereo (default) |
| Embedded subs | none (only font / cover-art attachments) |
| Existing sidecars | none |
| Runtime | ~11:20 per episode |
| Distribution | YouTube (THE BIG LEZ SHOW OFFICIAL channel, creator: Jarrad Wright) |
Niche indie animation. The same channel also hosts Donny & Clarence Show, Mike Nolan Show, and Big Lez Saga. All four shows in our library are Jarrad Wright productions distributed YouTube-first.
## Series + library context
- Series Id: `b2d1afd8a4a30c59adb42ccaf47376c2`
- Library: `767bffe4f11c93ef34b805451a696a4e` (TV Shows, `/media/tv`)
- IMDB series: `tt21209936`
- TVDB series: `421839`
- Per-episode IMDB ids: only S01E01 (`tt21215354`); the rest are blank in TVDB
## Coverage probe: paid + free providers
Three parallel research agents (2026-05-10) checked every realistic source before falling back to YouTube:
| Provider | Hits |
|---|---|
| OpenSubtitles.com REST (`parent_imdb_id=21209936`) | 1 (`SASSY THE SASQUATCH.Web-DL.1080p.en`, S01E01, HI-flagged) |
| OpenSubtitles.org legacy XML-RPC | 0 (account login 401 anyway) |
| Addic7ed | 0 |
| SubDL | 0 (subtitles_count: 0) |
| SubSource (Subscene successor) | 0 |
| Podnapisi | 0 |
| OS VIP upgrade | would not unlock anything — VIP is download-cap relief, not coverage. Same catalog as free. |
Conclusion: apart from the single HI-flagged S01E01 file on OpenSubtitles.com, nothing exists outside YouTube. Buying VIP would not help; the honest path is auto-generated subs.
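For reference, the OpenSubtitles.com row above comes down to one authenticated GET. The sketch below only builds that request. It assumes the v1 REST search endpoint `https://api.opensubtitles.com/api/v1/subtitles` with an `Api-Key` header. The `User-Agent` string and the helper name `build_os_search` are hypothetical, so check parameter names against the official API docs before relying on it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed v1 search endpoint; verify against the OpenSubtitles REST docs.
API_ROOT = "https://api.opensubtitles.com/api/v1/subtitles"


def build_os_search(imdb_series_id: str, api_key: str, language: str = "en") -> Request:
    """Build (but do not send) a series-level subtitle search request."""
    # The probe above queried parent_imdb_id=21209936, i.e. the IMDB id without "tt".
    numeric_id = int(imdb_series_id.removeprefix("tt"))
    # Parameters listed alphabetically so the query string is deterministic.
    query = urlencode({"languages": language, "parent_imdb_id": numeric_id})
    return Request(
        f"{API_ROOT}?{query}",
        headers={
            "Api-Key": api_key,
            "User-Agent": "subtitle-probe v0.1",  # hypothetical app name
        },
    )
```

Sending it would be `urllib.request.urlopen(build_os_search("tt21209936", key))`. The number of results in the JSON response would then correspond to the Hits column.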
## Outcome
| Season | Eps | Subs fetched | Quality | Notes |
|---|---|---|---|---|
| S01 | 5 | 5 / 5 | YT auto-CC stop-gap (lowercase, no punctuation, names mangled) | Cleaned via lib/yt-clean.py. v4 WhisperX rebuild planned |
Net: 5 / 5 (100 %) — but at the lowest tier of the USER-G quality bar.
## Pipeline used
- `yt-dlp --skip-download --write-auto-subs --sub-langs en-orig` against the official Sassy playlist (`PLGMC7oz7XpmDMGrALMQiNXCi9p7aqkWbj`) → raw VTT per episode in `/tmp/sassy-research/`.
- `lib/yt-clean.py` collapses the rolling-window VTT into deduplicated SRT. In that format, each cue carries 2-3 stale lines plus the freshly spoken bottom line.
- SSH `cat` redirect of each cleaned `.srt` to nullstone at `/home/user/media/tv/Sassy the Sasquatch (2022)/Season 01/<base>.eng.srt`, using the library filename.
- Validation-only library refresh; verified all 5 eps show exactly 1 external eng sub stream.
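The collapse step can be sketched as follows. This is a minimal illustration, not the actual `lib/yt-clean.py`. It assumes YouTube's layout, where each cue's last non-empty line holds the new words and the lines above repeat earlier text. It also assumes the files have no cue identifiers. All function names are hypothetical.

```python
import html
import re

# Cue timing line, e.g. "00:00:02.360 --> 00:00:05.000 align:start position:0%".
# WebVTT hours are optional, so accept both mm:ss.mmm and hh:mm:ss.mmm.
TIMING = re.compile(r"((?:\d+:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d+:)?\d{2}:\d{2}\.\d{3})")
# Inline word timestamps like <00:00:01.234> and <c>...</c> styling tags.
TAG = re.compile(r"<[^>]+>")


def vtt_time_to_ms(stamp: str) -> int:
    parts = stamp.split(":")
    secs, millis = parts[-1].split(".")
    hours = int(parts[0]) if len(parts) == 3 else 0
    minutes = int(parts[-2])
    return ((hours * 60 + minutes) * 60 + int(secs)) * 1000 + int(millis)


def ms_to_srt_time(ms: int) -> str:
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def collapse_rolling_vtt(vtt_text: str) -> list[tuple[int, int, str]]:
    """Keep only the freshly spoken bottom line of each cue; merge exact repeats."""
    cues: list[tuple[int, int, list[str]]] = []
    for raw in vtt_text.splitlines():
        match = TIMING.search(raw)
        if match:
            cues.append((vtt_time_to_ms(match.group(1)), vtt_time_to_ms(match.group(2)), []))
        elif cues:
            # Header lines (WEBVTT, Kind:, Language:) precede the first cue and are skipped.
            cues[-1][2].append(html.unescape(TAG.sub("", raw)).strip())

    merged: list[tuple[int, int, str]] = []
    for start, end, lines in cues:
        text_lines = [line for line in lines if line]
        if not text_lines:
            continue
        fresh = text_lines[-1]  # stale lines sit above; new words are at the bottom
        if merged and merged[-1][2] == fresh:
            # Same text as the previous entry (e.g. a short "hold" cue): extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end), fresh)
        else:
            merged.append((start, end, fresh))
    return merged


def to_srt(entries: list[tuple[int, int, str]]) -> str:
    blocks = []
    for index, (start, end, text) in enumerate(entries, start=1):
        blocks.append(f"{index}\n{ms_to_srt_time(start)} --> {ms_to_srt_time(end)}\n{text}\n")
    return "\n".join(blocks)
```

Applied to a downloaded file, this would be `to_srt(collapse_rolling_vtt(Path(p).read_text()))`. Taking only the bottom line drops the stale rows. Merging identical consecutive text absorbs the brief cues that merely hold the previous line on screen.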
The reusable pipeline now lives at `lib/sub-yt-fetch.sh` (wrapper) + `lib/yt-clean.py` (cleaner). The same one-liner handles Donny & Clarence, Mike Nolan, and Big Lez Saga, which are all on the same channel.
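Because the same one-liner covers the other shows on the channel, the fetch step is essentially parameterized by a playlist id. Below is a rough Python equivalent of what the shell wrapper runs. It assumes the wrapper passes the flags listed in the pipeline above. `--sub-format vtt` and the `-o` output template are added here for illustration and may not match `lib/sub-yt-fetch.sh`. The helper name is hypothetical.

```python
# Official Sassy playlist id from this run; other shows would pass their own id.
SASSY_PLAYLIST = "PLGMC7oz7XpmDMGrALMQiNXCi9p7aqkWbj"


def autocc_command(playlist_id: str, out_dir: str) -> list[str]:
    """Build a yt-dlp argv that fetches only the auto-caption track of a playlist."""
    return [
        "yt-dlp",
        "--skip-download",         # captions only, no video
        "--write-auto-subs",       # YouTube's automatic (ASR) captions
        "--sub-langs", "en-orig",  # same track as the Sassy run
        "--sub-format", "vtt",     # assumption: the cleaner expects VTT input
        "-o", f"{out_dir}/%(playlist_index)02d %(title)s.%(ext)s",  # illustrative template
        f"https://www.youtube.com/playlist?list={playlist_id}",
    ]
```

Running it would be `subprocess.run(autocc_command(SASSY_PLAYLIST, "/tmp/sassy-research"), check=True)`. Building an argv list rather than a shell string avoids quoting problems with episode titles.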
## Known quality issues
- Lowercase, no punctuation: YT ASR output verbatim
- Proper-noun mishears: "Sassy" → `sasha`, "Big Lez" → `Big Less`
- Profanity censored as `[ __ ]`: passthrough from YT
- Sentence segmentation absent: cues split on word boundaries
These violate the STYLE.md "best quality" and "clean" rules. They are documented as an explicit stop-gap; the v4 WhisperX rebuild will restore the quality bar.
## Mike Nolan special case (deferred)
A YouTube upload titled "MIKE NOLAN SHOW | COMPLETE SEASON | SUBTITLES" posted Oct 2025 carries hand-typed CC tracks. When subbing Mike Nolan, prefer that single video (rip CC tracks) over the per-episode auto-CC playlist path. Note added to v4 roadmap.
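Using that single upload adds a step the per-episode path does not need. The season-long CC track has to be cut into episodes, and each piece re-timed to start at zero. The sketch below makes several assumptions. First, the track has already been ripped; yt-dlp's `--write-subs` fetches uploader-provided tracks, and `--list-subs` shows which language codes exist. Second, the track has been parsed into `(start_ms, end_ms, text)` tuples. Third, the episode start offsets within the video are known, e.g. noted by hand; nothing in this run establishes them. The function name is hypothetical.

```python
from bisect import bisect_right

Cue = tuple[int, int, str]  # (start_ms, end_ms, text)


def split_by_episode(cues: list[Cue], episode_starts_ms: list[int]) -> list[list[Cue]]:
    """Assign each cue to the episode it starts in and rebase it to that episode's zero.

    episode_starts_ms must be sorted ascending; cues before the first start are dropped.
    """
    episodes: list[list[Cue]] = [[] for _ in episode_starts_ms]
    for start, end, text in cues:
        idx = bisect_right(episode_starts_ms, start) - 1
        if idx < 0:
            continue  # material before the first listed episode start
        offset = episode_starts_ms[idx]
        # Clamp cues that straddle a boundary so they end where the next episode starts.
        if idx + 1 < len(episode_starts_ms):
            end = min(end, episode_starts_ms[idx + 1])
        episodes[idx].append((start - offset, end - offset, text))
    return episodes
```

Each returned list can then be written out as SRT and named `<base>.eng.srt`, as on the auto-CC path.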
## Followups
- visually verify one Sassy episode plays in sync (recipe §6) — YT auto-cap timing is usually tight but worth a sanity check
- when v4 WhisperX lands, regenerate Sassy + Donny & Clarence + Big Lez Saga + Mike Nolan in one batch on the 4080 friend node
- for Mike Nolan, try the "COMPLETE SEASON | SUBTITLES" YT upload before falling back to Whisper