Owner accepted Sassy the Sasquatch S01 v3.5 YouTube-auto-CC subs as '85 % acceptable, fine, not great' but flagged them for v4 WhisperX rebuild. Adds a single worklist file (STOPGAP-SUBS.md) so every show that ships via the v3.5 path gets logged for the eventual v4 sweep instead of being silently forgotten. Sassy run log gets a STOP-GAP banner at the top pointing to the new worklist. README.md gets a stop-gap-exception note alongside the STYLE.md hard-prereq paragraph. ROADMAP H5 now points at the worklist file as the canonical source of which shows v4 needs to regenerate.
# Subtitle run — `Sassy the Sasquatch (2022)`

> ⚠ **STOP-GAP — needs v4 WhisperX cross-ref.** Owner accepted current
> subs as "85 %, acceptable" but tracked for full rebuild when v4 lands
> (ROADMAP H5). See [`STOPGAP-SUBS.md`](../STOPGAP-SUBS.md).

Recipe version: v3.5 — YouTube auto-CC via yt-dlp + cleaner (v4 WhisperX planned, see ROADMAP)
Run date: 2026-05-10
Operator: Claude Code @ onyx session, ai-lab cwd

## Source

| Field | Value |
|---|---|
| Episodes | 5 (S01 only) |
| Container | mkv |
| Video | AV1 Main, 1920×1080, 29.97 fps |
| Audio | `eng` Opus stereo (default) |
| Embedded subs | none (only font / cover-art attachments) |
| Existing sidecars | none |
| Runtime | ~11:20 per episode |
| Distribution | YouTube (THE BIG LEZ SHOW OFFICIAL channel, creator: Jarrad Wright) |

Niche-show indie animation. Same channel hosts Donny & Clarence Show, Mike
Nolan Show, Big Lez Saga — all four shows in our library are Jarrad Wright
productions distributed YouTube-first.

## Series + library context

- Series Id: `b2d1afd8a4a30c59adb42ccaf47376c2`
- Library: `767bffe4f11c93ef34b805451a696a4e` (TV Shows, `/media/tv`)
- IMDB series: `tt21209936`
- TVDB series: `421839`
- Per-episode IMDB ids: only S01E01 (`tt21215354`) — rest blank in TVDB

## Coverage probe — paid + free providers

Three parallel research agents (2026-05-10) checked every realistic source
before falling back to YouTube:

| Provider | Hits |
|---|---|
| OpenSubtitles.com REST (`parent_imdb_id=21209936`) | 1 — `SASSY THE SASQUATCH.Web-DL.1080p.en` S01E01, **HI-flagged** |
| OpenSubtitles.org legacy XML-RPC | 0 (account login 401 anyway) |
| Addic7ed | 0 |
| SubDL | 0 (`subtitles_count: 0`) |
| SubSource (Subscene successor) | 0 |
| Podnapisi | 0 |
| OS VIP upgrade | **would not unlock anything** — VIP is download-cap relief, not coverage. Same catalog as free. |

Conclusion: apart from the single HI-flagged S01E01 track on
OpenSubtitles.com, nothing exists outside YouTube. Buying VIP would not
help; the honest path is auto-generated subs.

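The OpenSubtitles.com probe in the table can be sketched as a request builder. This is a minimal sketch, not the query the research agents actually ran: the function name `build_search_request`, the `languages` filter, and the `User-Agent` string are assumptions for illustration, and the API key is a placeholder. Only the `parent_imdb_id=21209936` parameter comes from the table above.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Public REST base for OpenSubtitles.com (the legacy .org XML-RPC API is separate).
API_BASE = "https://api.opensubtitles.com/api/v1"


def build_search_request(parent_imdb_id: int, api_key: str,
                         languages: str = "en") -> Request:
    """Build (but do not send) a subtitle search for every episode of a series."""
    query = urlencode({"parent_imdb_id": parent_imdb_id, "languages": languages})
    return Request(
        f"{API_BASE}/subtitles?{query}",
        headers={
            "Api-Key": api_key,              # placeholder, supplied by the caller
            "User-Agent": "sub-probe v0.1",  # hypothetical app name
            "Accept": "application/json",
        },
    )


req = build_search_request(21209936, api_key="YOUR_API_KEY")
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` needs a real key; the sketch stops at building it so it runs offline.
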
## Outcome

| Season | Eps | Subs fetched | Quality | Notes |
|---|---|---|---|---|
| S01 | 5 | 5 / 5 | YT auto-CC stop-gap (lowercase, no punctuation, names mangled) | Cleaned via `lib/yt-clean.py`. v4 WhisperX rebuild planned |

Net: **5 / 5 (100 %)** — but at the lowest tier of the USER-G quality bar.

## Pipeline used

1. `yt-dlp --skip-download --write-auto-subs --sub-langs en-orig` against
   the official Sassy playlist (`PLGMC7oz7XpmDMGrALMQiNXCi9p7aqkWbj`) →
   raw VTT per episode in `/tmp/sassy-research/`.
2. `lib/yt-clean.py` collapses the rolling-window VTT (each cue carries 2-3
   stale lines plus the freshly-spoken bottom line) into deduplicated SRT.
3. Copy each cleaned `.srt` to nullstone over SSH (`cat` redirect) as
   `/home/user/media/tv/Sassy the Sasquatch (2022)/Season 01/<base>.eng.srt`,
   where `<base>` is the episode's library filename.
4. Validation-only library refresh; verified all 5 eps show exactly 1
   external eng sub stream.

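Step 1 can be approximated in Python as below. This is a minimal sketch, not the contents of the wrapper script, which this log does not show. The function name `build_ytdlp_cmd`, the `--sub-format vtt` choice, and the output template are assumptions for illustration; only the three caption flags, the playlist id, and the output directory come from the steps above.

```python
PLAYLIST_ID = "PLGMC7oz7XpmDMGrALMQiNXCi9p7aqkWbj"  # official Sassy playlist (from this log)
OUT_DIR = "/tmp/sassy-research/"                    # raw VTT drop location (from this log)


def build_ytdlp_cmd(playlist_id: str, out_dir: str) -> list[str]:
    """Return an argv list that fetches auto-generated English captions only."""
    return [
        "yt-dlp",
        "--skip-download",         # captions only, never the video itself
        "--write-auto-subs",       # YouTube auto-generated captions
        "--sub-langs", "en-orig",  # original-language English track
        "--sub-format", "vtt",     # assumption: request VTT explicitly
        "--paths", out_dir,
        # Assumption: index-prefixed names keep episodes in playlist order.
        "--output", "%(playlist_index)02d - %(title)s.%(ext)s",
        f"https://www.youtube.com/playlist?list={playlist_id}",
    ]


cmd = build_ytdlp_cmd(PLAYLIST_ID, OUT_DIR)
print(" ".join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would perform the fetch; building the list separately keeps the command easy to log and to reuse for the other playlists on the channel.
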
Reusable pipeline now lives at `lib/sub-yt-fetch.sh` (wrapper) +
`lib/yt-clean.py` (cleaner). Same one-liner handles Donny & Clarence,
Mike Nolan, Big Lez Saga (all on the same channel).

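The rolling-window collapse performed by the cleaner can be illustrated as follows. This is a sketch of the general technique under stated assumptions, not the actual code of `lib/yt-clean.py`: the names `parse_cues` and `vtt_to_srt` and the sample captions are hypothetical. It keeps only the lines of each cue that the previous cue did not already display, so it would also drop a line that is genuinely spoken twice in consecutive cues.

```python
import re

TS = re.compile(r"(\d{2}:\d{2}:\d{2})\.(\d{3}) --> (\d{2}:\d{2}:\d{2})\.(\d{3})")
TAG = re.compile(r"<[^>]+>")  # inline word-timing tags and <c>...</c> wrappers


def parse_cues(vtt_text: str) -> list:
    """Return [start, end, lines] per cue, with SRT-style comma timestamps."""
    cues = []
    for raw in vtt_text.splitlines():
        m = TS.search(raw)
        if m:
            cues.append([f"{m[1]},{m[2]}", f"{m[3]},{m[4]}", []])
        elif cues:  # text before the first timestamp is the WEBVTT header
            line = TAG.sub("", raw).strip()
            if line:
                cues[-1][2].append(line)
    return cues


def vtt_to_srt(vtt_text: str) -> str:
    """Drop lines already shown by the previous cue and emit numbered SRT."""
    entries, prev = [], []
    for start, end, lines in parse_cues(vtt_text):
        fresh = [l for l in lines if l not in prev]
        if lines:
            prev = lines
        if fresh:
            entries.append((start, end, " ".join(fresh)))
    return "\n".join(
        f"{n}\n{start} --> {end}\n{text}\n"
        for n, (start, end, text) in enumerate(entries, 1)
    )


# Hypothetical three-cue sample in the rolling-window shape described above.
SAMPLE_VTT = """WEBVTT
Kind: captions
Language: en

00:00:01.000 --> 00:00:03.000 align:start position:0%
hello<00:00:01.500><c> there</c>

00:00:03.000 --> 00:00:03.010 align:start position:0%
hello there

00:00:03.010 --> 00:00:05.000 align:start position:0%
hello there
how<00:00:03.500><c> are</c><c> you</c>
"""

srt = vtt_to_srt(SAMPLE_VTT)
print(srt)
```

In the sample, the second cue only repeats the first and is dropped, and the third contributes just its new bottom line, so the output has two SRT entries.
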
## Quality known issues

- **Lowercase, no punctuation** — YT ASR output verbatim
- **Proper-noun mishears**: "Sassy" → `sasha`, "Big Lez" → `Big Less`
- **Profanity censored as `[ __ ]`** — passthrough from YT
- **Sentence segmentation absent** — cues split on word boundaries

These violate STYLE.md "best quality" and "clean" rules. Documented as
explicit stop-gap; v4 WhisperX rebuild restores quality bar.

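The issues listed above are mechanical enough to detect automatically, which could help when deciding which shows belong on the stop-gap worklist. The following is a rough sketch only: `audit_lines` and its report keys are hypothetical and not part of the existing tooling, and the mishear table holds just the two examples recorded in this log.

```python
import re

CENSOR = re.compile(r"\[\s*__\s*\]")  # YouTube's censor placeholder
# Known mishears from this run, matched case-insensitively.
MISHEARS = {"sasha": "Sassy", "big less": "Big Lez"}


def audit_lines(text_lines: list[str]) -> dict:
    """Flag the stop-gap symptoms in subtitle text lines (timestamps excluded)."""
    joined = " ".join(text_lines)
    letters = [c for c in joined if c.isalpha()]
    return {
        "censored": len(CENSOR.findall(joined)),
        "all_lowercase": bool(letters) and all(c.islower() for c in letters),
        "no_punctuation": re.search(r"[.,!?]", joined) is None,
        "mishears": sorted(w for w in MISHEARS if w in joined.lower()),
    }


report = audit_lines(["sasha went to see big less", "what the [ __ ] mate"])
print(report)
```

A substring check like this will produce false positives (a character who really is called Sasha, for example), so it suits triage rather than automatic correction.
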
## Mike Nolan special-case (deferred)

A YouTube upload titled "MIKE NOLAN SHOW | COMPLETE SEASON | SUBTITLES"
posted Oct 2025 carries hand-typed CC tracks. When subbing Mike Nolan,
prefer that single video (rip CC tracks) over the per-episode auto-CC
playlist path. Note added to v4 roadmap.

## Followups

- [ ] visually verify one Sassy episode plays in sync (recipe §6) — YT
  auto-cap timing is usually tight but worth a sanity check
- [ ] when v4 WhisperX lands, regenerate Sassy + Donny & Clarence + Big
  Lez Saga + Mike Nolan in one batch on the 4080 friend node
- [ ] for Mike Nolan, try the "COMPLETE SEASON | SUBTITLES" YT upload
  before falling back to Whisper