s8n fba9a5bfeb processes/subtitles: STOPGAP-SUBS.md tracker for v3.5 → v4 cross-ref
Owner accepted Sassy the Sasquatch S01 v3.5 YouTube-auto-CC subs as
'85 % acceptable, fine, not great' but flagged them for v4 WhisperX
rebuild. Adds a single worklist file (STOPGAP-SUBS.md) so every show
that ships via the v3.5 path gets logged for the eventual v4 sweep
instead of being silently forgotten.

Sassy run log gets a STOP-GAP banner at the top pointing to the new
worklist. README.md gets a stop-gap-exception note alongside the
STYLE.md hard-prereq paragraph. ROADMAP H5 now points at the worklist
file as the canonical source of which shows v4 needs to regenerate.
2026-05-10 01:18:27 +01:00


# Subtitle run — `Sassy the Sasquatch (2022)`
> ⚠ **STOP-GAP — needs v4 WhisperX cross-ref.** Owner accepted current
> subs as "85 %, acceptable" but tracked them for a full rebuild when v4 lands
> (ROADMAP H5). See [`STOPGAP-SUBS.md`](../STOPGAP-SUBS.md).

Recipe version: v3.5, YouTube auto-CC via yt-dlp + cleaner (v4 WhisperX planned, see ROADMAP)

Run date: 2026-05-10

Operator: Claude Code @ onyx session, ai-lab cwd
## Source
| Field | Value |
|---|---|
| Episodes | 5 (S01 only) |
| Container | mkv |
| Video | AV1 Main, 1920×1080, 29.97 fps |
| Audio | `eng` Opus stereo (default) |
| Embedded subs | none (only font / cover-art attachments) |
| Existing sidecars | none |
| Runtime | ~11:20 per episode |
| Distribution | YouTube (THE BIG LEZ SHOW OFFICIAL channel, creator: Jarrad Wright) |

Niche indie animation. The same channel also hosts Donny & Clarence Show,
Mike Nolan Show and Big Lez Saga; all four shows in our library are Jarrad
Wright productions distributed YouTube-first.
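
The stream-level rows in the table above (video, audio, embedded subs, attachments) can be re-checked per file with ffprobe. This is a minimal sketch that assumes `ffprobe` is on `PATH`. The helper names are hypothetical and not part of this repo's `lib/`. The doc does not record which tool actually produced the table.

```python
import json
import subprocess
from collections import Counter


def stream_kinds(ffprobe_json: str) -> Counter:
    """Count streams by codec_type ("video", "audio", "subtitle", "attachment", ...)."""
    streams = json.loads(ffprobe_json).get("streams", [])
    return Counter(s.get("codec_type", "unknown") for s in streams)


def probe(path: str) -> Counter:
    """Run ffprobe on one file and summarise its streams (hypothetical helper)."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_streams", "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    return stream_kinds(result.stdout)
```

An episode matching the table should report one video stream, one audio stream, no subtitle streams and one or more attachment streams.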
## Series + library context
- Series Id: `b2d1afd8a4a30c59adb42ccaf47376c2`
- Library: `767bffe4f11c93ef34b805451a696a4e` (TV Shows, `/media/tv`)
- IMDB series: `tt21209936`
- TVDB series: `421839`
- Per-episode IMDB ids: only S01E01 (`tt21215354`); the rest are blank in TVDB
## Coverage probe — paid + free providers
Three parallel research agents (2026-05-10) checked every realistic source
before falling back to YouTube:
| Provider | Hits |
|---|---|
| OpenSubtitles.com REST (`parent_imdb_id=21209936`) | 1 — `SASSY THE SASQUATCH.Web-DL.1080p.en` S01E01, **HI-flagged** |
| OpenSubtitles.org legacy XML-RPC | 0 (account login also returned 401) |
| Addic7ed | 0 |
| SubDL | 0 (`subtitles_count: 0`) |
| SubSource (Subscene successor) | 0 |
| Podnapisi | 0 |
| OS VIP upgrade | **would not unlock anything**: VIP only relaxes the download cap; it searches the same catalog as the free tier. |

Conclusion: nothing usable exists outside YouTube. The lone OpenSubtitles.com
hit covers only S01E01 and is HI-flagged. Buying VIP would not help, so the
honest path is auto-generated subs.
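
The OpenSubtitles.com probe in the first row searches by the series' IMDB id with the `tt` prefix stripped (`parent_imdb_id=21209936`). This is a minimal sketch that builds that request with the standard library and does not send it. The `Api-Key` and `User-Agent` headers reflect what the REST API expects as far as we know, so check the current API docs. The `languages` filter and the User-Agent string are assumptions.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_ROOT = "https://api.opensubtitles.com/api/v1"


def imdb_numeric(imdb_id: str) -> int:
    """'tt21209936' -> 21209936 (the API takes the bare number)."""
    return int(imdb_id.removeprefix("tt"))


def series_search_request(imdb_id: str, api_key: str, languages: str = "en") -> Request:
    """Build (but do not send) a subtitle search covering every episode of a series."""
    query = urlencode({"parent_imdb_id": imdb_numeric(imdb_id), "languages": languages})
    return Request(
        f"{API_ROOT}/subtitles?{query}",
        # Placeholder UA string; the API wants an app-identifying User-Agent.
        headers={"Api-Key": api_key, "User-Agent": "subtitle-probe v0.1"},
    )
```

Passing the request to `urllib.request.urlopen` returns a JSON page of results.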
## Outcome
| Season | Eps | Subs fetched | Quality | Notes |
|---|---|---|---|---|
| S01 | 5 | 5 / 5 | YT auto-CC stop-gap (lowercase, no punctuation, names mangled) | Cleaned via `lib/yt-clean.py`. v4 WhisperX rebuild planned |

Net: **5 / 5 (100 %)**, but at the lowest tier of the USER-G quality bar.
## Pipeline used
1. `yt-dlp --skip-download --write-auto-subs --sub-langs en-orig` against
the official Sassy playlist (`PLGMC7oz7XpmDMGrALMQiNXCi9p7aqkWbj`) →
raw VTT per episode in `/tmp/sassy-research/`.
2. `lib/yt-clean.py` collapses the rolling-window VTT (each cue carries 2-3
stale lines plus the freshly-spoken bottom line) into deduplicated SRT.
3. Copy each cleaned `.srt` to nullstone over SSH (`cat` redirect), named
after the library video file:
`/home/user/media/tv/Sassy the Sasquatch (2022)/Season 01/<base>.eng.srt`.
4. Run a validation-only library refresh and confirm that all 5 episodes
show exactly 1 external `eng` subtitle stream.

The reusable pipeline now lives in `lib/sub-yt-fetch.sh` (wrapper) +
`lib/yt-clean.py` (cleaner). The same one-liner handles Donny & Clarence,
Mike Nolan and Big Lez Saga, which are all on the same channel.
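
Step 3's SSH `cat` redirect passes the remote path to the remote shell. The spaces and parentheses in `Sassy the Sasquatch (2022)` therefore have to be quoted. This is a minimal sketch that assumes `nullstone` resolves as an SSH host for the current user. `<base>` stays a parameter, and the helper names are hypothetical.

```python
import shlex
import subprocess
from pathlib import Path

REMOTE_HOST = "nullstone"  # assumption: resolvable SSH host or alias
SEASON_DIR = "/home/user/media/tv/Sassy the Sasquatch (2022)/Season 01"


def copy_command(base: str) -> list[str]:
    """ssh argv that writes stdin to <SEASON_DIR>/<base>.eng.srt on the remote."""
    remote_path = f"{SEASON_DIR}/{base}.eng.srt"
    # The remote side runs this string through a shell, so quote the path.
    return ["ssh", REMOTE_HOST, f"cat > {shlex.quote(remote_path)}"]


def push_srt(local_srt: Path, base: str) -> None:
    """Stream one cleaned .srt into the library folder; raises if ssh fails."""
    with local_srt.open("rb") as fh:
        subprocess.run(copy_command(base), stdin=fh, check=True)
```

Usage would look like `push_srt(Path("/tmp/sassy-research/ep1.srt"), "Sassy the Sasquatch (2022) - S01E01")`, where both the local filename and the basename are hypothetical examples.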
## Quality known issues
- **Lowercase, no punctuation** — YT ASR output verbatim
- **Proper-noun mishearings**: "Sassy" → `sasha`, "Big Lez" → `Big Less`
- **Profanity censored as `[ __ ]`** — passthrough from YT
- **Sentence segmentation absent** — cues split on word boundaries

These violate the STYLE.md "best quality" and "clean" rules. They are
documented as an explicit stop-gap; the v4 WhisperX rebuild restores the
quality bar.
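
Before a show goes onto `STOPGAP-SUBS.md`, the first three issues above can be tallied per file. This is a minimal, hypothetical audit helper and is not part of `lib/`. It assumes the cue texts have already been pulled out of the SRT. The mishear list only covers the two names observed in this run.

```python
import re

# Known mishearings from this run; extend per show.
MISHEARS = ("sasha", "big less")
PUNCT = re.compile(r"[.,!?;:]")


def audit_cues(cues: list[str]) -> dict[str, int]:
    """Cue counts for censoring, punctuation and uppercase; total hits for mishears."""
    report = {"cues": len(cues), "censored": 0, "punctuated": 0,
              "has_uppercase": 0, "mishears": 0}
    for text in cues:
        if "[ __ ]" in text:
            report["censored"] += 1
        if PUNCT.search(text):
            report["punctuated"] += 1
        if any(ch.isupper() for ch in text):
            report["has_uppercase"] += 1
        lowered = text.lower()
        report["mishears"] += sum(
            len(re.findall(rf"\b{re.escape(name)}\b", lowered)) for name in MISHEARS
        )
    return report
```

For raw YT auto-CC like this run's, `punctuated` and `has_uppercase` would be expected to stay near zero. These are rough heuristics, not a quality grade.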
## Mike Nolan special-case (deferred)
A YouTube upload titled "MIKE NOLAN SHOW | COMPLETE SEASON | SUBTITLES"
posted Oct 2025 carries hand-typed CC tracks. When subbing Mike Nolan,
prefer ripping the CC tracks from that single video over the per-episode
auto-CC playlist path. Note added to the v4 roadmap.
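
The step-1 fetch and the Mike Nolan variant differ mainly in which subtitle track yt-dlp requests. `--write-auto-subs` pulls the ASR track, while `--write-subs` pulls uploaded (hand-typed) tracks. This is a minimal sketch that builds either command line. The playlist URL is the standard form for the playlist id in step 1, and `-P` stands in for however the run placed its output in `/tmp/sassy-research/`. Any language code for the Mike Nolan upload is an assumption, so run `yt-dlp --list-subs <url>` first to see what the video actually offers.

```python
import shlex

SASSY_PLAYLIST = "https://www.youtube.com/playlist?list=PLGMC7oz7XpmDMGrALMQiNXCi9p7aqkWbj"


def ytdlp_sub_args(url: str, out_dir: str, manual: bool, langs: str) -> list[str]:
    """yt-dlp argv that fetches subtitles only and never downloads the video."""
    # Uploaded (hand-typed) tracks vs YouTube's auto-generated ASR track.
    track_flag = "--write-subs" if manual else "--write-auto-subs"
    return ["yt-dlp", "--skip-download", track_flag,
            "--sub-langs", langs, "-P", out_dir, url]


# Step 1 for Sassy: auto-generated original-language English captions.
print(shlex.join(ytdlp_sub_args(SASSY_PLAYLIST, "/tmp/sassy-research",
                                manual=False, langs="en-orig")))
```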
## Followups
- [ ] visually verify one Sassy episode plays in sync (recipe §6) — YT
auto-cap timing is usually tight but worth a sanity check
- [ ] when v4 WhisperX lands, regenerate Sassy + Donny & Clarence + Big
Lez Saga + Mike Nolan in one batch on the 4080 friend node
- [ ] for Mike Nolan, try the "COMPLETE SEASON | SUBTITLES" YT upload
before falling back to Whisper
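
Before the visual sync check, a cheap automated pass can catch gross timing errors in a cleaned `.srt`. It flags cues that end before they start, cues out of order, and cues running past the episode runtime (~11:20 here). This is a minimal sketch with hypothetical names. It complements watching an episode but does not replace it.

```python
import re

SRT_TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def timing_problems(srt_text: str, runtime_s: float) -> list[str]:
    """List gross timing errors; an empty list means nothing obvious is wrong."""
    problems = []
    last_start = float("-inf")
    for n, match in enumerate(SRT_TIMING.finditer(srt_text), start=1):
        fields = match.groups()
        start, end = _seconds(*fields[:4]), _seconds(*fields[4:])
        if end < start:
            problems.append(f"cue {n}: ends before it starts")
        if start < last_start:
            problems.append(f"cue {n}: starts before previous cue")
        if end > runtime_s:
            problems.append(f"cue {n}: ends after runtime")
        last_start = start
    return problems
```

Runtime is approximate across episodes, so pass each file's actual duration, e.g. `runtime_s=11 * 60 + 20` only as a rough default.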