legacy-arrflix/playbooks/subtitles/runs/sassy-the-sasquatch.md

# Subtitle run — `Sassy the Sasquatch (2022)`
> ⚠ **STOP-GAP — needs v4 WhisperX cross-ref.** Owner accepted current
> subs as "85 %, acceptable" but tracked for full rebuild when v4 lands
> (ROADMAP H5). See [`STOPGAP-SUBS.md`](../STOPGAP-SUBS.md).

- Recipe version: v3.5 (YouTube auto-CC via yt-dlp + cleaner; v4 WhisperX planned, see ROADMAP)
- Run date: 2026-05-10
- Operator: Claude Code @ onyx session, ai-lab cwd

## Source
| Field | Value |
|---|---|
| Episodes | 5 (S01 only) |
| Container | mkv |
| Video | AV1 Main, 1920×1080, 29.97 fps |
| Audio | `eng` Opus stereo (default) |
| Embedded subs | none (only font / cover-art attachments) |
| Existing sidecars | none |
| Runtime | ~11:20 per episode |
| Distribution | YouTube (THE BIG LEZ SHOW OFFICIAL channel, creator: Jarrad Wright) |

Niche indie animation. The same channel hosts the Donny & Clarence Show, the
Mike Nolan Show, and Big Lez Saga; all four shows in our library are Jarrad
Wright productions distributed YouTube-first.

## Series + library context
- Series Id: `b2d1afd8a4a30c59adb42ccaf47376c2`
- Library: `767bffe4f11c93ef34b805451a696a4e` (TV Shows, `/media/tv`)
- IMDB series: `tt21209936`
- TVDB series: `421839`
- Per-episode IMDB ids: only S01E01 (`tt21215354`) — rest blank in TVDB
## Coverage probe — paid + free providers
Three parallel research agents (2026-05-10) checked every realistic source
before falling back to YouTube:

| Provider | Hits |
|---|---|
| OpenSubtitles.com REST (`parent_imdb_id=21209936`) | 1 — `SASSY THE SASQUATCH.Web-DL.1080p.en` S01E01, **HI-flagged** |
| OpenSubtitles.org legacy XML-RPC | 0 (account login 401 anyway) |
| Addic7ed | 0 |
| SubDL | 0 (`subtitles_count: 0`) |
| SubSource (Subscene successor) | 0 |
| Podnapisi | 0 |
| OS VIP upgrade | **would not unlock anything**: VIP is download-cap relief, not coverage. Same catalog as free. |

Conclusion: apart from one HI-flagged S01E01 track on OpenSubtitles.com,
nothing exists outside YouTube. Buying VIP would not help; the honest path is
auto-generated subs.
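
For repeat probes, the OpenSubtitles.com lookup from the table can be sketched as a request builder. This is a minimal sketch, not the agents' actual code: the base URL, the `Api-Key` header and the `languages` parameter follow the public REST API as I understand it, while the function name, the `User-Agent` string and the key placeholder are hypothetical. It only builds the request and sends nothing.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: public OpenSubtitles.com REST base URL.
API_BASE = "https://api.opensubtitles.com/api/v1"


def build_subtitle_search(parent_imdb_id, api_key, languages="en"):
    """Build (but do not send) a series-level subtitle search request."""
    query = urlencode({"languages": languages, "parent_imdb_id": parent_imdb_id})
    return Request(
        f"{API_BASE}/subtitles?{query}",
        headers={
            "Api-Key": api_key,                      # placeholder, supply a real key
            "User-Agent": "arrflix-probe v0.1",      # hypothetical app identifier
            "Accept": "application/json",
        },
    )


# Series-level IMDB id from this run log (tt21209936 without the "tt" prefix).
req = build_subtitle_search(21209936, api_key="YOUR_API_KEY")
print(req.full_url)
```

Passing the result to `urllib.request.urlopen` would perform the actual query; inspecting the JSON for hearing-impaired flags is left out because the response shape is not recorded in this log.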
## Outcome
| Season | Eps | Subs fetched | Quality | Notes |
|---|---|---|---|---|
| S01 | 5 | 5 / 5 | YT auto-CC stop-gap (lowercase, no punctuation, names mangled) | Cleaned via `lib/yt-clean.py`. v4 WhisperX rebuild planned |

Net: **5 / 5 (100 %)**, but at the lowest tier of the USER-G quality bar.

## Pipeline used
1. `yt-dlp --skip-download --write-auto-subs --sub-langs en-orig` against
the official Sassy playlist (`PLGMC7oz7XpmDMGrALMQiNXCi9p7aqkWbj`) →
raw VTT per episode in `/tmp/sassy-research/`.
2. `lib/yt-clean.py` collapses the rolling-window VTT (each cue carries 2-3
stale lines plus the freshly-spoken bottom line) into deduplicated SRT.
3. Copy each cleaned `.srt` to nullstone over SSH (`cat` with a redirect), writing to
`/home/user/media/tv/Sassy the Sasquatch (2022)/Season 01/<base>.eng.srt`,
where `<base>` matches the episode's library filename.
4. Validation-only library refresh; verified that all 5 episodes show exactly 1
external eng sub stream.

Reusable pipeline now lives at `lib/sub-yt-fetch.sh` (wrapper) +
`lib/yt-clean.py` (cleaner). The same one-liner handles Donny & Clarence,
Mike Nolan, Big Lez Saga (all on the same channel).
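
The rolling-window collapse in step 2 can be illustrated with a short sketch. This is not the contents of `lib/yt-clean.py`; it is a minimal version assuming each cue's last non-empty line is the freshly spoken one and that a cue repeating the previous line only extends its display time. All function names and the sample cues are made up for illustration.

```python
import re

# "HH:MM:SS.mmm --> HH:MM:SS.mmm", optionally followed by cue settings.
CUE_TIME = re.compile(r"(\d{2}:\d{2}:\d{2})\.(\d{3}) --> (\d{2}:\d{2}:\d{2})\.(\d{3})")
# Inline word-timing and <c> tags embedded in YouTube auto-captions.
INLINE_TAG = re.compile(r"<[^>]+>")


def parse_cues(vtt_text):
    """Return cues as dicts with SRT-style timestamps and tag-free text lines."""
    cues, current = [], None
    for raw in vtt_text.splitlines():
        match = CUE_TIME.match(raw)
        if match:
            current = {"start": f"{match[1]},{match[2]}",
                       "end": f"{match[3]},{match[4]}", "lines": []}
            cues.append(current)
        elif current is not None:  # lines before the first cue are header
            text = INLINE_TAG.sub("", raw).strip()
            if text:
                current["lines"].append(text)
    return cues


def dedupe(cues):
    """Keep only the bottom (fresh) line of each cue; merge repeats."""
    entries = []
    for cue in cues:
        if not cue["lines"]:
            continue
        fresh = cue["lines"][-1]
        if entries and entries[-1]["text"] == fresh:
            entries[-1]["end"] = cue["end"]  # same line still on screen
            continue
        entries.append({"start": cue["start"], "end": cue["end"], "text": fresh})
    return entries


def vtt_to_srt(vtt_text):
    blocks = [f"{i}\n{e['start']} --> {e['end']}\n{e['text']}\n"
              for i, e in enumerate(dedupe(parse_cues(vtt_text)), 1)]
    return "\n".join(blocks)


# Invented sample in the shape of YouTube auto-caption VTT.
SAMPLE_VTT = "\n".join([
    "WEBVTT", "Kind: captions", "Language: en", "",
    "00:00:00.160 --> 00:00:02.389 align:start position:0%",
    " ",
    "hello<00:00:00.480><c> there</c><00:00:00.800><c> mate</c>", "",
    "00:00:02.389 --> 00:00:02.399 align:start position:0%",
    "hello there mate", " ", "",
    "00:00:02.399 --> 00:00:04.000 align:start position:0%",
    "hello there mate",
    "how<00:00:02.600><c> are</c><00:00:02.800><c> you</c>", "",
])
print(vtt_to_srt(SAMPLE_VTT))
```

On the sample, three overlapping cues collapse into two SRT entries. The sketch ignores `NOTE` blocks and cue identifiers, which a production cleaner would need to skip.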
## Known quality issues
- **Lowercase, no punctuation**: YT ASR output verbatim
- **Proper-noun mishears**: "Sassy" → `sasha`, "Big Lez" → `Big Less`
- **Profanity censored as `[ __ ]`**: passthrough from YT
- **Sentence segmentation absent**: cues split on word boundaries

These violate the STYLE.md "best quality" and "clean" rules. Documented as an
explicit stop-gap; the v4 WhisperX rebuild is expected to restore the quality bar.
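
To gauge how much of a cleaned file is affected, the issues above can be counted mechanically. The following is a rough sketch only, not part of the current pipeline: the function name and counter keys are hypothetical, the mishear table holds just the two examples from this log, and a match on `sasha` may be a false positive.

```python
import re
from collections import Counter

# YouTube's censor placeholder, e.g. "[ __ ]".
CENSOR = re.compile(r"\[\s*__\s*\]")
# Only the two mishears recorded in this run log.
MISHEARS = {"sasha": "Sassy", "big less": "Big Lez"}


def audit_lines(lines):
    """Count subtitle text lines that show each known auto-CC defect."""
    counts = Counter()
    for line in lines:
        if CENSOR.search(line):
            counts["censored"] += 1
        text = CENSOR.sub("", line).strip()
        # All-lowercase text without sentence punctuation looks like raw ASR.
        if text and text == text.lower() and not re.search(r"[.,!?]", text):
            counts["lowercase_unpunctuated"] += 1
        for wrong in MISHEARS:
            if re.search(rf"\b{re.escape(wrong)}\b", line, re.IGNORECASE):
                counts[f"mishear:{wrong}"] += 1
    return counts


sample = ["sasha went to see big less", "what the [ __ ] mate", "Hello, Sassy."]
report = audit_lines(sample)
for key in sorted(report):
    print(key, report[key])
```

The same counts taken before and after the v4 rebuild would give a crude regression check for the quality bar.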
## Mike Nolan special-case (deferred)
A YouTube upload titled "MIKE NOLAN SHOW | COMPLETE SEASON | SUBTITLES"
posted Oct 2025 carries hand-typed CC tracks. When subbing Mike Nolan,
prefer that single video (rip CC tracks) over the per-episode auto-CC
playlist path. Note added to v4 roadmap.
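
The difference between the two paths comes down to one yt-dlp flag: `--write-subs` fetches uploader-provided tracks, `--write-auto-subs` fetches auto-generated ones. A minimal sketch of building either command follows; the helper name, the output template and the `en` language code for the hand-typed track are assumptions, and the Mike Nolan video URL is a placeholder because it is not recorded in this log.

```python
import shlex


def subtitle_fetch_argv(url, out_dir, hand_typed=False, lang="en-orig"):
    """Build a yt-dlp command that downloads subtitles only, no video."""
    return [
        "yt-dlp",
        "--skip-download",
        # Uploader-provided CC vs. YouTube auto-generated captions.
        "--write-subs" if hand_typed else "--write-auto-subs",
        "--sub-langs", lang,
        "--sub-format", "vtt",
        "-o", f"{out_dir}/%(title)s.%(ext)s",  # assumed naming scheme
        url,
    ]


# Auto-CC path, as used for Sassy (playlist id taken from this log).
auto_cmd = subtitle_fetch_argv(
    "https://www.youtube.com/playlist?list=PLGMC7oz7XpmDMGrALMQiNXCi9p7aqkWbj",
    "/tmp/sassy-research",
)

# Hand-typed CC path for the Mike Nolan upload; URL is a placeholder.
manual_cmd = subtitle_fetch_argv(
    "https://www.youtube.com/watch?v=VIDEO_ID_PLACEHOLDER",
    "/tmp/mike-nolan-research",
    hand_typed=True,
    lang="en",  # assumption: the hand-typed track is tagged "en"
)

print(shlex.join(auto_cmd))
print(shlex.join(manual_cmd))
```

Running `yt-dlp --list-subs <url>` first would confirm which language codes the hand-typed tracks actually use before committing to `en`.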
## Followups
- [ ] visually verify one Sassy episode plays in sync (recipe §6) — YT
auto-cap timing is usually tight but worth a sanity check
- [ ] when v4 WhisperX lands, regenerate Sassy + Donny & Clarence + Big
Lez Saga + Mike Nolan in one batch on the 4080 friend node
- [ ] for Mike Nolan, try the "COMPLETE SEASON | SUBTITLES" YT upload
before falling back to Whisper