# Subtitle run: `Sassy the Sasquatch (2022)`

> ⚠ **STOP-GAP: needs v4 WhisperX cross-ref.** The owner accepted the current
> subs as "85 %, acceptable", but they are tracked for a full rebuild when v4
> lands (ROADMAP H5). See [`STOPGAP-SUBS.md`](../STOPGAP-SUBS.md).

- Recipe version: v3.5 (YouTube auto-CC via yt-dlp + cleaner; v4 WhisperX planned, see ROADMAP)
- Run date: 2026-05-10
- Operator: Claude Code, onyx session, cwd `ai-lab`

## Source

| Field | Value |
|---|---|
| Episodes | 5 (S01 only) |
| Container | mkv |
| Video | AV1 Main, 1920×1080, 29.97 fps |
| Audio | `eng` Opus stereo (default) |
| Embedded subs | none (only font / cover-art attachments) |
| Existing sidecars | none |
| Runtime | ~11:20 per episode |
| Distribution | YouTube (THE BIG LEZ SHOW OFFICIAL channel, creator: Jarrad Wright) |

Niche indie animation. The same channel also hosts the Donny & Clarence Show,
the Mike Nolan Show and Big Lez Saga; all four shows in our library are Jarrad
Wright productions distributed YouTube-first.

## Series + library context

- Series Id: `b2d1afd8a4a30c59adb42ccaf47376c2`
- Library: `767bffe4f11c93ef34b805451a696a4e` (TV Shows, `/media/tv`)
- IMDb series: `tt21209936`
- TVDB series: `421839`
- Per-episode IMDb ids: only S01E01 has one (`tt21215354`); the rest are blank in TVDB

## Coverage probe: paid + free providers

On 2026-05-10, three parallel research agents checked every realistic subtitle
source before we fell back to YouTube:

| Provider | Hits |
|---|---|
| OpenSubtitles.com REST (`parent_imdb_id=21209936`) | 1: `SASSY THE SASQUATCH.Web-DL.1080p.en` S01E01, **HI-flagged** |
| OpenSubtitles.org legacy XML-RPC | 0 (account login also failed with HTTP 401) |
| Addic7ed | 0 |
| SubDL | 0 (`subtitles_count: 0`) |
| SubSource (Subscene successor) | 0 |
| Podnapisi | 0 |
| OpenSubtitles VIP upgrade | **would not unlock anything**: VIP only raises the download cap; the catalog is the same as the free tier. |

Conclusion: apart from the single HI-flagged S01E01 file on OpenSubtitles.com,
no provider carries English subs for this series. Buying VIP would not help;
the honest path is YouTube auto-generated subs.

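To re-run the OpenSubtitles.com probe, a minimal Python sketch that builds the search request (without sending it) and reduces the reply to release names plus the HI flag. The endpoint (`/api/v1/subtitles`), the `Api-Key` header and the `data[].attributes.release` / `hearing_impaired` fields are my reading of the public v1 API and should be checked against its docs; `build_search`, `summarize` and the `sub-probe` User-Agent are hypothetical names.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_ROOT = "https://api.opensubtitles.com/api/v1"


def build_search(parent_imdb_id: str, api_key: str, languages: str = "en") -> Request:
    """Build (but do not send) a subtitle search covering every episode of a series."""
    # Numeric id without the "tt" prefix, the same form used in the probe above.
    numeric = parent_imdb_id.removeprefix("tt")
    query = urlencode({"parent_imdb_id": numeric, "languages": languages})
    return Request(
        f"{API_ROOT}/subtitles?{query}",
        # Assumed header names; the API rejects requests without a custom User-Agent.
        headers={"Api-Key": api_key, "User-Agent": "sub-probe v0.1"},
    )


def summarize(payload: dict) -> list[tuple[str, bool]]:
    """Reduce a decoded JSON reply to (release name, hearing-impaired flag) pairs."""
    hits = []
    for item in payload.get("data", []):
        attrs = item.get("attributes", {})
        hits.append((attrs.get("release", "?"), bool(attrs.get("hearing_impaired"))))
    return hits
```

Sending it is `urllib.request.urlopen(build_search("tt21209936", key))`, then `summarize(json.load(reply))`; for this series that should surface the one HI-flagged S01E01 hit from the table.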
## Outcome

| Season | Eps | Subs fetched | Quality | Notes |
|---|---|---|---|---|
| S01 | 5 | 5 / 5 | YT auto-CC stop-gap (lowercase, no punctuation, names mangled) | Cleaned via `lib/yt-clean.py`; v4 WhisperX rebuild planned |

Net: **5 / 5 (100 %)** coverage, but at the lowest tier of the USER-G quality bar.

## Pipeline used

1. Run `yt-dlp --skip-download --write-auto-subs --sub-langs en-orig` against
   the official Sassy playlist (`PLGMC7oz7XpmDMGrALMQiNXCi9p7aqkWbj`) to get one
   raw VTT per episode in `/tmp/sassy-research/`.
2. `lib/yt-clean.py` collapses the rolling-window VTT (each cue carries 2-3
   stale lines plus the freshly spoken bottom line) into a deduplicated SRT.
3. Copy each cleaned `.srt` to nullstone with an `ssh` + `cat` redirect, renamed
   to the library filename:
   `/home/user/media/tv/Sassy the Sasquatch (2022)/Season 01/<base>.eng.srt`.
4. Run a validation-only library refresh and verify that each of the 5 episodes
   shows exactly one external `eng` subtitle stream.

The reusable pipeline now lives in `lib/sub-yt-fetch.sh` (wrapper) +
`lib/yt-clean.py` (cleaner). The same one-liner handles Donny & Clarence,
Mike Nolan and Big Lez Saga (all on the same channel).

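The step-1 call generalises to the other shows by swapping the playlist id. A Python rendering of that one-liner, for illustration only: the real wrapper is the shell script `lib/sub-yt-fetch.sh`, and `PLAYLISTS` / `fetch_argv` are hypothetical. Only the Sassy playlist id comes from this run; using `-P` (yt-dlp's `--paths`) to set the output directory is an assumption about how the wrapper does it.

```python
import shlex

# Only the Sassy id is from this run log; add the other Jarrad Wright
# shows once their playlist ids are looked up (hypothetical table).
PLAYLISTS = {
    "sassy": "PLGMC7oz7XpmDMGrALMQiNXCi9p7aqkWbj",
}


def fetch_argv(show: str, outdir: str = "/tmp") -> list[str]:
    """Build the step-1 yt-dlp command: original-language auto-CC only, no media."""
    return [
        "yt-dlp",
        "--skip-download",
        "--write-auto-subs",
        "--sub-langs", "en-orig",
        "-P", f"{outdir}/{show}-research",  # assumed: --paths sets the output dir
        f"https://www.youtube.com/playlist?list={PLAYLISTS[show]}",
    ]


if __name__ == "__main__":
    # Print the shell-quoted command for copy/paste or logging in the run log.
    print(shlex.join(fetch_argv("sassy")))
```

`subprocess.run(fetch_argv("sassy"), check=True)` would perform the fetch; keeping the argv as a list avoids quoting problems with show names that contain `&`.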
## Quality known issues

- **Lowercase, no punctuation**: YT ASR output verbatim
- **Proper-noun mishears**: "Sassy" → `sasha`, "Big Lez" → `Big Less`
- **Profanity censored as `[ __ ]`**: passed through from YT
- **No sentence segmentation**: cues split on word boundaries

These violate the STYLE.md "best quality" and "clean" rules. Documented as an
explicit stop-gap; the v4 WhisperX rebuild restores the quality bar.

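To track these defects per file until v4 lands, a rough lint sketch in Python. It counts all-lowercase cues, `[ __ ]` censor markers, the two known mishears, and cues that do not end in `.`, `!` or `?` (a crude proxy for missing sentence segmentation). `lint_srt` and the `MISHEARS` list are hypothetical, and the blank-line block split assumes well-formed SRT.

```python
import re

# Known mishears from this run (lowercase); extend per show.
MISHEARS = ("sasha", "big less")
CENSOR = "[ __ ]"


def lint_srt(srt: str) -> dict[str, int]:
    """Count the stop-gap defects listed above in one SRT file."""
    counts = {"lowercase_cues": 0, "censored": 0, "mishears": 0, "no_end_punct": 0}
    for block in re.split(r"\n\s*\n", srt.strip()):
        text = " ".join(block.splitlines()[2:])  # drop index and timing lines
        if not text:
            continue
        if text == text.lower():
            counts["lowercase_cues"] += 1
        counts["censored"] += text.count(CENSOR)
        lowered = text.lower()
        for bad in MISHEARS:
            # Whole-word matches only, so e.g. "sashay" is not counted.
            counts["mishears"] += len(re.findall(rf"\b{re.escape(bad)}\b", lowered))
        if text[-1] not in ".!?":
            counts["no_end_punct"] += 1
    return counts
```

On current YT auto-CC output nearly every cue should trip `lowercase_cues` and `no_end_punct`; after the v4 rebuild those two counts are the quickest signal that the quality bar is back.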
## Mike Nolan special-case (deferred)

A YouTube upload titled "MIKE NOLAN SHOW | COMPLETE SEASON | SUBTITLES",
posted Oct 2025, carries hand-typed CC tracks. When subbing Mike Nolan, prefer
ripping the CC tracks from that single video over the per-episode auto-CC
playlist path. Note added to the v4 roadmap.

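If the complete-season upload is used, its single CC track has to be cut into per-episode sidecars. A minimal Python sketch, assuming the track is already SRT (yt-dlp's `--write-subs --convert-subs srt` should produce that, untested here) and that each episode's start offset in the compilation is known, e.g. from chapter marks. Each cue goes to the last episode starting at or before it and is re-based to zero; all names are hypothetical.

```python
import re

TS = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def to_ms(ts: str) -> int:
    h, m, s, ms = map(int, TS.fullmatch(ts.strip()).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def to_ts(total: int) -> str:
    h, rem = divmod(total, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


def split_srt(srt: str, starts_ms: list[int]) -> list[str]:
    """Split one season-long SRT into per-episode SRTs.

    starts_ms: ascending start offset (ms) of each episode in the compilation.
    """
    episodes = [[] for _ in starts_ms]
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.splitlines()
        start, end = (to_ms(t) for t in lines[1].split("-->"))
        owners = [i for i, off in enumerate(starts_ms) if off <= start]
        if not owners:
            continue  # cue before the first episode starts
        ep = owners[-1]
        off = starts_ms[ep]
        episodes[ep].append((start - off, end - off, lines[2:]))
    out = []
    for cues in episodes:
        # Renumber from 1 within each episode file.
        blocks = [
            f"{n}\n{to_ts(s)} --> {to_ts(e)}\n" + "\n".join(text)
            for n, (s, e, text) in enumerate(cues, 1)
        ]
        out.append("\n\n".join(blocks) + "\n")
    return out
```

Each returned string would then be written as `<base>.eng.srt` next to the matching episode, as in step 3 of the pipeline.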
## Followups

- [ ] Visually verify that one Sassy episode plays in sync (recipe §6). YT
  auto-caption timing is usually tight, but worth a sanity check.
- [ ] When v4 WhisperX lands, regenerate Sassy + Donny & Clarence + Big Lez
  Saga + Mike Nolan in one batch on the 4080 friend node.
- [ ] For Mike Nolan, try the "COMPLETE SEASON | SUBTITLES" YT upload before
  falling back to Whisper.

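Before the visual check, a cheap automated pass can catch gross timing faults. A Python sketch (names hypothetical) that flags cues with non-positive duration, cues that start before the previous one, and cues ending past the episode runtime (~11:20, i.e. about 680 000 ms for Sassy; pass the real duration where known). It cannot detect a constant offset, so it complements the visual check rather than replacing it.

```python
import re

TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def _ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def timing_problems(srt: str, runtime_ms: int) -> list[str]:
    """Report ordering, duration and runtime-bound problems, one string per finding."""
    problems, prev_start = [], -1
    for n, match in enumerate(TIMING.finditer(srt), 1):
        groups = match.groups()
        start, end = _ms(*groups[:4]), _ms(*groups[4:])
        if end <= start:
            problems.append(f"cue {n}: non-positive duration")
        if start < prev_start:
            problems.append(f"cue {n}: starts before previous cue")
        if end > runtime_ms:
            problems.append(f"cue {n}: ends after runtime")
        prev_start = start
    return problems
```

An empty list means the file is structurally plausible; a single `ends after runtime` on the last cue usually just means the runtime passed in was the rounded ~11:20 figure.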