processes/ -> playbooks/ (git mv preserves history; updated cross-refs in ROADMAP, README, subtitles playbook + scripts).

playbooks/import-media/README.md v1.0: 7-step import workflow (stage on onyx -> rsync to nullstone -> chmod -> verify scan -> Items/Counts bump -> optional subtitle pass -> run-log). Cross-references docs/05/07/08, ADMIN-GUIDE, README. Mirrors the existing subtitles playbook structure (CHANGELOG + runs/_template). CHANGELOG v1.0 lists known gaps (bin/cleanup-import.sh and bin/normalize.py still doc-only, ROADMAP M6).

First run logged: playbooks/import-media/runs/lilo-stitch-2002.md. Lilo & Stitch (2002) imported to /home/user/media/movies/, item c2f4aff133c1b9631500fadf293b0b2f, TMDb 11544, MovieCount 3 -> 4. LibraryMonitor didn't auto-fire, so a manual /Library/Refresh was needed; the playbook now makes this an unconditional step. Source: 1080p BluRay HEVC 10-bit / EAC3 5.1 / 2x PGS embedded subs. Passes the quality bar (README.md:41).
144 lines
6.3 KiB
Markdown
# Subtitle process — changelog

## v1 — 2026-05-09

Initial recipe. Drafted while running on American Dad. Distilled from doc
03-subtitles.md (Futurama work) and the actual AD run.

Approach: Jellyfin RemoteSearch/Subtitles/eng → pick best non-HI/non-MT match
via Python filter → POST download → docker cp metadata cache → media folder →
delete cache dupes → validation refresh.

Scope: works on shows whose library season/episode numbering matches
OpenSubtitles' indexed numbering. Verified passing on AD S01 (7/7 episodes).
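
The two Jellyfin calls in the v1 chain can be sketched as URL builders. This is a minimal sketch: the server address is a hypothetical placeholder, authentication is left out, and the subtitle ID would come from the search response.

```python
from urllib.parse import quote

# Hypothetical Jellyfin base URL; not taken from this playbook.
JELLYFIN = "http://jellyfin.local:8096"


def remote_search_url(item_id, lang="eng"):
    """GET this to list remote subtitle candidates for one library item."""
    return f"{JELLYFIN}/Items/{quote(item_id, safe='')}/RemoteSearch/Subtitles/{quote(lang, safe='')}"


def remote_download_url(item_id, subtitle_id):
    """POST to this to make Jellyfin fetch the chosen candidate into its cache."""
    return f"{JELLYFIN}/Items/{quote(item_id, safe='')}/RemoteSearch/Subtitles/{quote(subtitle_id, safe='')}"
```

The Python filter works on the JSON list the GET returns, and the POST targets whichever entry survives it.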

### Known break — added 2026-05-09 same day

After S01 passed, S02 returned 0 results for every episode probed (E01, E02,
E08, E13). Quota was fine (13 downloads remaining). Cause:

> Jellyfin metadata for American Dad uses **Hulu/DSP season ordering**
> (S1=7, S2=16, S3=19, S4=16). OpenSubtitles indexes by **Fox original-airing
> order**, where S1 has 23 episodes. The plugin queries OS by
> `(parent_imdb_id, season_number, episode_number)`. For library S02E01
> "Bullocks to Stan" the plugin sends `S=2,E=1`, but OS catalogues that
> episode as `S=1,E=8`. Result: 0 hits.

Each library episode has its own correct per-episode IMDb ID (e.g.
`tt0511631` for "Bullocks to Stan"), which would resolve directly via the OS
REST `imdb_id=` parameter, but the plugin doesn't expose that path.
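
The S02E01 → `S=1,E=8` mismatch is a running-count offset. The sketch below assumes the Hulu ordering re-chunks the Fox airing sequence without reordering it. That holds for this one example but is not verified for the whole run, and it is exactly the assumption that per-episode `imdb_id` lookup avoids. `remap_episode` is a hypothetical helper; only Fox S1's length (23) is stated here.

```python
def remap_episode(season, episode, src_lengths, dst_lengths):
    """Map (season, episode) between two chunkings of one linear episode run.

    src_lengths / dst_lengths: episodes per season, in order, for each scheme.
    Assumes both schemes list the same episodes in the same order.
    """
    if not 1 <= season <= len(src_lengths) or not 1 <= episode <= src_lengths[season - 1]:
        raise ValueError("episode outside source ordering")
    absolute = sum(src_lengths[: season - 1]) + episode  # 1-based running index
    for dst_season, length in enumerate(dst_lengths, start=1):
        if absolute <= length:
            return dst_season, absolute
        absolute -= length
    raise ValueError("destination season lengths too short for this episode")


HULU = [7, 16, 19, 16]  # library ordering, from the note above
FOX = [23]              # only Fox S1 is given here; extend before using beyond it
```

`remap_episode(2, 1, HULU, FOX)` gives `(1, 8)`: the seven Hulu S1 episodes come first, so library S02E01 is the 8th episode overall.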

## v2 — 2026-05-09

Approach **A** chosen: direct OpenSubtitles REST API, per-episode `imdb_id`
lookup, bypass the Jellyfin plugin entirely. New helper at
`lib/sub-rest-fetch.py`.
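
A per-episode lookup boils down to one GET against the REST search endpoint. This is a sketch, not the helper's code. It assumes the `api/v1` base URL, `Api-Key` / `User-Agent` headers, and a numeric `imdb_id` with the `tt` prefix and leading zeros removed; check these against the current OpenSubtitles REST docs. The User-Agent string is a placeholder.

```python
import re
from pathlib import Path
from urllib.parse import urlencode

API = "https://api.opensubtitles.com/api/v1"
KEY_FILE = Path.home() / ".config" / "arrflix-opensubtitles-api.txt"


def imdb_numeric(imdb_id):
    """'tt0511631' -> '511631' (assumed wire format: no 'tt', no leading zeros)."""
    m = re.fullmatch(r"(?:tt)?0*(\d+)", imdb_id.strip())
    if not m:
        raise ValueError(f"not an IMDb id: {imdb_id!r}")
    return m.group(1)


def search_request(imdb_id, api_key, lang="en"):
    """Return (url, headers) for a subtitle search on one episode's own IMDb id."""
    query = urlencode({"imdb_id": imdb_numeric(imdb_id), "languages": lang})
    headers = {
        "Api-Key": api_key,
        "User-Agent": "arrflix-subs v0.1",  # hypothetical app name/version
        "Accept": "application/json",
    }
    return f"{API}/subtitles?{query}", headers
```

Usage: `search_request("tt0511631", KEY_FILE.read_text().strip())` targets "Bullocks to Stan" directly, so season numbering never enters the query.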

- API key file: `~/.config/arrflix-opensubtitles-api.txt` (mode 600)
- Account: `Caveman5` (free tier, 20 downloads/day)
- Saves sidecars directly to the nullstone media folder via `ssh ... cat >`
- No more docker-cp from `/config/metadata/library` cache (plugin path)
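
`ssh ... cat >` leaves the host and path elided. The following is a hedged sketch of the pattern, with `nullstone` as the host alias and a made-up media path in the check. Because ssh joins its trailing arguments into one string for the remote shell, the path has to be quoted, or a folder like `Season 02` splits in two.

```python
import shlex
import subprocess


def ssh_write_argv(host, remote_path):
    """argv for `ssh HOST "cat > PATH"`, with PATH quoted for the remote shell."""
    return ["ssh", host, f"cat > {shlex.quote(remote_path)}"]


def write_sidecar(host, remote_path, data):
    """Stream subtitle bytes through ssh's stdin into the remote file."""
    subprocess.run(ssh_write_argv(host, remote_path), input=data, check=True)
```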

Recipe upgrade:
- Step 4 swaps `lib/sub-fetch.sh` → `lib/sub-rest-fetch.py` for shows with
  non-standard season ordering.
- Picker logic identical: filter HI/MT/AI/Forced (Forced is called
  `foreign_parts_only` in OS REST), prefer 23.976fps, sort by
  `download_count` desc.
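
The picker rules can be expressed as a filter plus a sort key. This is a minimal sketch, assuming each search hit carries an `attributes` dict with the OS REST field names used above. The 0.01 fps tolerance is an arbitrary choice. Zero `fps` / `download_count` values are kept and ranked as unknown (see the v2 quirks).

```python
TARGET_FPS = 23.976
EXCLUDE_FLAGS = ("hearing_impaired", "machine_translated", "ai_translated", "foreign_parts_only")


def pick_best(results):
    """Choose one hit from an OS REST search `data` list, or None if nothing survives.

    Drops HI, machine/AI-translated and forced entries, then prefers a
    ~23.976 fps match, then the highest download_count.
    """
    candidates = [
        r for r in results
        if not any(r["attributes"].get(flag) for flag in EXCLUDE_FLAGS)
    ]
    if not candidates:
        return None

    def rank(r):
        attrs = r["attributes"]
        fps = attrs.get("fps") or 0.0  # 0.0 means "unknown", not "bad"
        return (abs(fps - TARGET_FPS) < 0.01, attrs.get("download_count") or 0)

    return max(candidates, key=rank)
```

Because the fps match sorts ahead of popularity, a 23.976 entry with fewer downloads beats a 25 fps entry with more.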

### v2 known quirks

- **OpenSubtitles `/download` endpoint rejects urllib**: consistent HTTP 503
  via Python `urllib.request`, HTTP 200 via `curl` with the same headers/body.
  `_curl()` shim added; all OS API calls go through it. **Each 503 still
  consumes 1 download-quota slot**, so this had to be fixed before retrying
  large batches.
- `download_count` of `0` and `fps` of `0.0` appear on some catalogue
  entries; treat as informational, not exclusionary.
- Some hits have a `file_name` that doesn't match the `imdb_id` searched (OS
  metadata drift). The recipe's Step 6 visual-sync check is the catch.
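
The following is a hypothetical reconstruction of the `_curl()` idea, not the helper's actual code. It builds the same request as a curl argv and parses stdout as JSON. `--fail` makes curl exit non-zero on HTTP errors such as those 503s, so `check=True` surfaces them as exceptions rather than half-parsed bodies.

```python
import json
import subprocess


def curl_argv(url, headers, body=None):
    """Build a curl command equivalent to one JSON API request."""
    argv = ["curl", "--silent", "--show-error", "--fail", url]
    for name, value in headers.items():
        argv += ["-H", f"{name}: {value}"]
    if body is not None:
        argv += ["--data", json.dumps(body)]  # --data implies POST
    return argv


def _curl(url, headers, body=None):
    """Run curl and decode its JSON reply; raises CalledProcessError on HTTP errors."""
    out = subprocess.run(curl_argv(url, headers, body), capture_output=True, check=True)
    return json.loads(out.stdout)
```

One trade-off: the API key ends up in curl's argv, which other local users can see in `ps`. That is fine on a single-user box, but worth reconsidering on a shared host.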

### v2 known limits

- Free-tier 20/day still in force (REST and plugin share the counter).
- Recipe Step 6 (sync verification) is still manual: no automated check
  that the picked .srt actually aligns with audio.
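
REST and plugin share the 20/day counter, and failed calls still burn slots, so a batch run benefits from counting before each call. The following is a minimal sketch, assuming the `/download` JSON reply carries a `remaining` count. `DailyBudget` and `QuotaExhausted` are hypothetical names.

```python
class QuotaExhausted(RuntimeError):
    """Raised before a /download call that would exceed the local budget."""


class DailyBudget:
    """Local tally of OpenSubtitles download slots.

    Every attempt is counted up front, because failed calls (e.g. HTTP 503)
    still cost a slot. When a reply carries the server's `remaining` figure,
    that number wins, which also picks up plugin-side downloads.
    """

    def __init__(self, limit=20, reserve=0):
        self.limit = limit
        self.reserve = reserve
        self.used = 0

    def take(self):
        if self.used >= self.limit - self.reserve:
            raise QuotaExhausted(f"{self.used}/{self.limit} used (reserve {self.reserve})")
        self.used += 1

    def sync(self, reply):
        if "remaining" in reply:
            self.used = self.limit - int(reply["remaining"])
```

Call `take()` before each `/download` and `sync(reply)` after each successful one.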

## v3 — 2026-05-09

Approach **Addic7ed via subliminal** added as a quota-free fallback. New
helper at `lib/sub-a7d-fetch.py`. Runs alongside v2; pick whichever fits.

- `subliminal` Python lib drives the `addic7ed` provider, anonymously
- OS REST is still consulted (search-only, no quota cost) to translate
  library Hulu numbering to the show's primary catalogue numbering, since
  Addic7ed and OS `feature_details` appear to align for at least the test
  show (American Dad)
- Sidecar written directly to nullstone via `ssh ... cat >`
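
The numbering translation reads the catalogue's own season/episode back out of a search-only call. This is a sketch, assuming the reply nests them under `data[].attributes.feature_details` as `season_number` / `episode_number`. A real run would pass the JSON from an `imdb_id` search.

```python
def catalogue_numbering(search_reply):
    """Return (season, episode) as OS catalogues the searched episode, or None.

    Uses the first hit whose feature_details carries both fields.
    """
    for hit in search_reply.get("data", []):
        fd = (hit.get("attributes") or {}).get("feature_details") or {}
        season, episode = fd.get("season_number"), fd.get("episode_number")
        if season is not None and episode is not None:
            return int(season), int(episode)
    return None
```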

### v3 picker / matching

- subliminal returns candidates ordered by match score; the recipe takes the first
- "!" in a series name breaks subliminal's matcher; the recipe strips it before
  building the synthetic filename for `Video.fromname()`
- Synthetic filename pattern: `Series.Name.Year.SXXEYY.HDTV.x264.mkv`
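
The synthetic filename can be built as follows. This is a minimal sketch that mirrors the two rules above. Only `!` is stripped; any other punctuation passes through untouched. `synthetic_name` is a hypothetical helper name.

```python
def synthetic_name(series, year, season, episode):
    """Build `Series.Name.Year.SXXEYY.HDTV.x264.mkv` for Video.fromname()."""
    words = series.replace("!", "").split()  # '!' breaks subliminal's matcher
    return f"{'.'.join(words)}.{year}.S{season:02d}E{episode:02d}.HDTV.x264.mkv"
```

`synthetic_name("American Dad!", 2005, 1, 8)` yields `American.Dad.2005.S01E08.HDTV.x264.mkv`. subliminal's `Video.fromname()` then parses that string (via guessit) into an episode the `addic7ed` provider can search for.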

### v3 known quirks

- Some episodes return 0 hits at Addic7ed for the S/E taken from OS
  `feature_details`: likely cases where Addic7ed indexes by Fox airing order
  while OS uses DVD-compressed order (or vice versa). On American Dad, ~9 of
  58 episodes missed via this path. Fall back to v2 OS REST when quota allows.
- One episode (`Black Mystery Month`) had a hit but downloaded empty
  content: an Addic7ed-side cataloguing error or a temporary 0-byte upload.
- Per-show coverage varies: Addic7ed has near-complete English on broadcast
  US shows but spotty coverage for animated specials and obscure titles.
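
Empty downloads like the `Black Mystery Month` one can be caught before anything is written to nullstone. This is a hypothetical sanity check. The `min_cues` threshold of 10 is an arbitrary assumption, and a real episode sidecar has far more cues.

```python
import re

# SRT timing line, e.g. "00:01:02,345 --> 00:01:04,000"
_TIMING = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def looks_like_srt(data, min_cues=10):
    """True if the bytes are non-blank and contain at least `min_cues` timing lines."""
    if not data.strip():
        return False
    text = data.decode("utf-8", errors="replace")
    return len(_TIMING.findall(text)) >= min_cues
```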

### v3 known limits

- English coverage best; non-English near-empty
- Anonymous downloads work, but heavy bursts may trigger Addic7ed's
  bot detection and a short IP throttle (~1 hour). The script makes no
  attempt at jitter or backoff
- No automated sync-quality check; recipe Step 6 still manual
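
The script has no jitter or backoff today. If they were added, a minimal "full jitter" exponential retry might look like the following. This is a hypothetical helper that catches every exception for brevity; a real one would narrow that to throttling and timeout errors.

```python
import random
import time


def with_retries(fn, attempts=4, base=2.0, cap=60.0, rng=random, sleep=time.sleep):
    """Call fn(); on failure sleep a delay drawn uniformly from
    [0, min(cap, base * 2**n)] and retry. Re-raises the last error."""
    for n in range(attempts):
        try:
            return fn()
        except Exception:
            if n == attempts - 1:
                raise
            sleep(rng.uniform(0, min(cap, base * 2 ** n)))
```

This only smooths bursts. A delay capped at 60 s will not outlast an hour-long IP throttle, so spacing requests between episodes still matters more than retrying.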

## v3.5 — 2026-05-10 (stop-gap path for niche YouTube-distributed shows)

For shows that distribute on YouTube and have no community subs anywhere
(verified by parallel research agents covering OS REST / OS legacy /
Addic7ed / SubDL / SubSource / Podnapisi for 5 niche shows), pull the
YouTube auto-CC track via yt-dlp and clean it.
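
The yt-dlp step can be sketched as building a command that fetches only the auto-generated caption track. This is not the contents of `lib/sub-yt-fetch.sh`. The flags are standard yt-dlp options; the URL and output template are placeholders.

```python
def ytdlp_autocc_argv(url, out_template, lang="en"):
    """yt-dlp command that downloads only the auto-generated captions, as VTT."""
    return [
        "yt-dlp",
        "--skip-download",    # captions only, no video
        "--write-auto-subs",  # the ASR (auto-CC) track, not uploaded subs
        "--sub-langs", lang,
        "--sub-format", "vtt",
        "-o", out_template,
        url,
    ]
```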

- New helper: `lib/sub-yt-fetch.sh` (yt-dlp wrapper) + `lib/yt-clean.py`
  (rolling-window VTT → flat SRT cleaner)
- First applied to **Sassy the Sasquatch (2022)**, S01 5/5 episodes
- Reusable for the rest of the Big Lez universe (same channel hosts
  Donny & Clarence, Mike Nolan, Big Lez Saga)

### v3.5 known limits — explicitly violates STYLE.md "best quality"

- Lowercase, no punctuation, no sentence segmentation
- Proper-noun mishears (Sassy → "sasha", Big Lez → "Big Less")
- Profanity censored as `[ __ ]` by YouTube's ASR
- Will be replaced wholesale by v4 WhisperX (see ROADMAP H5)

### v3.5 also discovered

- **OpenSubtitles VIP would not have helped.** Verified: VIP is download-cap
  relief and ad removal, not a coverage unlock. Same catalogue as the free tier.
- **Mike Nolan special case**: a YouTube upload titled
  "MIKE NOLAN SHOW | COMPLETE SEASON | SUBTITLES" (Oct 2025) carries
  hand-typed CCs. When subbing Mike Nolan, prefer ripping that single
  upload over the per-episode auto-CC playlist path.

## v4 — planned (see ROADMAP H5)

Path: **WhisperX large-v3 on a friend's RTX 4080 node** (`100.64.0.3`).

- Replaces the v3.5 stop-gap with full-quality auto-transcription
- Per-show proper-noun prompt at `playbooks/subtitles/prompts/<show>.yaml`
- New helper: `lib/sub-whisperx-fetch.py` (TBD)
- Expected WER: 4–6% on noisy / animated dialogue (vs ~12% YT auto-CC)
- Restores STYLE.md "one clean English sub per ep" bar for niche shows
- Cloud fallback: ElevenLabs Scribe v2 (~$0.40/hr, ~2.2% WER) for any
  episode WhisperX still misses
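
The WER figures compare a transcript against a reference. WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. Here is a minimal sketch. Normalize case and punctuation on both sides first; otherwise YouTube's lowercase, unpunctuated output gets charged for formatting rather than for mishears.

```python
def wer(reference, hypothesis):
    """Word error rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    prev = list(range(len(hyp) + 1))  # distances against an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion (reference word dropped)
                cur[j - 1] + 1,          # insertion (extra hypothesis word)
                prev[j - 1] + (r != h),  # substitution, or a free match
            )
        prev = cur
    return prev[-1] / len(ref)
```

For example, `wer("big lez and sassy", "big less and sasha")` is 0.5: two substitutions over four reference words, the same proper-noun mishears listed under v3.5.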