Owner accepted Sassy the Sasquatch S01 v3.5 YouTube-auto-CC subs as '85 % acceptable, fine, not great' but flagged them for v4 WhisperX rebuild. Adds a single worklist file (STOPGAP-SUBS.md) so every show that ships via the v3.5 path gets logged for the eventual v4 sweep instead of being silently forgotten. Sassy run log gets a STOP-GAP banner at the top pointing to the new worklist. README.md gets a stop-gap-exception note alongside the STYLE.md hard-prereq paragraph. ROADMAP H5 now points at the worklist file as the canonical source of which shows v4 needs to regenerate.
# Subtitle acquisition process — v1

Last updated: 2026-05-10
Status: **v3.5** — four fetch paths (plugin / OS REST / Addic7ed / YouTube auto-CC). American Dad 49/58 + Sassy 5/5. v4 WhisperX planned (ROADMAP H5).

This recipe is written for Claude Code to execute. Each step lists the exact
command, what to verify, and what to do on failure. Background reference for
how Jellyfin and the OpenSubtitles plugin work together lives in
[`docs/03-subtitles.md`](../../docs/03-subtitles.md).

> **Read [`STYLE.md`](STYLE.md) first.** Every fetch must hit the
> bar set there: one English `.srt` per episode, plain (no SDH / no MT / no
> AI / no Forced), best-quality release. The picker logic in v1/v2/v3
> mirrors that bar; if a step would violate it, stop and ask before
> downloading.
>
> Stop-gap exception: when the only available source is the v3.5 YouTube
> auto-CC path (lowercase, censored, mangled names), ship the sub but
> **add the show to [`STOPGAP-SUBS.md`](STOPGAP-SUBS.md)** so v4 WhisperX
> picks it up later.

---

## Prereqs (verify before running)

| Check | How |
|---|---|
| OpenSubtitles plugin v20 installed + Active | `docker exec jellyfin ls /config/plugins \| grep -i opensub` |
| Plugin creds saved (`Caveman5`) | `docker exec jellyfin grep -E 'Username\|CredentialsInvalid' /config/plugins/configurations/Jellyfin.Plugin.OpenSubtitles.xml` — expect `Caveman5` and `false` |
| TV library has `SaveSubtitlesWithMedia=true`, `SubtitleDownloadLanguages=["eng"]`, `RequirePerfectSubtitleMatch=false` | `curl -s -H "X-Emby-Token: $TOK" http://localhost:8096/Library/VirtualFolders` |
| Free-tier quota remaining today (≥ episode count, else plan multi-day) | `docker logs --tail 200 jellyfin 2>&1 \| grep "Remaining downloads" \| tail -1` (free = 20/day, resets 00:00 UTC) |
| Source files have audio language tag | run `ffprobe` on a sample episode (see Step 1) |

If any prereq fails, stop. Fix it before running the recipe.

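The library-settings row in the table above can be checked mechanically rather than by eye. The following is a minimal sketch, assuming the `/Library/VirtualFolders` response is a JSON list whose entries carry `Name` and `LibraryOptions` keys; the library name `TV` and the helper name are hypothetical, so verify both against your server's actual output.

```python
# Expected settings, copied from the prereq table.
EXPECTED = {
    "SaveSubtitlesWithMedia": True,
    "SubtitleDownloadLanguages": ["eng"],
    "RequirePerfectSubtitleMatch": False,
}


def library_option_mismatches(folders, library_name):
    """Return {option: actual_value} for each setting that differs from EXPECTED.

    `folders` is the parsed JSON from /Library/VirtualFolders. The "Name" and
    "LibraryOptions" keys are an assumption about the response shape.
    """
    for folder in folders:
        if folder.get("Name") == library_name:
            options = folder.get("LibraryOptions", {})
            return {
                key: options.get(key)
                for key, want in EXPECTED.items()
                if options.get(key) != want
            }
    raise KeyError(f"no library named {library_name!r}")


# Hypothetical sample response with one wrong setting.
sample = [
    {
        "Name": "TV",
        "LibraryOptions": {
            "SaveSubtitlesWithMedia": True,
            "SubtitleDownloadLanguages": ["eng"],
            "RequirePerfectSubtitleMatch": True,
        },
    }
]
print(library_option_mismatches(sample, "TV"))
# {'RequirePerfectSubtitleMatch': True}
```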
---

## Step 1 — Probe the source

Pick one episode of the target show. Run `ffprobe` on it:

```bash
ssh user@192.168.0.100 'docker exec jellyfin /usr/lib/jellyfin-ffmpeg/ffprobe -hide_banner "<path-to-mkv>" 2>&1 | grep -E "Stream|Duration"'
```

Record in the run log:

- video codec + resolution + frame rate
- audio language tag(s)
- whether any subtitle streams are embedded
- container

Decide based on probe:

| Probe result | Branch |
|---|---|
| English audio, no embedded subs | "simple" path (this recipe) |
| Foreign-dub audio, no embedded subs | "foreign-dub" path (deferred to v?) |
| Embedded English subs already present | skip — Jellyfin will use them |
| Embedded PGS/VobSub bitmap subs | "OCR" path (deferred to v?) |

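The decision table can be expressed as a small classifier. This sketch assumes the probe is rerun with `ffprobe -v error -print_format json -show_streams` so the streams arrive as JSON rather than grepped text. The precedence between rows (embedded English text subs first, then bitmap subs, then audio language) and the extra `untagged` result are assumptions, since the table does not say which row wins when several match.

```python
# ffmpeg codec names for bitmap subtitle formats (PGS and VobSub).
BITMAP_SUB_CODECS = {"hdmv_pgs_subtitle", "dvd_subtitle"}


def pick_branch(streams):
    """Map a list of ffprobe stream dicts to a branch name from the table."""

    def lang(stream):
        return stream.get("tags", {}).get("language", "und")

    subs = [s for s in streams if s.get("codec_type") == "subtitle"]
    audio_langs = {lang(s) for s in streams if s.get("codec_type") == "audio"}

    # Embedded English text subs: nothing to fetch.
    if any(
        s.get("codec_name") not in BITMAP_SUB_CODECS and lang(s) == "eng"
        for s in subs
    ):
        return "skip"
    # Bitmap subs need OCR before they are usable as .srt.
    if any(s.get("codec_name") in BITMAP_SUB_CODECS for s in subs):
        return "ocr"
    if "eng" in audio_langs:
        return "simple"
    if audio_langs - {"und"}:
        return "foreign-dub"
    # No usable audio language tag: the last prereq is not met.
    return "untagged"


probe = {
    "streams": [
        {"codec_type": "video", "codec_name": "h264"},
        {"codec_type": "audio", "codec_name": "aac", "tags": {"language": "eng"}},
    ]
}
print(pick_branch(probe["streams"]))
# simple
```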
---

## Step 2 — Resolve series + episode IDs

```bash
TOK=<jellyfin-admin-token>
SERIES_NAME='American Dad'
ssh user@192.168.0.100 "docker exec jellyfin curl -s -H 'X-Emby-Token: $TOK' \
  'http://localhost:8096/Items?searchTerm=${SERIES_NAME// /+}&IncludeItemTypes=Series&Recursive=true&Limit=3'" \
  | python3 -c "import json,sys; [print(x['Id'],x['Name']) for x in json.load(sys.stdin).get('Items',[])]"
```

Record the series `Id`, then list its episodes:

```bash
SERIES=<series-id>
ssh user@192.168.0.100 "docker exec jellyfin curl -s -H 'X-Emby-Token: $TOK' \
  'http://localhost:8096/Items?ParentId=$SERIES&IncludeItemTypes=Episode&Recursive=true&Fields=Path,ParentIndexNumber,IndexNumber'" \
  | python3 -c "import json,sys; [print(e['Id'],'S%02dE%02d'%(e['ParentIndexNumber'],e['IndexNumber']),e['Name']) for e in json.load(sys.stdin)['Items']]"
```

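The episode one-liner above raises `KeyError` if any item lacks `ParentIndexNumber` or `IndexNumber`. Whether that happens in a given library is an assumption (unmatched files and specials are the usual suspects), but a tolerant version is cheap. The sketch below takes the same JSON payload; the helper names are hypothetical.

```python
def episode_label(item):
    """Format an item as SxxEyy, or S??E?? when either number is missing."""
    season = item.get("ParentIndexNumber")
    episode = item.get("IndexNumber")
    if season is None or episode is None:
        return "S??E??"
    return f"S{season:02d}E{episode:02d}"


def list_episodes(payload):
    """Return sorted (label, id, name) rows from an /Items response.

    '?' sorts after the digits, so unnumbered items land at the end.
    """
    rows = [
        (episode_label(e), e["Id"], e.get("Name", ""))
        for e in payload.get("Items", [])
    ]
    return sorted(rows)


# Hypothetical payload: one unnumbered item, two regular episodes.
payload = {
    "Items": [
        {"Id": "c3", "Name": "Unmatched file"},
        {"Id": "b2", "Name": "Second", "ParentIndexNumber": 1, "IndexNumber": 2},
        {"Id": "a1", "Name": "Pilot", "ParentIndexNumber": 1, "IndexNumber": 1},
    ]
}
for label, item_id, name in list_episodes(payload):
    print(item_id, label, name)
# a1 S01E01 Pilot
# b2 S01E02 Second
# c3 S??E?? Unmatched file
```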
---

## Step 3 — Pick fetch path

Four working paths plus one planned (v4). v3, v2, and v1 are ordered
cheapest-quota-cost-first; v3.5 and v4 cover shows with no community subs:

| Path | Cost / day cap | Coverage | Tool |
|---|---|---|---|
| **v3 Addic7ed** | free, no daily cap (anon) | English-only; near-complete on broadcast US shows; spotty on animated specials / niche titles | `lib/sub-a7d-fetch.py` |
| **v2 OS REST** | 20 / day on free OS account | best overall coverage; survives any S/E numbering quirk via per-ep `imdb_id` | `lib/sub-rest-fetch.py` |
| **v1 plugin** | counts against same OS 20/day | only works when library numbering matches OS catalogue (e.g. fails on American Dad past S01E07) | `lib/sub-fetch.sh` |
| **v3.5 YouTube auto-CC** | free, rate-limited only | for shows distributed YouTube-first (no community subs anywhere); produces lowercase, no-punctuation, name-mangled subs — **stop-gap, violates STYLE.md** | `lib/sub-yt-fetch.sh` + `lib/yt-clean.py` |
| **v4 WhisperX (planned)** | local CPU/GPU time | full-quality auto-transcription, restores STYLE.md bar for niche shows | TBD `lib/sub-whisperx-fetch.py` (ROADMAP H5) |

Default: try **v3** first to spare quota; fall back to **v2** for episodes
v3 misses or for non-English needs. **v1** stays for shows where simple
plugin auto-fetch is enough. **v3.5** is the stop-gap when nothing exists
on community providers; **v4** replaces v3.5 once the GPU node is set up.

Quick check whether v1 plugin will suffice (skip the rest if yes):

1. Pick the first episode of season 2 in the library.
2. Run `curl -s -H "X-Emby-Token: $TOK" "http://localhost:8096/Items/$EP/RemoteSearch/Subtitles/eng"` (read-only; double quotes so the shell expands `$TOK` and `$EP`).
3. If results > 0, v1 works.
4. If results == 0 but the show exists on opensubtitles.com, the numbering is mismatched (e.g. American Dad: the library uses Hulu numbering with 7 episodes in S1; OS uses a different split). Use **v3**, then **v2** for misses.

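The default order (v3 first, v2 for v3 misses, quota permitting) can be sketched as a planning function. This is an illustration only, not how the `lib/` scripts work: the function name, the hit sets, and the `v2-deferred` / `miss` labels are all hypothetical. A `miss` does not automatically mean v3.5; that path is reserved for shows with no community subs anywhere.

```python
def plan_fetch(episodes, a7d_hits, os_hits, quota_left):
    """Assign each episode label to a fetch path.

    a7d_hits / os_hits: sets of episode labels each provider can serve
    (hypothetical inputs, e.g. gathered from dry runs).
    quota_left: OpenSubtitles downloads remaining today.
    """
    plan = {}
    for ep in episodes:
        if ep in a7d_hits:
            plan[ep] = "v3"            # free, no quota spent
        elif ep in os_hits and quota_left > 0:
            plan[ep] = "v2"            # spends one download
            quota_left -= 1
        elif ep in os_hits:
            plan[ep] = "v2-deferred"   # available, but wait for quota reset
        else:
            plan[ep] = "miss"          # no community sub found
    return plan


episodes = ["S01E01", "S01E02", "S01E03", "S01E04"]
plan = plan_fetch(
    episodes,
    a7d_hits={"S01E01"},
    os_hits={"S01E02", "S01E03"},
    quota_left=1,
)
print(plan)
# {'S01E01': 'v3', 'S01E02': 'v2', 'S01E03': 'v2-deferred', 'S01E04': 'miss'}
```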
---

## Step 4 — Fetch subs per episode

### v3 — Addic7ed (default, free)

```bash
JELLYFIN_TOKEN=<admin-token> \
OPENSUBTITLES_API_KEY=$HOME/.config/arrflix-opensubtitles-api.txt \
processes/subtitles/lib/sub-a7d-fetch.py <series-id> --season N [--start E] [--end E]
```

Pre-flight with `DRY_RUN=1`. The OS REST key is used only for search
(quota-free) to translate library S/E to the show's catalogue numbering.

### v2 — OpenSubtitles REST (fallback for v3 misses)

```bash
JELLYFIN_TOKEN=<admin-token> \
OPENSUBTITLES_API_KEY=$HOME/.config/arrflix-opensubtitles-api.txt \
OPENSUBTITLES_USER=Caveman5 \
OPENSUBTITLES_PASS=<password> \
processes/subtitles/lib/sub-rest-fetch.py <series-id> --season N [--start E] [--end E]
```

20 / day cap, resets at 00:00 UTC.

### v1 — Jellyfin plugin (when library numbering matches OS)

`lib/sub-fetch.sh` — see header for env. Counts against the same 20/day cap.

### Verify after each batch

```bash
ssh user@192.168.0.100 'ls "<media-dir>/" | grep -c "\.eng\.srt$"'
```

The count should equal the number of episodes fetched so far.

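A bare count says how many sidecars exist but not which episodes lack one. The sketch below works on a plain directory listing (for example the output of `ls` split into lines) and reports video files with no matching `<base>.eng.srt`. The set of video extensions and the function name are assumptions; adjust them to the library.

```python
from pathlib import PurePosixPath

# Assumed video extensions; extend as needed.
VIDEO_EXTS = {".mkv", ".mp4", ".avi"}


def missing_sidecars(filenames):
    """Return video files that have no <base>.eng.srt next to them."""
    names = set(filenames)
    missing = []
    for name in sorted(names):
        path = PurePosixPath(name)
        if path.suffix.lower() not in VIDEO_EXTS:
            continue
        # stem drops only the final extension, so dotted release names survive.
        if f"{path.stem}.eng.srt" not in names:
            missing.append(name)
    return missing


listing = [
    "Show S01E01.mkv",
    "Show S01E01.eng.srt",
    "Show S01E02.mkv",
    "Show.S01E03.1080p.mkv",
    "Show.S01E03.1080p.eng.srt",
]
print(missing_sidecars(listing))
# ['Show S01E02.mkv']
```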
---

## Step 5 — Library scan + de-dup (v1 only)

If you used the v1 plugin path, the metadata-cache copy and the media-folder
sidecar both register as subtitle streams in Jellyfin (counted twice).
Delete the cache copies:

```bash
ssh user@192.168.0.100 'docker exec jellyfin bash -c "find /config/metadata/library -path \"*<show-name>*S0[1-9]E*.eng.srt\" -delete -print"'
```

The `S0[1-9]E*` glob covers seasons 1 to 9 only; widen it for shows with ten
or more seasons.

v2 writes directly to the media folder so there is no cache copy to clean.

Trigger a validation-only refresh so Jellyfin sees the new sidecars:

```bash
ssh user@192.168.0.100 "docker exec jellyfin curl -s -X POST -H 'X-Emby-Token: $TOK' \
  'http://localhost:8096/Items/$SERIES/Refresh?MetadataRefreshMode=ValidationOnly&Recursive=true'"
```

Confirm one episode has exactly 1 external eng sub stream:

```bash
ssh user@192.168.0.100 "docker exec jellyfin curl -s -H 'X-Emby-Token: $TOK' \
  'http://localhost:8096/Items/<sample-ep-id>?Fields=MediaStreams'" \
  | python3 -c "import json,sys; subs=[s for s in json.load(sys.stdin).get('MediaStreams',[]) if s['Type']=='Subtitle' and s.get('IsExternal')]; print(len(subs),'external sub streams')"
```

---

## Step 6 — Quality gate

For the run to pass:

- [ ] **Coverage**: every episode has a matching `<base>.eng.srt` sidecar
- [ ] **Sync sample**: at least one episode of each season is opened in
  Jellyfin web and subs visually align with audio (±1 s) on a known dialogue
  line
- [ ] **Flag check**: no `.sdh.srt`, `.forced.srt`, or `.hi.srt` files
  (machine pick should have filtered)
- [ ] **Stream count**: Jellyfin shows exactly 1 external eng sub per episode

If any check fails, log it in `runs/<show>.md` under "breakage" and propose
the recipe amendment in `CHANGELOG.md`.

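Two of the checks above, the flag check and the stream count, are mechanical and can be scripted; the sync sample still needs a human. This is a minimal sketch under stated assumptions: the function name is hypothetical, `filenames` is a directory listing, and `external_sub_counts` maps an episode label to the number of external subtitle streams Jellyfin reports for it.

```python
# Suffixes the flag check forbids, from the checklist above.
BANNED_SUFFIXES = (".sdh.srt", ".forced.srt", ".hi.srt")


def quality_gate_failures(filenames, external_sub_counts):
    """Return a list of human-readable failures; an empty list means pass."""
    failures = []
    for name in sorted(filenames):
        # endswith accepts a tuple, so this also catches e.g. ".eng.sdh.srt".
        if name.lower().endswith(BANNED_SUFFIXES):
            failures.append(f"flag check: {name}")
    for episode, count in sorted(external_sub_counts.items()):
        if count != 1:
            failures.append(f"stream count: {episode} has {count}")
    return failures


files = ["Show S01E01.eng.srt", "Show S01E02.eng.sdh.srt"]
counts = {"S01E01": 1, "S01E02": 2}
for line in quality_gate_failures(files, counts):
    print(line)
# flag check: Show S01E02.eng.sdh.srt
# stream count: S01E02 has 2
```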
---

## Quota hygiene

Free OpenSubtitles.com account = 20 downloads / day, resets 00:00 UTC.
Plan large series across multiple days, or switch to VIP (~$3/mo, unlimited).

Quota check:

```bash
ssh user@192.168.0.100 'docker logs --tail 200 jellyfin 2>&1 | grep "Remaining downloads" | tail -1'
```

When quota hits 0 the API returns 0 results, indistinguishable from a real
miss. Always check quota before declaring a "no subs" failure.
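
Planning a multi-day run is simple arithmetic on the 20/day cap. The sketch below splits an episode list into per-day batches, using whatever quota is left today for the first batch and the full cap afterwards. The function name is hypothetical. An empty first batch means today's quota is already spent.

```python
DAILY_CAP = 20  # free OpenSubtitles.com tier, per the text above


def daily_batches(episodes, remaining_today=DAILY_CAP, daily_cap=DAILY_CAP):
    """Split episodes into one batch per UTC day without exceeding quota."""
    if daily_cap <= 0 or remaining_today < 0:
        raise ValueError("daily_cap must be positive and remaining_today >= 0")
    batches = []
    start, size = 0, remaining_today
    while start < len(episodes):
        batches.append(episodes[start:start + size])
        start += size
        size = daily_cap  # every later day gets the full cap
    return batches


episodes = [f"E{n:02d}" for n in range(1, 59)]  # 58 episodes

print([len(b) for b in daily_batches(episodes)])
# [20, 20, 18]
print([len(b) for b in daily_batches(episodes, remaining_today=5)])
# [5, 20, 20, 13]
```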