processes: subtitle acquisition v1 + AD S01 run

Adds processes/ umbrella for repeatable acquisition workflows. First child
is subtitles/, with recipe README (executable by Claude Code), CHANGELOG,
per-show run logs, and a tested helper at lib/sub-fetch.sh.

Run on American Dad: S01 (7 eps) passed, S02-S04 (51 eps) broke. Library
uses Hulu/DSP season ordering; OpenSubtitles indexes by Fox airing order;
plugin queries by (parent_imdb_id, season, episode) so library S02E01
returns 0 hits. v2 design = direct OpenSubtitles REST with per-episode
imdb_id lookup; pending API-key registration.
s8n 2026-05-09 22:56:12 +01:00
parent 1ed55152b7
commit fedf3388b8
6 changed files with 463 additions and 0 deletions

processes/README.md Normal file

@ -0,0 +1,24 @@
# processes/ — repeatable acquisition workflows
This folder holds the canonical recipes for **acquiring external content** for
the ARRFLIX library: subtitles, artwork, metadata, episode stills, etc.
Internal ops (encoding, importing, theming) stay in `bin/` and `docs/`.
Each process is its own sub-folder with three files:
| File | Purpose |
|---|---|
| `README.md` | The canonical recipe. Step-by-step, executable by Claude Code. Always reflects the latest version. |
| `CHANGELOG.md` | Why the recipe changed, version-by-version. One entry per breakage that forced a revision. |
| `runs/<show>.md` | Evidence log: what happened when this recipe was applied to a specific show. |
Recipes evolve via the **iteration model**: apply to a show, succeed or break,
amend the recipe to handle the new case + every prior case, retry. A recipe
that "just works" is one that has survived every show in the library without
amendment for a full sweep.
## Children
| Process | Status | Last touched |
|---|---|---|
| [`subtitles/`](subtitles/) | v1 — partial pass on American Dad (S01 only); broke on S02 | 2026-05-09 |


@ -0,0 +1,48 @@
# Subtitle process — changelog
## v1 — 2026-05-09
Initial recipe. Drafted while running on American Dad. Distilled from doc
03-subtitles.md (Futurama work) and the actual AD run.
Approach: Jellyfin RemoteSearch/Subtitles/eng → pick best non-HI/non-MT match
via Python filter → POST download → docker cp metadata cache → media folder →
delete cache dupes → validation refresh.
Scope: works on shows whose library season/episode numbering matches
OpenSubtitles' indexed numbering. Verified passing on AD S01 (7/7 episodes).
### Known break — added 2026-05-09 same day
After S01 passed, S02 returned 0 results for every episode probed (E01, E02,
E08, E13). Quota was fine (13 downloads remaining). Cause:
> Jellyfin metadata for American Dad uses **Hulu/DSP season ordering**
> (S1=7, S2=16, S3=19, S4=16). OpenSubtitles indexes by **Fox original-airing
> order** where S1 has 23 episodes. The plugin queries OS by
> `(parent_imdb_id, season_number, episode_number)`. For library S02E01
> "Bullocks to Stan" the plugin sends `S=2,E=1` but OS catalogues that
> episode as `S=1,E=8`. Result: 0 hits.
Each library episode has its own correct per-episode IMDB id (e.g.
`tt0511631` for "Bullocks to Stan") which would resolve directly via OS REST
`imdb_id=` parameter, but the plugin doesn't expose that path.
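The mismatch can be reproduced with a little arithmetic. The sketch below converts a library (season, episode) pair into the other ordering through the episode's absolute position. It assumes both orderings list episodes in the same overall sequence and differ only in where seasons are cut; `remap_episode` is a hypothetical helper, and the second Fox season size is a placeholder, not a verified count.

```python
def remap_episode(season, episode, src_sizes, dst_sizes):
    """Translate a 1-based (season, episode) pair between two season layouts."""
    if not 1 <= season <= len(src_sizes) or not 1 <= episode <= src_sizes[season - 1]:
        raise ValueError("episode not present in source layout")
    # Absolute 1-based position of the episode across the whole series.
    absolute = sum(src_sizes[:season - 1]) + episode
    # Walk the destination seasons until the absolute position fits.
    for dst_season, size in enumerate(dst_sizes, start=1):
        if absolute <= size:
            return dst_season, absolute
        absolute -= size
    raise ValueError("episode falls outside destination layout")


LIBRARY = [7, 16, 19, 16]  # Hulu/DSP season sizes from the run log
FOX = [23, 35]             # S1=23 is from the text; 35 is a placeholder

# Library S02E01 ("Bullocks to Stan") expressed in Fox order.
print(remap_episode(2, 1, LIBRARY, FOX))
```

This only illustrates why `S=2,E=1` misses; the per-episode IMDB id route avoids needing any such table.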
### v2 — pending design
Two paths under consideration:
- **A. Direct OpenSubtitles REST** — bypass plugin for fetch, use per-episode
IMDB id lookup. Requires registering a free API key at
`opensubtitles.com/consumers`. Process becomes a Python script (or extends
the existing helper) that logs in with `Caveman5` creds and uses the API
key for searches. Survives any season-numbering mismatch.
- **B. Library re-numbering** — re-scan AD with metadata indexer using Fox
airing order so library aligns with OpenSubtitles. Risk: re-orders existing
files and breaks user's mental model of the library. Doesn't help if the
next show has its own numbering quirk.
Recommendation: **A**. It's the more general fix; the next show with weird
numbering won't break it. It also unblocks higher-quality manual pick (filter
by `feature_id`, `imdb_id`, hash) which the plugin filters out today.
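A minimal sketch of what the path A search request might look like, for discussion only. The endpoint root, the `Api-Key` header, and the `imdb_id` / `languages` query parameters reflect my understanding of the public OpenSubtitles REST API and should be checked against its documentation before use; `build_search` and the `User-Agent` string are hypothetical. Nothing is sent over the network here.

```python
from urllib.parse import urlencode

# Assumed API root; verify against the OpenSubtitles REST documentation.
API_ROOT = "https://api.opensubtitles.com/api/v1"


def build_search(imdb_id, api_key, language="en", user_agent="arrflix-subs v0"):
    """Return (url, headers) for a per-episode subtitle search."""
    text = str(imdb_id).lower()
    if text.startswith("tt"):
        text = text[2:]
    # Assumption: the API expects the numeric id without the tt prefix.
    numeric = int(text)
    query = urlencode({"imdb_id": numeric, "languages": language})
    headers = {
        "Api-Key": api_key,        # key registered for the consumer app
        "User-Agent": user_agent,  # hypothetical app identifier
        "Accept": "application/json",
    }
    return f"{API_ROOT}/subtitles?{query}", headers


url, headers = build_search("tt0511631", api_key="<api-key>")
print(url)
```

Because the lookup is keyed on the episode's own IMDB id, the season and episode numbers never enter the query.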


@ -0,0 +1,193 @@
# Subtitle acquisition process — v1
Last updated: 2026-05-09
Status: **v1, partial** — passed American Dad S01 (7/7 eps), broke on S02E01 due to season-numbering mismatch. v2 design pending.
This recipe is written for Claude Code to execute. Each step lists the exact
command, what to verify, and what to do on failure. Background reference for
how Jellyfin and the OpenSubtitles plugin work together lives in
[`docs/03-subtitles.md`](../../docs/03-subtitles.md).
---
## Prereqs (verify before running)
| Check | How |
|---|---|
| OpenSubtitles plugin v20 installed + Active | `docker exec jellyfin ls /config/plugins \| grep -i opensub` |
| Plugin creds saved (`Caveman5`) | `docker exec jellyfin grep -E 'Username\|CredentialsInvalid' /config/plugins/configurations/Jellyfin.Plugin.OpenSubtitles.xml` — expect `Caveman5` and `false` |
| TV library has `SaveSubtitlesWithMedia=true`, `SubtitleDownloadLanguages=["eng"]`, `RequirePerfectSubtitleMatch=false` | `curl -s -H "X-Emby-Token: $TOK" http://localhost:8096/Library/VirtualFolders` |
| Free-tier quota remaining today (≥ episode count, else plan multi-day) | `docker logs --tail 200 jellyfin 2>&1 \| grep "Remaining downloads" \| tail -1` (free = 20/day, resets 00:00 UTC) |
| Source files have audio language tag | `ffprobe` sample episode |
If any prereq fails, stop. Fix it before running the recipe.
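The library-options prereq can be checked mechanically once the `VirtualFolders` response has been saved. The sketch below assumes the response is a JSON list whose entries carry `Name` and `LibraryOptions`; `library_option_problems` is a hypothetical helper, and the sample payload is made up to mirror the expected values.

```python
import json

# Expected values taken from the prereq table above.
EXPECTED = {
    "SaveSubtitlesWithMedia": True,
    "SubtitleDownloadLanguages": ["eng"],
    "RequirePerfectSubtitleMatch": False,
}


def library_option_problems(virtual_folders, library_name):
    """Return a list of human-readable mismatches (empty list means OK)."""
    for folder in virtual_folders:
        if folder.get("Name") == library_name:
            options = folder.get("LibraryOptions") or {}
            return [
                f"{key}: expected {want!r}, got {options.get(key)!r}"
                for key, want in EXPECTED.items()
                if options.get(key) != want
            ]
    return [f"library {library_name!r} not found"]


# Made-up sample standing in for the saved curl output.
sample = json.loads("""[
  {"Name": "TV Shows",
   "LibraryOptions": {"SaveSubtitlesWithMedia": true,
                      "SubtitleDownloadLanguages": ["eng"],
                      "RequirePerfectSubtitleMatch": true}}
]""")
print(library_option_problems(sample, "TV Shows"))
```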
---
## Step 1 — Probe the source
Pick one episode of the target show. Run `ffprobe` on it:
```bash
ssh user@192.168.0.100 'docker exec jellyfin /usr/lib/jellyfin-ffmpeg/ffprobe -hide_banner "<path-to-mkv>" 2>&1 | grep -E "Stream|Duration"'
```
Record in the run log:
- video codec + resolution + frame rate
- audio language tag(s)
- whether any subtitle streams are embedded
- container
Decide based on probe:
| Probe result | Branch |
|---|---|
| English audio, no embedded subs | "simple" path (this recipe) |
| Foreign-dub audio, no embedded subs | "foreign-dub" path (deferred to v?) |
| Embedded English subs already present | skip — Jellyfin will use them |
| Embedded PGS/VobSub bitmap subs | "OCR" path (deferred to v?) |
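The branch table can be written as a small decision function. This is a sketch under assumptions: the stream dicts (`type`, `lang`, `codec`) are a hypothetical summary of the `ffprobe` output, the bitmap check uses ffmpeg's `hdmv_pgs_subtitle` and `dvd_subtitle` codec names, and bitmap subs are tested before the English-subs check because the table does not say which wins when both apply.

```python
# ffmpeg codec names for PGS and VobSub bitmap subtitles.
BITMAP_CODECS = {"hdmv_pgs_subtitle", "dvd_subtitle"}


def choose_branch(streams):
    """Map a probed stream list to one of the recipe branches."""
    subs = [s for s in streams if s["type"] == "subtitle"]
    if any(s["codec"] in BITMAP_CODECS for s in subs):
        return "ocr"          # deferred path
    if any(s.get("lang") == "eng" for s in subs):
        return "skip"         # Jellyfin will use the embedded subs
    audio_langs = {s.get("lang") for s in streams if s["type"] == "audio"}
    return "simple" if "eng" in audio_langs else "foreign-dub"


# Stream layout matching the American Dad probe in the run log.
american_dad = [
    {"type": "video", "lang": None, "codec": "hevc"},
    {"type": "audio", "lang": "eng", "codec": "aac"},
    {"type": "audio", "lang": "eng", "codec": "ac3"},
]
print(choose_branch(american_dad))
```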
---
## Step 2 — Resolve series + episode IDs
```bash
TOK=<jellyfin-admin-token>
SERIES_NAME='American Dad'
ssh user@192.168.0.100 "docker exec jellyfin curl -s -H 'X-Emby-Token: $TOK' \
'http://localhost:8096/Items?searchTerm=${SERIES_NAME// /+}&IncludeItemTypes=Series&Recursive=true&Limit=3'" \
| python3 -c "import json,sys; [print(x['Id'],x['Name']) for x in json.load(sys.stdin).get('Items',[])]"
```
Record series Id. Then list episodes:
```bash
SERIES=<series-id>
ssh user@192.168.0.100 "docker exec jellyfin curl -s -H 'X-Emby-Token: $TOK' \
'http://localhost:8096/Items?ParentId=$SERIES&IncludeItemTypes=Episode&Recursive=true&Fields=Path,ParentIndexNumber,IndexNumber'" \
| python3 -c "import json,sys; [print(e['Id'],'S%02dE%02d'%(e['ParentIndexNumber'],e['IndexNumber']),e['Name']) for e in json.load(sys.stdin)['Items']]"
```
---
## Step 3 — Validate season numbering against OpenSubtitles
> ⚠️ **Critical, added in v2** (currently provisional — see CHANGELOG): some shows
> are catalogued differently across services. American Dad is the canonical
> example: Hulu/DSP carriers split the original Fox 23-ep S1 into Hulu S1 (7
> eps) + S2 (16 eps). OpenSubtitles indexes by Fox airing order. The plugin
> queries by `(parent_imdb_id, season, episode)` so library-side Hulu numbering
> returns 0 results past the first 7 episodes.
How to check:
1. Pick the first episode of season 2 in the library.
2. Run a `RemoteSearch/Subtitles/eng` against it (Step 4 below, but read-only).
3. If results > 0 — numbering matches OpenSubtitles. Proceed.
4. If results == 0 but the show exists on opensubtitles.com — numbering mismatch. **Stop**. Fix metadata first or use the v2 direct-API path (TBD).
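The check above, combined with the quota caveat from the end of this recipe, can be sketched as a single classification. `classify_probe` and its return labels are hypothetical names; whether the show exists on opensubtitles.com is still a manual lookup passed in as a flag.

```python
def classify_probe(result_count, quota_remaining, show_on_opensubtitles):
    """Interpret a read-only RemoteSearch probe of the first S02 episode."""
    if result_count > 0:
        return "proceed"               # numbering matches OpenSubtitles
    if quota_remaining <= 0:
        return "quota-exhausted"       # zero hits prove nothing; retry after reset
    if show_on_opensubtitles:
        return "numbering-mismatch"    # stop and fix metadata or use the v2 path
    return "no-subs-available"


# The American Dad S02E01 probe: 0 results, 13 downloads left, show is listed.
print(classify_probe(0, 13, True))
```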
---
## Step 4 — Fetch subs per episode
Per-episode loop. Helper script lives at `processes/subtitles/lib/sub-fetch.sh`
(promoted from `/tmp` once stable; see CHANGELOG v0→v1).
```bash
TOK=<token>
EP=<episode-id>
MEDIA_DIR='/home/user/media/tv/<Show>/Season XX'
MEDIA_BASE='<Show> - SxxExx - <Title>'
# 1. search
RAW=$(ssh user@192.168.0.100 "docker exec jellyfin curl -s -H 'X-Emby-Token: $TOK' \
'http://localhost:8096/Items/$EP/RemoteSearch/Subtitles/eng'")
# 2. pick best non-HI/non-MT/non-AI/non-Forced match, prefer 23.976fps, then highest DownloadCount
SUBID=$(printf '%s' "$RAW" | python3 -c "
import json,sys
subs=json.load(sys.stdin)
clean=[s for s in subs if not (s.get('HearingImpaired') or s.get('MachineTranslated') or s.get('AiTranslated') or s.get('Forced'))]
if not clean: clean=subs
fps2398=[s for s in clean if abs(s.get('FrameRate',0)-23.976)<0.01]
pool=fps2398 if fps2398 else clean
pool.sort(key=lambda s: -s.get('DownloadCount',0))
print(pool[0]['Id'] if pool else '')")
# 3. download (returns 204)
ssh user@192.168.0.100 "docker exec jellyfin curl -s -X POST -H 'X-Emby-Token: $TOK' \
'http://localhost:8096/Items/$EP/RemoteSearch/Subtitles/$SUBID' -w 'HTTP %{http_code}\n'"
# 4. plugin saves to /config/metadata/library/<shard>/<itemId>/<base>.eng.srt
# NOT next to media (manual-pick path ignores SaveSubtitlesWithMedia).
# Move it into place:
SHARD="${EP:0:2}"
ssh user@192.168.0.100 "docker cp \"jellyfin:/config/metadata/library/$SHARD/$EP/$MEDIA_BASE.eng.srt\" \
\"$MEDIA_DIR/\""
```
Verify after each batch:
```bash
ssh user@192.168.0.100 'ls "<media-dir>/" | grep -c eng.srt'
```
---
## Step 5 — Clean up duplicates + library scan
The metadata-cache copy and the media-folder sidecar both register as
subtitle streams in Jellyfin (counted twice). Delete the cache copies:
```bash
ssh user@192.168.0.100 'docker exec jellyfin bash -c "find /config/metadata/library -path \"*<show-name>*S0[1-9]E*.eng.srt\" -delete -print"'
```
Trigger a validation-only refresh so Jellyfin sees the new sidecars:
```bash
ssh user@192.168.0.100 "docker exec jellyfin curl -s -X POST -H 'X-Emby-Token: $TOK' \
'http://localhost:8096/Items/$SERIES/Refresh?MetadataRefreshMode=ValidationOnly&Recursive=true'"
```
Confirm one episode has exactly 1 external eng sub stream:
```bash
ssh user@192.168.0.100 "docker exec jellyfin curl -s -H 'X-Emby-Token: $TOK' \
'http://localhost:8096/Items/<sample-ep-id>?Fields=MediaStreams'" \
| python3 -c "import json,sys; subs=[s for s in json.load(sys.stdin).get('MediaStreams',[]) if s['Type']=='Subtitle' and s.get('IsExternal') and s.get('Language')=='eng']; print(len(subs),'external eng sub streams')"
```
---
## Step 6 — Quality gate
For the run to pass:
- [ ] **Coverage**: every episode has a matching `<base>.eng.srt` sidecar
- [ ] **Sync sample**: at least one episode of each season is opened in
Jellyfin web and subs visually align with audio (±1 s) on a known dialogue
line
- [ ] **Flag check**: no `.sdh.srt`, `.forced.srt`, or `.hi.srt` files
(machine pick should have filtered)
- [ ] **Stream count**: Jellyfin shows exactly 1 external eng sub per episode
If any check fails, log it in `runs/<show>.md` under "breakage" and propose
the recipe amendment in `CHANGELOG.md`.
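The coverage and flag checks lend themselves to automation; the sync sample does not. A minimal sketch, assuming the season folder listing has already been fetched as a list of filenames and that episodes are `.mkv` files; `check_season_dir` is a hypothetical helper.

```python
FLAGGED_SUFFIXES = (".sdh.srt", ".forced.srt", ".hi.srt")


def check_season_dir(filenames):
    """Return missing sidecars and flagged subtitle files for one season folder."""
    names = set(filenames)
    videos = [n for n in names if n.endswith(".mkv")]
    # Coverage: each video needs a sidecar named <base>.eng.srt.
    missing = sorted(v for v in videos if v[:-len(".mkv")] + ".eng.srt" not in names)
    # Flag check: none of the flagged variants should be present.
    flagged = sorted(n for n in names if n.endswith(FLAGGED_SUFFIXES))
    return {"missing": missing, "flagged": flagged}


# Made-up listing: one covered episode, one uncovered, one flagged file.
listing = [
    "Show - S01E01 - Pilot.mkv",
    "Show - S01E01 - Pilot.eng.srt",
    "Show - S01E02 - Second.mkv",
    "Show - S01E02 - Second.eng.sdh.srt",
]
print(check_season_dir(listing))
```

An empty `missing` and an empty `flagged` list correspond to the Coverage and Flag check boxes passing for that folder.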
---
## Quota hygiene
Free OpenSubtitles.com account = 20 downloads / day, resets 00:00 UTC.
Plan large series across multiple days, or switch to VIP (~$3/mo, unlimited).
Quota check:
```bash
ssh user@192.168.0.100 'docker logs --tail 200 jellyfin 2>&1 | grep "Remaining downloads" | tail -1'
```
When quota hits 0 the API returns 0 results, indistinguishable from a real
miss. Always check quota before declaring a "no subs" failure.
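Planning a multi-day run is simple arithmetic on the free-tier numbers above. The sketch assumes one download per episode and counts today as day 1; `days_needed` is a hypothetical helper.

```python
import math


def days_needed(episodes, remaining_today, daily_quota=20):
    """Calendar days (including today) to fetch one subtitle per episode."""
    if episodes <= 0:
        return 0
    if episodes <= remaining_today:
        return 1
    # Today uses up what is left; later days each get a full quota.
    return 1 + math.ceil((episodes - remaining_today) / daily_quota)


# 51 episodes outstanding with 13 downloads left today.
print(days_needed(51, 13))
```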


@ -0,0 +1,76 @@
#!/usr/bin/env bash
# Subtitle fetch helper — recipe v1 Step 4.
#
# Single-episode loop body. Runs against a Jellyfin instance reachable from
# nullstone via `docker exec jellyfin curl ...`. Driver loops should source or
# call this per episode.
#
# Picker: highest DownloadCount among results that are NOT
# (HearingImpaired|MachineTranslated|AiTranslated|Forced); 23.976fps preferred.
# Falls back to all results if every candidate is HI/MT/AI/Forced.
#
# Side effects:
# - POSTs RemoteSearch download (consumes 1 of 20 daily free-tier slots)
# - docker cp's the resulting metadata-cache srt to MEDIA_DIR
#
# Caller env:
# TOK Jellyfin admin X-Emby-Token
# EP Jellyfin episode item id
# MEDIA_DIR destination dir on nullstone, e.g.
# '/home/user/media/tv/American Dad! (2005)/Season 01'
# MEDIA_BASE filename without extension, must match the .mkv basename
#
# Exits non-zero on no-subs (1) or download HTTP != 204 (2).
# Output to stdout: "OK <ep-id> -> <dest path>".
# Output to stderr: chosen sub release name + fps + DownloadCount, or error.
set -euo pipefail
: "${TOK:?TOK required}"
: "${EP:?EP required}"
: "${MEDIA_DIR:?MEDIA_DIR required}"
: "${MEDIA_BASE:?MEDIA_BASE required}"
NULLSTONE="${NULLSTONE:-user@192.168.0.100}"
RAW=$(ssh "$NULLSTONE" "docker exec jellyfin curl -s -H 'X-Emby-Token: $TOK' \
'http://localhost:8096/Items/$EP/RemoteSearch/Subtitles/eng'")
SUBID=$(printf '%s' "$RAW" | python3 -c "
import json, sys
subs = json.load(sys.stdin)
clean = [s for s in subs
         if not (s.get('HearingImpaired') or s.get('MachineTranslated')
                 or s.get('AiTranslated') or s.get('Forced'))]
if not clean:
    clean = subs
fps2398 = [s for s in clean if abs(s.get('FrameRate', 0) - 23.976) < 0.01]
pool = fps2398 if fps2398 else clean
pool.sort(key=lambda s: -s.get('DownloadCount', 0))
if pool:
    print(pool[0]['Id'])
    print(pool[0]['Name'], pool[0].get('FrameRate'),
          pool[0].get('DownloadCount'), file=sys.stderr)
")
if [[ -z "$SUBID" ]]; then
  echo "NO-SUBS for $EP" >&2
  exit 1
fi
HTTP=$(ssh "$NULLSTONE" "docker exec jellyfin curl -s -o /dev/null -X POST \
-H 'X-Emby-Token: $TOK' \
'http://localhost:8096/Items/$EP/RemoteSearch/Subtitles/$SUBID' \
-w '%{http_code}'")
if [[ "$HTTP" != "204" ]]; then
  echo "DL-FAIL HTTP=$HTTP for $EP $SUBID" >&2
  exit 2
fi
SHARD="${EP:0:2}"
SRC_IN_CONTAINER="/config/metadata/library/$SHARD/$EP/$MEDIA_BASE.eng.srt"
DEST="$MEDIA_DIR/$MEDIA_BASE.eng.srt"
ssh "$NULLSTONE" "docker cp \"jellyfin:$SRC_IN_CONTAINER\" \"$DEST\"" >/dev/null
echo "OK $EP -> $DEST"


@ -0,0 +1,37 @@
# Subtitle run — `<Show name (Year)>`
Recipe version: v?
Run date: YYYY-MM-DD
Operator: Claude Code @ <session>
Quota at start / end: ?? / ??
## Source
| Field | Value |
|---|---|
| Episodes | ?? (S01-S??) |
| Container | mkv / mp4 / ... |
| Video | codec res fps |
| Audio | language tag(s) |
| Embedded subs | yes / no — codecs |
| Existing sidecars | yes / no |
## Outcome
| Season | Eps | Subs fetched | Quality sample | Notes |
|---|---|---|---|---|
| S01 | ? | ? / ? | ? | |
## Picks (sample)
| Episode | Sub Id | Author | DownloadCount | FrameRate | HI |
|---|---|---|---|---|---|
| S01E01 | ... | ... | ... | ... | ... |
## Breakage (if any)
What broke, what was probed, what the recipe should have done differently.
## Recipe amendments triggered
- v1 → v2: ...


@ -0,0 +1,85 @@
# Subtitle run — `American Dad! (2005)`
Recipe version: v1
Run date: 2026-05-09
Operator: Claude Code @ onyx session, ai-lab cwd
Quota at start / end: 20 / 13 (7 downloads, all S01)
## Source
| Field | Value |
|---|---|
| Episodes | 58 (S01=7, S02=16, S03=19, S04=16) |
| Container | mkv |
| Video | HEVC Main10, 1440×1080, 23.98 fps, 4:3 SAR 1:1 |
| Audio | `eng` AAC stereo (default) + `eng` AC3 5.1 |
| Embedded subs | none |
| Existing sidecars | none |
Library uses Hulu/DSP season ordering (S1=7 eps). Original Fox order has S1=23 eps.
## Series + library context
- Series Id: `3b3bc999e9107f1a7643ac45d6427fee`
- Library: `767bffe4f11c93ef34b805451a696a4e` (TV Shows, `/media/tv`)
- Library options: `SaveSubtitlesWithMedia=true`, `SubtitleDownloadLanguages=["eng"]`, `RequirePerfectSubtitleMatch=false`
- Plugin: Open Subtitles v20.0.0.0, Active, creds `Caveman5` valid
## Outcome
| Season | Eps | Subs fetched | Quality sample | Notes |
|---|---|---|---|---|
| S01 | 7 | 7 / 7 | not yet visually verified by playback (TODO) | All from `OMiCRON DVDRip` release group, fps 23.976 except S01E07 (24 fps), no SDH |
| S02 | 16 | 0 / 16 | n/a | Plugin RemoteSearch returns 0 for E01/E02/E08/E13 — broke recipe |
| S03 | 19 | 0 / 19 | n/a | Untested, expected same failure |
| S04 | 16 | 0 / 16 | n/a | Untested, expected same failure |
Net: **7 / 58 (12 %)**.
## Picks (S01)
| Episode | Sub release | Author | DLs | FPS | HI |
|---|---|---|---|---|---|
| S01E01 Pilot | `American.Dad.S01E01.DVDRip.XviD.REPACK-OMiCRON` | zetakoo_ | 154 132 | 23.976 | no |
| S01E02 Threat Levels | `American.Dad.S01E02.DVDRip.XviD.REPACK-OMiCRON` | (auto) | 89 896 | 23.976 | no |
| S01E03 Stan Knows Best | `American.Dad.S01E03.DVDRip.XviD.REPACK-OMiCRON` | (auto) | 69 317 | 23.976 | no |
| S01E04 Francines Flashback | `American.Dad.S01E04.DVDRip.XviD.REPACK-OMiCRON` | (auto) | 72 315 | 23.976 | no |
| S01E05 Roger Codger | `American.Dad.S01E05.DVDRip.XviD.REPACK-OMiCRON` | (auto) | 32 309 | 23.976 | no |
| S01E06 Homeland Insecurity | `American.Dad.S01E06.DVDRip.XviD.REPACK-OMiCRON` | (auto) | 67 778 | 23.976 | no |
| S01E07 Deacon Stan Jesus Man | `American.Dad.S01E07.DVDRip.XviD-OMiCRON` | (auto) | 65 124 | 24 | no |
All chosen by the recipe's Step 4 picker (highest DownloadCount among non-HI / non-MT
/ non-AI / non-Forced, prefer 23.976 fps). Picker behaved consistently — no
manual override needed for S01.
## Breakage
After S01 passed, S02E01 search returned 0 results. Verified:
- ProviderIds for S02E01 in library = `Imdb=tt0511631 Tvdb=306168` (correct for "Bullocks to Stan")
- Plugin quota: 13 / 20 remaining (not exhausted)
- Plugin log shows no error — silent zero
- Same recipe worked 7 times in a row immediately prior — not a script bug
- Sample-tested S02E02 / S02E08 / S02E13 → all 0 results
Root cause: library numbering is Hulu/DSP (S1=7), OpenSubtitles indexes Fox
airing order (S1=23). Plugin queries OS with `(parent_imdb_id, season,
episode)`, so library `S=2 E=1` is looked up in an S/E slot where OS has no
entry for this episode, even though the per-episode IMDB id (`tt0511631`) is
real and is indexed on OS under Fox order as `S=1 E=8`.
The plugin doesn't expose per-episode-IMDB lookup, only the S/E combo path,
so there's no flag we can flip to make this work.
## Recipe amendments triggered
- **v1 → v2**: process needs a season-numbering pre-check (Step 3), and a
fallback fetch path that doesn't rely on plugin S/E mapping. See
`CHANGELOG.md` v2 design choice between direct OS REST (recommended) and
library re-numbering.
## Followups
- [ ] visually verify a sample S01 sub plays in sync (one ep per recipe rule §6)
- [ ] decide v2 path (REST vs renumber)
- [ ] sub S02-S04 (51 eps) once v2 lands