legacy-arrflix/processes/subtitles/README.md
s8n fedf3388b8 processes: subtitle acquisition v1 + AD S01 run
Adds processes/ umbrella for repeatable acquisition workflows. First child
is subtitles/, with recipe README (executable by Claude Code), CHANGELOG,
per-show run logs, and a tested helper at lib/sub-fetch.sh.

Run on American Dad: S01 (7 eps) passed, S02-S04 (51 eps) broke. Library
uses Hulu/DSP season ordering; OpenSubtitles indexes by Fox airing order;
plugin queries by (parent_imdb_id, season, episode) so library S02E01
returns 0 hits. v2 design = direct OpenSubtitles REST with per-episode
imdb_id lookup; pending API-key registration.
2026-05-09 22:56:31 +01:00


# Subtitle acquisition process — v1
Last updated: 2026-05-09
Status: **v1, partial** — passed American Dad S01 (7/7 eps), broke on S02E01 due to season-numbering mismatch. v2 design pending.
This recipe is written for Claude Code to execute. Each step lists the exact
command, what to verify, and what to do on failure. Background reference for
how Jellyfin and the OpenSubtitles plugin work together lives in
[`docs/03-subtitles.md`](../../docs/03-subtitles.md).
---
## Prereqs (verify before running)
| Check | How |
|---|---|
| OpenSubtitles plugin v20 installed + Active | `docker exec jellyfin ls /config/plugins \| grep -i opensub` |
| Plugin creds saved (`Caveman5`) | `docker exec jellyfin grep -E 'Username\|CredentialsInvalid' /config/plugins/configurations/Jellyfin.Plugin.OpenSubtitles.xml` — expect `Caveman5` and `false` |
| TV library has `SaveSubtitlesWithMedia=true`, `SubtitleDownloadLanguages=["eng"]`, `RequirePerfectSubtitleMatch=false` | `curl -s -H "X-Emby-Token: $TOK" http://localhost:8096/Library/VirtualFolders` |
| Free-tier quota remaining today (≥ episode count, else plan multi-day) | `docker logs --tail 200 jellyfin 2>&1 \| grep "Remaining downloads" \| tail -1` (free = 20/day, resets 00:00 UTC) |
| Source files have audio language tag | Run `ffprobe` on a sample episode (command in Step 1) and check the audio stream's language tag |
If any prereq fails, stop. Fix it before running the recipe.
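
The library-options row can be checked mechanically instead of by eye. The following is a minimal Python sketch, assuming the `/Library/VirtualFolders` response is a JSON list whose entries carry a `Name` and a nested `LibraryOptions` object holding the three settings. `check_library_options` and the sample payload are hypothetical, so confirm the field layout against your server's real response. In practice the parsed `curl` output would replace `sample`.

```python
import json

# Settings the TV library must have, taken from the prereq table above.
REQUIRED = {
    "SaveSubtitlesWithMedia": True,
    "SubtitleDownloadLanguages": ["eng"],
    "RequirePerfectSubtitleMatch": False,
}


def check_library_options(folders, library_name):
    """Return a list of mismatches for one library (empty list = prereq OK)."""
    for folder in folders:
        if folder.get("Name") == library_name:
            opts = folder.get("LibraryOptions", {})  # assumed nesting
            return [
                f"{key}: expected {want!r}, got {opts.get(key)!r}"
                for key, want in REQUIRED.items()
                if opts.get(key) != want
            ]
    return [f"library {library_name!r} not found"]


# Hypothetical payload, for illustration only.
sample = json.loads("""[{"Name": "TV", "LibraryOptions": {
  "SaveSubtitlesWithMedia": true,
  "SubtitleDownloadLanguages": ["eng"],
  "RequirePerfectSubtitleMatch": true}}]""")

problems = check_library_options(sample, "TV")
print(problems)
```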
---
## Step 1 — Probe the source
Pick one episode of the target show. Run `ffprobe` on it:
```bash
ssh user@192.168.0.100 'docker exec jellyfin /usr/lib/jellyfin-ffmpeg/ffprobe -hide_banner "<path-to-mkv>" 2>&1 | grep -E "Stream|Duration"'
```
Record in the run log:
- video codec + resolution + frame rate
- audio language tag(s)
- whether any subtitle streams are embedded
- container
Decide based on probe:
| Probe result | Branch |
|---|---|
| English audio, no embedded subs | "simple" path (this recipe) |
| Foreign-dub audio, no embedded subs | "foreign-dub" path (deferred to v?) |
| Embedded English subs already present | skip — Jellyfin will use them |
| Embedded PGS/VobSub bitmap subs | "OCR" path (deferred to v?) |
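
The branch decision can be sketched as a small Python function that reads the `Stream` lines from the probe. This is an illustration under stated assumptions, not part of the tested helper: `choose_branch` is a hypothetical name, the regex assumes ffprobe's usual `Stream #0:1(eng): Audio: ...` line shape, and the codec names `hdmv_pgs_subtitle` and `dvd_subtitle` are taken to stand for PGS and VobSub. Cases the table does not cover (untagged audio, non-English text subs) return `unclassified` so a human decides.

```python
import re

# Matches ffprobe lines such as "Stream #0:1(eng): Audio: aac (LC), 48000 Hz".
STREAM_RE = re.compile(
    r"Stream #\d+:\d+(?:\[[^\]]*\])?(?:\((\w+)\))?: (\w+): (\w+)"
)
BITMAP_CODECS = {"hdmv_pgs_subtitle", "dvd_subtitle"}  # assumed PGS / VobSub names


def choose_branch(ffprobe_text):
    """Map ffprobe stream lines onto the branch names from the table above."""
    audio_langs, subs = [], []
    for lang, kind, codec in STREAM_RE.findall(ffprobe_text):
        if kind == "Audio":
            audio_langs.append(lang or "und")
        elif kind == "Subtitle":
            subs.append((lang or "und", codec))

    # Assumption: English text subs win over any bitmap track that is also present.
    if any(l == "eng" and c not in BITMAP_CODECS for l, c in subs):
        return "skip"
    if any(c in BITMAP_CODECS for _, c in subs):
        return "OCR"
    if subs:
        return "unclassified"  # e.g. non-English text subs: not in the table
    if "eng" in audio_langs:
        return "simple"
    if audio_langs and "und" not in audio_langs:
        return "foreign-dub"
    return "unclassified"  # no audio, or audio without a language tag
```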
---
## Step 2 — Resolve series + episode IDs
```bash
TOK=<jellyfin-admin-token>
SERIES_NAME='American Dad'
ssh user@192.168.0.100 "docker exec jellyfin curl -s -H 'X-Emby-Token: $TOK' \
'http://localhost:8096/Items?searchTerm=${SERIES_NAME// /+}&IncludeItemTypes=Series&Recursive=true&Limit=3'" \
| python3 -c "import json,sys; [print(x['Id'],x['Name']) for x in json.load(sys.stdin).get('Items',[])]"
```
Record series Id. Then list episodes:
```bash
SERIES=<series-id>
ssh user@192.168.0.100 "docker exec jellyfin curl -s -H 'X-Emby-Token: $TOK' \
'http://localhost:8096/Items?ParentId=$SERIES&IncludeItemTypes=Episode&Recursive=true&Fields=Path,ParentIndexNumber,IndexNumber'" \
| python3 -c "import json,sys; [print(e['Id'],'S%02dE%02d'%(e['ParentIndexNumber'],e['IndexNumber']),e['Name']) for e in json.load(sys.stdin)['Items']]"
```
---
## Step 3 — Validate season numbering against OpenSubtitles
> ⚠️ **Critical, added in v2** (currently provisional — see CHANGELOG): some shows
> are catalogued differently across services. American Dad is the canonical
> example: Hulu/DSP carriers split the original Fox 23-ep S1 into Hulu S1 (7
> eps) + S2 (16 eps). OpenSubtitles indexes by Fox airing order. The plugin
> queries by `(parent_imdb_id, season, episode)` so library-side Hulu numbering
> returns 0 results past the first 7 episodes.
How to check:
1. Pick the first episode of season 2 in the library.
2. Run only the search call from Step 4 (`RemoteSearch/Subtitles/eng`) against it. The search is read-only; skip the download and copy steps.
3. If results > 0 — numbering matches OpenSubtitles. Proceed.
4. If results == 0 but the show exists on opensubtitles.com — numbering mismatch. **Stop**. Fix metadata first or use the v2 direct-API path (TBD).
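
To make the mismatch concrete, here is a minimal Python sketch that converts a library-side episode number into the Fox airing-order number. It is illustrative only: `remap` is a hypothetical helper, it assumes both schemes list the same episodes in the same overall order, and the season lengths cover only the first 23 episodes described in the note above. Lengths for later seasons would have to be filled in from the actual listings.

```python
# Season lengths for the first 23 episodes, from the note above.
HULU_SEASONS = [7, 16]  # Hulu/DSP S1, S2
FOX_SEASONS = [23]      # Fox S1


def remap(season, episode, src_lengths, dst_lengths):
    """Convert (season, episode) from one numbering scheme to another.

    Assumes both schemes contain the same episodes in the same order.
    """
    if not 1 <= season <= len(src_lengths):
        raise ValueError("season outside the source table")
    if not 1 <= episode <= src_lengths[season - 1]:
        raise ValueError("episode outside the source season")

    # Position of the episode counted from the start of the series.
    absolute = sum(src_lengths[: season - 1]) + episode

    for dst_season, length in enumerate(dst_lengths, start=1):
        if absolute <= length:
            return dst_season, absolute
        absolute -= length
    raise ValueError("episode falls outside the destination table")


# Library S02E01 is the 8th episode overall, which Fox order calls S01E08.
print(remap(2, 1, HULU_SEASONS, FOX_SEASONS))
```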
---
## Step 4 — Fetch subs per episode
Per-episode loop. Helper script lives at `processes/subtitles/lib/sub-fetch.sh`
(promoted from `/tmp` once stable; see CHANGELOG v0→v1).
```bash
TOK=<token>
EP=<episode-id>
MEDIA_DIR='/home/user/media/tv/<Show>/Season XX'
MEDIA_BASE='<Show> - SxxExx - <Title>'
# 1. search
RAW=$(ssh user@192.168.0.100 "docker exec jellyfin curl -s -H 'X-Emby-Token: $TOK' \
'http://localhost:8096/Items/$EP/RemoteSearch/Subtitles/eng'")
# 2. pick best non-HI/non-MT/non-AI/non-Forced match, prefer 23.976fps, then highest DownloadCount
SUBID=$(printf '%s' "$RAW" | python3 -c "
import json,sys
subs=json.load(sys.stdin)
clean=[s for s in subs if not (s.get('HearingImpaired') or s.get('MachineTranslated') or s.get('AiTranslated') or s.get('Forced'))]
if not clean: clean=subs
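fps2398=[s for s in clean if abs((s.get('FrameRate') or 0)-23.976)<0.01]
pool=fps2398 if fps2398 else clean
pool.sort(key=lambda s: -(s.get('DownloadCount') or 0))
print(pool[0]['Id'] if pool else '')")
# stop if the search returned no candidates (check quota before logging a miss)
[ -n "$SUBID" ] || { echo "no subtitle candidates for $EP" >&2; exit 1; }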
# 3. download (returns 204)
ssh user@192.168.0.100 "docker exec jellyfin curl -s -X POST -H 'X-Emby-Token: $TOK' \
'http://localhost:8096/Items/$EP/RemoteSearch/Subtitles/$SUBID' -w 'HTTP %{http_code}\n'"
# 4. plugin saves to /config/metadata/library/<shard>/<itemId>/<base>.eng.srt
# NOT next to media (manual-pick path ignores SaveSubtitlesWithMedia).
# Move it into place:
SHARD="${EP:0:2}"
ssh user@192.168.0.100 "docker cp \"jellyfin:/config/metadata/library/$SHARD/$EP/$MEDIA_BASE.eng.srt\" \
\"$MEDIA_DIR/\""
```
Verify after each batch:
```bash
ssh user@192.168.0.100 'ls "<media-dir>/" | grep -c eng.srt'
```
---
## Step 5 — Clean up duplicates + library scan
The metadata-cache copy and the media-folder sidecar both register as
subtitle streams in Jellyfin (counted twice). Delete the cache copies:
```bash
ssh user@192.168.0.100 'docker exec jellyfin bash -c "find /config/metadata/library -path \"*<show-name>*S0[1-9]E*.eng.srt\" -delete -print"'
```
Trigger a validation-only refresh so Jellyfin sees the new sidecars:
```bash
ssh user@192.168.0.100 "docker exec jellyfin curl -s -X POST -H 'X-Emby-Token: $TOK' \
'http://localhost:8096/Items/$SERIES/Refresh?MetadataRefreshMode=ValidationOnly&Recursive=true'"
```
Confirm one episode has exactly 1 external eng sub stream:
```bash
ssh user@192.168.0.100 "docker exec jellyfin curl -s -H 'X-Emby-Token: $TOK' \
'http://localhost:8096/Items/<sample-ep-id>?Fields=MediaStreams'" \
| python3 -c "import json,sys; subs=[s for s in json.load(sys.stdin).get('MediaStreams',[]) if s.get('Type')=='Subtitle' and s.get('IsExternal') and s.get('Language')=='eng']; print(len(subs),'external eng sub streams')"
```
---
## Step 6 — Quality gate
For the run to pass:
- [ ] **Coverage**: every episode has a matching `<base>.eng.srt` sidecar
- [ ] **Sync sample**: at least one episode of each season is opened in
Jellyfin web and subs visually align with audio (±1 s) on a known dialogue
line
- [ ] **Flag check**: no `.sdh.srt`, `.forced.srt`, or `.hi.srt` files
(machine pick should have filtered)
- [ ] **Stream count**: Jellyfin shows exactly 1 external eng sub per episode
If any check fails, log it in `runs/<show>.md` under "breakage" and propose
the recipe amendment in `CHANGELOG.md`.
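
The coverage and flag checks lend themselves to automation. Below is a minimal Python sketch, assuming it runs on the host that holds the media folder and that episodes use `.mkv` or `.mp4` containers; `check_season_dir` and the demo file names are hypothetical. It lists video files with no matching `<base>.eng.srt` sidecar and any subtitle file carrying one of the forbidden suffixes. The sync sample and stream-count checks still need the manual steps above.

```python
import tempfile
from pathlib import Path

VIDEO_EXTS = {".mkv", ".mp4"}  # assumed container set
FLAGGED_SUFFIXES = (".sdh.srt", ".forced.srt", ".hi.srt")


def check_season_dir(season_dir):
    """Return (missing, flagged) for one season folder.

    missing: video files with no '<base>.eng.srt' sidecar next to them
    flagged: files whose names end in a suffix the flag check forbids
    """
    files = sorted(p for p in Path(season_dir).iterdir() if p.is_file())
    names = {p.name for p in files}
    missing = [
        p.name
        for p in files
        if p.suffix.lower() in VIDEO_EXTS and f"{p.stem}.eng.srt" not in names
    ]
    flagged = [p.name for p in files if p.name.lower().endswith(FLAGGED_SUFFIXES)]
    return missing, flagged


# Demo on a throwaway folder with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in (
        "Show - S01E01 - Pilot.mkv",
        "Show - S01E01 - Pilot.eng.srt",
        "Show - S01E02 - Two.mkv",
        "Show - S01E02 - Two.hi.srt",
    ):
        (Path(tmp) / name).touch()
    missing, flagged = check_season_dir(tmp)

print("missing:", missing)
print("flagged:", flagged)
```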
---
## Quota hygiene
Free OpenSubtitles.com account = 20 downloads / day, resets 00:00 UTC.
Plan large series across multiple days, or switch to VIP (~$3/mo, unlimited).
Quota check:
```bash
ssh user@192.168.0.100 'docker logs --tail 200 jellyfin 2>&1 | grep "Remaining downloads" | tail -1'
```
When quota hits 0 the API returns 0 results, indistinguishable from a real
miss. Always check quota before declaring a "no subs" failure.
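
Planning a large series across days is simple arithmetic. The following minimal Python sketch assumes one download per episode (retries or re-picks would consume extra quota) and uses the 20/day free-tier figure stated above; `plan_batches` is a hypothetical helper, not part of `lib/sub-fetch.sh`.

```python
DAILY_LIMIT = 20  # free-tier downloads per UTC day, per the text above


def plan_batches(episodes, remaining_today, daily_limit=DAILY_LIMIT):
    """Split a run into per-day batch sizes; index 0 is today."""
    if episodes < 0 or remaining_today < 0 or daily_limit <= 0:
        raise ValueError("counts must be non-negative and the limit positive")

    batches = []
    quota = min(remaining_today, daily_limit)
    while episodes > 0:
        take = min(episodes, quota)
        batches.append(take)
        episodes -= take
        quota = daily_limit  # quota resets at 00:00 UTC
    return batches


# 51 episodes with 13 downloads left today: three days of work.
print(plan_batches(51, 13))
```

A first entry of `0` means today's quota is already spent, so the run starts after the next reset.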