s8n 23520df2df processes/subtitles: v2 REST fetcher + AD S02E01-E12 subbed
Adds lib/sub-rest-fetch.py: direct OpenSubtitles REST, looks up subs by
per-episode IMDB id (e.g. tt0511631) instead of the plugin's
(parent_imdb_id, season, episode) combo path. This sidesteps shows where
library numbering diverges from OpenSubtitles' catalogued numbering --
American Dad uses Hulu S1=7 eps; OS uses Fox S1=23 eps; the plugin path
returns 0 hits past S01E07 even though every per-episode IMDB id is
correct.

Recipe README updated to surface the two paths (v1 plugin / v2 REST) and
recommend v2 by default. American Dad run log now shows 19/58 episodes
subbed (S01 7/7 via v1, S02E01-E12 via v2). S02E13-S04 (39 eps) deferred
to next 20/day quota windows.

Quirk fixed in v2: OpenSubtitles /download endpoint consistently returns
HTTP 503 to Python urllib.request despite identical headers/body via curl.
_curl() shim routes all OS API calls through curl. Each 503 still
consumes a download slot, so urllib path was unsafe to retry on.
2026-05-09 23:09:09 +01:00


# Subtitle acquisition process — v2
Last updated: 2026-05-09

Status: **v2** (direct REST API). American Dad S01 + S02E01-E12 (19/58 eps) subbed; S02E13-S04 (39 eps) awaiting the next quota windows.

This recipe is written for Claude Code to execute. Each step lists the exact
command, what to verify, and what to do on failure. Background reference for
how Jellyfin and the OpenSubtitles plugin work together lives in
[`docs/03-subtitles.md`](../../docs/03-subtitles.md).

---
## Prereqs (verify before running)
| Check | How |
|---|---|
| OpenSubtitles plugin v20 installed + Active | `docker exec jellyfin ls /config/plugins \| grep -i opensub` |
| Plugin creds saved (`Caveman5`) | `docker exec jellyfin grep -E 'Username\|CredentialsInvalid' /config/plugins/configurations/Jellyfin.Plugin.OpenSubtitles.xml` — expect `Caveman5` and `false` |
| TV library has `SaveSubtitlesWithMedia=true`, `SubtitleDownloadLanguages=["eng"]`, `RequirePerfectSubtitleMatch=false` | `curl -s -H "X-Emby-Token: $TOK" http://localhost:8096/Library/VirtualFolders` |
| Free-tier quota remaining today (≥ episode count, else plan multi-day) | `docker logs --tail 200 jellyfin 2>&1 \| grep "Remaining downloads" \| tail -1` (free = 20/day, resets 00:00 UTC) |
| Source files have audio language tag | `ffprobe` sample episode |

If any prereq fails, stop and fix it before running the recipe.

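
The library-settings row of the prereq table can be checked mechanically. A minimal sketch, assuming you pass in the parsed JSON list returned by `/Library/VirtualFolders` (e.g. `tv_library_problems(json.load(f))`) and that TV libraries report `CollectionType` as `tvshows`; the function name is hypothetical. An empty result means the row passes.

```python
# Values the prereq table expects on every TV library.
EXPECTED = {
    "SaveSubtitlesWithMedia": True,
    "SubtitleDownloadLanguages": ["eng"],
    "RequirePerfectSubtitleMatch": False,
}


def tv_library_problems(virtual_folders):
    """Return (library name, option, actual value) for every mismatch.

    `virtual_folders` is the parsed JSON list from /Library/VirtualFolders.
    Assumes TV libraries carry CollectionType == "tvshows".
    """
    problems = []
    for folder in virtual_folders:
        if folder.get("CollectionType") != "tvshows":
            continue  # movies, music, etc. are out of scope for this recipe
        options = folder.get("LibraryOptions") or {}
        for key, want in EXPECTED.items():
            got = options.get(key)
            if got != want:
                problems.append((folder.get("Name"), key, got))
    return problems
```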
---
## Step 1 — Probe the source
Pick one episode of the target show. Run `ffprobe` on it:
```bash
ssh user@192.168.0.100 'docker exec jellyfin /usr/lib/jellyfin-ffmpeg/ffprobe -hide_banner "<path-to-mkv>" 2>&1 | grep -E "Stream|Duration"'
```
Record in the run log:
- video codec + resolution + frame rate
- audio language tag(s)
- whether any subtitle streams are embedded
- container format

Decide based on the probe:

| Probe result | Branch |
|---|---|
| English audio, no embedded subs | "simple" path (this recipe) |
| Foreign-dub audio, no embedded subs | "foreign-dub" path (deferred to v?) |
| Embedded English subs already present | skip — Jellyfin will use them |
| Embedded PGS/VobSub bitmap subs | "OCR" path (deferred to v?) |
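
The decision table can also be applied to machine-readable probe output. A minimal sketch, assuming `ffprobe` is on the local PATH (Step 1 runs it inside the container at `/usr/lib/jellyfin-ffmpeg/ffprobe`; you can instead run it there with `-show_streams -of json` and pass the parsed `streams` list to `pick_branch`). `hdmv_pgs_subtitle` and `dvd_subtitle` are ffprobe's codec names for PGS and VobSub; the row precedence, the extra `untagged` outcome (the audio-language prereq failing), and the function names are assumptions.

```python
import json
import subprocess

FFPROBE = "ffprobe"  # assumption: on PATH; the container path is /usr/lib/jellyfin-ffmpeg/ffprobe
ENGLISH = {"eng", "en"}
BITMAP_SUB_CODECS = {"hdmv_pgs_subtitle", "dvd_subtitle"}  # PGS, VobSub
UNTAGGED = {"", "und"}


def probe_streams(path):
    """Run ffprobe on one file and return its list of stream dicts."""
    out = subprocess.run(
        [FFPROBE, "-v", "error", "-show_streams", "-of", "json", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out).get("streams", [])


def _lang(stream):
    """Lower-cased language tag of a stream, or "" when untagged."""
    return (stream.get("tags") or {}).get("language", "").lower()


def pick_branch(streams):
    """Map ffprobe streams onto the Step 1 branches.

    Assumed precedence: embedded English text subs (skip), then any bitmap
    subs (ocr), then English audio (simple), then other tagged audio
    (foreign-dub). "untagged" means the audio-language prereq failed.
    """
    subs = [s for s in streams if s.get("codec_type") == "subtitle"]
    audio = [s for s in streams if s.get("codec_type") == "audio"]
    if any(_lang(s) in ENGLISH and s.get("codec_name") not in BITMAP_SUB_CODECS
           for s in subs):
        return "skip"
    if any(s.get("codec_name") in BITMAP_SUB_CODECS for s in subs):
        return "ocr"
    if any(_lang(s) in ENGLISH for s in audio):
        return "simple"
    if not audio or all(_lang(s) in UNTAGGED for s in audio):
        return "untagged"
    return "foreign-dub"
```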
---
## Step 2 — Resolve series + episode IDs
```bash
TOK=<jellyfin-admin-token>
SERIES_NAME='American Dad'
ssh user@192.168.0.100 "docker exec jellyfin curl -s -H 'X-Emby-Token: $TOK' \
'http://localhost:8096/Items?searchTerm=${SERIES_NAME// /+}&IncludeItemTypes=Series&Recursive=true&Limit=3'" \
| python3 -c "import json,sys; [print(x['Id'],x['Name']) for x in json.load(sys.stdin).get('Items',[])]"
```
Record series Id. Then list episodes:
```bash
SERIES=<series-id>
ssh user@192.168.0.100 "docker exec jellyfin curl -s -H 'X-Emby-Token: $TOK' \
'http://localhost:8096/Items?ParentId=$SERIES&IncludeItemTypes=Episode&Recursive=true&Fields=Path,ParentIndexNumber,IndexNumber'" \
| python3 -c "import json,sys; [print(e['Id'],'S%02dE%02d'%(e.get('ParentIndexNumber') or 0,e.get('IndexNumber') or 0),e.get('Name')) for e in json.load(sys.stdin)['Items']]"
```
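
Both fetch paths lean on per-episode IMDB ids: v1 needs a good IMDB ProviderId on every episode, and v2 looks subs up by it. A minimal sketch for spotting gaps, assuming you rerun the episode query above with `ProviderIds` added to `Fields` and pass in the parsed `Items` list; the `Imdb` key and the `tt` + digits shape (e.g. `tt0511631`) are what Jellyfin normally stores, and the function names are hypothetical.

```python
import re

IMDB_ID = re.compile(r"^tt\d{7,}$")  # e.g. tt0511631


def _order(ep):
    """Sort key: (season, episode), with 0 for missing numbers (e.g. specials)."""
    return (ep.get("ParentIndexNumber") or 0, ep.get("IndexNumber") or 0)


def imdb_report(items):
    """Split episodes into (ok, missing) lists of (SxxEyy, Jellyfin Id, Imdb id).

    `items` is the "Items" list from the episode query, rerun with
    ProviderIds in Fields (assumption).
    """
    ok, missing = [], []
    for ep in sorted(items, key=_order):
        imdb = (ep.get("ProviderIds") or {}).get("Imdb")
        row = ("S%02dE%02d" % _order(ep), ep.get("Id"), imdb)
        (ok if imdb and IMDB_ID.match(imdb) else missing).append(row)
    return ok, missing
```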
---
## Step 3 — Pick fetch path
Two paths, trading simplicity against robustness:

| Path | When to use | Tool |
|---|---|---|
| **v1 (plugin)** | Library season/episode numbering matches OpenSubtitles indexing AND every episode has good IMDB ProviderId | `lib/sub-fetch.sh` |
| **v2 (REST)** | Default. Survives Hulu/Fox numbering mismatches and shows with weird ordering | `lib/sub-rest-fetch.py` |

Quick check whether v1 will work:

1. Pick the first episode of season 2 in the library.
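2. Set `EP` to that episode's Id from Step 2, then run `curl -s -H "X-Emby-Token: $TOK" "http://localhost:8096/Items/$EP/RemoteSearch/Subtitles/eng"` (read-only) and count the entries in the returned JSON array. Use double quotes: in single quotes `$TOK` and `$EP` are not expanded.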
3. If the array is non-empty, v1 works (and so does v2).
4. If it is empty but the show exists on opensubtitles.com, the numbering does not match (e.g. American Dad: the library uses Hulu S1=7 eps; OS uses Fox S1=23). Use **v2**.

When in doubt, use v2.
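
To see how far two orderings drift apart, map a library (season, episode) pair onto the other ordering through an absolute episode index. A minimal sketch, assuming both orderings list the same episodes in the same air order and differ only in where season boundaries fall; only the S1 sizes (Hulu 7, Fox 23) come from this recipe, and `remap` is a hypothetical name. Under that assumption the first seven episodes carry identical numbers in both schemes, while library S02E01 corresponds to Fox S01E08.

```python
def remap(season, episode, src_sizes, dst_sizes):
    """Convert (season, episode) from one season split to another.

    src_sizes / dst_sizes list the episode counts of seasons 1, 2, ... in
    each ordering. Returns None if dst_sizes is too short to place it.
    """
    if season - 1 > len(src_sizes):
        raise ValueError("src_sizes does not cover the earlier seasons")
    absolute = sum(src_sizes[: season - 1]) + episode  # 1-based running index
    for dst_season, size in enumerate(dst_sizes, start=1):
        if absolute <= size:
            return dst_season, absolute
        absolute -= size
    return None
```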
---
## Step 4 — Fetch subs per episode
Use `lib/sub-rest-fetch.py` (v2). It logs in to OpenSubtitles, looks each
episode up by its per-episode IMDB id, picks the best English match, and
writes the sidecar straight to nullstone.
```bash
JELLYFIN_TOKEN=<admin-token> \
OPENSUBTITLES_API_KEY=$HOME/.config/arrflix-opensubtitles-api.txt \
OPENSUBTITLES_USER=Caveman5 \
OPENSUBTITLES_PASS=<password> \
processes/subtitles/lib/sub-rest-fetch.py <series-id> --season N [--start E] [--end E]
```
Pre-flight with `DRY_RUN=1` to see picks without consuming quota.
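
The lookup-and-pick step can be sketched as below. This is not the code in `lib/sub-rest-fetch.py`. It assumes the public OpenSubtitles REST v1 search endpoint (`GET https://api.opensubtitles.com/api/v1/subtitles` with `imdb_id` and `languages`, sent with `Api-Key` and `User-Agent` headers) and its response shape (`data[].attributes` with `language`, `hearing_impaired`, `foreign_parts_only`, `download_count`, `files[].file_id`). The ranking rule (drop hearing-impaired and foreign-parts-only subs, then prefer the most downloaded) is an assumption chosen to match the Step 6 flag check, and the function names are hypothetical. The download call (`POST /download` with the chosen `file_id`) is left out on purpose, since each call spends one of the daily downloads.

```python
from urllib.parse import urlencode

API = "https://api.opensubtitles.com/api/v1"


def search_url(imdb_tt):
    """Build the search URL for one episode's IMDB id, e.g. "tt0511631".

    Strips the "tt" prefix and leading zeros (int() does the latter).
    """
    digits = imdb_tt[2:] if imdb_tt.startswith("tt") else imdb_tt
    return f"{API}/subtitles?" + urlencode({"imdb_id": int(digits), "languages": "en"})


def pick_file_id(search_json):
    """Return the file_id of the preferred English subtitle, or None."""
    candidates = []
    for item in search_json.get("data", []):
        attrs = item.get("attributes", {})
        if attrs.get("language") != "en":
            continue
        if attrs.get("hearing_impaired") or attrs.get("foreign_parts_only"):
            continue  # would end up as an SDH/HI or forced-style sidecar
        files = attrs.get("files") or []
        if files:
            candidates.append((attrs.get("download_count") or 0, files[0]["file_id"]))
    if not candidates:
        return None
    return max(candidates)[1]  # most downloaded wins
```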

The legacy v1 path (Jellyfin plugin RemoteSearch + docker cp) lives at
`lib/sub-fetch.sh` and is kept for shows where library numbering matches
OpenSubtitles' indexing. It is less general, but it does not depend on the
external OpenSubtitles REST API. It still draws on the same 20/day account
quota, since the plugin logs in as `Caveman5`.

Verify after each batch:
```bash
ssh user@192.168.0.100 'ls "<media-dir>/" | grep -c eng.srt'
```
---
## Step 5 — Library scan + de-dup (v1 only)
If you used the v1 plugin path, the metadata-cache copy and the media-folder
sidecar both register as subtitle streams in Jellyfin (counted twice).
Delete the cache copies:
```bash
ssh user@192.168.0.100 'docker exec jellyfin bash -c "find /config/metadata/library -path \"*<show-name>*S0[1-9]E*.eng.srt\" -delete -print"'
```
v2 writes directly to the media folder, so there is no cache copy to clean.

Trigger a validation-only refresh so Jellyfin sees the new sidecars:
```bash
ssh user@192.168.0.100 "docker exec jellyfin curl -s -X POST -H 'X-Emby-Token: $TOK' \
'http://localhost:8096/Items/$SERIES/Refresh?MetadataRefreshMode=ValidationOnly&Recursive=true'"
```
Confirm one episode has exactly 1 external eng sub stream:
```bash
ssh user@192.168.0.100 "docker exec jellyfin curl -s -H 'X-Emby-Token: $TOK' \
'http://localhost:8096/Items/<sample-ep-id>?Fields=MediaStreams'" \
| python3 -c "import json,sys; subs=[s for s in json.load(sys.stdin).get('MediaStreams',[]) if s.get('Type')=='Subtitle' and s.get('IsExternal') and s.get('Language')=='eng']; print(len(subs),'external eng sub streams')"
```
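
To run the same check across a whole series rather than one sample episode, a minimal sketch, assuming you fetch all episodes with `/Items?ParentId=$SERIES&IncludeItemTypes=Episode&Recursive=true&Fields=MediaStreams` and pass in the parsed `Items` list. It relies on the MediaStream fields `Type`, `IsExternal`, and `Language`; the function name is hypothetical. A count of 2 is the v1 cache-copy duplication described above.

```python
def stream_count_problems(items):
    """Return (SxxEyy, count) for episodes without exactly one external eng sub."""
    problems = []
    for ep in items:
        count = sum(
            1
            for s in ep.get("MediaStreams") or []
            if s.get("Type") == "Subtitle"
            and s.get("IsExternal")
            and s.get("Language") == "eng"
        )
        if count != 1:
            season = ep.get("ParentIndexNumber") or 0
            number = ep.get("IndexNumber") or 0
            problems.append(("S%02dE%02d" % (season, number), count))
    return problems
```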
---
## Step 6 — Quality gate
For the run to pass:
- [ ] **Coverage**: every episode has a matching `<base>.eng.srt` sidecar
- [ ] **Sync sample**: at least one episode of each season is opened in
Jellyfin web and subs visually align with audio (±1 s) on a known dialogue
line
- [ ] **Flag check**: no `.sdh.srt`, `.forced.srt`, or `.hi.srt` files
(machine pick should have filtered)
- [ ] **Stream count**: Jellyfin shows exactly 1 external eng sub per episode

If any check fails, log it in `runs/<show>.md` under "breakage" and propose
the recipe amendment in `CHANGELOG.md`.
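
The coverage and flag checks can be run against a plain directory listing (e.g. `gate_files(os.listdir(season_dir))`). A minimal sketch, assuming sidecars sit next to the video as `<base>.eng.srt` and videos use the extensions in `VIDEO_EXTS` (adjust to the container recorded in Step 1); the function name is hypothetical, and the sync and stream-count checks still need Jellyfin.

```python
from pathlib import PurePath

VIDEO_EXTS = {".mkv", ".mp4"}  # assumption: extend to match your library
FLAGGED = (".sdh.srt", ".forced.srt", ".hi.srt")


def gate_files(names):
    """Return (videos missing a <base>.eng.srt sidecar, flagged subtitle files)."""
    names = set(names)
    missing = sorted(
        n for n in names
        if PurePath(n).suffix.lower() in VIDEO_EXTS
        and f"{PurePath(n).stem}.eng.srt" not in names
    )
    flagged = sorted(n for n in names if n.lower().endswith(FLAGGED))
    return missing, flagged
```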
---
## Quota hygiene
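Free OpenSubtitles.com account = 20 downloads / day, resets 00:00 UTC.
Plan large series across multiple days, or switch to VIP (~$3/mo, unlimited).

Quota check (this reads the last count the Jellyfin plugin logged, so downloads made through v2 since then are not reflected):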
```bash
ssh user@192.168.0.100 'docker logs --tail 200 jellyfin 2>&1 | grep "Remaining downloads" | tail -1'
```
When quota hits 0 the API returns 0 results, indistinguishable from a real
miss. Always check quota before declaring a "no subs" failure.
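
Planning a large series against the daily cap amounts to chunking the remaining episodes into daily batches and expressing each batch as the `--season N --start E --end E` runs that `sub-rest-fetch.py` accepts. A minimal sketch; the 20/day default comes from the free tier above, while the function name and the grouping into contiguous per-season runs are assumptions.

```python
from itertools import groupby


def plan_days(episodes, per_day=20):
    """Split (season, episode) pairs into daily batches of CLI ranges.

    Returns one list per day; each entry is (season, start, end) covering a
    run of consecutive episodes within one season.
    """
    eps = sorted(episodes)
    days = []
    for i in range(0, len(eps), per_day):
        runs = []
        for season, group in groupby(eps[i : i + per_day], key=lambda se: se[0]):
            nums = [e for _, e in group]
            start = prev = nums[0]
            for n in nums[1:]:
                if n != prev + 1:  # gap: close the current run
                    runs.append((season, start, prev))
                    start = n
                prev = n
            runs.append((season, start, prev))
        days.append(runs)
    return days
```

Because a failed download can still consume a slot, passing a `per_day` a little below 20 leaves headroom for retries.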