processes/subtitles: v3 Addic7ed fetcher + AD 49/58 subbed

Adds lib/sub-a7d-fetch.py: free, no-daily-cap path via subliminal's
addic7ed provider (anonymous). Uses OpenSubtitles REST search-only (no
quota cost) to translate library S/E to the show's primary catalogue
numbering, then drives subliminal to download from Addic7ed and writes
sidecars direct to nullstone via SSH.

Picker quirks: subliminal series-name matcher is broken by '!' in the
title, so the script strips it before building the synthetic
Video.fromname() string. OS feature_details S/E happens to align with
Addic7ed's indexing for the test show (American Dad).

Recipe README now reflects three paths in cheapest-first order: v3
Addic7ed, v2 OS REST (20/day), v1 plugin. American Dad run log updated
to 49/58 (S01 7/7 v1, S02 16/16 mixed v2/v3, S03 16/19 v3, S04 10/16
v3). 9 misses identified, deferred to next OS REST quota window.
s8n 2026-05-09 23:31:10 +01:00
parent 23520df2df
commit 43f55643be
5 changed files with 335 additions and 30 deletions


@@ -21,4 +21,4 @@ amendment for a full sweep.
 | Process | Status | Last touched |
 |---|---|---|
-| [`subtitles/`](subtitles/) | v2 — direct OpenSubtitles REST. AD 19/58 eps subbed (S01 + S02E01-E12); S02E13-S04 awaiting next quota window | 2026-05-09 |
+| [`subtitles/`](subtitles/) | v3 — Addic7ed (free, no daily cap) added as primary, OS REST as fallback. AD 49/58 subbed; remaining 9 land via OS REST after quota reset | 2026-05-09 |


@@ -63,3 +63,41 @@ Recipe upgrade:
 - Free-tier 20/day still in force (REST and plugin share the counter).
 - Recipe Step 6 (sync verification) is still manual — no automated check
   that the picked .srt actually aligns with audio.
## v3 — 2026-05-09
Approach **Addic7ed via subliminal** added as a quota-free path, tried
before v2 to spare OS quota. New helper at `lib/sub-a7d-fetch.py`. v2
remains the fallback for episodes v3 misses.
- `subliminal` Python lib drives `addic7ed` provider, anonymous
- OS REST is still consulted (search-only, no quota cost) to translate
library Hulu numbering to the show's primary catalogue numbering, since
Addic7ed and OS feature_details appear to align for at least the test
show (American Dad)
- Sidecar written direct to nullstone via `ssh ... cat >`
### v3 picker / matching
- the script takes the first candidate subliminal returns for the synthetic
  Video; it does not re-score or re-sort candidates itself
- "!" in series name breaks subliminal's matcher; recipe strips it before
building the synthetic filename for `Video.fromname()`
- Synthetic filename pattern: `Series.Name.Year.SXXEYY.HDTV.x264.mkv`
### v3 known quirks
- Some episodes return 0 hits at addic7ed for the OS-feat-details S/E we
pass — likely cases where addic7ed indexes by Fox airing order while OS
uses DVD-compressed (or vice versa). On American Dad, ~9 of 58 episodes
missed via this path. Fall back to v2 OS REST when quota allows.
- One episode (`Black Mystery Month`) had a hit but downloaded empty
content — addic7ed-side cataloguing error or temp 0-byte upload.
- Per-show coverage varies: Addic7ed has near-complete English on broadcast
US shows but spotty for animated specials and obscure titles.
### v3 known limits
- English coverage best; non-English near-empty
- Anonymous downloads work but heavy bursts may trigger Addic7ed's
bot detection and short IP throttle (~1 hour). The script makes no
effort at jittering / backoff
- No automated sync-quality check; recipe Step 6 still manual
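
The missing jitter / backoff noted above could be sketched roughly as follows. This is a minimal illustration, not part of `lib/sub-a7d-fetch.py`: the helper name `call_with_backoff` and the `base` / `cap` values are assumptions, and Addic7ed's real throttle thresholds are not known here.

```python
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(fn: Callable[[], T], attempts: int = 4,
                      base: float = 2.0, cap: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: random.Random | None = None) -> T:
    """Call fn(), retrying on any exception with exponential backoff.

    Hypothetical helper: each wait is drawn uniformly from
    [0, min(cap, base * 2**attempt)] ("full jitter"), so bursts of
    requests are spread out instead of retried in lockstep.
    """
    rng = rng or random.Random()
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(rng.uniform(0.0, min(cap, base * 2 ** attempt)))
    raise RuntimeError("attempts must be >= 1")
```

In the fetcher, a call such as `download_subtitles([pick])` could be wrapped in a lambda and passed as `fn`. A short random pause between episodes would be a separate, complementary measure.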


@@ -1,7 +1,7 @@
 # Subtitle acquisition process — v1
 Last updated: 2026-05-09
-Status: **v2** — direct REST API. American Dad S01-S02 (19/58 eps) subbed. S02E13-S04 awaiting next quota window.
+Status: **v3** — three fetch paths (plugin / OS REST / Addic7ed). American Dad 49/58 subbed; remaining 9 land via OS REST after quota reset.
 This recipe is written for Claude Code to execute. Each step lists the exact
 command, what to verify, and what to do on failure. Background reference for
@@ -73,29 +73,41 @@ ssh user@192.168.0.100 "docker exec jellyfin curl -s -H 'X-Emby-Token: $TOK' \
 ## Step 3 — Pick fetch path
-Two paths, differ in robustness vs simplicity:
+Three paths, ordered cheapest-quota-cost-first:
-| Path | When to use | Tool |
-|---|---|---|
-| **v1 (plugin)** | Library season/episode numbering matches OpenSubtitles indexing AND every episode has good IMDB ProviderId | `lib/sub-fetch.sh` |
-| **v2 (REST)** | Default. Survives Hulu/Fox numbering mismatches and shows with weird ordering | `lib/sub-rest-fetch.py` |
+| Path | Cost / day cap | Coverage | Tool |
+|---|---|---|---|
+| **v3 Addic7ed** | free, no daily cap (anon) | English-only; near-complete on broadcast US shows; spotty on animated specials / niche titles | `lib/sub-a7d-fetch.py` |
+| **v2 OS REST** | 20 / day on free OS account | best overall coverage; survives any S/E numbering quirk via per-ep `imdb_id` | `lib/sub-rest-fetch.py` |
+| **v1 plugin** | counts against same OS 20/day | only works when library numbering matches OS catalogue (e.g. fails on American Dad past S01E07) | `lib/sub-fetch.sh` |
-Quick check whether v1 will work:
+Default: try **v3** first to spare quota; fall back to **v2** for episodes
+v3 misses or for non-English needs. **v1** stays for shows where simple
+plugin auto-fetch is enough.
+Quick check whether v1 plugin will suffice (skip the rest if yes):
 1. Pick the first episode of season 2 in the library.
 2. Run `curl -s -H 'X-Emby-Token: $TOK' 'http://localhost:8096/Items/$EP/RemoteSearch/Subtitles/eng'` (read-only).
-3. If results > 0 — v1 works. v2 also works.
+3. If results > 0 — v1 works.
-4. If results == 0 but the show exists on opensubtitles.com — numbering mismatch (e.g. American Dad: library uses Hulu S1=7 eps; OS uses Fox S1=23). Use **v2**.
+4. If results == 0 but the show exists on opensubtitles.com — numbering mismatch (e.g. American Dad: library uses Hulu S1=7 eps; OS uses a different ordering). Use **v3** then **v2** for misses.
-When in doubt, use v2.
 ---
 ## Step 4 — Fetch subs per episode
-Use `lib/sub-rest-fetch.py` (v2). It logs in to OpenSubtitles, looks each
-episode up by its per-episode IMDB id, picks the best English match, and
-writes the sidecar straight to nullstone.
+### v3 — Addic7ed (default, free)
+```bash
+JELLYFIN_TOKEN=<admin-token> \
+OPENSUBTITLES_API_KEY=$HOME/.config/arrflix-opensubtitles-api.txt \
+processes/subtitles/lib/sub-a7d-fetch.py <series-id> --season N [--start E] [--end E]
+```
+Pre-flight with `DRY_RUN=1`. The OS REST key is used only for search
+(quota-free) to translate library S/E to the show's catalogue numbering.
+### v2 — OpenSubtitles REST (fallback for v3 misses)
 ```bash
 JELLYFIN_TOKEN=<admin-token> \
@@ -105,14 +117,13 @@ OPENSUBTITLES_PASS=<password> \
 processes/subtitles/lib/sub-rest-fetch.py <series-id> --season N [--start E] [--end E]
 ```
-Pre-flight with `DRY_RUN=1` to see picks without consuming quota.
+20 / day cap, resets at 00:00 UTC.
-The legacy v1 path (Jellyfin plugin RemoteSearch + docker cp) lives at
-`lib/sub-fetch.sh` and is kept for shows where library numbering matches
-OpenSubtitles' indexing — slightly less general but doesn't depend on the
-external OS REST API or our 20/day account quota.
+### v1 — Jellyfin plugin (when library numbering matches OS)
-Verify after each batch:
+`lib/sub-fetch.sh` — see header for env. Counts against the same 20/day cap.
+### Verify after each batch
 ```bash
 ssh user@192.168.0.100 'ls "<media-dir>/" | grep -c eng.srt'


@@ -0,0 +1,253 @@
#!/usr/bin/env python3
"""Subtitle fetcher v3 — Addic7ed via subliminal.

Free, no daily quota. Uses OpenSubtitles REST (search-only, no downloads,
no quota burn) to translate library S/E numbering to the show's primary
catalogue numbering (e.g. Hulu to Fox for American Dad), then drives
subliminal's addic7ed provider for the actual download.

Why v3: OS REST `/download` is capped at 20/day on free tier. Addic7ed
serves anonymous downloads with no daily limit. v2 (lib/sub-rest-fetch.py)
remains the right tool when quota isn't the bottleneck — addic7ed has
narrower coverage than OpenSubtitles (English only, mostly).

Picker: subliminal's own scoring against the matched Video (filename, S/E,
year). For AD, addic7ed catalogues by Fox airing order, so the script
remaps library Hulu numbering via per-ep IMDB id lookup on OS REST.

Usage:
    sub-a7d-fetch.py <series-id> --season N [--start E] [--end E]
    sub-a7d-fetch.py <series-id> --all

Env (required):
    JELLYFIN_TOKEN          X-Emby-Token for nullstone Jellyfin
    OPENSUBTITLES_API_KEY   Path to file holding the OS REST key (search only)

Env (optional):
    NULLSTONE               SSH target, default user@192.168.0.100
    DRY_RUN=1               search + remap only, no download
"""
from __future__ import annotations

import argparse
import json
import os
import re
import shlex
import subprocess
import sys
import urllib.parse

from babelfish import Language
from subliminal import Video, region, list_subtitles, download_subtitles

OS_BASE = "https://api.opensubtitles.com/api/v1"
USER_AGENT = "arrflix v1.0.0"
JF_BASE = "http://localhost:8096"
NULLSTONE = os.environ.get("NULLSTONE", "user@192.168.0.100")

# subliminal needs its cache region configured before providers are used;
# an in-memory backend is enough for a one-shot script.
region.configure("dogpile.cache.memory")


def die(msg: str, code: int = 1) -> None:
    print(f"ERROR: {msg}", file=sys.stderr)
    sys.exit(code)


def env_or_die(name: str) -> str:
    v = os.environ.get(name)
    if not v:
        die(f"{name} not set")
    return v


def load_api_key() -> str:
    path = env_or_die("OPENSUBTITLES_API_KEY")
    with open(path) as f:
        return f.read().strip()


def jellyfin(path: str, params: dict | None = None) -> dict:
    tok = env_or_die("JELLYFIN_TOKEN")
    qs = "?" + urllib.parse.urlencode(params, safe=",") if params else ""
    url = JF_BASE + path + qs
    # The command string is parsed by the remote shell, so quote both the
    # header (which carries the token) and the URL.
    header = shlex.quote(f"X-Emby-Token: {tok}")
    cmd = ["ssh", NULLSTONE,
           f"docker exec jellyfin curl -s -H {header} {shlex.quote(url)}"]
    return json.loads(subprocess.check_output(cmd, text=True))


def list_episodes(series_id: str) -> list[dict]:
    d = jellyfin("/Items", {
        "ParentId": series_id,
        "IncludeItemTypes": "Episode",
        "Recursive": "true",
        "Fields": "Path,ParentIndexNumber,IndexNumber,ProviderIds",
        "SortBy": "ParentIndexNumber,IndexNumber",
    })
    return d["Items"]


def imdb_strip(s: str | None) -> str | None:
    """Drop the leading 'tt' from an IMDB id; pass None/empty through as None."""
    if not s:
        return None
    return s[2:] if s.startswith("tt") else s


def os_search_imdb(api_key: str, imdb_no_tt: str) -> tuple[int, int] | None:
    """Look up the show's primary catalogue (season, episode) by per-ep IMDB id.

    Uses OS feature_details S/E (which appears to align with what Addic7ed
    indexes for at least the test shows). Search calls do not consume the
    daily quota. If the resulting download mismatches expected dialogue,
    consider re-running with the v2 OS REST path which uses imdb_id directly.
    """
    cmd = ["curl", "-sSf",
           "-H", f"Api-Key: {api_key}",
           "-H", f"User-Agent: {USER_AGENT}",
           f"{OS_BASE}/subtitles?imdb_id={imdb_no_tt}&languages=en&per_page=5"]
    raw = subprocess.check_output(cmd)
    j = json.loads(raw.decode())
    # Return the first hit that carries both a season and an episode number.
    for h in j.get("data", []):
        fd = h.get("attributes", {}).get("feature_details", {})
        s, e = fd.get("season_number"), fd.get("episode_number")
        if s and e:
            return int(s), int(e)
    return None


def episode_to_paths(ep: dict) -> tuple[str, str]:
    """Return (remote_dir, base_filename) for sidecar placement on nullstone."""
    container_path = ep["Path"]
    host_path = container_path.replace("/media/", "/home/user/media/")
    return os.path.dirname(host_path), os.path.splitext(os.path.basename(host_path))[0]


def addic7ed_safe_name(series: str, year: int | None, fox_s: int, fox_e: int) -> str:
    """Build filename that subliminal+addic7ed match. Strip '!' (breaks matcher)
    and other punctuation; keep year if known."""
    cleaned = re.sub(r"[!?:]", "", series).replace(" ", ".")
    yearbit = f".{year}" if year else ""
    return f"{cleaned}{yearbit}.S{fox_s:02d}E{fox_e:02d}.HDTV.x264.mkv"


def write_sidecar_remote(content: bytes, remote_path: str) -> None:
    # Stream the subtitle bytes over SSH into `cat > <path>` on nullstone.
    p = subprocess.Popen(["ssh", NULLSTONE, f"cat > {shlex.quote(remote_path)}"],
                         stdin=subprocess.PIPE)
    p.communicate(content)
    if p.returncode != 0:
        die(f"failed writing {remote_path}")


def main() -> int:
    ap = argparse.ArgumentParser()
    ap.add_argument("series_id")
    ap.add_argument("--season", type=int, default=None)
    ap.add_argument("--start", type=int, default=1)
    ap.add_argument("--end", type=int, default=10**6)
    ap.add_argument("--all", action="store_true")
    args = ap.parse_args()
    if args.season is None and not args.all:
        die("pass --season N or --all")

    api_key = load_api_key()
    dry = os.environ.get("DRY_RUN") == "1"

    # Select the episodes to process from the library listing.
    eps = list_episodes(args.series_id)
    work = []
    for ep in eps:
        s, n = ep["ParentIndexNumber"], ep["IndexNumber"]
        if not args.all and s != args.season:
            continue
        if not (args.start <= n <= args.end):
            continue
        work.append(ep)
    if not work:
        die("no episodes selected")
    print(f"[plan] {len(work)} episodes selected", file=sys.stderr)

    ok = 0
    fail = []
    for ep in work:
        s, n = ep["ParentIndexNumber"], ep["IndexNumber"]
        label = f"libS{s:02}E{n:02} {ep['Name']}"
        imdb = imdb_strip(ep.get("ProviderIds", {}).get("Imdb"))
        if not imdb:
            print(f"[skip] {label} — no IMDB id", file=sys.stderr)
            fail.append((label, "no-imdb"))
            continue

        # Remap library S/E to the catalogue S/E via OS REST search.
        try:
            fox = os_search_imdb(api_key, imdb)
        except subprocess.CalledProcessError as e:
            print(f"[skip] {label} — OS search err {e.returncode}", file=sys.stderr)
            fail.append((label, "os-search"))
            continue
        if fox is None:
            print(f"[skip] {label} — OS has no S/E for imdb={imdb}", file=sys.stderr)
            fail.append((label, "no-fox-se"))
            continue
        fox_s, fox_e = fox

        # series name + year — pull from path or item
        series_name = ep.get("SeriesName") or "Show"
        year = None
        ymatch = re.search(r"\((\d{4})\)", ep.get("Path", ""))
        if ymatch:
            year = int(ymatch.group(1))

        v_name = addic7ed_safe_name(series_name, year, fox_s, fox_e)
        v = Video.fromname(v_name)

        try:
            hits = list_subtitles([v], {Language("eng")},
                                  providers=["addic7ed"]).get(v, [])
        except Exception as e:
            print(f"[skip] {label} — addic7ed list err: {type(e).__name__}",
                  file=sys.stderr)
            fail.append((label, "a7d-list"))
            continue
        if not hits:
            print(f"[skip] {label} — addic7ed 0 subs (foxS{fox_s:02}E{fox_e:02})",
                  file=sys.stderr)
            fail.append((label, "a7d-no-hits"))
            continue

        # Take the first candidate as returned; candidates are not re-scored here.
        pick = hits[0]
        print(f"[pick] {label} -> foxS{fox_s:02}E{fox_e:02} a7d={pick.id}",
              file=sys.stderr)
        if dry:
            ok += 1
            continue

        try:
            download_subtitles([pick])
        except Exception as e:
            print(f"[fail] {label} — addic7ed dl err: {type(e).__name__}: {e}",
                  file=sys.stderr)
            fail.append((label, "a7d-dl"))
            continue
        if not pick.content:
            print(f"[fail] {label} — empty content", file=sys.stderr)
            fail.append((label, "empty"))
            continue

        remote_dir, base = episode_to_paths(ep)
        dest = f"{remote_dir}/{base}.eng.srt"
        write_sidecar_remote(pick.content, dest)
        print(f"[ok] {label} -> {dest}", file=sys.stderr)
        ok += 1

    print(f"\n[done] ok={ok}/{len(work)} failures={len(fail)}", file=sys.stderr)
    for lab, why in fail:
        print(f"  - {lab}: {why}", file=sys.stderr)
    # Exit 0 if at least one episode succeeded, 2 if none did.
    return 0 if ok else 2


if __name__ == "__main__":
    sys.exit(main())
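
The script only rejects a download when `pick.content` is empty. A slightly stricter pre-write check might look like the sketch below. It is an assumption-laden illustration, not part of the commit: `looks_like_srt` and its `min_cues` threshold are hypothetical, and it only confirms the bytes resemble SRT cues. It does not verify sync against audio.

```python
import re

# One SRT cue timing line, e.g. "00:01:02,345 --> 00:01:04,000".
_CUE_TIME = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3} --> \d{2}:\d{2}:\d{2}[,.]\d{3}",
    re.MULTILINE,
)


def looks_like_srt(content: bytes, min_cues: int = 5) -> bool:
    """Return True if content contains at least min_cues SRT timing lines.

    Hypothetical sanity check: catches 0-byte bodies and HTML error pages
    saved as subtitles; says nothing about whether timings match the video.
    """
    if not content:
        return False
    text = content.decode("utf-8", errors="replace")
    return len(_CUE_TIME.findall(text)) >= min_cues
```

One possible placement is just before `write_sidecar_remote(...)`, recording a failure reason when the check returns False.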


@@ -1,9 +1,10 @@
 # Subtitle run — `American Dad! (2005)`
-Recipe version: v1 (S01) → v2 (S02 partial)
+Recipe version: v1 (S01) → v2 (S02E01-E12) → v3 Addic7ed (S02E13-E16, S03, S04)
 Run date: 2026-05-09
 Operator: Claude Code @ onyx session, ai-lab cwd
-Quota usage: 20 → 1 (19 downloads: S01=7, S02=12; 2 lost to urllib-503 bug, recovered manually)
+OS REST quota usage: 20 → 1 (19 downloads, quota-counted)
+Addic7ed downloads: 30 (anonymous, no daily cap)
 ## Source
@@ -29,12 +30,14 @@ Library uses Hulu/DSP season ordering (S1=7 eps). Original Fox order has S1=23 e
 | Season | Eps | Subs fetched | Quality sample | Notes |
 |---|---|---|---|---|
-| S01 | 7 | 7 / 7 | not yet visually verified by playback (TODO) | v1 path. All from `OMiCRON DVDRip` release group, fps 23.976 except S01E07 (24 fps), no SDH |
-| S02 | 16 | 12 / 16 | not yet visually verified | v2 path (REST). E01-E12 done. E13-E16 deferred — daily quota = 1 left, resets 23:59 UTC |
-| S03 | 19 | 0 / 19 | n/a | Awaiting next quota window |
-| S04 | 16 | 0 / 16 | n/a | Awaiting next quota window |
+| S01 | 7 | 7 / 7 | not yet visually verified by playback (TODO) | v1 plugin path. OMiCRON DVDRip 23.976fps |
+| S02 | 16 | 16 / 16 | S02E16 first lines confirmed match episode | E01-E12 v2 OS REST (mixed OMiCRON + 20FOX); E13-E16 v3 Addic7ed (no quota cost) |
+| S03 | 19 | 16 / 19 | not yet visually verified | v3 Addic7ed. Misses: E04 Lincoln Lover (a7d 0 subs), E13 Black Mystery Month (a7d empty body), E19 Joint Custody (a7d 0 subs) |
+| S04 | 16 | 10 / 16 | not yet visually verified | v3 Addic7ed. Misses: E01-E05 (Vacation Goo / Meter Made / Dope & Faith / Big Trouble in Little Langley / Haylias) and E11 Oedipal Panties — all "a7d 0 subs" for the OS-feat-details S/E we passed |
-Net: **19 / 58 (33 %)**.
+Net: **49 / 58 (84 %)**.
+Remaining 9 episodes can land via OS REST tomorrow (20-quota window covers them all in one batch).
 ## Picks (S01)
@@ -103,5 +106,5 @@ recipe Step 6 sync sample on at least one 29.97-pick episode.
 - [ ] visually verify sample S01 sub plays in sync (recipe §6)
 - [ ] visually verify sample S02 29.97-fps pick plays in sync (e.g. S02E03)
-- [ ] tomorrow: sub S02E13-E16 (4 eps) + start S03 (19 eps total today + tomorrow)
-- [ ] day after: finish S03 + S04 (16 eps)
+- [ ] visually verify sample Addic7ed pick plays in sync (e.g. S03E01 or S04E10)
+- [ ] tomorrow (after 23:59 UTC quota reset): rerun `sub-rest-fetch.py --season N --start E --end E` on the 9 missed eps via OS REST
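
To plan the OS REST rerun, the failure summary that `sub-a7d-fetch.py` prints at the end of a run (lines of the form `  - libS03E04 Lincoln Lover: a7d-no-hits`) could be folded into contiguous `--start` / `--end` ranges. The sketch below is illustrative only: `missed_ranges` is a hypothetical helper, and it assumes the captured stderr keeps that summary format.

```python
from __future__ import annotations

import re

# Matches only the end-of-run summary lines ("  - libSxxEyy Name: reason"),
# not the per-episode [pick]/[ok]/[skip] lines that also contain libSxxEyy.
_FAIL_LINE = re.compile(r"^\s*- libS(\d+)E(\d+)\b", re.MULTILINE)


def missed_ranges(log_text: str) -> list[tuple[int, int, int]]:
    """Return (season, start_ep, end_ep) runs covering every failed episode."""
    eps = sorted({(int(s), int(e)) for s, e in _FAIL_LINE.findall(log_text)})
    ranges: list[tuple[int, int, int]] = []
    for season, ep in eps:
        if ranges and ranges[-1][0] == season and ranges[-1][2] == ep - 1:
            # Extends the current run within the same season.
            ranges[-1] = (season, ranges[-1][1], ep)
        else:
            ranges.append((season, ep, ep))
    return ranges


def rerun_commands(log_text: str, series_id: str = "<series-id>") -> list[str]:
    """Format one sub-rest-fetch.py invocation per contiguous range."""
    return [
        f"sub-rest-fetch.py {series_id} --season {s} --start {a} --end {b}"
        for s, a, b in missed_ranges(log_text)
    ]
```

For the nine misses recorded in the table above (S03 E04, E13, E19; S04 E01-E05, E11), this grouping gives five ranges and therefore five invocations, all within a single 20-download quota window.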