From 43f55643be3dfcff5618144cde8699aeed9f2dd9 Mon Sep 17 00:00:00 2001
From: s8n
Date: Sat, 9 May 2026 23:31:10 +0100
Subject: [PATCH] processes/subtitles: v3 Addic7ed fetcher + AD 49/58 subbed

Adds lib/sub-a7d-fetch.py: free, no-daily-cap path via subliminal's
addic7ed provider (anonymous). Uses OpenSubtitles REST search-only (no
quota cost) to translate library S/E to the show's primary catalogue
numbering, then drives subliminal to download from Addic7ed and writes
sidecars direct to nullstone via SSH.

Picker quirks: subliminal series-name matcher is broken by '!' in the
title, so the script strips it before building the synthetic
Video.fromname() string. OS feature_details S/E happens to align with
Addic7ed's indexing for the test show (American Dad).

Recipe README now reflects three paths in cheapest-first order:
v3 Addic7ed, v2 OS REST (20/day), v1 plugin.

American Dad run log updated to 49/58 (S01 7/7 v1, S02 16/16 mixed v2/v3,
S03 16/19 v3, S04 10/16 v3). 9 misses identified, deferred to next OS REST
quota window.
---
 processes/README.md                      |   2 +-
 processes/subtitles/CHANGELOG.md         |  38 ++++
 processes/subtitles/README.md            |  51 +++--
 processes/subtitles/lib/sub-a7d-fetch.py | 253 +++++++++++++++++++++++
 processes/subtitles/runs/american-dad.md |  21 +-
 5 files changed, 335 insertions(+), 30 deletions(-)
 create mode 100755 processes/subtitles/lib/sub-a7d-fetch.py

diff --git a/processes/README.md b/processes/README.md
index 0889454..676a0e8 100644
--- a/processes/README.md
+++ b/processes/README.md
@@ -21,4 +21,4 @@ amendment for a full sweep.

 | Process | Status | Last touched |
 |---|---|---|
-| [`subtitles/`](subtitles/) | v2 — direct OpenSubtitles REST. AD 19/58 eps subbed (S01 + S02E01–E12); S02E13–S04 awaiting next quota window | 2026-05-09 |
+| [`subtitles/`](subtitles/) | v3 — Addic7ed (free, no daily cap) added as primary, OS REST as fallback. AD 49/58 subbed; remaining 9 land via OS REST after quota reset | 2026-05-09 |
diff --git a/processes/subtitles/CHANGELOG.md b/processes/subtitles/CHANGELOG.md
index 7e53605..c8770cc 100644
--- a/processes/subtitles/CHANGELOG.md
+++ b/processes/subtitles/CHANGELOG.md
@@ -63,3 +63,41 @@ Recipe upgrade:
 - Free-tier 20/day still in force (REST and plugin share the counter).
 - Recipe Step 6 (sync verification) is still manual — no automated check
   that the picked .srt actually aligns with audio.
+
+## v3 — 2026-05-09
+
+Approach **Addic7ed via subliminal** added as the quota-free primary path. New
+helper at `lib/sub-a7d-fetch.py`. Runs alongside v2, which covers v3 misses.
+
+- `subliminal` Python lib drives `addic7ed` provider, anonymous
+- OS REST is still consulted (search-only, no quota cost) to translate
+  library Hulu numbering to the show's primary catalogue numbering, since
+  Addic7ed and OS feature_details appear to align for at least the test
+  show (American Dad)
+- Sidecar written direct to nullstone via `ssh ... cat >`
+
+### v3 picker / matching
+
+- subliminal returns ordered candidates by match score; takes first
+- "!" in series name breaks subliminal's matcher; recipe strips it before
+  building the synthetic filename for `Video.fromname()`
+- Synthetic filename pattern: `Series.Name.Year.SXXEYY.HDTV.x264.mkv`
+
+### v3 known quirks
+
+- Some episodes return 0 hits at addic7ed for the OS-feat-details S/E we
+  pass — likely cases where addic7ed indexes by Fox airing order while OS
+  uses DVD-compressed (or vice versa). On American Dad, ~9 of 58 episodes
+  missed via this path. Fall back to v2 OS REST when quota allows.
+- One episode (`Black Mystery Month`) had a hit but downloaded empty
+  content — addic7ed-side cataloguing error or temp 0-byte upload.
+- Per-show coverage varies: Addic7ed has near-complete English on broadcast
+  US shows but spotty for animated specials and obscure titles.
+
+### v3 known limits
+
+- English coverage best; non-English near-empty
+- Anonymous downloads work but heavy bursts may trigger Addic7ed's
+  bot detection and short IP throttle (~1 hour). The script makes no
+  effort at jittering / backoff
+- No automated sync-quality check; recipe Step 6 still manual
diff --git a/processes/subtitles/README.md b/processes/subtitles/README.md
index 20650c1..c836b6c 100644
--- a/processes/subtitles/README.md
+++ b/processes/subtitles/README.md
@@ -1,7 +1,7 @@
 # Subtitle acquisition process — v1

 Last updated: 2026-05-09
-Status: **v2** — direct REST API. American Dad S01–S02 (19/58 eps) subbed. S02E13–S04 awaiting next quota window.
+Status: **v3** — three fetch paths (plugin / OS REST / Addic7ed). American Dad 49/58 subbed; remaining 9 land via OS REST after quota reset.

 This recipe is written for Claude Code to execute. Each step lists the exact
 command, what to verify, and what to do on failure. Background reference for
@@ -73,29 +73,41 @@ ssh user@192.168.0.100 "docker exec jellyfin curl -s -H 'X-Emby-Token: $TOK' \

 ## Step 3 — Pick fetch path

-Two paths, differ in robustness vs simplicity:
+Three paths, ordered cheapest-quota-cost-first:

-| Path | When to use | Tool |
-|---|---|---|
-| **v1 (plugin)** | Library season/episode numbering matches OpenSubtitles indexing AND every episode has good IMDB ProviderId | `lib/sub-fetch.sh` |
-| **v2 (REST)** | Default. Survives Hulu/Fox numbering mismatches and shows with weird ordering | `lib/sub-rest-fetch.py` |
+| Path | Cost / day cap | Coverage | Tool |
+|---|---|---|---|
+| **v3 Addic7ed** | free, no daily cap (anon) | English-only; near-complete on broadcast US shows; spotty on animated specials / niche titles | `lib/sub-a7d-fetch.py` |
+| **v2 OS REST** | 20 / day on free OS account | best overall coverage; survives any S/E numbering quirk via per-ep `imdb_id` | `lib/sub-rest-fetch.py` |
+| **v1 plugin** | counts against same OS 20/day | only works when library numbering matches OS catalogue (e.g. fails on American Dad past S01E07) | `lib/sub-fetch.sh` |

-Quick check whether v1 will work:
+Default: try **v3** first to spare quota; fall back to **v2** for episodes
+v3 misses or for non-English needs. **v1** stays for shows where simple
+plugin auto-fetch is enough.
+
+Quick check whether v1 plugin will suffice (skip the rest if yes):

 1. Pick the first episode of season 2 in the library.
 2. Run `curl -s -H 'X-Emby-Token: $TOK' 'http://localhost:8096/Items/$EP/RemoteSearch/Subtitles/eng'` (read-only).
-3. If results > 0 — v1 works. v2 also works.
-4. If results == 0 but the show exists on opensubtitles.com — numbering mismatch (e.g. American Dad: library uses Hulu S1=7 eps; OS uses Fox S1=23). Use **v2**.
-
-When in doubt, use v2.
+3. If results > 0 — v1 works.
+4. If results == 0 but the show exists on opensubtitles.com — numbering mismatch (e.g. American Dad: library uses Hulu S1=7 eps; OS uses a different ordering). Use **v3** then **v2** for misses.

 ---

 ## Step 4 — Fetch subs per episode

-Use `lib/sub-rest-fetch.py` (v2). It logs in to OpenSubtitles, looks each
-episode up by its per-episode IMDB id, picks the best English match, and
-writes the sidecar straight to nullstone.
+### v3 — Addic7ed (default, free)
+
+```bash
+JELLYFIN_TOKEN= \
+OPENSUBTITLES_API_KEY=$HOME/.config/arrflix-opensubtitles-api.txt \
+processes/subtitles/lib/sub-a7d-fetch.py SERIES_ID --season N [--start E] [--end E]
+```
+
+Pre-flight with `DRY_RUN=1`. The OS REST key is used only for search
+(quota-free) to translate library S/E to the show's catalogue numbering.
+
+### v2 — OpenSubtitles REST (fallback for v3 misses)

 ```bash
 JELLYFIN_TOKEN= \
@@ -105,14 +117,13 @@ OPENSUBTITLES_PASS= \
 processes/subtitles/lib/sub-rest-fetch.py --season N [--start E] [--end E]
 ```

-Pre-flight with `DRY_RUN=1` to see picks without consuming quota.
+20 / day cap, resets at 00:00 UTC.

-The legacy v1 path (Jellyfin plugin RemoteSearch + docker cp) lives at
-`lib/sub-fetch.sh` and is kept for shows where library numbering matches
-OpenSubtitles' indexing — slightly less general but doesn't depend on the
-external OS REST API or our 20/day account quota.
+### v1 — Jellyfin plugin (when library numbering matches OS)

-Verify after each batch:
+`lib/sub-fetch.sh` — see header for env. Counts against the same 20/day cap.
+
+### Verify after each batch

 ```bash
 ssh user@192.168.0.100 'ls "/" | grep -c eng.srt'
diff --git a/processes/subtitles/lib/sub-a7d-fetch.py b/processes/subtitles/lib/sub-a7d-fetch.py
new file mode 100755
index 0000000..68ff9b5
--- /dev/null
+++ b/processes/subtitles/lib/sub-a7d-fetch.py
@@ -0,0 +1,253 @@
+#!/usr/bin/env python3
+"""Subtitle fetcher v3 — Addic7ed via subliminal.
+
+Free, no daily quota. Uses OpenSubtitles REST (search-only, no downloads,
+no quota burn) to translate library S/E numbering to the show's primary
+catalogue numbering (e.g. Hulu→Fox for American Dad), then drives
+subliminal's addic7ed provider for the actual download.
+
+Why v3: OS REST `/download` is capped at 20/day on free tier. Addic7ed
+serves anonymous downloads with no daily limit. v2 (lib/sub-rest-fetch.py)
+remains the right tool when quota isn't the bottleneck — addic7ed has
+narrower coverage than OpenSubtitles (English only, mostly).
+
+Picker: subliminal's own scoring against the matched Video (filename, S/E,
+year). For AD, addic7ed catalogues by Fox airing order, so the script
+remaps library Hulu numbering via per-ep IMDB id lookup on OS REST.
+
+Usage:
+    sub-a7d-fetch.py SERIES_ID --season N [--start E] [--end E]
+    sub-a7d-fetch.py SERIES_ID --all
+
+Env (required):
+    JELLYFIN_TOKEN          X-Emby-Token for nullstone Jellyfin
+    OPENSUBTITLES_API_KEY   Path to file holding the OS REST key (search only)
+
+Env (optional):
+    NULLSTONE     SSH target, default user@192.168.0.100
+    DRY_RUN=1     search + remap only, no download
+"""
+from __future__ import annotations
+
+import argparse
+import json
+import os
+import re
+import shlex
+import subprocess
+import sys
+import tempfile
+import urllib.parse
+
+from babelfish import Language
+from subliminal import (Video, region, list_subtitles, download_subtitles,
+                        save_subtitles)
+
+OS_BASE = "https://api.opensubtitles.com/api/v1"
+USER_AGENT = "arrflix v1.0.0"
+JF_BASE = "http://localhost:8096"
+NULLSTONE = os.environ.get("NULLSTONE", "user@192.168.0.100")
+
+region.configure("dogpile.cache.memory")
+
+
+def die(msg: str, code: int = 1) -> None:
+    print(f"ERROR: {msg}", file=sys.stderr)
+    sys.exit(code)
+
+
+def env_or_die(name: str) -> str:
+    v = os.environ.get(name)
+    if not v:
+        die(f"{name} not set")
+    return v
+
+
+def load_api_key() -> str:
+    path = env_or_die("OPENSUBTITLES_API_KEY")
+    with open(path) as f:
+        return f.read().strip()
+
+
+def jellyfin(path: str, params: dict | None = None) -> dict:
+    tok = env_or_die("JELLYFIN_TOKEN")
+    qs = "?" + urllib.parse.urlencode(params, safe=",") if params else ""
+    url = JF_BASE + path + qs
+    cmd = ["ssh", NULLSTONE,
+           f"docker exec jellyfin curl -s -H 'X-Emby-Token: {tok}' {shlex.quote(url)}"]
+    return json.loads(subprocess.check_output(cmd, text=True))
+
+
+def list_episodes(series_id: str) -> list[dict]:
+    d = jellyfin("/Items", {
+        "ParentId": series_id,
+        "IncludeItemTypes": "Episode",
+        "Recursive": "true",
+        "Fields": "Path,ParentIndexNumber,IndexNumber,ProviderIds",
+        "SortBy": "ParentIndexNumber,IndexNumber",
+    })
+    return d["Items"]
+
+
+def imdb_strip(s: str | None) -> str | None:
+    if not s:
+        return None
+    return s[2:] if s.startswith("tt") else s
+
+
+def os_search_imdb(api_key: str, imdb_no_tt: str) -> tuple[int, int] | None:
+    """Look up the show's primary catalogue (season, episode) by per-ep IMDB id.
+    Uses OS feature_details S/E (which appears to align with what Addic7ed
+    indexes for at least the test shows). Search calls do not consume the
+    daily quota. If the resulting download mismatches expected dialogue,
+    consider re-running with the v2 OS REST path which uses imdb_id directly."""
+    cmd = ["curl", "-sSf",
+           "-H", f"Api-Key: {api_key}",
+           "-H", f"User-Agent: {USER_AGENT}",
+           f"{OS_BASE}/subtitles?imdb_id={imdb_no_tt}&languages=en&per_page=5"]
+    raw = subprocess.check_output(cmd)
+    j = json.loads(raw.decode())
+    for h in j.get("data", []):
+        fd = h.get("attributes", {}).get("feature_details", {})
+        s, e = fd.get("season_number"), fd.get("episode_number")
+        if s and e:
+            return int(s), int(e)
+    return None
+
+
+def episode_to_paths(ep: dict) -> tuple[str, str]:
+    """Return (remote_dir, base_filename) for sidecar placement on nullstone."""
+    container_path = ep["Path"]
+    host_path = container_path.replace("/media/", "/home/user/media/")
+    return os.path.dirname(host_path), os.path.splitext(os.path.basename(host_path))[0]
+
+
+def addic7ed_safe_name(series: str, year: int | None, fox_s: int, fox_e: int) -> str:
+    """Build filename that subliminal+addic7ed match. Strip '!' (breaks matcher)
+    and other punctuation; keep year if known."""
+    cleaned = re.sub(r"[!?:]", "", series).replace(" ", ".")
+    yearbit = f".{year}" if year else ""
+    return f"{cleaned}{yearbit}.S{fox_s:02d}E{fox_e:02d}.HDTV.x264.mkv"
+
+
+def write_sidecar_remote(content: bytes, remote_path: str) -> None:
+    p = subprocess.Popen(["ssh", NULLSTONE, f"cat > {shlex.quote(remote_path)}"],
+                         stdin=subprocess.PIPE)
+    p.communicate(content)
+    if p.returncode != 0:
+        die(f"failed writing {remote_path}")
+
+
+def main() -> int:
+    ap = argparse.ArgumentParser()
+    ap.add_argument("series_id")
+    ap.add_argument("--season", type=int, default=None)
+    ap.add_argument("--start", type=int, default=1)
+    ap.add_argument("--end", type=int, default=10**6)
+    ap.add_argument("--all", action="store_true")
+    args = ap.parse_args()
+
+    if args.season is None and not args.all:
+        die("pass --season N or --all")
+
+    api_key = load_api_key()
+    dry = os.environ.get("DRY_RUN") == "1"
+
+    eps = list_episodes(args.series_id)
+    work = []
+    for ep in eps:
+        s, n = ep["ParentIndexNumber"], ep["IndexNumber"]
+        if not args.all and s != args.season:
+            continue
+        if not (args.start <= n <= args.end):
+            continue
+        work.append(ep)
+    if not work:
+        die("no episodes selected")
+
+    print(f"[plan] {len(work)} episodes selected", file=sys.stderr)
+
+    ok = 0
+    fail = []
+    for ep in work:
+        s, n = ep["ParentIndexNumber"], ep["IndexNumber"]
+        label = f"libS{s:02}E{n:02} {ep['Name']}"
+
+        imdb = imdb_strip(ep.get("ProviderIds", {}).get("Imdb"))
+        if not imdb:
+            print(f"[skip] {label} — no IMDB id", file=sys.stderr)
+            fail.append((label, "no-imdb"))
+            continue
+
+        try:
+            fox = os_search_imdb(api_key, imdb)
+        except subprocess.CalledProcessError as e:
+            print(f"[skip] {label} — OS search err {e.returncode}", file=sys.stderr)
+            fail.append((label, "os-search"))
+            continue
+        if fox is None:
+            print(f"[skip] {label} — OS has no S/E for imdb={imdb}", file=sys.stderr)
+            fail.append((label, "no-fox-se"))
+            continue
+        fox_s, fox_e = fox
+
+        # series name + year — pull from path or item
+        series_name = ep.get("SeriesName") or "Show"
+        year = None
+        ymatch = re.search(r"\((\d{4})\)", ep.get("Path", ""))
+        if ymatch:
+            year = int(ymatch.group(1))
+
+        v_name = addic7ed_safe_name(series_name, year, fox_s, fox_e)
+        v = Video.fromname(v_name)
+
+        try:
+            hits = list_subtitles([v], {Language("eng")},
+                                  providers=["addic7ed"]).get(v, [])
+        except Exception as e:
+            print(f"[skip] {label} — addic7ed list err: {type(e).__name__}",
+                  file=sys.stderr)
+            fail.append((label, "a7d-list"))
+            continue
+
+        if not hits:
+            print(f"[skip] {label} — addic7ed 0 subs (foxS{fox_s:02}E{fox_e:02})",
+                  file=sys.stderr)
+            fail.append((label, "a7d-no-hits"))
+            continue
+
+        pick = hits[0]  # subliminal returns ordered; take first
+        print(f"[pick] {label} -> foxS{fox_s:02}E{fox_e:02} a7d={pick.id}",
+              file=sys.stderr)
+
+        if dry:
+            ok += 1
+            continue
+
+        try:
+            download_subtitles([pick])
+        except Exception as e:
+            print(f"[fail] {label} — addic7ed dl err: {type(e).__name__}: {e}",
+                  file=sys.stderr)
+            fail.append((label, "a7d-dl"))
+            continue
+
+        if not pick.content:
+            print(f"[fail] {label} — empty content", file=sys.stderr)
+            fail.append((label, "empty"))
+            continue
+
+        remote_dir, base = episode_to_paths(ep)
+        dest = f"{remote_dir}/{base}.eng.srt"
+        write_sidecar_remote(pick.content, dest)
+        print(f"[ok] {label} -> {dest}", file=sys.stderr)
+        ok += 1
+
+    print(f"\n[done] ok={ok}/{len(work)} failures={len(fail)}", file=sys.stderr)
+    for lab, why in fail:
+        print(f"  - {lab}: {why}", file=sys.stderr)
+    return 0 if ok else 2
+
+
+if __name__ == "__main__":
+    sys.exit(main())
diff --git a/processes/subtitles/runs/american-dad.md b/processes/subtitles/runs/american-dad.md
index 94d9675..8b648ea 100644
--- a/processes/subtitles/runs/american-dad.md
+++ b/processes/subtitles/runs/american-dad.md
@@ -1,9 +1,10 @@
 # Subtitle run — `American Dad! (2005)`

-Recipe version: v1 (S01) → v2 (S02 partial)
+Recipe version: v1 (S01) → v2 (S02E01–E12) → v3 Addic7ed (S02E13–E16, S03, S04)
 Run date: 2026-05-09
 Operator: Claude Code @ onyx session, ai-lab cwd
-Quota usage: 20 → 1 (19 downloads: S01=7, S02=12; 2 lost to urllib-503 bug, recovered manually)
+OS REST quota usage: 20 → 1 (19 downloads, quota-counted)
+Addic7ed downloads: 30 (anonymous, no daily cap)

 ## Source

@@ -29,12 +30,14 @@ Library uses Hulu/DSP season ordering (S1=7 eps). Original Fox order has S1=23 e

 | Season | Eps | Subs fetched | Quality sample | Notes |
 |---|---|---|---|---|
-| S01 | 7 | 7 / 7 | not yet visually verified by playback (TODO) | v1 path. All from `OMiCRON DVDRip` release group, fps 23.976 except S01E07 (24 fps), no SDH |
-| S02 | 16 | 12 / 16 | not yet visually verified | v2 path (REST). E01-E12 done. E13-E16 deferred — daily quota = 1 left, resets 23:59 UTC |
-| S03 | 19 | 0 / 19 | n/a | Awaiting next quota window |
-| S04 | 16 | 0 / 16 | n/a | Awaiting next quota window |
+| S01 | 7 | 7 / 7 | not yet visually verified by playback (TODO) | v1 plugin path. OMiCRON DVDRip 23.976fps |
+| S02 | 16 | 16 / 16 | S02E16 first lines confirmed match episode | E01-E12 v2 OS REST (mixed OMiCRON + 20FOX); E13-E16 v3 Addic7ed (no quota cost) |
+| S03 | 19 | 16 / 19 | not yet visually verified | v3 Addic7ed. Misses: E04 Lincoln Lover (a7d 0 subs), E13 Black Mystery Month (a7d empty body), E19 Joint Custody (a7d 0 subs) |
+| S04 | 16 | 10 / 16 | not yet visually verified | v3 Addic7ed. Misses: E01-E05 (Vacation Goo / Meter Made / Dope & Faith / Big Trouble in Little Langley / Haylias) and E11 Oedipal Panties — all "a7d 0 subs" for the OS-feat-details S/E we passed |

-Net: **19 / 58 (33 %)**.
+Net: **49 / 58 (84 %)**.
+
+Remaining 9 episodes can land via OS REST tomorrow (20-quota window covers them all in one batch).

 ## Picks (S01)

@@ -103,5 +106,5 @@ recipe Step 6 sync sample on at least one 29.97-pick episode.

 - [ ] visually verify sample S01 sub plays in sync (recipe §6)
 - [ ] visually verify sample S02 29.97-fps pick plays in sync (e.g. S02E03)
-- [ ] tomorrow: sub S02E13–E16 (4 eps) + start S03 (19 eps total today + tomorrow)
-- [ ] day after: finish S03 + S04 (16 eps)
+- [ ] visually verify sample Addic7ed pick plays in sync (e.g. S03E01 or S04E10)
+- [ ] tomorrow (after 23:59 UTC quota reset): rerun `sub-rest-fetch.py --season N --start E --end E` on the 9 missed eps via OS REST
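
Illustrative note, not part of the patch: the changelog says the script strips "!" from the series title before building the synthetic filename passed to `Video.fromname()`. The sketch below copies that one helper so the resulting string can be checked without subliminal installed. The episode numbers and the second title are made-up inputs for illustration only.

```python
import re


def addic7ed_safe_name(series, year, season, episode):
    """Build the release-style filename pattern described in the changelog.

    Standalone copy of the helper in lib/sub-a7d-fetch.py, for illustration.
    """
    # Drop the punctuation the patch strips ('!', '?', ':'), then dot-join words.
    cleaned = re.sub(r"[!?:]", "", series).replace(" ", ".")
    # The year segment is optional; the script takes it from "(YYYY)" in the path.
    yearbit = f".{year}" if year else ""
    return f"{cleaned}{yearbit}.S{season:02d}E{episode:02d}.HDTV.x264.mkv"


# Hypothetical inputs; only the title and year come from the run log.
print(addic7ed_safe_name("American Dad!", 2005, 2, 13))
# American.Dad.2005.S02E13.HDTV.x264.mkv
print(addic7ed_safe_name("Some Show: Part?", None, 1, 1))
# Some.Show.Part.S01E01.HDTV.x264.mkv
```

The `HDTV.x264` suffix is a fixed placeholder in the script. Whether it affects candidate ranking has not been checked here.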