# Subtitle USER-G style — ARRFLIX The bar every fetch should hit. If a recipe step would violate any of these, stop and ask before proceeding. ## Source priority (highest → lowest) Accuracy beats format. Use this tier ladder before reaching for OCR/AI: 1. **Original release text subs** (`.srt`/`.ass` from disc/streamer rip, embedded or sidecar). Ground truth — ship as-is. 2. **Original release bitmap subs** (PGS, VobSub, `dvd_subtitle`, embedded). **Acceptable in their native form** — they ARE the original words from the source master, just rendered as images. Jellyfin server burns them in for clients that can't render natively. Optionally OCR-extract a `.srt` sidecar alongside (do NOT replace the embedded stream) when client-side styling, search, or mobile rendering matters. 3. **Trusted text rips** from OpenSubtitles (verified uploads, hash-match or high-download-count + frame-rate-match). 4. **WhisperX rebuild** with `--initial_prompt` proper-nouns yaml — only when no original exists (e.g. user-uploaded YT content with auto-CC). Logged in [`STOPGAP-SUBS.md`](STOPGAP-SUBS.md) until cleared. Tier 1 and 2 are first-class. Tier 3 is a fallback. Tier 4 is a stop-gap. ## What lands on disk - **At least one** English subtitle stream per episode (embedded OR sidecar — not both required). - Sidecar filename when used: `.eng.srt` — no language-region tags (`en-US`), no flag stack on regular subs (no `.sdh`, no `.forced`, no `.cc` unless there genuinely is no plain-English option). - Sidecar format: `.srt` (SubRip text). For embedded: keep the original codec (`subrip`, `ass`, `pgs`, `vobsub`, `dvd_subtitle`) — do NOT re-mux just to convert format. Convert only when extracting to disk: text codecs via `ffmpeg -map 0:s:0 -c:s srt`, bitmap codecs via OCR (see "OCR bitmap → text" below). - Encoding (sidecar): UTF-8. Re-encode with `iconv` if a sidecar comes back as cp1252 / windows-1250. ## What gets picked In order: 1. **English** language — `eng` / `en`. Never auto-pick `en-US`/`en-GB` variants over plain `en`; treat them equivalent for matching. 2. **No SDH / Hearing Impaired** — drop any sub flagged `hearing_impaired`, `sdh`, `cc`. Only fall back to SDH if no plain-English option exists. 3. **No machine / AI translation** — drop `machine_translated`, `ai_translated`. Hand-authored subs only. 4. **No forced subtitles** — drop `foreign_parts_only` / `Forced` unless the episode has English audio with foreign-language scenes that need translation (rare for US shows). 5. **Frame-rate match** — prefer entries whose declared fps matches the source video (typically 23.976 for our masters). Treat `0.0` as unknown and fall through to step 6. 6. **Highest download count** within the surviving candidates — proxies for "the version everyone agreed was best." After fetch, **eyeball-verify one sample episode per show** plays in sync (±1 s on a known dialogue line) before declaring the show done. ## What doesn't ship - Multiple language tracks per episode (no German/French alternatives — English-only library). Drop non-English embedded streams via `mkvpropedit` only if user complains about client picker clutter; do NOT silently strip them on import. - Director's commentary, behind-the-scenes, song-only subs. - Subs that cover only a partial runtime (the partial-cover heuristic isn't scripted yet; spot-check duration vs episode runtime if a srt looks short). - "All-episodes-in-one" mega-packs treated as a single episode's sidecar. ## OCR bitmap → text (optional, tier-2 augmentation) Embedded PGS/VobSub/`dvd_subtitle` are acceptable as-is (tier 2). OCR becomes worthwhile when: (a) client repeatedly transcodes due to bitmap burn-in (CPU pressure on nullstone — no GPU transcode available), (b) user wants to restyle font/size on a specific show, (c) mobile client renders bitmap subs poorly. Recipe (`pgsrip`, batch-friendly, Tesseract-backed): ```bash pip install pgsrip # PGS: extract embedded to .sup ffmpeg -i input.mkv -map 0:s:0 -c copy subs.sup pgsrip --language eng subs.sup # -> subs.srt # VobSub/dvd_subtitle: extract to .idx + .sub mkvextract input.mkv tracks 2:subs.idx pgsrip subs.idx # -> subs.srt ``` OCR accuracy ~90–95 % raw, ~95–98 % after Subtitle Edit cleanup. Source words are correct (it's transcription of original render, not Whisper hallucination) — only font recognition fights you. Resulting `.srt` ships as sidecar **alongside** the embedded bitmap stream, not as a replacement. ## How the UI presents subs The detail-page subtitle dropdown is shimmed via `web-overrides/index.html` (markers `SUB-LABEL-SHIM-BEGIN/END`). Stock Jellyfin shows e.g. `English - SUBRIP - External - Default`; the shim collapses to `English`, with `(Forced)` / `(SDH)` / `(Hearing Impaired)` suffixes only when those flags actually apply. `Default` is dropped — it's redundant when there's only one stream per language. Revert: `bin/revert-sub-label-shim.sh`. ## Why these rules - Boutique-release-group quality bar from [`README.md`](../../README.md): "every show and film is the best version I could put together." - **Original-release subs > pretty format.** The DVD/BD/streamer master is the canonical script — bitmap or text, those are the words the studio shipped. An OCR'd or AI-rebuilt sidecar is a derivative that introduces error (font confusion, mistranscription); the original doesn't. Especially true for older shows (Futurama S1–S3, Archer early seasons) where the master is the only authoritative source and upscale artifacts already dominate the visual experience — bitmap subs match the source vibe. - One-language library = one stream per ep = no need to expose codec or source in UI. - SDH/CC adds `[door slams]`, `[music]` etc. — distracting on first watch and not what someone reaches for unless they specifically need it. - Machine / AI translations are inconsistent and often wrong on slang or show-specific terms (esp. animated comedies). - Frame-rate-matched subs sync without manual offset on first try; mismatch generally still works on NTSC (29.97 vs 23.976 are the same elapsed time) but hash-match or fps-match removes that gamble.