This repository has been archived on 2026-05-20. You can view files and clone it, but cannot push or open issues or pull requests.
media-acquisition/docs/architecture.md
obsidian-ai d300d83ce1 init: media-acquisition pipeline scaffold
Self-hosted BitTorrent + arr-stack + catalog-update pipeline targeting
nullstone (Debian 13). Replaces the legacy onyx -> rsync -> import
round-trip.

Contents:
- README.md          headline + ASCII architecture diagram + quickstart
- CLAUDE.md          project rules (mirrors beta-flix style)
- .gitignore         secrets dirs (.env, gluetun, qbt config, ssh keys)
- .gitleaksignore    allowlist nullstone LAN addr + Tailscale CGNAT
- docs/architecture.md   the plan in detail (gluetun + qbt + arr + catalog)
- docs/migration.md  onyx-qbt -> nullstone-qbt runbook (3 phases)
- docs/trackers.md   tracker schema + IP-pinning + ratio notes (user-curated)
- compose/docker-compose.yml  gluetun v3.40 + qbt 5.0.5 (netns=gluetun) +
                              sonarr/radarr/prowlarr (hotio) + betaflix-catalog
- compose/.env.example       documented env-var template (no secrets)
- compose/traefik/arr.yml    file-provider for qbt/sonarr/radarr/prowlarr
                             .s8n.ru subdomains, LAN+TS only via
                             trusted-only@file + authentik-forwardauth@file
- catalog/catalog.py         Flask service, ~340 LoC, /sonarr + /radarr +
                             /healthz; pulls beta-flix, inserts alphabetic
                             row into MEDIA-LIST.md, writes run log, commits
                             + pushes as obsidian-ai. Idempotent via
                             payload-hash cache.
- catalog/Dockerfile         python:3.12-slim + git + tini
- catalog/requirements.txt   flask + jinja2 + requests + gitpython + pyyaml (pinned)
- catalog/templates/*.j2     run log + catalog row Jinja templates
- catalog/README.md          service docs
- scripts/migrate-onyx.sh    phase-2 helper (rsync + .torrent ship, dry-run by default)
- scripts/add-tracker.sh     Prowlarr API helper
- scripts/killswitch-test.sh gluetun kill-switch verification (3 steps)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-20 01:15:43 +01:00


# Architecture — nullstone BitTorrent + Import Pipeline
Last reviewed: 2026-05-20 against live state of `user@192.168.0.100`.

**Goal:** kill the `download-on-onyx → rsync → import` round-trip. Land torrents
directly on nullstone, through VPN, hardlink into the canonical ARRFLIX library,
auto-update the catalog in `git.s8n.ru/s8n/beta-flix`.

---
## TL;DR Decisions
| Question | Decision |
|-------------------------|---------------------------------------------------------------------------------------------------------------------------------------|
| Client | `qbittorrentofficial/qbittorrent-nox:5.0.5` (single container, official build, slim) |
| VPN binding | **gluetun sidecar + `network_mode: service:gluetun`** (WireGuard, kill-switch built-in). Reuses Proton WG. Replaces `socks-pvpn` for BT only. |
| Folder layout | `/home/user/media/_downloads/{incomplete,complete}` (NOT scanned by JF) + hardlinks into existing `movies/tv/...` |
| Arr stack? | **Yes for Sonarr/Radarr/Prowlarr, NO for Bazarr/cross-seed (yet)**. Mature rename engine beats bespoke; manual selection still works. |
| FS atomic-import | XFS reflinks (`cp --reflink=always`): same block cost as hardlinks (data extents are shared), but each copy gets its own inode, so path/perms can diverge freely. The Sonarr/Radarr "Use Hardlinks" toggle also works. |
| Catalog auto-update | sidecar Python service (`betaflix-catalog`) on Sonarr/Radarr webhooks → patches `MEDIA-LIST.md` → git commit+push to Forgejo. |
| GPU | untouched: qbt doesn't need it; Jellyfin keeps its existing passthrough (CPU-only post-driver-issue, separate concern). |

Override any of this if your gut disagrees, but record an ADR under
`docs/decisions/` first.
---
## Current State (verified live)
- `socks-pvpn` container: `serjs/go-socks5-proxy` on `socks-vpn` (172.31.0.0/24).
Already provides `socks5://socks-pvpn:1080` with `qbt` user. Egress via host's
`wg-pvpn-A` / `wg-pvpn-B` (policy-routed by fwmark `0x51820` / `0x51821`).
Proton WG is **host-side**, not in a container.
- `jellyfin-stock`: mounts `/home/user/media → /media` bind, in `proxy` network.
- xfs at `/dev/sda1 → /home/user/media`, 5.5T total / 3.9T free. Reflink-capable.
- No existing Sonarr / Radarr / Prowlarr / gluetun / qbt containers.
- Traefik on networks `proxy`, `socket-proxy-net`, `misskey-frontend`.
Two viable VPN strategies: keep using `socks-pvpn` SOCKS5 with qbt proxy, or
drop in a dedicated `gluetun` for the BT stack only. See § b.
---
## a) qBittorrent image
**Pick:** `qbittorrentofficial/qbittorrent-nox:5.0.5`.
- Official upstream build, signed, no LSIO PUID/PGID overhead.
- 5.0.x ships native WebUI v2 and modern logging.
- Two ports: 8080 (WebUI) plus a chosen BT listen port (e.g. 51820+random).
- Run as uid `1000:1000` (matches `user:user` on host) so anything qbt writes
to `/home/user/media/_downloads` already matches library ownership.
**Skip** `linuxserver/qbittorrent` — extra init scripts, slower updates, PUID
drift when paired with userns-remap.
**Skip** `qbittorrent-nox` bare on host — Docker buys VPN-namespace binding +
restart isolation. Cheap.
---
## b) VPN binding — pick `gluetun`
Three patterns considered:
| Pattern | Pro | Con |
|-------------------------------------------|---------------------------------------------|---------------------------------------------------------------------------------------------------------------------------|
| qbt SOCKS5 → `socks-pvpn` | Zero new infra | qbt SOCKS support has historical leaks (UDP, trackers, DHT). Not a kill-switch — if SOCKS dies, qbt uses default route → clear-net leak |
| WireGuard inside qbt container | Tight blast radius | Bake wg into image; restart re-attach pain; upgrades painful |
| **`gluetun` sidecar, qbt joins its netns**| Mature kill-switch (iptables-enforced), port-forward helper, qbt unchanged | Adds one container; eats a Proton WG slot |

**Decision: gluetun.** It's a kill-switch by design: if WG drops, gluetun's
firewall blackholes traffic. It is a common choice in self-hosted torrent
setups for that reason. Add a third Proton WG endpoint specifically for it (so
it doesn't collide with the existing wg-pvpn-A/B host-level policy routes).
Keep `socks-pvpn` running for other clients.
### Kill-switch verification
```bash
docker exec qbittorrent curl -sf --max-time 5 https://api.ipify.org # should return Proton exit IP
docker stop gluetun && docker exec qbittorrent curl -sf --max-time 5 https://api.ipify.org # MUST hang/fail
docker start gluetun
```
If the second command succeeds, you have a leak — **do not proceed**.
The wrapper script is at `scripts/killswitch-test.sh`.

---
## c) Folder layout (xfs, single device)
```
/home/user/media/
├── _downloads/ # NOT in any JF library → JF can't see it
│ ├── incomplete/ # qbt's "temp path" — half-written files
│ ├── complete/ # qbt's "save path" — completed, still seeding
│ └── watch/ # drop .torrent files here for auto-add
├── movies/ # canonical, JF scans (existing)
├── tv/ # canonical, JF scans (existing)
├── education/ # YouTube creator, JF scans (existing)
├── music/ # (existing)
└── podcasts/ # (existing)
```
JF library paths are configured in the dashboard and only point at the
canonical roots. `_downloads/` is on the same xfs filesystem → hardlinks /
reflinks are free (zero extra blocks consumed) when sonarr/radarr import.
Sonarr/Radarr setting: **Use Hardlinks instead of Copy = yes**.
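Why the hardlink import costs no extra blocks can be checked with a few lines of Python. This is only an illustrative sketch, not what Sonarr/Radarr run internally; the helper names `same_filesystem` and `hardlink_import` are hypothetical.

```python
import os


def same_filesystem(path_a: str, path_b: str) -> bool:
    """True when both paths live on the same device, a precondition for hardlinks."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def hardlink_import(src: str, dst: str) -> None:
    """Hardlink src to dst, creating dst's parent directory if needed."""
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    # os.link raises OSError (EXDEV) if src and dst are on different filesystems.
    os.link(src, dst)
```

After `hardlink_import(...)`, `os.stat(src).st_ino == os.stat(dst).st_ino` holds and `st_nlink` is 2: one inode, two names, one copy of the data. Deleting the file under `_downloads/complete/` later leaves the library copy intact.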
Permissions: qbt runs as `1000:1000`, files land 644/dirs 755 (image sets
`umask 022`). If defaults drift, force with `UMASK=022` in container env
(qbt 5 honors it).

---
## d, f) Arr stack vs custom watcher — pick Arr
For 20-40 items in pipeline the bespoke watcher is *tempting*. Pick
Sonarr/Radarr anyway:
**For Sonarr/Radarr (mature rename + import):**
- Their rename engine handles 100+ edge cases your bash will eventually trip:
multi-episode files, anime absolute numbering, special seasons,
daily-broadcast dates, year-disambiguated titles. You will hit these.
- "Interactive Search" gives manual selection — not forced into RSS auto-grab.
- Hardlink-on-import is a checkbox, not a function to debug.
- Webhook on import → ready-made trigger for catalog-update.
- Library "Scan after import" is built-in. Skip the cargo-cult JF scan task ID
dance (keep as manual escape hatch).
**For Prowlarr:**
- One-place indexer config. Even if you only use 3 trackers, having them
managed in Prowlarr and pushed to Sonarr+Radarr is less duplication.
- Categories + capabilities matter when manual-search returns results — you
want season-pack vs single-episode discrimination on the search UI.
**Against (kept honest):**
- Five extra containers (gluetun, qbt, sonarr, radarr, prowlarr). ~600 MB RAM
combined idle. nullstone has 31 G; rounding error.
- Sonarr database in SQLite — back up in `./backup.sh`.
- More UI surface. Two evenings.
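For the SQLite backup point in the list above, one option is Python's built-in online backup API, which takes a consistent snapshot without stopping the container. This is a sketch under assumptions: `backup_sqlite` is a hypothetical helper, and the database paths are whatever the arr containers actually use on disk.

```python
import sqlite3


def backup_sqlite(src_path: str, dst_path: str) -> None:
    """Copy a SQLite database to dst_path using the online backup API."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        # Connection.backup copies page by page and yields a consistent snapshot.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
```

A backup script could call this once per arr database before archiving the config directories.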
**Hard NO for now:**
- **Bazarr** — subtitle pipeline is the WhisperX v4 build, not OpenSubtitles.
- **cross-seed** — only useful when seriously seeding to ratio. Defer.
- **Lidarr / Readarr** — out of scope (music + books not in this pipeline).
If after 2 weeks Sonarr's metadata picker is fighting you, **then** swap to
bespoke — files on disk are the same shape either way.
---
## e) Catalog-update service (mandatory regardless)
Even with Sonarr/Radarr, neither tool knows about
`/home/admin/projects/beta-flix/playbooks/import-media/MEDIA-LIST.md`. So:
`betaflix-catalog` (Python 3.12, Flask, ~200 LoC, in `catalog/`). Listens for
Sonarr/Radarr **"On Import"** webhooks. For each event:
1. Pull metadata from webhook payload (`series.title`, `series.year`,
`episodeFile.path`, or `movie.title` + `movie.year` + `movieFile.path`).
2. `git -C /repo pull --rebase origin main`.
3. Edit `playbooks/import-media/MEDIA-LIST.md`:
- Movies: insert into Movies table, alphabetic on title.
- TV: if series row exists, merge seasons into the `Seasons` column;
else insert new row.
- "Source / Version" column = parsed from filename release-group tokens
**before** Sonarr stripped them. The webhook gives `sourceTitle`
(original release name) — log it raw, you can edit later.
- "Why on arrflix" column stays blank — that's human-authored.
4. Write run log to `playbooks/import-media/runs/<slug>.md` using a Jinja
template (date, source path, target path, item count, ffprobe summary
from a `docker exec jellyfin-stock ffprobe` call — optional, deferred).
5. `git commit -m "catalog: add <title> (<year>)" --author "obsidian-ai <obsidian-ai@s8n.ru>"`.
6. `git push origin main`. Forgejo deploy key in `compose/catalog/ssh/`
(gitignored — placed by operator at deploy time).
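Steps 1 and 3 for a movie can be sketched as two small pure functions. This is not the code in `catalog/catalog.py`. The column order (`Title | Year | Source / Version | Why on arrflix`) and a top-level `sourceTitle` key are assumptions made for illustration, and the helper names are hypothetical.

```python
from typing import Dict, List


def movie_row(payload: Dict) -> str:
    """Build a table row from a Radarr payload (column order is an assumption)."""
    movie = payload["movie"]
    # Raw release name, logged as-is; assumed to sit at the top level of the payload.
    source = payload.get("sourceTitle", "")
    # Last cell ("Why on arrflix") stays blank: it is human-authored.
    return f"| {movie['title']} | {movie['year']} | {source} |  |"


def sort_key(row: str) -> str:
    """Case-insensitive title taken from the first cell of a table row."""
    return row.strip().strip("|").split("|")[0].strip().casefold()


def insert_row(rows: List[str], new_row: str) -> List[str]:
    """Insert new_row alphabetically by title; leave rows unchanged if the title exists."""
    key = sort_key(new_row)
    out = list(rows)
    for i, row in enumerate(out):
        if sort_key(row) == key:
            return out
        if sort_key(row) > key:
            out.insert(i, new_row)
            return out
    out.append(new_row)
    return out
```

`insert_row` expects only the body rows of the Movies table (no header or separator row). A title containing `|` would need escaping first.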
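Webhook config in Sonarr: Settings → Connect → Webhook → POST to
`http://host.docker.internal:5055/sonarr` on `OnImport` event only. On Linux,
`host.docker.internal` resolves only if the Sonarr container maps it (for
example `extra_hosts: host.docker.internal:host-gateway`). Radarr gets the same
webhook pointed at `/radarr`.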
Idempotency: hash the payload (`{series_id}:{season}:{episode}`); skip if
seen in the last hour (Sonarr retries on transient failure). Cache lives at
`/tmp/seen-imports.json` (ephemeral; that's fine — duplicate commits are
benign-but-noisy, not destructive).
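The one-hour dedup window described above might look like the following. It is a minimal sketch, not the shipped implementation: `import_key` and `seen_recently` are hypothetical names, and the cache path is passed in so the function can be pointed at `/tmp/seen-imports.json` or a test file.

```python
import hashlib
import json
import time

TTL_SECONDS = 3600  # "seen in the last hour"


def import_key(series_id, season, episode) -> str:
    """Hash of the `{series_id}:{season}:{episode}` identity string."""
    raw = f"{series_id}:{season}:{episode}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def seen_recently(key: str, cache_path: str, now: float = None) -> bool:
    """Return True if key was recorded within the TTL; otherwise record it and return False."""
    now = time.time() if now is None else now
    try:
        with open(cache_path, encoding="utf-8") as fh:
            seen = json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        seen = {}  # missing or corrupt cache is treated as empty
    # Drop expired entries so the file does not grow without bound.
    seen = {k: ts for k, ts in seen.items() if now - ts < TTL_SECONDS}
    if key in seen:
        return True
    seen[key] = now
    with open(cache_path, "w", encoding="utf-8") as fh:
        json.dump(seen, fh)
    return False
```

A webhook handler would call `seen_recently(import_key(...), "/tmp/seen-imports.json")` first and return early on `True`.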
Skeleton lives at `catalog/catalog.py` in this repo. ~30 minutes to draft,
~2 hours to harden. The piece that bridges "files on nullstone" to "facts
in Forgejo".
---
## g) Migration from onyx-qbt → nullstone-qbt
State: 60+ active torrents on onyx, with download dirs on onyx local disk.
**Goal:** keep seeding (don't burn ratios) while shifting future downloads to
nullstone. Two-phase, no big-bang.
Full runbook: `docs/migration.md` (and the script `scripts/migrate-onyx.sh`).

---
## What this doesn't solve (be aware)
- **Tracker IP allowlists.** Some private trackers pin sessions to a single
IP. Switching from onyx public IP → Proton exit IP will trip them. Check
each tracker's rules before migrating — you may need an IP-update request
per private tracker. See `docs/trackers.md`.
- **Port forwarding via Proton.** gluetun's `VPN_PORT_FORWARDING=on` handles
this for Proton, but the forwarded port rotates, so qbt's listen port has to
follow it. gluetun writes the current port to `/tmp/gluetun/forwarded_port`
(it is also available from the gluetun control server); a wrapper script needs
to read it on start and apply it to qbt's `qBittorrent.conf`. Known helper image:
`caillef/qbittorrent-port-sync`; drop it in as a 6th container if seeding
ratio matters. Deferred until tracker ratio becomes a real concern.
- **Backup.** Add `/opt/docker/media-acquisition/compose/{sonarr,radarr,prowlarr,qbittorrent}/config`
to nullstone's `/opt/docker/backup.sh`. SQLite DBs — stop containers
briefly or use `sqlite3 .backup` semantics.
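The port-forwarding item above could be handled by a small sync helper along these lines. Note the swap: instead of rewriting `qBittorrent.conf` on start, this sketch builds the form body for qBittorrent's WebUI endpoint `POST /api/v2/app/setPreferences`. Login/session handling and the HTTP call itself are left out, and both function names are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode

PORT_FILE = "/tmp/gluetun/forwarded_port"  # path as given in the text above


def read_forwarded_port(path: str = PORT_FILE) -> Optional[int]:
    """Return the forwarded port, or None if the file is missing or invalid."""
    try:
        with open(path, encoding="utf-8") as fh:
            port = int(fh.read().strip())
    except (OSError, ValueError):
        return None
    return port if 1 <= port <= 65535 else None


def set_port_body(port: int) -> bytes:
    """URL-encoded form body carrying the new listen port as a JSON preference."""
    return urlencode({"json": json.dumps({"listen_port": port})}).encode("ascii")
```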
---
## Open decisions to confirm before implementing
1. Proton plan slot count — gluetun needs its own WG key. Free slot?
2. Which private trackers do you actually use? IP-pinning check.
3. Public hostnames for the arr-stack: confirm `qbt/sonarr/radarr/prowlarr.s8n.ru`
or pick a sub-zone (`arr.s8n.ru/qbt/`).
4. Authentik group for arr-stack access (LAN-only? or also from gravel via
Tailscale?).
5. Forgejo deploy key — generate now or reuse `obsidian-ai`'s existing key?
Answer those five and the implementation is ~1 evening of compose + ~2 hours
on the catalog service. Migration is a separate weekend.