Archived

This repository has been archived on 2026-05-20. You can view files and clone it, but cannot push or open issues or pull requests.

obsidian-ai d300d83ce1 init: media-acquisition pipeline scaffold

Self-hosted BitTorrent + arr-stack + catalog-update pipeline targeting
nullstone (Debian 13). Replaces the legacy onyx -> rsync -> import
round-trip.

Contents:
- README.md          headline + ASCII architecture diagram + quickstart
- CLAUDE.md          project rules (mirrors beta-flix style)
- .gitignore         secrets dirs (.env, gluetun, qbt config, ssh keys)
- .gitleaksignore    allowlist nullstone LAN addr + Tailscale CGNAT
- docs/architecture.md   the plan in detail (gluetun + qbt + arr + catalog)
- docs/migration.md  onyx-qbt -> nullstone-qbt runbook (3 phases)
- docs/trackers.md   tracker schema + IP-pinning + ratio notes (user-curated)
- compose/docker-compose.yml  gluetun v3.40 + qbt 5.0.5 (netns=gluetun) +
                              sonarr/radarr/prowlarr (hotio) + betaflix-catalog
- compose/.env.example       documented env-var template (no secrets)
- compose/traefik/arr.yml    file-provider for qbt/sonarr/radarr/prowlarr
                             .s8n.ru subdomains, LAN+TS only via
                             trusted-only@file + authentik-forwardauth@file
- catalog/catalog.py         Flask service, ~340 LoC, /sonarr + /radarr +
                             /healthz; pulls beta-flix, inserts alphabetic
                             row into MEDIA-LIST.md, writes run log, commits
                             + pushes as obsidian-ai. Idempotent via
                             payload-hash cache.
- catalog/Dockerfile         python:3.12-slim + git + tini
- catalog/requirements.txt   flask + jinja2 + requests + gitpython + pyyaml (pinned)
- catalog/templates/*.j2     run log + catalog row Jinja templates
- catalog/README.md          service docs
- scripts/migrate-onyx.sh    phase-2 helper (rsync + .torrent ship, dry-run by default)
- scripts/add-tracker.sh     Prowlarr API helper
- scripts/killswitch-test.sh gluetun kill-switch verification (3 steps)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-20 01:15:43 +01:00

12 KiB


Architecture — nullstone BitTorrent + Import Pipeline

Last reviewed: 2026-05-20 against live state of user@192.168.0.100.

Goal: kill the download-on-onyx → rsync → import round-trip. Land torrents directly on nullstone, through VPN, hardlink into the canonical ARRFLIX library, auto-update the catalog in git.s8n.ru/s8n/beta-flix.

TL;DR Decisions

| Question | Decision |
|---|---|
| Client | `qbittorrentofficial/qbittorrent-nox:5.0.5` (single container, official build, slim) |
| VPN binding | gluetun sidecar + `network_mode: service:gluetun` (WireGuard, kill-switch built in). Reuses Proton WG. Replaces `socks-pvpn` for BT only. |
| Folder layout | `/home/user/media/_downloads/{incomplete,complete}` (NOT scanned by JF) + hardlinks into existing `movies/tv/...` |
| Arr stack? | Yes for Sonarr/Radarr/Prowlarr, NO for Bazarr/cross-seed (yet). Mature rename engine beats bespoke; manual selection still works. |
| FS atomic-import | XFS reflinks (`cp --reflink=always`): like hardlinks, they cost no extra data blocks, but each copy gets its own inode, so path and perms can diverge freely. The "Use Hardlinks" toggle also works on this filesystem. |
| Catalog auto-update | sidecar Python service (`betaflix-catalog`) on Sonarr/Radarr webhooks → patches `MEDIA-LIST.md` → git commit+push to Forgejo. |
| GPU | untouched: qbt doesn't need it; Jellyfin keeps its existing passthrough (currently CPU-only after the driver issue; separate concern). |
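Here is a minimal sketch of reflink-first placement on XFS. It assumes Linux and Python's `fcntl` module. `FICLONE` (0x40049409) is the Linux ioctl that `cp --reflink=always` issues. The hardlink fallback and the helper name `reflink_or_hardlink` are illustrative only. They do not describe how Sonarr/Radarr import internally.

```python
import errno
import fcntl
import os

# Linux FICLONE ioctl number (_IOW(0x94, 9, int)), as used by `cp --reflink=always`.
FICLONE = 0x40049409

# errnos meaning "reflink not possible here" (wrong fs, sandbox), not a real failure.
NO_REFLINK = {errno.EOPNOTSUPP, errno.ENOTTY, errno.EXDEV, errno.EINVAL, errno.ENOSYS}


def reflink_or_hardlink(src: str, dst: str) -> str:
    """Clone src to dst with a reflink; fall back to a hardlink if unsupported.

    Returns "reflink" or "hardlink" so the caller can log which path was taken.
    """
    with open(src, "rb") as s, open(dst, "xb") as d:  # "x": never clobber dst
        try:
            fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
            cloned = True
        except OSError as exc:
            if exc.errno not in NO_REFLINK:
                os.remove(dst)  # don't leave an empty placeholder behind
                raise
            cloned = False
    if cloned:
        return "reflink"  # new inode, data extents shared with src
    os.remove(dst)  # discard the empty placeholder created above
    os.link(src, dst)  # same inode as src, link count +1
    return "hardlink"
```

On a reflink-capable XFS volume this returns "reflink". On ext4 or tmpfs it quietly falls back to a hardlink, which still only works within one filesystem.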

Override any of this if your gut disagrees — but record an ADR under docs/decisions/ first.

Current State (verified live)

socks-pvpn container: serjs/go-socks5-proxy on the socks-vpn network (172.31.0.0/24). Already provides socks5://socks-pvpn:1080 with a `qbt` user. Egress via the host's wg-pvpn-A / wg-pvpn-B (policy-routed by fwmark 0x51820 / 0x51821). Proton WG is host-side, not in a container.
jellyfin-stock: bind-mounts /home/user/media at /media; attached to the proxy network.
xfs at /dev/sda1 → /home/user/media, 5.5T total / 3.9T free. Reflink-capable.
No existing Sonarr / Radarr / Prowlarr / gluetun / qbt containers.
Traefik on networks proxy, socket-proxy-net, misskey-frontend.

Two viable VPN strategies: keep using socks-pvpn SOCKS5 with qbt proxy, or drop in a dedicated gluetun for the BT stack only. See § b.

a) qBittorrent image

Pick: qbittorrentofficial/qbittorrent-nox:5.0.5.

Official upstream build, signed, no LSIO PUID/PGID overhead.
5.0.x ships native WebUI v2 and modern logging.
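Two ports: 8080 (WebUI) plus one BitTorrent listen port. Avoid 51820, the WireGuard default. Behind Proton port forwarding, the listen port has to follow the forwarded port anyway (see "Port forwarding via Proton" below).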
Run as uid 1000:1000 (matches user:user on host) so anything qbt writes to /home/user/media/_downloads already matches library ownership.

Skip linuxserver/qbittorrent — extra init scripts, slower updates, PUID drift when paired with userns-remap.

Skip qbittorrent-nox bare on host — Docker buys VPN-namespace binding + restart isolation. Cheap.

b) VPN binding — pick `gluetun`

Three patterns considered:

| Pattern | Pro | Con |
|---|---|---|
| qbt SOCKS5 → `socks-pvpn` | Zero new infra | qbt SOCKS support has historical leaks (UDP, trackers, DHT). Not a kill-switch: if SOCKS dies, qbt uses the default route → clear-net leak |
| WireGuard inside qbt container | Tight blast radius | Bake wg into image; restart re-attach pain; upgrades painful |
| `gluetun` sidecar, qbt joins its netns | Mature kill-switch (iptables-enforced), port-forward helper, qbt unchanged | Adds one container; eats a Proton WG slot |

Decision: gluetun. It's a kill-switch by design: if WG drops, gluetun's firewall blackholes traffic. That is why it is a common pick for torrent setups on /r/selfhosted. Add a third Proton WG endpoint specifically for it, so it doesn't collide with the existing wg-pvpn-A/B host-level policy routes.

Keep socks-pvpn running for other clients.

Kill-switch verification

docker exec qbittorrent curl -sf --max-time 5 https://api.ipify.org   # should return Proton exit IP
docker stop gluetun && docker exec qbittorrent curl -sf --max-time 5 https://api.ipify.org   # MUST fail (times out after 5 s)
docker start gluetun && docker restart qbittorrent   # qbt must rejoin gluetun's new netns

If the second command succeeds, you have a leak — do not proceed.

The wrapper script is at scripts/killswitch-test.sh.
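The same three steps can be scripted. This is a minimal Python sketch under three assumptions:

- the container is named `qbittorrent`;
- its image has `curl`, as the commands above already assume;
- you know your non-VPN public IP (`home_ip`).

`probe_egress` and `killswitch_verdict` are hypothetical names. They are not the contents of scripts/killswitch-test.sh.

```python
import subprocess
from typing import Optional

# Same probe as the manual commands: fail fast if there is no route out.
PROBE = ["curl", "-sf", "--max-time", "5", "https://api.ipify.org"]


def probe_egress(container: str = "qbittorrent") -> Optional[str]:
    """Return the public IP seen from inside the container, or None on failure."""
    try:
        result = subprocess.run(
            ["docker", "exec", container, *PROBE],
            capture_output=True, text=True, timeout=30,
        )
    except subprocess.TimeoutExpired:
        return None  # a hang counts as "no egress"
    ip = result.stdout.strip()
    return ip if result.returncode == 0 and ip else None


def killswitch_verdict(ip_up: Optional[str], ip_down: Optional[str], home_ip: str) -> str:
    """Classify the probe taken with gluetun up and the one with gluetun stopped."""
    if ip_up is None:
        return "broken: no egress even with the tunnel up"
    if ip_up == home_ip:
        return "LEAK: tunnel up but traffic exits via the home IP"
    if ip_down is not None:
        return "LEAK: egress still works with gluetun stopped"
    return "ok"
```

To use it:

1. Call `probe_egress()`.
2. Run `docker stop gluetun` and probe again.
3. Pass both results to `killswitch_verdict()`.
4. Start gluetun and restart qbittorrent.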

c) Folder layout (xfs, single device)

/home/user/media/
├── _downloads/            # NOT in any JF library → JF can't see it
│   ├── incomplete/        # qbt's "temp path" — half-written files
│   ├── complete/          # qbt's "save path" — completed, still seeding
│   └── watch/             # drop .torrent files here for auto-add
├── movies/                # canonical, JF scans (existing)
├── tv/                    # canonical, JF scans (existing)
├── education/             # YouTube creator, JF scans (existing)
├── music/                 # (existing)
└── podcasts/              # (existing)

JF library paths are configured in the Jellyfin dashboard and point only at the canonical roots, never at _downloads/. _downloads/ is on the same XFS filesystem, so hardlinks / reflinks are free (zero extra data blocks consumed) when Sonarr/Radarr import.

Sonarr/Radarr setting: Use Hardlinks instead of Copy = yes.
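To confirm the toggle took effect after an import, compare inodes. A hardlinked import shares `st_dev`/`st_ino` with the seeding copy in `_downloads/complete/`, and its link count is above 1. Here is a minimal sketch; `is_hardlinked` is a hypothetical helper.

```python
import os


def is_hardlinked(seeding_path: str, library_path: str) -> bool:
    """True if both paths are the same inode on the same filesystem."""
    a = os.stat(seeding_path)
    b = os.stat(library_path)
    same_inode = (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
    return same_inode and a.st_nlink > 1
```

A reflinked copy fails this check by design. It is a separate inode that only shares data extents with the original.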

Permissions: qbt runs as 1000:1000, files land 644/dirs 755 (image sets umask 022). If defaults drift, force with UMASK=022 in container env (qbt 5 honors it).
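The 644/755 figures come from clearing the umask bits out of the default request modes: 0o666 for files, 0o777 for directories. Below is a short check of that arithmetic and of what the kernel actually applies under the same umask. It assumes no default POSIX ACLs on the temp directory, since those would override the umask.

```python
import os
import stat
import tempfile

UMASK = 0o022

# Pure arithmetic: the requested mode with the umask bits cleared.
assert (0o666 & ~UMASK) == 0o644  # files are requested as 0o666
assert (0o777 & ~UMASK) == 0o755  # directories are requested as 0o777

# What the kernel actually applies under that umask.
previous = os.umask(UMASK)  # os.umask returns the old mask
try:
    root = tempfile.mkdtemp()
    file_path = os.path.join(root, "episode.mkv")
    dir_path = os.path.join(root, "Season 01")
    with open(file_path, "w"):  # open() creates with 0o666 before the umask
        pass
    os.mkdir(dir_path)  # mkdir() defaults to 0o777 before the umask
    file_mode = stat.S_IMODE(os.stat(file_path).st_mode)
    dir_mode = stat.S_IMODE(os.stat(dir_path).st_mode)
finally:
    os.umask(previous)  # restore the caller's umask

print(oct(file_mode), oct(dir_mode))
```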

d, f) Arr stack vs custom watcher — pick Arr

With only 20-40 items in the pipeline, a bespoke watcher is tempting. Pick Sonarr/Radarr anyway:

For Sonarr/Radarr (mature rename + import):

Their rename engine handles 100+ edge cases your bash will eventually trip over: multi-episode files, anime absolute numbering, special seasons, daily-broadcast dates, year-disambiguated titles. You will hit these.
"Interactive Search" gives manual selection — not forced into RSS auto-grab.
Hardlink-on-import is a checkbox, not a function to debug.
Webhook on import → ready-made trigger for catalog-update.
Library "Scan after import" is built-in. Skip the cargo-cult JF scan task ID dance (keep as manual escape hatch).

For Prowlarr:

One-place indexer config. Even if you only use 3 trackers, having them managed in Prowlarr and pushed to Sonarr+Radarr is less duplication.
Categories + capabilities matter when manual-search returns results — you want season-pack vs single-episode discrimination on the search UI.
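Prowlarr is the single source of indexer config, so a quick audit only needs its API. Here is a minimal sketch against Prowlarr's v1 REST API (`GET /api/v1/indexer`, authenticated with the `X-Api-Key` header). A few things are assumptions:

- the base URL (Prowlarr's default port 9696 on the compose network);
- the `name`/`enable` field names;
- the helper names, which do not come from scripts/add-tracker.sh.

```python
import json
import urllib.request

PROWLARR_URL = "http://prowlarr:9696"  # placeholder in-network address


def indexer_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build the GET request for Prowlarr's configured indexers."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/indexer",
        headers={"X-Api-Key": api_key, "Accept": "application/json"},
    )


def list_indexers(base_url: str, api_key: str) -> list:
    """Return (name, enabled) pairs for every indexer Prowlarr manages."""
    with urllib.request.urlopen(indexer_request(base_url, api_key), timeout=10) as resp:
        indexers = json.load(resp)
    # "name" and "enable" are assumed field names on Prowlarr's indexer resource.
    return [(ix.get("name"), ix.get("enable")) for ix in indexers]
```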

Against (kept honest):

Five extra containers (gluetun, qbt, sonarr, radarr, prowlarr). ~600 MB RAM combined idle. nullstone has 31 G of RAM; rounding error.
Sonarr, Radarr and Prowlarr keep their state in SQLite; add them to /opt/docker/backup.sh.
More UI surface to learn. Budget two evenings.
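For those SQLite files, SQLite's online backup API takes a consistent copy while the app keeps running. Here is a minimal sketch using Python's `sqlite3.Connection.backup`. `backup_db` and the example paths are assumptions, so check the actual DB filename in each config dir.

```python
import sqlite3


def backup_db(src_path: str, dst_path: str) -> None:
    """Copy a (possibly live) SQLite database using SQLite's online backup API."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        # pages=-1 (the default) copies every page in a single step.
        src.backup(dst)
    finally:
        dst.close()
        src.close()


# Example (hypothetical paths): backup_db("config/sonarr.db", "/backup/sonarr.db")
```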

Hard NO for now:

Bazarr — subtitle pipeline is the WhisperX v4 build, not OpenSubtitles.
cross-seed — only useful when seriously seeding to ratio. Defer.
Lidarr / Readarr — out of scope (music + books not in this pipeline).

If after 2 weeks Sonarr's metadata picker is fighting you, then swap to bespoke — files on disk are the same shape either way.

e) Catalog-update service (mandatory regardless)

Even with Sonarr/Radarr, neither tool knows about /home/admin/projects/beta-flix/playbooks/import-media/MEDIA-LIST.md. So:

betaflix-catalog (Python 3.12, Flask, ~200 LoC, in catalog/). Listens for Sonarr/Radarr "On Import" webhooks. For each event:

Pull metadata from webhook payload (series.title, series.year, episodeFile.path, or movie.title + movie.year + movieFile.path).
git -C /repo pull --rebase origin main.
Edit playbooks/import-media/MEDIA-LIST.md:
- Movies: insert into Movies table, alphabetic on title.
- TV: if series row exists, merge seasons into the Seasons column; else insert new row.
- "Source / Version" column = parsed from filename release-group tokens before Sonarr stripped them. The webhook gives sourceTitle (original release name) — log it raw, you can edit later.
- "Why on arrflix" column stays blank — that's human-authored.
Write run log to playbooks/import-media/runs/<slug>.md using a Jinja template (date, source path, target path, item count, ffprobe summary from a docker exec jellyfin-stock ffprobe call — optional, deferred).
git commit -m "catalog: add <title> (<year>)" --author "obsidian-ai <obsidian-ai@s8n.ru>".
git push origin main. Forgejo deploy key in compose/catalog/ssh/ (gitignored — placed by operator at deploy time).
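Here is a sketch of step 3's alphabetic Movies insert. It assumes a MEDIA-LIST.md layout with a `## Movies` heading followed by a pipe table whose first column is the title, and rows already in order. `insert_sorted_row` is hypothetical and is not the code in catalog/catalog.py.

```python
import bisect


def first_cell(row: str) -> str:
    """Return the first cell of a markdown table row, e.g. '| Heat | 1995 |' -> 'Heat'."""
    return row.strip().strip("|").split("|")[0].strip()


def insert_sorted_row(lines: list[str], section: str, new_row: str) -> list[str]:
    """Insert new_row into the pipe table under '## <section>', keeping rows
    alphabetic (case-insensitive) on the first column. Returns a new list."""
    start = lines.index(f"## {section}")  # ValueError if the section is missing
    # First table line after the heading (StopIteration if there is no table).
    header = next(i for i in range(start + 1, len(lines)) if lines[i].startswith("|"))
    body_start = header + 2  # skip the header row and the |---| separator
    body_end = body_start
    while body_end < len(lines) and lines[body_end].startswith("|"):
        body_end += 1
    body = lines[body_start:body_end]
    if new_row in body:
        return list(lines)  # already catalogued: a no-op keeps re-runs harmless
    keys = [first_cell(r).casefold() for r in body]
    pos = body_start + bisect.bisect_left(keys, first_cell(new_row).casefold())
    return lines[:pos] + [new_row] + lines[pos:]
```

Sorting is plain case-insensitive. Leading articles ("The", "A") are not special-cased.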

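Webhook config in Sonarr: Settings → Connect → Webhook → POST to http://betaflix-catalog:5055/sonarr on the On Import event only. Radarr is the same, with /radarr. Use the compose service name over a shared Docker network; services in one compose file share a default network unless configured otherwise. http://host.docker.internal:5055 only resolves on Linux if the Sonarr container sets `extra_hosts: ["host.docker.internal:host-gateway"]` and the catalog port is published on the host.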
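Whatever address the webhook uses, the handler's first job is step 1: pulling a handful of fields out of the JSON body. Here is a stdlib-only sketch of that reduction, where `source` is the /sonarr or /radarr path. `summarize` is hypothetical. The `episodes[].seasonNumber` key is an assumption about Sonarr's payload beyond the fields step 1 names, so verify it against a captured request.

```python
from typing import Any


def summarize(source: str, payload: dict[str, Any]) -> dict[str, Any]:
    """Reduce a Sonarr or Radarr import webhook body to what the catalog needs."""
    if source == "sonarr":
        series = payload["series"]
        episodes = payload.get("episodes", [])
        return {
            "kind": "tv",
            "title": series["title"],
            "year": series.get("year"),
            "path": payload.get("episodeFile", {}).get("path"),
            # Distinct season numbers touched by this import, for the Seasons column.
            "seasons": sorted({ep["seasonNumber"] for ep in episodes}),
        }
    if source == "radarr":
        movie = payload["movie"]
        return {
            "kind": "movie",
            "title": movie["title"],
            "year": movie.get("year"),
            "path": payload.get("movieFile", {}).get("path"),
        }
    raise ValueError(f"unknown webhook source: {source!r}")
```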

Idempotency: hash an import key ({series_id}:{season}:{episode}); skip if seen in the last hour (Sonarr retries on transient failure). Cache lives at /tmp/seen-imports.json. It is ephemeral, and that's fine: duplicate commits are benign-but-noisy, not destructive.
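Here is a minimal sketch of that cache, assuming a single worker process (two concurrent writers could race on the file). The key format, one-hour window and /tmp/seen-imports.json path come from the text above. `import_key`, `seen_recently` and the SHA-256 choice are illustrative. Radarr events would need their own key, for example the movie id, which is also an assumption.

```python
import hashlib
import json
import os
import time
from typing import Optional

SEEN_PATH = "/tmp/seen-imports.json"  # from the design; ephemeral on purpose
TTL_SECONDS = 3600  # "skip if seen in the last hour"


def import_key(series_id: int, season: int, episode: int) -> str:
    """Hash the {series_id}:{season}:{episode} tuple into a stable cache key."""
    return hashlib.sha256(f"{series_id}:{season}:{episode}".encode()).hexdigest()


def seen_recently(key: str, path: str = SEEN_PATH, now: Optional[float] = None) -> bool:
    """True if key was recorded within TTL_SECONDS; otherwise record it, return False."""
    now = time.time() if now is None else now
    try:
        with open(path) as fh:
            seen = json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        seen = {}  # missing or corrupt cache: start fresh (worst case, a noisy commit)
    # Drop expired entries so the file stays small.
    seen = {k: t for k, t in seen.items() if now - t < TTL_SECONDS}
    if key in seen:
        return True
    seen[key] = now
    tmp = f"{path}.tmp"
    with open(tmp, "w") as fh:
        json.dump(seen, fh)
    os.replace(tmp, path)  # atomic rename: readers never see half-written JSON
    return False
```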

Skeleton lives at catalog/catalog.py in this repo. ~30 minutes to draft, ~2 hours to harden. The piece that bridges "files on nullstone" to "facts in Forgejo".

g) Migration from onyx-qbt → nullstone-qbt

State: 60+ active torrents on onyx, with download dirs on onyx local disk.

Goal: keep seeding (don't burn ratios) while shifting future downloads to nullstone. Phased, no big-bang.

Full runbook: docs/migration.md (and the script scripts/migrate-onyx.sh).

What this doesn't solve (be aware)

Tracker IP allowlists. Some private trackers pin sessions to a single IP. Switching from onyx public IP → Proton exit IP will trip them. Check each tracker's rules before migrating — you may need an IP-update request per private tracker. See docs/trackers.md.
Port forwarding via Proton. gluetun's VPN_PORT_FORWARDING=on handles this for Proton, but the forwarded port rotates, and qbt has to follow it. gluetun reports the current port via its control server and writes it to /tmp/gluetun/forwarded_port. qBittorrent.conf is static, so a wrapper script must read the port and apply it on start. Known helper image: caillef/qbittorrent-port-sync, which can be dropped in as an extra container if seeding ratio matters. Deferred until tracker ratio becomes a real concern.
Backup. Add /opt/docker/media-acquisition/compose/{sonarr,radarr,prowlarr,qbittorrent}/config to nullstone's /opt/docker/backup.sh. SQLite DBs — stop containers briefly or use sqlite3 .backup semantics.
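The port-forwarding wrapper can stay small. Here is a minimal sketch under three assumptions:

- it runs inside gluetun's network namespace, so qbt's WebUI is at localhost:8080;
- /tmp/gluetun is shared with it as a volume;
- qbt's "Bypass authentication for clients on localhost" option is enabled.

It uses qBittorrent's WebUI API v2: `POST /api/v2/app/setPreferences` with a `json` form field carrying the `listen_port` preference. The helper names are hypothetical, and this is not the caillef/qbittorrent-port-sync image.

```python
import json
import urllib.parse
import urllib.request

PORT_FILE = "/tmp/gluetun/forwarded_port"  # path the design says gluetun writes
QBT_URL = "http://localhost:8080"  # qbt shares gluetun's netns, so localhost reaches it


def read_forwarded_port(path: str = PORT_FILE) -> int:
    """Parse the single port number gluetun writes to its status file."""
    with open(path) as fh:
        port = int(fh.read().strip())
    if not 1 <= port <= 65535:
        raise ValueError(f"implausible port: {port}")
    return port


def set_listen_port_request(base_url: str, port: int) -> urllib.request.Request:
    """Build the WebUI API call that changes qbt's BitTorrent listen port."""
    body = urllib.parse.urlencode({"json": json.dumps({"listen_port": port})}).encode()
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v2/app/setPreferences", data=body, method="POST"
    )


def sync(base_url: str = QBT_URL, path: str = PORT_FILE) -> int:
    """Push gluetun's current forwarded port into qbt; returns the port applied."""
    port = read_forwarded_port(path)
    # Assumes localhost auth bypass; otherwise POST /api/v2/auth/login first
    # and send the returned SID cookie with this request.
    with urllib.request.urlopen(set_listen_port_request(base_url, port), timeout=10):
        pass
    return port
```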

Open decisions to confirm before implementing

Proton plan slot count — gluetun needs its own WG key. Free slot?
Which private trackers do you actually use? IP-pinning check.
Public hostnames for the arr-stack: confirm qbt/sonarr/radarr/prowlarr.s8n.ru subdomains, or serve them under sub-paths of one host (e.g. arr.s8n.ru/qbt/).
Authentik group for arr-stack access (LAN-only? or also from gravel via Tailscale?).
Forgejo deploy key — generate now or reuse obsidian-ai's existing key?

Answer those five and the implementation is ~1 evening of compose + ~2 hours on the catalog service. Migration is a separate weekend.
