legacy-arrflix/testing/ROLLBACK.md
s8n d9d6bdba64 testing/ folder: theme-edit guides + error catalog + headless recipes
7 docs in /testing/ — institutional memory after 6+ regressions in
24-48h on the v6 theme. Read before any edit.

  README.md           — index + quickstart
  THEMING.md          — safe-edit checklist + layer/specificity tables
  ERROR-PATTERNS.md   — 12 cataloged patterns (Symptom/Cause/Diag/Fix/Prev)
  HEADLESS-PROBE.md   — 11 playwright recipes (md5 chain, darkPct,
                        ancestor bg sample, dropdown listItem probe)
  ROLLBACK.md         — 8 emergency revert recipes (overlay, branding,
                        encoding, full-from-repo, dev-clone-prod,
                        git-revert, pw-reset, bind-mount inode-swap)
  SMOKE-TEST.md       — manual + headless verify checklist
  DEPLOY.md           — dev → prod promotion workflow with backup +
                        chown root + restart inode-swap

Empty subdirs: snippets/, recipes/, incidents/ (post-mortems land here).

Goal: stop reinventing the same fixes. Catalog every error class,
codify the recovery, build a skills folder for future ARRFLIX work.
2026-05-10 00:47:20 +01:00


# ROLLBACK — emergency revert procedures
> When something breaks, follow these recipes. Each is one shell block + verify step.
## When to use this
Any of:
- Live users report black screens, missing UI, can't log in
- Headless probe shows `darkPct > 50%` during playback
- `/Branding/Css.css` returns 0 bytes
- Container won't start or stays unhealthy
- `curl -s https://arrflix.s8n.ru/web/index.html | grep -c ARRFLIX-MIDDLE-THEME-BEGIN` returns 0
- Owner says "rollback" — don't debate, restore first, diagnose after
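The numeric triggers in the list above can be folded into one check. This is a minimal sketch, not part of the existing tooling: `should_rollback` is a hypothetical helper, and it assumes the caller has already collected the `/Branding/Css.css` byte count, the `grep -c` marker count, and the probe's `darkPct` as plain integers.

```bash
# Hypothetical helper: exit 0 ("roll back") if any numeric trigger fires.
# Args: css_bytes marker_count dark_pct (integers collected by the caller)
should_rollback() {
  css_bytes=$1; marker_count=$2; dark_pct=$3
  [ "$css_bytes" -eq 0 ] && return 0      # /Branding/Css.css came back empty
  [ "$marker_count" -eq 0 ] && return 0   # ARRFLIX-MIDDLE-THEME-BEGIN missing
  [ "$dark_pct" -gt 50 ] && return 0      # headless probe saw a mostly dark frame
  return 1                                # nothing tripped
}
```

The non-numeric triggers (user reports, an unhealthy container, the owner saying "rollback") still need a human decision.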
## ROLLBACK 1 — overlay (most common)
Symptom: theme regression, wrong colors, missing features after a deploy. Login or home page renders but looks wrong.
Source: every prod deploy creates a `.bak.pre-<reason>.<unix-ts>` file in `/opt/docker/jellyfin/web-overrides/`. Pick the most recent (or the one matching the last-known-good state — e.g. `index.html.bak.pre-favfix.1778318089` is pre-v6+favfix).
```bash
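ssh user@nullstone 'set -e
# Newest backup by mtime (a name sort would order by the <reason> text, not the timestamp)
TS=$(ls -t /opt/docker/jellyfin/web-overrides/index.html.bak.* 2>/dev/null | head -1)
[ -n "$TS" ] || { echo "No backup found" >&2; exit 1; }
echo "Restoring from: $TS"
# Only the overrides dir is mounted (at /d), so address the backup by its name under /d
docker run --rm --userns=host -v /opt/docker/jellyfin/web-overrides:/d:rw alpine sh -c "cp /d/$(basename "$TS") /d/index.html && chown root:root /d/index.html && md5sum /d/index.html"
docker restart jellyfin && sleep 12
docker exec jellyfin md5sum /jellyfin/jellyfin-web/index.html'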
```
Verify: `curl -s https://arrflix.s8n.ru/web/index.html | grep -c ARRFLIX-MIDDLE-THEME-BEGIN` returns `1`. Then hard-refresh the browser (`Ctrl+Shift+R`) — defeats Service Worker + HTTP cache.
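Because backups are named `.bak.pre-<reason>.<unix-ts>`, any sort on the whole filename orders them by the `<reason>` text before it ever reaches the timestamp. The sketch below selects on the trailing timestamp instead. `newest_backup` is a hypothetical helper, and it assumes every backup worth considering ends in a purely numeric suffix.

```bash
# Hypothetical helper: print the backup of BASE in DIR with the largest
# trailing .<unix-ts>. Exits non-zero if no timestamped backup exists.
# Usage: newest_backup /opt/docker/jellyfin/web-overrides index.html
newest_backup() {
  dir=$1; base=$2; best=""; best_ts=-1
  for f in "$dir/$base".bak.*; do
    [ -e "$f" ] || continue                     # glob matched nothing
    ts=${f##*.}                                 # text after the last dot
    case $ts in ''|*[!0-9]*) continue ;; esac   # skip non-numeric suffixes
    if [ "$ts" -gt "$best_ts" ]; then best=$f; best_ts=$ts; fi
  done
  [ -n "$best" ] && printf '%s\n' "$best"
}
```

Backups without a timestamp (for example `branding.xml.bak.dev-pre-resync`) are ignored by this helper and have to be picked by hand.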
## ROLLBACK 2 — branding.xml
Symptom: `/Branding/Css.css` returns 0 bytes (Cineplex theme stops loading site-wide). Usually caused by an unescaped `<video>` literal or other XML parse failure inside `<CustomCss>` (silent failure — HTTP 200 empty body, no admin alert).
```bash
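ssh user@nullstone 'set -e
# Newest backup by mtime; abort if none exists
TS=$(ls -t /home/docker/jellyfin/config/config/branding.xml.bak.* 2>/dev/null | head -1)
[ -n "$TS" ] || { echo "No backup found" >&2; exit 1; }
echo "Restoring from: $TS"
# Only the config dir is mounted (at /d), so address the backup by its name under /d
docker run --rm --userns=host -v /home/docker/jellyfin/config/config:/d:rw alpine sh -c "cp /d/$(basename "$TS") /d/branding.xml"
docker restart jellyfin && sleep 12
docker exec jellyfin curl -s http://127.0.0.1:8096/Branding/Css.css | wc -c'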
```
Expect: `> 30000` bytes (v6-stable serves 36 256 B). 0 bytes = still broken — the backup itself was corrupt; pick an older `.bak.*` and retry.
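Since the failure is silent, it can help to scan a candidate `branding.xml` for raw markup before copying it into place. A rough sketch follows. `css_has_raw_markup` is a hypothetical helper, and it assumes the CSS sits as escaped text between a single `<CustomCss>`/`</CustomCss>` pair with no CDATA wrapper.

```bash
# Hypothetical pre-deploy check: exit 0 if the text inside <CustomCss> contains
# a bare "<" (e.g. an unescaped <video>), which would break XML parsing.
css_has_raw_markup() {
  # Flatten the file to one line, then keep only what sits between the tags.
  body=$(tr '\n' ' ' < "$1" | sed -n 's/.*<CustomCss>\(.*\)<\/CustomCss>.*/\1/p')
  case $body in
    *'<'*) return 0 ;;   # raw markup found: do not deploy this file
    *)     return 1 ;;   # only escaped entities such as &lt; (or no CSS at all)
  esac
}
```

A file that passes this check can still be invalid for other reasons, so the byte-count check on `/Branding/Css.css` after restart remains the real verification.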
## ROLLBACK 3 — encoding.xml (HDR / tonemap)
Symptom: HDR10 playback looks washed out / grey, or transcode fails after flipping `EnableTonemapping`. Backup created pre-flip as `encoding.xml.bak.pre-tonemap.1778318089`.
```bash
ssh user@nullstone 'set -e
# Newest backup by mtime; abort if none exists
TS=$(ls -t /home/docker/jellyfin/config/config/encoding.xml.bak.* 2>/dev/null | head -1)
[ -n "$TS" ] || { echo "No backup found" >&2; exit 1; }
echo "Restoring from: $TS"
# Only the config dir is mounted (at /d), so address the backup by its name under /d
docker run --rm --userns=host -v /home/docker/jellyfin/config/config:/d:rw alpine sh -c "cp /d/$(basename "$TS") /d/encoding.xml"
docker restart jellyfin && sleep 12
docker exec jellyfin grep -E "EnableTonemapping|TonemappingAlgorithm|HardwareAccelerationType" /config/config/encoding.xml'
```
Verify: values match the last-known-good (v6-stable: `EnableTonemapping=true`, `TonemappingAlgorithm=bt2390`, `HardwareAccelerationType=none`). Stop any in-flight HDR10 transcode and re-start it from the client.
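Comparing the three values by eye is easy to get wrong under pressure, so the check can be scripted. The helpers below are hypothetical, and they assume each setting appears as `<Name>value</Name>` on a single line, which is what the `grep` above relies on as well.

```bash
# Hypothetical helper: print the text of the first <ELEMENT>...</ELEMENT> in FILE.
xml_value() {  # xml_value FILE ELEMENT
  sed -n "s:.*<$2>\(.*\)</$2>.*:\1:p" "$1" | head -n 1
}

# Exit 0 only if all three settings match the v6-stable values quoted above.
encoding_is_v6_stable() {
  [ "$(xml_value "$1" EnableTonemapping)" = "true" ] &&
  [ "$(xml_value "$1" TonemappingAlgorithm)" = "bt2390" ] &&
  [ "$(xml_value "$1" HardwareAccelerationType)" = "none" ]
}
```

Run it against a copy of the file pulled from the host, for example `encoding_is_v6_stable ./encoding.xml && echo OK`.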
## ROLLBACK 4 — full prod = exact state from repo HEAD
When you don't trust the live state (drift, tampering, multiple bad deploys), force prod to match `origin/main`:
```bash
cd /tmp/arrflix-recon
git fetch origin && git checkout origin/main
md5sum web-overrides/index.html
scp web-overrides/index.html user@nullstone:/tmp/repo-overlay.html
ssh user@nullstone 'set -e
docker run --rm --userns=host -v /opt/docker/jellyfin/web-overrides:/d:rw -v /tmp:/src:ro alpine sh -c "cp /src/repo-overlay.html /d/index.html && chown root:root /d/index.html && md5sum /d/index.html"
docker restart jellyfin && sleep 12
docker exec jellyfin md5sum /jellyfin/jellyfin-web/index.html'
```
Verify: container md5 == repo md5 (v6-stable: `364cc890c58f02d07cf50b43b31a48f0`). `curl -s https://arrflix.s8n.ru/web/index.html | md5sum` should match too.
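The md5 chain (repo, container, public URL) only passes when every link agrees, which is simple to script. A small sketch, with both helper names being hypothetical:

```bash
# Hash field only (md5sum prints "<hash>  <name>").
file_md5() { md5sum "$1" | cut -d' ' -f1; }

# Hypothetical helper: exit 0 only if every argument is the same non-empty hash.
hashes_match() {
  first=$1
  [ -n "$first" ] || return 1          # an empty hash means a command failed upstream
  for h in "$@"; do
    [ "$h" = "$first" ] || return 1    # any mismatch breaks the chain
  done
}
```

Typical use would be to capture the repo hash with `file_md5 web-overrides/index.html`, capture the container and `curl` hashes the same way, and pass all three to `hashes_match`.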
## ROLLBACK 5 — dev = exact clone of prod
When dev has drifted and you want to reset it to live prod state (for a clean theme test sandbox):
```bash
ssh user@nullstone 'set -e
docker run --rm --userns=host \
-v /opt/docker/jellyfin/web-overrides:/p:ro \
-v /opt/docker/jellyfin-dev/web-overrides:/d:rw \
alpine sh -c "cp /p/index.html /d/index-dev.html && chown 1000:1000 /d/index-dev.html && md5sum /p/index.html /d/index-dev.html"
docker restart jellyfin-dev && sleep 12
docker exec jellyfin-dev md5sum /jellyfin/jellyfin-web/index.html'
```
Note dev's overlay filename is `index-dev.html` (NOT `index.html`) and ownership is `user:user` (1000:1000), unlike prod's `root:root`. Also resync `branding.xml` if needed: source `/home/docker/jellyfin/config/config/branding.xml` → dest `/home/docker/jellyfin-dev/config/config/branding.xml` (backup as `branding.xml.bak.dev-pre-resync` first).
## ROLLBACK 6 — git revert last commit
When the bad change is in repo HEAD and you want it undone (`git revert` adds a new commit that reverses it; the bad commit itself stays in history), then redeploy cleanly:
```bash
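cd /tmp/arrflix-recon
# ROLLBACK 4 leaves HEAD detached at origin/main; revert on the branch so the push carries it
git checkout main && git pull --ff-only origin main
git log --oneline -5
git revert HEAD --no-edit
git push origin main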
```
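Then redeploy via ROLLBACK 4 to push the reverted state to prod. If the bad commit is more than one back, use `git revert <sha>` for each (newest first), or `git revert <bad-sha>^..HEAD --no-edit` for a range. A range `A..B` excludes `A`, so it must start at the parent of the first bad commit.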
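Range reverts are easy to get off by one, because `A..B` excludes `A`: to include the first bad commit, the range has to start at its parent. The throwaway-repo sketch below demonstrates this. Everything in it (the temp repo, the file `f`, the commit messages) is made up for illustration.

```bash
# Builds a scratch repo with one good and two bad commits, reverts the bad
# range starting at the first bad commit's parent, and prints the final file.
demo_revert_range() (
  set -e
  repo=$(mktemp -d)
  cd "$repo"
  git init -q .
  git config user.email demo@example.invalid
  git config user.name demo
  git config commit.gpgsign false
  echo good > f; git add f; git commit -qm "good"
  echo bad1 > f; git commit -qam "bad 1"
  bad=$(git rev-parse HEAD)                  # first bad commit
  echo bad2 > f; git commit -qam "bad 2"
  git revert --no-edit "$bad^..HEAD" >/dev/null   # reverts "bad 2", then "bad 1"
  cat f                                      # prints: good
  cd /; rm -rf "$repo"
)
```

With `"$bad..HEAD"` instead, only "bad 2" would be reverted and the file would still contain `bad1`.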
## ROLLBACK 7 — recover dev `test`/`123` password
Symptom: dev login broken, can't auth as `test`/`123`. Standard sqlite password-reset cycle (dev only — never prod).
```bash
ssh user@nullstone 'set -e
docker stop jellyfin-dev
DB=/home/docker/jellyfin-dev/config/data/jellyfin.db
cp $DB ${DB}.bak.pre-pwreset.$(date +%s)
docker run --rm -v /home/docker/jellyfin-dev/config/data:/d:rw alpine sh -c "apk add --no-cache sqlite >/dev/null && sqlite3 /d/jellyfin.db \"UPDATE Users SET Password=NULL, EasyPassword=NULL WHERE Username='\''test'\'';\""
chown -R 1000:1000 /home/docker/jellyfin-dev/config/data
docker start jellyfin-dev && sleep 12'
```
Then in browser: log in as `test` with **blank** password → admin → user `test` → set password to `123`. Verify `curl -X POST https://dev.arrflix.s8n.ru/Users/AuthenticateByName -H 'Content-Type: application/json' -d '{"Username":"test","Pw":"123"}'` returns a token.
## ROLLBACK 8 — bind-mount inode swap (just restart)
Symptom: you changed an overlay file and the served bytes don't match the file on disk. `docker exec jellyfin md5sum /jellyfin/jellyfin-web/index.html` differs from `md5sum /opt/docker/jellyfin/web-overrides/index.html` run on the host. The bind mount captured the old inode, so the container is serving the stale file.
```bash
ssh user@nullstone 'docker restart jellyfin && sleep 12 && docker exec jellyfin md5sum /jellyfin/jellyfin-web/index.html'
# or for dev:
ssh user@nullstone 'docker restart jellyfin-dev && sleep 12 && docker exec jellyfin-dev md5sum /jellyfin/jellyfin-web/index.html'
```
This is not a "rollback" per se — it's the cure for any `cp`-without-restart that left the container out of sync.
## Backups directory layout
| Path | Purpose |
|------|---------|
| `/opt/docker/jellyfin/web-overrides/index.html.bak.*` | prod overlay (root:root) |
| `/opt/docker/jellyfin-dev/web-overrides/index-dev.html.bak.*` | dev overlay (user:user, note `-dev` suffix) |
| `/home/docker/jellyfin/config/config/branding.xml.bak.*` | prod branding |
| `/home/docker/jellyfin/config/config/encoding.xml.bak.*` | prod encoding |
| `/home/docker/jellyfin-dev/config/config/branding.xml.bak.*` | dev branding |
| `/home/docker/jellyfin-dev/config/data/jellyfin.db.bak.*` | dev sqlite (user db) |
Keep the **most recent** `.bak` per file; older ones can be deleted (per doc 30 cleanup). Never delete a `.bak.*` you haven't verified is older than the current good state.
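Before deleting anything, a dry-run listing makes the "keep the most recent" rule explicit. This is only a sketch: `prune_candidates` is a hypothetical helper that prints, oldest first, every timestamped backup except the newest and deletes nothing.

```bash
# Hypothetical dry-run: list every backup of BASE in DIR except the one with
# the largest trailing .<unix-ts>. Backups without a numeric suffix are skipped.
# Usage: prune_candidates /opt/docker/jellyfin/web-overrides index.html
prune_candidates() {
  dir=$1; base=$2
  for f in "$dir/$base".bak.*; do
    [ -e "$f" ] || continue                     # glob matched nothing
    ts=${f##*.}                                 # text after the last dot
    case $ts in ''|*[!0-9]*) continue ;; esac   # leave non-timestamped backups alone
    printf '%s %s\n' "$ts" "$f"
  done | sort -n | sed '$d' | cut -d' ' -f2-    # drop the newest, print the rest
}
```

Review the output against the current good state first; only then pass the reviewed list to `rm`.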
## After any rollback
1. **Notify users** — restart drops in-flight stream sessions; if anyone was mid-playback they'll get bumped.
2. **Open an incident** in `testing/incidents/` (post-mortem) — what broke, what backup was used, container md5 before/after, owner-visible impact.
3. **Add the failure mode to `testing/ERROR-PATTERNS.md`** if novel.
4. **Verify v6-stable invariants** — overlay md5 prod==dev, `/Branding/Css.css` > 30 000 B, `EnableTonemapping=true`, login + playback both green via `testing/SMOKE-TEST.md`.