veilor-os/overlay/etc/sysctl.d/95-memory-pressure.conf
veilor-org 7d2b94b5be feat(hardening): add memory-pressure tuning for zram-only stack
veilor-os runs zram-only swap (THREAT-MODEL.md: no key leak from
disk swap). With kernel defaults that policy bites: once zram fills
there is no overflow tier, so the kernel thrashes until memory is
exhausted before it triggers OOM. It then picks a victim by oom_score
and frequently kills plasmashell or the foreground terminal instead
of the leaking browser tab. The pointer freezes for minutes during
the thrash window.
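
The zram-only premise can be checked on a running system. A minimal sketch, assuming the usual /proc/swaps layout (one header line, device path in the first column); the function name non_zram_swaps is hypothetical and not part of this commit:

```shell
# Print every active swap area that is NOT a zram device.
# $1: a file in /proc/swaps format (defaults to /proc/swaps itself).
non_zram_swaps() {
    # NR > 1 skips the header row; column 1 is the swap device or file.
    awk 'NR > 1 && $1 !~ /^\/dev\/zram[0-9]+$/ { print $1 }' "${1:-/proc/swaps}"
}
```

An empty result means every active swap area is zram-backed; any path printed is a disk or file swap that the threat model is meant to rule out.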

Three co-dependent layers:

1. systemd-oomd enabled: the PSI-based pre-OOM killer fires at cgroup
   boundaries before the kernel reaper. Fedora's systemd-oomd-defaults
   ships sane thresholds for user.slice; the package is installed in
   the kickstart and layered in the bluebuild containerfile, and the
   unit is enabled in both unit-toggle blocks.

2. zram bumped 8 GiB lzo-rle (Fedora default) -> 16 GiB zstd. The
   zram size is the uncompressed capacity, so at zstd's ~3:1 a full
   16 GiB device costs ~5.3 GiB of RAM, at negligible CPU cost on any
   post-2018 x86_64. 8 GiB filled in practice on 32+ GiB laptops
   running Chromium + LSP + chat clients.

3. /etc/sysctl.d/95-memory-pressure.conf:
   - vm.swappiness=180 (zram is RAM-fast, swap early; default 60
     assumes HDD)
   - vm.watermark_scale_factor=125 (kswapd reclaim starts ~1.25%
     headroom vs default 0.1%; ~400 MiB head start on 32 GiB)
   - vm.page-cluster=0 (no read-ahead; pointless on RAM-backed swap,
     wastes decompress)
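
The figures quoted for the sysctl knobs follow from two kernel conventions: vm.watermark_scale_factor is expressed in units of 1/10000 of memory, and vm.page-cluster is a base-2 logarithm of the pages read per swap-in. A minimal sketch of that arithmetic; the function names are hypothetical, and real watermarks are computed per zone, so the totals are approximate:

```shell
# Approximate watermark headroom in MiB for a given scale factor.
# $1: RAM in MiB, $2: vm.watermark_scale_factor (unit: 1/10000 of memory).
watermark_headroom_mib() {
    echo $(( $1 * $2 / 10000 ))
}

# Pages read per swap-in: 2 to the power of vm.page-cluster.
# $1: vm.page-cluster
swapin_pages() {
    echo $(( 1 << $1 ))
}

watermark_headroom_mib 32768 125   # 409 (the "~400 MiB" above)
watermark_headroom_mib 32768 10    # 32  (kernel default, 0.1%)
swapin_pages 3                     # 8   (kernel default)
swapin_pages 0                     # 1   (this change)
```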

If any one of the three is missing the system still wedges briefly:
oomd without zram tuning waits for PSI to climb; zram tuning without
oomd gets victim selection wrong.

Verified by new test/boot-checklist.md "Memory pressure" section.
Inline rationale headers in both overlay files so the why survives
doc drift. Trigger event: onyx (Fedora 43, not veilor-os) thrashed
2026-05-11; same defaults shipped to veilor-os, fixed here too.
2026-05-12 10:17:00 +01:00


# veilor-os — memory-pressure tuning for zram-only swap
#
# Rationale: veilor-os ships zram swap with NO disk swap (see THREAT-MODEL.md
# §"Lost or stolen laptop"). The kernel's default vm.* knobs assume a slow
# spinning disk and refuse to swap until physical RAM is nearly exhausted.
# Under a zram-only stack that policy is wrong on two axes:
#
#   1. zram is RAM-fast: there is no penalty for swapping early, only a
#      small CPU cost for zstd compress/decompress.
#   2. Once zram fills, there is no overflow (no disk swap by design), so
#      the kernel falls through to OOM. With default knobs the OOM trigger
#      is slow and reactive: by the time it fires, the system has spent
#      minutes in thrash (compositor/input frozen, mouse stuck) and the
#      kernel picks a victim by oom_score, which is often plasmashell or
#      the terminal, i.e. the user's session goes down, not the runaway.
#
# What these knobs do:
#
#   vm.swappiness = 180
#     Tell the kernel to prefer evicting anonymous pages to (zram) swap
#     over reclaiming file-backed pages. The range is 0-200; values
#     above 100 only make sense when swap is faster than the filesystem,
#     which zram is. Fedora's zram-generator upstream recommends 180 for
#     zram-only systems. Default 60 is tuned for HDD swap and leaves
#     zram unused until too late.
#
#   vm.watermark_scale_factor = 125
#     Start kswapd reclaim earlier. The unit is 1/10000 of memory, so
#     125 gives ~1.25% of RAM headroom vs 0.1% at the default of 10. On
#     a 32 GiB box that's ~400 MiB head start before allocations would
#     otherwise stall in direct reclaim. Trades a tiny amount of usable
#     RAM for much smoother latency under bursty allocators
#     (Chromium/Electron tab spawns, language server warm-up).
#
#   vm.page-cluster = 0
#     Read one page per swap-in instead of the default 8 (the value is
#     a log2; the default 3 means 2^3 pages). Read-ahead is a win on
#     rotational media because seeks dominate; on zram the seek cost is
#     zero and grabbing 7 extra pages just wastes decompress cycles and
#     CPU cache. Setting to 0 is the documented zram tuning.
#
# Companion: systemd-oomd is enabled in the same change so PSI-based
# pre-OOM kills land on the right cgroup before the kernel OOM reaper
# fires. Without it, even with these knobs the system can still wedge
# briefly while the kernel waits for the global watermark.
vm.swappiness = 180
vm.watermark_scale_factor = 125
vm.page-cluster = 0
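
One way to confirm the fragment is actually live after boot is to compare each key it sets against the running kernel. A minimal sketch, assuming a POSIX shell with sed and the procps sysctl tool; the helper names sysctl_pairs and check_live are hypothetical and are not the boot-checklist procedure from the repository:

```shell
# List the key=value pairs a sysctl.d fragment sets, with comment lines
# (# or ;) and blank lines dropped and surrounding whitespace trimmed.
# $1: path to the fragment
sysctl_pairs() {
    sed -e '/^[[:space:]]*[#;]/d' \
        -e '/^[[:space:]]*$/d' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e 's/[[:space:]]*=[[:space:]]*/=/' "$1"
}

# Compare every pair in the fragment against the running kernel.
# $1: path to the fragment
check_live() {
    sysctl_pairs "$1" | while IFS='=' read -r key want; do
        if have=$(sysctl -n "$key" 2>/dev/null); then
            if [ "$have" = "$want" ]; then
                echo "ok   $key = $have"
            else
                echo "DIFF $key: live=$have file=$want"
            fi
        else
            echo "DIFF $key: not readable on this kernel"
        fi
    done
}
```

After sourcing the functions, `check_live /etc/sysctl.d/95-memory-pressure.conf` prints one line per knob. A DIFF line may mean a later-sorting fragment overrides this one or that the fragment was never applied.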