veilor-os/docs/HARDENING.md
veilor-org 7d2b94b5be feat(hardening): add memory-pressure tuning for zram-only stack
veilor-os runs zram-only swap (THREAT-MODEL.md — no key leak from
disk swap). With kernel defaults that policy bites: once zram fills
there is no overflow tier, the kernel waits until total exhaustion
to trigger OOM, then picks a victim by oom_score and frequently
kills plasmashell or the foreground terminal instead of the leaking
browser tab. Mouse locks for minutes during the thrash window.

Three co-dependent layers:

1. systemd-oomd enabled — PSI-based pre-OOM killer fires at cgroup
   boundaries before the kernel reaper. Fedora's systemd-oomd-defaults
   ship sane thresholds for user.slice; installed in kickstart and
   layered in bluebuild containerfile, enabled in both unit-toggle
   blocks.

2. zram bumped 8 GiB lzo-rle (Fedora default) -> 16 GiB zstd. zstd
   gives ~3:1 (~48 GiB effective) at negligible CPU cost on any
   post-2018 x86_64. 8 GiB filled in practice on 32+ GiB laptops
   running Chromium + LSP + chat clients.

3. /etc/sysctl.d/95-memory-pressure.conf:
   - vm.swappiness=180 (zram is RAM-fast, swap early; default 60
     assumes HDD)
   - vm.watermark_scale_factor=125 (kswapd reclaim starts ~1.25%
     headroom vs default 0.1%; ~400 MiB head start on 32 GiB)
   - vm.page-cluster=0 (no read-ahead; pointless on RAM-backed swap,
     wastes decompress)

Without any one of the three the system still wedges briefly: oomd
without zram tuning waits for PSI to climb; zram tuning without oomd
gets victim selection wrong.

Verified by new test/boot-checklist.md "Memory pressure" section.
Inline rationale headers in both overlay files so the why survives
doc drift. Trigger event: onyx (Fedora 43, not veilor-os) thrashed
2026-05-11; same defaults shipped to veilor-os, fixed here too.
2026-05-12 10:17:00 +01:00


# Hardening Reference
What veilor-os locks down and why. Each item is applied by either the
kickstart `%post` or the overlay tree shipped in `/etc`.
## Boot chain
| Item | State | Source |
|------|-------|--------|
| Secure Boot | Required (bootloader signed) | `bootloader` kickstart line |
| Kernel lockdown | `lockdown=integrity` | bootloader kernel args |
| Slab hardening | `slab_nomerge`, `init_on_alloc=1`, `init_on_free=1` | bootloader |
| Stack offset | `randomize_kstack_offset=on` | bootloader |
| vsyscall | `vsyscall=none` | bootloader |
| LUKS2 | aes-xts-plain64 / argon2id, mem=1GB, time=9 | `part pv.veilor` |
| Module loading | Locked 30s after graphical boot | `veilor-modules-lock.service` |
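
One way to confirm that the kernel arguments in the table reached the running kernel is to compare them against `/proc/cmdline`. The following is only a sketch: the helper name `missing_boot_args` is hypothetical and not shipped by veilor-os, and the expected list is copied from the table above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of veilor-os): report which kernel
# arguments from the Boot chain table are absent from a command line.

EXPECTED_ARGS=(
  "lockdown=integrity"
  "slab_nomerge"
  "init_on_alloc=1"
  "init_on_free=1"
  "randomize_kstack_offset=on"
  "vsyscall=none"
)

# Usage: missing_boot_args "<kernel command line>"
# Prints one missing argument per line; prints nothing if all are present.
missing_boot_args() {
  local cmdline=" $1 " arg
  for arg in "${EXPECTED_ARGS[@]}"; do
    # Pad with spaces so "init_on_free=1" does not match "init_on_free=10".
    case "$cmdline" in
      *" $arg "*) ;;
      *) printf '%s\n' "$arg" ;;
    esac
  done
}

# Check the running kernel when /proc/cmdline is readable.
if [ -r /proc/cmdline ]; then
  missing_boot_args "$(cat /proc/cmdline)"
fi
```

An empty result means every listed argument is present. Secure Boot state, LUKS parameters, and the module lock are not visible in `/proc/cmdline` and need separate checks.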
## Kernel sysctl
`/etc/sysctl.d/99-veilor-hardening.conf`:
| Key | Value | Why |
|-----|-------|-----|
| `kernel.kptr_restrict` | 2 | hide kernel pointers from /proc |
| `kernel.dmesg_restrict` | 1 | dmesg root-only |
| `kernel.yama.ptrace_scope` | 2 | ptrace attach requires `CAP_SYS_PTRACE` (admin-only) |
| `kernel.perf_event_paranoid` | 3 | unprivileged perf disabled |
| `net.core.bpf_jit_harden` | 2 | BPF JIT constant blinding |
| `kernel.randomize_va_space` | 2 | full ASLR |
| `fs.suid_dumpable` | 0 | no SUID core dumps |
| `dev.tty.ldisc_autoload` | 0 | block tty LPE vector |
| `net.ipv4.tcp_syncookies` | 1 | SYN flood mitigation |
| `net.ipv4.conf.all.rp_filter` | 1 | reverse-path filter |
| `accept_source_route` | 0 (v4+v6) | ignore source routing |
| `accept_redirects` | 0 (v4+v6) | ignore ICMP redirects |
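
The table can be audited against a live system by reading `/proc/sys` directly. The helper below is a minimal sketch, not something veilor-os ships: `check_sysctl` and the `PROC_SYS` override are hypothetical names. The two abbreviated rows (`accept_source_route`, `accept_redirects`) are left out because the table does not spell out their full key paths.

```bash
#!/usr/bin/env bash
# Hypothetical audit helper (not part of veilor-os): compare live sysctl
# values with the table above by reading the /proc/sys tree.

# Root of the sysctl tree; overridable so the function can be pointed
# at a scratch directory.
PROC_SYS="${PROC_SYS:-/proc/sys}"

# Usage: check_sysctl <key> <expected>
# Prints OK, MISMATCH, or MISSING (absent or unreadable) for the key.
check_sysctl() {
  local key="$1" expected="$2" path actual
  # kernel.kptr_restrict -> $PROC_SYS/kernel/kptr_restrict
  path="$PROC_SYS/${key//.//}"
  if [ ! -r "$path" ]; then
    printf 'MISSING  %s\n' "$key"
    return 0
  fi
  actual="$(cat "$path")"
  if [ "$actual" = "$expected" ]; then
    printf 'OK       %s = %s\n' "$key" "$actual"
  else
    printf 'MISMATCH %s = %s (expected %s)\n' "$key" "$actual" "$expected"
  fi
}

# Keys and values copied from the table; only fully spelled keys are listed.
while read -r key expected; do
  check_sysctl "$key" "$expected"
done <<'EOF'
kernel.kptr_restrict 2
kernel.dmesg_restrict 1
kernel.yama.ptrace_scope 2
kernel.perf_event_paranoid 3
net.core.bpf_jit_harden 2
kernel.randomize_va_space 2
fs.suid_dumpable 0
dev.tty.ldisc_autoload 0
net.ipv4.tcp_syncookies 1
net.ipv4.conf.all.rp_filter 1
EOF
```

Some entries under `/proc/sys` are readable only by root, so running the script unprivileged may report `MISSING` for keys that are in fact set.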
## SELinux
- Enforcing, targeted policy.
- Custom module `veilor-systemd` grants `systemd_modules_load_t` the
`sys_admin` and `perfmon` capabilities required by the modules-lock
service. Source: `scripts/selinux/veilor-systemd.te`.
### veilor-firstboot SELinux confinement
The first-boot password service is privileged (it has to write
`/etc/shadow`) but small. Module `veilor-firstboot` carves a tight domain:
- Allowed: read `/etc/passwd`, exec `passwd(1)`, write
`/var/lib/veilor-firstboot.done`, write `/etc/sddm.conf.d/`,
start `sddm.service`.
- `neverallow` rules block: network sockets (no phone-home),
`home_root_t` / `user_home_t` access, `sys_module`, `sys_ptrace`,
`sys_rawio`.
Source: `scripts/selinux/veilor-firstboot.te`. Build & load with
`scripts/selinux/build-policy.sh` (loads all modules in one pass).
## Network surface
- **firewalld** default zone = `drop`.
- **Inbound:** ssh only.
- **systemd-resolved:** LLMNR off, DNSSEC `allow-downgrade`,
DNS-over-TLS opportunistic. Resolvers: Cloudflare (1.1.1.1, 1.0.0.1),
fallback Quad9 (9.9.9.9, 149.112.112.112).
- **chrony:** NTS-authenticated time from `time.cloudflare.com`,
`nts.sth1.ntp.se`, and `nts.sth2.ntp.se`. The NTP pool is used only as a fallback.
## SSH
`/etc/ssh/sshd_config.d/10-veilor-hardening.conf`:
- `PasswordAuthentication no`
- `PermitRootLogin no`
- `AllowUsers admin`
- `X11Forwarding no`
- `MaxAuthTries 3`
- `ClientAliveInterval 300`
- `LogLevel VERBOSE`
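
Because drop-in files can be overridden by other configuration, it can help to check the effective settings rather than the file. `sshd -T` prints the effective configuration as lowercase keyword/value lines; the sketch below filters that output. The function name `check_sshd_settings` is hypothetical, and the expected values are copied from the list above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of veilor-os): check `sshd -T` style
# output (keyword, space, value) for the settings listed above.

# Usage: sudo sshd -T | check_sshd_settings
# Prints each expected setting that is absent from stdin; silent on success.
check_sshd_settings() {
  local effective want
  effective="$(cat)"
  for want in \
    "passwordauthentication no" \
    "permitrootlogin no" \
    "allowusers admin" \
    "x11forwarding no" \
    "maxauthtries 3" \
    "clientaliveinterval 300" \
    "loglevel VERBOSE"
  do
    # -x whole-line match, -F fixed string, -i ignore case, -q quiet
    printf '%s\n' "$effective" | grep -qixF -- "$want" \
      || printf 'not effective: %s\n' "$want"
  done
}
```

A typical invocation would be `sudo sshd -T | check_sshd_settings`; no output means all seven settings are in effect.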
## Auth / accounts
- Root account **locked** (`passwd -l root`). No interactive root login.
- Single `admin` user, `wheel` group, full sudo.
- `pwquality.conf`: minlen=14, 4 character classes required, dictcheck.
- **First-boot password flow:** `chage -d 0 admin` expires the empty
password immediately. `veilor-firstboot.service` runs on TTY1 before
SDDM, prompts for new password, then starts the graphical session.
## Audit
`/etc/audit/rules.d/99-veilor-hardening.rules` watches:
- `/etc/passwd`, `/etc/shadow`, `/etc/group`, `/etc/gshadow`
- `/etc/sudoers`, `/etc/sudoers.d/`
- `/etc/ssh/sshd_config*`, `/etc/selinux/`, `/etc/firewalld/`
- `/etc/cron.*`, `/var/spool/cron/`
- `/etc/sysctl.*`, `/etc/systemd/system/`, `/usr/lib/systemd/system/`
- All privileged binaries (sudo, su, passwd, mount, pkexec, etc.)
- Kernel module load/unload syscalls
- Permission/ownership changes by uid≥1000
## Intrusion detection
`fail2ban` jails:
- `sshd` — aggressive mode, 3 retries, 24h ban
- `pam-generic` — 5 retries, 1h ban (catches XDM, su, sudo failures)
Backend: systemd journal. Action: firewalld rich rules.
## USB
`USBGuard` daemon, `ImplicitPolicyTarget=block`.
Ships with an **empty allowlist**. On first boot, the admin runs:
```bash
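# The redirect must run as root, so pipe through `sudo tee` instead of using `>`.
sudo usbguard generate-policy | sudo tee /etc/usbguard/rules.conf > /dev/null
sudo chmod 0600 /etc/usbguard/rules.conf
sudo systemctl restart usbguard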
```
This snapshots all currently-connected devices into the allowlist.
Anything plugged in afterward is blocked unless explicitly allowed:
```bash
sudo usbguard list-devices
sudo usbguard allow-device <id>
```
## Disabled services
`abrt*`, `cups`, `cups-browsed`, `geoclue`, `avahi-daemon`,
`bluetooth`, `ModemManager`, `gssproxy`, `atd`, `pcscd.socket`,
`pcscd.service`, `kdeconnectd` (removed at package level).
## AppArmor (v0.5)
Fedora 43 ships AppArmor alongside SELinux. veilor-os keeps SELinux as the
primary MAC layer (enforcing, targeted) but ships AppArmor profile
skeletons for high-risk userland binaries that benefit from a second,
binary-scoped policy on top of SELinux's type-enforcement one.
Profiles live in `scripts/apparmor/`:
| Profile | Target | Default mode |
|---------|--------|--------------|
| `usr.bin.thorium` | Thorium browser | `complain` |
| `usr.local.bin.lm-studio` | LM Studio LLM runner | `complain` |
| `usr.bin.veilor-power` | Power profile switcher | `enforce` |
Profiles are **not** loaded automatically — they are opt-in until v0.5.
Enable a profile post-install with:
```bash
sudo dnf install apparmor-utils apparmor-parser
sudo install -m 0644 scripts/apparmor/usr.bin.thorium /etc/apparmor.d/
sudo apparmor_parser -r /etc/apparmor.d/usr.bin.thorium
# Then pick one mode:
sudo aa-complain /etc/apparmor.d/usr.bin.thorium   # log violations only
sudo aa-enforce /etc/apparmor.d/usr.bin.thorium    # block violations
```
Refine `complain`-mode profiles with `aa-logprof` after exercising the
app through normal use; it converts logged denials into rule additions
interactively.
## Audit log shipping (optional)
Local journald is the default audit sink. For off-device shipping to a
trusted log collector (Loki / Wazuh / Splunk), veilor-os ships a
disabled-by-default plugin template:
- `/etc/audit/plugins.d/veilor-remote.conf` — auditd plugin shim
(set `active = yes` to enable).
- `/etc/audisp/audisp-remote.conf.disabled` — audisp-remote target
config template (rename to `audisp-remote.conf` and edit
`remote_server` to enable).
**Warning:** enabling remote audit shipping leaks every privileged syscall,
file-watch hit, and auth event off-device. Treat the collector as a host
with the same trust level as root. Only enable if the collector itself is
hardened and the transport is TLS or kerberized.
Reference integration paths in the template: Loki via promtail/vector
syslog source, Wazuh via local wazuh-agent (no network shipping needed),
Splunk via HEC bridge.
## What's *not* enabled by default
- **Disk swap:** replaced by zram (RAM-only, no key leak risk).
- **Bluetooth:** disabled. Enable with `systemctl enable --now bluetooth`.
- **Printing:** CUPS removed. Reinstall if needed: `dnf install cups`.
- **Snapd, Flatpak:** not installed (Flatpak optional add-on).
- **PackageKit:** removed; updates manual via `dnf`.
## Memory pressure
veilor-os runs **zram-only swap** (see THREAT-MODEL.md: this keeps cleartext
session keys out of any persistent storage that would survive
suspend-to-disk or a yanked drive). That stance has a sharp edge: once
zram fills, there is no overflow tier. With stock kernel defaults the
result is a multi-minute thrash (compositor frozen, mouse stuck,
keyboard ignored) followed by a kernel OOM kill that picks the wrong
victim (often `plasmashell` or the foreground terminal) because the
runaway browser tab has a lower oom_score than the long-lived session
process. The user's desktop dies; the leaking app survives.
Three layers of mitigation ship by default:
| Layer | Source | What it does | Failure mode if absent |
|-------|--------|--------------|------------------------|
| **systemd-oomd** | enabled in `kickstart/veilor-os.ks` `%post` and in `bluebuild/recipe.yml` unit-toggle RUN | PSI-based pre-OOM killer: picks the cgroup under the highest memory pressure and terminates it *before* the kernel's global reaper fires. Reads from `/proc/pressure/*`, kills at the cgroup boundary so siblings survive. | Kernel waits until total exhaustion, then picks by oom_score → plasmashell / terminal die, browser tab keeps leaking. Mouse locks during the wait. |
| **zram-generator** override | `overlay/etc/systemd/zram-generator.conf` (and matching `%post` write) | 16 GiB compressed with `zstd` (~3:1 → ~48 GiB effective). Replaces Fedora default 8 GiB / lzo-rle. | 8 GiB fills under sustained pressure on 32+ GiB laptops running Chromium + LSP + chat. No overflow (no disk swap) → straight to OOM. |
| **vm.* sysctl** | `overlay/etc/sysctl.d/95-memory-pressure.conf` | `swappiness=180` (use zram early, since it is RAM-fast), `watermark_scale_factor=125` (kswapd starts reclaim at ~1.25% headroom vs default 0.1%), `page-cluster=0` (no read-ahead, which is pointless on RAM-backed swap and wastes decompress cycles). | Defaults `60 / 10 / 3` assume slow HDD swap. Kernel refuses to swap until allocations stall in direct-reclaim → thrash window before either oomd or kernel OOM acts. |
All three are co-dependent: oomd without zram tuning still wedges
briefly waiting for PSI to climb; zram tuning without oomd still gets
kernel-OOM victim selection wrong. Verified by the `test/boot-checklist.md`
"Memory pressure" section.
The rationale for each layer is also recorded in the headers of
`overlay/etc/sysctl.d/95-memory-pressure.conf` and
`overlay/etc/systemd/zram-generator.conf`, kept inline so the *why*
survives even if this doc is deleted.
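
The headroom and capacity figures quoted for the memory-pressure tuning follow from simple arithmetic, sketched below. The function names are illustrative only. The watermark calculation is an approximation that treats all RAM as a single zone, and the 3:1 ratio is this document's estimate for `zstd`, not a measurement.

```bash
#!/usr/bin/env bash
# Worked arithmetic for the memory-pressure figures (illustrative names).

# vm.watermark_scale_factor is expressed in units of 1/10000 of memory,
# so 125 is 1.25% and the default 10 is 0.1%.
# Usage: watermark_headroom_mib <ram_gib> <scale_factor>
watermark_headroom_mib() {
  local ram_mib=$(( $1 * 1024 ))
  echo $(( ram_mib * $2 / 10000 ))   # integer MiB, rounded down
}

# Usage: zram_effective_gib <zram_gib> <compression_ratio>
zram_effective_gib() {
  echo $(( $1 * $2 ))
}

echo "headroom at factor 125 on 32 GiB: $(watermark_headroom_mib 32 125) MiB"
echo "headroom at factor 10 on 32 GiB:  $(watermark_headroom_mib 32 10) MiB"
echo "zram 16 GiB at 3:1:               $(zram_effective_gib 16 3) GiB"
```

On a 32 GiB machine this prints 409 MiB for the tuned factor, 32 MiB for the default, and 48 GiB of effective zram capacity, which matches the ~1.25% headroom and ~48 GiB figures given for the tuning.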