veilor-os/bluebuild/recipe.yml
veilor-org 7d2b94b5be feat(hardening): add memory-pressure tuning for zram-only stack
veilor-os runs zram-only swap (THREAT-MODEL.md — no key leak from
disk swap). With kernel defaults that policy bites: once zram fills
there is no overflow tier, the kernel waits until total exhaustion
to trigger OOM, then picks a victim by oom_score and frequently
kills plasmashell or the foreground terminal instead of the leaking
browser tab. Mouse locks for minutes during the thrash window.

Three co-dependent layers:

1. systemd-oomd enabled — PSI-based pre-OOM killer fires at cgroup
   boundaries before the kernel reaper. Fedora's systemd-oomd-defaults
   ship sane thresholds for user.slice; installed in kickstart and
   layered in bluebuild containerfile, enabled in both unit-toggle
   blocks.

2. zram bumped 8 GiB lzo-rle (Fedora default) -> 16 GiB zstd. zstd
   gives ~3:1 (~48 GiB effective) at negligible CPU cost on any
   post-2018 x86_64. 8 GiB filled in practice on 32+ GiB laptops
   running Chromium + LSP + chat clients.

3. /etc/sysctl.d/95-memory-pressure.conf:
   - vm.swappiness=180 (zram is RAM-fast, swap early; default 60
     assumes HDD)
   - vm.watermark_scale_factor=125 (kswapd starts reclaim at ~1.25%
     headroom vs default 0.1%; ~400 MiB head start on 32 GiB)
   - vm.page-cluster=0 (no read-ahead; pointless on RAM-backed swap,
     wastes decompress)
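
The sizing figures in items 2 and 3 can be re-derived with integer arithmetic. The sketch below is illustrative only: `headroom_mib` and `effective_gib` are hypothetical helper names, the 3:1 ratio is the commit's estimate and not a measurement, and the kernel applies watermark_scale_factor per zone, so the result is a rough system-wide figure.

```shell
#!/bin/sh
# Sketch: reproduce the headroom and capacity arithmetic from the list above.
set -eu

# headroom_mib RAM_MIB FACTOR (hypothetical helper)
# vm.watermark_scale_factor is expressed in units of 1/10000 of memory.
headroom_mib() {
  echo $(( $1 * $2 / 10000 ))
}

# effective_gib ZRAM_GIB RATIO (hypothetical helper)
# RATIO is an assumed compression ratio, not a measured one.
effective_gib() {
  echo $(( $1 * $2 ))
}

ram_mib=$(( 32 * 1024 ))
echo "watermark headroom, default factor 10: $(headroom_mib "$ram_mib" 10) MiB"
echo "watermark headroom, tuned factor 125:  $(headroom_mib "$ram_mib" 125) MiB"
echo "zram 16 GiB at assumed 3:1: $(effective_gib 16 3) GiB effective"
```

On a 32 GiB machine this gives 32 MiB at the default factor and 409 MiB at 125, which is where the "~400 MiB head start" comes from.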

Without any one of the three the system still wedges briefly: oomd
without zram tuning waits for PSI to climb; zram tuning without oomd
gets victim selection wrong.
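
For orientation, the two file-based layers (zram and sysctl) could be staged roughly as follows. This is a sketch under assumptions, not the contents of the actual overlay files: `stage_memory_tuning` is a hypothetical helper, the zram drop-in path `/etc/systemd/zram-generator.conf` and the fixed `zram-size = 16384` (MiB) are assumed, and only the three sysctl values and the sysctl file name come from this commit message.

```shell
#!/bin/sh
# Sketch: write the zram and sysctl drop-ins under a staging root.
set -eu

# stage_memory_tuning ROOT (hypothetical helper)
stage_memory_tuning() {
  root=$1
  mkdir -p "$root/etc/sysctl.d" "$root/etc/systemd"

  # Values taken from the commit message above.
  cat > "$root/etc/sysctl.d/95-memory-pressure.conf" <<'EOF'
# zram is RAM-backed: swap early, reclaim early, no read-ahead.
vm.swappiness = 180
vm.watermark_scale_factor = 125
vm.page-cluster = 0
EOF

  # Assumption: a fixed 16 GiB device; zram-size is given in MiB.
  cat > "$root/etc/systemd/zram-generator.conf" <<'EOF'
[zram0]
zram-size = 16384
compression-algorithm = zstd
EOF
}

# Demo: stage into a throwaway directory and list what was written.
stage=$(mktemp -d)
stage_memory_tuning "$stage"
find "$stage" -type f | sort
```

Staging into a scratch root keeps the sketch safe to run on any machine; the real files ship through the overlay copy, not through a script like this.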

Verified by new test/boot-checklist.md "Memory pressure" section.
Inline rationale headers in both overlay files so the why survives
doc drift. Trigger event: onyx (Fedora 43, not veilor-os) thrashed
2026-05-11; same defaults shipped to veilor-os, fixed here too.
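
A post-boot check of the three sysctl values might look like the sketch below. It is not the content of test/boot-checklist.md, which is not shown here; `check_vm_sysctls` is a hypothetical helper, and its optional root argument exists only so the check can be pointed at a fake tree for testing.

```shell
#!/bin/sh
# Sketch: compare live vm.* sysctls against the values this commit sets.
set -eu

# check_vm_sysctls [ROOT] (hypothetical helper)
# ROOT defaults to /proc/sys. The body runs in a subshell, so "exit"
# only sets the function's return status.
check_vm_sysctls() (
  root=${1:-/proc/sys}
  rc=0
  while read -r key want; do
    # sysctl names map to paths by replacing dots with slashes
    path=$root/$(printf '%s' "$key" | tr . /)
    got=$(cat "$path" 2>/dev/null || echo missing)
    if [ "$got" = "$want" ]; then
      echo "ok   $key=$got"
    else
      echo "FAIL $key=$got (want $want)"
      rc=1
    fi
  done <<'EOF'
vm.swappiness 180
vm.watermark_scale_factor 125
vm.page-cluster 0
EOF
  exit "$rc"
)
```

Running `check_vm_sysctls` with no argument on a booted system prints one line per key and returns non-zero if any value differs.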
2026-05-12 10:17:00 +01:00

165 lines
8.3 KiB
YAML

# veilor-os — BlueBuild recipe (v0.7 spike, 1-day target)
#
# Extends secureblue's hardened Kinoite OCI image with veilor branding,
# threat-model-driven UX choices, and the three-layer mesh stack
# (Tailscale + Yggdrasil + opt-in Reticulum). This is the OCI image
# that the v0.7+ kickstart's `ostreecontainer` directive pulls into
# the target root during the install pass.
#
# Build: bluebuild build recipe.yml
# Test: podman run --rm -it ghcr.io/veilor-org/veilor-os:43 /bin/bash
# CI: .github/workflows/build-bluebuild.yml signs + pushes to GHCR.
#
# Reference: https://blue-build.org/reference/recipe/
#
# ── Module collapse history ──────────────────────────────────────
# Run 183 (2026-05-08) ate 3h10min before runner timeout: each RUN/COPY
# layer COMMIT under fuse-overlayfs over secureblue's 130-layer hardened
# base costs ~40min wallclock (STEP 10..13 each 38-43min). Ergo: every
# saved module = ~40min saved. Collapsed (A1b):
# - 5× rpm-ostree → 1× (-4 layers)
# - 2× containerfile (brand sed + systemctl enable) → 1× (-1 layer)
# - 4× copy left as-is — BlueBuild copy module is one src/dest per
# entry per https://blue-build.org/reference/modules/copy/
# Net: 12 → 7 modules, ~5×40min ≈ 3h20min off wallclock budget.
#
# Run 189 + 191 (2026-05-08) — surviving rpm-ostree module hit the same
# `chmod: Operation not permitted` bug we already worked around for
# type:files / type:script / type:systemd: BlueBuild's helper scripts
# (here `/tmp/modules/rpm-ostree/rpm-ostree.sh`) try to chmod themselves
# inside their own buildah bind-mount under userns=host and fail.
#
# A1c fix: drop type:rpm-ostree, fold its install list into the existing
# containerfile module as a raw RUN. Per BB containerfile docs each
# `snippets:` entry = its own layer, so we MERGE pkg-install + brand +
# systemctl into ONE snippet (= one RUN, one layer). Ordering: install
# packages first (yggdrasil/tailscale/etc must exist before systemctl
# enable/disable touches their units), then brand sed, then unit toggles.
# `ostree container commit` ends the snippet because BB's rpm-ostree
# module wraps it implicitly; raw RUN must do it manually for parity.
# Mullvad + Tailscale repo files curl'd in same RUN — secureblue base
# does not ship either repo, and the previous type:rpm-ostree must have
# silently failed earlier (build never got that far in 189/191).
# Net: 7 → 6 modules, one more layer commit avoided.
---
name: veilor-os
description: Hardened security-branded Fedora KDE on top of secureblue.
# Base image: secureblue's hardened Kinoite variant with userns sandboxing.
# That brings in: sysctl + kargs + custom SELinux policy + USBGuard +
# hardened-malloc + Unbound DoT + chronyd NTS + Trivalent browser.
base-image: ghcr.io/secureblue/kinoite-main-hardened
image-version: latest
modules:
  # ── 1. veilor branding overlay ──────────────────────────────────
  # `type: copy` is a low-level direct COPY (no chmod, no script).
  # `type: files` was failing with `chmod: Operation not permitted` on
  # the BlueBuild-shipped /tmp/modules/files/files.sh under buildah +
  # podman privileged in our runner — the script tries to make itself
  # executable inside its own bind-mounted layer.
  #
  # NOTE: Each copy module = one COPY layer (~40min commit on our
  # runner). BlueBuild's copy module accepts a single src/dest pair
  # only, so these four entries are the floor unless we move to a
  # hand-rolled Containerfile.
  - type: copy
    source: ../overlay
    destination: /
  - type: copy
    source: ../assets
    destination: /usr/share/veilor-os/assets
  - type: copy
    source: ../scripts
    destination: /usr/share/veilor-os/scripts
  - type: copy
    source: config/just
    destination: /usr/share/ublue-os/just
  # ── 2. Packages + branding + unit toggles in ONE RUN snippet ────
  # secureblue removes sudo + replaces with run0 (too disruptive for
  # daily-driver) — restore. Xwayland was disabled for attack-surface
  # reduction — restore for Element/Slack/Qt5 apps. Mullvad Browser
  # layered alongside Trivalent (Trivalent default per STRATEGY.md;
  # Mullvad for pseudonymous browsing). Mesh stack: Tailscale (Layer
  # 1, daily driver, pre-disabled), Yggdrasil-go (Layer 2, idle warm-
  # fallback). Reticulum/RetiNet stays opt-in via ujust. Memory
  # hygiene + ergonomic deps for veilor-postinstall + veilor-doctor.
  #
  # Repos: secureblue base ships neither mullvad nor tailscale repos.
  # curl them into /etc/yum.repos.d/ inside the same RUN, before the
  # rpm-ostree install. Both pinned to upstream stable for Fedora.
  #
  # Branding + unit toggles run in the same RUN (= same layer) AFTER
  # rpm-ostree install so systemctl enable yggdrasil / disable tailscaled
  # see their unit files.
  #
  # Helper-script avoidance: BlueBuild's `type: rpm-ostree` /
  # `type: files` / `type: script` / `type: systemd` modules all hit
  # `chmod: Operation not permitted` on their own bind-mounted helper
  # script under buildah userns=host (run 189 + 191, last-frame error:
  # `chmod: changing permissions of '/tmp/modules/rpm-ostree/rpm-ostree.sh':
  # Operation not permitted`). Raw `type: containerfile` RUN bypasses
  # the whole helper-script layer.
  #
  # ostree container commit at the end mirrors what BB's wrapped
  # rpm-ostree module does implicitly — finalizes the layer for the
  # secureblue / Universal Blue base expectation.
  #
  # brand-leak grep moved to CI smoke-test in build-bluebuild.yml
  # (STEP 14 hung under buildah overlayfs, run 171 2026-05-07).
  - type: containerfile
    snippets:
      - |
        RUN set -euo pipefail ; \
            curl -fsSL https://repository.mullvad.net/rpm/stable/mullvad.repo \
              -o /etc/yum.repos.d/mullvad.repo ; \
            curl -fsSL https://pkgs.tailscale.com/stable/fedora/tailscale.repo \
              -o /etc/yum.repos.d/tailscale.repo ; \
            rpm-ostree install \
              sudo \
              xorg-x11-server-Xwayland \
              mullvad-browser \
              tailscale \
              yggdrasil \
              zram-generator \
              systemd-oomd-defaults \
              jq \
              vim-enhanced \
              tmux \
              htop ; \
            { sed -i -e 's|^GRUB_DISTRIBUTOR=.*|GRUB_DISTRIBUTOR="veilor-os"|' /etc/default/grub 2>/dev/null || true ; \
              bash /usr/share/veilor-os/scripts/kde-theme-apply.sh 2>/dev/null || true ; \
              bash /usr/share/veilor-os/scripts/30-apply-v03-theme.sh 2>/dev/null || true ; \
              plymouth-set-default-theme details 2>/dev/null || true ; \
              chmod +x /usr/share/veilor-os/scripts/*.sh \
                       /usr/share/veilor-os/scripts/selinux/*.sh \
                       /usr/local/bin/veilor-* 2>/dev/null || true ; \
              fc-cache -f 2>/dev/null || true ; \
              if [ -f /etc/os-release ]; then \
                sed -i \
                  -e 's|^NAME=.*|NAME="veilor-os"|' \
                  -e 's|^PRETTY_NAME=.*|PRETTY_NAME="veilor-os 0.7 (atomic)"|' \
                  -e 's|^ID=.*|ID=veilor|' \
                  -e 's|^ID_LIKE=.*|ID_LIKE="fedora kinoite"|' \
                  /etc/os-release || true ; \
              fi ; \
              systemctl enable yggdrasil.service 2>/dev/null || true ; \
              systemctl disable tailscaled.service 2>/dev/null || true ; \
              systemctl enable veilor-firstboot.service 2>/dev/null || true ; \
              systemctl enable veilor-modules-lock.service 2>/dev/null || true ; \
              systemctl enable veilor-postinstall.service 2>/dev/null || true ; \
              systemctl enable veilor-doctor.timer 2>/dev/null || true ; \
              systemctl enable systemd-oomd.service 2>/dev/null || true ; \
            } ; \
            rpm-ostree cleanup -m ; \
            ostree container commit
  # ── 4. signing config ───────────────────────────────────────────
  # cosign.pub committed alongside this recipe; cosign.key kept off
  # repo and provided to CI as Forgejo secret COSIGN_PRIVATE_KEY.
  # The action exports it to /tmp at build time.
  - type: signing