Companion to the memory-pressure tuning (7d2b94b). Memory was only
half the "expensive laptop typing like a Chromebook" story — once
zram-only OOM thrash was solved, a second symptom class emerged:
post-boot CPU/IO contention on machines with high core counts.
Live incident on a 24-thread Ryzen AI 9 HX 370 / 30 GiB workstation,
2026-05-13: ~16 min after login, load avg 6.5, typing in konsole and
the address bar lagged 100s of ms. RAM/swap uncontended (8 GiB/30 GiB
used, zero swap), so the memory tuning was holding. PSI showed
cpu some=0.34 — pure scheduler contention.
Root cause: every Fedora unit ships with CPUWeight=[not set] which
maps to weight=100. Under contention the kernel splits CPU evenly
between every leaf cgroup. With the post-boot storm running
concurrently (plasma-discover ~80%, packagekitd ~33%, fwupd ~20%,
dnf-makecache firing) the compositor (kwin_wayland, plasmashell) was
losing scheduling fights against package metadata.
Three fixes shipped together:
1. system-bg.slice — CPUWeight=20, IOWeight=50, MemoryHigh=4G. Five
service drop-ins assign packagekit, fwupd, fwupd-refresh,
dnf-makecache, dnf5-automatic into it with Nice=10 and
IOSchedulingClass=idle. Proportional, not a hard cap — idle
systems still get full speed.
2. user-.slice.d/10-boost.conf — CPUWeight=300, IOWeight=200 on every
logged-in user session. Combined with above gives a 15:1
interactive:background ratio under contention.
3. Boot-storm sources defused: skel autostart shadow disables the
discover update notifier auto-launch; dnf-makecache.timer
OnBootSec=20min pushes metadata refresh past peak session
bring-up.
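
The 15:1 figure in item 2 is the ratio of the two weights. A minimal sketch of the arithmetic, assuming for simplicity that the boosted user slice and system-bg.slice compete head to head (in the real cgroup tree, weights are compared between siblings at each level, so this is an approximation). cpu_share is a hypothetical helper, not something shipped in this PR:

```shell
# cpu_share WEIGHT [OTHER_WEIGHT...]
# Prints the percentage of contended CPU time the first weight receives
# when every listed cgroup is runnable at the same time.
cpu_share() {
    awk 'BEGIN {
        total = 0
        for (i = 1; i < ARGC; i++) total += ARGV[i]
        printf "%.2f\n", 100 * ARGV[1] / total
    }' "$@"
}

cpu_share 300 20   # interactive side: 93.75
cpu_share 20 300   # background side:  6.25
```

With the stock weights (100 against 100) the same helper prints 50.00, which is the even split described in the root cause.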
One opt-in artifact: skel user-bg.slice (CPUWeight=30) for anyone
installing Syncthing, rclone, or a file indexer — drop a
Slice=user-bg.slice drop-in on the service to inherit the same
protection at the user level.
Verified live before opening this PR: load dropped 6.53 -> 3.55
within minutes of applying; cgroup placement confirmed via
systemd-cgls.
Follow-up filed in CHANGELOG (not in this PR): tuned-adm
"onyx-performance" profile silently falls back to balanced, and
EPP regresses to balance_performance on AC. Needs separate branch.
Changelog
All notable changes to veilor-os are documented here.
The format follows Keep a Changelog, and this project loosely follows Semantic Versioning during the pre-1.0 phase.
Each release section records the bug found and the fix applied so future maintainers can see why a change exists, not just what it changes.
[Unreleased]
Hardening: CPU/IO slice isolation for background services
Companion to the memory-pressure tuning (see prior entry). Memory was only half the story — once OOM thrash was solved, a second class of "why is my expensive laptop typing like a Chromebook" symptom emerged: post-boot CPU/IO contention.
Bug found
Live incident on a 24-thread Ryzen AI 9 HX 370 / 30 GiB workstation,
2026-05-13: ~16 minutes after login, load avg climbed to ~6.5, typing
in konsole and the address bar lagged by hundreds of ms. RAM and swap
were uncontended (8 GiB used / 30 GiB total, zero swap), so the
memory-pressure work was holding. PSI showed cpu some=0.34 — pure
scheduler contention.
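
The PSI figure above comes from the kernel's pressure-stall interface. A minimal sketch for reading it, assuming the quoted value was the avg10 window (the entry does not say which window was used); psi_some is a hypothetical helper name:

```shell
# psi_some [FILE] [WINDOW]
# Prints one averaging window from the "some" line of a PSI file.
# Defaults: /proc/pressure/cpu and avg10 (the window is an assumption).
psi_some() {
    file="${1:-/proc/pressure/cpu}"
    window="${2:-avg10}"
    awk -v key="$window" '$1 == "some" {
        for (i = 2; i <= NF; i++) {
            split($i, kv, "=")
            if (kv[1] == key) print kv[2]
        }
    }' "$file"
}
```

Calling psi_some with no arguments reads the live CPU pressure; passing /proc/pressure/io or /proc/pressure/memory as the first argument reads the other two resources in the same way.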
Root cause: every Fedora unit ships with CPUWeight=[not set]
(defaults to 100), so under contention the kernel's CPU scheduler splits CPU
evenly between every leaf cgroup. With the post-boot storm running
concurrently:
- plasma-discover (KDE update GUI, autostarted via /etc/xdg/autostart/org.kde.discover.notifier.desktop): ~80 % CPU doing repo metadata refresh
- packagekitd (the discover backend): ~33 %
- fwupd + fwupd-refresh: ~20 %
- dnf-makecache.timer firing in the same window
- kwin_wayland (~33 %) and plasmashell (~19 %) competing on equal footing with all of the above
The compositor lost scheduling fights against package metadata, hence
the typing lag. zram-only swap and vm.swappiness=180 are correct for
this stack but do nothing for a CPU-bound storm.
Fix applied
Two new slices in overlay/etc/systemd/system/:
- system-bg.slice: CPUWeight=20, IOWeight=50, MemoryHigh=4G. Drop-ins assign packagekit.service, fwupd.service, fwupd-refresh.service, dnf-makecache.service, and dnf5-automatic.service into it with Nice=10 and IOSchedulingClass=idle.
- user-.slice.d/10-boost.conf: CPUWeight=300, IOWeight=200 on every logged-in user session. Combined with the above, this gives a 15:1 interactive:background CPU ratio under contention. Idle systems still get full speed; weights are proportional, not hard caps.
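
For orientation, the slice and one of its drop-ins could look roughly like the files below. This is a sketch that writes them into a scratch directory rather than the real overlay: STAGE, the Description text, and the drop-in file name 10-slice.conf are assumptions, and only the settings named above are taken from this entry.

```shell
# Stage the files under a scratch root so the sketch can run unprivileged.
STAGE="${STAGE:-$(mktemp -d)}"
unit_dir="$STAGE/etc/systemd/system"
mkdir -p "$unit_dir/packagekit.service.d"

# The slice carries the proportional weights and the memory throttle.
cat > "$unit_dir/system-bg.slice" <<'EOF'
[Unit]
Description=Low-priority background system services

[Slice]
CPUWeight=20
IOWeight=50
MemoryHigh=4G
EOF

# The drop-in moves one service into the slice and lowers its priority.
# The file name is hypothetical; the changelog does not give it.
cat > "$unit_dir/packagekit.service.d/10-slice.conf" <<'EOF'
[Service]
Slice=system-bg.slice
Nice=10
IOSchedulingClass=idle
EOF
```

The other four services would get the same drop-in under their own .service.d directories.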
Two boot-storm sources defused:
- overlay/etc/skel/.config/autostart/org.kde.discover.notifier.desktop shadows the system autostart with Hidden=true. Updates still flow via dnf5-automatic.timer; users can launch Discover manually. No GUI fires at session start.
- dnf-makecache.timer.d/10-delay.conf pushes OnBootSec=20min so metadata refresh lands past peak session bring-up.
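
A sketch of what those two files might contain, again staged into a scratch directory (STAGE is a hypothetical variable, and placing the timer drop-in under etc/systemd/system is an assumption). One detail worth noting: OnBootSec= can be given more than once in a timer, and an empty assignment resets the list, so the sketch clears the stock value first. Whether the shipped drop-in does the same is not stated here.

```shell
STAGE="${STAGE:-$(mktemp -d)}"
autostart_dir="$STAGE/etc/skel/.config/autostart"
timer_dir="$STAGE/etc/systemd/system/dnf-makecache.timer.d"
mkdir -p "$autostart_dir" "$timer_dir"

# A per-user entry with the same file name overrides the system-wide one;
# Hidden=true tells the session to ignore it.
cat > "$autostart_dir/org.kde.discover.notifier.desktop" <<'EOF'
[Desktop Entry]
Hidden=true
EOF

# The empty assignment clears any OnBootSec= from the stock timer, so
# 20min replaces it instead of becoming an additional trigger.
cat > "$timer_dir/10-delay.conf" <<'EOF'
[Timer]
OnBootSec=
OnBootSec=20min
EOF
```

On a running system, `systemctl cat dnf-makecache.timer` shows the stock unit together with any drop-ins, which is a quick way to confirm the override is being picked up.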
One opt-in artifact for users:
- overlay/etc/skel/.config/systemd/user/user-bg.slice (CPUWeight=30, IOWeight=50, MemoryHigh=3G). Veilor-os does not ship sync tools by default, but anyone installing Syncthing / rclone / a file indexer can drop a Slice=user-bg.slice drop-in on the service and inherit the same protection at the user level.
Verified live (post-incident workstation, before opening the PR):
| slice | CPUWeight | IOWeight | MemoryHigh |
| --- | --- | --- | --- |
| system-bg.slice | 20 | 50 | 4G |
| user-1000.slice | 500 | 500 | infinity |
| user-bg.slice | 30 | 50 | 3G |
cgroup placement confirmed via systemd-cgls: packagekit.service
under /system.slice/system-bg.slice/, syncthing.service under
/user.slice/user-1000.slice/.../user-bg.slice/. Load dropped from
6.53 → 3.55 within minutes of applying, and typing in the compositor
recovered immediately on the next contention event.
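
The property check can be scripted. A minimal sketch, assuming the values are read with `systemctl show -p ...`, which prints one Key=Value line per property; check_props is a hypothetical helper:

```shell
# check_props EXPECTED...
# Reads "Key=Value" lines on stdin and fails if any expected line is missing.
check_props() {
    actual=$(cat)
    status=0
    for want in "$@"; do
        if ! printf '%s\n' "$actual" | grep -qxF -- "$want"; then
            echo "mismatch: expected $want" >&2
            status=1
        fi
    done
    return "$status"
}
```

Typical use would be `systemctl show -p CPUWeight -p IOWeight system-bg.slice | check_props CPUWeight=20 IOWeight=50`. MemoryHigh is reported in bytes by systemctl show, so a 4G limit would have to be compared as 4294967296.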
Follow-up surfaced during this work (not in this PR)
While debugging "still feels laggy after slice fix" on the same workstation, found two power-profile bugs worth a separate investigation:
- tuned-adm active reported balanced despite the system being on AC + charging. EPP was balance_performance and all 24 cores sat pinned at scaling_min_freq (605 MHz): typing latency was the CPU refusing to ramp on short bursts, even with no contention. Manually setting EPP to performance and switching to the stock throughput-performance profile restored snappy input.
- tuned-adm profile onyx-performance (shipped via overlay/etc/tuned/profiles/) silently fell back to balanced instead of activating. No errors in journalctl -u tuned. The profile config or its tuned.conf script likely has a bad exit somewhere; needs reproduction in CI and a test that asserts tuned-adm active matches what was requested.
Both are tracked for a follow-up branch — out of scope here because this PR only covers cgroup/slice isolation. Filing now so it does not get lost.
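
The assertion asked for in the second item could be as small as the sketch below. It assumes the usual `tuned-adm active` output line of the form "Current active profile: NAME"; active_profile and assert_profile are hypothetical helper names, not part of the repo.

```shell
# Extract the profile name from `tuned-adm active` output on stdin.
active_profile() {
    sed -n 's/^Current active profile: //p'
}

# assert_profile WANTED: fail loudly when the active profile differs.
assert_profile() {
    want=$1
    got=$(active_profile)
    if [ "$got" != "$want" ]; then
        echo "tuned profile mismatch: requested $want, active ${got:-none}" >&2
        return 1
    fi
}
```

In CI this would be chained as `tuned-adm profile onyx-performance && tuned-adm active | assert_profile onyx-performance`, turning the silent fallback into a failing step.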
v0.7 BlueBuild OCI spike (active — v0.7-bluebuild-spike)
CI plumbing landed (~13 fixes) to unblock the first green BlueBuild run on the self-hosted Forgejo runner. Build still red as of 2026-05-08; OCI artifact + installer ISO pending green run.
Forgejo runner + build-image plumbing
- Forgejo runner upgraded to v6.4.0 with userns-remap=default. Buildah needs --userns=host to undo the remap inside the job; added to every bluebuild build invocation.
- Custom build image veilor-build:43 (fedora:43 + nodejs + buildah deps). Replaces the upstream BlueBuild image, which lacked Forgejo-runner-friendly tooling.
- Workflow now runs-on: nullstone (single self-hosted runner, no nested docker).
- Build timeout bumped 60 min → 360 min to absorb first-time secureblue base pulls on a cold runner.
Signing + registry auth
- cosign v2.4.1 installed from upstream binary (no Fedora RPM yet for v2.4.x).
- GHCR PAT login added so the BlueBuild step can pull ghcr.io/secureblue/kinoite-main-hardened (rate-limited anonymous).
- cosign keypair signing: keyless OIDC fails on Forgejo (no Sigstore Fulcio integration), so we ship a static keypair under the repo and sign with cosign sign --key. Public key checked in for verification.
BlueBuild recipe pivots
- Base image switched to ghcr.io/secureblue/kinoite-main-hardened (the actual published image). Prior reference to securecore-kinoite-hardened-userns was a planning-phase guess and did not exist.
- Module type pivots driven by buildah-privileged + bind-mounted helper scripts hitting chmod-permitted blockers:
  - type: files → type: copy (files module's chmod step failed under bind-mount).
  - type: script + type: systemd → type: containerfile RUN (single layer, no helper-script bind-mount).
Installer ISO — pivoted
- livemedia-creator → bootc-image-builder. livemedia-creator does not support the ostreecontainer install method (only ostreesetup/url/nfs), so the v0.7 path required the swap. Build pending OCI artifact.
Docs
- This CHANGELOG entry.
- ROADMAP refresh — v0.5.0 marked done, v0.7 OCI marked in-flight, installer-iso pivot recorded, USB install-log persistence default-on promise documented, v1.0 ship criteria carried over.
Infra (out-of-tree, recorded for traceability)
- 2026-05-08: Headscale OIDC 403 fixed by adding 172.20.0.0/24 (docker proxy bridge gateway) to the no-guest@file Traefik middleware allowlist on nullstone. Unblocks tag:guest provisioning for veilor-os clients.
- All GitHub remotes removed from veilor-os local clones, six worktrees, and sibling projects (auth-limbo, minecraft-launcher, minecraft-server, infra). GH push-mirrors disabled. Forgejo-only since 2026-05-05.
Planned (deferred / parking)
- v0.3 polish: Plymouth black theme, SDDM theme, Konsole profile, wallpaper SVG. Re-enable init_on_alloc=1 init_on_free=1 post-install via veilor-firstboot so live boot stays fast but installed system keeps the memory hygiene.
- USBGuard auto-snapshot on first boot.
- veilor-firstboot UX improvements (cleaner banner, better error paths).
[0.5.0] — 2026-05-06
Tag: v0.5.0 — final kickstart-path release.
The hardened-Fedora-43 kickstart line ships. Future work moves to the v0.7 BlueBuild OCI spike; the kickstart retires at v1.0.
Added
- First green Forgejo-CI ISO build (~2.7 GB live ISO, EFI + BIOS bootable). Released as ci-latest artifact at git.s8n.ru/veilor-org/veilor-os/releases/tag/ci-latest.
- gum TUI installer wrapping Anaconda: single LUKS prompt, locale locked to en_US.UTF-8, admin-password first-boot flow.
- LUKS2 argon2id + btrfs subvols install via Anaconda, written through /etc/kernel/cmdline so BLS entries carry the cmdline veilor needs.
- 3-mode veilor-power CLI (save | mid | perf) with AC/battery udev auto-switching, lifted into the overlay.
- KDE black theme + Fira Code system font, branded /etc/os-release, GRUB rebrand, plymouth detail-text boot.
- Hardening: SELinux enforcing, USBGuard default-block, fail2ban + auditd, firewalld drop zone, NTS chrony, DNS-over-TLS, locked root.
- Self-hosted Forgejo CI on nullstone replaces the GitHub Actions build pipeline.
Fixed (delta from v0.2.5 → v0.5.0 — 35+ failure classes)
The full v0.5.x grind is documented per-release in commit messages (v0.5.21–v0.5.32). Headline fixes:
- --location=none skipped CollectKernelArgumentsTask. Anaconda shipped BLS entries with empty cmdline. Fix: write /etc/kernel/cmdline directly + /etc/default/grub + grubby + explicit kernel-install add. (v0.5.31)
- transaction_progress.py install scroll masked real failures when patched too broadly. Narrowed the patch to only suppress Configuring xxx.x86_64. (v0.5.28 → v0.5.29)
- Locale dialog raced anaconda startup. Lock to en_US.UTF-8, defer locale choice to veilor-postinstall (v0.7 scope). (v0.5.28)
- fbcon=nodefer + GRUB rebrand + ASCII gum cursor make the install flow legible on linux fbcon. (v0.5.27)
- rd.luks.uuid injected via grubby --update-kernel=ALL in chroot %post; earlier releases relied on Anaconda, which silently dropped it. (v0.5.23, v0.5.27)
- 9-agent research wave identified the v0.5.32 blocker map; 7 blockers shipped in one bundle.
Notes
- Treat v0.5.0 as the portfolio anchor for the kickstart path. v0.5.32-rc was the last test-run; v0.5.0 was tagged on 2026-05-06 as the freeze point.
- v0.6 was cancelled the same day (folded into v0.7). See docs/ROADMAP.md strategy-pivot section.
[0.2.5] — 2026-05-01
Commit: 8515bdb
Fixed
- Live boot took 5+ minutes on KVM. Dracut sat at the parse-livenet stage for what looked like a hang. Root cause: init_on_alloc=1 and init_on_free=1 zero every memory page on allocation and free. In a virtualised guest with paravirtual memory, this multiplied the early-boot cost by ~5x. Removed both flags from the live kernel cmdline.
Notes
- The two memory-hygiene flags will be re-added on the installed system via veilor-firstboot in v0.3: the cost on bare metal is negligible, and the live-ISO penalty is the only place it bites.
- Live cmdline retained: lockdown=integrity slab_nomerge randomize_kstack_offset=on vsyscall=none.
[0.2.4] — 2026-05-01
Commit: a23ce63
Fixed
- VM booted but stalled at dracut "parse-livenet" looking for a label that never matched. Root cause: an upstream bug in livecd-tools. imgcreate/live.py::__get_efi_image_stanza() writes the EFI grub stanza as root=live:LABEL=... for dracut. Dracut on live ISOs expects live:CDLABEL=... for ISO9660 volume labels; LABEL= matches partition labels, which a live ISO doesn't have.
- Patched live.py in-place inside the CI build container before invoking livecd-creator. With the patched stanza, the VM booted cleanly to the SDDM login prompt.
Changed
- CI workflow now seds the patch into the installed live.py and asserts the patch landed before continuing the build.
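
A sketch of what such a patch-and-assert step might look like. The exact text inside live.py is an assumption based on the stanza described above, patch_live_py is a hypothetical helper, and `sed -i` is used in its GNU form as found in a Fedora build container.

```shell
# patch_live_py FILE
# Rewrites root=live:LABEL= to root=live:CDLABEL= and verifies the result.
patch_live_py() {
    file=$1
    sed -i 's/root=live:LABEL=/root=live:CDLABEL=/g' "$file"
    # Assert the patch landed: the new form must be present...
    if ! grep -q 'root=live:CDLABEL=' "$file"; then
        echo "patch did not apply to $file" >&2
        return 1
    fi
    # ...and the old form must be gone.
    if grep -q 'root=live:LABEL=' "$file"; then
        echo "unpatched stanza still present in $file" >&2
        return 1
    fi
}
```

The substitution is idempotent, so re-running the step on an already patched file still passes the assertion.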
Notes
- Bug also affects livemedia-creator --make-iso --no-virt and any other consumer of imgcreate.LiveImageCreator. Worth filing upstream once we have a clean repro recipe.
[0.2.3] — 2026-05-01
Commit: ef54a24
Added
- Manual useradd admin invocation in chroot %post. livecd-creator does not run an installer phase, so the kickstart user directive is silently ignored. Without this, the booted live system has no admin account at all, and SDDM falls back to "no users", making login impossible.
Fixed
- /etc/os-release was still pointing at stock Fedora. Even with the overlay tree successfully copied, kde-theme-apply.sh was resolving /etc/os-release.d/veilor from the wrong path (the build host's repo, not the overlay's installed location).
- Rewired the symlink chain cleanly: /etc/os-release → ../usr/lib/os-release, with the override file written to /usr/lib/os-release directly during %post.
- Branding now reflects veilor-os in /etc/os-release, hostnamectl, and the SDDM session menu.
Notes
- The user --name=admin directive stays in the kickstart for documentation and for any future livemedia-creator-based installer ISO that does honour it.
[0.2.2] — 2026-05-01
Commit: 3408841
Fixed
- Overlay was partially copied: boot worked but veilor-power, KDE theme, custom scripts were all missing. Found via offline debugfs inspection of the v0.2.1 rootfs: tuned profiles, sshd hardening, sudoers entries, and systemd units were present, but /usr/share/veilor-os/{assets,scripts} was empty.
- Root cause: %post --nochroot ran with set -eu. When the first cp of a non-essential overlay file returned non-zero, the script aborted, leaving the assets/scripts copy step un-executed. None of the chroot %post scripts could then find what they needed and they silently no-op'd.
Changed
- %post --nochroot now uses set +e around cp/mkdir so a partial-permissions error on one tree doesn't kill the whole copy.
- Added /var/log/veilor-nochroot.log: every action in %post --nochroot now traces with timestamps. Future debugging is one journalctl --boot away.
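
A minimal sketch of the pattern, not the shipped script: errexit is relaxed only around the copy, every outcome is logged with a timestamp, and the caller's strict mode is restored afterwards. The helper names log and copy_tree and the log line wording are assumptions; only the log path comes from this entry.

```shell
LOG="${LOG:-/var/log/veilor-nochroot.log}"

# Append one timestamped line to the trace log.
log() {
    printf '%s %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$*" >> "$LOG"
}

# copy_tree SRC DST: copy a directory tree, logging failure instead of aborting.
copy_tree() {
    src=$1
    dst=$2
    # Remember whether the caller runs under set -e so it can be restored.
    case $- in *e*) had_errexit=1 ;; *) had_errexit=0 ;; esac
    set +e
    mkdir -p "$dst" && cp -a "$src/." "$dst/"
    rc=$?
    if [ "$had_errexit" -eq 1 ]; then set -e; fi
    if [ "$rc" -eq 0 ]; then
        log "copied $src -> $dst"
    else
        log "FAILED ($rc) copying $src -> $dst, continuing"
    fi
    return 0
}
```

Because copy_tree always returns 0, a failure in one tree is recorded in the log but cannot stop the trees that come after it, which is the behaviour the v0.2.1 script lacked.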
Notes
- The looser error handling is intentional but bounded: only the overlay copy uses set +e. Hardening scripts that follow run with strict mode.
[0.2.1] — 2026-05-01
Commit: 9c6136f
Fixed
- ISO booted, but it was effectively bare Fedora KDE. No hardening, no theme, no veilor-power, no /etc/os-release override. Confirmed by mounting v0.2.0 with debugfs: /etc/os-release symlinked to ../usr/lib/os-release (Fedora's default), no /usr/share/veilor-os, no overlay files anywhere.
- Root cause: %post --nochroot hardcoded /mnt/sysimage as the destination. /mnt/sysimage is the livemedia-creator install root. We had switched the build pipeline to livecd-creator, which exposes the destination as $INSTALL_ROOT, a different path inside its tmpfs sandbox.
- Switched the copy target to $INSTALL_ROOT.
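
One way to make the destination explicit is sketched below. The fix in this release simply switched to $INSTALL_ROOT; the fallback to /mnt/sysimage and the helper name overlay_dest are assumptions added for illustration.

```shell
# overlay_dest: print the install root the overlay should be copied into.
# Prefers $INSTALL_ROOT (livecd-creator); falls back to /mnt/sysimage
# (livemedia-creator) only if that directory actually exists.
overlay_dest() {
    if [ -n "${INSTALL_ROOT:-}" ]; then
        printf '%s\n' "$INSTALL_ROOT"
    elif [ -d /mnt/sysimage ]; then
        printf '%s\n' /mnt/sysimage
    else
        echo "no install root found: INSTALL_ROOT unset and /mnt/sysimage missing" >&2
        return 1
    fi
}
```

Failing loudly when neither location exists avoids the v0.2.0 outcome, where the copy ran against a path that was not the install root and nothing flagged it.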
Notes
- Partial overlay landed in v0.2.1 (tuned, sshd, sddm.conf), but /usr/share/veilor-os/{assets,scripts} was still missing because set -eu aborted partway through the cp tree. That fix is in v0.2.2.
- Lesson learned: tooling-specific environment variables matter. $INSTALL_ROOT is the portable answer; /mnt/sysimage is a livemedia-creator-only convention.
[0.2.0] — 2026-04-30
Commit: 7c4a94d (tagged release)
Added
- First green ISO. Reproducible build pipeline lands.
- GitHub Actions workflow build-iso.yml produces a UEFI+BIOS-bootable live ISO from kickstart/veilor-os.ks.
- CI: kickstart syntax linting (ksvalidator) gate.
- Kickstart based on Fedora 43, KDE Plasma minimal, hardening packages selected (fail2ban, usbguard, tuned, audit, firewalld).
- Overlay tree authored: tuned profiles, sshd hardening, sysctl drop-in, sudoers, udev rules, KDE theme assets, Fira Code font.
- 3-mode power profiles: veilor-power save | mid | perf with AC/battery udev auto-switching.
Notes — known limitations of v0.2.0
- The overlay never actually applied to the installed system. The %post --nochroot copy step targeted /mnt/sysimage (livemedia-creator's install root) but the build pipeline had moved to livecd-creator, which uses $INSTALL_ROOT. Result: the ISO boots and presents a working KDE Plasma desktop, but it is in practice stock Fedora 43 KDE with no veilor-os hardening, branding, theme, or power scripts applied.
- v0.2.0 is best understood as a build-pipeline milestone: the ISO format, EFI/BIOS bootability, partitioning, and squashfs build all work end-to-end. The userspace customisation layer was wired but not delivering. Treat v0.2.0 as proof-of-build, not as a feature-complete release.
- See v0.2.5 for the first feature-complete ISO that actually ships veilor-os hardening and branding into the running system.
Build pipeline path to green
For posterity, the issues resolved between v0.1 (scaffold) and v0.2.0 (first green ISO):
- pcre2 / selinux-policy version skew on stock Fedora 43 base: worked around with a pinned fix-repo for the local build only; CI uses dnf upgrade --refresh to sidestep entirely.
- KDE Plasma hard-deps (cups, geoclue2, ModemManager, PackageKit): kept at the package level, masked at the daemon level.
- %post --nochroot source path: multi-path detection added so the overlay can be sourced from /work (CI) or /run/install/repo (virt) or kickstart-relative (no-virt).
- livemedia-creator --make-iso --no-virt produced a squashfs but no EFI/BOOT image. Switched to livecd-creator (livecd-tools), which is purpose-built for live ISOs and handles EFI grafting.
- Tmpdir on /tmp exhausted the GitHub Actions tmpfs cap (16GB vs ~30GB working set). Moved to /var/lmc on the runner's host ext4.
[0.1.0] — 2026-04-29
Commit: 1822005
Added
- Initial repo scaffold: kickstart/, build/, overlay/, scripts/, assets/, docs/, test/.
- Kickstart skeleton (Fedora 43 KDE base, single-prompt LUKS install, hardened bootloader cmdline, locked root, blank-password admin with chage -d 0 to force first-boot reset).
- Hardening scripts ported and rebranded from operator's reference system: base hardening, kernel hardening, custom SELinux policy module veilor-systemd.
- KDE theme: BreezeBlackPure base + grey accent (#686B6F).
- Fira Code chosen as system font (Fedora fira-code-fonts, SIL OFL 1.1).
- Test harness: VM runner (test/run-vm.sh) with QEMU + OVMF for fast iteration, with SECBOOT=1 and FRESH=1 modes.
- Documentation: BUILD.md, INSTALL.md, HARDENING.md, POWER.md, boot-checklist.md.
Notes
- v0.1 was scaffold-only — no green ISO yet. Build pipeline iterated through ~22 distinct toolchain issues before producing v0.2.0.
- All onyx references stripped from shipped artifacts; comments refer to "reference system" only.