feat(hardening): CPU/IO slice isolation for background services
Companion to the memory-pressure tuning (7d2b94b). Memory was only
half the "expensive laptop typing like a Chromebook" story — once
zram-only OOM thrash was solved, a second symptom class emerged:
post-boot CPU/IO contention on machines with high core counts.
Live incident on a 24-thread Ryzen AI 9 HX 370 / 30 GiB workstation,
2026-05-13: ~16 min after login, load avg 6.5, typing in konsole and
the address bar lagged 100s of ms. RAM/swap uncontended (8 GiB/30 GiB
used, zero swap), so the memory tuning was holding. PSI showed
cpu some=0.34 — pure scheduler contention.
Root cause: every Fedora unit ships with CPUWeight=[not set], which
maps to weight=100. Under contention the kernel splits CPU evenly
between sibling cgroups at each level of the hierarchy. With the
post-boot storm running concurrently (plasma-discover ~80%,
packagekitd ~33%, fwupd ~20%, dnf-makecache firing), the compositor
(kwin_wayland, plasmashell) was losing scheduling fights against
package metadata.
Three fixes shipped together:
1. system-bg.slice: CPUWeight=20, IOWeight=50, MemoryHigh=4G. Five
service drop-ins assign packagekit, fwupd, fwupd-refresh,
dnf-makecache and dnf5-automatic to it with Nice=10 and
IOSchedulingClass=idle. Proportional, not a hard cap: idle
systems still get full speed.
2. user-.slice.d/10-boost.conf: CPUWeight=300, IOWeight=200 on every
logged-in user's user-<UID>.slice. Combined with the above, this gives
a nominal 15:1 (300:20) interactive:background weight ratio under
contention.
3. Boot-storm sources defused: skel autostart shadow disables the
discover update notifier auto-launch; dnf-makecache.timer
OnBootSec=20min pushes metadata refresh past peak session
bring-up.
One opt-in artifact: skel user-bg.slice (CPUWeight=30) for anyone
installing Syncthing, rclone, or a file indexer — drop a
Slice=user-bg.slice drop-in on the service to inherit the same
protection at the user level.
Verified live before opening this PR: load dropped 6.53 -> 3.55
within minutes of applying; cgroup placement confirmed via
systemd-cgls.
Follow-up filed in CHANGELOG (not in this PR): tuned-adm
"onyx-performance" profile silently falls back to balanced, and
EPP regresses to balance_performance on AC. Needs separate branch.

parent 7d2b94b5be
commit c6f65f0831
11 changed files with 177 additions and 0 deletions

CHANGELOG.md (+104)
@@ -11,6 +11,110 @@ future maintainers can see why a change exists, not just what it changes.

## [Unreleased]

### Hardening: CPU/IO slice isolation for background services

Companion to the memory-pressure tuning (see prior entry). Memory was
only half the story; once OOM thrash was solved, a second class of
"why is my expensive laptop typing like a Chromebook" symptom emerged:
post-boot CPU/IO contention.

#### Bug found

Live incident on a 24-thread Ryzen AI 9 HX 370 / 30 GiB workstation,
2026-05-13: ~16 minutes after login, load avg climbed to ~6.5, and
typing in konsole and the address bar lagged by hundreds of ms. RAM
and swap were uncontended (8 GiB used / 30 GiB total, zero swap), so
the memory-pressure work was holding. PSI showed `cpu some=0.34`:
pure scheduler contention.
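
You can check another machine for the same symptom by parsing the kernel's PSI output directly. The sketch below is minimal and assumes a kernel built with PSI enabled. In `/proc/pressure/cpu` (and in each cgroup v2 `cpu.pressure` file), the `avg10`/`avg60`/`avg300` fields are percentages of wall-clock time spent stalled, and `total` is cumulative stall time in microseconds. The helper names are hypothetical. The sample line is illustrative and was not captured during the incident.

```python
from pathlib import Path


def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Parse PSI text (e.g. /proc/pressure/cpu) into {"some": {...}, "full": {...}}.

    Each line looks like: some avg10=1.23 avg60=0.50 avg300=0.10 total=123456
    avg10/avg60/avg300 are percentages; total is cumulative stall time in us.
    """
    pressure: dict[str, dict[str, float]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate trailing blank lines
        kind, *fields = line.split()
        pressure[kind] = {
            key: float(value) for key, value in (f.split("=", 1) for f in fields)
        }
    return pressure


def read_cpu_pressure(path: str = "/proc/pressure/cpu") -> dict[str, dict[str, float]]:
    """Read live CPU pressure (Linux built with CONFIG_PSI only)."""
    return parse_psi(Path(path).read_text())


if __name__ == "__main__":
    # Illustrative input only; not data captured during the incident.
    sample = (
        "some avg10=12.50 avg60=6.00 avg300=2.00 total=987654\n"
        "full avg10=0.00 avg60=0.00 avg300=0.00 total=0\n"
    )
    print(parse_psi(sample)["some"]["avg10"])
```

To narrow a system-wide reading down to one slice, point `read_cpu_pressure` at that slice's `cpu.pressure` file under `/sys/fs/cgroup/`.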

Root cause: every Fedora unit ships with `CPUWeight=[not set]`
(defaults to 100), so under contention the kernel's fair scheduler
splits CPU evenly between sibling cgroups at each level of the
hierarchy. With the post-boot storm running concurrently:

- `plasma-discover` (KDE update GUI, autostarted via
  `/etc/xdg/autostart/org.kde.discover.notifier.desktop`): ~80 % CPU
  doing repo metadata refresh
- `packagekitd` (the Discover backend): ~33 %
- `fwupd` + `fwupd-refresh`: ~20 %
- `dnf-makecache.timer` firing in the same window
- `kwin_wayland` (~33 %) and `plasmashell` (~19 %) competing on equal
  footing with all of the above

The compositor lost scheduling fights against package metadata, hence
the typing lag. zram-only swap and `vm.swappiness=180` are correct for
this stack but do nothing for a CPU-bound storm.

#### Fix applied

One new slice and one slice drop-in in `overlay/etc/systemd/system/`:

1. **`system-bg.slice`**: `CPUWeight=20`, `IOWeight=50`,
   `MemoryHigh=4G`. Drop-ins assign `packagekit.service`,
   `fwupd.service`, `fwupd-refresh.service`, `dnf-makecache.service`,
   and `dnf5-automatic.service` to it with `Nice=10` and
   `IOSchedulingClass=idle`.
2. **`user-.slice.d/10-boost.conf`**: `CPUWeight=300`,
   `IOWeight=200` on every logged-in user's `user-<UID>.slice`.
   Combined with the above, this gives a nominal **15:1** (300:20)
   interactive:background CPU weight ratio under contention. Idle
   systems still get full speed; weights are proportional, not hard
   caps.
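
cgroup v2 compares weights only between siblings. A parent's CPU is split among its busy children in proportion to `cpu.weight`, and the same rule repeats one level down. systemd turns dashes in slice names into nesting, so `system-bg.slice` sits under `system.slice` while `user-1000.slice` sits under `user.slice`. The 300 and 20 are therefore never compared directly. The sketch below is a simplified model of that rule. It assumes every listed leaf is CPU-bound and nothing else is runnable. It ignores per-CPU placement and `Nice=`, which only weights tasks against others in the same cgroup. The tree shape and `other.service` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cgroup:
    """A node in a simplified cgroup v2 CPU tree."""

    name: str
    weight: int = 100  # CPUWeight= default when unset
    children: list["Cgroup"] = field(default_factory=list)


def leaf_shares(node: Cgroup, fraction: float = 1.0) -> dict[str, float]:
    """Share of total CPU each leaf receives, assuming every leaf is busy.

    A parent's fraction is split among its children in proportion to their
    weights; weights are only ever compared between siblings.
    """
    if not node.children:
        return {node.name: fraction}
    total = sum(child.weight for child in node.children)
    shares: dict[str, float] = {}
    for child in node.children:
        shares.update(leaf_shares(child, fraction * child.weight / total))
    return shares


def example_tree(busy_system_service: bool) -> Cgroup:
    """Hypothetical tree: only the cgroups listed here are runnable."""
    system = [Cgroup("system-bg.slice", weight=20)]
    if busy_system_service:
        # Hypothetical CPU-bound service outside system-bg.slice.
        system.append(Cgroup("other.service"))
    return Cgroup("-.slice", children=[
        Cgroup("system.slice", children=system),
        Cgroup("user.slice", children=[Cgroup("user-1000.slice", weight=300)]),
    ])


if __name__ == "__main__":
    for busy in (False, True):
        shares = leaf_shares(example_tree(busy))
        print(busy, {name: round(s, 3) for name, s in shares.items()})
```

Under these assumptions, take the case where `system-bg.slice` is the only busy child of `system.slice` and `user-1000.slice` is the only busy child of `user.slice`. Each then gets half the CPU, regardless of the 300 and 20 weights. Adding one busy default-weight service to `system.slice` cuts `system-bg.slice` to 0.5 × 20/120 ≈ 8 %, while `user-1000.slice` keeps 50 %. The effective ratio therefore depends on which siblings are busy at the time.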

Two boot-storm sources defused:

- `overlay/etc/skel/.config/autostart/org.kde.discover.notifier.desktop`
  shadows the system autostart with `Hidden=true`. Updates still flow
  via `dnf5-automatic.timer`; users can launch Discover manually. No
  GUI fires at session start.
- `dnf-makecache.timer.d/10-delay.conf` pushes `OnBootSec=20min` so
  metadata refresh lands past peak session bring-up.
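
The `Hidden=true` shadow relies on the XDG Autostart Specification. When `.desktop` files with the same name exist in several autostart directories, only the one in the most important directory counts: `$XDG_CONFIG_HOME/autostart` (usually `~/.config/autostart`) ranks ahead of `/etc/xdg/autostart`. If that file sets `Hidden=true`, the entry is ignored everywhere. The sketch below models that lookup. It assumes a simplified reader that handles only the `[Desktop Entry]` group, with no localized keys or escapes, and the function names are hypothetical. Because the override lives in `/etc/skel`, it only reaches home directories created after it ships.

```python
import tempfile
from pathlib import Path


def desktop_entry(path: Path) -> dict[str, str]:
    """Minimal reader: key=value pairs from the [Desktop Entry] group only."""
    entry: dict[str, str] = {}
    in_group = False
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        if line.startswith("["):
            in_group = line == "[Desktop Entry]"
            continue
        if in_group and "=" in line:
            key, value = line.split("=", 1)
            entry[key.strip()] = value.strip()
    return entry


def effective_autostart(dirs: list[Path]) -> dict[str, Path]:
    """Entries that would launch, keyed by file name.

    `dirs` is ordered most important first (user dir, then system dirs).
    The first file found for a name wins; Hidden=true suppresses that name.
    """
    seen: set[str] = set()
    active: dict[str, Path] = {}
    for directory in dirs:
        for path in sorted(directory.glob("*.desktop")):
            if path.name in seen:
                continue  # shadowed by a more important directory
            seen.add(path.name)
            if desktop_entry(path).get("Hidden") != "true":
                active[path.name] = path
    return active


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        user = Path(tmp, "home/.config/autostart")
        system = Path(tmp, "etc/xdg/autostart")
        user.mkdir(parents=True)
        system.mkdir(parents=True)
        name = "org.kde.discover.notifier.desktop"
        # Stand-in for the system entry; the real file's contents differ.
        (system / name).write_text("[Desktop Entry]\nType=Application\n")
        # Same shape as the skel override in this PR.
        (user / name).write_text("[Desktop Entry]\nType=Application\nHidden=true\n")
        print(sorted(effective_autostart([user, system])))
```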

One opt-in artifact for users:

- `overlay/etc/skel/.config/systemd/user/user-bg.slice`
  (`CPUWeight=30`, `IOWeight=50`, `MemoryHigh=3G`). Veilor-os does not
  ship sync tools by default, but anyone installing Syncthing /
  rclone / a file indexer can drop a `Slice=user-bg.slice` drop-in
  on the service and inherit the same protection at the user level.
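
The opt-in step can be scripted. This is a minimal sketch that writes the drop-in contents documented in the `user-bg.slice` header comment. It assumes a user unit named `syncthing.service`; that name and the `write_bg_dropin` helper are illustrative. The helper accepts only `.service` units, for simplicity. After the file is written, run `systemctl --user daemon-reload` and restart the service so it moves into the new slice.

```python
import tempfile
from pathlib import Path

# Mirrors the drop-in shown in the user-bg.slice header comment.
DROPIN = """\
[Service]
Slice=user-bg.slice
Nice=10
IOSchedulingClass=idle
"""


def write_bg_dropin(unit: str, config_home: Path) -> Path:
    """Write <config_home>/systemd/user/<unit>.d/10-bg.conf and return its path."""
    if not unit.endswith(".service"):
        raise ValueError(f"expected a .service unit, got {unit!r}")
    dropin_dir = config_home / "systemd" / "user" / f"{unit}.d"
    dropin_dir.mkdir(parents=True, exist_ok=True)
    path = dropin_dir / "10-bg.conf"
    path.write_text(DROPIN)
    return path


if __name__ == "__main__":
    # Demo against a scratch directory instead of the real ~/.config.
    with tempfile.TemporaryDirectory() as tmp:
        written = write_bg_dropin("syncthing.service", Path(tmp))
        print(written.relative_to(tmp))
```

For real use, pass `Path.home() / ".config"` (or `$XDG_CONFIG_HOME` if set) as `config_home`.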

Verified live (post-incident workstation, before opening the PR):

```
slice              CPUWeight  IOWeight  MemoryHigh
system-bg.slice    20         50        4G
user-1000.slice    500        500       infinity
user-bg.slice      30         50        3G
```

cgroup placement confirmed via `systemd-cgls`: `packagekit.service`
under `/system.slice/system-bg.slice/`, `syncthing.service` under
`/user.slice/user-1000.slice/.../user-bg.slice/`. Load dropped from
6.53 → 3.55 within minutes of applying, and typing in the compositor
recovered immediately on the next contention event.
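
You can also confirm placement per process instead of through `systemd-cgls`. On a cgroup v2 (unified) system, `/proc/<pid>/cgroup` contains a line of the form `0::/path`, giving the process's cgroup relative to the cgroup root. The sketch below checks whether a PID sits anywhere under a named slice. The function names are hypothetical and the sample path is illustrative.

```python
from pathlib import PurePosixPath


def cgroup_path(proc_cgroup: str) -> PurePosixPath:
    """Extract the unified-hierarchy (0::) path from /proc/<pid>/cgroup text."""
    for line in proc_cgroup.splitlines():
        if not line:
            continue
        hierarchy_id, _controllers, path = line.split(":", 2)
        if hierarchy_id == "0":
            return PurePosixPath(path)
    raise ValueError("no cgroup v2 (0::) entry found")


def in_slice(proc_cgroup: str, slice_name: str) -> bool:
    """True if any component of the cgroup path equals slice_name."""
    return slice_name in cgroup_path(proc_cgroup).parts


def pid_in_slice(pid: int, slice_name: str) -> bool:
    """Live check against /proc (Linux only)."""
    with open(f"/proc/{pid}/cgroup") as f:
        return in_slice(f.read(), slice_name)


if __name__ == "__main__":
    sample = "0::/system.slice/system-bg.slice/packagekit.service\n"
    print(in_slice(sample, "system-bg.slice"))
    print(in_slice(sample, "user.slice"))
```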

#### Follow-up surfaced during this work (not in this PR)

While debugging "still feels laggy after slice fix" on the same
workstation, found two power-profile bugs worth a separate
investigation:

1. `tuned-adm active` reported `balanced` even though the system was
   on AC and charging. EPP was `balance_performance` and all 24
   logical CPUs sat pinned at `scaling_min_freq` (605 MHz): typing
   latency was the CPU refusing to ramp on short bursts, even with no
   contention. Manually setting EPP to `performance` and switching to
   the stock `throughput-performance` profile restored snappy input.
2. `tuned-adm profile onyx-performance` (shipped via
   `overlay/etc/tuned/profiles/`) **silently fell back to `balanced`**
   instead of activating. No errors in `journalctl -u tuned`. The
   profile's `tuned.conf`, or a script it references, likely exits
   with an error somewhere; needs reproduction in CI and a test that
   asserts `tuned-adm active` matches what was requested.

Both are tracked for a follow-up branch and are out of scope here,
because this PR only covers cgroup/slice isolation. Filing now so they
do not get lost.
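
The requested test could start from a check like the sketch below, which switches profile, reads back `tuned-adm active` and compares. It also reads EPP, for the symptom in item 1. It assumes two things:

- `tuned-adm active` prints a line of the form `Current active profile: <name>`; verify this against the installed tuned.
- The cpufreq driver exposes `energy_performance_preference` under `/sys/devices/system/cpu/cpu*/cpufreq/`, as `amd-pstate` in active mode and `intel_pstate` do.

The helper names are hypothetical. `assert_profile_applied` needs root and a running `tuned` daemon.

```python
import re
import subprocess
import tempfile
from pathlib import Path
from typing import Optional

# Assumed output shape of `tuned-adm active`; verify on the installed tuned.
ACTIVE_RE = re.compile(r"^Current active profile:\s*(\S+)", re.MULTILINE)


def parse_active_profile(output: str) -> Optional[str]:
    """Return the profile name from `tuned-adm active` output, or None."""
    match = ACTIVE_RE.search(output)
    return match.group(1) if match else None


def epp_values(cpu_root: Path = Path("/sys/devices/system/cpu")) -> dict[str, str]:
    """Map cpuN -> energy_performance_preference for CPUs that expose it."""
    values: dict[str, str] = {}
    for pref in sorted(cpu_root.glob("cpu*/cpufreq/energy_performance_preference")):
        values[pref.parts[-3]] = pref.read_text().strip()
    return values


def assert_profile_applied(requested: str) -> None:
    """Switch profiles, then fail if tuned kept a different one (root only)."""
    subprocess.run(["tuned-adm", "profile", requested], check=True)
    result = subprocess.run(
        ["tuned-adm", "active"], check=True, capture_output=True, text=True
    )
    active = parse_active_profile(result.stdout)
    if active != requested:
        raise AssertionError(f"requested {requested!r}, tuned reports {active!r}")


if __name__ == "__main__":
    print(parse_active_profile("Current active profile: balanced\n"))
    # Fake sysfs tree so the EPP reader can run without real hardware.
    with tempfile.TemporaryDirectory() as tmp:
        fake = Path(tmp) / "cpu0" / "cpufreq"
        fake.mkdir(parents=True)
        (fake / "energy_performance_preference").write_text("balance_performance\n")
        print(epp_values(Path(tmp)))
```

Calling `assert_profile_applied("onyx-performance")` would raise on a silent fallback, even if `tuned-adm profile` itself reports success.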

### v0.7 BlueBuild OCI spike (active — `v0.7-bluebuild-spike`)

CI plumbing landed (~13 fixes) to unblock the first green BlueBuild

overlay/etc/skel/.config/autostart/org.kde.discover.notifier.desktop (+11)
@@ -0,0 +1,11 @@
[Desktop Entry]
# Shadow /etc/xdg/autostart/org.kde.discover.notifier.desktop.
# Auto-launching the Discover updater at session start stacks
# CPU/IO load with packagekit + dnf-makecache + fwupd-refresh.
# Users can still launch Discover manually; updates also happen
# via dnf5-automatic.timer. This only suppresses the autostart.
Type=Application
Name=Discover Update Notifier
Exec=true
Hidden=true
X-KDE-autostart-condition=

overlay/etc/skel/.config/systemd/user/user-bg.slice (+15)
@@ -0,0 +1,15 @@
[Unit]
Description=User background services (low priority)

# For per-user cloud-sync / indexer / backup tools the user opts into
# (Syncthing, rclone, file indexers, etc). Drop a service drop-in at
# ~/.config/systemd/user/<unit>.service.d/10-bg.conf with:
# [Service]
# Slice=user-bg.slice
# Nice=10
# IOSchedulingClass=idle

[Slice]
CPUWeight=30
IOWeight=50
MemoryHigh=3G

@@ -0,0 +1,4 @@
[Service]
Slice=system-bg.slice
Nice=10
IOSchedulingClass=idle

@@ -0,0 +1,5 @@
[Timer]
# Default OnBootSec fires the makecache job near login, stacking
# CPU/IO load with the desktop session bring-up. 20min delay puts
# the refresh past peak session-start activity.
OnBootSec=20min

@@ -0,0 +1,4 @@
[Service]
Slice=system-bg.slice
Nice=10
IOSchedulingClass=idle

@@ -0,0 +1,4 @@
[Service]
Slice=system-bg.slice
Nice=10
IOSchedulingClass=idle

overlay/etc/systemd/system/fwupd.service.d/10-bg.conf (+4)
@@ -0,0 +1,4 @@
[Service]
Slice=system-bg.slice
Nice=10
IOSchedulingClass=idle

@@ -0,0 +1,4 @@
[Service]
Slice=system-bg.slice
Nice=10
IOSchedulingClass=idle

overlay/etc/systemd/system/system-bg.slice (+15)
@@ -0,0 +1,15 @@
[Unit]
Description=Background system services (low priority)
Documentation=https://git.s8n.ru/veilor-org/veilor-os/src/branch/main/docs
Before=slices.target

# Holds dnf metadata refresh, PackageKit, fwupd, and other deferrable
# system maintenance. CPUWeight=20 vs default 100 means these yield
# 5:1 to the rest of system.slice under contention; idle systems still
# get full speed. MemoryHigh=4G is a soft cap — kernel reclaims pages
# rather than evicting interactive workloads when these grow.

[Slice]
CPUWeight=20
IOWeight=50
MemoryHigh=4G

overlay/etc/systemd/system/user-.slice.d/10-boost.conf (+7)
@@ -0,0 +1,7 @@
[Slice]
# Logged-in user sessions get 3x weight vs default. Combined with
# system-bg.slice CPUWeight=20, ratio is 15:1 in the interactive
# session's favour when CPU is contended — kwin/plasmashell win
# scheduling over dnf-makecache / fwupd-refresh / packagekit.
CPUWeight=300
IOWeight=200