feat(hardening): CPU/IO slice isolation for background services

Companion to the memory-pressure tuning (7d2b94b). Memory was only
half the "expensive laptop typing like a Chromebook" story; once
zram-only OOM thrash was solved, a second symptom class emerged:
post-boot CPU/IO contention on machines with high core counts.

Live incident on a 24-thread Ryzen AI 9 HX 370 / 30 GiB workstation,
2026-05-13: ~16 min after login, load avg 6.5, typing in konsole and
the address bar lagged by hundreds of ms. RAM/swap uncontended (8 GiB
of 30 GiB used, zero swap), so the memory tuning was holding. PSI
showed cpu some=0.34, i.e. pure scheduler contention.

Root cause: every Fedora unit ships with CPUWeight=[not set], which
maps to weight=100. Under contention the kernel splits CPU evenly
between every leaf cgroup. With the post-boot storm running
concurrently (plasma-discover ~80%, packagekitd ~33%, fwupd ~20%,
dnf-makecache firing) the compositor (kwin_wayland, plasmashell) was
losing scheduling fights against package metadata.

Three fixes shipped together:

1. system-bg.slice: CPUWeight=20, IOWeight=50, MemoryHigh=4G. Five
   service drop-ins assign packagekit, fwupd, fwupd-refresh,
   dnf-makecache, dnf5-automatic into it with Nice=10 and
   IOSchedulingClass=idle. Proportional, not a hard cap; idle
   systems still get full speed.
2. user-.slice.d/10-boost.conf: CPUWeight=300, IOWeight=200 on every
   logged-in user session. Combined with the above, this gives a 15:1
   interactive:background ratio under contention.
3. Boot-storm sources defused: skel autostart shadow disables the
   Discover update notifier auto-launch; dnf-makecache.timer
   OnBootSec=20min pushes metadata refresh past peak session
   bring-up.

One opt-in artifact: skel user-bg.slice (CPUWeight=30) for anyone
installing Syncthing, rclone, or a file indexer. Drop a
Slice=user-bg.slice drop-in on the service to inherit the same
protection at the user level.

Verified live before opening this PR: load dropped 6.53 -> 3.55
within minutes of applying; cgroup placement confirmed via
systemd-cgls.

Follow-up filed in CHANGELOG (not in this PR): tuned-adm
"onyx-performance" profile silently falls back to balanced, and
EPP regresses to balance_performance on AC. Needs separate branch.

parent 7d2b94b5be
commit c6f65f0831
11 changed files with 177 additions and 0 deletions
104
CHANGELOG.md
@@ -11,6 +11,110 @@ future maintainers can see why a change exists, not just what it changes.

## [Unreleased]

### Hardening: CPU/IO slice isolation for background services

Companion to the memory-pressure tuning (see prior entry). Memory was
only half the story; once OOM thrash was solved, a second class of
"why is my expensive laptop typing like a Chromebook" symptom emerged:
post-boot CPU/IO contention.

#### Bug found

Live incident on a 24-thread Ryzen AI 9 HX 370 / 30 GiB workstation,
2026-05-13: ~16 minutes after login, load avg climbed to ~6.5, typing
in konsole and the address bar lagged by hundreds of ms. RAM and swap
were uncontended (8 GiB used / 30 GiB total, zero swap), so the
memory-pressure work was holding. PSI showed `cpu some=0.34`, i.e.
pure scheduler contention.
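
For reference, the `cpu some` figure comes from the kernel's pressure
stall information file `/proc/pressure/cpu`. A minimal Python sketch for
reading it follows. The helper names are hypothetical, the sample line is
illustrative rather than the reading captured during the incident, and
this entry does not record which averaging window the 0.34 refers to.

```python
from pathlib import Path
from typing import Dict


def parse_psi(text: str) -> Dict[str, Dict[str, float]]:
    """Parse the contents of a /proc/pressure/* file into nested dicts."""
    result: Dict[str, Dict[str, float]] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]  # kind is "some" or "full"
        result[kind] = {
            key: float(value)
            for key, value in (pair.split("=", 1) for pair in pairs)
        }
    return result


def read_cpu_pressure(path: str = "/proc/pressure/cpu") -> Dict[str, Dict[str, float]]:
    """Read live CPU pressure; needs a kernel with PSI enabled."""
    return parse_psi(Path(path).read_text())


# Illustrative sample only (assumed values, not incident data).
SAMPLE = "some avg10=0.34 avg60=0.21 avg300=0.10 total=123456\n"
```

`parse_psi(SAMPLE)["some"]["avg10"]` gives the 10-second average; on a
live system `read_cpu_pressure()` can be polled while reproducing the lag.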

Root cause: every Fedora unit ships with `CPUWeight=[not set]`
(defaults to 100), so under contention the kernel's CPU scheduler
splits CPU evenly between equally weighted cgroups. With the post-boot
storm running concurrently:

- `plasma-discover` (KDE update GUI, autostarted via
  `/etc/xdg/autostart/org.kde.discover.notifier.desktop`): ~80 % CPU
  doing repo metadata refresh
- `packagekitd` (the Discover backend): ~33 %
- `fwupd` + `fwupd-refresh`: ~20 %
- `dnf-makecache.timer` firing in the same window
- `kwin_wayland` (~33 %) and `plasmashell` (~19 %) competing on equal
  footing with all of the above

The compositor lost scheduling fights against package metadata, hence
the typing lag. zram-only swap and `vm.swappiness=180` are correct for
this stack but do nothing for a CPU-bound storm.

#### Fix applied

One new slice and one slice drop-in in `overlay/etc/systemd/system/`:

1. **`system-bg.slice`**: `CPUWeight=20`, `IOWeight=50`,
   `MemoryHigh=4G`. Drop-ins assign `packagekit.service`,
   `fwupd.service`, `fwupd-refresh.service`, `dnf-makecache.service`,
   and `dnf5-automatic.service` into it with `Nice=10` and
   `IOSchedulingClass=idle`.
2. **`user-.slice.d/10-boost.conf`**: `CPUWeight=300`,
   `IOWeight=200` on every logged-in user session. Combined with
   the above, this gives a **15:1** interactive:background CPU weight
   ratio (300:20) under contention. Idle systems still get full speed;
   weights are proportional, not hard caps.
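
The 15:1 figure is plain weight arithmetic, sketched below in Python. The
function name is hypothetical, and the model is a simplification: cgroup
weights are compared between siblings at each level of the hierarchy, so
the split actually observed depends on where the two slices sit in the
tree and on what else is runnable at the time.

```python
from typing import Dict


def cpu_shares(weights: Dict[str, int]) -> Dict[str, float]:
    """Fraction of CPU each busy competitor gets when all are runnable."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Simplified model: the boosted user slice against system-bg.slice only.
shares = cpu_shares({"user session": 300, "system-bg": 20})
ratio = shares["user session"] / shares["system-bg"]
```

Under this model the session gets 300/320 of contended CPU time and the
background slice 20/320. When nothing else wants the CPU, the background
slice can still use all of it, which is why the weights are not caps.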

Two boot-storm sources defused:

- `overlay/etc/skel/.config/autostart/org.kde.discover.notifier.desktop`
  shadows the system autostart with `Hidden=true`. Updates still flow
  via `dnf5-automatic.timer`; users can launch Discover manually. No
  GUI fires at session start.
- `dnf-makecache.timer.d/10-delay.conf` sets `OnBootSec=20min` so
  metadata refresh lands past peak session bring-up.

One opt-in artifact for users:

- `overlay/etc/skel/.config/systemd/user/user-bg.slice`
  (`CPUWeight=30`, `IOWeight=50`, `MemoryHigh=3G`). Veilor-os does not
  ship sync tools by default, but anyone installing Syncthing /
  rclone / a file indexer can drop a `Slice=user-bg.slice` drop-in
  on the service and inherit the same protection at the user level.
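
As an illustration of the opt-in step, the Python sketch below writes the
drop-in described in the comments of `user-bg.slice`. The function names
are hypothetical and `syncthing.service` is only an example unit name; the
path layout and the three settings are taken from that slice file.

```python
from pathlib import Path

# Settings copied from the comment block in user-bg.slice.
DROPIN_BODY = (
    "[Service]\n"
    "Slice=user-bg.slice\n"
    "Nice=10\n"
    "IOSchedulingClass=idle\n"
)


def dropin_path(unit: str, home: Path) -> Path:
    """Location of the per-user drop-in for the given unit name."""
    return home / ".config/systemd/user" / f"{unit}.d" / "10-bg.conf"


def write_dropin(unit: str, home: Path) -> Path:
    """Create the drop-in directory if needed and write the file."""
    path = dropin_path(unit, home)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(DROPIN_BODY)
    return path
```

Calling `write_dropin("syncthing.service", Path.home())` would create the
file; a `systemctl --user daemon-reload` and a restart of the unit are
then needed before the new slice assignment takes effect.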

Verified live (post-incident workstation, before opening the PR):

```
slice             CPUWeight  IOWeight  MemoryHigh
system-bg.slice   20         50        4G
user-1000.slice   500        500       infinity
user-bg.slice     30         50        3G
```

cgroup placement confirmed via `systemd-cgls`: `packagekit.service`
under `/system.slice/system-bg.slice/`, `syncthing.service` under
`/user.slice/user-1000.slice/.../user-bg.slice/`. Load dropped from
6.53 → 3.55 within minutes of applying, and typing in the compositor
recovered immediately on the next contention event.
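
A check like the table above could be scripted roughly as follows. This is
a sketch, not the procedure used for the verification: the helper names
are hypothetical, and it assumes `systemctl show` reports `MemoryHigh` as
a byte count rather than the `4G` shorthand shown in the table.

```python
import subprocess
from typing import Dict


def parse_show(output: str) -> Dict[str, str]:
    """Turn `systemctl show` Key=Value lines into a dict."""
    props: Dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key] = value
    return props


def show_slice(unit: str, user: bool = False) -> Dict[str, str]:
    """Query the resource-control properties of a slice unit."""
    cmd = ["systemctl"]
    if user:
        cmd.append("--user")  # user-bg.slice lives in the user manager
    cmd += ["show", unit, "--property=CPUWeight,IOWeight,MemoryHigh"]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return parse_show(result.stdout)


# Values from system-bg.slice; MemoryHigh=4G expressed in bytes (assumed format).
EXPECTED_SYSTEM_BG = {
    "CPUWeight": "20",
    "IOWeight": "50",
    "MemoryHigh": str(4 * 1024**3),
}
```

On a machine with the overlay installed,
`show_slice("system-bg.slice") == EXPECTED_SYSTEM_BG` would be the
assertion to make.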

#### Follow-up surfaced during this work (not in this PR)

While debugging "still feels laggy after slice fix" on the same
workstation, found two power-profile bugs worth a separate
investigation:

1. `tuned-adm active` reported `balanced` despite the system being on
   AC + charging. EPP was `balance_performance` and all 24 logical
   CPUs sat pinned at `scaling_min_freq` (605 MHz); typing latency was
   the CPU refusing to ramp on short bursts, even with no contention.
   Manually setting EPP to `performance` and switching to the stock
   `throughput-performance` profile restored snappy input.
2. `tuned-adm profile onyx-performance` (shipped via
   `overlay/etc/tuned/profiles/`) **silently fell back to `balanced`**
   instead of activating. No errors in `journalctl -u tuned`. The
   profile config or its `tuned.conf` script likely has a bad exit
   somewhere; needs reproduction in CI and a test that asserts
   `tuned-adm active` matches what was requested.

Both are tracked for a follow-up branch and are out of scope here,
because this PR only covers cgroup/slice isolation. Filing now so they
do not get lost.
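
The test asked for in item 2 could start from something like the Python
sketch below. It is not part of this PR; the function names are
hypothetical, and it assumes `tuned-adm active` prints a line of the form
`Current active profile: <name>`.

```python
import subprocess
from typing import Optional

PREFIX = "Current active profile:"


def parse_active_profile(output: str) -> Optional[str]:
    """Extract the profile name from `tuned-adm active` output, if any."""
    for line in output.splitlines():
        if line.startswith(PREFIX):
            return line[len(PREFIX):].strip()
    return None


def assert_profile_active(requested: str) -> None:
    """Fail loudly when tuned reports a different profile than requested."""
    result = subprocess.run(
        ["tuned-adm", "active"], check=True, capture_output=True, text=True
    )
    active = parse_active_profile(result.stdout)
    if active != requested:
        raise AssertionError(
            f"requested {requested!r}, but tuned reports {active!r}"
        )
```

Running `assert_profile_active("onyx-performance")` right after
`tuned-adm profile onyx-performance` would turn the silent fallback into a
hard failure in CI.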

### v0.7 BlueBuild OCI spike (active: `v0.7-bluebuild-spike`)

CI plumbing landed (~13 fixes) to unblock the first green BlueBuild

@@ -0,0 +1,11 @@
[Desktop Entry]
# Shadow /etc/xdg/autostart/org.kde.discover.notifier.desktop.
# Auto-launching the Discover updater at session start stacks
# CPU/IO load with packagekit + dnf-makecache + fwupd-refresh.
# Users can still launch Discover manually; updates also happen
# via dnf5-automatic.timer. This only suppresses the autostart.
Type=Application
Name=Discover Update Notifier
Exec=true
Hidden=true
X-KDE-autostart-condition=

15
overlay/etc/skel/.config/systemd/user/user-bg.slice
Normal file
@@ -0,0 +1,15 @@
[Unit]
Description=User background services (low priority)

# For per-user cloud-sync / indexer / backup tools the user opts into
# (Syncthing, rclone, file indexers, etc). Drop a service drop-in at
# ~/.config/systemd/user/<unit>.service.d/10-bg.conf with:
#   [Service]
#   Slice=user-bg.slice
#   Nice=10
#   IOSchedulingClass=idle

[Slice]
CPUWeight=30
IOWeight=50
MemoryHigh=3G

@@ -0,0 +1,4 @@
[Service]
Slice=system-bg.slice
Nice=10
IOSchedulingClass=idle

@@ -0,0 +1,5 @@
[Timer]
# Default OnBootSec fires the makecache job near login, stacking
# CPU/IO load with the desktop session bring-up. 20min delay puts
# the refresh past peak session-start activity.
OnBootSec=20min

@@ -0,0 +1,4 @@
[Service]
Slice=system-bg.slice
Nice=10
IOSchedulingClass=idle

@@ -0,0 +1,4 @@
[Service]
Slice=system-bg.slice
Nice=10
IOSchedulingClass=idle

4
overlay/etc/systemd/system/fwupd.service.d/10-bg.conf
Normal file
@@ -0,0 +1,4 @@
[Service]
Slice=system-bg.slice
Nice=10
IOSchedulingClass=idle

@@ -0,0 +1,4 @@
[Service]
Slice=system-bg.slice
Nice=10
IOSchedulingClass=idle

15
overlay/etc/systemd/system/system-bg.slice
Normal file
@@ -0,0 +1,15 @@
[Unit]
Description=Background system services (low priority)
Documentation=https://git.s8n.ru/veilor-org/veilor-os/src/branch/main/docs
Before=slices.target

# Holds dnf metadata refresh, PackageKit, fwupd, and other deferrable
# system maintenance. CPUWeight=20 vs default 100 means these yield
# 5:1 to the rest of system.slice under contention; idle systems still
# get full speed. MemoryHigh=4G is a soft cap: the kernel reclaims pages
# rather than evicting interactive workloads when these grow.

[Slice]
CPUWeight=20
IOWeight=50
MemoryHigh=4G

7
overlay/etc/systemd/system/user-.slice.d/10-boost.conf
Normal file
@@ -0,0 +1,7 @@
[Slice]
# Logged-in user sessions get 3x weight vs default. Combined with
# system-bg.slice CPUWeight=20, ratio is 15:1 in the interactive
# session's favour when CPU is contended, so kwin/plasmashell win
# scheduling over dnf-makecache / fwupd-refresh / packagekit.
CPUWeight=300
IOWeight=200