feat(hardening): CPU/IO slice isolation for background services #12

Open
s8n wants to merge 1 commit from feat/cpu-io-slice-isolation into feat/memory-pressure-tuning
11 changed files with 177 additions and 0 deletions
Showing only changes of commit c6f65f0831

@ -11,6 +11,110 @@ future maintainers can see why a change exists, not just what it changes.
## [Unreleased]
### Hardening: CPU/IO slice isolation for background services
Companion to the memory-pressure tuning (see prior entry). Memory was
only half the story — once OOM thrash was solved, a second class of
"why is my expensive laptop typing like a Chromebook" symptom emerged:
post-boot CPU/IO contention.
#### Bug found
Live incident on a 24-thread Ryzen AI 9 HX 370 / 30 GiB workstation,
2026-05-13: ~16 minutes after login, load avg climbed to ~6.5, typing
in konsole and the address bar lagged by hundreds of ms. RAM and swap
were uncontended (8 GiB used / 30 GiB total, zero swap), so the
memory-pressure work was holding. PSI showed `cpu some=0.34` — pure
scheduler contention.
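
For reference, the PSI figure can be read programmatically. A minimal sketch, assuming the standard `/proc/pressure/cpu` line format (`some avg10=... avg60=... avg300=... total=...`); the names `parse_psi` and `read_cpu_pressure` are illustrative, and this entry does not state which averaging window the `0.34` above refers to:

```python
def parse_psi(text: str) -> dict:
    """Parse PSI text into {"some": {"avg10": ..., ...}, "full": {...}}."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        # Each remaining field looks like "avg10=0.34".
        result[kind] = {k: float(v) for k, v in (p.split("=", 1) for p in pairs)}
    return result


def read_cpu_pressure(path: str = "/proc/pressure/cpu") -> dict:
    """Read live CPU pressure (Linux with PSI enabled only)."""
    with open(path) as fh:
        return parse_psi(fh.read())
```

A non-zero `some` average while RAM and swap are idle points at CPU contention rather than memory pressure, which matches the diagnosis here.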
Root cause: every Fedora unit ships with `CPUWeight=[not set]`
(defaults to 100), so under contention the kernel's fair CPU scheduler
splits CPU evenly between sibling cgroups at each level of the
hierarchy. With the post-boot storm running concurrently:
- `plasma-discover` (KDE update GUI, autostarted via
`/etc/xdg/autostart/org.kde.discover.notifier.desktop`) — ~80 % CPU
doing repo metadata refresh
- `packagekitd` (the discover backend) — ~33 %
- `fwupd` + `fwupd-refresh` — ~20 %
- `dnf-makecache.timer` firing in the same window
- `kwin_wayland` (~33 %) and `plasmashell` (~19 %) competing on equal
footing with all of the above
The compositor lost scheduling fights against package metadata, hence
the typing lag. zram-only swap and `vm.swappiness=180` are correct for
this stack but do nothing for a CPU-bound storm.
#### Fix applied
Two new slices in `overlay/etc/systemd/system/`:
1. **`system-bg.slice`** — `CPUWeight=20`, `IOWeight=50`,
`MemoryHigh=4G`. Drop-ins assign `packagekit.service`,
`fwupd.service`, `fwupd-refresh.service`, `dnf-makecache.service`,
and `dnf5-automatic.service` into it with `Nice=10` and
`IOSchedulingClass=idle`.
2. **`user-.slice.d/10-boost.conf`** — `CPUWeight=300`,
`IOWeight=200` on every logged-in user session. Combined with
`system-bg.slice`, this gives a nominal **15:1** (300:20)
interactive:background CPU weight ratio under contention. Idle
systems still get full speed; weights are proportional, not hard caps.
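
The proportional behaviour can be checked with a little arithmetic. This is a simplified sketch: it assumes a flat set of always-runnable groups competing at one level, whereas real cgroup v2 weights are resolved among siblings at each level of the hierarchy, so actual shares depend on where each slice sits in the tree. The function name `cpu_shares` and the group labels are illustrative.

```python
def cpu_shares(weights: dict) -> dict:
    """Fraction of contended CPU per group: weight / sum of all weights."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Before: both groups at the default weight of 100 split the CPU equally.
before = cpu_shares({"session": 100, "background": 100})

# After: boosted user slice (300) against system-bg.slice (20).
after = cpu_shares({"session": 300, "background": 20})

print(before)
print(after)
print(after["session"] / after["background"])  # the 300:20 weight ratio
```

The shares only apply while both groups are runnable; a group running alone still gets the whole CPU.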
Two boot-storm sources defused:
- `overlay/etc/skel/.config/autostart/org.kde.discover.notifier.desktop`
shadows the system autostart with `Hidden=true`. Updates still flow
via `dnf5-automatic.timer`; users can launch Discover manually. No
GUI fires at session start.
- `dnf-makecache.timer.d/10-delay.conf` pushes `OnBootSec=20min` so
metadata refresh lands past peak session bring-up.
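
The autostart shadow above relies on a same-named desktop entry carrying `Hidden=true`. A small sketch of how one might check that a shadow file really is marked hidden, assuming the file parses as a plain INI-style desktop entry; the helper name `is_hidden` is illustrative and not part of this PR:

```python
import configparser


def is_hidden(desktop_entry_text: str) -> bool:
    """Return True if the [Desktop Entry] group sets Hidden=true."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case; desktop-entry keys are case-sensitive
    parser.read_string(desktop_entry_text)
    value = parser.get("Desktop Entry", "Hidden", fallback="false")
    return value.strip() == "true"


shadow = """[Desktop Entry]
# Shadow /etc/xdg/autostart/org.kde.discover.notifier.desktop.
Type=Application
Name=Discover Update Notifier
Exec=true
Hidden=true
X-KDE-autostart-condition=
"""
print(is_hidden(shadow))
```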
One opt-in artifact for users:
- `overlay/etc/skel/.config/systemd/user/user-bg.slice`
(`CPUWeight=30`, `IOWeight=50`, `MemoryHigh=3G`). Veilor-os does not
ship sync tools by default, but anyone installing Syncthing /
rclone / a file indexer can drop a `Slice=user-bg.slice` drop-in
on the service and inherit the same protection at the user level.
Verified live (post-incident workstation, before opening the PR):
```
slice             CPUWeight  IOWeight  MemoryHigh
system-bg.slice   20         50        4G
user-1000.slice   500        500       infinity
user-bg.slice     30         50        3G
```
cgroup placement confirmed via `systemd-cgls`: `packagekit.service`
under `/system.slice/system-bg.slice/`, `syncthing.service` under
`/user.slice/user-1000.slice/.../user-bg.slice/`. Load dropped from
6.53 → 3.55 within minutes of applying, and typing in the compositor
recovered immediately on the next contention event.
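
The property check above could be scripted roughly as follows. This is a sketch, not the procedure used for the verification: it assumes `systemctl show -p <Property> <unit>` prints `Key=Value` lines and reports `MemoryHigh` in bytes, and the helper names are illustrative.

```python
import subprocess


def parse_show(output: str) -> dict:
    """Turn `Key=Value` lines from `systemctl show` into a dict."""
    return dict(line.split("=", 1) for line in output.splitlines() if "=" in line)


def unit_properties(unit: str, props: list, user: bool = False) -> dict:
    """Query selected properties of a unit (Linux with systemd only)."""
    cmd = ["systemctl"] + (["--user"] if user else []) + ["show", unit]
    for prop in props:
        cmd += ["-p", prop]
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_show(done.stdout)


def mismatches(actual: dict, expected: dict) -> dict:
    """Return {property: (expected, actual)} for every property that differs."""
    return {k: (v, actual.get(k)) for k, v in expected.items() if actual.get(k) != v}
```

For example, `mismatches(unit_properties("system-bg.slice", ["CPUWeight", "IOWeight", "MemoryHigh"]), {"CPUWeight": "20", "IOWeight": "50", "MemoryHigh": str(4 * 1024**3)})` should return an empty dict when the slice is configured as shipped.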
#### Follow-up surfaced during this work (not in this PR)
While debugging "still feels laggy after slice fix" on the same
workstation, found two power-profile bugs worth a separate
investigation:
1. `tuned-adm active` reported `balanced` despite the system being on
AC + charging. EPP was `balance_performance` and all 24 cores sat
pinned at `scaling_min_freq` (605 MHz) — typing latency was the
CPU refusing to ramp on short bursts, even with no contention.
Manually setting EPP to `performance` and switching to the stock
`throughput-performance` profile restored snappy input.
2. `tuned-adm profile onyx-performance` (shipped via
`overlay/etc/tuned/profiles/`) **silently fell back to `balanced`**
instead of activating. No errors in `journalctl -u tuned`. The
profile's `tuned.conf`, or a script it calls, likely fails without
logging; needs reproduction in CI and a test that asserts
`tuned-adm active` matches what was requested.
Both are tracked for a follow-up branch — out of scope here because
this PR only covers cgroup/slice isolation. Filing now so it does not
get lost.
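
A possible shape for the test mentioned in item 2, as a starting point for the follow-up branch. It assumes `tuned-adm active` prints a line beginning with `Current active profile:`, and that the test runs with enough privilege to switch profiles against a running tuned daemon; the function names are illustrative.

```python
import subprocess

PREFIX = "Current active profile:"


def parse_active_profile(output: str):
    """Extract the profile name from `tuned-adm active` output, or None."""
    for line in output.splitlines():
        if line.startswith(PREFIX):
            return line[len(PREFIX):].strip()
    return None


def assert_profile_applied(requested: str) -> None:
    """Request a profile, then fail loudly if tuned reports a different one."""
    subprocess.run(["tuned-adm", "profile", requested], check=True)
    done = subprocess.run(
        ["tuned-adm", "active"], capture_output=True, text=True, check=True
    )
    active = parse_active_profile(done.stdout)
    if active != requested:
        raise AssertionError(f"requested {requested!r}, tuned reports {active!r}")
```

Comparing the reported profile against the requested one would catch the silent fallback to `balanced` even when `tuned-adm profile` itself exits cleanly.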
### v0.7 BlueBuild OCI spike (active — `v0.7-bluebuild-spike`)
CI plumbing landed (~13 fixes) to unblock the first green BlueBuild

@ -0,0 +1,11 @@
[Desktop Entry]
# Shadow /etc/xdg/autostart/org.kde.discover.notifier.desktop.
# Auto-launching the Discover updater at session start stacks
# CPU/IO load with packagekit + dnf-makecache + fwupd-refresh.
# Users can still launch Discover manually; updates also happen
# via dnf5-automatic.timer. This only suppresses the autostart.
Type=Application
Name=Discover Update Notifier
Exec=true
Hidden=true
X-KDE-autostart-condition=

@ -0,0 +1,15 @@
[Unit]
Description=User background services (low priority)
# For per-user cloud-sync / indexer / backup tools the user opts into
# (Syncthing, rclone, file indexers, etc). Drop a service drop-in at
# ~/.config/systemd/user/<unit>.service.d/10-bg.conf with:
# [Service]
# Slice=user-bg.slice
# Nice=10
# IOSchedulingClass=idle
[Slice]
CPUWeight=30
IOWeight=50
MemoryHigh=3G

@ -0,0 +1,4 @@
[Service]
Slice=system-bg.slice
Nice=10
IOSchedulingClass=idle

@ -0,0 +1,5 @@
[Timer]
# Default OnBootSec fires the makecache job near login, stacking
# CPU/IO load with the desktop session bring-up. 20min delay puts
# the refresh past peak session-start activity.
OnBootSec=20min

@ -0,0 +1,4 @@
[Service]
Slice=system-bg.slice
Nice=10
IOSchedulingClass=idle

@ -0,0 +1,4 @@
[Service]
Slice=system-bg.slice
Nice=10
IOSchedulingClass=idle

@ -0,0 +1,4 @@
[Service]
Slice=system-bg.slice
Nice=10
IOSchedulingClass=idle

@ -0,0 +1,4 @@
[Service]
Slice=system-bg.slice
Nice=10
IOSchedulingClass=idle

@ -0,0 +1,15 @@
[Unit]
Description=Background system services (low priority)
Documentation=https://git.s8n.ru/veilor-org/veilor-os/src/branch/main/docs
Before=slices.target
# Holds dnf metadata refresh, PackageKit, fwupd, and other deferrable
# system maintenance. CPUWeight=20 vs default 100 means these yield
# 5:1 to the rest of system.slice under contention; idle systems still
# get full speed. MemoryHigh=4G is a soft cap — kernel reclaims pages
# rather than evicting interactive workloads when these grow.
[Slice]
CPUWeight=20
IOWeight=50
MemoryHigh=4G

@ -0,0 +1,7 @@
[Slice]
# Logged-in user sessions get 3x weight vs default. Combined with
# system-bg.slice CPUWeight=20, ratio is 15:1 in the interactive
# session's favour when CPU is contended — kwin/plasmashell win
# scheduling over dnf-makecache / fwupd-refresh / packagekit.
CPUWeight=300
IOWeight=200