How do you continually test and release new versions of systemd with confidence? And once released, how do you monitor PID 1 itself, and your PID 1 usage, across your server fleet? This talk dives into how Meta answers these questions to minimize the risk of the breaking changes (and fun) each systemd release brings. Some of the technology in the talk is OSS, so you, too, can join in on the fun and know how systemd is used across your own infrastructure!
This talk will dive into how Meta baselines its systemd usage across the fleet and uses that data for CI, releasing, and monitoring systemd.
* Who am I + what do I work on
* The common big monitoring hole many bare-bones infrastructures have
  * PID 1
  * PID 1 usage
* systemd @ Meta
  * Imaging initrd
  * Initrd
  * Main OS
  * Twine containers
* Overview of OS image building and deployment @ Meta
  * How we build images
  * How we provision servers
  * Chef's role
  * What we check from our PID 1 statistics to ensure a box is "healthy" enough to take workloads
* Usage of Hyperscale's systemd-cd @ Meta
  * What is systemd-cd
    * [https://sigs.centos.org/hyperscale/internal/ci/](https://sigs.centos.org/hyperscale/internal/ci/)
  * How we use it
  * What issues it has found for us
* Monitoring of Meta's systemd usage across millions of hosts
  * Stats collected
  * Introduce monitord
    * D-Bus (fun) vs. varlink
    * Mention OSS alternative(s) found and explain why we invented monitord
  * Introduce monitord-exporter
  * Show usage outside of Meta (my small home infra + VPSes)
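To give a flavor of the "is this box healthy enough to take workloads" idea above, here is a minimal sketch in Python. It is a hypothetical illustration, not Meta's actual checks or monitord's implementation: it assumes we shell out to `systemctl show` for systemd manager properties and gate health on `SystemState` being `running` with zero failed units.

```python
#!/usr/bin/env python3
"""Hypothetical sketch of a PID 1 health baseline check."""
import subprocess


def parse_manager_props(show_output: str) -> dict:
    """Parse `systemctl show` KEY=VALUE lines into a dict."""
    props = {}
    for line in show_output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key] = value
    return props


def is_healthy(props: dict) -> bool:
    """Hypothetical health gate: manager is running and no units failed."""
    return (
        props.get("SystemState") == "running"
        and props.get("NFailedUnits", "0") == "0"
    )


def collect() -> dict:
    """Query the live systemd manager (only works on a systemd host)."""
    out = subprocess.run(
        ["systemctl", "show", "--no-pager"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_manager_props(out)
```

A real fleet-wide collector would export these properties as time series rather than a boolean, which is closer to what the monitord + monitord-exporter section covers.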
Licensed to the public under https://creativecommons.org/licenses/by/4.0/de/