You may not recognize the name of Kevin S. Braunsdorf, or “ksb”
(kay ess bee) as he was called, but you have certainly used one tool he wrote
together with Matthew Bradburn, namely the implementation of test(1)
in GNU coreutils.
Kevin S. Braunsdorf died last year, on July 24, 2024, after a long illness.
In this post, I try to remember his work and legacy.
He studied at Purdue University and worked there as a sysadmin from 1986 to 1994. Later, he joined FedEx and greatly influenced how IT is run there, from software deployments to the physical design of datacenters.
Kevin was a pioneer of what we today call “configuration engineering”,
and he wrote a Unix toolkit called msrc_base
to help with these tasks.
(Quote: “This lets a team of less than 10 people run more than 3,200
instances without breaking themselves or production.”)
Together with other tools that are useful in general, msrc_base forms the
“pundits tool-chain” he built.
These tools deserve further investigation.
Now, back in those days, Unix systems were vastly heterogeneous and
riddled with vendor-specific quirks and bugs. His tooling centers
around a least common denominator: for example, m4 and make are
used heavily, as they were widely available (and later, Perl). C
programs have to be compiled on their specific target hosts. Remote
execution initially used rsh; file distribution was done with rdist.
Everything had to be bootstrappable from simple shell scripts and
standard Unix tools, and porting to new platforms was common.
The idea behind msrc
The basic concept of how msrc works was already implemented in the
first releases from 2000 that we can find online: at its core, there is
a two-stage Makefile, where one part runs on the distribution machine,
then the results get transferred to the target machine (say, with
rdist), and then a second Makefile (Makefile.host) is run there.
This is a practical and very flexible approach: configuration can be kept centralized, but if you need to run tasks on the target machine (say, to compile software across your heterogeneous architectures), that is possible as well.
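Done by hand, a single deployment of this kind boils down to something like the following (host name and staging directory invented for illustration; msrc automates and generalizes these steps):
% make -f Makefile                                      # stage one, on the distribution machine
% rdist -c ./pkg web1:/var/tmp/pkg                      # push the results to the target host
% rsh web1 'cd /var/tmp/pkg && make -f Makefile.host'   # stage two, on the target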
Over time, tools were added to parallelize this (xapply), make the
deployment logs readable (xclate), or work around resource
contention (ptbw). Likewise, tools for inventory management and
host definitions were added (hxmd, efmd). Stateful operations on
sets (oue) can be used for retrying on errors by keeping track of
failed tasks…
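For example, xapply constructs a command line per input line and can run several of them at once; if I remember the options correctly, -P sets the number of parallel tasks (the host list file here is invented):
% xapply -P4 -f 'ssh %1 uptime' hostlist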
All tools are fairly well documented, but the documentation is spread across many files, so it takes some time to understand the core ideas.
Start here if you are curious.
Dicing and mixing
Unix systems contain a number of ad-hoc text formats, such as the
format of /etc/passwd. ksb invented a tiny language to work with
such file formats, implemented by the dicer. A sequence of field
separators and field selectors can be used to drill down on formatted
data:
% grep Leah /etc/passwd
leah:x:1000:1000:Leah Neukirchen:/home/leah:/bin/zsh
% grep Leah /etc/passwd | xapply -f 'echo %[1:5 $] %[1:$/$]' -
Neukirchen zsh
The first field (the whole line) is split on :, then we select the
5th field, split by space, then select the last field ($).
For the basename of the shell, we split by /.
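Following the same pattern, the sixth field gives the home directory (my own example, derived from the syntax above):
% grep Leah /etc/passwd | xapply -f 'echo %[1:6]' -
/home/leah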
Using another feature, the mixer, we can build bigger strings from diced results, for example to format a phone number:
% echo 5555551234 | xapply -f 'echo %Q(1,"(",1-3,") ",4-6,"-",7-$)' -
(555) 555-1234
The %Q does shell-quoting here!
Since the dicer and the mixer are implemented as library routines, they appear in multiple tools.
“Business logic” in m4
One of the more controversial choices in the pundits tool-chain is that
“business logic” (e.g. things like “this server runs this OS and has
this purpose, therefore it should have this package installed”) is
generally implemented using the notorious macro processor m4. But
there were few other choices back then: awk would have been a
possibility, but it is a bit tricky to use due to its line-based
semantics. perl wasn’t around when the tool-chain was started, though
it was used later for some things. But m4 shines if you want to
convert a text file into another text file with some pieces of logic applied.
One central tool is hxmd, which takes tabular data files that contain
configuration data (such as which hosts exist and what roles they
have), and can use m4 snippets to filter them and compute custom
command lines to deploy them, e.g.:
% hxmd -C site.cf -E "COMPUTONS(CPU,NPROC)>1000" ...
Later, another tool named efmd was added that does not spawn a new m4
instance for each configuration line.
m4 is also used as a templating language. There I learned the nice
trick of quoting the entire document except for the parts where you
want to expand macros:
`# $Id...
# Output a minimal /etc/hosts to install to get the network going.
'HXMD_CACHE_TARGET`:
echo "# hxmd generated proto hosts file for 'HOST`"
echo "127.0.0.1 localhost 'HOST ifdef(`SHORTHOST',` SHORTHOST')`"
dig +short A 'HOST` |sed -n -e "s/^[0-9.:]*$$/& 'HOST ifdef(`SHORTHOST',` SHORTHOST')`/p"
'dnl
This example also shows that nested escaping was not something ksb frowned upon.
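Here is a minimal sketch of the quoting trick in isolation (file and macro names invented for illustration): the whole template stays inside m4 quotes, and the quotes are only closed around the macros that should expand.
% cat hello.m4
`Dear 'USER`,
your shell is 'SHELL`.
'dnl
% m4 -DUSER=leah -DSHELL=/bin/zsh hello.m4
Dear leah,
your shell is /bin/zsh.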
Wrapper stacks
Since many tools of the pundits tool-chain are meant to be used
together, they were written as so-called “wrappers”,
i.e. programs calling each other. For example, the above-mentioned hxmd
can spawn several commands in parallel using xapply, and these in turn
call xclate again to yield separate output streams, or use ptbw
for resource management.
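The wrapper pattern itself is easy to sketch in a few lines of shell; this is my own toy illustration of the idea, not how xclate really works:
% cat tagwrap
#!/bin/sh
# run the rest of the command line and prefix each line of its output
# with a tag, so interleaved parallel runs stay readable
tag=$1; shift
"$@" 2>&1 | sed "s/^/$tag: /"
% tagwrap web1 rsh web1 make -f Makefile.host
Wrappers like this compose by simple prefixing, which is how the pundits tools layer their capabilities.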
The great thing about the design of all these tools is how nicely they fit together. You can easily see what need drove the creation of each tool, and yet they can still be used in a very general way, even for unanticipated use cases.
Influences on my own work
Discovering these tools was important for my own Unix toolkit, and some of my tools are directly inspired by them, e.g. xe and arr.
I still ponder host configuration systems.
NP: Adrianne Lenker—Not a Lot, Just Forever