leah blogs

08jan2022 · How to check you're in the initial pid namespace?

It all started with a simple question: how can a Linux process determine whether it is the init process of a freshly booted system?

A dozen years ago, the Unix textbook answer to this would have been: well, if its process id (pid) is 1, then it is init by definition.

These days, things are not that simple anymore. Containerization creates situations where the pid is 1, but the process runs, well, in a container. In Linux, this is realized using a feature called “pid namespaces”. The clone(2) syscall can take the flag CLONE_NEWPID (“since Linux 2.6.24”), which puts the new process into a new pid namespace. This means that this process has pid 1 inside the pid namespace, but outside (i.e. in the parent pid namespace), it has a regular pid. Various Linux APIs transparently translate pids between these namespaces.

The pid namespaces form a hierarchy, and the one at the very top is called the “initial pid namespace”.

You can use the tool unshare(1) to play with pid namespaces:

% unshare --fork --map-root-user --pid bash -c 'echo $$' 
1

This is a way to spawn (as a regular user!) a process that has pid 1; at least, that’s what it looks like to the process.
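For reference, here is a minimal C sketch of what this unshare(1) invocation does (error handling kept short; unlike --map-root-user, it does not bother setting up a uid mapping):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    /* unshare(CLONE_NEWPID) affects later children, not the caller;
       CLONE_NEWUSER is what makes this work as a regular user */
    if (unshare(CLONE_NEWUSER | CLONE_NEWPID) < 0) {
        perror("unshare");
        return 1;
    }
    pid_t pid = fork();  /* the first child becomes pid 1 of the new namespace */
    if (pid == 0) {
        printf("%d\n", (int)getpid());  /* prints 1 */
        return 0;
    }
    waitpid(pid, NULL, 0);
    return 0;
}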

We can try to find some evidence that we’re a freshly booted init, but none of it is really conclusive:

  • Our user id is 0, i.e. we are root (necessary, but of course not sufficient).
  • $TERM should be linux; trivial to override.
  • $BOOT_IMAGE is set, but this depends on the boot loader.
  • System uptime is “low”, but it takes the initrd boot time into account, and an init inside a container could equally be spawned at boot time.

There are also some indicators that the process runs in a container using one of the popular solutions such as docker or podman:

  • The process has a lot of supplementary groups already.
  • If we were put inside a cgroup, reading /proc/1/cgroup will indicate it.
  • The file /.dockerenv exists.

But there are still situations, such as the unshare call above, where none of these indicators apply.

Therefore I tried to find the ultimate way to detect whether we are in the initial pid namespace.

I started to research this and quickly found the ioctl(2) NS_GET_PARENT, which seemed useful: “Returns a file descriptor that refers to the parent namespace of the namespace referred to by fd.” However, it is useless for this purpose:

EPERM  The requested namespace is outside of the caller's
       namespace scope.  This error can occur if, for example,
       the owning user namespace is an ancestor of the caller's
       current user namespace.  It can also occur on attempts to
       obtain the parent of the initial user or PID namespace.

Of course, it makes a lot of sense that we cannot get a handle to the surrounding pid namespace, as this would make the encapsulation provided by namespaces futile. However, coalescing these two error conditions (namespace is outside the caller namespace, and namespace is initial pid namespace) doesn’t make our life easier.
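To see the ambiguity concretely, here is a sketch of how one would try to use it (NS_GET_PARENT comes from <linux/nsfs.h>, “since Linux 4.9”):

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/nsfs.h>
#include <stdio.h>
#include <sys/ioctl.h>

int main(void) {
    int fd = open("/proc/self/ns/pid", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    /* on success, returns a new fd referring to the parent pid namespace */
    int parent = ioctl(fd, NS_GET_PARENT);
    if (parent >= 0)
        puts("nested pid namespace, parent is in scope");
    else if (errno == EPERM)
        puts("initial pid namespace... or a parent out of scope");
    else
        perror("ioctl");
    return 0;
}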

So, we need to bring out the bigger guns. I searched the kernel source for occurrences of init_pid_ns, as this namespace is called in the Linux source code. There are not too many occurrences we can rely on. The taskstats module limits the TASKSTATS_CMD_ATTR_REGISTER_CPUMASK command to the initial pid namespace only, but using this requires speaking the netlink interface, which is terrible. Also, the behavior could change in future versions.

One interesting, and viable approach, is this limitation of the reboot(2) syscall: only some LINUX_REBOOT_CMD_* commands are allowed to be sent inside a nested pid namespace. Now, we need to find a “harmless” command to call reboot(2) with to test this! (Obviously, only being able to suspend the machine from the initial pid namespace is not a very useful check…) There are two commands that do not do much harm: LINUX_REBOOT_CMD_CAD_{ON,OFF} will toggle the action that Ctrl-Alt-Delete performs. Unfortunately, it is impossible to read the state of this flag, making this test a destructive operation still. (But if you are pid 1, you may want to set it anyway, so you get pid namespace detection for free.)
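If you can live with that side effect, a sketch of this test could look as follows; it assumes the kernel keeps rejecting CAD commands with EINVAL inside nested pid namespaces, and it needs CAP_SYS_BOOT (RB_ENABLE_CAD is glibc’s name for LINUX_REBOOT_CMD_CAD_ON):

#include <errno.h>
#include <stdio.h>
#include <sys/reboot.h>

int main(void) {
    /* destructive: enables the Ctrl-Alt-Del action as a side effect */
    if (reboot(RB_ENABLE_CAD) == 0)
        puts("initial pid namespace");
    else if (errno == EINVAL)
        puts("nested pid namespace");  /* CAD commands are rejected here */
    else
        perror("reboot");  /* e.g. EPERM: we lack CAP_SYS_BOOT */
    return 0;
}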

So I kept looking for other ways until I realized there’s a quite natural property to check for, and that is to find out if there are kernel threads in the pid namespace. Kernel threads are spawned by the kernel in the initial pid namespace and help perform certain asynchronous actions the kernel has to do, subject to process scheduling. As far as I know, kernel threads never occur in a nested pid namespace, and at least the parent process of kernel threads, kthreadd, will always exist. Conveniently, it also always has pid 2.

Thus, we just need to figure out if pid 2 is a kernel thread! Note that just checking whether pid 2 exists is cheap, but racy: the container runtime could have spawned another process before we are scheduled to do the check, and that process will then have gotten pid 2.

Luckily, kernel threads have quite a few special properties, which vary in how hard they are to check from a C program:

  • /proc/PID/cmdline is empty (not a good indicator, user space processes can clear it too).
  • kernel threads have parent pid 0 (requires parsing /proc/PID/stat, which everyone gets wrong the first time, or /proc/PID/status).
  • kernel threads have no Vm* data in /proc/PID/status.
  • kernel threads have the flag PF_KTHREAD set (requires parsing /proc/PID/stat again).
  • kernel threads have an empty symlink for /proc/PID/exe.

I decided to go with the last one. On Linux, empty symlinks are impossible to create as a user, so we just need to check for that and we’re done, right?

On a regular file system, using lstat(2) would have filled st_size with the length of the symlink. But on a procfs, lstat is not to be trusted, and even non-empty symlinks have st_size equal to 0. We thus really need to use the readlink(2) syscall to read the link. After doing this, you will notice that it returns ENOENT… exactly the same as if pid 2 did not exist!

We therefore need another check to verify that pid 2 does exist. Luckily, here an lstat(2) on the /proc/2/exe file is fine: it must return zero.

Note that you need to do these operations in exactly this order, otherwise you are subject to race conditions again: the only reason this works is that if pid 2 is kthreadd, it will not have terminated before the lstat check (because it cannot terminate).

Therefore, readlink(2) failing with ENOENT and lstat(2) succeeding is exactly the combination required to check pid 2 is kthreadd, which implies there are kernel threads in our pid namespace, which implies that we are in the initial namespace.
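Putting it all together, a minimal C sketch of the complete check:

#include <errno.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* true if pid 2 is kthreadd, i.e. we are in the initial pid namespace */
bool in_initial_pid_ns(void) {
    char buf[64];
    struct stat st;

    /* step 1: readlink must fail with ENOENT, as it does for the
       empty symlinks of kernel threads */
    if (readlink("/proc/2/exe", buf, sizeof buf) >= 0)
        return false;  /* pid 2 has an executable: a user space process */
    if (errno != ENOENT)
        return false;  /* e.g. EACCES: also a user space process */

    /* step 2: ...but pid 2 must exist at all; this order is race-free,
       because kthreadd cannot terminate */
    return lstat("/proc/2/exe", &st) == 0;
}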

Phew, this went deeper than expected.

NP: David Bowie—Lazarus

24dec2021 · Merry Christmas!

Picture of a cat in front of a Christmas tree

Frohe Weihnachten, ein schönes Fest, und einen guten Rutsch ins neue Jahr wünscht euch
Leah Neukirchen

Merry Christmas and a Happy New Year!

NP: Sade—Keep Looking

10dec2021 · Surveying lava basins with BQN and fixpoints

Yesterday, Advent of Code had an interesting problem: given the heightmap of a lava cave, find the low points and the sizes of their basins (connected regions).

Let’s do this in BQN again, as this problem teaches some good ways to think in array languages.

First, let’s load the input data into a matrix:

   d ← > '0' -˜ •FLines"day09"

We subtract the character 0 from the ASCII lines to get numerical rows. The merge function (>) then converts this list-of-lists into a 2-dimensional array. For the sample data, we get:

┌─
╵ 2 1 9 9 9 4 3 2 1 0
  3 9 8 7 8 9 4 9 2 1
  9 8 5 6 7 8 9 8 9 2
  8 7 6 7 8 9 6 7 8 9
  9 8 9 9 9 6 5 6 7 8
                      ┘

A low point is a point that is lower than every orthogonally adjacent point. Thanks to array programming, we can solve this for the whole matrix at once without any loops!

The core idea is to shift the array into each cardinal direction, and then compute the minimum of these arrays. Where the original array is smaller than the array of minimums, we have a low point.

By default, shifting («, ») in BQN inserts zeroes for numerical arrays. But since we are looking for the minimum, we need to shift in a value that is higher than anything else. We can simply use ∞.

So, to shift in ∞ from the left, we use:

   ∞»˘d
┌─
╵ ∞ 2 1 9 9 9 4 3 2 1
  ∞ 3 9 8 7 8 9 4 9 2
  ∞ 9 8 5 6 7 8 9 8 9
  ∞ 8 7 6 7 8 9 6 7 8
  ∞ 9 8 9 9 9 6 5 6 7
                      ┘

Unfortunately, shifting from the top is not so easy:

   ∞»d
Error: shift: =𝕨 must be =𝕩 or ¯1+=𝕩 (0≡=𝕨, 2≡=𝕩)
at ∞»d

We would need to make a list of ∞ long enough to agree with the array width:

   (∞¨⊏d)»d
┌─
╵ ∞ ∞ ∞ ∞ ∞ ∞ ∞ ∞ ∞ ∞
  2 1 9 9 9 4 3 2 1 0
  3 9 8 7 8 9 4 9 2 1
  9 8 5 6 7 8 9 8 9 2
  8 7 6 7 8 9 6 7 8 9
                      ┘

However, since we need to do this on every side, we can also look at the problem differently: we shift in ∞ from the left under rotation by 0, 90, 180, 270 degrees.

How do we “rotate” a matrix? We reverse the rows (⌽) and then transpose (⍉) it.

   ⍉⌽d
┌─
╵ 9 8 9 3 2
  8 7 8 9 1
  9 6 5 8 9
  9 7 6 7 9
  9 8 7 8 9
  6 9 8 9 4
  5 6 9 4 3
  6 7 8 9 2
  7 8 9 2 1
  8 9 2 1 0
            ┘

By using the repeat modifier (⍟) we can easily rotate several times.

   (⍉∘⌽⍟(↕4)) d
┌─
· ┌─                      ┌─            ┌─                      ┌─
  ╵ 2 1 9 9 9 4 3 2 1 0   ╵ 9 8 9 3 2   ╵ 8 7 6 5 6 9 9 9 8 9   ╵ 0 1 2 9 8
    3 9 8 7 8 9 4 9 2 1     8 7 8 9 1     9 8 7 6 9 8 7 6 7 8     1 2 9 8 7
    9 8 5 6 7 8 9 8 9 2     9 6 5 8 9     2 9 8 9 8 7 6 5 8 9     2 9 8 7 6
    8 7 6 7 8 9 6 7 8 9     9 7 6 7 9     1 2 9 4 9 8 7 8 9 3     3 4 9 6 5
    9 8 9 9 9 6 5 6 7 8     9 8 7 8 9     0 1 2 3 4 9 9 9 1 2     4 9 8 9 6
                        ┘   6 9 8 9 4                         ┘   9 8 7 8 9
                            5 6 9 4 3                             9 7 6 7 9
                            6 7 8 9 2                             9 8 5 6 9
                            7 8 9 2 1                             1 9 8 7 8
                            8 9 2 1 0                             2 3 9 8 9
                                      ┘                                     ┘
                                                                              ┘

Finally, we perform the shift operation under (⌾) the rotation, that is, BQN rotates the array, does the shift, and knows how to undo the rotation!

   {∞⊸»˘⌾(⍉∘⌽⍟𝕩)d}¨↕4
┌─
· ┌─                      ┌─                      ┌─                      ┌─
  ╵ ∞ 2 1 9 9 9 4 3 2 1   ╵ 3 9 8 7 8 9 4 9 2 1   ╵ 1 9 9 9 4 3 2 1 0 ∞   ╵ ∞ ∞ ∞ ∞ ∞ ∞ ∞ ∞ ∞ ∞
    ∞ 3 9 8 7 8 9 4 9 2     9 8 5 6 7 8 9 8 9 2     9 8 7 8 9 4 9 2 1 ∞     2 1 9 9 9 4 3 2 1 0
    ∞ 9 8 5 6 7 8 9 8 9     8 7 6 7 8 9 6 7 8 9     8 5 6 7 8 9 8 9 2 ∞     3 9 8 7 8 9 4 9 2 1
    ∞ 8 7 6 7 8 9 6 7 8     9 8 9 9 9 6 5 6 7 8     7 6 7 8 9 6 7 8 9 ∞     9 8 5 6 7 8 9 8 9 2
    ∞ 9 8 9 9 9 6 5 6 7     ∞ ∞ ∞ ∞ ∞ ∞ ∞ ∞ ∞ ∞     8 9 9 9 6 5 6 7 8 ∞     8 7 6 7 8 9 6 7 8 9
                        ┘                       ┘                       ┘                       ┘
                                                                                                  ┘

Now we insert (´) the minimum function (⌊) between these arrays and compute the minimum at each position:

   ⌊´{∞⊸»˘⌾(⍉∘⌽⍟𝕩)d}¨↕4
┌─
╵ 1 2 1 7 4 3 2 1 0 1
  2 1 5 6 7 4 3 2 1 0
  3 5 6 5 6 7 4 7 2 1
  7 6 5 6 7 6 5 6 7 2
  8 7 6 7 6 5 6 5 6 7
                      ┘

The positions where the original array d is still smaller are the low points, and we store them for part 2.

   l ← d < ⌊´{∞⊸»˘⌾(⍉∘⌽⍟𝕩)d}¨↕4
┌─
╵ 0 1 0 0 0 0 0 0 0 1
  0 0 0 0 0 0 0 0 0 0
  0 0 1 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 1 0 0 0
                      ┘

To finish part 1, we need to compute the risk level for each low point, which is 1 plus the height. So we compute that:

   (1+d)×l
┌─
╵ 0 2 0 0 0 0 0 0 0 1
  0 0 0 0 0 0 0 0 0 0
  0 0 6 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 6 0 0 0
                      ┘

Finally, we compute the end result by deshaping (⥊) the array into a single long list and summing it up:

   +´⥊(1+d)×l
15

This concludes part 1.

Part 2 is less straightforward. We need to compute the basins around every low point, which are the areas bounded by the points of height 9.

Since we need to compute the size of each basin in the end, we need to know which point belongs to which basin. To get started, we first give each low point a unique number. One way to do this is to assign an index to each position and keep just those that are marked in the low point array.

In BQN, the shape (≢) of an array is the list of sizes for each axis:

   ≢d
⟨ 5 10 ⟩

We count up to the product of these values and add one (to avoid numbering a basin with 0, which will be used for the basin limits). Then, we multiply this with the array of low points so all ones in it get turned into a unique basin index. We perform the multiplication under deshape, so we keep the shape of the input data:

   s ← (1+↕×´≢d) ×⌾⥊ l
┌─
╵ 0 2  0 0 0 0  0 0 0 10
  0 0  0 0 0 0  0 0 0  0
  0 0 23 0 0 0  0 0 0  0
  0 0  0 0 0 0  0 0 0  0
  0 0  0 0 0 0 47 0 0  0
                         ┘

Here’s the main idea for solving part 2: we incrementally grow these areas by adding their neighbors until the whole array is filled. For one step, we do this with a function Rise:

   Rise ← (d≠9)⊸×(»⌈«⌈«˘⌈»˘⌈⊣)

Here, shifting in zeroes is good enough, so we can use the monadic shift functions. The train at the end computes the maximum (⌈) of the four shifted versions and the array itself (⊣). We multiply it with a matrix that is 0 where the depth is 9, so the basin limits will be constantly zero.

Let’s run it once and twice to see how it works:

   Rise s
┌─
╵ 2  2  0  0 0  0  0  0 10 10
  0  0 23  0 0  0  0  0  0 10
  0 23 23 23 0  0  0  0  0  0
  0  0 23  0 0  0 47  0  0  0
  0  0  0  0 0 47 47 47  0  0
                              ┘
   Rise⍟2 s
┌─
╵ 2  2  0  0  0  0  0 10 10 10
  2  0 23 23  0  0  0  0 10 10
  0 23 23 23 23  0  0  0  0 10
  0 23 23 23  0  0 47 47  0  0
  0  0  0  0  0 47 47 47 47  0
                               ┘

As you can see, row 1 column 3 does not get filled by basin #2 since it has height 9.

Now we could just iterate this step often enough, or actually only until we reach a fixpoint; that is, until applying Rise doesn’t change the value anymore.

A simple way to implement a fixpoint operator is

_fix ← { 𝕩 ≡ 𝔽 𝕩 ? 𝕩 ; 𝕊 𝔽 𝕩 }

Let’s compute the filled map:

   Rise _fix s
┌─
╵  2  2  0  0  0 10 10 10 10 10
   2  0 23 23 23  0 10  0 10 10
   0 23 23 23 23 23  0 47  0 10
  23 23 23 23 23  0 47 47 47  0
   0 23  0  0  0 47 47 47 47 47
                                ┘

Now, we just need to compute the sizes of the basins, which means computing a histogram of the basin numbers. We deshape the array again, and keep only the values bigger than zero:

   m ← ⥊ Rise _fix s
   (m>0)/m
⟨ 2 2 10 10 10 10 10 2 23 23 23 10 10 10 23 23 23 23 23 47 10 23 23 23 23 23 47 47 47 23 47 47 47 47 47 ⟩

With the nice group indices function (⊔), we can count how often each value appears:

   ≠¨⊔(m>0)/m
⟨ 0 0 3 0 0 0 0 0 0 0 9 0 0 0 0 0 0 0 0 0 0 0 0 14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 ⟩

Finally, let’s compute the three largest values by sorting in descending order (∨) and taking the first three entries (↑):

   3↑∨≠¨⊔(m>0)/m
⟨ 14 9 9 ⟩

We then multiply these together and get the result for part 2:

   ×´3↑∨≠¨⊔(m>0)/m
1134

NP: The New Basement Tapes—Quick Like a Flash

07dec2021 · Counting lanternfish with BQN and linear algebra

Yesterday, Advent of Code had an interesting problem: modeling the growth rate of lanternfish. Read the link for the full details, but the core algorithm is this:

Each day, a 0 becomes a 6 and adds a new 8 to the end of the list, while each other number decreases by 1 if it was present at the start of the day.

For part 1, you need to compute the total number of lanternfish after 80 days, and for part 2 after 256 days. While the first part is possible with a naive list-based version (around 300k), the second part yields over 1.6 trillion for my input, so storing every lanternfish on its own is infeasible.

Luckily, with a bit of thinking we quickly realize that the position and order of the lanternfish are irrelevant: we can just count how many fish of each age we have, and sum the counts up at the end for the answer.

I decided to tackle this problem using the array language BQN and will try to explain how it works.

First, let’s convert the sample input data into a histogram: for reasons that will become obvious in a minute, we compare each input element with the list of numbers from 0 to 8 (↕9).

   input ← ⟨3,4,3,1,2⟩
   input =⌜ ↕9
┌─
╵ 0 0 0 1 0 0 0 0 0
  0 0 0 0 1 0 0 0 0
  0 0 0 1 0 0 0 0 0
  0 1 0 0 0 0 0 0 0
  0 0 1 0 0 0 0 0 0
                    ┘

If we now sum this table by column, we have a count of how often each age appears (note that ages over 8 never happen):

   d ← +˝ input =⌜ ↕9
⟨ 0 1 1 2 1 0 0 0 0 ⟩

Next, we need to figure out what happens in a step. We can model one part (number decreases by 1, or becomes 8 when it was 0) by rotating (⌽) the array to the left:

   1 ⌽ d
⟨ 1 1 2 1 0 0 0 0 0 ⟩
   2 ⌽ d
⟨ 1 2 1 0 0 0 0 0 1 ⟩

But we also need to add the first element to column 6, e.g. by adding an appropriately scaled vector. First, the vector, which we get by taking the numbers from 0 to 8 and comparing them to 6:

   6=↕9
⟨ 0 0 0 0 0 0 1 0 0 ⟩

Now we can scale this vector by the first element (⊑):

   (6=↕9) × ⊑ ⟨9,8,7,6,5,4,3,2,1⟩
⟨ 0 0 0 0 0 0 9 0 0 ⟩

Finally, we put both things together. We can use a hook if we bind (⊸) the argument of the rotation to the function:

Step ← 1⊸⌽ + (6=↕9)×⊑

Now, we simply need to repeat (⍟) this step 80 (resp. 256) times and sum up the results to solve the puzzle!

   +´ Step⍟80 d
5934
   +´ Step⍟256 d
26984457539

This agrees with the provided example.

Of course, computing 256 steps is very fast, but we can still wonder whether there is a more efficient solution. Let’s say we wanted to compute many lanternfish populations quickly.

The key realization is that Step is a linear function, which means:

(n × Step v) ≡ (Step n × v)
(Step v) + (Step w) ≡ (Step v + w)

We can thus compute Step for every basis vector of our 9-dimensional histogram space; let’s take the unit vectors here:

   step256 ← {+´Step⍟256 𝕩}˘ =⌜˜↕9
⟨ 6703087164 6206821033 5617089148 5217223242 4726100874
  4368232009 3989468462 3649885552 3369186778 ⟩

And now we can solve part 2 for any input by using a dot-product (pairwise multiplication followed by addition):

   step256 +´∘× d
26984457539

This yields the same result as above, but only does 9 multiplications and 9 additions.

We can also compute step256 faster by using a bit of matrix theory. Since we know the operation of step on the basis vectors, we can compute the matrix m corresponding to this linear operator:

   m ← Step˘ =⌜˜↕9
┌─
╵ 0 0 0 0 0 0 1 0 1
  1 0 0 0 0 0 0 0 0
  0 1 0 0 0 0 0 0 0
  0 0 1 0 0 0 0 0 0
  0 0 0 1 0 0 0 0 0
  0 0 0 0 1 0 0 0 0
  0 0 0 0 0 1 0 0 0
  0 0 0 0 0 0 1 0 0
  0 0 0 0 0 0 0 1 0
                    ┘

Note how we just applied Step for every column of the identity matrix.

We can now solve the problem the same way, by computing the 256th power of the matrix, multiplying it with the histogram vector, and summing up again.

First, we need the matrix product:

   MP ← +˝∘×⎉1‿∞

Naively, we can compute the power like this (we already have m to the power of 1!):

   m256 ← m MP⍟255 m

Now multiply d to it, and sum up:

   +´ m256 +˝∘× d
26984457539

Note that step256 is just the sum of rows, so we can precompute that:

   step256 ≡ +˝˘ m256
1

If you paid attention, you may now wonder why we replaced 256 applications of Step with 255 matrix multiplications and still claim it’s faster to compute. We need an additional trick: fast matrix exponentiation. Instead of 255 matrix multiplications, we just need 8 squarings of the matrix with itself to compute the 256th power:

   m256 ≡ MP˜⍟8 m
1

This yields an asymptotically faster algorithm to compute the step vector for higher powers. For non-powers of two, you need to implement a square-and-multiply algorithm, or look up the optimal result. This is left as an exercise for the reader.
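For the curious, here is a minimal sketch of square-and-multiply in C rather than BQN; the uint64_t entries are an assumption that holds comfortably for 256 days, but not for arbitrarily large exponents:

#include <stdint.h>
#include <string.h>

#define N 9

/* out = a times b; the temporary allows out to alias a or b */
static void mp(uint64_t out[N][N], uint64_t a[N][N], uint64_t b[N][N]) {
    uint64_t t[N][N] = {{0}};
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                t[i][j] += a[i][k] * b[k][j];
    memcpy(out, t, sizeof t);
}

/* out = m to the power e, using one squaring per bit of e */
static void mpow(uint64_t out[N][N], uint64_t m[N][N], unsigned e) {
    uint64_t sq[N][N];
    memcpy(sq, m, sizeof sq);
    memset(out, 0, sizeof(uint64_t[N][N]));
    for (int i = 0; i < N; i++)
        out[i][i] = 1;  /* start from the identity matrix */
    for (; e > 0; e >>= 1) {
        if (e & 1)
            mp(out, out, sq);  /* multiply in the current power of two */
        mp(sq, sq, sq);        /* square for the next bit */
    }
}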

For day 80 we can use only 7 matrix multiplications:

   (MP˜⍟4 (m MP MP˜⍟2 m)) ≡ (m MP⍟79 m)
1

In theory, you could optimize this even further and compute a closed form from the eigenvectors of m, but you’ll end up with nonic roots, and I have not found a way to compute them precisely enough to be useful. So let’s leave it at that for now.

NP: Slade—My Oh My

20jan2021 · Remembering the work of David M. Tilbrook and the QED editor

Last week, I learned that David M. Tilbrook has died, which made me sad. I did not know him personally and cannot say much about his life, but I studied his publications and software ideas a lot, and consider them interesting, especially from a historic perspective.

Unfortunately, most of these things are on websites that have already been taken down, so this post will refer to pages on the Internet Archive extensively. [Update 2020-01-16: His son Joe Tilbrook sent me a mail stating that http://qef.com/ is up again.]

I first came across David when I researched the history of the QED editor, the direct predecessor of the Unix standard text editor ed. The history of QED is well documented by Dennis Ritchie. However, for a long time the QED source code was hard to find, and I heard that David still maintained a version.

Around 2016, I swooped a copy of Caltech qed from USENIX tape 80.1 and tried making it work on modern platforms, with moderate success. Thanks to efforts by Arnold Robbins, there is now a QED Archive which also features a copy of the 1992 QED from Toronto, which contains contributions by Tom Duff, Robert Pike, Hugh Redelmeier and David Tilbrook. If you want to run it yourself, there is a modernized, UTF-8 aware version available now!

I do not know what exactly David contributed to QED, but from his writings it’s clear he was a heavy user of QED and wrote many scripts in it. Remember that in the early 80s, awk was quite limited and Perl did not exist, so general-purpose string processing on Unix was difficult. We will see an example of a small QED script at the end of this post.

David’s opus magnum was a suite of tools called QEF, quod erat faciendum. Euclid wrote this at the end of geometric constructions, and in a software sense, we want a build system to produce what was to be made. At the time of its creation, David was responsible for maintaining a (for the time) large system, essentially a Unix distribution. Tooling was stuck in the era of 1977’s make(1). For the details and basic ideas, see Tilbrook and Place (1986), “Tools for the Maintenance and Installation of a Large Software Distribution”. (Huge thanks to Alan Grosskurth for making a copy available.) My favorite footnote is the one about their Prolog prototype of a build system: “Is the cray free? I need to reinstall /bin/true!”

Back then, Unix software development was plagued with all kinds of incompatibilities and vendor-specific workarounds, and QEF was one of the first tools to provide concepts like out-of-tree builds, automatic dependency detection, and provided portable abstractions for things like creating shared libraries, which required performing occult rituals back then.

The QEF whitepaper from 1996 explains the system at a more developed state.

What is intriguing is that the whole toolkit is created in classic Unix manner from small tools and little languages. I acquired an evaluation copy of QEF in 2015, but sadly it had no copy of QED included. However, I could read the full manpages for many of his tools.

Looking at these tools was instructive; many are obsolete now because their features have been added to standard tools, or because we can now afford to use more general, less efficient tools. But for example, his go tool directly influenced the user interface of my nq tool, and reading about rls, lls, fexists, and ftest inspired my lr. His rpl lives on as my mend.

There are also additional resources and talks by David worth checking out.

Well, I promised you some QED code earlier, so let’s do fizzbuzz. (I heard this is useful in job interviews, so next time why not ask whether you can use QED for the task!)

It turns out this does not get as obscure as expected, but I’m not claiming what I wrote is idiomatic QED. I hacked it together while studying the tutorial a little bit.

ba
"loop
zc#+1

a\N\N.
za#:\zc%3=0 yf s/$/fizz/
za#:\zc%5=0 yf s/$/buzz/
s/^$/\zc/

zc#=100 yf`loop

,p
Q

We can run this using the convenient -x flag:

% qed -x fizzbuzz.qed
1
2
fizz
4
buzz
fizz
7
8
fizz
buzz
11
fizz
13
14
fizzbuzz
16
    ...

So, how does it work? First, we switch to buffer a of QED’s 56 available buffers (ba). Due to the -x flag, we start in the script buffer, named ~, which made me scratch my head in the beginning because the script always printed its own source at first!

Next, we set up a loop. QED provides structured programming in the form of a while loop, but that requires fitting the loop body on the same line. Instead, we will use jumps. We start with a label "loop and increment the c register using zc#+1. The # enables arithmetic mode for registers; they can also be strings. Since the register is empty, it will be regarded as zero. The end of the loop is the zc#=100 yf`loop line, which checks whether the c register equals 100, and if not (yf) jumps back to the label loop. Curiously, QED has different commands for jumping forwards (') and backwards (`). Having explicit directions is, I assume, an implementation detail of the buffer-based program storage, but compare with Unix’s goto or TECO’s O command, which both search from the top.

Inside the loop, we first append an empty line (a\N\N.) This works like in ed:

a

.

But QED allows using \N as a newline, so we can save some screen space.

Then, we need to do the fizzbuzz logic. The command za#:\zc%3=0 will copy the c register to the a register (we need to do this, as all arithmetic operations store their result in a register), then we take the a register modulo 3 and check if the register is then zero. If it isn’t, we jump a line forward (yf). If we didn’t jump, we append fizz to the line, using the s command (pretty much like in ed).

We deal with the buzz case the same way, but modulo 5.

If the line is still empty, we have to say the number, so we append it using s again, and this time insert the c register into the input stream using \zc. Note that this is not interpolated by the s command, but by QED itself. You can write \zc (or \N) essentially everywhere and it will work as if you had typed it in! A \ can be quoted as \c, so we can do this:

za:foo\czb
zb:bar
a
\za
.
p
foobar

Note how the insertion of the a register resulted in the b register being inserted as well! Luckily, the people involved with QED did better when they wrote shells later.

At the end of the loop, we have the fizzbuzz output in the buffer, so we can just print out the whole thing (,p) and quit QED with no questions asked (Q).

That wasn’t so bad, was it? Just like the tutorial says:

By striking a harmonious balance between Qed and UNIX’s other tools, the intelligent user will find Qed powerful, flexible, easy to master and fun!

NP: Talking Heads—Seen And Not Seen

24dec2020 · Merry Christmas!

Frohe Weihnachten, ein schönes Fest, und einen guten Rutsch ins neue Jahr wünscht euch
Leah Neukirchen

Merry Christmas and a Happy New Year!

NP: Elvis Perkins—It's A Sad World After All

23oct2020 · Efficient text editing on a PDP-10

I did not know what rabbit hole I’d fall into when I clicked on last week’s Lazy Reading post and discovered the link to SAILDART. The linked e-book gives a good preview of what we can find there: a complete archive of the Stanford AI project’s PDP-10 setup (a.k.a. SAIL), and large parts of it are open for public access!

Back in the early seventies, when American universities got the first computers suitable for multi-user usage, it was common to develop custom operating systems (or heavily modify the vendor’s one), as research facilities had special needs, and also a different mindset than commercial offerings.

I was mostly familiar with the Incompatible Timesharing System (ITS) of MIT (written for a PDP-6, later on a PDP-10), due to its leakage of concepts and culture into Lisp Machines and the GNU project. And of course, Berkeley is known for developments around Unix in the late seventies which then turned into BSD. However, I had only remotely heard of WAITS, the operating system used at the Stanford Artificial Intelligence Laboratory, which first ran on a PDP-6, and later on a PDP-10. Initial development was based on the standard DEC monitor (“kernel”). [2020-10-26: Rich Anderson points out that it was not based on TOPS-10, but rather a spinoff.]

So I started to dig around in the SAILDART archives, and quickly found Donald Knuth’s TeX working directory, because this actually was the system TeX initially was developed on! Not only the early TeX78 can be found there (look for the TEX*.SAI files), but also the source code of TeX82, written in literate Pascal, which essentially still is the core of today’s TeX setups.

But not only that, we also can find the source of the TeXbook and parts of TAOCP. (Looking at TeX code from its creator himself is very instructive, by the way.)

One thing that surprised me was that all these files were rather big for the time; while the TeX78 code was split up into 6 files, TeX82 was a single 1 megabyte file and the TeXbook a single 1.5 megabyte file. This makes sense for redistribution of course, but there is no evidence that the files were not also kept around as-is for editing, which brought me to the central question of this post:

How was it possible to efficiently edit files bigger than a megabyte on a PDP-10?

Remember that, at the time in question, these systems supported at most 256 kilowords of main memory (in later versions, 4 megawords; the PDP-10 is a 36-bit machine, and one usually packed five 7-bit ASCII characters into a word), and 256 kilowords per process, so simply loading the file fully into memory was impossible. A smarter approach was needed. Earlier editors on WAITS, such as SOS (SON OF STOPGAP; STOPGAP was an even older editor), had to write the file with the changes you made into a different output file, which was then moved to overwrite the original contents. Of course, this had the disadvantage that saving big files was very slow, as rewriting a megabyte file easily could take several tens of seconds.

The most popular editor on WAITS, called E (manual as PDF, reference, source), had a better approach: big text files were split into pages, separated by form feeds (^L), and the editor only loaded some of these pages into main memory. It was recommended to keep pages below 200 lines, which is roughly 3 kilowords. Finally, E edited files in-place and only wrote out changed pages, so fixing a single character typo, for example, just required a quick write.

In order to know where the pages start, E maintained a directory page, which was the first page of a file, starting with COMMENT (click on some links above to see an example) and then the list of pages and their offset. Thus, seeking to a specific page was very quick, and the directory page doubled as a table of contents for bigger files, which improved the user experience.

This directory page was part of the file. Compilers and tools had to be adjusted to ignore it if needed (well, FORTRAN ignored lines with C anyway…), but for example TeX had modifications (see “Reading the first line of a file”) to skip the directory page.

This sounded all plausible to me, until I realized that it would not work for actual editing, because you of course not only overwrite characters, but also insert a word or a line here or there when you are working on a big program. E would still have to rewrite the whole file after the insertion, I thought!

So I dug deeper and realized I had to rethink some implicit assumptions I had from using Unix-like systems for the last 20 years. On Unix, a file is a linear stream of bytes: you cannot insert data in the middle without rewriting the file until the end.

However, WAITS used a record-based file system. We can read about it in the documentation on UUOs (what a Unix user would call syscalls):

A disk or dectape file consists of a series of 200 word records. Often, these records are read (or written) sequentially from the beginning of the file to the end. But sometimes one wishes to read or alter only selected parts of a file. Three random access UUOs are provided to allow the user to do exactly that. To do random access input or output, you must specify which record you want to reference next. On the disk, the records of a file are numbered consecutively from 1 to n, where the file is n records long.

This means that a WAITS file is a sequence of records, not a sequence of bytes. And if a record was not full, it could be re-written with more content! Of course, if you inserted so much you actually needed to insert a new record, the file needed to be rewritten. This was called “bubbling”, and E also did it in-place. But for small edits, rewriting the records that contained the changed pages was enough.

I think the record-oriented file system of WAITS was actually key to supporting the editing of big files in this environment. Other systems at the time did not support this as well: Unix ed loaded the whole file into memory [2020-10-25: as Eric Fisher points out, ed uses a temporary file to keep blocks if the buffer reaches a certain size] and wrote it out again, and the Unix sources consist of many small files not larger than 1000 lines. On ITS, the only bigger files I could find were assembled from other inputs, or mail archives which were only read or appended to, but not modified inside.

However, as having more memory became feasible, all these optimizations became obsolete. All modern text editors load files directly into memory and rewrite them when saving.

The other thing I found that amazed me was how much the E command set influenced Emacs! Richard Stallman saw the E editor in action and wanted a real-time screen editor for ITS as well, so their TECO got a full screen mode. I think that Emacs’ choice of modifier keys (Control, Meta, or both) and things like prefix arguments are directly taken from E. However, E was still fundamentally line-based and only supported interactive editing of whole lines (reusing the system line editor for performance reasons). TECO was stream-oriented and then supported direct editing on the screen.

Digging through the SAILDART archives, and then looking into fragments of ITS for comparison, also showed interesting cultural differences: WAITS used mechanisms for accounting disk space and CPU usage, and projects had to be registered to be paid for (I have not heard of any such features for ITS). WAITS required logins with passwords for remote connections (this was added to ITS very late). The ITS documentation is full of slang and in-jokes. But not everything was serious on WAITS: SPACEWAR was very important and there are references to it all over the place.

There are many interesting things to be found in SAILDART, I recommend you to look around for yourself.

If you got curious now, it’s actually possible to run WAITS on your own machine! Grab SIMH and a WAITS image and you can get it running pretty easily. I recommend having a Monitor Command Manual handy. (Also note that currently there is a bug in the SIMH repository which makes graphical login impossible. I can vouch that commit c062c7589 works.)

Thanks go to Madeline Autumn-Rose, Richard Cornwell and Shreevatsa R for helpful comments. All errors in above text are mine, drop a mail if you have a correction.

NP: Bob Dylan—I Contain Multitudes

01jun2020 · 5 days with the MX Ergo

Five days ago, I got a Logitech MX Ergo trackball delivered. Here are some notes on getting used to it and how I set it up.

I learned about the MX Ergo in Michael Stapelberg’s desk setup post, and Puck also recommended it. My last experience with a trackball was with a two-button Logitech Marble Mouse probably 15 years ago, and it did not go very well.

Luckily, I get along with a thumb-driven trackball a lot better! I could use it almost instantly, and after maybe half an hour of use I already had good precision for common tasks.

The MX Ergo has a regular mouse-style 2-button setup with a clickable mousewheel in between. It feels so much like a mouse that you may find yourself trying to shift it over the table at first ;), but the solid plate will remind you not to do that.

There are two settings for the MX Ergo, either flat or tilted by 20 degrees. I find the tilted setting more comfortable. The trackball then lies comfortably in my hand.

I use the included Logitech Unifying receiver which works out of the box on Linux. I have not tried using Bluetooth yet, but I may in the future when I run out of USB ports.

There are two extra buttons (buttons 8 and 9) next to the left mouse button, which by default go forward/backward in the browser history. I don’t need that and reconfigured them as follows: the top button is assigned to my window manager to raise a window (this would otherwise require a keyboard shortcut), while the lower button does a middle mouse click: I find it more comfortable than clicking the wheel, and I use the middle mouse button heavily. Having a physical middle mouse button would have been perfect (e.g. like the Lenovo ScrollPoint mouse).

Remapping the mouse button can be done with xinput:

xinput set-button-map $(xinput list --id-only 'pointer:Logitech MX Ergo') 1 2 3 4 5 6 7 2 9

Now for the downsides: in my opinion, the scrollwheel is pretty flawed. It has 18 ratchets and no high-definition mode (at least, not one that emits higher-resolution events). Compared to scrolling with a touchpad, it feels very slow. Unfortunately, Linux provides no way to speed up scroll wheels per input device (libinput does not do it at all, and evdev only seems able to slow them down). I would vastly prefer a free-spinning mouse wheel, which many other Logitech products have.

As a work-around, I have now enabled “mouse scrolling” with the top button, so I can scroll with the ball as well. This is significantly faster and pretty comfortable, and it actually works for both axes at the same time.

With evdev, this is set up as:

xinput set-prop $(xinput list --id-only 'pointer:Logitech MX Ergo') 324 1
xinput set-prop $(xinput list --id-only 'pointer:Logitech MX Ergo') 325 6 7 4 5
xinput set-prop $(xinput list --id-only 'pointer:Logitech MX Ergo') 328 9

The MX Ergo is wireless, and thus battery powered. I have not charged it yet and it has been at 50% battery all the time (with probably more than 10h of use daily). The battery is supposed to charge very quickly; charging works over USB. It would have been nice if the trackball were also usable over this USB connection, as wirelessness is hardly necessary for a stationary device on the desk. For power saving purposes, the trackball turns off after some time, but it turns back on very quickly when being used.

Touching the trackball will also wake up my notebook by default, but this can be changed by writing disabled into the associated /sys/bus/usb/devices/*/power/wakeup.

The MX Ergo is usable with the right hand only, which means I can’t recommend it to about 10% of you: there is no left-handed option available.

My fingers are pretty long, which means my whole hand doesn’t fit on the device, and the heel of my hand rests on the table. I will maybe acquire a hand rest to go with it.

I paid 74€ for the MX Ergo, which is on the higher end of what I’d pay for a pointing device, but it feels very robust and will hopefully last a few years. I certainly don’t regret getting it and would also recommend it to others.

NP: KRS-One—Sound of da Police

11may2020 · A minimal .vimrc

I have been using Emacs since 2001, and only really learned Vim in 2007. Of course, I did quick admin edit tasks with it before, but in 2007 I had a small job where I had to develop on machines across the ocean, with only vim installed. As latency was high (and mosh didn’t exist yet), efficient use of Vim was essential. After the summer of 2007, I was fairly familiar with Vim. I still use Emacs for my main editing and programming tasks, but on remote machines and for small edit tasks I primarily use Vim.

I always found reading other people’s .vimrc files (and analyses of them) interesting, so today I’ll share mine. Well, I actually have two .vimrc files: a “big” one (233 lines) and a minimal one.

Why do you need a .vimrc at all? Well, since Vim 8, a defaults.vim with some annoying settings (such as enabling the mouse and colors) is loaded otherwise, so you have two options:

  • Run Vim as vi, but then you are in compatible mode which disables some useful things.

  • Create a .vimrc—even if it’s empty—to start vim in non-compatible mode. But then I can also fetch my minimal one and set some defaults I prefer.

    My minimal .vimrc is only 64 bytes long and does 80% of the job, so let’s dissect it:

    set nocp bs=2 cot= hid is ru sm t_te= t_ti= vb wim=longest,list
    

    As you can see, it only sets a few things in a single set command:

  • nocp: nocompatible; disable Vi compatible mode and enable a bunch of features. This is superfluous if you save the file as .vimrc, but not if you want to try it with vim -u .vimrc.minimal.

  • bs=2: backspace=indent,eol,start; allow backspace to delete the autoindent, linebreaks or the start of insert mode. This is just convenient sometimes.

  • cot=: completeopt=; this disables the popup menu for, e.g., ^X^F (filename completion). I never want popups to hide the text in my buffer.

  • hid: hidden; this allows having open buffers that are not displayed, and really should be the default.

  • is: incsearch; search as you type, pretty convenient.

  • ru: ruler; show the current line number.

  • sm: showmatch; quickly jump to the matching open bracket when typing the closing bracket.

  • t_te= t_ti=; unset the terminfo sequences that usually (de-)activate the alternate screen—I want the screen contents to stay after editing.

  • vb: visualbell; flash the display instead of beeping.

  • wim=longest,list: wildmode=longest,list; in command-line mode, complete the longest common string, then list alternatives. This is the default in bash/readline and how I configured my zsh.

So, in summary: 5 options enable features, 5 options disable annoying defaults. All in all, I think Vim has pretty good defaults.

NP: Pearl Jam—Buckle Up

02may2020 · The case of the mysterious --help directory

For quite some time—since March 2016, as my research will show—I have been annoyed by a directory named --help showing up in my home directory (and sometimes in others).

It’s just an empty, innocent directory with a weird name. Removed by a simple run of rmdir ./--help. Yet, it would turn up again. Sometimes soon, sometimes it took a few weeks.

I had no idea where it came from.

Obviously, a simple mkdir --help just prints the mkdir help:

Usage: mkdir [OPTION]... DIRECTORY...
Create the DIRECTORY(ies), if they do not already exist.
...

I thought maybe some program was called once with --help but didn’t understand it, yet stored it somewhere and would create the directory when run again. I suspected many things, especially applications I don’t use that often, like Gimp or Transmission. I grepped all dotfiles in $HOME for --help. To no avail.

I decided I had to track this down.

My first thought was to use fatrace, which logs all file accesses. But fatrace doesn’t register directory creation (the fanotify API does, I think).

I didn’t know what to do, and left the problem alone for some time. The --help directory kept reappearing.

Some time later, I read a book on eBPF, and decided I can log all mkdir(2) calls with it, globally. (I also think I tried this with SystemTap before, but I’m not sure anymore.)

I wrote a bpftrace script called mkdirsnoop, a variant of opensnoop. It prints all mkdir(2) calls and which program ran them, system wide.

I ran it a lot, but not consistently.

Yesterday, it finally triggered! I tweeted:

I’m really flipping the table.
For months, I’ve been trying to figure out what occasionally creates “--help” folders in my $HOME. I have no idea why it happens.
I wrote a bpftrace script to find the culprit.
Today it triggered.
It says, it’s mkdir(1).
(╯°□°)╯︵ ┻━┻

If it was mkdir(1), it must have been created by a script. Gimp and Transmission were acquitted of all charges.

Julien Kirch proposed to put a different mkdir into my $PATH, which I think was a smart idea. I wrote this:

#!/bin/sh
printf '%s\n' "mkdir called from $$ at $(date)" >> /tmp/mkdirlog
pstree -sap $$ >> /tmp/mkdirlog
exec /usr/bin/mkdir "$@"

Very quickly, the script triggered, and I saw:

runit,1
  `-sh,30996 -c urxvt
      `-urxvt,30997
          `-zsh,30998
              `-zsh,31055
                  `-mkdir,31058 /home/leah/bin/mkdir --help
                      `-pstree,31060 -sap 31058

So it was actually zsh calling mkdir --help (or a zsh script). But there was no directory --help, because calling mkdir --help does not create a directory; it just prints the help.

Jannis Harder told me that the zsh completion for mkdir would run mkdir --help (to see if it’s the GNU variant), which I could verify.

But it didn’t explain the --help directory.

However, I had something to track down now.

Using my extrace, I would log all runs of mkdir with a --help argument, and at some point I found:

mkdir -p -- --help

Mysterious. I grepped my dotfiles again and found the unsuspicious function mkcd:

# mkcd -- mkdir and cd at once
mkcd() { mkdir -p -- "$1" && cd -- "$1" }

This is a helper function I use a lot. But of course I don’t call it with --help and then wonder for 4 years why I did that.

Well I don’t. But zsh does, because I also had:

compdef mkcd=mkdir

This was a quick shortcut to complete mkdir’s arguments (only directories) also for mkcd. I added it in 2013. Even before I added the -p flag in 2016.

Quickly, I verified that mkcd <TAB> would create a --help directory. But only if the completion wasn’t loaded already, e.g. when I didn’t run mkdir <TAB> before. This explains why the directory only appeared sporadically.

I quickly rewrote the completion:

_mkcd() { _path_files -/ }
compdef _mkcd mkcd

And another mystery of my setup has been solved. Case closed.

NP: Pearl Jam—Take The Long Way

Copyright © 2004–2021