leah blogs

20jan2021 · Remembering the work of David M. Tilbrook and the QED editor

Last week, I learned that David M. Tilbrook has died, which made me sad. I did not know him personally and cannot say much about his life, but I studied his publications and software ideas a lot, and consider them interesting, especially from a historic perspective.

Unfortunately, most of these things are on websites that have already been taken down, so this post will refer to pages on the Internet Archive extensively. [Update 2021-01-26: His son Joe Tilbrook sent me a mail stating that http://qef.com/ is up again.]

I first came across David when I researched the history of the QED editor, the direct predecessor of the Unix standard text editor ed. The history of QED is well documented by Dennis Ritchie. However, for a long time the QED source code was hard to find, and I heard that David still maintained a version.

Around 2016, I grabbed a copy of Caltech qed from USENIX tape 80.1 and tried making it work on modern platforms, with moderate success. Thanks to efforts by Arnold Robbins there is now a QED Archive, which also features a copy of the 1992 QED from Toronto, containing contributions by Tom Duff, Robert Pike, Hugh Redelmeier and David Tilbrook. If you want to run it yourself, a modernized, UTF-8 aware version is available now!

I do not know exactly what David contributed to QED, but from his writings it’s clear he was a heavy user of QED and wrote many scripts in it. Remember that in the early ’80s, awk was quite limited and Perl did not exist, so general-purpose string processing on Unix was difficult. We will see an example of a small QED script at the end of this post.

David’s magnum opus was a suite of tools called QEF, quod erat faciendum. Euclid wrote this at the end of geometric constructions, and in a software sense, we want a build system to produce what was to be made. At the time of its creation, David was responsible for maintaining a (for the time) large system, essentially a Unix distribution, while tooling was stuck in the era of 1977’s make(1). For the details and basic ideas, see Tilbrook and Place (1986), “Tools for the Maintenance and Installation of a Large Software Distribution”. (Huge thanks to Alan Grosskurth for making a copy available.) My favorite footnote is the one about their Prolog prototype of a build system: “Is the cray free? I need to reinstall /bin/true!”

Back then, Unix software development was plagued by all kinds of incompatibilities and vendor-specific workarounds, and QEF was one of the first tools to provide concepts like out-of-tree builds and automatic dependency detection, as well as portable abstractions for things like creating shared libraries, which required performing occult rituals at the time.

The QEF whitepaper from 1996 explains the system in a more mature state.

What is intriguing is that the whole toolkit is built in classic Unix manner from small tools and little languages. I acquired an evaluation copy of QEF in 2015; sadly, it included no copy of QED. However, I could read the full manpages for many of his tools.

Looking at these tools was instructive; many are obsolete now because their features have been added to standard tools, or because we can afford to use more general, less efficient tools. For example, his go tool directly influenced the user interface of my nq tool, and reading about rls, lls, fexists, and ftest inspired my lr. His rpl lives on as my mend.

There are also additional resources and talks by David worth checking out.

Well, I promised you some QED code earlier, so let’s do fizzbuzz. (I heard this is useful in job interviews, so next time why not ask whether you can use QED for the task!)

It turns out this does not get as obscure as expected, but I’m not claiming what I wrote is idiomatic QED. I hacked it together while studying the tutorial a little bit.


ba
"loop
zc#+1
a\N\N.
za#:\zc%3=0 yf s/$/fizz/
za#:\zc%5=0 yf s/$/buzz/
s/^$/\zc/
zc#=100 yf`loop
,p
Q


We can run this using the convenient -x flag:

% qed -x fizzbuzz.qed

So, how does it work? First, we switch to buffer a of QED’s 56 available buffers (ba). Due to the -x flag, we start in the script buffer named ~, which made me scratch my head in the beginning because the script always printed its own source first!

Next, we set up a loop. QED provides structured programming in the form of a while loop, but that requires fitting the loop body on the same line, so we will use jumps instead. We start with a label "loop and increment the c register using zc#+1. The # enables arithmetic mode for registers; they can also hold strings. Since the register is empty, it is regarded as zero. The end of the loop is the zc#=100 yf`loop line, which checks whether the c register equals 100 and, if not (yf), jumps back to the label loop. Curiously, QED has different commands for jumping forwards (') and backwards (`). Having explicit directions is, I assume, an implementation detail of the buffer-based program storage; compare with Unix’s goto or TECO’s O command, which both search from the top.

Inside the loop, we first append an empty line (a\N\N.). This works like in ed:

a

.

But QED allows using \N as a newline, so we can save some screen space.

Then we need the fizzbuzz logic. The command za#:\zc%3=0 copies the c register to the a register (we need to do this, as all arithmetic operations store their result in a register), takes the a register modulo 3, and checks whether the register is then zero. If it isn’t, we jump a line forward (yf). If we didn’t jump, we append fizz to the line, using the s command (pretty much like in ed).

We deal with the buzz case the same way, but modulo 5.

If the line is still empty, we have to say the number, so we append it using s again, this time inserting the c register into the input stream using \zc. Note that this is not interpolated by the s command, but by QED itself. You can write \zc (or \N) essentially everywhere and it will work as if you had typed it in! A \ can be quoted as \c, so we can do this:


Note how the insertion of the a register resulted in the b register being inserted as well! Luckily, the people involved with QED did better when they wrote shells later.

At the end of the loop, we have the fizzbuzz output in the buffer, so we can just print out the whole thing (,p) and quit QED with no questions asked (Q).

That wasn’t so bad, was it? Just like the tutorial says:

By striking a harmonious balance between Qed and UNIX’s other tools, the intelligent user will find Qed powerful, flexible, easy to master and fun!

NP: Talking Heads—Seen And Not Seen

24dec2020 · Merry Christmas!

Frohe Weihnachten, ein schönes Fest, und einen guten Rutsch ins neue Jahr wünscht euch
Leah Neukirchen

Merry Christmas and a Happy New Year!

NP: Elvis Perkins—It's A Sad World After All

23oct2020 · Efficient text editing on a PDP-10

I did not know what rabbit hole I’d fall into when I clicked on last week’s Lazy Reading post and discovered the link to SAILDART. The linked e-book gives a good preview of what we can find there: a complete archive of the Stanford AI project’s PDP-10 setup (a.k.a. SAIL), and large parts of it are open for public access!

Back in the early seventies, when American universities got the first computers suitable for multi-user usage, it was common to develop custom operating systems (or heavily modify the vendor’s one), as research facilities had special needs, and also a different mindset than commercial vendors.

I was mostly familiar with the Incompatible Timesharing System (ITS) of MIT (written for a PDP-6, later a PDP-10), due to its leakage of concepts and culture into Lisp Machines and the GNU project. And of course, Berkeley is known for developments around Unix in the late seventies which then turned into BSD. However, I had only remotely heard of WAITS, the operating system used at the Stanford Artificial Intelligence Laboratory, which first ran on a PDP-6, and later on a PDP-10. Initial development was based on the standard DEC monitor (“kernel”). [2020-10-26: Rich Anderson points out that it was not based on TOPS-10, but rather a spinoff.]

So I started to dig around in the SAILDART archives, and quickly found Donald Knuth’s TeX working directory, because this was actually the system TeX was initially developed on! Not only can the early TeX78 be found there (look for the TEX*.SAI files), but also the source code of TeX82, written in literate Pascal, which is essentially still the core of today’s TeX setups.

But not only that, we also can find the source of the TeXbook and parts of TAOCP. (Looking at TeX code from its creator himself is very instructive, by the way.)

One thing that surprised me was that all these files were rather big for the time; while the TeX78 code was split into 6 files, TeX82 was a single 1 megabyte file and the TeXbook a single 1.5 megabyte file. This makes sense for redistribution of course, but nothing suggests the files were split up for editing either, which brought me to the central question of this post:

How was it possible to efficiently edit files bigger than a megabyte on a PDP-10?

Remember that, at the time in question, these systems supported at most 256 kilowords of main memory (in later versions 4 megawords; the PDP-10 is a 36-bit machine, and one usually packed five 7-bit ASCII characters into a word), and 256 kilowords per process, so simply loading the file fully into memory was impossible. A smarter approach was needed. Earlier editors on WAITS, such as SOS (SON OF STOPGAP, an even older editor), had to write the file with your changes into a different output file, which was then moved over the original contents. Of course, this had the disadvantage that saving big files was very slow, as rewriting a megabyte file could easily take several tens of seconds.
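
To make the character packing concrete, here is a small Ruby sketch (pack36 is my name, not period terminology) that packs five 7-bit ASCII characters into one 36-bit word, left-justified with the low-order bit unused, which as far as I know was the usual convention:

```ruby
# Sketch: pack five 7-bit ASCII characters into a single 36-bit PDP-10 word.
# The five characters fill the high 35 bits; the last bit is left unused.
def pack36(str)
  raise ArgumentError, "need exactly 5 characters" unless str.size == 5
  word = 0
  str.each_char { |c| word = (word << 7) | (c.ord & 0x7f) }
  word << 1  # shift past the unused low-order bit
end

printf "%012o\n", pack36("HELLO")  # print the word as 12 octal digits
```

So a 1.5 megabyte file like the TeXbook already needed some 300 kilowords, more than a whole process could address.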

The most popular editor on WAITS, called E (manual as PDF, reference, source), had a better approach: big text files were split into pages, separated by form feeds (^L), and the editor could load only some of these pages into main memory. It was recommended to keep pages below 200 lines, which is roughly 3 kilowords. Finally, E edited files in place and only wrote out changed pages, so fixing a single-character typo, for example, just required a quick write.

In order to know where the pages start, E maintained a directory page, which was the first page of a file, starting with COMMENT (click on some links above to see an example), followed by the list of pages and their offsets. Thus, seeking to a specific page was very quick, and the directory page doubled as a table of contents for bigger files, which improved the user experience.

This directory page was part of the file. Compilers and tools had to be adjusted to ignore it if needed (well, FORTRAN ignored lines starting with C anyway…), but for example TeX had modifications (see “Reading the first line of a file”) to skip the directory page.

This all sounded plausible to me, until I realized that it would not work for actual editing: of course you not only overwrite characters, but also insert a word or a line here and there when working on a big program. E would still have to rewrite the whole file after an insertion, I thought!

So I dug deeper and realized I had to rethink some implicit assumptions I had from using Unix-like systems for the last 20 years. On Unix, a file is a linear stream of bytes: you cannot insert data in the middle without rewriting the file until the end.

However, WAITS used a record-based file system. We can read about it in the documentation on UUOs (what a Unix user would call syscalls):

A disk or dectape file consists of a series of 200 word records. Often, these records are read (or written) sequentially from the beginning of the file to the end. But sometimes one wishes to read or alter only selected parts of a file. Three random access UUOs are provided to allow the user to do exactly that. To do random access input or output, you must specify which record you want to reference next. On the disk, the records of a file are numbered consecutively from 1 to n, where the file is n records long.

This means that a WAITS file is a sequence of records, not a sequence of bytes. And if a record was not full, it could be re-written with more content! Of course, if you inserted so much you actually needed to insert a new record, the file needed to be rewritten. This was called “bubbling”, and E also did it in-place. But for small edits, rewriting the records that contained the changed pages was enough.
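
In modern terms, the crucial property is that one record can be rewritten in place without touching its neighbors. A rough Ruby sketch of the idea (record size and padding are made up for illustration; this is not the actual WAITS on-disk format, which used 200-word records):

```ruby
# Sketch: rewrite one fixed-size record of a file in place, so a small
# edit does not require rewriting the rest of the file.
RECORD_SIZE = 1000  # stand-in for the PDP-10's 200-word records

def write_record(path, n, content)
  raise "content exceeds record size" if content.bytesize > RECORD_SIZE
  File.open(path, "r+") do |f|
    f.seek((n - 1) * RECORD_SIZE)       # records are numbered from 1
    f.write(content.ljust(RECORD_SIZE)) # pad, keeping record boundaries fixed
  end
end
```

Only the record containing the change is written; everything before and after stays untouched on disk.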

I think the record-oriented file system of WAITS was actually key to supporting the editing of big files in this environment. Other systems of the time did not support this as well: Unix ed loaded the whole file into memory [2020-10-25: as Eric Fisher points out, ed uses a temporary file to keep blocks if the buffer reaches a certain size] and wrote it out again, and the Unix source consists of many small files no larger than 1000 lines. On ITS, the only bigger files I could find were assembled from other inputs, or mail archives, which were only read or appended to, but not modified in the middle.

However, as larger memories became feasible, all these optimizations became obsolete. Modern text editors load files directly into memory and rewrite them when saving.

The other thing that amazed me was how much the E command set influenced Emacs! Richard Stallman saw the E editor in action and wanted a real-time screen editor for ITS as well, so their TECO got a full screen mode. I think that Emacs’ choice of modifier keys (Control, Meta, or both) and things like prefix arguments are taken directly from E. However, E was still fundamentally line-based and only supported interactive editing of whole lines (reusing the system line editor for performance reasons), whereas TECO was stream-oriented and supported direct editing anywhere on the screen.

Digging through the SAILDART archives, and then looking into fragments of ITS for comparison, also showed interesting cultural differences: WAITS used mechanisms for accounting disk space and CPU usage, and projects had to be registered to be paid for (I have not heard of any such features on ITS). WAITS required logins with passwords from remote connections (this was added to ITS very late). The ITS documentation is full of slang and in-jokes. But not everything was serious on WAITS: SPACEWAR was very important and there are references to it all over the place.

There are many interesting things to be found in SAILDART; I recommend looking around for yourself.

If you got curious now, it’s actually possible to run WAITS on your own machine! Grab SIMH and a WAITS image and you can get it running pretty easily. I recommend having a Monitor Command Manual handy. (Also note that currently there is a bug in the SIMH repository which makes graphical login impossible. I can vouch that commit c062c7589 works.)

Thanks go to Madeline Autumn-Rose, Richard Cornwell and Shreevatsa R for helpful comments. All errors in above text are mine, drop a mail if you have a correction.

NP: Bob Dylan—I Contain Multitudes

01jun2020 · 5 days with the MX Ergo

Five days ago, I got a Logitech MX Ergo trackball delivered. Here are some notes on getting used to it and how I set it up.

I learned about the MX Ergo in Michael Stapelberg’s desk setup post, and Puck also recommended it. My last experience with a trackball was with a two-button Logitech Marble Mouse probably 15 years ago, and it did not go very well.

Luckily, I get along with a thumb-driven trackball a lot better! I could use it almost instantly, and after maybe half an hour of use I already had good precision for common tasks.

The MX Ergo has a regular mouse-style 2-button setup with a clickable mousewheel in between. It feels so much like a mouse that you may find yourself trying to shift it over the table at first ;), but the solid plate will remind you not to do that.

There are two settings for the MX Ergo: either flat or tilted 20 degrees. I find the tilted setting more comfortable; the trackball then lies comfortably in my hand.

I use the included Logitech Unifying receiver which works out of the box on Linux. I have not tried using Bluetooth yet, but I may in the future when I run out of USB ports.

There are two extra buttons (button 8 and 9) next to the left mouse button, which by default do browser history forward/backward. I don’t need that and reconfigured them as follows: the top button is assigned to my window manager to raise a window (this would otherwise require a keyboard shortcut), while the lower button does a middle mouse click: I find it more comfortable than clicking the wheel, and I use the middle mouse button heavily. Having a physical middle mouse button would have been perfect (e.g. like the Lenovo ScrollPoint mouse).

Remapping the mouse button can be done with xinput:

xinput set-button-map $(xinput list --id-only 'pointer:Logitech MX Ergo') 1 2 3 4 5 6 7 2 9

Now for the downsides: in my opinion, the scrollwheel is pretty flawed. It has 18 ratchets and no high-definition mode (at least, not one that generates more high-resolution events). It feels very slow compared to scrolling with a touchpad. Unfortunately, Linux provides no way to speed up scroll wheels per input device (libinput does not do it at all, and evdev seems to only be able to slow them down). I would vastly prefer a free-spinning mouse wheel, which many other Logitech products have.

As a work-around, I have now enabled “mouse scrolling” with the top button, so I can scroll with the ball as well. This is significantly faster and pretty comfortable, and it actually works for both axes at the same time:

With evdev, this is set up as:

xinput set-prop $(xinput list --id-only 'pointer:Logitech MX Ergo') 324 1
xinput set-prop $(xinput list --id-only 'pointer:Logitech MX Ergo') 325 6 7 4 5
xinput set-prop $(xinput list --id-only 'pointer:Logitech MX Ergo') 328 9

The MX Ergo is wireless, and thus battery powered. I have not charged it yet, and it has been at 50% battery the whole time (with probably more than 10 hours of use daily). The battery is supposed to charge very quickly; charging works over USB. It would have been nice if the trackball were also usable over this USB connection, as wirelessness is hardly necessary for a stationary device on a desk. For power saving, the trackball turns off after some time, but it turns back on very quickly when used.

Touching the trackball will also wake up my notebook by default, but this can be changed by writing disabled into the associated /sys/bus/usb/devices/*/power/wakeup.

The MX Ergo is usable with the right hand only, which means I can’t recommend it to about 10% of you. There is no left-handed option available.

My fingers are pretty long, which means my whole hand doesn’t fit on the device, and the heel of my hand rests on the table. I may acquire a hand rest to go with it.

I paid 74€ for the MX Ergo, which is on the higher end of what I’d pay for a pointing device, but it feels very robust and will hopefully last a few years. I certainly don’t regret getting it and would also recommend it to others.

NP: KRS-One—Sound of da Police

11may2020 · A minimal .vimrc

I have been using Emacs since 2001, and only really learned Vim in 2007. Of course, I did quick admin edit tasks with it before, but in 2007 I had a small job where I had to develop on machines across the ocean, with only vim installed. As latency was high (and mosh didn’t exist yet), efficient use of Vim was essential. After the summer of 2007, I was fairly familiar with Vim. I still use Emacs for my main editing and programming tasks, but on remote machines and for small edit tasks I primarily use Vim.

I always found reading other people’s .vimrc (and analyses) interesting, so today I’ll share mine. Well, I actually have two .vimrc, a “big” one (233 lines) and a minimal one.

Why do you need a .vimrc at all? Well, since Vim 8, a defaults.vim with some annoying settings (such as enabling the mouse and colors) is loaded otherwise, so you have two options:

  • Run Vim as vi, but then you are in compatible mode which disables some useful things.

  • Create a .vimrc—even if it’s empty—to start Vim in non-compatible mode. But then I can also fetch my minimal one and set some defaults I prefer.

My minimal .vimrc is only 64 bytes long and does 80% of the job, so let’s dissect it:

set nocp bs=2 cot= hid is ru sm t_te= t_ti= vb wim=longest,list

As you can see, it only sets a few things in a single set command:

  • nocp: nocompatible; disable Vi compatible mode and enable a bunch of features. This is superfluous if you save the file as .vimrc, but not if you want to try it with vim -u .vimrc.minimal.

  • bs=2: backspace=indent,eol,start; allow backspacing over autoindent, line breaks, and the start of insert. This is just convenient sometimes.

  • cot=: completeopt=; this disables the popup menu for, e.g., ^X^F (filename completion). I never want popups to hide the text in my buffer.

  • hid: hidden; this allows having open buffers that are not displayed, and really should be the default.

  • is: incsearch; search as you type, pretty convenient.

  • ru: ruler; show the current line number.

  • sm: showmatch; quickly jump to the matching open bracket when typing the closing bracket.

  • t_te= t_ti=: unset the terminfo sequences that usually (de-)activate the alternate screen—I want the screen contents to stay after editing.

  • vb: visualbell; flash the display instead of beeping.

  • wim=longest,list: wildmode=longest,list; in command-line mode, complete the longest common string, then list alternatives. This is the default in bash/readline and how I configured my zsh.

So, in summary: 5 options enable features, 5 options disable annoying defaults. All in all, I think Vim has pretty good defaults.

NP: Pearl Jam—Buckle Up

02may2020 · The case of the mysterious --help directory

For quite some time (since March 2016, as my research will show), I have been annoyed by a directory named --help showing up in my home directory (and sometimes in others).

It’s just an empty, innocent directory with a weird name, removed by a simple run of rmdir ./--help. Yet it would turn up again. Sometimes soon, sometimes after a few weeks.

I had no idea where it came from.

Obviously, a simple mkdir --help just prints the mkdir help:

Usage: mkdir [OPTION]... DIRECTORY...
Create the DIRECTORY(ies), if they do not already exist.

I thought maybe some program had been called with --help once, didn’t understand it, yet stored it somewhere and created the directory when run again. I suspected many things, especially applications I don’t use that often, like Gimp or Transmission. I grepped all dotfiles in $HOME for --help. To no avail.

I decided I had to track this down.

My first thought was to use fatrace, which logs all file accesses. But fatrace doesn’t register directory creation (the fanotify API does, I think).

I didn’t know what to do, and left the problem alone for some time. The --help directory kept reappearing.

Some time later, I read a book on eBPF, and decided I could log all mkdir(2) calls with it, globally. (I also think I tried this with SystemTap before, but I’m not sure anymore.)

I wrote a bpftrace script called mkdirsnoop, a variant of opensnoop. It would print all mkdir(2) calls and what program ran it, system wide.

I ran it a lot, but not consistently.

Yesterday, it finally triggered! I tweeted:

I’m really flipping the table.
For months, I’ve been trying to figure out what occasionally creates “–help” folders in my $HOME. I have no idea why it happens.
I wrote a bpftrace script to find the culprit.
Today it triggered.
It says, it’s mkdir(1).
(╯°□°)╯︵ ┻━┻

If it was mkdir(1), it must have been created by a script. Gimp and Transmission were acquitted.

Julien Kirch proposed to put a different mkdir into my $PATH, which I think was a smart idea. I wrote this:

#!/bin/sh
printf '%s\n' "mkdir called from $$ at $(date)" >> /tmp/mkdirlog
pstree -sap $$ >> /tmp/mkdirlog
exec /usr/bin/mkdir "$@"

Very quickly, the script triggered, and I saw:

  `-sh,30996 -c urxvt
                  `-mkdir,31058 /home/leah/bin/mkdir --help
                      `-pstree,31060 -sap 31058

So it was actually zsh calling mkdir --help (or a zsh script). But there was no --help directory, because calling mkdir --help does not create a directory; it just prints the help.

Jannis Harder told me that the zsh completion for mkdir would run mkdir --help (to see if it’s the GNU variant), which I could verify.

But it didn’t explain the --help directory.

However, I had something to track down now.

Using my extrace, I would log all runs of mkdir with a --help argument, and at some point I found:

mkdir -p -- --help

Mysterious. I grepped my dotfiles again and found the unsuspicious function mkcd:

# mkcd -- mkdir and cd at once
mkcd() { mkdir -p -- "$1" && cd -- "$1" }

This is a helper function I use a lot. But of course I don’t call it with --help and then wonder for 4 years why I did that.

Well I don’t. But zsh does, because I also had:

compdef mkcd=mkdir

This was a quick shortcut to complete mkdir’s arguments (only directories) for mkcd as well. I added it in 2013, even before I added the -p flag in 2016.

Quickly, I verified that mkcd <TAB> would create a --help directory, but only if the completion wasn’t loaded already, e.g. when I hadn’t run mkdir <TAB> before. This explains why the directory only appeared sporadically.

I quickly rewrote the completion:

_mkcd() { _path_files -/ }
compdef _mkcd mkcd

And another mystery of my setup has been solved. Case closed.

NP: Pearl Jam—Take The Long Way

26apr2020 · Brute forthing minimal programs for stack arrangements

Recently I have been nerd-sniped into thinking about optimal stack programs, and one part of this is optimal stack arrangements, that is, words (i.e. procedures) that rearrange the stack but do not look at its contents. Borrowing terms from functional programming parlance, we can call these “words for free”, as their behavior is uniquely determined by their stack effect, which we write as ( before -- after ) as is customary in Forth. For example, ( a -- aa ) is dup, which duplicates the top of stack, while ( ab -- ba ) is swap, which exchanges the two topmost elements of the stack.

In most common Forths, the stack arrangement primitives are drop, dup, and swap. Together with the words to handle the return stack (>r and r>), every stack arrangement can be performed, as we’ll see below. I don’t want to explain the semantics of the return stack here; for now we can just think of >r as moving an element from the stack to the return stack, and r> as moving one from the return stack to the stack. Additionally, at the end of a word, the return stack must be in the same state as at the beginning of the word.

For reasons of performance, of course many Forths also implement more complex words as primitives.

I wondered if a Forth with fewer than these 5 words could still be complete, and a bit of research turned up a paper by Mikael Patel from 1990. It’s written in Swedish, but we can figure out the essentials from page 4:

variable t
: t! ( x -- ) t ! ;
: t@ ( -- x ) t @ ;

If you don’t speak Forth, that’s just a single global variable and two words to set (t!) and read it (t@). The paper then shows how to define drop, dup and swap from these. (Or see below.)

We shall call a Forth with t!, t@, >r, and r> a Minimal Forth.

So, I wondered how to get optimal definitions for many stack words. While I wished for a smarter approach, I didn’t find anything that deals well with the unbounded stack (of course, you can limit it, but implementing a stack machine in a way a SAT/SMT solver likes seems inconvenient), so I decided to simply brute-force short Forth programs.

So I wrote femtoforth, which implements these minimal subsets in Ruby:

def femtoforth(input, code)
  s = input.dup
  r = []
  t = nil
  code.each { |insn|
    case insn
    when :"t@";    s << t
    when :"t!";    return :oos  if s.empty?;      t = s.pop
    when :">r";    return :oos  if s.empty?;      r << s.pop
    when :"r>";    return :oor  if r.empty?;      s << r.pop
    when :"dup";   return :oos  if s.empty?;      s << s.last
    when :"swap";  return :oos  if s.size < 2;    s[-1],s[-2] = s[-2],s[-1]
    when :"drop";  return :oos  if s.empty?;      s.pop
    else           raise "Invalid insn: #{insn}"
    end
  }
  return :error  if !r.empty? or s.include? nil
  s  # the resulting stack, to compare against the desired output
end

And then we can happily brute force all programs of length l:

stdops.repeated_permutation(l).each { |code|
  if femtoforth(input, code) == output
    return code
  end
}

Turns out that with recent hardware, programs up to size 16 are tractable in a few hours, even without a particularly fast implementation.
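
As a self-contained illustration of the search, here is how the brute force finds SWAP in Minimal Forth (with a trimmed-down interpreter inlined covering only the four Minimal Forth ops, so it runs standalone):

```ruby
# Brute-force the shortest Minimal Forth program for SWAP ( ab -- ba ).
# A trimmed-down femtoforth covering only the four Minimal Forth ops.
def run(input, code)
  s, r, t = input.dup, [], nil
  code.each { |insn|
    case insn
    when :"t@";  s << t
    when :"t!";  return :oos  if s.empty?;  t = s.pop
    when :">r";  return :oos  if s.empty?;  r << s.pop
    when :"r>";  return :oor  if r.empty?;  s << r.pop
    end
  }
  return :error  if !r.empty? or s.include? nil
  s
end

ops = [:"t@", :"t!", :">r", :"r>"]
found = nil
(1..4).each { |l|
  ops.repeated_permutation(l).each { |code|
    if run([:a, :b], code) == [:b, :a]
      found = code
      break
    end
  }
  break if found
}
puts found.join(" ")  # prints: t! >r t@ r>
```

Interpreting nil as an untouchable value conveniently rejects programs that read the t register before writing it.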

These are the results I found for the standard words, and some custom ones I think can be useful:

Word | Effect | Standard Forth | Minimal Forth
DROP | ( a – ) | drop | t!
DUP | ( a – aa ) | dup | t! t@ t@
SWAP | ( ab – ba ) | swap | t! >r t@ r>
NIP | ( ab – b ) | swap drop | >r t! r>
OVER | ( ab – aba ) | >r dup r> swap | >r t! t@ r> t@
TUCK | ( ab – bab ) | dup >r swap r> | t! >r t@ r> t@
ROT | ( abc – bca ) | >r swap r> swap | >r >r t! r> r> t@
2SWAP | ( abcd – cdab ) | >r swap >r swap r> r> swap >r swap r> | >r t! >r >r t@ r> r> r> t! >r >r t@ r> r>
2DUP | ( ab – abab ) | >r dup r> dup >r swap r> | t! t@ t@ >r >r t! t@ r> t@ r>
2OVER | ( abcd – abcdab ) | n/a | n/a
2DROP | ( ab – ) | drop drop | t! t!
-ROT | ( abc – cab ) | swap >r swap r> | t! >r >r t@ r> r>
3REV | ( abc – cba ) | swap >r swap r> swap | t! >r t@ >r t! r> r> t@
4REV | ( abcd – dcba ) | >r swap r> swap >r swap >r swap r> swap r> swap | t! >r t@ >r t! r> r> t@ >r >r >r t! r> r> r> t@
THIRD | ( abc – abca ) | >r >r dup r> swap r> swap | >r >r t! t@ r> r> t@
FOURTH | ( abcd – abcda ) | >r >r >r dup r> swap r> swap r> swap | >r >r >r t! t@ r> r> r> t@
3DUP | ( abc – abcabc ) | dup >r swap dup >r swap >r >r dup r> swap r> swap r> r> | n/a

The words marked n/a cannot be defined in up to 16 primitive words.

We know that 2OVER is >r >r 2dup r> r> 2swap, so by mere inlining we get 21 primitive words in Standard Forth and 28 primitive words in Minimal Forth. A definition of 3DUP is third third third, so 3DUP is at most 21 words in Minimal Forth, which is shorter than a translation of the Standard Forth 3DUP to Minimal Forth.

Why are both sets of words complete, in the sense that any stack arrangement can be expressed with them? Essentially, we can decompose every stack arrangement into a composition of multiple words that each extract a certain stack value and keep the rest of the stack below; e.g., to implement the word ( abc -- cab ) we can compose the four words ( abc -- cabc ), then ( abc -- aabc ), then ( abc -- babc ), then ( abc -- ). Of course, this doesn’t yield an efficient program. But all of these words are easy to define by shuffling the stack to the return stack until you reach the value you need, dup‘ing and swap‘ing it around, and finally moving the whole return stack back to the stack.

Word | Effect | Standard Forth | Minimal Forth
W1 | ( abc – cabc ) | dup >r swap >r swap r> r> | t! >r >r t@ r> r> t@
W2 | ( abc – aabc ) | >r >r dup r> r> | >r >r t! t@ t@ r> r>
W3 | ( abc – babc ) | >r dup >r swap r> r> | >r t! >r t@ r> t@ r>
W4 | ( abc – ) | drop drop drop | t! t! t!

In this contemplation, we focused on the runtime of the words (assuming every primitive takes 1 step). It would also be interesting to optimize for size by factoring out common (sub)words into their own definitions, but that makes the problem a lot more complex.

NP: Pearl Jam—Take The Long Way

18apr2020 · Shrinking multiple TeX Live installations

As an avid TeX user, I like to have a full installation of TeX Live, the standard TeX distribution on Unix, around. New TeX Live releases come every year, but it’s often useful to keep old releases around, to ensure you can build a paper you wrote three years ago, or to help others who run a non-recent version of TeX Live.

Unfortunately, TeX Live installations are getting quite big; here are the sizes of the versions I have:

# du --apparent-size -shc /opt/texlive/20*
4.9G    /opt/texlive/2017
5.5G    /opt/texlive/2018
6.2G    /opt/texlive/2019
6.2G    /opt/texlive/2020

Note that on common file systems, these require even more disk space, since they consist of many rather small text files whose sizes are rounded up to the block size, e.g. here on an ext4 file system:

6.6G    /opt/texlive/2020
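The rounding effect is easy to model; a small Python sketch (assuming 4096-byte blocks, the ext4 default):

```python
def on_disk(size, block=4096):
    # Round a file's apparent size up to a whole number of blocks.
    return -(-size // block) * block  # ceiling division

# Small files waste a large fraction of their last block:
print(on_disk(100))    # 4096
print(on_disk(4097))   # 8192
```

Summing `on_disk` over all files instead of the apparent sizes accounts for the difference between the two `du` outputs.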

Luckily, the machine above uses ZFS, which has built-in LZ4 compression that does a pretty good job on TeX Live:

4.2G    /opt/texlive/2017
4.7G    /opt/texlive/2018
5.3G    /opt/texlive/2019
5.3G    /opt/texlive/2020
20G     total

Yesterday, I realized that many of these files actually don't change between releases, since many packages (especially fonts) are rarely updated. Let's use jdupes to find out how many:

# jdupes -r -m /opt/texlive/20*
Scanning: 729045 files, 53145 items (in 4 specified)
467364 duplicate files (in 208444 sets), occupying 13469 MB

13 GB of duplicates, that is very impressive! After a quick check whether tlmgr update breaks hardlinks (it does, due to its use of tar), I decided I could safely hardlink all identical files together:

# jdupes -r -L /opt/texlive/20*
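In essence, jdupes -L compares file contents and replaces duplicates with hardlinks. A minimal Python sketch of the idea (a toy, not a jdupes replacement: it hashes whole files and ignores permissions, ownership, and edge cases):

```python
import hashlib, os, tempfile

def hardlink_dupes(root):
    """Replace files with identical content below root by hardlinks (toy version)."""
    seen = {}                                   # content hash -> canonical path
    for dirpath, _, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            if digest in seen:
                if not os.path.samefile(seen[digest], path):
                    os.unlink(path)             # break the duplicate ...
                    os.link(seen[digest], path) # ... and hardlink it to the original
            else:
                seen[digest] = path

# Tiny demonstration on a throwaway directory:
tmp = tempfile.mkdtemp()
for name in ("a.sty", "b.sty"):
    with open(os.path.join(tmp, name), "w") as f:
        f.write("identical package file\n")
hardlink_dupes(tmp)
same = os.path.samefile(os.path.join(tmp, "a.sty"), os.path.join(tmp, "b.sty"))
print(same)  # True: both names now share one inode
```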

The final result (scanning in reverse order, so you see the overhead of old releases):

# du -shc /opt/texlive/20{20..17}
5.2G    /opt/texlive/2020
672M    /opt/texlive/2019
1.5G    /opt/texlive/2018
1.1G    /opt/texlive/2017
8.4G    total

So you can keep around TeX Live 2019 at the cost of 12% of its original size, and even older versions for less than 25% each.

NP: Pearl Jam—Quick Escape

18mar2020 · Determining an election in k

Last Sunday, we had local elections in Bavaria, and somehow I got nerd-sniped into looking at the proportional representation method that is used as of 2020, namely the Sainte-Laguë method.

As this algorithm turned out to be non-trivial yet pretty easy, I decided to implement it in k (get a copy). The array-oriented style of k fits the algorithm quite well.

There are two styles of algorithm that compute this method, and I think the German Wikipedia page explains them best. If you can't read German, you may still be able to follow the tables.

First, I'll explain the highest averages method (Höchstzahlverfahren): you write down the result of dividing the number of votes for each party by 0.5, 1.5, 2.5, 3.5, 4.5, etc. From these quotients, you take the biggest numbers (as many as there are seats to fill) and assign each one to the corresponding party; each assignment is a seat.

So, let’s do this in k. We set v to be the votes, and s to be the seats:

 v: 10400 3400 6200
 s: 15

First, we need the list of divisors. It does not make sense to make it longer than the number of seats, so let's count up (!) to that:

 !s
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

(In its typical minimalist manner, k’s prompt is an empty space.)

However, we also need to add (+) the shift of 1/2, which is the key characteristic of Sainte-Laguë:

 .5+!s
0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 10.5 11.5 12.5 13.5 14.5

As you can see, k evaluates the program from right to left.

Doing the division (%) is easy with k, but we need to use the each (') operator to say that we want the whole list of votes divided by each divisor:

 v % 0.5
20800 6800 12400f
 v % 1.5
6933.333 2266.667 4133.333

20800    6800     12400
6933.333 2266.667 4133.333
4160     1360     2480
2971.429 971.4286 1771.429
2311.111 755.5556 1377.778
1890.909 618.1818 1127.273
1600     523.0769 953.8462
1386.667 453.3333 826.6667
1223.529 400      729.4118

This is the table already! Let’s store (:) it as r:


We’ll then flatten the table into a list, by applying the concat operator (,) over the table (/):

 ,/r
20800 6800 12400 6933.333 2266.667 4133.333 4160 1360 2480 2971.429 ...

Now we can sort this list, using the new ^ operator:

 ^,/r
234.4828 251.8519 272 295.6522 323.8095 357.8947 400 427.5862 453.3333 ...

But we actually want the biggest numbers, so let’s reverse this:

 |^,/r
20800 12400 6933.333 6800 4160 4133.333 2971.429 2480 2311.111 2266.667 ...

Now we can take as many as we have seats:

 s#|^,/r
20800 12400 6933.333 6800 4160 4133.333 2971.429 2480 2311.111 2266.667 1890.909 1771.429 1600 1386.667 1377.778

Great. Now, we have to find (?) these in the original table:

 0  3  1
 2  9  5
 4 15  7
 6 15 11
 8 15 14
10 15 15
12 15 15
13 15 15
15 15 15

Find returns, for each element of the right side, its index in the left side, or an index out of bounds if the element cannot be found. If we compare this table against s, we can detect these out-of-bounds elements:


But we actually want the negation (~) of this table:


Now, we just have to sum (+) up (/) the columns again:

8 2 5

This is the seat assignment per party now!
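For cross-checking, the same highest-averages computation is easy to sketch in Python (an illustration, not the k code):

```python
def sainte_lague(votes, seats):
    # Quotients votes/0.5, votes/1.5, ... for every party, tagged with the party.
    quots = [(v / (i + 0.5), p) for p, v in enumerate(votes) for i in range(seats)]
    alloc = [0] * len(votes)
    for _, p in sorted(quots, reverse=True)[:seats]:
        alloc[p] += 1                # each of the top quotients is one seat
    return alloc

print(sainte_lague([10400, 3400, 6200], 15))  # [8, 2, 5]
```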

In good k manner, we fuse this program into a single line (assignment has the value that is being assigned):


One issue with this implementation is that it doesn't work correctly when multiple parties get the same number of votes, for example:

 v:7 7 7
7 7 7


We can fix the solution, and actually even make it shorter (but a bit harder to explain), by using the grade-down operator (>).

Let’s go back to our table r:


Instead of sorting, we now grade the flattened r down:

 >,/r
0 2 3 1 6 5 9 8 12 4 15 11 18 21 14 7 24 17 27 30 10 20 33 36 23 39 13 26 42 29 16 32 35 19 38 41 22 44 25 28 31 34 37 40 43

What does this mean? It says the biggest number is the 0th element, the second biggest number is the 2nd element, the third biggest number is the 3rd element, and so on.

This is not directly useful, but a well-known trick is to now grade this list up (<):

 <>,/r
0 3 1 2 9 5 4 15 7 6 20 11 8 26 14 10 30 17 12 33 21 13 36 24 16 38 27 18 39 29 19 40 31 22 41 32 23 42 34 25 43 35 28 44 37

And this is actually the rank of each seat! (I recommend spending a few minutes to figure this out on your own.)
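The grade-of-a-grade trick works because grading a permutation yields its inverse permutation. In Python terms, on a shortened example (names here are just illustrative):

```python
vals = [20800, 6800, 12400, 6933.333]                    # start of the flattened table
desc = sorted(range(len(vals)), key=lambda i: -vals[i])  # grade down: indices, biggest first
rank = sorted(range(len(desc)), key=lambda i: desc[i])   # grade the grade: the inverse
print(desc)  # [0, 2, 3, 1]: the biggest value sits at index 0, then index 2, ...
print(rank)  # [0, 3, 1, 2]: the rank of each position, matching the k output above
```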

But now the column structure is lost. We can recreate it with the new cut (^) operator, which replaces the reshape operator. As we need as many columns as there are parties (i.e., entries in v), we get:

 0  3  1
 2  9  5
 4 15  7
 6 20 11
 8 26 14
10 30 17

But again, we are only interested in the entries less than s, so


And we can sum up again:

8 2 5

Since this solution doesn’t require r later, we can just inline it:


The other algorithm is the divisor method (Divisorverfahren), which says you find a number (it doesn't say how, so it's not really an algorithm…) such that the sum of the rounded quotients of votes divided by this number equals the number of seats.

Given a divisor x, we can divide (%) the votes:

7.63583 2.496329 4.552129

To round these values, we use the idiom of adding 1/2 and taking the integer part (a.k.a. floor, _):

8 2 5

As we know, this sums up to s:



But how do we find the x? We'll use a while loop (/:) for that: we know we'll find a divisor by counting up, and every divisor whose allocation sums up to s yields the same result, so the first one found is good enough.


Then, we can divide again for this divisor, and get the same result as with the previous method.

8 2 5

However, due to the linear search for the divisor, this method is slower.
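The divisor search itself can be sketched in Python (a linear scan over integer divisors, assuming a fitting divisor exists, as it does for Sainte-Laguë):

```python
def divisor_method(votes, seats):
    x = 1
    while True:
        # Round votes/x by adding 1/2 and truncating.
        alloc = [int(v / x + 0.5) for v in votes]
        if sum(alloc) == seats:
            return alloc             # any divisor that fits gives the same allocation
        x += 1

print(divisor_method([10400, 3400, 6200], 15))  # [8, 2, 5]
```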

Finally, we can insert the official results and check the election:

 v: 9983378 11756412 1007445 1559070 8878524 1419438 1597499 1318396 395847 273557 86275 142051 339665 247453 528617 732056 120877
 s: 80
20 23 2 3 18 3 3 3 1 1 0 0 1 0 1 1 0

This agrees with the official seat assignment.

For funsies (and because I'm pretty bored), we can contrast the new election algorithm with the previous ones. At the 2014 election, the Hare/Niemeyer method was used. Here's the code; perhaps you can figure it out:

20 23 2 3 18 3 3 3 1 1 0 0 1 0 1 1 0

This yields the same result as Sainte-Laguë for this election. There is still a significant difference between these methods: if we compute how many more votes mut would have needed to get a seat, it’s 6220 additional votes with Sainte-Laguë, but 26105 with Hare/Niemeyer!
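Hare/Niemeyer is a largest-remainder method: floor each party's proportional quota v·s/total, then hand out the remaining seats by largest fractional part. A hedged Python sketch (an illustration; it does not reproduce the k code):

```python
def hare_niemeyer(votes, seats):
    total = sum(votes)
    quotas = [v * seats / total for v in votes]  # proportional quotas
    alloc = [int(q) for q in quotas]             # integer parts first
    rest = seats - sum(alloc)
    # Hand out the remaining seats to the largest fractional remainders.
    by_remainder = sorted(range(len(votes)),
                          key=lambda p: quotas[p] - alloc[p], reverse=True)
    for p in by_remainder[:rest]:
        alloc[p] += 1
    return alloc

print(hare_niemeyer([10400, 3400, 6200], 15))  # [8, 2, 5]
```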

Until 2010, the d’Hondt method was used. Here, we just have to tweak the .5+ offset:

21 25 2 3 19 3 3 2 0 0 0 0 0 0 1 1 0

Using this method would have kicked out three small parties that got in this year! Unsurprisingly, the big conservative parties want to go back to this system.
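d'Hondt is the same highest-averages scheme, just with divisors 1, 2, 3, … instead of 0.5, 1.5, 2.5, …. A Python sketch of that variant (an illustration, not the k code):

```python
def dhondt(votes, seats):
    # Quotients votes/1, votes/2, ... for every party, tagged with the party.
    quots = [(v / (i + 1), p) for p, v in enumerate(votes) for i in range(seats)]
    alloc = [0] * len(votes)
    for _, p in sorted(quots, reverse=True)[:seats]:
        alloc[p] += 1
    return alloc

print(dhondt([10400, 3400, 6200], 15))  # [8, 2, 5] for this small example
```

For the toy numbers the result happens to agree with Sainte-Laguë; the bias against small parties only shows up with more parties, as in the real election above.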

NP: Trembling Bells—This Is How The World Will End

01jan2020 · X11 screen locking: a secure and modular approach

For years I've been using XScreenSaver by default, but I recently learned about xsecurelock and re-evaluated my screen-saving requirements:

  • The screen saver should turn on and lock after some configurable time.
  • The screen saver should lock on a hotkey.
  • The screen saver should lock before suspend.
  • The screen saver should authenticate via PAM.
  • The screen saver should be configurable via xset s.
  • The screen saver should make the SSH agent forget its keys on locking.
  • I can disable the screen saver, e.g. when giving a presentation.
  • The screen saver should display a pretty demo.

When I just used XScreenSaver, I had these issues:

  • Locking before suspend was not so easy; I had hacks using either xauth as root or additional helper scripts trapping signals.

  • Forgetting the SSH agent keys required a small, but additional script.
  • Rarely, XScreenSaver got stuck, so I had to kill it from a TTY to get back in.
  • My xlbiff managed to pop up over XScreenSaver.

After some unsuccessful fiddling with xss-lock and xautolock, I settled on this toolset:

  • xsecurelock for screen locking and spawning XScreenSaver demos

  • xidle for spawning xsecurelock after timeout
  • xbindkeys for triggering xidle on hotkey
  • acpid for triggering xidle on lid close

Note that none of this requires systemd, DBus, or really anything else that X11 didn't already have 25 years ago.

So, how to put it together:

I use a script run-xsecurelock, which is spawned from .xinitrc:

# run-xsecurelock - run xsecurelock(1) with right config

if [ "$1" = lock ]; then
    ssh-add -D
    XSECURELOCK_SAVER=saver_xscreensaver \
        exec xsecurelock
fi

xidle -no -program "$HOME/bin/run-xsecurelock lock" -timeout 600

This runs xidle with a timeout of 10 minutes and tells it to spawn this script again with the lock argument, so it will run xsecurelock after forgetting the SSH agent keys. xsecurelock then spawns a random XScreenSaver demo. There is no support for cycling the demo, but you can trigger the authentication dialog and close it to get a different demo.

Then, we can set up the hotkey in ~/.xbindkeysrc:

"xset s activate"

I used to use the Pause key for triggering the screen saver, but my T480 doesn't have it anymore, so I use Insert now, which I don't use for anything else.

Finally, acpid can trigger xidle by sending SIGUSR1 from handler.sh:

pkill -USR1 xidle
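For completeness, acpid also needs an event rule that dispatches lid events to handler.sh. A sketch of such a rule in /etc/acpi/events/ (the exact event string varies by machine, so treat this as an assumption):

```
event=button/lid.*
action=/etc/acpi/handler.sh %e
```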

Note how simple this is: it runs as root and doesn't require getting X11 credentials or complex IPC.

To disable xidle, I just run xset s off. The timeout is configurable at run time using xset s SECONDS.

This should satisfy all my requirements and is a cleaner setup than I had before.

NP: Leonard Cohen—Thanks for the Dance

Copyright © 2004–2021