leah blogs: October 2020

23oct2020 · Efficient text editing on a PDP-10

I did not know what rabbit hole I’d fall into when I clicked on last week’s Lazy Reading post and discovered the link to SAILDART. The linked e-book gives a good preview of what we can find there: a complete archive of the Stanford AI project’s PDP-10 setup (a.k.a. SAIL), and large parts of it are open for public access!

Back in the early seventies, when American universities got the first computers suitable for multi-user usage, it was common to develop custom operating systems (or heavily modify the vendor’s one), as research facilities had special needs, and also a different mindset than commercial offerings.

I was mostly familiar with the Incompatible Timesharing System (ITS) of MIT (written for a PDP-6, later on a PDP-10), due to its leakage of concepts and culture into Lisp Machines and the GNU project. And of course, Berkeley is known for developments around Unix in the late seventies which then turned into BSD. However, I had only remotely heard of WAITS, the operating system used at the Stanford Artificial Intelligence Laboratory, which first ran on a PDP-6, and later on a PDP-10. Initial development was based on the standard DEC monitor (“kernel”). [2020-10-26: Rich Anderson points out that it was not based on TOPS-10, but rather a spinoff.]

So I started to dig around in the SAILDART archives, and quickly found Donald Knuth’s TeX working directory, because this actually was the system TeX initially was developed on! Not only the early TeX78 can be found there (look for the TEX*.SAI files), but also the source code of TeX82, written in literate Pascal, which essentially still is the core of today’s TeX setups.

But not only that, we also can find the source of the TeXbook and parts of TAOCP. (Looking at TeX code from its creator himself is very instructive, by the way.)

One thing that surprised me was that all these files were rather big for the time; while the TeX78 code was split up into 6 files, TeX82 was a 1 megabyte file and the TeXbook a single 1.5 megabyte file. This makes sense for redistribution of course, but there is no evidence the files were not kept around as-is, which brought me to the central question of this post:

How was it possible to efficiently edit files bigger than a megabyte on a PDP-10?

Remember that, at the time in question, these systems supported at most 256 kilowords of main memory (4 megawords in later versions), and at most 256 kilowords per process. The PDP-10 is a 36-bit machine, and one usually packed five 7-bit ASCII characters into a word, so simply loading the file fully into memory was impossible. A smarter approach was needed. Earlier editors on WAITS, such as SOS (SON OF STOPGAP, STOPGAP being an even older editor), had to write the file with the changes you made into a different output file, which was then moved to overwrite the original contents. Of course, this had the disadvantage that saving big files was very slow, as rewriting a megabyte file could easily take several tens of seconds.
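To get a feeling for these numbers, here is a minimal Python sketch of the packing. It assumes the conventional layout of five 7-bit characters left-justified in the 36-bit word, with the lowest bit unused; the function names are made up for illustration and do not come from any WAITS source.

```python
CHARS_PER_WORD = 5   # five 7-bit characters per 36-bit word
CHAR_BITS = 7

def pack_word(chars: str) -> int:
    """Pack up to five 7-bit ASCII characters into one 36-bit word.

    Assumed layout: characters are left-justified, short input is
    padded with NUL, and the lowest bit of the word stays zero.
    """
    if len(chars) > CHARS_PER_WORD:
        raise ValueError("at most five characters fit into one word")
    word = 0
    for i in range(CHARS_PER_WORD):
        code = ord(chars[i]) if i < len(chars) else 0
        if code > 0x7F:
            raise ValueError("not a 7-bit ASCII character")
        word = (word << CHAR_BITS) | code
    return word << 1  # skip the unused low bit

def unpack_word(word: int) -> str:
    """Recover the characters of a packed word, dropping NUL padding."""
    codes = [(word >> (29 - CHAR_BITS * i)) & 0x7F for i in range(CHARS_PER_WORD)]
    return "".join(chr(c) for c in codes if c)

def words_needed(n_chars: int) -> int:
    """Number of 36-bit words needed to hold n_chars characters."""
    return -(-n_chars // CHARS_PER_WORD)  # ceiling division
```

Under these assumptions, a file of 1.5 million characters needs 300,000 words, which is already more than the 262,144 words in 256 kilowords, before counting the editor itself.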

The most popular editor on WAITS, called E (manual as PDF, reference, source), had a better approach: big text files were split into pages, separated by form feeds (^L), and the editor needed to load only some of these pages into main memory. It was recommended to keep pages shorter than 200 lines, which is roughly 3 kilowords. Finally, E edited files in-place and only wrote out changed pages, so fixing a single character typo, for example, just required a quick write.

In order to know where the pages start, E maintained a directory page, which was the first page of a file, starting with COMMENT (click on some links above to see an example) and followed by the list of pages and their offsets. Thus, seeking to a specific page was very quick, and the directory page doubled as a table of contents for bigger files, which improved the user experience.
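The idea of a page index can be sketched in a few lines of Python. This is only an illustration under loud assumptions: it works on byte offsets rather than disk records, it keeps the index in a Python list instead of E's real directory page format (which is not reproduced here), and all names are hypothetical.

```python
import io

FORM_FEED = b"\x0c"  # ^L, the page separator

def build_page_index(data: bytes) -> list[int]:
    """Return the starting offset of every page in data.

    Page 0 starts at offset 0; each later page starts right after a
    form feed.
    """
    offsets = [0]
    pos = data.find(FORM_FEED)
    while pos != -1:
        offsets.append(pos + 1)
        pos = data.find(FORM_FEED, pos + 1)
    return offsets

def read_page(f, index: list[int], page_no: int, file_size: int) -> bytes:
    """Seek directly to one page and read only that page."""
    start = index[page_no]
    if page_no + 1 < len(index):
        end = index[page_no + 1] - 1  # stop before the form feed
    else:
        end = file_size               # last page runs to end of file
    f.seek(start)
    return f.read(end - start)

# Page 0 plays the role of the directory page in this toy file.
data = b"COMMENT directory\x0cfirst page\x0csecond page"
index = build_page_index(data)
page = read_page(io.BytesIO(data), index, 1, len(data))
```

With the index at hand, fetching a page is one seek and one read, no matter how large the rest of the file is.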

This directory page was part of the file. Compilers and tools had to be adjusted to ignore it if needed (well, FORTRAN ignored lines with C anyway…), but for example TeX had modifications (see “Reading the first line of a file”) to skip the directory page.

This all sounded plausible to me, until I realized that it would not work for actual editing, because of course you do not only overwrite characters, but also insert a word or a line here or there when you are working on a big program. E would still have to rewrite the whole file after the insertion, I thought!

So I dug deeper and realized I had to rethink some implicit assumptions I had from using Unix-like systems for the last 20 years. On Unix, a file is a linear stream of bytes: you cannot insert data in the middle without rewriting the file until the end.

However, WAITS used a record-based file system. We can read about it in the documentation on UUOs (what a Unix user would call syscalls):

A disk or dectape file consists of a series of 200 word records. Often, these records are read (or written) sequentially from the beginning of the file to the end. But sometimes one wishes to read or alter only selected parts of a file. Three random access UUOs are provided to allow the user to do exactly that. To do random access input or output, you must specify which record you want to reference next. On the disk, the records of a file are numbered consecutively from 1 to n, where the file is n records long.

This means that a WAITS file is a sequence of records, not a sequence of bytes. And if a record was not full, it could be re-written with more content! Of course, if you inserted so much you actually needed to insert a new record, the file needed to be rewritten. This was called “bubbling”, and E also did it in-place. But for small edits, rewriting the records that contained the changed pages was enough.
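To make the difference between a small edit and bubbling concrete, here is a toy model in Python. It is a sketch under stated assumptions, not a description of E's actual code: records hold 640 characters (assuming the "200 word" in the quote is octal, i.e. 128 words of five characters each), unused space in a record is padded with NUL characters, and the class and method names are invented for this example.

```python
RECORD_CHARS = 640  # assumption: 128 words x 5 characters per record
PAD = "\0"          # assumption: slack space is NUL padding

class RecordFile:
    """Toy model of a file made of fixed-size, partially filled records."""

    def __init__(self, texts: list[str], record_chars: int = RECORD_CHARS):
        self.record_chars = record_chars
        self.records = [t.ljust(record_chars, PAD) for t in texts]

    def text(self) -> str:
        """The file contents with all padding removed."""
        return "".join(r.replace(PAD, "") for r in self.records)

    def insert(self, rec_no: int, pos: int, new: str) -> int:
        """Insert new at pos in record rec_no.

        Returns the number of records that had to be rewritten.
        """
        body = self.records[rec_no].rstrip(PAD)
        body = body[:pos] + new + body[pos:]
        written = 0
        while True:
            keep = body[:self.record_chars]
            overflow = body[self.record_chars:]
            padded = keep.ljust(self.record_chars, PAD)
            if rec_no < len(self.records):
                self.records[rec_no] = padded   # rewrite in place
            else:
                self.records.append(padded)     # file grows by one record
            written += 1
            if not overflow:
                return written
            # "Bubbling": push the overflow into the following record.
            rec_no += 1
            rest = self.records[rec_no].rstrip(PAD) if rec_no < len(self.records) else ""
            body = overflow + rest
```

In this model an insertion that fits into the slack of its record costs exactly one record write, while a larger one ripples forward only until some record has enough free space to absorb the overflow.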

I think the record-oriented file system of WAITS was actually key to supporting the editing of big files in this environment. Other systems at the time did not support this as well: Unix ed loaded the whole file into memory [2020-10-25: as Eric Fisher points out, ed uses a temporary file to keep blocks if the buffer reaches a certain size] and wrote it out again, and Unix itself consists of many small files, none larger than 1000 lines. On ITS, the only bigger files I could find were assembled from other inputs, or mail archives which were only read or appended to, but not modified inside.

However, as having more memory became feasible, all these optimizations became obsolete. All modern text editors load files directly into memory and rewrite them when saving.

The other thing I found that amazed me was how much the E command set influenced Emacs! Richard Stallman saw the E editor in action and wanted a real-time screen editor for ITS as well, so their TECO got a full screen mode. I think that Emacs’ choice of modifier keys (Control, Meta, or both) and things like prefix arguments are directly taken from E. However, E was still fundamentally line-based and only supported interactive editing of whole lines (reusing the system line editor for performance reasons). TECO was stream-oriented and then supported direct editing on the screen.

Digging through the SAILDART archives, and then looking into fragments of ITS for comparison, also showed interesting cultural differences: WAITS used mechanisms for accounting disk space and CPU usage, and projects had to be registered to be paid for (I have not heard of any such features for ITS). WAITS required logins with passwords from remote connections (this was added to ITS very late). The ITS documentation is full of slang and in-jokes. But not everything was serious on WAITS: SPACEWAR was very important and there are references to it all over the place.

There are many interesting things to be found in SAILDART; I recommend you look around for yourself.

If you are curious now, it’s actually possible to run WAITS on your own machine! Grab SIMH and a WAITS image and you can get it running pretty easily. I recommend having a Monitor Command Manual handy. (Also note that currently there is a bug in the SIMH repository which makes graphical login impossible. I can vouch that commit c062c7589 works.)

Thanks go to Madeline Autumn-Rose, Richard Cornwell and Shreevatsa R for helpful comments. All errors in the above text are mine; drop me a mail if you have a correction.

NP: Bob Dylan—I Contain Multitudes

Copyright © 2004–2022