Much of the design and philosophy of an operating environment can be
inferred from the file formats it uses. In this note, I'll compare
the file formats of the default mail programs of Unix, ITS and
Windows.
On Unix, I take mail(1) as the reference mail program, since many
other mail programs use the same file format, mbox; I'll also take a
look at maildir. On ITS, RMail and Babyl are examined. Finally, on
Windows, Outlook Express is the only mail program included by
default.
The mbox format of Unix is very simple: it's a file containing a number
of RFC (2)822 messages separated by lines like
From MAILER-DAEMON Tue Dec 17 16:53:03 2002
Lines starting with "From " (the word followed by a space) are taken
as message delimiters.
The simplicity of this format speaks for Unix: it's very easy to edit
and manipulate, and to search for and retrieve messages in the file.
However, as (unfortunately) so often in Unix, this was not thought
through to the end: What happens if the message itself contains a line
starting with "From "? Here, mbox(5) tells us what to do: "In order to
avoid misinterpretation of lines in message bodies which begin with the
four characters "From", followed by a space character, the character
">" is commonly prepended in front of such lines."
What an ugly hack! And still, no one says what to do with lines that
already start with ">From"... (It is interesting, by the way, that you
sometimes see articles in newspapers that include the word ">From"...)
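To make the problem concrete, here is a minimal sketch of the convention quoted from mbox(5), written in Python. The function names (split_mbox, quote_naive, unquote_naive) are my own and not part of any mail program; the sketch assumes that a reader undoes the quoting by stripping one ">" from lines starting with ">From ", which is only one of several behaviours found in practice.

```python
def split_mbox(text):
    """Split mbox text into messages at lines starting with 'From '."""
    messages, current = [], None
    for line in text.split("\n"):
        if line.startswith("From "):
            if current is not None:
                messages.append("\n".join(current))
            current = []              # the delimiter line itself is dropped
        elif current is not None:
            current.append(line)
    if current is not None:
        messages.append("\n".join(current))
    return messages


def quote_naive(body):
    """Prepend '>' to body lines that begin with 'From ' (mbox(5) convention)."""
    return "\n".join(">" + line if line.startswith("From ") else line
                     for line in body.split("\n"))


def unquote_naive(body):
    """Assumed inverse: strip one '>' from lines that begin with '>From '."""
    return "\n".join(line[1:] if line.startswith(">From ") else line
                     for line in body.split("\n"))


if __name__ == "__main__":
    body = "From here on it gets ugly.\n>From the newspaper."
    restored = unquote_naive(quote_naive(body))
    # The second line was never quoted by us, but loses its '>' anyway.
    print(restored == body)
```

The round trip is lossy: a line that was ">From ..." in the original cannot be told apart from one that was quoted on delivery. Some mbox variants avoid this by also quoting lines that already start with ">From ", but that is a refinement on top of the format described here.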
ITS mainly used two MUAs, RMail and Babyl. Both have almost the same
structure and are very similar to mbox, too. All messages are
concatenated as RFC (2)822 messages, but they are separated by
^_ (ASCII 31, octal 037, also known as Unit Separator
(US)) on a line of its own.
This is obviously the right thing: a special character was made for
this purpose, so it's used. Furthermore, it's a non-printable character
and so would be encoded by MIME (as quoted-printable, for example)
anyway. (At least if that had existed back then.)
Still, this format has all the advantages of mbox stated above.
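A small sketch shows why a dedicated separator character needs no escaping of message bodies. This is a simplified model that follows only the description above (messages separated by ^_ on its own line); it is not a parser for real RMail or Babyl files, and the function names are my own.

```python
US = "\x1f"  # ASCII 31 (octal 037), Unit Separator, written ^_


def join_messages(messages):
    """Concatenate messages, separated by a line containing only ^_."""
    for message in messages:
        if US in message:
            # Assumption of this sketch: bodies never contain a raw ^_.
            raise ValueError("message contains the separator character")
    return ("\n" + US + "\n").join(messages)


def split_messages(text):
    """Split the concatenated text back into the individual messages."""
    return text.split("\n" + US + "\n")


if __name__ == "__main__":
    mails = ["Subject: a\n\nFrom here on, no quoting.\n>From there either.",
             "Subject: b\n\nbye"]
    print(split_messages(join_messages(mails)) == mails)
```

Lines starting with "From " or ">From " pass through untouched, because nothing in ordinary text can be mistaken for the separator.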
Outlook Express, the default mail program on recent Windows versions,
uses magical, binary and proprietary DBX files not readable by humans.
There exist some tools and special libraries to access these files;
the format, however, is neither open nor portable, and it is not used
by any other program (except in import filters). This is the usual way
of making a monopoly: first force the users to use something, and later
force them to stay there because they cannot switch (you cannot export
your mails into some other format with Outlook Express).
All these formats have one thing in common: each stores all messages
in a single file. This can easily cause data corruption, for example if
several processes access the same file. While this is not fatal in the
case of mbox and RMail, Outlook Express files are likely to be fubar.
Therefore D. J. Bernstein invented a new way to store mail, the
maildir format. Here, mail is stored, as the name says, in
directories. Furthermore, maildir doesn't need locking, as two
processes can write into the same directory concurrently. This helps
a lot, as many networked file systems handle locking badly or not at all.
Basically, a maildir directory includes three subdirectories,
tmp/, new/ and cur/. tmp/ holds messages while they are still being
written. new/ and cur/ have exactly the same substructure, except that
new/ contains unread mail and cur/ mail already seen by the MUA: both
contain one file per message, in RFC (2)822 format without any content
escaping at all.
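The lock-free delivery can be sketched in a few lines of Python: the message is written under a unique name in tmp/ and only then moved into new/, so a reader never sees a half-written file. The helper names (make_maildir, deliver) and the exact file name scheme are assumptions for illustration; real delivery agents use their own unique-name conventions, and this sketch uses a plain rename, which is atomic on POSIX file systems as long as tmp/ and new/ are on the same file system.

```python
import os
import socket
import time

_counter = 0  # distinguishes deliveries made by this process


def make_maildir(root):
    """Create the three subdirectories of a maildir."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(root, sub), exist_ok=True)


def deliver(root, message):
    """Write message (bytes) to tmp/, then move it into new/."""
    global _counter
    _counter += 1
    host = socket.gethostname().replace("/", "_").replace(":", "_")
    # Illustrative unique name: time, process id, per-process counter, host.
    name = "%d.P%dQ%d.%s" % (int(time.time()), os.getpid(), _counter, host)
    tmp_path = os.path.join(root, "tmp", name)
    new_path = os.path.join(root, "new", name)
    with open(tmp_path, "xb") as f:   # 'x': fail rather than overwrite
        f.write(message)              # stored verbatim, no escaping
        f.flush()
        os.fsync(f.fileno())
    os.rename(tmp_path, new_path)     # the message appears in new/ in one step
    return new_path
```

Two processes calling deliver at the same time simply create two different files; neither has to wait for the other, and no lock file is involved.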
maildir is available for and being used on many Unixes and clones,
including GNU/Linux and the various BSDs.
It truly is the best format of them all, without any hacks, while
still being open, independent and easy to use. In fact, a user could
read his mail without any MUA at all, using only the standard file
utilities found on any system.
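The same point holds for programs: since every message is an ordinary file, a few lines are enough to get an overview of a mailbox. The following sketch lists the subjects of all messages in new/ and cur/ using only Python's standard library; the function name list_subjects is my own.

```python
import email
import os


def list_subjects(root):
    """Return (subdirectory, file name, subject) for each message."""
    rows = []
    for sub in ("new", "cur"):
        folder = os.path.join(root, sub)
        for name in sorted(os.listdir(folder)):
            # Each file is one complete RFC (2)822 message.
            with open(os.path.join(folder, name), "rb") as f:
                message = email.message_from_binary_file(f)
            rows.append((sub, name, message.get("Subject", "")))
    return rows
```

Calling list_subjects on a maildir path returns unread mail first (new/), then mail already seen (cur/), each sorted by file name.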
So, what can you learn from this comparison?
Looking at how elementary things are done, you learn a lot about how
the rest of the stuff works. You immediately see if it's closed,
complex and opaque (Windows), or open, simple, flexible but not always
well thought out (Unix), or open, simple, flexible, and done as well as
possible (ITS; please note that ITS didn't support nested directories,
so maildir wasn't possible way back then).
And sometimes, there's a new technology which is different, but better
than everything before. Then go ahead and use it, and drop the old
things, but keep compatibility with them (there is
maildir2mbox), at least as much as possible and as long as
it's reasonable.