leah blogs: April 2005

« March 2005 May 2005 »

30apr2005 · I know what you did yesterday...

… but you don't! Because that is when we celebrated Matthias Ferstl's 18th birthday:

The rest is over at flickr: Ferstl's 18th birthday.

NP: Adam Green—Chubby Princess

28apr2005 · Fortress

Fortress is a fairly new language developed at Sun Microsystems. They claim it is a

general-purpose, statically checked, nominally typed, component-based programming language designed for producing robust high-performance software with high programmer productivity.

And they also state its domain very clearly:

The name “Fortress” is derived from the intent to produce a “secure Fortran”, i.e., a language for high-performance computation that provides abstraction and type safety on par with modern programming language principles. Despite this etymology, the language is a new language with little relation to Fortran other than its intended domain of application.

Recently they published a preliminary Fortress Language Specification (which contains the above quotes, by the way), and I had a somewhat closer look at it.

My general impression is that the language totally rules, in a way nothing before it ever did. Think about mixing the domain of Fortran with the flexibility of Common Lisp and the type system and syntax of ML and Haskell, and you have a language very close to Fortress.

Should Fortress gain wide acceptance (which, I hope, is likely; but remember Self), it will be one of the most advanced languages out there. I don’t think there is any language outside academia that would be on par (feel free to prove me wrong).

But now, after this hype, let’s have a look at some features.

The first thing you will notice when glancing at some Fortress code is that it uses Unicode syntax. For example, you write ≥ instead of >= and can also use Greek letters like α, β or γ. You can write Fortress in plain 7-bit ASCII too, but that may force you to use ugly constructs, so Unicode really is the way to go. I wonder how this will influence Fortress IDEs; maybe they will have special keyboard macros for this kind of stuff.

Very soon in the manual, Objects and Traits are introduced. Fortress includes a very nice, statically typed object system. Traits are a bit like classes in other languages (Modules in Ruby), but can inherit (or, mix-in) other traits. Traits implement methods. If I think about it, the object system is very similar to the one of Self. You create new Objects by specifying a main trait (parent* in Self) and mixing in other traits (Modules in Self, if I recall correctly). The object system supports overloading and multiple dispatch.

Also, Fortress has top-level functions that are first-class and can be passed around. They are wrapped in a package system, of course, inspired by Java’s; it is rather complex but also addresses issues like building programs or installing 3rd-party libraries (a bit as if Ant, or defsystem, was included). Anonymous functions can be created, but only with limitations. I could not find any mention of closures.

The Fortress language is very clear and easy to read. The default (more on that later) syntax uses indentation in a Haskell-like way to save on ;, and another special (but IMO obscure) syntax for arrays. For example, you can simply write

accelerate(θ) =
  spin := spin + θ

Another interesting (IMO not a very good idea) thing is that multiplication is implicit, that is, you can write:

foo := bar quux dong

A nifty feature, probably taken from Eiffel, is function contracts. They can check values (“require n to be greater than zero”) or make assertions (“if the input list was sorted, the return value will be too”).

Fortress provides a generic for loop that uses generators internally and can easily parallelize (important!) the calculation.

There is a case statement which works for all kinds of types and any predicate (it took a long time to get that into languages…) and dispatch, which can dispatch on type and is type safe.

Arrays, sets, maps and matrices are in the core language and have easy syntaxes to make them convenient to enter. Also, you can calculate sets and arrays using Haskell-style list comprehensions.

Arbitrary-precision integers and rational numbers are provided, too. One thing I really like is that you can define dimensions that provide type safety for calculations. For example:

dim Length
unit m : Length
k = 1000

circumference = 40075 k m

Now, you can create functions that only operate on Lengths. Additionally, and this is the nifty part, dimensions can be connected to compound types, for example:

type Area = Length²

I can imagine this to be very useful to scientific calculations, as wrong calculations/formulae can be found easily.
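To make the idea of dimension checking a bit more concrete, here is a minimal Ruby sketch. It is only a run-time analogue: Fortress checks dimensions statically, while this raises an error when the code runs. The Quantity class and the METER and K constants are hypothetical names made up for illustration; they are not part of Fortress or of any library.

```ruby
# Hypothetical run-time analogue of Fortress dimensions (not Fortress code).
# A Quantity carries a numeric value plus a Hash of dimension exponents,
# e.g. { length: 2 } for an area.
class Quantity
  attr_reader :value, :dims

  def initialize(value, dims)
    @value = value
    # Drop dimensions whose exponent is zero so equal dimensions compare equal.
    @dims = dims.reject { |_, exponent| exponent.zero? }
  end

  # Addition is only allowed between quantities of the same dimension.
  def +(other)
    unless dims == other.dims
      raise TypeError, "dimension mismatch: #{dims} vs #{other.dims}"
    end
    Quantity.new(value + other.value, dims)
  end

  # Multiplication adds the exponents; plain numbers are dimensionless.
  def *(other)
    other = Quantity.new(other, {}) unless other.is_a?(Quantity)
    merged = dims.merge(other.dims) { |_, a, b| a + b }
    Quantity.new(value * other.value, merged)
  end
end

METER = Quantity.new(1, length: 1)
K = 1000

circumference = METER * (40075 * K)     # a Length
area = circumference * circumference    # Length squared
```

Adding circumference and area raises a TypeError here, which is the kind of mistake Fortress is meant to catch before the program ever runs.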

Definitions and declarations have modifiers, some of which are well-known, like abstract or private. However, there are much more powerful and cool modifiers, for example io, which needs to be given to perform IO directly or indirectly (kind of like the IO monad of Haskell). Or pure, which makes methods guarantee that they have no side effects and will probably help a lot with optimization. Then, there is a static modifier that can make methods run at compile-time, for example for static checkers or parsers. You can use value to make your objects immutable and wrapped to forward undefined methods. Finally, there is a test modifier to define unit tests.

Fortress also provides futures, values that will be calculated in the “background” and synchronize when they are forced. Another very nifty thing is atomic expressions, which make all reads and writes of variables happen in a single step, for example:

atomic do sum:=sum+a(i) end

Arrays can be optimized heavily due to their abstraction. You can choose between several ways to distribute them in memory and how to apply parallel calculations.

Now we come to one of the biggest and most powerful features, the extensible syntax. It can be used to write domain-specific languages inside Fortress and works a bit like Camlp4, IIRC. DSLs can access inner Fortress code using syntax escapes. E.g. you write:

syntax sql exp end = parseSQL(exp)
sql
  SELECT spectral_class FROM stars
end

and it can generate code as if you had written

SqlQuery(Select("spectral_class"), From("stars"))

It would be even possible to support C-ish for loops like

jfor (int i = 0; i < 10; i++) {
  print(i);
}

As you can probably tell, I’m rather excited about Fortress. I think it is a very powerful and flexible language, but it has lots of features (not to say, it is over-engineered). Maybe it’s a bit too big, and they had better move some features into external libraries instead of providing them in the core.

I hope Fortress will be a successful language (you could say that solely depends on the marketing, but they need a damn good optimizing compiler too for their domain; also, an open-source implementation to get support from “the community”), but ultimately it will be up to the programmers to actually make use of all the features provided in a sensible way, which can be too much for people who only use Fortran all day for now. However, I think they can be introduced to Fortress rather easily.

Fortress is probably the best and most properly designed language for high-performance computing to my knowledge.

NP: Brazilian Girls—Lazy Lover

26apr2005 · Papa Razi and Soldiers

This time, a special edition of the running quotes because of the highlighted quote. You really need some nerve to say that…

[gives two girls the only A grades:] Yeah, I do that because of the tits.

Don't confuse the paparazzi with Papa Razi!

Your pussy is eating your pants! [Lower-grade pupil to a teacher.]

Where are you coming from? – We're late…

And now everybody chair onto the tables.

What do soldiers do? – Die.

David now reports on the class too, even with photos: d3construction – german podunk life at its best.

NP: Distillers—The Gallow is God

24apr2005 · Dynamic Variables in Ruby

Many Lisps provide dynamically scoped or special variables in addition to lexically scoped ones. Some (for example Elisp or ye olde MacLisp) even provide solely dynamically scoped variables. In fact, Scheme was the first Lisp-based language to make lexical scoping popular.

Now, what’s the difference between dynamic and lexical scoping? Lexical scoping should be known to every Rubyist, as this is the usual (and the only one provided by default) way of scoping. Lexical scoping has the major advantage of enabling closures, that is, pieces of code that save their lexical environment. For example, in Ruby you can write:

def adder(n)
  v = 0                         # v is lexically scoped here
  lambda { v += n }             # v and n are accessed from the closure
end

add1 = adder(10)
p add1.call  #=> 10
p add1.call  #=> 20
p add1.call  #=> 30

add2 = adder(5)
p add2.call  #=> 5
p add2.call  #=> 10
p add2.call  #=> 15

But you already knew that. Not so with dynamic scope, and that’s why dynamic scope is usually avoided. However, there is one important and useful usage for dynamic scope: Providing contexts. For example, let’s say you write a currency converter in Elisp:

(defvar +eur2usd-factor+ 1.3068)    ; 24apr2005

(defun eur2usd (euro)
  (* +eur2usd-factor+ euro))

(eur2usd 10)                ; => 13.068
(eur2usd 0.77)              ; => 1.006236

Ok, that was easy. But now, let’s say we would like to know what our Euros would have been worth last year. Of course, we could globally change the factor, but that would be icky and we would need to be careful to change it back, so let’s rather dynamically rebind the value:

(let ((+eur2usd-factor+ 0.9267))    ; last year, maybe
  (eur2usd 10)              ; => 9.267
  (eur2usd 0.77))           ; => 0.713559

This may surprise you, but eur2usd looks up the dynamic value of +eur2usd-factor+, not the lexical value (which would have been 1.3068). let rebinds the value during the execution of its body, so when eur2usd is called, +eur2usd-factor+ is actually 0.9267. However, after the let, everything is as it was before:

(eur2usd 10)                ; => 13.068

Now, how can we translate this behavior into Ruby? I decided to use a thread-local variable (a Hash, actually) and add convenience functions to define, change and rebind the variables. Let’s see how the above code looks in Ruby:

require 'dynamic'

Dynamic.variable :eur2usd_factor => 1.3068

def eur2usd(euro)
  euro * Dynamic.eur2usd_factor
end

p eur2usd(10)               # => 13.068
p eur2usd(0.77)             # => 1.006236

Dynamic.let :eur2usd_factor => 0.9267 do
  p eur2usd(10)             # => 9.267
  p eur2usd(0.77)           # => 0.713559
end

p eur2usd(10)               # => 13.068

Now, that was pretty easy. You basically just need to prefix your variable with Dynamic. to look it up. It’s also a nice sample of how to extend Ruby with new language features just by defining some convenient methods.

As you should use dynamic variables as globals only, it does no harm that Dynamic is implemented using a singleton. Dynamic is thread-safe, however: a thread can’t change another thread’s environment, and each new thread gets its own copy of the main thread’s environment.
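To give a rough idea of how such a thing can be built, here is a minimal sketch of the rebinding mechanism. It is not the code of the dynamic library used above: MiniDynamic and its methods are hypothetical names, lookup goes through [] instead of a generated accessor, and new threads start with an empty environment rather than a copy of the main thread's one.

```ruby
# Hypothetical sketch; MiniDynamic is NOT the 'dynamic' library used above.
module MiniDynamic
  KEY = :mini_dynamic_bindings

  # Bindings of the current thread. Note that Thread.current[] is
  # fiber-local in modern Ruby versions.
  def self.bindings
    Thread.current[KEY] ||= {}
  end

  # Define variables together with their default values.
  def self.variable(defaults)
    Thread.current[KEY] = bindings.merge(defaults)
  end

  # Look up the current dynamic value; raises KeyError for unknown names.
  def self.[](name)
    bindings.fetch(name)
  end

  # Rebind variables for the duration of the block, then restore the old
  # bindings, even if the block raises.
  def self.let(overrides)
    saved = bindings
    Thread.current[KEY] = saved.merge(overrides)
    begin
      yield
    ensure
      Thread.current[KEY] = saved
    end
  end
end

MiniDynamic.variable :eur2usd_factor => 1.3068

def eur2usd(euro)
  euro * MiniDynamic[:eur2usd_factor]
end
```

Inside MiniDynamic.let(:eur2usd_factor => 0.9267) { ... }, eur2usd sees the rebound factor; after the block, the old value is back, just as with the Elisp let.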

Now, go ahead and grok the code. Happy hacking.

NP: The Distillers—Gypsy Rose Lee

21apr2005 · Slow Worms, Pacifics and Gürkinen

Again and again, quotes from the enchanting WG, where the rain gets in just as it does in the DR gym:

What is green and wears a headscarf? – A Gürkin! [A pun on Gurke, cucumber, and Türkin, Turkish woman. This joke ended in the biggest collective laughgasm I have ever witnessed.]

Nazipan

Were there also black people who were wise? [A pun on weise, wise, and weiß, white.]

Fuckminster Bullerene

What do you call Arabs who can't see? – Blindscheiche! [Blind sheikhs; a pun on Blindschleiche, the slow worm.]

I found a really refined salad oil… [*click*]

[screams in the shower] – Yeah, yeah, it always hurts the first time.

Stop babbling, or the rag will slip out of my hand!

I am Pacific.

[looks at a list of target careers:] Isn't there anything involving alcohol?

The girls are full of blood, at least on the inside.

NP: Dan Bern—Wednesday Night

19apr2005 · Individual-i

Bruce Schneier has launched a new campaign called Individual-i, which provides a symbol for individual rights.

[The Symbol] recognizes that a free society is a safe society, and that freedom is founded upon individual rights.

The battle for individual rights is just beginning; our side needs a symbol.

As the site says,

Individual-i stands for:

Freedom from surveillance

Personal privacy

Anonymity

Equal protection

Due process

Freedom to read, write, think, speak, associate, and travel

The right to make your own choices about sex, reproduction, marriage, and death

The right to dissent

I think all these issues are very important, especially in times where privacy and anonymity get attacked for dubious reasons like “war on terrorism” and “fighting crime”. Furthermore, they help create a better world through more tolerance, independence, discussion and equality.

The Individual-i is a great idea in my opinion, and I encourage you to put the logo on your site, create bumper stickers, make t-shirts and posters or whatever you like.

The Individual-i symbol is not owned by any organization. There is no platform, no organizational structure, no meetings. This symbol is in the public domain: uncopyrighted, untrademarked, unowned. Anyone can use it for any purpose.

You can get the logo on the Individual-i site, in both JPEG and EPS (why the hell is the EPS so big?).

We hope to see this symbol displayed proudly wherever individual rights are valued.

NP: Dan Bern—New American Language

17apr2005 · A Tagged Filesystem

I often notice that computer users, mostly the ones of the novice kind, have trouble understanding the filesystem, organizing and therefore (re)finding their files.

Usually, this results in the user saving the file into whatever the current path of the “Save…” dialog is, alternatively stuffing it all into one large directory…

Tags to the rescue! We can build a Tagged File System, which doesn’t have a hierarchy, but only tags that can be attached to files. This would even be possible to represent in Unix, provided you can alter the file dialogs of your applications: All files get saved into a hidden directory, .everything. By tagging a file, it will be hardlinked into a directory with the name of the tag. Now you can simply “copy”, “move” and “delete” the file, thereby only changing tags. To unlink the file, a tool would need to look into .everything for files that don’t have a link in a tag folder. (Actually, you can only use tag folders, and no .everything, but this may be a bad idea, read on.)
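The hardlink scheme described above can be sketched in a few lines of Ruby. This is only an illustration under some assumptions: it needs a filesystem with hard links, and the TaggedStore class with its add, tag, untag, tagged and orphans methods is a hypothetical name made up here; only the .everything directory and the idea of one directory per tag come from the text.

```ruby
require 'fileutils'

# Hypothetical sketch of the scheme: files live in .everything, and every
# tag is a directory containing hard links to them.
class TaggedStore
  def initialize(root)
    @root = root
    @everything = File.join(root, '.everything')
    FileUtils.mkdir_p(@everything)
  end

  # Save a new file; basenames must be unique inside .everything.
  def add(basename, content)
    File.write(File.join(@everything, basename), content)
  end

  # Tagging is hard-linking the file into the tag's directory.
  def tag(basename, tag)
    dir = File.join(@root, tag)
    FileUtils.mkdir_p(dir)
    target = File.join(dir, basename)
    File.link(File.join(@everything, basename), target) unless File.exist?(target)
  end

  # Removing a tag only removes the link, never the file itself.
  def untag(basename, tag)
    File.delete(File.join(@root, tag, basename))
  end

  def tagged(tag)
    dir = File.join(@root, tag)
    File.directory?(dir) ? Dir.children(dir).sort : []
  end

  # Files with a link count of 1 exist only in .everything, so they carry
  # no tag anymore and are candidates for real deletion.
  def orphans
    Dir.children(@everything).select { |name|
      File.stat(File.join(@everything, name)).nlink == 1
    }.sort
  end
end
```

A cleanup tool would then simply delete whatever orphans returns, which is the "look into .everything for files that don't have a link in a tag folder" step.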

The problem now is that because all files reside in .everything, they all need to have a different basename. I first played with the idea of moving the complete system into a Tagged File System, but then I analyzed my disks: My root directory (Mac OS X 10.3.8 installation with a big home and lots of cra^Wstuff installed) has 603484 files, and there are 81498 basename clashes, every seventh file clashes. Additionally, there are 17807 different parts of directory names used. That would be 17807 tags!

When I reduced the analysis to my home directory, there still were 25092 clashes in 204372 files, every eighth file (mostly due to files like Makefile, COPYING, info.nib that can be found in developers’ homes).

Of course, I’m not the target user of this, but these results nevertheless tell me that one had better use Tagged Filesystems only for directories like “Documents” or “Music” (assuming your files aren’t called 01.ogg, 02.ogg…). In these kinds of directories, name clashes are rather rare, so here tagging can fully pay off.
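A count like this can be reproduced with a short script. The sketch below is an assumption about how to count, not the script used for the numbers above: it treats every file beyond the first one with a given basename as one clash. The method name basename_clashes is made up for illustration.

```ruby
require 'find'

# Walk the tree below root and return [total_files, clashes], where a
# clash is any file whose basename was already used by another file.
# (This definition of "clash" is an assumption.)
def basename_clashes(root)
  counts = Hash.new(0)
  Find.find(root) do |path|
    next unless File.file?(path)
    counts[File.basename(path)] += 1
  end
  total = counts.values.sum
  clashes = counts.values.sum { |n| n - 1 }
  [total, clashes]
end
```

Calling basename_clashes(File.expand_path('~')) gives the two numbers for a home directory; dividing the first by the second gives the "every n-th file clashes" figure.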

One very nifty thing would be implementing the Tagged Filesystem using LUFS or Hurd’s filesystem translators, so you could do stuff like (assuming the Tagged Filesystem is mounted at ~/music.)

ls ~/music/blues/clapton

very easily. By the way, above would be the same as

ls ~/music/clapton/blues

of course! One may want to invent a syntax to implement negation too, so you could do

ls ~/music/-clapton/blues

to show all music files tagged as Blues not by Eric Clapton. Another nice thing would be to have computed tags, like “Files created last week”, “Files changed after last backup” and so on.
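Resolving such a path boils down to set operations: every path component intersects the result with the files of that tag, and a leading minus subtracts them instead. Here is a minimal sketch with an in-memory index; the resolve method and the index layout (a Hash from tag name to a Set of file names) are assumptions for illustration, a real translator would read the tags from disk.

```ruby
require 'set'

# index maps a tag name to the Set of files carrying that tag.
# path is something like "blues/clapton" or "-clapton/blues".
def resolve(index, path)
  all_files = index.values.reduce(Set.new) { |acc, files| acc | files }
  path.split('/').reject(&:empty?).reduce(all_files) do |result, part|
    if part.start_with?('-')
      result - index.fetch(part[1..-1], Set.new)   # negation: remove tagged files
    else
      result & index.fetch(part, Set.new)          # ordinary tag: intersect
    end
  end
end
```

Because intersection is commutative, blues/clapton and clapton/blues give the same result without any extra work.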

I think Tagged Filesystems could help the average user lots, and still be downward compatible enough to classic, hierarchical filesystems to stay accessible within the shell. Of course, this requires OS and application developers to actually implement them, and making them so easy and natural to use that “average” people will actually use them.

NP: Eric Clapton & B.B. King—Worried Life Blues

15apr2005 · Mortgages, Gasoline and Diarrhea

And quotes yet again…:

That is one of the main mortgages, that Christianity should be spread in South America. [Meant: hypotheses.]

What does the mother deer do with her children? – Tickle them! [A pun on Kitz, fawn, and kitzeln, to tickle.]

Which of you still goes to church?

We're going to the Aral station…! – Bring me a liter of gasoline! – (they come back with a liter of gasoline!!) – Come on, let's go and exchange it, I want a liter of apple juice…

You and your damn sex books! – But there were only two?

I have enough sex, life fucks me every day.

Kuh-Klacks-Klan

Do they actually have gender rolls at the Chinese restaurant too?

Initiated from behind…

Diarrhea is a venereal disease, isn't it? – Not unless you practice really extreme anal sex…

NP: Eric Clapton & B.B. King—I Wanna Be

13apr2005 · shift, reset and streams

As everyone using and understanding continuations (hopefully) knows, there is no way to return from a continuation once it is called. This “problem” (actually more of a restriction) can be solved using “partial continuations”, which can return, as they only save a certain part of the stack.

There are many ways to implement and describe partial continuations; A Library of High Level Control Operators by Christian Queinnec gives a nice overview of them. Personally, I think shift/reset is the easiest to grasp and to use, YMMV. Recently it has also been proven that shift/reset can be expressed in terms of control/prompt, see How to remove a dynamic prompt for an analysis; if you are interested in such things, also read Shift to Control by Ken Shan.

Now, really, how do shift and reset work? Well, this is actually rather easy. reset introduces a new scope, in which you can call shift to get a partial continuation. Calling this partial continuation will result in starting at reset again. At the end of the shift, shift will return from the reset.

What may be a bit more surprising is that partial continuations can be implemented using “ordinary” continuations, if the language allows for side-effects (which lambda calculus doesn’t). I have written shift and reset in Ruby, here is the code:

# Modeled after Andrzej Filinski's article "Representing
# Monads" at POPL'94, and a Scheme implementation of it.
# http://citeseer.ist.psu.edu/filinski94representing.html
#
# Note: on Ruby 1.9 and later, callcc needs: require 'continuation'

module ShiftReset
  @@metacont = lambda { |x|
    raise RuntimeError, "You forgot the top-level reset..."
  }

  def reset(&block)
    mc = @@metacont
    callcc { |k|
      @@metacont = lambda { |v|
        @@metacont = mc
        k.call v
      }
      x = block.call
      @@metacont.call x
    }
  end

  def shift(&block)
    callcc { |k|
      @@metacont.call block.call(lambda { |*v|
                                   reset { k.call *v } })
    }
  end
end

You aren’t really expected to understand this code; still, anyone learning about (partial) continuations may take this as a koan to meditate on. :-) Please note that the above implementation is rather inefficient, not least because callcc is quite slow in Ruby anyway. A language with proper support for partial continuations would provide these methods natively, of course. This can be done very easily if you use continuation-passing style; see the above papers for details.

Now, let’s implement something nifty using our new partial continuations. I decided to convert enumerators like each to lazy streams; here is how this can be done:
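To illustrate the remark about continuation-passing style, here is a small sketch of shift and reset written in CPS with plain lambdas, without callcc. This is the textbook CPS definition, not the implementation above; the names ID, cps_reset and cps_shift are made up for this example. A "computation" is a lambda that takes a continuation k and passes its result to k.

```ruby
# The identity continuation: "we are done, just return the value".
ID = ->(v) { v }

# reset runs the computation with a fresh (identity) continuation and
# hands the result to the outer continuation k.
def cps_reset(computation)
  ->(k) { k.(computation.(ID)) }
end

# shift captures the continuation k up to the enclosing reset as a
# function that can be called (several times) and that returns.
def cps_shift(&body)
  ->(k) {
    captured = ->(v) { ->(k2) { k2.(k.(v)) } }
    body.(captured).(ID)
  }
end

# Direct style: reset { 1 + shift { |c| c.call(c.call(10)) } }
body = ->(k) {
  shifted = cps_shift { |c|
    ->(k2) { c.(10).(->(a) { c.(a).(k2) }) }
  }
  shifted.(->(x) { k.(1 + x) })   # the "1 + ..." part is the captured context
}

result = cps_reset(body).(ID)     # 1 + (1 + 10) = 12
```

The captured continuation c is just "add one", and because it returns like an ordinary function it can be applied twice, which is exactly what a full continuation cannot do.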

include ShiftReset

def each_to_stream(collection)
  reset {
    collection.each { |v|
      shift { |k|
        lambda { |&block|
          block.call v
          k.call
        }
      }
    }
    nil
  }
end

As you can see, this is rather simple code, written in purely functional style. (You need to use Ruby 1.9/CVS for lambda with blocks, sorry. If you don’t want to, rewrite the code and explicitly pass the block. This is left as an exercise for the reader.)

Here’s an example on how to use it:

iter = each_to_stream [1,2,3,4,5,6]

Now, on each call of iter, a block will be called with the current value, and a new iter is returned which will return the next value on its call, ad infinitum. The nifty thing is that you can fork the stream, resulting in having two ends:

iter = iter.call { |v| p v }
iter = iter.call { |v| p v }
iter2 = iter                    # Fork
iter = iter.call { |v| p v }
iter = iter.call { |v| p v }

As expected, the above code will print:

1
2
3
4

However, note that we forked iter2 when 2 was the current value. To prove that this worked, try this:

iter2 = iter2.call { |v| p v }  while iter2

This will output the following; the fork was successful and our implementation of partial continuations works:

3
4
5
6

If your brain hurts now, don’t worry. :-)
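If you want to play with the stream interface without continuations, here is a deliberately simpler sketch that offers the same protocol (call the stream with a block, get the next stream or nil back) but only works for arrays, since it walks them by index instead of capturing each. The name array_to_stream is made up; this is a swapped-in technique to show the forking behavior, not a replacement for each_to_stream.

```ruby
# Returns a lambda that passes the current element to the given block and
# returns the stream for the rest of the array, or nil at the end.
# Works only for indexable collections, unlike the shift/reset version.
def array_to_stream(array, index = 0)
  return nil if index >= array.size
  lambda { |&block|
    block.call array[index]
    array_to_stream(array, index + 1)
  }
end

iter = array_to_stream [1,2,3,4,5,6]
iter = iter.call { |v| p v }
iter = iter.call { |v| p v }
iter2 = iter                    # Fork, the next value is 3
iter = iter.call { |v| p v }
iter = iter.call { |v| p v }

iter2 = iter2.call { |v| p v }  while iter2
```

Since every stream value is immutable, keeping an old one around is all it takes to fork: iter2 continues from 3 no matter how far iter has advanced.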

NP: Bob Dylan—Hurricane

11apr2005 · Nanoki

Over the weekend, I wrote a Wiki in Ruby called Nanoki, which is named that way because it’s supposed to be small and lightweight.

Now, writing a Wiki is neither hard nor very complex, and many people did it before me already; in many different languages and Ruby too, of course. I wrote Nanoki because I had some special needs related to the rendering and markup of the pages (yeah, I want to use BlueCloth, RubyPants, Vooly, Kashmir and ClothesLines everywhere, I know. You should do too. :-)).

Instead of looking for a Ruby Wiki, I just started to write one. Even the ones supposed to be small, such as RubyMiniWiki, are spread over multiple files and are not as clean as I wanted them. Others, like Ruwiki, would probably require far bigger changes than I wanted to make. If there was something like usemod in Ruby (real Ruby), maybe I’d have adapted it.

Nanoki really is nothing special and I don’t want to publicize the code yet, but what’s more interesting is the way creeping featuritis set in, because you think like this: Ok, now I can view and edit pages; damn, I’d really like to keep a history, and whup, I was keeping old pages. Then, oh! I’d like to keep informed about recent changes, let’s add a RecentChanges page. Whup. I don’t want to browse RecentChanges all the time! Whup, RSS feed of RecentChanges. Now, the first users complain: It’s too hard to add images, and I don’t have a server to keep and link to them. Whup, image upload and helpers to refer to them. These images are too large! Whup, thumbnail generation.

It really hit me. And I really tried to keep it all small and clean. Well, now it’s a bit over 400 lines of code, but mind that all the hard jobs were done already… CGI interfacing (rather easy, but I wouldn’t really want to develop multipart encoding on my own), the rendering pipeline (I threw a WikiWords filter in) and my templating engine. Don’t forget the datastore, I use PStore so far, but it may be too inefficient the way I use it. The code is comparatively clean, though I still inline templates and (need to) use lots of regexps for filename twiddling.

Also, although it only runs (and probably only ever will) on CGI, the design of the application is a bit Railish. For example, the main method dispatches this way:

if %w{view edit rss upload source}.include? cgi.query_string
  send @action=cgi.query_string
else
  view
end

And I use some instance variables to add messages (and errors) to my pages. Mind that I have only used Rails for a few days; still, some concepts just sneaked in. :-)

Now, I’m not sure whether to publish Nanoki, as that would probably make the thing even bigger (some stuff is still hardcoded) and the code really is nothing special. But then, there may be lots of other people in the situation I was in on Saturday morning who don’t want to spend a weekend throwing out a few hundred LoC…
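The dispatch idea can be tried out on its own. The following is a self-contained sketch, not Nanoki's code: TinyWiki and the bodies of its action methods are placeholders made up for illustration. The point is the whitelist, because calling send with an unchecked query string would let a request invoke any method of the object.

```ruby
# Hypothetical stand-in for the wiki application; only the dispatch
# pattern (whitelist + send, falling back to view) follows the text.
class TinyWiki
  ACTIONS = %w{view edit rss upload source}

  attr_reader :action

  def dispatch(query_string)
    if ACTIONS.include? query_string
      send @action = query_string   # only whitelisted names reach send
    else
      view
    end
  end

  # Placeholder actions; a real wiki would render pages here.
  def view;   "view page";     end
  def edit;   "edit form";     end
  def rss;    "rss feed";      end
  def upload; "upload form";   end
  def source; "page source";   end
end
```

TinyWiki.new.dispatch("edit") runs the edit action, while anything unknown, including names of real methods like "instance_eval", just shows the page.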

NP: Pearl Jam—Dissident

09apr2005 · Life Hacks for the Rest of Us

The Morning News has a great post, The Non-Expert: Life Hacks for the Rest of Us, which lists lots of hints and tips that make me wonder how I ever lived without them.

For example,

You can open beer bottles using the metal end of most car seatbelts.

or even (every serious handyman will get the creeps now):

In the workshop, one screwdriver can do more than just turn screws; it can hammer in nails, slice open packages, puncture leather, create holes for inserting molly bolts, and more.

And one I do all the time:

When washing drinking glasses by hand, create your own dishwasher by squeezing a little soap into each glass, then set each one in the sink and let a constant stream of hot water fill each glass and overflow until the water runs clear. The glasses won’t be as clean as if you had scrubbed each one, but it’s a lot easier on your hands and elbows.

Pragmatic till it bleeds.

NP: Pearl Jam—Last Kiss

07apr2005 · Angels, Indice and Sextants

And once again it is time for quotes:

In most angels, the wings have regressed into labia… – Then there are a lot of angels, though.

The girls probably confused “Kirche” (church) with “Küche” (kitchen).

An Indice is a foot segment that has become independent.

Adam and Juliet

You're doing nothing again! – We're preparing for our working life!

Czechoslovak

Feeding row [meaning: food chain]

The sextant… – One of those porn mags, one of those!

NP: Bright Eyes—Land Locked Blues

05apr2005 · Thoughts on Vooly

I’ve been using Vooly for ten days now, but not completely as I had it in mind: I wrote entries for Anarchaia using it, in between ordinary HTML.

Now, how does this work, actually? Anarchaia is, if you look closely, just a list of certain things, such as

Links
Flickr images
Quotes
IRC quotes
Lyrics

Each of these get their own Vooly tag, for example links are structured like this:

<<link << URL >>
  << INTRO >> DESCRIPTION >>

Or, as a real example:

<<link << http://www.picotux.com/ >>
  << picotux >>, the world's smallest Linux computer.>>

(IRC-)Quotes and Lyrics behave a bit differently, as there is special non-Vooly syntax used. I do that because it’s easier to write and feels more natural too:

<<quote God not only plays dice in physics but also in pure
mathematics. Mathematical truth is sometimes nothing more than a
perfect coin toss.
--- Gregory Chaitin, A Random Walk in Arithmetic>>

Everything after the --- will get converted to the source.
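Splitting a quote into its text and its source is a one-liner in Ruby. The sketch below only illustrates that rule; it is not the Anarchaia code, and both the render_quote name and the generated HTML markup are assumptions.

```ruby
require 'erb'

# Split the body of a quote at a line starting with "---"; everything
# after it is treated as the source. The markup is made up for this sketch.
def render_quote(body)
  text, source = body.split(/^---\s*/, 2)
  parts = ["<blockquote><p>#{ERB::Util.html_escape(text.strip)}</p>"]
  parts << "<cite>#{ERB::Util.html_escape(source.strip)}</cite>" if source
  parts << "</blockquote>"
  parts.join
end
```

A quote without a --- line simply gets no cite element.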

As you (hopefully) can see, writing repetitive HTML with “macros” like this is much easier than doing it manually (or with XML/XSLT, even!).

Now, let’s see how this actually is implemented: Since Anarchaia is powered by Nukumi2, it’s just a matter of another plugin for it, called AugmentedHTML. This plugin is trivial:

class AugmentedHTML
  def initialize(string)
    @string = string
  end

  def to_html
    v = Vooly::Slurp.new(@string, false)
    data = v.slurp.with_mapping! Decorator::MAPPING
    data.to_s
  end
end

AugmentedHTML uses Vooly::Slurp, a tree-like interface (I prefer this term over DOM) to Vooly documents. It provides a nice method with_mapping! that calls constructors given in the mapping.

For example, for the <<link>>, this looks like:

class Link < Decorator
  TEMPLATE = Kashmir.new(<<EOF)
<p class="link">
  <a href="^href">^title</a>^description
</p>
EOF

  def initialize(href, title, description='')
    @href = href
    @title = title
    @description = description
  end

  def to_s
    TEMPLATE.expand { |e|
      e.href @href
      e.title @title
      e.description @description
    }
  end
end

You can see, I’m exercising all the cool technologies I built in the last months. :-) The important part is the initializer, look how the elements of the document get mapped onto parameters.

Then, we simply call to_s recursively on the Slurp, and voilà, there is our HTML!

Now, this works pretty well, but there are some issues with Vooly with respect to using it that way. One, perhaps minor, issue is syntactic constructs like

<<foo <em>cool</em>>>

This will not work as expected, since the parser will (and has to!) match on the first >> to close the tag. Therefore, I decided to make Vooly drop the very last space of tag content, so you simply write

<<foo <em>cool</em> >>

And it’s not ambiguous anymore.

The more difficult thing is that I can’t use (my beloved) BlueCloth anymore, since BlueCloth (and Markdown in general?) will interpret <foo as unclosed HTML, and ignore that block until it finds the matching closing tag.

This is very unfortunate, but I can’t figure out a way to avoid it. Also, AugmentedHTML must run before BlueCloth, since BlueCloth will also try to escape the “superfluous” angle brackets! I wonder if I should maybe change the Vooly brackets to {{ and }}, but these are icky to type (at least on German keyboards), and don’t look and feel nearly as cool as << and >> do…

NP: Bright Eyes—Easy/Lucky/Free

03apr2005 · How I learned to love the tag

I spend a lot of time reading about various ways to organize and re-find data. These techniques range from very high-level concepts that are complicated to use and hard to build and manage, such as Topic Maps, over slightly looser ways like XFML (eXchangeable Faceted Metadata Language), to anarchic, ad-hoc methods like my Topical and, finally, tags.

The more complicated the technique is, the less it will scale and distribute. Trying to build a Topic Map as a collaborative effort sounds like suicide. Tag webs are built every day, however. del.icio.us and Technorati Tags demonstrate this daily.

Now, why do tags work?

I first had a hard time believing that tags are useful and scale. After all, tags are essentially one-dimensional faceted metadata. When I built Topical, I knew this and invented “n-dimensional tags”, in that you could build a hierarchy of tags. They work wonderfully for me, but I use them alone. They will not scale very well, and I know that. In particular, you need someone to maintain the tag hierarchy.

Over time, however, as usage of del.icio.us rose, and Technorati started to use tags too (and their wonderful idea of combining tags of several sites, such as del.icio.us and furl.net), I really wondered why they work.

The reason tags work is that tags are common-sense. The less complex your hierarchy of categorization (there is none in the case of tags) is, the more likely it is that people will do the Right Thing on their own. As Technorati says (emphasis mine):

Think of a tag as a simple category name. Bloggers categorize their posts, photos, and links with any tag that makes sense.

Also, tag webs can be picked up easily. You read del.icio.us and you pick up tags. Maybe you stumble on a tag and don’t really know what is behind it, so you click on it and see what it relates to. If you want to tag data on your own now, you have learned a new tag you can use where appropriate. And there are lists of often-used tags too.

With complex ontologies, this is not (easily) possible. No one can possibly remember “Thing/Person/Person of public interest/Singer/21st Century/Madonna”, but everyone can (and will!) tag with “Madonna”.

And it doesn’t matter when other people will tag figures of wood as “Madonna” too, because you can look for “Madonna Music” then. Someone will have tagged it that way.

This is another reason for why tags just work: Because adding metadata is so easy, lots of people can do it, and they will add lots. And “bad” (that is, wrong) tags don’t hurt much (fortunately, tag spamming is still rare).

Still, there is one problem with tags: Due to their limited dimension, it is hard to get an overview of the tag web, even if you have lots of often-used tags. Tags only make sense if you have a consistent, but big set of them. However, I think this can be solved using metatagging. Why not tag the tags themselves?

Say you want to see all bookmarks about “Music”, but there are bookmarks about “Madonna” that are not themselves tagged as “Music”. Still, these pages would be included if the tag “Madonna” was tagged as “Music”.

The problem is now managing these metatags. In the spirit of true tagging, this should be done by the same people (and with the same rights) as tagging itself. I’m not sure if this will work; if in doubt, one has to limit metatagging. But then, I’ve been proven wrong often enough with respect to tags.
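A lookup with metatags could work roughly like the following sketch. It is just one possible reading of the idea: the method names, the data layout (a Hash from a tag to the tags attached to it) and the example bookmarks are all made up for illustration.

```ruby
require 'set'

# True if tag is wanted itself, or is (transitively) tagged as wanted.
# metatags maps a tag to the list of tags attached to it; seen guards
# against cycles such as two tags tagging each other.
def tagged_as?(tag, wanted, metatags, seen = Set.new)
  return true if tag == wanted
  return false unless seen.add?(tag)
  metatags.fetch(tag, []).any? { |parent| tagged_as?(parent, wanted, metatags, seen) }
end

# bookmarks maps a URL to its list of tags.
def bookmarks_for(wanted, bookmarks, metatags)
  bookmarks.select { |_url, tags|
    tags.any? { |tag| tagged_as?(tag, wanted, metatags) }
  }.keys
end

# Hypothetical example data.
metatags  = { "Madonna" => ["Music"] }
bookmarks = {
  "http://example.org/madonna" => ["Madonna"],
  "http://example.org/ruby"    => ["Ruby"],
  "http://example.org/charts"  => ["Music"]
}
music = bookmarks_for("Music", bookmarks, metatags)
```

Here music contains both the page tagged “Music” directly and the one that is only tagged “Madonna”.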

NP: Bright Eyes—Time Code
