Is this just another phase? earthquakes making waves
Trying to shake the cancer off? oh, stupid human beings
Once you hold the hand of love,… it’s all surmountable.

28dec2004 · Push vs. Pull

Push me, pull me… or just pull me out.
— Pearl Jam, Push Me, Pull Me

Currently, there are three popular ways of parsing XML: the DOM, SAX, and pull APIs.

DOM is probably the best known, and it did a great job of showing off XML's slowness. This is because DOM reads the whole XML document into memory as a tree structure, whose nodes mainly consist of Element nodes and Text nodes. DOM is very easy to use, but also very inefficient. Ruby's REXML is mainly used in DOM mode; although its API is rubyfied and a lot nicer, the core idea is the same: provide a tree structure for XML documents.

The downsides of DOM (and the complexity of implementing it) became apparent pretty quickly, and soon the Simple API for XML (SAX) was implemented in Java. SAX is a streaming API: the SAX parser scans over the document and emits events. For example, the simple document

<doc>This is a <em>very</em> simple XML document.</doc>

emits these events (simplified):

start_element "doc"
text "This is a "
start_element "em"
text "very"
end_element "em"
text " simple XML document."
end_element "doc"

Usually, the program defines a new class whose methods get called by the SAX parser.
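A minimal sketch of that callback style, assuming the simplified events above are handed over as plain arrays: the `dispatch` loop is a hypothetical stand-in for a real SAX parser, not REXML's or Expat's interface. The handler collects the text of `<em>` elements and has to keep a flag, because it cannot ask for the next event itself; the parser decides when each method is called.

```ruby
# Simplified events for <doc>This is a <em>very</em> simple XML document.</doc>
EVENTS = [
  [:start_element, "doc"],
  [:text, "This is a "],
  [:start_element, "em"],
  [:text, "very"],
  [:end_element, "em"],
  [:text, " simple XML document."],
  [:end_element, "doc"]
].freeze

# Hypothetical stand-in for a SAX parser: it pushes every event to the
# handler method of the same name and returns the handler.
def dispatch(events, handler)
  events.each { |type, arg| handler.public_send(type, arg) }
  handler
end

# Collects the text inside <em> elements. All state lives in instance
# variables, since the handler only ever reacts to the current event.
class EmphasisHandler
  attr_reader :emphasized

  def initialize
    @in_em = false
    @emphasized = []
  end

  def start_element(name)
    return unless name == "em"
    @in_em = true
    @emphasized << +""   # unfrozen buffer for this element's text
  end

  def end_element(name)
    @in_em = false if name == "em"
  end

  def text(str)
    @emphasized.last << str if @in_em
  end
end

p dispatch(EVENTS, EmphasisHandler.new).emphasized
```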

SAX is very efficient (its API is quite like Expat's, and Expat is well known for being the fastest XML parser on earth), but it can be amazingly complex to use. (Personally, I don't have a hard time with it, at least for easy stuff, but some people don't like state machines.)

REXML also provides a SAX interface, which is quite fast, but of course not like the real thing. If you want absolute performance, I can still only recommend the Ruby Expat bindings. Although the bindings haven't been updated recently (my version is from 2002), they run like hell.

The last, and IMO most interesting, way to parse XML is a pull API. As usual, the first implementation was written in Java (XPP1, I think), and the API was only defined informally (i.e. by the implementation). The XmlPull API changed that: for Java, there now is a bunch of XML pull libraries that can all be used in the same way.

REXML provides a pull API, too, but it is marked experimental and not recommended for production use. Besides, it has problems with namespaces (which is true for REXML in general :-/).

Although technically one level lower than SAX, pull APIs are much more intuitive to program with. A pull API requires you to fetch each event yourself, so you don't need a state machine: you simply get the next events when you need them, not when they arrive. This is best shown with a simple example:
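In miniature, with the simplified events from the SAX section hard-coded as arrays, pulling looks like this. Ruby's external `Enumerator` (`Array#each` without a block, advanced with `next`) stands in for a pull parser here; that substitution is an assumption for illustration, and no XML is actually parsed. The caller asks for the next events exactly when it wants the contents of `<em>`, so there is no flag to track where it is. Nested `<em>` elements are not handled.

```ruby
# Same simplified events as in the SAX example, hard-coded.
EVENTS = [
  [:start_element, "doc"],
  [:text, "This is a "],
  [:start_element, "em"],
  [:text, "very"],
  [:end_element, "em"],
  [:text, " simple XML document."],
  [:end_element, "doc"]
].freeze

# Collects the text inside <em> elements by pulling events on demand.
def emphasized_text(events)
  stream = events.each   # external Enumerator: the caller decides when to advance
  result = []
  loop do                # Kernel#loop stops when stream.next raises StopIteration
    type, name = stream.next
    next unless type == :start_element && name == "em"

    text = +""
    # Pull the element's content right here, up to its end tag.
    loop do
      type, value = stream.next
      break if type == :end_element && value == "em"
      text << value if type == :text
    end
    result << text
  end
  result
end

p emphasized_text(EVENTS)
```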

My old homepage provided an index facility, where words marked with idx and hidx (for hidden index) were aggregated into a special page. I first implemented the program using REXML and its DOM API. The page to link to is given by the name attribute of the page root element, so basically I needed a hash mapping each index word to its pages. Here is the parsing code:

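require "rexml/document"
include REXML

# index maps each index word to the pages it appears on; f is the path
# of one page file (both come from surrounding code not shown here).
doc = File.open(f) { |io| Document.new(io) }  # block form closes the file
htmlfile = XPath.first(doc, "/page/@name").to_s

["//idx", "//hidx"].each { |xp|
  XPath.each(doc, xp) { |child|
    word = child.text.squeeze(" ").strip
    index[word] ||= []
    index[word] << htmlfile
  }
}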

This program was a total performance hog and intolerable to run (for me, at least). I rewrote it using Expat:

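class IdxParser < XML::Parser
  attr_accessor :index

  def initialize(encoding = nil, nssep = nil)
    @index = {}      # index word => pages; must exist before endElement runs
    @save = false    # true while inside an idx/hidx element
    @word = ""
    @htmlfile = ""
  end

  def startElement(nsname, attr)
    ns, name = split_name(nsname)

    if ns == $IDXNS && (name == "idx" || name == "hidx")
      @save = true
      @word = ""
    elsif name == "page" && attr["name"]
      @htmlfile = attr["name"]
    end
  end

  def endElement(nsname)
    ns, name = split_name(nsname)

    if ns == $IDXNS && (name == "idx" || name == "hidx")
      @save = false
      @word = @word.squeeze(" ").strip
      (@index[@word] ||= []) << @htmlfile
    end
  end

  def character(t)
    @word << t if @save
  end

  private

  # Names arrive as "namespace;localname" (";" being the separator this
  # code assumes the parser was created with), or as a bare "localname"
  # for elements without a namespace, such as <page>.
  def split_name(nsname)
    ns, name = nsname.split(";", 2)
    name ? [ns, name] : [nil, ns]
  end
end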

This program runs very fast, but it was a bitch to write and, of course, a lot longer. For complex formats, SAX gets darn complicated, trust me.

Finally, the elegant pull version (this implementation uses REXML's pull parser, which has a somewhat weird API):

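require "rexml/parsers/pullparser"

parser = REXML::Parsers::PullParser.new(io)
htmlfile = nil

parser.each { |event|
  if event.start_element? && event[0] == "page"
    htmlfile = event[1]["name"]   # event[1] is the attribute hash
  end

  if event.start_element? && event[0] =~ /idx:h?idx/
    # Pull the element's text events right here; no state machine needed.
    word = ""
    while (text = parser.pull).text?
      word << text[0]
    end
    (index[word.squeeze(' ').strip] ||= []) << htmlfile
  end
}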

Since the REXML pull API didn't fully satisfy my needs, I decided to write a new pull parser for Ruby from scratch, based on the pull API exposed by libxml2 (heck, I even spent a day hacking one in pure Ruby, but soon gave up). I think the example above will read something like this when it's done:

parser = FastPull.new(io)
htmlfile = nil
while parser.pull
  if parser.start_element? && parser.name == "page"
    htmlfile = parser.attributes["name"]
  end

  if parser.start_element? && parser.name =~ /\A\{#{Regexp.escape($IDXNS)}\}h?idx\z/
    word = ""
    while parser.pull.text?
      word << parser.text
    end

    (index[word.squeeze(' ').strip] ||= []) << htmlfile
  end
end

Now you have the ease of DOM and the speed of SAX. :-) (And I can look at XML again.)
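To play with the pull style without libxml2, here is a toy pull parser in pure Ruby that runs the index example above. It is a hypothetical sketch, not FastPull and not REXML: `ToyPull` and `index_page` are names made up here, the method names only imitate the FastPull sketch, and it handles just the simple, well-formed markup described in the comments. Like the REXML version, it matches the literal `idx:` prefix instead of resolving namespaces.

```ruby
require "strscan"

# Hypothetical toy pull parser, loosely modeled on the FastPull sketch.
# Assumes simple, well-formed input: no comments, CDATA, processing
# instructions, doctype or XML declaration, no entity decoding, and
# attribute values in double quotes. Namespace prefixes are kept as-is.
class ToyPull
  attr_reader :type, :name, :attributes, :text

  def initialize(xml)
    @s = StringScanner.new(xml)
    @pending_end = nil   # name of a self-closing tag whose end event is due
  end

  # Advances to the next event. Returns self, or nil at the end of input.
  def pull
    if @pending_end
      set(:end_element, @pending_end)
      @pending_end = nil
    elsif @s.eos?
      return nil
    elsif @s.scan(%r{</([^\s>]+)\s*>})
      set(:end_element, @s[1])
    elsif @s.scan(%r{<([^\s/>]+)([^>]*?)(/?)>})
      name, attrs, empty = @s[1], @s[2], @s[3]
      set(:start_element, name, attrs.scan(/([^\s=]+)="([^"]*)"/).to_h)
      @pending_end = name unless empty.empty?
    elsif (chunk = @s.scan(/[^<]+/))
      set(:text, nil, {}, chunk)
    else
      raise ArgumentError, "unsupported markup at offset #{@s.pos}"
    end
    self
  end

  def start_element?
    @type == :start_element
  end

  def text?
    @type == :text
  end

  private

  def set(type, name, attributes = {}, text = nil)
    @type, @name, @attributes, @text = type, name, attributes, text
  end
end

# Builds { index word => [pages] } from one page document.
def index_page(xml, index = {})
  parser = ToyPull.new(xml)
  htmlfile = nil
  while parser.pull
    if parser.start_element? && parser.name == "page"
      htmlfile = parser.attributes["name"]
    elsif parser.start_element? && parser.name =~ /\Aidx:h?idx\z/
      word = +""
      # Pull the element's text right here; stops at the first non-text event.
      word << parser.text while parser.pull&.text?
      (index[word.squeeze(" ").strip] ||= []) << htmlfile
    end
  end
  index
end

xml = '<page name="ruby.html"><p>About <idx:idx>Ruby</idx:idx> and ' \
      '<idx:hidx>pull   parsing</idx:hidx>.</p></page>'
p index_page(xml)
```

The caller-driven loop is the point: after an `idx` start tag, `index_page` simply keeps pulling text events until something else arrives, exactly like the inner `while` in the FastPull sketch.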

NP: Pearl Jam—Push Me, Pull Me

26dec2004 · Ruby Christmas

My Christmas post to ruby-talk:

s="IyBUaGFua3MgZm9yIGxvb2tpbmcgYXQgbXkgY29kZ
S4KIwojIENvcHlyaWdodCAoQykgMjAwMiAgQ2hyaXN0a
WFuI      E       5       l       d     Wtpc
mNoZ  W       4      gP       G       N obmV
1a2l      y       Y 2hlb  k       B     nbWF
pbC5  j       b    20+CiM     K       I yBUa
GlzI      H       Byb2dyYW        0     gaXM
gZnJ  l       Z  SBzb2Z0d2F   y       Z Tsge
W91I      G     NhbiByZWRpc3      R     yaWJ
1dGU  g        aXQgYW5kL29yCi M       g bW9k
aWZ5      I   Gl0IHVuZGVyIHRoZ    S     B0ZX
Jtcy  B      vZiB0aGUgR05VIEdlb       m VyYW
wgUH      V      ibGljIExpY       2     Vuc2
UuCg  p       T VERPVVQuc3lu  Y       y A9IH
RydW      U    KZDEsIGQyID0gM     C     4xNS
wgMC  4       wNgpzID0gIk1lcnJ        5 IGNo
cmlz      d  G1hcywgLi4uIGFuZCB   h     IGhh
cHB5  I     G5ldyB5ZWFyIgptID0gJ      X d7LC
AuID       ogISArICogMCBPIEB9CnUg P     SAiI
CIgK  i   BzLnNpemUKCnByaW50ICJcci    A gI3t
1fVx      y   IjsKCigwLi4ocy5z    a     XplL
TEpK  S      50b19hLnNvcnRfYnkg       e yByY
W5kI      H 0uZWFjaCB7IHxyfAogIH  N     sZWV
wIGQ  x    CiAgbmV4dCBpZiBzW3JdID     0 9ICI
gIls      wXQogIG0uZWFjaCB7IHxrfAo      gICA
gdVt  y  XSA9IGsKICAgIHByaW50ICIgIC   N 7dX1
cciI    KICAgIHNsZWVwIGQyCiAgfQogIHV    bcl0
gPSB   zW3JdCiAgcHJpbnQgIiAgI3t1fVxyI g p9Cg
pzbG  VlcCBkMgpwcmludCAiICAje3V9IVxyI   jsKc
2xlZ  X       A    gMwpwc     m       l udCA
iICA      j        e3V9IS A       g     LS1j
aHJp  c       z    JcbiI7     C       g ojIG
ZpbG      x        lciBzc G       F     jZSA
jIyM  j       I    yMjIyM     j       I yMjI
yMjI      y       M       j       I     yMjI
yMK";eval s.delete!(" \n").unpack("m*")[0]##
### Copyright (C) 2004  Christian Neukirchen
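The signature hides its source in the picture: the spaces and newlines are only layout, so the program deletes them, Base64-decodes the rest with `unpack("m*")` and `eval`s the result. A minimal sketch of the same decoding step, run on a hypothetical stand-in payload rather than the original art, that prints the hidden source instead of evaluating it:

```ruby
# Hypothetical payload: Base64 of a one-line program ("m0" = no line breaks),
# chopped into 4-character pieces separated by spaces and newlines, the way
# the picture above scatters its characters.
payload = ["puts 'Merry Christmas!'"].pack("m0")
art = payload.chars.each_slice(4).map(&:join).join("  \n ")

# Same decoding as the signature, but with the non-destructive String#delete:
# delete! returns nil when there is nothing to remove, which would break a
# chained call on input without whitespace.
source = art.delete(" \n").unpack("m*")[0]
puts source   # read it before you eval it
```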

Ruby 1.8.2 is out!

NP: Heather Noel—Santa Came On A Nuclear Missile

26dec2004 · A Koan on Diversity

The master asked the novice: “What should be preferred? A shower with one knob, or a shower with two knobs?”

“The one with two knobs, of course, because it allows for greater diversity while showering,” the novice responded.

“Then thou shalt receive the shower with two knobs: one knob for too hot water, and the other for too cold water,” the master said.

Upon hearing this, the novice was enlightened.

NP: Bob Dylan—Someone’s Got A Hold Of My Heart

23dec2004 · Merry Christmas!

Turkey Liberation Front Strikes Again (Happy New Year)

Christian Neukirchen wishes you all a merry Christmas, happy holidays, and a good slide into the new year!

Merry Christmas and a Happy New Year!

NP: Die Roten Rosen—Merry X-Mas Everybody

22dec2004 · Winter Break!

Finally, winter break is here!

Since (as usual) we didn't do anything useful in the last week (apart from a French test and a math test), that has to be documented:

On Friday, an overhead projector was dressed up as a dragon (holy crap):

In chemistry, we set a pencil sharpener on fire:

There were also four Bunsen burners, since it was the week of the fourth Sunday of Advent: (Note the vodka bottle on the table, too…)

And finally, the last quotes of 2004!

“Climate!” “That's too general for me.” “Lebensraum!”

“Are you beautiful?” “No… uhm, YES!”

Black as cedarwood.

One mole to another: “What's wrong? You look all churned up!”

Don't worry, more next year!

To everyone I haven't wished anything merry yet:

Merry Christmas, happy Hanukkah, and mind the wall!

NP: Die Roten Rosen—We Wish You A Merry Christmas

20dec2004 · PowerPoint in schools

I just found PowerPoint Remix by Aaron Swartz via del.icio.us, and this “slide” I couldn’t leave behind unblogged (emphasis mine):

PowerPoint in schools

disturbing!

must find replacement

Good: teaching kids to smoke

Better: close school, go to Exploratorium

Best: write illustrated essay

NP: Bob Dylan—License To Kill

19dec2004 · 4th Sunday of Advent

So, it's the fourth (and last ;-)) Sunday of Advent today. And just in time for the coming Christmas season, it has snowed nicely; the snow is even staying on the ground. A beautiful winter landscape outside.

As a side effect, we defrosted the freezer today. Unfortunately there are no pictures, but here is last year's one again instead:

Three more days of school this year. By the way, the students' church service is on Tuesday, so that the students can't get drunk during first period on Wednesday…

NP: Pearl Jam & Neil Young—Truth Be Known

17dec2004 · Songs And Artists That Inspired Fahrenheit 9/11

I just discovered this compilation of songs by Michael Moore, called “Songs And Artists That Inspired Fahrenheit 9/11”, which IMO is very well chosen. Although I don't fully agree with him politically (though he certainly has a point), I think he has good taste in music:

I Am A Patriot—Little Steven & the Disciples of Soul
Chimes Of Freedom (Live)—Bruce Springsteen
With God On Our Side—Bob Dylan
We Want It All—Zack de la Rocha
Boom!—System Of A Down
No One Left—The Nightwatchman
Masters Of War (Live)—Pearl Jam
Travelin’ Soldier—Dixie Chicks
Fortunate Son (Live)—John Fogerty
Know Your Rights—The Clash
The Revolution Starts Now—Steve Earle
Where Is The Love?—Black Eyed Peas feat. Justin Timberlake
Good Night, New York (Live)—Nanci Griffith
Hallelujah—Jeff Buckley

NP: Little Steven—I Am A Patriot

16dec2004 · How to Tame the iBook Keyboard

Ever since I got my iBook, its keyboard layout has been a thorn in my side. The @ is on Alt-L instead of AltGr-Q, ~ is on Alt-N instead of AltGr-+, and so on. I found the brackets especially annoying: [, ], {, and }, which you need all the time when programming, are placed very strangely. But \ takes the cake: I find Alt-Shift-7 impossible to type!

A bit of googling led me to Heiko Hellweg's excellent guide, which also links to his “Deutsch-PC” keymap. Copy the file to ~/Library/Keyboard Layouts and log in again. Now you can already type the special characters with Alt and the usual key.

However, the key that now plays the role of AltGr (Alt) sits on the left side of the keyboard. I decided to map the Enter key (between the Apple key and Cursor-Right) to Alt as well. The guide mentioned above recommends uControl, but I couldn't install it: my Mac OS X 10.3.6 was “too new”, and the installer complained about it. DoubleCommand does the job just as well, though. After installing it, simply select “Enter acts as option key” in System Preferences/Double Command. (If you like, you can enable “Disable Capslock” while you're at it.) Once all settings are active, you can try out the new keys.

I still had trouble typing | in some applications, such as Colloquy or Terminal.app. It turned out that Alt-< was still bound to moveToBeginningOfDocument:, so I simply commented out this key binding (which I had never used anyway) in ~/Library/KeyBindings/DefaultKeyBinding.dict. (That file came with TextExtras and provides more Emacs key bindings.)

All settings were made under Mac OS X 10.3.6 on an iBook G4. Happy hacking!

NP: Die Toten Hosen—Alles wird vorübergehen

15dec2004 · Ruby related news

Some quite interesting Ruby things have happened:

rand.rb has been officially released. (It has already been in RPA for several weeks.)
Jamis Buck has a cool story about building transcription software that uses Ruby, Ogg Vorbis and vim.
Nukumi2 now has documentation, at least a little bit to get you started. :-) I'm quite confident I'll get a 0.1 out this year…

NP: Pearl Jam & Neil Young—Act of Love

15dec2004 · Collaborative Coding

Yesterday, on #ruby-lang, we found out that iTunes doesn't really satisfy our needs. The playlist handling in particular lacks a lot of things I would like. (For example, I am unable to import a .m3u playlist while keeping the songs in exactly that order. And that order makes sense.) Quickly, it was decided that writing a music player in Ruby was the thing to do. :-)

Ilmari Heikkinen and I started shortly afterwards, using SubEthaEdit, a collaborative editor (for the first time for both of us, by the way). Collaborative means that both users can edit the same buffer at the same time, which gets interesting.

SubEthaEdit (which unfortunately exists only for Mac OS X) was very easy to set up. The host needs to open a port range, whereas the other users don't need to change anything. After telling SubEthaEdit the URI of the editing session, all participants can start hacking. (Permission control is available, too; for example, you can let some people read the buffer but not edit it.)

Every user gets his own background color. Quickly, the display was full of colorful Ruby code.

This way of editing is very impressive. Usually, one participant starts coding something while the others watch (or correct his mistakes). Then, as soon as something becomes unclear (note that the participants can't actually talk to each other; communication works mainly via code, and this is a good thing), someone changes it to something that makes more sense to him, or adds a comment.

Then the discussion starts. More and more comments get appended until a solution is found (which usually happens rather quickly). This part is probably the most fun, and the most useful one, too. It is more than simple coding, as all code gets reflected on by several minds at the same time.

This is usually considered the strength of pair programming (an XP practice), but with ordinary pairing, only one person controls the keyboard while the other “just” watches. On the other hand, they can talk to each other. (This could possibly work over the net using some kind of VoIP software, too, but probably wouldn't have worked very well in our case, as neither of us is a native English speaker.)

All in all, it was a very exciting and fun thing to do. I can only recommend trying it yourself (if you haven't already). SubEthaEdit only runs on Mac OS X for now, and I don't know of any other collaborative editors (you could try putting an Emacs frame on another box using X11…). It probably can't be that hard to write something similar yourself…

NP: Pearl Jam & Neil Young—Song Xalto

12dec2004 · 3rd Sunday of Advent

So, with today's third Sunday of Advent, the Biberach Christmas market, er… Christkindles-Markt is over, too. About time: you can't stand Rolf Zuckowski for more than two weeks, no matter how much mulled wine you have…

And because today is the third Sunday of Advent (what a reason, huh?), here's a recipe (loosely based on Cooking for Engineers):

Pfundstopf (“pound pot”) for 10–12 people

Preheat the oven to 185–200 °C.

Dice roughly, then layer in a goose roaster (Gänsebräter) in this order:
1 large can of tomatoes
1 pound lean smoked pork belly
1 pound onions
1 pound pork goulash (or schnitzel meat)
1 green bell pepper
1 red bell pepper
1 pound minced meat
1 pound hot-smoked Kassler

Mix and pour the sauce over the Pfundstopf:
2 small bottles of Kraft shashlik sauce
1 tub of cream

Roast in the oven for 2–2½ hours, the last hour without aluminum foil.

Serve with rice and white bread.

NP: Elliott Smith—A Fond Farewell

10dec2004 · Welcome to the Gutter

About time again:

I was an altar boy. Head altar boy, of course. Welcome to the gutter!

SOS: Save Our Lives [and now let's learn how to spell…]

[explaining Call of Duty:] There are Nazis and Germans…

Leitkultur, not Führernatur! [roughly: a guiding culture, not a Führer's nature]

NP: Ton Steine Scherben—Resolution (Bert Brecht)

08dec2004 · My first iBook

Today my first ever iBook arrived (I ordered it last Wednesday; if there hadn't been a weekend in between, I'd have gotten it even faster), and so far, it totally kicks ass. I bought the 12.1" model with a 60 GB disk for even more music/code/other cool stuff. :-)

It is just the right size, and the features and look are simply amazing. And you don’t hear a thing.

I had never used Mac OS X before, but after toying around with it a bit, it really looks nice and usable (heck, Exposé rocks).

So far, I have found only one thing that turned me off a bit: the keyboard, although of good quality, has a quite, umm, twist^Winteresting key layout. Why the pipe symbol | is on Alt-6, for example, someone really should explain to me…

After reinstalling the system for maximum customization (a hint ThreeDayMonk gave me), I think I'm going to spend some days adapting it to my working style.

BTW, I called it lilith because my (main) box is dubbed paradise, and there are nice medieval paintings where Lilith (in mythology, Adam's first wife) gives Eve the apple. :-)

Of course, such an event needs to be documented photographically:

The box of my iBook

Opening the box

The iBook, still in safety cover

Getting it to boot...!

NP: Pearl Jam & Neil Young—Fallen Angel

08dec2004 · femtoblog

On ruby-talk, there recently was a thread about the nicest programs for Ruby signatures. I proposed femtoblog, a tiny CGI blog in only 127 bytes:

puts"Content-type: text/html\n\n<h1>Blog",Dir["*.entry"].sort_by{
|f|-File.mtime(f).to_i}[0,9].map{|f|"<h2>#{IO.read f}<hr>"}

Of course, that code shouldn’t be taken too seriously… heck, I need to get Nukumi2 out.
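Spelled out, the signature prints a CGI header and a heading, then the nine most recently modified *.entry files, newest first, each prefixed with <h2> and followed by <hr>. A readable sketch of the same logic; the `dir` and `limit` parameters are added here for testing, and the page is built as one string instead of being printed element by element with `puts`, so line breaks can differ slightly if an entry ends with a newline.

```ruby
# Readable sketch of femtoblog. Like the original, it assumes every *.entry
# file holds one post as ready-made HTML; dir and limit are added parameters.
def femtoblog(dir = ".", limit = 9)
  entries = Dir[File.join(dir, "*.entry")]
              .sort_by { |f| -File.mtime(f).to_i }  # newest first
              .first(limit)                         # same as [0, 9] for limit 9
  posts = entries.map { |f| "<h2>#{File.read(f)}<hr>" }
  ["Content-type: text/html\n\n<h1>Blog", *posts].join("\n") + "\n"
end

puts femtoblog if $PROGRAM_NAME == __FILE__
```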

NP: Die Roten Rosen—Ihr Kinderlein Kommet

05dec2004 · 2nd Sunday of Advent

Happy second Sunday of Advent, everyone!

Yes, to you C programmers, too:

Advent, Advent, little light 0 is burning.
First one, then two, then three, then four,
then the Christ Child stands at the door.

Advent, Advent, little light 1 is burning.
First one, then two, then three, then four,
then the Christ Child stands at the door.

Advent, Advent, little light 2 is burning.
First one, then two, then three, then four,
then the Christ Child stands at the door.

Advent, Advent, little light 3 is burning.
First one, then two, then three, then four,
then the Christ Child stands at the door.

Advent, Advent, Segmentation Fault

NP: Die Roten Rosen—I Wish It Could Be Christmas Every Day

02dec2004 · Thursday Quotes

About time again, too:

[Teacher:] “I have no respect for you at all.” “The feeling is mutual!”

You could see it in the Third Reich with civil courage: they all had their tails up their asses. [meaning: between their legs.]

The Bundestag is a statutory public holiday.

Queen Mom ate gin every day.

NP: Ton Steine Scherben—Wir müssen hier raus!
