leah blogs: May 2005

« April 2005 June 2005 »

30may2005 · Abi-Gag im PG

Heute, am ersten Schultag (*seufz*) nach den Pfingstferien, war also Abi-Gag im benachbarten Pestalozzi-Gymnasium—kein wohlgehütetes Geheimnis, ich wusste das schon vor den Ferien.

Sie hatten ein generisches Thema, “Märchenwald”, aber recht nette Kulissen, einen Dennete-Stand, eine Hüpfburg und eine Wasserlandschaft, in der die 5. und 6. Klässler ihre Wet-Shirt-Contests durchführen konnten, zumindest sah es danach aus.

Livemusik gabs auch, der wir durch schlechte Dämmung des WG-Computerraums teilhaben durften. Sollten meine Schreikrämpfe bei Verunglimpfung und starker akustischer Umweltverschmutzung als sie anfingen “Basket Case” (war schlimm), “Teenage Dirtbag” (noch schlimmer) und last but not least “Highway to Hell” (ich hätte nicht gedacht, dass es schlimmer kommt als von Corinna May) zu spielen, jemanden im Matheunterricht gestört haben, bitte ich um Entschuldigung.

Erstaunlicherweise klang die Band dann danach besser, als wir direkt vor den Boxen saßen… komisch.

Ich bin dann ja mal gespannt, wann unser Abi-Gag stattfinden wird. Teile der Kulisse hängen schon zusammengerollt an der Fassade, es wird wohl diese Woche sein.

Lieber saurer Joster als süße Siebzehn.

Katholisator

[über die Schützenabzeichen:] Merkt man wenigstens nicht, wenn man versehentlich draufpisst…

NP: Adam Green—Down On The Street

26may2005 · Pulling from Streams

It is interesting that in the “real world”, most things go easier into one direction than in the other. For example, it is easier to break a coffee cup than to glue it together again. I think this has something to do with entropy.

Computer programs are not unaffected by this fact, take for example the two styles of on-the-fly XML parsing, Push vs. Pull. Translating a Pull API into a Pull API is done trivially, such code usually looks like that:

while parser.pull
  case parser.state
    when :open
      receiver.open parser.text
    when :text
      receiver.text parser.text
    when :close
      receiver.close
  end
end

Doing the reverse—converting a Push API into a Pull one—looks impossible at first, but that is not true. It can be done (albeit only in limited ways in Ruby, see below) using our beloved continuations.

I have written a small class, Stream2Pull that does this crazy task:

class Stream2Pull
  attr_reader :data

  def initialize(&block)
    @data = nil
    @continuation = lambda { instance_eval &block }
  end

  def method_missing(*arguments)
    @data = arguments
    @again = true
    switch
  end

  def pull
    @again = false
    switch
    @again
  end

  def switch
    save = @continuation
    callcc { |@continuation| save.call }
    save = nil
    GC.start
  end
end

Stream2Pull takes a block it will call on itself on the first execution of pull. This code is expected to send undefined messages to the instance. Then the “magic” happens: With the call of switch, the context of the program changes from method_missing to the call of pull, which then returns. Now the caller can examine the undefined message—the event—by looking at data. Let’s wrap REXML’s StreamParser that way:

s_to_p = Stream2Pull.new {
  REXML::Parsers::StreamParser.new(<<XML, self).parse
<?xml version="1.0"?>
<info>abc<em>hehe</em></info>
}

Now, we can use our new object just like a pull parser:

while s_to_p.pull
  p s_to_p.data
end

We just repeatedly pull from the parser, thereby always switching contexts between the inner loop of the StreamParser and ours. These are the events that are successfully emitted:

[:xmldecl, "1.0", nil, nil]
[:tag_start, "info", {}]
[:text, "abc"]
[:tag_start, "em", {}]
[:text, "hehe"]
[:tag_end, "em"]
[:tag_end, "info"]
[:text, "\n"]

We just have converted a Push API into a Pull API!

Now, we can talk about the dark side of the medal. Ruby’s continuations are very slow and they eat loads of memory. When I tried parsing a 1.5 megabyte file—the largest one I could find on my disk, it is actually the XSL specification—I didn’t yet have the GC.start inside switch. In about 2 minutes, my Ruby gobbled 400 megabytes of memory! After enforcing garbage-collection, it took less memory but memory usage was still raising constantly. After 15 minutes and 100 megabytes used I gave up.

These are the things I really want to see in (I hope not too far away) future Ruby 2 implementations: Efficient tail calls, continuation passing style, and possibly support for partial continuations. If you implement CPS, you can get the others almost for free. Those features would make above code run at about the speed of a “native” pull parser. Alternatively, please port the Ruby standard library to Scheme. }}:-) Just kidding.

If you just want to parse XML, it’s all not too bad, actually. This is because REXML’s StreamParser *is* actually a wrapper around a PullParser (REXML::BaseParser)! (Of course, I knew that in advance.) Therefore, we can easily write:

bp = REXML::Parsers::BaseParser.new(ABOVE_XML)
while bp.has_next?
  p bp.pull
end

And it will generate exactly the same output as above. :-) The main difference is, well, it uses constant memory and only needs 15 seconds to parse the big XML document. And that’s not too bad for a pure Ruby implementation.

NP: Bob Marley—Three Little Birds

24may2005 · Parsing Vooly in C

Yesterday I started writing a Vooly parser in C, mainly for reasons of speed and to provide bindings for lots of other languages without needing to reimplement the exact (yet unfinished) specification.

Before I did that, I extended Vooly with a new mechanism for escaping. Until now, it was not possible to write << and >> as text in Vooly, because those sequences were delimiters. With the new variable length terminators, you can easily do that by just using more angle brackets, for example:

<<note
To bitshift in C, use code like:
  <<<code foo = bar >> 2 >>>
or, to shift left:
  <<<code foo = bar << 2 >>>.
Pretty easy, no?
>>

Since the code blocks start with <<<, you need >>> to close them again (and at least <<< to open a new tag inside).

The “fun” thing about this addition was that it totally crushed by old Ruby parser that was based on StringScanner, and about 30 lines long. It turns out that the syntax is not regular anymore, and therefore is hard to parse using ordinary regular expressions.

All my Vooly parsers are pull parsers because these are comparatively easy to implement (push can be easier, sometimes), but also pleasant to use and quick.

Now, this is the way I wrote the C implementation: First, I rewrote the Ruby parser until it could run the old test-suite I had for the previous parser. Then, I added tests for variable length terminators and… I created a monster. I actually wrote a parser that would dynamically generate and compile regular expressions, keep them on a stack, and match them against the input string.

Seriously, I never wrote such a weird parser. Look at some snippets:

TEXT_REGEXP = Hash.new { |hash, key|
  hash[key] = /.+?(?=<{#{key.size}}|(>{#{key.size}}))/m
}
    ...
    elsif @scanner.scan(@openrx.last)
      @textrx << TEXT_REGEXP[@scanner[1]]
      ...
    elsif @scanner.scan(@closerx.last)
      ...
      @textrx.pop
    elsif @scanner.scan(@textrx.last)
      ...
      @text = @scanner.matched
    ...

It was among the most craziest code ever, but it told me my idea worked as I wanted. It was about 60 lines of Ruby. Still, if I were to implement it like that in C, I’d probably have gone crazy. You maybe should know that I didn’t do (serious) C in a long time, and I’m totally spoiled by Ruby’s string handling, garbage-collection etc.

Another, even more serious problem was that my nice parser only read from Strings (due to StringScanner), but I also wanted to read from streams (IO). Therefore, rewrite! Just too good I have a fairly complete test suite…

I reimplemented the parser in Ruby rather quickly; according to SLOCCount, it is whole 108 lines of Ruby as now, fully featured. I tried hard to use not very Ruby-specific features; the code doesn’t use regular expressions for parsing at all, it’s just a simple state machine (oh I love those :-)) with five states. Also, I don’t use a lot of OO features, only instance variables. The rest is written in a plain old procedural style. While it surely isn’t the most elegant Ruby I ever wrote, this is extremely helpful for the next step.

Let’s see what features and methods of Ruby we use:

String, Array, Integer, Symbols, true, false, nil
One user-defined class with instance variables
String#strip!, String#[], String#empty?, String#<<, one trivial substitution (@text.sub! /\s$/, '')
IO#getc, IO#eof?, IO#ungetc
Array#last, Array#pop, Array#size, Array#<<

That was it. And, believe me, that part of Ruby was very easy to write in C. I basically could go over the source and translate it line by line. Symbols would get enums, my class would get a struct, the instance variables its fields. String#strip! can be written in 5 lines of code, String#<< needs a bit help because of the memory management, but it’s not too bad. The IO methods map directly to the C standard library, and the Array stuff can be written using a statically allocated array and a “head pointer”. Just as you’d usually implement stacks in C.

I think I wrote the guts of the C parser in about one and a half hour. I ported some test cases over, and they ran fine. I estimate the total time to port the second Ruby version to C to about five hours, including documenting the API. The C version is, as of now, 275 lines of code according to David A. Wheeler’s SLOCCount. Not very much, in the end.

If I were to write the C parser directly, I fear I would have lost track early, have no chance to test it easily, and if my idea wouldn’t have worked, it would have wasted a lot of time. Clearly, I wrote lots of “superfluous” code during the port, but it helped me getting a good understanding of the code and its working.

Using a more flexible language just allows for much shorter turnaround times and makes development a lot more fun. I do rather see 30 “undefined method foo” exceptions than three “segmentation faults” per hour. Debugging Ruby is just a lot easier. I debugged the Ruby versions using p and testcases only, but I would have wasted a lot of time if I didn’t use gdb for checking some things in the C version. Lucky me, I didn’t need to open gdb at lot.

All in all, I wrote a piece of non-trivial C code for a new idea by

Implementing/extending a test suite in Ruby.
Implementing a new parser using complex Ruby without limitations.
Testing the implementation and extending the testcases for the new feature.
Implementing the new feature.
Rewriting the parser in “low-level” Ruby until it passes the test suite.
Porting the low-level parser to C.
Testing and documenting the C parser.

Every step is clearly defined, and the list “just” can be followed sequentially. Should any unexpected things happen, feel free to backtrack until you get everything working.

I am very confident with this way of developing efficient and short C code, and will likely do it that way in the future again, should (there will…) be need for it.

NP: Dan Bern—Mars

20may2005 · Off to Büsum

Alas, tonight we’re driving to Büsum for holiday. I expect to be back next Saturday evening. As usual, I don’t have net access in that time, at least I think so. This means no blogging and no mail.

If you want to see the house we’ll be residing in, go ahead.

And please excuse light blogging this week, nothing relevant happened and I got sidetracked by some things. I expect that to get better when I’m back again. Who knows what I’ll code on this trip. :-)

Oh my, how will I survive that 10 hour drive?

NP: Pitty Sing—Radio

17may2005 · Lazy Streams for Ruby

On #ruby-lang, we talked about SICP yesterday, and that reminded me I didn’t look into this wonderful book for a long time. I had enough time today to open it…

Quickly skipping over the first two chapters that I remembered fairly well (and that are “easy”), I started reading chapter 3 Modularity, Objects, and State. The section on Streams finally made me switch to my always-open Emacs window and port that beautiful Scheme into even better Ruby. :-)

In short, streams can be explained like this:

From an abstract point of view, a stream is simply a sequence. However, we will find that the straightforward implementation of streams as lists doesn’t fully reveal the power of stream processing. As an alternative, we introduce the technique of delayed evaluation, which enables us to represent very large (even infinite) sequences as streams.

Stream processing lets us model systems that have state without ever using assignment or mutable data.

What a great target for an functional installment in Ruby! Should you not be any familiar with streams, I recommend you to read above chapter now, and come back after. Or just read on, if you want to bite into the code.

To make use of Ruby’s object-orientation, I decided to make each string-cons instances of class Stream. A Stream contains two objects, car and cdr, whereas cdr is evaluated lazily. In Ruby, we do this using a block:

class Stream
  def initialize(car, &cdr)
    @car = car
    @cdr = cdr
  end

  def car; @car; end
  def cdr; @cdr.call; end
end

Now, we can create a simple cons:

cons = Stream.new(1) { 2 }
[cons.car, cons.cdr]  # => [1, 2]

That way of creating new conses is not very beautiful, but everything we get in Ruby (no macros…).

We will also need a few helper methods:

class Stream
  def self.interval(low, high)
    if low > high
      nil
    else
      new(low) { interval low+1, high }
    end
  end

  def filter(&pred)
    if pred.call car
      Stream.new(car) { cdr.filter &pred }
    else
      cdr.filter &pred
    end
  end

  def take(n)
    stream = self
    results = []
    n.times {
      results << stream.car
      stream = stream.cdr
    }
    results
  end

end

Stream.interval(1, 100) will return a new stream that will evaluate in the numbers of 1 to 100.

Stream#filter returns a new stream that only contains elements that satisfy a given predicate. Lazily of course.

Stream#take finally will evaluate the string n times and return an Array of the evaluated values.

Now, let’s do an example from SICP, calculate the third prime between 10000 and 1000000. If this was evaluated eagerly, you still wouldn’t see the result. Instead, we simply run:

def prime?(n)
  2.upto(Math.sqrt(n).round) { |i| return false  if n % i == 0 }
  true
end

Stream.interval(10000, 1000000).filter { |s| prime? s }.take(3).last
# => 10037

The answer comes instantly. Why is this? In the end, the stream behaves just a loop: We loop over all the integers in the range, and, basically, trackback if the numbers aren’t primes. Now, we do that only three times, because that often the stream will be evaluated. Therefore, we only test 37 numbers for being prime, not 999000.

Another nice example are fibonacci numbers, where each element—I hope you knew that already—is the sum of its successors. But first, we need to add two streams:

class Stream
  def self.fibonacci
    new(0) { new(1) { fibonacci.cdr + fibonacci } }
  end

  def +(other)
    Stream.new(self.car + other.car) { self.cdr + other.cdr }
  end
end

A quick test, it works:

Stream.fibonacci.take(10)
# => [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

Now, by the way, we also can calculate the golden mean, which is the can be approximated as the quotient of two successive fibonacci numbers.

n = Stream.fibonacci.take(20).last(2)
n[1] / n[0].to_f            # => 1.61803405572755
(Math.sqrt(5.0) + 1) / 2.0  # => 1.61803398874989

Unfortunately, lazyness with lambdas is not exactly a wonder of speed in Ruby, therefore Streams are, beyond their very interesting properties, not of much use for the usual Ruby programmer. In purely functional languages (e.g. Haskell), where lazy evaluation is a common thing, they are however a very common way of implementing state without actually changing things.

NP: Jetscreamer—Front Porch

15may2005 · Pfingstparty

So, Fabses Pfingstparty war ja ein voller Erfolg. Ich lasse einfach mal Bilder sprechen:

Der Kofferraum nach dem Einkauf

Bierschrank

Hartalk

Wohnzimmer

WG-Trommler

Hauge gewinnt den SchlÃ¼-Contest (leider)

Der Rest (und trotzdem nicht alle, hÃ¤hÃ¤) sind bei Fabse online.

Andi gibt auch noch seinen Senf dazu.

Update: Meine Güte, jetzt hab ich doch tatsächlich die beiden Knüllersprüche des Abends vergessen:

Der Tag hat keine Zukunft [Um 21:51…]

Das Leben ist kein Ponyhof!

NP: Apollo Sunshine—I Was On The Moon

15may2005 · Introducing Rye

After trying almost every web framework for Ruby out there, I still could not find one I really liked (Wee was close, but has non-restful URLs), so I simply decided to grow my own (hey, you know me ;-)), called Rye.

Rye is a framework of the different kind. I don’t have a lot of code yet, and lacks a lot of functionality, but I could extract the basic idea already and will try to explain it here.

Rye provides an RESTful interface to stored objects. So far, my objects are simply stored as YAML files, but that will not scale for bigger amounts of data. Rye maps these objects onto URLs like this:

http://<server>/<object-path...>/<method>/<arguments...>?<query>

So far, Rye can handle the two most commonly used (and supported, sigh) HTTP methods, GET and POST. In the case of GET, Rye will retrieve/load the object, call the method on it (possibly with the arguments and the parsed query) and, depending on the return value of the method, render the generated page or call show, the default renderer for the object. POST is treated similary, except that the object is saved to the store again before rendering the site. Therefore, GET can’t easily change any values and is idempotent, as required by REST. (Of course, you still can—and are free to—shoot yourself in the foot by manipulating the storage on your own, but it will really be your fault.)

The nice benefit of this is that there is not much to do to use Rye: Usually (so far, at least) you simply define a few YAML specific methods to store only the required instance variables of the object and make your class inherit from Rye::Base, which defines a few helper methods and attributes (maybe this should get a mix-in).

As an example, I developed a very simple to-do list manager:

class ToDoList < Rye::Base
  attr_accessor :title
  attr_accessor :todo
  attr_accessor :done

  def initialize(title)
    super()
    @title = title
    @todo, @done = [], []
  end

  def to_yaml_properties; ["@title", "@todo", "@done"]; end

  def show; template; end

  def add(params); @todo << params["item"]; self; end

  def finish(item)
    @done << item  if @todo.delete item
    self
  end

  def undo(item)
    @todo << item  if @done.delete item
    self
  end
end

As you can see, this is straight-forward Ruby code without any Web specific stuff—except maybe the show method that calls template, a helper defined in Rye::Base that renders the object using a class-specific Kashmir template (of course, you can easily interface with your favourite templating engine).

If you wonder why finish and undo return self, well, that means that Rye will redirect to show them, a useful and logic convention. finish and undo do change data, and therefore need to be called using POST. As of now, you would call finish("blubb") by writing HTML like this:

<form action="/myapp.rye/finish/blubb" method="post">
  <input type="submit" value="Finish blubb!" />
</form>

If you decide to pass additional values (for example, to call add using a form) you can do that too, as usual:

<form action="/myapp.rye/add" method="post">
  <input type="text" name="item" />
  <input type="submit" value="Add!" />
</form>

Since the code essentially only contains the real application logic, it is very easy to unit test. In fact, you test it just like any other class. (Nevertheless, I may add a helper library for unit testing which I dub “ergot” :-))

So far, Rye is only about 150 lines of code, but already covers a lot of the basic functionality. I’ll need to add a generic store, so one can use databases (for example with Og) too and isn’t forced to use YAML. Maybe I get a first release done this week (or possibly not, who knows).

I’d be interested in hearing your comments on Rye, just comment here or mail me.

NP: 33hz—Digital Lover

14may2005 · Wie man eine Shisha präpariert...

…in einfachen bebilderten Schritten:

Unsere Wasserpfeife samt Utensilien.

Trichter aufsetzen und mit Tabak befüllen.

Trichter mit Alufolie bespannen.

Alufolie perforieren.

Metallsieb einsetzen.

Holzkohle reinlegen.

Metallverkleidung aufsetzen.

Fast fertig…

… Schläuche nicht vergessen!

NP: Bob Marley—No Woman No Cry

11may2005 · Kühltieftruhe, Teebauern und sinnlose Lebern

Soo, mal wieder ‘ne Ladung Quotes in die Runde geworfen:

Warum eins kaufen, wenn man für’s doppelte Geld zwei bekommt?

Was hat Jesus vor seinem Tod getrunken? — Kreuzkümmel. [Der ist richtig schlecht. :-P]

You drive me nuting futs!

Kühltieftruhe

Meine Leber hatte keinen Sinn mehr…

Das hab ich jetzt nicht so gemeint, aber ernsthaft.

We are teh pawnz. — Juhuu, wir sind Teebauern!

[zum ev. Religionslehrer:] Gibt es auch noch andere heilige Gegenstände? — Heilige Gegenstände?!? — Unsere Englischlehrerin hat gesagt, unser Grammatikheft ist ein heiliger Gegenstand…

SHIT — Selbsthilfe in Tübingen.

NP: Bob Dylan—Slow Train

09may2005 · Scripting for Runaways

During our trip over the extended weekend, we quickly decided it would be easier if different persons buy various stuff everyone needed (say, group tickets, lunch and booze). So, everyone started to pay lots of different things with his/her own money, and it seemed like it would get a big mess.

Ruby to the rescue! I quickly (and I mean it, I coded while the others made breakfast) came up with a script to calculate who needed to give what amounts of money to other persons so everyone spent the same amount of money—all in all.

In the beginning, it was a very easy task. We created a simple file formatted like this:

# Name      Amount  Comment
Mr. Foo     72.00   Tickets
Lil' Bar    33.00   Booze

The people that didn’t pay anything just were listed, with empty amount and comment:

Squarehead
J. Random Hacker

Now, we would run the program and get an output like that:

J. Random Hacker -> Lil' Bar: 8.25
J. Random Hacker -> Mr. Foo: 18.00
Lil' Bar -> Mr. Foo: 9.75
Squarehead -> Lil' Bar: 8.25
Squarehead -> Mr. Foo: 18.00

4 Leute, 26.25 pro Person (Cashflow: 105.00)
Lil' Bar: Booze
Mr. Foo: Tickets

On a glance, you see who needs to pay whom what amount and what for. Some throwaway results are presented too.

Of course, that script was far too easy (the initial version didn’t sort the lines, so it was even easier, but I didn’t use VC, so it’s lost in the mists of time). Have a look at the source.

Soon, trouble started. One person came on the next day on her own, so she wouldn’t need to pay the group tickets (that’s only fair), but of course the other stuff we bought. I simply rewrote the program to divide the money among the known people at that time, and list everyone at the beginning and the one that came late just later. It worked fine and was a quick adjustment.

On the second day, we decided to make the first cashing-up. Obviously, we wouldn’t want to remove the entries so far (they were needed for the statistic, too), but they shouldn’t be calculated anymore. I hacked some code to make --- output the list and delete the current debts.

Then, there was someone that would leave sooner, and of course didn’t want to pay things she couldn’t make use of. Another hack, and -Name lines remove the person from the known people. This messes up the statistic a bit, but we didn’t really care.

The source code of our final version is available too.

Now, when I revisit the code, it’s very much like a awk script to me. You can see the typical elements, no additional classes were defined, only hashes and arrays store the data. The main program is a loop to parse, and run stuff at the same time. It’s rather icky with it’s linear, but messed flow.

But I can do better. Let’s build a domain specific language to manage group money! Now, the input file (a valid Ruby program, actually) looks like this:

person :MrFoo
person :LilBar
person :Squarehead
person :JRHacker

MrFoo.pay  72.00, "Tickets"
LilBar.pay 33.00, "Booze"

cashing_up!

The last version (for now, at least) is only two lines longer than the previous one, but lets Ruby do the parsing. This is actually very useful. For example, we could now also write:

EGG = 0.33
JRHacker.pay    12*EGG, "a dozen eggs"

There are further ways to clean up the code. For now, shared state is saved in constants and globals, we could create a wrapper class to take care of that (even instance_eval the file inside it, and use $SAFE), but for now, and especially for such a simple program, this is not needed.

I have demonstrated the usefulness of a general-purpose scripting language to ease tasks that do happen on the road and wouldn’t be a lot of fun to calculate manually (YMMV, but if you have lots of entries, you need to calculate a lot). Additionally, I showed how a flexible object-oriented language like Ruby can be used to create mini-languages to save parsing overhead and make use of the language features.

I wish you well on your travels
My friends I wish you well along the way
— Dan Bern, Jail

NP: Manu Chao—La Despedida

03may2005 · Vacation

From May 4th to May 8th I’m in Munich with friends for no particular reason except having free time :-) and therefore unlikely to blog here or at Anarchaia and to read my mail. While it may be possible I get WLAN somewhere (Anyone know a nifty, free hotspot? Please mail me.), don’t count on that.

I’ll write quotes, scribble stuff that happened and make a lot of pictures though, so wait till I get back and there will be something to see here.

NP: The Distillers—The Young Crazed Peeling

02may2005 · Jehoovacraft und Bordbordell

Wie üblich, Quotes:

[über den globalen Temperaturanstieg während des zweiten Weltkrieges:] Wieso sind die dann gestorben, weils in Stalingrad so kalt war?

Methan ist ökumenisch nicht wertvoll!

Bordbordell.

Also 100N ist schon eine ordentliche Kraft auf den Finger… — So viel wie 100 Tafeln Schokolade!

Hoffentlich spielen sie in der Sportschau besser…

Mädchen kommt von Made, oder?

Willkommen in der Gosse der Erwachsenen!

Jetzt hat’s ihm aber fast den Sack weggerissen!

Ich bau’ jetzt ein Jehoovacraft.

Die ist rumgelaufen und hat die Fisiker gesucht…

Du bläst auf deiner Blase.

NP: Interpol—NYC

« April 2005 June 2005 »