leah blogs

29nov2024 · Time series based monitoring in very heterogeneous environments

For the last few years, I have built a centralized monitoring system based on Prometheus that gathers various metrics across my whole private fleet of servers.

Since writing Prometheus exporters is rather simple, I have written some of them myself:

  • lywsd03mmc-exporter, a Prometheus exporter for the LYWSD03MMC BLE thermometer which monitors my flat’s temperature and air humidity (as well as when to replace the batteries).
  • card10-bme680-exporter, which accesses the environmental sensor of the card10 via serial port for measuring air quality.
  • tab-exporter which exports the number of Firefox tabs I have open (this needs countfftabs).
  • I also forked and improved nano-exporter, a very lightweight and zero-dependency Linux version of node_exporter.

Additionally I use the following pre-made exporters:

As you can see, this is quite a lot of different exporters running on different hosts.

A few months ago I decided to rebuild the centralized metrics server on top of VictoriaMetrics and with proper access control.

Why VictoriaMetrics? I tried it for a bit and it seems to use less RAM and less storage while supporting long term storage nicely. It also has better mechanisms for importing and exporting data than Prometheus.
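For example, single-node VictoriaMetrics can dump raw series as JSON lines via its native export API (the metric name here is just an example):

```shell
# export all series of one metric; the result can later be fed
# back via /api/v1/import
curl http://127.0.0.1:8428/api/v1/export -d 'match[]=node_load1' \
     > node_load1.jsonl
```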

Setting up VictoriaMetrics

Setting up victoria-metrics is very easy. I run it like this:

victoria-metrics -enableTCP6 \
  -storageDataPath=/srv/victoria-metrics \
  -retentionPeriod=99y \
  -httpListenAddr=127.0.0.1:8428 \
  -selfScrapeInterval=20s \
  -promscrape.config /usr/local/etc/prometheus/prometheus.yaml

Note that IPv6 support always needs to be enabled explicitly like this.

The prometheus.yaml file is compatible with stock Prometheus.
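A minimal sketch of such a file (job name and target are placeholders):

```yaml
scrape_configs:
  - job_name: node
    scrape_interval: 30s
    static_configs:
      - targets: ['127.0.0.1:9100']
```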

I then use Grafana to connect to it, using the Prometheus protocol.

Scraping non-public endpoints

I don’t consider most of the above metrics to be super private, but they certainly leak metadata (e.g. whether I am at home, or how much mail I get), so I don’t want to publish them on the net for everyone who finds them.

Since Prometheus mainly favors a pull-based model, we need to figure out ways to protect the data.

“Obvious” solutions like using mTLS or a maintenance VPN would require reconfiguring many machines and were deemed too much effort.

Essentially, I found three solutions that I will describe in detail:

Hiding metrics behind existing web servers

This is the easiest mechanism when your host already runs a web server: simply use it as a proxy for the metrics, and filter access by IP address or Basic Auth. Since most web servers already speak HTTPS these days, you get encryption for free.

A simple nginx configuration to do this would be:

location /metrics {
        proxy_http_version 1.1;
        proxy_pass http://127.0.0.1:9100/metrics;
        access_log off;
        allow 127.0.0.1;
        allow ...;
        deny all;
}

You need to configure the metrics exporter to only listen on localhost.
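Most exporters in the node_exporter family take a listen-address flag for this; for example (the flag name is node_exporter’s, other exporters may differ):

```shell
# bind the exporter to loopback only, so the nginx proxy is
# the only way to reach it from outside
node_exporter --web.listen-address=127.0.0.1:9100
```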

Reverse SSH tunnelling

This is quite an elegant solution that provides encryption and flexible configuration, and it can be used when the scrape target doesn’t have a public IP address. OpenSSH provides the -R flag for reverse port forwarding, but most people don’t know it can also be used to run a reverse SOCKS proxy!

For this, I create a separate Unix user on both the scrape target and the server, and assign it an SSH key. Then, the target runs:

ssh -o ServerAliveInterval=15 -o ExitOnForwardFailure=yes -R8083 server.example.com -NT

You should run this using service supervision so it tries to reconnect on network failures.
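Under runit, for instance, a minimal run script could look like this (the user name scrape-user is an assumption; the SSH flags are the ones from above):

```shell
#!/bin/sh
# /etc/sv/scrape-tunnel/run -- runit restarts the tunnel
# automatically whenever the connection drops
exec chpst -u scrape-user \
  ssh -o ServerAliveInterval=15 -o ExitOnForwardFailure=yes \
      -R8083 server.example.com -NT
```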

On the server side, you restrict the key to only opening that port, using /etc/ssh/authorized_keys/scrape-user:

restrict,port-forwarding,permitlisten="8083" ssh-ed25519 ....

Then, the server can use port 8083 as a SOCKS proxy to access the network of the scrape target directly! So you can write a scrape config like:

  - job_name: 'nano-exporter-hecate'
    proxy_url: 'socks5://127.0.0.1:8083'
    static_configs:
    - targets: ['127.0.0.1:9100']
      labels:
        instance: 'hecate.home.vuxu.org:9100'
    - targets: ['10.0.0.119:9100']
      labels:
        instance: 'leto.home.vuxu.org:9100'

Here, we use a host in my home network that is always on, and can also safely scrape other hosts in the same LAN. (Note that the IP addresses in targets are resolved relative to the SSH client.)

Pushing with vmagent

I used the SSH approach for my notebook as well, but there’s the problem that we lose data when there’s no Internet connection available. I have thus moved my notebook to a solution using vmagent, which is included with VictoriaMetrics.

vmagent scrapes metrics just like VictoriaMetrics does (and also supports the other metrics protocols, though I don’t use them), but it simply forwards everything via the Prometheus remote write protocol, buffering data locally when it can’t currently forward the metrics.

On the server side, we need to provide access to the remote write protocol. Since VictoriaMetrics operates without internal access control, we can use the vmauth gateway to implement Basic Auth over TLS. (Again, you can use an existing HTTPS server and proxy it, but in this case I don’t have an HTTPS server on the metrics host.)

vmauth needs some configuration. First, we create a self-signed certificate (Let’s Encrypt support is limited to the commercial version of VictoriaMetrics unfortunately):

openssl req -x509 -newkey ed25519 \
	-keyout /usr/local/etc/vmauth/key.pem \
	-out /usr/local/etc/vmauth/cert.pem \
	-sha256 -days 3650 -nodes -subj "/CN=server.example.org" \
	-addext "subjectAltName = DNS:server.example.org"

I then run it as:

vmauth -enableTCP6 \
  -tls \
  -tlsCertFile=/usr/local/etc/vmauth/cert.pem \
  -tlsKeyFile=/usr/local/etc/vmauth/key.pem \
  -reloadAuthKey=secret \
  -flagsAuthKey=secret \
  -metricsAuthKey=secret \
  -pprofAuthKey=secret \
  -auth.config=/usr/local/etc/vmauth/vmauth.yaml

(I think it’s unfortunate that we need to set all these auth keys now, as the internal and the forwarded APIs are exposed on the same port…)

The vmauth.yaml configures who can access:

users:
- username: "client"
  password: "evenmoresecret"
  url_prefix: "http://localhost:8428/"

Here, localhost:8428 is the VictoriaMetrics instance.
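To check the chain end to end, we can query VictoriaMetrics’ health endpoint through vmauth (hostname, port, and credentials as configured above):

```shell
# prints "OK" when vmauth accepts the credentials and
# successfully proxies the request to VictoriaMetrics
curl --cacert cert.pem -u client:evenmoresecret \
     https://server.example.org:8427/health
```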

Finally, on the scrape target we can now run vmagent:

vmagent -enableTCP6 \
        -promscrape.config=/etc/vmagent/promscrape.yml \
        -httpListenAddr=127.0.0.1:8429 \
        -remoteWrite.url=https://server.example.org:8427/api/v1/write \
        -remoteWrite.label=vmagent=myhostname \
        -remoteWrite.retryMinInterval=30s \
        -remoteWrite.basicAuth.username=client \
        -remoteWrite.basicAuth.passwordFile=/etc/vmagent/passwd \
        -remoteWrite.tlsCAFile=/etc/vmagent/cert.pem

The cert.pem is copied from the server; the password is stored in /etc/vmagent/passwd.

Note that the vmagent instance is configured locally, so we can again scrape targets that are only reachable from it. We also can adjust the scrape targets without having to touch the metrics server itself.

NP: Godspeed You! Black Emperor—Broken Spires At Dead Kapital

06nov2024 · Problem Solving with Answer Set Programming

A few months ago I found an article on how to organize your training plan using logic programming, which used Haskell to implement a logic language.

At first, I thought this would be a good problem to solve in Prolog, but there’s a difficulty: Prolog makes it quite hard to specify models that have multiple possible outcomes (yes, you can work with backtracking, but it gets tricky when you start to combine multiple predicates or fiddle around with bagof).

An alternative to classic Prolog is the concept of Answer Set Programming, based on the Stable Model Semantics by Gelfond and Lifschitz (1988). Here, the idea is that the logical formulas specify models, and the goal of Answer Set Programming is to compute the answer sets, i.e. the models fulfilling the formulas.

I can’t do a full introduction to Answer Set Programming here, but I can recommend the overview article Answer set programming at a glance by Brewka et al., as well as the short article What Is Answer Set Programming? and the book Answer Set Programming by Lifschitz.

So what is a stable model?

Essentially, we can think of a stable model as a maximal set of atoms that are true (derivable from the rules) without being in conflict with other propositions. If we don’t use negation, this is pretty boring: the unique stable model is simply the set of all derivable atoms.

However, consider this logic program:

q :- not p.
p :- not q.

Contrary to what you may expect, it has two models: {p} and {q}. This is because ASP uses default negation, which means “not A” is assumed to hold unless A is derived. Both cannot be true at the same time, but likewise the empty set is not a model because it can be extended to {p} or {q}.

The program p :- not not p. has two models, {} and {p}.

Let’s do a small problem first so we get a feel for the tooling. I chose to use clingo, which is the most advanced open source implementation of Answer Set Programming.

If your distribution doesn’t include it, you can run it from Nix:

nix shell nixpkgs#clingo

Let’s model a directed graph with four nodes:

edge(a,b).
edge(a,c).
edge(b,d).
edge(c,d).

Our problem is to find paths from a to d. We can define a relation step, which either means “we can step from X to d” or “we can step from X to Y when there’s an edge from X to Y and another step from Y”:

0 { step(X,E) } 1 :- edge(X,E), E = d.
0 { step(X,Y) } 1 :- edge(X,Y), step(Y,_).

The 0 { ... } 1 decoration turns the rule into a choice: each step atom may be included at most once, but does not have to be.

Finally, we need to specify our goal, which in tradition of logic programming, is written as a negation:

:- not step(a,_).
#show step/2.

This means “it is not the case that there’s no step starting from a”.

The #show instruction limits clingo to only output the binary step relation. Let’s run it:

% gringo graph.pl | clasp    
clasp version 3.3.10
Reading from stdin
Solving...
Answer: 1
step(b,d) step(a,b)
SATISFIABLE

Models       : 1+
Calls        : 1
Time         : 0.001s (Solving: 0.00s 1st Model: 0.00s Unsat: 0.00s)
CPU Time     : 0.000s

Clingo has found a solution. We can go from a to b and from b to d.

We can also ask for all solutions, by passing -n0:

% gringo graph.pl | clasp -n0
clasp version 3.3.10
Reading from stdin
Solving...
Answer: 1
step(b,d) step(a,b)
Answer: 2
step(c,d) step(b,d) step(a,b)
Answer: 3
step(c,d) step(a,c)
Answer: 4
step(c,d) step(b,d) step(a,c)
Answer: 5
step(c,d) step(b,d) step(a,b) step(a,c)
SATISFIABLE

Models       : 5
Calls        : 1
Time         : 0.000s (Solving: 0.00s 1st Model: 0.00s Unsat: 0.00s)
CPU Time     : 0.000s

Here we see there are five possible models of this system (but every model except the first and the third has superfluous steps).

To see why the 0 { ... } 1 matters, here’s what happens without it:

Answer: 1
step(b,d) step(c,d) step(a,b) step(a,c)
SATISFIABLE

Now there is only one model, but it contains all paths. Cardinality bounds are implemented using negation internally.

Planning Weekly Workouts in 30 lines of ASP

Back to the original task: Planning Weekly Workouts in 100 lines of Haskell.

The goal is to create a weekly plan of training exercises according to some specific rules.

First, we need some definitions related to weekdays:

weekday(1..7).

n_weekday(D, DD)  :- weekday(D), weekday(DD), (D+1) \ 7 == DD \ 7.
nn_weekday(D, DD) :- weekday(D), weekday(DD), (D+2) \ 7 == DD \ 7.

two_weekday(D, DD) :- n_weekday(D, DD).
two_weekday(D, DD) :- nn_weekday(D, DD).
two_weekday(D, DD) :- two_weekday(DD, D).

This is a bit more complicated because we later need “day after” and “within two days”. (\ means modulo.)
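To see what these relations compute, here is a quick Python transcription (a sketch only; the names mirror the ASP predicates, days are 1..7 as above):

```python
def n_weekday(d, dd):
    # dd is the day directly after d, wrapping around the week
    return (d + 1) % 7 == dd % 7

def nn_weekday(d, dd):
    # dd is two days after d
    return (d + 2) % 7 == dd % 7

def two_weekday(d, dd):
    # d and dd are within two days of each other, in either direction
    return (n_weekday(d, dd) or nn_weekday(d, dd)
            or n_weekday(dd, d) or nn_weekday(dd, d))

print(n_weekday(7, 1), two_weekday(1, 3), two_weekday(1, 4))
# True True False
```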

Now, let’s define workout and running exercises:

workout(push; pull; leg; none).
running(long; short; none).

To the plan: each weekday has one workout and one running exercise:

{ plan(D,W,R) : workout(W), running(R) } = 1 :- weekday(D).

We then add the constraints:

% No running on leg day
plan(D, leg, none) :- plan(D, leg, _).

% Short intervals run is after an outdoor pull/push workout
:- plan(_, none, short).

% Workout on Monday outdoors always, not legs
:- plan(1, none, _).
:- plan(1, leg, _).

% Pull day during the week?
:- plan(6..7, pull, _).

% One long run, one short run
{ plan(D,W,long) : weekday(D), workout(W) } = 1.
{ plan(D,W,short) : weekday(D), workout(W) } = 1.

% Two push, two pull, two leg
:- not { plan(D,push,R) } = 2.
:- not { plan(D,pull,R) } = 2.
:- not { plan(D,leg,R) } = 2.

% Long run on weekend
{ plan(6..7,W,long) : workout(W) } = 1.

% Run spaced out at least 2 days
:- plan(D,_,short), plan(DD,_,long), two_weekday(D, DD).

% Space out workouts at least 2 days
:- plan(D,W,_), plan(DD,W,_), W != none, two_weekday(D, DD).

% No leg day before short run
% No leg day before a long run
{ plan(D,W,R) : running(R), R != none, workout(W), plan(DD,leg,_), n_weekday(D, DD) } = 1.

#show plan/3.

clingo generates the same three plans as the Haskell program:

Solving...
Answer: 1
plan(5,leg,none) plan(2,leg,none) plan(1,pull,none) plan(3,push,none) plan(4,pull,short) plan(6,push,none) plan(7,none,long)
Answer: 2
plan(5,leg,none) plan(2,leg,none) plan(1,pull,none) plan(3,push,none) plan(4,pull,short) plan(6,none,none) plan(7,push,long)
Answer: 3
plan(7,leg,none) plan(4,leg,none) plan(1,pull,none) plan(2,push,short) plan(3,none,none) plan(5,pull,none) plan(6,push,long)
SATISFIABLE

Answer Set Programming against heteronormativity

The next problem I solved using ASP was a silly exercise from a statistics book (Lehn, Wegmann, Rettig: Aufgabensammlung zur Einführung in die Statistik), translation mine:

Exercise 11c) Ten French married couples bid goodbye to each other: the men with the men by shaking hands, the women with the women by kisses on both cheeks, and women with the men likewise by kisses on both cheeks. How many kisses and how many handshakes take place?

The implicit assumption of the exercise is that all married couples consist of a man and a woman, but it’s more fun to solve it in a generic way. So let’s do that using ASP.

First, we model the people and their fixed binary gender (more assumptions, but otherwise the exercise is truly underspecified):

person(0..19).
gender(man; woman).
{ gender(P, G) : gender(G) } = 1 :- person(P).

I decided to model a couple using an even and odd numbered person:

couple(A, B) :- person(A), person(B), A != B, A/2 == B/2.

Next, we model the handshakes. Two people shake hands if they are not part of the same couple, and both are men:

handshake(A, B) :- person(A), person(B), A < B, not couple(A, B), gender(A, man), gender(B, man).

The A < B ensures we only count one handshake between two men, as handshaking is a symmetric act.

We also count the handshakes:

handshakes(N) :- N = #count { handshakes(A, B) : handshake(A, B) }.

Likewise, two kisses happen between two persons not in the same couple where one is a woman. Note that kisses are asymmetric since they go from mouth to cheek (this cost me an hour of debugging…):

kiss(A, B) :- person(A), person(B), not A == B, not couple(A, B), gender(A, woman).
kiss(A, B) :- person(A), person(B), not A == B, not couple(A, B), gender(B, woman).
kisses(N) :- H = #count { kisses(A, B) : kiss(A, B) }, N = H * 2.

Finally, we also count men and women for reporting purposes:

men(N) :- N = #count { g(P) : gender(P, man) }.
women(N) :- N = #count { g(P) : gender(P, woman) }.

#show handshakes/1.
#show kisses/1.
#show men/1.
#show women/1.

Thanks to clingo’s parallelization support, we can compute all 2^20 possible solutions very quickly:

Solving...
Answer: 1
women(0) men(20) kisses(0) handshakes(180)
Answer: 2
women(1) men(19) kisses(72) handshakes(162)
Answer: 3
women(1) men(19) kisses(72) handshakes(162)
...
Answer: 1048576
women(12) men(8) kisses(616) handshakes(26)
SATISFIABLE

Models       : 1048576
Calls        : 1
Time         : 51.163s (Solving: 51.00s 1st Model: 0.10s Unsat: 0.01s)
CPU Time     : 463.811s
Threads      : 16       (Winner: 0)

We can also specialize the model to have only hetero couples:

:- Z = 0..9, not gender(2*Z, man).
:- Z = 0..9, not gender(2*Z+1, woman).

Then we get the unique solution:

Solving...
Answer: 1
women(10) men(10) kisses(540) handshakes(45)
SATISFIABLE
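As a sanity check, the hetero-couple answer can also be brute-forced in a few lines of Python, using the same encoding as the ASP model (persons 0..19, couples as even/odd pairs, even numbers men, odd numbers women):

```python
people = range(20)
man = {p for p in people if p % 2 == 0}
woman = {p for p in people if p % 2 == 1}

def couple(a, b):
    # two persons are a couple iff they share the same pair index
    return a != b and a // 2 == b // 2

# handshakes: unordered man-man pairs that are not a couple
handshakes = sum(1 for a in people for b in people
                 if a < b and not couple(a, b)
                 and a in man and b in man)

# kiss(A,B): ordered non-couple pairs where A or B is a woman;
# every greeting consists of two kisses, hence the factor 2
kisses = 2 * sum(1 for a in people for b in people
                 if a != b and not couple(a, b)
                 and (a in woman or b in woman))

print(handshakes, kisses)  # 45 540
```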

I hope this post got you interested in what Answer Set Programming can be used for. Some more interesting programs can be found on Hakan Kjellerstrand’s blog.

06apr2024 · What autoconf got right

Thanks to the xz backdoor, many people are now talking about the state of Linux packaging tools, and in particular build systems. As a maintainer of Void Linux and packager of many things, I have my five cents to add, so today I’ll be the contrarian and argue what autoconf got right. This is not an apology for GNU autotools; we are all well familiar with the issues they bring—yet some prospective replacements manage to be worse in certain aspects.

It provides a standardized interface.

This is of course the hardest point to tackle for any new contender that has not reached a critical mass.

In Void Linux, the GNU configure build style is the most popular; roughly 2250 of about 14300 package templates use it, and an additional 120 use the generic configure build style, which works similarly.

As a packager, the worst thing is to find a custom-made build system that behaves totally differently from what we know—if you decide to write your own ./configure scripts, please stick to the conventions! We packagers really have better things to do than figure out yet another homebrew build system that’s used exactly once.

These conventions are standardized as part of the GNU Coding Standards, and they specify many features that packagers expect but that developers without packaging experience of their own are likely to miss. One example is support for staged installation, i.e. DESTDIR. This is essential for building packages that contain only the files the package actually ships. And no, support for --prefix is not enough to make up for this (if you wonder why, please read up on the standards).
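Concretely, DESTDIR lets a packager install into a staging tree and archive exactly those files (the paths here are examples):

```shell
./configure --prefix=/usr
make
make DESTDIR=/tmp/pkgroot install
# the staging tree /tmp/pkgroot now contains exactly the files the
# package ships, while they are built to live under /usr at runtime
```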

It is based on checking features.

People who have been staring at ./configure output for too long may want to disagree, but let me make my point: check-based configuration is the only way to write software that will continue to work properly in the future. If you instead keep a table of broken systems and workarounds, it a) will not be updated for future systems, and b) doesn’t detect whether the system was actually fixed (either by patching a bug or by adding a missing feature). It’s also very unlikely that the software builds on a system unknown to the build system, even if that system is otherwise standards-compliant.

Of course, the checks should be reasonable (and in practice, often are excessive). If your code assumes a C99 environment, you don’t need to check whether all C99 functions you use are available. Likewise, if you don’t need macros for certain sizeof values, you don’t need to check for them, either. And you never need to check if sizeof char is actually 1—it literally can’t be anything else. Also, checking for functions can be done incorrectly.

Overrides are possible.

While checks are good, sometimes they are broken, or a certain configuration needs a special override because a feature can’t be checked (for example, when cross-compiling). In this case, autoconf scripts provide options to override checks with a predetermined result; usually you can set an environment variable like gt_cv_func_printf_posix=yes.

Likewise, if a library is installed at a special location, it’s also easy to tell configure to use it.

The config.log tells what happened.

Many other systems do checks, but only report that something has failed, which can be difficult to debug. Autoconf writes what it does into a config.log file, which is sometimes helpful to debug a check.

There is support for cross-compiling and for host/target separation.

Cross-compilation is a build system feature that is often put in second place, but as a maintainer of a system that makes heavy use of it, I have a fair share of experience and can say that autotools is one of the best systems when it comes to supporting cross-compilation. Custom-made build systems especially are often very lacking here. Cross-compilation of C programs is not particularly hard in principle, but your build system needs to know which code is going to run on the target, and programs that need to run during compilation (e.g. to precompute tables) need to be compiled for the host (with different CFLAGS and so on).

It has few runtime dependencies.

This is also a defining feature of autoconf: usually a basic POSIX shell environment (or, say, busybox) is enough to run the configure scripts. This is particularly important for packages needed for bootstrapping. If your build system needs Python, well, then you need to compile Python first; but to compile Python, you need to compile all of its dependencies, which hopefully don’t themselves need Python to build…

However, for packages not directly relevant to bootstrapping a system this is not such an essential feature.

NP: Policy of 3—Let It Build

27jan2024 · Definitions with shared hidden variables in Gerbil Scheme

It is well known that a Scheme function definition merely binds a lambda function to a variable:

(define (hypot a b)
  (sqrt (+ (* a a) (* b b))))

This function could be written just as well:

(define hypot
  (lambda (a b)
    (sqrt (+ (* a a) (* b b)))))

Occasionally, we need this explicit style when a function needs to keep around some internal state:

(define generate-id
  (let ((i 0))
    (lambda ()
      (set! i (+ i 1))
      i)))

(generate-id) ;=> 1
(generate-id) ;=> 2
(generate-id) ;=> 3

A problem arises when we need multiple functions to share the state; let’s say we also want a reset-id function:

(define i 0)

(define (generate-id)
  (set! i (+ i 1))
  i)

(define (reset-id)
  (set! i 0))

(generate-id) ;=> 1
(generate-id) ;=> 2
(generate-id) ;=> 3
(reset-id)
(generate-id) ;=> 1
(generate-id) ;=> 2

Here, I had to make i a global variable to allow two toplevel functions to access it. This is ugly, of course. When you dig deeper into the Scheme specs, you may find define-values, which lets us write:

(define-values (generate-id reset-id)
  (let ((i 0))
    (values
      (lambda ()
        (set! i (+ i 1))
        i)
      (lambda ()
        (set! i 0)))))

This hides i successfully from the outer world, but at what cost. The programming language Standard ML has a nice feature to write this idiomatically, namely local:

local
    val i = ref 0
in
    fun generate_id() = (i := !i + 1; !i)
    fun reset_id() = i := 0
end

Here, the binding of i is not visible to the outer world as well. It would be nice to have this in Scheme too, I thought. Racket provides it, but the implementation is quite hairy. Since I mostly use Gerbil Scheme these days, I thought perhaps I can reuse the module system. In Gerbil, we could also write:

(module ids
  (export generate-id reset-id)
  (def i 0)
  (def (generate-id)
    (set! i (+ i 1))
    i)
  (def (reset-id)
    (set! i 0)))

It is used like this:

(import ids)
(generate-id) ;=> 1
(generate-id) ;=> 2
(generate-id) ;=> 3
(reset-id)
(generate-id) ;=> 1
(generate-id) ;=> 2

So, let’s use the module system of Gerbil (which supports nested modules) to implement local. As a first step, we need to write a macro to define a new module and immediately import it. Since all Gerbil modules need to have a name, we’ll make up new names using gensym:

(import :std/sugar)
(defrule (local-module body ...)
  (with-id ((mod (gensym 'mod)))
    (begin
      (module mod
        body ...)
      (import mod))))

Here, we use the with-id macro to transform mod into a freshly generated symbol. defrule is just a shortcut for regular Scheme syntax-rules, which guarantees a hygienic macro. We can use the identifier mod inside the body without problems:

> (local-module (export mod) (def (mod) 42))
> (mod)
42
> mod
#<procedure #27 mod378#mod>

The internal representation of the function shows us it was defined in a generated module. Re-running local-module verifies that fresh modules are generated.

Now to the second step. We want a macro that takes local bindings and a body and only exports the definitions of the body. Using the Gerbil export modifier except-out, we can export everything but the local bindings; this is good enough for us.

Thanks to the power of syntax-rules, this is now straightforward to state:

(defrule (local ((var val) ...) body ...)
  (local-module
    (export (except-out #t var ...))
    (def var val) ...
    body ...))

We rewrite all let-style binding pairs into definitions as well, but we make sure not to export their names.

Now, we can write our running example like this:

(local ((i 0))
  (def (generate-id)
    (set! i (+ i 1))
    i)
  (def (reset-id)
    (set! i 0)))

(generate-id) ;=> 1
(generate-id) ;=> 2
(generate-id) ;=> 3
(reset-id)
(generate-id) ;=> 1
(generate-id) ;=> 2

But perhaps you prefer the slightly less magical way of using modules directly (which is also what many recommend to do in Standard ML). Still, creating modules by macros for fine-grained scoping is a good trick to have up your sleeve.

[Addendum 2024-01-28:] Drew Crampsie showed me another trick with nested modules that will help improve the local macro: when you import a module into another, the bindings are not re-exported by (export #t) (which means “export all defined identifiers”). (You can use (export (import: module)) if you really wanted this.) This means we don’t need the except-out trick, but we can just use two nested modules instead:

(defrule (local ((var val) ...) body ...)
  (local-module
    (local-module
      (export #t)
      (def var val) ...)
   (export #t)
   body ...))

The inner module contains all the local bindings, the outer one the visible bindings made in the body. Now, definitions in the body can also shadow local bindings (not that this is very useful):

(local ((var 5))
  (def (get)
    var)
  (def var 6))

(get) ;=> 6

NP: Dead Moon—Play With Fire

24dec2023 · Merry Christmas!


The Christmas Duck Song

Frohe Weihnachten, ein schönes Fest, und einen guten Rutsch ins neue Jahr wünscht euch
Leah Neukirchen

Merry Christmas and a Happy New Year!

NP: ANOHNI and the Johnsons—You Be Free

26nov2023 · On division by three

I drafted this article in February for a blog project that never came to fruition, but as it’s a nice little article, I’ll just post it here:

I was thoroughly nerdsniped by a recent post on The Unix Heritage Society mailing list, where this anecdote was mentioned by Noel Chiappa:

Steve Ward told another oral story which I’m pretty sure is true, though. They ask the candidate to design a state machine (or digital logic, I forget which) which can tell if a number is divisible by three (I think I have the details correct, but I’m not absolutely certain). So they describe one – and then point out that you can feed the number in from either end (most or least significant end first) – and proves that it will work either way! The committee was blown away.

Since this sounded like an interesting statement, I tried to find a proof myself and found two solutions, one using automata theory and one using ordinary math.
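The claim is also easy to spot-check in Python. Below is a sketch of the standard DFA, whose state is the value of the bits read so far modulo 3; the loop verifies that it gives the correct verdict regardless of the direction the bits are fed in:

```python
def div3(bits):
    # reading a bit b maps state s to (2*s + b) mod 3, so the
    # state is always the value of the bits read so far mod 3
    s = 0
    for b in bits:
        s = (2 * s + b) % 3
    return s == 0

# the same machine works fed from either end, because 2 = -1 (mod 3)
# makes the language of binary multiples of 3 closed under reversal
for n in range(300):
    bits = [int(c) for c in bin(n)[2:]]
    assert div3(bits) == (n % 3 == 0)        # MSB first
    assert div3(bits[::-1]) == (n % 3 == 0)  # LSB first
```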

Read as a PDF: On division by three.

NP: Lightning Bolt—No Rest for the Obsessed

16oct2023 · Das Punktesystem

A design pattern for collectively managed spaces. (Devised around 2011 by Leah Neukirchen.)

THE POINTS SYSTEM (DAS PUNKTESYSTEM)

Context: Various goods are to be made available in exchange for payment.

Problem: The goods have different prices, and selling them should not produce a loss.

Related patterns:

  • TALLY SHEET (STRICHLISTE), if everything has the same price.
  • HONOR-SYSTEM CASH BOX (KASSE DES VERTRAUENS), if no prepayment is desired.

Solution: Introduce a POINTS SYSTEM. It consists of a centrally posted sheet of paper on which each user claims one row (regular users can get a row pre-printed in the template).

Each row consists of, for example, three boxes next to each other. A price per box is specified (e.g. 33 ct). Users can now pay money into an HONOR-SYSTEM CASH BOX and mark the prepaid boxes with a pen. When goods are taken, they are paid for by filling in the corresponding number of boxes. Thanks to the prepayment, goods can be bought and stocked in advance. Care must be taken that nobody runs up a debt with the cash box.

Once a lot of rows are filled, a new sheet is printed out, and the prepayments already made are carried over into new rows.

A picture of the template

NP: Mrs. Piss—You Took Everything

20apr2023 · A conservative extension of ISO 8601 to support fractional days

You probably have seen ISO 8601 timestamps with fractional seconds, such as this one:

% date --iso-8601=ns 
2023-04-20T18:45:11,094052607+02:00

However, many people don’t know ISO 8601 also allows for fractional minutes and hours!

According to the standard, these timestamps are equivalent (rounded to a second):

2023-04-20T18:45:11
2023-04-20T18:45,18333
2023-04-20T18,75305

Note that in contrast to common scientific usage, the decimal part is recommended to be separated by a comma and not a full stop, although the latter is permitted too.
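These equivalences are easy to reproduce; here is a small Python sketch that computes the fractional minute, hour, and day digits for a local time, truncated to five digits as in the examples (the function name is made up):

```python
def iso_fractions(h, m, s):
    # five truncated decimal digits for the fractional-minute,
    # fractional-hour, and fractional-day forms of h:m:s
    frac_min = int(s / 60 * 100000)
    frac_hour = int((m * 60 + s) / 3600 * 100000)
    frac_day = int(((h * 60 + m) * 60 + s) / 86400 * 100000)
    return frac_min, frac_hour, frac_day

print(iso_fractions(18, 45, 11))  # (18333, 75305, 78137)
```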

However, the standard does not specify the obvious next generalization, that is, allowing fractional days. I thus propose to extend ISO 8601 in the following way, which does not change the meaning of valid existing representations:

The local time representation (after the optional time designator) may consist of only a decimal fraction, which then is interpreted as a multiple of 24 hours.

Thus, we can write the above timestamp also like this:

2023-04-20T,78137
2023-04-20,78137

Now, why would one want this? Essentially, there are three reasons:

First, it’s cute and an obvious extension of the existing format.

Second, it allows representing times of the French Republican Calendar in a natural way, which uses a decimal system as well: in this calendar, the day is divided into 10 hours of 100 minutes and 100 seconds each. Thus, the digits align directly to a decimal fraction of the whole day. The above timestamp is then (computed using fdate):

Primidi, 1 Floréal CCXXXI (231) 7:81:37

Note that we use local time here, not Paris time. If you insist on using Paris solar time, you need to apply an offset of 9 ISO minutes and 21 ISO seconds, which can be approximated as

2023-04-20T,77350+0009

Note that ISO 8601 does not allow for specifying offsets from UTC in seconds (another obvious oversight).

Finally, the mechanism also supports the use of Swatch Internet Time, a late-90s decimal time system. Here, the day is divided into 1000 beats, and the offset is fixed at UTC+1 (for the Swatch headquarters in Biel):

2023-04-20T,739+0100

This is a bit more verbose than @739 but at least it’s an international standard already!

NP: Tristan Brusch feat. Annett Louisan—Kein Problem

24dec2022 · Merry Christmas!

Picture of Kropotkin with a Christmas hat

Frohe Weihnachten, ein schönes Fest, und einen guten Rutsch ins neue Jahr wünscht euch
Leah Neukirchen

Merry Christmas and a Happy New Year!

NP: First Aid Kit—Fallen Snow

11oct2022 · 50 blank pages, or: black-box debugging of PDF rendering in printers

I was happily typesetting a book with ConTeXt and its new engine LMTX when I decided to print a few pages to see if I got the sizes right. Since I despise printer daemons, I directly print stuff over the network using a little script.

To my surprise, I just got a blank page out.

As the title of this post suggests, it won’t be the only blank page. This is the story of me debugging PDF generation in LMTX.

Giving it a closer look, the page wasn’t entirely blank. I could see the cutmarks I added to indicate the page size. During creation, I used MuPDF as a previewer—it’s lightweight and stays out of my way. But apparently the PDF was broken, so I tried a few other previewers. Evince and Firefox pdf.js rendered it fine. I looked at Okular and xpdf, and it came out nicely as well. At a later point, I even installed ancient Acrobat 7(!) and it would display as intended.

I tried the other printer in our university office. Another blank page.

Two different vendors, yet they both fail to print a simple PDF?

I secretly hoped some previewer would also render a blank page. Then I could just compile its source code on my machine and throw all kinds of debugging tools at it… but they all worked.

I tried converting the PDF to PDF with Ghostscript, and then it printed fine. So the PDF couldn’t be too wrong. But I wanted to fix it directly.

So how do you debug a PDF that gets printed wrongly but otherwise seems to be fine?

My first intuition was to make a PDF that works, and then look at the differences. So I created a simple document and ran it through the previous ConTeXt version, called MKIV. This version uses LuaTeX as an engine. It printed fine. (To nobody's surprise—otherwise I would have discovered this years ago.)

I put both PDFs through various PDF validators, but they all said both were fine.

Time to dig deeper. I disabled PDF compression and looked at both PDF files in a text editor. Sure, there were a lot of little differences. But fundamentally? Pretty much the same.

… I looked at the first printout again. Not only were the page marks printed, the tiny page numbers inside them were too! I checked the PDF and saw that it uses two fonts (using pdffonts):

name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
NSOLKP+TeXGyreSchola-Regular         CID Type 0C       Identity-H       yes yes yes      4  0
FFMARX+DejaVuSansMono                CID TrueType      Identity-H       yes yes yes      5  0

The page numbers use the DejaVu Sans font, which is supplied in TrueType format. I changed the main font of my test document to DejaVu Sans, and voilà: it printed fine. I was very happy about this, as it meant LMTX can generate printable PDF files in principle. But for its default font (Latin Modern Roman) and the font I wanted to print in (TeX Gyre Schola), there apparently was an issue.

I knew the basics of PDF from decades ago when I wrote a PDF generator from scratch. (I never got around to doing more than putting a few characters on a page, though.) Now it was time to learn about the PDF font formats.

Both the MKIV and the LMTX engine use the “CID Type 0C” font format these days, which embeds only the actually used glyphs from an OpenType font into the PDF. I pulled out the CID fonts from the PDF (using mutool extract). While file didn’t recognize the file format, luckily FontForge could open it fine. (As I learned later, FontForge can open the PDF directly and import its fonts.)

I noticed a first difference: while MKIV (and thus LuaTeX) spread the used glyphs out over their original code points, LMTX neatly arranged them starting from code point 1. I had already contacted Hans Hagen, the main developer behind ConTeXt, and we wondered whether starting the glyphs from 31 would help… again it rendered nicely on all previewers, but still printed blank pages.

I had the strong suspicion that the font embedding was the problem. To verify this hypothesis, I manually fiddled the LMTX font into the MKIV document (this was easy because it was smaller, so I just had to add some padding to make the document valid again), adjusted some code points in the PDF, and it would render glyphs on the screen. But it would not print. So now I was fairly sure that the font stream was the culprit, and not some other part of the PDF.

After more research, I found a tool to dump a CID font in a readable format: CFFDump. This small Java program turned out to be essential for tracking down the bug.

It generates a dump that looks like this:

% CFF Dump Output
% File: font-0008.cid


--------------------------------------------------------------------------------

Header (0x00000000):
    major: 1
    minor: 0
    hdrSize: 4
    offSize: 4

--------------------------------------------------------------------------------

Name INDEX (0x00000004):
  count: 1, offSize: 1
    [0]: (CLLXEY+LMRoman12-Regular)

--------------------------------------------------------------------------------

Top DICT INDEX (0x00000021):
  count: 1, offSize: 1
  [0] (0x00000026):
  <<
    /ROS << /Registry (Adobe) /Ordering (Identity) /Supplement 0 >>
    /CIDCount 15
    /FamilyName (LMRoman12)  % SID 392
    /FullName (LMRoman12-Regular)  % SID 393
    /Weight (Normal)  % SID 394
    /FontBBox [-422 -280 1394 1127]
    /isFixedPitch false
    /ItalicAngle 0
    /UnderlinePosition -175
    /UnderlineThickness 44
    /CharstringType 2
    /FontMatrix [0.001 0 0 0.001 0 0]
    /StrokeWidth 0
    /CharStrings 257  % offset
    /charset 220  % offset
    /FDArray 1751  % offset
    /FDSelect 249  % offset
    /Private [23 1728]  % [size offset]
    % ----- Following entries are missing, so they get default values: -----
    /PaintType 0  % default
    /CIDFontVersion 0  % default
    /CIDFontRevision 0  % default
    /CIDFontType 0  % default
  >>

And it goes on and on, detailing all the things specified in the font.

Inevitably, I had to dig into the internals of CFF fonts, that is, Adobe's Technical Note #5176.

I carefully compared the dump of the working MKIV font with the broken LMTX font… and didn’t find substantial differences. Sure, one copied a few more metadata fields, and the other had more font fields set, but mostly to values that were the default anyway. Nothing that seemed to be related to our bug. And also, the various PDF viewers rendered the document fine, so there couldn’t have been a major mistake there.

By now I had learned about the design of LMTX, and luckily I saw that all parts of this font embedding were written in quite straightforward Lua code that I could easily modify, so experiments were easy. Unfortunately, I didn't have a printer at home, so I had to annoy some of my friends to do test prints for me. They printed a lot of blank pages…

But I just couldn’t track down the problem. A reasonable person would have given up ages ago and just fed the PDF through Ghostscript before printing, but I wanted to get to the bottom of the thing; and I also wanted this new TeX engine to produce working documents out of the box.

In my time as a software developer, one thing I learned about debugging is that if a thing takes a long time to debug, it can be for two reasons: either the cause is much more simple than you thought, or it’s much more complicated.

I chose violence. I corrupted the CID font in various ways… the printer would stop working and print an error message instead. Some printers have an internal error log, but before these experiments it was empty.

Perhaps the document wasn’t wrong, but the printer software was? But by now we could reproduce the issue with a bunch of printers—how can they all have the same issue?

After some wrong attempts related to font hinting, I was out of ideas and decided to kill all fields one by one and check if it made any difference.

I deleted the /FontMatrix entry and… suddenly it printed nicely.

Now, the font matrix is a feature of CFF fonts to encode their scaling and shearing factors. It’s a 2x3 matrix that encodes an affine transformation (perhaps you know this from SVG). The details don’t matter, but in practice you only have two values set and they determine the font size relative to the sizes used in the font drawing instructions. By default, the font matrix is [0.001 0 0 0.001 0 0], meaning that moving by 1000 units will move by 1 PostScript point on paper.
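Concretely, a font matrix [a b c d e f] maps a glyph-space point (x, y) to (a·x + c·y + e, b·x + d·y + f). A tiny sketch (a hypothetical helper, just to illustrate the scaling):

```python
def apply_font_matrix(m, x, y):
    # [a b c d e f] is an affine transform; for plain scaling, as in the
    # default [0.001 0 0 0.001 0 0], only a and d matter.
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)

# 1000 glyph units map to 1 unit (PostScript point) of text space:
print(apply_font_matrix([0.001, 0, 0, 0.001, 0, 0], 1000, 0))  # (1.0, 0.0)
```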

I was happy, but I also was very confused: of all things, why exactly did that fix it? I noticed earlier that the MKIV document didn’t have the font matrix set, but I also looked at the Ghostscript output and there it worked fine. Even more so, LMTX set the font matrix to its default value! It shouldn’t make a difference at all!

Having come this far, I wasn't satisfied without a real answer. I wondered if LMTX encoded the font matrix the wrong way, but after digging into the spec for that (Technical Note #5177) and double-checking, it seemed fine. The working Ghostscript PDF used exactly the same byte sequence to encode the font matrix.

Staring some more at CFFDump output, I finally noticed what Ghostscript did differently: the CFF had two font matrices defined! CFF allows defining a font matrix in the “Top DICT INDEX” as well as the “Font DICT INDEX”.

And while the “Top DICT INDEX” was the same that we used, [0.001 0 0 0.001 0 0], the one in the “Font DICT INDEX” was [1 0 0 1 0 0], i.e. the identity matrix. I added this matrix to LMTX output, and finally the PDF printed properly.

Still, this was a surprise. Why would explicitly setting the font matrix to its default value change the behavior? It turns out the reason is an interaction between both of these default values. Unfortunately, it seems not to be specified by Adobe. I found a similar bug report in Ghostscript that explains the reasonable thing to do:

1) If both Top DICT and Font DICT does _not_ have FontMatrix, then Top DICT = [0.001 0 0 0.001 0 0], Font DICT 
= [1 0 0 1 0 0].  (Or, Top DICT = (absent), Font DICT = [0.001 0 0 0.001 0 0] then let '/CIDFont defineresource' 
make Top DICT = [0.001 0 0 0.001 0 0], Font DICT = [1 0 0 1 0 0].)

2) If Top DICT has FontMatrix and Font DICT doesn't, then Top DICT = (supplied matrix), Font DICT = [1 0 0 1 0 0].

3) If Top DICT does not have FontMatrix but Font DICT does, then Top DICT = [1 0 0 1 0 0], Font DICT = 
(supplied matrix).  (Or, Top DICT = (absent), Font DICT = (supplied matrix) then let '/CIDFont defineresource' 
make Top DICT = [0.001 0 0 0.001 0 0], Font DICT = (supplied matrix 1000 times larger). I think this is better.)

4) If both Top DICT and Font DICT _does_ have FontMatrix, then Top DICT = (supplied matrix), Font DICT = 
(supplied matrix).

All previewers seem to have adapted this algorithm. But certain older printers botched step 2. They end up with two font matrices [0.001 0 0 0.001 0 0] that are multiplied together, which ends up printing your document at a thousandth of its size; i.e. you get a blank page. But note that it’s a perfectly valid PDF!
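To see the effect numerically, here is a sketch of what happens when the two default matrices get concatenated like ordinary PDF transformation matrices (my own illustration, not the printer's actual code):

```python
def concat(m1, m2):
    # Concatenate two [a b c d e f] affine transformation matrices
    # (apply m1, then m2), as done when nesting coordinate systems.
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return [a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2]

default = [0.001, 0, 0, 0.001, 0, 0]
# scale factor becomes ~1e-6: the page content shrinks to a thousandth
print(concat(default, default))
```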

We thus had two ways to fix the bug: write no font matrix at all, or write both of them. I was first leaning towards the latter, doing it as Ghostscript does, but we found that FontForge will then render the fonts internally at 1000x the size and thus consume a lot more memory. Since we did not find a need for a non-default font matrix, we decided to go with the former: no font matrix at all. After all, it had worked fine for LuaTeX all those years, too.

(Why did this issue not affect the TrueType font? It’s embedded in a different format that only has a single scaling factor and has no concept of a font matrix.)

A trial of printing the PDF on many printers is ongoing and seems very promising so far, so this fix (essentially the deletion of one line of code) will soon ship in a ConTeXt snapshot for general availability.

I would like to thank Hans Hagen for not giving up on helping me with this, and all my friends that test-printed some page for me and/or had to hear me talking about nothing else for a week or so.

NP: Rites of Spring—All Through A Life

Copyright © 2004–2022