Recently I have been nerdsniped into thinking about optimal stack
programs, and one part of this are optimal stack arrangements, that
is, words (i.e. procedures) that rearrange the stack but do not look into
the contents. Borrowing terms of functional programming parlor, we can call
these “words for free”, as their behavior is uniquely determined by their
stack effect, which we write as `( before -- after )`

as is customary in Forth.
For example `( a -- aa )`

is `dup`

, which duplicates top of stack,
while `( ab -- ba )`

is `swap`

, which exchanges the two topmost
elements of stack.

In most common Forths, the stack arrangement primitives are: `drop`

,
`dup`

, `swap`

. Together with the words to handle the return stack
(`>r`

and `r>`

), every stack arrangement can be performed, as we’ll see below.
I don’t want to explain the semantics of the return stack, for now we
can just think of `>r`

moving an element from *the* stack to the return
stack, and `r>`

from the return stack to *the* stack. Additionally, at
the end of the word, the return stack must be in the same state at as
the beginning of the word.

For reasons of performance, of course many Forths also implement more complex words as primitives.

I wondered if a Forth with fewer than these 5 words can still be complete, and a bit of research turned up a paper by Mikael Patel from 1990. It’s written in Swedish, but we can figure out the essentials from page 4:

```
variable t
: t! ( x -- ) t ! ;
: t@ ( -- x ) t @ ;
```

If you don’t speak Forth, that’s just a single global variable
and two words to set (`t!`

) and read it (`t@`

). The paper
then shows how to define `drop`

, `dup`

and `swap`

from these.
(Or see below.)

We shall call a Forth with `t!`

, `t@`

, `>r`

, and `r>`

a Minimal Forth.

So, I wondered how to get the optimal definitions for many stack words, and while I wished for a smarter approach I didn’t find anything that can deal with the infinite stack well (of course, you can limit it, but also implementing a stack machine in a way a SAT/SMT solver likes seems to be inconvenient), I just decided I can brute force short forth programs.

So I wrote `femtoforth`

, which implements these minimal subsets in Ruby:

```
def femtoforth(input, code)
s = input.dup
r = []
t = nil
code.each { |insn|
case insn
when :"t@"; s << t
when :"t!"; return :oos if s.empty?; t = s.pop
when :">r"; return :oos if s.empty?; r << s.pop
when :"r>"; return :oor if r.empty?; s << r.pop
when :"dup"; return :oos if s.empty?; s << s.last
when :"swap"; return :oos if s.size < 2; s[-1],s[-2] = s[-2],s[-1]
when "drop"; return :oos if s.empty?; s.pop
else raise "Invalid insn: #{insn}"
end
}
return :error if !r.empty? or s.include? nil
s
end
```

And then we can happily brute force all programs of length `l`

:

```
stdops.repeated_permutation(l).each { |code|
if femtoforth(input, code) == output
return code
end
}
```

Turns out that with recent hardware, programs up to size 16 are tractable in a few hours without a particularily fast implementation.

These are the results I found for the standard words, and some custom ones I think can be useful:

Word | Effect | Standard Forth | Minimal Forth |

DROP | ( a – ) | drop | t! |

DUP | ( a – aa ) | dup | t! t@ t@ |

SWAP | ( ab – ba ) | swap | t! >r t@ r> |

NIP | ( ab – b ) | swap drop | >r t! r> |

OVER | ( ab – aba ) | >r dup r> swap | >r t! t@ r> t@ |

TUCK | ( ab – bab ) | dup >r swap r> | t! >r t@ r> t@ |

ROT | ( abc – bca ) | >r swap r> swap | >r >r t! r> r> t@ |

2SWAP | ( abcd – cdab ) | >r swap >r swap r> r> swap >r swap r> | >r t! >r >r t@ r> r> r> t! >r >r t@ r> r> |

2DUP | ( ab – abab ) | >r dup r> dup >r swap r> | t! t@ t@ >r >r t! t@ r> t@ r> |

2OVER | ( abcd – abcdab ) | n/a | n/a |

2DROP | ( ab – ) | drop drop | t! t! |

-ROT | ( abc – cab ) | swap >r swap r> | t! >r >r t@ r> r> |

3REV | ( abc – cba ) | swap >r swap r> swap | t! >r t@ >r t! r> r> t@ |

4REV | ( abcd – dcba ) | >r swap r> swap >r swap >r swap r> swap r> swap | t! >r t@ >r t! r> r> t@ >r >r >r t! r> r> r> t@ |

THIRD | ( abc – abca ) | >r >r dup r> swap r> swap | >r >r t! t@ r> r> t@ |

FOURTH | ( abcd – abcda ) | >r >r >r dup r> swap r> swap r> swap | >r >r >r t! t@ r> r> r> t@ |

3DUP | ( abc – abcabc ) | dup >r swap dup >r swap >r >r dup r> swap r> swap r> r> | n/a |

The words with `n/a`

are not possible to define in up to 16 words.

We know that `2OVER`

is `>r >r 2dup r> r> 2swap`

,
so by mere inlining that we get 21 primitive words in Standard Forth,
and 28 primitive words in Minimal Forth.
A definition for `3DUP`

is `third third third`

, so
`3DUP`

is at most 21 words in Minimal Forth,
which is shorter than a translation of Standard Forth `3DUP`

to Minimal Forth.

Why are both sets of words complete in the sense that *any* stack
arrangement can be expressed with it? Essentially, we can decompose
every stack arrangement into a compsition of multiple words that
extract a certain stack value and then keep the rest of the stack below, e.g.
to implement the word `( abc -- cab )`

we can compose the four words
`( abc -- cabc )`

then `( abc -- aabc )`

then `( abc -- babc )`

then `( abc -- )`

. Of course, this doesn’t yield an efficient program.
But all of these words are easy to define by shuffling
the stack to the return stack until there is the value you need, and
then `dup`

‘ing and `swap`

‘ing it around until you move the whole
return stack back to the stack in the end.

Word | Effect | Standard Forth | Minimal Forth |

W1 | ( abc – cabc ) | dup >r swap >r swap r> r> | t! >r >r t@ r> r> t@ |

W2 | ( abc – aabc ) | >r >r dup r> r> | >r >r t! t@ t@ r> r> |

W3 | ( abc – babc ) | >r dup >r swap r> r> | >r t! >r t@ r> t@ r> |

W4 | ( abc – ) | drop drop drop | t! t! t! |

In this contemplation, we focused on the runtime of the words (assuming every primitive takes 1 step). It would also be interesting to optimize for size, by factoring out common (sub)words into their own definitions, but this makes the problem a lot more complex.

NP: Pearl Jam—Take The Long Way