Java Generics are Turing Complete

This page contains material that supplements the article [Grigore, Java Generics are Turing Complete, arXiv].

Just Show me the Basics

Overview

Several programs are included, and they fit together as follows:

The data formats are as follows:

CFG: context free grammar
Simper: simple imperative programming language
TM: Turing machine

The programs run_simper and run_tm are interpreters, meant to be used for testing. The program javac is meant to be a compiler, but we use it here as an interpreter. The program cyk_of_cfg is a parser generator. The programs tm_of_simper and java_of_tm are compilers.

The rest of the document assumes that you can run these programs. More precisely, it assumes you are in a Docker container, started as follows:

  docker run -ti rgrig/javats

You might need to install Docker, if you don't already have it. Alternatively, you could download the sources from the end of this document, and compile the programs yourself. Each of the examples below assumes you use the Docker method, and each of the examples starts by reminding you how to start the Docker container. But, please don't run a Docker container inside a docker container inside a …

Do not expect javac to be an efficient interpreter: it was not meant to be an interpreter at all. Also, tm_of_simper generates rather inefficient Turing machines: they are small enough for run_tm, but way too big for javac. However, if you are willing to write the Turing machine by hand, then javac will likely handle it. If you would be interested in adding optimizations to tm_of_simper, or even in re-implementing it completely in a better way, please let me know.

Example: Counting in Binary

The first example from the paper (Section 6.1) is a Turing machine that counts, starting from a given $k$ bit number, up to $2^k-1$ and then halts. In ASCII notation, it looks as follows:

  States { start flip rewind }
  Alphabet { b0 b1 end }
  Transitions {
    start hash -> flip end Right
    flip b0 -> rewind b1 Left
    flip b1 -> flip b0 Right
    flip hash -> Halt
    rewind b0 -> rewind b0 Left
    rewind b1 -> rewind b1 Left
    rewind end -> flip end Right
  }

Let's run it:

  docker run -ti rgrig/javats
  mkdir work
  cp tm/count.tm work
  cd work
  echo "b0 b0" | run_tm -v count.tm

If all is OK, you should see:

     start:  <hash> b0 b0
      flip: end <b0> b0
    rewind:  <end> b1 b0
      flip: end <b1> b0
      flip: end b0 <b0>
    rewind: end <b0> b1
    rewind:  <end> b0 b1
      flip: end <b0> b1
    rewind:  <end> b1 b1
      flip: end <b1> b1
      flip: end b0 <b1>
      flip: end b0 b0 <hash>
  HALT

The least significant bit is towards the left. The head of the Turing machine is marked by angle brackets. If we look only at those lines where the head is on the end symbol we see

    rewind:  <end> b1 b0
    rewind:  <end> b0 b1
    rewind:  <end> b1 b1

If we run

  echo "b0 b0 b0 b0" | run_tm -v count.tm | grep '<end>'

we see

    rewind:  <end> b1 b0 b0 b0
    rewind:  <end> b0 b1 b0 b0
    rewind:  <end> b1 b1 b0 b0
    rewind:  <end> b0 b0 b1 b0
    rewind:  <end> b1 b0 b1 b0
    rewind:  <end> b0 b1 b1 b0
    rewind:  <end> b1 b1 b1 b0
    rewind:  <end> b0 b0 b0 b1
    rewind:  <end> b1 b0 b0 b1
    rewind:  <end> b0 b1 b0 b1
    rewind:  <end> b1 b1 b0 b1
    rewind:  <end> b0 b0 b1 b1
    rewind:  <end> b1 b0 b1 b1
    rewind:  <end> b0 b1 b1 b1
    rewind:  <end> b1 b1 b1 b1

It should now be clear that this machine counts in binary. However, if we do not use the verbose flag (-v) then we get at most one bit of output: HALT or STUCK. In other words, we can solve decision problems; for other problems we need to compute their output bit by bit. But that's in general: for our Turing machine that counts the output will always be HALT, printed after more or less time.

Now let's produce Java code:

  java_of_tm count.tm

You should see a file count.java, and you should be able to compile it:

  javac count.java

This did not run the Turing machine. To run the Turing machine, we need to first provide it with some input.

  cp ../tm/Tape.java .
  javac Tape.java

This did run the Turing machine, but on an empty tape. You can run it on a different tape by inserting method calls in-between start and stop in Tape.java. For example, you could edit Tape.java so it looks as follows:

  class Tape {
    void f() {
      E<?super E<?super Z>> l =
        Builder.start.b0().b0().stop();
    }
  }

and then run the Turing machine on the input tape ‘b0 b0’.

  javac Tape.java

There is no type error, which corresponds to the output HALT. (We will see in a later example how STUCK results in type errors.) You can try to increase the number of calls to b0(). On my computer, I get a StackOverflowError already at four bits. No matter: we can ask javac to work harder:

  javac -J-Xss100m Tape.java

Now we can keep adding bits, and watch how the compilation time goes up, exponentially. With 10 bits, it already takes about a minute.

Example: Counting in Binary, Easy but Slow

Instead of implementing a Turing machine to count, we can do it in a straightforward way in Simper.

  docker run -ti rgrig/javats
  mkdir work
  cp simper/count.simper work
  cd work
  cat count.simper

You should see

  var x
  x := 0
  while x != 1 { ++x }
  halt

You can run this directly, and you can compile it to Java.

  echo "" | run_simper count.simper
  tm_of_simper count.simper
  echo "" | run_tm count.tm
  java_of_tm count.tm

The resulting Java code is 4.6MiB in size, though. If you have lots of memory, then you might be able to compile it, although it's also possible that javac will run out of memory.

  javac -J-Xss100m count.java   # this might take a minute or two

This time we don't need anything on the input tape. so use the following Tape.java:

  cp ../tm/Tape.java .
  javac -J-Xss100m Tape.java

This should terminate without a type error. On my machine, javac runs out of memory. As I said, tm_of_simper doesn't generate efficient Turing machines, and javac wasn't meant to be an interpreter. If javac does terminate for you, I'd appreciate an email letting me know how long it took and how much memory it used.

Example: Computing Ackermann

We can count, but we can also perform more interesting computations. For example, let's compute an Ackermann-like function.

  docker run -ti rgrig/javats
  mkdir work
  cp simper/ack2.simper work
  cd work
  cat ack2.simper

You should see

  var m, a0, a1, a2
  m := 2
  a0 := 0
  a1 := 0
  a2 := 1
  eval: if a0 != 0 { --a0 ++m goto eval }
  l1: if a1 != 0 { --a1 a0:=m ++a0 goto eval }
  l2: if a2 != 0 { --a2 a1:=m ++a1 goto eval }
  if m == 23 { halt }

The usual definition of Ackermann is recursive. However, Simper does not have procedures, nor recursion. Luckily, there is a fun way of computing some function that has the same growth rate, a way which I learned from [Schnoebelen, Revisiting Ackermann-hardness for lossy counter machines and Petri nets, MFCS2010]. So, if you want to understand why I'm saying that the code above computes Ackermann, I'm afraid I'm just going to redirect you to Schnoebelen's paper.

Anyway, even for the simple code from above, we wouldn't want to write the equivalent Turing machine by hand. But, with the tools provided here, you can run the program from above, compile it to a Turing machine, and run that too:

  echo "" | run_simper ack2.simper
  tm_of_simper ack2.simper
  echo "" | run_tm ack2.tm

Now try to change the 23 in ack2.simper to some other number, and then repeat the steps above. The Turing machine should get stuck. Now let's generate Java code:

  java_of_tm ack2.tm

The generated ack2.java has 32MiB. Alas, javac doesn't seem able to compile this on a computer with 32GiB.

Example: Is Set Theory Consistent?

A fun example is the Turing machine designed recently that searches for contradictions in set theory (ZF), and halts when it finds one. (We hope it won't halt, but we can't hope to prove it won't halt.) First, copy this Turing machine to your working directory.

  docker run -ti rgrig/javats
  mkdir work
  cp tm/zf.tm work/
  cd work/

You can run this machine using the Turing machine interpreter.

  echo "" | run_tm zf.tm

You can also compile it to Java:

  java_of_tm zf.tm
  javac -J-Xss100m zf.java

If the previous step doesn't work on your computer, it might be you need more memory. In that case, you can download the bytecode produced by javac from here:

zf.jar (3.4MiB)

Now, you can run the Turing machine using javac:

  cp ../tm/Tape.java .
  javac -J-Xss100m Tape.java

Or, if you use the downloaded Jar, say

  javac -J-Xss100m -cp zf.jar Tape.java

In any case, don't expect javac to terminate normally.

Example: Recognizing Palindromes

Suppose we want to recognize palindromes using javac. There are three ways we can do it: by implementing a Turing machine, by implementing a Simper program, or by specifying a context free grammar. The last option is the easiest, and the most inefficient one. Let's start with it.

  docker run -ti rgrig/javats
  mkdir work
  cp cfg/palindrome.cfg work/
  cd work/
  cat palindrome.cfg

You should see

  START: | a START a | b START b | a | b;

You can now compile this to a Turing machine and run it:

  cyk_of_cfg palindrome.cfg
  tm_of_simper palindrome.cyk.simper
  echo "" | run_tm palindrome.cyk.tm   # takes a minute, prints HALT
  echo "a b" | run_tm palindrome.cyk.tm  # takes a minute, prints STUCK
  echo "a b a" | run_tm palindrome.cyk.tm  # ..., prints HALT

We can also generate Java code from the Turing machine

  java_of_tm palindrome.cyk.tm

However, the result is too big (about 1.4GiB) to be compiled with javac. We can also start with a recognizer written directly in Simper.

  rm *  # you must still be in directory work/
  cp ../simper/palindrome.simper .
  cat palindrome.simper

You should see

  var i, j, x
  if n == 0 { halt }
  i := 0
  j := n
  --j
  loop:
  if input[i] == input[j] {
    if j == 0 { halt }
    else { ++i --j goto loop }
  }
  x := "a"  x := "b"  // hack to tell tm_of_simper to include these letters

As before, you can compile this to a Turing machine and test it:

  tm_of_simper palindrome.simper
  echo "" | run_tm palindrome.tm      # prints HALT
  echo "a b" | run_tm palindrome.tm   # prints STUCK
  echo "a b a" | run_tm palindrome.tm # prints HALT

We can compile the Turing machine to Java:

  java_of_tm palindrome.tm

This time the source is considerably smaller than 1.4GiB: only 39MiB. Alas, still out of the reach of javac on my machine.

Finally, we can hand-code a Turing machine for recognizing palindromes.

  rm *
  cp ../tm/palindrome.tm .
  echo "" | run_tm palindrome.tm      # prints HALT
  echo "a b" | run_tm palindrome.tm   # prints STUCK
  echo "a b a" | run_tm palindrome.tm # prints HALT

From this one you can get bytecode:

  java_of_tm palindrome.tm
  javac palindrome.java

And we can even run it.

  cp ../tm/Tape.java .
  javac Tape.java

Now edit Tape.java to add some letters in-between start and stop. For example, you can make it look like this:

  class Tape {
    void f() {
      E<?super E<?super Z>> l =
        Builder.start.a().b().stop();
    }
  }

Since ‘a b’ is not a palindrome, the following command should report a type error:

  javac Tape.java

But if you change the tape to something that is a palindrome, then it should compile. Possibly after giving the compiler more stack (-J-Xss100m), and possibly after a lot of time; but it will compile.

Example: Recognizing an Inherently Ambiguous Language

The standard example of an inherently ambiguous context free language is $\{\,a^m b^n c^n d^m\mid\,m,n\ge 0\} \cup \{\,a^m b^m c^n d^n\mid\,m,n\ge 0\}$. You can see the grammar and the hand-crafted Turing machine here:

  docker run -ti rgrig/javats
  cat cfg/abcd.cfg
  cat tm/abcd.tm

And you can play with these files in the same way we did with the palindromes in the previous section.

Details: Syntax of CFG

The grammar for CFGs is as follows:

  <start>   ::= { <rule> }
  <rule>    ::= <nonterminal> ':' <element> { '|' <element> }
  <element> ::= <nonterminal>
  <element> ::= <terminal>

Nonterminals start with an uppercase letter; terminals start with a lowercase letter. There is no lexing: the name of the terminal is the same as (what ends up being) the corresponding letter/symbol on the Turing machine tape. The start nonterminal has the special name ‘START’.

Details: Syntax of Simper

The grammar for Simper is as follows:

  <program>   ::= <decls> <body>
  <program>   ::= <body>
  <decls>     ::= 'var' <id> { ',' <id> }
  <body>      ::= { <command> }
  <command>   ::= <label> ':'
  <command>   ::= 'goto' <label>
  <command>   ::= <lvalue> ':=' <value>
  <command>   ::= 'if' <condition> '{' <body> '}' <else>
  <command>   ::= 'while' <condition> '{' <body> '}'
  <command>   ::= 'switch' <value> '{' { <branch> } '}'
  <command>   ::= '++' <lvalue>
  <command>   ::= '--' <lvalue>
  <command>   ::= 'halt'
  <else>      ::=
  <else>      ::= 'else' '{' <body> '}'
  <label>     ::= <id>
  <value>     ::= <lvalue>
  <value>     ::= <rvalue>
  <lvalue>    ::= <id>
  <lvalue>    ::= <id> '[' <value> { ',' <value> } ']'
  <rvalue>    ::= 'array' '[' <value> { ',' <value> } ']' '(' <value> ')'
  <rvalue>    ::= <num_literal>
  <rvalue>    ::= <sym_literal>
  <condition> ::= <value> '==' <value>
  <condition> ::= <value> '!=' <value>
  <branch>    ::= <value> '{' <body> '}'

Simper has two special variables, which should not be declared explicitly: input and n. A numeric literal is a nonnegative integer written in decimal notation. A symbol literal is an identifier (without spaces) put in double quotes.

Details: Syntax of TM

  <machine>     ::= <Q> <alphabet> <delta>
  <Q>           ::= 'States' '{' { <state> } '}'
  <alphabet>    ::= 'Alphabet' '{' { <letter> } '}'
  <delta>       ::= 'Transitions' '{' { <transition> } '}'
  <transition>  ::= <state> <letter> '->' <state< <letter> { ';' <letter> } <move>
  <transition>  ::= <state> <letter> '->' 'Halt'
  <transition>  ::= <state> '_' '->' <move>
  <transition>  ::= <state> '_' '->' <state>
  <move>        ::= 'Left'
  <move>        ::= 'Stay'
  <move>        ::= 'Right'
  <state>       ::= <id>
  <letter>      ::= <id>

The initial state has the special name ‘start’.

Source Code

The paper contains two theorems. Their proofs can be paraphrased as follows.

Theorem 1. It is undecidable whether $t$ is a subtype of $t'$ according to a given class table.

Proof. See java_of_tm.ml.

Theorem 6. Given is a context free grammar $G$ that describes a language ${\cal L}\subseteq\Sigma^*$ over an alphabet of method names. We can construct Java class definitions, a type $T$, and expressions ${\it Start}$, ${\it Stop}$ such that the code $$ T\; \ell \quad=\quad {\it Start}.f^{(1)}().f^{(2)}()\ldots f^{(m)}().{\it Stop} $$ type checks if and only if $f^{(1)}f^{(2)}\ldots f^{(m)} \in {\cal L}$. Moreover, the class definitions have size polynomial in the size of $G$, and the Java code can be type-checked in time polynomial in the size of $G$.

Proof. See cyk_of_cfg.ml and tm_of_simper.ml, in addition to java_of_tm.ml.

So, here is the source code:

javats-ecf7199.tar.bz2

Changelog:

20160817 first webpage
20160818 minor: typos, mentioned special identifiers for TM and CFG, bugfixes to some (not presented) example TMs