Making a Virtual Machine Interpreter in Java – Part 1

May 12, 2020

My intention here is to create an Aha!! moment, a moment when you realise this is all there is to a Virtual Machine.

Introduction
Virtual Machines are everywhere, you must have definitely heard about the JVM(Java Virtual Machine), .NET CLR (C#), Zend(PHP), Dalvik/ART Virtual Machine for Android etc. You might have also heard about the the different Nintendo emulators all the way to VMWare and KVM.

The primary difference between things like JVM and things like VMWare is that JVM emulates a machine that doesn’t physically exist while VMWare tries to emulate an existing X86 or ARM hardware. One is tuned for performance while the other is tuned for correctness (for correctly emulating the hardware).

This article focuses on building something that functions like a JVM, but before we get our hands dirty, I would like to tell you about the one important decision that you would have to make before building a Virtual Machine Interpreter. You could build it as a Stack based Machine or Register based Machine. There are reasons why you would choose one over the other but as JVM is a stack based machine, we will try to mimic the same and in my opinion stack based machines are slightly easier to comprehend during the initial phases.

What is a Stack?
Stack is a data structure, wherein anything you put in last comes out first or LIFO(Last In First Out). You can compare it to a stack of Lego or a stack of plates.

Image stolen from Here
Refer the above link to learn more about stacks.

SimpleMachine
Let us call our new Virtual Machine as “SimpleMachine” and let us define some code that our machine will execute.

print "Hello World"

The above code is simple enough for humans but too high level for a machine. This is the job of a compiler which converts the above code into instructions that a machine would understand and since it is a stack based machine the instructions would be in the form a stack.

["Hello World", print]

Operands like “Hello World” get pop’d out till an operator/instruction like `print` is found and the instruction is executed taking the operands as a parameter. Let us see how does this look for some basic math expressions, for the code below ,

print 1+2*3

The above is first converted into a stack like format by the compiler,

[3, 2, *, 1, +, print]

If the above stack representation looks a bit weird to you, then you are on the right track, what we did there was an infix to postfix conversion. To put it simply, infix is how you would normally write mathematical expressions and follow BODMAS/PEMDAS but computers aren’t as smart, hence it is converted to something simpler that a computer can work with which is called the postfix expression. This again is done by a compiler, if you would like to know the algorithm behind this conversion, then I recommend this video.

Let us get back to where we were,
We start by popping the first element
hmm wait…, it looks like a number?
let us keep it in a bag called operands,

stack: [2, *, 1, +, print]
operands:[3]

Let us pop the next element, looks like it is an operand again, so we put it inside our bag of operands,

stack: [*, 1, +, print]
operands:[2, 3]

Now when we pop the next element, we see that it is a multiplication operator “*”, and we already have the operands in our bag that we called operands, we just pop them out multiply them (2*3 = 6) and place the result back on to the stack.

stack: [6, 1, +, print]
operands:[]

Let us pop the next element, looks like it is an operand again, so we put it inside our bag of operands,

stack: [1, +, print]
operands:[6]

Let us pop the next element, looks like it is an operand again, so we again push it inside our bag of operands, (see the loop?)

stack: [+, print]
operands:[1, 6]

Now when we pop the next element, we see that it is an addition operator “+”, and we already have the operands in our bag called operands, we just pop them out add them (1+6 = 7) and push the result back on to the stack.

stack: [7, print]
operands:[]

Let us pop the next element, looks like it is an operand again, so we again put it inside our bag of operands,

stack: [print]
operands:[7]

Now when we pop the next element, we see that it is a print instruction and we have operands in our bag that need to be printed, we just pop the operand out, print it and since print doesn’t return a value, our stack and operands bag are now empty and our program comes to an end.

Now you should be like, I get it! else, go read it again before proceeding.

Now let us make a simple Virtual Machine Interpreter in Java that can be used print strings. our main class is as follows,

package com.returnzero;

import java.util.List;

public class Main {

    public static void main(String[] args) {
        List<String> instructions = List.of("\"Hello World\"", "print");
        SimpleMachine simpleMachine = new SimpleMachine(instructions);
        simpleMachine.run();
    }
}

As you can see, we have the instructions in a list that our VM is going to go over and execute. Let us now look at the SimpleMachine class,

package com.returnzero;

import java.util.*;

public class SimpleMachine {

    private Stack<String> dataStack = new Stack<>();
    private int instructionPointer = 0;
    private List<String> code = new ArrayList<>();
    private Map<String, FunctionInterface> instructionFunctionMap = new HashMap<>();


    SimpleMachine(List<String> code) {
        this.code = code;
        instructionFunctionMap.put("print", () -> { System.out.print(pop()); });
    }

    public void run() {
        while (instructionPointer < code.size()) {
            String opcode = code.get(instructionPointer);
            instructionPointer++;
            dispatch(opcode);
        }
    }

    private void dispatch(String opcode) {
        if (instructionFunctionMap.containsKey(opcode)) {
            instructionFunctionMap.get(opcode).exec();
        } else if (isNumeric(opcode)) {
            push(opcode);
        } else if (opcode.startsWith("\"") && opcode.endsWith("\"")) {
            push(opcode.substring(1, opcode.length() - 1));
        } else {
            throw new RuntimeException("Unknown opcode: " + opcode);
        }
    }

    public static boolean isNumeric(String str) {
        return str.matches("-?\\d+(\\.\\d+)?");  //match a number with optional '-' and decimal.
    }

    private String pop() {
        return dataStack.pop();
    }

    private void push(String value) {
        dataStack.push(value);
    }

    private String top() {
        return dataStack.peek();
    }

}

Remember the bag of operands, Here we have something similar, a data stack stored in a variable called dataStack. we basically loop through the code and if the code is numeric or a quoted string like “Hello World”, we simply push it on to the dataStack.

Whenever we hit a code that is a pre-defined instruction in the variable instructionFunctionMap, we simply get a reference to the instance that implements that instruction and execute it, in our case, we implement it in the form of a lambda expression.

instructionFunctionMap.put("print", () -> { System.out.print(pop()); });

And if you were wondering what is theFunctionInterface, it is basically a single method interface as defined below.

package com.returnzero;

public interface FunctionInterface {
    void exec();
}

Now when you run the Main class, you will get the result as:

Hello World

I hope you were able to get a basic understanding of how VM Interpreter works at the lowest level, In the next part, we will cover more instructions like “read”,”if”, “jump”, “loop” etc. This will allow your VM to be Turing Complete and allow it to run some more complex programs.

Update 29-Jan-2023: There won’t be a second part to this, please read the LISP: My second attempt at explaining Interpreters instead.