Brainfuck

Brainfuck is an esoteric programming language from 1993.

It is a very user-unfriendly with its eight commands, each only a character long. This makes it difficult to program with. As a short project I decided to write an interpreter for the language.

++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+
++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.

The above code is your standard “Hello World!” program written in Brainfuck. When run, it prints “Hello World!” to the console and exits.

A Brainfuck program expects to have a minimal environment to work in. This environment consists of a bunch of cells (at least 30,000) that can hold integers and a pointer to the current active cell. The environment will vary slightly between implementations.

The language consists of the following commands:

Character Meaning
> Increment pointer
< Decrement pointer
+ Increment value
- Decrement value
. Print value
, Accept value
[ Jump forward
] Jump backward

The first two commands (> and <) increment and decrement a main cell pointer, respectively. This pointer points to cells that have been initialized to zero by the environment. Every other command operates on the cell that the pointer is pointing to.

The second two commands are pretty straight forward. + and - increment and decrement the value in the current cell. The first four commands are the easiest to implement. They’re just adding and subtracting from a number after all.

Next up are . and , for printing and accepting data to and from the console. Values are printed from the current cell as an ASCII character. Accepted values are taken from ASCII and stored as an integer. Things probably get pretty wonky when non-ASCII is used, but we won’t go over that here.

Lastly, we have the two commands that do all of the “thinking” in this language: the jumps. Whenever either one of these commands are encountered, the value in the current cell determines what happens. For the forward jump ([) command, if the value is zero jump to the command just after to the matching backward jump (]) command. Jumping backwards is similar, but only jump back if the value is non-zero.

Reading and writing the language is no walk in the park. An easier task is to write a compiler or an interpreter for the language, which is what I’ve gone and done. I used Go for this little project as it’s currently my language of choice.

The program has two main parts: the “front-end” and the “back-end”. The front end takes care of making sure the command line arguments are correct and configuring the back-end. The back-end is what actually does all the work, so that’s what I’ll mainly be focusing on here.

engine := NewEngine()
if err := engine.Load("source.bf"); err != nil {
    log.Fatalf("Error loading source: %s", err)
}
if err := engine.Run(); err != nil {
    log.Fatalf("Run failed: %s\n", err.String())
}

There isn’t much to go over here, the front end of the program is pretty straight forward. The engine is initialized, the code is loaded from a source file, and then it’s run. The first two steps could even be combined, but I decided to have them separate to keep things easier to follow.

type Command rune
const (
    C_INCPTR    Command = '>'   // Increment pointer
    C_DECPTR    Command = '<'   // Decrement pointer
    C_INC       Command = '+'   // Increment value
    C_DEC       Command = '-'   // Decrement value
    C_OUT       Command = '.'   // Print value
    C_ACC       Command = ','   // Accept value
    C_JMPFOR    Command = '['   // Jump forward
    C_JMPBAC    Command = ']'   // Jump backwards
)

type Engine struct {
    cellIdx     int
    cells       []int

    commandIdx  int
    commands    []Command
}

func NewEngine() *Engine {
    e := &Engine{
        cellIdx:    0,
        commandIdx: 0,

        cells:      []int{0},
        commands:   []Command{},
    }
    return e
}

Here we have the initialization code. The Engine structure consists of two sets of indexes and slices. The first set (cellIdx and cells) holds the working memory of the program. The second set (commandIdx and commands) holds the commands that will be loaded. Commands from the second set will act on the data in the first set. I don’t define an upper bounds for these two slices, so there can be as many cells as the memory in the host computer can handle. Same thing goes for the commands, although longer programs will probably take an exceptionally long time to run.

func (e *Engine) Load(filename string) error {
    file, err := os.Open(filename)
    if err != nil {
        return err
    }
    defer file.Close()

    rawBytes, err := ioutil.ReadAll(file)
    if err != nil {
        return err
    }
    e.programFilename = filename

    for _, b := range rawBytes {
        switch Command(b) {
            case C_INCPTR, C_DECPTR, C_INC, C_DEC, C_OUT, C_ACC, C_JMPFOR, C_JMPBAC:
                e.commands = append(e.commands, Command(b))
            default:
                // Ignore all other characters
                continue
        }
    }
    return nil
}

Loading and parsing Brainfuck is much easier than your standard programming language. Many languages have rather strict syntax rules where placement of punctuation and white space matters a whole lot. In Brainfuck, everything except the eight commands from above are ignored and are considered comments.

func (e *Engine) Run() *BrainfuckError {
    for e.commandIdx = 0; e.commandIdx &lt; len(e.commands); e.commandIdx++ {
        switch e.commands[e.commandIdx] {
            // --8&lt;--
        }
    }
    return nil
}

And finally we come to the main bulk of the interpreter. Almost everything happens in this loop. The only things that have their own place are the jump commands (we’ll get to those later). The basic flow of the interpreter is straight forward: read command, perform command’s job, move to next command, rinse and repeat until the last command is reached.

// Increment the pointer
case C_INCPTR:
    e.cellIdx += 1

    // Add another cell if the index is larger than the number of cells.
    if e.cellIdx &gt;= len(e.cells) {
        e.cells = append(e.cells, 0)
    }

// Decrement the pointer
case C_DECPTR:
    e.cellIdx -= 1

    // Error if cell index is below zero.
    if e.cellIdx &lt; 0 {
        return e.newError("cellIdx below zero.")
    }

Once again we have the increment and decrement pointer commands first. The only special thing we do here is bounds checking. There can’t be a negative number of cells, and there can’t be any un- nitilaized cells. If the program tries to access a cell that doesn’t exist, create it and set its value to zero.

// Increment the value
case C_INC:
    e.cells[e.cellIdx]++

// Decrement the value
case C_DEC:
    e.cells[e.cellIdx]--

These two commands are by far the easiest to implement. There isn’t much to say here other than we don’t need to do any bounds checking as that is done for us with the previous two commands.

// Print the cell's value
case C_OUT:
    fmt.Printf("%c", e.cells[e.cellIdx])

// Accept a new value
case C_ACC:
    var v int
    // Caveat: User must hit enter to continue
    if _, err := fmt.Scanf("%c", &v); err != nil {
        return e.newError(fmt.Sprintf("C_ACC failure: %s", err))
    }
    e.cells[e.cellIdx] = v

Here we have the only two I/O commands in the language. The first just printing out the ASCII representation of the value in the cell, and the second saving the integer value of the inputted ASCII value to a cell.

There is a small caveat to the input command, however. Due to buffered input, the user is required to press enter to send any data to the program. They can enter as much as they want, but only the first character is kept. There are ways around this, but that is a bit out of scope for this project for now.

// Jump forwards to matching close bracket if current cell is zero
case C_JMPFOR:
    if e.cells[e.cellIdx] == 0 {
        e.gotoLoopEnd()
    }

// Jump backwards to matching close open if current cell not zero
case C_JMPBAC:
    if e.cells[e.cellIdx] != 0 {
        e.gotoLoopStart()
    }

Lastly we come across the two jump commands. As you’ll notice, both commands' bulk is elsewhere and it’s only executed if the current cell’s conditions allow.

// Go to the end of the current loop
func (e *Engine) gotoLoopEnd() *BrainfuckError {
    lvl := 0
    tlvl := 0
    for e.commandIdx += 1; e.commandIdx &lt; len(e.commands); e.commandIdx++ {
        switch e.commands[e.commandIdx] {
            case C_JMPFOR:
                lvl++
                tlvl++
            case C_JMPBAC:
                if lvl &gt; 0 {
                    lvl--
                } else if lvl &lt; 0 {
                    return e.newError("nest level too low in gotoLoopStart()")
                } else {
                    return nil
                }
        }
    }
    return e.newError(fmt.Sprintf("gotoLoopEnd finished without finding C_JMPBAC [%d]", tlvl))
}

These commands get a loop all to themselves. The loop conditions are almost identical to the main loop’s with one main difference. We want to start looking for the matching jump command after the current command, so we add one the the command index. After that, for ever jump forward command we come across we add one to a counter. For every jump backward command we come across we subtract one from the same counter. If the counter is zero when we find the jump backward command the loop ends and we return to the main loop with the command index pointing to the jump backwards command.

// Go to the start of the current loop
func (e *Engine) gotoLoopStart() *BrainfuckError {
    lvl := 0
    tlvl := 0
    for e.commandIdx -= 1; e.commandIdx &gt; -1; e.commandIdx-- {
        switch e.commands[e.commandIdx] {
            case C_JMPBAC:
                lvl++
                tlvl++
            case C_JMPFOR:
                if lvl &gt; 0 {
                    lvl--
                } else if lvl &lt; 0 {
                    return e.newError("nest level too low in gotoLoopStart()")
                } else {
                    return nil
                }
        }
    }
    return e.newError(fmt.Sprintf("gotoLoopStart finished without finding C_JMPFOR [%d]", tlvl))
}

Jumping backward is almost identical to jumping forward, only we look backwards through the command slice. Also note that both of the jump loops bounds check for negative levels. Encountering a negative level means that the Brainfuck code is malformed.

And with that, you have a Brainfuck interpreter. I didn’t include all of the code here (there’s some additional helper functions and a structure), but all that can be found over on GitHub.

links

I do computer things, i guess


2023-05-20