Brainfuck is a highly minimalistic programming language created in 1993 by Swiss physics student Urban Müller. The language consists of only eight simple commands, a data pointer and an instruction pointer. In this article, we will implement a custom Brainfuck interpreter in Rust.

Workspace setup

Let’s create a workspace for our project:

  1. Create a folder with your workspace name. For example: brainfuck.
  2. Add file Cargo.toml with this content:
[workspace]

members = [
]
  1. Create a brainfuck interpreter library:
cargo new --lib brainfuck_interpreter
  1. Modify your outer Cargo.toml. You should add the new library name to the members list:
[workspace]

members = [
    "brainfuck_interpreter",
]

Now, we are ready to work with the interpreter.

Interpreter library setup

Change your directory to brainfuck_interpreter. From now, we will work inside this folder.

Modify Cargo.toml file with these dependencies:

[dependencies]
thiserror = "1.0"

[dev-dependencies]
criterion = "0.3"

[[bench]]
name = "interpreter_bench"
harness = false
  • thiserror is a common library that simplifies work with custom error types.
  • criterion adds the ability to run benchmarks to catch any regression or improvements of the program.

Interpreter implementation

Many interpreters can be created by following these steps:

  • Parsing the source
  • Building Abstract Syntax Tree (AST)
  • Optimizing AST
  • Executing AST

In our case, Brainfuck language is simple enough and can be executed command by command. We can represent commands as an array and run them linearly.

Source parsing

Let’s start by implementing source parsing.

Custom Types

We should create an internal enum to represent each command. Let’s call it a Token:

#[derive(PartialEq, Eq, Debug)]
enum Token {
    MoveRight,
    MoveLeft,
    Increment,
    Decrement,
    Output,
    Input,
    LoopBegin,
    LoopEnd,
    Unknown,
}

Can you notice an issue with this enum? Its main drawback is that it’s hard to maintain loops. We can transform Token enum into something more high level, like Operation:

#[derive(PartialEq, Eq, Debug)]
enum Operation {
    MoveRight(usize),
    MoveLeft(usize),
    Increment(u8),
    Decrement(u8),
    Output,
    Input,
    Loop(Vec<Operation>),
}

You can see that LoopBegin and LoopEnd were replaced with Loop(Vec<Operation>). It’s easier to execute such a program. We can also optimize the AST by replacing consecutive tokens with one. For example, >>> can be stored as MoveRight(3).

Lastly, we should define custom error type to use throughout our program:

#[derive(Debug, Error)]
pub enum InterpreterError {
    #[error("Error parsing source: `{0}`")]
    ParseError(String),
    #[error("Memory overflow")]
    MemoryOverflow,
    #[error("Pointer is out of memory bounds")]
    PointerOverflow,
    #[error("Error reading from stdin: `{0}`")]
    StdinError(io::Error),
}

Parsing implementation

Let’s define a method that will:

  • Parse &str into Vec<Token>
  • Parse Vec<Token> into Vec<Operation>

We can do the first step quickly with this code:

let tokens = source
    .chars()
    .map(|cur| match cur {
        '>' => Token::MoveRight,
        '<' => Token::MoveLeft,
        '+' => Token::Increment,
        '-' => Token::Decrement,
        '.' => Token::Output,
        ',' => Token::Input,
        '[' => Token::LoopBegin,
        ']' => Token::LoopEnd,
        _ => Token::Unknown,
    })
    .filter(|token| token.ne(&Token::Unknown));

You can see that we convert chars to associated tokens and skip all other chars by using Token::Unknown.

Conversion from Token to Operation is more complex. We should maintain the previous LoopBegin token and all the commands in between. One of the simplest methods is to use Stack. Here is the implementation:

let mut stack: LinkedList<Vec<Operation>> = LinkedList::new();
stack.push_back(Vec::new());

for token in tokens {
    let cur_operations = stack.back_mut().expect("Stack should not be empty!");
    match token {
        Token::MoveRight => {
            if let Some(Operation::MoveRight(x)) = cur_operations.last_mut() {
                *x += 1;
            } else {
                cur_operations.push(Operation::MoveRight(1))
            }
        }
        Token::MoveLeft => {
            if let Some(Operation::MoveLeft(x)) = cur_operations.last_mut() {
                *x += 1;
            } else {
                cur_operations.push(Operation::MoveLeft(1))
            }
        }
        Token::Increment => {
            if let Some(Operation::Increment(x)) = cur_operations.last_mut() {
                *x += 1;
            } else {
                cur_operations.push(Operation::Increment(1))
            }
        }
        Token::Decrement => {
            if let Some(Operation::Decrement(x)) = cur_operations.last_mut() {
                *x += 1;
            } else {
                cur_operations.push(Operation::Decrement(1))
            }
        }
        Token::Input => cur_operations.push(Operation::Input),
        Token::Output => cur_operations.push(Operation::Output),
        Token::LoopBegin => stack.push_back(Vec::new()),
        Token::LoopEnd => {
            let cur_operations = stack.pop_back().unwrap();
            let prev_operations = stack.back_mut().ok_or_else(|| {
                InterpreterError::ParseError(String::from("Unexpected end of loop"))
            })?;

            prev_operations.push(Operation::Loop(cur_operations))
        }
        _ => {
            return Err(InterpreterError::ParseError(format!(
                "Unexpected token {:?}",
                token
            )))
        }
    }
}

let operations = stack.pop_back().unwrap();
if !stack.is_empty() {
    Err(InterpreterError::ParseError(String::from(
        "Expected end of loop",
    )))
} else {
    Ok(operations)
}

As you can see, this code contains the optimization mentioned above: consecutive tokens are joined together. LoopBegin creates a new stack entry, and LoopEnd removes the previous stack entry and stores it as a Loop(operations).

Here you can see the whole method:

fn parse_source(source: &str) -> Result<Vec<Operation>, InterpreterError> {
    let tokens = source
        .chars()
        .map(|cur| match cur {
            '>' => Token::MoveRight,
            '<' => Token::MoveLeft,
            '+' => Token::Increment,
            '-' => Token::Decrement,
            '.' => Token::Output,
            ',' => Token::Input,
            '[' => Token::LoopBegin,
            ']' => Token::LoopEnd,
            _ => Token::Unknown,
        })
        .filter(|token| token.ne(&Token::Unknown));

    let mut stack: LinkedList<Vec<Operation>> = LinkedList::new();
    stack.push_back(Vec::new());

    for token in tokens {
        let cur_operations = stack.back_mut().expect("Stack should not be empty!");
        match token {
            Token::MoveRight => {
                if let Some(Operation::MoveRight(x)) = cur_operations.last_mut() {
                    *x += 1;
                } else {
                    cur_operations.push(Operation::MoveRight(1))
                }
            }
            Token::MoveLeft => {
                if let Some(Operation::MoveLeft(x)) = cur_operations.last_mut() {
                    *x += 1;
                } else {
                    cur_operations.push(Operation::MoveLeft(1))
                }
            }
            Token::Increment => {
                if let Some(Operation::Increment(x)) = cur_operations.last_mut() {
                    *x += 1;
                } else {
                    cur_operations.push(Operation::Increment(1))
                }
            }
            Token::Decrement => {
                if let Some(Operation::Decrement(x)) = cur_operations.last_mut() {
                    *x += 1;
                } else {
                    cur_operations.push(Operation::Decrement(1))
                }
            }
            Token::Input => cur_operations.push(Operation::Input),
            Token::Output => cur_operations.push(Operation::Output),
            Token::LoopBegin => stack.push_back(Vec::new()),
            Token::LoopEnd => {
                let cur_operations = stack.pop_back().unwrap();
                let prev_operations = stack.back_mut().ok_or_else(|| {
                    InterpreterError::ParseError(String::from("Unexpected end of loop"))
                })?;

                prev_operations.push(Operation::Loop(cur_operations))
            }
            _ => {
                return Err(InterpreterError::ParseError(format!(
                    "Unexpected token {:?}",
                    token
                )))
            }
        }
    }

    let operations = stack.pop_back().unwrap();
    if !stack.is_empty() {
        Err(InterpreterError::ParseError(String::from(
            "Expected end of loop",
        )))
    } else {
        Ok(operations)
    }
}

Program state

According to the language specification, it consists of the following:

  • Memory blocks of the size 30000. Each of type u8 (from 0 to 255).
  • Memory pointer. It starts at 0.
  • Access to the stdin and stdout.

Code for Program state:

const MEMORY_SIZE: usize = 30_000;

struct Program<'a> {
    memory: [u8; MEMORY_SIZE],
    pointer: usize,
    stdin: Box<dyn io::Read + 'a>,
    stdout: String,
}

The constructor should accept stdin and build the empty program state:

impl<'a> Program<'a> {
    fn new(stdin: Box<dyn io::Read + 'a>) -> Self {
        Self {
            memory: [0u8; MEMORY_SIZE],
            pointer: 0,
            stdin,
            stdout: String::new(),
        }
    }
}

The execution of each operation is straightforward. It’s defined inside the execute method:

impl<'a> Program<'a> {
    fn execute(mut self, operations: &[Operation]) -> Result<String, InterpreterError> {
        self.process_operations(operations)?;
        Ok(self.stdout)
    }

    fn process_operations(&mut self, operations: &[Operation]) -> Result<(), InterpreterError> {
        for operation in operations {
            match operation {
                Operation::MoveLeft(count) => {
                    self.pointer = self
                        .pointer
                        .checked_sub(*count)
                        .ok_or(InterpreterError::PointerOverflow)?;
                }
                Operation::MoveRight(count) => {
                    self.pointer = self
                        .pointer
                        .checked_add(*count)
                        .ok_or(InterpreterError::PointerOverflow)?;
                    if self.pointer >= self.memory.len() {
                        return Err(InterpreterError::PointerOverflow);
                    }
                }
                Operation::Increment(count) => {
                    self.memory[self.pointer] = self.memory[self.pointer]
                        .checked_add(*count)
                        .ok_or(InterpreterError::MemoryOverflow)?;
                }
                Operation::Decrement(count) => {
                    self.memory[self.pointer] = self.memory[self.pointer]
                        .checked_sub(*count)
                        .ok_or(InterpreterError::MemoryOverflow)?
                }
                Operation::Input => {
                    let mut buf = [0u8];
                    if let Err(err) = self.stdin.read(&mut buf) {
                        return Err(InterpreterError::StdinError(err));
                    }
                    self.memory[self.pointer] = buf[0] as u8;
                }
                Operation::Output => self.stdout.push(self.memory[self.pointer] as char),
                Operation::Loop(operations) => {
                    while self.memory[self.pointer] != 0 {
                        self.process_operations(operations)?;
                    }
                }
            }
        }

        Ok(())
    }
}

You can see that we use stdin as the input and stdout as the output. At the end of execution, we can return the stdout.

To define a public interface for our interpreter, let’s define a separate public method:

pub fn interpret<'a>(
    source: &'a str,
    stdin: Box<dyn io::Read + 'a>,
) -> Result<String, InterpreterError> {
    let operations = parse_source(source)?;
    let program = Program::new(stdin);
    program.execute(&operations)
}

To verify the implementation, we should use unit tests. You can refer to the unit tests inside the repository.

Benchmarking

To benchmark the implementation, we will use the criterion crate. Sample code is testing fibonacci program and is located in the file benches/interpreter_bench.rs:

use brainfuck_interpreter::interpret;
use criterion::{criterion_group, criterion_main, Criterion};

fn fibonacci() -> String {
    let source = "+++++++++++
    >+>>>>++++++++++++++++++++++++++++++++++++++++++++
    >++++++++++++++++++++++++++++++++<<<<<<[>[>>>>>>+>
    +<<<<<<<-]>>>>>>>[<<<<<<<+>>>>>>>-]<[>++++++++++[-
    <-[>>+>+<<<-]>>>[<<<+>>>-]+<[>[-]<[-]]>[<<[>>>+<<<
    -]>>[-]]<<]>>>[>>+>+<<<-]>>>[<<<+>>>-]+<[>[-]<[-]]
    >[<<+>>[-]]<<<<<<<]>>>>>[+++++++++++++++++++++++++
    +++++++++++++++++++++++.[-]]++++++++++<[->-<]>++++
    ++++++++++++++++++++++++++++++++++++++++++++.[-]<<
    <<<<<<<<<<[>>>+>+<<<<-]>>>>[<<<<+>>>>-]<-[>>.>.<<<
    [-]]<<[>>+>+<<<-]>>>[<<<+>>>-]<<[<+>-]>[<+>-]<<<-]";
    let input = "".as_bytes();

    let actual = interpret(source, Box::new(input));
    actual.expect("Program should work")
}

fn bench_fibonacci_interpreter(c: &mut Criterion) {
    c.bench_function("fib", |b| b.iter(|| fibonacci()));
}

criterion_group!(benches, bench_fibonacci_interpreter);
criterion_main!(benches);

Benchmark result in Ryzen 5800H:

fib                     time:   [169.89 us 170.94 us 172.74 us]                
Found 15 outliers among 100 measurements (15.00%)
  5 (5.00%) high mild
  10 (10.00%) high severe

Summary

This article described how to implement Brainfuck interpreter from scratch in Rust. We started by creating the workspace with the library inside, then defined custom types and the code. Lastly, we added unit tests and benchmarks for our implementation.

As a result, we received a library that can parse Brainfuck language with the provided input and produce the correct output. Its implementation is tested and optimized. You can modify it even more. Here are some ideas: Serialize operations in a separate file. That way, we can compile and execute the code separately to improve performance. Optimize the interpreter even more. You can use benchmarks to track your performance. Replace memory block’s type from u8 to u16. It will provide support for UTF8 characters. Currently, it supports only ASCII.

You can view the final code in my public repo.

In the following article, we will use this library to create a CLI program, web application, and AWS Lambda.