Brainfuck is a highly minimalistic programming language created in 1993 by Swiss physics student Urban Müller. The language consists of only eight simple commands, a data pointer and an instruction pointer. In this article, we will implement a custom Brainfuck interpreter in Rust.
Workspace setup
Let’s create a workspace for our project:
- Create a folder with your workspace name, for example `brainfuck`.
- Add a `Cargo.toml` file with this content:
[workspace]
members = [
]
- Create a brainfuck interpreter library:
cargo new --lib brainfuck_interpreter
- Modify your outer `Cargo.toml`. You should add the new library name to the `members` list:
[workspace]
members = [
    "brainfuck_interpreter",
]
Now, we are ready to work with the interpreter.
Interpreter library setup
Change your directory to `brainfuck_interpreter`. From now on, we will work inside this folder.
Modify the `Cargo.toml` file with these dependencies:
[dependencies]
thiserror = "1.0"

[dev-dependencies]
criterion = "0.3"

[[bench]]
name = "interpreter_bench"
harness = false
`thiserror` is a common library that simplifies working with custom error types. `criterion` adds the ability to run benchmarks, which lets us catch regressions or improvements in the program.
Interpreter implementation
Many interpreters can be created by following these steps:
- Parsing the source
- Building an Abstract Syntax Tree (AST)
- Optimizing the AST
- Executing the AST
In our case, the Brainfuck language is simple enough to be executed command by command. We can represent the commands as an array and run them linearly.
Source parsing
Let’s start by implementing source parsing.
Custom Types
We should create an internal enum to represent each command. Let's call it `Token`:
#[derive(PartialEq, Eq, Debug)]
enum Token {
    MoveRight,
    MoveLeft,
    Increment,
    Decrement,
    Output,
    Input,
    LoopBegin,
    LoopEnd,
    Unknown,
}
Can you notice an issue with this enum? Its main drawback is that loops are hard to handle. We can transform the `Token` enum into something more high-level, like `Operation`:
#[derive(PartialEq, Eq, Debug)]
enum Operation {
    MoveRight(usize),
    MoveLeft(usize),
    Increment(u8),
    Decrement(u8),
    Output,
    Input,
    Loop(Vec<Operation>),
}
You can see that `LoopBegin` and `LoopEnd` were replaced with `Loop(Vec<Operation>)`. Such a program is easier to execute. We can also optimize the AST by replacing runs of consecutive tokens with a single one. For example, `>>>` can be stored as `MoveRight(3)`.
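As a standalone sketch of that run-length idea (using plain chars and tuples instead of the article's `Token` and `Operation` types, so it compiles on its own), consecutive repeatable commands collapse into counts:

```rust
// Simplified demo: collapse runs of repeatable Brainfuck commands
// into (command, count) pairs; all non-command chars are comments.
fn run_length(source: &str) -> Vec<(char, usize)> {
    let mut out: Vec<(char, usize)> = Vec::new();
    for c in source.chars().filter(|c| "><+-.,[]".contains(*c)) {
        match out.last_mut() {
            // Only >, <, + and - benefit from merging.
            Some((prev, n)) if *prev == c && "><+-".contains(c) => *n += 1,
            _ => out.push((c, 1)),
        }
    }
    out
}

fn main() {
    // ">>>" merges into a single entry with count 3.
    assert_eq!(run_length(">>>+"), vec![('>', 3), ('+', 1)]);
    println!("{:?}", run_length(">>>+"));
}
```

The same merging logic appears in the real parser below, expressed through `Operation::MoveRight(usize)` and friends.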
Lastly, we should define a custom error type to use throughout our program:
use std::io;

use thiserror::Error;

#[derive(Debug, Error)]
pub enum InterpreterError {
    #[error("Error parsing source: `{0}`")]
    ParseError(String),
    #[error("Memory overflow")]
    MemoryOverflow,
    #[error("Pointer is out of memory bounds")]
    PointerOverflow,
    #[error("Error reading from stdin: `{0}`")]
    StdinError(io::Error),
}
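To see what `thiserror` saves us, here is roughly the hand-written equivalent it generates for a couple of variants (a simplified, hypothetical `DemoError`, not the article's full enum):

```rust
use std::fmt;

// Roughly what thiserror's #[error("…")] attribute expands to:
// a hand-written Display impl plus a std::error::Error impl.
#[derive(Debug)]
enum DemoError {
    MemoryOverflow,
    ParseError(String),
}

impl fmt::Display for DemoError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DemoError::MemoryOverflow => write!(f, "Memory overflow"),
            DemoError::ParseError(s) => write!(f, "Error parsing source: `{s}`"),
        }
    }
}

impl std::error::Error for DemoError {}

fn main() {
    // Display gives us to_string() for free.
    assert_eq!(DemoError::MemoryOverflow.to_string(), "Memory overflow");
    println!("{}", DemoError::ParseError("bad input".into()));
}
```

With `thiserror`, the attribute on each variant replaces all of this boilerplate.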
Parsing implementation
Let’s define a method that will:
- Parse `&str` into `Vec<Token>`
- Parse `Vec<Token>` into `Vec<Operation>`
We can do the first step quickly with this code:
let tokens = source
    .chars()
    .map(|cur| match cur {
        '>' => Token::MoveRight,
        '<' => Token::MoveLeft,
        '+' => Token::Increment,
        '-' => Token::Decrement,
        '.' => Token::Output,
        ',' => Token::Input,
        '[' => Token::LoopBegin,
        ']' => Token::LoopEnd,
        _ => Token::Unknown,
    })
    .filter(|token| token.ne(&Token::Unknown));
You can see that we convert chars to their associated tokens, mapping every other char to `Token::Unknown` and filtering it out.
Conversion from `Token` to `Operation` is more complex. We must track the previous `LoopBegin` token and all the commands in between. One of the simplest approaches is to use a stack. Here is the implementation:
let mut stack: LinkedList<Vec<Operation>> = LinkedList::new();
stack.push_back(Vec::new());
for token in tokens {
    let cur_operations = stack.back_mut().expect("Stack should not be empty!");
    match token {
        Token::MoveRight => {
            if let Some(Operation::MoveRight(x)) = cur_operations.last_mut() {
                *x += 1;
            } else {
                cur_operations.push(Operation::MoveRight(1))
            }
        }
        Token::MoveLeft => {
            if let Some(Operation::MoveLeft(x)) = cur_operations.last_mut() {
                *x += 1;
            } else {
                cur_operations.push(Operation::MoveLeft(1))
            }
        }
        Token::Increment => {
            if let Some(Operation::Increment(x)) = cur_operations.last_mut() {
                *x += 1;
            } else {
                cur_operations.push(Operation::Increment(1))
            }
        }
        Token::Decrement => {
            if let Some(Operation::Decrement(x)) = cur_operations.last_mut() {
                *x += 1;
            } else {
                cur_operations.push(Operation::Decrement(1))
            }
        }
        Token::Input => cur_operations.push(Operation::Input),
        Token::Output => cur_operations.push(Operation::Output),
        Token::LoopBegin => stack.push_back(Vec::new()),
        Token::LoopEnd => {
            let cur_operations = stack.pop_back().unwrap();
            let prev_operations = stack.back_mut().ok_or_else(|| {
                InterpreterError::ParseError(String::from("Unexpected end of loop"))
            })?;
            prev_operations.push(Operation::Loop(cur_operations))
        }
        _ => {
            return Err(InterpreterError::ParseError(format!(
                "Unexpected token {:?}",
                token
            )))
        }
    }
}
let operations = stack.pop_back().unwrap();
if !stack.is_empty() {
    Err(InterpreterError::ParseError(String::from(
        "Expected end of loop",
    )))
} else {
    Ok(operations)
}
As you can see, this code contains the optimization mentioned above: consecutive tokens are joined together. `LoopBegin` creates a new stack entry, and `LoopEnd` pops the current entry and stores it in its parent as a `Loop(operations)`.
Here you can see the whole method:
use std::collections::LinkedList;

fn parse_source(source: &str) -> Result<Vec<Operation>, InterpreterError> {
    let tokens = source
        .chars()
        .map(|cur| match cur {
            '>' => Token::MoveRight,
            '<' => Token::MoveLeft,
            '+' => Token::Increment,
            '-' => Token::Decrement,
            '.' => Token::Output,
            ',' => Token::Input,
            '[' => Token::LoopBegin,
            ']' => Token::LoopEnd,
            _ => Token::Unknown,
        })
        .filter(|token| token.ne(&Token::Unknown));
    let mut stack: LinkedList<Vec<Operation>> = LinkedList::new();
    stack.push_back(Vec::new());
    for token in tokens {
        let cur_operations = stack.back_mut().expect("Stack should not be empty!");
        match token {
            Token::MoveRight => {
                if let Some(Operation::MoveRight(x)) = cur_operations.last_mut() {
                    *x += 1;
                } else {
                    cur_operations.push(Operation::MoveRight(1))
                }
            }
            Token::MoveLeft => {
                if let Some(Operation::MoveLeft(x)) = cur_operations.last_mut() {
                    *x += 1;
                } else {
                    cur_operations.push(Operation::MoveLeft(1))
                }
            }
            Token::Increment => {
                if let Some(Operation::Increment(x)) = cur_operations.last_mut() {
                    *x += 1;
                } else {
                    cur_operations.push(Operation::Increment(1))
                }
            }
            Token::Decrement => {
                if let Some(Operation::Decrement(x)) = cur_operations.last_mut() {
                    *x += 1;
                } else {
                    cur_operations.push(Operation::Decrement(1))
                }
            }
            Token::Input => cur_operations.push(Operation::Input),
            Token::Output => cur_operations.push(Operation::Output),
            Token::LoopBegin => stack.push_back(Vec::new()),
            Token::LoopEnd => {
                let cur_operations = stack.pop_back().unwrap();
                let prev_operations = stack.back_mut().ok_or_else(|| {
                    InterpreterError::ParseError(String::from("Unexpected end of loop"))
                })?;
                prev_operations.push(Operation::Loop(cur_operations))
            }
            _ => {
                return Err(InterpreterError::ParseError(format!(
                    "Unexpected token {:?}",
                    token
                )))
            }
        }
    }
    let operations = stack.pop_back().unwrap();
    if !stack.is_empty() {
        Err(InterpreterError::ParseError(String::from(
            "Expected end of loop",
        )))
    } else {
        Ok(operations)
    }
}
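The stack mechanics are easier to see in isolation. Here is a simplified, self-contained sketch that uses raw chars and a `Vec` in place of the article's `Token` enum and `LinkedList`:

```rust
// Simplified sketch of the stack-based loop grouping:
// '[' pushes a fresh list; ']' pops it and nests it in its parent.
#[derive(Debug, PartialEq)]
enum Op {
    Cmd(char),
    Loop(Vec<Op>),
}

fn group_loops(src: &str) -> Result<Vec<Op>, String> {
    let mut stack: Vec<Vec<Op>> = vec![Vec::new()];
    for c in src.chars() {
        match c {
            '[' => stack.push(Vec::new()),
            ']' => {
                let body = stack.pop().unwrap();
                stack
                    .last_mut()
                    .ok_or_else(|| "Unexpected end of loop".to_string())?
                    .push(Op::Loop(body));
            }
            '>' | '<' | '+' | '-' | '.' | ',' => stack.last_mut().unwrap().push(Op::Cmd(c)),
            _ => {} // everything else is a comment
        }
    }
    // More than one entry left means an unclosed '['.
    if stack.len() != 1 {
        return Err("Expected end of loop".to_string());
    }
    Ok(stack.pop().unwrap())
}

fn main() {
    let ops = group_loops("+[-]").unwrap();
    assert_eq!(ops, vec![Op::Cmd('+'), Op::Loop(vec![Op::Cmd('-')])]);
    assert!(group_loops("[").is_err()); // unclosed loop
    assert!(group_loops("]").is_err()); // unexpected close
    println!("{:?}", ops);
}
```

The real `parse_source` works the same way, with run-length merging layered on top.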
Program state
According to the language specification, the program state consists of the following:
- A memory block of 30,000 cells, each of type `u8` (from 0 to 255).
- A memory pointer, which starts at `0`.
- Access to `stdin` and `stdout`.
Code for the `Program` state:
const MEMORY_SIZE: usize = 30_000;

struct Program<'a> {
    memory: [u8; MEMORY_SIZE],
    pointer: usize,
    stdin: Box<dyn io::Read + 'a>,
    stdout: String,
}
The constructor should accept `stdin` and build the empty program state:
impl<'a> Program<'a> {
    fn new(stdin: Box<dyn io::Read + 'a>) -> Self {
        Self {
            memory: [0u8; MEMORY_SIZE],
            pointer: 0,
            stdin,
            stdout: String::new(),
        }
    }
}
The execution of each operation is straightforward. It's defined inside the `execute` method:
impl<'a> Program<'a> {
    fn execute(mut self, operations: &[Operation]) -> Result<String, InterpreterError> {
        self.process_operations(operations)?;
        Ok(self.stdout)
    }

    fn process_operations(&mut self, operations: &[Operation]) -> Result<(), InterpreterError> {
        for operation in operations {
            match operation {
                Operation::MoveLeft(count) => {
                    self.pointer = self
                        .pointer
                        .checked_sub(*count)
                        .ok_or(InterpreterError::PointerOverflow)?;
                }
                Operation::MoveRight(count) => {
                    self.pointer = self
                        .pointer
                        .checked_add(*count)
                        .ok_or(InterpreterError::PointerOverflow)?;
                    if self.pointer >= self.memory.len() {
                        return Err(InterpreterError::PointerOverflow);
                    }
                }
                Operation::Increment(count) => {
                    self.memory[self.pointer] = self.memory[self.pointer]
                        .checked_add(*count)
                        .ok_or(InterpreterError::MemoryOverflow)?;
                }
                Operation::Decrement(count) => {
                    self.memory[self.pointer] = self.memory[self.pointer]
                        .checked_sub(*count)
                        .ok_or(InterpreterError::MemoryOverflow)?
                }
                Operation::Input => {
                    let mut buf = [0u8];
                    if let Err(err) = self.stdin.read(&mut buf) {
                        return Err(InterpreterError::StdinError(err));
                    }
                    self.memory[self.pointer] = buf[0];
                }
                Operation::Output => self.stdout.push(self.memory[self.pointer] as char),
                Operation::Loop(operations) => {
                    while self.memory[self.pointer] != 0 {
                        self.process_operations(operations)?;
                    }
                }
            }
        }
        Ok(())
    }
}
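The error handling above leans on Rust's checked arithmetic, which returns `None` instead of wrapping around. A tiny standalone illustration:

```rust
fn main() {
    // checked_sub/checked_add return None on overflow, which the
    // interpreter maps to PointerOverflow / MemoryOverflow errors.
    let pointer: usize = 0;
    assert_eq!(pointer.checked_sub(1), None); // moving left of cell 0 fails
    assert_eq!(pointer.checked_add(3), Some(3)); // moving right succeeds

    let cell: u8 = 255;
    assert_eq!(cell.checked_add(1), None); // incrementing a full u8 cell fails
    assert_eq!(cell.checked_sub(1), Some(254));
    println!("checked arithmetic ok");
}
```

Note that `MoveRight` needs the extra `>= self.memory.len()` bounds check because the pointer can exceed 30,000 long before `usize` itself overflows.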
You can see that we use `stdin` as the input and accumulate the output in `stdout`. At the end of execution, we return the accumulated `stdout`.
To define a public interface for our interpreter, let's add a separate public method:
pub fn interpret<'a>(
    source: &'a str,
    stdin: Box<dyn io::Read + 'a>,
) -> Result<String, InterpreterError> {
    let operations = parse_source(source)?;
    let program = Program::new(stdin);
    program.execute(&operations)
}
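The `Box<dyn io::Read>` parameter means callers are not limited to the real stdin: any reader works, which is convenient for tests. A standalone sketch (the `read_one_byte` helper is hypothetical, just to exercise the trait-object pattern):

```rust
use std::io::{self, Read};

// Hypothetical helper mirroring how the interpreter consumes its input:
// read a single byte from any boxed reader.
fn read_one_byte(mut input: Box<dyn Read>) -> io::Result<u8> {
    let mut buf = [0u8];
    input.read(&mut buf)?;
    Ok(buf[0])
}

fn main() -> io::Result<()> {
    // An in-memory reader (a byte slice), handy for unit tests:
    assert_eq!(read_one_byte(Box::new("A".as_bytes()))?, b'A');
    // io::empty() yields zero bytes, so the buffer keeps its initial 0:
    assert_eq!(read_one_byte(Box::new(io::empty()))?, 0);
    Ok(())
}
```

In production code you would pass `Box::new(io::stdin())`; in tests, a string's bytes.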
To verify the implementation, we should use unit tests. You can refer to the unit tests inside the repository.
Benchmarking
To benchmark the implementation, we will use the criterion
crate. Sample code is testing fibonacci
program and is located in the file benches/interpreter_bench.rs
:
use brainfuck_interpreter::interpret;
use criterion::{criterion_group, criterion_main, Criterion};

fn fibonacci() -> String {
    let source = "+++++++++++
>+>>>>++++++++++++++++++++++++++++++++++++++++++++
>++++++++++++++++++++++++++++++++<<<<<<[>[>>>>>>+>
+<<<<<<<-]>>>>>>>[<<<<<<<+>>>>>>>-]<[>++++++++++[-
<-[>>+>+<<<-]>>>[<<<+>>>-]+<[>[-]<[-]]>[<<[>>>+<<<
-]>>[-]]<<]>>>[>>+>+<<<-]>>>[<<<+>>>-]+<[>[-]<[-]]
>[<<+>>[-]]<<<<<<<]>>>>>[+++++++++++++++++++++++++
+++++++++++++++++++++++.[-]]++++++++++<[->-<]>++++
++++++++++++++++++++++++++++++++++++++++++++.[-]<<
<<<<<<<<<<[>>>+>+<<<<-]>>>>[<<<<+>>>>-]<-[>>.>.<<<
[-]]<<[>>+>+<<<-]>>>[<<<+>>>-]<<[<+>-]>[<+>-]<<<-]";
    let input = "".as_bytes();
    let actual = interpret(source, Box::new(input));
    actual.expect("Program should work")
}

fn bench_fibonacci_interpreter(c: &mut Criterion) {
    c.bench_function("fib", |b| b.iter(|| fibonacci()));
}

criterion_group!(benches, bench_fibonacci_interpreter);
criterion_main!(benches);
Benchmark results on a Ryzen 5800H:
fib    time:   [169.89 us 170.94 us 172.74 us]
Found 15 outliers among 100 measurements (15.00%)
  5 (5.00%) high mild
  10 (10.00%) high severe
Summary
This article described how to implement a Brainfuck interpreter from scratch in Rust. We started by creating a workspace with a library inside, then defined the custom types and implemented the parsing and execution code. Lastly, we added unit tests and benchmarks for our implementation.
As a result, we have a library that can parse Brainfuck source, run it with the provided input, and produce the correct output. The implementation is tested and optimized. You can extend it even further. Here are some ideas:
- Serialize operations into a separate file. That way, we can compile and execute the code separately to improve performance.
- Optimize the interpreter even more. You can use the benchmarks to track your progress.
- Widen the memory cell type from `u8` to `u16`. This would allow cell values beyond the ASCII range; currently, only ASCII output is supported.
You can view the final code in my public repo.
In the following article, we will use this library to create a CLI program, a web application, and an AWS Lambda function.