Part 1 - SQLite Introduction and Setting up the REPL in fdb

7 minute read

Welcome to Part 1! Here, we’ll quickly talk about the broad overview of SQLite’s internals (and the database we’ll make!) and then setup a basic Read-Eval-Print-Loop (REPL) for the binary in our project fdb.

The general “flow” for a SQLite query looks like this:

SQLite Architecture

Paraphrasing part 1 of cstack’s series, the first step in executing some SQL against a SQLite database is to make the API call (in the SQLite library) with your SQL statement. That statement first hits the “frontend” of the engine which includes the tokenizer, parser, and code generator. Basically, this classic Computer Science compiler theory by taking the SQL string, turning it into a series of tokens, letting the parser make sense of those tokens (based on the semantics of the language), and then generate some bytecode to be ran in a virtual machine. Another option after the parser is to do what’s known as a tree-walking interpreter where you just walk the Abstract Syntax Tree that the parser creates. This is usually easier to conceptualize and code but it does typically affect performance since its really just a large recursive function that creates a massive stack frame along the way. We’ll follow the original design and generate bytecode.

Side note: I feel fairly comfortable with this step due to David Beasley’s excellent “Write a Compiler” course. Sadly, it’s no longer being offerred, but it was a great way to really dive into the depths of compiler design and theory. (Side note to the side note: I am not claiming I could write a tokenizer and parser for the entirety of the SQL grammer but the small subset we’ll do here, I think I can handle because of this course).

After we have bytecode generated, the next step is to hit the backend which consists of the virutal machine, B-tree, pager, and OS interface. The virtual machine will take the generated bytecode and perform operations on the B-tree data structures that make up the tables and indices of the database. When it comes down to it, it’s just a giant match (not switch in Rust!) statement that takes action based on the instruction.

Once the virtual machine executes those operations, it hits the B-tree. The B-tree is truly the defining aspect of SQLite. The B-tree has many nodes that are each a page in length and this layer can retrieve pages from disk or write to disk by issuing commands to the pager.

The pager receives commands from the B-tree to read or write pages of data to the database file. The main function here is to properly write these pages to the correct offset in the database file and keep a cache of recently accessed pages in memory to make access quick and decide when to write them back to disk.

The pager then interfaces with the OS interface for the actual reading and writing of files. SQLite also calls this the “Virtual File System” or VFS to provide a generic API for opening, reading, writing, and closing files (and more!). The OS interface then implements the actual mechanisms for the OS that the database is running on (like Windows or Unix). cstack’s original tutorial skipped over this portion to not have to support multiple platforms but luckily in this tutorial, if we just leverage the Rust standard library std::fs module, we get cross-platform support for free!

Making the REPL for fdb

cstack’s original post details the basic REPL which requires a decent more amount of work than in Rust. cstack clearly appreciates good design upfront and created a C struct for an input buffer to reuse an allocation. This is where C is a fantastic language but clearly dated and doesn’t have a lot of the niceties of a modern language like Rust. We’ll go on our own here to make the simple REPL which just accepts input from the user and only recognizes the ‘.exit’ command.

Taking from cstack’s tutorial, this is the basic usage of sqlite3 from the command line:

$ sqlite3
SQLite version 3.34.1 2021-01-20 14:10:07
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> create table users (id int, username varchar(255), email varchar(255));
sqlite> .tables
users
sqlite> .exit
$

We’ll get to the point where we have something similar at the end of this post (minus the create table line).

Of course there are great crates out there to build elegant REPL’s such as rustyline (which gives you tab completion and history!) but we’ll do it the hard way in this tutorial!

We want to read from stdin line-by-line, so we’ll use the standard library’s BufRead trait to expose read_line() for easy reading. Of note, you first have to get a handle to stdin by locking it from other threads:

use std::io::{self, stdin, BufRead, Write};

fn main() -> Result<(), io::Error> {
    let mut stdin = stdin().lock();
    let mut buffer = String::new();

    loop {
        // print out the prompt and flush the buffer to ensure it prints
        print!("fdb > ");
        io::stdout().flush()?;

        // clear the String 
        buffer.clear();
        // read a line from stdin
        let _ = stdin.read_line(&mut buffer);
        // print it back out
        println!("{}", buffer.trim());
    }

    Ok(())
}

This is a simple REPL that does nothing but print the input back to the user. I added comments in the blog post to explain some things in depth if you are newer to Rust. One note: I decided for main to return a Result<(), io::Error> which basically means we can propagate Error’s up to main and print the message to the terminal on an unrecoverable Error. Other than that, the code should be self-explanatory here.

Instead of just reading a line and printing it, now we want to take some action based on that. In SQLite, there are two main ways to interact. There are “.” commands which are non-SQL “meta” commands to get information from the database. Of these, .exit is probably the easiest, it just exits the process. The other commands are standard SQL commands that we’ll tackle later. The SQL statements will be the bulk of our focus with the tokenizer, parser, etc. For now, we’ll just support .exit to get the REPL started.

In Rust, the easiest approach will be to use the read_line method on the BufRead trait which has a signature of: fn read_line(&mut self, buf: &mut String) -> Result<usize>. We’ll have to handle the Result. Ok(0) means EOF (i.e. the input has been closed from a Ctrl-C or Ctrl-D). Ok(n) means n bytes were read into the buffer. This will include the newline at the end of the line, so we’ll use trim() to remove all leading and trailing whitespace. Any Err on the Result will be propagated through and exit the process.

Then, we’ll match on the input and select our action based on that. To keep main minimal (a personal goal of mine), we’ll write a helper function to perform the command matching.

We’ll start with the modified minimal main loop that will change after buffer.clear():

        // -- SNIP --
        match stdin.read_line(&mut buffer) {
            Ok(0) => {
                // EOF; just return (no error)
                return Ok(());
            }
            Ok(_n) => {
                // process the command
                process_command(buffer.trim());
            }
            Err(e) => {
                // print error and return it
                eprintln!("Error while reading input: {e:?}");
                return Err(e);
            }
        }

Then, with process_command, we will just convert the input to all lowercase (to make matching case insensitive) and only take action based on the command. (Here, we could have returned something like a Result and taken action in main but instead we’ll just handle everything here and treat process_command as almost like an inline function).

fn process_command(cmd: &str) {
    let cmd = cmd.to_lowercase();
    match cmd.as_str() {
        ".exit" => {
            // Exit with success
            eprintln!("Goodbye!");
            std::process::exit(0);
        }
        _ => {
            println!("Unrecognized command: {cmd}");
        }
    }
}

Pretty simple. We make another allocation (sadly) to make the command lowercase and then match on the &str of that lowercase string. If the buffer has .exit, we exit the application. Everything else we treat as an unrecognized command and print that to the user.

And that’s it! 41 lines and we have a basic REPL for fdb. Here’s the entire code at this moment:

use std::io::{self, stdin, BufRead, Write};

fn main() -> Result<(), io::Error> {
    let mut stdin = stdin().lock();
    let mut buffer = String::new();

    loop {
        print!("fdb > ");
        io::stdout().flush()?;

        buffer.clear();

        match stdin.read_line(&mut buffer) {
            Ok(0) => {
                // EOF; just return (no error)
                return Ok(());
            }
            Ok(_n) => {
                process_command(buffer.trim());
            }
            Err(e) => {
                eprintln!("Error while reading input: {e:?}");
                return Err(e);
            }
        }
    }
}

fn process_command(cmd: &str) {
    let cmd = cmd.to_lowercase();
    match cmd.as_str() {
        ".exit" => {
            // Exit with success
            eprintln!("Goodbye!");
            std::process::exit(0);
        }
        _ => {
            println!("Unrecognized command: {cmd}");
        }
    }
}

That’s it for now! Next time, we’ll work on the SQL compiler and virtual machine.

Series

This section contains the order of the series for easier navigation.

Previous Article	Next Article
Intro	Coming Soon!

Share on

X Facebook LinkedIn Bluesky

Carter Burn

Part 1 - SQLite Introduction and Setting up the REPL in fdb

Making the REPL for fdb

Series

Share on

You May Also Enjoy

Roundup - April 9th

New Series: Let’s Build a Simple Database…in Rust

Hello World