Build your own SQLite, Part 1: Listing tables

As developers, we use databases all the time. But how do they work? In this series, we'll try to answer that question by building our own SQLite-compatible database from scratch.
Source code examples will be provided in Rust, but you are encouraged to follow along using your language of choice, as we won't be relying on many language-specific features or libraries.
As an introduction, we'll implement the simplest version of the tables command,
which lists the names of all the tables in a database. While this looks simple, we'll
see that it requires us to make our first deep dive into the SQLite file format.
Building the test database
To keep things as simple as possible, let's build a minimalistic test database:
sqlite3 minimal_test.db
sqlite> create table table1(id integer);
sqlite> create table table2(id integer);
sqlite> .exit
This creates a database with two tables, table1 and table2, each with a single
column, id. We can verify this by running the tables command in the SQLite shell:
sqlite3 minimal_test.db
sqlite> .tables
table1 table2
sqlite> .exit
Bootstrapping the project
Let's start by creating a new Rust project. We'll use the cargo add to add our only dependency
for now, anyhow:
cargo new rsqlite
cd rsqlite
cargo add anyhow
The SQLite file format
SQLite databases are stored in a single file, the format of which is
documented in the SQLite File Format Specification.
The file is divided into pages, with each page having the same size: a power of 2, between
512 and 65536 bytes.
The first 100 bytes of the first page contain the database header, which includes
information such as the page size and the file format version. In this first part, we'll only
be interested in the page size.
Pages can be of different types, but for this first article, we'll only be interested in
table btree leaf pages, which store the actual table data.
Our first task will be to implement a Pager struct that reads and caches pages from the
database file. But before we do, we'll have to read the page size from the database header.
Let's start by defining our Header struct:
// src/page.rs
#[derive(Debug, Copy, Clone)]
pub struct DbHeader {
pub page_size: u32,
}
The header starts with the magic string SQLite format 3\0, followed by the page size
encoded as a big-endian 2-byte integer at offset 16. With this information, we can
implement a function that reads the header from a buffer:
// src/pager.rs
pub const HEADER_SIZE: usize = 100;
const HEADER_PREFIX: &[u8] = b"SQLite format 3\0";
const HEADER_PAGE_SIZE_OFFSET: usize = 16;
const PAGE_MAX_SIZE: u32 = 65536;
pub fn parse_header(buffer: &[u8]) -> anyhow::Result<page::DbHeader> {
if !buffer.starts_with(HEADER_PREFIX) {
let prefix = String::from_utf8_lossy(&buffer[..HEADER_PREFIX.len()]);
anyhow::bail!("invalid header prefix: {prefix}");
}
let page_size_raw = read_be_word_at(buffer, HEADER_PAGE_SIZE_OFFSET);
let page_size = match page_size_raw {
1 => PAGE_MAX_SIZE,
n if ((n & (n - 1)) == 0) && n != 0 => n as u32,
_ => anyhow::bail!("page size is not a power of 2: {}", page_size_raw),
};
Ok(page::Header { page_size })
}
fn read_be_word_at(input: &[u8], offset: usize) -> u16 {
u16::from_be_bytes(input[offset..offset + 2].try_into().unwrap())
}
Two things to note here:
- As the maximum page size cannot be represented as a 2-byte integer, a page size of 1 is use to represent the maximum page size.
- We use a somewhat convoluted expression to check if the page size is a power of 2.
The expression
n & (n - 1) == 0is true if and only ifnis a power of 2, except forn = 0.