Introduction

Welcome to ic-rusqlite, a convenience wrapper around the rusqlite crate for integrating SQLite databases into Internet Computer (IC) canisters.

SQLite is a lightweight, embedded database engine. ic-rusqlite lets you use the SQLite engine in your Rust-based IC projects.

This guide explains how to install, configure, and optimize ic-rusqlite, including practical examples, performance considerations, and best practices.

Installation

Prerequisites

  • Rust

  • DFX

  • Install wasi2ic: cargo install wasi2ic

  • Install the WASI target: rustup target add wasm32-wasip1

Setting Up the Environment for SQLite Compilation from Source

If you intend to compile SQLite from source, you will need to install the WASI-SDK. You can set up your build environment via this script:

curl -fsSL https://raw.githubusercontent.com/wasm-forge/ic-rusqlite/main/prepare.sh | sh

The script will:

  • Download the WASI-SDK (a WASI-oriented clang toolchain).
  • Define variables in your .bashrc:
export WASI_SDK_PATH=<path to wasi-sdk>
export PATH=$WASI_SDK_PATH/bin:$PATH

Creating Your SQLite Canister

Configure your canister

To integrate SQLite into your backend Rust Canister project, you will need to add the ic-rusqlite dependency and adjust the build process.

The first part is easy:

cargo add ic-rusqlite

The second part requires you to update dfx.json: specify the path to the Wasm binary, set the canister type to custom, and add custom build steps that enforce compilation to the wasm32-wasip1 target. Finally, wasi2ic converts the WASI binary into a Wasm module that can run on the Internet Computer.

Example dfx.json:

{
  "canisters": {
    "backend": {
      "candid": "can.did",
      "package": "backend",
      "build": [
        "cargo build --release --target wasm32-wasip1",
        "wasi2ic target/wasm32-wasip1/release/backend.wasm target/wasm32-wasip1/release/backend_nowasi.wasm"
      ],
      "wasm": "target/wasm32-wasip1/release/backend_nowasi.wasm",
      "type": "custom",
      "metadata": [
        {
          "name": "candid:service"
        }
      ]
    }
  },
  "dfx": "0.28.0",
  "version": 1
}

Accessing the Database

The database connection is established with the first call to ic_rusqlite::with_connection(...), so there is no need to explicitly initialize or create a database.
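For example, a (hypothetical) query method like the following works without any prior setup; the connection, and the database file itself, are created on first use:

use ic_rusqlite::with_connection;

#[ic_cdk::query]
fn sqlite_version() -> String {
    with_connection(|conn| {
        // the first call transparently opens (and, if needed, creates) the database
        conn.query_row("SELECT sqlite_version()", [], |row| row.get(0))
            .unwrap()
    })
}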

Internally, ic-rusqlite uses stable structures with the memory manager. The virtual memories 101..119 are reserved for the file system.

Virtual memory ID 120 is the default storage memory for the database; the database file is /DB/main.db. These are the default settings, but they can be changed via the set_connection_config(...) function.

Note

The ability to associate a file with a virtual memory is a special feature of stable-fs. This allows you to create dedicated files with fast I/O access.

Using File System

ic-rusqlite is compiled to the WASI target and then processed by the wasi2ic workflow, embedding ic-wasi-polyfill and stable-fs into the output binary. This enables the use of standard Rust I/O APIs for file operations.

Note

By default the main database file is stored in the root folder: /DB/main.db, and there are a few additional helper files that can be created by the database engine.
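Since standard Rust I/O works, auxiliary data can be kept in regular files next to the database. A minimal sketch (the /notes path and the method names are hypothetical):

use std::fs;

#[ic_cdk::update]
fn save_note(text: String) {
    // regular std::fs calls are routed to the stable-fs file system
    fs::create_dir_all("/notes").expect("failed to create the directory");
    fs::write("/notes/note.txt", text).expect("failed to write the file");
}

#[ic_cdk::query]
fn load_note() -> String {
    fs::read_to_string("/notes/note.txt").unwrap_or_default()
}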

Using Stable Structures

You can freely create other stable structures for your extra storage needs, just make sure to use a virtual memory ID that is not yet occupied.

Use Memory Manager

Make sure you are using the memory manager; otherwise you will destroy the database and the file system stored in stable memory.
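A minimal sketch of the usual ic-stable-structures pattern; memory ID 0 is an arbitrary choice outside the 101..120 range used by ic-rusqlite:

use std::cell::RefCell;

use ic_stable_structures::memory_manager::{MemoryId, MemoryManager, VirtualMemory};
use ic_stable_structures::{DefaultMemoryImpl, StableBTreeMap};

type Memory = VirtualMemory<DefaultMemoryImpl>;

thread_local! {
    static MEMORY_MANAGER: RefCell<MemoryManager<DefaultMemoryImpl>> =
        RefCell::new(MemoryManager::init(DefaultMemoryImpl::default()));

    // ID 0 does not collide with the virtual memories reserved by ic-rusqlite
    static SETTINGS: RefCell<StableBTreeMap<String, String, Memory>> = RefCell::new(
        StableBTreeMap::init(MEMORY_MANAGER.with(|m| m.borrow().get(MemoryId::new(0)))),
    );
}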

Basic Canister Example

A very basic canister backend for storing persons can look as follows:

use ic_rusqlite::with_connection;

#[ic_cdk::update]
fn add(name: String, data: String, age: u32) {
    with_connection(|conn| {
        // execute statement with parameters
        conn.execute(
            "INSERT INTO person (name, data, age) VALUES (?1, ?2, ?3)",
            (&name, &data, age),
        )
        .unwrap();
    })
}

#[ic_cdk::query]
fn list() -> Vec<(u64, String, String, u32)> {
    with_connection(|conn| {
        // prepare SQL statement
        let mut stmt = conn
            .prepare("SELECT id, name, data, age FROM person")
            .unwrap();

        // execute statement and map results into an iterator
        let iter = stmt
            .query_map([], |row| {
                Ok((row.get(0)?, row.get(1)?, row.get(2)?, row.get(3)?))
            })
            .unwrap();

        iter.map(|r| r.unwrap()).collect()

    })
}

#[ic_cdk::init]
fn init() {
    with_connection(|conn| {
        // create the initial tables on the first deployment

        conn.execute(
            "CREATE TABLE IF NOT EXISTS person (
                id    INTEGER PRIMARY KEY,
                name  TEXT NOT NULL,
                data  TEXT,
                age   INTEGER
            )",
            (),
        )
        .unwrap();

    })

}

For a complete picture of what you can accomplish with a Connection, see the rusqlite documentation.

Creating a custom database connection

You can create a customized database connection if you want to store the database in another file, mount it onto another virtual memory, or customize its initial pragma settings:

use ic_cdk::{init, post_upgrade, pre_upgrade};

#[init]
fn init() {
    // default configuration
    let mut config = ic_rusqlite::ConnectionConfig::new();

    // optionally, create a custom connection to a database different from the default one
    config.db_file_name = "/my_custom_path/my_base.db".to_string(); // some custom path to the database
    config.db_file_mount_id = Some(150); // store database in the virtual memory ID 150
    config
        .pragma_settings
        .insert("cache_size".to_string(), "10000".to_string()); // modify the default pragma settings

    ic_rusqlite::set_connection_config(config);
    //...
    // The actual connection is not needed here; it will be established automatically by the next "with_connection" call.
}

#[pre_upgrade]
fn pre_upgrade() {
    // closing connection explicitly unlocks the database before canister upgrade
    ic_rusqlite::close_connection();
}

#[post_upgrade]
fn post_upgrade() {
    // same initialization
    init();
}

In this example, we explicitly close the database connection in the pre_upgrade hook. This ensures that the database lock is released, preventing any issues when creating a new connection after the canister upgrade.

Note

Since the canister operates in single-user mode, there is no risk of conflicts from concurrent connections. Therefore, when establishing a database connection, ic-rusqlite will attempt to delete the lock file if one is found. This means that even if you do not explicitly close the connection, you will not be locked out of the database after an upgrade. The situation is different, however, if the database remains locked and, after a canister upgrade, you attempt to use it in an ATTACH DATABASE query.

ic-rusqlite Users-Orders Benchmark

This project benchmarks the ic-rusqlite dependency in a benchmark canister. The original benchmark project can be found in the benchmarks repository.

Pragma settings

This shows the current benchmark results for database performance, based on a database file that writes directly to stable memory (minimizing chunked-storage overheads).

The following pragma settings were used:

journal_mode - PERSIST - persist the journal file (faster than deleting the file every time). Setting it to OFF works faster still, but disallows atomic COMMIT/ROLLBACK.

synchronous - NORMAL - a reasonable value for data safety and performance.

page_size - 4096 - a reasonable default value.

locking_mode - EXCLUSIVE - exclusive mode is faster because we avoid locking and unlocking the database for each query.

temp_store - MEMORY - keeps temporary data in memory; at the moment this is necessary to avoid an SQLite crash during complex queries.

cache_size - 1000000 - gives a significant performance boost at the expense of the canister memory used (it tries to keep the whole database in memory, thus reducing the number of read operations).

Database structure

CREATE TABLE users (
  user_id INTEGER PRIMARY KEY AUTOINCREMENT,
  username TEXT NOT NULL,
  email TEXT NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE orders (
  order_id INTEGER PRIMARY KEY AUTOINCREMENT,
  user_id INTEGER NOT NULL,
  amount REAL NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (user_id) REFERENCES users(user_id)
);

Benchmark results

Test | Cycles cost
Create 100 000 users (cached INSERT query with parameters, executed 100 000 times) | 1.94 B
Create 1M orders (each refers to one of the users, no extra indexes present) | 20.05 B
Create indexes when there are already 1M orders in the table: CREATE INDEX IF NOT EXISTS idx_users_email ON users(email); CREATE INDEX IF NOT EXISTS idx_orders_user_id ON orders(user_id); | 6.27 B
Make a join selection (100K users, 1M orders): SELECT u.user_id, u.username, o.order_id, o.amount FROM users u JOIN orders o ON u.user_id = o.user_id WHERE u.user_id < 1000 ORDER BY o.created_at DESC; | 202.28 M
Select using LIKE on an indexed field: SELECT * FROM users WHERE email LIKE 'user%' | 781.46 M
Create 100 extra orders after there were already 1M orders and the field indexes were created | 9.03 M
Remove 1000 orders (all orders of the first 100 users): DELETE FROM orders WHERE user_id <= 100 | 23.14 M
Create 1M orders after the indexes were created | 29.88 B
Delete 100 000 orders with transaction rollback: BEGIN TRANSACTION; DELETE FROM orders WHERE order_id > 900000; ROLLBACK | 1.53 B

Reproducing the Benchmarks

Check out the Benchmark repository sql-users-orders if you wish to repeat or examine the experiment.

Testing SQLite on a Large Dataset

To build a large database, the sample Chinook database from the SQLite tutorial was used.

(Figure: Chinook Entity Relationship Diagram)

Before testing, the database was converted to a typical 4096-byte page size.

The customers table was continuously filled with relatively large rows until it reached 1001753 rows and the DB size grew to 71_248_150_528 bytes.

The first and last names are generated by string concatenation as {id}customer_name{id} and {id}customer_last_name{id}, so that all names are unique.

Benchmarks

Test | Cycles cost (first run) | Cycles cost (second run)
Count elements on the first call: SELECT COUNT(*) FROM customers | 7950194543 | 1399546
Customer search by row ID: SELECT firstname, lastname, email FROM customers WHERE customer_id=900000 | 3856220 | 662421
Select by the indexed first-name field with an exact match: SELECT firstname, lastname, email FROM customers WHERE firstname = "2912169customer_name2912169" | 3528508 | 361988
Select by a first name that does not exist: SELECT firstname, lastname, email FROM customers WHERE firstname = "1" | 3214356 | 359929
Customer search by a row ID range: SELECT firstname, lastname, email FROM customers WHERE customer_id>900000 and customer_id<900050 | 45809516 | 740405
Customer count depending on the first name: SELECT count(*) FROM customers WHERE firstname>="1" and firstname<"2" | 6729952521 | 361297471
Select a person that doesn't exist using LIKE: SELECT firstname, lastname, email FROM customers WHERE firstname LIKE "a%" | Failed
Select a person using OFFSET: SELECT firstname, lastname, email FROM customers LIMIT 5 OFFSET 900000 | Failed

Note

Both failing queries require a full table scan, and SQLite is not able to process that amount of data within the instruction limit. You can check whether a query requires a full scan by prepending EXPLAIN QUERY PLAN to it:

EXPLAIN QUERY PLAN SELECT firstname, lastname, email FROM customers LIMIT 5 OFFSET 900000

This test shows that it is possible to create a very large database; it is, however, not as performant as one might expect. On a large database, some operations will always fail because they would require too many cycles. The exact limit on the database size depends on many aspects, such as the page size, the row size and row count, and the table structure. The practical database size is likely in the range of 2 GiB to 20 GiB, which should be enough for the majority of use cases.

Reproducing the Benchmarks

Check out the Benchmark repository ic-chinook-base if you wish to repeat or examine the experiment.

Tips on Working with Large Datasets

  • Use indexed queries.

Plan ahead which fields will be critical for finding the right elements, and create indexes on those fields. For example, if you need to quickly find a customer by name, make sure the name field is indexed:

CREATE INDEX IF NOT EXISTS idx_customers_name ON customers(name);

Also plan how you store your data. If you store the first_name and last_name concatenated in the same column, it won’t be possible to efficiently search by last_name without performing a full table scan, e.g.:

... WHERE name LIKE '%Johnson'
  • Check the instructions consumed, exit early when needed, and batch insertions.
  1. When processing queries iteratively, check the instruction counter inside the loop. If there is not enough budget left to complete the operation, exit early with a partial result rather than letting the call run out of instructions.

  2. Every call to execute opens and closes a transaction. To improve performance when inserting many records, open a transaction before the first insert and commit changes only once after all inserts are complete. This avoids committing after each row:

use ic_rusqlite::with_connection;

#[ic_cdk::update]
fn add_customers(id_offset: u64) {
    // instruction counter at the start of the call
    let start = ic_cdk::api::instruction_counter();

    with_connection(|mut conn| {
        let tx = conn.transaction().unwrap();

        let sql =
            String::from("insert into customers (firstname, lastname, email) values (?, ?, ?)");

        {
            let mut stmt = tx.prepare_cached(&sql).unwrap();

            let mut i = 0;

            // do as many rows as the timing allows
            while i < 100000000 {
                let id = id_offset + i + 1;
                let name = format!("{id}customer_name{id}");
                let last_name = format!("{id}customer_last_name{id}");
                let email = format!("{id}customer@example.com");

                stmt.execute(ic_rusqlite::params![name, last_name, email])
                    .expect("insert of a user failed!");

                i += 1;

                // stop before hitting the per-call instruction limit
                let end = ic_cdk::api::instruction_counter();
                if end - start > 20_000_000_000 {
                    break;
                }
            }
        }

        tx.commit().expect("COMMIT USER INSERTION FAILED!");
    });
}

  • Examine query plan.

To identify problems with a complex or slow query, study its query plan. If you see full scans on a table, consider changing the query logic and/or introducing indexes.
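For instance, a small (hypothetical) debug method can expose the plan of an arbitrary query; the human-readable detail text is the fourth column of the EXPLAIN QUERY PLAN output:

use ic_rusqlite::with_connection;

#[ic_cdk::query]
fn explain(sql: String) -> Vec<String> {
    with_connection(|conn| {
        let mut stmt = conn
            .prepare(&format!("EXPLAIN QUERY PLAN {sql}"))
            .unwrap();
        // each row holds (id, parent, notused, detail); keep only the detail text
        let rows = stmt
            .query_map([], |row| row.get::<_, String>(3))
            .unwrap();
        rows.map(|r| r.unwrap()).collect()
    })
}

A detail line starting with SCAN (rather than SEARCH) indicates a full table scan.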

  • Use pagination.

To avoid running out of instructions or exceeding the response size limit on potentially large datasets, limit the number of rows returned by the SQL query. Use LIMIT <N> to cap the number of rows returned, and OFFSET <N> to skip the first N rows.

To return at most 5 persons and skip the first 3, write:

SELECT * FROM persons LIMIT 5 OFFSET 3
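In a canister method, the page bounds can be passed in as query parameters. A sketch, reusing the person table from the basic example above:

use ic_rusqlite::{params, with_connection};

#[ic_cdk::query]
fn list_page(limit: u32, offset: u32) -> Vec<(u64, String)> {
    with_connection(|conn| {
        // only one page of rows is read and returned
        let mut stmt = conn
            .prepare_cached("SELECT id, name FROM person ORDER BY id LIMIT ?1 OFFSET ?2")
            .unwrap();
        let rows = stmt
            .query_map(params![limit, offset], |row| Ok((row.get(0)?, row.get(1)?)))
            .unwrap();
        rows.map(|r| r.unwrap()).collect()
    })
}

Note that OFFSET still walks over the skipped rows; for very large offsets, keyset pagination (WHERE id > ?1 ... LIMIT ?2) is cheaper.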

Pragma settings

The following pragma settings are set in ic-rusqlite by default:

journal_mode - PERSIST - persist the journal file (faster than deleting the file every time). Setting it to OFF works even faster, but disallows atomic COMMIT/ROLLBACK.

synchronous - OFF - writes go directly to stable memory and are never cached, so there is no need to explicitly "flush" data changes to disk.

page_size - 4096 - a bigger page size works a bit faster by reducing sequential writes. Performance degrades, however, if there are many small writes at random locations.

locking_mode - EXCLUSIVE - exclusive mode is faster because we only lock the database once (the canister is the only user of the database).

temp_store - MEMORY - avoids creating a temporary file on disk, saving extra I/O operations.

cache_size - 500000 - sets the cache size to 4096 * 500000 bytes; this gives a significant performance boost at the expense of the canister's memory.

The pragma settings can be adjusted with SQL queries; you might want to do that during the init or post_upgrade events:

    PRAGMA cache_size=1000; -- Reduce the maximum DB cache size
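From Rust, the same adjustment can be made through the connection; a minimal sketch using rusqlite's execute_batch in the post_upgrade hook:

use ic_rusqlite::with_connection;

#[ic_cdk::post_upgrade]
fn post_upgrade() {
    with_connection(|conn| {
        // reduce the maximum DB cache size right after an upgrade
        conn.execute_batch("PRAGMA cache_size=1000;").unwrap();
    });
}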

Migrations

During canister development you might come across a situation where you need to upgrade your canister's database structure. To do so, check out the ic-sql-migrate library. With it, you can keep all database changes within your source code and automatically upgrade your database on every deployment.
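ic-sql-migrate has its own API; purely as an illustration of the idea, a hand-rolled migration can track the schema version with SQLite's user_version pragma (a sketch, not the ic-sql-migrate API; the email column is hypothetical):

use ic_rusqlite::with_connection;

fn migrate() {
    with_connection(|conn| {
        // the schema version currently stored in the database file
        let version: i64 = conn
            .query_row("PRAGMA user_version", [], |row| row.get(0))
            .unwrap();
        if version < 1 {
            conn.execute_batch(
                "ALTER TABLE person ADD COLUMN email TEXT;
                 PRAGMA user_version = 1;",
            )
            .unwrap();
        }
    });
}

Calling migrate() from both the init and post_upgrade hooks keeps every deployment at the latest schema.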

Backup and Recover

Downloading and Uploading the Database

To create a backup of your database, you can use the icml tool. The download command lets you save the canister’s database to your local drive.

Later, you can restore it by uploading the database back with the upload command.

By downloading and uploading the database, it should also be possible to move the database from one canister to another.

Do not read or write the database while a connection is open

Before running download or upload, the canister's database connection must be closed. This ensures that the cached pages of the SQLite engine do not conflict with the database file on disk, and that your database does not get corrupted.

Recovering from a Canister Snapshot

If a canister becomes unusable and its functions cannot be called, but you can still create and download a snapshot, icml can recover the SQLite database directly from the stable memory snapshot file.

When Is SQLite Appropriate?

SQLite is a fast, lightweight SQL database that stores all data in a single file. This works well with a canister's memory model, but there are pros and cons to consider before choosing it.

When to Use SQLite

SQLite is ideal when you need structured, reliable, and transactional data storage but don’t want the complexity of running a database server.

  • You need complex queries. SQLite supports standard SQL, allowing you to perform advanced queries using SELECT, JOIN, GROUP BY, and other operations.

  • You need transactions. SQLite is ACID-compliant, so it guarantees data consistency, supports rollback, and handles multi-step operations safely (see the sketch after this list).

  • Tabular data with multiple indexes. SQLite is useful if you have a large table and want to find records by several indexed fields.
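For example, a multi-step balance transfer stays consistent even if one of the statements fails. A sketch, assuming a hypothetical accounts table with id and balance columns:

use ic_rusqlite::{params, with_connection};

#[ic_cdk::update]
fn transfer(from: u64, to: u64, amount: f64) {
    with_connection(|mut conn| {
        let tx = conn.transaction().unwrap();
        tx.execute(
            "UPDATE accounts SET balance = balance - ?1 WHERE id = ?2",
            params![amount, from],
        )
        .unwrap();
        tx.execute(
            "UPDATE accounts SET balance = balance + ?1 WHERE id = ?2",
            params![amount, to],
        )
        .unwrap();
        // both updates become visible atomically; dropping tx without commit rolls back
        tx.commit().unwrap();
    });
}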

When to Use Stable Structures

If your data needs are simple, stable structures may be faster and easier to use.

Use a stable structure like Vec, BTreeMap, or BTreeSet when:

  • You don’t need transactions. There is no need for rollback or atomic multi-step updates.

  • You don’t need SQL-style queries. For simple key -> value dictionary search, stable data structures are faster and easier to work with.

  • You need speed over flexibility. Structures like Vec provide fast access to an array of elements, and BTreeMap gives sorted element access with logarithmic complexity.

When to Use the File System

For file-based hierarchical data, the file system may be the best choice.

Use the file system storage when:

  • You want to organize data into directories and subfolders, e.g., for storing backups and documents.

  • You work with large data. Files may be larger than the canister's heap memory and/or contain large sections of zeroed memory (sparse files).

  • You perform byte-level data manipulation. Editing or seeking within large files is faster with standard file I/O than loading and modifying data through an SQL database.
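A sketch of in-place byte editing with standard file I/O (the /data/blob.bin path is hypothetical):

use std::fs::OpenOptions;
use std::io::{Seek, SeekFrom, Write};

#[ic_cdk::update]
fn patch_bytes(offset: u64, bytes: Vec<u8>) {
    // open the existing file without truncating it
    let mut file = OpenOptions::new()
        .read(true)
        .write(true)
        .open("/data/blob.bin")
        .expect("failed to open the file");
    // overwrite only the requested range instead of rewriting the whole file
    file.seek(SeekFrom::Start(offset)).expect("seek failed");
    file.write_all(&bytes).expect("write failed");
}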