Introduction

Welcome to ic-rusqlite, a convenience wrapper around the rusqlite crate for integrating SQLite databases into Internet Computer (IC) canisters.

SQLite is a lightweight, embedded database engine. ic-rusqlite lets you use the SQLite engine in your Rust-based IC projects.

This guide explains how to install, configure, and optimize ic-rusqlite, including practical examples, performance considerations, and best practices.

Installation

Required Tools

Setting Up the Environment

You can set up your build environment via script:

curl -fsSL https://raw.githubusercontent.com/wasm-forge/ic-rusqlite/main/prepare.sh | sh

The script will:

  • install wasi2ic: cargo install wasi2ic
  • install the WASI target: rustup target add wasm32-wasip1
  • download the WASI-SDK together with its WASI-oriented clang
  • finally, define the variables in your .bashrc:

export WASI_SDK=<path to wasi-sdk>
export PATH=$WASI_SDK/bin:$PATH

Creating Your SQLite Canister

Configure your canister

To integrate SQLite into your backend Rust canister project, you will need to add the ic-rusqlite dependency and adjust the build process.

The first part is easy:

cargo add ic-rusqlite

The second part requires you to update dfx.json: set the canister type to custom, specify the path to the Wasm binary, and define custom build steps that compile to the wasm32-wasip1 target and then run wasi2ic to produce a Wasm module that can run on the Internet Computer.

Example dfx.json:

{
  "canisters": {
    "backend": {
      "candid": "can.did",
      "package": "backend",
      "build": [
        "cargo build --release --target wasm32-wasip1",
        "wasi2ic target/wasm32-wasip1/release/backend.wasm target/wasm32-wasip1/release/backend_nowasi.wasm"
      ],
      "wasm": "target/wasm32-wasip1/release/backend_nowasi.wasm",
      "type": "custom",
      "metadata": [
        {
          "name": "candid:service"
        }
      ]
    }
  },
  "dfx": "0.28.0",
  "version": 1
}

Accessing the Database

The database connection is established with the first call to ic_rusqlite::with_connection(...), so you don't need to explicitly initialize or create the database.
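
For instance, a first query can go straight through with_connection; a minimal sketch (the sqlite_version endpoint name is illustrative):

use ic_rusqlite::with_connection;

#[ic_cdk::query]
fn sqlite_version() -> String {
    // the first call transparently opens /main.db; later calls reuse the connection
    with_connection(|conn| {
        conn.query_row("SELECT sqlite_version()", [], |row| row.get(0))
            .unwrap()
    })
}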

Internally, ic_rusqlite depends on the ic-wasi-polyfill library, which is backed by the stable-fs storage. The stable-fs uses stable structures with the memory manager: the virtual memories 101..119 are reserved for the file system, and the virtual memory with the ID 120 stores the database.

Note

The ability to associate a file with a virtual memory is a special feature of stable-fs. It makes it possible to create dedicated files with fast I/O access.

Using File System

With ic-rusqlite it is possible to use standard Rust I/O functions to create files.

Currently, for technical reasons, the database is stored in the root folder as /main.db, but a few additional helper files may be created by the SQLite engine.
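
For example, ordinary std::fs calls work as they would on a regular file system; a small sketch (the /notes.txt path and both endpoint names are made up for illustration):

use std::fs;

#[ic_cdk::update]
fn save_note(text: String) {
    // the write lands in the stable-fs backed file system
    fs::write("/notes.txt", text).expect("failed to write the note");
}

#[ic_cdk::query]
fn load_note() -> String {
    fs::read_to_string("/notes.txt").unwrap_or_default()
}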

Other Stable Structures

You can freely create other stable structures for your extra storage needs; just make sure to use a virtual memory ID that is not yet occupied.

Use Memory Manager

Make sure you are using the memory manager, or you will destroy the database and the file system stored in the stable memory.
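
A minimal sketch of such a coexisting structure, assuming the usual ic-stable-structures setup and that memory ID 0 is free (ic-rusqlite occupies 101..120):

use ic_stable_structures::memory_manager::{MemoryId, MemoryManager, VirtualMemory};
use ic_stable_structures::{DefaultMemoryImpl, StableBTreeMap};
use std::cell::RefCell;

type Memory = VirtualMemory<DefaultMemoryImpl>;

thread_local! {
    static MEMORY_MANAGER: RefCell<MemoryManager<DefaultMemoryImpl>> =
        RefCell::new(MemoryManager::init(DefaultMemoryImpl::default()));

    // memory ID 0 does not collide with the IDs 101..120 used by ic-rusqlite
    static SETTINGS: RefCell<StableBTreeMap<String, String, Memory>> = RefCell::new(
        StableBTreeMap::init(MEMORY_MANAGER.with(|m| m.borrow().get(MemoryId::new(0)))),
    );
}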

Basic Canister Example

A very basic canister backend for storing persons can look as follows:

use ic_rusqlite::with_connection;

#[ic_cdk::update]
fn add(name: String, data: String, age: u32) {
    with_connection(|conn| {
        // execute statement with parameters
        conn.execute(
            "INSERT INTO person (name, data, age) VALUES (?1, ?2, ?3)",
            (&name, &data, age),
        )
        .unwrap();
    })
}

#[ic_cdk::query]
fn list() -> Vec<(u64, String, String, u32)> {
    with_connection(|conn| {
        // prepare SQL statement
        let mut stmt = conn
            .prepare("SELECT id, name, data, age FROM person")
            .unwrap();

        // execute statement and map results into an iterator
        let iter = stmt
            .query_map([], |row| {
                Ok((row.get(0)?, row.get(1)?, row.get(2)?, row.get(3)?))
            })
            .unwrap();

        iter.map(|r| r.unwrap()).collect()

    })
}

#[ic_cdk::init]
fn init() {
    with_connection(|conn| {
        // create the initial tables on the first deployment

        conn.execute(
            "CREATE TABLE IF NOT EXISTS person (
                id    INTEGER PRIMARY KEY,
                name  TEXT NOT NULL,
                data  TEXT,
                age   INTEGER
            )",
            (),
        )
        .unwrap();

    })

}

For a complete picture of what you can accomplish with a Connection, see the rusqlite documentation.
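
As one more illustration of the Connection API, a hypothetical single-row lookup could look as follows (.ok() maps the no-row case, and any other error, to None):

use ic_rusqlite::with_connection;

#[ic_cdk::query]
fn get_person(id: u64) -> Option<(u64, String, String, u32)> {
    with_connection(|conn| {
        conn.query_row(
            "SELECT id, name, data, age FROM person WHERE id = ?1",
            [id],
            |row| Ok((row.get(0)?, row.get(1)?, row.get(2)?, row.get(3)?)),
        )
        .ok() // QueryReturnedNoRows (and any other error) becomes None
    })
}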

ic-rusqlite Users-Orders Benchmark

This project benchmarks the ic-rusqlite dependency in a benchmark canister; the original benchmark project can be found in the benchmarks repository.

Prerequisites

It is assumed that you have Rust and dfx installed.

You will also need the WASI-oriented clang installation.

Install wasi2ic and canbench:

  cargo install wasi2ic
  cargo install canbench

Deployment and testing

To run benchmarks, launch the canbench command.

Pragma settings

This shows the current benchmark results for database performance, based on a database file that writes directly to stable memory (minimizing the chunked storage overheads).

The following pragma settings were used:

journal_mode - PERSIST - persist the journal file (faster than deleting the file every time). Setting it to OFF works faster, but disallows atomic COMMIT/ROLLBACK.

synchronous - NORMAL - a reasonable value for data safety and performance.

page_size - 4096 - a reasonable default value.

locking_mode - EXCLUSIVE - exclusive mode is faster because we avoid locking and unlocking the database for each query.

temp_store - MEMORY - keeps the temporary data in memory; at the moment this is necessary to avoid an SQLite crash during complex queries.

cache_size - 1000000 - gives a significant performance boost at the expense of the canister memory used (it tries to keep the whole database in memory, thus reducing the read operation request count).

Database structure

CREATE TABLE users (
  user_id INTEGER PRIMARY KEY AUTOINCREMENT,
  username TEXT NOT NULL,
  email TEXT NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE orders (
  order_id INTEGER PRIMARY KEY AUTOINCREMENT,
  user_id INTEGER NOT NULL,
  amount REAL NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (user_id) REFERENCES users(user_id)
);

Benchmark results

Test - Cycles cost

Create 100 000 users (cached INSERT query with parameters, executed 100000 times) - 1.94 B

Create 1M orders (each refers to one of the users, no extra indexes present) - 20.05 B

Create indexes when there are already 1M orders in the table: CREATE INDEX IF NOT EXISTS idx_users_email ON users(email); CREATE INDEX IF NOT EXISTS idx_orders_user_id ON orders(user_id); - 6.27 B

Make a join selection (100K users, 1M orders): SELECT u.user_id, u.username, o.order_id, o.amount FROM users u JOIN orders o ON u.user_id = o.user_id WHERE u.user_id < 1000 ORDER BY o.created_at DESC; - 202.28 M

Select using LIKE on an indexed field: SELECT * FROM users WHERE email LIKE 'user%' - 781.46 M

Create 100 extra orders when 1M orders and the field indexes already exist - 9.03 M

Remove 1000 orders (all orders of the first 100 users): DELETE FROM orders WHERE user_id <= 100 - 23.14 M

Create 1M orders after the indexes were created - 29.88 B

Delete 100000 orders with transaction rollback: BEGIN TRANSACTION; DELETE FROM orders WHERE order_id > 900000; ROLLBACK - 1.53 B

Testing SQLite on a Large Dataset

To build a large database, the sample Chinook database from the SQLite tutorial was used.

Chinook Entity Relationship Diagram

The customers table was continuously filled with relatively large rows until it reached 1115397 rows and the DB size grew to 80_106_637_312 bytes (roughly 80 GB).

The first name and last name are generated by string concatenation as {id}customer_name{id} and {id}customer_last_name{id}, so that all names are unique.

Benchmarks

Test - Cycles cost (first run) - Cycles cost (second run)

Count elements on the first call: SELECT COUNT(*) FROM customers - 1239240215 - 96205886

Customer search by row ID: SELECT firstname, lastname, email FROM customers WHERE customer_id=900000 - 3631601 - 484516

Select by the indexed field first name with an exact match: SELECT firstname, lastname, email FROM customers WHERE firstname = "2912169customer_name2912169" - 3325234 - 562988

Select by a first name that does not exist: SELECT firstname, lastname, email FROM customers WHERE firstname = "1" - 3006713 - 160316

Customer search by a row ID range: SELECT firstname, lastname, email FROM customers WHERE customerid>900000 and customerid<900050 - 63839343 - 1821716

Customer count depending on the first name: SELECT count(*) FROM customers WHERE firstname>="1" and firstname<"2" - 6498875034 - 254846762

Select a person that doesn't exist using LIKE: SELECT firstname, lastname, email FROM customers WHERE firstname LIKE "a%" - Failed

Select a person using OFFSET: SELECT firstname, lastname, email FROM customers LIMIT 5 OFFSET 900000 - Failed

Note

Both failing queries require a full table scan, and SQLite is not able to process that amount of data within the instruction limit. You can check whether a query goes for a full scan by prepending EXPLAIN QUERY PLAN to your query:

EXPLAIN QUERY PLAN SELECT firstname, lastname, email FROM customers LIMIT 5 OFFSET 900000

This test shows that it is possible to create a very large database; it is, however, not as optimal as one might expect. On a large database some operations will always fail because they would require too many cycles. The exact limit on the database size may depend on many aspects, such as the page size, the row size and count, and the table structure. The practical database size is likely in the range from 2 GiB to 20 GiB, which should be enough in the majority of use cases.

Reproducing the Benchmarks

Check out the Benchmark repository ic-chinook-base if you wish to repeat or examine the experiment.

Tips on Working with Large Datasets

  • Use indexed queries.

Plan ahead which fields will be critical for finding the right records, and create indexes on those fields. For example, if you need to quickly find a person by name, make sure the name field is indexed:

CREATE INDEX IF NOT EXISTS idx_persons_name ON customers(name);

Also plan how you store your data. If you store the first_name and last_name concatenated in the same column, it won’t be possible to search by last_name efficiently without performing a full table scan, e.g.:

... WHERE name LIKE '%Johnson'
  • Check the instructions used so far so you can quit early, and use bulk insertions.
  1. When processing queries iteratively, check the instruction counter inside the loop. If there aren’t enough instructions left to complete the operation, exit early with a partial result rather than letting the call overrun its limit.

  2. Every call to execute opens and closes a transaction. To improve performance when inserting many records, open a transaction before the first insert and commit changes only once after all inserts are complete. This avoids committing after each row:

// assumption: ic_instruction_counter refers to the CDK's instruction counter
use ic_cdk::api::instruction_counter as ic_instruction_counter;
use ic_rusqlite::with_connection;
#[ic_cdk::update]
fn add_customers(id_offset: u64) {
    let start = ic_instruction_counter();

    with_connection(|mut conn| {
        let tx = conn.transaction().unwrap();

        let sql =
            String::from("insert into customers (firstname, lastname, email) values (?, ?, ?)");

        {
            let mut stmt = tx.prepare_cached(&sql).unwrap();

            let mut i = 0;

            // do as many rows as the timing allows
            while i < 100000000 {
                let id = id_offset + i + 1;
                let name = format!("{id}customer_name{id}");
                let last_name = format!("{id}customer_last_name{id}");
                let email = format!("{id}customer@example.com");

                stmt.execute(ic_rusqlite::params![name, last_name, email])
                    .expect("insert of a user failed!");

                i += 1;

                // quit early when we approach the instruction limit
                let end = ic_instruction_counter();
                if end - start > 20000000000 {
                    break;
                }
            }
        }

        tx.commit().expect("COMMIT USER INSERTION FAILED!");
    });
}

  • Examine query plan.

To identify problems with a complex or slow query, study its query plan. If there are any full scans on a table, you might want to change the query logic and/or introduce indexes.
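
The plan can also be inspected from inside the canister; a sketch (the explain endpoint is made up, and column 3 of the EXPLAIN QUERY PLAN output holds the human-readable detail):

use ic_rusqlite::with_connection;

#[ic_cdk::query]
fn explain(sql: String) -> Vec<String> {
    with_connection(|conn| {
        let mut stmt = conn
            .prepare(&format!("EXPLAIN QUERY PLAN {sql}"))
            .unwrap();
        let rows = stmt
            .query_map([], |row| row.get::<_, String>(3)) // the `detail` column
            .unwrap();
        rows.map(|r| r.unwrap()).collect()
    })
}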

  • Use pagination.

To avoid running out of instructions or exceeding the response size limit on potentially large datasets, it is possible to limit the number of data rows returned by the SQL query. Use LIMIT <N> to limit the number of rows returned by a query, and OFFSET <N> to skip the first N rows of the result.

To return at most 5 persons and skip the first 3, write:

SELECT * FROM persons LIMIT 5 OFFSET 3
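
In canister code this maps naturally onto bound parameters; a minimal sketch (the persons table and the list_page endpoint are illustrative):

use ic_rusqlite::with_connection;

#[ic_cdk::query]
fn list_page(limit: u32, offset: u32) -> Vec<(u64, String)> {
    with_connection(|conn| {
        let mut stmt = conn
            .prepare_cached("SELECT id, name FROM persons LIMIT ?1 OFFSET ?2")
            .unwrap();
        let rows = stmt
            .query_map((limit, offset), |row| Ok((row.get(0)?, row.get(1)?)))
            .unwrap();
        rows.map(|r| r.unwrap()).collect()
    })
}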

Pragma settings

The following pragma settings are set in ic-rusqlite by default:

journal_mode - PERSIST - persist the journal file (faster than deleting the file every time). Setting it to OFF works even faster, but disallows atomic COMMIT/ROLLBACK.

synchronous - OFF - because writes always go directly to stable memory and are not cached, we do not need to explicitly "flush" data changes to the disk.

page_size - 4096 - a bigger page size works a bit faster by reducing the number of sequential writes. The performance degrades, however, if there are many small writes at random locations.

locking_mode - EXCLUSIVE - exclusive mode is faster because we only lock the database once (the canister is the only user of the database).

temp_store - MEMORY - avoids creating temporary files on disk, saving extra I/O operations.

cache_size - 500000 - sets the cache size to 4096 * 500000 bytes; this gives a significant performance boost at the expense of the canister's memory.

The pragma settings can be adjusted with SQL queries; you might want to do that during the init or post_upgrade events:

    PRAGMA cache_size=1000; -- Reduce the maximum DB cache size
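
Through rusqlite, the same can be done with the pragma_update helper; a sketch, assuming you want to shrink the cache on upgrade:

use ic_rusqlite::with_connection;

#[ic_cdk::post_upgrade]
fn post_upgrade() {
    with_connection(|conn| {
        // reduce the maximum DB cache size to 1000 pages
        conn.pragma_update(None, "cache_size", &1000)
            .expect("failed to set the pragma");
    });
}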

When Is SQLite Appropriate?

SQLite is a fast, lightweight SQL database that stores all data in a single file. This works well with a canister's memory model, but there are pros and cons to consider before choosing it.

When to Use SQLite

SQLite is ideal when you need structured, reliable, and transactional data storage but don’t want the complexity of running a database server.

  • You need complex queries. SQLite supports standard SQL, allowing you to perform advanced queries using SELECT, JOIN, GROUP BY, and other operations.

  • You need transactions. SQLite is ACID-compliant, so it guarantees data consistency, supports rollback, and handles multi-step operations safely.

  • Tabular data with multiple indexes. SQLite is useful if you have a large table and you want to find records by multiple indexed fields.

When to Use Stable Structures

If your data needs are simple, stable structures may be faster and easier to use.

Use a stable structure like Vec, BTreeMap, or BTreeSet when:

  • You don’t need transactions. You can do without rollback or atomic multi-step updates.

  • You don’t need SQL-style queries. For a simple key -> value dictionary lookup, stable data structures are faster and easier to work with.

  • You need speed over flexibility. Structures like Vec provide O(1) index-based access, and BTreeMap gives sorted access with logarithmic complexity — both are faster than SQLite for many common tasks.

When to Use the File System

For file-based hierarchical data, the file system may be the best choice.

Use the file system storage when:

  • You want to organize data into directories and subfolders, e.g. for storing backups and documents.

  • You work with large data, e.g. files that are larger than the canister's heap memory and/or contain large sections of zeroed memory (sparse files).

  • You perform byte-level data manipulation. Editing or seeking within large files is faster with standard file I/O than loading and modifying data through an SQL database.
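
For example, an in-place patch of a large file could look like the sketch below (the /data.bin path and the patch_file endpoint are made up):

use std::fs::OpenOptions;
use std::io::{Seek, SeekFrom, Write};

#[ic_cdk::update]
fn patch_file(offset: u64, bytes: Vec<u8>) {
    // seek-and-write edits the file in place, without loading it fully into the heap
    let mut f = OpenOptions::new()
        .write(true)
        .create(true)
        .open("/data.bin")
        .expect("failed to open the file");
    f.seek(SeekFrom::Start(offset)).expect("seek failed");
    f.write_all(&bytes).expect("write failed");
}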