Introduction
Welcome to `ic-rusqlite`, a convenience wrapper around the `rusqlite` crate for integrating SQLite databases into Internet Computer (IC) canisters.
SQLite is a lightweight, embedded database engine. `ic-rusqlite` lets you use the SQLite engine in your Rust-based IC projects.
This guide explains how to install, configure, and optimize `ic-rusqlite`, including practical examples, performance considerations, and best practices.
Installation
Prerequisites
- Install `wasi2ic`: `cargo install wasi2ic`
- Install the WASI target: `rustup target add wasm32-wasip1`
Setting Up the Environment for Compiling SQLite from Source
If you intend to compile SQLite from source, you will need to install the WASI-SDK. You can set up your build environment via the following script:
curl -fsSL https://raw.githubusercontent.com/wasm-forge/ic-rusqlite/main/prepare.sh | sh
The script will:
- Download the WASI-SDK and the WASI-oriented `clang`.
- Define the following variables in your `.bashrc`:
export WASI_SDK_PATH=<path to wasi-sdk>
export PATH=$WASI_SDK_PATH/bin:$PATH
Creating Your SQLite Canister
Configure your canister
To integrate SQLite into your backend Rust canister project, you will need to add the `ic-rusqlite` dependency and adjust the build process.
The first part is easy:
cargo add ic-rusqlite
The second part requires you to update `dfx.json`: specify the path to the Wasm binary, set the canister type to `custom`, and add custom build steps that enforce compilation to the `wasm32-wasip1` target.
Finally, use `wasi2ic` to turn the WASI binary into a Wasm executable that runs on the Internet Computer.
Example `dfx.json`:
```json
{
  "canisters": {
    "backend": {
      "candid": "can.did",
      "package": "backend",
      "build": [
        "cargo build --release --target wasm32-wasip1",
        "wasi2ic target/wasm32-wasip1/release/backend.wasm target/wasm32-wasip1/release/backend_nowasi.wasm"
      ],
      "wasm": "target/wasm32-wasip1/release/backend_nowasi.wasm",
      "type": "custom",
      "metadata": [
        {
          "name": "candid:service"
        }
      ]
    }
  },
  "dfx": "0.28.0",
  "version": 1
}
```
Accessing the Database
The database connection is established with the first call to `ic_rusqlite::with_connection(...)`, so there is no need to explicitly initialize or create a database.
Internally, `ic-rusqlite` uses stable structures with the memory manager. The virtual memories `101..119` are reserved for the file system.
The virtual memory with the ID `120` is the default storage memory for the database; the database file is `/DB/main.db`. These settings are the defaults, but they can be changed via the `set_connection_config(...)` function.
The ability to associate a file with a virtual memory is a special feature of `stable-fs`. It allows you to create dedicated files with fast I/O access.
Using the File System
`ic-rusqlite` is compiled to the WASI target and then processed by the `wasi2ic` workflow, which embeds `ic-wasi-polyfill` and `stable-fs` into the output binary. This enables the use of standard Rust I/O APIs for file operations.
By default, the main database file is stored in the root folder as `/DB/main.db`, and the database engine may create a few additional helper files next to it.
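As a minimal sketch of what this enables, the snippet below uses plain `std::fs` calls inside canister endpoints to write and read a file on the canister's file system (the `/logs/notes.txt` path and the endpoint names are illustrative, not part of `ic-rusqlite`):

```rust
use std::fs;
use std::io::Write;

#[ic_cdk::update]
fn append_note(line: String) {
    // regular Rust file I/O works because wasi2ic wires the WASI calls to stable-fs
    fs::create_dir_all("/logs").expect("failed to create directory");

    let mut file = fs::OpenOptions::new()
        .create(true)
        .append(true)
        .open("/logs/notes.txt")
        .expect("failed to open file");

    writeln!(file, "{line}").expect("failed to write");
}

#[ic_cdk::query]
fn read_notes() -> String {
    // return an empty string if the file has not been created yet
    fs::read_to_string("/logs/notes.txt").unwrap_or_default()
}
```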
Using Stable Structures
You can freely create other stable structures for your extra storage needs; just make sure to use a virtual memory ID that is not yet occupied.
Make sure you are using the memory manager, or you will destroy the database and the file system stored in stable memory.
Basic Canister Example
A very basic canister backend for storing persons can look as follows:
```rust
use ic_rusqlite::with_connection;

#[ic_cdk::update]
fn add(name: String, data: String, age: u32) {
    with_connection(|conn| {
        // execute statement with parameters
        conn.execute(
            "INSERT INTO person (name, data, age) VALUES (?1, ?2, ?3)",
            (&name, &data, age),
        )
        .unwrap();
    })
}

#[ic_cdk::query]
fn list() -> Vec<(u64, String, String, u32)> {
    with_connection(|conn| {
        // prepare SQL statement
        let mut stmt = conn
            .prepare("SELECT id, name, data, age FROM person")
            .unwrap();

        // execute statement and map results into an iterator
        let iter = stmt
            .query_map([], |row| {
                Ok((row.get(0)?, row.get(1)?, row.get(2)?, row.get(3)?))
            })
            .unwrap();

        iter.map(|r| r.unwrap()).collect()
    })
}

#[ic_cdk::init]
fn init() {
    with_connection(|conn| {
        // create the initial tables on the first deployment
        conn.execute(
            "CREATE TABLE IF NOT EXISTS person (
                id   INTEGER PRIMARY KEY,
                name TEXT NOT NULL,
                data TEXT,
                age  INTEGER
            )",
            (),
        )
        .unwrap();
    })
}
```
For a complete picture of what you can accomplish with a `Connection`, see the rusqlite documentation.
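For instance, `rusqlite`'s `Connection::query_row` is handy when a statement returns exactly one row. A small sketch against the `person` table from the example above (the `count_persons` endpoint name is ours):

```rust
use ic_rusqlite::with_connection;

#[ic_cdk::query]
fn count_persons() -> u64 {
    with_connection(|conn| {
        // query_row runs the statement and maps the single resulting row
        conn.query_row("SELECT COUNT(*) FROM person", [], |row| row.get(0))
            .unwrap()
    })
}
```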
Creating a custom database connection
You can create a customized database connection if you want to store it in a different file, mount it onto another virtual memory, or customize its initial pragma settings:
```rust
use ic_cdk::{init, post_upgrade, pre_upgrade};

#[init]
fn init() {
    // default configuration
    let mut config = ic_rusqlite::ConnectionConfig::new();

    // optionally, create a custom connection to a database different from the default one
    config.db_file_name = "/my_custom_path/my_base.db".to_string(); // some custom path to the database
    config.db_file_mount_id = Some(150); // store database in the virtual memory ID 150

    // modify the default pragma settings
    config
        .pragma_settings
        .insert("cache_size".to_string(), "10000".to_string());

    ic_rusqlite::set_connection_config(config);

    // ...
    // The actual connection is not needed here, it will be established automatically
    // on the next "with_connection" statement.
}

#[pre_upgrade]
fn pre_upgrade() {
    // closing the connection explicitly unlocks the database before the canister upgrade
    ic_rusqlite::close_connection();
}

#[post_upgrade]
fn post_upgrade() {
    // same initialization
    init();
}
```
In this example, we explicitly close the database connection in the `pre_upgrade` hook. This ensures that the database lock is released, preventing any issues when creating a new connection after the canister upgrade.
Note: Since the canister operates in single-user mode, there is no risk of conflicts from concurrent connections. Therefore, `ic-rusqlite` will attempt to delete the lock file when establishing a database connection, if one is found. This means that even if you do not explicitly close the connection, you will not be locked out of the database after an upgrade. However, the situation is different if the database remains locked and, after a canister upgrade, you attempt to use it with an `ATTACH DATABASE` query.
ic-rusqlite Users-Orders Benchmark
This project benchmarks the `ic-rusqlite` dependency in a benchmark canister; the original benchmark project can be found in the benchmarks repository.
Pragma settings
This section shows the current benchmark results for database performance, based on a database file that writes directly to stable memory (minimizing chunked-storage overhead).
The following pragma settings were used:
Pragma | Value | Description |
---|---|---|
journal_mode | PERSIST | persist the journal file (faster than deleting the file every time). Setting it to OFF works faster, but disallows atomic COMMIT/ROLLBACK |
synchronous | NORMAL | a reasonable value for data safety and performance |
page_size | 4096 | a reasonable default value |
locking_mode | EXCLUSIVE | exclusive mode is faster because we avoid locking and unlocking the database for each query |
temp_store | MEMORY | keeps temporary data in memory; at the moment this is necessary to avoid an SQLite crash during complex queries |
cache_size | 1000000 | gives a significant performance boost at the expense of the canister memory used (it tries to keep the whole database in memory, thus reducing the read operation count) |
Database structure
```sql
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY AUTOINCREMENT,
    username TEXT NOT NULL,
    email TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL,
    amount REAL NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (user_id) REFERENCES users(user_id)
);
```
Benchmark results
Test | Cycles cost |
---|---|
Create 100 000 users (cached INSERT query with parameters executed 100000 times). | 1.94 B |
Create 1M orders (each refers to one of the users, no extra indexes present) | 20.05 B |
Create indexes, when there are 1M orders in the table: CREATE INDEX IF NOT EXISTS idx_users_email ON users(email);CREATE INDEX IF NOT EXISTS idx_orders_user_id ON orders(user_id); | 6.27 B |
Make a joint selection (100K users, 1M orders): SELECT u.user_id, u.username, o.order_id, o.amount FROM users u JOIN orders o ON u.user_id = o.user_id WHERE u.user_id < 1000 ORDER BY o.created_at DESC; | 202.28 M |
Select using LIKE on an indexed field: SELECT * FROM users WHERE email LIKE 'user%' | 781.46 M |
Create 100 extra orders after there were already 1M orders and field indexes created. | 9.03 M |
Remove 1000 orders (we remove all orders from the first 100 users): DELETE FROM orders WHERE user_id <= 100 | 23.14 M |
Create 1M orders after indices were created | 29.88 B |
Delete 100000 orders with transaction rollback: BEGIN TRANSACTION; DELETE FROM orders WHERE order_id > 900000; ROLLBACK | 1.53 B |
Reproducing the Benchmarks
Check out the Benchmark repository sql-users-orders if you wish to repeat or examine the experiment.
Testing SQLite on a Large Dataset
To build a large database, the sample `Chinook` database from the SQLite tutorial was used.
Before testing, the database was changed to have a typical 4096-byte page size.
The `customers` table was continuously filled with relatively large rows until it reached 1001753 rows and the DB size had grown to 71_248_150_528 bytes.
The first and last names are generated as string concatenations `{id}customer_name{id}` and `{id}customer_last_name{id}`, so that all names are unique.
Benchmarks
Test | Cycles cost (first run) | Cycles cost (second run) |
---|---|---|
Count elements on the first call: SELECT COUNT(*) FROM customers | 7950194543 | 1399546 |
Customer search by row ID: SELECT firstname, lastname, email FROM customers WHERE customer_id=900000 | 38562206 | 62421 |
Select by the indexed field first name by the exact match: SELECT firstname, lastname, email FROM customers WHERE firstname = "2912169customer_name2912169" | 35285083 | 61988 |
Select by first name, that does not exist: SELECT firstname, lastname, email FROM customers WHERE firstname = "1" | 32143563 | 59929 |
Customer search by row ID range: SELECT firstname, lastname, email FROM customers WHERE customerid>900000 and customerid<900050 | 45809516 | 740405 |
Customer count depending on the first name : SELECT count(*) FROM customers WHERE firstname>="1" and firstname<"2" | 6729952521 | 361297471 |
Select a person that doesn't exist using LIKE: SELECT firstname, lastname, email FROM customers WHERE firstname LIKE "a%" | Failed | |
Select a person using OFFSET: SELECT firstname, lastname, email FROM customers LIMIT 5 OFFSET 900000 | Failed | |
Note: Both failing queries require a full table scan, and SQLite is not able to process that amount of data within the instruction limit. You can check whether a query will perform a full scan by prepending `EXPLAIN QUERY PLAN` to your query:
EXPLAIN QUERY PLAN SELECT firstname, lastname, email FROM customers LIMIT 5 OFFSET 900000
This test shows that it is possible to create a very large database; it is, however, not as practical as one might expect. On a large database, some operations will always fail because they would require too many cycles. The exact limit on the database size depends on many aspects, such as the page size, the row size and count, and the table structure. The practical database size is likely in the range of 2 GiB to 20 GiB, which should be enough for the majority of use cases.
Reproducing the Benchmarks
Check out the Benchmark repository ic-chinook-base if you wish to repeat or examine the experiment.
Tips on Working with Large Datasets
- Use indexed queries.
Plan ahead which fields will be critical for finding the right elements, and create indexes on those fields. For example, if you need to quickly find a person by name, make sure the `name` field is indexed:
CREATE INDEX IF NOT EXISTS idx_persons_name ON customers(name);
Also plan how you store your data. If you store the `first_name` and `last_name` concatenated in the same column, it won't be possible to efficiently search by `last_name` without performing a full table scan, e.g.:
... WHERE name LIKE '%Johnson'
- Check the instructions used to decide whether to quit early, and use bulk insertions.
  - When processing queries iteratively, check timing constraints inside the loop. If there isn't enough time to complete the operation, exit early with a partial result rather than letting the process overrun.
  - Every call to `execute` opens and closes a transaction. To improve performance when inserting many records, open a transaction before the first insert and commit changes only once after all inserts are complete. This avoids committing after each row:
```rust
// assumption: the helper used below refers to ic-cdk's instruction counter
use ic_cdk::api::instruction_counter as ic_instruction_counter;
use ic_rusqlite::with_connection;

#[ic_cdk::update]
fn add_customers(id_offset: u64) {
    let start = ic_instruction_counter();

    with_connection(|mut conn| {
        let tx = conn.transaction().unwrap();

        let sql =
            String::from("insert into customers (firstname, lastname, email) values (?, ?, ?)");

        {
            let mut stmt = tx.prepare_cached(&sql).unwrap();

            let mut i = 0;

            // do as many rows as the timing allows
            while i < 100000000 {
                let id = id_offset + i + 1;

                let name = format!("{id}customer_name{id}");
                let last_name = format!("{id}customer_last_name{id}");
                let email = format!("{id}customer@example.com");

                stmt.execute(ic_rusqlite::params![name, last_name, email])
                    .expect("insert of a user failed!");

                i += 1;

                // stop early once the instruction budget for this call is used up
                let end = ic_instruction_counter();

                if end - start > 20000000000 {
                    break;
                }
            }
        }

        tx.commit().expect("COMMIT USER INSERTION FAILED!");
    });
}
```
- Examine the query plan.
To identify problems with a complex or slow query, study its query plan. If there are any full scans on a table, you might want to change the query logic and/or introduce indexes, as the sketch below illustrates.
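As a sketch, the query plan can also be inspected from inside the canister; the `explain_plan` endpoint name below is ours, and the plan text is read from the `detail` column of the `EXPLAIN QUERY PLAN` output:

```rust
use ic_rusqlite::with_connection;

#[ic_cdk::query]
fn explain_plan(sql: String) -> Vec<String> {
    with_connection(|conn| {
        // EXPLAIN QUERY PLAN returns rows of (id, parent, notused, detail)
        let mut stmt = conn
            .prepare(&format!("EXPLAIN QUERY PLAN {sql}"))
            .unwrap();

        let rows = stmt
            .query_map([], |row| row.get::<_, String>(3))
            .unwrap();

        // look for entries such as "SCAN customers" (full scan)
        // versus "SEARCH customers USING INDEX ..." (indexed lookup)
        rows.map(|r| r.unwrap()).collect()
    })
}
```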
- Use pagination.
To avoid running out of instructions, and to bound the size of the returned data on potentially large datasets, limit the number of rows returned by the SQL query. Use `LIMIT <N>` to limit the number of rows returned by a query, and `OFFSET <N>` to skip the first N rows of a response.
To return at most 5 persons and skip the first 3, write:
SELECT * FROM persons LIMIT 5 OFFSET 3
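A sketch of a paginated query endpoint over the `person` table from the basic example (the `list_page` endpoint name and the parameter binding are illustrative):

```rust
use ic_rusqlite::with_connection;

#[ic_cdk::query]
fn list_page(limit: u32, offset: u32) -> Vec<(u64, String)> {
    with_connection(|conn| {
        let mut stmt = conn
            .prepare("SELECT id, name FROM person ORDER BY id LIMIT ?1 OFFSET ?2")
            .unwrap();

        // bind the page size and the starting offset as query parameters
        let rows = stmt
            .query_map((limit, offset), |row| Ok((row.get(0)?, row.get(1)?)))
            .unwrap();

        rows.map(|r| r.unwrap()).collect()
    })
}
```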
Pragma settings
The following pragma settings are set in `ic-rusqlite` by default:
- `journal_mode` - PERSIST - persist the journal file (faster than deleting the file every time). Setting it to OFF works even faster, but disallows atomic COMMIT/ROLLBACK.
- `synchronous` - OFF - because writes are always written and not cached, we do not need to explicitly "flush" data changes to the disk.
- `page_size` - 4096 - a bigger page size works a bit faster by reducing sequential writes. The performance degrades, however, if there are many small writes at random locations.
- `locking_mode` - EXCLUSIVE - exclusive mode is faster because we only lock the database once (the canister is the only user of the database).
- `temp_store` - MEMORY - avoids creating temporary files on disk, saving extra I/O operations.
- `cache_size` - 500000 - sets the cache size to 4096 * 500000 bytes; this gives a significant performance boost at the expense of the canister's memory.
The pragma settings can be adjusted with SQL queries; you might want to do that during the `init` or `post_upgrade` events:
PRAGMA cache_size=1000; -- Reduce the maximum DB cache size
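As a sketch, the same adjustment can be made from Rust during initialization using `rusqlite`'s `pragma_update` method (the value shown is only illustrative):

```rust
use ic_rusqlite::with_connection;

#[ic_cdk::init]
fn init() {
    with_connection(|conn| {
        // reduce the maximum DB cache size to 1000 pages
        conn.pragma_update(None, "cache_size", 1000).unwrap();
    })
}
```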
Migrations
During canister development you might come across a situation where you need to upgrade your canister's database structure. To do this, check out the `ic-sql-migrate` library. With it, you can keep all the database changes within your source code and automatically upgrade your database on every deployment.
Backup and Recovery
Downloading and Uploading the Database
To create a backup of your database, you can use the `icml` tool. The `download` command lets you save the canister's database to your local drive.
Later, you can restore it by uploading the database back with the `upload` command.
By downloading and uploading the database, it should also be possible to move the database from one canister to another.
Before running `download` or `upload`, the canister's database connection must be closed. This ensures that the cached pages of the SQLite engine do not conflict with the database written on disk and your database does not get corrupted.
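One way to ensure the connection is closed is to expose a dedicated update endpoint and call it right before running the tool. A minimal sketch (the `close_db` endpoint name is ours; in practice you would restrict such an endpoint to the canister's controllers):

```rust
#[ic_cdk::update]
fn close_db() {
    // release the database lock before download/upload;
    // the connection is re-established on the next with_connection call
    ic_rusqlite::close_connection();
}
```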
Recovering from a Canister Snapshot
If a canister becomes unusable and its functions cannot be called, but you can still create and download a snapshot, `icml` can recover the SQLite database directly from the stable memory snapshot file.
When Is SQLite Appropriate?
SQLite is a fast, lightweight SQL database that stores all data in a single file. This works well with a canister's memory model, but there are pros and cons to consider before choosing it.
When to Use SQLite
SQLite is ideal when you need structured, reliable, and transactional data storage but don’t want the complexity of running a database server.
- You need complex queries. SQLite supports standard SQL, allowing you to perform advanced queries using `SELECT`, `JOIN`, `GROUP BY`, and other operations.
- You need transactions. SQLite is ACID-compliant, so it guarantees data consistency, supports rollback, and handles multi-step operations safely.
- Tabular data with multiple indexes. SQLite is useful if you have a large table and you want to find records by multiple indexed fields.
When to Use Stable Structures
If your data needs are simple, stable structures may be faster and easier to use.
Use a stable structure like `Vec`, `BTreeMap`, or `BTreeSet` when:
- You don't need transactions. If you don't need rollback or atomic updates.
- You don't need SQL-style queries. For simple `key -> value` dictionary search, stable data structures are faster and easier to work with.
- You need speed over flexibility. Structures like `Vec` provide fast access to an array of elements, and `BTreeMap` gives sorted element access with logarithmic complexity.
When to Use the File System
For file-based hierarchical data, the file system may be the best choice.
Use file system storage when:
- You want to organize data into directories and subfolders, e.g. for storing backups and documents.
- You work with large data. Files are larger than the canister's heap memory and/or contain large sections of zeroed memory (sparse files).
- You perform byte-level data manipulation. Editing or seeking within large files is faster with standard file I/O than loading and modifying data through an SQL database.