DuckLake
DuckLake is the main storage solution for the DuckDB database engine used by Ilum.
By default, embedded DuckDB supports only in-memory mode or single-user local disk storage. DuckLake adds a multi-user, concurrent storage layer that enables shared access to DuckDB datasets across the platform.
When to use DuckLake
Use DuckLake when:
- Multiple users or jobs need concurrent access to the same DuckDB tables
- You need persistent table storage with metadata management
- Your workload requires time travel or schema evolution
Skip DuckLake when:
- You have single-user, ad-hoc analytics workflows
- You’re prototyping and don’t need persistence
- Direct Parquet file access is enough
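For comparison, direct Parquet access without DuckLake can be as simple as the sketch below. The bucket and path are hypothetical placeholders, and the query assumes the DuckDB session already has the `httpfs` extension and S3 credentials available.

```sql
-- Hypothetical bucket and prefix; replace with your own location.
-- read_parquet accepts glob patterns, so no catalog or table definition is needed.
SELECT count(*) AS row_count
FROM read_parquet('s3://my-bucket/events/*.parquet');
```

If this covers your needs and nobody else writes to the same files, DuckLake's metadata layer may be unnecessary overhead.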
Features
- Shared dataset storage using object storage (S3/MinIO/GCS)
- ACID-like transactional guarantees through snapshot isolation and cross-table transactions
- Time travel – query previous versions of your data
- Schema evolution – add/modify columns without breaking existing queries
- Compatibility with standard DuckDB file formats and Parquet
For more on DuckLake’s capabilities, see the official DuckLake documentation.
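As a rough illustration of time travel and schema evolution, the sketch below assumes a DuckLake table named `events` already exists (such as the one created under Usage Examples). The snapshot version `1`, the one-day interval, and the `source` column are illustrative assumptions, not values taken from an Ilum deployment.

```sql
-- Time travel: read the table as of an earlier snapshot version.
-- Version 1 is an assumed example; the table must already exist in that snapshot.
SELECT * FROM events AT (VERSION => 1);

-- Time travel: read the table as it looked one day ago.
SELECT * FROM events AT (TIMESTAMP => now() - INTERVAL '1 day');

-- Schema evolution: add a column; existing rows return NULL for it.
-- The column name "source" is hypothetical.
ALTER TABLE events ADD COLUMN source VARCHAR;

-- Queries written before the change keep working unchanged.
SELECT event_id, user_id FROM events;
```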
Concurrency Model
DuckLake provides snapshot isolation for concurrent workloads:
- Readers never block writers, and writers never block readers
- Each query sees a consistent snapshot of data at transaction start
- Cross-table transactions maintain atomicity across related operations
This differs from strict serializable isolation: it is optimized for analytic workloads where high read concurrency takes priority over write serialization.
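A cross-table transaction might look like the following sketch. The `orders` and `order_items` tables are hypothetical and exist only for illustration.

```sql
-- Hypothetical tables used only for this example.
CREATE TABLE IF NOT EXISTS orders (order_id BIGINT, customer_id BIGINT);
CREATE TABLE IF NOT EXISTS order_items (order_id BIGINT, sku VARCHAR, quantity INTEGER);

-- Both inserts become visible together at COMMIT, as a single new snapshot.
BEGIN TRANSACTION;
INSERT INTO orders VALUES (1001, 42);
INSERT INTO order_items VALUES (1001, 'SKU-1', 2), (1001, 'SKU-2', 1);
COMMIT;

-- A reader that started before the COMMIT keeps seeing the old state of both
-- tables; a reader that starts afterwards sees the order and its items together.
```

Replacing `COMMIT` with `ROLLBACK` would discard both inserts, leaving neither table changed.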
Configuration
DuckLake is attached by default to all DuckDB instances in Ilum when enabled. Tables created in SQL Viewer will automatically use DuckLake.
Configure DuckLake via the Helm chart:
ilum-core:
  sql:
    duckdb:
      ducklake:
        enabled: true                    # Set false to disable DuckLake entirely
        location: s3://ilum-ducklake/    # Root path for all DuckLake data
        postgres:                        # Metadata storage (required)
          host: "ilum-postgresql-hl"     # PostgreSQL service hostname
          port: 5432                     # PostgreSQL port
          database: ducklake             # Database name (created automatically if missing)
          user: ilum                     # Database user
          password: "CHANGEMEPLEASE"     # Database password
        s3:                              # Data storage backend
          endpoint: ilum-minio:9000      # S3 endpoint (MinIO, AWS S3, GCS, etc.)
          region: us-east-1              # S3 region
          keyId: minioadmin              # Access key ID
          secret: minioadmin             # Secret access key
          urlStyle: path                 # Path-style access (use 'virtualHost' for AWS S3)
          ssl: false                     # Enable TLS for S3 connections
Setting enabled: false causes DuckDB to fall back to in-memory tables only.
Use this only for single-user scenarios where persistence isn’t required.
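To make the configuration values more concrete, the sketch below shows roughly how the same settings would be expressed in plain DuckDB when attaching a DuckLake catalog by hand. It is an assumption about the equivalent manual setup, not a description of what Ilum runs internally; the secret name `ducklake_s3` and the catalog alias `lake` are hypothetical.

```sql
-- Extensions needed for a PostgreSQL-backed DuckLake with S3 data storage.
INSTALL ducklake;
INSTALL postgres;
INSTALL httpfs;

-- S3 credentials, mirroring the s3.* values from the Helm chart above.
CREATE OR REPLACE SECRET ducklake_s3 (
    TYPE s3,
    KEY_ID 'minioadmin',
    SECRET 'minioadmin',
    ENDPOINT 'ilum-minio:9000',
    REGION 'us-east-1',
    URL_STYLE 'path',
    USE_SSL false
);

-- Metadata lives in PostgreSQL (postgres.* values); data files go to `location`.
ATTACH 'ducklake:postgres:dbname=ducklake host=ilum-postgresql-hl port=5432 user=ilum password=CHANGEMEPLEASE'
    AS lake (DATA_PATH 's3://ilum-ducklake/');

-- Make the DuckLake catalog the default for subsequent statements.
USE lake;
```

With Ilum this attachment is performed for you when `enabled: true`, so the statements above are only useful for understanding the settings or for debugging from a standalone DuckDB client.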
Usage Examples
Creating Tables
When enabled, DuckLake is the default storage backend for all DuckDB instances, so any table created in the default catalog is stored in DuckLake.
-- Create a table (automatically stored in DuckLake)
CREATE TABLE events (
event_id BIGINT,
event_time TIMESTAMP,
user_id BIGINT,
payload JSON
);
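Once the table exists, writing and reading it works like any other DuckDB table. The rows below are made-up sample data for illustration.

```sql
-- Insert a few sample rows (values are illustrative only).
INSERT INTO events VALUES
    (1, TIMESTAMP '2024-01-01 12:00:00', 42, '{"action": "login"}'),
    (2, TIMESTAMP '2024-01-01 12:05:00', 42, '{"action": "view"}'),
    (3, TIMESTAMP '2024-01-01 12:07:00', 7,  '{"action": "login"}');

-- Aggregate events per user; other sessions attached to the same DuckLake
-- see these rows once the insert has committed.
SELECT user_id, count(*) AS n_events
FROM events
GROUP BY user_id
ORDER BY user_id;
```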
Limitations
- DuckDB-only: DuckLake catalogs cannot be accessed from Spark, Trino, or other engines
- No branching/tagging: Unlike Nessie or Iceberg, DuckLake doesn't support Git-like version control
- Concurrency bound by metadata DB: High-write concurrency may impact PostgreSQL performance