DuckLake

DuckLake is the default storage layer for the DuckDB engine used by Ilum.

Embedded DuckDB on its own supports only in-memory mode or local disk storage, both of which are single-user. DuckLake adds a multi-user, concurrent storage layer that enables shared access to DuckDB datasets across the platform.

When to use DuckLake

Use DuckLake when:

Multiple users or jobs need concurrent access to the same DuckDB tables
You need persistent table storage with metadata management
Your workload requires time travel or schema evolution

Skip DuckLake when:

You have single-user, ad-hoc analytics workflows
You’re prototyping and don’t need persistence
Direct Parquet file access is enough
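
For the last case, plain DuckDB can query Parquet files in place without any catalog. A minimal sketch, assuming a hypothetical bucket path and that S3 credentials are already configured for the session:

```sql
-- Hypothetical location; replace with a real Parquet path or glob.
SELECT count(*) AS row_count
FROM read_parquet('s3://example-bucket/events/*.parquet');
```

This reads the files directly, so there is no snapshot history, schema tracking, or transactional write path involved.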

Features

Shared dataset storage using object storage (S3/MinIO/GCS)
ACID-like transactional guarantees through snapshot isolation and cross-table transactions
Time travel – query previous versions of your data
Schema evolution – add/modify columns without breaking existing queries
Compatibility with standard DuckDB file formats and Parquet

For more on DuckLake’s capabilities, see the official DuckLake documentation.
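
As a rough illustration of time travel and schema evolution, the statements below use the general DuckLake SQL syntax. The table name events, the catalog name my_ducklake, and the version number are placeholders; the catalog name Ilum assigns may differ.

```sql
-- Schema evolution: add a column to an existing table.
-- Rows written before the change read the new column as NULL.
ALTER TABLE events ADD COLUMN country VARCHAR;

-- List the snapshots recorded for a catalog (catalog name is a placeholder).
FROM ducklake_snapshots('my_ducklake');

-- Time travel: read the table as of a specific snapshot version.
-- The version number is illustrative; pick one from the snapshot list.
SELECT * FROM events AT (VERSION => 1);

-- Time travel: read the table as it was at a point in time.
SELECT * FROM events AT (TIMESTAMP => now() - INTERVAL '1 day');
```

Queries that select named columns keep working after the ALTER TABLE; only queries that depend on an exact column count would notice the change.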

Concurrency Model

DuckLake provides snapshot isolation for concurrent workloads:

Readers never block writers, and writers never block readers
Each query sees a consistent snapshot of the data as of the start of its transaction
Cross-table transactions maintain atomicity across related operations

This differs from strict serializable isolation – it’s optimized for analytic workloads, where high read concurrency is prioritized over write serialization.
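
A cross-table transaction can be sketched as follows. The two tables, page_views and daily_totals, are hypothetical and exist only for this illustration.

```sql
-- Hypothetical tables for the example.
CREATE TABLE IF NOT EXISTS page_views (user_id BIGINT, viewed_at TIMESTAMP);
CREATE TABLE IF NOT EXISTS daily_totals (day DATE, views BIGINT);

BEGIN TRANSACTION;

-- Both writes belong to the same transaction.
INSERT INTO page_views VALUES (42, TIMESTAMP '2024-01-01 12:00:00');
INSERT INTO daily_totals VALUES (DATE '2024-01-01', 1);

-- On commit, both changes become visible together.
COMMIT;
```

A reader whose transaction started before the COMMIT keeps seeing the earlier state of both tables, and a ROLLBACK in place of COMMIT discards both inserts.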

Configuration

Note: DuckLake is attached by default to all DuckDB instances in Ilum when enabled. Tables created in SQL Viewer will automatically use DuckLake.

Configure DuckLake via the Helm chart:

ilum-core:
  sql:
    duckdb:
      ducklake:
        enabled: true                          # Set false to disable DuckLake entirely
        location: s3://ilum-ducklake/          # Root path for all DuckLake data
        postgres:                              # Metadata storage (required)
          host: "ilum-postgresql-hl"           # PostgreSQL service hostname
          port: 5432                           # PostgreSQL port
          database: ducklake                   # Database name (created automatically if missing)
          user: ilum                           # Database user
          password: "CHANGEMEPLEASE"           # Database password
        s3:                                   # Data storage backend
          endpoint: ilum-minio:9000           # S3 endpoint (MinIO, AWS S3, GCS, etc.)
          region: us-east-1                   # S3 region
          keyId: minioadmin                   # Access key ID
          secret: minioadmin                  # Secret access key
          urlStyle: path                      # Path-style access (use 'virtualHost' for AWS S3)
          ssl: false                          # Enable TLS for S3 connections
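
For orientation, the values above correspond roughly to the statements one would run to attach a DuckLake catalog by hand in plain DuckDB. This is a sketch of the general DuckLake pattern, not necessarily what Ilum executes internally; the secret name ilum_s3 and the catalog alias lake are made up for the example.

```sql
INSTALL ducklake;
INSTALL postgres;

-- Object storage credentials, mirroring the s3 block of the Helm values.
CREATE SECRET ilum_s3 (
    TYPE s3,
    KEY_ID 'minioadmin',
    SECRET 'minioadmin',
    ENDPOINT 'ilum-minio:9000',
    REGION 'us-east-1',
    URL_STYLE 'path',
    USE_SSL false
);

-- Metadata lives in PostgreSQL; data files go under the configured location.
ATTACH 'ducklake:postgres:dbname=ducklake host=ilum-postgresql-hl port=5432 user=ilum password=CHANGEMEPLEASE'
    AS lake (DATA_PATH 's3://ilum-ducklake/');

-- Make the DuckLake catalog the default for new tables.
USE lake;
```

With DuckLake enabled in Ilum none of this is needed manually; the sketch only shows which setting feeds which part of the connection.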

Disabling DuckLake

Setting enabled: false causes DuckDB to fall back to in-memory tables only. Use this only for single-user scenarios where persistence isn’t required.

Usage Examples

Creating Tables

When enabled, DuckLake is automatically selected as the default storage backend for all DuckDB instances, so any table created in the default catalog is stored in DuckLake.

-- Create a table (automatically stored in DuckLake)
CREATE TABLE events (
    event_id BIGINT,
    event_time TIMESTAMP,
    user_id BIGINT,
    payload JSON
);
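
Once the table exists, it is written to and queried like any other DuckDB table. A short sketch with made-up rows:

```sql
-- Insert a couple of illustrative rows into the events table.
INSERT INTO events VALUES
    (1, TIMESTAMP '2024-01-01 12:00:00', 42, '{"action": "login"}'),
    (2, TIMESTAMP '2024-01-01 12:05:00', 42, '{"action": "view"}');

-- Aggregate per user and pull one field out of the JSON payload.
SELECT
    user_id,
    payload ->> '$.action' AS action,
    count(*) AS n_events
FROM events
GROUP BY user_id, action
ORDER BY action;
```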

Limitations

DuckDB-only: DuckLake catalogs cannot be accessed from Spark, Trino, or other engines
No branching/tagging: Unlike Nessie or Iceberg, DuckLake doesn't support Git-like version control
Concurrency bound by the metadata DB: high write concurrency may impact PostgreSQL performance
