Object Storage in Ilum

Overview

Ilum runs entirely on S3-compatible object storage. Every bundled component that reads or writes objects (Trino, Nessie, MLflow, Airflow, Kestra, Langfuse, Jupyter, Loki, Hive Metastore, the Spark History Server, the Ilum core service, and the embedded notebook environment) targets the same Service alias, which routes to whichever provider is currently active. Switching backends is therefore a Helm flag, not a fleet-wide reconfiguration.

This section explains the storage model, lists the supported providers, and links to the operational guides for choosing, migrating between, and extending them.

Quick summary
  • Default in 6.7.2 and later: RustFS (Apache-2.0, bundled).
  • Still supported and shippable: MinIO (AGPL-3.0, bundled, opt-in).
  • Bring your own: any S3-compatible backend reachable from the cluster (AWS S3, Wasabi, Backblaze B2, on-prem MinIO, Cloud Storage via S3 interop, and others).
  • Planned bundled additions: Garage and SeaweedFS (registry-ready).

The ilum-objectstorage Service alias

A stable Service named ilum-objectstorage is provisioned in the release namespace. It is a label-selector alias that points at whichever provider is currently active. The Service exposes two ports:

  • 9000 for the S3 API.
  • 9001 for the provider's web console (proxied through the Ilum UI's nginx reverse proxy as part of the Object Storage view).

Bundled consumers target this alias by hostname (http://ilum-objectstorage:9000) rather than provider-specific names like ilum-minio or ilum-rustfs-svc. Flipping the active provider is therefore a single change to the alias selector. No consumer reconfiguration is required.
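As a rough illustration of how a client might derive its endpoints from the alias, the sketch below builds the S3 API and console URLs from the Service name and ports given above. The helper name `alias_url` and the example namespace `ilum` are hypothetical. The fully qualified form follows the standard Kubernetes Service DNS pattern and assumes the default `cluster.local` cluster domain.

```python
ALIAS = "ilum-objectstorage"  # stable Service name provisioned by the chart
S3_PORT = 9000                # S3 API
CONSOLE_PORT = 9001           # provider web console


def alias_url(port, namespace=None, cluster_domain="cluster.local"):
    """Return an http:// URL for the alias Service (hypothetical helper).

    Without a namespace the short Service name is used, which resolves
    only from pods in the release namespace. With a namespace, the
    standard Kubernetes Service DNS name is used instead.
    """
    if namespace is None:
        host = ALIAS
    else:
        host = f"{ALIAS}.{namespace}.svc.{cluster_domain}"
    return f"http://{host}:{port}"


s3_endpoint = alias_url(S3_PORT)
console_endpoint = alias_url(CONSOLE_PORT, namespace="ilum")  # "ilum" is an assumed namespace
```

Because only the hostname is fixed, a client configured this way keeps working after the active provider changes.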

The provider registry

The set of providers known to the chart lives under objectStorage.providers in the helm_aio values:

objectStorage:
  providers:
    rustfs:
      consolePath: /rustfs/console/
      consoleMode: same-origin
    minio:
      consolePath: /external/minio/
      consoleMode: nginx-rewrite

Each entry declares the provider's iframe path and routing mode for the Ilum UI's Object Storage view. The chart ships entries for the two bundled providers. Adding a third (Garage, SeaweedFS, or any S3-compatible backend) is a values-file edit. See Add a New Object Storage Provider.

Active provider, previous provider, cutover

Three flags determine which provider the alias targets when more than one is enabled:

  • objectStorage.activeProvider — explicit override. When set to a provider name (rustfs, minio, ...) the alias targets that provider unconditionally. The default value auto defers to the resolution rules below.
  • objectStorage.previousProvider — names the data-bearing side during a cutover. Defaults to minio for back-compat with installs that predated the registry.
  • objectStorage.cutoverAcknowledged — flips the alias from previousProvider to the other enabled provider once the operator has finished migrating data. Defaults to false.

A legacy rustfs.migrationAcknowledged flag continues to be accepted as an alias for cutoverAcknowledged so existing values overlays survive the upgrade unchanged.

Resolution rules

With activeProvider=auto (the default), the chart resolves the active provider from the set of enabled providers:

| Enabled providers | cutoverAcknowledged | Alias targets |
|---|---|---|
| None | (irrelevant) | no alias rendered (BYO external S3) |
| One | (irrelevant) | that provider |
| Two | false | previousProvider (data-bearing side) |
| Two | true | the other one (post-cutover) |
| Three or more | (irrelevant) | render-time error asking for an explicit activeProvider |

With an explicit activeProvider, the alias targets the named provider verbatim. If no pods carry the matching app.kubernetes.io/name label, the alias has no endpoints and the ilum-core readiness probe surfaces the misconfiguration in pod logs.
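The resolution rules can be modelled as a small function. This is a minimal Python sketch of the documented behavior, not the chart's actual template logic. The function name and signature are hypothetical, and the error raised when previousProvider is not one of the two enabled providers is an assumption, since that case is not described here.

```python
def resolve_active_provider(enabled, active_provider="auto",
                            previous_provider="minio",
                            cutover_acknowledged=False):
    """Return the provider the alias should target, or None for no alias."""
    enabled = list(enabled)

    # An explicit override wins unconditionally and is taken verbatim.
    if active_provider != "auto":
        return active_provider

    if len(enabled) == 0:
        return None          # no alias rendered (BYO external S3)
    if len(enabled) == 1:
        return enabled[0]    # the only enabled provider
    if len(enabled) >= 3:
        raise ValueError(
            "three or more providers enabled; set activeProvider explicitly")

    # Exactly two providers are enabled.
    if not cutover_acknowledged:
        return previous_provider  # data-bearing side during the cutover

    others = [p for p in enabled if p != previous_provider]
    if len(others) != 1:
        # Assumption: the documented rules do not cover this case.
        raise ValueError("previousProvider is not one of the enabled providers")
    return others[0]              # post-cutover target
```

For example, with both bundled providers enabled and the defaults left alone, the function returns `minio`. Passing `cutover_acknowledged=True` returns `rustfs`.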

Common operator scenarios

The following flows are documented elsewhere in this section. Use this list as a map to the right guide for the situation at hand.

Default bucket layout

The chart provisions seven default buckets on the active provider. Each bucket is owned by one or more bundled consumers. Operators tuning the chart's defaults should consult the table below before renaming or removing entries.

| Bucket | Consumers | Configurable via |
|---|---|---|
| ilum-files | ilum-core (Spark jars, job artifacts), ilum-jupyter, ilum-kyuubi, ilum-hive-metastore, Loki | objectStorage.defaultBuckets |
| ilum-data | ilum-core warehouse, Hive Metastore, Unity Catalog, Trino, Airflow logs | objectStorage.defaultBuckets, per-consumer warehouseDir overrides |
| ilum-tables | ilum-core data tables (Iceberg, Delta, Hudi) | ilum-core.kubernetes.s3.dataBucket |
| ilum-mlflow | MLflow tracking artifacts | mlflow.s3.bucket |
| ilum-kestra | Kestra workflow internal storage | kestra.config.storage_driver.defaults.s3.bucket |
| ilum-ducklake | DuckDB DuckLake catalog | duckdb.ducklake.location |
| ilum-langfuse | Langfuse trace storage | langfuse.s3.bucket |

The bucket list is created idempotently by the bundled init Job (init-rustfs-buckets for RustFS, init-minio-policies for MinIO). For external S3 backends, the operator creates the buckets manually before installing Ilum. See Provider reference: External S3.
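For an external backend, the manual step could be scripted along the following lines. This is a hedged sketch rather than a tool shipped with Ilum: the function `ensure_buckets` is hypothetical, and it assumes a client object with boto3-style `list_buckets()` and `create_bucket(Bucket=...)` methods. It also assumes the credentials are allowed to list all buckets.

```python
# The seven default bucket names from the table above.
DEFAULT_BUCKETS = [
    "ilum-files",
    "ilum-data",
    "ilum-tables",
    "ilum-mlflow",
    "ilum-kestra",
    "ilum-ducklake",
    "ilum-langfuse",
]


def ensure_buckets(s3_client, wanted=DEFAULT_BUCKETS):
    """Create each bucket in `wanted` that does not exist yet.

    Returns the names that were created, so a second run returns an
    empty list (idempotent, like the bundled init Job is described to be).
    """
    response = s3_client.list_buckets()
    existing = {bucket["Name"] for bucket in response.get("Buckets", [])}

    created = []
    for name in wanted:
        if name not in existing:
            s3_client.create_bucket(Bucket=name)
            created.append(name)
    return created
```

With boto3 installed, a client for an external endpoint can be built with `boto3.client("s3", endpoint_url=...)` and passed in as `s3_client`. On AWS S3 outside us-east-1, `create_bucket` also needs a `CreateBucketConfiguration` with a `LocationConstraint`, which this sketch omits.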

How the alias plugs into the stack

The same alias hostname routes both the Ilum UI's iframe traffic and every consumer's S3 API calls. Switching providers updates the alias selector; no consumer rewiring is required.

Reference