Production Deployment Guide
This comprehensive guide provides detailed instructions for deploying Ilum in production environments with enhanced security, namespace separation, and multiple configuration options to meet diverse operational requirements.
Table of Contents
- Overview
- Kubernetes Prerequisites
- Architecture
- High Availability Configuration
- Failure Domains & Resilience
- Dynamic Resource Scaling
- Namespace Separation Strategy
- Security Configuration
- Dependency Deployment
- Pre-configured Stack Options
- Helm Values Configuration
- Installation Instructions
- Post-Installation Configuration
- Troubleshooting
Overview
For production environments, it's strongly recommended to deploy critical dependencies in separate namespaces to achieve:
- Enhanced Security: Namespace-level isolation and RBAC policies
- Resource Management: Independent resource quotas and limits
- Operational Excellence: Simplified maintenance and upgrades
- Compliance: Meeting organizational separation requirements
- Scalability: Independent scaling of components
Critical Components for Namespace Separation
The following components should be deployed in separate namespaces for production:
- PostgreSQL: Primary metadata store for ilum-core and shared dependency for Marquez, Hive Metastore, Airflow, Superset, MLflow, and others
- MongoDB: Legacy metadata store, still supported for existing deployments
- Apache Kafka: Message broker and communication layer (required for ilum-core HA)
- MinIO: Object storage for Spark applications and data
Kubernetes Prerequisites
Ilum has been extensively tested across all leading Kubernetes environments, ensuring compatibility with a variety of deployment scenarios:
Supported Platforms
- Lightweight Distributions: k3s, Rancher, MicroK8s
- Bare-metal Clusters: Self-managed Kubernetes installations
- Managed Services:
- Google Kubernetes Engine (GKE)
- Amazon Elastic Kubernetes Service (EKS)
- Azure Kubernetes Service (AKS)
- DigitalOcean Kubernetes
- Red Hat OpenShift
Minimum Suggested Requirements
| Component | Requirement |
|---|---|
| Kubernetes Version | 1.20+ |
| CPU | 8 cores minimum, 16+ recommended |
| Memory | 16GB minimum, 32GB+ recommended |
Air-gapped (Offline) Environments
For air-gapped installations, refer to our comprehensive Air-gapped Installation Guide.
Testing vs Production
Minikube is used throughout our documentation for demonstration purposes but is not suitable for production due to limitations in scalability, resource management, and high availability.
Architecture
Components and modules
The Ilum platform consists of three core services and a curated set of optional modules:
- ilum-core: Main backend service. Hosts the public REST API, job and cluster orchestration, multi-engine SQL execution, security, and lineage capture.
- ilum-ui: React-based web frontend. Hosts the SQL Editor, Table Explorer, Lineage view, Workloads management, and the Modules registry.
- ilum-api: Module-management microservice. Drives Helm-based install, upgrade, and disable of optional Ilum modules at runtime via cluster-scoped RBAC. Future releases will extend ilum-api with Model Context Protocol (MCP) capabilities and open APIs for third-party extension.
Optional modules include execution engines (Trino, Apache Flink), additional catalogs (Project Nessie, Unity Catalog, DuckLake), notebooks (JupyterHub, Zeppelin), orchestration (Airflow, Kestra, Mage, n8n, NiFi), BI tools (Superset, Streamlit), AI and ML stacks (MLflow, LangFuse), and observability components (Kube Prometheus stack, Loki, Promtail).
PostgreSQL
PostgreSQL is the primary metadata store for ilum-core (accessed via R2DBC with jOOQ-generated SQL DSL). It is also used by shared dependencies including Marquez, Hive Metastore, Airflow, Superset, MLflow, Hydra, Gitea, n8n, and Kestra to store their respective metadata.
PostgreSQL is enabled in Ilum by default.
To control whether PostgreSQL is enabled, use the Helm value postgresql.enabled. For example, to disable it, add --set postgresql.enabled=false to your installation command.
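The same setting can be kept in a values file instead of on the command line. The fragment below is a minimal sketch that uses only the postgresql.enabled key named above. The file name values-production.yaml is a hypothetical example, and connection settings for an externally managed PostgreSQL are not shown because this section does not cover them.

```yaml
# values-production.yaml (hypothetical file name)
# Equivalent to: --set postgresql.enabled=false
postgresql:
  enabled: false   # skip the bundled PostgreSQL, e.g. when it runs in its own namespace
```

A file like this would be passed to Helm with the -f (--values) flag in place of the individual --set flag.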
MongoDB
MongoDB is supported as a legacy metadata store for ilum-core. New deployments should use PostgreSQL; existing MongoDB-backed deployments remain fully supported. Ilum ships migration tooling (M001 through M009 scripts) for moving from MongoDB to PostgreSQL.
Ilum automatically creates all necessary databases and collections during the startup process when MongoDB is in use.
Apache Kafka
Apache Kafka serves as Ilum's communication layer, facilitating interaction between Ilum-Core and Spark jobs, as well as between different Ilum-Core instances when scaled. It is critical to ensure Apache Kafka brokers are accessible by both Ilum-Core and Spark jobs, especially when Spark jobs are launched on a different Kubernetes cluster.
Ilum utilizes Kafka to carry out communication using several topics, all created during Ilum's startup. Therefore, users don't need to manage these topics manually.
MinIO
Ilum uses MinIO as the storage layer for Spark application components. All files (including jars, configurations, data files) needed for the operation of Spark components (driver, executors) are stored and made available for download via MinIO.
MinIO implements the S3 interface, which also enables it to store input/output data.
Ilum-Livy-proxy
Ilum-Livy-proxy is Ilum's implementation of the Livy API. It integrates Spark code with Ilum Groups in services such as Jupyter, Zeppelin, and Airflow.
Ilum-Livy-proxy is enabled in Ilum by default.
To enable or disable Ilum-Livy-proxy, use the ilum-livy-proxy.enabled Helm value.
For example, add --set ilum-livy-proxy.enabled=false to disable it.
Read more about Ilum-Livy-proxy here
Jupyter
Jupyter is a notebook: a development environment that combines code, charts, explanations, and more in one executable document.
Jupyter is enabled in Ilum by default.
To control whether it is enabled, use the Helm value ilum-jupyter.enabled. For example, add --set ilum-jupyter.enabled=false to your installation command to disable it.
Jupyter uses Ilum-Livy-proxy to integrate with Ilum Groups, so make sure Ilum-Livy-proxy is enabled as well:
--set ilum-livy-proxy.enabled=true
If you want to access the Jupyter UI, you can do it by:
- using Ilum UI: go to Modules > Jupyter
- configuring an ingress
- using the port-forward command
kubectl port-forward svc/ilum-jupyter 8888:8888
Read more about Jupyter here
Apache Zeppelin
Zeppelin is a notebook: a development environment that combines code, charts, explanations, and more in one executable document.
Zeppelin is not bundled in the Ilum package by default. To run this service, add --set ilum-zeppelin.enabled=true to your installation command.
Zeppelin uses Ilum-Livy-proxy to integrate with Ilum Groups, so make sure Ilum-Livy-proxy is enabled as well:
--set ilum-livy-proxy.enabled=true
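Because Zeppelin depends on Ilum-Livy-proxy, it can help to declare both settings together. The sketch below restates the two --set flags from this section as a Helm values fragment. It introduces no keys beyond those already named, and the idea of keeping them in a values file is only a suggestion.

```yaml
# Equivalent to:
#   --set ilum-zeppelin.enabled=true --set ilum-livy-proxy.enabled=true
ilum-zeppelin:
  enabled: true    # Zeppelin is not bundled by default
ilum-livy-proxy:
  enabled: true    # already the default; stated explicitly because Zeppelin relies on it
```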
If you want to access the Zeppelin UI, the best way to do it is by configuring an ingress or using the port-forward command kubectl port-forward svc/ilum-zeppelin 8080:8080
Read more about Zeppelin here
Hive Metastore
Hive Metastore is a metadata storage used to store your Spark catalogs (Spark tables, databases, views, and more) in a database instead of runtime memory. You can view these schemas later on the Table Explorer page.
Hive Metastore is not enabled in Ilum by default.
To enable the Hive Metastore bundled instance, set the following values in your Helm installation command:
ilum-core:
  metastore:
    enabled: true
    type: hive
ilum-hive-metastore:
  enabled: true
Hive Metastore uses a PostgreSQL database to store metadata. See the PostgreSQL section above.
Project Nessie
Nessie is a transactional catalog for your data. It was inspired by Git and is designed to support a wide range of data-lake tooling. It works best with Apache Iceberg tables.
To learn more about Nessie, visit the Nessie documentation page.
Project Nessie is not enabled in Ilum by default. To enable it, set the following values in your Helm installation command:
ilum-core:
  metastore:
    enabled: true
    type: nessie
nessie:
  enabled: true
Ilum SQL (Kyuubi Gateway)
Ilum SQL is not enabled in Ilum by default.
Ilum SQL is the multi-engine SQL gateway built on Apache Kyuubi. It exposes a single JDBC and REST entry point for queries that route to Apache Spark, Trino, DuckDB, or Apache Flink based on the engine selected for each query, or chosen by the automatic engine router.
To enable Ilum SQL, add --set ilum-sql.enabled=true to enable the SQL execution host and --set ilum-core.sql.enabled=true to enable the SQL features inside Ilum itself.
Read more about the SQL Editor on the SQL Editor page.
Trino
Note: Trino is not enabled in Ilum by default. To enable it, add --set trino.enabled=true to deploy a built-in Trino distribution.
Trino is a distributed SQL query engine well suited to interactive analytics on medium-to-large datasets and for federated queries across multiple data sources. It complements Spark for workloads that benefit from low-latency response times.
Once enabled, Trino is reachable through the Ilum SQL gateway and selectable from the Engine Selector in the SQL Editor. Read more on the SQL Editor page.
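Since Trino is reached through the Ilum SQL gateway, and Ilum SQL is itself disabled by default, a deployment that wants Trino in the SQL Editor would plausibly set all three flags from the Ilum SQL and Trino sections. The fragment below is a sketch under that assumption. The source does not state whether trino.enabled requires the gateway flags, so treat the combination as illustrative.

```yaml
# Equivalent to:
#   --set trino.enabled=true
#   --set ilum-sql.enabled=true
#   --set ilum-core.sql.enabled=true
trino:
  enabled: true      # deploy the built-in Trino distribution
ilum-sql:
  enabled: true      # SQL execution host (Kyuubi gateway)
ilum-core:
  sql:
    enabled: true    # SQL features inside ilum-core
```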
DuckDB and DuckLake
DuckDB is enabled by default and provides single-node SQL execution for small-to-medium data, ad-hoc exploration, and DuckLake-managed tables. It runs in-process with ilum-core and is selectable from the Engine Selector in the SQL Editor.
DuckLake is the DuckDB-native catalog, enabled by default, with table data stored in MinIO (or any configured S3-compatible backend). No additional Helm values are required to use DuckDB or DuckLake.
Apache Flink
Apache Flink support is available for Enterprise deployments as a Beta feature. Flink is exposed through the Kyuubi SQL gateway for low-latency stream processing. Contact Ilum for enablement details.
n8n
Note: n8n is not enabled in Ilum by default. To enable it, add --set ilum-n8n.enabled=true to enable a built-in n8n distribution.
n8n is a fair-code workflow automation platform with native AI capabilities.
Read more about it on the n8n page.
Apache Airflow
Apache Airflow is a powerful platform for orchestrating and managing data workflows. To read more about Airflow in Ilum, visit the Airflow documentation page.
Airflow is not enabled in the Ilum package by default.
To deploy Airflow, add --set airflow.enabled=true to your installation command.
Once enabled, Airflow will appear in the Ilum UI under the Modules section.
Airflow can leverage Ilum’s Livy proxy to easily create jobs within Ilum. For more details, see the Livy proxy section above.
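As a values-file counterpart to the flag above, the following sketch enables Airflow and states the Livy proxy setting explicitly. Both keys come from this guide. Listing ilum-livy-proxy here is optional, since it is enabled by default, and is done only to make the dependency visible.

```yaml
# Equivalent to: --set airflow.enabled=true --set ilum-livy-proxy.enabled=true
airflow:
  enabled: true      # Airflow then appears in the Ilum UI under Modules
ilum-livy-proxy:
  enabled: true      # default value; lets Airflow create jobs in Ilum through the Livy API
```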
Marquez
Marquez is an open-source metadata management tool that focuses on capturing, aggregating, and visualizing the lineage of data assets within an organization’s data ecosystem. It tracks how datasets are produced and consumed by different jobs and provides a central view of these dependencies.
Marquez is not bundled in the Ilum package by default.
To run this service, add --set global.lineage.enabled=true to your installation command.
Marquez uses a PostgreSQL database to store its metadata. See the PostgreSQL section above.
Additionally, to use Marquez's web client instead of Ilum's UI, enable the default web client with
--set ilum-marquez.web.enabled=true and set up one of the access methods:
- use the port-forward command
kubectl port-forward svc/ilum-marquez-web 9444:9444
- configure an ingress
Read more about Marquez and Ilum Lineage here
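The two Marquez-related flags in this section live under different top-level keys, which is easier to see in values-file form. The fragment below is a minimal sketch built only from those two flags. The web client entry can be dropped when Ilum's own Lineage view is sufficient.

```yaml
# Equivalent to:
#   --set global.lineage.enabled=true --set ilum-marquez.web.enabled=true
global:
  lineage:
    enabled: true    # turns on lineage capture backed by Marquez
ilum-marquez:
  web:
    enabled: true    # optional: Marquez's own web client (service ilum-marquez-web)
```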
Kestra
Kestra is an open-source data orchestration platform designed for orchestrating and automating data pipelines and business workflows. You can read about it here.
Kestra is not enabled in Ilum by default. To enable it, add --set kestra.enabled=true to your installation command.
Kestra uses a PostgreSQL database to store data about jobs and tasks, and MinIO for general file storage.
Mage
Mage is an open-source data engineering platform that simplifies the process of building, deploying, and maintaining data pipelines. It provides a user-friendly interface for creating data workflows, integrating with various data sources, and managing data transformations.
To read about using Mage with Ilum, visit the documentation page.
Mage is not enabled in Ilum by default. To enable it, add --set mageai.enabled=true to your installation command.
Ilum deploys Mage OSS, which is the open-source version of Mage, and does not include the commercial features available in Mage Pro (Cloud).
NiFi
Apache NiFi is a software project for building data processing pipelines. It provides a user interface for creating, managing, and deploying data processing pipelines.
To read about using NiFi with Ilum, visit the documentation page.
NiFi is not enabled in Ilum by default. To enable it, add --set nifi.enabled=true to your installation command.
Streamlit
Streamlit is a library for creating beautiful, performant, and scalable data apps in Python. It is used to build custom data apps accessible from the Ilum UI.
To read about using Streamlit with Ilum, visit the documentation page.
Streamlit is not enabled in Ilum by default.
To enable it, add --set streamlit.enabled=true to your installation command.
Additionally, you can provide a Docker image with your Streamlit application running in it. To see how to do it, visit our documentation page.
Kube Prometheus Stack
Kube Prometheus Stack includes Prometheus, Grafana, and other tools for monitoring your data infrastructure.
Kube Prometheus Stack is not bundled in the Ilum package by default. To run this service, add --set kube-prometheus-stack.enabled=true to your installation command.
If you are upgrading an existing Ilum Helm chart that previously did not have the Kube Prometheus Stack enabled, you must first install the required Prometheus Custom Resource Definitions (CRDs) before proceeding with the upgrade. To do this, run the following commands:
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.80.0/example/prometheus-operator-crd/monitoring.coreos.com_alertmanagerconfigs.yaml
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.80.0/example/prometheus-operator-crd/monitoring.coreos.com_alertmanagers.yaml
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.80.0/example/prometheus-operator-crd/monitoring.coreos.com_podmonitors.yaml
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.80.0/example/prometheus-operator-crd/monitoring.coreos.com_probes.yaml
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.80.0/example/prometheus-operator-crd/monitoring.coreos.com_prometheusagents.yaml
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.80.0/example/prometheus-operator-crd/monitoring.coreos.com_prometheuses.yaml
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.80.0/example/prometheus-operator-crd/monitoring.coreos.com_prometheusrules.yaml
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.80.0/example/prometheus-operator-crd/monitoring.coreos.com_scrapeconfigs.yaml
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.80.0/example/prometheus-operator-crd/monitoring.coreos.com_servicemonitors.yaml
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.80.0/example/prometheus-operator-crd/monitoring.coreos.com_thanosrulers.yaml
If you want to access the Prometheus UI, the best way to do it is by configuring an ingress or using the port-forward command kubectl port-forward svc/prometheus-operated 9090:9090
If you want to access the Grafana UI, the best way to do it is by configuring an ingress or using the port-forward command kubectl port-forward svc/ilum-grafana 8080:80
Loki and Promtail
Loki is used to gather and manage the logs of your data infrastructure. Promtail is the agent that pushes logs into Loki.
Loki is not enabled in Ilum by default. To run this service, add --set global.logAggregation.loki.enabled=true to your installation command.
Promtail is also not enabled in Ilum by default. To enable it, add --set global.logAggregation.promtail.enabled=true to your installation command.
To access Loki and run Loki queries, configure an ingress or use the port-forward commands kubectl port-forward svc/ilum-loki-read 3100:3100 for read queries and kubectl port-forward svc/ilum-loki-write 3100:3100 for write queries. You can also use the ilum-loki-gateway service to link Grafana to Loki.
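Loki and Promtail are normally enabled as a pair, since Promtail is what ships logs into Loki. The sketch below expresses both flags as a single values fragment, assuming that the Promtail key sits under global.logAggregation alongside the Loki key shown above.

```yaml
# Equivalent to:
#   --set global.logAggregation.loki.enabled=true
#   --set global.logAggregation.promtail.enabled=true
global:
  logAggregation:
    loki:
      enabled: true      # log storage and query backend
    promtail:
      enabled: true      # agent that pushes logs into Loki
```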
Production Architecture Overview
The diagram below recommends where to place each optionally deployed component: in the namespace of the Ilum release or in a separate, dedicated namespace.
┌─────────────────────────┐ ┌────────────────────────────┐
│ ILUM │ │ Dependencies │
│ components │ │ separated │
│ namespace │ │ namespaces │
├─────────────────────────┤ ├────────────────────────────┤
│ • Ilum Core ** │ │ • PostgreSQL ** │
│ • Ilum UI ** │ │ • MongoDB (legacy) │