🗃️ Execution Engines

Multi-engine SQL execution in Ilum. Apache Spark, Trino, DuckDB, and Apache Flink, unified behind the Apache Kyuubi gateway with an automatic engine router.

Multi-engine SQL workbench for Ilum. Run queries against Spark, Trino, DuckDB, and Flink from a single browser-based editor with engine lifecycle controls, dialect transpilation, in-app notebooks, profiling, and visualization.

🗃️ Data Catalogs

Explore Ilum's four supported data catalogs: Hive Metastore (default), Project Nessie for Git-like branching, Unity Catalog for governance, and DuckLake for DuckDB-native local analytics.

📄️ Schedule

Automate your Spark jobs with Ilum's built-in scheduler. Easily configure periodic tasks for data ingestion, ETL pipelines, and reporting using an intuitive UI or custom CRON expressions.

📄️ File Explorer

Browse and manage files across multiple storage systems, profile data quality, and create tables from raw files. Supports S3, GCS, HDFS, and Azure Storage.

🗃️ Notebooks

Discover Ilum's notebook capabilities. Compare JupyterLab, JupyterHub, and Zeppelin environments to choose the best tool for your interactive analytics, data science, and collaborative workflows.

📄️ Table Explorer

Browse and monitor your datasets with the Table Explorer. Inspect table metadata, view data lineage, and interactively visualize data samples using the built-in Data Exploration Tool.

📄️ Data Lineage

Data lineage is the process of tracking data flow from source to destination. Learn how to visualize data, understand the flow of data through pipelines, and implement data lineage for your organization.

📄️ OpenMetadata

OpenMetadata in Ilum: unified data catalog, governance, and OpenLineage-based column-level lineage across Spark, Hive, Delta Lake, and Airflow.

📄️ Ilum Tables

Ilum Tables provides a unified Spark wrapper for Delta, Iceberg, and Hudi data formats. Learn how to easily read, write, and stream data using a consistent interface for flexible data management.

📄️ Clusters and Storages

Manage your multi-cluster Spark infrastructure and diverse storage systems from a centralized control plane. Simplify access, enhance security, and streamline job deployment across local, GKE, and other clusters.

📄️ Spark Connect Server

Leverage Spark Connect in Ilum to decouple client applications from Spark clusters. Learn how to run remote Spark jobs, build interactive applications, and connect securely from local environments.

📄️ Airflow

Technical guide for orchestrating Spark on Kubernetes using Apache Airflow and Ilum. Covers LivyOperator configuration, dependency management, logging architecture, and Git Sync.

📄️ Monitoring

A comprehensive guide to monitoring Apache Spark on Kubernetes with Ilum. Learn to configure Prometheus metrics, visualize data in Grafana, analyze logs with Loki, and debug performance issues like OOM errors and data skew.

📄️ Superset

Configure Apache Superset on Kubernetes with Ilum. Comprehensive guide on Spark/Trino integration, SQLAlchemy tuning, Helm deployment strategies, and performance optimization for enterprise BI.

📄️ Tableau

Technical integration guide for connecting Tableau Desktop to Ilum via JDBC.

📄️ MLflow

Technical documentation for MLflow integration in Ilum. Architecture, Spark autologging, model registry operations, and batch inference implementation on Kubernetes.

📄️ NiFi

A comprehensive technical guide on integrating Apache NiFi with Ilum. Learn to orchestrate event-driven data pipelines, trigger Spark jobs via REST API, and manage robust data flows.

📄️ Observability

Per-run distributed traces and per-table cost attribution for Spark pipelines, surfaced as Pipeline Trace and Cost tabs on every Ilum job detail view.

📄️ Power BI

Technical integration guide for connecting Power BI to Ilum via JDBC/ODBC.

📄️ AI Data Analyst Agent

Discover Ilum's AI Data Analyst Agent, an intelligent assistant powered by LLMs that translates natural language into optimized SQL queries, integrates with your data catalog, and simplifies complex data analysis for all users.

📄️ n8n

Automate your data workflows with Ilum's n8n integration. Build visual pipelines connecting Apache Spark, APIs, and databases using a low-code interface with custom nodes for deep platform integration.

📄️ Data Science Platform

Explore Ilum's unified Data Science Platform, offering seamless data access, pre-configured notebook environments, automated MLOps with MLflow, and tools to build and deploy AI applications at scale.

📄️ Kestra

Orchestrate Apache Spark jobs on Kubernetes with Kestra and Ilum. Learn to build declarative, event-driven data pipelines using YAML workflows and to optimize submission latency.

📄️ Resource Control & Governance

Master Kubernetes resource management in Ilum. Deep dive into Resource Quotas, Limit Ranges, and their impact on Spark workloads for multi-tenant cluster stability.

📄️ Mage

Technical guide for integrating Mage with Ilum. Learn how to architect production-grade ETL pipelines using Spark Connect, manage Kubernetes resources, and configure custom Docker environments for distributed data processing.

📄️ Streamlit

Build and deploy enterprise-grade data applications with Streamlit in Ilum. Comprehensive guide on integrating Spark Connect, optimizing performance with caching, and configuring Kubernetes deployments for scalable data dashboards.

📄️ LangFuse

LLM observability for AI-driven workloads on Ilum. Trace prompts, completions, and agent steps alongside the data pipelines that produce their inputs.

📄️ Performance & Query Optimization

MPP query execution, vectorized processing, caching strategies, and workload management in Ilum's dual-engine architecture.

📄️ ClickHouse

ClickHouse on Ilum. An optional analytics-store module for low-latency OLAP queries alongside the lakehouse.

📄️ Data Lifecycle Management

Manage the data lifecycle in Ilum with Iceberg table maintenance, storage tiering, retention policies, compaction, and automated cleanup operations.

Features