🗃️ Execution Engines
Multi-engine SQL execution in Ilum: Apache Spark, Trino, DuckDB, and Apache Flink, unified behind the Apache Kyuubi gateway with an automatic engine router.
📄️ SQL Editor
Multi-engine SQL workbench for Ilum. Run queries against Spark, Trino, DuckDB, and Flink from a single browser-based editor with engine lifecycle controls, dialect transpilation, in-app notebooks, profiling, and visualization.
🗃️ Data Catalogs
Explore Ilum's four supported data catalogs: Hive Metastore (default), Project Nessie for Git-like branching, Unity Catalog for governance, and DuckLake for DuckDB-native local analytics.
📄️ Schedule
Automate your Spark jobs with Ilum's built-in scheduler. Easily configure periodic tasks for data ingestion, ETL pipelines, and reporting using an intuitive UI or custom CRON expressions.
📄️ File Explorer
Browse and manage files across multiple storage systems, profile data quality, and create tables from raw files. Supports S3, GCS, HDFS, and Azure Storage.
🗃️ Notebooks
Discover Ilum's notebook capabilities. Compare JupyterLab, JupyterHub, and Zeppelin environments to choose the best tool for your interactive analytics, data science, and collaborative workflows.
📄️ Table Explorer
Browse and monitor your datasets with the Table Explorer. Inspect table metadata, view data lineage, and interactively visualize data samples using the built-in Data Exploration Tool.
📄️ Data Lineage
Data lineage is the process of tracking data flow from source to destination. Learn how to visualize data, understand the flow of data through pipelines, and implement data lineage for your organization.
📄️ Ilum Tables
Ilum Tables provides a unified Spark wrapper for Delta, Iceberg, and Hudi data formats. Learn how to easily read, write, and stream data using a consistent interface for flexible data management.
📄️ Clusters and Storages
Manage your multi-cluster Spark infrastructure and diverse storage systems from a centralized control plane. Simplify access, enhance security, and streamline job deployment across local, GKE, and other clusters.
📄️ Spark Connect Server
Leverage Spark Connect in Ilum to decouple client applications from Spark clusters. Learn how to run remote Spark jobs, build interactive applications, and connect securely from local environments.
📄️ Airflow
Technical guide for orchestrating Spark on Kubernetes using Apache Airflow and Ilum. Covers LivyOperator configuration, dependency management, logging architecture, and Git Sync.
📄️ Monitoring
A comprehensive guide to monitoring Apache Spark on Kubernetes with Ilum. Learn to configure Prometheus metrics, visualize data in Grafana, analyze logs with Loki, and debug performance issues like OOM errors and data skew.
📄️ Superset
Configure Apache Superset on Kubernetes with Ilum. Comprehensive guide on Spark/Trino integration, SQLAlchemy tuning, Helm deployment strategies, and performance optimization for enterprise BI.
📄️ Tableau
Technical integration guide for connecting Tableau Desktop to Ilum via JDBC.
📄️ MLflow
Technical documentation for MLflow integration in Ilum. Architecture, Spark autologging, model registry operations, and batch inference implementation on Kubernetes.
📄️ NiFi
A comprehensive technical guide on integrating Apache NiFi with Ilum. Learn to orchestrate event-driven data pipelines, trigger Spark jobs via REST API, and manage robust data flows.
📄️ Power BI
Technical integration guide for connecting Power BI to Ilum via JDBC/ODBC.
📄️ AI Data Analyst Agent
Discover Ilum's AI Data Analyst Agent, an intelligent assistant powered by LLMs that translates natural language into optimized SQL queries, integrates with your data catalog, and simplifies complex data analysis for all users.
📄️ n8n
Automate your data workflows with Ilum's n8n integration. Build visual pipelines connecting Apache Spark, APIs, and databases using a low-code interface with custom nodes for deep platform integration.
📄️ Data Science Platform
Explore Ilum's unified Data Science Platform, offering seamless data access, pre-configured notebook environments, automated MLOps with MLflow, and tools to build and deploy AI applications at scale.
📄️ Kestra
Orchestrate Apache Spark jobs on Kubernetes with Kestra and Ilum. Learn to build declarative, event-driven data pipelines using YAML workflows and to optimize submission latency.
📄️ Resource Control & Governance
Master Kubernetes resource management in Ilum. Deep dive into Resource Quotas, Limit Ranges, and their impact on Spark workloads for multi-tenant cluster stability.
📄️ Mage
Technical guide for integrating Mage with Ilum. Learn how to architect production-grade ETL pipelines using Spark Connect, manage Kubernetes resources, and configure custom Docker environments for distributed data processing.
📄️ Streamlit
Build and deploy enterprise-grade data applications with Streamlit in Ilum. Comprehensive guide on integrating Spark Connect, optimizing performance with caching, and configuring Kubernetes deployments for scalable data dashboards.
📄️ LangFuse
LLM observability for AI-driven workloads on Ilum. Trace prompts, completions, and agent steps alongside the data pipelines that produce their inputs.
📄️ Performance & Query Optimization
MPP query execution, vectorized processing, caching strategies, and workload management in Ilum's dual-engine architecture.
📄️ ClickHouse
ClickHouse on Ilum: an optional analytics-store module for low-latency OLAP queries alongside the lakehouse.
📄️ Data Lifecycle Management
Manage the data lifecycle in Ilum with Iceberg table maintenance, storage tiering, retention policies, compaction, and automated cleanup operations.