
What is Ilum?

Modular Data Lakehouse Platform for Cloud-Native Apache Spark Workloads

Ilum is an Apache Spark management platform designed for Kubernetes and Apache Hadoop Yarn environments. It provides enterprise-grade cluster orchestration, interactive Spark sessions via REST API, and seamless integration with modern data engineering tools including Jupyter, Apache Airflow, MLflow, and Delta Lake/Iceberg/Hudi table formats.

Key Capabilities

  • Kubernetes-native Spark operator with automatic pod orchestration and resource management
  • Multi-cluster management across cloud providers (GKE, EKS, AKS) and on-premise deployments
  • Interactive Spark sessions accessible through REST API endpoints for building Spark-based microservices
  • Apache Hadoop Yarn integration for hybrid cluster architectures
  • Built-in S3-compatible object storage for cloud-native data lake architectures
  • Horizontal scalability from single-node development to production clusters with hundreds of executors

Get started with Ilum → | View architecture documentation →

Ilum - Apache Spark on Kubernetes Platform

Ilum transforms Apache Spark cluster management by providing a unified control plane for Kubernetes and Yarn-based Spark deployments. Unlike traditional Spark management approaches, Ilum treats Spark applications as first-class microservices with REST API interfaces, enabling real-time data processing architectures.

Architecture

Ilum's architecture consists of two core components:

  1. Ilum Core: Backend service providing gRPC/Kafka-based job orchestration, cluster state management, and REST API endpoints
  2. Ilum UI: Web-based dashboard for cluster monitoring, job submission, and resource visualization

The platform supports both Python (PySpark) and Scala programming languages, with native integration for Spark SQL, Spark Streaming, and MLlib frameworks.

Data Lakehouse Capabilities

Ilum provides comprehensive support for modern table formats:

  • Delta Lake: ACID transactions with time travel and schema evolution
  • Apache Iceberg: Partition evolution and hidden partitioning for large-scale analytics
  • Apache Hudi: Record-level upserts and incremental data processing
  • Ilum Tables: Unified API abstraction across multiple table formats

Integration with Hive Metastore and Nessie catalog enables SQL-based metadata management and Git-like data versioning.
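As one concrete example, wiring Delta Lake into a Spark deployment typically comes down to two standard Spark SQL settings, shown here as a `spark-defaults.conf` fragment (Iceberg and Hudi ship analogous extension and catalog classes):

```properties
spark.sql.extensions              io.delta.sql.DeltaSparkSessionExtension
spark.sql.catalog.spark_catalog   org.apache.spark.sql.delta.catalog.DeltaCatalog
```

With these in place, `CREATE TABLE ... USING delta` and time-travel queries work from any Spark SQL session on the cluster.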

REST API for Spark Microservices

Ilum exposes Spark functionality through RESTful endpoints:

```
# Submit Spark job
POST /api/v1/jobs

# Query interactive session
POST /api/v1/sessions/{id}/execute

# Monitor job status
GET /api/v1/jobs/{id}/status
```

This enables building responsive data applications where Spark computations are triggered by HTTP requests, supporting use cases like:

  • Real-time feature engineering for ML models
  • On-demand data transformations via API
  • Streaming analytics with REST-based controls
  • Jupyter notebook execution through HTTP interface
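As a sketch of the submission flow from the endpoints above, a minimal Python client might look like this. The base URL, port, and payload field names are illustrative assumptions, not Ilum's exact schema; consult the API reference for the real contract:

```python
import json
import urllib.request

# Hypothetical Ilum Core address; adjust to your deployment.
BASE_URL = "http://localhost:9888/api/v1"

def build_job_request(cluster, job_class, jar_path, args=None):
    """Assemble a submission payload for POST /api/v1/jobs.

    Field names here are illustrative placeholders, not Ilum's
    documented schema.
    """
    return {
        "clusterName": cluster,
        "jobClass": job_class,
        "jars": [jar_path],
        "args": list(args or []),
    }

def post_json(path, payload):
    """POST a JSON payload and return the decoded JSON response."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Build (but do not send) a request for a word-count job:
payload = build_job_request(
    "k8s-prod",
    "com.example.WordCount",
    "s3a://jobs/wordcount.jar",
    ["s3a://data/input"],
)
```

Calling `post_json("/jobs", payload)` would then submit the job, and `GET /api/v1/jobs/{id}/status` can be polled with the returned job id.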

Multi-Cluster Orchestration

Ilum manages heterogeneous Spark clusters from a single control plane:

  • Cloud clusters: GKE, EKS, AKS with auto-scaling groups
  • On-premise clusters: Bare metal Kubernetes or Hadoop Yarn deployments
  • Hybrid architectures: Mixed cloud and on-premise for data sovereignty requirements

Each cluster maintains independent resource quotas, storage backends, and security policies while sharing centralized monitoring and job scheduling.

Comparison with Alternative Solutions

| Feature                  | Ilum | Databricks | Cloudera |
|--------------------------|------|------------|----------|
| Kubernetes native        | ✓    | Partial    | Partial  |
| Multi-cluster management | ✓    | Limited    | ✗        |
| Vendor lock-in           | None | High       | High     |
| REST API for sessions    | ✓    | Limited    | ✗        |
| On-premise deployment    | ✓    | Limited    | ✓        |
| Cloud deployment         | ✓    | ✓          | ✓        |
| Yarn integration         | ✓    | ✗          | ✓        |

Video Overview

**Tip:** Prefer a guided path? Build your first data product on Ilum in hours. Official course →.

Features

Spark Cluster Management

  • Kubernetes Operator Integration: Native CRD-based Spark application deployment with pod lifecycle management
  • Multi-cluster Control Plane: Centralized management for GKE, EKS, AKS, and on-premise Kubernetes clusters
  • Horizontal Pod Autoscaling: Dynamic executor scaling based on CPU/memory metrics and queue depth
  • Resource Quotas: Namespace-level limits for CPU cores, memory, and persistent volume claims
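Namespace-level limits of the kind listed above are plain Kubernetes configuration. A minimal sketch, with the namespace name and limits chosen purely for illustration:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: spark-quota
  namespace: spark-jobs        # illustrative namespace
spec:
  hard:
    requests.cpu: "64"         # total CPU cores requested by Spark pods
    requests.memory: 256Gi     # total memory requested
    persistentvolumeclaims: "20"
```

Applied to the namespace where Spark drivers and executors run, this caps aggregate resource requests regardless of how many jobs are submitted.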

Interactive Computing

  • REST API Endpoints: HTTP interface for Spark session creation, code execution, and result retrieval
  • Jupyter Integration: Spark Magic kernels with automatic session binding and DataFrame visualization
  • Apache Zeppelin Notebooks: Multi-language interpreters (Scala, Python, SQL) with paragraph-level execution
  • Code Groups: Reusable Spark contexts shared across multiple notebook sessions

Storage & Data Formats

  • S3-Compatible Object Storage: MinIO-based distributed storage with S3 API compatibility
  • Table Format Support: Delta Lake, Iceberg, Hudi with ACID guarantees and schema evolution
  • Catalog Integration: Hive Metastore, AWS Glue, Nessie for metadata management
  • Distributed File Systems: HDFS, GCS, Azure Blob Storage, and S3 connectivity

Orchestration & Scheduling

  • Built-in Scheduler: Cron-based job scheduling with dependency management
  • Apache Airflow Integration: DAG-based workflow orchestration with Spark operators
  • Kestra Support: Event-driven pipelines with Spark task execution
  • dbt Core: SQL transformations with Spark as execution engine
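For the dbt Core integration, a `profiles.yml` target using the dbt-spark adapter's Thrift method might look like the following; the profile name, host, and schema are placeholders for your own deployment:

```yaml
ilum_spark:
  target: dev
  outputs:
    dev:
      type: spark
      method: thrift
      host: spark-thrift.example.com   # hypothetical Thrift server endpoint
      port: 10000
      schema: analytics
      threads: 4
```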

Monitoring & Observability

  • Spark History Server: Job timeline, stage metrics, and executor resource utilization
  • Prometheus Integration: Custom metrics for application-level monitoring
  • Grafana Dashboards: Pre-configured visualizations for cluster health and job performance
  • Loki Log Aggregation: Centralized logging with Promtail collectors
  • OpenLineage: Data lineage tracking for table-level dependencies
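For the Prometheus integration, Spark 3.x ships a built-in `PrometheusServlet` metrics sink; a `metrics.properties` fragment along these lines exposes driver and executor metrics in Prometheus format on the Spark UI port:

```properties
# Serve metrics in Prometheus exposition format from the Spark UI
*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
*.sink.prometheusServlet.path=/metrics/prometheus
```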

Security & Access Control

  • RBAC Policies: Kubernetes-native role-based access with fine-grained permissions
  • OAuth2/OIDC: Integration with Keycloak, Okta, Azure AD for authentication
  • TLS/mTLS: Certificate-based encryption for inter-service communication
  • LDAP/Active Directory: Enterprise directory service integration
  • Network Policies: Pod-to-pod traffic restrictions and egress controls
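The pod-to-pod restrictions mentioned above are standard Kubernetes `NetworkPolicy` objects; Spark on Kubernetes labels its pods with `spark-role: driver` and `spark-role: executor`, which policies can select on. A minimal sketch (namespace and policy name are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: spark-driver-ingress
  namespace: spark-jobs        # illustrative namespace
spec:
  podSelector:
    matchLabels:
      spark-role: driver
  policyTypes: [Ingress]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              spark-role: executor   # only executors may reach the driver
```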

Explore full feature documentation → | Request new features →

*Ilum Spark cluster management dashboard showing job execution timeline, resource utilization graphs, and executor pod status.*

Advantages

Cloud-Native Architecture

Ilum is a cloud-native-first platform, built on containerized services, declarative configuration, and GitOps-compatible deployment:

  • Helm Charts: Parameterized Kubernetes manifests for reproducible deployments
  • Container Images: Official images for Spark 3.x with pre-installed connectors (S3, GCS, Azure)
  • Custom Resource Definitions: Kubernetes API extensions for Spark application management
  • Service Mesh Ready: Compatible with Istio/Linkerd for advanced traffic management

No Vendor Lock-In

Unlike proprietary platforms, Ilum provides:

  • Open APIs: REST and gRPC interfaces following OpenAPI specifications
  • Standard Protocols: JDBC/ODBC connectivity, S3 API compatibility, Kafka integration
  • Portable Workloads: Spark applications run on any Kubernetes cluster without modification
  • Multi-Cloud Support: Deploy across AWS, GCP, Azure without platform-specific dependencies

Hadoop Migration Path

For organizations migrating from Hadoop/HDFS ecosystems:

  • Yarn Compatibility: Run existing Yarn-based Spark jobs without code changes
  • HDFS Connector: Direct access to HDFS clusters during migration phases
  • Hive Metastore: Reuse existing table metadata and partitioning schemes
  • Incremental Migration: Gradual transition with hybrid Yarn/Kubernetes deployment

Performance Optimization

Ilum incorporates several Spark performance optimizations:

  • Dynamic Allocation: Automatic executor scaling based on shuffle data and pending tasks
  • Adaptive Query Execution (AQE): Runtime optimization for join strategies and partition coalescing
  • Columnar Caching: Parquet/ORC in-memory caching with LRU eviction policies
  • Network-Aware Scheduling: Pod placement considering data locality and network topology
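Dynamic allocation and AQE are standard Spark properties; a `spark-defaults.conf` fragment enabling them might look like this (executor bounds are illustrative, and shuffle tracking is the usual substitute for an external shuffle service on Kubernetes):

```properties
spark.dynamicAllocation.enabled                   true
spark.dynamicAllocation.shuffleTracking.enabled   true
spark.dynamicAllocation.minExecutors              1
spark.dynamicAllocation.maxExecutors              50
spark.sql.adaptive.enabled                        true
spark.sql.adaptive.coalescePartitions.enabled     true
```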

Enterprise Integration

Built for enterprise data platforms:

  • Apache Kafka: Native Spark Structured Streaming integration with exactly-once semantics
  • Apache Airflow: Managed Airflow instances with Spark operators pre-configured
  • MLflow: Model registry and experiment tracking for machine learning pipelines
  • Superset/Tableau: BI tool connectivity via JDBC drivers and load balancers

Read architecture documentation → | View use cases →

Project Roadmap

Explore planned features and integrations:

  • Flink Operator: Stream processing workloads alongside Spark batch jobs
  • GPU Scheduling: CUDA-enabled executors for deep learning workloads
  • Cost Attribution: Resource usage tracking with cloud billing integration

View full roadmap → | See changelog →

Additional Resources