
What is Ilum?

Modular Data Lakehouse Platform for Cloud-Native Apache Spark Workloads

Ilum is an Apache Spark management platform designed for Kubernetes and Apache Hadoop Yarn environments. It provides enterprise-grade cluster orchestration, interactive Spark sessions via REST API, and seamless integration with modern data engineering tools including Jupyter, Apache Airflow, MLflow, and Delta Lake/Iceberg/Hudi table formats.

Key Capabilities

  • Kubernetes-native Spark operator with automatic pod orchestration and resource management
  • Multi-cluster management across cloud providers (GKE, EKS, AKS) and on-premise deployments
  • Interactive Spark sessions accessible through REST API endpoints for building Spark-based microservices
  • Apache Hadoop Yarn integration for hybrid cluster architectures
  • Built-in S3-compatible object storage for cloud-native data lake architectures
  • Horizontal scalability from single-node development to production clusters with hundreds of executors

Get started with Ilum → | View architecture documentation →

Ilum - Apache Spark on Kubernetes Platform

Ilum transforms Apache Spark cluster management by providing a unified control plane for Kubernetes and Yarn-based Spark deployments. Unlike traditional Spark management approaches, Ilum treats Spark applications as first-class microservices with REST API interfaces, enabling real-time data processing architectures.

Architecture

Ilum's architecture consists of two core components:

  1. Ilum Core: Backend service providing gRPC/Kafka-based job orchestration, cluster state management, and REST API endpoints
  2. Ilum UI: Web-based dashboard for cluster monitoring, job submission, and resource visualization

The platform supports both Python (PySpark) and Scala programming languages, with native integration for Spark SQL, Spark Streaming, and MLlib frameworks.

Data Lakehouse Capabilities

Ilum provides comprehensive support for modern table formats:

  • Delta Lake: ACID transactions with time travel and schema evolution
  • Apache Iceberg: Partition evolution and hidden partitioning for large-scale analytics
  • Apache Hudi: Record-level upserts and incremental data processing
  • Ilum Tables: Unified API abstraction across multiple table formats

Integration with Hive Metastore and Nessie catalog enables SQL-based metadata management and Git-like data versioning.
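As one concrete example, wiring Delta Lake into a Spark deployment typically comes down to two standard Spark SQL settings, shown here as a `spark-defaults.conf` fragment (Iceberg and Hudi ship analogous extension and catalog classes):

```properties
spark.sql.extensions              io.delta.sql.DeltaSparkSessionExtension
spark.sql.catalog.spark_catalog   org.apache.spark.sql.delta.catalog.DeltaCatalog
```

With these in place, `CREATE TABLE ... USING delta` and time-travel queries work from any Spark SQL session on the cluster.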

REST API for Spark Microservices

Ilum exposes Spark functionality through RESTful endpoints:

```
# Submit Spark job
POST /api/v1/jobs

# Query interactive session
POST /api/v1/sessions/{id}/execute

# Monitor job status
GET /api/v1/jobs/{id}/status
```

This enables building responsive data applications where Spark computations are triggered by HTTP requests, supporting use cases like:

  • Real-time feature engineering for ML models
  • On-demand data transformations via API
  • Streaming analytics with REST-based controls
  • Jupyter notebook execution through HTTP interface
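As a sketch of the submission flow from the endpoints above, a minimal Python client might look like this. The base URL, port, and payload field names are illustrative assumptions, not Ilum's exact schema; consult the API reference for the real contract:

```python
import json
import urllib.request

# Hypothetical Ilum Core address; adjust to your deployment.
BASE_URL = "http://localhost:9888/api/v1"

def build_job_request(cluster, job_class, jar_path, args=None):
    """Assemble a submission payload for POST /api/v1/jobs.

    Field names here are illustrative placeholders, not Ilum's
    documented schema.
    """
    return {
        "clusterName": cluster,
        "jobClass": job_class,
        "jars": [jar_path],
        "args": list(args or []),
    }

def post_json(path, payload):
    """POST a JSON payload and return the decoded JSON response."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Build (but do not send) a request for a word-count job:
payload = build_job_request(
    "k8s-prod",
    "com.example.WordCount",
    "s3a://jobs/wordcount.jar",
    ["s3a://data/input"],
)
```

Calling `post_json("/jobs", payload)` would then submit the job, and `GET /api/v1/jobs/{id}/status` can be polled with the returned job id.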

Multi-Cluster Orchestration

Ilum manages heterogeneous Spark clusters from a single control plane:

  • Cloud clusters: GKE, EKS, AKS with auto-scaling groups
  • On-premise clusters: Bare metal Kubernetes or Hadoop Yarn deployments
  • Hybrid architectures: Mixed cloud and on-premise for data sovereignty requirements

Each cluster maintains independent resource quotas, storage backends, and security policies while sharing centralized monitoring and job scheduling.

Comparison with Alternative Solutions

| Feature                  | Ilum | Databricks | Cloudera |
|--------------------------|------|------------|----------|
| Kubernetes native        | ✓    | Partial    | Partial  |
| Multi-cluster management | ✓    | Limited    | ✗        |
| Vendor lock-in           | None | High       | High     |
| REST API for sessions    | ✓    | Limited    | ✗        |
| On-premise deployment    | ✓    | Limited    | ✓        |
| Cloud deployment         | ✓    | ✓          | ✓        |
| Yarn integration         | ✓    | ✗          | ✓        |

Video Overview

**Tip:** Prefer a guided path? Build your first data product on Ilum in hours. Official course →.

Features

Spark Cluster Management

  • Kubernetes Operator Integration: Native CRD-based Spark application deployment with pod lifecycle management
  • Multi-cluster Control Plane: Centralized management for GKE, EKS, AKS, and on-premise Kubernetes clusters
  • Horizontal Pod Autoscaling: Dynamic executor scaling based on CPU/memory metrics and queue depth
  • Resource Quotas: Namespace-level limits for CPU cores, memory, and persistent volume claims
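Namespace-level limits of the kind listed above are plain Kubernetes configuration. A minimal sketch, with the namespace name and limits chosen purely for illustration:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: spark-quota
  namespace: spark-jobs        # illustrative namespace
spec:
  hard:
    requests.cpu: "64"         # total CPU cores requested by Spark pods
    requests.memory: 256Gi     # total memory requested
    persistentvolumeclaims: "20"
```

Applied to the namespace where Spark drivers and executors run, this caps aggregate resource requests regardless of how many jobs are submitted.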

Interactive Computing

  • REST API Endpoints: HTTP interface for Spark session creation, code execution, and result retrieval
  • Jupyter Integration: Spark Magic kernels with automatic session binding and DataFrame visualization
  • Apache Zeppelin Notebooks: Multi-language interpreters (Scala, Python, SQL) with paragraph-level execution
  • Code Groups: Reusable Spark contexts shared across multiple notebook sessions

Storage & Data Formats

  • S3-Compatible Object Storage: MinIO-based distributed storage with S3 API compatibility
  • Table Format Support: Delta Lake, Iceberg, Hudi with ACID guarantees and schema evolution
  • Catalog Integration: Hive Metastore, AWS Glue, Nessie for metadata management
  • Distributed File Systems: HDFS, GCS, Azure Blob Storage, and S3 connectivity

Orchestration & Scheduling

  • Built-in Scheduler: Cron-based job scheduling with dependency management
  • Apache Airflow Integration: DAG-based workflow orchestration with Spark operators
  • Kestra Support: Event-driven pipelines with Spark task execution
  • dbt Core: SQL transformations with Spark as execution engine
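For the dbt Core integration, a `profiles.yml` target using the dbt-spark adapter's Thrift method might look like the following; the profile name, host, and schema are placeholders for your own deployment:

```yaml
ilum_spark:
  target: dev
  outputs:
    dev:
      type: spark
      method: thrift
      host: spark-thrift.example.com   # hypothetical Thrift server endpoint
      port: 10000
      schema: analytics
      threads: 4
```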

Monitoring & Observability

  • Spark History Server: Job timeline, stage metrics, and executor resource utilization
  • Prometheus Integration: Custom metrics for application-level monitoring
  • Grafana Dashboards: Pre-configured visualizations for cluster health and job performance
  • Loki Log Aggregation: Centralized logging with Promtail collectors
  • OpenLineage: Data lineage tracking for table-level dependencies
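For the Prometheus integration, Spark 3.x ships a built-in `PrometheusServlet` metrics sink; a `metrics.properties` fragment along these lines exposes driver and executor metrics in Prometheus format on the Spark UI port:

```properties
# Serve metrics in Prometheus exposition format from the Spark UI
*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
*.sink.prometheusServlet.path=/metrics/prometheus
```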

Security & Access Control

  • RBAC Policies: Kubernetes-native role-based access with fine-grained permissions
  • OAuth2/OIDC: Integration with Keycloak, Okta, Azure AD for authentication
  • TLS/mTLS: Certificate-based encryption for inter-service communication
  • LDAP/Active Directory: Enterprise directory service integration
  • Network Policies: Pod-to-pod traffic restrictions and egress controls
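The pod-to-pod restrictions mentioned above are standard Kubernetes `NetworkPolicy` objects; Spark on Kubernetes labels its pods with `spark-role: driver` and `spark-role: executor`, which policies can select on. A minimal sketch (namespace and policy name are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: spark-driver-ingress
  namespace: spark-jobs        # illustrative namespace
spec:
  podSelector:
    matchLabels:
      spark-role: driver
  policyTypes: [Ingress]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              spark-role: executor   # only executors may reach the driver
```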

Explore full feature documentation → | Request new features →

*Ilum Spark cluster management dashboard showing job execution timeline, resource utilization graphs, and executor pod status.*

Advantages

Cloud-Native Architecture

Ilum is a cloud-native-first platform, built on containerized services, declarative configuration, and GitOps-compatible deployment:

  • Helm Charts: Parameterized Kubernetes manifests for reproducible deployments
  • Container Images: Official images for Spark 3.x with pre-installed connectors (S3, GCS, Azure)
  • Custom Resource Definitions: Kubernetes API extensions for Spark application management
  • Service Mesh Ready: Compatible with Istio/Linkerd for advanced traffic management

No Vendor Lock-In

Unlike proprietary platforms, Ilum provides:

  • Open APIs: REST and gRPC interfaces following OpenAPI specifications
  • Standard Protocols: JDBC/ODBC connectivity, S3 API compatibility, Kafka integration
  • Portable Workloads: Spark applications run on any Kubernetes cluster without modification
  • Multi-Cloud Support: Deploy across AWS, GCP, Azure without platform-specific dependencies

Hadoop Migration Path

For organizations migrating from Hadoop/HDFS ecosystems:

  • Yarn Compatibility: Run existing Yarn-based Spark jobs without code changes
  • HDFS Connector: Direct access to HDFS clusters during migration phases
  • Hive Metastore: Reuse existing table metadata and partitioning schemes
  • Incremental Migration: Gradual transition with hybrid Yarn/Kubernetes deployment

Performance Optimization

Ilum incorporates several Spark performance optimizations:

  • Dynamic Allocation: Automatic executor scaling based on shuffle data and pending tasks
  • Adaptive Query Execution (AQE): Runtime optimization for join strategies and partition coalescing
  • Columnar Caching: Parquet/ORC in-memory caching with LRU eviction policies
  • Network-Aware Scheduling: Pod placement considering data locality and network topology
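Dynamic allocation and AQE are standard Spark properties; a `spark-defaults.conf` fragment enabling them might look like this (executor bounds are illustrative, and shuffle tracking is the usual substitute for an external shuffle service on Kubernetes):

```properties
spark.dynamicAllocation.enabled                   true
spark.dynamicAllocation.shuffleTracking.enabled   true
spark.dynamicAllocation.minExecutors              1
spark.dynamicAllocation.maxExecutors              50
spark.sql.adaptive.enabled                        true
spark.sql.adaptive.coalescePartitions.enabled     true
```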

Enterprise Integration

Built for enterprise data platforms:

  • Apache Kafka: Native Spark Structured Streaming integration with exactly-once semantics
  • Apache Airflow: Managed Airflow instances with Spark operators pre-configured
  • MLflow: Model registry and experiment tracking for machine learning pipelines
  • Superset/Tableau: BI tool connectivity via JDBC drivers and load balancers

Read architecture documentation → | View use cases →

Project Roadmap

Explore planned features and integrations:

  • Flink Operator: Stream processing workloads alongside Spark batch jobs
  • GPU Scheduling: CUDA-enabled executors for deep learning workloads
  • Cost Attribution: Resource usage tracking with cloud billing integration

View full roadmap → | See changelog →

Additional Resources