Architecture
Ilum is a modular data lakehouse platform built on Kubernetes. It combines Apache Spark, Trino, DuckDB, and Apache Flink into a unified multi-engine architecture, managed through a primary control plane (ilum-core) and a dedicated module-management microservice (ilum-api), deployed entirely via Helm charts. Ilum follows an "all-in-one but optional" philosophy where every component beyond the core services can be enabled or disabled independently at runtime, allowing deployments to range from a lightweight development setup to a full enterprise data platform.
The platform is built around several key architectural principles:
- Decoupled compute and storage: processing engines and data storage scale independently.
- Stateless services: both ilum-core and ilum-api hold no in-memory state, enabling horizontal scaling and crash recovery.
- Multi-engine execution: Spark, Trino, DuckDB, and Flink are peer engines, with an automatic router that selects the right engine for each query based on data size, workload type, and locality.
- Modular extensibility: optional modules install, upgrade, and disable at runtime via ilum-api. Future releases extend ilum-api with Model Context Protocol (MCP) capabilities and open APIs for third-party integration.
- Open standards: OpenAPI REST, JDBC/ODBC, Apache Iceberg/Delta/Hudi table formats, OpenLineage for lineage.
- Kubernetes-native: all workloads run as pods with native resource management, scheduling, and observability. Compatible with any CNCF-compliant distribution including Red Hat OpenShift, EKS, AKS, GKE, k3s, Rancher, and bare-metal Kubernetes.
Key Components
                ┌─────────────┐
                │   ilum-ui   │
                │   (React)   │
                └──────┬──────┘
                       │ REST API
┌──────────┐    ┌──────▼──────┐     ┌──────────────────┐
│ ilum-cli │───▶│  ilum-core  │────▶│    PostgreSQL    │
└──────────┘    └──┬───┬───┬──┘     │    (primary)     │
                   │   │   │        │ MongoDB (legacy) │
         ┌─────────┘   │   └──────┐ └──────────────────┘
         │             │          │
         │      ┌──────▼──────┐   │
         │      │  ilum-api   │   │  Module management
         │      │  (modules)  │   │  Helm install / upgrade
         │      │    + MCP    │   │  Roadmap: MCP, open APIs
         │      │  (roadmap)  │   │
         │      └─────────────┘   │
         │                        │
┌────────▼──────┐ ┌──────────┐ ┌──▼──────────────┐
│ Kafka / gRPC  │ │ Catalog  │ │ Object Storage │
│  (messaging)  │ │  Layer   │ │  (MinIO / S3)  │
└────────┬──────┘ └─────┬────┘ └───────┬────────┘
         │              │              │
┌────────▼──────────────▼──────────────▼─────────────────┐
│                      Engine Layer                      │
│         Spark · Trino · DuckDB · Flink (Beta)          │
│                fronted by Apache Kyuubi                │
└────────────────────────────────────────────────────────┘
- ilum-core: The main platform service. Manages cluster lifecycle, job scheduling, session management, multi-engine SQL execution, security, lineage capture, and REST API exposure (OpenAPI 3.0). Stateless by design; all persistent state is stored in PostgreSQL (primary, accessed via R2DBC with jOOQ-generated SQL DSL), with MongoDB retained for legacy deployments.
- ilum-ui: React-based web console for managing clusters, submitting jobs, browsing tables and lineage, running multi-engine SQL queries through the SQL Editor, configuring security, and toggling modules. Communicates exclusively with ilum-core and ilum-api via REST.
- ilum-api: Dedicated module-management microservice. Drives Helm-based install, upgrade, and disable of optional Ilum modules (Trino, JupyterHub, Superset, Airflow, etc.) at runtime via cluster-scoped RBAC. Stateless. Future releases extend ilum-api with Model Context Protocol (MCP) and open APIs to act as the platform-wide extensibility surface.
- ilum-cli: Command-line interface for scripting and automation. Supports all ilum-core REST API operations, including job submission, cluster management, and configuration. Useful for CI/CD pipelines and headless environments.
- Kubernetes Cluster: The primary execution environment. Spark, Trino, Flink, and other engine pods run with configurable executor counts, resource limits, and node affinities. Ilum manages the full pod lifecycle.
- Object Storage: S3-compatible storage (MinIO, Ceph, RustFS, AWS S3, GCS, Azure Blob) serves as the persistent data layer. All table data, job artifacts, and Spark event logs are stored here, decoupled from compute.
- PostgreSQL (primary) and MongoDB (legacy): Internal metadata stores for job history, cluster configuration, user accounts, session state, and operational data. PostgreSQL is the recommended store for new deployments; MongoDB is supported for existing deployments with built-in migration tooling (M001 through M009 scripts).
- Messaging Layer (Kafka / gRPC): Handles communication between ilum-core and running Spark jobs. See Communication Types for details.
- Catalog Layer: Persistent metadata services (Hive Metastore, Nessie, Unity Catalog, DuckLake) that enable SQL access across all engines. See Data Catalog Layer for details.
- Engine Layer: Execution engines (Spark, Trino, DuckDB, Flink) fronted by the Apache Kyuubi SQL gateway. See Multi-Engine Query Architecture.
Integrated Modules
All modules below are optional Helm-deployed components that share authentication, networking, and catalog configuration with ilum-core:
| Category | Modules |
|---|---|
| Orchestration | Airflow, Kestra, Mage, n8n, NiFi |
| Notebooks | JupyterLab, JupyterHub (Enterprise), Zeppelin |
| BI & Visualization | Superset, Streamlit |
| ML & AI | MLflow, LangFuse, AI Data Analyst |
| Engines | Trino, Apache Flink |
| Data Catalogs | Hive Metastore, Nessie, Unity Catalog, DuckLake |
| Observability | Grafana, Prometheus, Loki, Marquez (default-on) |
| Identity | Ory Hydra (Ilum as IdP), OpenLDAP |
| Version Control | Gitea |
| Analytics store | ClickHouse |
Each module is enabled via a single Helm flag and automatically inherits cluster networking, LDAP/OAuth2 authentication, and catalog connection settings. See Overview for detailed feature descriptions.
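In a deployment script, toggling modules might look like the following sketch, which drives Helm from Python. The repository URL, chart name, and values keys shown here are illustrative assumptions, not the chart's documented flags; consult the Ilum Helm chart's values reference for the actual names.

```python
import subprocess

# Illustrative sketch: enable/disable optional modules with Helm overrides.
# Repo URL, chart name, and values keys are assumptions for illustration.
RELEASE, NAMESPACE = "ilum", "ilum"

subprocess.run(
    ["helm", "repo", "add", "ilum", "https://charts.ilum.cloud"],
    check=True,
)
subprocess.run(
    [
        "helm", "upgrade", "--install", RELEASE, "ilum/ilum",
        "--namespace", NAMESPACE, "--create-namespace",
        "--set", "trino.enabled=true",       # hypothetical module flag
        "--set", "jupyterhub.enabled=true",  # hypothetical module flag
        "--set", "superset.enabled=false",   # modules toggle independently
        "--reuse-values",                    # keep previously set values
    ],
    check=True,
)
```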
Workflow
Batch Job Workflow
- A user submits a Spark job via the ilum-ui, REST API, or ilum-cli, specifying the job JAR/Python file, Spark configuration, and target cluster.
- ilum-core validates the request, schedules the job, and creates a Spark driver pod on the target Kubernetes cluster.
- The Spark driver provisions executor pods according to the configured parallelism. Executors scale horizontally across cluster nodes.
- The job executes, reading from and writing to object storage through the catalog layer.
- Results and execution metadata are returned to ilum-core and stored in PostgreSQL (or MongoDB on legacy deployments). Users view results in the UI or fetch them via the REST API.
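Scripted against the REST API, the same flow might look like the sketch below. The endpoint paths and payload fields are hypothetical placeholders; the authoritative contract is the OpenAPI 3.0 specification exposed by ilum-core.

```python
import requests

# Sketch of the batch flow via the ilum-core REST API. Paths, port, and
# payload fields are hypothetical - see the OpenAPI spec for the real ones.
BASE = "http://ilum-core:9888/api"  # hypothetical base URL and port

# Step 1: submit the job (artifact, Spark config, target cluster).
job = requests.post(
    f"{BASE}/jobs",  # hypothetical endpoint
    json={
        "clusterId": "default",
        "applicationResource": "s3a://bucket/jobs/etl.py",
        "config": {"spark.executor.instances": "4"},
    },
    timeout=30,
).json()

# Steps 2-5 happen server-side; poll for results and execution metadata.
result = requests.get(f"{BASE}/jobs/{job['id']}", timeout=30).json()
print(result["state"])
```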
Interactive Session Workflow
- A user creates an interactive session (via UI, API, or CLI), which launches a long-running Spark application pod.
- The session remains alive, ready to accept code execution requests without Spark initialization overhead.
- Users submit code snippets (Scala, Python, SQL) via REST API. Each snippet executes within the existing Spark context and returns results immediately.
- Multiple users can share a session through Code Groups - shared Spark contexts that enable collaborative analysis while isolating variable namespaces.
- The session terminates on explicit shutdown or after a configurable idle timeout.
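A rough sketch of that loop against the REST API follows; as with the batch example, the endpoint paths and fields are hypothetical placeholders for the real OpenAPI contract.

```python
import requests

# Sketch of the interactive flow: one long-running session, many snippets.
# Paths and fields are hypothetical - consult the ilum-core OpenAPI spec.
BASE = "http://ilum-core:9888/api"  # hypothetical base URL and port

session = requests.post(
    f"{BASE}/sessions",  # hypothetical endpoint
    json={"clusterId": "default", "kind": "python"},
    timeout=30,
).json()

# Each snippet reuses the warm Spark context - no startup overhead.
for code in ["df = spark.range(100)", "df.count()"]:
    run = requests.post(
        f"{BASE}/sessions/{session['id']}/statements",
        json={"code": code}, timeout=30,
    ).json()
    print(run.get("output"))

# Explicit shutdown (otherwise the idle timeout reclaims the session).
requests.delete(f"{BASE}/sessions/{session['id']}", timeout=30)
```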
For step-by-step guides, see Run a Spark Job and Interactive Jobs.
Communication Types
Ilum supports two forms of communication between Spark jobs and ilum-core: Apache Kafka and gRPC.
Apache Kafka Communication
Ilum's Apache Kafka integration provides reliable, scalable communication and supports all of Ilum's features, including High Availability (HA). All event exchanges take place on automatically created topics hosted by the Apache Kafka brokers.
gRPC Communication (default)
As an alternative, gRPC can be used for communication. This option simplifies deployment by removing the need for Apache Kafka during installation: gRPC establishes direct connections between ilum-core and Spark jobs, so no separate message broker is required. However, gRPC does not support High Availability (HA) for ilum-core in the current implementation. ilum-core can still be scaled, but running Spark jobs keep communicating with the ilum-core instance they first connected to.
Comparison
| Feature | Apache Kafka | gRPC |
|---|---|---|
| High Availability | Yes - ilum-core replicas share state via topics | No - direct point-to-point connections |
| Scalability | Horizontally scalable with partitioned topics | Limited to single ilum-core affinity |
| Deployment Complexity | Requires Kafka cluster (3+ brokers recommended) | Zero additional infrastructure |
| Recommended For | Production, multi-replica, HA environments | Development, testing, single-node setups |
Start with gRPC for development and testing. Switch to Kafka when deploying to production or enabling HA. See Production Deployment for HA configuration.
Multi-Engine Query Architecture
Ilum provides four execution engines through a unified SQL gateway built on Apache Kyuubi. This architecture allows users to submit SQL queries via standard JDBC/ODBC interfaces, the in-product SQL Editor, or the REST API, and have them routed to the most appropriate engine, either by explicit selection or by the automatic engine router.
   ┌──────────────────────┐
   │     JDBC / ODBC      │   BI tools, CLI, applications
   │       Clients        │
   └──────────┬───────────┘
              │ Thrift Binary Protocol
┌─────────────▼──────────────┐
│       Apache Kyuubi        │   Session management, engine routing,
│        SQL Gateway         │   authentication, connection pooling,
│       + Auto Router        │   automatic engine selection
└──┬───────┬──────┬───────┬──┘
   │       │      │       │
┌──▼──┐ ┌──▼──┐ ┌─▼──┐ ┌──▼───┐
│Spark│ │Trino│ │Duck│ │Flink │   Engine selection per query
│ SQL │ │     │ │ DB │ │(Beta)│   (auto or explicit)
└──┬──┘ └──┬──┘ └─┬──┘ └──┬───┘
   │       │      │       │
┌──▼───────▼──────▼───────▼─────┐
│         Catalog Layer         │   Hive / Nessie / Unity / DuckLake
├───────────────────────────────┤
│         Storage Layer         │   MinIO / S3 / GCS / WASBS / HDFS
└───────────────────────────────┘
Automatic Engine Router
The automatic engine router observes incoming queries and selects the engine best suited to each workload, based on:
- Data size: Estimated scan size of the query against catalog statistics.
- Workload type: Streaming, interactive, batch ETL, or ad-hoc exploration.
- Locality: Whether the data lives in DuckLake (local-first) or remote object storage.
- Engine availability: Which engines are deployed and currently healthy.
Users can override the router for any query through the Engine Selector in the SQL Editor or the engine field in the REST API.
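The routing criteria can be pictured with a small sketch. This is purely conceptual: the thresholds, names, and structure below are illustrative and do not reflect Ilum's actual router implementation.

```python
# Conceptual sketch of the routing criteria described above - NOT Ilum's
# actual implementation. Thresholds and names are illustrative only.
from dataclasses import dataclass

@dataclass
class QueryProfile:
    scan_bytes: int          # estimated from catalog statistics
    workload: str            # "streaming" | "interactive" | "batch" | "adhoc"
    ducklake_local: bool     # does the data live in DuckLake (local-first)?
    healthy: set[str]        # engines currently deployed and healthy

def route(q: QueryProfile) -> str:
    if q.workload == "streaming" and "flink" in q.healthy:
        return "flink"
    if q.ducklake_local and q.scan_bytes < 1 << 30 and "duckdb" in q.healthy:
        return "duckdb"                 # small and local: zero-overhead
    if q.workload in ("interactive", "adhoc") and "trino" in q.healthy:
        return "trino"                  # always-on MPP, low latency
    return "spark"                      # default: heavy batch / ETL
```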
Spark SQL
On-demand sessions with DAG-based parallel execution. Spark SQL is the most versatile engine - it handles ETL, batch processing, ML pipelines, and complex analytical queries. Executors are provisioned dynamically via Kubernetes and can scale from zero to hundreds of pods. Best suited for heavy transformations, large-scale joins, and workloads that benefit from distributed shuffle.
Trino
Always-on MPP (Massively Parallel Processing) engine with a coordinator-worker topology. Trino uses pipelined execution to deliver sub-second interactive query latency on large datasets. Workers remain running for instant query response. Best suited for interactive analytics, dashboarding, and ad-hoc exploration.
DuckDB
Embedded analytical engine running inside the ilum-core process. DuckDB provides zero-overhead local query execution with no pod provisioning or network latency, and pairs naturally with the DuckLake catalog. Best suited for lightweight analytics, small dataset queries, and rapid prototyping.
Apache Flink
Distributed stream-processing engine for low-latency, event-driven workloads. Available for Enterprise deployments through the Kyuubi SQL gateway. Best suited for continuous data pipelines, real-time enrichment, and event-time analytics.
Engine Selection Guide
| Use Case | Recommended Engine |
|---|---|
| ETL / large-scale batch processing | Spark SQL |
| Interactive dashboards and BI queries | Trino |
| Complex ML pipelines | Spark SQL |
| Ad-hoc exploration on large datasets | Trino |
| Quick queries on small datasets / DuckLake | DuckDB |
| Streaming ingestion (Structured Streaming) | Spark SQL |
| Low-latency stream processing | Flink |
All engines access the same data through the shared catalog layer and object storage, enabling users to choose the right engine per workload without data movement. See SQL Editor for engine configuration and Performance for optimization details.
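Because Kyuubi speaks the HiveServer2 Thrift protocol, any HiveServer2-compatible client can select an engine per connection. The sketch below uses PyHive; the host name is a placeholder, and while 10009 is Kyuubi's documented default Thrift port and kyuubi.engine.type its documented engine-selection property, verify both against your deployment before relying on them.

```python
from pyhive import hive  # Kyuubi is HiveServer2 Thrift-compatible

# Sketch: explicit engine selection over JDBC/Thrift. Host is a
# hypothetical in-cluster service name; port and session-conf key follow
# Kyuubi's documented defaults but should be verified per deployment.
conn = hive.connect(
    host="kyuubi.ilum.svc",
    port=10009,
    username="analyst",
    configuration={"kyuubi.engine.type": "TRINO"},  # or SPARK_SQL, FLINK_SQL
)
cur = conn.cursor()
cur.execute("SELECT region, count(*) FROM sales GROUP BY region")
print(cur.fetchall())
```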
Data Catalog Layer
Data catalogs provide the persistent metadata layer that enables SQL access, schema management, and multi-engine data sharing. When a Spark job or Trino query creates a table, the catalog records its schema, location, and format so that any engine can access it later.
Ilum supports four catalog implementations:
Hive Metastore (default)
PostgreSQL-backed metadata service, automatically deployed and configured by Helm. Compatible with Spark, Trino, and Superset out of the box. Provides the broadest ecosystem compatibility and is the recommended default for most deployments.
Nessie
Git-like catalog for Apache Iceberg tables. Supports branching, tagging, and merging of table metadata - enabling CI/CD workflows for data. Create a branch, experiment with schema changes or data transformations, and merge only when validated.
Unity Catalog
Three-level namespace model (catalog, schema, table) with governance-focused access control. Provides a familiar structure for organizations migrating from Databricks or requiring fine-grained catalog-level permissions.
DuckLake
Lightweight embedded catalog optimized for DuckDB. Stores metadata directly in a DuckDB database file, ideal for local development and single-engine DuckDB workloads.
Catalog Integration
Spark jobs launched through Ilum are automatically configured with catalog connection parameters at startup - no manual configuration required. Trino and Superset also receive catalog configuration via Helm values.
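In practice this means a job can use catalog tables without setting any metastore URIs or warehouse paths itself. A minimal sketch, with an illustrative table name and data:

```python
from pyspark.sql import SparkSession

# Sketch of a job relying on the injected catalog configuration: no
# metastore URI or warehouse path is set here, because Ilum wires them
# into the Spark conf at launch. Table name and data are illustrative.
spark = SparkSession.builder.appName("catalog-demo").getOrCreate()

spark.sql("CREATE TABLE IF NOT EXISTS sales (region STRING, amount DOUBLE)")
spark.sql("INSERT INTO sales VALUES ('emea', 120.0), ('apac', 80.0)")

# The schema and location now live in the shared catalog, so Trino or
# DuckDB can query the same table without any extra configuration.
spark.sql("SELECT region, sum(amount) FROM sales GROUP BY region").show()
```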
Table Format Support
| Catalog | Delta Lake | Apache Iceberg | Apache Hudi | Parquet |
|---|---|---|---|---|
| Hive Metastore | Yes | Yes | Yes | Yes |
| Nessie | No | Yes | No | No |
| Unity Catalog | Yes | Yes | No | Yes |
| DuckLake | No | No | No | Yes |
See Data Catalogs for setup details and Ilum Table for the unified table abstraction.
Cluster Types
Ilum simplifies Spark cluster configuration; once a cluster is configured, it can run any number of jobs of any type. Ilum currently supports three cluster types: Kubernetes, YARN, and Local.
During the initial launch, Ilum automatically creates a default cluster, using the same Kubernetes cluster on which Ilum is installed. If this default cluster is accidentally deleted, you can easily recreate it by restarting the ilum-core pod.
Kubernetes Cluster
Ilum's primary focus is to facilitate easy integration between Spark and Kubernetes. It simplifies the configuration and launch of Spark applications on Kubernetes. To connect to an existing Kubernetes cluster, users need to provide default configuration information, such as the Kubernetes API URL and authentication parameters. Ilum supports both user/password and certificate-based authentication methods. Multiple Kubernetes clusters can be managed by Ilum, provided they are accessible. This feature enables the creation of a hub for managing numerous Spark environments from a single location.
Ilum is compatible with any CNCF-compliant Kubernetes distribution:
- Managed cloud: Google Kubernetes Engine (GKE), Amazon EKS, Azure AKS, DigitalOcean Kubernetes
- Enterprise: Red Hat OpenShift / OKD
- Lightweight: k3s, Rancher, MicroK8s
- Bare metal: Self-managed Kubernetes installations
- Local development: Minikube, k3d, Docker Desktop
YARN Cluster
Ilum also supports Apache Hadoop YARN clusters, which are configured by pointing Ilum at the configuration files from an existing YARN installation.
Local Cluster
The local cluster type runs Spark applications wherever ilum-core is deployed: inside the ilum-core container when running on Docker/Kubernetes, or directly on the host machine when no orchestrator is used. Because of its resource limitations, this cluster type is suitable only for testing.
Comparison
| Feature | Kubernetes | YARN | Local |
|---|---|---|---|
| Production Ready | Yes | Yes | No (testing only) |
| High Availability | Yes | Yes (via YARN RM HA) | No |
| Auto-Scaling | Yes (K8s autoscaler) | Yes (YARN node managers) | No |
| Multi-Cluster | Yes | Yes | N/A |
| Dynamic Executor Allocation | Yes | Yes | Limited |
A single Ilum instance can manage multiple clusters of different types simultaneously, enabling hybrid Kubernetes + YARN deployments - useful for organizations migrating from Hadoop to Kubernetes. See Clusters and Storages for configuration details.
Decoupled Compute-Storage Architecture
Ilum follows a decoupled compute-storage architecture where processing engines and data storage are independent, horizontally scalable layers. This separation is fundamental to Ilum's design and enables independent scaling, multi-engine access, and cost-efficient resource utilization.
┌─────────────────────────────────────────────────────────────┐
│                        Compute Layer                        │
│   ┌──────────┐  ┌──────────┐  ┌──────────┐                  │
│   │  Spark   │  │  Trino   │  │  DuckDB  │   Engines        │
│   │ Executors│  │ Workers  │  │(embedded)│   scale          │
│   └────┬─────┘  └─────┬────┘  └──────┬───┘   independently  │
│        │              │              │                      │
├────────┼──────────────┼──────────────┼──────────────────────┤
│        │      Catalog Layer (Hive Metastore / Nessie)       │
├────────┼──────────────┼──────────────┼──────────────────────┤
│        │              │              │                      │
│   ┌────▼──────────────▼──────────────▼────┐                 │
│   │             Storage Layer             │                 │
│   │    MinIO / S3 / GCS / WASBS / HDFS    │   Storage       │
│   │  (Iceberg / Delta / Hudi / Parquet)   │   scales        │
│   └───────────────────────────────────────┘   independently │
└─────────────────────────────────────────────────────────────┘
Benefits of this architecture:
- Independent scaling: Add more Spark executors or Trino workers without provisioning additional storage, and vice versa
- Multi-engine access: Multiple compute engines can read from and write to the same datasets concurrently, mediated by table formats (Iceberg, Delta) that provide ACID guarantees
- Cost efficiency: Compute resources can be released when not in use (e.g., Spark dynamic allocation, auto-pause) while data remains persistently available in object storage. Scale compute to zero during off-hours - storage costs persist, but expensive compute does not
- Engine flexibility: Choose the right engine for each workload - Spark for ETL, Trino for interactive analytics, DuckDB for lightweight queries - all accessing the same data
- Catalog-mediated consistency: The catalog layer ensures all engines see a consistent view of table metadata, enabling safe concurrent reads and writes across Spark, Trino, and DuckDB
Scalability
ilum-core was designed with scalability in mind. Because it is completely stateless (all persistent state lives in the external metadata store), ilum-core can recover its full operational state after a crash and can easily be scaled up or down based on load.
Ilum supports multiple scaling dimensions:
- ilum-core horizontal scaling - Deploy multiple stateless replicas behind a load balancer. Requires Kafka for inter-replica coordination
- Spark dynamic executor allocation - Executors scale up and down automatically based on workload, and idle executors are released after a configurable timeout (see the sketch after this list)
- Kubernetes HPA - Horizontal Pod Autoscaler can manage always-on services (Trino workers, ilum-core replicas) based on CPU/memory metrics
- Cluster autoscaler - Node-level scaling via Kubernetes Cluster Autoscaler or cloud provider auto-scaling groups, triggered when pending pods cannot be scheduled
- Resource quotas - Kubernetes namespaces and Ilum resource controls enforce per-tenant resource limits, preventing any single workload from monopolizing cluster resources
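Dynamic executor allocation is configured through standard Spark properties. A minimal sketch for Kubernetes, where shuffle tracking is the usual substitute for an external shuffle service; the bounds and timeout are illustrative and should be tuned per workload:

```python
from pyspark.sql import SparkSession

# Sketch of Spark dynamic executor allocation on Kubernetes. These are
# standard Spark properties; the bounds and timeout are illustrative.
spark = (
    SparkSession.builder.appName("elastic-etl")
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "0")
    .config("spark.dynamicAllocation.maxExecutors", "50")
    .config("spark.dynamicAllocation.executorIdleTimeout", "120s")
    .getOrCreate()
)
```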
See Resource Control and Production Deployment for configuration details.
High Availability
ilum-core and its required components support High Availability (HA) deployments. An HA deployment necessitates the use of Apache Kafka as the communication type, as gRPC does not support HA.
Recommended HA Configuration
| Component | Minimum Replicas | Notes |
|---|---|---|
| ilum-core | 3 | Stateless; requires Kafka for HA coordination |
| ilum-api | 2 | Stateless module-management microservice |
| PostgreSQL | 3 | Primary metadata store; primary + standby with streaming replication |
| MongoDB (legacy) | 3 | Only for legacy deployments; replica set with automatic failover |
| Apache Kafka | 3 | KRaft quorum or ZooKeeper-based |
| MinIO | 4 | Erasure coding for data durability |
Failure Domain Mitigation
- Pod anti-affinity: Spread replicas of each component across different nodes to survive single-node failures.
- Namespace isolation: Deploy Ilum infrastructure (Kafka, PostgreSQL, MongoDB, MinIO) in dedicated namespaces separated from user workloads.
- Zone-aware scheduling: In multi-zone clusters, use topology spread constraints to distribute pods across availability zones.
See Production Deployment for the full HA deployment guide.
Observability Architecture
Ilum provides three observability pillars - metrics, logs, and lineage - through optional Helm-deployed modules.
Metrics
Spark exposes execution metrics via the built-in PrometheusServlet. Ilum deploys PodMonitor resources that Prometheus scrapes automatically. Pre-configured Grafana dashboards visualize executor utilization, job duration, shuffle I/O, and GC pressure.
Spark Executor → PrometheusServlet → PodMonitor → Prometheus → Grafana
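Once scraped, the metrics are queryable through the standard Prometheus HTTP API. A minimal sketch; the service address is a hypothetical in-cluster name, and the actual metric names depend on the Spark metrics namespace configured for your jobs:

```python
import requests

# Sketch: query scraped Spark metrics via the Prometheus HTTP API.
# Service address is hypothetical; the query is a generic liveness check.
PROM = "http://prometheus.ilum.svc:9090"

resp = requests.get(
    f"{PROM}/api/v1/query",
    params={"query": 'up{job=~".*spark.*"}'},  # which Spark pods are up
    timeout=10,
)
for sample in resp.json()["data"]["result"]:
    print(sample["metric"].get("pod"), sample["value"][1])
```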
Logs
Container logs from all Spark driver and executor pods flow through the standard Kubernetes logging pipeline. When Loki is enabled, a Promtail DaemonSet collects logs from each node and ships them to Loki for centralized LogQL querying.
Container stdout → Promtail DaemonSet → Loki → LogQL queries
Data Lineage
The OpenLineage Spark Listener captures table-level and column-level lineage events during job execution. These events are sent to the Marquez API, which stores them and exposes lineage graphs through the Ilum UI (ERD and directed graph views).
Spark Job → OpenLineage Listener → Marquez API → Lineage UI
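Ilum wires this listener up automatically for jobs it launches; for a standalone job, the equivalent configuration is roughly the sketch below. The configuration keys are OpenLineage's documented Spark properties, while the package version and Marquez URL are illustrative:

```python
from pyspark.sql import SparkSession

# Sketch of the lineage wiring for a standalone job - Ilum configures this
# automatically for jobs it launches. Package version and Marquez URL are
# illustrative; see the OpenLineage docs for current coordinates.
spark = (
    SparkSession.builder.appName("lineage-demo")
    .config("spark.jars.packages",
            "io.openlineage:openlineage-spark_2.12:1.9.0")
    .config("spark.extraListeners",
            "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.transport.type", "http")
    .config("spark.openlineage.transport.url", "http://marquez:5000")
    .config("spark.openlineage.namespace", "ilum-jobs")
    .getOrCreate()
)
```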
Spark History Server
For deep post-execution analysis, Spark History Server reads event logs from object storage and provides detailed DAG visualizations, stage breakdowns, and task-level metrics.
All observability components are optional and enabled via Helm flags. See Monitoring for metrics and log configuration, and Data Lineage for lineage setup.
Security
Ilum provides a comprehensive security architecture covering authentication, authorization, data access control, and network security.
Authentication
Ilum supports multiple authentication methods:
- Internal LDAP - Embedded OpenLDAP server deployed via Helm for standalone deployments
- External LDAP/AD - Connect to existing Active Directory or LDAP directories
- OAuth2/OIDC - Integration via Ory Hydra supporting Okta, Azure AD, Google, and Keycloak as identity providers
Authorization
- RBAC - Role-based access control with two modes: unrestricted (development - all users have full access) and restricted (production - permissions are enforced per role)
- ABAC - Attribute-based access control using data classification tags for fine-grained policy decisions
Data Access Control
- Row-level filters - Restrict query results based on user attributes (e.g., region, department)
- Column-level masking - Redact or hash sensitive columns (PII, financial data) based on user roles
- Hierarchical privileges - Grant access at catalog, schema, table, or column level
Network Security
- TLS/mTLS - Encrypted communication between ilum-core, Spark jobs, and integrated services
- Kubernetes Network Policies - Restrict pod-to-pod communication to authorized paths
Secrets Management
- Kubernetes Secrets - Native secret storage for credentials, certificates, and API keys
- External vault integration - Support for mounting secrets from external KMS providers
Ilum as Identity Provider
Ilum can act as an OAuth2 provider for integrated services. When enabled, Airflow, Superset, Grafana, Gitea, and MinIO authenticate users through Ilum's OAuth2 endpoints - providing single sign-on across the entire platform.
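For a service integrating against these endpoints, obtaining a token looks like a standard OAuth2 client-credentials exchange. In the sketch below, the issuer URL and client credentials are placeholders; /oauth2/token is Ory Hydra's documented token endpoint path:

```python
import requests

# Sketch: an integrated service fetching a token from Ilum's OAuth2
# endpoints (backed by Ory Hydra). Issuer URL, client id, and secret are
# placeholders; /oauth2/token is Hydra's documented token endpoint.
ISSUER = "http://ilum-oauth2.ilum.svc:4444"

token = requests.post(
    f"{ISSUER}/oauth2/token",
    data={"grant_type": "client_credentials", "scope": "profile"},
    auth=("superset-client", "s3cr3t"),  # placeholder client credentials
    timeout=10,
).json()
print(token["access_token"][:16], "...")
```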
See Security for the full security guide, Data Access Control for row/column policies, and OAuth2 for OIDC configuration.