Get Started

This guide will walk you through the process of setting up and running your first Spark job with Ilum on a Kubernetes cluster.

Prerequisites

Before proceeding, ensure that you have the following installed and properly configured:

  1. Kubernetes Cluster: This guide assumes that you already have a Kubernetes cluster up and running. If not, please follow the instructions to set up a Kubernetes cluster on minikube. You can find detailed instructions on how to install minikube here.

  2. Helm: Helm, the package manager for Kubernetes, is used to install Ilum. If you haven't installed Helm yet, you can find instructions here.
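
Before moving on, you can quickly confirm that both tools are available from your shell:

kubectl version --client
helm version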

Setting Up the Kubernetes Cluster (minikube)

If you haven't already, start your Kubernetes cluster with the following command:

minikube start --cpus 4 --memory 8192 --addons metrics-server

This command sets up a minikube cluster with 4 CPUs, 8GB of memory, and adds the metrics-server for resource metrics collection.
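
Before continuing, you can verify that the cluster is running and that kubectl can reach it:

minikube status
kubectl get nodes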

Installing Ilum

Once your Kubernetes cluster is up and running, you can install Ilum by adding the Ilum Helm chart repository and then installing Ilum using Helm:

helm repo add ilum https://charts.ilum.cloud
helm install ilum ilum/ilum

This will install Ilum into your Kubernetes cluster. It should take around 2 minutes for Ilum to initialize.
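
To check on the initialization, you can list the pods in the namespace Ilum was installed into (the default namespace if you used the commands above) and wait until they all report Ready:

kubectl get pods
kubectl wait --for=condition=Ready pod --all --timeout=300s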

Accessing the Ilum UI

After Ilum is installed, you can access the UI by port-forwarding the Ilum UI service to your localhost on port 9777:

kubectl port-forward svc/ilum-ui 9777:9777

Now, you can navigate to http://localhost:9777 in your web browser to access the Ilum UI. You can use default credentials admin:admin to log in.
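
If the page does not load, you can confirm from a second terminal that the port-forward is active and the service responds (the exact response will depend on your Ilum version):

curl -I http://localhost:9777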

Submitting a Spark Application on UI

Now that your Kubernetes cluster is configured to handle Spark jobs via Ilum, let's submit a Spark application. For this example, we'll use the "SparkPi" example from the Spark documentation. You can download the required jar file from this link.

Ilum will create a Spark driver pod using a Spark 3.x Docker image. The Spark executor pods can be scaled across multiple nodes according to your requirements.
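
While the job runs, you can watch the driver and executor pods appear and terminate (the pod names will vary with your release and job):

kubectl get pods -w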

And that's it! You've successfully set up Ilum and run your first Spark job. Feel free to explore the Ilum UI and API for submitting and managing Spark applications. For traditional approaches, you can also use the familiar spark-submit command.
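
If you prefer the command line, the same SparkPi class can also be launched with a plain Spark-on-Kubernetes spark-submit, independently of Ilum. The values below (API server address, container image, and jar path) are placeholders that you would replace with your own:

spark-submit \
  --master k8s://https://<kubernetes-api-server>:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=<spark-image> \
  local:///path/to/spark-examples.jar 1000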

Interactive Spark Job with Scala/Java

Interactive jobs in Ilum are long-running sessions that can execute job instances immediately. This is especially useful because there is no need to wait for the Spark context to be initialized on every request. If multiple users point to the same job ID, they will interact with the same Spark context.

To enable interactive capabilities in your existing Spark jobs, you'll need to implement a simple interface in the part of your code that needs to be interactive. Here's how you can do it:

First, add the Ilum job API dependency to your project:

Gradle

implementation 'cloud.ilum:ilum-job-api:6.0.0'

Maven

<dependency>
<groupId>cloud.ilum</groupId>
<artifactId>ilum-job-api</artifactId>
<version>6.0.0</version>
</dependency>

sbt

libraryDependencies += "cloud.ilum" % "ilum-job-api" % "6.0.0"

Then, implement the Job trait/interface in your Spark job. Here's an example:

Scala

package interactive.job.example

import cloud.ilum.job.Job
import org.apache.spark.sql.SparkSession

class InteractiveJobExample extends Job {

  override def run(sparkSession: SparkSession, config: Map[String, Any]): Option[String] = {
    val userParam = config.getOrElse("userParam", "None").toString
    Some(s"Hello ${userParam}")
  }
}

Java

package interactive.job.example;

import cloud.ilum.job.Job;
import org.apache.spark.sql.SparkSession;
import scala.Option;
import scala.Some;
import scala.collection.immutable.Map;

public class InteractiveJobExample implements Job {

    @Override
    public Option<String> run(SparkSession sparkSession, Map<String, Object> config) {
        // getOrElse returns Object here, so cast the result to String
        String userParam = (String) config.getOrElse("userParam", () -> "None");
        return Some.apply("Hello " + userParam);
    }
}

In this example, the run method is overridden to accept a SparkSession and a configuration map. It retrieves a user parameter from the configuration map and returns a greeting message.

You can find a similar example on GitHub.

By following this pattern, you can transform your Spark jobs into interactive jobs that can execute calculations immediately, improving user interactivity and reducing waiting times.

Interactive Spark Job with Python

First, install the Ilum Python package:

pip install ilum

The Spark job logic is encapsulated in a class that extends IlumJob, specifically within its run method:

from ilum.api import IlumJob

class PythonSparkExample(IlumJob):
    def run(self, spark, config):
        # Job logic here
        pass

A simple interactive Spark Pi example:

from random import random
from operator import add

from ilum.api import IlumJob


class SparkPiInteractiveExample(IlumJob):

    def run(self, spark, config):
        partitions = int(config.get('partitions', '5'))
        n = 100000 * partitions

        def f(_: int) -> float:
            x = random() * 2 - 1
            y = random() * 2 - 1
            return 1 if x ** 2 + y ** 2 <= 1 else 0

        count = spark.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)

        return "Pi is roughly %f" % (4.0 * count / n)

You can find a similar example on GitHub.

Submitting an Interactive Spark Job on UI

After creating a file that contains your Spark code, you will need to submit it to Ilum. Here's how you can do it:

Open the Ilum UI in your browser and create a new group. Enter a name for the group, choose a cluster, upload your Spark file, and create the group.

After applying the changes, Ilum will create a Spark driver pod. You can control the number of Spark executor pods by scaling them according to your needs.

Once the Spark container is ready, you can execute the jobs. To do this, you'll need to provide the canonical name of your Scala class and define any optional parameters in JSON format.

Now we have to enter the canonical name of our Scala class

interactive.job.example.InteractiveJobExample

and define the userParam parameter in JSON format:

{
"userParam": "World"
}

The first request might take a few seconds because of the initialization phase; each subsequent request will be immediate.

By following these steps, you can submit and run interactive Spark jobs using Ilum. This functionality provides real-time data processing, enhances user interactivity, and reduces the time spent waiting for results.