Get Started
This guide will walk you through the process of setting up and running your first Spark job with Ilum on a Kubernetes cluster.
Prerequisites
Before proceeding, ensure that you have the following installed and properly configured:
Kubernetes Cluster: This guide assumes that you already have a Kubernetes cluster up and running. If you don't, you can set one up locally with minikube; see the minikube documentation for detailed installation instructions.
Helm: Helm, the package manager for Kubernetes, is used to install Ilum. If you haven't installed Helm yet, see the Helm documentation for installation instructions. You can verify both tools with the commands below.
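As a quick sanity check that both tools are available on your PATH (the reported versions will depend on your setup):
kubectl version --client
helm version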
Setting Up the Kubernetes Cluster (minikube)
If you haven't already, start your Kubernetes cluster with the following command:
minikube start --cpus 4 --memory 8192 --addons metrics-server
This command sets up a minikube cluster with 4 CPUs, 8GB of memory, and adds the metrics-server for resource metrics collection.
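Before moving on, you can confirm that the cluster is up and the node is ready (the exact output will vary with your minikube version):
minikube status
kubectl get nodes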
Installing Ilum
Once your Kubernetes cluster is up and running, you can install Ilum by adding the Ilum Helm chart repository and then installing Ilum using Helm:
helm repo add ilum https://charts.ilum.cloud
helm install ilum ilum/ilum
This will install Ilum into your Kubernetes cluster. It should take around 2 minutes for Ilum to initialize.
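You can watch the Ilum components start with kubectl and wait until all pods report Running (this assumes Ilum was installed into your current namespace):
kubectl get pods -w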
Accessing the Ilum UI
After Ilum is installed, you can access the UI by port-forwarding the Ilum UI service to your localhost on port 9777:
kubectl port-forward svc/ilum-ui 9777:9777
Now, you can navigate to http://localhost:9777 in your web browser to access the Ilum UI.
You can log in with the default credentials admin:admin.
Submitting a Spark Application in the UI
Now that your Kubernetes cluster is configured to handle Spark jobs via Ilum, let's submit a Spark application. For this example, we'll use the "SparkPi" example from the Spark documentation; the required jar is the spark-examples jar that ships with the Spark distribution.
Ilum will create a Spark driver pod using the Spark 3.x Docker image, and the number of Spark executor pods can be scaled out across nodes as your workload requires. And that's it! You've successfully set up Ilum and run your first Spark job. Feel free to explore the Ilum UI and API for submitting and managing Spark applications. For a more traditional approach, you can also use the familiar spark-submit command.
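As a rough sketch of that traditional route, a SparkPi submission straight to Kubernetes with spark-submit looks like this; the API server address, container image, and jar path are placeholders you'd replace with values from your own cluster:
spark-submit \
  --master k8s://https://<kubernetes-api-server>:<port> \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=<spark-image> \
  local:///opt/spark/examples/jars/<spark-examples jar> \
  1000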
Interactive Spark Job with Scala/Java
Interactive jobs in Ilum are long-running sessions that can execute job requests immediately. This is especially useful because there's no need to wait for a Spark context to be initialized on every request. If multiple users point to the same job ID, they will interact with the same Spark context.
To enable interactive capabilities in your existing Spark jobs, you'll need to implement a simple interface in the part of your code that should be interactive. Here's how you can do it:
First, add the Ilum job API dependency to your project:
Gradle
implementation 'cloud.ilum:ilum-job-api:6.0.0'
Maven
<dependency>
    <groupId>cloud.ilum</groupId>
    <artifactId>ilum-job-api</artifactId>
    <version>6.0.0</version>
</dependency>
sbt
libraryDependencies += "cloud.ilum" % "ilum-job-api" % "6.0.0"
Then, implement the Job trait/interface in your Spark job. Here's an example:
Scala
package interactive.job.example

import cloud.ilum.job.Job
import org.apache.spark.sql.SparkSession

class InteractiveJobExample extends Job {
  override def run(sparkSession: SparkSession, config: Map[String, Any]): Option[String] = {
    val userParam = config.getOrElse("userParam", "None").toString
    Some(s"Hello ${userParam}")
  }
}
Java
package interactive.job.example;

import cloud.ilum.job.Job;
import org.apache.spark.sql.SparkSession;
import scala.Option;
import scala.Some;
import scala.collection.immutable.Map;

public class InteractiveJobExample implements Job {
    @Override
    public Option<String> run(SparkSession sparkSession, Map<String, Object> config) {
        String userParam = config.getOrElse("userParam", () -> "None");
        return Some.apply("Hello " + userParam);
    }
}
In this example, the run method is overridden to accept a SparkSession and a configuration map. It retrieves a user parameter from the configuration map and returns a greeting message.
You can find a similar example on GitHub.
By following this pattern, you can transform your Spark jobs into interactive jobs that can execute calculations immediately, improving user interactivity and reducing waiting times.
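Once the class is in place, package it the way you would any other Spark job so the resulting jar can be uploaded in the Ilum UI later; for a typical project that is simply (assuming your build already declares the Spark and ilum-job-api dependencies):
sbt package
# or, for Maven projects
mvn package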
Interactive Spark Job with Python
First, install the Ilum Python package:
pip install ilum
The Spark job logic is encapsulated in a class that extends IlumJob, specifically within its run method:
from ilum.api import IlumJob

class PythonSparkExample(IlumJob):
    def run(self, spark, config):
        # Job logic here
A simple interactive Spark Pi example:
from random import random
from operator import add

from ilum.api import IlumJob

class SparkPiInteractiveExample(IlumJob):
    def run(self, spark, config):
        partitions = int(config.get('partitions', '5'))
        n = 100000 * partitions

        def f(_: int) -> float:
            x = random() * 2 - 1
            y = random() * 2 - 1
            return 1 if x ** 2 + y ** 2 <= 1 else 0

        count = spark.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)

        return "Pi is roughly %f" % (4.0 * count / n)
You can find a similar example on GitHub.
Submitting an Interactive Spark Job in the UI
After creating a file that contains your Spark code, you will need to submit it to Ilum. Here's how you can do it:
Open the Ilum UI in your browser and create a new group:
Enter a name for the group, choose a cluster, upload your Spark file, and create the group:
After applying the changes, Ilum will create a Spark driver pod. You can control the number of Spark executor pods by scaling them according to your needs. Once the Spark container is ready, you can execute jobs. To do this, you'll need to provide the canonical name of your Scala class and define any optional parameters in JSON format.
Now we have to put the canonical name of our Scala class, interactive.job.example.InteractiveJobExample, and define the userParam parameter in JSON format:
{
  "userParam": "World"
}
The first request might take a few seconds because of the initialization phase; each subsequent one will be immediate.
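If you uploaded the Python SparkPi example instead, the parameter it reads is partitions, so the JSON parameters would look like this (the value is only an illustration):
{
  "partitions": "10"
}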
By following these steps, you can submit and run interactive Spark jobs using Ilum. This functionality provides real-time data processing, enhances user interactivity, and reduces the time spent waiting for results.