Run Spark Submit (spark-submit) on Kubernetes
A simple Spark job in Ilum operates just like one submitted via the standard spark-submit command, but with additional enhancements for ease of use, configuration, and integration with external tools.
You can use the JAR file with Spark examples from your local Spark installation or any custom JAR you have.
Below is a step-by-step guide to setting up and running a simple Spark job using spark-submit on Ilum. This guide demonstrates the core configuration needed and shows how to monitor your job’s progress within the Ilum platform. For a complete overview of Ilum's architecture, check the Architecture Overview.
How do I run a Spark job on Kubernetes with spark-submit?
To run a Spark job on Ilum (Kubernetes), ensure Java 17 and Spark are installed, upload your JAR, and run:
- Spark 4 (default)
- Spark 3
./bin/spark-submit \
--master k8s://http://<ilum-core-address>:<ilum-core-port> \
--deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
--conf spark.driver.memory=4g \
--conf spark.ilum.cluster=default \
--conf spark.kubernetes.container.image=ilum/spark:4.1.0 \
--conf spark.kubernetes.submission.waitAppCompletion=true \
s3a://ilum-files/ilum/default/spark-examples_2.13-4.1.0.jar
./bin/spark-submit \
--master k8s://http://<ilum-core-address>:<ilum-core-port> \
--deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
--conf spark.driver.memory=4g \
--conf spark.ilum.cluster=default \
--conf spark.kubernetes.container.image=ilum/spark:3.5.8 \
--conf spark.kubernetes.submission.waitAppCompletion=true \
s3a://ilum-files/ilum/default/spark-examples_2.12-3.5.8.jar
Note: Replace <ilum-core-address> and <ilum-core-port> with your actual Ilum Core endpoint.
Step-by-Step Guide
1. Prerequisites
- Ensure Java 17 is installed and correctly set in your JAVA_HOME.
- Download and extract the appropriate version of Apache Spark:
- Spark 4 (default)
- Spark 3
wget https://archive.apache.org/dist/spark/spark-4.1.0/spark-4.1.0-bin-hadoop3.tgz
tar -xzf spark-4.1.0-bin-hadoop3.tgz
cd spark-4.1.0-bin-hadoop3
wget https://dlcdn.apache.org/spark/spark-3.5.8/spark-3.5.8-bin-hadoop3.tgz
tar -xzf spark-3.5.8-bin-hadoop3.tgz
cd spark-3.5.8-bin-hadoop3
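Before continuing, a quick sanity check that Spark will pick up the right Java runtime:
# Verify the Java toolchain Spark will use.
java -version        # should report a 17.x runtime
echo "$JAVA_HOME"    # should point to your Java 17 installation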
2. Connect to Ilum
If Ilum is deployed on Kubernetes, forward the service port to your local machine to make Ilum accessible at localhost:9888.
kubectl port-forward svc/ilum-core 9888:9888
If you're communicating from within the same Kubernetes cluster, you can use Kubernetes DNS-based service addresses (e.g., http://ilum-core.namespace.svc.cluster.local) or expose services using Ingress.
3. Submit Your Spark Job
Choose the submission method that best fits your workflow:
- REST (Local Testing)
- Kubernetes (Production)
- Auto-Upload (Local JAR)
This method is suitable for quick local testing.
1. Upload your JAR File
For demonstration, we assume the JAR is uploaded manually to MinIO.
- Spark 4 (default)
- Spark 3
Locate the example JAR: examples/jars/spark-examples_2.13-4.1.0.jar
Upload it to MinIO (bucket ilum-files, path ilum/default/): s3a://ilum-files/ilum/default/spark-examples_2.13-4.1.0.jar
Locate the example JAR: examples/jars/spark-examples_2.12-3.5.8.jar
Upload it to MinIO (bucket ilum-files, path ilum/default/): s3a://ilum-files/ilum/default/spark-examples_2.12-3.5.8.jar
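One way to do the manual upload is with the MinIO Client (mc); a minimal sketch, assuming MinIO is reachable at localhost:9000 (e.g., via kubectl port-forward svc/ilum-minio 9000:9000) and the default minioadmin credentials:
# Register the port-forwarded MinIO endpoint under an alias (default credentials assumed; adjust if yours differ).
mc alias set ilum http://localhost:9000 minioadmin minioadmin
# Copy the example JAR into the expected bucket and path (for Spark 3, use spark-examples_2.12-3.5.8.jar).
mc cp examples/jars/spark-examples_2.13-4.1.0.jar ilum/ilum-files/ilum/default/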
2. Submit via REST
spark.ilum.pyRequirements is not supported in this mode, as REST does not support PySpark submissions.
Run the following command:
- Spark 4 (default)
- Spark 3
./bin/spark-submit \
--master spark://localhost:9888 \
--deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
--conf spark.master.rest.enabled=true \
--conf spark.ilum.cluster=default \
--conf spark.app.name=my-spark-job \
s3a://ilum-files/ilum/default/spark-examples_2.13-4.1.0.jar
./bin/spark-submit \
--master spark://localhost:9888 \
--deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
--conf spark.master.rest.enabled=true \
--conf spark.ilum.cluster=default \
--conf spark.app.name=my-spark-job \
s3a://ilum-files/ilum/default/spark-examples_2.12-3.5.8.jar
Parameters:
| Parameter | Description |
|---|---|
| --master | Ilum Core address via REST (e.g. spark://localhost:9888). |
| --conf spark.master.rest.enabled=true | Enables REST submission. |
| s3a://... | JAR file path in MinIO. |
Expected Output
Running Spark using the REST application submission protocol.
25/03/12 12:58:01 INFO RestSubmissionClient: Submitting a request to launch an application in spark://localhost:9888.
25/03/12 12:58:03 INFO RestSubmissionClient: Submission successfully created as 20250312-1158-qdnioef2rny. Polling submission state...
25/03/12 12:58:03 INFO RestSubmissionClient: Submitting a request for the status of submission 20250312-1158-qdnioef2rny in spark://localhost:9888.
25/03/12 12:58:03 INFO RestSubmissionClient: State of driver 20250312-1158-qdnioef2rny is now SUBMITTED.
25/03/12 12:58:03 INFO RestSubmissionClient: Driver is running on worker ILUM at ILUM_UI_ADDRESS/workloads/details/job/20250312-1158-qdnioef2rny.
25/03/12 12:58:03 INFO RestSubmissionClient: Server responded with CreateSubmissionResponse:
{
"action" : "CreateSubmissionResponse",
"serverSparkVersion" : "4.1.0",
"submissionId" : "20250312-1158-qdnioef2rny",
"success" : true
}
25/03/12 12:58:03 INFO ShutdownHookManager: Shutdown hook called
25/03/12 12:58:03 INFO ShutdownHookManager: Deleting directory /tmp/spark-fa2603be-488a-4e2a-9b7f-5e49825d379b
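As the output shows, the client polls submission state over the standard Spark REST submission protocol, so you can also query a submission's status yourself with curl (assuming Ilum exposes the standard status endpoint; the submission ID below is the one from the sample output, substitute your own):
# Ask the REST endpoint for the current state of a submission.
curl http://localhost:9888/v1/submissions/status/20250312-1158-qdnioef2rny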
This method is recommended for production and supports advanced features like Python dependencies.
1. Upload your JAR File
For demonstration, we assume the JAR is uploaded manually to MinIO.
In production environments, this process should be automated and executed programmatically (e.g., using the AWS SDK or Hadoop's S3A connector) before triggering spark-submit; a scripted sketch follows the JAR paths below.
- Spark 4 (default)
- Spark 3
Locate the example JAR: examples/jars/spark-examples_2.13-4.1.0.jar
Upload it to MinIO (bucket ilum-files, path ilum/default/): s3a://ilum-files/ilum/default/spark-examples_2.13-4.1.0.jar
Locate the example JAR: examples/jars/spark-examples_2.12-3.5.8.jar
Upload it to MinIO (bucket ilum-files, path ilum/default/): s3a://ilum-files/ilum/default/spark-examples_2.12-3.5.8.jar
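For instance, an automated upload step could be scripted with the AWS CLI pointed at the MinIO endpoint; a sketch assuming a port-forwarded MinIO at localhost:9000 and the default minioadmin credentials:
# Hypothetical CI/deploy step: push the job JAR to MinIO before triggering spark-submit.
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
aws s3 cp examples/jars/spark-examples_2.13-4.1.0.jar \
  s3://ilum-files/ilum/default/spark-examples_2.13-4.1.0.jar \
  --endpoint-url http://localhost:9000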
2. Submit via Kubernetes Mode
Include Ilum-specific configurations for better management:
- --conf spark.ilum.cluster=default
- --conf spark.app.name=my-spark-job
- --conf spark.ilum.tags=analytics,pi-calculation
- --conf spark.ilum.pyRequirements="numpy==1.24.1,pandas==2.0.3" (if using Python)

These configurations allow you to better categorize, identify, and manage jobs within the Ilum UI. Run this command:
- Spark 4 (default)
- Spark 3
./bin/spark-submit \
--master k8s://http://localhost:9888 \
--deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
--conf spark.driver.memory=4g \
--conf spark.ilum.cluster=default \
--conf spark.kubernetes.container.image=ilum/spark:4.1.0 \
--conf spark.kubernetes.submission.waitAppCompletion=true \
s3a://ilum-files/ilum/default/spark-examples_2.13-4.1.0.jar
./bin/spark-submit \
--master k8s://http://localhost:9888 \
--deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
--conf spark.driver.memory=4g \
--conf spark.ilum.cluster=default \
--conf spark.kubernetes.container.image=ilum/spark:3.5.8 \
--conf spark.kubernetes.submission.waitAppCompletion=true \
s3a://ilum-files/ilum/default/spark-examples_2.12-3.5.8.jar
Parameters:
| Parameter | Description |
|---|---|
| --master | Address of your Kubernetes API (or Ilum Core for REST mode). |
| --deploy-mode cluster | Submits the job to the Spark cluster. |
| --class | Entry point class of your Spark application (Java/Scala only; Python jobs pass the .py file directly instead of using --class). |
| --conf spark.driver.memory | Specifies memory allocation for the driver. |
| --conf spark.ilum.cluster | Logical cluster name defined in Ilum. |
| --conf spark.kubernetes.container.image | Docker image containing Spark. |
| --conf spark.kubernetes.submission.waitAppCompletion=true | Keeps the CLI process attached until the job completes. |
| s3a://... | JAR path in S3-compatible storage like MinIO. |
Expected Output
25/04/02 15:42:30 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
25/04/02 15:42:38 INFO LoggingPodStatusWatcherImpl: State changed, new state:
pod name: org-apache-spark-examples-sparkpi-30ae6d95f6bd37fd-driver
namespace: default
labels: ilum.jobId -> 20250402-1342-s55afwq7gax, ilum.clusterName -> default
pod uid: 20250402-1342-s55afwq7gax
creation time: 2025-04-02T13:42:37.145Z
service account name: ilum-test-ilum-core-spark
volumes: N/A
node name: ilum
start time: 2025-04-02T13:42:37.145Z
phase: Running
container status:
container name: spark-driver
container image: ilum/spark:4.1.0
container state: running
container started at: 2025-04-02T13:42:37.145Z
25/04/02 15:42:38 INFO LoggingPodStatusWatcherImpl: Waiting for application org.apache.spark.examples.SparkPi with application ID spark-a4f8f1eb4ed344f38d799d79817c45dc and submission ID default:org-apache-spark-examples-sparkpi-30ae6d95f6bd37fd-driver to finish...
25/04/02 15:42:39 INFO LoggingPodStatusWatcherImpl: Application status for spark-a4f8f1eb4ed344f38d799d79817c45dc (phase: Running)
...
25/04/02 15:43:03 INFO LoggingPodStatusWatcherImpl: Application status for spark-a4f8f1eb4ed344f38d799d79817c45dc (phase: Running)
25/04/02 15:43:03 INFO LoggingPodStatusWatcherImpl: Container final statuses:
container name: spark-driver
container image: ilum/spark:4.1.0
container state: terminated
container started at: 2025-04-02T13:42:37.145Z
container finished at: 2025-04-02T13:43:03.145Z
exit code: 0
termination reason: Spark job completed
25/04/02 15:43:03 INFO LoggingPodStatusWatcherImpl: Application org.apache.spark.examples.SparkPi with application ID spark-a4f8f1eb4ed344f38d799d79817c45dc and submission ID default:org-apache-spark-examples-sparkpi-30ae6d95f6bd37fd-driver finished
25/04/02 15:43:03 INFO ShutdownHookManager: Shutdown hook called
25/04/02 15:43:03 INFO ShutdownHookManager: Deleting directory /tmp/spark-61e63485-79a3-491d-9bac-c71a8c1d96aa
Ilum can automatically upload your local JARs to MinIO during submission.
1. Additional Prerequisites
Forward the MinIO port so the local Spark client can upload files:
kubectl port-forward svc/ilum-minio 9000:9000
2. Download S3 Dependencies
Download the required Hadoop/AWS JARs to your local jars folder:
Spark 4.x bundles hadoop-client-3.4.x which uses time-suffixed config values (e.g., "60s"). Using hadoop-aws-3.3.4 with Spark 4.x will fail with java.lang.NumberFormatException: For input string: "60s". Make sure to use the matching hadoop-aws version for your Spark version.
- Spark 4 (default)
- Spark 3
wget -P jars \
https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.4.1/hadoop-aws-3.4.1.jar \
https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.12.367/aws-java-sdk-bundle-1.12.367.jar \
https://repo1.maven.org/maven2/software/amazon/awssdk/bundle/2.24.6/bundle-2.24.6.jar
wget -P jars \
https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.3.4/hadoop-aws-3.3.4.jar \
https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.12.262/aws-java-sdk-1.12.262.jar \
https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-core/1.12.262/aws-java-sdk-core-1.12.262.jar \
https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-dynamodb/1.12.262/aws-java-sdk-dynamodb-1.12.262.jar \
https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-s3/1.12.262/aws-java-sdk-s3-1.12.262.jar
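If you are unsure which Hadoop line your Spark distribution bundles, you can inspect the hadoop-client jars shipped in its jars folder (jar names may vary slightly between distributions):
# List the bundled Hadoop client jars to confirm the Hadoop version (e.g., 3.4.x for Spark 4).
ls jars/ | grep hadoop-client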
3. Submit with Auto-Upload
This command includes the configuration needed to reach the local MinIO port and upload files (replace the image tag or S3 credentials as needed):
- Spark 4 (default)
- Spark 3
./bin/spark-submit \
--master k8s://http://localhost:9888 \
--deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
--conf spark.ilum.cluster=default \
--conf spark.app.name=my-sparkpi-job \
--conf spark.kubernetes.container.image=ilum/spark:4.1.0 \
--conf spark.kubernetes.submission.waitAppCompletion=true \
--conf spark.kubernetes.file.upload.path=s3a://ilum-files/spark-jobs \
--conf spark.hadoop.fs.s3a.endpoint=http://localhost:9000 \
--conf spark.hadoop.fs.s3a.access.key=minioadmin \
--conf spark.hadoop.fs.s3a.secret.key=minioadmin \
--conf spark.hadoop.fs.s3a.path.style.access=true \
--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
--conf spark.hadoop.fs.s3a.fast.upload=true \
--conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider \
./examples/jars/spark-examples_2.13-4.1.0.jar
./bin/spark-submit \
--master k8s://http://localhost:9888 \
--deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
--conf spark.ilum.cluster=default \
--conf spark.app.name=my-sparkpi-job \
--conf spark.kubernetes.container.image=ilum/spark:3.5.8 \
--conf spark.kubernetes.submission.waitAppCompletion=true \
--conf spark.kubernetes.file.upload.path=s3a://ilum-files/spark-jobs \
--conf spark.hadoop.fs.s3a.endpoint=http://localhost:9000 \
--conf spark.hadoop.fs.s3a.access.key=minioadmin \
--conf spark.hadoop.fs.s3a.secret.key=minioadmin \
--conf spark.hadoop.fs.s3a.path.style.access=true \
--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
--conf spark.hadoop.fs.s3a.fast.upload=true \
--conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider \
./examples/jars/spark-examples_2.12-3.5.8.jar
What happens behind the scenes?
- The Spark client automatically uploads your local JAR to the specified MinIO bucket (ilum-files/spark-jobs).
- A Kubernetes driver pod is created on the Ilum-managed Kubernetes cluster.
- Job execution is monitored directly through the Ilum UI.
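To double-check the upload, you can list the target path; this reuses the hypothetical mc alias from the manual-upload step (skip if you did not set one up). Spark places uploaded files in a generated subdirectory under the configured path:
# List the upload path; expect a spark-upload-<uuid> directory containing your JAR.
mc ls ilum/ilum-files/spark-jobs/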
4. Monitor and Troubleshoot
Using the Ilum UI:
- Monitor Job Progress: Track executors, memory usage, and job stages.
- Review Results: Access logs and the integrated Spark History Server.
- Troubleshoot: Diagnose failures by checking detailed executor logs.
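If you prefer the command line, you can also tail driver logs directly with kubectl, selecting the pod by the ilum.jobId label shown in the submission output (the job ID and namespace below are from the sample output; substitute your own):
# Follow the driver pod's logs by its Ilum job ID label.
kubectl logs -f -l ilum.jobId=20250402-1342-s55afwq7gax -n default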
For more details on monitoring metrics, see the Monitoring Guide.
Comparison: Classic spark-submit vs Ilum Approach
Running Spark directly on Kubernetes requires significant administrative effort. Ilum simplifies this by automating infrastructure management.
Traditional Approach (Native Spark on K8s) vs Ilum
| Feature | Native Spark on K8s | Ilum (Managed Spark) |
|---|---|---|
| Setup | Manual Docker image build & complex spark-submit args. | Automated. Use existing JARs; Ilum handles images. |
| Config | Verbose (Service Accounts, Volumes, Secrets). | Simplified. Minimal args; configs are injected automatically. |
| Storage | Manual Hadoop/S3 configuration per job. | Integrated. Automatic credential injection for S3/GCS/Azure. |
| Monitoring | CLI-based (kubectl logs), ephemeral. | Centralized UI. Persistent logs, metrics, and history. |
| Observability | Basic Spark UI (if exposed). | Advanced. Data Lineage, detailed resource metrics. |
Key Benefits of Ilum:
- Automatic Image Selection: Ilum selects a compatible Spark Docker image matching the cluster version.
- Advanced Observability: Ilum provides deep lineage observability and advanced monitoring capabilities.
- Simplified Configuration: Reduce spark-submit parameters by 3x-4x.
- Integrated Storage Access: Credentials for all configured storages are automatically injected.
- Instant Monitoring: Logs and metrics (CPU/RAM) appear in the Ilum UI immediately.
For a developer, this means less time fighting with infrastructure and error-prone configurations, and more time delivering business logic.
For advanced customization, refer to the official Spark documentation.