Run Apache Spark Jobs via Ilum UI

Running an Apache Spark job on Kubernetes with Ilum works just like submitting one via spark-submit, with additional enhancements for ease of use, configuration, and integration with external tools.

You can use the JAR file with the Spark examples from the link below:

Spark 4 / Scala 2.13: spark-examples_2.13-4.1.1.jar

Here's a step-by-step guide to setting up a simple Spark job using Ilum. It walks you through configuring, executing, and monitoring a basic job named MiniReadWriteTest within the Ilum platform.

Step-by-Step Tutorial: Running Your First Spark Job

  1. Navigate to the Jobs Section: This area allows you to manage all your data processing tasks.

  2. Create a New Job:

    • Click on the ‘New Job +’ button to start the setup process.
  3. Fill Out Job Details:

    • General Tab:

      • Name: Enter MiniReadWriteTest
      • Job Type: Select Spark Job
      • Class: Enter org.apache.spark.examples.MiniReadWriteTest
      • Language: Select Scala
    • Configuration Tab:

      • Arguments: Enter /opt/spark/examples/src/main/resources/kv1.txt

      This path points to kv1.txt, a test file shipped with every Spark distribution; the job distributes it to the executors.

    • Resources Tab:

      • Jars: Upload the JAR file:

Spark 4 / Scala 2.13: spark-examples_2.13-4.1.1.jar

    • Memory Tab:

      • Leave all settings at their default values for this example.
  4. Submit and Monitor the Job:

    • Submit the job.
    • Navigate to the logs section to review logs from each executor.
    • You should see log output showing the job execution, including:
      • Spark initialization messages (SparkContext: Running Spark version 3.5.7)
      • File reading and word count operations (Performing local word count from /opt/spark/examples/src/main/resources/kv1.txt)
      • Task execution across executors (Starting task 0.0 in stage 0.0)
      • Final success message (Success! Local Word Count 500 and D Word Count 500 agree.)
  5. Review Job Execution:

    • Once the job has started, check the status in the job overview section.
    • Monitor the memory usage and other performance metrics in the executors section.
    • Observe the progress of your job through each stage on the timeline.
  6. Completion and Review:

    • Upon completion, the job details and results are logged into the Spark history server.
    • Visit the history server section to see your completed job and review detailed execution stages.
  7. Final Step:

    • Congratulations! You have successfully set up and run your MiniReadWriteTest job in Ilum. For further information or support, contact [email protected].
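For orientation, the check that produces the final success message above can be sketched in plain Python. This is a simplified illustration, not the actual Scala source of the example: MiniReadWriteTest counts words in the input file locally, compares that against a count computed over distributed partitions, and reports success when the two agree.

```python
# Simplified sketch of the MiniReadWriteTest logic: compare a local word
# count of a file against a count computed over "distributed" chunks.
# (Illustrative only -- the real example is Scala and uses a SparkContext.)

def local_word_count(lines):
    """Count whitespace-separated words across all lines."""
    return sum(len(line.split()) for line in lines)

def distributed_word_count(lines, partitions=4):
    """Mimic counting per partition and summing the partial results."""
    chunks = [lines[i::partitions] for i in range(partitions)]
    return sum(local_word_count(chunk) for chunk in chunks)

lines = ["238 val_238", "86 val_86", "311 val_311"]  # kv1.txt-style rows
local = local_word_count(lines)
dist = distributed_word_count(lines)
assert local == dist
print(f"Success! Local Word Count {local} and D Word Count {dist} agree.")
```

Splitting the counting across chunks and summing the partial results is the same map-then-reduce shape the real job performs across Spark executors.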

To submit jobs programmatically instead of using the UI, see the Run Spark Job via REST API guide.
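As a rough idea of what a programmatic submission involves, the job details from this tutorial can be expressed as a JSON payload. The field names below are assumptions for illustration only; consult the Run Spark Job via REST API guide for Ilum's actual endpoint and request schema.

```python
import json

# Hypothetical JSON payload mirroring the UI fields used in this guide.
# Field names are illustrative assumptions -- see the REST API guide
# for the real schema.
job_request = {
    "name": "MiniReadWriteTest",
    "type": "SPARK_JOB",
    "language": "SCALA",
    "className": "org.apache.spark.examples.MiniReadWriteTest",
    "arguments": ["/opt/spark/examples/src/main/resources/kv1.txt"],
    "jars": ["spark-examples_2.13-4.1.1.jar"],
}

payload = json.dumps(job_request, indent=2)
print(payload)
```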

By following these steps, you'll be able to efficiently set up, run, and monitor a basic Spark job within the Ilum platform, gaining familiarity with its functionalities and preparing you for more complex data processing tasks.

Here's a consolidated explanation of how Ilum facilitates Spark job submissions, blending the traditional features of spark-submit with Ilum's advanced management capabilities:

Loading example job

Ilum provides an example job to help new users get started quickly. Example job loading is enabled by default; you can disable it by passing --set ilum-core.examples.job=false during installation.
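For example, assuming Ilum was installed as a Helm release named ilum from the ilum/ilum chart (the release and chart names here are placeholders for your own installation), the flag can be applied like this:

```shell
# Disable example-job loading; release and chart names are placeholders.
helm upgrade ilum ilum/ilum --set ilum-core.examples.job=false
```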

Why Ilum is a Better Alternative to spark-submit

  • Universal Compatibility: Ilum enables the submission of any Spark job, akin to using spark-submit. It supports various programming languages used with Spark, including Scala, Python, and R, catering to all typical Spark operations like batch processing, streaming jobs, or interactive queries.

  • Simplified Command Execution: While spark-submit often involves complex command-line inputs for library dependencies, job parameters, and cluster configurations, Ilum abstracts these into an intuitive user interface. This approach minimizes error risks and simplifies operations, especially beneficial for those less familiar with command-line intricacies.

  • Direct Code Deployment: Users can upload their JAR files, Python scripts, or notebooks directly into Ilum, similar to specifying resources in a spark-submit command. Ilum enhances this by allowing these resources to be configured for scheduled or event-triggered executions, providing greater operational flexibility.

  • Automated Environment Handling: Unlike the manual setup required with spark-submit, Ilum ensures all dependencies and configurations are automatically managed. This guarantees that the execution environment is consistently prepared for job execution, whether on local clusters, cloud, or hybrid setups.

  • Integrated Monitoring and Tooling: Ilum comes with built-in integration for monitoring and logging tools, which in the spark-submit workflow would require additional setup. This integration provides users with ready-to-use solutions for tracking job performance, managing logs, and connecting with other data services seamlessly.
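To make the command-execution comparison concrete, the job configured in the tutorial above would look roughly like this as a plain spark-submit invocation. The master URL, deploy mode, and the Kubernetes API server address are placeholders for settings you would otherwise manage by hand:

```shell
# Roughly equivalent spark-submit command; the cluster URL and deploy
# settings are placeholders that Ilum's UI manages for you.
spark-submit \
  --master k8s://https://<kubernetes-api-server>:443 \
  --deploy-mode cluster \
  --class org.apache.spark.examples.MiniReadWriteTest \
  spark-examples_2.13-4.1.1.jar \
  /opt/spark/examples/src/main/resources/kv1.txt
```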

Enhanced Job Submission Experience

Ilum not only matches the capabilities of spark-submit but extends them by reducing the overhead associated with job configuration and environmental setup. It offers an all-encompassing platform that simplifies the deployment, management, and scaling of Spark jobs, making it an ideal solution for organizations aiming to enhance their data processing workflows without compromising the power and flexibility of Apache Spark.

Job Configuration Reference

  • Name: A unique identifier for the job. This name is used in the dashboard and logs to track the job's execution and history.
  • Job Type: The category of the job to be created. Select Spark Job for standard batch processing or Spark Connect Job for client-server Spark applications.
  • Cluster: The target cluster where the job will be executed. Choose a cluster that has the necessary resources and data access for your job.
  • Class: The fully qualified class name of the application (e.g., org.apache.spark.examples.SparkPi) or the filename for Python scripts. This tells Spark which code to execute as the entry point.
  • Language: The programming language used for the job. Select Scala or Python to match your application code.
  • Max Retries: The maximum number of times Ilum will attempt to restart the job if it fails. Setting this helps ensure job completion in case of transient errors.
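The Max Retries behavior can be illustrated with a small sketch. This is a generic retry loop for transient failures, not Ilum's internal implementation:

```python
# Generic illustration of max-retries semantics: re-run a failing job
# up to `max_retries` additional times before giving up.
def run_with_retries(job, max_retries):
    attempts = 0
    while True:
        attempts += 1
        try:
            return job(), attempts
        except RuntimeError:
            if attempts > max_retries:
                raise

# A job that fails twice (e.g., transient errors), then succeeds.
failures = {"left": 2}

def flaky_job():
    if failures["left"] > 0:
        failures["left"] -= 1
        raise RuntimeError("transient error")
    return "done"

result, attempts = run_with_retries(flaky_job, max_retries=3)
print(result, attempts)  # done 3
```

With Max Retries set to 3, the two transient failures are absorbed and the job completes on the third attempt; with it set to 1, the same job would fail permanently.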