
How to run a simple Spark job

A simple Spark job in Ilum operates just like one submitted via spark-submit, but with additional enhancements for ease of use, configuration, and integration with external tools.

You can use the JAR file with Spark examples from this link.
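For orientation, every class in that examples JAR is an ordinary self-contained Spark application: an object with a main method that builds a SparkSession. The sketch below shows that general shape (the object name and the computation are illustrative, not taken from the upstream examples); Ilum only needs the uploaded JAR and the fully qualified class name, much as spark-submit does with its --class option.

    import org.apache.spark.sql.SparkSession

    // Illustrative skeleton only: jobs in the Spark examples JAR follow this shape.
    object SimpleJob {
      def main(args: Array[String]): Unit = {
        // Master, memory, and other cluster settings come from Ilum (or spark-submit),
        // so the job itself stays free of environment-specific details.
        val spark = SparkSession.builder().appName("SimpleJob").getOrCreate()

        // Any Spark computation goes here; this one just counts a generated range.
        println(spark.range(0, 1000).count())

        spark.stop()
      }
    }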


Here's a step-by-step guide to setting up a simple Spark job in Ilum. It walks you through configuring, executing, and monitoring a basic job named MiniReadWriteTest within the Ilum platform.

Step-by-Step Guide to Running MiniReadWriteTest in Ilum

  1. Navigate to the Jobs Section: This area allows you to manage all your data processing tasks.

  2. Create a New Job:

    • Click on the ‘New Job’ button to start the setup process.
  3. Fill Out Job Details:

    • Job Name: Give your job a unique name for easy identification.
    • Full Class Name: Input MiniReadWriteTest, a simple example job that performs the following (a code sketch of this logic appears after the guide):
      • Reads a local file.
      • Computes a word count on the file.
      • Writes the output to a directory on each executor.
      • Reads the output back from each executor.
      • Re-computes the word count using Spark.
      • Compares the word count results to ensure consistency.
  4. Set Job Parameters and Configuration:

    • Specify the location of the local file to distribute to executors. Use kv1.txt, a small test file bundled with Spark's examples.
  5. Upload Resources:

    • You can upload additional resources like files or scripts required for your job.
    • Upload the executable JAR file that contains your job code.
  6. Resource Allocation:

    • Configure resource settings such as CPU and memory. For this guide, we will proceed with the default settings.
  7. Submit and Monitor the Job:

    • Submit the job.
    • Navigate to the logs section to review logs from each executor.
  8. Review Job Execution:

    • Once the job has started, check the status in the job overview section.
    • Monitor the memory usage and other performance metrics in the executors section.
    • Observe the progress of your job through each stage on the timeline.
  9. Completion and Review:

    • Upon completion, the job details and results are recorded in the Spark History Server.
    • Visit the history server section to see your completed job and review detailed execution stages.
  10. Final Step:

    • Congratulations! You have successfully set up and run your MiniReadWriteTest job in Ilum. For further information or support, contact [email protected].
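To make step 3 concrete, here is a hedged sketch of the logic described above. It is not the upstream MiniReadWriteTest source: it skips the executor-local write and read-back round trip, keeping only the two word counts and the final comparison, and it assumes the file path entered in step 4 (kv1.txt) reaches the job as its first program argument.

    import scala.io.Source

    import org.apache.spark.sql.SparkSession

    // Hypothetical sketch of the behavior described in step 3, not the upstream source.
    object MiniReadWriteTestSketch {
      def main(args: Array[String]): Unit = {
        val localPath = args(0) // e.g. kv1.txt, passed in as the job's first argument

        // Word count computed locally on the driver, without Spark.
        val words      = Source.fromFile(localPath).getLines().flatMap(_.split("\\s+")).toList
        val localCount = words.length.toLong

        // The same count re-computed by Spark across the executors.
        val spark      = SparkSession.builder().appName("MiniReadWriteTestSketch").getOrCreate()
        val sparkCount = spark.sparkContext.parallelize(words).count()

        // The two counts must agree for the test to pass.
        val verdict = if (localCount == sparkCount) "Success!" else "Failure!"
        println(s"Local count: $localCount, Spark count: $sparkCount -> $verdict")

        spark.stop()
      }
    }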

By following these steps, you can efficiently set up, run, and monitor a basic Spark job within the Ilum platform, gaining familiarity with its functionality and preparing for more complex data processing tasks.

Here's a consolidated explanation of how Ilum facilitates Spark job submissions, blending the traditional features of spark-submit with Ilum's advanced management capabilities:

Seamless Integration and Simplified Management

  • Universal Compatibility: Ilum enables the submission of any Spark job, akin to using spark-submit. It supports various programming languages used with Spark, including Scala, Python, and R, catering to all typical Spark operations like batch processing, streaming jobs, or interactive queries.

  • Simplified Command Execution: While spark-submit often involves complex command-line inputs for library dependencies, job parameters, and cluster configurations, Ilum abstracts these into an intuitive user interface. This approach minimizes error risks and simplifies operations, especially for those less familiar with command-line intricacies; the configuration sketch after this list makes the comparison concrete.

  • Direct Code Deployment: Users can upload their JAR files, Python scripts, or notebooks directly into Ilum, similar to specifying resources in a spark-submit command. Ilum enhances this by allowing these resources to be configured for scheduled or event-triggered executions, providing greater operational flexibility.

  • Automated Environment Handling: Unlike the manual setup required with spark-submit, Ilum ensures all dependencies and configurations are automatically managed. This guarantees that the execution environment is consistently prepared for job execution, whether on local clusters, cloud, or hybrid setups.

  • Integrated Monitoring and Tooling: Ilum comes with built-in integration for monitoring and logging tools, which in the spark-submit workflow would require additional setup. This integration provides users with ready-to-use solutions for tracking job performance, managing logs, and connecting with other data services seamlessly.
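As a hedged illustration of the "Simplified Command Execution" point above: the settings that spark-submit takes as flags (for example --executor-memory 2g or --conf spark.sql.shuffle.partitions=64) correspond to standard Spark configuration keys. In Ilum they are entered once in the job and resource forms; a job can also set them programmatically, as the sketch below shows with arbitrary example values.

    import org.apache.spark.sql.SparkSession

    object ConfiguredJob {
      def main(args: Array[String]): Unit = {
        // Standard Spark configuration keys with arbitrary example values.
        // With spark-submit these would arrive as command-line flags;
        // in Ilum they are captured by the job and resource-allocation forms.
        val spark = SparkSession.builder()
          .appName("configured-job")
          .config("spark.executor.memory", "2g")
          .config("spark.executor.cores", "2")
          .config("spark.sql.shuffle.partitions", "64")
          .getOrCreate()

        spark.stop()
      }
    }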

Enhanced Job Submission Experience

Ilum not only matches the capabilities of spark-submit but extends them by reducing the overhead associated with job configuration and environmental setup. It offers an all-encompassing platform that simplifies the deployment, management, and scaling of Spark jobs, making it an ideal solution for organizations aiming to enhance their data processing workflows without compromising the power and flexibility of Apache Spark.