📄️ How to run a simple Spark job
A simple Spark job in Ilum operates just like one submitted via spark-submit, but with additional enhancements for ease of use, configuration, and integration with external tools.
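To give a sense of what such a job looks like, here is a minimal sketch of the kind of batch application you would package and submit; the application name and the input path are illustrative placeholders, not values required by Ilum.

```python
from pyspark.sql import SparkSession

# A minimal batch application of the kind a simple job wraps.
# The app name and the s3a:// path below are illustrative placeholders.
spark = SparkSession.builder.appName("simple-word-count").getOrCreate()

lines = spark.read.text("s3a://my-bucket/input.txt")
counts = (lines
          .selectExpr("explode(split(value, ' ')) AS word")
          .groupBy("word")
          .count())
counts.show()

spark.stop()
```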
📄️ How to run an interactive Spark job
An interactive Spark job is a dynamic and responsive way to execute Spark tasks that allows for real-time manipulation and querying of data directly within a live Spark session. Unlike a traditional Spark job, which is typically batch-oriented and executed as a single, predefined workflow, an interactive job offers a more flexible and exploratory environment.
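As a rough illustration of the idea, an interactive job keeps its SparkSession warm and exposes an entry point that can be invoked repeatedly with fresh parameters. The class shape below is a hypothetical sketch; the exact base class and method signature Ilum expects are described in the guide.

```python
from pyspark.sql import SparkSession

# Hypothetical handler sketch: the class name, method name, and config keys
# are illustrative assumptions, not the verified Ilum interface.
class PriceThresholdQuery:
    def run(self, spark: SparkSession, config: dict) -> str:
        # The session stays alive between calls, so repeated requests
        # skip the usual cluster start-up cost.
        df = spark.read.parquet(config.get("path", "s3a://my-bucket/prices"))
        threshold = float(config.get("min_price", 0))
        return str(df.filter(df.price >= threshold).count())
```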
📄️ How to run an interactive code group
The Interactive Code Group in Ilum offers a robust and dynamic environment for real-time data exploration and analysis.
📄️ How to create a local cluster
A simple guide to creating a local cluster.
📄️ How to add a Kubernetes cluster to Ilum?
An introduction to connecting an existing Kubernetes cluster to Ilum.
📄️ How to create a storage
Ilum allows you to link GCS, S3, WASB, and HDFS storage to your clusters. Such a link lets Ilum automatically configure storage access for all your jobs.
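Once a storage is linked, jobs can address it with ordinary Spark paths and rely on Ilum to inject the connector settings and credentials. A minimal sketch, assuming an S3 bucket named my-bucket has already been linked (the bucket and object names are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("linked-storage-example").getOrCreate()

# With the storage linked in Ilum, credentials and connector settings are
# expected to already be configured; only the paths are job-specific.
events = spark.read.csv("s3a://my-bucket/raw/events.csv", header=True)
events.write.mode("overwrite").parquet("s3a://my-bucket/curated/events")
```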
📄️ How to set up a Spark cluster in an air-gapped (offline) environment
Below is a step-by-step guide to installing Ilum in an offline (air-gapped) environment. This guide is written to be agnostic to your Kubernetes distribution and covers both approaches to managing container images: using containerd (with the ctr tool) or Docker. The instructions assume that you have a few prerequisites in place.
📄️ Apache Spark Connect on Ilum: Configuration and Connection Guide
Configuring and connecting to Apache Spark Connect on an Ilum cluster. An intro to Spark Microservices.
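For orientation, Spark Connect lets a thin client attach to a remote Spark server over gRPC using the standard PySpark API (available from PySpark 3.4 onward). The host and port below are placeholders for the endpoint your Ilum cluster exposes:

```python
from pyspark.sql import SparkSession

# Attach to a remote Spark Connect endpoint; the host and port are
# placeholders for the address exposed by the Ilum cluster.
spark = (SparkSession.builder
         .remote("sc://spark-connect.example.com:15002")
         .getOrCreate())

df = spark.range(10).withColumnRenamed("id", "n")
print(df.filter("n % 2 = 0").count())
```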
📄️ Handling Spark Dependencies in Ilum
Ilum provides three methods to handle dependencies, each suited for different use cases.
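As context, one common way to express dependencies in any Spark setup is through standard Spark settings; the sketch below shows that pattern with placeholder coordinates and paths, and is not meant to enumerate Ilum's three methods, which the guide describes in detail.

```python
from pyspark.sql import SparkSession

# Declaring dependencies through standard Spark settings. The Maven
# coordinate and the archive path are illustrative placeholders.
spark = (SparkSession.builder
         .appName("dependencies-example")
         .config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.5.0")
         .getOrCreate())

# Python helpers can also be shipped at runtime and imported by executors.
spark.sparkContext.addPyFile("s3a://my-bucket/libs/helpers.zip")

df = spark.read.format("avro").load("s3a://my-bucket/data/sample.avro")
df.printSchema()
```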