Air-Gapped Installation of Apache Spark on Kubernetes
Below is a step-by-step guide to installing Ilum in an offline (air-gapped) environment. This guide is agnostic to your Kubernetes distribution and covers both approaches for managing container images: containerd (with the ctr tool) or Docker. The instructions assume that you have:
- An Internet-connected workstation to download the required Helm chart and container images.
- A method (such as a USB drive or internal file server) to transfer files from the online workstation to your offline environment.
- A working Kubernetes cluster (any distribution) in your offline environment.
- Helm installed and configured to connect to your offline cluster.
- Sufficient disk space: at least 60 GB free on both the download and offline machines (for handling large image tarballs).
- Recommended resources: 12 CPUs and 18 GB RAM (or more, depending on workload).
Architecture Overview: Spark on Kubernetes in Air-Gapped Environments
When deploying Apache Spark on Kubernetes in an air-gapped environment, the driver and executor processes run as containers entirely within infrastructure that has no public connectivity. All container images, dependencies, and configuration files are hosted internally and require no external network access.
In this environment, we deploy Spark components as containerized applications on Kubernetes. Spark driver and executors operate as pods. Scheduling, resource allocation, and scaling are handled by Kubernetes. The setup uses native Kubernetes features like resource limits, Horizontal Pod Autoscaling, and node selectors for smooth and reliable functioning. For general details on how Ilum manages these resources, see the Architecture Overview.
A local image registry is a key part of this architecture. Instead of manually loading images on every node, you push them into a registry within your infrastructure. Whether using a basic deployment with registry:2 or a robust solution like Harbor, the registry must be backed by persistent storage to retain images after restarts. Once images are in the registry, individual nodes can pull them on demand.
Networking and security are critical. Air-gapped environments use network policies to control pod communication. These policies limit interactions to necessary components using Kubernetes' security controls (RBAC, service accounts), ensuring compliance with strict no-ingress rules.
This structure supports complex jobs including Spark Core, Spark SQL, Spark Streaming, and MLlib applications. Integrated with tools like kubectl, Helm, Prometheus, and Grafana, this setup makes deployment, monitoring, and debugging efficient even without internet access. You can run jobs via REST API or Spark Submit once the cluster is operational.
1. Preparation and Downloads
The process is broken down into these steps:
1.1. Download Ilum Helm Chart for Offline Use
The Ilum chart is provided via a public Helm repository. On your online workstation, run:
helm repo add ilum https://charts.ilum.cloud
helm repo update
helm pull ilum/ilum # add --version <desired_version> to pin a specific Ilum version
This creates a file named ilum-<version>.tgz.
Tip: You can extract and modify the chart's values.yaml later if you want to change the image repository references.
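For example, a quick way to unpack the chart and inspect its image references (filenames here match the helm pull output above):

```shell
# 'helm pull' archives unpack into a top-level directory named after the chart
tar -xzf ilum-*.tgz

# List every image reference in the default values before overriding them
grep -n "image:" ilum/values.yaml
```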
1.2. Identify and Download Required Docker Images
Below is the list of required images (for Ilum version 6.6.1):
- alpine/kubectl:1.34.1
- apache/airflow:3.0.2
- apache/nifi:2.5.0
- apache/superset:dockerize
- bitnamilegacy/postgresql:16
- bitnami/git:latest
- curlimages/curl:8.5.0
- docker.io/bitnamilegacy/minio:2025.3.12-debian-12-r0
- docker.io/bitnamilegacy/os-shell:11-debian-11-r72
- docker.io/bitnamilegacy/postgresql:16.1.0-debian-11-r25
- docker.io/bitnamilegacy/redis:7.0.10-debian-11-r4
- docker.io/ilum/mongodb:6.0.5
- ghcr.io/projectnessie/nessie:0.105.1
- gitea/gitea:1.22.3
- ilum/airflow:3.1.1
- ilum/core:6.6.1
- ilum/hive:3.1.3
- ilum/kyuubi:1.10.0-spark
- ilum/mageai:0.9.76
- ilum/mongodb:6.0.5
- ilum/spark:3.5.7-delta
- ilum/spark-launcher:spark-3.5.3
- ilum/sparkmagic:0.23.3
- ilum/streamlit-example:1.0.0
- ilum/superset:4.1.0.1
- ilum/ui:6.6.1
- jpgouin/openldap:2.6.9-fix
- minio/mc:RELEASE.2025-04-16T18-13-26Z
- registry.k8s.io/git-sync/git-sync:v4.3.0
- trinodb/trino:477
1.3. Save Each Image as a Tarball
You can script this process. For example, create a file named pull_and_save.sh:
#!/bin/bash
IMAGES=(
"alpine/kubectl:1.34.1"
"apache/airflow:3.0.2"
"apache/nifi:2.5.0"
"apache/superset:dockerize"
"bitnamilegacy/postgresql:16"
"bitnami/git:latest"
"curlimages/curl:8.5.0"
"docker.io/bitnamilegacy/minio:2025.3.12-debian-12-r0"
"docker.io/bitnamilegacy/os-shell:11-debian-11-r72"
"docker.io/bitnamilegacy/postgresql:16.1.0-debian-11-r25"
"docker.io/bitnamilegacy/redis:7.0.10-debian-11-r4"
"docker.io/ilum/mongodb:6.0.5"
"ghcr.io/projectnessie/nessie:0.105.1"
"gitea/gitea:1.22.3"
"ilum/airflow:3.1.1"
"ilum/core:6.6.1"
"ilum/hive:3.1.3"
"ilum/kyuubi:1.10.0-spark"
"ilum/mageai:0.9.76"
"ilum/mongodb:6.0.5"
"ilum/spark:3.5.7-delta"
"ilum/spark-launcher:spark-3.5.3"
"ilum/sparkmagic:0.23.3"
"ilum/streamlit-example:1.0.0"
"ilum/superset:4.1.0.1"
"ilum/ui:6.6.1"
"jpgouin/openldap:2.6.9-fix"
"minio/mc:RELEASE.2025-04-16T18-13-26Z"
"registry.k8s.io/git-sync/git-sync:v4.3.0"
"trinodb/trino:477"
)
for image in "${IMAGES[@]}"; do
  echo "Pulling $image..."
  docker pull "$image"
  # Replace '/' and ':' with '_' to build a safe tarball filename
  filename=$(echo "$image" | tr '/:' '__')
  echo "Saving $image to ${filename}.tar..."
  docker save "$image" -o "${filename}.tar"
done
Run the script:
chmod +x pull_and_save.sh
./pull_and_save.sh
This produces a set of .tar files containing your images.
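Optionally, record checksums before the transfer so you can confirm on the offline side that every tarball arrived intact (a suggested precaution, not a requirement):

```shell
# On the online machine: record a checksum for every image tarball
sha256sum *.tar > images.sha256

# On the offline machine, after copying the tarballs and images.sha256:
# sha256sum -c images.sha256
```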
2. Transfer Artifacts to the Offline Environment
Use your preferred method (USB drive, internal file server, scp, etc.) to copy the following from your online machine to the offline environment:
- The Helm chart package (e.g., ilum-<version>.tgz)
- All image tarballs (e.g., apache_superset_dockerize.tar, ilum_hive_3.1.3.tar, etc.)
3. Import Container Images (Docker & Containerd)
When using a local registry, you don’t need to load the images manually on every node. Instead, you can push them into your local registry, and then all nodes will pull the images from the registry as needed. Below are instructions for preparing the images before pushing them into your registry.
Managing Disk Space: If you are short on disk space, consider processing images sequentially (load one, push it, then delete the tarball) instead of copying all tarballs at once.
- Option A: containerd (ctr)
- Option B: Docker
3A.1. Import the Image Tarballs (on one machine)
On a machine that can access your local registry, import the tarball(s):
sudo ctr -n k8s.io images import /path/to/<image_tarball>.tar
For example:
sudo ctr -n k8s.io images import /opt/offline-images/ilum_hive_3.1.3.tar
3A.2. Tag the Image for the Local Registry
Tag the image with your local registry’s endpoint (for example, if your registry is accessible at localhost:5000):
sudo ctr -n k8s.io images tag ilum/hive:3.1.3 localhost:5000/ilum/hive:3.1.3
3A.3. Push the Image to the Local Registry
Push the tagged image:
sudo ctr -n k8s.io images push --plain-http localhost:5000/ilum/hive:3.1.3
(Use --plain-http if your registry is configured without TLS.)
3B.1. Load the Image Tarball (if needed)
If you wish to verify locally before pushing (optional), you can load an image on your admin machine:
docker load -i /path/to/<image_tarball>.tar
For example:
docker load -i /opt/offline-images/ilum_hive_3.1.3.tar
3B.2. Tag the Image for Your Local Registry
docker tag ilum/hive:3.1.3 localhost:5000/ilum/hive:3.1.3
3B.3. Push the Image to the Local Registry
docker push localhost:5000/ilum/hive:3.1.3
Note: Once the images are in your local registry, every node in your cluster can pull them automatically when needed. There is no requirement to pre-load the images on every node.
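The load/tag/push steps above can be scripted for the whole image set. A minimal sketch using Docker, assuming the tarballs sit in /opt/offline-images and the registry answers at localhost:5000 (both are assumptions; adjust to your environment):

```shell
#!/bin/bash
REGISTRY="localhost:5000"

for tarball in /opt/offline-images/*.tar; do
  [ -e "$tarball" ] || continue  # skip if the glob matched nothing
  # 'docker load' prints "Loaded image: <name>:<tag>" on success
  image=$(docker load -i "$tarball" | awk '/Loaded image:/ {print $NF}')
  # Drop an explicit docker.io/ prefix, if any, before re-tagging
  target="$REGISTRY/${image#docker.io/}"
  docker tag "$image" "$target"
  docker push "$target"
done
```

Note that images from other registries (ghcr.io, registry.k8s.io) keep their hostname as a path component, e.g. localhost:5000/ghcr.io/projectnessie/nessie:0.105.1; use the same path in your Helm overrides.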
4. Setup Local Image Registry (Preferred)
Using a local registry is highly recommended because it simplifies image management and scales well for larger or dynamic clusters. Although you can run a basic registry using registry:2, consider robust alternatives like Harbor, Nexus Repository Manager, or Quay. For example, Harbor offers role‑based access control, vulnerability scanning, image replication, and a user-friendly web UI.
Important: Ensure that whichever registry you choose is configured with an attached persistent volume (or persistent storage). This guarantees that your images remain available even if the registry container is restarted or updated.
Example: Setting Up a Basic Registry with Persistent Storage (Docker)
- Create a directory for registry data:

mkdir -p /opt/registry-data

- Run the registry container with a volume:

docker run -d \
  -p 5000:5000 \
  --name registry \
  -v /opt/registry-data:/var/lib/registry \
  registry:2
Example: Using Harbor
For a more robust solution, download Harbor’s offline installer from the Harbor GitHub Releases page and follow the provided documentation. Harbor requires you to configure persistent storage (using volumes) as part of its installation.
5. Configure Helm for Local Registry
To ensure that Ilum pulls images from your local registry rather than public repositories, update the image repository references in the Helm chart. For example, if the default values have:
ilum-core:
image: "ilum/core:6.6.1"
Change it to:
ilum-core:
image: "localhost:5000/ilum/core:6.6.1"
You can either edit the chart’s default values.yaml or provide an override file. For example, create local-registry-values.yaml:
ilum-core:
image: "localhost:5000/ilum/core:6.6.1"
ilum-ui:
image: "localhost:5000/ilum/ui:6.6.1"
# Add similar overrides for other components (e.g., ilum/airflow, ilum/hive, etc.)
Then install (or upgrade) using:
helm install ilum /path/to/ilum-<version>.tgz --namespace ilum --create-namespace -f local-registry-values.yaml
6. Install Ilum Using Helm
Ensure your kubeconfig is configured for your offline cluster, then install the Helm chart:
helm install ilum /path/to/ilum-<version>.tgz --namespace ilum --create-namespace
(Include the override file if you are using one.)
7. Verify and Troubleshoot
7.1. Verify the Deployment
- Check the Helm release status:

helm status ilum --namespace ilum

- List pods:

kubectl get pods -n ilum

- Inspect pod image references. For example:

kubectl describe pod <pod_name> -n ilum | grep Image:

Confirm that the image paths reference your local registry (e.g., localhost:5000/ilum/hive:3.1.3).
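To audit all pods at once, you can print the unique set of images in use and flag anything not served from the local registry (assumes the localhost:5000 prefix used throughout this guide):

```shell
# List every container image in the ilum namespace; anything that survives
# the grep is NOT coming from the local registry
kubectl get pods -n ilum \
  -o jsonpath='{range .items[*].spec.containers[*]}{.image}{"\n"}{end}' \
  | sort -u | grep -v '^localhost:5000/' \
  && echo "WARNING: images above are not from the local registry" \
  || echo "all images come from the local registry"
```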
7.2. Troubleshooting
- ImagePullBackOff errors: Verify that the images are available in the local registry and that all nodes can reach the registry.
- Registry access: Ensure that any required insecure-registry settings (if using HTTP) are configured on your nodes.
- Persistent storage: Confirm that the local registry's data directory is correctly mounted so that images persist across container restarts.
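For the registry-access point above: when the registry serves plain HTTP, Docker nodes must whitelist it explicitly. A sketch for Docker only (merge with any existing daemon.json rather than overwriting it; containerd has an analogous but version-specific setting in /etc/containerd/config.toml, so consult your distribution's documentation):

```shell
# Assumption: registry reachable at localhost:5000 over plain HTTP.
# Note: this overwrites /etc/docker/daemon.json; merge manually if it exists.
sudo tee /etc/docker/daemon.json >/dev/null <<'EOF'
{
  "insecure-registries": ["localhost:5000"]
}
EOF
sudo systemctl restart docker
```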
Frequently Asked Questions (FAQ)
Can I run Apache Spark on Kubernetes without internet access?
Yes. By using an air-gapped installation method, all necessary dependencies (Docker images, Helm charts) are downloaded on an online machine, transferred to the offline environment, and hosted in a local registry.
Do I need a local image registry for air-gapped installation?
While you can technically load images manually onto every node using docker load or ctr image import, setting up a local registry (like Harbor or the basic Docker Registry) is strongly recommended. It simplifies scaling, image management, and ensures all nodes can pull images reliably.
How do I handle Spark dependencies in an offline cluster?
For Spark jobs that require external libraries (Maven/PyPI), you must pre-download these artifacts. You can either build custom Docker images containing these libraries or host a local Maven/PyPI mirror (e.g., using Sonatype Nexus or JFrog Artifactory) inside your air-gapped network.
What is the advantage of using Ilum in an air-gapped setup?
Ilum simplifies the management of Spark on Kubernetes by providing a unified control plane. In air-gapped environments, its ability to manage interactive sessions and jobs without reaching out to external cloud services makes it an ideal orchestrator for secure, on-prem data platforms.