Air-Gapped Installation of Apache Spark on Kubernetes
Below is a step-by-step guide to installing Ilum in an offline (air-gapped) environment. This guide is agnostic to your Kubernetes distribution and covers both approaches for managing container images: containerd (with the ctr tool) or Docker. The instructions assume that you have:
- An Internet-connected workstation to download the required Helm chart and container images.
- A method (such as a USB drive or internal file server) to transfer files from the online workstation to your offline environment.
- A working Kubernetes cluster (any distribution) in your offline environment.
- Helm installed and configured to connect to your offline cluster.
- Sufficient disk space: at least 60 GB free on both the download and offline machines (for handling large image tarballs).
- Recommended resources: 12 CPUs and 18 GB RAM (or more, depending on workload).
Architecture Overview: Spark on Kubernetes in Air-Gapped Environments
When deploying Apache Spark on Kubernetes in an air-gapped environment, the driver and executor processes run as containers entirely within infrastructure that has no public connectivity. All container images, dependencies, and configuration files are hosted internally and require no external network access.
In this environment, we deploy Spark components as containerized applications on Kubernetes. Spark driver and executors operate as pods. Scheduling, resource allocation, and scaling are handled by Kubernetes. The setup uses native Kubernetes features like resource limits, Horizontal Pod Autoscaling, and node selectors for smooth and reliable functioning. For general details on how Ilum manages these resources, see the Architecture Overview.
A local image registry is a key part of this architecture. Instead of manually loading images on every node, you push them into a registry within your infrastructure. Whether using a basic deployment with registry:2 or a robust solution like Harbor, the registry must be backed by persistent storage to retain images after restarts. Once images are in the registry, individual nodes can pull them on demand.
Networking and security are critical. Air-gapped environments use network policies to control pod communication. These policies limit interactions to necessary components using Kubernetes' security controls (RBAC, service accounts), ensuring compliance with strict no-ingress rules.
This structure supports complex jobs including Spark Core, Spark SQL, Spark Streaming, and MLlib applications. Integrated with tools like kubectl, Helm, Prometheus, and Grafana, this setup makes deployment, monitoring, and debugging efficient even without internet access. You can run jobs via REST API or Spark Submit once the cluster is operational.
1. Preparation and Downloads
The process is broken down into these steps:
1.1. Download Ilum Helm Chart for Offline Use
The Ilum chart is provided via a public Helm repository. On your online workstation, run:
helm repo add ilum https://charts.ilum.cloud
helm repo update
helm pull ilum/ilum # add --version <desired_version> to pin a specific Ilum version
This creates a file named ilum-<version>.tgz.
Tip: You can extract and modify the chart's values.yaml later if you want to change the image repository references.
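For example, a quick way to unpack the chart and inspect its image references (filenames here match the helm pull output above):

```shell
# 'helm pull' archives unpack into a top-level directory named after the chart
tar -xzf ilum-*.tgz

# List every image reference in the default values before overriding them
grep -n "image:" ilum/values.yaml
```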
1.2. Identify and Download Required Docker Images
Below is the list of required images (for Ilum version 6.6.1):
- alpine/kubectl:1.34.1
- apache/airflow:3.0.2
- apache/nifi:2.5.0
- apache/superset:dockerize
- bitnamilegacy/postgresql:16
- bitnami/git:latest
- curlimages/curl:8.5.0
- docker.io/bitnamilegacy/minio:2025.3.12-debian-12-r0
- docker.io/bitnamilegacy/os-shell:11-debian-11-r72
- docker.io/bitnamilegacy/postgresql:16.1.0-debian-11-r25
- docker.io/bitnamilegacy/redis:7.0.10-debian-11-r4
- docker.io/ilum/mongodb:6.0.5
- ghcr.io/projectnessie/nessie:0.105.1
- gitea/gitea:1.22.3
- ilum/airflow:3.1.1
- ilum/core:6.6.1
- ilum/hive:3.1.3
- ilum/kyuubi:1.10.0-spark
- ilum/mageai:0.9.76
- ilum/mongodb:6.0.5
- ilum/spark:3.5.7-delta
- ilum/spark-launcher:spark-3.5.3
- ilum/sparkmagic:0.23.3
- ilum/streamlit-example:1.0.0
- ilum/superset:4.1.0.1
- ilum/ui:6.6.1
- jpgouin/openldap:2.6.9-fix
- minio/mc:RELEASE.2025-04-16T18-13-26Z
- registry.k8s.io/git-sync/git-sync:v4.3.0
- trinodb/trino:477
1.3. Save Each Image as a Tarball
You can script this process. For example, create a file named pull_and_save.sh:
#!/bin/bash
IMAGES=(
"alpine/kubectl:1.34.1"
"apache/airflow:3.0.2"
"apache/nifi:2.5.0"
"apache/superset:dockerize"
"bitnamilegacy/postgresql:16"
"bitnami/git:latest"
"curlimages/curl:8.5.0"
"docker.io/bitnamilegacy/minio:2025.3.12-debian-12-r0"
"docker.io/bitnamilegacy/os-shell:11-debian-11-r72"
"docker.io/bitnamilegacy/postgresql:16.1.0-debian-11-r25"
"docker.io/bitnamilegacy/redis:7.0.10-debian-11-r4"
"docker.io/ilum/mongodb:6.0.5"
"ghcr.io/projectnessie/nessie:0.105.1"
"gitea/gitea:1.22.3"
"ilum/airflow:3.1.1"
"ilum/core:6.6.1"
"ilum/hive:3.1.3"
"ilum/kyuubi:1.10.0-spark"
"ilum/mageai:0.9.76"
"ilum/mongodb:6.0.5"
"ilum/spark:3.5.7-delta"
"ilum/spark-launcher:spark-3.5.3"
"ilum/sparkmagic:0.23.3"
"ilum/streamlit-example:1.0.0"
"ilum/superset:4.1.0.1"
"ilum/ui:6.6.1"
"jpgouin/openldap:2.6.9-fix"
"minio/mc:RELEASE.2025-04-16T18-13-26Z"
"registry.k8s.io/git-sync/git-sync:v4.3.0"
"trinodb/trino:477"
)
for image in "${IMAGES[@]}"; do
  echo "Pulling $image..."
  docker pull "$image"
  # Replace '/' and ':' with '_' to build a safe tarball filename
  filename=$(echo "$image" | tr '/:' '__')
  echo "Saving $image to ${filename}.tar..."
  docker save "$image" -o "${filename}.tar"
done
Run the script:
chmod +x pull_and_save.sh
./pull_and_save.sh
This produces a set of .tar files containing your images.
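Optionally, record checksums before the transfer so you can confirm on the offline side that every tarball arrived intact (a suggested precaution, not a requirement):

```shell
# On the online machine: record a checksum for every image tarball
sha256sum *.tar > images.sha256

# On the offline machine, after copying the tarballs and images.sha256:
# sha256sum -c images.sha256
```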
2. Transfer Artifacts to the Offline Environment
Use your preferred method (USB drive, internal file server, scp, etc.) to copy the following from your online machine to the offline environment:
- The Helm chart package (e.g., ilum-<version>.tgz)
- All image tarballs (e.g., apache_superset_dockerize.tar, ilum_hive_3.1.3.tar, etc.)
3. Import Container Images (Docker & Containerd)
When using a local registry, you don’t need to load the images manually on every node. Instead, you can push them into your local registry, and then all nodes will pull the images from the registry as needed. Below are instructions for preparing the images before pushing them into your registry.
Managing Disk Space: If you are short on disk space, consider processing images sequentially (load one, push it, then delete the tarball) instead of copying all tarballs at once.
- Option A: containerd (ctr)
- Option B: Docker
3A.1. Import the Image Tarballs (on one machine)
On a machine that can access your local registry, import the tarball(s):
sudo ctr -n k8s.io images import /path/to/<image_tarball>.tar
For example:
sudo ctr -n k8s.io images import /opt/offline-images/ilum_hive_3.1.3.tar
3A.2. Tag the Image for the Local Registry
Tag the image with your local registry’s endpoint (for example, if your registry is accessible at localhost:5000):
sudo ctr -n k8s.io images tag ilum/hive:3.1.3 localhost:5000/ilum/hive:3.1.3
3A.3. Push the Image to the Local Registry
Push the tagged image:
sudo ctr -n k8s.io images push --plain-http localhost:5000/ilum/hive:3.1.3
(Use --plain-http if your registry is configured without TLS.)
3B.1. Load the Image Tarball (if needed)
If you wish to verify locally before pushing (optional), you can load an image on your admin machine:
docker load -i /path/to/<image_tarball>.tar
For example:
docker load -i /opt/offline-images/ilum_hive_3.1.3.tar
3B.2. Tag the Image for Your Local Registry
docker tag ilum/hive:3.1.3 localhost:5000/ilum/hive:3.1.3
3B.3. Push the Image to the Local Registry
docker push localhost:5000/ilum/hive:3.1.3
Note: Once the images are in your local registry, every node in your cluster can pull them automatically when needed. There is no requirement to pre-load the images on every node.
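The load/tag/push steps above can be scripted for the whole image set. A minimal sketch using Docker, assuming the tarballs sit in /opt/offline-images and the registry answers at localhost:5000 (both are assumptions; adjust to your environment):

```shell
#!/bin/bash
REGISTRY="localhost:5000"

for tarball in /opt/offline-images/*.tar; do
  [ -e "$tarball" ] || continue  # skip if the glob matched nothing
  # 'docker load' prints "Loaded image: <name>:<tag>" on success
  image=$(docker load -i "$tarball" | awk '/Loaded image:/ {print $NF}')
  # Drop an explicit docker.io/ prefix, if any, before re-tagging
  target="$REGISTRY/${image#docker.io/}"
  docker tag "$image" "$target"
  docker push "$target"
done
```

Note that images from other registries (ghcr.io, registry.k8s.io) keep their hostname as a path component, e.g. localhost:5000/ghcr.io/projectnessie/nessie:0.105.1; use the same path in your Helm overrides.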
4. Setup Local Image Registry (Preferred)
Using a local registry is highly recommended because it simplifies image management and scales well for larger or dynamic clusters. Although you can run a basic registry using registry:2, consider robust alternatives like Harbor, Nexus Repository Manager, or Quay. For example, Harbor offers role‑based access control, vulnerability scanning, image replication, and a user-friendly web UI.
Important: Ensure that whichever registry you choose is configured with an attached persistent volume (or persistent storage). This guarantees that your images remain available even if the registry container is restarted or updated.
Example: Setting Up a Basic Registry with Persistent Storage (Docker)
- Create a directory for registry data:

mkdir -p /opt/registry-data

- Run the registry container with a volume:

docker run -d \
  -p 5000:5000 \
  --name registry \
  -v /opt/registry-data:/var/lib/registry \
  registry:2
Example: Using Harbor
For a more robust solution, download Harbor’s offline installer from the Harbor GitHub Releases page and follow the provided documentation. Harbor requires you to configure persistent storage (using volumes) as part of its installation.
5. Configure Helm for Local Registry
To ensure that Ilum pulls images from your local registry rather than public repositories, update the image repository references in the Helm chart. For example, if the default values have:
ilum-core:
image: "ilum/core:6.6.1"
Change it to:
ilum-core:
image: "localhost:5000/ilum/core:6.6.1"
You can either edit the chart’s default values.yaml or provide an override file. For example, create local-registry-values.yaml:
ilum-core:
image: "localhost:5000/ilum/core:6.6.1"
ilum-ui:
image: "localhost:5000/ilum/ui:6.6.1"
# Add similar overrides for other components (e.g., ilum/airflow, ilum/hive, etc.)
Then install (or upgrade) using:
helm install ilum /path/to/ilum-<version>.tgz --namespace ilum --create-namespace -f local-registry-values.yaml
6. Install Ilum Using Helm
Ensure your kubeconfig is configured for your offline cluster, then install the Helm chart:
helm install ilum /path/to/ilum-<version>.tgz --namespace ilum --create-namespace
(Include the override file if you are using one.)
7. Verify and Troubleshoot
7.1. Verify the Deployment
- Check the Helm release status:

helm status ilum --namespace ilum

- List pods:

kubectl get pods -n ilum

- Inspect pod image references. For example:

kubectl describe pod <pod_name> -n ilum | grep Image:

Confirm that the image paths reference your local registry (e.g., localhost:5000/ilum/hive:3.1.3).
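To audit all pods at once, you can print the unique set of images in use and flag anything not served from the local registry (assumes the localhost:5000 prefix used throughout this guide):

```shell
# List every container image in the ilum namespace; anything that survives
# the grep is NOT coming from the local registry
kubectl get pods -n ilum \
  -o jsonpath='{range .items[*].spec.containers[*]}{.image}{"\n"}{end}' \
  | sort -u | grep -v '^localhost:5000/' \
  && echo "WARNING: images above are not from the local registry" \
  || echo "all images come from the local registry"
```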
7.2. Troubleshooting
- ImagePullBackOff errors: Verify that the images are available in the local registry and that all nodes can reach the registry.
- Registry access: Ensure that any required insecure-registry settings (if using HTTP) are configured on your nodes.
- Persistent storage: Confirm that the local registry's data directory is correctly mounted so that images persist across container restarts.
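For the registry-access point above: when the registry serves plain HTTP, Docker nodes must whitelist it explicitly. A sketch for Docker only (merge with any existing daemon.json rather than overwriting it; containerd has an analogous but version-specific setting in /etc/containerd/config.toml, so consult your distribution's documentation):

```shell
# Assumption: registry reachable at localhost:5000 over plain HTTP.
# Note: this overwrites /etc/docker/daemon.json; merge manually if it exists.
sudo tee /etc/docker/daemon.json >/dev/null <<'EOF'
{
  "insecure-registries": ["localhost:5000"]
}
EOF
sudo systemctl restart docker
```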
Frequently Asked Questions (FAQ)
Can I run Apache Spark on Kubernetes without internet access?
Yes. By using an air-gapped installation method, all necessary dependencies (Docker images, Helm charts) are downloaded on an online machine, transferred to the offline environment, and hosted in a local registry.
Do I need a local image registry for air-gapped installation?
While you can technically load images manually onto every node using docker load or ctr image import, setting up a local registry (like Harbor or the basic Docker Registry) is strongly recommended. It simplifies scaling, image management, and ensures all nodes can pull images reliably.
How do I handle Spark dependencies in an offline cluster?
For Spark jobs that require external libraries (Maven/PyPI), you must pre-download these artifacts. You can either build custom Docker images containing these libraries or host a local Maven/PyPI mirror (e.g., using Sonatype Nexus or JFrog Artifactory) inside your air-gapped network.
What is the advantage of using Ilum in an air-gapped setup?
Ilum simplifies the management of Spark on Kubernetes by providing a unified control plane. In air-gapped environments, its ability to manage interactive sessions and jobs without reaching out to external cloud services makes it an ideal orchestrator for secure, on-prem data platforms.