
Day-2 Operations

Once your Ilum deployment is running, you will eventually need to upgrade it, inspect logs, diagnose problems, and recover from failures. This section covers the operational commands you use after the initial install.

Upgrading Ilum

The ilum upgrade command upgrades an existing Helm release to a new chart version, applies value changes, or both. Under the hood it runs an 8-step pipeline:

  1. Resolve profile defaults -- reads the active profile for release name, namespace, and context.
  2. Ensure Helm repo -- adds and updates the ilum chart repo if not already configured.
  3. Detect stuck release -- checks for pending-install, pending-upgrade, or pending-rollback status.
  4. Resolve versions -- determines the current and target chart versions. When the chart version changes, the CLI automatically enables --reset-defaults (passed to Helm as --reset-then-reuse-values in step 8) so that new chart defaults (e.g. updated image tags) take effect. Pass --reuse-values to opt out.
  5. Plan upgrade -- fetches live values, detects drift, resolves module flags, and computes a values diff.
  6. Check breaking changes -- scans the known breaking changes registry for anything between your current and target versions.
  7. Show summary and confirm -- presents the upgrade summary, values diff, breaking change warnings, and the exact Helm command.
  8. Execute -- runs helm upgrade with --reset-then-reuse-values (on version change) or --reuse-values (same version) and saves a values snapshot for drift detection.
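
The values-strategy choice in steps 4 and 8 can be sketched as a small shell function. This is only an illustration of the rule described above, not the CLI's actual code; choose_values_flag and its third "reuse" argument are hypothetical names introduced here to stand in for the --reuse-values opt-out.

```shell
# Sketch of the values-strategy rule from steps 4 and 8.
# choose_values_flag is a hypothetical helper, not part of the ilum CLI.
# Usage: choose_values_flag CURRENT_VERSION TARGET_VERSION [reuse]
choose_values_flag() {
    current=$1
    target=$2
    opt_out=${3:-}

    if [ "$current" != "$target" ] && [ "$opt_out" != "reuse" ]; then
        # Chart version changes: take new chart defaults, then re-apply user values.
        echo "--reset-then-reuse-values"
    else
        # Same chart version, or the user opted out with --reuse-values.
        echo "--reuse-values"
    fi
}
```

For example, choose_values_flag 6.7.0 6.8.0 prints --reset-then-reuse-values, while choose_values_flag 6.7.0 6.7.0 prints --reuse-values.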

Basic upgrade to the latest chart version:

$ ilum upgrade --yes
ℹ Chart version changing (6.7.0 → 6.8.0) — using new chart defaults (--reset-defaults).
╭─ Upgrade Summary ───────────────────────────────────────╮
│ Release ilum │
│ Namespace default │
│ Chart ilum/ilum │
│ Version 6.7.0 → 6.8.0 │
│ Modules core, gitea, jupyter, minio, mongodb, │
│ postgresql, ui │
│ Atomic True │
╰──────────────────────────────────────────────────────────╯

ℹ No value changes.
ℹ Upgrade notes: https://ilum.cloud/docs/upgrade-notes/

Command: helm upgrade ilum ilum/ilum \
--namespace default \
--timeout 10m \
--atomic \
--reset-then-reuse-values

⠋ Upgrading Ilum...
✓ Ilum upgraded successfully (release: ilum).

Pin a specific target version with --version:

$ ilum upgrade --version 6.7.1 --yes

Add modules during the upgrade with -m:

$ ilum upgrade -m airflow --version 6.8.0 --yes

When you pass -m, the upgrade resolves that module's dependencies and merges them into the --set flags alongside the version bump.

Supply custom values files or individual overrides:

$ ilum upgrade -f production-values.yaml --set ilum-core.replicaCount=3 --yes

Preview the upgrade without applying it:

$ ilum upgrade --version 6.8.0 --dry-run
ℹ Chart version changing (6.7.0 → 6.8.0) — using new chart defaults (--reset-defaults).
╭─ Upgrade Summary ───────────────────────────────────────╮
│ Release ilum │
│ Namespace default │
│ Chart ilum/ilum │
│ Version 6.7.0 → 6.8.0 │
│ Modules core, gitea, jupyter, minio, mongodb, │
│ postgresql, ui │
│ Atomic True │
╰──────────────────────────────────────────────────────────╯

ℹ No value changes.
ℹ Upgrade notes: https://ilum.cloud/docs/upgrade-notes/

Command: helm upgrade ilum ilum/ilum \
--namespace default \
--timeout 10m \
--atomic \
--reset-then-reuse-values \
--version 6.8.0

ℹ Dry-run mode — no changes applied.

Values safety. Every upgrade follows a fetch-merge-diff-apply pipeline. The CLI fetches the live user-supplied values from the running release, detects any external drift (for example, someone ran a manual helm upgrade outside the CLI), computes a diff of the intended changes, and saves a snapshot after execution. If drift is detected, you see a warning:

⚠ External changes detected since last CLI operation:
╭─ External Changes (Drift) ──────────────────────────────╮
│ Key Before After │
├──────────────────────────────────────────────────────────┤
│ ilum-core.replicaCount 1 3 │
╰──────────────────────────────────────────────────────────╯
ℹ These external changes will be preserved. CLI changes will be applied on top.

If nothing has changed -- same chart version, no value changes, and no new modules -- the CLI exits early with a success message instead of running a no-op upgrade:

✓ Already on version 6.7.0 with no value changes. Nothing to upgrade.

See Command Reference for the complete list of ilum upgrade flags.

Breaking Change Warnings

When upgrading across minor versions, the CLI checks a built-in registry of known breaking changes and surfaces warnings before you confirm. This works whether you specify --version explicitly or let the CLI resolve the latest version automatically.

Example: upgrading from 6.5.x to 6.7.x crosses two known breaking changes.

$ ilum upgrade --version 6.7.0
╭─ Upgrade Summary ───────────────────────────────────────╮
│ Release ilum │
│ Namespace default │
│ Chart ilum/ilum │
│ Version 6.5.2 → 6.7.0 │
│ Modules core, gitea, jupyter, minio, mongodb, │
│ postgresql, ui │
│ Atomic True │
╰──────────────────────────────────────────────────────────╯

⚠ Breaking change (6.5 -> 6.6): MongoDB chart upgraded from bitnami/mongodb 13.x to 14.x — auth key format changed.
Hint: Back up MongoDB data before upgrading.
⚠ Breaking change (6.6 -> 6.7): Langfuse requires custom Docker image with NEXT_PUBLIC_BASE_PATH baked in.
Hint: Build custom Langfuse image or disable Langfuse module.
⚠ Upgrading across minor versions (6.5.2 → 6.7.0). Review upgrade notes: https://ilum.cloud/docs/upgrade-notes/
ℹ Upgrade notes: https://ilum.cloud/docs/upgrade-notes/

? Proceed with upgrade? [y/N]

Each warning includes:

  • The version range where the break was introduced.
  • A description of what changed.
  • A migration hint with a concrete action to take.

The warnings are informational -- you can still proceed with the upgrade after reviewing them. In non-interactive mode (--yes), the warnings are printed but the upgrade proceeds automatically. In CI/CD pipelines, use --dry-run first to surface these warnings without applying any changes.
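
A CI job could turn the dry-run advice above into a gate. The sketch below assumes that breaking-change warnings contain the text "Breaking change", as in the sample output, and that the dry-run report can be captured from stdout or stderr; neither is confirmed here. has_breaking_changes and gate_upgrade are hypothetical helper names.

```shell
# has_breaking_changes: succeed if stdin contains a breaking-change warning.
# Assumes warnings contain the text "Breaking change", as in the sample above.
has_breaking_changes() {
    grep -q "Breaking change"
}

# gate_upgrade VERSION: hypothetical CI wrapper around the documented flags.
gate_upgrade() {
    version=$1
    # Capture the dry-run report; stderr is included in case warnings go there.
    report=$(ilum upgrade --version "$version" --dry-run 2>&1)
    printf '%s\n' "$report"
    if printf '%s\n' "$report" | has_breaking_changes; then
        echo "Breaking changes reported for $version; stopping for manual review." >&2
        return 1
    fi
    # No warnings found: apply the upgrade non-interactively.
    ilum upgrade --version "$version" --yes
}
```

Calling gate_upgrade 6.7.0 prints the dry-run report and only runs the real upgrade when no breaking-change warning appears in it.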

Viewing Logs

The ilum logs command streams logs from a module's pod without requiring you to look up pod names or label selectors. You specify the module name, and the CLI resolves the correct pod and container automatically.

Show the last 100 lines (default) from the core module:

$ ilum logs core
2026-02-14 09:45:12.234 INFO 1 --- [main] i.c.IlumCoreApplication : Started IlumCoreApplication in 12.3 seconds
2026-02-14 09:45:12.456 INFO 1 --- [main] o.s.b.a.e.web.EndpointLinksResolver : Exposing 15 endpoints
2026-02-14 09:45:13.789 INFO 1 --- [scheduling-1] i.c.s.SparkSessionManager : Initializing Spark session pool

Follow logs in real time (--follow / -f):

$ ilum logs core --follow

Press Ctrl+C to stop streaming. The CLI exits gracefully without an error.

Control the number of lines with --tail:

$ ilum logs jupyter --tail 50

Pass --tail 0 to retrieve the full log history (all lines):

$ ilum logs core --tail 0

Select a specific container with --container / -c:

When a pod has multiple containers (for example, sidecars or init containers) and you do not pass -c, the CLI lists them:

$ ilum logs airflow
ℹ Pod has multiple containers: airflow-webserver, airflow-scheduler. Use -c to select one.

Then target the one you want:

$ ilum logs airflow -c airflow-scheduler --follow

If you specify a container that does not exist in the pod, the CLI reports the available containers:

✗ Container 'wrong-name' not found in pod 'ilum-airflow-0'.
Available containers: airflow-webserver, airflow-scheduler

View logs from a previously terminated container with --previous / -p:

$ ilum logs core --previous --tail 200

This is useful for inspecting crash logs from a container that has already restarted.

Pipe logs to external tools:

Because ilum logs writes plain text to stdout (without Rich formatting), you can pipe it directly:

$ ilum logs core --tail 0 | grep ERROR
$ ilum logs core --follow | tee /tmp/core.log
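
A plain grep ERROR also matches lines that merely mention "ERROR" in the message text. A slightly stricter filter can match on the log-level column instead. The sketch below assumes the "DATE TIME LEVEL ..." layout from the sample output earlier in this section; count_level is a hypothetical helper name.

```shell
# count_level LEVEL: count log lines on stdin whose third field equals LEVEL.
# Assumes the "DATE TIME LEVEL ..." layout shown in the sample output.
count_level() {
    awk -v lvl="$1" '$3 == lvl { n++ } END { print n + 0 }'
}

# Example (requires a running deployment):
#   ilum logs core --tail 0 | count_level ERROR
```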

When multiple pods match a module's label selector (for example, a module with replicaCount > 1), the CLI automatically picks the first pod and tells you:

ℹ Multiple pods found (3), showing logs for 'ilum-core-6f8b4c7d9-xk2pl'.

Note: The ilum logs command uses the module's pod_label from the module registry to find pods. Modules without a pod_label (such as infrastructure modules that do not own a distinct pod) will report an error. In those cases, use kubectl logs directly.

Health Checks

Getting Started introduced ilum doctor and its full check suite. This section focuses on using doctor for targeted operational checks.

Run a single check by name with --check / -c:

$ ilum doctor --check pods
╭─ ilum doctor ────────────────────────────────────────────────────────╮
│ Status Check Message │
├──────────────────────────────────────────────────────────────────────┤
│ ✓ pods All pods are healthy │
╰──────────────────────────────────────────────────────────────────────╯

The exit code reflects the check result: 0 on pass, 1 on failure. This makes single-check mode useful in scripts:

$ ilum doctor --check release && echo "Release is healthy"
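
Because each single check returns 0 or 1, several checks can be chained in a script. The following is a minimal sketch that relies only on those documented exit codes; run_checks is a hypothetical helper name, and the check names passed to it come from the table below.

```shell
# run_checks CHECK...: run each named doctor check and report the failures.
# Relies only on the documented exit codes (0 on pass, 1 on failure).
run_checks() {
    failed=""
    for check in "$@"; do
        if ! ilum doctor --check "$check" >/dev/null 2>&1; then
            failed="$failed $check"
        fi
    done
    if [ -n "$failed" ]; then
        echo "Failed checks:$failed"
        return 1
    fi
    echo "All checks passed"
}

# Example: run_checks cluster namespace release pods pvcs
```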

Target a specific release and namespace:

$ ilum doctor --release ilum-staging --namespace staging --check pods

Available single checks (13 total):

Check              What it verifies
helm               Helm CLI version >= 3.12
kubectl            kubectl CLI version >= 1.28
docker             Docker CLI version >= 24.0
helm-repo          Ilum Helm repo is configured
cluster            Can connect to the Kubernetes API
namespace          Target namespace exists
pods               All pods in the namespace are healthy
pvcs               All PersistentVolumeClaims are bound
rbac               CLI has required RBAC permissions
release            Helm release exists and is in deployed state
compatibility      Kubernetes version is compatible with the chart
service-endpoints  Services have backing endpoints (no dangling Services)
health-endpoints   Known health endpoints respond (e.g. /actuator/health, /health)

The service-endpoints check iterates over Services in the namespace and verifies that each has at least one backing pod endpoint. This catches cases where a Deployment has zero ready replicas but the Service still exists. The health-endpoints check probes known HTTP health paths for core services (such as /actuator/health for Ilum Core, /health for Airflow) via pod proxy and reports any that are not responding.

If you pass an unknown check name, the CLI reports the error and exits with code 1:

$ ilum doctor --check nonexistent
✗ Unknown check: nonexistent

For the complete doctor output format, see Getting Started and Command Reference.

Recovering Stuck Releases

A Helm release can get stuck in pending-install, pending-upgrade, or pending-rollback state if a previous operation was interrupted (for example, a network timeout, Ctrl+C during install, or a pod that never became ready). When this happens, subsequent ilum upgrade or ilum module enable commands fail because Helm refuses to operate on a release in a pending state.

Detection. The CLI detects stuck releases automatically at the start of every ilum upgrade:

$ ilum upgrade --yes
✗ Release 'ilum' is stuck. Use --force-rollback to recover.

Recovery with --force-rollback. Pass the --force-rollback flag to roll back the stuck release to the last successful revision before the upgrade proceeds:

$ ilum upgrade --force-rollback --yes
⚠ Release 'ilum' is stuck — rolling back.
⠋ Rolling back...
✓ Rollback complete.

╭─ Upgrade Summary ───────────────────────────────────────╮
│ Release ilum │
│ Namespace default │
│ Chart ilum/ilum │
│ Version 6.7.0 → 6.7.0 │
│ Modules core, gitea, jupyter, minio, mongodb, │
│ postgresql, ui │
│ Atomic True │
╰──────────────────────────────────────────────────────────╯

⠋ Upgrading Ilum...
✓ Ilum upgraded successfully (release: ilum).

The recovery flow is:

  1. Detect that the release status is one of pending-install, pending-upgrade, or pending-rollback.
  2. Run helm rollback to restore the last known-good revision.
  3. Proceed with the requested upgrade normally.
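
Step 1 of this flow can be reproduced outside the CLI when you want to check a release by hand. The sketch below classifies a Helm status string using the three pending states named above; is_stuck_status is a hypothetical helper, and the commented usage assumes helm and jq are installed and that the release is named ilum in the default namespace.

```shell
# is_stuck_status STATUS: succeed if STATUS is one of the pending states
# that this section treats as stuck.
is_stuck_status() {
    case "$1" in
        pending-install|pending-upgrade|pending-rollback) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (assumes helm and jq; release "ilum" in namespace "default"):
#   status=$(helm status ilum --namespace default -o json | jq -r '.info.status')
#   if is_stuck_status "$status"; then ilum upgrade --force-rollback --dry-run; fi
```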

You can also use --force-rollback together with --dry-run to test the detection without actually rolling back:

$ ilum upgrade --force-rollback --dry-run

Diagnosing the root cause. After recovering a stuck release, use ilum doctor --check release and ilum logs <module> --previous to understand what went wrong. The --previous flag on ilum logs shows output from the terminated container, which typically contains the crash reason.

Note: The --force-rollback flag only affects the current upgrade invocation. It does not permanently change any configuration. If the underlying problem persists (for example, insufficient resources causing pods to crash-loop), the next upgrade will also get stuck unless you address the root cause.

Operation History

The CLI maintains an audit log of all operations. Every install, upgrade, enable, disable, and uninstall is recorded with a timestamp, operation type, release name, and outcome.

$ ilum history
╭─ Operation History ──────────────────────────────────────────────────────╮
│ Time Operation Release Status │
├──────────────────────────────────────────────────────────────────────────┤
│ 2026-02-14 10:30:12 install ilum success │
│ 2026-02-14 11:15:44 enable ilum success │
│ 2026-02-14 14:22:01 upgrade ilum success │
╰──────────────────────────────────────────────────────────────────────────╯

Show only the last N operations:

$ ilum history --last 5

Filter by operation type:

$ ilum history --operation upgrade

Machine-readable output for CI/CD:

$ ilum history --last 5 --output json

Inspecting Values

The ilum values command lets you view, filter, and export the live Helm values from a running release. It replaces manual helm get values calls with dot-path filtering, revision pinning, diff mode, and YAML export.

Show all user-supplied values:

$ ilum values
╭─ Values ─────────────────────────────────────────────────────────────╮
│ ilum-core: │
│ enabled: true │
│ replicaCount: 1 │
│ ilum-ui: │
│ enabled: true │
│ mongodb: │
│ enabled: true │
│ ... │
╰──────────────────────────────────────────────────────────────────────╯

Filter with a dot-notation path:

$ ilum values --path ilum-core
╭─ Values ─────────────────────────────────────────────────────────────╮
│ enabled: true │
│ replicaCount: 1 │
│ sql: │
│ enabled: false │
│ ... │
╰──────────────────────────────────────────────────────────────────────╯

If the path does not exist, the CLI reports the error:

$ ilum values --path nonexistent.key
✗ Key path 'nonexistent.key' not found in values.

Include chart defaults (computed values) with --all:

$ ilum values --all

By default, ilum values shows only user-supplied values (the equivalent of helm get values). With --all, it shows the fully computed values including chart defaults (the equivalent of helm get values --all).

View values at a specific revision with --revision:

$ ilum values --revision 3
╭─ Values (revision 3) ───────────────────────────────────────────────╮
│ ilum-core: │
│ enabled: true │
│ ... │
╰──────────────────────────────────────────────────────────────────────╯

This is useful for inspecting what values were active at a previous revision before deciding whether to roll back.

Diff mode with --diff:

$ ilum values --diff
╭─ User-Supplied vs Computed Values ──────────────────────────────────╮
│ Key User Computed │
├──────────────────────────────────────────────────────────────────────┤
│ ilum-core.replicaCount 1 1 │
│ ilum-core.image.tag (absent) 6.7.0 │
│ ... │
╰──────────────────────────────────────────────────────────────────────╯

The diff shows which values you have explicitly set versus what the chart provides by default. This helps identify which values are safe to remove from your overrides.

Export values to a YAML file with --export:

$ ilum values --export my-values.yaml
✓ Values exported to my-values.yaml

The exported file is round-trip safe YAML suitable for passing back to ilum upgrade -f my-values.yaml. Combined with --all, this captures the complete computed state:

$ ilum values --all --export full-state.yaml
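
One way to use the export is as a backup taken immediately before an upgrade. The sketch below is a hypothetical wrapper (backup_name and export_then_upgrade are names invented for this example) that writes a timestamped export of the user-supplied values and only then runs the upgrade with whatever arguments you pass through.

```shell
# backup_name PREFIX: print a timestamped file name, PREFIX-YYYYMMDD-HHMMSS.yaml.
backup_name() {
    printf '%s-%s.yaml\n' "$1" "$(date +%Y%m%d-%H%M%S)"
}

# export_then_upgrade [UPGRADE_ARGS...]: export user-supplied values, then upgrade.
# Hypothetical wrapper; it only calls commands documented in this section.
export_then_upgrade() {
    file=$(backup_name ilum-values)
    # Stop if the export fails so an upgrade never runs without a backup.
    ilum values --export "$file" || return 1
    ilum upgrade "$@"
}

# Example: export_then_upgrade --version 6.8.0 --yes
```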

Comparing Values

The ilum diff command compares values across different sources. While ilum values --diff shows user-supplied vs. computed, ilum diff gives you fine-grained control over what to compare.

Compare user-supplied values against chart defaults (the default):

$ ilum diff
╭─ User-Supplied vs Computed Values ──────────────────────────────────╮
│ Key Before After │
├──────────────────────────────────────────────────────────────────────┤
│ ilum-core.replicaCount 1 1 │
│ ... │
╰──────────────────────────────────────────────────────────────────────╯

Compare live values against a local YAML file:

$ ilum diff --source file --values-file production-values.yaml
╭─ Live Values vs production-values.yaml ─────────────────────────────╮
│ Key Before After │
├──────────────────────────────────────────────────────────────────────┤
│ ilum-core.replicaCount 1 3 │
│ airflow.enabled false true │
╰──────────────────────────────────────────────────────────────────────╯

This is useful for previewing what a helm upgrade -f production-values.yaml would change.

Compare a specific revision against current values:

$ ilum diff --source revision --revision 2
╭─ Revision 2 vs Current ────────────────────────────────────────────╮
│ Key Before After │
├──────────────────────────────────────────────────────────────────────┤
│ ilum-sql.enabled false true │
│ ilum-core.sql.enabled false true │
╰──────────────────────────────────────────────────────────────────────╯

Compare the last CLI snapshot against live values (drift detection):

$ ilum diff --source snapshot
╭─ Last CLI Snapshot vs Live Values ──────────────────────────────────╮
│ Key Before After │
├──────────────────────────────────────────────────────────────────────┤
│ ilum-core.replicaCount 1 3 │
╰──────────────────────────────────────────────────────────────────────╯

If no snapshot exists (no previous CLI operation has been recorded), the command prints a message and exits cleanly:

⚠ No CLI snapshot found. Run an operation first to create one.
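
If you want a script to react to drift, one option is to count the data rows in the snapshot diff. This is a rough sketch that assumes the box-drawn layout shown above survives when the output is piped (a header row, a "├" separator line, then one row per changed key); that behaviour is an assumption, and count_diff_rows is a hypothetical helper name.

```shell
# count_diff_rows: count data rows in a box-drawn diff table read from stdin.
# Assumes the layout shown above: the header row is followed by a "├" separator
# line and the table ends with a "╰" line.
count_diff_rows() {
    awk '
        /^├/ { body = 1; next }   # separator: data rows start after this line
        /^╰/ { body = 0 }         # bottom border: table ends
        body && /^│/ { n++ }      # count rows inside the table body
        END { print n + 0 }
    '
}

# Example (requires a running deployment):
#   rows=$(ilum diff --source snapshot | count_diff_rows)
#   [ "$rows" -eq 0 ] || echo "Drift detected in $rows key(s)" >&2
```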

Filter the diff to a specific path:

$ ilum diff --source file --values-file prod.yaml --path ilum-core

The four valid sources are: defaults (default), file, revision, and snapshot.

Rolling Back

The ilum rollback command provides an explicit rollback to a previous Helm revision. Unlike the --force-rollback flag on ilum upgrade (which is designed for stuck releases), ilum rollback is a standalone operation with a values diff preview, confirmation prompt, and dry-run mode.

Roll back to the previous revision:

$ ilum rollback --yes
╭─ Rollback Plan ──────────────────────────────────────────────────────╮
│ Current revision 3 │
│ Current status deployed │
│ Current chart ilum-6.8.0 │
│ Current deployed 2026-02-15 14:22:01 │
│ │
│ Target revision 2 │
│ Target status superseded │
│ Target chart ilum-6.7.0 │
│ Target deployed 2026-02-14 11:15:44 │
╰──────────────────────────────────────────────────────────────────────╯

╭─ Values Changes (current vs target) ───────────────────────────────╮
│ Key Before After │
├──────────────────────────────────────────────────────────────────────┤
│ ilum-sql.enabled true false │
│ ilum-core.sql.enabled true false │
╰──────────────────────────────────────────────────────────────────────╯

⠋ Rolling back to revision 2...
✓ Rolled back 'ilum' to revision 2.

Roll back to a specific revision with --revision:

$ ilum rollback --revision 1 --yes

By default (--revision 0), the command targets the immediately previous revision.

Preview a rollback with --dry-run:

$ ilum rollback --dry-run
╭─ Rollback Plan ──────────────────────────────────────────────────────╮
│ Current revision 3 │
│ Current status deployed │
│ ... │
│ Target revision 2 │
│ Target status superseded │
│ ... │
╰──────────────────────────────────────────────────────────────────────╯

╭─ Values Changes (current vs target) ───────────────────────────────╮
│ Key Before After │
├──────────────────────────────────────────────────────────────────────┤
│ ilum-sql.enabled true false │
╰──────────────────────────────────────────────────────────────────────╯

ℹ Dry run — no changes applied.

After a successful rollback, the CLI updates the values snapshot and syncs the module configuration to reflect the rolled-back state. This means subsequent ilum status and ilum upgrade commands see the correct module list without manual intervention.

If only one revision exists (the initial install), the command reports the error:

✗ Only one revision exists. Nothing to roll back to.
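
Rollback pairs naturally with a post-upgrade health check. Helm's --atomic flag already reverts an upgrade that fails while it is being applied; the sketch below covers the separate case where the upgrade completes but pods are unhealthy afterwards. upgrade_or_rollback is a hypothetical wrapper, and its return codes (1, 2, 3) are choices made for this example.

```shell
# upgrade_or_rollback VERSION: upgrade, verify pod health, roll back if the check fails.
# Hypothetical wrapper built from the commands documented in this section.
upgrade_or_rollback() {
    version=$1
    # Return 1 if the upgrade itself fails.
    ilum upgrade --version "$version" --yes || return 1

    if ilum doctor --check pods; then
        echo "Upgrade to $version verified."
        return 0
    fi

    echo "Pods unhealthy after upgrade; rolling back to the previous revision." >&2
    # Return 3 if the rollback fails, 2 if it succeeds.
    ilum rollback --yes || return 3
    return 2
}
```

A caller can distinguish the outcomes by exit code: 0 for a verified upgrade, 1 for a failed upgrade, 2 for a completed rollback, and 3 for a rollback that itself failed.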

Shelling into Pods

The ilum exec command opens an interactive shell in a module's pod. It resolves the module name to the correct pod automatically, handles multi-pod scenarios with interactive selection, and falls back from bash to sh if bash is not available in the container.

Open a shell in the core module:

$ ilum exec core
root@ilum-core-6f8b4c7d9-xk2pl:/#

The CLI finds the pod matching the module's pod_label, connects with kubectl exec -it, and defaults to /bin/bash. If bash is not available, it automatically falls back to /bin/sh:

$ ilum exec minio
⚠ bash not available, falling back to /bin/sh
/ #

Specify a different shell with --shell:

$ ilum exec core --shell /bin/sh

Run a one-off command instead of an interactive shell:

$ ilum exec core --command "cat /opt/ilum/conf/application.conf"

When --command is used, the CLI runs the command non-interactively and returns the command's exit code.

Select a specific container with --container:

$ ilum exec airflow --container airflow-scheduler

Multi-pod selection. When a module has multiple running pods (for example, replicaCount > 1), the CLI presents an interactive selection menu:

$ ilum exec core
? Multiple pods for 'core'. Select one:
ilum-core-6f8b4c7d9-xk2pl
> ilum-core-6f8b4c7d9-ab3cd
ilum-core-6f8b4c7d9-ef7gh

In non-interactive mode, use --pod to specify the pod directly:

$ ilum exec core --pod ilum-core-6f8b4c7d9-ab3cd

Health warning. If the target pod has a high restart count (more than 5), the CLI warns you before connecting:

⚠ Pod 'ilum-core-6f8b4c7d9-xk2pl' has 12 restarts. It may be unstable.

Note: Modules without a pod_label (infrastructure modules that do not own a distinct pod) report an error. In those cases, use kubectl exec directly.

Resource Usage

The ilum top command shows per-module CPU and memory usage. It queries the Kubernetes metrics API (provided by metrics-server) and aggregates resource consumption by module, similar to kubectl top pods but organized by Ilum module.

Show resource usage for all modules:

$ ilum top
╭─ Resource Usage ─────────────────────────────────────────────────────────────────╮
│ Module Pods CPU Used CPU Req CPU % Mem Used Mem Req Mem % │
├──────────────────────────────────────────────────────────────────────────────────┤
│ core 1 250m 500m 50% 512Mi 1Gi 50% │
│ gitea 1 50m 100m 50% 128Mi 256Mi 50% │
│ jupyter 1 100m 200m 50% 256Mi 512Mi 50% │
│ minio 1 30m 100m 30% 64Mi 256Mi 25% │
│ mongodb 1 80m 250m 32% 192Mi 512Mi 38% │
│ postgresql 1 40m 100m 40% 96Mi 256Mi 38% │
│ ui 1 20m 50m 40% 48Mi 128Mi 38% │
│ TOTAL 7 570m 1.3 44% 1.3Gi 2.9Gi 44% │
╰──────────────────────────────────────────────────────────────────────────────────╯

The table shows actual usage versus requested resources and the utilization percentage. The TOTAL row aggregates across all modules.
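
For simple alerting, the utilization column can be filtered with awk. The sketch below assumes the column layout shown above is preserved when the output is piped, with "CPU %" as the fifth value after the row's leading border; modules_over_cpu is a hypothetical helper name.

```shell
# modules_over_cpu THRESHOLD: print modules whose "CPU %" exceeds THRESHOLD.
# Assumes the column layout shown above and skips the header and TOTAL rows.
modules_over_cpu() {
    awk -v limit="$1" '
        $1 == "│" && $2 != "TOTAL" && $6 ~ /^[0-9]+%$/ {
            pct = $6
            sub(/%$/, "", pct)                  # strip the percent sign
            if (pct + 0 > limit + 0) print $2   # numeric comparison
        }
    '
}

# Example (requires metrics-server and a running deployment):
#   ilum top | modules_over_cpu 80
```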

Filter to a single module:

$ ilum top core
╭─ Resource Usage ─────────────────────────────────────────────────────────────────╮
│ Module Pods CPU Used CPU Req CPU % Mem Used Mem Req Mem % │
├──────────────────────────────────────────────────────────────────────────────────┤
│ core 1 250m 500m 50% 512Mi 1Gi 50% │
│ TOTAL 1 250m 500m 50% 512Mi 1Gi 50% │
╰──────────────────────────────────────────────────────────────────────────────────╯

Sort by CPU or memory usage:

$ ilum top --sort-by cpu     # Highest CPU usage first
$ ilum top --sort-by memory  # Highest memory usage first

The default sort order is by module name.

Watch mode with --watch:

$ ilum top --watch

Watch mode continuously refreshes the table at a configurable interval (default: 5 seconds). Press Ctrl+C to stop. Adjust the interval with --interval:

$ ilum top --watch --interval 10

Metrics-server requirement. The ilum top command requires the Kubernetes metrics-server to be installed on the cluster. If the metrics API is not available, the CLI reports the error with an install command:

✗ Metrics API not available. Install metrics-server:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Note: The ilum top command maps pods to modules using the pod_label from the module registry. Pods that do not match any known module label are grouped under (other).