
Production

For production environments, it's recommended to deploy all dependencies in separate namespaces.

Kubernetes Prerequisites

Ilum has been extensively tested across all leading Kubernetes environments, ensuring compatibility with a variety of deployment scenarios. This includes lightweight Kubernetes distributions such as k3s and Rancher, as well as bare-metal Kubernetes clusters. Additionally, Ilum is fully compatible with major managed Kubernetes services in the cloud, including Google Kubernetes Engine (GKE), Amazon Elastic Kubernetes Service (EKS), and Azure Kubernetes Service (AKS).

Minikube for Testing

Throughout our documentation, we use Minikube for demonstration and testing purposes. Minikube provides an easy-to-set-up environment that allows users to quickly try out Ilum's features on a local machine. However, it is important to note that Minikube is not suitable for production deployments due to its limitations in scalability, resource management, and high availability.

For production use, we strongly recommend deploying Ilum on a robust Kubernetes setup that aligns with your infrastructure needs, ensuring optimal performance and reliability.

Prerequisites

The table below provides necessary prerequisites and related instructions.

| Prerequisite | Instruction |
| --- | --- |
| MongoDB | Refer to https://bitnami.com/stack/mongodb/helm |
| Kafka | Refer to https://bitnami.com/stack/kafka/helm |
| ObjectStorage | Refer to https://min.io/docs/minio/kubernetes/upstream/operations/installation.html |

ilum-core

helm install ilum-core --create-namespace -n ilum \
--set mongo.instances=<mongo uri> \
--set kafka.address=<kafka broker address> \
--set s3a.host=<s3 host> \
--set s3a.port=<s3 port> \
ilum/ilum-core

ilum-ui

helm install ilum-ui --create-namespace -n ilum ilum/ilum-ui
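The --set flags in the ilum-core command above can also be kept in a values file, which is easier to review and version. The following is a minimal sketch, not an official procedure: the file name ilum-core-values.yaml and the environment variable names are assumptions made for illustration, and the defaults are the same placeholders used in the command above. The script only prints the resulting install command, it does not run it.

```shell
#!/bin/sh
# Connection details. The defaults are the placeholders from the command above;
# override them through the environment (the variable names are illustrative).
MONGO_URI="${MONGO_URI:-<mongo uri>}"
KAFKA_ADDRESS="${KAFKA_ADDRESS:-<kafka broker address>}"
S3_HOST="${S3_HOST:-<s3 host>}"
S3_PORT="${S3_PORT:-<s3 port>}"

# Hypothetical file name; any path accepted by `helm -f` works.
VALUES_FILE="${VALUES_FILE:-ilum-core-values.yaml}"

# Each dotted --set key becomes a nested YAML key.
cat > "$VALUES_FILE" <<EOF
mongo:
  instances: "$MONGO_URI"
kafka:
  address: "$KAFKA_ADDRESS"
s3a:
  host: "$S3_HOST"
  port: $S3_PORT
EOF

# Print the equivalent install command instead of executing it.
echo "helm install ilum-core --create-namespace -n ilum -f $VALUES_FILE ilum/ilum-core"
```

Run the printed command once the placeholders have been replaced with real values.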

MongoDB

Ilum employs MongoDB as its storage layer, preserving all data required between restarts within the MongoDB database. Ilum automatically creates all necessary databases and collections during the startup process.

Apache Kafka

Apache Kafka serves as Ilum's communication layer, facilitating interaction between Ilum-Core and Spark jobs, as well as between different Ilum-Core instances when scaled. It is critical to ensure Apache Kafka brokers are accessible by both Ilum-Core and Spark jobs, especially when Spark jobs are launched on a different Kubernetes cluster.

Ilum communicates over several Kafka topics, all of which it creates at startup, so you do not need to manage these topics manually.

MinIO

Ilum uses MinIO as the storage layer for Spark application components. All files (including jars, configurations, data files) needed for the operation of Spark components (driver, executors) are stored and made available for download via MinIO.

MinIO implements the S3 interface, which also enables it to store input/output data.

Security keys

This application uses JSON Web Tokens (JWT) for authentication purposes. By default, the application employs an RSA key pair, which is randomly generated at runtime, to sign these tokens.

In its standard configuration, the application creates a fresh RSA key pair each time it starts. This approach simplifies local development and testing by automatically handling the key generation process. However, it must be emphasized that this approach is not suitable for a production environment.

The primary issue with using randomly generated keys in a production environment is the lack of persistence. Each time the application restarts, it generates a new RSA key pair, invalidating all previously issued tokens. This could lead to an abrupt and unanticipated logout for all users, disrupting user experience and potentially leading to data loss.
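The effect described above can be reproduced locally with OpenSSL. This is only an illustrative sketch: it signs a stand-in string rather than a real JWT, it assumes an RSA signature with SHA-256 (the exact algorithm Ilum uses is not stated here), and all file names are made up for the demonstration. A signature created with the first key pair verifies against that pair's public key but not against a pair generated later, which is what happens to issued tokens after a restart with regenerated keys.

```shell
#!/bin/sh
# Work in a throwaway directory so no real keys are touched.
WORKDIR="$(mktemp -d)"

# "first" is the key pair of the initial run, "second" the pair after a restart.
for run in first second; do
  openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 \
    -out "$WORKDIR/$run.key" 2>/dev/null
  openssl pkey -pubout -in "$WORKDIR/$run.key" -out "$WORKDIR/$run.pub"
done

# Stand-in for the signed part of a token.
printf 'header.payload' > "$WORKDIR/token.txt"
openssl dgst -sha256 -sign "$WORKDIR/first.key" \
  -out "$WORKDIR/token.sig" "$WORKDIR/token.txt"

# Verify the signature against a given public key and report the result.
check() {
  if openssl dgst -sha256 -verify "$1" \
       -signature "$WORKDIR/token.sig" "$WORKDIR/token.txt" >/dev/null 2>&1; then
    echo valid
  else
    echo invalid
  fi
}

SAME_KEY="$(check "$WORKDIR/first.pub")"
NEW_KEY="$(check "$WORKDIR/second.pub")"
echo "original key pair: $SAME_KEY, regenerated key pair: $NEW_KEY"
```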

Generate private key

For a production environment, a stable and secure key pair should be manually generated and used consistently. This ensures that tokens remain valid across multiple application restarts, thus providing a consistent user experience.

You can generate an RSA key pair manually using tools like OpenSSL. A common command to generate a 2048-bit RSA private key is as follows:

openssl genpkey -algorithm RSA \
-pkeyopt rsa_keygen_bits:2048 \
-pkeyopt rsa_keygen_pubexp:65537 | \
openssl pkcs8 -topk8 -nocrypt -outform pem > private-key.p8

The contents of the private key should look like the following:

-----BEGIN PRIVATE KEY-----
MIIEvQIBADANBgkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQCsRnE83rm6BJya
nTyzVqX0SG+D4zBjkyWsOmGG+CoDdgQ6Z8AaocmnjP1SbRykQsQSMf6SeW+fdpH+
ccmzuHe7pZIa2o2Mg8xbk/UszJDaPztwoQbUt/2gHi/rZP8cIVkquzhnN/yxrMls
...
-----END PRIVATE KEY-----

To use the private key as the value of the security.jwt.privateKey setting, remove the header and footer lines (-----BEGIN PRIVATE KEY----- and -----END PRIVATE KEY-----) from the key.

Generate public key

To generate the corresponding public key, use:

openssl pkey -pubout -inform pem -outform pem -in private-key.p8 -out public-key.spki

The contents of the public key should look like the following:

-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEArEZxPN65ugScmp08s1al
9Ehvg+MwY5MlrDphhvgqA3YEOmfAGqHJp4z9Um0cpELEEjH+knlvn3aR/nHJs7h3
u6WSGtqNjIPMW5P1LMyQ2j87cKEG1Lf9oB4v62T/HCFZKrs4Zzf8sazJbMN3E/mJ
...
-----END PUBLIC KEY-----

To use the public key as the value of the security.jwt.publicKey setting, remove the header and footer lines (-----BEGIN PUBLIC KEY----- and -----END PUBLIC KEY-----) from the key.
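Removing the header and footer by hand is error-prone, so a small helper can do it. The sketch below assumes the file names private-key.p8 and public-key.spki from the commands above, and it also assumes that the settings accept the Base64 body as a single line; if your deployment expects the original line breaks, drop the tr step. The helper name pem_body and the .body output files are hypothetical.

```shell
#!/bin/sh
# Keep generated files readable by the current user only.
umask 077

# Print the Base64 body of a PEM file: drop the BEGIN/END lines, join the rest.
pem_body() {
  grep -v '^-----' "$1" | tr -d '\r\n'
}

# Write the stripped bodies next to the key files, if those files exist.
for key_file in private-key.p8 public-key.spki; do
  if [ -f "$key_file" ]; then
    pem_body "$key_file" > "$key_file.body"
    echo "wrote $key_file.body"
  fi
done
```

The contents of private-key.p8.body and public-key.spki.body can then be used for security.jwt.privateKey and security.jwt.publicKey respectively.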

Modules

Ilum provides several integrated, preconfigured modules for your data infrastructure.

Ilum-Livy-proxy

Ilum-Livy-proxy is our implementation of the Livy API. It integrates Spark code with Ilum Groups in services such as Jupyter, Zeppelin, and Airflow.

Ilum-Livy-proxy is enabled in Ilum by default.

To enable or disable Ilum-Livy-proxy, use the ilum-livy-proxy.enabled helm value. For example, add --set ilum-livy-proxy.enabled=false to disable it.

Read more about Ilum-Livy-proxy here

Jupyter

Jupyter is a notebook: a development environment that lets you combine code, charts, explanations, and more in one executable document.

Jupyter is enabled in Ilum by default.

To control whether it is enabled, use the ilum-jupyter.enabled helm value. For example, add --set ilum-jupyter.enabled=false to your installation command to disable it.

Be aware that Jupyter uses Ilum-Livy-proxy to integrate with Ilum Groups, so Ilum-Livy-proxy must be enabled as well: --set ilum-livy-proxy.enabled=true

You can access the Jupyter UI by:

  • using Ilum UI: go to Modules > Jupyter
  • configuring an ingress
  • using the port-forward command kubectl port-forward svc/ilum-jupyter 8888:8888

Read more about Jupyter here

Apache Zeppelin

Zeppelin is a notebook: a development environment that lets you combine code, charts, explanations, and more in one executable document.

Be aware that the Zeppelin notebook is not bundled in the Ilum package by default. To run this service, add --set ilum-zeppelin.enabled=true to your installation command.

Zeppelin uses Ilum-Livy-proxy to integrate with Ilum Groups, so Ilum-Livy-proxy must be enabled as well: --set ilum-livy-proxy.enabled=true

To access the Zeppelin UI, configure an ingress or use the port-forward command kubectl port-forward svc/ilum-zeppelin 8080:8080

Read more about Zeppelin here

Hive Metastore

Note: Hive Metastore is not enabled in Ilum by default.

Hive Metastore is a metadata storage used to store your Spark catalogs (Spark tables, databases, views, and more) in a database instead of runtime memory. You can view these schemas later on the Table Explorer page.

To enable the Hive Metastore bundled instance, add --set ilum-hive-metastore.enabled=true to your installation command. You'll also need to include --set ilum-core.hiveMetastore.enabled=true to link it with the Table Explorer.

Note that Hive Metastore uses a PostgreSQL database to store metadata. You can read about it below.

Ilum SQL

Note: Ilum SQL is not enabled in Ilum by default.

To enable it, add --set ilum-sql.enabled=true for the SQL execution host and --set ilum-core.sql.enabled=true for the SQL viewer inside Ilum itself.

Ilum SQL can execute SQL queries on your data in the UI. Read more about it on the SQL Viewer page.

Apache Airflow

Airflow is a tool for managing data pipelines.

Be aware that Airflow is not bundled in the Ilum package by default. To run this service, add --set airflow.enabled=true to your installation command.

Airflow can use Ilum-Livy-proxy to create jobs integrated with Ilum Groups. If you want to use Ilum-Livy-proxy and it is disabled, enable it with the ilum-livy-proxy.enabled helm value.

To access the Airflow UI, configure an ingress or use the port-forward command kubectl port-forward svc/ilum-webserver 8080:8080

Marquez

Marquez stores metadata about data flow in your data infrastructure. Ilum Lineage uses this metadata to present relationships between jobs and datasets as a graph.

Be aware that Marquez is not bundled in the Ilum package by default. To run this service, add --set global.lineage.enabled=true to your installation command, and add --set ilum-marquez.web.enabled=true for the web client.

Note that Marquez uses a PostgreSQL database to store the metadata. You can read about it below.

You can access the Marquez UI by:

  • configuring an ingress
  • using the port-forward command kubectl port-forward svc/ilum-marquez-web 9444:9444

Read more about Marquez and Ilum Lineage here

PostgreSQL

The PostgreSQL database is used by services such as Marquez, Hive Metastore, and Airflow to store metadata.

PostgreSQL databases are enabled in Ilum by default.

To control whether PostgreSQL is enabled, use the postgresql.enabled helm value. For example, to disable it, add --set postgresql.enabled=false to your installation command.

Kube Prometheus Stack

Kube Prometheus Stack includes Prometheus, Grafana, and other tools for monitoring your data infrastructure.

Be aware that Kube Prometheus Stack is not bundled in the Ilum package by default. To run this service, add --set kube-prometheus-stack.enabled=true to your installation command.

To access the Prometheus UI, configure an ingress or use the port-forward command kubectl port-forward svc/prometheus-operated 9090:9090

To access the Grafana UI, configure an ingress or use the port-forward command kubectl port-forward svc/ilum-grafana 8080:80

Loki and Promtail

Loki gathers and manages the logs of your data infrastructure. Promtail is the agent that pushes logs into Loki.

Be aware that Loki is not enabled in Ilum by default. To run this service, add --set global.logAggregation.loki.enabled=true to your installation command.

Promtail is not enabled in Ilum by default either. To enable it, add --set global.logAggregation.promtail.enabled=true to your installation command.

To access Loki and run Loki queries, configure an ingress or use the port-forward command kubectl port-forward svc/ilum-loki-read 3100:3100 for read queries and kubectl port-forward svc/ilum-loki-write 3100:3100 for write queries. You can also use the ilum-loki-gateway service to link Grafana to Loki.
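The port-forward commands quoted for the individual modules can be gathered in one helper. Note that the Zeppelin, Airflow, and Grafana commands all bind local port 8080, so they cannot run at the same time unless the local port is changed. The sketch below only prints commands. The function name, the short module names, and the optional local-port argument are hypothetical, and the -n namespace flag assumes the modules run in the ilum namespace, which the commands above do not specify.

```shell
#!/bin/sh
# Namespace is an assumption; adjust it to where the services actually run.
NAMESPACE="${NAMESPACE:-ilum}"

# Usage: port_forward_cmd <module> [local-port]
port_forward_cmd() {
  case "$1" in
    jupyter)    svc=ilum-jupyter;        local_port=8888; remote_port=8888 ;;
    zeppelin)   svc=ilum-zeppelin;       local_port=8080; remote_port=8080 ;;
    airflow)    svc=ilum-webserver;      local_port=8080; remote_port=8080 ;;
    marquez)    svc=ilum-marquez-web;    local_port=9444; remote_port=9444 ;;
    prometheus) svc=prometheus-operated; local_port=9090; remote_port=9090 ;;
    grafana)    svc=ilum-grafana;        local_port=8080; remote_port=80 ;;
    loki-read)  svc=ilum-loki-read;      local_port=3100; remote_port=3100 ;;
    loki-write) svc=ilum-loki-write;     local_port=3100; remote_port=3100 ;;
    *) echo "unknown module: $1" >&2; return 1 ;;
  esac
  # An optional second argument overrides the local port to avoid clashes.
  echo "kubectl port-forward -n $NAMESPACE svc/$svc ${2:-$local_port}:$remote_port"
}

port_forward_cmd grafana
port_forward_cmd airflow 8081
```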

Graphite

Be aware that Graphite is not bundled in the Ilum package by default. To run this service, add --set graphite-exporter.graphite.enabled=true to your installation command.
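Several modules in this section need more than one flag, for example Hive Metastore and Ilum SQL. As a rough sketch, the switches can be collected in a small lookup so that the full set is always passed together. The function names and short module names are hypothetical, <release> and <chart> are placeholders for your own installation command, and nothing is executed: the script only prints the command.

```shell
#!/bin/sh
# Map a short module name (illustrative) to the --set flags quoted in this section.
module_flags() {
  case "$1" in
    zeppelin)   echo "--set ilum-zeppelin.enabled=true --set ilum-livy-proxy.enabled=true" ;;
    hive)       echo "--set ilum-hive-metastore.enabled=true --set ilum-core.hiveMetastore.enabled=true" ;;
    sql)        echo "--set ilum-sql.enabled=true --set ilum-core.sql.enabled=true" ;;
    airflow)    echo "--set airflow.enabled=true" ;;
    lineage)    echo "--set global.lineage.enabled=true --set ilum-marquez.web.enabled=true" ;;
    monitoring) echo "--set kube-prometheus-stack.enabled=true" ;;
    logs)       echo "--set global.logAggregation.loki.enabled=true --set global.logAggregation.promtail.enabled=true" ;;
    graphite)   echo "--set graphite-exporter.graphite.enabled=true" ;;
    *) echo "unknown module: $1" >&2; return 1 ;;
  esac
}

# Join the flags of all requested modules into one line.
build_flags() {
  flags=""
  for module in "$@"; do
    module_set="$(module_flags "$module")" || return 1
    flags="${flags:+$flags }$module_set"
  done
  echo "$flags"
}

# Print an install command that enables Hive Metastore and Ilum SQL.
echo "helm install <release> <chart> $(build_flags hive sql)"
```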

Troubleshooting

Image Pulling Errors

During the installation of Ilum on your cluster, Helm will pull Docker images, which may be as large as 10 GB, depending on the additional modules you enable. Consequently, with a slow internet connection, you might encounter Image Pull Timeout errors if the image download time exceeds the configured timeout. To resolve this issue, you can:

  1. Pull the Docker image manually by running:
minikube ssh docker pull <image>
# for example
minikube ssh docker pull ilum/core:6.1.3

  2. Change the image pull timeout in your Kubernetes configuration like this:
minikube start --extra-config=kubelet.runtime-request-timeout=5m

or like this:

minikube start --extra-config=kubelet.image-pull-progress-deadline=5m
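When several images need to be pulled ahead of the installation, the manual pull from the first option can be wrapped in a loop. This is a sketch only: the prepull function and the DRY_RUN switch are hypothetical, and the image list is whatever your enabled modules require. By default the commands are printed so they can be reviewed first; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Print (DRY_RUN=1, the default) or run (DRY_RUN=0) one pull per image.
prepull() {
  for image in "$@"; do
    if [ "${DRY_RUN:-1}" = 1 ]; then
      echo "minikube ssh docker pull $image"
    else
      minikube ssh docker pull "$image" || return 1
    fi
  done
}

# Example with the image named above; append further images as arguments.
prepull ilum/core:6.1.3
```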

Default Passwords / Credentials

Ilum comes with predefined credentials for various modules to simplify initial setup and testing. However, for production deployments, it is critical to change these default credentials to ensure security and prevent unauthorized access.

Default Credentials

| Application | Default Username | Default Password |
| --- | --- | --- |
| Ilum UI | admin | admin |
| MinIO Console | minioadmin | minioadmin |
| Airflow Web UI | admin | admin |
| Superset UI | admin | admin |
| Gitea UI | ilum | ilum |
| Grafana | admin | admin |

Database Credentials (For Internal Use)

| Database | Default Username | Default Password |
| --- | --- | --- |
| PostgreSQL | postgres | CHANGEMEPLEASE |
| Marquez | postgres | CHANGEMEPLEASE |
| Hive Metastore | postgres | CHANGEMEPLEASE |
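When replacing the defaults listed above, each account should get its own random secret. The sketch below only generates candidate passwords with OpenSSL. How each value is applied depends on the configuration of the respective module and is not covered here, and the account labels in the loop are illustrative names, not configuration keys.

```shell
#!/bin/sh
# 24 random bytes, hex-encoded: 48 characters that are safe in URLs and shells.
gen_password() {
  openssl rand -hex 24
}

# One independent secret per account from the tables above (labels are illustrative).
for account in ilum-ui minio-console airflow superset gitea grafana postgresql; do
  printf '%s\t%s\n' "$account" "$(gen_password)"
done
```

Store the generated values in a secret manager rather than in shell history or plain files.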