Upgrade Notes
NOTE TEMPLATE
1. Change
Feature:
Feature description
Values deleted - chart name
Name | Reason |
---|---|
helm.value | Helm value deletion reason |
Values added - chart name
Values section description
Name | Description | Value |
---|---|---|
helm.value | Helm value description | default value |
Warnings
NEXT RELEASE
RELEASE 6.2.0-RC1
1. Jupyter default sparkmagic configuration change
Feature
Changed the method of passing default spark configs to the jupyter notebook; they are now passed as a JSON string
Values added - ilum-jupyter
sparkmagic configuration parameters
Name | Description | Value |
---|---|---|
sparkmagic.config.sessionConfigs.conf | sparkmagic session spark configuration | '{ "pyRequirements": "pandas", "spark.jars.packages": "io.delta:delta-core_2.12:2.4.0", "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension", "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog"}' |
sparkmagic.config.sessionConfigsDefaults.conf | sparkmagic session defaults spark configuration | '{ "pyRequirements": "pandas", "spark.jars.packages": "io.delta:delta-core_2.12:2.4.0", "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension", "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog"}' |
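For illustration, a minimal ilum-jupyter values override in the new format could look like the sketch below; the JSON strings mirror the defaults listed above.

```yaml
# ilum-jupyter values.yaml sketch: spark defaults are now passed as a single JSON string
sparkmagic:
  config:
    sessionConfigs:
      conf: '{ "pyRequirements": "pandas", "spark.jars.packages": "io.delta:delta-core_2.12:2.4.0", "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension", "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog"}'
    sessionConfigsDefaults:
      conf: '{ "pyRequirements": "pandas", "spark.jars.packages": "io.delta:delta-core_2.12:2.4.0", "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension", "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog"}'
```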
RELEASE 6.1.3
1. Jupyter configuration and persistent storage
Feature
Added extended configuration of the jupyter notebook helm chart through helm values. Moreover, persistent storage was added to the jupyter pod.
All data saved in the `work` directory will now be available after a jupyter restart or update.
Values added - ilum-jupyter
pvc parameters
Name | Description | Value |
---|---|---|
pvc.annotations | persistent volume claim annotations | {} |
pvc.selector | persistent volume claim selector | {} |
pvc.accessModes | persistent volume claim accessModes | ReadWriteOnce |
pvc.storage | persistent volume claim storage requests | 4Gi |
pvc.storageClassName | persistent volume claim storageClassName | `` |
sparkmagic configuration parameters
Name | Description | Value |
---|---|---|
sparkmagic.config.kernelPythonCredentials.username | sparkmagic python kernel username | "" |
sparkmagic.config.kernelPythonCredentials.password | sparkmagic python kernel password | "" |
sparkmagic.config.kernelPythonCredentials.auth | sparkmagic python kernel auth mode | "None" |
sparkmagic.config.kernelScalaCredentials.username | sparkmagic scala kernel username | "" |
sparkmagic.config.kernelScalaCredentials.password | sparkmagic scala kernel password | "" |
sparkmagic.config.kernelScalaCredentials.auth | sparkmagic scala kernel auth mode | "None" |
sparkmagic.config.kernelRCredentials.username | sparkmagic r kernel username | "" |
sparkmagic.config.kernelRCredentials.password | sparkmagic r kernel password | "" |
sparkmagic.config.waitForIdleTimeoutSeconds | sparkmagic timeout waiting for idle state | 15 |
sparkmagic.config.livySessionStartupTimeoutSeconds | sparkmagic timeout waiting for the session to start | 300 |
sparkmagic.config.ignoreSslErrors | sparkmagic ignore ssl errors flag | false |
sparkmagic.config.sessionConfigs.conf | sparkmagic session spark configuration | [pyRequirements: pandas, spark.jars.packages: io.delta:delta-core_2.12:2.4.0, spark.sql.extensions: io.delta.sql.DeltaSparkSessionExtension, spark.sql.catalog.spark_catalog: org.apache.spark.sql.delta.catalog.DeltaCatalog] |
sparkmagic.config.sessionConfigs.driverMemory | sparkmagic session driver memory | 1000M |
sparkmagic.config.sessionConfigs.executorCores | sparkmagic session executor cores | 2 |
sparkmagic.config.sessionConfigsDefaults.conf | sparkmagic session defaults spark configuration | [pyRequirements: pandas, spark.jars.packages: io.delta:delta-core_2.12:2.4.0, spark.sql.extensions: io.delta.sql.DeltaSparkSessionExtension, spark.sql.catalog.spark_catalog: org.apache.spark.sql.delta.catalog.DeltaCatalog] |
sparkmagic.config.sessionConfigsDefaults.driverMemory | sparkmagic session defaults driver memory | 1000M |
sparkmagic.config.sessionConfigsDefaults.executorCores | sparkmagic session defaults executor cores | 2 |
sparkmagic.config.useAutoViz | sparkmagic use auto viz flag | true |
sparkmagic.config.coerceDataframe | sparkmagic coerce dataframe flag | true |
sparkmagic.config.maxResultsSql | sparkmagic max sql result | 2500 |
sparkmagic.config.pysparkDataframeEncoding | sparkmagic pyspark dataframe encoding | utf-8 |
sparkmagic.config.heartbeatRefreshSeconds | sparkmagic heartbeat refresh seconds | 30 |
sparkmagic.config.livyServerHeartbeatTimeoutSeconds | sparkmagic livy server heartbeat timeout seconds | 0 |
sparkmagic.config.heartbeatRetrySeconds | sparkmagic heartbeat retry seconds | 10 |
sparkmagic.config.serverExtensionDefaultKernelName | sparkmagic server extension default kernel name | pysparkkernel |
sparkmagic.config.retryPolicy | sparkmagic retry policy | configurable |
sparkmagic.config.retrySecondsToSleepList | sparkmagic retry seconds to sleep list | [0.2, 0.5, 1, 3, 5] |
sparkmagic.config.configurableRetryPolicyMaxRetries | sparkmagic retry policy max retries | 8 |
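As an illustration, a minimal ilum-jupyter override combining the new persistence and sparkmagic values might look like this; the 8Gi size and 600-second timeout are example choices, not defaults.

```yaml
# ilum-jupyter values.yaml sketch (example values, not defaults)
pvc:
  accessModes: ReadWriteOnce
  storage: 8Gi             # example: grow the notebook workspace beyond the 4Gi default
  storageClassName: ""     # empty -> use the cluster default storage class
sparkmagic:
  config:
    livySessionStartupTimeoutSeconds: 600   # example: allow slower session startup
```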
RELEASE 6.1.2
RELEASE 6.1.2-RC2
1. Hive metastore in ilum-aio chart
Feature
The Hive metastore is now part of the ilum AIO chart. HMS is a central repository of metadata for Hive tables and partitions in a relational database,
and provides clients (including Hive, Impala, and Spark) access to this information using the metastore service API.
With the hive metastore enabled in the ilum AIO helm stack, spark jobs run by ilum can be configured to automatically access it.
Values added - ilum-hive-metastore
Newly added chart; check its values on the chart page
Values added - ilum-core
Name | Description | Value |
---|---|---|
hiveMetastore.enabled | passing hive metastore config to ilum spark jobs flag | false |
hiveMetastore.address | hive metastore address | thrift://ilum-hive-metastore:9083 |
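A minimal ilum-core override enabling the bundled metastore could look like this sketch:

```yaml
# ilum-core values.yaml sketch: pass hive metastore config to ilum spark jobs
hiveMetastore:
  enabled: true
  address: thrift://ilum-hive-metastore:9083   # default in-cluster HMS address
```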
2. Postgres extensions added
Feature
Several of the ilum AIO subcharts use PostgreSQL. To make their deployment easier to manage, we have added a postgres extensions resource that creates PostgreSQL databases for the ilum subcharts.
Values added - ilum-aio
postgresql extensions parameters
Name | Description | Value |
---|---|---|
postgresExtensions.enabled | postgres extensions enabled flag | true |
postgresExtensions.image | image to run extensions in | bitnami/postgresql:16 |
postgresExtensions.pullPolicy | image pull policy | IfNotPresent |
postgresExtensions.imagePullSecrets | image pull secrets | [] |
postgresExtensions.host | postgresql database host | ilum-postgresql-0.ilum-postgresql-hl |
postgresExtensions.port | postgresql database port | 5432 |
postgresExtensions.databasesToCreate | comma separated list of databases to create | marquez,airflow,metastore |
postgresExtensions.auth.username | postgresql account username | ilum |
postgresExtensions.auth.password | postgresql account password | CHANGEMEPLEASE |
postgresExtensions.nodeSelector | postgresql extensions pods node selector | {} |
postgresExtensions.tolerations | postgresql extensions pods tolerations | [] |
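For example, a minimal ilum-aio override using the defaults above might look like this sketch; remember to change the password.

```yaml
# ilum-aio values.yaml sketch: create the subchart databases on deployment
postgresExtensions:
  enabled: true
  host: ilum-postgresql-0.ilum-postgresql-hl
  port: 5432
  databasesToCreate: marquez,airflow,metastore   # comma separated list
  auth:
    username: ilum
    password: CHANGEMEPLEASE   # replace before deploying
```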
3. Loki and promtail in ilum-aio chart
Feature
Loki and promtail are now part of the ilum AIO chart. Loki is a horizontally scalable, highly available, multi-tenant log aggregation system inspired by Prometheus.
Promtail is an agent which ships the contents of local logs to a Grafana Loki instance. Ilum will now use loki to aggregate logs from spark job pods
so that cluster resources can be cleaned up after jobs are done. Loki and promtail are preconfigured to scrape logs only from spark pods run by ilum in order to fetch job logs after they finish.
Values added - ilum-core
log aggregation config
Name | Description | Value |
---|---|---|
global.logAggregation.enabled | ilum log aggregation flag, if enabled Ilum will fetch logs of finished kubernetes spark pods from loki | false |
global.logAggregation.loki.url | loki gateway address to access logs | http://ilum-loki-gateway |
Values added - ilum-aio
log aggregation - loki config
Name | Description | Value |
---|---|---|
loki.nameOverride | subchart name override | ilum-loki |
loki.monitoring.selfMonitoring.enabled | self monitoring enabled flag | false |
loki.monitoring.selfMonitoring.grafanaAgent.installOperator | self monitoring grafana agent operator install flag | false |
loki.monitoring.selfMonitoring.lokiCanary.enabled | self monitoring canary enabled flag | false |
loki.test.enabled | tests enabled flag | false |
loki.loki.auth_enabled | authentication enabled flag | false |
loki.loki.storage.bucketNames.chunks | storage chunks bucket | ilum-files |
loki.loki.storage.bucketNames.ruler | storage ruler bucket | ilum-files |
loki.loki.storage.bucketNames.admin | storage admin bucket | ilum-files |
loki.loki.storage.type | storage type | s3 |
loki.loki.s3.endpoint | s3 storage endpoint | http://ilum-minio:9000 |
loki.loki.s3.region | s3 storage region | us-east-1 |
loki.loki.s3.secretAccessKey | s3 storage secret access key | minioadmin |
loki.loki.s3.accessKeyId | s3 storage access key id | minioadmin |
loki.loki.s3.s3ForcePathStyle | s3 storage path style access flag | true |
loki.loki.s3.insecure | s3 storage insecure flag | true |
loki.loki.compactor.retention_enabled | logs retention enabled flag | true |
loki.loki.compactor.deletion_mode | deletion mode | filter-and-delete |
loki.loki.compactor.shared_store | shared store | s3 |
loki.loki.limits_config.allow_deletes | allow logs deletion flag | true |
log aggregation - promtail config
Name | Description | Value |
---|---|---|
promtail.config.clients[0].url | first client url | http://ilum-loki-write:3100/loki/api/v1/push |
promtail.snippets.pipelineStages[0].match.selector | pipeline stage to drop non ilum logs selector | {ilum_logAggregation!="true"} |
promtail.snippets.pipelineStages[0].match.action | pipeline stage to drop non ilum logs action | drop |
promtail.snippets.pipelineStages[0].match.drop_counter_reason | pipeline stage to drop non ilum logs drop_counter_reason | non_ilum_log |
promtail.snippets.extraRelabelConfigs[0].action | relabel config to keep ilum pod labels action | labelmap |
promtail.snippets.extraRelabelConfigs[0].regex | relabel config to keep ilum pod labels regex | __meta_kubernetes_pod_label_ilum(.*) |
promtail.snippets.extraRelabelConfigs[0].replacement | relabel config to keep ilum pod labels replacement | ilum${1} |
promtail.snippets.extraRelabelConfigs[1].action | relabel config to keep spark pod labels action | labelmap |
promtail.snippets.extraRelabelConfigs[1].regex | relabel config to keep spark pod labels regex | __meta_kubernetes_pod_label_spark(.*) |
promtail.snippets.extraRelabelConfigs[1].replacement | relabel config to keep spark pod labels replacement | spark${1} |
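To use the aggregated logs, ilum-core must also have log aggregation enabled and pointed at the loki gateway; a minimal sketch:

```yaml
# ilum-core values.yaml sketch: fetch logs of finished spark pods from loki
global:
  logAggregation:
    enabled: true
    loki:
      url: http://ilum-loki-gateway   # default loki gateway address
```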
RELEASE 6.1.2-RC1
RELEASE 6.1.1
1. Added health checks for ilum interactive jobs
Feature
To prevent situations with unexpected crashes of ilum groups, we added healthchecks to make sure they work as they should.
Values added - ilum-core
ilum-job parameters
Name | Description | Value |
---|---|---|
job.healthcheck.enabled | spark interactive jobs healthcheck enabled flag | true |
job.healthcheck.interval | spark interactive jobs healthcheck interval in seconds | 300 |
job.healthcheck.tolerance | spark interactive jobs healthcheck response time tolerance in seconds | 120 |
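A minimal ilum-core override tuning the healthchecks might look like this sketch; the values shown are the defaults.

```yaml
# ilum-core values.yaml sketch: interactive job healthchecks
job:
  healthcheck:
    enabled: true
    interval: 300    # seconds between checks
    tolerance: 120   # allowed response delay in seconds
```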
2. Parameterized replica scale for ilum scalable services
Feature
The configuration of the number of replicas for ilum scalable services was extracted to helm values.
Values added - ilum-core
ilum-core common parameters
Name | Description | Value |
---|---|---|
replicaCount | number of ilum-core replicas | 1 |
Values added - ilum-ui
ilum-ui common parameters
Name | Description | Value |
---|---|---|
replicaCount | number of ilum-ui replicas | 1 |
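Assuming the services are deployed through the ilum-aio umbrella chart (the subchart keys below are assumed to match the chart names), both replica counts could be raised like this:

```yaml
# ilum-aio values.yaml sketch: run two replicas of each scalable service
ilum-core:
  replicaCount: 2
ilum-ui:
  replicaCount: 2
```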
RELEASE 6.1.0
RELEASE 6.1.0-RC4
RELEASE 6.1.0-RC3
RELEASE 6.1.0-RC2
1. Deleted unneeded parameters from ilum cluster wasbs storage
Feature
WASBS storage containers no longer need a SAS token provided in the helm values, as it turned out to be unnecessary
Values deleted - ilum-core
wasbs storage parameters
Name | Reason |
---|---|
kubernetes.wasbs.sparkContainer.name | Moved to kubernetes.wasbs.sparkContainer value |
kubernetes.wasbs.sparkContainer.sasToken | Turned out to be unnecessary |
kubernetes.wasbs.dataContainer.name | Moved to kubernetes.wasbs.dataContainer value |
kubernetes.wasbs.dataContainer.sasToken | Turned out to be unnecessary |
Values added - ilum-core
wasbs storage parameters
Name | Description | Value |
---|---|---|
kubernetes.wasbs.sparkContainer | default kubernetes cluster WASBS storage container name to store spark resources | ilum-files |
kubernetes.wasbs.dataContainer | default kubernetes cluster WASBS storage container name to store ilum tables | ilum-tables |
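A minimal ilum-core override in the new format might look like this sketch:

```yaml
# ilum-core values.yaml sketch: WASBS containers are now plain names, no SAS tokens
kubernetes:
  wasbs:
    sparkContainer: ilum-files   # was kubernetes.wasbs.sparkContainer.name
    dataContainer: ilum-tables   # was kubernetes.wasbs.dataContainer.name
```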
2. Added init containers to check service availability
Feature
To make the Ilum deployment start more gracefully, Ilum pods now include init containers that wait for the availability of the services they depend on.
Values added - ilum-core
Name | Description | Value |
---|---|---|
mongo.statusProbe.enabled | mongo status probe enabled flag | true |
mongo.statusProbe.image | image of the init container that waits for mongodb to be available | mongo:7.0.5 |
kafka.statusProbe.enabled | kafka status probe enabled flag | true |
kafka.statusProbe.image | image of the init container that waits for kafka to be available | bitnami/kafka:3.4.1 |
historyServer.statusProbe.enabled | ilum history server's ilum-core status probe enabled flag | true |
historyServer.statusProbe.image | image of the ilum history server init container that waits for ilum-core to be available | curlimages/curl:8.5.0 |
Values added - ilum-livy-proxy
Name | Description | Value |
---|---|---|
statusProbe.enabled | ilum-core status probe enabled flag | true |
statusProbe.image | image of the init container that waits for ilum-core to be available | curlimages/curl:8.5.0 |
Values added - ilum-ui
Name | Description | Value |
---|---|---|
statusProbe.enabled | ilum-core status probe enabled flag | true |
statusProbe.image | image of the init container that waits for ilum-core to be available | curlimages/curl:8.5.0 |
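If the probe images must come from a private registry, they can be overridden per dependency; the registry in the sketch below is a placeholder.

```yaml
# ilum-core values.yaml sketch: override status probe images (registry is a placeholder)
mongo:
  statusProbe:
    enabled: true
    image: registry.example.com/mongo:7.0.5
kafka:
  statusProbe:
    enabled: true
    image: registry.example.com/bitnami/kafka:3.4.1
```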
3. Parameterized kafka producers in ilum-core chart
Feature
In kafka communication mode, ilum interactive jobs respond to interactive job instances over kafka. The newly added helm values allow the ilum-job kafka consumer to be adapted to match user needs.
Values added - ilum-core
kafka parameters
Name | Description | Value |
---|---|---|
kafka.maxPollRecords | kafka max.poll.records parameter for the ilum jobs kafka consumer; determines how many requests the ilum-job kafka consumer fetches with each poll | 500 |
kafka.maxPollInterval | kafka max.poll.interval.ms parameter for the ilum jobs kafka consumer; determines the maximum delay between invocations of poll, which in the ilum-job context means the time limit for processing the requests fetched in a poll | 60000 |
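For example, a sketch that trades throughput for more processing time per batch (both values are examples, not defaults):

```yaml
# ilum-core values.yaml sketch: tune the ilum-job kafka consumer
kafka:
  maxPollRecords: 100      # fetch fewer requests per poll
  maxPollInterval: 120000  # allow 2 minutes to process a fetched batch
```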
RELEASE 6.1.0-RC1
1. Added support for service annotations
Feature
Service annotations for Ilum helm charts may now be configured through helm values
Values added - ilum-core
service parameters
Name | Description | Value |
---|---|---|
service.annotations | service annotations | {} |
grpc.service.annotations | grpc service annotations | {} |
historyServer.service.annotations | history server service annotations | {} |
Values added - ilum-jupyter
service parameters
Name | Description | Value |
---|---|---|
service.annotations | service annotations | {} |
Values added - ilum-livy-proxy
service parameters
Name | Description | Value |
---|---|---|
service.annotations | service annotations | {} |
Values added - ilum-ui
service parameters
Name | Description | Value |
---|---|---|
service.annotations | service annotations | {} |
Values added - ilum-zeppelin
service parameters
Name | Description | Value |
---|---|---|
service.annotations | service annotations | {} |
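The same shape applies to every chart listed above; the annotation key in the sketch below is only an example.

```yaml
# values.yaml sketch for any of the charts above (annotation key is an example)
service:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
```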
2. Pulled out security oauth2 parameters to global values
Feature
Ilum's oauth2 security configuration is now set through global values
Values added - ilum-aio
security parameters
Name | Description | Value |
---|---|---|
global.security.oauth2.clientId | oauth2 client ID | `` |
global.security.oauth2.issuerUri | oauth2 URI that can either be an OpenID Connect discovery endpoint or an OAuth 2.0 Authorization Server Metadata endpoint defined by RFC 8414 | `` |
global.security.oauth2.audiences | oauth2 audiences | `` |
global.security.oauth2.clientSecret | oauth2 client secret | `` |
Values deleted - ilum-core
security parameters
Name | Reason |
---|---|
security.oauth2.clientId | oauth2 security parameters are now configured through global values |
security.oauth2.issuerUri | oauth2 security parameters are now configured through global values |
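A minimal ilum-aio override in the new layout might look like this sketch; all values are placeholders.

```yaml
# ilum-aio values.yaml sketch: oauth2 settings now live under global values (placeholders)
global:
  security:
    oauth2:
      clientId: my-client-id
      clientSecret: my-client-secret
      issuerUri: https://idp.example.com/realms/ilum   # OIDC discovery or RFC 8414 metadata endpoint
      audiences: my-audience
```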
3. Runtime environment variables for frontend
Feature
Frontend runtime environment variables can now be configured through ilum-ui helm values.
Values added - ilum-ui
runtime variables
Name | Description | Value |
---|---|---|
runtimeVars.defaultConfigMap.enabled | default config map for frontend runtime environment variables | true |
runtimeVars.debug | debug logging flag | false |
runtimeVars.backenUrl | ilum-core backend url | http://ilum-core:9888 |
runtimeVars.historyServerUrl | url of history server ui | http://ilum-history-server:9666 |
runtimeVars.jupyterUrl | url of jupyter ui | http://ilum-jupyter:8888 |
runtimeVars.airflowUrl | url of airflow ui | http://ilum-webserver:8080 |
runtimeVars.minioUrl | url of minio ui | http://ilum-minio:9001 |
runtimeVars.mlflowUrl | url of mlflow ui | http://mlflow:5000 |
runtimeVars.historyServerPath | ilum-ui proxy path to history server ui | /external/history-server/ |
runtimeVars.jupyterPath | ilum-ui proxy path to jupyter ui | /external/jupyter/lab/tree/work/IlumIntro.ipynb |
runtimeVars.airflowPath | ilum-ui proxy path to airflow ui | /external/airflow/ |
runtimeVars.dataPath | ilum-ui proxy path to minio ui | /external/minio/ |
runtimeVars.mlflowPath | ilum-ui proxy path to mlflow ui | /external/mlflow/ |
Values deleted - ilum-ui
Name | Reason |
---|---|
debug | moved to runtimeVars section |
backenUrl | moved to runtimeVars section |
historyServerUrl | moved to runtimeVars section |
jupyterUrl | moved to runtimeVars section |
airflowUrl | moved to runtimeVars section |
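A minimal ilum-ui override in the new layout might look like this sketch:

```yaml
# ilum-ui values.yaml sketch: the old top-level values now live under runtimeVars
runtimeVars:
  debug: false
  backenUrl: http://ilum-core:9888     # spelled exactly as the chart expects
  jupyterUrl: http://ilum-jupyter:8888
```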
4. Kube-prometheus-stack in ilum-aio chart
Feature
The kube-prometheus stack is now part of the ilum AIO chart, preconfigured to work with the ilum deployment automatically and collect metrics from ilum pods and the spark jobs run by ilum.
Ilum provides prometheus service monitors to automatically scrape metrics from spark driver pods run by ilum and from ilum backend services.
Additionally, the ilum-aio chart provides built-in grafana dashboards that can be found in the `Ilum` folder.
Values added - ilum-aio
kube-prometheus-stack variables - for extended configuration check kube-prometheus stack helm chart
Name | Description | Value |
---|---|---|
kube-prometheus-stack.enabled | kube-prometheus-stack enabled flag | false |
kube-prometheus-stack.releaseLabel | kube-prometheus-stack flag to watch resources only from the ilum-aio release | true |
kube-prometheus-stack.kubeStateMetrics.enabled | kube-prometheus-stack Component scraping kube state metrics enabled flag | false |
kube-prometheus-stack.nodeExporter.enabled | kube-prometheus-stack node exporter daemon set deployment flag | false |
kube-prometheus-stack.alertmanager.enabled | kube-prometheus-stack alert manager flag | false |
kube-prometheus-stack.grafana.sidecar.dashboards.folderAnnotation | kube-prometheus-stack, if specified the sidecar will look for an annotation with this name to create a folder and put the dashboard there | grafana_folder |
kube-prometheus-stack.grafana.sidecar.dashboards.provider.foldersFromFilesStructure | kube-prometheus-stack, allow Grafana to replicate dashboard structure from filesystem | true |
Values added - ilum-core
Name | Description | Value |
---|---|---|
job.prometheus.enabled | prometheus enabled flag; if true, spark jobs run by Ilum will share metrics in prometheus format | true |
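Assuming the subchart keys in ilum-aio match the chart names, enabling the stack end to end might look like this sketch:

```yaml
# ilum-aio values.yaml sketch: enable the monitoring stack
kube-prometheus-stack:
  enabled: true
ilum-core:
  job:
    prometheus:
      enabled: true   # spark jobs run by Ilum share metrics in prometheus format
```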
5. Marquez OpenLineage in ilum-aio chart
Feature
Marquez OpenLineage is now part of the ilum AIO chart. Marquez enables consuming, storing, and visualizing OpenLineage metadata from across an organization,
serving use cases including data governance, data quality monitoring, and performance analytics. With marquez enabled in the ilum AIO helm stack, spark jobs run by Ilum will share lineage information with the marquez backend.
The marquez web interface visualizes the data lineage information collected from spark jobs and is accessible through the ilum UI as an iframe.
Values added - ilum-aio
Name | Description | Value |
---|---|---|
global.lineage.enabled | marquez enabled flag | false |
Values added - ilum-core
Name | Description | Value |
---|---|---|
job.openLineage.transport.type | marquez communication type | http |
job.openLineage.transport.serverUrl | marquez backend url, including the namespace name, where events from ilum's spark jobs should be stored | http://ilum-marquez:9555/api/v1/namespaces/ilum |
Values added - ilum-marquez
Newly added chart; check its values on the chart page
Values added - ilum-ui
Name | Description | Value |
---|---|---|
runtimeVars.lineageUrl | url to provide marquez openlineage UI iframe | http://ilum-marquez-web:9444 |
runtimeVars.lineagePath | ilum-ui proxy path to marquez openlineage UI | /external/lineage/ |
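A minimal ilum-aio override enabling lineage collection might look like this sketch:

```yaml
# ilum-aio values.yaml sketch: turn on marquez lineage collection
global:
  lineage:
    enabled: true
```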
RELEASE 6.0.3
1. Parameterized kafka producers max.request.size parameter in ilum-core chart
Feature
In kafka communication mode, ilum interactive jobs respond to interactive job instances using kafka producers. With the newly added helm value, the max.request.size kafka producer parameter can be adapted to match response size needs.
Values added - ilum-core
kafka parameters
Name | Description | Value |
---|---|---|
kafka.requestSize | kafka max.request.size parameter for ilum jobs kafka producers | 20000000 |
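For example, a sketch raising the limit for larger responses (the value shown is an example, not the default):

```yaml
# ilum-core values.yaml sketch: raise the producer limit for large job responses
kafka:
  requestSize: 50000000   # bytes; maps to the kafka max.request.size producer parameter
```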
RELEASE 6.0.2
1. Support for hdfs, gcs and azure blob storage in ilum-core chart
Feature
The Ilum cluster no longer has to be attached to s3 storage; the default cluster can now be configured to use hdfs, gcs, or azure blob storage as well. This can be achieved using the newly added values in the ilum-core helm chart.
Values deleted - ilum-core
Name | Reason |
---|---|
kubernetes.s3.bucket | From now on two separate buckets must be set with the new values: kubernetes.s3.sparkBucket, kubernetes.s3.dataBucket |
Values added - ilum-core
kubernetes storage parameters
Name | Description | Value |
---|---|---|
kubernetes.upgradeClusterOnStartup | default kubernetes cluster upgrade from values in config map flag | false |
kubernetes.storage.type | default kubernetes cluster storage type, available options: s3, gcs, wasbs, hdfs | s3 |
s3 kubernetes storage parameters
Name | Description | Value |
---|---|---|
kubernetes.s3.host | default kubernetes cluster S3 storage host to store spark resources | s3 |
kubernetes.s3.port | default kubernetes cluster S3 storage port to store spark resources | 7000 |
kubernetes.s3.sparkBucket | default kubernetes cluster S3 storage bucket to store spark resources | ilum-files |
kubernetes.s3.dataBucket | default kubernetes cluster S3 storage bucket to store ilum tables | ilum-tables |
kubernetes.s3.accessKey | default kubernetes cluster S3 storage access key to store spark resources | "" |
kubernetes.s3.secretKey | default kubernetes cluster S3 storage secret key to store spark resources | "" |
gcs kubernetes storage parameters
Name | Description | Value |
---|---|---|
kubernetes.gcs.clientEmail | default kubernetes cluster GCS storage client email | "" |
kubernetes.gcs.sparkBucket | default kubernetes cluster GCS storage bucket to store spark resources | "ilum-files" |
kubernetes.gcs.dataBucket | default kubernetes cluster GCS storage bucket to store ilum tables | "ilum-tables" |
kubernetes.gcs.privateKey | default kubernetes cluster GCS storage private key to store spark resources | "" |
kubernetes.gcs.privateKeyId | default kubernetes cluster GCS storage private key id to store spark resources | "" |
wasbs kubernetes storage parameters
Name | Description | Value |
---|---|---|
kubernetes.wasbs.accountName | default kubernetes cluster WASBS storage account name | "" |
kubernetes.wasbs.accessKey | default kubernetes cluster WASBS storage access key to store spark resources | "" |
kubernetes.wasbs.sparkContainer.name | default kubernetes cluster WASBS storage container name to store spark resources | "ilum-files" |
kubernetes.wasbs.sparkContainer.sasToken | default kubernetes cluster WASBS storage container sas token to store spark resources | "" |
kubernetes.wasbs.dataContainer.name | default kubernetes cluster WASBS storage container name to store ilum tables | "ilum-tables" |
kubernetes.wasbs.dataContainer.sasToken | default kubernetes cluster WASBS storage container sas token to store ilum tables | "" |
hdfs kubernetes storage parameters
Name | Description | Value |
---|---|---|
kubernetes.hdfs.hadoopUsername | default kubernetes cluster HDFS storage hadoop username | "" |
kubernetes.hdfs.config | default kubernetes cluster HDFS storage dict of config files with name as key and base64 encoded content as value | "" |
kubernetes.hdfs.sparkCatalog | default kubernetes cluster HDFS storage catalog to store spark resources | "ilum-files" |
kubernetes.hdfs.dataCatalog | default kubernetes cluster HDFS storage catalog to store ilum-tables | "ilum-tables" |
kubernetes.hdfs.keyTab | default kubernetes cluster HDFS storage keytab file base64 encoded content | "" |
kubernetes.hdfs.principal | default kubernetes cluster HDFS storage principal name | "" |
kubernetes.hdfs.krb5 | default kubernetes cluster HDFS storage krb5 file base64 encoded content | "" |
kubernetes.hdfs.trustStore | default kubernetes cluster HDFS storage trustStore file base64 encoded content | "" |
kubernetes.hdfs.logDirectory | default kubernetes cluster HDFS storage directory absolute path to store eventLog for history server | "" |
Important! Make sure S3/GCS buckets or WASBS containers are already created and reachable!
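As an illustration of switching the default cluster off s3, a GCS configuration might look like the sketch below; all credentials are placeholders.

```yaml
# ilum-core values.yaml sketch: default cluster on GCS storage (credentials are placeholders)
kubernetes:
  storage:
    type: gcs
  gcs:
    clientEmail: svc-account@my-project.iam.gserviceaccount.com
    privateKeyId: "0123456789abcdef"
    privateKey: "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
    sparkBucket: ilum-files
    dataBucket: ilum-tables
```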
2. Added spark history server to ilum-core helm chart
Feature
The spark history server can now be deployed along with ilum-core. The history server config is passed to every spark job ilum runs.
The history server UI can now be accessed through the ilum UI. If enabled, the history server will use the default kubernetes cluster storage configured with the kubernetes.[STORAGE_TYPE].[PARAMETER] values as eventLog storage.
Values added - ilum-core
history server parameters
Name | Description | Value |
---|---|---|
historyServer.enabled | spark history server flag | true |
historyServer.image | spark history server image | ilum/spark-launcher:spark-3.4.1 |
historyServer.address | spark history server address | http://ilum-history-server:9666 |
historyServer.pullPolicy | spark history server image pull policy | IfNotPresent |
historyServer.imagePullSecrets | spark history server image pull secrets | [] |
historyServer.parameters | spark history server custom spark parameters | [] |
historyServer.resources | spark history server pod resources | |
historyServer.service.type | spark history server service type | ClusterIP |
historyServer.service.port | spark history server service port | 9666 |
historyServer.service.nodePort | spark history server service nodePort | "" |
historyServer.service.clusterIP | spark history server service clusterIP | "" |
historyServer.service.loadBalancerIP | spark history server service loadbalancerIP | "" |
historyServer.ingress.enabled | spark history server ingress flag | false |
historyServer.ingress.version | spark history server ingress version | "v1" |
historyServer.ingress.className | spark history server ingress className | "" |
historyServer.ingress.host | spark history server ingress host | "host" |
historyServer.ingress.path | spark history server ingress path | "/(.*)" |
historyServer.ingress.pathType | spark history server ingress pathType | Prefix |
historyServer.ingress.annotations | spark history server ingress annotations | nginx.ingress.kubernetes.io/rewrite-target: /$1, nginx.ingress.kubernetes.io/proxy-body-size: "600m", nginx.org/client-max-body-size: "600m" |
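For example, exposing the history server through an ingress might look like the sketch below; the host and ingress class are placeholders.

```yaml
# ilum-core values.yaml sketch: expose the history server via ingress (host/class are placeholders)
historyServer:
  enabled: true
  ingress:
    enabled: true
    className: nginx
    host: history.example.com
    path: /(.*)
    pathType: Prefix
```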
Warnings
1. Make sure the HDFS logDirectory (helm value kubernetes.hdfs.logDirectory) is the absolute path of the configured sparkCatalog with an /ilum/logs suffix! E.g., for kubernetes.hdfs.sparkCatalog=spark-catalog set hdfs://name-node/user/username/spark-catalog/ilum/logs
3. Job retention in ilum-core chart
Feature
Ilum jobs will be deleted after the configured retention period expires
Values added - ilum-core
job retention parameters
Name | Description | Value |
---|---|---|
job.retain.hours | spark jobs retention hours limit | 168 |
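A minimal ilum-core override using the default one-week window might look like this sketch:

```yaml
# ilum-core values.yaml sketch: delete finished jobs after one week
job:
  retain:
    hours: 168
```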