Table Explorer

Overview

Table Explorer is a tool for monitoring the datasets in your applications and their contents. It provides the following:

List of all databases and tables

Lists the databases and tables created via Ilum Jobs, Ilum Groups, or queries in the SQL Viewer.

Table schema

Lineage

The lineage of each table, that is, the traceable history of transformations and operations applied to it, is displayed in the UI.

Powerful Data Exploration tool

It lets you create many kinds of charts, apply mathematical functions to the data, apply filters, and more.

How to use your datasets in Table Explorer

To see your data in Table Explorer, you need to save your dataset as a table.

There are multiple ways you can do that:

  • In Spark SQL:
    -- Create a table from a query
    CREATE TABLE target_table AS (SELECT col1, col2, col3 FROM source_table);

    -- Or create an empty table and insert rows into it
    CREATE TABLE target_table (col1 TYPE1, col2 TYPE2, col3 TYPE3);
    INSERT INTO target_table (col1, col2, col3) VALUES
      (value1_1, value1_2, value1_3)
      ...
  • Programmatically in Scala:
    df.write
      .mode("overwrite")
      .format("hive")
      .saveAsTable("table_name")
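
For PySpark users, the Scala snippet above can be sketched roughly as follows. This is a minimal sketch, not an Ilum API: the helper name `save_as_table` is hypothetical, and it assumes `df` is a Spark DataFrame created in a session that uses the Hive catalog.

```python
def save_as_table(df, table_name, mode="overwrite"):
    """Persist a Spark DataFrame as a table so it can show up in Table Explorer.

    Hypothetical helper for illustration; `df` is assumed to be a
    pyspark.sql.DataFrame from a session with the Hive catalog enabled.
    """
    (
        df.write
        .mode(mode)        # "overwrite" replaces an existing table of the same name
        .format("hive")    # same format as in the Scala example
        .saveAsTable(table_name)
    )
```

In a job it could be called as `save_as_table(spark.table("source_table"), "target_table")`, where `spark` is the active SparkSession.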

Data Exploration Tool

The Data Exploration Tool lets you interactively explore and visualize a sample of your dataset (1,000 rows by default, or a custom number of your choice). It offers a range of customization options for data representation and chart generation.

Customizable Axes for Charts

Select columns for the x-axis (horizontal) and y-axis (vertical) to visualize relationships between variables and quickly configure your chart to represent data with precision and flexibility.

Data Aggregation and Grouping

Aggregate and group data using common statistical functions such as:

  • Sum
  • Mean
  • Median
  • Standard Deviation
  • Variance

Apply these functions to your data for more insightful analysis.
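
To make the aggregation functions concrete, here is a small standalone sketch of what grouping and aggregating computes. It uses only the Python standard library and hypothetical sample rows. It assumes sample (n - 1) standard deviation and variance; which variant the tool itself uses is not stated here.

```python
from collections import defaultdict
import statistics

# Hypothetical sample rows: (group key, numeric value)
rows = [("a", 1.0), ("a", 2.0), ("a", 3.0), ("a", 4.0), ("b", 10.0), ("b", 20.0)]

def aggregate_by_group(rows):
    """Group values by key, then compute the listed statistics per group."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {
        key: {
            "sum": sum(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "stdev": statistics.stdev(values),        # sample standard deviation (assumption)
            "variance": statistics.variance(values),  # sample variance (assumption)
        }
        for key, values in groups.items()
    }

summary = aggregate_by_group(rows)
print(summary["a"]["sum"], summary["a"]["mean"], summary["a"]["median"])  # 10.0 2.5 2.5
```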

Filtering Capabilities

Filter your data based on various data types and conditions.

Diverse Data Representation Formats

Choose from 12 different formats to represent your data visually, including bar charts, line charts, scatter plots, and more. There are also many ways to customize your chart.

Exporting Charts

Export your charts in multiple formats:

  • CSV (data)
  • SVG (vector graphics)
  • PNG (image format)

Insights and Deployment

Spark catalogs

In Spark, SQL operations rely on Spark Catalogs, which manage database and table schemas. Spark's default in-memory catalog keeps this metadata in runtime memory, so the tables created within it persist only for the duration of the Spark session. Once the session ends, the table definitions are lost.

Hive Catalog and Hive Metastore

The Hive Catalog addresses this limitation by storing table schemas and metadata in a persistent database called the Hive Metastore. This ensures that table definitions are retained across multiple Spark sessions.

To configure Spark to use the Hive Catalog, you typically need to adjust the Spark session settings as follows:

    # Make the Spark catalog use the Hive Metastore
    spark.sql.catalogImplementation=hive

    # URI of the Hive Metastore (Thrift protocol)
    spark.hadoop.hive.metastore.uris=thrift://ilum-hive-metastore:9083

However, Ilum simplifies this process by automatically configuring all Ilum Jobs, Groups, and SQL queries in the SQL Viewer to use the Hive Catalog, eliminating the need for manual setup.
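
Outside Ilum, for example in a standalone PySpark script, the same two settings could be applied when building the session. The following is a minimal sketch: `with_hive_catalog` is a hypothetical helper, and the metastore URI is the one shown above, which is assumed to be reachable only from inside the cluster.

```python
# The two settings from the section above.
HIVE_CATALOG_CONF = {
    "spark.sql.catalogImplementation": "hive",
    "spark.hadoop.hive.metastore.uris": "thrift://ilum-hive-metastore:9083",
}

def with_hive_catalog(builder, conf=HIVE_CATALOG_CONF):
    """Apply each setting to a SparkSession builder and return the builder.

    Hypothetical helper; `builder` is assumed to expose `config(key, value)`
    the way `pyspark.sql.SparkSession.builder` does.
    """
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder
```

With PySpark installed, a session could then be created with `with_hive_catalog(SparkSession.builder.appName("demo")).getOrCreate()`. Keeping the settings in one dictionary makes it easy to swap the URI for a different environment.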

Setting up Hive Metastore: Metadata Database

Typically, to use the Hive Catalog, you must set up the Hive Metastore by completing the following steps:

  • Set up the Database: Configure a database to store Hive metadata.
  • Set up the Hive Metastore Server: Install and configure the Hive Metastore service.
  • Configure the Server to Use the Database: Modify the appropriate XML configuration files (e.g., hive-site.xml) to connect the Hive Metastore to the database.
  • Configure the Server to Use the Storage: Set up the storage backend (e.g., HDFS, S3, GCS) by updating the relevant XML files.

These steps can be time-consuming and repetitive.
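
To give a sense of the manual XML configuration described above, the following sketch renders a minimal `hive-site.xml`. The property names are standard Hive settings, but every value (host, database name, credentials, bucket) is a placeholder, and the helper `render_hive_site` is hypothetical. A real deployment typically needs more properties than these.

```python
import xml.etree.ElementTree as ET

def render_hive_site(properties):
    """Render a dict of Hive properties as hive-site.xml content."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")

# Placeholder values: replace with your own database and warehouse location.
hive_site = render_hive_site({
    "javax.jdo.option.ConnectionURL": "jdbc:postgresql://db-host:5432/metastore",
    "javax.jdo.option.ConnectionDriverName": "org.postgresql.Driver",
    "javax.jdo.option.ConnectionUserName": "customuser",
    "javax.jdo.option.ConnectionPassword": "CHOOSE PASSWORD",
    "hive.metastore.warehouse.dir": "s3a://yourbucket/yourfolder",
})
print(hive_site)
```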

Ilum simplifies this process by automatically handling the entire Hive Metastore setup, including database and storage configuration. To enable the Hive Metastore, set the following Helm value:

    helm upgrade \
      --set ilum-hive-metastore.enabled=true \
      --reuse-values ilum ilum/ilum

Note: if you use custom credentials for the PostgreSQL database, like this:

    helm upgrade \
      --set postgresql.auth.username=customuser \
      --set postgresql.auth.password="CHOOSE PASSWORD" \
      --reuse-values ilum ilum/ilum

You must also configure the Hive Metastore to use these credentials:

    helm upgrade \
      --set ilum-hive-metastore.postgresql.auth.password="CHOOSE PASSWORD" \
      --set ilum-hive-metastore.postgresql.auth.username=customuser \
      --reuse-values ilum ilum/ilum

Setting up Hive Metastore: Storage

Storage, also referred to as a Warehouse, is the location where the actual data is stored. Hive supports various storage backends, including:

  • HDFS (Hadoop Distributed File System)
  • Amazon S3 Buckets and MinIO
  • Google Cloud Storage (GCS)
  • Windows Azure Storage Blob (WASBS)

Typically, you would need to set up one of these storage options and configure Hive's metastore connection accordingly within an XML file.

However, with Ilum, the S3 MinIO storage is pre-configured for you, and the Hive Metastore is already set up to use it by default.

Configuring Other Storage Backends

If you prefer to use an alternative storage backend, you can configure Hive to work with it by reconfiguring your helm values:

For S3 storage or MinIO:

    helm upgrade \
      --set ilum-hive-metastore.storage.type="s3" \
      --set ilum-hive-metastore.storage.metastore.warehouse="s3a://yourbucket/yourfolder" \
      --set ilum-hive-metastore.storage.s3.accessKey="your_access_key" \
      --set ilum-hive-metastore.storage.s3.secretKey="your_secret_key" \
      --set ilum-hive-metastore.storage.s3.host="yourhost" \
      --set ilum-hive-metastore.storage.s3.port=yourport \
      --reuse-values ilum ilum/ilum

For GCS:

    helm upgrade \
      --set ilum-hive-metastore.storage.type="gcs" \
      --set ilum-hive-metastore.storage.metastore.warehouse="gs://my-gcs-bucket/path/to/folder/" \
      --set ilum-hive-metastore.storage.gcs.clientEmail="your@email" \
      --set ilum-hive-metastore.storage.gcs.privateKey="yourprivatekey" \
      --set ilum-hive-metastore.storage.gcs.privateKeyId="privatekeyid" \
      --reuse-values ilum ilum/ilum

For WASBS:

    helm upgrade \
      --set ilum-hive-metastore.storage.type="wasbs" \
      --set ilum-hive-metastore.storage.metastore.warehouse="wasbs://<container>@<account>.blob.core.windows.net/path/to/folder/" \
      --set ilum-hive-metastore.storage.wasbs.accountName="youraccountname" \
      --set ilum-hive-metastore.storage.wasbs.accessKey="youraccesskey" \
      --reuse-values ilum ilum/ilum

For HDFS:

Here you need to specify your HDFS configuration in

    ilum-hive-metastore.storage.hdfs.config

You can provide them in hdfs-config.yaml:

    helm upgrade \
      --set ilum-hive-metastore.storage.type="hdfs" \
      --set ilum-hive-metastore.storage.metastore.warehouse="hdfs://node:port/path/to/folder" \
      --set ilum-hive-metastore.storage.hdfs.hadoopUsername="yourusername" \
      --reuse-values ilum ilum/ilum \
      -f hdfs-config.yaml

Table Metadata Gathering in Ilum

Ilum uses a Hive client to gather metadata about tables and their columns, which is how everything becomes visible in Table Explorer.

Features

Currently, Ilum supports only one Hive Metastore, which is created automatically. We are developing the infrastructure for adding your own Hive Metastores and metastores of other types.