Tableau
Overview
Tableau is a powerful data analytics tool. It offers a wide range of charts for visualizing your data, the ability to build interactive dashboards, and the flexibility to transform your data in various ways.
Tableau can be easily configured to leverage data stored on Ilum's infrastructure.
Tableau Integration
Install Tableau Desktop
To install Tableau Desktop and activate the free trial, visit Tableau's official page. There you can start a 14-day free trial and download the desktop application.
Activate Ilum SQL and Hive Metastore
Tableau uses Ilum SQL to retrieve data from the Ilum Data Infrastructure. Ilum SQL launches a Spark session, which lets you query distributed data with Spark SQL. The Hive Metastore stores the Spark catalog persistently, so table definitions outlive any single Spark session.
When integrated with Tableau, your charts can access data from the entire cluster as if it were from a single database.
To use Ilum SQL and the Hive Metastore, enable them through the following Helm values:
helm upgrade ilum ilum/ilum \
--set ilum-hive-metastore.enabled=true \
--set ilum-core.hiveMetastore.enabled=true \
--set ilum-kyuubi.enabled=true \
--reuse-values
To learn more about Ilum SQL and the Hive Metastore in Ilum, see the corresponding pages of the Ilum documentation.
Expose Ilum SQL
To use Ilum SQL from Tableau, you must be able to reach the ilum-sql-thrift-binary service on your cluster from your computer.
If your cluster is running on Minikube on the same computer as Tableau, you can easily set up a port-forward with the following command:
kubectl port-forward svc/ilum-sql-thrift-binary 10009:10009
If you are using a remote cluster, you can expose the Ilum SQL service in several ways. For example, you can change the service type to LoadBalancer through the Helm chart values:
helm upgrade ilum ilum/ilum \
--set ilum-kyuubi.services.thriftBinary.type="LoadBalancer" \
--reuse-values
After a minute or so, run:
kubectl get service ilum-sql-thrift-binary
The EXTERNAL-IP column of the output shows the public IP assigned to the service. Use this address to connect.
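Rather than rerunning kubectl get service until an address appears, you can poll for it. The following is a minimal POSIX shell sketch; the function name wait_for_lb_ip and the POLL_SECONDS variable are hypothetical and not part of Ilum or kubectl. It relies on kubectl's standard JSONPath output to read .status.loadBalancer.ingress[0].ip. Some cloud providers (for example AWS) publish a hostname instead of an IP; in that case, query .status.loadBalancer.ingress[0].hostname instead.

```shell
# Hypothetical helper: poll until a LoadBalancer service reports an external IP.
# Arguments (optional): service name, maximum number of attempts.
# POLL_SECONDS (optional) sets the delay between attempts; default is 10.
wait_for_lb_ip() {
  local svc="${1:-ilum-sql-thrift-binary}"
  local attempts="${2:-30}"
  local i=0 ip=""
  while [ "$i" -lt "$attempts" ]; do
    # The JSONPath query prints nothing while the load balancer is pending;
    # "|| ip=''" keeps a transient kubectl error from aborting a set -e script.
    ip=$(kubectl get service "$svc" \
      -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null) || ip=""
    if [ -n "$ip" ]; then
      printf '%s\n' "$ip"
      return 0
    fi
    i=$((i + 1))
    sleep "${POLL_SECONDS:-10}"
  done
  # No address was assigned within the allowed number of attempts.
  return 1
}

# Example usage:
#   IP=$(wait_for_lb_ip) && echo "jdbc:kyuubi://${IP}:10009/default"
```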
Install Kyuubi JDBC Driver
To gather data and build charts in Tableau, you need to create a datasource. A datasource defines the connection details and specifies the driver required to retrieve the data.
To communicate with Ilum SQL, you must first install the kyuubi-hive-jdbc-shaded JAR package. You can download it from the Maven Central Repository.
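The download can be scripted. The sketch below only builds the Maven Central URL for the org.apache.kyuubi:kyuubi-hive-jdbc-shaded artifact from the standard Maven repository layout; the function name kyuubi_jdbc_jar_url is hypothetical, and you still need to choose a released version listed on Maven Central.

```shell
# Hypothetical helper: print the Maven Central download URL of the
# kyuubi-hive-jdbc-shaded JAR for a given version, following the standard
# Maven layout <group path>/<artifact>/<version>/<artifact>-<version>.jar.
kyuubi_jdbc_jar_url() {
  if [ "$#" -ne 1 ]; then
    echo "usage: kyuubi_jdbc_jar_url <version>" >&2
    return 2
  fi
  local base="https://repo1.maven.org/maven2/org/apache/kyuubi/kyuubi-hive-jdbc-shaded"
  printf '%s/%s/kyuubi-hive-jdbc-shaded-%s.jar\n' "$base" "$1" "$1"
}

# Example usage (replace <version> with a released version, and adjust the
# destination to your Tableau drivers folder):
#   curl -fLO "$(kyuubi_jdbc_jar_url <version>)"
#   cp kyuubi-hive-jdbc-shaded-<version>.jar "/path/to/Tableau/Drivers/"
```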
Once the JAR file is downloaded, place it in Tableau's drivers folder. On Windows, this is typically located at C:\Program Files\Tableau\Drivers.
After placing the JAR file in the correct folder, restart the Tableau Desktop application.
Add Datasource
To add a datasource in Tableau, follow these steps:
- Go to Home.
- Navigate to Connect > To a Server > Other Databases (JDBC).
- Enter the URL for your Ilum SQL service:
If you exposed the service using port-forward, use:
jdbc:kyuubi://localhost:10009/default
If you exposed the service using LoadBalancer, use:
jdbc:kyuubi://<public-ip>:10009/default
- Click Sign In.
- Start using Tableau with your data!
Ilum SQL Spark Configurations
In Ilum SQL, the Spark session is automatically preconfigured to work with the storage, the Hive Metastore, Delta, and other components of the Ilum architecture. To add further Spark configurations, append them to the JDBC URL using the following syntax:
jdbc:kyuubi://<host>:10009/default;spark.key1=value1;spark.key2=value2;
Alternatively, you can configure and launch a Spark session from the Ilum UI; Tableau will then connect to that session automatically.
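A URL in the syntax above can be assembled with a small helper, which is handy when you keep several configuration variants. This is a minimal POSIX shell sketch; the function name kyuubi_jdbc_url is hypothetical, and the Spark keys in the usage comment are illustrative values rather than recommendations for Ilum. It joins host, port, and database, then appends each key=value argument after a semicolon.

```shell
# Hypothetical helper: build a Kyuubi JDBC URL of the form
#   jdbc:kyuubi://<host>:<port>/<database>;key1=value1;key2=value2
# Arguments: host, port, database, then any number of key=value settings.
kyuubi_jdbc_url() {
  if [ "$#" -lt 3 ]; then
    echo "usage: kyuubi_jdbc_url <host> <port> <database> [key=value ...]" >&2
    return 2
  fi
  local url="jdbc:kyuubi://$1:$2/$3"
  shift 3
  # Append each remaining key=value pair, separated by semicolons.
  for conf in "$@"; do
    url="${url};${conf}"
  done
  printf '%s\n' "$url"
}

# Example usage (illustrative Spark settings):
#   kyuubi_jdbc_url localhost 10009 default spark.sql.shuffle.partitions=64
```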
Tableau Usage
- Choose spark_catalog.
- Choose default as the schema.
- Double-click the table that contains the data you are interested in.
You can then create a sheet and build a chart: assign the table's fields to the Columns and Rows shelves or to the Marks card, and select the chart type you want.
For example, this chart compares machine types in the system by their CPU usage:
In Tableau there are 15 available kinds of charts:
Moreover, you can combine multiple sheets into dashboards that periodically gather data to refresh the charts, making it easy to monitor your data analysis.