
How to create a storage

Ilum lets you link GCS, S3, WASBS and HDFS storages to your clusters. Once a storage is linked, Ilum automatically configures all your jobs to use it, so you don't need to add extra Spark parameters to load data from or save data to that storage.

Simple Storage

Google Cloud Storage

1. Creating a Google Cloud Storage bucket


  1. Create a Google Cloud project
  • Click the Project Selector in the top-left corner
  • Click the create button to create a new project, or choose an existing one
  2. Create a GCS bucket
  • Type GCS into the search bar and choose the Cloud Storage option
  • Click Create Bucket
  • Enter a bucket name and choose the region (location) settings
  3. Create a Service Account
  • Type IAM into the search bar and open the IAM dashboard
  • Go to the Service Accounts section and create a Service Account by giving it a name
  • Make sure the Service Account is allowed to read and write objects in your bucket, for example by granting it the Storage Object Admin role
  • Open the Service Account you have just created and add a key to it

The key is downloaded as a JSON file (the default key type), typically to your Downloads folder.

This JSON file contains the private_key, private_key_id and client_email properties, which you will use to connect Ilum to GCS.

2. Adding GCS to your default cluster


  1. Go to the storage editing page
  • Choose the cluster to which you want to add the storage
  • Click "Add Storage"
  2. Specify the bucket and a name for the storage
  3. Provide the GCS authorization details:
  • Open the JSON key file from the previous step
  • Fill in the email (client_email)
  • Fill in the private key of your Service Account (private_key)
  • Fill in the private key ID of your Service Account (private_key_id)
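Because Ilum configures your jobs from these three fields, you don't have to set anything by hand. For comparison only, here is a minimal sketch of the Spark properties you might otherwise pass yourself when running Spark outside Ilum. It assumes the GCS Hadoop connector 2.x property names (google.cloud.auth.service.account.*), which newer connector versions changed; the object GcsSparkConf and all values are hypothetical, and this is not necessarily how Ilum sets things up internally.

```scala
// Sketch only: Ilum sets this up for you when you link a storage.
// Property names assume the GCS Hadoop connector 2.x ("google.cloud.auth.*");
// newer connector versions use different names, so check your version's docs.
object GcsSparkConf {

  // Builds Spark properties for service-account auth. The "spark.hadoop."
  // prefix tells Spark to copy each entry into its Hadoop configuration.
  def forServiceAccount(
      clientEmail: String,
      privateKeyId: String,
      privateKey: String
  ): Map[String, String] = {
    val hadoopProps = Map(
      "fs.gs.impl" -> "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
      "google.cloud.auth.service.account.enable" -> "true",
      "google.cloud.auth.service.account.email" -> clientEmail,
      "google.cloud.auth.service.account.private.key.id" -> privateKeyId,
      "google.cloud.auth.service.account.private.key" -> privateKey
    )
    hadoopProps.map { case (key, value) => ("spark.hadoop." + key) -> value }
  }
}

// Hypothetical placeholders: take the real values from your JSON key file.
val gcsConf = GcsSparkConf.forServiceAccount(
  clientEmail = "my-sa@my-project.iam.gserviceaccount.com",
  privateKeyId = "<private_key_id>",
  privateKey = "<private_key>"
)

// Print the properties as spark-submit flags, masking the secret key.
gcsConf.toSeq.sortBy(_._1).foreach { case (key, value) =>
  val shown = if (key.endsWith("private.key")) "<redacted>" else value
  println(s"--conf $key=$shown")
}
```

Outside Ilum, the same map could be applied with SparkConf.setAll(gcsConf) before the session is created; the GCS connector jar would also need to be on the classpath. Keeping the secret values out of source code (for example, in environment variables or a secret store) is the safer choice in practice.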

3. Testing the connection

  1. Create a Code group

  2. Paste this code:

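```scala
// Build a small DataFrame from local test data
val data = Seq(
  ("Alice", 34),
  ("Bob", 45),
  ("Cathy", 29)
)
val columns = Seq("name", "age")
val df = spark.createDataFrame(data).toDF(columns: _*)

// Write it to the bucket as CSV; "overwrite" lets you rerun the test
// without failing because the output path already exists
df.write.mode("overwrite").format("csv").save("gs://gcs-test-ilum/output/")
```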

Remember to replace gcs-test-ilum in the path with the name of your own GCS bucket.

  3. Click Execute

If the code runs without throwing an error, the connection works. Congratulations!
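Optionally, you can also read the files back to confirm that the data actually reached the bucket. This is a minimal sketch that assumes the same placeholder path gs://gcs-test-ilum/output/ and that the output directory holds only the three rows written by the test; SparkSession.builder().getOrCreate() simply returns the session the Code group is already running. Because the CSV was saved without a header row, Spark names the columns _c0 and _c1, so they are renamed here.

```scala
import org.apache.spark.sql.SparkSession

// Returns the already-running session (e.g. the one your Code group uses).
val session = SparkSession.builder().getOrCreate()

// Same placeholder path as the write; replace with your own bucket.
val outputPath = "gs://gcs-test-ilum/output/"

// The files have no header, so rename the generated columns and let Spark
// infer that "age" is numeric.
val readBack = session.read
  .format("csv")
  .option("inferSchema", "true")
  .load(outputPath)
  .toDF("name", "age")

readBack.show()

// Spark skips marker files such as _SUCCESS, so only data rows are counted.
val rowCount = readBack.count()
assert(rowCount == 3, s"expected 3 rows, found $rowCount")
```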