How to create a storage
Ilum allows you to link GCS, S3, WASBS and HDFS storages to your clusters. Once a storage is linked, Ilum automatically configures all your jobs to use it, so you don't need to add extra Spark parameters to load and save data from that storage.
Simple Storage
Google Cloud Storage
1. Creating a Google Cloud Storage bucket
Demo:
- Create a Google Cloud project
  - Click the Project Selector in the top left corner
  - Click the Create button to create a new project, or choose an existing one
- Create a GCS bucket
  - Type GCS into the search bar and choose the Cloud Storage option
  - Click Create Bucket
  - Specify the region configuration and a bucket name
- Create a Service Account
  - Type IAM into the search bar and go to the IAM dashboard
  - Go to the Service Accounts section and create a Service Account by specifying its name
  - Choose the Service Account that you have just created and add a key to it
The key is downloaded as a JSON file (the default format) to your Downloads folder.
This JSON contains the private_key, private_key_id and client_email properties, which are used to connect to GCS from Ilum.
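To make the three properties concrete, here is a minimal Scala sketch that pulls them out of the contents of a key file. The sample JSON uses placeholder values, the `field` helper is hypothetical, and the regex lookup is a dependency-free shortcut that assumes plain string values; a real JSON library would be the more robust choice.

```scala
import java.util.regex.Pattern

// Placeholder key contents: only the shape matches a real service account key.
val keyJson: String =
  """{
    |  "type": "service_account",
    |  "private_key_id": "0123456789abcdef",
    |  "private_key": "-----BEGIN PRIVATE KEY-----\n<key material>\n-----END PRIVATE KEY-----\n",
    |  "client_email": "ilum-storage@my-project.iam.gserviceaccount.com"
    |}""".stripMargin

// Hypothetical helper: returns the raw (still JSON-escaped) string value
// of the first occurrence of the given field name, if any.
def field(json: String, name: String): Option[String] = {
  val pattern = ("\"" + Pattern.quote(name) + "\"\\s*:\\s*\"((?:\\\\.|[^\"\\\\])*)\"").r
  pattern.findFirstMatchIn(json).map(_.group(1))
}

// The three values that the next step asks for.
val clientEmail  = field(keyJson, "client_email")
val privateKeyId = field(keyJson, "private_key_id")
val privateKey   = field(keyJson, "private_key")
```

The values come back exactly as they appear in the file, so the private key still contains its escaped line breaks.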
2. Adding GCS to your default cluster
Demo:
- Go to the storage editing page
- Choose the cluster where you want to add a storage
- Click "Add Storage"
- Specify the bucket and the name
- Provide the GCS authorization details:
  - Open the JSON file from the previous step
  - Fill in the email (client_email)
  - Fill in the private key of your Service Account (private_key)
  - Fill in the private key ID of your Service Account (private_key_id)
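For orientation, the values entered in this form correspond roughly to Spark settings that would otherwise have to be passed by hand. The sketch below is an assumption, not a description of what Ilum does internally: `gcsSparkConf` is a hypothetical helper, and the property names follow the 2.x series of the GCS Hadoop connector, so they may differ in other connector versions.

```scala
// Hypothetical helper: collects the Spark settings for manual GCS access
// with service account credentials (GCS Hadoop connector 2.x property names).
def gcsSparkConf(
    clientEmail: String,
    privateKeyId: String,
    privateKey: String
): Map[String, String] =
  Map(
    // File system implementation behind the gs:// scheme
    "spark.hadoop.fs.gs.impl" -> "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
    // Values taken from the service account JSON key
    "spark.hadoop.fs.gs.auth.service.account.email" -> clientEmail,
    "spark.hadoop.fs.gs.auth.service.account.private.key.id" -> privateKeyId,
    "spark.hadoop.fs.gs.auth.service.account.private.key" -> privateKey
  )

// Placeholder values for illustration only.
val conf = gcsSparkConf(
  "ilum-storage@my-project.iam.gserviceaccount.com",
  "0123456789abcdef",
  "<private key>"
)
```

With the storage linked in Ilum, none of these settings need to appear in your job configuration.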
3. Testing that the connection works
- Create a Code group
- Paste this code:
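// Sample rows: (name, age)
val data = Seq(
("Alice", 34),
("Bob", 45),
("Cathy", 29)
)
val columns = Seq("name", "age")
// Build a DataFrame and give its columns readable names
val df = spark.createDataFrame(data).toDF(columns: _*)
// Write the DataFrame as CSV files to the linked GCS bucket
df.write.format("csv").save("gs://gcs-test-ilum/output/")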
Remember to replace the bucket name in the URL with the name of your own GCS bucket.
- Click Execute
If the code does not throw an error, then everything works. Congratulations!
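As an optional further check, the written files can be read back from the bucket. This is a minimal sketch that assumes the same `spark` session the Code group provides and the same example path as the write step; replace the bucket name with your own. Note that Spark's default save mode fails if the target path already exists, so running the write step a second time against the same path will throw an error.

```scala
// Assumes the `spark` session provided by the Ilum Code group.
val path = "gs://gcs-test-ilum/output/" // replace with your own bucket

// CSV files written without a header are read back with generic string columns,
// so the column names are assigned again here.
val readBack = spark.read.format("csv").load(path).toDF("name", "age")

readBack.show()

// The write step stored three rows, so three rows should come back.
assert(readBack.count() == 3, "expected the three rows written earlier")
```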