Loading structured and semi-structured data into Snowflake, cloning and time travel

Today I’d like to show you how simple it is to load data into Snowflake: not just structured data, but semi-structured data as well. I’d also like to show you how you can clone this data and how you can use time travel to access it as it was before an update.

LOAD STRUCTURED DATA INTO SNOWFLAKE

The first thing to do, as soon as you have logged in to Snowflake, is to create a database:

create database BIMANU_DB;

You can write SQL statements in the Worksheet tab.

You will also need a schema in this database:

create or replace schema BIMANU_SCHEMA;

After that you have to create a stage. This is where the data files will be staged before they are loaded into a table. It’s actually pretty easy using the Snowflake platform:

1) Go to the Databases tab
2) Choose the right database (here: BIMANU_DB)
3) Go to the Stages tab
4) Click on Create

5) Now you can choose a location for the data to be staged. There are three choices: Snowflake, Amazon and Azure. The default one is Snowflake. For the sake of this tutorial you can choose Amazon.

6) Next, you have to give the stage a name (BIMANU_STAGE), a URL and two key IDs (for Amazon: the AWS key ID and the AWS secret key).

7) That’s it! You have successfully created a stage.
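
By the way, you can also create the stage with a SQL statement. Here is a minimal sketch, assuming an S3 bucket; the URL and the two keys are placeholders you have to replace with your own values:

-- Hypothetical S3 location; replace the bucket path and the two keys with your own
CREATE OR REPLACE STAGE BIMANU_STAGE
  URL = 's3://your-bucket/your-path/'
  CREDENTIALS = (AWS_KEY_ID = '<your AWS key ID>' AWS_SECRET_KEY = '<your AWS secret key>');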

In order to load the data from the stage into a table, Snowflake needs to know how to parse the files, so you are going to need a file format. You can use the Snowflake platform for this purpose too:

1) Go to the Databases tab
2) Choose the right database (here: BIMANU_DB)
3) Go to the File Formats tab
4) Click on Create

5) Now, you have to give the file format a name (BIMANU_FILEFORMAT) and you can choose the Format Type. There are many choices: CSV, JSON and so on. The default one is CSV. For the sake of this tutorial you can choose CSV. You can also configure some other options.

6) That’s it! You have successfully created a file format.
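
The SQL equivalent looks roughly like this; a minimal sketch assuming comma-delimited CSV files with a header row (adjust the options to your files):

-- CSV file format: comma-delimited, skip the header row
CREATE OR REPLACE FILE FORMAT BIMANU_FILEFORMAT
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1;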

You still need a table (BIMANU_SALES) to load your data into:

CREATE OR REPLACE TABLE BIMANU_SALES (
sale_id VARCHAR(255),
sale_price INTEGER,
sale_date DATE,
postcode VARCHAR(16),
property_type CHAR(1),
new_build CHAR(1),
sale_type CHAR(1),
paon VARCHAR(255),
saon VARCHAR(255),
street VARCHAR(255),
locality VARCHAR(255),
town VARCHAR(255),
district VARCHAR(255),
county VARCHAR(255),
code1 CHAR(1),
code2 CHAR(2)
);

Once you’ve created it, you can start the loading process. You can use the Snowflake platform to do it or you can type a SQL statement.
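
The exact statement depends on your setup, but a sketch of such a COPY INTO command, using the stage and file format created above, looks like this:

-- Load the staged CSV files into the target table
COPY INTO BIMANU_SALES
  FROM @BIMANU_STAGE
  FILE_FORMAT = (FORMAT_NAME = 'BIMANU_FILEFORMAT');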

So, now you’ve loaded structured data into Snowflake. The next step is to load the semi-structured data.

LOAD SEMI-STRUCTURED DATA INTO SNOWFLAKE

Since you’ve already created the database and the schema, you can start right away by creating the file format. You can follow the same steps as you did for the structured data. The only difference is the Format Type (here: JSON).
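
If you prefer SQL, a minimal sketch; the name BIMANU_JSON_FORMAT is just an assumption for this tutorial:

-- JSON file format for the semi-structured files
CREATE OR REPLACE FILE FORMAT BIMANU_JSON_FORMAT
  TYPE = 'JSON';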

Now you can create a table (BIMANU_JSON) to load the semi-structured data into:

CREATE OR REPLACE TABLE BIMANU_JSON (COLUMN1 VARIANT);

This table has only one column (COLUMN1) and its data type is VARIANT. This is a special data type: every other data type can be cast into VARIANT.
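
A quick illustration; ordinary values can be cast into VARIANT and JSON text can be parsed into it:

-- A number, a date and parsed JSON text all fit into VARIANT
SELECT 42::VARIANT, CURRENT_DATE()::VARIANT, PARSE_JSON('{"city": "Cologne"}');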

In order to load the semi-structured data, you have to click on this new table and then on Load table.

After that you’ll have to follow the four steps below:

1) Firstly, you have to select the Warehouse.

2) Then you have to specify the location of the semi-structured data. You have two choices: your computer or an S3 bucket (since you chose the Amazon location).

Assuming that you have stored your semi-structured files on your computer, you can choose this option.

3) Once you’ve selected the files, you can choose the JSON file format.

You can now click on Load, or you can click on Next and configure how the system should handle errors while parsing the files.

4) If you decide on the latter, the default option is to stop loading when an error occurs. Of course, you can change this if you like.

Once you click on Load, it’s just a matter of seconds before the load process ends.
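
As with the structured data, the whole load can also be expressed as a SQL statement. A sketch, assuming the JSON files are in the stage created earlier and using the hypothetical file format name from above:

-- Load the staged JSON files; stop on the first parsing error (the default)
COPY INTO BIMANU_JSON
  FROM @BIMANU_STAGE
  FILE_FORMAT = (FORMAT_NAME = 'BIMANU_JSON_FORMAT')
  ON_ERROR = 'ABORT_STATEMENT';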

That’s it! As you can see, you only needed a couple of minutes to load both structured and semi-structured data into Snowflake.

CLONE

Let us now talk about cloning objects. Snowflake offers the possibility to clone a table, a schema or even a whole database in a very short time. The cloned object is independent of the original one. Once you’ve created it, you can update it, delete it or just process it without having an impact on the original one.

For example:

You can create a new database (BIMANU_DEV) as a clone of BIMANU_DB:

CREATE DATABASE BIMANU_DEV CLONE BIMANU_DB;

Then, you can update the values of a column in the cloned database:

UPDATE BIMANU_DEV.BIMANU_SCHEMA.BIMANU_SALES SET SALE_PRICE = 0;

You can check the changes on the cloned table:

SELECT * FROM BIMANU_DEV.BIMANU_SCHEMA.BIMANU_SALES LIMIT 10;

Every SALE_PRICE in the result is now 0.

Now let’s have a look at the original table:

SELECT * FROM BIMANU_DB.BIMANU_SCHEMA.BIMANU_SALES LIMIT 10;

As the result shows, the changes on the cloned table had no impact on the original one.
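
Cloning works the same way at table or schema level; a short sketch with hypothetical names:

-- Clone a single table or a whole schema
CREATE TABLE BIMANU_SALES_COPY CLONE BIMANU_SALES;
CREATE SCHEMA BIMANU_SCHEMA_COPY CLONE BIMANU_SCHEMA;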

TIME TRAVEL

And now a few words about time travel. Let’s assume that you have updated certain values. Time travel enables you to go back in time and access the original values. You can also go back and query data that has already been deleted, or even recreate or clone dropped objects.

For example:

SELECT * FROM BIMANU_DEV.BIMANU_SCHEMA.BIMANU_SALES at(offset => -60*10) LIMIT 10;

With the above statement you go ten minutes (an offset of -60 * 10 seconds; the offset is specified in seconds) back in time and select the data of the cloned table as it was before the update.

The result shows the original SALE_PRICE values, as they were before the update.

Hence, Snowflake offers you the possibility to work with both updated (or deleted) and original data.
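
Besides the offset variant, time travel can also address a specific point in time, and dropped objects can be restored with UNDROP. A sketch; the timestamp is just a placeholder:

-- Query the table as it was at a specific point in time
SELECT * FROM BIMANU_DEV.BIMANU_SCHEMA.BIMANU_SALES
  AT(TIMESTAMP => '2018-06-29 10:00:00'::TIMESTAMP_LTZ) LIMIT 10;

-- Restore a dropped table within the time travel retention period
DROP TABLE BIMANU_DEV.BIMANU_SCHEMA.BIMANU_SALES;
UNDROP TABLE BIMANU_DEV.BIMANU_SCHEMA.BIMANU_SALES;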

Anastasios Koukothakis, 29.06.2018

Finally, the Snowflake platform can be tested free of charge with 400 US dollars’ worth of credits. We provide a link for this purpose.

PS: We are pleased to be able to offer training courses as a Snowflake partner in the future. We will publish these as soon as possible in our training section.