
Lab - Data Ingestion - Data Loading

Lab - Snowflake Training


What will you Learn?

High Level Overview of Data Loading in Stage

Stage
    Create Internal Stage
    Push Data to Stage

File Formats

Create Staging Table

Copy Data

Validating Results

Directly Querying Data on Stage

Simple Data Transformation During Load


What will you Learn?
In this lab, you will learn:
1. The basics of data loading in Snowflake
2. Stages - internal/external stages, creating a stage, pushing data to a stage, and basic queries on a stage
3. File formats - creating file formats
4. Creating tables
5. Copying data from a stage to tables

High Level Overview of Data Loading in Stage


1. Go to https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=258
2. Select all the columns and download the CSV file
3. Load the CSV file into a stage
4. Query the columns from the stage
5. Cross-verify the column names against the CSV file
6. Select the required columns from the stage
7. Create the table Flight_performance_raw
8. The table script is provided further down if needed, but try to write it yourself before looking at it
9. Load the data into the table using the pattern COPY INTO <table> FROM (SELECT $1, $2, ...), sketched just below
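
The load pattern from step 9 has the general shape below; the stage, file format, and table names are the ones created later in this lab, and the column list is abbreviated.

-- General shape of a transforming load: positional references ($1, $2, ...)
-- map to CSV columns in file order
COPY INTO FLIGHT_PERFORMANCE_RAW
FROM (
    SELECT t.$1, t.$2, t.$3  -- continue through the columns you need
    FROM @BTSDataStage t
)
FILE_FORMAT = (FORMAT_NAME = 'FLIGHT_PERFORMANCE_RAW_FORMAT');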

Stage
Create Internal Stage

The first step is to create an internal stage. A stage always belongs to a database and a schema.

USE BTSDB;

--create schema specific to you


CREATE SCHEMA IF NOT EXISTS RAWSCHEMA_<YourName>;
USE SCHEMA RAWSCHEMA_<YourName>;

CREATE STAGE IF NOT EXISTS BTSDataStage;
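
You can confirm the stage was created with the standard SHOW STAGES command:

SHOW STAGES LIKE 'BTSDataStage';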

Push Data to Stage


Note: If you do not have SnowSQL installed, you can skip these steps and work with BTSDB.RAWSCHEMA.BTSDataStage, which already contains data files.

Download the data from the Google Drive link shared separately.


# The PUT command can only be executed via SnowSQL (or drivers/connectors
# such as Python); it is not available in the browser worksheet.

# Connect to SnowSQL in a terminal window
snowsql -u <username> -a <account>

USE BTSDB;


USE SCHEMA RAWSCHEMA_<YourName>;

PUT file://<your download location> @BTSDataStage


(e.g. PUT file://data/btsdata/extract/* @BTSDataStage)
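
By default, PUT gzip-compresses files on upload and uses multiple parallel threads; AUTO_COMPRESS and PARALLEL are standard PUT options you can set explicitly:

PUT file://<your download location> @BTSDataStage AUTO_COMPRESS=TRUE PARALLEL=4;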

You can also provide a specific path on the stage:

PUT file://<your download location> @BTSDataStage/2014/06/01


Partitioning the data on the stage by path helps keep it organized.

List the files on the stage:

list @BTSDataStage;
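
If you uploaded files under partitioned paths, you can list just one partition by appending the path prefix (the path below assumes the example partition used earlier):

list @BTSDataStage/2014/06/01;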

File Formats
File formats are associated with a specific database and schema.
Switch to a worksheet in the browser for this exercise. You can continue to use SnowSQL as well.

USE BTSDB;
USE SCHEMA RAWSCHEMA_<YourName>;

CREATE OR REPLACE FILE FORMAT FLIGHT_PERFORMANCE_RAW_FORMAT
    TYPE = 'CSV'
    COMPRESSION = 'AUTO'
    FIELD_DELIMITER = ','
    RECORD_DELIMITER = '\n'
    SKIP_HEADER = 1
    FIELD_OPTIONALLY_ENCLOSED_BY = '\"'
    TRIM_SPACE = FALSE
    ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE
    ESCAPE = 'NONE'
    DATE_FORMAT = 'AUTO';
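
To confirm the options took effect, you can describe the file format with the standard DESC command:

DESC FILE FORMAT FLIGHT_PERFORMANCE_RAW_FORMAT;
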
Create Staging Table
USE BTSDB;
USE SCHEMA RAWSCHEMA_<YourName>;

CREATE TABLE IF NOT EXISTS FLIGHT_PERFORMANCE_RAW (
    YEAR INT,
    QUARTER INT,
    MONTH INT,
    DAY_OF_MONTH INT,
    DAY_OF_WEEK INT,
    UNIQUE_CARRIER varchar(3),
    FL_NUM STRING(6),
    ORIGIN varchar(50),
    ORIGIN_CITY_NAME STRING(50),
    DEST STRING(6),
    DEST_CITY_NAME STRING(50),
    DEP_DELAY number(8, 2),
    ARR_DELAY number(8, 2),
    CANCELLED INT,
    CANCELLATION_CODE STRING(6),
    DIVERTED INT,
    CARRIER_DELAY number(8, 2),
    WEATHER_DELAY number(8, 2),
    NAS_DELAY number(8, 2),
    SECURITY_DELAY number(8, 2),
    LATE_AIRCRAFT_DELAY number(8, 2)
);
Copy Data
Copying data requires an active (running) warehouse.

USE BTSDB;
USE SCHEMA RAWSCHEMA_<YourName>;

USE warehouse Load_WH;

COPY INTO FLIGHT_PERFORMANCE_RAW
FROM @BTSDataStage
FILE_FORMAT = (FORMAT_NAME = 'FLIGHT_PERFORMANCE_RAW_FORMAT');
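
To see what was just loaded, you can query the table's copy history (a sketch using the standard INFORMATION_SCHEMA.COPY_HISTORY table function; the one-hour window is an arbitrary choice):

SELECT file_name, row_count, status
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
    TABLE_NAME => 'FLIGHT_PERFORMANCE_RAW',
    START_TIME => DATEADD(hour, -1, CURRENT_TIMESTAMP())));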

Validating Results
USE BTSDB;
USE SCHEMA RAWSCHEMA_<YourName>;

select month, count(*) from FLIGHT_PERFORMANCE_RAW group by 1;
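
Beyond row counts, COPY can also report parse errors without loading anything, via the standard VALIDATION_MODE option:

-- Dry run: return the rows that would fail to parse, loading nothing
COPY INTO FLIGHT_PERFORMANCE_RAW
FROM @BTSDataStage
FILE_FORMAT = (FORMAT_NAME = 'FLIGHT_PERFORMANCE_RAW_FORMAT')
VALIDATION_MODE = 'RETURN_ERRORS';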


Directly Querying Data on Stage
USE BTSDB;
USE SCHEMA RAWSCHEMA_<YourName>;

SELECT count(*)
FROM @BTSDataStage
(FILE_FORMAT => 'FLIGHT_PERFORMANCE_RAW_FORMAT') t;

SELECT t.$1, t.$2, t.$3
FROM @BTSDataStage
(FILE_FORMAT => 'FLIGHT_PERFORMANCE_RAW_FORMAT') t
LIMIT 1;

SELECT metadata$filename, metadata$file_row_number, t.$1, t.$2
FROM @BTSDataStage
(FILE_FORMAT => 'FLIGHT_PERFORMANCE_RAW_FORMAT') t
LIMIT 10;

SELECT DISTINCT metadata$filename
FROM @BTSDataStage
(FILE_FORMAT => 'FLIGHT_PERFORMANCE_RAW_FORMAT') t
LIMIT 10;
Simple Data Transformation During Load
Snowflake supports basic transformations during the load operation. This helps with column reordering, column omission, casting, loading a subset of the data, adding sequence columns, adding current-timestamp values, etc. The carrier exercise below uses column omission and DISTINCT; a casting example follows it.

USE BTSDB;
USE SCHEMA RAWSCHEMA_<YourName>;

create temporary table carrier (
    carrier_id number autoincrement start 1 increment 1,
    unique_carrier string(6),
    fl_num string(6)
);

copy into carrier (unique_carrier, fl_num)
from (
    select distinct t.$6, t.$7
    from @BTSDataStage t
)
file_format = (format_name = 'FLIGHT_PERFORMANCE_RAW_FORMAT');

select count(*) from carrier;


select * from carrier limit 10;

drop table carrier;
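
As a further illustration of casting and current-timestamp columns during load, here is a sketch against the same stage. The table flight_delay_typed is hypothetical, invented for this example; the positional references assume the CSV column order matches the staging table above ($1 = YEAR, $3 = MONTH, $12 = DEP_DELAY).

-- hypothetical target table for a typed load
create temporary table flight_delay_typed (
    year int,
    month int,
    dep_delay number(8, 2),
    loaded_at timestamp_ltz
);

-- cast positional columns explicitly and stamp each row with the load time
copy into flight_delay_typed (year, month, dep_delay, loaded_at)
from (
    select t.$1::int, t.$3::int, t.$12::number(8, 2), current_timestamp()
    from @BTSDataStage t
)
file_format = (format_name = 'FLIGHT_PERFORMANCE_RAW_FORMAT');

select * from flight_delay_typed limit 10;

drop table flight_delay_typed;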
