
Snowflake

Stage area: an intermediate space in the cloud that holds files before they are loaded
into the target environment.
So the staging area sits between the file source and the destination; it is a temporary
space used to load and unload files. We cannot load files straight into a table; before
the copy, the files first need to be placed in a stage area.

Types of Stages:

Internal Stage: the staging space is managed by Snowflake

External Stage: the staging space is managed by the cloud provider (for example AWS S3, Azure Blob Storage, or Google Cloud Storage)

Internal stages are further divided into 3 types:

User Stage: whenever a user is added to the SF account, SF automatically
provides some stage space to that user.

It is denoted with @~

In the example below, STUDENT is the table and Mycsv is the CSV file format.

list @~;
put file://D:\Movies\Snowflake\Testfile.csv @~;
copy into STUDENT from @~
files = ('Testfile.csv.gz')
file_format = (format_name = 'Mycsv');

Table Stage: stage space that SF provides for every table; any user with access to the
table can use that table's stage space.
it is denoted with @%table_name
ex: @%student

list @%student;
put file://<file location> @%student;
copy into student from @%student
files = ('file2.csv.gz')
file_format = (format_name = 'My_pipe');

--> RM is the command used to remove files from the staging area

in the user stage:  rm @~;
in the table stage: rm @%<table name>;  -- removes all files
ex: rm @%student;                       -- removes all files

list @%student;
put file://<file location> @%student;
copy into student from @%student
files = ('file2.csv.gz')
file_format = (format_name = 'My_pipe')
purge = true;

purge is used to remove the loaded files from the staging area after they are
successfully loaded into the table

--> Named Stage: with a named stage, one user can create the stage and other users can
use that stage area; anyone with the right privileges can load tables from a named stage.

it is denoted with @<stage name>

create stage local_stage;

put file://<filepath> @local_stage;
copy into student from @local_stage
files = ('file2.csv.gz')
file_format = (format_name = 'Mycsv');

--> Copying multiple files (all similar files, e.g. CSV)

we have files called file1.csv, file2.csv, file3.csv and file4.csv; all of these files
need to be put into the local stage, and then copied from the local stage into the table.

put file://<file1 location> @local_stage;
put file://<file2 location> @local_stage;
put file://<file3 location> @local_stage;
put file://<file4 location> @local_stage;

to see the files in the local stage: list @local_stage;

now copy the files into the table (student) from the local stage:

copy into student from @local_stage
files = ('file1.csv.gz', 'file2.csv.gz', 'file3.csv.gz', 'file4.csv.gz')
file_format = (format_name = 'Mycsv');

But listing every file is not the best practice for loading multiple files, so we can use PATTERN:

copy into student from @local_stage
pattern = '.*[.]csv.gz'
file_format = (format_name = 'Mycsv');
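
The pattern is a regular expression matched against the staged file names, so it can also
target just a subset of the files; a small sketch, assuming the staged files are named
file1.csv.gz through file4.csv.gz as above:

copy into student from @local_stage
pattern = '.*file[1-4][.]csv.gz'
file_format = (format_name = 'Mycsv');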

--> If some staged files have a different format, we get an error while loading. to avoid
this we can use ON_ERROR.

Ex: I have 100 files; 99 of them are CSV and 1 is pipe-delimited. in that scenario, if I
load all 100 files I get an error, so to handle this:

copy into student from @local_stage
pattern = '.*[.]csv.gz'
file_format = (format_name = 'Mycsv')
purge = true
on_error = skip_file;
  (or)
on_error = continue;

so it will load all 99 CSV files and skip the one pipe-delimited file
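
Load errors can also be previewed without loading anything by running the same COPY with
Snowflake's VALIDATION_MODE option; a minimal sketch reusing the stage and file format above:

copy into student from @local_stage
pattern = '.*[.]csv.gz'
file_format = (format_name = 'Mycsv')
validation_mode = RETURN_ERRORS;   -- returns the errors instead of loading the data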

--> To load the same data again and again we can use FORCE. with FORCE, the load does not
check whether the data was loaded previously; it will load the files again regardless.

copy into student from @local_stage
pattern = '.*[.]csv.gz'
file_format = (format_name = 'Mycsv')
force = true;

External Stage: a stage whose storage is managed by the cloud provider

to load the files from an external stage:

list @extstage;
copy into student from @extstage
pattern = '.*file1[.]csv'
file_format = (format_name = 'Mycsv')
on_error = skip_file;

to check the load errors for the table:

select * from table(validate(student, job_id => '<query id of the copy job>'));
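
Instead of copying the query ID by hand, the special value '_last' can be passed to
VALIDATE to check the most recent COPY command run in the session; LAST_QUERY_ID() can
also be run just to look up the ID. A small sketch with the same STUDENT table:

select last_query_id();   -- query id of the previous statement
select * from table(validate(student, job_id => '_last'));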

Snowpipe: it is a serverless service in SF.

it is used to automatically load data into an SF table whenever a new file is uploaded to
the external stage (for example an S3 bucket).
No need to run the COPY INTO command manually.

Creation of a snowpipe and loading the data into SF:

let's assume the pipe name is Testpipe

create pipe Testpipe
auto_ingest = true
as
copy into student from @extstage
file_format = (format_name = 'My_csv');

To check the history of the snowpipe:

how many files were loaded into SF and how many failed to load can be seen with the
command below:

select * from table(information_schema.copy_history(
    table_name => 'STUDENT',
    start_time => dateadd(hours, -5, current_timestamp())));

to see the pipes:

show pipes;
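
The current state of a pipe (whether it is running and how many files are pending) can be
checked with the SYSTEM$PIPE_STATUS function; a quick sketch using the pipe name from above:

select system$pipe_status('Testpipe');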

Micro-partition: to store table data, SF itself divides the data into small parts called
micro-partitions (MPs).
the data is stored in columnar format.
every micro-partition holds roughly 50 to 500 MB of uncompressed data.
it stores metadata (data about data), such as the range of values of each column.
Clustering:

After applying clustering, the data in the micro-partitions is stored in order,
so it will improve query performance.

How to apply clustering:

let's assume Test is a table consisting of columns C1 and C2

create table Test (C1 int, C2 int)
cluster by (C1, C2);

For an already existing table:
consider Test2 is the already existing table

alter table Test2 cluster by (C1, C2);
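
To check how well a table is clustered on its clustering keys, the
SYSTEM$CLUSTERING_INFORMATION function can be used; a quick sketch against the Test table above:

select system$clustering_information('Test', '(C1, C2)');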

Reclustering:
Reclustering reorganizes a table that is already clustered.
it is used when new values are inserted or updated in a clustered table; the table can
then be reclustered.

alter table Test recluster;

To suspend automatic reclustering:

alter table Test suspend recluster;

To resume automatic reclustering:

alter table Test resume recluster;

Cloning:

In general, cloning means creating replicas of an existing table.

The main purpose of cloning is that many users can process the data on a single table at
the same time.

Without cloning, users impact each other: if user A is updating a table called Test while
user B is deleting some rows from the same table Test, each user disturbs the other's work.

to avoid this we can use cloning.

From the Test table I will create 2 replicas:

Test_A for user A
Test_B for user B

the main disadvantage of full physical copies is storage:
if the Test table has a size of 1 GB, the replicas Test_A and Test_B also have the same
size of 1 GB each, so the copies alone would take an extra 2 GB. to avoid this situation,
Snowflake uses zero-copy cloning.

Zero-copy cloning: a zero-copy clone copies only the structure (metadata) of the master table.
it does not copy the data present in the master table;
it creates pointers to the micro-partitions (MPs) of the master table.
If any DML operations (Insert, Update, Delete) are done on the clone, new MPs are created
for the cloned table (the changes appear in the cloned table only, not in the master table).
Any old, unchanged data is still read from the master table's MPs.

How to clone:

Test is the master table, Test_A is the clone table

create table Test_A clone Test;

for verification:
select * from Test_A;
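
To see that the clone is independent of the master, a quick check can be run (hypothetical
values, assuming Test has columns C1 and C2 as in the clustering example):

insert into Test_A (C1, C2) values (100, 200);  -- new MP created for the clone only
select count(*) from Test;                      -- master row count is unchanged
select count(*) from Test_A;                    -- clone now has one extra row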

Cloning a schema:

if we clone a schema, cloned objects are created for all the tables under that schema

ex: Department is the schema, Department_C is the cloned schema

create schema Department_C clone Department;

Cloning a database:

create database DB_C clone DB;

Time Travel:

Immutable: micro-partitions are never modified in place. when records are changed
(Insert, Update, Delete), new MPs are written and the old MPs are retained for a certain time.

This certain time is called "Time Travel".

if by mistake any data is lost or corrupted, we can get the data back by using time travel.

How to get the data:

we can get the data in 3 ways:

1) Timestamp
2) Offset
3) Query ID

AGGfunctions is the table

select * from AGGfunctions at(timestamp => '2023-08-29 04:48:05'::timestamp);

select * from AGGfunctions at(offset => -60*5);

--> the minus sign means back in time; the offset is given in seconds, so -60*5 is
5 minutes (300 seconds) back

select * from AGGfunctions before(statement => '<query id>');
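
Time travel results can also be saved back into a table, which is the usual way to recover
lost or corrupted data; a minimal sketch (the restored table name is made up here),
restoring the state from 5 minutes back:

create table AGGfunctions_restored clone AGGfunctions at(offset => -60*5);

-- a dropped table can also be brought back within the time travel period:
undrop table AGGfunctions;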

--> The maximum time travel period SF permits is 90 days

--> After the time travel period, the data moves into the 'Fail-safe period'.
we cannot get the data from fail-safe ourselves; to get the data from fail-safe we have to
reach out to SF customer support.
the fail-safe period is 7 days.

--> We can set the time travel period using the "DATA_RETENTION_TIME_IN_DAYS" parameter
--> Using DATA_RETENTION_TIME_IN_DAYS we can get all the historical data up to that period

to see this parameter:

show parameters for table AGGfunctions;

How to set the parameter:

let's say I want to be able to see the data up to 30 days back

alter table AGGfunctions set data_retention_time_in_days = 30;

alter schema Department set data_retention_time_in_days = 30;

So it will maintain the historical data for 30 days (out of the maximum of 90 days).

Types of Tables:

in SF there are 4 types of tables

1) Permanent (physical) table: the table generally used for normal day-to-day activities.
this table has up to 90 days of TT and 7 days of FS.

2) Temporary table: a table created only for a temporary purpose.
This table has at most 1 day of TT and 0 days of FS.
It is session-scoped: once the session is closed, the table is automatically dropped, so
we can retrieve the data only while the session is open.

How to create a temporary table:

Table2 is the table name

create temporary table Table2 (Id int, Name varchar);

Never give a temporary table the same name as an existing physical table, because SF gives
higher priority to the temporary table (it hides the physical table within the session).

3) Transient table: a table similar to the permanent table; the differences are that it
has no fail-safe and at most 1 day of TT.

create transient table table3 (Id int, Name varchar);

4) External table: a table which can access external files as if they were a table. it
doesn't have TT or FS because the data is not loaded into Snowflake; it is just read from
the external files.

create external table table5
with location = @Myexternalstage
file_format = (format_name = 'My_CSV');
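
When an external table is created without an explicit column list, Snowflake exposes each
file row as a single VARIANT column named VALUE; a small sketch of querying it (the field
names c1 and c2 refer to the first and second columns of the CSV files):

select value from table5 limit 10;   -- raw rows as VARIANT
select value:c1::int as id, value:c2::string as name
from table5;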

Views: a view is nothing but a logical representation of a table. it does not store the
data physically.

it is similar to a table; the main difference is that it does not store the data

it has lower performance than a table (disadvantage)

No need to refresh the view each time: if any data is inserted into the main table, it is
automatically reflected in the view

Ex: I have a table called Hospital

create table Hospital (Sno int, PId int, Pname varchar, Treatment varchar,
Diagnosis varchar, Billing_address varchar, Cost number);

insert into Hospital values (........);

assume we are creating 2 views:

1) Doctor_view
2) Accounts_view

1) Doctor_view

create view Doctor_view
as
select Pname, Treatment, Diagnosis from Hospital;

2) Accounts_view

create view Accounts_view
as
select PId, Billing_address, Cost from Hospital;

So we have divided the Hospital data into 2 logical tables, each used by its respective
section (access can then be restricted per section, as shown below).
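
A hedged example of restricting each section to its own view, assuming roles named
doctor_role and accounts_role already exist:

grant select on view Doctor_view to role doctor_role;
grant select on view Accounts_view to role accounts_role;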

Types of Views:

1) Non-materialized view: the example above is a non-materialized view

2) Materialized view: it is similar to a table in that it stores the query results physically

Creation of a materialized view (named Doctor_mview here so it does not clash with the
Doctor_view created above):

create materialized view Doctor_mview
as
select Pname, Treatment, Diagnosis
from Hospital;
