
SNOWPRO CORE

CERTIFICATION GUIDE
INTRODUCTION
Welcome to this SnowPro Core Certification Guide!

At Intelligen, we're all about helping you extract the maximum amount of
business value from your data.

In this guide, we help you prepare for your SnowPro certification with what
we believe to be the MOST comprehensive FREE guide out there! We have
also included resources from our National Engagement Director, Adam
Morton, Data Author and official Snowflake Data Superhero!

We know it can be a struggle to find high-value, practical advice on
Snowflake. Our aim is to help take you from wasting time looking for
solutions to becoming a Snowflake expert and avoiding costly pitfalls along
the way!

The bottom line: we are all about providing you with robust, real-world advice
based on our unique experiences in the field.
USEFUL RESOURCES
Snowflake Practice Questions
SnowPro Core Certified

Our YouTube Channel


Like and Subscribe!
SECTIONS
About the Exam
Exam Topics
Snowflake Overview and Architecture
Data Movement
Accounts and Security
Data Sharing
Performance and Tuning
Storage and Protection
Streams & Tasks
Working with Semi-Structured Data
ABOUT THE EXAM
100 questions

Question Types: Multiple Select, Multiple Choice, True/False

Time Limit: 115 minutes

Languages: English & Japanese

Registration Fee: $175 USD

Passing Score: 750+ (scaled scoring from 0 - 1000)


EXAM TOPICS
Domain                                 Estimated Percentage Range
Account and Security                   10 - 15%
Virtual Warehouses                     15 - 20%
Data Movement                          11 - 20%
Performance Management                 5 - 10%
Snowflake Overview and Architecture    25 - 30%
Storage and Protection                 10 - 15%


OVERVIEW AND ARCHITECTURE

https://www.youtube.com/watch?v=5ugxQi3b16k
SNOWFLAKE OVERVIEW AND ARCHITECTURE
The three key layers which make up Snowflake's architecture are:

Database Storage - manages, organises and stores data in an optimised format.

Query Processing - manages all query execution. Query execution is performed in the
processing layer; Snowflake processes queries using virtual warehouses.

Cloud Services - handles a collection of services to support the use of Snowflake such
as user authentication, metadata management and infrastructure management.

[Architecture diagram: the Cloud Services layer (Authentication and Access Control,
Infrastructure Manager, Optimizer, Metadata Manager, Security) sits above the Query
Processing layer (virtual warehouses), which sits above the Database Storage layer.]
SNOWFLAKE OVERVIEW AND ARCHITECTURE
Snowflake offers four different editions: Standard, Enterprise, Business Critical and Virtual Private
Snowflake.

Snowflake runs completely on cloud infrastructure. All components of Snowflake’s service (other than
optional command line clients, drivers, and connectors), run in public cloud infrastructures.

Snowflake can run on Microsoft Azure, Google Cloud Platform and Amazon Web Services cloud
infrastructure. It cannot, however, run on private cloud infrastructures, whether hosted or
on-premises.

There is no hardware (virtual or physical) to select, install, configure, or manage.

There is virtually no software to install, configure, or manage.

Ongoing maintenance, management, upgrades, and tuning are handled by Snowflake.


SNOWFLAKE OVERVIEW AND ARCHITECTURE

Snowflake is committed to providing a seamless, always up-to-date experience for its users
while also delivering ever-increasing value through rapid development and continual
innovation.

To meet this commitment, Snowflake deploys new releases each week.

Accounts are moved to the release using a three-stage approach over two (or more) days.

This staged approach enables Snowflake to monitor activity as accounts are moved and
respond to any issues that may occur.

The deployments happen transparently in the background; users experience no downtime or
disruption of service and are always assured of running on the most recent release with
access to the latest features.
SNOWFLAKE VIRTUAL WAREHOUSES
Warehouses are required for queries, as well as all DML operations, including loading data into tables. A warehouse is
defined by its size, as well as the other properties that can be set to help control and automate warehouse activity.

Warehouses can be started and stopped at any time. They can also be resized at any time, even while running, to
accommodate the need for more or less compute resources, based on the type of operations being performed by the
warehouse.

Warehouse Size   Credits/Hour   Credits/Second   Notes
X-Small          1              0.0003           Default size for warehouses created using CREATE WAREHOUSE.
Small            2              0.0006
Medium           4              0.0011
Large            8              0.0022
X-Large          16             0.0044           Default for warehouses created in the web interface.
2X-Large         32             0.0089
3X-Large         64             0.0178
4X-Large         128            0.0356
5X-Large         256            0.0711           Preview feature; currently only available on Amazon Web Services.
6X-Large         512            0.1422           Preview feature; currently only available on Amazon Web Services.
SNOWFLAKE VIRTUAL WAREHOUSES
In Snowflake a Virtual Warehouse is required to execute queries or DML against the data which
resides within Snowflake.

Any virtual warehouse can read any data.

You can amend the number of clusters in a multi-cluster warehouse at any time.

The maximum number of clusters in a multi-cluster warehouse is 10.

Only new queries will take advantage of additional resources added to the cluster. Queries which are
executing at the time will continue to execute until they complete with the same amount of resources
they started with.

All DML and DDL statements against your data require the use of a Virtual Warehouse.

All servers must be provisioned for the warehouse before they can be used. This is generally very fast
(1-2 seconds). If cost and access are not an issue, enable auto-resume to ensure that the warehouse
starts whenever needed. Keep in mind that there might be a short delay in the resumption of the
warehouse due to server provisioning.
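
As a hedged sketch of creating and resizing a warehouse with these properties (the warehouse
name and parameter values below are illustrative, not from this guide):

    -- Minimal sketch: an X-Small warehouse that suspends after 5 minutes of
    -- inactivity and resumes automatically when a query needs it.
    CREATE WAREHOUSE IF NOT EXISTS etl_wh
      WAREHOUSE_SIZE      = 'XSMALL'
      AUTO_SUSPEND        = 300      -- seconds of inactivity before suspending
      AUTO_RESUME         = TRUE
      INITIALLY_SUSPENDED = TRUE;

    -- Resize at any time, even while the warehouse is running.
    ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'MEDIUM';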
SNOWFLAKE VIRTUAL WAREHOUSES
If you wish to tightly control costs and/or user access, leave auto-resume disabled and
instead manually resume the warehouse only when needed.

The cache may be reset if the virtual warehouse is suspended and restarted.

Snowflake credits are charged based on the number of virtual warehouses you use, how long
they run, and their size.

Warehouses come in 10 sizes (2 in preview mode on AWS). The size specifies the number of
servers per cluster in the warehouse.

Users can use the Snowflake web interface under Account > Billing & Usage to view the
warehouse usage.

Resizing a virtual warehouse to a larger size is called scaling up. Scaling out is when you
add additional clusters to a multi-cluster warehouse to handle more concurrent queries.
SNOWFLAKE VIRTUAL WAREHOUSES

Snowflake utilizes per-second billing, with the exception of a 60-second minimum each time
the warehouse starts.

Standard is the default scaling policy, which looks to minimise query queuing by starting
additional clusters at the expense of credits.

Economy mode will wait longer before more clusters are added. Cost is of higher importance
here than queries waiting for available resources.

Maximized mode is enabled when both the minimum and maximum number of clusters are set to
the same value. This means Snowflake will start all the clusters so the maximum resources
are available.

Snowflake recommends starting in auto-scale mode with a low number of clusters and gradually
increasing it to find the optimum balance between performance and cost.
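
As a hedged sketch of a multi-cluster warehouse in auto-scale mode (the name and limits below
are illustrative):

    -- Auto-scale between 1 and 3 clusters using the Standard scaling policy.
    CREATE WAREHOUSE IF NOT EXISTS reporting_wh
      WAREHOUSE_SIZE    = 'MEDIUM'
      MIN_CLUSTER_COUNT = 1
      MAX_CLUSTER_COUNT = 3
      SCALING_POLICY    = 'STANDARD'   -- or 'ECONOMY'
      AUTO_SUSPEND      = 300
      AUTO_RESUME       = TRUE;

    -- Setting minimum = maximum switches the warehouse into Maximized mode.
    ALTER WAREHOUSE reporting_wh SET MIN_CLUSTER_COUNT = 3 MAX_CLUSTER_COUNT = 3;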
DATA MOVEMENT

https://www.youtube.com/watch?v=M0f81QKlYZQ https://www.youtube.com/watch?v=0uUuiI8yIws
DATA MOVEMENT (UNLOADING & LOADING DATA)
A stage is a location where files are stored before they're loaded into a target table within Snowflake.

There are a number of different kinds of stages which cater for different use cases in Snowflake.

There are 3 Internal Stages (Internal named stages are not cloned):

User Stage - each user is assigned a stage by default for storing files. This works well if
only that individual user needs to access the files in the staging area.

Table Stage - each table has a stage assigned to it in Snowflake. This is a good option if multiple users
need to access the files in the stage and the data only needs to go to one table.

Named Stage - this stage is a database object created within a schema. This provides greater
flexibility as it allows access for more than one user, while the data can be loaded into one
or more tables. As this is a database object, the user has greater control over security and
access privileges.
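
To make the three internal stage types concrete, here is a hedged sketch (file, table and
stage names are illustrative) of how each is referenced when staging files with PUT, which is
run from SnowSQL or another client:

    -- User stage: every user has one, referenced with @~
    PUT file:///tmp/orders.csv @~/staged/;

    -- Table stage: every table has one, referenced with @%<table_name>
    PUT file:///tmp/orders.csv @%orders;

    -- Named stage: an explicit database object, created once and shared
    CREATE STAGE IF NOT EXISTS my_db.my_schema.orders_stage;
    PUT file:///tmp/orders.csv @my_db.my_schema.orders_stage;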
DATA MOVEMENT (UNLOADING & LOADING DATA)

And 1 External Stage:

This is a typical 'data lake' style pattern.

Files can be loaded to Amazon S3, Azure Blob Storage or Google Cloud Storage. Snowflake can
then be given access to this cloud storage location and a 'pointer' to this location can be
configured.

The benefit of this setup is that the files in the data lake can be
used not only by Snowflake but a wider range of applications.

Additionally, Snowflake can query the files in the external stage without having to bring
the data into Snowflake.

The cost of this is performance, as queries will run slower. However, it does allow the user
to selectively decide which data to bring into Snowflake from the data lake based on business
value.
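
As a hedged sketch (the bucket URL and storage integration name are placeholders), an external
stage over cloud storage might be set up and queried in place like this:

    -- External stage pointing at a cloud storage location; a storage integration
    -- (created separately by an administrator) supplies the credentials.
    CREATE STAGE IF NOT EXISTS my_db.my_schema.lake_stage
      URL = 's3://my-data-lake/raw/orders/'
      STORAGE_INTEGRATION = my_s3_integration
      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

    -- Query the staged files directly, without loading them into a table first.
    SELECT $1, $2
    FROM @my_db.my_schema.lake_stage
    LIMIT 10;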
DATA MOVEMENT (UNLOADING & LOADING DATA)
Snowflake maintains load metadata for each table; this load metadata expires after 64 days.
If the LAST_MODIFIED date for a staged data file is 64 days old or less, the COPY command can
determine its load status for a given table and prevent reloading (and data duplication).

This can be a useful option to have if you don't have control over your
staging area as it guarantees no duplication will occur during those 64 days.

You are able to specify an explicit set of fields/columns (separated by
commas) to load from the staged data files. You could also use the load
option ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE, which will ignore any new
columns appended to the source file. It is worth carefully considering the
behaviour you want here.

It is possible to cast columns to a new data type, concatenate columns, and
re-order columns as part of the COPY INTO command in Snowflake.

The optional parameter ON_ERROR = CONTINUE can be used to skip records
which fail to load.

Run the COPY INTO command using the VALIDATION_MODE option to validate
files before loading them into Snowflake.
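
A hedged sketch pulling these COPY options together (table, stage and column names are
illustrative):

    -- Validate the staged files first, without loading anything.
    COPY INTO orders
    FROM @my_db.my_schema.orders_stage
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
    VALIDATION_MODE = RETURN_ERRORS;

    -- Load selected columns, casting and re-ordering on the way in,
    -- and skip any records that fail to load.
    COPY INTO orders (order_id, order_date, amount)
    FROM (
      SELECT $1, TO_DATE($2, 'YYYY-MM-DD'), $3::NUMBER(10,2)
      FROM @my_db.my_schema.orders_stage
    )
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
    ON_ERROR = CONTINUE;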
DATA MOVEMENT (UNLOADING & LOADING DATA)

Account administrators (users with the ACCOUNTADMIN role) or users with a role granted the
MONITOR USAGE global privilege can use the Snowflake web interface or SQL to view the credits
billed to your Snowflake account within a specified date range.

To view the credits billed for Snowpipe data loading for your account:

Click on Account > Billing & Usage. Snowpipe utilization is shown as a special
Snowflake-provided warehouse named SNOWPIPE.

Querying the PIPE_USAGE_HISTORY view (in Account Usage).

If you want to understand what files have been loaded into Snowflake using
the COPY INTO command, you can use SQL to query the LOAD_HISTORY
view in ACCOUNT_USAGE.

When data is staged to a Snowflake internal staging area using the PUT
command, the data is encrypted on the client’s machine.
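
As a hedged sketch of querying the Account Usage views mentioned above (the column list and
date range are illustrative):

    -- Files loaded via COPY INTO over the last 7 days.
    SELECT table_name, file_name, status, row_count, last_load_time
    FROM SNOWFLAKE.ACCOUNT_USAGE.LOAD_HISTORY
    WHERE last_load_time >= DATEADD(day, -7, CURRENT_TIMESTAMP());

    -- Credits billed for Snowpipe data loading, per pipe.
    SELECT pipe_name, SUM(credits_used) AS credits
    FROM SNOWFLAKE.ACCOUNT_USAGE.PIPE_USAGE_HISTORY
    GROUP BY pipe_name;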
ACCOUNTS & SECURITY

https://www.youtube.com/watch?v=SmXlY5n7N58 https://www.youtube.com/watch?v=QTpYidjft5c&t=96s
SNOWFLAKE ACCOUNTS & SECURITY

MFA can be configured for the WebUI, JDBC, and SnowSQL.

Snowflake encrypts all customer data by default, using the latest security standards, at no
additional cost.

Data within Snowflake is always encrypted by default

Enterprise is the lowest edition which provides column-level security.

Role Based Access Control (RBAC) means that access privileges are assigned to roles,
which are in turn assigned to users.

Different worksheets can be assigned to different Roles.
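
As a hedged sketch of the RBAC model described above (role, user and object names are
illustrative):

    -- Privileges are granted to roles, and roles are granted to users.
    CREATE ROLE IF NOT EXISTS analyst;
    GRANT USAGE ON DATABASE sales_db TO ROLE analyst;
    GRANT USAGE ON SCHEMA sales_db.public TO ROLE analyst;
    GRANT SELECT ON ALL TABLES IN SCHEMA sales_db.public TO ROLE analyst;
    GRANT ROLE analyst TO USER jane_doe;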


SNOWFLAKE ACCOUNTS & SECURITY
The account administrator role (i.e. users with the ACCOUNTADMIN system role) is the most
powerful role in the system.

This role alone is responsible for configuring parameters at the account level. Users with the
ACCOUNTADMIN role can view and operate on all objects in the account, can view and manage
Snowflake billing and credit data, and can stop any running SQL statements. Therefore it is
important to limit access to this role.

Standard views might allow data that is hidden from users of the view to be exposed through
user code, such as user-defined functions, or other programmatic methods. Secure views do not
utilize these optimizations, ensuring that users have no access to the underlying data.

Secure views should not be used for views that are defined for query convenience, such as views
created for simplifying querying data for which users do not need to understand the underlying
data representation.

This is because, when evaluating secure views, the Snowflake query optimizer bypasses certain
optimizations used for regular views. This might result in some impact on query performance
for secure views.
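
A hedged sketch of the difference in DDL (view and table names are illustrative):

    -- Standard view: convenient, and eligible for all optimizations.
    CREATE VIEW sales_db.public.v_orders AS
      SELECT order_id, customer_id, amount
      FROM sales_db.public.orders;

    -- Secure view: the definition is hidden from non-owners and the optimizer
    -- skips optimizations that could expose the underlying data.
    CREATE SECURE VIEW sales_db.public.v_orders_secure AS
      SELECT order_id, customer_id, amount
      FROM sales_db.public.orders;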
SNOWFLAKE ACCOUNTS & SECURITY

Snowflake utilizes database replication to allow data providers to securely share data with
data consumers across different regions and cloud platforms. Cross-region data sharing is
supported for Snowflake accounts hosted on AWS, Google Cloud Platform, or Microsoft Azure.

You must explicitly grant access to specific database objects (schemas, tables, and secure
views) to the share before they are made available.

The 5 system-defined roles available in Snowflake are:
ACCOUNTADMIN, SECURITYADMIN, USERADMIN, SYSADMIN and PUBLIC.

The GRANTS_TO_ROLES Account Usage view can be used to query the access control privileges
that have been granted to a role.
DATA SHARING

https://www.youtube.com/watch?v=zLNKIsDMNOo&t=80s
DATA PLATFORM
There is no hard limit on the number of accounts that can be added
to a share

There is no physical data movement when data is shared with other users using Data Sharing.

You can share data by creating a Reader account and using the Share functionality. This works
even if the consumer is not a Snowflake customer.

You can add objects to an existing share at any time using the GRANT <privilege> … TO SHARE
command. Any objects that you add to a share are instantly available to the consumer accounts
that have created databases from the share.

To create a share using SQL:

Use the CREATE SHARE command to create an empty share.

Use the GRANT <privilege> … TO SHARE command to add a database to the share and then
selectively grant access to specific database objects (schemas, tables, and secure views) to
the share.

Use the ALTER SHARE command to give one or more accounts access to the share.
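
A hedged sketch of those three steps (share, database and account identifiers are
placeholders):

    -- 1. Create an empty share.
    CREATE SHARE sales_share;

    -- 2. Grant access to the database and the specific objects to be shared.
    GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
    GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
    GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share;

    -- 3. Add one or more consumer accounts to the share.
    ALTER SHARE sales_share ADD ACCOUNTS = consumer_org.consumer_account;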
PERFORMANCE & TUNING

https://www.youtube.com/watch?v=h849iLXJEmk&t=1s
SNOWFLAKE PERFORMANCE & TUNING

Increasing the virtual warehouse size, also known as 'scaling up', can help if you have a
long-running query. You would have to resubmit the query on the larger warehouse size and
review the results. This will help you assess if the additional CPU and memory have aided
performance.

In order to make use of the query results cache, the query must be exactly the same, executed
within a 24-hour time frame, and the underlying data must not have changed.

Snowflake best practice suggests that you should only consider re-clustering when a table
reaches 1TB or more and queries against the table start to take a long time to execute.

Multi-cluster warehouses are designed specifically for handling queuing and performance
issues related to large numbers of concurrent users and/or queries.
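
As a hedged sketch of the scaling-up test described above (warehouse, table and session
parameter usage are illustrative):

    -- Resize the warehouse, then resubmit the long-running query on it.
    ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'LARGE';

    -- Disable the result cache for the re-test so the query actually re-executes.
    ALTER SESSION SET USE_CACHED_RESULT = FALSE;
    SELECT COUNT(*) FROM sales_db.public.orders;  -- resubmit the slow query and compare timings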
SNOWFLAKE PERFORMANCE & TUNING

Clustering keys can be changed for those tables which are over 1TB in size where query
performance is suffering.

Dedicated virtual warehouses can be used to separate workloads and remove contention for the
same resources. For example, you could have one warehouse for the ETL process, another for
your Data Visualization tool and another for your Finance department.

In the History area of the Snowflake WebUI the query history is available for 14 days.
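
A hedged sketch of defining a clustering key and checking how well it works (table and column
names are illustrative):

    -- Define a clustering key on a large, frequently filtered table.
    ALTER TABLE sales_db.public.orders CLUSTER BY (order_date, region);

    -- Check how well the table is clustered on those columns.
    SELECT SYSTEM$CLUSTERING_INFORMATION('sales_db.public.orders', '(order_date, region)');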
STORAGE & PROTECTION

https://www.youtube.com/watch?v=990KcEXT7Fw
SNOWFLAKE STORAGE AND PROTECTION
The 3 types of table in Snowflake are:
Permanent
Transient
Temporary

Temporary tables only exist within the session in which they were created and persist only
for the remainder of the session. As such, they are not visible to other users or sessions.
Once the session ends, data stored in the table is purged completely from the system and,
therefore, is not recoverable, either by the user who created the table or by Snowflake.

Transient and temporary tables have no Fail-safe period. As a result, no additional data
storage charges are incurred beyond the Time Travel retention period.

By default, temporary and transient tables have a Time Travel retention period of 1 day.
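
A hedged sketch of creating each table type (names and columns are illustrative):

    -- Permanent table: Time Travel plus the 7-day Fail-safe period.
    CREATE TABLE sales_db.public.orders (order_id INT, amount NUMBER(10,2));

    -- Transient table: Time Travel of 0 or 1 day, no Fail-safe.
    CREATE TRANSIENT TABLE sales_db.public.stg_orders (order_id INT, amount NUMBER(10,2))
      DATA_RETENTION_TIME_IN_DAYS = 1;

    -- Temporary table: exists only for the current session.
    CREATE TEMPORARY TABLE tmp_orders (order_id INT, amount NUMBER(10,2));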
SNOWFLAKE STORAGE AND PROTECTION

Snowflake supports two types of views.

Standard Views (or views): A named definition of a SQL query. The SQL query executes against
the underlying tables every time the view is executed.

Materialized Views: Although defined similarly to a view, it behaves more like a table. The
results are physically stored, which allows for faster access. The trade-off here is that
there is a cost associated with storage.

Snowflake removes a lot of the administrative overhead associated with looking after a
database. This includes taking backups. Snowflake handles all of this in the background as
part of its continuous data protection features.

Storage costs are calculated based on the daily average amount of data stored for that
billing month.
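
A hedged sketch of both view types (names are illustrative; materialized views require
Enterprise Edition or higher):

    -- Standard view: re-executes its query every time it is referenced.
    CREATE VIEW sales_db.public.v_daily_sales AS
      SELECT order_date, SUM(amount) AS total_amount
      FROM sales_db.public.orders
      GROUP BY order_date;

    -- Materialized view: results are physically stored and kept up to date by
    -- Snowflake, trading extra storage (and maintenance credits) for faster access.
    CREATE MATERIALIZED VIEW sales_db.public.mv_daily_sales AS
      SELECT order_date, SUM(amount) AS total_amount
      FROM sales_db.public.orders
      GROUP BY order_date;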
SNOWFLAKE STORAGE AND PROTECTION

All data in Snowflake tables is automatically divided into micro-partitions, which are
contiguous units of storage.

Each micro-partition contains between 50 MB and 500 MB of uncompressed data (note that the
actual size in Snowflake is smaller because data is always stored compressed).

Snowflake can automatically manage the micro-partitions. Doing this removes the need for a
DBA to look after this.

Snowflake stores metadata about all rows stored in a micro-partition, including:

The range of values for each of the columns in the micro-partition.

The number of distinct values.

Additional properties used for both optimization and efficient query processing
SNOWFLAKE STORAGE AND PROTECTION

The most efficient way of creating a copy of your production environment (or specific objects)
is to create a clone of the required objects in the test environment. This process involves
zero data movement and does not require any additional storage.

You can update records in the cloned object. While this doesn't impact the source table, it
does use additional storage at this point to maintain the new row versions.

Fail-Safe is for use only by Snowflake to recover data that may have been lost or damaged due
to extreme operational failures.

Fail-safe provides a (non-configurable) 7-day period during which historical data may be
recoverable by Snowflake.

The micro-partition metadata maintained by Snowflake enables precise pruning of columns in
micro-partitions at query run-time, including columns containing semi-structured data. In
other words, a query that specifies a filter predicate on a range of values that accesses 10%
of the values in the range should ideally only scan 10% of the micro-partitions.
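
A hedged sketch of zero-copy cloning (database, table names and the Time Travel offset are
illustrative):

    -- Clone an entire database into a test environment: only metadata is copied,
    -- so the clone is instant and initially consumes no extra storage.
    CREATE DATABASE test_db CLONE prod_db;

    -- Clone a single table as it existed at a point in the past (Time Travel).
    CREATE TABLE sales_db.public.orders_backup
      CLONE sales_db.public.orders AT (OFFSET => -3600);  -- one hour ago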
STREAMS & TASKS

There are no additional licensing costs for the Streams and Tasks features themselves
(although tasks consume warehouse credits when they run).

A task can execute a single SQL statement, including a call to a stored procedure.

Tasks are a way of linking SQL statements together to run in a pre-defined order.

Tasks must be triggered off a schedule initially. Following the triggering of the parent
task, child tasks can be subsequently called.

A task name, valid SQL statement, and virtual warehouse are all
required when creating a task.

Child tasks can only be called from a Parent task. Parent tasks must
be triggered from a schedule.

All tasks in a simple tree must have the same task owner (i.e. a single
role must have the OWNERSHIP privilege on all of the tasks in the tree)
and be stored in the same database and schema.
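
A hedged sketch of a small task tree (task names, the schedule and the SQL bodies are
illustrative; the stored procedure is assumed to exist):

    -- Parent task: runs on a schedule.
    CREATE TASK load_orders_task
      WAREHOUSE = etl_wh
      SCHEDULE  = '60 MINUTE'
    AS
      COPY INTO orders FROM @orders_stage FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

    -- Child task: runs only after the parent task completes.
    CREATE TASK transform_orders_task
      WAREHOUSE = etl_wh
      AFTER load_orders_task
    AS
      CALL transform_orders_sp();

    -- Tasks are created suspended; resume the child first, then the parent.
    ALTER TASK transform_orders_task RESUME;
    ALTER TASK load_orders_task RESUME;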
STREAMS & TASKS
When you run ALTER TABLE … SET CHANGE_TRACKING = TRUE against a table, this adds a pair of
hidden columns to the table and begins storing change tracking metadata. The columns consume
a small amount of storage.

You must create a stream against the source table, then create the target table to store the
changes. You must create a task which checks for data in the stream and then merges the
changes into the target table; this can either be a call to a stored procedure which holds
the SQL logic, or SQL within the task itself. Finally, you must set the task to run.

The system function SYSTEM$STREAM_HAS_DATA indicates whether a specified stream contains
change data capture (CDC) records.

The offset is a bookmark to the stream position. The offset is advanced when the stream is used in a
DML statement. The position is updated at the end of the transaction to the beginning timestamp of
the transaction.

The METADATA$ISUPDATE column specifies whether the action recorded (INSERT or DELETE) is part
of an UPDATE applied to the rows in the source table.
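
A hedged sketch of that stream-and-task pattern (object names and the merge logic are
illustrative; for simplicity this version handles inserts and updates only):

    -- 1. Stream over the source table to capture changes.
    CREATE STREAM orders_stream ON TABLE sales_db.public.orders;

    -- 2. Task that runs only when the stream has data, merging changes into the target.
    CREATE TASK merge_orders_task
      WAREHOUSE = etl_wh
      SCHEDULE  = '5 MINUTE'
      WHEN SYSTEM$STREAM_HAS_DATA('orders_stream')
    AS
      MERGE INTO sales_db.public.orders_target t
      USING orders_stream s
        ON t.order_id = s.order_id
      WHEN MATCHED AND s.METADATA$ACTION = 'INSERT' AND s.METADATA$ISUPDATE
        THEN UPDATE SET t.amount = s.amount
      WHEN NOT MATCHED AND s.METADATA$ACTION = 'INSERT'
        THEN INSERT (order_id, amount) VALUES (s.order_id, s.amount);

    -- 3. Set the task to run.
    ALTER TASK merge_orders_task RESUME;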
WORKING WITH SEMI-STRUCTURED DATA

Semi-structured data should be loaded into a column with the VARIANT data type.

The VARIANT data type is a tagged universal type, which can store values of any other type,
including OBJECT and ARRAY, up to a maximum size of 16 MB compressed.

The recommended approach for making a VARIANT column accessible in a BI tool is to use a view
to flatten the data before providing it to the BI tool.

The OBJECT_CONSTRUCT function allows you to create JSON output from data within a relational
table in Snowflake.
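
A hedged sketch tying these together (table, view and JSON structure are illustrative):

    -- Load semi-structured data into a VARIANT column.
    CREATE TABLE raw_events (payload VARIANT);
    INSERT INTO raw_events
      SELECT PARSE_JSON('{"id": 1, "customer": {"name": "Ada"}, "tags": ["a", "b"]}');

    -- Flatten the VARIANT column in a view so a BI tool sees plain columns.
    CREATE VIEW v_events AS
      SELECT payload:id::NUMBER            AS event_id,
             payload:customer.name::STRING AS customer_name,
             t.value::STRING               AS tag
      FROM raw_events,
           LATERAL FLATTEN(INPUT => payload:tags) t;

    -- Go the other way: build JSON from relational columns with OBJECT_CONSTRUCT.
    SELECT OBJECT_CONSTRUCT('event_id', event_id, 'customer_name', customer_name) AS event_json
    FROM v_events;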
THANK YOU
intelligengroup.com

[email protected]

@intelligen.au
