SnowPro Core Study Guide
CERTIFICATION GUIDE
INTRODUCTION
Welcome to this SnowPro Core Certification Guide!
At Intelligen, we're all about helping you extract the maximum amount of
business value from your data.
In this guide, we help you prepare for your SnowPro certification with what
we believe to be the MOST comprehensive FREE guide out there! We have
also included resources from our National Engagement Director, Adam
Morton, data author and official Snowflake Data Superhero!
The bottom line: we are all about providing you with robust, real-world advice
based on our unique experiences in the field.
USEFUL RESOURCES
Snowflake Practice Questions
SnowPro Core Certified
https://www.youtube.com/watch?v=5ugxQi3b16k
SNOWFLAKE OVERVIEW AND ARCHITECTURE
The three key layers which make up
Snowflake's architecture are:
Database Storage
Query Processing (Virtual Warehouses)
Cloud Services (which includes authentication and access control)
Snowflake runs completely on cloud infrastructure. All components of Snowflake’s service (other than
optional command line clients, drivers, and connectors), run in public cloud infrastructures.
Snowflake can run on Amazon Web Services, Microsoft Azure, and Google Cloud Platform. It cannot,
however, run on private cloud infrastructures, whether hosted or on-premises.
Snowflake is committed to providing a seamless, always up-to-date experience for its users
while also delivering ever-increasing value through rapid development and continual
innovation.
Accounts are moved to the release using a three-stage approach over two (or more) days.
This staged approach enables Snowflake to monitor activity as accounts are moved and
respond to any issues that may occur.
Warehouses can be started and stopped at any time. They can also be resized at any time, even while running, to
accommodate the need for more or less compute resources, based on the type of operations being performed by the
warehouse.
Warehouse Size    Credits per Hour    Credits per Second
Medium            4                   0.0011
Large             8                   0.0022
2X-Large          32                  0.0089
3X-Large          64                  0.0178
You can amend the number of clusters in a multi-cluster warehouse at any time.
Only new queries will take advantage of additional resources added to the cluster. Queries which are
executing at the time will continue to execute until they complete with the same amount of resources
they started with.
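As a hedged sketch (the warehouse name my_wh is hypothetical), both resizing a warehouse and changing its
cluster range are done with ALTER WAREHOUSE:

    ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'LARGE';                     -- resize, even while running
    ALTER WAREHOUSE my_wh SET MIN_CLUSTER_COUNT = 1 MAX_CLUSTER_COUNT = 4;  -- multi-cluster range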
All queries and DML statements against your data require the use of a Virtual Warehouse. Most DDL
statements are metadata-only operations handled by the cloud services layer, although DDL that writes
data (such as CREATE TABLE ... AS SELECT) does need a warehouse.
All servers must be provisioned for the warehouse before they can be used. This is generally very fast
(1-2 seconds). If cost and access are not an issue, enable auto-resume to ensure that the warehouse
starts whenever needed. Keep in mind that there might be a short delay in the resumption of the
warehouse due to server provisioning.
SNOWFLAKE VIRTUAL WAREHOUSES
If you wish to tightly control costs and/or user access, leave auto-
resume disabled and instead manually resume the warehouse only
when needed.
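For illustration only (the warehouse name is hypothetical), both behaviours are properties set on the
warehouse itself:

    ALTER WAREHOUSE my_wh SET AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;  -- suspend after 60 seconds idle, resume on demand
    ALTER WAREHOUSE my_wh SET AUTO_RESUME = FALSE;                   -- manual resume only, for tight cost/access control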
Users can view warehouse usage in the Snowflake web interface under Account > Billing & Usage.
The Economy scaling policy waits longer before adding clusters: conserving credits takes a
higher priority than queries waiting for available resources.
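A minimal sketch of switching the scaling policy (the warehouse name is hypothetical):

    ALTER WAREHOUSE my_wh SET SCALING_POLICY = ECONOMY;  -- the default policy is STANDARD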
https://www.youtube.com/watch?v=M0f81QKlYZQ https://www.youtube.com/watch?v=0uUuiI8yIws
DATA MOVEMENT (UNLOADING & LOADING DATA)
A stage is a location where files are stored before they're loaded into a target table within Snowflake.
There are a number of different kinds of stages which cater for different use cases in Snowflake.
There are 3 Internal Stages (Internal named stages are not cloned):
User Stage - each user is assigned a stage by default for storing files. This works well if only that
individual user needs to access the files in the staging area.
Table Stage - each table has a stage assigned to it in Snowflake. This is a good option if multiple users
need to access the files in the stage and the data only needs to go to one table.
Named Stage - this stage is a database object which exists within a schema. This provides greater
flexibility, as it allows access for more than one user and the data can be loaded into one or
more tables. Because it is a database object, the user has greater control over security and
access privileges (see the example after this list).
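As a hedged example (stage and table names are hypothetical), a named stage and the shorthand references
for the other stage types look like this:

    CREATE STAGE my_named_stage FILE_FORMAT = (TYPE = 'CSV');
    COPY INTO my_table FROM @my_named_stage;   -- load from the named stage
    -- a user stage is referenced as @~ and a table stage as @%my_table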
The benefit of this setup is that the files in the data lake can be
used not only by Snowflake but a wider range of applications.
This can be a useful option to have if you don't have control over your
staging area as it guarantees no duplication will occur during those 64 days.
Run the COPY INTO command using the VALIDATION_MODE option to validate
files before loading them into Snowflake.
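For example (table and stage names are hypothetical), returning errors without loading any data:

    COPY INTO my_table FROM @my_named_stage VALIDATION_MODE = RETURN_ERRORS;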
To view the credits billed for Snowpipe data loading for your account, query the PIPE_USAGE_HISTORY
view in ACCOUNT_USAGE (or the PIPE_USAGE_HISTORY table function in the Information Schema).
If you want to understand which files have been loaded into Snowflake using
the COPY INTO command, you can use SQL to query the LOAD_HISTORY
view in ACCOUNT_USAGE.
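A minimal sketch of such a query (the table name is hypothetical; note this view covers COPY INTO loads,
not Snowpipe loads):

    SELECT file_name, last_load_time, row_count, status
    FROM snowflake.account_usage.load_history
    WHERE table_name = 'MY_TABLE'
    ORDER BY last_load_time DESC;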
When data is staged to a Snowflake internal staging area using the PUT
command, the data is encrypted on the client’s machine.
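For illustration (the local file path and stage name are hypothetical), a PUT command issued from SnowSQL;
the file is compressed and encrypted client-side before upload:

    PUT file:///tmp/orders.csv @my_named_stage AUTO_COMPRESS = TRUE;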
ACCOUNTS & SECURITY
https://www.youtube.com/watch?v=SmXlY5n7N58 https://www.youtube.com/watch?v=QTpYidjft5c&t=96s
SNOWFLAKE ACCOUNTS & SECURITY
Snowflake encrypts all customer data by default, using the latest security standards, at no
additional cost.
Role Based Access Control (RBAC) means that access privileges are assigned to roles,
which are in turn assigned to users.
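A hedged sketch of this chain (role, object, and user names are hypothetical):

    CREATE ROLE analyst;
    GRANT USAGE ON DATABASE sales_db TO ROLE analyst;
    GRANT USAGE ON SCHEMA sales_db.public TO ROLE analyst;
    GRANT SELECT ON TABLE sales_db.public.orders TO ROLE analyst;
    GRANT ROLE analyst TO USER alice;   -- alice activates it with USE ROLE analyst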
The ACCOUNTADMIN role alone is responsible for configuring parameters at the account level. Users with the
ACCOUNTADMIN role can view and operate on all objects in the account, can view and manage
Snowflake billing and credit data, and can stop any running SQL statements. It is therefore
important to limit the number of users granted this role.
Standard views might allow data that is hidden from users of the view to be exposed through
user code (such as user-defined functions) or other programmatic methods, because the query
optimizer applies certain internal optimizations to them. Secure views do not use these
optimizations, ensuring that users have no access to the underlying data.
Secure views should not be used for views that are defined for query convenience, such as views
created for simplifying querying data for which users do not need to understand the underlying
data representation.
This is because, when evaluating secure views, the Snowflake query optimizer bypasses certain
optimizations used for regular views. This might result in some impact on query performance for
secure views.
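A minimal example (view, table, and column names are hypothetical); the only syntactic difference from a
standard view is the SECURE keyword:

    CREATE SECURE VIEW customer_summary AS
        SELECT customer_id, region, total_spend
        FROM sales_db.public.customers;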
https://www.youtube.com/watch?v=zLNKIsDMNOo&t=80s
DATA PLATFORM
There is no hard limit on the number of accounts that can be added
to a share.
You can share data by creating a Reader account and using the
Share functionality; this works even if the consumer is not already
a Snowflake customer.
You can add objects to an existing share at any time using the
GRANT <privilege> … TO SHARE command. Any objects that you
add to a share are instantly available to the consumers' accounts
who have created databases from the share.
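As a hedged sketch (share, database, object, and account names are hypothetical):

    CREATE SHARE sales_share;
    GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
    GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
    GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share;
    ALTER SHARE sales_share ADD ACCOUNTS = xy12345;   -- consumer account locator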
https://www.youtube.com/watch?v=h849iLXJEmk&t=1s
SNOWFLAKE PERFORMANCE & TUNING
https://www.youtube.com/watch?v=990KcEXT7Fw
SNOWFLAKE STORAGE AND PROTECTION
The 3 types of table in Snowflake are listed below (examples of each follow the list):
Permanent
Transient
Temporary
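Hedged examples of creating each table type (table names are hypothetical); transient tables have no
Fail-safe period, and temporary tables exist only for the duration of the session:

    CREATE TABLE customers (id INT, name STRING);        -- permanent (the default)
    CREATE TRANSIENT TABLE staging_customers (id INT);   -- no Fail-safe
    CREATE TEMPORARY TABLE session_scratch (id INT);     -- dropped at session end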
Snowflake stores metadata about the rows in each micro-partition, including the range of values for
each column, the number of distinct values, and additional properties used for both optimization
and efficient query processing.
The most efficient way of creating a copy of your production environment (or specific objects) is
to clone the required objects into the test environment. This process involves zero data
movement and does not require any additional storage at the time of cloning.
You can update records in the cloned object. While this doesn't impact the source table it does
use additional storage at this point to maintain the row version.
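A minimal sketch of zero-copy cloning (object names are hypothetical):

    CREATE DATABASE test_db CLONE prod_db;                            -- clone a whole database
    CREATE TABLE dev_db.public.orders CLONE prod_db.public.orders;    -- clone a single table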
Fail-Safe is for use only by Snowflake to recover data that may have been lost or damaged due
to extreme operational failures.
Fail-safe provides a (non-configurable) 7-day period during which historical data may be
recoverable by Snowflake.
A task name, valid SQL statement, and virtual warehouse are all
required when creating a task.
Child tasks can only be triggered by their parent task; the root (parent) task must
be triggered by a schedule.
All tasks in a simple tree must have the same task owner (i.e. a single
role must have the OWNERSHIP privilege on all of the tasks in the tree)
and be stored in the same database and schema.
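A hedged sketch of a simple task tree (task, warehouse, and procedure names are hypothetical); tasks are
created in a suspended state, and child tasks should be resumed before the root:

    CREATE TASK parent_task
        WAREHOUSE = my_wh
        SCHEDULE = '5 MINUTE'
    AS
        CALL load_raw_data();        -- hypothetical stored procedure

    CREATE TASK child_task
        WAREHOUSE = my_wh
        AFTER parent_task
    AS
        CALL transform_raw_data();   -- hypothetical stored procedure

    ALTER TASK child_task RESUME;
    ALTER TASK parent_task RESUME;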
STREAMS & TASKS
When you run ALTER TABLE … SET CHANGE_TRACKING = TRUE against a table, this adds a pair of hidden
columns to the table and begins storing change tracking metadata. The columns consume a small
amount of storage.
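For example (the table name is hypothetical):

    ALTER TABLE orders SET CHANGE_TRACKING = TRUE;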
You must create a stream against the source table, then create the target table to store the
changes. You must then create a task which checks for data in the stream and merges the changes
into the target table; this can either be a call to a stored procedure which holds the SQL logic, or SQL
within the task itself. Finally, you must resume the task so it runs.
The system function SYSTEM$STREAM_HAS_DATA indicates whether a specified stream
contains change data capture (CDC) records.
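A hedged sketch of the whole pattern (object and column names are hypothetical); the WHEN clause means
the task only consumes a warehouse when the stream actually holds changes:

    CREATE STREAM orders_stream ON TABLE orders;

    CREATE TASK merge_orders
        WAREHOUSE = my_wh
        SCHEDULE = '5 MINUTE'
        WHEN SYSTEM$STREAM_HAS_DATA('orders_stream')
    AS
        MERGE INTO orders_copy t
        USING orders_stream s ON t.order_id = s.order_id
        WHEN MATCHED AND s.METADATA$ACTION = 'DELETE' AND s.METADATA$ISUPDATE = FALSE THEN DELETE
        WHEN MATCHED AND s.METADATA$ACTION = 'INSERT' THEN UPDATE SET t.status = s.status
        WHEN NOT MATCHED AND s.METADATA$ACTION = 'INSERT' THEN INSERT (order_id, status) VALUES (s.order_id, s.status);

    ALTER TASK merge_orders RESUME;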
The offset is a bookmark to the stream position. The offset is advanced when the stream is used in a
DML statement. The position is updated at the end of the transaction to the beginning timestamp of
the transaction.
The METADATA$ISUPDATE column specifies whether the recorded action (INSERT or DELETE) is part of
an UPDATE applied to the rows in the source table.
WORKING WITH SEMI-STRUCTURED DATA
@intelligen.au