
Optimizing Snowflake
A real-world guide

Contents

Introduction
About this book
Snowflake Architecture
Snowflake Virtual Warehousing
Accessing Snowflake
Loading Data
Exporting Data
Storage and Compute Costs
Best Practices
Conclusion

Introduction

If you are reading this eBook, you are probably considering or have already selected Snowflake—the data warehouse built for the cloud—as your modern cloud data warehouse. That is an excellent choice. You are ahead of the curve (and your competitors, with their outdated on-premise databases).

Now what?

Whether you’re a data warehouse developer, a data architect or manager, a business intelligence specialist, an analytics professional, or a tech-savvy marketer, you now need to make the most of the Snowflake platform to get the most out of your data. That’s where this eBook comes in handy. It introduces the Snowflake platform and helps you understand the various options available in the platform to ingest, manipulate, and export data in a way that is performant and cost-efficient.

This eBook is brought to you by Matillion. Matillion is an industry-leading data transformation solution for cloud data warehouses. Delivering a true end-to-end data transformation solution (not just data prep or movement from one location to another), Matillion provides an instant-on experience to get you up and running in just a few clicks, a pay-as-you-go billing model to cut out lengthy procurement processes, and an intuitive user interface to minimize technical pain and speed up time to results. Matillion is available globally for Snowflake on AWS Marketplace, Microsoft Azure Marketplace and Google Cloud Platform.

More information is available at www.matillion.com

About this book

Snowflake Architecture
In this chapter, you’ll “look under the hood” to understand Snowflake’s unique underlying architecture. The primary difference between Snowflake and other cloud offerings is Snowflake’s “multi-cluster shared data” approach, which separates your compute and storage capabilities to improve concurrency. Data is accessible as your users need it, speeding up data processing and analysis.

Snowflake Virtual Warehousing
Snowflake is essentially a cluster of compute resources, which makes Snowflake a scalable and powerful solution. Knowing how to size your resources can help you take advantage of Snowflake’s highly scalable, low-cost, and automatically managed enterprise data warehouse. Ultimately, this makes your business more performant and cost-efficient.

Accessing Snowflake
This chapter looks at Snowflake access options. This may be different from what you are used to, since you will have more options. Once you understand the inner workings of Snowflake and can access it, you can start making the most of it. Query your data using SnowSQL, which is specifically designed for Snowflake, or use a number of other methods, giving you the flexibility you need to get the data you need.

Loading Data
This chapter walks you through ways to bring data into Snowflake. This information is useful if your business is just starting out on Snowflake and you need to populate the data warehouse with historical data or external data from many different sources. It’s also useful if you want to improve your data loading processes.

Exporting Data
This chapter describes differences between exporting data to an internal or external stage or to a local file system or other target. What you want to do with the data may affect which option is best for your use case.

Storage and Compute Costs
Costs for storage and compute resources couldn’t be simpler due to Snowflake’s pay-as-you-go pricing model. Although Snowflake is a cost-efficient solution, this chapter provides some tips that can help you keep costs down and under control.

Best Practices
The final chapter ties together all the information in this eBook into best practices that reduce costs, increase performance, and help you make the most out of your data.

Snowflake Architecture

Snowflake is a fully relational, analytical, SQL-based, virtual data warehouse built from the ground up to take advantage of the full elasticity and scalability of the cloud. Snowflake delivers a database-as-a-service (DBaaS) platform to relieve users of the complexity and administrative burdens that plague traditional architectures. Furthermore, Snowflake was built with a new, unique architecture called “multi-cluster shared data.”

Snowflake can give your company the flexibility and agility to meet changing data needs. Flexible cloud storage allows you to store nearly unlimited amounts of structured and semi-structured data in a single location, consolidating your disparate data sources. Compute nodes, called “virtual warehouses,” execute queries and perform transformations on your data. These compute nodes can be created at any time using either SQL commands or the Snowflake UI. This means your virtual warehouse is scalable, allowing your business to respond to growing data needs, without additional procurement or managerial overhead.

If you have different workloads with different needs, you can size your warehouses to fit your current need. You can even create “multi-cluster warehouses” that scale automatically to accommodate the number of concurrent queries. Best of all, you can turn off any of the virtual warehouses at any time, so you pay only for what you use.

On top of its scalable storage and compute platform, Snowflake handles optimization, security, and availability for your business and provides many other overhead-reducing benefits. This makes it a fully managed solution. Lastly, you can move real-time data, at nearly unlimited scales. With Snowflake Data Sharing, data can be simply shared with another Snowflake account.
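For instance, a new compute node can be created with a single SQL statement. A minimal sketch (the warehouse name and size are illustrative, not from this eBook):

create warehouse reporting_wh warehouse_size = 'MEDIUM' initially_suspended = true;

The warehouse is created in a suspended state, so it consumes no credits until it is first used.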

Snowflake Virtual Warehousing

In Snowflake, a virtual warehouse, often referred to simply as a “warehouse,” is a cluster of compute resources. A warehouse provides the necessary resources—such as CPU, memory, and temporary storage—to perform the following operations in a Snowflake session:

• Executing SQL SELECT statements that require compute resources (for example, retrieving rows from tables and views)
• Loading data into tables (COPY INTO table)
• Performing DML operations (DELETE/INSERT/UPDATE)
• Unloading data from tables (COPY INTO location)

Virtual Warehouse Sizing

The size of a warehouse can impact the amount of time required to execute queries submitted to the warehouse, particularly for larger, more complex queries. In general, query performance scales linearly with warehouse size, because additional compute resources are provisioned with each size increase.

Virtual warehouses come in various sizes: X-Small, Small, Medium, Large, X-Large, 2X-Large, and so on. The size specifies the number of servers that comprise each cluster in a warehouse.

Unlike traditional systems, you are not locked into a specific size. You have the flexibility to change the size of your virtual warehouse at any time: manually via the Snowflake web interface or by using the ALTER WAREHOUSE command.

Also, bigger is not always better. While a larger warehouse places a lot of compute resources at your disposal, depending on the size and distribution of your data, a medium-sized warehouse may suffice. Try different sizing options to see what best suits your requirements.

See Warehouse Considerations in the Snowflake documentation for more advice about sizing.
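For example, a warehouse can be resized at any time with an ALTER WAREHOUSE statement along these lines (the warehouse name is illustrative):

alter warehouse reporting_wh set warehouse_size = 'LARGE';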

Single vs. Multiple Warehouses

Besides scaling a warehouse (up/down), you may also consider creating virtual warehouses of different sizes and then using whichever one is more appropriate for the task at hand.

For example, you may be loading data from two sets of files: one that is just a few thousand rows and another with millions of rows. Instead of resizing a single, small warehouse to handle the larger files, you may create a separate large warehouse and use that. This would free up your small warehouse to do other relevant tasks, and it would also provide the benefit of having rightly sized warehouses handling your loads in parallel. This can be especially useful when you load data, and it enables faster parallel loads. A sketch of this setup follows the reading list below.

Also, you can use the Auto-Suspend and Auto-Resume features to save costs.

Further Reading:
• Overview of Virtual Warehouses
• Warehouse Considerations
• Understanding Snowflake Credit and Storage Usage
• Multi-cluster Warehouses
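As a sketch of the pattern described above (the warehouse names and thresholds are illustrative, not from this eBook), two differently sized warehouses can be created, each suspending automatically when idle and resuming on demand:

create warehouse load_small_wh warehouse_size = 'SMALL' auto_suspend = 120 auto_resume = true;
create warehouse load_large_wh warehouse_size = 'LARGE' auto_suspend = 120 auto_resume = true;

Routine loads can then target load_small_wh while the larger files go to load_large_wh, and neither warehouse accrues credits while idle.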

Accessing Snowflake

Now that you’ve had a glimpse under the hood of Snowflake, let’s look at the different ways you can access and start using Snowflake. Snowflake allows users to interact with it in a number of ways. This chapter explores some of the options available for connecting to your Snowflake account.

How to access Snowflake:

See the Snowflake documentation for a list of all available options and further detail on each.

Snowflake Web Interface

Snowflake’s web interface is a powerful and easy-to-use platform for accessing your Snowflake account and the data you’ve loaded there. This is the default method for accessing Snowflake. Use your account URL in any supported browser to access the web interface.

Once you log in, you will be able to manage your databases and warehouses, execute DDL/DML statements using worksheets, review query history, and much more.

Check out Snowflake’s web interface tour for more details on working with Snowflake. For more information on logging in, see the Snowflake documentation.

SnowSQL: Snowflake’s Command-Line Interface (CLI)

If you are a script junkie, you’ll love SnowSQL. SnowSQL is a modern CLI that allows users to execute SQL queries, perform all DDL and DML operations—including loading and unloading data into and out of Snowflake—and perform many other tasks. SnowSQL may be used to access your Snowflake database from the command line quickly and, if required, automate operations by running scripts.

See Installing SnowSQL for instructions on downloading and installing the SnowSQL client.

SnowSQL (that is, the snowsql executable) can be run as an interactive shell or in batch mode. Here’s an example of running a simple query where results are printed to stdout (the console); a sketch of such a command appears after the driver list at the end of this section. Note that the login credentials, represented in the example command by myaccount, are stored in the [connections] section of a configuration file.

Here’s another example that executes a script stored in a local file (input_script.sql) and stores the results in a local file. Here, a username and password are used instead of stored credentials.

snowsql -a abc123 -u jsmith -f //tmp/input_script.sql -o output_file=//tmp/output.csv -o quiet=true -o friendly=false -o header=false -o output_format=csv

You may use the PUT command to upload local files into an internal stage and the GET command to download files from an internal stage to a local disk or file system. Read more about internal and external stages in the Loading Data chapter.

HELPFUL HINT: Use SnowSQL for writing and running scripts in Snowflake or for automating your data load and other processes. For more details, see the SnowSQL user guide.

OTHER DATA QUERYING AND MANAGEMENT OPTIONS

You may want to query Snowflake from your favorite SQL client or from a reporting or dashboarding tool. If you are a software developer, you may want to access or manage your data in Snowflake using your favorite programming language.

Snowflake provides various drivers and programmatic interfaces, some of which are listed below:

• Snowflake connector for Python
• Snowflake connector for Apache Spark
• JDBC driver
• Node.js driver
• Go Snowflake driver (which provides an interface for developing applications using the Go programming language)
• .NET driver
• ODBC driver
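As a minimal sketch of that first example (assuming a connection named myaccount is defined in the [connections] section of the SnowSQL configuration file; the query itself is illustrative):

snowsql -c myaccount -q 'select current_date;'

The -c option selects the stored connection and -q runs a single query, printing the results to stdout.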

Loading Data

To start fully benefiting from the power of Snowflake, you need to load your data into a database. There are several options for loading data into Snowflake tables:

• Bulk load data
• Use Snowpipe to continuously load data
• Load data using the web interface
• Use custom/vendor applications

These options are reviewed in detail below. Choose a method that suits your use case.

Bulk Loading Data

All data platforms provide some means by which to load data in bulk, and files have typically been the medium of choice. Bulk loading data is a fast and cost-effective method for populating your Snowflake warehouse.

You can bulk load data from files into tables in Snowflake using the COPY INTO table command. You may execute a COPY command from the Snowflake web interface, from a SnowSQL prompt, or from your favorite programming language by using the appropriate driver, as discussed earlier in the Other Data Querying and Management Options section.

1. Check that the files you intend to load are of a supported format (see below).
2. Compress files for faster loading.
3. Check that your target table already exists (use CREATE TABLE to create a table).
4. Ensure that the files are already staged in an internal or external stage.
5. Review and (where possible) adhere to best practices.

File types supported by the bulk loader:

• Any flat, delimited plain-text format (comma-separated values, tab-separated values, etc.)
• Semi-structured data in JSON, Avro, ORC, Parquet, or XML format (XML is currently supported as a preview feature)

With Matillion ETL for Snowflake you can also take advantage of components developed to streamline efforts, such as the built-in scheduler to ensure your Snowflake tables are all updated consistently at a convenient time interval.

A stage in Snowflake is a (named) location where you store data files before loading them. Use the CREATE STAGE command to create one. Snowflake supports internal and external stages, which, in turn, can be permanent or temporary. A temporary stage gets dropped automatically at the end of a session.

Internal Stage
An internal stage stores data within your Snowflake environment. Files can be uploaded into an
internal stage using SnowSQL and the PUT command, or they can be loaded programmatically
using one of the drivers. Loading data via an internal stage:

Step 1: Upload one or more data files to a Snowflake stage (a named internal stage, a
stage for a specified table, or a stage for the current user) using the PUT command.
Step 2: Use the COPY INTO table command to load the contents of the staged file(s) into
a Snowflake database table.
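A minimal sketch of these two steps from a SnowSQL session (the stage name, local file path, and table name are illustrative, not from this eBook):

create stage my_int_stage;
put file:///tmp/data.csv @my_int_stage;
copy into mytable from @my_int_stage/data.csv.gz file_format = (type = 'CSV' skip_header = 1);

PUT compresses the file with gzip by default, which is why the staged file is referenced as data.csv.gz.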

When should you use an internal stage?


An internal stage is intended for temporarily holding data files before loading them into tables
or downloading to a local system. You will incur standard data storage costs. You should monitor the data files and remove them from the stages once you have loaded the data and you no longer need the files.

External Stage

An external stage stages the data in a location outside of Snowflake. This is usually within Amazon Simple Storage Service (S3), Microsoft Azure Blob Storage, or Google Cloud Storage. External stages can be named entities in Snowflake or references to a location or bucket in the relevant service (for example, s3://bucketname/path-to-file).

Accessing an external stage requires credentials. These can be passed via the CREDENTIALS parameter of the COPY INTO table command. Alternatively, the CREATE STAGE command allows you to specify the credentials required to access an external stage. Snowflake automatically uses the stored credentials if they are not passed into the COPY INTO command when loading from this stage. However, many organizations frequently rotate keys for added security, which may invalidate the stored key and may introduce additional overhead when storing keys within Snowflake.
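For example, an external stage with stored credentials might be created once and then referenced by name (the stage and bucket names are illustrative; the credential values are placeholders):

create stage my_ext_stage
  url = 's3://mybucket/data/'
  credentials = (aws_key_id = 'xxxx' aws_secret_key = 'xxxxx');

copy into mytable from @my_ext_stage/file1.csv;

Because the credentials are stored with the stage, the COPY command does not need to repeat them.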

Loading data via an external stage:


Step 1: Use the upload interfaces or utilities provided by Amazon or Microsoft to
stage your files into the external stage.

Step 2: Use the COPY INTO table command to load the contents of the staged file(s)
into a Snowflake database table.

When should you use an external stage?


External stages may also double as an additional backup for your data or as a location from which other processes or consumers can read the data files.

You may already have existing or established processes for moving data files into an S3 bucket
or a Microsoft Azure Container.
Costs related to using an external stage can be found on the bill from the respective vendor. You
may use any tools at your disposal to upload files to or download files from an external stage.

Data Staging Examples


Let’s look at some examples of using the COPY INTO command to load data from files to tables
in Snowflake.

Loading from an internal stage

Assume that you uploaded some files to an internal stage using SnowSQL and the PUT command. Here are some examples of loading a file called 1.csv from various internal stages—a named stage (@mystage), a table stage (%mytable), and a user stage (~):

copy into mytable from '@mystage/path 1/file 1.csv';
copy into mytable from '@%mytable/path 1/file 1.csv';
copy into mytable from '@~/path 1/file 1.csv';

Loading from an external stage (Amazon S3)

The following example loads data from a file in Amazon S3. Note that credentials are passed as part of the command.

copy into mytable
from 's3://mybucket 1/prefix 1/file 1.csv'
credentials = (aws_key_id='xxxx' aws_secret_key='xxxxx' aws_token='xxxxxx');

Loading data from a file in a named external stage

Here’s an example of loading data from a file in a named external stage:

copy into mytable
from '@myextstage/some folder/file 1.csv';

Loading from an external stage (Microsoft Azure)

The following example loads data from a file in Microsoft Azure.

copy into mytable
from 'azure://myaccount.blob.core.windows.net/myload/encrypted_files/file 1.csv';

Further Reading:
• Bulk Loading from a Local File System Using COPY
• Bulk Loading from Amazon S3 Using COPY
• Bulk Loading from Microsoft Azure Using COPY

Using Snowpipe to Load Data

Snowpipe is a service from Snowflake that can be used to load data from files as soon as they are available in a stage (internal or named external). The service provides Snowflake-managed compute resources and exposes a set of REST endpoints that can be invoked to initiate loads (COPY).

Snowflake also provides Java and Python APIs that simplify working with the Snowpipe REST API. This is a great option if you are a programmer building tools or implementing workflows for ingesting data files into Snowflake.

Users can build tools that initiate loads by invoking a REST endpoint, without managing a virtual warehouse or manually running a COPY command every time a file needs to be loaded. The service is managed by Snowflake, and it automatically scales up or down based on the load on the Snowpipe service.

Snowpipe Billing

Snowpipe is billed based on compute credits used per second. Snowflake tracks the resource consumption of loads for all pipes in an account, with per-second/per-core granularity, as Snowpipe actively queues and processes data files. “Per-core” refers to the physical CPU cores in a compute server. You will see a new line item (SNOWPIPE) on your Snowflake bill. Go to the “Billings and Usage” page in the Snowflake web interface to get a detailed breakdown of Snowpipe usage. You can break usage information down to a specific date and hour.

DID YOU KNOW...
Snowflake allows you to query the data in files in an internal or external stage? See the following for more information:
• Querying Staged Data
• Querying Metadata for Staged Files
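As a minimal sketch of querying a staged file in place (the stage name, file name, and file format object are illustrative, and a named file format is assumed to exist), the first two columns of a staged CSV file can be selected directly:

select t.$1, t.$2
from @mystage/data.csv.gz (file_format => 'my_csv_format') t;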

Central to Snowpipe is the concept of a “pipe”. A Snowpipe pipe is a wrapper around the COPY command that is used to load a file into the target table in Snowflake. A few of the options from the COPY command are not supported. See CREATE PIPE for more information.
Snowpipe Benefits

Snowpipe provides a serverless data loading option that manages compute capacity on your behalf. You can also take advantage of the per-second/per-core billing to save on compute costs and pay only for the exact compute resources you use.

Instant insights. Snowpipe immediately provides fresh data to all your business users without contention.

Cost-effectiveness. You pay only for the per-second compute used to load data rather than the costs for running a warehouse continuously or by the hour.

Ease-of-use. You can point Snowpipe at an S3 bucket from within the Snowflake UI and data will automatically load asynchronously as it arrives.

Flexibility. Technical resources can interface directly with the programmatic REST API, using Java and Python SDKs to enable highly customized loading use cases.

Zero management. Snowpipe automatically provisions the correct capacity for the data being loaded. There are no servers or management to worry about.

Read more about how you can streamline data loading with Snowpipe.

LOAD FILES BY INVOKING A REST API

Let’s look at how to explicitly invoke Snowflake via the REST API to load files using the Snowpipe service. In this example, we will copy a file to an S3 staging area represented by a named external stage in Snowflake and then invoke the Snowpipe REST endpoint to ingest the file. We are ingesting a JSON file, which will be loaded into a VARIANT column in a table.

Configuring Snowpipe

You may notice that Steps 1, 2, and 3 below are exactly the same as in the previous section
except that the pipe is created without auto_ingest=true.
Step 1: Create a named stage in Snowflake.

create or replace stage mydb.public.snowstage
  url = 's3://snowpipe-demo/'
  credentials = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...');

Step 2: Create a table.

create table mydb.public.mydatatable(jsontext variant);

Step 3: Create a pipe.


Create a new pipe in the system for defining the “COPY INTO <table>” statement used by Snowpipe to load data from an ingestion queue into tables. For more information, see CREATE PIPE.

create or replace pipe mydb.public.mysnowpipe as
  copy into mydb.public.mydatatable
  from @mydb.public.snowstage
  file_format = (type = 'JSON');

Step 4: Configure security (per user).

Users cannot authenticate with the REST API using their Snowflake login
credentials. Generate a public-private key pair for making calls to the Snowpipe
REST endpoints. In addition, grant sufficient privileges on the objects for the data
load, for example, the target database, schema, and table; the stage object; and the
pipe.

For more information on generating a compliant key pair and associating it with
the relevant user, see Configure Security in the Snowflake documentation. Also
refer to the relevant SDK documentation on key-based authentication.
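The grants involved might look roughly like the following sketch (the role and user names are illustrative; the object names reuse those from Steps 1–3; consult Configure Security in the Snowflake documentation for the exact privileges required):

grant usage on database mydb to role snowpipe_role;
grant usage on schema mydb.public to role snowpipe_role;
grant insert, select on table mydb.public.mydatatable to role snowpipe_role;
grant usage on stage mydb.public.snowstage to role snowpipe_role;
grant ownership on pipe mydb.public.mysnowpipe to role snowpipe_role;
grant role snowpipe_role to user snowpipe_user;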

Further Reading:
• Loading Continuously Using Snowpipe
• Understanding Billing for Snowpipe Usage
• How Snowpipe Streamlines Your Continuous Data Loading and Your Business
• Video: Automatically Ingesting Streaming Data with Snowpipe
• Video: Load Data Fast, Analyze Even Faster

Loading Data Using the Web Interface


You may use the Snowflake web interface to issue DML commands such as INSERT and UPDATE or to load data files up to 50MB in size. Both of these options are good for ad hoc data loads and when you are working with small data sets.

The Snowflake web interface provides a simple wizard for loading files into tables. The wizard
uploads files to an internal stage (via PUT), and then it uses a COPY command to load data into
the table. See the documentation for more information on using the web interface to upload
files and load data.

Using Custom/Vendor Applications


You can use your favorite programming language to build your own applications to access and leverage the Snowflake platform. Snowflake provides connectors and drivers for many popular languages that can be used to build custom applications.

SnowSQL is a good example of this. It uses the Python connector provided by Snowflake to provide an effective CLI.

Exporting Data

There are several options for exporting (also called unloading) data:

• Bulk exporting data from a table, a view, or the result of a SELECT statement into files in an internal or external stage
• Using custom code or a client to query a table or a view and then writing the results to one or more files in a local file system or any other target

These options are reviewed in detail below. Choose a method that suits your use case.

Exporting Data to a Stage

Use the COPY INTO location command to perform a bulk export of data (also called data unloading) from a table, a view, or the result of a SELECT statement into files in an internal or external stage (Amazon S3, Microsoft Azure or Google Cloud Storage).

Files exported to an internal stage may be downloaded to a local file system using SnowSQL and the GET command. Files exported to an external stage may be accessed/downloaded via interfaces provided by the respective platform (Amazon S3, Microsoft Azure or Google Cloud Storage).

At the time of this writing, Snowflake can export data as single-character delimited files (CSV, TSV, etc.) or as JSON. Exports can be optionally compressed and are always transparently encrypted when written to internal stages. Exports to external stages can be optionally encrypted as well.

Sign up for a demonstration of Matillion ETL for Snowflake and an opportunity to speak to a Solution Architect about your unique use case.

Exporting Data to a Stage Examples

Exporting to an internal stage

The following command is an example of exporting the result of a SELECT statement to an internal stage named my_stage, to files that reside in a folder named result and whose names are prefixed with data_, with a file format object named vsv, and using gzip compression. You can then load data from these files into other tables or download the files to your local disk using the SnowSQL GET command.

copy into @my_stage/result/data_ from (select * from orders)
file_format = (format_name='vsv' compression='gzip');
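Those exported files could then be pulled down to a local machine with SnowSQL’s GET command, for example (the local path is illustrative):

get @my_stage/result/ file:///tmp/unload/;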

Exporting to an external stage (Amazon S3)

The following command is an example of exporting from a table named mytable to CSV-formatted files in Amazon S3 by specifying the credentials for the desired S3 bucket.

copy into 's3://mybucket/unload/' from mytable
credentials = (aws_key_id='xxxx' aws_secret_key='xxxxx' aws_token='xxxxxx')
file_format = (type = csv);

Exporting to an external stage (Microsoft Azure)

The following command is an example of exporting from a table named mytable to CSV-formatted files in Microsoft Azure. The command specifies credentials for the targeted Blob storage container.

copy into 'azure://myaccount.blob.core.windows.net/unload/' from mytable
credentials = (azure_sas_token='xxxxxx')
file_format = (type = csv);

NOTE: You can avoid specifying credentials by creating named external stages in advance using the CREATE STAGE command. To do this, you specify the external stage location and optionally the credentials required to access this location.

Exporting Data to a Local File System or Other Target

You may also use your favorite programming language or client to query a table or a view and then write the results to one or more files in a local file system or any other target. This approach may be slower than using the “COPY INTO <location>” command, because data needs to travel to the local machine running your code or client, which then writes to the file system. This approach may not noticeably affect exports for small tables, but it will affect exports for larger tables. COPY INTO also benefits from using compression techniques when data is exported, which results in reduced network traffic and faster data movement.

You can also issue a “COPY INTO <location>” command using programming interfaces, download
the files from the appropriate stage, and then access the data. This may be appropriate if you
intend to download large data sets to local files.

Further Reading: Data Unloading



Storage and Compute Costs

This eBook has described methods and best practices for optimizing your usage of Snowflake to control costs. This chapter discusses how billing works, recaps methods for cost optimization, and describes how you can track and control costs. The following information is based on Snowflake’s pricing guide.

Snowflake’s unique architecture allows for a clear separation between your storage and compute resources. This allows Snowflake’s pricing model to be much simpler and to include only two items:

• The cost of storage used
• The cost of compute resources (implemented as virtual warehouses) consumed

Snowflake credits are used to pay for the processing time used by each virtual warehouse.

Storage Costs

All customers are charged a monthly fee for the data they store in Snowflake. Storage cost is measured using the average amount of storage used per month for all customer data consumed or stored in Snowflake, after compression.

Note that features such as Time Travel and Fail-safe may increase costs associated with storage, because data is not immediately deleted but instead is held in reserve to support these features. Files in internal stages and features such as Snowflake Data Sharing, also known as the Data Sharehouse™, and cloning will also affect your storage costs.

Virtual Warehouse (Compute) Costs

Snowflake charges you only when your warehouse is in a “started” state. There is no charge when it is in a “suspended” state. This allows you to create multiple warehouse definitions and suspend them to prevent you from being billed for them. You must issue an ALTER WAREHOUSE RESUME command before you intend to use a suspended virtual warehouse.

There is a linear relationship between the number of servers in a warehouse cluster and the number of credits the cluster consumes. Snowflake uses per-second billing (with a 60-second minimum each time the warehouse starts), so warehouses are billed only for the credits they actually consume. For more information, see Understanding Snowflake Credit and Storage Usage.

You can profile a warehouse to understand its usage and credits spent. To profile a warehouse, use the WAREHOUSE_LOAD_HISTORY and WAREHOUSE_METERING_HISTORY functions. The information provided by these functions can tell you if scaling up a warehouse would benefit any existing loads. Conversely, you may also be able to identify underutilized warehouses and consolidate them, if appropriate. Read more about profiling here; a sketch of these queries appears below.

Once you understand how best to use your virtual warehouse for your data needs, you can implement best practices for performance and cost efficiency.

OTHER FACTORS

The following factors influence the unit costs for the credits you use and the data storage you use:

• Whether you have a Snowflake On Demand account or a Capacity account
• The Region in which you create your Snowflake account
• The Snowflake Edition that your organization chooses

Most Snowflake customers use Snowflake On Demand initially to develop and test the application workload in order to gain real-world experience that enables them to estimate their monthly costs. When the application workload is understood, customers can then purchase an appropriately sized capacity.

NOTE: The Snowflake Edition your business chooses will impact billing. On Demand: Usage-based pricing with no long-term licensing requirements. Capacity: Discounted pricing based on an upfront capacity commitment.
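As a minimal sketch of the profiling queries mentioned above (the time windows are illustrative):

select *
from table(information_schema.warehouse_metering_history(dateadd('days', -7, current_date())));

select *
from table(information_schema.warehouse_load_history(date_range_start => dateadd('hours', -12, current_timestamp())));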

Further Reading:
• Snowflake’s Pricing Guide
• How Usage-Based Pricing Delivers a Budget-Friendly Cloud Data Warehouse
Best Practices

This chapter lists best practices for optimizing Snowflake. These best practices fit many use cases, but there are some circumstances where they may not apply. Your understanding of Snowflake’s architecture should help you determine what is best for your particular use case.

IMPROVING LOAD PERFORMANCE

• Use bulk loading to get the data into tables in Snowflake.
• Consider splitting large data files so the load can be efficiently distributed across servers in a cluster.
• Delete files from internal stages once they are no longer needed. This may improve performance in addition to saving on costs.
• Isolate load and transform jobs from queries to prevent resource contention. Dedicate separate warehouses for loading and querying operations to optimize performance for each.
• Leverage the scalable compute layer to do the bulk of the data processing.
• Consider using Snowpipe in micro-batching scenarios.

IMPROVING QUERY PERFORMANCE

• Consider implementing clustering keys for large tables (a sketch appears at the end of this chapter).
• Try to execute relatively homogeneous queries (size, complexity, data sets, etc.) on the same warehouse. Your query may benefit from cached results from a previous execution.
• Use separate warehouses for your queries and load tasks. This will facilitate targeted provisioning of warehouses and avoid any resource contention between dissimilar operations.

MANAGING A VIRTUAL WAREHOUSE

• Experiment with different warehouse sizes before deciding on the size that suits your requirements. Remember, you are not tied into a particular size.
• Use auto-suspend and auto-resume to save costs. Depending on your workloads, these may save costs as far as loads are concerned. This may not be as good for queries, though; see the next item below.
• Understand the impact of caching on queries. Warehouses that use auto-suspend and auto-resume may not benefit from caching. Consider the trade-off between saving credits by suspending a warehouse versus maintaining the cache of data from previous queries for quicker response times.
• Warehouse suspension and resumption takes time, which is noticeable for larger warehouses. Keep the warehouse running if you need an immediate response.
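As a minimal sketch of the clustering-key suggestion above (the table and column names are illustrative, not from this eBook):

alter table big_events cluster by (event_date, customer_id);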

Conclusion

We hope you enjoyed this eBook and that you have found some helpful tips on how to make the most of your Snowflake database. Implementing the best practices and optimizations described in this eBook should help you enhance big data analytics performance and reduce your Snowflake costs.

With Snowflake, you can spend fewer resources on managing database overhead and focus on what’s really important: answering your organization’s most pressing business questions.

About Matillion

Matillion is an industry-leading data transformation solution for cloud data warehouses. Delivering a true end-to-end data transformation solution (not just data prep or movement from one location to another), Matillion provides an instant-on experience to get you up and running in just a few clicks, a pay-as-you-go billing model to cut out lengthy procurement processes, and an intuitive user interface to minimize technical pain and speed up time to results. Matillion is available globally for Snowflake on AWS Marketplace, Microsoft Azure Marketplace and Google Cloud Platform.

Find out more at www.matillion.com

© 2019 Matillion. All rights reserved
