Introduction To Snowflake: Sunil Gurav
What is Snowflake?
Snowflake is an analytic data warehouse provided as Software-as-a-Service
(SaaS). Snowflake provides a data warehouse that is faster, easier
to use, and more flexible than traditional data warehouses.
The Snowflake data warehouse is not built on an existing database or on a
big data software platform such as Hadoop.
Instead, it uses a new SQL database engine with a
unique architecture designed for the cloud.
Key Concepts and Architecture
Data Warehouse as a Cloud Service:
The Snowflake data warehouse is a true SaaS offering.
• There is no hardware (virtual or physical) for you to select, install, configure, or
manage.
• There is no software for you to install, configure, or manage.
• Ongoing maintenance, management, and tuning are handled by Snowflake.
Snowflake runs completely on cloud infrastructure. All components of the
Snowflake service run on public cloud infrastructure.
Snowflake uses virtual compute instances for its compute needs and a storage service
for storing data. Snowflake cannot be run on private cloud infrastructure
(on-premises).
Snowflake Architecture
Snowflake's architecture is a hybrid of the shared-disk database
architecture and the shared-nothing database architecture.
Shared-disk database architecture:
Snowflake uses a central repository for data storage that is accessible from all the
compute nodes in the data warehouse.
Shared-nothing architecture:
Snowflake processes queries using massively parallel processing (MPP) compute
clusters, where each node in a cluster stores a portion of the entire data set
locally.
Shared-Disk Database Architecture
In this architecture, each node can access the data stored in the central
repository.
[Figure: diagram with the labels "Data Warehouse" and "MPP Cluster"]
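The hybrid design described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not how Snowflake is implemented: `CENTRAL_STORAGE`, `node_sum`, and `parallel_sum` are hypothetical names, the data is made up, and threads stand in for compute nodes. Every worker can read any partition from the shared repository (shared-disk), while each one processes only its own portion of the data (shared-nothing style).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical central repository: any compute node can read any partition
# (the shared-disk side of the hybrid design).
CENTRAL_STORAGE = {
    "part-0": [3, 8, 1],
    "part-1": [7, 2],
    "part-2": [5, 9, 4],
}


def node_sum(partition_id):
    """One compute node reads its assigned partition and aggregates it locally."""
    rows = CENTRAL_STORAGE[partition_id]  # read from the shared repository
    return sum(rows)                      # local work on this node's portion


def parallel_sum(partition_ids):
    """Fan the query out to one worker per partition, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(partition_ids)) as pool:
        partials = list(pool.map(node_sum, partition_ids))
    return sum(partials)


total = parallel_sum(sorted(CENTRAL_STORAGE))
print(total)  # 39
```

The final combine step is deliberately tiny: each worker returns one partial sum, so only small intermediate results move between nodes.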
If your account is hosted on AWS and latency is a concern, you should choose
the available US region with the closest geographic proximity to your end
users.
European Union (EU) Regions
Snowflake maintains the following EU regions:
These regions are provided for organizations that prefer or require their data to be
stored in the European Union. Multiple regions are provided to allow organizations
to meet their individual compliance and data sovereignty requirements. Snowflake
does not move data between accounts, so any data in an account in an EU region
will be maintained in this region only, unless explicitly copied or moved by users.
Asia Pacific (AP) Regions
Snowflake currently maintains a single AP region:
This region is provided for organizations that prefer or require their data to be
stored in the Asia Pacific part of the world. Snowflake does not move data
between accounts, so any data in an account in an AP region will be
maintained in this region only, unless explicitly copied or moved by users.
Snowflake Editions
Snowflake provides several alternatives to ensure that your usage of the
service fits your organization’s specific requirements. This includes offering
multiple Snowflake Editions to choose from, with each successive edition
building on the previous edition through additional features and/or higher
levels of service. And, as your organization’s needs change and grow, changing
editions is easy.
Note:
The Snowflake Edition that your organization chooses determines the unit costs for
the credits and the data storage you use. Another factor that impacts unit costs is
whether you have a Snowflake On Demand or Capacity account:
• On Demand: Usage-based pricing with no long-term licensing requirements.
• Capacity: Discounted pricing based on an up-front Capacity commitment.
Standard Edition
Standard Edition is the introductory-level offering, providing full, unlimited access to all of
Snowflake's standard features.
Premier Edition
Snowflake offers Premier Edition as a cost-effective option for organizations that do
not need additional features, but would benefit from expanded access to Snowflake
Support.
It includes all the features and services of Standard Edition, with an upgrade to the
next level of support:
Enterprise Edition
Enterprise Edition provides all the features and services of Premier Edition, with the
following additional features designed specifically for the needs of large-scale
enterprises and organizations:
Enterprise Edition for Sensitive Data (ESD)
Enterprise for Sensitive Data offers even higher levels of data protection to support
the needs of organizations with extremely sensitive data, particularly PHI data that
must comply with HIPAA regulations.
It includes all the features and services of Enterprise Edition, with the addition of the
following enhanced security and data protection:
Virtual Private Snowflake (VPS)
Virtual Private Snowflake offers our highest level of security for organizations that
have the strictest security requirements, such as financial institutions.
It includes all the features and services of Enterprise Edition for Sensitive Data (ESD),
but in a separate Snowflake environment, isolated from all other Snowflake
accounts (i.e. VPS accounts do not share any resources with other accounts). VPS
delivers this level of extreme security through the use of:
Overview of Key Features
This topic lists the notable/significant features supported in the current
release. Note that it does not list every feature provided by Snowflake.
• Security and Data Protection
• Standard and Extended SQL Support
• Tools and Interfaces
• Connectivity
• Data Import and Export
• Data Sharing
Security and Data Protection
• Choose the level of security you require for your Snowflake account, based on your Snowflake Edition.
• Choose the geographical location where your data is stored, based on your Snowflake Region.
• User authentication through standard user/password credentials.
• Enhanced authentication:
  • Multi-factor authentication (MFA).
  • Federated authentication and single sign-on (SSO) (requires Snowflake Enterprise Edition).
• All communication between clients and the server protected through TLS.
• Deployment inside a cloud platform VPC.
• Isolation of data via Amazon S3 policy controls.
• Support for PHI data (in compliance with HIPAA regulations) — requires Snowflake Enterprise for Sensitive Data (ESD).
• Automatic data encryption by Snowflake using Snowflake-managed keys.
• Object-level access control.
• Snowflake Time Travel (1 day standard for all accounts; additional days, up to 90, allowed with Snowflake Enterprise) for:
  • Querying historical data in tables.
  • Restoring and cloning historical data in databases, schemas, and tables.
• Snowflake Fail-safe (7 days standard for all accounts) for disaster recovery of historical data.
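The retention figures in the list above can be turned into a small date calculation. The sketch below is illustrative only: `protection_windows` is a hypothetical helper, not a Snowflake API, and it assumes that the Fail-safe period begins at the moment the Time Travel retention period ends and that a retention of 0 days is allowed. The limits (1 day for all accounts, up to 90 days with Enterprise, 7 days of Fail-safe) are the ones stated in the list.

```python
from datetime import datetime, timedelta

FAIL_SAFE_DAYS = 7             # Fail-safe: 7 days standard for all accounts
STANDARD_MAX_RETENTION = 1     # Time Travel: 1 day standard for all accounts
ENTERPRISE_MAX_RETENTION = 90  # Time Travel: up to 90 days with Enterprise


def protection_windows(changed_at, retention_days, enterprise=False):
    """Return (time_travel_ends, fail_safe_ends) for data changed at `changed_at`.

    Assumption: Fail-safe starts when the Time Travel period ends.
    """
    limit = ENTERPRISE_MAX_RETENTION if enterprise else STANDARD_MAX_RETENTION
    if not 0 <= retention_days <= limit:
        raise ValueError(f"retention_days must be between 0 and {limit}")
    time_travel_ends = changed_at + timedelta(days=retention_days)
    fail_safe_ends = time_travel_ends + timedelta(days=FAIL_SAFE_DAYS)
    return time_travel_ends, fail_safe_ends


tt_end, fs_end = protection_windows(datetime(2024, 1, 1), retention_days=1)
print(tt_end.date(), fs_end.date())  # 2024-01-02 2024-01-09
```

Under these assumptions, a row deleted on 1 January in a standard account would be queryable through Time Travel until 2 January and potentially recoverable through Fail-safe until 9 January.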
Standard and Extended SQL Support
• Most DDL and DML defined in SQL:1999, including:
  • Database and schema DDL.
  • Table and view DDL.
  • Standard DML such as UPDATE, DELETE, and INSERT.
  • DML for bulk data loading/unloading.
  • Core data types.
  • SET operations.
  • CAST functions.
• Advanced DML such as multi-table INSERT, MERGE, and multi-merge.
• Transactions.
• Temporary and transient tables for transitory data.
• Lateral views.
• Statistical aggregate functions.
• Analytical aggregates (GROUP BY CUBE, ROLLUP, and GROUPING SETS).
• Parts of the SQL:2003 analytic extensions:
  • Windowing functions.
  • Grouping sets.
• Scalar and tabular user-defined functions (UDFs), with support for both SQL and JavaScript.
• Information Schema for querying object and account metadata, as well as query and warehouse usage history data.
Tools and Interfaces
• Web-based GUI for account and general management, monitoring of
resources and system usage, and querying data.
• SnowSQL (Python-based command line client).
• Virtual warehouse management from the GUI or command line,
including creating, resizing (with zero downtime), suspending, and
dropping warehouses.
Connectivity
• Broad ecosystem of supported 3rd-party partners and technologies.
• Support for using free trials to connect to selected partners.
• Extensive set of client connectors and drivers provided by Snowflake:
  • Python connector
  • Spark connector
  • Node.js driver
  • Go Snowflake driver
  • .NET driver
  • JDBC client driver
  • ODBC client driver
  • dplyr-snowflakedb (open source dplyr package extension maintained on GitHub)
Data Import and Export
• Support for bulk loading and unloading data into/out of tables,
including:
  • Load any data that uses a supported character encoding.
  • Load data from compressed files.
  • Load most flat, delimited data files (CSV, TSV, etc.).
  • Load data files in JSON, Avro, ORC, Parquet, and XML format.
  • Load from S3 data sources and local files using the Snowflake web interface or
  command line client.
• Support for continuous bulk loading of data from files:
  • Use Snowpipe to load data in micro-batches from internal stages (i.e. within
  Snowflake) or external stages (i.e. in S3 or Azure).
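Bulk loading from a stage is typically expressed as a `COPY INTO` statement. The sketch below only assembles the SQL text and does not connect to Snowflake; `build_copy_into` is a hypothetical helper, and the table and stage names are placeholders assumed to be trusted identifiers. Delimited files such as TSV are assumed to be handled through the CSV type with a different delimiter, which is why TSV is not listed separately.

```python
# File types named in the list above, written as FILE_FORMAT type keywords.
SUPPORTED_TYPES = {"CSV", "JSON", "AVRO", "ORC", "PARQUET", "XML"}


def build_copy_into(table, stage, file_type="CSV"):
    """Build a minimal COPY INTO statement that loads `table` from a named stage.

    `table` and `stage` are placeholder identifiers; they are interpolated
    as-is, so they must come from a trusted source.
    """
    file_type = file_type.upper()
    if file_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported file type: {file_type}")
    return f"COPY INTO {table} FROM @{stage} FILE_FORMAT = (TYPE = '{file_type}')"


print(build_copy_into("sales", "my_stage", "json"))
# COPY INTO sales FROM @my_stage FILE_FORMAT = (TYPE = 'JSON')
```

The generated statement could then be run through any of the clients listed under Connectivity, for example SnowSQL.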
Data Sharing
• Support for sharing data with other Snowflake accounts:
  • Provide data to other accounts to consume.
  • Consume data provided by other accounts.
Overview of the Data Lifecycle
Snowflake provides support for all standard SELECT, DDL, and DML operations across
the lifecycle of data in the system, from organizing and storing data to querying and
working with data, as well as removing data from the system.
• Lifecycle Diagram
• Organizing Data
• Storing Data
• Querying Data
• Working with Data
• Removing Data
Lifecycle Diagram
All user data in Snowflake is logically represented as tables that can be queried and
modified through standard SQL interfaces. Each table belongs to a schema, which in
turn belongs to a database.
Organizing Data
You can organize your data into databases, schemas, and tables. Snowflake does not
limit the number of databases you can create or the number of schemas you can
create within a database. Snowflake also does not limit the number of tables you can
create in a schema.
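The database, schema, and table hierarchy can be modeled in a few lines. This is a hypothetical in-memory model for illustration, not a Snowflake client: `Database`, `Schema`, and `add_table` are made-up names, as are the example objects. It shows how each table is addressed by a fully qualified name of the form `database.schema.table`.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    name: str
    tables: list = field(default_factory=list)


@dataclass
class Database:
    name: str
    schemas: dict = field(default_factory=dict)

    def add_table(self, schema_name, table_name):
        """Register a table under a schema, creating the schema if it is new."""
        schema = self.schemas.setdefault(schema_name, Schema(schema_name))
        schema.tables.append(table_name)
        return f"{self.name}.{schema_name}.{table_name}"

    def qualified_names(self):
        """List every table as database.schema.table."""
        return [
            f"{self.name}.{schema.name}.{table}"
            for schema in self.schemas.values()
            for table in schema.tables
        ]


db = Database("SALES_DB")
db.add_table("PUBLIC", "ORDERS")
db.add_table("PUBLIC", "CUSTOMERS")
db.add_table("STAGING", "RAW_ORDERS")
print(db.qualified_names())
# ['SALES_DB.PUBLIC.ORDERS', 'SALES_DB.PUBLIC.CUSTOMERS', 'SALES_DB.STAGING.RAW_ORDERS']
```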
Storing Data
You can insert data directly into tables. In addition, Snowflake provides DML for loading data into
Snowflake tables from external, formatted files.
Querying Data
Once the data is stored in a table, you can issue SELECT statements to query the data.
Working With Data
Once the data is stored in a table, all standard DML operations can be performed on
the data. Snowflake also supports DDL actions such as cloning entire databases, schemas,
and tables.
Removing Data
In addition to using the DML command DELETE to remove data from a table, you can
truncate or drop an entire table. You can also drop entire schemas and databases.
Continuous Data Protection
Continuous Data Protection (CDP) encompasses a comprehensive set of features
that help protect data stored in Snowflake against human error, malicious acts, and
software or hardware failure. At every stage within the data lifecycle, Snowflake
enables your data to be accessible and recoverable in the event of accidental or
intentional modification, removal, or corruption.
Connecting to Snowflake
These topics provide an overview of the Snowflake-provided and 3rd-party tools and technologies that
form the ecosystem for connecting to Snowflake. They also provide detailed installation and usage
instructions for using the Snowflake-provided clients, connectors, and drivers.
1. Overview of the Ecosystem
2. Snowflake Partner Connect
3. SnowSQL (CLI Client)
4. Snowflake Connector for Python
5. Snowflake Connector for Spark
6. Node.js Driver
7. Go Snowflake Driver
8. .NET Driver
9. JDBC Driver
10. ODBC Driver
11. Client Considerations
Overview of the Ecosystem
Snowflake works with a wide array of industry-leading tools and technologies, enabling you to access
Snowflake through an extensive network of connectors, drivers, programming languages, and utilities,
including:
• Snowflake-provided client software: SnowSQL (CLI), Python, Node.js, JDBC, ODBC, etc.
• Certified partners who have developed cloud-based and on-premises solutions for connecting to
Snowflake through our drivers and connectors.
• Other 3rd-party tools and technologies that are known to work with Snowflake.
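As one example of the Snowflake-provided clients, a connection through the Python connector might be prepared as follows. This is a minimal sketch with placeholder credentials: `build_connection_params` and `connect` are hypothetical helpers, and all account, user, and warehouse values are made up. The connector package (`snowflake-connector-python`) is imported only inside `connect`, which is not called here, so the rest of the sketch runs without the package or a real account.

```python
def build_connection_params(account, user, password,
                            warehouse=None, database=None, schema=None):
    """Collect connection settings, leaving out any optional value that is not set."""
    params = {"account": account, "user": user, "password": password}
    optional = {"warehouse": warehouse, "database": database, "schema": schema}
    params.update({key: value for key, value in optional.items() if value is not None})
    return params


def connect(params):
    """Open a session with the Snowflake Connector for Python (requires the package)."""
    # Imported lazily so the rest of this sketch runs without the package installed.
    import snowflake.connector
    return snowflake.connector.connect(**params)


# Placeholder values; replace them with real account details before calling connect().
params = build_connection_params("my_account", "my_user", "my_password",
                                 warehouse="MY_WH")
print(sorted(params))  # ['account', 'password', 'user', 'warehouse']
```

In practice, credentials would be read from the environment or a secrets store rather than written into source code.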
Data Integration
Commonly referred to as ETL, data integration encompasses the following primary
operations:
Extract: Exporting data from a specified source.
Transform: Modifying the source data as needed, using rules, merges, lookup tables,
or other conversion methods, to match the target.
Load: Importing the transformed data into the target database.
More recent usage references the term ELT, emphasizing that the transformation
part of the process does not necessarily need to be performed before loading,
particularly in systems such as Snowflake that support transformation during or after
loading.
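The difference between ETL and ELT is only the point at which the transformation runs. The sketch below contrasts the two using Python's built-in `sqlite3` module purely as a local stand-in for a target database; it is not Snowflake, and the table names, sample rows, and cleaning rule are all made up for illustration.

```python
import sqlite3

# Made-up source extract: untrimmed names and amounts stored as text.
RAW_ROWS = [("  alice ", "10"), ("BOB", "25")]


def clean(name, amount):
    """Transformation rule applied outside the database (the ETL path)."""
    return name.strip().lower(), int(amount)


def etl(conn):
    """Extract, transform in application code, then load the cleaned rows."""
    conn.execute("CREATE TABLE etl_target (name TEXT, amount INTEGER)")
    transformed = [clean(name, amount) for name, amount in RAW_ROWS]
    conn.executemany("INSERT INTO etl_target VALUES (?, ?)", transformed)
    return conn.execute("SELECT name, amount FROM etl_target ORDER BY name").fetchall()


def elt(conn):
    """Extract, load the raw rows as-is, then transform inside the database with SQL."""
    conn.execute("CREATE TABLE raw_stage (name TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_stage VALUES (?, ?)", RAW_ROWS)
    conn.execute(
        "CREATE TABLE elt_target AS "
        "SELECT lower(trim(name)) AS name, CAST(amount AS INTEGER) AS amount "
        "FROM raw_stage"
    )
    return conn.execute("SELECT name, amount FROM elt_target ORDER BY name").fetchall()


conn = sqlite3.connect(":memory:")
etl_rows = etl(conn)
elt_rows = elt(conn)
print(etl_rows == elt_rows)  # True
```

Both paths end with the same cleaned rows. In the ELT path the raw data also remains available in the staging table, so the transformation can be revised and rerun without extracting again.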
Business Intelligence (BI)
Business intelligence (BI) tools enable analyzing, discovering, and
reporting on data to help executives and managers make more informed
business decisions. A key component of any BI tool is the ability to
deliver data visualization through dashboards, charts, and other
graphical output.
Business intelligence also sometimes overlaps with technologies such
as data integration/transformation and advanced analytics; however,
we’ve chosen to list these technologies separately in their own
categories.
Advanced Analytics
Also referred to as data science, machine learning (ML), artificial
intelligence (AI), and “Big Data”, advanced analytics covers a broad
category of vendors, tools, and technologies that provide advanced
capabilities for statistical and predictive modeling.
These tools and technologies often share some overlapping features
and functionality with BI tools; however, they are less focused on
analyzing/reporting on past data. Instead, they focus on examining large
data sets to discover patterns and uncover useful business information
that can be used to predict future trends.
Security and Privacy
Security and privacy tools ensure sensitive data maintained by an organization is
protected from inappropriate access and tampering. These tools support a wide
range of operations, including risk assessment, intrusion
detection/monitoring/notification, data masking, and more.
Snowflake is known to inter-operate with the following security and privacy tools:
Programmatic Interfaces
The Snowflake ecosystem supports developing applications using many popular programming
languages and development platforms.
Using our client drivers and connectors, Snowflake supports connecting natively through the following
languages and platforms:
SQL Editing/Querying Tools
Snowflake provides native SQL editing and querying solutions: