
Micro-Partitions and Clustering

What is a Snowflake micro-partition?


A micro-partition is a file stored in the blob storage
service of the cloud provider on which a
Snowflake account runs:

AWS - Amazon S3
Azure - Azure Blob Storage
GCP - Google Cloud Storage
Data Partitioning
Partitioning is the process of breaking down a table into smaller, more
manageable parts based on common criteria,
for example a date,
a geographic region,
or a product category.
Each partition is treated as a separate table and can be queried
independently, allowing faster and more efficient data retrieval. Also, keep in
mind that partitioning can help lower storage costs by putting data that is used
less often in cheaper storage space.
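For contrast, in traditional databases partitioning must be declared explicitly. A PostgreSQL-style sketch (the table and column names are illustrative, chosen to match the sales example below):

```sql
-- Manually declared range partitioning (PostgreSQL syntax);
-- Snowflake's micro-partitioning does this automatically instead.
CREATE TABLE sales (
  transaction_date date,
  store_location   text,
  sales_amount     numeric
) PARTITION BY RANGE (transaction_date);

-- Each partition must be created and maintained by the user:
CREATE TABLE sales_2023 PARTITION OF sales
  FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
```

Every new time range needs a new partition like sales_2023; forgetting one causes insert failures. Snowflake removes this maintenance burden entirely.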
Ex:
Let's consider an example to illustrate the benefits of data partitioning.
Say we have a sales database containing millions of records, organized
into year and month partitions so that data for a specific month or year can be
accessed promptly.
By partitioning the data this way, queries are processed more efficiently because only the
relevant partitions are read.

SELECT store_location, SUM(sales_amount)
FROM sales
WHERE transaction_date BETWEEN '2023-01-01' AND '2023-12-31'
  AND product_category = 'Electronics'
GROUP BY store_location;
Assume that the sales table is partitioned on the transaction_date and store_location columns.
The warehouse can then prune partitions, scanning only those that contain data within the given
time frame or for the given store location. This significantly reduces the number of records it
needs to scan, resulting in faster query times.
Benefits of Snowflake Micro-Partitions
The benefits of Snowflake's approach to partitioning table data include:

Automatic partitioning requires virtually no user oversight
Snowflake Micro-partitions are small, allowing for efficient DML operations
Snowflake Micro-partition metadata enables "zero-copy cloning", allowing for efficient
copying of tables, schemas, and databases with no extra storage costs
Original micro-partitions remain immutable, ensuring data integrity when editing data in a
Snowflake zero-copy clone
Snowflake Micro-partitions improve query performance through horizontal and vertical
query pruning, scanning only the needed micro-partitions for better query performance
Clustering metadata is recorded for each micro-partition, allowing Snowflake to further
optimize query performance.
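The zero-copy cloning mentioned above is a single statement in Snowflake; because the original micro-partitions are immutable, the clone initially shares storage with its source. A sketch (the object names are illustrative):

```sql
-- Clone a table without copying its micro-partitions;
-- both tables initially point at the same immutable storage.
CREATE TABLE sales_dev CLONE sales;

-- Schemas and databases can be cloned the same way:
CREATE DATABASE analytics_dev CLONE analytics;
```

New storage is consumed only when either table is modified afterwards, since changed data lands in new micro-partitions.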
Micro-partitions
• Snowflake has implemented a powerful and unique form of partitioning,
called micro-partitioning.
• Micro-partitioning is automatically performed on all Snowflake tables.
• Tables are transparently partitioned using the ordering of the data as it is
inserted/loaded.
• Micro-partitions are small (50 to 500 MB of uncompressed data).
• Data is compressed within micro-partitions.
• Snowflake automatically determines the most efficient compression algorithm for
the columns in each micro-partition.
E.g. observe how the blue-coloured and magenta-coloured
data are stored
Metadata of micro-partitions
• Snowflake also maintains metadata for each micro-partition, which includes:
• the number of distinct values for each field
• the range of values for each field
• other useful information to improve performance.

• Metadata is a key part of the Snowflake architecture, as it allows queries to determine
whether or not the data inside a micro-partition should be scanned.
• This way, when a query is executed, it does not need to scan the entire dataset;
instead it queries only the micro-partitions that hold relevant data.
• This process is known as query pruning, as the data is pruned before the query itself
is executed.
SELECT type, country
FROM MY_TABLE
WHERE name = 'Y';

The only micro-partitions that match this criterion are micro-partitions 3 and 4; query pruning has
reduced our total dataset to just these two partitions.
Since only the [type] and [country] fields are required in the query output, any parts of the
micro-partitions that do not contain data for these columns are also pruned.
i.e. when the micro-partitions themselves are queried, only the required columns are read.
Benefits of Micro partitioning
• In contrast to traditional static partitioning, Snowflake micro-partitions are derived
automatically; they don’t need to be explicitly defined up-front or maintained by
users.
• Micro-partitions are small in size (50 to 500 MB), which enables extremely
efficient DML and fine-grained pruning for faster queries.
• Columns are stored independently within micro-partitions, often referred to as
columnar storage.
• This enables efficient scanning of individual columns; only the columns
referenced by a query are scanned.
• Columns are also compressed individually within micro-partitions, which optimizes
storage cost.
Clustering
• Clustering is a key factor in query performance: it reduces the scanning of micro-
partitions.
• A clustering key is a subset of columns in a table that are explicitly designated to
co-locate the data in the table in the micro-partitions.
• Initially the data is stored in micro-partitions in the order in which records were
inserted; it is then realigned based on the clustering keys.
• We have to choose proper clustering keys.
• We can define clustering keys on multiple columns as well.
• We can modify the clustering keys based on our requirements; this is called re-
clustering.
• Re-clustering consumes credits; the number of credits consumed depends on the
size of the table.
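Before paying re-clustering credits, it is worth checking how well a table is already clustered. Snowflake exposes system functions for this; a sketch (the table and column names are illustrative):

```sql
-- Clustering statistics for the table's defined clustering key:
SELECT SYSTEM$CLUSTERING_INFORMATION('MY_TABLE');

-- Average clustering depth for specific columns
-- (lower is better; a large depth means many overlapping micro-partitions):
SELECT SYSTEM$CLUSTERING_DEPTH('MY_TABLE', '(name, date)');
```

If the reported depth is already low for the columns your queries filter on, re-clustering is unlikely to be worth its cost.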
Clustering keys:
In this second example, let's say we clustered
our dataset on the [name] field, as this
was the key field most frequently used in the
WHERE clauses and JOINs of our queries.
• The data is now stored and ordered based on the value of the [name] field.
• Micro-partition 4 is now the only micro-partition that contains [name] values of Y.
• When we execute our earlier query now, query pruning reduces the target data down to just micro-
partition 4, which means the query has less data to read and will therefore perform more efficiently.
Or, if we see the type and date
columns appearing frequently in the
WHERE clause, we can ALTER the
table to include both as clustering
keys.
Defining clustering keys
Defining clustering keys on a new table:
CREATE TABLE MY_TABLE
(
  type number,
  name string,
  country string,
  date date
)
CLUSTER BY (name);

Modifying clustering keys on an existing table:

ALTER TABLE MY_TABLE CLUSTER BY (name, date);

Choosing clustering keys
Define clustering keys on:
• Columns frequently used in filter conditions (WHERE clause)
• Columns used as join keys
• Frequently used functions or expressions,
like YEAR(date) or SUBSTRING(med_cd, 1, 6)

Snowflake recommends:
• Define clustering keys on large tables, not on small tables.
• Don't define clustering keys on more than 4 columns.
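Clustering on expressions, as suggested above, uses the same CLUSTER BY syntax as column keys. A sketch (the table and column names are illustrative, reusing the med_cd example):

```sql
-- Cluster on expressions rather than raw columns:
CREATE TABLE claims
(
  med_cd string,
  claim_date date,
  amount number
)
CLUSTER BY (YEAR(claim_date), SUBSTRING(med_cd, 1, 6));
```

This co-locates rows by year and by code prefix, so queries filtering on YEAR(claim_date) or a med_cd prefix can prune micro-partitions effectively.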
Thank You
