The Snowflake Elastic Data Warehouse

SIGMOD 2016 and beyond

Ashish Motivala, Jiaqi Yan

1
Our Product
•The Snowflake Elastic Data Warehouse, or “Snowflake”
•Built for the cloud
•Multi-tenant, transactional, secure, highly scalable, elastic
•Implemented from scratch (no Hadoop, Postgres etc.)
•Currently runs on AWS and Azure
•Serves tens of millions of queries per day over hundreds of
petabytes of data
•1000+ active customers, growing fast

2
Talk Outline
•Motivation and Vision
•Storage vs. Compute, or the Perils of Shared-Nothing
•Architecture
•Feature Highlights
•Lessons Learned

3
Why Cloud?

•Amazing platform for building distributed systems

•Virtually unlimited, elastic compute and storage
•Pay-per-use model (with strong economies of scale)
•Efficient access from anywhere

•Software as a Service (SaaS)

•No need for complex IT organization and infrastructure
•Pay-per-use model
•Radically simplified software delivery, update, and user support
•See “Lessons Learned”

4
Data Warehousing in the Cloud
•Traditional DW systems pre-date the cloud
•Designed for small, fixed clusters of machines
•But to reap benefits of the cloud, software needs to be elastic!

•Traditional DW systems rely on complex ETL (extract-transform-load) pipelines and physical tuning
•Fundamentally assume predictable, slow-moving, easily categorized
data from internal sources (OLTP, ERP, CRM…)
•Cloud data increasingly stems from changing, external sources
•Logs, click streams, mobile devices, social media, sensor data
•Often arrives in schema-less, semi-structured form (JSON, XML, Avro)

5
What about Big Data?
•Hive, Spark, BigQuery, Impala, Blink…
•Batch and/or stream processing at datacenter scale
•Various SQL-like front-ends
•Increasingly popular alternative for high-end use cases
•Drawbacks
•Lack efficiency and feature set of traditional DW technology
•Security? Backups? Transactions? …
•Require significant engineering effort to roll out and use

6
Our Vision for a Cloud Data Warehouse

•Data warehouse as a service
•No infrastructure to manage, no knobs to tune
•Multidimensional elasticity
•On-demand scalability: data, queries, users
•All business data
•Native support for relational + semi-structured data

7
Shared-nothing Architecture

•Tables are horizontally partitioned across nodes

•Every node has its own local storage
•Every node is only responsible for its local table partitions

•Elegant and easy to reason about

•Scales well for star-schema queries

•Dominant architecture in data warehousing

•Teradata, Vertica, Netezza…

8
The Perils of Coupling

•Shared-nothing couples compute and storage resources

•Elasticity
•Resizing compute cluster requires redistributing (lots of) data
•Cannot simply shut off unused compute resources → no pay-per-use
•Limited availability
•Membership changes (failures, upgrades) significantly impact
performance and may cause downtime
•Homogeneous resources vs. heterogeneous workload
•Bulk loading, reporting, exploratory analysis

9
Multi-cluster, shared data architecture
• No data silos: storage decoupled from compute
• Any data: native for structured & semi-structured
• Unlimited scalability: along many dimensions
• Low cost: compute on demand
• Instant cloning: isolate production from DEV & QA
• Highly available: 11 9’s durability, 4 9’s availability
[Figure: separate virtual warehouses for ETL & data loading, data science, finance, marketing, dashboards, and a cloned Dev, Test, QA environment, all accessing the same shared databases]

10
Multi-cluster Shared-data Architecture
[Figure: three layers: Cloud Services on top, virtual warehouses in the middle, shared data storage at the bottom]
•Cloud Services: REST (JDBC/ODBC/Python); authentication & access control; infrastructure manager, optimizer, transaction manager, security; metadata
•Virtual warehouses: several independent warehouses, each with its own cache
•Data storage: shared by all virtual warehouses
•All data in one place
•Independently scale storage and compute
•No unload / reload to shut off compute
•Every virtual warehouse can access all data

11
Data Storage Layer
•Stores table data and query results
•Table is a set of immutable micro-partitions
•Uses tiered storage with Amazon S3 at the bottom
•Object store (key-value) with HTTP(S) PUT/GET/DELETE interface
•High availability, extreme durability (eleven 9s)
•Some important differences w.r.t. local disks
•Performance (sure…)
•No update-in-place, objects must be written in full
•But: can read parts (byte ranges) of objects
•Strong influence on table micro-partition format and
concurrency control
12
Table Files

•Snowflake uses PAX [Ailamaki01], aka hybrid columnar storage
•Tables horizontally partitioned into immutable micro-partitions (~16 MB)
•Updates add or remove entire files
•Values of each column grouped together and compressed
•Queries read header + columns they need
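To make the PAX layout concrete, here is a minimal Python sketch, assuming a toy file format that is purely hypothetical and not Snowflake's actual one: a 4-byte header length, a JSON header of per-column byte offsets, and zlib-compressed column chunks. It shows why immutable files plus byte-range reads suit an object store: the writer produces the whole file at once, and a reader fetches the header and then only the byte ranges of the columns it needs. The names `write_micro_partition` and `read_columns` are illustrative.

```python
import json
import zlib


def write_micro_partition(rows: list) -> bytes:
    """Serialize rows column by column (PAX style): header + compressed column chunks."""
    columns = list(rows[0].keys())
    chunks = [zlib.compress(json.dumps([row[c] for row in rows]).encode()) for c in columns]
    # The header records each column's (offset, length) within the body.
    offsets, pos = {}, 0
    for col, chunk in zip(columns, chunks):
        offsets[col] = (pos, len(chunk))
        pos += len(chunk)
    header = json.dumps(offsets).encode()
    # The file is written in full, once; it is never updated in place.
    return len(header).to_bytes(4, "big") + header + b"".join(chunks)


def read_columns(blob: bytes, wanted: list) -> dict:
    """Read the header, then only the requested columns, as ranged GETs would."""
    header_len = int.from_bytes(blob[:4], "big")
    offsets = json.loads(blob[4:4 + header_len])
    body_start = 4 + header_len
    result = {}
    for col in wanted:
        start, length = offsets[col]
        chunk = blob[body_start + start: body_start + start + length]  # one byte-range read
        result[col] = json.loads(zlib.decompress(chunk))
    return result


rows = [{"id": 1, "city": "Seattle", "temp": 11.5},
        {"id": 2, "city": "San Mateo", "temp": 18.0}]
blob = write_micro_partition(rows)
print(read_columns(blob, ["city"]))  # {'city': ['Seattle', 'San Mateo']}
```

A query touching one column never decompresses the others, which is the main payoff of grouping each column's values together inside the file.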

13
Other Data
•Tiered storage also used for temp data and query results
•Arbitrarily large queries, never run out of disk
•New forms of client interaction
•No server-side cursors
•Retrieve and reuse previous query results

•Metadata stored in a transactional key-value store (not S3)

•Which table consists of which S3 objects
•Optimizer statistics, lock tables, transaction logs etc.
•Part of Cloud Services layer (see later)

14
Virtual Warehouse
•warehouse = Cluster of EC2 instances called worker nodes
•Pure compute resources
•Created, destroyed, resized on demand
•Users may run multiple warehouses at same time
•Each warehouse has access to all data, but performance is isolated
•Users may shut down all warehouses when they have nothing to run
•T-Shirt sizes: XS to 4XL
•Users do not know which type or how many EC2 instances
•Service and pricing can evolve independently of the cloud platform

15
Worker Nodes
•Worker processes are ephemeral and idempotent
•Worker node forks new worker process when query arrives
•Do not modify micro-partitions directly but queue removal or addition
of micro-partitions

•Each worker node maintains local table cache

•Collection of table files, i.e., S3 objects accessed in the past
•Shared across concurrent and subsequent worker processes
•Assignment of micro-partitions to nodes using consistent hashing, with
deterministic stealing.
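The consistent-hashing assignment above can be sketched in Python as follows. This is a generic consistent-hashing ring, assuming SHA-1-based hashing and 64 virtual points per worker (both hypothetical choices); it does not model the deterministic file stealing the slide mentions. The point it illustrates: when a warehouse grows, only the files on the new worker's arcs of the ring move, so most workers' local caches stay warm.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash, so every node computes the same assignment.
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, nodes: list, vnodes: int = 64):
        # Each worker gets several virtual points on the ring to even out the load.
        self._ring = sorted((_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes))
        self._keys = [h for h, _ in self._ring]

    def node_for(self, file_name: str) -> str:
        # Walk clockwise to the first virtual point at or after the file's hash (wrapping around).
        idx = bisect.bisect_left(self._keys, _hash(file_name)) % len(self._ring)
        return self._ring[idx][1]


files = [f"part-{i:04d}" for i in range(1000)]
ring3 = ConsistentHashRing(["w1", "w2", "w3"])
ring4 = ConsistentHashRing(["w1", "w2", "w3", "w4"])
moved = [f for f in files if ring3.node_for(f) != ring4.node_for(f)]
print(len(moved))  # expect roughly a quarter of the files; all of them now map to w4
```

Because adding a worker only inserts new points on the ring, any file whose owner changes must now belong to the new worker; nothing is reshuffled among the existing ones.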

16
Execution Engine
•Columnar [MonetDB, C-Store, many more]
•Effective use of CPU caches, SIMD instructions, and compression
•Vectorized [Zukowski05]
•Operators handle batches of a few thousand rows in columnar format
•Avoids materialization of intermediate results
•Push-based [Neumann11 and many before that]
•Operators push results to downstream operators (no Volcano iterators)
•Removes control logic from tight loops
•Works well with DAG-shaped plans
•No transaction management, no buffer pool
•But: most operators (join, group by, sort) can spill to disk and recurse
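The control flow of a vectorized, push-based engine can be sketched in Python as below. The operator classes and the 4096-row batch size are hypothetical; plain Python loops obviously do not exploit SIMD or CPU caches, so this only shows the shape: the scan drives execution and pushes columnar batches downstream, instead of each operator pulling one row at a time through Volcano-style `next()` calls.

```python
from typing import Callable

Batch = dict  # columnar batch: column name -> list of values


class Operator:
    """Base class: receives batches from upstream and pushes results downstream."""

    def __init__(self):
        self.downstream = None

    def push(self, batch: Batch) -> None:
        raise NotImplementedError

    def finish(self) -> None:
        # Propagate end-of-input so downstream operators can emit final results.
        if self.downstream is not None:
            self.downstream.finish()


class Filter(Operator):
    def __init__(self, column: str, predicate: Callable[[object], bool]):
        super().__init__()
        self.column = column
        self.predicate = predicate

    def push(self, batch: Batch) -> None:
        # Evaluate the predicate over one column of the batch, then gather the kept rows.
        keep = [i for i, v in enumerate(batch[self.column]) if self.predicate(v)]
        if keep:
            self.downstream.push({c: [vals[i] for i in keep] for c, vals in batch.items()})


class SumAggregate(Operator):
    def __init__(self, column: str):
        super().__init__()
        self.column = column
        self.total = 0

    def push(self, batch: Batch) -> None:
        self.total += sum(batch[self.column])


def scan(table: Batch, root: Operator, batch_size: int = 4096) -> None:
    # The scan drives execution: it pushes fixed-size batches into the pipeline.
    n = len(next(iter(table.values())))
    for start in range(0, n, batch_size):
        root.push({c: vals[start:start + batch_size] for c, vals in table.items()})
    root.finish()


# Plan: scan -> filter(x is even) -> sum(x)
agg = SumAggregate("x")
flt = Filter("x", lambda v: v % 2 == 0)
flt.downstream = agg
scan({"x": list(range(10_000))}, flt)
print(agg.total)  # 24995000
```

Because an operator simply calls `push` on its successor, a DAG-shaped plan is just an operator with several downstream consumers, which is one reason push-based execution fits such plans well.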

17
Self Tuning & Self Healing
• Adaptive Automatic Automatic Automatic
Memory Distribution Degree of
Management Method Parallelism
• Self-tuning

• Do no harm!

• Automatic

• Default Automatic Automatic

Fault Workload
Handling Management

18
Example: Automatic Skew Avoidance
[Figure: execution plan (scan, scan, filter, join of inputs 1 and 2)]

19
Cloud Services
•Collection of services
•Access control, query optimizer, transaction manager etc.
•Heavily multi-tenant (shared among users) and always on
•Improves utilization and reduces administration
•Each service replicated for availability and scalability
•Hard state stored in transactional key-value store

20
Concurrency Control
•Designed for analytic workloads
•Large reads, bulk or trickle inserts, bulk updates
•Snapshot Isolation (SI) [Berenson95]
•SI based on multi-version concurrency control (MVCC)
•DML statements (insert, update, delete, merge) produce new versions
of tables by adding or removing whole files
•Natural choice because table files on S3 are immutable
•Additions and removals tracked in metadata (key-value store)
•Versioned snapshots used also for time travel and cloning
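The MVCC scheme above can be sketched in Python, assuming a hypothetical `Table` class that keeps one immutable file set per table version. Committing DML never rewrites a file; it appends a new version with whole files added or removed, and a reader keeps seeing the version that was current when its snapshot was taken. The sketch deliberately omits the write-write conflict detection that Snapshot Isolation also requires.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    # versions[i] is the frozenset of immutable file names visible in table version i.
    versions: list = field(default_factory=lambda: [frozenset()])

    def snapshot(self) -> int:
        # A transaction reads the version that is current when it starts.
        return len(self.versions) - 1

    def files_at(self, version: int) -> frozenset:
        return self.versions[version]

    def commit(self, added=(), removed=()) -> int:
        # Publish a new version by adding/removing whole files; old versions stay intact.
        current = self.versions[-1]
        self.versions.append((current - set(removed)) | frozenset(added))
        return len(self.versions) - 1


t = Table()
t.commit(added=["f1", "f2"])             # e.g. a bulk insert
reader = t.snapshot()                    # a long-running query starts here
t.commit(added=["f3"], removed=["f1"])   # an UPDATE rewrites f1's rows into f3
print(sorted(t.files_at(reader)))        # ['f1', 'f2']
print(sorted(t.files_at(t.snapshot())))  # ['f2', 'f3']
```

Since the retained versions are just file sets in metadata, keeping older ones around is what makes time travel and cloning cheap.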

21
Pruning
•Database adage: The fastest way to process data? Don’t.
•Limiting access only to relevant data is key aspect of query processing
•Traditional solution: B+-trees and other indices
•Poor fit for us: random accesses, high load time, manual tuning
•Snowflake approach: pruning
•AKA small materialized aggregates [Moerkotte98], zone maps
[Netezza], data skipping [IBM]
•Per-file min/max values, #distinct values, #nulls, bloom filters etc.
•Use metadata to decide which files are relevant for a given query
•Smaller than indices, more load-friendly, no user input required
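Min/max pruning for a range predicate `lo <= col <= hi` can be sketched in Python. The `FileStats` record and `prune` function are hypothetical; only per-file min/max is modeled, not distinct counts, null counts, or bloom filters. A file is skipped when its value range cannot overlap the predicate, so the decision uses metadata alone and never touches the file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    name: str
    min_val: int  # smallest value of the column in this file
    max_val: int  # largest value of the column in this file


def prune(files: list, lo: int, hi: int) -> list:
    """Keep only files whose [min_val, max_val] range can overlap [lo, hi]."""
    return [f.name for f in files if f.max_val >= lo and f.min_val <= hi]


stats = [FileStats("f1", 0, 99), FileStats("f2", 100, 199), FileStats("f3", 150, 300)]
print(prune(stats, 120, 160))  # ['f2', 'f3']
```

Pruning is most effective when data is (roughly) clustered on the filtered column, so that each file covers a narrow value range.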

22
Pure SaaS Experience
•Support for various standard interfaces and third-party tools
•ODBC, JDBC, Python PEP-0249
•Tableau, Informatica, Looker
•Feature-rich web UI
•Worksheet, monitoring, user management,
usage information etc.
•Dramatically reduces time to onboard users
•Focus on ease of use and service experience
•No tuning knobs
•No physical design
•No storage grooming

23
Continuous Availability
•Storage and cloud services replicated across datacenters
•Snowflake remains available even if a whole datacenter fails
•Weekly Online Upgrade
•No downtime, no performance degradation!
•Tremendous effect on pace of development and bug resolution time
•Magic sauce: stateless services
•All state is versioned and stored in common key-value store
•Multiple versions of a service can run concurrently
•Load-balancing layer routes new queries to the new service version
while the old version finishes its in-flight queries
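The upgrade routing above can be sketched in Python, assuming stateless services whose state lives in the shared key-value store, so two versions can safely run side by side. The `VersionRouter` class and its methods are hypothetical: after a deploy, new queries go to the new version, and the old version is retired once its last in-flight query finishes.

```python
class VersionRouter:
    """Route new queries to the newest service version; let older versions drain."""

    def __init__(self, version: str):
        self.current = version
        self.in_flight = {version: 0}  # version -> number of running queries

    def deploy(self, version: str) -> None:
        # Old and new versions now run concurrently; only new queries see the new one.
        self.current = version
        self.in_flight.setdefault(version, 0)
        self._retire_idle()

    def start_query(self) -> str:
        self.in_flight[self.current] += 1
        return self.current

    def finish_query(self, version: str) -> None:
        self.in_flight[version] -= 1
        self._retire_idle()

    def _retire_idle(self) -> None:
        # Shut down any non-current version that has no queries left.
        for v in [v for v, n in self.in_flight.items() if n == 0 and v != self.current]:
            del self.in_flight[v]

    def live_versions(self) -> list:
        return sorted(self.in_flight)


router = VersionRouter("v1")
old_query = router.start_query()   # runs on v1
router.deploy("v2")
new_query = router.start_query()   # routed to v2
print(router.live_versions())      # ['v1', 'v2']
router.finish_query(old_query)
print(router.live_versions())      # ['v2']
```

No query is ever interrupted: each one finishes on the version it started on, which is why the upgrade causes no downtime.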

24
Semi-Structured and Schema-Less Data
•Three new data types: VARIANT, ARRAY, OBJECT
•VARIANT: holds values of any standard SQL type + ARRAY + OBJECT
•ARRAY: offset-addressable collection of VARIANT values
•OBJECT: dictionary that maps strings to VARIANT values
•Like JavaScript objects or MongoDB documents
•Self-describing, compact binary serialization
•Designed for fast key-value lookup, comparison, and hashing
•Supported by all SQL operators (joins, group by, sort…)

25
Post-relational Operations
•Extraction from VARIANTs using path syntax
SELECT sensor:measure.value, sensor:measure.unit
FROM sensor_events
WHERE sensor:type::string = 'THERMOMETER';

•Flattening (pivoting) a single OBJECT or ARRAY into multiple rows

SELECT p.contact:name.first AS "first_name",
       p.contact:name.last AS "last_name",
       (f.value:type::string || ': ' || f.value:contact::string) AS "contact"
FROM person p,
     LATERAL FLATTEN(input => p.contact) f;
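The semantics of the two queries can be mimicked over plain Python dicts and lists. Both helpers are hypothetical stand-ins, not Snowflake's implementation: `get_path` follows a dotted path and returns `None` for a missing step (analogous, we assume, to SQL NULL), and `flatten` turns one OBJECT or ARRAY into one output row per element. The sample documents are made up for illustration.

```python
from typing import Any, Iterator


def get_path(doc: Any, path: str) -> Any:
    """Follow a dotted path like 'measure.value'; return None if any step is missing."""
    for step in path.split("."):
        if isinstance(doc, dict):
            doc = doc.get(step)
        elif isinstance(doc, list) and step.isdigit() and int(step) < len(doc):
            doc = doc[int(step)]  # arrays are offset-addressable
        else:
            return None
    return doc


def flatten(value: Any) -> Iterator:
    """Yield one (key_or_index, element) row per element of an OBJECT or ARRAY."""
    if isinstance(value, dict):
        yield from value.items()
    elif isinstance(value, list):
        yield from enumerate(value)


events = [{"type": "THERMOMETER", "measure": {"value": 21.5, "unit": "C"}},
          {"type": "HYGROMETER", "measure": {"value": 40, "unit": "%"}}]
readings = [(get_path(e, "measure.value"), get_path(e, "measure.unit"))
            for e in events if get_path(e, "type") == "THERMOMETER"]
print(readings)  # [(21.5, 'C')]

phones = [{"type": "work", "contact": "555-0100"}, {"type": "home", "contact": "555-0199"}]
print([f"{v['type']}: {v['contact']}" for _, v in flatten(phones)])
# ['work: 555-0100', 'home: 555-0199']
```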

26
Schema-Less Data
•Cloudera Impala, Google BigQuery/Dremel
•Columnar storage and processing of semi-structured data
•But: full schema required up front!

•Snowflake introduces automatic type inference and columnar storage for schema-less data (VARIANT)
•Frequently occurring paths are detected, projected out, and stored in separate (typed
and compressed) columns in the table file
•Collect metadata on these columns for use by optimizer → pruning
•Independent for each micro-partition → schema evolution
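The path-detection step can be sketched in Python for a single micro-partition. The selection rule here (a path must appear in at least 80% of documents and always hold the same scalar type) is a hypothetical heuristic, not Snowflake's actual one. Chosen paths become typed columns with min/max statistics for pruning; everything else would stay in the VARIANT. Because the choice is made per micro-partition, different partitions can end up with different projected columns as the data evolves.

```python
from collections import Counter


def leaf_paths(doc: dict, prefix: str = ""):
    """Yield (path, value) for every non-OBJECT leaf of a nested document."""
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from leaf_paths(value, path)
        else:
            yield path, value


def columnarize(docs: list, min_fraction: float = 0.8):
    """Project frequent, single-typed scalar paths into separate columns with min/max stats."""
    flat = [dict(leaf_paths(d)) for d in docs]
    counts, types = Counter(), {}
    for row in flat:
        for path, value in row.items():
            counts[path] += 1
            types.setdefault(path, set()).add(type(value))
    chosen = sorted(
        p for p, n in counts.items()
        if n / len(docs) >= min_fraction
        and len(types[p]) == 1
        and next(iter(types[p])) in (int, float, str)
    )
    columns = {p: [row.get(p) for row in flat] for p in chosen}
    # Per-column min/max feeds pruning, just as for ordinary relational columns.
    stats = {p: (min(v for v in vals if v is not None), max(v for v in vals if v is not None))
             for p, vals in columns.items()}
    return columns, stats


docs = [{"ts": 1, "user": {"id": "a"}, "extra": True},
        {"ts": 5, "user": {"id": "b"}},
        {"ts": 3, "user": {"id": "c"}, "note": "x"}]
columns, stats = columnarize(docs)
print(columns)  # {'ts': [1, 5, 3], 'user.id': ['a', 'b', 'c']}
print(stats)    # {'ts': (1, 5), 'user.id': ('a', 'c')}
```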

27
Automatic Columnarization of Semi-Structured Data
[Figure: semi-structured data (e.g. JSON, Avro, XML) and structured data (e.g. CSV, TSV, …) both loaded and queried with > SELECT … FROM …]
•Native support: loaded in raw form (e.g. JSON, Avro, XML)
•Optimized storage: optimized data type, no fixed schema or transformation required
•Optimized SQL querying: full benefit of database optimizations (pruning, filtering, …)
28
Schema-Less Performance

29
ETL vs. ELT
•ETL = Extract-Transform-Load
•Classic approach: extract from source systems, run through some
transformations (perhaps using Hadoop), then load into relational DW
•ELT = Extract-Load-Transform
•Schema-later or schema-never: extract from source systems, leave in
or convert to JSON or XML, load into DW, transform there if desired
•Decouples information producers from information consumers

•Snowflake: ELT with speed and expressiveness of RDBMS

30
Time Travel and Cloning
•Previous versions of data automatically retained
•Same metadata as Snapshot Isolation
•Accessed via SQL extensions
•UNDROP recovers from accidental deletion
•SELECT AT for point-in-time selection
•CLONE [AT] to recreate past versions
> SELECT * FROM mytable AT T0
[Figure: timeline T0, T1, T2 showing new data and modified data across table versions]
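Point-in-time selection and cloning can be sketched in Python on top of versioned file sets. The `VersionedTable` class, integer timestamps, and method names are hypothetical, and commits are assumed to arrive in increasing time order. `at(ts)` returns the latest version committed at or before `ts`, and `clone` copies only metadata: the clone references the same immutable files, so no data is duplicated.

```python
import bisect


class VersionedTable:
    """Table history as parallel lists of commit times and immutable file sets."""

    def __init__(self):
        self.times = []      # commit timestamps, assumed strictly increasing
        self.file_sets = []  # frozenset of file names visible after each commit

    def commit(self, ts: int, files: set) -> None:
        self.times.append(ts)
        self.file_sets.append(frozenset(files))

    def at(self, ts: int) -> frozenset:
        # Latest version committed at or before ts (point-in-time selection).
        i = bisect.bisect_right(self.times, ts) - 1
        if i < 0:
            raise LookupError("no version at or before this time")
        return self.file_sets[i]

    def clone(self, ts=None) -> "VersionedTable":
        # Metadata-only copy: the clone points at the same immutable files.
        snapshot = self.file_sets[-1] if ts is None else self.at(ts)
        copy = VersionedTable()
        copy.commit(self.times[-1] if ts is None else ts, snapshot)
        return copy


t = VersionedTable()
t.commit(0, {"f1"})          # T0: new data
t.commit(10, {"f1", "f2"})   # T1: more new data
t.commit(20, {"f2", "f3"})   # T2: f1's rows modified into f3
print(sorted(t.at(15)))      # ['f1', 'f2']
dev = t.clone(ts=5)          # clone the table as of T0
print(sorted(dev.at(5)))     # ['f1']
```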

31
Security
•Encrypted data import and export
•Encryption of table data using NIST 800-57 compliant
hierarchical key management and key lifecycle
•Root keys stored in hardware security module (HSM)
•Integration of S3 access policies
•Role-based access control (RBAC) within SQL
•Two-factor authentication and federated authentication

32
Post-SIGMOD ’16 Features

•Data sharing
•Serverless ingestion of data
•Reclustering of data
•Spark connector with pushdown
•Support for Azure Cloud
•Lots more connectors

33
Lessons Learned
•Building a relational DW was a controversial decision in 2012
•But turned out correct; Hadoop did not replace RDBMSs
•Multi-cluster, shared-data architecture was a game changer for organizations
•Business units can provision warehouses on-demand
•Fewer data silos
•Dramatically lower load times and higher load frequency
•Semi-structured extensions were a bigger hit than expected
•People use Snowflake to replace Hadoop clusters

34
Lessons Learned (2)
•SaaS model dramatically helped speed of development
•Only one platform to develop for
•Every user running the same version
•Bugs can be analyzed, reproduced, and fixed very quickly
•Users love “no tuning” aspect
•But creates continuous stream of hard engineering challenges…
•Core performance less important than anticipated
•Elasticity matters more in practice

35
Ongoing Challenges
•SaaS and multi-tenancy are big challenges
•Support tens of thousands of concurrent users, some of whom do
weird things and need protection from themselves
•Metadata layer has become huge
•Categorizing and handling failures automatically is hard, but
•Automation is key to keeping operations lean
•Lots of work left to do
•SQL performance improvements, better skew handling etc.
•Cloud platform enables a slew of new classes of features.

36
Future work
•Advisors
•Materialized Views
•Stored procedures
•Data Lake support
•Streaming
•Time series
•Multi-cloud
•Global Snowflake
•Replication

37
Who We Are

•Founded: August 2012

•Mission in 2012: Build an enterprise data warehouse as a cloud
service
•HQ in downtown San Mateo (south of San Francisco), engineering
office #2 in Seattle
•400+ employees, 80 engineers and hiring…
•Founders: Benoit Dageville, Thierry Cruanes, Marcin Zukowski
•CEO: Bob Muglia
•Raised $283M in 2018

38
Summary
•Snowflake is an enterprise-ready data warehouse as a service
•Novel multi-cluster, shared-data architecture
•Highly elastic and available
•Semi-structured and schema-less data at the speed of relational data
•Pure SaaS experience
•Rapidly growing user base and data volume
•Lots of challenging work left to do

39