An Introduction to Snowflake - SQLKonferenz

An introduction to Snowflake - the data cloud
Johan Ludvig Brattås
Deloitte
Agenda
• A short history
• Overview
• Snowflake as a DB
• Integrations
• Snowpark
The cloud data warehouse
• Initially a response to the challenges faced by traditional RDBMSs
• Massively Parallel Processing (MPP)
• Still a take on EDW


The cloud data platform
Can data lake functionality and EDW merge somehow?
Suggestions for solving the issues:
• Logical data warehouse
• Cloud data warehouse
• Virtualization
Enter the new cloud data platforms
Definition of a cloud data platform
• No longer just your Dad-a-base…
• Storage supporting diverse data types
• Compute and tools supporting diverse workloads
• Tooling for CI/CD, encryption, RBAC etc.
• Data management tools
Snowflake
• Established in 2012
• Launched publicly in 2015
• Record IPO in 2020
• Unique architecture with fully separated
storage and compute
• Based on ANSI SQL
• Started as a data warehousing service
Snowflake vs Databricks
• Snowflake comes from the EDW world
• Databricks comes from Spark data science and Python data engineering
• The two are converging as both have added new features
Snowflake vs Databricks
• Handbags at dawn
The Snowflake Architecture
• The core Snowflake platform
• Storage
• Compute
• Cloud Services
• Snowgrid
Storage
• Databases for ACID + RDBMS
• Automated partitioning
• Time travel
• Autotuned
• Internal stage for semi-structured & unstructured data
• External stages to on-prem & cloud
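To make the time travel bullet concrete, here is a minimal sketch that builds a query reading a table as it looked some seconds ago, using Snowflake's `AT(OFFSET => ...)` clause. The helper name and the table name are hypothetical, and how far back you can go depends on the retention period configured for the table.

```python
def time_travel_query(table: str, seconds_ago: int) -> str:
    """Build a SELECT that reads `table` as it was `seconds_ago` seconds ago."""
    if seconds_ago < 0:
        raise ValueError("seconds_ago must be non-negative")
    # AT(OFFSET => -N) takes a negative offset, in seconds, from the current time.
    return f"SELECT * FROM {table} AT(OFFSET => -{seconds_ago})"


# Hypothetical table name, for illustration only.
print(time_travel_query("sales.public.orders", 300))
```

The generated statement can be run from any Snowflake client; the sketch only assembles the text and does not connect to an account.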
Storage
• Cloud stages support S3, GCS & ADLS
• On-prem: S3-compatible storage only
• External stages support
• JSON/XML/CSV…
• Avro/Parquet…
• Apache Iceberg
• Delta Lake
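As a rough illustration of an external stage, the sketch below assembles a `CREATE STAGE` statement for a cloud bucket. The helper, the stage name and the bucket URL are assumptions made up for the example. A real stage usually also needs credentials or a storage integration, and Iceberg and Delta Lake are set up differently from the plain file formats covered here.

```python
# File format types accepted in FILE_FORMAT = (TYPE = ...).
SUPPORTED_TYPES = {"CSV", "JSON", "AVRO", "ORC", "PARQUET", "XML"}


def create_external_stage(stage: str, url: str, file_type: str) -> str:
    """Build a CREATE STAGE statement pointing at external cloud storage."""
    file_type = file_type.upper()
    if file_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported file type: {file_type}")
    return (
        f"CREATE STAGE IF NOT EXISTS {stage} "
        f"URL = '{url}' "
        f"FILE_FORMAT = (TYPE = {file_type})"
    )


# Hypothetical stage name and bucket; s3://, gcs:// and azure:// are the URL schemes.
print(create_external_stage("raw_events", "s3://example-bucket/events/", "parquet"))
```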
Storage
• Create External Tables
• Build materialized views on
semi-structured data
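A minimal sketch of the two bullets above: one helper builds an external table over a stage path, the other a materialized view that pulls one field out of the semi-structured records. All object names are hypothetical. The view relies on the fact that external tables expose each record in a VARIANT column called `VALUE`; materialized views may also depend on your Snowflake edition.

```python
def create_external_table(table: str, stage_path: str, file_type: str = "PARQUET") -> str:
    """Build a CREATE EXTERNAL TABLE statement over files in a stage path."""
    return (
        f"CREATE EXTERNAL TABLE {table} "
        f"LOCATION = @{stage_path} "
        "AUTO_REFRESH = FALSE "
        f"FILE_FORMAT = (TYPE = {file_type})"
    )


def create_materialized_view(view: str, ext_table: str, field: str, sql_type: str = "STRING") -> str:
    """Build a materialized view that extracts one typed field from VALUE."""
    # VALUE:<field>::<type> walks into the VARIANT record and casts the result.
    return (
        f"CREATE MATERIALIZED VIEW {view} AS "
        f"SELECT VALUE:{field}::{sql_type} AS {field} FROM {ext_table}"
    )


# Hypothetical names, for illustration only.
print(create_external_table("ext_events", "raw_events/2024/"))
print(create_materialized_view("mv_event_types", "ext_events", "event_type"))
```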
Compute
• Called warehouses
• Elastic
• From XS -> 6XL
• 2 types
• Normal
• Snowpark (memory) optimized
• Auto-pause + instant restart
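The warehouse options above map onto a handful of `CREATE WAREHOUSE` properties. The following sketch builds such a statement; the helper and the warehouse names are hypothetical, and what the slide calls auto-pause and instant restart corresponds to the `AUTO_SUSPEND` (seconds of inactivity) and `AUTO_RESUME` properties.

```python
# Size keywords from XS up to 6XL, smallest first.
SIZES = [
    "XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE",
    "XXLARGE", "XXXLARGE", "X4LARGE", "X5LARGE", "X6LARGE",
]


def create_warehouse(name: str, size: str = "XSMALL",
                     snowpark_optimized: bool = False,
                     auto_suspend_s: int = 60) -> str:
    """Build a CREATE WAREHOUSE statement with auto-suspend and auto-resume."""
    if size not in SIZES:
        raise ValueError(f"unknown warehouse size: {size}")
    wh_type = "SNOWPARK-OPTIMIZED" if snowpark_optimized else "STANDARD"
    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {name} "
        f"WAREHOUSE_SIZE = '{size}' "
        f"WAREHOUSE_TYPE = '{wh_type}' "
        f"AUTO_SUSPEND = {auto_suspend_s} "
        "AUTO_RESUME = TRUE"
    )


# Hypothetical warehouses: a small one for BI, a larger memory-optimized one for ML.
print(create_warehouse("bi_wh"))
print(create_warehouse("ml_wh", size="MEDIUM", snowpark_optimized=True, auto_suspend_s=300))
```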
Compute
• Consists of CPU & RAM
• Cache
• Separate warehouses per use case
• Be mindful: auto-pause means the cache is emptied
• Plan around your use case's usage patterns
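The cache trade-off can be shown with a toy model. This is not how Snowflake is implemented; `ToyWarehouse` is a made-up class that only mimics the behaviour described on the slide: data read once is served from the warehouse cache, and suspending the warehouse throws that cache away.

```python
class ToyWarehouse:
    """Toy model of a warehouse whose local cache is lost on suspend."""

    def __init__(self) -> None:
        self.running = True
        self.cache = set()

    def suspend(self) -> None:
        # Auto-pause: compute is released, so the local cache goes with it.
        self.running = False
        self.cache.clear()

    def query(self, partition: str) -> str:
        """Return where the data for `partition` was read from."""
        if not self.running:
            self.running = True  # instant restart on the next query
        if partition in self.cache:
            return "cache"
        self.cache.add(partition)
        return "remote storage"


wh = ToyWarehouse()
print(wh.query("p1"))  # first read has to go to storage
print(wh.query("p1"))  # repeated read is served from the cache
wh.suspend()
print(wh.query("p1"))  # after a pause the cache is cold again
```

A very short auto-pause timeout saves credits but makes cold reads more frequent, which is the reason to look at usage patterns per use case before choosing it.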
Cloud Services
• The central administration and
control layer
• 4 pillars
• Maintenance & tuning
• Administration
• Networking & Encryption
• Resource Manager
Cloud Services – 4 pillars
• Maintenance & tuning
• Common meta-data repository
• Snowflake is “DBA-free”
• Auto-tuning of queries
• Auto-partitioning
• Auto-indexing/“Index-free”
Cloud Services – 4 pillars
• Administration
• Transaction manager
• Security/RBAC
• Authentication & Authorization
• Networking & Encryption
• Intra-cluster
• Cloud connectivity
• Resource Manager
• Cluster management
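For the Security/RBAC bullet, here is a minimal sketch of what role-based access looks like in practice: privileges are granted to a role, and the role is granted to a user. The helper and every object name are hypothetical; the statements themselves are ordinary Snowflake `CREATE ROLE` and `GRANT` commands.

```python
from typing import List


def read_only_grants(role: str, warehouse: str, database: str,
                     schema: str, user: str) -> List[str]:
    """Build the statements that give `user` read-only access through `role`."""
    qualified_schema = f"{database}.{schema}"
    return [
        f"CREATE ROLE IF NOT EXISTS {role}",
        # The role needs compute to run queries on.
        f"GRANT USAGE ON WAREHOUSE {warehouse} TO ROLE {role}",
        # USAGE on the database and schema is required before objects are visible.
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role}",
        f"GRANT USAGE ON SCHEMA {qualified_schema} TO ROLE {role}",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {qualified_schema} TO ROLE {role}",
        # Users never get privileges directly, only roles.
        f"GRANT ROLE {role} TO USER {user}",
    ]


# Hypothetical names, for illustration only.
for statement in read_only_grants("analyst", "bi_wh", "sales", "public", "jane"):
    print(statement + ";")
```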
The Snowflake Architecture
• Snowgrid
• Global Snowflake internal network
• Cloud Agnostic
Integrations
• Integration
• Stages
• External Tables
• Dynamic Tables
• Snowpipes
• Unistore
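Two of the integration options can be sketched as SQL: a Snowpipe that continuously copies new files from a stage into a table, and a dynamic table that keeps a query result refreshed within a target lag. The helpers and all object names are hypothetical, and `AUTO_INGEST` additionally assumes that cloud event notifications are configured for the stage.

```python
def create_pipe(pipe: str, table: str, stage: str, file_type: str = "JSON") -> str:
    """Build a Snowpipe definition that loads new files from a stage."""
    return (
        f"CREATE PIPE IF NOT EXISTS {pipe} AUTO_INGEST = TRUE AS "
        f"COPY INTO {table} FROM @{stage} "
        f"FILE_FORMAT = (TYPE = '{file_type}')"
    )


def create_dynamic_table(name: str, warehouse: str, target_lag: str, select_sql: str) -> str:
    """Build a dynamic table that is refreshed to stay within `target_lag`."""
    return (
        f"CREATE OR REPLACE DYNAMIC TABLE {name} "
        f"TARGET_LAG = '{target_lag}' "
        f"WAREHOUSE = {warehouse} "
        f"AS {select_sql}"
    )


# Hypothetical names, for illustration only.
print(create_pipe("events_pipe", "raw.events", "raw_events"))
print(create_dynamic_table(
    "daily_event_counts", "etl_wh", "5 minutes",
    "SELECT event_date, COUNT(*) AS n FROM raw.events GROUP BY event_date",
))
```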
The Snowflake Architecture
• Snowpark
• Streamlit
Snowpark
• Expands Snowflake from traditional RDBMS
• Python – offers traditional dataframe APIs
• Also ML modelling and operations APIs
• Can run inside warehouses
• Can run on containers (Snowpark Container Services)
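To give a feel for the dataframe style, here is a minimal sketch of a Snowpark transformation. It assumes `session` is an already connected `snowflake.snowpark.Session` (creating one needs the `snowflake-snowpark-python` package and account credentials, which are left out), and the `ORDERS` table with `AMOUNT` and `REGION` columns is hypothetical.

```python
def large_orders_per_region(session, table_name: str = "ORDERS", min_amount: int = 100):
    """Count orders above `min_amount` per region as a lazy Snowpark DataFrame.

    `session` is assumed to be a snowflake.snowpark.Session.
    """
    return (
        session.table(table_name)          # reference the table, no data is read yet
        .filter(f"AMOUNT > {min_amount}")  # filter() also accepts a SQL expression string
        .group_by("REGION")
        .count()                           # one row per region with its row count
    )
```

Nothing runs until an action such as `.collect()` or `.show()` is called on the result; at that point the chain is translated to SQL and executed inside a warehouse.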
Streamlit
• Company acquired by Snowflake in 2022
• Build interactive apps with Python that run on Snowflake
• Web apps, widgets – with unique URLs that can be shared
• Still in public preview
The Snowflake Marketplace
• From the consumer
• Search, discover and sample datasets globally
• Access datasets: some free, some commercial
• No need to run ETL processes to fetch data
• Start querying the data directly inside your own account
• Can combine internal and marketplace data
The Snowflake Marketplace
• From the producer
• Share data with users outside your
organization
• This is done through listings
• Listings can be global or limited to select users/organizations
• Datasets can be a one-off, an update or a stream
• No special development needed
• Listings can be private, free or paid
THANK YOU TO OUR SPONSORS
Johan Ludvig Brattås
Director, Deloitte

Chronic volunteer
Co-organizer – DataSaturday Oslo
President – MDPUG Oslo
Frequent volunteer in general

When not geeking out over new tech
Teaching coeliacs how to bake gluten free
Baking
Hiking
Gardening

/johanludvig
@intoleranse
[email protected]
Thank you very much for your attention.
Vielen Dank für Eure Aufmerksamkeit.
