Presentation 1

Traditional data platform architectures like shared-disk and shared-nothing aimed to improve scalability but faced limitations. Shared-disk relied on complex locking mechanisms that caused bottlenecks. Shared-nothing distributed storage and compute but introduced data shuffling overhead between nodes, making it challenging to balance resources. NoSQL solutions also adopted shared-nothing architectures and excelled at flexible storage of nonrelational data types, but faced similar scalability challenges due to their distributed designs.


CLOUD COMPUTING OVERVIEW

• ON-DEMAND AVAILABILITY OF DATA STORAGE AND COMPUTING POWER CHARACTERIZES CLOUD COMPUTING.
• PRIMARY BENEFIT: NO DIRECT USER INVOLVEMENT IN RESOURCE MANAGEMENT.
• ADDITIONAL BENEFITS: VIRTUALLY UNLIMITED STORAGE, AUTOMATIC UPDATES, INSTANT SCALABILITY, HIGH SPEED, AND COST REDUCTIONS.
• THE RECENT CLOUD COMPUTING EXPLOSION, LED BY AMAZON REDSHIFT, GOOGLE BIGQUERY, AND MICROSOFT AZURE DATA WAREHOUSE, ACCELERATED THE DECLINE OF ON-PREMISES DATA CENTERS.
EVOLUTION OF SNOWFLAKE IN CLOUD
ENVIRONMENT

• MAJOR DATA WAREHOUSE PROVIDERS LIKE ORACLE AND IBM ADAPTED TRADITIONAL SOLUTIONS TO THE CLOUD.
• SNOWFLAKE, BUILT NATIVELY FOR THE CLOUD, ORIGINATED AS A DISRUPTIVE CLOUD DATA WAREHOUSE.
• SNOWFLAKE HAS EVOLVED BEYOND BEING A MODERN DATA WAREHOUSE.
SNOWFLAKE'S ACHIEVEMENTS AND
RECOGNITION

• SNOWFLAKE'S NOTABLE RECOGNITION INCLUDES WINNING THE 2015 STRATA + HADOOP WORLD STARTUP COMPETITION.
• NAMED A “COOL VENDOR” IN GARTNER’S 2015 COOL VENDORS IN DBMS REPORT.
• 2019 RANKINGS: NUMBER 2 ON FORBES CLOUD 100 LIST, NUMBER 1 ON
LINKEDIN’S U.S. LIST OF TOP STARTUPS.
• ON SEPTEMBER 16, 2020, SNOWFLAKE ACHIEVED THE LARGEST SOFTWARE
IPO IN HISTORY.
SNOWFLAKE DATA CLOUD PLATFORM

• SNOWFLAKE DATA CLOUD PLATFORM ENABLES VARIOUS WORKLOADS, INCLUDING DATA ENGINEERING, DATA ANALYTICS, AND CYBERSECURITY.
• SUPPORTS DATA LAKE, DATA COLLABORATION, DATA SCIENCE, UNISTORE,
AND TRADITIONAL DATA WAREHOUSE WORKLOADS.
• “MANY DATA WORKLOADS, ONE PLATFORM” APPROACH FACILITATES QUICK
VALUE DERIVATION FROM GROWING DATA SETS SECURELY.
SNOWFLAKE'S JOURNEY AND FOUNDING
VISION

• FOUNDED IN 2012 WITH A VISION TO BUILD A CLOUD-NATIVE DATA WAREHOUSE FOR LIMITLESS INSIGHTS FROM DIVERSE DATA.
• COMMERCIALLY AVAILABLE IN 2015, SNOWFLAKE DISRUPTED THE DATA
WAREHOUSING MARKET WITH ITS UNIQUE ARCHITECTURE.
• DEMOCRATIZED DATA ANALYTICS BY MAKING DATA ENGINEERING
BUSINESS-ORIENTED, LESS TECHNICAL, AND TIME-EFFICIENT.
SNOWFLAKE'S UNIQUE ARCHITECTURE

• SNOWFLAKE OFFERS NEAR-ZERO MANAGEMENT CAPABILITY, REDUCING ADMINISTRATIVE AND MANAGEMENT OVERHEAD.
• EACH CHAPTER INCLUDES SQL EXAMPLES, ALLOWING USERS TO EXPLORE
SNOWFLAKE'S FUNCTIONALITY.
• THE ARCHITECTURE SUPPORTS A DEMOCRATIZED APPROACH TO DATA
ANALYTICS BY ELIMINATING TECHNICAL BARRIERS.
SNOWFLAKE'S SECURE DATA SHARING AND
ORGANIZATIONS

• SNOWFLAKE INTRODUCED SECURE DATA SHARING CAPABILITIES IN 2018, ENABLING SECURE DATA SHARING ACROSS BUSINESS ECOSYSTEMS.
• IN 2021, SNOWFLAKE INTRODUCED ORGANIZATIONS FOR MANAGING
MULTIPLE ACCOUNTS, FACILITATING A MULTICLOUD STRATEGY.
• SECURE DATA SHARING OPENS POSSIBILITIES FOR MONETIZING DATA
ASSETS.
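The sharing flow described above can be sketched in SQL. This is a minimal, hedged sketch: the database (sales_db), share (sales_share), table, and consumer account identifier below are hypothetical placeholders, not names from this presentation.

```sql
-- Create a share and grant it read access to one table (hypothetical names).
CREATE SHARE sales_share;
GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share;

-- Make the share visible to a consumer account in another organization.
ALTER SHARE sales_share ADD ACCOUNTS = consumer_org.consumer_account;
```

The consumer then creates a database from the share and queries the live data directly, with no data copied or moved.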
SNOWFLAKE'S SNOWPARK AND SECURITY
DATA LAKE

• SNOWPARK, INTRODUCED IN 2021, IS A DEVELOPER FRAMEWORK ENABLING JAVA, SCALA, OR PYTHON CODE DEPLOYMENT IN A SERVERLESS MANNER.
• THE SECURITY DATA LAKE, INTRODUCED IN 2022, EMPOWERS
CYBERSECURITY AND COMPLIANCE TEAMS WITH FULL VISIBILITY INTO
SECURITY LOGS.
• BOTH SNOWPARK AND THE SECURITY DATA LAKE ENHANCE DATA
ENGINEERING AND MACHINE LEARNING CAPABILITIES.
SNOWFLAKE WEB USER INTERFACES

• TWO SNOWFLAKE WEB USER INTERFACES ARE AVAILABLE: THE CLASSIC CONSOLE AND SNOWSIGHT (INTRODUCED IN 2021).
• SNOWSIGHT IS THE DEFAULT UI IN NEWLY CREATED SNOWFLAKE ACCOUNTS
SINCE ITS INTRODUCTION.
• EXPECTED TRANSITION: SNOWSIGHT BECOMES THE DEFAULT UI FOR ALL
ACCOUNTS BY EARLY 2023, WITH THE CLASSIC CONSOLE DEPRECATED SOON
AFTER.
• HANDS-ON EXAMPLES IN THIS PRESENTATION WILL BE COMPLETED IN
SNOWSIGHT UNLESS OTHERWISE SPECIFIED.
CONTINUOUS IMPROVEMENT OF SNOWSIGHT

• SNOWSIGHT, INTRODUCED IN 2021, IS SUBJECT TO CONTINUOUS IMPROVEMENT LIKE OTHER SNOWFLAKE FEATURES.
• SLIGHT DEVIATIONS IN SCREENSHOTS MAY OCCUR AS SNOWSIGHT
UNDERGOES ENHANCEMENTS.
• FREQUENT RELEASES: SNOWFLAKE DEPLOYS TWO SCHEDULED RELEASES
WEEKLY AND ONE BEHAVIOR CHANGE RELEASE MONTHLY.
• REFER TO SNOWFLAKE DOCUMENTATION FOR MORE INFORMATION ON
RELEASES.
CREATING AND MANAGING THE SNOWFLAKE ARCHITECTURE
EVOLUTION OF DATA PLATFORM
ARCHITECTURES
• A DECADE AGO, DATA PLATFORM ARCHITECTURES LACKED SCALABILITY,
HINDERING SIMULTANEOUS DATA SHARING FOR DATA-DRIVEN TEAMS,
REGARDLESS OF TEAM SIZE OR PROXIMITY TO DATA.
• INCREASED DEMAND FOR SCALABLE SOLUTIONS AND GOVERNED DATA
ACCESS LED TO MODIFICATIONS IN EXISTING DATA PLATFORM
ARCHITECTURES.
• THE SHEER NUMBER AND COMPLEXITY OF PLATFORMS, COUPLED WITH
DATA-INTENSIVE APPLICATIONS, PERSISTED AS CHALLENGES UNTIL THE
INTRODUCTION OF SNOWFLAKE.
• SNOWFLAKE EMERGED AS AN EVOLUTIONARY MODERN DATA PLATFORM,
UNIQUELY SOLVING THE SCALABILITY PROBLEM.
SNOWFLAKE'S EVOLUTIONARY IMPACT

• SNOWFLAKE, AS AN EVOLUTIONARY MODERN DATA PLATFORM, SUCCESSFULLY SOLVED THE SCALABILITY PROBLEM.
• COMPARED TO TRADITIONAL CLOUD DATA PLATFORM ARCHITECTURES,
SNOWFLAKE OFFERS FASTER DATA STORAGE AND PROCESSING, ENHANCED
USABILITY, AND GREATER AFFORDABILITY.
• SNOWFLAKE'S DATA CLOUD PROVIDES A DISTINCTIVE USER EXPERIENCE,
COMBINING A NEW SQL QUERY ENGINE WITH AN INNOVATIVE
ARCHITECTURE PURPOSE-BUILT FOR THE CLOUD.
TRADITIONAL DATA
PLATFORM ARCHITECTURES
AGENDA
1. INTRODUCTION TO TRADITIONAL DATA PLATFORM ARCHITECTURES
   1. BRIEF OVERVIEW OF TRADITIONAL DATA PLATFORM ARCHITECTURES.
   2. EXAMINATION OF SCALABILITY IMPROVEMENTS ATTEMPTED IN THEIR DESIGN.
2. LIMITATIONS OF TRADITIONAL ARCHITECTURES
   1. DISCUSSION ON THE LIMITATIONS INHERENT IN TRADITIONAL DATA PLATFORM ARCHITECTURES.
3. UNIQUENESS OF SNOWFLAKE DATA CLOUD ARCHITECTURE
   1. EXPLORATION OF WHAT SETS SNOWFLAKE DATA CLOUD ARCHITECTURE APART.
   2. UNDERSTANDING HOW SNOWFLAKE ADDRESSES THE SCALABILITY CHALLENGE.
4. DEEP DIVE INTO SNOWFLAKE ARCHITECTURE LAYERS
   1. IN-DEPTH EXPLORATION OF THE THREE KEY SNOWFLAKE ARCHITECTURE LAYERS:
      1. CLOUD SERVICES LAYER
      2. QUERY PROCESSING (VIRTUAL WAREHOUSE) COMPUTE LAYER
      3. CENTRALIZED (HYBRID COLUMNAR) DATABASE STORAGE LAYER
SHARED-DISK (SCALABLE) ARCHITECTURE
• SHARED-DISK ARCHITECTURE, AN EARLY SCALING APPROACH, CENTRALIZES DATA STORAGE FOR ACCESSIBILITY FROM MULTIPLE DATABASE CLUSTER NODES. CONSISTENT DATA AVAILABILITY FOR EACH CLUSTER NODE IS ENSURED BECAUSE ALL DATA MODIFICATIONS ARE WRITTEN TO THE SHARED DISK.
• KNOWN FOR SIMPLICITY IN DATA MANAGEMENT, THIS TRADITIONAL
DATABASE DESIGN RELIES ON COMPLEX ON-DISK LOCKING MECHANISMS
FOR DATA CONSISTENCY, LEADING TO BOTTLENECKS.
• DATA CONCURRENCY CHALLENGES ARISE, HINDERING MULTIPLE USERS
FROM AFFECTING TRANSACTIONS SIMULTANEOUSLY. THE ADDITION OF
MORE COMPUTE NODES EXACERBATES ISSUES IN A SHARED-DISK
ARCHITECTURE, LIMITING ITS TRUE SCALABILITY.
SHARED-DISK ARCHITECTURE CHALLENGES

• SHARED-DISK ARCHITECTURE, KNOWN FOR ITS SIMPLICITY, RELIES ON COMPLEX ON-DISK LOCKING MECHANISMS FOR DATA CONSISTENCY, CAUSING BOTTLENECKS.
• DATA CONCURRENCY ISSUES HINDER MULTIPLE USERS FROM AFFECTING
TRANSACTIONS SIMULTANEOUSLY.
• THE ADDITION OF MORE COMPUTE NODES EXACERBATES CHALLENGES,
LIMITING THE TRUE SCALABILITY OF THIS TRADITIONAL DATABASE DESIGN.
LIMITATIONS OF SHARED-DISK
ARCHITECTURE

• SHARED-DISK ARCHITECTURE'S SIMPLICITY IN THEORY IS MARRED BY COMPLEX ON-DISK LOCKING MECHANISMS, LEADING TO BOTTLENECKS.
• DATA CONCURRENCY CHALLENGES ARISE, IMPEDING MULTIPLE USERS
FROM CONCURRENTLY AFFECTING TRANSACTIONS.
• THE SCALABILITY OF THIS TRADITIONAL DESIGN IS CONSTRAINED, AND
ADDING MORE COMPUTE NODES COMPOUNDS THE EXISTING PROBLEMS.
INTRODUCTION TO SHARED-NOTHING ARCHITECTURE
• SHARED-NOTHING ARCHITECTURE, ILLUSTRATED IN FIGURE 2-2, SCALES STORAGE AND COMPUTE TOGETHER, EVOLVING FROM THE LIMITATIONS OF SHARED-DISK ARCHITECTURE.
• THIS ARCHITECTURE EMERGED AS STORAGE COSTS BECAME MORE
AFFORDABLE, ADDRESSING THE BOTTLENECK CREATED BY THE SHARED-
DISK APPROACH.
• DISTRIBUTED CLUSTER NODES, ACCOMPANIED BY DISK STORAGE, CPU, AND
MEMORY, NECESSITATE DATA SHUFFLING BETWEEN NODES, INTRODUCING
OVERHEAD.
• THE DISTRIBUTION METHOD OF DATA ACROSS NODES DETERMINES THE
EXTENT OF ADDITIONAL OVERHEAD, MAKING IT CHALLENGING TO STRIKE
THE RIGHT BALANCE BETWEEN STORAGE AND COMPUTE.
CHALLENGES OF SHARED-NOTHING
ARCHITECTURE
• SHARED-NOTHING ARCHITECTURE SCALES STORAGE AND COMPUTE
TOGETHER, ADDRESSING SHARED-DISK BOTTLENECKS BUT INTRODUCING
CHALLENGES.
• DESPITE AFFORDABLE STORAGE COSTS, SHUFFLING DATA BETWEEN
DISTRIBUTED CLUSTER NODES ADDS OVERHEAD.
• BALANCING STORAGE AND COMPUTE IS CHALLENGING, AND THE
DISTRIBUTION METHOD OF DATA ACROSS NODES IMPACTS THE EXTENT OF
ADDITIONAL OVERHEAD.
• RESIZING A CLUSTER, THOUGH POSSIBLE, IS TIME-CONSUMING, LEADING
ORGANIZATIONS TO OFTEN OVERPROVISION RESOURCES, RESULTING IN
UNUSED CAPACITY.
BALANCING STORAGE AND COMPUTE IN
SHARED-NOTHING ARCHITECTURE
• SHARED-NOTHING ARCHITECTURE AIMS TO BALANCE STORAGE AND
COMPUTE, RESPONDING TO SHARED-DISK BOTTLENECKS.
• AFFORDABLE STORAGE COSTS FACILITATED THIS ARCHITECTURE'S
EVOLUTION, REDUCING THE SHARED-DISK BOTTLENECK.
• CHALLENGES ARISE FROM DATA SHUFFLING OVERHEAD BETWEEN
DISTRIBUTED CLUSTER NODES, IMPACTING THE BALANCE BETWEEN
STORAGE AND COMPUTE.
• RESIZING A CLUSTER IS A TIME-CONSUMING PROCESS, LEADING
ORGANIZATIONS TO OVERPROVISION SHARED-NOTHING RESOURCES.
OVERVIEW OF NOSQL SOLUTIONS

• MOST NOSQL SOLUTIONS ADOPT A SHARED-NOTHING ARCHITECTURE AND THEREFORE SHARE ITS LIMITATIONS.
• NOSQL SOLUTIONS EXCEL IN STORING NONRELATIONAL DATA WITHOUT
REQUIRING DATA TRANSFORMATION.
• NOSQL SYSTEMS TYPICALLY OPERATE WITHOUT SCHEMAS, PROVIDING
FLEXIBILITY IN DATA STORAGE.
• THE TERM NOSQL STANDS FOR "NOT ONLY SQL," EMPHASIZING ITS
VERSATILITY FOR VARIOUS DATA TYPES LIKE EMAIL, WEB LINKS, SOCIAL
MEDIA POSTS, TWEETS, ROAD MAPS, AND SPATIAL DATA.
TYPES OF NOSQL DATABASES

• FOUR TYPES OF NOSQL DATABASES EXIST: DOCUMENT STORES, KEY-VALUE (KV) STORES,
COLUMN FAMILY DATA STORES (WIDE COLUMN DATA STORES), AND GRAPH DATABASES.
• DOCUMENT-BASED NOSQL DATABASES LIKE MONGODB STORE DATA IN JSON OBJECTS, USING
KEY-VALUE PAIR-LIKE STRUCTURES.
• KEY-VALUE DATABASES, EXEMPLIFIED BY DYNAMODB, ARE EFFECTIVE FOR CAPTURING
CUSTOMER BEHAVIOR IN SPECIFIC SESSIONS.
• CASSANDRA, AN EXAMPLE OF A COLUMN-BASED DATABASE, LOGICALLY GROUPS DYNAMIC
COLUMNS INTO COLUMN FAMILIES.
• GRAPH-BASED DATABASES, INCLUDING NEO4J AND AMAZON NEPTUNE, EXCEL IN
RECOMMENDATION ENGINES AND SOCIAL NETWORKS, IDENTIFYING PATTERNS OR
RELATIONSHIPS AMONG DATA POINTS.
LIMITATIONS AND USE CASES OF NOSQL
• NOSQL SOLUTIONS FACE LIMITATIONS, PARTICULARLY IN PERFORMING
CALCULATIONS INVOLVING MANY RECORDS, SUCH AS AGGREGATIONS,
WINDOW FUNCTIONS, AND ARBITRARY ORDERING.
• NOSQL EXCELS IN QUICK CRUD OPERATIONS (CREATE, READ, UPDATE,
DELETE) FOR INDIVIDUAL ENTRIES BUT IS NOT RECOMMENDED FOR AD HOC
ANALYSIS.
• SPECIALIZED SKILL SETS ARE REQUIRED FOR NOSQL ALTERNATIVES, AND
THEY MAY NOT BE COMPATIBLE WITH MOST SQL-BASED TOOLS.
• IT'S CRUCIAL TO NOTE THAT NOSQL SOLUTIONS ARE NOT REPLACEMENTS FOR DATA WAREHOUSES AND MAY NOT PERFORM WELL FOR ANALYTICS.
THE SNOWFLAKE
ARCHITECTURE
TRADITIONAL DATA PLATFORMS VS.
SNOWFLAKE APPROACH

• IMPROVED TRADITIONAL DATA PLATFORMS, ESPECIALLY ON-PREMISES IMPLEMENTATIONS, STRUGGLED TO ADDRESS MODERN DATA PROBLEMS AND SCALABILITY ISSUES.
• THE SNOWFLAKE TEAM OPTED FOR A UNIQUE APPROACH, BUILDING AN
ENTIRELY NEW, MODERN DATA PLATFORM EXCLUSIVELY FOR THE CLOUD.
• SNOWFLAKE'S DESIGN PHYSICALLY SEPARATES BUT LOGICALLY
INTEGRATES STORAGE, COMPUTE, SECURITY, AND MANAGEMENT,
ENABLING CONCURRENT SHARING OF LIVE DATA.
SNOWFLAKE HYBRID-MODEL ARCHITECTURE
OVERVIEW
• THE SNOWFLAKE HYBRID-MODEL ARCHITECTURE CONSISTS OF THREE
LAYERS: THE CLOUD SERVICES LAYER, THE COMPUTE LAYER, AND THE DATA
STORAGE LAYER (REFER TO FIGURE 2-3).
• SNOWFLAKE'S DESIGN EMPHASIZES PHYSICAL SEPARATION YET LOGICAL
INTEGRATION OF STORAGE AND COMPUTE, COMPLEMENTED BY SERVICES
LIKE SECURITY AND MANAGEMENT.
• THROUGHOUT THE UPCOMING CHAPTERS, WE'LL DELVE INTO THE UNIQUE
FEATURES THAT MAKE SNOWFLAKE THE SOLE ARCHITECTURE CAPABLE OF
ENABLING THE DATA CLOUD.
LAYERS OF SNOWFLAKE HYBRID-MODEL
ARCHITECTURE
• THE SNOWFLAKE HYBRID-MODEL ARCHITECTURE FEATURES THREE
LAYERS: CLOUD SERVICES, COMPUTE, AND DATA STORAGE
• EACH LAYER, ALONG WITH THE THREE SNOWFLAKE CACHES, WILL BE
EXPLORED IN MORE DETAIL IN THE FOLLOWING SECTIONS.
THE CLOUD SERVICES LAYER
INTRODUCTION TO SNOWFLAKE CLOUD SERVICES LAYER
• ALL INTERACTIONS WITH DATA IN A SNOWFLAKE INSTANCE COMMENCE IN THE CLOUD SERVICES LAYER, ALSO KNOWN AS THE GLOBAL SERVICES LAYER.
• THE SNOWFLAKE CLOUD SERVICES LAYER ORCHESTRATES VARIOUS
ACTIVITIES, INCLUDING AUTHENTICATION, ACCESS CONTROL, ENCRYPTION,
INFRASTRUCTURE MANAGEMENT, METADATA HANDLING, QUERY PARSING,
AND OPTIMIZATION.
• OFTEN REFERRED TO AS THE SNOWFLAKE BRAIN, THE CLOUD SERVICES
LAYER SEAMLESSLY COORDINATES USER REQUESTS, STARTING FROM LOGIN
INITIATION.
CLOUD SERVICES LAYER OPERATIONS
• THE CLOUD SERVICES LAYER, ALSO KNOWN AS THE GLOBAL SERVICES
LAYER, MANAGES AUTHENTICATION, ACCESS CONTROL, ENCRYPTION,
INFRASTRUCTURE, METADATA, QUERY PARSING, AND OPTIMIZATION.
• OFTEN REFERRED TO AS THE SNOWFLAKE BRAIN, THIS LAYER
COLLABORATES WITH VARIOUS SERVICE COMPONENTS TO HANDLE USER
REQUESTS FROM LOGIN INITIATION.
• IT SERVES AS THE GATEWAY FOR SQL CLIENT INTERFACES, ENABLING DATA DEFINITION LANGUAGE (DDL) AND DATA MANIPULATION LANGUAGE (DML) OPERATIONS ON DATA.
USER INTERACTION WITH CLOUD SERVICES
LAYER
• EVERY INTERACTION WITH DATA IN SNOWFLAKE INITIATES IN THE CLOUD
SERVICES LAYER, THE GLOBAL SERVICES LAYER.
• USER REQUESTS, STARTING FROM LOGIN INITIATION, ARE EFFICIENTLY
HANDLED BY THE CLOUD SERVICES LAYER, OFTEN REGARDED AS THE
SNOWFLAKE BRAIN.
• FOR SNOWFLAKE QUERIES, THE SQL QUERY UNDERGOES OPTIMIZATION IN
THE CLOUD SERVICES LAYER BEFORE BEING SENT TO THE COMPUTE LAYER
FOR PROCESSING.
• THE CLOUD SERVICES LAYER SERVES AS THE FOUNDATION FOR SQL CLIENT
INTERFACES, FACILITATING DATA DEFINITION LANGUAGE (DDL) AND DATA
MANIPULATION LANGUAGE (DML) OPERATIONS ON DATA.
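The DDL/DML split described above can be illustrated with a minimal sketch; the table name below is a hypothetical placeholder. Per the layering in this presentation, the cloud services layer parses and optimizes each statement, while DML execution runs on a virtual warehouse.

```sql
-- DDL: a metadata operation handled by the cloud services layer.
CREATE TABLE demo_customers (id INTEGER, name STRING);

-- DML: optimized in the cloud services layer, executed on a virtual warehouse.
INSERT INTO demo_customers VALUES (1, 'Acme');

-- Query: parsed and optimized in cloud services, processed by compute.
SELECT id, name FROM demo_customers;
```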
DATA SECURITY IN CLOUD SERVICES LAYER

• THE CLOUD SERVICES LAYER TAKES CHARGE OF DATA SECURITY, ENCOMPASSING ASPECTS LIKE DATA SHARING SECURITY.
• OPERATING ACROSS MULTIPLE AVAILABILITY ZONES IN EACH CLOUD
PROVIDER REGION, THE SNOWFLAKE CLOUD SERVICES LAYER HOUSES THE
RESULT CACHE—A CACHED COPY OF EXECUTED QUERY RESULTS.
• ESSENTIAL METADATA FOR QUERY OPTIMIZATION AND DATA FILTERING IS
ALSO STORED WITHIN THE CLOUD SERVICES LAYER.
ARCHITECTURE OF CLOUD SERVICES LAYER

• THE SNOWFLAKE CLOUD SERVICES LAYER SPANS MULTIPLE AVAILABILITY ZONES IN EACH CLOUD PROVIDER REGION.
• IT MANAGES DATA SECURITY, INCLUDING DATA SHARING SECURITY, AND
HOUSES THE RESULT CACHE—A CACHED COPY OF EXECUTED QUERY
RESULTS.
• CRITICAL METADATA ESSENTIAL FOR QUERY OPTIMIZATION AND DATA
FILTERING IS STORED WITHIN THE CLOUD SERVICES LAYER.
SCALING AND AUTOMATION IN CLOUD
SERVICES LAYER

• SCALING OF THE CLOUD SERVICES LAYER, LIKE OTHER SNOWFLAKE LAYERS, OCCURS INDEPENDENTLY.
• THE CLOUD SERVICES LAYER'S SCALING IS AUTOMATED, REQUIRING NO
DIRECT MANIPULATION BY SNOWFLAKE END USERS.
SNOWFLAKE PRICING AND CLOUD SERVICES
USAGE

• MOST CLOUD SERVICES CONSUMPTION IS INTEGRATED INTO SNOWFLAKE PRICING, WITH OVERAGE BILLING ONLY WHEN CLOUD SERVICES USAGE EXCEEDS 10% OF DAILY COMPUTE CREDIT USAGE.
• DAILY COMPUTE CREDIT USAGE CALCULATIONS ARE BASED ON THE UTC
TIME ZONE, PROVIDING CLARITY FOR USERS.
FACTORS INFLUENCING CLOUD SERVICES
COSTS

• QUERIES AND DDL OPERATIONS, BEING METADATA OPERATIONS, UTILIZE CLOUD SERVICES RESOURCES.
• CONSIDERATION OF COST IMPLICATIONS IS CRUCIAL WHEN EXCEEDING 10%
OF DAILY COMPUTE CREDIT USAGE.
SCENARIOS IMPACTING CLOUD SERVICES
COSTS

• INCREASED CLOUD SERVICES CONSUMPTION MAY ARISE IN SPECIFIC SCENARIOS.


• USAGE PATTERNS INCLUDE EXECUTING NUMEROUS SIMPLE QUERIES, ESPECIALLY THOSE
INVOLVING SESSION INFORMATION OR SESSION VARIABLES.
• LARGE, COMPLEX QUERIES WITH MULTIPLE JOINS CONTRIBUTE TO HIGHER CLOUD
SERVICES CONSUMPTION.
• SINGLE-ROW INSERTS, AS OPPOSED TO BULK OR BATCH LOADING, RESULT IN INCREASED
CLOUD SERVICES USAGE.
• USING COMMANDS ON INFORMATION_SCHEMA TABLES OR SPECIFIC METADATA-ONLY
COMMANDS, SUCH AS THE SHOW COMMAND, CONSUMES ONLY CLOUD SERVICES
RESOURCES.
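The metadata-only pattern in the last bullet can be sketched as follows; the deck's claim is that such statements consume only cloud services resources. The SNOWFLAKE_SAMPLE_DATA database is Snowflake's standard sample-data share; availability in a given account is an assumption.

```sql
-- SHOW is a metadata-only command served by the cloud services layer;
-- it runs without a virtual warehouse.
SHOW TABLES IN DATABASE snowflake_sample_data;

-- Querying INFORMATION_SCHEMA views is also metadata-driven and leans
-- heavily on cloud services resources.
SELECT table_name, row_count
FROM snowflake_sample_data.information_schema.tables
LIMIT 10;
```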
INVESTIGATION AND OPTIMIZATION TIPS

• IF EXPERIENCING HIGHER-THAN-EXPECTED COSTS FOR CLOUD SERVICES, CONSIDER INVESTIGATING THE FOLLOWING SCENARIOS.
• EVALUATE THE USAGE OF SEVERAL SIMPLE QUERIES, ESPECIALLY THOSE ACCESSING SESSION
INFORMATION OR USING SESSION VARIABLES.
• ASSESS THE IMPACT OF LARGE, COMPLEX QUERIES WITH MULTIPLE JOINS ON CLOUD SERVICES
CONSUMPTION.
• VERIFY THE INFLUENCE OF SINGLE-ROW INSERTS, CONTRASTING WITH BULK OR BATCH LOADING, ON
CLOUD SERVICES USAGE.
• EXAMINE THE UTILIZATION OF COMMANDS ON INFORMATION_SCHEMA TABLES OR SPECIFIC METADATA-
ONLY COMMANDS, LIKE THE SHOW COMMAND.
• INVESTIGATE PARTNER TOOLS, INCLUDING THOSE LEVERAGING THE JDBC DRIVER, FOR POTENTIAL
OPTIMIZATION OPPORTUNITIES.
ECONOMIC AND STRATEGIC CONSIDERATIONS

• WHILE THE CLOUD SERVICES COST FOR A SPECIFIC USE CASE MAY BE HIGH, THERE ARE INSTANCES WHERE INCURRING THIS COST MAKES ECONOMIC AND STRATEGIC SENSE.
• LEVERAGING THE RESULT CACHE FOR QUERIES, PARTICULARLY FOR LARGE OR
COMPLEX QUERIES, CAN RESULT IN ZERO COMPUTE COST FOR THAT SPECIFIC QUERY.
• MAKING THOUGHTFUL CHOICES ABOUT THE FREQUENCY AND GRANULARITY OF DATA
DEFINITION LANGUAGE (DDL) COMMANDS, ESPECIALLY FOR USE CASES LIKE
CLONING, AIDS IN ACHIEVING A BALANCED APPROACH BETWEEN CLOUD SERVICES
CONSUMPTION COSTS AND VIRTUAL WAREHOUSE COSTS, ULTIMATELY LEADING TO AN
OVERALL COST REDUCTION.
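The result-cache behavior mentioned above can be sketched with the sample data share (assuming SNOWFLAKE_SAMPLE_DATA is available in the account): rerunning an identical query whose underlying data has not changed is served from the result cache rather than recomputed.

```sql
-- First run: computed on a virtual warehouse.
SELECT COUNT(*) FROM snowflake_sample_data.tpch_sf1.lineitem;

-- Identical second run: served from the result cache in the cloud
-- services layer, so the query incurs no warehouse compute.
SELECT COUNT(*) FROM snowflake_sample_data.tpch_sf1.lineitem;

-- RESULT_SCAN reuses the previous result set directly as a table.
SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));
```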
THE QUERY PROCESSING (VIRTUAL WAREHOUSE)
COMPUTE LAYER
INTRODUCTION TO SNOWFLAKE COMPUTE
CLUSTERS
• A SNOWFLAKE COMPUTE CLUSTER, OFTEN REFERRED TO AS A VIRTUAL
WAREHOUSE, IS A DYNAMIC CLUSTER OF COMPUTE RESOURCES,
ENCOMPASSING CPU, MEMORY, AND TEMPORARY STORAGE.
• CREATING VIRTUAL WAREHOUSES IN SNOWFLAKE LEVERAGES COMPUTE
CLUSTERS—VIRTUAL MACHINES IN THE CLOUD THAT ARE PROVISIONED
SEAMLESSLY BEHIND THE SCENES.
• SNOWFLAKE ENSURES TRANSPARENCY IN THE PROCESS, AS IT DOESN'T
DISCLOSE THE EXACT SERVER IN USE, WHICH MAY CHANGE BASED ON
CLOUD PROVIDER MODIFICATIONS.
ROLE OF VIRTUAL WAREHOUSES IN
OPERATIONS

• A RUNNING VIRTUAL WAREHOUSE IS ESSENTIAL FOR MOST SQL QUERIES AND ALL DATA MANIPULATION LANGUAGE (DML) OPERATIONS, ENCOMPASSING TASKS LIKE DATA LOADING, UNLOADING, AND ROW UPDATES.
• CERTAIN SQL QUERIES CAN BE EXECUTED WITHOUT THE NEED FOR A
VIRTUAL WAREHOUSE, A CONCEPT WE WILL EXPLORE FURTHER IN THE
DISCUSSION ON THE QUERY RESULT CACHE.
SEPARATION OF STORAGE AND COMPUTE
• SNOWFLAKE'S UNIQUE ARCHITECTURE ALLOWS FOR THE SEPARATION OF
STORAGE AND COMPUTE WITHIN VIRTUAL WAREHOUSES.
• EACH VIRTUAL WAREHOUSE OPERATES INDEPENDENTLY, ACCESSING THE
SAME DATA WITHOUT CONTENTION OR PERFORMANCE IMPACT ON OTHER
WAREHOUSES.
• THIS INDEPENDENCE IS ILLUSTRATED IN THE FIGURE SHOWING SNOWFLAKE'S COMPUTE (VIRTUAL WAREHOUSE) LAYER.
DYNAMIC NATURE OF VIRTUAL WAREHOUSES

• VIRTUAL WAREHOUSES IN SNOWFLAKE ARE CONSTANTLY CONSUMING CREDITS WHEN RUNNING IN A SESSION.
• HOWEVER, THEY CAN BE STARTED, STOPPED, AND RESIZED AT ANY TIME,
EVEN WHILE ACTIVELY RUNNING.
• SNOWFLAKE SUPPORTS TWO SCALING METHODS: SCALING UP BY RESIZING
A WAREHOUSE AND SCALING OUT BY ADDING CLUSTERS TO A WAREHOUSE.
SCALING METHODS AND CONFIGURATION
CONSISTENCY

• UNLIKE THE MULTITENANT ARCHITECTURE OF THE CLOUD SERVICES LAYER AND DATA STORAGE LAYER, THE SNOWFLAKE VIRTUAL WAREHOUSE LAYER IS NOT MULTITENANT.
• SNOWFLAKE PREDETERMINES THE CPU, MEMORY, AND SOLID-STATE DRIVE
(SSD) CONFIGURATIONS FOR EACH NODE IN A VIRTUAL WAREHOUSE,
MAINTAINING CONSISTENCY ACROSS ALL THREE CLOUD PROVIDERS.
MANAGING VIRTUAL WAREHOUSE CREDITS
AND SCALING OPTIONS

• A VIRTUAL WAREHOUSE IS ALWAYS CONSUMING CREDITS WHEN RUNNING IN A SESSION.
• SNOWFLAKE VIRTUAL WAREHOUSES CAN BE STARTED, STOPPED, AND
RESIZED AT ANY TIME, EVEN WHILE ACTIVELY RUNNING.
• SNOWFLAKE PROVIDES TWO SCALING METHODS: SCALING UP BY RESIZING
A WAREHOUSE AND SCALING OUT BY ADDING CLUSTERS TO A WAREHOUSE.
• IT IS POSSIBLE TO USE ONE OR BOTH SCALING METHODS SIMULTANEOUSLY.
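The two scaling methods above can be sketched in SQL; the warehouse name is a hypothetical placeholder, and multicluster settings assume an edition that supports multicluster warehouses.

```sql
-- Scaling up: resize the warehouse to add per-cluster resources.
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'LARGE';

-- Scaling out: let Snowflake add or remove clusters automatically
-- between the configured bounds as concurrency rises and falls.
ALTER WAREHOUSE analytics_wh SET MIN_CLUSTER_COUNT = 1, MAX_CLUSTER_COUNT = 3;
```

Both statements can be issued while the warehouse is running, which is how the two methods are used simultaneously.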
ARCHITECTURE DISTINCTION IN VIRTUAL
WAREHOUSES

• UNLIKE THE MULTITENANT ARCHITECTURE OF THE CLOUD SERVICES LAYER AND DATA STORAGE LAYER, THE SNOWFLAKE VIRTUAL WAREHOUSE LAYER (AS SHOWN IN FIGURE 2-5) IS NOT A MULTITENANT ARCHITECTURE.
• SNOWFLAKE PREDETERMINES CPU, MEMORY, AND SOLID-STATE DRIVE (SSD)
CONFIGURATIONS FOR EACH NODE IN A VIRTUAL WAREHOUSE, ENSURING
CONSISTENCY ACROSS ALL THREE CLOUD PROVIDERS.
VIRTUAL WAREHOUSE SIZE
• A SNOWFLAKE COMPUTE CLUSTER IS DEFINED BY ITS SIZE, INDICATING THE
NUMBER OF SERVERS IN THE VIRTUAL WAREHOUSE CLUSTER.
• AS THE VIRTUAL WAREHOUSE SIZE INCREASES, THE NUMBER OF COMPUTE
RESOURCES ON AVERAGE DOUBLES IN CAPACITY (SEE TABLE 2-1).
• BEYOND THE 4X-LARGE SIZE, A DIFFERENT APPROACH IS EMPLOYED TO
DETERMINE THE NUMBER OF SERVERS PER CLUSTER.
• DESPITE THE CHANGE IN APPROACH FOR EXTREMELY LARGE VIRTUAL
WAREHOUSES, THE CREDITS PER HOUR STILL INCREASE BY A FACTOR OF 2.
VIRTUAL WAREHOUSE RESIZING AND COST
CONSIDERATIONS
• RESIZING A VIRTUAL WAREHOUSE TO A LARGER SIZE, OFTEN REFERRED TO
AS SCALING UP, IS PRIMARILY AIMED AT ENHANCING QUERY PERFORMANCE
AND MANAGING SUBSTANTIAL WORKLOADS.
• THE DETAILS OF THIS PROCESS WILL BE EXPLORED IN GREATER DEPTH IN
THE UPCOMING SECTION.
• TIP: SNOWFLAKE'S PER-SECOND BILLING MODEL OFTEN MAKES IT COST-
EFFECTIVE TO OPERATE LARGER VIRTUAL WAREHOUSES, ESPECIALLY
CONSIDERING THE ABILITY TO SUSPEND THEM DURING PERIODS OF
INACTIVITY. HOWEVER, AN EXCEPTION ARISES WHEN RUNNING NUMEROUS
SMALL OR BASIC QUERIES ON LARGE VIRTUAL WAREHOUSE SIZES. IN SUCH
CASES, THE ADDITIONAL RESOURCES MAY NOT YIELD BENEFITS,
IRRESPECTIVE OF THE NUMBER OF CONCURRENT QUERIES.
SCALING UP A VIRTUAL WAREHOUSE TO PROCESS LARGE
DATA VOLUMES AND COMPLEX QUERIES
• IN AN IDEAL SCENARIO (A SIMPLE, IDENTICAL WORKLOAD PER TEST), USING AN X-SMALL VIRTUAL WAREHOUSE WOULD INCUR THE SAME TOTAL COST AS USING A 4X-LARGE VIRTUAL WAREHOUSE, WITH THE SOLE DIFFERENCE BEING A REDUCTION IN TIME TO COMPLETION.
• HOWEVER, REALITY INTRODUCES COMPLEXITIES. SEVERAL FACTORS IMPACT VIRTUAL WAREHOUSE
PERFORMANCE, INCLUDING THE NUMBER OF CONCURRENT QUERIES, QUERIED TABLES, AND THE SIZE
AND COMPOSITION OF THE DATA.
• SIZING A SNOWFLAKE VIRTUAL WAREHOUSE APPROPRIATELY IS CRUCIAL. INSUFFICIENT RESOURCES
DUE TO A SMALL VIRTUAL WAREHOUSE MAY LEAD TO PROLONGED QUERY COMPLETION TIMES, WHILE
A VIRTUAL WAREHOUSE BEING TOO LARGE FOR SMALL QUERIES COULD HAVE NEGATIVE COST
IMPLICATIONS.
• RESIZING A SNOWFLAKE VIRTUAL WAREHOUSE IS A MANUAL PROCESS, AND IT CAN BE DONE WHILE
QUERIES ARE RUNNING, WITHOUT THE NEED TO STOP OR SUSPEND THE VIRTUAL WAREHOUSE.
RESIZING AFFECTS SUBSEQUENT QUERIES, WITH ONGOING QUERIES FINISHING ON THE ORIGINAL SIZE.
CREATING VIRTUAL WAREHOUSES IN SNOWFLAKE
• IN SNOWFLAKE, VIRTUAL WAREHOUSES CAN BE CREATED VIA THE USER INTERFACE OR WITH SQL.
• CONSIDERATIONS WHEN CREATING A NEW VIRTUAL WAREHOUSE INCLUDE
KNOWING ITS NAME, DESIRED SIZE, AND DECIDING WHETHER IT SHOULD BE
A MULTICLUSTER VIRTUAL WAREHOUSE.
• A MULTICLUSTER VIRTUAL WAREHOUSE IN SNOWFLAKE ENABLES
AUTOMATIC SCALING IN AND OUT, A CONCEPT EXPLORED IN MORE DETAIL
IN THE NEXT SECTION.
• WHEN USING THE SNOWSIGHT USER INTERFACE TO CREATE A VIRTUAL
WAREHOUSE (AS SHOWN IN FIGURE 2-7), INFORMATION SUCH AS THE
VIRTUAL WAREHOUSE NAME AND SIZE IS REQUIRED.
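The SQL route mentioned in the first bullet can be sketched as follows; the warehouse name and property values are hypothetical choices, not settings prescribed by this presentation.

```sql
-- Create a warehouse with an explicit size and sensible idle behavior.
CREATE WAREHOUSE analytics_wh
  WITH WAREHOUSE_SIZE = 'XSMALL'
       AUTO_SUSPEND = 300          -- seconds of inactivity before suspending
       AUTO_RESUME = TRUE          -- restart automatically on the next query
       INITIALLY_SUSPENDED = TRUE; -- do not consume credits until first use
```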
RESIZING AND MANAGING VIRTUAL
WAREHOUSES
• ONCE A VIRTUAL WAREHOUSE IS CREATED, IT CAN BE RESIZED, EITHER SCALING UP OR DOWN.
RESIZING IS A MANUAL PROCESS THAT INVOLVES EDITING THE EXISTING VIRTUAL WAREHOUSE.
• NOTE: RESIZING CAN BE DONE AT ANY TIME, EVEN WHILE THE VIRTUAL WAREHOUSE IS RUNNING AND
PROCESSING STATEMENTS. IT DOES NOT IMPACT STATEMENTS CURRENTLY BEING EXECUTED.
• RESIZING TO A LARGER SIZE MAKES THE NEW, LARGER VIRTUAL WAREHOUSE IMMEDIATELY
AVAILABLE FOR QUEUED QUERIES AND FUTURE STATEMENTS.
• SCALING A VIRTUAL WAREHOUSE CAN BE ACCOMPLISHED THROUGH THE USER INTERFACE OR BY
USING SQL STATEMENTS IN THE WORKSHEET.
• ADVANCED VIRTUAL WAREHOUSE OPTIONS, SUCH AS AUTO SUSPEND AND AUTO RESUME (AS SHOWN
IN FIGURE 2-8), CAN BE CONFIGURED. THESE ARE ENABLED BY DEFAULT.
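The resize and suspend/resume operations described above look like this in SQL; the warehouse name is a hypothetical placeholder.

```sql
-- Resize while running: queued and future statements use the new size,
-- while in-flight statements finish on the original size.
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'MEDIUM';

-- Adjust the advanced options shown in the UI.
ALTER WAREHOUSE analytics_wh SET AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;

-- Manual suspend/resume, e.g. around a known idle period.
ALTER WAREHOUSE analytics_wh SUSPEND;
ALTER WAREHOUSE analytics_wh RESUME;
```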
CONSIDERATIONS FOR VIRTUAL WAREHOUSE
SIZE
• LARGER VIRTUAL WAREHOUSES DON'T AUTOMATICALLY ENSURE IMPROVED
PERFORMANCE FOR QUERY PROCESSING OR DATA LOADING.
• THE KEY CONSIDERATION FOR SELECTING A VIRTUAL WAREHOUSE SIZE IS
THE COMPLEXITY OF QUERIES. THE TIME TAKEN TO EXECUTE COMPLEX
QUERIES IS TYPICALLY LONGER THAN THAT FOR SIMPLE QUERIES.
• DATA LOADING OR UNLOADING PERFORMANCE IS ALSO INFLUENCED BY
FACTORS SUCH AS THE NUMBER AND SIZE OF FILES. FOR DATA LOADING,
THE NUMBER OF FILES AND THEIR INDIVIDUAL SIZES SIGNIFICANTLY
IMPACT PERFORMANCE.
USE CASE CONSIDERATIONS

• CAREFULLY ASSESS YOUR USE CASE BEFORE CHOOSING A VIRTUAL WAREHOUSE LARGER THAN LARGE. QUERY COMPLEXITY AND DATA LOADING SPECIFICS PLAY CRUCIAL ROLES IN DETERMINING THE OPTIMAL SIZE.
• AN EXCEPTION TO THIS GUIDELINE IS WHEN BULK-LOADING HUNDREDS OR
THOUSANDS OF FILES CONCURRENTLY.
SCALING OUT WITH MULTICLUSTER VIRTUAL
WAREHOUSES TO MAXIMIZE CONCURRENCY
