VAST-Tree: A Vector-Advanced and
Compressed Structure
for Massive Data Tree Traversal

         EDBT 2012, March 27th-29th, 2012
         Humboldt University, Berlin




Outline
• Backgrounds & Motivation
  – Modern HW and HW-aware algorithms

• Prerequisite Knowledge
  – Search keys with SIMD instructions

• Proposed Technique
  – Branch compression for high parallelization

• Experimental Results
  – Twitter Public Timeline as a real data set
  – Compression ratio and throughput

Backgrounds: Modern Hardware
• Fast and highly-functional Hardware
  – Multi-/Many-core CPUs
    Intel Ivy Bridge/Haswell/Skylake/Knights Ferry
  – GPUs for General Purpose
  – ...


• New algorithms advanced by this hardware
  – Sorts, Searches, Compression, and DB kernels

         Today's topic: tree searches on multi-core CPUs
Backgrounds: Multi-core CPUs
• Highly-advanced instructions
  – 128/256/512-bit SIMD, Transactional Memory, ...


• Branch Prediction
  – Process “if-then” paths efficiently
  – High penalties for branch mispredictions


• Parallelism & Memory
  – Many cores on a single processor
  – Limited by memory accesses [5][14][15]
Backgrounds: Tree Traversal
• Search for a key in a sequence of values

• Fundamental operations
     – Used everywhere, and well-known

  Code Snippet:
      if (search_key > node->compare_key)
          node = node->right;
      else
          node = node->left;

  Ex.) Binary-Tree
  [Figure: a binary tree with root 48, children 12 and 68, and leaves 7 and 20, traversed with search_key]
Backgrounds: Tree Traversal
• But, legacy algorithms are too inefficient
  – Actual execution time: only 20-40% of the total

  [Figure: ratio of execution time (complete instructions, stall time, branch penalties; 0-100%) and # of instructions (0.0E+00 to 6.0E+03) versus log2(# of keys); x-axis ticks 22(0.161), 24(0.167), 26(0.206), 28(0.319)]
Backgrounds: Existing Algorithms
• Cache-conscious B+Tree [4][10][11][19][20]
  – Realigning, prefetching, and buffering nodes


• FAST [14]
  – Cache-conscious and branch-free techniques
  – SIMD instructions used for branch-free searches


• PALM [24]
  – Supports incremental updates for FAST
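
The "branch-free" idea mentioned for FAST can be illustrated on a plain sorted array. The following is only a minimal sketch, not FAST itself: the function name `lower_bound_branchfree` is made up for illustration, and whether the compiler really emits a conditional move instead of a jump depends on the compiler and flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first element >= key in the sorted array a[0..n),
 * or n when every element is smaller. Hypothetical helper for illustration. */
size_t lower_bound_branchfree(const int32_t *a, size_t n, int32_t key)
{
    assert(a != NULL || n == 0);
    if (n == 0)
        return 0;
    const int32_t *base = a;
    while (n > 1) {
        size_t half = n / 2;
        /* The comparison yields 0 or 1; multiplying by half selects the
         * upper or lower part without an if/else on the key value. */
        base += (size_t)(base[half] < key) * half;
        n -= half;
    }
    return (size_t)(base - a) + (size_t)(base[0] < key);
}
```

The loop condition depends only on `n`, so the remaining branch is easy to predict; the data-dependent decision is folded into pointer arithmetic.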

Prerequisite Knowledge:
Tree traversal with SIMD instructions




Prerequisite Knowledge: Searches with SIMD
• Process multiple data with SIMD instructions
  – Most x86 processors support 128-bit SIMD
  – Return 1 or 0 per element according to an inequality relation

• FAST compares 3 keys simultaneously

               128-bit registers, four 32-bit elements each:

               Register A:   34      78   91       x   (keys)
               Register B:   79      79   79       x   (search key)
               Register C:    1       1    0       x   (1 where A < B)
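
The register picture above might be written with SSE2 intrinsics roughly as follows. This is a sketch under assumptions: the function name `simd_compare3` is hypothetical, keys are taken to be signed 32-bit values, and the result is returned as a bit mask (bit i set when key i is smaller than the search key) rather than as a register. A scalar fallback is included for targets without SSE2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* keys must point to 4 readable values; the 4th element is padding ("x").
 * Returns a 3-bit mask: bit i is set when keys[i] < search_key. */
int simd_compare3(const int32_t keys[4], int32_t search_key)
{
    assert(keys != NULL);
    int mask = 0;
#if defined(__SSE2__)
    __m128i a = _mm_loadu_si128((const __m128i *)keys);  /* Register A */
    __m128i b = _mm_set1_epi32(search_key);              /* Register B */
    __m128i c = _mm_cmpgt_epi32(b, a);                   /* Register C: B > A */
    mask = _mm_movemask_ps(_mm_castsi128_ps(c));         /* one bit per element */
#else
    for (int i = 0; i < 4; i++)
        mask |= (keys[i] < search_key) << i;
#endif
    return mask & 0x7;   /* ignore the padding element */
}
```

For the keys 34, 78, 91 and the search key 79, the first two comparisons are true, so the mask has bits 0 and 1 set.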
Logical Example: Searches with SIMD
  79 : A search key
  [Figure: a tree of SIMD blocks with child blocks labeled 1 to 4; the keys in a highlighted block are compared simultaneously]

• Compare keys with SIMD
• Look up the returned values in a lookup table to get the offset in blocks

                                     Returned        Offset
                                     Values          Blocks
                                     ...             ...
                                     1 1 0 x         3
                                     ...             ...

• Move to a next SIMD block
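
One way to picture the lookup table is as a small array indexed by the compare result. The sketch below is illustrative only and rests on assumptions: the three keys of a block are stored in ascending order, the mask has bit i set when key i is smaller than the search key (so the slide's "1 1 0" is the value 3), and the table returns a zero-based child number. The names `CHILD_OF_MASK` and `child_from_mask` are hypothetical.

```c
#include <assert.h>

/* Indexed by the 3-bit compare mask. With ascending keys only four masks
 * can occur; -1 marks combinations that should never appear. */
static const int CHILD_OF_MASK[8] = {
    0,          /* 000: search key <= every key     -> child 0 */
    1,          /* 001: greater than key 0 only     -> child 1 */
    -1,
    2,          /* 011: greater than keys 0 and 1   -> child 2 */
    -1, -1, -1,
    3           /* 111: greater than all three keys -> child 3 */
};

int child_from_mask(int mask)
{
    assert(mask >= 0 && mask < 8);
    return CHILD_OF_MASK[mask];
}
```

For the root block, child 2 is the third child, which corresponds to the offset of 3 blocks shown in the table on the slide.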
Physical Example: Searches with SIMD
 • Arrange SIMD blocks in breadth-first order in
   physically consecutive memory

  [34, 78, 91], [2, 11, 23], [35, 39, 49], [80, 87, 88], ...
                               To high addresses in memory

  Each SIMD block is 12B, so moving from the first block to [80, 87, 88] is a 36B offset jump
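
The 36B jump can be reproduced with the usual index arithmetic for a breadth-first array. This sketch assumes a complete tree in which every block has 4 children and blocks are packed back to back; the real layout may add blocking or padding on top of this. The names `child_block_index` and `block_byte_offset` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { FANOUT = 4, BLOCK_BYTES = 12 };   /* 3 keys of 4 bytes per block */

/* Index of a child block when blocks are stored in breadth-first order. */
size_t child_block_index(size_t block, int child)
{
    assert(child >= 0 && child < FANOUT);
    return block * FANOUT + 1 + (size_t)child;
}

/* Byte offset of a block from the start of the array. */
size_t block_byte_offset(size_t block)
{
    return block * BLOCK_BYTES;
}
```

Starting at the root (block 0) and taking the third child (child number 2) gives block 3, that is, byte offset 3 × 12 = 36.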
Issue: Number of Comparison Keys
• More keys compared simultaneously!
  – SIMD supports 1-byte and 2-byte elements

  [Figure: a 128-bit register holds 16 elements of 1 byte each, or 8 elements of 2 bytes each]
A proposed technique:
Branch compression for high parallelization




VAST-Tree: Designing Data Structure
• Classify branches into 3 layers
  – Apply FAST to P32, and compress keys in P16 and P8

  [Figure: the tree split into three layers of heights H32, H16, and H8]
    (H32) SBs - SIMD blocks
    (H16) CBs - Compression blocks: 2-byte keys, 7 keys compared simultaneously
    (H8)  CBs - Compression blocks: 1-byte keys, 15 keys compared simultaneously
Detail Outline: VAST-Tree
• Branch Lossy Compression
  – Comparison Errors
• SIMD Aligned Layouts
• Other topics ...




Proposed: Branch Lossy Compression
• Apply to each compression block
   – Prefix and suffix bit truncation

• Transform ‘search’ keys similarly for comparison
   – Extracted bit location stored in the header of CBs

  [Figure: keys in a CB in ascending order; the partial bits with a red background are extracted and the lower bits are removed, yielding 1-byte keys]
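
Prefix and suffix truncation can be sketched as choosing a bit window per compression block and applying it to every key. The code below is an assumption-laden illustration, not the paper's exact procedure: the prefix is taken to be the leading bits shared by the smallest and largest key of the block, the window keeps the next 8 or 16 bits, and the search key is assumed to share that prefix. The type `cb_header` and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Header of a compression block (CB): where the extracted bits are. */
typedef struct {
    unsigned shift;   /* number of low bits removed (suffix truncation) */
    unsigned width;   /* number of bits kept: 8 or 16 */
} cb_header;

/* Choose the bit window from the smallest and largest key in the block. */
cb_header cb_make_header(uint32_t min_key, uint32_t max_key, unsigned width)
{
    assert(min_key <= max_key && (width == 8 || width == 16));
    uint32_t diff = min_key ^ max_key;   /* bits above the top set bit are a common prefix */
    unsigned significant = 0;            /* number of bits below the common prefix */
    while (diff != 0) {
        significant++;
        diff >>= 1;
    }
    cb_header h;
    h.width = width;
    h.shift = significant > width ? significant - width : 0;
    return h;
}

/* Apply the same extraction to a stored key or to a search key. */
uint32_t cb_extract(cb_header h, uint32_t key)
{
    return (key >> h.shift) & ((1u << h.width) - 1u);
}
```

For a block whose keys span 100 to 4000, 12 bits differ, so with 1-byte keys the lower 4 bits are removed; 3220 and 3219 both become 201.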
Penalty: Comparison Errors
• But, some lossy keys are compared incorrectly
   Example)
        value1 - 3220 (1100 1001 0100 in binary; upper 8 bits = 201)
        value2 - 3219 (1100 1001 0011 in binary; upper 8 bits = 201)

         Original Values: 3220 > 3219 --> Return 0
     Compressed Values: 201 ≤ 201 --> Return 1
                          An error happens!

• Check and correct errors after tree traversal
  – Scan leaf nodes sequentially
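
The error in the example, and a sequential correction over the leaves, might look like the sketch below. It assumes the comparison returns 1 when the first value is less than or equal to the second, that the lossy form simply drops the low bits, and that the leaf keys form one sorted array in which the traversal produced a nearby guess. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Exact comparison as on the slide: 1 when a <= b. */
int cmp_exact(uint32_t a, uint32_t b)
{
    return a <= b;
}

/* The same comparison after removing the low `shift` bits of both values. */
int cmp_lossy(uint32_t a, uint32_t b, unsigned shift)
{
    assert(shift < 32);
    return (a >> shift) <= (b >> shift);
}

/* Corrects a guessed position by scanning the sorted leaf keys sequentially.
 * Returns the index of the first leaf key >= key (n if there is none). */
size_t correct_position(const uint32_t *leaf, size_t n, size_t guess, uint32_t key)
{
    assert(guess <= n);
    size_t i = guess;
    while (i > 0 && leaf[i - 1] >= key)   /* guess was too far right */
        i--;
    while (i < n && leaf[i] < key)        /* guess was too far left */
        i++;
    return i;
}
```

With 4 low bits removed, 3220 and 3219 compare as equal, so the lossy result differs from the exact one; the scan then moves the guessed position by at most a few keys when the guess is close.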
Proposed: SIMD-Aligned Layouts
• Load data efficiently to SIMD registers

• A few padding spaces between blocks
  – Many blanks caused by page alignment in FAST

  [Figure: SBs and CBs laid out so that each block is SIMD-length aligned, with padding spaces between blocks]
Proposed: Other Topics
• Linear search optimization
  – Remove bottom SBs

• Apply P4Delta to leaf nodes
  – A lossless compression method
  – Compress fixed k keys into a chunk

  [Figure: keys in leaf nodes grouped into runs of k keys, each compressed into a single chunk]
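
P4Delta-style schemes work on the differences between neighboring sorted keys (d-gaps): most gaps fit in a small fixed number of bits, and the few that do not are handled as exceptions. The sketch below shows only that first step and is not the actual chunk format; bit packing and exception storage are omitted, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Replaces sorted keys by d-gaps in place; keys[0] is kept as the base. */
void to_dgaps(uint32_t *keys, size_t n)
{
    for (size_t i = n; i-- > 1; ) {       /* walk backwards so keys[i-1] is still original */
        assert(keys[i] >= keys[i - 1]);   /* keys must be sorted */
        keys[i] -= keys[i - 1];
    }
}

/* Inverse of to_dgaps: a running sum restores the original keys. */
void from_dgaps(uint32_t *gaps, size_t n)
{
    for (size_t i = 1; i < n; i++)
        gaps[i] += gaps[i - 1];
}

/* Number of d-gaps (base excluded) that do not fit in `bits` bits. */
size_t count_exceptions(const uint32_t *gaps, size_t n, unsigned bits)
{
    assert(bits >= 1 && bits < 32);
    size_t count = 0;
    for (size_t i = 1; i < n; i++)
        if (gaps[i] >> bits)
            count++;
    return count;
}
```

For the keys 100, 103, 104, 404, 406 the d-gaps are 3, 1, 300, 2; with 8 bits per gap only the gap of 300 would be an exception.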
Experimental Results




Setup: Synthetic and Realistic Data Sets
• Twitter Public Timeline data
  – May, 2010 to Apr., 2011
  – Twitter Ids and Timestamps
  – 36,068,948 entries (nearly equal to 2^25)

• Synthetic data
  – Follow a Poisson distribution

  [Figure: ratio (0.0 to 1.0) versus d-gaps (0 to 10) for Twitter - Ids and Twitter - Timestamps]
Results: Compression Ratio – Branch Nodes

  [Figure: compression ratio of branch nodes for VAST-Tree parameters (H32, H16); the best setting is marked "Best"]
Results: Compression Ratio – Leaf Nodes


  [Figure: compression ratio of leaf nodes; annotations: "Minimize Error Penalty" and "Improve Compression"]
Results: Throughput – Synthetic Data
  [Figure: throughput (0.0E+00 to 1.0E+08, higher is better) versus log2(# of keys) = 22, 24, 26, 28, for VAST-Tree w/o P4Delta, VAST-Tree w/ P4Delta, FAST, and binary trees]
Results: Throughput – Twitter Data

  [Figure: throughput on Twitter data; annotations: "Better" and "Worse"]
Results: Error Ratio
  [Figure: error ratio (0.0 to 1.0) versus Δw in bins 0-, 10-, 100-, 1000-, 10000-, for 1/λ = 16, 1/λ = 64, 1/λ = 256, Twitter - Ids, and Twitter - Timestamps; annotation: "Better"]
Summary & Future Work
• Proposed lossy compression for high parallelization
   – Linear search opt., leaf compression, and others

• Experimental Evaluation
   – Compress branch nodes dynamically
   – Improve throughput and compression ratio
   – Throughput worsened by leaf compression

• Future Work
   – Support for updates, and larger numbers of keys