qCube:
Efficient integration of range query operators over a high
dimension data cube
Rodrigo Rocha Silva
Doctorate Student
Prof. Dr. Celso Massaki Hirata
Advisor
Prof. Dr. Joubert de Castro Lima
Co-Advisor
ITA – INSTITUTO TECNOLÓGICO DE AERONÁUTICA

Electronic Engineering and Computer Science Division - EEC/I
Department of Computer Science
Brazil

Goal
Present a new cube approach designed for high-dimensional range queries.
Our approach, named Query Cube (qCube), implements the Equal, Not Equal,
Greater or Less than, Some, Between and Similar range query operators and
the Distinct, Sub-cube and Top-k Similar inquire query operators.

Wednesday, October 02, 2012

28º Simpósio Brasileiro de Banco de Dados - Rodrigo Rocha Silva


Topics
– Motivation
– Data Cube
– Related Work
– Query Cube (qCube)
– Experiments
– Results
– Conclusions


Motivation
Users need to view data in a tangible way, such as reports,
cross tables and histograms


Motivation
• Suppose that some decision-making process requires the following
information:
“What is the women journal research papers
variance impact, using months {1, 3, 5, 7,
11}, year 2012 and ages varying from 25-40
years? Return results for all countries”

“The average temperatures above 30 degrees
Celsius on the weekends of leap years in the last
200 years.”

Data Cube
A data cube, introduced by Gray et al., 1996, is
a generalization of the group-by operator over all
possible combinations of dimensions with
various granularity aggregates.


Data Cube

A data cube has exponential complexity with respect to the
number of dimensions.
For an input with d dimensions, the output has 2^d cuboids
(one group-by per subset of the dimensions).
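To make the growth concrete, here is a minimal sketch that enumerates every group-by (cuboid) for a list of dimension names. The function name `cuboids` is hypothetical and only serves this illustration.

```python
from itertools import combinations

def cuboids(dimensions):
    """Return every subset of the dimensions; each subset is one group-by."""
    dims = list(dimensions)
    return [combo
            for size in range(len(dims) + 1)
            for combo in combinations(dims, size)]

all_cuboids = cuboids(["A", "B", "C"])
print(len(all_cuboids))   # 8, i.e. 2^3 cuboids for d = 3
for combo in all_cuboids:
    print(combo or "(apex: no dimension kept)")
```

Each extra dimension doubles the number of cuboids, which is why full materialization is impractical for high-dimensional data.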


Data Cube
• Hierarchies
[Figure: dimension hierarchies; level labels shown: Year, Day, Hour, Department, Discipline]


Data Cube
Base Relation R – 11 tuples

A   B   C   COUNT
a1  b1  c1  1
a1  b1  c2  1
a1  b3  c2  1
a2  b1  c1  1
a2  b1  c2  1
a2  b2  c1  1
a2  b2  c2  1
a2  b3  c2  1
a3  b1  c1  1
a3  b1  c2  1
a3  b3  c2  1

Full 3D cube – 38 tuples

A   B   C   COUNT
*   *   *   11
a1  *   *   3
a2  *   *   5
a3  *   *   3
*   b1  *   6
*   b2  *   2
*   b3  *   3
*   *   c1  4
*   *   c2  7
a1  b1  *   2
a1  b3  *   1
a2  b1  *   2
a2  b2  *   2
a2  b3  *   1
a3  b1  *   2
a3  b3  *   1
a1  *   c1  1
a1  *   c2  2
a2  *   c1  2
a2  *   c2  3
a3  *   c1  1
a3  *   c2  2
*   b1  c1  3
*   b1  c2  3
*   b2  c1  1
*   b2  c2  1
*   b3  c2  3
a1  b1  c1  1
a1  b1  c2  1
a1  b3  c2  1
a2  b1  c1  1
a2  b1  c2  1
a2  b2  c1  1
a2  b2  c2  1
a2  b3  c2  1
a3  b1  c1  1
a3  b1  c2  1
a3  b3  c2  1
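The full cube on this slide can be reproduced with a short sketch. It assumes COUNT as the only measure and uses a naive approach in which every base tuple contributes to each of its 2^d generalizations; `full_cube` is a hypothetical name, and this is not an efficient cubing algorithm.

```python
from collections import Counter
from itertools import product

# Base relation R from the slide: 11 tuples over dimensions A, B, C.
R = [
    ("a1", "b1", "c1"), ("a1", "b1", "c2"), ("a1", "b3", "c2"),
    ("a2", "b1", "c1"), ("a2", "b1", "c2"), ("a2", "b2", "c1"),
    ("a2", "b2", "c2"), ("a2", "b3", "c2"), ("a3", "b1", "c1"),
    ("a3", "b1", "c2"), ("a3", "b3", "c2"),
]

def full_cube(rows):
    """COUNT for every cell of every cuboid; '*' marks a rolled-up dimension."""
    cells = Counter()
    d = len(rows[0])
    for row in rows:
        # Keep or roll up each dimension independently: 2^d cells per tuple.
        for mask in product((False, True), repeat=d):
            cell = tuple(v if keep else "*" for v, keep in zip(row, mask))
            cells[cell] += 1
    return cells

cube = full_cube(R)
print(len(cube))                 # 38 cells in the full 3D cube
print(cube[("*", "*", "*")])     # 11
print(cube[("a2", "*", "c2")])   # 3
```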


Related Work – Frag-Cubing Approach
• Partitions the data vertically
• Reduces high-dimensional cube into a set of lower dimensional cubes
• Lossless reduction
• Offers tradeoffs between the amount of pre-processing and the speed of online computation

From book Han and Kamber: Data Mining Concepts and Techniques

Related Work – Frag-Cubing Example
• Let the cube aggregation function be count
tid  A   B   C   D   E
1    a1  b1  c1  d1  e1
2    a1  b2  c1  d2  e1
3    a1  b2  c1  d1  e2
4    a2  b1  c1  d1  e2
5    a2  b1  c1  d1  e3

• Divide the 5 dimensions into 2 shell fragments:
– (A, B, C) and (D, E)
From book Han and Kamber: Data Mining Concepts and Techniques

Related Work – Frag-Cubing 1-D Inverted Indices
• Build a traditional inverted index (RID list)

Attribute Value  TID List   List Size
a1               1 2 3      3
a2               4 5        2
b1               1 4 5      3
b2               2 3        2
c1               1 2 3 4 5  5
d1               1 3 4 5    4
d2               2          1
e1               1 2        2
e2               3 4        2
e3               5          1
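A minimal sketch of how such a 1-D inverted index could be built from the example base table. It assumes each attribute value belongs to exactly one dimension (as in the example, where a1 only occurs in A), so the value alone can serve as the key; `build_inverted_index` is a hypothetical name.

```python
from collections import defaultdict

# Base table from the Frag-Cubing example (tid -> values of A..E).
BASE_TABLE = {
    1: ("a1", "b1", "c1", "d1", "e1"),
    2: ("a1", "b2", "c1", "d2", "e1"),
    3: ("a1", "b2", "c1", "d1", "e2"),
    4: ("a2", "b1", "c1", "d1", "e2"),
    5: ("a2", "b1", "c1", "d1", "e3"),
}

def build_inverted_index(table):
    """Map every attribute value to the sorted list of tids that contain it."""
    index = defaultdict(list)
    for tid in sorted(table):
        for value in table[tid]:
            index[value].append(tid)
    return dict(index)

index = build_inverted_index(BASE_TABLE)
for value in sorted(index):
    print(value, index[value], len(index[value]))
```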

From book Han and Kamber: Data Mining Concepts and Techniques

Related Work – Frag-Cubing Approach
• Generalize the 1-D inverted indices to multi-dimensional
ones in the data cube sense
• Compute all cuboids for data cubes ABC and DE while
retaining the inverted indices
• For example, shell
fragment cube ABC
contains 7 cuboids:
– A, B, C
– AB, AC, BC
– ABC
• This completes the offline
computation stage

Cell   Intersection     TID List  List Size
a1 b1  1 2 3 ∩ 1 4 5    1         1
a1 b2  1 2 3 ∩ 2 3      2 3       2
a2 b1  4 5 ∩ 1 4 5      4 5       2
a2 b2  4 5 ∩ 2 3        ∅         0
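The AB cuboid in the table above can be derived from the 1-D TID lists by pairwise intersection. The sketch below is only an illustration; `cuboid_2d` is a hypothetical helper name.

```python
from itertools import product

# 1-D TID lists for dimensions A and B, taken from the inverted index slide.
A = {"a1": {1, 2, 3}, "a2": {4, 5}}
B = {"b1": {1, 4, 5}, "b2": {2, 3}}

def cuboid_2d(index_x, index_y):
    """Intersect every pair of TID lists to obtain the cells of a 2-D cuboid."""
    cells = {}
    for (x, tids_x), (y, tids_y) in product(index_x.items(), index_y.items()):
        cells[(x, y)] = sorted(tids_x & tids_y)
    return cells

ab = cuboid_2d(A, B)
for cell, tids in ab.items():
    print(cell, tids, len(tids))
```

With COUNT as the measure, the size of each intersected list is the cell's aggregate, so no access to the base table is needed.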

From book Han and Kamber: Data Mining Concepts and Techniques

Related Work – Frag-Cubing Measure Table
• If measures other than count are present, store in
ID_measure table separate from the shell fragments
tid  count  sum
1    5      70
2    3      10
3    8      20
4    5      40
5    2      30

From book Han and Kamber: Data Mining Concepts and Techniques

Related Work – Frag-Cubing Query
• Given the fragment cubes, process a query as follows:
1. Divide the query into fragments, the same as the shell fragments
2. Fetch the corresponding TID list for each fragment from the fragment cube
3. Intersect the TID lists from each fragment to construct the instantiated base table
4. Compute the data cube using the base table with any cubing algorithm
From book Han and Kamber: Data Mining Concepts and Techniques
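The four steps can be sketched on the five-tuple example from the earlier slides. This is a simplified illustration, not the Frag-Cubing implementation: the names `fragment_tids` and `answer` are hypothetical, the fragment lookup is emulated by intersecting 1-D lists instead of reading a precomputed fragment cube, and COUNT is the only measure.

```python
from collections import Counter
from itertools import product

DIMS = ("A", "B", "C", "D", "E")
FRAGMENTS = (("A", "B", "C"), ("D", "E"))   # shell fragments from the example
BASE_TABLE = {
    1: ("a1", "b1", "c1", "d1", "e1"),
    2: ("a1", "b2", "c1", "d2", "e1"),
    3: ("a1", "b2", "c1", "d1", "e2"),
    4: ("a2", "b1", "c1", "d1", "e2"),
    5: ("a2", "b1", "c1", "d1", "e3"),
}

# 1-D TID lists keyed by (dimension, value).
INDEX = {}
for tid, row in BASE_TABLE.items():
    for dim, value in zip(DIMS, row):
        INDEX.setdefault((dim, value), set()).add(tid)

def fragment_tids(fragment, instantiated):
    """Steps 1-2: TID list of the fragment cell selected by the query."""
    tids = set(BASE_TABLE)
    for dim in fragment:
        if dim in instantiated:
            tids &= INDEX[(dim, instantiated[dim])]
    return tids

def answer(instantiated, inquired):
    # Step 3: intersect the per-fragment TID lists.
    tids = set(BASE_TABLE)
    for fragment in FRAGMENTS:
        tids &= fragment_tids(fragment, instantiated)
    # Instantiated base table, restricted to the inquired dimensions.
    cols = [DIMS.index(dim) for dim in inquired]
    rows = [tuple(BASE_TABLE[t][c] for c in cols) for t in sorted(tids)]
    # Step 4: compute the (small) cube over the inquired dimensions.
    cube = Counter()
    for row in rows:
        for mask in product((False, True), repeat=len(row)):
            cube[tuple(v if keep else "*" for v, keep in zip(row, mask))] += 1
    return cube

# Query: A = a1 and D = d1, inquiring dimensions B and E.
result = answer({"A": "a1", "D": "d1"}, inquired=("B", "E"))
print(result[("*", "*")])   # 2 (tuples 1 and 3 match)
```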


Related Work – Frag-Cubing Approach
[Figure: base table with dimensions A B C D E F G H I J K L M N …; online computation]

From book Han and Kamber: Data Mining Concepts and Techniques

qCube Approach
qCube keeps a list of tuple identifiers (TIDs) for each attribute value of
each dimension, similar to Frag-Cubing.
It can therefore answer point queries by intersecting TID lists, and range
queries by combining unions and intersections of TID lists, regardless of
the type of measure function.
Frag-Cubing implements only point queries and some inquire queries.
There is no Frag-Cubing solution for queries like:

“What is the women journal research papers variance impact,
using months {1, 3, 5, 7, 11}, year 2012 and ages varying
from 25-40 years? Return results for all countries”

qCube Approach
Implements the range query operators:
• Equal;
• Not Equal;
• Greater or Less than;
• Some;
• Between;
• Similar.
Also implements the inquire query operators:
• Distinct;
• Sub-cube;
• Top-k Similar.
All operators work over a high dimension data cube.

qCube Architecture


qCube Computation
Base relation

TID  A   B   C   D   E
1    a1  b1  c1  d1  e1
2    a2  b2  c2  d2  e2
3    a1  b1  c1  d1  e1
4    a3  b3  c3  d2  e2
5    a1  b1  c4  d1  e2
6    a5  b5  c5  d2  e2

Measure values per tuple (function in parentheses)

tid  M1 (Variance)  M2 (Count)  M3 (Average)  M4 (Skewness)  M5 (Standard deviation)
1    2.56           1           10            1              877686769698
2    3.14           1           20            0              7986676867.99
3    2.45           1           10            1              -7878789.8777
4    6.7            1           11            1              -99974333.23
5    9              1           3             1              100045.655
6    1              1           1             1              1

Inverted index

Attribute Value  TID List
a1               1, 3, 5
a2               2
a3               4
a5               6
b1               1, 3, 5
b2               2
b3               4
b5               6
c1               1, 3
c2               2
c3               4
c4               5
c5               6
d1               1, 3, 5
d2               2, 4, 6
e1               1, 3
e2               2, 4, 5, 6

qCube Update

Updates use the same algorithm as qCube Computation.


qCube Update
Base relation before the update

TID  A   B   C   D   E
1    a1  b1  c1  d1  e1
2    a2  b2  c2  d2  e2
3    a1  b1  c1  d1  e1
4    a3  b3  c3  d2  e2
5    a1  b1  c4  d1  e2
6    a5  b5  c5  d2  e2

Update tuples

tid  A   B   C   D   E   F
5    a3  b1  c4  d1  e2
7    a3  b2  c3  d3  e3
8    a2  b3  c4  d3  e2  f1
9    a5  b5  c5  d1  e1  f2

Inverted index after the update

Attribute Value  TID List
a1               1, 3
a2               2, 8
a3               4, 5, 7
a5               6, 9
b1               1, 3, 5
b2               2, 7
b3               4, 8
b5               6, 9
c1               1, 3
c2               2
c3               4, 7
c4               5, 8
c5               6, 9
d1               1, 3, 5, 9
d2               2, 4, 6
d3               7, 8
e1               1, 3, 9
e2               2, 4, 5, 6, 8
e3               7
f1               8
f2               9
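The update example can be replayed with a small sketch. It assumes that a tuple arriving with an existing tid (tid 5 here) replaces the old tuple, and that a new dimension (F) simply adds new TID lists; the `TidIndex` class and its methods are hypothetical names, not qCube's actual data structure.

```python
from collections import defaultdict

class TidIndex:
    """Hypothetical container: one sorted TID list per (dimension, value)."""

    def __init__(self):
        self.rows = {}                    # tid -> {dimension: value}
        self.lists = defaultdict(list)    # (dimension, value) -> [tid, ...]

    def upsert(self, tid, row):
        # A repeated tid replaces the old tuple, so drop its old entries first.
        for key in self.rows.pop(tid, {}).items():
            self.lists[key].remove(tid)
            if not self.lists[key]:
                del self.lists[key]
        self.rows[tid] = dict(row)
        for key in row.items():
            self.lists[key].append(tid)
            self.lists[key].sort()

    def tids(self, dim, value):
        return list(self.lists.get((dim, value), []))

def as_row(*values):
    # Pair values with dimension names; zip stops at the shortest input.
    return dict(zip("ABCDEF", values))

index = TidIndex()
initial = {
    1: as_row("a1", "b1", "c1", "d1", "e1"),
    2: as_row("a2", "b2", "c2", "d2", "e2"),
    3: as_row("a1", "b1", "c1", "d1", "e1"),
    4: as_row("a3", "b3", "c3", "d2", "e2"),
    5: as_row("a1", "b1", "c4", "d1", "e2"),
    6: as_row("a5", "b5", "c5", "d2", "e2"),
}
updates = {
    5: as_row("a3", "b1", "c4", "d1", "e2"),
    7: as_row("a3", "b2", "c3", "d3", "e3"),
    8: as_row("a2", "b3", "c4", "d3", "e2", "f1"),
    9: as_row("a5", "b5", "c5", "d1", "e1", "f2"),
}
for tid, row in list(initial.items()) + list(updates.items()):
    index.upsert(tid, row)

print(index.tids("A", "a3"))   # [4, 5, 7]
print(index.tids("F", "f1"))   # [8]
```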


qCube Query
pQ= a1:*:*:*:e1
Attribute Value  TID List
a1               1, 3
a2               2, 8
a3               4, 5, 7
a5               6, 9
b1               1, 3, 5
b2               2, 7
b3               4, 8
b5               6, 9
c1               1, 3
c2               2
c3               4, 7
c4               5, 8
c5               6, 9
d1               1, 3, 5, 9
d2               2, 4, 6
d3               7, 8
e1               1, 3, 9
e2               2, 4, 5, 6, 8
e3               7
f1               8
f2               9
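A point query such as pQ can be answered by intersecting the TID lists of its instantiated values. The sketch below uses only a subset of the TID lists from the slide; the function name `point_query` and the choice to intersect the shortest lists first are assumptions made for illustration.

```python
# Subset of the TID lists shown on the slide (attribute value -> TIDs).
INDEX = {
    "a1": [1, 3], "a2": [2, 8], "a5": [6, 9],
    "d1": [1, 3, 5, 9],
    "e1": [1, 3, 9], "e2": [2, 4, 5, 6, 8], "e3": [7],
}
ALL_TIDS = set(range(1, 10))   # the relation holds tids 1..9 after the update

def point_query(pattern):
    """Answer a query such as 'a1:*:*:*:e1'; '*' leaves a dimension open."""
    values = [v for v in pattern.split(":") if v != "*"]
    # Assumption: intersect the shortest lists first to shrink the result early.
    lists = sorted((set(INDEX[v]) for v in values), key=len)
    result = set(ALL_TIDS)
    for tids in lists:
        result &= tids
        if not result:
            break
    return sorted(result)

print(point_query("a1:*:*:*:e1"))   # [1, 3]
```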


qCube Range and Inquire Query
rOp = (greater than + less than + between + some + different + similar x (fv1 … fvn))
iOp = (sub-cube + distinct + top-k similar x (fv1 … fvn))
qCube rearranges the sub-queries of Q in order to improve query
response times.

As a result of Q we have qR = (TID1, TID2 … TIDk), where TIDi is
the i-th tuple identifier of relation R.
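One way to read the range operators is as unions of TID lists over all attribute values that satisfy a predicate, followed by intersections across dimensions. The sketch below illustrates that reading on hypothetical AGE and MONTH TID lists; the function names, the inclusive bounds of `between`, and the data are assumptions, and the Similar operator is left out because its definition is not given here.

```python
# Hypothetical TID lists for two numeric dimensions (value -> TIDs).
AGE = {23: {1, 7}, 25: {2}, 31: {3, 4}, 40: {5}, 52: {6}}
MONTH = {1: {1, 2}, 2: {3}, 5: {4, 5}, 7: {6}, 12: {7}}

def union_where(index, predicate):
    """Union the TID lists of every attribute value accepted by the predicate."""
    tids = set()
    for value, tid_list in index.items():
        if predicate(value):
            tids |= tid_list
    return tids

def greater_than(index, x):
    return union_where(index, lambda v: v > x)

def less_than(index, x):
    return union_where(index, lambda v: v < x)

def between(index, low, high):
    return union_where(index, lambda v: low <= v <= high)   # inclusive bounds

def some(index, values):
    wanted = set(values)
    return union_where(index, lambda v: v in wanted)

def different(index, x):
    return union_where(index, lambda v: v != x)

# Unions inside each dimension, then an intersection across dimensions.
qR = sorted(between(AGE, 25, 40) & some(MONTH, {1, 3, 5, 7, 11}))
print(qR)   # [2, 4, 5]
```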

qCube Query - example
“What is the women journal research papers variance impact,
using months {1, 3, 5, 7, 11}, year 2012 and ages varying from
25-40 years? Return results for all countries”
In Q, the point queries are (sex = women, paperType = journal, year = 2012).
The range queries (month = (1, 3, 5, 7, 11), age <> 25-40) are also
sorted according to their cardinalities.
In Q, there is one inquire query (country = distinct).
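The example query can be walked through on a tiny, entirely hypothetical relation. The sketch assumes that point sub-queries run first and range sub-queries second, each group sorted by TID-list cardinality, that the age range is inclusive, and that the Distinct inquire operator groups the result by country; the data, column layout, and use of population variance are illustrative choices, not taken from qCube.

```python
from statistics import pvariance

COLUMNS = ("sex", "paperType", "year", "month", "age", "country", "impact")
# Hypothetical relation: tid -> tuple; "impact" is the measured value.
ROWS = {
    1: ("women", "journal", 2012, 1, 30, "BR", 2.0),
    2: ("women", "journal", 2012, 3, 35, "BR", 4.0),
    3: ("women", "journal", 2012, 2, 33, "BR", 9.0),
    4: ("men", "journal", 2012, 1, 30, "BR", 5.0),
    5: ("women", "journal", 2012, 5, 28, "US", 1.0),
    6: ("women", "journal", 2012, 7, 40, "US", 3.0),
    7: ("women", "conference", 2012, 1, 30, "US", 7.0),
    8: ("women", "journal", 2011, 11, 26, "US", 8.0),
    9: ("women", "journal", 2012, 11, 45, "PT", 6.0),
}

def tid_lists(column):
    """Build value -> TID set for one dimension."""
    pos = COLUMNS.index(column)
    index = {}
    for tid, row in ROWS.items():
        index.setdefault(row[pos], set()).add(tid)
    return index

INDEX = {column: tid_lists(column) for column in COLUMNS[:-1]}

def union_where(column, predicate):
    return set().union(*(t for v, t in INDEX[column].items() if predicate(v)))

point = [INDEX["sex"]["women"], INDEX["paperType"]["journal"], INDEX["year"][2012]]
ranges = [union_where("month", lambda m: m in {1, 3, 5, 7, 11}),
          union_where("age", lambda a: 25 <= a <= 40)]

# Point sub-queries first, then range sub-queries, smallest list first in each.
plan = sorted(point, key=len) + sorted(ranges, key=len)
result = set(ROWS)
for tids in plan:
    result &= tids

# Inquire (country = distinct): one variance per country present in the result.
by_country = {}
for country, tids in INDEX["country"].items():
    hits = sorted(tids & result)
    if hits:
        by_country[country] = pvariance([ROWS[t][-1] for t in hits])

print(sorted(result))   # [1, 2, 5, 6]
print(by_country)
```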


Experiments
• We tested the qCube Computation and Query algorithms against the Frag-Cubing
algorithm used in [Li et al. 2004];
• The qCube algorithms were coded in 64-bit Java;
• Frag-Cubing is a free and open source C++ application
(http://illimine.cs.uiuc.edu/);
• The synthetic base relations were created using the data generator provided by
the IlliMine project;
• The IlliMine project is an open-source project that provides various approaches
for data mining and machine learning;
• The Frag-Cubing approach is part of the IlliMine project;
• We ran the algorithms on two Intel Xeon six-core processors at 2.4 GHz per core,
with 12 MB cache and 128 GB of DDR3 1333 MHz RAM;
• The system runs 64-bit Windows Server 2008, High Performance version.


Results - Performance Evaluation of Point Queries and Skewed Relations

Response time per query over 100 trials: T=10^7; C=5000; D=30; S=0

Response time per query over 100 trials: T=10^7; C=5000; D=30; S=2.5


Results - Performance Evaluation of Range Query Operators
and Skewed Relations

Response time of queries with one infrequent point
operator: T=10^7; C=5000; D=30; S=2.5

Results - Performance Evaluation of Inquire Operators
and Skewed Relations

Response time of queries with inquire operators: T=10^7; C=5000; D=30; S=2.5


Results - Runtime and Memory Consumption


Conclusions
• qCube has linear runtime and memory consumption, similar to Frag-Cubing;
• It implements Not Equal, Greater or Less than, Some, Between
and Similar range query operators and Distinct, Sub-cube and
Top-k Similar inquire query operators;
• When compared with Frag-Cubing, qCube is faster at answering
point and inquire queries with sub-cube operators;
• It introduces a different cube representation with fewer empty cells
than Frag-Cubing;
• Frag-Cubing cannot answer queries with two sub-cube operators on a data
cube with 10^7 tuples, C=5000, D=30 and S=2.5.

Conclusions
Interesting research directions to further extend qCube:
First, we must experiment with holistic measures. Update and computation
experiments with many holistic measures are a hard problem.
TID lists can become huge, so memory consumption and intersection costs can
become impractical; we must therefore find an efficient way to partition
TID lists with fast data retrieval.
Multicore and multicomputer versions of qCube must be implemented.
qCube must be improved to answer top-k queries combined with range, point
and inquire queries.
Experiments with high-dimensional text cubes must be made to evaluate qCube,
especially its computation of text measures.


Acknowledgements

TrustArc
 
STKI Israel Market Study 2025 final v1 version
STKI Israel Market Study 2025 final v1 versionSTKI Israel Market Study 2025 final v1 version
STKI Israel Market Study 2025 final v1 version
Dr. Jimmy Schwarzkopf
 
AI Emotional Actors: “When Machines Learn to Feel and Perform"
AI Emotional Actors:  “When Machines Learn to Feel and Perform"AI Emotional Actors:  “When Machines Learn to Feel and Perform"
AI Emotional Actors: “When Machines Learn to Feel and Perform"
AkashKumar809858
 
Kubernetes Cloud Native Indonesia Meetup - May 2025
Kubernetes Cloud Native Indonesia Meetup - May 2025Kubernetes Cloud Native Indonesia Meetup - May 2025
Kubernetes Cloud Native Indonesia Meetup - May 2025
Prasta Maha
 
Let’s Get Slack Certified! 🚀- Slack Community
Let’s Get Slack Certified! 🚀- Slack CommunityLet’s Get Slack Certified! 🚀- Slack Community
Let’s Get Slack Certified! 🚀- Slack Community
SanjeetMishra29
 
Evaluation Challenges in Using Generative AI for Science & Technical Content
Evaluation Challenges in Using Generative AI for Science & Technical ContentEvaluation Challenges in Using Generative AI for Science & Technical Content
Evaluation Challenges in Using Generative AI for Science & Technical Content
Paul Groth
 
Nix(OS) for Python Developers - PyCon 25 (Bologna, Italia)
Nix(OS) for Python Developers - PyCon 25 (Bologna, Italia)Nix(OS) for Python Developers - PyCon 25 (Bologna, Italia)
Nix(OS) for Python Developers - PyCon 25 (Bologna, Italia)
Peter Bittner
 
Agentic AI - The New Era of Intelligence
Agentic AI - The New Era of IntelligenceAgentic AI - The New Era of Intelligence
Agentic AI - The New Era of Intelligence
Muzammil Shah
 
Measuring Microsoft 365 Copilot and Gen AI Success
Measuring Microsoft 365 Copilot and Gen AI SuccessMeasuring Microsoft 365 Copilot and Gen AI Success
Measuring Microsoft 365 Copilot and Gen AI Success
Nikki Chapple
 
Offshore IT Support: Balancing In-House and Offshore Help Desk Technicians
Offshore IT Support: Balancing In-House and Offshore Help Desk TechniciansOffshore IT Support: Balancing In-House and Offshore Help Desk Technicians
Offshore IT Support: Balancing In-House and Offshore Help Desk Technicians
john823664
 
Droidal: AI Agents Revolutionizing Healthcare
Droidal: AI Agents Revolutionizing HealthcareDroidal: AI Agents Revolutionizing Healthcare
Droidal: AI Agents Revolutionizing Healthcare
Droidal LLC
 
Palo Alto Networks Cybersecurity Foundation
Palo Alto Networks Cybersecurity FoundationPalo Alto Networks Cybersecurity Foundation
Palo Alto Networks Cybersecurity Foundation
VICTOR MAESTRE RAMIREZ
 
Jeremy Millul - A Talented Software Developer
Jeremy Millul - A Talented Software DeveloperJeremy Millul - A Talented Software Developer
Jeremy Millul - A Talented Software Developer
Jeremy Millul
 
Introducing FME Realize: A New Era of Spatial Computing and AR
Introducing FME Realize: A New Era of Spatial Computing and ARIntroducing FME Realize: A New Era of Spatial Computing and AR
Introducing FME Realize: A New Era of Spatial Computing and AR
Safe Software
 
Gihbli AI and Geo sitution |use/misuse of Ai Technology
Gihbli AI and Geo sitution |use/misuse of Ai TechnologyGihbli AI and Geo sitution |use/misuse of Ai Technology
Gihbli AI and Geo sitution |use/misuse of Ai Technology
zainkhurram1111
 
Supercharge Your AI Development with Local LLMs
Supercharge Your AI Development with Local LLMsSupercharge Your AI Development with Local LLMs
Supercharge Your AI Development with Local LLMs
Francesco Corti
 
Improving Developer Productivity With DORA, SPACE, and DevEx
Improving Developer Productivity With DORA, SPACE, and DevExImproving Developer Productivity With DORA, SPACE, and DevEx
Improving Developer Productivity With DORA, SPACE, and DevEx
Justin Reock
 
UiPath Community Zurich: Release Management and Build Pipelines
UiPath Community Zurich: Release Management and Build PipelinesUiPath Community Zurich: Release Management and Build Pipelines
UiPath Community Zurich: Release Management and Build Pipelines
UiPathCommunity
 
GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...
GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...
GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...
James Anderson
 
Ad

qCube: Efficient integration of range query operators over a high dimension data cube

  • 1. qCube: Efficient integration of range query operators over a high dimension data cube Rodrigo Rocha Silva Doctorate Student Prof. Dr. Celso Massaki Hirata Advisor Prof. Dr. Joubert de Castro Lima Co-Advisor ITA – INSTITUTO TECNOLÓGICO DE AERONÁUTICA Electronic Engineering and Computer Science Division - EEC/I Department of Computer Science Brazil
  • 2. qCube: Efficient integration of range query operators over a high dimension data cube Goal Present a new cube approach, designed for high dimension range queries. Our cube approach, named Query Cube (qCube), implements Equal, Not Equal, Greater or Less than, Some, Between and Similar range query operators and Distinct, Subcube and Top-k Similar inquire query operators Wednesday, October 02, 2012 28º Simpósio Brasileiro de Banco de Dados - Rodrigo Rocha Silva 2
  • 3. Topics: Motivation; Data Cube; Related Work; Query Cube (qCube); Experiments; Results; Conclusions
  • 4. Motivation: Users need to view data in a tangible way, such as reports, cross tables and histograms.
  • 5. Motivation: Suppose that a decision-making process requires the following information: “What is the variance of the impact of journal research papers written by women, for months {1, 3, 5, 7, 11} of year 2012 and ages from 25 to 40 years? Return results for all countries.” A second example: “The average temperatures above 30 degrees Celsius on the weekends of leap years in the last 200 years.”
  • 6. Data Cube: A data cube, introduced by Gray et al. (1996), is a generalization of the group-by operator over all possible combinations of dimensions, with aggregates at various granularities.
  • 7. Data Cube: A data cube has exponential complexity with respect to the number of dimensions. For an input with d dimensions, the output has size 2^d (one group-by per subset of the dimensions).
  • 8. Data Cube: Hierarchies. [Figure: dimension hierarchies; recoverable labels: Year, Discipline, Day, Department, Hour]
  • 9. Data Cube: base relation R (11 tuples) and the resulting full 3-D cube (38 tuples), with COUNT as the measure and * meaning ALL.

      Base relation R (11 tuples):
      A   B   C
      a1  b1  c1
      a3  b3  c2
      a2  b3  c2
      a3  b1  c1
      a2  b1  c1
      a2  b2  c2
      a1  b1  c2
      a2  b2  c1
      a3  b1  c2
      a1  b3  c2
      a2  b1  c2

      Full 3-D cube (38 tuples), aggregated cells:
      A   B   C   COUNT
      *   *   *   11
      a1  *   *   3
      a2  *   *   5
      a3  *   *   3
      *   b1  *   6
      *   b2  *   2
      *   b3  *   3
      *   *   c1  4
      *   *   c2  7
      a1  b1  *   2
      a1  b3  *   1
      a2  b1  *   2
      a2  b2  *   2
      a2  b3  *   1
      a3  b1  *   2
      a3  b3  *   1
      a1  *   c1  1
      a1  *   c2  2
      a2  *   c1  2
      a2  *   c2  3
      a3  *   c1  1
      a3  *   c2  2
      *   b1  c1  3
      *   b1  c2  3
      *   b2  c1  1
      *   b2  c2  1
      *   b3  c2  3

      The remaining 11 cells of the cube are the base tuples of R themselves, each with COUNT 1.
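As a rough illustration of the slide 9 example, the following Python sketch enumerates every group-by of the 11-tuple base relation by brute force and counts the resulting cells. The function name `full_cube` is our own, not something from the slides, and this is only a didactic sketch, not an efficient cubing algorithm.

```python
from collections import Counter
from itertools import product

# Base relation R from the slide (11 tuples over dimensions A, B, C).
R = [
    ("a1", "b1", "c1"), ("a3", "b3", "c2"), ("a2", "b3", "c2"),
    ("a3", "b1", "c1"), ("a2", "b1", "c1"), ("a2", "b2", "c2"),
    ("a1", "b1", "c2"), ("a2", "b2", "c1"), ("a3", "b1", "c2"),
    ("a1", "b3", "c2"), ("a2", "b1", "c2"),
]

def full_cube(rows):
    """Return {cell: COUNT} for every group-by; '*' stands for ALL."""
    cube = Counter()
    d = len(rows[0])
    for row in rows:
        # Each tuple contributes to 2^d cells: keep or generalize each dimension.
        for mask in product((True, False), repeat=d):
            cell = tuple(v if keep else "*" for v, keep in zip(row, mask))
            cube[cell] += 1
    return cube

cube = full_cube(R)
print(len(cube))               # number of non-empty cells in the full cube
print(cube[("*", "*", "*")])   # apex cell
print(cube[("a2", "*", "c2")])
```

Each tuple touches 2^d cells, which is why this direct approach stops being practical as the number of dimensions grows.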
  • 10. Related Work: Frag-Cubing Approach (from Han and Kamber, Data Mining: Concepts and Techniques)
      • Partitions the data vertically
      • Reduces a high-dimensional cube into a set of lower-dimensional cubes
      • Lossless reduction
      • Offers tradeoffs between the amount of pre-processing and the speed of online computation
  • 11. Related Work: Frag-Cubing Example
      • Let the cube aggregation function be count

      tid  A   B   C   D   E
      1    a1  b1  c1  d1  e1
      2    a1  b2  c1  d2  e1
      3    a1  b2  c1  d1  e2
      4    a2  b1  c1  d1  e2
      5    a2  b1  c1  d1  e3

      • Divide the 5 dimensions into 2 shell fragments: (A, B, C) and (D, E)
  • 12. Related Work: Frag-Cubing 1-D Inverted Indices
      • Build a traditional inverted index or RID list

      Attribute Value  TID List       List Size
      a1               1, 2, 3        3
      a2               4, 5           2
      b1               1, 4, 5        3
      b2               2, 3           2
      c1               1, 2, 3, 4, 5  5
      d1               1, 3, 4, 5     4
      d2               2              1
      e1               1, 2           2
      e2               3, 4           2
      e3               5              1
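The 1-D inverted indices of slide 12 can be rebuilt from the five-tuple base table of slide 11 with a few lines of Python. This is a minimal sketch: the name `build_inverted_index` is ours, and keying the index by value alone is an assumption that only works because the example's values (a1, b1, ...) are unique across dimensions.

```python
from collections import defaultdict

# Base table from the Frag-Cubing example: tid -> (A, B, C, D, E) values.
BASE = {
    1: ("a1", "b1", "c1", "d1", "e1"),
    2: ("a1", "b2", "c1", "d2", "e1"),
    3: ("a1", "b2", "c1", "d1", "e2"),
    4: ("a2", "b1", "c1", "d1", "e2"),
    5: ("a2", "b1", "c1", "d1", "e3"),
}

def build_inverted_index(base):
    """Map each attribute value to the sorted list of tids containing it."""
    index = defaultdict(list)
    for tid in sorted(base):
        for value in base[tid]:
            index[value].append(tid)
    return dict(index)

index = build_inverted_index(BASE)
for value, tids in index.items():
    print(value, tids, len(tids))   # attribute value, TID list, list size
```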
  • 13. Related Work: Frag-Cubing Approach
      • Generalize the 1-D inverted indices to multi-dimensional ones in the data cube sense
      • Compute all cuboids for data cubes ABC and DE while retaining the inverted indices
      • For example, shell fragment cube ABC contains 7 cuboids: A, B, C; AB, AC, BC; ABC
      • This completes the offline computation stage

      Cell   Intersection         TID List  List Size
      a1 b1  {1, 2, 3} ∩ {1, 4, 5}  {1}       1
      a1 b2  {1, 2, 3} ∩ {2, 3}     {2, 3}    2
      a2 b1  {4, 5} ∩ {1, 4, 5}     {4, 5}    2
      a2 b2  {4, 5} ∩ {2, 3}        ∅         0
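The AB cuboid of slide 13 follows from pairwise intersections of the 1-D TID lists. The sketch below reproduces those four cells; the helper name `cuboid_2d` is hypothetical, and Python sets stand in for whatever list representation Frag-Cubing actually uses.

```python
from itertools import product

# 1-D inverted indices for dimensions A and B, copied from the example.
A = {"a1": {1, 2, 3}, "a2": {4, 5}}
B = {"b1": {1, 4, 5}, "b2": {2, 3}}

def cuboid_2d(left, right):
    """Intersect every pair of TID sets to obtain the 2-D cuboid cells."""
    return {(l, r): left[l] & right[r] for l, r in product(left, right)}

ab = cuboid_2d(A, B)
for cell, tids in ab.items():
    print(cell, sorted(tids), len(tids))   # cell, TID list, list size
```

The empty set for (a2, b2) corresponds to the empty cell in the slide's table.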
  • 14. Related Work: Frag-Cubing Measure Table
      • If measures other than count are present, store them in an ID_measure table separate from the shell fragments

      tid  count  sum
      1    5      70
      2    3      10
      3    8      20
      4    5      40
      5    2      30
  • 15. Related Work: Frag-Cubing Query
      • Given the fragment cubes, process a query as follows:
        1. Divide the query into fragments, matching the shell fragments
        2. Fetch the corresponding TID list for each fragment from the fragment cube
        3. Intersect the TID lists from each fragment to construct the instantiated base table
        4. Compute the data cube over that base table with any cubing algorithm
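The query steps of slide 15 can be sketched on the running example as follows. Several things here are assumptions made for brevity: the function names are ours, each fragment is represented only by its 1-D indices (a real fragment cube would look up precomputed multi-dimensional cells instead of intersecting on the fly), and step 4 is reduced to summing the `sum` column of the ID_measure table.

```python
# 1-D inverted indices from the example, grouped by shell fragment.
FRAGMENTS = {
    ("A", "B", "C"): {"a1": {1, 2, 3}, "a2": {4, 5}, "b1": {1, 4, 5},
                      "b2": {2, 3}, "c1": {1, 2, 3, 4, 5}},
    ("D", "E"): {"d1": {1, 3, 4, 5}, "d2": {2}, "e1": {1, 2},
                 "e2": {3, 4}, "e3": {5}},
}
# ID_measure table from the example: tid -> (count, sum).
MEASURES = {1: (5, 70), 2: (3, 10), 3: (8, 20), 4: (5, 40), 5: (2, 30)}
ALL_TIDS = set(MEASURES)

def fragment_tids(index, values):
    """TIDs matching all instantiated values that fall in one fragment."""
    tids = set(ALL_TIDS)
    for value in values:
        tids &= index[value]
    return tids

def point_query(query):
    """query maps dimension name -> value; omitted dimensions mean '*'."""
    result = set(ALL_TIDS)
    for dims, index in FRAGMENTS.items():
        # Steps 1-2: split the query by fragment and fetch its TID list.
        values = [query[d] for d in dims if d in query]
        # Step 3: intersect the per-fragment TID lists.
        result &= fragment_tids(index, values)
    return result

tids = point_query({"A": "a1", "D": "d1"})
total = sum(MEASURES[t][1] for t in tids)   # simplified stand-in for step 4
print(sorted(tids), total)
```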
  • 16. Related Work: Frag-Cubing Approach. [Figure: dimensions A, B, C, D, E, F, G, H, I, J, K, L, M, N, …; labels: Base Table, Online Computation]
  • 17. qCube Approach: qCube keeps a set of tuple identifiers per dimension attribute value, similar to Frag-Cubing. It can therefore answer point queries using intersections of tuple identifiers, and range queries using unions plus intersections, regardless of the measure function type. Frag-Cubing implements only point queries and some inquire queries. There is no Frag-Cubing solution for queries like “What is the variance of the impact of journal research papers written by women, for months {1, 3, 5, 7, 11} of year 2012 and ages from 25 to 40 years? Return results for all countries.”
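To make "unions plus intersections" concrete, here is a small Python sketch. Everything in it is hypothetical: the index contents, the `(dimension, value)` keys and the function names `equal`, `some`, `between` and `conjunction` are illustrative choices, not qCube's actual data structures or API.

```python
from functools import reduce

# Hypothetical inverted index: (dimension, value) -> set of tuple ids.
INDEX = {
    ("month", 1): {1, 7}, ("month", 2): {2}, ("month", 3): {3, 8},
    ("month", 5): {4}, ("month", 6): {5}, ("month", 7): {6},
    ("year", 2011): {1, 2, 3}, ("year", 2012): {4, 5, 6, 7, 8},
    ("age", 22): {1, 4}, ("age", 30): {2, 5, 7}, ("age", 38): {3, 8},
    ("age", 45): {6},
}

def equal(dim, value):
    """Point operator: a single TID list (empty if the value is absent)."""
    return INDEX.get((dim, value), set())

def some(dim, values):
    """'Some' operator: union of the TID lists of the listed values."""
    return set().union(*(equal(dim, v) for v in values))

def between(dim, low, high):
    """'Between' operator: union over all indexed values in [low, high]."""
    return set().union(*(tids for (d, v), tids in INDEX.items()
                         if d == dim and low <= v <= high))

def conjunction(*tid_sets):
    """Combine sub-query results by intersection."""
    return reduce(set.__and__, tid_sets)

result = conjunction(
    equal("year", 2012),
    some("month", [1, 3, 5, 7, 11]),
    between("age", 25, 40),
)
print(sorted(result))
```

Because the result is just a set of tuple identifiers, any measure (including holistic ones such as variance) can be computed afterwards over the matching tuples.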
  • 18. qCube Approach: Implements the following operators over a high dimension data cube.
      Range query operators:
      • Equal;
      • Not Equal;
      • Greater or Less than;
      • Some;
      • Between;
      • Similar.
      Inquire query operators:
      • Distinct;
      • Sub-cube;
      • Top-k Similar.
  • 19. qCube Architecture. [Figure: qCube architecture]
  • 20. qCube Computation

      Base relation:
      TID  A   B   C   D   E
      1    a1  b1  c1  d1  e1
      2    a2  b2  c2  d2  e2
      3    a1  b1  c1  d1  e1
      4    a3  b3  c3  d2  e2
      5    a1  b1  c4  d1  e2
      6    a5  b5  c5  d2  e2

      Measures (function per column: M1 = Variance, M2 = Count, M3 = Average, M4 = Skewness, M5 = Standard deviation):
      tid  M1    M2  M3  M4  M5
      1    2.56  1   10  1   877686769698
      2    3.14  1   20  0   7986676867.99
      3    2.45  1   10  1   -7878789.8777
      4    6.7   1   11  1   -99974333.23
      5    9     1   3   1   100045.655
      6    1     1   1   1   1

      Inverted index:
      Attribute Value  TID List
      a1               1, 3, 5
      a2               2
      a3               4
      a5               6
      b1               1, 3, 5
      b2               2
      b3               4
      b5               6
      c1               1, 3
      c2               2
      c3               4
      c4               5
      c5               6
      d1               1, 3, 5
      d2               2, 4, 6
      e1               1, 3
      e2               2, 4, 5, 6
  • 21. qCube Update: Updates use the same algorithm as qCube Computation.
  • 22. qCube Update

      Original relation:
      TID  A   B   C   D   E
      1    a1  b1  c1  d1  e1
      2    a2  b2  c2  d2  e2
      3    a1  b1  c1  d1  e1
      4    a3  b3  c3  d2  e2
      5    a1  b1  c4  d1  e2
      6    a5  b5  c5  d2  e2

      Update tuples (tid 5 is modified; tids 7, 8 and 9 are new; F is a new dimension, with values placed on tids 8 and 9 as the index below indicates):
      tid  A   B   C   D   E   F
      5    a3  b1  c4  d1  e2
      7    a3  b2  c3  d3  e3
      8    a2  b3  c4  d3  e2  f1
      9    a5  b5  c5  d1  e1  f2

      Inverted index after the update:
      Attribute Value  TID List
      a1               1, 3
      a2               2, 8
      a3               4, 5, 7
      a5               6, 9
      b1               1, 3, 5
      b2               2, 7
      b3               4, 8
      b5               6, 9
      c1               1, 3
      c2               2
      c3               4, 7
      c4               5, 8
      c5               6, 9
      d1               1, 3, 5, 9
      d2               2, 4, 6
      d3               7, 8
      e1               1, 3, 9
      e2               2, 4, 5, 6, 8
      e3               7
      f1               8
      f2               9
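The update example of slide 22 can be replayed with a short Python sketch. The slides only state that updates reuse the computation algorithm, so the incremental patching shown here (remove a tuple's stale index entries, then add the new ones) is our assumption, as are the names `build_index` and `apply_update`.

```python
from collections import defaultdict

def build_index(relation):
    """Inverted index: attribute value -> set of tids."""
    index = defaultdict(set)
    for tid, row in relation.items():
        for value in row:
            index[value].add(tid)
    return index

def apply_update(relation, index, tid, new_row):
    """Insert a new tuple, or replace an existing one, and patch the index."""
    for value in relation.get(tid, ()):   # drop stale entries, if any
        index[value].discard(tid)
    for value in new_row:
        index[value].add(tid)
    relation[tid] = new_row

# Original relation from the slide (dimensions A..E).
relation = {
    1: ("a1", "b1", "c1", "d1", "e1"), 2: ("a2", "b2", "c2", "d2", "e2"),
    3: ("a1", "b1", "c1", "d1", "e1"), 4: ("a3", "b3", "c3", "d2", "e2"),
    5: ("a1", "b1", "c4", "d1", "e2"), 6: ("a5", "b5", "c5", "d2", "e2"),
}
index = build_index(relation)

# Update batch from the slide: tid 5 changes, tids 7-9 are new;
# tids 8 and 9 also carry a value for the new dimension F.
updates = {
    5: ("a3", "b1", "c4", "d1", "e2"),
    7: ("a3", "b2", "c3", "d3", "e3"),
    8: ("a2", "b3", "c4", "d3", "e2", "f1"),
    9: ("a5", "b5", "c5", "d1", "e1", "f2"),
}
for tid, row in updates.items():
    apply_update(relation, index, tid, row)

print(sorted(index["a3"]), sorted(index["d1"]), sorted(index["f1"]))
```

Note that adding dimension F needs no schema change in this representation: new attribute values simply become new index keys.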
  • 23. qCube Query: point query pQ = a1:*:*:*:e1

      Attribute Value  TID List
      a1               1, 3
      a2               2, 8
      a3               4, 5
      a5               6, 9
      b1               1, 3, 5
      b2               2, 7
      b3               4, 8
      b5               6, 9
      c1               1, 3
      c2               2
      c3               4, 7
      c4               5, 8
      c5               6, 9
      d1               1, 3, 5, 9
      d2               2, 4, 6
      d3               7, 8
      e1               1, 3, 9
      e2               2, 4, 5, 6, 8
      e3               7
      f1               8
      f2               9
  • 24. qCube Range and Inquire Query:
      rOp = (greater than + less than + between + some + different + similar × (fv1 … fvn))
      iOp = (sub-cube + distinct + top-k similar × (fv1 … fvn))
      qCube rearranges the sub-queries of Q in order to improve query response times. As a result of Q we have qR = (TID1, TID2, …, TIDk), where TIDi is the i-th tuple identifier of relation R.
  • 25. qCube Query: example. [Figure: query example]
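One plausible way to rearrange sub-queries is to intersect the shortest TID lists first, so the running result shrinks as early as possible. The slides do not spell out the exact ordering rule, so the ascending-size order, the early exit and the function name `evaluate` below are assumptions; the index values are taken from the update example of slide 22.

```python
# Inverted index entries after the update example (attribute value -> TIDs).
INDEX = {
    "a1": {1, 3}, "a3": {4, 5, 7}, "b1": {1, 3, 5},
    "d1": {1, 3, 5, 9}, "e2": {2, 4, 5, 6, 8},
}

def evaluate(sub_queries):
    """Intersect sub-query TID sets, smallest first; return (qR, order)."""
    ordered = sorted(sub_queries, key=lambda name: len(INDEX[name]))
    result = set(INDEX[ordered[0]])
    for name in ordered[1:]:
        if not result:        # nothing left to match, stop early
            break
        result &= INDEX[name]
    return sorted(result), ordered

qr, order = evaluate(["e2", "d1", "a3", "b1"])
print(order, qr)
```

Intersection is commutative, so the reordering changes only the cost of evaluation, never the resulting qR.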
  • 26. qCube Query: example. “What is the variance of the impact of journal research papers written by women, for months {1, 3, 5, 7, 11} of year 2012 and ages from 25 to 40 years? Return results for all countries.” In Q, the point queries are (sex = women, paperType = journal, year = 2012). The range queries (month = (1, 3, 5, 7, 11), age between 25 and 40) are also sorted according to their cardinalities. Q also contains an inquire query (country = distinct).
  • 27. qCube Query: example. [Figure: query example]
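The inquire part of this example (country = distinct) can be pictured as grouping the TIDs that survive the point and range filters by country and computing the measure per group. The sketch below is purely illustrative: the rows, the index, the function name `distinct_variance` and the choice of population variance are all assumptions, not data or definitions from the slides.

```python
from statistics import pvariance

# Hypothetical rows: tid -> (country, impact); not data from the slides.
ROWS = {
    1: ("BR", 2.0), 2: ("BR", 4.0), 3: ("US", 1.0),
    4: ("US", 3.0), 5: ("US", 5.0), 6: ("PT", 7.0),
}
# Hypothetical inverted index for the country dimension.
COUNTRY_INDEX = {"BR": {1, 2}, "US": {3, 4, 5}, "PT": {6}}

def distinct_variance(filtered_tids):
    """For each country value, variance of impact over the matching TIDs."""
    out = {}
    for country, tids in COUNTRY_INDEX.items():
        hits = tids & filtered_tids
        if hits:  # skip countries with no matching tuple
            out[country] = pvariance([ROWS[t][1] for t in sorted(hits)])
    return out

# Suppose the point and range sub-queries already produced these TIDs.
print(distinct_variance({1, 2, 3, 5}))
```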
  • 28. Experiments
      • We tested the qCube Computation and Query algorithms against the Frag-Cubing algorithm used in [Li et al. 2004];
      • The qCube algorithms were coded in 64-bit Java;
      • Frag-Cubing is a free and open source C++ application (http://illimine.cs.uiuc.edu/);
      • The synthetic base relations were created using the data generator provided by the IlliMine project;
      • The IlliMine project is an open-source project that provides various approaches for data mining and machine learning;
      • The Frag-Cubing approach is part of the IlliMine project;
      • We ran the algorithms on a machine with two six-core Intel Xeon processors (2.4 GHz per core, 12 MB cache) and 128 GB of DDR3 1333 MHz RAM;
      • The system runs 64-bit Windows Server 2008, High Performance version.
  • 29. Results: Performance Evaluation of Point Queries and Skewed Relations. [Figures: response time per query over 100 trials, T = 10^7, C = 5000, D = 30, S = 0; response time per query over 100 trials, T = 10^7, C = 5000, D = 30, S = 2.5]
  • 30. Results: Performance Evaluation of Range Query Operators and Skewed Relations. [Figure: response time of queries with one infrequent point operator, T = 10^7, C = 5000, D = 30, S = 2.5]
  • 31. Results: Performance Evaluation of Inquire Operators and Skewed Relations. [Figure: response time of queries with inquire operators, T = 10^7, C = 5000, D = 30, S = 2.5]
  • 32. Results: Runtime and Memory Consumption. [Figure: runtime and memory consumption]
  • 33. Conclusions
      • qCube has linear runtime and memory consumption, similar to Frag-Cubing;
      • It implements Not Equal, Greater or Less than, Some, Between and Similar range query operators and Distinct, Sub-cube and Top-k Similar inquire query operators;
      • Compared with Frag-Cubing, qCube is faster at answering point queries and inquire queries with sub-cube operators;
      • It introduces a different cube representation with fewer empty cells than Frag-Cubing;
      • Frag-Cubing cannot answer queries with two sub-cube operators in a data cube with 10^7 tuples, C = 5000, D = 30 and S = 2.5.
  • 34. Conclusions: Interesting research directions to further extend qCube:
      • First, we must experiment with holistic measures. Update and computation experiments with many holistic measures are a hard problem;
      • TID lists can become huge, so memory consumption and intersection costs can become impractical; we must therefore find an efficient way to partition TID lists with fast data retrieval;
      • Multicore and multicomputer versions of qCube must be implemented;
      • qCube must be improved to answer top-k queries combined with range, point and inquire queries;
      • Experiments with high dimensional text cubes must be made to evaluate qCube, especially its computation of text measures.
  • 35. Acknowledgements