An Efficient and Robust Access Method For Points and Rectangles

Presentation describing R* Tree. This is mainly based on this paper "The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles -N. Beckmann, H.-P. Kriegel, R. Schneider and B. Seeger. SIGMOD 1990"

Uploaded by

Alok Yadav
Copyright
© Attribution Non-Commercial (BY-NC)

An Efficient and Robust Access Method for Points and Rectangles

Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger

Presented By : Alok Kumar Yadav 2009cs10177

Overview
Introduction
R-tree and Its Optimization
R-tree Variants
R*-tree
Experiments
Conclusions

Introduction
Spatial Access Methods (SAMs): approximation of a complex spatial object by its minimum bounding rectangle (MBR)

Pros:
Complex object can be represented by a limited number of bytes
Preserves the most essential geometric properties, i.e. location and extension

Cons:
(Obviously) a lot of information is lost
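
To make the approximation concrete, here is a minimal sketch of computing an MBR in 2D. The tuple layout `(xmin, ymin, xmax, ymax)` and the function name `mbr` are assumptions made for this illustration; they do not come from the paper.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def mbr(points: Iterable[Point]) -> Rect:
    """Return the minimum bounding rectangle of a non-empty set of 2D points."""
    pts = list(points)
    if not pts:
        raise ValueError("mbr() needs at least one point")
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (min(xs), min(ys), max(xs), max(ys))


# An L-shaped building outline with six vertices collapses to four numbers.
building = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 3), (0, 3)]
print(mbr(building))  # (0, 0, 4, 3)
```

The L shape itself is lost (the con named above), but location and extension survive.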

Introduction (Cont..)
R-tree:
B+-tree like structure
Popular access method for rectangles
Based on the heuristic optimization of the area of the enclosing rectangles in each inner node

R*-tree:
Combined optimization: Area, Margin & Overlap
Outperforms existing R-tree variants
Efficiently supports point and spatial data

Introduction (Cont..)
Given a city map, index all university buildings in an efficient structure for quick topological search.

Introduction (Cont..)
[Figure: city map. Inner rectangles are the MBRs of the city neighbourhoods; the outer rectangle is the MBR of the city, defining the overall search region.]

R-tree
B+-tree like structure
Nodes: E (cp, rectangle)
cp: child pointer; for leaves it is a record in the database
rectangle: MBR of all rectangles in the child node; for leaves it is the enclosing rectangle of the spatial object
Max no of elements in a node is M, min is m
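
The entry layout E (cp, rectangle) could be modelled as below. This is only a sketch: the class names `Entry` and `Node`, the field names, and the helper `node_mbr` are hypothetical, and rectangles are assumed to be 2D tuples `(xmin, ymin, xmax, ymax)`.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


@dataclass
class Entry:
    rect: Rect                        # MBR of the child node, or of the object in a leaf
    child: Optional["Node"] = None    # cp in a directory node
    record_id: Optional[int] = None   # cp in a leaf: reference to the database record


@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)  # between m and M entries


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def node_mbr(node: Node) -> Rect:
    """MBR of all entry rectangles in a non-empty node."""
    result = node.entries[0].rect
    for e in node.entries[1:]:
        result = union(result, e.rect)
    return result


leaf = Node(is_leaf=True, entries=[Entry((0, 0, 1, 1), record_id=1),
                                   Entry((2, 2, 3, 4), record_id=2)])
root = Node(is_leaf=False, entries=[Entry(node_mbr(leaf), child=leaf)])
print(root.entries[0].rect)  # (0, 0, 3, 4)
```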

R-tree (Cont..)
Structure
[Figure: R-tree structure. Data rectangles a, b, c, d; directory node entries I(A), I(B), I(M); leaf node entries I(a), I(b), I(c), I(d).]

R-tree (Cont..)
B+-tree like structure
Nodes: E (cp, rectangle)
cp: child pointer; for leaves it is a record in the database
rectangle: MBR of all rectangles in the child node; for leaves it is the enclosing rectangle of the spatial object

Optimization criterion: minimization of the area of the enclosing rectangles in each inner node
Allows overlapping of directory rectangles, hence cannot guarantee a single search path
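
A small sketch of why overlap costs search paths: an intersection search has to descend into every directory rectangle that intersects the query. The `Node` class, the `(rect, reference)` entry layout and the function names are assumptions for this illustration.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


class Node:
    def __init__(self, is_leaf: bool, entries: list):
        self.is_leaf = is_leaf
        # Each entry is (rect, ref): ref is a child Node, or a record id in a leaf.
        self.entries = entries


def intersects(r: Rect, s: Rect) -> bool:
    return r[0] <= s[2] and s[0] <= r[2] and r[1] <= s[3] and s[1] <= r[3]


def search(root: Node, query: Rect) -> Tuple[List[str], int]:
    """Return (matching record ids, number of nodes visited)."""
    hits, visited, stack = [], 0, [root]
    while stack:
        node = stack.pop()
        visited += 1
        for rect, ref in node.entries:
            if intersects(rect, query):
                if node.is_leaf:
                    hits.append(ref)
                else:
                    stack.append(ref)  # every intersecting subtree must be followed
    return hits, visited


leaf1 = Node(True, [((0, 0, 2, 2), "a"), ((3, 3, 5, 5), "b")])
leaf2 = Node(True, [((4, 4, 6, 6), "c"), ((7, 7, 9, 9), "d")])
# The two directory rectangles overlap in the region (4, 4)-(5, 5).
root = Node(False, [((0, 0, 5, 5), leaf1), ((4, 4, 9, 9), leaf2)])

hits, visited = search(root, (4.2, 4.2, 4.8, 4.8))
print(sorted(hits), visited)  # ['b', 'c'] 3
```

A query inside the overlap region visits both leaves; a query outside it, such as `(0, 0, 1, 1)`, visits only one.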

R-tree Variants
It is dynamic, hence all optimizations have to be applied during insertion

Insertion Algorithm
Finds the most suitable subtree for the new entry: ChooseSubtree Algorithm
If it ends in a node already filled with M entries, the node would hold more than M entries; distribute them into two nodes in an appropriate manner: Split Algorithm

R-tree Variants
Original R-tree: Guttman
Greene's R-tree

Original R-tree: Guttman
Method of optimization: minimize the directory rectangle area
Split algorithms: Exponential, Linear, Quadratic
Exponential is best, but its CPU cost is too high
The others are approximations
Quadratic outperforms Linear

Guttman's ChooseSubtree
CS1 Set N to be the root node
CS2 If N is a leaf, return N
    else choose the entry in N whose rectangle needs least area enlargement to include the new data. Resolve ties by choosing the entry with the rectangle of smallest area
    end
CS3 Set N to be the childnode pointed to by the childpointer of the chosen entry. Repeat from CS2
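
Steps CS1 to CS3 can be sketched in a few lines. The `Node` class, the `(rect, child)` entry layout and the 2D tuple rectangles are assumptions for this sketch, not part of Guttman's description.

```python
from typing import Optional, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


class Node:
    def __init__(self, is_leaf: bool, entries: Optional[list] = None):
        self.is_leaf = is_leaf
        # Each entry is (rect, child); child is a Node in directory nodes.
        self.entries = entries or []


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def choose_subtree(root: Node, new_rect: Rect) -> Node:
    n = root                                   # CS1
    while not n.is_leaf:                       # CS2
        # Least area enlargement first, smallest area as the tie-breaker.
        _, child = min(
            n.entries,
            key=lambda e: (area(union(e[0], new_rect)) - area(e[0]), area(e[0])),
        )
        n = child                              # CS3
    return n


left = Node(True)
right = Node(True)
root = Node(False, [((0, 0, 4, 4), left), ((10, 10, 12, 12), right)])
print(choose_subtree(root, (11, 11, 13, 13)) is right)  # True
```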

Guttman's Split Algorithm

Quadratic Split
[Divide a set of M+1 index entries into two groups]
QS1 Invoke PickSeeds to choose two entries to be the first entries of the two groups
QS2 Repeat DistributeEntry until all entries are distributed or one of the two groups has M-m+1 entries (so that the other group has m entries)
QS3 If entries remain, assign them to the other group so that it has the minimum number m required

Guttman's Split Algorithm (Cont..)

PickSeeds
PS1 For each pair of entries E1 and E2, compose a rectangle R including E1 rectangle and E2 rectangle. Calculate d = area(R) - area(E1 rectangle) - area(E2 rectangle)
PS2 Choose the pair with the largest d

DistributeEntry
DE1 Invoke PickNext to choose the next entry to be assigned
DE2 Add it to the group whose covering rectangle will have to be enlarged least to accommodate it. Resolve ties by adding the entry to the group with the smallest area, then to the one with the fewer entries, then to either

Guttman's Split Algorithm (Cont..)

PickNext
PN1 For each entry E not yet in a group, calculate d1 = the area increase required in the covering rectangle of Group 1 to include E rectangle. Calculate d2 analogously for Group 2
PN2 Choose the entry with the maximum difference between d1 and d2
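
Putting PickSeeds, PickNext and DistributeEntry together, the quadratic split could look roughly like this. It is a sketch under assumptions: rectangles are 2D tuples `(xmin, ymin, xmax, ymax)`, the input list holds exactly M+1 rectangles, and the function names are made up for the illustration.

```python
from itertools import combinations
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def pick_seeds(rects: List[Rect]) -> Tuple[int, int]:
    # PS1/PS2: the pair whose common rectangle wastes the most area.
    return max(
        combinations(range(len(rects)), 2),
        key=lambda p: area(union(rects[p[0]], rects[p[1]]))
        - area(rects[p[0]]) - area(rects[p[1]]),
    )


def quadratic_split(rects: List[Rect], m: int) -> List[List[Rect]]:
    i, j = pick_seeds(rects)                                   # QS1
    groups = [[rects[i]], [rects[j]]]
    boxes = [rects[i], rects[j]]
    rest = [r for k, r in enumerate(rects) if k not in (i, j)]
    limit = len(rects) - m          # M - m + 1, because len(rects) == M + 1

    def growth(r: Rect, g: int) -> float:
        return area(union(boxes[g], r)) - area(boxes[g])

    while rest:                                                # QS2
        full = [g for g in (0, 1) if len(groups[g]) == limit]
        if full:                                               # QS3
            groups[1 - full[0]].extend(rest)
            break
        # PickNext: the entry with the strongest preference for one group.
        nxt = max(rest, key=lambda r: abs(growth(r, 0) - growth(r, 1)))
        rest.remove(nxt)
        # DE2: least enlargement, then smaller area, then fewer entries.
        g = min((0, 1), key=lambda g: (growth(nxt, g), area(boxes[g]), len(groups[g])))
        groups[g].append(nxt)
        boxes[g] = union(boxes[g], nxt)
    return groups


rects = [(0, 0, 1, 1), (1, 0, 2, 1), (0, 1, 1, 2), (10, 10, 11, 11), (11, 10, 12, 11)]
for group in quadratic_split(rects, m=2):
    print(sorted(group))
```

With M = 4 and m = 2, the three rectangles near the origin end up in one group and the two far rectangles in the other.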

Problems
Small seeds: if a far-away rectangle has nearly the same coordinates as one seed in d-1 of the d axes, a needle-like bounding rectangle may be formed
This may initiate a bad split
[Figure: rectangles R1, R2 and R3]

Problems (Cont..)
Prefers the bounding rectangle: the algorithm prefers the MBR created by previous assignments
Since it was already enlarged, it requires less area enlargement to include the next entry
[Figure: groups G1 and G2 with entries X and Z]

If a group reaches M-m+1 entries, the rest are assigned to the other group without considering geometric properties

Greene's R-tree
ChooseSubtree is the same as Guttman's
Alternative split algorithm:
Invokes PickSeeds to find the two most distant rectangles
Picks an axis using these two rectangles, depending upon their separation distance
Sorts the remaining rectangles along the chosen axis
Distributes half of the entries to one group and the remaining ones to the other
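
A rough sketch of this split follows. Several details are assumptions that the slide does not spell out: the seeds are found with Guttman's PickSeeds criterion, the separation along an axis is the gap between the two seeds divided by the node's extent on that axis, all entries are sorted by their lower value, and with an odd number of entries the first group receives the extra one.

```python
from itertools import combinations
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def greene_split(rects: List[Rect]) -> Tuple[List[Rect], List[Rect]]:
    # Seeds: the pair that wastes the most area (Guttman's PickSeeds criterion).
    a, b = max(combinations(rects, 2),
               key=lambda p: area(union(*p)) - area(p[0]) - area(p[1]))
    box = rects[0]
    for r in rects[1:]:
        box = union(box, r)

    def separation(axis: int) -> float:
        # Gap between the seeds along this axis, normalized by the node extent.
        gap = max(a[axis], b[axis]) - min(a[axis + 2], b[axis + 2])
        extent = box[axis + 2] - box[axis]
        return gap / extent if extent > 0 else 0.0

    axis = max((0, 1), key=separation)             # 0 = x axis, 1 = y axis
    ordered = sorted(rects, key=lambda r: r[axis])  # sort by lower value
    half = (len(ordered) + 1) // 2
    return ordered[:half], ordered[half:]


rects = [(0, 0, 1, 1), (0, 5, 1, 6), (9, 0, 10, 1), (9, 5, 10, 6)]
print(greene_split(rects))
```

Here the seeds are separated more strongly along x than along y, so the entries are cut into a left and a right pair.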

Greene's R-tree (Cont..)

In some situations it cannot find the right axis, and a bad split may occur

Inspiration For R*-tree


Minimize overlap between directory rectangles: decreases the number of paths to be traversed
Minimize margin of a directory rectangle: the rectangle will be shaped more quadratic
Optimize space utilization: height will be low

R*-tree
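Structure same as R-tree
For insertion, the R-tree versions only consider area
R*-tree considers area, margin & overlap in different combinations
Overlap of an entry E_k in a node with entries E_1, ..., E_p is defined as
overlap(E_k) = sum over i = 1..p, i != k, of area(E_k rectangle ∩ E_i rectangle)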

R*-tree: ChooseSubtree
Similar to the original one; the only difference is that it minimizes overlap enlargement when the childpointers in N point to leaves
CS1 Set N to be the root node
CS2 If N is a leaf, return N
    else
      If the childpointers in N point to leaves [determine the minimum overlap cost], choose the entry in N whose rectangle needs least overlap enlargement to include the new data rectangle. Resolve ties by choosing the entry whose rectangle needs least area enlargement, then the entry with the rectangle of smallest area
      else [determine the minimum area cost] choose the entry in N whose rectangle needs least area enlargement to include the new data rectangle. Resolve ties by choosing the entry with the rectangle of smallest area
    end
CS3 Set N to be the childnode pointed to by the childpointer of the chosen entry. Repeat from CS2
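
The R*-tree variant of ChooseSubtree can be sketched as follows, with the overlap of an entry computed as the summed intersection area with the other entries of the same node. The `Node` class, the `(rect, child)` entry layout and the helper names are assumptions for this illustration.

```python
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


class Node:
    def __init__(self, is_leaf: bool, entries: Optional[list] = None):
        self.is_leaf = is_leaf
        self.entries = entries or []   # (rect, child Node) in directory nodes


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def intersection_area(a: Rect, b: Rect) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def overlap(rect: Rect, others: List[Rect]) -> float:
    return sum(intersection_area(rect, o) for o in others)


def choose_subtree(root: Node, new_rect: Rect) -> Node:
    n = root                                                   # CS1
    while not n.is_leaf:                                       # CS2
        rects = [e[0] for e in n.entries]

        def area_cost(k: int) -> tuple:
            return (area(union(rects[k], new_rect)) - area(rects[k]), area(rects[k]))

        def overlap_cost(k: int) -> float:
            others = rects[:k] + rects[k + 1:]
            grown = union(rects[k], new_rect)
            return overlap(grown, others) - overlap(rects[k], others)

        if n.entries[0][1].is_leaf:   # childpointers point to leaves
            key = lambda k: (overlap_cost(k),) + area_cost(k)
        else:
            key = area_cost
        best = min(range(len(rects)), key=key)
        n = n.entries[best][1]                                 # CS3
    return n


leaf_a, leaf_b = Node(True), Node(True)
root = Node(False, [((0, 0, 100, 20), leaf_a), ((0, 21, 30, 50), leaf_b)])
print(choose_subtree(root, (40, 20, 50, 22)) is leaf_b)  # True
```

In this example the first rectangle needs less area enlargement (200 versus 630), so the area criterion alone would pick `leaf_a`; enlarging it would create 30 units of overlap with its sibling, so the overlap criterion picks `leaf_b`.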

R*-tree: Split Algorithm


Three goodness values:
area-value: area[bb(first group)] + area[bb(second group)]
margin-value: margin[bb(first group)] + margin[bb(second group)]
overlap-value: area[bb(first group) ∩ bb(second group)]

Depending on these values, the final distribution is determined

*bb: bounding box
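
As a small worked example, the three goodness values of one distribution can be computed like this. The 2D tuple layout and the function names are assumptions; the margin of a rectangle is taken as the sum of its edge lengths, which in 2D is its perimeter.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def margin(r: Rect) -> float:
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def bb(group: List[Rect]) -> Rect:
    """Bounding box of a non-empty group of rectangles."""
    return (min(r[0] for r in group), min(r[1] for r in group),
            max(r[2] for r in group), max(r[3] for r in group))


def intersection_area(a: Rect, b: Rect) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def goodness(first: List[Rect], second: List[Rect]) -> Tuple[float, float, float]:
    """Return (area-value, margin-value, overlap-value) of a distribution."""
    b1, b2 = bb(first), bb(second)
    return (area(b1) + area(b2), margin(b1) + margin(b2), intersection_area(b1, b2))


first = [(0, 0, 2, 2), (1, 1, 3, 3)]   # bb = (0, 0, 3, 3)
second = [(2, 2, 5, 4)]                # bb = (2, 2, 5, 4)
print(goodness(first, second))  # (15, 22, 1)
```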

R*-tree: Split Algorithm (Cont..)


Method for a good split:
Along each axis, entries are sorted first by the lower and then by the upper value of their rectangles
For each sort, M-2m+2 distributions of the M+1 entries into two groups are determined
In the k-th distribution, the first group contains the first (m-1)+k entries and the second group contains the remaining ones (k = 1, ..., M-2m+2)
For each distribution, the goodness values are computed

R*-tree: Split Algorithm (Cont..)


Split
S1 Invoke ChooseSplitAxis to determine the axis, perpendicular to which the split is performed
S2 Invoke ChooseSplitIndex to determine the best distribution into two groups along that axis
S3 Distribute the entries into two groups

ChooseSplitAxis
CSA1 For each axis
       Sort the entries by the lower then by the upper value of their rectangles and determine all distributions as described above. Compute S, the sum of all margin-values of the different distributions
     end
CSA2 Choose the axis with the minimum S as split axis

ChooseSplitIndex
CSI1 Along the chosen split axis, choose the distribution with the minimum overlap-value. Resolve ties by choosing the distribution with minimum area-value
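
The split steps can be sketched as below for the 2D case. The tuple layout, the function names and the choice to return the two groups directly are assumptions for this illustration; the input list is assumed to hold the M+1 entries of the overfilled node.

```python
from typing import Iterator, List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)
Groups = Tuple[List[Rect], List[Rect]]


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def margin(r: Rect) -> float:
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def bb(group: List[Rect]) -> Rect:
    return (min(r[0] for r in group), min(r[1] for r in group),
            max(r[2] for r in group), max(r[3] for r in group))


def intersection_area(a: Rect, b: Rect) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def sorts_for_axis(rects: List[Rect], axis: int) -> List[List[Rect]]:
    # Two sorts per axis: by lower value and by upper value.
    return [sorted(rects, key=lambda r: r[axis]),
            sorted(rects, key=lambda r: r[axis + 2])]


def distributions(ordered: List[Rect], m: int) -> Iterator[Groups]:
    # First group holds (m - 1) + k entries for k = 1 .. M - 2m + 2.
    for size in range(m, len(ordered) - m + 1):
        yield ordered[:size], ordered[size:]


def choose_split_axis(rects: List[Rect], m: int) -> int:
    def margin_sum(axis: int) -> float:                       # CSA1
        return sum(margin(bb(g1)) + margin(bb(g2))
                   for order in sorts_for_axis(rects, axis)
                   for g1, g2 in distributions(order, m))
    return min((0, 1), key=margin_sum)                        # CSA2


def choose_split_index(rects: List[Rect], m: int, axis: int) -> Groups:
    candidates = [d for order in sorts_for_axis(rects, axis)
                  for d in distributions(order, m)]
    # CSI1: minimum overlap-value, ties resolved by minimum area-value.
    return min(candidates,
               key=lambda d: (intersection_area(bb(d[0]), bb(d[1])),
                              area(bb(d[0])) + area(bb(d[1]))))


def rstar_split(rects: List[Rect], m: int) -> Groups:
    axis = choose_split_axis(rects, m)                        # S1
    return choose_split_index(rects, m, axis)                 # S2, S3


rects = [(0, 0, 1, 1), (0, 2, 1, 3), (0, 4, 1, 5), (8, 0, 9, 1), (8, 4, 9, 5)]
print(choose_split_axis(rects, m=2))  # 0
print(rstar_split(rects, m=2))
```

With M = 4 and m = 2 the margin sum is 120 for the x axis and 176 for the y axis, so the split is performed perpendicular to x, and the overlap-free distribution (three left, two right) is chosen.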

Reinsert
Dealing with underfilled nodes in the R-tree: remove their entries and reinsert them
This improves retrieval performance
Deleting and reinserting entries tunes an existing R-tree, but this is a static form of tuning
To achieve dynamic reorganization, R*-tree uses forced reinsertion during the insertion routine

Forced Reinsert
If a node is overfilled, R*-tree picks p entries based on the distance of their centers from the center of the node's MBR
Removes the p entries and adjusts the MBR
Reinserts them, which can prevent a split
If they are reinserted into the same node again, then it calls split
The CPU cost of a single insert goes up, but averaged over a large number of inserts the cost rises by only about 4%, due to fewer splits and a better structure
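
The selection step could be sketched as follows: the p entries whose centers lie farthest from the center of the node's MBR are removed, and the MBR is recomputed from the rest. The function names and the 2D tuple layout are assumptions; the actual reinsertion through the insert routine is left out.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def bb(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def center(r: Rect) -> Tuple[float, float]:
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)


def pick_for_reinsert(rects: List[Rect], p: int) -> Tuple[List[Rect], List[Rect], Rect]:
    """Return (kept entries, entries to reinsert, adjusted MBR of the node)."""
    cx, cy = center(bb(rects))

    def dist2(r: Rect) -> float:
        x, y = center(r)
        return (x - cx) ** 2 + (y - cy) ** 2

    ordered = sorted(rects, key=dist2, reverse=True)   # farthest first
    removed, kept = ordered[:p], ordered[p:]
    return kept, removed, bb(kept)


node = [(4, 4, 6, 6), (3, 4, 5, 6), (5, 3, 7, 5), (4, 5, 6, 7), (18, 18, 20, 20)]
kept, removed, new_mbr = pick_for_reinsert(node, p=1)
print(removed, new_mbr)  # [(18, 18, 20, 20)] (3, 3, 7, 7)
```

Removing the single outlying entry shrinks the node's MBR from (3, 3, 20, 20) to (3, 3, 7, 7).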

Forced Reinsert: Advantages


More restructuring, fewer splits
Storage utilization is improved
Outer rectangles are reinserted, so the directory rectangle becomes more quadratic, which is a desired property

Experiments
Comparison between four R-tree variants:
R-tree with quadratic split algorithm (qua.Gut)
R-tree with linear split algorithm (lin.Gut)
Greene's variant of R-tree (Greene)
R*-tree

Six data files containing about 100,000 2D rectangles
All experiments were measured in number of disk accesses

Experiments (Cont..)
Types of queries:
Rectangle intersection query: given a rectangle S, find all rectangles R in the file with R ∩ S ≠ ∅
Point query: given a point P, find all rectangles R in the file with P ∈ R
Rectangle enclosure query: given a rectangle S, find all rectangles R in the file with R ⊇ S
Spatial join: over two files f1 and f2, the set of all pairs of rectangles where a rectangle from f1 intersects a rectangle from f2

Also measured: insertion cost and storage utilization

Seven query files were created
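
The query types can be written as simple predicates over rectangles. The sketch below evaluates them with a plain scan over a small in-memory file rather than through an index; the tuple layout and all names are assumptions, and rectangles are treated as closed (touching rectangles intersect).

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def intersects(r: Rect, s: Rect) -> bool:
    """Rectangle intersection: R and S share at least one point."""
    return r[0] <= s[2] and s[0] <= r[2] and r[1] <= s[3] and s[1] <= r[3]


def contains_point(r: Rect, p: Point) -> bool:
    """Point query: P lies in R."""
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def encloses(r: Rect, s: Rect) -> bool:
    """Rectangle enclosure: R contains S."""
    return r[0] <= s[0] and r[1] <= s[1] and s[2] <= r[2] and s[3] <= r[3]


def spatial_join(f1: List[Rect], f2: List[Rect]) -> List[Tuple[Rect, Rect]]:
    return [(a, b) for a in f1 for b in f2 if intersects(a, b)]


file: Dict[str, Rect] = {"a": (0, 0, 2, 2), "b": (1, 1, 5, 5), "c": (6, 6, 8, 8)}

print([k for k, r in file.items() if intersects(r, (1.5, 1.5, 3, 3))])  # ['a', 'b']
print([k for k, r in file.items() if contains_point(r, (1, 1))])        # ['a', 'b']
print([k for k, r in file.items() if encloses(r, (2, 2, 4, 4))])        # ['b']
print(spatial_join([(0, 0, 2, 2)], [(1, 1, 5, 5), (6, 6, 8, 8)]))
```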

Results
The page accesses for queries on the R*-tree are normalized to 100%; the performance of all four R-tree variants is reported relative to it

Results (Cont..)
Unweighted average results over all distributions

Results (Cont..)
R*-tree is also very efficient as a point access method (PAM)
It even outperforms the very popular 2-level grid file

Conclusions
Since all three (area, margin and overlap) are reduced, R*-tree is very robust against ugly data
Storage utilization is higher, insertion cost is low
Outperforms all existing R-tree variants
R*-tree can efficiently be used as an access method in database systems organizing both multidimensional points and spatial data

References
The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. N. Beckmann, H.-P. Kriegel, R. Schneider and B. Seeger. SIGMOD 1990
http://en.wikipedia.org/wiki/R-tree

Image Sources:
Maps & R-tree: http://electures.informatik.uni-freiburg.de/portal/download/3/8534/thm03%20-%20rTtee%20p1.ppt
Thumbs Up: http://www.ideachampions.com/weblogs/Peer%20to%20Peer%20Recognition.png
Gears: http://www.yesup.net/wordpress/wp-content/themes/yesupnet2/images/icon5.png
Tree: http://a01421.deviantart.com/art/tree-variants-304634600
Choose: http://www.transforming-technologies.com/blog/index.php/2011/06/16/how-to-choose-an-esd-mat/
Original: http://www.pixmac.com/picture/original+ink+stamp/000045168969
Split: http://www.clipartguide.com/_pages/0511-1001-2605-2460.html
Forced: http://thepoliticalcarnival.net/2011/05/
Advantages: http://www.webgraffiti.ca/advantages.html
Experiments: http://nilssmith.com/becoming-a-social-media-pastor-part-4-the-experiment/
Results: http://www.iconshock.com/icons/sigma/project_managment/results-icon.html
Conclusions: http://herbertjlkld.portrelay.com/conclusions-clip-art.html
Introduction: http://www.eng.fju.edu.tw/iacd_2010S/computer/introduction1.htm

Thanks

Q/A
