An Efficient and Robust Access Method For Points and Rectangles
Overview
Introduction
R-tree and Its Optimization
R-tree Variants
R*-tree
Experiments
Conclusions
Introduction
Spatial Access Methods (SAMs): approximate a complex spatial object by its minimum bounding rectangle (MBR)
Pros:
A complex object can be represented by a limited number of bytes
Preserves the most essential geometric properties, i.e. location and extension
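As a minimal sketch of the approximation step, the snippet below computes the MBR of a 2D object given as a list of vertices. The tuple layout (xmin, ymin, xmax, ymax), the function name mbr and the sample footprint are assumptions made for illustration, not part of the slides.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)

def mbr(points: Iterable[Point]) -> Rect:
    """Return the minimum bounding rectangle of a non-empty set of 2D points."""
    pts = list(points)
    if not pts:
        raise ValueError("cannot bound an empty object")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs), max(ys))

# Hypothetical L-shaped building footprint with six vertices.
building = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 3), (0, 3)]
building_mbr = mbr(building)  # four numbers stand in for the whole outline
```

However many vertices the object has, the index only stores four numbers per object, which is the "limited number of bytes" advantage named above.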
Introduction (Cont..)
R-tree:
B+-tree like structure
Popular access method for rectangles
Based on heuristic optimization of the area of the enclosing rectangles in each inner node
R*-tree:
Combined optimization of area, margin and overlap
Outperforms existing R-tree variants
Efficiently supports point and spatial data
Introduction (Cont..)
Given a city map, index all university buildings in an efficient structure for quick topological search.
Introduction (Cont..)
MBR of the city neighbourhoods.
R-tree
B+-tree like structure
Nodes: entries E = (cp, rectangle)
cp: child pointer; for leaves it points to a record in the database
rectangle: MBR of all rectangles in the child node; for leaves it is the enclosing rectangle of the spatial object
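The entry layout E = (cp, rectangle) can be sketched as two small record types. The class names Entry and Node, the is_leaf flag, the integer record ids and the helper node_mbr are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)

@dataclass
class Entry:
    rect: Rect              # MBR of the child node, or of the spatial object in a leaf
    cp: Union["Node", int]  # child node, or (in a leaf) a hypothetical record id

@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)

def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def node_mbr(node: Node) -> Rect:
    """MBR of all rectangles stored in a node (the node must not be empty)."""
    rect = node.entries[0].rect
    for e in node.entries[1:]:
        rect = union(rect, e.rect)
    return rect

# A leaf with two data rectangles, and a directory entry that points to it.
leaf = Node(is_leaf=True, entries=[Entry((0, 0, 2, 1), 101), Entry((1, 2, 3, 4), 102)])
root = Node(is_leaf=False, entries=[Entry(node_mbr(leaf), leaf)])
```

The directory entry carries the MBR of everything below it, so a search can discard a whole subtree by testing a single rectangle.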
R-tree (Cont..)
Structure
[Figure: directory node entries I(A), I(B), I(M); leaf node entries I(a), I(b), I(c), I(d) for the data rectangles a, b, c, d]
R-tree (Cont..)
Based on heuristic optimization of the area of the enclosing rectangles in each inner node
Allows directory rectangles to overlap, hence a single search path cannot be guaranteed
R-tree Variants
The R-tree is dynamic, hence all optimizations have to be applied during insertion
Insertion algorithm
ChooseSubtree algorithm: finds the most suitable subtree for a new entry
Split algorithm
R-tree Variants
Original R-tree: Guttman
rectangle area
Split algorithms: exponential, linear, quadratic
Exponential is best, but its CPU cost is too high
The others are approximations
Quadratic outperforms linear
Greene's R-tree
Guttman's ChooseSubtree
CS1 Set N to be the root node
CS2 If N is a leaf, return N
    else choose the entry in N whose rectangle needs least area enlargement to include the new data. Resolve ties by choosing the entry with the rectangle of smallest area
    end
CS3 Set N to be the child node pointed to by the child pointer of the chosen entry. Repeat from CS2
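Steps CS1 to CS3 can be sketched in a few lines. The Node class (with an empty children list marking a leaf) and the helper names are assumptions for this illustration; only the selection rule is taken from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)

def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])

def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def enlargement(r: Rect, new: Rect) -> float:
    """Extra area r needs in order to include new."""
    return area(union(r, new)) - area(r)

@dataclass
class Node:
    rect: Rect
    children: List["Node"] = field(default_factory=list)  # empty list: leaf (assumption)

def choose_subtree(root: Node, new: Rect) -> Node:
    n = root                                   # CS1
    while n.children:                          # CS2: stop when N is a leaf
        # Least area enlargement first, smallest area as tie-breaker.
        n = min(n.children,                    # CS3: descend into the chosen entry
                key=lambda c: (enlargement(c.rect, new), area(c.rect)))
    return n

leaf_a = Node((0, 0, 2, 2))
leaf_b = Node((5, 5, 9, 9))
root = Node((0, 0, 9, 9), [leaf_a, leaf_b])
chosen = choose_subtree(root, (1, 1, 3, 3))   # leaf_a needs far less enlargement
```

Sorting on the pair (enlargement, area) expresses the tie-breaking rule directly, since Python compares tuples element by element.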
DistributeEntry
DE1 Invoke PickNext to choose the next entry to be assigned
DE2 Add it to the group whose covering rectangle will have to be enlarged least to accommodate it. Resolve ties by adding the entry to the group with the smallest area, then to the one with the fewer entries, then to either
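A minimal sketch of DistributeEntry follows. PickNext is not spelled out on the slides; the version here follows Guttman's quadratic rule (take the entry with the greatest difference in enlargement between the two groups) and should be read as an assumption, as should all function names.

```python
from functools import reduce
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)

def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])

def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def cover(group: List[Rect]) -> Rect:
    """Covering rectangle of a non-empty group."""
    return reduce(union, group)

def enlargement(r: Rect, new: Rect) -> float:
    return area(union(r, new)) - area(r)

def pick_next(remaining: List[Rect], g1: List[Rect], g2: List[Rect]) -> Rect:
    # Assumed PickNext: the entry with the strongest preference for one group.
    bb1, bb2 = cover(g1), cover(g2)
    return max(remaining, key=lambda r: abs(enlargement(bb1, r) - enlargement(bb2, r)))

def distribute_entry(remaining: List[Rect], g1: List[Rect], g2: List[Rect]) -> None:
    e = pick_next(remaining, g1, g2)                      # DE1
    remaining.remove(e)
    bb1, bb2 = cover(g1), cover(g2)
    # DE2: least enlargement, then smallest area, then fewer entries, then either.
    key1 = (enlargement(bb1, e), area(bb1), len(g1))
    key2 = (enlargement(bb2, e), area(bb2), len(g2))
    (g1 if key1 <= key2 else g2).append(e)

# Two seeds far apart, two entries left to assign.
g1, g2 = [(0, 0, 1, 1)], [(10, 10, 11, 11)]
remaining = [(1, 1, 2, 2), (9, 9, 10, 10)]
while remaining:
    distribute_entry(remaining, g1, g2)
```

Each remaining entry ends up next to the seed it is close to, because joining the other group would require a much larger covering rectangle.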
Problems
Small seeds: if a far-away rectangle agrees with one seed in d-1 of the d axes, a needle-like bounding rectangle may be formed. This may initiate a bad split
[Figure: rectangles R1, R2 and R3]
Problems (Cont..)
Preferred bounding rectangle: the algorithm prefers the MBR created by the previous assignment. Since that MBR was already enlarged, it requires less area enlargement to include the next entry
[Figure labels: G1, G2, X, Z]
Greenes R-tree
ChooseSubtree is the same as Guttman's
Alternative split algorithm:
Invokes PickSeeds to find the two most distant rectangles
Chooses an axis based upon their separation distance
Sorts the remaining rectangles along the chosen axis
Distributes half of the entries to one group and the remaining entries to the other
may occur
Decrease the number of paths to be traversed
Minimize the margin of a directory rectangle: the rectangle will be shaped more quadratic
Optimize storage utilization: the height of the tree will be low
R*-tree
Structure same as R-tree
For insertion, the R-tree variants consider only area
The R*-tree considers area, margin and overlap in different combinations
R*-tree: ChooseSubtree
Similar to the original one; the only difference is that it minimizes overlap enlargement when the child pointers in N point to leaves
CS1 Set N to be the root node
CS2 If N is a leaf, return N
    else if the child pointers in N point to leaves [determine the minimum overlap cost], choose the entry in N whose rectangle needs least overlap enlargement to include the new data rectangle. Resolve ties by choosing the entry whose rectangle needs least area enlargement, then the entry with the rectangle of smallest area
    else [determine the minimum area cost] choose the entry in N whose rectangle needs least area enlargement to include the new data rectangle. Resolve ties by choosing the entry with the rectangle of smallest area
    end
CS3 Set N to be the child node pointed to by the child pointer of the chosen entry. Repeat from CS2
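The two cost rules can be sketched as follows. The slides do not define overlap enlargement; this sketch assumes that the overlap of an entry is the summed intersection area with all other entries of the same node, and that overlap enlargement is the increase of that sum when the entry grows to include the new rectangle. The Node class and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)

def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])

def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def intersection_area(a: Rect, b: Rect) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0

def enlargement(r: Rect, new: Rect) -> float:
    return area(union(r, new)) - area(r)

def overlap_enlargement(i: int, rects: List[Rect], new: Rect) -> float:
    # Assumed definition: growth of the summed overlap with the sibling entries.
    others = rects[:i] + rects[i + 1:]
    before = sum(intersection_area(rects[i], o) for o in others)
    after = sum(intersection_area(union(rects[i], new), o) for o in others)
    return after - before

@dataclass
class Node:
    rect: Rect
    children: List["Node"] = field(default_factory=list)  # empty list: leaf (assumption)

def choose_subtree(root: Node, new: Rect) -> Node:
    n = root                                                  # CS1
    while n.children:                                         # CS2
        rects = [c.rect for c in n.children]
        if not n.children[0].children:                        # children are leaves
            key = lambda i: (overlap_enlargement(i, rects, new),
                             enlargement(rects[i], new), area(rects[i]))
        else:                                                 # minimum area cost
            key = lambda i: (enlargement(rects[i], new), area(rects[i]))
        n = n.children[min(range(len(rects)), key=key)]       # CS3
    return n

leaf_a = Node((0, 0, 5, 4))
leaf_b = Node((4.5, 4.5, 7, 9))
root = Node((0, 0, 7, 9), [leaf_a, leaf_b])
chosen = choose_subtree(root, (5, 3, 6, 4))
```

In this example leaf_b needs less area enlargement (3.75 against 4), but growing it would create new overlap with leaf_a, so the overlap rule selects leaf_a instead.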
area-value: area[bb(first group)] + area[bb(second group)]
margin-value: margin[bb(first group)] + margin[bb(second group)]
overlap-value: area[bb(first group) ∩ bb(second group)]
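The three goodness values can be computed directly from the bounding boxes of the two groups. This sketch assumes 2D rectangles stored as (xmin, ymin, xmax, ymax) and takes margin to mean the perimeter of a rectangle; the function names are chosen for illustration.

```python
from functools import reduce
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)

def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])

def margin(r: Rect) -> float:
    # Assumption: margin is the sum of the edge lengths (the perimeter in 2D).
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))

def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def bb(group: List[Rect]) -> Rect:
    """Bounding box of a non-empty group of rectangles."""
    return reduce(union, group)

def intersection_area(a: Rect, b: Rect) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0

def area_value(g1: List[Rect], g2: List[Rect]) -> float:
    return area(bb(g1)) + area(bb(g2))

def margin_value(g1: List[Rect], g2: List[Rect]) -> float:
    return margin(bb(g1)) + margin(bb(g2))

def overlap_value(g1: List[Rect], g2: List[Rect]) -> float:
    return intersection_area(bb(g1), bb(g2))

first = [(0, 0, 2, 2), (1, 0, 3, 2)]   # bb = (0, 0, 3, 2)
second = [(2, 1, 5, 3)]                # bb = (2, 1, 5, 3)
values = (area_value(first, second), margin_value(first, second),
          overlap_value(first, second))
```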
ChooseSplitAxis
CSA1 For each axis
       Sort the entries by the lower then by the upper value of their rectangles and determine all distributions as described above. Compute S, the sum of all margin-values of the different distributions
     end
CSA2 Choose the axis with the minimum S as split axis
ChooseSplitIndex
CSI1 Along the chosen split axis, choose the distribution with the minimum overlap-value. Resolve ties by choosing the distribution with minimum area-value
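The split can be sketched end to end under a few stated assumptions: rectangles are 2D tuples (xmin, ymin, xmax, ymax), margin means perimeter, each axis is examined in two sort orders (by lower value and by upper value), and a distribution is any cut of a sorted sequence that leaves at least m entries in each group. The parameter m and all function names are hypothetical.

```python
from functools import reduce
from typing import Iterator, List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)
Groups = Tuple[List[Rect], List[Rect]]

def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def bb(group: List[Rect]) -> Rect:
    return reduce(union, group)

def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])

def margin(r: Rect) -> float:
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))  # assumption: perimeter

def overlap_value(g1: List[Rect], g2: List[Rect]) -> float:
    a, b = bb(g1), bb(g2)
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0

def area_value(g1: List[Rect], g2: List[Rect]) -> float:
    return area(bb(g1)) + area(bb(g2))

def margin_value(g1: List[Rect], g2: List[Rect]) -> float:
    return margin(bb(g1)) + margin(bb(g2))

def sort_orders(entries: List[Rect], axis: int) -> List[List[Rect]]:
    by_lower = sorted(entries, key=lambda r: (r[axis], r[axis + 2]))
    by_upper = sorted(entries, key=lambda r: (r[axis + 2], r[axis]))
    return [by_lower, by_upper]

def distributions(order: List[Rect], m: int) -> Iterator[Groups]:
    # Assumed rule: every cut that leaves at least m entries on each side.
    for k in range(m, len(order) - m + 1):
        yield order[:k], order[k:]

def choose_split_axis(entries: List[Rect], m: int) -> int:
    def s(axis: int) -> float:                                 # CSA1
        return sum(margin_value(g1, g2)
                   for order in sort_orders(entries, axis)
                   for g1, g2 in distributions(order, m))
    return min((0, 1), key=s)                                  # CSA2

def choose_split_index(entries: List[Rect], axis: int, m: int) -> Groups:
    candidates = [d for order in sort_orders(entries, axis)
                  for d in distributions(order, m)]
    return min(candidates,                                     # CSI1
               key=lambda d: (overlap_value(*d), area_value(*d)))

# Two columns of unit squares, one near x = 0 and one near x = 10.
entries = [(0, 0, 1, 1), (0, 2, 1, 3), (10, 0, 11, 1), (10, 2, 11, 3), (10, 4, 11, 5)]
axis = choose_split_axis(entries, m=2)
group1, group2 = choose_split_index(entries, axis, m=2)
```

For this data the margin sum S is smaller along the x axis, and the overlap-free cut separates the left column from the right column.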
Reinsert
Dealing with underfilled nodes in the R-tree: remove the node's entries and reinsert them
This improves retrieval performance
Deletion and reinsertion tune the R-tree, but this kind of reorganization is rather static
To achieve dynamic reorganization, the R*-tree uses forced reinsertion during the insertion routine
Forced Reinsert
If a node is overfilled, the R*-tree selects p entries based on the distance of their centers from the center of the node's MBR
It removes the p entries and adjusts the MBR
It reinserts them, which may prevent a split
If they are reinserted into the same node again, it calls split
The CPU cost of an insert increases, but averaged over a large number of insertions the cost increases only by about 4%, due to fewer splits and a better structure
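The selection step can be sketched as below. The slides only say that entries are chosen by center distance; taking the p entries farthest from the center of the MBR is an assumption here, as are the function names and the value of p.

```python
import math
from functools import reduce
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)

def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def center(r: Rect) -> Tuple[float, float]:
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)

def pick_for_reinsert(entries: List[Rect], p: int) -> Tuple[List[Rect], List[Rect]]:
    """Split an overfull node's entries into (to_reinsert, kept)."""
    node_center = center(reduce(union, entries))
    # Assumption: sort by decreasing center distance and take the first p.
    by_distance = sorted(entries,
                         key=lambda r: math.dist(center(r), node_center),
                         reverse=True)
    return by_distance[:p], by_distance[p:]

entries = [(4, 4, 6, 6), (0, 0, 1, 1), (3, 3, 4, 4), (9, 8, 10, 10)]
to_reinsert, kept = pick_for_reinsert(entries, p=1)
adjusted_mbr = reduce(union, kept)   # MBR of the node after removal
```

Removing the outlying entry shrinks the node's MBR, which is the tightening effect that reinsertion aims for; the removed entries would then be passed back to the normal insertion routine.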
Experiments
Comparison between four R-tree variants:
R-tree with quadratic split algorithm (qua.Gut)
R-tree with linear split algorithm (lin.Gut)
Greene's variant of the R-tree (Greene)
R*-tree
Six data files containing about 100,000 2D rectangles
All experiments were measured in number of disk accesses
Experiments (Cont..)
Types of queries:
Rectangle intersection query: given a rectangle S, find all rectangles R in the file with R ∩ S ≠ ∅
Point query: given a point P, find all rectangles R in the file with P ∈ R
Rectangle enclosure query: given a rectangle S, find all rectangles R in the file with R ⊇ S
Spatial join: over two files f1 and f2, the set of all pairs of rectangles where the rectangle from f1 intersects the rectangle from f2
Also measured: insertion cost and storage utilization
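The query predicates can be written down as plain functions. This sketch evaluates them by a linear scan over a list, not through an index, and it assumes closed rectangles (touching borders count as intersecting) stored as (xmin, ymin, xmax, ymax). All names are chosen for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)

def intersects(r: Rect, s: Rect) -> bool:
    """R and S share at least one point (closed rectangles assumed)."""
    return r[0] <= s[2] and s[0] <= r[2] and r[1] <= s[3] and s[1] <= r[3]

def contains_point(r: Rect, p: Point) -> bool:
    """P lies in R."""
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]

def encloses(r: Rect, s: Rect) -> bool:
    """R is a superset of S."""
    return r[0] <= s[0] and r[1] <= s[1] and s[2] <= r[2] and s[3] <= r[3]

def intersection_query(file: List[Rect], s: Rect) -> List[Rect]:
    return [r for r in file if intersects(r, s)]

def point_query(file: List[Rect], p: Point) -> List[Rect]:
    return [r for r in file if contains_point(r, p)]

def enclosure_query(file: List[Rect], s: Rect) -> List[Rect]:
    return [r for r in file if encloses(r, s)]

def spatial_join(f1: List[Rect], f2: List[Rect]) -> List[Tuple[Rect, Rect]]:
    return [(a, b) for a in f1 for b in f2 if intersects(a, b)]

f1 = [(0, 0, 4, 4), (5, 5, 6, 6)]
f2 = [(3, 3, 5, 5), (8, 8, 9, 9)]
pairs = spatial_join(f1, f2)
```

An R-tree answers the same queries by applying these tests to directory rectangles first, so that whole subtrees can be skipped instead of scanning every entry.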
Results
The number of page accesses per query for the R*-tree is normalized to 100%
The performance of all four R-tree variants is reported relative to it
Results (Cont..)
Unweighted average results over all distributions
Results (Cont..)
The R*-tree is also very efficient as a point access method (PAM)
It even outperforms the very popular 2-level grid file
Conclusions
Since area, margin and overlap are all reduced, the R*-tree is very robust against ugly data
Storage utilization is higher and insertion cost is low
Outperforms all existing R-tree variants
The R*-tree can be used efficiently as an access method in database systems organizing both multidimensional points and spatial data
References
N. Beckmann, H.-P. Kriegel, R. Schneider and B. Seeger. The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. SIGMOD 1990
http://en.wikipedia.org/wiki/R-tree
Thanks
Q/A