Finding The Closest Pair of Points
Problem: Given n ordered pairs (x1, y1), (x2, y2), ..., (xn, yn), find the distance between the two points in the set that are closest together.

The brute force algorithm is as follows: Iterate through all possible pairs of points, calculating the distance between each of these pairs. Any time you see a distance shorter than the shortest distance seen so far, update the shortest distance seen. Since computing the distance between two points takes O(1) time, and there are a total of n(n-1)/2 = Θ(n^2) distinct pairs of points, it follows that the running time of this algorithm is Θ(n^2).

Can we do better? Here's the idea:

1) Split the set of n points into two halves by a vertical line. (We can do this by sorting all the points by their x-coordinate and then picking the middle point and drawing a vertical line just to the left or right of it.)
2) Recursively call the function to solve the problem on both sets of points.
3) Return the smaller of the two values.

What's the problem with this idea?

The problem is that the actual shortest distance between any two of the original points MIGHT BE between a point from the first set and a point in the second set! Consider this situation:

[Figure: four points split by a vertical line into group A (left) and group B (right).]
Here we have four points separated into groups A and B. Let the points be labeled 1, 2, 3 and 4 from left to right. Notice that the minimum of the distances between points 1 and 2 and between points 3 and 4 is NOT the actual shortest possible distance between points, since points 2 and 3 are much closer to one another.

Thus, we must adapt our approach: In step 3, we can "save" the smaller of the two values (we'll call this δ). Then we have to check whether there are points (one in each group) that are closer than δ apart.

Do we need to search through all possible pairs of points from the two different sides? Probably not. We must only consider points that are within a distance of δ of our dividing line.

[Figure: groups A and B, with a strip extending a distance δ to each side of the dividing line.]
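The failure of the naive split, and the effect of restricting attention to the strip, can be checked on a small instance. The following is only a sketch: the four coordinates are hypothetical (they are not the points from the figure), and the helper names `brute_force` and `naive_split` are made up for this illustration.

```python
from itertools import combinations
from math import dist, inf

def brute_force(points):
    """Smallest distance over all pairs; inf if there are fewer than two points."""
    return min((dist(p, q) for p, q in combinations(points, 2)), default=inf)

def naive_split(points):
    """The flawed idea: solve each half separately and ignore cross pairs."""
    pts = sorted(points)                 # sort by x-coordinate
    if len(pts) <= 3:
        return brute_force(pts)
    mid = len(pts) // 2
    return min(naive_split(pts[:mid]), naive_split(pts[mid:]))

# Hypothetical points 1..4 from left to right; points 2 and 3 straddle the line.
points = [(0, 0), (4, 0), (5, 0), (9, 0)]

delta = naive_split(points)              # 4.0: misses the pair (4, 0), (5, 0)
x_line = sorted(points)[len(points) // 2][0]
# Only points strictly within delta of the dividing line can beat delta.
strip = [p for p in points if abs(p[0] - x_line) < delta]

print(delta, brute_force(points), strip)
```

Here the naive split reports 4.0 while the true answer is 1.0, and the strip contains exactly the two points that form the missed pair.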
Still, however, one could construct a case where ALL the points on each side are within δ of the vertical line:

[Figure: points on both sides of the dividing line, all lying within δ of it.]

So, technically speaking, this case is as bad as our original idea, where we'd have to compare each pair of points from the different groups to one another.

But wait, is it really necessary to compare each point on one side with every point on the other side?

Consider the following rectangle around the dividing line that is constructed from eight δ/2 x δ/2 squares (a 2δ x δ rectangle, four squares wide and two squares high). Note that the diagonal of each square is δ/sqrt(2), which is less than δ. Since each square lies on a single side of the dividing line, at most one point lies in each box, because if two points were within a single box the distance between those two points would be less than δ, contradicting the definition of δ.

This means that there are at MOST 7 other points that could possibly be a distance of less than δ from a given point and that have a greater y-coordinate than that point. (We assume that our point is on the bottom row of this grid; we draw the grid that way.)
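For reference, the diagonal bound used in the packing argument can be written out as a worked equation. This is just the Pythagorean theorem applied to one δ/2 x δ/2 square; the symbol d for the diagonal is introduced here only for illustration.

```latex
\[
d \;=\; \sqrt{\left(\tfrac{\delta}{2}\right)^{2} + \left(\tfrac{\delta}{2}\right)^{2}}
  \;=\; \frac{\delta}{\sqrt{2}} \;\approx\; 0.707\,\delta \;<\; \delta .
\]
```

Two points in the same square are at most d apart, so a square can hold at most one point. With eight squares in the rectangle, one of which holds the given point, at most 7 other points remain to be checked.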
Now we come to the issue of how we know WHICH 7 points to compare a given point with. The idea is as follows: as you are processing the points recursively, SORT them based on the y-coordinate. Then, for a given point within the strip, you only need to compare with the next seven points!

Here is a pseudocode version of the algorithm. (I have simplified the pseudocode from the book so that it's easier to get an overall understanding of the flow of the algorithm.)

    closest_pair(p) {
        mergesort(p, 1, n)        // sort by x-coordinate; n is the number of points
        return rec_cl_pair(p, 1, n)
    }

    rec_cl_pair(p, i, j) {
        if (j - i < 3) {          // If there are three points or less...
            mergesort(p, i, j)    // sort p[i..j] by y-coordinate
            return shortest_distance(p, i, j)    // brute force over p[i..j]
        }
        k = (i+j)/2
        xval = p[k].x
        deltaL = rec_cl_pair(p, i, k)
        deltaR = rec_cl_pair(p, k+1, j)
        delta = min(deltaL, deltaR)
        merge(p, i, k, j)         // merge the two halves by y-coordinate
        v = vert_strip(p, xval, delta)    // points of p[i..j] within delta of the line x = xval
        t = size(v)
        for r = 1 to t-1
            for s = (r+1) to min(t, r+7)
                delta = min(delta, dist(v[r], v[s]))
        return delta
    }

Let's trace through an example with the following 9 points:

(0, 0), (1, 6), (2, 8), (2, 3), (3, 4), (5, 1), (6, 7), (7, 4) and (8, 0).

These are already sorted by x-coordinate. We will split the points in half, with the first four in the first half and the last five in the last half. So, we must recursively call rec_cl_pair on the set

{(0, 0), (1, 6), (2, 8), (2, 3)}

This results in two more recursive calls, to rec_cl_pair on the sets {(0, 0), (1, 6)} and {(2, 8), (2, 3)}. The first call simply returns the distance sqrt(37) and sorts the points by y-coordinate, which leaves them unchanged. The second call returns 5 and swaps the order of the points to (2, 3), (2, 8) in the array p. Then delta is set to 5, and the points are all merged by y-coordinate: (0, 0), (2, 3), (1, 6), (2, 8). (All of these are then copied into the v array, since all points are within a distance of 5 of the line x=1.) Since we have fewer than 7 points, all points are compared, and it is discovered that (1, 6) and (2, 8) are a distance of sqrt(5) apart, so this call returns sqrt(5).

Now, let's call rec_cl_pair on the set

{(3, 4), (5, 1), (6, 7), (7, 4), (8, 0)}

We first make two recursive calls, on the sets {(3, 4), (5, 1)} and {(6, 7), (7, 4), (8, 0)}. The first will result in deltaL being set to sqrt(13) and the points getting sorted by y: {(5, 1), (3, 4)}.

The second results in deltaR being set to sqrt(10) and the points getting sorted by y: {(8, 0), (7, 4), (6, 7)}. We then merge all of these points: {(8, 0), (5, 1), (3, 4), (7, 4), (6, 7)}. All of these points get copied into v. Then we discover that NO pair of points beats the original deltaR of sqrt(10). This is then the closest distance between any pair of points within the second set of data.

Finally, we must merge ALL the points together by y-coordinate:

(0, 0), (8, 0), (5, 1), (2, 3), (3, 4), (7, 4), (1, 6), (6, 7), (2, 8)

This time, we only pick those points that are within sqrt(10) of the line x=2 to copy into v. These points are:

(0, 0), (5, 1), (2, 3), (3, 4), (1, 6), (2, 8)

(Strictly, delta = min(sqrt(5), sqrt(10)) = sqrt(5) at this step, which would also exclude (5, 1); using the wider strip does not change the result.)

Now, we scan through all pairs to discover that the shortest distance between any two of the points is sqrt(2), between (2, 3) and (3, 4).
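The pseudocode and the trace can be sketched in Python as follows. This is a minimal illustration rather than the book's code: it uses 0-based indices, the names `closest_pair`, `_rec` and `by_y` are chosen here, and `heapq.merge` stands in for the merge step. Because of the 0-based midpoint, the halves may be split slightly differently than in the hand trace, but the returned distance is the same.

```python
from heapq import merge
from math import dist, inf

def by_y(q):
    """Sort key: the y-coordinate of a point."""
    return q[1]

def closest_pair(points):
    """Return the smallest distance between any two of the given points."""
    p = sorted(points)                  # sort by x-coordinate (ties by y)
    return _rec(p, 0, len(p) - 1)

def _rec(p, i, j):
    """Closest distance within p[i..j]; leaves p[i..j] sorted by y."""
    if j - i < 3:                       # three points or fewer: brute force
        p[i:j + 1] = sorted(p[i:j + 1], key=by_y)
        return min((dist(p[a], p[b])
                    for a in range(i, j + 1)
                    for b in range(a + 1, j + 1)), default=inf)
    k = (i + j) // 2
    xval = p[k][0]                      # read before the recursion reorders p
    delta = min(_rec(p, i, k), _rec(p, k + 1, j))
    # Merge the two y-sorted halves into one y-sorted run.
    p[i:j + 1] = list(merge(p[i:k + 1], p[k + 1:j + 1], key=by_y))
    # Keep only the points within delta of the dividing line, still sorted by y.
    v = [q for q in p[i:j + 1] if abs(q[0] - xval) < delta]
    for a in range(len(v)):
        # Compare each strip point with at most the next seven points.
        for b in range(a + 1, min(len(v), a + 8)):
            delta = min(delta, dist(v[a], v[b]))
    return delta

pts = [(0, 0), (1, 6), (2, 8), (2, 3), (3, 4), (5, 1), (6, 7), (7, 4), (8, 0)]
print(closest_pair(pts))                # about 1.4142, i.e. sqrt(2)
```

Reading `xval` before recursing matters: the recursive calls reorder the subarray by y, so the middle element afterwards is no longer the median by x.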
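Strassen's Algorithm

If C = AB, where A and B are n x n matrices, the standard divide and conquer algorithm splits each matrix into four n/2 x n/2 blocks and computes the blocks of C using 8 block multiplications and 4 block additions. Its running time therefore satisfies

T(n) = 8T(n/2) + O(n^2)

Plug in a = 8, b = 2, k = 2 => log_b a = 3 => T(n) = O(n^3).

Strassen showed how two 2 x 2 (block) matrices can be multiplied using only 7 multiplications and 18 additions. Consider calculating the following 7 products:

q1 = (a11 + a22) * (b11 + b22)
q2 = (a21 + a22) * b11
q3 = a11 * (b12 - b22)
q4 = a22 * (b21 - b11)
q5 = (a11 + a12) * b22
q6 = (a21 - a11) * (b11 + b12)
q7 = (a12 - a22) * (b21 + b22)

It turns out that

c11 = q1 + q4 - q5 + q7
c12 = q3 + q5
c21 = q2 + q4
c22 = q1 + q3 - q2 + q6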
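The seven products and the four combinations can be checked numerically on a 2 x 2 example. The sketch below uses plain numbers as matrix entries; the function names `strassen_2x2` and `standard_2x2` and the sample matrices are chosen here for illustration only.

```python
def strassen_2x2(a, b):
    """Multiply two 2 x 2 matrices using the 7 products q1..q7."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    q1 = (a11 + a22) * (b11 + b22)
    q2 = (a21 + a22) * b11
    q3 = a11 * (b12 - b22)
    q4 = a22 * (b21 - b11)
    q5 = (a11 + a12) * b22
    q6 = (a21 - a11) * (b11 + b12)
    q7 = (a12 - a22) * (b21 + b22)
    return [[q1 + q4 - q5 + q7, q3 + q5],
            [q2 + q4, q1 + q3 - q2 + q6]]

def standard_2x2(a, b):
    """Multiply two 2 x 2 matrices with the usual 8 multiplications."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(strassen_2x2(a, b))    # [[19, 22], [43, 50]]
print(standard_2x2(a, b))    # [[19, 22], [43, 50]]
```

None of the formulas relies on multiplication being commutative, which is what allows the entries to be replaced by n/2 x n/2 blocks in the recursive algorithm.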
Note: I haven't actually looked at Strassen's paper. This is just from the text. In going through other books, I saw a different set of products than these. (I also saw these in other books, though.) The different set of products I saw required 24 additions instead of 18, so this is a slight improvement (though not an asymptotic one) over the other algorithm I saw in a different book.
Let's verify a couple of these:

q1 + q4 - q5 + q7
  = (a11 + a22) * (b11 + b22) + a22 * (b21 - b11) - (a11 + a12) * b22 + (a12 - a22) * (b21 + b22)
  = a11b11 + a11b22 + a22b11 + a22b22 + a22b21 - a22b11 - a11b22 - a12b22 + a12b21 + a12b22 - a22b21 - a22b22
  = a11b11 + a12b21, since everything else cancels out.

q3 + q5
  = a11 * (b12 - b22) + (a11 + a12) * b22
  = a11b12 - a11b22 + a11b22 + a12b22
  = a11b12 + a12b22

I have no idea how Strassen came up with these combinations. He probably realized that he wanted to determine each element in the product using fewer than 8 multiplications. From there, he probably just played around with it. In the algorithms text by Cormen, Leiserson and Rivest, they show how one could derive these products.

If we let T(n) be the running time of Strassen's algorithm, then it satisfies the following recurrence relation:

T(n) = 7T(n/2) + O(n^2)

It's important to note that the hidden constant in the O(n^2) term is larger than the corresponding constant for the standard divide and conquer algorithm for this problem. However, for large matrices this algorithm yields an improvement over the standard one with respect to time.
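To make the improvement concrete, the same plug-in step used for the standard algorithm's recurrence can be applied to this one. The following is a sketch of that calculation, using the same a, b, k notation; the numerical value of the exponent is rounded.

```latex
\[
T(n) = 7\,T(n/2) + O(n^{2}), \qquad a = 7,\; b = 2,\; k = 2
\]
\[
\log_b a = \log_2 7 \approx 2.807 > k
\quad\Longrightarrow\quad
T(n) = O\!\left(n^{\log_2 7}\right) \approx O\!\left(n^{2.81}\right)
\]
```

Compared with the O(n^3) bound for the standard divide and conquer algorithm, the exponent drops from 3 to about 2.81, which is where the asymptotic gain for large matrices comes from.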