8.5 Selecting the Mth Largest
was the smallest) to N (if that element was the largest). One can easily construct a rank table from an index table, however:
void rank(unsigned long n, unsigned long indx[], unsigned long irank[])
/* Given indx[1..n] as output from the routine indexx, returns an array irank[1..n],
   the corresponding table of ranks. */
{
	unsigned long j;

	for (j=1;j<=n;j++) irank[indx[j]]=j;
}
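The inversion performed by rank can also be sketched with zero-based arrays, as is more usual in C. The name rank_from_index and the zero-based convention are assumptions made for illustration; they are not part of the routine above.

```c
#include <stddef.h>

/* Assumption: indx[0..n-1] is a zero-based index table, so indx[j] is the
   position of the j-th smallest key (j = 0 for the smallest). */
void rank_from_index(size_t n, const size_t indx[], size_t irank[])
{
    for (size_t j = 0; j < n; j++)
        irank[indx[j]] = j;   /* the element at position indx[j] has rank j */
}
```

For the keys {20, 30, 10}, the zero-based index table is {2, 0, 1} and the resulting rank table is {1, 2, 0}.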
Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copying of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to [email protected] (outside North America).
#define SWAP(a,b) temp=(a);(a)=(b);(b)=temp;

float select(unsigned long k, unsigned long n, float arr[])
/* Returns the kth smallest value in the array arr[1..n]. The input array will be rearranged
   to have this value in location arr[k], with all smaller elements moved to arr[1..k-1] (in
   arbitrary order) and all larger elements in arr[k+1..n] (also in arbitrary order). */
{
	unsigned long i,ir,j,l,mid;
	float a,temp;

	l=1;
	ir=n;
	for (;;) {
		if (ir <= l+1) {                          /* Active partition contains 1 or 2 elements. */
			if (ir == l+1 && arr[ir] < arr[l]) {  /* Case of 2 elements. */
				SWAP(arr[l],arr[ir])
			}
			return arr[k];
		} else {
			/* Choose median of left, center, and right elements as partitioning
			   element a. Also rearrange so that arr[l] <= arr[l+1],
			   arr[ir] >= arr[l+1]. */
			mid=(l+ir) >> 1;
			SWAP(arr[mid],arr[l+1])
			if (arr[l] > arr[ir]) {
				SWAP(arr[l],arr[ir])
			}
			if (arr[l+1] > arr[ir]) {
				SWAP(arr[l+1],arr[ir])
			}
			if (arr[l] > arr[l+1]) {
				SWAP(arr[l],arr[l+1])
			}
			i=l+1;                                /* Initialize pointers for partitioning. */
			j=ir;
			a=arr[l+1];                           /* Partitioning element. */
			for (;;) {                            /* Beginning of innermost loop. */
				do i++; while (arr[i] < a);       /* Scan up to find element > a. */
				do j--; while (arr[j] > a);       /* Scan down to find element < a. */
				if (j < i) break;                 /* Pointers crossed. Partitioning complete. */
				SWAP(arr[i],arr[j])
			}                                     /* End of innermost loop. */
			arr[l+1]=arr[j];                      /* Insert partitioning element. */
			arr[j]=a;
			if (j >= k) ir=j-1;                   /* Keep active the partition that contains */
			if (j <= k) l=i;                      /* the kth element. */
		}
	}
}
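The idea of partitioning and then keeping only the side that contains the kth element can be illustrated with a shorter, zero-based sketch. This is not the routine above: it swaps in a simpler single-scan (Lomuto-style) partition with a middle-element pivot in place of the median-of-three scheme, and the names quickselect and swap_float are hypothetical.

```c
#include <stddef.h>

static void swap_float(float *x, float *y)
{
    float t = *x; *x = *y; *y = t;
}

/* Returns the k-th smallest element of a[0..n-1] (k = 0 gives the minimum),
   rearranging a in place. Assumes n > 0 and k < n. */
float quickselect(float *a, size_t n, size_t k)
{
    size_t lo = 0, hi = n - 1;            /* invariant: lo <= k <= hi */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        swap_float(&a[mid], &a[hi]);      /* park the pivot at the right end */
        float pivot = a[hi];
        size_t store = lo;                /* a[lo..store-1] holds elements < pivot */
        for (size_t i = lo; i < hi; i++) {
            if (a[i] < pivot) {
                swap_float(&a[i], &a[store]);
                store++;
            }
        }
        swap_float(&a[store], &a[hi]);    /* pivot lands in its final position */
        if (store == k) return a[k];
        if (store < k) lo = store + 1;    /* k-th element is in the right part */
        else hi = store - 1;              /* store > k >= 0, so no underflow */
    }
    return a[k];
}
```

As with select, the array is left partially ordered: elements before position k are no larger than a[k], and elements after it are no smaller.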
In-place, nondestructive selection is conceptually simple, but it requires a lot of bookkeeping, and it is correspondingly slower. The general idea is to pick some number M of elements at random, to sort them, and then to make a pass through the array counting how many elements fall in each of the M + 1 intervals defined by these elements. The kth largest will fall in one such interval; call it the "live" interval. One then does a second round, first picking M random elements in the live interval, and then determining which of the new, finer, M + 1 intervals all presently live elements fall into. And so on, until the kth element is finally localized within a single array of size M, at which point direct selection is possible.
How shall we pick M? The number of rounds, $\log_M N = \log_2 N / \log_2 M$, will be smaller if M is larger; but the work to locate each element among M + 1 subintervals will be larger, scaling as $\log_2 M$ for bisection, say. Each round requires looking at all N elements, if only to find those that are still alive, while the bisections are dominated by the N that occur in the first round. Minimizing $O(N \log_M N) + O(N \log_2 M)$ thus yields the result

$$M \sim 2^{\sqrt{\log_2 N}} \qquad (8.5.1)$$
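The minimization behind equation (8.5.1) can be written out explicitly. This is a sketch that assumes both cost terms carry the same constant factor, which the text does not state. Write $x = \log_2 M$ and $L = \log_2 N$:

```latex
\begin{align*}
C(x) &= N\,\frac{L}{x} + N\,x, \\
\frac{dC}{dx} &= N\left(1 - \frac{L}{x^{2}}\right) = 0
  \;\Longrightarrow\; x = \sqrt{L}, \\
M &= 2^{x} = 2^{\sqrt{\log_2 N}} .
\end{align*}
```

Under that assumption, a value such as $M = 64 = 2^{6}$ is the exact optimum only at $N = 2^{36}$, since $\sqrt{36} = 6$.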
The square root of the logarithm is so slowly varying that secondary considerations of machine timing become important. We use M = 64 as a convenient constant value. Two minor additional tricks in the following routine, selip, are (i) augmenting the set of M random values by an (M + 1)st, the arithmetic mean, and (ii) choosing the M random values "on the fly" in a pass through the data, by a method that makes later values no less likely to be chosen than earlier ones. (The underlying idea is to give element m > M an M/m chance of being brought into the set. You can prove by induction that this yields the desired result.)
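The M/m rule described in the parenthetical above is the idea behind what is usually called reservoir sampling. A minimal zero-based sketch follows. It is an illustration only: the name reservoir_sample is hypothetical, and it draws its random numbers from the standard library's rand(), whose modulo reduction is slightly biased. The selip routine below does not call rand(); it replaces an entry at a slot computed with the % operator on selected passes.

```c
#include <stddef.h>
#include <stdlib.h>

/* Fills out[0..m-1] with a random sample of data[0..n-1], chosen in a single
   pass without knowing n in advance. If n <= m, all n elements are kept.
   Returns the number of elements stored in out. */
size_t reservoir_sample(const float *data, size_t n, float *out, size_t m)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (kept < m) {
            out[kept++] = data[i];        /* the first m elements always enter */
        } else {
            /* Element number i+1 enters with probability m/(i+1),
               replacing a uniformly chosen current member. */
            size_t j = (size_t)rand() % (i + 1);
            if (j < m) out[j] = data[i];
        }
    }
    return kept;
}
```

Because each later element displaces an existing member with exactly the right probability, every element of the input ends up in the sample with the same chance, which is the property the induction argument establishes.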
#include "nrutil.h"
#define M 64
#define BIG 1.0e30
#define FREEALL free_vector(sel,1,M+2);free_lvector(isel,1,M+2);

float selip(unsigned long k, unsigned long n, float arr[])
/* Returns the kth smallest value in the array arr[1..n]. The input array is not altered. */
{
	void shell(unsigned long n, float a[]);
	unsigned long i,j,jl,jm,ju,kk,mm,nlo,nxtmm,*isel;
	float ahi,alo,sum,*sel;

	if (k < 1 || k > n || n <= 0) nrerror("bad input to selip");
	isel=lvector(1,M+2);
	sel=vector(1,M+2);
	kk=k;
	ahi=BIG;
	alo = -BIG;
	for (;;) {                      /* Main iteration loop, until desired element is isolated. */
		mm=nlo=0;
		sum=0.0;
		nxtmm=M+1;
		for (i=1;i<=n;i++) {        /* Make a pass through the whole array. */
			if (arr[i] >= alo && arr[i] <= ahi) {
				/* Consider only elements in the current brackets. */
				mm++;
				if (arr[i] == alo) nlo++;   /* In case of ties for low bracket. */
				/* Now use statistical procedure for selecting m in-range elements
				   with equal probability, even without knowing in advance how many
				   there are! */
				if (mm <= M) sel[mm]=arr[i];
				else if (mm == nxtmm) {
					nxtmm=mm+mm/M;
					/* The % operation provides a somewhat random number. */
					sel[1 + ((i+mm+kk) % M)]=arr[i];
				}
				sum += arr[i];
			}
		}
		if (kk <= nlo) {            /* Desired element is tied for lower bound; return it. */
			FREEALL
			return alo;
		}
		else if (mm <= M) {         /* All in-range elements were kept. So return answer */
			shell(mm,sel);          /* by direct method. */
			ahi = sel[kk];
			FREEALL
			return ahi;
		}
		sel[M+1]=sum/mm;            /* Augment selected set by mean value (fixes */
		shell(M+1,sel);             /* degeneracies), and sort it. */
		sel[M+2]=ahi;
		for (j=1;j<=M+2;j++) isel[j]=0;   /* Zero the count array. */
		for (i=1;i<=n;i++) {        /* Make another pass through the array. */
			if (arr[i] >= alo && arr[i] <= ahi) {   /* For each in-range element... */
				jl=0;
				ju=M+2;
				while (ju-jl > 1) { /* ...find its position among the select by bisection... */
					jm=(ju+jl)/2;
					if (arr[i] >= sel[jm]) jl=jm;
					else ju=jm;
				}
				isel[ju]++;         /* ...and increment the counter. */
			}
		}
		j=1;                        /* Now we can narrow the bounds to just one bin, */
		while (kk > isel[j]) {      /* that is, by a factor of order m. */
			alo=sel[j];
			kk -= isel[j++];
		}
		ahi=sel[j];
	}
}
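The bin-counting step in selip relies on bisection to place each element among the sorted selected values. The same search can be sketched on its own with zero-based arrays; the name locate_bin and the convention of returning the count of edges not exceeding x are assumptions made for this illustration.

```c
#include <stddef.h>

/* Given sorted edges[0..m-1], returns how many edges are <= x. The result,
   in the range 0..m, identifies which of the m+1 intervals contains x. */
size_t locate_bin(const float *edges, size_t m, float x)
{
    size_t lo = 0, hi = m;                /* the answer lies in [lo, hi] */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (x >= edges[mid]) lo = mid + 1;  /* edges[mid] is at or below x */
        else hi = mid;                      /* edges[mid] is above x */
    }
    return lo;
}
```

Each lookup costs about $\log_2 M$ comparisons, which is the per-element bisection cost that enters the estimate leading to equation (8.5.1).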
Approximate timings: selip is about 10 times slower than select. Indeed, for N in the range of $10^5$, selip is about 1.5 times slower than a full sort with sort, while select is about 6 times faster than sort. You should weigh time against memory and convenience carefully.
Of course neither of the above routines should be used for the trivial cases of finding the largest, or smallest, element in an array. Those cases, you code by hand as simple for loops. There are also good ways to code the case where k is modest in comparison to N, so that extra memory of order k is not burdensome. An example is to use the method of Heapsort (§8.3) to make a single pass through an array of length N while saving the m largest elements. The advantage of the heap structure is that only log m, rather than m, comparisons are required every time a new element is added to the candidate list. This becomes a real savings when $m > O(\sqrt{N})$, but it never hurts otherwise and is easy to code. The following program gives the idea.
void hpsel(unsigned long m, unsigned long n, float arr[], float heap[])
/* Returns in heap[1..m] the largest m elements of the array arr[1..n], with heap[1] guaranteed
   to be the mth largest element. The array arr is not altered. For efficiency, this routine
   should be used only when m ≪ n. */
{
	void sort(unsigned long n, float arr[]);
	void nrerror(char error_text[]);
	unsigned long i,j,k;
	float swap;

	if (m > n/2 || m < 1) nrerror("probable misuse of hpsel");
	for (i=1;i<=m;i++) heap[i]=arr[i];
	sort(m,heap);                   /* Create initial heap by overkill! We assume m ≪ n. */
	for (i=m+1;i<=n;i++) {          /* For each remaining element... */
		if (arr[i] > heap[1]) {     /* Put it on the heap? */
			heap[1]=arr[i];
			for (j=1;;) {           /* Sift down. */
				k=j << 1;
				if (k > m) break;
				if (k != m && heap[k] > heap[k+1]) k++;
				if (heap[j] <= heap[k]) break;
				swap=heap[k];
				heap[k]=heap[j];
				heap[j]=swap;
				j=k;
			}
		}
	}
}
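For readers who want to try the heap idea without the book's sort and nrerror routines, here is a self-contained, zero-based sketch. It differs from hpsel in one respect that should be stated plainly: the initial heap is built by a bottom-up heapify instead of a full sort. The names top_m and sift_down are hypothetical.

```c
#include <stddef.h>

/* Restore the min-heap property of heap[0..m-1], starting at node j. */
static void sift_down(float *heap, size_t m, size_t j)
{
    for (;;) {
        size_t child = 2 * j + 1;                     /* left child */
        if (child >= m) break;
        if (child + 1 < m && heap[child + 1] < heap[child])
            child++;                                  /* pick the smaller child */
        if (heap[j] <= heap[child]) break;
        float t = heap[j]; heap[j] = heap[child]; heap[child] = t;
        j = child;
    }
}

/* On return heap[0..m-1] holds the m largest elements of arr[0..n-1], with
   heap[0] the m-th largest. Assumes 1 <= m <= n; arr is not altered. */
void top_m(const float *arr, size_t n, float *heap, size_t m)
{
    for (size_t i = 0; i < m; i++) heap[i] = arr[i];
    for (size_t j = m / 2; j-- > 0; )                 /* bottom-up heapify */
        sift_down(heap, m, j);
    for (size_t i = m; i < n; i++) {
        if (arr[i] > heap[0]) {                       /* beats the current m-th largest */
            heap[0] = arr[i];
            sift_down(heap, m, 0);
        }
    }
}
```

Keeping the smallest of the retained elements at the root means each new candidate needs only one comparison to be rejected, and about log m comparisons to be inserted.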
CITED REFERENCES AND FURTHER READING:
Sedgewick, R. 1988, Algorithms, 2nd ed. (Reading, MA: Addison-Wesley), pp. 126ff. [1]
Knuth, D.E. 1973, Sorting and Searching, vol. 3 of The Art of Computer Programming (Reading, MA: Addison-Wesley).