C11 Searching
11. Searching
In many applications it is necessary to search a list of data elements for one (or more) that
matches some specific criterion.
For example:
- find the largest integer in a list of test scores
- find the location of the string "Fred" in a list of names
- find all Trip variables whose destination is "Jackson, WY"
There are several basic issues when searching:
- how do we recognize and keep track of matches to the search criteria?
- how do we terminate the search?
- what do we report? (value, location, yes/no, . . .)
- how do we recognize that there is no matching data element?
- what do we do if no matching data element is found?
Here we will only consider the problem of searching among the elements of an array.
Computer Science Dept Va Tech August, 2002
When searching an array of simple data elements, we may resolve these questions easily:
- We recognize matches by comparing with the equality operator.
- We terminate the search if we find an element that equals the search target, or if we
have rejected the last used cell in the array.
- We may report a match by providing the index or a copy of the element.
- We recognize that there is no matching data element if we reach the end of the used
cells in the array without finding a match.
- If no matching element is found, we return an impossible index (e.g., -1) or a dummy
value that cannot logically be a valid data element.
[Figure: an array A with cells A[0] through A[13]; the first few cells hold values such as 13, 43, 19, 23, and 29, and the remaining cells are marked unused (??).]
The simplest technique for searching a list is to just "walk" the list, examining each element
in turn until you either find a match or reject the last used cell:
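const int MISSING = -1;

int LinearSearch(const int List[], int Target, int Size) {
   int Scan = 0;
   while ( (Scan < Size) && (List[Scan] != Target) )   // walk the used cells
      Scan++;
   return ( (Scan < Size) ? Scan : MISSING );          // -1 means no match
}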
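The same walk works for any element type that supports ==, which covers the "Fred" example from the start of the chapter. The sketch below is a template generalization of LinearSearch and is not part of the original notes. The CheckFindFred helper and its list of names are made up for illustration.

```cpp
#include <cassert>
#include <string>

const int MISSING = -1;   // impossible index: signals "no match"

// Generic linear search for any element type that supports ==.
// Returns the index of the first match in List[0..Size-1], or MISSING.
template <typename T>
int LinearSearch(const T List[], const T& Target, int Size) {
   for (int Scan = 0; Scan < Size; Scan++) {
      if ( List[Scan] == Target )
         return Scan;          // found: stop immediately
   }
   return MISSING;             // rejected the last used cell
}

// Illustrative check on a made-up list of names.
void CheckFindFred() {
   const std::string Names[] = {"Alice", "Fred", "Joe"};
   assert( LinearSearch(Names, std::string("Fred"), 3) == 1 );
   assert( LinearSearch(Names, std::string("Zoe"), 3) == MISSING );
}
```

The target must have the same type as the elements. If you pass the literal "Fred" directly, template argument deduction fails, because T would be deduced as both std::string and a char array.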
Consider implementing an application that will store data for a list of trips and perform
lookup operations on that list (e.g., report the distance or time for a trip given a specific
origin and destination).
Clearly we may use an array of Trip variables, as defined earlier, to store the data.
Assume that the application will use two input files, one containing the trip data as we've
seen before, and a second input file that will contain the lookup commands the program is
supposed to process, using the general syntax:
<command string><tab><tab-separated command parameters>
For example:
   mileage     Knoxville, TN      Tucumcari, NM
   time        Birmingham, AL     Nashville, TN
   neighbors   Albuquerque, NM
   . . .
In this case, the array elements are complex structures and the search function must
access the appropriate member(s) to make the comparison:
int reportMileage(const Trip& toFind, const Trip dB[], int numTrips) {
   int Idx;
   for (Idx = 0; Idx < numTrips; Idx++) {
      if ( (toFind.Origin      == dB[Idx].Origin &&
            toFind.Destination == dB[Idx].Destination)
           ||
           (toFind.Origin      == dB[Idx].Destination &&
            toFind.Destination == dB[Idx].Origin) ) {
         return dB[Idx].Miles;
      }
   }
   return MISSING;
}
The designer has determined that the direction of the trip does not matter, which
complicates the Boolean test for a match somewhat. How would the implementation
change if direction did matter?
Note the use of const in the parameter list. Is this good design? Why or why not?
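One way to keep the search loop simple is to move the match test into its own predicate. The sketch below assumes a minimal Trip with only Origin, Destination, and Miles members, so the definition used earlier may differ. sameRoute is a hypothetical helper name, and the mileages in the check are placeholders, not real distances.

```cpp
#include <cassert>
#include <string>

const int MISSING = -1;

// Hypothetical minimal Trip; the definition used earlier may have
// more members (travel time, for example).
struct Trip {
   std::string Origin;
   std::string Destination;
   int         Miles;
};

// True if the two trips join the same pair of cities, in either direction.
bool sameRoute(const Trip& a, const Trip& b) {
   return (a.Origin == b.Origin      && a.Destination == b.Destination) ||
          (a.Origin == b.Destination && a.Destination == b.Origin);
}

// The search loop itself no longer cares how a "match" is defined.
int reportMileage(const Trip& toFind, const Trip dB[], int numTrips) {
   for (int Idx = 0; Idx < numTrips; Idx++) {
      if ( sameRoute(toFind, dB[Idx]) )
         return dB[Idx].Miles;
   }
   return MISSING;
}

void CheckReportMileage() {
   const Trip dB[] = { {"Knoxville, TN", "Tucumcari, NM", 100},    // placeholder miles
                       {"Birmingham, AL", "Nashville, TN", 200} };
   const Trip Reversed = {"Tucumcari, NM", "Knoxville, TN", 0};
   assert( reportMileage(Reversed, dB, 2) == 100 );   // direction ignored
}
```

With this split, changing what counts as a match only requires editing sameRoute. The loop stays the same.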
Binary Search
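Linear search always works, but there is an alternative, binary search, that is much faster on average. The catch is that binary search may only be applied under the following

Assumption: the list is sorted in ascending (or descending) order:
   List[0] <= List[1] <= ... <= List[Size-1]

Binary search: examine the middle element of the range still being searched; if it is not the target, discard the half of the range that cannot contain the target and repeat on the other half.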
Intro Programming in C++
An implementation of binary search is more complex logically than a linear search, but the
performance gain is impressive:
const int MISSING = -1;

int BinSearch(const int List[], int Target, int Lo, int Hi) {
   int Mid;
   while ( Lo <= Hi ) {
      Mid = ( Lo + Hi ) / 2;        // middle of the remaining range
      if ( List[Mid] == Target )
         return ( Mid );
      else if ( Target < List[Mid] )
         Hi = Mid - 1;              // Target can only be in the lower half
      else
         Lo = Mid + 1;              // Target can only be in the upper half
   }
   return ( MISSING );
}
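This implementation lets the caller search a specified portion of List[], namely List[Lo] through List[Hi]. To search an entire array of Size elements, call BinSearch(List, Target, 0, Size - 1).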
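[Figure: binary search on a sorted array whose used cells hold 10, 11, 12, 13, 21, 35, 41, 43, 51, 82, 85, 86, 93, followed by unused cells (??); Lo, Mid, and Hi markers show the search range at two successive steps.]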
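To follow the figure by hand, it can help to print Lo, Mid, and Hi at every step. The sketch below is BinSearch with an added print statement. TracedBinSearch is a hypothetical name. The check searches the 13 used cells from the figure for 82, a target chosen only for illustration.

```cpp
#include <cassert>
#include <iostream>

const int MISSING = -1;

// Same logic as BinSearch, but prints the search range at each step.
int TracedBinSearch(const int List[], int Target, int Lo, int Hi) {
   while ( Lo <= Hi ) {
      int Mid = ( Lo + Hi ) / 2;
      std::cout << "Lo = " << Lo << "  Mid = " << Mid
                << "  Hi = " << Hi << '\n';
      if ( List[Mid] == Target )
         return Mid;
      else if ( Target < List[Mid] )
         Hi = Mid - 1;
      else
         Lo = Mid + 1;
   }
   return MISSING;
}

void CheckTrace() {
   // The 13 used cells from the figure; unused cells are not searched.
   const int List[] = {10, 11, 12, 13, 21, 35, 41, 43, 51, 82, 85, 86, 93};
   assert( TracedBinSearch(List, 82, 0, 12) == 9 );
}
```

For that search, the program prints two lines before returning index 9. The first shows Lo = 0, Mid = 6, Hi = 12. Since 41 < 82, Lo becomes 7, and the second line shows Lo = 7, Mid = 9, Hi = 12, where 82 is found.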
Cost of Searching
Suppose the array to be searched contains N elements. The cost of searching may be
measured by the number of array elements that must be compared to Target.
                  Best Case    Worst Case     Average Case
   Linear Search      1            N               N/2
   Binary Search      1       about log2 N    about log2 N
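To get an idea of how much cheaper binary search is, note that:
   N = 1024      gives   log2 N = 10
   N = 1024^2    gives   log2 N = 20
For comparison, an average linear search of the same lists examines N/2 elements, which is 512 and 524,288 elements, respectively.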
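The table can be checked empirically by counting how many elements each search compares to Target. The sketch below uses hypothetical function names (LinearProbes, BinaryProbes, and CheckCosts). It searches a sorted array of N = 1024 even numbers for every present and absent value in its range.

```cpp
#include <cassert>

// Each function returns how many array elements were compared to
// Target, the cost measure used in the table above.
int LinearProbes(const int List[], int Target, int Size) {
   int Probes = 0;
   for (int Scan = 0; Scan < Size; Scan++) {
      Probes++;
      if ( List[Scan] == Target )
         break;
   }
   return Probes;
}

int BinaryProbes(const int List[], int Target, int Size) {
   int Probes = 0, Lo = 0, Hi = Size - 1;
   while ( Lo <= Hi ) {
      int Mid = ( Lo + Hi ) / 2;
      Probes++;
      if ( List[Mid] == Target )
         break;
      else if ( Target < List[Mid] )
         Hi = Mid - 1;
      else
         Lo = Mid + 1;
   }
   return Probes;
}

void CheckCosts() {
   const int N = 1024;
   int List[N];
   for (int i = 0; i < N; i++)
      List[i] = 2 * i;                   // sorted; odd values are absent
   int WorstBinary = 0;
   for (int T = -1; T <= 2 * N; T++) {   // every present and absent value in range
      int P = BinaryProbes(List, T, N);
      if ( P > WorstBinary )
         WorstBinary = P;
   }
   assert( LinearProbes(List, 0, N) == 1 );            // best case
   assert( LinearProbes(List, 2 * (N - 1), N) == N );  // worst case: last cell
   assert( WorstBinary == 11 );                        // floor(log2 1024) + 1
}
```

The worst binary count is 11 rather than 10 because this implementation examines floor(log2 N) + 1 elements in the worst case. The "log2 N" figure in the table is the usual approximation.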
Probability Ordering
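Probability ordering is useful when a small subset of the list elements is accessed much more frequently than the others. Keeping those elements near the front of the list reduces the average cost of a linear search.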
Static Probabilities
When the contents of the list are static, the most frequently accessed elements are stored at the beginning of the list.
This assumes that the access probabilities are also static.
Dynamic Probabilities
For lists whose contents change, or whose access probabilities change over time, a dynamic element ordering scheme is required:
Bubble Scheme
Each time an element is accessed, swap it with the preceding element, so that frequently accessed elements gradually bubble toward the head of the list.
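A minimal sketch of the bubble scheme for an int array is shown below. The name BubbleSearch and the choice to return the element's new index after the swap are assumptions, not taken from the notes.

```cpp
#include <cassert>
#include <utility>   // std::swap

const int MISSING = -1;

// Linear search with the bubble scheme: a found element trades places
// with its predecessor, so popular elements drift toward the head.
// Returns the element's (possibly new) index, or MISSING.
int BubbleSearch(int List[], int Target, int Size) {
   for (int Scan = 0; Scan < Size; Scan++) {
      if ( List[Scan] == Target ) {
         if ( Scan == 0 )
            return 0;                          // already at the head
         std::swap(List[Scan - 1], List[Scan]);
         return Scan - 1;                      // its new position
      }
   }
   return MISSING;
}

void CheckBubbleSearch() {
   int List[] = {5, 8, 3, 9};
   assert( BubbleSearch(List, 9, 4) == 2 );   // 9 moves from index 3 to 2
   assert( BubbleSearch(List, 9, 4) == 1 );
   assert( BubbleSearch(List, 9, 4) == 0 );   // reaches the head...
   assert( BubbleSearch(List, 9, 4) == 0 );   // ...and stays there
   assert( List[0] == 9 && List[1] == 5 && List[2] == 8 && List[3] == 3 );
}
```

Moving an element only one step per access means a single unusual request cannot push a popular element far from the front.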
Sentinel Method
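Setup:
Store the search target K (the "sentinel") in the cell just past the last used element of the array. The scan is then guaranteed to stop, so the loop no longer needs to test the index against size on every step:

// Item is the element type (e.g., typedef int Item;); the function name
// is assumed. A[] must have room for at least size + 1 elements.
int SentinelSearch(Item A[], Item K, int size) {
   int i;
   A[size] = K;                       // plant the sentinel
   for ( i = 0; !(K == A[i]); i++ )
      ;                               // empty body: just advance i
   if ( i < size )
      return ( i );                   // matched a real element
   else
      return ( MISSING );             // matched only the sentinel
}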
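Below is a self-contained sketch written as a template, which is an assumption standing in for the notes' Item type. The check shows what the caller must provide: the array is declared with one spare cell beyond the used ones, and each search overwrites that cell.

```cpp
#include <cassert>

const int MISSING = -1;

// Sentinel search (template form assumed). A must have room for
// size + 1 elements, because A[size] is overwritten.
template <typename Item>
int SentinelSearch(Item A[], const Item& K, int size) {
   A[size] = K;                   // plant the sentinel past the data
   int i = 0;
   while ( !(K == A[i]) )         // one test per step: no i < size check
      i++;
   return ( i < size ) ? i : MISSING;
}

void CheckSentinelSearch() {
   int Data[6] = {13, 43, 19, 23, 29};        // 5 used cells + 1 spare
   assert( SentinelSearch(Data, 23, 5) == 3 );
   assert( SentinelSearch(Data, 7, 5) == MISSING );
   assert( Data[5] == 7 );                    // spare cell holds the last sentinel
}
```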
Interpolation Search
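Interpolation search attempts to predict more accurately where the item may fall within the list. This is similar to how a person looks up a name in a telephone book: to find "Williams", you open the book near the back rather than in the middle.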
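Standard Binary Search Midpoint Computation:
   Midpoint = (L + R) / 2;        // same as L + (R - L) / 2 for L, R >= 0
Interpolation replaces the 1/2 (in the second form of the formula above) with an estimate of where the desired element K is located in the range, based on the values A[L] and A[R]. Be careful with int arithmetic: for A[L] <= K <= A[R], the fraction (K - A[L]) / (A[R] - A[L]) lies between 0 and 1, so integer division truncates it to 0 unless K equals A[R].
   Interp = L +                              // base location +
            ((K - A[L]) / (A[R] - A[L])) *   // fraction of the distance K lies from A[L] to A[R] *
            (R - L);                         // length of search space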
Example: assume 30,000 records of SSNs in the range 0 ... 600 00 0000. Searching for 222 22 2222 yields an initial estimate of:
   Interp = 0 + ((222222222 - 0) / (600000000 - 0)) * (30000 - 0)
          = 0.370370... * 30000
          = 11111   (truncated to an index; computed with real, not int, arithmetic)
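A complete interpolation search might look like the sketch below, where InterpSearch is a hypothetical name. It avoids the int-arithmetic problem by multiplying before dividing, using long long. It also stops as soon as K falls outside [A[L], A[R]], which keeps the estimate within the current range.

```cpp
#include <cassert>

const int MISSING = -1;

// Interpolation search over ascending A[L..R]. The probe position is
// estimated from the key values instead of taken as the midpoint.
int InterpSearch(const int A[], int K, int L, int R) {
   while ( L <= R && K >= A[L] && K <= A[R] ) {
      if ( A[R] == A[L] )                        // flat range: avoid divide by 0
         return ( A[L] == K ) ? L : MISSING;
      long long Span = (long long)A[R] - A[L];
      // Multiply first so the fraction is not truncated to 0;
      // the result always lies in [L, R].
      int Interp = L + (int)(((long long)K - A[L]) * (R - L) / Span);
      if ( A[Interp] == K )
         return Interp;
      else if ( K < A[Interp] )
         R = Interp - 1;
      else
         L = Interp + 1;
   }
   return MISSING;
}

void CheckInterpSearch() {
   const int A[] = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
   assert( InterpSearch(A, 70, 0, 9) == 6 );        // evenly spaced: first estimate is exact
   assert( InterpSearch(A, 75, 0, 9) == MISSING );
}
```

Interpolation search pays off when the keys are spread fairly evenly, as in the SSN example. On badly skewed data its estimates can be poor and it may examine nearly every element, so it is not uniformly better than binary search.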