05-linsearch
Linear Search
15-122: Principles of Imperative Computation
Frank Pfenning
Lecture 5
January 29, 2013
1 Introduction
One of the fundamental and recurring problems in computer science is to
find elements in collections, such as elements in sets. An important
algorithm for this problem is binary search. We use binary search for an
integer in a sorted array to exemplify it. As a preliminary study in this
lecture we analyze linear search, which is simpler, but not nearly as
efficient. Still, it is often used when the requirements for binary search
are not satisfied, for example, when the elements we have to search are not
arranged in a sorted array.
In terms of our learning goals, we discuss the following:
Computational Thinking: We will see for the first time the power of order
in various algorithmic problems.
3 Sorted Arrays
A number of algorithms on arrays assume that the arrays are sorted. Such
algorithms return a correct result only if they are actually running on a
sorted array. Thus, the first thing we need to figure out is how to specify
sortedness in function specifications. The specification function
is_sorted(A,lower,upper) traverses the array A from left to right, starting
at lower and stopping just before reaching upper, checking that each
element is less than or equal to its right neighbor. We need to be careful
about the loop invariant to guarantee that there will be no attempt to
access a memory element out of bounds.
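As an illustration of the function just described, here is a minimal sketch in plain C rather than C0. It rests on some assumptions: the extra parameter length stands in for C0's \length(A), which C does not provide, and the precondition and loop invariant are written as assert calls instead of contract annotations.

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the specification function described above.
 * Assumption: `length` stands in for C0's \length(A). */
bool is_sorted(const int *A, int length, int lower, int upper)
{
    /* Precondition: 0 <= lower <= upper <= \length(A) */
    assert(0 <= lower && lower <= upper && upper <= length);

    for (int i = lower; i < upper - 1; i++) {
        /* Loop invariant, checked at the top of each iteration:
         * lower <= i (no upper bound on i is stated). */
        assert(lower <= i);
        /* Each element must be <= its right neighbor. */
        if (!(A[i] <= A[i + 1])) return false;
    }
    return true;
}
```

With this sketch, an empty range such as lower = upper = 0 is trivially sorted, because the loop body never runs.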
The loop invariant here does not have an upper bound on i. Fortunately,
when we are inside the loop, we know the loop condition is true so we
know i < upper − 1. That together with lower ≤ i guarantees that both
accesses are in bounds.
We could also try i ≤ upper − 1 as a loop invariant, but this turns out to
be false. It is instructive to think about why. If you cannot think of a good
reason, try to prove it carefully. Your proof should fail somewhere.
Actually, the attempted proof already fails at the initial step. If lower =
upper = 0 (which is permitted by the precondition) then it is not true that
0 = lower = i ≤ upper − 1 = 0 − 1 = −1. We could say i ≤ upper , but that
wouldn’t seem to serve any particular purpose here since the array accesses
are already safe.
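The failed initial step can also be checked mechanically. The following C sketch uses two hypothetical helper functions (their names are not from the lecture) that evaluate each candidate invariant at loop entry, where i equals lower.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: does the candidate invariant i <= upper - 1
 * hold at loop entry, where i == lower? */
bool tight_bound_holds_at_entry(int lower, int upper)
{
    assert(0 <= lower && lower <= upper);  /* part of the precondition */
    int i = lower;
    return i <= upper - 1;
}

/* Hypothetical helper: the weaker candidate i <= upper at loop entry. */
bool loose_bound_holds_at_entry(int lower, int upper)
{
    assert(0 <= lower && lower <= upper);  /* part of the precondition */
    int i = lower;
    return i <= upper;
}
```

For lower = upper = 0 the first helper returns false and the second returns true, matching the argument above.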
Let’s reason through that. Why is the access A[i] safe? By the loop
invariant lower ≤ i and the precondition 0 ≤ lower we have 0 ≤ i, which
is the first part of safety. Secondly, we have i < upper − 1 (by the loop
condition, since we are in the body of the loop) and upper ≤ length(A)
(by the precondition), so i < length(A) and the access A[i] is in bounds. In
fact, even i + 1 will be in bounds, since 0 ≤ lower ≤ i < i + 1 (i is
bounded from above, so i + 1 cannot overflow) and
i + 1 < (upper − 1) + 1 = upper ≤ length(A).
Whenever you see an array access, you must have a very good reason
why the access must be in bounds. You should develop a coding instinct
where you deliberately pause every time you access an array in your code
and verify that it should be safe according to your knowledge at that point
in the program. This knowledge can be embedded in preconditions, loop
invariants, or assertions that you have verified.
This does not exploit that the array is sorted. We would like to exit the
loop and return −1 as soon as we find that A[i] > x. If we haven’t found x
already, we will not find it subsequently, since all elements to the right
of i will be greater than or equal to A[i] and therefore strictly greater
than x. But we have to be careful: the following program has a bug.
Can you spot the problem? If you cannot spot it immediately, reason
through the loop invariant. Read on if you are confident in your answer.
Now A[i] <= x will only be evaluated if i < n and the access will be in
bounds since we also know 0 ≤ i from the loop invariant.
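A loop of the kind described here might look as follows in plain C. This is a sketch under assumptions, not the lecture's own listing: the function name linsearch_guard is made up, n plays the role of the array length, and the invariant is written as an assert.

```c
#include <assert.h>

/* Sketch: search a sorted array A[0..n) for x, returning an index or -1.
 * The name linsearch_guard is hypothetical. */
int linsearch_guard(int x, const int *A, int n)
{
    assert(0 <= n);
    /* i < n is tested first; && short-circuits, so A[i] is only
     * read when i < n already holds. */
    for (int i = 0; i < n && A[i] <= x; i++) {
        assert(0 <= i && i <= n);  /* loop invariant */
        if (A[i] == x) return i;
    }
    return -1;
}
```

Swapping the two conjuncts would read A[i] before checking i < n, which is out of bounds when i = n.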
Alternatively, and perhaps easier to read, we can move the test into the
loop body.
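One way this alternative could be written, again as a hedged C sketch with a made-up function name (linsearch_body) and the invariant expressed as an assert:

```c
#include <assert.h>

/* Sketch: same search, with the comparison against x moved into the body.
 * The name linsearch_body is hypothetical. */
int linsearch_body(int x, const int *A, int n)
{
    assert(0 <= n);
    for (int i = 0; i < n; i++) {
        assert(0 <= i && i <= n);  /* loop invariant */
        if (A[i] == x) return i;   /* found x at index i */
        if (A[i] > x) return -1;   /* sorted: x cannot appear further right */
    }
    return -1;                     /* ran off the end without finding x */
}
```

The loop condition is now just i < n, so every access A[i] in the body is in bounds.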
This program is not yet satisfactory, because the loop invariant does not
have enough information to prove the postcondition. We do know that if we
return i directly from inside the loop, then A[i] = x and so
A[\result] == x holds. But we cannot deduce that !is_in(x, A, 0, n) if we
return −1.
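To make the two cases of the postcondition concrete, here is a C sketch. It assumes that is_in(x, A, lower, upper) means that x occurs somewhere in A[lower..upper), which is what the name suggests. The helper search_post is hypothetical and simply evaluates the postcondition for a given result.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed meaning of is_in: x occurs somewhere in A[lower..upper). */
bool is_in(int x, const int *A, int lower, int upper)
{
    assert(0 <= lower && lower <= upper);
    for (int i = lower; i < upper; i++)
        if (A[i] == x) return true;
    return false;
}

/* Hypothetical helper: the postcondition as a boolean check.
 * Either result is -1 and x is absent, or result is a valid index of x. */
bool search_post(int x, const int *A, int n, int result)
{
    return (result == -1 && !is_in(x, A, 0, n))
        || (0 <= result && result < n && A[result] == x);
}
```

A search that returns −1 although x is present fails the first disjunct, which is exactly the case the loop invariant must rule out.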
Before you read on, consider which loop invariant you might add to
guarantee that. Try to reason why the fact that the exit condition must
be false and the loop invariant true is enough information to know that
!is_in(x, A, 0, n) holds.
Did you try to exploit that the array is sorted? If not, then your invariant
is most likely too weak, because the function is incorrect if the array is not
sorted!
What we want to say is that all elements in A to the left of index i are
strictly smaller than x. Just saying A[i-1] < x isn’t quite right, because
when the loop is entered the first time we have i = 0 and we would try to
access A[−1]. We again exploit short-circuiting evaluation, this time for
disjunction.
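The guarded invariant described above can be sketched in plain C as follows. The function name linsearch_sorted is made up, and the invariants are written as assert calls in place of C0 loop invariant annotations.

```c
#include <assert.h>

/* Sketch: sorted linear search with the strengthened invariant.
 * The name linsearch_sorted is hypothetical. */
int linsearch_sorted(int x, const int *A, int n)
{
    assert(0 <= n);
    for (int i = 0; i < n; i++) {
        assert(0 <= i && i <= n);        /* bounds invariant */
        /* || short-circuits: A[i-1] is only read when i != 0. */
        assert(i == 0 || A[i - 1] < x);  /* element just left of i is below x */
        if (A[i] == x) return i;
        if (A[i] > x) return -1;
    }
    return -1;
}
```

On a sorted array, A[i-1] < x implies that every element to the left of i is also strictly smaller than x, so the single comparison is enough to carry that fact through the loop.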