Binary Search and Linear Search (DSA REPORT)
TECHNOLOGY TAXILA
COMPUTER ENGINEERING DEPARTMENT
(Theory)
Hint:
The complexity of a search algorithm can be measured as the number of iterations it takes to find the
desired element in your collection. Record the number of iterations taken in each run. Since you are
required to test your programs on inputs of varying size, you must test each input size properly. For
example, to get the average time complexity for an input of 5000 entries, you must run the test at least
2500 times to obtain an average number of iterations. In the range of 2 to 20 million you must take at
least 200 separately generated random lists, which makes 200 test runs and in turn gives you 200 points
in the resulting graph. In each run, make at least as many searches as half the number of items in the
list.
Once you have the average number of comparisons for each selection, plot a graph for each algorithm.
You are required to generate your lists randomly each time you test your program for a specific number
of items. The search key should likewise be generated separately (at random) in the driver program.
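The procedure described in the hint can be sketched roughly as follows. This is only an illustration, not the driver used in this report: the helper names makeRandomList, countLinearComparisons and averageComparisons, the value range, and the choice of std::mt19937 as the random source are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: build a list of n random integers in [1, maxValue].
std::vector<int> makeRandomList(std::size_t n, int maxValue, std::mt19937& rng) {
    std::uniform_int_distribution<int> dist(1, maxValue);
    std::vector<int> list(n);
    for (int& v : list) v = dist(rng);
    return list;
}

// Linear search that returns the number of comparisons it made.
std::size_t countLinearComparisons(const std::vector<int>& list, int key) {
    std::size_t comparisons = 0;
    for (int v : list) {
        ++comparisons;
        if (v == key) break;  // stop at the first match
    }
    return comparisons;
}

// One test run: generate a fresh random list of size n, search it with
// n / 2 independently generated random keys, and return the average
// number of comparisons per search.
double averageComparisons(std::size_t n, int maxValue, std::mt19937& rng) {
    assert(n >= 2);
    std::vector<int> list = makeRandomList(n, maxValue, rng);
    std::uniform_int_distribution<int> keyDist(1, maxValue);
    const std::size_t searches = n / 2;
    std::size_t total = 0;
    for (std::size_t s = 0; s < searches; ++s)
        total += countLinearComparisons(list, keyDist(rng));
    return static_cast<double>(total) / static_cast<double>(searches);
}
```

Calling averageComparisons once per list size gives one point of the graph; for binary search the list would additionally have to be sorted before searching.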
Report (includes)
1. Your own random token (number, string character etc.) and list generator.
7. Comparative analysis
OUTPUT SCREEN
Working of Code:
The above code generates a list of random numbers in the range 2 to 1 million and stores it in the
output file Me.txt. This file is then taken as input by our main binary and linear search programs,
which are tested against it to validate the algorithms and measure their complexity.
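A generator of this kind might look like the sketch below. It is not the original generator: the function names writeRandomList and readList, the one-number-per-line file layout, and the seed parameter are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <random>
#include <string>
#include <vector>

// Hypothetical generator: write `count` random integers in [lo, hi]
// to the file `path`, one number per line. Returns false on I/O failure.
bool writeRandomList(const std::string& path, std::size_t count,
                     int lo, int hi, unsigned seed) {
    assert(lo <= hi);
    std::ofstream out(path);
    if (!out) return false;
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> dist(lo, hi);
    for (std::size_t i = 0; i < count; ++i) out << dist(rng) << '\n';
    return static_cast<bool>(out);
}

// Read the integers back so that a search program can use them as input.
std::vector<int> readList(const std::string& path) {
    std::ifstream in(path);
    std::vector<int> values;
    int v;
    while (in >> v) values.push_back(v);
    return values;
}
```

Note that binary search only works on sorted input, so a list loaded this way would have to be sorted (for example with std::sort) before it is passed to the binary search program.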
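// Iterative binary search on a sorted array; returns the index of
// searchValue, or -1 if it is not present.
int binarySearch(int array[], int size, int searchValue)
{
    int low = 0;
    int high = size - 1;
    int mid;
    while (low <= high) {
        mid = low + (high - low) / 2;
        if (searchValue == array[mid])
            return mid;
        else if (searchValue > array[mid])
            low = mid + 1;
        else
            high = mid - 1;
    }
    return -1;
}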
Driver PROGRAM FOR BINARY SEARCH
int main() {
while(true)
double totaltime;
FILE* fptr;
q++;
i++;
int x;
cout<<""<<endl;
cin>>x;
starttime = clock();
(result == -1) ? cout << "Element not found" : cout << "Element found at index: " << result;
endtime = clock();
cout<<endl;
printf("%f\n", (float)endtime);
OUTPUT SCREEN
Performance Chart For BINARY Search (x-axis: INPUT, y-axis: ITERATION)
LINEAR SEARCH
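// Linear search; returns the index of key, or -1 if it is not present.
int linearSearch(int array[], int size, int key)
{
    for (int i = 0; i < size; i++)
        if (array[i] == key)
            return i;
    return -1;
}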
Driver PROGRAM FOR LINEAR SEARCH
int main() {
while (true)
double totaltime;
FILE* fptr;
q++;
i++;
cout<<key<<endl;
starttime = clock();
(result == -1) ? cout << "Element not found" : cout << "Element found at index: " << result;
endtime = clock();
cout<<endl;
printf("%f\n", (float)endtime);
OUTPUT SCREEN
Performance Chart For LINEAR Search (x-axis: INPUT VALUES, y-axis: ITERATION)
GRAPH PERFORMANCES
Both linear and binary search can be useful, depending on the application.
When the data structure is an array and its elements are kept in sorted order,
binary search is preferred for fast searching. If the data structure is a linked list,
linear search is used regardless of how the elements are arranged, because binary
search cannot be applied to it directly.
The typical binary search algorithm cannot be employed on a linked list because a
linked list offers no direct access by position: the middle element can only be
reached by walking the list node by node. Hence a variation of the binary search
algorithm has to be designed for linked lists if its advantage over linear search
is to be kept.
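One possible variation is sketched below, under the assumption that the linked list is already sorted. It relies on the standard std::lower_bound, which only needs forward iterators; the wrapper name containsSorted is hypothetical. The search makes O(log n) comparisons, but it still has to step through O(n) nodes to reach each midpoint, so it only pays off when comparisons are much more expensive than following links.

```cpp
#include <algorithm>
#include <cassert>
#include <forward_list>

// Returns true if `key` is present in the sorted singly linked list.
// std::lower_bound halves the search range like binary search does, but on
// a linked list each midpoint is reached by advancing node by node.
bool containsSorted(const std::forward_list<int>& sortedList, int key) {
    assert(std::is_sorted(sortedList.begin(), sortedList.end()));
    auto it = std::lower_bound(sortedList.begin(), sortedList.end(), key);
    return it != sortedList.end() && *it == key;
}
```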
If the record is present somewhere in the search table, linear search makes (n+1)/2
comparisons on average. The worst-case efficiency of this technique is O(n), where
O stands for the order of execution.
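The average follows from the fact that finding the element at position i costs i comparisons, so over all n equally likely positions the mean is (1 + 2 + ... + n)/n = (n+1)/2. The sketch below checks this by brute force; the function names are hypothetical, and it assumes that all elements are distinct and that every position is equally likely to hold the key.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Comparisons a linear search makes to find the element stored at the
// 0-based index `pos` (elements are assumed to be distinct).
std::size_t comparisonsToFind(const std::vector<int>& list, std::size_t pos) {
    assert(pos < list.size());
    std::size_t comparisons = 0;
    for (int v : list) {
        ++comparisons;
        if (v == list[pos]) break;
    }
    return comparisons;
}

// Average over every possible key position in the list.
double averageSuccessfulComparisons(const std::vector<int>& list) {
    assert(!list.empty());
    std::size_t total = 0;
    for (std::size_t pos = 0; pos < list.size(); ++pos)
        total += comparisonsToFind(list, pos);
    return static_cast<double>(total) / static_cast<double>(list.size());
}
```

For a list of 9 distinct values this yields 5, and for 10 values it yields 5.5, matching (n+1)/2 in both cases.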
BASIS FOR COMPARISON    LINEAR SEARCH    BINARY SEARCH
an array
Usefulness Easy to use and no need for Anyhow tricky algorithm and
order.
Worst case for N number of elements    N comparisons are required    Can conclude after only log2 N comparisons
sorted list.
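#include <iostream>
#include <limits>
#include <vector>
using namespace std;

// Linear search that reports how many comparisons it made.
void linearSearch(const int array[], int size, int searchValue) {
    int count = 0;
    for (int i = 0; i < size; i++) {
        count++;
        if (searchValue == array[i])
            break;
    }
    cout << "Linear Search comparisons: " << count << endl;
}

// Binary search on a sorted array that reports how many comparisons it made.
void binarySearch(const int array[], int size, int searchValue) {
    int low = 0;
    int high = size - 1;
    int mid;
    int count = 0;
    while (low <= high) {
        count++;
        mid = low + (high - low) / 2;
        if (searchValue == array[mid])
            break;
        else if (searchValue > array[mid])
            low = mid + 1;
        else
            high = mid - 1;
    }
    cout << "Binary Search comparisons: " << count << endl;
}

// Fill the array with the sorted values 1..size.
void populateArray(int array[], int size) {
    for (int i = 1; i <= size; i++)
        array[i - 1] = i;
}

int main() {
    int size;
    int userValue;
    char input = 'n';
    do {
        cout << "Enter an int for the size of the array:" << endl;
        if (!(cin >> size) || size < 1)
            break;
        vector<int> a(size);  // replaces the non-standard variable-length array
        cout << "Enter an integer between 1 and " << size << " to search for in the array." << endl;
        cin >> userValue;
        populateArray(a.data(), size);
        linearSearch(a.data(), size, userValue);
        binarySearch(a.data(), size, userValue);
        cout << "Press 'y' and enter to run again or just enter to quit";
        cin.ignore(numeric_limits<streamsize>::max(), '\n');  // drop the rest of the input line
        cin.get(input);
    } while (input == 'y');
}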
Comparison analysis chart: Comparative Analysis of comparison program
We know that for a small number of elements (say 10), the difference between the number of operations
performed by binary search and linear search is not large. In the real world, however, we mostly deal with
problems that involve large amounts of data.
For example, if we have 4 billion elements to search, then in the worst case linear search will take 4 billion
operations to complete its task, while binary search will complete it in just 32 operations. If one
operation takes 1 ms, binary search will take only 32 ms, whereas linear search will take 4 billion ms
(approximately 46 days). That is a significant difference.
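The figures above can be reproduced with a few lines of arithmetic. The sketch below is only a check of the numbers; the function names worstCaseBinarySteps and msToDays are hypothetical, and the 1 ms cost per operation is the assumption already made in the text.

```cpp
#include <cassert>
#include <cstdint>

// Worst-case number of halving steps binary search needs on n elements,
// which is floor(log2(n)) + 1 for n >= 1.
unsigned worstCaseBinarySteps(std::uint64_t n) {
    assert(n >= 1);
    unsigned steps = 0;
    while (n > 0) {
        n /= 2;  // each step discards half of the remaining range
        ++steps;
    }
    return steps;
}

// Convert a duration in milliseconds to days.
double msToDays(double ms) {
    return ms / (1000.0 * 60.0 * 60.0 * 24.0);
}
```

With n = 4,000,000,000 the first function returns 32, and 4 billion ms converts to a little over 46 days.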
Performance of both algorithms
• For binary search, a branchless implementation is much faster than a branching one: almost four times faster
in terms of throughput and two times faster in terms of latency. I think it would remain so even if support for
non-power-of-two array lengths was added to the branchless implementation.
• Full unrolling of branchless binary search (with exactly known number of comparisons to do) makes search
faster by about 30% in throughput sense. The difference in latency is negligible: 3-4 cycles.
• It is hard to ensure that binary search written in C++ is branchless on every platform, every compiler and
every set of settings. I realized this after looking at GCC's work: it managed to miraculously emit cmov in one
case and stupidly screw up the cmov-s in the other place. Also, MSVC screws up the last cmov in
binary_search_branchless_UR if it is inlined into the benchmark code. There is no way to get cmov-s reliably
except for using assembly.
• Latency time of branchless binary search is twice as large as throughput time. I suppose it means that CPU
runs two searches in parallel when they are independent. However, it can be harder to ensure such
independence in real workflow, when binary search is not the only computation done in the hot loop.
• Counting linear search is faster than linear search with break for small N. This is because linear search with
break executes unpredictable branch (which does not happen for N = 16 because it always needs one
iteration). There is about 60% improvement in throughput for N = 32, and about 35% improvement for N = 64
(less improvement in latency). It seems that the branch costs about 10 cycles in both throughput and latency.
• The linear search with break becomes faster than counting linear search shortly after N = 128. For N = 1024
it is 80% faster, and I guess the performance ratio should converge to two at infinity. This happens because
linear search with break processes only half of input array on average, while counting linear search always
goes through the whole array. It is questionable whether it is a huge benefit, given that for N > 128
branchless binary search is faster and reads substantially less data anyway.
• For counting linear search, AVX version is noticeably faster only for N = 64, and for larger N it is about 50%
faster. Perhaps a more efficient AVX implementation would achieve a better ratio. As for full unrolling, it gives
another almost 50% boost for large N (like N = 256 or more), although for small N its benefit is again much
less noticeable.
• As for binary vs linear search competition, the situation is different for throughput and latency. For
throughput performance, branchless binary search is slower for N < 64 (by at most 30%) and faster for N > 64,
although the fully unrolled version of it is as fast as SSE linear search even for N = 16 and N = 32. For latency
performance, branchless binary search catches up with SSE linear search only for N between 128 and 256
(double that for AVX), and it is slower by up to 50% for N = 32 and N = 64.
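The two techniques compared in the bullets above can be sketched in scalar C++ as follows. These are not the benchmarked implementations: the function names are hypothetical, no SSE/AVX code is shown, and whether the compiler actually emits a conditional move for the second function is not guaranteed. Both functions assume a sorted array and return the index of the first element that is not less than the key.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Counting linear search: no early exit, it simply counts how many elements
// are smaller than the key, so the loop has no data-dependent branch.
std::size_t countingLinearSearch(const std::vector<int>& sorted, int key) {
    std::size_t count = 0;
    for (int v : sorted) count += (v < key) ? 1 : 0;
    return count;
}

// Binary search written so that each step is a select rather than an
// if/else, which gives the compiler the chance to use a conditional move.
std::size_t branchlessLowerBound(const std::vector<int>& sorted, int key) {
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    const int* base = sorted.data();
    std::size_t len = sorted.size();
    while (len > 1) {
        const std::size_t half = len / 2;
        // Skip the first half when its last element is still below the key.
        base = (base[half - 1] < key) ? base + half : base;
        len -= half;
    }
    const std::size_t index = static_cast<std::size_t>(base - sorted.data());
    return index + ((len == 1 && *base < key) ? 1 : 0);
}
```

The loop in branchlessLowerBound runs a number of times that depends only on the array length, which is what makes full unrolling possible when the length is known in advance.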
CONCLUSION:
• The proposed "counting" linear search is noticeably faster for small array lengths than linear search with
break, because branch misprediction is removed from it. So it makes sense to always use counting version of
linear search instead of the breaking one, since for larger array lengths binary search becomes faster anyway.
• Binary search stands up surprisingly well against linear search, provided that it fully utilizes conditional move
instructions instead of branches. In my opinion, there is no reason to prefer linear search over binary search;
just make sure that your compiler does not generate branches for the binary search. Counting linear search is
worth using only in the rare case when the array length is known to be very small and the search performance is
really critically important. Also, the idea of counting linear search is easy to embed into some vectorized
computation, unlike the binary search.
The end.