
Binary Search and Linear Search (DSA REPORT)

1. The document provides code implementations for binary and linear search algorithms, along with driver programs to test and analyze their performance on randomly generated lists of varying sizes from 2 to 20 million entries. 2. Testing results in performance charts that show binary search requiring around 12 iterations to search lists of size 4800, while linear search requires over 25,000 iterations to search lists of size 20,000. 3. The analysis finds binary search more suitable than linear search when elements are sorted, as it has superior time complexity of O(log n) compared to linear search's O(n), allowing binary search to quickly narrow down the search space through ordered comparisons.

Uploaded by

Recto Gaming

UNIVERSITY OF ENGINEERING AND

TECHNOLOGY TAXILA
COMPUTER ENGINEERING DEPARTMENT

Course: DATA STRUCTURE AND ALGORITHMS

(Theory)

Third Semester 2019

(B.S. Computer Engineering)

Submitted To: Dr. M Rizwan

Submitted By: Sheikh Abuzar Amjad

Reg NO: 19-CP-06

Section : OMEGA (Ω)


Assignment NO : 1
THE PROBLEM:
Put together C++ programs, one for the linear search algorithm and one for the binary search algorithm. Write
driver programs to test the complexity of each. The complexity of linear search is a linear function of the
input size, O(n), whereas the complexity of binary search is O(log n). Given the reported complexity of both
algorithms, testing/benchmarking should be carried out to establish this fact for certain. To carry out this
exercise you are required to test these algorithms separately with input lists of varying size, starting
from list size 2 up to 20 million entries.

Hint:

The complexity of a search algorithm may very well stand for the number of iterations it takes to find the
desired element in your collection. Try to record the number of iterations it takes in each run. Since you
are required to test your programs on inputs of varying size, you must perform your testing properly for
each input size. For example, to get the average time complexity for an input of 5000 entries, you must run
the test at least 2500 times to get an average number of iterations. In the range of 2 to 20 million you
must take at least 200 separate random lists, which makes 200 test runs and in turn gives you 200 points in
the resulting graph. In each run, make at least as many searches as half the number of items in the list
itself.

Once you have the average number of comparisons for your selection, plot a graph for each algorithm.

You are required to generate your lists randomly each time you test your program for a specific number
of items. The search key should also be generated separately, at random, in the driver program.
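The averaging procedure described in the hint can be sketched roughly as follows. This is only an illustration under stated assumptions, not the program used in this report: the names `linearComparisons`, `binaryComparisons`, `Averages` and `averageIterations` are hypothetical, the value range 1 to 2n is an arbitrary choice, and the list is sorted before binary search is applied. For a list of size n it performs n/2 random searches and returns the average iteration count of each algorithm.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Elements inspected by a linear scan for key (hypothetical helper).
long long linearComparisons(const std::vector<int>& a, int key) {
    long long count = 0;
    for (int v : a) {
        ++count;
        if (v == key) break;
    }
    return count;
}

// Loop iterations made by binary search on a sorted vector (hypothetical helper).
long long binaryComparisons(const std::vector<int>& a, int key) {
    long long count = 0;
    std::size_t low = 0, high = a.size();  // half-open range [low, high)
    while (low < high) {
        std::size_t mid = low + (high - low) / 2;
        ++count;
        if (a[mid] == key) break;
        if (a[mid] < key) low = mid + 1;
        else high = mid;
    }
    return count;
}

struct Averages {
    double linear;
    double binary;
};

// One test run: a fresh random list of size n and n/2 random search keys.
Averages averageIterations(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    // Assumption: values and keys are drawn from 1..2n.
    std::uniform_int_distribution<int> value(1, static_cast<int>(2 * n));

    std::vector<int> list(n);
    for (int& v : list) v = value(gen);
    std::vector<int> sorted(list);
    std::sort(sorted.begin(), sorted.end());  // binary search needs sorted data

    const std::size_t searches = std::max<std::size_t>(1, n / 2);
    long long linearTotal = 0, binaryTotal = 0;
    for (std::size_t s = 0; s < searches; ++s) {
        int key = value(gen);  // search key generated separately at random
        linearTotal += linearComparisons(list, key);
        binaryTotal += binaryComparisons(sorted, key);
    }
    return {static_cast<double>(linearTotal) / searches,
            static_cast<double>(binaryTotal) / searches};
}
```

Calling `averageIterations` for 200 different sizes between 2 and 20 million would give the 200 points per graph that the hint asks for, although the largest sizes may take a long time for the linear case.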

Report (includes)

1. Your own random token (number, string character etc.) and list generator.

2. Binary search implementation

3. Linear search implementation

4. Testing programs that work as drivers.

5. Performance chart for binary search

6. Performance chart for linear search

7. Comparative analysis

8. Comments about performance.


PROGRAM TO GENERATE ARRAY OF NUMBERS UP TO 1 MILLION

OUTPUT SCREEN
Working of Code:
The above code generates a list of random numbers in the range 2 to 1 million and stores it in the
output file Me.txt. This output file is then used as input to our main binary and linear search programs,
which are run against it to validate the algorithms and measure their complexity.
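The generator listing itself did not survive in this copy of the report, so the following is only a minimal sketch of what such a program could look like. The function name `writeRandomList`, its parameters, and the one-number-per-line file layout are assumptions chosen for illustration; only the file name Me.txt and the range 2 to 1 million come from the text above.

```cpp
#include <fstream>
#include <random>
#include <string>

// Writes `count` random integers in [minValue, maxValue] to `path`,
// one per line. Returns false if the file could not be written.
// (Hypothetical helper; not the original generator.)
bool writeRandomList(const std::string& path, int count,
                     int minValue, int maxValue, unsigned seed) {
    std::ofstream out(path);
    if (!out) return false;

    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(minValue, maxValue);
    for (int i = 0; i < count; ++i) {
        out << dist(gen) << '\n';
    }
    return static_cast<bool>(out);
}
```

A call such as `writeRandomList("Me.txt", 1000000, 2, 1000000, 1u)` would produce a file that the `fscanf` loops in the driver programs can read.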

BINARY SEARCH IMPLEMENTATION


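int binarySearch(int array[], int size, int searchValue)
{
    int low = 0;
    int high = size - 1;
    int mid;

    // the array must already be sorted in ascending order
    while (low <= high)
    {
        // written this way so that low + high cannot overflow
        mid = low + (high - low) / 2;

        if (searchValue == array[mid])
            return mid;
        else if (searchValue > array[mid])
            low = mid + 1;   // continue in the upper half
        else
            high = mid - 1;  // continue in the lower half
    }

    // searchValue is not in the array
    return -1;
}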
DRIVER PROGRAM FOR BINARY SEARCH
#include <algorithm>
#include <cstdio>
#include <ctime>
#include <iostream>
using namespace std;

int main() {
    // static storage: an array this large would overflow the stack
    static int arr[10000000];
    // number of elements actually read from the file
    int q = 0;

    // opening the integer file
    FILE* fptr = fopen("Me.txt", "r");
    if (fptr == NULL) {
        cout << "Could not open Me.txt" << endl;
        return 1;
    }

    // scanning integers from file to array, counting them in q
    while (q < 10000000 && fscanf(fptr, "%d", &arr[q]) == 1) {
        q++;
    }
    fclose(fptr);

    // the generated list is random, and binary search needs sorted data
    sort(arr, arr + q);

    int x;
    cout << "Enter the number you want to search ";
    while (cin >> x) {
        clock_t starttime = clock();
        // search only the q elements that were read, not the whole buffer
        int result = binarySearch(arr, q, x);
        clock_t endtime = clock();

        if (result == -1)
            cout << "Element not found" << endl;
        else
            cout << "Element found at index: " << result << endl;

        // total time of execution in seconds
        double totaltime = ((double)(endtime - starttime)) / CLOCKS_PER_SEC;
        printf("start time : %f\n", (double)starttime);
        printf("end time : %f\n", (double)endtime);
        printf("total time of execution = %f\n", totaltime);

        cout << "Enter the number you want to search ";
    }
    return 0;
}
OUTPUT SCREEN
Performance Chart for Binary Search
[Chart: Binary search performance. X axis: INPUT (list sizes from 4740 to 4940). Y axis: ITERATION (scale 0 to 14).]

LINEAR SEARCH
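int linearSearch(int array[], int size, int searchValue)
{
    // compare each element with searchValue, front to back
    for (int i = 0; i < size; i++)
    {
        if (array[i] == searchValue)
            return i;
    }

    // searchValue is not in the array
    return -1;
}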
DRIVER PROGRAM FOR LINEAR SEARCH
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <iostream>
using namespace std;

int main() {
    // static storage: an array this large would overflow the stack
    static int arr[20000000];
    // number of elements actually read from the file
    int q = 0;

    // opening the integer file
    FILE* fptr = fopen("Me.txt", "r");
    if (fptr == NULL) {
        cout << "Could not open Me.txt" << endl;
        return 1;
    }

    // scanning integers from file to array, counting them in q
    while (q < 20000000 && fscanf(fptr, "%d", &arr[q]) == 1) {
        q++;
    }
    fclose(fptr);

    // random search key in the range 2 to 20 million
    // (on platforms where RAND_MAX is small, rand() cannot reach the upper part of this range)
    srand((unsigned)time(NULL));
    int key = rand() % (20000000 - 2 + 1) + 2;
    cout << "\n Random search key generated: " << key << endl;

    clock_t starttime = clock();
    // search only the q elements that were read, not the whole buffer
    int result = linearSearch(arr, q, key);
    clock_t endtime = clock();

    if (result == -1)
        cout << "Element not found" << endl;
    else
        cout << "Element found at index: " << result << endl;

    // total time of execution in seconds
    double totaltime = ((double)(endtime - starttime)) / CLOCKS_PER_SEC;
    printf("start time : %f\n", (double)starttime);
    printf("end time : %f\n", (double)endtime);
    printf("total time of execution = %f\n", totaltime);
    return 0;
}

OUTPUT SCREEN
Performance Chart for Linear Search
[Chart: Linear search performance. X axis: INPUT VALUES (0 to 30000). Y axis: ITERATION (0 to 30000).]

GRAPH PERFORMANCES
Both linear and binary search algorithms can be useful depending on the application.
When the data structure is an array and its elements are arranged in sorted order,
binary search is preferred for quick searching. If the data structure is a linked list,
then regardless of how the elements are arranged, linear search is adopted because
binary search cannot be implemented on it directly.

The typical binary search algorithm cannot be applied to a linked list because a linked
list is dynamic in nature and gives no direct access to its middle element. Hence,
there is a need for a variation of the binary search algorithm that can work on a
linked list too, because binary search is faster in execution than linear search.
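One possible shape for such a variation is sketched below; it is an illustration, not something implemented or measured in this report. The names `Node`, `middle` and `binarySearchList` are hypothetical. The middle of the current range is located by walking the list with a slow and a fast pointer, so the number of key comparisons drops to O(log n), but the total number of node visits is still O(n) because every middle has to be reached by traversal.

```cpp
// Singly linked list node (hypothetical type for this sketch).
struct Node {
    int value;
    Node* next;
};

// Returns the middle node of the non-empty half-open range [first, last).
// The fast pointer moves two nodes per step, the slow pointer one.
Node* middle(Node* first, Node* last) {
    Node* slow = first;
    Node* fast = first;
    while (fast != last && fast->next != last) {
        fast = fast->next->next;
        slow = slow->next;
    }
    return slow;
}

// Binary search over a sorted (ascending) linked list.
// Returns the node holding key, or nullptr if key is absent.
Node* binarySearchList(Node* head, int key) {
    Node* first = head;
    Node* last = nullptr;  // one past the end of the current range
    while (first != last) {
        Node* mid = middle(first, last);
        if (mid->value == key) return mid;
        if (mid->value < key) first = mid->next;  // search the upper half
        else last = mid;                          // search the lower half
    }
    return nullptr;
}
```

Whether this beats a plain linear scan in practice depends on how expensive a comparison is relative to following a pointer.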

If the record is present somewhere in the search table, then on average the number
of comparisons made by linear search will be (n+1)/2. The worst-case efficiency of this
technique is O(n), where O denotes the order of execution.
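The (n+1)/2 figure can be checked with a short calculation. It assumes, as a simplification not stated in the report, that the key is present and equally likely to be at each of the n positions, so that finding it at position i costs i comparisons:

```latex
\[
\bar{C}(n) \;=\; \frac{1}{n}\sum_{i=1}^{n} i
           \;=\; \frac{1}{n}\cdot\frac{n(n+1)}{2}
           \;=\; \frac{n+1}{2}
\]
```

For example, with n = 9 elements the average is (9+1)/2 = 5 comparisons.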
BASIS FOR COMPARISON | LINEAR SEARCH | BINARY SEARCH
--- | --- | ---
Time complexity | O(N) | O(log2 N)
Best case time | First element, O(1) | Center element, O(1)
Performance | Search is done by equality comparisons. | Search is done by ordering comparisons.
Prerequisite for an array | Not required | Array must be in sorted order
Usefulness | Easy to use, and no need for any ordered elements. | Somewhat tricky algorithm, and elements must be organized in order.
Worst case for N elements | N comparisons are required | Can conclude after only log2 N comparisons
Can be implemented on | Array and linked list | Cannot be directly implemented on linked list
Insert operation | Easily inserted at the end of the list | Requires processing to insert at the proper place to maintain a sorted list.
Algorithm type | Iterative in nature | Divide and conquer in nature


COMPARATIVE PROGRAM
#include <iostream>
#include <limits>
#include <vector>

using namespace std;

// prints the number of comparisons linear search needs for searchValue
void linearSearch(int array[], int size, int searchValue)
{
    int count = 0;
    for (int i = 0; i < size; i++)
    {
        count++;
        if (searchValue == array[i])
            break;
    }
    cout << "Linear Search comparisons: " << count << endl;
}

// prints the number of comparisons binary search needs for searchValue
void binarySearch(int array[], int size, int searchValue)
{
    int low = 0;
    int high = size - 1;
    int mid;
    int count = 0;
    while (low <= high)
    {
        mid = low + (high - low) / 2;
        count++;
        if (searchValue == array[mid])
            break;
        else if (searchValue > array[mid])
            low = mid + 1;
        else
            high = mid - 1;
    }
    cout << "Binary Search comparisons: " << count << endl;
}

// fills the array with 1, 2, ..., size, so it is already sorted
void populateArray(int array[], int size)
{
    for (int i = 1; i <= size; i++)
        array[i - 1] = i;
}

int main()
{
    int size;
    int userValue;
    char input;
    do
    {
        input = 'n';
        cout << "Enter an int for the size of the array:" << endl;
        cin >> size;
        if (!cin || size < 1)
            return 1;
        // vector instead of "int a[size]", which is not standard C++
        vector<int> a(size);
        cout << "Enter an integer between 1 and " << size << " to search for in the array." << endl;
        cin >> userValue;
        populateArray(a.data(), size);
        linearSearch(a.data(), size, userValue);
        binarySearch(a.data(), size, userValue);
        cout << "Press 'y' and enter to run again or just enter to quit";
        // discard the rest of the current input line before reading the answer
        cin.ignore(numeric_limits<streamsize>::max(), '\n');
        cin.get(input);
    }
    while (input == 'y');
}
Comparison analysis chart
[Chart: Comparative analysis of the comparison program.]
We know that for a small number of elements (say 10), the difference between the number of operations
performed by binary search and linear search is not so big. But in the real world, most of the time, we deal
with problems that involve big chunks of data.
For example, if we have 4 billion elements to search through, then, in its worst case, linear search will take
4 billion operations to complete its task. Binary search will complete this task in just 32 operations. That's
a big difference. Now assume that one operation takes 1 ms to complete: binary search will then take only
32 ms, whereas linear search will take 4 billion ms (approximately 46 days). That's a significant difference.
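The two figures above can be reproduced with a few lines of code. The helper names `binaryWorstCase` and `millisecondsToDays` are made up for this sketch; the first counts how many times n can be halved before the search range is empty, which is the worst-case number of binary search iterations, floor(log2 n) + 1.

```cpp
#include <cstdint>

// Worst-case iterations of binary search over n elements:
// the number of halvings until nothing is left, floor(log2(n)) + 1.
int binaryWorstCase(std::uint64_t n) {
    int steps = 0;
    while (n > 0) {
        n /= 2;
        ++steps;
    }
    return steps;
}

// Converts a duration in milliseconds to days.
double millisecondsToDays(double ms) {
    return ms / (1000.0 * 60.0 * 60.0 * 24.0);
}
```

With these, `binaryWorstCase(4000000000ULL)` gives 32, and `millisecondsToDays(4e9)` gives roughly 46.3 days for the linear worst case at 1 ms per operation.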
Performance of both algorithms
• For a binary search, branchless implementation is much faster than branching one: almost four times faster
in throughput sense and two times faster by latency. I think it would remain so even if support for non-
power-of-two array lengths was added to branchless implementation.

• Full unrolling of branchless binary search (with exactly known number of comparisons to do) makes search
faster by about 30% in throughput sense. The difference in latency is negligible: 3-4 cycles.

• It is hard to ensure that binary search written in C++ is branchless on every platform, every compiler and
every set of settings. I realized this after looking at GCC's work: it managed to miraculously emit cmov in one
case and stupidly screw cmov-s in the other place. Also, MSVC screws the last cmov in
binary_search_branchless_UR if it is inlined into the benchmark code. There is no way to get cmov-s reliably
except for using assembly.
• Latency time of branchless binary search is twice as large as throughput time. I suppose it means that CPU
runs two searches in parallel when they are independent. However, it can be harder to ensure such
independence in real workflow, when binary search is not the only computation done in the hot loop.
• Counting linear search is faster than linear search with break for small N. This is because linear search with
break executes unpredictable branch (which does not happen for N = 16 because it always needs one
iteration). There is about 60% improvement in throughput for N = 32, and about 35% improvement for N = 64
(less improvement in latency). It seems that the branch costs about 10 cycles in both throughput and latency.
• The linear search with break becomes faster than counting linear search shortly after N = 128. For N = 1024
it is 80% faster, and I guess the performance ratio should converge to two at infinity. This happens because
linear search with break processes only half of input array on average, while counting linear search always
goes through the whole array. It is questionable whether it is a huge benefit, given that for N > 128
branchless binary search is faster and reads substantially less data anyway.
• For counting linear search, AVX version is noticeably faster only for N = 64, and for larger N it is about 50%
faster. Perhaps a more efficient AVX implementation would achieve a better ratio. As for full unrolling, it gives
another almost 50% boost for large N (like N = 256 or more), although for small N its benefit is again much
less noticeable.
• As for binary vs linear search competition, the situation is different for throughput and latency. For
throughput performance, branchless binary search is slower for N < 64 (by at most 30%) and faster for N > 64,
although the fully unrolled version of it is as fast as SSE linear search even for N = 16 and N = 32. For latency
performance, branchless binary search catches up with SSE linear search only for N between 128 and 256
(double that for AVX), and it is slower by up to 50% for N = 32 and N = 64.
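To make the terms in the list above concrete, here is a scalar sketch of a "counting" linear search and a "branchless" binary search. It is not the benchmarked code that these observations refer to: the function names are hypothetical, there is no SSE/AVX or unrolling, and both functions return the index of the first element not smaller than the key in a sorted array. The conditional expressions are written so that a compiler can turn them into conditional moves, but, as noted above, whether it actually does so is not guaranteed.

```cpp
#include <cstddef>

// Counting linear search: no early exit, just count the elements
// smaller than key. On a sorted array this count is the index of
// the first element that is not smaller than key.
std::size_t countingLinearSearch(const int* a, std::size_t n, int key) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        count += (a[i] < key);  // adds 0 or 1, no data-dependent branch
    }
    return count;
}

// Branchless-style binary search: the loop always runs the same number
// of times for a given n, and each step only adjusts the base pointer.
std::size_t branchlessLowerBound(const int* a, std::size_t n, int key) {
    if (n == 0) return 0;
    const int* base = a;
    std::size_t len = n;
    while (len > 1) {
        std::size_t half = len / 2;
        // candidate for a conditional move instead of a branch
        base += (base[half - 1] < key) ? half : 0;
        len -= half;
    }
    return static_cast<std::size_t>(base - a) + (*base < key);
}
```

For the sorted array {1, 3, 5, 7, 9}, both functions return 2 for key 5 and 5 (one past the end) for key 10.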
CONCLUSION:
• The proposed "counting" linear search is noticeably faster for small array lengths than linear search with
break, because branch misprediction is removed from it. So it makes sense to always use counting version of
linear search instead of the breaking one, since for larger array lengths binary search becomes faster anyway.
• Binary search is surprisingly good at standing against linear search, given that it fully utilizes conditional
move instructions instead of branches. In my opinion, there is no reason to prefer linear search over binary
search; it is better to make sure that your compiler does not generate branches for the binary search. Counting
linear search is worth using only in the rare case when it is known that the array length is very small and the
search performance is really critically important. Also, the idea of counting linear search is easy to embed
into some vectorized computation, unlike the binary search.

The end.
