
International Journal of Computer & Communication Engineering Research (IJCCER) Volume 2 - Issue 2 March 2014

Modified Binary Search Algorithm for Duplicate Elements
Phyu Phyu Thwe 1, Lai Lai Win Kyi 2
1,2 Department of Information Technology, Mandalay Technological University, Mandalay, Myanmar
1 [email protected], 2 laelae83@gmail.com

Abstract: In computer science, efficiently searching for an item in a large data set is a challenging task. A naive search strategy starts from the beginning of the list for every value, so it performs many comparisons and consumes a lot of time. This searching time can be reduced by not restarting from the beginning for each value. Binary search is based on this concept, and it performs very well compared with other algorithms because of its logarithmic time complexity. However, the limitation of the Binary Search (BS) technique is that it can only be used to search for one element in a given list. This paper therefore extends the BS algorithm to overcome this limitation: the Duplicate Element Binary Search (DEBS) algorithm is developed for duplicate elements in a given list, with the same time complexity as the classical BS algorithm. Applications of these two algorithms are considered together with the book database of an e-library system. The system is implemented using the Java programming language and MySQL server.

Keywords: Analysis of Algorithm, Binary Search, Duplicate Element, Time Complexity

© http://ijccer.org e-ISSN: 2321-4198 p-ISSN: 2321-418X

I. INTRODUCTION

Searching is one of the most fundamental operations in computing. In computer science, search algorithms, i.e. the algorithms used to find a particular item in a set, are generally divided into uninformed ones (used on unsorted lists) and informed ones (used on already sorted lists), which apply knowledge about the structure of the search space to reduce the time spent searching [7]. Searching is also one of the most time-consuming operations in many processing systems. There are many searching techniques, such as Linear Search, Binary Search (BS), Depth-First Search (DFS), Best-First Search (BFS), and so on. Among them, Binary Search is popular and useful in practical applications because of its logarithmic time complexity, O(log N). Because the time complexity is logarithmic, the algorithm shows significant improvements in computation time for very large lists [3]. This logarithmic behavior requires the data set to be arranged in ascending or descending order.

However, BS can only be used to search for one element in a given list. This is the limitation of the algorithm. Some database applications require searching for duplicate elements: records may contain repeated values (e.g. a student name in a student database, or a book title in the book database of an e-library) with different properties. The elements of the list are not necessarily all unique. If one searches for a value that occurs multiple times in the list, the index returned is that of the first equal element encountered. To find all equal elements, an upward and a downward search can be carried out from the initial result, stopping each search when the element is no longer equal [7].

This research extends the classical BS technique to the Duplicate Element Binary Search (DEBS) for handling duplicate elements in computational problems. The proposed search algorithm is also iterative, using two index limits that progressively narrow the search range. Both algorithms are demonstrated on the book database of an e-library system, with the same time complexity.

The rest of this paper is organized as follows. Section II presents the related work for this research. Section III describes the proposed algorithm: the Binary Search (BS) algorithm, the Duplicate Element Binary Search (DEBS) algorithm, and an analysis of the algorithms. Section IV describes the implementation of the proposed algorithm. Section V concludes the paper.

II. RELATED WORKS

M. Archibald studied “Average Depth in a Binary Search Tree with Repeated Keys”. Random sequences from the alphabet {1, …, r} are examined, where repeated letters are allowed. Binary search trees are formed from these sequences and the average left-going depth of the first ‘1’ is found. Next, the right-going depth of the first ‘r’ is examined, and finally a merge (or ‘shuffle’) operator is used to obtain the average depth of an arbitrary node, which can be expressed in terms of the left-going and right-going depths. That paper examines various parameters of these trees and gives an average-case analysis under two standard probabilistic models (probability and multiset) [1].

A. Tarek proposed “A New Approach for Multiple Element Binary Search in Database Applications”. In that paper, the Multi-Key Binary Search (MKBS) algorithm and the Multi-Key Binary Insertion Search (MKBIS) algorithm are developed based on the classical binary search algorithm. MKBS searches for m different keys in a list of n different elements. MKBIS can be used to insert multiple elements into a sorted list. Both the MKBS and the MKBIS algorithms are used for extracting records from different layers within the structure as well as for inserting multiple records. Applications of the proposed algorithms are considered together with a model Employee Database Management program with improved efficiency [3].

S. Korteweg developed a new dictionary structure supporting binary search. This dictionary structure can be implemented without a penalty in memory usage but does not support vocabularies. Because binary search takes O(log N) time, a system can reduce its searching time by implementing the binary search dictionary structure [2].

R. Nowak studied a generalization of the classic binary search problem. The classic problem can be viewed as determining the correct one-dimensional, binary-valued threshold function from a finite class of such functions, based on queries that take the form of point samples of the function. The generalized problem extends binary search techniques to multi-dimensional threshold functions, which arise in machine learning and pattern classification. The work identifies geometrical conditions on the pair (query space X, hypothesis space H) that guarantee that Generalized Binary Search determines the correct hypothesis in O(log|H|) queries. Extensions to handle noise are also discussed [4].

As the research above shows, the Binary Search algorithm is faster than other search algorithms and is very useful for searching elements in database applications. Many studies achieve rapid searching using a variant or extension of the Binary Search algorithm. In this paper, we propose an efficient algorithm that modifies the Binary Search algorithm to find the duplicate elements in a sorted list.

III. DESCRIPTION OF THE PROPOSED ALGORITHM

The proposed algorithm is based on the BS algorithm and finds duplicate values, i.e. values that occur more than once in a database. This section discusses the Binary Search algorithm, the Duplicate Element Binary Search algorithm, and the analysis of the algorithms.

A. Binary Search Algorithm

Figure 1: Comparison of Binary Search
[Diagram: the low, mid, and high indexes as the search range narrows to n/4 items, reaching key = item[mid] within log2 n steps.]

In computer science, a binary search or half-interval search algorithm finds the position of a specified value (the input “key”) within a sorted array. In each step, the algorithm compares the input key with the key of the middle element of the array. If the keys match, a matching element has been found and its index is returned. Otherwise, if the input key is less than the middle element's key, the algorithm repeats its action on the sub-array to the left of the middle element; if the input key is greater, it repeats on the sub-array to the right. If the remaining range to be searched shrinks to zero and the key has not been found, a special "Not found" indication is returned. A binary search halves the number of items to check with each iteration, so locating an item (or determining its absence) takes logarithmic time [5]. Binary search is an example of a divide-and-conquer search algorithm.

Algorithm binary search (A[0…n-1], key)
    low = 0
    high = n - 1
    while (low <= high)
        middle = (low + high) / 2
        if A[middle] = key then
            return middle
        else if A[middle] > key then
            high = middle - 1
        else
            low = middle + 1
        end if
    end while
    return -1
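The pseudocode above can be sketched in Java, the language the paper reports using for its implementation. This is a minimal illustration under stated assumptions, not the authors' code: the class name `BinarySearchSketch`, the method name `binarySearch`, and the use of an `int` array sorted in ascending order are all choices made here for the example.

```java
// Minimal sketch of classical binary search on a sorted int array.
// Class and method names are illustrative assumptions, not taken from the paper.
final class BinarySearchSketch {

    // Returns the index of one element equal to key, or -1 if key is absent.
    // The array must be sorted in ascending order.
    static int binarySearch(int[] a, int key) {
        int low = 0;
        int high = a.length - 1;
        while (low <= high) {
            // Equivalent to (low + high) / 2, written to avoid int overflow
            // on very large arrays.
            int middle = low + (high - low) / 2;
            if (a[middle] == key) {
                return middle;       // match found
            } else if (a[middle] > key) {
                high = middle - 1;   // continue in the left half
            } else {
                low = middle + 1;    // continue in the right half
            }
        }
        return -1;                   // "Not found"
    }

    public static void main(String[] args) {
        int[] sorted = {2, 5, 5, 5, 9, 14};
        System.out.println(binarySearch(sorted, 9)); // prints 4
        System.out.println(binarySearch(sorted, 7)); // prints -1
        System.out.println(binarySearch(sorted, 5)); // prints 2
    }
}
```

For the key 5, which occupies indexes 1 to 3 in this sample array, the sketch returns only index 2, the first match it happens to reach. This is the single-result behavior that the DEBS algorithm is meant to extend.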


Binary search requires a more complex program than linear search, and thus for small N it may run slower than a simple linear search [6]. For large N, BS is faster than linear search.

B. Duplicate Element Binary Search Algorithm

The Duplicate Element Binary Search (DEBS) algorithm modifies the binary search operations to find duplicate elements in a sorted list. The algorithm finds the first and last occurrences of the key, stores their indexes in an array, and returns it. The search starts at the midpoint of the array. If the key is less than or greater than the middle element, the algorithm proceeds exactly as binary search does. If the key matches the middle element, there can be duplicate keys on both sides of it. The algorithm therefore checks the sub-array to the right for the last occurrence, testing whether the element at the last index is equal to the key. If it is not, the algorithm finds the midpoint of this sub-array and repeats its action. It then checks the sub-array to the left for the first occurrence, testing whether the element at the first index is equal to the key. If it is not, the algorithm proceeds in the same way as for the right sub-array. Finally, the indexes of the first and last occurrences are returned in an array. If the remaining sub-array to be searched shrinks to zero and the key is not found, a special “Not found” indication is returned.

This algorithm is useful for finding where duplicate items are in a sorted array. Assume that the user wants to find book information (book title, author name, edition, description) by author name. The author may have written one or more books. DEBS can perform the searching process for duplicate elements, whereas the BS algorithm supports searching for one element only. This is the weak point of BS. DEBS is developed to overcome this weak point without changing the time complexity.

Algorithm duplicate element binary search (A[0…n-1], key)
    low = 0
    high = n - 1
    while (low <= high)
        middle = (low + high) / 2
        if A[middle] > key then
            high = middle - 1
        else if A[middle] < key then
            low = middle + 1
        else if A[middle] = key then
            lastOccurrence = findLastIndex(A[0…n-1], key, middle, high)
            firstOccurrence = findFirstIndex(A[0…n-1], key, middle, low)
            return result[lastOccurrence, firstOccurrence]
        end if
    end while
    return -1

findLastIndex(A[0…n-1], key, middle, high)
    low = middle
    if (A[high] = key) then
        return high
    end if
    while (low <= high)
        middle = (low + high) / 2
        if (A[middle] > key and A[middle-1] = key) then
            return middle - 1
        else if (A[middle] = key) then
            low = middle + 1
        else if (A[middle] > key and A[middle-1] != key) then
            high = middle - 1
        end if
    end while
    return -1

findFirstIndex(A[0…n-1], key, middle, low)
    high = middle - 1
    if (A[low] = key) then
        return low
    end if
    while (low