Using Glowworm Swarm Optimization Algorithm for Clustering Analysis
Zhengxin Huang, Yongquan Zhou
College of Mathematics and Computer Science, Guangxi University for Nationalities, Nanning 530006, China
E-mail: [email protected]
Journal of Convergence Information Technology, Volume 6, Number 2, February 2011
doi: 10.4156/jcit.vol6.issue2.9

Abstract

In this paper, two new cluster analysis methods based on the glowworm swarm optimization (GSO) algorithm are proposed. The first algorithm shows how GSO can be used for self-organizing cluster analysis. The second algorithm hybridizes the GSO clustering algorithm with the K-means algorithm to accelerate classification. Both clustering algorithms are tested on three data sets; the experimental results show that both kinds of clustering algorithm achieve good clustering results.

Keywords: Glowworm Swarm Optimization, Self-organization Clustering, Swarm-based Clustering, Clustering Analysis

1. Introduction

Cluster analysis has become an important technique in exploratory data analysis, pattern recognition, machine learning, neural computing, and other engineering fields. Clustering aims at identifying and extracting significant groups in underlying data. The four main classes of clustering algorithms are partitioning methods, hierarchical methods, density-based clustering, and grid-based clustering. Clustering with swarm-based algorithms is emerging as an alternative to more conventional clustering methods [1][11-13]. Swarm algorithms such as ant colony optimization (ACO) and particle swarm optimization (PSO) have been widely applied to cluster analysis and are highly attractive in many important clustering applications. Glowworm swarm optimization (GSO) is a new swarm intelligence algorithm proposed by Krishnanand and Ghose in 2005; it has been applied to the location of multiple signal sources and the identification of odour sources and hazardous spills [2]. To the best of our knowledge, no clustering analysis based on the GSO algorithm has been reported in the literature so far. In this paper, we propose two new clustering approaches based on the GSO algorithm. In the first algorithm, we use GSO to realize self-organizing data clustering; the second algorithm hybridizes GSOCA with the K-means algorithm to accelerate classification. Both new algorithms are tested on three data sets; the experimental results show that both GSO clustering techniques have much potential.

The rest of the paper is organized as follows: Section 2 presents an overview of the K-means algorithm. GSO is introduced in Section 3. The glowworm swarm optimization clustering algorithm is proposed in Section 4. The hybrid of GSOCA with the K-means algorithm appears in Section 5. Experimental results are summarized in Section 6, and Section 7 presents conclusions.

2. K-means clustering

The K-means clustering method is one of the simplest unsupervised learning algorithms for solving the well-known clustering problem. The goal is to divide the data points in a data set into K clusters, fixed a priori, such that some metric relative to the centroids of the clusters (called the fitness function) is minimized. The algorithm consists of two stages: an initial stage and an iterative stage. The initial stage involves defining K initial centroids, one for each cluster. These centroids must be selected carefully, because different initial centroids lead to different results. One policy for selecting the initial centroids is to place them as far as possible from each other. The second, iterative stage repeatedly assigns each data point to the nearest centroid and recalculates K new centroids according to the new assignments. This iteration stops when a certain criterion is met, for example, when there is no further change in the assignment of the data points. Given a set of n data samples,


suppose that we want to classify the data into k groups. The algorithm aims to minimize a fitness function, such as a squared error function defined as:

$$F = \sum_{j=1}^{k} \sum_{i=1}^{n} \left\| x_i - c_j \right\|^2 \qquad (1)$$

where $x_i$ is the $i$-th data point of the data sample, $c_j$ is the $j$-th cluster center, and $\left\| x_i - c_j \right\|^2$ is a chosen distance measure between the $i$-th data point and the cluster center $c_j$.

The K-means clustering algorithm is summarized in the following steps [7]:

Step 1. Place K points into the space represented by the objects that are being clustered. These points represent the initial group centroids.
Step 2. Assign each object to the group that has the closest centroid.
Step 3. When all objects have been assigned, recalculate the positions of the K centroids.
Step 4. Repeat steps 2 and 3 until a certain criterion is met, such as the centroids no longer moving or a preset number of iterations having been performed.

This results in the separation of objects into groups for which the score of the fitness function is minimized.
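As an illustration of the steps above, the following is a minimal Python sketch of the K-means procedure (not the authors' implementation); the function name `kmeans`, its parameters, and the use of NumPy are our own assumptions.

```python
import numpy as np

def kmeans(data, k, max_iter=100, seed=0):
    """Minimal K-means sketch: data is an (n, m) array, k the number of clusters."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k initial centroids from the data points.
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each object to its nearest centroid.
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its assigned points.
        new_centroids = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

Random initialization is used here for simplicity; the far-apart initialization policy mentioned above could be substituted in Step 1.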

3. Glowworm swarm optimization

It is helpful to introduce GSO in the context of optimizing multi-modal functions, as this makes the working principle of the algorithm easier to understand. When GSO is used to optimize a multi-modal function, physical agents i (i = 1, ..., n) are initially deployed at random in the objective function space. Each agent in the swarm decides its direction of movement by the strength of the signal picked up from its neighbours. This is similar to the luciferin-induced glow of a glowworm, which is used to attract mates or prey in nature: the brighter the glow, the stronger the attraction. Therefore, the authors use the glowworm metaphor to represent the underlying principles of the GSO algorithm. Applying the GSO algorithm to optimize a multi-modal function involves five major steps: 1) according to formula (2), each glowworm i encodes the objective function value J(x_i(t)) at its current location x_i(t) into a luciferin value l_i(t); 2) each glowworm constructs its neighbourhood set N_i(t); 3) according to formula (3), each glowworm i calculates the probability p_ij(t) of moving toward a neighbour j; 4) glowworm i selects a target j and computes its new location x_i(t+1) using formula (4), where s is the moving step; 5) according to formula (5), each glowworm updates the radius of its dynamic decision domain.

$$l_i(t) = (1 - \rho)\, l_i(t-1) + \gamma\, J(x_i(t)) \qquad (2)$$

$$p_{ij}(t) = \frac{l_j(t) - l_i(t)}{\sum_{k \in N_i(t)} \left( l_k(t) - l_i(t) \right)} \qquad (3)$$

$$x_i(t+1) = x_i(t) + s \cdot \frac{x_j(t) - x_i(t)}{\left\| x_j(t) - x_i(t) \right\|} \qquad (4)$$

$$r_d^i(t+1) = \min\left\{ r_s,\; \max\left\{ 0,\; r_d^i(t) + \beta \left( n_t - |N_i(t)| \right) \right\} \right\} \qquad (5)$$
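For concreteness, here is a minimal Python sketch of one GSO iteration implementing formulas (2)-(5); the symbols follow the paper, but the function name `gso_step`, its argument layout, and the default parameter values (taken from Table 1 below) are our own assumptions.

```python
import numpy as np

def gso_step(x, l, r_d, J, rho=0.4, gamma=0.6, beta=0.08, n_t=5, s=0.03, r_s=1.0):
    """One GSO iteration on positions x (n, m), luciferin l (n,), decision radii r_d (n,)."""
    n = len(x)
    # Formula (2): luciferin update from the objective value at the current position.
    l = (1.0 - rho) * l + gamma * np.array([J(xi) for xi in x])
    new_x = x.copy()
    for i in range(n):
        # Neighbours: glowworms within the decision radius that glow brighter.
        dist = np.linalg.norm(x - x[i], axis=1)
        nbrs = np.where((dist < r_d[i]) & (l > l[i]))[0]
        nbrs = nbrs[nbrs != i]
        if len(nbrs) > 0:
            # Formula (3): probability of moving toward each brighter neighbour j.
            p = (l[nbrs] - l[i]) / np.sum(l[nbrs] - l[i])
            j = np.random.choice(nbrs, p=p)
            # Formula (4): move a step of size s toward the chosen neighbour.
            direction = (x[j] - x[i]) / np.linalg.norm(x[j] - x[i])
            new_x[i] = x[i] + s * direction
        # Formula (5): adapt the dynamic decision radius.
        r_d[i] = min(r_s, max(0.0, r_d[i] + beta * (n_t - len(nbrs))))
    return new_x, l, r_d
```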


A full parameter analysis can be found in Krishnanand and Ghose (2008b), which shows that the choice of these parameters has some influence on the performance of the algorithm. In terms of the total number of peaks captured, they suggest the parameter selection shown in Table 1. Thus, only n and r_s need to be selected. These fixed parameter values make the GSO algorithm convenient to apply.

Table 1. The GSO algorithm parameter selection

ρ      γ      β      n_t    s      l_0
0.4    0.6    0.08   5      0.03   5

4. GSO clustering

In the previous section we briefly introduced the GSO algorithm; in this section the GSO clustering algorithm, abbreviated GSOCA, is proposed. One of the most important components of a clustering algorithm is the measure of similarity used to determine how close two patterns are to one another. In this paper, we use the local space relative density to reflect the local data similarity. The local space relative density and its related definitions in the GSO clustering algorithm are defined as follows:

Definition 1. For a cluster data object X = (x_1, x_2, ..., x_m), the local space relative density is calculated by formula (6):

$$d(X) = \frac{|N(X, r)|}{num\_g} \qquad (6)$$

where r is a constant, N(X, r) is the set of data objects contained in the local space within radius r of X, |N(X, r)| is the size of that set, and num_g is the total number of data objects. The bigger the value of d(X), the more similar the data object X = (x_1, x_2, ..., x_m) is to the data contained in its local space.

Definition 2. The attraction of a data object X = (x_1, x_2, ..., x_m) is calculated by formula (7):

$$J(X) = -\ln\!\left(\frac{1}{num\_g}\right) + \ln\big(d(X)\big) \qquad (7)$$

where d(X) is the local space relative density of X and ln(·) is the natural logarithm. The more similar X is to the data contained in its local space, the greater its attraction.
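As a sketch, formulas (6) and (7) can be computed as follows, assuming Euclidean distance and NumPy arrays; the helper names `local_density` and `attraction` are ours, and the sign convention in `attraction` follows our reading of formula (7).

```python
import numpy as np

def local_density(X, data, r):
    """Formula (6): fraction of all data objects lying within radius r of X."""
    dists = np.linalg.norm(data - X, axis=1)
    return np.count_nonzero(dists <= r) / len(data)

def attraction(X, data, r):
    """Formula (7): J(X) = -ln(1/num_g) + ln(d(X)).
    When X itself belongs to the data set, d(X) >= 1/num_g, so the log is defined."""
    num_g = len(data)
    d = local_density(X, data, r)
    return -np.log(1.0 / num_g) + np.log(d)
```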

4.1. GSO clustering process description

With the above definitions, the GSO clustering process can be described as follows (see the code sketch after this paragraph). Each glowworm i represents a cluster data object. It calculates its local space relative density according to formula (6), uses formula (7) to calculate its attraction, then encodes its current attraction value into a luciferin value using formula (2) and broadcasts it within its neighbourhood. Within its dynamic decision domain, it selects the agents with a relatively higher luciferin value to constitute its neighbours. According to formula (3), glowworm i calculates the probability p_ij(t) of moving toward neighbour j, selects a target according to this probability and moves toward it, updating its location by formula (4). It then updates the radius of its dynamic decision domain according to formula (5). By repeatedly carrying out this process, the data objects finally realize self-organizing clustering.
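A minimal sketch of this self-organizing clustering loop is given below. It reuses the hypothetical `gso_step` and `attraction` helpers sketched earlier, treats every data object as a glowworm, and computes attraction relative to the original data set; this is our interpretation under those assumptions, not the authors' reference implementation.

```python
import numpy as np

def gsoca(data, r, iter_max=200, s=0.03, rho=0.4, gamma=0.6,
          beta=0.08, n_t=5, r_s=1.0, l0=5.0):
    """Self-organizing GSO clustering sketch: glowworms (one per data object)
    drift toward denser regions; glowworms that end up close together form a
    cluster. Requires the gso_step() and attraction() helpers defined above."""
    data = np.asarray(data, dtype=float)
    x = data.copy()                      # each glowworm starts at a data object
    n = len(x)
    l = np.full(n, l0)                   # initial luciferin values
    r_d = np.full(n, r_s)                # initial dynamic decision radii
    # Attraction (formula (7)) evaluated against the fixed, original data set.
    J = lambda xi: attraction(xi, data, r)
    for _ in range(iter_max):
        x, l, r_d = gso_step(x, l, r_d, J, rho, gamma, beta, n_t, s, r_s)
    return x  # final positions; nearby final positions indicate one cluster
```

One way to read off clusters from the final positions is to merge glowworms whose final positions lie within the local space radius r of each other.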


4.2. GSO clustering algorithm description

The proposed GSO clustering algorithm (GSOCA) can be described as follows:

Input the cluster data objects;
Set the maximum iteration number iter_max;
Let s be the step size;
Let r be the local space radius;
Let l_i(0) be the initial luciferin;
Let r_d^i(0) be the initial dynamic decision domain radius;
Set t = 1;
While (t