Regular Paper
Particle swarm clustering fitness evaluation with computational centroids
Introduction
Particle Swarm Optimization (PSO) has global convergence ability and, due to its stochastic nature, it can avoid local optima. Therefore, it has been widely exploited to solve complex clustering tasks, where simpler clustering algorithms, such as K-means, are likely to get stuck in a local optimum – possibly far from a satisfactory result. PSO can also be used to simultaneously optimize the number of clusters. The term Particle Swarm Clustering (PSC) refers to clustering conducted using PSO. In PSC, fitness evaluation, i.e., the criteria used to evaluate the quality of each solution found by the algorithm, is critical for obtaining successful clustering results. Nevertheless, this aspect of PSC has not received much attention before. Most novel PSC applications simply adopt their fitness evaluation approach from a previous work, while in most cases, the results could be easily improved via minor modifications in the fitness evaluation, as will be shown in this paper.
PSC algorithms typically use clustering validity indices (CVIs) as their fitness functions. Most CVIs depend on cluster centroid positions. Traditionally in PSC fitness evaluation, the cluster centroids in a CVI formula are directly replaced by the centroids proposed by a particle position. However, computational centroids of the formed clusters are usually different from the particle positions. We propose a novel approach, where the fitness evaluation is carried out using computational centroids. We call the approach Fitness Evaluation with Computational Centroids (FECC). We explain why the FECC approach can better exploit promising solutions and why it can lead to improved final clustering results compared to the traditional approach. Finally, we conduct a thorough comparison of FECC and the traditional fitness evaluation with several different CVIs to show the superiority of the new approach.
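The difference between the two evaluation schemes can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the function names (`assign`, `fitness_traditional`, `fitness_fecc`) are hypothetical, and the within-cluster sum of squared errors stands in for an arbitrary centroid-dependent CVI.

```python
import numpy as np

def assign(X, centroids):
    """Assign each sample to its nearest candidate centroid."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

def fitness_traditional(X, particle_centroids, cvi):
    """Traditional PSC fitness: the CVI is evaluated directly with the
    centroids proposed by the particle position."""
    labels = assign(X, particle_centroids)
    return cvi(X, labels, particle_centroids)

def fitness_fecc(X, particle_centroids, cvi):
    """FECC-style fitness: same partition, but the CVI is evaluated with
    the computational centroids (cluster means) of the formed clusters."""
    labels = assign(X, particle_centroids)
    centroids = np.array([
        X[labels == k].mean(axis=0) if np.any(labels == k)
        else particle_centroids[k]          # fall back for empty clusters
        for k in range(len(particle_centroids))
    ])
    return cvi(X, labels, centroids)

def sse(X, labels, centroids):
    """Example centroid-dependent CVI: within-cluster sum of squared errors."""
    return sum(((X[labels == k] - c) ** 2).sum()
               for k, c in enumerate(centroids))
```

For SSE in particular, the FECC value can never exceed the traditional one, since the cluster mean minimizes the squared error of its cluster; this illustrates why the same candidate partition can look better, and be exploited further, under FECC.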
Our second objective is to exhaustively investigate which CVIs are the most suitable fitness functions for PSC. Despite the importance of this question, exhaustive comparisons of PSC fitness functions are lacking. While several general CVI comparisons can be found in the literature, the results cannot be directly assumed to help in PSC fitness function selection due to different requirements. In this paper, we will discuss previous CVI comparisons and the results relevant for PSC fitness function selection. Thereafter, we conduct an extensive comparison of PSC fitness functions based on different CVIs.
As different fitness functions or a different approach to perform fitness evaluation may be optimal for different data types, we will evaluate the effect of typical dataset characteristics (true number of clusters, dimensionality, asymmetric density, overlap, and noise). The results will be analyzed with statistical significance tests. In the experiments, we use two different PSC approaches, namely Multi-elitist Particle Swarm Optimization (MEPSO) [1] and Multi-dimensional Particle Swarm Optimization (MDPSO) along with the Fractional Global Best Formation (FGBF) method [2]. Both approaches use centroid-based particle encoding and can be used for dynamic clustering, where the optimal number of clusters is searched for simultaneously with the optimal centroid positions.
The rest of the paper is organized as follows. PSC and the applied PSC methods, MEPSO and MDPSO, are briefly described in Section 2. Fitness evaluation in PSC is discussed in Section 3. First, the importance and problems of selecting a proper CVI as the fitness function are discussed in Section 3.1, and the proposed FECC approach is introduced in Section 3.2. Section 4 introduces the rich set of CVIs considered in this paper along with previous CVI comparisons. Experimental results are given in Section 5 and, finally, Section 6 concludes the paper.
Section snippets
Particle swarm clustering
The basic form of the PSO algorithm was introduced in [3] and later modified in [4]. In the algorithm, a swarm of particles flies stochastically through an N-dimensional search space, where each particle's position represents a potential solution to an optimization problem. Each particle p with current position xp and current velocity vp remembers its personal best solution so far, pbest. The swarm as a whole remembers the overall best solution globally achieved so far, gbest. The particles…
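The canonical inertia-weight update described above can be sketched as below. The parameter values (w = 0.72, c1 = c2 = 1.49) are common defaults from the PSO literature, not values taken from this paper, and the function name is hypothetical.

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w=0.72, c1=1.49, c2=1.49, rng=None):
    """One velocity/position update for the whole swarm.

    x, v      : (n_particles, n_dims) current positions and velocities
    pbest     : (n_particles, n_dims) personal best positions
    gbest     : (n_dims,) global best position
    w, c1, c2 : inertia weight and cognitive/social acceleration coefficients
    """
    rng = rng or np.random.default_rng()
    r1 = rng.random(x.shape)   # per-dimension random factors
    r2 = rng.random(x.shape)
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v_new, v_new
```

In centroid-based PSC, each position vector x would concatenate the candidate cluster centroids, so one PSO step perturbs all centroids of a candidate partition at once.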
Fitness function selection
The quality of a partition produced by a clustering method can be evaluated with three types of CVIs: external, internal, and relative. External CVIs compare clustering results with the ground truth information. As PSC is usually exploited in situations where such information is not available, external CVIs cannot be used as fitness functions. Instead, they provide a means to evaluate the performance achieved using internal and relative CVIs as fitness functions in test circumstances where the…
Definitions of CVIs
In this section, we introduce the 17 CVIs selected for comparative evaluations. All the selected indices except SIL, Dunn, and Ratkowsky & Lance directly depend on the cluster centroids, and they are used in fitness evaluation in the traditional way and with the FECC approach. The considered fitness functions are listed in Table 1. The FECC version is denoted with an asterisk after the abbreviation (e.g., BH*). As PSC fitness functions, CVIs are minimized even if another approach (e.g., …
Partition similarity measures
We use three different partition similarity measures (external CVIs) to evaluate the similarity of the ground-truth partitions and the obtained partitions: the Jaccard index [50], the Adjusted Rand index [51], and the labeling error. If Preal is the ground-truth partition and Pobt is the partition obtained by PSC, for each pair of items there are four possible cases: 1) they belong to the same cluster in both Preal and Pobt, 2) they belong to the same cluster in Preal, but different…
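The pair-counting scheme described above can be sketched as follows; the four counts (a, b, c, d) correspond to the four cases, and the Jaccard index is then the fraction of agreeing "same-cluster" pairs. The function names are illustrative, not the paper's code.

```python
from itertools import combinations

def pair_counts(labels_true, labels_pred):
    """Count the four pair categories:
    a: same cluster in both partitions
    b: same cluster in the ground truth only
    c: same cluster in the obtained partition only
    d: different clusters in both partitions."""
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_t = labels_true[i] == labels_true[j]
        same_p = labels_pred[i] == labels_pred[j]
        if same_t and same_p:
            a += 1
        elif same_t:
            b += 1
        elif same_p:
            c += 1
        else:
            d += 1
    return a, b, c, d

def jaccard(labels_true, labels_pred):
    """Jaccard index: a / (a + b + c); pairs separated in both
    partitions (d) are ignored."""
    a, b, c, _ = pair_counts(labels_true, labels_pred)
    return a / (a + b + c)
```

Note that the index is invariant to cluster relabeling, since only pairwise co-membership matters; the Adjusted Rand index is built from the same four counts with a chance correction.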
Conclusions
Traditionally in dynamic PSC with centroid-based particle encoding, the fitness of particle positions is evaluated using a CVI as a fitness function. Most CVIs somehow depend on the cluster centroids. In fitness evaluation, the cluster centroids are traditionally replaced by the centroids proposed by a particle position. In this paper, we propose a new way to conduct fitness evaluation in PSC. In the proposed FECC approach, the actual centroids of the items belonging to the corresponding clusters…
References (60)
- Automatic kernel clustering with a multi-elitist particle swarm optimization algorithm, Pattern Recognit. Lett., 2008
- A perturbed particle swarm algorithm for numerical optimization, Appl. Soft Comput., 2010
- Combinatorial particle swarm optimization (CPSO) for partitional clustering problem, Appl. Math. Comput., 2007
- Evolutionary RBF classifier for polarimetric SAR images, Expert Syst. Appl., 2012
- A non-parametric method to estimate the number of clusters, Comput. Stat. Data Anal., 2014
- Bayesian Ying-Yang machine, clustering and number of clusters, Pattern Recognit. Lett., 1997
- Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, J. Comput. Appl. Math., 1987
- An extensive comparative study of cluster validity indices, Pattern Recognit., 2013
- New indices for cluster validity assessment, Pattern Recognit. Lett., 2005
- Towards a standard methodology to evaluate internal cluster validity indices, Pattern Recognit. Lett., 2011
- Heterogeneous comprehensive learning particle swarm optimization with enhanced exploration and exploitation, Swarm Evolut. Comput.
- Fractional particle swarm optimization in multidimensional search space, IEEE Trans. Syst. Man Cybern. Part B: Cybern.
- Particle swarm optimization - an overview, Swarm Intell.
- Orthogonal learning particle swarm optimization, IEEE Trans. Evolut. Comput.
- The fully informed particle swarm: simpler, maybe better, IEEE Trans. Evolut. Comput.
- Dynamic clustering using particle swarm optimization with application in image segmentation, Pattern Anal. Appl.
- A review on particle swarm optimization algorithms and their applications to data clustering, Artif. Intell. Rev.
- A review on particle swarm optimization algorithm and its variants to clustering high-dimensional data, Artif. Intell. Rev.
- A comparison study of validity indices on swarm-intelligence-based clustering, IEEE Trans. Syst. Man Cybern. Part B: Cybern.
- Multidimensional Particle Swarm Optimization for Machine Learning and Pattern Recognition
- Perceptual dominant color extraction by multi-dimensional particle swarm optimization, EURASIP J. Adv. Signal Process.
- Training radial basis function neural networks for classification via class-specific clustering, IEEE Trans. Neural Netw. Learn. Syst.
- On the shortest spanning subtree of a graph and the traveling salesman problem, Proc. Am. Math. Soc.
- An enhanced PSO-based clustering energy optimization algorithm for wireless sensor network, Sci. World J.
1. Present address: Department of Electrical Engineering, College of Engineering, Qatar University, Qatar.