Regular Paper
Particle swarm clustering fitness evaluation with computational centroids

https://doi.org/10.1016/j.swevo.2017.01.003

Abstract

In this paper, we propose a new way to carry out fitness evaluation in dynamic Particle Swarm Clustering (PSC) with centroid-based encoding. Generally, the PSC fitness function is selected among the clustering validity indices, most of which directly depend on the cluster centroids. In the traditional fitness evaluation approach, the cluster centroids are replaced by the centroids proposed by a particle position. We propose to first compute the centroids of the corresponding clusters and then use these computational centroids in fitness evaluation. The proposed approach is called Fitness Evaluation with Computational Centroids (FECC). We conducted an extensive set of comparative evaluations and the results show that FECC leads to a clear improvement in clustering results compared to the traditional fitness evaluation approach with most of the fitness functions considered in this study. The proposed approach was found especially beneficial when underclustering is a problem. Furthermore, we evaluated 31 fitness functions based on 17 clustering validity indices using two PSC methods over a large number of synthetic and real data sets with varying properties. We used three different performance criteria to evaluate the clustering quality and found that the top three fitness functions are the Xu index, the WB index, and the Dunn variant DU23 applied using FECC. These fitness functions consistently performed well for both PSC methods, for all data distributions, and according to all performance criteria. In all test cases, they were clearly among the better half of the fitness functions and, in the majority of the cases, they were among the top four functions. Further guidance for improved fitness function selection in different situations is provided in the paper.

Introduction

Particle Swarm Optimization (PSO) has global convergence ability and, due to its stochastic nature, it can avoid local optima. Therefore, it has been widely exploited to solve complex clustering tasks, where simpler clustering algorithms, such as K-means, are likely to get stuck in a local optimum, possibly far from a satisfactory result. PSO can also be used to simultaneously optimize the number of clusters. The term Particle Swarm Clustering (PSC) refers to clustering conducted using PSO. In PSC, fitness evaluation, i.e., the criterion used to evaluate the quality of each solution found by the algorithm, is critical for obtaining successful clustering results. Nevertheless, this aspect of PSC has not received much attention before. Most novel PSC applications simply adopt their fitness evaluation approach from a previous work, although in most cases the results could easily be improved via minor modifications in the fitness evaluation, as will be shown in this paper.

PSC algorithms typically use clustering validity indices (CVIs) as their fitness functions. Most CVIs depend on cluster centroid positions. Traditionally in PSC fitness evaluation, the cluster centroids in a CVI formula are directly replaced by the centroids proposed by a particle position. However, the computational centroids of the formed clusters usually differ from the particle positions. We propose a novel approach, where the fitness evaluation is carried out using computational centroids. We call the approach Fitness Evaluation with Computational Centroids (FECC). We explain why the FECC approach can better exploit promising solutions and why it can lead to improved final clustering results compared to the traditional approach. Finally, we conduct a thorough comparison of FECC and the traditional fitness evaluation with several different CVIs to show the superiority of the new approach.
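To make the distinction concrete, the following is a minimal sketch of the two evaluation schemes, assuming nearest-centroid assignment and a simple centroid-dependent index (within-cluster sum of squared errors). All function names are illustrative and the handling of empty clusters is our own assumption, not taken from the paper.

```python
import numpy as np

def assign_clusters(X, centroids):
    """Assign each item to the nearest proposed centroid (hard assignment)."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return np.argmin(d, axis=1)

def sse_index(X, labels, centroids):
    """A simple centroid-dependent validity value: within-cluster sum of squared errors."""
    return sum(np.sum((X[labels == k] - centroids[k]) ** 2)
               for k in range(len(centroids)))

def fitness_traditional(X, particle_centroids):
    # Traditional evaluation: the index is computed with the centroids
    # proposed by the particle position itself.
    labels = assign_clusters(X, particle_centroids)
    return sse_index(X, labels, particle_centroids)

def fitness_fecc(X, particle_centroids):
    # FECC-style evaluation: first form the clusters, then recompute the mean
    # of each formed cluster and evaluate the same index with these
    # computational centroids instead of the raw particle position.
    labels = assign_clusters(X, particle_centroids)
    comp_centroids = np.array([
        X[labels == k].mean(axis=0) if np.any(labels == k) else particle_centroids[k]
        for k in range(len(particle_centroids))
    ])
    return sse_index(X, labels, comp_centroids)
```

The only difference between the two variants is which centroids are plugged into the index: the particle position itself, or the means of the clusters that the particle position actually induces.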

Our second objective is to exhaustively investigate which CVIs are the most suitable fitness functions for PSC. Despite the importance of this question, exhaustive comparisons of PSC fitness functions are lacking. While several general CVI comparisons can be found in the literature, the results cannot be directly assumed to help in PSC fitness function selection due to different requirements. In this paper, we will discuss previous CVI comparisons and the results relevant for PSC fitness function selection. Thereafter, we conduct an extensive comparison of PSC fitness functions based on different CVIs.

As different fitness functions or a different approach to perform fitness evaluation may be optimal for different data types, we will evaluate the effect of typical dataset characteristics (true number of clusters, dimensionality, asymmetric density, overlap, and noise). The results will be analyzed with statistical significance tests. In the experiments, we use two different PSC approaches, namely Multi-elitist Particle Swarm Optimization (MEPSO) [1] and Multi-dimensional Particle Swarm Optimization (MDPSO) along with the Fractional Global Best Formation (FGBF) method [2]. Both approaches use centroid-based particle encoding and can be used for dynamic clustering, where the optimal number of clusters is searched for simultaneously with the optimal centroid positions.

The rest of the paper is organized as follows. PSC and the applied PSC methods, MEPSO and MDPSO, are briefly described in Section 2. Fitness evaluation in PSC is discussed in Section 3. First, the importance and problems of selecting a proper CVI as the fitness function are discussed in Section 3.1 and the proposed FECC approach is introduced in Section 3.2. Section 4 introduces the rich set of CVIs considered in this paper along with previous CVI comparisons. Experimental results are given in Section 5 and, finally, Section 6 concludes the paper.

Section snippets

Particle swarm clustering

The basic form of the PSO algorithm was introduced in [3] and later modified in [4]. In the algorithm, a swarm of S particles flies stochastically through an N-dimensional search space, where each particle's position represents a potential solution to an optimization problem. Each particle p with current position x_p and current velocity v_p remembers its personal best solution so far, b_p. The swarm as a whole remembers the overall best solution globally achieved so far, b_S. The particles
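As a reference point, below is a minimal sketch of the widely used inertia-weight PSO update; the coefficient values are typical textbook choices and not the settings used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def pso_step(x, v, b_p, b_S, w=0.72, c1=1.49, c2=1.49):
    """One inertia-weight PSO update for a single particle.

    x, v: current position and velocity; b_p, b_S: personal and swarm best
    positions. w, c1, c2 are illustrative parameter values only.
    """
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v_new = w * v + c1 * r1 * (b_p - x) + c2 * r2 * (b_S - x)
    x_new = x + v_new
    return x_new, v_new
```

After each step, the fitness of the new position is evaluated and b_p and b_S are updated whenever an improvement is found.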

Fitness function selection

The quality of a partition produced by a clustering method can be evaluated with three types of CVIs: external, internal, and relative. External CVIs compare clustering results with the ground truth information. As PSC is usually exploited in situations where such information is not available, external CVIs cannot be used as fitness functions. Instead, they provide a means to evaluate the performance achieved using internal and relative CVIs as fitness functions in test circumstances where the
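To illustrate this division of roles, the sketch below contrasts an internal index, computable from the data alone and hence usable as a PSC fitness function, with an external index that requires ground-truth labels. The SSE-based internal index is only a placeholder, not one of the CVIs compared in the paper; the external index uses the Adjusted Rand score from scikit-learn.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

def internal_fitness(X, labels, centroids):
    """Internal CVI placeholder (SSE): needs only the data and the partition,
    so it can drive the swarm when no ground truth is available."""
    return sum(np.sum((X[labels == k] - centroids[k]) ** 2)
               for k in range(len(centroids)))

def external_quality(labels_true, labels_pred):
    """External CVI: compares the obtained partition with the ground truth,
    so it can only be used for benchmarking, not as a PSC fitness function."""
    return adjusted_rand_score(labels_true, labels_pred)
```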

Definitions of CVIs

In this section, we introduce the 17 CVIs selected for the comparative evaluations. All the selected indices except SIL, Dunn, and Ratkowsky & Lance directly depend on the cluster centroids, and they are used in fitness evaluation both in the traditional way and with the FECC approach. The considered fitness functions are listed in Table 1. The FECC version is denoted with an asterisk after the abbreviation (e.g., BH*). As PSC fitness functions, CVIs are minimized even if another approach (e.g.,
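Since several of the listed CVIs are, by their original definition, maximized, they must be turned into minimized fitness values. Negation is one common convention, sketched below for the Silhouette index (SIL); the exact transformation applied by the authors is not visible in this snippet.

```python
from sklearn.metrics import silhouette_score  # SIL is maximized, range [-1, 1]

def sil_fitness(X, labels):
    """Minimized fitness derived from a maximization-type CVI.

    Negating the index is one common choice; it preserves the ranking of
    candidate partitions while fitting a minimization framework.
    """
    return -silhouette_score(X, labels)
```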

Partition similarity measures

We use three different partition similarity measures (external CVIs) to evaluate the similarity between the ground-truth partitions and the obtained partitions: the Jaccard index [50], the Adjusted Rand index [51], and the labeling error. If P_real is the ground-truth partition and P_PSO is the partition obtained by PSC, then for each pair of items x_i, x_j, i ≠ j, there are four possible cases: 1) they belong to the same cluster in both P_real and P_PSO, 2) they belong to the same cluster in P_real, but different
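A minimal sketch of the pair counting described above is given below; the pair-counting Jaccard index follows directly from the four counts. The Adjusted Rand index can likewise be derived from such counts, and the labeling error, whose exact definition is not included in this snippet, is omitted.

```python
from itertools import combinations

def pair_counts(labels_real, labels_pso):
    """Count the four pair categories over all item pairs (i, j), i != j.

    n11: same cluster in both partitions; n10: same cluster in P_real only;
    n01: same cluster in P_PSO only; n00: different clusters in both.
    """
    n11 = n10 = n01 = n00 = 0
    for i, j in combinations(range(len(labels_real)), 2):
        same_real = labels_real[i] == labels_real[j]
        same_pso = labels_pso[i] == labels_pso[j]
        if same_real and same_pso:
            n11 += 1
        elif same_real:
            n10 += 1
        elif same_pso:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00

def jaccard_index(labels_real, labels_pso):
    # Pairs grouped together in both partitions divided by pairs grouped
    # together in at least one of the partitions.
    n11, n10, n01, _ = pair_counts(labels_real, labels_pso)
    return n11 / (n11 + n10 + n01)
```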

Conclusions

Traditionally in dynamic PSC with centroid-based particle encoding, the fitness of particle positions is evaluated using a CVI as a fitness function. Most CVIs somehow depend on the cluster centroids. In fitness evaluation, the cluster centroids are traditionally replaced by centroids proposed by a particle position. In this paper, we propose a new way to conduct fitness evaluation in PSC. In the proposed FECC approach, the actual centroids of the items belonging to the corresponding clusters

References (60)

  • N. Lynn et al., Heterogeneous comprehensive learning particle swarm optimization with enhanced exploration and exploitation, Swarm Evolut. Comput. (2015)
  • S. Kiranyaz et al., Fractional particle swarm optimization in multidimensional search space, IEEE Trans. Syst. Man Cybern. Part B: Cybern. (2010)
  • J. Kennedy, R. Eberhart, Particle swarm optimization, in: Proceedings of IEEE International Conference on Neural...
  • Y. Shi, R. Eberhart, A modified particle swarm optimizer, in: Proceedings of IEEE Congr. on Evolutionary Computation,...
  • R. Poli et al., Particle swarm optimization - an overview, Swarm Intell. (2007)
  • Z.-H. Zhan et al., Orthogonal learning particle swarm optimization, IEEE Trans. Evolut. Comput. (2011)
  • R. Mendes, Population topologies and their influence in particle swarm performance. (Ph.D. thesis), Departamento de...
  • R. Mendes et al., The fully informed particle swarm: simpler, maybe better, IEEE Trans. Evolut. Comput. (2004)
  • M. Omran, A. Salman, A. Engelbrecht, Image classification using particle swarm optimization, in: Proceedings of the 4th...
  • M. Omran et al., Dynamic clustering using particle swarm optimization with application in image segmentation, Pattern Anal. Appl. (2006)
  • S. Rana et al., A review on particle swarm optimization algorithms and their applications to data clustering, Artif. Intell. Rev. (2011)
  • A.A. Esmin et al., A review on particle swarm optimization algorithm and its variants to clustering high-dimensional data, Artif. Intell. Rev. (2013)
  • R. Xu, J. Xu, D. Wunsch, Clustering with differential evolution particle swarm optimization, in: Proceedings of IEEE...
  • R. Xu et al., A comparison study of validity indices on swarm-intelligence-based clustering, IEEE Trans. Syst. Man Cybern. Part B: Cybern. (2012)
  • S. Kiranyaz et al., Multidimensional Particle Swarm Optimization for Machine Learning and Pattern Recognition (2014)
  • S. Kiranyaz et al., Perceptual dominant color extraction by multi-dimensional particle swarm optimization, EURASIP J. Adv. Signal Process. (2009)
  • J. Raitoharju et al., Training radial basis function neural networks for classification via class-specific clustering, IEEE Trans. Neural Netw. Learn. Syst. (2016)
  • J.B. Kruskal, On the shortest spanning subtree of a graph and the traveling salesman problem, Proc. Am. Math. Soc. (1956)
  • K. Govindarajan, D. Boulanger, V.S. Kumar, Kinshuk, Parallel particle swarm optimization (PPSO) clustering for learning...
  • C. Vimalarani et al., An enhanced PSO-based clustering energy optimization algorithm for wireless sensor network, Sci. World J. (2016)
Cited by (13)

    • Configuring differential evolution adaptively via path search in a directed acyclic graph for data clustering

      2020, Swarm and Evolutionary Computation
      Citation Excerpt:

      Leveraging the capability of continuously approximating optimal solutions, optimization-based clustering algorithms show competitive performance in clustering data [22]. It is worth noting that population-based evolutionary algorithms [23–25], such as particle swarm optimization [11,20] and differential evolution [26], have become powerful tools and provide alternatives to traditional clustering algorithms. Among various evolutionary algorithms, the differential evolution algorithm and its variants [27–30] are more attractive and widely employed in data cluster analysis [26,31,32].

    • Prediction of human diseases using optimized clustering techniques

      2020, Materials Today: Proceedings
      Citation Excerpt:

      The ability to work with noisy data, a sensible running time, and the avoidance of local optima allow these algorithms to produce more efficient clusters. Metaheuristic algorithms can find an optimal solution using an objective function, and defining the objective function is very important as it has a direct impact on the solution [12]. In the literature, a large number of nature-inspired algorithms are available, such as the genetic algorithm [13], particle swarm optimization [14], cuckoo search [15], grey wolf optimization [16], whale optimization [17], the firefly algorithm [18], the bat algorithm [19], and also other algorithms [20,21].

    • Swarm intelligence for clustering — A systematic review with new perspectives on data mining

      2019, Engineering Applications of Artificial Intelligence
      Citation Excerpt:

      Hence, MDPSO can tackle problems in which the solutions may assume several possible solutions. This approach was also used in Raitoharju et al. (2017) and Cura (2012), to find the optimal number of clusters and simultaneously search for optimal centroid positions. Das et al. (2008) firstly proposed this encoding scheme in the MEPSO clustering algorithm and used this encoding in Chen et al. (2016).

    • Clustering of multi-view relational data based on particle swarm optimization

      2019, Expert Systems with Applications
      Citation Excerpt:

      The top three fitness functions highlighted were the Silhouette index, the Xu index and the Intra-cluster homogeneity. Other researchers obtained similar results regarding the ranking of clustering validation indexes, the Xu index was ranked second by Dimitriadou, Dolničar, and Weingessel (2002), and within top three by Raitoharju, Samiee, Kiranyaz, and Gabbouj (2017). In general, as can be seen in Table 15, the proposed hybrid methods obtained better results compared to the other algorithms considering the external indexes and the databases used.

1 Present address: Department of Electrical Engineering, College of Engineering, Qatar University, Qatar.
