Stochastic approximation driven particle swarm optimization with simultaneous perturbation – Who will guide the guide?
Introduction
The Merriam-Webster dictionary defines optimization as an act, process, or methodology of making something as fully perfect, functional, or effective as possible; specifically, the mathematical procedures (such as finding the maximum of a function) involved in this. More formally, consider the problem of finding a root θ* (either a minimum or a maximum point) of the gradient equation g(θ) ≡ ∂L(θ)/∂θ = 0 for some differentiable function L : Rp → R1. When g is available and L is a differentiable, uni-modal function, there are powerful deterministic methods for finding the global θ*, such as the traditional steepest descent and Newton–Raphson methods. However, in many real problems g cannot be observed directly and/or L is multi-modal, in which case the aforementioned approaches may be trapped in some deceiving local optimum. This ushered in the era of stochastic optimization algorithms, which can estimate the gradient and, owing to their stochastic nature, may avoid being trapped in a local optimum. One of the most popular stochastic optimization techniques is stochastic approximation (SA), in particular the form called "gradient-free" SA. Among the many SA variants proposed by researchers such as Styblinski and Tang [37], Kushner [25], Gelfand and Mitter [13], and Chin [6], one notably different SA method is simultaneous perturbation SA (SPSA), proposed by Spall in 1992 [35]. The main advantage of SPSA is that it often achieves a much more economical operation in terms of loss function evaluations, which are usually the most computationally intensive part of an optimization process.
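The economy of SPSA comes from perturbing all p components of θ at once, so a gradient estimate costs only two loss evaluations regardless of p. A minimal sketch of this two-sided estimate (`spsa_gradient` and the sphere loss are our own illustrative names, not from the paper):

```python
import numpy as np

def spsa_gradient(loss, theta, c_k, rng):
    """One simultaneous-perturbation estimate of the gradient of `loss` at `theta`.
    All p components are perturbed at once with a random Bernoulli +/-1 vector,
    so only two loss evaluations are needed regardless of the dimension p."""
    delta = rng.choice([-1.0, 1.0], size=theta.shape)   # random perturbation direction
    y_plus = loss(theta + c_k * delta)                  # L(theta + c_k * delta)
    y_minus = loss(theta - c_k * delta)                 # L(theta - c_k * delta)
    return (y_plus - y_minus) / (2.0 * c_k * delta)     # element-wise: 1/delta_i = delta_i

# A single (noisy) estimate on the sphere function L(theta) = sum(theta_i^2)
rng = np.random.default_rng(0)
theta = np.array([1.0, -2.0, 0.5])
g_hat = spsa_gradient(lambda t: float(np.sum(t**2)), theta, c_k=1e-4, rng=rng)
```

Each individual estimate is noisy, but it is unbiased up to O(c_k²); averaging many estimates approaches the true gradient (here 2θ).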
Particle swarm optimization (PSO) was introduced by Kennedy and Eberhart [20] in 1995 as a population-based stochastic search and optimization process. It originated from computer simulations of the individuals (particles or living organisms) in a bird flock or fish school [42], which exhibit a natural behavior when they search for some target (e.g. food). Hence, PSO exhibits certain similarities with other evolutionary algorithms (EAs) [4] such as the genetic algorithm (GA) [14], genetic programming (GP) [24], evolution strategies (ES) [5], and evolutionary programming (EP) [11]. Their common point is that EAs are population-based methods and may avoid being trapped in a local optimum; however, this is never guaranteed. In a PSO process, a swarm of particles (or agents), each of which represents a potential solution to an optimization problem, navigates through the search (or solution) space. The particles are initially distributed randomly over the search space, and the goal is to converge to the global optimum of a function or a system. Each particle keeps track of its position in the search space and of its best solution achieved so far. This is the personal best position (the so-called pbest in [20]), and the PSO process also keeps track of the global best (GB) solution achieved so far by the swarm, together with the index of the corresponding particle (the so-called gbest in [20]). So during their journey over discrete time iterations, the velocity of each particle in the next iteration is computed using the best position of the swarm (the personal best position of the particle gbest, as the social component), its own personal best position (pbest, as the cognitive component), and its current velocity (the memory term). Both the social and the cognitive components contribute randomly to the particle's position in the next iteration.
As a stochastic search algorithm in a multi-dimensional (MD) search space, PSO exhibits some major problems similar to the aforementioned EAs. The first stems from the fact that any stochastic optimization technique depends on the parameters of the optimization problem to which it is applied, and variations of these parameters significantly affect its performance. This problem is a crucial one for PSO, where parameter variations may result in large performance shifts [26]. The second is due to the direct link of information flow between the particles and gbest, which then "guides" the rest of the swarm, resulting in the creation of similar particles and some loss of diversity. This phenomenon increases the likelihood of being trapped in local optima [32], and it is the main cause of the premature convergence problem, especially when the search space is high-dimensional [40] and the problem to be optimized is multi-modal [32]. Therefore, at any iteration of a PSO process, gbest is the most important particle; however, it has the poorest update equation: when a particle becomes gbest, it resides at its personal best position (pbest), and thus both the social and the cognitive components are nullified in its velocity update equation. Although it guides the swarm during the following iterations, it ironically lacks the necessary guidance to do so effectively. Indeed, if gbest is (or is likely to get) trapped in a local optimum, so is the rest of the swarm, due to the aforementioned direct link of information flow. This deficiency was raised in a recent work [22], where an artificial GB particle, the aGB, is created at each iteration as an alternative to gbest and replaces the native gbest particle as long as it achieves a better fitness score. That study showed that such enhanced guidance alone is indeed sufficient in most cases to achieve global convergence on multi-modal functions, even in high dimensions.
However, the underlying mechanism for creating the aGB particle, the so-called fractional GB formation (FGBF), is not generic: it is rather problem dependent, requiring (estimates of) individual dimensional fitness scores, which may be quite hard or infeasible to obtain for certain problems.
To address this drawback efficiently, in this paper we propose two approaches. The first moves gbest efficiently or, simply put, guides it with respect to the function (or error) surface. The idea behind this is quite simple: since the velocity update equation of gbest is quite poor, SPSA, as a simple yet powerful search technique, is used to drive it instead. Due to its stochastic nature, the likelihood of getting trapped in a local optimum is further decreased, and through SA, gbest is driven according to (an approximation of) the gradient of the function. The second approach follows an idea similar to the FGBF proposed in [22]: an aGB particle is created, this time by SPSA, applied over the personal best (pbest) position of the gbest particle. The aGB particle then guides the swarm instead of gbest if and only if it achieves a better fitness score than the (personal best position of) gbest. Note that both approaches only deal with the gbest particle, and hence the internal PSO process remains as is. That is, neither of the proposed approaches is a PSO variant by itself; rather, each is a remedy for the problem of the original PSO caused by the poor gbest update. Furthermore, we shall demonstrate that the proposed approaches have a negligible computational cost overhead, e.g. only a few percent increase in computational complexity, which can easily be compensated by a slight reduction either in the swarm size or in the number of iterations. Both approaches of SA-driven PSO (SAD PSO) will be tested and evaluated against the basic PSO (bPSO) over several uni- and multi-modal benchmark functions in high dimensions. Moreover, they are also applied to the multi-dimensional extension of PSO, the MD-PSO technique proposed in [22], which can find the optimum dimension of the solution space and hence removes the need to fix that dimension in advance.
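The first approach can be sketched as a single SPSA move applied to the gbest position at each iteration. This is a minimal illustration under our own assumptions (in particular, the greedy acceptance test is illustrative, not necessarily the paper's exact rule):

```python
import numpy as np

def sad_gbest_step(f, x_gbest, a_k, c_k, rng):
    """One SPSA move for the gbest particle (a sketch of the first approach).
    Only gbest, whose own velocity update is nullified in bPSO, is driven by
    the SPSA gradient estimate; the rest of the swarm would keep the ordinary
    PSO update. The greedy acceptance test below is an illustrative
    assumption, not necessarily the exact rule used in the paper."""
    delta = rng.choice([-1.0, 1.0], size=x_gbest.shape)
    g_hat = (f(x_gbest + c_k * delta) -
             f(x_gbest - c_k * delta)) / (2.0 * c_k * delta)
    candidate = x_gbest - a_k * g_hat              # descend the estimated gradient
    return candidate if f(candidate) < f(x_gbest) else x_gbest

# Drive a stagnant gbest position down the 2-D sphere function
f = lambda x: float(np.sum(x**2))
rng = np.random.default_rng(0)
x = np.array([2.0, -1.0])
for _ in range(200):
    x = sad_gbest_step(f, x, a_k=0.05, c_k=1e-3, rng=rng)
```

Because only the single gbest particle is touched, the extra cost per iteration is a constant handful of fitness evaluations, independent of the swarm size.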
SAD MD-PSO is then tested and evaluated against the standalone MD-PSO application over several data clustering problems where both complexity and the dimension of the solution space (the true number of clusters) are varied significantly.
The rest of the paper is organized as follows. Section 2 surveys the basic PSO (bPSO) and MD-PSO methods, along with related work in data clustering. The proposed techniques, applied over both PSO and MD-PSO, are presented in Section 3. Section 4 presents the experimental results over two problem domains: non-linear function minimization and data clustering. Finally, Section 5 concludes the paper.
Section snippets
The basic PSO technique
In the basic PSO method (bPSO), a swarm of particles flies through an N-dimensional search space, where the position of each particle represents a potential solution to the optimization problem. Each particle a in the swarm ξ = {x1, …, xa, …, xS} of S particles is represented by the following characteristics:
- xa,j(t): jth dimensional component of the position of particle a, at time t
- va,j(t): jth dimensional component of the velocity of particle a, at time t
- ya,j(t): jth dimensional component of the personal best (pbest) position of particle a, at time t
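With these characteristics, the standard bPSO velocity and position update can be sketched as follows (a minimal illustration; the coefficient values w, c1, c2 are common choices from the PSO literature, not necessarily those used in the paper):

```python
import numpy as np

def bpso_update(x, v, y, y_hat, w=0.72, c1=1.49, c2=1.49, rng=None):
    """One bPSO iteration for the whole swarm.
    x: positions (S, N); v: velocities (S, N); y: personal best positions
    (pbest); y_hat: global best position (gbest). Coefficient values are
    common literature choices, not necessarily the paper's."""
    rng = rng if rng is not None else np.random.default_rng()
    r1 = rng.random(x.shape)                       # random cognitive weights
    r2 = rng.random(x.shape)                       # random social weights
    v = w * v + c1 * r1 * (y - x) + c2 * r2 * (y_hat - x)
    return x + v, v

# Minimize the 4-dimensional sphere function with a swarm of 20 particles
f = lambda X: np.sum(X**2, axis=1)
rng = np.random.default_rng(1)
x = rng.uniform(-5.0, 5.0, (20, 4))
v = np.zeros_like(x)
y, fy = x.copy(), f(x)
for _ in range(100):
    y_hat = y[np.argmin(fy)]                       # current gbest position
    x, v = bpso_update(x, v, y, y_hat, rng=rng)
    fx = f(x)
    improved = fx < fy                             # update personal bests
    y[improved], fy[improved] = x[improved], fx[improved]
```

Note that for the gbest particle itself, x coincides with both y and y_hat, so the cognitive and social terms vanish and only the memory term w·v remains; this is the poor gbest update discussed above.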
SPSA overview
The goal of deterministic optimization methods is to minimize a loss function L : Rp → R1, which is a differentiable function of θ, and the minimum (or maximum) point θ* corresponds to the zero-gradient point, i.e. g(θ*) ≡ ∂L(θ)/∂θ|θ=θ* = 0.
As mentioned earlier, when more than one point satisfies this equation (e.g. in a multi-modal problem), such algorithms may only converge to a local minimum. Moreover, in many practical problems g is not readily available. This makes the SA algorithms quite attractive in practice.
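The gradient-free SA recursion at the core of SPSA can be sketched as follows, using the standard decaying gain sequences with Spall's recommended exponents (the gain constants a, c, A here are illustrative choices, not the paper's):

```python
import numpy as np

def spsa_minimize(loss, theta0, iters=500, a=0.1, c=0.1, A=50.0,
                  alpha=0.602, gamma=0.101, seed=0):
    """Plain SPSA recursion theta_{k+1} = theta_k - a_k * ghat_k(theta_k),
    with decaying gains a_k = a/(k+1+A)^alpha and c_k = c/(k+1)^gamma
    (Spall's recommended exponents alpha = 0.602, gamma = 0.101)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for k in range(iters):
        a_k = a / (k + 1 + A) ** alpha
        c_k = c / (k + 1) ** gamma
        delta = rng.choice([-1.0, 1.0], size=theta.shape)   # simultaneous perturbation
        g_hat = (loss(theta + c_k * delta) -
                 loss(theta - c_k * delta)) / (2.0 * c_k * delta)
        theta -= a_k * g_hat                                # SA descent step
    return theta

# Minimize the 2-D sphere function from a distant start
theta_star = spsa_minimize(lambda t: float(np.sum(t**2)), [2.0, -3.0])
```

Each iteration costs exactly two loss evaluations, which is what makes SPSA economical when loss evaluations dominate the computation.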
Experimental results
Two problem domains are considered in this paper, over which the proposed techniques are evaluated. The first is non-linear function minimization, where several benchmark functions are used. This allows us to test the performance of SAD PSO against bPSO over both uni- and multi-modal functions. The second domain is data clustering, which imposes certain constraints in a multi-dimensional solution space and allows performance evaluation in the presence of significant variation in data complexity and the true number of clusters.
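In the clustering domain, an (MD-)PSO particle encodes a set of cluster centroids, and its fitness is a cluster validity measure. A minimal sketch with a hypothetical validity measure (mean nearest-centroid distance, our own stand-in, not the paper's actual index):

```python
import numpy as np

def clustering_fitness(centroids, data):
    """Fitness of a candidate solution encoded as a set of K centroids:
    the mean distance from each point to its nearest centroid (lower is
    better). This is a hypothetical stand-in for the validity index used
    in the paper; in MD-PSO, K is part of the particle's (variable)
    dimension, so solutions with different K compete directly."""
    d = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)  # (n, K)
    return float(d.min(axis=1).mean())

# Two tight clusters: the correct pair of centroids scores far better
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 0.1, (50, 2)),
                  rng.normal(10.0, 0.1, (50, 2))])
good = clustering_fitness(np.array([[0.0, 0.0], [10.0, 10.0]]), data)
bad = clustering_fitness(np.array([[5.0, 5.0]]), data)
```

A practical index would also penalize large K; otherwise adding centroids trivially lowers this measure.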
Conclusions
In this paper, we draw the focus onto a major drawback of the PSO algorithm: the poor gbest update. This can be a severe problem that may cause premature convergence to local optima, since gbest, as the common term in the update equation of all particles, is the primary guide of the swarm. Therefore, we basically seek a solution to this social problem in PSO, i.e. "Who will guide the guide?", which resembles the rhetorical question posed by Plato in his famous work on government: "Who will guard the guards?"
References (42)
- A more efficient global optimization algorithm based on Styblinski and Tang, Neural Networks (1994)
- Clustering by competitive agglomeration, Pattern Recognition (1997)
- Evolutionary artificial neural networks by multi-dimensional particle swarm optimization, Neural Networks (2009)
- Cluster validation using graph theoretic concepts, Pattern Recognition (1997)
- Experiments in nonconvex optimization: stochastic approximation with function smoothing and simulated annealing, Neural Networks (1990)
- Swarm intelligence algorithms for data clustering, Soft Computing for Knowledge Discovery and Data Mining (2007)
- Using selection to improve particle swarm optimization
- P.I. Angeline, Evolutionary optimization versus particle swarm optimization: philosophy and performance differences, ...
- An overview of evolutionary algorithms for parameter optimization, Evolutionary Computation (1993)
- Evolutionary algorithms for fuzzy logic: a brief overview, Fuzzy Logic and Soft Computing (1995)
- A cluster separation measure, IEEE Transactions on Pattern Analysis and Machine Intelligence
- Well separated clusters and optimal fuzzy partitions, Journal of Cybernetics
- Computational Intelligence: PC Tools
- On the use of particle swarm optimization with multimodal functions, IEEE Transactions on Evolutionary Computation
- Advances in Knowledge Discovery and Data Mining
- Recursive stochastic algorithms for global optimization in R^d, SIAM Journal on Control and Optimization
- Genetic Algorithms in Search, Optimization and Machine Learning
- A comparison of neighbourhood topologies for staff scheduling with particle swarm optimisation, Advances in Artificial Intelligence, Lecture Notes in Computer Science
- On cluster validation techniques, Journal of Intelligent Information Systems
- Particle swarm optimization with Gaussian mutation
- A generic and robust system for automated patient-specific classification of electrocardiogram signals, IEEE Transactions on Biomedical Engineering