Stochastic approximation driven particle swarm optimization with simultaneous perturbation – Who will guide the guide?
Introduction
The Merriam-Webster dictionary defines optimization as an act, process, or methodology of making something as fully perfect, functional, or effective as possible; specifically, the mathematical procedures (such as finding the maximum of a function) involved in this. More formally, consider the problem of finding a root θ* (either a minimum or a maximum point) of the gradient equation g(θ) ≡ ∂L(θ)/∂θ = 0 for some differentiable function L : Rp → R1. When g is available and L is a differentiable, uni-modal function, there are powerful deterministic methods for finding the global θ*, such as the traditional steepest descent and Newton–Raphson methods. However, in many real problems g cannot be observed directly and/or L is multi-modal, in which case the aforementioned approaches may be trapped in some deceiving local optimum. This ushered in the era of stochastic optimization algorithms, which can estimate the gradient and, owing to their stochastic nature, may avoid being trapped in a local optimum. One of the most popular stochastic optimization techniques is stochastic approximation (SA), in particular the form called "gradient-free" SA. Among the many SA variants proposed by researchers such as Styblinski and Tang [37], Kushner [25], Gelfand and Mitter [13], and Chin [6], one notably different SA method is simultaneous perturbation SA (SPSA), proposed by Spall in 1992 [35]. The main advantage of SPSA is that it often achieves a much more economical operation in terms of loss function evaluations, which are usually the most computationally intensive part of an optimization process.
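The economy of SPSA comes from perturbing all p components of θ at once, so a gradient estimate costs only two loss evaluations regardless of p. A minimal sketch of this two-sided estimate (`spsa_gradient` and the sphere loss are our own illustrative names, not from the paper):

```python
import numpy as np

def spsa_gradient(loss, theta, c_k, rng):
    """One simultaneous-perturbation estimate of the gradient of `loss` at `theta`.
    All p components are perturbed at once with a random Bernoulli +/-1 vector,
    so only two loss evaluations are needed regardless of the dimension p."""
    delta = rng.choice([-1.0, 1.0], size=theta.shape)   # random perturbation direction
    y_plus = loss(theta + c_k * delta)                  # L(theta + c_k * delta)
    y_minus = loss(theta - c_k * delta)                 # L(theta - c_k * delta)
    return (y_plus - y_minus) / (2.0 * c_k * delta)     # element-wise: 1/delta_i = delta_i

# A single (noisy) estimate on the sphere function L(theta) = sum(theta_i^2)
rng = np.random.default_rng(0)
theta = np.array([1.0, -2.0, 0.5])
g_hat = spsa_gradient(lambda t: float(np.sum(t**2)), theta, c_k=1e-4, rng=rng)
```

Each individual estimate is noisy, but it is unbiased up to O(c_k²); averaging many estimates approaches the true gradient (here 2θ).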
Particle swarm optimization (PSO) was introduced by Kennedy and Eberhart [20] in 1995 as a population-based stochastic search and optimization process. It originated from computer simulations of the individuals (particles or living organisms) in a bird flock or fish school [42], which exhibit a natural behavior when they search for some target (e.g. food). Hence, PSO exhibits certain similarities with other evolutionary algorithms (EAs) [4] such as the genetic algorithm (GA) [14], genetic programming (GP) [24], evolution strategies (ES) [5], and evolutionary programming (EP) [11]. Their common point is that EAs are population-based methods and may avoid being trapped in a local optimum; however, this is never guaranteed. In a PSO process, a swarm of particles (or agents), each of which represents a potential solution to an optimization problem, navigates through the search (or solution) space. The particles are initially distributed randomly over the search space, and the goal is to converge to the global optimum of a function or a system. Each particle keeps track of its position in the search space and of its best solution achieved so far. This is the personal best position (the so-called pbest in [20]), and the PSO process also keeps track of the global best (GB) solution achieved so far by the swarm, together with the index of the corresponding particle (the so-called gbest in [20]). So during their journey over discrete time iterations, the velocity of each particle in the next iteration is computed using the best position of the swarm (the personal best position of the particle gbest, as the social component), its own personal best position (pbest, as the cognitive component), and its current velocity (the memory term). Both the social and the cognitive components contribute randomly to the particle's position in the next iteration.
As a stochastic search algorithm in a multi-dimensional (MD) search space, PSO exhibits some major problems similar to the aforementioned EAs. The first stems from the fact that any stochastic optimization technique depends on the parameters of the optimization problem to which it is applied, and variations of these parameters significantly affect its performance. This problem is a crucial one for PSO, where parameter variations may result in large performance shifts [26]. The second is due to the direct link of information flow between the particles and gbest, which then "guides" the rest of the swarm, resulting in the creation of similar particles and some loss of diversity. This phenomenon increases the likelihood of being trapped in local optima [32], and it is the main cause of the premature convergence problem, especially when the search space is high-dimensional [40] and the problem to be optimized is multi-modal [32]. Therefore, at any iteration of a PSO process, gbest is the most important particle; however, it has the poorest update equation: when a particle becomes gbest, it resides at its personal best position (pbest), and thus both the social and the cognitive components are nullified in its velocity update equation. Although it guides the swarm during the following iterations, it ironically lacks the necessary guidance to do so effectively. Indeed, if gbest is (or is likely to get) trapped in a local optimum, so is the rest of the swarm, due to the aforementioned direct link of information flow. This deficiency was raised in a recent work [22], where an artificial GB particle, the aGB, is created at each iteration as an alternative to gbest and replaces the native gbest particle as long as it achieves a better fitness score. That study showed that such enhanced guidance alone is indeed sufficient in most cases to achieve global convergence on multi-modal functions, even in high dimensions.
However, the underlying mechanism for creating the aGB particle, the so-called fractional GB formation (FGBF), is not generic: it is rather problem dependent, requiring (estimates of) individual dimensional fitness scores, which may be quite hard or infeasible to obtain for certain problems.
To address this drawback efficiently, in this paper we propose two approaches. The first moves gbest efficiently or, simply put, guides it with respect to the function (or error) surface. The idea behind this is quite simple: since the velocity update equation of gbest is quite poor, SPSA, as a simple yet powerful search technique, is used to drive it instead. Due to its stochastic nature, the likelihood of getting trapped in a local optimum is further decreased, and through SA, gbest is driven according to (an approximation of) the gradient of the function. The second approach follows an idea similar to the FGBF proposed in [22]: an aGB particle is created, this time by SPSA, applied over the personal best (pbest) position of the gbest particle. The aGB particle then guides the swarm instead of gbest if and only if it achieves a better fitness score than the (personal best position of) gbest. Note that both approaches only deal with the gbest particle, and hence the internal PSO process remains as is. That is, neither of the proposed approaches is a PSO variant by itself; rather, each is a remedy for the problem of the original PSO caused by the poor gbest update. Furthermore, we shall demonstrate that the proposed approaches have a negligible computational cost overhead, e.g. only a few percent increase in computational complexity, which can easily be compensated by a slight reduction either in the swarm size or in the number of iterations. Both approaches of SA-driven PSO (SAD PSO) will be tested and evaluated against the basic PSO (bPSO) over several uni- and multi-modal benchmark functions in high dimensions. Moreover, they are also applied to the multi-dimensional extension of PSO, the MD-PSO technique proposed in [22], which can find the optimum dimension of the solution space and hence removes the need to fix that dimension in advance.
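The first approach can be sketched as a single SPSA move applied to the gbest position at each iteration. This is a minimal illustration under our own assumptions (in particular, the greedy acceptance test is illustrative, not necessarily the paper's exact rule):

```python
import numpy as np

def sad_gbest_step(f, x_gbest, a_k, c_k, rng):
    """One SPSA move for the gbest particle (a sketch of the first approach).
    Only gbest, whose own velocity update is nullified in bPSO, is driven by
    the SPSA gradient estimate; the rest of the swarm would keep the ordinary
    PSO update. The greedy acceptance test below is an illustrative
    assumption, not necessarily the exact rule used in the paper."""
    delta = rng.choice([-1.0, 1.0], size=x_gbest.shape)
    g_hat = (f(x_gbest + c_k * delta) -
             f(x_gbest - c_k * delta)) / (2.0 * c_k * delta)
    candidate = x_gbest - a_k * g_hat              # descend the estimated gradient
    return candidate if f(candidate) < f(x_gbest) else x_gbest

# Drive a stagnant gbest position down the 2-D sphere function
f = lambda x: float(np.sum(x**2))
rng = np.random.default_rng(0)
x = np.array([2.0, -1.0])
for _ in range(200):
    x = sad_gbest_step(f, x, a_k=0.05, c_k=1e-3, rng=rng)
```

Because only the single gbest particle is touched, the extra cost per iteration is a constant handful of fitness evaluations, independent of the swarm size.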
SAD MD-PSO is then tested and evaluated against the standalone MD-PSO application over several data clustering problems where both complexity and the dimension of the solution space (the true number of clusters) are varied significantly.
The rest of the paper is organized as follows. Section 2 surveys the basic PSO (bPSO) and MD-PSO methods, along with related work in data clustering. The proposed techniques, applied over both PSO and MD-PSO, are presented in Section 3. Section 4 presents the experimental results over two problem domains: non-linear function minimization and data clustering. Finally, Section 5 concludes the paper.
Section snippets
The basic PSO technique
In the basic PSO method (bPSO), a swarm of particles flies through an N-dimensional search space, where the position of each particle represents a potential solution to the optimization problem. Each particle a in the swarm ξ = {x1, …, xa, …, xS} of S particles is represented by the following characteristics:
- xa,j(t): jth dimensional component of the position of particle a, at time t
- va,j(t): jth dimensional component of the velocity of particle a, at time t
- ya,j(t): jth dimensional component of the personal best (pbest) position of particle a, at time t
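With these characteristics, the standard bPSO velocity and position update can be sketched as follows (a minimal illustration; the coefficient values w, c1, c2 are common choices from the PSO literature, not necessarily those used in the paper):

```python
import numpy as np

def bpso_update(x, v, y, y_hat, w=0.72, c1=1.49, c2=1.49, rng=None):
    """One bPSO iteration for the whole swarm.
    x: positions (S, N); v: velocities (S, N); y: personal best positions
    (pbest); y_hat: global best position (gbest). Coefficient values are
    common literature choices, not necessarily the paper's."""
    rng = rng if rng is not None else np.random.default_rng()
    r1 = rng.random(x.shape)                       # random cognitive weights
    r2 = rng.random(x.shape)                       # random social weights
    v = w * v + c1 * r1 * (y - x) + c2 * r2 * (y_hat - x)
    return x + v, v

# Minimize the 4-dimensional sphere function with a swarm of 20 particles
f = lambda X: np.sum(X**2, axis=1)
rng = np.random.default_rng(1)
x = rng.uniform(-5.0, 5.0, (20, 4))
v = np.zeros_like(x)
y, fy = x.copy(), f(x)
for _ in range(100):
    y_hat = y[np.argmin(fy)]                       # current gbest position
    x, v = bpso_update(x, v, y, y_hat, rng=rng)
    fx = f(x)
    improved = fx < fy                             # update personal bests
    y[improved], fy[improved] = x[improved], fx[improved]
```

Note that for the gbest particle itself, x coincides with both y and y_hat, so the cognitive and social terms vanish and only the memory term w·v remains; this is the poor gbest update discussed above.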
SPSA overview
The goal of deterministic optimization methods is to minimize a loss function L : Rp → R1, which is a differentiable function of θ, and the minimum (or maximum) point θ* corresponds to the zero-gradient point, i.e. g(θ*) ≡ ∂L(θ)/∂θ|θ=θ* = 0.
As mentioned earlier, when more than one point satisfies this equation (e.g. in a multi-modal problem), such algorithms may only converge to a local minimum. Moreover, in many practical problems g is not readily available. This makes the SA algorithms quite attractive in practice.
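The gradient-free SA recursion at the core of SPSA can be sketched as follows, using the standard decaying gain sequences with Spall's recommended exponents (the gain constants a, c, A here are illustrative choices, not the paper's):

```python
import numpy as np

def spsa_minimize(loss, theta0, iters=500, a=0.1, c=0.1, A=50.0,
                  alpha=0.602, gamma=0.101, seed=0):
    """Plain SPSA recursion theta_{k+1} = theta_k - a_k * ghat_k(theta_k),
    with decaying gains a_k = a/(k+1+A)^alpha and c_k = c/(k+1)^gamma
    (Spall's recommended exponents alpha = 0.602, gamma = 0.101)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for k in range(iters):
        a_k = a / (k + 1 + A) ** alpha
        c_k = c / (k + 1) ** gamma
        delta = rng.choice([-1.0, 1.0], size=theta.shape)   # simultaneous perturbation
        g_hat = (loss(theta + c_k * delta) -
                 loss(theta - c_k * delta)) / (2.0 * c_k * delta)
        theta -= a_k * g_hat                                # SA descent step
    return theta

# Minimize the 2-D sphere function from a distant start
theta_star = spsa_minimize(lambda t: float(np.sum(t**2)), [2.0, -3.0])
```

Each iteration costs exactly two loss evaluations, which is what makes SPSA economical when loss evaluations dominate the computation.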
Experimental results
Two problem domains are considered in this paper, over which the proposed techniques are evaluated. The first is non-linear function minimization, where several benchmark functions are used. This allows us to test the performance of SAD PSO against bPSO over both uni- and multi-modal functions. The second domain is data clustering, which imposes certain constraints in a multi-dimensional solution space and allows performance evaluation in the presence of significant variation in data complexity and the true number of clusters.
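In the clustering domain, an (MD-)PSO particle encodes a set of cluster centroids, and its fitness is a cluster validity measure. A minimal sketch with a hypothetical validity measure (mean nearest-centroid distance, our own stand-in, not the paper's actual index):

```python
import numpy as np

def clustering_fitness(centroids, data):
    """Fitness of a candidate solution encoded as a set of K centroids:
    the mean distance from each point to its nearest centroid (lower is
    better). This is a hypothetical stand-in for the validity index used
    in the paper; in MD-PSO, K is part of the particle's (variable)
    dimension, so solutions with different K compete directly."""
    d = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)  # (n, K)
    return float(d.min(axis=1).mean())

# Two tight clusters: the correct pair of centroids scores far better
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 0.1, (50, 2)),
                  rng.normal(10.0, 0.1, (50, 2))])
good = clustering_fitness(np.array([[0.0, 0.0], [10.0, 10.0]]), data)
bad = clustering_fitness(np.array([[5.0, 5.0]]), data)
```

A practical index would also penalize large K; otherwise adding centroids trivially lowers this measure.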
Conclusions
In this paper, we draw the focus onto a major drawback of the PSO algorithm: the poor gbest update. This can be a severe problem that may cause premature convergence to local optima, since gbest, as the common term in the update equation of all particles, is the primary guide of the swarm. Therefore, we basically seek a solution to this social problem in PSO, i.e. "Who will guide the guide?", which resembles the rhetorical question posed by Plato in his famous work on government: "Who will guard the guards?"
References (42)
- A more efficient global optimization algorithm based on Styblinski and Tang, Neural Networks (1994)
- Clustering by competitive agglomeration, Pattern Recognition (1997)
- Evolutionary artificial neural networks by multi-dimensional particle swarm optimization, Neural Networks (2009)
- Cluster validation using graph theoretic concepts, Pattern Recognition (1997)
- Experiments in nonconvex optimization: stochastic approximation with function smoothing and simulated annealing, Neural Networks (1990)
- Swarm intelligence algorithms for data clustering, Soft Computing for Knowledge Discovery and Data Mining (2007)
- Using selection to improve particle swarm optimization
- P.I. Angeline, Evolutionary optimization versus particle swarm optimization: philosophy and performance differences, ...
- An overview of evolutionary algorithms for parameter optimization, Evolutionary Computation (1993)
- Evolutionary algorithms for fuzzy logic: a brief overview, Fuzzy Logic and Soft Computing (1995)
- A cluster separation measure, IEEE Transactions on Pattern Analysis and Machine Intelligence
- Well separated clusters and optimal fuzzy partitions, Journal of Cybernetics
- Computational Intelligence: PC Tools
- On the use of particle swarm optimization with multimodal functions, IEEE Transactions on Evolutionary Computation
- Advances in Knowledge Discovery and Data Mining
- Recursive stochastic algorithms for global optimization in R^d, SIAM Journal on Control and Optimization
- Genetic Algorithms in Search, Optimization and Machine Learning
- A comparison of neighbourhood topologies for staff scheduling with particle swarm optimisation, Advances in Artificial Intelligence, Lecture Notes in Computer Science
- On cluster validation techniques, Journal of Intelligent Information Systems
- Particle swarm optimization with Gaussian mutation
- A generic and robust system for automated patient-specific classification of electrocardiogram signals, IEEE Transactions on Biomedical Engineering