
Parallelization of a Swarm Intelligence System

Iman Eshraghi
School of Computer Science
University of Ottawa
Ottawa, Canada
ieshr007@uottawa.ca
December 17, 2015
Abstract
Swarm Intelligence is the area dealing with natural or artificial systems whose constituent particles, called agents or boids, individually and intelligently interact with each other to exhibit a collective behaviour that solves an optimization problem. Among these systems are ant colonies, bee colonies, bird flocking, animal herding, and fish schooling. In this project I study one of these systems, simulate its behaviour to optimize a problem, and then parallelize the code to compare the performance in each case. The system adopted in this project is the Artificial Bee Colony, and parallelizing it proved to be a good decision in the end, resulting in a clear speed-up.

1 Introduction

Quite some time ago I heard about Particle Swarm Optimization and Swarm Intelligence in a computer graphics course, where one of the early applications was simulating the social behaviour of agents. The area seemed interesting to me but remained unexplored until a few months ago, when, after some research, I decided to learn more about it and picked Parallelization of a Swarm Intelligence System as the topic of my research for this course. My intention was both to learn the algorithm and how to implement it, and then to parallelize it and see whether the parallel version can outdo the sequential one.
In this paper, Section 2 reviews the relevant literature. Section 3 presents the approach taken in implementing and parallelizing the system. Section 4 presents the results of the project, and Section 5 concludes the paper.

2 Literature Review

Swarm intelligence (SI) is the collective behaviour of decentralized, self-organized systems, natural or artificial, and the concept is widely employed in AI. SI systems typically consist of a population of simple agents or boids interacting locally with one another and with their environment. The agents follow very simple rules, and although there is no centralized control structure dictating how individual agents should behave, local, and to a certain degree random, interactions between such agents lead to the emergence of intelligent global behaviour that is unknown to the individual agents. Examples of SI in natural systems include ant colonies, bee colonies, bird flocking, fish schooling, animal herding, bacterial growth, and microbial intelligence.
The Artificial Bee Colony (ABC) algorithm [4] has proven to be a successful way of solving optimization problems. It is a swarm-based meta-heuristic optimization technique that has become very popular in recent years. Parallelization of ABC can lead to faster execution and also to better performance if the right approach is selected.
Subotic et al. [9] have tried to parallelize this algorithm by assigning different threads to separate swarms. As they discuss, the main question in implementing the parallelization is the level at which to parallelize. By running every cycle of the ABC algorithm as an independent thread, they faced one major disadvantage: the extensive CPU time spent synchronizing the knowledge-sharing phase, as the ABC algorithm normally runs for thousands of cycles. To address this problem they offer several other approaches: parallel independent runs, multiple swarms (one best solution), and multiple swarms (best solutions from all swarms). In the parallel independent runs, threads do not communicate with each other; every thread runs the same sequential ABC algorithm with different randomness, and the final solution is the best of all independent runs. In the multiple-swarms approach, more than one swarm is used on the same search space, and the swarms can communicate with each other in order to narrow down the search. They used a number of different benchmark functions to validate their claim, and concluded that for simple functions like Sphere more CPU time is spent merely creating and synchronizing threads, whereas for more complex functions with a higher number of parameters the parallel runs led to major speed-ups compared to the serial version.
Luo et al. [6] use a communication strategy in their parallelized ABC algorithm. They divide the agents into G subpopulations which evolve independently, i.e. agents in one subpopulation are not aware of the existence of other subpopulations in the solution space. With this design they report better accuracy and reduced time in finding the near-best solution.
One study discusses parallelization strategies for a distributed-memory multiprocessor architecture under the Message Passing Interface (MPI) [10], and another [5] proposes a CUDA-based Bees Algorithm called CUBA for parallelizing the algorithm on the GPU.
In [8], the author proposes a parallel ABC algorithm for solving numerical optimization functions that distributes the colony of bees equally among the available processors. Solutions obtained by each processor are recorded in its local memory, and the improved solutions are then collected from each processor into a global shared memory, where they can be used by the other swarms.
These days, the ABC algorithm is used in a wide range of applications such as data clustering, image analysis, data mining [7], the minimum-weight spanning tree, and the travelling salesman problem. Speeding up this algorithm through parallelization would be of great benefit as the problem domain and the number of involved parameters get bigger.

3 Project Report

3.1 Methodology

The Artificial Bee Colony (ABC) algorithm is a population-based metaheuristic algorithm used for numerical optimization. It is based on the intelligent foraging behaviour of honey bees and was proposed by Karaboga in 2005 [3].
Population-based metaheuristic algorithms interact and trace out multiple paths in order to find the optimum solution, which makes them natural candidates for parallelization, since those paths can be explored in parallel. The question, and the goal here, is whether a parallelized version of this algorithm can achieve better results in terms of speed.

Figure 1: ABC - Solution Space

3.2 Parallelism

MATLAB is a high-level language used for various computational purposes. It has built-in multi-threading, and many of its commands run in a multithreaded fashion at run time. It supports multiple computation engines besides the MATLAB computation engine, and offers very high-level constructs that allow parallelization of applications without the hassle of programming for specific hardware and network architectures. Its interactive IDE provides many useful toolboxes; for example, it has built-in support for other products and platforms (like CUDA) and can easily scale up to clusters, grids, and clouds using the Distributed Computing Server. All of these reasons, besides my previous experience with the software, convinced me to use it for the parallelization. To that end, I installed the Parallel Computing Toolbox, which makes it possible to run a process on each core of the processor and can also run computations on a cluster of machines [1].

SPMD (Single Program, Multiple Data) is a technique employed to achieve parallelism; it is a subcategory of MIMD. Tasks are split up and run simultaneously on multiple processors with different inputs in order to obtain results faster. SPMD is the most common style of parallel programming; it was first proposed in 1983 for the OPSILA parallel computer and then in 1984 at IBM for highly parallel machines.
The spmd construct in MATLAB is a way of running parallel code that is fundamentally different from the parfor command for parallelizing for loops: spmd is a parallel region, while parfor is a parallel for loop. The difference is that inside an spmd region we have much greater flexibility in the tasks we can perform in parallel: we can write a for loop, operate on distributed arrays and vectors, and much more. In MATLAB, spmd is used along with labindex (which returns the rank of the worker) and numlabs (which returns the total number of workers) in order to exchange data between workers. To parallelize a loop, the loop index range must be explicitly divided among the workers using labindex [1], as in the sketch below.
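As a quick illustration (not taken from the project's code, and with a hypothetical problem size), dividing a loop range 1..N among the workers inside an spmd block looks like this:

N = 1000;                               % hypothetical problem size
spmd
    chunk = N / numlabs;                % assumes N is divisible by numlabs
    first = (labindex - 1)*chunk + 1;   % first index of this worker's slice
    last  = labindex*chunk;             % last index of this worker's slice
    localSum = 0;
    for i = first:last                  % each worker loops over its own slice
        localSum = localSum + i^2;
    end
end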
Approach: code inside an spmd block runs on all workers, and these blocks give full control over what is executed on each worker. labSend() can be used to send data to a specific worker, labReceive() to receive data from an indicated worker, and together they let workers exchange data, for instance in a circular fashion, as sketched below. Here, executing an spmd block opens a parallel pool with 4 workers, so data can be generated on each worker and passed among them. For this, an spmd region needs workers to cooperate on the program as discussed above, so we first issue a matlabpool request for a specific number of workers:
matlabpool open local 4
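A minimal sketch of such a circular (ring) exchange, with hypothetical per-worker data and not part of the project listing, could look like this:

spmd
    next = mod(labindex, numlabs) + 1;      % worker to send to
    prev = mod(labindex - 2, numlabs) + 1;  % worker to receive from
    myData = labindex * ones(1, 3);         % hypothetical per-worker data
    % labSendReceive pairs a labSend and a labReceive in one
    % deadlock-free call
    fromPrev = labSendReceive(next, prev, myData);
end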
After the initialization step (discussed in Section 3.3.1) is done, we divide the entire colony of bees among the processors, i.e. the MATLAB workers. Each worker holds a set of solutions in its local memory, and a copy of each solution is also kept in global shared memory. During each cycle, the bees at a given worker try to improve the solutions in that worker's local memory. At the end of the cycle, the solutions are copied into the corresponding slots of the shared memory, overwriting the previous copies; this way the solutions become accessible to the other workers as well. We define p as the number of workers (processors) and SN_p as the number of solutions in the local memory of each worker, which equals the number of employed bees and the number of onlooker bees per worker. This can be implemented in the following way within the spmd block.


spmd
    SN_p = SourceNum/numlabs;          % solutions per worker
    a = (labindex - 1)*SN_p + 1;       % first index of this worker's slice
    b = SN_p*labindex;                 % last index of this worker's slice
    S_p_food = food(a:b);              % this worker's share of the sources
    [min_value, index] = min([S_p_food.fit]);
    BestSource = S_p_food(index);      % best source found by this worker
    ...
    EmployedSearchPhase
    OnlookerSearchPhase
    ScoutSearchPhase
    ...
end
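As a side note (this call is not part of the project's listing), one way to reduce the per-worker BestSource values to a single global best fitness inside the spmd block would be MATLAB's gop reduction:

% hypothetical reduction: minimum fitness across all workers,
% available on every worker after the call
globalBestFit = gop(@min, BestSource.fit);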
Now, we will discuss the implementation of each of the internal phases above in the rest of
the paper.

3.3 Implementation

To implement the parallelized version of ABC, we first initialize the parameters of the algorithm, such as the number of bees (e.g. employed bees make up 50% of all bees, the same as onlooker bees), and then iterate through steps 2 to 5 below until the stop condition holds (the maximum cycle number is reached); a condensed sketch of this main loop follows the list:
1. initialize the food source positions;
2. each employed bee creates a new food source near its current position and exploits the better source;
3. each onlooker bee selects a source depending on its solution quality, creates a new food source, and exploits the better source;
4. determine the sources that need to be abandoned, and reassign their employed bees as scout bees to search for new food sources;
5. memorize the best food source found so far.
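The condensed sketch below shows how these steps fit together; the phase helper functions are hypothetical names standing in for the inline code of Sections 3.3.2 to 3.3.5:

% steps 2-5 repeated until MaxCycle is reached (step 1 precedes the loop)
for cycle = 1:MaxCycle
    [food, TryToImprove] = employed_phase(food, TryToImprove);   % step 2
    [food, TryToImprove] = onlooker_phase(food, TryToImprove);   % step 3
    [food, TryToImprove] = scout_phase(food, TryToImprove);      % step 4
    [min_value, index] = min([food.fit]);                        % step 5
    if min_value < GlobalBestSource.fit
        GlobalBestSource = food(index);
    end
end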
Initially, the number of employed bees is equal to the number of solutions. In the initialization step the employed bees are dispatched towards a set of random food sources and measure the nectar; they then come back to the hive and share their knowledge with the other bees waiting there. Each search loop contains three parts.
In the first part, each employed bee goes to the food source domain that it visited in the previous cycle and kept in its memory, and then picks a new food source in the neighbourhood based on the collected visual data. In the second part, an onlooker bee selects a food source domain depending on the nectar data shared by the employed bees: dancing employed bees carrying information about richer nectar steer the onlooker bees towards those food sources. After entering the selected domain or region, each such bee selects a new food source in its neighbourhood by comparing the direction or distance to those sources. In the third part, when a food source is abandoned by its bees, the scout bees randomly substitute a new one and assign it to them.
In the end, the position of a food source represents a solution to the optimization problem, and its nectar amount represents the quality of that solution.
I will now explain how each of the parts discussed above is implemented.
3.3.1 Part 1 - Initialization

First, the lower and upper bounds (the range) and the number of variables are defined for the function that is to be optimized by the algorithm:
lb = -100; ub = 100; nvar = 5;
Then the other initializations explained in Section 3.3 are done:
BeeNum = 100000 — I have chosen a colony of 100,000 bees
SourceNum = round(BeeNum/2) — the number of food sources is half the number of bees
OnlookerNum = BeeNum - SourceNum — the other half are the onlooker bees
MaxCycle = 20 — after this number of cycles the loop stops
Limit = 50 — this determines when to abandon a food source
TryToImprove = zeros(SourceNum,1) — the number of unsuccessful attempts at improving each source is stored in this matrix
A source struct is created for each food source, with a field for the position and a field for the fitness, which measures the quality of that food source:
source.pos = []; source.fit = [];
Now I need an array (called food) equal in size to the number of food sources:
food = repmat(source,SourceNum,1) — here we get a matrix of copies of the source variable with SourceNum rows and 1 column
In the next step, within a loop, each source is initialized with a random position and its fitness is then calculated:
for i = 1:SourceNum
    food(i).pos = lb + rand(1,nvar).*(ub - lb);
    food(i).fit = fit_eval(food(i).pos);
end
In the first statement inside the loop, a random position based on the range (ub and lb) and the number of variables is created for the source, and the fit_eval function then measures the quality of that source.
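The report does not list fit_eval itself; a minimal sketch, assuming the benchmark being minimized is the Sphere function (the actual sample function used is not specified):

function f = fit_eval(x)
% Hypothetical objective: the Sphere function, f(x) = sum of x_j^2.
% Lower values correspond to better (fitter) food sources here.
    f = sum(x .^ 2);
end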
After randomly creating the sources, we find the best source like this:
[min_value,index] = min([food.fit]) — here we find the source with the lowest fit value, store that value in min_value, and save the source number in index
GlobalBestSource = food(index) — we can now keep the source with the known index as the best source
The parts described in Sections 3.3.2, 3.3.3, 3.3.4, and 3.3.5 are all repeated inside the main loop, and this is where parallelization comes to our help.
3.3.2 Part 2 - Creation & Exploitation

Each employed bee creates a new food source, then moves to and exploits the better source. The new source is created as follows:

X_ij(t+1) = X_ij(t) + phi_ij * (X_ij(t) - X_kj(t))

where X_ij is the position of the current source where the bee is, X_kj is the position of the neighbouring source, t is the iteration number, and phi_ij is a random number between -1 and 1.
We save the position of each source in X (each employed bee is equivalent to one food source) (2). To create a new food source in its neighbouring space, each bee first selects another source from among the other sources and then randomly selects one of the dimensions of its own source and changes its value (3).
for i = 1:SourceNum                                                   (1)
    X = food(i).pos;                                                  (2)
    otherSource = [1:i-1, i+1:SourceNum];                             (3)
    k = randi([1, length(otherSource)], 1);                           (4)
    ind = otherSource(k);                                             (5)
    neighbour = food(ind).pos;                                        (6)
    j = randi([1, nvar], 1);                                          (7)
    X(j) = X(j) + unifrnd(-1, 1, size(j)).*(X(j) - neighbour(j));     (8)
    X = max(X, lb);                                                   (9)
    X = min(X, ub);                                                   (10)
    X_fit = fit_eval(X);                                              (11)
    if X_fit < food(i).fit                                            (12)
        food(i).fit = X_fit;                                          (13)
        food(i).pos = X;                                              (14)
        TryToImprove(i) = 0;                                          (15)
    else                                                              (16)
        TryToImprove(i) = TryToImprove(i) + 1;                        (17)
    end                                                               (18)
end                                                                   (19)

By generating a random number k between 1 and the number of other sources, we take the k-th value of otherSource as the index of the neighbouring source (ind) (4,5), and define a variable (neighbour) holding the position of that neighbouring source (6). We then randomly select an attribute, i.e. a dimension of the position, and update that dimension (7,8). By doing so we have created a new food source, stored in X. We must check whether the new source is within the defined domain, and clamp it if not (9,10). Now it is time to evaluate the new source: if it is better than the i-th source, we replace it (11,12).
3.3.3 Part 3 - Selection

Each onlooker bee selects a source randomly based on its quality. We first need to calculate each source's probability of selection, using the fitness-proportionate selection (roulette wheel) formula: the more suitable the food source, the higher its chance of being selected. The probability of selecting nectar source i is:

P_i = F(theta_i) / sum_{k=1..S} F(theta_k)

where S is the number of bees, theta_i is the position of the i-th bee, and F(theta_i) is its fitness value.
This is how we calculate the probability:
fit = [food.fit];
maxfit = max(fit);
fit = maxfit - fit;              % invert so that lower cost means higher fitness
selection_probability = fit/sum(fit);
P = cumsum(selection_probability);
After calculating P, we generate a random number and select the very first source whose cumulative value in P exceeds that random number. We then put the position of the selected source in X, create a neighbour for it, and, if the created neighbour is better, replace it again:
for n = 1:OnlookerNum
    i = find(rand <= P, 1, 'first');
    X = food(i).pos;
    otherSource = [1:i-1, i+1:SourceNum];
    ...
    % same neighbour creation and greedy replacement as steps (4)-(18)
    % of the employed phase
    ...
end
3.3.4 Part 4 - Abandonment & Search

If a source does not improve after a certain number of trials, kept in the variable Limit, then that food source should be abandoned. That is why we count this number in the TryToImprove variable, incrementing it each time the source fails to improve.
We find these unimproved sources (Q) and then randomly create new positions for them; by doing this we are effectively performing a new search:
Q = find(TryToImprove > Limit);
for j = 1:length(Q)
    i = Q(j);
    food(i).pos = lb + rand(1,nvar).*(ub - lb);
    food(i).fit = fit_eval(food(i).pos);
    TryToImprove(i) = 0;
end
3.3.5 Part 5 - Memorization

We can find the best source, i.e. the one with the lowest fit value:
[min_value,index] = min([food.fit]);
We keep the minimum value and the number of that source. If the optimized value we found is better than the global best source, we replace the global best with it and treat it as the optimum value:
if min_value < GlobalBestSource.fit
    GlobalBestSource = food(index);
end
After that, the best solution and the mean are recorded for each cycle:
bestFit(cycle) = GlobalBestSource.fit;
meanFit(cycle) = mean([food.fit]);

4 Results

In this section the results for the serial and parallelized versions are compared. The following results were obtained on a machine with an Intel Core i7-3770 quad-core processor (3.40 GHz) and 16 GB of RAM. A sample function is optimized for colonies of 80,000 and 180,000 bees respectively, with a maximum cycle number of 10 and a limit of 25.
Colony Size: 80,000 - Sequential
In cycle 1 Best found Fitness is: 198.7254
In cycle 2 Best found Fitness is: 54.6097
In cycle 3 Best found Fitness is: 18.7218
.
.
In cycle 9 Best found Fitness is: 10.3675
In cycle 10 Best found Fitness is: 2.4581
BEST Solutions are = -1.2595 -0.53635 0.1315 0.29282 -0.69369
BEST Fitness is = 2.4581
Running time: 120.832971 seconds

Colony Size: 180,000 - Sequential


In cycle 1 Best found Fitness is: 105.0669
.
.
In cycle 8 Best found Fitness is: 4.6674
In cycle 9 Best found Fitness is: 2.068
In cycle 10 Best found Fitness is: 1.433
BEST Solutions are = 0.86437 -0.40237 -0.6945 -0.030165 0.20176
BEST Fitness is = 1.433
Running time: 505.501860 seconds
In the following Figure we can see both the best solution and the mean value in each
cycle for the sequential case.

Figure 2: Sequential - diagram

Colony Size: 80,000 - Parallel (shown for the last cycle with 4 workers)
... Connected to 4 workers ..
.
Lab 4:
In cycle 10 Best found Fitness is: 0.011768
Lab 1:
In cycle 10 Best found Fitness is: 0.048066
Lab 3:
In cycle 10 Best found Fitness is: 0.023486
Lab 2:
In cycle 10 Best found Fitness is: 0.035126
BEST Solutions are = 0.033013 0.14476 -0.03946 -0.064349 0.038852
BEST Fitness is = 0.011768
Running time: 62.352175 seconds

Sending a stop signal to all the workers ... stopped.


Colony Size: 180,000 - Parallel (shown for the last cycle with 4 workers)
.
Lab 2:
In cycle 10 Best found Fitness is: 0.046209
Lab 4:
In cycle 10 Best found Fitness is: 0.06528
Lab 1:
In cycle 10 Best found Fitness is: 0.08433
Lab 3:
In cycle 10 Best found Fitness is: 0.009253
BEST Solutions are = -0.054256 -0.027253 -0.0075901 0.073788 -0.050791
BEST Fitness is = 0.009253
Running time: 174.825952 seconds
In the following Figure we can see both the best solution and the mean value in each
cycle for the parallelized case with four workers.

Figure 3: Parallel - diagram

Speedup:

Speedup = Serial execution time (1 worker) / Parallel execution time (4 workers)

80,000 bees: 120.8 s / 62.4 s ≈ 1.9x
180,000 bees: 505.5 s / 174.9 s ≈ 2.9x
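Dividing these speedups by the four workers gives a parallel efficiency of roughly 1.9/4 ≈ 48% for the 80,000-bee colony and 2.9/4 ≈ 73% for the 180,000-bee colony, which already suggests that the larger colony amortizes the parallel overhead better.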

Analysing the results, we see that parallelization improved the running time of the algorithm, which is what we had intended to achieve. However, the quality of the solutions is not improved in all cases, which indicates that the parallelization technique has no significant effect on solution quality.
We can also see that as the colony size increases, the parallelized version works even better: the speedup grew from 1.9x at 80,000 bees to 2.9x at 180,000 bees, since the larger workload amortizes the parallel overhead. It is likewise expected that the parallelized version will deliver much better speed when optimizing complex functions that need heavier and more expensive processing.

5 Conclusion

Through this research I have tried to learn the ABC algorithm thoroughly and to parallelize it in a new way. We have seen that this parallelization of ABC can lead to a speed-up; however, the solutions produced by the parallel ABC are quite similar to those produced by the sequential ABC algorithm.
For future work I intend to extend and test this parallel algorithm on a cluster, using the MATLAB Distributed Computing Server, to find out how the speed-up differs in that case. Given the proper hardware, another intriguing possibility would be to run and test a version of the code on the GPU, which can be done through MATLAB's support for CUDA-enabled NVIDIA GPUs.


References
[1] MATLAB Parallel Computing. http://www.mathworks.com/products/parallelcomputing/features.html. Accessed: 2015-12-05.
[2] Fahad S. Abu-Mouti and Mohamed E. El-Hawary. Overview of artificial bee colony (ABC) algorithm and its applications. 2012 IEEE International Systems Conference (SysCon 2012), March 2012.
[3] Dervis Karaboga. An idea based on honey bee swarm for numerical optimization. Technical report, Erciyes University, Engineering Faculty, Computer Engineering Department, Kayseri/Türkiye, 2005.
[4] Dervis Karaboga, Beyza Görkemli, Celal Ozturk, and Nurhan Karaboga. A comprehensive survey: artificial bee colony (ABC) algorithm and applications. Artificial Intelligence Review, 42(1):21-57, 2014.
[5] Guo-Heng Luo, Sheng-Kai Huang, Yue-Shan Chang, and Shyan-Ming Yuan. A parallel Bees Algorithm implementation on GPU. Journal of Systems Architecture, 60(3):271-279, March 2014.
[6] Ruhai Luo, Tien-Szu Pan, Pei-Wei Tsai, and Jeng-Shyang Pan. Parallelized artificial bee colony with ripple-communication strategy. 2010 Fourth International Conference on Genetic and Evolutionary Computing, December 2010.
[7] David Martens, Bart Baesens, and Tom Fawcett. Editorial survey: swarm intelligence for data mining. Machine Learning, 82(1):1-42, 2011.
[8] Harikrishna Narasimhan. Parallel artificial bee colony (PABC) algorithm. 2009 World Congress on Nature & Biologically Inspired Computing (NaBIC), 2009.
[9] Milos Subotic, Milan Tuba, and Nadezda Stanarevic. Parallelization of the artificial bee colony (ABC) algorithm. In Proceedings of the 11th WSEAS International Conference on Neural Networks, Evolutionary Computing and Fuzzy Systems, pages 191-196, Stevens Point, Wisconsin, USA, 2010. World Scientific and Engineering Academy and Society (WSEAS).
[10] Tatjana Davidovic, Tatjana Jaksic, Dusan Ramljak, Milica Selmic, and Dusan Teodorovic. Parallelization strategies for bee colony optimization based on message passing communication protocol. Optimization, 62(8):1113-1142, August 2013.

