2013 Second IIAI International Conference on Advanced Applied Informatics
A GPU-based Framework for Large-scale Multi-Agent Traffic Simulations

Yoshihito Sano
Graduate School of Informatics
Shizuoka University
Hamamatsu, Japan
gs13017@s.inf.shizuoka.ac.jp
Naoki Fukuta
Graduate School of Informatics
Shizuoka University
Hamamatsu, Japan
fukuta@cs.inf.shizuoka.ac.jp
kind of influences were obtained if the city has lanes
which only allow electric vehicles to run on them[4]. In
the case shown in [4], the simulation was performed using
approximately 3 million agents.
Making a multi-agent simulation large in scale and increasing
the efficiency of its execution is one of the key issues to be
investigated[12]. However, realizing a simulation that handles
hundreds of thousands of individual agents with credible
behaviors in a very large-scale, complex environment (e.g., an
airport, a crowded train station, or a whole megacity) requires
a massive amount of computational power.
Our goal is to improve the scalability of multi-agent
simulations by accelerating the processing of the code with
which agents respond to dynamic environmental changes,
utilizing modern computation infrastructures that have not
been recognized as a resource for this purpose.
In this paper, we present a preliminary framework that
allows its users to easily realize GPGPU-based large-scale
simulations that can cover various agent models. We
show that the framework can help handle the large
processing needs of a multi-agent simulation by efficiently
using rich computational resources such as GPUs.
Abstract: To improve the reproducibility of real
situations, agents should respond to dynamic environmental
changes, and their behavior must be computed efficiently
because such simulations often become very large. In this paper,
we present an approach and a basic architecture for an efficient
and scalable GPGPU-based framework that applies an OpenCL-based
multi-platform agent code conversion engine. We also present
a prototype implementation of the framework that makes it easy
to test and try implemented path planning codes in various settings.
Keywords: multiagent system; traffic simulation; GPU computing
I. INTRODUCTION
Multi-agent simulation has been applied to various fields,
including traffic simulation[1], crowd simulation[11], and
evacuation simulation of an airport[6]. When deploying a
good agent simulation, it is important to be able to handle
details in the simulation. The agent simulation proposed in [6]
for airport evacuation can handle details such as emotions
and human relations. It shows that people may fail to
escape in a disaster scenario that had been considered
safe for evacuation. To analyze how vehicles
and people act in unusual situations, such as disasters or
events, or to investigate cases in which the precise behaviors
of people might greatly affect the overall behavior,
agents need to respond dynamically to changes in their
environment. Agents should therefore be coded to
respond to such environmental changes, and the simulation
must handle these responses while still running
in a reasonable time. As an example scenario in which agents
have to respond dynamically to environmental changes,
replanning by car agents in a road traffic simulation
should run at a fine-grained time interval within the simulation[3].
Realizing a multi-agent simulation on a large scale is
another promising issue. In actual situations, even
a traffic simulation that covers just one city
is influenced by the traffic from and to other
connected cities. Even when simulating the traffic in one city
without considering traffic flows into the city, it is important
to handle millions of vehicles in the simulation to reproduce
the actual phenomena happening there. For example, a
multi-agent simulation was proposed which considers what
978-0-7695-5071-8/13 $26.00 2013 IEEE
DOI 10.1109/IIAI-AAI.2013.75
II. RELATED WORK

There are several works based on the idea of changing
the degree of detail in the behavior of agents, reflecting what
the user needs to examine, in order to handle a large-scale
multi-agent simulation (e.g., [2]). When implementing such a
simulation, the approaches can be divided into two groups.
Considering the tradeoff between the complexity and
granularity of a simulation, some simplifications are often
necessary. For example, the behavior of each car agent
can be simplified to cover the phenomena of daily traffic so
that a whole city can be handled. A method for extending the
scale of a multi-agent simulation is also presented in [11].
The method proposed in [11] limits the search space in which
an agent can move to a set of linked nodes instead of considering
the whole three-dimensional space. In [11], the authors showed
that this can dramatically reduce the cost of calculating collisions
among agents. These methods can help realize a large-scale
CPU. Because agents may perform replanning in order
to respond to dynamic environmental changes, such
graph algorithms would be used very frequently. However,
even when the shortest path search in a graph can be
calculated on a GPGPU, an agent simulation that aims at
reproducing more realistic traffic behaviors may need more
intelligent and complex search algorithms than a shortest
path search. In an agent simulation, the
algorithms to be applied may vary in their aims and their
corresponding environments. In this paper, our framework
enables simulation code developers to develop and run
complicated planning and other processing using GPGPUs.
multi-agent simulation by effectively simplifying the details
in an agent simulation.
Moreover, a number of approaches have been proposed to
realize efficient processing of a large-scale traffic simulation
on massively parallel computers. In [13], the agent server IBM
Zonal Agent-based Simulation Environment (ZASE) was developed
as a base for performing agent simulations; it can efficiently
run thread-level parallel programs.
ZASE combines two or more agent servers accelerated by
thread-level parallel execution, and decomposes the
agent simulation into multiple processes that can be run on
massively parallel computers to realize a large-scale agent
simulation. However, the approach normally applies
to SMP-based scalar processor computer clusters.
To improve the scalability of a simulation, the use of cloud
computing infrastructures and frameworks (e.g., Hadoop)
could also be effective. However, since a huge amount of
communication is necessary to synchronize data among
distributed processes, it is crucial to keep network
latency low and provide enough bandwidth to maintain
scalability. In addition, developers who want to perform
a simulation cannot always prepare a massive number of
computers with such low-latency networks. In this paper,
we initially focus on how a large-scale simulation
can be done on a single computer or a small number of
computers, each of which has many computation cores.
GPUs (Graphics Processing Units) have been widely used to
realize high-performance computing, especially for
graphics-related operations. To effectively utilize the rich
computation resources provided by GPUs for non-graphics
operations, GPGPU (General-Purpose GPU) programming models
have been proposed. When an agent simulation
is run on a GPGPU-based computing infrastructure,
the code for the agents' internal processing has to be developed
with special coding techniques and detailed parameter
tuning for each specific runtime environment. In this paper,
we propose a framework which supports the coding,
verification, and parameter tuning processes effectively, as
well as converting the code to make it easy to run on
existing simulation systems. Although our work initially
targets road traffic simulation, it could be extended to
a generic simulation on a given network represented as
a graph. To reproduce in a simulation the phenomena caused
by the traffic of cars, humans, etc., the most common unit
is the movement of agents in a graph. In typical
traffic simulations, an origin and a destination are given for
each agent, and then the simulation engine reproduces each
agent's moves toward the destination. To reproduce such
moves in a simulation, some kind of graph search algorithm
should be used to find each agent's itinerary and
path to the destination. There are some approaches which
calculate the shortest paths of a large-scale graph using
GPGPU[9,8,7], and it has been reported that GPGPU can
perform well in computing the shortest paths compared with
III. PROPOSED FRAMEWORK

There are several GPGPU-based computing platforms
(e.g., CUDA, ATI Stream, and OpenCL). In this paper,
we initially focus on the use of OpenCL because it is
easy to learn and supports a wider range of hardware platforms.
The person who wants to develop an agent simulation is not
necessarily a specialist in GPGPU-based programming. In this
paper, we therefore create a framework with which developers
can easily build and analyze agent simulations of a certain
scale powered by GPGPUs. It also allows developers to easily
analyze and tune execution speed for a given
hardware by providing a kind of instant testing environment.
When agents should respond to dynamic environmental
changes in the simulation, they should perform replanning,
e.g., a behavior which re-determines the route to
the destination in a car traffic simulation. Our framework
can help developers process such
replanning efficiently using GPGPU acceleration.
A GPU is good at performing SIMD computation, which
applies a single instruction to multiple data, as well as at
running such SIMD computing threads in parallel. In GPU
processing, the core program of such parallel processing is
called a kernel program. Therefore, the code which performs
the same instructions on multiple data in the replanning
process should also be described as a kernel program. In
addition, we should consider the case in which the path
planning algorithm applied to each agent differs in order
to express each agent's behavior. Therefore, rather than
presenting a search and planning algorithm that itself runs
faster on GPUs, we focus on improving the speed of the whole
simulation, including the planning for every agent, by
processing agents in parallel on the GPU.
In the OpenCL programming model, there are two basic
types of parallel processing: data parallel processing and task
parallel processing. Data parallel processing applies a single
instruction on each processor to multiple data. Computation
with the data parallel approach is often faster because
similar computations are applied to multiple data at once. In
our case, the data parallel approach can be applied to the path
planning processing for every agent.
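The data parallel decomposition described above can be illustrated with a minimal sketch. It is written in plain Python rather than OpenCL and is not the framework's actual kernel code: the names plan_kernel and run_data_parallel, the array layout, and the use of a breadth-first search on an undirected network are all assumptions made for illustration. Each call of plan_kernel plays the role of one work-item that handles exactly one agent, selected by its index gid; the host loop stands in for what a GPU would execute concurrently.

```python
from collections import deque

def plan_kernel(gid, adjacency, origins, destinations, next_hop):
    """Body of one work-item: plan for the single agent with index gid."""
    src, dst = origins[gid], destinations[gid]
    # Breadth-first search backwards from the destination, so that
    # parent[n] is the node that follows n on a fewest-links route.
    # This assumes an undirected network (symmetric adjacency lists).
    parent = {dst: dst}
    queue = deque([dst])
    while queue:
        node = queue.popleft()
        for nb in adjacency[node]:
            if nb not in parent:
                parent[nb] = node
                queue.append(nb)
    next_hop[gid] = parent.get(src, -1)  # -1 marks "no route"

def run_data_parallel(adjacency, origins, destinations):
    """Apply the same kernel to every agent; only the data differ."""
    next_hop = [-1] * len(origins)
    for gid in range(len(origins)):  # on a GPU these iterations run concurrently
        plan_kernel(gid, adjacency, origins, destinations, next_hop)
    return next_hop

# Hypothetical example: a line-shaped network 0 - 1 - 2 - 3 and three agents.
line_network = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
hops = run_data_parallel(line_network, origins=[0, 3, 2], destinations=[3, 0, 2])
```

Because each work-item writes only to its own slot of next_hop and only reads the shared network, no synchronization between agents is needed, which is the property that makes the per-agent planning suitable for data parallel execution.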
temporarily. With this function, the scale of a road network
can be shrunk by cutting some links of the road network
while keeping its consistency, and it can be expanded
by copying the network two or more times and connecting
the copies, while preventing situations in which agents
cannot move between specific nodes because no links
connect them.
In this way, the system effectively supports tests at the
scale of actual simulations.
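A minimal sketch of such scaling operations is given below, under stated assumptions. The paper does not specify how consistency is checked or how copies are joined, so this sketch assumes that "consistency" means the undirected network stays connected, and that consecutive copies are joined by one bridging link. The function names is_connected, expand, and shrink are hypothetical.

```python
def is_connected(nodes, links):
    """True if every node is reachable from the first one (links are undirected)."""
    if not nodes:
        return True
    adj = {n: set() for n in nodes}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for nb in adj[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) == len(nodes)

def expand(nodes, links, copies):
    """Tile `copies` replicas of the network and bridge replica k-1 to replica k."""
    size = len(nodes)
    index = {n: i for i, n in enumerate(nodes)}
    new_nodes = list(range(size * copies))
    new_links = []
    for k in range(copies):
        off = k * size
        new_links += [(index[a] + off, index[b] + off) for a, b in links]
        if k > 0:
            # Join the last node of the previous replica to the first node of this one.
            new_links.append((off - 1, off))
    return new_nodes, new_links

def shrink(nodes, links, candidates):
    """Cut each candidate link only if the network stays connected afterwards."""
    kept = list(links)
    for link in candidates:
        trial = [l for l in kept if l != link]
        if is_connected(nodes, trial):
            kept = trial
    return kept

# Hypothetical example: a triangle network.
tri_nodes, tri_links = [0, 1, 2], [(0, 1), (1, 2), (0, 2)]
big_nodes, big_links = expand(tri_nodes, tri_links, copies=2)
small_links = shrink(tri_nodes, tri_links, candidates=[(0, 2), (1, 2)])
```

In the example, the second cut is refused because removing it would isolate node 2, which corresponds to the situation the text says must be prevented.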
Our goal is to help realize a large-scale simulation which
simulates various phenomena in disasters and in which
dynamic events occur on the road traffic. With this
use-case scenario in mind, we prepared a function
for applying dynamic changes to road networks while
the simulation is running on the runtime platform. Since the
time at which a specific road should disappear depends on
the specific event or disaster to be reproduced in a simulation,
we provided a function that allows users to set
when such road connections should disappear and by which
algorithm (e.g., probabilistic disconnection). By doing
so, it is possible to easily test various conditions in which the
road network changes dynamically.
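As an illustration of a scheduled probabilistic disconnection, the following sketch removes links at a configured simulation step. The schedule format, the function name apply_disconnections, and the use of an independent probability per link are assumptions for this example; the paper does not describe the actual configuration interface.

```python
import random

def apply_disconnections(links, step, schedule, rng):
    """Return the links that survive the events scheduled for this step.

    schedule maps a simulation step to (candidate_links, probability);
    each candidate link is removed independently with that probability.
    """
    if step not in schedule:
        return list(links)
    candidates, prob = schedule[step]
    doomed = {l for l in candidates if rng.random() < prob}
    return [l for l in links if l not in doomed]

# Hypothetical scenario: at step 5 the link b-c is certain to disappear.
road_links = [("a", "b"), ("b", "c"), ("c", "d")]
event_schedule = {5: ([("b", "c")], 1.0)}
rng = random.Random(0)  # seeded so that a test run is repeatable
before = apply_disconnections(road_links, 4, event_schedule, rng)
after = apply_disconnections(road_links, 5, event_schedule, rng)
```

Agents whose planned route used a removed link would then need to replan on the remaining links, which is the workload the framework aims to accelerate.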
The developer using our framework can use the OpenCL
programming model within a C program and describe
various kinds of computations for the agents, e.g., path
planning. The data of a road network, the number of agents,
etc. can be received as arguments of specific kernel
functions, and the developer can write various programs
that access those data. When the coding of kernel programs
is completed, the developer registers them with the
runtime platform by simply specifying their function names.
By doing so, developers can perform a test simulation with
the newly developed path planning, etc. In addition, we
designed our framework to allow developers to convert
their code into source code which can run on
another simulation platform such as MATSim[1]. To make
the code runnable on other simulation platforms, it
must ultimately be written in different languages (e.g., Java
or Python), which may not be able to use the OpenCL
programming model directly. Therefore, our framework uses
a kind of universal description of the computation (e.g.,
path planning to the target), written in a programming
language that does not use OpenCL-based notations directly.
To adapt the code to other simulation
platforms, we prepared a source code converter
which uses external libraries to allow the code to use
OpenCL from Java or other programming languages.
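The idea of registering a routine by name and then running a test simulation with it can be sketched as follows. This is only an analogy in Python: the registry KERNELS, the functions register and run_test_simulation, and the toy planner are hypothetical and do not reproduce the runtime platform's real interface, which the paper does not list.

```python
KERNELS = {}

def register(name, func):
    """Make a planning routine selectable by its name."""
    if name in KERNELS:
        raise ValueError(f"kernel {name!r} is already registered")
    KERNELS[name] = func

def run_test_simulation(name, network, agents):
    """Apply the named routine to every (origin, destination) pair."""
    planner = KERNELS[name]
    return [planner(network, o, d) for o, d in agents]

def direct_link_planner(network, origin, destination):
    """Toy routine: succeeds only if a single link joins the two nodes."""
    return [origin, destination] if destination in network[origin] else []

register("direct_link", direct_link_planner)

# Hypothetical test run on a three-node network.
toy_network = {0: [1], 1: [0, 2], 2: [1]}
routes = run_test_simulation("direct_link", toy_network, [(0, 1), (0, 2)])
```

Keeping the routine behind a name means the same test harness can be rerun unchanged when a different planning routine is registered.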
We have two methods to import a road network into
the runtime platform. First, the developer can prepare road
networks manually. Using the map data creation function of
our runtime platform, the developer arranges nodes on the
screen of the system and creates map data by connecting
nodes with links. The edited map data can be exported to a
file, and the stored map data can be imported into the system
again. Developers can reuse the data in another simulation
or use it as a base for making a new map. Second, the developer
can import map data that is used in MATSim.
The map data used in MATSim is described in XML. Our
runtime platform can import a road network that has been
stored in a MATSim-compatible XML file and adjust the
relevant parameters if necessary. The runtime
platform can also be used to extend road networks by arranging
new nodes and connecting them with links.
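To make the second import method concrete, the sketch below reads nodes and links from a small MATSim-style network description using Python's standard XML parser. It assumes the commonly used layout with node elements carrying id, x, and y attributes and link elements carrying id, from, to, and length attributes; the sample data and the function name load_network are made up for illustration, and real MATSim files carry further attributes (such as capacity and free speed) that are ignored here.

```python
import xml.etree.ElementTree as ET

# A made-up two-node network in a MATSim-style layout.
SAMPLE_XML = """
<network name="example">
  <nodes>
    <node id="1" x="0.0" y="0.0"/>
    <node id="2" x="1000.0" y="0.0"/>
  </nodes>
  <links>
    <link id="1" from="1" to="2" length="1000.0"/>
    <link id="2" from="2" to="1" length="1000.0"/>
  </links>
</network>
"""

def load_network(xml_text):
    """Return ({node_id: (x, y)}, [(from_id, to_id, length), ...])."""
    root = ET.fromstring(xml_text.strip())
    nodes = {n.get("id"): (float(n.get("x")), float(n.get("y")))
             for n in root.iter("node")}
    links = [(l.get("from"), l.get("to"), float(l.get("length")))
             for l in root.iter("link")]
    return nodes, links

net_nodes, net_links = load_network(SAMPLE_XML)
```

The resulting node table and link list are in a form that could be flattened into arrays and passed to a kernel as arguments.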
Here, we consider ways to improve
the scalability of a multi-agent simulation that
is performed on a large-scale road network.
In such a case, the processing load on each core in a GPU
becomes huge, depending on the path planning algorithm
used, and the GPU might not be able to process it efficiently.
In some cases, results might be corrupted because
of a lack of resources (e.g., a specific kind of memory).
In this paper, in order to examine the scalability of the agent
code to be run (e.g., a path planning algorithm) from
the viewpoint of the size of the map which can be directly
processed on the GPU, we prepared a function which
dynamically expands or reduces the size of the road network
IV. IMPLEMENTATION
We implemented a runtime platform based on the framework
we proposed in the previous section. Figure 2 shows
an overview of the runtime platform that we implemented.
It can perform a simple road traffic simulation and perform
an agent's internal processing (e.g., dynamic planning) using
OpenCL-based code. We implemented four major path
planning algorithms, Dijkstra, A*, RTA*[5], and LRTA*[10],
as a sample code set for further investigation by the users of
our framework.
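For readers unfamiliar with the first two of these algorithms, the following is a minimal CPU-side sketch in Python, not the OpenCL sample code of the framework. It implements A* with a priority queue; with the default zero heuristic it behaves as Dijkstra's algorithm. The graph representation, the function name a_star, and the example graph are assumptions for illustration, and the heuristic is assumed never to overestimate the remaining cost.

```python
import heapq

def a_star(graph, start, goal, heuristic=lambda node: 0.0):
    """graph: {node: [(neighbor, cost), ...]}.

    Returns (cost, path), or (inf, []) if the goal is unreachable.
    With the default zero heuristic this is Dijkstra's algorithm.
    """
    frontier = [(heuristic(start), 0.0, start, [start])]
    best = {start: 0.0}
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was found later
        for nb, w in graph[node]:
            new_cost = cost + w
            if new_cost < best.get(nb, float("inf")):
                best[nb] = new_cost
                heapq.heappush(
                    frontier, (new_cost + heuristic(nb), new_cost, nb, path + [nb]))
    return float("inf"), []

# Hypothetical four-node directed road graph with link costs.
roads = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 5)],
    "C": [("D", 1)],
    "D": [],
}
best_cost, best_path = a_star(roads, "A", "D")
```

RTA* and LRTA* differ in that the agent commits to a move after a bounded local search and updates heuristic estimates as it travels, so they are not covered by this sketch.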
We conducted a preliminary evaluation of the parallel
processing performance of the path planning algorithms
to validate the potential scalability of our framework. In
each planning problem, each agent receives a random origin
and destination. We then measured the processing
time for the whole process in which all agents find their
routes under various conditions of scale, parameters, and
parallelization methods that can be specified in OpenCL.
Here, to keep the evaluation simple, we used a map
which consists of 22 links among 12 nodes. We used a
MacBook Pro (OS: OS X 10.8.2, CPU: 2.4 GHz Intel Core 2
Duo, compiler: gcc 4.2.1 build 5658, GPU: NVIDIA GeForce
320M, memory: 8 GB 1067 MHz DDR3) as the experiment
execution environment.
Figure 3 shows the total processing time when executed on
the GPU (using OpenCL) for two planning algorithms:
Dijkstra and A*. The horizontal axis shows the number of
agents and the vertical axis shows the processing time.
The graph shows four conditions: GPU sequential path
planning, GPU sequential all process, GPU parallel path
planning, and GPU parallel all process.
Figure 1. The outline of proposed framework
Figure 2. The overview of execution environment
to issue OpenCL-based processing for GPU parallel path
planning. As the results in Fig. 3 show, with serial processing
the processing time increased in proportion to the number of
agents. With parallel processing, however, we observed that
when the processing was done for fewer than 256 traveling
agents, the processing times were mostly the same in each
experimental setting. From
this result, we confirmed that the planning could be
performed in parallel by multiple GPU cores. In addition,
we also confirmed that RTA* and LRTA*
GPU sequential path planning shows the results when
path planning for all agents is restricted to a
single GPU core, and GPU sequential all process shows
the results which include the whole time for processing the
tasks required to issue the OpenCL-based processing
for GPU sequential path planning. GPU parallel path
planning shows the results when parallel processing
of all agents' path planning is allowed on two or more GPU cores.
GPU parallel all process shows the results which include
the whole time for processing all tasks required in order
Figure 3. The test about parallel processing
adjustment to the optimal parameter set for each GPU and
simulation setting. In addition, we are also working on
implementing functionality which can carry out
such auto-optimization tests and adjustments simultaneously
on two or more execution settings that have different
kinds of GPU(s).
performed similarly in terms of scalability.

Next, we examined how the running performance
differs when different GPUs are used. In this comparison,
we used NVIDIA GeForce 320M, NVIDIA GeForce
8800GT, and AMD Radeon HD 6750M as the runtime
hardware.
Figure 4 shows the results of comparing
those GPUs on Dijkstra and A*. The horizontal
axis shows the number of agents and the vertical axis
shows the processing time. In the comparison between GeForce
320M and GeForce 8800GT, both of which share the same
architecture, GeForce 8800GT consistently processed the
tasks in a shorter time. This means that our runtime platform
can be used to confirm the difference in throughput
between GPUs whose architectures are similar or the same
but whose performance differs. In addition, Figure 4
shows that when the number of agents is small,
Radeon HD 6750M completed the
computation in a shorter time than the GPUs based on NVIDIA's
architecture. However, when the number of agents
increased, its processing time became longer. The
runtime platform can thus help reproduce phenomena such as
the following: when scalability in the number
of agents is evaluated, NVIDIA-based GPUs process the tasks in
a more stable time than AMD-based GPUs, and the overheads
of performing parallel processing differ between
GPU architectures. From this result, it might be said that
AMD-based GPUs may require suitable
tuning for better scalability. In these experiments, none of the
parameters, such as the memory arrangement in a program and the
maximum degree of thread parallelism, were tuned
for specific problems. Instead, we just used the standard
(default) parameters for those architectures. Currently, we
are working on developing techniques (as well as their
implementations in our framework) to realize automated
V. CONCLUSION
In this paper, we presented a framework to help investigate
how multi-agent simulations can be scaled up,
as an approach to analyzing what could
happen in large-scale simulations covering the movements
of people, vehicles, and moving objects when disasters or
events occur. The proposed framework could be
useful for investigating scalability issues in such simulations
running on GPU computing resources. We presented a
preliminary case study on a multi-agent traffic simulation
with dynamic changes in road conditions. For instance, agents
can perform replanning within a short simulation period in
consideration of changes in the road network caused by disasters,
traffic congestion, and other reasons. We initially prepared
an OpenCL-based implementation on our runtime platform
that covers four major planning algorithms. When the agents'
actions are coded using OpenCL, our framework could reduce
the effort of analyzing the characteristics of each GPU type,
parameters, etc.
Future work includes evaluating the effectiveness of our
proposed framework in an actual scalability improvement
scenario on a specific simulation problem. In addition,
we plan to make it easy to utilize two or more GPUs in our
framework. There could be several approaches to utilizing two
or more GPUs. For example, using two or more GPUs
in one computer is one possible scenario; running on
two or more computers, each of which has a single GPU, is
another, as are combinations of the two. Because it is not
easy to prepare all
Figure 4. Comparison by multiple GPU(s)
possible settings as actual execution environments, it could
be helpful to collect some key characteristics from machines
with some typical configurations and then have the framework
predict the possible performance of a specific (or optimal)
configuration of equipment.
Currently, the runtime platform can measure execution
performance for each agent program. However, it must
be run on each test environment. Extending the framework
so that it can be deployed in a networked environment and
run the necessary measurements on different computers is
also future work. Those extensions will help predict
the possible performance range on a specific set of computers
for each specific application.
REFERENCES
[1] M. Balmer, K. Meister, M. Rieser, K. Nagel, and K. Axhausen.
Agent-based simulation of travel demand: Structure and
computational performance of MATSim-T. In Proc. the 2nd TRB
Conference on Innovations in Travel Modeling, 2008.
[2] L. Navarro, F. Flacher and V. Corruble. Dynamic level of
detail for large scale agent-based urban simulations. In Proc. of
10th Int. Conf. on Autonomous Agents and Multiagent Systems
(AAMAS2011), pp. 701-708, 2011.
[3] E. de la Hoz, I. Marsa-Maestre, M. A. Lopez-Carmona, and
P. Perez. Extending matsim to allow the simulation of route
coordination mechanisms. In Proc. The 1st International
Workshop on Multi-Agent Smart Computing (MASmart 2011),
pages 1-15, 2011.
[4] R. Kanamori, T. Morikawa, and T. Ito. Evaluation of special
lanes as incentive policies for promoting electric vehicles. In
Proc. The 1st International Workshop on Multi-Agent Smart
Computing (MASmart 2011), pages 45-56, 2011.
[5] R. E. Korf. Real-time heuristic search. Artificial Intelligence,
42(2-3):189-211, 1990.
[6] J. Tsai, N. Fridman, E. Bowring, M. Brown, S. Epstein,
G. Kaminka, S. Marsella, A. Ogden, I. Rika, A. Sheel, M. E.
Taylor, X. Wang, A. Zilka, and M. Tambe. Escapes - evacuation
simulation with children, authorities, parents, emotions, and
social comparison. In Proc. of 10th Int. Conf. on Autonomous
Agents and Multiagent Systems (AAMAS2011), pages 457-464,
2011.
[7] P. Harish, V. Vineet, and P. J. Narayanan. Large Graph
Algorithms for Massively Multithreaded Architectures. Technical
Report IIIT/TR/2009/74, 2009.
[8] V. Vineet, P. Harish, S. Patidar, and P. J. Narayanan. Fast
minimum spanning tree for large graphs on the gpu. In Proc.
of the Conference on High Performance Graphics 2009, HPG
'09, pages 167-171, New York, NY, USA, 2009. ACM.
[9] P. Harish and P. J. Narayanan. Accelerating large graph
algorithms on the gpu using cuda. In Proc. of the 14th international
conference on High performance computing, HiPC'07, pages
197-208, Berlin, Heidelberg, 2007. Springer-Verlag.
[10] T. Ishida and M. Shimbo. Path Learning by Realtime Search.
Japanese Society for Artificial Intelligence, 11(3):411-419,
1996. (in Japanese)
[11] T. Yamashita, T. Okada and I. Noda. Implementation of
Simulation Environment for Control of Huge-scale Pedestrian
Flow. In Joint Agent Workshop and Symposium (JAWS), 2012.
(in Japanese)
[12] Y. Nakajima, S. Yamane and H. Hattori. Multi-model based
Simulation Platform for Urban Traffic Simulation. In Joint
Agent Workshop and Symposium (JAWS), 2010. (in Japanese)
[13] S. Kato, G. Yamamoto, H. Tai and H. Mizuta. Large-scale
Traffic Simulation with Scholar SMP Supercomputing System.
In Joint Agent Workshop and Symposium (JAWS), 2008. (in
Japanese)