2 authors, including Daniel Mahrenholz (rt-solutions.de GmbH)
All content following this page was uploaded by Daniel Mahrenholz on 04 June 2014.
[Figure: real-time simulation; event scheduling vs. event execution]
[…] the initiating node always expires before the delivery of the CTS frame, and the IEEE 802.11 protocol does not operate correctly. Because of the acknowledgement/retransmission scheme used in IEEE 802.11, such misleading results occur even with RTS/CTS disabled. So delays in the event execution of a real-time emulation may change the chronological order of dependent events and falsify the execution of the simulation model.

2. Simulation Environment

In this section we describe our experimental environment: the network setup and the simulator's interaction with it. Then we present key concepts of ns-2 on which we base our further investigations.

2.1. Network Setup
[Figure: network setup with the wireless network, the TAP transport mechanism and the bridge firewall]
[…]warded through the bridge, i.e. direct transfers among the UMLs. It is now the role of the simulator to connect the virtual machines together. It acts as an emulator of a wireless network among the virtual machines and uses the bridge device as a central point for reading and writing packets from and to the virtual hosts. For this purpose, the network simulator ns-2 uses its emulation facility, which mainly consists of network objects and tap agents. The network objects are used to send and receive packets to and from a network. The tap agents are application-level processes on ns-2 nodes that convert network packets between the simulated and the real network. Each tap agent can be connected to at most one network object. In our case we need a network object to access a network device on the link layer. Because ns-2 (version 2.27) does not provide such a network object in its Linux version, we implemented a new one using raw network sockets.

In our model there is a one-to-one correspondence between ns-2 nodes and virtual machines, so we need a mechanism to map simulator nodes to virtual machines. For this purpose we use the correspondence between the Ethernet addresses of the virtual machines and the network layer addresses of the ns-2 nodes. This mapping is implemented in our new tap agent.

5 http://ebtables.sf.net

In the simulation model we make the following simplification: we ignore the overhead of the layers below the application layer in the virtual hosts and the overhead of the application layer in ns-2. We then obtain a model of an application transmitting data over a wireless network. The overhead consists of the time spent passing through the "ignored" layers and the size of the protocol headers added to the data packets. We measured the round-trip times of ICMP echo request/reply packets over the virtual Ethernet network and over an emulated wireless network. The time spent in the virtual Ethernet was around 1.5% of the round-trip times in the emulated wireless network, so the time overhead of the virtual Ethernet network and of the protocol stack in the virtual machines is relatively small. The header overhead slightly affects the simulation because it adds to the size of a packet, which is used to compute the transmission time of that packet inside the simulation. It can be avoided if we have enough information about the internal structure of the frames in the live network.

2.2. Ns-2 Schedulers

Ns-2 is a single-threaded discrete event simulator. The ns-2 scheduler maintains an internal virtual clock, which the simulator objects use as their time reference. The scheduler also maintains a time-ordered list of events and
processes them one by one. It takes the next earliest event from the list, advances the virtual clock to the firing time of the event, and executes the event to completion. Control then returns to the scheduler, which executes the next event. There are two basic scheduler categories that differ in the method used to advance the virtual clock: non real-time and real-time. In a non real-time scheduler, the virtual clock simply jumps between the firing times of consecutive events. The real-time scheduler, in contrast, tries to execute events at the actual moments in real time. It uses the physical clock of the machine as a real-time reference. If the firing moment of the next earliest event is in the future, the scheduler waits until that moment.

3. Ns-2 Improvements

This section describes the changes we made in ns-2 to improve its real-time behavior and to enable it to run a wireless network emulation. Section 3.1 describes performance improvements that increase the accuracy of the simulator's virtual clock. Section 3.2 presents a time monitoring and correction technique that ensures the correct behavior of network protocols in real-time simulations with ns-2. Finally, Section 3.3 gives an outlook on a distributed emulation setup.

3.1. Performance Improvements

Our performance improvements decrease the impact of system calls on the ns-2 process during simulation. We improve the performance of the simulator by modifying the time measurement and waiting functions and by reducing disk I/O operations during simulation.

1. Time measurement
The real-time scheduler in ns-2 uses the system call gettimeofday() to synchronize event execution with the system clock. It is the most frequently used system call during simulation: around 1800 calls per second in a simple wireless simulation with two nodes. We replaced this system call by a function that works entirely in user space and uses the CPU cycle counter, which is available on many architectures. From this cycle counter and the CPU frequency, a timestamp is simply computed as $\mathit{Timestamp} = \frac{\#\mathit{Cycles}}{\mathit{Frequency}}$. This approach requires platform-specific instructions (RDTSC for Pentium-compatible CPUs) but avoids the context-switch delays caused by the system call. It therefore saves time in the main loop of the scheduler and increases the accuracy of the virtual clock.

2. Precise waiting
Before executing the next earliest event, the scheduler has to wait for $t_w$. If $t_w$ is greater than a given threshold $t_t$ (1 ms by default), the scheduler blocks itself for time $t_w$. Otherwise it performs busy waiting, which has a higher precision than blocking. When it blocks, it wakes up with a delay $d_w$.
The proposed waiting algorithm aims at a higher precision than the blocking method and a lower CPU consumption than the pure busy-waiting approach. Instead of blocking for time $t_w$, it blocks for time $(t_w - d_w^{\max})$ and afterwards busy-waits until the event execution time. $d_w^{\max}$ is chosen such that it is greater than a considerable part of the wake-up delays $d_w$ and as small as possible. A greater $d_w^{\max}$ increases both the precision and the CPU utilization, because it causes a higher percentage of busy waiting.

3. Disk I/O reduction
The ns-2 simulator uses disk writes to store information about occurred events; this is called event tracing. It is implemented as part of the event execution. The ns-2 event tracing system maintains a memory buffer to accumulate trace information before writing it to disk. However, the disk writes are still performed as part of the event execution, thus blocking the simulator process for some time.
We solve this problem by dividing the simulator into two processes. The first is a real-time priority process which only simulates and generates event information. The second is a low-priority process that writes the event information to the trace file. These two processes communicate through a non-blocking ring buffer in a shared memory segment.
In order to reduce disk write operations during simulation, the reader process compresses the data before writing it to the output file. It uses the zlib compression/decompression library⁶ and creates a gzip file in main memory. Since the trace files contain many similar string patterns, the zlib library can compress them considerably (by a factor of up to 10–12 for our trace files). This approach therefore significantly reduces the disk I/O during simulation or avoids it entirely. The compressed storage area can be set as large as needed, but it should not exceed the available physical memory. Otherwise the operating system has to swap memory pages to disk, which is an unwanted I/O operation. If the compressed trace file cannot fit into main memory, I/O operations cannot be avoided. These operations increase the system load and might preempt the simulator process. This would cause longer event execution times, larger event dispatching delays and a lower accuracy. A possible solution to this problem is to stop the simulator process, and all the processes that are using the emulated network, for the time it takes to flush the buffer. The system time could then be turned back to the moment of "freezing". This approach also requires changes in the simulation scheduler to account for this time gap.

6 http://www.gzip.org/zlib

3.2. Time Correction

The performance improvements described above increase the accuracy of the real-time simulation in ns-2. However, the delays in the event execution cannot be completely eliminated. Furthermore, they are unpredictable under a high system load and in complex simulations with a large number of nodes. So the intended behavior of network protocols in ns-2 is still not guaranteed. In the following we describe a technique that ensures the correct execution of the simulation model, including the simulated protocols, even under a high system load.

Consider an event $E_1$ which has to be executed at a real-time instant $t_{E_1}$. The scheduler continuously updates its virtual clock ($Vc$) from the system time and waits for the moment $t_{E_1}$. If $t_{Vc}$ is equal to or greater than $t_{E_1}$, the simulator executes the event with the delay $\Delta t_{E_1} = t_{Vc} - t_{E_1}$. However fast we query the system clock, we always obtain different values. Because of this high resolution, it is unlikely that the waiting loop hits the exact time $t_{E_1}$, so events are always executed with a small delay. But there is another, more critical reason for delays. The simulator can only execute one event at a time. So if the simulator is busy executing another event at $t_{E_1}$, then $E_1$ has to wait until the execution of that previous event has been completed. It will then be executed with a possibly much larger delay than the one caused by the waiting loop. Whatever the reason for a delay, its consequences are the same, as explained below.

Let us assume that another event $E_2$ has to be executed a time period $\Delta p_{1,2}$ after $E_1$, that is, at time instant $t_{E_1} + \Delta p_{1,2}$. The virtual clock of the scheduler is the only time reference for the simulator objects. They assume that all events are executed exactly at the planned moments in time. The event $E_1$ will therefore schedule the event $E_2$ for $t_{E_2} = t_{Vc} + \Delta p_{1,2} = t_{E_1} + \Delta t_{E_1} + \Delta p_{1,2}$. Here the event dispatch delays interfere with the time calculations of the simulator objects. So the event $E_2$ is scheduled for a later time instant in a real-time simulation than in a non real-time simulation. This can change the chronological order of events in a real-time simulation and thus falsify the behavior of network protocols.

We call an event in a real-time simulation correctly scheduled if it is planned for execution at the same time instant as if the simulation were non real-time. Let the event $E_1$ be the start of the simulation, planned for time 0. Then $E_1$ is a correctly scheduled event. However, $E_2$ is not a correctly scheduled event: in a non real-time simulation it would be planned for time $t_{E_1} + \Delta p_{1,2}$, whereas in a real-time simulation it is planned for time $t_{E_1} + \Delta t_{E_1} + \Delta p_{1,2}$. Correctly scheduled events preserve the same chronological order in real-time and non real-time simulations. If all the events in a real-time simulation are correctly scheduled, then it will execute the same sequence of events as a non real-time simulation with the same input.

Our technique enforces the execution of late events at the right moments in virtual simulation time and ensures that all events in a real-time simulation are correctly scheduled. If the scheduler observes that $E_1$ is late, it assigns to the virtual clock the time instant $t_{E_1}$ at which the event should have been executed. It then executes the event $E_1$ and, after its completion, again updates the virtual clock from the system time. So, for the simulator objects, events happen exactly at the moments they were planned for. The event dispatch delays then do not interfere with the time calculations, and all the consecutive events that are triggered by $E_1$ are correctly scheduled. If we assume that $E_1$ is the start of the simulation, then all the events in this simulation are correctly scheduled.

This approach guarantees that a real-time and a non real-time simulation with the same input execute the same sequence of events at the same virtual time instants. Since the simulator objects access only the virtual clock as a time reference, the behavior of the protocols in ns-2 does not differ between the two kinds of simulation. Our approach has another very important effect on the actual execution of events. Relative to the virtual clock, all events are executed correctly, but relative to the system clock this is not the case. As explained before, the simulator can execute only one event at a time. So if the executions of two events overlap, the second one is delayed until the first is finished. Now assume that $E_2$ schedules a third event $E_3$ with a difference of $\Delta p_{2,3}$. Then $E_3$ is scheduled relative to the virtual clock for $t_{E_3} = t_{E_2} + \Delta p_{2,3}$. If the execution of $E_2$ has finished before $t_{E_3}$, then $E_3$ will be executed only with the delay caused by the waiting loop. If $E_2$ is still running at $t_{E_3}$, but the overlap between $E_3$ and $E_2$ is smaller than the one between $E_2$ and $E_1$, then $E_3$ is delayed less than $E_2$. This shows that our approach is also capable of correcting errors that are caused by the sequential execution of concurrent events.

3.3. Outlook on Distributed Emulations

As explained in Section 3, the increased precision also increases the CPU utilization. Additionally, an increased number of virtual machines also increases the CPU utilization on the simulation host. This places strong restrictions on the scalability of the current setup. A solution that is currently under development will create a distributed emulation environment using the presented techniques. In a first step we are distributing the applications running inside the virtual machines to physical machines or to virtual machines on different hosts. This considerably increases the transmission time from the application to the virtual network. To handle […] a generic solution but to place one cluster per simulator instance, to minimize the synchronization overhead required for the cluster gateways.
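The two timing changes of Section 3.1 can be sketched as follows: a user-space timestamp computed as cycles divided by frequency, and a wait that blocks for $t_w - d_w^{\max}$ and then busy-waits. This is a minimal Python illustration under stated assumptions, not the ns-2 code. Python cannot issue RDTSC, so `time.perf_counter_ns()` stands in for the cycle counter. The value of `D_W_MAX` and all function names are hypothetical; only the 1 ms threshold comes from the text.

```python
import time

# Assumed parameters for illustration; only the 1 ms threshold t_t is from the text.
T_T = 1e-3       # threshold t_t: shorter waits are pure busy waiting
D_W_MAX = 2e-4   # d_w^max: hypothetical bound covering most wake-up delays d_w


def cycles_to_seconds(cycles: int, frequency_hz: float) -> float:
    """Timestamp = #Cycles / Frequency."""
    return cycles / frequency_hz


def now() -> float:
    """Stand-in for reading the CPU cycle counter (Python cannot issue RDTSC).

    perf_counter_ns() is treated as a monotonic counter ticking at 1 GHz.
    """
    return cycles_to_seconds(time.perf_counter_ns(), 1e9)


def precise_wait_until(target: float) -> float:
    """Wait until `target` (seconds on the now() scale) and return the lateness."""
    t_w = target - now()
    if t_w > T_T:
        # Coarse phase: block, but wake up d_w^max early to absorb the wake-up delay.
        time.sleep(t_w - D_W_MAX)
    # Fine phase: busy-wait on the cheap user-space clock until the firing time.
    while now() < target:
        pass
    return now() - target
```

Raising `D_W_MAX` moves a larger share of every wait into the spin loop. This trades CPU time for precision, which is the trade-off the text describes.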
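The disk I/O reduction of Section 3.1 splits tracing into a simulating producer and a low-priority writer that compresses records into an in-memory gzip stream. The sketch below illustrates that split in Python under several assumptions. Threads stand in for the paper's two processes and shared-memory segment. The trace line format is made up in the style of ns-2 wireless traces. The overflow policy is hypothetical: the record is counted as dropped rather than blocking the producer, because the text does not say what happens when the ring is full.

```python
import threading
import time
import zlib


class SpscRing:
    """Single-producer/single-consumer ring buffer; put() and get() never block.

    Assumes exactly one producer and one consumer thread, and relies on
    CPython's GIL for memory ordering.
    """

    def __init__(self, capacity: int) -> None:
        self._buf = [None] * (capacity + 1)  # one spare slot tells "full" from "empty"
        self._head = 0  # next slot to read, advanced only by the consumer
        self._tail = 0  # next slot to write, advanced only by the producer

    def put(self, item: bytes) -> bool:
        nxt = (self._tail + 1) % len(self._buf)
        if nxt == self._head:
            return False  # full: report it instead of blocking the simulator
        self._buf[self._tail] = item
        self._tail = nxt
        return True

    def get(self):
        if self._head == self._tail:
            return None  # empty
        item = self._buf[self._head]
        self._buf[self._head] = None
        self._head = (self._head + 1) % len(self._buf)
        return item


def trace_compressor(ring: SpscRing, out: bytearray, done: threading.Event) -> None:
    """Low-priority side: drain the ring and gzip-compress records into memory."""
    comp = zlib.compressobj(9, zlib.DEFLATED, 31)  # wbits=31 selects the gzip format
    while True:
        finished = done.is_set()  # read the flag *before* polling the ring
        rec = ring.get()
        if rec is not None:
            out.extend(comp.compress(rec))
        elif finished:
            break  # producer had stopped before this empty read: nothing is left
        else:
            time.sleep(0.0001)  # yield instead of spinning
    out.extend(comp.flush())


def run_demo(n_events: int, capacity: int):
    """Simulator side: only produces trace lines and never waits for the disk."""
    ring, out, done = SpscRing(capacity), bytearray(), threading.Event()
    writer = threading.Thread(target=trace_compressor, args=(ring, out, done))
    writer.start()
    dropped = 0
    for i in range(n_events):
        line = f"s {i * 0.001:.6f} _0_ AGT --- {i} cbr 512\n".encode()
        if not ring.put(line):
            dropped += 1  # hypothetical overflow policy: count, never block
    done.set()
    writer.join()
    return bytes(out), dropped
```

For the paper's two-process design, `multiprocessing.shared_memory` would be the closer Python analogue; the ring-buffer logic stays the same.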
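The time-correction rule of Section 3.2 can be illustrated with a small deterministic event loop. This is a sketch, not ns-2's scheduler. `FakeClock` is a hypothetical system clock counting integer microseconds so that the run is reproducible, and the waiting loop is idealized to wake up exactly on time. Event E0 keeps the CPU busy for 3000 µs, so E1, due at 1000 µs, starts 2000 µs late. E1 then schedules E2 with $\Delta p_{1,2} = 5000$ µs.

```python
import heapq
import itertools


class FakeClock:
    """Deterministic stand-in for the system clock, in integer microseconds."""

    def __init__(self) -> None:
        self.t = 0

    def now(self) -> int:
        return self.t

    def advance(self, dt: int) -> None:
        self.t += dt


class RealTimeScheduler:
    """Real-time event loop with an optional time-correction step (sketch)."""

    def __init__(self, clock: FakeClock, time_correction: bool) -> None:
        self.clock = clock
        self.correction = time_correction
        self.vc = 0                    # virtual clock t_Vc: the only time objects see
        self._queue = []               # heap of (firing time, tie-breaker, name, handler)
        self._seq = itertools.count()
        self.log = []                  # (event name, virtual time at dispatch)

    def schedule(self, delay: int, name: str, handler) -> None:
        # Simulator objects schedule relative to the virtual clock.
        heapq.heappush(self._queue, (self.vc + delay, next(self._seq), name, handler))

    def run(self) -> None:
        while self._queue:
            t_e, _, name, handler = heapq.heappop(self._queue)
            if self.clock.now() < t_e:
                # Idealized waiting loop: wakes up exactly at t_e.
                self.clock.advance(t_e - self.clock.now())
            # Here now() >= t_e; the event is late by now() - t_e.
            # Time correction: show the object the planned time t_e, not now().
            self.vc = t_e if self.correction else self.clock.now()
            self.log.append((name, self.vc))
            handler(self)
            self.vc = self.clock.now()  # resynchronize after the event completes


def demo(correction: bool) -> dict:
    s = RealTimeScheduler(FakeClock(), correction)
    # E0 occupies the CPU for 3000 us, so E1 (due at 1000 us) starts 2000 us late.
    s.schedule(0, "E0", lambda sch: sch.clock.advance(3000))
    # E1 schedules E2 with delta_p_{1,2} = 5000 us.
    s.schedule(1000, "E1", lambda sch: sch.schedule(5000, "E2", lambda _: None))
    s.run()
    return dict(s.log)
```

Without correction, E2 is planned for $t_{E_1} + \Delta t_{E_1} + \Delta p_{1,2} = 1000 + 2000 + 5000 = 8000$ µs. With correction, it is planned for $t_{E_1} + \Delta p_{1,2} = 6000$ µs, which is the virtual time a non real-time run would use.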
4. Simulation Experiments

[…]ulated wireless network and a real wireless network. All […]

[…] the network stack. With only the time correction feature, the applications running on top of ns-2 work as expected. But the delay distribution clearly shows that nearly 20% of all events have a delay of 100 µs or more, and some events even have delays of more than 1 ms. This exceeds the busy-waiting threshold, so it can only happen if the simulator is blocked by a system call for at least that long.

Figure 3. Scheduler accuracy – delay distribution. [Plot: % events per interval over event dispatch delay, time [microseconds]; panel "Event dispatch delays (original version)"]

We can see that about 50% of all events on the application layer are delayed by more than 100 µs, so the application performance is probably affected in these cases. The second diagram shows the delay distribution after applying the performance improvements. We now have an upper bound of 100 µs for the delay. The most important result is that all application-level events now have very small delays, which significantly reduces the probability of changes in the behavior of applications running on top of ns-2.

                      Real network      Time correction     Time cor. + perf. impr.
Bandwidth [pkt/s]     74.02 (0.02)      58.37 (±0.07)       75.43 (±0.04)
Average RTT [µs]      13459.57 (2.76)   14908.24 (±23.17)   12738.53 (±4.05)
Max RTT spread [µs]   77577             816030              595103
On-time [%]           99.93 (0.05)      98.65 (±0.02)       98.73 (±0.02)