Proceedings of the Ninth IEEE International Symposium on Object and Component-Oriented Real-Time Distributed Computing
0-7695-2561-X/06 $20.00 © 2006 IEEE
[Figure 1. Symbian multitasking model. Diagram labels: Symbian application (or server); Active Objects within a thread, scheduled by the Active Scheduler (non-preemptive scheduling, event-driven synchronization mechanisms); kernel space, where threads are scheduled by the Thread Scheduler (time-sharing, preemptive, priority-based scheduler).]

September 2005. Since phone failures are not frequent events, the data collected so far do not have the statistical significance needed to draw conclusions. Nevertheless, by looking into portions of the log files produced by the logger, in section 5 we show the logger's ability to detect failure occurrences and to relate them to the state of the device. The logged data also make it possible to pinpoint dependability bottlenecks and the root causes of failures, to rebuild failure dynamics, and to measure their temporal characteristics.

2 Background and Related Research

2.1 Symbian OS fundamentals

Symbian [7] is a light-weight operating system designed for mobile phones and supported by several leading mobile phone manufacturers. It is based on a hard real-time, multithreaded kernel that is designed according to the microkernel approach. Specifically, the microkernel provides simple, supervisor-mode threads, along with their scheduling and synchronization operations. Moreover, the kernel offers basic abstractions, i.e., address spaces and message-passing interprocess communication. All system services are provided by server applications. Clients access servers using the kernel's message-passing mechanisms. Examples of servers are the File Server, for file management, the Window Server, for user interface drawing, and the Message Server, for Short Message Service (SMS) management.

The Symbian OS defines two levels of multitasking: threads and Active Objects (AOs). Threads are scheduled by the OS thread scheduler, which is a time-sharing, preemptive, priority-based scheduler. Moving up a level, multiple AOs run within a thread (see figure 1). They are scheduled by a non-preemptive, event-driven scheduler, called the active scheduler. In other terms, AOs multitask cooperatively using an event-driven model: when an AO requests a service, it leaves the execution to another AO. When the requested service completes, it generates an event that is detected by the active scheduler, which in turn inserts the requesting AO in the queue of the AOs to be activated. Non-preemption was chosen to meet light-weight constraints, since it avoids synchronization primitives such as mutexes or semaphores. Moreover, AOs belonging to the same thread all run in the same address space, so that a switch between AOs incurs a lower overhead than a thread context switch. Because AOs are non-preemptive, they are not suitable for real-time tasks. On Symbian OS, real-time tasks should rather be implemented using threads directly. The whole design constitutes a good compromise between real-time and light-weight design requirements.

A crucial aspect of interest for our activity is represented by panics. In the Symbian OS world, a panic represents a non-recoverable error condition notified to the kernel by either user or system applications. The panic information associated with the event is a record composed of its category and type. Once this event has been notified, the application is killed by the kernel. As for panics notified by system servers, the kernel might decide to reboot the phone to recover from them, based on the panic's severity.

2.2 Related Research

The field failure data analysis of operating systems is a well-established research area. Examples are the analyses of Windows NT [8, 16], Windows 2000 [14], and Linux [6, 13]. Other studies characterized failures of networked systems, networks of workstations [15] and, more recently, large-scale heterogeneous server environments [11]. Less effort has been devoted to the field of mobile distributed systems. An architecture for gathering and analyzing failure data for Bluetooth distributed systems has been proposed in [5], whereas in [10], data collection and processing for a cellular telecommunication system have been addressed. All these works exploit failure information stored in system event logs, automated reports, or failure reports provided by specialized maintenance staff. In the case of smart phones, logging facilities are limited and not fully exploited yet. In particular, the Symbian OS offers a dedicated server (the flogger) that allows an application to log its information. Yet, to access the logged data of a generic X system/application module it is necessary to create a particular directory, with
a well-defined name (e.g. Xdir). The problem is that the names of such directories are not made publicly available to developers, and are used by manufacturers during development. Recently, a tool called D_EXC has been proposed to register all panic events generated on a phone. However, the tool does not relate panic events to failure manifestations, running applications, or the phone's activity at the time of the failure.

3 Smart Phones' Failure Model

Although this classification is defined at a high level of abstraction, it represents an important initial step towards the definition of the logger. It is indeed necessary to know the nature of the failures in order to detect them and to relate their occurrences to other information available on the phone. More insight into failure dynamics and causes can be gained once a significant amount of failure data is available. As shown later, in section 5, the logger represents a valid instrument to collect such failure data.
[Figure: logger architecture diagram. Recoverable labels: files (beats, activity, runapp, power, Log File); Kernel; Db Log; Appl. Arch.; System Agent; System Servers.]

answers the questions posed in the previous section. Question 1 is answered by querying the Log Engine for the phone's activity. The panic notifications captured by the Panic Detector make it possible to identify the user/system modules responsible for a self-reboot or freeze, thus answering question 2. Finally, question 3 is answered by means of the Running Application Detector.
is the same in both cases. However, they can

[Figure: plot annotated with three uncertainty zones (high, medium, low).]

phone, or reboot duration. It is reasonable to state:
[Figure 4. Distributed Data Collection Architecture. Diagram labels: smartphone under observation (logger, midlet); Gateway Workstation (GW) (GW software); Database Server (database).]

rent, f_h starts from 0.05 Hz). The figure shows that the most critical application is the video call. This could be expected, as the video call uses a wide range of the phone's resources. From figure 3 one could conclude that the best choice to meet precision and battery consumption requirements would be f_h = 0.1 Hz or even lower. On the other hand, low frequencies increase the uncertainty. For example, f_h = 0.1 Hz introduces an uncertainty of 10 seconds on the measured T_SR, T_MR, and T_SH, which makes it hard to distinguish one from the other. For this reason, in figure 3 we draw three qualitative uncertainty zones: high uncertainty (f_h ≤ 0.1 Hz), medium uncertainty (0.1 Hz < f_h ≤ 0.33 Hz), and low uncertainty (f_h > 0.33 Hz). For the logger deployed so far on actual phones, we chose f_h = 0.33 Hz, in the medium uncertainty zone, since it represents an acceptable trade-off between uncertainty (3 s), precision (in the worst case, the average Δw is 28 ms, hence the precision is 0.99), and battery consumption (the average I is 6.6 mA). For example, since the battery capacity of the Nokia 6630 is 900 mAh, the stand-by time with the logger running on the phone would be 900 mAh / 6.6 mA ≈ 136 hours, that is, almost 6 days. This is acceptable if we consider that the manufacturer declares a stand-by time of 6 to 11 days for the Nokia 6630.

4.5 Distributed Data Collection Architecture

The logger has been deployed on actual phones used by students, faculty and staff of our University. In order to allow them to easily transfer their Log Files to us, without requiring them to spend money (e.g., by avoiding the use of SMS services or data connections), we developed a data collection architecture. The architecture is structured according to a 3-tier model (see figure 4).

The first tier is the phone. In particular, we developed a Java midlet for the phone using the Java 2 Micro Edition technology. The logger asks the user to send the Log File when it reaches a certain size (20 KB in the current implementation). When the user is ready, the midlet can be used to send the Log File to tier 2, the Gateway Workstation (GW), via a Bluetooth connection. However, if the user's phone or GW does not provide Bluetooth connection facilities, he or she can skip the midlet and transfer the file via the serial cable usually used to synchronize the phone with a computer.

The GW (tier 2) is a user's computer connected to the Internet. It runs our software to receive the Log File via Bluetooth, and to send it to our Database node (tier 3) over the Internet. To this aim, the user must authenticate himself/herself to the Database node. Again, if Bluetooth connections are not available, the GW allows the user to select the Log File to send from his/her computer's file system.

Finally, tier 3 stores the received files in a centralized database, after checking the Log File format. The data collected in the database can then be used to perform the field failure data analysis.

5 Preliminary Results: Logger Capabilities

The logger has been implemented for several Symbian OS phones with different user interfaces and APIs. It can be downloaded from the project's web site, http://www.mobilab.unina.it/symdep.htm, along with the Java midlet and the GW software for data collection. The logger has a low memory footprint (16.1 KB), and the files it produces occupy at most 30 KB of the phone's internal memory.

At the time of writing, the logger had been running on 16 phones since September 2005. However, the data collected so far (i.e., about 100 failure points) are not enough to achieve the statistical significance needed to perform dependability measurements. More time and more phones are needed to achieve this goal. Therefore, in this section we prefer to focus on some significant portions of the collected Log Files in order to show the logger's capabilities.

The first Log File portion shown in figure 5 comes from a Nokia 6680 device. Entries in the log are not in chronological order. This is because the Panic Detector starts to log from either a panic indication or a FREEZE or PWOFF event, and then gathers and writes to the Log File all the related information from the other files (i.e., activity and runapps), which could have been registered before the panic (or FREEZE or PWOFF) event.
Portion 1
19/10/2005 12:12:17
Category: KERN-EXEC
Panic Type: 3
19/10/2005 12:12:09 SysAp ScreenSaver Telefono Menu Messaggi Autolock SymDep
19/10/2005 12:12:15 Short message
19/10/2005 12:15:52 FREEZE Days 0 Hours 0 Mins 3 Secs 15
19/10/2005 12:12:09 SysAp ScreenSaver Telefono Menu Messaggi Autolock SymDep
19/10/2005 12:12:22 SysAp ScreenSaver SymDep
Portion 2
01/10/2002 15:36:51
Category: irSec
Panic Type: 666
01/10/2002 15:36:49 Total irRemote SysAp ScreenSaver Telefono Menu Autolock SymDep
01/10/2005 15:41:03
Category: irSec
Panic Type: 666
Category: KERN-EXEC
Panic Type: 3
01/10/2002 15:40:52 SysAp ScreenSaver Telefono Menu Orologio Autolock SymDep
01/10/2005 15:42:24 PWOFF Days 0 Hours 0 Mins 1 Secs 16
01/10/2002 15:40:52 SysAp ScreenSaver Telefono Menu Orologio Autolock SymDep
01/10/2005 15:41:04 SysAp ScreenSaver SymDep
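Because the entries of a Log File are not written in time order, it can help to regroup and sort them before reconstructing a failure. The following is a minimal sketch in Python, not part of the logger itself. It assumes that every entry starts with a dd/mm/yyyy hh:mm:ss timestamp, as in the portions above, and that lines without a timestamp (such as Category and Panic Type) belong to the preceding timestamped line. The names parse_timestamp, group_entries and chronological are hypothetical helpers introduced only for this illustration.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%d/%m/%Y %H:%M:%S"
TIMESTAMP_LENGTH = 19  # length of a stamp such as "19/10/2005 12:12:17"

# Portion 1 of the Log File shown above, one list item per line.
PORTION_1 = [
    "19/10/2005 12:12:17",
    "Category: KERN-EXEC",
    "Panic Type: 3",
    "19/10/2005 12:12:09 SysAp ScreenSaver Telefono Menu Messaggi Autolock SymDep",
    "19/10/2005 12:12:15 Short message",
    "19/10/2005 12:15:52 FREEZE Days 0 Hours 0 Mins 3 Secs 15",
    "19/10/2005 12:12:09 SysAp ScreenSaver Telefono Menu Messaggi Autolock SymDep",
    "19/10/2005 12:12:22 SysAp ScreenSaver SymDep",
]


def parse_timestamp(line):
    """Return the timestamp a line starts with, or None if it has none."""
    try:
        return datetime.strptime(line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)
    except ValueError:
        return None


def group_entries(lines):
    """Group raw lines into (timestamp, [lines]) entries.

    Assumption: a line without a leading timestamp continues the
    previous timestamped entry.
    """
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        stamp = parse_timestamp(line)
        if stamp is not None:
            entries.append((stamp, [line]))
        elif entries:
            entries[-1][1].append(line)
    return entries


def chronological(lines):
    """Return the grouped entries sorted by timestamp (stable sort)."""
    return sorted(group_entries(lines), key=lambda entry: entry[0])


if __name__ == "__main__":
    for stamp, entry_lines in chronological(PORTION_1):
        print(stamp.strftime("%H:%M:%S"), "|", " / ".join(entry_lines))
```

The sort is stable, so entries that share a timestamp (for instance the two identical application lists at 12:12:09) keep their original relative order.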
The first portion reported in the figure is an example of a freeze. The log allows us to pinpoint the Messages service (Messaggi in Italian) as responsible for the panic and, thus, for the failure. At 12:12:17 on October 19, 2005, the logger captured a type 3, Kern-Exec panic. Browsing the list of panic types and categories on the Symbian OS web site, we can obtain more details about the panic: “This panic is raised when an unhandled exception occurs. Exceptions have many causes, but the most common are access violations caused, for example, by dereferencing NULL. Among other possible causes are: general protection faults, executing an invalid instruction, alignment checks, etc.”. A few seconds before the panic, the Running Application Detector registered seven active applications (SymDep is our logger). One of those applications signaled the panic, which in turn caused the phone to freeze. A FREEZE entry is logged by the Panic Detector when the phone finishes the reboot, at 12:15:52; the reboot took several minutes, specifically 3 minutes and 15 seconds (T_SR = 195 s), probably because the user had to pull out the battery. By subtracting T_SR from 12:15:52, we infer that the last ALIVE item was written at 12:12:37, i.e., 20 seconds after the panic. This could be the time needed by the user to realize that the phone was frozen and to decide to pull out the battery. Moreover, the Log Engine registered that 2 seconds before the panic, a short message was sent or received. Another interesting piece of information is given by the last line, which shows that 5 seconds after the panic only 3 applications were active. We can thus argue that one of the absent applications provoked the panic. Aided by the information provided by the Log Engine, we can conclude that the responsible application was the Messages service.

We can rebuild the failure dynamics as follows: at 12:12:15 a short message is sent or received. This causes the Messages service to fail and to signal a type 3, Kern-Exec panic. The OS recovery mechanism killed the failing service and caused a chain of events that brought the phone to freeze.

The second portion in figure 5 is a further example of the type of information that can be captured. It shows a self-reboot caused by a third-party application, called irRemote, used to turn the phone into a universal infra-red remote control. The data come from a Nokia 6600 phone. At 15:36:51, a type 666, irSec panic is detected. This panic is specific to the irRemote application, which is thus killed by the kernel. After a few minutes, at 15:41:03, another irSec panic is signaled. This indicates that, after the first failure, the irRemote application was launched again by the user. The second time, however, the failure is more severe, and also causes a type 3, Kern-Exec panic, perhaps signaled by a service used by irRemote. Five seconds after the panic, at 15:41:08, the phone reboots itself (the reboot took 1 minute and 16 seconds), as can be observed by looking at the PWOFF entry. In other terms, this time the OS recovery mechanisms killed irRemote and successfully performed a self-reboot in response to the unrecoverable Kern-Exec panic.
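The timing argument used for the first portion (subtracting the logged duration from the FREEZE timestamp to obtain the time of the last ALIVE item) can be checked mechanically. The sketch below is only an illustration in Python under stated assumptions: the entry layout is taken from figure 5, the duration field of a FREEZE or PWOFF entry is read as the time elapsed since the last heartbeat, and the function names parse_event and gap_after_panic are hypothetical.

```python
import re
from datetime import datetime, timedelta

TIMESTAMP_FORMAT = "%d/%m/%Y %H:%M:%S"

# Layout of a FREEZE/PWOFF entry as it appears in figure 5.
EVENT_PATTERN = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) (FREEZE|PWOFF) "
    r"Days (\d+) Hours (\d+) Mins (\d+) Secs (\d+)$"
)


def parse_event(line):
    """Split a FREEZE/PWOFF entry into (logged_at, kind, duration)."""
    match = EVENT_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"not a FREEZE/PWOFF entry: {line!r}")
    stamp, kind, days, hours, mins, secs = match.groups()
    duration = timedelta(days=int(days), hours=int(hours),
                         minutes=int(mins), seconds=int(secs))
    return datetime.strptime(stamp, TIMESTAMP_FORMAT), kind, duration


def gap_after_panic(panic_stamp, event_line):
    """Return (time of last heartbeat, seconds elapsed since the panic)."""
    panic_at = datetime.strptime(panic_stamp.strip(), TIMESTAMP_FORMAT)
    logged_at, _kind, duration = parse_event(event_line)
    last_alive = logged_at - duration
    return last_alive, (last_alive - panic_at).total_seconds()


# Values from Portion 1: panic at 12:12:17, FREEZE logged at 12:15:52.
last_alive, gap = gap_after_panic(
    "19/10/2005 12:12:17",
    "19/10/2005 12:15:52 FREEZE Days 0 Hours 0 Mins 3 Secs 15",
)
print(last_alive.time(), gap)  # 12:12:37 20.0
```

The logged duration of 3 minutes and 15 seconds corresponds to 195 s, which places the last heartbeat at 12:12:37, 20 seconds after the panic. The same function accepts PWOFF entries such as the one in the second portion.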
6 Conclusions and Future Work

This paper presented the design of a logger application to automatically gather failure-related information from Symbian-OS-based smart phones. Even if tailored for the Symbian OS, the described technique can also be adopted on other platforms. Although the volume of data collected so far is not enough to achieve statistical significance, the examples discussed in section 5 allow us to draw the following conclusions about the logger's capabilities:

• The logger enables the definition of a detailed failure model for mobile phones, in that it makes it possible to pinpoint the causes, i.e., panics, that lead to failure manifestations, and to rebuild failure dynamics.

• Since all failure manifestations come with a timestamp, the logger makes it possible to quantify the dependability of current smart phones, in terms of Mean Time Between Failures (MTBF) and Mean Time To Recover (MTTR). In more detail, parameters such as the mean time between freezes or reboots, and the propagation time between causes (panics) and failures could also be measured.

• The logger makes it possible to pinpoint the applications or servers responsible for failures. In other terms, it helps to identify dependability bottlenecks.

Future work will be devoted to the deployment of the logger over more terminals and to the analysis of the collected failure data, in order to fully exploit the logger's capabilities and define a detailed failure model for smart phones.

Acknowledgments

This work has been partially supported by the fund for mobility of researchers, sponsored by the University of Naples Federico II - Ufficio Programmi Internazionali, and by the Italian Ministry for Education, University, and Research (MIUR) in the framework of the FIRB Project “Middleware for advanced services over large-scale, wired-wireless distributed systems (WEB-MINDS)”.

References

[1] V. Astarita and M. Florian. The use of Mobile Phones in Traffic Management and Control. Proc. of the 2001 IEEE Intelligent Transportation Systems Conference, August 2001.
[2] A. Avizienis, J. Laprie, B. Randell, and C. Landwehr. Basic Concepts and Taxonomy of Dependable and Secure Computing. IEEE Transactions on Dependable and Secure Computing, 1(1):11–33, 2004.
[3] A. A. Aziz and R. Besar. Application of Mobile Phone in Medical Image Transmission. Proc. of the 4th National Conference on Telecommunication Technology, January 2003.
[4] A. Bondavalli and L. Simoncini. Failures Classification with Respect to Detection. Proc. of the 2nd IEEE Workshop on Future Trends in Distributed Computing Systems, 1990.
[5] M. Cinque, F. Cornevilli, D. Cotroneo, and S. Russo. An Automated Distributed Infrastructure for Collecting Bluetooth Field Failure Data. To appear in Proc. of the 8th IEEE International Symposium on Object-oriented Real-time distributed Computing (ISORC’05), May 2005.
[6] W. Gu, Z. Kalbarczyk, R. K. Iyer, and Z. Yang. Characterization of Linux Kernel Behavior under Errors. Proc. of the 2003 International Conference on Dependable Systems and Networks (DSN’03), June 2003.
[7] R. Harrison. Symbian OS C++ for Mobile Phones Volume 2. Symbian Press, 2004.
[8] R. K. Iyer, Z. Kalbarczyk, and M. Kalyanakrishnam. Measurement-Based Analysis of Networked System Availability. Performance Evaluation Origins and Directions, Ed. G. Haring, Ch. Lindemann, M. Reiser, Lecture Notes in Computer Science 1769, Springer Verlag, 2000.
[9] T. Kubik and M. Sugisaka. Use of a Cellular Phone in mobile robot voice control. Proc. of the 40th SICE Annual Conference, July 2001.
[10] S. M. Matz, L. G. Votta, and M. Malkawi. Analysis of Failure Recovery Rates in a Wireless Telecommunication System. Proc. of the 2002 International Conference on Dependable Systems and Networks (DSN’02), June 2002.
[11] R. K. Sahoo, A. Sivasubramaniam, M. S. Squillante, and Y. Zhang. Failure Data Analysis of a Large-Scale Heterogeneous Server Environment. Proc. of the 2004 International Conference on Dependable Systems and Networks (DSN’04), June 2004.
[12] A. Sekman, A. B. Koku, and S. Z. Sabatto. Human Robot Interaction via Cellular Phones. Proc. of the 2003 IEEE Int. Conf. on Systems, Man and Cybernetics, October 2003.
[13] C. Simache and M. Kaâniche. Measurement-Based Availability Analysis of Unix Systems in a Distributed Environment. Proc. of the 12th International Symposium on Software Reliability Engineering (ISSRE’01), November 2001.
[14] C. Simache, M. Kaâniche, and A. Saidane. Event Log based Dependability Analysis of Windows NT and 2K Systems. Proc. of the 2002 Pacific Rim International Symposium on Dependable Computing (PRDC’02), December 2002.
[15] A. Thakur and R. K. Iyer. Analyze-NOW - An Environment for Collection and Analysis of Failures in a Network of Workstations. IEEE Transactions on Reliability, 45(4):560–570, 1996.
[16] J. Xu, Z. Kalbarczyk, and R. K. Iyer. Networked Windows NT System Field Data Analysis. Proc. of the 1999 Pacific Rim International Symposium on Dependable Computing (PRDC’99), December 1999.