
Proceedings of 3rd Australian Computer, Network

& Information Forensics Conference

Published by
School of Computer and Information Science
Edith Cowan University
Perth, Western Australia

Edited by
Dr. Craig Valli and Dr. Andrew Woodward
School of Computer and Information Science
Edith Cowan University
Perth, Western Australia
Copyright 2005, All Rights Reserved
ISBN 0-7298-0612-X

Foreword
Dear Delegate
The 3rd Australian Computer, Network & Information Forensics Conference has seen an
increase in the type, range and quality of papers submitted for consideration for
publication, and we accepted 17 papers from those submitted. All papers were double-blind
peer-reviewed before acceptance into the conference for publication.
There are several definite strands of research and interest within the subject of computer-based
forensics; these include disk sanitisation, honeypots and discovery techniques. The
papers' authors are drawn from a cross-section of the forensics community, from
practitioners to academics.
Conferences such as these are simply not possible without willing volunteers who follow
through with the commitment they have initially made, and I would like to take this
opportunity to thank the conference committee for their tireless efforts in this regard.
These efforts have included, but not been limited to, the reviewing and editing of the
conference papers and helping with the planning, organisation and execution of the conference.
My thanks also go to our sponsors, who have contributed both the financial and moral support
that conferences like this need through their contributions and donations to the conference.
Finally, thanks go to the administrative staff of the School of Computer and Information Science for
their contributions to the running of the conference.

Yours sincerely

Dr. Craig Valli


Conference Chair

Conference Committee
Assoc. Prof. Wojciech Kuczborski - Executive Chair
Dr. Craig Valli - Conference Chair and Co-Editor
Dr. Andrew Woodward - Co-Editor
Prof. Bill Hutchinson - Committee Member
Suen Yek - Committee Member
Chris Bolan - Committee Member
Catherine Bell - Committee Member

Conference Reviewers
Dr. Craig Valli - Edith Cowan University
Dr. Andrew Woodward - Edith Cowan University
Prof. Bill Hutchinson - Edith Cowan University
Suen Yek - Edith Cowan University
Chris Bolan - Edith Cowan University
Patricia Williams - Edith Cowan University
Chris Hu - Edith Cowan University
Justin Brown - Edith Cowan University
Prof. Matthew Warren - Deakin University
Dr. Jill Slay - UniSA
Tom Waghorn - WA Police
Andy Jones - British Telecom Labs
Christian Frichot - RPG

Conference Sponsors
Major Sponsors
Secure Systems
For the Best Paper Award
Department of Premier and Cabinet
For the Best Conference Presentation
Edith Cowan University
Minor Sponsors
Visual Analysis
Thomson Prometric

INDEX
RADIO FREQUENCY IDENTIFICATION A REVIEW OF LOW COST TAG SECURITY PROPOSALS .......... 1
802.11 DCF DENIAL OF SERVICE VULNERABILITIES .......... 8
MANAGING THE FALSE ALARMS: A FRAMEWORK FOR ASSURANCE AND VERIFICATION OF SURVEILLANCE MONITORING .......... 15
SECURE DELETION AND THE EFFECTIVENESS OF EVIDENCE ELIMINATION SOFTWARE .......... 24
TURNING A LINKSYS WRT54G INTO MORE THAN JUST A WIRELESS ROUTER .......... 45
AFTER CONVERSATION - AN FORENSIC ICQ LOGFILE EXTRACTION TOOL .......... 54
GOOGLING FORENSICS .......... 62
HONEYPOT TECHNOLOGIES AND THEIR APPLICABILITY AS AN INTERNAL COUNTERMEASURE .......... 68
A UK AND AUSTRALIAN STUDY OF HARD DISK DISPOSAL .......... 74
AN INVESTIGATION INTO THE EFFICIENCY OF FORENSIC ERASURE TOOLS FOR HARD DISK MECHANISMS .......... 79
AN INVESTIGATION INTO LONG RANGE DETECTION OF PASSIVE UHF RFID TAGS .......... 84
INFORMATION GATHERING USING GOOGLE .......... 87
THE EFFECTIVENESS OF COMMERCIAL ERASURE PROGRAMS ON BITTORRENT ACTIVITY .......... 108
BLACKHAT FINGERPRINTING OF THE WIRED AND WIRELESS HONEYNET .......... 115
HOW TO BUILD A FARADAY CAGE ON THE CHEAP FOR WIRELESS SECURITY TESTING .......... 126

Radio Frequency Identification A Review of Low Cost Tag Security Proposals


Christopher Bolan
School of Computer and Information Science
Edith Cowan University
c.bolan@ecu.edu.au

Abstract
With the increased awareness of the potential of RFID technology across a range of applications, and its
emergence as a new source of information, come new points of attack. Due to cost issues most proposals focus on
securing the RFID reader rather than the tag. This paper investigates some of the low-cost proposals for
securing RFID tags against potential misuse through authentication and encryption.

Keywords
RFID, Tags, Hash Locking, Encryption

INTRODUCTION
Despite common misconceptions, Radio Frequency Identification (RFID) systems have been around since
the 1940s, with the British using RFID-like systems to distinguish between friendly and enemy planes
(RFID, 2005). Early evidence of the technology appears in Stockman (1948), which called for further
research into the remaining basic problems of reflected-power communication. While RFID research was
still in its infancy, the advent of bar-coding systems in the 1970s had a profound effect on a large number of
different industries, and bar-coding has become the most recognised and used auto-identification system today (Sarma,
Weis & Engels, 2002).
This dominance of bar-coding systems may soon be at an end with the realisation that through the use of
RFID technology a range of new possibilities exist. RFID systems are currently used for such purposes as:
Animal identification systems (Neary, 2002): RFID sensors are embedded in animals to allow each animal
to be uniquely identified;

Product tracking (Sarma et al., 2002): RFID sensors are attached to individual items to allow
real-time product monitoring;

Long range access control of vehicle systems (OnStar, 2005);

Prisoner tracking systems in gaols (Best, 2004): prisoners in the Ross correctional facility in Ohio
will be required to wear wristwatch-sized RFID tags to allow movement tracking.

In addition to the implanting of animals with RFID chips, several plans for the use of RFID technology in
human subjects have been trialled (Kanellos, 2004). The implanting of RFID chips has several uses in areas
such as preventing identity fraud, people-based building access systems, and the storage of medical data.
Currently the Baja Beach Resort in Spain uses an implanted RFID chip as a way for its clientele to
purchase resort services without having to carry cash or other forms of verification (Jones, 2004; Baja
Beach Resort, 2005).
For the present, RFID systems remain too expensive to completely penetrate all possible markets, with
typical transponders costing around US$0.50 to US$1.00 (Sarma et al., 2002). However, with mass
production coupled with an open standard, supporters aim to bring the price down to around US$0.05 to
US$0.10, which would see RFID integration into almost every facet of life. Even if RFID technology is only
used to replace bar-coding systems, there are still over five million barcodes scanned daily (Weis, Sarma,
Rivest & Engels, 2003).


RFID BASICS
Radio frequency identification (RFID) technology stems back to Faraday's discovery that light and radio
waves are both forms of electromagnetic energy. As previously stated, the first concrete step
towards the modern conception of RFID was made by Harry Stockman in his 1948 paper "Communication
by Means of Reflected Power" (Stockman, 1948), although it was not until 1973 that the first direct patent on
passive RFID tags was lodged in America by ComServ (Cardullo, 2005). A timeline of major RFID-related
events is shown in figure 1 (created from data in Engels, 2004).

Figure 1. Timeline of RFID related history


RFID tags now come in various shapes and sizes, including stick-on labels, tie-on tags, 3mm pellets, and
button disks, although internally they all consist of a microcontroller and an attached antenna embedded in a
protective material. Every RFID system consists of three major components (Sarma et al., 2002, p.3):

the RFID tag, or transponder, which is located on the object to be identified and is the data
carrier in the RFID system,

the RFID reader, or transceiver, which may be able to both read data from and write data to a
transponder, and

the data processing subsystem which utilizes the data obtained from the transceiver in some
useful manner.

The RFID transceiver emits a radio frequency carrier signal; when the transceiver is placed within range of
a transponder, the antenna of the transponder detects the electromagnetic field. The transponder's
microcircuit then activates, causing the antenna to fluctuate in a coded sequence in such a way as to transmit its
encoded data. This transmission is then read by the transceiver and utilised by a data processing subsystem.
RFID Antenna
RFID systems employ two types of antenna: dipole, for the reception of electric fields, and loop, for the
reception of magnetic fields. Loop antennas consist of one or more loops of conductive material. This type
of antenna uses a magnetic field known as an inductive or near field, which loses its strength after a short
distance and is thus suited to RFID applications that allow the RFID transponders to be placed very near the
transceiver (figure 2).

Figure 2. Loop Antenna RFID Tag System (Adapted from Wild, 2005)
The near field signal that an RFID transponder emits is reduced in strength in proportion to the distance
cubed, as with all near fields (Pope & Loukine, 2004, p.3). This gives a range of approximately one sixteenth
of the carrier wavelength, which would be around 1.38 meters for a 13.56 MHz tag (Cole, 2004; Wild,
2005). Dipole antennas (figure 3) consist of two conductive strips around one quarter of the wavelength of
the transmitted signal in length (Wild, 2005, p.8). They produce an electric far field in addition to a
magnetic near field, which affords a greater broadcast distance as electric fields dissipate at a lesser rate than
magnetic fields. The actual dissipation of an electric far field is a quartering of the signal strength as the
distance from the antenna is doubled - 6dB per doubling of distance (Macleish, 2002).
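As a quick sanity check on these figures, the short sketch below computes the quoted near-field range (one sixteenth of the carrier wavelength) and the far-field loss per doubling of distance; the helper names are illustrative and not taken from the cited sources.

```python
# Near-field read range taken as lambda/16; far-field strength assumed to fall
# 6 dB for every doubling of distance, as quoted above.
import math

C = 3.0e8                                   # speed of light, m/s

def near_field_range_m(freq_hz):
    return (C / freq_hz) / 16.0             # one sixteenth of the wavelength

def far_field_loss_db(d1_m, d2_m):
    # 6 dB per doubling of distance is the same as 20*log10(d2/d1).
    return 20.0 * math.log10(d2_m / d1_m)

print(round(near_field_range_m(13.56e6), 2))   # ~1.38 m for a 13.56 MHz tag
print(round(far_field_loss_db(1.0, 2.0), 1))   # 6.0 dB when the distance doubles
```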

Figure 3. Dipole Antenna RFID Tag System (Wild, 2005)

Active versus Passive Tags


RFID tags or transponders may be either passive or active. Passive tags have no on-tag power source and are thus
only able to use the electromagnetic energy transmitted by the transceiver to power the microcontroller. Due
to their reliance on transmitted power, passive RFID tags have only small transmission ranges, from a
few centimetres to around fifteen meters for UHF tags. The price of passive tags is also comparatively
low, varying from around $0.50 (US) to $5.00 (US) per tag.
Active tags have an additional power cell used to provide power to the RFID microcontroller. The inclusion
of a power source provides active tags with several advantages over their passive counterparts such as the
ability to receive lower power signals or to output stronger signals than would otherwise be possible. The
higher signal strength means that active tags are able to transmit over greater distances, up to around 100
meters. The added benefits that active tags bring, however, come with a shelf life. Modern tag battery life
varies from one to ten years according to usage and data transfer settings. Also, while only slightly larger in
size than their passive equivalents, active tags are considerably more expensive, ranging from twenty to
three hundred dollars per tag (Wild, 2005).

SECURING RFID TAGS


An RFID tag is typically a silicon-based microchip. Functionality beyond simple identification upon request
may be achieved, including integrated sensors, read/write storage, encryption and access control (Weis et al.,
2003). The downside to such operations is that they increase the production cost of the RFID tag away from the
ideal market penetration cost of $0.05 (US) to $0.10 (US) quoted earlier; thus RFID security efforts are often focused
on reader security, ignoring the obvious avenue of attack. Irrespective of the security focus, no single
security or encryption standard for tags or readers has been adopted, and thus many systems remain insecure.

Sarma, Weis & Engels (2003) cite RFID-tagged underwear and medicine as an example
where shoppers carrying RFID-tagged purchases could be unwittingly revealing their preference in undergarments,
or their illnesses, to an attacker. Other more plausible attacks, such as corporate spying by scanning a rival's
inventory to gain a picture of stock levels and thus make inferences about sales, are also suggested (ibid). If
RFID systems do not include a method of verification between tags and readers, they will accept all
communication as valid. Several solutions to such problems have been suggested and, while no
one proposal has been adopted, there have been four major solutions that allow for low-cost tags.
Hash-Locking Tags
Weis et al. (2003) proffer a security solution that minimises tag cost in the form of a one-way hashing
algorithm where each tag has a portion of memory reserved for a meta-ID. To lock a tag, the system writes
the hash of a random key to the meta-ID, which is then stored by the authorised user along with the key in a
secure database. The tag then enters a locked state and will not respond until it is unlocked by the
transmission of the correct key, which it hashes and compares against the meta-ID stored in its memory. Once
unlocked, the tag responds normally to any reader within broadcast range. The functionality of this approach is
demonstrated in figure 4.

Figure 4. Hash-Locking of RFID tags


Such a method would increase the difficulty of unauthorised reading of the contents of locked tags; however,
once a tag is unlocked it remains as vulnerable as any other non-hash-protected tag until it is relocked. Weis et
al. (2003) do warn that such systems may be vulnerable to spoof attacks where tags are queried to gather
meta-IDs which are later presented to a legitimate reader, which replies with the unlock key that will
allow access to a secured tag. While currently nothing would prevent such a spoof attack on the system,
the occurrence of the spoof may be detected by the spoofing attacker's repeated failure to respond with an
acceptable tag identifier.
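A minimal sketch of this lock/unlock flow is given below, assuming a SHA-256 hash; the class and variable names (HashLockTag, meta_id and so on) are illustrative and not part of the original proposal, and a real tag would only perform the hash comparison on-chip.

```python
# Hash-locking sketch: meta-ID = hash(key) is written to the tag; the tag only
# answers normally once it receives a key whose hash matches the stored meta-ID.
import hashlib, os

def hash_(value: bytes) -> bytes:
    return hashlib.sha256(value).digest()

class HashLockTag:
    def __init__(self, tag_id: str):
        self.tag_id = tag_id
        self.meta_id = None                  # locked whenever a meta-ID is present

    def lock(self, meta_id: bytes):
        self.meta_id = meta_id

    def query(self, key: bytes = None):
        if self.meta_id is None:
            return self.tag_id               # unlocked: respond normally
        if key is not None and hash_(key) == self.meta_id:
            self.meta_id = None              # correct key: unlock the tag
            return self.tag_id
        return self.meta_id                  # locked: reveal only the meta-ID

key = os.urandom(16)                         # key kept in the reader's secure database
tag = HashLockTag("EPC-0001")
tag.lock(hash_(key))
print(tag.query())                           # locked: only the meta-ID is returned
print(tag.query(key))                        # the correct key unlocks and returns the ID
```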
Minimalist Encryption Approach
Juels (2003) proposes a method dubbed "minimalist cryptography" which uses a small amount of re-writable
tag memory and the very limited tag computational power to implement the encryption. The method is built
upon the assumption that an adversary may only interact with a given tag on a limited basis before that tag
is able in turn to interact in a protected manner with a valid verifier (Juels, 2003, p.8). This assumption
requires that every tag protected by the system is within range of an RFID reader unit that is out of range
of an attacker, and as such severely limits the practicality of the approach.
The approach relies upon RFID tags storing a short list of random identifiers (referred to as pseudonyms).
For each query the tag receives, it will transmit the next pseudonym in the list, returning to the
beginning of the list when the last pseudonym has been transmitted. For each unique pseudonym (αi) there exists a
query key (βi) and an authentication key (γi). The valid verifier contacts a tag through the broadcast of
a query key (βi) unique to the relevant pseudonym (αi); after tag-based authentication has occurred, the tag
responds with (γi). As these values rotate and differ between tags, the probability is reduced that an
eavesdropping attacker will discover the entire range of (αi), (βi) and (γi) for an individual tag.
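The sketch below illustrates this rotation idea under simplifying assumptions: a single tag, in-memory lists named alphas, betas and gammas, and no key refresh. It is a toy rendering of the idea rather than Juels's exact protocol messages.

```python
# Pseudonym rotation sketch: the tag announces the next pseudonym alpha_i, only
# answers a verifier that presents the matching query key beta_i, and replies
# with the authentication key gamma_i.
import os, itertools

K = 4                                        # pseudonyms stored per tag
alphas = [os.urandom(4) for _ in range(K)]
betas  = [os.urandom(4) for _ in range(K)]
gammas = [os.urandom(4) for _ in range(K)]

class PseudonymTag:
    def __init__(self):
        self._cycle = itertools.cycle(range(K))
        self._current = None

    def announce(self):
        self._current = next(self._cycle)    # next pseudonym, wrapping around
        return alphas[self._current]

    def respond(self, beta):
        # Only a verifier that knows the matching query key gets an answer.
        if beta == betas[self._current]:
            return gammas[self._current]
        return None

tag = PseudonymTag()
verifier_db = {a: (b, c) for a, b, c in zip(alphas, betas, gammas)}
alpha = tag.announce()
beta, expected_gamma = verifier_db[alpha]
assert tag.respond(beta) == expected_gamma
```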

Additionally, Juels (2003) suggests several ways of increasing the size of the pseudonym set on the tags,
through time-dependent pseudonyms and other such methods. However, each of these methods increases the
processing overhead required of an individual tag and thus makes tags more expensive. While not
unbreakable, such a security system would prevent, or at least discourage, attacks from casual parties who
would be the RFID equivalent of script kiddies.
Universal Re-encryption Approach
Saito, Ryou & Sakurai (2004) propose a system of re-encryption based upon the work of Golle, Jakobsson,
Juels & Syverson (2004) which does not require knowledge of a public key for re-encryption. In this system
readers query RFID tags, which reply with identification information as cipher text (C) encrypted
through universal re-encryption (ibid). The cipher text (C) is then used by the reader along with a
private key (x) to query a database which stores data on the individual tag. The tag is then sent a newly
generated cipher text, produced via the database, for its next use.
Mathematically, the protocol of the approach may be shown as (Saito, Ryou & Sakurai, 2004, p.882):

Secret key: x

Public key: y = g^x

Cipher text C = [(α0, β0); (α1, β1)], generated using a message m, a public key y and a random
number r = (k0, k1), so that:

α0 = m · y^k0
β0 = g^k0
α1 = y^k1
β1 = g^k1
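To make the algebra concrete, here is a small self-contained sketch of ElGamal-based universal re-encryption over a toy group; the modulus, generator and identifier value are invented illustration parameters, not values from Saito, Ryou & Sakurai (2004).

```python
# Universal re-encryption sketch in the style of Golle et al. (2004):
# C = [(a0, b0); (a1, b1)], where the second pair encrypts 1 and lets anyone
# refresh the randomness without knowing the public key.
import random

p = 2039        # small prime modulus (toy value only)
g = 7           # assumed generator

def inv(a):
    return pow(a, p - 2, p)                 # modular inverse (p is prime)

def keygen():
    x = random.randrange(2, p - 1)          # secret key x
    return x, pow(g, x, p)                  # public key y = g^x mod p

def encrypt(m, y):
    k0 = random.randrange(2, p - 1)
    k1 = random.randrange(2, p - 1)
    return ((m * pow(y, k0, p)) % p, pow(g, k0, p),   # (alpha0, beta0)
            pow(y, k1, p), pow(g, k1, p))             # (alpha1, beta1)

def reencrypt(C):
    # Re-randomise using the unit pair; no public key is required.
    a0, b0, a1, b1 = C
    r0 = random.randrange(2, p - 1)
    r1 = random.randrange(2, p - 1)
    return ((a0 * pow(a1, r0, p)) % p, (b0 * pow(b1, r0, p)) % p,
            pow(a1, r1, p), pow(b1, r1, p))

def decrypt(C, x):
    a0, b0, a1, b1 = C
    if (a1 * inv(pow(b1, x, p))) % p != 1:  # alpha1 / beta1^x must equal 1
        raise ValueError("not a valid ciphertext under this key")
    return (a0 * inv(pow(b0, x, p))) % p

x, y = keygen()
tag_id = 42
C = encrypt(tag_id, y)
C2 = reencrypt(C)            # e.g. the fresh cipher text written back to the tag
assert decrypt(C2, x) == tag_id
```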

This approach forms a PGP-style encryption for RFID systems but is more vulnerable to attack than its
wired counterpart. Additionally, such a standard would increase the cost of RFID readers and require
that all systems maintain a key database which, if cracked, would give an attacker full control over the RFID
system and all its tags.
Hopper-Blum Authentication
Weis (2005) suggests that a form of human-computer authentication may be adapted to provide security in
RFID technology. While Weis (ibid) fails to explain how such a method would be implemented in an RFID
system, the technique will be detailed here, as claims have been made that an implementation will be forthcoming
in the next year (Weis, 2003, p.4) and Weis himself is a notable researcher in this field. The aim of his
research is to adapt this approach to allow users of RFID systems to manually authenticate tags. While such
an application would be time consuming, it might allow for the development of secure RFID tags that could
be personally secured by an individual rather than requiring a more expensive solution, and thus should be
considered when looking at attack methods and vulnerabilities to examine whether such proposals are viable.
The Hopper-Blum (HB) protocol is based on the learning parity with noise (LPN) problem, which states:
Given a q × n matrix A, where q is polynomial in n, a q-bit vector z, and a noise parameter η ∈ (0, 1/2),
find an n-bit vector x such that |Ax − z| ≤ ηq (Weis, 2005, p.3). Despite the seeming mathematical
complexity of the definition, the actual procedure for the HB protocol is based on simple calculations that
may be done using mental arithmetic, as demonstrated in Tollinger (cited in Weis, 2005), where a vending
machine was set up to dispense free soft drinks to students who could master the protocol.
The protocol is based on two parties sharing a random n-bit secret, which will be referred to as x. If party A
wishes to check party B, then a random challenge is sent in the form of a ∈ {0, 1}^n. Both parties then
compute the boolean inner product a . x, denoted by a parity bit which we will refer to as z. Party B then
responds with z to party A, who accepts only if both z values match. As a and x are random, an unauthorised
source would still be able to guess the correct z value fifty percent of the time; to combat this, Weis (2005)
suggests that the procedure be repeated a given number of times, q. If this is followed, then an intruder's
chance of guessing every answer correctly is reduced to 2^-q.

While reducing the probability of guess-based attacks, the multiple-challenge approach means that any
attacker passively scanning and capturing O(n) repetitions could easily calculate x through the use of
Gaussian elimination. This is tackled by injecting false responses, or noise, into the z transmissions
by party B. If a pre-agreement exists, then party B will send the correct response for only (1 − η)q rounds (the
agreed fraction) and an incorrect response for the remaining ηq rounds. Authentication is only granted if
exactly (1 − η)q correct rounds and ηq incorrect rounds are observed.
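The simulation below sketches one such authentication run under these assumptions; n, q and the noise fraction eta are illustrative parameters, and the accept rule simply counts mismatches rather than following any particular published implementation.

```python
# HB protocol sketch: q parity-bit rounds over a shared n-bit secret, with the
# prover deliberately answering a known fraction of rounds incorrectly.
import random

def parity(a, x):
    return bin(a & x).count("1") % 2         # boolean inner product a . x (mod 2)

def hb_authenticate(x_prover, x_verifier, n=128, q=100, eta=0.25):
    wrong = 0
    noisy_rounds = set(random.sample(range(q), int(eta * q)))   # prover's noise
    for i in range(q):
        a = random.getrandbits(n)            # verifier's random challenge
        z = parity(a, x_prover)
        if i in noisy_rounds:
            z ^= 1                           # inject a deliberately wrong bit
        if z != parity(a, x_verifier):
            wrong += 1
    # Accept only if the number of wrong answers matches the agreed fraction.
    return wrong == int(eta * q)

secret = random.getrandbits(128)
print(hb_authenticate(secret, secret))                      # True: shared secret
print(hb_authenticate(random.getrandbits(128), secret))     # almost surely False
```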

Even this layer of additional complexity is not completely safe from attack; a passive eavesdropper has a
small probability of success if the system's operating parameters are known, i.e. if an attacker has prior knowledge of q.
That is, if all q challenges are captured in a matrix M and the responding parity bits are stored in a vector z, then x may
possibly be recovered by solving |Mx − z| ≤ ηq. Thus, while not impossible to break, the security rests
on the difficulty of solving the aforementioned LPN problem. Weis (2005) estimates that for a key
space of n = 128 even the most efficient algorithm would need 2^56 computational steps
(72,057,594,037,927,936).

CONCLUSION
The four security proposals discussed have all suggested methods of securing RFID tags from unauthorised
access. The limitations that cost places on the computational power of RFID tags represent a real threat to the
security of RFID systems. Modern attacks on wireless networks are growing in sophistication and frequency
and have led to a large body of cautionary literature on wireless use in critical infrastructure. RFID systems
are poised to replace many elements of critical infrastructure, especially in product tracking systems, and yet
no clear standard has been adopted for securing transmissions between RFID transceivers and transponders.
Through the consideration of security proposals, many vulnerabilities which have not yet been documented
in their own right may be discovered.

REFERENCES
Baja Beach Club. (2005). Retrieved 21/03/2005, from http://www.bajabeach.es/
Best, J. (2004). 44,000 Inmates to be RFID-Chipped. Retrieved 24/03/2005, from
http://networks.silicon.com/lans/0,39024663,39122811,00.htm
Cardullo, M. (2005). Genesis of the Versatile RFID Tag. RFID Journal, 2(1).
Cole, P. (2004). Fundamentals of Radio Frequency Identification. Retrieved 27/03/2005, from
http://autoidlab.eleceng.adelaide.edu.au/Tutorial/SeattlePaper.doc
Engels, D. W. (2004). RFID: The Technical Reality. Paper presented at the Radio Frequency IDentification:
Applications and Implications for Consumers, Washington.
Golle, P., Jakobsson, M., Juels, A., & Syverson, P. (2004). Universal Re-Encryption for Mixnets. In T.
Okamoto (Ed.), The Cryptographers' Track at the RSA Conference - CT-RSA (Vol. 2964, pp. 163-178). San Francisco, California, USA: Springer-Verlag.
Jones, V. (2004, 07/04/2004). Baja Beach Club in Barcelona, Spain Launches Microchip Implantation for
VIP Members. Retrieved 21/03/2005, from
http://www.prisonplanet.com/articles/april2004/040704bajabeachclub.htm


Juels, A., Rivest, R. L., & Szydlo, M. (2003). The Blocker Tag: Selective Blocking of RFID Tags for
Consumer Privacy. Paper presented at the 10th ACM conference on Computer and communications
security, Washington D.C.
Kanellos, M. (2004). Human Chips - More than skin deep. Retrieved 25/03/2005, from
http://news.com.com/Human+chips+more+than+skin-deep/2009-7337_3-5318076.html
Macleish, K. (1992). Why an Antenna Radiates. QST, 59-63.
Neary, M., & Yager, A. (2002). Methods of Livestock Identification (No. AS-556-W). West Lafayette:
Purdue University.
On Star. (2005). Experience On Star in Action. Retrieved 24/03/05, from
http://www.onstar.com/us_english/jsp/index.jsp
Pope, G. S., & Loukine, M. Y. (2003). Low Cost Single Chip RFID Beyond 130kHz. Retrieved 27/03/2005,
from http://autoidlab.eleceng.adelaide.edu.au/Tutorial/lowcost.pdf
RFID. (2005). Retrieved 21/03/2005, 2005, from http://en.wikipedia.org/wiki/RFID
Saito, J., Ryou, J.-C., & Sakurai, K. (2004). Enhancing Privacy of Universal Re-encryption Scheme for
RFID Tags. Paper presented at the Embedded and Ubiquitous Computing - EUC 2004, Aizu-Wakamatsu City, Japan.
Sarma, S. E., Weis, S. A., & Engels, D. W. (2002). RFID Systems and Security and Privacy Implications. In
Workshop on Cryptographic Hardware and Embedded Systems (Vol. 2523, pp. 454-470).
Sarma, S. E., Weis, S. A., & Engels, D. W. (2003). Radio-Frequency Identification: Security Risks and
Challenges. CryptoBytes, 6(1), 2-9.
Stockman, H. (1948). Communication by Means of Reflected Power. Proceedings of the IRE, 1196-1204.
Weis, S. A. (2005). Security Parallels Between People and Pervasive Devices. Paper presented at the The
2nd IEEE International Workshop on Pervasive Computing and Communication Security - PerSec
2005, Kauai Island, Hawaii, USA.
Weis, S. A., Sarma, S. E., Rivest, R. L., & Engels, D. W. (2003). Security and Privacy Aspects of Low-Cost
Radio Frequency Identification Systems. In D. Hutter, G. Muller, W. Stephan & M. Ullmann (Eds.), International
Conference on Security in Pervasive Computing -- SPC 2003 (Vol. 2802, pp. 454-469). Boppard,
Germany: Springer-Verlag.
Wild, K. (2005). 3D Asset Location for Mobile Devices Using Passive RFID Tags. Unpublished Masters
Proposal, Edith Cowan University, Perth, Western Australia.

COPYRIGHT
Christopher Bolan 2005. The author/s assign the School of Computer and Information Science (SCIS) &
Edith Cowan University a non-exclusive license to use this document for personal use provided that the
article is used in full and this copyright statement is reproduced. The authors also grant a non-exclusive
license to SCIS & ECU to publish this document in full in the Conference Proceedings. Such documents
may be published on the World Wide Web, CD-ROM, in printed form, and on mirror sites on the World
Wide Web. Any other usage is prohibited without the express permission of the authors.


802.11 DCF Denial of Service Vulnerabilities


Steve Glass
Griffith University
s.glass@griffith.edu.au
Vallipuram Muthukkumarasamy
Griffith University
v.muthu@griffith.edu.au

Abstract
This paper addresses denial of service (DoS) vulnerabilities in 802.11 wireless LANs that arise
from the 802.11 distributed coordination function (DCF). We demonstrate that the DCF is vulnerable to
equivalent DoS attacks at both the MAC and PHY layers. Denial of service attacks against the DCF are
easily staged by a single adversary and affect both infrastructure and mobile ad hoc networks. These
attacks are very effective and there are no workable counter-measures when using the standard MAC
protocol. When staged by a shrewd adversary these denial of service attacks reveal little information about
the attacker and provide almost no forensic evidence. We have demonstrated that 802.11 wireless LANs are
particularly vulnerable to denial of service attacks and should not be used where availability is essential.

Keywords
Wireless LAN, Denial of Service

INTRODUCTION
The security aspects of the 802.11 wireless LAN (WLAN) standard have been the subject of concern for
several years. Serious flaws in the access control (Borisov et al., 2001; Arbaugh et al., 2002) and
confidentiality mechanisms (Walker, 2000; Fluhrer et al., 2001; Stubblefield et al., 2002) were soon
discovered and resulted in intensive committee activity to revise and strengthen the security-critical aspects
of the 802.11 MAC protocol (Cam-Winget et al., 2003).
An enduring security concern is the ease with which 802.11 wireless LANs can be subjected to a denial of
service attack. This paper discusses denial of service attacks directed toward the distributed coordination
function (DCF). The skills and resources required to mount such an attack are minimal - an ordinary
wireless network interface and an appropriate attack program.
Denial of service attacks have an impact beyond creating a major loss of service. Shrewd adversaries know
that one of the best ways to subvert a security system is to "bring it into disrepute with the people who have
to work it" (Needham, 1993). An adversary selectively conducting many low-profile denial of service
attacks may be able to discredit specific security measures and thus cause the network operators to lower
their defences.
The 802.11 DCF
The 802.11 wireless network standard defines the MAC protocol along with a number of distinct physical
layers (IEEE, 2003; IEEE, 2004). A distributed coordination function (DCF) is defined to control access to
the shared physical transmission medium. An optional point coordination function (PCF) builds on the DCF
to provide centralised management of medium access. The DCF has two components: a physical layer
carrier-sense mechanism and a MAC layer virtual carrier-sense mechanism named the Network Allocation
Vector (NAV).
At the physical layer the DCF requires each station to ensure the channel is not busy before transmitting.
When the channel is busy the station will wait for the end of the transmission before trying to send its
frame. To reduce the risk of collision the station will perform a back-off procedure before transmission. The
back-off combines a short random delay with a power-of-two exponential delay for each subsequent retry
attempt.
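As a rough illustration of this back-off rule, the sketch below draws a random delay from a contention window that doubles on each retry; the slot time and window bounds are the 802.11b defaults and are assumptions for illustration rather than values taken from the paper.

```python
# Binary exponential back-off sketch: the contention window doubles on each
# retry (capped at CW_MAX) and the station waits a random number of slots.
import random

SLOT_TIME_US = 20          # 802.11b slot time in microseconds (assumed default)
CW_MIN, CW_MAX = 31, 1023  # 802.11b contention-window bounds (assumed defaults)

def backoff_delay_us(retry_count):
    cw = min((CW_MIN + 1) * (2 ** retry_count) - 1, CW_MAX)
    return random.randint(0, cw) * SLOT_TIME_US

for retry in range(6):
    print(retry, backoff_delay_us(retry))   # delay grows, on average, with each retry
```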


At the MAC layer the NAV is employed to ensure that transmissions do not collide with those of hidden
stations. These are stations out of radio range of the transmitting station but within range of the receiver such as may commonly occur where different WLANs share a channel or otherwise in a mobile ad hoc
network.
The NAV can be thought of as a timer that counts down at a uniform rate and which must be zero before a
frame is passed to the PHY layer. Certain frame types allow for a duration field to specify a period of time
for which to reserve the channel. Stations receiving frames with a non-zero duration set their own NAV

from this value. This allows a station to reserve the channel in advance of transmission.
Figure 1: RTS/CTS/Data/ACK sequence
The NAV is used in conjunction with the RTS/CTS exchange as shown in figure 1. The first frame, the
request to send (RTS), serves to notify the receiver of the intent to transmit and specifies the maximum time
to reserve the channel. The receiver sets the remaining time in the clear to send (CTS) frame, as do the data
and acknowledgement (ACK) frames. Any station receiving one of these frames is required to set their own
NAV value to that contained in the frame header.
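The toy model below captures this NAV bookkeeping (the class, field and method names are illustrative): a station adopts the largest duration it overhears and may only transmit once its NAV has counted down to zero.

```python
# Virtual carrier-sense sketch: the NAV is set from overheard duration fields
# and must reach zero before the station may transmit.
class Station:
    def __init__(self):
        self.nav_us = 0                      # Network Allocation Vector, microseconds

    def on_frame_overheard(self, duration_us):
        # Adopt a received duration if it extends the current reservation.
        self.nav_us = max(self.nav_us, duration_us)

    def tick(self, elapsed_us):
        self.nav_us = max(0, self.nav_us - elapsed_us)

    def may_transmit(self):
        return self.nav_us == 0

sta = Station()
sta.on_frame_overheard(32767)    # maximum duration value used in the experiments
sta.tick(1000)
print(sta.may_transmit())        # False: the channel is still reserved
```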
From this description it becomes apparent that the DCF offers a station that does not conform to the
protocol (a misbehaving station) a number of opportunities to conduct a denial of service attack:

Jamming the channel at the physical layer with a continuous transmission,

Enhancing conventional traffic flooding attacks by modifying the back-off procedure at the
misbehaving station,

Virtual jamming of the channel by repeatedly sending messages intended to keep other stations'
NAVs at non-zero values.

The following sections describe the relationship to previous work and the setup of a denial of service
experiment conducted against a small WLAN, and present the results we derived from the experiment.

RELATED WORK
PHY Layer Attacks
The PHY layer attack described here is due to Wullems et al. (2004). It employs a firmware test mode
provided by certain wireless interfaces to repeatedly transmit a set bit-pattern without interruption. Such a
transmission may be thought of as a jamming signal that causes receiving stations to be unable to transmit.
This attack can be very effective but is tied to a specific physical transmission mode and requires that an
appropriate mechanism is present in the wireless network interface. So far, only the 802.11b mode has been
demonstrated to be at risk of attack. By way of mitigation, Wullems et al. (2004) suggest the use of
dynamically negotiated spreading sequences.


Some denial of service attacks saturate a channel by sending large volumes of traffic between stations.
Modifying the back-off process at the misbehaving station should ensure that this station wins access to the
medium. Gupta et al. (2002) demonstrated that the exponential back-off procedure favours channel access
towards the station already generating the traffic. They identified this phenomenon as the capture effect and
proposed a fair MAC layer to address this weakness.
MAC Layer Attacks
Denial of service attacks conducted at the MAC layer have the advantage of being independent of the
physical medium and are potentially more energy-efficient for the attacker.
Another paper addressing the detection and prevention of MAC layer denial of service attacks against the
DCF is that of Chen et al. (2003). They consider virtual jamming as a concerted denial of service directed at
the DCF by injecting RTS or CTS frames. This approach is based on simulation of the attack and they
identify a possible countermeasure in the form of NAV validation. NAV validation resets the NAV to zero
if the expected data frame does not commence within the appropriate time following an RTS/CTS
exchange. This makes use of a feature of the 802.11 standard which allows the NAV to be reset.
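A standalone sketch of this NAV-validation idea follows; the timeout window, class and method names are assumptions for illustration, not values from Chen et al. (2003) or the 802.11 standard.

```python
# NAV validation sketch: honour a CTS-advertised reservation only if a data
# frame actually follows within an expected window; otherwise reset the NAV.
class ValidatingStation:
    def __init__(self):
        self.nav_us = 0
        self.waiting_for_data_us = None

    def on_cts_overheard(self, duration_us):
        self.nav_us = max(self.nav_us, duration_us)
        self.waiting_for_data_us = 300       # assumed window for the data frame to start

    def on_data_overheard(self):
        self.waiting_for_data_us = None      # reservation is being used legitimately

    def tick(self, elapsed_us):
        self.nav_us = max(0, self.nav_us - elapsed_us)
        if self.waiting_for_data_us is not None:
            self.waiting_for_data_us -= elapsed_us
            if self.waiting_for_data_us <= 0:
                self.nav_us = 0              # expected data never arrived: reset the NAV
                self.waiting_for_data_us = None

sta = ValidatingStation()
sta.on_cts_overheard(32767)      # spoofed CTS claiming the maximum duration
sta.tick(500)
print(sta.nav_us)                # 0: the bogus reservation has been discarded
```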
Bellardo and Savage (2003) demonstrated the viability of MAC layer denial of service attacks. They
demonstrated attacks against specific stations by injecting MAC layer disassociation and deauthentication
management frames into the network. They also simulated an attack against the DCF by injecting RTS,
CTS, ACK and data frames thus showing that such an attack could be effective. In practice, however, the
attack was unsuccessful and this was attributed to bugs in the wireless network interface firmware. As a
preventative measure Bellardo and Savage also propose a version of the NAV validation scheme.

EXPERIMENTS
The experiment we have conducted aimed to verify the effectiveness of denial of service attacks targeting
the DCF. For each of the attack types identified in the literature above, we staged a single adversary attack
against a small test WLAN and recorded our results. The test equipment consisted of two desktop
computers and two laptop computers (two using Windows XP, two using Debian Linux), an 802.11b/g
access point and a variety of network interfaces implementing the 802.11b and 802.11b/g standards. Our
equipment came from a variety of manufacturers and included a number of different wireless network
interface designs. The test network is shown in figure 2 in infrastructure configuration.

Figure 2: Infrastructure test network


Infrastructure networks have a single point of failure in the form of the access points (APs) through which
all stations communicate. A denial of service attack that targets an access point can be effective in shutting
down the whole basic service area. To ensure that we were not exploiting a weakness only found in our
access point, we conducted the attacks against both ad hoc and infrastructure network configurations. The
test network is shown in figure 3 in an ad hoc configuration.

Figure 3: Ad hoc test network


The PC used as our adversary generated the jamming signal and injected MAC layer frames. To
assist in identifying problems in the environment, this computer was also used to run a packet sniffer.
Many of the wireless network interfaces we had available allow for both 802.11b and 802.11g operation and
so we selectively tested both modes of operation. Although it was possible to lock the Linux PC and the AP
to a specific operating mode, it was not possible to explicitly select this mode of operation under Windows
XP. We were able to approximate this mode of operation by setting the data transfer rate to either an
802.11b or an 802.11g rate.
PHY Layer Attack
We modified the HostAP Linux device driver to include a private configuration command to enable and
disable the continuous transmit firmware test mode (Intersil, 2002) of an 802.11b wireless network interface.
When enabled, the radio interface repeatedly transmits a specified sixteen-bit value until the test mode is
subsequently disabled.
A laptop computer equipped with the modified driver and the appropriate wireless network interface was
used to generate the jamming signal as required. This was a high-power (200mW) wireless network
interface coupled to an external 8dB gain directional antenna.
For each of the stations in the test network (infrastructure and ad hoc) we generated traffic to all visible IP
addresses using a one hundred iteration PING. We recorded the minimum, maximum and average times
reported. We then engaged the continuous transmit mode at the attack PC and repeated the test.
Furthermore, to account for spurious environmental factors, we repeated this process three times.
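The kind of measurement harness used here can be sketched as follows; the host addresses, the Linux-style ping flags and the summary-line parsing are assumptions for illustration, not the authors' actual scripts.

```python
# Runs a 100-iteration ping against each visible address and records the
# min/avg/max round-trip times reported by the Linux iputils summary line.
import subprocess, re

HOSTS = ["192.168.0.1", "192.168.0.10"]      # hypothetical test-network addresses

def ping_stats(host, count=100):
    out = subprocess.run(["ping", "-c", str(count), host],
                         capture_output=True, text=True).stdout
    # Summary line looks like: "rtt min/avg/max/mdev = a/b/c/d ms"
    m = re.search(r"= ([\d.]+)/([\d.]+)/([\d.]+)/", out)
    return tuple(float(v) for v in m.groups()) if m else None

for host in HOSTS:
    print(host, ping_stats(host))            # (min, avg, max) in milliseconds
```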
MAC Layer Attack
A denial of service attack using the MAC layer offers the attacker several potential advantages:

Medium independence - the same attack should work against 802.11a, 802.11b and 802.11g
WLANs.

Energy efficiency - for battery powered equipment a MAC layer attack has the potential of
reducing the power consumption necessary for the attack.

To conduct a MAC layer attack requires the ability to inject frames directly. Bellardo and Savage discuss
several Black Hat tools in addition to their own approach, which exploited a race hazard in the wireless
network interface firmware. We chose to use an Atheros AR5212 wireless network interface for which the
Linux device driver directly supports MAC frame injection.
A C program was used to perform the actual frame injection. This program transmits frames repeatedly and
allows for an optional user-specified delay between frames.
As with the PHY mode, for each of the stations in the test networks we generated traffic to all visible IP
addresses using a one-hundred-iteration PING. This was repeated for each of a number of frame insertion
attacks:

RTS addressed to the AP (CTS from AP),

RTS to the attacker (hence no CTS),

CTS to the attacker,

ACK to the attacker.

Each attack consists of repeatedly transmitting a frame which specifies the maximum duration of 32767 μs.
Sending frames at an appropriate frequency ensures that the NAV can be continuously held at a non-zero
value. We collected data with no inter-frame delays and with delays of approximately 10ms and 20ms. When
there is no delay between frames they are transmitted as fast as the wireless network interface allows,
essentially saturating the channel.

RESULTS
We were able to successfully demonstrate that the above attacks stopped our test networks completely in all
but two cases:
1. The PHY layer attack against an 802.11g network.

2. The MAC layer RTS attack would fail for certain network cards when the frame was directed at the
AP and there was a significant delay between frames.

As it was only in these two limited cases that the attacks failed, we discuss them in more detail below. In
all other cases, once the attack commenced, all of the stations in our test network were silenced. The work
stations quickly produced Destination Host Unreachable messages. After the attack ended, we found the
Linux PC recovered quite quickly, one of our XP computers took considerably longer, and the other
could not usually recover automatically.
MAC Layer Attack
Initial Results
An initial result was that we were able to replicate the results of Bellardo and Savage's denial of service
attack: certain wireless network interfaces did not appear to honour the MAC protocol. Interestingly, it was
also only the devices with firmware derived from Choice Microsystems that were in error. The duration set
by the attacker in the frame header reserved the channel for 32767 μs. Nevertheless these interfaces were
observed to emit frames within a few milliseconds of the RTS, in violation of the MAC protocol. In this
respect our observations concur with those of Bellardo and Savage.
On closer examination we observed that our attacker's wireless network interface was retransmitting the
control frame several times. The presence of the retry bit in the frame header indicates that the interface was
erroneously expecting an acknowledgement. Control frames such as RTS, CTS and ACK are never
themselves acknowledged by an ACK frame. Our frame injection software required modification to ensure
that the frame would not be re-transmitted.
The presence of retried frames explains why the NAV was not being set as we expected. To distinguish
between original and retried frames, a retry bit is set in the frame header. A station receiving such a frame
can, therefore, correctly determine that the sender cannot contact its correspondent. In this case resetting the
NAV is both an allowed and sensible precaution.
Final Results
With the above problem overcome, we were able to conduct effective denial of service attacks across all
modes and network types. Reducing the inter-frame delay to zero saturated the network and was guaranteed
to bring all network configurations to a halt. This may, in part, be attributable to the capture
effect of Gupta et al. (2002). With increased delays between frames the attack was equally successful with only
one exception.
When RTS frames were addressed to the AP we expected that either the RTS or the CTS would silence
other stations. This is useful because the CTS will silence stations hidden to the attacker. To our surprise,
wireless network interfaces derived from the Choice Microsystems design appear to reset their NAV after
receiving the CTS from the AP.
An RTS directed at a non-existent station (and hence never answered by a CTS) would silence all interfaces,
even though it should be possible to reset the NAV after the expected arrival time of a CTS has elapsed. We
demonstrated this with delays of up to 20ms. Sending any of RTS, CTS or ACK had the same effect, but the
CTS from the AP always resulted in these particular interfaces (one Orinoco-based, the other Intersil
Prism-based) resetting their NAV. Thus, not all wireless network interfaces respond consistently to this particular
attack.

CONCLUSION
The DCF is vulnerable to denial of service attacks at both the MAC and PHY layers. Such attacks are easily
mounted by an adversary of ordinary abilities with access only to conventional wireless network
interface cards. A denial of service attack against the DCF as currently constructed is nearly
impossible to prevent. The DCF negotiates access to a shared public medium and must coordinate access
with untrusted parties to do so. For this reason, cryptographic techniques are unlikely to be effective as
preventative measures.
PHY layer attacks against 802.11 require the ability to produce a continuous jamming signal. Such modes
exist and are exploitable in DSSS network interfaces but are not known for the OFDM-based 802.11a and
802.11g modes. In spite of this, multi-mode network interfaces appear to be vulnerable unless locked to the
802.11g mode. The above study therefore demonstrates that 802.11 WLANs must be considered unsuitable for
any use where guaranteed network availability is essential.

REFERENCES
Arbaugh, W. et al. (2002) Your 802.11 wireless network has no clothes. IEEE Wireless Communications,
9(6):44-51, Dec 2002.
Bellardo, J and Savage, S (2003) 802.11 denial-of-service attacks: Real vulnerabilities and practical
solutions. In Proceedings of the 12th USENIX Security Symposium , Washington D.C., Aug 2003.
Borisov, N. et al. (2001) Intercepting mobile communications: the insecurity of 802.11. In Proceedings of
the 7th annual international conference on Mobile computing and networking, pages 180-189, New
York, NY, USA, 2001. ACM SIGMOBILE, ACM Press. ISBN 1-58113-422-3.
Cam-Winget, N. et al. (2003) Security flaws in 802.11 data link protocols. Communications of the ACM,
46(5):35-39, May 2003.
Chen, D.et.al (2003). Protecting wireless networks against a denial of service attack based on virtual
jamming. In The Ninth ACM Annual International Conference on Mobile Computing and
Networking (MobiCom 2003), San Diego, CA, USA, September 2003. SIGMobile, ACM.
Fluhrer, S. et.al, (2001) Weaknesses in the key scheduling algorithm of RC4. In S. Vaudenay and A.M.
Youssef, editors, Selected Areas in Cryptography: 8th Annual International Workshop, SAC 2001
Toronto, Ontario, Canada, August 16-17, 2001. Revised Papers, volume 2259 of Lecture Notes in
Computer Science, Heidelberg, NY, Jan 2001. Springer-Verlag. ISBN 0302-9743.
Gupta, V., et al. (2002). Denial of service attacks at the MAC layer in wireless ad hoc networks. In Military
Communications Conference (MILCOM 2002), pages 1118-1123, Anaheim, CA, USA, October
2002. The Institute of Electrical and Electronics Engineers, Inc.
IEEE(2003) LAN/MAN Standard Committee of the IEEE Computer Society. ANSI/IEEE Std 802.11,
Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications. The
Institute of Electrical and Electronics Engineers, Inc., 1999(r2003) edition, 2003.
IEEE(2004) LAN/MAN Standard Committee of the IEEE Computer Society. IEEE Std 802.11i, Part 11:
Medium Access Control (MAC) security enhancements. The Institute of Electrical and Electronics
Engineers, Inc., 2004.


Intersil (2002), Intersil PRISM driver programmers manual. Intersil Americas Inc., Irvine, CA, USA, 2.30
edition, 2002.
Needham, R. (1993) Denial of service. In Proceedings of the 1st ACM Conference on Computer and
Communications Security, pages 151-153, New York, NY, USA, 1993. ACM Press.
Stubblefield, A. et al. (2002) Using the Fluhrer, Mantin and Shamir attack to break WEP. In Ninth Annual
Symposium on Network and Distributed System Security, pages 17-22. Internet Society, 2002.
Walker, J. (2000) Unsafe at any key size; an analysis of the WEP encapsulation. IEEE 802.11 Working
Group Document 00/361, The Institute of Electrical and Electronics Engineers, Inc., Oct 2000.
Wullems, C. et al. (2004) A trivial denial of service attack on IEEE 802.11 direct sequence spread spectrum
wireless LANs. In 2004 Wireless Telecommunications Symposium, pages 129-136, Pomona, CA,
USA, May 2004. The Institute of Electrical and Electronics Engineers, Inc.

COPYRIGHT
Steve Glass and Vallipuram Muthukkumarasamy 2005. The author/s assign the School of Computer and
Information Science (SCIS) & Edith Cowan University a non-exclusive license to use this document for
personal use provided that the article is used in full and this copyright statement is reproduced. The authors
also grant a non-exclusive license to SCIS & ECU to publish this document in full in the Conference
Proceedings. Such documents may be published on the World Wide Web, CD-ROM, in printed form, and
on mirror sites on the World Wide Web. Any other usage is prohibited without the express permission of
the authors.


Managing the False Alarms: A Framework for Assurance and Verification of Surveillance Monitoring
Peter Goldschmidt
Information Management
UWA Business School
The University of Western Australia
pgold@biz.uwa.edu.au

Abstract
This article discusses methods to support assurance of surveillance monitoring and compliance verification
knowledge management (CV-KM). The discussion includes aspects of primary monitoring systems; the
different environments in which they operate; the verification problem-solving and decision-making tasks;
the problem structure; and the coordination of the review process to facilitate truth maintenance. Based on
the ALCOD prototype developed with the Surveillance Division of the Australian Stock Exchange, the
surveillance operation is considered a primary monitoring function, with the analysis of the resulting output
being the secondary monitoring function, the assurance component.

Keywords
Supporting compliance monitoring assurance, Primary surveillance systems (PSS), Compliance verification, Simple
environments, Complex environments, Suspected non-compliance event (SNCE), Secondary monitoring.

INTRODUCTION
This article discusses methods to support assurance of surveillance monitoring and compliance
verification knowledge management (CV-KM) to verify true positive alarms and manage the
false alarms. The discussion is based on a framework developed for the proof of concept
prototype (ALCOD). The prototype was successfully tested at the Surveillance Division of the
Australian Stock Exchange, where quantitative and qualitative information plus surveillance
analysts' judgment is required to verify surveillance results in a complex environment. ALCOD
remains in prototype form. The stock exchange was chosen as it represents one of the most
complex and dynamic environments in which to test the construct. Current research is now being conducted
in domains such as compliance verification of the nuclear non-proliferation treaty (Goldschmidt,
2000), asset management and machine condition monitoring in the oil, gas and waste water
industries, and the verification of results generated by continuous audit monitoring and fraud
detection systems. The latter two are an extension of the stock market surveillance
verification research.
The discussion below includes aspects of primary monitoring systems; the different environments in which
they operate; the verification problem solving and decision making tasks; the problem structure and the
coordination of the review process to facilitate truth maintenance. The surveillance operation is considered
a primary monitoring function, with the analysis of the resulting output being the secondary monitoring function, the assurance component.

MONITORING
Governments and commercial organizations typically use monitoring facilities which depend on data that
identifies source agents and their relationships, to detect and draw attention to possible anomalies and
potential non-compliance.


The assurance of compliance monitoring requires decision support and appropriate domain knowledge,
relevant to the level of user, to manage the results of the surveillance. This is required in order to fulfill the
necessary and sufficient evidence verifying or refuting the generated alerts.
Examples of primary monitoring systems (PSS) range from standard data processing routines that ensure
internal control, such as data input, processing and output compliance (Weber (1999) provides a
comprehensive discussion of these processes), to the monitoring of events transacted in more complex
environments, such as fraud detection, intrusion detection, data mining systems, international treaty
compliance and the like, via sophisticated statistical, artificial intelligence and neural computing techniques,
or hybrid combinations.
Assuring, verifying and managing PSS information quality and integrity is fundamental to the success of
modern information - dependent organizations. Concurrent with the need for surveillance is a need to
maintain personal privacy, due diligence, and accountability. (Cillufo, 2000).
Clarke (1988) highlights the inherent dangers of drawing conclusions resulting from the electronic
monitoring of data related to individuals and groups of individuals, and points out that a major problem in
dataveillance is the high noise to signal ratio, which may be misleading. Davis and Ord (1990)
acknowledge the problem of setting threshold levels in an ever-changing environment. With any set of
tolerance levels, deviant (even fraudulently motivated) behaviour may escape detection. Tightening
tolerance levels limits increases the likelihood that exception conditions will trigger an alert but also
increases false positive alerts since the number of instances that fall outside the tolerance increase. The cost
for the analyst (the decision-maker) to review the additional non-exception condition alerts must be assessed
in relation to the imputed value of identifying the additional true exceptions detected by more stringent
limits. (Davis and Ord 1990, 39 - 40).
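This trade-off can be illustrated with a small synthetic example; the event distributions, counts and threshold values below are invented for illustration only.

```python
# Tightening the tolerance band flags more true exceptions but also more
# false positives that an analyst must then review.
import random

random.seed(1)
normal  = [random.gauss(0, 1.0) for _ in range(10000)]   # compliant events
deviant = [random.gauss(2.5, 1.0) for _ in range(100)]   # non-compliant events

for threshold in (3.0, 2.5, 2.0, 1.5):
    false_alarms = sum(abs(v) > threshold for v in normal)
    detections   = sum(abs(v) > threshold for v in deviant)
    print(f"threshold {threshold}: {detections:3d} true alerts, "
          f"{false_alarms:4d} false positives to review")
```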
Advances have, in general, reduced the problem of misleading results produced from 'noisy data', including
improvements in data processing and the increased use of sophisticated computational techniques such as
statistical, knowledge-based and artificial neural computational methods.
Typically, the systems described above are centered on the events being monitored and the events' source
agents. Their results, however, may still require human judgment to determine their validity (Goldschmidt,
2001). CV-KM systems act as a secondary monitoring facility supporting, verifying and assuring data and
information compliance by assisting in analyzing and categorizing exceptions, or results, generated by PSS.
CV-KM assists in assuring the fulfillment of the necessary and sufficient evidence supporting (true
positive / negative) or refuting (false positive) hypotheses of non-compliance. The input to CV-KM
requires the output resulting from the organization's domain-specific PSS plus related information.
Operationally, the CV-KM is an add-on component to the PSS.
What are Primary Systems?
Typically, these systems examine the integrity of transaction data as well as the entire transaction, or event,
to ensure compliance with predetermined conditions. An exception report identifies any variances. This
identification either fulfills the conditions of necessary and sufficient evidence and determines an instance
of non-compliance, or indicates possible noncompliance. In the latter case further evidence may be sought
to substantiate the hypothesis of non-compliance.
The function of PSS is twofold: identifying a variance, and producing and accumulating supporting
evidence. When both these conditions are met, the evidence points to the detective, corrective or
preventative actions required.
The detective function is fulfilled by recognition of the variance; correction can then be made to the data or
the event, which is then reprocessed. The preventative function is fulfilled by the recognition of the
variance resulting in the rejection of the event. Decision-makers must interpret ambiguous evidence to
determine what action is required, or if the non-compliant indicator is a true or a false positive directive.
Examples of PSS range from standard data processing routines that ensure internal control, such as data
input, processing and output compliance, to the use of sophisticated statistical (procedural) techniques,
artificial intelligence (declarative) techniques and neural (associative) techniques, or hybrid combinations. In
general, computational techniques are either demons or objects (O'Leary 1991; Vasarhelyi and Halper
1991). Demons are computerized routines that are instantiated by data or events received, as opposed to
being requested by some program. Demons "add knowledge to a system without specification of where they
will be used ... like competent assistants they do not need to be told when to act" (Winston 1977, 380). They
are data- or event-dependent, rather than program-dependent, and provide intelligent self-activation for
monitoring data triggered by compliance threshold levels. O'Leary points out that demons have been
developed to monitor patterns for the purpose of auditing activities conducted on computer-based systems.
Vasarhelyi and Halper describe an alternative, CPAS, the Continuous Process Audit System. CPAS allows for the
continuous audit of on-line systems by monitoring transactions to determine the variance between monitored
information and expected information.
More recently, the extensible business reporting language (XBRL), as it is called, has been gaining a critical
following. Regulators have begun encouraging businesses to start preparing financial statements in XBRL, which
provides a standard structure that can be run through many types of analytical software. XBRL is like a
high-definition camera for financial results, as it shows all the wrinkles and bumps of a company's financial
results in ways that traditional data delivery does not.
Anti-money laundering detection systems also use XBRL to spot suspicious financial deals, and auditors
benefit from XBRL as well, since it increases the opportunities to perform continuous auditing of
financial information. Continuous auditing involves the evaluation of transactions simultaneous to, or
shortly after, their occurrence. XBRL has the capability to populate auditor databases for immediate
evaluation by auditors and their automated tools.

THE PSS AND CV-KMS ENVIRONMENT


PSS and CV-KM can be classified by levels of complexity, characterized by their place on the simple or
complex environmental continuum in which they operate and the decisions required to determine instances
of non-compliance. Constraints may take the form of an organization's predetermined policies and
procedures, needed to ensure data and event integrity, contractual agreements, and statutory requirements.
These constraints are not mutually exclusive and can be seen as bounds or threshold levels. The parameters
used to construct these levels may change with modifications to threshold requirements such as evolutionary
changes in constraints and changes in data and event requirements. A simple environment is so-called
because: 1) the threshold levels either seldom change or only change over the longer term; 2) the
identification of the variance fulfils the conditions of necessary and sufficient evidence to determine an
instance of non-compliance; and 3) the decisions, needed to determine if events comply, lie on the
structured to highly structured portion of the decision-making continuum. The degree to which the bounds
of the threshold levels are set, very narrow to very broad, determines the type of decision required. Under a
simple environment the bounds or threshold limits are narrow, characteristic of structured decisions such as
data input integrity and customer credit checks. Decision making in this environment is ex-ante, made of a
single step, and the constraints are all predetermined.
In a complex environment, decision making is ex-post, complex and may require multiple steps. Initial
monitoring uses a priori thresholds broader than in a simple environment (i.e. more granular) and produces
exceptions that identify suspected non-compliant events (SNCEs). Once these exceptions have been produced, the
decision-maker must substantiate the true positive exceptions. This task must be broken down into smaller
components, and sub-goals must be developed (Simon 1973) to identify, categorise and discard any false
positive exceptions. False negatives do not generate an exception and allow possibly suspect events to slip
through the surveillance sieve. If the threshold limits are stringent enough, marginal false negatives could be
subsumed and later considered. Nevertheless, this would not necessarily reduce the occurrence of true false
negatives, as their characteristics may not be known. True positives are those exceptions that the decision-maker
has determined are indeed anomalous. Evidence for this decision uses the results of the initial
monitoring as well as important information related to the event, and is characterized by a need for judgmental
expertise. Examples of these approaches to complex environments include Byrnes et al. (1990), Major and
Riedinger (1992), Senator et al. (1995), Kirkland et al. (1999).
CV Problem Solving and Decision Making Tasks
Secondary monitoring problem solving, that is, the human evaluation of the exceptions produced by the primary
monitoring system, determines whether a generated exception is feasible. This is similar to an analytical review
(AR) conducted by auditors, characterised by Libby (1985) as a diagnostic-inference process. Koonce
(1993) defines AR as the diagnostic process of identifying and determining the cause of unexpected
fluctuations in account balances and other financial relationships. Similarly, secondary monitoring problem
solving identifies and determines the causes of unexpected variances resulting from the primary monitoring
facility. Blocher and Cooper (1988) found that analytical review (AR) typically follows four distinct
diagnostic inference components: accumulation and evaluation of relevant information; initial recognition
of unusual fluctuations; subsequent hypothesis generation; and information search and hypothesis
evaluation.

With CV-KM, accumulation and evaluation is guided by the results of the PSS. Subsequently an hypothesis
of the potential causes of the observed variance is generated. The diagnostic approach takes the form of
defeasible logic, which means that any inference made may be only tentative, as the inference may require
revision if new information is presented. The decision-maker must evaluate all possible legitimate reasons
for the occurrence of the variance. If none is found, the hypothesis of non-compliance is strengthened.
CV Problem Structure
Following Sol (1982), structuredness of the complex problem is twofold: the variance identification is the
structured component, and the accumulation of evidence supporting or refuting the non-compliant event
(NCE) hypothesis is the ill-structured component. The variance is typically the product of some algorithm
indicating a possible occurrence of NCEs, but in order to substantiate a true NCE the required accumulation
of evidence requires judgment of agent behaviour. This judgment covers the source of the event, the
identification of the source agent's possible motivations, the environment in which the source agent is
operating, and the impact the event may have on that environment.
Coordination: The Review Process to Facilitate Truth Maintenance
Coordination refers to the managing of interactions between multiple agents cooperating in some collective
task. Pete et al. (1993) show that optimal organizational design depends on the task environment and, as
with an audit team or group, is hierarchical. The objective is to reduce the problems discussed by Freedman
(1991), to reduce any potentially redundant activities conducted by the evaluating agents, and to increase
efficiency and effectiveness. The agent or agents may be human or machine based. Machine based or
independent software agents function as repositories of human opinions related to the event under scrutiny.
The process of review when evaluating judgments made on accounting data and information is well
established in the auditing literature (Libby and Trotman 1993, p. 559). To facilitate coordination,
evaluating agents should communicate their findings via a communication protocol. A communication
protocol establishes the means and modes of communication between agents. Information exchange can be
either via an implicit communication mechanism, such as a common memory or blackboard (Hayes-Roth et
al. 1983), or via explicit communication mechanisms, such as message sending. Using the blackboard
approach, the SNCE's details plus the evaluating agents' assumptions and results are posted. This facilitates
the more senior agents imposing their criteria on lesser agents' results, as well as using their task-specific
criteria to further refine the classifications.
Computerised decision support systems have been proposed and built to address some of the above-mentioned
problems. A limited framework for a CV-KM intelligent decision support system using multi-agent technology is presented in Chang et al. (1993) and Goldschmidt (1996, 2001).
CV-KMs Implementation
The ALCOD prototype was developed using a novel combination of existing, proven object-oriented
technologies. The components of the system include the databases, the graphical user interfaces
(GUIs) for human agents, expert systems technologies, and the communication and coordination protocols.
Application domains currently being researched include: Monitoring (surveillance) systems such as
surveillance telemetry, machine performance monitoring, targeting, broad area surveillance, sensor outputs,
fraud monitoring, etc. The primary monitoring systems produce alarms, some true and some false.
These alerts then need to be verified and substantiated in order to reduce the analyst's workload and allocate
resources judiciously. In order to verify the true alarms, the false surveillance alarms must be
identified. The CV-KM analyses all alarms produced by the monitoring systems; identifies, verifies,
substantiates and categorises these alarms; and generates prioritised actionable codes indicating what further
actions need to be taken. It also records audit trails of the alarm details and of the identification, verification,
substantiation and categorisation done by the CV-KM. This information is encapsulated and associated with
the original alarm and its source.
To apply CV-KM technology to an existing monitoring infrastructure requires the following application-specific elements:
- middleware to retrieve the monitoring system output and to access the reference databases;
- input interfaces;
- alarm categories and types;
- information used to verify an alarm;
- classifications of alerts and actions to be taken;
- knowledge acquisition (i.e. analysts' check lists and procedure manuals) of verification procedures (i.e. threat matrix verification);
- user interfaces;
- communication and coordination procedures applied to the protocol;
- output specifications and interfaces;
- a cost function, if required.

Human analysts were used by ALCOD as evaluating agents; however, if human judgment has been
previously recorded (as in policy guidelines, check lists, meta rules, past cases or human judgment recorded
in a distributed environment by on-the-spot human intelligence), then humans can be substituted by machine
agents.
General Description of the Model Behaviour
The compliance analyst's primary goal is to evaluate all possible information that can repudiate the
hypothesis of non-compliance. ALCOD operationalises this goal by developing a set of appropriate
environmental and stock-specific propositions derived from the premise set associated with the SNCE case
under review, the analyst's default assumptions (the relevance measures, RMs), and the analyst's environmental
and stock-specific assumptions (the linguistic variables, LVs, associated with a positive response to the cues).
Figure 1.0 illustrates the procedural and declarative knowledge used by the model, the processes used to apply
this knowledge, and the order in which the processes are applied.
Detailed Description of the Model Behaviour
The following describes ALCOD's behaviour. The first step is the retrieval of the SNCE hypothesis.
Control rules on the blackboard retrieve this hypothesis from the output of the primary monitoring system.
Based on the SNCE type, the blackboard meta rules then select the Boolean cues appropriate to this SNCE
type. The hypothesis is screened for plausibility by using the positive Boolean responses1 and associated
LVs. The LV metrics are elicited from the analysts at run time and used to adjust the RMs to produce the
adjusted evidence at the atomic level. These evidence measures are then combined to form evidence chunks
which are further combined to produce a feasible SNCE classification. The classification's evidence values
are then ranked, and the results are summarised. The outcome is the SNCE classification plus supporting
evidence. This outcome and the updated coordination knowledge are then posted on the blackboard to
facilitate the review.

CONCLUSION
CV-KM operates in highly complex environments; domains where the threshold granularity is high and the
decision-making time factor is short may benefit from the decision support discussed. It is essential for
accountability that organisations in these domains ensure that transactions identified as suspected NCEs are
scrutinised and substantiated. This assists in minimising false positive conclusions that may result from the
speed, volume and increased complexity of transactions, and the information used to analyse them. CV-KM
also addresses some of the problems highlighted by Clarke (1988), that electronic monitoring of data related
to individuals and groups of individuals is subject to inherent dangers of drawing misleading conclusions
from this data. Assurance and compliance monitoring team infrastructure support includes aspects of
information systems, cognitive sciences, decision support and auditing judgment. Fuzzy set theory is
advocated in decision environments where there may be a high degree of uncertainty and ambiguity,
catering for qualitative and quantitative evidence validating and assuring the assertion of noncompliance.
Current research efforts in monitoring and assurance systems (for example UCD 1996; SRI 1999; Schneier 2001;
Roohani 2003; Denning, various) still concentrate on improving the efficiency and accuracy of
primary monitoring systems. Whilst this is necessary, further research opportunities exist in addressing and
improving the utility and effectiveness of supporting the analysts responsible for evaluating the results of
these primary systems and ensuring their accountability. XBRL now also offers the opportunity to
standardise the transfer of operational information and surveillance output from the primary monitoring
systems to secondary monitoring facilities.
OBJECTIVES OF CV-KM

- Add functionality to the primary monitoring infrastructure without modifying the primary system.
- Propose a framework for compliance verification knowledge management.
- Provide for the decomposition of surveillance tasks.
- Provide a consistent evidence evaluation and combination structure.
- Provide records of evidence from each stage.
- Add value to surveillance operations by reducing the cost of surveillance monitoring, assisting in surveillance accountability and providing transparency, when required, thereby contributing to surveillance governance and due diligence.
- Employ a method that adds value to a generated exception by encapsulating and associating the event's attributes, its source agent's characteristics, the evaluating agent's analysis and the recommended remedial action, plus the substantiating evidence.
- Exploit an infrastructure support construct and secondary filter, allowing for collaboration, truth maintenance, audit trails and decision support, thereby facilitating decision consistency and greater processing volume.
- Use the approach as a decision aid and secondary filter, so that analysis of results can be used to review the analyst's decision-making processes and to refine the primary filter tolerance levels.
- Support a structured, flexible and inclusive approach to surveillance analysis.
- Add a cost function to the surveillance-monitoring infrastructure to capture the cost-benefit trade-off.
- Gain insight from the knowledge acquisition component when setting up parameters and heuristics.
- Assist in the development of an effective accountability structure.
- Reduce distrust of surveillance monitoring systems by reinforcing accountability, transparency and professionalism.
- Highlight information deficiencies.
- Identify resource allocation redundancies.
- Provide the potential to conduct scenario analysis.
- Allow for meta-domain peer review of results.

[Figure 1.0 appears here in the original. It pairs the declarative knowledge used by ALCOD, processed SNCE by SNCE (environmental knowledge relevant to the SNCE; knowledge specific to the state associated with the SNCE; Boolean and linguistic-variable cue knowledge pertaining to the SNCE environment and to the state associated with the SNCE; and the analyst's environmental and non-environmental default assumptions, the RMs), with the sequence of processes that apply it: retrieve SNCE hypotheses; heuristically select cues appropriate to the type of SNCE; heuristically select cues (LVs) based on positive Boolean responses; screen hypotheses for plausibility; evaluate adjusted evidence relevance measures at the atomic level; evaluate evidence knowledge chunks; combine evidence and generate intermediate propositions; combine evidence knowledge chunks, producing classifications (propositions); rank classifications; and summarise results.]

FIGURE 1.0: PROCEDURAL AND DECLARATIVE KNOWLEDGE


REFERENCES

BLOCHER, E. AND J. COOPER (1988). A Study of Auditors' Analytical Review Performance. Auditing:
A Journal of Practice and Theory (Spring): 1-28.
BYRNES, E. C. THOMAS, N. HENRY, AND S. WALDMAN (1990). INSPECTOR An Expert System for
Monitoring World-wide Trading Activities in Foreign Exchange. AI Review, summer 1990.
CHANG, A., A. BAILEY JR. AND A. WHINSTON (1993). Multi-Auditor Cooperation: A Model of
Distributed Reasoning, IEEE Transactions on Engineering Management, 20(4): 346-59.
CILLUFO, F. (2000). Cyber Attack: The National Protection Plan and its Privacy Implications, Journal of
Homeland Security, September, http://www.homelandsecurity.org/journal/, ANSER Analytical
Service, Inc., Arlington, VA, 22206.
CLARKE, R. (1988). Information Technology and Dataveillance, Communications of the ACM, 31(5): 498-512.
DE KLEER, J. (1986). An Assumption Based Truth Maintenance System, Artificial Intelligence, 28(2): p.
127-62.
DAVIS, S. AND K. ORD (1990). Improving and Measuring the Performance of a Security Industry
Surveillance System, INTERFACES, 20(5): 31-42.
DENNING (various) http://www.cs.georgetown.edu/~denning/index.html
FBI (1999). Digital Storm, FBI Annual Report (http://www.fbi.gov/programs/lab/labannual99.pdf)
FREEDMAN, R. S. (1991). AI on Wall Street, IEEE Expert, 6(2): 2-7.
GOLDSCHMIDT, P. (1996). Compliance Monitoring for Anomaly Detection in a Complex Environment
Using Multiple Agents: Supporting Capital Market Surveillance. Ph.D. dissertation, The University
of Western Australia, July 1996. This was awarded The 1997 International Outstanding Doctoral
Dissertation Award: Information Systems section of the American Accounting Association eligible for
Ph.D. dissertations completed between the years 1994 -1996 inclusive.
GOLDSCHMIDT, P. (2000), CMAD and Integrated Safeguards, Invited presentation to The Australian
Department of Foreign Affairs and Trade, Australian Safeguards and Non-Proliferation Office Canberra,
July 2000.
GOLDSCHMIDT, P. (2001), Assurance and Compliance Monitoring Support, chapter 10 in Information
Security Management: Global Challenges in the Next Millennium, ed., GURPREET DHILLON, Idea
Group Publishing, pp. 135-154.
HAYES-ROTH, F., D.A. WATERMAN AND D.B. LENAT (1983). Building Expert Systems. Reading,
MA: Addison-Wesley.
KOONCE, L. (1993). A Cognitive Characterisation of Audit Analytical Review.
Auditing: A Journal of Practice and Theory, 12(Supplementary): 57-76.
LIBBY, R. AND K.TROTMAN (1993). The Review Process as a Control for Differential Recall of
Evidence in Auditor Judgements. Accounting, Organisations and Society, 18(6): 559-74.
MAJOR, J. AND D. RIEDINGER (1992). EFD: A Hybrid Knowledge / Statistical-Based System for the
Detection of Fraud. International Journal of Intelligent Systems, (7): 687-703.
O'LEARY, D. (1991). Artificial Intelligence and Expert Systems in Accounting
Databases: Survey and Extensions. Expert Systems with Applications, 3, 143-52.
SENATOR, T., H. GOLDBERG, J. WOOTEN, M. COTTINI, A. KHAN, C. KLINGER, W. LLAMAS, M.
MARRONE AND R. WONGS (1995), Financial Crimes Enforcement Network AI System (FAIS),
AI Magazine, 16(4): 21- 39.
ROOHANI, S. J. (2003). Trust and Data Assurances in Capital markets: The Role of Technology Solutions.
PriceWaterhouseCoopers research monograph, ed. DR. SAREED J. ROOHANI, Bryant College, RI
02917.
SIMON, H. (1973). The Structure of Ill Structured Problems, Artificial Intelligence, 4(3-4): 181-201.
SCHNEIER B. (2001), (http://www.counterpane.com/msm.html).

SRI (1999), http://www.csl.sri.com/%7eneumann/det99.html


TROTMAN, K. T. (1985). The Review Process and the Accuracy of Auditor Judgements. Journal of
Accounting Research, 23(2): 740-52.
UCD (1996), http://olympus.cs.ucdavis.edu/cmad/4-1996/slides.html
VASARHELYI, M.A. AND F.B. HALPER (1991), The Continuous Audit of Online Systems, Auditing: A
Journal of Theory and Practice, 10(1): 110 -25.
WINSTON, P. (1977), Artificial Intelligence. Addison-Wesley, Reading, MA.

End Note
Alert-KM Pty Ltd holds exclusive rights to all intellectual property in the CV-KM and CMAD business
process methods. This IP is covered by Australian full patent No. 758491; Singapore full patent 200106150-6;
patents pending in Canada (2,366,548), the USA (09/958,513) and Hong Kong (02106820.1); Patent Cooperation Treaty
International Application No. PCT AU00/00295 (20 countries); and US copyright TX 895-898.
Aspects of this paper were presented at the 2nd Information Warfare and Security Conference, Perth, 2001; at the
Information Systems Audit and Control Association conference, Auckland, New Zealand, Nov 2004; and in the
Encyclopedia of Information Science and Technology (Five-Volume Set), Idea Group, Hershey, 2005.
Compliance Monitoring for Anomaly Detection (CMAD), coined by P. Goldschmidt in 1995, has no relationship or
affiliation with Computer Misuse and Anomaly Detection, previously coined by UC Davis.

COPYRIGHT
Peter Goldschmidt 2005. The author/s assign the School of Computer and Information Science (SCIS) &
Edith Cowan University a non-exclusive license to use this document for personal use provided that the
article is used in full and this copyright statement is reproduced. The authors also grant a non-exclusive
license to SCIS & ECU to publish this document in full in the Conference Proceedings. Such documents
may be published on the World Wide Web, CD-ROM, in printed form, and on mirror sites on the World
Wide Web. Any other usage is prohibited without the express permission of the authors.


Secure Deletion and the Effectiveness of Evidence Elimination Software


Simon Innes
School of Computer and Information Science
Edith Cowan University
nme@arach.net.au

Abstract
This paper will discuss and analyse the different methods of wiping media to make them forensically clean.
This will include naming the tools, running them on a device and seeing what the device logically looks like
after the wipe has completed. It will then follow on to analyse the effectiveness of software that is designed to
eliminate evidence (such as web browser history) from a computer. This analysis will take place on a small
FAT32 partition running Windows 98. The test environment will be limited to using only Internet Explorer.
The procedure will consist of installing a 'vanilla' test system, taking a bitwise copy and recording the MD5
hash. Websites will be browsed and recorded and then the system will be imaged again. After this the software
will be installed and run and the two images will be compared. The main things that will be checked will be
the temporary internet files and the registry. This will be carried out with at least two separate pieces of
software.

Keywords
Forensic, wipe, secure delete, evidence elimination

INTRODUCTION
One of the exciting new areas that is developing in the industry of technology is the science of Computer
Forensics. As Computer Forensics is such a new area, there is much discussion regarding the correct
methods of implementing the tools associated with the field.. One of the topics within the Computer
Forensics field involves the recovery of data from formatted hard drives and the state of the storage on
which disk images are analysed. There are many tools available for use in recovering data from forensically
clean devices.
The main industry standard for cleaning a device has previously been DoD 5220.22 (Deutch, 2003). This
standard states that for a device to be classified as clean, it needs to be written over with three passes. The
first pass consists of all 0x00, the second pass is 0xFF and the final pass is random. As time has progressed
this standard has evolved into a more advanced, more secure criterion. DoD 5200.28-STD is similar to its
predecessor, differing in that it runs seven passes rather than three and uses much more random input. The
pattern for this is to write 0x00, 0xFF, random, random, random, 0xFF, 0x00 (Grinaker, n.d.).
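As an illustration only, this kind of multi-pass pattern could be approximated from a Linux shell with standard tools along the following lines (the device name /dev/hdX is a placeholder, and this sketch is not the certified DoD tooling):

dd if=/dev/zero of=/dev/hdX bs=1M                       # pass 1: all 0x00
tr '\000' '\377' < /dev/zero | dd of=/dev/hdX bs=1M     # pass 2: all 0xFF
dd if=/dev/urandom of=/dev/hdX bs=1M                    # pass 3: pseudo-random data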
Another method of secure deletion was created by Peter Gutmann in his paper Secure Deletion of
Data from Magnetic and Solid-State Memory (Gutmann, 1996). This method indicates that eight passes
must be completed on a device, with the pattern 0x00, 0x55, 0xAA, 0xFF, 0x00, 0x55, 0xAA, 0xFF. The
data from this method should ideally be difficult to recover, due to it being overwritten eight times. There is
no random pattern generation involved, thus it is a rapid method of cleaning a device.
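Purely as a sketch of this fixed-pattern approach (again with /dev/hdX as a placeholder; the byte values are given to tr in octal, so 0x55 is \125, 0xAA is \252 and 0xFF is \377), the eight passes could be scripted as:

for pat in '\000' '\125' '\252' '\377' '\000' '\125' '\252' '\377'; do
    tr '\000' "$pat" < /dev/zero | dd of=/dev/hdX bs=1M    # one fixed-pattern pass per iteration
done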
There is an array of tools that currently exist to help make a device forensically clean. This paper will look
into several of these and outline the similarities and differences between them.
The first piece of software to be discussed is called Wipe (Wipe: Download, 2000). Wipe is a powerful
tool as it allows the user to wipe anything from a single file to an entire hard disk. Wipe will write random
data to the disk with a default of 4 passes. It is extremely configurable from command line options. It is
possible to change the number of passes made and even the random device or seed used. Wipe was run on a
test system, and the device appeared as follows once it had finished:

00000000
./...3h..._..,.9....B...ic..o.....O.E..9.....H
_nx3..h..L..b..&.
00000040
O."{../..[.K....t..nh.........._.}..4>.i?.^.m..$K.BQ.Yv..D..
.b .

00000080
.[....2.j....B..N=....|........u.aI~za.6O..U..(.4..}..y\..A.
.f.f
000000C0
g/.\..+.~..\..yt......@.K.#H......{^...&.c}}.......Hr.....W.
....
00000100
..:6....#.F3<6..0....D.qu...Q..]a..m.<...Qd`.8........O...j.
@Pjv

A simple, yet somewhat effective, way of doing the same thing is to use dd, with the input file set to
/dev/random and the output set to the disk to be written to. As many passes as needed may
be performed on the disk. Here is a sample output from a single pass using dd:

00000000
%.^..,...A.1.7B..9...6u...k...%.0....T\....'T0%.b..X.'..&!..
....
00000040
&..Y.,..?.....=.Q>..p.X.C....6.>Is?...r.W.$....R^.....!...z.
.p..
00000080
D..Z...t.w.H<..0a.R.2k.yd...*R.9..#...z...Qu.I....3.p].."...
..,|
000000C0
.i.Wgk.=.....`V....%...o\x.%Ml.>...w.w....Q%...S.O......x..Z
.3.O
00000100
...O...`.<......Yy.q.V`J.".8oD..wr.{....yF.}.}.}.P.:.....q1`
...@
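For reference, a single pass of the kind shown above can be produced with a dd invocation along these lines (the device name is a placeholder; in practice /dev/urandom is often substituted for /dev/random because it does not block while waiting for entropy):

dd if=/dev/random of=/dev/hdX bs=4096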
Reset (Grinaker, n.d.) is a piece of software designed to run from a floppy disk and boot a condensed
Linux kernel to erase a device. Reset employs several different techniques for wiping a device. In the
simplest instance, Reset will set all bytes on the disk to 0. Alternatively, the aforementioned DoD 5220.22
standard or DoD 5200.28-STD may be used. The last option available in Reset employs Peter Gutmann's
method of data erasing.
The final piece of software to be analysed is a more commercial option. It is called GDisk (Symantec, 2005)
and was written by the company Symantec. GDisk was originally a tool that was designed for disk partition
management and recovery; however it also contains a disk wiping facility. In contrast to the other software,
GDisk has been designed to run in both a Windows and DOS environment. GDisk is command line based
and works by erasing a disk using DoD 5200.28-STD or by using a customized number of passes.
As demonstrated, most pieces of software designed to securely erase a disk follow similar methodology and
processes to achieve the desired outcome. The next section of this paper will look at tools that are designed
for removing specific items from a disk as opposed to wiping the entire disk.
As technology and the internet progress, companies develop software to handle different tasks. One of the
ever-growing fields of software is evidence elimination software: software designed to remove the traces of
someone's internet browsing history. There is a range of widely available adult and illegal material on the
internet, and thus a market exists for this software, as web users may attempt to delete their browsing history.
This paper aims to analyse the effectiveness of various evidence elimination software packages.
The research
Using a simple test environment, research was conducted to analyse the effectiveness of different evidence
elimination software. The process of the research and the outcomes will now be discussed.

The test environment


The hardware for the test environment consisted of a Pentium 200 MMX with 192 MB of RAM and a 4 GB
hard drive. The system had a floppy drive and a CD-ROM drive, and was connected to a network via an on-board
network interface card. The hard drive was broken into two separate partitions: one 500 MB partition formatted
with FAT32 and a 3.5 GB partition formatted with an ext2 filesystem.
The software used in the environment consisted of Windows 98, for the purpose of executing the evidence
elimination software. The version of Internet Explorer was updated to 6.0. A distribution of the Helix live CD
was used to carry out the analysis, along with a copy of Spider for Windows, which checks various locations for
files such as index.dat.
The Contenders
For the purpose of these tests, three pieces of software were chosen. All software was either freeware or had
a trial period. They are as follows:
A. Window Washer, accessed from www.webroot.com: "Wash away all traces of your PC and Internet
activity and improve system performance."
B. History Kill 2003, accessed from www.historykill.com: "The #1 rated privacy tool on the internet!"
C. History Swatter, accessed from www.historyswatter.com: "Free Internet History Eraser!"

This software was located through various searches on the search engine google.com.
The Process
To start with, a fresh installation of Windows 98 was made on the 500 MB partition (which we will call hda1).
Internet Explorer 6 was then installed. Once the installation was completed, several web pages were loaded
via Internet Explorer and noted so they would be easy to find. Among these sites was a site of an adult
nature that spawned many popups, which potentially left a lot of data remaining. After the websites had
been browsed, the system was restarted and Helix was loaded. The second disk partition was then created with
an ext2 file system (hda2):

[Helix (mnt)]# mkfs.ext2 /dev/hda2


mke2fs 1.35 (28-Feb -2004)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
405600 inodes, 810432 blocks
40521 blocks (5.00%) reserved for the super user
First data block=0
25 block groups
32768 blocks per group, 32768 fragments per group
16224 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912
Writing inode tables: done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 39 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.

Wipe was then run on the file system with seven passes to ensure that it was forensically clean.

Once the device was clean, the Windows partition was imaged. This was done using dd, with the 500 MB
Windows partition as the source and the file first-image on the second partition as the destination:

dd if=/dev/hda1 of=./first-image;
1056321+0 records in
1056321+0 records out
540836352 bytes transferred in 469.378755 seconds (1152239 bytes/sec)
Once this was done, hash checks were made to ensure that the copy was identical to the drive itself:

MD5
caa25d3bd04b61941646073a5a568388 /dev/hda1
caa25d3bd04b61941646073a5a568388 /mnt/hda2//first-image
SHA1
e4764646153e81f1e7b8074f58dd441d30b77435 /dev/hda1
e4764646153e81f1e7b8074f58dd441d30b77435 /mnt/hda2/first-image
CRC32
2867881248 540836352 /dev/hda1
2867881248 540836352 /mnt/hda2/first-image
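The hashing commands themselves are not listed above; output in this format is typically produced with the standard command-line utilities, for example (the exact invocation used here is an assumption):

md5sum /dev/hda1 /mnt/hda2/first-image
sha1sum /dev/hda1 /mnt/hda2/first-image
cksum /dev/hda1 /mnt/hda2/first-image     # prints a CRC and the byte count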
After backing up the original image, the first piece of software was tested.
Contender A: Window Washer

The first piece of software used was Window Washer. The most interesting thing about Window Washer
was that it was advertised as having a 'Bleach' function that overwrites files with random characters and
makes them unrecoverable and undetectable to unerase software. Apart from that, items were simply
removed from the Recycle Bin, registry, Windows temp files, index.dat, recently opened documents and
recently viewed pictures.

Figure 1 - A summary of the tasks that Window Washer has carried out.

Window Washer was installed and run without difficulties. All the things that were cleaned are visible in
the above screenshot (Figure 1). Upon completion of this, Spider was run to investigate the various index.dat
files that Windows creates. The output was as follows:

{\rtf1\ansi\deff0\deftab720{\fonttbl{\f0\fswiss MS Sans Serif;}{\f1\froman\fcharset2


Symbol;}{\f2\froman Times New Roman;}{\f3\froman Times New Roman;}}
{\colortbl\red0\green0\blue0;}
\deflang1033\pard\plain\f2\fs18 Spider Log File - Copyright (C) 1999 - Ward van
Wanrooij <ward@ward.nu>
\par
=====================================================
==================
\par Scanned c:
\par
\par Files Scanned:
\par c:\\WINDOWS\\Temporary Internet Files\\Content.IE5\\index.dat
\par c:\\WINDOWS\\Cookies\\index.dat
\par c:\\WINDOWS\\History\\History.IE5\\index.dat
\par
=====================================================
==================
\par URLs Found:
\par ***** Scanning c:\\WINDOWS\\Temporary Internet Files\\Content.IE5\\index.dat...

*****
\par
\par
http://www.microsoft.com/isapi/redir.dll?prd=ie&clcid=0x0409&pver=6.0&ar=ienews&
os=98
\par http://v4.windowsupdate.microsoft.com/
\par
\par ***** Scanning c:\\WINDOWS\\Cookies\\index.dat... *****
\par
\par
\par ***** Scanning c:\\WINDOWS\\History\\History.IE5\\index.dat... *****
\par
\par
\par
=====================================================
==================
\par
\par }
After this, the registry was also searched to ensure that nothing could be found. This proved to be the
case.

After the software had been run, Helix was booted and the analysis method took place. The image was made
using dd and the hashes were recorded.

MD5
e7608fb190debf9be92b2ad4849e6e38 /dev/hda1
e7608fb190debf9be92b2ad4849e6e38 /mnt/hda2/first-image-after-ww
SHA1
48e4f721e55dc4f1ee01fd2e7916098644395ebf /dev/hda1
48e4f721e55dc4f1ee01fd2e7916098644395ebf /mnt/hda2/first-image-after-ww
CRC32
922612215 540836352 /dev/hda1
922612215 540836352 /mnt/hda2/first-image-after-ww

Because the websites that had been browsed with this computer had been recorded, hexedit was used to
search through the image. Just searching for the names of the websites was enough to uncover some data
relating to them.
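An equivalent way of locating such strings in a raw image from the command line, if a hex editor is not to hand, is a binary-aware grep; the flags used here are -a (treat binary as text), -b (print the byte offset) and -o (print only the match), and the image name is the one created above:

grep -abo 'pornmovies.com' /mnt/hda2/first-image-after-ww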

The first batch of data that was found appeared to be hidden in the Windows registry. Interestingly, there
was no trace of it through regedit in Windows.

Note: The hexadecimal values have been removed as they are irrelevant.

1033A4D4

...RegBackup].......]...........0..........,

1033A500
1033A52C
1033A558
1033A584
1033A5B0
1033A5DC
1033A608
1033A634
1033A660
1033A68C
1033A6B8
1033A6E4
1033A710
1033A73C
1033A768
1033A794
1033A7C0
1033A7EC
1033A818
1033A844
1033A870
1033A89C
1033A8C8
1033A8F4
1033A920
1033A94C
1033A978
1033A9A4
1033A9D0
1033A9FC
1033AA28
1033AA54
1033AA80
1033AAAC
1033AAD8
1033AB04

.3bdd6b017b35029e,Software\Microsoft\Interne
t Explorer\Main,.w.......w...........0.map..
........B.3bdd6b017b35029e,1,HKCU,Software\M
icrosoft\Internet Explorer\Main,First Home P
age,w...}...w...........1001............Desc
riptionInternet Explorer............FileName
IEXPLORE.EXE............Version....v.......v
...........{89820200-ECBD-11cf-8B85-00AA005B
4383}.Restore............Version6,0,2800,110
6............Localeenj.......j...&.......{9E
F0045A-CDD9-438e-95E6-02B9AFEC8E11}.........
...Version1,0,2195,0............Locale*"....
..."...........Help_Menu_URLs...............
.....Discardable....................PostSetu
p(.......(...........Component Categories:..
.....:...&.......{00021493-0000-0000-C000-00
0000000046}o.......Q...........Toolbar......
......LinksFolderNameLinks....9-5E....Locked
....rder........9-5E....Locked............%.
..........RunMRU............awinipcfg\1.....
.......MRUListacb.......... .b\\192.168.0.3\
Public Drop Zone\1.....s.7....cregedit\1....
................Count............HRZR_PGYFRF
FVBA.v%..........3\P....HRZR_PGYPHNPbhag:pgb
e............................HRZR_HVGBBYONE.
.......@R.R................HRZR_HVGBBYONE:0k
1,125........@R.R........x...............Typ
edURLs............url1http://www.pornmovies.
com/............url2http://www.hotmail.com/.
...........url3http://www.sexpics.com/......
......url4http://www.msn.com................
....RunMRU............awinipcfg\1...........
.MRUListgfedacb.......... .b\\192.168.0.3\Pu
blic Drop Zone\1.....s.7....cregedit\1....mr
5.....d\\192.168.0.3\c$\1............e\\192.
168.0.3\d$\1............f\\192.168.0.101\\1.

The labels in this part of the disk image look like this:

{89820200-ECBD-11cf-8B85-00AA005B 4383}
Software\Microsoft\Internet Explorer\Main

This can be interpreted to mean that this information is stored somewhere in the registry. As can be seen,
the URLs that exist here have the labels url1, url2, etc. in front of them, which would lead one to believe they
relate to the visited websites.

Another interesting find after running Window Washer was this:

0D780440
0D78046C
0D780498
0D7804C4

............................................
....................REDR.....Q...tG.http://w
ww.pornmovies.com/..........................
............................................

There were several of these REDR tags containing previously browsed URLs. Interestingly, there is
nothing else surrounding them but what appear to be blank areas on the image.

The next major find in this analysis was the actual presence of cookies. It is difficult to say where exactly in
a Windows environment these existed because doing a regular search for them did not return any results.

0E1C00E4
0E1C0110
0E1C013C
0E1C0168
0E1C0194
0E1C01C0
0E1C01EC
0E1C0218
0E1C0244
0E1C0270
0E1C029C
0E1C02C8

000&lc=1033&_lang=EN........URL ......*.....
.E...........................Q..`...h.......
...... .............J1.q....................
:2004101020041011: Simon@:Host: loginnet.pas
sport.com...................................
............................................
....................URL ....................
e1.q.................Q..`...h............. .
............J1.q....................:2004101
020041011: Simon@http://login.passport.net/u
ilogin.srf?id=2.............................
............................................

The strangest and probably the most unexpected find in the image was the presence of actual HTML code.
Again, a Windows search showed nothing existed on the FAT partition. This works towards disproving
Window Washer's 'Bleaching' claim.

0D73EBF0
0D73EC20
0D73EC50
0D73EC80
0D73ECB0
0D73ECE0
0D73ED10
0D73ED40
0D73ED70
0D73EDA0

.....+.....0..... . .<html>.<head>.<title>The I
nternet Movie Database (IMDb)</title>.<meta name
="description" content="IMDb">.<meta name="keywo
rds" content="movies,films,movie database,actors
,actresses,directors,hollywood,stars,quotes">.<l
ink rel="stylesheet" type="text/css" href="http:
//i.imdb.com/imdb.css">.<link rel="stylesheet" t
ype="text/css" href="http://i.imdb.com/sok.css">
.<link rel="icon" href="http://i.imdb.com/favico
n.ico">.</head>.<!-- h=imdb-online-2108.iad2.ama

A block of rather unusual data was located and is displayed below. This data is believed to be the outcome
of Window Washer's bleach function.

0E4605DC
0E460608
0E460634
0E460660
0E46068C
0E4606B8
0E4606E4
0E460710
0E46073C
0E460768
0E460794
0E4607C0
0E4607EC
0E460818
0E460844

............................................
....+&1.<<<<.;;;.<<<...<<<<...:::..<<<<<<..:
::::...<<<.:::...::..<<.::<<<<<<<<<<<..:::::
..<<.::<<<<<<::.<<..<<<<<<<<<<<<<<<<.;;;;;;.
.<<<<<<<<<.;;.......;;;.<<;;;;;;;;;;;;.<<<.;
;;;;;;;;.<8.;;;;;;....;;;;;;....<<<<<...;;;;
;;;;;;;....".<<<.;..<<<<<<<<<<<.:::::::.<<<<
.:::::::::.<<.::....::...<<.::<<<<<<<<<..:::
:::::.<<.::<<<<<<::.<<..<<<<<<<<<<<<<<<<.;;;
;;;..<<<<<<<<<.;;;.....;;;..<<<...;;;;...<<<
<<<<<<;;;...<..;;;;;.<<<<<<..;;;;....<<<..;;
;;;;;;;;;;;;;.... <<<<<...<<<<<<<<<....<..::
.<<.:::......:.<<.::.<<.::.<<<<.::<<<<<<<<<.
::........<<.::<<<<<<::.<<..<<<<<<<<<<<<<<<<
.;;;;;;...<<<<<<<<<.;;;;;;;;;;..<<<<<<;;;;..

0E460870

<<<<<<<<<<;;;...<.;;;;.<<<<<<<<<<..;;;...<<.

Another point of interest was the location of data that appeared to be an actual HTTP response. The data
below shows the status line HTTP/1.1 200 OK, which is associated with HTTP responses. It was unknown at the
time that Windows logged such information.

07E377E8
07E37818
07E37848
07E37878
07E378A8
07E378D8
07E37908

........................LEAK............`.1.u...
........................`...h...........A ......
~.......71Ep........71Ep........http://www.imdb.
com/....imdb[1].HTTP/1.1 200 OK..Cneonction: clo
se..Content-Type: text/html..Content-Encoding: g
zip...Content-Length:11910.....~U:simon......
................................................

Foremost is a program which has the ability to retrieve (carve) files from a raw disk image. After locating the
actual HTML code within the disk image, Foremost was run to investigate whether it was possible to recover
any files. Foremost took roughly ten minutes to analyse the image, and in that time several HTML pages that had
been browsed and allegedly deleted were located.
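A typical invocation for this kind of carving run might look like the following; the options shown are assumptions based on common Foremost usage (-t selects the file types to carve, -i names the input image and -o names the output directory) rather than a record of the exact command used in this test:

foremost -t htm,jpg -i /mnt/hda2/first-image-after-ww -o /mnt/hda2/carved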

Upon completion of the analysis of Webroot's Window Washer, it would seem that the tool deletes all
history and keeps it well hidden from a standard Windows environment. This would prove useful if the user
examining the system only searched for the data in Windows; however, a more thorough forensic
investigation would reveal remaining evidence. It can be concluded that, due to the recovery of HTML on
the disk, this product is second-rate and would not be value for money if purchased.

Contender B: History Kill 2003

The second piece of software used in this research was History Kill 2003. History Kill has an interesting
feature called a File Shredder which, according to the website from which the software is available, will:

"Encrypt and overwrite your web surfing tracks 21 times (or more) so that no one can undelete or recover
your web tracks! HistoryKill defeats forensic software used by the US Secret Service, Customs Department
and LAPD!" (History Kill, 2005)

The number of times History Kill overwrites this data is configurable before running. For the purpose of this
exercise the default of twenty-one passes was chosen.

Figure 2 - HistoryKill's options for Internet Explorer and MSN Privacy.

Figure 3 - HistoryKill's options for Windows Privacy.

Figure 4 - HistoryKill's options for file shredding.


As can be seen from the above screenshots (Figures 2 to 4), History Kill removes quite a range of data from
different locations. The user can also specify how many times to shred certain data.

Figure 5 - HistoryKill's progress during an execution.

Figure 6 - HistoryKill after the execution has completed.


After running the program History Kill, Spider was again executed to examine what remained on the
system.

\deflang1033\pard\plain\f2\fs18 Spider Log File - Copyright (C) 1999 - Ward van


Wanrooij <ward@ward.nu>
\par
================================================================
=======
\par Scanned C:\\WINDOWS

\par
\par Files Scanned:
\par C:\\WINDOWS\\Temporary Internet Files\\Content.IE5\\index.dat
\par C:\\WINDOWS\\Cookies\\index.dat
\par C:\\WINDOWS\\History\\History.IE5\\index.dat
\par
================================================================
=======
\par URLs Found:
\par ***** Scanning C:\\WINDOWS\\Temporary Internet Files\\Content.IE5\\index.dat...
*****
\par
\par http://www.imdb.com/
\par http://www.imdb.com/
\par http://ia.imdb.com/media/imdb/01/I/21/62/48.jpg
\par http://ia.imdb.com/media/imdb/01/I/31/62/48.jpg
\par http://ia.imdb.com/media/imdb/01/I/94/09/38.gif
\par http://i.imdb.com/sok.css
\par http://ia.imdb.com/media/imdb/01/I/41/62/48.jpg
\par http://i.imdb.com/imdb.css
\par
================================================================
=======
\par
\par }
\par ***** Scanning C:\\WINDOWS\\Cookies\\index.dat... *****
\par
\par Cookie:simon@msn.com/
\par Cookie:simon@microsoft.com/
\par Cookie:simon@www.ultravideos.com/s6/
\par Cookie:simon@imdb.com/
\par Cookie:simon@maxserving.com/
\par Cookie:simon@atdmt.com/
\par Cookie:simon@www.imdb.com/
\par Cookie:simon@advertising.com/
\par Cookie:simon@servedby.advertising.com/
\par Cookie:simon@ultravideos.com/
\par Cookie:simon@pornoground.com/
\par
\par ***** Scanning C:\\WINDOWS\\History\\History.IE5\\index.dat... *****
\par
\par http://www.pornmovies.com.20041010.nudity.com/main.php
\par http://www.msn.com
\par http://www.ultravideos.com/s6/video002.html
\par http://www.pornmovies.com.20041010.nudity.com
\par
\par
================================================================
=======
\par

\par }

The output from Spider has been truncated, as it located all sites that had been browsed on this machine.
History Kill appears to have missed one important factor when deleting these files, as index.dat files were still
located in the three areas that Spider checks. Another point to note is that there are cookies listed here for
websites that were never browsed. It would appear that if a visited site links to another site, cookies for that
other site may also be created.
Although the use of Spider proved that History Kill doesn't remove everything, a forensic analysis was still
carried out on the device to see if anything unusual or interesting happened. The hashes for this disk image
look like this:

MD5
247caafd62369d427018e042013173ff /dev/hda1
247caafd62369d427018e042013173ff ./after-hk2k3
SHA1
626d67ef331a09f0ec0f81d2eafdb49caf2919e2 /dev/hda1
626d67ef331a09f0ec0f81d2eafdb49caf2919e2 ./after-hk2k3
CRC32
3899082824 540836352 /dev/hda1
3899082824 540836352 ./after-hk2k3
History Kill appeared to leave all the same tracks that Window Washer did. There were URLs found that
started with the REDR tag:

0D780458
0D780480
0D7804A8
0D7804D0

........................................
REDR.....Q...tG.http://www.pornmovies.co
m/......................................
........................................

More cookies were found:

0E1C0454
0E1C047C
0E1C04A4

....................:2004101020041011: S
imon@http://www.pornmovies.com.20041010.
nudity.com..............................

And traces of HTML code were located:

0E316BF0
0E316C18
0E316C40
0E316C68
0E316C90
0E316CB8
0E316CE0
0E316D08

U....)E..{..t:.{<HTML>.<HEAD>.<TITLE>:::
PornMovies.com :::Ultra Videos!!!</TIT
LE>.<meta NAME="description" CONTENT="Of
6fers porn movies, live porn, sex picture
s, movies, and more.">.<meta NAME="keywo
rds" CONTENT="sex,porn,porno,adult video
s,xxx,porn movies,xxx videos,adult movie
s">.<style>.a:hover {color:#FF0000}.TD{f

Diagnosis: whilst History Kill appears to delete internet history, it leaves remnants of that history to be
found. This software would be recommended for amateur use, such as in a domestic setting, but would be
ineffective in a professional environment, where the person doing the search would know to carry out a more
thorough analysis.

Contender C: History Swatter


The third piece of software to be analysed is called History Swatter. History Swatter was created by a
company called Fun Web Products, and was accessible from the web, presented as an advert in a popup
window. Within the content of the website, History Swatter claims to remove the following items:

Document History
Temporary Files
Disk Temp Files
Clipboard
Memory Dump File
MS Download Temp Folder
Run History
Start Menu Click History
Start Menu Order History
Page (Swap) File
Address Bar History
AutoComplete
Index.dat
Media Bar History
Temporary Internet Files
History (Visited Sites)
Cookies
Unlike the two programs tested initially, History Swatter does not boast any special deletion techniques. The
only feature unique to History Swatter is that it includes Fun Web Products' own Internet Explorer toolbar.
This is a feature indicative of low-grade software.

Figure 7 - The install process for History Swatter.


Rather than having to download an executable, History Swatter installs via Internet Explorer (Figure 7).
One of the reasons this software was chosen was its unique install method.

Figure 8 - History Swatter's main screen.

Figure 9 - History Swatter's options for Windows Privacy.

Figure 10 - History Swatter's options for Internet Explorer Privacy.

Figure 11 - History Swatter's options for cookies.

Figure 12 - History Swatter's main screen after a run has completed.

In comparing the numbers here with those from the tests run on Window Washer, there are many similarities to be
seen.

As with the previous tests, Spider was again used to determine the basic effectiveness of the
software.

\par
================================================================
=======
\par Scanned C:\\WINDOWS
\par
\par Files Scanned:
\par C:\\WINDOWS\\Temporary Internet Files\\Content.IE5\\index.dat
\par C:\\WINDOWS\\Cookies\\index.dat
\par C:\\WINDOWS\\History\\History.IE5\\index.dat
\par
================================================================
=======
\par URLs Found:
\par ***** Scanning C:\\WINDOWS\\Temporary Internet Files\\Content.IE5\\index.dat...
*****
\par
\par
\par
\par http://view.atdmt.com/AVE/view/msnnkonm00200074ave/direct;wi.309;hi.60/01/
\par http://www.imdb.com/
\par
\par http://www.imdb.com/Sections/Gallery/
\par http://www.imdb.com/Sections/Gallery/
\par http://fpdownload.macromedia.com/get/shockwave/cabs/flash/swflash.cab
\par http://www.imdb.com/rg/SECGAL/PTF//name/nm1663927
\par http://www.imdb.com/name/nm1663927/
\par http://www.imdb.com/google/box?num=3;k=power100withsc;placement=midbucket;rnd=96922;sid=8483;referer=%2Fname%2Fnm1663927%2
F;slot=GOOGLE
\par http://www.imdb.com/name/nm1663927/board/threads/
\par http://www.pornmovies.com/
\par http://www.ultravideos.com/?revid=30032&s=6&nopop=1
\par http://www.ultravideos.com/s6/images/blonde.jpg
\par http://216.158.129.34/ml/wPCmavdMlpTfNQDA4xMA/UV/vid002.wmv
http://download.historyswatter.com/frame.jsp?partner=ZHXXXXXXXXAU&product=his
toryswatter&w=h&opcreativeid=&partner=&opcreativeland=505%2C0%2C5494%2Csh.
11.3pj
\par
\par ***** Scanning C:\\WINDOWS\\Cookies\\index.dat... *****
\par
\par Cookie:simon@www.ultravideos.com/s6/
\par Cookie:simon@imdb.com/
\par Cookie:simon@maxserving.com/
\par Cookie:simon@atdmt.com/
\par Cookie:simon@www.imdb.com/
\par Cookie:simon@advertising.com/
\par Cookie:simon@servedby.advertising.com/

\par Cookie:simon@ultravideos.com/
\par
\par ***** Scanning C:\\WINDOWS\\History\\History.IE5\\index.dat... *****
\par
\par
================================================================
=======
\par
\par }
From this output, the conclusion can be drawn that History Swatter only deleted one index.dat file. It is also
interesting to note that there is an entry for a .wmv file that did not appear in the output for History Kill.
To carry out a forensic analysis, Helix was loaded, the disk was imaged and hashes were computed to
verify the image.

MD5
ee1bb1d107c645860fb77922b751c1e8 /dev/hda1
ee1bb1d107c645860fb77922b751c1e8 ./after-hs
SHA1
9e4f932361fede51ce9a784123774bb76982c8a9 /dev/hda1
9e4f932361fede51ce9a784123774bb76982c8a9 after-hs
CRC32
3357363561 540836352 /dev/hda1
3357363561 540836352 after-hs
The analysis of this disk was very similar to the analysis after History Kill was run. Traces of cookies were
located, but this time there appeared to be several more than before.

0E1A8168
0E1A8198

:2004100420041011: Simon@:Host: www.pornmovies.c


om.20041010.nudity.com..........................

0E1C0468
0E1C0498

:2004101020041011: Simon@http://www.pornmovies.c
om.20041010.nudity.com..........................

0E1A8858
0E1A8888

................:2004100420041011: Simon@http://
www.ultravideos.com/s6/video002.html............

The mysterious REDR tag showed up again:

0D780470
0D7804A0

................REDR.....Q...tG.http://www.pornm
ovies.com/......................................

More HTML code was also recovered; however, as it was at the same location as in the previous analyses, it
will not be displayed here.
There was some data in this analysis that made it different from the other two. Although it is difficult to
deduce exactly the nature of this data or its origin, it may be a derivative of a script of some description.

0E462BF8
0E462C28
0E462C58
0E462C88
0E462CB8

........revid.30032.ultravideos.com/.1536.283255
6928.29685132.2740218304.29667026.*.ssite.6.ultr
avideos.com/.1536.2832556928.29685132.2740218304
.29667026.*.nopop.1.ultravideos.com/.1536.283255
6928.29685132.2740718304.29667026.*.refer.http%3

0E462CE8  A%2F%2Fwww.pornmovies.com.20041010.nudity.com.ul
0E462D18  travideos.com/.1536.2832556928.29685132.27407183
0E462D48  04.29667026.*.sessid.25f081fc65f17457c9ae3e5f32c
0E462D78  ffcd9.ultravideos.com/.1536.2832556928.29685132.
0E462DA8  2740718304.29667026.*.tstamp.1097417795.ultravid
0E462DD8  eos.com/.1536.2832556928.29685132.2740718304.296
0E462E08  67026.*.........................................

The next section, which was located not much further into the image, is clearly JavaScript that must
have come from one of the web pages that caused the popup windows to appear.

0E467248  nopopimg = new Image();...nopopimg.src = "/?act
0E467278  ion=nopopimg&revid=30032&nopop=1&refer=http%3A%2
0E4672A8  F%2Fwww.pornmovies.com.20041010.nudity.com";..}.
0E4672D8  }.self.focus();.stayUnder();..attachEvent('onbef
0E467308  oreunload',goBUL);../* ]]> */.//-->.</script>..<
0E467338  script language="JavaScript" type="text/JavaScri
0E467368  pt">.<!--.function MM_openBrWindow(theURL,winNam
0E467398  e,features) { //v2.0. window.open(theURL,winNam
0E4673C8  e,features);.}.//-->.</s

The next block of data is similar to an earlier block; however, this particular block appears to have been
generated by a PHP script. It would appear that the labels and numbers contained within this block are
actually PHP variables. It also makes reference to a .dll file.

102F3BE8  .dll..%.CreateBitmap..T.revid.30032.teensforcash
102F3C18  .com/.1536.3552556928.29685132.3461318304.296670
102F3C48  26.*.ssite.2.teensforcash.com/.1536.3552556928.2
102F3C78  9685132.3461318304.29667026.*.nopop.1.teensforca
102F3CA8  sh.com/.1536.3552556928.29685132.3461318304.2966
102F3CD8  7026.*.refer.http%3A%2F%2Fwww.pornmovies.com.200
102F3D08  41010.nudity.com%2Fmain.php.teensforcash.com/.15
102F3D38  36.3552556928.29685132.3461318304.29667026.*.ses

The final discovery in this disk image that distinguished it from the previous analyses was a block of data
that appeared to be a form of history. There is a reference to TypedURLs followed by a list of the URLs
that had been typed in by the user, rather than all the URLs that were reached by following links from other
pages. This data appears to come from Internet Explorer settings, as can be seen from the parts that contain
Cache_Update_Frequency and Save_Session_History.

1033A640  e.lnk................MRUListba....M.(...#.bhs-1.
1033A670  bmp...0...........hs-1.lnk........x.............
1033A6A0  ..TypedURLs............u
1033A6B8  rl1http://www.historyswatter.com/............url
1033A6E8  2http://www.pornmovies.com/............url3http:
1033A718  //www.hotmail.com/............url4http://www.sex
1033A748  pics.com/............url5http://www.msn.com2...s
1033A778  ...2...........Main............Anchor Underliney
1033A7A8  es............Cache_Update_FrequencyOnce_Per_Ses
1033A7D8  sion............Display Inline Imagesyes........
1033A808  ....Do404Search................Local PageC:\WIND
1033A838  OWS\SYSTEM\blank.htm............Save_Session_His
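The TypedURLs list recovered above is a remnant of values that Internet Explorer keeps in the registry. On a live Windows system the same list can be enumerated directly; the following is a minimal sketch using Python's winreg module (Windows only; the key path is the standard Internet Explorer location, not something recovered from this image).

import winreg

# Internet Explorer stores manually typed addresses as url1, url2, ... values
# under this per-user registry key.
TYPED_URLS_KEY = r"Software\Microsoft\Internet Explorer\TypedURLs"

with winreg.OpenKey(winreg.HKEY_CURRENT_USER, TYPED_URLS_KEY) as key:
    index = 0
    while True:
        try:
            name, value, _ = winreg.EnumValue(key, index)
        except OSError:
            break  # ran out of values
        print(name, value)
        index += 1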

History Swatter is a second-rate software program, similar to those previously tested in that it sufficiently
cleans the system for amateur use. However, any further forensic testing would reveal the user's history on
the machine.


CONCLUSION
In conclusion, the freeware and shareware evidence elimination software that claims to offer a
comprehensive elimination system fails to deliver on this promise. This demonstrates the expertise and
extensive effort needed to remove all traces of web browsing history. The most promising of the three
pieces of software tested was Window Washer, which successfully removed all visible traces of browser
history from a Windows environment. Both History Swatter and History Kill overlooked some areas from
which data should have been removed.
The software is marketed towards home or domestic users wanting to delete browser history for fear of
virus attack or of other users discovering their personal browsing history. For this purpose, the software is
satisfactory and will meet the needs of the user. It is not designed, however, to protect users within a
business that has a forensic capability. In this respect, the marketing claims of the software can be
misleading, as it claims to have removed items when in reality it has only removed them from sight of the
naked eye. Upon conclusion of this analysis, it would seem the best way to cover the tracks of a user would
be to not leave any in the first instance.


COPYRIGHT
Simon Innes 2005. The author/s assign the School of Computer and Information Science (SCIS) & Edith
Cowan University a non-exclusive license to use this document for personal use provided that the article is
used in full and this copyright statement is reproduced. The authors also grant a non-exclusive license to
SCIS & ECU to publish this document in full in the Conference Proceedings. Such documents may be
published on the World Wide Web, CD-ROM, in printed form, and on mirror sites on the World Wide Web.
Any other usage is prohibited without the express permission of the authors.


Turning a Linksys WRT54G into more than just a Wireless Router


Simon Innes
School of Computer and Information Science
Edith Cowan University
nme@arach.net.au

Abstract
This paper will discuss and analyse the ability of a Linksys wireless router to become an extremely useful
wireless tool. It will analyse the default setup for a Linksys WRT54G and its capabilities. It will then discuss
3rd party firmware available and the potential activities available using this firmware. The report will also
discuss several wireless tools, demonstrate them running on the router and discuss their potential uses. It
will attempt to outline the advantages and limitations of running different wireless tools on an embedded
device. The test environment will consist of a Linksys WRT54G, a laptop equipped with 2 wireless cards and
a PC connected to the router via a wired port.

Keywords
Wireless, WRT54G, Linksys, OpenWRT, Intrusion Detection

INTRODUCTION
One of the latest trends in technology and computer networking is the concept and implementation of
wireless connectivity. People use wireless devices in homes and offices because of the added convenience
and productivity to tasks. As the technology emerges, more sophisticated devices are being designed and
developed. Certain devices in particular give the user much control over the setup and utilization of the
device. This paper will look at the Linksys WRT54G and the way it can be changed into a powerful wireless
device.
The WRT54G was the Linksys flagship 802.11G wireless router in 2004. It is called a wireless router
because it is designed to carry out routing tasks between wireless devices and an internet connection. The
WRT54G also sports a 4-port switch which is useful when there is more than one wired device on a
network. The router supports all the features expected in a wireless device such as WEP key encryption,
MAC Address filtering, NAT routing, VPN passthrough and a built in DHCP server. These are easily
configured with a web administration tool. Another noteworthy feature is the two external RP-TNC antenna
ports that allow attachment of different, stronger antennas.
In creating the WRT54G, the decision was made to run an embedded version of Linux on it along with other
software licensed under the GNU General Public License (GPL). When open source developers were looking into the workings of
the WRT54G, it was noticed that not all of the modified GPL code had been supplied. Developers attempted
to contact Linksys to obtain the code, which they were legally entitled to due to its licensing under the GPL.
This code included changes to the Linux kernel, changes to components such as iptables and changes to
wireless kernel modules. When Linksys eventually released their modified code, developers were able to
look at, change and add an unlimited number of enhancements and features. Some went as far as to create
their own entire firmware for the router to the exclusion of the majority of the included Linksys firmware.
One such development was OpenWRT (OpenWRT, 2005).
OpenWRT differs from other firmware available for the WRT54G because it does not attempt to be an all
inclusive solution. For example, the Sveasoft (Sveasoft Incorporated, 2003) firmware called Alchemy
comes complete with a collection of features that the Linksys firmware does not include, all of which are
installed by default. With OpenWRT, upon loading the firmware, a very minimal Linux install occurs. From
this point, it is possible to download packages and modules to add the functionality desired. To some users,
this may appear excessive and inefficient; however, the ability to customise is appealing and useful to many users.


Due to the limited storage space on a WRT54G, OpenWRT attempts to utilise the space effectively by
storing the firmware on a compressed Squashfs partition and setting the remaining space up as a JFFS2
partition to allow for storage of new packages and configuration files. For a shell, OpenWRT uses Busybox.

Squashfs is a compressed, read-only file system designed for devices with limited storage and a need for
very small overhead. Squashfs can also be applied to individual files for archival purposes, as it is
documented to outperform tar and gzip archiving. Squashfs is an ideal choice of file system for storing the
firmware on the WRT54G as it frees space for other modules and features which may increase the
effectiveness of the device.
The Journalling Flash File System version 2, or JFFS2, descends from a file system created by the Swedish
company Axis Communications AB for use on flash devices. It is designed to manage files and space when
operating on a device with limited storage. JFFS2 aims to have minimal file overhead, allowing space to be
utilised more efficiently elsewhere. This makes it an ideal choice for the WRT54G due to the space
restrictions of the device.
Busybox (http://www.busybox.net) is a lightweight, customisable shell designed to run on embedded
systems. It is based on UNIX and GNU utilities to allow the user to create a familiar environment.
Generally, the options available will be less extensive than the complete versions, primarily because they
are deemed less useful and the space may be more economically used elsewhere. Busybox can be
customised at compile time to allow a user to add and remove features as required. Busybox is effectively
the shell interface users are presented with when logging onto a WRT54G running OpenWRT.
Installing the Firmware
Once the firmware has been identified, the next step is to obtain the firmware and install it. The WRT54G
used for these tests is Revision 2.0. This can be identified by examining the serial number of the router. At
the beginning of the serial number are the characters CDF50D, where the 5 represents Revision 2.0. The
firmware was retrieved from the OpenWRT website (http://openwrt.org/) and the experimental binary was
used. The install process uses an exploit located in the Linksys web interface. On the web interface is a
diagnostics page that allows the user to ping remote hosts. By issuing a ';' in the field that allows the user to
enter the target, the Linux shell treats what follows as the start of a new command line, so any command may
be entered. By redirecting the output of the command to /tmp/ping.log (e.g. ls -la / > /tmp/ping.log) the output
can be seen in the ping reply window once the command has finished. This is needed so that it is possible to
switch on what is known as boot_wait, an environment variable that tells the router how to proceed at boot
time. Normally, the boot loader and firmware are loaded in rapid succession; however, with this variable set,
there will be a pause allowing new firmware to be installed using tftp. Once this is done, a tftp client will need
to be configured to point at the router's diagnostic address (192.168.1.1) and be set to retry sending the
firmware, allowing it to continue trying to send until a connection is made. Once this has happened, the
router is rebooted and the new firmware will be automatically recognised and installed.
Upon booting the firmware for the first time a jffs2 partition will be created from the remaining space on the
device. Once the lights on the router stop flashing (DMZ and power in particular), the router is ready for
use. By default, a telnet server is started which can be connected to. On first connecting, this screen will be
displayed:
=== IMPORTANT ============================
Use 'passwd' to set your login password
this will disable telnet and enable SSH
------------------------------------------

BusyBox v1.00 (2005.05.25-20:30+0000) Built-in shell (ash)


Enter 'help' for a list of built-in commands.
  _______                     ________        __
 |       |.-----.-----.-----.|  |  |  |.----.|  |_
 |   -   ||  _  |  -__|     ||  |  |  ||   _||   _|
 |_______||   __|_____|__|__||________||__| |____|
          |__| W I R E L E S S   F R E E D O M
root@crankap:/#


It is stated in the OpenWRT documentation that the default telnet server has been purposely left without a
password to emphasise the insecurity of the telnet protocol. When first changing the password, telnet is
disabled and ssh is enabled automatically.
An issue of note is that the default install of OpenWRT includes no web interface. A package called
interface-wrt is available which provides basic configuration functionality, although some users prefer to
simply use the command line.

On conclusion of the basic install, the file system layout can be examined to appreciate how little storage
space is actually available.
root@crankap:~# mount
/dev/root on /rom type squashfs (ro)
none on /rom/dev type devfs (rw)
/dev/mtdblock/4 on / type jffs2 (rw)
none on /proc type proc (rw)
none on /dev type devfs (rw)
none on /tmp type tmpfs (rw)
none on /dev/pts type devpts (rw)
root@crankap:~# df -h
Filesystem       Size     Used  Available  Use%  Mounted on
/dev/root        1.0M     1.0M          0  100%  /rom
/dev/mtdblock/4  2.2M   364.0k       1.8M   16%  /
none             6.9M    12.0k       6.9M    0%  /tmp

As can be seen from this output, there is just under 2 megabytes of flash memory left to install packages to.
There is also 7 megabytes of RAM available for temporary storage, which will most likely be used for log
output.
The firmware also comes with several base packages installed:
root@crankap:~# ipkg list_installed
bridge - 1.0.6-1 - Ethernet bridging tools
busybox - 1.00-2 - Core utilities for embedded Linux systems
dnsmasq - 2.22-1 - A lightweight DNS and DHCP server
dropbear - 0.45-2 - a small SSH 2 server/client designed for small memory
environments.
ipkg - 0.99.145-1 - lightweight package management system
iptables - 1.3.1-1 - The netfilter firewalling software for IPv4
kmod-brcm-et - 2.4.30-1 - Proprietary driver for Broadcom Ethernet chipsets
kmod-brcm-wl - 2.4.30-1 - Proprietary driver for Broadcom Wireless chipsets
kmod-diag - 2.4.30-1 - Driver for Router LEDs and Buttons
kmod-ppp - 2.4.30-1 - PPP support
kmod-pppoe - 2.4.30-1 - PPP over Ethernet support
kmod-wlcompat - 2.4.30-1 - Compatibility module for using the Wireless Extension
with broadcom's wl
openwrt-utils - 1 - Basic OpenWrt utilities
ppp - 2.4.3-4 - a PPP (Point-to-Point Protocol) daemon (with MPPE/MPPC support)
ppp-mod-pppoe - 2.4.3-4 - a PPPoE (PPP over Ethernet) plugin for PPP
wireless-tools - 28.pre6-1 - Tools for setting up WiFi cards using the Wireless
Extension
zlib - 1.2.2-1 - an implementation of the deflate compression method (library)
Successfully terminated.

OpenWRT comes with an easy to use package management system called ipkg. ipkg stands for Itsy
Package Management System and is designed to be an rpm style system for flash based devices such as
PDAs. The packages are retrieved from a list of indexes that are downloaded from user specified locations.
This is a very effective and ideal solution for this type of device.
For the purpose of this paper, a list of tools has been selected to be loaded onto the device and tested.
These tools include Kismet, Snort, Nmap and wireless-tools.


Kismet
The first tool to be examined is Kismet. Kismet is a wireless sniffer that picks up 802.11 traffic. It is a
useful tool that will work with any wireless card which supports monitor mode, so with the Broadcom chipset
in the Linksys WRT54G this tool is ideal. Kismet can also work as an intrusion detection system, as it can
detect wireless scans (such as Netstumbler) and other suspicious activity.
Kismet was set up to run on the WRT54G to do a basic scan to attempt to detect any wireless devices in the
area. The wireless card on the laptop was set to Master mode to provide some expected results. Please note
that the output has been cut down to remove unnecessary data.

root@crankap:~# kismet_server
Enabling channel splitting.
Source 0 (wireless): Enabling monitor mode for wrt54g source interface prism0
channel 0...
Source 0 (wireless): Opening wrt54g source interface prism0...
Dropped privs to nobody (65534) gid 65534
Allowing clients to fetch WEP keys.
WARNING: Disabling GPS logging.
Writing data files to disk every 300 seconds.
Mangling encrypted and fuzzy data packets.
Tracking probe responses and associating probe networks.
Reading AP manufacturer data and defaults from /etc/ap_manuf
Reading client manufacturer data and defaults from /etc/client_manuf
Dump file format: wiretap (local code) dump
Crypt file format: airsnort (weak packet) dump
Kismet 2005.04.R1 (WRT-Kismet)
Listening on port 2501.
Gathering packets...
Sat Jan 1 01:40:00 2000 Found new network "belkin54g" bssid 00:11:50:32:88:77
WEP Y Ch 11 @ 54.00 mbit
Sat Jan 1 01:40:24 2000 Found new network "<no ssid>" bssid 00:0F:66:AA:1C:96
WEP N Ch 0 @ 0.00 mbit
Sat Jan 1 01:40:24 2000 Found new probed network "<no ssid>" bssid
00:0E:35:41:6F:98
Sat Jan 1 01:40:31 2000 Found new probed network "<no ssid>" bssid
00:04:47:00:14:7C
Sat Jan 1 01:40:31 2000 Associated probe network "00:04:47:00:14:7C" with
"00:11:50:32:88:77" via probe response.
Sat Jan 1 01:40:50 2000 Associated probe network "00:0E:35:41:6F:98" with
"00:11:50:32:88:77" via probe response.
Sat Jan 1 01:41:00 2000 Found new network "FAKE" bssid 00:09:5B:EA:DD:E6 WEP N
Ch 1 @ 11.00 mbit

From this output it can be seen that two named networks were detected. FAKE was the laptop sitting
in master mode. An interesting point of note is that the program did not detect its own SSID:
00:0F:66:AA:1C:96 is the MAC address of the router and was detected as <no ssid>. A network scan was
run on the laptop to see if the outcome was similar; however, in this instance the SSID was displayed. For
the purpose of testing, the MAC address on the laptop card was incremented by one and the SSID was
changed. The results were as follows:
Sat Jan 1 01:50:27 2000 Found new network "ALSO-FAKE" bssid 00:09:5B:EA:DD:E7
WEP N Ch 11 @ 11.00 mbit
ALERT Sat Jan 1 01:50:45 2000 Beacon on 00:09:5B:EA:DD:E7 (ALSO-FAKE) for
channel 2, network previously detected on channel 11

As seen in this instance, the MAC address now ends in E7 rather than E6. Finally, for the purpose of
completeness, fakeap was run on the laptop. Fakeap is a simple Perl script that randomly generates MAC
addresses and changes the SSID of a wireless card while in master mode. This is how Kismet on the
WRT54G displayed the findings:


Sat Jan 1 01:51:55 2000  Found new network "<no ssid>" bssid 00:00:0C:02:F6:19 WEP N Ch 0 @ 0.00 mbit
Sat Jan 1 01:51:55 2000  Found SSID "zoar" for network BSSID 00:00:0C:02:F6:19
Sat Jan 1 01:51:55 2000  Found new network "<no ssid>" bssid 00:00:0C:15:C6:F8 WEP N Ch 0 @ 0.00 mbit
Sat Jan 1 01:51:55 2000  Found SSID "tulley" for network BSSID 00:00:0C:15:C6:F8
Sat Jan 1 01:51:56 2000  Found new network "redbook" bssid 00:00:CE:5F:45:BD WEP N Ch 6 @ 11.00 mbit
Sat Jan 1 01:51:56 2000  Found new network "<no ssid>" bssid 00:00:0C:76:5C:F8 WEP N Ch 0 @ 0.00 mbit
Sat Jan 1 01:51:56 2000  Found SSID "tarra" for network BSSID 00:00:0C:76:5C:F8
Sat Jan 1 01:51:56 2000  Found new network "rivaherrera" bssid 00:00:CE:52:C3:61 WEP N Ch 3 @ 11.00 mbit
Sat Jan 1 01:51:57 2000  Found new network "sam-houston" bssid 00:00:CE:21:26:41 WEP N Ch 6 @ 11.00 mbit
Sat Jan 1 01:51:57 2000  Found new network "<no ssid>" bssid 00:00:CE:C3:6E:48 WEP N Ch 0 @ 0.00 mbit
Sat Jan 1 01:51:57 2000  Found SSID "faretheewell" for network BSSID 00:00:CE:C3:6E:48
Sat Jan 1 01:51:57 2000  Found new network "<no ssid>" bssid 00:00:0C:19:B9:2D WEP N Ch 0 @ 0.00 mbit
Sat Jan 1 01:51:57 2000  Found new network "<no ssid>" bssid 00:00:CE:64:15:24 WEP N Ch 0 @ 0.00 mbit
Sat Jan 1 01:51:57 2000  Found SSID "locusts" for network BSSID 00:00:CE:64:15:24

As can be seen from this, Kismet was successful in detecting the networks; unfortunately it was unable to
establish that these access points were not real. An issue observed whilst running Kismet on the
WRT54G was that the amount of space available for logging was very restricted. If fakeap had been
left running, the /tmp directory would have filled and the server would have stopped running.
Snort
The next tool looked at is called Snort. Snort is an open source Intrusion Detection System (IDS). Snort is a
lightweight program, making it ideal for use on the WRT54G. Another advantage of Snort is that it is free,
whereas commercial IDS software can carry expensive licensing fees. Snort is a rule based IDS, meaning it will
analyse each packet and determine, from a list of rules, whether or not the packet is malicious. It is
convenient to have Snort running on a router, as it examines all traffic coming in and out of the network.
This makes detection of attacks and intrusions from both inside and outside the network effective. In
the OpenWRT firmware, there are no data analysis tools available for Snort. This is acceptable, as the large
volume of data analysis would prove difficult for the WRT54G to manage. Another problem is that the
amount of logging Snort can perform is limited by the storage size of the device. One solution to this
potential problem is to install a package that allows Snort data to be logged to a MySQL database and
point it at a database on another machine. The second machine could also run the data analysis tools.
NMap
Another tool that could be used to increase the usefulness of the WRT54G is called Nmap. Nmap is short
for Network Mapper and is an open source tool for scanning networks. This tool has the ability to detect
how many systems are in a network, what the IP addresses are and what services are running. It can also
determine what operating systems are running on each host. For the purpose of this paper, Nmap will be
used for simple network and port scans as shown below:
root@crankap:~# nmap -v -sT 192.168.0.0/24
Starting nmap 3.81 ( http://www.insecure.org/nmap/ ) at 2000-01-01 02:17 UTC
Initiating Connect() Scan against 192.168.0.1 [1663 ports] at 02:17
The Connect() Scan took 4.11s to scan 1663 total ports.
Host 192.168.0.1 appears to be up ... good.
Interesting ports on 192.168.0.1:
(The 1659 ports scanned but not shown below are in state: closed)
PORT   STATE SERVICE
22/tcp open  ssh
23/tcp open  telnet
53/tcp open  domain
80/tcp open  http

The Connect() Scan took 16.42s to scan 6652 total ports.
Host 192.168.0.3 appears to be up ... good.
Interesting ports on 192.168.0.3:
(The 1656 ports scanned but not shown below are in state: closed)
PORT     STATE SERVICE
80/tcp   open  http
135/tcp  open  msrpc
139/tcp  open  netbios-ssn
443/tcp  open  https
445/tcp  open  microsoft-ds
1025/tcp open  NFS-or-IIS
3389/tcp open  ms-term-serv
MAC Address: 00:11:D8:4C:52:16 (Asustek Computer)
Host 192.168.0.10 appears to be up ... good.
Interesting ports on 192.168.0.10:
(The 1660 ports scanned but not shown below are in state: closed)
PORT    STATE SERVICE
22/tcp  open  ssh
111/tcp open  rpcbind
631/tcp open  ipp
MAC Address: 00:0E:35:41:6F:98 (Intel)
Host 192.168.0.11 appears to be up ... good.
Interesting ports on 192.168.0.11:
(The 1660 ports scanned but not shown below are in state: closed)
PORT    STATE SERVICE
22/tcp  open  ssh
111/tcp open  rpcbind
631/tcp open  ipp
MAC Address: 00:0E:35:41:6F:98 (Intel)
Host 192.168.0.254 appears to be up ... good.
Interesting ports on 192.168.0.254:
(The 1648 ports scanned but not shown below are in state: closed)
PORT     STATE SERVICE
21/tcp   open  ftp
22/tcp   open  ssh
23/tcp   open  telnet
25/tcp   open  smtp
53/tcp   open  domain
80/tcp   open  http
110/tcp  open  pop3
111/tcp  open  rpcbind
139/tcp  open  netbios-ssn
143/tcp  open  imap
445/tcp  open  microsoft-ds
515/tcp  open  printer
901/tcp  open  samba-swat
3128/tcp open  squid-http
3306/tcp open  mysql
MAC Address: 00:90:27:35:3B:CD (Intel)
Nmap finished: 256 IP addresses (5 hosts up) scanned in 64.999 seconds
Raw packets sent: 1010 (34.3KB) | Rcvd: 15 (468B)

This scan shows that there are 4 machines and 5 network interfaces on the network at the time of
the scan. A point of note is that there is still a telnet service open on the WRT54G (192.168.0.1) even
though this appeared to have been closed when a password was set. When attempting to run an operating
system fingerprint, Nmap was unable to initialise:
root@crankap:~# nmap -O 192.168.0.0/24
Killed


Aircrack
A tool which may prove to be more malicious than useful is Aircrack. Aircrack is designed to be a WEP key
cracker. It works by sniffing wireless traffic and taking note of all the encrypted packets. It then takes each
packet's initialisation vector (IV) and uses them to attempt to establish the WEP key. The more unique IVs
that are collected, the greater the possibility of cracking the WEP key. The process begins by running a
piece of software called Airodump, which is a wireless sniffer. Its basic function is to sniff for encrypted
packets, establish which network they belong to and log them. This can be run for as long as needed, with
more unique IVs being detected the longer it runs. When run from the WRT54G, the output looked like this:
BSSID              CH  MB  ENC  PWR  Packets  LAN IP / # IVs  ESSID
00:00:0C:0D:EA:B7   1  11  WPA    0        1                  FAKE
00:0F:66:AA:1C:96  -1  -1         0       16
00:11:50:32:88:77  11  48  WEP    0     5099  2565            belkin54g

In the space of just a few minutes, airodump retrieved 2565 unique IVs from the belkin54g network. Once
the captured packets have been saved, aircrack needs to be run on the file to attempt to crack the WEP key.
Unfortunately, the WRT54G does not have the processing power or memory available to be able to run
aircrack.
root@crankap:~# aircrack /tmp/test.cap
malloc(80 MB) failed

This is to be expected, as much processing power is needed to sort through all of the data. Another
limitation is, again, storage space. The 2500 IVs that were detected by airodump produced a file of almost 2
megabytes. This means that 10 000 IVs would fill the router's temporary storage. This becomes
problematic as it takes at least 150 000 unique IVs to reliably crack a 40-bit WEP key and around 500 000 to 1
million IVs for a 104-bit WEP key.
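The storage limitation can be made concrete with some simple arithmetic based on the figures above (roughly 2 MB of capture for about 2,500 IVs, a /tmp of about 7 MB, and the IV requirements just quoted); a small sketch:

# Rough capacity estimate using the figures observed in this section.
bytes_per_iv = (2 * 1024 * 1024) / 2500        # ~2 MB of capture for ~2,500 IVs
tmpfs_bytes = 7 * 1024 * 1024                  # ~7 MB of RAM-backed /tmp on the router

ivs_that_fit = tmpfs_bytes / bytes_per_iv
print(f"IVs that fit in /tmp: about {ivs_that_fit:,.0f}")

for label, needed in [("40-bit WEP", 150_000),
                      ("104-bit WEP (low estimate)", 500_000),
                      ("104-bit WEP (high estimate)", 1_000_000)]:
    print(f"{label}: ~{needed:,} IVs needed, "
          f"or roughly {needed / ivs_that_fit:.0f} times the available temporary storage")

This is consistent with the observation that around 10 000 IVs would exhaust the router's temporary storage.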
Wireless-tools
The last package to be tested is known as wireless-tools. Using this, the WRT54G will disguise itself as a
different access point to investigate how the laptop running Kismet will react. From previous examples it
would seem as though the belkin54g network is nearby and usable. The SSID is belkin54g and the MAC
address is 00:11:50:32:88:77.
root@crankap:~# ifconfig eth1 down
root@crankap:~# ifconfig eth1 hw ether 00:11:50:32:88:77
root@crankap:~# ifconfig eth1 up
root@crankap:~# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:11:50:32:88:77
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:57573 errors:0 dropped:0 overruns:0 frame:9521
          TX packets:91347 errors:7668 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3112315 (2.9 MiB)  TX bytes:81200011 (77.4 MiB)
          Interrupt:4 Base address:0x1000
root@crankap:~# iwconfig eth1 essid b3lkin
root@crankap:~# iwconfig eth1
eth1      IEEE 802.11-DS  ESSID:"b3lkin"
          Mode:Master  Frequency:2.462 GHz  Access Point: 00:11:50:32:88:77
          Tx-Power:22 dBm
          RTS thr=2347 B   Fragment thr=2346 B
          Encryption key:0000-0000-0000-0000-0000-0000-0000-0000
Once that is done, Kismet is run on the laptop:


Sat May 28 01:20:59 2005  Found SSID "belkin54g" for network BSSID 00:11:50:32:88:77
Sat May 28 01:20:59 2005  Found SSID "b3lkin" for network BSSID 00:11:50:32:88:77
Sat May 28 01:21:00 2005  Found SSID "belkin54g" for network BSSID 00:11:50:32:88:77
Sat May 28 01:21:00 2005  Found SSID "b3lkin" for network BSSID 00:11:50:32:88:77
Sat May 28 01:21:00 2005  Found SSID "belkin54g" for network BSSID 00:11:50:32:88:77
Sat May 28 01:21:01 2005  Found SSID "belkin54g" for network BSSID 00:11:50:32:88:77
Sat May 28 01:21:02 2005  Found SSID "b3lkin" for network BSSID 00:11:50:32:88:77
Sat May 28 01:21:02 2005  Found SSID "belkin54g" for network BSSID 00:11:50:32:88:77
Sat May 28 01:21:02 2005  Found SSID "b3lkin" for network BSSID 00:11:50:32:88:77
Sat May 28 01:21:03 2005  Found SSID "belkin54g" for network BSSID 00:11:50:32:88:77
As can be seen, it has become difficult to know which is the actual access point. Kismet changes which
device it sees as the access point based on the most recently received packet. This could prove interesting
if a user were to create a fakeap-style script for the WRT54G.
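As an illustration only, a script of that kind could loop over the same ifconfig/iwconfig commands shown earlier with randomised values. The sketch below assumes an interpreter is available to run it (on the real device a Busybox shell script would be the more practical choice); the interface name, address prefix and SSID list are arbitrary examples.

import random
import subprocess
import time

INTERFACE = "eth1"                                      # wireless interface used earlier
SSIDS = ["belkin54g", "b3lkin", "linksys", "default"]   # arbitrary example SSIDs

def random_mac(prefix="00:11:50"):
    """Generate a MAC address with an arbitrary example prefix."""
    return prefix + ":" + ":".join(f"{random.randint(0, 255):02X}" for _ in range(3))

while True:
    mac, ssid = random_mac(), random.choice(SSIDS)
    # The same sequence of commands demonstrated above, driven programmatically.
    subprocess.run(["ifconfig", INTERFACE, "down"], check=False)
    subprocess.run(["ifconfig", INTERFACE, "hw", "ether", mac], check=False)
    subprocess.run(["ifconfig", INTERFACE, "up"], check=False)
    subprocess.run(["iwconfig", INTERFACE, "essid", ssid], check=False)
    time.sleep(5)   # hold each fake identity briefly before changing again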

As useful as the WRT54G undeniably is, there are several limitations. The primary problem is the lack of
storage space. A possible remedy may lie in setting up a Samba or CIFS share on a machine elsewhere
and mounting it on the WRT54G to be used as storage space. In the case of Snort, a plugin is available
to allow it to log to a MySQL server. The other limitation is the lack of memory and processing power for
carrying out any calculation-heavy tasks. Again, this can be remedied by moving the data to a larger machine
and carrying out the functions there.

CONCLUSION
The Linksys WRT54G can be turned into a powerful wireless tool. There is potential for the router to be
used as an IDS, a wireless scanner or a rogue access point. With the large amount of customisation that
OpenWRT offers, the possibilities are endless. When evaluating the features and applications available on
the WRT54G, it is apparent that the device, which retails at around $150AU, is an economical, reliable and
quality option. Whilst the WRT54G carries out the aforementioned tasks using 3rd party firmware, an
interesting comparison might involve a cost analysis of other devices built specifically to perform these
tasks. Future work with this router would include setting up an NFS server on the same network as the
router and then running applications that require disk space for logging, such as snort and kismet. The
WRT54G can also be configured to handle menial tasks on a network such as DNS and DHCP. Further
investigation may lead to using the WRT54G as a complete wireless defence solution.

REFERENCES
Anderson, E. (2005). BusyBox: The Swiss Army Knife of Embedded Linux. Retrieved 18 May, 2005 from
http://www.busybox.net/about.html
Bull, D. (2003). iPKG the Itsy Package Management System. Retrieved 20 May, 2005 from http://www.ukdave.com/tutorials/zaurus/ipkg.shtml
Cisco Systems. (2005). Wireless-G Broadband Router:WRT54G. Retrieved 19 May, 2005 from
http://www.linksys.com/international/product.asp?coid=19&ipid=452
Davis, Z. (2001). JFFS a GPL Journaling Flash File System. Retrieved 15 May, 2005 from
http://www.linuxdevices.com/links/LK6391004496.html
Devine, C. (2005). Aircrack documentation. Retrieved 20 May, 2005 from
http://www.cr0.net:8040/code/network/aircrack/#q40
Flickenger, R. (2003). Is Linksys shirking the GPL? (Maybe not.) Retrieved 17 May, 2005 from
http://www.oreillynet.com/pub/wlg/3580
Fyodor. (2005). Nmap Security Scanner. Retrieved 19 May, 2005 from http://www.insecure.org/nmap/
GNU Project. (2005). GNU General Public Licence. Retrieved 17 May, 2005 from
http://www.gnu.org/copyleft/gpl.html


Kershaw, M. (2005). Kismet: Documentation. Retrieved 20 May, 2005 from


http://www.kismetwireless.net/documentation.shtml
Martin P. (2005). Configuring OpenWRT as a Wireless Client. Retrieved 14 May, 2005, from
http://martybugs.net/wireless/openwrt/client.cgi
Miklas, A. (2003). Linksys WRT54G and the GPL. Retrieved 18 May, 2005 from
http://www.uwsg.iu.edu/hypermail/linux/kernel/0306.0/1758.html
Open Wrt. (2005). Open Wrt Docs. Retrieved 20 May, 2005 from http://openwrt.org/OpenWrtDocs
Pavlov, A. (2005) What is SquashFS. Retrieved 16 May, 2005 from http://tldp.org/HOWTO/SquashFSHOWTO/whatis.html
Pirie, S. (2005). WRT54G Version Differences. Retrieved 15 May, 2005, from
http://www.linksysinfo.org/modules.php?name=News&file=article&sid=18
Roesch, M. (2005). About Snort. Retrieved 19 May, 2005 from http://www.snort.org/about_snort/
Russel, R. (1999). Using iptables. Retrieved 15 May, 2005 from http://www.telematik.informatik.unikarlsruhe.de/lehre/seminare/LinuxSem/downloads/netfilter/iptables-HOWTO.html#toc6
Seattle Wireless. (2005). LinksysWrt54g. Retrieved 14 May, 2005, from
http://www.seattlewireless.net/index.cgi/LinksysWrt54g
Sveasoft Incorporated. (2003). Sveasoft. Retrieved 20 May, 2005, from http://www.sveasoft.com/
Woodhouse, D. (2001). JFFS: The Journaling Flash File System. Retrieved 16 May, 2005 from
http://sources.redhat.com/jffs2/jffs2-html/jffs2-html.html

COPYRIGHT
Simon Innes 2005. The author/s assign the School of Computer and Information Science (SCIS) & Edith
Cowan University a non-exclusive license to use this document for personal use provided that the article is
used in full and this copyright statement is reproduced. The authors also grant a non-exclusive license to
SCIS & ECU to publish this document in full in the Conference Proceedings. Such documents may be
published on the World Wide Web, CD-ROM, in printed form, and on mirror sites on the World Wide Web.
Any other usage is prohibited without the express permission of the authors.


After Conversation - A Forensic ICQ Logfile Extraction Tool


Kim Morfitt and Craig Valli
Edith Cowan University, School of Computer and Information Science
Email k.morfitt@student.ecu.edu.au
Email c.valli@ecu.edu.au

Abstract
Instant messenger programs such as ICQ are often used by hackers and criminals for illicit purposes and
consequently the logfiles from such programs are of forensic interest. This paper outlines research in
progress that has resulted in the development of a tool for the extraction of ICQ logfile entries. Detailed
reconstruction of data from logfiles was achieved with a number of different versions of the ICQ software,
with other programs still to be tested. There are several limitations, including timestamp information that is
not adjusted for the time zone, the possibility that data could be altered, and the need to reconstruct
conversations manually. Future research will aim to address these and other limitations as pointed out in this paper.

Keywords ICQ, instant messaging, logfile, forensic, extraction


INTRODUCTION
ICQ is an instant messaging system by which users can send messages and files to other users, as well as
interact in various other ways such as voice chat and the use of web cams. The official ICQ client logs all
conversations and other interactions between users to file in a binary format. There are several different
versions of the official ICQ client, with differences between the log files produced by different versions.
This variety of logfile formats and software versions, together with the binary nature of the logfiles, makes
forensic analysis of these files difficult and time consuming.
Hackers and criminals often use instant messenger programs such as ICQ for illicit purposes and
consequently the logfiles from such programs are of forensic interest (Mosnews, 2005;Poulsen 2005). If
ICQ itself is not used directly in committing an offence its log files may contain valuable corroborative
evidence (Anonymous, 2002).
Prior to this project, some work had gone into working out the format of the database files by Soeder (2000),
with further work by Strickz (2002). Their work documents the structures of the files and most of the
components of the data structures used in the binary logfiles, and provides good insight into the structure
and purpose of the two different log file types identified. While these documents impart information about
the structure of the log files, they contain no information on how to extract that information, nor algorithms
to verify log file information; they describe only the data structures in the database.
A program called icqhr (HitU, 2002) outputs log file information to HTML, rather than XML, and was used to
assist in checking the accuracy of the system under development. The tool was black box tested and it
accurately outputs the logfile contents. Although it was created by a Russian hacking group, icqhr's value as a
reference was invaluable. This paper outlines research in progress that has resulted in the development of a tool for
the extraction of ICQ logfile entries.

ICQ LOGFILE FORMAT BASICS


Analysis of the IDX file was easily achieved as the file layout is simple, with only a few data structures that
required some interpretation. An IDX file is so named because of the file extension for this file type: the file
name consists of the user's UIN (User Identification Number), used to log on to the ICQ service, followed by
the .idx extension.
An IDX file was opened in a hex editor and the information in the file was compared to the
documentation that was downloaded. This illuminated the fact that the files are written in little endian
format (least to most significant byte). Manual analysis of an IDX file is a time consuming process, even for
a small number of entries, and there is a limited amount of information that can be gained from the file. The
garnered information, however, could be used for cross-referencing and verification of DAT file entries.
To verify the information gained from the analysis was correct, a test program called IDX Reader was
created to extract information from the IDX files. The purpose of this program was to use the information


gathered to extract all the entries from an IDX file and store the information in a linked list. From there the
information could be output to screen showing the information that was extracted.
Possible reconstruction of log files from fragments
This paper presents ideas regarding possible methods for reconstructing ICQ log files. First a caveat:
because not everything is known about the log files, and because of the limited testing data, some of the
information presented here may not work in all conditions or versions of ICQ, even though the IDX file
structures appear to be the same over all versions of ICQ that implement this method of logging. The files
used when forming these techniques had little or no previous data deleted from them, i.e. they may not be
representative of extended and extensive use by a user. It may be that deleted data affects the structure of the
IDX files, and the ordering of these records. This would affect attempts to reconstruct IDX files.
IDX files contain a doubly linked list of offsets to valid entries in the DAT file. These entries are numbered.
Once the corresponding entry is located in the DAT file and identified by its number, its position within the
DAT file is known from the offset given in the IDX file. There are two possible paths to take with IDX
files. One is to reconstruct the file; the other is to extract whatever data can be recovered from the available
fragments. The main reason for reconstructing an IDX file is to allow reconstruction of DAT files for
forensically sound analysis.
Reconstruction of an IDX
Reconstruction of an IDX file is a relatively straightforward task if all the fragments of the file are available,
which for explanation of the process is assumed. The IDX file itself contains a file header, page headers and
the entries themselves.
The file header is 20 bytes in size. It has the format shown in figure 1. As shown, it has a signature of 12
bytes at the start of the header that can be searched for. That signature will be at the start of a file fragment,
as it is also the start of an IDX file. A page header will immediately follow the file header.

=========================================================
== Format of IDX main header (20 BYTES):
=========================================================
00000000  3 LONGS  Unknown, but always 4, 20, 8.
                   (04,00,00,00,14h,00,00,00,08,00,00,00)
0000000C  LONG     IDX pointer to root entry
00000010  LONG     ICQ database version
                     10 = ICQ 99a
                     14 = ICQ 99b
                     17 = ICQ 2000a
                     18 = ICQ 2000b
                     19 = ICQ 2001a, 2001b, 2002a, 2003a
00000014  --- Start of first IDX page header
Figure 1: Format of IDX file header (Strickz, 2002)
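As an illustration of how little work is involved in reading this header, the following is a minimal Python sketch based only on the fields documented in figure 1 (the tool described in this paper was written in C++; the function name here is illustrative):

import struct

# Database version values documented in figure 1.
ICQ_VERSIONS = {10: "ICQ 99a", 14: "ICQ 99b", 17: "ICQ 2000a",
                18: "ICQ 2000b", 19: "ICQ 2001a/2001b/2002a/2003a"}

def read_idx_main_header(path):
    """Parse the 20-byte IDX main header (five little-endian LONGs)."""
    with open(path, "rb") as f:
        unknown1, unknown2, unknown3, root_ptr, version = struct.unpack("<5l", f.read(20))
    return {"unknown": (unknown1, unknown2, unknown3),   # documented as always 4, 20, 8
            "root_entry_pointer": root_ptr,
            "version": ICQ_VERSIONS.get(version, "unknown (%d)" % version)}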
The page headers are 205 bytes in size and are always separated by space for 1000 linked list entries
(20Kb). The format for a page header is shown in figure 2. Again the entry begins with a series of bytes, 20
bytes in all that can be used as a signature to locate page headers.


=========================================================
== Format of IDX page header (205 BYTES):
=========================================================
00000000  5 LONGS    Unknown, but always 201, 0, 0, 0, 0.
00000014  LONG       Pointer to next page header. -1 if this is the last page.
00000018  LONG       Unknown, always 1?
0000001C  LONG       Number of bytes in each slot (20)
00000020  LONG       Number of fragments in the page with one or more
                     consecutive free slots.
00000024  LONG       Number of empty slots in this page.
00000028  10 LONGS   Unknown, always 0?
00000050  125 BYTES  Allocation bitmap
000000CD  --- 1000 list entries (slots)
Figure 2: Format of IDX page header (Strickz, 2002)
Initially, the IDX file is created with a file header at offset 0h, followed by a page header at offset 14h. This
is followed by space for 1000 20-byte entries, making the initial file size 20225 bytes. Additional
space for entries is allocated as required. Allocation is done by appending an additional page header, plus
space for 1000 entries. From this we know that:
- The size of an IDX file will be a multiple of 20205 bytes plus 20 bytes.
- Beginning at offset 14h, every 20205 bytes there will be a page header.
If all the page headers in the fragments are located, then the file size can be calculated. This is done by
multiplying the number of page headers by 20205 and adding 20 to get the size of the file. This will only
work if all page headers are properly identified.
Alternatively, the highest-valued next pointer can be located and 20205 bytes added to it to get the expected
file size. The page headers can be ordered in ascending order by their next pointers. The first page header is
always located at offset 14h, and the remainder are located at intervals of 20205 bytes. To locate the position
of a page header, subtract 4EEDh (20205) from its next pointer to find the offset at which the header begins.
All the page headers can be positioned in this manner. It is possible to locate the position of the remaining
fragments by using the positioned fragments containing the page headers. Locate a valid linked list record
near the end of a previously positioned fragment. The format of the IDX linked list entry is shown in figure 3.

=========================================================
== Format of IDX linked list entry (20 BYTES each):
=========================================================
00000000  LONG  Entry status? : -2 = valid IDX entry
00000004  LONG  DAT entry number
00000008  LONG  IDX pointer to next entry (-1 = none)
0000000C  LONG  IDX pointer to previous entry (-1 = none)
00000010  LONG  DAT pointer to corresponding DAT entry (-1 = none)
Figure 3: Format of IDX Linked List Entry (Strickz, 2002)
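A minimal sketch of decoding one such 20-byte slot, again following the layout in figure 3; the offset would normally come from a page header or from another entry's next pointer (function and field names are illustrative):

import struct

VALID_ENTRY_STATUS = -2   # stored on disk as FE FF FF FF

def read_idx_entry(f, offset):
    """Read one 20-byte IDX linked list entry at the given file offset."""
    f.seek(offset)
    status, dat_number, next_ptr, prev_ptr, dat_ptr = struct.unpack("<5l", f.read(20))
    if status != VALID_ENTRY_STATUS:
        return None                      # slot is unused or not a valid entry
    return {"dat_entry_number": dat_number,
            "next": next_ptr,            # -1 = no next entry
            "previous": prev_ptr,        # -1 = no previous entry
            "dat_offset": dat_ptr}       # -1 = no corresponding DAT entry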


It should be simple enough to order the remaining fragments by the DAT entry numbers in the linked list
records contained in the fragments. The higher entry number fragments go towards the end of the file and it
should be possible to reconstruct the remaining fragments into the file in that manner.
If it is possible to search for information in fragments at certain offsets, there is a second method that could
identify the correct fragments and be used to automate the reconstruction. This method would use the last
linked list record in a fragment that has already been positioned in the reconstructed file, noting its next
pointer. This pointer would likely point to a linked list entry in the next fragment, and the offset into that
fragment can be calculated. The signature of a linked list entry is known: it is FE FF FF FF (-2) in the first 4
bytes of the entry. The signature can be used to search for an entry at that position in all of the fragments.
If an entry is found there, and it is in the next fragment, its pointer to the previous linked list entry would
contain the offset of the last entry in the previous fragment (being the linked list entry used to find the next
file fragment).
Extracting IDX information without Reconstruction
The required data can be extracted without reconstructing the file; however, it may require additional
checks to identify valid records. The only identifier of a valid record is the signature FE FF FF FF (-2).
Assuming that the signature is not found in any other position in the file (it may possibly be found in the
page header bitmaps), it should be possible to dump all the valid records out to another file.
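A sketch of that signature-driven dump, scanning a raw blob (for example a recovered fragment) for the FE FF FF FF marker and decoding each hit as a 20-byte slot; as noted, further checks would be needed to weed out false positives such as matches inside page-header bitmaps:

import struct

SIGNATURE = b"\xFE\xFF\xFF\xFF"   # -2, the status value of a valid IDX entry

def carve_idx_entries(data):
    """Yield (offset, entry) pairs for every candidate linked-list entry found in data."""
    pos = data.find(SIGNATURE)
    while pos != -1 and pos + 20 <= len(data):
        status, dat_num, nxt, prev, dat_off = struct.unpack_from("<5l", data, pos)
        # Only a cheap plausibility check here; real validation would cross-reference
        # the DAT entry number and offset against the DAT file itself.
        if dat_num >= 0:
            yield pos, {"dat_entry_number": dat_num, "next": nxt,
                        "previous": prev, "dat_offset": dat_off}
        pos = data.find(SIGNATURE, pos + 4)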
As a result of analysis, three data structures were found in the IDX file namely file header, page header and
the actual data. The file itself was found to be highly structured with predictable characteristics. Most of this
information was already documented in the downloaded information, however there were a few
observations made that could not be investigated and explained in the time available. The first data
structure found was a file header. The file header contains some unknown information, a pointer to the first
record data structure, and the version of ICQ database. The file header is always found at the start of an IDX
file and is 20 bytes in length. Immediately after the file header is the second data structure, called a page
header. A page consists of a 205 byte page header and space allocated for 1000 data record entries, making
a total page size of 20205 bytes. When an IDX file is initially created, it is created with a file header and a
single page, making a size of 20225 bytes.
The page header itself consists of some unknown information, but includes a pointer to the next page header
(or -1 if it is the last page header), and accounting information about record sizes, number of record slots
used and what record slots are allocated.
New pages appear to be allocated on an as needed basis, as the current page in the file is filled. As page
sizes are always exactly the same, if the number of pages is known, the exact file size can be calculated.
This is useful if the file is being recovered after being deleted, however it does require that the last two page
headers be located in fragments that can be pieced together, to create a single portion of the file.
The reason for this is that as the pages are always the same size, the page headers are known to occur at
exact intervals in the file. A page header that has a pointer to a subsequent page header can have its own
position calculated by subtracting 20205 from the pointer value to gain the position of that page header.
This can not be done with the last page header as its pointer value is -1. Hence the previous page header is
required to get the pointer value of the start of the last page. Adding 20205 to that file offset gives the exact
size of the IDX file.
File fragments containing page headers can be positioned by ordering them by the values of their pointers to
the next file. This is useful when reconstructing log files as the exact position of the start of each page can
be calculated.
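These size and position rules reduce to simple arithmetic. A short sketch, assuming the page headers have been identified as described (names are illustrative):

FILE_HEADER_SIZE = 20        # IDX main header
PAGE_SIZE = 20205            # 205-byte page header + 1000 slots of 20 bytes (0x4EED)
FIRST_PAGE_OFFSET = 0x14

def expected_idx_size(page_count):
    """Exact size of a complete IDX file containing the given number of pages."""
    return FILE_HEADER_SIZE + page_count * PAGE_SIZE

def page_header_offset(next_pointer):
    """A page header's own offset, derived from its pointer to the next page header."""
    return next_pointer - PAGE_SIZE

def all_page_offsets(page_count):
    """Offsets at which page headers must sit in a fully reconstructed file."""
    return [FIRST_PAGE_OFFSET + i * PAGE_SIZE for i in range(page_count)]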
The third data structure is the data records themselves. The data records are arranged in a doubly linked list
of valid record entries, with each record storing the position in the IDX file of the previous and next record
in the linked list. The other information stored in the record is the status of the entry, the DAT entry number
and the file offset within the DAT file of that record. The IDX file can be used to find any and all valid
entries in the DAT file, which contains the logged information. This information can be used to help
recreate DAT files from fragments, and also to help recreate IDX files, due to the file offsets that are stored.
The IDX Reader program verified the information as correct, showing that a correct algorithm had been
created for extracting IDX information. While this information was not essential to the project, it did
provide the following benefits:


1. Assisted with locating records in the DAT file for analysis
2. Understanding of the way information was stored in binary files
3. Verify programming techniques used for extraction of information
4. Provide algorithms that could be used to provide further verification of data extracted from the DAT files
5. Provide a tool to assist in manual analysis of the DAT files.

ANALYSIS OF THE DAT FILE


DAT File Recovery
DAT file recovery should be simple once enough records have been extracted from the IDX file. The record
signatures in figure 4 could be used to locate records and their entry numbers. The extracted linked list
entries could then be searched for matching entries, which would have the offset of the DAT entry. The list
presented in figure 4 may not be a comprehensive list of all DAT file entries, but Strickz (2002) says
that it covers most entry types.

Various messages:  E0,23,A3,DB,DF,B8,D1,11,8A,65,00,60,08,71,A3,91
Chat request:      E1,23,A3,DB,DF,B8,D1,11,8A,65,00,60,08,71,A3,91
File request:      E2,23,A3,DB,DF,B8,D1,11,8A,65,00,60,08,71,A3,91
My Details:        E4,23,A3,DB,DF,B8,D1,11,8A,65,00,60,08,71,A3,91
Contact:           E5,23,A3,DB,DF,B8,D1,11,8A,65,00,60,08,71,A3,91
Reminder:          E6,23,A3,DB,DF,B8,D1,11,8A,65,00,60,08,71,A3,91
Note:              EE,23,A3,DB,DF,B8,D1,11,8A,65,00,60,08,71,A3,91

Figure 4: DAT entry signatures. (Strickz, 2002)
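Because the signatures differ only in their first byte, classifying a candidate record from its signature is trivial; a small sketch built directly from figure 4:

# The 15-byte tail shared by every DAT entry signature in figure 4;
# only the first byte differs between record types.
SIGNATURE_TAIL = bytes.fromhex("23A3DBDFB8D1118A6500600871A391")

RECORD_TYPES = {0xE0: "Message", 0xE1: "Chat request", 0xE2: "File request",
                0xE4: "My Details", 0xE5: "Contact", 0xE6: "Reminder", 0xEE: "Note"}

def identify_record_type(signature):
    """Return the record type for a 16-byte DAT signature, or None if it is unknown."""
    if len(signature) == 16 and signature[1:] == SIGNATURE_TAIL:
        return RECORD_TYPES.get(signature[0])
    return None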


As different versions of the official Mirabilis ICQ client were released, changes were made to the structure
of the log file databases. These changes mean that any program that extracts information from a database
has to be able to identify the version of the program that created the database first, and then extract the
information and format it according to the version of ICQ that created the file under examination.
Initial information that was downloaded showed numerous different versions of the log files, each with
differences that needed to be accounted for when data was extracted and formatted. After some analysis, it was
found that each record had a record header that is consistent across every version of the databases looked at.
This header contained the length of the record, the record number, filing information, a message signature,
and what was described in the documentation as a separator value. The record header is consistent across
all versions of the ICQ database. Even the 2003b client database, which has significant differences from all
other versions, appears to have this header format.
Another consistency across all versions, 2003b notwithstanding, was that the structure of normal text
messages was identical. This made it much simpler to implement a system that could extract useful data. As
ICQ is primarily a text based messaging system, much of the useful information could be expected to be
extracted from these types of messages. It made sense therefore that once a program had been created to
extract records that the first records that would be interpreted would be messages. This would allow
extraction of text conversations in all versions of ICQ up to and including 2003a.
The issue of deleted log entries
Initial research indicated that records were probably not deleted from the DAT file. Instead the records in
the IDX file that referenced records to be deleted were removed. The freed record space in the DAT file
would then be used as required.
This often meant that whole and partial records were left in the DAT file. When the first program to extract
DAT records was created, it simply searched the file for the various signature values, read the record length
and copied the data into buffers that were then put into a linked list.
The problem with this was that when signatures from deleted records would be found, often parts of the
records had been deleted, including the start of record with the record length. When run over the test data,


when a record was found with a length field that had been overwritten to zero, the program entered an
infinite loop. The only way to recover from this was to terminate the program.
While this method would have been suitable for databases that contained no deleted data, it was clearly not
suitable for databases that contained numerous records that had been deleted over time. It is also not suitable
to ignore deleted data, as it may contain vital information that could be used. This meant that using the IDX
file to recover only valid records was also unsuitable. Another reason for not using the IDX file is that if it is
deleted or overwritten it would be unavailable for use. The only useful way to extract the information would
be to locate and identify records and partial records in the DAT file and successfully extract them.
How the entries were extracted
Several ideas were thought up and worked through on paper, modified and refined until one method was
found that would be suitable for extracting the log files and a prototype made. First a search function was
created that would return the file offset of the next signature in the file, or zero if the end of file was found
instead.
The search function was called and then if a signature was found the program entered a loop. A data
structure was created to store the record and accounting information about it. The search function was then
called to find the next signature. If another signature was found then the length of the current record was
found and the end position was calculated, if possible. If the previous record possibly overwrote the start of
the current record then the end of record was calculated to be the last byte before the start of the next record.
The start of the first record was always assumed to be valid. If the end of the record was calculated to occur
after the start of the next record then the two records were deemed to overlap. The end of the current record
and the start of the next record were recorded as DIRTY and the number of overlapping bytes was
recorded with each record.
The length of the record was calculated and that much data read into the data structure. Accounting information about the status of the end of the record was also added, and the record structure was then appended to the end of the linked list. Information obtained about the next signature was stored in local variables and the loop iterated; after the next record's data structure was created, this information was copied into the new structure. The loop exited when no new records were found. Once the prototype was able to extract all records and partial records, it was ported to C++ and record structures were added for validation and output of data.
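The following C++ sketch illustrates the shape of this extraction loop. It is not the authors' code, and several details are assumptions flagged in the comments: the DAT file is read into memory first, kSignatureTail stands in for the common 15-byte tail of the 16-byte signatures (the real bytes are omitted), and the 32-bit little-endian record length is read from an assumed offset relative to the signature. Only the bookkeeping for overlapping (DIRTY) records reflects the description above.

    #include <array>
    #include <cstdint>
    #include <cstring>
    #include <list>
    #include <vector>

    static const std::array<char, 15> kSignatureTail{};   // placeholder bytes only
    static const std::ptrdiff_t kLengthOffset = -4;        // assumed location of the length field

    struct ExtractedRecord {
        std::size_t start = 0;          // offset of the signature anchoring the record
        std::vector<char> data;         // raw bytes kept for later interpretation
        bool endDirty = false;          // end overlapped the start of the next record
        std::size_t overlapBytes = 0;   // size of the overlap, recorded with the record
    };

    // Offset of the next signature at or after 'from', or 0 if none remain
    // (mirroring the prototype's convention of returning zero at end of file).
    // Signatures are located by their identical 15-byte tail; the first byte varies by type.
    static std::size_t findNextSignature(const std::vector<char>& dat, std::size_t from) {
        for (std::size_t i = from; i + 16 <= dat.size(); ++i)
            if (std::memcmp(&dat[i + 1], kSignatureTail.data(), kSignatureTail.size()) == 0)
                return i;
        return 0;
    }

    static uint32_t recordLengthAt(const std::vector<char>& dat, std::size_t sigOffset) {
        // A length field lost to deletion simply yields zero here, producing an
        // empty DIRTY record instead of the prototype's infinite loop.
        if (static_cast<std::ptrdiff_t>(sigOffset) + kLengthOffset < 0) return 0;
        uint32_t len = 0;
        std::memcpy(&len, &dat[sigOffset + kLengthOffset], sizeof(len));
        return len;
    }

    std::list<ExtractedRecord> extractRecords(const std::vector<char>& dat) {
        std::list<ExtractedRecord> records;
        std::size_t current = findNextSignature(dat, 0);
        while (current != 0) {
            ExtractedRecord rec;
            rec.start = current;
            std::size_t next = findNextSignature(dat, current + 1);
            std::size_t end = current + recordLengthAt(dat, current);
            std::size_t limit = (next != 0) ? next : dat.size();
            if (end > limit) {                      // records overlap: mark as DIRTY
                rec.endDirty = true;
                rec.overlapBytes = end - limit;
                end = limit;                        // last byte before the next record
            }
            rec.data.assign(dat.begin() + static_cast<std::ptrdiff_t>(current),
                            dat.begin() + static_cast<std::ptrdiff_t>(end));
            records.push_back(std::move(rec));
            current = next;                         // exits when no new records are found
        }
        return records;
    }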
Identifying the ICQ version
Initially it was thought that extensive examination of the record structures would be required to identify the
version of ICQ being used. After some analysis it was noted that for the versions of ICQ listed in the
downloaded information, the separator value changed with each version. Right beside the hexadecimal
notation of the separator value was the decimal value, which corresponded with the version number that
created the log. What this meant was that a single record could be located and from that value the exact
version of the ICQ client that created the log could be identified.
Identification and verification of entries
Identification of individual record types is also quite simple. The signature values are all 16 bytes in length.
Most signatures differ only in the first byte with the last 15 bytes being identical. The first byte however, is
unique to each record type and provides an excellent method for identifying the type of record being
extracted.
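To make the mechanism concrete, a type and version check could be sketched as below. The byte value shown is a placeholder, not a real ICQ constant, which comes from the database documentation the project drew on; only the mechanism, dispatching on byte 0 of the 16-byte signature and reading the version from the separator's decimal value, reflects the two subsections above.

    #include <cstdint>

    enum class RecordType { Message, Unknown };

    constexpr uint8_t kMessageTypeByte = 0x01;   // placeholder value only

    // Bytes 1..15 of the signature are identical across record types, so the
    // first byte alone is enough to identify the type of record being extracted.
    RecordType identifyType(const uint8_t signature[16]) {
        switch (signature[0]) {
            case kMessageTypeByte: return RecordType::Message;
            default:               return RecordType::Unknown;
        }
    }

    // The decimal value of the separator in any complete record header identifies
    // the version of the ICQ client that created the log.
    unsigned clientVersionFromSeparator(uint16_t separator) {
        return separator;
    }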
As the data contained in each type of record was worked out, a class hierarchy was created whose root class contained the header structure only. From there, a level of classes was created that contained information common to all versions of ICQ for a given type of record. The third layer generally contained information specific to individual record types.
This allows the manipulation of records as a single generic record type, as a category of record such as Message, or as a specific type of message, depending on what is required from the data structure.
Each data structure inherits a method called verify from the base class, which can be overridden. Verify is designed to validate the information stored specifically by that class, where possible: each class contains information specific to that class and is responsible for verifying the validity of that data. In the initial test version of the program there was no validation of entries; in order to test the basic concepts, only entries extracted as complete and apparently valid were used. In later implementations of the program, validation methods would be identified and added.
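A sketch of this hierarchy, using class names of our own choosing (the paper does not give the actual names), might look as follows. The root holds only the common header, the middle layer holds data common to all versions for a record type, the leaves hold version- or subtype-specific fields, and verify() is virtual so that each level checks only the data it owns. The member fields shown are illustrative assumptions.

    #include <cstdint>
    #include <string>

    struct RecordHeader {            // abbreviated from the earlier sketch
        uint32_t recordLength;
        uint8_t  signature[16];
    };

    class DatRecord {                // root: header information only
    public:
        explicit DatRecord(const RecordHeader& h) : header_(h) {}
        virtual ~DatRecord() = default;
        virtual bool verify() const { return true; }   // no validation in the initial version
    protected:
        RecordHeader header_;
    };

    class MessageRecord : public DatRecord {   // common to message records in all versions
    public:
        using DatRecord::DatRecord;
        bool verify() const override {
            // Illustrative check only: a real implementation would compare the
            // stored fields against the record length and other header data.
            return text_.size() <= header_.recordLength;
        }
    protected:
        std::string text_;
        uint32_t senderUin_ = 0;     // UIN and timestamp are assumed message fields
        uint32_t timestamp_ = 0;
    };

    class TextMessageRecord : public MessageRecord {   // a specific message type
    public:
        using MessageRecord::MessageRecord;
        // version- or subtype-specific fields would be added here
    };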
Testing
Once the classes had been created and a library of endian-conversion functions written, outputting the records was quite simple. Streaming operators were created and the output was simply streamed to the console. Once a significant amount of work had been completed, a chat session was held and a small amount of test data obtained from it. The version of ICQ used was 2003a, the most recent version that can be analysed with this version of the extraction program. Using the log files that were obtained, three types of analysis were performed: manual examination, use of the ICQ program itself to output the history of the conversation, and the program created in this project.
Initial results are that the program is extremely accurate within the limited output it can achieve, with only one limitation: the analysis of the timestamps does not account for the time zone, which for Perth is GMT+08:00.
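If the stored timestamps are 32-bit Unix epoch values, an assumption rather than something confirmed in this paper, the missing adjustment amounts to adding the Perth offset before formatting, along the lines of the sketch below.

    #include <cstdint>
    #include <ctime>

    // Convert a stored UTC timestamp to Perth local time (GMT+08:00, no DST).
    // Assumes the record stores seconds since the Unix epoch.
    std::time_t adjustToPerthLocalTime(uint32_t storedUtcSeconds) {
        constexpr std::time_t kPerthOffsetSeconds = 8 * 60 * 60;
        return static_cast<std::time_t>(storedUtcSeconds) + kPerthOffsetSeconds;
    }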
So far a fairly extensive amount of analysis has been completed allowing for the creation of this program.
The program itself has the following limitations:
1. Areas of possible concern for the algorithm have been identified and need to be investigated. This is further discussed in the next section on future work.
2. Only the message structures are identified and output.
3. There are no options to adjust time stamps for time zones or discrepancies in time; the program does not adjust for time zones.
4. Only complete entries that are marked as CLEAN are used. There is no validation of the entries that would allow the other records to be used.
5. The extracted data records are not held as constant data, which means that they could be accidentally altered.
6. The program does not sort by UIN or time, so while all the messages are output they are not sorted into conversations; this must be done manually.
7. The current data structures do not allow the 2003b version of the ICQ client to be analysed. This is because the database is now distributed over a number of files, the signatures have changed, and the structure of the messages has changed.

While these limitations reduce the usefulness of the program in its current form, it still greatly reduces the amount of time needed to extract relevant information, which then needs to be organised manually.

FUTURE WORK
While records are being extracted from the databases, there is still more work to be done. The first task will
be to address the limitations that have already been mentioned above. The concern with the extraction
algorithm is that although the current algorithm performs flawlessly, it depends on the first record being
valid. Also, the algorithm has yet to be tested with multiple DIRTY records one after the other. Further
refinements may be required to make the algorithm more robust.
Another purpose for strengthening this algorithm is that it could be used to analyse partial database files
where the first record in a file fragment is not CLEAN. It is possible that once this project is completed, a
second project or an extension of the current project is undertaken to create a program that locates database
file fragments and reconstructs a database file. This was not a concern for this project, however it would be
useful for a forensic analyst to have such a capability and this could be considered at a later date.
There are a number of other data structures that need to be added to the program. These data structures all require that the version of the ICQ client be known, as the data structures to be used are specific to that version of the database. Thus, work needs to be done to properly implement objects for each version of the database that use the required data structures. These objects could be used to hold global database information, which could be used for various purposes. What this data is has yet to be identified; however, validation routines could be expected to use this information.
It is also expected that validation routines will increase the amount of accounting information that a record will need to keep about its associated data, such as previous and next records. Another concern is that only records that have a complete signature are found. In principle, it is possible that a record could exist that
contains useable data, but which does not have a valid signature due to it being overwritten. It would be
useful to extend the extraction algorithm to detect partial records without a complete signature. In this event,
it would require the validation routines to be able to identify a record type from a partial record. Therefore,
research is required to determine how much of a record type is required before its type can be determined.

REFERENCES
Anonymous. (2002) Computer-aided crime faces computer-aided forensics. Last Update September 18,
2002, Retrieved 16th April, 2005, from http://www.info.gov.hk/gia/general/200209/18/0918158.htm
Hitu. (2002). IcqHR (Version 1.8).
Mosnews. (2005), U.S. Cyber-Crime Unit Focuses on Russian Hackers. Retrieved 14th April, 2005, from
http://www.mosnews.com/news/2005/04/05/compcrime.shtml
Poulsen, K. (2005), Hacker penetrates T- Mobile Systems, Retrieved 14th April, 2005, from
http://www.crime-research.org/news/12.01.2005/892/
Soeder, D. (2000), Icqnewdb.txt. Last Update 19th April, 2000, Retrieved April 6, 2005
Strickz. (2002), ICQ Db Specs. Last Updated 8th July, 2002, Retrieved April 5, 2005, from
http://cvs.sourceforge.net/viewcvs.py/miranda-icq/Plugins/import/docs/import-ICQ_Db_Specs.txt

COPYRIGHT
© Kimberley Morfitt and Craig Valli 2005. The author/s assign the School of Computer and Information
Science (SCIS) & Edith Cowan University a non-exclusive license to use this document for personal use
provided that the article is used in full and this copyright statement is reproduced. The authors also grant a
non-exclusive license to SCIS & ECU to publish this document in full in the Conference Proceedings. Such
documents may be published on the World Wide Web, CD-ROM, in printed form, and on mirror sites on
the World Wide Web. Any other usage is prohibited without the express permission of the authors.


Googling Forensics
Benjamin Turnbull
University of South Australia
benjamin.turnbull@unisa.edu.au
Detective Sergeant Barry Blundell
South Australia Police (Electronic Crime Division)
Dr Jill Slay
University of South Australia

Abstract
This paper discusses the emerging trend of Personal Desktop Searching utilities on desktop PCs, and how
the information cached and stored with these systems can be retrieved and analysed, even after the original
document has been removed. Focusing on the free Google Desktop Search program, this paper first analyses
how the program operates, the processes involved, files created and altered, and methods on retrieving this
data without corrupting the contents. The limitations of extracting data from Google Desktop Search have
also been discussed, along with some future work in the area.

Keywords
Forensic computing, computer forensics, Google, desktop search

INTRODUCTION
As computer usage continues to become more ubiquitous, the data created, stored and edited by the average
user has grown in variety, complexity and quantity. Email, word processing, basic text, accounting, video
and audio are just a small number of the file types that the average computer user may utilise. Whilst searching for files is a feature found in the majority of Operating Systems, the complexity and range of data on the modern PC has left these built-in search tools limited in their usage: awkwardly slow, unable to search within documents, and of little use except when searching by file name.
This technology gap has recently become a contested area between several companies with Internet search engines, as well as a number of small start-up enterprises. The attractiveness of this new market lies in the fact that being in a position to merge desktop and Internet searching can, in effect, secure more clients for a particular online search site.
The uptake in these programs may have benefits within the field of Forensic Computing. In essence, these
indexed files may take much of the drudgery away from searching entire hard disks for keywords, when the
majority of user data may already be indexed by one or more search utilities. Whilst it is expected that there
will be limitations, any program that stores metadata may be of use within an investigation, as that is the
primary purpose of these tools.
This paper aims to analyse one popular desktop search program, Google Desktop Search, discuss how it
operates, if and where it stores data, and the limitations of its operation. All data has been collected on
dedicated machines utilising no other software that may interfere, and where analysis software has been
used, it has been chosen for its unobtrusive and passive nature. Google automatically updates its desktop
search program via HTTP, so it is difficult to discuss versions of the program. All experiments were carried
out between the 11th and 26th May, 2005.

GOOGLE DESKTOP SEARCH


Google Desktop Search was one of the first programs released onto the public market in mid 2004, and
despite only recently leaving the Beta testing stage it represents one of the more popular desktop searching
utilities. It is designed for use on a single-user Windows machine. Within a multi-user environment, should a user with administrative rights install and run Google Desktop Search, the program indexes and searches all users' files, regardless of their owner.
Google experienced negative publicity from a number of sources after the initial release of the product
which was widely reported in the press, with many citing it as a potential security weakness (Spring, 2004;
Posey, 2005). Google Desktop Search merely indexes all files that it is given access to, highlighting the security issues of multi-user systems and Windows' reliance on administrative accounts rather than causing these issues. To many, this represents a failure in effective design, if not security.
Other bugs have also been discovered within Google Desktop: a study conducted at Rice University indicated that vulnerabilities existed in the integration of Google Desktop Search and the Google Internet search engine (Nielson et al., 2004). Google has since claimed to have patched the
vulnerabilities announced in this paper, but has not discussed what steps were taken to ensure this. Google
has also maintained that there is no evidence to suggest that these vulnerabilities were exploited (n.a., 2005).
A Deeper Understanding of Google Desktop Search
The first point of interest is that Google Desktop Search is only designed for use on NT-based Operating
Systems from Windows 2000 and onwards. This may be seen to be isolating a significant portion of
potential user-base, but, as discussed below, the program itself makes use of libraries only available in
these newer platforms.
Google has also designed its desktop searching utility to allow third-party additions to its software, publishing several APIs and allowing customisation of searching parameters. However, all third-party additions must use the Google API to customise settings through the Google program, meaning that direct communication with the database used to store files is not permissible.
Google Desktop Search comprises three executables: GoogleDesktopIndex.exe, GoogleDesktopSearch.exe and GoogleDesktopCrawl.exe. GoogleDesktopSearch.exe is the main program of the suite; it controls user interaction, launches the other executables, and operates by setting up an HTTP server on local port 4664, through which all user interaction occurs. GoogleDesktopCrawl.exe is a program that traverses the file structure of a hard disk and reports changes to the GoogleDesktopIndex program. GoogleDesktopIndex.exe interfaces with the persistent storage files, GoogleDesktopCrawl and Microsoft's Indexing Service. The Indexing Service can send notifications when files are changed, and by listening to these, GoogleDesktopCrawl is able to determine which files potentially require updating.
The Google Desktop Search program creates a registry key at HKEY_USERS\<SID>\Software\Google\Google Desktop, where <SID> is the unique SID, which may look similar to S-1-5-21-3721486523-3945230961-2495595618-1004. There are several options stored here, including the location for storage of files.
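A minimal Win32 sketch of reading this key is shown below. It uses the standard registry API; the value name "data_dir" is an assumption for illustration, and the actual value name holding the storage location would need to be confirmed on an installed system.

    #include <windows.h>
    #include <string>

    // Returns the Google Desktop Search storage location for the given SID,
    // or an empty string if the key or value cannot be read.
    std::string readStorageLocation(const std::string& sid) {
        const std::string keyPath = sid + "\\Software\\Google\\Google Desktop";
        HKEY hKey = nullptr;
        if (RegOpenKeyExA(HKEY_USERS, keyPath.c_str(), 0, KEY_READ, &hKey) != ERROR_SUCCESS)
            return {};
        char buffer[MAX_PATH] = {0};
        DWORD size = sizeof(buffer);
        DWORD type = 0;
        std::string result;
        if (RegQueryValueExA(hKey, "data_dir", nullptr, &type,
                             reinterpret_cast<LPBYTE>(buffer), &size) == ERROR_SUCCESS &&
            type == REG_SZ) {
            result.assign(buffer);   // hypothetical value name, see note above
        }
        RegCloseKey(hKey);
        return result;
    }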
Opening Google's Files
Upon installation, Google Desktop Search creates two folders. The first of these, with the default location \Program Files\Google\Google Desktop Search, stores the executables and DLL files required to run the application. The other, with the default location \Documents and Settings\<username>\Local Settings\Application Data\Google\Google Desktop Search (where <username> equates to the user that installed the application), stores a series of files named dbc2e.ht1, dbdam, dbdao, dbeam, dbeo, dbm, dbu2d.ht1, dbvm.cf1, dbvmh.ht1, fii, fii.cf1, fiih.ht1, hes.evt, outlook_data, rpm.cf1, rpmh.ht1, and sites.txt.
These files are not always present, and there are also several temporary files that are used by the program.
Of these, several are human readable, but the majority are not. The file sites.txt is merely a list of different
Google mirrors (for example, google.com, google.com.au, etc). The files dbdam, dbdao, dbeam, and dbeao
are text-based, and appear to show the process of GoogleDesktopCrawl, and represent all files indexed and
websites visited. The non text-based files within this folder are of interest as they may contain the
information collected by the Google Desktop Search, the settings for the database used by the program and
possibly other data required for the program, such as information required for the Outlook email program
(such as passwords to offline folders). It has been surmised that these files are encrypted and/or compressed
(Krishnan, 2004). Evidence of compression of these files can be obtained from analysis of the libraries that each of the Google executables utilise.
Using a non-invasive file activity monitor, such as Filemon (www.sysinternals.com), the files and libraries
used by processes may be examined. Upon activation, GoogleDesktopIndex.exe calls a series of DLL
libraries within Windows. Of these, RSAENH.DLL, CRYPT32.DLL, CRYPTUI.DLL and MSASN1.DLL are notably used for the encryption and decryption of files. Also, the Google installation folder contains gzlib.dll, which is a compression library (Krishnan, 2004). As the Google Desktop Search application does not obviously use encryption for other purposes, and the examination of compressed ZIP files is done via Microsoft's own zipfldr.dll (with the path C:\Windows\System32\zipfldr.dll), the most obvious explanation would be that stored files are encrypted and possibly compressed, as this also accounts for the lack of obvious structure found. Further evidence of encryption is given by the Forensic Tool Kit program (www.accessdata.com), which utilises an Entropy Test designed to detect files which are encrypted, compressed or otherwise obfuscated. Of these files, only the outlook_data file is classified as an encrypted or compressed file; however, testing for entropy will only indicate files which are entirely encrypted. Based on this, it could be inferred that Google Desktop Search may use a database whose files are not encrypted, but whose contents may be.
As Google Desktop Search provides its interface through HTML pages in the default browser, it was hoped that a passive network sniffer, such as Ethereal (www.ethereal.com), could be used to determine the exact communication between the two programs. However, such programs do not monitor the localhost interface, and can only be used in conjunction with actual network connections.
There are several obstacles that need to be overcome before data can be extracted from Google Desktop Search. As discussed, the majority of files used by Google Desktop Search are not stored in a human-readable format. Furthermore, Google Desktop Search makes use of encryption and possibly compression libraries, and it is not known how these are implemented or how to retrieve the stored information.
Considering that these files are not readily amenable to interpretation, one method of viewing their contents is to use the Google Desktop Search program itself. However, there are several reasons why this is not an optimal solution within a forensic investigation. The first reason is that access to raw data is preferable to information that has been filtered in an unknown way, which the Google Desktop Search program may do; access to the raw data eliminates any contamination which may result from the use of an interface. Searching through the Google Desktop Search interface is also disadvantageous for its inefficiency: the data cannot be browsed, only searched on specific criteria. This implies that an investigator must already have search combinations in mind before searching for data.
There are also logistical problems with using one copy of the Google Desktop Search program to view files created by another in a forensically sound manner, as the program was never designed to do this.
The first obstacle is that although Google Desktop Search has separate programs executing different tasks of
the suite, these are inter-dependent and rely upon each other to work correctly. For example, when loading
GoogleDesktopSearch.exe, the program immediately executes GoogleDesktopCrawl.exe and
GoogleDesktopIndex.exe. If the GoogleDesktopIndex process is ended by the Windows Task Manager,
GoogleDesktopSearch will automatically re-execute it.
What is required is a method of searching the Google Desktop Search program without it indexing or changing files. Google has one solution to this: from within the program, a user has the ability to 'Pause' indexing. This action pauses the GoogleDesktopCrawl program. However, this occurs after Google Desktop Search is running and presumably indexing, so it occurs too late. Ensuring that the files required by Google Desktop Search are read-only (either by changing the default storage location in the registry to CD media or by changing the file attributes) is also not effective, as the program performs a check on this before executing. When loading the files, if they cannot be written to, the program fails with Database error 13387, which automatically diverts to Google's help centre.
One method to prevent Google Desktop Search from indexing at all is to prevent the two components of the program responsible for indexing and updating the cache from loading, by manually renaming both the GoogleDesktopCrawl.exe and GoogleDesktopIndex.exe executables. This prevents them from being activated when the main program is first loaded. However, as these tools are so inter-dependent, running GoogleDesktopSearch independently of the other two programs results in only the Google Desktop Search icon in the taskbar; no other functions of the program operate correctly. It would appear from this that GoogleDesktopIndex operates components of the user interface. However, renaming only GoogleDesktopCrawl.exe (for example, to GoogleDesktopCrawl.exe2) solves many of these issues. The program will still execute and the user interface is still accessible, but the indexing of files does not occur.
One must also be careful about Google Desktop Search creating and altering files whilst in operation.
Whilst the authors have been unable to reproduce the exact conditions under which this occurs, the files that are created are temporary and removing them does not affect the integrity of the results produced. Similarly,
the outlook_data file produced will be altered by an open copy of the Microsoft Outlook program. Google
Desktop Search also will edit all files contained within the default storage location when it is manually
closed, with the exception of the dbdao file.
From this, there can be derived a procedure for viewing the stored contents of the Google Desktop Search program without tampering with them:
1. Copy the Google Desktop Search storage folder (where the default is c:\Documents and Settings\<username>\Local Settings\Application Data\Google\Google Desktop Search) from the source machine to the Google Desktop Search folder on a machine conducting analysis.
2. On the analysis machine, rename the file GoogleDesktopCrawl.exe to GoogleDesktopCrawl.exe2. This will prevent it from loading.
3. Open the Google Desktop Search program, ensuring that no Email programs are loaded.
4. After the Google Desktop Search program has loaded, traverse to the storage folder on the analysis machine, and change the file attributes of these files to Read-Only. This will allow the Google Desktop Search program to close without editing any files.

THE USES OF GOOGLE DESKTOP SEARCH WITHIN FORENSIC COMPUTING
Although the storage files of Google Desktop Search are not human-readable, the data that is stored within
these files is still accessible, although access to the data is limited to the Google Desktop Search user
interface.
Searching and storage of email is a varied task, as it depends on the type of mail system used and how the client has been configured. Where email is stored remotely via IMAP (www.imap.org) or through the Exchange protocol, retrieving all email from a machine may be problematic or time-consuming. However, Google Desktop Search stores emails locally for searching, and these are accessible through the program. This includes offline storage such as Microsoft Outlook's use of .PST files to store information.
By far the most distinctive feature of Google Desktop Search for a forensic investigator is that the program caches, indexes and stores Internet sites visited, much in the same way that Windows does by default. This is the only desktop searching utility with this feature, and it possibly stems from Google's background within the
the Internet searching field. Google Desktop Search performs all cataloguing and indexing entirely
independently of the Windows caching of Internet pages, so should a user delete their temporary Internet
files, cache and cookies, this record is maintained by the Google Desktop Search program. Google Desktop
Search caches all HTML Internet pages visited, including pages retrieved via an SSL connection (this can
be removed via a configuration option). This has added benefit when it is realised that there are several
programs available designed to remove this very information in an irretrievable manner, but these operate
solely with the Operating System, and fail to take into account any other programs that may be collecting
and storing this data. Additionally, should a single webpage have been visited repeatedly, the Google
Desktop Search will store cached copies of all of these pages, giving exact information on what was
presented to the browser on each occasion visited. Much in the same way that the Google Internet Search
(www.google.com) caches popular pages, only the HTML is stored with images retrieved from the remote
site.
Whilst the program does not store full images, either from local or remote locations, it often stores thumbnails of images held locally on a system. These thumbnails exist independently of the image itself and are not generated on the fly, meaning that investigators interested in images that may have been altered or deleted may still find a thumbnail PNG file 109x75 pixels in size.
Google Desktop Search is also notable for its caching of certain file types, such as text, which continue to exist in the cache after the original item has been deleted. This may continue indefinitely, and the result is not easily removed.

LIMITATIONS OF DESKTOP SEARCH UTILITIES


As Desktop Searching programs are primarily designed for users to locate files, images, emails or Internet history, forensic analysis of the metadata produced by these programs may not provide an accurate representation of the files contained within these machines. This is intentional in the programs' design: they are designed to index and retrieve user-created data, and will therefore not index all files on a machine, merely those that conform to particular criteria and are stored in locations that are likely to contain such data. Google Desktop Search did not search or index all files, but narrowed the search space to areas that are more likely to contain documents stored by the user rather than files used to operate and maintain the machine. Files stored within the default Windows directory, within the Recycle Bin, or that are invisible were not indexed, as it was unlikely that these areas would yield results, and their exclusion increases the efficiency of the program.
This restricted searching limits the results returned and stored by Desktop Searching programs and reduces the impact that analysis can provide, as it is possible to ensure that, should particular files exist on a given machine, they are not indexed.
It would be a simple matter for a user to ensure that particular files stored on a machine are not searched and indexed by a Desktop Search program; these programs are not designed for thorough searching, rather to aid the user where appropriate. From a forensic computing perspective, it cannot be assumed that any data found within these programs is complete, as it is a simple matter to ensure that files are not indexed. The benefit does not lie in providing a complete account of all activity, but in offering another source of potentially enlightening material.
The increased usage of utilities that provide metadata for a particular system beyond that created by the
Operating System may have several benefits for those in Forensic Computing Investigation, as they may
create data that does not exist in any other form or has been deleted, and may be used to verify other data by
providing consistent results. For example, Google Desktop Search retains past Internet history
independently of the Operating System and browser, and needs to be cleared independently by the user.
Even current disk-wipe programs, designed to securely delete Internet history, recently opened documents
and slack space make no claim to removing the metadata produced by these programs.
There are a number of disadvantages to the increased use of Desktop Searching programs, and in their
current stage they only have limited applicability. As discussed, one major limiting factor for utilities such
as Google Desktop Search is that they have a refined searching field and only index files according to strict
criteria of visibility, location and file extension. Further, as these are still new technologies, their interface
and searching mechanisms are often primitive and unsuited to the personal desktop. Searches are made by
keyword and cannot be made by date or other factors. It is this that limits these programs' usefulness, as without a clear indication of what to search for, there is a possibility that information will be missed. Within Google Desktop Search, a search for a word will not return results with that word as a substring, so a search for 'celeb' will not return results where the word 'celebrity' appears. Whilst this is logical within an Internet search, which may return results numbered in the millions, this closed approach is not suited to a desktop, and it makes extracting information from the stored search data tedious.
The incomplete nature of Google Desktop Search is further exposed when the process discussed above, for reading index data created by other machines and other copies of Google Desktop Search, stops the indexing process. Shutting off the indexing component of the software prevents the program from indexing files changed during this period; it fails to register changes made even once indexing has been resumed. This may be because the program treats the indexing component as having failed, rather than as having been manually shut off from within the program.
As there is to date no single product dominating this market, there are several proprietary data formats used for the storage of the data produced by these programs, and tailoring of a system, where possible, is designed to act as a plug-in for an already running program, which is not applicable within a forensic computing context. Google Desktop Search makes some use of encryption, compression or obfuscation to ensure that the information collected is not human-readable and cannot readily be used by other programs. Whilst this makes good sense from a computer security perspective, it also reduces the usefulness of these programs from a forensic computing perspective, as the raw data is not available for viewing other than through the program interface.

CONCLUSION
Whilst still new, the desktop search utility represents a growing area of software, with many Internet-based companies adapting their work to this area and merging their services. The significance for investigators is that these programs often store data independently of the operating system platform, and hence may contain additional metadata which has potential use within an investigation. As these programs become more popular and as they improve, their use will only grow and they will become more powerful.
Discussed here is only a work-around solution to extracting the data stored within Google Desktop Search,
whereas ideally, extracting, interpreting and querying the data directly would be a preferable solution. The
most obvious method of doing so would be to reverse engineer the storage files, and construct programs to analyse and present directly from the raw data. However, given the use of encryption, this could possibly be
a time-consuming task.
A more feasible solution would be to expand on the approach discussed above: given that Google does allow third-party extensions of its software, plug-ins or programs could be written that utilise the GDS Developer Search API to perform more exhaustive and in-depth searches. This would not be difficult and would make data mining much more automatic in nature.
These programs exist only to overcome the limitations found within existing search facilities, and it is unknown whether, in the long term, they will continue to exist as separate products. Microsoft has released its own searching program, which could potentially be integrated into the next Windows release, codenamed Longhorn. It is not difficult to see this occurring, and if it does, there will be little need for other, similar programs. For the moment, however, there is a market for these products, and they do provide another source of data that may be of use, as the user data captured is often similar to the data searched for within a forensic investigation.

REFERENCES:
Krishnan, S, 2004, Reverse Engineering Google Desktop Search, available at
http://dotnetjunkies.com/WebLog/sriram/archive/2004/11/22/33091.aspx
n.a., 2005, Google Desktop Search Release Notes, Google, available online at
http://desktop.google.com/releasenotes.html
Nielson, S., Fogarty, S., & Wallach, D., 2004, Attacks on local searching tools, Technical Report TR04445, Department of Computer Science, Rice University. Available at http://seclab.cs.rice.edu
Posey, B., 2005, The Security Risks of Desktop Searches, WindowsSecurity.com, available online at www.windowssecurity.com
Spring, T., 2004, Google Desktop Search: Security Threat?, PC World Magazine, October 15, 2004,
available online at http://blogs.pcworld.com/staffblog/archives/000264.html

COPYRIGHT
© Turnbull, Blundell & Slay 2005. The author/s assign the School of Computer and Information Science (SCIS) & Edith Cowan University a non-exclusive license to use this document for personal use provided that the article is used in full and this copyright statement is reproduced. The authors also grant a non-exclusive license to SCIS & ECU to publish this document in full in the Conference Proceedings. Such
documents may be published on the World Wide Web, CD-ROM, in printed form, and on mirror sites on
the World Wide Web. Any other usage is prohibited without the express permission of the authors.


Honeypot technologies and their applicability as an internal countermeasure


Craig Valli
School of Computer and Information Science, Edith Cowan University
c.valli@ecu.edu.au

Abstract
Honeypots or honeynets are a technology that is rapidly maturing and establishing this archetype of
countermeasure as viable and useful in modern network defence. Honeypot technology is now at a point of
development where near real-time monitoring and forensic analysis of security events can occur. This paper
explores the hurdles to be overcome for the internal deployment of honeypot technologies.
Keywords: honeypot, internal, misuse, IDS, firewall

INTRODUCTION
Honeypots or honeynets are a technology that is rapidly maturing and establishing this countermeasure as
viable and useful in modern network defence. However, most honeypot technology is designed to be
outwards facing and consequently, it is not useful in reducing
the impact of internal cyber attacks in organisations.
Various recent security surveys (AusCERT, Australian High Tech Crime Centre et al., 2005; AusCERT,
Australian High Tech Crime Centre et al., 2004; CSO Magazine et al., 2004; L. A. Gordon et al., 2004; L.
A. Gordon et al., 2005; Richardson, 2003; Schneier, 2005) cite the most expensive and frequent forms of
successful attack originating from within an organisation. These surveys quote that 60-90% of attacks are
internally oriented. It seems incongruous that honeypot technology is not being deployed as an internal
countermeasure to combat insider misuse of information systems.
It is now well established in the literature that honeypot class systems are effective in trapping and
monitoring malicious activity (Honeynet Project, 2004; Spencer, 2004; Spitzner, 2002, 2003). Honeypot
technology is now at a point of development where near real-time monitoring and forensic analysis of
security events can occur. Typically, these systems are nearly all externally focused.
Honeypot systems tend to use deception as a key weapon. The deception is based largely on a premise of masking the real; that is, an attacker is intentionally misled about a network's structure or weak points. An external entity wishing to gain access to a networked system has to perform certain perceptible and tangible probes on a network to gather intelligence on the network composition and structure. This probing may employ brute force or stealth; either way, much of this probing should be detected by even the most basic of intrusion detection systems.
Further to this initial probing, the attacker must then craft attacks to penetrate the network, based a posteriori on the attack intelligence gathered. Hence, with some initial changes in perception, by displaying the false, advantage can be gained over the attacker. A simple example is the manipulation of the TCP/IP stack Operating System (OS) fingerprint of the probed host to indicate a different OS than is operating on the system. This is a deception with relatively low complexity and low deployment cost and can be readily perpetrated against the external attacker. This deception can have a magnifying effect due to the a posteriori state that the prober believes to be true.
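As an illustration of this kind of fingerprint masking, the honeyd tool (Valli, 2003) allows a host template to present a chosen OS personality to stack-fingerprinting probes. The configuration below is illustrative only; the addresses, ports and template name are examples, and the personality string must match an entry in honeyd's nmap fingerprint file.

    # Present a decoy host that reports a Windows TCP/IP stack to probes,
    # regardless of the underlying platform honeyd is running on.
    create decoy
    set decoy personality "Microsoft Windows XP Professional SP1"
    set decoy default tcp action reset
    add decoy tcp port 80 open
    add decoy tcp port 139 open
    bind 192.168.1.20 decoy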
By comparison, users within the organisation are almost gifted with omniscience relative to an external entity when it comes to gaining knowledge of network construction. Much of the knowledge that an internal user can gather through social interaction and engineering will be a priori, giving them a significant advantage
when attacking systems. As an example, they would know that the main routers and switches of a network
are a particular model and type because they will have sighted this in their physical locale (typically most
devices display their brand and model visibly on the front panel). This type of internal intelligence gathering
presents significant challenges for the composition of internally focussed honeypot systems to effectively
deceive and ensnare internal miscreants.


HONEYPOTS, HONEYNETS, HONEYFILES...


Evidence presented in various publications and conferences would support the proposition that honeypots and derivative technologies are becoming a viable digital countermeasure (Gupta, 2002; Valli, 2003; Yek & Valli, 2003). Honeypots have been found to be effective in retarding or negating the spread of malicious code such as network-borne worms and spam (Oudot, 2003; Spencer, 2004).
For the purposes of this paper the semantics behind terms such as honeypots, honeynets and honeyfiles will
be largely ignored. The paper is concerned more with operational and strategic issues of deployment and
will refer to the term honeypot technologies (HPT) to encompass all of these technologies.
The principal underlying tenet of each honeypot deployment is the gaining of advantage by techniques of deception that allow for the successful misleading or decoying of the attacking entity, be it a human or a digital instance. This deception is achieved by the HPT being compromised, or being seen as vulnerable, so that valuable forensic information can be collected as a result of attempted exploit or penetration of that system by attackers. The information gathered from HPT can have a variety of uses and purposes, the paramount being that of providing advantage. The advantage can be either short term, through mitigation of attacks, or long term, by solving or overcoming an exploit as a result of analysing the data collected.
Most current HPT have been constructed on the premise that attack will occur from outside a network, from entities attempting to compromise or penetrate a system. Furthermore, most of the existing literature about honeypots, their morphology and modus operandi, is likewise focussed on repelling the digital outsider. It would appear that internal honeypot construction, deployment and development is largely ignored; why?
This phenomenon of external fixation is in direct contradiction to the evidence published in numerous computer and network security reports touted by industry and academics alike to advance security as an issue (AusCERT, Centre et al., 2004, 2005; Gordon et al., 2004; Gordon et al., 2005). These reports attempt to give vignettes of the status quo of the computer security threat landscape, and they consistently find or conclude that internal threats realise the most damage by monetary impact and number of incidents. Firewalls and Intrusion Detection Systems (IDS) are deployed to control and monitor internal assets in organisations, yet HPT do not yet appear to be deployed on the inside, even though there is sufficient evidence to indicate their utility in detecting malicious behaviour.

Barriers to deployment?
HPT is rapidly maturing and is starting to see deployment in major organisations that are required to protect
highly confidential or readily fungible commodities such as money. Businesses and governments are
increasing surveillance of the workplace both physically and digitally.
It is interesting to posit whether the barriers to deployment are technological or organisational. From a technology perspective, it is true that most honeypot deployments are still highly experimental, even the commercially supported solutions. Nevertheless, HPT in some respects are probably more stable than the Intrusion Prevention System (IPS) suites offered by commercial vendors that some sites are using. The acceptance of IPS could be a result of IPS being sold as the replacement silver bullet for IDS' inability to detect newer forms of attack (Conry-Murray, 2005). IPS, IDS and firewall technology all have the comfortable and safe modus
of attack (Conry-Murray, 2005). IPS, IDS and firewall technology all have the comfortable and safe modus
operandi of repelling the external hackers at the perimeter network interface.
By contrast, honeypots are almost the antithesis of IDS, IPS and firewall technologies, taking the approach extolled by Sun Tzu of keeping one's foes under closer observation than one's friends by bringing them inside the defensive perimeter and monitoring their every machination. This mode of operation is a significant change in mindset from the perimeter mentality on which most current network security is predicated. In the same way that modern warfare has developed towards network effects and operations, network security needs a similar paradigm shift from regimented compartments to network-centric tactics.
One of the most often touted responses to the deployment of honeypots is the legal concept of entrapment
(Schultz, 2002). This area is highly contentious and, of course, varies greatly depending upon the jurisdiction in which the concept of entrapment is viewed. It is not the purpose of the paper to debate these issues
other than to highlight they exist for conventional externally faced honeypot deployments. Internally faced
honeypot systems do present various ethical and moral problems to an organisation but no more than
possible existing schemes of network-based monitoring. For example, most organisations now record a user's e-mail and World Wide Web usage, often without the knowledge of the user. The entrapment
argument could be applied to these existing scenarios of workplace surveillance. However, it is fair to say
that properly constructed honeypot systems placed into an enterprise network that is controlled tightly by
appropriate policy and procedure should not present problems in this regard. A honeypot in this situation
should only encounter users who are in direct breach of company policy and whose actions are deliberate,
malicious and intentional. This would include policy that restricts users to accessing only those services to which they are allocated and of which they have legitimate use. Furthermore, the honeypot system itself would contain warnings, via banners or pop-up messages, that access to these systems is for authorised personnel only and that
action beyond this point is in breach of company policy. If the user then chooses to probe or attempts to
compromise a system, they have made a conscious decision to do so and this could not be called
entrapment.
Unlike external honeypots, internally deployed honeypots would trap existing behaviours and threat patterns within the confines of the organisational network. Internal honeypots may not have to respond to the new malicious code types or new exploits that an externally faced honeypot would, due to their network depth in the organisation. Hence, the level of staff expertise needed to manage one of these systems effectively could be lower, as existing defaults within the systems that mitigate against known threats can be utilised. Honeyfiles are a method that is not related to any particular exploit and is focussed squarely on determining whether a file has been accessed.
Where IDSs fail internally is that they are looking for a particular binary sequence or compromise of a rule
set for a single instance of behaviour, which they then respond to typically with a single action such as
dropping the connection, which either succeeds or fails. IDS typically have poor mechanisms for allowing
the forensic reconstruction of an incident beyond action taken by the IDS as their primary function is one of
system protection and not data collection. HPT, however, are designed to track and monitor all behaviours without prejudice, even if there is interaction, for example, with an IDS in a honeypot system. HPT should be designed in such a way as to record an attacker's actions and effects on a system at not only the network layer but also the system and application layers. This approach enables security specialists to recreate the sequence of events or actions that resulted in an incident, as all data sufficient to allow forensic reconstruction of the incident should be present. This should then enable internal security personnel to adopt a learning paradigm in the remediation of security incidents through the analysis of this data.
Internal users could also compromise valuable internal systems that are not visible or accessible from the Internet; conventional countermeasures are unable to detect this misuse as they are typically externally focused at the Internet egress point. Many organisations utilise firewall and Virtual LAN (VLAN) systems to control traffic flow to these high-value networks, and some even deploy IDS (Wilson, 2005). IDS and firewall technologies can suffer from a lack of forensic completeness due to their modus operandi. At an
organisational level, there are often trust elements at play that may make the deployment and even
management of these contemporary countermeasures vexatious (Anonymous, 2003; Camardella, 2003;
Fidler, 2004). This problem is possibly exacerbated by the perception of attacks being external (Briere,
2005; Robb, 2005).
The deployment of an internal honeypot to these high-value network segments could significantly reduce
risk profiles within an organisation. This is because HPT could detect behaviours before they become
issues and work as an early warning system for security administrators. HPT could also effectively record unknown or unspecified errors in the security of existing systems, for example, incorrect rights-management settings for a particular server or service that allow an internal user probative abilities.

INTERNAL HONEYPOT DESIGN CONSIDERATIONS


The design philosophy for internal honeypots as compared to conventionally deployed external honeypots is
affected on several levels. Firstly, external honeypots use the concept of exhausting an attacker's resources
which given the finite constraints of a broadband Internet connection is an achievable maxim, although
getting more remote as bandwidth increases. Honeypots that are deployed internally do not have this same
ability as an internal abuser will typically be using a high-speed Ethernet connection with high levels of
access to the honeypot. Network latency is normally negligible, and the abuser is often on the same network segment or in close proximity. Therefore, issues of network latency, system availability
and reachability for the gathering of attack intelligence, or perpetration of attack, do not present a significant
resource barrier for an internal abuser.
As mentioned earlier, an internal user will have a higher level of initial attack intelligence from which to
attempt system surveillance or even penetration and compromise. For example, they can use legitimate
network browsing tools such as Microsoft Network Neighbourhood to examine server naming conventions
to significantly reduce the amount of guesswork or probing that an external attacker would have to
undertake to garner the same results. This also allows an internal attacker to significantly narrow the
detection window for existing countermeasures such as IDS, Firewalls and IPS as a result of this reduced
activity. The reduced activity from an insider would possibly not trigger response from any countermeasure.
In some cases the countermeasures will be configured to trust these internal entities allowing for a more
extensive and secretive probing of the network. This is reduced further because of the lessened need for
actual probing of the network to discover things such as operating system, patch levels, server application
and other information that an external attacker would gather before attempting system compromise.
One of the other issues with deployment of internal honeypots is that of organisational leakage of the
honeypot secret. Most honeypot systems rely upon deception via masking; that is, the hacker is unaware that they are in a honeypot. If a malicious insider knows that a honeypot exists within the network, then they
may cease activity or adjust behaviour to suit, and be ultimately untrusting of any new additions to the
network.
Topology will also be an issue, dependent upon the layout of the internal network. An external-facing honeypot can add network layers such as a De-Militarized Zone (DMZ) and VLANs into the design. These
are designed typically to delay and distract an external hacker; the internal design does not have this luxury.
The internal user will have an understanding of the existing network topology and if the honeypot network
topology likewise does not effectively and reliably mimic the existing real network topology then once
again exposure of the internal honeypot is a high-risk proposition.

CONCLUSION
The use of honeypots as an effective countermeasure to external attack is a well proven concept. Much of
the literature in this area of investigation focuses on the use of honeypots as external facing
countermeasures to external attacks. There is very little, if any, literature available or experimentation being
conducted on internally focused honeypots. This indicates the need for some exploratory and ongoing
research in this area.
The modus operandi of internal honeypot deployment presents significant changes in focus and design from
externally faced honeypots. Malicious insiders will have a significant tactical advantage over their external
counterparts when probing, penetrating or compromising a network; this is borne out by many of the security surveys conducted around the world. This significantly changes some of the basic premises upon which
existing honeypot technologies are deployed and has major impacts on design and deployment.
Internal honeypots offer a potentially viable method for tracking malicious insiders compared to other
contemporary network countermeasures and this warrants further investigation. The high level of interaction
and logging that is possible in a single instance is potentially superior to Firewalls, IDS and IPS. Where
attacks may simply be denied by firewalls or IDS, a honeypot system will allow for greater surveillance and
monitoring of the malicious insider's activities.
Future research in this area needs to be conducted into the practical implementation and deployment of
internal honeypots within contemporary organisational settings. Deployment of internal honeypots is not simply a technical issue; it also has to contend with many organisational factors, such as trust and insider malfeasance, that can act against this particular technology.

REFERENCES
Anonymous. (2003). Watching the watchers. Country Monitor, 11(37), 6.
AusCERT, et al. (2004). The 2004 Australian Computer Crime and Security Survey (report No. 4). Queensland: University of Queensland.


AusCERT, et al. (2005). The 2005 Australian Computer Crime and Security Survey (report No. 5).
Queensland: University of Queensland.
Briere, D. (2005). Defending the castle. Network World, 22(22), 37.
Camardella, M. J. (2003). Electronic monitoring in the workplace. Employment Relations Today, 30(3), 91.
Conry-Murray, A. (2005). Keep Attackers At Bay. InformationWeek(1046), 45.
CSO Magazine, et al. (2004). The 2004 E-crime watch survey (report). Pennsylvania: Pittsburgh: Carnegie
Mellon University.
Fidler, S. I. R. (2004). Workplace Privacy Issues: Potential Pitfalls For Unwary Employers (with Forms).
Practical Lawyer, 50(5), 43.
Gordon, L. A., et al. (2004). 2004 CSI/FBI Computer Crime and Security Survey (No. 9). San Francisco:
Computer Security Institute, Federal Bureau of Investigation's Computer Intrusion Squad.
Gordon, L. A., et al. (2005). 2005 CSI/FBI Computer Crime and Security Survey: Computer Security
Institute.
Gupta, N. (2002). Improving the Effectiveness of Deceptive Honeynets through an Empirical Learning
Approach. Paper presented at the 2002 Australian Information Warfare and Security Conference,
Perth, Western Australia.
Honeynet Project. (2004). Know your enemy: Learning about security threats (2nd ed.). Boston: Addison-Wesley.
Oudot, L. (2003). Fighting Internet Worms With Honeypots. Retrieved 5th August, 2005, from
http://www.securityfocus.com/infocus/1740
Richardson, R. (2003). 2003 CSI/FBI computer crime and security survey (report No. 8). San Francisco:
Computer Security Institute, Federal Bureau of Investigation's Computer Intrusion Squad.
Robb, D. (2005). Erecting Barriers. Computerworld, 39(12), 42.
Schneier, B. (2005). 2005 Internet attack trends. Retrieved 15 June, 2005, from
http://www.schneier.com/essay-085.pdf
Schultz, E. (2002). Honeypots make headlines. Computers & Security, 21(6), 489.
Spencer, B. (2004). Honeypots: A couple of production honeypots used to fight spam. Retrieved 13
February, 2004, from http://seclists.org/lists/honeypots/2004/Jan-Mar/0025.html
Spitzner, L. (2002). Know your enemy. Indianapolis: Addison-Wesley.
Spitzner, L. (2003). Honeypots - tracking hackers. Boston: Pearson Education Inc.
Valli, C. (2003). Honeyd - A fingerprinting Artifice. Paper presented at the 1st Australian Computer,
Information and Network Forensics Conference, Scarborough, Western Australia.
Wilson, J. (2005, June 25). The Two Sides Of Network-Security Devices -- Infonetics study shows secure
routers and appliances are most popular. VARbusiness, 65.
Yek, S. & Valli, C. (2003). If you go down to the Internet today - Deceptive Honeypots. Journal of
Information Warfare, 2(3), 101-108.

COPYRIGHT


Craig Valli 2005. The author/s assign the School of Computer and Information Science (SCIS) & Edith
Cowan University a non-exclusive license to use this document for personal use provided that the article is
used in full and this copyright statement is reproduced. The authors also grant a non-exclusive license to
SCIS & ECU to publish this document in full in the Conference Proceedings. Such documents may be
published on the World Wide Web, CD-ROM, in printed form, and on mirror sites on the World Wide Web.
Any other usage is prohibited without the express permission of the authors.


A UK and Australian Study of Hard Disk Disposal


Craig Valli
Edith Cowan University
School of Computer and Information Science
c.valli@ecu.edu.au
Andy Jones
British Telecom Labs
Edith Cowan University Adjunct
School of Computer and Information Science

Abstract
Recent studies in Australia and the United Kingdom indicate that a broad cross-section of organisations are
failing to adequately protect or erase confidential data stored on hard disk drives before subsequent
disposal. Over 90% of the hard disks examined in the two independent studies were in an easily recoverable
state, with some drives simply requiring a boot. This paper gives an overview and comparison of the two
studies conducted. An examination of possible factors responsible for the inadequate erasure of hard disk
devices is then undertaken. Furthermore, possible future research directions are also explored.

Keywords
hard disk, erasure, forensic recovery

INTRODUCTION
Hard drives are the primary storage devices for most modern Internet-enabled organisations. Whether these
drives are in large server-based RAID arrays, a desktop PC or a laptop, they all potentially contain
information which, if it is discovered or disclosed, could have catastrophic consequences for the
organisation or an individual. Recent studies in Australia (Valli, 2004) and the United Kingdom (Jones
et al., 2005) indicate that a broad cross-section of organisations are failing to adequately protect or erase
confidential data stored on hard disk drives before subsequent disposal. Over 90% of the hard disks
examined in the two independent studies contained data that was in an easily recoverable state,
with some drives simply requiring a boot. Earlier studies conducted in the US by Garfinkel and Shelat (2003)
found similar issues with the erasure of hard disks.
The US Department of Defense (USDoD) erasure standards (Defense, 1997) have become a de-facto
standard, and supporting commercial and freeware software exists for the secure erasure of such devices
based upon these. The USDoD standard DoD 5220.22-M is stated as "Overwrite all addressable locations
with a character, its complement, then a random character and verify" (Defense, 1997, p. 58). It should be
noted that this level of erasure is recommended for all devices except those containing Top Secret
classification materials, which must be disposed of by disintegration into particles. The use of USDoD
standards and other similarly stringent erasure techniques is designed to make it sufficiently expensive or
highly improbable that recovery can be achieved by using standard forensic recovery techniques.
Furthermore, the use of a high number of overwrite passes (commonly 35) is meant to provide protection
against recovery of data via magnetic remanence techniques that use electron microscopes to scan the disk
surface.
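As an illustration of the three-pass scheme quoted above, the following minimal Python sketch overwrites a disk image with a character, its complement and a random character, then verifies the final pass. The image path is hypothetical; real erasure tools work directly on the raw block device and deal with bad sectors, host-protected areas and similar complications that this sketch ignores.

# Minimal sketch of the "character, complement, random character, verify" scheme.
# "disk.img" is a hypothetical dd image, not a live device.
import os
import secrets

CHUNK = 1024 * 1024  # write/read in 1 MiB pieces


def overwrite_pass(path, byte_value):
    """Write one pass of a single byte value over every addressable byte."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        written = 0
        while written < size:
            n = min(CHUNK, size - written)
            f.write(bytes([byte_value]) * n)
            written += n
        f.flush()
        os.fsync(f.fileno())


def verify_pass(path, byte_value):
    """Re-read the target and confirm it now contains only byte_value."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        remaining = size
        while remaining:
            n = min(CHUNK, remaining)
            if f.read(n) != bytes([byte_value]) * n:
                return False
            remaining -= n
    return True


target = "disk.img"                  # hypothetical image file
rand_byte = secrets.randbelow(256)
overwrite_pass(target, 0x55)         # a character
overwrite_pass(target, 0xAA)         # its complement
overwrite_pass(target, rand_byte)    # a random character
print("verified:", verify_pass(target, rand_byte))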
There are some perceived technical constraints for the timely secure erasure of software and data contained
on these hard drives when systems have to be rapidly replaced as is common practice during system
rollovers. This paper will give an overview and comparison of the two studies conducted. Then an
examination of possible factors responsible for the inadequate erasure of hard disk devices will be
undertaken. Furthermore, possible future research directions will also be explored.


OVERVIEW OF THE STUDIES


Two studies were conducted into the ability to forensically recover data contained on disposed or second-hand
hard disk mechanisms available for sale to the public. The Australian study experimented on 23
independent cases that were acquired through public computer auctions. The UK study experimented
on 105 independent hard disks that were acquired from computer auctions, computer fairs and the online
auction site eBay. In the UK the disks were blind sourced to the University by a computer recycling
company. In each study a battery of forensic techniques was applied to the hard disks to effect recovery. In
the Australian study 21 of the hard disks were easily recoverable, one hard disk was erased and one hard
disk had mechanical failure. In the UK study 105 disks were examined. Of these, 13 were unreadable due
to mechanical failure (10 of these having been physically tampered with), 16 had been erased and 76 were
easily readable.
The level of forensic expertise needed to recover the data contained on the disposed hard drives was very
low. In several cases simply powering on the hard drive revealed its contents, intact from the point of
disposal to the examiners. In both the Australian and the UK studies, simple use of the unformat command
or the use of a hexadecimal editor allowed recovery of nearly all data.
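To illustrate how low the bar for this kind of recovery is, the sketch below does roughly what a hexadecimal editor or the Unix strings utility does: it scans a raw image of a drive for runs of printable text using nothing more than the Python standard library. The image filename is hypothetical.

# Sketch: pull human-readable fragments out of a raw disk image by scanning for
# runs of printable ASCII. "dropped_drive.img" is a hypothetical dd image.
import re

MIN_LEN = 8  # ignore runs shorter than this many characters
printable_run = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)

with open("dropped_drive.img", "rb") as img:
    while True:
        block = img.read(1024 * 1024)
        if not block:
            break
        # Runs straddling a block boundary may be split; good enough for a demonstration.
        for match in printable_run.finditer(block):
            print(match.group().decode("ascii"))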
The profile of organisational types uncovered in both studies was broad, ranging from small home-owned
computer systems to server hard drives from government critical infrastructure providers. In Australia the
only hard drive that was properly erased was a desktop PC hard disk from a large telecommunications
infrastructure provider. In the UK, of the 16 hard disks that had been properly erased, it was subsequently
revealed that 12 had come from a computer recycling company that undertakes government contracts. The
origin of the remaining four erased disks could not be determined. All of the hard drives that were
recoverable contained information that was confidential or commercial in nature and whose disclosure could
have had catastrophic consequences for the individual or the organisation concerned. None of the drives
examined used cryptography to protect the sensitive confidential or commercial data contained on the disks.
In some of the cases the hard disk drives had files that carried date stamps that were less than two weeks old
at the time that they were acquired, implying that the information contained on the hard disks was current.
It should be noted that this two-week turnaround is often well within organisational security policy
constraints relating to password changing on systems for instance. This narrow window of time from
disposal to resale makes the data highly contextual and relevant to intelligence gathering for attack or other
malicious intentions.
While it might be expected that the disks that had come from home-use computers and from small
companies would be more likely to contain information that should have been erased, the reality was that
there was no significant difference between these and the larger enterprises identified in the studies.
The level of organisational information that was recovered from the disks included a wide range of
information that would be of value and interest to a number of groups, including the following.
The disks (16) from one major leisure service organisation had originally been used by the finance
department, including those used by the finance manager and their assistant (the root directories of the disks
were named FMANAGER/STOCK and FASSISTANT/FMANAGER). They contained detailed documents
on the organisation's property holdings, names, addresses and telephone numbers of members of staff, wage
information, balance sheets, incident reports, profit and loss statements, and expenses details for members of
staff, all of which was less than three months old. Another example, from a large financial institution, gave
details of confidential memos marked for internal use only, staff directories and staff profiles. Disks from a
third organisation (a food biotechnology company) gave details of crop trials in the UK, including the
locations and dates of the trials.
The disks recovered from academic institutes were found to contain a large amount of
information, including coursework, exam results (actual and predicted) and letters of reference, together
with template letters and logos.
The information on one recovered disk, which was found to have been used by a primary (elementary)
school, was particularly disturbing in that it contained a significant amount of information relating to
identifiable children, including reports on student progress throughout the year, a report that related to a
bullying incident and another that related to medical treatment for a child.
Examples of disks recovered from private systems included a database of passwords from the system of one
(identifiable) user and emails relating to an extra-marital affair, where the participants could be identified.


IMPLICATIONS FOR PRACTICE


The implications of information being available to people who have no right to access it are wide ranging.
From the point of view of the corporation, there is the potential exposure to industrial espionage,
non-compliance with statutory or industry sector requirements and potential litigation. In the case of the disks
that could be attributed to the commercial sector, the potential cost to the organisation of a leakage of
information is high. For the leisure service industry organisation (a household name in the UK), the
information could be attributed to its finance department and the loss of it could be considered nothing
short of negligent. To any competitor or a potential supplier, the level and detail that was available would
allow for a very accurate analysis of the financial viability of a number of the operations being run by the
organisation. It provided details of turnover, stock levels and the names and contact details of staff.
The disks that had previously belonged to a financial institution gave information including directories of
staff and business plans, all of which would be of high potential value to anyone outside the organisation
and of great potential embarrassment to the organisation. Some of the issues that the failure to dispose of the
data effectively exposed the organisations to include: breach of statutory or industry sector regulations,
fraud, industrial espionage and potential network intrusion or hacking. Where credit card or other
account details have been left on the devices, simple and significant theft can occur. Furthering this premise,
what would occur if a high-value, high-transaction customer's details were compromised? The potential for
financial malfeasance is large, immediate and potentially catastrophic.
The discovery in the studies of information from major information and critical infrastructure providers has
significant implications for national security. Since the initial research over a year ago, and more recently
again in Australia (Jenkins, 2005), government servers acquired at auction have been found to contain
sensitive information. This risk is further substantiated in research by Valli (2004), where computers
acquired at auction had information from state-owned infrastructure utilities present. The potential for
misuse of this information by people wishing to impact on the nation state, such as other governments,
activist and terrorist groups, is something that cannot be discounted as an unrealisable threat. If we run a
scenario where servers came out of a major infrastructure provider and these servers contained SCADA
control software and codes, the potential impact could be catastrophic.
For an individual or small business, some of the issues that the failure to erase the information can cause are
identity theft, blackmail and fraud. Identity theft is one of the fastest growing new trends in criminal activity,
and much of that activity is now focused on the Internet with the growth of spyware. One has to ask the
question: why should thieves spend time developing spyware programs when much of the information they
require is readily available from auctions or computer recyclers?
Furthermore, the high level of discovery that was encountered in both of these studies demonstrates that
many organisational security initiatives have been negated by poor asset disposal procedures. It
is almost incongruous that large organisations could spend millions, if not tens of millions, in currency per
year defending the corporate edifice from external and internal compromise and yet leave the very data that
people are trying to acquire on disposed hard drives or equipment. This approach to erasure negates the
money spent on deploying and maintaining countermeasures such as firewalls, virus scanners, spyware
removers and content filters, which are used to reduce the risk of systems penetration or compromise.

WHY NOT ERASED?


The length of time it takes to erase a hard disk is an issue, particularly if the hard disk is in a machine that is
to be rolled out over a weekend period for a large organisation. The redundant computer is often live until
the close of business at week's end, leaving little time for the secure erasure, let alone removal and
de-registration, of the equipment. It would be impractical in terms of time and human resources to remove each
drive and erase it. This problem will only increase as hard disk capacities continue to climb and secure
erasure, which involves multiple overwrites, is needed to ensure confidentiality. In the case of the
large organisations that were identified in the UK study, it was subsequently discovered that in all cases the
organisations had agreements with third party organisations to securely dispose of the material. It is clear
that these arrangements had failed to achieve the objective, and the organisations must be considered to
remain responsible, as they had not tested that the arrangement worked.
Recovery from drives that may actually still be functional on a computer that has failed is another possibility.
This could be categorised as a loss of data by mis-assumption, i.e. the specialist has assumed that the hard disk
is the cause of the failure. Alternatively, in the case of the Australian study, the hard drives were recovered
from laptops that were sent in for repair and then, having been taken off organisational asset registers, were
disposed of by third parties.
A lack of legislative requirement could be a contributing factor in the lack of proper erasure of digital
media. Many Australian State and Federal Acts cover the use, storage and transmission of information. One
such act is the Privacy Act 1988, which provides a legislated requirement to protect information collected on
individuals. It covers the transmission and transferral of data and private information but does not cover the
destruction or disposal of this data, particularly in digital form. This is an area that could be considered for
incorporation into legislation to ensure the correct erasure of data from drives. The efficacy of this approach
is also questionable, however, as the UK study found hard disks that under the Data Protection Act should
have been erased.
Hardware vendors are realising the need to protect data on hard disk drives, and Seagate has released a
hard disk in the 2.5-inch form factor that has onboard hardware-based encryption of data (Seagate, 2005).
Other vendors (Systems, 2002) have offered third party hardware encryption for hard disks for some time.
These types of hardware-based solutions provide the only real protection from compromise of sensitive
data, as they protect the contents of the hard disk drive by encrypting it in hardware.

CONCLUSION
Further studies need to be conducted into the reasons why organisations do not adequately ensure that their
hard drives are erased properly. Freeware utilities exist that will adequately perform erasure of hard disk
drives, so the cost of software acquisition is a relatively minor issue.
Research should focus on the human and organisational impediments to the secure erasure of data devices.
The inability to erase drives from a technology perspective does not hold as a defence for non-erasure: the
drives can be erased. However, as entry-level hard disks are now approaching 80 Gigabytes in size, the issue
of time is a significant factor and warrants investigation, as it will only continue to grow as the use of
larger digital storage technology expands.
Sales of other storage technologies such as USB flash memory sticks, Compact Flash and Secure Digital
cards are also rapidly increasing, as is their usage. These other alternate-channel repositories of corporate
memory will also require investigation. In particular, flash memory may be recoverable, particularly Flash
Translation Layer (FTL) types of memory, as erasers may only erase the FTL, leaving the data intact in the
memory space on the device.
Research into effective strategies to educate end users is urgently needed to combat the extreme risk of data
recovery from incorrectly cleansed data storage devices, a risk that, for the most part, individuals and
organisations appear to be either ignorant of or simply unwilling to address adequately. This will then
provide mechanisms for the proper and correct protection of corporate data.

REFERENCES
Defense (1997). DoD 5220.22-M: National Industrial Security Program Operating
Manual, Department of Defense.
Garfinkel, S. L. and A. Shelat (2003). "Remembrance of Data Passed: A Study of Disk
Sanitization Practise." IEEE Security and Privacy 1(1).
Jenkins, C. (2005, August 2nd). Govt data sent to auction. The Australian.
Jones, A. et al. (2005). Analysis of Data Recovered from Computer Disks released for Resale by
Organisations. Journal of Information Warfare, 4(2).
Seagate (2005) Seagate Introduces World's First 2.5-Inch Perpendicular Recording Hard Drive; First
Major Hdd Maker To Deliver Notebook Pc Drive With Hardware-Based Full Disc Encryption
Security, http://www.seagate.com/cda/newsinfo/newsroom/releases/article/0,,2732,00.html
Systems, (2002) Secure Data Vault, Secure Systems, Perth. http://www.securesystems.com.au
Valli, C. (2004) Throwing out the Enterprise with Hard Disk, In 2nd Australian Computer, Networks &
Information Forensics Conference, School of Computer and Information Science, Edith Cowan
University, Perth, Western Australia, pp. 124-129.


Valli, C. and Patak, P. (2005) An investigation into the efficiency of forensic erasure tools for hard disk
mechanisms, Paper accepted for 3rd Australian Computer, Networks & Information Forensics
Conference, School of Computer and Information Science, Edith Cowan University, Perth, Western
Australia

COPYRIGHT
Craig Valli and Andy Jones 2005. The author/s assign the School of Computer and Information Science
(SCIS) & Edith Cowan University a non-exclusive license to use this document for personal use provided
that the article is used in full and this copyright statement is reproduced. The authors also grant a non-exclusive license to the SCIS & ECU to publish this document in full in the Conference Proceedings. Such
documents may be published on the World Wide Web, CD-ROM, in printed form, and on mirror sites on
the World Wide Web. Any other usage is prohibited without the express permission of the authors.


An investigation into the efficiency of forensic erasure tools for hard disk
mechanisms
Craig Valli
Paul Patak
Edith Cowan University
School of Computer and Information Science
c.valli@ecu.edu.au
p.patak@ecu.edu.au

Abstract
One of the common anecdotal complaints used when defending the insecure erasure of hard disks is the
length of time taken to effect a secure erasure. This paper discusses the results of experiments conducted with
Unix/Linux based hard disk wiping software when run on various machines and hard disk mechanisms of
varying size, speed and interface. The initial research has uncovered a range of issues and factors that
affect the speed of erasure of hard disk mechanisms. Some of these factors included memory configuration
and CPU, but not in the ways that were expected. This paper includes results from contemporary ATA and the
newer SATA IDE hard disk drives in use today.

Keywords

erasure, hard drive, forensics, Knoppix


INTRODUCTION
A large volume of confidential, secret and sensitive information is stored on millions of mass storage
devices such as hard drives. All organisations and individuals that use computers will have to dispose of
them at some stage due to obsolescence of the equipment. Many organisations or individuals will simply
on-sell or trade in the computer. The problem is that many of these computers have drives that are in a state
where information contained on the drives can be recovered (Duvall 2003; Monroe 2003; de Paula 2004;
de Paula 2004). Recent studies by Garfinkel and Shelat (2003), Jones et al. (2005) and Valli (2004) have
indicated significant problems with the safe and secure disposal of hard disk assets. Over 80% of the drives
examined in these studies had information that was readily retrievable, some simply by powering up the
hard disk.
The market for hard disks is not decreasing but expanding. Gartner Dataquest predicts that shipments of
desktop-class 3.5-inch hard disks will grow from 190.8 million in 2003 to 298.7 million in 2008. For laptops
with 2.5-inch hard disks the growth is expected to go from 3.6 million units in 2003 to almost 20 million
units in 2008 (Monroe 2003). This indicates that the problem of disk disposal will continue to increase as
these drives become obsolete. One of the remedies for this is the secure erasure of the hard disk device by
software that conforms to the US Department of Defense (DoD) 5220.22-M standard or uses other techniques
to write pseudo-random strings of data to a hard disk several times over.
The USDoD standard DoD 5220.22-M is stated as "Overwrite all addressable locations with a character, its
complement, then a random character and verify" (Defense, 1997, p. 58). It should be noted that this level
of erasure is recommended for all devices except those containing Top Secret classification materials, which
must be disposed of by disintegration into particles. One of the paragons of the erasure literature, Gutmann
(1996), stated that 35 wipes or passes of a drive made it sufficiently expensive to recover data from hard
disks.
One of the common anecdotal complaints used when defending the insecure erasure of hard disks is the
length of time taken to effect a secure erasure. This paper discusses the results of Linux based hard disk
wiping software when run on various machines and hard disk mechanisms of varying size, speed and
interface. The tests yielded some anomalous performance with various software and hardware
configurations. This paper deals only with speed issues; a subsequent paper will look at the ability to
recover data from the mechanisms.
EXPERIMENTAL PROCEDURE
A range of hard disk types and Intel based PCs of varying processing power and RAM configurations were
utilised to conduct the experiments. The hard disks and supporting technology utilised were a range of IDE
drives and computers taken from recently redundant computers at Edith Cowan University. The different
configurations are shown in Table 1.
Mainboard/Computer: ASUS, ASUS, IBM NetVista, IBM
CPU: Pentium 3 - 733 MHz, Pentium 3 - 866 MHz, AthlonXP 1800, Pentium 4 - 1.8 GHz, Pentium 4 - 3.0 GHz

Table 1 - Types of Computer used in Experiments


The software used to perform the wipes was the wipe program from the Auditor CD (Version 1.6). The
wipes were performed in quick mode on the wipe utility. The hard disk parameters were extracted using the
hdparm utility and the /proc system. No changes to the disk configuration (as is possible with the hdparm
utility) were made; this was done so as to reflect normal practice.
For each drive that was utilised there were three sets of tests performed: 1, 3 and 7 wipes/passes of
the drive with the software used. Due to time constraints, 35-wipe tests were not conducted; however, as can
be seen later from the results, extrapolation via simple mathematics makes it possible to ascertain the wipe
times had such tests been conducted. Each test was conducted 3 times and the resultant time in seconds was
averaged from these 3 results for each wipe set. If there was significant variance in the results then the test
set was run again until such time as there was minimal variance. A baseline of RAM for each machine was
set; this was 256MB for the P3s and 512MB for the IBM P4s. Tests were then rerun, varying the original
baseline RAM, to allow comparative analysis of varying RAM configurations. The variety of configurations
was limited by the technical resources available and the ability to acquire/use various RAM stick
configurations.
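The measurement logic amounts to the following sketch, in which dd is used purely as a stand-in for the Auditor CD wipe utility actually used in the experiments, and /dev/sdX is a placeholder device name (running this against a real device is destructive).

# Sketch of the timing harness: time a destructive overwrite pass three times
# and average the results. dd stands in for the Auditor CD wipe utility.
import subprocess
import time


def time_single_pass(device="/dev/sdX"):
    """Time one overwrite pass of the whole device (placeholder device name)."""
    start = time.time()
    subprocess.run(
        ["dd", "if=/dev/zero", f"of={device}", "bs=1M"],
        check=False,           # dd exits non-zero when it reaches the end of the device
        capture_output=True,
    )
    return time.time() - start


def averaged_pass_time(runs=3):
    """Average several timed runs, as done for each wipe set in the experiments."""
    times = [time_single_pass() for _ in range(runs)]
    return sum(times) / len(times)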
The intention of the experiment was to validate a range of KNOPPIX based CDs that are in common use,
these being Knoppix STD, Autopsy and Helix. However, during the testing procedure the first CD used was
the Knoppix STD, and it gave exceptional performance for erasure of drives across all machines tested.
The figures were in fact so exceptional that the times were not physically possible. The defect, or why this
occurred, has not been investigated at this stage; the experimental conditions will be replicated at a later
stage.

EXPERIMENTAL RESULTS AND DISCUSSION


This section of the paper is a summary of the tests conducted so far; a full copy of the results from all
testing is available online from http://scissec.scis.ecu.edu.au/wipers and will be updated as new testing is
conducted and results become available. The current results will be used to infer trends and patterns but
should not be relied upon as ultimately conclusive.
As can be seen in Table 2, the time taken to erase hard disk mechanisms to recognised standards such as the
Department of Defense standard is significant. If some of the more modern ATA mechanisms, e.g. drives
above 40 GB in size, are correctly wiped with 35 passes, the time taken to complete this is not an
insignificant hurdle. For example, securely erasing the mechanism of a typically redundant SOE computer
with an 80GB ATA drive takes, from Table 2, between 28 and 45 minutes per pass with quick settings, which
means a completion time of approximately 16 to 26 hours. Indications are that Serial ATA (SATA)
mechanisms are even more of a problem, with the 250GB SATA as tested taking a calculated 61 hours to
complete.
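These completion times follow directly from multiplying the measured single-pass figures by the number of passes, for example:

# Reproducing the extrapolation used above.
passes = 35
low_min, high_min = 28, 45                  # measured minutes per pass, 80GB ATA, quick settings
print(passes * low_min / 60, passes * high_min / 60)   # ~16.3 and ~26.3 hours

sata250_single_pass_hours = 1.75            # single-pass figure from Table 2 (P4-3.0-512-SATA250)
print(passes * sata250_single_pass_hours)   # ~61 hours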


SATA drives are replacing conventional ATA mechanisms, and most new mainboards now natively support
this type of bus. The SATA standards promise higher transfer rates than conventional ATA, with its
maximum speed now at 72 Mbytes per second, SATA-I being specified at 150 Mbytes per second and SATA-II
(which is now in limited release) at 300 Mbytes per second. The speed advantages from the new mode of
operation are, however, lost against larger capacity drives, with the entry level for this class of drive being 80
Gigabytes. The results from the testing conducted are not promising, with a 250GB SATA taking 58 to 61
hours to erase to the DoD 35-pass standard. The 80GB SATA mechanism used in this test actually
performed worse than the 80GB ATA mechanism by 26%, which is somewhat strange given that SATA is
supposed to be the faster technology. This warrants further investigation as to why this has occurred.
In another scenario, a defective PC or laptop that needs to be sent to a repairer would be held for a
considerable time before the machine, with its hard disk erased, could be released to the repairer. It would
appear from these preliminary results that there may be some credence in the claims that secure erasure
takes too long to accomplish given restricted time frames.
Regardless of the SATA/ATA anomaly, these results indicate that the amount of time that a PC must be
powered on and left running to achieve erasure of the mechanism to the DoD standards is not insignificant.
This amount of time to effect erasure, if taken in the context of a critical path for a new system rollout that
may only be as long as 48 hours, is a significant impost or, in the case of SATA, impossible to achieve.

Processor, RAM, Drive         Number of Wipes
                                 1       3       7      35
ATH-1.8-S256-IBM-40           0.30    0.89    2.07   10.34
ATH-1.8-S256-QTM-20           0.23    0.68    1.60    7.98
ATH-1.8-S256-WD-80            0.52    1.57    3.66   18.29
P3-733-128-IBM-40             0.45    1.38    3.19   15.99
P3-733-128-QTM-20             0.26    0.77    1.80    9.00
P3-733-256-IBM-20             0.23    0.68    1.59    7.95
P3-733-256-QTM-10             0.24    0.72    1.70    8.47
P3-733-256-QTM-20             0.28    0.81    1.89    9.47
P3-733-256-WD-80              0.76    2.26    5.28   26.38
P3-733-384-IBM-40             0.48    1.43    3.33   16.68
P3-733-S256-IBM-20            0.22    0.70    1.59    8.00
P3-866-256-IBM-40             0.47    1.40    3.25   16.28
P3-866-256-QTM-10             0.23    0.68    1.59    7.95
P3-866-256-WD-80              0.67    2.02    4.71   23.54
P4-1.8-256-WD-80              0.47    1.40    3.25   16.27
P4-1.8-384-IBM-40             0.25    0.76    1.78    8.89
P4-1.8-384-IBM-60             0.25    0.76    1.78    8.89
P4-1.8-384-QTM-10             0.23    0.70    1.63    8.15
P4-1.8-384-QTM-3.2            0.10    0.29    0.67    3.35
P4-1.8-384-WD-80              0.47    1.40    3.25   16.27
P4-1.8-512-WD-80              0.47    1.40    3.25   16.27
P4-3-512-QTM-20               0.22    0.60    1.37    7.76
P4-3.0-1024-IBM-40            0.33    0.98    2.28   11.38
P4-3.0-1024-SATA250           1.66    4.98   11.63   58.12
P4-3.0-1024-SATA80            0.55    1.66    3.87   19.36
P4-3.0-1024-WD-80             0.60    1.81    4.23   21.10
P4-3.0-512-QTM-3.2            0.09    0.28    0.66    3.29
P4-3.0-512-SATA250            1.75    5.24   12.24   61.19
P4-3.0-512-SATA80             0.59    1.75    4.09   20.44
P4-3.0-512-WD-80              0.47    1.40    3.25   16.28
P4-3.0-NOHT-1024-WD-80        0.60    1.81    4.23   21.10
P4-3.0-S512-QTM-20            0.22    0.60    1.37    6.99

Table 2 - Time Taken to Wipe Drive in Hours


More RAM is not necessarily faster, but CPU does have an impact
RAM configuration for the same machine can have a marked impact on erasure speeds. These preliminary
results indicate that a single stick of RAM will outperform multiple sticks in the same machine. For
example, from Table 2, P3-733-384-IBM-40 performed 6% slower than P3-733-128-IBM-40 utilising
combinations of 128 MB 133 MHz SDRAM sticks from the same manufacturer. Similarly, the 3.0 GHz P4
with 512MB (P4-3.0-512-WD-80) is 29.7% faster than the same machine with 1024MB (P4-3.0-1024-WD-80)
on the 80GB mechanism.
Differences in performance on the P3 were expected due to the use of Synchronous DRAM (SDRAM) chips
in the machines used; these were 133MHz modules and as such are rated to transfer at 1.1GB/sec. However,
the P4-3.0 has DDR (Double Data Rate) technology, and as such the addition of extra RAM should not
present as much of a problem due to the purported increases in transfer rates, as the chips were DDR 400.
The 29.7% difference in performance on the P4-WD-80 possibly indicates that DDR technologies suffer the
same problems as SDRAM. However, in fairness it should be pointed out that this could be a problem with
mainboard design and will have to be tested on other mainboards to see if the same results are garnered.
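These percentage comparisons can be reproduced from the Table 2 figures, for example:

# Percentage differences computed from Table 2 values quoted in the text.
def pct_slower(slower_hours, faster_hours):
    return (slower_hours - faster_hours) / faster_hours * 100

# 1024MB vs 512MB on the 3.0 GHz P4 with the 80GB WD drive, 35-pass column:
print(pct_slower(21.10, 16.28))   # ~29.6%
# 384MB vs 128MB on the 733 MHz P3 with the 40GB IBM drive, single-pass column:
print(pct_slower(0.48, 0.45))     # ~6.7%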
The ability of the CPU to affect the erasure of hard disks is not to be discounted either. If we isolate the
RAM difficulties and examine results from the same hard disk mechanism across the same family of CPU
types, we notice a plateau in performance.
It is apparent from Figure 1 that the erasure speed of the hard disk does not improve past a given CPU
performance barrier. Both the P4-1.8GHz and P4-3.0GHz machines return identical results on the same hard
disk. What further points to this being a hard disk performance bottleneck is that the two P4s had different
hard disk controller chipsets yet returned near identical wipe times, with differences of seconds over 16
hours of operation. Excluding a software fault, this would indicate that hard drive performance limits
erasure speed, given sufficient CPU to drive the disk. This theme has since been repeated with drives still
under test.

[Chart: "Wipe of 80GB - CPU - 35 passes"; y-axis: Hours to Wipe (0.00 to 27.50); x-axis (CPU_Clock_RAM):
P3-733-256-WD-80, P3-866-256-WD-80, P4-1.8-256-WD-80, P4-1.8-384-WD-80, P4-1.8-512-WD-80, P4-3.0-512-WD-80]

Figure 1 - Graph of CPU Type vs Wipe times for hard drive erasure

CONCLUSION
This research in progress has indicated that there is a range of factors and issues that can affect the ability
of hard disks to be erased forensically. Indications from this initial study are that memory configuration can
affect wipe times significantly and that single stick configurations outperform dual stick combinations. For
older architectures that used SDRAM this was an expected outcome, but seeing as the newer DDR based
machine suffered from similar problems, this will need further examination. The time savings from using a
single stick of RAM are significant, and these would indicate that, should a machine have more than one
stick of RAM, the extra sticks be removed to increase wipe speeds.
CPU is another factor that has some impact on wiping performance. The ability of the CPU to drive the hard
disk controllers to maximum performance levels has been an observed phenomenon in these tests. A
sufficiently fast CPU can garner significant speed advantages, up to the point of the hard drive's performance
limits.
Some software, in particular the KNOPPIX STD wipe program, performed wipes erroneously, in that the
wipe rates achieved for the ATA drives in our initial tests were faster than the technology could provide. The
experimental results for the KNOPPIX STD need replication and extensive forensic examination of the hard
disks to determine whether the hard drive surfaces have been correctly erased. Initial evidence is that the
drives wiped by this tool were not properly erased.
The research has accurately measured the performance of the drives under forensic erasure procedures and
has found that the time to wipe drives sufficiently is not insignificant. If drives need to be wiped to stringent
levels, such as disks from government departments, the financial services sector or any other entity holding
sensitive information, then the time taken is an impediment. In the case of large 80GB-plus ATA and SATA
drives, the times to achieve the 35-pass standard are measured in days, not hours. Protection of data by
physical destruction of the drives in these cases becomes a logical and potentially cheaper alternative. Even
for conventional ATA hard drives of less than 80GB, the time taken to effect a high level of erasure is
sizeable. These findings provide some credence to the argument that forensic erasure of hard drives is a time
consuming process.

REFERENCES
CCSE (2000). Clearing and Declassifying Electronic Data Storage Devices, Government of Canada,
Communications Security Establishment.
de Paula, M. (2004). "One Man's Trash Is... Dumpster-diving for disk drives raises eyebrows." USBanker
114(6): 12.
de Paula, M. (2004). "Security: Risk Of Improper Disposal Of Computer Trash Grows ; Wamu found out
the hard way that special care is necessary when discarding software and hardware." Bank
Technology News 17(6): 12.
Defense (1997). DoD 5220.22-M: National Industrial Security Program Operating Manual, Department of
Defense.
Duvall, M. (2003). "Memory Loss ; How a missing $100 pocket-sized drive spooked 825,000 customers of
canadian companies." Baseline 1(16): 65.
Garfinkel, S. L. and A. Shelat (2003). "Remembrance of Data Passed: A Study of Disk Sanitization
Practise." IEEE Security and Privacy 1(1).
Gutmann, P. (1996). Secure Deletion of Data from Magnetic and Solid-State Memory. Sixth USENIX
Security Symposium, San Jose, CA.
Jones, A. et al. (2005). Analysis of Data Recovered from Computer Disks released for Resale by
Organisations. Journal of Information Warfare, 4(2).
Monroe, J. (2003). Forecast: hard disk drives, worldwide, 1999-2008 (executive summary), Gartner.
Valli, C. (2004). Throwing out the Enterprise with the Hard Disk, Proceedings of 2nd Australian,
Computer, Network and Information Forensics Conference, Fremantle, Western Australia

COPYRIGHT
Craig Valli & Paul Patak 2005. The author/s assign the School of Computer and Information Science &
Edith Cowan University a non-exclusive license to use this document for personal use provided that the
article is used in full and this copyright statement is reproduced. The authors also grant a non-exclusive
license to the School of Computer and Information Science & ECU to publish this document in full in the
Conference Proceedings. Such documents may be published on the World Wide Web, CD-ROM, in printed
form, and on mirror sites on the World Wide Web. Any other usage is prohibited without the express
permission of the authors.


An investigation into long range detection of passive UHF RFID tags


Craig Valli
Andrew Woodward
Ken Wild
Reino Karvinen
School of Computer and Information Science
Edith Cowan University
c.valli@ecu.edu.au

Abstract
Radio frequency identification (RFID) tags have been in use for a number of years, in a variety of
applications. They are small, computer-chip-like devices that can range in size from a thumbnail to a credit
card. They consist of a small silicon chip and an antenna used to receive and transmit data.
When a tag receives a signal from a valid reader it sends a response, typically a tag ID and any other
requested/available data, back to the reader device. The newer range of RFID chips that are coming into use
now use higher frequencies (UHF) and are able to be detected, or transmitted to, from longer distances
(1-10 m) with a conventional handheld reader. This increased distance alone presents many opportunities for
users and misusers alike. These include, but are not limited to, passive scanning/sniffing of information in
transit, deception, disruption of signal, and injection of malicious or false data into the broadcast envelope.
There is no evidence currently in the literature of long-range scans of, or attacks on, UHF RFID tags or
supporting infrastructure. Given that these tags are now being used in military applications, an improved
understanding of their vulnerabilities to long range scanning techniques will contribute to national
security. An understanding of the long range scanning potential of these devices will also allow further
study into the possible misuse of RFID technology in society by governments, business and individuals.

Keywords
RFID, UHF tags, long range detection

OVERVIEW OF RFID
Radio frequency identification (RFID) tags have been in use for a number of years, in a variety of
applications. They are small, computer-chip-like devices that can range in size from a thumbnail to a credit
card. They consist of a small silicon chip and an antenna used to receive and transmit data.
When a tag receives a signal from a valid reader it sends a response, typically a tag ID and any other
requested/available data, back to the reader device.
RFID tags are similar to a bar code in much of their application, but they have several advantages. They do
not need to be in line-of-sight like a conventional bar code, and they are less susceptible to environmental
degradation. If a bar code gets wet or torn, or the lines in the code are obscured, then it is no longer useful as it
cannot be read anymore. In contrast, RFID tags can be coated in a protective resin, rendering them less
vulnerable to certain environmental factors. Another advantage of the RFID tag system is that each tag has a
unique identifying number and, dependent on memory space, can have an almost infinite range of numbers.
For bar codes, the range of numbers is limited by their use of numerical codes, which is in turn limited by the
width of the label. Where bar codes cannot store data, some RFID tags are able to store the equivalent of two
pages (~4Kbytes) of textual information on the chip that can be modified at will.
The greater robustness of RFID compared to other technologies is now seeing military application of this
technology, particularly in the US (Gilbert 2003), as well as on a trial basis at a logistical level in Australia
(Bushell 2004). The Australian cattle industry is now also using RFID to replace conventional cattle herd
identifiers such as brands and earmarking (NLIS 2003). The mining industry also uses RFID for human
resource and inventory tracking, such as tyres for large earthmoving equipment. The healthcare industry
employs RFID for the tracking of supplies and patients (Woodhead 2003; O'Connor 2005).
One of the most common types of RFID tag in use today is the so-called 13.56 MHz passive. These are
powered solely by the signal emanating from the reader unit. Range for this type of tag may vary from 1cm
to 1m, limited by the laws of physics as well as by practical considerations. There are existing programs
such as rfdump (www.rf-dump.org) that gather information from existing 13.56 MHz RFID tags, but these
simply read the tags. There are similar programs for other tags in existence and, of course, handheld readers
essentially provide this functionality. The challenge is in combining this with accurate location of the data at
considerable distance.
The newer range of RFID chips that are coming into use now use higher frequencies (UHF) and are able to
be detected, or transmitted to, from longer distances (1-10 m) with a conventional handheld reader. This
increased distance alone presents many opportunities for users and misusers alike. These include, but are not
limited to, passive scanning/sniffing of information in transit, deception, disruption of signal, and injection
of malicious or false data into the broadcast envelope.
There is no evidence currently in the literature of long-range scans of, or attacks on, UHF RFID tags or
supporting infrastructure. Given that these tags are now being used in military applications, an improved
understanding of their vulnerabilities to long range scanning techniques will contribute to national
security. An understanding of the long range scanning potential of these devices will also allow further
study into the possible misuse of RFID technology in society by governments, business and individuals.

OVERALL PURPOSE AND SIGNIFICANCE OF PROPOSED RESEARCH


The purpose of the research is to investigate long range location and scanning of RFID tags. Current UHF
tag readers have an optimal distance of approximately 10 meters. It is hoped that this research will allow the
detection of these tags at a range of up to 5 km.
The ability to detect RFID tags at long range has a wide range of legitimate and useful applications. These
include tracking and movement of strategic assets such as personnel and inventory, herd management,
disaster recovery, and asset control.
The initial stage of the research will aim to produce a proof-of-concept implementation which demonstrates
that long range location of tags is possible. Currently, no system exists for the long-range detection of UHF
RFID tags using ground based radio signal location techniques. There are systems available that use GPS
devices in every tag to achieve location services (Jansen 2003). However, this is an added expense that
could be eradicated by the development of the proposed system. The removal of GPS services also means
that in the same design, the space, power and computation gained could be used for other tasks. An RFID
based location system that is solely dependent upon ground based location methods provides a potentially
cheaper and usable alternative. Furthermore, by using location based on radio signal propagation, the
problem of GPS black spots, such as buildings, warehouses or other structures that block access to the sky,
will be mitigated. This is not to say that ground based location services are not also susceptible to
black spots; it is just that these are likely to be less of a problem.

STUDY DESIGN
The methodology used for the conduct of the research will be a quasi-experimental method. The research
will involve practical experimentation and theoretical modelling to examine long range detection and
location of RFID UHF tags using an empirical learning approach. There are several stages to the study, each
being cumulative and building empirically on the previous stages.
Stage 1 - Long Range Detection: This will involve initial trials to simply detect the tags via signal resonance at
range using a high powered antenna. This stage will be achieved with a custom built antenna that gives a
theoretical 15dBi gain and placement of a small cluster of tags.
Stage 2 - Long Range Identification: This will be the placement of a small number of UHF RFID tags
within an open field in a geographical area where there is relative radio silence, such as in rural Australia.
Tags will be evenly distributed in a grid pattern at heights of 1m, and their location recorded using a GPS.
This experiment
Stage 3 - Long Range Scanning: This will involve scanning and reading the tags as discrete tags at range.
Stage 4 - Location: Two RFID reader stations will be placed at fixed points a set distance apart, adjacent
to the tag field. Each reader station will include a GPS unit to determine absolute location. Each location
station will include a rotating directional antenna, and computer controlled UHF radio transceiver. One
station will transmit a series of short bursts of radio-frequency energy in a narrow radio beam into the field.
The second station will receive the radio echo of any tags caught in the radio beam-path. These low level
echoes will be accumulated by using averaging techniques over a large number of bursts to lift the signal
above the background noise. The first station will then rotate its directional antenna array to the next sector
of the field, and the process will repeat until the entire field is scanned. The two stations will then swap
transmit and receive roles, and the process will be repeated. Signal responses from each will then be
recorded, to hopefully permit triangulation and location of the RFID tags within the field.
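The averaging step in Stage 4 relies on the fact that uncorrelated noise shrinks roughly as the square root of the number of bursts averaged. The following sketch, using entirely synthetic numbers, illustrates the expected gain.

# Illustration of the burst-averaging step: a weak tag echo buried in noise
# becomes visible once many bursts are averaged. All values here are synthetic.
import numpy as np

rng = np.random.default_rng(0)
samples, bursts = 1024, 400
echo = 0.05 * np.sin(2 * np.pi * 40 * np.arange(samples) / samples)   # weak echo

single = echo + rng.normal(0, 1.0, samples)                           # one noisy burst
averaged = echo + rng.normal(0, 1.0, (bursts, samples)).mean(axis=0)  # 400 bursts averaged


def snr_db(signal, observed):
    noise = observed - signal
    return 10 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))


print(f"single burst SNR      : {snr_db(echo, single):.1f} dB")
# Averaging 400 bursts improves SNR by about 10*log10(400), roughly 26 dB.
print(f"{bursts}-burst average SNR: {snr_db(echo, averaged):.1f} dB")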


ANALYSIS AND TESTING


By using an empirical learning approach as the analysis framework, the data gathered from these field
experiments will be analysed using established radio location methods and techniques to identify possible
methods for the location of UHF RFID tags. Finding the appropriate method for the system will be exploratory
in nature.
It is envisaged that the data will have to be analysed through a series of cycles and levels of precision and
accuracy to find the best balance between accurate location of RFIDs and the cost of actual computation of
location. This modelling will be achieved through the use of existing radio location algorithms such as
Time Difference of Arrival (TDOA) and relevant statistical calculations and methods. This stage of the
process should easily be handled by the utilised hardware. However, it is intended that some of the project will
use modelling to examine various aspects, which is computationally intensive in nature. The use of existing
compute cluster architectures to process and run models will garner a significant time advantage for the
researchers when processing these models.
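As a toy illustration of the kind of radio location calculation involved, the sketch below triangulates a tag position from the bearings reported by two reader stations with known (GPS-derived) positions. The coordinates and bearings are invented, and the study itself would weigh this kind of approach against TDOA and other established methods.

# Toy example: intersect two bearing lines from stations at known positions.
# Coordinates are metres on a local flat grid; bearings are clockwise from north.
import math


def triangulate(p1, bearing1_deg, p2, bearing2_deg):
    """Return the intersection point of two bearing lines."""
    (x1, y1), (x2, y2) = p1, p2
    # Direction vectors: north = +y, east = +x.
    d1 = (math.sin(math.radians(bearing1_deg)), math.cos(math.radians(bearing1_deg)))
    d2 = (math.sin(math.radians(bearing2_deg)), math.cos(math.radians(bearing2_deg)))
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-9:
        raise ValueError("bearings are parallel; no unique fix")
    t = ((x2 - x1) * d2[1] - (y2 - y1) * d2[0]) / denom
    return (x1 + t * d1[0], y1 + t * d1[1])


# Two stations 500 m apart, each reporting the bearing at which the tag echoed.
print(triangulate((0, 0), 45.0, (500, 0), 315.0))   # -> (250.0, 250.0)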

INTENDED OUTCOMES
This research should provide the rudiments of a system which allows for the long range location of UHF
RFID tags in the field. If successful as a proof of concept, this research will have significant potential
and implications. Effective long range scanning of UHF RFID will open up a wide range of good use and
misuse applications. A good use could be in the mining sector, to track humans in hostile work environments.
A use in the livestock sector could see farmers being able to monitor and account for stock with the strategic
placement of antennae around their property. This would add extra benefit as RFID technology is already
being used for herd identification in Australia.
The defence sector is currently deploying this type of technology, which means that this proof of concept has
great intelligence and counter-intelligence potential. In addition, the ability to scan at long range will have
major combat implications if personnel and ordnance are tagged for inventory management and asset
tracking.
From a misuse perspective, it could be possible to scan a competitor's inventory and gather critical business
intelligence, hence garnering significant arbitrage from this malfeasance of the technology.

REFERENCES
Bushell, S. (2004). TAGS, You're it! URL: http://www.cio.com.au/index.php/id;648258786;fp;4;fpid;19
accessed 8/4/05
Gilbert, A. (2003). U.S. military expands radio-wave tracking. URL: http://news.zdnet.com/2100-1009_22984391.html accessed 8/4/05
Jansen, S. (2003). Alternative Positioning: GPS is not the only way to determine position, RFIDs have a
place too. URL:
http://www.gisuser.com.au/POS/content/2005/POS16/pos16_feature/pos16_feature_3.html
NLIS (2003). Policy paper: Livestock identification and traceability. URL:
http://www.aahc.com.au/surveillance/nlis_policy.pdf accessed 8/4/05
O'Connor, M.C. (2005). IBSS launches Healthcare Tracking. URL:
http://www.rfidjournal.com/article/articleview/1318/1/131/ accessed 8/4/05
RF-Dump (nd). RF-Dump.org. URL: http://www.rf-dump.org/ accessed 12/4/05
Woodhead, B. (2003). Electronic tags: are we next? URL:
http://afr.com/articles/2003/07/28/1059244557388.html accessed 8/4/05.

COPYRIGHT
Craig Valli, Andrew Woodward, Ken Wild, Reino Karvinen 2005. The author/s assign the School of
Computer and Information Science (SCIS) & Edith Cowan University a non-exclusive license to use this
document for personal use provided that the article is used in full and this copyright statement is
reproduced. The authors also grant a non-exclusive license to SCIS & ECU to publish this document in full
in the Conference Proceedings. Such documents may be published on the World Wide Web, CD-ROM, in
printed form, and on mirror sites on the World Wide Web. Any other usage is prohibited without the
express permission of the authors.


Information Gathering Using Google


Lih Wern Wong
School of Computer and Information Science, Edith Cowan University
lihwern@yahoo.com

Abstract
Google is a powerful search engine. By combining Google features with creativity in query construction,
it will return sensitive information that usually would not be found by casual users. Attackers can use
Google to look for vulnerable targets and to passively gather information about their targets to assist
further attacks. This paper discusses ways to exploit Google to obtain valuable information and how this can
be used by attackers to perform attacks. The ideas discussed are applicable to other search engines as well.

Keywords:
Google, Google hacking, information gathering, penetration testing

INTRODUCTION
Google is the most widely used and powerful search engine. Many users are unaware that they are actually
exposing far more information on the Internet than they intended. Users who are able to construct an
accurate query will be able to find the exact information they desire. Unfortunately, Google has been
exploited by attackers for malicious purposes: to find vulnerable systems, passwords, other sensitive
information and far more system information than they need to know. Google can be used as an
information gathering tool to profile targets.
Though tools like Nessus and nmap are much more capable of scanning websites for vulnerabilities, the use
of such tools can be detected, and they create lots of noise which usually will alert the administrator
(Mowse, 2003). By employing Google, an attacker can much more silently scan targets for some of these
vulnerabilities. Since Google has been constantly crawling the Internet for websites and indexing them in
Google's enormous database, it speeds up the attacker's vulnerability scanning process. Though Google is used
solely in this paper, the ideas discussed are applicable to other search engines as well.

PROFILING A PERSON
This section focuses on how to gather information about a particular person for general
reconnaissance, social engineering (e.g. deceiving or talking bank customers into revealing passwords) or
other criminal acts.
Personal Webpage and Blog
In order to get a better understanding of a target, combining the target's name or email with words like
"homepage", "blog" and "family" could point the attacker to more information about the target. Driven by
self-importance and vanity, a lot of individuals set up their own personal webpage or blog (a web-based
personal journal). Blogging is gaining huge popularity as users share their daily routines, thoughts and
opinions on various matters. Through such sites, people have unwittingly released information including
personal opinions, interests, dislikes, job particulars and contact information (Granneman, 2004). If personal
photos are posted on these sites, they allow the attacker to identify the actual victim or the victim's friends and
associates. Using all this accurate information, with a matching photo at hand, the attacker could easily socially
engineer the victim. The attacker could strike up a conversation using recent topics posted by the victim to
initially gain trust and later persuade the victim into revealing desired information. Personal webpages and
blogs are a highly resourceful and reliable source for profiling a person.
Web-based Message Groups
People join groups in Yahoo! or Google Groups in which they have an interest. Google Groups is a Usenet
archive that enables users to access Usenet post data dating back to 1995 (Google, 2003). By searching an
individual's screen name and checking their profile, an attacker could potentially figure out their interests or the
kind of groups they are most likely to join (Long, 2005, p. 141). The attacker could join a particular group and
check its message archive for useful information about the target, which is useful for social
engineering. Sometimes even a group description is enough to determine the group context without actually
joining the group.
Computer-related groups can reveal some details of what projects an organization is
currently engaged in or the type of hardware or software solutions used. An employee of a software
development organization may post questions related to programming problems the employee faced on some
of these software development groups. If the employee uses an organization email address to correspond, the
attacker could get a rough idea of the ongoing projects in that organization. Even if only an actual name is
used to correspond, the attacker could possibly find out the affiliated organization through sites like blogs.
Furthermore, if a system administrator is seeking help on solving network issues, the attacker would know
which organization has possible exploitable holes. The attacker could also actively engage in the group to
help the victim with the problem, deceiving the victim into revealing more information.
Resume
A resume or curriculum vitae usually contains accurate and current particulars of an individual, and is often displayed on a personal website. It is a very reliable and favorable source that an attacker can count on when profiling a person. Its previous employment section gives the attacker another way to advance the social engineering process. The attacker could impersonate a prospective employer or head-hunter and call the victim to find out more background information about the candidate. At such a time, the victim will most probably give out accurate information to convince the attacker to hire them. The query "phone * * *" "address *" "e-mail" intitle:curriculum vitae would return positive resume hits (Davies, 2004).

Figure 1.1: Curriculum Vitae (Curriculum Vitae, n.d.)

PROFILING A TARGET ORGANIZATION


If an attacker has a fixed target, Google can assist in finding information that is useful for social engineering or physical breaches. Most of this information is made publicly accessible for employees' convenience, and some of it can be very informative.


Intranet, Human Resources and Help Desk


Many organizations have an intranet which contains information that should only be accessible to employees. For convenience, the intranet may contain human resources information (e.g. departmental contacts), policies and procedures, and help desk information. Though an intranet is supposedly private, such sites are sometimes still publicly accessible and can be found by searching intitle:intranet inurl:intranet human resources. Substituting human resources with words like help.desk or IT department returns additional information. This information, which includes the names of the individuals in charge, their positions and their contact details, is very helpful for social engineering, as shown in Figure 2.1, with its links to Contacts, Help Desk and Policies. By skimming through the policies, which usually include operational procedures, the attacker can roughly learn how the organization operates.

Figure 2.1: Intranet (CSD Contacts, 2005)


Self-help Guides
Some organizations provide guided help for troubleshooting or installation that could be too informative.
Attacker could learn their configuration details and technology involved which are useful in later attacking
phase. A search how to network setup dhcp server (help desk | helpdesk) shows a how to guide on
network setup, as shown in Figure 2.2 (Long 2005, p. 124).
Information that is beneficial in Figure 2.2 includes proxy address and port number, workgroup name (i.e.
DIS-STUDENT), email information and configurations (i.e. web-based and MS Outlook support), and
additional server names (ie. dis.unimelb.edu.au, unimelb.edu.au.). Attacker can use this information within
the organization networks.


Figure 2.2: Informative Self-help Guide (DIS, n.d.)


Jobs Postings
Recruitment section of organization website could easily being disregarded as a source of information.
However, it reveals information regarding information technology in use and corporate structure. It reveals
operating system, software used, network type and server type. It also shows various corporate departments
with their respective vacant positions with job description. Attacker could perform a physical breach by
impersonating a new employee taking up a new position, pretending it is his first day at work and ask for
access control. Attack will find information regarding jobs vacancies of an organization, by combining
operator site and employment | job | recruitment.

Figure 2.3: Job Postings Reveal Information (Employment, 2005)

Figure 2.3 shows that the technology this organization uses most probably includes .NET Framework applications, an Oracle database, a Veritas application, various operating systems (e.g. Windows 2003, Linux) and an IBM AS/400 server. The attacker could then set his path accordingly and focus his attacks on the weaknesses of those technologies.


Google Local
Part of the social engineering process includes eavesdropping on conversations, people watching, engage in
friendly target employee conversation. Google Local can be very helpful in locating employee favorite hang
out places such as coffee shops, restaurants, grocery stores and pubs to eavesdrop, chat with employees or
update with corporate gossip. Attacker could pose as an interested job applicant, and engage in a
conversation with a target organization IT personnel, which could potentially reveal information regarding
operating system, version, patch levels and application in use (Cole, 2003) in the target organization.
Google Local (http://local.google.com/) (Currently Google Local only works in US, Canada and UK) allows
attacker to find any business type in a target organization surrounding, with detailed map to locate the place,
shown in Figure 2.4.

Figure 2.4: Google Local (Google Local, 2005)


Link Mapping
Links in an organization website can reveal non-obvious relationship between the linked organizations.
Attacker could attack a poorly secured partner site and subvert the trust relationship between the two
domains to compromise the much better secured target. BiLE of www.sensepost.com (BiLE, 2003) is an
automated tool, capable of revealing such hidden relationships based on complex calculations and
predefined rules. For instance, a link from a target site weighs more than a link to a target site (SensePost,
2003, p. 9). Though Google link operator is only capable of showing sites that links to a particular given
URL in the query, it is used intensively to assist such footprinting process in BiLE. It is a subtle way to
learn the possible relationships of an organization, over which an organization have no control.


PROFILING WEB SERVER AND WEB APPLICATION, AND LOCATING LOGIN PORTALS
Attackers can use Google to profile web servers and the web applications running on them before attacking potentially vulnerable machines running vulnerable application versions. Login portals provide front-door access to the target, which is a helpful start for the attacker.
Server Versioning
The server tag at the bottom of a directory listing page provides useful information for determining the type and version of the web server application running on the website, as shown in Figure 3.1. An attacker who wants to exploit the vulnerabilities of, say, Apache 1.3.28 can run a search on server at Apache/1.3.28 to locate potentially vulnerable machines. The query Microsoft-IIS/6.0 server at will locate websites running Microsoft IIS server 6.0. This is a fairly easy way to determine the server version. A vulnerable version does not guarantee an exploitable flaw, as it may have been patched. However, if directory listing is allowed, it could suggest that the administrator is not concerned with server security and that there is a possibility the server is not fully patched.

Figure 3.1: Server Tag Reveals Server Version (Kernelnewbies, n.d.)


Web Application Error Messages
Error messages generated by applications installed on a web server can reveal information about the server and the applications that reside on it. The query ASP.NET_SessionId data source= "Application key" reveals sites with ASP.NET application state dumps, which contain a great deal of information about the web application and other applications residing on the server, such as the database connection string and the application path on the web server, as shown in Figure 3.2. The connection string itself provides valuable information including the database type, database name, username and password used to connect to the database. Thus, it would be easy for the attacker to connect to the database server and manipulate its contents. After all, if such dump files are crawled by Google, the attacker can be fairly confident that the web server is probably not secure.


Figure 3.2: ASP.NET Application State Dump (Broward County, 2005)


PHP error messages can be revealed using the query intext:Warning: Failed opening include_path, which helps the attacker characterize the web server, as shown in Figure 3.3. The error message also exposes the actual server path, web path and related PHP filenames. The attacker could try to traverse these file paths to look for potentially valuable information. Poor programming practice and a lack of comprehensive testing cause such error messages to exist or to escape existing error checking mechanisms.

Figure 3.3: PHP Error Message (RightVision, n.d.)


Default Pages and Documentations
Most web applications and web servers have default or test pages which enable administrators to validate that the application has been installed successfully. Poor configuration has left such pages exposed to Google's crawler. The mere existence of default pages, even only in the Google cache, helps the attacker's profiling process. Figure 3.4


shows an Apache installation found using the query intitle:Test Page.for.Apache seeing.this.instead. Different ranges of server versions have disparate default pages (Long, 2004). Figure 3.4 shows an Apache 1.3.11-1.3.31 installation, as opposed to the Apache 1.3.0-1.3.9 installation in Figure 3.5, found using the query intitle:Test.Page.for.Apache It worked! this Web site!.
Attackers can also use the manuals and documentation that are usually shipped with web server applications to profile web servers, though not as accurately as with default pages. The query intitle:Apache 1.3 documentation or intitle:Apache 2.0 documentation will find the respective ranges of Apache servers (Racerx, 2005). Figure 3.6 shows the IIS 5.1 release notes, found using the query inurl:iishelp core. The mere existence of default pages and documentation can signify a careless administrator, which suggests a potentially vulnerable site. These two techniques give the attacker another approach to identifying web server versions.

Figure 3.4: Apache 1.3.11-1.3.31 Installation (Lanalana, n.d.)

Figure 3.5: Apache 1.3.0-1.3.9 Installation (Mvacs, n.d.)


Figure 3.6: IIS Default Documentation (IIS 5.1, 2001)


Locating Login Pages
Web application login pages, such as the one shown in Figure 3.7, found using the query allinurl:exchange/logon.asp, allow the attacker to profile the applications that reside on the web server, and they act as a break-in channel. In this case, the site even specifies the way the username is constructed (i.e. Note: In most cases, your username) and the software version and patch level (i.e. 5.5 SP4). The attacker could find exploits related to the specific application version to compromise it. Moreover, because the login page is a default page, it implicitly indicates that the website administrator is unskilled and that the security of the site is probably weak (Long, 2005, p. 251). Administrators should customize the login page so that it does not indicate the actual application in use. Ways of finding generic login pages include inurl:/admin/login.asp, please log in or inurl:login.php. The attacker could use the login pages to brute-force or dictionary-attack a range of passwords with the respective usernames.


Figure 3.7: Microsoft Outlook Web Access Login Portal (Myers, 2005)

FINDING EXPLOITS AND VULNERABLE TARGETS USING VULNERABLE APPLICATIONS' COMMON WORDS
If an attacker intends to attack any vulnerable target rather than a specific one, Google is highly effective in finding such vulnerable targets. The attacker can first use Google to dig up exploit code written by hackers to facilitate exploitation of vulnerable targets. Subsequently, the attacker can search for vulnerable targets through the words that flawed applications commonly display.
An attacker can rely on Google to search for exploit code posted on public sites or in hacking community sites. To retrieve the large number of exploits usually written in the C language, use the query filetype:c exploit. However, some exploits are presented in other formats such as txt, html or php. Thus, to locate such exploits effectively, the attacker can search for common code strings inside the exploit code, such as main or #include <stdio.h>, which is commonly included in C programs to reference the standard input/output library (ComSec, 2003). Regardless of file extension, the query #include <stdio.h> main exploit will produce sites containing exploit code.
An attacker can use the source code of a vulnerable application to construct an effective query to search for vulnerable targets. The attacker could visit security advisory websites to learn which third-party applications used in web applications have security vulnerabilities. Most of the sites that use such third-party components display the phrase Powered by, followed by the component name and version. For instance, the query Powered by CubeCart 2.0.1 will locate websites using CubeCart 2.0.1, which is vulnerable to SQL injection and cross-site scripting (Secunia, 2005). Another example of looking for vulnerable targets is allinurl:/CuteNews/show_archives.php, as the CuteNews show_archives parameter is susceptible to cross-site scripting (Mohanty, 2005).
To get an idea of how to produce an accurate query to locate the vulnerable websites, the attacker can learn the common display words of sites using a third-party web component by checking the component's source code (Long, 2005, p. 185). If the source code is not available, the attacker could install the vulnerable component directly to learn its common signs. Large numbers of sites that use third-party applications and components leave these trails on their sites, so there is a fair chance that an attacker could locate many sites using unpatched, vulnerable third-party applications.

FINDING USERNAME, PASSWORD, SENSITIVE INFORMATION


Usernames and passwords are used by most authentication mechanisms, and attackers are very keen on hunting them. Google can also be used to unveil highly sensitive information such as credit card numbers.
Finding Username
Knowing a user's username means the attacker has solved half of the puzzle of breaking in. The attacker could use the username to socially engineer the help desk into revealing the matching password. A generic query for finding usernames is inurl:admin inurl:userlist. Alternatively, try inurl:root.asp?acs=anon to locate a Microsoft Outlook Web Access address book (Chambet, 2004). It contains a publicly accessible address book with staff contacts, as shown in Figure 5.1. The Alias column is most likely the staff username used for the organization's logins. The attacker can submit common starting letters of names to harvest almost all of the entries in the address book. Some sites even show how a username is usually created (e.g. append the first letter of your last name to your first name).


Figure 5.1: Outlook Public Address Book (SPC, n.d.)


Finding Password
A host with Microsoft FrontPage Extensions installed can be searched for usernames and passwords using "# -FrontPage-" inurl:service.pwd. Although the password is stored in DES-encrypted form, an attacker could run a tool like John the Ripper to crack it, as seen in Figure 5.2 (ComSec, 2003). In addition, MySQL database credential information can potentially be found in connect.inc files (Google Hacking, n.d.). Figure 5.3 shows the result of searching intitle:index of intext:connect.inc. However, finding passwords via Google does not actually yield many positive results, and most passwords found are no longer valid. Passwords that are found are usually stored in configuration or log files in an unencrypted or weakly encrypted format.

Figure 5.2: FrontPage Extension Usernames and Passwords (Heyerlist, n.d.)


Figure 5.3: MySQL Database Credential (Central College, n.d.)


Other Valuable Information
There is much more valuable information that attackers can obtain through Google searches, for instance credit card numbers. Most of these highly valuable numbers are released by attackers who deceive unwitting users into submitting personal information through phishing, rather than leaking from e-commerce sites (Leyden, 2005).
National identification numbers such as the Social Security Number (SSN) can also be located using Google. The fact that some educational institutions use the SSN for student identification threatens students' privacy, exposing them to possible identity theft. They are usually posted alongside the associated names and grades and exposed on public networks, as shown in Figure 5.4 (edited), found using the query SSN 772-55. Some organizations also announce competition winners' names together with national identification numbers (IC in this example) on their websites, as shown in Figure 5.5 (edited), found using the query IC "820508-*-*". Once the attacker knows the format of such numbers, it is trivial to find them. Such numbers can be used to perform identity theft, for instance to apply for a credit card or driving licence.

Figure 5.4: Google Uncovers Social Security Number (Rutgers, 2005)


Figure 5.5: National Identification Number Exposed in Competition Winners List (Maybank, 2005)

FINDING FILES
Files and databases contain information that attackers can use to accomplish their particular objectives. Google can be used to locate files and the contents inside these files. This section focuses on ways to locate configuration files, log files, office documents and databases, since they usually contain sensitive information.
Google Cache
The Google cache can be very helpful to ordinary users as well as attackers. Each time Google crawls a page, it stores a copy of the page as a cache on Google's own servers. Thus, users can still access the document even though the live page has been removed. Unfortunately, an attacker can take advantage of this feature to grab sensitive information that has been removed from the hosting server. Additionally, an attacker can achieve a degree of anonymity by accessing a page's cached version, as the data is retrieved from Google's server, which acts like a proxy, and not from the actual server. However, this is only true if the stripped or text-only cache is retrieved (Greene, 2001); other non-text objects such as images in cached pages are still retrieved from the actual server.
Configuration Files
Configuration files provide settings information on how applications or networks are configured to operate, which is very helpful to an attacker. Figure 6.1 shows a result found using the query filetype:ini inurl:ws_ftp.ini. It locates WS_FTP application configuration files, which contain FTP server information such as the username, password, directory and host name. The weakly encrypted password shown can easily be decrypted using free tools (Ipswitch, 1996).
Sometimes Google returns vast amounts of results which require further refinement. Ways to filter the results include (Long, 2004):
- Create unique base words or phrases based on the actual file.
- Filter out words like test, samples, how-to and tutorial to exclude example files.
- Locate and filter out commonly changed values like yourservername and yourpassword in sample configuration files.
A minimal sketch of this kind of query refinement is given below.
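The following Python fragment is one way such refinements might be applied. It is a minimal sketch that only builds the query string; the base query and exclusion terms are examples taken from this paper, and nothing is actually submitted to Google.

# Refine a noisy Google query by appending '-' exclusion operators.
# The base query and exclusion lists below are illustrative only.
base_query = "filetype:ini inurl:ws_ftp.ini"

# Words that typically indicate documentation or example files.
noise_words = ["test", "sample", "how-to", "tutorial"]

# Placeholder values commonly left in sample configuration files.
placeholder_values = ["yourservername", "yourpassword"]

def refine(query, exclusions):
    """Return the query with a Google '-' exclusion for each unwanted term."""
    parts = [query] + ["-%s" % term for term in exclusions]
    return " ".join(parts)

if __name__ == "__main__":
    print(refine(base_query, noise_words + placeholder_values))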


Figure 6.1: WS_FTP Configuration File (Ipswitch, n.d.)


Office Documents
Office documents include word processing documents, spreadsheets, Microsoft PowerPoint, Microsoft Access and Adobe Acrobat files. The contents of some of these files can be crawled and rendered by Google as HTML documents, which enables an attacker to hunt for highly relevant documents through a Google search. The attacker could, for instance, use the query filetype:xls username password email to locate Microsoft Excel files that potentially contain sensitive information, as shown in Figure 6.2 (edited) (Greene, 2000).

Figure 6.2: Microsoft Excel Reveals Username and Password (Digitalbrain, 2003)
Network Report
Nessus is a vulnerability scanner that produces an assessment report after scanning a network for vulnerabilities and misconfiguration. With that in mind, an attacker could query Google with This file was generated by Nessus to find such reports and locate vulnerabilities on potential targets that have yet to be fixed (Trivedi, 2005, p. 8). Figure 6.3 shows that such a report contains the assessed host IP, open port numbers, detailed descriptions of potential vulnerabilities and countermeasures. There is a high possibility that the sites mentioned in the report could be exploited, as such reports may have been uploaded by malicious users who performed vulnerability scanning on other machines. If an administrator were conscientious enough to perform vulnerability scanning, the assessment report should not have been left on the server.

Figure 6.3: Nessus Assessment Report (Nessus Scan Report, n.d.)


Database SQL scripts
Database dumps usually refer to SQL scripts that contain text-based information about database and table structure, including table names, field names, field types and even the actual records in tables. Administrators use these files to reconstruct a database. Figure 6.4 shows a dump file containing table structures and actual records (i.e. username, password), found using the query filetype:sql # Dumping data for table (username|user|users|password) (Long, 2005, p. 309). The query consists of the generic dump file extension, a common header line and promising field names. Such SQL scripts are very helpful if the site is also vulnerable to SQL injection: since the attacker knows the table structure, the attacker could manipulate the database through SQL injection. At worst, if the site's login credentials are stored in that database, the attacker could insert a username and password into the database to access private areas of the site. A short sketch of how much structure such a dump exposes is given below.
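The following Python sketch illustrates the point by pulling table and column names out of a small, fabricated MySQL-style dump fragment with regular expressions. Real dumps vary in layout, so the patterns and the sample text are indicative only.

import re

# A tiny, fabricated fragment in the style of a MySQL dump, for illustration only.
dump_text = """
# Dumping data for table 'users'
CREATE TABLE users (
  username varchar(32),
  password varchar(64)
);
INSERT INTO users VALUES ('alice', '5f4dcc3b5aa765d61d8327deb882cf99');
"""

# Table names announced in CREATE TABLE statements.
tables = re.findall(r"CREATE TABLE\s+`?(\w+)`?", dump_text, re.IGNORECASE)

# Column definitions: indented lines of the form "name type..." inside the CREATE block.
columns = re.findall(r"^\s+`?(\w+)`?\s+\w+", dump_text, re.MULTILINE)

print("Tables:", tables)
print("Columns:", columns)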


Figure 6.4: Database Dumps Reveal Helpful Information (MySQL Dump, n.d.)

WEB-ENABLED NETWORK DEVICES


Many network devices such as routers, firewalls, printers and proxy servers have web interfaces that show the device status and allow administrators to configure their settings remotely. Network device misconfiguration has exposed these devices to Google. An attacker could subvert these devices to gain access to the trusted network they protect, or directly exploit the devices' vulnerabilities. For instance, the query intitle:ADSL Configuration page will find SolWise ADSL modems crawled by Google, as shown in Figure 7.1 (Chat11, 2004).


Figure 7.1: ADSL Modem Configuration Page (ADSL Configuration Page, n.d.)
Most network printers have a web-based interface that allows users to conveniently view the printer's status or modify its configuration from any web browser. Misconfiguration has exposed such printers on the Internet. Figure 7.2 shows a network printer captured using the query Phaser 6250 Printer Neighborhood (Chambet, 2004). The network printer provides a lot of detailed information about the surrounding network, including its IP address, the print job list, printed document filenames and the names of computers issuing print jobs. An attacker can further compromise the printer through its administrative page; some of these printers even allow the attacker to issue a test print page over the Internet. Network printers like the Phaser 740 have a vulnerability that allows an attacker to access a hidden file through the URL and modify the administrative password (PhaserLink, 1999). If an attacker is simply aiming to cause annoyance, Google is very effective in finding these network devices for exploitation. Thus, administrators should never allow such devices to be exposed on the Internet.


Figure 7.2: Google Exposes Network Printer (Phaser 6250, 2003)

COUNTERMEASURES
There are a few simple practices users should follow to protect themselves from such seemingly innocuous attacks using Google. Firstly, users need to know the two ways a page can be found and indexed by Google: the page may be linked from other sites that have been crawled by Google, or the page may be manually submitted to the Google database. To avoid personal profiling, users should avoid using their actual names when corresponding in web-based message groups or blogs. Users can perform searches (e.g. on their actual name or username) on themselves so that they are aware of the information published on the Internet, and so avoid such information being used against them.
Administrators should perform searches on their own web servers using Google to expose potential threats. All of the above-mentioned techniques can be automated using tools like Gooscan, SiteDigger and Athena. Such tools use a signature database consisting of various Google queries to search for a site's information leakage, allowing administrators to check their own websites for exposure effectively and efficiently. However, using automated tools like Gooscan and Athena that do not utilize the Google API violates Google's Terms and Conditions, which can result in temporary banning from the service (Calashain, 2003, p. 137). A minimal sketch of this kind of signature-based self-audit is shown below.
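The fragment below is a minimal Python sketch of the approach such tools take: it combines a few signatures drawn from this paper with the site: operator so that an administrator can review, or run manually, the resulting queries against their own domain (example.com is a placeholder). It deliberately stops at printing the query strings, since automating submission without the Google API raises the Terms and Conditions issue noted above.

# Generate site-restricted self-audit queries from a small signature list.
# The signatures are examples from this paper, not a full database such as
# those shipped with Gooscan or SiteDigger.
SIGNATURES = [
    "intitle:intranet inurl:intranet human resources",
    "filetype:ini inurl:ws_ftp.ini",
    '"This file was generated by Nessus"',
]

def self_audit_queries(domain, signatures=SIGNATURES):
    """Restrict each signature query to the administrator's own domain."""
    return ["site:%s %s" % (domain, sig) for sig in signatures]

if __name__ == "__main__":
    for query in self_audit_queries("example.com"):
        print(query)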
Employees in an organization should be advised of what information is published for public access, to avoid possible social engineering using such information against them. Administrators should first make sure all web servers are installed with the latest patches. Since directory listings provide a road map to private files, administrators should disable directory listing unless users are allowed to browse files in an FTP-style manner. All default usernames, passwords, test pages and documentation should be removed. Administrators should ensure their web pages are fully tested for potential errors and that all errors are caught properly. Default pages should be customized to remove all possible common words.
A robots.txt file should be used to specify web directories that should not be indexed by Google. However, an attacker can still directly access the robots.txt of a targeted site to learn its directory structure. Use a password protection mechanism to protect private pages that are intended for specific users, since Google is unable to index password-protected pages. Put the META tag <META NAME="ROBOTS" CONTENT="NOARCHIVE"> in a page's HEAD section to prevent Google from caching the page. Lastly, if a page that is not intended for public viewing is found in Google, then after removing the page from the web servers, the administrator can resort to the Google Remove URL and Google Groups Post service (http://services.google.com/urlconsole/controller) to remove the identified URL and its respective cached page from the Google repository. A sketch of a simple external check on robots.txt and the NOARCHIVE tag is given below.
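The following Python sketch, using only the standard library, shows one way an administrator might verify from outside that a sensitive path is disallowed in robots.txt and that a page carries a NOARCHIVE directive. The domain and path are placeholders, and a crude substring test stands in for proper HTML parsing.

import urllib.request
import urllib.robotparser

SITE = "http://www.example.com"   # placeholder domain
SENSITIVE_PATH = "/intranet/"     # placeholder directory to check

def is_blocked_by_robots(site, path):
    """True if robots.txt tells well-behaved crawlers (e.g. Googlebot) to skip the path."""
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(site + "/robots.txt")
    rp.read()
    return not rp.can_fetch("Googlebot", site + path)

def has_noarchive(url):
    """Crude check for a NOARCHIVE robots META tag in the page source."""
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="ignore")
    return "noarchive" in html.lower()

if __name__ == "__main__":
    print("robots.txt blocks path:", is_blocked_by_robots(SITE, SENSITIVE_PATH))
    print("NOARCHIVE present:", has_noarchive(SITE + SENSITIVE_PATH))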


CONCLUSION
Google is able to produce some astonishing results, which depend very much on the precision of the constructed query. The possibilities for constructing potentially exploitable queries are boundless; the creativity of the attacker in creating queries is the only limitation. Google is very effective in profiling an individual, as many users have unwittingly disclosed personal information. They are unaware that search engines like Google can collect and index all this information and serve it to anyone with the correct query. If an attacker has no specific target, Google is highly effective in finding vulnerable targets from the mass of indexed websites for opportunistic attacks; it is far less effective at finding vulnerabilities on a specific target, for which other penetration testing tools are much better suited. The power of Google is a double-edged sword. Attackers have armed themselves with Google to gather pieces of seemingly innocuous organizational information to facilitate further compromise. Administrators should embrace Google as one of their penetration testing tools to protect their organization from information leakage.

REFERENCES
ADSL Configuration Page (n.d.) Retrieved May 3, 2005, from http://router.breukink.co.uk/
BiLE (2003). Bi-directional Link Extraction. Retrieved April 30, 2005, from
http://www.sensepost.com/restricted/BilePublic.tgz
Broward County (2005). OnCoRe Setting Options. Retrieved May 5, 2005, from
http://205.166.161.12/OncoreV2/Settings.aspx
Calashain, T. (2003). Google Hacks: 100 Industrial-Strength Tips & Tools. California: OReilly.
Central College (n.d.) Retrieved May 5, 2005, from http://enrolme.centralcollege.ac.uk/enrolme/connect.inc
Chambet, P. (2004). Google Attacks. Retrieved May 3, 2005, from
http://www.blackhat.com/presentations/bh-usa-04/bh-us-04-chambet/bh-us-04-chambet-googleup.pdf
Chat11 (2004, July 5). Using Google to Find Passwords. Retrieved May 1, 2005, from
http://www.chat11.com/How_To_Use_Google_To_Find_Passwords
Cole, E. (2003). Hacker Beware. Singapore: Prentice Hall.
ComSec (2003, May 25). Google A Dream Come True. Retrieved May 3, 2005, from
http://www.governmentsecurity.org/comsec/googletut1.txt
CSD Contacts (n.d.). Retrieved May 5, 2005, from http://www.jmls.edu/intranet/csd/contacts.shtml
Curriculum Vitae (n.d.). John Terning Curriculum Vitae. Retrieved May 3, 2005, from
http://t8web.lanl.gov/people/terning/john/cv/cvmain.html
Davies, G. (2004). Advanced Information Gathering. Retrieved May 3, 2005, from
http://packetstormsecurity.com/hitb04/hitb04-gareth-davies.pdf
Digitalbrain (2003). Retrieved May 3, 2005, from
http://frome.digitalbrain.com/frome/ICT/Digitalbrain%20users/ All%20DigitalBrain%20Users.xls
DIS (n.d.). DIS Student Plug-In Network Setup How-To. Retrieved May 3, 2005, from
http://www.dis.unimelb.edu.au/helpdesk/connect.pdf
Employment (2005, March 31). Public Mutual - Employment Opportunity. Retrieved May 3, 2005, from
http://www.publicmutual.com.my/page.aspx?name=co-Employment
Google (2003). Google Acquires Deja's Usenet Archive. Retrieved April 28, 2005, from
http://groups.google.com/googlegroups/deja_announcement.html
Google Local (2005). Retrieved May 5, 2005, from http://local.google.com/
Granneman, S. (2004, March 9). Googling Up Password. Retrieved May 2, 2005, from
http://www.securityfocus.com/columnists/224
Greene, T.C. (2000, June 25). Crackers Use Search Engines to Exploit Weak Sites. Retrieved April 30,
2005, from http://www.theregister.co.uk/2000/06/25/crackers_use_search_engines/


Greene, T.C. (2001, November 28). The Google Attack Engine. Retrieved April 30, 2005, from
http://www.theregister.co.uk/2001/11/28/the_google_attack_engine/
Heyerlist (n.d.) Retrieved May 7, 2005, from http://www.heyerlist.org/garderobe/_vti_pvt/service.pwd
IIS 5.1 (2001). Internet Information Services 5.1 Release Notes. Retrieved May 6, 2005, from
http://www.aspit.net/iishelp/iis/misc/localhost/iishelp/iis/htm/core/readme.htm
Ipswitch (n.d.) Retrieved May 5, 2005, from http://www.ryerson.ca/~mblee/WS_FTP.ini
Ipswitch (1996). WS_FTP Professional User Guide. Retrieved April 30, 2005, from
http://www.oxinet.co.uk/ipswitch/ws_ftp.pdf
Kernelnewbies (n.d.) Index of /documents/kdoc. Retrieved May 5, 2005, from
http://kernelnewbies.org/documents/kdoc/
Lanalana (n.d.) Test Page for Apache Installation. Retrieved May 7, 2005, from http://iolanipalace.org/
Leyden, J. (2005, April 4). Hacking Google for Fun and Profit. Retrieved May 1, 2005, from
http://www.securityfocus.com/news/10816
Long, J. (2004, March 19). The Google Hackers Guide. Retrieved May 1, 2005, from
http://johnny.ihackstuff.com/modules.php?op=modload&name=Downloads&file=index&req=getit&
lid=34
Long, J. (2005). Google Hacking for Penetration Testers. United States of America: Syngress Publishing.
Maybank (2005). Maybank MaxiHome Year End Promotion Winners List. Retrieved May 5, 2005, from
http://www.maybank2u.com.my/maybank_group/
products_services/consumer_loan/maxihome_winners2.shtml
Mohanty, D. (2005, March 11). Demystifying Google Hacks. Retrieved May 2, 2005, from
http://www.securitydocs.com/link.php?action=detail&id=3098&headerfooter=no
Mowse (2003, February 16). Google Knowledge: Exposing Sensitive Data with Google. Retrieved May 1,
2005, from http://www.digivill.net/~mowse/code/mowse-googleknowledge.pdf
Mvacs (n.d.) It Worked! The Apache Web Server is Installed on this Web Site!. Retrieved May 7, 2005,
from http://mvacs.ess.ucla.edu/
Myers (2005). Microsoft Outlook Web Access Logon. Retrieved May 8, 2005, from
http://mail.dnmyers.edu/exchange/logon.asp
MySQL Dump (n.d.) MySQL Dump 8.22. Retrieved May 4, 2005, from
http://www.ozeki.hu/attachments/34/etalon.sql
Nessus Scan Report (n.d.). Retrieved May 8, 2005, from
http://www.geocities.com/mvea/debian30r2_install.htm
Phaser 6250 (2003). About Printer Printer 6250. Retrieved May 2, 2005, from
http://140.113.153.116/aboutprinter.html
PhaserLink (1999, November 16). Fwd: Printer Vulnerability: Tektronix PhaserLink Webserver gives
Administrator Password. Retrieved May 1, 2005, from http://www.securityexpress.com/archives/bugtraq/1999-q4/0001.html
Racerx (2005, April). Google Hacking Techniques. Retrieved May 2, 2005, from
http://www.exploitersteam.org/forumnews-id30.html
RightVision (n.d.). Serveur Appliance Software Right Vision. Retrieved May 10, 2005, from
http://www.rightvision.com/lg-fr-rubrique-distributeurs.html
Rutgers (2005). Names. Retrieved May 10, 2005, from http://teachx.rutgers.edu/~mja/wakka/
workfiles/int_excel/students.xls
Secunia (2005, March 3). CubeCart Cross-Site Scripting Vulnerabilities. Retrieved April 28, 2005, from
http://secunia.com/advisories/14416/
SensePost (2003, February). The Role of Non-Obvious Relationships in the Foot Printing Process.
Retrieved April 28, 2005, from http://www.sensepost.com/restricted/BH_footprint2002_paper.pdf


SPC (n.d.). Find Names. Retrieved May 8, 2005, from


http://email.spc.edu/exchange/USA/finduser/root.asp?acs=anon
Trivedi, K. (2005, January). Foundstone SiteDigger 2.0 Identifying Information Leakage Using Search
Engines. Retrieved April 1, 2005, from
http://www.foundstone.com/resources/whitepapers/wp_sitedigger.pdf

COPYRIGHT
Lih Wern Wong 2005. The author/s assign the School of Computer and Information Science (SCIS) &
Edith Cowan University a non-exclusive license to use this document for personal use provided that the
article is used in full and this copyright statement is reproduced. The authors also grant a non-exclusive
license to SCIS & ECU to publish this document in full in the Conference Proceedings. Such documents
may be published on the World Wide Web, CD-ROM, in printed form, and on mirror sites on the World
Wide Web. Any other usage is prohibited without the express permission of the authors.


The effectiveness of commercial erasure programs on BitTorrent activity


Andrew Woodward
School of Computer and Information Science
Edith Cowan University
a.woodward@ecu.edu.au

Abstract
Recent developments have seen the closure of P2P sites such as Kazaa and Napster due to legal action, and
a subsequent rise in the use of alternative file-sharing software, namely BitTorrent. This research in
progress aims to evaluate the effectiveness of commercial programs to erase traces of the use of such
software. The erasure programs Privacy Suite, Window Washer and R-Clean and Wipe were used on a
machine that had used the BitTorrent client Azureus to download two torrent files. The drive was imaged
and examined forensically with Autopsy, and the registry was also examined on the source machine. The
program R-Clean and Wipe left evidence in both the registry and the image of the name and type of files that
had been downloaded with this software. Of greater concern was that Window Washer and Privacy Suite claimed to erase evidence of P2P activity, yet they did not remove evidence of torrent activity.
Current erasure tools do not appear to be effective at removing traces of BitTorrent activity.

Keywords
P2P, BitTorrent, file sharing, erasure software

INTRODUCTION
Perceived losses by various media representative organisations (RIAA, MPAA) have led to the closure of many sites, and even software, which supported or allowed users to share files with each other. Such software is known as peer-to-peer, or more commonly as P2P. The most popular examples of this type of software are eDonkey, Kazaa and Napster. Napster was the original P2P software client, and was closed down some time ago (BBC 2000). Kazaa was a newer incarnation of P2P software, and has also recently had legal action taken against it which will likely see its demise due to the changes being enforced on it (Ferguson 2005). In addition to closing down the organisations that allowed such file sharing to take place, the various representative bodies have also gone after high volume users of these services or products (MPAA 2005).
As a result, a form of file sharing software that had been around since 2001, known as BitTorrent, has become more popular (BitTorrent 2005). It is similar to its contemporary P2P clients in that it is a decentralised means by which users can exchange information. The difference with torrent exchange, or streaming as it is known, is that a file is broken down into much smaller fragments, and it is these fragments that are exchanged between many users. This type of technology is actually a very efficient means of allowing users to download files, and is being used by various organisations (Layman 2005; Linspire 2005) for legitimate purposes. The eDonkey software (MetaMachine 2003) is being used by file sharers as a replacement for the previous two clients, and while there is evidence that usage of this software is increasing in some countries, in others it is torrent software that dominates (BBC 2005).
While software exists to remove traces of P2P activity from programs such as eDonkey, it is unknown whether this software is effective at removing traces of BitTorrent activities. Three commercially available erasure tools were selected to determine whether they can remove traces of torrent activity: R-Clean and Wipe (Rtt, 2005), Window Washer (Webroot, 2005) and Privacy Suite (CyberScrub, 2005). This research in progress paper examined the ability of these programs to remove evidence of torrent activity.

THE ERASURE PROGRAMS


Three different erasure programs, varying in both claims and manufacturer, were used for testing. Details of each program and the claims made about its ability to erase various activities are given here.


R-Clean and Wipe - Version 5.1, Build 1169


This erasure software is produced by R-Tools technology and the manufacturer makes the following claims
about its software:
R-Wipe & Clean is a complete solution to wipe useless files and keep your computer
privacy. Irretrievably deletes private records of your on- and off-line activities, such as
temporary internet files, history, cookies, autocomplete forms and passwords, swap files,
recently opened documents list, Explorer MRUs, temporary files, etc. and free up your
disk space. The utility wipes files and unused disk space using either fast or secure erase
algorithms. All files and folders may be combined in wipe lists to erase them in a single
procedure. Supports both the FAT and NTFS file systems. All separate wiping and
cleaning tasks can be combined in one or more erasing procedures launched immediately
or at predefined times or events as a background task.
(RTT, 2005).
It is worth noting that this software does not specifically claim to erase evidence of either P2P or BitTorrent activity.
Window Washer Version 6, Build 6.0.2.466
This software is produced by the Webroot Company and makes the following claims about its software:
Extensive Wash Areas
Window Washer scrubs hundreds of areas on your PC to remove unnecessary files to
ensure your privacy and free up valuable disk space.
Browser Activity Eraser
Window Washer cleans all aspects of your browser activity, including Internet history,
address bar, cache, cookies, and more. Mozilla and Firefox users now enjoy the same
online privacy protection that users of Internet Explorer, AOL and Netscape already
enjoy.
Permanent Bleaching
Bleach, an encryption feature, completely overwrites files with random characters to
make them unrecoverable. This feature is so powerful it exceeds the tough standards of
the Department of Defense and the National Security Agency.
Free Space Cleaner
Free space on your computer contains portions of old and previously deleted files and
documents. Window Washer now cleans this area making the files you deleted earlier
permanently unrecoverable.
One-click Shredder
Window Washer lets you simply and conveniently shred a folder and all of its contents,
or just a single file, in one step. Just a simple right-click will permanently overwrites
these files, making them unrecoverable.
Critical File Protection
Window Washer includes built-in safety features to help prevent you from accidentally
removing important files. Alerts prompt you to confirm your request to delete special
folders, like system folders, My Documents, My Photos, and others, so they remain safe
from unintentional deletions.
Smart Cookie Saver
Window Washer deletes the cookies you don't want and lets you keep and save those
you do. That way you maintain your preferred Internet settings and log-ins for all your
favorite sites.
Flexible Washes


During a wash, Window Washer automatically cleans the latest versions of your favorite
programs such as Real Player, Google Search Toolbar, iTunes, Macromedia Flash
Player, Adobe Acrobat and hundreds more, to keep these programs running smoothly.
Automatic Wash Cycles
You can set Window Washer to automatically clean your system at specified intervals,
like at shut down or start up. For added security, we recommend setting Window Washer
to wash when you close your Internet browser.
Total System Erase
Window Washer can be set to fully erase your hard drive, files, programs and operating
system for easy re-formatting. Consider using this feature if you're donating or selling
your PC and you don't want your files to be seen by strangers.
(Webroot, 2005)
Again, while this product states that it erases all history of Internet activity, it makes no specific claims
about either P2P or BitTorrent activity.
Privacy Suite Version 4.0, Build 4.0.0.144
The manufacturer of this software, Cyberscrub, made the following claims about their software:
Key Features
Completely eliminates sensitive data from your computer: valuable corporate trade
secrets, business plans, personal files, confidential letters, e-mail messages, Media
Player/Real Player history, Web browser tracks, AutoComplete, cookies, Recent Docs,
Find/Run data, etc.. Supports Internet Explorer, Netscape, Mozilla and Opera.
Peer2Peer- Erase all evidence from 22 popular applications such as KaZaA, iMesh,
Morpheus and more.
Privacy Suite erases data by wiping its contents beyond recovery, destroying its name
and dates and finally removing it from disk.
Meets and exceeds the U.S. Department of Defense standards for the permanent erasure
of digital information (U.S. DOD 5220.22).
Wipe compressed files on NTFS (allows wiping from the original location of the file).
Scramble file names and folders- destroy file attributes from FAT or MFT partitions.
Offers wipe methods that can stop both software and hardware recovery tools from
restoring the erased data.
Stealth mode.
Isaac Random Gernerating Algorithm.
Completely destroys any data from previously deleted files that might still be accessible
on your disk (in the Recycle Bin, in the unused area of the disk or in the slack portion of
existing files).
Destruction of file attributes from previously "deleted" files.
Integration with the Windows Recycle Bin: Privacy Suite can destroy the files contained
in the Recycle Bin beyond recovery.
Integration with the Windows shell. You can drag files and folders from Explorer and
drop them in Privacy Suite, or you can erase them directly from Explorer or My
Computer, with a single mouse click.
Eliminate newsgroup binaries (photos) and chat room conversations and Instant
Messages that are stored on your computer.
Erases folder structures (folders with all their subfolders and files) and even entire
drives.


Delete "locked" Windows files, index.dat, the swap file and "cookies" that track your
Internet history .
Cookie management allows you to keep selected cookies.
Privacy Suite can automatically clear the contents of folders that usually contain
sensitive data (such as the Web browser cache, Temporary Internet files, the recent
document list, the folder designated for temporary files, etc.).
Advanced features like verifying each wipe pass and each disk operation allow Privacy
Suite to intercept any failures and inform you if data is not successfully erased.
The command line parameters allow you to insert erasing commands to your BAT files
and then run this BAT file automatically using SystemAgent or other scheduling
software.
USB flash mini/thumb drives.
Supports FAT12, FAT16, FAT32 and NTFS file systems, floppy, ZIP and Jaz drives.
(CyberScrub, 2005)

METHODOLOGY
Step 1
A PC was imaged with Microsoft Windows XP, service pack 2, and the latest Windows updates. The
BitTorrent client Azureus (version 2.3.0.4), an open source program, was downloaded and installed
(Azureus, 2005). As part of this install, the latest Java run-time environment (JRE) version 1.5.0 was also
installed (Sun Microsystems, 2005). After successful installation, a download of two legal files was
commenced using the Azureus program. At this point, the drive was imaged as the datum so that the three
erasure programs could be used. Details of the files used to perform the BitTorrent downloads are as
follows:
Observatory Online Archives Volume 1: http://www.legaltorrents.com/bit/observatory-online-archives-vol-1.zip.torrent
Lawrence Lessig Free culture: http://www.legaltorrents.com/bit/freeculture.zip.torrent
The first of these was a collection of MP3 music files, and the second a book title.
Step 2
The next step was to install one of the erasure programs and use it to erase Internet and downloading activities with its default settings. Following this, the drive was imaged using dd on the Helix 1.6 Linux bootable CD-ROM, and MD5 hashes of both the source drive and the image file were created and compared for consistency. Where necessary, this image file was then examined using Autopsy (Sleuthkit 2005) to determine whether information had actually been permanently erased. The registry of the source machine was also examined to determine whether traces of BitTorrent activity remained. A sketch of the hash comparison step is shown below.
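As a minimal sketch of that verification step, the following Python fragment computes and compares MD5 digests with the standard hashlib module. The device and image paths are examples only, and reading a raw device requires appropriate privileges; in the methodology itself the hashes were produced from the Helix environment.

import hashlib

def md5_of(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file or raw device, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

if __name__ == "__main__":
    source = "/dev/sda"             # example source drive
    image = "/mnt/evidence/sda.dd"  # example dd image
    src_hash, img_hash = md5_of(source), md5_of(image)
    print("source:", src_hash)
    print("image: ", img_hash)
    print("match: ", src_hash == img_hash)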
Step 3
Step 2 was repeated using the image containing the torrent software and torrent activity created in Step 1, but with a different erasure program.
All erasure software was run with its default settings, because the aim of this research was to determine what the programs themselves would erase. Altering settings from the default would mean that the researcher's level of knowledge influenced what activities the erasure tools removed.

RESULTS
R-Clean and Wipe
This program did not remove any traces of torrent activity from the test machine. The Internet files which contained links leading to the download site were deleted, but these were recoverable with Autopsy. The actual files downloaded, and the torrent file itself which pointed to the downloaded files, were also not erased (Figure 1). A simple keyword search of the registry of this machine gave information relating to the exact files that had been downloaded (Figure 2); a sketch of such a registry search is given after Figure 2. In addition, the torrent files were still available in a hidden folder under Documents and Settings for the user.

Figure 1: The torrent file linked to the download still remained after erasure

Figure 2: Evidence of the torrent activity was still found in the registry after using R-Clean and Wipe.
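As an illustration of the kind of simple keyword-based registry search used here, the following Python sketch walks part of the Windows registry looking for value names or data that mention "torrent". It uses the standard winreg module (Windows only); the starting hive, key and recursion depth are arbitrary choices for illustration, not the exact procedure followed in this study.

import winreg

KEYWORD = "torrent"

def search_key(hive, subkey, depth=3):
    """Recursively print registry values whose name or data mention KEYWORD."""
    try:
        key = winreg.OpenKey(hive, subkey)
    except OSError:
        return
    # Check the values stored directly under this key.
    index = 0
    while True:
        try:
            name, value, _ = winreg.EnumValue(key, index)
        except OSError:
            break
        if KEYWORD in str(name).lower() or KEYWORD in str(value).lower():
            print(subkey, "->", name, "=", value)
        index += 1
    # Recurse into subkeys, up to a fixed depth to keep the walk short.
    if depth > 0:
        index = 0
        while True:
            try:
                child = winreg.EnumKey(key, index)
            except OSError:
                break
            search_key(hive, subkey + "\\" + child, depth - 1)
            index += 1

if __name__ == "__main__":
    search_key(winreg.HKEY_CURRENT_USER, "Software")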
Window Washer
Whilst this program claims to remove evidence of P2P activity, it did not remove any evidence of the BitTorrent downloading. As with the previous program, R-Clean and Wipe, evidence still remained both in the files on disk (Figure 3) and in the registry of the test machine (Figure 4).

Figure 3: The torrent files used to download the test files were still found on the hard drive, without the
need for forensic analysis


Figure 4: Evidence of torrent activity found in the registry after using Window Washer
Privacy Suite
This package also listed removal of P2P activity on its web site, but it did not remove all traces of torrent activity. Unlike the previous two packages, this program did remove evidence from the registry, but again it did not remove the torrent files or the downloaded files (Figure 5).

Figure 5: The program Privacy Suite did remove evidence from the registry, but did not remove the torrent
files that were used to download the files.

CONCLUSION
This research in progress examined the effectiveness of three commercially available secure erasure programs in removing evidence of BitTorrent activity. All three programs were found deficient when it came to cleansing the PC of BitTorrent activity. Whilst forensic analysis was used with all programs, locating a simple file, and in two cases a simple keyword-based registry search, revealed that the computer in question had been used to download files and, further, that the names of these files were recoverable. It is worth pointing out again that these programs were used with their default settings. It is likely that some of them may be configurable to remove traces of torrent activity. However, this would require in-depth knowledge of where the files and traces of torrent activity reside on the machine; if a user already knew where this information was, they would not be resorting to an erasure program to remove it.
Further research is necessary to determine whether, with modification from the defaults, these programs can be made to remove evidence of torrent activity. There is also scope to examine other erasure software to determine its effectiveness in performing the same task. More in-depth examination of the hard drives using forensic analysis software, to find out whether other evidence still exists, will be part of any further research.

REFERENCES:
BBC (2000). Napster closure threat. Retrieved 5/9/05 from http://news.bbc.co.uk/1/hi/business/789132.stm
BBC (2005). File sharers move from Bit Torrent. Retrieved 4/9/05 from
http://news.bbc.co.uk/1/hi/technology/4196642.stm
BitTorrent (2005). What is BitTorrent? Retrieved 4/9/05 from http://www.bittorrent.com/introduction.html


Cyberscrub (2005). Cyberscrub Privacy Suite 4 New features. Retrieved 15/9/05 from
http://www.cyberscrub.com/products/privacysuite/features.php?n=new_features_text
Ferguson, I. (2005). Kazaa appeal likely in 2006. Retrieved 9/9/05 from
http://www.zdnet.com.au/news/software/soa/Kazaa_appeal_likely_in_2006/0,2000061733,39210189
,00.htm
Layman, J. (2005). Legitimate use, open source, keep BitTorrent out of court. Retrieved 5/9/05 from
http://trends.newsforge.com/article.pl?sid=05/03/02/1748210&tid=147&tid=132
Linspire (2005). The worlds easiest desktop Linux. Retrieved 9/9/05 from http://www.linspire.com/
MetaMachine (2003). eDonkey v1.4 the most sophisticated file sharing technology available. Retrieved
4/7/05 from http://www.edonkey2000.com/index.html
Motion Picture Association of America (2005). Motion picture industry takes action against Rochester area
internet thieves. Retrieved 4/9/05 from http://www.mpaa.org/MPAAPress/2005/2005_07_28.doc
Rtt (2005). Disk Cleaning and PC Privacy: R-wipe & clean. Retrieved 14/09/2005 from http://www.rwipe.com/
SleuthKit (2005). Autopsy forensic browser. Retrieved 9/9/05 from http://www.sleuthkit.org/autopsy/
Webroot (2005). Window Washer. Retrieved 14/9/2005 from
http://www.webroot.com/consumer/products/windowwasher?rc=266&ac=383&wt.srch=1&wt.mc_id
=383

COPYRIGHT
Andrew Woodward 2005. The author/s assign the School of Computer and Information Science (SCIS) &
Edith Cowan University a non-exclusive license to use this document for personal use provided that the
article is used in full and this copyright statement is reproduced. The authors also grant a non-exclusive
license to SCIS & ECU to publish this document in full in the Conference Proceedings. Such documents
may be published on the World Wide Web, CD-ROM, in printed form, and on mirror sites on the World
Wide Web. Any other usage is prohibited without the express permission of the authors.


Blackhat fingerprinting of the wired and wireless honeynet


Suen Yek
School of Computer and Information Science Security
Edith Cowan University
syek@student.ecu.edu.au

Abstract
TCP/IP fingerprinting is a common technique used to detect unique network stack characteristics of an Operating System (OS). Its usage for network compromise is renowned, both for performing host discovery and for aiding the blackhat in determining a tailored exploit for the detected OSs. The honeyd honeynet is able to countermeasure blackhats utilising TCP/IP fingerprinting via host device emulation on a virtual network. Honeyd allows the creation of host personalities that respond to network stack fingerprinting as a real network would. This technique, however, has been shown to provide inconsistent and unreliable results when performed over wired and wireless network mediums. This paper presents ongoing research into the TCP/IP fingerprinting capabilities of the popular host discovery tool Network Mapper (NMAP) on the honeyd honeynet. Forensic analysis of raw packet captures allowed the researcher to identify differences in the modus operandi and outcomes of fingerprinting over the two mediums. The results of this exploratory study show the process of discovery undertaken to uncover how TCP/IP fingerprinting with NMAP and honeyd needs to be tested for an effective network countermeasure.

Keywords
TCP/IP fingerprinting, NMAP fingerprint, NMAP signatures, honeyd honeypot

INTRODUCTION
TCP/IP fingerprinting is a scanning technique that can be used by network administrators to assess their network, as well as by blackhats identifying victims to exploit. The fingerprinting technique itself involves determining any unique differences between Operating Systems (OSs) and distinguishing those features in network packets so that they may be probed (Yarochkin, 1997). An OS fingerprint should have a distinctive identification that, when recognised, reveals the platform name and version, such as Microsoft Windows XP SP2.
Network Mapper (NMAP) (Yarochkin, 2004c) is a tool that contains a database of fingerprints known and previously tested by the online security community. Each fingerprint belongs to a specific OS platform and is assigned a signature, which is usually identical to the actual OS name. The fingerprints themselves describe a series of TCP/IP probing packet sequences that are sent from the probing OS to the network stack of the probed OS. Probing packets contain flag settings, which are changed according to the type of probe that is sent. There is an extensive list of probes that may be sent by NMAP, mostly malformed TCP/IP packets, which aids a clandestine approach to conducting scans. A sketch of a comparable malformed probe is given below.
NMAP is primarily used to determine the OS running on a selected IP address. This exercise is commonly
referred to as host discovery, and NMAP performs it by sending probing packets to specified ports on the OS
of the target IP. The host name that is discovered should correspond to a signature in the NMAP database if
it is known. NMAP can also detect the types of services and applications that are running on the ports that it
scans. Consequently, a host discovery technique that may be fine-tuned using a variety of probes helps to
paint a full picture of the OS that is running on the target IP (Wolfgang, 2002). As the tool is freely
downloadable from the World Wide Web (WWW) and highly flexible in its use, it has become a popular
choice for administrators in addition to blackhats of various levels of sophistication (Conry-Murray, 2003;
Spitzner, 2003; Yarochkin, 1997, 2002).
The fingerprints in NMAP's database are tested and contributed by the security community, and maintained
by the original author Fyodor Yarochkin. When they are submitted to the database, they are tested by
several parties on unknown machines and OSs. A limitation that may arise is that a fingerprint may only
be effective for host discovery when performed from a specific platform OS or OSs. Consequently, when a
newly released OS is used to perform TCP/IP fingerprinting, the results may not be consistently the same.

Proceedings of 3rd Australian Computer, Network & Information Forensics Conference

Page 115

One method for testing if NMAP can effectively fingerprint all the OS signatures in its database is by
configuring the honeyd (Provos, 2005) honeynet with all the signatures. A honeypot or honeynet is any
digital entity which behaves as a genuine resource when probed or attacked (Spitzner, 2003; The Honeynet
Project, 2004). The purpose of a honeypot or a honeynet, which is a network of honeypots, is to employ the
characteristics of the resource it is mimicking to deceive the blackhat. Honeyd is a daemon which creates
host devices, or a virtual network of devices, by emulating lower-layer protocols such as TCP, IP, UDP, and
ICMP, in addition to upper-layer protocols including FTP, TELNET, and HTTP, as part of an OS's
personality.
Honeyd was designed to counter NMAP's fingerprinting ability, as utilised by blackhats, by employing its
own techniques against it. The virtual hosts and networks are configured through templates that assign
NMAP signatures to IP addresses. Honeyd can create many thousands of hosts, each with a personality that
includes services, applications, and protocol instructions for any specific port. For example, ports on a host
may be configured to accept, drop or tarpit (prolong a connection indefinitely) connection attempts initiated
by the probing packets sent by the blackhat.
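By way of illustration, a honeyd template of the kind described above might resemble the following sketch. The
template name, chosen ports and port actions are illustrative only, and directives may vary slightly between honeyd
versions; the personality string must match an NMAP signature name exactly (here, one of the signatures that
appears later in Table 1 and Table 3), and the bound IP address mirrors the addressing scheme used in this study.

create cisco
set cisco personality "Cisco 7206 running IOS 11.1(24)"
set cisco default tcp action reset
add cisco tcp port 23 open
add cisco tcp port 80 open
bind 172.16.0.4 cisco

With a template such as this loaded, honeyd answers NMAP probes sent to 172.16.0.4 using the network stack
behaviour recorded against that signature.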
When a blackhat sends probing packets from NMAP to honeyd, they may believe they are attacking a real
network of OSs because the responses that honeyd generates are identical to those of a real OS. This
deceptive capability may act as a network countermeasure for administrators attempting to prevent
blackhats from reaching the corporate network. In addition, the honeyd honeynet may act as a decoy to
distract the blackhat while the administrator monitors their methods and identifies the goal of their
endeavour. However, the effectiveness of honeyd in deceiving network attackers is limited by its ability to
mimic the network stack of its configured hosts.

PREVIOUS STUDIES USING HONEYD TO COUNTERMEASURE NETWORK ATTACK
In addition to the literature on the deceptive capabilities of honeyd and honeynets for decoying blackhats
from genuine systems and monitoring their activities when in the honeynet, studies by Gupta (2003)
presented results on the effectiveness of using honeyd as a network countermeasure. At the time, the
honeynet utilised a Linux Redhat 7.3 installation and honeyd version 0.4a with subsequent upgrade to 0.5a
to construct the deceptive network that was tested by voluntary participants of the study with network
attacking skills.
Cyclical experiments were conducted where the honeynet was subjected to network penetration testing by
voluntary blackhat participants, and then, upon feedback, the honeynet was reconfigured to appear and
behave more securely. Three rounds of testing were conducted. The blackhats concluded that the initial
honeynet appeared to be unsecured and weakly configured, and network logging showed high levels of
network (TCP, ICMP, UDP) traffic. By the third round, the blackhats reported that the network appeared to
be a well-configured corporate implementation allowing controlled levels of network traffic. These results
indicated that honeyd was effective in deceiving the blackhats, while network logging allowed the
researcher to monitor their activities while in the honeynet.
Subsequent research by Valli (2003) investigated how honeyd could be improved to deceive the blackhat. It
was deduced that the TCP/IP fingerprinting capability of NMAP against honeyd was one of the crucial
factors contributing to the blackhat's deception. Also using Linux Redhat 7.3 as the base testing OS and the
then current honeyd version of 0.7a, Valli tested all possible NMAP signatures to determine which could be
fingerprinted across five separate scan-types over a wired medium. The results showed that of the possible
704 signatures, only 152 could be effectively fingerprinted.
The study utilised five of NMAP's probes, SYN, FIN, UDP, NULL, and XMAS, which are explained as
follows. The Synchronise (SYN) flag is set in a packet that is sent to initiate a TCP connection. The
Finish (FIN) flag is set in a packet sent to tear down or terminate a TCP connection. The User Datagram
Protocol (UDP) probe sends a connectionless packet that carries no TCP flags. The Null (NULL) probe has no
flags enabled in the packet, and the Christmas Tree (XMAS) probe enables a combination of the FIN, Urgent
Pointer (URG) and PUSH flags in a TCP packet. The URG flag indicates the packet requires urgent attention
and is usually used for TELNET connections. The PUSH flag indicates that data should be passed on
immediately rather than buffered.
Each of the probes forms part of a scan-type, which implements the flag settings in a series of packets sent to
a target machine. NMAP interprets the response given by its target. A sophisticated user of NMAP may
recognise the types of responses that are produced by the scans. For example, a NULL scan-type that
yields a Reset (RST) response is most likely a Microsoft Windows OS (Yarochkin, 2004b).
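For reference, the five scan-types correspond to NMAP command-line options along the lines of the following
sketch. The target address is illustrative, and the -O option enables NMAP's OS detection (fingerprinting) engine in
each case:

nmap -sS -O 172.16.0.2   # SYN scan
nmap -sF -O 172.16.0.2   # FIN scan
nmap -sU -O 172.16.0.2   # UDP scan
nmap -sN -O 172.16.0.2   # NULL scan
nmap -sX -O 172.16.0.2   # XMAS (Christmas Tree) scan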


Of the 152 successful fingerprints, several were chosen to configure a wireless-enabled honeynet using
honeyd version 0.8b on a Linux Mandrake 9.0 installation (Yek, 2003). The results indicated that the OS
signatures that could previously be fingerprinted over the wired medium did not fingerprint consistently or
reliably over the wireless medium (Yek, 2004). At the time, the network logging facilities afforded by
honeyd proved to lack the verbosity needed to identify the causes of fingerprint failure. The fingerprinting
tools NMAP and Nessus (Deraison, 2005) generated reports on the outcomes of the scans and vulnerability
assessment.
While NMAP could not effectively scan all the OS signatures in honeyd, some could be fingerprinted across
the five scan-types previously used by Valli (2003). The network vulnerability assessments performed by
Nessus were mostly able to identify at least one significant weakness for a potential blackhat to exploit. The
overall results showed that the network would not be effective in deceiving a blackhat or in countering
network fingerprinting techniques. This study indicated that the TCP/IP fingerprinting of NMAP required
further examination into the causes of failure, particularly over the wireless medium, and that a method for
effective testing needed to be determined.

METHOD
In this research, the honeynet was an Athlon 1.5 GHz desktop machine with a Wireless Network Interface
Card (WNIC) extending an antenna for 802.11b wireless transmission and reception. Additionally, a NIC
allowed for a Category 5 (CAT-5) wired cable to be attached. The exploratory testing was conducted in a
university laboratory.
A minimal installation of Linux Mandrake 10.1 was set up as the base OS for running honeyd version 1.0
(Provos, 2005), the latest release. Mandrake 10.1 was the latest robust version released at the time of
configuration. The honeynet machine used a minimal installation to reduce the number of unneeded utilities,
programs, and software running in the background of the honeynet, mitigating insecure programs and
leaving greater processing power for honeyd to run. The attack machine had the same base OS, except as a
full installation including NMAP v.3.55 to allow the machine to run uninhibited. The attack machine was a
Pentium III, 800 MHz IBM Thinkpad laptop with inbuilt 802.11b wireless capability, which contained a
Hermes chipset to support promiscuous packet capture. The inbuilt card was later replaced (due to failure)
with an Orinoco 802.11b Direct Sequence (DS) Peripheral Connect (PC) card, which operated in the same
promiscuous manner.
The process for testing the TCP/IP fingerprinting was as follows. The NMAP database of OS signatures was
extracted as a text file and the signatures were configured as hosts in the honeyd templates. The current
version of NMAP contained 988 OS signatures. The hosts imported into the honeyd templates were named
host0001 to host0988. These names held no significance other than representing a sequential numerical
order. To spread the hosts over a realistic network of addresses, three address ranges within the class B
172.16.0.0 network were used: 172.16.1.1-254, 172.16.2.1-254, and 172.16.23.1-226.
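As a rough sketch of this extraction step, the signature names can be listed directly from NMAP's fingerprint
database, each entry of which begins with a "Fingerprint" line; the file path shown is an assumption and varies
between installations:

grep '^Fingerprint ' /usr/share/nmap/nmap-os-fingerprints | sed 's/^Fingerprint //'

Each name returned in this way was then used as the personality string of a honeyd host template, as described
above.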
The honeynet machine was given an IP address of 192.168.1.1 on the eth1 interface and the attack
machine was given an IP address of 192.168.1.2 on its eth1 interface. The first tests were conducted over
a wired medium using a crossover cable connecting the two machines directly, to eliminate any interference
from other networked devices.
On the initial start-up of honeyd with all the hosts loaded, six errors were reported indicating that six
signatures had inaccuracies which honeyd could not recognise. These signatures were deleted and honeyd
was restarted without error. When honeyd was restarted successfully, the attack machine initiated the first
round of NMAP scanning to determine host name resolution via TCP/IP fingerprinting over the wired
medium, followed by the wireless medium. When the wireless scans were conducted, the interfaces were
changed from eth0 to wlan0 on the honeynet and the attack machines. No other changes were made to the
machines, to mitigate the risk of confounding variables affecting the scanning results.
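The honeyd daemon itself was started against one interface at a time. A minimal sketch of such an invocation is
shown below, assuming illustrative file names and log paths and honeyd's usual command-line options (which may
differ between versions), with -f naming the template configuration file, -p the NMAP fingerprint file, and -i the
wired or wireless interface in use:

# wired tests (crossover cable)
honeyd -d -f honeyd.conf -p nmap.prints -i eth1 -l /var/log/honeyd.log 172.16.0.0/16
# wireless tests
honeyd -d -f honeyd.conf -p nmap.prints -i wlan0 -l /var/log/honeyd.log 172.16.0.0/16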

RESULTS OF THE FIRST ROUND OF SCANNING - NMAP


For reliability, each scan-type was conducted five times on each IP address. Results showed that if a
scan-type could fingerprint the OS once, it would do so on the remaining four occasions. The version of NMAP
in use attributed percentage guesses to each OS that it found and ordered them alphabetically. Therefore,
several OSs could be guessed, where one or more could be assigned the highest percentage and numerous
others were assigned a lower percentage. The OS guess was counted as correct if it was one of the
highest percentage guesses listed. Figure 1 shows the generated output of an NMAP scan which found
several OS matches in its fingerprint of 172.16.0.7. The highest percentage score of 90% for the 3Com
Netbuilder Remote Office 222 router was taken as a correct guess.

Warning: OS detection will be MUCH less reliable because we did not find at least 1 open and 1 closed TCP port
Interesting ports on 172.16.0.7:
PORT      STATE SERVICE
1/tcp     open  tcpmux
2/tcp     open  compressnet
3/tcp     open  compressnet
...
61440/tcp open  netprowler-manager2
61441/tcp open  netprowler-sensor
65301/tcp open  pcanywhere
Device type: router|WAP|general purpose|switch
Running (JUST GUESSING) : 3Com embedded (90%), Compaq embedded (86%), Netgear embedded (86%), Data General AOS/VS (85%)
Aggressive OS guesses: 3Com Netbuilder Remote Office 222 router (90%), 3Com Netbuilder Remote Office 222 (ESPL-310), Version 10.1 (SW/NBRO-AB,10.1) (90%), 3Com Netbuilder II Router Ver 11.4.0.51 (88%), 3Com NetBuilder-II, OS version SW/NB2M-BR-5.1.0.27 (87%), WAP: Compaq iPAQ Connection Point or Netgear MR814 (86%), AOS/VS on a Data General mainframe (85%), 3Com SuperStack II switch SW/NBSI-CF,11.1.0.00S38 (85%)
No exact OS matches for host (test conditions non-ideal).

Figure 1 - Example NMAP output of a scan
Out of the 982 possible OSs in honeyd, NMAP could only resolve six of the signatures and therefore, only
six were effectively fingerprinted, which is shown in Table 1. In Table 1, the hostname and the IP addresses
do not hold significance towards the fingerprinting result. This initial round of scanning was primarily
concerned with determining which OS signatures used on honeyd could be fingerprinted by NMAP
consistently and reliably over the wired and wireless mediums.

Table 1 - NMAP signatures that fingerprinted across all scan-types

HOSTNAME   IP            NMAP SIGNATURE
host0002   172.16.0.2    3Com Access Builder 4000 Switch
host0057   172.16.0.57   Apple Color LaserWriter 600 Printer
host0070   172.16.0.70   Apple Mac OS 8.5.1 (Appleshare IP 6.0)
host0186   172.16.0.186  Cisco 7206 running IOS 11.1(24)
host0341   172.16.1.87   DSL Router: Flowpoint 144/22XX v3.0.8 or SpeedStream 5851 v4.0.5.1
host0855   172.16.3.93   SCO Open Desktop 2.0

Upon assessing the remaining fingerprint results, 18 more signatures were fingerprinted four or more times
over the five scan-types conducted, as shown in Table 2. The wired and wireless scores out of five indicate
the number of scan-types that were successful. host0001, with signature 3Com / USR TotalSwitch
Firmware: 02.02.00R, was successfully fingerprinted over three scan-types over the wire and two scan-types
over the wireless medium. For the purpose of this reporting, the particular scan-type did not matter, as the
goal was to fingerprint across all scan-types. However, it was found that the SYN scans were the most
successful.

Table 2 - NMAP signatures that fingerprinted four or more times across all scan-types

HOST NAME  IP            NMAP SIGNATURE                                        WIRED /5   WIRELESS /5   TOTAL /10
host0001   172.16.0.1    3Com / USR TotalSwitch Firmware: 02.02.00R
host0009   172.16.0.9    3Com NetBuilder-II, OS version SW/NB2M-BR-5.1.0.27
host0055   172.16.0.55   Apple Color LaserWriter 12/660 PS (Model No. M3036)
host0056   172.16.0.56   Apple Color LaserWriter 600 Printer
host0091   172.16.0.91   Asante FriendlyNet FR3004 Series Internet Hub
host0185   172.16.0.185  Cisco 7206 router (IOS 11.1(17))
host0258   172.16.1.4    Compatible Systems (RISC Router, IntraPort)
host0325   172.16.1.71   D-Link 704P Broadband Gateway or DI-713P WAP
host0495   172.16.1.241  IBM MVS
host0497   172.16.1.243  IBM MVS TCP/IP TCPMVS 3.2
host0505   172.16.1.251  IBM OS/390 V5R0M0
host0541   172.16.2.33   Lantronix EPS2 print server Version V3.5/2(970721)
host0558   172.16.2.50   Linksys BEFW11S4 WAP or BEFSR41 router
host0681   172.16.2.173  Microsoft Windows Server 2003
host0717   172.16.2.209  MultiTech CommPlete Controller (terminal server)
host0805   172.16.3.43   OpenBSD 3.0-STABLE (X86)
host0910   172.16.3.148  Speedstream 5871 DSL router
host0948   172.16.3.186  Toshiba TR650 ISDN Router

At this point of testing, the hard disk on the honeyd honeynet failed and a reinstall was required. The new
honeyd installation remained a Linux Mandrake 10.1 installation to maintain consistency; however, the
NMAP signatures had been updated to include over 1000 in total, up from the previous 988. The attack
machine was also changed to the Auditor Security Collection distribution version 20060502-no-ipw2100,
which had been released and tested by Valli (personal communication, September 5, 2005) to verify the
reliability of the wireless security tools that were part of the distribution OS.


Further to the changes in the honeynet and attack machine, it was also decided that the testing environment
was unsuitable, as the laboratory was located within an 802.11b wireless-saturated area of the university.
Using AiroPeek v.2.0, packet captures of the wireless traffic traversing the attack machine and honeynet
showed a high number of corrupted packets. TCPdump (Network Research Group, 2004) was used to gather
raw network packet captures of the wired and wireless traffic, which were viewed through the network
packet analyser Ethereal (Combs, 2004). The comparison between a wired Ethernet header packet and a
wireless header packet is shown in Figure 2 and indicates no significant errors at the TCP/IP level of the
network packets. The packets shown in Figure 2 reflect the results of the TCPdump captures, and it was
deduced that the errors occurring in the TCP/IP fingerprinting might have been arising at the lower physical
and data link layers.
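The raw captures referred to above were of the kind produced by TCPdump's standard write-to-file mode; a sketch
with illustrative interface names and output file names is:

tcpdump -i eth1 -w wired-scan.pcap      # capture during the wired scans
tcpdump -i wlan0 -w wireless-scan.pcap  # capture during the wireless scans

The resulting capture files can then be opened in Ethereal for packet-by-packet comparison.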
The newly proposed solution was a faraday cage environment, in which a decommissioned stainless steel
cool-room was experimented with; however, the temperature became too high with the equipment running
and there was no HVAC to circulate the air. The subsequent decision was to build a faraday cage with
inbuilt fans to regulate the temperature by circulating air out of the cage. A metal cabinet was utilised, and
holes were drilled to allow for minimal cabling and a fan at the top and bottom of the back wall. The
honeynet and packet capture machines were moved into the faraday cage and the attacking laptop sat on
top. When the cage was closed, it was able to keep out external 802.11b wireless traffic while the antenna
reception and transmission inside the cage were unaffected.

Figure 2 - Comparison of a wired and wireless network packet (wired TCP header packet and wireless IP header packet from the attack machine, frame #13)


Several benefits were found when the NMAP scans were retested inside the faraday cage. Firstly, none of the scans
took more than one to two minutes to complete, whereas prior to testing in the cage some wireless scans took up to
60 hours. It was deduced that the attack machine might have been performing a Denial of Service (DoS) on the
honeynet. Network packets may have been lost in transmission, or bits in the packets may have changed, due to the
wide area over which the wireless packets were exposed. The high numbers of corrupted packets detected by
AiroPeek support this theory.
Additionally, the shorter scan times could be attributed to the faraday cage negating the outside wireless
transmissions of the university's wireless Local Area Network (LAN). Another significant difference was the
upgrade of NMAP as part of using the Auditor distribution OS on the attack machine. NMAP version 3.75 was
pre-installed on Auditor and the subsequent changes included no percentage scores being attributed to OS guesses.
NMAP now reported all the OSs that it guessed as most likely and did not report on other possible guesses. These
OSs were also organised in alphabetical order; therefore, if a guess was placed third in the list, it did not indicate it
was a third guess. All the guesses had equal weighting.

RESULTS OF THE SECOND ROUND OF SCANNING - NMAP


An odd result in this round of testing was that the original six OS signatures that had fingerprinted successfully
across all scan-types did not succeed when tested again in the faraday cage. The 18 tentative OS signatures that had
fingerprinted four or more times in the laboratory tested more successfully in the cage. Table 3 shows the results of
the testing in the faraday cage. For this testing, all the redundant OS signatures were removed from honeyd, and
only the signatures that were to be tested this time were included in the honeyd configuration file. Consequently,
only 24 signatures were incorporated, as host01 to host24. The first six OS signatures were those that had tested
effectively during the first round of scanning.

Table 3 - NMAP signatures tested in faraday cage

HOST NAME  IP            NMAP OS SIGNATURE                                                    NUMBER CORRECT FINGERPRINTS /5
host01     172.16.0.1    3Com Access Builder 4000 Switch                                      5
host02     172.16.0.2    Apple Color LaserWriter 600 Printer                                  5
host03     172.16.0.3    Apple Mac OS 8.5.1 (Appleshare IP 6.0)                               3
host04     172.16.0.4    Cisco 7206 running IOS 11.1(24)                                      2
host05     172.16.0.5    DSL Router: Flowpoint 144/22XX v3.0.8 or SpeedStream 5851 v4.0.5.1   5
host06     172.16.0.6    SCO Open Desktop 2.0                                                 3
host07     172.16.0.7    3Com / USR TotalSwitch Firmware: 02.02.00R                           0
host08     172.16.0.8    3Com NetBuilder-II, OS version SW/NB2M-BR-5.1.0.27                   5
host09     172.16.0.9    Apple Color LaserWriter 12/660 PS (Model No. M3036)                  5
host10     172.16.0.10   Apple Color LaserWriter 600 Printer                                  5
host11     172.16.0.11   Asante FriendlyNet FR3004 Series Internet Hub                        2
host12     172.16.0.12   Cisco 7206 router (IOS 11.1(17))                                     5
host13     172.16.0.13   Compatible Systems (RISC Router, IntraPort)                          5
host14     172.16.0.14   D-Link 704P Broadband Gateway or DI-713P WAP                         2
host15     172.16.0.15   IBM MVS                                                              3
host16     172.16.0.16   IBM MVS TCP/IP TCPMVS 3.2                                            2
host17     172.16.0.17   IBM OS/390 V5R0M0                                                    2
host18     172.16.0.18   Lantronix EPS2 print server Version V3.5/2(970721)                   5
host19     172.16.0.19   Linksys BEFW11S4 WAP or BEFSR41 router                               0
host20     172.16.0.20   Microsoft Windows Server 2003                                        2
host21     172.16.0.21   MultiTech CommPlete Controller (terminal server)                     0
host22     172.16.0.22   OpenBSD 3.0-STABLE (X86)                                             5
host23     172.16.0.23   Speedstream 5871 DSL router                                          5
host24     172.16.0.24   Toshiba TR650 ISDN Router                                            2

The results show that only three of the original six OS signatures fingerprinted effectively across all five scan-types
conducted. Further testing of the additional 18 showed that eight of those signatures could now fingerprint across
all the scan-types. These results firstly indicated that there was variation in the NMAP fingerprinting, because the
first six should have fingerprinted effectively in the faraday cage if they could be fingerprinted in an open
laboratory with high interference.
The possible reasons for the discrepancy in results were that NMAP v.3.55 was used for the first round of scans
and, during the eight-month period between tests, NMAP v.3.75 was released and pre-installed on Auditor. With
each upgrade of NMAP, changes and improvements are introduced that may modify the scanning results
(Yarochkin, 2004a). This is due to changes in the scanning engine and, sometimes, modifications to the fingerprints.
This is the most likely reason that the results differed for the six original fingerprints. An additional possibility is
that the Auditor OS was used as the base OS instead of Linux Mandrake 10.1, as was previously used. This
difference, however, is less likely to have affected the outcomes than the change in NMAP itself.
The more effective results from the faraday testing, that is for the remaining 18 OS signatures, indicated that the
testing was also affected by the faraday cage, because some signatures that previously could not be fingerprinted
were now fingerprinting successfully. The NMAP upgrade is also likely to be a reason for this success, as the
database of OS signatures and fingerprints was expanded and improvements to some scans were made (ibid, 2004a).
From the now eleven available signatures that could fingerprint effectively, it was proposed to design a virtual network
using these signatures to build up OS personalities on eleven network hosts. However, upon closer examination of the
OS signatures, it was found that none was an Access Point (AP) and only the OpenBSD 3.0-STABLE (X86)
signature could act as a server, and none were able to act as clients. Most of the signatures that could be fingerprinted
were routers. Therefore, some additional testing was required so that a more believable network could be designed that
included servers, clients, and an AP.

RESULTS OF SUPPLEMENTARY NMAP SCANNING


The researcher then perused the list of fingerprints for potential OS signatures that could be used to fill out a virtual
network topology. Different OS signatures were chosen based on their believability for existing as part of a network.
For the proposed virtual DMZ, various Solaris, IBM, AIX, FreeBSD and NetBSD fingerprints were tested across the
five scan-types and from these, only FreeBSD and NetBSD could be successfully fingerprinted on all accounts.
Therefore, these OSs were chosen for the DMZ where the personalities could later be incorporated with mail, ftp and
http services.
For client machines, various Microsoft Windows and Apple Macintosh desktop OSs were tested, and only
Macintosh signatures appeared to work. According to previous research results, Windows machines do not
fingerprint when using XMAS, UDP and NULL scans over the wireless medium. This has been reported to be due
to Windows responding with a RST for all these scan-types when no response should be the standard reaction.
However, a Windows XP laptop with a Centrino chipset was borrowed and tested against NMAP to determine if it
could be fingerprinted. The result was that NMAP was able to fingerprint the Windows laptop across all the
scan-types and was able to guess the Windows XP OS correctly, although among some other OS guesses as well.
Lastly, several AP signatures were tested and it was found that a Cisco WGB350 802.11b WorkGroup Bridge
OS signature could fingerprint effectively across all the scan-types. Table 4 shows the additional signatures that
were added to the honeyd virtual network.

Table 4 - Additional NMAP OS signatures

HOST NAME  IP            NMAP OS SIGNATURES
host25     172.16.0.25   Cisco WGB350 802.11b WorkGroup Bridge
host26     172.16.0.26   FreeBSD 5.0-RELEASE
host27     172.16.0.27   Apple Mac OS 7.1
host28     172.16.0.28   NetBSD 1.6
host29     192.168.1.13  Windows XP SP2

Figure 3 - Proposed honeyd network topology


The proposed virtual network of host devices including Macintosh clients, OpenBSD, FreeBSD, NetBSD servers and a
mix of networking mechanisms including an AP is illustrated in Figure 3. The attack machine is also shown in the
diagram as a laptop using the IP 192.168.1.12 connecting via its wireless antenna. This network will be tested with
other network fingerprinting tools such as Xprobe2 (Sys-Security Group, 2003), which functions similarly to NMAP in
an active manner, and p0f (Zalewski, 2004), which is a passive fingerprinting tool that assesses captured network traffic
instead of actively probing the target. It is intended that upon successful TCP/IP fingerprinting, the personalities of each
host will be configured to incorporate services, applications, and routing latency.
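A hedged sketch of how one of these hosts might later be fleshed out in honeyd is given below. The signature and
IP address are taken from Table 4; the uptime value, the port selection, and the path to honeyd's example ftp.sh
service emulation script are assumptions for illustration only, and directives may vary between honeyd versions.

create freebsd
set freebsd personality "FreeBSD 5.0-RELEASE"
set freebsd default tcp action reset
set freebsd uptime 1728650
add freebsd tcp port 21 "sh /usr/local/share/honeyd/scripts/ftp.sh"
add freebsd tcp port 80 open
bind 172.16.0.26 freebsd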

CONCLUDING DISCUSSION
Effective TCP/IP fingerprinting is an integral part of deceiving the blackhat when they perform host discovery. The
honeyd honeynet implements NMAP's OS signatures and fingerprints to counteract network-fingerprinting tools
like NMAP. Ongoing research has shown that the effectiveness of NMAP fingerprinting varies both over the wired
and wireless mediums and over the different releases of each piece of software in use. The best way of determining
the best OS emulation in honeyd is to test the NMAP signatures and fingerprints regularly.
This research reported on the results acquired from NMAP primarily as a comparison between wired and wireless
NMAP scans, to determine where the problems have arisen. It was found that interference from other wireless
activity and a large room space hinder effective physical and data link layer transmission between the attacking
machine performing the fingerprinting and the honeynet machine. It was then decided to test the fingerprinting in a
faraday cage to ascertain any differences in results.

The faraday cage produced more successful results in that 15 OS signatures could be fingerprinted, as opposed to
the six originally identified. This change was attributed to significant upgrades to the NMAP program itself, from
v3.55 to v3.75, in addition to the confining space and walls of the faraday cage, which allowed transmissions to
occur faster and without interference, resulting in significantly shorter scanning times.
The outcomes of this research are that NMAP is a continually developing program, as are honeynet architectures.
However, the implementation of NMAP signatures in honeyd is the most significant change that affects the TCP/IP
fingerprinting ability of the deceptive network. Therefore, effective TCP/IP fingerprinting is also heavily reliant on
the environment in which it is performed, on whether there is interference or a long range for signals to travel, in
addition to the version of NMAP used.
A final proposed test is to perform the attacks outside the faraday cage to determine if honeyd is still able to
respond effectively where there is outside interference. This testing may illuminate the honeynet's ability to deceive
in a potentially live and realistic environment. Interference and uncontrolled signals are realistic obstacles for the
blackhat and subsequent problems for the defender when attempting to employ deceptive network countermeasures
against TCP/IP fingerprinting. This can be further verified by inviting guest attackers to penetration test the
network and determine if these deceptive mechanisms work on a real blackhat!

REFERENCES
Combs, G. (2004). Ethereal (Version 0.9.14) [Network packet analyser].
Conry-Murray, A. (2003). Vulnerability assessment tools find new uses. Retrieved 7 October, 2004, from
http://www.networkmagazine.com/shared/article/showArticle.jhtml?articleId=14400061&pgno=2
Deraison, R. (2005). Nessus (Version 2.2.5) [Network vulnerability scanner].
Gupta, N. (2003, 25 November). Is honeyd effective or not? Paper presented at the 1st Australian Computer,
Information and Network Forensics Conference, Scarborough, Western Australia.
Network Research Group. (2004). TCPdump (Version 3.8.3) [Network traffic viewer].
Provos, N. (2005). Honeyd (Version 1.0) [Honeypot].
Spitzner, L. (2003). Honeypots - tracking hackers. Boston: Pearson Education Inc.
Sys-Security Group. (2003). Xprobe2 (Version 2.0.2) [Fingerprinting tool].
The Honeynet Project. (2004). Know your enemy: learning about security threats. Boston: Addison-Wesley.
Wolfgang, M. (2002). Host discovery with NMAP. Retrieved 1 September, 2005, from
http://moonpie.org/writings/discovery.pdf
Yarochkin, F. (1997). The art of port scanning. Retrieved 1 September, 2005, from
http://www.insecure.org/nmap/nmap_doc.html
Yarochkin, F. (2002). Remote OS detection via TCP/IP stack fingerprinting. Retrieved 1 September, 2005, from
http://www.insecure.org/nmap/nmap-fingerprinting-article.html
Yarochkin, F. (2004a). Nmap 3.75 released: less crashes and more OS fingerprints. Retrieved 15 September, 2005,
from http://seclists.org/lists/nmap-hackers/2004/Oct-Dec/0001.html
Yarochkin, F. (2004b). NMAP network security scanner man page. Unpublished manuscript.
Yarochkin, F. (2004c). NMAP: network mapper (Version 3.70) [Network exploration tool and security scanner].
Yek, S. (2003, 25 November). Measuring the effectiveness of deception in a wireless honeypot. Paper presented at the
1st Australian Computer Network & Information Forensics Conference, Scarborough.
Yek, S. (2004, 26 November). Implementing network defence using deception in a wireless honeypot. Paper presented
at the 2nd Australian Computer Network, Information and Forensics Conference, Fremantle.
Zalewski, M. (2004). The new p0f (Version 2.0.4) [Passive OS fingerprinting tool].

COPYRIGHT
Suen Yek 2005. The author/s assign the School of Computer and Information Science (SCIS) & Edith Cowan
University a non-exclusive license to use this document for personal use provided that the article is used in full and this
copyright statement is reproduced. The authors also grant a non-exclusive license to SCIS & ECU to publish this
document in full in the Conference Proceedings. Such documents may be published on the World Wide Web, CD-ROM, in printed form, and on mirror sites on the World Wide Web. Any other usage is prohibited without the express
permission of the authors.


How to build a faraday cage on the cheap for wireless security testing
Suen Yek
School of Computer and Information Science Security
Edith Cowan University
syek@student.ecu.edu.au

Abstract
The commonly known security weaknesses associated with the 802.11b wireless standard have prompted a variety
of security measures to counter attacks. Using a wireless honeypot, a fake wireless network may be configured
through emulation of devices and the TCP/IP fingerprinting of OS network stacks. TCP/IP fingerprinting is one of
the most popular methods employed to determine the type of OS running on a target, and this information can then
be used to determine the type of vulnerabilities to target on the host. Testing the effectiveness of this technique, to
ensure that a wireless honeypot using honeyd may deceive an attacker, has been an ongoing study due to problems
conducting TCP/IP fingerprinting in the wireless environment. Research conducted in a university laboratory
showed that the results were ineffective and that the time taken to conduct testing could be as long as 60 hours. The
subsequent exploration of different testing methods and locations illuminated an ideal research facility: a faraday
cage. The design and construction of the faraday cage are discussed in this paper as an affordable solution for
controlled and reliable testing of TCP/IP fingerprinting against the scanning tool Network Mapper (NMAP). The
results are useful when looking to deploy a deceptive honeypot as a defence mechanism against wireless attackers.

Keywords
wired and wireless TCP/IP fingerprinting, wireless NMAP scanning, faraday cage

INTRODUCTION
Research in wireless network security has become a major topic of interest, with attacks becoming more apparent
as weaknesses in the lower layers of the Open Systems Interconnection (OSI) model reveal the shortcomings of
open-air transmissions. The 802.11b standard for wireless networking has been widely adopted, and a variety of
defence mechanisms have been adopted as countermeasures, such as Access Point (AP) firewalls and routing,
wireless Intrusion Detection Systems, and the recent experimentation with wireless honeypots. While wireless
honeypot research has not been widely explored, ongoing research (Yek, 2003, 2004) has focused on the premise
that an emulated wireless network may act as a decoy away from genuine systems.
Furthermore, a wireless honeypot may be configured to emulate a wireless network that bridges wired resources, to
deceive attackers and contain their network compromise while it is monitored. A wireless honeypot deployed in
this way could therefore detect and capture the attacker before damage is inflicted on genuine resources and assets.
One of the challenges found with emulating wireless networks is fine-tuning the data link and networking layers of
wireless transmission to appear real to an attacker (ibid, 2003; 2004). This paper explains some of the difficulties
faced when implementing effective TCP/IP fingerprinting on the network stack of emulated Operating Systems
(OS) with a honeypot.
The results of the research into performing TCP/IP fingerprinting over the wireless medium have uncovered a
process of discovery into the ideal circumstances for testing this technique of OS identification. The goal has been
to implement a network of virtual devices, called a honeynet, as a potential network countermeasure for
organisations using wireless networks. The cycle of discovery led to the construction of a faraday cage, a
metal-encased enclosure that does not allow the 12.5 centimetre wavelength 802.11b radio frequency (RF)
electromagnetic waves to escape from the inside. External wireless transmissions cannot penetrate the cage either.
The design and physical characteristics of the cage are described in this paper as a financially feasible and easy
construction.

TCP/IP FINGERPRINTING IN A WIRELESS ENVIRONMENT


Previous studies into the art of TCP/IP fingerprinting have uncovered its value in performing OS discovery and
subsequent network reconnaissance for attackers, in addition to aiding the administrator in conducting a network audit.
The technique may be executed through a port scanning tool such as Network Mapper (NMAP) (Yarochkin, 2005).
NMAP is an active scanning tool that sends probing packets to query any unique differences on its target OS. Other
active tools include Xprobe2 (Yarochkin & Arkin, 2003) and Queso (Savage, n.d.), which function in a similar manner.
The common feature essential to these tools' success is the ability to reach the target without disorganising the bit
sequences when used over radio waves. This problem is not particularly evident when performing TCP/IP
fingerprinting over the wired medium, where studies have found that a significantly greater number of emulated
OSs could be guessed correctly than over the wireless medium.
The deceptive ability of the wireless honeypot is highly dependent on its ability to emulate OS devices by
responding to NMAP's TCP/IP fingerprinting. Honeyd is one such honeypot; it creates host OSs that implement a
fingerprint containing a sequence of TCP/IP exchanges ready to respond when queried. A database of these
fingerprints is maintained by the author of NMAP and is continuously updated and improved by the online security
community. While NMAP's use was intended for security purposes, it is also a tool commonly used by attackers of
varying sophistication, due to the flexibility of the tool and the granularity with which it is able to perform OS
discovery (Honeynet Project, 2004; Spitzner, 2002). Attackers would usually progress from the stage of OS
discovery to network discovery and consequently begin locating vulnerable targets on which to execute a tailored
attack.
Testing TCP/IP fingerprinting to determine an effective way to counter NMAP and other such scanning tools has
proven to be a difficult task because of the many variables in the software of the tool and, mostly, the nature of the
802.11b wireless environment. Wireless networks face difficulties because of the ease of interference. Interference
may come from competing wireless devices, including Bluetooth 802.15 transmissions, as well as from lights,
microwaves or physical objects. Wireless networking in itself requires data to piggyback over RF waves that may
travel along omnidirectional pathways, requiring both the sender and receiver to transmit with a greater overhead
and to encounter many retransmissions due to lost data or bits.

PROBLEMS ENCOUNTERED WITH WIRELESS TCP/IP FINGERPRINTING


Previous research by Valli (2003) involving the use of honeyd and NMAP, employing TCP/IP fingerprinting in the
wired environment, showed that out of a possible 704 OS fingerprints, NMAP was able to determine 152. Although
this only represented approximately 22% of the total possible, the 152 OS fingerprints provided sufficient choices
for a deceptive network to employ to deceive unknown attackers (Gupta, 2003). From these successful OS
fingerprints, a number were chosen for the first undertaking of wireless testing. It was found that these fingerprints,
which had successfully identified the underlying OSs on hosts over the wire, could not do so effectively in a
wireless environment.
The first wireless TCP/IP fingerprinting conducted (Yek, 2003) was in a university laboratory where there was little
interference from competing wireless devices. Furthermore, the proximity of the two interacting devices was within
two metres. The tests were performed using NMAP, followed by Nessus (Deraison, 2003), which also employs the
NMAP fingerprint engine and performs vulnerability assessments on the target OSs. The unsuccessful results of
this testing led to an inquiry into the reasons that may have contributed to the failure of fingerprints that were
effective on the wire.
The logging facilities afforded by the honeyd honeypot identified only connection attempts to ports and gave IP
numbers, which lacked richness in the reporting of network activity. The deductions made from these research
outcomes were that packet latency could have been occurring, bits may have been disordered, and TCP/IP packets
may not have been performing the three-way handshake effectively.

THE EVOLUTION OF WIRELESS TCP/IP FINGERPRINTING


Subsequent testing was performed using TCPdump (Network Research Group, 2004) to capture raw TCP/IP
packets and identify where differences were occurring over the wired and wireless mediums, for direct comparison.
This testing was also conducted in a university laboratory. The results of the fingerprinting showed that, of the then
current 988 OS fingerprints, only 6 could be fingerprinted over the wired and the wireless mediums concurrently.
Additionally, 18 OSs could be fingerprinted on most occasions but not effectively over NMAP's variety of queries,
leaving a remainder of 964 fingerprints unanswered. The TCPdump captures were imported into the network
protocol analyser Ethereal (Combs, 2004) for viewing of individual packets.
Ethereal revealed that there were no significant differences between the TCP/IP packets aside from the usual
additional wireless overhead. The most significant indication was that there were few errors in these packets. It was
then determined that errors were occurring below the layer three networking level, most likely at the data link and
physical layers of the OSI model.
AiroPeek (WildPackets Inc, 2003) was subsequently used to perform a wireless packet capture that included
lower-level activity. The logs of AiroPeek showed a high number of corrupted packets, in addition to detecting
extremely high packet levels from the university's newly installed wireless mesh network, also operating in the 2.4
gigahertz spectrum. Scan times in the laboratory could be up to 60 hours when performed over the wireless
medium. At that stage of the testing this length of fingerprinting was assumed to be normal, although time
consuming, and was not questioned. It was then decided that the location of the fingerprinting tests should be
changed to prevent interference
from the university's wireless network activity. The proposed solution was a decommissioned cool-room,
presumably once utilised by students in the hospitality courses. The cool-room appeared ideal as it was encased in
metal, originally for retaining the low temperatures needed for food consumables. The metal acted as a barrier
against the university's wireless interference, in addition to keeping in the wireless transmissions of the
fingerprinting machine and the honeyd honeypot.
Three machines, keyboards and monitors were relocated into the unused cool-room, which acted as a faraday cage.
When the machines were powered up to conduct the next round of testing, it was found that the temperature within
the once cool-room became excessively high. The Heating Ventilation Air Conditioning (HVAC) was no longer
functioning for that area, and air could not be effectively circulated and cooled to allow the machines to operate.
While the cool-room acted effectively as a faraday cage to fend off gratuitous wireless traffic, the new problem of
temperature regulation had to be dealt with.
It was then proposed to build a faraday cage using a metal cabinet, based on the same premise as the cool-room of
not allowing the ingress or egress of wireless traffic. A security lecturer at the university had already ordered some
metal cabinets for storage, and the idea was conceived to transform one of the cabinets into a faraday cage. The
cabinet had previously been purchased at AUD $300, with dimensions of 1.74 metres in height, 0.6 metres in width,
and 0.66 metres in depth. These dimensions were not chosen for the purpose of the faraday cage; they were merely
what was available in the university for use. However, these dimensions proved to be ideal for conducting the
TCP/IP fingerprinting.
The earlier difficulties regulating the air temperature in the cool-room were then addressed by drilling a hole at the
top of the rear wall of the cage. A small hole was also made in the left rear corner to allow cabling to pass into the
faraday cage and provide outside connection. An earth was also attached to the chassis of the cage to allow any
static charge to dissipate into a safe outlet. The fan that was first attached was a large computer fan, which later
proved too small to push air out and retain a cool enough temperature within the faraday cage to allow three
machines to run. Two larger fans were then installed, which cost about AUD $50. One remained at the top of the
back wall pushing air out, while the other was attached to the bottom of the back wall to draw air in. The fans were
both PAPST-brand Megafans purchased from a local wholesaler. Their diameters were 119 millimetres each and
they rotated at 1,799 Revolutions Per Minute (RPM). The airflow equated to approximately 27.78 litres of air per
second. The noise levels of the fans were as low as 28 decibels.
When the faraday cage was built, the machines were moved back in. A Keyboard, Video, Mouse (KVM) switch
was utilised to minimise cabling into the faraday cage. The holes allowing the cabling and the fans to operate were
large enough to serve their purposes but not so large as to allow the 802.11b RF to seep in or out. This setup
allowed just the two PC towers to reside in the cage, at 0.44 metres each in height, 0.016 metres in width, and 0.45
metres in depth. Their dimensions did not affect the temperature regulation. The third machine was an IBM laptop,
which sat on top of the two towers.
The resulting total airflow within the faraday cage, with all the machines inserted, was calculated as the fan airflow
of 27.78 litres per second multiplied by 60 seconds, which equalled 1666.8 litres per minute, equivalent to 1.6668
cubic metres per minute. When the machines had been placed in the cage, the AiroPeek packet capture was run to
determine if external 802.11b wireless activity could be detected. AiroPeek did not identify any wireless activity
when the cage doors were shut. The machines were then powered up and AiroPeek was able to detect the 802.11b
beacons that were being transmitted from the honeypot.
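As a rough cross-check of these ventilation figures, assuming the quoted fan rating of 27.78 litres per second and
the cabinet dimensions given earlier:

27.78 L/s x 60 s          = 1666.8 L/min (1.6668 cubic metres per minute per fan)
1.74 m x 0.60 m x 0.66 m  = approximately 0.69 cubic metres of cabinet volume
1.6668 / 0.69             = roughly 2.4 cabinet volumes of air exchanged per minute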
From the beacons, the packet capture identified the Service Set Identifier (SSID), which is the name of the network,
the Media Access Control (MAC) address, and other distinguishing characteristics that aid the identification of the
wireless network's properties. These properties are beaconed to allow other wireless stations to identify and connect
to the network, and then perform the TCP/IP fingerprinting. Furthermore, the machines inside the faraday cage still
resembled an open-air configuration in which they were able to connect via the honeypot's AP.

OUTCOMES OF THE NEW FARADAY CAGE


The subsequent configuration of the new faraday cage proved to be highly beneficial to the outcomes of TCP/IP
fingerprinting. The most significant change was the dramatic drop in the time taken to conduct each individual
scan. All times fell between a few seconds and no more than two minutes. RF waves were not able to pass through
the metal walls and continue their transmission in a directional path. The university's wireless mesh was not able to
penetrate the walls of the faraday cage and interfere with or dilute the wireless transmissions within the cage.
With packet latency decreased, the machines may not have needed to continuously resend packets, reducing the
packet overloading which could previously have resulted in a denial of service on the honeypot. The honeypot may
not have been able to process packets efficiently when it did not receive sufficient data to send a response. The
attack machine may have subsequently kept resending packets in an attempt to elicit a fingerprint response from the
honeypot.
The containment of the wireless transmission space most likely allowed both the honeypot and the attack machine
to exchange packets in a timely manner. However, while the scan times were improved, not all the OSs in honeyd
could be fingerprinted, even in the faraday cage. This result was reflective of NMAP's inability to fingerprint all
OSs on the wired Local Area Network (LAN), and therefore it would not be likely to fingerprint all OSs via
wireless transmission. The goal of the fingerprinting tests, however, was to determine a sufficient number of OSs
that could later be configured to emulate fully operational devices in honeyd (Provos, 2004). Additionally, the
AiroPeek capture still detected some corrupted packets, which could mean that altered data bits carried on the RF
waves may not be avoidable.

CONCLUSION
TCP/IP fingerprinting is one effective method for an attacker to discover OS devices as possible targets. Its usage
over wired mediums has shown it to be a strategic method of exploitation, prompting the subsequent development
of honeypots adopting countermeasures. The transition of TCP/IP fingerprinting to the wireless environment,
however, has revealed difficulties in the handling of packets over RF waves.
Testing TCP/IP fingerprinting over the wireless medium has proven to be a cycle of discovery in which a university
laboratory was at first perceived to be an appropriate testing location. Results quickly indicated that numerous
interfering variables, such as other wireless traffic and physical obstacles, prevented the efficient transmission of
packets. The confines of a metal cage were then conceived as an ideal testing environment. The subsequent
utilisation of the faraday cage allowed wireless testing of TCP/IP fingerprinting between the honeyd honeypot and
an attack machine utilising NMAP.
Results of testing inside the faraday cage included vastly improved scan times, which were reduced from 60 hours
down to two minutes in some cases. This phenomenon was attributed to the confined space of the cage, which did
not allow wireless packet transmissions to dissipate and scatter beyond the reception of the honeypot's antenna.
Additionally, it appeared that outside wireless network transmissions were eliminated from the cage, as they could
not penetrate the metal construct. Such a result highlights the value of using a faraday cage, whose construction was
very simple and inexpensive, for wireless TCP/IP fingerprinting.
Effective TCP/IP fingerprinting over a wireless environment is imperative to constructing a honeypot that can
deceive an attacker seeking a wireless entry point into an organisation's network. Wireless honeypots that are
deployed may alert an organisation to the unsolicited connection of wireless stations or devices. Attackers that
intentionally attempt connections to a wireless network may perform reconnaissance techniques to determine the
nature and topology of the network, in addition to locating individual hosts that may be compromised. The first
stage of this endeavour is often a TCP/IP scan to identify host OSs. Therefore, a wireless honeypot that can
effectively counter this technique may contain the attacker's attempts and later also provide forensic evidence of the
compromise.
Building a faraday cage that is both economical and easy to construct is part of the development of network
countermeasures against wireless attacks. A testing environment that is able to eliminate confounding variables
allows researchers to experiment with and explore varying defence mechanisms for countermeasures that can be
singled out at the physical, data link, and network layers, as has been the process of discovery in this research. It is
intended that when upper-layer protocols and applications are added to the honeyd honeypot, the faraday cage will
be of great assistance in identifying and distinguishing where errors are occurring and where deceptive mechanisms
are failing. The faraday cage allows clearer insight into the complex and intricate workings of wireless attack and
subsequent defence.

REFERENCES
Combs, G. (2004). Ethereal (Version 0.9.14) [Network packet analyser].
Deraison, R. (2003). Nessus (Version 2.0) [Network vulnerability scanner].
Gupta, N. (2003). Is honeyd effective or not? Paper presented at the 1st Australian Computer, Information and Network
Forensics Conference, Scarborough, Western Australia.
Honeynet Project. (2004). Know your enemy: Learning about security threats (2nd ed.). Boston: Addison-Wesley.
Network Research Group. (2004). TCPdump (Version 3.8.3) [Network traffic viewer].
Provos, N. (2004). Honeyd - network rhapsody for you. Retrieved 12 February, 2004, from
http://www.citi.umich.edu/u/provos/honeyd/
Savage. (n.d.). Queso (Version no version) [Fingerprinting tool].
Spitzner, L. (2002). Know your enemy. Indianapolis: Addison-Wesley.


Valli, C. (2003). Honeyd - a fingerprinting artifice. Paper presented at the 1st Australian Computer, Information and
Network Forensics Conference, Scarborough, Western Australia.
WildPackets Inc. (2003). AiroPeek (Version 2.0) [Wireless packet capture]. Walnut Creek, California.
Yarochkin, F. (2005). NMAP: network mapper (Version 3.75) [Network exploration tool and security scanner].
Yarochkin, F., & Arkin, O. (2003). Xprobe2 (Version 2.02) [Fingerprinting tool].
Yek, S. (2003, 25 November). Measuring the effectiveness of deception in a wireless honeypot. Paper presented at the
1st Australian Computer Network & Information Forensics Conference, Scarborough.
Yek, S. (2004, 26 November). Implementing network defence using deception in a wireless honeypot. Paper presented
at the 2nd Australian Computer Network, Information and Forensics Conference, Fremantle.

COPYRIGHT
Suen Yek 2005. The author/s assign the School of Computer and Information Science (SCIS) & Edith Cowan
University a non-exclusive license to use this document for personal use provided that the article is used in full and this
copyright statement is reproduced. The authors also grant a non-exclusive license to SCIS & ECU to publish this
document in full in the Conference Proceedings. Such documents may be published on the World Wide Web, CD-ROM, in printed form, and on mirror sites on the World Wide Web. Any other usage is prohibited without the express
permission of the authors.

