
Database

VOL. 2 NO. 10

METADATA
ANALYSIS TOOLS
AND TECHNIQUES
DEMYSTIFYING METADATA
EXTRACTING AND USING METADATA FOR A
DIGITAL FORENSIC INVESTIGATION:
A STEP BY STEP PROCESS
TOP METADATA CONSIDERATIONS FOR
NETWORK SECURITY
METADATA IN DIGITAL FORENSICS
METADATA: WHAT IS IT AND
WHY SHOULD YOU CARE?
Issue 10/2013 (14) August


TEAM
Editors:
Joanna Kretowicz
joanna.kretowicz@eforensicsmag.com
Nadia Mawloud
nadia.mawloud@software.com.pl
Betatesters/Proofreaders:
Kishore P.V , Mada R Perdhana, Olivier
Caleff, Jeff Weaver, Massa Danilo, Craig
Mayer, Andrew J Levandoski, Richard
Leitz, Lee Vigue, Jan-Tilo Kirchhoff,
Owain Williams, Craig Mayer, Larry
Smith, Sundaraparipurman Narayanan,
Henrik Becker, Yousuf Zubairi

Senior Consultant/Publisher:
Paweł Marciniak
CEO: Ewa Dudzic
ewa.dudzic@software.com.pl
Production Director: Andrzej Kuca
andrzej.kuca@software.com.pl
Marketing Director: Joanna Kretowicz
jaonna.kretowicz@eforensicsmag.com
Art Director: Ireneusz Pogroszewski
ireneusz.pogroszewski@software.com.pl
DTP: Ireneusz Pogroszewski
Publisher: Hakin9 Media Sp. z o.o. SK
02-682 Warszawa, ul. Bokserska 1
Phone: 1 917 338 3631
www.eforensicsmag.com

DISCLAIMER!
The techniques described in our articles
may only be used in private, local networks. The editors hold no responsibility
for misuse of the presented techniques or
consequent data loss.

Dear Readers!
Welcome to eForensics Magazine! We are proud to present our new issue, entitled "Metadata Analysis Tools and Techniques". We decided to focus on a topic that each and every one of us encounters on a daily basis, and that we believe you will find interesting and beneficial to read and learn more about. Metadata is crucial: it is used in investigations, storage, processing, intelligence and more, and it can be found on almost any device.
The authors in this issue describe metadata from the very basics of what it is to more concrete examples of programs and usage. They show which tools are good for working with metadata and how to analyze it. The authors of these articles are professionals in this area who have agreed to share their expertise with us.
Our primary goal is to provide you with high-quality information and satisfaction. We are eager to hear your comments and suggestions for future publications and what YOU would like to read more about. With high hopes and excitement, we invite you to enter the world of Metadata!
eForensics Team

contents

NEARLY EVERYTHING IN YOUR CASE IS METADATA
by Trent Struttmann
There are many forensic tools to help an analyst find out what happened in a case. The most common are the most popular automated forensic tools: EnCase and FTK. Each program provides a wealth of tools for the examiner through both built-in and external scripts. EnCase provides the analyst many tools for metadata analysis within the Case Processor script and great support for third-party scripts. FTK has great email and document file analysis tools.

A PRIMER ON METADATA ANALYSIS
by Jeffrey Lewis
In the example from Bush's life, the memo is the data and the font is the metadata. Metadata is data about data. Anything that describes data is metadata. There are different metadata standards for different types of data. Information is not searchable and accessible without metadata. For example, without metadata you do not know who took a photograph, when they took it, what tool they used to capture the image, any feedback on the image, topics and subjects, as well as other pertinent information.

UNDERSTANDING FILE METADATA
by Chris Sampson
Metadata exists throughout data storage systems, from the creation and modification dates stored within the file system through to specific information embedded within the content of a file. Metadata can be hugely important to any forensic investigation; knowing how to extract this information and spot when it has been manipulated can prove very important. This article, aimed at those new to forensics, looks at various forms of metadata and provides examples of how we can manually retrieve this information using what is available within our operating systems, before moving on to other tools which can be used to extract this data from many different file types.

DEMYSTIFYING METADATA
by Mark Garnett
Metadata are those often quoted, but sometimes misunderstood, attributes of a file that can sometimes provide the sought-after breakthrough in determining what happened when on a computer system with respect to particular documents. They are of paramount importance in investigations involving the theft of intellectual property, electronic discovery, fraud and misconduct investigations, and patent disputes.

EXTRACTING AND USING METADATA FOR A DIGITAL FORENSIC INVESTIGATION: A STEP BY STEP PROCESS
by Marc Bleicher
Metadata can often contain that needle in the haystack you're looking for during a forensics investigation; in fact it has helped me out in the past quite a few times. One particular case that stands out the most was an internal investigation I did for the company I was working for at the time. Most of the cases I dealt with in this role related to employee misconduct, which included wrongful use, inappropriate behavior, harassment, etc. In this situation, metadata was the key piece of evidence in the case of a lost smart phone.

VIEWING THE TREES IN SPITE OF THE FOREST
by Robert Reed

TOP METADATA CONSIDERATIONS FOR NETWORK SECURITY
by Brian Contos

METADATA: WHAT IS IT AND WHY SHOULD YOU CARE?
by Dr. Johnette Hassell & Jack Molisani

THE METADATA ANALYSIS TOOLS AND TECHNIQUES (HOW TO)
by Dr. Sameera de Alwi

With recent events in the news there is an increased interest in metadata and how it may be used. What is metadata and what can it tell us? Forensic examiners have known about metadata for some time now and have probably used it to assist in investigations. Metadata can be used for a great many tasks, from file attribution and intelligence gathering to revealing manipulation of time and date stamps. The manner in which metadata can be used is really a matter of the approach and creativity of the examiner. To get a better hold on what metadata is, a definition is needed. Bert Moss on Metadata

In June 2013 the term metadata, which is most generally defined as data about data, went mainstream following the Guardian's NSA Prism program article. For many years the security industry has been working with metadata, developing best practices around handling it and even choosing the right technology for specific use cases. This article will focus on key areas of consideration when looking to leverage metadata to improve network security.

Until Edward Snowden unleashed his allegations about the US and UK collecting phone information on millions of their citizens, the word metadata was the province of attorneys and computer forensic/eDiscovery nerds, such as these authors. And while the world may now be aware of the term, few truly understand the breadth and pervasiveness of computer metadata.
In this article we will discuss what computer metadata is, explain its importance in investigations and litigation, and provide a variety of examples.

Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is frequently termed data about data or information about information. An important reason for creating descriptive metadata is to facilitate the discovery of relevant information. In addition to resource discovery, metadata can help organize electronic resources, enable interoperability and legacy resource integration, provide digital identification, and support archiving and preservation. Metadata analysis is one of many different types of analysis. The interpretation of results from any single examination method may be inconclusive. It is important to validate findings with other analysis techniques and algorithms.

METADATA IN DIGITAL FORENSICS
by Bert Moss
In this article I will write about what metadata is, some metadata analysis and extraction tools, and the various techniques that can be used to extract and analyze metadata, mainly from a digital forensics standpoint. As you may already know, data is usually described as a collection of facts, such as values or measurements. It can be numbers, words, measurements, observations or even just descriptions of things.

NEARLY EVERYTHING IN YOUR CASE IS METADATA
by Trent Struttmann

When I was asked to write this article, I didn't have any idea where to start. I could write volumes on the metadata I could find in a case, so I wanted a more specific topic. Metadata is, as was said in the PFC Manning trial, just data about data. I would agree; it is really any two pieces of data that you can link together. It can tell you more about what occurred on a computer than the data itself can. But to me the most interesting part of metadata, and maybe one of the best ways to explain why you should care about it, is its potential for building a body of evidence for a court case. Metadata can give you context for an event on a computer.

Give me a little bit of information on the concrete how/when/from where/by whom data that metadata can give you.
There are many forensic tools to help an analyst find out what happened in a case. The most common are the most popular automated forensic tools: EnCase and FTK. Each program provides a wealth of tools for the examiner through both built-in and external scripts. EnCase provides the analyst many tools for metadata analysis within the Case Processor script and great support for third-party scripts. FTK has great email and document file analysis tools.

Let's get the big boys out of the way.
The EnCase Case Processor contains a wealth of capabilities for metadata analysis. Its functions include a powerful registry analyzer with a number of forensically interesting spots already included for you, a link file analyzer, a recycle bin record finder, case initialization scripts, chat locators, and webmail finders, among others. I will talk later about how you can use all of this metadata to draw conclusions about your case.
Probably the most interesting metadata are link files. Link files, or shortcuts, can tell you a number of things if analyzed with the correct tools. Combining certain metadata with the link file, you can draw a solid conclusion that the user of the computer knew that a certain file existed.
Your Case: Let's say we are analyzing the computer of a user who was an accountant at a local hardware store. Let's call the user Vector. You have been contacted by the hardware store's attorney after the employee was downsized. You find out through the attorney that the owner suspects accounting discrepancies were to blame for the recent financial losses. He also thinks that Vector was keeping two sets of books. Vector also seemed to be working late even on days when there weren't many sales. After following procedures for taking a forensic image of the computer, you begin your analysis.
Forensically interesting metadata in this case is located in the following places: link files found in the Recent Documents folder, link files found in the registry, link files found in the Internet history, the Windows registry USBStor key, and metadata from video files and pictures found on the local computer. The useful metadata gathered using an automated tool is NTFS (New Technology File System) MFT (Master File Table) data. This information tells you when files were created, last modified and last accessed, and the last time the MFT entry changed. Using a program such as lnkanalyser you can parse relevant link files to reveal file-access times and the metadata for the file the link points to. Not only can a complete link tell you when a user clicked and opened a file, it can also tell you when the original file was created, modified, and accessed.
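As a rough illustration of what a link-file parser reads, the sketch below pulls the target file's timestamps and size out of the fixed 76-byte header at the start of a Windows shell link (.lnk) file. It is not how lnkanalyser or EnCase work internally; the function names and the returned dictionary keys are made up for this example, and the later sections of the file (where the volume label and path live) are not parsed.

```python
import struct
from datetime import datetime, timedelta, timezone

# Windows FILETIME values count 100-nanosecond ticks since 1601-01-01 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(ticks):
    """Convert a FILETIME value to an aware UTC datetime (None if unset)."""
    if ticks == 0:
        return None
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


def parse_lnk_header(data):
    """Read the target's timestamps and size from a shell link header.

    Illustrative helper (not from the article): only the fixed header
    is examined, so the input needs to be at least 76 bytes long.
    """
    if len(data) < 76:
        raise ValueError("a shell link header is 76 bytes long")
    (header_size,) = struct.unpack_from("<I", data, 0)
    if header_size != 0x4C:
        raise ValueError("not a shell link header")
    # Offsets 28, 36 and 44 hold the creation, access and write times of
    # the link target; offset 52 holds the target's size in bytes.
    created, accessed, written, size = struct.unpack_from("<QQQI", data, 28)
    return {
        "target_created": filetime_to_datetime(created),
        "target_accessed": filetime_to_datetime(accessed),
        "target_modified": filetime_to_datetime(written),
        "target_size": size,
    }
```

To try it, export a .lnk file from the forensic image, read its first 76 bytes in binary mode and pass them to `parse_lnk_header`. The values describe the target as recorded when the link was written, so they should be compared against the MFT times rather than quoted on their own.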
In this case Vector, our accountant, had two folders in his My Documents folder: one that contained the yearly accounting log and one that contained the weekly reports for the owner. Your tool of choice gathers metadata that tells you the file times, allowing you to see that the yearly log was created the first week of January and the weekly logs were created each week. They were both last modified the Friday before Vector was let go.
By looking at the recent links in the user's profile folder you find the next set of important metadata in this case. The link files for the most recently opened files on the computer can tell you who accessed which files and when.
Most automated forensics tools provide methods to easily tag or recover deleted link files. Dig deeper into the present and recovered links and it is possible to tell that this user accessed other files that used to reside in the weekly and yearly accounting data folders. For nearly every week Vector was the accountant there are two sets of links: one to the original file name still living in the weekly folder and a second to a now-deleted file that once resided in the weekly folder. We can even tell that the filename had a "Copy of" prefix, because this was Windows XP. If this had been Windows 7 or 8 you would see "Copy" appended. This tells me that every week the user made a copy of the weekly report and, from the last modified timestamp, that the copy had been opened. The MFT times of the link files can reveal even more about when the user opened the copies. A link file by its nature links data, so the most significant data it contains is the location of the file or folder it points to. In some cases link files also contain the volume name of removable media or the computer name. In this case, the significant targets of the links are on the local drive and on an E: drive, in a folder named "books". The volume name on the link with the E: drive is "photos". This tells me two things: one, that the file resided on some mounted media, and two, that the volume name of that removable media is "photos".
It is also possible to find metadata about files that no longer exist. Automated tools have built-in mechanisms to recover easily recoverable files. You can also recover files using Recover My Files from GetData (www.getdata.com) or any number of other data recovery utilities. The accuracy of the file information recovered using external tools varies, so be very careful when quoting file time data from these utilities. Recover My Files has an option to keep file metadata, and in my experience it has been accurate.
An examination of the Internet history, using a program called HstEx, uses metadata links to put together a picture of Vector's work-time activities. In this case, we find egregious use of ESPN and many fantasy football leagues viewed on company time, numerous YouTube videos and frequent Facebook access. We also find files opened in various places on the computer and access to files in a Dropbox folder in the cloud.
The Windows registry also contains links to recently viewed files. There is a wealth of metadata in the registry. Looking at the MRU (Most Recently Used) registry entries for Windows Media Player shows us that the user watched numerous files in his personal Dropbox folders. There are links to the folders My Videos\Family\Cookout and My Videos\NightLife\Bar.
After speaking with the owner you find out that most of the accounting was done in a large Excel spreadsheet. The owner was provided weekly sales figures via email. Excel files contain metadata that can tell us things like the number of edits, the last user that saved the document, the user that created it, the last time it was modified and the last time it was printed. To find this information you can look at the advanced document properties using Office's built-in features. For easy access to this feature, you can add an icon to the Quick Access Toolbar in the top left of Office 2007 and later products. To add it yourself, click what looks like the eject icon on the bar to bring up a context menu and click "More Commands". From the "Choose commands from" dropdown box choose "All Commands", then select "Advanced Document Properties" from the large scrolling list of commands below and click "Add >>", then "OK".
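The same properties can be read without opening the file in Office. The sketch below assumes the spreadsheet is in the Office 2007 and later format (.xlsx), which is a ZIP package whose core properties are stored in docProps/core.xml; older .xls files use a different container and are not handled. The function name and the keys of the returned dictionary are made up for this example, and a field that the application never wrote comes back as None.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core properties part of an Office package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Output key (illustrative name) -> element inside docProps/core.xml.
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "revision": "cp:revision",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
    "last_printed": "cp:lastPrinted",
}


def read_core_properties(source):
    """Return the core document properties of an .xlsx/.docx package.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    return {
        name: root.findtext(path, namespaces=NS)
        for name, path in FIELDS.items()
    }
```

Calling `read_core_properties` on a copy of the spreadsheet exported from the image returns the raw text of each property, with timestamps as ISO 8601 strings in UTC. As with any embedded metadata, these values are written by the application and can be edited, so they are best corroborated with file system times.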
From the registry on this computer you can locate USBStor information, either using a script within your automated tool or with a very handy tool from http://www.woanware.co.uk called USBDeviceForensics. There are many other very handy free tools on the site. Once the tool is installed you should extract files from your forensic image. The operating system on the computer being analyzed determines which of these files you should export to perform the analysis: the System hive, the Software hive, the User hive and setup.api.log. In this case, we find that someone logged on to the computer would plug in a generic flash drive at a certain time, that the flash drive was mounted on drive E:, and that metadata links pointed to a target with the volume name "photos".
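To show the kind of information these tools report, the sketch below splits the name of a device subkey as it commonly appears under SYSTEM\CurrentControlSet\Enum\USBSTOR into its parts. It only parses a string; it does not read a registry hive, and it is not how USBDeviceForensics works. The function name, the friendly field names and the sample value are made up for this example.

```python
# Friendly names (illustrative) for the field prefixes in a USBSTOR key name.
FIELD_NAMES = {"ven": "vendor", "prod": "product", "rev": "revision"}


def parse_usbstor_key(name):
    """Split a USBSTOR device key name into its labelled fields.

    Expects the common layout '<Type>&Ven_<v>&Prod_<p>&Rev_<r>'.
    Unknown prefixes are kept under their lower-cased prefix.
    """
    parts = name.split("&")
    fields = {"device_type": parts[0]}
    for part in parts[1:]:
        # Split on the first underscore only: product names may contain more.
        prefix, _, value = part.partition("_")
        key = FIELD_NAMES.get(prefix.lower(), prefix.lower())
        fields[key] = value
    return fields


# Hypothetical key name, not taken from the case in the article.
sample = parse_usbstor_key("Disk&Ven_Generic&Prod_Flash_Disk&Rev_8.07")
```

The subkey beneath each device name usually carries the serial number, and the key's last-written time is what ties a particular device to a particular moment, which is the part the dedicated tools extract from the hives for you.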
Now, photos present interesting challenges to forensic analysis. The user kept many personal photos and documents on his work computer, and by analyzing the metadata contained within them we can determine quite a bit about the location of the user and when he or she was at each location. The photos and videos can be analyzed using scripts within your automated tool or using external utilities such as Java EXIF viewer (from: http://sourceforge.net/projects/jexifviewer/) for pictures and MPEGID (from: http://www.manzanitasystems.com/products/mpegid.html) for video files. In this case Vector, or the photographer, had location data turned on, so the images taken with his phone have embedded geo-location data. We can find that the user took photos with his phone at times when he was supposed to be working, based on the times revealed by the EXIF viewer. We can also find, using the geo-location data, that he was not at the office during those times.
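EXIF stores a GPS position as degrees, minutes and seconds plus a hemisphere letter, so comparing a photo's position with the office takes two small calculations. The sketch below is a minimal illustration: it assumes the coordinates have already been read out by an EXIF viewer, and the function names are made up for this example.

```python
import math


def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds and a hemisphere letter to decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative.
    return -value if ref in ("S", "W") else value


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    earth_radius_km = 6371.0  # mean Earth radius, an approximation
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))
```

Converting each photo's position with `dms_to_decimal` and passing it to `distance_km` together with the office coordinates gives a distance that can be listed next to the EXIF capture time. Phone GPS fixes can be off by tens of meters or more, so small distances should not be over-interpreted.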

What Actually Happened?

Vector had started keeping two separate books. Both sets were kept in his My Documents folder. The owner provided the most recent set of books he had been emailed. Vector wanted to keep the fake set of books hidden, so he had moved them to a flash drive using the Move function, meaning he used cut and paste, not copy then delete. There were also personal documents and websites visited, along with videos and pictures of him and his kids at a cookout. Vector would provide the fake accounting files to the owner and keep the real ones on his flash drive. After every faked sales day he would pocket the difference between the real and faked entries by removing cash from the register.
Apple computers' metadata differs from Windows metadata in a few ways. As always, the file system keeps a wealth of information. But the most significant metadata is contained within .plist files. These files keep program preferences and data; they are used as an alternative to a central database comparable to the Windows registry. There are also external .plist viewers like the one found at http://www.forensicfocus.com/Forums/viewtopic/t=8635/. Things like video and picture file analysis are OS agnostic, so some of the above also applies to computers running OS X.
Other examples of metadata include things like phone records, location data pulled from cell towers, and location data pulled from the cell phones themselves. A few years ago Apple was under fire from privacy groups because of the metadata your phone collected about where you were at specific times. This was location and time data that was initially collected when an application or iOS wanted to know where you were. Apple has since changed its location data retention policy. The maps I have made with this data are incredible. I can show people driving to work, going home, going out to eat. I could tell where they ate and how long they spent there. The call logs on an iPhone or any other device provide metadata about when and whom you called and, matched with the location data, where you placed the call. This was metadata collected from the phone itself. The same metadata can be collected by your phone company, to improve network reliability or because they are asked to.
On the Web

http://www.woanware.co.uk
http://sourceforge.net/projects/jexifviewer/
http://www.manzanitasystems.com/products/mpegid.html
http://www.forensicfocus.com/Forums/viewtopic/t=8635/
http://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers

You can also collect metadata about network traffic. There are multiple ways to do this. The data is collected by placing a listening device in the path of a data transmission. What you can pick up depends on the sophistication of the device. The simplest is netflow data. This just shows how much data passed by this point and from where to where, on what protocol, and on what port. This information alone can tell me that someone is on IRC or using RDP (Remote Desktop Protocol) or VNC (Virtual Network Computing), that a local computer is running NetBIOS, and generally how much data these computers exchanged. A more complete list is here: http://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers.
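The netflow reasoning above can be sketched in a few lines. The example assumes flow records have already been exported as simple tuples of source, destination, protocol, destination port and byte count; that record layout, the function name and the small port table are assumptions made for this illustration, not the format of any particular collector.

```python
from collections import Counter

# A few commonly used ports for the services named in the text.
# A port number is only a hint: any service can run on any port.
WELL_KNOWN_PORTS = {
    137: "NetBIOS",
    138: "NetBIOS",
    139: "NetBIOS",
    3389: "RDP",
    5900: "VNC",
    6667: "IRC",
}


def summarize_flows(flows):
    """Total the bytes exchanged per (source, destination, service).

    Each flow is a tuple: (src, dst, protocol, dst_port, byte_count).
    Ports missing from the table are reported as 'protocol/port'.
    """
    totals = Counter()
    for src, dst, protocol, port, byte_count in flows:
        service = WELL_KNOWN_PORTS.get(port, f"{protocol}/{port}")
        totals[(src, dst, service)] += byte_count
    return totals
```

Even with no packet contents at all, a summary like this shows who talked to whom, with what likely service, and how much, which is exactly the kind of context the flow metadata provides.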
More advanced network sniffers can examine packet data, and some can even examine application-level data. They can intercept your HTTP GET requests to find metadata about what websites you visit and look at, or your torrent data to find out your embarrassing TV, movie, or music habits. For instance, if you don't make sure that the website where you are entering private data is secured by SSL, and that it is the correct website, you risk exposing your passwords or other sensitive data, such as your Social Security number or credit card numbers. There are caveats to all of these, such as Tor (The Onion Router network) to mask packet metadata and strong encryption to mask torrent or web browsing data. VPN and proxy services like hidemyass.com or torguard.net can in many cases obfuscate your Internet activities.

About The Author

Trent Struttmann is a Digital Forensic Examiner with Cyber Agents Inc. (www.cyberagentsinc.com). He has worked in digital forensics, data recovery and cell phone data analysis on more than 100 cases and has testified more than 5 times in DoD, civilian federal, civilian criminal and state criminal courts. He has spoken at conferences at the Naval Justice School in Rhode Island and at the Public Defenders conference in Kentucky.

A PRIMER ON METADATA ANALYSIS
by Jeffrey Lewis

In the classic 1999 science fiction movie The Matrix, the character Neo, played by Keanu Reeves, finds out that the reality he is living in is the construct of a computer program. Everything he knew to be real turned out to be fake. Even outside of the movies, computer software can easily twist the world around and create a fake reality that can pass for what is authentic.

A perfect example of this happened when George W. Bush was running for reelection in 2004. The following story is taken from his memoir, Decision Points. From 1968 to 1974 Bush served in the Texas Air National Guard. In September 2004 one of Bush's senior advisors brought him a typewritten memo on National Guard stationery which was signed by his former commander and stated that he did not perform up to standards. One journalist was ready to go to press with this story, but Bush had no recollection of the memo. The senior advisor checked it out, and indications pointed to the memo being forged. The typeface came from a modern computer font that didn't exist in the early 1970s. "Within a few days, the evidence was conclusive: The memo was phony." [1]
In the example from Bush's life, the memo is the data and the font is the metadata. Metadata is data about data. Anything that describes data is metadata. There are different metadata standards for different types of data. Information is not searchable and accessible without metadata. For example, without metadata you do not know who took a photograph, when they took it, what tool they used to capture the image, any feedback on the image, topics and subjects, as well as other pertinent information.

INTRODUCTION

Metadata has been around as long as information has been around. Even before people talked about data, people talked about metadata, but under different terms. There used to be a time when everyone would physically get their hands on metadata, not by using computers and search engines, but by going to the local library and flipping through card catalogs. Books and journals have metadata associated with them such as publisher, publisher location, publication date, pages, subject descriptors and much more. Once you find the book you will discover more metadata by going to the table of contents and index to see how extensively the topic you are interested in is covered.
With the assistance of technologies such as machine-aided indexing, search engines and online databases, the ability to dive into the content of the books you are researching is light years ahead of what it was in the card catalog era. One such tool available to researchers is Illustrata, a deep indexing tool patented and provided by ProQuest. [2] Deep indexing goes beyond traditional indexing by extracting tables and figures (also known as objects) from journal articles. The extracted objects have their own metadata, composed of information derived from captions, the type of object (scatter plot, line graph, table), and the subject terms, taxonomic terms, statistical terms, geographical terms and subject headings assigned to the object. Traditional indexing allows a researcher to see metadata only for the body of an article, whereas deep indexing produces metadata for the objects in the article, which are the core research, findings and experimental results. [3]
Deep indexing is about using metadata to make scholarly information discoverable for analysis and research. Metadata is also used in the legal process of searching for evidence. This is often referred to as discovery or, in reference to electronically stored information (ESI), e-discovery. Discovery is the required disclosure of relevant items in the possession of one party to the opposing party during the course of legal action. The relevant items needed for discovery are often collected by using metadata to describe what needs to be captured. One smoking gun that poses a high risk of indictment is e-mail. E-mail is ESI that is eligible for e-discovery. Metadata in your e-mail can include items such as to, from, cc, subject, date sent, body, attachment title and contents of the attachment.
Performing e-discovery manually is very expensive, time consuming and inefficient. To aid metadata analysis in the e-discovery process, many software companies have created technology to alleviate the manual work involved. One method of e-discovery that is gaining popularity, and which uses a combination of machine-learning technology and workflow processes, is predictive coding. Predictive coding works by having subject matter experts train software to classify documents based on keywords. The software is only as smart as those programming it, so one person may key in "color" and another may key in both "color" and "colour". [4]
With the many different contexts that metadata can come in, there are no techniques and tools for analysis that are tried and true across the board. No matter the purpose of the metadata, however, there are principles and tips that can guide your analysis.

TIP #1 ENSURE METADATA QUALITY

When metadata passes through different systems, processes that work at one end can have different results at another point. In a past role in content management we were using the DOI as a unique identifier. At one stage of the process the DOI was matching properly, and at another stage the DOI no longer matched what was in our manifest. The problem was that the DOI contained an ampersand, and when it went through entity conversion the ampersand was converted to "&amp;", so it no longer matched how it was captured in the manifest and gave back wrong results.
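One way to guard against this kind of mismatch is to compare identifiers only after bringing both sides to the same form. The sketch below is an illustration of that idea, not the fix used on the project described above; the function names and the DOI values in the usage note are made up.

```python
import html


def normalize_identifier(value):
    """Undo HTML/XML entity escaping so identifiers compare consistently.

    Unescaping is repeated until the value stops changing, which also
    handles values that were escaped twice ('&amp;amp;'). The trade-off
    is that an identifier that really contains the text '&amp;' would
    be decoded as well.
    """
    previous = None
    while value != previous:
        previous, value = value, html.unescape(value)
    return value.strip()


def same_identifier(left, right):
    """Compare two identifiers after normalization."""
    return normalize_identifier(left) == normalize_identifier(right)
```

For example, `same_identifier("10.1000/a&b", "10.1000/a&amp;b")` treats the manifest value and the entity-converted value as the same identifier. Running a check like this at each hand-off between systems makes it obvious at which stage a value changed shape.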

TIP #2 DON'T HOARD DATA

Survey results from the Compliance, Governance and Oversight Council (CGOC) have revealed that organizations on average need to archive about 2-3% of their data for legal hold, 5-10% to meet regulatory requirements, and 25% for business analysis and insights. Outside of those percentages, the majority of data is being over-retained. If you can cut down on the data you are retaining, you will reduce the metadata you are analyzing. It is much easier to perform metadata analysis on a small batch of content than on a large batch. Jack Frazier, executive director for CGOC, told Information Week, "Once you delete data that's stale, the algorithms actually function much better from an analytics standpoint. Leaving stale data can actually skew the algorithms towards older facts." [5]

TIP #3 USE STANDARDS WHEN PERFORMING ANALYSIS ON METADATA

Metadata serves the purpose of describing data. When describing data, especially if multiple people are working with the data, it is important to have standards. To make sure everyone is on the same page with how they analyze metadata, it is important to use a thesaurus, semantic network or some other similar tool. Using standards can be a means to ensure data quality, as it will make sure the most accurate descriptors are in your metadata and no one uses the term "Miscellaneous". While developing a file plan on a federal government contract, one of the rules we decided on early was that "Miscellaneous" would not be used to describe a records series. Having a category called "Miscellaneous" is like having a junk drawer where nothing is organized and it is hard to find specific items. In your metadata analysis, if you come across something that you want to label as "Miscellaneous", then think hard about what it is and whether the terms you are using to categorize and classify information are too granular and not broad enough.

REFERENCES

[1] You can read the full story in George W. Bush's memoir Decision Points, pp. 17-18.
[2] You can learn more about Illustrata at http://www.proquest.com/go/deepindexing
[3] Illustrata was so groundbreaking that Information Today referred to it as "one of the most important products of the year..." and it was ranked one of the top ten developments in 2007. The iPhone was number one. You can read more at https://www.proquest.com/assets/newsletters/products/CSA_Illustrata/0408_Illustrata_Informer.html
[4] For more information on predictive coding please see the white paper "Using Predictive Coding To Your E-Discovery Advantage", http://searchdatabackup.bitpipe.com/fulfillment/1369159804_67
[5] Bertolucci, John, "Are You A Data Hoarder", published February 25, 2013, http://www.informationweek.com/big-data/news/big-data-analytics/are-you-a-data-hoarder/240149328

TIP #4 ASK THE RIGHT QUESTIONS OF METADATA

When performing metadata analysis, the goal is to gain business intelligence for a purpose. Intelligence must be worked for and does not come naturally. Working for intelligence involves more than just collecting information: the right information must be collected and the right questions must be asked of it. For example, if I want to drive from my home to my office, I can gather metadata such as the distance and the speed at which I can drive to get from here to there. That will provide a lot of metadata that can theoretically answer my question, but it can miss the mark if I don't ask what my departure time is and when rush hour is. The same goes for the story at the beginning of this article about George W. Bush. Someone looking at the memo can ask whether the person who signed the memo was Bush's commanding officer at the time listed and whether the signature matched up, but if they don't ask, "Did the font in the memo exist at the time the memo was written?" then they have not asked the million dollar question.

TIP #5 KNOW THE TOOL NECESSARY

What are the requirements for the metadata you are performing analysis on? Depending on what type of reports are necessary and what type of information needs to be derived, a search may suffice, using any of a number of types of queries or combining different query methods, such as Boolean and wildcard. Another tool you may use is a system to capture metadata derived from a server such as Outlook or Active Directory. Depending on the complexity of the data and the reporting required, tools such as visualization or guided navigation may be necessary.
Metadata makes your information searchable, retrievable and, most importantly, valuable. I was recently helping a friend with search engine optimization for his company's website. I took a look at the meta tags and my metadata analysis discovered this line of HTML: <meta name="robots" content="noindex,nofollow" />. This line told me two things: 1) Google, Bing and other search engines were not collecting metadata to index the website, so the site returned zero results when searched for, and 2) the vendor who created my friend's website did not complete the project. The browser showed my eyes one thing about the value of the website, but analyzing the meta tags showed that the website was worthless because it was not retrievable by search engines.
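A check like the one described above is easy to automate. The following is a minimal sketch using only Python's standard html.parser module; the class name, function name and sample page are illustrative assumptions, not part of any particular SEO tool.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for part in content.split(","):
            if part.strip():
                self.directives.append(part.strip().lower())


def robots_directives(page_source):
    parser = RobotsMetaParser()
    parser.feed(page_source)
    return parser.directives


# Illustrative page containing the tag quoted in the article.
sample_page = (
    "<html><head>"
    '<meta name="robots" content="noindex,nofollow" />'
    "</head><body>Welcome</body></html>"
)

found = robots_directives(sample_page)
print(found)
if "noindex" in found:
    print("Warning: search engines are told not to index this page")
```

Running the same function over every page of a site would quickly show whether a leftover development setting is hiding it from search engines.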

About The Author

Jeff Lewis, CIP, MLS, is a Certified Information Professional (CIP) from the Association for Information and Image Management (AIIM), www.aiim.org. He holds a graduate degree in Library Science with a Specialization in Special Collections from Indiana University. Currently he is employed as a federal government contractor for Zimmerman Associates Inc., http://zai-inc.com/. You can follow his research and writing on his blog, Information Is Currency, http://infocurrency.wordpress.com/, and he is an Expert blogger for AIIM on electronic records management. If you are on Twitter you can connect with him at twitter.com/Info_Currency.

UNDERSTANDING FILE METADATA: HOW TO VIEW & INTERPRET DATA ABOUT DATA
by Chris Sampson

Metadata exists throughout data storage systems, from the creation and modification dates stored within the file system through to specific information embedded within the content of a file. Metadata can be hugely important to any forensic investigation, and knowing how to extract this information and spot when it has been manipulated can prove very important. This article, aimed at those new to forensics, looks at various forms of metadata. It provides examples of how we can manually retrieve this information using what is available within our operating systems, before moving on to other tools which can be used to extract this data from many different file types.

In this article you will learn what metadata is and how to access and interpret it; we even touch upon the possibility of this data being manipulated. We will approach the topic through an overview of how different systems and file containers use and display this data. Whilst no two operating systems share identical tools for accessing and displaying metadata, we do look at an open source third party tool that will work across multiple environments.


The purpose of providing this overview is to allow you to develop your own particular techniques for evaluating the metadata that you find, as well as ways in which you can manually validate the accuracy of the information that you find. It is important for any individual who is presenting information based upon third party tools and opinions to be comfortable about its accuracy, as well as understanding the potential for inaccuracy in this data.
When it comes to investigating information it is important to understand that accidental or even potentially malicious manipulation or misinterpretation of any and all data is possible. The emphasis is on the fact that if we can read this data directly and we can understand how it is recorded, then we can very easily manipulate or change the data, often in ways which are very close to being, if not entirely, undetectable.
We believe that any investigator should understand at a very fundamental level the workings and
viability of any data gathered that may later need
to be presented as fact. In order to do this properly
you must personally validate and be convinced of
the fact that the

What you should know

This article is aimed at beginners within the field of computer forensics and examinations, and no prior knowledge is required or expected. Despite this, it will be useful for you to already be familiar and comfortable with the use of your chosen operating system. A number of the tools discussed are used through the command line of various computer operating systems, so familiarity with the command line for your chosen OS is recommended.
Within this document we provide limited examples of using software tools to examine file metadata. In doing so we will touch upon examples using the following operating systems:
Microsoft Windows 7
Mac OS X 10.8.3, Mountain Lion
Ubuntu Linux 12.04
A basic level of understanding of general operating system usage and file storage concepts, as well as file properties and attributes, is expected.

WHAT IS METADATA?

Metadata is a fairly broad topic, and there are many different things that can accurately be described as being metadata. There are many different types of metadata; each describes a specific feature or attribute of a file or directory item on a computer. Some are common to all data, others are unique to a specific file or directory type.
Put simply, metadata is data that describes other data. It exists everywhere: in your file system and (if supported) its journal, in email headers, within instant search databases, inside the Windows Registry, log files and Mac OS file resource forks, to name but a few. For many file container types there is a huge amount of metadata that can also be found within the individual files themselves.
Metadata can be gathered from a large number of different resources. Key to getting the most accurate picture of a file from its metadata is understanding the system that created or used the data file, along with its quirks, peculiarities and indexing abilities. If you have a good grasp of this then you are off to an excellent start.
www.eForensicsMag.com
Good knowledge of the file type that you are investigating will help you to get the most complete picture of the metadata that it can store and where this data can be found. If you are investigating a new file type for the first time it may be wise to conduct a little research. Steps that may prove helpful include:
Read documentation from the publisher of the software used to create the file type. The availability of this kind of information varies from publisher to publisher.
If possible, install and use the software, create files of the type that you are investigating, then examine these to become familiar with the file container.
Find third party documentation regarding the file type. The open source community, particularly those who create tools to access or modify the data type that you need to investigate, can be a great source of detailed information.
Familiarize yourself with manually editing or manipulating metadata within the data container.
The more information that you are armed with prior to carrying out an examination, the better placed you will be to accurately and efficiently extract the information that you need.

FILE SYSTEM METADATA

The first place to look for metadata is within your computer's file manager. File managers are the most direct link between the computer's storage and the user interface. Lots of information that you will already be familiar with is available directly from the GUI of all modern operating systems. Examples of the types of metadata normally retrieved by your computer directly from the file system are:




File Name
File Path
File Size
Creation Date
Modification Date

Whilst the above are pretty standard, the type of information available can vary significantly depending upon the specific operating system involved
and the type and version of file system that is being used to store data for that OS.
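The same file system metadata can be read programmatically rather than through the file manager. The sketch below uses Python's standard os.stat() call; the function names and the throwaway sample file are illustrative assumptions. Note how the meaning of one timestamp differs between operating systems, which is exactly the kind of variation described above.

```python
import os
import tempfile
from datetime import datetime, timezone


def to_utc(timestamp):
    """Convert a POSIX timestamp to an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()


def file_system_metadata(path):
    """Collect basic file system metadata for one file via os.stat()."""
    info = os.stat(path)
    metadata = {
        "name": os.path.basename(path),
        "path": os.path.abspath(path),
        "size_bytes": info.st_size,
        "modified": to_utc(info.st_mtime),
        "accessed": to_utc(info.st_atime),
        # st_ctime is the creation time on Windows, but the time of the last
        # inode (metadata) change on Linux and Mac OS X.
        "changed_or_created": to_utc(info.st_ctime),
    }
    # A true creation ("birth") time is only exposed on some platforms.
    birth_time = getattr(info, "st_birthtime", None)
    if birth_time is not None:
        metadata["created"] = to_utc(birth_time)
    return metadata


# Demonstration on a throwaway file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as handle:
    handle.write(b"metadata")
    sample_path = handle.name

sample_metadata = file_system_metadata(sample_path)
for field, value in sample_metadata.items():
    print(f"{field}: {value}")
os.remove(sample_path)
```

Reporting every timestamp in UTC makes values from different systems directly comparable, which matters when an examination spans machines in different time zones.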
Often additional information exists defining certain attributes that have been assigned to a file or folder by the OS or the user. This information can include permissions, author, character encoding, file version information and potentially much more. In many modern file systems additional streams or forks exist for each file entry, and these can contain information which is not normally displayed in the GUI.
We should all already be familiar with how to view most of this information; as users we have had to sort data in Windows Explorer, Finder or other file managers. We sort by date to find the file or version of a file that we need, and by alphabetical order to quickly find a file when we know its name. We do this daily without a second thought, but as computer examiners we often need more information than the operating system is going to provide us with by default.
File system metadata is just one source of information that is available about a file. Different file types often have attributes specific to their usage or purpose, and often this data is not universally applicable.

WHAT KIND OF METADATA CAN BE STORED WITHIN A FILE?

What metadata is stored within a file? The short answer is: anything. The limit exists only within the file specification, the format designer's creativity and practical usability limits. The kind of metadata that is stored in a file can vary from none, or only certain key attributes, through to many different features, properties, timestamps and more.
Table 1. Common internal file metadata

File Type | Samples of Supported Metadata
.jpeg image files (EXIF) | Date and time, specific camera information including make, model and settings, thumbnail of the image, description, dimensions, copyright information, GPS information
MP3 files | Title, artist, album, genre, comments, copyrights, size, bitrate
PDF documents | Version, File Size, No of Pages, Producer, Creator, Title, Subject, Author, Pages, Keywords
DOCX files | CRC32, author, last modified date, version number, preview thumbnail, creation date, file size, word count, number of pages etc.

In Table 1 we look at four commonly used file types and describe some of the metadata which can be written to (and subsequently extracted from) a specific file container. With many of these file types being based upon different interpretations of standardized data containers, some or all of the supported metadata may be missing or implemented incorrectly. Sometimes a file specification is followed very closely but something additional is added within the metadata to serve the needs of a particular implementation.
It is important that metadata only be used as a guide rather than an absolute. Later we will look at the output of different applications using the same file format, as well as tools and methods that are available for the reading, extraction and manipulation of file metadata.
Note: The above information is not intended to be a complete list of available metadata for each file type; rather, it is an example of some of the data that is commonly available within each file type. Research into the internal structure of a file container and its supported metadata is required for a thorough understanding of the possibilities and limitations of metadata storage for a particular container type.
Much of this data is accessible directly from your computer's file manager. There are also a plethora of tools available, and many third party utilities enable direct viewing and often editing of a file's metadata.

EXAMINING FILE METADATA

The information displayed within the file manager window of a GUI based operating system is normally only a small subset of the metadata that exists and is accessible for any given file. Some is hidden or reserved for OS usage; some is considered unimportant and is therefore not displayed. But most information that is stored can be accessed in one way or another. Often, more modern OS features like versioning, journaling and instant search can hold more data than is available directly through the interface. In most cases there are tools, applications or techniques which can be used to display this data.
Windows
Using Windows Explorer we can see a number of metadata elements from within an Explorer window. This information is configurable too, and Explorer supports the metadata of many different file types. To discover what types of metadata can be viewed through Explorer, try the example given below:
Use Windows Explorer to navigate to the folder that contains the file types that you wish to examine.
Change the View type to Details.
Right click the column headings to display the contextual menu.
Click the More item at the bottom of the menu.
This will open a new window within which you can choose the type of metadata that you want to display. If Windows supports the file type and that file type contains the metadata that you have selected, you will be able to see the meta contents directly within Windows Explorer.
Ubuntu
Linux systems are a little more limited in the information provided within the standard GUI, although this can easily be changed. Whilst no specific tool exists within Ubuntu for metadata viewing, there is the file command.
There are some limitations to file, though, as it is not really intended as a metadata analyser. Although you can find out lots of detail about the meta content of a Microsoft Word OLE (.doc) document, there is no metadata available for Office Open XML (.docx) files.
Here is an example of using file to display the metadata for a newly created Word document:
:~/Desktop$ file Sample\ Document.doc
Sample Document.doc: Composite Document File V2 Document, Little Endian, Os: MacOS, Version 10.3, Code page: 10000, Author: Christopher Sampson, Template: Normal.dotm, Last Saved By: Christopher Sampson, Revision Number: 2, Name of Creating Application: Microsoft Macintosh Word, Total Editing Time: 01:00, Create Time/Date: Sat Jun 8 11:39:00 2013, Last Saved Time/Date: Sat Jun 8 11:39:00 2013, Number of Pages: 1, Number of Words: 4, Number of Characters: 24, Security: 0

The formatting of this data isn't the most readable, but there is a good amount of important information that is displayed. However, running the file command on one of the newer Office Open XML docx format files simply returns:
:~/Desktop$ file Sample\ Document.docx
Sample Document.docx: Microsoft Word 2007+

This shows that whilst file is able to determine the document type, it does not currently support metadata parsing from it. File is designed primarily to identify different file types and as such is not really best suited to the task of metadata analysis. With that having been said, it is possible to extend file with new file types.
There are other ways to get more information about lots of different file types on a Linux system using ExifTool, a free application that will also run on Windows and Mac. We will look at ExifTool once we have discussed the Mac OS X default options.

Listing 1. Using Mac OS X's mdls command to extract file metadata
Command:
chrissampson$ mdls ~/Desktop/Sample\ Document.docx
Output:
kMDItemAlternateNames          = (
    "Sample Document.docx"
)
kMDItemAuthors                 = (
    "Christopher Sampson"
)
kMDItemContentCreationDate     = 2013-06-09 10:35:06 +0000
kMDItemContentModificationDate = 2013-06-09 10:35:06 +0000
kMDItemContentType             = "org.openxmlformats.wordprocessingml.document"
kMDItemContentTypeTree         = (
    "org.openxmlformats.wordprocessingml.document",
    "org.openxmlformats.openxml",
    "public.zip-archive",
    "com.pkware.zip-archive",
    "public.data",
    "public.item",
    "com.apple.bom-archive",
    "public.archive",
    "public.composite-content",
    "public.content"
)
kMDItemDateAdded               = 2013-06-09 10:35:06 +0000
kMDItemDisplayName             = "Sample Document"
kMDItemEditors                 = (
    "Christopher Sampson"
)
kMDItemFSContentChangeDate     = 2013-06-09 10:35:06 +0000
kMDItemFSCreationDate          = 2013-06-09 10:35:06 +0000
kMDItemFSCreatorCode           = "MSWD"
kMDItemFSFinderFlags           = 16
kMDItemFSHasCustomIcon         = 0
kMDItemFSInvisible             = 0
kMDItemFSIsExtensionHidden     = 1
kMDItemFSIsStationery          = 0
kMDItemFSLabel                 = 0
kMDItemFSName                  = "Sample Document.docx"
kMDItemFSNodeCount             = 22944
kMDItemFSOwnerGroupID          = 20
kMDItemFSOwnerUserID           = 501
kMDItemFSSize                  = 22944
kMDItemFSTypeCode              = "WXBN"
kMDItemKind                    = "Microsoft Word document"
kMDItemLogicalSize             = 22944
kMDItemOrganizations           = (
    "TRC Data Recovery Ltd"
)
kMDItemPhysicalSize            = 24576
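Returning to the file command discussed above: tools of this kind recognise a format largely from the first few bytes of the file, often called a signature or magic number. The sketch below is a simplified illustration of that idea for the two Word container formats only. It is not how file itself is implemented, and the function names are assumptions made for the example.

```python
# Leading byte signatures ("magic numbers") of the two container formats.
OLE_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE compound file, used by .doc
ZIP_SIGNATURE = b"PK\x03\x04"                      # ZIP local file header, used by .docx


def identify_container(header):
    """Classify a file from its leading bytes."""
    if header.startswith(OLE_SIGNATURE):
        return "OLE compound document (for example a legacy .doc)"
    if header.startswith(ZIP_SIGNATURE):
        return "ZIP archive (for example an Office Open XML .docx)"
    return "unknown"


def identify_file(path):
    """Read just enough of a file to classify it."""
    with open(path, "rb") as handle:
        return identify_container(handle.read(8))


# Illustrative headers. In the ZIP example the bytes after the signature
# encode "version needed" 20 and bit flag 0x0006.
print(identify_container(OLE_SIGNATURE))
print(identify_container(b"PK\x03\x04\x14\x00\x06\x00"))
print(identify_container(b"plain text"))
```

A signature check like this is independent of the file extension, which is why renaming a file does not fool it. It also explains why file reports only the type for a docx: the metadata sits inside compressed parts of the archive, beyond the reach of a simple header inspection.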
Mac OS X
File is a Unix utility which is available by default throughout most Unix and Unix-like operating systems; as such Mac OS X (which borrows large portions of FreeBSD for its core) can also make use of file. As with the Ubuntu example above, file alone is probably not the best solution for metadata examination.
Similar to Windows Explorer, the Mac OS X Finder is capable of displaying additional metadata for supported file types directly within the interface. The process for enabling this is almost identical to the way that we achieve this within Windows. First we need to set the Finder window to list view, then right click (or ctrl+click) on the column header area, where you can select the data that you want to view. It should be noted that a quirk of Mac OS X is to only display the metadata options relevant to the data shown within the window that you are changing.
There is also another powerful option if you are running a version of OS X that includes Apple's Spotlight application. Mac OS X systems that include Spotlight also include a fantastic tool for viewing metadata that has been captured and indexed by Spotlight: mdls. Mdls can be used to see extended information for supported file types, and there are a lot of supported file types. The output of mdls is very comprehensive; the data that was extracted from a newly created Microsoft Word .docx file is shown in Listing 1.
As you can see, whilst the output of mdls is very detailed, it is not formatted to make for easy reading. Despite the formatting it is still a pretty simple task to extract the required information from the output.
Spotlight presents several interesting possibilities for metadata analysis in general but we will not go into any further detail about that within this article.

Third party metadata tools

Whilst it is useful to understand the possibilities of viewing file metadata within the operating system, without the need for third party tools, there are other more comprehensive options available. We will look briefly at some of the freely available tools before discussing the cross platform ExifTool, which will be used during our examples.

Listing 2. An example of metadata extracted using ExifTool
chrissampson$ exiftool ~/Desktop/Sample\ Document.docx
ExifTool Version Number         : 9.31
File Name                       : Sample Document.docx
Directory                       : /Users/chrissampson/Desktop
File Size                       : 22 kB
File Modification Date/Time     : 2013:06:09 11:35:06+01:00
File Access Date/Time           : 2013:06:09 12:13:07+01:00
File Inode Change Date/Time     : 2013:06:09 11:35:06+01:00
File Permissions                : rw-r--r--
File Type                       : DOCX
MIME Type                       : application/vnd.openxmlformats-officedocument.wordprocessingml.document
Zip Required Version            : 20
Zip Bit Flag                    : 0x0006
Zip Compression                 : Deflated
Zip Modify Date                 : 1980:01:01 00:00:00
Zip CRC                         : 0xb01051e9
Zip Compressed Size             : 397
Zip Uncompressed Size           : 1474
Zip File Name                   : [Content_Types].xml
Preview Image                   : (Binary data 9500 bytes, use -b option to extract)
Title                           :
Subject                         :
Creator                         : Christopher Sampson
Keywords                        :
Description                     :
Last Modified By                : Christopher Sampson
Revision Number                 : 1
Create Date                     : 2013:06:09 10:34:00Z
Modify Date                     : 2013:06:09 10:35:00Z
Template                        : Normal.dotm
Total Edit Time                 : 1 minute
Pages                           : 1
Words                           : 4
Characters                      : 24
Application                     : Microsoft Macintosh Word
Doc Security                    : None
Lines                           : 1
Paragraphs                      : 1
Scale Crop                      : No
Company                         : TRC Data Recovery Ltd
Links Up To Date                : No
Characters With Spaces          : 27
Shared Doc                      : No
Hyperlinks Changed              : No
App Version                     : 14.0000

Image Files
Image files often contain a large amount of metadata, from camera type to time stamps, geolocation and more. Much of this information can be extracted by doing nothing more than opening the file using a text editor or hex editor. Free tools are plentiful, as are libraries and open source projects, which can be used to develop your own utilities.
For a quick and simple inspection of supported file types, exifviewer.org has a web based tool that displays friendly, easy to interpret metadata. Exifviewer is built upon the Exif2 library.
It is also important to note that many image file formats can contain a thumbnail of the original image. In most cases this thumbnail will mirror the full sized image. When it does not, comparison can help to identify potential editing and manipulation.
Many operating systems also cache thumbnails within their file managers for image previewing purposes. OS caches can prove an important source for metadata analysis.
PDF files
The metadata contained within a PDF document varies greatly and can depend on what tool created the document as well as the settings for that application. There are also a number of different PDF specifications that govern the file format, with varying metadata support and implementation. One of the simplest approaches to extracting the metadata from a PDF document is to open the file in your text editor or hex editor.
Some metadata from a PDF is also available within the operating system or via specific tools like Adobe Acrobat and Acrobat Reader. For a more thorough examination or a custom implementation, Xpdf can be considered. Xpdf is open source under the GPL. You can find many pre-compiled versions of this tool for different systems.
Multiple Format Applications
A particularly useful application for metadata analysis is ExifTool. ExifTool is written in Perl and as such is available for most operating systems, giving a consistent command line interface across each. ExifTool has support for a huge number of different file types (which are also expandable) and is an excellent tool for extracting metadata from common file types. Just like mdls, the output of ExifTool is extremely detailed, but unlike mdls, ExifTool can also be used on Windows and Linux as well as Mac OS X. The output of ExifTool on our Sample Document.docx file is shown in Listing 2.
Listing 3 shows the same output from ExifTool, but this time we have broken it down by metadata source. As you will see, some of the information displayed by ExifTool is file system metadata and not simply file metadata.

Listing 3. An example of metadata extracted using ExifTool broken down by category
chrissampson$ exiftool ~/Desktop/Sample\ Document.docx

ExifTool Information
ExifTool Version Number         : 9.31

File System Metadata
File Name                       : Sample Document.docx
Directory                       : /Users/chrissampson/Desktop
File Size                       : 22 kB
File Modification Date/Time     : 2013:06:09 11:35:06+01:00
File Access Date/Time           : 2013:06:09 12:13:07+01:00
File Inode Change Date/Time     : 2013:06:09 11:35:06+01:00
File Permissions                : rw-r--r--
File Type                       : DOCX
MIME Type                       : application/vnd.openxmlformats-officedocument.wordprocessingml.document

Parent zip container details
Zip Required Version            : 20
Zip Bit Flag                    : 0x0006
Zip Compression                 : Deflated
Zip Modify Date                 : 1980:01:01 00:00:00
Zip CRC                         : 0xb01051e9
Zip Compressed Size             : 397
Zip Uncompressed Size           : 1474
Zip File Name                   : [Content_Types].xml

Open XML Metadata Extracted from the Document
Preview Image                   : (Binary data 9500 bytes, use -b option to extract)
Title                           :
Subject                         :
Creator                         : Christopher Sampson
Keywords                        :
Description                     :
Last Modified By                : Christopher Sampson
Revision Number                 : 1
Create Date                     : 2013:06:09 10:34:00Z
Modify Date                     : 2013:06:09 10:35:00Z
Template                        : Normal.dotm
Total Edit Time                 : 1 minute
Pages                           : 1
Words                           : 4
Characters                      : 24
Application                     : Microsoft Macintosh Word
Doc Security                    : None
Lines                           : 1
Paragraphs                      : 1
Scale Crop                      : No
Company                         : TRC Data Recovery Ltd
Links Up To Date                : No
Characters With Spaces          : 27
Shared Doc                      : No
Hyperlinks Changed              : No
App Version                     : 14.0000

ExifTool should be used in conjunction with your own examination and validation techniques. We often use ExifTool at TRC Data Recovery when we are examining a new file type.
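When ExifTool output needs to be compared against values found by hand, it helps to load it into a structure that a script can query. The sketch below parses ExifTool's default "Tag : value" text layout; the function name and the shortened sample are assumptions for illustration. ExifTool can also emit JSON through its -j option, which is usually the more robust route when the tool can be re-run.

```python
def parse_exiftool_output(text):
    """Turn ExifTool's default 'Tag : value' text output into a dictionary."""
    tags = {}
    for line in text.splitlines():
        # Split on the first colon only: values such as dates contain colons.
        name, separator, value = line.partition(":")
        if not separator:
            continue  # not a tag line, for example a wrapped continuation
        tags[name.strip()] = value.strip()
    return tags


# A shortened sample taken from the ExifTool listing for Sample Document.docx.
sample_output = """\
File Name                       : Sample Document.docx
File Modification Date/Time     : 2013:06:09 11:35:06+01:00
Title                           :
Creator                         : Christopher Sampson
Create Date                     : 2013:06:09 10:34:00Z
"""

tags = parse_exiftool_output(sample_output)
print(tags["Creator"])
print(tags["Create Date"])
print(repr(tags["Title"]))
```

This simple parser keeps only the last value when a tag name repeats, and it would misread a wrapped continuation line that happens to contain a colon, so treat it as a convenience for quick checks rather than a complete reader.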

Figure 1. Document metadata as presented by Windows 7 within the Properties window

Figure 2. Document metadata as displayed within Microsoft Word

EXAMPLE: MANUALLY EXAMINING THE METADATA OF A DOCX FILE

Microsoft Office has a massive installed base. Most examinations of computers will turn up docx files, and many will also call for an interpretation of specific attributes and metadata of these files. For this article we are going to be looking at a sample docx file in two ways: first we will see what our operating system can tell us about the file, then we will dive into the file's content to see where this information came from and whether it is possible to extract anything more, or even differently interpreted data, from the file.
Perhaps the simplest way to find out information about our Sample Document.docx file is to use our Windows system. Simply right clicking on the file and selecting Properties from the contextual menu provides us with quite a bit of useful information. Microsoft Word does not need to be installed for this information to be displayed; however, if it is, we can get more details (Figure 1).
If Word is installed you can find a document's metadata through the File menu by selecting the Info tab. Towards the right hand side of the resulting tab you will find the file details and a toggle link that enables more or less information to be displayed (Figure 2).
This information is particularly important as it introduces us to the first example of ways to edit the file's metadata. In this case we can only set attributes that are not already set during the creation of the document, yet it does show that direct manipulation is possible.
Manual extraction of the content of an Office Open XML file relies upon an understanding of how the file format is structured. Luckily, the format and structure of the docx file is standardized and relatively straightforward. A simplistic way of looking at a docx file is to consider it as a folder. This folder, just like any other folder on your computer, has a hierarchy, and somewhere within that hierarchy is the data that we are looking for.
The docx folder structure is zipped to save space. The first step involved in delving into the contents of the docx, without resorting to specialist tools, is to change the file extension from docx to zip. Doing so will allow extraction of the file's content using any standard application that can extract a zip file. All of our operating systems can do this natively, but there are also many third party tools available that can carry out this task.
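The same step can be scripted without renaming anything, because a zip reader looks at the content of a file rather than its extension. The sketch below uses Python's standard zipfile module. The function names are illustrative, and the demonstration builds a tiny stand-in archive in memory rather than a real Word document; in practice you would pass the path to a working copy of the docx file.

```python
import io
import zipfile


def list_parts(docx_file):
    """Return the names of the parts stored inside a docx (ZIP) container."""
    with zipfile.ZipFile(docx_file) as container:
        return container.namelist()


def read_part(docx_file, part_name):
    """Return the raw bytes of one part, for example docProps/core.xml."""
    with zipfile.ZipFile(docx_file) as container:
        return container.read(part_name)


# Stand-in archive built in memory so that the sketch runs anywhere.
# A real docx contains many more parts than these three.
stand_in = io.BytesIO()
with zipfile.ZipFile(stand_in, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("docProps/core.xml", "<coreProperties/>")
    archive.writestr("docProps/app.xml", "<Properties/>")

parts = list_parts(stand_in)
core_xml = read_part(stand_in, "docProps/core.xml")
print(parts)
print(core_xml)
```

Reading parts in place like this avoids writing extracted files to disk, and working on a copy leaves the original evidence file and its file system timestamps untouched.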
Once extracted we can see how a docx file is really made up, providing that your file was not encrypted or damaged, you should be presented with
a structure that is very similar to the structure of
our Sample Document.docx, here is an ls output:
Listing 4.
Please note, the structure of other Microsoft Office files, such as those created with Powerpoint

UNDERSTANDING FILE METADATA HOW TO VIEW & INTERPRET DATA ABOUT DATA
and Excel, follows the exact same structure as
the docx file, but some of the internal structure is
slightly different. Why not also rename your xlsx
and pptx files to zip and see how they differ.
The internal contents of a docx file are based
upon many XML files (this is why the letter x was
appended to the original doc format file extension,
it also why the standard is known as Office Open
XML). XML files are not all that a word document
can contain, it is possible to have images and other
files that are available as individual items embedded within the file. These items can also be extracted from your renamed docx file. So, this article is all
about metadata, and having used tools like ExifTool
and mdls we already know that our sample file is full
of metadata, how do we find it? Well there are a few
different locations, but the most important metadata
exists within the ./docProps directory as below:
Listing 4. Recursive output of the ls command on our
unzipped docx file
chrissampson$ ls -R
[Content_Types].xml docProps
_rels word
./_rels:
./docProps:
app.xml core.xml thumbnail.jpeg
./word:
_rels settings.xml theme
document.xml styles.xml webSettings.xml
fontTable.xml stylesWithEffects.xml
./word/_rels:
document.xml.rels
./word/theme:
theme1.xml

./docProps:
app.xml core.xml thumbnail.jpeg

The core.xml and app.xml files contain, in XML format, the metadata that was extracted by ExifTool. The content of these files from our Sample Document.docx is reprinted below in Listing 5 and Listing 6.

Listing 5. Sample Document.docx app.xml content
<?xml version="1.0" encoding="UTF-8"?>
<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties" xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes">
<Template>Normal.dotm</Template>
<TotalTime>1</TotalTime>
<Pages>1</Pages>
<Words>4</Words>
<Characters>24</Characters>
<Application>Microsoft Macintosh Word</Application>
<DocSecurity>0</DocSecurity>
<Lines>1</Lines>
<Paragraphs>1</Paragraphs>
<ScaleCrop>false</ScaleCrop>
<Company>TRC Data Recovery Ltd</Company>
<LinksUpToDate>false</LinksUpToDate>
<CharactersWithSpaces>27</CharactersWithSpaces>
<SharedDoc>false</SharedDoc>
<HyperlinksChanged>false</HyperlinksChanged>
<AppVersion>14.0000</AppVersion>
</Properties>

Listing 6. Sample Document.docx core.xml content
<?xml version="1.0" encoding="UTF-8"?>
<cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dcmitype="http://purl.org/dc/dcmitype/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<dc:title />
<dc:subject />
<dc:creator>Christopher Sampson</dc:creator>
<cp:keywords />
<dc:description />
<cp:lastModifiedBy>Christopher Sampson</cp:lastModifiedBy>
<cp:revision>1</cp:revision>
<dcterms:created xsi:type="dcterms:W3CDTF">2013-06-09T10:34:00Z</dcterms:created>
<dcterms:modified xsi:type="dcterms:W3CDTF">2013-06-09T10:35:00Z</dcterms:modified>
</cp:coreProperties>

So now let's compare the information extracted manually with the metadata displayed by ExifTool (Table 2).
As we can see, there is no great mystery to determining the metadata for the docx file type. The same applies to most other file types: all that is required is a basic understanding of how the file is structured and what metadata can be contained within it. Once we have a clearer understanding of the inner workings of the file, we can compare the results of tools like ExifTool or mdls with what we can actually see within the file.
Note: for an even deeper understanding of metadata extraction, the source code for ExifTool and Unix file is freely available and may represent an excellent research opportunity for those wishing to learn more about metadata.
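
The manual comparison above can also be scripted. The sketch below is an illustration rather than a reimplementation of ExifTool: it assumes the two parts live at docProps/core.xml and docProps/app.xml, as in our sample, and the function name read_doc_props is hypothetical. It returns the element names from Listings 5 and 6 with their text values.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_doc_props(docx_source):
    """Return {element name: text} for docProps/core.xml and docProps/app.xml."""
    props = {}
    with zipfile.ZipFile(docx_source) as package:
        names = package.namelist()
        for part in ("docProps/core.xml", "docProps/app.xml"):
            if part not in names:
                continue  # either part may be missing from a package
            root = ET.fromstring(package.read(part))
            for element in root:
                # ElementTree reports tags as "{namespace}name"; keep the name only.
                local_name = element.tag.rsplit("}", 1)[-1]
                props[local_name] = (element.text or "").strip()
    return props


if __name__ == "__main__":
    # Stand-in package holding two of the values from Listing 6.
    core = (
        '<cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/'
        'package/2006/metadata/core-properties" '
        'xmlns:dc="http://purl.org/dc/elements/1.1/">'
        "<dc:creator>Christopher Sampson</dc:creator>"
        "<cp:revision>1</cp:revision></cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", core)
    for name, value in read_doc_props(buffer).items():
        print(f"{name}: {value}")
```

Values are returned exactly as stored, so a date appears as 2013-06-09T10:34:00Z rather than in ExifTool's 2013:06:09 10:34:00Z display format.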

MANIPULATING METADATA

We will not discuss methods and techniques for manipulating metadata in depth, as that is beyond the scope of this article. Below is a single example of a very simple method of manipulating our Sample Document.docx. For this example no specialist tools are required beyond a text editor (all of our systems include text editors).
We have already renamed our Sample Document.docx file to .zip in the example above. To follow this yourself, please do the same and extract the contents. We are going to modify just one of the metadata fields, changing the name of the person identified as the last modifier.
Using Table 2 we can see that the Last Modified By metadata is stored within the ./docProps/core.xml file, between the tags <cp:lastModifiedBy> and </cp:lastModifiedBy>. If we again examine the content of the core.xml file, we can see that the current value for this tag is Christopher Sampson (Listing 7).

Table 2. Comparison of ExifTool output and actual metadata location

ExifTool Output | Actual File Location | XML tags | Content
Preview Image: (Binary data 9500 bytes, use -b option to extract) | ./docProps/thumbnail.jpeg | N/A | 9,500 bytes JPEG image
Title: | ./docProps/core.xml | <dc:title> | N/A
Subject: | ./docProps/core.xml | <dc:subject> | N/A
Creator: Christopher Sampson | ./docProps/core.xml | <dc:creator> | Christopher Sampson
Keywords: | ./docProps/core.xml | <cp:keywords /> | N/A
Description: | ./docProps/core.xml | <dc:description /> | N/A
Last Modified By: Christopher Sampson | ./docProps/core.xml | <cp:lastModifiedBy> | Christopher Sampson
Revision Number: 1 | ./docProps/core.xml | <cp:revision> | 1
Create Date: 2013:06:09 10:34:00Z | ./docProps/core.xml | <dcterms:created xsi:type="dcterms:W3CDTF"> | 2013-06-09T10:34:00Z
Modify Date: 2013:06:09 10:35:00Z | ./docProps/core.xml | <dcterms:modified xsi:type="dcterms:W3CDTF"> | 2013-06-09T10:35:00Z
Template: Normal.dotm | ./docProps/app.xml | <Template> | Normal.dotm
Total Edit Time: 1 minute | ./docProps/app.xml | <TotalTime> | 1
Pages: 1 | ./docProps/app.xml | <Pages> | 1
Words: 4 | ./docProps/app.xml | <Words> | 4
Characters: 24 | ./docProps/app.xml | <Characters> | 24
Application: Microsoft Macintosh Word | ./docProps/app.xml | <Application> | Microsoft Macintosh Word
Doc Security: None | ./docProps/app.xml | <DocSecurity> | 0
Lines: 1 | ./docProps/app.xml | <Lines> | 1
Paragraphs: 1 | ./docProps/app.xml | <Paragraphs> | 1
Scale Crop: No | ./docProps/app.xml | <ScaleCrop> | false
Company: TRC Data Recovery Ltd | ./docProps/app.xml | <Company> | TRC Data Recovery Ltd
Links Up To Date: No | ./docProps/app.xml | <LinksUpToDate> | false
Characters With Spaces: 27 | ./docProps/app.xml | <CharactersWithSpaces> | 27
Shared Doc: No | ./docProps/app.xml | <SharedDoc> | false
Hyperlinks Changed: No | ./docProps/app.xml | <HyperlinksChanged> | false
App Version: 14.0000 | ./docProps/app.xml | <AppVersion> | 14.0000


Listing 7. The unmodified content of ./docProps/core.xml


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dcmitype="http://purl.org/dc/dcmitype/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<dc:title></dc:title><dc:subject></dc:subject><dc:creator>Christopher Sampson</dc:creator><cp:keywords></cp:keywords><dc:description></dc:description>
<cp:lastModifiedBy>Christopher Sampson</cp:lastModifiedBy>
<cp:revision>1</cp:revision><dcterms:created xsi:type="dcterms:W3CDTF">2013-06-09T10:34:00Z</dcterms:created><dcterms:modified xsi:type="dcterms:W3CDTF">2013-06-09T10:35:00Z</dcterms:modified>
</cp:coreProperties>

Listing 8. The modified content of ./docProps/core.xml


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dcmitype="http://purl.org/dc/dcmitype/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<dc:title></dc:title><dc:subject></dc:subject><dc:creator>Christopher Sampson</dc:creator><cp:keywords></cp:keywords><dc:description></dc:description>
<cp:lastModifiedBy>Somebody Else</cp:lastModifiedBy>
<cp:revision>1</cp:revision><dcterms:created xsi:type="dcterms:W3CDTF">2013-06-09T10:34:00Z</dcterms:created><dcterms:modified xsi:type="dcterms:W3CDTF">2013-06-09T10:35:00Z</dcterms:modified>
</cp:coreProperties>

Listing 9. Output of ExifTool after manual modification of a docx file
chrissampson$ exiftool ~/Desktop/Sample\ Document\ Modified.docx
ExifTool Version Number         : 9.31
File Name                       : Sample Document Modified.docx
Directory                       : /Users/chrissampson/Desktop
File Size                       : 14 kB
File Modification Date/Time     : 2013:06:10 12:18:17+01:00
File Access Date/Time           : 2013:06:10 12:21:54+01:00
File Inode Change Date/Time     : 2013:06:10 12:18:51+01:00
File Permissions                : rw-r--r--
File Type                       : DOCX
MIME Type                       : application/vnd.openxmlformats-officedocument.wordprocessingml.document
Zip Required Version            : 20
Zip Bit Flag                    : 0
Zip Compression                 : Deflated
Zip Modify Date                 : 1980:01:01 00:00:00
Zip CRC                         : 0xbc27c2c7
Zip Compressed Size             : 251
Zip Uncompressed Size           : 735
Zip File Name                   : _rels/.rels
Template                        : Normal.dotm
Total Edit Time                 : 1 minute
Pages                           : 1
Words                           : 4
Characters                      : 24
Application                     : Microsoft Macintosh Word
Doc Security                    : None
Lines                           : 1
Paragraphs                      : 1
Scale Crop                      : No
Company                         : TRC Data Recovery Ltd
Links Up To Date                : No
Characters With Spaces          : 27
Shared Doc                      : No
Hyperlinks Changed              : No
App Version                     : 14.0000
Title                           :
Subject                         :
Creator                         : Christopher Sampson
Keywords                        :
Description                     :
Last Modified By                : Somebody Else
Revision Number                 : 1
Create Date                     : 2013:06:09 10:34:00Z
Modify Date                     : 2013:06:09 10:35:00Z
Preview Image                   : (Binary data 9500 bytes, use -b option to extract)

Next we are going to use our text editor to change the field contents to something different. In this example I have chosen Somebody Else as the string to insert, so our modified file now looks like Listing 8.
Now we need to zip the modified contents back up into a standard zip file. Mac OS X users should note that a .DS_Store file will be created at least once within the zipped structure; Windows and Linux users will not see this behavior. There are Mac utilities that prevent this from happening, but we will not discuss those here.
Once our document has been zipped up again, we need to rename it to create our docx file. For this example the file was named Sample Document Modified.docx. Once that is complete, we can again examine the file using ExifTool (Listing 9).
There are other ways to modify these files, and even specialist tools designed purely for that task. However, we will not discuss these here as we have not investigated any of these tools.
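
For examiners who want to reproduce the experiment on their own test files, the extract, edit and re-zip steps can be collapsed into one script. This is a hedged sketch under stated assumptions, not a specialist tool: the function name set_last_modified_by is hypothetical, it assumes the value is stored as text between <cp:lastModifiedBy> tags exactly as in Listing 7, and it uses a simple text substitution rather than a full XML rewrite.

```python
import io
import re
import zipfile
from xml.sax.saxutils import escape

CORE_PART = "docProps/core.xml"
LAST_MODIFIED_BY = re.compile(
    r"(<cp:lastModifiedBy>).*?(</cp:lastModifiedBy>)", re.DOTALL
)


def set_last_modified_by(source, target, new_name):
    """Copy every part of the package to target, changing only lastModifiedBy."""
    with zipfile.ZipFile(source) as src, \
            zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == CORE_PART:
                text = data.decode("utf-8")
                text = LAST_MODIFIED_BY.sub(
                    lambda m: m.group(1) + escape(new_name) + m.group(2), text
                )
                data = text.encode("utf-8")
            # Reusing the original entry record keeps its name and stored date.
            dst.writestr(info, data)


if __name__ == "__main__":
    # Stand-in package; with real files, pass the two paths instead.
    original = io.BytesIO()
    with zipfile.ZipFile(original, "w") as package:
        package.writestr(
            CORE_PART,
            "<cp:lastModifiedBy>Christopher Sampson</cp:lastModifiedBy>",
        )
    modified = io.BytesIO()
    set_last_modified_by(original, modified, "Somebody Else")
    with zipfile.ZipFile(modified) as package:
        print(package.read(CORE_PART).decode("utf-8"))
```

Because only the parts already present in the source package are copied, no .DS_Store entry is introduced. Always work on a copy, never on original evidence.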
As you can see from this rather basic example, manipulation is relatively straightforward once you understand how metadata is recorded within a file. This also raises the question of how manipulated metadata can be identified.
Whilst a detailed discussion is beyond the scope of this article, we will make a few observations that may help you to determine your own process for ascertaining whether the metadata has changed from that originally written to the file:
• Inconsistency: our document shows revision number 1 and was created by Christopher Sampson, yet the Last Modified By attribute shows that Somebody Else was the last to make a change. This should not normally be possible for a file that is at revision 1.
• File system dates: in most circumstances the file system Modified date remains in sync with the Modify Date from the file's metadata. There are a number of exceptions, such as changes made to file system metadata when a file is moved to a new location, so this cannot be relied upon on its own.
• Cached metadata and versioning: many modern operating systems cache supported metadata and some offer file versioning. This may represent a very useful avenue of pursuit when trying to ascertain whether or not file metadata has been manipulated.
There are many other routes to explore and these will vary depending upon the operating system, file type and various other external factors.
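
The first two observations lend themselves to a simple automated check. The sketch below is illustrative only: the function name consistency_flags and the two-minute tolerance are assumptions made for this example, and a flag is a prompt for further examination, not proof of manipulation. The dictionary keys are the element names from Listing 6.

```python
from datetime import datetime, timedelta, timezone


def consistency_flags(props, fs_modified=None, tolerance_seconds=120):
    """Return a list of observations that deserve a closer look."""
    flags = []
    # Observation 1: at revision 1 the creator is normally the last modifier.
    if (props.get("revision") == "1"
            and props.get("creator") != props.get("lastModifiedBy")):
        flags.append("revision 1 but creator differs from last modifier")
    # Observation 2: embedded Modify Date versus file system Modified date.
    if fs_modified is not None and props.get("modified"):
        embedded = datetime.strptime(
            props["modified"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        gap = abs((fs_modified - embedded).total_seconds())
        if gap > tolerance_seconds:
            flags.append(
                f"file system modified time differs from embedded "
                f"Modify Date by {gap:.0f} s"
            )
    return flags


if __name__ == "__main__":
    # Values taken from Listing 9 (the manually modified document).
    props = {
        "creator": "Christopher Sampson",
        "lastModifiedBy": "Somebody Else",
        "revision": "1",
        "modified": "2013-06-09T10:35:00Z",
    }
    fs_modified = datetime(2013, 6, 10, 12, 18, 17,
                           tzinfo=timezone(timedelta(hours=1)))
    for flag in consistency_flags(props, fs_modified):
        print(flag)
```

Both checks fire for the modified sample. As noted above, the file system comparison has legitimate exceptions, so the tolerance and the weight given to each flag should be set by the examiner.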

SUMMARY

This document presents the reader with a very brief and non-specific overview of some of the types of metadata that are available. The aim of the article is to present the reader with an introduction to ideas and concepts for viewing, understanding and getting the most out of file metadata. The article is written as a guide only, with the intent of promoting the benefits of understanding and accessing this information without reliance on any one specific operating system or software application. By understanding the way in which binary files are created, updated and used, the examiner is empowered to spot inconsistencies and reveal evidence that was not immediately available, for a far wider variety of file types than the samples offered within this article.
We also aim to impress upon the reader that metadata, as with all data that resides upon a storage device or within a file, can be modified and spoofed by those with knowledge of the inner workings of the file type. In fact, this is so easy to carry out in many cases that verification and validation of any and all extracted metadata should always be carried out.

ON THE WEB

• https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man1/mdls.1.html - Mac OS X developer man page for the mdls tool
• http://www.sno.phy.queensu.ca/~phil/exiftool/ - ExifTool, a utility to extract, view and manipulate the metadata for many different file types
• http://unixhelp.ed.ac.uk/CGI/man-cgi?file - Unix file man pages
• http://www.exifviewer.org - ExifViewer.org, free online image metadata viewer
• http://www.exiv2.org - Exiv2 library
• http://www.foolabs.com/xpdf/ - XPDF, GPL open source utility for extraction of metadata from PDF documents; the site contains a number of pre-compiled binaries for different systems as well as third party links to other pre-compiled binaries
• http://trcdatarecovery.com - Author's website

About The Author

Chris Sampson is director of UK-based data recovery company TRC Data Recovery Ltd. Chris has worked within the area of data recovery for over 10 years, producing tools and techniques for the recovery of lost information from all manner of different operating systems and file types. TRC Data Recovery Ltd primarily provides data recovery services but also produces software tools for Microsoft Windows and Apple Mac OS X to aid in data recovery and related matters. Chris is actively involved in research and development projects based upon the indexing of file types for the purpose of examination, repair and retrieval of these items from deleted and otherwise missing states. Current projects include research into the recovery of fragmented multimedia and document files where no file system information relevant to the file's location or fragmented status exists.


DEMYSTIFYING
METADATA
by Mark Garnett

Metadata are those often quoted, but sometimes misunderstood, attributes of a file that can provide the sought-after breakthrough in determining what happened, and when, on a computer system with respect to particular documents. They are of paramount importance in investigations involving the theft of intellectual property, electronic discovery, fraud and misconduct, and patent disputes. There are many varying definitions of what is considered to be metadata, but generally metadata can be considered data about data.

The subject of metadata is applicable to many thousands of different file types and a myriad of operating systems and file systems, and could quite easily fill the pages of an encyclopaedic book. This article will concentrate on one of the most common metadata analyses, that of documents created using Microsoft Office applications on the Windows platform, arguably the most common configuration used in business environments today. Whilst metadata may seem obvious, expert eyes are needed to interpret it correctly. There are many tools available to extract and analyze metadata, ranging from expensive to low-cost or free.

SOURCES OF METADATA

As outlined above, the most common file types from which forensic practitioners are required to recover and interpret metadata are documents created with the Microsoft Office suite of applications, specifically Word, Excel and PowerPoint. The metadata contained within these files can help to determine important aspects of the nature of a document, such as:
• When the document was created, last saved and printed;
• Whether a document is an original document, or whether it was instead created from a parent or master;
• Whether a document was created from intellectual property belonging to another party;
• The user names of the original document creator and the person who last saved the document; and
• Whether a document has been edited for a substantial period of time, or whether it has been created by cutting and pasting the contents of one document into another new document.
The metadata for Office files is embedded within the file and, as a result, travels with the document from place to place. This makes it a rich source of information, as it is not as volatile as other file attributes such as the file dates and times maintained by the operating system (i.e. created, last written and last accessed dates).

INTERPRETING METADATA

It is one thing to find metadata; it is another matter entirely to report accurately and impartially on its meaning. As experts, we are commonly called upon to undertake an analysis for the purposes of court litigation, and it is extremely important that we see metadata for what it is, not for what our clients want it to be. I have therefore outlined below some commonly sought-after metadata attributes, along with their meaning and common issues associated with their interpretation:

Content Created Date

This metadata attribute is available for all documents created with Office applications and is simply the date that a particular document was created. Care should be taken when interpreting this date, as it may prove misleading if not fully explained. For example, if a Word document was created in 2010 and then saved using the Save As feature in 2012, the newly saved document will rightly have a created date in 2012, regardless of where the document was saved (i.e. over the old document or to a different location).
Whilst the new document's created date is technically accurate, it does not paint a true picture of when the document containing the original relevant information was actually created. A client presented with this information may form the incorrect conclusion that the document was created after the fact in 2012 rather than contemporaneously in 2010. This information can also be important when determining whether a user has provided you with the actual document sought, or a copy of the original document created using the Save As command. There are examples of persons admitting to the theft of IP material and subsequently agreeing to return the material, only to have returned copies of the original documents rather than the documents themselves.

Revision Number

This is an often misinterpreted attribute and is the number of times a document has been opened, changed and then saved. This also includes those circumstances where a user opens a document, makes a change, deletes the change to restore the document to its original form and then saves the document. The value of this field, along with other attributes, can also help to determine whether the document is an original or a copy of an original. For example, if a user creates a copy of a document using the Save As command, the value of the revision number is reset to 2 and the total edit time (outlined below) is reset to zero. This can be very important in matters involving the theft of intellectual property, in determining whether the person responsible is returning the actual documents stolen or simply copies of those documents.
Instances have occurred where litigants have produced electronic documents along with detailed explanations of how they were created, but the information provided by the revision number attribute does not support those explanations. For example, litigants have described a detailed process for the quality review of complex documents that, upon inspection, have a revision number of only one or two, which does not support such a complex review process.

Total Edit Time

Whilst on the surface this attribute seems self-explanatory, it does not mean the amount of time a user has actually been actively editing a document, but only the amount of time a document has been open and edited, regardless of how long the actual edits took. For example, if a user edits a document by changing one letter, leaves the document open and closes it 8 hours later, then the total edit time of the document will be 480 minutes. The effect would be exactly the same if a user actively edited a document for the entire 480 minutes. As a result, it is impossible to determine how much editing was performed on a document, only that editing was performed. This attribute, along with the revision number, is important in determining whether a document has been created by cutting and pasting content from one document into another. Such a situation would result in a low revision number and a very short total edit time.

Date Last Saved

Put simply, this is the date and time that a user last saved the document. It is important to note that this date is not updated if a user simply opens a document, makes no changes and then saves the document. This date, quite logically, is updated when a user creates a new copy of a document using the Save As command, as well as when a user makes changes and saves the document using the Save command.

Date Last Printed

This attribute is self-explanatory and is the date a document was last printed. Practitioners should be aware that this date is not always updated if a user simply prints the document to a PDF. For example, printing using Adobe Acrobat does not cause this date to update, whereas printing using other software printers, for example LEADTOOLS [1], does.
This date can also be a cause of concern in those circumstances where the date last printed precedes the date that a document was created. This situation occurs when a user saves a document using the Save As command. Using Save As causes almost all metadata attributes to be reset in the newly created document, whereas the date last printed is carried across from the old document to the new document.
For example, an author may create a document in 2010, print it in 2010 and then later save a copy using the Save As command in 2012. Situations such as this have occurred where an author has asserted that a document did not exist until the later date, whereas a forensic expert has refuted that assertion by demonstrating that the document must have existed, in some form, at an earlier time in order to explain the last printed date and time.
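
For Office Open XML documents this comparison is easy to automate, because the created and last printed dates are both stored in docProps/core.xml (in the dcterms:created and cp:lastPrinted elements). The following is a minimal sketch; the function name and the sample dates are hypothetical, and the values are assumed to be in the W3CDTF form used by Office, for example 2010-05-04T08:00:00Z.

```python
from datetime import datetime

W3CDTF = "%Y-%m-%dT%H:%M:%SZ"


def printed_before_created(created, last_printed):
    """True when the last printed date precedes the content created date."""
    return datetime.strptime(last_printed, W3CDTF) < datetime.strptime(created, W3CDTF)


if __name__ == "__main__":
    # Hypothetical values for a copy made with Save As in 2012
    # from a document that was printed in 2010.
    created = "2012-03-01T09:00:00Z"
    last_printed = "2010-05-04T08:00:00Z"
    if printed_before_created(created, last_printed):
        print("Printed before created: consistent with a Save As copy "
              "of an earlier document")
```

A positive result does not identify the earlier document; it only shows that one must have existed by the last printed date.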

Creator and Last Modified By

These fields contain the name of the user logged onto the computer under whose name Microsoft Office was installed. This can be extremely relevant in shared office environments, as well as in those instances where documents are edited from master documents which may have been created by a PA or office assistant.
I note that older versions of Office (i.e. Office 200 and older) store the names of the last ten authors; newer versions do not record this information by default.
All of these attributes can also help to establish a timeline for a document, such as the earliest time at which it could have been created. For example, if an examiner has several copies of a document, all containing the same substantive content but with different created, last saved and last printed dates, then a knowledge of how these fields work helps to establish a document timeline. Depending on how many copies are in existence, an examiner can determine the earliest point in time that the document must have been in existence and the order in which the copies were created.
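
The timeline reasoning above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function name build_timeline, the copy names and all dates are hypothetical, and the timestamps are assumed to be UTC strings in the same W3CDTF form, which sort chronologically when sorted as text.

```python
def build_timeline(copies):
    """Return (timestamp, copy name, field) tuples in chronological order.

    copies maps a copy name to its created, modified and lastPrinted values;
    a missing value is given as None and skipped.
    """
    return sorted(
        (stamp, name, field)
        for name, fields in copies.items()
        for field, stamp in fields.items()
        if stamp
    )


if __name__ == "__main__":
    # Hypothetical copies of the same document.
    copies = {
        "copy_a.docx": {
            "created": "2012-03-01T09:00:00Z",
            "modified": "2012-03-02T09:00:00Z",
            "lastPrinted": "2010-05-04T08:00:00Z",
        },
        "copy_b.docx": {
            "created": "2010-05-03T10:00:00Z",
            "modified": "2010-05-04T07:30:00Z",
            "lastPrinted": None,
        },
    }
    timeline = build_timeline(copies)
    for stamp, name, field in timeline:
        print(stamp, name, field)
    print("Earliest evidence of existence:", timeline[0])
```

The first entry gives the earliest point at which some form of the document must have existed; how each date came to be set still has to be interpreted using the rules described in the preceding sections.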

METADATA ANALYSIS

Metadata can be analysed using commonly available forensic applications such as EnCase (Guidance Software) [2] and FTK (AccessData) [3]. Whilst these applications do an admirable job of enabling a practitioner to analyse metadata, not every practitioner has access to such tools. The good news, however, is that metadata analysis can be undertaken with something as simple as a hex editor, and those practitioners with software development skills can write their own simple metadata reader. The reliance on commercial tools is of lesser importance when we consider the Office Open XML format [4], which is the most common format encountered by practitioners today with respect to Microsoft Office documents.
Some users are unaware that a DOCX, XLSX or PPTX file is, in actual fact, a compressed archive file. These files can be opened using an application such as WinZIP. To confirm this, on a computer on which WinZIP (or 7zip, WinRAR etc.) is installed, rename the DOCX or similar file to a ZIP file and double-click on the file. Your archive application will be invoked and you will see a series of folders inside the archive named customXml (may not be present), docProps, word (or xl or ppt) and _rels. These folders contain a raft of additional XML files; however, those of most relevance to an examiner are contained within the docProps folder and are named app.xml and core.xml. These are the XML files that contain the metadata fields and associated attributes for each document. The file core.xml contains metadata attributes such as document created and last saved dates and times, revision numbers and author information. The file app.xml contains those metadata attributes more applicable to the type of document, such as the number of words, paragraphs and pages.
This analysis methodology is not applicable to documents saved in the older Microsoft Office formats such as DOC, XLS and PPT, which are proprietary Microsoft formats. For these documents, an examiner will need to resort to a commercial application, such as EnCase, or develop their own application.
Using a programming language such as C++ or Object Pascal (i.e. Delphi), access to document metadata can be facilitated through the Windows API function StgOpenStorage [5] and the IPropertySetStorage interface [6]. When reading metadata from an Office Open XML document, it is a simple case of reading the metadata contained within the XML files inside the archive. Reading metadata using languages such as C#, Visual Basic or other .NET languages is performed via the PackageProperties class from within .NET [7] or a purpose-built SDK such as the Open XML SDK for Office [8]. The advantage of developing an application is that an examiner is able to tailor the results and workflow to suit their own circumstances, rather than having to alter their workflow to accommodate a commercial application.

CONCLUSION

Whilst the names and values contained within document metadata attributes appear, prima facie, to be self-explanatory, it is clear that there is generally a history (or story) behind each attribute that provides information as to the life of a document. Is it the original document or a copy? When was the earliest time that a document could have been created? How long did it take to create? These are all questions that can be answered by looking beyond the metadata value itself. An examiner wields considerable power when presenting metadata results to clients, for without considerable explanation it is easy to see how a client can misinterpret a metadata value and, as a result, draw an incorrect conclusion with respect to a document.
The examination and interpretation of metadata, and the subsequent presentation of results, is one of the most common tasks most practitioners perform. Accurate interpretation, along with a thorough knowledge of those circumstances that cause metadata values to change, is of paramount importance when presenting this information. There are commercial tools that allow for the relatively easy analysis of document metadata; however, there are low-cost alternatives, such as hex editors, and the means also exist for an examiner to develop his or her own tool tailored to their specific needs.

REFERENCES

[1] http://www.leadtools.com/
[2] http://www.guidancesoftware.com/
[3] http://www.accessdata.com/
[4] http://msdn.microsoft.com/en-us/library/office/aa338205%28v=office.12%29.aspx
[5] http://msdn.microsoft.com/en-us/library/windows/desktop/aa380341%28v=vs.85%29.aspx
[6] http://msdn.microsoft.com/en-us/library/windows/desktop/aa379840%28v=vs.85%29.aspx
[7] http://msdn.microsoft.com/EN-US/library/ms571919
[8] http://msdn.microsoft.com/en-us/library/office/bb448854.aspx

ABOUT THE AUTHOR

Mark is a Partner at McGrathNicol in Sydney, Australia and is the national leader of the McGrathNicol forensic technology practice. Mark spent 15 years as a detective in the Queensland Police Service before moving into the corporate world, where he has spent the last 12 years performing forensic technology investigations in the Asia Pacific region. He has regularly provided evidence in legal proceedings with respect to metadata analysis and has been involved in some of Australia's largest litigations as a forensic technology expert.


EXTRACTING AND
USING METADATA FOR
A DIGITAL FORENSIC
INVESTIGATION
A STEP-BY-STEP PROCESS
by Marc Bleicher

Metadata can often contain that needle in the haystack you're looking for during a forensics investigation; in fact, it has helped me out quite a few times in the past. The case that stands out the most was an internal investigation I did for the company I was working for at the time. Most of the cases I dealt with in this role related to employee misconduct, which included wrongful use, inappropriate behavior, harassment, etc. In this situation, metadata was the key piece of evidence in the case of a lost smart phone.
What you will learn:
• How to extract metadata from image and PE files
• What type of information is contained within the metadata of JPEG and PE files
• Tools and techniques to extract the metadata
• How to apply this to a forensic investigation

What you should know:
• Familiarity with basic forensic analysis
• An understanding of basic file types (JPEG and .exe)

n employee sat at a table in


the cafeteria and found a mobile phone. He turned it on
to try and determine to whom it belonged. Upon powering the phone
on he saw a rather unsavory photo,
which appeared to be taken from
the ground up looking under a womans skirt. The person who found the
phone then started to look at the rest
of the photos, discovering 13 additional inappropriate images. The finders quest to discover whose phone
this was ended right there. He gave
me the device and explained the situation. I took custody of the phone
and determined who it belonged to
by matching the serial number to

the owner. I then contacted our legal


team because it turned out the device was company property. The legal team advised me to investigate
the matter since it fell under my jurisdiction.
My first step was to visit the person to whom the device was assigned. I told him what was going on and gave him a chance to explain. He said he had lost the phone a week prior and he assured me he had not taken the images. So I headed to my office to start the analysis. First, I took a forensic image of the device. Then I extracted all the inappropriate images and began to look at the metadata for each picture. What I was most interested in was the date, time, and geotag metadata. Fortunately for me, privacy mode was off and location services were turned on when the images were taken. I recorded the metadata and then ran the geotag coordinates through Google Maps. The coordinates for each of the 14 images pointed to the very office building we were in. Through a great deal of additional research and analysis I was then able to figure out that 11 of the images were taken on various elevators throughout the building. Fortunately, we had surveillance cameras in each of our elevators, so using a combination of surveillance footage and the metadata from the pictures I was able to place this individual in the elevator, proving he did in fact take the photos. I matched the location, date and time of the images to the video footage date and time and, of course, his image in the footage. For the other three pictures I was able to use the metadata from the phone and correlate that with RFID logs that tracked employee movement in the building based on their ID badges.
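The matching of image timestamps against badge or camera logs can be sketched in a few lines of Python. This is only an illustration of the idea: the record layout, the badge identifiers, the sample dates and the five-minute window are all assumptions, not details from the case.

```python
from datetime import datetime, timedelta

def correlate(photo_times, badge_events, window=timedelta(minutes=5)):
    """Pair each photo timestamp with the badge events that fall within `window`.

    photo_times  : datetimes taken from the image metadata
    badge_events : (datetime, badge_id, location) tuples (hypothetical layout)
    """
    matches = {}
    for taken in photo_times:
        matches[taken] = [
            (when, badge, place)
            for when, badge, place in badge_events
            if abs(when - taken) <= window
        ]
    return matches

# Made-up sample data for illustration only.
photos = [datetime(2013, 5, 6, 9, 14, 30)]
events = [
    (datetime(2013, 5, 6, 9, 12, 0), "B-1001", "Elevator 3"),
    (datetime(2013, 5, 6, 11, 45, 0), "B-2002", "Lobby"),
]
result = correlate(photos, events)
```

A match like this is only as good as the clocks behind it, so any offset between the phone, the camera system and the badge readers has to be established first.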

INTRODUCTION

You often hear metadata described as data about data. Metadata can also be described as the content that makes up a file; it is, in essence, electronic DNA that isn't openly visible to the average user, just as physical DNA, such as chromosomes, is not openly visible to the human eye. There are specialized tools and techniques to analyze physical DNA, and experts who can interpret those results. The same goes for forensic analysis and metadata (eDNA). From a digital forensics standpoint, metadata is extremely important because it constitutes evidence, as I explained above. Metadata analysis is really the same as digital forensics analysis and involves the identification, extraction, analysis and interpretation of that data. In this article, we will focus on the metadata for two types of files: image files (JPEG) and Portable Executable (PE) files.

BODY

For every type of file that exists there is metadata that goes along with that file. Metadata can come from several different sources depending on the file type. Certain types of image metadata are generated only by a camera, whether it's a standalone digital camera or the camera on a smartphone. Other metadata is generated by the application that created the file; for example, Microsoft Word documents, Adobe PDFs and PE files carry metadata added by the software that produced them, some of which is not present elsewhere. Other sources include user-added metadata and metadata attached by Web browsers and the protocols used to upload a file to a website or email it across the Internet. In this article we will focus on JPEG image files and PE files.
www.eForensicsMag.com

File types we will examine and some of the metadata they contain

Image files
  Date/time image was taken
  Make and model of camera or smartphone
  Software/OS version
  Latitude/longitude (GPS Info)
Portable Executable (PE) files
  Digital certificate signer
  Date/time of compilation
  Version
  Type of compiler used
  The type of packer used for files that are packed (compressed or encrypted)

STEPS IN METADATA ANALYSIS

Following are step-by-step instructions on the tools and techniques used to extract and analyze metadata from JPEG and PE files.

ANALYZING IMAGE FILE METADATA

Of all the image file types, JPEG files contain the most metadata, especially for images created using a digital camera. JPEG metadata includes, but is not limited to, the make and model of the camera, aperture information, time and date the image was taken, and geotag information. Other common types of image files contain less metadata than JPEGs, unless a file was converted to a JPEG. Almost all digital cameras and smartphones save JPEG files with EXIF (Exchangeable Image File Format) data. EXIF data exists in JPEG, TIFF and RAW images. Within the EXIF data is the information we are most interested in for metadata analysis.

STEPS TO EXTRACT METADATA FROM A JPEG FILE

Extraction is the first step to obtain metadata from a JPEG. There are several open source and commercial extraction tools, but generally the open source tools are all you need since they provide the most relevant information for an investigation. The tool we will use for this procedure is EXIF Viewer, which can be downloaded as a standalone tool or installed as a plug-in for Firefox and Chrome.
I will use the standalone version, since the browser plug-in directly accesses information on your hard drive, and personal or corporate firewall settings sometimes prevent the EXIF Viewer plug-in from accessing remote URLs.
NOTE: Not all JPEG images will contain EXIF data, for one reason or another: the user deliberately removed it, or there's no geolocation information because the user may have disabled location services or GPS on their device.
You can download EXIF Viewer here.

Locate the JPEG file you're analyzing (for this demonstration I chose a photo from my iPhone that I took in Malibu, Calif.).
This is the GUI you see when opening EXIF Viewer.
Click on File > Open and browse to the image you want to analyze.
EXIF Viewer extracts the metadata from the image and displays it.
If this were a case where we needed to correlate a location and time with a particular incident or some type of activity related to a crime, the next thing to do is note the GPS Info tag data and go to Google Maps.
Enter the GPS coordinates in the search box and click Search.
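EXIF stores GPS positions as degrees, minutes and seconds plus a hemisphere reference, and some viewers report them in that form. If a decimal pair is easier to paste into a map search, the conversion is simple arithmetic. The sketch below assumes the values have already been read out of the viewer; the sample numbers are made up and are not the coordinates of the photo used in this article.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees."""
    value = degrees + minutes / 60.0 + seconds / 3600.0
    # South latitudes and West longitudes are negative in decimal notation.
    return -value if ref.upper() in ("S", "W") else value

# Made-up sample values for illustration.
lat = dms_to_decimal(34, 1, 30.0, "N")
lon = dms_to_decimal(118, 46, 48.0, "W")
query = f"{lat:.6f},{lon:.6f}"   # "latitude,longitude" string for a map search box
```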

We now have the exact address where this photo was taken.
NOTE: It's important to remember that EXIF metadata can be modified, especially the GPS information. Just as time stomping is an issue in hard drive forensics and intrusion cases, the same goes for the metadata of an image file.

ANALYZING PORTABLE EXECUTABLE (PE) FILE METADATA

The Portable Executable (PE) file format is a data structure that contains all the data the Windows operating system loader needs to manage the executable code. In addition, every PE file starts with a header that describes the code in the PE file, any DLL files it requires, the type of application it is (GUI or command line), and much more.
So why would we want to analyze the metadata of a PE file? The main reason is malware analysis. If we come across a suspicious file during hard drive forensics, but we are not sure whether the file is malicious, we can look at some of its metadata to help determine whether it's dangerous. Extracting and analyzing that metadata with a few tools and techniques often gives a quick indication of whether a PE file is malicious.
This is basic static analysis, and it is a great way to help confirm whether a file is malicious. It can also reveal additional information about the file's functionality, and even provide information that will assist you in creating network signatures. Part of static analysis is looking at the metadata of the PE file.

Steps to extract and analyze PE file metadata

Tools Used:
PEiD
sigcheck.exe
http://regex.info/exif.cgi

Scenario
I took a very nasty piece of malware, a Remote Access Tool (RAT), which I identified during an investigation. This file was not originally packed, so I used UPX, a popular executable packer that malware authors often use, to pack the file myself. I then used the three tools listed above to extract various pieces of metadata that would be helpful during an intrusion investigation.

PEiD

Launch PEiD. When it opens you will see the following window. In the top field where it says File: click on the browse icon with 3 dots (...) and browse to the file you want to analyze (Figure 1).
After you open the file you want to analyze, PEiD will automatically examine it and present you with the results in the window displayed below. The output shows that this file is packed using UPX v. 0.89.6. I deliberately blacked out the name of the malware for confidentiality (Figure 2).

Figure 1. View of the PEiD Console

Figure 2. PEiD View of analyzed File

The next procedure uses the tool sigcheck.exe from Microsoft Sysinternals. This tool will show you the publisher of the file, version information, hashes and more. Sigcheck.exe is a command line tool. For this procedure I used a tool called mftdump.exe, which is a command line tool that extracts NTFS volume metadata.
Run the command sigcheck.exe <filename.exe>. An example is shown in Figure 3.

Figure 3. Output from sigcheck.exe
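Some of the same header metadata can be read without a GUI tool. The following is a minimal sketch, not a replacement for PEiD or sigcheck.exe: it reads the compile timestamp and section names straight from the PE header and computes two hashes. The function name `pe_metadata` and the synthetic sample bytes are illustrative assumptions. The check for section names beginning with "UPX" is only a heuristic, since section names are easily changed, and the compile timestamp can be forged just like any other metadata.

```python
import hashlib
import struct
from datetime import datetime, timezone

def pe_metadata(data):
    """Return compile time, section names and hashes for the bytes of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # Offset 0x3C of the DOS header holds the file offset of the PE signature.
    pe_off = struct.unpack_from("<I", data, 0x3C)[0]
    if data[pe_off:pe_off + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # COFF header (20 bytes): Machine, NumberOfSections, TimeDateStamp,
    # PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics.
    (_machine, n_sections, timestamp, _symptr, _nsyms,
     opt_size, _chars) = struct.unpack_from("<HHIIIHH", data, pe_off + 4)
    # The section table follows the optional header; each entry is 40 bytes
    # and starts with an 8-byte, NUL-padded name.
    table = pe_off + 4 + 20 + opt_size
    names = []
    for i in range(n_sections):
        raw = data[table + 40 * i: table + 40 * i + 8]
        names.append(raw.rstrip(b"\x00").decode("ascii", errors="replace"))
    return {
        "compiled": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "sections": names,
        "upx_like": any(name.startswith("UPX") for name in names),
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }

# Synthetic stand-in for a real file: a bare header with two UPX-style section names.
dos_header = bytearray(64)
dos_header[:2] = b"MZ"
struct.pack_into("<I", dos_header, 0x3C, 64)   # PE signature sits at offset 64
coff = struct.pack("<HHIIIHH", 0x14C, 2, 1375315200, 0, 0, 0, 0x0102)
sections = b"".join(n.ljust(8, b"\x00") + bytes(32) for n in (b"UPX0", b"UPX1"))
sample = bytes(dos_header) + b"PE\x00\x00" + coff + sections
info = pe_metadata(sample)
```

For a real file, read it in binary mode from a working copy of the evidence and pass the bytes to the function.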



JEFFREY'S EXIF VIEWER

Figure 4. Jeffrey's Exif Viewer tool

The next tool we will use is called Jeffrey's Exif Viewer. This is an online tool where you can enter the URL of an image file whose metadata you want to extract. It is an excellent tool because you are not limited to image files: it handles many other file types, and the home page of the tool lists the file types it works with. The tool is hosted at regex.info/exif.cgi
For this scenario, I'm going to once again analyze the malware from above to see what additional metadata it contains. Below is the output after I clicked on View Image From File.

CONCLUSION

Metadata analysis is an important part of any forensic investigation. This article only scratches the surface of the different types of files and metadata that exist. There is no single technique or tool to use when conducting metadata analysis. How you proceed depends on what data you're after and the most efficient tool and process to obtain it. There is also quite a bit of useful metadata in other file types, including Microsoft Office documents, PDFs, markup language files such as HTML and XML, and email headers. For now, I hope I've helped you learn the basics so you will be able to successfully extract data in your next investigation.

Figure 5. Output of Jeffrey's Exif Viewer

ABOUT THE AUTHOR

Marc Bleicher, a Sr. IR Consultant with Bit9, has over a decade of experience in Cyber Security with an extensive background in digital forensics, malware analysis, incident response, and penetration testing. Before joining Bit9 he worked for one of the Big-4 Management Consulting firms and various defense contractors as an IR and Forensics team lead. Marc frequently presents at InfoSec conferences, is published in SC Magazine, and is a presenter on the website ultimatewindowssecurity.com. Marc received his master's in Computer Science and Information Security from Boston University in 2010.

METADATA: VIEWING
THE TREES IN SPITE OF
THE FOREST
by Robert Reed

Founder Trident Information Security and Investigations LLC.

With recent events in the news there is an increased interest in metadata and how it may be used. What is metadata and what can it tell us? Forensics examiners have known for some time now about metadata and have probably used it to assist in investigations. Metadata can be used for a great many tasks, from file attribution and intelligence gathering to revealing manipulation of time and date stamps. The manner in which metadata can be used is really a matter of the approach and creativity of the examiner. To get a better hold on what metadata is, a definition is needed.

In its broadest and simplest definition, metadata is data about data. What exactly does that mean? Well, like most things, that depends on your perspective. There are two potential avenues to gather metadata. First, there are those things that are external to the file, things like file system data, MAC times, ACLs, ownership information, and transactional logs. This type of metadata is clearly outside the file's content. The second avenue to gather metadata is internal to the file. This type of metadata is stored inside the file as a function of file or file format standards. Most people will immediately think of Microsoft Office files or EXIF and Geo-Tagging information in JPEG images. There are many file formats that have standards for storing data about the file inside the file itself.
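As a small illustration of the external kind of metadata, the file system's MAC times can be read with a few lines of Python. This is a sketch only; the function name `mac_times` is an assumption, and in practice it would be run against a mounted working copy of the evidence rather than the original media.

```python
import os
from datetime import datetime, timezone

def mac_times(path):
    """Return the modified, accessed and changed/created times of a path, in UTC."""
    st = os.stat(path)

    def to_utc(ts):
        return datetime.fromtimestamp(ts, tz=timezone.utc)

    return {
        "modified": to_utc(st.st_mtime),
        "accessed": to_utc(st.st_atime),
        # st_ctime is the metadata-change time on Unix; on Windows it has
        # traditionally reported the creation time.
        "changed_or_created": to_utc(st.st_ctime),
    }
```

These values come from the file system, not from the file itself, so they change when a file is copied to new media while any internal metadata travels with the content.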
Some advocate that metadata is any information about a file to the exclusion of the file's content. In light of recent revelations about the NSA Prism program, an interesting problem presents itself. Where does content start and stop in files? In a JPEG image, is the content only the visual depiction of what we as the viewer see? Are the Geo-Tags, EXIF, and other metadata tags fair game? Does it matter if the data is stored on a public or private cloud? These are questions that are sure to be answered over time, and are part of the challenge of working in a constantly evolving field.
This article will look at some of the most common types of metadata, how to collect or observe them, and how to use the data to assist in investigations. Keep in mind that each investigation is unique and that the examiner may have legal constraints which may, or may not, allow access to certain information. The creative approach of each examiner will determine if and how any metadata may be collected and utilized in the investigative process.

INTRODUCTION

First, let's look at some common external metadata and how it may be of assistance in investigations. For the purposes of this section, we will limit this discussion to log files, subscriber information, and in the mobile arena, call data records. These records are metadata in the sense that they are not files that directly contain the content, but describe how that content was transferred between disparate systems, thus meeting the loose definition of metadata: data about data.

Log Files

Log files can take many different forms. When configured appropriately, they are created whenever one system somehow communicates with another system. Things like event logs, firewalls, proxies, and intrusion detection and prevention systems (IDS/IPS) all catalog information about the interactions of systems. These logs are of particular interest when looking at intrusion and similar hacking style cases. They are also often critical in incidents involving intellectual property (IP) and similar theft style cases. In these types of cases the exfiltration, or transfer of property and knowledge, is typically discovered or proven with the assistance of log files. In the case of intrusions, hacks, and other similar compromises, log files and related metadata are crucial since they may be the only evidence remaining of the events.
Some will argue that there would almost certainly be some files or tools left on the system for examination. This very much depends on the motives and methods of the intruder. Is the intruder setting up a bot-net, or using the system in question as a pivot for broader compromise? If so, they may be making changes to the system, adding tools and services to establish persistence or expand functionality. In these cases there will typically be evidence of these changes on the local machine. If, however, the intruder does not need persistence, then they need never write anything to disk for the examiner to find.

Consider the following scenario

An individual is a prospective state sponsored intruder. His target is the intellectual property of a major military contractor of a rival nation. It is likely that he has several zero-day exploits at his disposal. He has the ability to utilize those exploits to gain access to the machines that hold the property he seeks. Does he need to establish persistence? No, he does not!
That person can exploit the machines that are needed and copy out the information that is required, with little or no change to the contents of the subject's machine(s). The only indicators of compromise will be in RAM, log files and other metadata related to the system(s), such as event logs, transactional logs, MAC times and the like.

How can that be?

Well, if you have played with tools like Metasploit, you know that most modern hacks operate in RAM. Nowhere in the latest version of the Hacker's SOPs does it say "step 32: write data to disk". Hackers will not write to disk unless they are trying to accomplish a specific task, like adding functionality or establishing persistence. So an intruder launches an exploit that operates in RAM, spawns an encrypted reverse shell, and exfiltrates the booty encrypted on an open port, most probably via SSL on a port that is commonly open for web traffic, such as 443 or 80. This allows the traffic to clear many systems because the activity looks like normal web traffic. Since this communication resides in volatile RAM, the longer after the event the intrusion is discovered, the less likely it is that evidentiary information still resides in RAM. Persistence need not be established because the intruder has already made off with the information wanted or needed.
Think of how difficult it would be to discover that someone had broken into your home by sliding open an unlocked window, taken a picture of something inside your home, and left without removing or touching anything. When you get home, nothing is missing and everything is right where you left it. In the case of zero-day exploits the problem is even worse; it is much like having a master key. The victim does not need to leave an attack vector like an unlocked door or window. An intruder can circumvent the lock with his unknown exploit and gain entry at his leisure. Persistence is not needed because all your stuff (servers and Internet addresses) is typically going to be in the same place tomorrow that it is today. The same exploit can be used tomorrow that was used today. The surest way for the intruder to be discovered is to start making changes to things. The more things changed (new files or services), the more likely it is that the target discovers that something is amiss. The increased scrutiny on the part of the target may then reveal the intruder's valuable zero-day and allow it to be mitigated or patched.
Where is information about the exploit to be found? Since it is not making direct changes to the disk or its programs, the data will reside only in RAM, log files, metadata, and possibly the page/swap file. The swap file presents us with a couple of problems. With the rise of 64-bit systems and large amounts of RAM, there is more of a possibility that the actions never get paged out to the swap file. Also, the more time that passes, the better the chance that the actions have been overwritten in swap. So we are left with metadata in the form of system and event logs on the local machine. Logs on the local machine are suspect because they may have been altered. This leaves us with transactional, IDS and IPS logs residing on other systems.

CALL DATA RECORDS

Figure 1. Approximating possible location from call data records

Figure 2. Querying files for metadata with exiftool

Call data records can be utilized in much the same way that we use IP addresses in typical computer forensics. They can be used to determine who is talking to whom, for how long and, possibly most importantly, where the parties were when the communications took place. The examiner may discover previously unknown associations between victims and suspects. Obviously, from an intelligence standpoint it is nice to know who is talking with whom. It helps establish who may be involved in the incident, such as suspects and co-conspirators. Call data records will typically list the time of the event, an antenna location and antenna face, along with other entries. Armed with this information we may be able to determine the probable location of suspects and victims at the time of an event. This is done by relating the call data record (CDR) entries with information about other antennas in the area. If we know a particular CDR event, and know the locations and configuration of the other antennas in the area and their relative coverage areas, we can estimate where the phone was. All things being constant, the phone will most likely connect to the strongest/closest signal. This is a gross oversimplification, as terrain and signal strength will affect coverage, but for purposes of this argument consider the following diagram.
This is a call data record indicating that a phone was using the maroon site's southwest-facing antenna at about the same time a crime occurred in that area. The locations of the adjacent blue and green sites and antennas can be used to determine the likely location of the cell phone at the time of the call. As the phone moves closer to the other sites' antenna faces it is increasingly likely to connect to the stronger signal emanating from those respective antennas. So using this technique we can establish that the subject's phone was most likely in the remaining maroon shaded area (Figure 1). This information can be used to place a subject, or at least his phone, in a particular area. This technique has clear implications for confirming or attacking a potential alibi. There are obvious applications as well for locating persons and determining their movements and associates. If the phone remains dormant or does not move from a location for an extended period of time, this may be where the subject currently lives or works. Does the phone frequent specific locations? These may be friends, associates, or customers. We could look at the call data records and build a history of where the phone was prior to, or immediately after, specific events or in relation to other similar incidents.
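The "closest site wins" simplification can be sketched in code. The example below is a toy model under heavy assumptions: sites are points on a flat grid, every site radiates equally in all directions, and antenna faces, terrain and signal strength are ignored. The site names and coordinates are invented for illustration.

```python
import math

def nearest_site(point, sites):
    """Return the name of the site closest to `point`.

    point : (x, y) in any planar unit, e.g. kilometres on a local grid
    sites : dict mapping a site name to its (x, y) position
    """
    return min(sites, key=lambda name: math.dist(point, sites[name]))

def likely_area(serving_site, sites, candidates):
    """Keep only the candidate points whose nearest site is the serving site."""
    return [p for p in candidates if nearest_site(p, sites) == serving_site]

# Invented layout: three sites and a coarse grid of candidate positions.
sites = {"maroon": (0.0, 0.0), "blue": (4.0, 0.0), "green": (0.0, 4.0)}
grid = [(x * 1.0, y * 1.0) for x in range(-1, 4) for y in range(-1, 4)]
area = likely_area("maroon", sites, grid)
```

The points left in `area` play the role of the remaining shaded region: positions where, under these assumptions, the phone would have preferred the serving site over its neighbours.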
Let's look at some more traditional metadata retained inside the file.

Figure 3. Sample exiftool output for Microsoft Office document

Many evil-doers have been kind enough to send threatening, ransom, or extortion letters in the form of Microsoft Word documents. Microsoft Office files maintain a great deal of data about the file inside the file format: the original author of the file, a unique GUID (which contains information about the machine it was originally created on), when it was last printed, revision information, whether it was created from a template, and the list goes on and on. We will look at two different tools to see what metadata they report. After this brief glimpse at some of the metadata available, I will let your imagination run wild with how it can be used from a forensic standpoint.
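For the newer, ZIP-based Office Open XML formats (.docx, .xlsx, .pptx), a few of these fields can be read with nothing more than a ZIP reader and an XML parser. The sketch below is an illustration under that assumption; it does not handle the older binary .doc format, and the function name `core_properties` is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the package's core properties part.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

def core_properties(docx):
    """Read a few core properties from an Office Open XML file (path or file object)."""
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))
    wanted = {
        "creator": "dc:creator",
        "last_modified_by": "cp:lastModifiedBy",
        "created": "dcterms:created",
        "modified": "dcterms:modified",
        "last_printed": "cp:lastPrinted",
        "revision": "cp:revision",
    }
    # Fields that are absent from the document come back as None.
    return {key: root.findtext(path, namespaces=NS) for key, path in wanted.items()}
```

A dedicated tool will report far more than this, but the sketch shows that the data sits in plain sight inside the container.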
First let's look at Phil Harvey's ExifTool [1]. This is an excellent tool that supports reading and writing metadata for a great many file formats. From the command line I will execute the exiftool command and have it give me all the metadata the tool recognizes for the file in the specified directory (Figure 2). The information is redirected to a text file. ExifTool has an extensive help file and many switches and options. The reader is encouraged to look at the tool and its capabilities.
Parsing through the returned output we find this entry. It contains the name of the file and some useful information such as the original author, the original created date and time, the last time it was printed and a hyperlink to a network resource from which a jpg file is being pulled. This is useful information but, if we dig deeper, there is much more hidden in the metadata of Office documents.

Figure 4. Sample output from Forensic Foca for same Office Document as Figure 3

Figure 5. Sample exiftool output from JPEG image
Let's look at it with a different tool that is still in beta but shows a lot of promise. Foca [2] is a tool in development by Informatica64. The tool was originally developed to mine domains for metadata related to content on the site, and is particularly useful for enumeration and reconnaissance. In addition to the original version of Foca, there is an online interface that allows analysis of individual submitted files, and a forensic version in development which shows promise.
When we look at the same file in the beta Forensic Foca tool, what information is provided?
This tool provides different data that could be very useful. Foca is reporting some historical information about the file. It reports that the file has been accessed and utilized by several different local and domain users. So an examiner could expect to see temporary copies of the file in those folder locations. Also visible is a remote store at the top of the history, as well as a location on a mapped drive in the other metadata area and the folders (Remote Users) area. If this file was part of a suspected intellectual property theft investigation, might this be of interest? Most certainly it would be. It may contain references to our users, authors, or network resources. That, along with the indicators of possession, editing, or use of the file by unauthorized users, is clearly compelling evidence. On top of this information, there is information on additional locations where we might find corroborating evidence related to the files in question.

EXIF and Geo-Tagging information

EXIF and geo-tag information are excellent sources of metadata that can be looked at from a forensic perspective. Again using the exiftool command we will look at all the files in a directory and output the EXIF information. When looking at the output, there are several things that may be helpful. We see that the modified time predates the created and accessed times. This is consistent with the file being moved from one piece of media to another. But it would be nice to know when the file was originally created. With EXIF data we can look at when the camera taking the photo reports creating the file. Now of course, this is dependent on the device taking the photo having the correct time and date settings. Also visible is the make and model of the camera that took the photo; additionally, many cameras will place the camera's serial number in the EXIF data. This can be very important if there is a need to show that a particular device was responsible for a particular image. When we have relatively unique artifacts, such as camera make and model, this metadata can be used when attempting to locate deleted information, or files residing in compound file formats.
Most carving utilities will look at sector or cluster boundaries for file headers. When looking for the EXIF data markers we may be able to locate and recover files that for whatever reason do not reside on sector boundaries or whose headers have been lost, removed, or corrupted.
This example does not have any geo-tagging information, but with more and more devices imprinting geographic locations into the files they create, it makes sense to look for geo tags. Obviously, with geographic latitude and longitude information, it is possible to establish where an image was taken (or, in the case of SQLite databases, stored on phones) at a particular time. This can help the examiner resolve jurisdictional issues, track movements, and either corroborate or refute an alibi.

REFERENCES

[1] ExifTool: Phil Harvey, http://www.sno.phy.queensu.ca/~phil/exiftool/#supported
[2] Foca: Informatica64, http://www.informatica64.com/foca/

CONCLUSION
This article has looked at a few tools and examples of how metadata can assist the examiner, investigator, and attorneys. Corporate and governmental entities have massive digital data sets that need to be cataloged and quantified. To meet those needs developers have been keen to place additional information into metadata tags. Programs can then be used to parse metadata, locating, controlling, and archiving valuable property and knowledge. The presence of metadata and the techniques to view it may be proprietary trade secrets, or open standards, but there is little doubt that it is increasingly present. To determine what type of metadata may be present in a file, open it with the associated program. Look through the menus for entries like properties, options, or history. What information is available? Look at the file with a hex editor: is there readable data that does not appear in the normal file presentation? I bet you will be surprised what resides just outside direct view. The data is often there; our job as examiners is to detect its presence and utilize it in furtherance of the truth.
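The hex editor check can be roughly automated by pulling out runs of printable characters, in the spirit of the classic strings utility. The sketch below is a simplified illustration: it only looks for ASCII, so text stored as UTF-16 (common in Windows formats) would need a second pass, and the minimum length of six is an arbitrary assumption.

```python
import re

def printable_strings(data, min_len=6):
    """Yield (offset, text) for runs of printable ASCII at least `min_len` bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")
```

Reading a file in binary mode and passing its bytes to `printable_strings` gives a quick list of offsets worth a closer look in the hex editor.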
ABOUT THE AUTHOR

Robert Reed is an experienced computer forensic examiner and trainer. With over 22 years in law enforcement and direct involvement in computer forensics since 2004, he brings significant training and expertise to the table. Mr. Reed has trained hundreds of examiners and security professionals from defense, space, intelligence, law enforcement, and large private sector organizations.


TOP 10 METADATA
CONSIDERATIONS FOR
NETWORK SECURITY
by Brian Contos

With network security metadata, 1+1 doesn't always equal 2, as causal relationships become less opaque and discoveries often prove to be both interesting and concerning.
In June 2013 the term metadata, which is most generally defined as data about data, went mainstream following the Guardian's article on the NSA Prism program. For many years the security industry has been working with metadata, developing best practices around handling metadata and even choosing the right technology for specific use cases. This article will focus on key areas of consideration when looking to leverage metadata to improve network security.

01

WONDERING COULD I, SHOULD I: PRIVACY

There are multiple rules and regulations regarding the collection of data. For example, many European countries such as France and Germany have strong privacy laws that limit what can be captured. Note, however, that even in countries with strong privacy laws it doesn't necessarily mean that metadata cannot be collected. Some multinational businesses that I've worked with utilize technological solutions to collect data automatically and convert it into metadata, but they don't leverage humans to analyze it unless it is security incident-driven.
Whatever method or technology is used, it must be vetted by legal counsel and be in line with organizational policies. It also may require updating employees regarding privacy expectations, and general employee awareness surrounding the how and why of the data collection. Without these steps, metadata collection may be considered illegal or contrary to organizational perspectives on monitoring. Simply put: get permission.

02

DISCOVERING YOU CAN'T SEE 20-40 PERCENT OF YOUR TRAFFIC: ENCRYPTION

Encrypted network traffic is becoming more common. In most organizations it accounts for 20-40 percent of the packets and this number is trending up. The great majority of advanced threats utilize encryption to bypass security controls, facilitate command and control activity and steal sensitive data. As such, understanding how encrypted network traffic is going to be addressed is an important variable to consider.
While some high level information can still be collected even when encryption is being used (source and destination IP addresses, certificate information, etc.), for it to be of any significant value the information must be decrypted. Fortunately there are a number of network security solutions that are purpose-built for real-time decryption that organizations can invest in. These solutions essentially operate in the network path, where encrypted data goes in one end and decrypted data comes out the other for analysis by whatever security solutions need visibility. In addition to purpose-built solutions, there are a number of firewall, proxy and related vendors that offer decryption. If you are serious about network security metadata you need to get serious about decryption solutions too.

03

RECORDING BITS OVER THE WIRE: COLLECTION

There are several types of metadata that are applicable to network security. For years Security Information and Event Management (SIEM) solutions took center stage with their ability to ingest logs, events and alerts from disparate assets throughout the network. These systems could capture thousands of logs a second and are still an important part of network security.
Over the last few years Security Intelligence and Analytics (SIA) solutions, often called SIEM for packets or big data security solutions, have become increasingly common. Instead of thousands of logs a second, they are designed to collect millions of packets, flows and sessions a second. Because of the volume, velocity and variety of the packets, solutions designed to collect metadata off the wire at the packet level need to be able to operate with lossless collection on 2, 10, and even 40 gigabit networks. When it comes to the analytics phase, item 8 in this article, an analyst looking at the metadata results will be at an extreme disadvantage if the packets are missing, files cannot be reconstructed, and sessions cannot be followed. The net: you can't analyze it if it didn't get captured.

04

PUTTING BITS IN A BOX: STORAGE

Equally important to collecting the data is being able to store it. Metadata is usually used for a combination of real-time and forensic analysis. But even in real-time analysis, metadata that is stored can improve the analytical process in terms of event scoring, prioritization, history and impact analysis. With millions of events a second crossing the wire, the packets must be able to flow over the Ethernet and into the storage system. This is another area where it's not of use if it's not there.

When looking at solutions, ensure that when storing network metadata that data is indexed across a
wide number of parameters so it can later be quickly retrieved. Because there are thousands of network applications each with hundreds of attributes,
it is important to leverage a solution that is extensible enough to store the packets, break them down
into disparate pieces of metadata, and utilize indexing to make it useful after the fact.
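As a rough illustration of indexed storage and later retrieval, the sketch below loads a few flow records into an in-memory SQLite table and indexes the attributes an analyst is likely to filter on. It is only a toy under stated assumptions: the table layout, the field names and the sample addresses are hypothetical, and a production SIA platform would use a purpose-built store rather than SQLite.

```python
import sqlite3

def build_store(records):
    """Load flow metadata into an in-memory SQLite table with indexes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flows (ts REAL, src_ip TEXT, dst_ip TEXT, "
        "dst_port INTEGER, app TEXT, byte_count INTEGER)"
    )
    # Index the attributes most likely to appear in a search (assumed set).
    conn.execute("CREATE INDEX idx_flows_ts ON flows (ts)")
    conn.execute("CREATE INDEX idx_flows_src ON flows (src_ip)")
    conn.execute("CREATE INDEX idx_flows_dst ON flows (dst_ip, dst_port)")
    conn.executemany("INSERT INTO flows VALUES (?, ?, ?, ?, ?, ?)", records)
    conn.commit()
    return conn

def flows_for_host(conn, ip, start_ts, end_ts):
    """Return flows that involve `ip` inside a time window, oldest first."""
    cur = conn.execute(
        "SELECT ts, src_ip, dst_ip, dst_port, app, byte_count FROM flows "
        "WHERE (src_ip = ? OR dst_ip = ?) AND ts BETWEEN ? AND ? "
        "ORDER BY ts",
        (ip, ip, start_ts, end_ts),
    )
    return cur.fetchall()

# Hypothetical sample data (documentation-range addresses).
SAMPLE = [
    (100.0, "10.0.0.5", "192.0.2.34", 443, "tls", 5120),
    (160.0, "10.0.0.5", "10.0.0.53", 53, "dns", 240),
    (220.0, "10.0.0.9", "10.0.0.53", 53, "dns", 180),
]
```

Calling `flows_for_host(build_store(SAMPLE), "10.0.0.5", 90.0, 200.0)` returns the two flows for that host in the window, which is the kind of after-the-fact lookup that indexing is meant to keep fast.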


05

PULLING BITS OUT OF A BOX: RETRIEVAL

Up to this point we've been focusing on getting the raw data into a system so we can work some magic and push some interesting metadata out the other side. How fast the data can be retrieved will be the difference between a solution that is usable and one that finds its way into the IT security boneyard. Of course there are a number of variables that relate to retrieval speed, but as a general rule of thumb you want, at a minimum, to be able to retrieve results across gigabytes of data in seconds and terabytes within minutes. Anything slower than that simply becomes too cumbersome to be practical.
Some items that could impact retrieval speeds and should be considered when defining the architecture of your solution:
How much raw packet data do I want to store vs. metadata? For example, is it enough for me to know the metadata details relating to a group of packets associated with a confidential PDF leaving my network, or do I want to actually be able to recreate that PDF?
How long do I want to preserve my data?
Will I have one or multiple systems collecting data, and can they be queried from a single, centralized location?
How many users will be interacting with my solution?
What are my architectural requirements: data storage system, network bandwidth, CPU, caching, memory, etc.?

06

LEVERAGING OTHER TOOLS: INTEGRATION

Let's say we have a SIA solution as discussed in section three. Collecting network packet data, processing it by breaking it into metadata, and even applying deep packet inspection (DPI) to truly understand why that DNS-looking packet coming over port 53 was in fact botnet beaconing and not a name server lookup, for example, is extremely valuable. But these types of solutions can be equally relevant in making other security solutions more extensible.
Consider an IPS solution. These tools are great at generating an alert based on the detection of something malicious. This is a bit like a photograph. Pivoting within the IPS interface from the alert to a SIA solution that contains all of the raw packets and metadata is like going from a still frame to the entire movie, since it contains all information before, during and after the alert. This type of integration is a must-have for the robust and cost-effective use of metadata. Time and money can be saved because the integration between disparate security solutions allows for a great reduction in the amount of time it takes to discover and remediate an incident and perform root cause analysis. Your solution should do this by keying off of metadata attributes such as source and destination IPs, ports, time stamps and hundreds of other variables. Besides IPS, solutions such as SIEM, log management, firewalls and anti-malware can all benefit from integration with solutions that are focused on raw packet collection and metadata.

07

MAKING METADATA BETTER: ENRICHMENT

We have covered collecting data, applying DPI, and deriving value through integration across our security ecosystem. But we can still make it better through enrichment.
The data can become even more valuable by taking advantage of reputation information associated with IPs, URLs, domains and the like, file blacklisting and whitelisting, and even anti-malware capabilities. Regardless of the metadata solution, enriching metadata is imperative to getting the most out of it; otherwise you will only be as good as what you see on the wire, instead of getting the network effect of what potentially millions of users are seeing around the globe.
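To make the enrichment idea concrete, here is a minimal sketch that tags a flow record using reputation lists. Everything in it is an assumption made for illustration: the `FlowRecord` fields, the tag names and the two tiny reputation sets are hypothetical, and a real deployment would draw on vendor or community reputation feeds instead.

```python
from dataclasses import dataclass, field

# Hypothetical reputation data; real solutions pull this from external feeds.
BAD_IPS = {"203.0.113.7"}
BAD_DOMAINS = {"malware.example"}

@dataclass
class FlowRecord:
    src_ip: str
    dst_ip: str
    domain: str
    tags: list = field(default_factory=list)

def enrich(record, bad_ips=BAD_IPS, bad_domains=BAD_DOMAINS):
    """Attach reputation tags to a flow record and return it."""
    if record.src_ip in bad_ips or record.dst_ip in bad_ips:
        record.tags.append("ip-reputation:bad")
    if record.domain.lower() in bad_domains:
        record.tags.append("domain-reputation:bad")
    return record
```

The point of the design is that the tags travel with the stored metadata, so later searches and reports can filter on reputation without repeating the lookup.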


08

CONVERGING HUMAN AND MACHINE-BASED ANALYTICS: ANALYSIS

The solution providing the metadata should do much of the heavy lifting as it relates to firing alerts on suspicious discoveries. This is usually done through a combination of correlation, anomaly detection and pattern discovery.
It should also be designed to support detailed and easy-to-follow visualization of the data that draws an analyst's eyes to potential areas of interest. Workflow is another factor: analyzing data should be highly tuned to support the natural process that an investigator might follow and, more generally speaking, allow things to happen in a single click instead of four. The combination of robust machine-based analytics complemented by a streamlined human interface ensures that big data doesn't get the best of your solution and that the metadata is actually usable.
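Anomaly detection can take many forms. As one deliberately simple sketch, the function below flags hosts whose traffic volume sits far above the average for the group, using a z-score. The threshold of 2.0, the per-host byte totals and the function name are assumptions for illustration; commercial solutions combine far richer models with correlation and pattern discovery.

```python
from statistics import mean, pstdev

def find_outliers(bytes_by_host, threshold=2.0):
    """Return hosts whose byte count is more than `threshold`
    standard deviations above the mean of all hosts."""
    values = list(bytes_by_host.values())
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # every host sent the same amount: nothing stands out
    return sorted(
        host for host, value in bytes_by_host.items()
        if (value - mu) / sigma > threshold
    )
```

A list of flagged hosts like this is only a starting point that draws the analyst's eye; deciding whether the spike is a backup job or data theft is still human work.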

09

REAPING REWARDS: REPORT

Security analysts like reports; management loves them. Even the best solution within your metadata arsenal won't be as valuable if it isn't able to generate both technical and summarized reports of discoveries. Business leaders and technical leaders alike can benefit from reporting.
Regardless of the metadata solution being leveraged, ensure that reporting can be both general and specific, even to the point of generating metadata about a specific file, file type, IP address, URL, user, etc.

10

CLOSING THE LOOP: RESPONSE

You have controls for prevention; while 100 percent necessary, they don't scale and must be augmented by incident detection and response. Metadata provides an analytical platform for detection that in turn can be leveraged to quickly mount a targeted response through automatic or human-assisted processes. Once a preventative control like a proxy, firewall or IPS has been updated with a new rule, policy, signature, etc., many solutions such as SIA can take the stored raw packet data and replay it back through the preventative security controls for assurance testing. When considering metadata solutions, ensure that you are considering use cases beyond monitoring and analysis; they can be extremely useful for incident response.

CONCLUSION

Metadata is a rich topic and we've really only scratched the surface of its capabilities in this article. Regardless of your solution being standalone or integrated throughout the larger security ecosystem, the information it provides is absolutely necessary to combat today's external and internal threats.
A robust solution should be able to address, at a minimum, use cases such as situational awareness, incident response, data loss monitoring, and advanced threats such as zero-day attacks and malware. It should also help to answer forensic questions like: who did it, when, how, for how long, who else was involved, is it still going on, and what was the extent of the impact? This type of packet-level visibility, combined with rapid and targeted incident response, is one of the reasons that metadata has quickly become an indispensable piece of an effective network security posture.

ABOUT THE AUTHOR

Brian Contos, CISSP, VP and Chief Information Security Officer within Blue Coat's Advanced Threat Protection Group
Brian builds successful security companies and has had multiple IPOs and acquisitions. He is a published author, a seasoned business executive with a proven record of success and a recognized security expert with 20 years of experience. He has worked with Global 2000 companies and government organizations in 45 countries across six continents. Brian authored two books, including Enemy at the Water Cooler: Real-Life Stories of Insider Threats and Physical and Logical Security Convergence, which he co-authored with former NSA Deputy Director William Crowell. He is an invited speaker at leading industry events like RSA, Interop, AusCERT, Infosecurity Europe and GFIRST and has written for and been interviewed by industry and business press such as CBS News, Bloomberg, Forbes, NY Times, USA Today and the London Times. Brian was formerly the WW VP of field engineering at Solera Networks, senior director for emerging markets at McAfee, chief security strategist at Imperva, chief security officer at ArcSight, and director of engineering at Riptech. In addition, he has held security positions at Bell Laboratories and the Defense Information Systems Agency (DISA). Brian is a Ponemon Institute Distinguished Fellow and a graduate of the University of Arizona.

METADATA: What It Is and Why You Should Care
by Johnette Hassell, Ph.D., CEDS and Jack Molisani

Until Edward Snowden unleashed his allegations about the US and UK collecting phone information on millions of their citizens, the word "metadata" was the province of attorneys and computer forensic/eDiscovery nerds, such as these authors. And while the world may now be aware of the term, few truly understand the breadth and pervasiveness of computer metadata.

In this article we will discuss what computer metadata is, explain its importance in investigations and litigation, and provide a variety of examples.

ABOUT ELECTRONICALLY STORED INFORMATION

When we discuss metadata in general, and metadata as evidence in a lawsuit in particular, we are discussing what is generally called Electronically Stored Information (ESI). The need to collect, process and produce ESI has caused substantial changes in the way ESI is handled when compared to more tangible kinds of evidence. In particular, ESI must be handled in ways that preserve and protect metadata.


WHAT IS METADATA?

When electronic devices store information, the files used normally contain the information itself (such as a digital photograph) plus additional information about what is stored in the file. For example, the time a photo was taken and other information is typically stored along with the actual photo.
This additional information is called metadata, because it is data about the data.
Word processing documents may contain information about the last edit, as WordPerfect does, or about the username of the document's creator, as Microsoft Word does.
Metadata, however, is not limited to files in a computer or camera. The US and the UK say they weren't collecting or storing the actual telephone calls, only the metadata about the calls. (You see such metadata each month when you read your phone bill: the numbers you called and how long each call lasted.) If your call was made with a smartphone, the metadata probably also contains the location from where you made the call.

COMMON TYPES OF METADATA

While governments requesting information about their citizens' phone usage certainly provides a high-profile example of how metadata can be used (or misused), let's look at two common types of metadata: the metadata in office documents (such as MS Word and Excel files) and in digital photographs.

TYPICAL MICROSOFT DOCUMENTS

Figure 1. Metadata in a Word 2003-2007 Document

You may know that metadata in documents contains easy-to-see information such as the name of the author, the company name, and certain dates. We say "easy to see" because you can see and even change that information from within the program.
To see a simple example of this information, open a Microsoft Word 2003 or 2007 document and select Properties from the File menu. A dialog similar to Figure 1 will appear showing some of this information, such as the document Title and Author. For Microsoft Word 2010, select the Info tab on the File menu.
A document created on a corporate PC might display more information, such as the company name and the name of a corporate template (if any). See Figure 2 for a typical example.
While you may have known you can change what appears in the Author field, you may not know that the metadata often includes hidden information, such as the names of previous authors who edited the document and the names of the printers used to print the document.
To see the remainder of the metadata stored in a Word file:
In Microsoft Word, select Open... from the File menu.
From the "Files of type" drop-down list, select "Recover Text from Any File (*.*)" and then select and open a Word document, as seen in Figure 3.
When the file opens, page down to the bottom of the file to see metadata such as that shown in Figure 4 (what you see will vary).
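The same kind of hidden text can be pulled out of a file without Word at all. The sketch below is a rough stand-in for the "Recover Text from Any File" step, not a description of how Word implements it: it scans raw bytes for readable runs stored as ASCII or as UTF-16LE, two encodings commonly found in binary document files. The function name and the minimum run length of six characters are assumptions chosen for illustration.

```python
import re

def printable_strings(data, min_len=6):
    """Return readable runs found in raw bytes, as ASCII or UTF-16LE text."""
    found = []
    # Runs of printable ASCII characters.
    for match in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, data):
        found.append(match.group().decode("ascii"))
    # Runs of printable characters stored as UTF-16LE
    # (each character followed by a zero byte).
    for match in re.finditer(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len, data):
        found.append(match.group().decode("utf-16-le"))
    return found
```

To try it on a document, read a copy of the file in binary mode, for example `printable_strings(open("copy_of_document.doc", "rb").read())`, and look through the results for user names, folder paths and printer names. Working on a copy matters, because opening the original in an application can itself change metadata.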

Figure 2. Metadata in Word 2010 Document



Figure 3. Using Microsoft Word to View Metadata



In Figure 4, you can see the name the document originally had ("Administrative details 305 198.doc") and where it was located (on a machine with user name Johnette Hassell).
Figure 5 shows this document was then saved under a new name ("Administrative details 305.doc") in a folder on a different computer (E:\cs305.fall.01 on the computer named hassell).
There is more information you can recover, but this gives a good example of the type of data Microsoft Word stores. Such information might be critical evidence in a lawsuit, where the metadata might show how an accused party saved a company's design document to an external hard drive, edited it on a home PC, then edited it again on a computer at his/her new (and competing) employer.

Figure 4. Metadata in a Word Document

DIGITAL PHOTOGRAPHS

Digital cameras (including those on smartphones) record metadata such as the date and time a photo was made. This data is recorded even if the photographer has turned off "Show Dates" in the photo. These time stamps may be useful, for example, to police who performed a drug bust, where they need to show the exact time of the raid and seizure.

COPY MACHINES

Few people know that modern copy machines/printers work by making an image of a page and then printing the image. [1, 2] Fewer yet know that such machines often retain copies of recently printed documents on an internal hard drive, including the date and time each document was submitted, the username of the person who printed the document, and the computer from which the document was sent. These types of metadata are useful in both trade secret cases (in which an employee is accused of theft) and in espionage cases (showing who stole classified documents).
CBS News, in preparation for an investigative report on copy machines, bought four used copy machines. In examining their hard drives, they found a list of targets in a major drug raid (from the Buffalo Police Narcotics Unit), 95 pages of pay stubs with names, addresses and social security numbers and $40,000 in copied checks (from a New York construction company), and 300 pages of individual medical records, including everything from drug prescriptions, to blood test results, to a cancer diagnosis (from Affinity Health Plan, a New York insurance company). [3]

SMARTPHONES

The photographs taken by smartphones may have additional metadata, such as the location where the photograph was made. Just ask Higinio Ochoa. He was a hacker known as "w0rmer" [sic] and worked with a hacking group called CabinCr3w. He was, allegedly, responsible for releasing the personal information of scores of police officers throughout the United States.
The FBI found him because he posted a photo of his scantily-clad girlfriend. The photo (made with a smartphone) contained the GPS coordinates of where it was made. These coordinates led to his girlfriend's location and, eventually, to him. [4, 5]
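Photo location metadata is usually stored in the EXIF GPS fields as degrees, minutes and seconds plus a hemisphere letter, while mapping tools generally expect signed decimal degrees. The sketch below shows only that conversion; the function name is an assumption, and reading the raw values out of an image would require an EXIF reader such as the tools discussed later in this issue.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert EXIF-style degrees/minutes/seconds and a hemisphere
    letter (N, S, E or W) to signed decimal degrees."""
    if ref not in ("N", "S", "E", "W"):
        raise ValueError("ref must be one of N, S, E, W")
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative by convention.
    return -value if ref in ("S", "W") else value
```

For example, 37 degrees 30 minutes 0 seconds South becomes -37.5. Pasting a latitude and longitude pair in this form into a mapping service is all it takes to turn a photo into a street address, which is why location tagging deserves attention before images are shared.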

Figure 5. More Metadata in a Word Document

HOW COMPUTERS USE METADATA

Have you ever tried to open an email attachment and received an error message saying the computer doesn't know what program to use to open the file, or perhaps your system tried to open the file and said the file didn't contain what it expected to see?


Computers (such as PCs running MS Windows) know what type of information should be in a file in two ways: the ".xxx" ending on the file name, called the file's extension, and an internal code within the file.
For example, a document stored in Adobe Acrobat format normally ends in ".pdf" (portable document format), such as MyDocument.pdf (Figure 6). Windows uses the extension .pdf to know what program to use to read the file (in this case, Adobe Acrobat).
If you were to look at that same file using a simple text editor (like Notepad), you would see that the very first characters in the file are "%PDF". This file signature identifies the type of information stored in the file (in this case, PDF). See Figure 7 for an example of such a signature.
A person can, however, change the file extension in an attempt to hide something from plain view. For example, a spy might rename a spreadsheet recording a list of bribes from "MyBribeList.xls" (Excel spreadsheet) to "My home movie.mov" (a movie format).
A person who tries to open this "movie" will get a message saying the file cannot be opened. While an observer may assume such a file contains a movie based on the .MOV extension, modern forensic tools can indicate when a file extension does not match what the file really contains, as identified by the file signature. (Renaming the file extension in an attempt to mask what's inside the file is a technique child pornographers frequently use in an attempt to hide illegal photographs.)
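The extension-versus-signature check described above can be sketched in a few lines. This is an illustrative toy, not how any particular forensic product works: the signature table holds only a handful of well-known values, and the function names and labels are assumptions. Note that several file types can share one signature (legacy Office files all use the same container format), so each signature maps to a set of acceptable extensions.

```python
import os

# A few well-known signatures; real forensic tools carry far larger tables.
SIGNATURES = [
    (b"%PDF", "PDF document", {".pdf"}),
    (b"\x89PNG\r\n\x1a\n", "PNG image", {".png"}),
    (b"\xff\xd8\xff", "JPEG image", {".jpg", ".jpeg"}),
    (b"GIF8", "GIF image", {".gif"}),
    (b"PK\x03\x04", "ZIP container", {".zip", ".docx", ".xlsx", ".pptx"}),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "legacy Office (OLE2) file",
     {".doc", ".xls", ".ppt"}),
]

def extension_mismatch(filename, header):
    """Return a label for the detected type when the file name's extension
    disagrees with the signature in `header`; otherwise return None."""
    claimed = os.path.splitext(filename)[1].lower()
    for magic, label, extensions in SIGNATURES:
        if header.startswith(magic):
            return None if claimed in extensions else label
    return None  # unknown signature: this sketch offers no opinion

def check_file(path):
    """Read the first bytes of a file on disk and run the check."""
    with open(path, "rb") as handle:
        return extension_mismatch(path, handle.read(16))
```

With this table, a spreadsheet renamed to "My home movie.mov" is reported as a legacy Office file, while an honestly named PDF passes without comment.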

Figure 6. .pdf Document Extension

Figure 7. .pdf Document Signature

GUARDING AND PRESERVING METADATA

Now that you know about metadata, what should you do?

SHARING YOUR METADATA?

First, exercise caution when sharing any documents you work with, especially when sharing them with people outside your organization. Can you remove metadata or otherwise protect it from prying eyes? There are tools that can do this, to a certain extent. [6] But don't forget there is also embedded metadata, data that is harder to change.

PRESERVE METADATA

If you are an IT professional and your company is involved in a lawsuit (or even if you think your company might be involved in a lawsuit), you must take steps to ensure the metadata of ESI in your control is not altered. You may also need to set up mechanisms so that other employees can set aside ESI that needs to be preserved. Merely turning on a computer makes more than 160 changes to a computer's hard drive(s). Many of those changes are to the dates on files, dates that may be crucial to a case. Cell phones, once they are turned on, make continuous changes to their storage areas. If your organization is faced with litigation, immediately consult with your corporate attorney and a reputable eDiscovery or computer forensic specialist about the best way to preserve all your ESI, including metadata.
If you are an attorney in litigation, be aware of metadata in your client's productions and include metadata in your requests for production. The federal rules of discovery are clear about metadata, but state rules may vary. See the Kroll Ontrack [7] and K&L Gates [8] websites for up-to-date information on individual states' rules.
Metadata may have much to tell someone interested in your business. One real estate attorney handled lucrative casino properties. Many of his clients did not want others to know of their interest in such properties. Unfortunately, the attorney used a boilerplate proposal document, repeatedly saving it under different clients' names. The metadata revealed the names of interested parties going back several years, and many of those clients were competitors.

SEARCH FOR METADATA

Ordinary search tools, such as Windows' and Google's search features, neither recognize nor search the hidden metadata in files. There are, however, a limited number of tools that allow people to examine the metadata in files. But if you need to search a large collection of electronically stored information (as is often necessary in litigation), use a certified eDiscovery consultant who can help find potential evidence that might be in the metadata.

SEARCH FOR ANOMALIES

There are tools available that can alter metadata such as the time a photograph was taken or the date a file was last modified. If you are involved in a lawsuit where dates are important, an eDiscovery specialist can use forensic tools to determine if relevant data, including metadata, were changed (purposely or accidentally).

MAINTAIN DIGITAL CHAIN OF CUSTODY

Since ESI is easily changed by even simple, innocent acts such as opening a file or booting a computer, special care is needed in managing ESI. Preserving the original media (such as a memory card from a camera, the hard drive(s) from a computer, or the files in a smartphone) is the best way to preserve data. The process of ensuring the integrity of potential evidence is known as maintaining chain of custody.
Other than preserving the original media itself, the best current way to preserve electronic media is for a forensic specialist to make a valid forensic image [9], an exact bit-by-bit copy of the item in question. Such images preserve everything on the media (including all metadata) and are regularly accepted in court proceedings as valid evidence. There are numerous tools for making such images. Using appropriate tools, these images can be examined without worry about changing the original evidence.
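One common way to demonstrate that an image really is an exact copy is to compute a cryptographic hash of the original and of the image and compare the two values. The article does not describe this step, so treat the sketch below as an illustration of general practice under that assumption; the function names are hypothetical, and SHA-256 is simply one widely available hash.

```python
import hashlib

def sha256_of_chunks(chunks):
    """Hash an iterable of byte chunks and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()

def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file piece by piece so large images need not fit in memory."""
    with open(path, "rb") as handle:
        return sha256_of_chunks(iter(lambda: handle.read(chunk_size), b""))

def images_match(original_path, copy_path):
    """True when both files have the same SHA-256 digest."""
    return sha256_of_file(original_path) == sha256_of_file(copy_path)
```

Recording the digest at the time of acquisition, and recomputing it whenever the image changes hands, gives each step in the chain of custody something concrete to verify.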

THE POWER OF METADATA

Information stored in metadata can make or break a case if your company is ever sued (or, in turn, if your company needs to sue a competitor). IT departments are usually the first to be contacted internally when litigation is known or contemplated. Uninformed handling of potential evidence may inadvertently lose or modify important data, including metadata.
Be aware of metadata: what it is, where it is, how to preserve it, when to (and when not to) delete it. A lawsuit can be won or lost on metadata alone. Use it to your advantage.


REFERENCES

[1] http://bucks.blogs.nytimes.com/2010/06/01/why-photocopiers-have-hard-drives/?_r=0.
[2] http://bucks.blogs.nytimes.com/2010/05/20/the-identity-theft-threat-from-copiers/?scp=1&sq=copier&st=cse.
[3] http://www.cbsnews.com/stories/2010/04/19/eveningnews/main6412439.shtml.
[4] http://gizmodo.com/5901430/these-breasts-nailed-anonymous-hacker-in-fbi-case.
[5] http://www.dailymail.co.uk/news/article-2129257/Higinio-O-Ochoa-III-FBI-led-Anonymous-hacker-girlfriend-posts-picture-breasts-online.html.
[6] See http://en.wikipedia.org/wiki/Metadata_removal_tool.
[7] http://www.krollontrack.com/resource-library/rulesand-statutes/. (Double click on desired state.).
[8] http://www.ediscoverylaw.com/promo/state-districtcourt-rules/.
[9] Demystifying Computer Forensics, Louisiana State Bar Association, J. Hassell, Ph.D. and S. Steen, December 1999.

ABOUT THE AUTHORS

Dr. Johnette Hassell has 30 years' experience in computer-related litigation support. A retired computer science professor, she is a court-recognized expert in computer forensics, eDiscovery, computer science, and data recovery. She is a Certified eDiscovery Specialist, and serves on the Association of Certified eDiscovery Specialists (ACEDS) exam and exam preparation committees. Her work is published in law and technical journals and she is a highly sought-after lecturer in CLE courses. As President and CEO of Electronic Evidence Retrieval, Dr. Hassell provides consulting services ranging from early case assessment through testimony: http://www.ElectronicEvidenceRetrieval.com.
Jack Molisani is a Computer Engineer with almost 30 years' experience in software engineering, technical communication, and eDiscovery/computer forensics. He is a Fellow of the Society for Technical Communication and the Executive Director of The LavaCon Conference on Digital Media and Content Strategies: http://lavacon.org.

THE METADATA ANALYSIS TOOLS AND TECHNIQUES (HOW TO)
by Dr. Sameera de Alwis

Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information. An important reason for creating descriptive metadata is to facilitate the discovery of relevant information. In addition to resource discovery, metadata can help organize electronic resources, facilitate interoperability and the integration of legacy resources, provide digital identification, and support archiving and preservation. Metadata analysis is one of many different types of analysis. The interpretation of results from any single analysis method may be inconclusive. It is important to validate findings with additional analysis techniques and algorithms.

The word metadata refers to data about data, and it is ambiguous, as it is used for two fundamentally different concepts or types. Structural metadata is about the design and specification of data structures and is more properly called "data about the containers of data". Descriptive metadata, on the other hand, is about individual instances of application data, the data content. In this case, a useful description would be "data about data content" or "content about content", hence metacontent.
Metadata are traditionally found in the card catalogs of libraries. As information has become increasingly digital, metadata are also used to describe digital data, using metadata standards specific to a particular discipline. By describing the contents and context of data files, the quality of the original data, information or files is greatly increased. Metadata are defined as the data providing information about one or more aspects of the data, e.g. the purpose of the data, the creator or author of the data, the standards used, the time and date of creation, the means of creation of the data, and the location on the network where the data were created. As such, metadata can be stored and managed in a database, often called a metadata registry or metadata repository.
The basic metadata captured by computers can include the date and time, who created the object, the last update, the file size and the file extension. Metadata may also be written into a digital photograph or image file to identify who owns it, copyright and contact information, and which camera, capture device (such as a scanner), camera phone, tablet or editing software created the file, along with exposure information and descriptive information such as keywords about the photo, making the file searchable on computers, networks and the Internet. Photographic metadata standards are governed by the organizations that develop them. The main standards are EXIF (Exchangeable Image File Format), IPTC (International Press Telecommunications Council), XMP (Extensible Metadata Platform), ICC (International Color Consortium) Profiles, PLUS (Picture Licensing Universal System) and PrintIM (Epson Print Image Matching). Metadata are particularly useful for audiovisual material, where information about the content is not directly understandable by a computer. Web pages frequently include metadata in the form of meta tags. Most Internet search engines read these data when adding pages to their search index.
Metadata syntax refers to the rules created to structure the fields or elements of metadata. A single metadata scheme may be expressed in a number of different markup or programming languages, each of which requires a different syntax. International standards apply to metadata. Much work is being done in the national and international standards communities, particularly ANSI and ISO, to reach consensus on standardizing metadata and registries. It is important to note that the registry standard arising from this work refers to metadata as the data about containers of data and not to metadata as the data about the data contents. It should also be noted that this standard originally describes itself as a "data element" registry, describing disembodied data elements, and explicitly disclaims the capability of containing complex structures. Consequently, the original term "data element" is more applicable than the later applied buzzword "metadata".
Although not a standard, the microformat is a web-based approach to semantic markup which seeks to re-use existing HTML and XHTML tags to convey metadata. The microformat follows XHTML and HTML standards but is not a standard in itself. The HTML format used to define web pages allows for the inclusion of a variety of types of metadata, from basic descriptive text, dates/times and keywords to more advanced metadata schemes. The metadata may be included in the page's header or in a separate file. Microformats allow metadata to be added to on-page data in a way that users do not see but computers can readily access. Notably, many search engines are cautious about using metadata in their ranking algorithms because of the abuse of metadata and the practice of search engine optimization (SEO) to improve rankings.
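To show what metadata in a page header looks like to a program, the following minimal sketch collects the meta tags from an HTML document using Python's standard HTML parser. The class and function names are assumptions for illustration, and the sketch keeps only the last value seen for each key.

```python
from html.parser import HTMLParser

class MetaTagCollector(HTMLParser):
    """Collect <meta> name/content pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Meta tags identify themselves by one of these attributes.
        key = (attributes.get("name") or attributes.get("property")
               or attributes.get("http-equiv"))
        if key and attributes.get("content") is not None:
            self.meta[key.lower()] = attributes["content"]

def extract_meta(html_text):
    """Return a dict of meta tag keys (lowercased) to their content."""
    parser = MetaTagCollector()
    parser.feed(html_text)
    parser.close()
    return parser.meta
```

Fed the source of a web page, `extract_meta` returns items such as the author, description and keywords that visitors never see in the rendered page but that remain readable to anyone who looks.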
Discovery of Metadata

Different image formats contain different types of metadata. Some formats, such as BMP, PPM, and PBM, contain very little information beyond the image dimensions and color space. Although metadata does not identify the exact changes made to a picture, it can be used to detect attributes, inconsistencies, additional sources, edits, timelines, and a rough sense of how the resource was handled. In effect, metadata provides evidence about a resource's pedigree. There is no standard for a mandatory set of metadata that must be present in any particular image. Nevertheless, well-known tools generate recognizable metadata fields; some types of metadata are generated during a save and others are appended to existing data. Some may be updated, while others may be removed. By understanding the metadata and when it can appear, an investigator can build a timeline and identify the order of the changes made to the file.
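The timeline idea can be sketched in a few lines of Python. This is only a minimal illustration: it assumes the date/time tags have already been extracted into a dictionary, uses the "YYYY:MM:DD HH:MM:SS" layout in which Exif stores dates, and works on invented sample values.

```python
from datetime import datetime

EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"  # date/time layout used by Exif tags


def build_timeline(tags):
    """Return (timestamp, tag name) pairs sorted from oldest to newest."""
    events = []
    for name, value in tags.items():
        try:
            events.append((datetime.strptime(value, EXIF_FORMAT), name))
        except (TypeError, ValueError):
            continue  # skip tags that are not date/time values
    return sorted(events)


# Hypothetical values, as they might be reported for an edited photograph.
sample_tags = {
    "ModifyDate": "2013:07:04 18:22:10",
    "DateTimeOriginal": "2013:06:30 09:15:42",
    "Software": "Hypothetical Editor 1.0",
    "CreateDate": "2013:06:30 09:15:42",
}

for moment, name in build_timeline(sample_tags):
    print(moment.isoformat(sep=" "), name)
```

A modification date later than the original capture date, together with an editing program named in the Software tag, is the kind of ordering an investigator would want to note.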
Inspecting metadata requires extracting the information from the file. Plenty of GNU General Public License (open source), free, and commercial tools are available. A few of the most powerful metadata tools available are Exiv2, an open source tool that decodes Exif, IPTC, and XMP metadata; ExifTool, one of the most powerful command-line metadata extraction tools currently available, which supports a large number of file and metadata formats, including many that are vendor-specific; and IrfanView, a compact, easy-to-use and powerful image viewer/editor that supports more than 50 binary file types as well as HEX View/Analysis, EXIF, IPTC and JPEG Comments.
The following are example extraction and advanced image forensics techniques using Exiv2 and ExifTool.

EXIV2

Most forensics-focused ACTIONS:

pr | print (Print image metadata. This is the default action, i.e., the command exiv2 image.jpg will print a summary of the image Exif metadata)
ex | extract (Extract metadata to *.exv, XMP sidecar (*.xmp) and thumbnail image files. Modification commands can be applied on-the-fly)

Most forensics-focused OPTIONS:

-b (Show large binary values (default is to suppress them))
-u (Show unknown tags (default is to suppress tags which don't have a name))
-g key (Only output info for this Exiv2 key (grep). Multiple -g options can be used to grep info for several keys)
-n enc (Charset to use to decode Exif Unicode user comments. enc is a name understood by iconv_open(3), e.g., UTF-8)
-e tgt (Extract target(s) for the extract action. Possible targets are the same as those for the -d option, plus a target to extract preview images and a modifier to generate an XMP sidecar file)
-p mode (Print mode for the print action)
Possible modes are:
s (print a summary of the Exif metadata)
a (print Exif, IPTC and XMP metadata)
t (interpreted (translated) Exif tags)
v (plain Exif tag values)
h (hexdump of the Exif data)
i (IPTC datasets)
x (XMP properties)
c (JPEG comment)
p (list available image previews, sorted by preview image size in pixels)
-P flgs (Print flags for fine control of the tag list (print action). Allows control of the type of metadata as well as data columns included in the print output)
Valid flags are:
E (include Exif tags in the list)
I (IPTC datasets)
X (XMP properties)
x (print a column with the tag number)
g (group name)
k (key)
l (tag label)
n (tag name)
y (type)
c (number of components (count))
s (size in bytes)
v (plain data value)
t (interpreted (translated) data)
h (hexdump of the data)
Some Examples:

exiv2 *.jpg (Prints a summary of the Exif information for all JPEG files in the directory)
exiv2 -pi image.jpg (Prints the IPTC metadata of the image)
exiv2 -et img1.jpg img2.jpg (Extracts the Exif thumbnails from the two files into img1-thumb.jpg and img2-thumb.jpg)
exiv2 -ep1,2 image.jpg (Extracts previews 1 and 2 from the image to the files image-preview1.jpg and image-preview2.jpg)
exiv2 -eiX image.jpg (Extracts IPTC datasets into an XMP sidecar file image.xmp and in the process converts them to IPTC Core XMP schema)

ExifTool

Some Examples:

exiftool -a -u -g1 a.jpg (Print all meta information in an image, including duplicate and unknown tags, sorted by group (for family 1))
exiftool -common dir (Print common meta information for all images in dir)
exiftool -T -createdate -aperture -shutterspeed -iso dir > out.txt (List specified meta information in tab-delimited column form for all images in dir to an output text file named out.txt)
exiftool -s -ImageSize -ExposureTime b.jpg (Print ImageSize and ExposureTime tag names and values)
exiftool -l -canon c.jpg d.jpg (Print standard Canon information from two image files)
exiftool -r -w .txt -common pictures (Recursively extract common meta information from files in the pictures directory, writing text output to .txt files with the same names)
exiftool -b -ThumbnailImage image.jpg > thumbnail.jpg (Save thumbnail image from image.jpg to a file called thumbnail.jpg)
exiftool -b -JpgFromRaw -w _JFR.JPG -ext NEF -r . (Recursively extract JPG image from all Nikon NEF files in the current directory, adding _JFR.JPG for the name of the output JPG files)
exiftool -d '%r %a, %B %e, %Y' -DateTimeOriginal -S -s -ext jpg . (Print formatted date/time for all JPG files in the current directory)
exiftool -IFD1:XResolution -IFD1:YResolution image.jpg (Extract image resolution from EXIF IFD1 information (thumbnail image IFD))
exiftool '-*resolution*' image.jpg (Extract all tags with names containing the word Resolution from an image)
exiftool -xmp:author:all -a image.jpg (Extract all author-related XMP information from an image)
exiftool -xmp -b a.jpg > out.xmp (Extract complete XMP data record intact from a.jpg and write it to out.xmp using the special XMP tag (see the Extra tags in Image::ExifTool::TagNames))
exiftool -p '$filename has date $dateTimeOriginal' -q -f dir (Print one line of output containing the file name and DateTimeOriginal for each image in directory dir)
exiftool -ee -p '$gpslatitude, $gpslongitude, $gpstimestamp' a.m2ts (Extract all GPS positions from an AVCHD video)
exiftool -icc_profile -b -w icc image.jpg (Save complete ICC_Profile from an image to an output file with the same name and an extension of .icc)

exiftool -htmldump -w tmp/%f_%e.html t/images (Generate HTML pages from a hex dump of EXIF information in all images from the t/images directory. The output HTML files are written to the tmp directory (which is created if it didn't exist), with names of the form FILENAME_EXT.html)
exiftool -a -b -ee -embeddedimage -W Image_%.3g3.%s file.pdf (Extract embedded JPG and JP2 images from a PDF file. The output images will have file names like Image_#.jpg or Image_#.jp2, where # is the ExifTool family 3 embedded document number for the image)
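When many files have to be processed, it can be convenient to drive ExifTool from a script rather than read its text output by eye. The Python sketch below is one possible way of doing that, not the only one: it assumes ExifTool is installed and on the PATH, relies on ExifTool's -j (JSON output) and -G1 (group names) options in addition to the -a and -u options shown above, and the function names are our own. The sample output string is invented so that the parsing step can be tried without ExifTool.

```python
import json
import subprocess


def exiftool_command(paths):
    """Build an ExifTool call that prints all tags, with group names, as JSON."""
    # -j: JSON output, -a: allow duplicate tags, -u: show unknown tags,
    # -G1: prefix each tag name with its family 1 group
    return ["exiftool", "-j", "-a", "-u", "-G1", *paths]


def parse_exiftool_json(text):
    """Map each source file name to its dictionary of tags."""
    return {record["SourceFile"]: record for record in json.loads(text)}


def read_metadata(paths):
    """Run ExifTool (which must be installed) and return the parsed tags."""
    result = subprocess.run(
        exiftool_command(paths), capture_output=True, text=True, check=True
    )
    return parse_exiftool_json(result.stdout)


# Hypothetical output for one file, so the parsing can be shown without ExifTool.
sample_output = '[{"SourceFile": "a.jpg", "IFD0:Make": "ExampleCam", "ExifIFD:ISO": 200}]'
records = parse_exiftool_json(sample_output)
print(records["a.jpg"]["IFD0:Make"])
```

With ExifTool installed, read_metadata(["a.jpg"]) would return the same structure for a real file; a.jpg is only a placeholder name.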

CONCLUSION

Preservation metadata may, therefore, play a useful role in helping to ensure that digital information will remain accessible to future generations. When thinking about every component of a metadata implementation that must be addressed, the understandable temptation is to cut corners, quickly select a general-purpose standard, and worry about all the other issues later. After many years of selecting and applying metadata standards, however, it is worth closing with the assurance that the work will have to be done: either at the beginning, when mistakes are less expensive, or later, when the metadata implementation finally and inevitably ceases to work efficiently or to grow with the organization and its users. Paying careful attention to every issue, particularly to understanding the role the information plays for the business, will be rewarded when the metadata implementation becomes a critical tool for understanding the organization's mission today and far into the future.

ABOUT THE AUTHORS

Dr. Sameera de Aliws has over 20 years of experience in Information Technology, with an emphasis on information security and consulting. Key assignments have included security assessments, security architecture, business and systems analysis, and secure network/software design. The client base has included public utilities, aerospace, financial institutions, health maintenance organizations, education, law and prosecution, universities, militaries, police, telecommunications providers, retail, distribution, and manufacturing businesses in both the private and government sectors (local and global).

METADATA IN DIGITAL FORENSICS
by Bert Moss

In this article I will describe what metadata is, some metadata analysis and extraction tools, and the various techniques used to extract and analyze metadata, mainly from a digital forensics point of view.

As you may already know, data is usually described as a collection of facts, such as values or measurements. It can be numbers, words, measurements, observations or just descriptions of things.
Data is presented as:
Qualitative data - Contains descriptive information about something
Quantitative data - Contains numerical information (numbers)
Discrete data - Can only take certain values
Continuous data - Can take any value within a range

ABOUT METADATA

Simply put, metadata can be described as data about data. This descriptive information can be about a particular data set, object, or resource, including its format and when and by whom it was collected. Metadata can describe either physical or electronic resources. Note: The process of collecting metadata itself creates metadata traces.
The essential concept of metadata has existed for as long as information or data has been collected. An example of this concept can be found in a public library system, where the information in library card catalogs serves as a collection management and resource discovery tool which can then be indexed. This is a good example of metadata indexing.
Metadata helps to support the data that you produce; this is essential for retrieving information about a particular file or document at a later time. The average computer user generates data every day.
Even a simple file, such as a Word document or a spreadsheet, will contain metadata. In more advanced scenarios, data managers, who are usually more technically inclined, or computer specialists will create metadata manually.
At scientific or data research warehouses, where cataloging is of great significance, specialized software is used that usually allows a user to create and update metadata manually. In this scenario, it is not uncommon for the data producer and the metadata producer to be two different individuals or entities. In such an environment, however, the data producer and the metadata producer must work hand in hand and communicate well to ensure that the data and the metadata stay in tandem.

WORKING WITH METADATA (CREATION)

Creating metadata requires an understanding of both the data you are trying to describe and the metadata standard or scheme itself (for more information about the different metadata standards, visit http://en.wikipedia.org/wiki/Metadata_standards). This is important because you will need to decide how you will encode the information. Usually, a single disk file is created for each metadata record, where one disk file describes one data set. You can then use a tool (for instance, the USGS Online Metadata Editor, which is online freeware) to enter information into this disk file so that the metadata will conform to the appropriate standard.
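To make the "one disk file per data set" idea concrete, here is a small Python sketch that writes a metadata record to a sidecar file next to the data it describes. It is an illustration only: the field names, the .metadata.json suffix and the sample values are invented for this example and do not follow any particular metadata standard, which a real record would need to do.

```python
import json
import os
import tempfile


def write_metadata_record(data_path, record):
    """Write one metadata record to a sidecar file next to the data set."""
    sidecar_path = data_path + ".metadata.json"  # hypothetical naming scheme
    with open(sidecar_path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2, sort_keys=True)
    return sidecar_path


with tempfile.TemporaryDirectory() as folder:
    data_path = os.path.join(folder, "rainfall_2013.csv")  # hypothetical data set
    with open(data_path, "w", encoding="utf-8") as handle:
        handle.write("station,millimetres\nA1,12.5\n")

    # Hypothetical fields; a real record would use the chosen standard's elements.
    record = {
        "title": "Rainfall measurements 2013",
        "format": "CSV",
        "collected_by": "Example Data Producer",
        "collected_on": "2013-08-01",
    }
    sidecar = write_metadata_record(data_path, record)

    with open(sidecar, encoding="utf-8") as handle:
        stored = json.load(handle)

print(stored["title"], "-", stored["collected_by"])
```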

METADATA IN FORENSICS

Metadata Analysis / Extraction Techniques

In digital forensics, the recovered data needs to be properly documented. As mentioned earlier in the article, the data that is analyzed carries metadata.
Digital forensics industry standards require certified computer examiners or forensics experts to follow certain protocols during their investigations. The main objective of a properly conducted investigation or analysis of a computer or digital media by a professional examiner is to locate possible evidence by means of seizure, search, and retrieval, while maintaining the data integrity of the original or suspect media. This evidence must be able to be upheld in a court of law.
A good practice is to perform a hash of the suspect media prior to beginning any investigation. A forensically clean copy (sanitized copy) of the suspect media should then be made bit for bit. This copy is known as the evidence media.
Once the investigation is completed, another hash is performed against the evidence media to ensure that an exact match with the suspect media still exists.
Hashing is the process of deriving a fixed-length string of data that is, for practical purposes, unique to a digital item and can be used to verify that the item is still in its original state. A hash is obtained when a collection of information that you want to preserve is run through a hash function. This process is what we term hashing, and the resulting hash is effectively unique to the original content and can therefore be used as a fingerprint of that data. Since a hash acts as a fingerprint or data signature, it can be used to determine whether a set of data was modified.
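The before-and-after comparison described above can be sketched in Python with the standard hashlib module. This is a minimal illustration, assuming SHA-256 as the hash function (the article does not name one) and using a small temporary file with an invented name in place of a real evidence image.

```python
import hashlib
import os
import tempfile


def hash_file(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file or disk image in chunks so large media need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on a temporary file standing in for an evidence image.
with tempfile.TemporaryDirectory() as workdir:
    image_path = os.path.join(workdir, "evidence.img")  # hypothetical name
    with open(image_path, "wb") as handle:
        handle.write(b"original contents")

    before = hash_file(image_path)
    after_unchanged = hash_file(image_path)

    with open(image_path, "ab") as handle:
        handle.write(b"!")  # a single extra byte changes the hash
    after_modified = hash_file(image_path)

print("unchanged copy matches:", before == after_unchanged)
print("modified copy matches:", before == after_modified)
```

Even a one-byte change produces a different digest, which is what makes the hash usable as a fingerprint of the media.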
The evidence's metadata is extremely important, as it presents evidence as to:

when the data in question was created,
last accessed,
modified or
deleted;
by whom; and
at what time each action was performed.
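Some of these facts are recorded by the file system itself. As a hedged illustration, the Python sketch below reads the timestamps, owner identifier and size that the operating system keeps for a file. It does not cover deletion or application-level authorship, the meaning of the third timestamp differs between platforms, and it should only ever be run against a working copy, because merely opening files can update access times.

```python
import os
import tempfile
from datetime import datetime, timezone


def filesystem_times(path):
    """Return the file system timestamps and ownership details for a path."""
    info = os.stat(path)

    def as_utc(seconds):
        return datetime.fromtimestamp(seconds, tz=timezone.utc)

    return {
        "modified": as_utc(info.st_mtime),
        "accessed": as_utc(info.st_atime),
        # st_ctime is the creation time on Windows but the time of the last
        # metadata (inode) change on Unix-like systems.
        "changed_or_created": as_utc(info.st_ctime),
        "owner_id": info.st_uid,  # numeric owner; not meaningful on Windows
        "size_bytes": info.st_size,
    }


# Demonstration on a temporary file; a real case would use a working copy.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"sample")
    sample_path = handle.name

times = filesystem_times(sample_path)
os.remove(sample_path)

for name, value in times.items():
    print(name, value)
```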

Data can come in many forms, such as database files, document files, spreadsheet files, picture or media files, email and chat files, as well as temporary internet files (from browsers).
The commercial and free forensic tools listed later in this article are just a few of the tools that most digital forensic professionals, like myself, use to carry out metadata analysis during their investigations.
Recommended tools for metadata analysis in Windows-based environments are FTK, Paraben and Metadata Assistant, with MacQuisition being preferred for Mac OS X based environments. These tools are mostly automated, and do a terrific job of producing precise metadata extraction results when examining the evidence media.
You will be able to view, document and create reports for the metadata of the data set investigated. The metadata information can work hand in hand with the hashing results during an investigation. For instance, if the hashing results for a particular file, folder or media do not match after the investigation, then the metadata results can be used to determine which of the files included in the investigation or analysis were modified.
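One way to narrow down a mismatch is to hash every file individually before and after the analysis and compare the two lists. The Python sketch below illustrates that idea under simple assumptions: it uses SHA-256, reads each file fully into memory (fine for a demonstration, not for large media), and works on two invented files in a temporary folder. The function names are ours and do not correspond to any of the tools named above.

```python
import hashlib
import os
import tempfile


def hash_tree(root):
    """Return {relative path: SHA-256 hex digest} for every file under root."""
    manifest = {}
    for folder, _, files in os.walk(root):
        for name in files:
            full_path = os.path.join(folder, name)
            with open(full_path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            manifest[os.path.relpath(full_path, root)] = digest
    return manifest


def compare_manifests(before, after):
    """List files that were modified, added or removed between two manifests."""
    shared = before.keys() & after.keys()
    return {
        "modified": sorted(p for p in shared if before[p] != after[p]),
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
    }


with tempfile.TemporaryDirectory() as root:
    # Two hypothetical files standing in for the contents of the media.
    for name, data in {"report.txt": b"draft", "photo.raw": b"pixels"}.items():
        with open(os.path.join(root, name), "wb") as handle:
            handle.write(data)
    baseline = hash_tree(root)

    with open(os.path.join(root, "report.txt"), "wb") as handle:
        handle.write(b"edited")
    differences = compare_manifests(baseline, hash_tree(root))

print(differences)
```

The files reported as modified are the ones whose metadata (timestamps, last author and so on) would then deserve a closer look.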

METADATA OF A SIMPLE WORD DOCUMENT FILE (TITLED DOCUMENT 15)

Figure 1. Metadata of a simple Word document file (titled Document 15)

Figure 2. MetaData Assistant (Payne Consulting Group) Metadata Options Snapshot

Figure 1 shows the metadata of a simple Word document (titled Document 15). Here, you can see information about the Word document such as who created the document, the creation date, the last saved date and the date the document was last modified. It is this information that is contained in the generated forensic reports.
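For readers who want to see where such properties live, the following Python sketch reads the author and date fields from a Word document. It rests on an assumption the article does not state: that the file is in the newer .docx format, which is a ZIP package holding these properties in docProps/core.xml. Legacy .doc files store them differently and are not handled. The sample package is built in memory with invented values so that the code runs without a real document.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(docx_file):
    """Read author and date properties from a .docx path or file object."""
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    found = {}
    for label, tag in FIELDS.items():
        element = root.find(tag, NAMESPACES)
        found[label] = element.text if element is not None else None
    return found


# Hypothetical core properties, packed into an in-memory ZIP for the demo.
SAMPLE_CORE_XML = (
    "<cp:coreProperties"
    ' xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"'
    ' xmlns:dc="http://purl.org/dc/elements/1.1/"'
    ' xmlns:dcterms="http://purl.org/dc/terms/">'
    "<dc:creator>Example Author</dc:creator>"
    "<cp:lastModifiedBy>Example Editor</cp:lastModifiedBy>"
    "<dcterms:created>2013-08-01T09:00:00Z</dcterms:created>"
    "<dcterms:modified>2013-08-02T17:30:00Z</dcterms:modified>"
    "</cp:coreProperties>"
)

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("docProps/core.xml", SAMPLE_CORE_XML)
buffer.seek(0)

properties = read_core_properties(buffer)
print(properties)
```

Passing the path of a working copy of a real .docx file to read_core_properties would return the same four fields for that document.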

Figure 6. CAINE Metadata Snapshot

The snapshots below show metadata information from a few commercial and free forensic applications (Figure 2). The metadata options provide a list of criteria that can be used to produce the resulting metadata (Figure 3 and Figure 4).

Forensics Metadata Analysis / Extraction Tools (Commercial)

FTK v5.0 (Forensic Toolkit) by AccessData (Windows Based Platform)
EnCase Forensic v7.0 by Guidance Software (Windows Based Platform)
Metadata Assistant v4.0 by Payne Group (Windows Based Platform)
Helix v3.0 by e-Fense Carpe Datum (Windows/Mac OS X/Linux Based Platforms)
Paraben P2 Commander 2.0 by Paraben Corporation (Windows Based Platform)
BlackLight 2013 R1.1 by BlackBag Technologies (Windows/Mac OS X/iOS Based Platforms)
MacQuisition 2013 R1.1 by BlackBag Technologies (Mac OS X Based Platforms)

Figure 3. FTK Imager (AccessData) Metadata Snapshot

Figure 4. EnCase (by Guidance Software) Metadata Snapshot

Forensics Metadata Analysis / Extraction Tools (Free)

SANS Investigative Forensics Toolkit (SIFT) v2.1 (Ubuntu Platform) (Figure 5)
CAINE v3.0 (Linux Platform) (Figure 6)

Note

Keep in mind that the tools listed above, both commercial and free, offer far more features than just the analysis and extraction of metadata.
Figure 5. SANS Investigative Forensics Metadata Snapshot

ABOUT THE AUTHOR

Bert Moss is the president of Integrated Systems Explorers (Bahamas) (aka ISEBahamas) and a partner in Tri-Technology Ltd (Bahamas), both located in Nassau, The Bahamas. As an IT professional, he has over 24 years of experience in the field of Information Technology and 5 years of experience as a Computer Forensic Examiner. Email: isebahamas@coralwave.com
