Mining Massive RFID, Trajectory, and Traffic Data Sets

RFID Data Mining
• Radio Frequency Identification (RFID)
• Technology that allows a sensor (reader) to read, from a distance and without line of sight, a unique electronic product code (EPC) associated with a tag
[Figure: the reader interrogates the tag, the tag returns its EPC, and the reader passes (EPC, time) to the server]

• Applications
– Supply chain management: real‐time inventory tracking
– Airline luggage management: reduce lost/misplaced luggage
– Medical: implant patients with a tag that contains their medical history
– Access control: toll collection
• Challenges
– Data generated by RFID systems is enormous (petabytes in scale!) due to redundancy and a low level of abstraction
• Data analysis requirements
– Highly compact summary of the data
– Operations on a multi‐dimensional view of the data
– Preserving the path structures of RFID data for analysis
– Efficiently drilling down to individual tags when an interesting pattern is discovered
RFID Data Warehousing and Mining
[Figure: layered architecture, from top to bottom]
• Mining Engine: Flow Mining, Traffic Mining, Other
– Adaptive Fastest Path Computation on a Road Network: A Traffic Mining Approach [Gonzalez et al., VLDB'07]
– FlowCube: Constructing RFID FlowCubes for Multi‐Dimensional Analysis of Commodity Flows [Gonzalez et al., VLDB'06]
– Mining Compressed Commodity Workflows From Massive RFID Data Sets [Gonzalez et al., CIKM'06]
• RFID Warehouse, built by the Warehousing Engine
– Warehousing and Analyzing Massive RFID Data Sets [Gonzalez et al., ICDE'06] (Best Student Paper)
• Data Cleaning
– Cost‐Conscious Cleaning of Massive RFID Data Sets [Gonzalez et al., ICDE'07]
• RFID data collected at Site 1, Site 2, …, Site k
Cleaning of RFID Data Records
• Raw data
– (EPC, location, time)
– Duplicate records due to multiple readings of a product at the same location
– (r1, l1, t1) (r1, l1, t2) ... (r1, l1, t10)
• Cleansed data: the minimal information to store; the raw data is then removed
– (EPC, location, time_in, time_out)
– (r1, l1, t1, t10)
• Warehousing can help fill in missing records and correct wrongly registered information
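The cleaning step above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: readings are (EPC, location, time) tuples with comparable timestamps, and a stay ends as soon as the tag is read at a different location. The function name clean_readings is hypothetical and does not come from the cited papers.

```python
from collections import defaultdict

def clean_readings(raw):
    """Collapse raw (epc, location, time) readings into stay records.

    Consecutive readings of one EPC at one location become a single
    (epc, location, time_in, time_out) tuple.
    """
    by_epc = defaultdict(list)
    for epc, location, time in raw:
        by_epc[epc].append((time, location))

    stays = []
    for epc, readings in by_epc.items():
        readings.sort()  # order each tag's readings by time
        time_in, cur_loc = readings[0]
        time_out = time_in
        for time, location in readings[1:]:
            if location == cur_loc:
                time_out = time  # same stay: extend it
            else:
                stays.append((epc, cur_loc, time_in, time_out))
                cur_loc, time_in, time_out = location, time, time
        stays.append((epc, cur_loc, time_in, time_out))
    return stays

# Ten duplicate readings at l1, then two at l2 (illustrative data).
raw = [("r1", "l1", t) for t in range(1, 11)] + [("r1", "l2", 11), ("r1", "l2", 12)]
cleaned = clean_readings(raw)
print(cleaned)  # [('r1', 'l1', 1, 10), ('r1', 'l2', 11, 12)]
```

Twelve raw readings shrink to two stay records, matching the (r1, l1, t1, t10) example.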
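Bulky Object Movements
[Figure: goods flow from the Factory through Dist. Center 1 and Dist. Center 2 to store 1 and store 2, and on to shelf 1 and shelf 2. Shipment labels: 10 pallets (1000 cases), 20 cases (1000 packs), 10 packs (12 sodas). Hierarchical labels 1, 1.1, 1.2, 1.1.1, 1.1.2, 1.1.1.1, and 1.1.1.2 mark the nodes from the factory down to the shelves.]
• Example record: [{i1, i2, …, i10000}, Dist. Center 1, 01/01/08, 01/03/08]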
Data Compression with GID
• Bulky object movements
– Objects often move and stay together
– If 1000 packs of soda stay together at the distribution center, register a single record: (GID, distribution center, time_in, time_out)
– GID is a generalized identifier that represents the 1000 packs that stayed together at the distribution center
[Figure: the bulky object movements diagram, repeated]
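A rough sketch of the compression idea, assuming cleansed stay records of the form (EPC, location, time_in, time_out). Items with identical location and times are grouped under one generalized identifier. The names compress_stays and the "g0", "g1", ... labels are placeholders for illustration, not the naming scheme used in the papers.

```python
from collections import defaultdict

def compress_stays(stays):
    """Group items with identical (location, time_in, time_out) under one GID."""
    groups = defaultdict(list)
    for epc, location, time_in, time_out in stays:
        groups[(location, time_in, time_out)].append(epc)

    gid_members = {}  # GID -> list of EPCs it stands for
    records = []      # compressed (GID, location, time_in, time_out) records
    for n, (key, epcs) in enumerate(sorted(groups.items())):
        gid = f"g{n}"  # hypothetical label, for illustration only
        gid_members[gid] = epcs
        records.append((gid, *key))
    return records, gid_members

# 1000 packs that stay together at the distribution center (illustrative data).
stays = [(f"i{k}", "dist_center_1", "01/01/08", "01/03/08") for k in range(1, 1001)]
records, members = compress_stays(stays)
print(len(stays), "->", len(records))  # 1000 -> 1
```

The 1000 individual stay records collapse into one, and the EPC list is kept once in a separate mapping.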
GID Naming
• GID name encodes the path
• Benefit: speed
– Reduces GID intersections
• Cost: space
– |Locations|
– Path length
[Figure: a tree over locations l1 to l6 whose nodes carry path‐encoded GIDs 0.0, 0.1, 0.0.0, 0.1.0, 0.1.1, 0.0.0.0, and 0.1.0.1]
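One way to see the speed benefit: when a GID encodes its path, checking whether one group descends from another becomes a prefix test on the names and no item lists need to be intersected. The helper below is a minimal sketch assuming dot-separated GIDs as in the figure; is_ancestor is a hypothetical name.

```python
def is_ancestor(gid_a, gid_b):
    """Return True if gid_a is a proper path prefix of gid_b.

    The comparison is done per component, so "0.1" is not treated
    as a prefix of "0.10".
    """
    a, b = gid_a.split("."), gid_b.split(".")
    return len(a) < len(b) and b[:len(a)] == a

print(is_ancestor("0.1", "0.1.0.1"))  # True
print(is_ancestor("0.0", "0.1.0"))    # False
```

The space cost noted on the slide shows up here as well: names grow with the path length.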
RFID‐Cuboid Construction Algorithm
1. Build a prefix tree for the paths in the cleansed database
2. For each node, record a separate measure for each group of items that share the same leaf and information record
3. Assign GIDs to each node: GID = parent GID + unique id
4. Each node generates a stay record for each distinct measure
5. If multiple nodes share the same location, time, and measure, generate a single record with multiple GIDs
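The steps above can be sketched as follows. This is a simplified illustration, not the published algorithm: it assumes each item's path is a list of (location, time_in, time_out) stages, uses a plain item count as the only measure, and leaves out step 5 (merging nodes that share location, time, and measure) and the grouping by information record in step 2. Node and build_cuboid are hypothetical names.

```python
class Node:
    def __init__(self, gid, stage=None):
        self.gid = gid      # path-encoded identifier
        self.stage = stage  # (location, time_in, time_out), None for the root
        self.children = {}  # stage -> Node
        self.count = 0      # measure: number of items passing through

def build_cuboid(paths):
    """paths: dict mapping EPC -> list of (location, time_in, time_out)."""
    root = Node("0")
    # Step 1: build a prefix tree over the paths.
    for stages in paths.values():
        node = root
        for stage in stages:
            if stage not in node.children:
                # Step 3: GID = parent GID + unique id among siblings.
                child_gid = f"{node.gid}.{len(node.children)}"
                node.children[stage] = Node(child_gid, stage)
            node = node.children[stage]
            node.count += 1  # simplified measure for step 2

    # Step 4: every non-root node emits one stay record.
    stay, map_table = [], {}
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:
            map_table[node.gid] = [c.gid for c in node.children.values()]
        if node.stage is not None:
            stay.append((node.gid, *node.stage, node.count))
        stack.extend(node.children.values())
    return sorted(stay), map_table

# Illustrative paths for three items.
paths = {
    "i1": [("factory", 1, 2), ("dc1", 3, 5), ("store1", 6, 9)],
    "i2": [("factory", 1, 2), ("dc1", 3, 5), ("store2", 6, 8)],
    "i3": [("factory", 1, 2), ("dc2", 3, 4)],
}
stay, map_table = build_cuboid(paths)
for record in stay:
    print(record)
print(map_table)
```

Items i1 and i2 share the factory and dc1 nodes, so those stages are stored once with a count of 3 and 2 respectively.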
Three RFID‐Cuboids
• Stay Table: (GIDs, location, time_in, time_out: measures)
– Records information on items that stay together at a given location
– The alternative of recording transitions makes queries difficult to answer, because many intersections are needed
• Map Table: (GID, <GID1, ..., GIDn>)
– Links together stages that belong to the same path, which provides additional compression and query‐processing efficiency
– A high‐level GID points to lower‐level GIDs
– The alternative of saving complete EPC lists incurs high I/O costs to retrieve long lists and makes query processing costly
• Information Table: (EPC list, attribute 1, ..., attribute n)
– Records path‐independent attributes of the items, e.g., color, manufacturer, price
Frequent Pattern and Sequential Pattern Analysis
• Frequent patterns and sequential patterns can be related to movement segments and paths
• Taking movement segments and paths as base units, one can perform multi‐dimensional frequent pattern and sequential pattern analysis
• Correlation analysis can be performed in a similar way
– Correlation components can be stays, move segments, and paths
• Efficient and scalable algorithms can be developed using the warehouse modeling
Outlier Analysis in RFID Data
• Outlier detection in RFID data is a by‐product of other mining tasks
– Data flow analysis: detect those not in the major flows
– Classification: treat outliers and normal data as different class labels
– Cluster analysis: identify those that deviate substantially from the major clusters
– Trend analysis: those not following the major trend
– Frequent pattern and sequential pattern analysis: anomalous patterns
Trajectory Data Mining
• A trajectory is a sequence of the locations and timestamps of a moving object
• Satellite, sensor, RFID, and wireless technologies have improved rapidly
– This yields tremendous amounts of trajectory data of moving objects
T‐Pattern Mining
• Convert each trajectory to a sequence, i.e., by converting a location (x, y) into a region
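The conversion can be illustrated with a uniform grid. The slide does not say how regions are defined, so the square cells of side cell_size used here are purely an assumption, as is the function name to_region_sequence.

```python
def to_region_sequence(trajectory, cell_size):
    """Map (x, y, t) points to grid cells, merging consecutive repeats.

    Returns a list of (region, entry_time) pairs, where region is the
    (column, row) index of the grid cell.
    """
    sequence = []
    for x, y, t in trajectory:
        region = (int(x // cell_size), int(y // cell_size))
        if not sequence or sequence[-1][0] != region:
            sequence.append((region, t))  # object enters a new region
    return sequence

# Illustrative trajectory of four (x, y, t) samples.
traj = [(0.5, 0.2, 0), (0.9, 0.8, 1), (1.4, 0.7, 2), (2.1, 1.3, 3)]
regions = to_region_sequence(traj, cell_size=1.0)
print(regions)  # [((0, 0), 0), ((1, 0), 2), ((2, 1), 3)]
```

Once trajectories are region sequences, sequential pattern mining can treat regions as items.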
Sample T‐Patterns
[Figure: sample T‐patterns. Data source: trucks in Athens, 273 trajectories]
Periodic Pattern (Mamoulis et al. 04)
• In many applications, objects follow the same routes (approximately) over regular time intervals
– e.g., Bob wakes up at the same time and then follows, more or less, the same route to his work every day
[Figure: Bob's route on Day 1, Day 2, and Day 3]
Four Kinds of Relative Motion Patterns (Laube et al. 04, Gudmundsson et al. 07)
• Flock (parameters: m > 1 and r > 0): at least m entities are within a circular region of radius r and they move in the same direction
• Leadership (parameters: m > 1, r > 0, and s > 0): at least m entities are within a circular region of radius r, they move in the same direction, and at least one of the entities was already heading in this direction for at least s time steps
• Convergence (parameters: m > 1 and r > 0): at least m entities will pass through the same circular region of radius r (assuming they keep their direction)
• Encounter (parameters: m > 1 and r > 0): at least m entities will be simultaneously inside the same circular region of radius r (assuming they keep their speed and direction)
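As a rough illustration of the flock definition, the sketch below checks a single time step. It makes two simplifications that are not part of the cited definitions: only circles centered on one of the entities are tested, so some flocks can be missed, and "same direction" is taken to mean a heading within angle_tol radians of the center entity's heading. find_flock and angle_tol are hypothetical names.

```python
import math

def find_flock(entities, m, r, angle_tol=math.pi / 8):
    """entities: list of (id, x, y, heading_in_radians) at one time step.

    Returns the ids of the first flock candidate found, or None.
    """
    for _, cx, cy, ch in entities:
        members = []
        for eid, x, y, h in entities:
            close = math.hypot(x - cx, y - cy) <= r
            # Smallest absolute difference between the two headings.
            diff = abs((h - ch + math.pi) % (2 * math.pi) - math.pi)
            if close and diff <= angle_tol:
                members.append(eid)
        if len(members) >= m:
            return members
    return None

# Three nearby entities heading roughly east, one far away (illustrative data).
entities = [("a", 0, 0, 0.0), ("b", 1, 0, 0.1), ("c", 0, 1, -0.1), ("d", 10, 10, 3.0)]
print(find_flock(entities, m=3, r=2))  # ['a', 'b', 'c']
print(find_flock(entities, m=4, r=2))  # None
```

Leadership, convergence, and encounter would need several time steps or extrapolated positions, which this sketch does not attempt.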
Trajectory clustering
• Existing algorithms group trajectories as a whole
→ They might not be able to find similar portions of trajectories
[Figure: trajectories TR1 to TR5 that share a common sub‐trajectory]
• The partition‐and‐group framework is proposed to discover common sub‐trajectories
Trajectory
• Motif + outlier detection
[Figure: an example of a normal trajectory, and an unusual trajectory whose unusual points are shown in black]
Traffic Data
car_id  eid  Time  Speed  Conditions
1       1    10    30     rain, no construction, accident
1       2    12    25     rain, no construction, no accident
2       10   11    60     good weather, no construction, accident
...     ...  ...   ...    ...
• Data sources noted on the slide
– car_id: RFID: yes; loop sensor: no
– Speed: computed from successive observations
– Conditions: National Weather Service, Transportation Management Centers
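A small sketch of how such records might be summarized before mining, for example by averaging observed speed per weather condition. The rows are the three shown in the table; splitting the Conditions column into weather, construction, and accident fields is an assumption, and mean_speed_by is a hypothetical helper.

```python
from collections import defaultdict

rows = [
    # (car_id, eid, time, speed, weather, construction, accident)
    (1, 1, 10, 30, "rain", False, True),
    (1, 2, 12, 25, "rain", False, False),
    (2, 10, 11, 60, "good weather", False, True),
]

def mean_speed_by(rows, key):
    """Average the speed column over groups defined by key(row)."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [speed sum, row count]
    for row in rows:
        acc = totals[key(row)]
        acc[0] += row[3]
        acc[1] += 1
    return {group: total / n for group, (total, n) in totals.items()}

by_weather = mean_speed_by(rows, key=lambda r: r[4])
print(by_weather)  # {'rain': 27.5, 'good weather': 60.0}
```

The same helper can group by edge id or by any combination of conditions by changing the key function.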
Traffic data mining
• Route planning
• Driving pattern
• Hot road detection
Major References
• J.‐G. Lee, J. Han, X. Li, and H. Gonzalez, "TraClass: Trajectory Classification Using Hierarchical Region‐Based and Trajectory‐Based Clustering", VLDB'08.
• J.‐G. Lee, J. Han, and X. Li, "Trajectory Outlier Detection: A Partition‐and‐Detect Framework", ICDE'08.
• H. Gonzalez, J. Han, X. Li, M. Myslinska, and J. P. Sondag, "Adaptive Fastest Path Computation on a Road Network: A Traffic Mining Approach", VLDB'07.
• X. Li, J. Han, J.‐G. Lee, and H. Gonzalez, "Traffic Density‐Based Discovery of Hot Routes in Road Networks", SSTD'07.
• J.‐G. Lee, J. Han, and K.‐Y. Whang, "Trajectory Clustering: A Partition‐and‐Group Framework", SIGMOD'07.
• X. Li, J. Han, S. Kim, and H. Gonzalez, "ROAM: Rule‐ and Motif‐Based Anomaly Detection in Massive Moving Object Data Sets", SDM'07.
• H. Gonzalez, J. Han, and X. Shen, "Cost‐Conscious Cleaning of Massive RFID Data Sets", ICDE'07.
• H. Gonzalez, J. Han, and X. Li, "Mining Compressed Commodity Workflows from Massive RFID Data Sets", CIKM'06.
• H. Gonzalez, J. Han, X. Li, and D. Klabjan, "Warehousing and Analyzing Massive RFID Data Sets", ICDE'06.
• J. Han, H. Gonzalez, X. Li, and D. Klabjan, "Warehousing and Mining Massive RFID Data Sets", ADMA'06.
• X. Li, J. Han, and S. Kim, "Motion‐Alert: Automatic Anomaly Detection in Massive Moving Objects", ISI'06.
