Cost‐Conscious Cleaning of Massive RFID Data Sets [Gonzalez et al., ICDE'07]
[Figure labels: Interrogate; EPC; (EPC, time); RFID Data at Site 1, Site 2, …, Site k]
Cleaning of RFID Data Records
• Raw Data
– (EPC, location, time)
– Duplicate records due to multiple readings of a product at the same location
– (r1, l1, t1) (r1, l1, t2) ... (r1, l1, t10)
• Cleansed Data: Minimal information to store and removal of raw data
– (EPC, location, time_in, time_out)
– (r1, l1, t1, t10)
• Warehousing can help fill in missing records and correct wrongly registered information
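The cleaning step above can be sketched in a few lines. This is a minimal illustration, assuming integer timestamps and that any run of consecutive readings of one EPC at one location forms a single stay. The names `Stay` and `clean_readings` are hypothetical and are not taken from the cited paper.

```python
from collections import defaultdict
from typing import NamedTuple


class Stay(NamedTuple):
    epc: str
    location: str
    time_in: int
    time_out: int


def clean_readings(raw):
    """Collapse raw (EPC, location, time) readings into stay records."""
    by_epc = defaultdict(list)
    for epc, location, time in raw:
        by_epc[epc].append((time, location))

    stays = []
    for epc, readings in by_epc.items():
        readings.sort()  # order each tag's readings by time
        time_in, current = readings[0]
        time_out = time_in
        for time, location in readings[1:]:
            if location == current:
                time_out = time  # still at the same location: extend the stay
            else:
                stays.append(Stay(epc, current, time_in, time_out))
                current, time_in, time_out = location, time, time
        stays.append(Stay(epc, current, time_in, time_out))
    return stays


# Ten readings of r1 at l1 (t = 1..10), then two readings at l2.
raw = [("r1", "l1", t) for t in range(1, 11)] + [("r1", "l2", 11), ("r1", "l2", 12)]
stays = clean_readings(raw)
print(stays)
```

Twelve raw readings shrink to two cleansed records, which mirrors the (r1, l1, t1) ... (r1, l1, t10) to (r1, l1, t1, t10) example.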
Bulky Object Movements
[Figure: 10 packs (12 sodas), 20 cases (1000 packs), …, shelf 1]
[{i1,i2,…,i10000}, Dist Center 1, 01/01/08, 01/03/08]
Data Compression with GID
• Bulky object movements
– Objects often move and stay together
– If objects stay together at the distribution center, register a single record
(GID, distribution center, time_in, time_out)
– GID is a generalized identifier that represents the 1000 packs that stayed together at the distribution center
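As a rough illustration of the compression idea, the sketch below groups items that share the same location, time_in and time_out and emits one record per group. It is a simplification that assumes grouping on a single stay and ignores the rest of the path. The function name `compress_stays` and the `g0`, `g1`, … identifiers are hypothetical.

```python
from collections import defaultdict


def compress_stays(stays):
    """Replace per-item stays with one (GID, location, time_in, time_out) per group."""
    groups = defaultdict(list)
    for epc, location, time_in, time_out in stays:
        groups[(location, time_in, time_out)].append(epc)

    gid_records = []   # compressed stay records
    gid_members = {}   # GID -> the EPCs it stands for
    for n, (key, epcs) in enumerate(sorted(groups.items())):
        gid = f"g{n}"  # hypothetical naming, not the path-encoding GID scheme
        gid_members[gid] = sorted(epcs)
        gid_records.append((gid, *key))
    return gid_records, gid_members


# 1000 packs that stay together at the distribution center.
stays = [(f"i{k}", "Dist Center 1", "01/01/08", "01/03/08") for k in range(1, 1001)]
records, members = compress_stays(stays)
print(records)
print(len(members["g0"]))
```

One record plus a membership list replaces 1000 individual stay records.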
[Figure: Factory → Dist. Center 1, Dist. Center 2, … → store 1, store 2, … → shelf 1, shelf 2, …; shipments of 10 pallets (1000 cases), 20 cases (1000 packs), 10 packs (12 sodas)]
GID Naming
• GID Name Encodes Path
– Path length
[Figure labels: locations l1, l2, l5, l6; GIDs 0.0, 0.1]
RFID‐Cuboid Construction Algorithm
1. Build a prefix tree for the paths in the cleansed database
2. For each node, record a separate measure for each group of items that share the same leaf and information record
3. Assign GIDs to each node:
GID = parent GID + unique id
4. Each node generates a stay record for each distinct measure
5. If multiple nodes share the same location, time, and measure, generate a single record with multiple GIDs
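The steps above can be sketched as follows. This is a simplified illustration, not the published algorithm: it assumes each path is a list of (location, time_in, time_out) stages, uses an item count as the only measure, and skips the grouping by information record in step 2. The names `Node`, `build_prefix_tree` and `stay_records` are hypothetical.

```python
from collections import defaultdict


class Node:
    def __init__(self, gid, stage):
        self.gid = gid        # e.g. "0.0.1"
        self.stage = stage    # (location, time_in, time_out); None for the root
        self.children = {}    # stage -> Node
        self.count = 0        # measure: number of items passing through


def build_prefix_tree(paths):
    """Step 1 and 3: insert every path and name each node after its parent."""
    root = Node("0", None)
    for stages in paths.values():
        node = root
        for stage in stages:
            if stage not in node.children:
                # GID = parent GID + unique id among siblings
                gid = f"{node.gid}.{len(node.children)}"
                node.children[stage] = Node(gid, stage)
            node = node.children[stage]
            node.count += 1
    return root


def stay_records(root):
    """Steps 4 and 5: one record per (location, time_in, time_out, measure)."""
    merged = defaultdict(list)
    stack = list(root.children.values())
    while stack:
        node = stack.pop()
        merged[(*node.stage, node.count)].append(node.gid)
        stack.extend(node.children.values())
    return sorted((sorted(gids), *key) for key, gids in merged.items())


paths = {
    "r1": [("l1", 1, 2), ("l2", 3, 4)],
    "r2": [("l1", 1, 2), ("l2", 3, 4)],
    "r3": [("l1", 1, 2), ("l3", 3, 5)],
}
records = stay_records(build_prefix_tree(paths))
for record in records:
    print(record)
```

Items r1 and r2 share the whole path and so share every GID, while r3 branches off after l1 and gets its own node for l3.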
Three RFID‐Cuboids
• Stay Table: (GIDs, location, time_in, time_out: measures)
– Records information on items that stay together at a given location
– If using record transitions: difficult to answer queries, lots of intersections needed
• Map Table: (GID, <GID1,..,GIDn>)
– Links together stages that belong to the same path. Provides additional compression and query processing efficiency
– High level GID points to lower level GIDs
– If saving complete EPC lists: high I/O cost to retrieve long lists, costly query processing
• Information Table: (EPC list, attribute 1,...,attribute n)
– Records path‐independent attributes of the items, e.g., color, manufacturer, price
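To show how the stay and map tables might work together, here is a small sketch that answers "which items were at a given location". The table contents, the in-memory layout, and the functions `expand` and `items_at` are assumptions made for illustration. In particular, the sketch assumes the lowest-level GIDs map directly to EPCs, and it leaves out the information table.

```python
# Stay table rows: (GIDs, location, time_in, time_out, measure)
stay_table = [
    (["0.0"], "factory", 1, 2, 3),
    (["0.0.0"], "store 1", 3, 4, 2),
    (["0.0.1"], "store 2", 3, 5, 1),
]

# Map table: a high-level GID points to lower-level GIDs;
# the lowest level is assumed to point to EPCs.
map_table = {
    "0.0": ["0.0.0", "0.0.1"],
    "0.0.0": ["r1", "r2"],
    "0.0.1": ["r3"],
}


def expand(gid, map_table):
    """Follow map-table links from a GID down to the EPCs it represents."""
    epcs = []
    for child in map_table[gid]:
        if child in map_table:
            epcs.extend(expand(child, map_table))
        else:
            epcs.append(child)
    return epcs


def items_at(location, stay_table, map_table):
    """Return the EPCs of all items that stayed at the given location."""
    found = []
    for gids, loc, _time_in, _time_out, _measure in stay_table:
        if loc == location:
            for gid in gids:
                found.extend(expand(gid, map_table))
    return sorted(found)


print(items_at("factory", stay_table, map_table))
print(items_at("store 1", stay_table, map_table))
```

The query touches one stay record per location and expands GIDs only when the individual EPCs are needed, instead of scanning one record per item.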
Frequent Pattern and Sequential Pattern Analysis
• Frequent patterns and sequential patterns can be related to movement segments and paths
• Taking movement segments and paths as base units, one can perform multi‐dimensional frequent pattern and sequential pattern analysis
• Correlation analysis can be performed in a similar way
– Correlation components can be stay, move segments, and paths
• Efficient and scalable algorithms can be developed using the warehouse modeling
Outlier Analysis in RFID Data
• Outlier detection in RFID data is a by‐product of other mining tasks
– Data flow analysis: Detect those not in the major flows
– Classification: Treat outliers and normal data as different class labels
– Cluster analysis: Identify those that deviate substantially from the major clusters
– Trend analysis: Those not following the major trend
– Frequent pattern and sequential pattern analysis: anomaly patterns
Trajectory Data Mining
• A trajectory is a sequence of the locations and timestamps of a moving object
• Satellite, sensor, RFID, and wireless technologies have improved rapidly
– Tremendous amounts of trajectory data of moving objects
T‐Pattern Mining
• Convert each trajectory to a sequence, i.e., by converting a location (x,y) into a region
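One simple way to picture the conversion is a uniform grid, where each (x, y) point is replaced by the cell it falls in. The grid, the `cell_size` parameter and the function name `to_region_sequence` are assumptions for illustration. The slide does not say how regions are defined.

```python
def to_region_sequence(trajectory, cell_size):
    """Map each (x, y, t) point to a grid cell and drop consecutive repeats."""
    sequence = []
    for x, y, _t in trajectory:
        region = (int(x // cell_size), int(y // cell_size))
        if not sequence or sequence[-1] != region:
            sequence.append(region)
    return sequence


trajectory = [(0.5, 0.2, 0), (0.9, 0.7, 1), (1.4, 0.8, 2), (2.1, 1.6, 3)]
regions = to_region_sequence(trajectory, cell_size=1.0)
print(regions)
```

The first two points fall in the same cell, so four points become a sequence of three regions that a sequential pattern miner can consume.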
Sample T‐Patterns
(Data source: trucks in Athens, 273 trajectories)
Periodic Pattern (Mamoulis et al. 04)
• In many applications, objects follow the same routes (approximately) over regular time intervals
– e.g., Bob wakes up at the same time and then follows, more or less, the same route to his work every day
[Figure: routes on Day 1, Day 2, Day 3]
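A toy version of the idea: cut a region sequence into segments of one period each and, for every offset within the period, keep the region that recurs often enough. This is only a sketch under stated assumptions (a known period, one region per time slot, a hypothetical `min_support` threshold) and is not the algorithm of the cited paper.

```python
from collections import Counter


def periodic_pattern(regions, period, min_support):
    """Return one region per offset, or "*" where no region is frequent enough."""
    n_periods = len(regions) // period
    if n_periods == 0:
        raise ValueError("sequence is shorter than one period")
    pattern = []
    for offset in range(period):
        counts = Counter(regions[p * period + offset] for p in range(n_periods))
        region, count = counts.most_common(1)[0]
        pattern.append(region if count / n_periods >= min_support else "*")
    return pattern


# Three days with three time slots each: Bob takes road B once.
days = [
    "home", "road A", "work",   # Day 1
    "home", "road B", "work",   # Day 2
    "home", "road A", "work",   # Day 3
]
strict = periodic_pattern(days, period=3, min_support=1.0)
loose = periodic_pattern(days, period=3, min_support=0.6)
print(strict)
print(loose)
```

With full support required, only "home" and "work" survive. Lowering the threshold admits "road A", which matches the "more or less the same route" wording.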
Four Kinds of Relative Motion Patterns (Laube et al. 04,
Gudmundsson et al. 07)
Trajectory clustering
• Existing algorithms group trajectories as a whole
⇒ They might not be able to find similar portions of trajectories
[Figure: trajectories TR1, TR2, TR3, TR4, TR5 sharing a common sub‐trajectory]
• The partition‐and‐group framework is proposed to discover common sub‐trajectories
Trajectory
• Motif + Outlier detection
[Figure labels: Computed; Successive; Observations; RFID: Yes; Loop Sensor: No; National Weather Service; Transportation Management Centers]
Traffic data mining
• Route planning
• Driving pattern
• Hot road detection
Major References
• J.‐G. Lee, J. Han, X. Li, and H. Gonzalez, "TraClass: Trajectory Classification Using Hierarchical Region‐Based and Trajectory‐Based Clustering", VLDB'08.
• J.‐G. Lee, J. Han, and X. Li, "Trajectory Outlier Detection: A Partition‐and‐Detect Framework", ICDE'08.
• H. Gonzalez, J. Han, X. Li, M. Myslinska, and J. P. Sondag, "Adaptive Fastest Path Computation on a Road Network: A Traffic Mining Approach", VLDB'07.
• X. Li, J. Han, J.‐G. Lee, and H. Gonzalez, "Traffic Density‐based Discovery of Hot Routes in Road Networks", SSTD'07.
• J.‐G. Lee, J. Han, and K.‐Y. Whang, "Trajectory Clustering: A Partition‐and‐Group Framework", SIGMOD'07.
• X. Li, J. Han, S. Kim, and H. Gonzalez, "ROAM: Rule‐ and Motif‐Based Anomaly Detection in Massive Moving Object Data Sets", SDM'07.
• H. Gonzalez, J. Han, and X. Shen, "Cost‐Conscious Cleaning of Massive RFID Data Sets", ICDE'07.
• H. Gonzalez, J. Han, and X. Li, "Mining compressed commodity workflows from massive RFID data sets", CIKM'06.
• H. Gonzalez, J. Han, X. Li, and D. Klabjan, "Warehousing and Analyzing Massive RFID Data Sets", ICDE'06.
• J. Han, H. Gonzalez, X. Li, and D. Klabjan, "Warehousing and mining massive RFID data sets", ADMA'06.
• X. Li, J. Han, and S. Kim, "Motion‐alert: Automatic anomaly detection in massive moving objects", ISI'06.