VOL. 28, APRIL 2016
INTRODUCTION
short). Unlike conventional cache-based path planning systems, where a cached query result is returned only when it completely matches a new query, PPC leverages partially matched queried-paths in the cache to answer part(s) of the new query. As a result, the server only needs to compute the unmatched path segments, thus significantly reducing the overall system workload.
Fig. 1 provides an overview of the proposed PPC system framework, which consists of three main components (shown in rectangular boxes): (i) PPattern Detection, (ii) Shortest Path Estimation, and (iii) Cache Management. Given a path planning query (see Step (1)), which contains a source location and a destination location, PPC first determines and retrieves a number of historical paths in the cache, called PPatterns, that may match this new query with high probability (see Steps (2)-(4)).1 The idea of PPatterns is based on the observation that similar source and destination nodes of two queries may result in similar shortest paths (known as the path coherence property [1]). In the PPattern Detection component, we propose a novel probabilistic model that estimates the likelihood of a cached queried-path being useful for answering the new query by exploring their geospatial characteristics. To facilitate quick detection of PPatterns, instead of exhaustively scanning all the queried-paths in the cache, we design a grid-based index for the PPattern Detection module. Based on the detected PPatterns, the Shortest Path Estimation module (see Steps (5)-(8)) constructs candidate paths for the new query and chooses the best (shortest) one. In this component, if a PPattern perfectly matches the query, we immediately return it to the user; otherwise, the server is asked to compute the unmatched
1. The notion of PPatterns will be formally defined and further
explained in Section 3.2.
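The answering flow described above can be sketched in a deliberately simplified form as follows. All names are illustrative, and the partial-hit handling here only reuses a cached segment that fully covers the query, whereas PPC also stitches partially covering segments with server-computed ones:

```python
# Illustrative sketch of the PPC answering flow (names are ours, not the paper's).
# A query is answered from the cache when possible; the server is invoked
# only for what the cache cannot provide.

def answer_query(source, dest, cache, server_shortest_path):
    """Answer a path query, reusing a cached path when possible."""
    pattern = cache.get((source, dest))
    if pattern is not None:
        return pattern                      # complete hit: no server work
    # Simplified partial hit: reuse a cached segment that covers the query.
    for path in cache.values():
        if source in path and dest in path:
            i, j = path.index(source), path.index(dest)
            if i < j:
                return path[i:j + 1]        # cached segment answers the query
    return server_shortest_path(source, dest)  # cache miss: full computation
```

In the full system, the partial-hit branch is replaced by PPattern detection and path stitching, as described in the following sections.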
1041-4347 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
- We propose an innovative system, namely path planning by caching (PPC), to efficiently answer a new path planning query using cached paths, avoiding time-consuming shortest path computation. On average, we save up to 32 percent of the computation time compared with a conventional path planning system that does not use a cache.
- We introduce the notion of a PPattern, i.e., a cached path that shares segments with other paths. PPC supports partial hits between PPatterns and a new query. Our experiments indicate that partial hits constitute 92.14 percent of all cache hits on average.
- A novel probabilistic model is proposed to detect the cached paths that have a high probability of being a PPattern for the new query, based on the coherency property of road networks. Our experiments indicate that these PPatterns save the retrieval of path nodes by 31.69 percent on average, representing a 10-fold improvement over the 3.04 percent saving achieved by complete hits.
- We have developed a new cache replacement mechanism that considers user preference among roads of various types. A usability measure is assigned to each query by addressing both the road type and query popularity. The experimental results
2. Notice that major roads such as freeways and state routes typically have more capacity and importance than the branch roads and
local streets. For simplicity, in this paper, we refer to road types of
higher importance and capacity as higher road types.
RELATED WORK
query logs from the EXCITE search engine [19]. Three important observations are described as follows. First, a small number of queries are frequently re-used; by preserving the results of these queries in cache, the system can respond to users without incurring time-consuming computations. Second, while a larger cache size implies a higher hit ratio, it may incur significant overheads for cache maintenance. Third, static cache replacement performs better when the cache size is small, whereas dynamic cache replacement performs better when the cache size is large. Static cache replacement [19], [20], [21], [22], [23], [24] aims to preserve the results of the most popular and frequent queries, thus incurring a very low workload during query processing. However, the cache content may not be up to date with recent trends in issued queries. Dynamic cache replacement [19], [25], [26], in contrast, preserves the results of the most recent queries, but the system incurs an extra workload.
To improve the retrieval efficiency of the path planning system, Thomsen et al. propose a new cache management policy [17] that caches the results of frequent queries for future reuse. To enhance the hit ratio, a benefit value function is used to score the paths from the query logs. Consequently, the hit ratio is increased, reducing execution times. However, the cost of constructing the cache is high, since the system must calculate benefit values for all sub-paths of each full path in the query results. For on-line, map-based applications, processing a large number of simultaneous path queries is an important issue. In this paper, we provide a new framework for reusing previously cached query results, as well as an effective algorithm for improving query evaluation on the server.
PRELIMINARIES
TABLE
Symbols and Explanations: lat, lng; R; g; v; e; w; G(V, E); W; q, q_{s,t}; p, p_{s,t}; SDP(s, t), SDP(q); |p|; PT; D; D(v_1, v_2); d̂_p(q); u; m; C; |C|; b
A cached path p_i is a k-PPattern of p_j if

∃ m, n : ∀ x ∈ [0, k−1], v^i_{m+x} = v^j_{n+x},

i.e., the indicator equals 1 in this case and 0 otherwise.
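The matching condition in this definition, namely that two paths share at least k consecutive nodes, can be checked directly. The following helper is our own sketch, not code from the paper:

```python
def is_k_ppattern(p_i, p_j, k):
    """Return True if paths p_i and p_j share at least k consecutive nodes,
    i.e., there exist offsets m, n with p_i[m + x] == p_j[n + x] for all
    x in [0, k - 1], as in the definition above."""
    # collect every window of k consecutive nodes in p_j
    segments = {tuple(p_j[n:n + k]) for n in range(len(p_j) - k + 1)}
    # check whether any window of p_i appears among them
    return any(tuple(p_i[m:m + k]) in segments
               for m in range(len(p_i) - k + 1))
```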
Based on the above definition, we may construct the shortest distance path of a query q from a starting node s′ to a destination node t′, denoted by SDP(q) or SDP(s′, t′), from one of its cached PPatterns SDP(s, t) as follows:

SDP(s′, t′) = SDP(s′, a) ⊕ SDP(a, b) ⊕ SDP(b, t′),  (1)
Fig. 4. Shortest path computation: p_{s,t} is a 2-PPattern for path p_{s′,t′}, as they share a segment p_{a,b} with each other.
where a and b are the head and tail of the common path segment in the cached PPattern p_{s,t} of the query q_{s′,t′}, respectively, and ⊕ denotes the concatenation operation. Note that Eq. (1) is a general form of the constructed path, where SDP(s′, a) or SDP(b, t′) may not exist, e.g., when s′ = a or b = t′. The unshared segments SDP(s′, a) and SDP(b, t′) are to be computed by the server. The length of the constructed path can thus be recomputed by summing the lengths of its segments.
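The concatenation of Eq. (1) can be illustrated with a small helper. This is our own sketch, which assumes each segment is a node list sharing its joint nodes a and b with its neighbor:

```python
def concat(*segments):
    """The ⊕ of Eq. (1): concatenate path segments, merging the shared
    joint nodes. Empty segments (e.g., when s' = a or b = t') are skipped."""
    result = []
    for seg in segments:
        if not seg:
            continue
        # drop the duplicated joint node between consecutive segments
        result.extend(seg[1:] if result and result[-1] == seg[0] else seg)
    return result
```

For example, `concat(["s", "a"], ["a", "x", "b"], ["b", "t"])` yields the stitched path `["s", "a", "x", "b", "t"]`.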
e = min_{a,b ∈ V_p, a ≠ b} ( |SDP(s′, a)| + |SDP(a, b)| + |SDP(b, t′)| ) − |SDP(s′, t′)|.  (4)

In Eq. (4), the term |SDP(s′, t′)| is fixed for any cached path p ∈ C. Consequently, the estimation errors for the same query corresponding to different cached paths are proportional to the lengths of their own constructed paths:

e ∝ d̂_p(s′, t′),  (5)

where d̂_p(s′, t′) denotes the length of the path constructed from p. The best PPattern is thus the cached path with the smallest estimated distance:

p* = argmin_{p ∈ C} d̂_p(s′, t′).  (6)

PPATTERN DETECTION

To detect the best PPatterns, one idea is to calculate the estimated distance for each cached path by Eq. (5) and select the cached path with the shortest distance. However, this faces several challenges. Firstly, the distance estimation in Eq. (3) requires the server to compute the unshared segments (i.e., SDP(s′, a), SDP(b, t′)); it therefore incurs significant computation to exhaustively examine all cached paths. Secondly, such an exhaustive operation implicitly assumes that each cached path is a PPattern candidate for the query, which is not always true; for example, a path in Manhattan does not contribute to a query in London. Finally, while we assume that users may accept an approximate path if its error is within a certain tolerable range, exhaustive inspection cannot be sure that the path with the minimal error has been found until all paths are inspected. To address these challenges, we aim to narrow down the inspection scope to only good candidates.
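As a rough illustration of the minimisation behind d̂_p(s′, t′), the following sketch scans all ordered endpoint pairs (a, b) on one cached path, given a hypothetical shortest-distance oracle `dist`. This exhaustive scan is precisely the cost that the grid-based detection avoids:

```python
from itertools import combinations

def estimated_distance(dist, s_new, t_new, cached_path):
    """Estimated distance for one cached path: the best stitched length
    over all ordered endpoint pairs (a, b) on the path. `dist(u, v)` is
    an assumed shortest-distance oracle; the cached segment length
    between a and b is summed along the stored path."""
    best = float("inf")
    for i, j in combinations(range(len(cached_path)), 2):
        a, b = cached_path[i], cached_path[j]
        # length of the cached segment from a to b along the stored path
        seg = sum(dist(cached_path[k], cached_path[k + 1]) for k in range(i, j))
        best = min(best, dist(s_new, a) + seg + dist(b, t_new))
    return best
```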
In Scenario 6(a),

e = α(D(s′, s) + D(t, t′)) + D(s, a) + D(b, t) − D(s′, a) − D(b, t′).  (11)

In Scenario 6(b),

e = α(D(s′, s) + D(t, t′)) + D(s, o) + D(o, t) − D(s′, o) − D(o, t′)
  ≤ α(D(s′, s) + D(t, t′)) + D(s, s′) + D(t, t′).  (13)
time for cache lookups and estimation. After that, the target space is divided into equally sized grid cells according to an input cell size (Line 4). Then we locate the grid cells (identified by index numbers; the cache index is discussed in Section 6.1) that contain the query's source and destination nodes (Line 5). The system retrieves all queries overlapping both the source and destination grid cells (Lines 6 to 7) and inserts them into Q, the set of candidate queries (Line 8). This process uses the grid index and path identifiers to reduce lookup operations. Finally, the system returns the cached paths in Q according to the query identifiers (Lines 9 to 10).
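The cell-based lookup in Lines 5-8 can be sketched as follows, assuming a simple dict-of-sets grid index mapping each cell to the ids of queries whose paths overlap it; the actual index layout is a simplifying assumption here, not the paper's exact structure:

```python
def candidate_queries(grid_index, cell_size, src, dst):
    """Return ids of cached queries whose paths overlap both the source
    and destination grid cells (a sketch of Lines 5-8)."""
    def cell(p):
        # map an (x, y) location to its grid cell coordinates
        return (int(p[0] // cell_size), int(p[1] // cell_size))
    src_ids = grid_index.get(cell(src), set())
    dst_ids = grid_index.get(cell(dst), set())
    return src_ids & dst_ids  # only queries overlapping both cells survive
```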
CACHE MANAGEMENT
TABLE 1
An Example of Cached Paths (columns: cpid | path, with rows 1, 2, ..., n)
For each cached path, all its nodes are maintained in a second table, with a sample profile shown as follows. Note that these nodes are listed strictly in their traveling order:

pathID: (nodeID), ..., (nodeID)

A cache C stores the result paths p_{s,t} returned by queries q_{s,t}, and each result path consists of a node set (e.g., p_{s,t} = {v_0, v_1, ..., v_m}). As shown in Table 1, each result path is stored in the cache and identified by a cache id cpid (e.g., c1 represents p_{3,6}).
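A toy version of this two-table layout, using the p_{3,6} example above (function and variable names are ours):

```python
# cpid -> query endpoints, and cpid -> nodes in traveling order,
# mirroring Table 1 and the node table (c1 represents p_{3,6} = {v3, v1, v6}).
cached_paths = {"c1": (3, 6)}
path_nodes = {"c1": [3, 1, 6]}

def lookup(source, destination):
    """Return the node list of a cached result path on a complete hit."""
    for cpid, (s, t) in cached_paths.items():
        if (s, t) == (source, destination):
            return path_nodes[cpid]
    return None  # cache miss for this (source, destination) pair
```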
We define the weight of a node as the maximum weight of all its outgoing edges, since users can select one route from an intersection:

W_i = max_{e_{i,j} ∈ E} W_{i,j}.  (15)

We assume that users prefer higher road types [3], which can be determined based on capacity, popularity, and accessibility. In this work, we adopt the ten road types defined by OpenStreetMap and allocate a linearly increasing weight to each type, from 0.01 for unclassified roads to 1.0 for motorways. Fig. 9 shows an example of node and edge weights corresponding to road types. Three edges are connected to node v_1, with weights W_{1,2} = 0.4, W_{1,3} = 0.5, and W_{1,4} = 0.7. Based on Eq. (15), the weight W_1 is set to W_{1,4} = 0.7. Table 2 shows several examples of node weights and their primary road types in a real road network used in our experiment. The weight of a node indicates how likely a path planning query with it as the source node is to be issued in the future. Such information can be used to design the cache replacement policy.
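Eq. (15) and the Fig. 9 example can be reproduced in a few lines; the edge-weight dict layout is our own:

```python
# Edge weights from the Fig. 9 example, keyed by (from_node, to_node).
edge_weights = {(1, 2): 0.4, (1, 3): 0.5, (1, 4): 0.7}

def node_weight(v, edge_weights):
    """Eq. (15): a node's weight is the maximum weight of its outgoing edges."""
    out = [w for (u, _), w in edge_weights.items() if u == v]
    return max(out) if out else 0.0
```

Here `node_weight(1, edge_weights)` returns 0.7, matching W_1 = W_{1,4} in the text.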
TABLE 3
Two Stored Paths in the Cache

cpid   path                               usability
1      p_{3,6} = {v_3, v_1, v_6}          2.8
2      p_{2,4} = {v_2, v_6, v_1, v_4}     3.3

The usability score of a cached path with nodes v_1, ..., v_n is computed as

Σ_{k=1}^{n} W_{v_k} · count(v_k),  (16)

where W_{v_k} is the weight of node v_k and count(v_k) its query popularity.
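Under the reading that W_{v_k} is the node weight and count(v_k) the node's query popularity, Eq. (16) is a one-line computation; the values below are illustrative, not taken from Table 3:

```python
def usability(path, weights, counts):
    """Eq. (16): sum of W_{v_k} * count(v_k) over the nodes of a cached path."""
    return sum(weights[v] * counts[v] for v in path)
```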
EXPERIMENTS
7.1 Dataset
We conduct a comprehensive performance evaluation of the
proposed PPC system using the road network dataset of
Seattle obtained from ACM SIGSPATIAL Cup 2012 [31].
The dataset has 25,604 nodes and 74,276 edges. For the
query log, we obtain the Points-of-interest (POIs) in Seattle
from [32]. Next, we randomly select pairs of nodes from
these POIs as the source and destination nodes for path
planning queries. Four sets of query logs with different
distributions are used in the experiments: QLnormal and
QLuniform are query logs with normal and uniform distributions, respectively. QLcentral is used to simulate a large-scale
event (e.g., the Olympics or the World Cup) held in a city.
QLdirection is to simulate possible driving behavior (e.g.,
changing direction) based on a random walk method
described as follows. We first randomly generate a query to serve as the initial navigational route. Next, we randomly draw a probability to determine the chance that a driver changes direction. The point of direction change is treated as a new source. This process is repeated until the anticipated number of queries is generated. The parameters used in our experiments are shown in Table 4.
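The random-walk generation of QLdirection can be sketched as follows; picking the direction-change point uniformly at random is a simplification of the procedure described above, and all names are ours:

```python
import random

def direction_change_log(nodes, n_queries, seed=0):
    """Generate a QLdirection-style query log: start from a random query
    and, with a randomly drawn probability, treat a direction-change
    point as the next query's source (our simplified sketch)."""
    rng = random.Random(seed)
    queries = []
    src, dst = rng.choice(nodes), rng.choice(nodes)
    while len(queries) < n_queries:
        queries.append((src, dst))
        if rng.random() < rng.random():   # randomly drawn chance of a turn
            src = rng.choice(nodes)       # direction-change point as new source
        dst = rng.choice(nodes)
    return queries
```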
7.2 Cache-Supported System Performance
7.2.1 Cache versus Non-Cache
The main idea of a cache-supported system is to leverage
the cached query results to answer a new query. Thus, we
are interested in finding how much improvement our path
planning system achieves over a conventional non-cache
system. We generate query sets of various sizes to compare the paths generated by our PPC and the A* algorithm. The performance is evaluated by two metrics: a) Total number of visited nodes: it counts the number of nodes visited by an
TABLE 2
Road Type and Weight

Node   Weight W_i   Road type
v_1    1.0          Motorway
v_2    0.78         Primary
v_3    0.78         Primary
v_4    0.56         Tertiary
v_5    0.34         Living street
v_6    0.01         Unclassified

TABLE 4
Experimental Parameters

Parameter    Default     Value
Grid size    2 km        0.5-5 km
Cache size   5k          1k-10k
#Queries     5k          0.5k-5k
Data sets    QLnormal    QLnormal, QLuniform, QLcentral, QLdirection
TABLE 5
Performance Comparison between PPC and the Non-Cache Algorithm

          #Nodes                  Time (ms)
#Query    PPC        A*           PPC           A*
1k        80,087     107,856      14,190,670    19,973,996
2k        157,162    215,459      27,869,212    39,889,166
3k        230,185    319,231      41,493,092    59,983,844
4k        328,879    419,345      55,139,411    79,937,684
5k        437,362    501,312      69,843,232    100,037,461
The hit ratio is defined as hits_cache / |Q|, where hits_cache is the total number of cache hits and |Q| is the total number of queries. The hit ratio results using different cache mechanisms are compared in Fig. 11, from which we find that, as expected, PPC achieves a much higher hit ratio than the other three methods in all scenarios. We further analyze the correlation between the hit ratio and the performance metrics, in terms of visited node saving ratio and query time saving ratio, with the results shown in Fig. 12. From the figures, we make the following observations:
δ_node = (#nodes_{A*} − #nodes_{cache}) / #nodes_{A*}.  (18)
Visited node saving ratio and query time saving ratio indicate how many nodes and how much time an algorithm saves relative to a non-cache routing algorithm (e.g., A*), respectively. A larger value indicates better performance. In the experiment, we increase the total query number from 1k to 5k and calculate the above two metrics for each cache mechanism, with the results shown in Fig. 10.
The x-axis represents the total number of queries, while the y-axis indicates the metric values in percentages. From these figures, we can see clearly that our cache policy always achieves the best performance on all measurements. On average, LFU, LRU, SPC*, and PPC visit 30.47, 26.86, 27.78, and 34.73 percent fewer nodes than the A* algorithm, and reduce its computational time by 29.83, 26.32, 27.04, and 32.09 percent, respectively.
δ_time = (time_{A*} − time_{cache}) / time_{A*}.  (19)
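Both saving ratios reduce to the same relative-difference computation against the A* baseline; this formula is our reading of how the ratios are reported, and the numbers below come from the 5k-query row of Table 5:

```python
def saving_ratio(baseline, value):
    """Fraction saved relative to the non-cache baseline."""
    return (baseline - value) / baseline

# 5k-query row of Table 5:
node_saving = saving_ratio(501_312, 437_362)         # visited nodes, A* vs. PPC
time_saving = saving_ratio(100_037_461, 69_843_232)  # time (ms), A* vs. PPC
```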
Fig. 11. Performance comparison with four cache replacement mechanisms in terms of hit ratio.
Fig. 12. Correlation between (a) hit ratio and visited node saving ratio,
and (b) hit ratio and query time saving ratio.
network graph, i.e., PPC does not always save subpaths if they require complex computations.
Because PPC leverages both complete and partial hits to answer a new query, we additionally measure the node saving ratio for partial hits. We ran an experiment with 5k queries and plot the results in Fig. 13. From the figure we can see that partial hits appear evenly along the temporal dimension. On average, a partial hit achieves a 97.63 percent saving ratio, which is quite close to the complete hit saving ratio (100 percent). Among all cache hits, the shares of complete and partial hits are illustrated in Fig. 14a, and their node saving percentages are shown in Fig. 14b. The x-axis is the query size and the y-axis is the percentage value. Notice that a partial hit does not achieve a 100 percent node saving ratio. However, as partial hits occur much more frequently than complete hits, their overall benefit to system performance outweighs that of complete hits. On average, partial hits account for 92.14 percent of all cache hits. The average node saving ratio from partial hits is 31.67 percent, 10 times that from complete hits at 3.04 percent.
Fig. 14. Comparison between partial hits and complete hits using PPC
with different query sets in terms of (a) hit ratio and (b) visited node
saving ratio.
Fig. 13. Visited node saving ratio distribution among partial hit queries with #Q = 5K.
#Query   SPC*        PPC
1K       497,359     51
2K       1,006,822   102
3K       1,529,832   157
4K       2,034,401   213
5K       2,568,893   263
Fig. 16. PPC performance analysis: effect of grid size. (a) Visited node
saving ratio and query time saving ratio, and (b) average deviation
percentage.
Fig. 15. Performance comparison among four cache mechanisms with various data distributions for query logs in terms of (a,c,e) visited node saving
ratio and (b,d,f) query time saving ratio with different numbers of queries.
parameters use the default values. The results for each distribution are shown in Fig. 15. As shown, PPC always
achieves the highest score in all scenarios.
A statistical analysis of these metrics is summarized in
Table 7. In order to measure the performance improvement,
we calculate an improvement factor over the second-best method, denoted by Δ, as follows:

Δ = δ_PPC / max{δ_LRU, δ_LFU, δ_SPC*}.  (20)
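Eq. (20) is straightforward to compute; plugging in the δ_node values for the uniform distribution from Table 7 reproduces the reported factor of about 1.89:

```python
def improvement_factor(d_ppc, d_lru, d_lfu, d_spc):
    """Eq. (20): PPC's value of a metric over the best competing method's."""
    return d_ppc / max(d_lru, d_lfu, d_spc)

# δ_node, uniform distribution (Table 7): 20.79 / max(10.31, 10.69, 11.01)
```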
TABLE 7
Average Performance Using Four Cache Mechanisms for Different Query Distributions

Metric   Distribution   PPC     LRU     LFU     SPC*    Δ
δ_node   Uniform        20.79   10.31   10.69   11.01   1.89
δ_node   Central        45.89   35.32   31.47   28.36   1.30
δ_node   Directional    31.96   0.74    0.63    1.13    28.16
δ_time   Uniform        20.70   10.27   10.73   11.03   1.88
δ_time   Central        45.67   35.57   31.61   27.65   1.28
δ_time   Directional    33.62   0.96    0.85    1.47    22.84
δ_hit    Uniform        18.41   10.48   10.71   11.47   1.61
δ_hit    Central        46.45   35.14   31.36   28.13   1.32
δ_hit    Directional    31.12   0.81    0.71    1.19    26.17
Fig. 17. PPC performance analysis: effect of cache size. (a) Visited node
saving ratio and query time saving ratio, and (b) average deviation
percentage.
the grid size, more cached paths are retrieved as cache hits, which prevents sending a completely new query to the server. However, larger cells may also retrieve paths that are less relevant, so the average deviation percentage increases as well. In a real system, the grid-cell size can be tuned empirically to keep a satisfactory balance between this benefit and its cost.
Fig. 19. PPC performance analysis: effect of temporal factor. (a) Visited node saving ratio and query time saving ratio, and (b) average
deviation percentage.
CONCLUSION
Fig. 18. PPC performance analysis: effect of minimal source-destination distance. (a) Visited node saving ratio and query time saving ratio, and (b) average deviation percentage.

REFERENCES

[1] H. Mahmud, A. M. Amin, M. E. Ali, and T. Hashem, "Shared execution of path queries on road networks," CoRR, vol. abs/1210.6746, 2012.
[2] L. Zammit, M. Attard, and K. Scerri, "Bayesian hierarchical modelling of traffic flow - with application to Malta's road network," in Proc. Int. IEEE Conf. Intell. Transp. Syst., 2013, pp. 1376-1381.
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]
[18]
[19]
[20]
[21]
[22]
[23]
[24]
[25]
[26]
[27]