
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 28, NO. 4, APRIL 2016, p. 951

Efficient Cache-Supported Path Planning on Roads

Ying Zhang, Member, IEEE, Yu-Ling Hsueh, Member, IEEE, Wang-Chien Lee, Member, IEEE, and Yi-Hao Jhang
Abstract—Owing to the wide availability of the global positioning system (GPS) and digital mapping of roads, road network navigation services have become a basic application on many mobile devices. Path planning, a fundamental function of road network navigation services, finds a route between the specified start location and destination. The efficiency of this path planning function is critical for mobile users on roads due to various dynamic scenarios, such as a sudden change in driving direction, unexpected traffic conditions, lost or unstable GPS signals, and so on. In these scenarios, the path planning service needs to be delivered in a timely fashion. In this paper, we propose a system, namely, Path Planning by Caching (PPC), to answer a new path planning query in real time by efficiently caching and reusing historical queried-paths. Unlike conventional cache-based path planning systems, where a queried-path in cache is used only when it matches the new query perfectly, PPC leverages partially matched queries to answer part(s) of the new query. As a result, the server only needs to compute the unmatched path segments, thus significantly reducing the overall system workload. Comprehensive experiments on a real road network database show that our system outperforms the state-of-the-art path planning techniques by reducing the computation latency by 32 percent on average.

Index Terms—Spatial database, path planning, cache

1 INTRODUCTION

WITH the advance of the global positioning system (GPS) and the popularity of mobile devices, we have witnessed a migration of the conventional Internet-based on-line navigation services (e.g., Mapquest) onto mobile platforms (e.g., Google Map). In mobile navigation services,
on-road path planning is a basic function that finds a route
between a queried start location and a destination. While on
roads, a path planning query may be issued due to dynamic
factors in various scenarios, such as a sudden change in
driving direction, unexpected traffic conditions, or lost of
GPS signals. In these scenarios, path planning needs to be
delivered in a timely fashion. The requirement of timeliness
is even more challenging when an overwhelming number
of path planning queries is submitted to the server, e.g.,
during peak hours. As the response time is critical to user
satisfaction with personal navigation services, it is a mandate for the server to efficiently handle the heavy workload
of path planning requests. To meet this need, we propose a
system, namely, Path Planning by Caching (PPC), that aims
to answer a new path planning query efficiently by caching
and reusing historically queried paths (queried-paths in

• Y. Zhang is with the School of Computing, National University of Singapore, Singapore. E-mail: yingz118@gmail.com.
• Y.-L. Hsueh and Y.-H. Jhang are with the Department of Computer Science and Information Engineering, National Chung Cheng University, Taiwan. E-mail: hsueh@cs.ccu.edu.tw, phillipx303@gmail.com.
• W.-C. Lee is with the Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA 16802. E-mail: wlee@cse.psu.edu.

Manuscript received 1 Apr. 2015; revised 25 Sept. 2015; accepted 3 Dec. 2015. Date of publication 10 Dec. 2015; date of current version 3 Mar. 2016.
Recommended for acceptance by F. Li.
For information on obtaining reprints of this article, please send e-mail to: reprints@ieee.org, and reference the Digital Object Identifier below.
Digital Object Identifier no. 10.1109/TKDE.2015.2507581

short). Unlike conventional cache-based path planning systems where a cached query is returned only when it
matches completely with a new query, PPC leverages partially matched queried-paths in cache to answer part(s)
of the new query. As a result, the server only needs to compute the unmatched path segments, thus significantly
reducing the overall system workload.
Fig. 1 provides an overview of the proposed PPC system
framework, which consists of three main components (in
rectangular boxes, respectively): (i) PPattern Detection,
(ii) Shortest Path Estimation, and (iii) Cache Management.
Given a path planning query (see Step (1)), which contains a
source location and a destination location, PPC firstly determines and retrieves a number of historical paths in cache,
called PPatterns, that may match this new query with high
probability (see Steps (2)-(4)).1 The idea of PPatterns is
based on an observation that similar starting and destination nodes of two queries may result in similar shortest
paths (known as the path coherence property [1]). In the PPattern Detection component, we propose a novel probabilistic model to estimate the likelihood that a cached queried-path is useful for answering the new query by exploring their geospatial characteristics. To facilitate quick detection of PPatterns, instead of exhaustively scanning all the queried-paths in cache, we design a grid-based index for the PPattern Detection module. Based on these detected PPatterns,
the Shortest Path Estimation module (see Steps (5)-(8)) constructs candidate paths for the new query and chooses the
best (shortest) one. In this component, if a PPattern perfectly
matches the query, we immediately return it to the user;
otherwise, the server is asked to compute the unmatched
1. The notion of PPatterns will be formally defined and further
explained in Section 3.2.

1041-4347 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


Fig. 1. Overview of the PPC system.

path segments between the PPattern and the query (see


Steps (6)-(7)). Because the unmatched segments are usually
only a smaller part of the original query, the server only processes a smaller subquery, with a reduced workload. Once
we return the estimated path to the user, the Cache Management module is triggered to determine which queried-paths
in cache should be evicted if the cache is full. An important
part of this module is a new cache replacement policy which
takes into account the unique characteristics of road networks. Through an empirical study, we find that common
road segments in various queried-paths usually have road
types of higher importance and capacity [2], [3].2 This
inspires us to define a usability value for each path by considering both its road type and its historical frequency of use.
The main contributions made in this work are summarized as follows:

• We propose an innovative system, namely, Path Planning by Caching, to efficiently answer a new path planning query by using cached paths, avoiding time-consuming shortest path computation. On average, we save up to 32 percent of the time in comparison with a conventional path planning system (without using cache).
• We introduce the notion of PPattern, i.e., a cached path which shares segments with other paths. PPC supports partial hits between PPatterns and a new query. Our experiments indicate that partial hits constitute up to 92.14 percent of all cache hits on average.
• A novel probabilistic model is proposed to detect the cached paths that have a high probability of being a PPattern for the new query, based on the coherency property of road networks. Our experiments indicate that these PPatterns save the retrieval of path nodes by 31.69 percent on average, representing a 10-fold improvement over the 3.04 percent saving achieved by a complete hit.
• We have developed a new cache replacement mechanism that considers the user preference among roads of various types. A usability measure is assigned to each query by addressing both the road type and the query popularity. The experimental results

2. Notice that major roads such as freeways and state routes typically have more capacity and importance than the branch roads and
local streets. For simplicity, in this paper, we refer to road types of
higher importance and capacity as higher road types.

show that our new cache replacement policy


increases the overall cache hit ratio by 25.02 percent
over the state-of-the-art cache replacement policies.
The rest of this paper is organized as follows. Section 2
reviews the related work. Section 3 analyzes the problem
and introduces the notion of PPattern. Section 4 proposes a
probabilistic model to detect PPatterns with an efficient
detection algorithm. Section 5 discusses how these PPatterns are used to estimate the path. Section 6 details cache
management issues and the proposed cache replacement
policy. Section 7 reports the evaluation result on a real data
set. Finally, Section 8 concludes this paper.

2 RELATED WORK

In this section, we review related work on path planning, shortest path caching, and cache management, all highly relevant to our study.

2.1 Path Planning


The Dijkstra algorithm [4], [5] has been widely used for path planning [6] by computing the shortest distance between two points on a road network. Many algorithms, such as A* [7] and ALT [8], have been established to improve its performance by exploring geographical constraints as heuristics. Gutman [9] proposes a reach-based approach for computing shortest paths. An improved version [10] adds shortcut arcs to reduce the number of vertices visited and uses partial trees to reduce the preprocessing time. This work further combines the benefits of the reach-based and ALT approaches to reduce the
number of vertex visits and the search space. The experiment
shows that the hybrid approach provides a superior result in
terms of reducing query processing time.
Jung and Pramanik [11] propose the HiTi graph model to
structure a large road network model. HiTi aims to reduce
the search space for the shortest path computation. While
HiTi achieves high performance on road weight updates
and reduces storage overheads, it incurs higher computation costs when computing the shortest paths than the
HEPV and the Hub Indexing methods [12], [13], [14]. To
compute time-dependent fast paths, Demiryurek et al. [15]
propose the B-TDFP algorithm by leveraging backward
searches to reduce the search space. It adopts an area-level
partition scheme which utilizes a road hierarchy to balance
each area. However, a user may prefer a route with
better driving experience to the shortest path. Thus,
Gonzalez et al. propose an adaptive fast path algorithm


Fig. 2. An example to illustrate the calculation of the path benefit value in SPC. The popularity and expense values for each road segment are listed in the table. For the path p_{1,3} containing three sub-paths, its benefit value is calculated as 2 × 20.3 + 5 × 20.3 + 5 × 40.6 = 345.1.

which utilizes speed and driving patterns to improve the


quality of routes [16]. The algorithm uses a road hierarchical
partition and pre-computation to improve the performance
of the route computation. The small road upgrade is a novel
approach to improving the quality of the route computation.

2.2 Shortest Path Caching


Thomsen et al. [17] propose an efficient shortest path cache
(SPC). Based on the optimal subpath property [18], given a source s and a destination t, the shortest path p_{s,t} contains the shortest path p_{k,j} for any two nodes k and j on p_{s,t}. SPC computes a
benefit value to score a shortest path to determine whether
to preserve it in the cache. The benefit of a path is a summation of the benefit value of each sub-path in the shortest
path. The formula of a benefit value considers two features:
the popularity of a path and its expense. The popularity of a
path p is evaluated based on the number of occurrences
of the historical sub paths which overlap p. On the other
hand, the expense of a path represents the computational
time of the shortest path algorithm. Fig. 2 shows an example
to illustrate how to calculate the benefit value for a path
using SPC. In this example, a path p_{1,3} contains three sub-paths: p_{1,2}, p_{2,3}, and p_{1,3}. The popularity and expense values for each path from node s to node t are listed in the table, denoted as X_{s,t} and E_{s,t}, respectively. Accordingly, the benefit value for path p_{1,3} is calculated as 2 × 20.3 + 5 × 20.3 + 5 × 40.6 = 345.1 using SPC.
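The benefit computation just described can be sketched in a few lines. The popularity (X) and expense (E) tables below simply restate the Fig. 2 example values, and the function name `benefit` is our own shorthand, not part of SPC's published interface.

```python
# Sketch of the SPC scoring rule described above:
# benefit(p) = sum over p's sub-paths of popularity X times expense E.

def benefit(sub_paths, X, E):
    """Sum popularity * expense over every sub-path of a cached path."""
    return sum(X[sp] * E[sp] for sp in sub_paths)

# Fig. 2 example: path p(1,3) contains three sub-paths.
X = {(1, 2): 2, (2, 3): 5, (1, 3): 5}           # popularity values
E = {(1, 2): 20.3, (2, 3): 20.3, (1, 3): 40.6}  # expense values

print(round(benefit([(1, 2), (2, 3), (1, 3)], X, E), 1))  # 345.1
```

The quadratic cost noted in the next paragraph comes from enumerating all n(n−1)/2 sub-paths of an n-node path before this sum can even be formed.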
Because SPC has to score each sub-path in the shortest
path, the time complexity is high. Assuming that a shortest path contains n nodes, it contains n(n − 1)/2 sub-paths, so the time complexity for scoring a shortest path is O(n²). In the above study, each query is answered
independently. However, when queries in the current
request pool share similar properties, they may be processed as a group. Thus, Mahmud et al. [1] propose a group-based approach that accelerates processing by calculating the similarity among a group of queries and sending the common part as a single query to the server. Therefore, only dissimilar
segments for each query are answered by the server individually. However, this work does not explore any cache
mechanism in the system.
2.3 Cache Management
Caching techniques have been employed to alleviate the
workload of web searches. Since cache size is limited, cache
replacement policies have been a subject of research. The
cache replacement policy aims to improve the hit ratio and
reduce access latency. Markatos et al. conduct experiments
to analyze classical cache replacement approaches on real


query logs from the EXCITE search engine [19]. Three important observations are described as follows. First, a small
number of queries are frequently re-used. By preserving
results of these queries in cache, the system is able to respond
to the users without incurring time-consuming computations. Second, while a larger cache size implies a higher hit
ratio, significant overheads may be incurred for cache maintenance. Third, static cache replacement has better performance when the cache size is small, and vice versa for
dynamic cache replacement. Static cache replacement [19],
[20], [21], [22], [23], [24] aims to preserve the results of the
most popular and frequent queries, thus incurring a very
low workload during query processing. However, the cache
content may not be up to date to respond to recent trends in
issued queries. Dynamic cache replacement [19], [25], [26], in
contrast to a static cache, preserves the results of the most
recent queries, but the system incurs an extra workload.
In order to improve the retrieval efficiency of the path
planning system, Thomsen et al. propose a new cache management policy [17] to cache the results of frequent queries
for reuse in the future. To enhance the hit ratio, a benefit
value function is used to score the paths from the query
logs. Consequently, the hit ratio is increased, hence reducing the execution times. However, the cost of constructing a
cache is high, since the system must calculate the benefit
values for all sub-paths in a full-path of query results. For
on-line, map-based applications, processing a large number
of simultaneous path queries is an important issue. In this
paper, we provide a new framework for reusing the previously cached query results as well as an effective algorithm
for improving the query evaluation on the server.

3 PRELIMINARIES

3.1 Symbols and Definitions


In this section, we first introduce the notation used throughout the paper. Note that some symbols may be used as prefixes and are further explained where used.
Definition 1 (Road network). A road network is represented as a directed graph G = (V, E), where V = {v1, v2, ..., vn} is a set of nodes denoting road intersections and terminal nodes, and E is a set of edges denoting road segments connecting two nodes in V.

Definition 2 (Path, source node, destination node). A path p_{s,t} is a sequence of nodes {v1, v2, ..., vn}. The first node is named the source (node) vs = v1 and the last node is named the destination (node) vt = vn.

Definition 3 (Path planning query). A path planning query q_{s,t} is specified by a source node vs and a destination node vt. The system answers the query by returning a path from vs to vt that satisfies the query criteria.
By default, an important criterion for a path planning query is distance, i.e., the minimal-distance path is returned. However, due to the requirement of timeliness, the performance of a path planning service is evaluated in terms of both distance and response time, i.e., the path computed by the server is acceptable if it is returned within a tolerable time, with a tolerable deviation from the true shortest-distance path.
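Definitions 1-3 can be made concrete with a minimal sketch: a directed weighted graph as an adjacency list and a shortest-distance query answered by Dijkstra's algorithm (the paper evaluates with A*; plain Dijkstra keeps the sketch short). The toy graph and its weights are invented for illustration only.

```python
import heapq

# A road network as Definition 1: adjacency list node -> [(neighbor, weight)],
# and a path planning query as Definition 3: (source, destination).

def shortest_path(graph, s, t):
    """Dijkstra's algorithm: returns (distance, node sequence) from s to t."""
    pq, seen = [(0.0, s, [s])], set()
    while pq:
        d, v, path = heapq.heappop(pq)
        if v == t:
            return d, path
        if v in seen:
            continue
        seen.add(v)
        for u, w in graph.get(v, []):
            if u not in seen:
                heapq.heappush(pq, (d + w, u, path + [u]))
    return float("inf"), []  # t unreachable from s

graph = {
    "a": [("b", 1.0), ("c", 4.0)],
    "b": [("c", 1.5), ("d", 5.0)],
    "c": [("d", 1.0)],
}
print(shortest_path(graph, "a", "d"))  # (3.5, ['a', 'b', 'c', 'd'])
```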


TABLE 1
Symbols and Explanations

Symbols            Explanations
lat, lng           Latitude, longitude in geo-space.
R                  A region in geo-space.
g                  A grid cell.
v                  A node in a road network, denoted by a coordinate pair <lat, lng>.
e                  The edge between two nodes in a road network.
w                  Weight value.
G(V, E)            A graph composed of node set V and edge set E.
W                  Edge weight or node weight.
q, q_{s,t}         A path query q, specifically denoted as q_{s,t} if with source node s ∈ V and destination node t ∈ V.
p, p_{s,t}         A path p consists of a set of nodes in order; specifically denoted as p_{s,t} if it starts from node s ∈ V and ends at node t ∈ V.
SDP(s,t), SDP(q)   The shortest distance path from node s to t, or the shortest distance path for a path planning query q.
|p|                Path length.
PT                 PPattern.
Δ                  A prefix indicating a distance value.
D(v1, v2)          A function to calculate the Euclidean distance between node v1 and node v2.
d̂_p(q)            Length of the shortest path estimated for a query q based on a path p.
p(·)               Probability.
τ                  A probability threshold.
u(·)               Unit step function.
m                  Path utilization value.
C                  A cache in memory, composed of a set of paths C = {p_i}.
|C|                Cache size in terms of total contained nodes.
b                  Cache budget in terms of total contained nodes.

Definition 4 (Cache). A cache C contains a collection of paths. The cache size |C| is measured as the total number of nodes in the cached paths. Note that |C| < b, where b is the maximal cache size.

Definition 5 (Query logs). A query log is a collection of timestamped queries issued by users in the past.

3.2 Problem Analysis


The main goal in this work is to reduce the server workload
by leveraging the queried-paths in cache to answer a new
path planning query. An intuitive solution is to check
whether there exists a cached queried-path perfectly matching the new query. Here, a perfect match means that the
source and destination nodes of the new query are the same
as that of a queried-path in cache. Such a match is commonly known as a cache hit; otherwise, it is referred to as a
cache miss. When a cache hit occurs, the system retrieves the
corresponding path directly from the cache and returns it to
the user. As a result, the server workload is significantly
reduced. However, when there exists a cache miss, the system needs to request the server to compute the path for the
new query. In this paper, we aim to reduce the server workload when a conventional cache miss occurs by fully
leveraging the cached results.
We first analyze the potential factors influencing the computational cost at the server, which can be measured by the total number of nodes a routing algorithm visits [17]. We randomly generate 500 queries, compute the shortest paths using the A* algorithm on a real road network


Fig. 3. The relationships between (a) shortest path distance and


number of visited nodes, and (b) number of nodes and computational
cost (response time).

dataset, and display (1) the relationship between the total number of nodes A* visits and the total number of nodes on the shortest path (as illustrated in Fig. 3a) and (2) the relationship between the computational cost (response time) and the total number of visited nodes (as illustrated in Fig. 3b). These two figures show that the computational cost of A* increases exponentially as the length of the shortest path increases.
Additionally, we observe that the paths of two queries
are very likely to share segments. Thus, we argue that if one
is cached, the shared segments can be reused for the other
query such that only the unshared route segments need to
be computed by the server. As the unshared segment is
shorter than the original path, the computational overhead
may be significantly reduced (as indicated in Fig. 3). Based
on the observation, we introduce below some new notions
to be used in this work.
Definition 6 (PPattern, Head, Tail). Given a path, a Path Pattern (PPattern for short) is another path which shares at least two consecutive nodes with it. The first and last nodes of their shared segment are named the head and tail, respectively. Specifically, if the PPattern shares k consecutive nodes, where k ≥ 2, it is termed a k-PPattern.
Fig. 4 shows an example illustrating PPatterns. As shown, the paths p_{s,t} and p_{s',t'} are 2-PPatterns of each other, as they share a common sub-path p_{a,b}. Node a is the pattern head and node b is the pattern tail.
Denoting the n-th node on the path p_i as v_i^n, we use PT_k(p_i, p_j) to indicate whether path p_j is a k-PPattern for path p_i:

PT_k(p_i, p_j) = { 1 : ∃ m, n : ∀ x ∈ [0, k − 1], v_i^{m+x} = v_j^{n+x}
                 { 0 : otherwise.
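A direct, if naive, reading of the PT_k predicate can be sketched as follows; the node identifiers and the O(|p_i| · |p_j|) window scan are illustrative simplifications, not the paper's detection method (which Section 4 replaces with a probabilistic, grid-indexed test).

```python
# Sketch of PT_k: path p_j is a k-PPattern for p_i if some run of k
# consecutive nodes of p_i also appears as k consecutive nodes of p_j.

def is_k_ppattern(p_i, p_j, k=2):
    if k < 2:
        raise ValueError("a PPattern shares at least two consecutive nodes")
    # All length-k windows of p_j, as hashable tuples.
    windows = {tuple(p_j[n:n + k]) for n in range(len(p_j) - k + 1)}
    return any(tuple(p_i[m:m + k]) in windows
               for m in range(len(p_i) - k + 1))

# Fig. 4 style example: both paths share the sub-path (a, b).
p1 = ["s", "a", "b", "t"]
p2 = ["s'", "x", "a", "b", "t'"]
print(is_k_ppattern(p1, p2, k=2))  # True
```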

Based on the above definition, we may construct the shortest distance path of a query q from a starting node s' to a destination node t', denoted by SDP(q) or SDP(s', t'), from one of its cached PPatterns SDP(s, t) as follows:

SDP(s', t') = SDP(s', a) ⊕ SDP(a, b) ⊕ SDP(b, t'),   (1)

Fig. 4. Shortest path computation: p_{s,t} is a 2-PPattern for path p_{s',t'}, as they share a segment p_{a,b} with each other.


where a and b are the head and tail of the common path segment in the cached PPattern p_{s,t} of the query q_{s',t'}, respectively, and ⊕ is the concatenation operation. Note that Eq. (1) is a general form of the constructed path, where SDP(s', a) or SDP(b, t') may not exist, e.g., when s' = a or b = t'. The unshared segments SDP(s', a) and SDP(b, t') are to be computed by the server. The length of the path can thus be computed as follows:

|SDP(s', t')| = |SDP(s', a)| + |SDP(a, b)| + |SDP(b, t')|.   (2)
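The construction in Eq. (1) amounts to concatenating three node sequences while merging the duplicated junction nodes a and b. A sketch, with made-up segment contents (in PPC, the middle segment would come from cache and the outer two from the server):

```python
# Sketch of Eq. (1): SDP(s',a) ⊕ SDP(a,b) ⊕ SDP(b,t'), dropping the
# repeated junction node where consecutive segments meet. Empty segments
# cover the degenerate cases s' = a or b = t'.

def concat_paths(*segments):
    """Concatenate node sequences, merging shared endpoint nodes."""
    path = []
    for seg in segments:
        if not seg:
            continue
        if path and path[-1] == seg[0]:
            path.extend(seg[1:])  # merge the shared junction node
        else:
            path.extend(seg)
    return path

sdp_sa = ["s'", "a"]       # server-computed unshared head
sdp_ab = ["a", "m", "b"]   # shared segment reused from cache
sdp_bt = ["b", "t'"]       # server-computed unshared tail
print(concat_paths(sdp_sa, sdp_ab, sdp_bt))
```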


For a given path p, it is feasible to find all its PPatterns by checking whether there exists any cached path sharing segments with p. However, for a path planning query, the system only knows the endpoints of the input query rather than the returned path itself (which is, however, exactly what we want to obtain). Thus, the problem is reduced to finding which cached queries are more likely to share partial path segments with the path query. By assuming each cached path is a PPattern and using Eq. (2), the length of a path constructed from a cached path p may be estimated as follows:

d̂_p(q_{s',t'}) = min_{a,b ∈ V_p, a ≠ b} ( |SDP(s', a)| + |SDP(a, b)| + |SDP(b, t')| ),   (3)

where V_p is the vertex set of path p. Therefore, the estimation error is represented as follows:

e = d̂_p(q_{s',t'}) − |SDP(s', t')|.   (4)

In Eq. (4), the term |SDP(s', t')| is fixed for any cached path p ∈ C. Consequently, the estimation errors for the same query corresponding to different cached paths are proportional to the lengths of their own constructed paths:

e ∝ d̂_p(q_{s',t'}).   (5)

In conclusion, this work aims to find the path in cache with the minimal estimated distance for the path planning query:

p* = argmin_{p ∈ C} d̂_p(q_{s',t'}),   (6)

where C is a cache maintaining queried-paths.
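Eqs. (3) and (6) can be sketched as follows. Here `sdp_length` stands in for the server's shortest-distance computation and `seg_length` for a cached sub-path length; both are placeholder callables, not PPC's actual interface, and the exhaustive scan over (a, b) pairs is exactly the expensive step that Section 4 sets out to avoid.

```python
from itertools import combinations

# Sketch of Eq. (3): for one cached path, the estimated constructed-path
# length is the minimum over ordered node pairs (a, b) on that path of
# |SDP(s',a)| + |SDP(a,b)| + |SDP(b,t')|.

def estimate_length(cached_path, s2, t2, sdp_length, seg_length):
    best = float("inf")
    for a, b in combinations(cached_path, 2):  # a precedes b on the path
        total = sdp_length(s2, a) + seg_length(a, b) + sdp_length(b, t2)
        best = min(best, total)
    return best

# Sketch of Eq. (6): pick the cache entry minimizing the estimate.
def best_cached_path(cache, s2, t2, sdp_length, seg_length):
    return min(cache,
               key=lambda p: estimate_length(p, s2, t2, sdp_length, seg_length))

# Toy usage: nodes are points on a line, all lengths are absolute differences.
line = lambda u, v: abs(u - v)
cache = [[0, 5, 10], [100, 110]]
print(best_cached_path(cache, 1, 9, line, line))  # [0, 5, 10]
```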


Next, we propose a novel probabilistic model to estimate
the likelihood for a cached path to be a PPattern, along with
a practical and efficient implementation to detect the mostpotential PPatterns.

4 PPATTERN DETECTION

To detect the best PPatterns, one idea is to calculate the estimated distance based on each cached path by Eq. (5) and select the cached path with the shortest distance. However, this faces several challenges. First, the distance estimation in Eq. (3) requires the server to compute the unshared segments (i.e., SDP(s', a) and SDP(b, t')); it therefore incurs significant computation to exhaustively examine all cached paths. Second, such an exhaustive operation implicitly assumes that every cached path is a PPattern candidate for the query. However, this is not always true; for example, a path in Manhattan does not contribute to a query in London. While we assume that users may accept an approximate path if its error is within a certain tolerable range, exhaustive inspection cannot be sure that the path with the minimal error has been found until all paths are inspected. To address these challenges, we aim to narrow down the inspection scope to only good candidates.

4.1 Probabilistic Model for PPattern Detection


The coherency property of road networks indicates that two paths are very likely to share segments when their source nodes (and their destination nodes, respectively) are close to each other [27]. This property has been used in many applications for various purposes, e.g., efficient trajectory lookups, as the common segments among multiple paths are queried only once [1]. Notice that this property is mainly attributed to the locality of the path source and destination nodes. We argue that, if two queries satisfy certain spatial constraints, their shortest distance paths are very likely to be PPatterns of each other.
Several existing studies have proposed algorithms to
group paths with similar trajectories together [27]. In these
studies, paths within a cluster can be taken as the PPatterns
to each other. Given a new query, the system checks
whether it fits into an existing cluster and directly returns
the shortest path if there exists at least one path in that cluster. However, all these studies require a complete knowledge graph computed from the basic road network data,
which incurs a heavy workload and distracts from our goal.
Differing from the existing studies, we propose a method to
detect the potential PPatterns for an input query using only
existing paths in cache.
In summary, the coherency property indicates that two
queries are more likely to share sub-paths if they meet the
following three spatial constraints concurrently: (1) the
source nodes of the two queries are close to each other;
(2) the destination nodes of the two queries are close to each
other; and (3) the source node is distant from the destination
node for both queries. Formally, we denote by p(q_{s',t'}, q_{s,t}) the probability that the two queries q_{s',t'} and q_{s,t} are PPatterns of each other. Thus, constraints (1) and (2) indicate p(q_{s',t'}, q_{s,t}) ∝ D_{s,s'}^{-1} and p(q_{s',t'}, q_{s,t}) ∝ D_{t,t'}^{-1}, respectively. On the other hand, constraint (3) implies p(q_{s',t'}, q_{s,t}) ∝ D(s', t') + D(s, t). Thereafter, the final probability can be computed as the product of these three terms. As we would like the three factors to achieve sufficient satisfaction concurrently, the probability is only computed if each factor is over a threshold, as expressed by Eq. (7):

p(q_{s',t'}, q_{s,t}) = Π_{x ∈ {s,t,l}} [f_x(q_{s',t'}, q_{s,t})]^{w_x} · u(f_x(q_{s',t'}, q_{s,t}) − Δ_x),   (7)

where f_s(q_{s',t'}, q_{s,t}) = D_{s,s'}^{-1}, f_t(q_{s',t'}, q_{s,t}) = D_{t,t'}^{-1}, and f_l(q_{s',t'}, q_{s,t}) = D(s', t') + D(s, t) respectively indicate the source-source, destination-destination, and source-destination node distance factors between the two queries q_{s',t'} and q_{s,t}; w_x is a weight indicating the contribution from each factor f_x; Δ_x is a threshold controlling the validation scope for f_x; and u(·) is a shifted unit step function:

u(x − Δ_x) = { 1 : x ≥ Δ_x
             { 0 : x < Δ_x.   (8)
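A sketch of the scoring rule in Eqs. (7) and (8): queries are (source, destination) pairs of planar points, distances are Euclidean as in the text, and the weights w and thresholds Δ below are illustrative choices, not values from the paper.

```python
import math

# Sketch of Eqs. (7)-(8): score a cached query against a new one. Each
# factor must clear its threshold (the shifted unit step) or the whole
# score collapses to zero.

def dist(u, v):
    return math.hypot(u[0] - v[0], u[1] - v[1])

def ppattern_score(q_new, q_cached, w, delta):
    (s2, t2), (s, t) = q_new, q_cached
    factors = {
        "s": 1.0 / max(dist(s, s2), 1e-9),  # source-source closeness
        "t": 1.0 / max(dist(t, t2), 1e-9),  # destination-destination closeness
        "l": dist(s2, t2) + dist(s, t),     # both queries span a long distance
    }
    score = 1.0
    for x, f in factors.items():
        if f < delta[x]:                    # u(f - delta) = 0: reject outright
            return 0.0
        score *= f ** w[x]
    return score

w = {"s": 1.0, "t": 1.0, "l": 1.0}          # illustrative weights
delta = {"s": 0.1, "t": 0.1, "l": 5.0}      # illustrative thresholds
q_new = ((0.0, 0.0), (10.0, 0.0))
q_cached = ((0.5, 0.0), (10.5, 0.0))
print(ppattern_score(q_new, q_cached, w, delta) > 0)  # True
```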


Fig. 5. Correlation between source-destination distance and the shortest path distance.

When the score p(q_{s',t'}, q_{s,t}) is over a threshold, it is very likely that the two paths share a path segment. In PPattern detection, given a query, we check every queried-path in cache and select the ones with the top scores as PPattern candidates:

PT(q_{s',t'}, q_{s,t}) = { 1 : p(q_{s',t'}, q_{s,t}) > τ
                         { 0 : otherwise,   (9)

where τ is a probability threshold. Note that in the above, all distances are Euclidean distances.
Based on the above, we select the highest-ranked PPatterns for a new query. In the following, we prove that the estimation error is upper-bounded by 2α(Δ_s + Δ_t), where Δ_s and Δ_t are the source-source and destination-destination distance thresholds, and α is a parameter approximating the relation between the shortest path distance of two points on a road network and their Euclidean distance, i.e., SPD(a, b) = αD(a, b). The factor α can be estimated through an empirical study on the road network. For example, Fig. 5 shows the correlation between the shortest path distance and the Euclidean distance of endpoints for 5k queries on a real road network database. To validate the estimation bound, Fig. 6 summarizes three scenarios where a new query q_{s,t} is estimated from a pattern candidate q_{s',t'}. In the first scenario (see Fig. 6a), there exists at least one common segment between the paths of the two queries. In the other two scenarios (shown in Figs. 6b and 6c), there exist no common segments, but the two queries are similar to each other. We calculate the maximal estimation error for each scenario as follows.
Proof. According to Eq. (4) and the assumption SPD(a, b) = αD(a, b), the estimation distance error is represented as

e = α(D(s', s) + D(t, t') + D(s, t) − D(s', t')).   (10)

In Scenario 6(a),

e = α(D(s', s) + D(t, t') + D(s, a) + D(b, t) − D(s', a) − D(b, t')).   (11)

By the triangle inequality D(x, y) − D(x, z) ≤ D(y, z), the above equation is converted as follows:

e ≤ α(D(s', s) + D(t, t') + D(s, s') + D(t, t')).   (12)

Fig. 6. Three scenarios for error estimation of q_{s',t'} from p_{s,t}.

In Scenario 6(b),

e = α(D(s', s) + D(t, t') + D(s, o) + D(o, t) − D(s', o) − D(o, t'))
  ≤ α(D(s', s) + D(t, t') + D(s, s') + D(t, t')).   (13)

In Scenario 6(c), assume D(s, s') = D(t, o) and D(s, t) = D(s', o), so Eq. (10) is written as

e = α(D(s', s) + D(t, t') + D(s', o) − D(s', t'))
  ≤ α(D(s', s) + D(t, t') + D(o, t'))
  ≤ α(D(s', s) + D(t, t') + D(t, o) + D(t, t'))
  = α(D(s', s) + D(t, t') + D(s, s') + D(t, t')).   (14)

From Eqs. (12), (13), and (14), the maximal error is e ≤ 2α(D(s', s) + D(t, t')). As we require the endpoint nodes of two paths to be within distance Δ_x, i.e., D(x', x) < Δ_x, we have e ≤ 2α(Δ_s + Δ_t). In other words, the error is upper-bounded by 2α(Δ_s + Δ_t). □

4.2 An Efficient Grid-Based Solution

In order to retrieve these patterns efficiently, we propose a grid-based solution to further improve the system performance. The main idea is to divide the whole space into equally sized grid cells, onto which the endpoints of all paths are mapped. As such, the grid index facilitates efficient cache lookups [28], [29]. The distance measures can be approximated by counting the total number of covered grid cells. Therefore, Eq. (9) is transformed as follows:

PT(q_{s',t'}, q_{s,t}) = { 1 : g_s = g_{s'}, g_t = g_{t'}, and D(s, t) ≥ Δ_l
                         { 0 : otherwise,

where g(·) is the grid cell in which a node is located. Algorithm 1 lists the pseudo-code for detecting all PPatterns. We first check whether the distance between the source and the destination nodes is at least Δ_l (Lines 1 to 3). If the query fails this inspection, i.e., if the distance between the source and the destination is very small, the server computes the path directly to answer the query, as it may take a longer

time for cache lookups and estimation. After that, the target space is divided into equally sized grid cells by an input cell size (Line 4). Then we locate the grid cells (by index number; we discuss the cache index in Section 6.1) in which the query source and destination nodes are located (Line 5). The system retrieves all queries overlapping both the source and destination grid cells (Lines 6 and 7) and inserts them into Q (Line 8), a set containing the candidate queries. This process utilizes the grid index and path identifiers to reduce the lookup operations. The system returns the cached paths in Q according to the query identifiers (Lines 9 and 10).

Algorithm 1. PPatterns Detection
Input: qs,t: a query; Δl: distance threshold; Δg: grid cell size; C: a cache.
Output: All candidate PPatterns PT.
1: if D(s, t) < Δl then
2:   Return PT = ∅.
3: end if
4: Divide the target space by size Δg.
5: Determine the start grid gs and destination grid gt.
6: Qs ← Logged queries whose paths pass gs.
7: Qt ← Logged queries whose paths pass gt.
8: Q ← Intersect(Qs, Qt).
9: PT ← (Sub)paths from gs to gt for each query in Q.
10: Return PT.
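Algorithm 1 reduces to a hash-map intersection once the grid index exists. The following is a minimal Python sketch of that lookup; the function and parameter names (`detect_ppatterns`, `cell_size`) are illustrative, the cache is modeled as a plain dict from grid cell to path ids, and the sub-path extraction of Line 9 is omitted:

```python
import math

def grid_cell(point, cell_size):
    """Map an (x, y) coordinate to its grid cell id (row, col)."""
    return (int(point[0] // cell_size), int(point[1] // cell_size))

def detect_ppatterns(query, cache, dist_threshold, cell_size):
    """Sketch of Algorithm 1: return the ids of cached paths that pass
    through both the source and destination grid cells of `query`.
    `cache` maps a grid cell id to the set of path ids crossing it."""
    s, t = query
    # Lines 1-3: very short queries are answered by the server directly.
    if math.dist(s, t) < dist_threshold:
        return []
    gs, gt = grid_cell(s, cell_size), grid_cell(t, cell_size)
    # Lines 6-8: intersect the path-id lists of the two cells.
    qs = cache.get(gs, set())
    qt = cache.get(gt, set())
    return sorted(qs & qt)
```

For example, with `cache = {(0, 0): {2, 7}, (2, 2): {2}}`, a query from (1, 1) to (11, 11) with a cell size of 4 intersects the two cells and yields the single candidate path 2, while a query shorter than the threshold returns no candidates.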

5 CACHE-SUPPORTED SHORTEST PATH ESTIMATION

Based on the PPatterns detected above, we estimate the shortest path for a new query using Eq. (6). Note that the detected PPatterns contribute at least a part of the answer path returned to the users and thus actually increase the cache utilization. To facilitate our discussion, we refine the concept of a traditional cache hit as follows:
Definition 7 (Complete hit). Given a path planning query (a source-destination node pair), a complete hit occurs if there exists at least one queried-path in cache whose source-destination node pair matches exactly with that of the input query.

Definition 8 (Partial hit). Given a path planning query, a partial hit occurs if there exists at least one queried-path in cache detected as a PPattern of the input query which, however, is not a complete hit.
Extended from the grid-based index discussed previously, Fig. 7 summarizes four scenarios for estimating cache hits. In Case 1, there exists a complete cache hit, so the cached path is directly returned to the user without contacting the server. In Cases 2, 3, and 4, there exist partial cache hits. The cached paths (indicated by solid lines), whose source and destination nodes are located in the source grid cell and destination grid cell where vs and vt of the new query are located, can be obtained immediately without contacting the server. Only the uncovered segments (dashed lines) need to be computed by the server.
To detect the estimated shortest path, we propose a heuristic algorithm as shown in Algorithm 2. If there exists no PPattern, the system contacts the server to compute the shortest path and returns it immediately (Lines 1 to 3). If there exists a complete cache hit, the corresponding path is returned immediately (Lines 6 to 8). Otherwise, the system calculates the estimated distance of each pattern candidate in PT for the query and selects the one with the minimal distance. To improve the performance, we adopt an approximate distance (Lines 9 to 19) by calculating the Euclidean distance between the source-source (Lines 9 to 11) and destination-destination nodes (Lines 12 to 13) rather than the true shortest path distance on the road network. Only the best PPattern is sent to the server for true shortest path computation (Lines 21 to 26). Finally, the system combines the newly calculated source-source and/or destination-destination shortest path(s) and the cached segment into a complete path, which is returned to the user (Line 27).

Fig. 7. Illustration of the four cases in PPC.

6 CACHE MANAGEMENT

In a cache-supported system, it is important to efficiently manage the cache contents to accelerate path planning. Therefore, in this section, we first discuss the implementation of a grid-based index, followed by a dynamic cache update and replacement policy.

6.1 Index Structure for Cache Lookups

The grid index is primarily used to improve the I/O access time of the road network topology to support the path selection operation in the path planning service. The basic idea is to divide the spatial region uniformly into grid cells. The cache retains two tables for efficient cache lookups. The first table records each grid cell through which paths have passed; it allows quick identification of potential PPatterns for a new query. The second table records all nodes of each path in their traveling order; it is accessed when the final path for the new query is determined.

We first determine the target space as the minimal bounding rectangular region that covers all cached paths. This space is then divided into a set of equally sized grid cells. For each grid cell, we maintain an entry in the first table recording the IDs of all pass-by paths:

GridID:(pathID),...,(pathID).


TABLE 1
An Example of Cached Paths

cpid   path
1      p3,6 = {v3, v5, v10, v6}
2      p23,45 = {v23, v45, v68, v79, v45}
...
n      p13,41 = {v13, v64, v41}
Fig. 8. An example of a cache using a grid structure.

This means that at least one node of the path overlaps the grid cell. Fig. 8 shows an example of the grid-based index, in which the target space is divided into 25 grid cells, each with a unique ID numbered from 1 to 25. There are two paths, denoted as cpid = 2 and cpid = 7, in this region. As the starting nodes of these two paths reside in grid cell g5, we retain a list for g5 as ⟨2, 7⟩. When a path is inserted into or removed from the cache, this list is updated accordingly. The grid-based index improves the search efficiency of cache lookup operations. When we receive a new query qs,t whose source node vs is located in cell 5 and whose destination node vt is located in cell 13, the system directly searches the grid index to find the path with cpid = 2 that has the same source and destination nodes.
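The two cache tables and their maintenance can be sketched as a small Python class (the class and method names are illustrative, not from the paper; `node_cell` stands in for the mapping from a node to its grid cell):

```python
class GridCache:
    """Sketch of the two lookup tables of Section 6.1:
    `grid_table` maps a grid cell id to the ids of paths passing it;
    `path_table` maps a path id to its nodes in traveling order."""
    def __init__(self):
        self.grid_table = {}   # GridID -> set of pathIDs
        self.path_table = {}   # pathID -> [nodeID, ...]

    def insert_path(self, pid, nodes, node_cell):
        """Store a path and register it in every cell it passes through."""
        self.path_table[pid] = list(nodes)
        for v in nodes:
            self.grid_table.setdefault(node_cell[v], set()).add(pid)

    def remove_path(self, pid, node_cell):
        """Drop a path and unregister it from the cells it passed."""
        for v in self.path_table.pop(pid, []):
            self.grid_table.get(node_cell[v], set()).discard(pid)

    def lookup(self, gs, gt):
        """Ids of paths passing both the source and destination cells."""
        return self.grid_table.get(gs, set()) & self.grid_table.get(gt, set())
```

Mirroring the Fig. 8 example, after inserting a path with cpid = 2 crossing cells 5 and 13 and a path with cpid = 7 starting in cell 5, `lookup(5, 13)` returns `{2}`, and removing path 2 updates both tables consistently.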

Algorithm 2. Shortest Path Estimation
Input: query source node s' and destination node t'; all candidate PPatterns PT; a cache C.
Output: Estimated shortest path p̂.
1: if isEmpty(PT) then
2:   p̂ ← Calculate path from server and return.
3: end if
4: Initialize Estimated Shortest Distance ESD ← ∞.
5: for each path p ∈ PT do
6:   if p is a complete hit then
7:     Return p̂s',t' ← p.
8:   end if
9:   s* ← arg min s∈Vp D(s', s).
10:  ds ← D(s', s*).
11:  Remove s* from path node-set Vp.
12:  t* ← arg min t∈Vp D(t, t').
13:  dt ← D(t*, t').
14:  Let dr ← |SDP(s*, t*)|.
15:  d̂ ← ds + dr + dt.
16:  if d̂ < ESD then
17:    ESD ← d̂.
18:    Update best PPattern p* ← ps*,t*.
19:  end if
20: end for
21: if s' is not equal to the source node s* of p* then
22:   SDP(s', s*) ← Compute shortest path from s' to s*.
23: end if
24: if t' is not equal to the destination node t* of p* then
25:   SDP(t*, t') ← Compute shortest path from t* to t'.
26: end if
27: Return p̂ ← SDP(s', s*) + p* + SDP(t*, t').
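The estimation loop of Algorithm 2 can be sketched in Python as follows. All names here are illustrative, `shortest_path(a, b)` stands in for the server-side routing call, and, for brevity, the cached sub-path length of Line 14 is approximated by the straight-line distance between s* and t* rather than the stored road distance:

```python
import math

def estimate_path(s, t, ppatterns, coord, shortest_path):
    """Sketch of Algorithm 2. `ppatterns` is a list of cached node
    sequences, `coord` maps nodes to (x, y) coordinates, and
    `shortest_path(a, b)` asks the server for the exact path a -> b."""
    if not ppatterns:
        return shortest_path(s, t)        # Lines 1-3: no PPattern at all
    best, best_d = None, math.inf
    for p in ppatterns:
        if p[0] == s and p[-1] == t:
            return p                      # Lines 6-8: complete hit
        # Lines 9-13: nearest cached nodes to the query endpoints,
        # measured by Euclidean distance.
        s_star = min(p, key=lambda v: math.dist(coord[s], coord[v]))
        t_star = min((v for v in p if v != s_star),
                     key=lambda v: math.dist(coord[v], coord[t]))
        d = (math.dist(coord[s], coord[s_star])
             + math.dist(coord[s_star], coord[t_star])  # stand-in for dr
             + math.dist(coord[t_star], coord[t]))
        if d < best_d:
            best_d, best = d, (p, s_star, t_star)
    p, s_star, t_star = best
    i, j = p.index(s_star), p.index(t_star)
    mid = p[i:j + 1] if i <= j else p[j:i + 1][::-1]
    # Lines 21-27: compute only the uncovered segments and stitch.
    head = shortest_path(s, s_star)[:-1] if s != s_star else []
    tail = shortest_path(t_star, t)[1:] if t != t_star else []
    return head + mid + tail
```

With nodes laid out on a line, a query s -> t around a cached path ['a', 'b', 'c'] reuses the whole cached segment and only asks the server for the two short connecting hops.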

For each cached path, all of its nodes are maintained in the second table, listed strictly in their traveling order, with entries of the form:

pathID:(nodeID),...,(nodeID).

A cache C stores the result paths ps,t returned by queries qs,t, where each result path consists of a node sequence (e.g., ps,t = {v0, v1, ..., vm}). As shown in Table 1, each result path stored in the cache is identified by a cache id cpid (e.g., c1 represents p3,6).

6.2 Cache Construction and Update

Due to the limited cache size, it is necessary to determine which queried-paths should be evicted when the cache is full. In this section, we propose a new cache replacement policy that exploits unique characteristics of road networks. In road networks, certain routes are usually preferred by users [2]. Empirically, these common road segments are usually of a higher road type, because major roads are taken more frequently than branch roads due to their functions and capacities [30]. For example, the interstate highway between San Jose and San Francisco is frequently taken when traveling between these two cities; as a result, interstate highways are assigned higher road type values. Based on this observation, we define edge and node weights as follows.
Definition 9 (Edge and node weights). In a road network G = (V, E), each edge from node vi to vj is associated with a weight Wi,j, a system-determined value according to the road type; a higher Wi,j indicates a higher road type. Additionally, we use Wi to denote the weight of a node vi, whose value is determined by the maximum weight of all its outgoing edges:

Wi = max j Wi,j.   (15)

We define the node weight as the maximum weight of all its outgoing edges, as users can select one route from an intersection. We assume that users prefer higher road types [3], which can be determined based on capacity, popularity, and accessibility. In this work, we adopt the ten road types defined by OpenStreetMap and allocate a linearly increasing weight to each type, starting from 0.01 for unclassified roads up to 1.0 for motorways. Fig. 9 shows an example of node and edge weights corresponding to road types. Three edges are connected to node v1, with weights W1,2 = 0.4, W1,3 = 0.5, and W1,4 = 0.7, respectively. Based on Eq. (15), the weight W1 is set to W1,4 = 0.7. Table 2 shows several examples of node weights and their primary road types in the real road network used in our experiments. The weight of a node indicates how likely a path planning query with it as the source node is to be issued in the future. Such information can be used to design the cache replacement policy. In


TABLE 3
Two Stored Paths in the Cache

cpid   path                       μ
1      p3,6 = {v3, v1, v6}        2.8
2      p2,4 = {v2, v6, v1, v4}    3.3

Fig. 9. An example of edge and node weights.

this paper, we define the usability μ of a path to measure the importance of the path as follows:

μ(pi,j) = Σ k=1..n W(vk) × count(vk),   (16)

where vk is a node on the path pi,j and count(vk) returns the number of times that node vk has been queried in the past. We use an example to illustrate the calculation of the usability value μ for a path. Assume there exist two result paths in the cache, with their nodes shown in Table 3. For path p3,6, we count the occurrences of each of its nodes and multiply each count by the corresponding node weight. Thus, its usability value is calculated as μ(p3,6) = 0.8 × 1 + 1 × 2 + 0 × 2 = 2.8. Similarly, the usability value of path p2,4 is μ(p2,4) = 0.8 × 1 + 1 × 2 + 0 × 2 + 0.5 × 1 = 3.3. A cached path with a lower usability value is more likely to be evicted when the cache is full.
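Eq. (16) is a direct weighted sum and the worked example can be checked mechanically. In the sketch below, the weight and count assignments are back-solved from the μ values above, so they are assumptions consistent with the arithmetic rather than values from the experimental dataset:

```python
def usability(path, weight, count):
    """Eq. (16): usability of a cached path, i.e., the sum over its
    nodes of (node weight) x (times the node has been queried)."""
    return sum(weight[v] * count[v] for v in path)

# Assumed weights/counts reproducing the example:
# mu(p3,6) = 0.8*1 + 1*2 + 0*2 = 2.8, mu(p2,4) adds 0.5*1 = 3.3.
weight = {'v3': 0.8, 'v1': 1.0, 'v6': 0.0, 'v2': 0.8, 'v4': 0.5}
count  = {'v3': 1, 'v1': 2, 'v6': 2, 'v2': 1, 'v4': 1}
p36 = ['v3', 'v1', 'v6']
p24 = ['v2', 'v6', 'v1', 'v4']
```

Evaluating `usability(p36, weight, count)` and `usability(p24, weight, count)` reproduces the 2.8 and 3.3 of Table 3.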

Algorithm 3. Cache Construction and Update
Input: a query q; a cache C.
Output: an updated cache C.
1: PT ← PPatterns Detection.
2: p ← Shortest Path Estimation from PT.
3: if C is not full then
4:   Insert p into C; Return C.
5: else
6:   {μ} ← Calculate usability for each cached path.
7:   pmin ← Path with the minimum usability.
8:   if pmin.μ < p.μ then
9:     C ← Replace pmin with p.
10:  end if
11: end if
12: Return C.
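The replacement step of Algorithm 3 (Lines 3 to 11) can be sketched as follows; the function name and the list-based cache representation are illustrative choices, not the paper's implementation:

```python
def update_cache(cache, new_path, capacity, weight, count):
    """Sketch of Algorithm 3's replacement policy: when the cache is
    full, evict the path with the minimum usability (Eq. (16)) if the
    new path is more useful; otherwise leave the cache unchanged."""
    mu = lambda p: sum(weight[v] * count[v] for v in p)
    if len(cache) < capacity:
        cache.append(new_path)            # Lines 3-4: room left, just insert
        return cache
    victim = min(cache, key=mu)           # Lines 6-7: least useful path
    if mu(victim) < mu(new_path):         # Lines 8-9: replace if beneficial
        cache[cache.index(victim)] = new_path
    return cache
```

With the example weights of Table 3, inserting p2,4 (μ = 3.3) into a full single-slot cache holding p3,6 (μ = 2.8) evicts p3,6, while the reverse insertion leaves the cache untouched.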

Accordingly, we gradually update (construct) the cache as new queries arrive. Algorithm 3 provides the pseudo code for this operation. Based on the current cache (empty at the beginning), when a new query comes in, we estimate its path if any PPattern exists; otherwise the path is retrieved from the server (Lines 1 to 2). If the cache C is not full, the path p is inserted into the cache (Lines 3 to 4). Otherwise, a replacement is triggered (Lines 5 to 11): we check whether the usability value of the current path p is larger than the minimum usability value in the current cache, and if so, we place the current path into the cache.

7 EXPERIMENTS

7.1 Dataset
We conduct a comprehensive performance evaluation of the proposed PPC system using the road network dataset of Seattle obtained from the ACM SIGSPATIAL Cup 2012 [31]. The dataset has 25,604 nodes and 74,276 edges. For the query log, we obtain the points of interest (POIs) in Seattle from [32] and randomly select pairs of nodes from these POIs as the source and destination nodes of path planning queries. Four sets of query logs with different distributions are used in the experiments: QLnormal and QLuniform are query logs with normal and uniform distributions, respectively. QLcentral simulates a large-scale event (e.g., the Olympics or the World Cup) held in a city. QLdirection simulates possible driving behavior (e.g., changing direction) based on a random walk method described as follows. We first randomly generate a query to be the initial navigational route. Next, we randomly draw a probability to determine the chance for a driver to change direction; the point of direction change is treated as a new source. This process is repeated until the anticipated number of queries is generated. The parameters used in our experiments are shown in Table 4.
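The QLdirection random walk described above can be sketched as follows. This is a loose interpretation under stated assumptions: the paper does not give the change probability, and since the sketch has no actual routes, a direction-change point is drawn from the POI set rather than from the current route:

```python
import random

def directional_query_log(pois, n_queries, p_change=0.3, seed=42):
    """Sketch of a QLdirection-style generator. `p_change` (assumed)
    is the probability that the driver changes direction; the change
    point becomes the source of the next query."""
    rng = random.Random(seed)
    src, dst = rng.sample(pois, 2)        # initial navigational route
    queries = []
    while len(queries) < n_queries:
        queries.append((src, dst))
        if rng.random() < p_change:
            # Direction change: lacking the actual route geometry,
            # draw the change point from the POI set (assumption).
            src = rng.choice(pois)
        else:
            src = dst                     # otherwise continue onward
        dst = rng.choice([p for p in pois if p != src])
    return queries
```

Each generated query is a distinct (source, destination) pair, and consecutive queries tend to chain, which is the property that makes this workload favorable to partial hits.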
7.2 Cache-Supported System Performance

7.2.1 Cache versus Non-Cache
The main idea of a cache-supported system is to leverage cached query results to answer a new query. Thus, we are interested in how much improvement our path planning system achieves over a conventional non-cache system. We generate query sets of various sizes to compare the paths generated by our PPC and the A* algorithm. The performance is evaluated by two metrics: a) total number of visited nodes, which counts the number of nodes visited by an
TABLE 2
Road Type and Weight

Node   Weight Wi   Road type
v1     1.0         Motorway
v2     0.78        Primary
v3     0.78        Primary
v4     0.56        Tertiary
v5     0.34        Living street
v6     0.01        Unclassified

TABLE 4
Experimental Parameters

Parameter    Default    Value
Grid size    2 km       0.5 - 5 km
Cache size   5k         1k - 10k
#Queries     5k         0.5k - 5k
Data sets    QLnormal   QLnormal, QLuniform, QLcentral, QLdirection


TABLE 5
Performance Comparison between PPC and the Non-Cache Algorithm

         #Nodes                Time (ms)
#Query   PPC       A*          PPC          A*
1k       80,087    107,856     14,190,670   19,973,996
2k       157,162   215,459     27,869,212   39,889,166
3k       230,185   319,231     41,493,092   59,983,844
4k       328,879   419,345     55,139,411   79,937,684
5k       437,362   501,312     69,843,232   100,037,461

algorithm under comparison in computing a path, and b) total query time, which is the total time an algorithm takes to compute the path. By default, we apply 3,000 randomly generated queries to warm up the cache before measuring experimental results. Table 5 summarizes these two metrics for five query sets of different sizes. From the statistics we find that our cache-supported algorithm greatly reduces both the total number of visited nodes and the total query time. On average, PPC saves 23 percent of visited nodes and 30.22 percent of response time compared with a non-cache system.
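The per-row savings implied by Table 5 reduce to a one-line percentage computation. The sketch below (function name illustrative) evaluates it on the 5k row; note that the node saving of an individual row can differ from the 23 percent average over all query sets:

```python
def saving_ratios(nodes_cache, nodes_noncache, time_cache, time_noncache):
    """Percentage of visited nodes and of query time saved by the
    cache-supported system relative to the non-cache baseline."""
    d_node = (1 - nodes_cache / nodes_noncache) * 100
    d_time = (1 - time_cache / time_noncache) * 100
    return d_node, d_time

# Table 5, 5k row: 437,362 vs 501,312 nodes; 69,843,232 vs 100,037,461 ms.
d_node, d_time = saving_ratios(437362, 501312, 69843232, 100037461)
```

For this row the time saving comes out slightly above 30 percent, consistent with the average response-time saving reported above.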

7.2.2 Cache with Different Mechanisms

Performance comparison. We further compare the performance of our system (PPC) with three other cache-supported systems (LRU, LFU, SPC*) which adopt various cache replacement or cache lookup policies. The first two algorithms detect conventional (complete) cache hits when a new query arrives, but update the cache contents using either the Least Recently Used (LRU) or the Least Frequently Used (LFU) replacement policy, respectively. The third algorithm, the shortest-path cache (SPC*), is a state-of-the-art cache-supported system specifically designed for path planning, as PPC is. SPC* also detects whether any historical query in the cache matches the new query perfectly, but it additionally considers all sub-paths of historical query paths as historical queries. We compare these four cache mechanisms by converting the two metrics, the number of visited nodes and the response time, into saving ratios against the non-cache system for better presentation:
δnode = (1 − nodescache / nodesnoncache) × 100%,   (17)
δtime = (1 − timecache / timenoncache) × 100%.   (18)

The visited node saving ratio and query time saving ratio indicate how many nodes and how much time an algorithm saves compared with a non-cache routing algorithm (e.g., A*), respectively; a larger value indicates better performance. In the experiment, we increase the total query number from 1k to 5k and calculate the above two metrics for each cache mechanism, with the results shown in Fig. 10. The x-axis represents the total number of queries, while the y-axis indicates the metric values in percentages. From these figures, we can see clearly that our cache policy always achieves the best performance among all measurements. On average, LFU, LRU, SPC*, and PPC visit 30.47, 26.86, 27.78, and 34.73 percent fewer nodes than the A* algorithm, and reduce the computational time of the A* algorithm by 29.83, 26.32, 27.04, and 32.09 percent, respectively. As shown, our algorithm significantly outperforms the other cache mechanisms in path planning.

Fig. 10. Performance comparison with four cache mechanisms in terms of (a) visited node saving ratio, and (b) query time saving ratio with different numbers of queries.

Performance analysis. In a cache-supported system, if the cached results can be (partially) reused, the server workload can be alleviated. Thus, we measure the hit ratio as follows:

δhit = hitscache / |Q| × 100%,   (19)

where hitscache is the total number of hits and |Q| is the total number of queries. The hit ratio results of the different cache mechanisms are compared in Fig. 11, from which we find that, as expected, PPC achieves a much higher hit ratio than the other three methods in all scenarios. We further analyze the correlation between the hit ratio and the performance metrics in terms of visited node saving ratio and query time saving ratio, with the results shown in Fig. 12. From the figures, we make the following observations:

- The visited node saving ratio and query time saving ratio are proportional to the hit ratio. Generally, with a higher hit ratio, the system performance improves as well. This is reasonable, as the system does not need to recompute the paths by analyzing the original road network graph, but retrieves the results directly from the cache when a cache hit occurs. However, the saving ratios for visited nodes are not the same as those for query time. For example, PPC visits around 50 to 60 percent fewer nodes, but the response time saved is around 30 to 40 percent. A possible reason is that different nodes play different roles in the road map: when a query occurs in a subgraph with a more complex structure, the routing usually takes longer, as its computation may have more constraints.
- The inconsistency above is particularly obvious in PPC, probably because PPC leverages partial hits to answer a new query: the remaining segments still need computation on the road network graph, i.e., PPC does not always save sub-paths if they require complex computations.

Fig. 11. Performance comparison with four cache replacement mechanisms in terms of hit ratio.

Fig. 12. Correlation between (a) hit ratio and visited node saving ratio, and (b) hit ratio and query time saving ratio.
Because PPC leverages both complete and partial hit queries to answer a new query, we additionally measure the visited node saving ratio for partial hits. We ran an experiment with 5k queries and plot the results in Fig. 13. From the figure we can see that partial hits appear evenly along the temporal dimension. On average, they achieve a 97.63 percent saving ratio, which is quite close to the complete-hit saving ratio (100 percent). Among all cache hits, the percentages of complete and partial hits are illustrated in Fig. 14a, and their node saving percentages are shown in Fig. 14b; the x-axis is the query size and the y-axis is the percentage value. Notice that partial hits do not achieve a 100 percent node saving ratio. However, as partial hits occur much more frequently than complete hits, their overall benefit to the system performance outweighs that of complete hits: on average, partial hits make up 92.14 percent of all cache hits, and the average node saving ratio from partial hits is 31.67 percent, ten times that from complete hits at 3.04 percent.

7.2.3 Cache Construction Time

Because both SPC* and PPC are cache-supported systems for path planning, we additionally compare their cache construction time. SPC* is designed as a static cache, i.e., the cache is updated after a pre-determined number of queries has accumulated, so the cache is constructed periodically. PPC is designed as a dynamic cache, i.e., the cache is updated whenever a new query is inserted, so the cache is constructed gradually over time. Therefore, for a fair comparison, we apply our algorithm to a batch of consecutively inserted queries and calculate the total cache update time (note that the routing time has been excluded for both
Fig. 14. Comparison between partial hits and complete hits using PPC
with different query sets in terms of (a) hit ratio and (b) visited node
saving ratio.

systems). The comparison results are illustrated in Table 6. From the statistics in this table, we can see that our algorithm significantly reduces the construction time, to 0.01 percent of that of SPC* on average. Such significant improvement may be due to the following reason: let the total size of the log files be n nodes; the time complexity of computing usability values for the paths in the cache is O(n), whereas SPC* needs to compute the usability values for all possible paths, resulting in a time complexity of O(n²).

7.2.4 Query Log Distributions

The previous experiments were conducted on queries generated with a normal distribution. In realistic scenarios, however, the queries may follow different distributions. To investigate whether our algorithm robustly achieves satisfactory results under different distributions, we generate three more query sets with uniform, central, and directional distributions (denoted as QLuniform, QLcentral, and QLdirection, respectively). Under a uniform distribution, each query on the road network appears with equal probability. The central distribution models the case where there is a large-scale event. The directional distribution formulates the driving behavior in which a driver may continuously change directions. Experiments are carried out to measure both the visited node saving ratio and the query time saving ratio for the LFU, LRU, SPC*, and PPC algorithms. All other
TABLE 6
Cache Update Time Comparison between PPC and SPC* with Different Numbers of Queries (Unit: ms)

Number of query logs   SPC*        PPC
1K                     497,359     51
2K                     1,006,822   102
3K                     1,529,832   157
4K                     2,034,401   213
5K                     2,568,893   263

Fig. 13. Visited node saving ratio distribution among partial hit queries with #Q = 5K.


Fig. 16. PPC performance analysis: effect of grid size. (a) Visited node
saving ratio and query time saving ratio, and (b) average deviation
percentage.

Fig. 15. Performance comparison among four cache mechanisms with various data distributions for query logs in terms of (a,c,e) visited node saving
ratio and (b,d,f) query time saving ratio with different numbers of queries.

parameters use the default values. The results for each distribution are shown in Fig. 15. As shown, PPC always achieves the highest score in all scenarios.

A statistical analysis of these metrics is summarized in Table 7. To measure the performance improvement, we calculate an improvement factor over the second best method, denoted by Δ, as follows:

Δ = δPPC / max{δLRU, δLFU, δSPC*}.   (20)

TABLE 7
Average Performance Using Four Cache Mechanisms for Different Query Distributions

Metric   Distribution   PPC     LRU     LFU     SPC*    Δ
δnode    Uniform        20.79   10.31   10.69   11.01   1.89
         Central        45.89   35.32   31.47   28.36   1.30
         Directional    31.96   0.74    0.63    1.13    28.16
δtime    Uniform        20.70   10.27   10.73   11.03   1.88
         Central        45.67   35.57   31.61   27.65   1.28
         Directional    33.62   0.96    0.85    1.47    22.84
δhit     Uniform        18.41   10.48   10.71   11.47   1.61
         Central        46.45   35.14   31.36   28.13   1.32
         Directional    31.12   0.81    0.71    1.19    26.17

From these results, we make the following observations:

- PPC always outperforms LFU, LRU, and SPC* under all query distributions. In both the uniform and directional scenarios, SPC* works slightly better than LFU

and LRU. However, under the central distribution, SPC* performs the worst.
- PPC receives the highest saving ratio (e.g., δtime = 45.67%) under the centrally distributed queries, but achieves the best performance improvement over the second best method (e.g., Δtime = 22.84) under the directional distribution.
- Under the directional distribution, we observe a low saving ratio, below 2 percent, for LFU, LRU, and SPC*. This happens because cache-supported systems introduce an additional cache lookup overhead; when the hit ratio is very low (e.g., 1.19 percent for SPC*), the average response time quickly increases due to frequent requests to the server. Under a directional distribution, queries are very likely to be a subpart of subsequent queries. Such characteristics fit well with the PPC model but not with the other models.

7.3 Parameter Analysis

PPC successfully reduces the system workload by making full use of the cached results to estimate the shortest path for a new query within a tolerable distance difference. So, together with the benefits in terms of visited node saving ratio and query time saving ratio, PPC introduces a service cost due to this distance difference as well. This cost can be measured by the average deviation percentage [1] using Eq. (21), where a smaller value implies a smaller cost:

δdist = (|p̂| − |SDP(q)|) / |SDP(q)| × 100%.   (21)

What follows is a discussion of how the system benefit and cost are affected by various system parameters, such as the grid-cell size, the cache size, the minimal source-destination distance, and the temporal factor.

7.3.1 Effect of Grid-Cell Size

PPC adopts a grid-based solution to detect the potential PPatterns for a new query, so the size of the grid cell directly impacts the hit ratio and the system performance. We examine the system performance in terms of both benefit and cost by varying the grid-cell size from 1 to 5 km. The results are shown in Figs. 16a and 16b, respectively; the x-axis indicates the grid size and the y-axis indicates the metric values as percentages. From these figures, we find that the system obtains higher visited node saving ratios and query time saving ratios as the grid size increases. However, the average deviation percentage increases at the same time. By increasing


Fig. 17. PPC performance analysis: effect of cache size. (a) Visited node
saving ratio and query time saving ratio, and (b) average deviation
percentage.

the grid size, more cached paths are retrieved as cache hits, which prevents sending a completely new query to the server. However, it may also retrieve less relevant paths, so the average deviation percentage increases as well. In a real system, by adjusting the grid-cell size, we can empirically keep a satisfactory balance between the benefit and the cost.

7.3.2 Effect of Cache Size

The size of a cache directly determines the maximal number of paths a system can maintain. In this section, we measure the system performance with varying cache sizes, with the results shown in Fig. 17. The x-axis is the cache size in terms of the total number of nodes a system can store; we vary it from 1k to 10k nodes in increments of 1k. The y-axis indicates the metric values. We observe that as the cache size increases, the system saves more visited nodes and query time, but with a larger deviation percentage. This is because a bigger cache can potentially maintain more paths and thus increases the opportunity of a cache hit; however, it may also introduce less relevant paths. We can choose a proper cache size to avoid unsatisfactory deviation while still saving query time.
7.3.3 Effect of Source-Destination Minimal Length

The minimal source-destination distance threshold Δl in the PPatterns detection algorithm (Algorithm 1) is another tunable factor. Fig. 18 illustrates the system performance with different distance thresholds. By increasing this threshold, the deviation percentage decreases, as expected, because the coherency property indicates that queries are more likely to share sub-paths when the source node is distant from the destination node. However, the visited node saving ratio and query time saving ratio also decrease, because it takes more space to store a longer path; the total number of paths retained in the cache therefore drops, and the probability of a cache hit consequently decreases.

Fig. 19. PPC performance analysis: effect of temporal factor. (a) Visited node saving ratio and query time saving ratio, and (b) average deviation percentage.

7.3.4 Effect of the Temporal Factor

Lastly, we investigate the system performance as time passes. We consecutively insert 50 query groups, each with 100 queries, into the system and observe the average saving ratios and deviation percentage for each group. The statistics are illustrated in Fig. 19; the x-axis indicates the group ID, from 1 to 50, and the y-axis indicates the values of the different evaluation metrics. As shown, these three ratios change continuously as time passes, but the variation remains steady, which implies that our system robustly and efficiently plans paths for new queries.

8 CONCLUSION

In this paper, we propose a system, namely Path Planning by Caching (PPC), to answer a new path planning query with rapid response by efficiently caching and reusing historical queried-paths. Unlike conventional cache-based path planning systems, where a queried-path in cache is used only when it matches the new query perfectly, PPC leverages partially matched cached queries to answer part(s) of a new query. As a result, the server only needs to compute the unmatched segments, which significantly reduces the overall system workload. Comprehensive experimentation on a real road network database shows that our system outperforms state-of-the-art path planning techniques, reducing the computational latency by 32 percent on average.

REFERENCES
[1]
[2]
Fig. 18. PPC performance analysis: effect of minimal source-destination
distance. (a) Visited node saving ratio and query time saving ratio, and
(b) average deviation percentage.

H. Mahmud, A. M. Amin, M. E. Ali, and T. Hashem, Shared execution of path queries on road networks, Clinical Orthopaedics
Related Res., vol. abs/1210.6746, 2012.
L. Zammit, M. Attard, and K. Scerri, Bayesian hierarchical
modelling of traffic flow - With application to Maltas road
network, in Proc. Int. IEEE Conf. Intell. Transp. Syst., 2013,
pp. 13761381.

964

[3] S. Jung and S. Pramanik, "An efficient path computation model for hierarchically structured topographical road maps," IEEE Trans. Knowl. Data Eng., vol. 14, no. 5, pp. 1029-1046, Sep. 2002.
[4] E. W. Dijkstra, "A note on two problems in connexion with graphs," Numer. Math., vol. 1, no. 1, pp. 269-271, 1959.
[5] U. Zwick, "Exact and approximate distances in graphs - a survey," in Proc. 9th Annu. Eur. Symp. Algorithms, 2001, vol. 2161, pp. 33-48.
[6] A. V. Goldberg and C. Silverstein, "Implementations of Dijkstra's algorithm based on multi-level buckets," Network Optimization, vol. 450, pp. 292-327, 1997.
[7] P. Hart, N. Nilsson, and B. Raphael, "A formal basis for the heuristic determination of minimum cost paths," IEEE Trans. Syst. Sci. Cybern., vol. SSC-4, no. 2, pp. 100-107, Jul. 1968.
[8] A. V. Goldberg and C. Harrelson, "Computing the shortest path: A* search meets graph theory," in Proc. ACM Symp. Discr. Algorithms, 2005, pp. 156-165.
[9] R. Gutman, "Reach-based routing: A new approach to shortest path algorithms optimized for road networks," in Proc. Workshop Algorithm Eng. Experiments, 2004, pp. 100-111.
[10] A. V. Goldberg, H. Kaplan, and R. F. Werneck, "Reach for A*: Efficient point-to-point shortest path algorithms," in Proc. Workshop Algorithm Eng. Experiments, 2006, pp. 129-143.
[11] S. Jung and S. Pramanik, "An efficient path computation model for hierarchically structured topographical road maps," IEEE Trans. Knowl. Data Eng., vol. 14, no. 5, pp. 1029-1046, Sep. 2002.
[12] R. Goldman, N. Shivakumar, S. Venkatasubramanian, and H. Garcia-Molina, "Proximity search in databases," in Proc. Int. Conf. Very Large Data Bases, 1998, pp. 26-37.
[13] N. Jing, Y.-W. Huang, and E. A. Rundensteiner, "Hierarchical optimization of optimal path finding for transportation applications," in Proc. ACM Conf. Inf. Knowl. Manage., 1996, pp. 261-268.
[14] N. Jing, Y.-W. Huang, and E. A. Rundensteiner, "Hierarchical encoded path views for path query processing: An optimal model and its performance evaluation," IEEE Trans. Knowl. Data Eng., vol. 10, no. 3, pp. 409-432, May/Jun. 1998.
[15] U. Demiryurek, F. Banaei-Kashani, C. Shahabi, and A. Ranganathan, "Online computation of fastest path in time-dependent spatial networks," in Proc. 12th Int. Conf. Adv. Spatial Temporal Databases, 2011, pp. 92-111.
[16] H. Gonzalez, J. Han, X. Li, M. Myslinska, and J. P. Sondag, "Adaptive fastest path computation on a road network: A traffic mining approach," in Proc. 33rd Int. Conf. Very Large Data Bases, 2007, pp. 794-805.
[17] J. R. Thomsen, M. L. Yiu, and C. S. Jensen, "Effective caching of shortest paths for location-based services," in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2012, pp. 313-324.
[18] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 3rd ed. Cambridge, MA, USA: MIT Press, 2009.
[19] E. Markatos, "On caching search engine query results," Comput. Commun., vol. 24, no. 2, pp. 137-143, 2001.
[20] R. Ozcan, I. S. Altingovde, and O. Ulusoy, "A cost-aware strategy for query result caching in web search engines," in Proc. Adv. Inf. Retrieval, 2009, vol. 5478, pp. 628-636.
[21] R. Baeza-Yates, A. Gionis, F. Junqueira, V. Murdock, V. Plachouras, and F. Silvestri, "The impact of caching on search engines," in Proc. 30th Annu. Int. ACM Conf. Res. Develop. Inf. Retrieval, 2007, pp. 183-190.
[22] R. Baeza-Yates and F. Saint-Jean, "A three level search engine index based in query log distribution," String Process. Inf. Retrieval, vol. 2857, pp. 56-65, 2003.
[23] R. Ozcan, I. S. Altingovde, B. B. Cambazoglu, F. P. Junqueira, and Ö. Ulusoy, "A five-level static cache architecture for web search engines," Inf. Process. Manage., vol. 48, no. 5, pp. 828-840, 2012.
[24] R. Ozcan, I. S. Altingovde, and O. Ulusoy, "Static query result caching revisited," in Proc. 17th Int. Conf. World Wide Web, 2008, pp. 1169-1170.
[25] Q. Gan and T. Suel, "Improved techniques for result caching in web search engines," in Proc. 18th Int. Conf. World Wide Web, 2009, pp. 431-440.
[26] X. Long and T. Suel, "Three-level caching for efficient query processing in large web search engines," in Proc. 14th Int. Conf. World Wide Web, 2005, pp. 257-266.
[27] J. Sankaranarayanan, H. Samet, and H. Alborzi, "Path oracles for spatial networks," Proc. VLDB Endowment, vol. 2, no. 1, pp. 1210-1221, 2009.

[28] H. Hu, J. Xu, and D. L. Lee, "A generic framework for monitoring continuous spatial queries over moving objects," in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2005, pp. 479-490.
[29] X. Xiong, M. F. Mokbel, and W. G. Aref, "SEA-CNN: Scalable processing of continuous k-nearest neighbor queries in spatio-temporal databases," in Proc. IEEE 21st Int. Conf. Data Eng., 2005, pp. 643-654.
[30] H. Kanoh, "Dynamic route planning for car navigation systems using virus genetic algorithms," Int. J. Knowl.-Based Intell. Eng. Syst., vol. 11, no. 1, pp. 65-78, 2007.
[31] M. Ali, J. Krumm, T. Rautman, and A. Teredesai, "ACM SIGSPATIAL GIS Cup 2012," in Proc. ACM Int. Conf. Adv. Geographic Inf. Syst., 2012, pp. 597-600.
[32] Cloudmade. (2015). [Online]. Available: downloads.cloudmade.com
Ying Zhang received the BE degree in computer science from Northwestern Polytechnical University, Xi'an, China, in 2009, and the PhD degree in computer science from the National University of Singapore in 2014. Her research interests include multimedia, location-based services, spatiotemporal databases, image/video processing, and machine learning. She is a member of the IEEE.

Yu-Ling Hsueh received the MS and PhD degrees in computer science from the University of Southern California in 2003 and 2009, respectively. She has been an assistant professor with the Department of Computer Science and Information Engineering, National Chung Cheng University, Taiwan, since 2011. Her research interests include spatiotemporal databases, mobile data management, scalable continuous query processing, and spatial data indexing. She is a member of the IEEE.
Wang-Chien Lee is an associate professor of
computer science and engineering at Pennsylvania State University, University Park, PA. He
currently leads the Intelligent Pervasive Data
Access Research (iPDA) Group at Penn State
University to pursue cross-area research in data
management, pervasive/mobile computing, and
networking. He is a member of the IEEE.

Yi-Hao Jhang received the BS and MS degrees in computer science and information engineering from the National Yunlin University of Science and Technology in 2012, and from the National Chung Cheng University in 2014, respectively. His research interests include mobile data management, query processing, and spatiotemporal databases.

For more information on this or any other computing topic, please visit our Digital Library at www.computer.org/publications/dlib.
