
Knowl Inf Syst (2018) 56:533–557

https://doi.org/10.1007/s10115-017-1146-x

REGULAR PAPER

From location to location pattern privacy in location-based services

Osman Abul1 · Cansın Bayrak1

Received: 21 December 2016 / Revised: 6 November 2017 / Accepted: 27 December 2017 / Published online: 8 January 2018
© Springer-Verlag London Ltd., part of Springer Nature 2018

Abstract Location privacy is extensively studied in the context of location-based services
(LBSs). Typically, users are assigned a location privacy profile and their precise locations
are cloaked so that the privacy profile is not compromised. Although well defined for
snapshot location privacy, these solutions require additional precautions and patches in the
case of consecutive LBS requests along the user trajectory, because the attacker can exploit
background knowledge such as maximum velocity to compromise the privacy profile. To protect
against this kind of location privacy attack, PROBE (Damiani et al. in Trans Data Priv
3(2):123–148, 2010)-like systems constantly check for location privacy violations and alter
requests as necessary. Clearly, location privacy in these systems is defined in terms of
snapshot locations. Observing that user-specific movement patterns usually exist in the shared
LBS requests, this work extends location privacy to location pattern privacy. We present a
framework where user-specific sensitive movement patterns are defined offline and sanitized
online. Our solution uses an efficient dynamic programming approach to detect and prevent
sensitive pattern disclosure. An extensive experimental evaluation has also been carried out.

Keywords Location-based services · Location privacy · Location pattern privacy · Data mining

1 Introduction

Smart mobile devices with positioning capabilities have driven the increased use of
location-based services (LBSs). For example, Foursquare, a popular LBS provider, has more than
60 million registered users and 55 million monthly active users with 7 billion total check-ins, as
B Osman Abul
osmanabul@etu.edu.tr
1 Department of Computer Engineering, TOBB University of Economics and Technology, Ankara,
Turkey


of December 2015.¹ It offers various location-based services: search, recommendations, tips
and expertise (on food, nightlife, fun, shopping and many more) as well as social networking.
Moreover, it offers an API for third party developers so that they can create their own LBSs
with respective applications. As a result, a large number of LBS providers have been spawned
and people are sharing their location information (with identity if the service is subscription
based) with them through various (mobile) applications. Many such services have become
habits for the majority of users, and their merits are widely acknowledged. Being free and useful,
many LBSs are found handy and attractive. However, there is an inherent risk: the privacy
of users can be compromised by location sharing.
The identity of users is not anonymous with subscription-based LBSs, i.e., LBS providers
link their request types and locations to their identities. Even for non-subscription-based
LBSs, it is highly likely that the identity can be revealed from the shared information requested
by the service. For instance, when home address is known, a request from this location reveals
the identity of requester. In this work, we assume that the identity of the users is known or
inferred by LBS providers and our problem is to protect their precise locations. The canonical
solution to the problem starts with creating coarser-cloaked regions and then reporting the
respective cloaking region covering the precise location. This way, users can get an acceptable
quality of service while disclosing less information on their precise locations.
Since location privacy requirements differ from person to person, a typical solution (e.g.,
PROBE [13]) creates a cloaking region map per profile. Each individual is then assigned a
privacy profile and hence owns a cloaking map. However, in this work we claim that this
solution indeed is not truly personal. In other words, ensuring that all the locations are cloaked
at each service request along the user trajectory does not mean that the user is safe with respect
to location-related privacy requirements. When we treat all the cloaked regions shared with
LBS requests so far as a unit, it might contain some user-specific sensitive patterns. In this
case, the threat comes from data mining: extracting possibly sensitive movement patterns
from the shared LBS requests. So, we need to identify the user-specific sensitive patterns
and constantly check whether any of them are about to be disclosed. If so, we need
to apply a sanitization procedure before forwarding the LBS request to the LBS provider. This
work extends location privacy to location pattern privacy. The motivation is detailed next.

1.1 Motivation

Suppose an LBS adopts a PROBE [13]-like location cloaking framework. First of all, it creates
and manages a location privacy profile pool. The pool contains a number of distinct privacy
profiles. When a new user registers with the LBS, it provides the user with the following
steps to specify the privacy profile with additional parameters. The typical steps of the offline
phase are as follows:
1. The user picks a privacy profile from the pool or creates a custom one
2. The user picks a geographic region of LBS access
3. The system assigns/generates a cloaking map to/for the user
There are three dimensions that differentiate the resulting cloaking map for the users:
(i) privacy profile picked, (ii) geographic region, and (iii) the cloaking method employed
in the third step above. As a result, for instance, different cloaking maps may be obtained
for two individuals sharing the same privacy profile within the same region in case different
algorithms are employed in the third step. On the other hand, two individuals may share
the same cloaking map whenever their options are the same for all of the three steps above.
1 http://expandedramblings.com/index.php/by-the-numbers-interesting-foursquare-user-stats/.


One can easily draw two facts based on these observations: (i) two individuals having
the same privacy profile does not imply that their cloaking maps are the same
too, and (ii) individuals may share the same cloaking map. Fact (ii) clearly suggests that the
same-cloaking-map relationship partitions all of the LBS users, i.e., every user in a particular
partition shares the same cloaking map. One fundamental question is whether all users within
the same partition have the same location privacy requirement, in other words, whether such
users are indistinguishable from the location privacy perspective. If the answer to this question is
NO, then such frameworks are indeed not truly personalized.
Clearly, two distinct users using the same cloaking map and having the same LBS
request sequence (always from the same place at the same time) will produce the same
footprint at LBS site. The question now is whether it is possible that the two users are
equally happy as far as their location privacy is concerned. Our answer to this ques-
tion is NO, just consider the following example. Suppose two distinct users Alice and
Bob have the same cloaking map and issue the same LBS request sequence. Also,
suppose that some of the request regions contain the private–personal locations (i.e.,
home, work place and frequented place) for Alice, but none for Bob. Then, Alice has
more reason than Bob to be concerned about disclosing the LBS requests. This is because the
trajectory is sporadic for Bob, but frequent for Alice. Hence, Alice needs to pay more
attention to this, as the LBS provider can extract (via data analysis/mining) private patterns
that Alice typically follows. This suggests that we need to define and protect
person-specific sensitive patterns defined over the whole trajectory of the user. So, we introduce
location pattern privacy in this work, which extends and patches location privacy for
LBSs.
Since our approach is truly personal, each user needs to specify his own sensitive location
patterns. To this end, we adopt temporally annotated sequences (TAS) pattern model [17],
in which each pattern has both spatial and temporal dimensions. We assume that each user
when provided with a cloaking map can manually specify his own sensitive pattern set by
examining the cloaking map, an example of which is shown in Fig. 2 and Example 1. However,
we recommend an alternative way for convenience, in which a particular user can privately
collect his own trajectories over a time period and can employ TAS miner [17] over the
collected dataset. The output is the frequent TAS pattern set specific for the user. He can
review these patterns and can pick a subset of them as sensitive. Distinguishing sensitive
from insensitive ones is a subjective matter, hence making the approach truly personal. For
instance, the user can decide that patterns involving his residence and his sensitive places
(like worship and nightlife) together in a time frame are sensitive. Note that the temporal
dimension matters too, for instance, for most people spending half an hour on a bar on the
way to home is less sensitive than spending a few hours.

1.2 Related work

Privacy issues due to disclosure of user-specific sensitive data (microdata) can be broadly
studied under two categories: (i) offline data-centric, and (ii) online service-centric. In the
former, the microdata are already stored by the server and are going to be shared with third
parties. In the latter, however, the microdata are still being accumulated at the server while
the user interacts with it. Note that the subject (the person to whom the microdata refer) in the
former need not be a user, while in the latter it is the user who can actively limit the
disclosure. For this reason, the data curator in the former category and the user in the latter
category bear the main responsibility for employing privacy-enhancing technologies. The privacy
problem we studied here falls in the online service-centric category.


1.2.1 Offline data-centric privacy

Sweeney [26] introduced the k-anonymity privacy principle as a means of limiting the
disclosure of sensitive information from tabular data. Typically, the data are perturbed by
generalization, e.g., ages to age groups and streets to counties. The principle ensures that
each subject is indistinguishable from at least k − 1 others, and hence any attack on record
linkage (based on quasi-identifiers) cannot succeed with more than 1/k probability. Other
privacy principles like l-diversity [22] and its refinement t-closeness [21] are proposed to
strengthen k-anonymity model.
Originally developed for tabular data publishing, k-anonymity model is extended for
data mining results publishing [9] and anonymous trajectory data publishing [2,19,23] too.
Optimal k-anonymization is proven to be NP-hard [2,5].
Data mining, i.e., extracting knowledge from databases, is shown to be a threat to database
privacy and security [24]. Typically, a sanitization process, i.e., removing sensitive patterns, is
applied before releasing database. Atallah et al. [8] provided a knowledge hiding framework
to suppress sensitive rules in frequent item set mining task. Their method alters the source
database so that the support and/or confidence of sensitive rules are reduced below a
threshold. Knowledge hiding in the context of sequences [1], spatio-temporal databases [3] and
tree–graph databases [4] has been studied too. More concretely, Abul et al. [3] selectively
suppress some points in trajectories so that the spatio-temporal database does not support
any of the user-specified sensitive patterns, which are expressed as temporally annotated
sequences (TAS) extracted by TAS Miner [17]. The work by Terrovitis and Mamoulis [27]
selectively suppresses some trajectory points so that the trajectory cannot be linked to the
owner under the background knowledge of partial trajectory.
In this work, we adopt TAS model to specify user-specific sensitive location patterns. We
would like to highlight that sensitive patterns employed by [3] are database specific and the
sanitization is offline; however, in this study the sensitive patterns are user specific and the
sanitization is online.

1.2.2 Online service-centric privacy

Since most service-centric applications are location based and classified under location-based
services (LBSs), we will constrain our scope to privacy issues in LBSs. The location pro-
tection and identity protection are the main privacy concerns of subscription-based LBSs
and anonymous LBSs, respectively. In the latter case, the classical solution relies on the con-
cept of location k-anonymity [15,18]. Similar to classical k-anonymity, location k-anonymity
requires at least k − 1 other users requesting services from the same coarse location, called
mix-zone, of the user. Services need to be delayed (temporal cloaking) most of the time as
service requests from at least k users from the same mix-zone at the same time rarely hap-
pen. The size of spatial cloaking and the size of temporal cloaking are the two performance
parameters.
In subscription-based LBSs, the typical approach is to obfuscate the true location of
the user. In Kido et al. [20], the user sends one or more fake positions in addition to the
true location. This way, the LBS provider is confused about the true location of the user,
but the solution has the overhead of increased traffic. Moreover, the server can extract the
true trajectory in case the user makes multiple requests along his trajectory. Note that in
this approach the user shares his exact coordinates. Private information retrieval (PIR) [16]
provides strong privacy guarantees by running an encryption-based protocol. However, the
communication/computation overhead is very high and, more importantly, the approach is


suitable only for pre-recorded static locations, e.g., to retrieve nearest points of interests.
Location perturbation methods based on Bayesian statistics [25] and differential privacy
[14] have been proposed too. Spatial cloaking, in which exact coordinates are replaced
with coarser region identifiers, is another popular approach employed by subscription-based
LBSs.
The spatial cloaking approach requires the area of operation to be divided into a num-
ber of uncertainty regions [10]. The collection of these regions is called cloaking map. The
cloaking map is typically pre-computed based on the location privacy preferences of the user
and shared with LBS provider. During online requests, the user reports the region identifier
where his true coordinate falls in. The service quality depends on the average size, measured
as geographic extension, of the cloaking regions. Being simple and applicable, this approach
should be patched when there are multiple requests from the same user over his trajectory.
More concretely, the LBS provider can exploit maximum velocity to constrain the cloak-
ing regions, a compromise for the location privacy. The canonical solution to this case is
employing either postdating or time delaying [10,13,28].
In PROBE [13]-like location cloaking methods such as [7,25,28], each user is assigned
a privacy profile and a respective cloaking map is generated for each profile. The privacy
profile is not only all about the preference on geographic size of cloaking regions, but also the
semantics assigned to the places within the regions. Clearly, each place is either non-sensitive,
e.g., parks and restaurants, or sensitive, e.g., hospitals and night clubs, with varying degrees
per profile. So, the resulting cloaking map ensures location privacy on two dimensions: coarse
spatial area and diverse semantic places of varying sensitivity within each region. After the
cloaking map is generated offline, PROBE-like systems just guard against velocity attacks,
i.e., deciding on whether time delaying or postdating is required and do it in case so. In
the current work, however, we postulate that PROBE-like methods should be extended so
that the whole trajectory, as seen by LBS provider from service requests, should not contain
user-specific sensitive patterns. To this end, our approach is a holistic approach to location
and location patterns protection from the trajectories shared with LBSs. This feature makes
it truly person specific as opposed to profile specific.

2 Problem formulation

This section first presents the framework where our proposal fits in. Next, the preliminaries
and the attack model are provided.

2.1 The framework

Figure 1 shows our framework. First of all, the user picks an operating region covering all of
his movements and also a location privacy profile (LPP). Using these two, a cloaking map
generator (e.g., PROBE) generates a cloaking map which is shared with LBS server and also
returned to the user. Then, the user defines his location pattern privacy profile (LPPP) with
reference to the cloaking map. These are all done offline, i.e., before any service request
takes place. The online phase starts with an LBS request $D_{req}$. The request is evaluated by a
location privacy ensurer (e.g., PROBE), which also checks the effects of background attacks
(e.g., velocity-based attack). In case there is no privacy compromise detected, then the request
is not modified, i.e., $D'_{req} = D_{req}$. Otherwise, the request is modified by time delaying or
postdating, i.e., $D'_{req} \ne D_{req}$. In any case, similar to the LPP ensurer, our LPPP ensurer decides
on $D''_{req}$ (the LBS request to be delivered to the LBS server) given $D'_{req}$ and LPPP. It either does


Fig. 1 Framework. It has online and offline modules interacting with the user and LBS server

(i) no transformation in case there is no pattern violation, (ii) time delaying in case there
is a pattern violation, but time delaying solves it, (iii) postdating in case there is a pattern
violation, but postdating solves it, or (iv) dropping in case there is a pattern violation, but
neither time delaying nor postdating can solve it. We would like to note that in case LPP
ensurer decides to drop service request, our LPPP ensurer is not aware of that as they are
totally orthogonal.

2.2 Preliminaries

Definition 1 (Generic region space) A generic region space R is the operating geographic
area of a particular LBS for a user group. In other words, movements of the users are confined
to R.

Generic region space R can be expressed in multiple ways depending on the model of user
location. Typically, we consider two concrete region space models: (i) free terrain and (ii)
urban area. In the free terrain model, users travel along any dimension from any given place,
while in the urban area model user movements are restricted to roads connecting places.
Typically, the former is modeled with a continuous coordinate system, while the latter is
modeled with a graph. We should note here that the free terrain model has some
movement restrictions too, e.g., users cannot walk over rivers and lakes, but these can be treated
as constraints within the model.
In the free terrain model, user location can be expressed as longitude/latitude of the
geographic area or by any other continuous coordinate system. Alternatively, the region
space can be arbitrarily discretized and each region within the region space can be assigned
an identifier. Moreover, a mixture of the above two is possible too, i.e., some part of the
continuous coordinate system can be discretized, while the rest remains intact.
In the urban area model, though the user locations can be expressed within a continuous
coordinate system, user movements from one place (like home, hospital and bar) to another are
through roads only. This makes a fundamental difference as shortest paths (a very important
metric for any LBS) are not air distance anymore. Hence, the region space can be better


Fig. 2 Cloaking map and a user trajectory. The trajectory starts in region R2 and ends in region R5

modeled with a graph in which vertices are places and edges are roads. Also, note that graph
modeling defines a kind of discretization over the underlying terrain.

Definition 2 (LBS request) An LBS request is a data triple $D_{req} = (rr, rt, sd)$ where $rr \in R$
is the request's region identifier, $rt$ is the request's time, and $sd$ is the satellite data associated
with the request. Typically, $sd$ contains the user identifier, the service identifier and any other
parameters associated with the request itself.

With the free terrain model, the region identifier $rr$ is either a two-dimensional point in the
continuous coordinate system or the identifier of a discretized region. Similarly, in the urban
area model it is either a vertex or an edge identifier from the graph representation.

Definition 3 (LBS request sequence) Given a generic region space $R$, an LBS request
sequence $D$ is a sequence of LBS requests from the same user, i.e.,
$D = \langle (rr_1, rt_1, sd_1), (rr_2, rt_2, sd_2), \ldots, (rr_n, rt_n, sd_n) \rangle$
such that $rt_i < rt_{i+1}$ and $rt_n < rt_{now}$, where $rt_{now}$ is the current time.
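
Definitions 2 and 3 can be illustrated with a minimal Python sketch. The class name `LBSRequest` and the helper `is_valid_sequence` are hypothetical names chosen for illustration and are not part of the original framework; the sample data is the request sequence used in Example 1.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LBSRequest:
    """One LBS request (rr, rt, sd) as in Definition 2 (illustrative type)."""
    rr: str    # region identifier, an element of the region space R
    rt: float  # request time
    sd: str    # satellite data (user id, service id, query parameters, ...)


def is_valid_sequence(D: List[LBSRequest], rt_now: float) -> bool:
    """Check the ordering constraints of Definition 3:
    rt_i < rt_{i+1} for consecutive requests, and rt_n < rt_now."""
    increasing = all(a.rt < b.rt for a, b in zip(D, D[1:]))
    return increasing and (not D or D[-1].rt < rt_now)


# The request sequence of Example 1.
D = [
    LBSRequest("R2", 10, "Q1"),
    LBSRequest("R8", 20, "Q2"),
    LBSRequest("R11", 35, "Q1"),
    LBSRequest("R9", 50, "Q3"),
]
```

Under this sketch, `is_valid_sequence(D, 60)` holds, while a current time of 50 or a reordered sequence is rejected.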

As the receiver of online LBS request sequence D, LBS provider can store this information
sent from the associated user. In what follows, we consider potential attacks by the LBS
provider at time $rt_{now}$.

Example 1 (Cloaking map and LBS request sequence) Figure 2 shows a sample cloaking
map produced by cloaking map generator shown in Fig. 1. The cloaking map contains 13 rect-
angular cloaking regions enumerated with R1 through R13. Also, shown is a user trajectory
starting from R2 and ending at R5. Along this trajectory, suppose that the user has the LBS
request sequence of $D = \langle (R2, 10, Q1), (R8, 20, Q2), (R11, 35, Q1), (R9, 50, Q3) \rangle$,
where there are four LBS requests at time stamps of 10, 20, 35 and 50. In the scope of this
work, the kind of query (denoted with Q1, Q2 and Q3) is immaterial. We would like to


emphasize that the user need not send an LBS request from every cloaking region he visits,
e.g., there is no request from R7.
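
How a precise location is turned into a cloaked request can be sketched as follows. This is only an illustration under stated assumptions: the regions are axis-aligned rectangles in a free terrain model, the three rectangles in `CLOAKING_MAP` are made up (they are not the regions of Fig. 2), and the function names `cloak` and `make_request` are hypothetical.

```python
from typing import Dict, Optional, Tuple

# (x_min, y_min, x_max, y_max) of an axis-aligned rectangular cloaking region
Rect = Tuple[float, float, float, float]

# Hypothetical cloaking map; the coordinates are NOT taken from Fig. 2.
CLOAKING_MAP: Dict[str, Rect] = {
    "R1": (0, 0, 4, 3),
    "R2": (4, 0, 10, 3),
    "R3": (0, 3, 10, 8),
}


def cloak(x: float, y: float, cloaking_map: Dict[str, Rect]) -> Optional[str]:
    """Return the identifier of the region containing (x, y), or None.

    Half-open intervals are used so that a point on a shared border
    belongs to exactly one region."""
    for region_id, (x0, y0, x1, y1) in cloaking_map.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return region_id
    return None


def make_request(x: float, y: float, rt: float, sd: str,
                 cloaking_map: Dict[str, Rect]) -> Tuple[str, float, str]:
    """Build the triple (rr, rt, sd), reporting only the region identifier."""
    region = cloak(x, y, cloaking_map)
    if region is None:
        raise ValueError("location lies outside the operating region")
    return (region, rt, sd)
```

For instance, `make_request(1, 5, 10, "Q1", CLOAKING_MAP)` reports region `"R3"` instead of the coordinates `(1, 5)`.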

2.3 Attack model

2.3.1 Location privacy attack

Knowing the request sequence D of the user, LBS provider as the attacker can exploit some
background information to downsize some of the regions. Of course, in case a location identi-
fier is a two-dimensional point, then this is already precise and cannot be further downsized.
However, for other cases the region can potentially be downsized. The attack model for
this case exploits the maximum velocity information available as background knowledge.
Indeed, this kind of attack is well studied in [13,28]. Typically, whenever the user detects
that such an attack would succeed, there are three canonical solutions: (i) time delaying, (ii)
postdating, or (iii) dropping the LBS request. Since the user is aware of this kind of attack,
the request is modified accordingly. In other words, the LBS request is forwarded to LBS
provider only if no cloaking region downsizing risk is detected.
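
The maximum velocity attack can be made concrete with a simplified sketch. It is not the check used by PROBE or by [13,28]; it assumes a free terrain model with Euclidean distances, axis-aligned rectangular regions, and it only considers the forward direction (pruning the current region given the previous one). All function names are hypothetical. The idea is that the attacker can discard every point of the current region that is farther than $v_{max} \cdot \Delta t$ from the previous region; since the distance to a rectangle is a convex function, the farthest such point is one of the four corners.

```python
import math
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def point_to_rect_distance(px: float, py: float, rect: Rect) -> float:
    """Euclidean distance from a point to a rectangle (0 if inside)."""
    x0, y0, x1, y1 = rect
    dx = max(x0 - px, 0.0, px - x1)
    dy = max(y0 - py, 0.0, py - y1)
    return math.hypot(dx, dy)


def farthest_corner_distance(prev: Rect, cur: Rect) -> float:
    """Largest distance from any point of `cur` to `prev`.

    The maximum of a convex function over a rectangle is attained
    at a corner, so checking the four corners is enough."""
    x0, y0, x1, y1 = cur
    corners = [(x0, y0), (x0, y1), (x1, y0), (x1, y1)]
    return max(point_to_rect_distance(px, py, prev) for px, py in corners)


def can_downsize(prev: Rect, cur: Rect, dt: float, v_max: float) -> bool:
    """True if some part of `cur` is unreachable from `prev` within `dt`,
    i.e. the attacker can shrink the current cloaking region."""
    return farthest_corner_distance(prev, cur) > v_max * dt


def min_safe_gap(prev: Rect, cur: Rect, v_max: float) -> float:
    """Smallest elapsed time for which `cur` cannot be shrunk (forward check)."""
    return farthest_corner_distance(prev, cur) / v_max
```

Under these assumptions, time delaying amounts to holding the request until at least `min_safe_gap` time units have elapsed since the previous request.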

2.3.2 Location pattern privacy attack

In some cases, although location privacy is provided, the user may not be completely satisfied
due to location patterns. Some sensitive patterns can emerge in request sequence D of the
user.
Like an LBS request, location patterns in the context of LBSs have both spatial and
temporal dimensions. The pattern format is indeed very similar to temporally annotated
sequences as defined in TAS Miner [17]. Following TAS Miner convention, we define the
temporally annotated location sequence (TALS) pattern as follows.

Definition 4 (Temporally annotated location sequence pattern) Given a generic region space
$R$, a TALS pattern is a couple $P = (\bar{r}, \bar{t})$, where $\bar{r} = \langle r_0, r_1, \ldots, r_m \rangle$, $\forall_{0 \le i \le m}\, r_i \in R$, is the
sequence of sensitive regions, and $\bar{t} = \langle t_1, t_2, \ldots, t_m \rangle \in \mathbb{R}_+^m$ is the temporal annotation. A
TALS pattern will be denoted as follows: $P = (\bar{r}, \bar{t}) = r_0 \xrightarrow{t_1} r_1 \xrightarrow{t_2} \cdots \xrightarrow{t_m} r_m$.

Definition 5 (Support) Given a time flexibility threshold $\tau$ and a TALS pattern
$P = (\bar{r}, \bar{t}) = r_0 \xrightarrow{t_1} r_1 \xrightarrow{t_2} \cdots \xrightarrow{t_m} r_m$, LBS sequence
$D = \langle (rr_1, rt_1, sd_1), (rr_2, rt_2, sd_2), \ldots, (rr_n, rt_n, sd_n) \rangle$ is said to support $P$,
denoted $D \models_\tau P$, if there exists a sequence of indices
$0 \le i_0 < \cdots < i_m \le n$ such that:
– (spatial match) $\forall\, 0 \le k \le m.\ \exists (rr_{i_k}, rt_{i_k}, sd_{i_k}) \in D$ s.t. $r_k = rr_{i_k}$; and
– (temporal match) $\forall t_k \in \bar{t}.\ |(rt_{i_k} - rt_{i_{k-1}}) - t_k| \le \tau$

Example 2 (Support) Let $D = \langle (R2, 10, Q1), (R8, 20, Q2), (R11, 35, Q1), (R9, 50, Q3) \rangle$.
The pattern $P = R2 \xrightarrow{7} R8 \xrightarrow{32} R9$ with $\tau = 4$ is supported by $D$, since (i) the
spatial match succeeds with $r_0 = rr_1$, $r_1 = rr_2$, $r_2 = rr_4$, and (ii) the respective temporal
match succeeds as $|(20 - 10) - 7| \le 4$ and $|(50 - 20) - 32| \le 4$. However, the same pattern
with $\tau = 2$, although the spatial match is successful, fails to be supported by the same $D$ due to
the failure in the temporal match, as $|(20 - 10) - 7| \le 2$ evaluates to false. Another pattern
$P = R2 \xrightarrow{10} R11 \xrightarrow{25} R8$ with $\tau = 4$ fails to be supported by the same $D$ as there exist
no indices for the spatial match.
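
The support test of Definition 5 can be sketched directly as a backtracking search over index sequences. This is a plain illustration, not the dynamic programming method developed in Sect. 3, and it can take exponential time in the worst case. The function name `supports` and the tuple-based encoding of requests and patterns are assumptions made for this sketch; the data reproduces Example 2.

```python
from typing import List, Sequence, Tuple

Request = Tuple[str, float, str]  # (rr, rt, sd)


def supports(D: List[Request], regions: Sequence[str],
             annotations: Sequence[float], tau: float) -> bool:
    """Return True if D supports the TALS pattern under threshold tau.

    regions     = [r_0, ..., r_m]
    annotations = [t_1, ..., t_m], where t_k annotates r_{k-1} -> r_k
    """
    if not regions or len(annotations) != len(regions) - 1:
        raise ValueError("a pattern needs m+1 regions and m annotations")

    def extend(k: int, prev_idx: int) -> bool:
        # Try to match r_k at some index after the match of r_{k-1}.
        if k == len(regions):
            return True
        for idx in range(prev_idx + 1, len(D)):
            rr, rt, _ = D[idx]
            if rr != regions[k]:
                continue  # spatial match fails
            if k > 0:
                elapsed = rt - D[prev_idx][1]
                if abs(elapsed - annotations[k - 1]) > tau:
                    continue  # temporal match fails
            if extend(k + 1, idx):
                return True
        return False

    return extend(0, -1)


D = [("R2", 10, "Q1"), ("R8", 20, "Q2"), ("R11", 35, "Q1"), ("R9", 50, "Q3")]
supported_tau4 = supports(D, ["R2", "R8", "R9"], [7, 32], tau=4)
supported_tau2 = supports(D, ["R2", "R8", "R9"], [7, 32], tau=2)
supported_other = supports(D, ["R2", "R11", "R8"], [10, 25], tau=4)
```

Here `supported_tau4` is true, while `supported_tau2` and `supported_other` are false, in line with Example 2.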


For each user, we assume that a set of private TALS patterns is specified. This set indeed
specifies the sensitive location pattern profile for the user. Note that this pattern profile does
not override or replace the location privacy profile, but it is complementary. One nice feature
with the sensitive location pattern profile is that the user knowing his past LBS request
sequence together with the current LBS request can locally check whether any sensitive
pattern violation will happen. If the past LBS request sequence appended with the current
LBS request does not support any of the sensitive patterns, then the LBS request sequence is
said to be safe, as formally defined below.

Definition 6 (TALS pattern safe LBS request sequence) Given a person-specific sensitive
TALS pattern set $\mathcal{P} = \{(\bar{r}_1, \bar{t}_1), (\bar{r}_2, \bar{t}_2), \ldots, (\bar{r}_p, \bar{t}_p)\}$ and a time flexibility threshold
$\tau$, $D = \langle (rr_1, rt_1, sd_1), (rr_2, rt_2, sd_2), \ldots, (rr_n, rt_n, sd_n) \rangle$ is TALS pattern safe w.r.t.
$(\mathcal{P}, \tau)$ if $D$ supports none of the TALS patterns in $\mathcal{P}$. Formally, $D \not\models_\tau \mathcal{P}$ iff $D \not\models_\tau P,\ \forall P \in \mathcal{P}$.

We would like to emphasize that τ is not a system, but a user parameter. Clearly, bigger
τ values specify loose, while smaller values specify strict time constraints. At the extreme
values, τ = 0 means that the temporal dimension is exact and has no flexibility, while τ = ∞
means that the temporal dimension is not effective at all.
We consider two privacy problems in the context of LBS request sequences and TALS
patterns: (i) online in which LBS sequence is still extending and (ii) offline in which LBS
sequence has already finished extending. In the former, the user is confronted with the problem
of whether to send the current LBS request or not. In the latter, however, the user is wondering
whether his footprint collected by the LBS provider supports some of his sensitive TALS patterns
and, if so, asks the provider to sanitize it. In other words, the user is proactive in the former
and retroactive in the latter. The former is the subject of this study.

3 Online pattern privacy protection

In this section, we develop a definition of online sensitive location pattern protection problem
and a dynamic programming approach to efficiently solve it.

Problem 1 (Online TALS pattern privacy) Given a person-specific sensitive TALS pattern set
$\mathcal{P} = \{(\bar{r}_1, \bar{t}_1), (\bar{r}_2, \bar{t}_2), \ldots, (\bar{r}_p, \bar{t}_p)\}$ with the time flexibility threshold $\tau$, and the LBS request
sequence so far $D = \langle (rr_1, rt_1, sd_1), (rr_2, rt_2, sd_2), \ldots, (rr_n, rt_n, sd_n) \rangle$, assume that
$D$ is already TALS pattern safe w.r.t. $(\mathcal{P}, \tau)$ (this is the property that we maintain all the
time). Also, suppose that the user is about to issue the current LBS request

$D_{req} = (rr_{now=n+1}, rt_{now=n+1}, sd_{now=n+1})$.

The online TALS pattern privacy problem is to decide whether the extended sequence $D'$, i.e.,
$D$ with $D_{req}$ appended,
$D' = \langle (rr_1, rt_1, sd_1), (rr_2, rt_2, sd_2), \ldots, (rr_n, rt_n, sd_n), (rr_{now}, rt_{now}, sd_{now}) \rangle$, is
TALS pattern safe w.r.t. $(\mathcal{P}, \tau)$.

To efficiently solve Problem 1, we develop a dynamic programming approach as explained
next. For ease of exposition, we first fix $|\mathcal{P}| = 1$, i.e., $\mathcal{P}$ contains a single sensitive pattern
$P$. Then, we extend it to the general case $|\mathcal{P}| \ge 1$.
Suppose $D = \langle (rr_1, rt_1, sd_1), (rr_2, rt_2, sd_2), \ldots, (rr_n, rt_n, sd_n) \rangle$ and
$P = (\bar{r}, \bar{t}) = r_0 \xrightarrow{t_1} r_1 \xrightarrow{t_2} \cdots \xrightarrow{t_m} r_m$ are given. Then, the length-$j$ prefix of $D$ is denoted by
$D^j = \langle (rr_1, rt_1, sd_1), (rr_2, rt_2, sd_2), \ldots, (rr_j, rt_j, sd_j) \rangle$, where $1 \le j \le n$. Similarly, the
length-$i$ prefix of $P$ is denoted by $P^i = r_0 \xrightarrow{t_1} r_1 \xrightarrow{t_2} \cdots \xrightarrow{t_{i-1}} r_{i-1}$, where $1 \le i \le m + 1$. Note
that, at the boundary, $P^1 = r_0$.

Definition 7 (Partial support) Given a time flexibility threshold $\tau$, the length-$i$ prefix of a TALS
pattern $P^i = r_0 \xrightarrow{t_1} r_1 \xrightarrow{t_2} \cdots \xrightarrow{t_{i-1}} r_{i-1}$, and the length-$j$ prefix of an LBS request sequence
$D^j = \langle (rr_1, rt_1, sd_1), (rr_2, rt_2, sd_2), \ldots, (rr_j, rt_j, sd_j) \rangle$, $D^j$ is said to partially support
$P$ at prefix $P^i$ if $D^j$ supports $P^i$; this is denoted with $D^j \models_\tau P^i$, where $i \le j$.

In this work, we are interested in the different partial supports (matchings) of $P^i$ on $D^j$. For
a given matching, we are particularly interested in the rightmost matching index of $r_{i-1}$ on
$D^j$, and we denote the set of such indices with $M(i, j)$. Formally,
$M(i, j) = \{k : k \in i \ldots j \text{ and } r_{i-1} = rr_k \text{ and } D^k \models_\tau P^i\}$.
Clearly, if $M(i, j) = \emptyset$ then $D^j \models_\tau P^i$ does not hold; otherwise, it holds, possibly
with more than one distinct way of support.
Let $I(i, j)$ be the indicator variable denoting whether there exists a partial support of $P^i$
on $D^j$ whose rightmost match is at index $j$. Formally, $I(i, j) = D^j \models_\tau P^i \wedge j \in M(i, j)$.
$I(i, j)$ is very useful for online support checking: whenever we know $I(i, k)$ for all $i$ and
all $k \in [1..j]$, it is easy to compute $I(i, j + 1)$ when the next LBS request $(rr_{j+1}, rt_{j+1}, sd_{j+1})$
is issued. The update rule for the Boolean table $I$ is based on the following recursion,
which can be solved efficiently by dynamic programming.
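$$
\begin{aligned}
I(0, j) &= \text{False}, \quad \forall j \\
I(i, 0) &= \text{False}, \quad \forall i \\
I(i, j) &= \text{False}, \quad \forall i,\ j < i \\
I(1, j) &= \mathrm{isequal}(r_0, rr_j), \quad \forall j \ge 1 \\
I(i, j + 1) &= \bigvee_{k=1}^{j} \big[ I(i - 1, k) \wedge \mathrm{isequal}(r_{i-1}, rr_{j+1}) \wedge (|(rt_{j+1} - rt_k) - t_{i-1}| \le \tau) \big], \quad \forall i \ge 2
\end{aligned}
\tag{1}
$$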

Note that in the formula for $I(i, j + 1)$, the term $\mathrm{isequal}(r_{i-1}, rr_{j+1})$ checks for the
spatial match, i.e., whether $r_{i-1} = rr_{j+1}$ or not. The last term $|(rt_{j+1} - rt_k) - t_{i-1}| \le \tau$
checks for the temporal match with the new arrival of $(rr_{j+1}, rt_{j+1}, sd_{j+1})$.
After filling the $(j + 1)$th column of the table $I$, it is very easy to check whether
$D^{j+1} \models_\tau P$ or not. Due to the construction of the table, $I(m + 1, j + 1) = \text{True} \Leftrightarrow D^{j+1} \models_\tau P$.
In other words, $I(m + 1, j + 1) = \text{True}$ means that the new LBS request
$D_{req} = (rr_{now=j+1}, rt_{now=j+1}, sd_{now=j+1})$ causes $D'$ to be not TALS pattern safe w.r.t. $(P, \tau)$. In
this case, we need to sanitize the service request as explained in Sect. 4. Otherwise, the new
LBS request leaves $D'$ TALS pattern safe w.r.t. $(P, \tau)$. In this case, we need not sanitize
the LBS request and can forward it to the LBS provider as is.

Example 3 (Partial support) Let the LBS request sequence be
$D = \langle (R2, 10, Q1), (R8, 20, Q2), (R11, 35, Q1), (R9, 50, Q3) \rangle$, and the current LBS request be
$D_{req} = (R5, 60, Q2)$. Assume that the sensitive pattern $P = R2 \xrightarrow{7} R8 \xrightarrow{32} R9 \xrightarrow{8} R5$ and
$\tau = 4$ are given. Then, the respective $I$ table is computed as follows. Since the $I$ table
computation is incremental, the last column is appended and computed after the current LBS
request $D_{req}$ arrives. Since the entry at row 4 and column 5 is computed as True, the current
LBS request will cause a location pattern privacy violation if shared with the LBS provider as
is. The next section shows how it is solved.

I     0       1              2              3               4              5
              (R2, 10, Q1)   (R8, 20, Q2)   (R11, 35, Q1)   (R9, 50, Q3)   (R5, 60, Q2)
0     False   False          False          False           False          False
1     False   True           False          False           False          False
2     False   False          True           False           False          False
3     False   False          False          False           True           False
4     False   False          False          False           False          True

Fig. 3 Graphical illustration of temporal match with dual segment representation. Temporal match
succeeds for the two cases $S_1$ and $S_4$

In case the sensitive pattern set contains p > 1 patterns, i.e., $\mathcal{P} = \{(\bar{r}_1, \bar{t}_1), (\bar{r}_2, \bar{t}_2), \ldots, (\bar{r}_p, \bar{t}_p)\}$, the above I table is maintained for each pattern separately and denoted by $I_P$ for the pattern $P \in \mathcal{P}$.
To visualize the temporal match, we use a dual segment representation, shown in Fig. 3. For the ith segment of a selected pattern, we obtain a line segment with endpoints $t_{i-1} - \tau$ and $t_{i-1} + \tau$ on the elapsed time axis t, where the elapsed time is measured from the previous match at $rt_k$. On this axis, the expression $|(rt_{j+1} - rt_k) - t_{i-1}| \le \tau$ is true if and only if the vertical line at $(rt_{j+1} - rt_k)$ crosses the line segment. In the figure, for example, the expression evaluates to true only for the cases S1 and S4.

3.1 Complexity and improvement

If $n = |D|$, $p = |\mathcal{P}|$ and $m = \max\{|P| : P \in \mathcal{P}\}$, then the computational complexity of filling table $I_P$ (for $P \in \mathcal{P}$) is $O(mn^2)$. This is simply because the table $I_P$ contains $O(mn)$ entries and each entry is computed in $O(n)$ time. Since there are p such tables, the complexity of filling all the tables is $O(pmn^2)$, which is quadratic in n.
Note, however, that the tables are computed incrementally: with each new LBS request $(rr_{j+1}, rt_{j+1}, sd_{j+1})$, only column $j+1$ of each table is computed. This computation takes $O(pmn)$ time, since $O(pm)$ entries are filled and each of them is filled in $O(n)$ time. This can still be prohibitive, because D is streaming data and can be very long, so both the per-request time and the size (space complexity) of any table $I_P$ can grow very large in practice. However, as the following observation and theorem show, both time and space can be bounded by taking the temporal bounding property into account.

Observation 1 (Temporal bounding) Suppose that, in the formula for $I(i, j+1)$, $l$ is the lowest value of $k$ such that $|(rt_{j+1} - rt_l) - t_{i-1}| \le \tau$ holds. Then there is no need to check the values $k < l$. This is because any $k < l$ gives the value false for $|(rt_{j+1} - rt_k) - t_{i-1}| \le \tau$ and, due to the conjunction in Formula 1, the support result is false regardless of the values of the other conjuncts.
The observation enables us to replace the term $\bigvee_{k=1}^{j}$ with the term $\bigvee_{k=l}^{j}$ in Formula 1. Additionally, due to the monotonicity of time, with the arrival of a new LBS request $(rr_{j+2}, rt_{j+2}, sd_{j+2})$, the new lower bound for $k$ is not less than $l$. This enables us to discard all the table entries $I(i, k)$ with $k < l$. So, by defining a sliding window of length $t_{i-1} + \tau$ (a constant value for a fixed pattern segment $r_{i-2} \xrightarrow{t_{i-1}} r_{i-1}$), we can effectively update the bound $k \ge l$ with each new LBS request. As a result, the number of columns of the table is upper bounded, i.e., it does not depend on n. The following theorem gives an alternative proof of this fact.

Theorem 1 (Maximum temporal window) Given a time flexibility threshold τ, a sensitive pattern $P = r_0 \xrightarrow{t_1} r_1 \xrightarrow{t_2} \cdots \xrightarrow{t_m} r_m$, and an LBS sequence $D = \langle (rr_1, rt_1, sd_1), (rr_2, rt_2, sd_2), \ldots, (rr_n, rt_n, sd_n) \rangle$, the maximum temporal window length of any support of $P$ by $D$ (w.r.t. τ) is not greater than $\sum_{i=1}^{m} t_i + m\tau$.

Proof The elapsed time of any temporal match on the segment $r_0 \xrightarrow{t_1} r_1$ is not greater than $t_1 + \tau$, and similarly it is not greater than $t_i + \tau$ for the segment $r_{i-1} \xrightarrow{t_i} r_i$. Hence, summing over the m segments shows that the temporal extent of any matching window is not greater than $\sum_{i=1}^{m} t_i + m\tau$. □
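As a quick sanity check, the bound can be instantiated on the pattern of Example 3 ($t_1 = 7$, $t_2 = 32$, $t_3 = 8$, so $m = 3$, with τ = 4). The supporting window found in that example spans from time 10 to time 60, which lies within the bound:

```latex
\[
\sum_{i=1}^{m} t_i + m\tau \;=\; (7 + 32 + 8) + 3 \cdot 4 \;=\; 59 \;\ge\; 60 - 10 \;=\; 50 .
\]
```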


Due to Observation 1 or Theorem 1, the time complexity of updating a single table I reduces from $O(mn)$ to $O(m)$, since at each new LBS request we can trim the part of the past LBS sequence history that lies beyond the temporal bound. This effectively bounds the maintained part of the LBS sequence history by a constant. Also, note that the length m of a pattern can be treated as a constant, since the pattern does not change during the online LBS requests. As a result, updating a single table I effectively has time complexity $O(1)$. For $p = |\mathcal{P}|$ tables, the complexity becomes $p \times O(1)$, which is $O(1)$ as the value p is constant too, i.e., each person has a limited number of sensitive patterns.
In summary, Theorem 1 guarantees that all of the $I_P$ tables can be effectively updated in $O(1)$ time at each LBS request. Additionally, the effective space complexity is $O(1)$ too.
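The history trimming implied by Theorem 1 can be sketched as follows. This is a minimal illustration under assumptions of this sketch: the names `max_window` and `trim_history` are hypothetical, the history is kept in a `deque` ordered by request time, and the caller is expected to drop the same number of leading columns from each $I_P$ table.

```python
from collections import deque


def max_window(gaps, tau):
    """Theorem 1 bound: sum of the t_i plus m * tau."""
    return sum(gaps) + len(gaps) * tau


def trim_history(history, now, window):
    """Drop requests issued more than `window` time units before `now`.

    `history` is a deque of (rr, rt, sd) tuples in non-decreasing rt order.
    Returns the number of dropped requests.
    """
    dropped = 0
    while history and now - history[0][1] > window:
        history.popleft()
        dropped += 1
    return dropped


# Hypothetical usage with the pattern of Example 3 and a later request at time 100
window = max_window([7, 32, 8], 4)
history = deque([("R2", 10, "Q1"), ("R8", 20, "Q2"),
                 ("R11", 35, "Q1"), ("R9", 50, "Q3")])
dropped = trim_history(history, now=100, window=window)
```

Because request times never decrease, each request is appended once and removed at most once, so the bookkeeping cost is amortized constant per request.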

4 Solving online pattern privacy problem

Suppose that $D = \langle (rr_1, rt_1, sd_1), (rr_2, rt_2, sd_2), \ldots, (rr_n, rt_n, sd_n) \rangle$ is TALS pattern safe w.r.t. $(\mathcal{P}, \tau)$. Our task is to maintain this invariant with each new arrival $D_{req} = (rr_{now}, rt_{now}, sd_{now})$, where $now = n+1$, i.e., to ensure that $D'$ is TALS pattern safe w.r.t. $(\mathcal{P}, \tau)$ too. To do this, we first check whether $D'$ is safe w.r.t. $(\mathcal{P}, \tau)$. If it is, the LBS request does not violate the user's location pattern privacy and can be forwarded as is. Otherwise, we must sanitize the LBS request before delivering it. To this end, we consider three operations as options: (i) time delaying, (ii) postdating and (iii) dropping the LBS request.
Time delaying involves postponing the LBS request by t time units so that $D'$ becomes TALS pattern safe w.r.t. $(\mathcal{P}, \tau)$ with the perturbed LBS request $D'_{req} = (rr_{now}, rt_{now} + t, sd_{now})$. This way, $D'_{req}$ is forwarded at time $now + t$.
When $D'$ is not TALS pattern safe w.r.t. $(\mathcal{P}, \tau)$, there exists an index k on $D'$ such that, for at least one $P \in \mathcal{P}$, the expression $|(rt_{now} - rt_k) - t_m| \le \tau$ holds. The objective is to find the minimum time delay $t > 0$ such that $|(rt_{now} + t - rt_k) - t_m| \le \tau$ does not hold for any sensitive pattern $P \in \mathcal{P}$. At this point, we need a time threshold (maximum patience) TT, which is the maximum delay the user can tolerate, as nobody can wait indefinitely. If $t \le TT$, time delaying can be applied; otherwise it cannot. Consequently, the search range for the minimum time delay t is [0..TT].
Postdating involves altering the location identifier so that $D'$ becomes TALS pattern safe w.r.t. $(\mathcal{P}, \tau)$ with the perturbed LBS request $D'_{req} = (rr'_{now}, rt_{now}, sd_{now})$, where $rr'_{now} \ne rr_{now}$. This works because, when $D'$ is not TALS pattern safe w.r.t. $(\mathcal{P}, \tau)$, $isequal(r_m, rr_{now})$ must hold; replacing $rr_{now}$ with $rr'_{now}$ breaks this spatial match so that $D'$ can become TALS pattern safe w.r.t. $(\mathcal{P}, \tau)$. An arbitrary random location for $rr'_{now}$ is not meaningful, as it may lie outside the user's region of interest. Instead, one of the user's previous locations is a good choice, both because it is spatially close to the current location and because it is an authentic previous location on the trajectory, i.e., not a false location off the trajectory. Our approach is to look for a location $rr'$ on the trajectory such that $rr' \ne rr_{now}$. To do this, we keep the last regc locations shared with the LBS provider and apply a backward regression search. Although this is sound, we add one further requirement: the distance between the current location and its substitute should not be greater than a user-defined distance threshold DT.
The third option is the last resort: it is applied when neither time delaying nor postdating provides a safe solution. In this case, we simply drop the request and do not update D or any $I_P$ table.
Algorithm 1 summarizes our solution to the online TALS pattern privacy problem. Given the current LBS request, it updates the I tables and the LBS request sequence D, and returns the possibly time-delayed or postdated LBS request to be sent to the LBS provider. It also asks and tells the location privacy ensurer (LPE) about the decision. If there is no solution, it simply drops the LBS request. Note that SpatialDistance(·, ·) measures the spatial distance between the centers of mass of the two given cloaking regions.

Input: Tables I_P (P ∈ 𝒫), 𝒫, D, D_req = (rr_now, rt_now, sd_now) with now = n+1, τ, TT, DT, regc
Output: D′_req
 1: D′ ← D followed by D_req
 2: Compute last column for Table I_P using Equation 1, ∀P ∈ 𝒫
 3: if D′ does not support P (w.r.t. τ), ∀P ∈ 𝒫 then
 4:   Update Table I_P using Equation 1, ∀P ∈ 𝒫
 5:   return D_req
 6: end if
 7: t ← Minimum time delay so that D′ is TALS pattern safe w.r.t. (𝒫, τ)
 8: if t ≤ TT then
 9:   // Time delaying
10:   D′_req ← (rr_now, rt_now + t, sd_now)
11:   D′ ← D followed by D′_req
12:   Update Table I_P using Equation 1, ∀P ∈ 𝒫
13:   TellLPE(D′_req)
14:   return D′_req
15: end if
16: for i ← 1 to regc do
17:   regr_i ← i'th region while doing regression on D
18:   if SpatialDistance(regr_i, rr_now) ≤ DT then
19:     // Postdating
20:     D′_req ← (regr_i, rt_now, sd_now)
21:     if AskLPE(D′_req) = False then
22:       continue
23:     end if
24:     D′ ← D followed by D′_req
25:     Compute last column for Table I_P using Equation 1, ∀P ∈ 𝒫
26:     if D′ does not support P (w.r.t. τ), ∀P ∈ 𝒫 then
27:       Update Table I_P using Equation 1, ∀P ∈ 𝒫
28:       TellLPE(D′_req)
29:       return D′_req
30:     end if
31:   end if
32: end for
33: // Drop the request
34: D′ ← D
35: TellLPE(null)
36: return null
Algorithm 1: Online location pattern privacy algorithm

Example 4 (Partial support) Let the LBS request sequence so far be $D = \langle (R2, 10, Q1), (R8, 20, Q2), (R11, 35, Q1), (R9, 50, Q3) \rangle$ and the current LBS request be $D_{req} = (R5, 60, Q2)$. Assume that the sensitive pattern $P = R2 \xrightarrow{7} R8 \xrightarrow{32} R9 \xrightarrow{8} R5$ and τ = 4 are given. As shown in Example 3, the new request causes a location pattern privacy violation. Suppose that TT = 5. Then time delaying the current LBS request $D_{req} = (R5, 60, Q2)$ by t = 3 yields the modified LBS request $D'_{req} = (R5, 60 + 3, Q2)$, for which the temporal match fails since $|(63 - 50) - 8| = 5 > 4$. The respective $I_P$ table is then computed and updated as follows.

I    0      1             2             3              4             5
            (R2, 10, Q1)  (R8, 20, Q2)  (R11, 35, Q1)  (R9, 50, Q3)  (R5, 63, Q2)
0    False  False         False         False          False         False
1    False  True          False         False          False         False
2    False  False         True          False          False         False
3    False  False         False         False          True          False
4    False  False         False         False          False         False

On the other hand, when TT = 1, no admissible time delay can prevent the temporal match, so we need to look for postdating. In this case, the first regressed region on D is R9 and, provided it satisfies SpatialDistance(R9, R5) ≤ DT, the safe LBS request $D'_{req} = (R9, 60, Q2)$ updates the $I_P$ table as follows and is forwarded to the LBS provider.

I    0      1             2             3              4             5
            (R2, 10, Q1)  (R8, 20, Q2)  (R11, 35, Q1)  (R9, 50, Q3)  (R9, 60, Q2)
0    False  False         False         False          False         False
1    False  True          False         False          False         False
2    False  False         True          False          False         False
3    False  False         False         False          True          False
4    False  False         False         False          False         False

4.1 Interface between LPE and LPPE

Besides the data flow interface from the LPE to the LPPE (as shown in Fig. 1), we need a control flow interface (the TellLPE and AskLPE functions in Algorithm 1) as well. TellLPE lets the LPE know about the decision applied by the LPPE, so that the LPE can update its own record of what has been shared with the LBS provider. This way, the consistency between the LPE and the LPPE is maintained.
Note that AskLPE is not consulted when doing time delaying, since further time delaying (beyond what the LPE already applies) is always safe with respect to the maximum velocity attack. For this reason, we do not check the safety of further time delaying with the LPE in Algorithm 1 (lines 8–13). However, we always let the LPE update its own records for consistency.
On the other hand, postdating may violate the location privacy implemented by the LPE. This is because the region to be reported now must be reachable from the previously reported region within the respective elapsed time. For this reason, we always check a candidate postdating with the LPE, i.e., whether it is safe with respect to the maximum velocity attack. So, postdating is applied only when granted by the LPE (lines 21–28). Starting from the most recent region, the backward regression procedure tries at most regc regions along the user's LBS request history. The utility requirement checks that the spatial distance does not exceed the distance threshold DT (line 18). The procedure is a kind of local search over the user's own trajectory. If the search fails regc times, we do not apply postdating and simply drop the LBS request (lines 34–35).
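The backward regression search for a postdating candidate can be sketched as follows. This is a minimal illustration, not the authors' implementation: `find_postdate_region`, the `centers` lookup, and the two callbacks `ask_lpe` and `is_safe` are hypothetical stand-ins for AskLPE and for the I-table safety check, and the region coordinates in the usage lines are invented for illustration.

```python
import math


def spatial_distance(a, b):
    """Euclidean distance between two region centers given as (x, y)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def find_postdate_region(shared, current, centers, DT, regc, ask_lpe, is_safe):
    """Return a substitute region for `current`, or None if none qualifies.

    shared  : regions already shared with the provider, oldest first
    centers : region id -> (x, y) center of mass (assumed available)
    ask_lpe : callable(region) -> bool, the LPE's maximum-velocity check
    is_safe : callable(region) -> bool, pattern safety of the substitute
    """
    # Walk backward over at most regc most recently shared regions
    for region in list(reversed(shared))[:regc]:
        if region == current:
            continue  # the substitute must differ from the current region
        if spatial_distance(centers[region], centers[current]) > DT:
            continue  # utility requirement: stay within DT
        if ask_lpe(region) and is_safe(region):
            return region
    return None


# Hypothetical usage; coordinates are made up for illustration only
centers = {"R2": (50, 50), "R8": (40, 40), "R11": (3, 2),
           "R9": (0, 0), "R5": (3, 4)}
shared = ["R2", "R8", "R11", "R9"]
always = lambda region: True
near = find_postdate_region(shared, "R5", centers, 5, 4, always, always)
tight = find_postdate_region(shared, "R5", centers, 4, 4, always, always)
short = find_postdate_region(shared, "R5", centers, 4, 1, always, always)
```

When the function returns None, the caller falls through to dropping the request, which mirrors the last resort described above.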

4.2 Searching minimum time delay

Line 7 in Algorithm 1 searches for the minimum time delay t such that $D'$ is TALS pattern safe w.r.t. $(\mathcal{P}, \tau)$. We need to search for the minimum time delay only among those patterns $P \in \mathcal{P}$ for which the prefix $P^m$ has a partial match on $D^n$ (w.r.t. τ) and the current LBS request $D_{req} = (rr_{now}, rt_{now}, sd_{now})$, with $now = n+1$, has a spatial match in the last segment of P, i.e., $rr_{now} = r_m$. We collect the last segments of these patterns as shown in Fig. 4. The task is to find the minimum $t \in [0..TT]$ such that the vertical line at t does not cross any of these last segments. This way, the current LBS request does not support any of those patterns, and sharing $D'_{req} = (rr_{now}, rt_{now} + t, sd_{now})$ becomes safe.

Theorem 2 (Non-monotonicity of temporal match w.r.t. time delaying) Temporal match is non-monotonic w.r.t. the time delay value t.

Proof Consider S2 (the last segment of pattern P2 ∈ 𝒫) in Fig. 4 for a graphical proof. It does not have a temporal match at the current t, but increasing t results in a temporal match. Continuing to increase t toward TT causes P2 to lose the temporal match again. □



Fig. 4 Finding the minimum time delay. We search for minimum t (between 0 and T T ) that does not cross
any segment

Theorem 2 rules out binary search and similar search methods that rely on monotonicity when developing algorithms to find the minimum time delay t.
Let $p = |\mathcal{P}|$. Then the minimum time delay t can be found in $O(p^2)$ time by the straightforward procedure shown in Algorithm 2, where k is the index of the rightmost partial match of the respective P on D. The patterns are handled in non-decreasing order of the right endpoints of their last segments. The algorithm checks all patterns for a violation and, when it finds one, applies a time delay ($t_m + \tau$) so that the respective pattern is no longer supported, i.e., the vertical line at $t = t_m + \tau$ leaves the segment of the pattern entirely on its left. Note that the procedure always terminates, as the vertical line at t eventually passes the segments of all the patterns. Clearly, the algorithm can terminate early as soon as t > TT. The time complexity is $O(p^2)$, since each pass of the for loop examines p patterns and leaves at least one more segment to the left of the new t. Since p is expected to be a small constant, the $O(p^2)$ time complexity is not prohibitive.

Input: 𝒫, D′, τ
Output: t
 1: t ← 0
 2: while true do
 3:   violation ← false
 4:   for P ∈ 𝒫 do
 5:     if |(rt_now + t − rt_k) − t_m| ≤ τ then
 6:       violation ← true
 7:       t ← t_m + τ  // move t to the right endpoint
 8:     end if
 9:   end for
10:   if violation = false then
11:     break
12:   end if
13: end while
14: return t
Algorithm 2: Finding minimum time delay
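A runnable sketch of the search in Algorithm 2 is given below. It makes assumptions that the pseudocode leaves open: time is measured in integer units, each candidate pattern is described by a hypothetical pair `(elapsed, t_m)` with `elapsed` = rt_now − rt_k, and "moving t to the right endpoint" is interpreted as jumping to the first delay strictly past $t_m + \tau$ on the elapsed-time axis. The early exit on t > TT is included.

```python
def min_time_delay(violations, tau, TT):
    """Smallest integer delay t in [0, TT] that matches no last segment.

    violations : list of (elapsed, t_m) pairs, one per pattern whose last
                 segment spatially matches the current request, where
                 elapsed = rt_now - rt_k.
    Returns None when no admissible delay exists within TT.
    """
    t = 0
    while t <= TT:
        moved = False
        for elapsed, t_m in violations:
            if abs((elapsed + t) - t_m) <= tau:
                # Jump just past the right endpoint t_m + tau of this
                # segment, expressed as a delay relative to now.
                t = t_m + tau - elapsed + 1
                moved = True
        if not moved:
            return t
    return None


# Example 4: elapsed = 60 - 50 = 10, t_m = 8, tau = 4
delay_tt5 = min_time_delay([(10, 8)], tau=4, TT=5)
delay_tt1 = min_time_delay([(10, 8)], tau=4, TT=1)
```

Each jump only skips delays that lie inside the segment that triggered it, so no admissible smaller delay is passed over.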


We would like to note that the time complexity can easily be reduced to $O(p \log p)$ by using a segment tree data structure [12]. The tree can be built in $O(p \log p)$ time and the patterns can be sorted in $O(p \log p)$ time, in non-decreasing order of the right endpoints of their segments. Then, the segment tree is queried at most p times (once for each right endpoint), each query taking $O(\log p)$ time. So, the total time complexity of this approach is $O(p \log p)$. In most applications, however, the typical value of p is so small that the segment tree data structure may become an overhead.
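For illustration, the same $O(p \log p)$ bound can also be reached without a segment tree, by sorting the blocked delay intervals and sweeping over them once. The sketch below uses this swapped-in sort-and-sweep technique rather than the segment tree mentioned above; the function name, the `(elapsed, t_m)` input pairs (with `elapsed` = rt_now − rt_k) and the integer time units are assumptions of the sketch.

```python
def min_time_delay_sorted(violations, tau, TT):
    """Sort-and-sweep search for the smallest admissible integer delay.

    Each (elapsed, t_m) pair blocks the closed delay interval
    [t_m - tau - elapsed, t_m + tau - elapsed].
    Returns None when the smallest unblocked delay exceeds TT.
    """
    intervals = sorted(
        (t_m - tau - elapsed, t_m + tau - elapsed)
        for elapsed, t_m in violations
    )
    t = 0
    for lo, hi in intervals:
        if lo > t:
            break          # t precedes this and every later interval
        if hi >= t:
            t = hi + 1     # t is blocked: move just past the interval
    return t if t <= TT else None


# Example 4 again: elapsed = 10, t_m = 8, tau = 4
sorted_tt5 = min_time_delay_sorted([(10, 8)], tau=4, TT=5)
sorted_tt1 = min_time_delay_sorted([(10, 8)], tau=4, TT=1)
```

Sorting dominates the cost, and the sweep itself is linear in the number of intervals.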

4.3 Backward regression

The "backward regression" is a kind of local cloaking region search near the current location. The idea is to check regions that are temporally close to the current location along the user's own recent trajectory. Indeed, if a region is temporally close, it is spatially close too. This way, the current location is substituted with one of the user's recently visited locations. Another option might be to substitute the current location with a random but spatially close region. However, this option should be avoided, as it injects false data into what is shared with the LBS provider. Data analysis on the LBS site would then give false and misleading conclusions; for example, a place the person never visited may look like a popular place. So, with backward regression, we prefer sharing a recently visited region over sharing a never-visited region. Regarding the "backward regression," we follow the same approach used by PROBE [13], with the same rationale.

4.4 Potential limitations

A potential limitation of our approach, due to the partial support, is that some proper prefixes of sensitive patterns can be collected at the LBS site. Indeed, this is a limitation of all subset/subsequence-based knowledge hiding frameworks, e.g., [3,8]. To cope with this limitation, users are expected to specify minimally sensitive sequences, i.e., sequences none of whose proper prefixes are sensitive.
Another potential limitation is the inference of sensitive patterns from the shared LBS history. This happens when the shared LBS history does not explicitly support any sensitive pattern, but the attacker can infer the existence of some sensitive patterns using background knowledge, e.g., in the form of the city road network. This may happen, for instance, when some proper prefix of a sensitive pattern is already shared and, due to the lack of spatial/temporal alternatives, the sensitive pattern can be fully inferred. This problem can be solved for a given kind of background knowledge, as it is not difficult to analyze the lack of alternatives and suppress the necessary LBS requests. However, to guard against all attackers regardless of the kind of background knowledge, a notion of geo-indistinguishability (from the differential privacy perspective) [6] for location patterns needs to be developed. As an initial attempt, we consider that the LBS sharing mechanism should ensure that two real LBS sequences, one supporting a given sensitive pattern and the other barely failing to support it, produce very similar shared LBS sequences. This way, attackers cannot tell whether the real LBS sequence supports the sensitive pattern or not.


5 Experimental evaluation

5.1 Performance metric

Referring to Fig. 1, our TALS pattern sanitizer accepts the user-specific sensitive location pattern profile in the offline phase. The only parameter decided at this step is the temporal match flexibility threshold τ. During the online phase, the LPPP ensurer (the sanitizer) accepts an LBS request ($D_{req}$) and its output ($D'_{req}$) is an LBS request too. Based on the maximum patience TT and maximum spatial distance DT parameters, the system does one of the following: (i) forwards the request to the LBS provider as is, (ii) time delays the request, (iii) postdates the request, or (iv) drops the request. So, our effectiveness metric is the ratio of each of these four decisions at given thresholds τ, TT and DT. We also measure the efficiency of Algorithm 1 on a standard PC with a 2.4 GHz CPU and 8 GB of RAM.

5.2 Datasets

We have experimented with two trajectory datasets: Milano from the GeoPKDD project (http://www.geopkdd.eu) and Gowalla [11]. Both datasets contain user trajectories collected over a period of time. For our purpose, we simulate LBS requests from these trajectories as if the requests were made online.
Milano contains 15,800 trajectories with 2,075,216 spatio-temporal points in the Milano metropolitan area; hence, each trajectory has 131 points on average. A cloaking map with 1000 rectangular cloaked regions is generated arbitrarily. After removing the spatio-temporal points falling outside every cloaked region, we ended up with 580,692 points (treated as LBS requests), i.e., on average 37 LBS requests per user.
Gowalla is a location check-in dataset collected worldwide between February 2009 and October 2010. The total number of check-ins is 6,442,890, belonging to 196,561 distinct users. Since the data is worldwide, we need to focus on a particular area for our purpose. To this end, we picked an 8250 km² area in the greater Milano region. This filtering left us with 6933 check-ins. Since the temporal extent of the dataset is large (several months), each trajectory is segmented into full days. After removing the spatio-temporal points falling outside every cloaked region, we ended up with 1101 check-ins (treated as LBS requests) for 102 distinct users, i.e., on average 11 LBS requests per user. A cloaking map with 500 rectangular cloaked regions is generated arbitrarily.
For experimentation purposes, sensitive patterns are generated as follows. For each user trajectory, starting from the first visited region, each subsequent region is appended to the growing sensitive pattern sequence with 20% probability. As a result, a sequence of cloaking regions is obtained per user. Since we know the time gap between any two consecutive cloaking regions in the sequence, for each segment $r_{i-1} \xrightarrow{t_i} r_i$ we choose a $t_i$ value uniformly at random within 80% to 120% of the time gap and assign it as the temporal annotation of the respective segment. We repeat the above procedure a random number of times (between 1 and 10) for each user trajectory. As a result, each trajectory gives us a different number of sensitive patterns of differing lengths. The sensitive pattern generation procedure ensures that the simulated LBS sequence can support each of the respective sensitive location patterns with small but nonzero probability.
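The pattern generation procedure can be sketched as follows. This is one possible reading of the description, not the authors' generator: the function name `generate_pattern`, the `(region, time)` trajectory layout and the use of a seeded `random.Random` instance are assumptions of this sketch, and the time gap is taken between consecutive regions that were actually included in the pattern.

```python
import random


def generate_pattern(trajectory, rng, include_prob=0.2, lo=0.8, hi=1.2):
    """Generate one sensitive pattern from a trajectory.

    trajectory : list of (region, time) pairs in time order
    rng        : a random.Random instance
    Returns (regions, gaps) with regions = [r_0, ..., r_m] and
    gaps = [t_1, ..., t_m].
    """
    regions = [trajectory[0][0]]      # always start from the first region
    last_time = trajectory[0][1]
    gaps = []
    for region, time in trajectory[1:]:
        if rng.random() < include_prob:
            gap = time - last_time
            # t_i drawn uniformly within 80%..120% of the real time gap
            gaps.append(rng.uniform(lo * gap, hi * gap))
            regions.append(region)
            last_time = time
    return regions, gaps


# Hypothetical usage: between 1 and 10 patterns for one user trajectory
rng = random.Random(42)
trajectory = [("R2", 10), ("R8", 20), ("R11", 35), ("R9", 50), ("R5", 60)]
patterns = [generate_pattern(trajectory, rng) for _ in range(rng.randint(1, 10))]
```

Because each annotation stays close to the real time gap, the original trajectory can support a generated pattern whenever the perturbation falls within τ, which is consistent with the small but nonzero support probability noted above.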

Fig. 5 Effectiveness results on Milano. a Percent of decisions at varying τ levels when DT = 2000 and TT = 25,000. b Percent of decisions at varying TT levels when τ = 2500 and DT = 2000. c Percent of decisions at varying DT levels when τ = 2500 and TT = 4166

Fig. 6 Effectiveness results on Gowalla. a Percent of decisions at varying τ levels when DT = 5000 and TT = 70. b Percent of decisions at varying TT levels when τ = 30 and DT = 5000. c Percent of decisions at varying DT levels when τ = 60 and TT = 60

Fig. 7 Efficiency results on Milano. a Sensitive pattern count versus run-time. b Sensitive pattern length versus run-time

5.3 Results

Figures 5 and 6 present the effectiveness results for Milano and Gowalla, respectively. The figures report the rates of the four decision types produced by our algorithm at varying parameter values. First of all, we observe that all four outcomes have nonzero rates. This indicates that every decision type is exercised in practice, reflecting the trade-off between privacy and utility that the algorithm is designed to balance.
Increasing τ makes temporal matching easier and causes LBS request sequences to violate more sensitive patterns. This in turn causes the rate of "as is" requests to decrease, as observed in Figs. 5a and 6a. Figures 5c and 6c confirm that, as expected, the rate of postdated LBS requests increases with increasing DT. This is because a larger DT allows more spatial error, which likely makes more steps on the trajectory candidates for the regression. Likewise, the rate of time delaying increases with increasing TT (Figs. 5b, 6b).

Fig. 8 Efficiency results on Gowalla. a Sensitive pattern count versus run-time. b Sensitive pattern length versus run-time

Figures 7 and 8 present the efficiency results for Milano and Gowalla, respectively. The x-axis in Figs. 7a and 8a shows the total number of sensitive patterns for all users. The x-axis in Figs. 7b and 8b shows the length of sensitive patterns, where the run-time is measured at two different sensitive pattern counts. Since the number of sensitive patterns per user is small (at most 10), we did not employ segment trees at line 7 of Algorithm 1. In all of the figures, the run-time increases linearly with the number and length of sensitive patterns. This shows the scalability of our algorithm with respect to sensitive pattern count and length. The algorithm is practical, since the average run-time per user is less than one second even with the larger dataset, Milano.

6 Conclusion

This work introduced sensitive location pattern privacy in the context of LBSs. It is not an alternative but a complement to the location privacy protection studied in the literature. Together they provide better location privacy protection and safer sanitized trajectories shared with LBS providers.
Since sensitive location pattern privacy is checked online at each LBS request, the sanitization procedure has to be efficient in terms of space and time consumption. Our location pattern privacy ensurer provides a fast and scalable solution by exploiting a dynamic programming approach. Indeed, it achieves effectively constant space and time per LBS request.
The experimental evaluation suggests that a considerable fraction of LBS requests is either time delayed, postdated or dropped due to location pattern privacy profile violations. This shows that pattern privacy violation is a relevant and important issue in the context of LBS location privacy. The efficiency results show that our approach is practical and scalable with respect to increasing sensitive pattern counts and lengths.
In this work, as an initial attempt to introduce and solve the location pattern privacy problem, we modeled the location pattern privacy and location privacy problems separately within the same framework. Although this separation allowed us to develop orthogonal solutions for each problem, a unified problem definition with a more effective solution is left as future work. An in-depth study of the potential limitations discussed in Sect. 4.4 is another future research direction.

Acknowledgements This work has been supported by TUBITAK under the Grant Number 114E132.

References
1. Abul O, Atzori M, Bonchi F, Giannotti F (2007) Hiding sequences. In: Proceedings of the third ICDE
international workshop on privacy data management (PDM 2007), Istanbul, Turkey, Apr 2007
2. Abul O, Bonchi F, Nanni M (2008) Never walk alone: uncertainty for anonymity in moving objects
databases. In: Proceedings of 24th international conference on data engineering (ICDE 2008), Cancun,
Mexico, Apr 2008
3. Abul O, Atzori M, Bonchi F, Giannotti F (2010) Hiding sequential and spatiotemporal patterns. IEEE
Trans Knowl Data Eng 22(12):1709–1723
4. Abul O, Gokce H (2012) Knowledge hiding from tree and graph databases. Data Knowl Eng 72(108):148–
171
5. Aggarwal CC (2005) On k-anonymity and the curse of dimensionality. In: Proceedings of the 31th
international conference on very large databases (VLDB 2005), Trondheim, Norway, Sep 2005, pp 901–
909
6. Andrés ME, Bordenabe NE, Chatzikokolakis K, Palamidessi C (2013) Geo-indistinguishability: differ-
ential privacy for location-based systems. In: Proceedings of the 2013 ACM SIGSAC conference on
computer & communications security (CCS 2013), Berlin, Germany, Nov 2013, pp 901–914
7. Ağır B, Huguenin K, Hengartner U, Hubaux JP (2016) On the privacy implications of location semantics.
In: Proceedings on privacy enhancing technologies (PoPETs 2016), pp 165–183
8. Atallah M, Bertino E, Elmagarmid A, Ibrahim M, Verykios VS (1999) Disclosure limitation of sensitive
rules. In: Proceedings of the 1999 IEEE knowledge and data engineering exchange workshop (KDEX
1999), pp 45–52
9. Atzori M, Bonchi F, Giannotti F, Pedreschi D (2008) Anonymity preserving pattern discovery. In: Pro-
ceedings of the 34th international conference on very large databases (VLDB 2008), Auckland, New
Zealand, Aug 2008, vol 17(4), pp 703–727
10. Cheng R, Zhang Y, Bertino E, Prabhakar S (2006) Preserving user location privacy in mobile data
management infrastructures. In: Proceedings of the 6th international conference on privacy enhancing
technologies, Cambridge, UK, June 2006, pp 393–412
11. Cho E, Myers SA, Leskovec J (2011) Friendship and mobility: user movement in location-based social
networks. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery
and data mining (KDD 2011), San Diego, CA, USA, Aug 2011, pp 1082–1090
12. Cormen TH, Stein C, Rivest RL, Leiserson CE (2001) Introduction to algorithms, 2nd edn. McGraw-Hill
Higher Education, New York


13. Damiani ML, Bertino E, Silvestri C (2010) The PROBE framework for the personalized cloaking of
private locations. Trans Data Priv 3(2):123–148
14. Dwork C (2006) Differential privacy. In: Proceedings of 33rd international colloquium on automata,
languages and programming (ICALP 2006), Venice, Italy, June 2006, pp 1–12
15. Gedik B, Liu L (2005) Location privacy in mobile systems: a personalized anonymization model. In:
Proceedings of 25th IEEE international conference on distributed computing systems (ICDCS 2005),
Lisboa, Portugal, July 2006, pp 620–629
16. Ghinita G, Kalnis P, Khoshgozaran A, Shahabi C, Tan KL (2008) Private queries in location based services:
anonymizers are not necessary. In: Proceedings of the 2012 ACM SIGMOD international conference on
management of data (SIGMOD 2008), Vancouver, Canada, June 2008
17. Giannotti F, Nanni M, Pedreschi D (2006) Efficient mining of temporally annotated sequences. In: Pro-
ceedings of the sixth SIAM international conference on data mining, Bethesda, MD, USA, Apr 2006
18. Gruteser M, Grunwald D (2003) Anonymous usage of location-based services through spatial and tem-
poral cloaking. In: Proceedings of the 1st international conference on mobile systems, applications and
services, San Francisco, CA, USA, May 2003
19. Gurung S, Lin D, Jiang W, Hurson A, Zhang R (2014) Traffic information publication with privacy
preservation. ACM Trans Intell Syst Technol (TIST 2014) 5(3):44:1–44:26
20. Kido H, Yutaka Y, Satoh T (2005) Protection of location privacy using dummies for location-based
services. In: Proceedings of 21st international conference on data engineering workshops (ICDEW 2005),
Tokyo, Japan, Apr 2005
21. Li N, Li T, Venkatasubramanian S (2007) T-closeness: privacy beyond K -anonymity and L-diversity. In:
Proceedings of 23rd international conference on data engineering (ICDE 2007), Istanbul, Turkey, Apr
2007
22. Machanavajjhala A, Gehrke J, Kifer D, Venkitasubramaniam M (2006) l-diversity: privacy beyond k-
anonymity. In: Proceedings of the 22nd international conference on data engineering (ICDE 2006),
Atlanta, GA, USA, Apr 2006
23. Nergiz ME, Atzori M, Saygin Y, Guc B (2009) Towards trajectory anonymization a generalization based
approach. Trans Data Priv 2(106):47–75
24. O’Leary DE (1991) Knowledge discovery as a threat to database security. Knowl Discov Databases
9:507–516
25. Shokri R, Theodorakopoulos G, Troncoso C, Hubaux JP, Le Boudec JY (2012) Protecting location privacy:
optimal strategy against localization attacks. In: Proceedings of 19th ACM conference on computer and
communications security (CCS 2012), Raleigh, NC, USA, Oct 2012
26. Sweeney L (2002) K -anonymity: a model for protecting privacy. Int J Uncertain Fuzziness Knowl Based
Syst 10(5):557–570
27. Terrovitis M, Mamoulis N (2008) Privacy preservation in the publication of trajectories. In: Proceedings
of the 9th international conference on IEEE mobile data management (MDM 2008), Beijing, China, Apr
2008, pp 65–72
28. Yigitoglu E, Damiani ML, Abul O, Silvestri C (2012) Privacy-preserving sharing of sensitive semantic
locations under road-network constraints. In: Proceedings of the 19th international conference on IEEE
mobile data management (MDM 2012), Bengaluru, India, July 2008

Osman Abul is currently an associate professor of Computer Science at TOBB University of Economics and Technology, Ankara, Turkey. He received his Ph.D. degree in Computer Engineering from Middle East Technical University, Ankara, Turkey. He held visiting posts in University of Calgary, Norwegian University of Science and Technology, and Italian Institute of Information Science and Technology. His research interests include data mining, privacy and bioinformatics.


Cansın Bayrak received his B.S. degree (double major: Computer Engineering and Industrial Engineering), and M.S. degree in Computer Engineering from TOBB University of Economics and Technology, Ankara, Turkey, where he pursues Ph.D. degree in Computer Engineering. His research interests include spatio-temporal data mining and privacy.
