
Neurocomputing 169 (2015) 134–143

Contents lists available at ScienceDirect

Neurocomputing
journal homepage: www.elsevier.com/locate/neucom

Camera location for real-time traffic state estimation in urban road network using big GPS data
Zhenyu Shan *, Qianqian Zhu
Intelligent Transportation and Information Security Lab, Hangzhou Normal University, Hangzhou, Zhejiang, China

ARTICLE INFO

Article history:
Received 5 May 2014
Received in revised form 14 November 2014
Accepted 16 November 2014
Available online 16 May 2015

Keywords:
Camera location
Traffic state estimation
Urban road network
GPS data

ABSTRACT

The traffic camera has become a popular sensor for traffic state estimation in Intelligent Transportation Systems (ITS). However, it is impracticable to cover the whole urban network with cameras. How to place sensors in a given network so as to ensure the accuracy for both installed and uninstalled roads is therefore of great importance. The GPS-equipped taxis traveling on urban roads yield big GPS data, which contain a mass of traffic information. In this paper, a traffic state network (TSN) is presented to model the relationship among all road sections based on these data. In the TSN, the location problem is transformed into finding a set of nodes in the network that maximizes the received information. The problem is then shown to be NP-hard, and a locally optimal solution based on the greedy algorithm is presented to solve it in linear complexity time. The experiments were carried out on the GPS data from 8007 taxis traveling on 1432 sections for 197 days. The results show that the TSN effectively describes the relationship of road sections in traffic state, and performs better than the random road location method and the arterial road location method.
© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Traffic state estimation plays a key role in Intelligent Transportation Systems (ITS) and is the basis of various ITS services, such as route guidance, tour planning and traffic management [1]. Nowadays, traffic cameras provide essential data for high-accuracy and real-time estimation applications. However, it is infeasible to cover all or most of the road sections in a large urban road network because of the huge costs of installation and maintenance. Hence agencies are called in to determine the locations of cameras for efficiency. Their objective is to find a set of roads on which to install cameras so as to ensure the accuracy for the whole network.

In previous decades, researchers presented plenty of traffic state estimation methods using video data from cameras installed above the roads. These methods work well when there is more than one camera installed on the road. For uninstalled roads, however, their performance is far from satisfactory because there is no sensor data for estimation [2]. This is known as the missing data problem. To overcome it, several missing data estimation methods have been proposed, such as history imputation methods [3], a PPCA-based method [4], a compressive sensing based method [5] and the STM method [6]. Most of them are based on the observation that there are relationships between different road sections, between adjacent time periods, or both. By modeling these relationships, the uninstalled roads can utilize the traffic information from the installed roads. According to information theory, more information is more helpful in eliminating the uncertainty of estimation [6]. The accuracy of state estimation depends on the information quantity. So we should make sure that the amount of information each road receives from other roads is sufficient.

Most taxis traveling on urban roads have been equipped with the Global Positioning System (GPS) over many years of construction and operation of ITS. These taxis yield large-scale GPS data when the collection lasts a long period of time (such as one year). The data cover the whole network and contain the traffic information of each road. Thus, they are suitable for modeling the relationship among different roads. More importantly, GPS data will become much easier to collect with the wide use of GPS equipment. Here the source of the data is confined to taxis, although drivers or passengers with mobile phones can also provide such data.

In this paper, big GPS data is applied to determine the locations for traffic state estimation in an urban road network. First, a traffic state network (TSN) is presented to model the GPS data by the relationship of traffic state among different roads. The model is an undirected weighted graph, where a node represents a road section and an edge represents the information that can be received from the connected node. The weight is measured by mutual information (MI), which is often used as a measure of the mutual dependence of variables. In the MI calculation, a Gaussian mixture model (GMM) is used to describe the distribution of traffic state. Based on the TSN, the sum of information received by each uninstalled node can be calculated. To improve traffic state estimation, each road should keep receiving enough information. Thus, the aim is to

* Corresponding author.

http://dx.doi.org/10.1016/j.neucom.2014.11.093
0925-2312/© 2015 Elsevier B.V. All rights reserved.
Z. Shan, Q. Zhu / Neurocomputing 169 (2015) 134–143 135

find a set of nodes in the network at which to place cameras so as to maximize the received information. Its complexity equals that of the minimum dominating set problem and the maximum edge dominating set problem, which are proved to be NP-hard. Then, following the greedy algorithm, a TSN based location method is presented to get a locally optimal solution with linear complexity. This method can also be applied to placing other fixed sensors, so the traffic camera is referred to as a sensor hereafter.

The experiments were carried out on GPS data collected over ten months in Hangzhou, China. The results are as follows: (1) the GMMs (order = 2) perform well in describing the traffic speed distribution when the missing rate of the GPS data is below 35% and the number of days is larger than 15; these are also the lower bounds for training the parameters of the TSN; (2) the uninstalled roads in our method received more information than with the random road location method and the arterial road location method, and achieved better performance in traffic speed estimation.

Our contributions include the following:

(1) The traffic state network is presented to transform the location problem into a graph problem similar to a dominating set problem. We experimentally analyzed the lower bound of data volume for TSN training.
(2) The location problem based on the TSN is shown to be NP-hard. A method based on the greedy algorithm is presented to solve this problem in linear complexity, which provides a locally optimal result.
(3) The experimental data is collected from actual GPS-equipped taxis in the urban road network. The results show that our method is effective.

This paper is composed of six sections. The next section provides a review of related work on traffic sensor location research. Section 3 describes the details of the TSN. Section 4 presents the TSN based method, which solves the graph problem according to the greedy algorithm. Section 5 examines the results using the presented models. The final section draws a conclusion and introduces some future work.

2. Review of related work

With the rapid progress of sensor technology, various types of sensors (e.g., loop detectors [7], microwave [8], probe vehicles [9], cameras [10], and cell phones [11]) are adopted to collect data for traffic state estimation. Among these sensors, loop detectors, probe vehicles and cameras are most widely used, and each has its own strong and weak points. Loop detectors perform better in traffic state estimation but are often limited by high installation cost and damage rates. These drawbacks rarely occur with probe vehicles and cameras. Probe vehicles, however, suffer from a high missing rate and from errors in the map-matching process. Thus, some researchers have devoted themselves to fusing the data from different sensors to make full use of traffic information. Cheu et al. [12] presented a model based on a neural network which achieved good effectiveness in simulations. El Faouzi et al. [13] discussed the best linear estimation and weighted least-square methods for fusing the traffic data. Zhang et al. [14] developed an architecture to manage, analyze, and use the traffic data. Kong et al. [15] integrated the federated Kalman filter and evidence theory to form a platform for the fusion of multi-sensor data. However, it is always a difficult task to locate several kinds of sensors in the whole road network. Cameras have been widely used for monitoring cars running red lights. Estimating the traffic state from the video of these cameras reduces the cost, and this has attracted the interest of the industry.

The GPS equipped probe vehicles are most likely to cover the whole network because they are wide-spread, cheap and technically easy to deal with [16]. The limitation is the data missing problem, mainly caused by the uneven distribution of probe vehicles. Various methods have been proposed to overcome this problem, mainly focused on using the information in the time and space dimensions. For the time dimension, Chrobok et al. employed historical data with similar traffic behavior to aggregate missing data, which is based on road classification schemes such as Tele Atlas Functional Road Classes [4,17] and day categories [18]. Without pre-classification, Widhalm et al. [19] presented a GMM based method to learn information about typical shapes of the diurnal speed time series. Considering both the time and space dimensions, Shan et al. [6] modified the multiple linear regression models to apply the information from adjacent times and from the roads with the highest correlation coefficients. Zhu et al. [5] proposed a compressive sensing based algorithm through observation of the hidden structures within the traffic states of a road network. The favorable results of these methods reveal that receiving more information is a promising way to improve the missing data estimation.

The traffic information received by a road depends on the sensor placement. A variety of methods have been presented to determine the sensor placement with different optimization objectives, such as the origin–destination (OD) matrix, travel time and traffic state. For the OD matrix, Van et al. [20] presented two models to estimate the OD matrix from vehicle counts, which seek to reproduce the observed link flows. The models are based on the information minimization and entropy maximization principles. Fisk et al. [21] combined the maximum entropy model with a user-equilibrium traffic model into a single mathematical problem which can be solved by bi-level programming. Their work inspired us to solve the location problem by combining information theory with optimization technology. In addition, the number of OD pairs is normally greater than the number of link traffic stations. A prior matrix can be integrated with the counts to identify a unique estimated matrix by statistical inference approaches. Considering these, the maximum likelihood method [22], the generalized least-squares method [23] and the Bayesian inference method [24] were proposed to optimize the OD matrix.

For travel time, new sensor technologies can track vehicle identifications and hence can estimate the actual travel time between any pair of sensors. Gentili et al. [25] investigated guidelines for locating advanced traffic sensors that are able to read both the identification and route information. Castillo et al. [26] determined the optimal locations of vehicle plate scanning sensors for path flow reconstruction. Fujito et al. [27] concluded that the actual placement of sensors was critical in accurately estimating the traffic congestion levels on a freeway segment. Liu et al. [28] found that it was sufficient to have sensor stations placed at both ends of the segment for free-flow conditions. Meanwhile they proposed rules and an iterative procedure for locating a limited number of sensors. Li et al. [29] used four different models to compute freeway segment travel times by aggregating the constituent sections. However, that study aims to determine the best method to estimate travel times instead of determining the optimal sensor locations. Bartin et al. [30] adopted the K-means clustering algorithm to identify the optimal locations of sensors, which gives better travel time estimates than the equidistant approach. However, most of these studies focused on freeway segments, and little attention has been paid to the urban network.

As for traffic state, most researchers are concerned about where and how to place sensors on one certain road for more accurate estimation. There is still a lack of research from the perspective of the whole network. Furthermore, it is impossible to directly apply the existing location methods to traffic state estimation because of the different optimization objectives. For the OD matrix, the performance is commonly measured by flow volume coverage [31]. Under this measure, a single sensor located anywhere along an OD flow path is

sufficient for covering the flow. It cannot meet the demand of traffic state estimation. For travel time, the objective is the same as that in freeway applications. Because the freeway has no intersections, the average traffic speed of a freeway segment can be calculated by dividing its length by the travel time. However, this condition is not easily satisfied in an urban network.

In recent years, large-scale and complex traffic data have been collected with the wide deployment of ITS. The data are usually isolated or have only some dispersed links. This raises a new challenge to extract or mine additional information from these large and complex data [32]. In the field of big data, there is no general approach to this problem. However, a powerful idea is to integrate the data into a network whose links reflect their relationships. The information of the data is hidden in the network. Thus, useful information or solution methods can be obtained based on this network. There are some significant and insightful studies currently being done on social network sites like Facebook and Twitter [33]. For example, sentiment analysis of messages has been conducted to help predict job losses, spending reductions or disease outbreaks in a region [34]. On complex gene networks, Aviv et al. [35] presented a mechanism that extends the scope of evolutionary capacitance; Hamid [36] presented a method to model genomic regulatory networks. In these methods, the value of data comes from the patterns that can be derived by making connections between pieces of data.

Recently, there has been plenty of research on ITS applications based on traffic videos. For traffic control, a video-based approach is presented to learn the specific driving characteristics of drivers from the traffic videos [37]. For traffic flow estimation, a virtual loop method is employed to improve the quality of vehicle counting [38]. For adaptive traffic signaling, a video based scheme is proposed for reducing the waiting period of vehicles at road junctions without detecting or tracking vehicles [39]. For traffic speed and space occupancy estimation, a video system is built to calculate traffic flow speed and road space occupancy, and to recognize three typical traffic states (congested, slow, and smooth) [41]. In these applications, the videos are applied to estimate traffic parameters or traffic state in a direct or indirect way [40].

3. Traffic state network

3.1. Model description

The goal of our research is to determine the sensor locations so as to optimize the estimation performance of all the road sections with limited costs. The performance depends on the information that one section can obtain from other sections. Thus, the relationships of received information between all sections need to be built. A graph is a mathematical structure used to model pairwise relations among finite objects [42]. For a given urban network, the number of sections is fixed. Thus, the traffic state network is built based on the graph structure.

In the model, sections are regarded as nodes of the graph, and each pair of nodes is connected by a weighted edge reflecting the information quantity that one node can obtain from the other. According to this definition, the relationship of traffic state in the road network is transformed into an undirected weighted graph $G = (V, E, W)$, where $V$ is a finite set of nodes, $E$ denotes the edges and $W$ represents the weights associated with each edge. It connects sections by the relationship of their traffic state and is thus called the traffic state network. One example is shown in Fig. 1. On the TSN, the objective of optimizing traffic sensor locations can be formalized as follows: find a set of nodes $V^*$ with $|V^*| = k$, where

$$V^* = \arg\max_{v_j \in V^*} \; \min_{v_i \in V \setminus V^*} \Big( \sum_{(v_i, v_j) \in E} W(v_i, v_j) \Big) \qquad (1)$$

It aims to find $k$ nodes such that the minimum sum of weights over the other $n - k$ nodes is the largest among all possible combinations of selection. For sensor location, there still exist two difficulties: one is how to calculate the weight ($W$), which will be introduced in Section 3.2; the other is how to determine $V^*$, which is addressed in Section 4.

Fig. 1. An example of traffic state network.

3.2. Weight definition

The definition of weight must obey the following two conditions: (1) it can quantify the information one node can obtain from connected nodes; (2) since it is unknown which methods will be used to estimate the missing data, the weight calculation method should be independent of the missing data estimation method. In information theory, mutual information ($\mathrm{MI}(X; Y)$) is defined as the quantity of information obtained about $X$ if the observation of $Y$ is made [43]. It reveals the amount of uncertainty that is reduced during the observation process, and it is independent of the observation method. Thus, MI meets the two requirements for defining the weight. Formally, it is defined as

$$\mathrm{MI}(X; Y) = \int_Y \int_X p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} \, dx \, dy \qquad (2)$$

where $p(x, y)$ is the joint probability distribution function of $X$ and $Y$, $X$ and $Y$ are variables of traffic state, and $p(x)$ and $p(y)$ are the probability distributions of $X$ and $Y$ respectively.

According to the assumption of the TSN, each node will receive information from other nodes (often more than one). As a result, the weight calculation concerns not only the relationship between two sections, but also that among three or more sections. For example, the traffic state of one road can be estimated based on the data collected from sensors installed on two other sections. According to (1), this multi-variable MI is added as

$$\mathrm{MI}(X; \langle Y, Z \rangle) = \mathrm{MI}(X; Y) + \mathrm{MI}(X; Z) \qquad (3)$$

However, the accumulation method does not satisfy the original definition of MI. The reason is that there is a joint distribution (intersection) between $Y$ and $Z$. The extension from two to multiple variables is not trivial, even for the simplest case of three variables [44]. The reduction will lose information and make (3) invalid. Furthermore, more variables mean greater computational complexity in the extensions. When the number of variables is bigger than 5, their joint distribution is hard to estimate in a reasonable time.

In order to overcome this limitation, we experimentally analyze the influence of using (3) for the multi-variable MI calculation. The data come from the GPS dataset, which is introduced in Section 5.1. The r-variable ($r = 2, 3, 4, 5$) MI is calculated by the

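In the GMM of (5), $g(x \mid \mu_i, \Sigma_i)$ are the Gaussian components:

$$g(x \mid \mu_i, \Sigma_i) = \frac{1}{(2\pi)^{D/2}\, |\Sigma_i|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu_i)'\, \Sigma_i^{-1}\, (x - \mu_i) \right\} \qquad (6)$$

with mean $\mu_i$ and covariance matrix $\Sigma_i$.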
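For road traffic speed the vector $x$ is a scalar ($D = 1$), so (6) reduces to the familiar univariate normal density. The short derivation below writes this out; the symbol $\sigma_i^2$ for the scalar variance is notation introduced here for illustration and does not appear in the paper.

```latex
% Specialization of Eq. (6) to D = 1, writing \Sigma_i = \sigma_i^2 (scalar variance).
\begin{align*}
(2\pi)^{D/2}\,|\Sigma_i|^{1/2} &= \sqrt{2\pi\,\sigma_i^{2}}, \\
(x-\mu_i)'\,\Sigma_i^{-1}\,(x-\mu_i) &= \frac{(x-\mu_i)^{2}}{\sigma_i^{2}}, \\
g(x \mid \mu_i, \sigma_i^{2}) &= \frac{1}{\sqrt{2\pi\,\sigma_i^{2}}}\,
  \exp\!\left\{-\frac{(x-\mu_i)^{2}}{2\,\sigma_i^{2}}\right\}.
\end{align*}
```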
The GMM is parameterized by its order, means, covariance matrices and mixture weights. These parameters are collectively represented by $\lambda = \{M, w_i, \mu_i, \Sigma_i\}$, which can be estimated by training. The training process finds the parameters that best match the distribution of the training data. There are several techniques available for estimating the parameters of a GMM. By far the most popular and well-established method is the Expectation-Maximization (EM) algorithm [48]. Before using EM, $M$ should be determined; the details will be discussed in the experiments. EM is an iterative method. The initial $\lambda_0$ is often derived by K-means. In each iteration of EM, the current parameter $\lambda_i$ is taken as the starting point and a new model $\lambda_{i+1}$ is estimated such that $p(X \mid \lambda_{i+1}) \ge p(X \mid \lambda_i)$.
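The EM iteration described above can be sketched for the one-dimensional, two-component case used for RTS. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the synthetic speed sample and the initialization at the data extremes (a simple stand-in for the K-means start mentioned in the text) are all hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density, i.e. Eq. (6) with D = 1."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    """log p(X | lambda) for the mixture of Eq. (5)."""
    total = 0.0
    for x in data:
        mix = sum(w * gaussian_pdf(x, m, v)
                  for w, m, v in zip(weights, means, variances))
        total += math.log(mix)
    return total


def em_step(data, weights, means, variances):
    """One EM iteration: responsibilities (E-step), then re-estimation (M-step)."""
    resp = []
    for x in data:
        joint = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        norm = sum(joint)
        resp.append([j / norm for j in joint])
    new_w, new_m, new_v = [], [], []
    for i in range(len(weights)):
        r_i = sum(r[i] for r in resp)
        mean_i = sum(r[i] * x for r, x in zip(resp, data)) / r_i
        var_i = sum(r[i] * (x - mean_i) ** 2 for r, x in zip(resp, data)) / r_i
        new_w.append(r_i / len(data))
        new_m.append(mean_i)
        new_v.append(max(var_i, 1e-6))  # guard against a collapsing component
    return new_w, new_m, new_v


def fit_gmm2(data, n_iter=200):
    """Fit a 2-component 1-D GMM; returns parameters and the log-likelihood trace."""
    mean_all = sum(data) / len(data)
    var_all = sum((x - mean_all) ** 2 for x in data) / len(data)
    # Assumed initialization: equal weights, means at the data extremes.
    weights, means, variances = [0.5, 0.5], [min(data), max(data)], [var_all, var_all]
    trace = [log_likelihood(data, weights, means, variances)]
    for _ in range(n_iter):
        weights, means, variances = em_step(data, weights, means, variances)
        trace.append(log_likelihood(data, weights, means, variances))
    return weights, means, variances, trace


# Synthetic bimodal "speeds" (km/h), invented purely for this example.
rng = random.Random(0)
speeds = ([rng.gauss(20.0, 4.0) for _ in range(300)]
          + [rng.gauss(50.0, 6.0) for _ in range(200)])
weights, means, variances, trace = fit_gmm2(speeds)
```

The `trace` list makes the monotonicity property $p(X \mid \lambda_{i+1}) \ge p(X \mid \lambda_i)$ directly inspectable, which is a convenient sanity check when experimenting with other initializations.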

Fig. 2. The relationship of MI value calculated by original definition (OD) and accumulation method (AM).

original definition and the accumulation method respectively. We do not increase the number of variables further because it is enough to consider the relationship among six variables in most situations. The result is normalized by the max–min normalization method. As shown in Fig. 2, the value given by the original definition is proportional to that of the accumulation method. This paper aims at maximizing the MI of the nodes with the least information among all the nodes, which is not affected by replacing the MI values with proportional ones. As a consequence, although there is no closed-form formula for exactly computing the multi-variable MI, it can still be approximated based on the following inequality:

$$\mathrm{MI}(X; \langle Y_1, Y_2, \ldots, Y_n \rangle) \ge \sum_{i=1}^{n} \mathrm{MI}(X; Y_i) \qquad (4)$$

3.3. Distribution of road traffic speed

In the MI calculation, we should first know the distribution of traffic state ($p(x)$). In this paper, road traffic speed (RTS) is used to denote the traffic state instead of the travel time or other measures, because speed is independent of the section length and can be conveniently converted into other quantities (travel time, density, etc.). Several researchers have found that the RTS fits the Gaussian (normal) distribution, such as in [45,46]. Fig. 3 shows the histograms of RTS on four roads from 6:00 to 22:00 with an interval of 1 km/h. In the exploration of the speed datasets, we noticed that the distribution has multiple peaks, that is, it is multimodal. The Gaussian mixture distribution or Gaussian mixture model (GMM) is often used to describe, at least approximately, any set of correlated real-valued random variables each of which clusters around a mean value [47]. It is an extension of the Gaussian distribution. Thus, it is applied to fit the RTS distribution on urban roads.

A GMM (order = $M$) is a weighted sum of $M$ Gaussian components, which is defined as

$$p(x \mid \lambda) = \sum_{i=1}^{M} w_i \, g(x \mid \mu_i, \Sigma_i) \qquad (5)$$

where $x$ is a $D$-dimensional vector ($D = 1$ for RTS) and $w_i$ are the mixture weights satisfying the constraint $\sum_{i=1}^{M} w_i = 1$. The $g(x \mid \mu_i, \Sigma_i)$ are the Gaussian components defined in (6).

3.4. MCMC for MI estimation

For the MI calculation, there is a double integration in (2) which has no analytical solution for a GMM. However, the double integration can be transformed into a double sum through discretization, which is defined as follows:

$$\mathrm{MI}(X; Y) = \sum_{x \in X} \sum_{y \in Y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} \qquad (7)$$

If the number of samples (of $X$ and $Y$) is big enough, the result will be very close to the real value. However, this requirement on the number of samples is usually not satisfied.

Markov chain Monte Carlo (MCMC) is a family of procedures for Bayesian computations. The basic idea is to draw random samples from a Markov chain whose equilibrium distribution is the target distribution. A Markov chain is a sequence of random variables in which the distribution of each variable depends on the value of its previous variable. These samples are then used to approximate the target distribution. Gibbs sampling is an MCMC algorithm for obtaining a sequence of observations which are approximately drawn from a specified joint probability distribution [49]. This sequence can be used to approximate a joint distribution like $p(x, y)$. Let $\pi(x) = \pi(x_1, \ldots, x_k)$, $x \in R^n$, $1 < k < n$, denote the target density, where, for $i = 1, \ldots, k$, $x_i = (x_{i,1}, \ldots, x_{i,n(i)})$ ($n(i) \ge 1$) and $x_{i,j}$ are scalar components of $x$. $\pi(x_i \mid x_{-i})$ is the induced conditional density for each component sub-vector $x_i$, given the values of the other components $x_{-i} = \{x_j;\, j \ne i\}$ ($i = 1, \ldots, k$). The Gibbs sampling algorithm proceeds as follows: first, set arbitrary initial values $x^0 = (x^0_1, \ldots, x^0_k)$; then, successively make random variate drawings from each of $\pi(x_i \mid x_{-i})$. It is a transition mechanism from $x^0$ to $x^1 = (x^1_1, \ldots, x^1_k)$. Iteration of this process generates a sequence $x^0, x^1, \ldots, x^t, \ldots$. The samples will approximate the joint distribution of all variables. In this paper, we employ the Gibbs sampling algorithm to estimate the joint distribution for the MI.

4. Location method

This section presents an approach to the sensor location problem for optimizing traffic state estimation based on the TSN, following the idea of the greedy algorithm. It consists of a problem analysis and a method description. The focus of this approach is to solve the sensor location problem with a limited number of sensors.

4.1. Problem analysis

Before analyzing the complexity of the location problem on the TSN, two well-known problems need to be introduced:

Fig. 3. The histogram (blue solid line) of road traffic speed of four typical road sections and the distribution (red dotted line) by 2-order GMM. (For interpretation of the
references to color in this figure caption, the reader is referred to the web version of this paper.)

minimum dominating set problem and maximum edge dominating set problem [50].

In an unweighted undirected graph $G = (V, E)$, the objective of the dominating set problem is to find a set $V^* \subseteq V$ such that for every $v_i \in V \setminus V^*$ there exists $v_j \in V^*$ with $(v_i, v_j) \in E$. The resulting $V^*$ is a dominating set of $G$. The location problem can be reduced to a dominating set problem on the assumption that the state of an uninstalled node can be perfectly estimated when it is connected with an installed one. Under this assumption, all weights are equal. The aim of the location method then becomes minimizing the size of the dominating set $|V^*|$. This is the minimum dominating set problem, which is NP-hard. It indicates that the location problem is as hard as the minimum dominating set problem.

The objective of the maximum edge dominating set problem is to find a set $V^* \subseteq V$ such that every edge $e_{i,j} \in E$ has $v_i \in V \setminus V^*$ and $v_j \in V^*$. It is also an NP-hard problem. We assume that the location problem is to select more edges connected with installed nodes so as to make the whole network obtain more information. The objective becomes including all edges in the set with minimal $|V^*|$. In this situation, the problem can be viewed as a maximum edge dominating set problem.

According to the above analysis, the minimum dominating set problem and the maximum edge dominating set problem are special cases of the location problem. If the location problem were not NP-hard, neither would be the minimum dominating set problem and the maximum edge dominating set problem. Thus, the location problem is an NP-hard problem. In application, it is impossible to build a TSN from real data that satisfies the above assumption. Solving the location problem will take more computational time than solving the minimum dominating set problem and the maximum edge dominating set problem.

In an urban area, there are tens of thousands of road sections on which sensors can be placed, for example more than 1432 in Hangzhou. Thus, it is time-consuming to adopt the exhaustive method to solve the location problem. The idea of the greedy algorithm is to find a global optimum by following the problem solving heuristic of making the locally optimal choice at each stage. A greedy heuristic may yield locally optimal solutions. Nonetheless, it approximates a global optimal solution in a reasonable time [51]. Thus, a method based on the greedy algorithm is proposed to solve the location problem. It is based on the traffic state network and so is named the TSN method.

4.2. Method description

The process of the TSN method is shown in the pseudo-code below, where the set ($V^*$) of installed nodes is initialized as the empty set. If some roads already have sensors installed, or some roads must have a sensor installed, this set is not empty at the beginning. In the pseudo-code, weight($v_i$, $v_j$) denotes the weight between nodes $v_i$ and $v_j$. The variable covered_i is true if $V^*$ contains $v_i$. The variable infoSend_i denotes the sum of information that node $v_i$ sends to the connected nodes when it is installed. The variable infoReceive_i denotes the sum of information that node $v_i$ receives from other installed nodes. If the node $v_i$ is installed, infoReceive_i is infinite (INFI) because of the assumption that the traffic state of the installed roads can be perfectly estimated.

The TSN method iteratively augments the minimum received information. In each iteration, the best case is that the node added to $V^*$ would send the maximal information to uninstalled nodes and has received the minimal information. This case often appears at the beginning of the process, while most nodes are uninstalled. If no node meets this condition, the node that receives the minimal information is added to guarantee that it obtains enough information. If more than one node satisfies the condition, the node to be added is chosen randomly from the eligible nodes.

If the number of sensors to be installed is $n$, it can be checked that ChooseVertex and AdjustInfo are each called $n$ times. Using an adjacency list to implement the graph structure, the time spent in each call to AdjustInfo is $O(m)$, where $m \ll n$ and $m$ is the average degree of the graph. In ChooseVertex, a list is used to store infoSend_i and infoReceive_i in descending order and is updated immediately when they change, with complexity $O(m)$. Thus the total complexity of the algorithm is $O(n \cdot m) \le O(n^2)$.

The pseudo-code comprises three routines, listed as Algorithms 1–3: ChooseVertex (Vector infoSend, Vector infoReceive), AdjustInfo and Greedy.
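The greedy procedure of Section 4.2 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the graph is a hypothetical dictionary of neighbor weights, `k` is the number of sensors, candidates are restricted to uninstalled nodes, the fallback picks the node with the minimal received information (as the text describes), and ties are broken with a seeded random generator so that runs are repeatable. All function and variable names are assumptions.

```python
import math
import random

INFI = math.inf  # installed nodes are assumed to be perfectly estimated


def choose_vertex(info_send, info_receive, covered, rng):
    """Pick the next node: max information sent and min received, if possible."""
    free = [v for v in info_send if not covered[v]]
    if not free:
        return None
    max_send = max(info_send[v] for v in free)
    min_recv = min(info_receive[v] for v in free)
    ms = {v for v in free if info_send[v] == max_send}
    mr = {v for v in free if info_receive[v] == min_recv}
    both = ms & mr
    # Fall back to the least-informed node when no node satisfies both criteria.
    return rng.choice(sorted(both if both else mr))


def adjust_info(graph, covered, info_receive, info_send, v):
    """Mark v as installed and update the bookkeeping of its free neighbors."""
    covered[v] = True
    info_receive[v] = INFI
    info_send[v] = 0.0
    for u, w in graph[v].items():
        if not covered[u]:
            info_send[u] -= w      # u no longer needs to inform v
            info_receive[u] += w   # u now receives information from v


def greedy(graph, k, seed=0):
    """Select k nodes greedily; returns the selection and the received information."""
    rng = random.Random(seed)
    info_send = {v: sum(nbrs.values()) for v, nbrs in graph.items()}
    info_receive = {v: 0.0 for v in graph}
    covered = {v: False for v in graph}
    chosen = []
    for _ in range(k):
        v = choose_vertex(info_send, info_receive, covered, rng)
        if v is None:
            break
        adjust_info(graph, covered, info_receive, info_send, v)
        chosen.append(v)
    return chosen, info_receive


# Toy TSN with made-up MI weights (symmetric adjacency dictionary).
graph = {
    "a": {"b": 0.9, "c": 0.8},
    "b": {"a": 0.9, "c": 0.1},
    "c": {"a": 0.8, "b": 0.1, "d": 0.5},
    "d": {"c": 0.5},
}
chosen, info_receive = greedy(graph, k=2)
worst = min(r for v, r in info_receive.items() if v not in chosen)
```

On this toy graph the first pick is the hub `a`; the second pick is `d`, the node that still receives nothing, which illustrates the fallback rule that raises the minimum received information (`worst`).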

Algorithm 1.

1. MS = arg max_{1 ≤ i ≤ n} infoSend_i
2. MR = arg max_{1 ≤ i ≤ n} infoReceive_i
3. M = MS ∩ MR
4. if M = ∅
5.   return element of MR chosen uniformly at random.
6. else
7.   return element of M chosen uniformly at random.

AdjustInfo(Vector covered, Vector infoReceive, Vector infoSend, Vertex v_i).

Algorithm 2.

1. covered_i = true
2. infoReceive_i = INFI; infoSend_i = 0
3. for each neighbor v_j of v_i that covered_j = false
4.   infoSend_j = infoSend_j − weight(v_i, v_j)
5.   infoReceive_j = infoReceive_j + weight(v_i, v_j)

Greedy(Graph G).

Algorithm 3.

1. V* = ∅;
2. For all nodes v_i
3.   infoSend_i = sum(weight(v_i, v_j)) // v_j ranges over the neighbors of v_i
4.   infoReceive_i = 0
5.   covered_i = false
6. while 1 to n
7.   v = ChooseVertex(infoSend, infoReceive)
8.   if v = −1
9.     add v to V*
10.  AdjustInfo(covered, infoSend, infoReceive, v)
11. return V*

5. Performance evaluation

5.1. Experimental setup

The experiments were carried out on GPS data collected from 8007 GPS-equipped taxis traveling in Hangzhou, Zhejiang Province, China, from September 1, 2012 to June 30, 2013. The data cover the core area of the city, as shown in Fig. 4. This area has 1432 road sections, including 358 arterial roads and 1074 collector roads. The two directions of one road are treated as two different sections. Travel purposes usually differ between weekends (including legal holidays) and workdays, which leads to different traffic state distributions. Because the traffic state on workdays is more important, we only used the data of these days. Besides, as very few vehicles travel late at night and in the early morning, the data in this time period is also ignored. In summary, the data of 197 days, from Monday to Friday between 6:00 and 22:00, is used in our experiments. The data volume is more than 712 GB.

The GPS equipment uploads data every 20 s. Based on the GPS data, the RTS is calculated using the estimation method described in [13]. In each interval, the processing includes four steps: first, a coordinate transformation converts the WGS-84 3-D coordinates to a GIS-T map; second, the original points are matched onto the roads of the map by the "Point-Arc" approach; third, the sample set on each link or road segment is approximated by the least squares method; finally, the RTS is calculated by dividing the traveled distance by the elapsed time. Considering smoothness and stability, the interval is set at 15 min [52], which gives 64 RTS data points per day.

The process of building a TSN is as follows: first, the collected GPS data is used to estimate the RTS for all 1432 road sections; second, the RTS distribution (GMM) of each section is trained by EM, with the order of the GMM set according to the experimental requirement; then, MCMC is applied to calculate the MI between each pair of road sections, where the total number of pairs is 1432 × 1431; finally, the TSN is composed of 1432 nodes and 1432 × 1432 edges, with the MI as the edge weight. After the TSN is built, the greedy-algorithm-based location method is applied to find a limited number of sections on which to locate sensors.

5.2. GMM parameters estimation

5.2.1. Model order

In the MI calculation, the order of every GMM should be set to the same value to ensure comparability, so the order must be determined first. The order of the GMM for each section was calculated based on the Akaike information criterion (AIC), and the distribution of the resulting orders is shown in Fig. 5. It indicates that an order of 2 is sufficient for most sections (> 96%). The reason may be that the RTS of most sections tends to vary around a fixed value, even in complex situations, so most distributions contain fewer than 3 peaks. The fitting results of the typical distributions by GMM are shown in Fig. 3 (blue dotted line). They further illustrate that the 2-order model can accurately describe the RTS distribution. Thus, the GMM (order = 2) was adopted in the following experiments.

5.2.2. Training data

We proposed to apply large-scale data to solve the location problem. Some cities may have few GPS-equipped probe vehicles, so their GPS data is limited and collecting more would be costly. Thus, it is important to reveal how much data is enough to train the GMM. For a 1-dimension 2-order GMM, the number of uncertain parameters is 3 × 2 = 6. Ideally, only 6 points are needed to calculate the parameters. In reality, the number of data points must be far more than 6 to ensure accuracy. We determined the base line through the following experiments.

The experiment employed the data from 36 sections in 60 days with the missing rate below 5%. First, GMM_N was trained using N days of data, where N = 1, 2, …, 40, so that 40 GMMs were trained for each section. Here, the number of days is the data volume for training. Then, the difference between GMM_N and GMM_40 was estimated by the Kullback–Leibler divergence (KLD). In probability theory and information theory, the KLD is often used to measure the difference between two probability distributions [53]. For distributions p(x) and q(x), the KLD is defined as

D_{KL}(p(x) \,\|\, q(x)) = \int_{-\infty}^{+\infty} \ln\!\left(\frac{p(x)}{q(x)}\right) p(x)\, dx \qquad (8)

In order to maintain symmetry, it was calculated as

KLD'(p(x), q(x)) = D_{KL}(p(x) \,\|\, q(x)) + D_{KL}(q(x) \,\|\, p(x)) \qquad (9)

The KLD calculation involves an integral over an infinite range, which can be turned into a summation and calculated by Gibbs sampling [49].

The experimental result for different numbers of days is shown in Fig. 6, where the yellow, red and blue lines are the minimum, maximum and average values of the KLD, respectively. It shows that the average KLD no longer decreases noticeably as the data increases once N ≥ 7, which means that 7 days of data is enough for most sections. The reason may be that the distribution of RTS is almost the same across different 7-day periods, each of which contains sufficient information for model training. Considering the cost of data collection, 7 days of data can be set as the base line. In addition, the maximum KLD is very close to the average value when N > 15, which indicates that 15 days of data performs well even in the worst situation. Thus, 15 days of data is appropriate for precise GMM training.
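The symmetric divergence of Eqs. (8) and (9) can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions: it swaps in plain Monte Carlo sampling from the mixture instead of the Gibbs sampling used in the paper, and the function names and the two example GMMs are hypothetical, not taken from the dataset.

```python
import math
import random


def gmm_pdf(x, weights, means, stds):
    """Density of a 1-D Gaussian mixture at x."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in zip(weights, means, stds)
    )


def gmm_sample(rng, weights, means, stds):
    """Draw one value: pick a component by weight, then sample its Gaussian."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[k], stds[k])


def kl_monte_carlo(p, q, n_samples=20000, seed=0):
    """Estimate D_KL(p || q) = E_p[ln p(x) - ln q(x)] as a sample average (Eq. 8)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = gmm_sample(rng, *p)
        total += math.log(gmm_pdf(x, *p)) - math.log(gmm_pdf(x, *q))
    return total / n_samples


def symmetric_kld(p, q, n_samples=20000, seed=0):
    """Symmetrised divergence of Eq. (9): D_KL(p || q) + D_KL(q || p)."""
    return (kl_monte_carlo(p, q, n_samples, seed)
            + kl_monte_carlo(q, p, n_samples, seed + 1))


# Hypothetical 2-order GMMs given as (weights, means, standard deviations).
gmm_a = ([0.6, 0.4], [25.0, 45.0], [5.0, 8.0])
gmm_b = ([0.5, 0.5], [27.0, 43.0], [6.0, 7.0])

print(symmetric_kld(gmm_a, gmm_a))  # identical models: every log-ratio term is zero
print(symmetric_kld(gmm_a, gmm_b))  # similar models: a small positive estimate
```

Because the estimate is a sample average, it fluctuates with the number of samples; comparing GMM_N with GMM_40 in this way would need enough samples for that noise to stay below the differences of interest.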
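The greedy location procedure of Algorithms 1 to 3, which is applied once the TSN has been built, can be sketched in Python as follows. This is a hedged sketch rather than the authors' implementation. It assumes that MR collects the uncovered vertices with the least received information (covered vertices hold infinity), which is one reading that makes the INFI marker of Algorithm 2 meaningful; the listing itself writes arg max for both sets. The `n_sensors` parameter, the `None` return value and the toy graph are also assumptions made for illustration.

```python
import math
import random


def choose_vertex(info_send, info_receive, covered, rng):
    """Pick the next sensor location among uncovered vertices (cf. Algorithm 1)."""
    candidates = [v for v in info_send if not covered[v]]
    if not candidates:
        return None  # assumption: signals that every vertex is already covered
    best_send = max(info_send[v] for v in candidates)
    least_recv = min(info_receive[v] for v in candidates)
    ms = {v for v in candidates if info_send[v] == best_send}
    mr = {v for v in candidates if info_receive[v] == least_recv}
    both = ms & mr
    # Prefer vertices in both sets; otherwise fall back to MR, as in Algorithm 1.
    return rng.choice(sorted(both if both else mr))


def adjust_info(graph, covered, info_receive, info_send, v):
    """Mark v as covered and update its uncovered neighbours (cf. Algorithm 2)."""
    covered[v] = True
    info_receive[v] = math.inf
    info_send[v] = 0.0
    for u, w in graph[v].items():
        if not covered[u]:
            info_send[u] -= w     # the edge to a covered vertex no longer counts
            info_receive[u] += w  # u now receives information from the sensor at v


def greedy(graph, n_sensors, seed=0):
    """Select up to n_sensors vertices of a weighted graph (cf. Algorithm 3)."""
    rng = random.Random(seed)
    info_send = {v: sum(nbrs.values()) for v, nbrs in graph.items()}
    info_receive = {v: 0.0 for v in graph}
    covered = {v: False for v in graph}
    chosen = []
    for _ in range(n_sensors):
        v = choose_vertex(info_send, info_receive, covered, rng)
        if v is None:
            break
        chosen.append(v)
        adjust_info(graph, covered, info_receive, info_send, v)
    return chosen, info_receive


# Toy symmetric graph; the edge weights stand in for MI values and are made up.
toy_tsn = {
    "A": {"B": 0.6, "C": 0.1},
    "B": {"A": 0.6, "C": 0.5, "D": 0.2},
    "C": {"A": 0.1, "B": 0.5, "D": 0.3},
    "D": {"B": 0.2, "C": 0.3},
}
sensors, received = greedy(toy_tsn, n_sensors=2)
print(sensors)   # ['B', 'D']
print(received)  # information received by each vertex; covered ones hold inf
```

In the toy graph, B is picked first because it sends the most information, and D second because it has received the least after B is covered, which mirrors the aim of raising the minimum received information.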

Fig. 4. The core area for sensor locations experiments in Hangzhou. (For interpretation of the references to color in this figure, the reader is referred to the web version of this paper.)

Fig. 5. The distribution of GMM order estimated by AIC.

Fig. 6. The difference of GMMs trained using the data collected from different days. The x-axis denotes the number (N) of days for GMM training, and the y-axis denotes the KLD between GMM_40 and GMM_N. (For interpretation of the references to color in this figure, the reader is referred to the web version of this paper.)

5.2.3. Missing rate

The taxis are unevenly distributed and their number is limited, so the GPS data may not cover the whole network in a certain time period. If the missing data is critical to depicting the shape of the distribution, it will affect the accuracy of model training. The missing rate is usually used to express the degree of missing data, and is defined as the ratio of the number of time intervals with missing data to the total number of time intervals. According to this definition, the average missing rate is 27.3% in our dataset and 15.8% in the arterial sections. Though the EM algorithm is helpful when data is missing, we still do not know how high a missing rate it can tolerate. Thus, it is necessary to analyze the influence of missing data.

The experimental data is collected from 36 sections in 15 days, which is the same as in the training data experiment. First, we trained GMM_R using the data with different missing rates (R = [5%, 10%, 15%, …, 60%]). Each section has 12 model parameters. Assuming the missing data follows the uniform distribution, we randomly removed data to reach a given missing rate. Then, we compared GMM_R with R = 5% with the other 12 models by KLD. The average KLD of all sections is shown in Fig. 7. We found that the models differ only slightly while R ≤ 35%. It means that data with R ≤ 35% can train the GMM parameters accurately, a condition that most sections in the dataset satisfy. When the missing rate rises higher (R > 35%), the difference increases faster and the missing data affects the trained parameters. A model trained from such data is unable to reflect the real distribution. Combined with the results of the training data experiment, we conclude that the RTS model should be trained from data with N ≥ 15 and R ≤ 35%. This result is obtained on a particular dataset, and the required data might be different in other urban networks. However, the base line can be analyzed by the same experimental processing.

5.3. MI performance

The performance of MI was evaluated in a comparison experiment. We selected 6 sections as shown in Fig. 8, which form a representative sample of urban networks. The MI between each pair of sections is calculated and shown in Table 1. As shown in Table 1, MI(RS3, RS2) and MI(RS3, RS4) are bigger than MI(RS3, RS1) and MI(RS3, RS5). It means that RS3 can receive more information from RS2/RS4 than from RS1/RS5. Here, RS2 links with RS3, and RS4 is at the next intersection. The result fits the observation that two linked sections are very likely to have similar RTS distributions. Thus, it is suitable to apply MI for estimating the received information.

RS6 is far away from RS3, and there seems to be no link in traffic state between them. But MI(RS3, RS6) is almost the same as MI(RS3, RS2). Fig. 9 shows the RTS variation of RS3, RS6 and RS1 from 6:00 to 22:00 in one day. The curves of RS3 and RS6 are similar, and the two sections have similar rush hours in the morning (7:45–9:45) and afternoon (16:00–19:00). It means that RS6 can send enough information to RS1 for traffic state estimation, and the RTS of RS1 can be estimated from the
data obtained from RS6. As a consequence, MI performs as expected for calculating the RTS relationship between different sections.

Table 1
The MI between different road sections.

       RS2          RS3          RS4          RS5          RS6
RS1    1.80×10^-4   1.75×10^-4   1.05×10^-4   4.52×10^-5   6.65×10^-5
RS2    –            6.20×10^-2   7.7×10^-2    4.38×10^-4   1.61×10^-2
RS3    –            –            1.21×10^-3   6.32×10^-4   4.01×10^-3
RS4    –            –            –            2.87×10^-2   7.89×10^-3
RS5    –            –            –            –            9.68×10^-4

Fig. 7. The difference of GMM trained with the missing rate from 5% to 60%.

Fig. 8. The location of six road sections in MI performance evaluation.

Fig. 9. The change curve of road traffic speed from 6:00 to 22:00 on October 18, 2012.

5.4. Sensor location result

In the sensor location experiments, the TSN method was compared with the exhaustion location road method (ELRM), the random road location method (RRLM) and the arterial road location method (ARLM). ELRM finds the location set by exhaustive search, which gives the best result on the TSN model. RRLM randomly selects sections on which to place sensors. In the experiments, we ran it ten times and calculated the average value in order to reduce the random error. In ARLM, the sections are divided into levels by degree of concern, and sensors are first located on the highest (most concerned) level. Within a level, RRLM is applied. If sensors remain, the sections of the next level are considered, until all sensors are placed. In this experiment, all 1432 sections are divided into two levels: arterial sections and collector sections. Arterial sections (yellow colored and wider, as shown in Fig. 4) are the high-capacity skeleton of the urban road network. Most drivers are concerned with the traffic state of this type of section.

All data in the dataset is used in the experiment. The performance is evaluated by the total received information under different cover rates (C = [10%, 20%, …, 70%]). The result is shown in Fig. 10, where the x-axis denotes the cover rate and the y-axis denotes the ratio of TSN with ELRM, RRLM and ARLM. It shows that TSN is much more effective in maximizing the received information than RRLM and ARLM. ELRM performs better than TSN, at about 1.2 times the value of TSN, which is far less than the gap to RRLM and ARLM. It means that the local optimum found by the greedy method is very close to the best option. The superiority of TSN decreases as the cover rate increases. The received information of RRLM/ARLM is almost the same as that of TSN when C ≥ 40%/C ≥ 50%. It hints that the location problem can be ignored at a high cover rate (C ≥ 50%). However, a low cover rate is often encountered in real applications because of cost limitations. In summary, the greedy method is capable of finding an effective set of TSN nodes on which to place sensors, and ensures maximizing the minimum received information.

5.5. Traffic speed estimation

Three experiments were designed to evaluate the performance of TSN in RTS estimation, in comparison with RRLM and ARLM. First, the location method is applied to determine the sensor locations when C = 10%. Then, the RTS of the uncovered sections is calculated by the historical average method (HAM), the linked road average method (LRAM) and the spatial correlation method (SCM), respectively. HAM uses the average RTS at the same time interval of the previous 5 days to estimate the current missing data [16]. It does not use the real-time data from sensors. In LRAM, the missing data is estimated by the average RTS of the linked sections. If none of the linked sections has a sensor installed (n = 0), the RTS is estimated by HAM. In SCM, the missing data is estimated from the n sections with the highest correlations [18], where the upper limit of n is set at 6. Finally, the root mean square error (RMSE) is applied to evaluate the estimation error [54], which is defined as

\mathrm{RMSE} = \sqrt{\sum_{i=1}^{d} (\hat{v}_i - v_i)^2 / d} \qquad (10)

where d is the total testing time, and v_i and \hat{v}_i are the true value and the estimate of RTS, respectively. Here, the result estimated from the GPS data is treated as the true value.

The result of the RTS estimation experiments is shown in Table 2. We found that: (1) the average RMSE of TSN (3.87) is lower than that of ARLM (5.02) and RRLM (6.48). In SCM, the RMSE of TSN is 26.3% lower than that of ARLM and 46.1% lower than that of RRLM. It indicates that the model estimated from the large-scale GPS data is effective in sensor location for traffic speed (state) estimation; (2) in both SCM and LRAM, TSN outperforms RRLM and ARLM. It means that the improvement of our method is, to some extent, independent of the missing data estimation method; (3) SCM and LRAM have a
better performance than HAM in the three location methods, because HAM does not use the real-time data from sensors. TSN is the best of the three location methods. ARLM performs better than RRLM in both SCM and LRAM. The RMSE is reduced by 28.9% and 18.5%, respectively. Their order fits the result of received information shown in Fig. 10. These two results indicate that obtaining more information is helpful for traffic speed estimation, and that the assumption of TSN is reasonable.

Table 2
The result (RMSE) of road traffic speed estimation with traffic state network (TSN), random road location method (RRLM) and arterial road location method (ARLM). The missing data is calculated by historical average method (HAM), linked road average method (LRAM) and spatial correlation method (SCM) respectively.

           HAM     SCM     LRAM    Average
TSN        6.89    3.34    4.40    3.87
ARLM       6.89    4.53    5.50    5.02
RRLM       6.89    6.20    6.75    6.48
Average    6.89    4.69    5.55

Fig. 10. The sensor location performance of traffic state network (TSN), exhaustion location road method (ELRM), random road location method (RRLM) and arterial road location method (ARLM) with different cover rate.

6. Conclusion and future work

As more and more attention is paid to road speed estimation, this paper presented a camera location method that aims to maximize the information gained from a set of located nodes in an urban network. In our method, the traffic network is transformed into an undirected weighted graph, where road sections are abstracted as nodes and MI is applied to define the weights. The optimization problem is thus transformed into a graph problem similar to the dominating set problem. The GMM is applied to model the distribution of the RTS, and its order and training data are determined in the experiments. Then, a TSN-based algorithm is proposed to solve the optimization problem in linear complexity. In the experiments, we apply GPS data from real urban roads to demonstrate the validity of the method step by step. The experiments show that (1) the GMM and MI are suitable for describing the distribution of RTS and calculating the weights of the model, respectively; (2) our location method is effective in maximizing the minimum received information; (3) large-scale data is helpful in locating cameras for traffic speed (state) estimation.

Our ongoing research includes several major directions. First, this study only focuses on the sensor design problem for speed demand estimation applications, and a natural extension is to assist sensor design decisions for other network-wide traffic state estimation domains, such as measuring and forecasting OD, travel times, path flows, and route delays. Second, the proposed model is specifically designed for maximizing the performance of speed estimation with a finite number of cameras, and our future work should consider other crucial factors for real-world camera network design, such as how and where to locate cameras on the road to meet performance requirements.

Acknowledgment

This paper draws on work supported in part by the following funds: National High Technology Research and Development Program of China (863 Program) under Grant number 2011AA010101, National Natural Science Foundation of China under Grant numbers 61002009 and 61304188, Key Science and Technology Program of Zhejiang Province of China under Grant number 2012C01035-1, and Zhejiang Provincial Natural Science Foundation of China under Grant numbers LZ13F020004 and LR14F020003.

References

[1] M. Papageorgiou, Dynamic modeling assignment and route guidance in traffic networks, Transp. Res. Part B: Methodol. 24 (6) (1990) 471–495.
[2] R.C. Shah, S. Roy, S. Jain, W. Brunette, Data mules: modeling and analysis of a three-tier architecture for sparse sensor networks, Ad Hoc Netw. 1 (2) (2003) 215–233.
[3] P.A. Patrician, Multiple imputation for missing data, Research in Nursing & Health 25 (1) (2002) 76–84.
[4] L. Qu, L. Li, Y. Zhang, J. Hu, Ppca-based missing data imputation for traffic flow volume: a systematical approach, IEEE Trans. Intell. Transp. Syst. 10 (3) (2009) 512–522.
[5] Z. Li, Y. Zhu, H. Zhu, M. Li, Compressive sensing approach to urban traffic sensing, in: 2011 31st International Conference on Distributed Computing Systems (ICDCS), Minneapolis, Minnesota, USA, IEEE, 2011, pp. 889–898.
[6] Z. Shan, D. Zhao, Y. Xia, Urban road traffic speed estimation for missing probe vehicle data based on multiple linear regression model, in: 16th International IEEE Conference on Intelligent Transportation Systems-(ITSC), Hague, Netherlands, IEEE, 2013, pp. 118–123.
[7] S. Tang, F.-Y. Wang, A pci-based evaluation method for level of services for traffic operational systems, IEEE Trans. Intell. Transp. Syst. 7 (4) (2006) 494–499.
[8] M. Qin, Z. Cui, S. Li, Y. Wang, Y. Zhu, The realization of collecting urban road information through multi-approach, in: The Sixth World Congress on Intelligent Control and Automation, 2006, WCICA 2006, vol. 2, Dalian, China, IEEE, 2006, pp. 8664–8668.
[9] Y. Li, M. McDonald, Link travel time estimation using single gps equipped probe vehicle, in: The IEEE 5th International Conference on Intelligent Transportation Systems, 2002, Proceedings, Singapore, IEEE, 2002, pp. 932–937.
[10] M. Boltes, A. Seyfried, Collecting pedestrian trajectories, Neurocomputing 100 (2013) 127–133.
[11] P. Cheng, Z. Qiu, B. Ran, Particle filter based traffic state estimation using cell phone network data, in: Intelligent Transportation Systems Conference, 2006, ITSC'06, Toronto, Canada, IEEE, 2006, pp. 1047–1052.
[12] S.G. Ritchie, R.L. Cheu, Simulation of freeway incident detection using artificial neural networks, Transp. Res. Part C: Emerg. Technol. 1 (3) (1993) 203–217.
[13] N.-E. El Faouzi, Data-driven aggregative schemes for multisource estimation fusion: a road travel time application, in: Defense and Security, International Society for Optics and Photonics, 2004, pp. 351–359.
[14] H.-S. Zhang, Y. Zhang, Z.-H. Li, D.-C. Hu, Spatial–temporal traffic data analysis based on global data management using mas, IEEE Trans. Intell. Transp. Syst. 5 (4) (2004) 267–275.
[15] Q.-J. Kong, Z. Li, Y. Chen, Y. Liu, An approach to urban traffic state estimation by fusing multisource information, IEEE Trans. Intell. Transp. Syst. 10 (3) (2009) 499–511.
[16] M. Rahmani, H.N. Koutsopoulos, Path inference of low-frequency gps probes for urban networks, in: 2012 15th International IEEE Conference on Intelligent Transportation Systems (ITSC), Alaska, USA, IEEE, 2012, pp. 1698–1701.
[17] R. Chrobok, O. Kaumann, J. Wahle, M. Schreckenberg, Three categories of traffic data: historical, current, and predictive, in: Proceedings of the 9th IFAC Symposium Control in Transportation Systems, 2000, pp. 250–255.
[18] W. Toplak, H. Koller, M. Dragaschnig, D. Bauer, J. Asamer, Novel road classifications for large scale traffic networks, in: 2010 13th International IEEE Conference on Intelligent Transportation Systems (ITSC), Funchal, Madeira Island, Portugal, IEEE, 2010, pp. 1264–1270.
[19] P. Widhalm, M. Piff, N. Brandle, H. Koller, M. Reinthaler, Robust road link speed estimates for sparse or missing probe vehicle data, in: 2012 15th International IEEE Conference on Intelligent Transportation Systems (ITSC), Alaska, USA, IEEE, 2012, pp. 1693–1697.
[20] H.J. van Zuylen, L.G. Willumsen, The most likely trip matrix estimated from traffic counts, Transp. Res. Part B: Methodol. 14 (3) (1980) 281–293.
[21] C. Fisk, Some developments in equilibrium traffic assignment, Transp. Res. Part B: Methodol. 14 (3) (1980) 243–255.
[22] H. Spiess, A maximum likelihood model for estimating origin–destination matrices, Transp. Res. Part B: Methodol. 21 (5) (1987) 395–412.
[23] E. Cascetta, Estimation of trip matrices from traffic counts and survey data: a generalized least squares estimator, Transp. Res. Part B: Methodol. 18 (4) (1984) 289–299.
[24] M.L. Hazelton, Estimation of origin–destination matrices from link flows on uncongested networks, Transp. Res. Part B: Methodol. 34 (7) (2000) 549–566.
[25] M. Gentili, P.B. Mirchandani, Survey of models to locate sensors to estimate traffic flows, Transp. Res. Rec.: J. Transp. Res. Board 2243 (1) (2011) 108–116.
[26] E. Castillo, J.M. Menéndez, P. Jiménez, Trip matrix and path flow reconstruction and estimation based on plate scanning and link observations, Transp. Res. Part B: Methodol. 42 (5) (2008) 455–481.
[27] I. Fujito, R. Margiotta, W. Huang, W.A. Perez, Effect of sensor spacing on performance measure calculations, Transp. Res. Rec.: J. Transp. Res. Board 1945 (1) (2006) 1–11.
[28] F.-Y. Liu, M.G. Cogan, Angiotensin ii: a potent regulator of acidification in the rat early proximal convoluted tubule, J. Clin. Investig. 80 (1) (1987) 272.
[29] Z. Nie, W. Li, M. Seo, S. Xu, E. Kumacheva, Janus and ternary particles generated by microfluidic synthesis: design, synthesis, and self-assembly, J. Am. Chem. Soc. 128 (29) (2006) 9408–9412.
[30] B. Bartin, K. Ozbay, C. Iyigun, Clustering-based methodology for determining optimal roadway configuration of detectors for travel time estimation, Transp. Res. Rec.: J. Transp. Res. Board 2000 (1) (2007) 98–105.
[31] M. Cremer, H. Keller, A new class of dynamic methods for the identification of origin–destination flows, Transp. Res. Part B: Methodol. 21 (2) (1987) 117–132.
[32] Q. Yang, X. Wu, 10 challenging problems in data mining research, Int. J. Inf. Technol. Decis. Making 5 (04) (2006) 597–604.
[33] P.V. Marsden, N.E. Friedkin, Network studies of social influence, Sociol. Methods Res. 22 (1) (1993) 127–151.
[34] A. Pak, P. Paroubek, Twitter as a corpus for sentiment analysis and opinion mining, in: LREC, 2010.
[35] A. Bergman, M.L. Siegal, Evolutionary capacitance as a general feature of complex gene networks, Nature 424 (6948) (2003) 549–552.
[36] H. Bolouri, E.H. Davidson, Modeling transcriptional regulatory networks, BioEssays 24 (12) (2002) 1118–1129.
[37] Q. Chao, J. Shen, X. Jin, Video-based personalized traffic learning, Graph. Models 75 (6) (2013) 305–317.
[38] Y. Xia, X. Shi, G. Song, Q. Geng, Y. Liu, Towards improving quality of video-based vehicle counting method for traffic flow estimation, Signal Process 2014, http://dx.doi.org/10.1016/j.sigpro.2014.10.035, in press.
[39] S. Indu, V. Nair, S. Jain, S. Chaudhury, Video based adaptive road traffic signaling, in: 2013 Seventh International Conference on Distributed Smart Cameras (ICDSC), Palm Springs, California, USA, IEEE, 2013, pp. 1–6.
[40] X. Li, Y. She, D. Luo, Z. Yu, A traffic state detection tool for freeway video surveillance system, Procedia - Social Behav. Sci. 96 (2013) 2453–2461.
[41] R. Dou, M. Yun, X. Yang, Traffic state identification considering differences between road segments and intersections, Bridges 10 (2014) 9780784412442-107.
[42] R.D. Alba, A graph-theoretic definition of a sociometric clique, J. Math. Sociol. 3 (1) (1973) 113–126.
[43] R. Steuer, J. Kurths, C.O. Daub, J. Weise, J. Selbig, The mutual information: detecting and evaluating dependencies between variables, Bioinformatics 18 (suppl. 2) (2002) S231–S240.
[44] T.H. Pham, T.B. Ho, Q.D. Nguyen, D.H. Tran, V.H. Nguyen, Multivariate mutual information measures for discovering biological networks, in: RIVF, 2012, pp. 1–6.
[45] W. Willinger, M.S. Taqqu, W.E. Leland, D.V. Wilson, Self-similarity in high-speed packet traffic: analysis and modeling of ethernet traffic measurements, Stat. Sci. (1995) 67–85.
[46] A. Hofleitner, R. Herring, A. Bayen, Probability distributions of travel times on arterial networks: a traffic flow and horizontal queuing theory approach, in: 91st Transportation Research Board Annual Meeting, no. 12-0798, 2012.
[47] D.A. Reynolds, T.F. Quatieri, R.B. Dunn, Speaker verification using adapted gaussian mixture models, Digital Signal Process. 10 (1) (2000) 19–41.
[48] J.A. Bilmes, et al., A gentle tutorial of the em algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models, Int. Comput. Sci. Inst. 4 (510) (1998) 126.
[49] A. Doucet, Sequential Monte Carlo Methods, Wiley Online Library, 2001.
[50] M. Yannakakis, F. Gavril, Edge dominating sets in graphs, SIAM J. Appl. Math. 38 (3) (1980) 364–372.
[51] J.A. Tropp, Greed is good: algorithmic results for sparse approximation, IEEE Trans. Inf. Theory 50 (10) (2004) 2231–2242.
[52] R.-M. Hage, D. Bétaille, F. Peyret, D. Meizel, J. Smal, Unscented Kalman filter for urban link travel time estimation with mid-link sinks and sources, in: 2012 15th International IEEE Conference on Intelligent Transportation Systems (ITSC), Alaska, USA, IEEE, 2012, pp. 1632–1637.
[53] J.R. Hershey, P.A. Olsen, Approximating the Kullback–Leibler divergence between Gaussian mixture models, in: IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 4, 2007, pp. 317–320.
[54] B. Bazartseren, G. Hildebrandt, K.-P. Holz, Short-term water level prediction using neural networks and neuro-fuzzy approach, Neurocomputing 55 (3) (2003) 439–450.

Zhenyu Shan received the B.S. degree in computer science and technology from the Zhejiang University, Hangzhou, in 2004, and the Ph.D. degree in computer science and technology from the Zhejiang University, Hangzhou, in 2010. He is currently a lecturer in the Intelligent Transportation and Information Security Lab at the Hangzhou Normal University. His research interests include intelligent transportation systems, big data and visual semantic understanding.

Qianqian Zhu is currently an undergraduate in the Department of Software at Hangzhou Normal University, Hangzhou. Her research interests include intelligent transportation systems and visual semantic understanding.
