Abstract
A case study is carried out on reliability measurement from past travel time data between two
locations in Delhi. Two compression-based algorithms are used for reliability estimation: the
Lempel-Ziv (LZ) algorithm and a modification of it, the Lempel-Ziv Longest Common Subsequence
(LZ-LCS) algorithm. Both algorithms first estimate the entropy and then the reliability. LZ
searches the sequence for exactly identical strings, whereas LZ-LCS searches for both substrings
and subsequences. The results show an increase in the reliability value and a decrease in
entropy. The travel time intervals are also varied to show their effect on the entropy and
predictability values.
Introduction
Travel time reliability measures the extent of the expected delay. In 1950, Fano showed that
predictability depends on the entropy of the sequence, with the help of Fano's lemma, which is
based on conditional probability. Therefore, entropy is the most fundamental quantity for
calculating predictability. If the entropy of a travel time series is low, then the reliability
value of the next element of the series will be high, and vice versa. If the data series follows
a pattern, then its next element can be predicted with high reliability, i.e. it is easier to
predict than the next element of a data series that does not follow a pattern. Therefore, the
role of past travel time data is studied in this paper using two compression-based algorithms.
To calculate the entropy, we first have to convert the travel time sequence into character form
based on some travel time interval; for example, a travel time of 5-6 minutes is denoted by the
character "a", 6-7 minutes by "b", and so on. After characterizing the travel time series, we can
estimate the entropy using a data compression algorithm that follows the prefix compression
property, viz.,
Literature Review
Based on information theory, the solution proposed for calculating reliability uses LZ entropy
estimation, which is based on a compression algorithm. To calculate the entropy, this solution
discretizes the travel time series into a series of characters. Suppose a road XY has X as its
starting point and Y as its ending point. Let the minimum travel time from X to Y be 10 minutes
and the maximum be 15 minutes. Then the travel time sequence for road XY is bounded by the
interval [10, 15]. Suppose the travel time sequence for road XY is (10, 14, 13, 15, 14, ...).
After characterizing this travel time sequence we get (a, d, c, e, d, ...). To calculate the UBP
from this travel time sequence, we first need to calculate the LZ entropy, which is given by
equation (1):
$$\text{Entropy } (E) = \left(\frac{1}{n}\sum_{i=1}^{n}\Lambda_i\right)^{-1}\log_e n \qquad (1)$$
Step 1: The previous string Xpre is empty, so "A" has never appeared and Λ_1 = 1.
Step 2: The previous string Xpre is "A", where the following "B" has never appeared, so Λ_2 = 1.
Step 3: The previous string Xpre is "AB", where the following "B" has appeared but "BC" has
not, so Λ_3 = 2.
Step 4: The previous string Xpre is "ABB", where the following "C" has never appeared in Xpre, so
Λ_4 = 1.
Step 5: The previous string Xpre is "ABBC", where the following "B" and "BC" have appeared,
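The scan described in the steps above, together with equation (1), can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names `lz_lambdas` and `lz_entropy` are hypothetical, and the handling of a substring that runs past the end of the sequence (it is counted as one character longer than the remaining suffix) is an assumed convention that the text does not specify.

```python
import math


def lz_lambdas(seq):
    """For each position i of the string seq, return the length of the shortest
    substring starting at i that does not occur in the preceding part seq[:i]."""
    n = len(seq)
    lambdas = []
    for i in range(n):
        previous = seq[:i]  # the "previous string" (Xpre in the text)
        k = 1
        # Grow the candidate substring while it has already appeared.
        while i + k <= n and seq[i:i + k] in previous:
            k += 1
        # Assumption: if the sequence ends first, k is one past the suffix length.
        lambdas.append(k)
    return lambdas


def lz_entropy(seq):
    """Entropy estimate from equation (1), using the natural logarithm."""
    n = len(seq)
    return n * math.log(n) / sum(lz_lambdas(seq))
```

For the four-character string "ABBC", `lz_lambdas("ABBC")` returns `[1, 1, 2, 1]`, which agrees with Steps 1 to 4, and `lz_entropy("ABBC")` evaluates 4 ln 4 / 5, roughly 1.109.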
In LZ entropy calculation we need to scan the complete string for exactly identical substrings;
for example, abb has appeared in the string caabbc. LZ is unable to explore this property, which
leads to a loose entropy estimate. So, to explore this property of the travel time sequence, we
are introducing
Here the value of Λ_i is calculated from LCS(X[1...i], Y[1...j]), where X and Y hold two
There is a better way to explore the string similarity property of a travel time sequence: using
the longest common subsequence algorithm or the longest common substring algorithm. For this
travel time series, we found in practice that both algorithms give the same result.
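For reference, the two string-similarity measures named above can be sketched with textbook dynamic programming. The sketch only computes the length of the longest common subsequence and of the longest common substring of two character strings; how these lengths feed into the LZ-LCS entropy estimate is not shown here, and the function names are hypothetical.

```python
def lcs_subsequence_length(x, y):
    """Length of the longest common subsequence (characters need not be adjacent)."""
    prev = [0] * (len(y) + 1)
    for cx in x:
        curr = [0]
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_substring_length(x, y):
    """Length of the longest common substring (characters must be contiguous)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for cx in x:
        curr = [0]
        for j, cy in enumerate(y, start=1):
            # Extend the run ending at the previous pair, or reset it to zero.
            run = prev[j - 1] + 1 if cx == cy else 0
            curr.append(run)
            best = max(best, run)
        prev = curr
    return best
```

For "abb" and "caabbc" both functions return 3, while for "adc" and "abdc" the subsequence length is 3 and the substring length is 2, which shows where the two measures can differ.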
Methodology
predictive algorithm which can estimate the next travel time. This quantity is subject to Fano's
inequality. That is, if the travel time sequence has entropy E with the interval N, then
If a characterized travel time sequence {a, b, c, a, c, b, d, ...} is observed at some interval
'n', then the entropy (E) of this characterized travel time series can be calculated from
equation (1) shown above. The problem is that calculating the real entropy requires a series of
infinite length, whereas we have a series of fixed length. With a fixed-length series we cannot
calculate the real entropy; what we calculate is called the estimated entropy (and this series
in entropy calculation has the characteristic that if we increase the length of the series
congruent to
1. Convert the whole travel time series into a characterized series based on sub-intervals, as
explained in Appendix 1.
3. Using the Λ_i values, calculate the entropy from equation (1) for different numbers of
sub-intervals.
4. Calculate the reliability using the entropy calculated in the above step and Fano's
inequality, described in equation (2) and equation (3).
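The reliability calculation from Fano's inequality can be illustrated with a small numerical sketch. Equations (2) and (3) are not reproduced in this section, so the code below assumes the form of Fano's inequality commonly used in the predictability literature (for example Song et al. [4]), E = H(P) + (1 - P) ln(N - 1) with H(P) = -P ln P - (1 - P) ln(1 - P), and solves it for P by bisection. The natural logarithm is assumed for consistency with equation (1), N is taken to be the number of sub-intervals (distinct characters), and the function names are hypothetical.

```python
import math


def fano_rhs(p, n_symbols):
    """Right-hand side of the assumed Fano relation for predictability p."""
    h = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:  # treat 0 * log(0) as 0
            h -= q * math.log(q)
    return h + (1.0 - p) * math.log(n_symbols - 1)


def max_predictability(entropy, n_symbols, tol=1e-10):
    """Solve fano_rhs(p) = entropy for p in [1/N, 1] by bisection."""
    if n_symbols < 2:
        raise ValueError("need at least two symbols")
    if not 0.0 <= entropy <= math.log(n_symbols):
        raise ValueError("entropy must lie between 0 and ln(N)")
    lo, hi = 1.0 / n_symbols, 1.0
    # fano_rhs decreases from ln(N) at p = 1/N to 0 at p = 1.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fano_rhs(mid, n_symbols) > entropy:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Under these assumptions a lower entropy gives a higher predictability: an entropy of 0 gives a value close to 1, and an entropy of ln(N) gives 1/N. An entropy estimated from a short series can exceed ln(N), in which case this sketch raises an error instead of returning a value.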
Case Study
for commuting is through private vehicle as public transport is less accessible and
uncomfortable. The travel time data between these locations is collected from Uber [8] data
available online. The past data is collected for all Mondays from August 2017 to June 2019, for a
better predictable pattern of the travel time sequence. Three time slots are selected for the
present study, viz. AM peak (7am-10am), midday (10am-4pm), and PM peak (4pm-7pm).
The trend of travel time variation for all Mondays during the study period between the study
Figure 1 Travel time trend on Monday for Tis Hazari to New Delhi Railway station (x-axis: Monday's date)
Figure 2 Travel time trend on Monday for New Delhi Railway station to Tis Hazari (x-axis: Monday's date)
The travel time sequence data is further used for entropy and reliability calculation for the
above-mentioned time slots. Entropy and reliability estimation through the LZ and LZ-LCS
algorithms is done for the three time slots and for both directions of the journey. The
calculation of entropy and reliability is done for varied characterized-series sub-intervals to
study its variation with it and
[Plots: entropy (y-axis, 0 to 1.5) against characterized-series sub-interval (x-axis, 2 to 8) for the AM, PM, and midday slots under LZ and LZ-LCS.]
[Plots: values between 0.75 and 0.9 (y-axis label not recovered) against travel time interval division / characterized-series sub-interval (x-axis, 2 to 8) for the same slots and algorithms.]
In the study we have used both the LZ and LZ-LCS algorithms for entropy and reliability
calculation, and the results show a decrease in the entropy value with the LZ-LCS algorithm
compared with the LZ algorithm. The varying trend of entropy and reliability with travel time
interval division can also be seen from the results. LZ-LCS can be considered better than the LZ
algorithm, as it searches for both substrings and subsequences in the travel time sequence,
whereas LZ only searches for exactly identical strings. From the research it can be concluded
how significant the previous data is for estimating the reliability of next
The calculation of reliability needs a very long string to calculate the entropy and then the
predictability, so in future it will remain a tedious task to estimate the reliability value with
Appendix 1
Suppose we have a travel time series with interval [i, j], where i and j are the lower and upper
travel times in the series, respectively. If a series T_i with interval [i, j] has n
sub-intervals, then the sub-intervals are:
1st sub-interval: $i$ to $i + \left[\frac{j-i}{n}\right]$
2nd sub-interval: $i + \left[\frac{j-i}{n}\right]$ to $i + 2\left[\frac{j-i}{n}\right]$
Similarly, for the (n-1)th sub-interval: $i + (n-2)\left[\frac{j-i}{n}\right]$ to $i + (n-1)\left[\frac{j-i}{n}\right]$
And for the nth sub-interval: $i + (n-1)\left[\frac{j-i}{n}\right]$ to $j$
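The sub-interval scheme above can be sketched in Python as follows. The function name `characterize` is hypothetical, the bracketed term is read as the plain width (j - i)/n, and each sub-interval is assumed to include its upper boundary, with the lower bound i itself assigned to the first sub-interval. That boundary rule is not stated in the text; it is assumed here because it reproduces the (10, 14, 13, 15, 14) to (a, d, c, e, d) example from the Literature Review.

```python
import math
import string


def characterize(times, lower, upper, n_sub):
    """Map each travel time in [lower, upper] to a letter 'a', 'b', ...
    according to which of n_sub equal-width sub-intervals it falls in."""
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    if not 1 <= n_sub <= len(string.ascii_lowercase):
        raise ValueError("n_sub must be between 1 and 26")
    width = (upper - lower) / n_sub
    chars = []
    for t in times:
        if not lower <= t <= upper:
            raise ValueError(f"travel time {t} is outside [{lower}, {upper}]")
        # Assumption: sub-intervals are closed on the right.
        index = math.ceil((t - lower) / width) - 1
        index = min(max(index, 0), n_sub - 1)  # t == lower goes to 'a'
        chars.append(string.ascii_lowercase[index])
    return "".join(chars)
```

With these assumptions, `characterize([10, 14, 13, 15, 14], 10, 15, 5)` returns `"adced"`. Changing `n_sub` changes the alphabet size, which is how the number of sub-intervals can be varied before the entropy is estimated.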
References
[1] Fano, R. M., 1961. Transmission of Information. The MIT Press and Wiley, New York and London.
[2] Ziv, J., Lempel, A., 1977. A Universal Algorithm for Sequential Data Compression. IEEE
Press.
[3] Li, H., He, F., Lin, X., Wang, Y., Li, M., 2019. Travel time reliability measure based on
[4] Song, C., Qu, Z., Blumm, N., Barabási, A. L., 2010. Limits of predictability in human
[5] Lu, X., Wetter, E., Bharti, N., Tatem, A.J., Bengtsson, L., 2013. Approaching the limit of predictability
[6] Woodard, D., Nogin, G., et al., 2017. Predicting travel time reliability using mobile phone GPS data.
[7] Shlayan, N., Kachroo, P., Wadoo, S., 2011. Transportation Reliability Based on Information Theory.