Paper in Progress - With Structure

Reliability Measurement of Travel Time Data for Future Prediction Using the LZ
and LZ-LCS Algorithms
Abstract
This case study measures the reliability of past travel time data between two locations in
Delhi. Two compression-based algorithms are used for reliability estimation: the Lempel-Ziv
(LZ) algorithm and a modification of it, the Lempel-Ziv Longest Common Subsequence
(LZ-LCS) algorithm. Both algorithms first estimate the entropy and then the reliability. LZ
searches the sequence for exactly matching strings, whereas LZ-LCS searches for both
substrings and subsequences. The results show higher reliability and lower entropy for the
LZ-LCS algorithm than for the LZ algorithm. The number of subintervals used to characterize
the series is also varied to show its effect on the entropy and predictability values.
Introduction
Travel time reliability measures the extent of the expected delay. In 1950 Fano showed, with
the help of Fano's lemma, which is based on conditional probability, that predictability
depends on the entropy of the sequence. Entropy is therefore the most fundamental quantity
for calculating predictability. If the entropy of a travel time series is low, the reliability
value of the next element of the series will be high, and vice versa. If a data series follows
a pattern, its next element can be predicted with high reliability, i.e. it is easier to predict
than a data series that does not follow a pattern. This paper therefore studies the role of past
travel time data using the two compression-based algorithms.
[Comment NU1: Write about the importance of the title and the use of the title.]
[Comment NU2: What are the alternative theories behind it? Write in 60-80 words.]
[Comment NU3: Mention different studies in 100 words.]
To calculate the entropy, we first convert the travel time sequence into characters based on
a chosen travel time interval; for example, a travel time of 5-6 minutes is denoted by the
character "a", 6-7 minutes by "b", and so on. After characterizing the travel time series, we
can estimate the entropy with a data compression algorithm that follows the prefix compression
property, such as Lempel-Ziv or Huffman encoding.
Write about the purpose of the study and the structure of the paper.
Literature Review
Based on information theory, a solution proposed for calculating reliability uses LZ entropy
estimation, which is based on a compression algorithm. This solution calculates the entropy
from a time series discretized into characters. Suppose a road XY has X as its starting point
and Y as its ending point. Let the minimum travel time from X to Y be 10 minutes and the
maximum be 15 minutes. The travel time sequence for road XY is then bounded by the interval
[10, 15]. Suppose the travel time sequence for road XY is (10, 14, 13, 15, 14, …). After
characterizing this sequence we get (a, d, c, e, d, …). To calculate the upper bound of
predictability (UBP) from this travel time sequence, we first need to calculate the LZ entropy,
which is given by equation (1):
E = \left( \frac{1}{n} \sum_{i=1}^{n} \Lambda_i \right)^{-1} \log_e n … (1)
The main question is how to calculate Λ_i.
Calculation procedure of Λ_i using LZ

Assume that we have a simple character string X = "ABBCBC".
The steps for calculating Λ_i are as follows:
Step 1: The previous string Xpre is empty, so "A" has never appeared and Λ_1 = 1.
Step 2: The previous string Xpre is "A", in which the following "B" has never appeared, so Λ_2 = 1.
Step 3: The previous string Xpre is "AB", in which the following "B" has appeared but "BC" has
never appeared, so Λ_3 = 2.
Step 4: The previous string Xpre is "ABB", in which the following "C" has never appeared, so
Λ_4 = 1.
Step 5: The previous string Xpre is "ABBC", in which the following "B" and "BC" have appeared
but "BCB" has never appeared, so Λ_5 = 3.
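The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `lz_match_lengths` and `lz_entropy` are hypothetical, and when the whole remaining suffix has already appeared in the previous string, the match length is assumed to be the suffix length plus one (which reproduces the value 3 at Step 5). The entropy uses the natural logarithm, as equation (1) is written.

```python
import math


def lz_match_lengths(seq):
    """Return the match length for every position of seq.

    The match length at position i is the length of the shortest string
    starting at i that has not appeared in the previous string seq[:i].
    Assumption: if the whole remaining suffix has appeared before, the
    length is taken as len(suffix) + 1.
    """
    n = len(seq)
    lengths = []
    for i in range(n):
        previous = seq[:i]
        k = 1
        # Grow the candidate while it still occurs in the previous string
        # and does not run past the end of the sequence.
        while i + k <= n and seq[i:i + k] in previous:
            k += 1
        lengths.append(k)
    return lengths


def lz_entropy(seq):
    """Estimate entropy as in equation (1): (mean match length)^-1 * ln(n)."""
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")
    mean_length = sum(lz_match_lengths(seq)) / n
    return math.log(n) / mean_length


if __name__ == "__main__":
    print(lz_match_lengths("ABBCBC"))
    print(lz_entropy("ABBCBC"))
```

For the example string, the first five values returned by `lz_match_lengths("ABBCBC")` agree with Steps 1 to 5.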
The LZ entropy calculation scans the previous string only for exactly identical substrings;
for example, "abb" appears in the string "caabbc". LZ cannot exploit similarity that is not an
exact, contiguous match, and this leads to a loose entropy estimate. To exploit this property
of the travel time sequence, we introduce an algorithm named LZ-LCS, shown below.
Calculation procedure of Λ_i using LZ-LCS
Here the value of Λ_i is calculated from LCS(X[1…i], Y[1…j]), where X and Y hold two
strings of length i and j respectively.
Let C[i, j] be the length of an LCS of X[1…i] and Y[1…j].
C[i, j] can be computed as follows:
1: If i = 0 or j = 0, then C[i, j] = 0.
2: If i, j > 0 and Xi = Yj, then C[i, j] = C[i-1, j-1] + 1.
3: If i, j > 0 and Xi ≠ Yj, then C[i, j] = max{C[i-1, j], C[i, j-1]}.
The string similarity of a travel time sequence can be explored better using either the
longest common subsequence algorithm or the longest common substring algorithm. For this
travel time series we found in practice that both algorithms give the same result.
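The two string-similarity measures mentioned above can be sketched in Python as follows. This is an illustrative sketch of the standard dynamic-programming recurrences only; the function names are hypothetical, and the sketch does not show how the paper turns these lengths into the LZ-LCS match values, which the text does not specify.

```python
def lcs_length(x, y):
    """Length of the longest common subsequence of x and y."""
    m, n = len(x), len(y)
    # c[i][j] holds the LCS length of x[:i] and y[:j]; row and column 0 stay 0.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c[m][n]


def longest_common_substring_length(x, y):
    """Length of the longest contiguous string shared by x and y."""
    best = 0
    previous_row = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        current_row = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common run that ended at x[i-2], y[j-2].
                current_row[j] = previous_row[j - 1] + 1
                best = max(best, current_row[j])
        previous_row = current_row
    return best


if __name__ == "__main__":
    print(lcs_length("abcd", "acbd"))
    print(longest_common_substring_length("abcd", "acbd"))
```

The subsequence measure tolerates gaps between matching characters, while the substring measure requires a contiguous match, so the two can differ on general inputs even if they coincide on a particular data set.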
Methodology
Fano's inequality: An important measure of reliability is the probability that an appropriate
predictive algorithm can correctly estimate the next travel time. This quantity is subject to
Fano's inequality. That is, if the travel time sequence has entropy E and is characterized with
N intervals, then the reliability of the sequence satisfies R ≤ Rmax(E, N),
where E is given by equation (2):
E = H(R_{\max}) + (1 - R_{\max}) \log_2 (N - 1) … (2)
with the binary entropy function given by equation (3):
H(R_{\max}) = -R_{\max} \log_2 (R_{\max}) - (1 - R_{\max}) \log_2 (1 - R_{\max}) … (3)
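Equations (2) and (3) define Rmax only implicitly, so in practice they have to be solved numerically. The following Python sketch does this by bisection. It is an illustration under stated assumptions rather than the procedure used in the study: the function names are hypothetical, `n_symbols` stands for N, the entropy is assumed to be expressed in bits (matching the base-2 logarithms of equation (2)), and the search is restricted to Rmax between 1/N and 1, where the right-hand side of equation (2) decreases from log2(N) to 0.

```python
import math


def binary_entropy(p):
    """Binary entropy function of equation (3), in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def fano_rhs(r, n_symbols):
    """Right-hand side of equation (2) for a candidate Rmax value r."""
    return binary_entropy(r) + (1.0 - r) * math.log2(n_symbols - 1)


def max_reliability(entropy_bits, n_symbols, tol=1e-10):
    """Solve equation (2) for Rmax by bisection on [1/N, 1]."""
    if n_symbols < 2:
        raise ValueError("need at least two symbols")
    lo, hi = 1.0 / n_symbols, 1.0
    if entropy_bits <= 0.0:
        return 1.0  # zero entropy: the sequence is fully predictable
    if entropy_bits >= fano_rhs(lo, n_symbols):
        return lo   # entropy at or above log2(N): no better than guessing
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if fano_rhs(mid, n_symbols) > entropy_bits:
            lo = mid  # right-hand side still too large: root lies above mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    print(max_reliability(1.0, 4))
```

If the entropy has been computed with a natural logarithm, it would need to be divided by ln 2 before being passed to `max_reliability`.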
Suppose a characterized travel time sequence {a, b, c, a, c, b, d, …} is observed at some
interval n. The entropy E of this characterized series can be calculated from equation (1)
above. The difficulty is that the true entropy requires a series of infinite length, whereas
we have a series of fixed length. With a fixed-length series we cannot calculate the true
entropy; what we calculate is called the estimated entropy. This estimate has the property
that, as the length of the series increases towards infinity, it approaches the true entropy.

The procedure for the case study is as follows:
1. Convert the whole travel time series into a characterized series based on subintervals, as
explained in Appendix 1.
2. Calculate Λ_i using the LZ and LZ-LCS algorithms explained earlier.
3. Using the Λ_i values, calculate the entropy from equation (1) for different numbers of
subintervals in the characterized series.
4. Finally, calculate the reliability of the series for the different numbers of subintervals,
using the entropy calculated in the previous step and Fano's inequality as described in
equations (2) and (3).
[Comment NU5: What is new in this methodology, who developed it, and what are the unique points
of using this method? 100 words.]
[Comment NU7: Use of this s
Suitability of factors
Determination of criteria
How to assess the impact of each criterion; guideline
Flow charts
Different types of analysis
Steps and formulas used, integrated formulas
Steps
The entire methodology should be about 1500 words, stating the advantages of the methods using
references.]
Case study/ Experiment or Analysis/ Discussion

For our study we selected two locations as origin and destination where the preferred option
for commuting is a private vehicle, as public transport is less accessible and uncomfortable.
The travel time data between these locations were collected from the online Uber data [8].
Past data were collected for all Mondays from August 2017 to June 2019, so that the travel
time sequence has a more predictable pattern. Three time slots are selected for the present
study: AM peak (7am-10am), midday (10am-4pm) and PM peak (4pm-7pm).
[Comment NU8: Study area map and its characteristics in terms of traffic.]
The trend of travel time variation for all Mondays during the study period between the study
locations is shown in Figure 1 and Figure 2.

[Figure 1 placeholder: line chart titled "Tis Hazari to New Delhi Railway station". Y-axis: Travel Time (seconds), 0 to 3000. X-axis: Monday's Date, 08/07/2017 to 06/10/2019. Series: AM Mean Travel Time (7am-10am), Midday Mean Travel Time (10am-4pm), PM Mean Travel Time (4pm-7pm).]
Figure 1 Travel time trend on Monday for Tis Hazari to New Delhi Railway station
[Figure 2 placeholder: line chart titled "New Delhi Railway station to Tis Hazari". Y-axis: Travel Time (seconds), 0 to 4000. X-axis: Monday's Date, 08/07/2017 to 06/10/2019. Series: AM Mean Travel Time (7am-10am), Midday Mean Travel Time (10am-4pm), PM Mean Travel Time (4pm-7pm).]
Figure 2 Travel time trend on Monday for New Delhi Railway station to Tis Hazari
The travel time sequence data are then used to calculate entropy and reliability for the time
slots mentioned above. Entropy and reliability are estimated with the LZ and LZ-LCS algorithms
for the three time slots and for both directions of travel. The calculation is repeated for
different numbers of subintervals in the characterized series to study how entropy and
reliability vary with it; the results are presented in Figures 3, 4, 5 and 6.
[Figure 3 placeholder: chart titled "Entropy - New Delhi Railway station to Tis Hazari". Y-axis: Entropy, 0 to 2. X-axis: Characterize series based subinterval (2, 4, 6, 8). Series: AM slot LZ, PM slot LZ, Midday slot LZ, AM slot LZ-LCS, PM slot LZ-LCS, Midday slot LZ-LCS.]
Figure 3 Entropy - New Delhi Railway station to Tis Hazari
[Figure 4 placeholder: chart titled "Entropy - Tis Hazari to New Delhi Railway station". Y-axis: Entropy, 0 to 2. X-axis ticks: 2, 4, 6, 8. Series: AM Mean LZ, PM Mean LZ, Midday slot LZ, AM slot LZ-LCS, PM slot LZ-LCS, Midday slot LZ-LCS.]
Figure 4 Entropy - Tis Hazari to New Delhi Railway station

[Figure 5 placeholder: chart titled "Reliability - New Delhi Railway station to Tis Hazari". Y-axis: Reliability, 0.75 to 1. X-axis: Travel time interval division (2, 4, 6, 8). Series: AM slot LZ, PM slot LZ, Midday slot LZ, AM slot LZ-LCS, PM slot LZ-LCS, Midday slot LZ-LCS.]
Figure 5 Reliability - New Delhi Railway station to Tis Hazari
[Figure 6 placeholder: chart titled "Reliability - Tis Hazari to New Delhi Railway station". Y-axis: Reliability, 0.75 to 1. X-axis ticks: 2, 4, 6, 8. Series: AM Mean LZ, PM Mean LZ, Midday Mean LZ, AM Mean LZ-LCS, PM Mean LZ-LCS, Midday Mean LZ-LCS.]
Figure 6 Reliability - Tis Hazari to New Delhi Railway station
A significant decrease in entropy and increase in reliability can be seen for the LZ-LCS
method compared with the LZ method. As the number of travel time interval divisions increases,
the entropy values show an increasing trend while the predictability values show a decreasing
trend.
[Comment NU9: Results and discussion should be 700-800 words. Each parameter and analysis
should be commented on in detail, with reference to the results of other authors. How is the
title important and how is it computed? Influence of the different parameters. Tabulate
results; tables and pictures. Validation and verification of results. Suggestions.]
Conclusion
In this study we used both the LZ and LZ-LCS algorithms to calculate entropy and reliability.
The results show a decrease in entropy and an increase in predictability for the LZ-LCS
algorithm compared with the LZ algorithm. The variation of entropy and reliability with the
number of travel time interval divisions can also be seen from the results. LZ-LCS can be
considered better than LZ because it searches the travel time sequence for both substrings and
subsequences, whereas LZ searches only for exactly matching strings. From this research it can
be concluded how significant the previous data are for estimating the reliability of the next
sequence element.
[Comment NU12: What was the objective of the paper, how have we met it, what is unique about
the results/methods, and what is the general application of the title of the study?]

Future Scope
Calculating reliability requires a very long string to estimate the entropy and then the
predictability, so estimating the reliability value from a small amount of data remains a
difficult task for future work.
Appendix 1
Suppose we have a travel time series bounded by the interval [i, j], where i and j are the
lowest and highest travel times in the series respectively. If a series T_i with interval
[i, j] has n subintervals, then we can write each subinterval as follows:
1st subinterval: i to i + \left[\frac{j-i}{n}\right]
2nd subinterval: i + \left[\frac{j-i}{n}\right] to i + 2 \left[\frac{j-i}{n}\right]
Similarly, for the (n-1)th subinterval: i + (n-2) \left[\frac{j-i}{n}\right] to i + (n-1) \left[\frac{j-i}{n}\right]
And for the nth subinterval: i + (n-1) \left[\frac{j-i}{n}\right] to j
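The subinterval scheme above can be sketched in Python as follows. This is a minimal sketch under assumptions that the appendix does not state: the function name `characterize` is hypothetical, the bounds i and j are taken as the minimum and maximum of the observed series, each subinterval is treated as closed on the right (with the lower bound i assigned to the first subinterval), and the characters are the lowercase letters "a" to "z", which limits n to 26.

```python
import math
import string


def characterize(travel_times, n_subintervals):
    """Map each travel time to a letter according to its subinterval."""
    if not travel_times:
        raise ValueError("travel_times must not be empty")
    if not 1 <= n_subintervals <= 26:
        raise ValueError("n_subintervals must be between 1 and 26")
    lower, upper = min(travel_times), max(travel_times)
    if upper == lower:
        # All observations are equal: everything falls in the first subinterval.
        return "a" * len(travel_times)
    width = (upper - lower) / n_subintervals
    letters = []
    for t in travel_times:
        # Right-closed subintervals: a value on a boundary goes to the lower one.
        k = math.ceil((t - lower) / width) - 1
        k = min(max(k, 0), n_subintervals - 1)
        letters.append(string.ascii_lowercase[k])
    return "".join(letters)


if __name__ == "__main__":
    print(characterize([10, 14, 13, 15, 14], 5))
```

With these boundary assumptions, the sequence (10, 14, 13, 15, 14) on the interval [10, 15] with five subintervals maps to the characters a, d, c, e, d.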
References
[1] Fano, R. M., 1961. Transmission of Information. The MIT Press and Wiley, New York and London.
[2] Ziv, J., Lempel, A., 1977. A Universal Algorithm for Sequential Data Compression. IEEE
Press.
[3] Li, H., He, F., Lin, X., Wang, Y., Li, M., 2019. Travel time reliability measure based on
predictability using the Lempel–Ziv algorithm. Transp. Res. Part C: Emerg. Technol. 101, 161–180.
[4] Song, C., Qu, Z., Blumm, N., Barabási, A. L., 2010. Limits of predictability in human
mobility. Science 327 (5968), 1018–1021.
[5] Lu, X., Wetter, E., Bharti, N., Tatem, A.J., Bengtsson, L., 2013. Approaching the limit of predictability
in human mobility. Sci. Rep. 3 (10), 2923.
[6] Woodard, D., Nogin, G., et al., 2017. Predicting travel time reliability using mobile phone GPS data.
Transp. Res. Part C: Emerg. Technol. 75, 30–44.
[7] Shlayan, N., Kachroo, P., Wadoo, S., 2011. Transportation Reliability Based on Information Theory.
IEEE, pp. 1415–1420.
[8] https://movement.uber.com/?lang=hi-IN
[Comment NU13: Total paper length should be 4500-5000 words, with the above structure and word
budget as mentioned.]
