
Reliability measurement of travel time data for its future prediction using LZ and LZ-LCS algorithms


Abstract

A case study is carried out to measure the reliability of past travel time data between two locations in Delhi. Two compression-based algorithms are used for reliability estimation. The first is the Lempel–Ziv (LZ) algorithm. The second is a modification of it, the Lempel–Ziv–Longest Common Subsequence (LZ-LCS) algorithm. Both algorithms first estimate the entropy and then the reliability. LZ searches the sequence for exactly identical strings, whereas LZ-LCS searches for both substrings and subsequences. The results show an increase in reliability and a decrease in entropy with the LZ-LCS algorithm compared with the LZ algorithm. The number of intervals used to characterize the series is also varied to show its effect on entropy and predictability.

Introduction

Travel time reliability measures the extent of the expected delay. In 1950, Fano used Fano's lemma, which is based on conditional probability, to show that predictability depends on the entropy of the sequence. Entropy is therefore the most fundamental quantity for calculating predictability. If the entropy of a travel time series is low, the reliability of predicting its next element will be high, and vice versa. If a data series follows a pattern, its next element can be predicted with high reliability. Such a series is easier to predict than one that does not follow a pattern. To understand the role of past travel time data in predicting future travel times, this paper studies such data using two compression-based algorithms.
To calculate the entropy, we first convert the travel time sequence into characters based on travel time intervals. For example, a travel time of 5–6 minutes is denoted by the character "a", 6–7 minutes by "b", and so on. After characterizing the travel time series, we can estimate its entropy with a data compression algorithm that follows the prefix compression property, e.g., Lempel–Ziv or Huffman encoding.

Write about the purpose of the study and the structure of the paper

Literature Review

Based on information theory, a solution has been proposed for calculating reliability using LZ entropy estimation, which is based on a compression algorithm. To calculate the entropy, this solution discretizes the travel time series into a series of characters. Suppose a road XY has starting point X and ending point Y. Let the minimum travel time from X to Y be 10 minutes and the maximum 15 minutes. The travel time sequence for road XY is then bounded by the interval [10, 15]. Suppose the travel time sequence for road XY is (10, 14, 13, 15, 14, …). After characterizing this travel time sequence we get (a, d, c, e, d, …). To calculate the upper bound of predictability (UBP) from this travel time sequence, we first need to calculate the LZ entropy, which is given below:

$$E = \left(\frac{1}{n}\sum_{i=1}^{n}\Lambda_i\right)^{-1}\log_e n \qquad (1)$$

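Here n is the length of the characterized sequence. $\Lambda_i$ is the length of the shortest substring starting at position i that has not appeared in the previous string Xpre (positions 1 to i − 1). The key step is therefore the calculation of $\Lambda_i$.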

Calculation procedure of $\Lambda_i$ using LZ


Assume that we have a simple character string X = "ABBCBCB". The first five steps in calculating $\Lambda_i$ are as follows:

Step 1: The previous string Xpre is empty, so "A" has never appeared in it; hence $\Lambda_1 = 1$.

Step 2: The previous string Xpre is "A", in which the following "B" has never appeared, so $\Lambda_2 = 1$.

Step 3: The previous string Xpre is "AB", in which the following "B" has appeared but "BC" has never appeared, so $\Lambda_3 = 2$.

Step 4: The previous string Xpre is "ABB", in which the following "C" has never appeared, so $\Lambda_4 = 1$.

Step 5: The previous string Xpre is "ABBC", in which the following "B" and "BC" have appeared but "BCB" has never appeared, so $\Lambda_5 = 3$.
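The steps above, together with equation (1), can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes single-character symbols and that Xpre is searched for contiguous substrings only. It also makes an assumption the text does not state: when every candidate up to the end of the sequence has already appeared, $\Lambda_i$ is taken as the remaining length plus one. The names `lz_lambdas` and `lz_entropy` are hypothetical. The usage line runs on "ABBCBCB", whose first five values reproduce Steps 1–5; Step 5 needs the seventh character "B" to form "BCB".

```python
import math


def lz_lambdas(seq: str) -> list[int]:
    """Return Lambda_i for every position of a characterized sequence.

    Lambda_i is the length of the shortest substring starting at position i
    that does not occur as a contiguous substring of seq[:i] (Xpre).
    """
    n = len(seq)
    lambdas = []
    for i in range(n):
        prefix = seq[:i]  # Xpre: everything before position i
        k = 1
        # Grow the candidate seq[i:i+k] while it still occurs inside Xpre.
        while i + k <= n and seq[i:i + k] in prefix:
            k += 1
        # Assumed convention: if every candidate up to the end of seq occurs
        # in Xpre, the loop stops with k = (n - i) + 1.
        lambdas.append(k)
    return lambdas


def lz_entropy(seq: str) -> float:
    """Entropy estimate of equation (1): n * ln(n) / sum(Lambda_i)."""
    n = len(seq)
    if n < 2:
        raise ValueError("need at least two symbols")
    return n * math.log(n) / sum(lz_lambdas(seq))


print(lz_lambdas("ABBCBCB"))  # [1, 1, 2, 1, 3, 3, 2]
print(lz_entropy("ABBCBCB"))
```

The plain `in` test on Python strings matches the "exactly identical substring" search that LZ performs. It is quadratic per position, which is acceptable for sequences of a few hundred symbols such as weekly travel time series.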

LZ entropy calculation scans the complete preceding string only for exactly identical substrings. For example, it detects that "abb" has appeared in the string "caabbc". LZ cannot exploit similarity that exists only as a subsequence, that is, characters matching in the same order but not contiguously. This leads to a loose entropy estimate. To exploit this property of the travel time sequence, we introduce an algorithm named LZ-LCS, described below.

Calculation procedure of $\Lambda_i$ using LZ-LCS

Here the value of $\Lambda_i$ is calculated from LCS(X[1…i], Y[1…j]), where X and Y are two strings of length i and j, respectively.

Let C[i, j] be the length of an LCS of X[1…i] and Y[1…j]. C[i, j] can be computed as follows:

1: If i = 0 or j = 0, then C[i, j] = 0.

2: If i, j > 0 and X_i = Y_j, then C[i, j] = C[i−1, j−1] + 1.

3: If i, j > 0 and X_i ≠ Y_j, then C[i, j] = max{C[i−1, j], C[i, j−1]}.
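The three rules of the recurrence translate directly into the standard bottom-up dynamic program. The sketch below is an illustration; the function name `lcs_length` is hypothetical. Strings are 0-indexed in Python, so X_i corresponds to `x[i - 1]`.

```python
def lcs_length(x: str, y: str) -> int:
    """Length of a longest common subsequence of x and y, i.e. C[len(x), len(y)]."""
    m, n = len(x), len(y)
    # c[i][j] = LCS length of x[:i] and y[:j]; row 0 and column 0 stay 0 (rule 1).
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1            # rule 2
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])  # rule 3
    return c[m][n]


print(lcs_length("ABCBDAB", "BDCABA"))  # 4, e.g. "BCBA"
```

The table takes O(ij) time and memory. Only the length is needed for the comparisons in this section, so the table is not traced back to recover the subsequence itself.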

The string-similarity property of the travel time sequence can be explored more effectively with either the longest common subsequence algorithm or the longest common substring algorithm. For the travel time series studied here, we found in practice that both algorithms give the same result.
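The text above does not spell out exactly how $\Lambda_i$ is derived from C[i, j]. The following Python sketch therefore rests on one plausible reading, which is an assumption. A candidate substring starting at position i counts as "already seen" if it is a subsequence of Xpre, i.e. if LCS(candidate, Xpre) equals the candidate's length. The names `is_subsequence` and `lzlcs_lambdas` are hypothetical. The end-of-sequence convention is the same assumed one as in the LZ sketch.

```python
def is_subsequence(pattern: str, text: str) -> bool:
    """True if pattern occurs in text in order, gaps allowed.

    Equivalent to LCS(pattern, text) == len(pattern), but checked greedily.
    """
    remaining = iter(text)
    # `ch in remaining` advances the iterator past the first match of ch.
    return all(ch in remaining for ch in pattern)


def lzlcs_lambdas(seq: str) -> list[int]:
    """Hypothetical LZ-LCS variant of Lambda_i (subsequence match against Xpre)."""
    n = len(seq)
    lambdas = []
    for i in range(n):
        prefix = seq[:i]  # Xpre
        k = 1
        # A candidate is "seen" if it is a subsequence of Xpre; every
        # contiguous substring of Xpre is also a subsequence of it.
        while i + k <= n and is_subsequence(seq[i:i + k], prefix):
            k += 1
        lambdas.append(k)  # k = (n - i) + 1 if the end of seq is reached
    return lambdas


print(lzlcs_lambdas("ABCAC"))  # [1, 1, 1, 3, 2]; the plain LZ rule gives [1, 1, 1, 2, 2]
```

Under this reading, a subsequence match is never harder to find than a substring match. Each $\Lambda_i$ can therefore only stay the same or grow relative to LZ, and the entropy from equation (1) can only stay the same or fall. This matches the direction of the reported LZ-LCS results.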

Methodology

Fano's inequality: An important measure of reliability is the probability that an appropriate predictive algorithm correctly estimates the next travel time. This quantity is subject to Fano's inequality. If the characterized travel time sequence has entropy E and N distinct intervals (characters), then the reliability of the sequence satisfies R ≤ R_max(E, N), where R_max is the solution of equation (2):

$$E = H(R_{\max}) + (1 - R_{\max})\log_2(N - 1) \qquad (2)$$

With the binary entropy function given by equation (3)

$$H(R_{\max}) = -R_{\max}\log_2 R_{\max} - (1 - R_{\max})\log_2(1 - R_{\max}) \qquad (3)$$
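Equation (2) has no closed-form solution for $R_{\max}$, so it is usually solved numerically. The Python sketch below uses bisection; this is a choice made for illustration, since the text does not say how (2) was solved. The function names are hypothetical. The sketch assumes E is expressed in bits, matching the base-2 logarithms in (2) and (3). An entropy computed with the natural logarithm, as in equation (1), would first be divided by ln 2. On the interval [1/N, 1] the right-hand side of (2) decreases monotonically from log2 N to 0, so the root there is unique.

```python
import math


def binary_entropy(p: float) -> float:
    """H(p) of equation (3), in bits; H(0) = H(1) = 0 by continuity."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def fano_rhs(r: float, n_symbols: int) -> float:
    """Right-hand side of equation (2) for a candidate R_max = r."""
    return binary_entropy(r) + (1 - r) * math.log2(n_symbols - 1)


def solve_r_max(entropy_bits: float, n_symbols: int, tol: float = 1e-12) -> float:
    """Solve equation (2) for R_max by bisection on [1/N, 1]."""
    if n_symbols < 2:
        raise ValueError("need at least two symbols")
    lo, hi = 1.0 / n_symbols, 1.0
    if entropy_bits >= math.log2(n_symbols):
        return lo  # maximal entropy: no better than a uniform guess
    if entropy_bits <= 0.0:
        return hi  # zero entropy: perfectly predictable
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fano_rhs(mid, n_symbols) > entropy_bits:
            lo = mid  # right-hand side still too large: root lies at larger R
        else:
            hi = mid
    return 0.5 * (lo + hi)


r = solve_r_max(1.2, 6)
print(r, fano_rhs(r, 6))  # the second value reproduces E = 1.2 up to the tolerance
```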

If a characterized travel time sequence {a, b, c, a, c, b, d, …} of length n is observed, its entropy E can be calculated from equation (1) above. The difficulty is that the true entropy is defined for an infinitely long series, whereas we have a series of fixed length. A fixed-length series does not allow us to calculate the real entropy; what we calculate is called the estimated entropy. The estimated entropy approaches the real entropy as the length of the series tends to infinity.


The procedure for the case study is as follows:

1. Convert the whole travel time series into a characterized series based on the subintervals explained in Appendix 1.

2. Calculate $\Lambda_i$ using the LZ and LZ-LCS algorithms explained earlier.

3. Using the $\Lambda_i$ values, calculate the entropy from equation (1) for different numbers of subintervals used to characterize the series.

4. Finally, calculate the reliability of the series for each number of subintervals. Use the entropy obtained in step 3 and Fano's inequality as described in equations (2) and (3).


For our study, we selected two locations, Tis Hazari and New Delhi Railway station, as origin and destination. The preferable commuting option between them is a private vehicle, since public transport is less accessible and uncomfortable. The travel time data between these locations were collected from the publicly available Uber Movement data [8]. Past data were collected for all Mondays from August 2017 to June 2019 to obtain a more predictable pattern in the travel time sequence. Three time slots were selected for the present study: AM peak (7am–10am), midday (10am–4pm) and PM peak (4pm–7pm).

The trend of travel time variation on all Mondays during the study period between the study locations is shown in Figure 1 and Figure 2.


[Chart: Tis Hazari to New Delhi Railway station. Travel time (seconds, 0–3000) versus Monday's date, 08/07/2017 to 06/10/2019. Series: AM mean travel time (7am–10am), midday mean travel time (10am–4pm), PM mean travel time (4pm–7pm).]

Figure 1 Travel time trend on Mondays, Tis Hazari to New Delhi Railway station

[Chart: New Delhi Railway station to Tis Hazari. Travel time (seconds, 0–4000) versus Monday's date, 08/07/2017 to 06/10/2019. Series: AM mean travel time (7am–10am), midday mean travel time (10am–4pm), PM mean travel time (4pm–7pm).]

Figure 2 Travel time trend on Mondays, New Delhi Railway station to Tis Hazari

The travel time sequence data are then used to calculate entropy and reliability for the time slots mentioned above. Entropy and reliability are estimated with the LZ and LZ-LCS algorithms for the three time slots and for both directions of travel. The calculation is repeated for different numbers of subintervals used to characterize the series, to study how entropy and reliability vary with this number. The results are presented in Figures 3, 4, 5 and 6.

[Chart: Entropy - New Delhi Railway station to Tis Hazari. Entropy (0–2) versus number of subintervals used to characterize the series (2–8). Series: AM, PM and midday slots, each with LZ and LZ-LCS.]

Figure 3 Entropy - New Delhi Railway station to Tis Hazari

[Chart: Entropy - Tis Hazari to New Delhi Railway station. Entropy (0–2) versus number of subintervals used to characterize the series (2–8). Series: AM, PM and midday slots, each with LZ and LZ-LCS.]

Figure 4 Entropy - Tis Hazari to New Delhi Railway station


[Chart: Reliability - New Delhi Railway station to Tis Hazari. Reliability (0.75–1) versus travel time interval division (2–8). Series: AM, PM and midday slots, each with LZ and LZ-LCS.]

Figure 5 Reliability - New Delhi Railway station to Tis Hazari

[Chart: Reliability - Tis Hazari to New Delhi Railway station. Reliability (0.75–1) versus number of subintervals used to characterize the series (2–8). Series: AM, PM and midday means, each with LZ and LZ-LCS.]

Figure 6 Reliability - Tis Hazari to New Delhi Railway station

A marked decrease in entropy and increase in reliability can be seen for the LZ-LCS method compared with the LZ method. As the number of travel time interval divisions increases, the entropy values show an increasing trend, while the predictability values show a decreasing trend.
Conclusion

In this study, we used both the LZ and LZ-LCS algorithms to calculate entropy and reliability. The results show that the LZ-LCS algorithm gives lower entropy and higher predictability than the LZ algorithm. The results also show how entropy and reliability vary with the number of travel time interval divisions. LZ-LCS can be considered better than LZ because it searches the travel time sequence for both substrings and subsequences, whereas LZ searches only for exactly identical strings. The study thus indicates how significant previous data are for estimating the reliability of the next element of the sequence.

Future Scope

Calculating reliability requires a very long string to estimate the entropy and then the predictability. Estimating reliability from a smaller amount of data will therefore remain a difficult task for future work.

Appendix 1

Suppose we have a travel time series with interval [i, j], where i and j are the lower and upper travel times in the series, respectively. If a series T with interval [i, j] has n subintervals, each subinterval can be written as follows:

1st subinterval: $i$ to $i + \left[\frac{j-i}{n}\right]$

2nd subinterval: $i + \left[\frac{j-i}{n}\right]$ to $i + 2\left[\frac{j-i}{n}\right]$

Similarly, (n−1)th subinterval: $i + (n-2)\left[\frac{j-i}{n}\right]$ to $i + (n-1)\left[\frac{j-i}{n}\right]$

And nth subinterval: $i + (n-1)\left[\frac{j-i}{n}\right]$ to $j$
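The sub-interval rule can be sketched in Python as a function that maps each travel time to a letter. It is an illustration under stated assumptions; the name `characterize` is hypothetical. First, it treats $\left[\frac{j-i}{n}\right]$ as the exact real-valued width; if the brackets denote the integer part, the width calculation would need a floor. Second, a value lying on a shared boundary is assigned to the lower sub-interval. This is inferred from the Literature Review example, where (10, 14, 13, 15, 14) on [10, 15] becomes (a, d, c, e, d). Third, at most 26 sub-intervals are supported, one per lowercase letter.

```python
import math
from string import ascii_lowercase


def characterize(times, lower, upper, n):
    """Map travel times in [lower, upper] to letters, one per sub-interval.

    Sub-interval k (0-based) runs from lower + k*w to lower + (k+1)*w with
    w = (upper - lower) / n; a value on a shared boundary goes to the lower
    sub-interval, and `lower` itself goes to the first one.
    """
    if not 1 <= n <= len(ascii_lowercase):
        raise ValueError("n must be between 1 and 26")
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    symbols = []
    for t in times:
        if not lower <= t <= upper:
            raise ValueError(f"travel time {t} lies outside [{lower}, {upper}]")
        # Position of t in units of the sub-interval width, then a right-closed bin.
        k = math.ceil((t - lower) * n / (upper - lower)) - 1
        # Clamp: t == lower gives -1, and round-off at t == upper could exceed n - 1.
        k = min(max(k, 0), n - 1)
        symbols.append(ascii_lowercase[k])
    return "".join(symbols)


print(characterize([10, 14, 13, 15, 14], lower=10, upper=15, n=5))  # adced
```

Varying `n` in this function corresponds to varying the number of subintervals on the horizontal axis of Figures 3 to 6.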

References

[1] Fano, R. M. (1961). Transmission of Information. The MIT Press and Wiley, New York and London.

[2] Ziv, J., & Lempel, A. (1977). A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, 23(3), 337–343.

[3] Li, H., He, F., Lin, X., Wang, Y., & Li, M. (2019). Travel time reliability measure based on predictability using the Lempel–Ziv algorithm. Transportation Research Part C: Emerging Technologies, 101, 161–180.

[4] Song, C., Qu, Z., Blumm, N., & Barabási, A. L. (2010). Limits of predictability in human mobility. Science, 327(5968), 1018–1021.

[5] Lu, X., Wetter, E., Bharti, N., Tatem, A. J., & Bengtsson, L. (2013). Approaching the limit of predictability in human mobility. Scientific Reports, 3(10), 2923.

[6] Woodard, D., Nogin, G., et al. (2017). Predicting travel time reliability using mobile phone GPS data. Transportation Research Part C: Emerging Technologies, 75, 30–44.

[7] Shlayan, N., Kachroo, P., & Wadoo, S. (2011). Transportation reliability based on information theory. IEEE, pp. 1415–1420.

[8] Uber Movement. https://movement.uber.com/?lang=hi-IN


