Route Choice Behavior Analysis Using Automatic Fare Collection Data

Sun and Xu

School of Transportation Engineering, Tongji University, 4800 Cao’an Road, Shanghai 201804, China. Corresponding author: Y. Sun, gexingba@gmail.com.

Transportation Research Record: Journal of the Transportation Research Board, No. 2275, Transportation Research Board of the National Academies, Washington, D.C., 2012, pp. 58–67.
DOI: 10.3141/2275-07

Applications of automatic fare collection data were investigated, with a focus on analysis of travel time reliability and estimation of passenger route choice behavior. Beijing Metro was used as a case study. A rail journey was decomposed, and each component was studied with regard to the uncertainties involved. Methods were then designed and validated to infer platform elapsed time (PET) for through stations and platform elapsed time–transfer (PET-Trans) for transfer stations by using smart card transactional data, train schedules, and complementary manual surveys. With this information, the journey time distribution of any path can be established, and methods were proposed for inferring route choice proportions. After data preparation, the methods were applied to two typical origins and destinations from the Beijing Metro. Key values concerning travel time reliability, such as PET, PET-Trans, travelers left behind (unable to board), and path coefficients, were obtained and interpreted in detail. The outcome of this research could facilitate analysis of transit service reliability and passenger flow assignment in daily operations.

Transit operators and researchers find that they must be aware of how systems perform and how travelers behave to make informed decisions. Because of the labor and costs involved and the limited number of subjects per sample, manual survey data were not readily available. In addition, the quality of manually collected data was not good enough to identify travelers’ preferences and to locate the limiting segments that hinder the improvement of system services. Automatic fare collection (AFC) has provided transit agencies with huge amounts of operational data, which are widely recognized as having the potential to serve functions beyond their designated purpose of revenue management.

Travel time is a primary factor affecting travel behavior; however, travel time itself depends on the interaction between passengers’ route choice behavior and train operations. Although transit service reliability is a growing issue, there are not enough practical ways to analyze it. Since travel time is assumed to be a good indicator of transit service performance, travel time reliability analysis is the first subject discussed in this paper. Route choice behavior has been studied extensively; however, conventional models (e.g., the discrete choice model) depend heavily on behavioral assumptions and lack reliable supporting data. A novel approach that estimates choice behavior from its outcome, rather than predicting it from its determinants, is needed. This is the second subject of discussion in the paper.

Previous Studies

Studies of smart card fare payment data can be categorized into three groups (1). Those at the strategic level mainly comprise long-term network planning, customer behavior analysis, and demand forecasting. At the tactical level, schedule adjustment and longitudinal and individual trip patterns are studied. Ridership statistics and performance indicators belong to the operational level. Most previous studies focused on passenger trip origin–destination (O-D) estimation. Some had entry-only boarding location records, others had records of both entry and exit, and others had neither (2–5). Another research priority was transfer pattern analysis (6). Other studies analyzed daily patterns of ridership and longitudinal variations of individual passengers’ choice behavior (7).

In terms of modes, rail transit only, buses, and multimodal systems have all been explored for different purposes. Almost all studies agree that transferring is a critical component of transit systems and that transfer conditions affect the system’s attractiveness and travelers’ choice behavior (8). However, compared with transfers between modes, transfers within the closed rail transit network have been largely ignored (5).

Several studies have analyzed multiple rail systems. In 2007, Chan (3) developed two applications of Oyster card data for the London Underground. One was to estimate an O-D flow matrix; the other was to construct rail service reliability metrics. The excess journey time metric and the journey time reliability metric were effective in evaluating service reliability. The work was impressive because it was the first attempt to measure service delivery quality by using elapsed travel time. However, as noted in the concluding section of that thesis, several limitations remain. For example, the excess time methodology “cannot break the total journey time into components” (3); thus, detailed analysis and description of each separate stage of the trip are impossible. Another limitation is that the research did not address O-D pairs with transfers, although more than 40% of O-D pairs in the London Underground have one or more transfers. In addition, the research adopted a fixed walking speed without considering the variance involved.

Another important work was that of Kusakabe et al., whose main objective was “to develop a methodology for estimating which train is boarded by each smart card holder” (9). The methodology, based on
assumptions about passenger behavior, worked well in determining a passenger’s train boarding plan in most cases. The remaining small percentage of undetermined transaction records was assigned to all possible plans with equal probability. Some derived data, such as waiting times and transfer times, were also analyzed. However, the methodology had certain deficiencies. For example, walking time from the entry gate to the platform was ignored, and thus some boarding plans could have been estimated wrongly; as the authors admitted, “it seems difficult for him/her to reach the platform from the entry gate in such a short period” (9). Another deficiency was that equal probability was assumed for the undetermined records; that is, passengers’ preferences were not fully studied.

Zhao et al. (10) and Chan (3) studied passenger route choice behavior. Both studies used TransCAD to assign inferred O-D matrices onto the network. Zhao et al. used logit and mixed logit models to examine passenger behavior, and Chan aimed to calculate link loads and interchange volumes. However, assumptions about travelers’ path preferences (e.g., utility functions) were not properly verified because of a lack of data.

In summary, prior studies did not decompose journeys into separate stages, so no stage was examined in detail. Walking inside stations and between lines was simplified, and its high variance was not taken into consideration. Crowding was not addressed explicitly. Analysis of path choice behavior had no data support.

In addition, rapidly expanding cities with emerging economies, such as Beijing, are confronted with unique situations, and new approaches are needed to meet more demanding requirements.

Concerns of Beijing Metro

In the case of the Beijing Metro, the AFC system was adopted around the time of the 2008 Olympics. Thanks to this relatively late adoption, the system uses more advanced technology, and so more accurate and reliable entry and exit data are obtained. In comparison, the AFC system in New York City was in effect in the 1990s, and its entry-only transaction times were “truncated to tenth of an hour” (4). Thus, no effort to infer the alighting station or to estimate O-Ds is necessary for Beijing.

However, Beijing Metro has unique concerns:

1. The network is run cooperatively by Beijing Mass Transit Railway Operation Corporation and Beijing MTR Corporation, which requires revenue allocation. However, a passenger’s path choice cannot be determined directly from the transaction record, since a passenger swipes the card only at the entry and at the exit, without any transfer information. Thus, there is a need to mine transactional data to determine passengers’ chosen routes.
2. Straightforward and effective methods are necessary for identifying service reliability and capacity constraints.
3. The current network, comprising 14 lines, cannot adequately meet the city’s transit demands. Some lines have been overloaded, and the network has set a single-day record of 6.82 million rides (11). The effect of crowding on passenger behavior cannot be neglected.
4. The network has the world’s fourth-greatest track length and will be extended to 660 km by 2015 and 1,000 km by 2020. This dramatic expansion will enable more flexible and complex journeys within the system, because more transfer stations will provide better connecting services. As a result, route choice behavior will become increasingly complicated.
5. Most lines run at headways as short as about 2 to 3 min during a.m. peaks: 136 s for Line 1, 150 s for Line 2, and 170 s for Line 5. Walking times in stations and between lines add uncertainty to the system, because passengers with different walking speeds may board different trains. This requires decomposing the journey so that it can be analyzed at a finer resolution.

Other megacities in highly populated areas, especially East Asian cities such as Seoul, South Korea, and Shanghai and Guangzhou, China, are confronted with similar issues.

In summary, few efforts have been devoted to travel time reliability analysis with smart card data, although AFC has been in place for years. There is an urgent need to assign passenger flow and allocate revenues in multioperator networks. Traditional route choice methods have not been effective in addressing such concerns as crowding in rapidly growing transit networks.

This paper takes an alternative approach to travel time reliability analysis and passenger route choice estimation with smart card data. A discussion of the methodology is followed by a case study of Beijing Metro. The case study demonstrates the ease and feasibility of applying the proposed methods to measuring system service performance and assigning passenger flow.

Formulation, Methodology, and Validation

The basic philosophy of the following studies is that journey time depends on the interaction between passenger routes and train operation. If the interaction mechanism can be understood, journey times can reveal what happens inside the closed, black-box-like rail transit system.

To begin with, a general journey can be described as follows: a passenger taps his or her smart card at the entry gate, walks through the access passageway to the platform, and starts to wait for the desired train. When a train with enough capacity to accommodate the waiting passengers arrives, the passenger rides it to the destination platform. After that, the passenger walks to the exit gates and taps the smart card again to finish the journey. An additional transfer stage has to be included if a transfer between lines occurs. During this journey, the AFC system records the transactions at the entry and at the exit. These transaction time stamps are of interest in this paper.

The total journey time can generally be divided into several components: access time (ACT) from the origin entrance to the platform, platform wait time (PWT), on-train time (OTT), interchange time between line platforms (ICT), interchange wait time (IWT), and egress time (EGT) from the final platform to the destination exit.

Assumptions

Before detailed analyses of individual components are begun, three relevant assumptions about passengers and train operations are made:

• Punctuality assumption: Trains are punctual according to the schedule. This assumption is reasonable because all metro trains in Beijing run in automatic train operation mode and dwell times are fixed.
• Poisson arrival assumption: Passengers arrive at the metro station in accordance with a Poisson process. Random arrivals are widely
recognized on the condition that the headway is “short enough” (12, 13). Researchers have proposed various criteria for defining “short enough.” For example, Seddon and Day (12) recommended 10 to 12 min on the basis of empirical research, and Fan and Machemehl (13) identified 10 min as the transition from random to nonrandom arrival with a mathematical model. Poisson arrivals at Beijing Metro stations are therefore justified, since all headways are below 10 min and some are as short as about 2 min.
• Representativeness assumption: Several high-volume O-Ds from the same origin are representative enough to analyze that origin. This is a practical assumption, because it is impractical to aggregate all journeys originating from a station to analyze the origin (e.g., to assess the crowding level on the platform). For example, 152,400 journeys between 169 O-D pairs originated from Tiantongyuan from April 4 to April 10, 2011. Considering all 169 O-D pairs would be burdensome and is not necessary. Because volumes are concentrated, the top five O-D pairs by volume account for 14% of the 152,400 journeys; the top 10, 24%; and the top 20, 38%. Those O-D pairs are therefore representative enough. Sampling theory holds that a subset of individuals from a population can yield knowledge of the whole (14).

Formulation

The first component under consideration, OTT, is formulated as a fixed value according to the punctuality assumption.

ACT, ICT, and EGT fall into the same category, since they are closely related to walking speeds, the layout of pedestrian facilities inside the station, activities during the walk, crowding level, and so forth. Because these factors are difficult to describe and analyze quantitatively, manual surveys are necessary for identifying their distributions. Survey staff collected data by following randomly selected passengers until they arrived at the platform, under circumstances in which walking was not impeded. To reflect the impact of the factors mentioned above, all scenarios, such as different physical conditions of passengers, peak and off-peak hours, and every possible walking route, were covered. All samples were then grouped by station, by direction (e.g., southbound and northbound), and by time band (e.g., a.m. peak and off peak). Comparing the maximum and minimum values with the median revealed asymmetry. The gamma distribution was therefore selected over the more commonly used normal distribution to describe ACT, ICT, and EGT, because its probability density curve has a long right tail and a clear left endpoint at the minimum. Kolmogorov–Smirnov (K-S) test results confirm this choice. Since a two-parameter gamma variable (α, shape parameter; β, scale parameter) has a minimum of zero, a positive location parameter γ was introduced to represent the minimum walking time.

PWT and IWT constitute another group. PWT is analyzed first. According to the Poisson arrival assumption, passengers arrive at the metro station according to a Poisson process {N(t), t ≥ 0} with N(0) = 0. A train leaving at time t takes away all n passengers who have arrived during the interval [0, t], that is, since the previous departure at time 0. According to the conditional distribution of arrival times (15), given N(t) = n, the n arrival times, considered in random order, are independent and uniformly distributed on (0, t); thus the arrival time Si of a randomly chosen passenger i satisfies Si ∼ U(0, t). If waiting time is defined as Wi = t − Si, then Wi ∼ U(0, t), i = 1, 2, . . . , n. So under the assumptions of Poisson arrivals and train punctuality, PWT is a uniformly distributed random variable ranging from 0 to the headway.

IWT is discussed next. Arriving transfer passengers are unlikely to follow Poisson arrivals, since they alight almost simultaneously; batch arrivals of passengers at nearly uniform intervals are more realistic. However, all Beijing Metro routes operate according to their own schedules without transfer coordination. Thus, the arrival time of a passenger on the first route is independent of the train on the second route to which the passenger is transferring. In other words, the difference between the train’s arrival time on the first line and the train’s departure time on the second line varies from zero to a full headway. Then, the difference between the batch’s average arrival time at the second platform and the train’s departure time on the second line ranges from zero to one headway of the second line. That is, IWT is not uniformly distributed within one batch of arrivals, but it is uniformly distributed over many batches (“one batch of batches”).

However, real-life conditions are more complicated. The above analysis assumes that a train always takes all waiting passengers away. If a train lacks the capacity to accommodate all passengers waiting to board, some passengers will have to wait for another full headway. This phenomenon, known as “travelers left behind,” is frequent, especially during a.m. peaks. Thus, an additional variable is necessary to account for the extra time. Field observations showed that boarding was not first come, first served in practice, mainly because of crowding and a lack of order. Furthermore, a typical train has six or eight cars with five doors on each side of a car, so 30 or 40 doors are available for boarding simultaneously, and a wise choice of door can also lead to earlier boarding. Therefore, random service is more realistic.

A variable describing the additional waiting time can then be defined. Each boarding attempt is an independent trial with probability of success p, and k is the number of failed attempts before a successful boarding. In this formulation, k follows the geometric distribution; that is, k ∼ Geo(p), with P(k) = (1 − p)^k p for k = 0, 1, 2, . . . . Therefore, the additional waiting time caused by travelers left behind can be expressed as kT, where T is the headway. A more generalized PWT, called platform elapsed time (PET), is the sum of a uniform variable and the headway multiplied by a geometric variable: PET = PWT + kT.

Whether the various time components are mutually independent is another issue of interest. Those who doubt mutual independence may believe that PWT depends on ACT, probably because they observe that a person with a higher walking speed tends to catch an earlier train, and thus spends less time waiting, than another person with a lower walking speed, even though both went through the entrance gate at the same time. In fact, it is equally possible that the two board the same train. In that case, the one with the shorter ACT will wait longer and thus have a larger PWT. Thus, under the punctuality assumption and the Poisson arrival assumption, PWT depends only on the scheduled headways. Note that only an intuitive interpretation has been provided here; simulations under different ACT scenarios provide direct and convenient verification.

In summary, a typical journey is decomposed into several components, each of which is studied and represented by an appropriate random variable. The individual components are independent of each other.

Methodology and Validation

Next, methods for identifying key variables on the basis of knowledge of rail journeys are proposed. The methods will answer the following questions:

1. How much time do passengers spend at the origin station (PET) for those O-Ds without transfers?
2. How much time do passengers spend at the interchange station [platform elapsed time–transfer (PET-Trans)] for those O-Ds with transfers?
3. In what proportion do passengers with the same O-D choose each possible path?

The basic approaches are as follows:

1. For those O-Ds without transfers, ACT, OTT, and EGT are known, and PET can be deduced from total journey times.
2. If there are transfers, ACT, PET (deduced from other O-Ds without transfers), OTT, ICT, and EGT are known, and PET-Trans (IWT plus the extra waiting of travelers left behind) can be inferred in the same manner.
3. After the PET and PET-Trans inferences, all time components of a path are known, and the path’s journey time distribution is established by summing all those variables. If the overall journey times of a multipath O-D are regarded as a mixture of the individual paths’ distributions, the coefficients, or proportions, can be estimated.

The methods are described and then tested by using so-called blind experiments. In such an experiment, “certain information which could introduce bias or otherwise skew the result is withheld from the participants, but the experimenter will be in full possession of the facts” (16).

In this research, the blind experiment is as follows: an experimenter generates train schedules and artificial journeys with all variables included and thus has full knowledge of the experiment. The experimenter then reveals partial information to an observer. If the observer can infer the withheld information by applying a method to the revealed information, the method passes the blind test and is considered feasible.

Experiment 1. PET Inference

The train schedule is as follows: the first train leaves at 25,016 s and the last at 30,456 s, with a fixed headway of 170 s during this period. The OTT of each train run is fixed at 1,032 s. The steps are as follows:

1. Generate a sequence of Poisson arrival times Si at the entry in the interval [25016, 30116] with intensity λ = 2.
2. For passenger i, generate ACTi ∼ Gam(2, 15, 35) and add it to Si to get Pi (i.e., the time of arrival at the platform).
3. Compare Pi with the timetable to find the first available train that passenger i can board.
4. Generate k ∼ Geo(0.8) to reflect travelers left behind; the train actually boarded by passenger i is thus determined.
5. Search the timetable to find the alighting time at the destination platform, denoted as Qi.
6. Generate EGTi ∼ Gam(3, 10, 10) and add it to Qi to get the time at which the journey finishes, denoted as Ti.
7. Obtain the journey time by subtracting Si from Ti.

After the experiment, the experimenter presents all train schedules, journey times, and some samples (sampling rate 10%) from the ACT and EGT populations to the observer, who must derive PET, which is equivalent to PWT + kT.

The following steps show how the observer uses the method to infer the masked variables. First, the ACT and EGT samples are used to estimate the gamma distribution parameters. The results are denoted as Gam_OBS(α_act, β_act, γ_act) and Gam_OBS(α_egt, β_egt, γ_egt), the ACT and EGT distributions, respectively, inferred by the observer (OBS). Thus, the total journey time can be written as follows:

$$\mathrm{ACT}_{\mathrm{OBS}} + \left(\mathrm{PWT} + k_{\mathrm{OBS}}\,T\right) + \mathrm{OTT} + \mathrm{EGT}_{\mathrm{OBS}} = T_i - S_i$$

where ACT_OBS ∼ Gam_OBS(α_act, β_act, γ_act), EGT_OBS ∼ Gam_OBS(α_egt, β_egt, γ_egt), T is the headway (170 s), and Ti − Si is the journey time. Because the time components are independent, the following equations are established:

$$E\left(\mathrm{ACT}_{\mathrm{OBS}}\right) + E\left(\mathrm{PWT} + k_{\mathrm{OBS}}\,T\right) + \mathrm{OTT} + E\left(\mathrm{EGT}_{\mathrm{OBS}}\right) = E\left(T_i - S_i\right)$$

$$\mathrm{Var}\left(\mathrm{ACT}_{\mathrm{OBS}}\right) + \mathrm{Var}\left(\mathrm{PWT} + k_{\mathrm{OBS}}\,T\right) + 0 + \mathrm{Var}\left(\mathrm{EGT}_{\mathrm{OBS}}\right) = \mathrm{Var}\left(T_i - S_i\right)$$

Since PWT ∼ U(0, T) is independent of k, E(PWT + kT) = T/2 + T·E(k) and Var(PWT + kT) = T²/12 + T²·Var(k). It is therefore easy to find E(k) by solving the first equation and Var(k) by solving the second. Since k has been formulated as a geometric variable, with E(k) = (1 − p)/p and Var(k) = (1 − p)/p², the parameter p can be derived from either the expectation or the variance, and the two estimates should be equal in theory. In practice, 0.7989 and 0.7886 are obtained. Finally, all inferences by the observer are compared with what the experimenter possesses. The comparison is given in Table 1.

TABLE 1 Comparison of Experimenter and Observer Values, Experiment 1 (PET Inference)

Journey Component   Definition           Parameter   THR      EPR       OBS
ACT                 ACT ∼ Gam(α, β, γ)   β           15       14.811    15.023
                                         γ           35       34.707    35.114
PET                 PWT ∼ U(a, b)        a           0        0.05      0
                                         b           170      169.88    170
                    k ∼ Geo(p)           p           0.8      0.7986    0.7937
OTT                 OTT                  OTT         1,032    1,032     1,032
EGT                 EGT ∼ Gam(α, β, γ)   α           3        3.0159    3.0217
                                         β           10       9.9179    9.8849
                                         γ           10       10.246    10.097

Note: THR = in theory; EPR = by experimenter; OBS = by observer. Bold numbers are directly revealed, italic numbers are estimated by the observer from samples prepared by the experimenter.

In the table, bolded numbers are revealed directly, italicized numbers are estimated by the observer from samples presented by the experimenter, and the estimation of parameter p is the average
of two estimates from the expectation and variance properties. The relative error of the estimate of p is clearly less than 1% in this experiment and is thus acceptable. By combining PWT and kT, the distribution of PET can be derived, and therefore how much time is spent in the origin station is known.

In summary, the proposed method is effective in estimating PET; PET for each station can be derived from representative O-Ds originating from the station (see the representativeness assumption).

Experiment 2. PET-Trans Inference

The purpose of this experiment is to test the method designed to answer Question 2: How much time do passengers spend at the transfer station platform (PET-Trans)? The PET inferred in Experiment 1 is an input for inferring PET-Trans. The method discussed in Experiment 1 can be applied to find two estimates of PET-Trans from the expectation and variance equations, since PET-Trans is the only unknown variable.

[…] of the data is the same as for Experiment 1. Evidently, PET-Trans is inferred with satisfactory accuracy. The result demonstrates the robustness of the inference method, in the sense that slight estimation errors in the ACT or EGT parameters (e.g., observer estimates of 13.749 and 10.996 for the ACT and EGT scale parameters β, against the experimenter’s 15.216 and 10.024) do not affect the estimation result.

TABLE 2 Comparison of Experimenter and Observer Values, Experiment 2 (PET-Trans Inference)

Journey Component   Definition           Parameter   THR      EPR       OBS
ACT                 ACT ∼ Gam(α, β, γ)   β           15       15.216    13.749
                                         γ           35       35.005    35.392
PET                 PWT ∼ U(a, b)        a           0        0.05      0
                                         b           170      169.78    170
                    k ∼ Geo(p)           p           0.8      0.813     0.811
OTT                 OTT                  OTT         1,032    1,032     1,032
ICT                 ICT ∼ Gam(α, β, γ)   α           4        4.2421    3.8343
                                         β           12       11.602    11.843
                                         γ           70       68.787    71.721
PET-Trans           IWT ∼ U(a, b)        a           0        0.02      0
                                         b           200      199.96    200
                    k ∼ Geo(p)           p           0.75     0.746     0.742
OTT                 OTT                  OTT         836      836       836
EGT                 EGT ∼ Gam(α, β, γ)   α           3        2.9676    2.8939
                                         β           10       10.024    10.996
                                         γ           10       9.9806    9.5797

Experiment 3. Coefficient of Path (Route Choice) Inference

With the inferred PET and PET-Trans and other known variables, the distribution of individual path journey times can be established by summing all those independent variables. The next step is to derive path coefficients for multipath O-Ds.

Many works have discussed how to predict path choice behavior before trips, for example, by using the discrete choice model. However, this experiment takes a new approach: it tries to infer path choice behavior after all journeys have occurred, by using the outcome of the choice behavior.

A new formulation of passenger path choice is discussed first. Suppose that there are n possible paths from the same origin to the destination. Each journey can be regarded as one sample from one of n subpopulations. The probability density of all journeys mixed together is thus the mixture density of n component distributions:

$$f(x) = \sum_{i=1}^{n} \omega_i\, f(x; \theta_i)$$

where f(x; θi) is the ith component probability density function and ωi is its proportion. For the mixture to be a proper probability density function, it must be the case that $\sum_{i=1}^{n} \omega_i = 1$ and ωi ≥ 0.

The equation relating the total mean µ to the component means µi is

$$E(X) = \mu = \sum_{i=1}^{n} \omega_i\, \mu_i$$

The equation relating the total variance σ² to the component variances σi² is

$$E\left[(X - \mu)^2\right] = \sigma^2 = \sum_{i=1}^{n} \omega_i \left(\mu_i^2 + \sigma_i^2\right) - \mu^2$$

Now that the overall probability density can be obtained directly from the smart card data set and the distribution of each individual path has been established, the path coefficients (i.e., path choice) can be uniquely derived. In other words, the path choice inference problem is converted to inferring mixture weights, given the mixture distribution and the component distributions.

In the experiment, the experimenter models passengers’ path choices (see the steps in Experiment 1). The proportions choosing each path are ω1 = 0.15, ω2 = 0.5, and ω3 = 0.35. After the experiment, the component distribution functions and the overall distribution are presented to the observer. Each component density and the overall density are shown in Figure 1. Then the observer makes
inferences by solving the mean and variance equations stated above. Simple calculations give ω1 = 0.1499, ω2 = 0.4998, and ω3 = 0.3503. The inferred weights are clearly satisfactory, and thus the method for inferring path choice behavior passes the blind experiment, too. The method is not limited to three paths: it can address more paths if higher-order moments are introduced; in this experiment, only the first and second moments (i.e., mean and variance) are involved.

FIGURE 1 [Probability density (×10⁻³) versus journey times (1,800 to 3,400) for PATH1, PATH2, PATH3, and Total. Annotated statistics: PATH1 Var = 32749.0; PATH2 Mean = 2214.2, Var = 24463.8; PATH3 Mean = 2301.3, Var = 24988.1.]

In this section, each time component has been analyzed with a focus on uncertainties, and the proposed methods have been designed, described, and tested. The PET, PET-Trans, and path choice inferences are applied in the next section.

Case Study of Beijing Metro

Data Preparation

Smart card transaction data used in this study came from the AFC Clearing Center of Beijing Metro, train schedules from the Train Control Center, and pedestrian data from manual surveys by Beijing Infrastructure Investment Co., Ltd. The data characteristics are described first, and the data are then processed to satisfy the requirements of the proposed methods.

Train timetables contain each train’s arrival and departure times at every station, all accurate to the second.

Pedestrian surveys were conducted in two steps. First, eight stations were selected to determine the proper distribution of walking time, with a sample size exceeding 40. The histogram and K-S test results both confirmed the selection of the gamma distribution. Next, 20 samples collected for other stations were used to estimate the parameters of ACT, EGT, and so forth. All pedestrian data were recorded to the second.

Only smart cards (single-ride ticket or One Card Through Card) are accepted by Beijing Metro. Each swipe at the entrance or exit triggers one transaction record. Thus, because both entry and exit are controlled, the AFC data cover 100% of rail transit passengers. The data used here covered two 1-week periods: February 21–February 27, 2011, and April 4–April 10, 2011. No holidays, special events, or operation disruptions occurred during these periods. The selected fields are line IDs, station IDs for both origin and destination, entry times (to the whole minute), and exit times (to the second). Other information, such as card types and fares, was of no interest to this research and was thus masked.

Because entry and exit times are recorded with different precision, data enrichment to fill in the lost seconds of the entry times was necessary. In view of the Poisson arrival assumption, the independent increment property, and the conditional distribution of arrival times (15), the lost seconds within any minute are uniformly distributed between 0 and 60 s (60 not included). Thus, uniform variables are generated to enrich the entry times. For example, suppose that five entries occurred from 8:00:00 to 8:00:59. As a result of truncation by AFC, the five entry times were all recorded as 8:00. In this case, the independent increment property guarantees that the 60-s period is also characterized by the Poisson process, and the conditional distribution of arrival times ensures that the five arrival times are uniformly distributed. Thus, five uniform variables ranging from 0 to 60 (not included) are added to the recorded entry times.

The next step is to remove questionable data. Since extremely long travel times will bias service analysis, it is reasonable to exclude abnormal journeys that result only from personal behavior. Abnormal journeys may occur as a result of getting lost in the network, losing cards, selling newspapers on board, begging in the transfer tunnel, and so forth. The cumulative distribution function (CDF) curves of several randomly selected O-D pairs show that journey times increase at a steady rate up to the 98th percentile, whereas the jump from about the 98th or 99th percentile to the maximum value is sharp. This dramatic increase results from extreme journeys, which are excluded from this study. Thus, only data up to the 98th percentile are taken into consideration.

CDFs of four O-D journey times (extreme ones removed) are shown in Figure 2. The figure shows that, for each O-D, the maximum journey time is more than twice the minimum, which indicates high uncertainty and complexity. The crossing of the Huixinxijie Nankou–Tiantongyuan (HX-TTY) and Beijing Nan–Fuxingmen
0.9

0.8

0.7

0.6

0.5

0.4

0.3

HX-TTY

0.2 DSK-BJN

CWM-TTY

0.1

BJN-FXM

0

500 1000 1500 2000 2500 3000 3500

Journey Time (seconds)

FIGURE 2 CDFs of four O-D journey times, a.m. peaks, April 4 to April 8, 2011.

(HX, TTY, etc. are acronyms of metro stations.)

(BJN-FXM) curves indicates that the BJN-FXM O-D has a smaller 95% (0.63 + 0.63 p 0.37 + 0.67 p 0.372) can leave before the third

lower bound and a larger upper limit and thus has a less reliable train’s departure time.

journey time (which will be quantified later). Another point that deserves further explanation is that even

during off-peak periods on weekends, the corresponding parameter

for travelers left behind is around 0.9, not 1.0. That is, only 90%

Analysis of TTY-HX of passengers ride the first available train, but that might not be due

to the extreme crowding. This is understandable because passengers

The TTY community is home to more than 40,000 residents in north- during those periods are more willing to be left behind, perhaps

ern Beijing (17). Most residents commute to downtown Beijing by anticipating that the next train will have seats available, which is dif-

Metro Line 5. HX is one of the destinations. ferent from the case of a.m. peak periods on workdays. Commuters

Journey distributions across periods of the day and between work- are known for their tolerance for crowding and are unlikely to

days and weekends are plotted in Figure 3. Commuting characteristics wait for another train voluntarily; thus, the parameter reflects true

are evident: the TTY-HX has an extreme a.m. peak, and HX-TTY crowding levels.

has an obvious p.m. peak (see Figure 3, a and c) on workdays, while The time spent at the platform can also be inferred. For exam-

no such clear peaks could be observed on weekends. ple, the PET of TTY during the workday a.m. peak has a mean of

To adapt the travel patterns of Line 5, schedules are different on 135 s and a standard deviation of 116 s, while the PET of TTY

workdays and weekends. On workdays, the a.m. and p.m. peaks have during workday off-peak periods has a mean of 206 s and a stan-

a short headway of 170 s, while the largest headway is 360 s during dard deviation of 143 s. Thus, the PET for the a.m. peak is smaller

off-peak periods. Service is less frequent on weekends. But weekend (135 s versus 206 s) and the coefficient of variation of PET for that

schedules do not adopt a uniform headway, either. Thus, time period period is larger (0.86 versus 0.69). This indicates that the time spent

variations, although slight, are also considered for weekends. The at the platform by passengers during the a.m. peak is smaller on the

crowding level at the platform could be measured by the parameter average; however, it is highly uncertain.

for travelers left behind. A smaller estimated parameter means a The reliability of service performance, defined as the spread and

higher crowding level. Crowding levels of the TTY station during variability of journey time distribution, can also be evaluated. As

a.m. peaks are of interest, whereas crowding levels of the HX station stated previously, extremely long journey times are excluded from

during p.m. peaks are of interest. this study. Thus, a lower bound, which is the minimum amount of time

The resulting estimates are given in Table 3. The table indicates needed to travel, and an equivalent upper limit, the 98th percentile

that (a) workday peaks have higher crowding levels than weekends, value of all journey times, are obtained. Note that the lower bound

(b) clear differences between peak and off-peak periods can be occurs with the shortest walking times, a short or no platform wait,

observed for workdays, and (c) the crowding levels of TTY during and no travelers left behind. Only a few fortunate people can have a

a.m. peaks are similar to those of HX during p.m. peaks. Workday value near the lower bound. The upper end represents those who are

a.m. peaks are taken as an example to interpret the parameter 0.63: not that fortunate (i.e., more walking time, a full headway of waiting,

only 63% of passengers could ride the first available train, 86% and several additional headways due to travelers left behind). Any

(0.63 + 0.63 p 0.37) wait at most for one additional headway, and passenger’s journey time is within the range of the two limits. A

Sun and Xu 65

300 50

250 40

200

30

150

20

100

50 10

0 0

4 6 8 10 12 14 16 18 20 22 24 4 6 8 10 12 14 16 18 20 22 24

(a) (b)

300 50

250 40

200

30

150

20

100

50 10

0 0

4 6 8 10 12 14 16 18 20 22 24 4 6 8 10 12 14 16 18 20 22 24

(c) (d)

FIGURE 3 Journey distributions across periods of the day on workdays and on weekends: (a) TTY-HX, workday; (b) TTY-HX, weekend;

(c) HX-TTY, workday; and (d) HX-TTY, weekend. X-axis indicates time of day; Y-axis indicates number of entries.

measurement of service reliability, the ratio of the upper bound to the primarily because of a lack of data to support path choice behavior

lower bound, is proposed here. assumptions. The following will demonstrate that further mining of

The measurement is straightforward and easy to calculate. Despite journey times can give an indication of route choice behavior and

the simple definition, the measurement is effective in evaluating thus provide a network view. As has been mentioned, the method

service delivery quality. The smaller the ratio, the tighter the journey regards the overall journey time distribution as a mixture of several

time distribution and the more reliable the service. For example, component distributions. Each path is a component, and the coefficient

recall the cross of CDFs of HX-TTY and BJN-FXM (Figure 2). The of path is the mixed weight. The method is intuitive in the sense that

reliability ratio of HX-TTY is 1.858, while that of BJN-FXM is 2.358. different paths have different distributions (mean, variance, skewness,

Therefore, the BJN-FXM O-D pair provides less reliable service kurtosis, etc.), and thus the pattern of the overall distribution can reflect

during a.m. peaks on workdays than does HX-TTY. the mixture proportions. In addition, since journey time distributions

Similar analysis of transfer stations could be done. Specific exam- vary across periods of day, traveler path choice dynamics can also be

ples and detailed results are omitted here because this work is partially captured, while traditional methods fail to address this variability.

for establishing the distribution of individual path journey times. The O-D pair Tiantongyuan–Dongsishitiao (TTY-DS) is used to

illustrate the method. There are two paths: (a) Tiantongyuan (start,

Line 5) to Yonghegong (transfer to Line 2) to Dongsishitiao (end,

Analysis of Tiantongyuan–Dongsishitiao Line 2) and (b) Tiantongyuan (start, Line 5) to Lishuiqiao (transfer

to Line 13) to Dongzhimen (transfer to Line 2) to Dongsishitiao

Many prior works stopped here, at either the station level or the line (end, Line 2).

segment level, and failed to proceed to the network level (3, 10), The paths for TTY-DS are shown in Figure 4. The figure is only a

partial representation of the Beijing Metro network; only the elements

related to this analysis are included.

TABLE 3 Estimated Parameters for Travelers Left Behind The first path has a longer OTT but only one transfer; the second

has one more transfer, which is likely to offset the advantage of shorter

TTY–HX HX–TTY

OTT. By use of the method described in Experiment 3, the mixed

Workday a.m. peak 0.63 Workday p.m. peak 0.61 weight, or the path coefficient, is obtained. To prove its adaptability,

Workday off peak 0.91 Workday off peak 0.93 the method is applied to a.m. peaks and to off-peak periods separately.

Weekend a.m. peak 0.75 Weekend p.m. peak 0.74 Since only two paths are involved, two estimates can be made,

Weekend off peak 0.92 Weekend off peak 0.89

one from the equation of first moment and the other from second

moment. The inference result is given in Table 4.
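The two moment-based estimates for a two-path O-D pair can be sketched in a few lines of Python. This is a minimal sketch, not the authors' Experiment 3 formulation: it assumes the mixture is written in raw (non-central) moments, E[T^k] = w·E[T1^k] + (1 - w)·E[T2^k], and that a sample of journey times is available for each path's component distribution. The function names and inputs are hypothetical.

```python
from statistics import fmean


def weight_from_moments(overall, path1, path2):
    """Solve overall = w * path1 + (1 - w) * path2 for the Path 1 weight w."""
    if path1 == path2:
        raise ValueError("paths share this moment; the weight is not identifiable")
    return (overall - path2) / (path1 - path2)


def path_coefficients(observed, path1_times, path2_times):
    """Estimate two-path coefficients from raw journey time moments.

    observed     -- journey times (s) of all trips on the O-D pair
    path1_times  -- journey times (s) drawn from Path 1's component distribution
    path2_times  -- journey times (s) drawn from Path 2's component distribution

    Returns (w1, w2, (first_moment_estimate, second_moment_estimate)).
    """
    estimates = []
    for k in (1, 2):  # first and second raw moments
        estimates.append(
            weight_from_moments(
                fmean(t ** k for t in observed),
                fmean(t ** k for t in path1_times),
                fmean(t ** k for t in path2_times),
            )
        )
    w1 = fmean(estimates)  # mean of the two estimates, as described in the text
    return w1, 1.0 - w1, tuple(estimates)


if __name__ == "__main__":
    # Hypothetical data: a 90/10 mix of two made-up path samples.
    path1 = [1800.0, 2000.0, 2200.0]
    path2 = [1500.0, 1600.0, 1700.0]
    observed = path1 * 9 + path2
    print(path_coefficients(observed, path1, path2))
```

Raw moments are used because a mixture's raw moments are linear in the weights, whereas its variance is not. Applied to the figures in Table 4, averaging the a.m.-peak estimates 0.89 and 0.91 gives a Path 1 coefficient of 0.90 (about 10% on Path 2), and averaging the off-peak estimates 0.90 and 0.94 gives 0.92 (about 8% on Path 2).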
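Several of the simpler calculations in this case study can also be sketched in Python: the uniform enrichment of minute-truncated entry times, the 98th-percentile trimming of extreme journeys, the upper-to-lower reliability ratio, and the cumulative share of passengers who board within n trains for a travelers-left-behind parameter such as those in Table 3. These are hypothetical helpers, not the authors' code. They assume a nearest-rank percentile and that each arriving train independently admits a waiting passenger with the same probability, which reproduces the geometric series used above for the value 0.63.

```python
import math
import random


def enrich_entry_times(truncated_s, rng=None):
    """Add a uniform [0, 60) s offset to entry times truncated to the minute.

    Assumes Poisson arrivals, so arrivals within a given minute are
    uniformly distributed over it. Times are seconds since midnight.
    """
    if rng is None:
        rng = random.Random()
    return [t + 60.0 * rng.random() for t in truncated_s]


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile of a non-empty sample (pct is e.g. 98)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def trim_extreme_journeys(journey_times_s, pct=98):
    """Keep only journeys at or below the pct-th percentile."""
    upper = nearest_rank_percentile(journey_times_s, pct)
    return [t for t in journey_times_s if t <= upper]


def reliability_ratio(trimmed_s):
    """Upper limit (max of the trimmed sample) over lower bound (its minimum)."""
    return max(trimmed_s) / min(trimmed_s)


def share_boarded_within(p_board, n_trains):
    """Share of passengers boarding within the first n trains, assuming each
    train independently admits a waiting passenger with probability p_board
    (the travelers-left-behind parameter)."""
    return 1.0 - (1.0 - p_board) ** n_trains
```

With the workday a.m.-peak parameter of 0.63, `share_boarded_within` gives about 0.63, 0.86, and 0.95 for n = 1, 2, and 3, matching the 63%, 86%, and 95% quoted for TTY. For the enrichment step, five records truncated to 8:00 would be passed as `[28800] * 5`.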

FIGURE 4 Illustration of TTY-DS (partial map of Beijing Metro Lines 2, 5, and 13 showing the origin, Tiantongyuan; the destination, Dongsishitiao; and the transfer stations Yonghegong, Lishuiqiao, and Dongzhimen).

TABLE 4 Path Coefficient Inferences

                Path 1                      Path 2
Period          1st Moment   2nd Moment     1st Moment   2nd Moment
a.m. peak       0.89         0.91           0.11         0.09
Off peak        0.90         0.94           0.10         0.06

The mean of the two estimates is taken as the mixture weight. The proportion taking Path 2 is approximately 9%, and slightly higher during a.m. peaks. The result supports including a transfer penalty when the generalized cost of a path is calculated.

Once the coefficient of each path is determined, O-D matrices can be assigned onto the network, making analysis and evaluation at the network level possible (e.g., link loads and transfer volumes). This also addresses the practical need to allocate fare revenues among several operators.

Conclusions and Prospective Work

This paper contributes to rail transit service assessment and route choice behavior estimation from the perspective of journey time. On the basis of the independence of time components, methods are designed and validated to infer PET, PET-Trans, and the path coefficient. The methods are applied to two O-D pairs in the Beijing Metro. Variables concerning travel behavior (PET, PET-Trans, path choice proportion) and performance indicators (travelers left behind, crowding levels, travel time reliability) are inferred, and the results are interpreted to address Beijing Metro's concerns.

Some findings are as follows:

1. The phenomenon of travelers left behind at some stops is frequent during the a.m. peak, which reflects the limited capacity of trains and high crowding levels on the platform.
2. Passengers spend less time waiting at the platform during a.m. peaks, but the time is highly uncertain (coefficient of variation around 0.8).
3. Most O-D pairs have maximum journey times that are at least twice the minimum journey time, which indicates poor reliability.
4. One additional transfer affects path choice behavior significantly.
5. The method for estimating route choice proportions is easy and effective. It is also capable of capturing day-to-day dynamics of choice behavior.

This research developed new applications of AFC transactional data, but additional issues must be addressed. Limitations of the current work and prospective work are summarized below:

1. Although small-scale surveys (for example, of travelers left behind) have confirmed the inferences of this work, efforts should be made to verify the results at the network level. Advanced systems, such as automatic passenger counting, may facilitate this verification.
2. Because collecting reliable pedestrian data is labor-intensive, costly, and complex, a robustness analysis of the method (like that in Experiment 2) should be done to determine the extent to which pedestrian data errors affect the estimates and inferences.
3. This work is based on the Beijing Metro, and the methodology must be tailored to apply to other networks. For example, if trains are not in automatic train operation mode (i.e., trains are not exactly on schedule), the coefficient of variation of headways can be included to modify the formulation of PWT. In addition, future work may consider operational disruptions.
4. A minor defect of Beijing Metro's AFC entry times (truncation to the minute) prevents confirmation of the Poisson arrival assumption, although an enrichment method has been proposed. Data from more advanced systems, such as that in Shanghai, with entry and exit times both accurate to the second, could confirm the assumption further.
5. The time difference between entry and exit is the only item used in the study. Separate exploitation of entries and exits might yield more information. For example, disaggregate analysis might determine each cardholder's boarding plan, making it possible to reconstruct or predict the real-time distribution of passenger flows over the network.

The study is helpful in evaluating rail service reliability and assigning passenger flows.

References

1. Pelletier, M.-P., M. Trepanier, and C. Morency. Smart Card Data Use in Public Transit: A Literature Review. Transportation Research Part C, Vol. 19, 2011, pp. 557–568.
2. Cui, A. Bus Passenger Origin–Destination Matrix Estimation Using Automated Data Collection Systems. MS thesis. Massachusetts Institute of Technology, Cambridge, 2006.
3. Chan, J. Rail Transit OD Matrix Estimation and Journey Time Reliability Metrics Using Automated Fare Data. MS thesis. Massachusetts Institute of Technology, Cambridge, 2007.
4. Barry, J. J., R. Freimer, and H. L. Slavin. Use of Entry-Only Automatic Fare Collection Data to Estimate Linked Transit Trips in New York City. In Transportation Research Record: Journal of the Transportation Research Board, No. 2112, Transportation Research Board of the National Academies, Washington, D.C., 2009, pp. 53–61.
5. Wang, W. Bus Passenger Origin–Destination Estimation and Travel Behavior Using Automated Data Collection System in London, UK. MS thesis. Massachusetts Institute of Technology, Cambridge, 2010.
6. Jang, W. Travel Time and Transfer Analysis Using Transit Smart Card Data. In Transportation Research Record: Journal of the Transportation Research Board, No. 2144, Transportation Research Board of the National Academies, Washington, D.C., 2010, pp. 142–149.
7. Csikos, D. R., and G. Currie. Investigating Consistency in Transit Passenger Arrivals: Insights from Longitudinal Automated Fare Collection Data. In Transportation Research Record: Journal of the Transportation Research Board, No. 2042, Transportation Research Board of the National Academies, Washington, D.C., 2008, pp. 12–19.
8. Seaborne, C. W. Application of Smart Card Fare Payment Data to Bus Network Planning in London, UK. MS thesis. Massachusetts Institute of Technology, Cambridge, 2008.
9. Kusakabe, T., T. Iryo, and Y. Asakura. Estimation Method for Railway Passengers' Train Choice Behavior with Smart Card Transaction Data. Transportation, Vol. 37, 2010, pp. 731–749.
10. Zhao, J., A. Rahbee, and N. H. M. Wilson. Estimating a Rail Passenger Trip Origin–Destination Matrix Using Automatic Data Collection Systems. Computer-Aided Civil and Infrastructure Engineering, Vol. 22, No. 5, 2007, pp. 376–387.
11. Beijing Subway: March 4, 1998, Subway Passenger Traffic Record (Chinese). March 5, 2011. http://bjsubway.com/node/2344. Accessed June 11, 2011.
12. Seddon, P. A., and M. P. Day. Bus Passenger Waiting Times in Greater Manchester. Traffic Engineering and Control, Vol. 15, 1974, pp. 442–445.
13. Fan, W., and R. B. Machemehl. Characterizing Bus Transit Passenger Waiting Times. Presented at 2nd Material Specialty Conference of the Canadian Society of Civil Engineering, Montreal, Quebec, Canada, 2002.
14. Sampling (Statistics). http://en.wikipedia.org/wiki/Statistical_sampling.
15. Poisson Process. http://en.wikipedia.org/wiki/Poisson_process. Accessed June 11, 2011.
16. Blind Experiment. http://en.wikipedia.org/wiki/Blind_experiment. Accessed June 11, 2011.
17. Meet Dr. Freud. New Yorker, Jan. 10, 2011. http://www.newyorker.com/reporting/2011/01/10/110110fa_fact_osnos. Accessed Jan. 11, 2011.

The Rail Transit Systems Committee peer-reviewed this paper.