
Revisiting Labor Supply of New York City Taxi Drivers: Empirical Evidence
from Large-scale Taxi Data

Ender Faruk Morgul, M.Sc. (Corresponding Author)
Graduate Research Assistant, Department of Civil & Urban Engineering,
Polytechnic School of Engineering,
Center for Urban Science and Progress (CUSP), New York University (NYU)
One MetroTech Center, 19th Floor, Brooklyn, NY 11201, U.S.A.
Tel: 1-(646)-997-0531
Email: efm279@nyu.edu
Kaan Ozbay, Ph.D.

Professor, Department of Civil & Urban Engineering,
Polytechnic School of Engineering,
Center for Urban Science and Progress (CUSP), New York University (NYU)
One MetroTech Center, 19th Floor, Brooklyn, NY 11201, U.S.A.
Tel: +1 (646) 997 0552
Email: kaan.ozbay@nyu.edu
Word count: 5,974 words + 2 Figures (500 words) + 5 Tables (1,250 words) = 7,724 words
Abstract: 221 words.
Paper Submitted for Presentation and Publication at the

Transportation Research Board’s 94th Annual Meeting, Washington, D.C., 2015
Re-submission Date: November 15, 2014
TRB 2015 Annual Meeting Paper revised from original submittal.

ABSTRACT
Taxicab activity patterns in urbanized regions have changed substantially over the past few years,
especially after the introduction of online taxi-hailing applications. The demand-and-supply
characteristics of taxicab services remain a significant question to be analyzed when deciding on
transportation policy improvements in big cities. At the same time, labor supply theories that seek
to explain possible income-targeting behavior under transitory wage changes have been tested with
taxicab driver data. However, empirical work on the relation between income and work hours has
produced conflicting findings, mainly due to methodological differences and, to some extent, the
limited number of observations. One of the main limitations in addressing taxi demand-supply
questions has been the lack of sufficient and reliable data. This paper presents an empirical
assessment of taxicab drivers’ labor supply using a novel large-scale data source from New York
City. The electronically collected data provide detailed information about drivers’ work hours and
collected fares. The methodological framework was mainly borrowed from the previous literature, and
the findings were compared with results from earlier studies. Some of the results from the
large-scale data were found to deviate from these earlier findings. Moreover, seasonal variations
in the labor supply response to transitory wage changes were empirically identified. The results of
this study provide empirical support for the income-targeting hypothesis.
Keywords: Urban Taxi Traffic; Big Taxi Data; Labor Supply of Taxi Drivers; Income-Target
Behavior; Reference-Dependent Preferences

INTRODUCTION AND BACKGROUND
Taxicabs, as a paratransit mode of travel, are vital components of urban mobility systems. Although
highly controlled by laws and regulations, taxicab service involves dynamic human-market
interactions, since the system operates mostly in a passenger demand-based manner (1). Moreover,
newly emerging online taxi-hailing services, such as Uber and Hailo, have considerable potential to
force changes in existing taxicab regulations and to increase competition in the taxicab market (2).
Unlike traditional taxicab dispatching, online taxi-hailing applications allow passengers to request
pick-ups directly from drivers whose geo-location is shown on a map, and passengers are usually
charged lower rates than in standard taxicabs (3). These services also have innovative ways of
matching taxicab supply to demand when demand is higher than usual (e.g., in the rain). For example,
Uber uses “surge pricing,” which adjusts fares in real time to ensure that enough cabs are available
to maintain service reliability (4). With the continuing development of mobile technologies, the
undersupply of taxicabs in major cities, and the growing market penetration of the new online
services, policy changes seem inevitable in the near future (5).

Decision-makers must carefully analyze the demand-and-supply relationship when deciding on new
policies under these changing conditions. Observed data are valuable both in the policy-making
process and in evaluating the aftermath of a policy implementation (6). Big Data, such as the
electronically collected taxicab data that have become available in several major cities including
New York City (NYC), can change the way econometric analyses are conducted simply because of the
sheer size of these new data sets. Detailed and almost complete information that was once impossible
to obtain, including the geographical distribution of taxicab activity and driver and passenger
behavior, can now be extracted, visualized and interpreted from these new data sources. For example,
researchers have used similar taxicab data to explore taxicab drivers’ airport pick-up decisions (7)
and to analyze travel time variability (8).

Big transportation data provide many opportunities for better understanding long-standing economic
theories that are becoming increasingly important to decision-makers in the field of transportation.
For example, when cities make decisions about the number of taxi medallions or the total number of
taxis, they also need to understand the relationship between driver wages and labor supply. In fact,
over the last two decades the case of taxicab drivers has been of interest to several economists as
a test of competing theories of labor supply in response to transitory wage changes. Several
features make taxicab driving attractive for researchers. First, taxi drivers have unusual
flexibility to choose how many hours to work in a day; as a result, total daily work hours vary
widely across days. Second, hourly wages are not fixed or directly tied to work hours. Although
customers are usually charged on a per-mile basis, for-hire taxi drivers spend less time searching
for customers on busy days, which raises their average hourly earnings. Third, mainly due to demand
shocks such as adverse weather conditions or the unavailability of alternative modes (e.g., subway
breakdowns), wages are highly correlated within days (9; 10). These features make taxicab driving an
exceptional test environment for observing changes in labor supply in response to temporary wage
fluctuations.

The neoclassical labor supply model, first introduced by Lucas and Rapping (11), predicts a positive
response to transitory changes in wages. According to this approach, a driver is expected to work
longer hours when average hourly earnings are temporarily higher. Camerer et al. (9) rejected this
idea by showing empirical evidence of negative wage elasticities in NYC taxicab data. They explained
the results by drivers’ daily targets, an idea mainly inspired by the reference-dependent
preferences of Prospect Theory (12; 13). In particular, based on their sample data, Camerer et al.
(9) claimed that drivers have daily income targets: they work until they reach the target and then
quit, even if they are earning unusually high wages per hour.

The findings presented in Camerer et al. (9) ignited a new research direction on the target-income
hypothesis. Although there is no definitive agreement on the extent of reference dependence (for
example, on the role of driver experience in target setting) or on its policy implications, several
theoretical and empirical contributions followed. Chou (10) conducted an empirical study based on
Singapore taxicab driver data and found negative wage elasticities similar to those of Camerer et
al. (9). Farber (14) used one of the three data sets of Camerer et al. (9) together with a new set
of NYC taxicab driver data and found results that contrasted with theirs. The difference was
attributed to the alternative estimation methods used and to a more advanced way of measuring the
daily wage rate. A probit model was developed to estimate the stopping probability; when driver
fixed effects were included, the probability of stopping was significantly related to cumulative
hours worked but not to cumulative earnings. The results in Farber (14) therefore favored the
neoclassical approach (when there are no income effects) over the target-income approach. In a
subsequent study, Farber (15) developed a labor supply model with reference dependence that assumes
a continuous utility function with a kink (i.e., a point where the marginal utility of income
changes discontinuously) at the target income, beyond which the marginal utility of income is lower.
In the empirical example, a reduced-form model was tested in which target income levels were treated
as latent variables with driver-specific means. Using the same NYC taxicab driver data set as in his
previous study, Farber (15) reached two significant conclusions: 1) drivers had a higher probability
of stopping when they reached their daily income target, and 2) the daily income targets of drivers
were not fixed over the days of the week but varied significantly. The latter finding of unstable
daily targets was argued to be a major obstacle to developing a practical labor supply model of
income targeting.

Kőszegi and Rabin (16) formulated a general model of reference-dependent preferences in which target
levels are determined endogenously from rational expectations. The generalized reference-dependent
utility function is the sum of the standard outcome-based consumption utility and a gain-loss
utility measured relative to reference points. One application of the theory was to taxicab drivers.
The theoretical model assumes not only an expected income target but also takes work effort into
account, so a daily target for work hours is also a decisive factor in labor supply. The application
to taxicab drivers agreed with target-setting behavior: when drivers’ earnings in the morning are
high, they tend to work less in the afternoon for any given average hourly wage.

Crawford and Meng (17) conducted an empirical analysis using a combined model based on Farber (15)
and Kőszegi and Rabin (16). Stopping probabilities were estimated using the data of Farber (14),
while reference dependence was represented by both income and work-hour targets, as in Kőszegi and
Rabin (16). The estimation results showed that work hours were significantly related to the stopping
probability, whereas income levels were not. The findings also disagreed with the conclusion of
Farber (15) that individual driver targets were too unstable to support a reference-dependent labor
supply model: Crawford and Meng (17) obtained robust estimates of the parameters that account for
target levels.

[Figure 1. Axes: Income, Hours. Labeled elements: Driver’s Work Start Point, Expected Wage, Income
Target (Ir), Work Hour Target, Driver’s Stopping Decision Points.]
FIGURE 1: A Reference-Dependent Driver who Stops at the Second Target Level He Reaches: Income
Target on a Bad Day, Work Hour Target on a Good Day (17).

Figure 1 is adapted from Crawford and Meng (17) to provide a simple illustration of targeting
behavior assuming linear consumption utility. The bold lines represent the budget constraints, with
the line for a “good day” (i.e., higher total income) lying above the “bad day” line. The dashed
lines showing the “income target” and the “work hour target” divide the workday into four domains
depending on the gain or loss in income and work hours. The driver starts working in the lower-right
corner of the graph, which falls into the income-loss/work-hours-loss domain. Earnings and hours
increase toward the upper left, and the driver seeks to enter the gain domains by reaching one or
both of his targets. At every time step on the “hours” axis the driver decides whether to continue
along his expected wage line or to stop driving. The driver stops whenever his expected wage falls
below his hours disutility cost of income (i.e., at the intersection points with the budget
constraint lines). For the case shown, an income-targeting driver’s optimal stopping decision is
assumed to be at the second target that he reaches: on the “good day” he continues to drive until he
reaches his work hour target, and on the “bad day” he stops at the income target. Different
specifications are possible using non-linear consumption utility or other target responses (e.g.,
quitting after reaching the first target), which are mainly shaped by individual driver
characteristics.

Doran (18) investigated wage elasticities over time periods of different lengths. Using new NYC taxi
driver data, three models were considered: the neoclassical model, daily income targeting as in
Camerer et al. (9), and expectation-based income targeting as introduced in Kőszegi and Rabin (16).
Short-term wage changes were found to produce larger negative wage elasticities than long-term wage
changes. The results confirmed the neoclassical model’s inability to explain negative wage
elasticities under short-term wage increases. The approach of Camerer et al. (9) adequately
addressed the short-term effect, and the more advanced expectation-based income targeting approach
proposed by Kőszegi and Rabin (16) outperformed the other two models by accounting for the
differences between short- and long-term wage changes.

One of the main limitations of all of the abovementioned studies is the sample size used in the
empirical applications. Table 1 summarizes the data used in these past empirical studies, with the
number of taxi trips and drivers. Most of the studies were based on NYC data, and the largest data
set contains only a limited number of trip sheets from 712 drivers. Considering that more than
12,000 taxicabs were operating in the study region during the data collection period, the sample
size is quite small and its representativeness of overall taxicab driver behavior is questionable
(19). For example, as pointed out by Farber (14), the variables used for model estimation in Camerer
et al. (9) include the average wage of other drivers for the same day and the same shift. If there
are not enough observations (in some cases, no observations) for certain time periods of interest,
the use of such variables could lead to biased results.

Therefore, one of the main motivations for revisiting these theoretical contributions is the
availability of the entire data set of taxi trip information, made possible by technological
advances in vehicle tracking, payment options and passenger information devices. We present
empirical evidence from four months of data obtained from more than 30,000 taxi drivers in NYC.
Since the literature commonly notes that model selection and estimation methodologies have a
significant impact on the findings, this study uses methodologies similar to those of Camerer et al.
(9) and Farber (14), with identical parameter assumptions.

TABLE 1: Data Size in Selected Empirical Literature
Study                        Data Sample Size                     Study Region
Camerer et al. (9)           Three data sets:                     New York City
                               TRIP (70 trips; 13 drivers)
                               TLC1 (1,044 trips; 484 drivers)
                               TLC2 (712 trips; 712 drivers)
Chou (10)                    Survey data from 92 drivers          Singapore
Farber (14), Farber (15),    13,461 trips; 21 drivers             New York City
  Crawford and Meng (17)

DATA
To improve passenger service quality, the New York City Taxi and Limousine Commission (TLC) started
the Taxi Passenger Enhancement Program (TPEP). As part of TPEP, technologically advanced devices
were mandated for all taxicabs, such as on-board credit card payment devices and passenger
information monitors that show the taxicab’s current location in the network. The process of
equipping the entire yellow taxi fleet operating in New York City was completed in 2008 (20). The
new system has given decision-makers the opportunity to monitor the movement of all taxi traffic in
an efficient and almost fully automated way. For example, traditional paper trip sheets were
replaced with automated vehicle locator (AVL) technology that records the GPS coordinates of the
pick-up and drop-off points with time stamps.

The 2013 taxi data were provided to the research team in two parts via a Freedom of Information Law
(FOIL) request. The first part included trip-related information and the second part gave the fare
details. The two data sets had the same number of entries and were merged using driver information
and trip pick-up and drop-off coordinates as key variables.
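
The merge described above can be sketched as a key-based join. The snippet below is only a minimal
illustration, not the authors’ procedure: the field names (`hack_license`, `medallion`, `pickup_xy`,
`dropoff_xy`) and the sample records are hypothetical, and the actual FOIL files may label their
columns differently.

```python
# Minimal sketch of joining trip and fare records on shared key fields.
# All field names and sample values are hypothetical placeholders.
KEY_FIELDS = ("hack_license", "medallion", "pickup_xy", "dropoff_xy")


def merge_trip_and_fare(trips, fares, key_fields=KEY_FIELDS):
    """Return one merged record per trip that has a matching fare record."""
    fare_by_key = {tuple(f[k] for k in key_fields): f for f in fares}
    merged = []
    for trip in trips:
        fare = fare_by_key.get(tuple(trip[k] for k in key_fields))
        if fare is not None:
            merged.append({**trip, **fare})
    return merged


trips = [
    {"hack_license": "H1", "medallion": "M1", "pickup_xy": (1, 2),
     "dropoff_xy": (3, 4), "trip_time_s": 600, "trip_distance_mi": 2.1},
]
fares = [
    {"hack_license": "H1", "medallion": "M1", "pickup_xy": (1, 2),
     "dropoff_xy": (3, 4), "fare_amount": 9.5, "tolls": 0.0},
]
merged = merge_trip_and_fare(trips, fares)
```

Building a dictionary on the fare side keeps the join linear in the number of records, which matters
for files with millions of rows.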

Drivers were identified in the data set by a hack license number and a medallion number, both of
which were anonymized. Trip durations were provided in seconds and trip distances in miles. Fare
information included the total fare amount, tips, surcharge (e.g., extra fee for night trips) and
toll amount, if any. Payment type was recorded as either credit/debit card or cash, and both types
were kept for the analysis. Since cash payments did not include tip information, only trip fare
amounts including tolls and surcharges were used in this study. Other data fields, namely the GPS
coordinates of pick-up and drop-off locations and the number of passengers, were not used for the
analysis in this paper.

In this paper the term “wage” refers to the gross earnings of a driver in a day. Drivers’ net
earnings can differ depending on several factors, such as whether the driver owns or rents the
taxicab and other finance costs. However, the taxi data set did not provide such detailed
information about the drivers, so this study considered only gross earnings.

The taxi data used in this study cover four complete months of 2013 (January, April, July and
October), each representing a different season. The remaining months were left out mainly to avoid
unusually high computational overhead. Several filtering steps were applied to remove noisy data
points. First, erroneous records, such as those with zero trip duration, zero trip distance or zero
total fare amount, were deleted. Second, records with empty fields and shifts that lasted more than
12 hours were screened out. Finally, shifts of a driver that started within 12 hours of the end of
the previous shift were omitted. The number of observations (i.e., trips) in the raw and filtered
data for the time periods used in the empirical tests is given in Table 2. Approximately 8% of the
original raw data were eliminated from each month.
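
A rough sketch of the first two filtering rules is given below. It is an illustration under stated
assumptions rather than the authors’ code: the field names are hypothetical, and the third rule (the
gap between consecutive shifts) is not covered because the text does not specify how shifts were
delimited.

```python
# Sketch of the trip-level and shift-level filters described in the text.
# Field names are hypothetical; the 12-hour threshold follows the text.
REQUIRED = ("trip_time_s", "trip_distance_mi", "fare_amount")
MAX_SHIFT_HOURS = 12.0


def is_valid_trip(trip):
    """Reject records with empty fields or zero duration, distance or fare."""
    if any(trip.get(field) in (None, "") for field in REQUIRED):
        return False
    return all(float(trip[field]) != 0.0 for field in REQUIRED)


def is_valid_shift(shift_hours):
    """Reject shifts that lasted more than 12 hours."""
    return shift_hours <= MAX_SHIFT_HOURS


def percent_removed(raw_count, filtered_count):
    """Share of raw records eliminated by filtering, in percent."""
    return 100.0 * (raw_count - filtered_count) / raw_count
```

With the raw and filtered July counts from Table 2, `percent_removed(13791052, 12767771)` evaluates
to about 7.42.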

The statistics reported in the 2014 NYC Taxi Fact Book (TFB) were used to validate the accuracy of
the filtered data. Among the four months, the highest number of trips (15,067,357), the highest mean
hours worked and the highest mean total revenue were observed in April. This was consistent with the
TFB, which showed the highest daily average usage in the spring months. The lowest number of hours
worked and the lowest average wage were observed in July; similarly, the TFB showed the lowest
taxicab usage in the summer months. In addition, the TFB reported an average of 485,000 daily taxi
trips, which agreed closely with the raw data (20).

Median daily work hours in the data set ranged in a narrow band between 8.65 and 8.78 hours. Average
hourly earnings did not differ significantly among the months representing the four seasons. In the
data of Camerer et al. (9), which were also from NYC, average work hours were higher (9.67 hours for
their TLC1 data set), whereas average hourly earnings were almost half of those in the current
study. The difference in average earnings can be easily explained by the taxi rate hikes¹. The lower
average hours worked in the 2013 data could result from the larger sample size, which includes
drivers who work very few hours and thus lower the sample mean. Another possible reason is the
different definition of a shift. For the data at hand, we had no information about the exact shift
start time of drivers. Therefore, in calculating hours worked, we could not include the time spent
searching for the first customer or the empty driving time after the last customer, information that
was usually provided in classical trip sheets. Work hours in this study were defined as the time
period from the pick-up of the first customer to the drop-off of the last customer. It should also
be noted that average hourly wages would be slightly lower if those excess times were included in
the total hours worked. However, we had no validation data on this measurement error that could be
generalized to the entire data set, so no correction was implemented.

¹ Since 1996 taxicab rates in NYC have increased three times. The rates during the study period of
Camerer et al. (9) were a $2.00 initial charge, $0.30 per mile and $0.30 per 90 seconds of waiting
time. In 2013, the rates were a $2.50 initial charge, $0.50 per mile and $0.50 per minute of waiting
time, with a $0.50 extra fee for peak weekday hours. For more information, see Brent Cox,
http://www.theawl.com/2012/07/how-much-more-do-taxi-fares-cost-today (accessed 7/11/14).
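
The work-hour definition above (first pick-up to last drop-off) and the resulting average hourly
wage can be illustrated as follows. This is a sketch only: the tuple layout `(pickup_s, dropoff_s,
fare)` and the sample values are assumptions made for the example, not the format of the TLC data.

```python
def shift_summary(trips):
    """Summarize one driver shift.

    trips: list of (pickup_s, dropoff_s, fare) tuples, times in seconds.
    Returns (hours_worked, total_revenue, average_hourly_wage), where hours
    run from the first pick-up to the last drop-off, so search time before
    the first fare and after the last fare is excluded.
    """
    first_pickup = min(t[0] for t in trips)
    last_dropoff = max(t[1] for t in trips)
    hours = (last_dropoff - first_pickup) / 3600.0
    revenue = sum(t[2] for t in trips)
    return hours, revenue, revenue / hours


# Two hypothetical trips: a 30-minute fare, then a 60-minute fare an hour later.
hours, revenue, wage = shift_summary([(0, 1800, 20.0), (3600, 7200, 40.0)])
```

Because unobserved search time is left out of the denominator, a wage computed this way is an upper
bound on the wage that a full trip sheet would give.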

TABLE 2: Summary Statistics
                                    January 2013   April 2013   July 2013    October 2013
Total number of drivers             32,224         33,111       32,788       33,456
Total number of trips
  Raw data                          14,744,391     15,067,357   13,791,052   14,971,100
  Filtered data                     13,423,993     13,820,121   12,767,771   13,784,076
  Percent removed                   8.95%          8.27%        7.42%        7.93%
Hours worked
  Mean                              8.34           8.58         8.40         8.52
  Median                            8.55           8.78         8.6          8.7
  Std. dev.                         2.44           2.38         2.46         2.40
Average wage ($/hour)
  Mean                              36.5           38.12        35.77        38.00
  Median                            36.27          38.01        35.63        37.86
  Std. dev.                         8.62           8.31         8.41         8.17
Total revenue ($)
  Mean                              299.28         323.20       295.81       319.94
  Median                            296.72         320.81       296.3        317.44
  Std. dev.                         100.27         104.07       100.04       103.76
Number of trips
  Mean                              22.03          22.63        20.76        21.71
  Median                            22.0           22.0         20           21
  Std. dev.                         9.27           9.17         8.59         8.95
Number of customer service hours
  Mean                              4.08           4.56         4.18         4.59
  Median                            4.06           4.57         4.22         4.61
  Std. dev.                         1.45           1.54         1.49         1.53
Correlation between (log) hours
  and (log) wages                   -0.357         -0.329       -0.335       -0.324

LABOR SUPPLY MODELS
The main purpose of this study was to test alternative labor supply models using large-scale,
comprehensive data on observed taxicab behavior. To do so, two main empirical analyses were carried
out. The first was the “log hours function” that follows Camerer et al. (9), and the second was the
“discrete choice (i.e., probit) stopping model” explained in Farber (14).

Log Hours Function
The analysis starts by assessing the simple correlation between log hours and log wages (i.e.,
hourly average wages). Table 2 shows that the correlation is negative for all four months. The
scatter plots of log hours against log wages in Figure 2 also visually support the negative
correlation. The graphs indicate no obvious differences in the relation between hourly average wage
and work hours among the four months representing the four seasons. One interesting indicator of
income targeting can be seen in the peak points in log hours: up to a certain wage, which may be the
reference point, log hours gradually increase, and after the peak there is a sharp decrease in hours
worked.

An ordinary least squares (OLS) regression with log hours as the dependent variable and log wages as
the independent variable was estimated. The main assumption in this approach was that the driver
decides when to stop by considering the average hourly wage of the day; hour-to-hour fluctuations in
the wage were not taken into account. To test this assumption, hourly wage autocorrelations were
calculated. If these autocorrelations were negative, a rational driver would be more likely to quit
early on a day with high early earnings, since the upcoming hours would be more likely to be
low-wage hours. This would constitute an intentional violation of the labor supply behavior
predicted by the neoclassical model (9). Median hourly wages across all drivers were calculated, and
each hour’s median wage was regressed on the previous hour’s median wage. For all the data sets,
positive autocorrelations² were found, similar to Camerer et al. (9). Therefore, the alternative
hypothesis of intentional violation was rejected.
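
The autocorrelation check can be illustrated by regressing each hour’s median wage on the previous
hour’s. The sketch below assumes a plain list of hourly median wages as input; the function name and
the toy series are made up for illustration and are not data from this study.

```python
from statistics import mean


def lag1_slope(series):
    """OLS slope from regressing series[t] on series[t - 1]."""
    previous, current = series[:-1], series[1:]
    mean_prev, mean_cur = mean(previous), mean(current)
    sxx = sum((p - mean_prev) ** 2 for p in previous)
    sxy = sum((p - mean_prev) * (c - mean_cur)
              for p, c in zip(previous, current))
    return sxy / sxx


# A steadily rising series is positively related to its own lag, while a
# series that alternates between high and low hours is negatively related.
rising = lag1_slope([1.0, 2.0, 3.0, 4.0, 5.0])
alternating = lag1_slope([1.0, -1.0, 1.0, -1.0, 1.0])
```

A positive slope means that a high-wage hour tends to be followed by another high-wage hour, which
is the pattern reported for all four months.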

Under the assumption that wages are uncorrelated across days and do not fluctuate within a day, the
labor supply response to temporary wage changes can be estimated using the following regression form
(14):

$$\ln H_{it} = \eta \ln W_{it} + X_{it}\beta + \varepsilon_{it}, \tag{1}$$

in which $H_{it}$ is the hours worked by driver $i$ on day $t$, $W_{it}$ is the average hourly wage
of driver $i$ on day $t$, and $X_{it}$ are the additional variables (e.g., weather conditions, night
shift) that are considered to affect the labor supply decision. The parameter $\eta$ represents the
wage elasticity, which shows the labor supply response to transitory wage changes. The term
$\varepsilon_{it}$ is a random error component.

² Estimated autocorrelations: January data, 0.022; April data, 0.069; July data, 0.057; October
data, 0.076.
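
As a minimal illustration of Eqn. 1 without the additional variables, the elasticity is simply the
slope of a log-log regression. The helper names and the synthetic data below are assumptions made
for the example; the estimates reported in this paper also include the night shift and weather
variables.

```python
import math


def ols_simple(x, y):
    """Return (intercept, slope) of a one-regressor least-squares fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def wage_elasticity(hours, wages):
    """Slope of log(hours) on log(wage): the elasticity in Eqn. 1."""
    _, slope = ols_simple([math.log(w) for w in wages],
                          [math.log(h) for h in hours])
    return slope


# Synthetic shifts generated with a known elasticity of -0.65.
wages = [20.0, 25.0, 30.0, 35.0, 40.0]
hours = [80.0 * w ** -0.65 for w in wages]
eta_hat = wage_elasticity(hours, wages)
```

Since the synthetic hours follow an exact power law in the wage, the fit recovers the elasticity
used to generate them up to floating-point error.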

[Figure 2 panels: January 2013, April 2013, July 2013, October 2013]
FIGURE 2: Hours Worked – Average Wage Relationships

Discrete Choice Stopping Model
The discrete choice stopping model used in this study takes the form of a survival time model. A
taxicab driver’s decision to stop working was assumed to take place at discrete points in time. The
average hourly wage was handled cumulatively, a major difference from the “log hours function,”
which uses only the ratio of total earnings to total hours worked. Hours worked were likewise
handled cumulatively. For a time point $\tau$ and an optimal stopping time $\tau^*$, a driver was
assumed to stop if $\tau \ge \tau^*$. A latent variable, $R(\tau) = \tau - \tau^*$, was defined for
a stopping point $\tau$ for driver $i$ on day $d$ and hour $c$:

$$R_{idc}(\tau) = \gamma_1 h_\tau + \gamma_2 y_\tau + X_{idc}\beta + \varepsilon_{idc}, \tag{2}$$

where $h_\tau$ is the cumulative hours worked in the shift until time point $\tau$, and $y_\tau$ is
the cumulative earnings until time point $\tau$. $X_{idc}$ accounts for the other factors related to
the stopping decision. The term $\varepsilon_{idc}$ is a normally distributed random error. A driver
was therefore assumed to stop at $\tau$ if $R_{idc}(\tau) \ge 0$. A probit model was used to
estimate Eqn. 2. A similar specification was used by Crawford and Meng (17), assuming decisions were
made based on earnings in the first hour. In this study we followed the original specification
proposed by Farber (14), in which cumulative work hours were considered and not split within the
shift.
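
Under a standard normal error, the probability of stopping implied by Eqn. 2 is the normal CDF
evaluated at the linear index. The sketch below shows that mapping and the corresponding probit
log-likelihood. The coefficient values used in the example are arbitrary illustrations, not
estimates from this study, and the scalar `x_beta` stands in for the term with the other factors.

```python
import math


def normal_cdf(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def stop_probability(cum_hours, cum_earnings, gamma_1, gamma_2, x_beta=0.0):
    """P(R >= 0) for the latent index in Eqn. 2 with a standard normal error."""
    return normal_cdf(gamma_1 * cum_hours + gamma_2 * cum_earnings + x_beta)


def log_likelihood(observations, gamma_1, gamma_2, x_beta=0.0):
    """Probit log-likelihood; each observation is (hours, earnings, stopped)."""
    total = 0.0
    for hours, earnings, stopped in observations:
        p = stop_probability(hours, earnings, gamma_1, gamma_2, x_beta)
        total += math.log(p if stopped else 1.0 - p)
    return total


# Arbitrary illustrative coefficients: stopping becomes more likely as
# cumulative hours and cumulative earnings grow.
early = stop_probability(5.0, 150.0, 0.3, 0.002, x_beta=-3.0)
late = stop_probability(10.0, 300.0, 0.3, 0.002, x_beta=-3.0)
```

Estimating the model amounts to choosing the coefficients that maximize `log_likelihood` over all
observed decision points; a numerical optimizer would be needed for that step and is not shown here.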

MODEL ESTIMATIONS
The OLS model for the “log hours function” was estimated using three other explanatory variables:
night shift, rainy days (including snow) and high-temperature days (above 85°F). These were included
to account for labor supply decisions under different driving conditions. The estimation results are
presented in Table 3. The goodness-of-fit measures used in the models of Camerer et al. (9) were
adjusted R² values ranging between 0.146 and 0.484 (see Table II in their paper), which are not
wildly different from the model fits shown in Table 3. All of the reported coefficients were
significantly different from zero. For January and April there were no observations for the
high-temperature variable, so that term was omitted from the estimations. Consistent with the
findings of Camerer et al. (9), the estimated wage elasticities (i.e., the coefficients of “Log
hourly wage”) were negative for all months. The values found in the current analysis were even more
strongly negative than those of Camerer et al. (9), who found an elasticity of -0.618 in the most
extreme case, when they included driver fixed effects. The night shift variable had a greater
influence on the dependent variable (i.e., log hours) than the weather-related variables in all
cases.

Possible measurement errors resulting from the simplistic computation of the average hourly wage
(i.e., dividing total earnings by reported hours) were addressed by introducing instrumental
variables into the model. In particular, such errors could produce high-hours/low-wage or
low-hours/high-wage observations, both of which exaggerate the negativity of the estimated wage
elasticity (9; 10). Instruments should be selected so that they are not correlated with the
measurement error in hours. Camerer et al. (9) used summary statistics of the distribution of hourly
wages of other drivers who worked on the same day, assuming that the average wage of other drivers
should be uncorrelated with a driver’s own measurement error. Thus, the downward bias in the wage
elasticity estimate due to overstated or understated hours could be reduced with the help of summary
statistics (25th percentile, median and 75th percentile) of the wage distribution of other drivers.
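
The idea behind the instruments can be sketched with a single instrument. This is a deliberate
simplification: the paper uses three instruments (25th percentile, median and 75th percentile), and
the toy data show generic measurement error in a regressor rather than the exact hours-in-the-
denominator mechanism. All names and numbers below are hypothetical.

```python
from statistics import median


def others_median_wage(wages, i):
    """Median wage of all drivers on a day except driver i (the instrument)."""
    return median(w for j, w in enumerate(wages) if j != i)


def _cov(a, b):
    """Sum of cross-products of deviations (unnormalized covariance)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    return _cov(x, y) / _cov(x, x)


def iv_slope(z, x, y):
    """Single-instrument IV estimate: cov(z, y) / cov(z, x)."""
    return _cov(z, y) / _cov(z, x)


# Toy data: y depends on the true regressor with slope 3, the regressor is
# observed with error, and z is related to the true regressor but is
# uncorrelated with the error.
z = [1.0, 2.0, 3.0, 4.0]
x_true = [2.0, 4.0, 6.0, 8.0]
error = [1.0, -1.0, -1.0, 1.0]
x_obs = [a + e for a, e in zip(x_true, error)]
y = [3.0 * a for a in x_true]

biased = ols_slope(x_obs, y)
corrected = iv_slope(z, x_obs, y)
```

In this toy example the OLS slope on the mismeasured regressor is 2.5, while the IV estimate
recovers the true slope of 3.0.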

TABLE 3: OLS Estimation Results
                             January 2013   April 2013   July 2013   October 2013
Log hourly wage              -.6564         -.6614       -.6474      -.6495
                             (.002)         (.002)       (.002)      (.002)
Night                        .1313          .1274        .1352       .1280
                             (.001)         (.001)       (.001)      (.001)
Rain³                        .0087          .0011        .0453       .0127
                             (.001)         (.001)       (.001)      (.001)
High Temp.*                  -              -            .0141       .107
                                                         (.001)      (.003)
Sample size (# of shifts)    640,084        641,088      641,534     665,725
Adjusted R²                  .143           .126         .131        .124
Notes:
Values in parentheses are standard errors.
*: High temperature variable was omitted for January and April due to collinearity.
All coefficients were significant at the 1 percent level.

Table 4 gives the instrumental variable regression results along with the first-stage regressions of
the log hourly wage on the instrumental variables. The adjusted R² values are comparable with the
model fits in Camerer et al. (9), in the sense that their models with larger samples had fit
measures as low as 0.019, whereas their fits improved dramatically when the sample was small; for
example, a model with only eight drivers had an adjusted R² of 0.642 (see Table III in their paper).
Thus, considering the very large data size used in the models shown in Table 4, the model fits can
be considered reasonable. Surprisingly, the model results did not agree with the findings of Camerer
et al. (9): for three out of four data sets, the sign of the wage elasticity turned positive. All
instrumental variable coefficients were significantly different from zero. Using the same
instrumental variables and the same methodology, Camerer et al. (9) found negative wage elasticities
in five of their six models. The single model with a positive wage elasticity (i.e., the one using
the TRIP data set with driver fixed effects) showed a negative elasticity when instrumental
variables were not used.

³ Historical weather reports were downloaded in comma-separated text file format from
http://www.wunderground.com/ (accessed 7/16/14).
The only negative wage elasticity was found for the month of January 2013, indicating
possible income-targeting behavior among the drivers in this month. One interesting feature of
January is that it has the second lowest number of total trips among the four months
(see Table 2). Considering the negative impact of rainy (or snowy) days on work hours, drivers
setting daily targets and quitting after reaching their expected earnings could make more sense,
especially when compared to working conditions in other months. Moreover, the median of the average
hourly wage is the second lowest and the standard deviation of the average hourly wage is the highest in
January among the four months. The higher variation in wages could be an indication of driver
preference heterogeneity, which can be modeled using more advanced approaches such
as Mixed Logit (21). Driver fixed effects were generally employed in the literature; however, for
large-scale data sets including thousands of drivers, introducing driver-specific dummy
variables into the model is not computationally feasible.
TABLE 4: Instrumental Variables Regression Results

                          January, 2013   April, 2013   July, 2013   October, 2013
Log hourly wage           -.2309          1.031         .3052        .599
                          (.014)          (.026)        (.011)       (.019)
Night                     .1147           .0265         .1122        .0756
                          (.001)          (.002)        (.001)       (.002)
Rain                      -.0136          .0243         .0137        .0306
                          (.001)          (.002)        (.001)       (.002)
High Temp.*               -               -             .005         .114
                                                        (.001)       (.068)
First-Stage Regressions
Median                    -.0294          -.0067        -.012        -.008
                          (.001)          (.001)        (.002)       (.001)
25th percentile           .0302           .0193         .025         .0229
                          (.001)          (.001)        (.001)       (.001)
75th percentile           .0262           .0119         .0163        .0114
                          (.001)          (.001)        (.001)       (.001)
Adjusted R2               .087            .035          .068         .124
P-value for instruments   .000            .000          .000         .000
Notes:
-Values in parentheses are the standard errors.
*:
-All coefficients were significant at 1 percent level.

The strong positive elasticity observed in April, on the other hand, could again be
related to the demand for taxicabs, which is highest during the spring months of the
year. Drivers might be more likely to drive longer hours while demand is at its yearly peak,
and behave more consistently with the neoclassical labor supply model to avoid the opportunity cost.
One interesting extension to the current analysis could be including the experience levels of drivers in the
models. Camerer et al. (9) conducted an analysis based on the hack license numbers, which were
issued in order (i.e., smaller numbers meaning more experience), and found that the wage elasticities of
more experienced drivers were higher than those of less experienced drivers. That is, income targeting was
less common among experienced drivers. In our data, however, the hack license numbers were
anonymized; therefore, it was not possible to validate this finding. For the July and October data,
positive wage elasticities were again found, but with much smaller magnitudes compared to the April data.
The “log hours function” analysis yielded three main observations. First, calculating the average
hourly wage by dividing the entire daily income by total work hours did not give
consistent results when instrumental variables were introduced. This econometric shortcoming
was questioned earlier by Farber (14) but was not tested empirically, mainly because of the lack of
sufficient data to calculate the instrumental variables in the form of statistical summaries
of other drivers’ wages. It was empirically shown in the current study that using the same instrumental
variables as in Camerer et al. (9) could change the sign of the wage elasticity, as happened for
three out of four data sets in this study. Therefore, similar model specifications using a broad
average hourly wage must be implemented with care. Second, it was still not possible to reject
income-targeting behavior, since the January data indicated negative wage elasticities with and without
instrumental variables. Third, varying demand levels for taxicabs could have an effect on income
targeting, as the April data showed a significantly positive wage elasticity when demand was the
highest in the study year. The latter two findings could imply seasonal variation
in the income-targeting behavior of drivers.
Table 5 shows the marginal probability effects from the “discrete choice stopping model”.
Cumulative work hours (i.e., total work hours including breaks, cruising, and time with passengers),
cumulative on-duty hours with passengers, cumulative wage, and control variables (i.e., night shift and
weather conditions) were considered as independent variables. The effects of the variables on the
stopping probability were evaluated at two different key points. The first point used the same values
as Farber (14) (8.0 hours total work, $150 shift income), and the second point used
an updated daily income level based on Table 2 (8.0 hours total work, $300 shift income).
The estimated coefficients of cumulative hours and cumulative wage had positive signs in all
cases, which accords with intuition. The sensitivity to the selected evaluation point was also
reasonable: the marginal effect on the probability of stopping increased with the higher
cumulative shift income at the second evaluation point.
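The independent variables listed above are built per trip within a shift. The following is a minimal sketch of that bookkeeping, assuming one stopping decision at the end of each trip and a shift clock that starts at the first pickup. The `Trip` fields and the function name are hypothetical and do not come from the TPEP data dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Trip:
    pickup_hr: float   # pickup time in hours (hypothetical field)
    dropoff_hr: float  # dropoff time in hours (hypothetical field)
    fare: float        # income from this trip in dollars (hypothetical field)


def stopping_rows(shift: List[Trip]) -> List[Dict[str, float]]:
    """Build one observation per trip: cumulative regressors plus a stop indicator."""
    trips = sorted(shift, key=lambda t: t.pickup_hr)
    shift_start = trips[0].pickup_hr
    occupied_hours = 0.0
    income = 0.0
    rows = []
    for k, trip in enumerate(trips):
        occupied_hours += trip.dropoff_hr - trip.pickup_hr
        income += trip.fare
        rows.append({
            # total time since shift start, including breaks and cruising
            "cum_hours": trip.dropoff_hr - shift_start,
            "cum_hours_with_passengers": occupied_hours,
            "cum_income_100": income / 100.0,
            # the driver "stops" after the last trip of the shift
            "stop": 1 if k == len(trips) - 1 else 0,
        })
    return rows


example_shift = [Trip(8.0, 8.5, 12.0), Trip(9.0, 9.25, 8.0), Trip(10.0, 11.0, 40.0)]
rows = stopping_rows(example_shift)
for row in rows:
    print(row)
```

Each row then serves as one observation of the stop/continue decision, with the control variables (night shift, weather) joined on separately.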
The probability of stopping was significantly related to wage and hours worked for all of
the study months. The effects of cumulative wage were always higher than those of cumulative hours, both
of which had significant coefficients at the one percent level. This observation was consistent with
the results seen in the literature when only two variables, cumulative hours and cumulative wages,
were used (14; 17). However, when the dummy variables were included in the models, those
studies reported an insignificant effect of cumulative wage, while cumulative hours usually
remained significant. The results in Table 5 did not agree with these findings: cumulative
wage was still a significant factor in the stopping probability when the models were estimated with
control variables. Therefore, this finding suggested that daily target earnings were an important
factor in taxicab drivers’ labor supply. The effect of a $10 increase in income on the probability of
stopping work was found to be 0.24 to 0.48 percent for a driver who had already earned $300 for that
day.
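To make the notion of a marginal effect at an evaluation point concrete, the sketch below assumes a probit form for the stopping model (the specification is not restated in this section) and uses made-up coefficients, not the paper's estimates. For a probit, the marginal effect of a variable is the standard normal density at the index value times that variable's coefficient, so it depends on where it is evaluated.

```python
import math

# Hypothetical probit coefficients for illustration only (NOT from Table 5).
BETA = {"const": -3.0, "cum_hours": 0.05, "cum_income_100": 0.30}


def normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def index(hours: float, income_100: float) -> float:
    """Linear index x'b at a given evaluation point."""
    return (BETA["const"]
            + BETA["cum_hours"] * hours
            + BETA["cum_income_100"] * income_100)


def stop_probability(hours: float, income_100: float) -> float:
    return normal_cdf(index(hours, income_100))


def marginal_effect(var: str, hours: float, income_100: float) -> float:
    """Probit marginal effect: dP(stop)/d(var) = phi(x'b) * b_var."""
    return normal_pdf(index(hours, income_100)) * BETA[var]


# The two evaluation points used in the text: 8.0 hours with $150 or $300 of income.
me_150 = marginal_effect("cum_income_100", 8.0, 1.5)
me_300 = marginal_effect("cum_income_100", 8.0, 3.0)

# A $10 increase is 0.1 in units of $100; express the change in percentage points.
effect_10_dollars_pp = me_300 * 0.1 * 100.0

print(f"marginal effect per $100 at $150: {me_150:.4f}")
print(f"marginal effect per $100 at $300: {me_300:.4f}")
print(f"effect of +$10 at $300: {effect_10_dollars_pp:.3f} percentage points")
```

With a negative index at both points, the normal density rises as income increases, which is why the marginal effect is larger at the $300 point than at the $150 point in this sketch.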
TABLE 5: Marginal Effects Based on Stopping Probability: Discrete Choice Stopping Model

                                  January, 2013     April, 2013       July, 2013        October, 2013
                                  (1)      (2)      (1)      (2)      (1)      (2)      (1)      (2)
Cumulative Hours                  .006     .013     .004     .010     .004     .009     .004     .009
                                  (.001)   (.001)   (.001)   (.001)   (.001)   (.001)   (.001)   (.001)
Cumulative Hours with Passengers  -.010    -.023    -.006    -.013    -.004    -.008    -.004    -.007
                                  (.001)   (.001)   (.001)   (.001)   (.001)   (.001)   (.001)   (.001)
Cumulative Wage/100               .032     .076     .025     .056     .027     .058     .023     .050
                                  (.001)   (.001)   (.001)   (.001)   (.001)   (.001)   (.001)   (.001)
Night                             .009     .021     .013     .030     .018     .038     .013     .029
                                  (.001)   (.001)   (.001)   (.001)   (.001)   (.001)   (.001)   (.001)
Rain                              -        -        -.004    -.011    .001     .001     .002     .004
                                                    (.001)   (.001)   (.001)   (.001)   (.001)   (.001)
High Temp.*                       -        -        -        -        .001     .002     -.016    -.034
                                                                      (.001)   (.001)   (.001)   (.001)
Sample Size (# of taxi trips)     13,671,075        14,074,120        12,806,892        14,000,203
Pseudo R2                         .140              .161              .151              .156
Notes:
(1): Evaluation point for marginal effects: Cumulative Total Hours = 8.0, Cumulative Income/100 = 1.5
(2): Evaluation point for marginal effects: Cumulative Total Hours = 8.0, Cumulative Income/100 = 3.0
*:
-Values in parentheses are the standard errors.
-All coefficients were significant at 1 percent level. The coefficient for the “Rain” variable was insignificant for
the January, 2013 estimate and was therefore excluded from the table.
-Trip shifts that lasted less than 0.5 hours were excluded from the analysis.
Moreover, the variation in the marginal effect of daily income among the months in the
“discrete choice stopping model” interestingly matched the findings from the “log hour function”
model. For example, the January data showed the highest marginal effect of cumulative income (.076)
on the stopping decision, as shown in Table 5. In other words, drivers were more likely to stop
after reaching the $300 evaluation point for cumulative income in January. Similarly, as shown in
Table 3 and Table 4, significant negative wage elasticities were found for the January data.
Therefore, two completely different methodologies were found to support income-targeting
behavior of drivers for the same data set. The April data in Table 4 showed strong positive wage
elasticities in the “log hour model”, i.e., the opposite of income-targeting behavior, whereas the results in
Table 5 for the “discrete choice stopping model” showed the second lowest magnitude of the marginal
effect of income (.056) on the decision to stop driving. Thus, even if income-targeting
behavior exists for the month of April, its impact was not as strong as in two
other months, namely January and July.
CONCLUSION
This paper deals with taxicab drivers’ labor supply behavior. In particular, the goal of this
paper is to take advantage of new Big Data to shed additional light on the well-established
theories of labor supply in response to transitory wage changes, and to demonstrate the usefulness of
such data in the context of well-studied economic theories, especially those with major implications for
urban transportation. One potential use of these models is to understand how changes in demand
for traditional taxi services, resulting from the introduction of new technologies,
affect supply.
This study adds to the literature by presenting an empirical investigation of the income-targeting
approach. The data used in this study were collected under TPEP and included almost every taxi
trip in NYC during January, April, July, and October of 2013. The main objective of this
study is to investigate the relationship between income and hours worked by taxicab drivers based
on this large data set and to seek evidence for the income-targeting hypothesis, which has been a topic
of debate in the labor economics literature. The methodologies used in this study were purposely
borrowed from the earlier contributions of Camerer et al. (9) and Farber (14), and the results based on
the new data were compared to their findings.
The findings were interesting in that they indicated several differences in estimations that
could lead to different interpretations, as seen in previous studies. For the “log-hour function”
approach, simple OLS estimation showed evidence for income-targeting behavior as proposed
in Camerer et al. (9). However, when instrumental variables were included in the model to reduce
possible biases due to measurement errors in the average hourly wage, positive wage elasticities were
found for three of the four data sets (9). The income-targeting assumption still could not be rejected,
since negative wage elasticities were estimated for the January data. It is hypothesized that seasonal
taxicab demand could be a determining factor in the labor supply response of drivers. For the “discrete
choice stopping model”, both cumulative income and cumulative hours worked were found to
affect the probability of stopping. Farber (14) and Crawford and Meng (17) found income to be an
insignificant factor and total hours worked to be a major factor in similar models. On the other
hand, the two different methodologies predicted somewhat similar variation among the study months:
January was found to be the month in which income-targeting behavior was most supported.

Both models used in this study are well established and widely used, which has led to the
development of various extensions. Empirical testing, however, needs more work to better
understand the real-world implications of these models. The results presented in this study showed
the potential of Big Data for conducting unique empirical analyses of economic theories where
limited data can significantly bias analysis results. The provided analysis has many implications
for the reliability of taxicab service. For example, a forthcoming study by Farber
(22) discusses the reasons for the undersupply of taxicabs during rainy weather. Using large-scale
taxicab data, Farber (22) suggests that a rain surcharge can be used to increase the taxicab supply.
Future similar studies should include detailed spatial characteristics of taxi trips. Although
Farber (14) found a weak relation between location and key labor supply model parameters, i.e.,
income and hours worked, an estimation using an increased sample size could be worth investigating.
One limitation of the provided analysis was the lack of information about driver
experience, which was shown to have an impact on labor supply (9). However, large taxi data
sets are usually provided with anonymized driver information due to privacy concerns; therefore,
further cooperation with city agencies or taxi medallion companies is needed to obtain data about
driver experience and similar characteristics. It is important to note that new taxi data continue
to become available, and it will be possible to look for changes in traditional taxi service
demand and supply as a result of online taxi-hailing applications. This remains a topic of future
research.
ACKNOWLEDGMENTS AND DISCLAIMER
This study was partially funded by the Polytechnic School of Engineering and the Center for Urban
Science and Progress (CUSP) of New York University. The authors are solely responsible for the
findings presented in this research work.
REFERENCES
[1] Schaller, B. A regression model of the number of taxicabs in US cities. Journal of Public Transportation, Vol. 8, No. 5, 2005, p. 63.
[2] Bilton, N. Disruptions: Taxi supply and demand, priced by the mile. The New York Times, Jan, Vol. 8, 2012.
[3] Rempel, J. A Review of Uber, the Growing Alternative to Traditional Taxi Service. American Foundation for the Blind (AFB) AccessWorld Magazine, Vol. 15, No. 6, 2014.
[4] Uber. What Is Surge Pricing And How Does It Work? https://support.uber.com/hc/en-us/articles/201836656-What-is-surge-pricing-and-how-does-it-work- Accessed on 11/15/2014, 2014.
[5] Thompson, D. How Uber's Taxi App is Changing Cities. http://www.theatlantic.com/business/archive/2013/11/how-ubers-taxi-app-is-changing-cities/281451/, 2013.
[6] Schaller, B. Issues in fare policy: Case of the New York taxi industry. Transportation Research Record: Journal of the Transportation Research Board, Vol. 1618, No. 1, 1998, pp. 139-142.
[7] Yazici, M. A., C. Kamga, and A. Singhal. A big data driven model for taxi drivers' airport pick-up decisions in New York City. In Big Data, 2013 IEEE International Conference on, IEEE, 2013, pp. 37-44.
[8] Kamga, C., and M. A. Yazıcı. Temporal and weather related variation patterns of urban travel time: Considerations and caveats for value of travel time, value of variability, and mode choice studies. Transportation Research Part C: Emerging Technologies, 2014.
[9] Camerer, C., L. Babcock, G. Loewenstein, and R. Thaler. Labor supply of New York City cabdrivers: One day at a time. The Quarterly Journal of Economics, 1997, pp. 407-441.
[10] Chou, Y. K. Testing alternative models of labour supply: Evidence from taxi drivers in Singapore. The Singapore Economic Review, Vol. 47, No. 01, 2002, pp. 17-47.
[11] Lucas, R. E., and L. A. Rapping. Real wages, employment, and inflation. The Journal of Political Economy, 1969, pp. 721-754.
[12] Kahneman, D., and A. Tversky. Prospect theory: An analysis of decision under risk. Econometrica: Journal of the Econometric Society, 1979, pp. 263-291.
[13] Tversky, A., and D. Kahneman. Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, Vol. 5, No. 4, 1992, pp. 297-323.
[14] Farber, H. S. Is Tomorrow Another Day? The Labor Supply of New York City Cabdrivers. Journal of Political Economy, Vol. 113, No. 1, 2005.
[15] ---. Reference-dependent preferences and labor supply: The case of New York City taxi drivers. The American Economic Review, Vol. 98, No. 3, 2008, pp. 1069-1082.
[16] Kőszegi, B., and M. Rabin. A model of reference-dependent preferences. The Quarterly Journal of Economics, 2006, pp. 1133-1165.
[17] Crawford, V. P., and J. Meng. New York City cab drivers' labor supply revisited: Reference-dependent preferences with rational expectations targets for hours and income. The American Economic Review, Vol. 101, No. 5, 2011, pp. 1912-1932.
[18] Doran, K. Are long-term wage elasticities of labor supply more negative than short-term ones? Economics Letters, Vol. 122, No. 2, 2014, pp. 208-210.
[19] Schaller Consulting. NYC Taxicab Fact Book. New York, 2006.
[20] New York City Taxi and Limousine Commission (NYCTLC). Taxicab Fact Book 2014. 2014.
[21] McFadden, D., and K. Train. Mixed MNL models for discrete response. Journal of Applied Econometrics, Vol. 15, No. 5, 2000, pp. 447-470.
[22] Farber, H. S. Why You Can't Find a Taxi in the Rain and Other Labor Supply Lessons from Cab Drivers. National Bureau of Economic Research, 2014.
