
An Empirical Model of the Medical Match

Nikhil Agarwal
Harvard University

Job Market Paper

November 18, 2012

Abstract

This paper develops a framework for estimating preferences in two-sided matching markets
with non-transferable utility using only data on observed matches. Unlike single-agent choices,
matches depend on the preferences of other agents in the market. I use pairwise stability together
with a vertical preference restriction on one side of the market to identify preference parameters
for both sides of the market. Recovering the distribution of preferences is only possible in
an environment with many-to-one matching. These methods allow me to investigate two issues
concerning the centralized market for medical residents. First, I examine the antitrust allegation
that the clearinghouse restrains competition, resulting in salaries below the marginal product
of labor. Counterfactual simulations of a competitive wage equilibrium show that residents'
willingness to pay for desirable programs results in estimated salary markdowns ranging from
$23,000 to $43,000 below the marginal product of labor, with larger markdowns at more desirable
programs. Therefore, a limited number of positions at high quality programs, not the design of
the match, is the likely cause of low salaries. Second, I analyze wage and supply policies aimed at
increasing the number of residents training in rural areas while accounting for general equilibrium
effects from the matching market. I find that financial incentives increase the quality, but not
the number of rural residents. Quantity regulations increase the number of rural trainees, but
the impact on resident quality depends on the design of the intervention.

JEL : C51, C78, D47, J41, J44, L44


Keywords: Resident matching, discrete choice, antitrust, rural hospitals,
compensating differentials, competitive equilibrium

I am grateful to my advisors Ariel Pakes, Parag Pathak, Susan Athey and Al Roth for their constant support
and guidance. I thank Atila Abdulkadiroglu, Raj Chetty, David Cutler, Rebecca Diamond, William Diamond, Adam
Guren, Guido Imbens, Dr. Joel Katz, Larry Katz, Greg Lewis, Jacob Leshno, Julie Mortimer, Joseph Newhouse, Mark
Shepard, Dr. Debra Weinstein and workshop participants at Harvard University for helpful discussions, suggestions
and comments. Data acquisition for this project was funded by the Lab for Economic Applications and Policy and
the Kuznets Award. Financial support from the NBER Nonprofit Fellowship and Yahoo! Key Scientific Challenges
Program is gratefully acknowledged. Computations for this paper were run on the Odyssey cluster supported by the
FAS Science Division Research Computing Group at Harvard University. Email: agarwal3@fas.harvard.edu.
1 Introduction
Each year, the placement of about 25,000 medical residents and fellows is determined via a centralized
clearinghouse known as the National Resident Matching Program (NRMP) or "the match."
During the match, applicants and residency programs list their preferences over agents on the other
side of the market, and a stable matching algorithm uses these reported ranks to assign applicants
to positions. Agents on both sides of the market are heterogeneous but salaries paid by residency
programs are not individually negotiated with residents. Therefore, preferences of residents and
programs, rather than prices, determine equilibrium outcomes. The medical match is iconic for the
stable matching literature, but with few exceptions this literature has been primarily theoretical.
In particular, there is little evidence on the effects of government policies or the design of the market.
These interventions can substantially affect the physician workforce in the United States because
medical residents are a key component of current and future physician labor.1
This paper develops new techniques for recovering the preferences of both the residency
programs and residents (market primitives) using data only on final matches. The method may be
useful for studying other matching markets because data on matches is common compared to stated
preferences. As in the medical match, these primitives are important determinants of outcomes in
matching markets when agents are heterogeneous and prices are not highly personalized. Examples
include schooling, colleges and many high-skilled labor markets.
I estimate the model using data from the market for family medicine residents in the U.S. to
empirically analyze two issues that have received particular attention from academic researchers
as well as policy makers. First, I investigate the antitrust allegation that the centralized market
structure is responsible for the low salaries paid to residents. The plaintiffs in a 2002 lawsuit argued
that the match limited the bargaining power of the residents because salaries are set before ranks are
submitted. They reasoned that a "traditional market" would allow residents to use multiple offers
and wage bargaining to make programs bid for their labor. Using a perfect competition model as
the alternative, they argued that the large salary gap between residents and nurse practitioners or
physician assistants is a symptom of competitive restraints imposed by centralization. Although the
lawsuit was dismissed due to a legislated congressional exception, it sparked an academic debate on
whether inflexibility results in low salaries (Bulow and Levin, 2006; Kojima, 2007). Observational
studies of medical fellowship markets do not find an association between low salaries and the
presence of a centralized match (Niederle and Roth, 2003, 2009). While these studies strongly
suggest that the match is not the primary cause of low salaries in this market, they do not explain
why salaries in decentralized markets remain lower than the perfect competition salary benchmark
suggested by the plaintiffs. I use a stylized theoretical model to show that residents' preferences for
programs result in an "implicit tuition" that depresses salaries in a decentralized market. I then
quantify the magnitude of this markdown using estimates from the empirical model.
1
According to the "2011 State Physician Workforce Data Book" (www.aamc.org/workforce), in 2010, 678,324 physicians
were reported as actively involved in patient care, whereas 110,692 residents and fellows were in training programs.

Second, I study policy interventions for lowering the perceived under-supply of residents and
physicians in rural areas of the U.S. Although a fifth of the U.S. population lives in rural areas, less
than a tenth of physicians practice in rural communities (Rosenblatt and Hart, 2000). The Patient
Protection and Affordable Care Act of 2010 addresses the shortage of rural physicians by funding
an increase in the number of residency programs in rural areas, redistributing unused Medicare
funds originally allocated for residency training in urban hospitals, and increasing the funding of
loan forgiveness programs used to recruit physicians to shortage areas. Broadly speaking, the act
uses a combination of supply interventions and financial incentives to address the disparity in access
to care. Such regulations are not unique to the United States. Recently, Japan reduced capacities
in urban residency programs to mitigate their rural resident shortage (Kamada and Kojima, 2010).
Similar regulations affecting prices and quantities are common in a variety of matching markets
but their effects on assignments are not well understood.2
Analyzing the general equilibrium effects of government policy as well as predicting outcomes
under alternative market structures using counterfactual simulations requires estimates of the preferences
of both sides of the market. Direct data on these market primitives is frequently not available.
Although the rank order lists submitted by residents and programs are collected by the NRMP,
they are highly confidential. Preference lists may not even be collected in other labor or matching
markets. When only data on final matches are available, it is not immediately clear how to use
these data to estimate preferences.
This paper develops methods for estimating preferences using only data on final matches. The
techniques apply to a many-to-one two-sided matching market with low frictions. Motivated by
properties of the mechanism used in the medical match, I assume that the final matches are pairwise
stable (Roth and Sotomayor, 1992). According to this equilibrium concept, no two agents
on opposite sides of the market prefer each other over their match partners at pre-determined
salary levels. Following the discrete choice literature, I model the preferences of each side of the
market over the other as a function of characteristics of residents and programs, some of which
are known to market participants but not to the econometrician. I use the pure characteristics
model of Berry and Pakes (2007) for the preferences of residents for programs. This model allows
for substantial heterogeneity in the preferences. However, a similarly flexible model for the programs'
preferences for residents raises identification issues and other methodological difficulties due
to multiple equilibria. In the medical residency market, anecdotal evidence suggests that residents
are largely vertically differentiated in skill because academic record and clinical performance are
the main determinants of a resident's desirability to a program.3 These factors are not observed
in the dataset but should be accounted for. I therefore restrict attention to a model in which the
2
Tuition regulations in public universities and financial aid programs are a salient example of price interventions
in matching markets. Schooling reforms establishing new public schools or closing dysfunctional school programs are
common interventions that directly affect supply.
3
Conversations with Dr. Katz, Program Director of Internal Medicine Residency Program at Brigham and
Women's Hospital, suggest that while programs have some heterogeneous preferences for resident attributes, the
primary trend is that better residents get their pick of programs ahead of less qualified residents. Further, academic
and clinical record, and recommendation letters are the primary indicators used to determine resident quality.

programs' preferences for residents are homogeneous and allow for an unobservable determinant of
resident skill. The assumption also implies the existence of a unique pairwise stable match and a
computationally tractable simulation algorithm.
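The uniqueness claim can be illustrated with a minimal sketch, which is not the paper's simulation code. It assumes that every program ranks residents by a single human capital index with no ties, that all agents find all partners acceptable, and that resident utilities are given as a matrix. The function names and array layout are hypothetical. Under these assumptions the pairwise stable match can be built by letting residents choose in descending order of human capital, and the second function checks that no blocking pair remains.

```python
import numpy as np

def stable_match_vertical(human_capital, utility, capacity):
    """Match residents to programs when all programs rank residents by human capital.

    utility[i, j] is resident i's utility from program j (assumed: no ties,
    every program acceptable). Returns match[i] = program index, or -1 if unmatched.
    """
    human_capital = np.asarray(human_capital)
    seats = np.array(capacity, dtype=int)           # remaining seats per program
    match = np.full(utility.shape[0], -1)
    for i in np.argsort(-human_capital):            # best resident chooses first
        open_programs = np.flatnonzero(seats > 0)
        if open_programs.size == 0:
            break                                   # everyone left stays unmatched
        j = open_programs[np.argmax(utility[i, open_programs])]
        match[i] = j
        seats[j] -= 1
    return match

def blocking_pairs(match, human_capital, utility, capacity):
    """List (resident, program) pairs that would prefer each other to their match."""
    human_capital = np.asarray(human_capital)
    pairs = []
    n_res, n_prog = utility.shape
    for j in range(n_prog):
        members = np.flatnonzero(match == j)
        has_vacancy = members.size < capacity[j]
        for i in range(n_res):
            if match[i] == j:
                continue
            current = utility[i, match[i]] if match[i] >= 0 else -np.inf
            resident_prefers = utility[i, j] > current
            # Program j accepts i if it has a free seat or i beats its weakest resident.
            program_accepts = has_vacancy or (
                members.size > 0 and human_capital[i] > human_capital[members].min()
            )
            if resident_prefers and program_accepts:
                pairs.append((i, j))
    return pairs
```

Because each resident picks her favorite program among those not yet filled by more skilled residents, any program she prefers to her assignment is full of residents the program ranks above her, so no blocking pair can form.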
The empirical strategy must confront the fact that "choice sets" of agents in the market are
not observed because they depend on the preferences of other agents in the market. Instead of
a standard revealed preference approach, I identify the model using observed sorting patterns
between resident and program characteristics, and information only available in an environment
with many-to-one matching. For example, residents from more prestigious medical schools sort
into larger hospitals if medical school prestige is positively associated with human capital and
hospital size is preferable. If residents from prestigious medical schools have higher human capital,
they will not sort into larger hospitals if small hospitals are preferable. Furthermore, the degree of
assortativity between medical school prestige and hospital size increases with the weight agents place
on these characteristics when making choices. However, sorting patterns alone are not sufficient
for determining the parameters of the model. A high weight on medical school prestige and a low
weight on hospital size results in a similar degree of sorting as a high weight on hospital size and
low weight on medical school prestige. Fortunately, data from many-to-one matches has additional
information that assists in identification. In a pairwise stable match, all residents at a given
program must have similar human capital. Otherwise, the program can likely replace the least
skilled resident with a better resident. Because the variation in human capital within a program is
low, the variation in residents' medical school prestige within programs is small if medical school
prestige is highly predictive of human capital. The within-program variation in medical school
prestige decreases with the correlation of human capital with medical school prestige. Note that
it is only possible to calculate the within-program variation in a resident characteristic if many
residents are matched to the same program. Finally, to learn about heterogeneity in preferences, I
use observable characteristics of one side of the market that are excluded from the preferences of
the other side. These exclusion restrictions shift the preferences of, say, residents without affecting
the preferences of programs, thereby allowing sorting on excluded characteristics to be interpreted
in terms of preferences.
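The role of within-program variation can be seen in a small simulation. This is only a sketch under strong simplifying assumptions that are not the paper's specification: residents also agree on the ranking of programs, so the stable match is perfectly assortative on human capital, and human capital is a standardized mix of observed medical school prestige and unobserved noise with correlation `rho`. The function name and default sizes are hypothetical.

```python
import numpy as np

def within_program_share(rho, n_programs=200, slots=8, seed=0):
    """Share of the variance in medical school prestige that lies within programs.

    Assumption: human capital = rho * prestige + sqrt(1 - rho^2) * noise, and the
    match is perfectly assortative, so each program receives `slots` residents
    that are adjacent in the human capital ranking.
    """
    rng = np.random.default_rng(seed)
    n = n_programs * slots
    prestige = rng.standard_normal(n)
    noise = rng.standard_normal(n)
    human_capital = rho * prestige + np.sqrt(1.0 - rho**2) * noise
    order = np.argsort(-human_capital)               # best residents first
    by_program = prestige[order].reshape(n_programs, slots)
    within = by_program.var(axis=1).mean()           # average within-program variance
    return within / prestige.var()

for rho in (0.1, 0.5, 0.9):
    print(rho, round(within_program_share(rho), 3))
```

The printed share falls as `rho` rises: when prestige is a strong predictor of human capital, residents matched to the same program look alike on prestige, which is the extra information that many-to-one data provide beyond sorting patterns.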
I estimate the model using the method of simulated moments (McFadden, 1989; Pakes and
Pollard, 1989), and data from the market for family medicine residents between 2003 and 2010.
Approximately 430 programs and 3,000 medical residents participate in this market each year.
Moments used in estimation include summaries of the sorting patterns observed in the data and
the within-program variation in observable characteristics of the residents. The small number of
markets and the interdependence of observed matches create additional challenges for estimation
and inference. Instead of considering asymptotic approximations based on independently sampled
matches or many markets, I mimic a data generating process in which the market grows in size.
The characteristics of the market participants are drawn iid from a population distribution and the
pairwise stable match for the realized market is observed. The dependence of matches on characteristics
of all agents necessitates the use of a parametric bootstrap for constructing confidence sets

for the estimated parameter.4
I show how to modify the model to correct for potential endogeneity between salaries and un-
observed program characteristics. The technique is based on a control function approach and relies
on the availability of an instrument that is excludable from the preferences of the residents (see
Heckman and Robb, 1985; Blundell and Powell, 2003; Imbens and Newey, 2009). This approach
can be used in other applications in labor markets where endogeneity may arise from compensating
differentials or other influences on equilibrium wages. For this setting, I construct an instrument
using Medicare's reimbursement rates to competitor residency programs, which are based on regulations
enacted in 1985. The results from the instrumented version of the model are imprecise but
indicate that salaries are likely positively correlated with unobservable program quality.
I assess the fit of the model, both in-sample and out-of-sample. The out-of-sample fit uses the
most recent match results, taken from the 2011-2012 wave of the census. These data were not
accessed until estimates were obtained. The observed sorting patterns for resident groups mimic
those predicted by the model, both in-sample and out-of-sample, suggesting that the model is
appropriately specified.
Counterfactual simulations are used to analyze the issues related to the lawsuit and policy interventions
for rural training. In the lawsuit, the plaintiffs used a perfect competition model to
argue that residents' salaries are lower than those paid to substitute health professionals because
of the match. This reasoning does not account for the effects of the limited supply of heterogeneous
programs and residents. A shortage of desirable residency programs due to accreditation
requirements may lower salaries at high quality programs. Symmetrically, highly skilled residents
can bargain for higher compensation because they are also in limited supply. Equilibrium salaries
under competitive negotiations are influenced by both of these forces. I use a stylized model to
show that when residents value program quality, salaries in every competitive equilibrium are well
below the benchmark level suggested by the plaintiffs. The markdown is due to an implicit tuition
arising from residents' willingness to pay for training at a program, and is in addition to any
costs of training passed through to the residents. I estimate an average implicit tuition of at least
$23,000, with larger implicit tuitions at more desirable programs. Although imprecisely estimated,
estimates from models using wage instruments are much higher, at $43,000. The results weigh
against the plaintiffs' claim that in the absence of competitive restraints imposed by the match,
salaries paid to residents would be equal to the marginal product of their labor, close to salaries
of physician assistants and nurse practitioners. At a median salary of $86,000, physician assistants
earn approximately $40,000 more than medical residents. The upper-end of the estimated implicit
tuition can explain this difference. These results imply that the low salaries observed in this market
and those observed by Niederle and Roth (2003, 2009) in the related medical fellowship markets
without a match are due to the implicit tuition, not the design of the match.
Second, regulations aimed at increasing the number of residents in rural areas also affect sorting
4
Agarwal and Diamond (in progress) studies asymptotic theory for a single large market and the special case with
homogeneous preferences on both sides. Monte Carlo evidence suggests that the root mean squared error drops with
sample size and confidence sets have close to correct coverage.

through general equilibrium effects. A reduction in urban training positions displaces residents
who can in turn displace other residents who get assigned elsewhere. Financial incentives for rural
training and increases in the number of positions in rural areas cause similar re-sorting. The net
impact of policy interventions is a function of the preferences of both residents and programs as
well as the overall composition of the market. Using estimates from the model, I show that financial
incentives have only a moderate effect on the number of residents matched to rural programs. An
incentive of $10,000 per year increases the number of residents in rural areas by about 17, or 5%
of the total number of positions in rural programs. At a total cost of $3.3 million, each additional
resident in a rural program costs $200,000 on average. This large per-resident cost arises because
most of the incentives accrue to residents occupying positions that would have been filled without
the incentive. Only 7.7% of rural residency positions are unfilled to begin with, which allows little
scope for salary incentives to increase numbers. Instead, the primary impact of this policy is an
increase in the quality of residents in rural areas. As expected, policy interventions directed at the
supply of positions are more effective at increasing the number of residents placed at rural programs.
Depending on the design of the regulation, supply interventions can either increase or decrease the
quality of residents matched at rural programs through general equilibrium re-sorting effects. I find
that a policy reducing positions offered in urban programs forces residents into rural programs, but
due to re-sorting, does not significantly lower the quality of residents matched at rural programs.
An increase in the number of positions offered in rural programs, on the other hand, increases the
quality of residents training in rural communities through disproportional take-up in higher quality
rural programs.
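The per-resident cost figure quoted above can be checked with back-of-the-envelope arithmetic. The sketch below uses only the rounded numbers stated in the text. The implied number of rural positions and the assumption that the incentive is paid to every resident in a filled rural position are inferences made here for illustration, not values reported in the paper.

```python
# Figures quoted in the text (rounded).
incentive_per_resident = 10_000      # dollars per year
added_residents = 17                 # additional rural residents
added_share_of_positions = 0.05      # 17 residents is about 5% of rural positions
unfilled_share = 0.077               # rural positions unfilled without the incentive

# Inferred quantities (assumptions for illustration only).
rural_positions = added_residents / added_share_of_positions
filled_before = rural_positions * (1 - unfilled_share)
filled_after = filled_before + added_residents

# Assumption: every resident in a filled rural position receives the incentive.
total_cost = incentive_per_resident * filled_after
cost_per_added_resident = total_cost / added_residents
inframarginal_share = filled_before / filled_after

print(f"implied rural positions:      {rural_positions:.0f}")
print(f"total cost:                   ${total_cost:,.0f}")
print(f"cost per additional resident: ${cost_per_added_resident:,.0f}")
print(f"share paid to inframarginal:  {inframarginal_share:.1%}")
```

Under these assumptions the total comes out close to the $3.3 million reported, and the cost per additional resident is a little under $200,000, with roughly 95% of the spending going to positions that would have been filled anyway.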
The empirical methods in this paper contribute to the recent literature on estimating preference
models using data from observed matches and pairwise stability in decentralized markets.5 The
majority of papers focus on estimating a single aggregate surplus that is divided between match
partners. Chiappori, Salanié, and Weiss (2011), Galichon and Salanié (2010), among others, build
on the seminal work of Choo and Siow (2006) for studying transferable utility models of the marriage
market in which an aggregate surplus is split between spouses. Fox (2008) proposes a different
approach for estimation, also for the transferable utility case, with applications in Bajari and Fox
(2005), among others. Sorensen (2007) is an example that estimates a single surplus function,
but in a non-transferable utility model. Another set of papers measures benefits of mergers using
similar cooperative solution concepts (Weese, 2008; Gordon and Knight, 2009; Akkus, Cookson,
and Hortacsu, 2012; Uetake and Watanabe, 2012). A common data constraint faced in many of
these applications is that monetary transfers between matched partners are often not observed, so
the possibility of estimating two separate utility functions is limited.
Since salaries paid by residency programs are observed, this paper can estimate preferences of
each of the two sides of the market, with salary as a (potentially endogenous) additional character-
5
See Fox (2009) for a survey. The approach of using pairwise stability in decentralized markets may yield a good
approximation of market primitives if frictions are low. Many studies are devoted to understanding the role of search
frictions as a determinant of outcomes in decentralized labor and matching markets (Mortensen and Pissarides, 1994;
Roth and Xing, 1994; Shimer and Smith, 2000; Postel-Vinay and Robin, 2002).

istic that is valued by residents. I use a non-transferable utility model because the salary paid by
a residency program is pre-determined. Similar models are estimated by Logan, Ho, and Newton
(2008) and Boyd et al. (2003), although in decentralized markets, with the goal of measuring pref-
erences for various characteristics. Logan, Ho, and Newton (2008) proposes a Bayesian method
for estimating preferences for mates in a marriage market with no monetary transfers. Boyd et al.
(2003) uses the method of simulated moments to estimate the preferences of teachers for schools
and of schools for teachers. Both papers use only sorting patterns in the data to estimate and
identify two sets of preference parameters. Agarwal and Diamond (in progress) prove that even
under a very restrictive model with no preference heterogeneity on either side of the market, sorting
patterns alone cannot identify the preference parameters of the model. Such non-identication can
yield unreliable predictions for both counterfactuals studied in this paper. To solve this problem, I
leverage information made available through many-to-one matches, in addition to sorting patterns,
for identifying two distributions of preferences.
The results on equilibrium salaries paid to residents may also be of independent interest for their
analysis of labor markets with compensating differentials, especially those with on-the-job training.
It is well known that compensating differentials can be an important determinant of salaries in labor
markets (Rosen, 1987). Stern (2004), for instance, finds that scientists often accept lower salaries
from firms that allow their employees to publish research. Previous theoretical work on markets
with on-the-job training has used perfect competition models to show that salaries are reduced
by the marginal cost of training (Rosen, 1972; Becker, 1975). Counterfactuals in this paper using
the competitive equilibrium model compute an implicit tuition at residency programs, which is a
markdown due to the value of training that is in addition to costs of training passed through to
the resident.
The paper begins with a description of the market for family medicine residents and the sorting
patterns observed in the data (Section 2). Sections 3 through 7 present the empirical framework used
to analyze this market, the identification strategy, the method for correcting potential endogeneity
in salaries, the estimation approach, and parameter estimates, respectively. These sections omit
details relevant exclusively to the applications related to the lawsuit and the analysis of policy
for encouraging rural training. Background for each issue is presented along with counterfactual
simulations in Sections 8 and 9 respectively. All technical details are relegated to appendices.

2 Market Description and Data


This paper analyzes the family medicine residency market from the academic year 2003-2004 to
2010-2011. The data are from the National Graduate Medical Education Census (GME Census)
which provides characteristics of residents linked with information about the program at which they
are training.6 Family medicine is the second largest specialty, after internal medicine, constituting
6
I consider all non-military programs participating in the match, accredited by the Accreditation Council for Graduate
Medical Education and not located in Puerto Rico. I restrict attention to residents matched with these programs.
Detailed description of all data sources, construction of variables, sample restrictions and the process used to merge

about one eighth of all residents in the match. Graduates from family medicine residency programs
provide the bulk of medical care in rural United States (Rosenblatt and Hart, 2000).
I focus on five major types of program characteristics: the prestige/quality of the program as
measured by NIH funding of a program's major and minor medical school affiliates;7 the size of the
primary clinical hospital as measured by the number of beds; the Medicare Case Mix Index as a
measure of the diagnostic mix a resident is exposed to; characteristics of program location such as
the median rent in the county a program is located in and the Medicare wage index as a measure
of local health care labor costs; and the program type indicating the community and/or university
setting and/or rural setting of a program.
Table 1 summarizes the characteristics of programs in the market. The market has approximately
430 programs, each offering approximately eight first-year positions. Except for program
type (community/university based), there is little annual variation in the composition of programs
in the market. Salaries paid to residents have roughly kept up with inflation with a distribution
compressed around $47,000 in 2010 dollars.8
In general, rural programs are smaller than urban programs. They typically consist of about five
residency positions, are at smaller hospitals as measured by the number of beds, and are affiliated
with medical schools with lower NIH funding. Even though family medicine physicians provide the
majority of care in rural communities where 20% of the US population resides, only about 10% of
residency positions in this specialty are in rural settings.
For residents, the data contains information on their medical degree type, characteristics of
graduating medical school and city of birth. Table 2 describes the characteristics of residents
matching with family medicine programs. The composition of this side of the market has also been
stable over this sample period with only minor annual changes. A little less than half the residents
in family medicine are graduates of MD granting medical schools in the US. A large fraction, about
40%, of residents obtained medical degrees from non-US schools while the rest have US osteopathic
(DO) degrees.9 One in ten US born medical residents are born in rural counties.

2.1 The Match


A prospective medical resident begins her search for a position by gathering information about
the academic curriculum and terms of employment at various programs from an online directory
and official publications. Subsequently, she electronically submits applications to several residency
programs which then select a subset of applicants to interview. On average, approximately eight
records are in Appendix E. Data on matches from the Graduate Medical Education Database, Copyright 2012,
American Medical Association, Chicago, IL.
7
Major affiliates of a program are directly affiliated medical schools of a program's primary clinical hospital. Other
medical school affiliations between programs and medical schools, via secondary rotation sites or other affiliates of
the primary clinical site, are categorized as minor. See data appendix for details.
8
Resident salaries after the first year are highly correlated with the first-year salary, with a coefficient that is close
to one and an R-squared of 0.8 or higher.
9
As opposed to allopathic medicine, osteopathy places more emphasis on the structural functions of the body and its
ability to heal itself. Osteopathic physicians obtain a Doctor of Osteopathy (DO) degree and
are licensed to practice medicine in the US just as physicians with a Doctor of Medicine (MD) degree.

residents are interviewed per position (Table 1). Anecdotal evidence suggests that during or after
interviews, informal communication channels actively operate allowing agents on both sides of the
market to gather more information about preferences. Finally, residency programs and applicants
submit lists stating their preferences for their match partners. The algorithm described in Roth and
Peranson (1999) uses these rank order lists to determine the nal match. The terms of participating
in the match create a commitment by both the applicant and the program to honor this assignment.
Programs do not individually negotiate salaries with residents during this process.
The centralized market for medical residents was established in the 1950s to create a uniform
transaction date, primarily as a remedy for discernible inefficiencies caused by early and exploding
offers (Roth, 1984; Roth and Xing, 1994). In 1998, the clearinghouse was redesigned amid concerns
that the existing design was not in the best interest of applicants and to lower difficulties with solving
colocation problems for residency applicants married to other applicants (Roth and Peranson,
1999). The algorithm currently in use substantially reduces incentives for residents and programs
to rematch by producing a match in which no applicant and program pair could have ranked
each other higher than their assignments. It is adapted from the instability-chaining algorithm of
Roth and Vande Vate (1990) and shares features with the applicant proposing deferred acceptance
algorithm introduced by Gale and Shapley (1962).
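For intuition, the textbook applicant-proposing deferred acceptance procedure for a many-to-one market can be sketched as follows. This is not the Roth and Peranson (1999) algorithm used by the NRMP, which also handles couples and other match variations. The function name and the dictionary-based inputs are hypothetical choices made for this illustration.

```python
def deferred_acceptance(applicant_prefs, program_prefs, capacity):
    """Applicant-proposing deferred acceptance with program capacities.

    applicant_prefs: dict applicant -> list of programs, most preferred first
    program_prefs:   dict program -> list of applicants, most preferred first
    capacity:        dict program -> number of positions
    Returns a dict program -> list of tentatively held (finally matched) applicants.
    """
    # Rank lookup so each program can compare applicants quickly.
    rank = {p: {a: r for r, a in enumerate(prefs)} for p, prefs in program_prefs.items()}
    next_choice = {a: 0 for a in applicant_prefs}    # next program each applicant will try
    held = {p: [] for p in program_prefs}
    free = list(applicant_prefs)

    while free:
        a = free.pop()
        prefs = applicant_prefs[a]
        if next_choice[a] >= len(prefs):
            continue                                 # list exhausted: a stays unmatched
        p = prefs[next_choice[a]]
        next_choice[a] += 1
        if a not in rank[p]:
            free.append(a)                           # p did not rank a: try next program
            continue
        held[p].append(a)
        held[p].sort(key=rank[p].__getitem__)        # keep the program's favorites first
        if len(held[p]) > capacity[p]:
            free.append(held[p].pop())               # reject the least preferred applicant
    return held


# Small usage example with made-up names.
applicants = {"A": ["X", "Y"], "B": ["X", "Y"], "C": ["X"]}
programs = {"X": ["B", "A", "C"], "Y": ["A", "B"]}
print(deferred_acceptance(applicants, programs, {"X": 1, "Y": 1}))
```

In the example, program X keeps B, applicant A is rejected by X and placed at Y, and C, who ranked only X, stays unmatched. No applicant and program would both prefer each other to this outcome.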
A few positions are filled before the match begins and some positions not filled after the main
match are oered in the "scramble." During the scramble, residents and programs are informed if
they were not matched in the main process and can use a list of unmatched agents to contract with
each other.10

2.2 Descriptive Evidence on Sorting


Motivated by the properties of the match, the empirical strategy uses pairwise stability to infer
parameters of the model by taking advantage of sorting patterns between resident and program
characteristics observed in the data and features of the many-to-one matching structure to infer
preferences. I defer the discussion of the many-to-one aspect to Section 4.2.
There is a significant degree of positive assortative matching between measures of a resident's
medical school quality and that of a program's medical school affiliates. Figure 1 shows the joint
distribution of NIH funding of a resident's medical school and of the affiliates of the program
with which she matched. Residents from more prestigious medical schools, as measured by NIH
funding, tend to match to programs with more prestigious medical school affiliates. Table 3 takes
a closer look at this sorting using regressions of a resident's characteristic on the characteristics
of programs with which she is matched. The estimates confirm the general trend observed in
Figure 1. Programs that are associated with better NIH funded medical schools tend to match
with residents from better medical schools as well, whether the quality of a resident's medical
10
A new managed process called the Supplemental Offer Acceptance Program (SOAP) replaced the scramble in
2012. A total of 142 positions in family medicine (approximately 5%) were filled through this process. The scramble
was likely of a similar size in the earlier years. See Signer (2012) (accessed June 12, 2012).

8
school is measured by NIH funding, MCAT scores of matriculants, or the resident having an MD
degree rather than an osteopathic or foreign medical degree. This observation also holds true for
programs at hospitals with a higher Medicare case mix index as well. Rent is positively associated
with resident quality, potentially because cities with high rent may also be the ones that are
more desirable to train or live in. Also note that the coe cient on the rural program dummy
is not statistically signicant. Ceteris paribus, rural programs are not matched with signicantly
lower quality residents than urban programs. Further, statistics from Table 1 show that about
90% of positions in rural programs are lled, while 93% are at urban programs. These ndings
are consistent with survey evidence in Rosenblatt et al. (2006), which shows that rural training
programs are matched with residents of a similar type as urban programs.11
To highlight the geographical sorting observed in the data, Table 4 regresses characteristics of a
resident's matched program on her own characteristics and indicators of whether the program is in
her state of birth or medical school state. Residents that match with programs in the same state as
their medical school tend to match with less prestigious programs, as measured by the NIH funds
of a program's affiliates. Residents also match with programs that are at larger hospitals and have
lower case mix indices. Column (5) shows that rural-born residents are about seven percentage
points more likely to place at rural programs than their urban-born counterparts.
Since these patterns arise from the mutual choices of residents and programs, estimates from
these regressions are not readily interpretable in terms of the preferences of either side of the market.
In particular, none of the coefficient estimates in these regressions can be interpreted as weights
on characteristics in a preference model. The next section develops a model of the market that is
estimated using these patterns in the data.

3 A Framework for Analyzing Matching Markets


This section presents the empirical framework for the model, treating salaries as exogenous. I
demonstrate how an instrument can be used to correct for correlation between salaries and
unobserved program characteristics in Section 5.

3.1 Pairwise Stability


I assume that the observed matches are pairwise stable with respect to the true preferences of the
agents, represented with $\succeq_k$ for a program or resident indexed by $k$. Each market, indexed
by $t$, is composed of $N_t$ residents, $i \in N_t$, and $J_t$ programs, $j \in J_t$. The data
consist of the number of positions offered by program $j$ in each period, denoted $c_{jt}$, and a
match, given by the function $\mu_t : N_t \to J_t$. Let $\mu_t^{-1}(j)$ denote the set of residents
program $j$ is matched with.
A pairwise stable match satisfies two properties for all agents $i$ and $j$ participating in market $t$:

1. Individual Rationality

For residents: $\mu_t(i) \succeq_i \emptyset$, where $\emptyset$ denotes being unmatched.
For programs: $|\mu_t^{-1}(j)| \le c_{jt}$ and $\mu_t^{-1}(j) \succeq_j \mu_t^{-1}(j) \setminus \{i\}$ for all $i \in \mu_t^{-1}(j)$.

2. No Blocking: if $j \succ_i \mu_t(i)$ then

If $|\mu_t^{-1}(j)| = c_{jt}$, then for all $i' \in \mu_t^{-1}(j)$, $\mu_t^{-1}(j) \succeq_j (\mu_t^{-1}(j) \setminus \{i'\}) \cup \{i\}$.
If $|\mu_t^{-1}(j)| < c_{jt}$, then $\mu_t^{-1}(j) \succeq_j \mu_t^{-1}(j) \cup \{i\}$.

11
Unlike Rosenblatt et al. (2006), my analysis includes positions in rural residency training track programs that
are satellites of urban host programs.

A pairwise stable match need not exist in general, and when one exists it need not be unique.
The preference model described in the subsequent sections guarantees the existence and uniqueness
of a pairwise stable match.
Individual rationality, also known as acceptability, implies that no program or resident would
prefer to unilaterally break a match contract. Because I do not observe data on unmatched residents,
I assume that all residents are acceptable to all programs and that all programs are acceptable to all
residents. Almost all US graduates applying to family medicine residencies as their primary choice
are successful in matching to a family medicine program, and the number of unfilled positions in
residency programs in this specialty is under 10%.12 The primary limitation of this assumption is
the inability to account for substitution into other professions or entry by new residents.
Under the no blocking condition, no resident prefers a program (to her current match) that
would prefer hiring that resident in place of a current match if the program has exhausted its
capacity. If the program a resident prefers has an unfilled position, the program would not like to
fill the position with that resident.
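The two no blocking cases can be checked mechanically once preferences are given a concrete form. The sketch below is only an illustration: it assumes residents' preferences are summarized by a utility table and that every program ranks residents by one scalar index (the restriction introduced in Section 3.3), and it assumes all agents are mutually acceptable. The function name and data layout are hypothetical and are not taken from the paper.

```python
def find_blocking_pairs(utility, human_capital, capacity, match):
    """Return all (resident, program) pairs that block `match`.

    utility[i][j]    : resident i's utility from program j (higher is better)
    human_capital[i] : scalar index used by every program to rank residents (assumed)
    capacity[j]      : number of positions at program j
    match[i]         : program currently assigned to resident i
    """
    # Residents currently assigned to each program.
    roster = {j: [] for j in capacity}
    for i, j in match.items():
        roster[j].append(i)

    blocking = []
    for i, current in match.items():
        for j in capacity:
            if j == current or utility[i][j] <= utility[i][current]:
                continue  # resident i does not strictly prefer j
            # Case 1: program j has an unfilled position.
            has_vacancy = len(roster[j]) < capacity[j]
            # Case 2: program j is full but would swap out a weaker resident.
            displaces = any(human_capital[i] > human_capital[k] for k in roster[j])
            if has_vacancy or displaces:
                blocking.append((i, j))
    return blocking


# Two residents, two single-position programs; both residents prefer A.
utility = {"r1": {"A": 2.0, "B": 1.0}, "r2": {"A": 2.0, "B": 1.0}}
human_capital = {"r1": 1.0, "r2": 0.5}
capacity = {"A": 1, "B": 1}

stable = find_blocking_pairs(utility, human_capital, capacity, {"r1": "A", "r2": "B"})
unstable = find_blocking_pairs(utility, human_capital, capacity, {"r1": "B", "r2": "A"})
```

In this toy market, assigning the stronger resident to the less preferred program creates a blocking pair, while the assortative assignment does not.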
Theoretical properties of the mechanism used by the NRMP guarantee that the final match
is pairwise stable with respect to submitted rank order lists, but not necessarily with respect
to true preferences. Strategic ranking and interviewing, especially in the presence of incomplete
information, is likely the primary threat to using pairwise stability in this market.13 The large
number of interviews per position suggests that this may not be a concern in this market; however,
pairwise stability may be implausible in some decentralized markets.
This equilibrium concept also implicitly assumes that agents' preferences over matches are
determined only by their own match, not by the matches of other agents. This restriction rules out the
explicit consideration of couples that participate in the match by listing joint preferences.14 According
to data reports from the NRMP, in recent years only about 1,600 out of 30,000 individuals
participated in the main residency match as part of a couple. I model all agents as single agents because
data from the GME census does not identify an individual as part of a couple.

12
While residents may apply to many specialties in principle, data from the NRMP suggest that a typical applicant
applies to only one or two specialties (except those looking for preliminary positions). A second specialty is often a
"backup." More than 95% of MD graduates interested in family medicine, however, apply only to family medicine
programs. Upwards of 97% of residents that list a family medicine program as their first choice match to a family
medicine program in the main match (see "Charting Outcomes in the Match" 2006, 2007, 2009, 2011, accessed June
12, 2012).
13
The data and the approach do not make a distinction for positions offered outside the match or during the
scramble. The no blocking condition should be a reasonable approximation for the positions filled before the match
as it is not incentive compatible for the agents to agree to such arrangements if either side expects a better outcome
after the match. The condition is harder to justify for the small number of positions filled during the scramble. Note,
however, that residents (programs) that participate in the scramble should not form blocking pairs with the set of
programs (residents) that they ranked in the main round.

3.2 Preferences of the Residents


Following the discrete choice literature, I model the latent indirect utility representing residents'
preferences $\succeq_i$ as a function $U(z_{jt}, \xi_{jt}, w_{jt}, \beta_i; \theta)$ of observed program
traits $z_{jt}$, the program's salary offer $w_{jt}$, unobserved traits $\xi_{jt}$, and taste parameters
$\beta_i$. I use the pure characteristics demand model of Berry and Pakes (2007) for this indirect utility:

$$u_{ijt} = z_{jt}\beta_i^z + w_{jt}\beta_i^w + \xi_{jt}. \qquad (1)$$

In models that do not use a wage instrument, I assume that the unobserved traits $\xi_{jt}$ have a
standard normal distribution that is independent of the other variables. I normalize the mean
utility to zero for $(z, w) = 0$. The scale and location normalizations are without loss of generality.
The independence of $\xi_{jt}$ from $w_{jt}$ is relaxed in the model correcting for potential endogeneity in
salaries.
Depending on the flexibility desired, $\beta_i$ can be modelled as a constant, a function of observable
characteristics $x_i$ of a resident and/or of unobserved taste determinants $\eta_i$:

$$\beta_i = \Pi x_i + \eta_i. \qquad (2)$$

The taste parameters $\eta_i$ are drawn from a mean-zero normal distribution with a variance that
is estimated. The richest specification used in this paper allows for heterogeneity via normally
distributed random coefficients for NIH funding at major affiliates, beds, and Case Mix Index.
This specification also allows for preference heterogeneity for rural programs based on a rural or
urban birth location of the resident and heterogeneity in preference for programs in the resident's
birth state or medical school state through interaction of $x_i$ and $z_{jt}$. These terms are included to
account for the geographic sorting observed in the market.
The pure characteristics model implies that residents have tastes for a finite set of program
attributes. It omits a commonly used additive $\epsilon_{ijt}$ term that is iid across residents, programs and
markets. Discrete choice models with such a term implicitly assume tastes for programs through a
characteristic space that increases in dimension with the number of programs. Berry and Pakes (2007)
discuss some counter-intuitive implications of including an $\epsilon_{ijt}$ term for substitution patterns and
welfare effects of changes in the number of programs.
14
Couples can pose a threat to the existence of stable matches (Roth, 1984) although results in Kojima, Pathak,
and Roth (2010) suggest that stable matches exist in large markets if the fraction of couples is small.

3.3 Preferences of the Programs
Since the value produced by a team of residents at a program is not observed, I model residency
program preferences through a latent variable. A very rich specification creates two extreme
problems. On the one hand, a pairwise stable match need not exist if a program's preference for a given
resident depends crucially on the other residents it hires. On the other hand, the number of stable
matches can be exponentially large in the number of agents when programs have heterogeneous
preferences.15 These problems are in addition to any difficulties one might face in identifying
such a rich specification.
My conversations with residency program and medical school administrators suggest that
programs broadly agree on what makes a resident desirable, and refer to a "pecking order" for residency
slots in which the best residents get their preferred choices over others. Anecdotal evidence also
suggests that test scores in medical exams, clinical performance, and the strength of
recommendation letters are likely the most important signals of a program's preference for a resident, but
are not observed in the dataset (see Footnote 3). Therefore, I model a program's preference for a
resident using a single human capital index $H(x_i, \varepsilon_i)$ that is a function of observable
characteristics $x_i$ of a resident and an unobservable determinant $\varepsilon_i$.16 I use the parametric form

$$h_i = x_i\alpha + \varepsilon_i, \qquad (3)$$

where $\varepsilon_i$ is normally distributed with a variance that depends on the type of medical school a
resident graduated from. For graduates of allopathic (MD) medical schools, $x_i$ includes the log NIH
funding and median MCAT scores of the resident's medical school. Characteristics also include the
medical school type for residents, i.e. whether a resident earned an osteopathic degree (DO) or
graduated from a foreign medical school. I also include an indicator for whether a resident that
graduated from a foreign medical school was born in the US. Without loss of generality, the variance
of $\varepsilon_i$ for residents with MD degrees is normalized to 1 and the mean of $h$ at $x = 0$ is normalized to
zero.
This specification guarantees the existence and uniqueness of a stable match and permits a
computationally tractable simulation algorithm that is described in Section 6.3.17 Finally, Section 4.3
notes that identifying a model with heterogeneity relies on exclusion restrictions, in this case an
observable program characteristic that is excluded from the preferences of the residents for programs.

15
See Roth and Sotomayor (1992) for conditions of existence of a stable match in the college admissions problem.
The multiplicity of the match implied by heterogeneous preferences may not be particularly important from an
empirical perspective. In simulations conducted with data reported to the NRMP, Roth and Peranson (1999) find
that almost all of the residents are matched to the same program across all the stable matches.
16
The model only allows for ordinal comparisons between residents and is consistent with any latent output
function $F_j(h_{i_1}, \ldots, h_{i_{c_j}})$ from a team of residents $i_1, \ldots, i_{c_j}$ at program $j$ that is
strictly increasing in each of its components. An implicit restriction is that the preference for a resident does not
depend on the other residents hired. The restriction may not be strong in this context because programs cannot
submit ranks that depend on the rest of the team.
17
Existence follows since these preferences are responsive. The condition is similar to a substitutability condition.
See Roth and Sotomayor (1992) for details. Uniqueness is a consequence of preference alignment. See Clark (2006)
and Niederle and Yariv (2009).
Since heterogeneity in the preferences over residents is probable, bias in estimates may affect
conclusions from counterfactual simulations. In particular, the analysis of interventions in rural
residency training programs may be inaccurate if rural programs strongly prefer hiring rural-born
residents. Appendix D.1 presents regressions showing that rural-born residents in rural programs
are of similar (observable) quality as urban-born residents matched to the same programs.
This suggests low heterogeneity in the preferences of programs, at least on this dimension.
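When every program ranks residents by the same index and all agents are mutually acceptable, the unique stable match can be constructed by letting residents choose in descending order of the index: the top resident must receive her favorite program in any stable match, and the argument repeats on the remaining market. The sketch below illustrates this construction. It is not necessarily the simulation algorithm of Section 6.3, and the function name, data layout, and the absence of ties are assumptions made for the example.

```python
def serial_dictatorship_match(utility, human_capital, capacity):
    """Stable match when all programs rank residents by `human_capital`.

    utility[i][j]    : resident i's utility from program j (no ties assumed)
    human_capital[i] : common index used by every program (no ties assumed)
    capacity[j]      : number of positions at program j
    """
    remaining = dict(capacity)
    match = {}
    # Residents choose in descending order of human capital.
    for i in sorted(human_capital, key=human_capital.get, reverse=True):
        open_programs = [j for j in remaining if remaining[j] > 0]
        if not open_programs:
            break  # more residents than positions: the rest stay unmatched
        best = max(open_programs, key=lambda j: utility[i][j])
        match[i] = best
        remaining[best] -= 1
    return match


# Hypothetical market: program A has one position, program B has two.
utility = {
    "r1": {"A": 3.0, "B": 1.0},
    "r2": {"A": 2.0, "B": 1.5},
    "r3": {"A": 0.5, "B": 1.0},
}
human_capital = {"r1": 0.9, "r2": 0.4, "r3": -0.2}
capacity = {"A": 1, "B": 2}

match = serial_dictatorship_match(utility, human_capital, capacity)
```

Here the second resident prefers program A but finds it full, which is the sense in which a resident's "choice set" is determined in equilibrium by the choices of more desirable residents.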

4 Identification
In this section, I describe how the data provide information about preference parameters using
pairwise stability as an assumption on the observed matches. The discussion also guides the choice
of moments used in estimation. Standard revealed preference arguments do not apply because
"choice-sets" of individuals are unobserved and determined in equilibrium.
Agarwal and Diamond (in progress) study non-parametric identification in a single large market
for a model without heterogeneous preferences for programs. They find that having data from
many-to-one matches rather than one-to-one matches is important from an empirical perspective.
I intuitively describe the reason for this difference. A formal treatment of identification is beyond
the scope of this paper.
The market index $t$ is omitted in this section because all identification arguments are based on
observing one market with many (interdependent) matches. For simplicity, I also assume that the
number of residents is equal to the number of residency positions and treat all characteristics as
exogenous. Identification of the case with endogenous salaries is discussed in Section 5, and does
not require a reconsideration of arguments presented here.

4.1 Using Sorting Patterns: The Double-Vertical Model


Consider the simplified "double-vertical" model in which all residents agree upon the relative
ranking of programs. In a linear parametric form for indirect utilities, preferences are represented
with

$$u_j = z_j\beta + \xi_j$$

$$h_i = x_i\alpha + \varepsilon_i,$$

where $x_i$ and $z_j$ are observed and $\xi_j$ and $\varepsilon_i$ are standard normal random variables, distributed
independently of the observed traits. Assume the location normalizations $E[u_j \mid z_j = 0] = 0$ and
$E[h_i \mid x_i = 0] = 0$.
A pairwise stable match in this model exhibits perfect assortative matching between $u$ and $h$.
Because the set of residents with a higher value of $x\alpha$ have a higher distribution of human capital,
they are matched with more desirable programs. Conversely, programs with larger $z\beta$ are more
likely to match with residents with higher human capital. The data exhibit positive assortativity
between $x$ and $z$. I now describe what can be learned from this sorting.
I begin with an example to show that a sign restriction on one parameter of the model is
needed to interpret sorting patterns in terms of preferences. Consider a model in which $x$ is a
scalar measuring the prestige of a resident's medical school and $z$ measures the size of the hospital
with which a program is associated. In this example, residents from prestigious medical schools sort
into larger hospitals if the human capital distribution of residents from more prestigious medical
schools is higher and hospital size is preferable. However, this sorting may also have been produced
by parameters under which residents from prestigious medical schools are less likely to have high
human capital and smaller hospitals are preferable. This observation necessitates restricting one
characteristic of either residents or programs to be desirable. Throughout the empirical exercises in
this paper, I assume that residents graduating from more prestigious medical schools, as measured
by the NIH funding of the medical school, are more likely to have a higher human capital index.18
Under this sign restriction, the sorting patterns observed in Figure 1 can only be rationalized if a
program's desirability is positively related to the NIH funding of its affiliates.
The sorting patterns can also allow us to determine whether $x\alpha = x'\alpha$ for $x \neq x'$ or, conversely,
whether $z\beta = z'\beta$. When $z\beta = z'\beta$, programs with characteristics $z$ and $z'$ are equally desirable to
residents. Given a choice between these two programs, the unobservable characteristic is used
to break ties. For this reason, the distributions of human capital of residents matched to the sets of
programs with observables $z$ and $z'$ are identical. Consider two types of programs, one at a larger but
less prestigious hospital and another at a smaller but more prestigious hospital. If residents trade off
hospital size for prestige, then the residents matched with these two hospital types have similar observable
characteristics. Conversely, the distribution of observable quality of residents is higher at hospitals
with characteristics $z$ than at $z'$ if $z\beta > z'\beta$. The nature of assortativity observed in the data thus
informs us whether two observable types of residents or programs are equally desirable or not.
Agarwal and Diamond (in progress) consider a more general model in which $h$ and $u$ are
non-parametric functions of $x$ and $z$, respectively, with additively separable errors $\varepsilon$ and $\xi$. They prove
that sorting patterns can be used to determine if $x$ and $x'$ are equally desirable.

4.2 Importance of Data from Many-to-One Matches


The preceding arguments using only sorting patterns do not contain information on the relative
importance of observables on the two sides of the market. For intuition, consider an example in
which $x$ is a binary indicator that is equal to 1 for a resident graduating from a prestigious medical
school and $z$ is a binary indicator for a program at a large hospital. Assume that half the residents
are from prestigious schools and half the programs are at large hospitals, and that medical school
prestige and hospital size are preferred ($\alpha > 0$ and $\beta > 0$). Sorting patterns from such a model
can be summarized in a contingency table in which residents from prestigious medical schools are
systematically more likely to match with programs at large hospitals. For instance, consider the
following table:

           z = 1    z = 0
  x = 1     30%      20%
  x = 0     20%      30%

These matches could result from parameters under which programs have a strong preference
for residents from prestigious medical schools (large $\alpha$) and residents have a moderate preference
for large hospitals (small $\beta$). In this case, residents from more prestigious medical schools get their
pick of programs, but often choose ones at small hospitals. On the other hand, the contingency
table could have been a result of a strong preference for large hospitals (large $\beta$) but only a
moderate preference for residents from prestigious medical schools (small $\alpha$). There are a variety of
intermediate cases that are indistinguishable from each other and from either extreme. This ambiguity
contrasts with discrete choice models using stated preference lists where the relationship between
ranks and hospital size determines the weight on hospital size. Here, the degree of sorting between
$x$ and $z$ cannot determine the weights on both characteristics because preferences of both sides
determine final matches.

18
The sign restriction does not imply that all medical students at more prestigious medical schools have a higher
human capital index.
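The ambiguity can be illustrated by simulating a one-to-one double-vertical market with binary observables. The sketch below is an illustration under stated assumptions, not a procedure from the paper: the function name, sample size, seed, and parameter values are all hypothetical. Residents and programs are matched rank by rank, and the function returns the share of matches in each cell of the contingency table.

```python
import random


def sorting_table(alpha, beta, n=4000, seed=0):
    """Simulated (x, z) contingency table for a one-to-one double-vertical match."""
    rng = random.Random(seed)
    x = [i % 2 for i in range(n)]  # half the residents have x = 1
    z = [j % 2 for j in range(n)]  # half the programs have z = 1
    h = [alpha * x[i] + rng.gauss(0, 1) for i in range(n)]  # human capital
    u = [beta * z[j] + rng.gauss(0, 1) for j in range(n)]   # program desirability
    # Perfect assortative matching: k-th best resident with k-th best program.
    residents = sorted(range(n), key=lambda i: h[i], reverse=True)
    programs = sorted(range(n), key=lambda j: u[j], reverse=True)
    table = {(a, b): 0.0 for a in (0, 1) for b in (0, 1)}
    for i, j in zip(residents, programs):
        table[(x[i], z[j])] += 1.0 / n
    return table


# Strong program preferences with weak resident preferences, and the reverse.
strong_alpha = sorting_table(alpha=2.0, beta=0.5)
strong_beta = sorting_table(alpha=0.5, beta=2.0)
```

Because the two sides enter this one-to-one setup symmetrically, the two parameter configurations produce the same table in expectation, so sorting patterns alone cannot tell them apart; the simulated tables differ only by sampling noise.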
In addition to sorting patterns, data on many-to-one matches also determine the extent to
which residents with similar characteristics are matched to the same program. In a pairwise stable
match, two residents at the same program must have similar human capital irrespective of the
program's quality. Otherwise, either the program could replace the lower quality resident with a
better resident, or the higher quality resident could find a more desirable program. Residents
training at the same program have similar observables if $x$ is highly predictive of human capital.
Conversely, programs are not likely to match with multiple residents with similar observables if
they place a low weight on $x$. The variation in resident observable characteristics within programs
is therefore a signal of the information observables contain about the underlying human capital
quality of residents.19
This information is not available in a one-to-one matching market because sorting patterns are
the only feature known from the data. Agarwal and Diamond (in progress) formally show that
having data from many-to-one matches is critical for identifying the parameters of the model, and
provide simulation evidence to illustrate the limitations of sorting patterns and the usefulness of
many-to-one matching data.
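One simple way to summarize the within-program variation discussed above is a standard decomposition of the total sum of squares into within-program and across-program components. The sketch below assumes this decomposition; the paper does not spell out the exact computation behind its tables, and the function name and inputs are hypothetical.

```python
def within_program_share(values, program):
    """Share of the total variation in `values` that lies within programs.

    values[k]  : a characteristic of resident k (e.g. a school-level score)
    program[k] : identifier of the program resident k is matched with
    """
    n = len(values)
    grand_mean = sum(values) / n
    total_ss = sum((v - grand_mean) ** 2 for v in values)

    # Group residents by program.
    groups = {}
    for v, j in zip(values, program):
        groups.setdefault(j, []).append(v)

    # Sum of squared deviations from each program's own mean.
    within_ss = 0.0
    for members in groups.values():
        mean_j = sum(members) / len(members)
        within_ss += sum((v - mean_j) ** 2 for v in members)
    return within_ss / total_ss


# Two programs with two residents each.
share = within_program_share([1.0, 3.0, 5.0, 7.0], ["A", "A", "B", "B"])
```

A share near one means the characteristic varies as much inside programs as in the population, which is what one would expect if programs place little weight on it.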

4.2.1 Descriptive Statistics from Many-to-One Matching

Table 5 shows the fraction of variation in resident characteristics that is within a program. Notice
that almost none of the variation in the gender of the resident is across programs. This fact suggests
that gender does not determine the human capital of a resident. If gender were a strong determinant
of a resident's desirability to a program, in a double-vertical model one would expect that programs
would be systematically male or female dominated. Summaries of the other characteristics indicate
that residents are more systematically sorted into programs where other residents have more similar
qualifications. For instance, about 30% of the variation in the median MCAT score of the residents'
graduating medical schools decomposes into across program variation. This statistic is higher for
the characteristics foreign medical degree and MD degree.

19
An analogy with measurement error models explains why many-to-one matches allow us to identify features we
cannot in one-to-one match data. Since we expect that two residents matched to the same program are very similarly
qualified, the observable qualities of two doctors at the same program act like noisy measures of their identical true
quality.
Table 6 presents another summary from many-to-one matching based on regressing the
leave-one-out mean characteristic of a resident's peer group in a program on the characteristics of the
resident. Let $\bar{x}_{-i,1}$ be the average observable $x_1$ of resident $i$'s peers for a match $\mu$, i.e.

$$\bar{x}_{-i,1} = \frac{1}{|\mu^{-1}(\mu(i))| - 1} \sum_{i' \in \mu^{-1}(\mu(i)) \setminus \{i\}} x_{i',1}.$$

I estimate the equation

$$\bar{x}_{-i,1} = x_i\lambda + e_i,$$

where $x_i$ is resident $i$'s observables. Not surprisingly, each regression suggests that a resident's
characteristic is positively associated with the mean of the same characteristic of her peers. Viewing
NIH funding, MCAT scores, and MD degree as quality indicators, there is a positive association
between a resident's quality and the average quality of her peer group. Further, the moderately high
R-squared statistics for these regressions suggest that a resident's characteristics are more predictive
of those of her peer group than Table 5 might have suggested.
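The leave-one-out peer mean used as the dependent variable can be computed from program-level totals without looping over pairs of residents. The sketch below is a minimal illustration; the function name and the choice to return `None` for a resident with no peers are assumptions of the example.

```python
def leave_one_out_means(x, program):
    """Mean of `x` among each resident's peers in the same program.

    Returns None for a resident who is the only one in her program.
    """
    totals, counts = {}, {}
    for v, j in zip(x, program):
        totals[j] = totals.get(j, 0.0) + v
        counts[j] = counts.get(j, 0) + 1

    result = []
    for v, j in zip(x, program):
        if counts[j] > 1:
            # Remove the resident's own value before averaging.
            result.append((totals[j] - v) / (counts[j] - 1))
        else:
            result.append(None)
    return result


# Three residents at program A, one at program B.
peer_means = leave_one_out_means([1.0, 2.0, 3.0, 10.0], ["A", "A", "A", "B"])
```

Excluding the resident's own value matters: including it would build a mechanical positive correlation between a resident's characteristic and her program's mean.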

4.3 Heterogeneity in Preferences


I now discuss exclusion restrictions that can be used to learn about heterogeneity in preferences.
Preferences based on observable characteristics of residents that do not affect their human capital
index are reflected in heterogeneous sorting patterns for similarly qualified residents. Assume, for
instance, that the birth location of a resident does not affect the preferences of programs for the
resident. Under this restriction, the propensity of residents for matching to programs closer to their
birthplace can only be a result of resident preferences, not the preferences of programs. Further,
residents matching closer to home will do so at disproportionately lower quality programs since
they trade off program quality with preferences for location.
The principle is similar to the use of variation excluded from one part of a system to identify
a simultaneous equation model. The exclusion restriction in the example above isolates a factor
influencing the demand for residency positions without affecting the distribution of choice sets
faced by residents. Conversely, one may use factors that influence the human capital index of a
resident but not their preferences to obtain variation in choice sets of residents that is independent
of resident preferences. Conlon and Mortimer (2010) use a similar source of variation arising from
product availability to identify demand models with unobserved heterogeneity.
While only one restriction may suffice in theory, the empirical specifications in this paper use
both restrictions. Ideally, one would be able to estimate preferences for programs that are
heterogeneous across residents with different medical schools or skill levels. Richer specifications that
allow for this type of preference heterogeneity are difficult to estimate because quality indicators of
residents only include the medical school, and do not vary at the individual level. Even with more
detailed information on residents, estimating the preferences of residents with low qualifications is
likely to rely on parametric extrapolations from more qualified residents because of the limited set
of choices faced by less skilled residents.

5 Salary Endogeneity
The salary offered by a residency program may be correlated with unobserved program covariates.
For instance, programs with desirable unobserved traits may be able to pay lower salaries due to
compensating differentials. Alternatively, desirable programs may be more productive or better
funded, resulting in salaries that are positively associated with unobserved quality. One approach
to correct for wage endogeneity is to formally model wage setting. I avoid this for several reasons.
First, the allegation of collusive wage setting in the lawsuit is unresolved. Second, hospitals tend
to set identical wages for residents in all specialties, suggesting that a full model should consider
the joint salary setting decision across all residency programs at a hospital. Finally, a full model
would need to account for accreditation requirements that require salaries to be "adequate" for a
resident's living and educational expenses.20

5.1 A Control Function Approach


I propose a control function correction for bias due to correlation between salaries $w_{jt}$ and program
unobservables $\xi_{jt}$ (see Heckman and Robb, 1985; Blundell and Powell, 2003; Imbens and Newey,
2009). The principle of the method is similar to that of an instrumental variables solution to
endogeneity. It also relies on an instrument $r_{jt}$ that is excludable from the utility function $U(\cdot)$.
The instrument I use is described in the next section.
Consider the following linear function for the salary $w_{jt}$ offered by program $j$ in period $t$:

$$w_{jt} = z_{jt}\gamma + r_{jt}\tau + \nu_{jt}, \qquad (4)$$

where $z_{jt}$ are program observable characteristics, $r_{jt}$ is the instrument, and $\nu_{jt}$ is an unobservable.
Endogeneity of $w_{jt}$ is captured through correlation between the unobservables $\xi_{jt}$ and $\nu_{jt}$. Equation
(4) is analogous to the first stage of a two-stage least squares estimator and the equilibrium model
of matches is analogous to the second stage.
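The first stage in equation (4) is a linear regression, and a control function approach carries its residual forward as an additional regressor. The sketch below illustrates that step with a small least-squares routine written against the standard library. It assumes scalar $z_{jt}$ and $r_{jt}$ and adds an intercept, both simplifications made for the example; the function names and the data are hypothetical.

```python
def ols(X, y):
    """Least-squares coefficients via the normal equations (small problems only)."""
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [
        [sum(row[a] * row[b] for row in X) for b in range(k)]
        + [sum(row[a] * yi for row, yi in zip(X, y))]
        for a in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [p - factor * q for p, q in zip(A[r], A[col])]
    return [A[r][k] / A[r][r] for r in range(k)]


def first_stage_residuals(w, z, r):
    """Regress salary w on (1, z, r); return coefficients and residuals."""
    X = [[1.0, zj, rj] for zj, rj in zip(z, r)]
    coef = ols(X, w)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in X]
    residuals = [wi - fi for wi, fi in zip(w, fitted)]
    return coef, residuals


# Noise-free toy data generated as w = 1 + 2 z + 3 r.
z = [0.0, 1.0, 2.0, 3.0, 4.0]
r = [1.0, 0.0, 1.0, 0.0, 2.0]
w = [1.0 + 2.0 * zj + 3.0 * rj for zj, rj in zip(z, r)]
coef, nu_hat = first_stage_residuals(w, z, r)
```

With real data the residuals would not be zero; they would serve as the estimate of the salary unobservable that is conditioned on in the second stage.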
The control function approach requires $(\xi_{jt}, \nu_{jt})$ to be independent of $(z_{jt}, r_{jt})$. This assumption
replaces the weaker conditional moment restriction needed in an instrumental variables approach.21 Under
this independence, although $w_{jt}$ is not (unconditionally) independent of $\xi_{jt}$, it is conditionally
independent of $\xi_{jt}$ given $\nu_{jt}$ and $z_{jt}$. The control function approach uses a consistent estimate of
$\nu_{jt}$ from the first stage as a conditioning variable in place of its true value.

20
The ACGME sponsoring institution requirements state that "Sponsoring and participating sites must provide all
residents with appropriate financial support and benefits to ensure that they are able to fulfill the responsibilities of
their educational programs."
21
Imbens (2007) discusses these independence assumptions at some length, noting that they are commonly made
in the control function literature and are often necessary when dealing with a non-additive second stage. In this
context, even though $\xi_{jt}$ is additively separable from $w_{jt}$, the observed matches are not an additive function of
$\xi_{jt}$ and $w_{jt}$. This fact prohibits the approach used in demand models pioneered by Berry (1994) and Berry,
Levinsohn, and Pakes (1995), where an inversion can be used to estimate a variable with a separable form in the
unobserved characteristic and the endogenous variable.

Since $\nu_{jt}$ can be consistently estimated from equation (4) using OLS, I treat it as any other
observed characteristic. As noted earlier, we need to allow for correlation between $\xi_{jt}$ and $\nu_{jt}$ to
build endogeneity of $w_{jt}$ into the system. For tractability given the limited salary variation, I model
the distribution of $\xi_{jt}$ conditional on $\nu_{jt}$ as

$$\xi_{jt} = \kappa\nu_{jt} + \sigma\zeta_{jt}, \qquad (5)$$

where $\zeta_{jt} \sim N(0, 1)$ is drawn independently of $\nu_{jt}$ and $(\kappa, \sigma)$ are unknown parameters. Substitute
equation (5) to re-write equation (1) as

$$u_{ijt} = z_{jt}\beta_i^z + w_{jt}\beta_i^w + \kappa\nu_{jt} + \sigma\zeta_{jt}. \qquad (6)$$

Since variation in $w_{jt}$ given $\nu_{jt}$ and $z_{jt}$ is due to $r_{jt}$, the assumptions above imply that $\zeta_{jt}$ is
independent of $w_{jt}$, solving the endogeneity problem.
As a scale normalization, I set $\sigma = 1$. The term $\nu_{jt}$ can arise from specification error and/or
from unobservable determinants of salaries that do not directly affect the preferences of residents for
a program. Note that the unobservable characteristic of the program, $\xi_{jt}$, may be correlated across
time through its components. For instance, a component may be the sum of a random effect that is
constant over time for a given $j$ and a per-period deviation, as long as each of the components is
independent of $(z_{jt}, r_{jt})$.
While this linear specification may be difficult to justify from economic primitives, it may
substantially reduce bias in estimates. Even in models of oligopolistic competition in which the
price has a nonlinear relationship with unobservables and the characteristics of competing products,
Yang, Chen, and Allenby (2003) and Petrin and Train (2010) find that linear control functions can
lead to significant reduction in bias. The restriction that $w_{jt}$ does not depend on characteristics
of other programs may not be particularly strong in this context. However, the single dimensional
additive source of error, $\nu_{jt}$, remains a strong assumption since it rules out heterogeneous effects
of the instrument. It may be feasible to relax some parametric assumptions in equations (5) and
(6) in settings with greater variation in the endogenous variable.

5.2 Instrument
Table 7 presents regression estimates of equation (4), except using a log-log specification so that
coefficients can be interpreted as elasticities. The first four columns do not include the instrument
$r_{jt}$, which is defined below. Columns (1) and (2) show limited correlation between salaries and
observed program characteristics except rents and the Medicare wage index. The elasticity with
respect to these two variables is small, at less than 0.15 in magnitude. This suggests that models
that do not instrument for salaries may provide reasonable approximations for residents'
preferences. To address potential correlation, I will also present estimates from specifications that use
reimbursement rates for residency training at competitor hospitals as a wage instrument.
Medicare reimburses residency programs for direct costs of training based on cost reports submitted
in the 1980s. Before the prospective payment system was established, the total payment
made to a hospital did not depend on the precise classification of costs as training or patient care
costs. The reimbursement system for residency training was severed from payments for patient care
in 1985 because the two types of costs were considered distinct by the government. While patient
care was reimbursed based on fees for diagnosis-related groups, reimbursements for residency training
were calculated using cost reports in a base period, usually 1984. Line items related to salaries
and benefits, and administrative expenses of residency programs were designated as direct costs of
residency training. A per resident amount was calculated by dividing the total reported costs on
these line items by the number of residents in the base period. Today, hospitals are reimbursed
based on this per-resident amount, adjusted for inflation using CPI-U.
This reimbursement system therefore uses reported costs from two decades prior to the sample
period of study. More importantly, the per resident amount may not reflect costs even in the
base period because hospitals had little incentive to account for costs under the correct line item.
Newhouse and Wilensky (2001) note that the distinction between patient care costs and those
incurred due to residency training is arbitrary and that variation in per-resident amounts may be
driven by differences in hospital accounting practices or the use of volunteer faculty rather than
real costs. In other words, whether a cost, say salaries paid to attending physicians, was accounted
for in a line item later designated for direct costs can significantly influence reimbursement rates
today.
These reimbursements are earmarked for costs of residency training and are positively associated
with salaries paid by a program today (Table 7, Column 3). Reimbursement rates at competitor
programs can therefore affect a program's salary offer because conversations with program directors
suggest that salaries paid by competitors in a program's geographic area are used as benchmarks
while setting their own salaries (Column 4).22 I instrument using a weighted average of reimbursement
rates of other teaching hospitals in the geographic area of a program. The instrument is
defined as

$$ r_j = \frac{\sum_{k \in G_j} fte_k \times rr_k}{\sum_{k \in G_j} fte_k}, \tag{7} $$

where $rr_k$ and $fte_k$ are the reimbursement rate and number of full-time equivalent residents at
program $k$'s primary hospital in the base period, and $G_j$ are the hospitals in program $j$'s geographic
area other than $j$'s primary hospital. I base the geographic definitions on Medicare's physician fee
schedule, i.e. the MSA of the hospital or the rest of state if the hospital is not in an MSA. If fewer
than three other competitors are in this area, I define $G_j$ to be the census division.23

22 Conversations with Dr. Weinstein, Vice President for GME at Partners Healthcare, suggest that salaries at
residency programs sponsored by Partners Healthcare are aimed to be competitive with those at other programs in
the Northeast and in Boston, by looking at market data from two publicly available sources (the COTH Survey and
New England/Boston Teaching Hospital Survey).

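The construction in equation (7) can be illustrated with a minimal sketch. The data layout (a dictionary of hospitals with the keys "area", "division", "fte" and "rr") and the function name are assumptions made for illustration only; they are not taken from the paper or its appendix.

```python
def competitor_instrument(hospitals, j, min_rivals=3):
    """FTE-weighted mean reimbursement rate of hospitals competing with j.

    hospitals maps a hospital id to a dict with hypothetical keys:
    "area" (MSA or rest of state), "division" (census division),
    "fte" (base-period full-time equivalent residents), "rr" (reimbursement rate).
    """
    own = hospitals[j]
    # G_j: hospitals in j's geographic area other than j itself.
    rivals = [h for k, h in hospitals.items() if k != j and h["area"] == own["area"]]
    # Fall back to the census division when the area has fewer than three rivals.
    if len(rivals) < min_rivals:
        rivals = [h for k, h in hospitals.items()
                  if k != j and h["division"] == own["division"]]
    total_fte = sum(h["fte"] for h in rivals)
    if total_fte == 0:
        return None  # no competitor residents to average over
    return sum(h["fte"] * h["rr"] for h in rivals) / total_fte


# Made-up numbers, purely for illustration.
hospitals = {
    "A": {"area": "Boston", "division": "New England", "fte": 10.0, "rr": 100.0},
    "B": {"area": "Boston", "division": "New England", "fte": 30.0, "rr": 80.0},
    "C": {"area": "Boston", "division": "New England", "fte": 20.0, "rr": 120.0},
    "D": {"area": "Boston", "division": "New England", "fte": 40.0, "rr": 90.0},
    "E": {"area": "Rest of MA", "division": "New England", "fte": 5.0, "rr": 70.0},
}
r_A = competitor_instrument(hospitals, "A")  # averages over B, C, D
r_E = competitor_instrument(hospitals, "E")  # falls back to the census division
```

Hospital A has three rivals in its own area, so the average is taken within the area; hospital E has none, so the sketch widens the set to the census division, mirroring the fallback rule described above.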
Consistent with the theory for the instrument's effect on salaries, Column (5) shows that competitor
reimbursements are positively related to salaries. Estimated in levels rather than logs, this
specification is analogous to the first stage in a two-stage least-squares method.24 In Column (6),
I test the theory that competitor reimbursements affect salaries only through competitor salaries.
Relative to column (5), controlling for the lagged average competitor salaries reduces the estimated
effect of competitor reimbursements by an order of magnitude and results in a statistically
insignificant effect.
The key assumption for validity of the instrument is that the program unobservable $\xi_{jt}$ is
conditionally independent of competitor reimbursement rates, given program characteristics and
a program's own reimbursement rate, which is included in $z_{jt}$ for specifications using the instrument.
This assumption is satisfied if variation in reimbursement rates is driven by an arbitrary
classification of costs by hospitals in 1984 or if past costs of competitors are not related to residents'
preferences during the sample period. The primary threat is that reported per-resident
costs are correlated with persistent geographic factors. To some extent, this concern is mitigated
by controlling for a program's own reimbursement rate. Reassuringly, Column (7) in Table 7 shows
that the impact of competitor reimbursement rates on a program's salary changes by less than
the standard error in the estimates upon including location characteristics such as median age,
household income, crime rates, college population and total population.25 Another concern is the
possibility that programs respond to the reimbursement rates of competitors by engaging in endogenous
investment. A comparison of estimates from Columns (2) and (5) shows little evidence
of sensitivity of the coefficients on program characteristics (NIH, beds, Case Mix Index) to the
inclusion of reimbursement rate variables.

6 Estimation
This section defines the estimator, the moments used in estimation, the simulation technique and
a parametric bootstrap used for inference.
23 Additional details on Medicare's reimbursement scheme and the construction of the instrument are in Appendix
G.
24 Figure G.2 depicts this first stage visually. A strong increasing relationship between salary and competitor
reimbursements is noticeable. Clustered at the program level, the first stage F-statistic for the coefficient on the instrument
is 37.6. Since the control function approach is based on assuming independence rather than mean independence, I
test for heteroskedasticity in the residuals from the first stage. I could not reject the hypothesis that the residual is
homoskedastic at the 90% confidence level for any individual year of data using either the tests proposed by Breusch
and Pagan (1979) or by White (1980). Figure G.3 presents a scatter plot of the salary distribution against fitted
values. The plot shows little evidence of heteroskedasticity.
25 Strictly speaking, the exclusion restriction requires that the instrument is not strongly correlated with factors
that may determine choices of residents. Appendix G shows that excluded location characteristics do not explain
much variation in addition to controls included in the model although a formal test of exogeneity can be rejected.

6.1 Method of Simulated Moments
The estimation proceeds in two stages when the control function is employed. I first estimate the
control variable $\nu_{jt}$ from equation (4) using OLS to construct the residual

$$ \hat{\nu}_{jt} = w_{jt} - z_{jt}\hat{\gamma} - r_{jt}\hat{\tau}. \tag{8} $$

Replacing this estimate in equation (6), we get

$$ u_{ijt} \approx z_{ijt}\beta_i^z + w_{jt}\beta_i^w + \kappa\hat{\nu}_{jt} + \sigma\eta_{jt}, \tag{9} $$

where the approximation is up to estimation error in $\nu_{jt}$. The estimation of parameters determining
the human capital index of residents and their preferences over programs proceeds by treating $\hat{\nu}_{jt}$
like any other exogenous observable program characteristic. The error due to using $\hat{\nu}_{jt}$ instead of
$\nu_{jt}$, however, affects the calculation of standard errors. The first stage is not necessary in the model
treating salaries as exogenous.
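As a rough illustration of the first stage, the sketch below regresses salaries on program characteristics and the instrument by OLS and returns the residual that serves as the control variable. It is a toy implementation under stated assumptions: the function names are hypothetical, a constant is added to the regressors, and the normal equations are solved by plain Gauss-Jordan elimination, which is adequate only for small, well-conditioned problems.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[a] * row[b] for row in X) for b in range(k)]
         + [sum(row[a] * yi for row, yi in zip(X, y))]
         for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [v - f * w for v, w in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def first_stage_residuals(w, Z, r):
    """Residuals of salaries w on a constant, characteristics Z and instrument r."""
    X = [[1.0] + list(z) + [ri] for z, ri in zip(Z, r)]
    coef = ols(X, w)
    return [wi - sum(c * x for c, x in zip(coef, row)) for wi, row in zip(w, X)]


# Made-up data: one program characteristic and the instrument.
Z = [[0.0], [1.0], [2.0], [3.0]]
r = [1.0, 0.0, 2.0, 1.0]
w_exact = [4.0, 3.0, 11.0, 10.0]   # equals 1 + 2*z + 3*r, so residuals vanish
w_noisy = [4.5, 2.5, 11.5, 9.5]
nu_exact = first_stage_residuals(w_exact, Z, r)
nu_noisy = first_stage_residuals(w_noisy, Z, r)
```

The residuals from the noisy regression would then enter the second stage as an additional program characteristic, in the spirit of equation (9).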
The distribution of preferences of residents and human capital can be determined as a function of
observable characteristics of both sides and the parameter of the model $\theta$, collected from equations
(6), (2) and (3). The second stage of the estimation uses a simulated method of moments estimator
(McFadden, 1989; Pakes and Pollard, 1989) to estimate the true parameter $\theta_0$. The estimate $\hat{\theta}_{MSM}$
minimizes a simulated criterion function

$$ \left\| \hat{m} - \hat{m}^S(\theta) \right\|_W^2 = \left( \hat{m} - \hat{m}^S(\theta) \right)' W \left( \hat{m} - \hat{m}^S(\theta) \right), \tag{10} $$

where $\hat{m}$ is a set of moments constructed using the matches observed in the sample, $\hat{m}^S(\theta)$ is
the average of moments constructed from $S$ simulations of matches in the economy, and $W$ is a
matrix of weights described in Section 6.4. Additional details on the estimator and the optimization
algorithm are in Appendix A.26
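The criterion in equation (10) is a weighted quadratic form in the gap between data moments and simulated moments. A minimal sketch, with hypothetical names and plain Python lists standing in for vectors and matrices:

```python
def msm_objective(m_hat, m_sims, W):
    """Quadratic form (m_hat - m_bar)' W (m_hat - m_bar).

    m_hat:  list of data moments.
    m_sims: list of S simulated moment vectors at a candidate parameter.
    W:      square weight matrix given as a list of rows.
    """
    S = len(m_sims)
    # Average the simulated moments across the S simulations.
    m_bar = [sum(m[k] for m in m_sims) / S for k in range(len(m_hat))]
    d = [a - b for a, b in zip(m_hat, m_bar)]
    return sum(d[a] * W[a][b] * d[b] for a in range(len(d)) for b in range(len(d)))


# Made-up numbers: two moments, two simulations.
m_hat = [1.0, 2.0]
m_sims = [[0.0, 2.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
value = msm_objective(m_hat, m_sims, identity)
```

In estimation, an optimizer would evaluate this function repeatedly, re-simulating `m_sims` at each candidate parameter while holding the underlying simulation draws fixed.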

26 The objective function in the specifications estimated has local minima, and is discontinuous due to the use
of simulation. I use three starts of the genetic algorithm, which is a derivative-free global stochastic optimization
procedure, followed by local searches using the subplex algorithm. Details are in Appendix A.

6.2 Moments
The vector $\hat{m}$ consists of sample analogs of three sets of moments, stacked for each market and
then averaged across markets. The simulated counterparts $\hat{m}^S(\theta)$ are computed identically, but
averaged across the simulations and markets. Mathematical expressions for the population versions
and other details are in Appendix A.1.
For the match $\mu_t$ observed in market $t$, the sets of moments are given by

1. Moments of the joint distribution of observable characteristics of residents and programs as
given by the matches:

$$ \hat{m}_{t,ov} = \frac{1}{N_t} \sum_{i \in N_t} 1\{\mu_t(i) = j\}\, x_i z_{jt}. \tag{11} $$

2. The within-program variance of resident observables. For each scalar $x_{1,i}$:

$$ \hat{m}_{t,w} = \frac{1}{N_t} \sum_{i \in N_t} \left( x_{1,i} - \frac{1}{\left|\mu_t^{-1}(\mu_t(i))\right|} \sum_{i' \in \mu_t^{-1}(\mu_t(i))} x_{1,i'} \right)^2. \tag{12} $$

3. The covariance between resident characteristics and the average characteristics of a resident's
peers. For every pair of scalars $x_{1,i}$ and $x_{2,i}$:

$$ \hat{m}_{t,p} = \frac{1}{N_t} \sum_{i \in N_t} x_{1,i}\, \frac{1}{\left|\mu_t^{-1}(\mu_t(i))\right| - 1} \sum_{i' \in \mu_t^{-1}(\mu_t(i)) \setminus \{i\}} x_{2,i'}. \tag{13} $$

The first set of moments includes the covariances between program and resident characteristics.
These moments are the basis of the regression coefficients presented in Tables 3 and 4. They
quantify the degree of assortativity between resident and program characteristics observed in the
data. I also include the probability that a resident is matched to a program located in the same
state as her state of birth, or the same state as her medical school state.
The second and third sets of moments take advantage of the many-to-one matching nature of the
market.27 Section 4.2 presents summaries of these moments from the data. The moments cannot be
constructed in one-to-one matching markets, such as the marriage market, but are crucial to identify
even the simpler double-vertical model. Since these moments extract information from within a
peer group, they effectively control for both observable and unobservable program characteristics.28
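The two peer-group moments can be computed directly from a match. The sketch below is one possible reading of the within-program variance and the peer covariance moments for a single market; the function name and data layout are hypothetical, and residents without any peer are assumed to contribute zero to the peer moment.

```python
from collections import defaultdict


def peer_moments(match, x1, x2):
    """Within-program variance of x1 and covariance of x1 with peers' mean x2.

    match: dict resident -> program; x1, x2: dict resident -> scalar.
    """
    groups = defaultdict(list)
    for i, j in match.items():
        groups[j].append(i)
    n = len(match)
    within = 0.0
    peer = 0.0
    for i, j in match.items():
        members = groups[j]
        # Squared deviation from the program mean, which includes resident i.
        mean_x1 = sum(x1[k] for k in members) / len(members)
        within += (x1[i] - mean_x1) ** 2
        # Average x2 over i's peers, excluding i herself.
        others = [k for k in members if k != i]
        if others:  # assumption: residents with no peers add nothing
            peer += x1[i] * sum(x2[k] for k in others) / len(others)
    return within / n, peer / n


# Made-up market with two programs of two residents each.
match = {1: "A", 2: "A", 3: "B", 4: "B"}
x1 = {1: 1.0, 2: 3.0, 3: 2.0, 4: 2.0}
x2 = {1: 10.0, 2: 20.0, 3: 30.0, 4: 40.0}
m_within, m_peer = peer_moments(match, x1, x2)
```

Both statistics use only variation inside a program's set of matched residents, which is why they are unavailable in one-to-one markets where each group has a single member.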

27 Alternatively, one could combine moments of type 1 and 2 to include all entries in the within program covariance
of characteristics.
28 Note that the number of moments suggested increases rapidly as more characteristics are included in the preference
models. If the covariance between each observed characteristic of the resident and of the program are included in the
first set of moments, the number of moments is at least the product of the number of characteristics of each side.
On the other hand, the number of parameters is the sum of the number of characteristics. This relative growth can
create difficulties when estimating models with a very rich set of characteristics.

6.3 Simulating a Match

Under the parametric assumptions made on $\xi_{jt}$, $\varepsilon_i$, and $\beta_i$ in Section 3, for a given parameter vector
$\theta$, a unique pairwise stable match exists and can be simulated. Because residents only participate in
one market, matches of different markets can be simulated independently. For simplicity, I describe
the procedure for only one market and omit the market subscript $t$. For a draw of the unobservables
$\{\varepsilon_{is}, \beta_{is}\}_{i=1}^{N}$ and $\{\xi_{js}\}_{j=1}^{J}$ indexed by $s$, calculate

$$ h_{is} = x_i\alpha + \varepsilon_{is}, \tag{14} $$

and the indirect utilities $\{u_{ijs}\}_{i,j}$. The indirect utilities determine the program resident $i$ picks
from any choice set.
Begin by sorting the residents in order of their simulated human capital, $\{h_{is}\}_{i=1}^{N}$, and let $i^{(k)}$
be the identity of the resident with the $k$-th highest human capital.

Step 1: Resident $i^{(1)}$ picks her favorite program. Set her simulated match, $\mu^s(i^{(1)})$, to this
program and compute $J^{(1)}$, the set of programs with unfilled positions after $i^{(1)}$ is assigned.

Step k > 1: Let $J^{(k-1)}$ be the set of programs with unfilled positions after resident $i^{(k-1)}$ has
been assigned. Set $\mu^s(i^{(k)})$ to the program in $J^{(k-1)}$ most desired by $i^{(k)}$.

The simulated match $\mu^s$ can be used to calculate moments using equations (11) to (13). The
optimization routine keeps a fixed set of simulation draws of unobservable characteristics for computing
moments at different values of $\theta$.
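The procedure above amounts to letting residents choose in descending order of human capital. A minimal sketch under simplifying assumptions: the names are hypothetical, ties are broken by list order, and every resident prefers any open program to being unmatched (no outside option is modeled).

```python
def simulate_match(human_capital, utility, capacity):
    """Assign residents to programs in descending order of human capital.

    human_capital: list of simulated h_i.
    utility:       utility[i][j] is resident i's indirect utility for program j.
    capacity:      list of program capacities.
    Returns a list whose i-th entry is the program matched to resident i,
    or None if all positions were filled before her turn.
    """
    remaining = list(capacity)
    match = [None] * len(human_capital)
    order = sorted(range(len(human_capital)),
                   key=lambda i: human_capital[i], reverse=True)
    for i in order:
        # J^(k-1): programs that still have unfilled positions.
        open_programs = [j for j, c in enumerate(remaining) if c > 0]
        if not open_programs:
            break
        best = max(open_programs, key=lambda j: utility[i][j])
        match[i] = best
        remaining[best] -= 1
    return match


# Made-up example: three residents, two programs.
h = [0.2, 0.9, 0.5]
u = [[3.0, 1.0], [3.0, 1.0], [3.0, 2.0]]
cap = [1, 2]
mu = simulate_match(h, u, cap)
```

Resident 1 has the highest human capital and takes the single position at program 0; the other two residents are left with program 1.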
A model with preference heterogeneity on both sides requires a computationally more complex
simulation method, such as the Gale and Shapley (1962) deferred acceptance algorithm (DAA), to
compute a particular pairwise stable match. In the DAA, each applicant simultaneously applies to
her most favored program that has not yet rejected her. A set of applications are held at each stage
while others are rejected and assignments are made final only when no further applications are
rejected. This temporary nature of held applications and the need to compute a preferred program
for all applications at each stage significantly increases the computational burden for a market with
many participants such as the one studied in this paper.29
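For comparison, a compact sketch of resident-proposing deferred acceptance with program capacities is given below. It is an illustration rather than the paper's code: names are hypothetical, and proposals are processed one at a time instead of simultaneously, which is known to produce the same resident-optimal stable match.

```python
def deferred_acceptance(res_prefs, prog_rank, capacity):
    """Resident-proposing deferred acceptance with capacities.

    res_prefs[i]:    programs listed from most to least preferred by resident i.
    prog_rank[j][i]: program j's rank of resident i (lower is better).
    capacity[j]:     number of positions at program j.
    Returns a dict program -> list of residents held when the algorithm stops.
    """
    next_choice = [0] * len(res_prefs)
    held = {j: [] for j in range(len(capacity))}
    free = list(range(len(res_prefs)))
    while free:
        i = free.pop()
        if next_choice[i] >= len(res_prefs[i]):
            continue  # i has exhausted her list and stays unmatched
        j = res_prefs[i][next_choice[i]]
        next_choice[i] += 1
        held[j].append(i)  # application is held only tentatively
        if len(held[j]) > capacity[j]:
            # Reject the least preferred applicant among those held.
            worst = max(held[j], key=lambda r: prog_rank[j][r])
            held[j].remove(worst)
            free.append(worst)
    return held


# Made-up example: three residents, two programs.
res_prefs = [[0, 1], [0, 1], [1, 0]]
prog_rank = [[1, 0, 2], [0, 2, 1]]
capacity = [1, 2]
stable = deferred_acceptance(res_prefs, prog_rank, capacity)
```

Unlike the sequential procedure for the vertical model, an application here can be displaced later, so the loop may revisit the same resident several times before the match is final.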

6.4 Econometric Issues


In a data environment with many independent and identically distributed matching markets, the
sample moments and their simulated counterparts across markets can be seen as iid random variables.
Well known limit theorems could be used to understand the asymptotic properties of a
simulation based estimator (McFadden, 1989; Pakes and Pollard, 1989). The data for this study
are taken from eight academic years, making asymptotic approximations based on data from many
markets undesirable. Within each market, the equilibrium matches of agents are interdependent
through both observed and unobserved characteristics of other agents in the market. For this reason,
modelling the data generating process as independently sampled matches is unappealing as
well.
Instead, I consider a data generating process in which the size of the market grows rather than
the number of markets. The family medicine residency market has about 430 programs and 3,000
residents participating each year. Similar facts motivated theoretical work on the structure of the
set of stable matches and incentives of agents as the market grows in size (Kojima and Pathak,
2009).
29 Even with an insertion sort, a relatively inefficient sorting algorithm, the computational complexity of the
algorithm used here is $O(n^2)$ whereas if preferences were heterogeneous on both sides, a simulation to calculate the
resident optimal match using deferred acceptance algorithm would have a computational complexity of $O(n^3)$.

Agarwal and Diamond (in progress) study the properties of the estimator for the double-vertical
model in a single market for a data generating process in which the number of programs
and residents increases. For each program, $j$, the capacity is drawn from the distribution $F_c$,
with support on the natural numbers less than $\bar{c}$. They study the case where the total number of
positions $C_{tot} = \sum_j c_j$ is equal to the number of residents $N$. Under these asymptotics, the number
of market participants on each side grows at a stochastically proportional rate. The observed
data is a pairwise stable match for $N$ residents and $J$ programs with characteristics $(x_i, \varepsilon_i)$ and
$(z_{jt}, \xi_{jt})$ drawn from their respective population distributions. Such data can be viewed as a joint
distribution of observable characteristics of programs and residents, with information also on each
resident's peer group in the program. The challenge in obtaining asymptotic theory arises precisely
from the dependence of matches on the entire sample of observed characteristics. Similar challenges
arise in the literature on network formation models (see Kolaczyk, 2009; Christakis et al., 2010).
Monte Carlo evidence suggests that in a more general model like the one estimated in this paper,
the root mean square error in parameter estimates decreases with the sample size.

Calculating Standard Errors

An additional challenge arises for constructing confidence sets for the estimated parameter
because of interdependence of matches, and because bootstrapping the estimator directly is computationally
prohibitive. The covariance of the moments is estimated using a parametric bootstrap
to account for the dependence of matches across residents. With this estimate, I approximate
the error in the estimated parameter using a delta method that is commonly used in simulated
estimators (Gourieroux and Monfort, 1997):

$$ \hat{\Sigma} = \left( \hat{\Gamma}' W \hat{\Gamma} \right)^{-1} \hat{\Gamma}' W \left( \hat{V} + \frac{1}{S}\hat{V}^S \right) W' \hat{\Gamma} \left( \hat{\Gamma}' W \hat{\Gamma} \right)^{-1}, \tag{15} $$

where $\hat{\Gamma}$ is the gradient of the moments with respect to $\theta$ evaluated at $\hat{\theta}_{MSM}$ using two-sided
finite-difference derivatives; $W$ is the weight matrix used in estimation; $\hat{V}$ is an estimate of the
covariance of the moments at $\hat{\theta}_{MSM}$; $S$ is the number of simulations and $\hat{V}^S$ is an estimate of the
simulation error in the moments at $\hat{\theta}_{MSM}$.
In this section, I describe the choice of $W$ and outline the parametric bootstrap used to estimate
$\hat{V}$ for the simpler case with $N = C_{tot}$ and exogenous salaries. Appendix A provides additional details
on estimating $\hat{\Sigma}$. The bootstrap mimics the data generating process described earlier. Three basic
steps are used for each bootstrap iteration $b \in \{1, \ldots, B\}$:

1. Generate a bootstrap sample of programs $\{z_{j,b}, c_{j,b}\}_{j=1}^{J}$ by drawing from the empirical
distribution $\hat{F}_{Z,C}$ with replacement. Calculate $C_{tot,b} = \sum_j c_{j,b}$.

2. Generate a bootstrap sample of residents $\{x_{i,b}\}_{i=1}^{C_{tot,b}}$ from $\hat{F}_X$, with replacement.

3. Simulate the unobservables $\varepsilon_{i,b}$, $\beta_{i,b}$, $\xi_{jt,b}$ to compute $\{h_{i,b}\}_{i=1}^{C_{tot,b}}$ and $\{u_{i,j,b}\}_{i,j}$ at $\hat{\theta}_{MSM}$.
Calculate the stable match $\mu_b$ for bootstrap $b$ and corresponding moments $\hat{m}_b$.

The variance of $\hat{m}_b$ is the estimate for $\hat{V}$ used to compute $\hat{\Sigma}$. Monte Carlo evidence suggests
that the procedure yields confidence sets with close to the correct size. The model using the control
function correction has an additional step in this bootstrap to account for uncertainty in estimating
$\hat{\nu}_{jt}$, also described in Appendix A.
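The three bootstrap steps can be sketched in a toy setting. Everything below is an assumption made for illustration: residents and programs each have one scalar characteristic, unobservables are standard normal, all residents share the same ranking of programs (a double-vertical toy model), and only two moments are tracked. The function names are hypothetical.

```python
import random


def serial_dictatorship(h, u, cap):
    """Residents pick their best open program in descending order of h."""
    left = list(cap)
    match = [None] * len(h)
    for i in sorted(range(len(h)), key=lambda r: -h[r]):
        j = max((p for p in range(len(left)) if left[p] > 0),
                key=lambda p: u[i][p])
        match[i] = j
        left[j] -= 1
    return match


def moments(x, z, match):
    """Sorting moment mean(x_i * z_match(i)) and within-program variance of x."""
    n = len(x)
    sorting = sum(x[i] * z[match[i]] for i in range(n)) / n
    groups = {}
    for i, j in enumerate(match):
        groups.setdefault(j, []).append(x[i])
    within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g) / n
    return [sorting, within]


def bootstrap_moment_cov(programs, residents_x, alpha, beta, B, seed=0):
    """Covariance of moments across B parametric bootstrap samples."""
    rng = random.Random(seed)
    draws = []
    for _ in range(B):
        # Step 1: resample programs (characteristic, capacity) with replacement.
        prog_b = [rng.choice(programs) for _ in programs]
        z_b = [p[0] for p in prog_b]
        cap_b = [p[1] for p in prog_b]
        # Step 2: resample as many residents as there are positions.
        x_b = [rng.choice(residents_x) for _ in range(sum(cap_b))]
        # Step 3: draw unobservables, simulate the match, compute moments.
        h_b = [alpha * x + rng.gauss(0.0, 1.0) for x in x_b]
        xi_b = [rng.gauss(0.0, 1.0) for _ in prog_b]
        u_b = [[beta * z_b[j] + xi_b[j] for j in range(len(prog_b))] for _ in x_b]
        draws.append(moments(x_b, z_b, serial_dictatorship(h_b, u_b, cap_b)))
    k = len(draws[0])
    mean = [sum(d[a] for d in draws) / B for a in range(k)]
    return [[sum((d[a] - mean[a]) * (d[c] - mean[c]) for d in draws) / (B - 1)
             for c in range(k)] for a in range(k)]


# Made-up inputs: (characteristic, capacity) pairs and resident characteristics.
programs = [(1.0, 2), (0.5, 1), (-0.3, 3), (0.0, 2)]
residents_x = [0.1, -0.4, 1.2, 0.7, -1.0, 0.3, 0.9, -0.2]
V_hat = bootstrap_moment_cov(programs, residents_x, alpha=1.0, beta=1.0, B=40, seed=1)
```

Because the number of resampled residents equals the total number of positions, every resident is matched in each bootstrap sample, matching the simpler case described in the text.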
Finally, the weight matrix in estimation is obtained from bootstrapping directly from the joint
distribution of matches observed in the data. A bootstrap sample of matches $\{\mu_b\}_{b=1}^{B}$ is generated
by sampling, with replacement, $J$ programs along with their matched residents. The moments
from these matches are computed and the inverse of the covariance is used as the positive definite
weight matrix, $W$. The procedure does not require a first step optimization and does not need to
converge to $\hat{V}^{-1}$.

7 Empirical Specifications and Results


I present estimates from three models. The first model has the richest form of preferences as it
allows for unobserved heterogeneity in preferences via normally distributed random coefficients on
Case Mix Index, NIH Funds of major medical school affiliates and the number of beds. It also
allows for heterogeneity in taste for program location based on a resident's birth location and
medical school location. I use a second model that does not include random coefficients on Case
Mix, NIH Funds or beds to assess the importance of unobserved preference heterogeneity. These
two models treat salaries as exogenous. The final model modifies the second model to address the
potential endogeneity in salaries using the instrument described in Section 5.2. This specification
includes a program's own reimbursement rate in addition to characteristics included in the other
models.
Estimates of residents' preferences for programs presented in the next section are translated
into dollar equivalents for a select set of program characteristics. I also present the willingness
to pay by categories of programs. These are the most economically relevant statistics obtained
from preference estimates. Appendix B briefly discusses the underlying parameters, which are not
economically intuitive, and robustness using estimates from additional models.

7.1 Preference Estimates


Panel A.1 of Table 8 presents the estimated preferences for programs in salary equivalent terms.
Comparing specifications (1) and (2), the estimated value of a one standard deviation higher Case
Mix Index at an otherwise identical program is about $2,500 to $5,000 in annual salary for a typical
resident. Likewise, residents are willing to pay for programs at larger hospitals as measured by beds,
and for programs with better NIH funded affiliates. The estimates from specification (1) suggest
a substantial degree of preference heterogeneity for these characteristics as well. The additional
heterogeneity in preferences relative to specification (2) results in a shift in the mean willingness to
pay for NIH funding of major affiliates, the Case Mix Index, and beds, but not whether they are
desirable or not.
Panel A.2 presents estimates of preferences for program types and heterogeneity in preferences
for program location. Both specifications (1) and (2) estimate that, ceteris paribus, rural programs
are preferable to urban programs. This result is consistent with the reduced form evidence presented
in Section 2, which shows a positive though statistically insignificant association between
resident quality and rural programs, and that rural programs do not have a significantly larger fraction
of unfilled positions than urban programs. Because rural programs tend to be associated with
smaller hospitals and medical school affiliates with lower NIH funding, these estimates do not necessarily
imply that rural programs are preferred to urban programs. The next section presents the
willingness to pay by program categories and shows that overall, rural programs are less preferred
to urban programs.
Estimates from both specifications also suggest that residents prefer programs in their state of
birth or in the same state as their medical school. For instance, estimates from specification (1)
imply that a typical resident is willing to forgo about $10,000 in salary to match at a program in
the same state as their medical school. Although rural born residents prefer rural programs more
than other residents, they prefer rural programs at a monetary equivalent of under $1,200. The
estimated willingness to pay for these factors is smaller in specification (2) although the relative
importance for the different dimensions is similar.
Panel B presents parameter estimates for the distribution of human capital, which determines
ordinal rankings between residents. All specifications yield similar coefficients on the various resident
characteristics and estimate that the unobservable determinants of human capital have larger
variances for residents with foreign degrees. The estimated difference between a US born foreign
medical graduate and foreign graduates from other countries is an order of magnitude smaller than
the standard deviation of unobservable determinants of human capital.

7.1.1 Estimates with Instruments

As compared to estimates from specification (2), which treats salaries as exogenous, the estimated
willingness to pay for program characteristics is generally larger in specification (3). The estimate
for NIH funding of major medical school affiliates is the only exception. The increase in the
estimated willingness to pay in specification (3) is driven by a fall in the coefficient on salaries but
similar coefficient estimates for the other program characteristics. Appendix B discusses results
from the instrumented version of specification (1), which also leads to a decrease in the coefficient
on salaries and little change in estimates for other coefficients. This specification results in a small,
positive coefficient on salaries that is not statistically significant and implies an implausibly large
willingness to pay for better programs.
The qualitative effect of including the wage instrument on parameter estimates indicates that,
if anything, treating salaries as exogenous may lead to an understated willingness to pay for more
desirable programs. I interpret the magnitudes with caution given the lack of robustness, which is
likely a consequence of the limited salary variation in the data.30 Aside from controlled geographic
covariates such as rent and wage index, estimates in Column (2) of Table 7 do not show strong
evidence of substantial correlation of salaries with program characteristics. My preferred approach
is to focus on results from specification (1) for most counterfactual results and discuss the effect of
possible positive bias in the salary coefficient using specification (3).

7.1.2 Distribution of Willingness to Pay

The distribution of willingness to pay for different programs is an important economic input for
analyzing salaries under competitive wage bargaining and for evaluating the effect of financial incentives
for rural training. Figure 2 plots the estimated distribution of utility (in dollars) across
programs averaged over residents, net of salaries, for the 2010-2011 sample year as implied by specification
(1). This sample will be used for all counterfactual exercises. Table 9 presents summary
statistics of this distribution by categorizing programs into quartiles based on observed characteristics,
and normalizing the mean across all programs to zero. I estimate a large willingness to pay
for programs with a high Case Mix Index, at larger hospitals and in counties with larger programs.
A typical resident is willing to accept a $5,000 to $9,500 lower salary at the average urban program
instead of training in a rural location. At under $1,200, the estimated additional preference of
rural born residents for a rural program is not sufficient to overturn the mean distaste for training
in rural programs. The finding that the typical rural hospital is not substantially less attractive
than its urban counterparts is consistent with conclusions of Rosenblatt et al. (2006). Using
surveys of program directors, they find that residents matched at rural programs and the number
of applications per position are similar to those in urban programs.
Specifications (1) and (2) estimate the standard deviation in utility across residents and programs
of varying characteristics to be between $14,000 and $22,000. This measure doubles from
$14,000, but is imprecisely estimated, when Specification (2) is modified to account for endogeneity
in salaries. While differences in the quality of training provided by a program are likely the primary
driver of willingness to pay for different programs, as evidenced by tastes for geographically nearby
programs, there may be some contemporaneous value for desirable amenities. At first glance, the
estimated standard deviation in willingness to pay for programs may seem large with respect to
the observed variation in salaries (about $3,200). However, the ideal comparison is with the distribution
of training value added in terms of future income across residency programs, which is likely
much larger. Such a comparison is not possible given the available data.

30 The objective function for specifications using salary instruments is fairly flat along different combinations of
coefficients on the wage and control variables.

7.2 Model Fit

In this section, I describe the in-sample and out-of-sample fit of estimates from specification (1).
The fit of specifications (2) and (3) is qualitatively similar. The out-of-sample fit uses data from
the 2011-2012 wave of the GME Census, which was only accessed after parameter estimates were
computed.
Estimates of the model only determine the probability that a resident with a given observable
characteristic matches with a program with certain observables. The uncertainty in matches arises
from unobservables of both the residents and the programs. Therefore, an assessment of fit must
use statistics that average matches across groups of residents or programs.
For simplicity of exposition, I assess model fit using a single dimensional average quality of
matched program for a group of residents with similar observable determinants of human capital.
For each year $t$, I use the parameter estimates from the model to construct a quality index for each
resident $i$ and program $j$ by computing $x_i\hat{\alpha}$ and $z_{jt}\hat{\beta}$ respectively. Then, I divide the residents into
ten bins based on $x_i\hat{\alpha}$ and compute the mean quality of program with which residents from each bin
are matched. Figure 3 presents a binned scatter plot of this mean quality of program as observed
in the data and predicted by model simulations. Both the in-sample points and the out-of-sample
points are close to the 45-degree line. The 90% confidence sets of the simulated means for several
resident bins include the theoretical prediction.31
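The binned comparison can be sketched as follows. The function name and inputs are hypothetical: `resident_index` stands in for the estimated resident quality index and `matched_quality` for the quality index of the program each resident is matched to, either in the data or in a simulation. Bins are formed by sorting and splitting into groups of near-equal size, which is one of several reasonable ways to define them.

```python
def binned_means(resident_index, matched_quality, n_bins=10):
    """Mean matched-program quality within bins of the resident quality index."""
    order = sorted(range(len(resident_index)), key=lambda i: resident_index[i])
    n = len(order)
    means = []
    for b in range(n_bins):
        # Residents whose rank falls in the b-th slice of the sorted index.
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        if members:
            means.append(sum(matched_quality[i] for i in members) / len(members))
        else:
            means.append(None)  # more bins than residents
    return means


# Made-up numbers: six residents split into three bins.
resident_index = [5.0, 1.0, 3.0, 2.0, 4.0, 6.0]
matched_quality = [50.0, 10.0, 30.0, 20.0, 40.0, 60.0]
bin_means = binned_means(resident_index, matched_quality, n_bins=3)
```

Applying the same function to observed and simulated matches gives the pairs of points that a plot like Figure 3 would compare against the 45-degree line.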
This fit of the model provides confidence that parametric restrictions on the model are not
leading to poor predictions of the sorting patterns in the market. Therefore, I am comfortable
using estimates as the basis of counterfactual analysis.

31 A more model-free assessment of fit using sorting regressions only on observed covariates is presented in Table
B.2. One may also worry that predicting sorting patterns is mechanical because there is little change in the market
composition across years. For counterfactuals directly impacting the composition of market participants, it can be
important for the model to capture changes in sorting as a function of changes in the composition of the market.
However, changes in the composition of the resident and program distribution are negligible, resulting in little available
variation to test the model with such a fit.

8 Application 1: Salary Competition

In 2002, a group of former residents brought a class-action lawsuit under the Sherman Act
against major medical associations in the United States and the NRMP. The plaintiffs alleged
that the medical match is an instrumental competitive restraint used by the residency programs to
depress salaries.32 By replacing a traditional market in which residents could use multiple offers
to negotiate with programs, they argued that the NRMP "enabled employers to obtain resident
physicians without such a bidding war, thereby artificially fixing, depressing, standardizing and
stabilizing compensation and other terms of employment below competitive levels" (Jung et al. v.
AAMC et al., 2002). A brief prepared by Orley Ashenfelter on behalf of the plaintiffs argued that
competitive outcomes in this market would yield wages close to the marginal product of labor,
which was approximated using salaries of starting physicians, nurse practitioners, and physician
assistants.33 Physician assistants earned a median salary of $86,000 in 2010,34 as compared to about
$47,000 for medical residents despite longer work hours.35

32 Jung et al. v. AAMC et al. (2002) states that "The NRMP matching program has the purpose and effect of
depressing, standardizing and stabilizing compensation and other terms of employment." After the lawsuit was filed,
the Pension Funding and Equity Act of 2004 amended antitrust law to disallow evidence of participation in the medical
match in antitrust cases. The lawsuit was dismissed following this amendment, overturning a previous opinion of the
court upholding the price-fixing allegation.

Recent papers have debated whether low salaries observed in this market are a result of the
match. Using a stylized model, Bulow and Levin (2006) argue that salaries may be depressed
in the match because residency programs face the risk that a higher salary may not necessarily
result in a better resident. Kojima (2007) uses an example to show that this result is not robust
in a many-to-one matching setting because of cross-subsidization across residents in a program.
Empirical evidence in Niederle and Roth (2003, 2009) suggests that medical fellowship salaries are
not affected by the presence of a match; however, the study does not explain why fellowship salaries
remain lower than salaries paid to other health professionals.
The plaintiffs argued their case based on a classical economic model of homogeneous firms
competing for the services of labor and free entry. However, such a perfect competition benchmark
may not be a good approximation for an entry-level professional labor market. The data provide
strong evidence that residents have preferences for characteristics of the program other than the
wages and may, thus, reject a higher salary offer from a less desirable program. Further, barriers
to entry by residency programs are high and capacity constraints are imposed by accreditation
requirements. A program must therefore consider the option value of hiring a substitute resident
when confronted with a competing salary offer. High quality programs may be particularly able to
find other residents willing to work for low salaries. Conversely, highly skilled residents are scarce
and they may be able to bargain for higher salaries. It is essential to consider these incentives in
order to predict outcomes under competitive salary bargaining.
I model a "traditional" market using a competitive equilibrium, which is described by a vector
of worker-firm specific salaries and an assignment such that each worker and firm demands precisely
the prescribed assignment. Shapley and Shubik (1971) show that competitive equilibria correspond
to core allocations and satisfy two conditions. First, allocations must be individually rational
for both workers and firms. Second, it must be that at the going salaries no worker-firm pair
would prefer to break the allocation to form a (different) match at renegotiated salaries. This
latter requirement ensures that further negotiations cannot be mutually beneficial. Kelso and
Crawford (1982) show that competitive equilibria can result from a salary adjustment process in
which the salaries of residents with multiple offers are sequentially increased until the market clears.
The process embodies the "bidding war" plaintiffs suggest would arise in a "traditional" market.
Crawford (2008) proposed a redesign of the residency match based on the salary adjustment process
with the aim of increasing the flexibility of salaries in the residency market and implementing a
competitive equilibrium outcome.
I first develop a stylized model to derive the dependence of competitive equilibrium salaries on
33
A redacted copy of the expert report submitted on behalf of the plaintiffs is available on request.
34
Source: Bureau of Labor Statistics.
35
At 50 work-weeks a year and 80 hours a week, the cap imposed by the ACGME in 2003, a salary of $50,000 yields
an hourly wage rate for a medical resident of $12.50. A more generous estimate with 65 hours a week, 45 work-weeks a year
and a salary of $60,000 yields an hourly wage rate of $20.50.

both the willingness to pay for programs and the production technology of residency programs.
For counterfactual simulations, I adopt an approach that does not rely on knowing the production
technology of resident-program pairs because data on residency program output is not available.
Instead of calculating equilibrium salaries, I use the estimates of only the residents' preferences to
calculate an equilibrium markdown from output net of training costs, called the implicit tuition.
Loosely speaking, my calculation acts as if the output produced by a program-resident pair accrues
entirely to residents. The illustrative model shows that the approach is likely to understate the
equilibrium markdown in salaries since programs do not earn any infra-marginal productive rents
due to their own productivity. The theoretical model is also used to describe differences with related
models of on-the-job training or salary setting with non-pecuniary amenities.

8.1 An Illustrative Assignment Model


I generalize the model of the residency market in Bulow and Levin (2006), which assumes that
residents take the highest salary offer. I allow resident preferences to depend on program quality
in addition to salaries, and use a more flexible production function than Bulow and Levin (2006).
Consider an economy with $N$ residents and programs in which each program may hire only
one resident. Resident $i$ has a human capital index, $h_i \in [0, \infty)$, and program $j$ has a quality of
training index, $q_j \in [0, \infty)$. To focus on salary bargaining, the training quality of programs is
held exogenous. Without loss of generality, index the residents and programs so that $h_i \geq h_{i-1}$,
$q_j \geq q_{j-1}$, and $q_1$ and $h_1$ are normalized to zero.
Residents have homogeneous, quasi-linear preferences for the quality of program, $u(q, w) = aq + w$
with $a \geq 0$. The value, net of variable training costs, to a program of quality $q$ of employing a
resident with human capital index $h$ is $f(h, q)$ where $f_h, f_q, f_{hq} > 0$ and $f(0, 0)$ is normalized to
0.[36] A program's profit from hiring resident $h$ at salary level $w$ is $f(h, q) - w$. I assume that an
allocation is individually rational for a resident if $u(q, w) \geq 0$, and for a program if $f(h, q) - w \geq 0$.
A competitive equilibrium assignment maximizes total surplus. In this model, the unique
equilibrium is characterized by positive assortative matching and full employment. Hence, in
equilibrium, resident $k$ is matched with program $k$ and is paid a possibly negative wage $w_k$. The vector
of equilibrium wages is determined by the individual rationality constraints and the constraint

$$f(h_k, q_k) - w_k \geq f(h_i, q_k) - w_i + a (q_k - q_i). \qquad (16)$$

This constraint on $w_k$ requires that the profit of program $k$ from hiring resident $k$ must be weakly
greater than the profit from hiring resident $i$. At the going salaries, it is incentive compatible for
resident $i$ to accept an offer from program $k$ only if the wage is at least $w_i - a(q_k - q_i)$.
There is a range of wages that are a part of a competitive equilibrium. Shapley and Shubik
(1971) show that there exists an equilibrium that is weakly preferred by all residents to all other
equilibria, and another that is preferred by all programs. Appendix C.1 characterizes the entire
36
A complementary production technology is commonly assumed for studying on-the-job training (Becker, 1975,
pp 34) or sorting in matching markets (Becker, 1973; Teulings, 1995).

set of equilibria, and derives the expression for wages at these two extremal outcomes. Since the
plaintiffs alleged that salaries are currently much lower than in a bargaining process, I focus on the
worker-optimal equilibrium, which has higher salaries for every worker than any other equilibrium.
This outcome is unanimously preferred by all residents to other competitive equilibria. The wage
of resident $k$ in the worker-optimal equilibrium is given by

$$w_k = -a q_k + \sum_{i=2}^{k} \left[ f(h_i, q_i) - f(h_{i-1}, q_i) \right]. \qquad (17)$$

Resident 1 receives her product of labor $f(h_1, q_1)$ (normalized to 0), the maximum her employer
is willing to pay. For resident 2, the first term $-aq_2$ represents an implicit price for the difference in
the value of training received by her compared to that of program 1 (with $q_1 = 0$). If a resident
were to use a wage offer of $w$ by program 1 in a negotiation with program 2, the resident would
accept a counter offer of $w - aq_2$. The second term in this resident's wage, $f(h_2, q_2) - f(h_1, q_2)$,
is program 2's maximum willingness to pay for the difference in productivity of residents 1 and 2,
which accrues entirely to the resident in the worker-optimal equilibrium. The sum of these two
terms measures the impact of the outside option of each party on the wage negotiation determining
$w_2$. For $k > 2$, these (local) differences in the productivity of residents add up across lower matches
to form the equilibrium wage.
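
As a rough numerical illustration of equation (17), the sketch below computes worker-optimal wages for a three-resident economy and checks them against constraint (16). The production function, the value of a, and the human capital and quality indices are hypothetical values chosen only to satisfy the assumptions of this section; they are not taken from the data, and the function names are illustrative.

```python
def worker_optimal_wages(h, q, a, f):
    """Worker-optimal wages from equation (17).

    h and q are sorted in ascending order with h[0] = q[0] = 0,
    and f(h, q) is the production function with f(0, 0) = 0.
    """
    wages, rent = [], 0.0
    for k in range(len(h)):
        if k > 0:
            # Productivity gain of resident k over resident k-1 at program k.
            rent += f(h[k], q[k]) - f(h[k - 1], q[k])
        wages.append(-a * q[k] + rent)
    return wages


def satisfies_constraint_16(h, q, a, f, wages, tol=1e-9):
    """Check that no program k gains by hiring another resident i, as in (16)."""
    n = len(h)
    return all(
        f(h[k], q[k]) - wages[k]
        >= f(h[i], q[k]) - wages[i] + a * (q[k] - q[i]) - tol
        for k in range(n) for i in range(n)
    )


# Hypothetical example: f(h, q) = h * (1 + q) + q has f_h, f_q, f_hq > 0.
f = lambda h, q: h * (1 + q) + q
h, q, a = [0, 1, 2], [0, 1, 2], 3
wages = worker_optimal_wages(h, q, a, f)
profits = [f(h[k], q[k]) - wages[k] for k in range(3)]
print(wages, profits)
```

In this example the wages are (0, -1, -1) and program profits are (0, 4, 9), each at least as large as the corresponding value of $aq_k$, which is (0, 3, 6).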

Implicit Tuition

The implicit price for training at firm $k$, given by $aq_k$, is based on the preferences for training
at a program rather than the cost of training. In models of general training that use a perfect
competition framework, such as Rosen (1972) and Becker (1975), the implicit price is the marginal
cost of training alone because free entry prevents firms from earning rents due to their quality.[37]
When entry barriers are large due to fixed costs or restrictions from accreditation requirements,
firms can earn additional profits due to their quality. I argue that ruling out entry is appropriate
because of accreditation requirements and to focus on wage bargaining. Equation (17) shows that
under these assumptions, program $k$ can levy the implicit tuition $aq_k$ on residents. This implicit
tuition results from a force similar to compensating differentials (Rosen, 1987), but allows for
heterogeneity in resident skill. Equilibrium salaries are the sum of the implicit tuition and a split
of the value $f$ produced by a resident-program pair.
As mentioned earlier, the data do not allow us to determine $f$. I calculate the implicit tuition
using residents' preferences alone in order to evaluate whether a gap between $f$ and equilibrium
salaries exists as a result of market fundamentals. The next result shows that the implicit tuition
37
Viewing $f(h, q)$ as output net of costs of training, a constant training cost across residents and programs would
shift the wage schedule down by that constant. As can be seen from equation (17), training costs that depend on
program quality, but not the quality of the resident, do not affect equilibrium salaries as long as $f_q$ remains positive.
Also note that the implicit price $aq_k$ does not depend on the number of residents and programs $N$, which could
be very large, or the distribution of program quality. Intuitively, the important difference overturning results from
perfect competition is that the number of firms competing for a fixed set of workers is not disproportionately large.

bounds the markdown in salaries from below. Under free entry by firms, salaries would be equal
to $f$ because any profits earned by firms would be competed away.

Proposition 1 For all production functions $f$ with $f_h, f_q, f_{hq} \geq 0$, the profit of firm $k$ is
bounded below by $aq_k$ in any competitive equilibrium.

Proof. Corollary to Proposition 5 stated and proved in Appendix C.2.


Hence, the implicit tuition $aq_k$ is a markdown in salaries that is independent of the output. If
residents have a strong preference for program quality, this implicit tuition will be large and salaries
in any competitive equilibrium are well below the product $f(h_k, q_k)$.
To interpret the implicit tuition as a lower bound for salary markdowns, consider two particular
limiting cases for the production function. If $f(h, q)$ depends only on $h$ so that the value of a
resident, denoted $f(h)$, does not vary across programs, the worker-optimal salaries are given by

$$w_k = f(h_k) - a q_k. \qquad (18)$$

Under this production function, the resident is the full claimant of the value of her labor and salaries
equal her product net of the implicit tuition. Residents are able to engage programs in a bidding
war until their salary equals the output less the implicit tuition because all programs value resident
$k$ at $f(h_k)$.
On the other hand, if $f(h, q)$ depends only on $q$ so that all residents produce $f(q)$, irrespective
of their human capital, the worker-optimal salaries are

$$w_k = -a q_k. \qquad (19)$$

In this case, the program does not share the product $f(q_k)$ with the resident since any two residents
are equally productive at the program. The resident still pays an implicit tuition for training.[38]
The production function directly influences competitive salaries but Proposition 1 shows that
in all cases resident $k$ pays the implicit tuition $aq_k$. Equilibrium wages given in equations (18)
and (19) highlight that the side of the market that owns the factor determining differences in $f$
is compensated for their productivity in a competitive equilibrium. Residents are compensated
for their skill only if human capital is an important determinant of $f$. For this reason, using a
production function of the form $f(h)$ results in a markdown in salaries from $f$ that is only due to
the implicit tuition.
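
Both limiting cases follow directly from equation (17). The short derivation below is a sketch that uses only the normalizations $h_1 = q_1 = 0$ and $f(0, 0) = 0$ stated in Section 8.1:

```latex
% Case f(h, q) = f(h): the sum in (17) telescopes.
w_k = -a q_k + \sum_{i=2}^{k} \left[ f(h_i) - f(h_{i-1}) \right]
    = -a q_k + f(h_k) - f(h_1)
    = f(h_k) - a q_k ,
\qquad \text{since } f(h_1) = f(0) = 0 .

% Case f(h, q) = f(q): every term in the sum vanishes.
w_k = -a q_k + \sum_{i=2}^{k} \left[ f(q_i) - f(q_i) \right] = -a q_k .

% Program k's profit in the two cases:
f(h_k) - w_k = a q_k , \qquad f(q_k) - w_k = f(q_k) + a q_k \geq a q_k .
```

In the first case the program's profit is exactly the implicit tuition; in the second the program also keeps the entire product, so the implicit tuition is a lower bound on its profit.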
This interpretation highlights a key difference from results derived using models with many
firms competing for labor with free entry. In those models, one expects all the product to accrue
to the workers because firms enter the market to bid for labor services until a zero profit condition
is met. High compensation for residents is a result of free entry rather than negotiations between
a fixed set of agents.
38
In order to ensure that the match is assortative in these limiting cases, I assume that if a program (resident) has
two equally attractive offers, the tie is broken in favor of the resident (program) with the higher human capital (quality).

8.2 Generalizing the Implicit Tuition
The expression for the implicit tuition derived above relied on the assumption that residents have
homogeneous preferences for program quality. For this reason, the results from the illustrative
model do not speak to competitive outcomes in a model with heterogeneous preferences. This
section generalizes the definition of implicit tuition to make it applicable to the model defined in
Section 3.
Notice that the profit earned by program $k$ in a worker-optimal equilibrium under a production
function of the form $f(h)$ is precisely the implicit tuition $aq_k$ because this production function
does not provide programs with infra-marginal productive rents. Under this production function,
markdowns from output are determined only by residents' preferences for programs. Consequently,
calculating firm profits using a production function of this type may provide a conservative approach
to estimating payoffs to programs more generally. The next result shows that under heterogeneous
preferences for programs, the difference between salaries and output is the same for all production
functions of the form $f(h)$. This ensures that an implicit tuition can be defined and calculated
using only the residents' willingness to pay for programs, circumventing the need for estimating $f$.
For notational simplicity, I state the result for a one-to-one assignment model, and the general
result for the many-to-one setting is stated and proved in Appendix C.4.[39] With a slight abuse of
notation, let the total surplus from the pair $(i, j)$ be $a^f_{ij} = u_{ij} + f(h_i) \geq 0$.[40] Here, $u_{ij}$ is the
utility, net of wages, that resident $i$ receives from matching with program $j$ and $f(h_i)$ is the output
produced by resident $i$. I now characterize the equilibria for a modified assignment game in which
the surplus produced by the pair is $a^{\tilde{f}}_{ij} = u_{ij} + \tilde{f}(h_i) \geq 0$ in terms of the equilibria of the game
with surplus $a^f_{ij}$.

Proposition 2 The equilibrium assignments of the games defined by $a^f_{ij}$ and $a^{\tilde{f}}_{ij}$ coincide. Further,
if $u^f_i$ and $v^f_j$ are equilibrium payoffs for the surplus $a^f_{ij}$, then $u^{\tilde{f}}_i = u^f_i + \tilde{f}(h_i) - f(h_i)$ and $v^{\tilde{f}}_j = v^f_j$
are equilibrium payoffs under the surplus $a^{\tilde{f}}_{ij}$. Hence, a firm's profit in a worker-optimal equilibrium
depends on $\{u_{ij}\}_{i,j}$ but is identical for all production functions of the form $f(h)$.

Proof. See Appendix C.4 for the general case with many-to-one matches.
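
The one-to-one case can be checked in a few lines. The sketch below is an informal argument rather than the proof in Appendix C.4, and it assumes that every resident is matched under each assignment $\mu$ being compared (full employment):

```latex
% (i) Assignments: the two objectives differ by a constant that does not depend on \mu.
\sum_i a^{\tilde f}_{i\mu(i)}
  = \sum_i a^{f}_{i\mu(i)} + \sum_i \left[ \tilde f(h_i) - f(h_i) \right] .

% (ii) Payoffs: stability under a^f carries over to a^{\tilde f}.
u^{\tilde f}_i + v^{\tilde f}_j
  = u^{f}_i + v^{f}_j + \tilde f(h_i) - f(h_i)
  \geq a^{f}_{ij} + \tilde f(h_i) - f(h_i)
  = a^{\tilde f}_{ij} ,
% with equality for matched pairs, because equality holds there under a^f.
```

Step (i) shows that the same assignment maximizes total surplus in both games, and step (ii) shows that shifting each resident's payoff by $\tilde{f}(h_i) - f(h_i)$ while leaving program payoffs unchanged preserves stability.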
As in the illustrative model, under a production technology that depends only on human capital,
the residents are the residual claimants of output. An increase or decrease in the productivity of
human capital is reflected in the wages, one for one. The firm's profits depend only on the
preferences of the residents. Thus, I refer to the difference between output and salaries in the
worker-optimal competitive equilibrium for a model in which $f$ depends only on $h$ as the implicit
tuition. This definition uses the assumption that preferences of the programs can be represented
using a single human capital index in the empirical model but also makes the additional restriction
39
In the general formulation, I assume that the total output from a team of residents $(h_1, \ldots, h_{q_j})$ is
$F(h_1, \ldots, h_{q_j}) = \sum_{k=1}^{q_j} f(h_k)$, where $f(h_k) = 0$ if position $k$ is not filled.
40
This formulation implicitly assumes that, at every program, it is individually rational for a worker to accept a
salary equal to her product. It further assumes that the output of every resident is non-negative.

that the productivity of human capital, in dollar terms, does not depend on the identity of the
program.
To the best of my knowledge, a closed form expression for competitive equilibrium salaries is
not available when preferences of the residents are heterogeneous. I calculate the implicit tuition
implied by estimated preferences using a two-step procedure.[41] Each step solves a linear program
based on the approach developed in Shapley and Shubik (1971):

Step 1 : Solve the optimal assignment problem, modified from the formulation by Shapley
and Shubik (1971) to allow for many-to-one matching.

Step 2 : Calculate the worker-optimal element in the core given the assignments from step 1.

Appendix C.3 describes the procedure in more detail. All calculations are done with the 2010-2011
sample of the data.
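
The logic of the two steps can be sketched for a small one-to-one market. This is a simplified illustration under stated assumptions, not the procedure of Appendix C.3: it ignores many-to-one capacities, replaces the linear program in Step 1 with brute-force enumeration, and replaces the linear program in Step 2 with an equivalent fixed-point iteration on the stability inequalities. The function names and the utility and output numbers are hypothetical.

```python
from itertools import permutations


def optimal_assignment(surplus):
    """Step 1: brute-force the surplus-maximizing one-to-one assignment.

    surplus[i][j] is the total surplus a_ij of resident i at program j.
    Returns match, where match[i] is the program assigned to resident i.
    """
    n = len(surplus)
    return max(permutations(range(n)),
               key=lambda m: sum(surplus[i][m[i]] for i in range(n)))


def worker_optimal_profits(surplus, match):
    """Step 2: smallest program profits v_j in the core, given the match.

    Stability requires v_j >= v_match[i] + a_ij - a_i,match[i] and v_j >= 0.
    Iterating these inequalities upward from v = 0 reaches the least
    solution, which is the worker-optimal core point. Resident payoffs are
    assumed to stay non-negative, as in the paper's full-employment case.
    """
    n = len(surplus)
    v = [0.0] * n
    for _ in range(n):  # n passes suffice when the match is optimal
        for i in range(n):
            for j in range(n):
                bound = v[match[i]] + surplus[i][j] - surplus[i][match[i]]
                if bound > v[j]:
                    v[j] = bound
    return v


def implicit_tuition(u, f):
    """Program profits when output f[i] depends only on the resident."""
    n = len(u)
    surplus = [[u[i][j] + f[i] for j in range(n)] for i in range(n)]
    match = optimal_assignment(surplus)
    return match, worker_optimal_profits(surplus, match)


# Hypothetical utilities net of wages (rows: residents, columns: programs).
u = [[0, 6, 9], [0, 5, 12], [0, 4, 7]]
match, tuition = implicit_tuition(u, f=[1, 2, 3])
print(match, tuition)
```

In this example the assignment is (1, 2, 0) and the implicit tuition is (0, 4, 7) by program. Replacing the output vector with any other non-negative values leaves the tuition unchanged, which is the content of Proposition 2.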

8.3 Estimates of Implicit Tuition


Estimates presented in Section 7 suggest that residents are willing to take large salary cuts in order
to train at more preferred programs, which can translate into large implicit tuitions. Table 10
presents summary statistics of the distribution of implicit tuition using estimates from specifications
(1) through (3). I estimate the average implicit tuition to be about $23,000 for specifications (1)
and (2). This estimate rises to $43,500 when using the instrument in specification (3) because
the coefficient on salaries falls. As mentioned in Section 7, the instrument used appears weak and
yields non-robust point estimates, but generally results in a larger willingness to pay and implicit
tuitions through a decrease in the coefficient on salaries.[42] The standard error in the estimate using
specification (3) is also large, at $13,700, but can rule out an average implicit tuition smaller than
$17,000. These estimates are economically large in comparison to the mean salary of about $47,000
paid to residents.
The results also show significant dispersion in the implicit tuition across residents and programs.
The standard deviation in the implicit tuition is between $12,000 and $25,000. The 75th percentile
of implicit tuition can be about three times higher than the 25th percentile, with even higher
values at the 95th percentile. This dispersion primarily arises from the differences in program
quality, which allow higher quality programs to lower salaries more than relatively lower quality
programs.
The estimated implicit tuition is between 50% and 100% of the $40,000 salary difference between
medical residents and physician assistants. This finding refutes the plaintiffs' argument that the
41
Since the total number of residents observed in the market is less than the number of positions and the value of
options outside the residency market is difficult to determine, I will assume that the equilibrium is characterized by
full employment. This property follows if, for instance, it is individually rational for all residents to be matched with
their least desirable program at a wage that is equal to the total product produced by the resident at this program
and the product produced by a resident is not negative.
42
The instrumented version of specification (1) results in implicit tuition estimates much larger than the ones
reported because of the smaller estimated coefficient on salaries.

salary gap would not exist if residents' salaries were set competitively and physician assistant salaries
approximated the productivity of residents. However, the estimated implicit tuition cannot explain
the salary gap between starting physicians and medical residents, which is approximately $90,000.[43]
As discussed earlier, the implicit tuition is a conservative estimate of the salary markdown and part
of this salary gap may be due to differences in the productivity of medical residents and starting
physicians.
When residents' preferences are heterogeneous, the implicit tuition is also a function of the
relative demand and supply of different types of residency positions, and is not simply a result of
compensating differentials. Estimates from specification (1) imply a willingness to pay by residents
for programs in the same state as their medical school, and programs in the same state as their birth
state. Therefore, the demand for residency positions is high in states where many residents were
born or states where many residents went to medical school. A supply-demand imbalance occurs,
for instance, when the number of residency positions in the state is low but many residents have
a preference for training in that state. These forces will be important determinants of equilibrium
salaries if the residency market adopts the design proposed in Crawford (2008) because the proposal
is intended to produce a competitive equilibrium outcome.
To demonstrate the effect of this imbalance on the estimated implicit tuition, I present results
from the regression

$$\ln y_j = z_j \rho_1 + \rho_2 \ln npos_{s_j} + \rho_3 \ln gr_{s_j} + \rho_4 \ln born_{s_j} + e_j,$$

where $y_j$ is the average implicit tuition at program $j$ estimated using specification (1), $z_j$ are
characteristics of program $j$ included in specification (1), $s_j$ is program $j$'s state, $npos_{s_j}$ is the
number of residency positions offered in $s_j$, $gr_{s_j}$ is the number of residents from MD medical schools
in state $s_j$ and $born_{s_j}$ is the number of residents born in state $s_j$. Column (4) of Table 11 shows
that the elasticity of the average implicit tuition at a program with respect to the number of family
medicine graduates getting their degrees in a medical school in that state is positive, $\hat{\rho}_3 = 0.19$.
Conversely, the elasticity with respect to the number of positions offered in the program's state
is negative, $\hat{\rho}_2 = -0.16$. The estimate $\hat{\rho}_4$ is not statistically significant, partially because the
estimated preference for birth state is low and because supply-demand imbalance based on birth
state is also lower.
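
To read these magnitudes, recall that a coefficient in a log-log regression is an elasticity. The short sketch below converts the two reported point estimates into implied percentage changes in the average implicit tuition for a hypothetical 10% increase in the corresponding state-level count, holding the other regressors fixed. The function name and the 10% figure are illustrative assumptions.

```python
def implied_pct_change(elasticity, pct_increase):
    """Percent change in y implied by a log-log coefficient
    when x rises by pct_increase percent."""
    return ((1 + pct_increase / 100) ** elasticity - 1) * 100


# Point estimates reported in the text: 0.19 (graduates) and -0.16 (positions).
more_graduates = implied_pct_change(0.19, 10)
more_positions = implied_pct_change(-0.16, 10)
print(round(more_graduates, 2), round(more_positions, 2))
```

Under these assumptions, 10% more in-state medical graduates is associated with an implicit tuition roughly 1.8% higher, and 10% more in-state positions with one roughly 1.5% lower.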

8.4 Discussion
In matching markets, agents on both sides are heterogeneous and have preferences for match
partners. The effects of this feature on market outcomes, especially when barriers to entry are
substantial, are not captured by a perfect competition model. Theoretical results presented in Section
8.1 show that equilibrium salaries can be well below the product of labor, net of costs of training,
43
I use a Mincer equation estimated using interval regressions on confidential data from the Health Physician Tracking
Survey of 2008 to calculate the average salaries for starting family physicians. Details are in Appendix F.

when residents value the quality of a program. Counterfactual estimates show that the willingness
to pay for programs results in large markdowns in salaries in a competitive wage equilibrium.
The upper end of estimates can explain the salary gap between physician assistants and medical
residents assuming that physician assistant salaries are close to the productivity of residents. My
estimates also show that higher quality programs would earn a larger implicit tuition than less
desirable programs. To the extent that higher quality programs are matched with higher skilled
residents and are also intrinsically more productive, the implicit tuition is a countervailing force
to high dispersion in salaries driven by productivity differences.
The analysis suggests that salaries are low not because of the design of the match but because programs
are capacity constrained and barriers to entry are large due to fixed costs or accreditation
requirements. The implicit tuition can therefore explain the empirical observations of Niederle and Roth
(2003, 2009) in fellowship markets and highlights why analyzing matching markets using a perfect
competition model can be quantitatively misleading.
In this market, salaries may also be influenced by the previously mentioned guideline requiring
minimum financial compensation for residents. While these forces may be important, they seem
unrelated to the match. In other words, programs may not have the incentive to pay salaries close
to levels suggested by the plaintiffs because of economic primitives.

9 Application 2: Rural Hospitals


Access to medical care is significantly lower in rural communities of the United States: about
a fifth of the US population resides in rural counties but only a tenth of physicians practice in these areas
(Rosenblatt and Hart, 2000). Increasing residency training in rural areas is seen as an important
part of solutions to this disparity in access to care because of the empirical association between
rural training or background and the recruitment and retention of rural physicians (Brooks et al., 2002;
Talley, 1990). About 20% of urban-born residents graduating from family medicine programs start
their initial practice in rural areas, roughly in proportion to the population in rural communities
of the US, whereas about 46% of rural-born family medicine residents begin their practice in rural
communities (Table D.5). Both urban-born and rural-born residents trained in rural areas are
about 30 percentage points more likely to enter a rural practice after residency (Table D.5). While
some of this association is probably driven by selection into rural residency training programs, it
may also partly be a causal effect of rural training. The difference in the nature of urban and
rural medicine and specialized experience useful for practicing in rural areas may be a contributing
factor.[44]
The Patient Protection and Affordable Care Act of 2010 (ACA) contains provisions for increasing
the training and recruitment of primary care physicians in rural areas. The ACA provides an
44
Non-specialist primary care physicians tend to supply a disproportionately larger fraction of medical care in rural
counties, including emergency and obstetrics care. Family medicine residents training in rural areas may consequently
be more likely to receive specific experience for practicing rural medicine. Many practitioners concerned with the
rural physician shortage argue for an increased emphasis on rural residency training through either rural programs
or rotations (Rosenblatt and Hart, 2000; Rabinowitz et al., 2008).

additional $1.5 billion to loan forgiveness programs focused on recruiting physicians into health
physician shortage areas and creates targeted grants for increasing residency training positions in
primary care, especially in rural areas.[45] Similar concerns motivated Japan to institute regional caps
that reduced the number of positions in urban programs proportionally to their size. Arguably,
caps on urban programs could be implemented in the United States through the Accreditation
Council for Graduate Medical Education (ACGME). In fact, the ACA moves a large number of
unused Medicare funds allocated for supporting costs of residency training in urban programs to
states with disproportionately low resident-to-population ratios and rural areas (see Section 5503 of
the ACA, 2010).
Broadly speaking, the ACA enacts recruitment incentives and quantity regulations to encourage
physician supply in rural areas. I study the effects of these policies by comparing simulated outcomes
from environments with and without the intervention. A complete model of the market makes it
possible to account for general equilibrium effects. I focus on quantifying the impact of these policy
interventions on the sorting and number of residents in rural programs because many of the private
and social costs and benefits are difficult to quantify. Insight on the assignments resulting from
these interventions may influence the decisions of a social planner considering such policies.
All simulations are conducted using the 2010-2011 academic year of the data and specification
(1). I assume that the policies do not affect the entry of residents into the market. Specifications (2)
and (3) yield qualitatively similar results. Specification (1) does not use an instrument for salaries,
which Section 7 notes is likely to result in an overestimate of the coefficient on salaries. This is not
a primary concern in the analysis of supply interventions because salaries are kept fixed, and only
the choices of residents conditional on salaries are important. The analysis of financial incentives,
however, may overestimate the sensitivity of residents to these policies.

9.1 Financial Incentives for Rural Training


I mimic the loan forgiveness programs of the National Health Services Corps, but applied to medical
residents. The program currently provides an annual incentive of $20,000 to $30,000 to primary
care physicians for practicing in a Health Physician Shortage Area, usually rural or inner-city
communities. To simulate the impact of such recruitment incentives for residents training in rural areas,
I exogenously increase the salaries at rural hospitals by $5,000, $10,000 and $20,000. The average
estimated utility difference between the rural and urban programs is between $5,000 and $10,000
(Table 9).
Panel A of Table 12 presents summaries from the baseline simulation from the model using data
from the year 2010-2011. The number of positions filled in rural areas, as observed in the data,
is 310. The average predicted by the model is slightly higher at 313.37, although the inter-quartile
45
The ACA supplements the budget of the National Health Services Corps loan forgiveness program. Section 5301
provides grants for enhancing capacity at existing primary care training locations and Sections 10501(I) and 5508(a)
provide grants specifically for establishing new programs in rural health clinics and programs. See Bailey (2010) or
Table 2 of the Congressional Research Service report titled "Discretionary Spending in the Patient Protection and
Affordable Care Act (ACA)."

range of simulations contains the observed number of matches. According to baseline simulations,
the quality of doctors matched with rural areas is similar to the quality of doctors in urban areas.
This is consistent with the reduced form evidence presented in Table 3, which does not show a significant
disadvantage for currently operating rural programs.[46]
Panel B presents the impact of increased incentives for rural training. The incentive induces
residents roughly indifferent between a rural and an urban program to rank the rural program ahead
of the urban program. Across the board, we see small increases in the number of residents matched
to programs in rural communities. An incentive of $20,000 increases the number of residents training
in rural areas by about 17, or 5.5% of the number of positions in rural programs. This incentive
costs the government $325,000 per additional resident matched to a rural program because most of
the loan forgiveness accrues to residents assigned to positions that would be occupied without the
financial incentive. Instead of affecting numbers, the primary impact of incentives is an increase in
the human capital of residents matching to rural areas. As compared to a baseline of about an even
chance, under a small $5,000 incentive, a randomly chosen rural resident is about 9.4 percentage
points more likely to have a higher human capital than an urban resident. This gain in resident
quality increases with the size of the incentive.
These results can be explained by capacity constraints in rural areas. While price incentives
directly increase the number of residents ranking rural programs ahead of urban programs, the
number that match with any given program is constrained by its capacity. With 310 out of 334
positions filled, there is little scope for the incentive to substantially increase numbers. Consequently,
although the incentives increase the pool of residents ranking rural programs higher, capacity
constraints prevent an increase in numbers but allow an increase in the quality of residents matched
at subsidized programs.
One may ask whether a simpler analysis based on partial equilibrium reasoning with unilateral
salary increases by programs would lead to similar conclusions on the assignments between residents
and programs. The quasi-linear utility function implies that a uniform increase in salaries of
all residency programs would not impact assignments because the comparison between any two
programs remains unchanged. A partial equilibrium analysis based on unilateral salary increases
substantially deviates from this prediction. For smaller interventions we expect general equilibrium
effects to be less pronounced. In Appendix D.2, I compare general and partial equilibrium effects of
incentivizing rural training, and more broadly, training in medically underserved states. I find that
a partial equilibrium analysis overestimates the number of positions allocated for small incentives,
but for larger incentives, overestimates aggregate increases in the quality of residents.
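
To make the invariance argument concrete, the display below uses illustrative notation that is not taken from the paper: $v_{ij}$ is resident $i$'s non-pecuniary valuation of program $j$, $w_j$ is the salary at program $j$, and utility is assumed quasi-linear with a unit coefficient on salary. A uniform raise $\Delta$ at every program cancels from every pairwise comparison:

```latex
\begin{align*}
u_{ij}(w) &= v_{ij} + w_j, \\
u_{ij}(w + \Delta \mathbf{1}) - u_{ik}(w + \Delta \mathbf{1})
  &= (v_{ij} + w_j + \Delta) - (v_{ik} + w_k + \Delta) \\
  &= u_{ij}(w) - u_{ik}(w).
\end{align*}
```

By contrast, a raise at program $j$ alone shifts $u_{ij} - u_{ik}$ by $\Delta$ for every $k \neq j$, so extrapolating from unilateral changes predicts re-sorting that a uniform change would not produce.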

46 Unconditionally, rural programs are 7 percentage points more likely to be matched with residents that have an
MD degree. The average medical school median MCAT score of a resident matched with a rural program is less
than a point lower, and the average NIH funding is 0.3 log points lower.

Welfare Effects and the Importance of Heterogeneity

It is not obvious whether the small increase in numbers and a larger increase in the quality
of residents matched with rural programs is socially desirable. A complete cost-benefit analysis
depends on the private surplus to programs and residents as well as any social benefits of rural training.
The model only allows us to quantify the cost of financial incentives and their impact on the total
private surplus to residents. Table 12 shows that a $5,000 incentive results in a transfer of $1.6
million from the government to residents. However, the estimated increase in residents' private
welfare is 13.5% more than this amount. This result is a consequence of heterogeneous preferences
and the ability of financial incentives to realize potential efficiency gains by assigning residents with
the lowest distaste for rural programs to those positions. A small incentive for training in a rural
program only induces a resident who is roughly indifferent between a rural and an urban program
to choose rural training. This resident then opens up a position in an urban program that may
be strongly preferred by another resident. Therefore, general equilibrium re-sorting effects of the
financial incentive result in an increase in the efficiency of assignments.
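
As a back-of-the-envelope reading of these figures, taking the rounded $1.6 million transfer as given, the implied welfare gain and the gain net of the transfer are approximately as follows. The symbol $\Delta W$ is introduced here for illustration only.

```latex
\begin{align*}
\Delta W_{\text{residents}} &\approx 1.135 \times \$1.6\text{ million} \approx \$1.82\text{ million}, \\
\Delta W_{\text{residents}} - \text{transfer} &\approx 0.135 \times \$1.6\text{ million} \approx \$0.22\text{ million}.
\end{align*}
```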
The potential for financial incentives to target residents with low distaste for rural areas
only exists when preferences are heterogeneous. In a model that does not allow for heterogeneity,
the willingness to pay for training at a program is identical across residents. Such a model would
predict that a permutation of the assignment does not affect residents' welfare. The impact on the
private benefits to residents, net of the transfer, is only through the total number of positions filled
at different programs.

9.2 Supply Interventions


I assess the impact of supply regulations in this market by simulating outcomes after changing
the number of positions offered at different programs. I consider three types of policy interventions.
The first mimics the policy implemented in Japan and reduces the number of positions
in urban programs in proportion to the size of the program (subject to integer constraints) until
further reductions would lead to fewer positions than the total number of residents in the market.
The second intervention is motivated by the provisions in the ACA for increasing the number of
rural training positions. Since the characteristics of new programs are not known, I increase the
number of positions in existing rural programs. This can be thought of as creating copies of existing
programs via grants funded by the ACA. The final intervention combines the two by first
increasing the number of positions at existing rural programs followed by decreasing the number
of positions in urban programs proportionally. In all counterfactuals, the number of residents and
observed characteristics are the same as in the dataset. Consequently, the second intervention has
significantly more positions than residents.
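
The first intervention, a proportional cut to urban capacities subject to integer constraints, can be sketched as follows. The text does not specify the rounding rule, so the largest-remainder rule used here is an assumption, as are the function name and the toy capacities. The sketch scales urban positions down until total positions equal the number of residents.

```python
def cap_urban_positions(urban, rural_total, n_residents):
    """Scale urban capacities down proportionally, keeping integer seats.

    urban: dict mapping urban program name to its current number of positions.
    rural_total: total positions at rural programs (left unchanged).
    n_residents: number of residents; total positions are cut down to this.
    Rounding uses the largest-remainder rule (an assumption of this sketch).
    """
    urban_total = sum(urban.values())
    target = n_residents - rural_total  # urban seats that remain after the cap
    if target < 0:
        raise ValueError("rural positions alone exceed the number of residents")
    if target >= urban_total:
        return dict(urban)  # no reduction needed
    # Integer arithmetic avoids floating-point rounding surprises.
    floors = {p: c * target // urban_total for p, c in urban.items()}
    remainders = {p: c * target % urban_total for p, c in urban.items()}
    leftover = target - sum(floors.values())
    order = sorted(urban, key=lambda p: (remainders[p], urban[p]), reverse=True)
    for p in order[:leftover]:
        floors[p] += 1
    return floors


# Hypothetical market: 20 urban seats, 5 rural seats, 17 residents.
toy_urban = {"U1": 10, "U2": 6, "U3": 4}
print(cap_urban_positions(toy_urban, rural_total=5, n_residents=17))
```

Other rounding rules would satisfy the same integer constraint; the choice only matters for which programs absorb the last few seats.
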
Panel C of Table 12 presents the estimated effects of these policy interventions. Since a policy
that reduces the number of positions offered at urban programs displaces residents from urban
areas, it mechanically increases the number of residents matching at rural programs. However,
the sorting effects of these changes are not a priori clear. Naive reasoning may lead to the
conclusion that caps have a large adverse impact on the quality of residents training at rural
programs because displaced residents are disproportionately less desired by the programs they are
matched to. However, residents displaced from urban programs in turn displace others, resulting
in overall re-sorting. According to estimates from both models, the distribution of resident quality
matching at rural programs is similar to the distribution before the caps.
A major, perhaps not surprising, impact of the caps is the loss in private welfare of residents
from the decreased availability of positions. This decrease results in a similar number of additional
residents in rural programs as a $5,000 financial incentive. However, price incentives result in
an overall gain for residents in addition to the transfer. This observation suggests that quantity
regulations are a blunt policy instrument that does not target residents with the least dislike for rural
positions.
Column (ii) presents the impact of increasing the number of positions in rural residency programs
by two each. This policy significantly increases the number of residents matched to rural
programs and also results in an increase in the quality of residents in rural areas. As compared to
outcomes prior to the policy, the typical resident assigned to a rural program is 7 percentage points
more likely to have a higher human capital index than a resident matched to an urban program.
The change in quality of residents in rural areas is due to increases in the number of residents
matched at the highest quality rural programs but decreases in the number of residents matched
at low quality residency programs in urban and rural areas. Although not considered here, entry
of additional residents into the family residency market could mitigate adverse effects of unfilled
positions.
Finally, the third policy combines the other two and, by construction, has a large effect on
the number of residents placed in rural programs. As compared to an increase in positions
offered in rural areas alone, this policy can adversely affect the quality of residents assigned to rural
programs. The reason is that residents with low human capital are forced into undesirable
residency positions that were left vacant when only rural positions were increased.

9.3 Discussion
Many regulations target an activity in which levels alone determine social benefits. In the context
of residency training and other matching markets, a social planner may be concerned about the
type of resident training in a rural area in addition to the total number of residents. For instance,
if retention is an important goal, we may prefer a policy that yields residents with higher intrinsic
preference for rural areas in rural training locations. The costs imposed on urban programs by
these interventions are yet another factor that may influence optimal policy design. The analysis
presented here sheds light on general equilibrium sorting impacts of interventions that should be
considered when designing policy towards rural training.
The exercise also illustrates the ability of the model to understand policy interventions in
matching markets more broadly. In settings where sorting may be an important consideration in
policy decisions, the methods developed in this paper are a natural tool for analysis. There are
perhaps other equally important factors influencing policy choices, such as the endogenous decisions
of participating in the market or setting salaries. It may be possible to use an appropriately
augmented version of this model to incorporate such decisions. In this study, I hold these decisions
fixed to focus narrowly on the direct effects of the studied interventions.

10 Conclusion
Two key features of two-sided matching markets are that agents are heterogeneous and that highly
individualized prices are often not used. Both properties have important implications for equilib-
rium outcomes because assignments are determined by the mutual choices of agents rather than
price-based market clearing. A quantitative analysis of policy interventions may therefore require
estimates of preferences on both sides of the market.
When data on stated preferences is available, extensions of discrete choice methods can provide
straightforward techniques for analysis (see Hastings, Kane, and Staiger, 2009; Abdulkadiroglu,
Agarwal, and Pathak, 2012, among others). A common constraint is that only data on employer-
employee matches or student enrollment records, rather than stated preferences, are available.
This paper develops empirical methods for recovering preferences of agents in two-sided markets
with low frictions using only data on final matches. I use pairwise stability together with a vertical
preference restriction on one side of the market to estimate preference parameters using the method
of simulated moments. The empirical strategy is based on using sorting patterns observed in the
data and information available only in many-to-one matching. Sorting patterns alone cannot be
used to identify the parameters of even a highly simplified model with homogeneous preferences on
both sides of the market.
These methods allow me to empirically analyze two important issues concerning the market for
medical residents. First, I address the academic debate on whether centralization in this market
causes low salaries. A stylized model shows that a limited supply of desirable residency positions
can depress salaries even under frictionless competitive negotiations. Residents' willingness to
pay for desirable programs results in average salaries that are at least $23,000 lower than levels
suggested by a perfect competition model. Models using wage instruments result in imprecise but
higher estimated markdowns, of about $43,000. These markdowns are due to an implicit tuition
that can explain the gap between incomes of medical residents and physician assistants, and also
the empirical observations of Niederle and Roth (2003, 2009). The result suggests that the limited
supply of heterogeneous residency positions is the primary cause of low salaries, and weighs against
the view that the match is responsible for low resident salaries.
Second, I show that policy interventions aimed at encouraging rural training have important
effects on the sorting of residents. For this reason, price incentives and quantity regulations are
not equivalent policy instruments. Furthermore, the size, scope and design of these interventions
significantly influence their qualitative and quantitative effects. While supply
regulations are more effective at increasing the number of residents in rural areas, financial incentives
are able to specifically target residents that do not significantly dislike training in rural areas.
Analyzing the general equilibrium effects of both interventions on residents' private welfare and the
sorting of residents into rural areas requires a complete model of market primitives.

The methods and analysis in this paper can be extended in several directions. The restriction
on the preferences of one side of the market could be relaxed in other markets if the data contain
information that would allow estimating heterogeneous preferences on both sides of the market. For
instance, it may have been possible to estimate heterogeneous preferences for residents if program
characteristics that can plausibly be excluded from resident preferences were observed. Future
research in other matching markets could use data from several markets in which the composition
of market participants differs in order to estimate heterogeneous preferences on both sides. These
extensions must also confront methodological hurdles arising from a multiplicity of equilibria that are
important in other matching markets.
General equilibrium effects of price and supply interventions are important in other matching
markets as well. For instance, tuition regulations in public universities and public school reforms
introducing new schools or shutting down under-performing schools also affect the sorting of students.
There are also additional effects of these policies on other endogenous choices such as entry
decisions and price or capacity setting. In future research, I plan to use theoretical and empirical
tools to further investigate these interventions in matching markets.

References
Abdulkadiroglu, A., N. Agarwal, and P. A. Pathak (2012). Sorting and Welfare Consequences of
Coordinated School Admissions: Evidence from New York City.

Agarwal, N. and W. F. Diamond (in progress). Identification and Estimation in Two-sided Matching
Markets.

Akkus, O., A. Cookson, and A. Hortacsu (2012). The Determinants of Bank Mergers: A Revealed
Preference Analysis.

Anderson, G. F. (1996). What Does Not Explain the Variation in the Direct Costs of Graduate
Medical Education. Academic Medicine 71 (2), 164-9.

Bailey, J. M. (2010). Health Care Reform, What's in it? Rural Communities and Medical Care.
Technical report, Center for Rural Affairs.

Bajari, P. and J. T. Fox (2005). Measuring the Efficiency of an FCC Spectrum Auction.

Becker, G. S. (1973). A Theory of Marriage: Part I. Journal of Political Economy 81 (4), 813-46.

Becker, G. S. (1975). Human Capital: A Theoretical and Empirical Analysis, with Special Reference
to Education, 2nd ed. New York: National Bureau of Economic Research, Inc.

Ben-Porath, Y. (1967). The Production of Human Capital and the Life Cycle of Earnings. Journal
of Political Economy.

Berry, S. T. (1994). Estimating Discrete-Choice Models of Product Differentiation. RAND Journal
of Economics 25 (2), 242-262.

Berry, S. T., J. Levinsohn, and A. Pakes (1995). Automobile Prices in Market Equilibrium.
Econometrica 63 (4), 841-890.

Berry, S. T. and A. Pakes (2007). The Pure Characteristics Demand Model. International Economic
Review 48 (4), 1193-1225.

Blundell, R. and J. Powell (2003). Endogeneity in Nonparametric and Semiparametric Regression
Models. In M. Dewatripont, L. Hansen, and S. Turnovsky (Eds.), Advances in Economics and
Econometrics, Chapter 8, pp. 312-357.

Boyd, D., H. Lankford, S. Loeb, and J. Wyckoff (2003). Analyzing the Determinants of the Matching
of Public School Teachers to Jobs: Estimating Compensating Differentials in Imperfect Labor
Markets.

Breusch, T. S. and A. R. Pagan (1979). A Simple Test for Heteroscedasticity and Random Coefficient
Variation. Econometrica 47 (5), 1287-94.

Brooks, R. G., M. Walsh, R. E. Mardon, M. Lewis, and A. Clawson (2002). The Roles of Nature
and Nurture in the Recruitment and Retention of Primary Care Physicians in Rural Areas: A
Review of the Literature. Academic Medicine 77 (8), 790-8.

Bulow, J. and J. Levin (2006). Matching and Price Competition. The American Economic
Review 96 (3), 652-668.

Camina, E. (2006). A generalized assignment game. Mathematical Social Sciences 52 (2), 152-161.

Chiappori, P.-A., B. Salanié, and Y. Weiss (2011). Partner Choice and the Marital College Premium.

Choo, E. and A. Siow (2006). Who Marries Whom and Why. Journal of Political Economy 114 (1),
175-201.

Christakis, N. A., J. H. Fowler, G. W. Imbens, and K. Kalyanaraman (2010). An Empirical Model
for Strategic Network Formation.

Clark, S. (2006). The Uniqueness of Stable Matchings. The B.E. Journal of Theoretical
Economics contr. 6 (1), 8.

Conlon, C. T. and J. H. Mortimer (2010). Demand Estimation Under Incomplete Product
Availability.

Crawford, V. P. (2008). The Flexible-Salary Match: A proposal to increase the salary flexibility of
the National Resident Matching Program. Journal of Economic Behavior & Organization 66 (2),
149-160.

Fox, J. T. (2008). Estimating Matching Games with Transfers.

Fox, J. T. (2009). Structural Empirical Work Using Matching Models. In S. N. Durlauf and L. E.
Blume (Eds.), New Palgrave Dictionary of Economics (Online ed.).

Gale, D. and L. S. Shapley (1962). College admissions and the stability of marriage. The American
Mathematical Monthly 69 (1), 9-15.

Galichon, A. and B. Salanié (2010). Matching with Trade-offs: Revealed Preferences over Competing
Characteristics.

Gentile Jr., T. C. and D. Buckley (2009). Medicare Reimbursement and Graduate Medical
Education. In J. Levine (Ed.), Guide to Medical Education in the Teaching Hospital (4th ed.).
Association for Hospital Medical Education.

Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning (First
ed.). Boston, MA: Addison-Wesley Longman Publishing Co., Inc.

Gordon, N. and B. Knight (2009). A spatial merger estimator with an application to school district
consolidation. Journal of Public Economics, 752-765.

Gourieroux, C. and A. Monfort (1997). Simulation-based Econometric Methods. New York: Oxford
University Press.

Hastings, J. S., T. J. Kane, and D. O. Staiger (2009). Heterogeneous Preferences and the Efficacy
of Public School Choice.

Heckman, J. J., L. J. Lochner, and P. E. Todd (2003). Fifty Years of Mincer Earnings Regressions.

Heckman, J. J. and R. J. Robb (1985). Alternative methods for evaluating the impact of
interventions: An overview. Journal of Econometrics 30 (1-2), 239-267.

Imbens, G. W. (2007). Nonadditive Models with Endogenous Regressors. In R. Blundell, W. Newey,
and T. Persson (Eds.), Advances in Economics and Econometrics, Vol III, Chapter 2, pp. 17-46.

Imbens, G. W. and W. K. Newey (2009). Identification and Estimation of Triangular Simultaneous
Equations Models Without Additivity. Econometrica 77 (5), 1481-1512.

Johnson, S. G. (2011). The NLopt nonlinear-optimization package.

Jung et.al. v AAMC et.al. (2002). Class Action Complaint, No. 02-CV-00873, D.D.C. May 5, 2002.

Kamada, Y. and F. Kojima (2010). Improving Efficiency in Matching Markets with Regional Caps:
The Case of the Japan Residency Matching Program.

Kelso, A. S. J. and V. P. Crawford (1982). Job Matching, Coalition Formation, and Gross
Substitutes. Econometrica 50 (6), 1483-1504.

Kojima, F. (2007). Matching and Price Competition: Comment. American Economic Review 97 (3),
1027-1031.

Kojima, F., P. Pathak, and A. E. Roth (2010). Matching with Couples: Stability and Incentives
in Large Markets.

Kojima, F. and P. A. Pathak (2009). Incentives and Stability in Large Two-Sided Matching Markets.
American Economic Review 99 (3), 608-27.

Kolaczyk, E. D. (2009). Statistical Analysis of Network Data: Methods and Models. Springer.

Logan, J. A., P. D. Hoff, and M. A. Newton (2008). Two-Sided Estimation of Mate Preferences for
Similarities in Age, Education, and Religion. Journal of the American Statistical Association,
559-569.

McFadden, D. (1989). A Method of Simulated Moments for Estimation of Discrete Response Models
without Numerical Integration. Econometrica 57 (5), 995-1026.

Mincer, J. (1974). Schooling, Experience, and Earnings. New York: National Bureau of Economic
Research, Inc.

Mortensen, D. T. and C. A. Pissarides (1994). Job Creation and Job Destruction in the Theory of
Unemployment. Review of Economic Studies 61 (3), 397-415.

Newhouse, J. P. and G. R. Wilensky (2001). Paying For Graduate Medical Education: The Debate
Goes On. Health Affairs 20 (2), 136-147.

Niederle, M. and A. E. Roth (2003). Relationship Between Wages and Presence of a Match in
Medical Fellowships. JAMA, Journal of the American Medical Association 290 (9).

Niederle, M. and A. E. Roth (2009). The Effects of a Centralized Clearinghouse on Job Placement,
Wages, and Hiring Practices. In David Autor (Ed.), NBER Chapters, pp. 235-271.

Niederle, M. and L. Yariv (2009). Decentralized Matching with Aligned Preferences.

Pakes, A. and D. Pollard (1989). Simulation and the Asymptotics of Optimization Estimators.
Econometrica 57 (5), 1027-57.

Petrin, A. K. and K. E. Train (2010). A control function approach to endogeneity in consumer
choice models. Journal of Marketing Research 47 (1), 3-13.

Postel-Vinay, F. and J.-M. Robin (2002). Equilibrium Wage Dispersion with Worker and Employer
Heterogeneity. Econometrica 70 (6), 2295-2350.

Rabinowitz, H. K., J. J. Diamond, F. W. Markham, and J. R. Wortman (2008). Medical school
programs to increase the rural physician supply: a systematic review and projected impact of
widespread replication. Academic Medicine 83 (3), 235-43.

Rosen, S. (1972). Learning and experience in the labor market. Journal of Human Resources 7 (3),
326-342.

Rosen, S. (1987). The theory of equalizing differences. In O. Ashenfelter and R. Layard (Eds.),
Handbook of Labor Economics, Volume 1 of Handbook of Labor Economics, Chapter 12, pp.
641-692. Elsevier.

Rosenblatt, R. A., A. Hagopian, C. H. A. Andrilla, and G. Hart (2006). Will Rural Family Medicine
Residency Training Survive? Family medicine 38 (10), 706-11.

Rosenblatt, R. A. and L. G. Hart (2000). Physicians and Rural America. The Western journal of
medicine 173 (5), 348-51.

Roth, A. E. (1984). The Evolution of the Labor Market for Medical Interns and Residents: A Case
Study in Game Theory. Journal of Political Economy 92 (6), 991-1016.

Roth, A. E. and E. Peranson (1999). The Redesign of the Matching Market for American Physicians:
Some Engineering Aspects of Economic Design. American Economic Review 89 (4), 748-780.

Roth, A. E. and M. A. O. Sotomayor (1992). Two-Sided Matching. Cambridge University Press.

Roth, A. E. and J. H. Vande Vate (1990). Random Paths to Stability in Two-Sided Matching.
Econometrica 58 (6), 1475-80.

Roth, A. E. and X. Xing (1994). Jumping the Gun: Imperfections and Institutions Related to the
Timing of Market Transactions. American Economic Review 84 (4), 992-1044.

Rowan, T. H. (1990). Functional Stability Analysis Of Numerical Algorithms. Ph. D. thesis.

Shapley, L. S. and M. Shubik (1971). The assignment game I: The core. International Journal of
Game Theory 1 (1), 111-130.

Shimer, R. and L. Smith (2000). Assortative Matching and Search. Econometrica 68 (2), 343-370.

Signer, M. M. (2012). 2012 Main Residency Match and SOAP. Technical report, National Resident
Matching Program.

Sorensen, M. (2007). How Smart Is Smart Money? A Two-Sided Matching Model of Venture
Capital. Journal of Finance 62 (6), 2725-2762.

Sotomayor, M. (1999). The Lattice Structure of the Set of Stable Outcomes of the Multiple Partners
Assignment Game. International Journal of Game Theory 28 (4), 567-583.

Stern, S. (2004). Do Scientists Pay to Be Scientists? Management Science 50 (6), 835-853.

Stock, J. H., J. H. Wright, and M. Yogo (2002). A Survey of Weak Instruments and Weak
Identification in Generalized Method of Moments. Journal of Business & Economic Statistics 20 (4),
518-29.

Talley, R. C. (1990). Graduate medical education and rural health care. Academic medicine 65 (12
Suppl), S22-5.

Teulings, C. N. (1995). The Wage Distribution in a Model of the Assignment of Skills to Jobs.
Journal of Political Economy 103 (2), 280-315.

Lemieux, T. (2006). The Mincer Equation Thirty Years After Schooling, Experience, and
Earnings. In Jacob Mincer: A Pioneer of Modern Labor Economics.

Uetake, K. and Y. Watanabe (2012). Entry by Merger: Estimates from a Two-Sided Matching
Model with Externalities.

Weese, E. (2008). Political Mergers as Coalition Formation: Evidence from Japanese Municipal
Amalgamations.

White, H. (1980). A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test
for Heteroskedasticity. Econometrica 48 (4), 817-38.

Yang, S., Y. Chen, and G. M. Allenby (2003). Bayesian Analysis of Simultaneous Demand and
Supply. Quantitative Marketing and Economics 1 (3), 251-275.

Table 1: Program Characteristics

                                                                 2010-2011              2003-2004 to 2010-2011
                                            All Programs    Rural Programs      All Programs    Rural Programs
                                                 N = 428            N = 63         N = 3,441           N = 481
                                           Mean     Std.     Mean     Std.     Mean     Std.     Mean     Std.
First Year Salary (2010 dollars)         47,331    2,953   47,234    2,598   46,394    3,239   46,259    2,882
Program Size                               7.70     2.83     5.30     2.64     7.57     2.77     5.25     2.44
Number of Matches                          7.36     2.93     4.92     2.77     7.01     2.92     4.72     2.51

NIH Funding (Major Affil., bil $)         58.85    86.04    46.51    76.35    56.97    84.85    37.71    61.48
NIH Funding (Minor Affil., bil $)         57.98    77.21    40.98    75.02    53.25    83.87    37.34    82.22
Beds (Primary Inst)                      421.54   284.15   240.47   151.81   418.41   273.17   257.54   150.29

Community Based Program                    0.25     0.43     0.29     0.46     0.33     0.47     0.39     0.49
Community-University Program               0.62     0.49     0.63     0.49     0.54     0.50     0.53     0.50
University Based Program                   0.13     0.34     0.08     0.27     0.12     0.33     0.07     0.26

Number of Interviews                      63.38    31.10    37.62    22.07    55.56    30.17    31.91    20.14

Medicare Case Mix Index (Prim. Inst)       1.61     0.23     1.52     0.20     1.57     0.22     1.50     0.22
Medicare Wage Index (Prim. Inst)           1.00     0.14     0.93     0.11     1.01     0.14     0.93     0.10

Notes: Details on the construction of variables and the rule for classifying a program as rural are provided in the data appendix. Statistics on interviews and
Medicare fields are reported conditional on non-missing data. Less than 2% of the data on these fields is missing. NIH fund statistics are reported only for programs
with NIH funded affiliates. About 35% of the programs have no NIH funded major affiliates, while about 46% have no minor affiliates. About 8% of programs
have no NIH funded medical school affiliates. All other characteristics have full coverage.
Table 2: Resident Characteristics

                                               2010-2011  2003-2004 to 2010-2011
                                               N = 3,148              N = 24,115
                                        Mean         Std        Mean         Std
Allopathic/MD Graduate                  0.45        0.50        0.45        0.50
Osteopathic/DO Graduate                 0.15        0.36        0.14        0.34
Foreign Medical Graduate                0.39        0.49        0.41        0.49

NIH Funding (MD grads, mil $)          83.26       82.42       84.08       83.96
Median MCAT Score (MD grads)           31.24        2.25       31.31        2.20

US born Foreign Graduate                0.12        0.33        0.09        0.29

Rural Born Resident                     0.11        0.31        0.10        0.30

Notes: Details on the construction of variables are provided in the data appendix. A resident is classified as rural born if her city of birth is not in an MSA. City of
birth data is unreliable for about 7.3% of residents; rural born is coded as missing for these residents. Country of birth is not known for 14.6% of residents, who
are treated as foreign graduates not born in the US.
Table 3: Sorting between Residents and Programs

                        Log NIH Fund    Median MCAT     MD Degree       DO Degree
                        (MD)            (MD)
                        (1)             (2)             (3)             (4)
Log NIH Fund (Major)    0.3724***       0.0154***       0.0462***       0.0025
                        (0.0119)        (0.0007)        (0.0032)        (0.0022)
Log NIH Fund (Minor)    0.1498***       0.0084***       0.0208***       0.0048*
                        (0.0137)        (0.0008)        (0.0040)        (0.0028)
Log # Beds              -0.0972***      -0.0021         -0.0104         -0.0098**
                        (0.0221)        (0.0014)        (0.0064)        (0.0045)
Rural Program           -0.0687         -0.0040         -0.0010         0.0138*
                        (0.0437)        (0.0027)        (0.0117)        (0.0082)
Log Case-Mix Index      0.1894**        0.0136**        0.4670***       0.0574***
                        (0.0940)        (0.0058)        (0.0255)        (0.0179)
Log First-Year Salary   0.0126          0.0590***       0.3001***       0.0969***
                        (0.1717)        (0.0106)        (0.0467)        (0.0327)
Log Rent                0.4612***       0.0727***       0.1811***       -0.0012
                        (0.0600)        (0.0037)        (0.0168)        (0.0118)

Observations            10,842          10,872          23,984          23,984
R-squared               0.1318          0.1282          0.0381          0.0079

Notes: Linear regression of residents' graduating school characteristics on matched program characteristics. Samples
pooled from the academic years 2003-2004 to 2010-2011. Column (1) restricts to the set of residents graduating
from medical schools with non-zero average annual NIH funding. Column (2) restricts to the subset of residents
with MD degrees from institutions reporting a median MCAT score in the Medical School Admission Requirements
in 2010-2011. Columns (3) and (4) include all residents. See data appendix for description of variables. All
specifications include dummy variables for programs with no NIH funding at major affiliates, no NIH funding at
minor affiliates and a missing Medicare ID for the primary institution. Standard errors in parentheses. Significance
at 90% (*), 95% (**) and 99% (***) confidence.

Table 4: Geographical Sorting between Residents and Programs

                          Log NIH Fund    Log NIH Fund    Log # Beds      Log Case        Rural
                          (Major)         (Minor)                         Mix Index       Program
                          (1)             (2)             (3)             (4)             (5)
Log NIH Fund (MD)         0.4058***       0.1555***       -0.0213***      -0.0002         -0.0110***
                          (0.0124)        (0.0116)        (0.0046)        (0.0011)        (0.0023)
Log Median MCAT (MD)      0.6953***       0.4704***       0.0830**        0.0023          -0.0877***
                          (0.1009)        (0.0914)        (0.0364)        (0.0091)        (0.0184)
US Born (For)             -0.0711*        -0.1032***      -0.0025         0.0186***       0.0141*
                          (0.0374)        (0.0366)        (0.0143)        (0.0036)        (0.0072)
Match in Med Sch. State   -0.4463***      -0.2646***      0.0468***       -0.0057*        0.0111*
                          (0.0322)        (0.0303)        (0.0121)        (0.0030)        (0.0061)
Match in Birth State      -0.0038         0.0197          -0.0376***      -0.0075***      -0.0115**
                          (0.0285)        (0.0264)        (0.0105)        (0.0026)        (0.0053)
Rural Born Resident                                                                       0.0714***
                                                                                          (0.0066)

Observations              15,394          13,099          24,115          23,652          24,115
R-squared                 0.1211          0.0299          0.0052          0.0167          0.0101

Notes: Linear regression of characteristics of program or program affiliates on characteristics of matched residents.
Samples pooled from the academic years 2003-2004 to 2010-2011. Column (1) restricts the sample to the set of
programs with major affiliates that have positive NIH funding. Column (2) restricts the sample to the set of
programs with a minor affiliate with non-zero NIH funding. Columns (3) and (5) include all programs.
Column (4) excludes programs for which the Medicare ID is missing. All specifications have medical school type
dummies and a dummy for residents graduating from MD medical schools without NIH funding. Column (5)
includes a dummy for non-reliable city of birth information for US born residents. See data appendix for description
of variables. Standard errors in parentheses. Significance at 90% (*), 95% (**) and 99% (***) confidence.

Table 5: Within Program Variation in Resident Characteristics

                              Fraction of Variation Within Program-Year
Log NIH Fund (MD)             77.83%
Median MCAT (MD)              72.09%
US Born Foreign Graduate      79.01%
Osteopathic/DO Degree         85.16%
Foreign Degree                57.16%
Allopathic/MD Degree          64.81%
Female                        96.40%

Notes: Each row reports $1 - R^2_{\text{adj}}$ from a separate linear regression of a resident's graduating school characteristic,
absorbing the program-year fixed effects. Samples from the academic years 2003-2004 to 2010-2011. Samples for
regressions with LHS variables Log NIH funding (MD), Median MCAT (MD) are restricted to the set of residents
with non-missing values for the respective characteristic. Regression of US Born (For) restricts to graduates of
foreign medical schools. Osteopathic/DO Degree, Foreign Degree, Allopathic/MD Degree are linear probability
models estimated on the full sample.
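
The statistic in the table notes can be sketched in a few lines. For a regression on group dummies only, the fitted value is the group mean, so one minus the adjusted R-squared equals the residual variance (with degrees of freedom n minus the number of groups) divided by the total variance. The function name and the toy data below are hypothetical; this is an illustration of the formula, not the code used for Table 5.

```python
from collections import defaultdict


def within_group_share(values, groups):
    """Return 1 - adjusted R^2 from regressing values on group fixed effects."""
    n = len(values)
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    k = len(by_group)  # number of fixed effects, including the constant
    if n - k <= 0:
        raise ValueError("need more observations than groups")
    grand_mean = sum(values) / n
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0:
        raise ValueError("the characteristic has no variation")
    resid_ss = 0.0
    for members in by_group.values():
        group_mean = sum(members) / len(members)
        resid_ss += sum((v - group_mean) ** 2 for v in members)
    return (resid_ss / (n - k)) / (total_ss / (n - 1))


# Hypothetical data: two program-years whose residents differ sharply.
toy_values = [1.0, 2.0, 9.0, 10.0]
toy_groups = ["a", "a", "b", "b"]
print(within_group_share(toy_values, toy_groups))
```

A share close to one, as for most rows of the table, means that program-year fixed effects explain little of the variation in the characteristic.
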

Table 6: Peer Sorting

                        Peer Log NIH Fund   Peer Log MCAT       Peer Foreign Degree Peer DO Degree      Peer MD Degree
                        (1)                 (2)                 (3)                 (4)                 (5)
Log NIH Fund (MD)       0.2919***           0.0103***           -0.0249***          -0.0043**           0.0293***
                        (0.0132)            (0.0026)            (0.0030)            (0.0019)            (0.0033)
Log Median MCAT (MD)    0.6449***           0.0874              -0.2000***          0.0165              0.1850***
                        (0.1832)            (0.0750)            (0.0458)            (0.0247)            (0.0499)
US Born (For)           0.0403              0.0141              -0.1063***          0.0394***           0.0669***
                        (0.0421)            (0.0103)            (0.0091)            (0.0050)            (0.0079)

Observations            19,830              19,845              24,066              24,066              24,066
R-squared               0.1280              0.6437              0.3632              0.0914              0.3197

Notes: Linear regression of average characteristics of peers on the characteristics of a resident. A peer of a resident is another resident matched to the same
program as that resident in the academic cohort of said resident. The calculation of peer averages for a resident excludes the resident herself. Samples pooled
from the academic years 2003-2004 to 2010-2011. Column (1) restricts the sample to the set of residents with at least one peer that graduated from a medical
school with non-zero NIH funding. Column (2) restricts the sample to the set of residents with at least one peer that graduated from a medical school with
non-missing MCAT Score. Peer averages for columns (1) and (2) are constructed only from peers with non-missing observations of these characteristics.
Columns (3-5) consider all residents with at least one peer. All specifications have medical school type dummies and a dummy for residents graduating from
MD medical schools without NIH funding. See data appendix for description of variables. Standard errors clustered at the program-year level in parentheses.
Significance at 90% (*), 95% (**) and 99% (***) confidence.
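
The peer averages described in the notes are leave-one-out means within a program-year. The sketch below is a hypothetical illustration of that construction, not the paper's code. It assumes records of the form (resident id, program-year, value), with None marking a missing characteristic, and drops residents who have no peer with a non-missing value.

```python
from collections import defaultdict


def peer_averages(records):
    """Leave-one-out mean of a characteristic among a resident's peers.

    records: list of (resident_id, program_year, value); value may be None.
    Peers with a missing value do not enter the average.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _, cell, value in records:
        if value is not None:
            totals[cell] += value
            counts[cell] += 1
    averages = {}
    for resident_id, cell, value in records:
        own = 0 if value is None else 1
        n_peers = counts[cell] - own  # exclude the resident herself
        if n_peers > 0:
            peer_sum = totals[cell] - (value if own else 0.0)
            averages[resident_id] = peer_sum / n_peers
    return averages


# Hypothetical records for two program-years.
toy_records = [
    ("r1", "P-2010", 1.0),
    ("r2", "P-2010", 3.0),
    ("r3", "P-2010", 5.0),
    ("r4", "Q-2010", 2.0),   # no peers, so no peer average
    ("r5", "P-2010", None),  # missing own value, still has peers
]
print(peer_averages(toy_records))
```

Excluding the resident's own value avoids a mechanical correlation between a resident's characteristic and the average for her program.
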
Table 7: Wage Regressions
Dependent Variable: Log First Year Salary
(1) (2) (3) (4) (5) (6) (7)
Log Rent 0.0266* -0.0373** -0.0379** 0.0179 -0.0378** 0.0172 -0.0306
(0.0151) (0.0177) (0.0175) (0.0140) (0.0160) (0.0143) (0.0230)
Rural Program 0.0032 0.0065 0.0110 0.0103 0.0104 0.0103 0.0055
(0.0079) (0.0081) (0.0080) (0.0071) (0.0076) (0.0069) (0.0079)
Log Wage Index 0.1366*** 0.1182*** -0.0152 0.0806*** -0.0167 0.0809***
(0.0307) (0.0302) (0.0262) (0.0287) (0.0263) (0.0290)
Log NIH Fund (Major) 0.0024 0.0023 0.0062*** 0.0034 0.0062*** 0.0024
(0.0027) (0.0026) (0.0021) (0.0025) (0.0021) (0.0024)
Log NIH Fund (Minor) -0.0060* -0.0047 -0.0005 -0.0040 -0.0005 -0.0041
(0.0032) (0.0032) (0.0029) (0.0031) (0.0029) (0.0031)
Log # Beds 0.0087* 0.0086* 0.0012 0.0064 0.0010 0.0108**
(0.0046) (0.0045) (0.0036) (0.0041) (0.0036) (0.0043)
Log Case-Mix Index -0.0108 -0.0046 0.0051 -0.0038 0.0056 -0.0065
(0.0195) (0.0195) (0.0151) (0.0190) (0.0152) (0.0191)
Log Reimbursement 0.0227*** 0.0064 -0.0002 0.0050
(0.0077) (0.0076) (0.0063) (0.0070)
Log Competitor Salary (Lagged) 0.8779*** 0.8651***

(0.0542) (0.0683)
Log Competitor Reimbursement 0.0968*** 0.0090 0.0847***
(0.0170) (0.0170) (0.0178)

Location characteristics Y
Observations 3,418 3,418 3,418 2,997 3,418 2,997 3,418
R-squared 0.0062 0.0452 0.0640 0.3284 0.1226 0.3294 0.1811

Notes: Regression of a program's first-year salary on program characteristics. Location characteristics include median age (county), log median household
income (county), log total population (MSA/county), violent crime and property crime rates from FBI's Crime Statistics/UCR (25 mi radius weighted by
1/distance), dummies for no data in that radius and log college share (MSA/rest of state). Columns (2-7) include dummy variables for programs with no NIH
funding at major affiliates, for no NIH funding at minor affiliates, and a dummy for missing Medicare ID for the primary institution. All columns include a
constant term. The Competitor Salary (Lagged) is the average of lagged salaries of other family practice residency programs in the geographic area of the
program hospital. Therefore, columns (4) and (6) exclude data from the first year of the sample, 2003-2004. The Competitor Reimbursement is a weighted
average of the Medicare primary care per resident amounts of institutions in the geographic area of a program other than the primary institutional affiliate of
the program. Geographic area defined as in Medicare DGME payments: MSA/NECMA or Rest of State unless less than 3 other observations constitute the
area, in which case the census division is used. See data appendix for description of variables and details on the construction of the reimbursement variables. For
columns (5-7), a program's reimbursement rate is truncated below at $5,000 and a dummy for these 46 truncated observations is estimated as well. Standard
errors clustered at the program level in parentheses. Sample in all columns restricted to programs for which salary was not imputed from the regressions
described in the data appendix. Significance at 90% (*), 95% (**) and 99% (***) confidence.
Table 8: Preference Estimates

                                   Full             Geographic       Geo. Het. w/
                                   Heterogeneity    Heterogeneity    Wage Instrument
                                   (1)              (2)              (3)
Panel A.1: Preference for Programs (units of std. dev)
Case Mix Index
  Coeff                            4,792            2,320            6,088
                                   (1,624)          (1,265)          (1,542)
  Sigma RC                         4,503
                                   (1,037)
Log NIH Fund (Major)
  Coeff                            491              6,499            4,402
                                   (1,651)          (2,041)          (1,333)
  Sigma RC                         5,498
                                   (1,234)
Log Beds
  Coeff                            6,900            3,528            8,837
                                   (2,207)          (1,259)          (1,936)
  Sigma RC                         11,107
                                   (2,073)
Log NIH Fund (Minor)               4,993            5,560            7,620
                                   (1,558)          (1,511)          (1,821)
Panel A.2: Preference for Programs
Rural Program                      7,327            5,611            17,314
                                   (3,492)          (3,555)          (4,938)
University Based Program           15,786           11,080           25,130
                                   (3,982)          (5,393)          (7,088)
Community/University Program       -5,001           -2,217           -7,507
                                   (2,016)          (1,589)          (2,233)
Medical School State               9,820            2,302            4,529
                                   (1,998)          (687)            (910)
Birth State                        6,342            1,320            2,451
                                   (1,308)          (411)            (497)
Rural Birth x Rural Program        1,189            109              233
                                   (466)            (113)            (102)
Panel B: Human Capital
Log NIH Fund (MD)                  0.1153           0.1269           0.0941
                                   (0.0164)         (0.0139)         (0.0131)
Median MCAT (MD)                   0.0814           0.0666           0.0413
                                   (0.0070)         (0.0038)         (0.0030)
US Born (Foreign Grad)             0.1503           -0.2470          0.2927
                                   (0.1021)         (0.0801)         (0.0705)
Sigma (DO)                         0.8845           0.7944           0.7275
                                   (0.0359)         (0.0285)         (0.0292)
Sigma (Foreign)                    3.6190           3.0709           2.8215
                                   (0.1469)         (0.1102)         (0.1131)

Notes: Detailed estimates and other models using instruments in Table B.1. Results from Panel A estimates
monetized in dollars (normalize wage coefficient to 1). Panel A.1 presents the dollar equivalent for a 1 standard
deviation change in a program characteristic. All columns include median rent in county, Medicare wage index,
indicator for zero NIH funding of major associates and for minor associates. Column (4) includes own
reimbursement rates and the control variable. All specifications normalize the mean utility from a program with
zeros on all characteristics to 0. In all specifications, the variance of unobservable determinants of the human
capital index of MD graduates is normalized to 1. All specifications normalize the mean human capital index of
residents with zeros for all characteristics to 0 and include medical school type dummies. Point estimates using 1000
simulation draws. Standard errors in parentheses. Optimization and estimation details described in an appendix.

Table 9: Estimated Utility Distribution in First-Year Salary Equivalent

                         Full Heterogeneity    Geographic Heterogeneity    Geo. Het. w/ Wage Instrument
                         (1)                   (2)                         (3)
                   N     Stat (s.e.)           Stat (s.e.)                 Stat (s.e.)
Panel A: Means in Category
Log Beds (Primary Inst)
Lowest Quartile 107 -$12,509 (3,290) -$5,691 (777) -$15,238 (4,647)
Second Quartile 107 -$2,801 (758) -$3,693 (553) -$3,606 (1,212)
Third Quartile 107 $3,823 (1,138) -$1,041 (320) $1,934 (1,108)
Highest Quartile 107 $11,487 (2,877) $10,425 (1,327) $16,910 (4,831)
Case Mix Index
Lowest Quartile 107 -$10,397 (2,880) -$4,045 (674) -$10,556 (3,450)
Second Quartile 107 -$3,764 (1,100) -$1,965 (436) -$5,162 (1,643)
Third Quartile 107 $3,346 (1,179) -$1,518 (403) $669 (720)
Highest Quartile 107 $10,815 (2,849) $7,528 (1,196) $15,050 (4,663)
Log NIH Fund (Major)
Lowest Quartile 71 -$5,190 (1,716) -$7,903 (1,064) -$15,032 (4,267)
Second Quartile 71 -$3,712 (1,080) -$285 (390) -$8,095 (2,685)
Third Quartile 71 $1,796 (963) $8,460 (1,274) $6,646 (2,021)
Highest Quartile 72 $904 (1,535) $11,733 (1,736) $7,194 (2,368)
County Rent
Lowest Quartile 106 -$5,681 (1,580) -$6,745 (984) -$11,796 (3,549)
Second Quartile 107 -$1,012 (541) -$964 (244) -$3,310 (1,077)
Third Quartile 99 $1,984 (688) $1,715 (333) $2,942 (1,204)
Highest Quartile 116 $4,431 (1,321) $5,589 (827) $11,321 (3,148)

Rural Program 63 -$7,292 (3,101) -$4,692 (967) -$8,066 (4,044)


Urban Program 365 $1,259 (535) $810 (167) $1,392 (698)

Overall Std. Dev. 428 $21,937 (5,215) $14,088 (1,880) $28,578 (8,166)

Notes: Utilities net of salaries are monetized in dollars and normalized to an overall mean of zero. Statistics are
averages across residents from 100 simulation draws. Each simulation draws a parameter from a normal with mean
$\hat\theta_{MSM}$ and variance $\hat\Sigma$, where $\hat\Sigma$ is estimated as described in Section 6.4. Statistics use the 2010-2011 sample.

Table 10: Implicit Tuition

                        Full             Geographic       Geo. Het. w/
                        Heterogeneity    Heterogeneity    Wage Instrument
                        (1)              (2)              (3)
Mean                    $23,802.64       $22,627.64       $43,470.39
                        (5526.15)        (3495.62)        (13678.08)
Median                  $21,263.30       $21,167.71       $40,606.85
                        (5076.79)        (3265.54)        (12847.51)
Standard Deviation      $16,661.17       $12,278.42       $24,792.30
                        (3946.33)        (1781.09)        (7485.20)
5th Percentile          $2,795.23        $5,179.08        $7,912.03
                        (1008.51)        (1441.71)        (3246.19)
25th Percentile         $11,648.70       $14,070.10       $24,853.10
                        (2820.62)        (2364.41)        (8299.05)
75th Percentile         $31,467.42       $28,902.46       $58,354.66
                        (7131.65)        (4347.95)        (18134.03)
95th Percentile         $55,279.76       $45,784.76       $92,343.91
                        (12758.48)       (6921.96)        (28071.67)

Notes: Based on 100 simulation draws. Each simulation draws a parameter from a normal with mean $\hat\theta_{MSM}$ and
variance $\hat\Sigma$, where $\hat\Sigma$ is estimated as described in Section 6.4. Standard errors in parentheses.

Table 11: Dependence of Implicit Tuition on Demand-Supply Imbalance

Log Average Implicit Tuition in Program


Full Heterogeneity
(1) (2) (3) (4)
Log Residency Positions 0.0008 -0.1557*** -0.0578*** -0.1442***
in Program State (0.0044) (0.0106) (0.0101) (0.0128)
Log Family Medicine MD Graduates 0.1851*** 0.1951***
from Program State (0.0114) (0.0130)
Log US Born Residents 0.0658*** -0.0233
in Program State (0.0102) (0.0145)

R-squared 0.4144 0.4180 0.4150 0.4180

Notes: Linear Regressions. Dependent variable is the log of total implicit tuition at a residency program divided by
the number of residents matched to the program. All regressions on generated implicit tuition data using the
2010-2011 sample of residents and programs, and 100 simulation draws. All regressions include Log Beds, Log NIH
Fund (Major), Log NIH Fund (Minor), dummies for no NIH funded affiliates, Medicare Case Mix Index, Rural
Program dummy and Program type dummies. Standard errors clustered at the simulation level. Significance at
90% (*), 95% (**) and 99% (***) confidence.

Table 12: Effects of Policy Instruments for Encouraging Rural Training

Full Heterogeneity
(Specification 1)
Panel A: Baseline Simulations (310/334 positions filled in data)
Simulated Matches 313.33
(310 - 317)
Prob. Rural Match > Urban Match 52.76%

Panel B: Salary Incentives $5,000 $10,000 $20,000


(1) (2) (3)
Rural Matches 10.23 17.3 20.63
(7 - 12) (14 - 21) (17 - 24)
Prob. Rural Match > Urban Match 9.38% 17.70% 31.28%

Total Cost of Subsidy (mil.) $1.62 $3.31 $6.68


Private Welfare of Residents (mil.) $1.84 $3.64 $7.05
Cost Per Additional Resident $158,143 $191,116 $323,762

Panel C: Quantity Regulations        Decrease urban      +2 positions for     Combination
                                     proportionally      rural programs       of (i) and (ii)
                                     (i)                 (ii)                 (iii)
Modified Urban Capacity              2846                2963                 2688
% Δ in Urban Capacity                -3.95%                                   -9.28%
Modified Rural Capacity              334                 460                  460
% Δ in Rural Capacity                                    37.72%               37.72%
Δ in # Rural Matches                 12.01               121.31               146.63
                                     (4.5 - 20)          (114.5 - 128)        (137.5 - 156.5)
Prob. Rural Match > Urban Match      -0.56%              7.02%                -3.73%
Residents' Private Welfare (mil.)    -$3.76              $5.39                -$5.49

Notes: In Panel C, Column (i) decreases the urban positions in proportion to program size, subject to integer
constraints. Positions at urban programs were reduced in proportion until further reductions would yield a greater
number of residents than positions. In column (i), this yielded 32 more positions than residents. In column (iii),
the number of residents equals the number of positions. All simulations use the 2010-2011 sample with 3,148 residents
and 3,297 total positions. Baseline and counterfactual simulations use 100 draws of structural
unobservables. Inter-quartile range in parentheses. Prob. X > Y is the Wilcoxon statistic: the probability that the
human capital of a draw from the population of X is greater than that of a draw from the population of Y.
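
As an illustration of the Prob. X > Y statistic described in the notes, the following minimal sketch computes the share of cross-sample pairs in which the first draw exceeds the second. The function name `prob_greater` and the toy values are hypothetical, and counting ties as one half is an assumed convention that the notes do not specify.

```python
def prob_greater(x_sample, y_sample):
    """Share of (x, y) pairs with x > y; ties count one half (assumed convention)."""
    wins = 0.0
    for x in x_sample:
        for y in y_sample:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(x_sample) * len(y_sample))


# Toy human capital indices (illustrative only, not from the paper).
rural = [0.2, 0.9, 1.4]
urban = [0.5, 1.0]
print(prob_greater(rural, urban))
```

A value above one half indicates that draws from the first population tend to have higher human capital than draws from the second.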

Figure 1: Assortative Matching between Programs and Residents

Notes: Darker regions depict higher density. Density calculated in two dimensions using a quartic
kernel and a bandwidth of 0.6. Log NIH Fund of Affiliates is the log of the average of NIH funds at major and
minor affiliates. Sample restricted to academic year 2010-2011, programs with at least one NIH funded affiliate and
residents from NIH funded medical schools.

Figure 2: Estimated Distribution of Program Utility

Notes: Estimated distribution of mean utility (from observable components, net of salary) across programs
monetized in terms of first year salary. Mean utility normalized to zero. Sample of programs from 2010-2011.

Figure 3: Model Fit: Simulated vs. Observed Match Quality by Resident Bins

Notes: To construct this scatterplot, I used model estimates from specification (1) to first obtain the predicted
quality on observable dimensions of the residents and of the programs. Quality for the program is the "vertical
component" zj for the programs. The residents were binned into 10 categories, starting with Foreign graduates,
US born foreign graduates and Osteopathic graduates and seven quantile bins for MD graduates. Resident bins are
constructed from pooling the sample across all years. The seven MD bins are approximately equally sized, except
for point masses at the cutoffs. The horizontal axis plots observed mean standardized quality of program that
residents from each bin matched with. The vertical axis plots the model's predicted mean standardized quality of
the program that a resident in each bin is matched with. An observation is defined at the bin-year level. Simulated
means using the observed distribution of agent characteristics and 100 simulations of the unobserved characteristics.
The 90% confidence set for the out-of-sample data is constructed from these 100 simulations.

Appendices

A Estimation and Inference


A.1 Moments
For simplicity of exposition, I consider the case of only one market and treat all characteristics as
observed and exogenous. This treatment replaces $\nu_{jt}$ with $\hat\nu_{jt}$. Error in estimating $\nu_{jt}$ is dealt with
in a bootstrap when computing standard errors. I use $x_i$ and $z_j$ to denote resident and program
characteristics respectively. I assume that covariates that depend on both the residents and the
programs can be written as a known function of $x_i$ and $z_j$. This function is subsumed in the
notation.
Given these characteristics and a parameter vector $\theta$, let $F_{X,\varepsilon,Z,\xi}(\theta \mid F_X, F_Z)$ denote the stable
match distribution given the marginal distributions of observed characteristics of agents on each side
of the market. Throughout, I omit conditioning on the marginal distributions to write the match
distribution predicted by $\theta$ as $F_{X,\varepsilon,Z,\xi}(\theta)$. I write the match distribution $F_{X,\varepsilon,Z,\xi}(\theta_0)$ at the true
parameter and the population distribution of characteristics as $F_{X,\varepsilon,Z,\xi}$. Expectations with respect
to $F_{X,\varepsilon,Z,\xi}(\theta)$ are denoted $E_\theta$ and with respect to $F_{X,\varepsilon,Z,\xi}(\theta_0)$ denoted $E_0$. I denote population
moments as a function of $\theta$ with $m(\theta)$, sample analogs with $\hat m$ and simulation analogs with $\hat m^S(\theta)$.
I denote the observed match with a function $\mu : \{1, \ldots, N\} \to \{1, \ldots, J\}$ and a simulated match
function at $\theta$ with $\mu^s_\theta$. Also, let $\tilde\mu = \mu^{-1} \circ \mu : \{1, \ldots, N\} \to 2^{\{1, \ldots, N\}}$ be a map from $i$ to the set of
peers of $i$ (possibly empty since it does not include $i$).
The three sets of moments discussed in Section 6.2 have the following mathematical expressions.
1. Moments of the match distribution of observable characteristics of residents and programs.
If X and Z are scalar random variables, we can write the second moment of this distribution
as

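\[ m_{ov}(\theta) = E_\theta[XZ] = \int X Z \, dF_{X,Z}(\theta) \]
\[ \hat m_{ov} - \hat m^S_{ov}(\theta) = \frac{1}{N} \sum_{i,j} x_i z_j \left\{ 1\{\mu(i) = j\} - \frac{1}{S} \sum_{s} 1\{\mu^s_\theta(i) = j\} \right\}. \]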

In general, an arbitrary function of (x, z) can be used in place of the product of X and Z.
One may also use a variable that varies by resident and program, such as an indicator for
whether a program is located in the same state as the resident's state of birth.
For estimation, I include pairwise covariances between the observed program and resident
characteristics that are included in the specifications. I also include moments for the same
birth state and the same medical school state. Further, the covariances between resident
characteristics and the squares of the program characteristics that carry random coefficients
are included.
2. The within program variance of resident observables. Note that $F_{X|Z,\xi}(\theta)$ is the distribution
of characteristics X matched with hospitals with the same value of $(Z, \xi)$. In a finite sample,
this is a unique hospital with probability 1. For a scalar X, let

\[ V(X \mid z, \xi) = \int \left( X - E_\theta(X \mid z, \xi) \right)^2 dF_{X|z,\xi}(\theta) \]

denote the average squared deviation of X within program $(z, \xi)$. The moment based on the
within program variation is

\[ m_w(\theta) = E_\theta\left[ V(X \mid z, \xi) \right] = \int V(X \mid z, \xi) \, dF_{Z,\xi} \]
\[ \hat m_w = \frac{1}{N} \sum_i \left( x_i - \frac{1}{|\tilde\mu(i)|} \sum_{i' \in \tilde\mu(i)} x_{i'} \right)^2 \]
\[ \hat m^S_w(\theta) = \frac{1}{NS} \sum_{i,s} \left( x_i - \frac{1}{|\tilde\mu^s_\theta(i)|} \sum_{i' \in \tilde\mu^s_\theta(i)} x_{i'} \right)^2. \]

When X is vector valued, one could stack components, or replace the conditional variance
$V(X \mid z, \xi)$ with a covariance. I use the within program variance for all characteristics included
in the specifications. We may replace X with a function of X.

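3. Covariance between resident characteristics and the average characteristics of a resident's
peers. If $X = (X_1, X_2)$ where $X_1$ and $X_2$ are scalars, the quantity

\[ E_\theta\left[ X_1 E_\theta[X_2 \mid z, \xi] \right] = \int X_1 E_\theta[X_2 \mid Z, \xi] \, dF_{X,z,\xi}(\theta) \]

is the covariance between a resident's characteristic $X_1$ and the average characteristics of the
resident's peers $X_2$. The moment can be written as

\[ m_p(\theta) = E_\theta\left[ X_1 E_\theta[X_2 \mid z, \xi] \right] \]
\[ \hat m_p - \hat m^S_p(\theta) = \frac{1}{N} \sum_i x_{1,i} \left[ \frac{1}{|\tilde\mu(i) \setminus \{i\}|} \sum_{i' \in \tilde\mu(i) \setminus \{i\}} x_{2,i'} - \frac{1}{S} \sum_s \frac{1}{|\tilde\mu^s_\theta(i) \setminus \{i\}|} \sum_{i' \in \tilde\mu^s_\theta(i) \setminus \{i\}} x_{2,i'} \right]. \]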

In general, one could consider two separate functions of X instead of $X_1$ and $X_2$ or the same
variable X. I use the covariance between the continuous characteristics of the residents and
peer averages of each characteristic included in the specifications.

Alternatively, one could combine moments of the second and third type using the notation to
specify the second type of moments. One would match the entries in the upper triangular portion
of within program covariance matrix.
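
The three moment types above can be illustrated with a minimal sketch for a single scalar resident characteristic and a single scalar program characteristic. The function name `match_moments` and the toy data are hypothetical. The sketch assumes that the within-program mean includes the resident herself, that the peer average excludes her, and that residents without peers contribute zero to the third moment; the text does not settle these conventions.

```python
from collections import defaultdict


def match_moments(x, z, match):
    """x[i]: resident characteristic; z[j]: program characteristic;
    match[i]: index of the program that resident i is matched to."""
    n = len(x)
    # Type 1: sample analog of E[XZ] over matched pairs.
    m_ov = sum(x[i] * z[match[i]] for i in range(n)) / n

    # Group residents by the program they are matched to.
    members = defaultdict(list)
    for i, j in enumerate(match):
        members[j].append(i)

    # Type 2: average squared deviation from the own-program mean.
    m_w = 0.0
    for i in range(n):
        group = members[match[i]]
        group_mean = sum(x[k] for k in group) / len(group)
        m_w += (x[i] - group_mean) ** 2
    m_w /= n

    # Type 3: own characteristic times the average over peers (excluding i).
    m_p = 0.0
    for i in range(n):
        peers = [k for k in members[match[i]] if k != i]
        if peers:  # assumed: residents without peers contribute zero
            m_p += x[i] * sum(x[k] for k in peers) / len(peers)
    m_p /= n
    return m_ov, m_w, m_p


# Toy example: four residents, two programs with two positions each.
x = [1.0, 3.0, 2.0, 4.0]
z = [10.0, 20.0]
match = [0, 0, 1, 1]
print(match_moments(x, z, match))
```

Applying the same function to each simulated match and averaging over simulations would give the simulated analogs that are differenced against these sample moments.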

A.2 A Bootstrap
The number of programs in a given market is denoted $J_t$. Each program has a capacity $c_{jt}$ that
is drawn iid from a distribution $F_c$ with support on the natural numbers less than $\bar c$. The total
number of positions in market t is the random variable $C_t = \sum_j c_{jt}$. In each market, the number of
residents $N_t$ is drawn from a binomial distribution $B(C_t, p_t)$ for $p_t \le 1$. The vector of resident and
program characteristics $(z_{jt}, z_{ijt}, x_i, r_{jt}, \varepsilon_i, \beta_i, \xi_{jt}, \nu_{jt})$ is independently sampled from a population
distribution. The distribution of program observable characteristics $(z_{jt}, z_{ijt})$ may depend on $c_{jt}$
while all other characteristics are drawn independently.

Agarwal and Diamond (in progress) study asymptotic theory under this sampling process in the
case of a single market as $J \to \infty$. Limit theorems for the estimator are not yet complete. In Monte Carlo
simulations, inference procedures for standard simulation estimators applied to the model with
exogenous characteristics and preference heterogeneity have a root mean square error that decreases
as the sample size increases. In these simulations, I used a parametric bootstrap that accounts
for the dependent data structure to estimate the asymptotic variance of the moments, and a delta
method to estimate the asymptotic variance of the parameter.
The data can be seen as generated from an equilibrium map from $\theta$ and the distribution of market
participants. Standard Donsker theorems apply for the sampling process for market participants.
The inference method above should then be consistent if a functional delta method applies to
this map, i.e. the distribution of the observed matches is (Hadamard) differentiable jointly in
the parameter $\theta$ and the distribution of observed characteristics of market participants (at the
population distribution of characteristics, tangentially to the space of regular models). Monte
Carlo evidence is consistent with this.
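I approximate the limit distribution of $\hat\theta_{msm}$ as the number of programs in each market grows
using

\[ \sqrt{J} \left( \hat\theta_{msm} - \theta_0 \right) \approx \left[ \Gamma' W \Gamma \right]^{-1} \Gamma' W \sqrt{J} \left( \hat m(\hat\theta_{msm}) - m(\theta_0) \right) \xrightarrow{d} N(0, \Sigma) \]
\[ \Sigma = \left( \Gamma' W \Gamma \right)^{-1} \Gamma' W V^{tot} W' \Gamma \left( \Gamma' W \Gamma \right)^{-1} \]
\[ V^{tot} = V + \frac{1}{S} V^S \qquad (20) \]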
where W is the weight matrix used in the objective function, $\Gamma = \Gamma(\theta_0)$ is the gradient of $m(\theta)$
evaluated at $\theta_0$, $V^{tot}$ is the asymptotic variance in $\hat m^S(\theta_0)$, and $J = \sum_t J_t$. The asymptotic
variance $V^{tot}$ in $\hat m(\theta_0)$ is the sum of the variance due to two independent processes: the sampling
variance V arising from sampling the observable characteristics of residents and programs in the
economy and the simulation variance $V^S$ due to sampling the unobservable traits of the residents
and programs. Note that the sampling variance needs to include the variance in $\hat m$ arising from
uncertainty in estimating $\hat\nu_{jt}$ in different observed samples of programs. The simulation variance
is scaled down by S, the number of simulations used to compute $\hat m^S(\theta)$ during estimation. Since
closed form solutions for the moments are not available, I use numerical and simulation techniques
to calculate each of the unknown quantities $\Gamma$, $V^S$, $V^{tot}$.
To estimate $\Gamma(\theta_0)$, I construct two-sided numerical derivatives of the simulated moment function
$\hat m(\theta)$ using the observed population of residents and programs. Since $\hat m^S(\theta)$ is not smooth due
to simulation errors, extremely small step sizes and a low number of simulation draws can lead to
inaccuracies. For this step, I use 10,000 simulation draws and a step size of $10^{-3}$. The simulation
variance is estimated by calculating the variance in 10,000 evaluations of $\hat m^S(\hat\theta_{msm})$, each with
a single simulation draw and using the observed sample of resident and program characteristics.
Since these two calculations keep the set of observed residents and programs constant, these two
quantities can be calculated independently in each of the markets.
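
A two-sided numerical derivative of a moment function can be sketched as follows. The function name `two_sided_jacobian` and the toy moment function are hypothetical; in an application like the one described above, `moment_fn` would wrap the simulated moments, and this sketch assumes it reuses the same simulation draws at every evaluation so that differences across evaluations reflect only the change in the parameter.

```python
def two_sided_jacobian(moment_fn, theta, step=1e-3):
    """Central-difference estimate of the gradient of moment_fn at theta.
    Returns a list of rows (one per moment) with one column per parameter."""
    n_moments = len(moment_fn(theta))
    jac = [[0.0] * len(theta) for _ in range(n_moments)]
    for k in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[k] += step
        down[k] -= step
        m_up, m_down = moment_fn(up), moment_fn(down)
        for r in range(n_moments):
            jac[r][k] = (m_up[r] - m_down[r]) / (2.0 * step)
    return jac


# Toy moment function with a known gradient (illustrative only).
def toy_moments(theta):
    return [theta[0] ** 2 + theta[1], 3.0 * theta[1]]


print(two_sided_jacobian(toy_moments, [2.0, 1.0]))
```

For the toy function the exact gradient at (2, 1) has rows (4, 1) and (0, 3), which the central difference recovers up to floating point error.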
As noted, the sampling variance in $\hat m(\theta)$ needs to account for the fact that the control variable
$\hat\nu_{jt}$ is estimated. It also needs to account for the dependent structure of the match data. I use the
following bootstrap procedure to estimate V.

1. For each market t, sample $J_t$ program observable characteristics from the observed data
$\{z_{jt}, r_{jt}, q_{jt}\}_{j=1}^{J_t}$ with replacement. Denote this sample with $\{z^b_{jt}, r^b_{jt}, q^b_{jt}\}_{j=1}^{J_t}$.

(a) Calculate $\hat\gamma^b$, $\hat\tau^b$ and the estimated control variables $\hat\nu^b_{jt}$ as in the estimation step.

2. Draw $N^b_t$ from $B\left( \sum_{j=1}^{J_t} q^b_{jt}, \frac{N_t}{Q_t} \right)$ and a sample of resident and resident-program specific
observables $\left\{ x^b_{it}, \{z^b_{ijt}\}_{j=1}^{J_t} \right\}_{i=1}^{N^b_t}$ from the observed data, with replacement.

3. Simulate the unobservables to compute $\left\{ \hat m^{1,b}(\hat\theta_{msm}) \right\}_{b=1}^{B}$, the vector of simulated moments
using the bootstrap sample economy. The variance of these moments is the estimate I use for
V.
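
The resampling in steps 1 and 2 can be sketched as follows for a single market and a scalar moment. All names (`bootstrap_economy`, `bootstrap_variance`, `moment_fn`, the `capacity` key) are hypothetical. The sketch omits step 1(a), the re-estimation of the control variable, and leaves step 3 to a user-supplied `moment_fn`, which would have to simulate unobservables, compute a stable match and return the moment.

```python
import random


def bootstrap_economy(programs, residents, rng):
    """Resample one market. programs: list of dicts with a 'capacity' entry;
    residents: list of dicts of resident observables."""
    n_programs = len(programs)
    n_residents = len(residents)
    total_capacity = sum(p["capacity"] for p in programs)
    # Step 1: draw programs with replacement.
    boot_programs = [rng.choice(programs) for _ in range(n_programs)]
    # Step 2: number of residents is Binomial(bootstrap capacity, N / Q).
    boot_capacity = sum(p["capacity"] for p in boot_programs)
    fill_rate = n_residents / total_capacity
    n_boot = sum(1 for _ in range(boot_capacity) if rng.random() < fill_rate)
    boot_residents = [rng.choice(residents) for _ in range(n_boot)]
    return boot_programs, boot_residents


def bootstrap_variance(programs, residents, moment_fn, B=200, seed=0):
    """Variance across B bootstrap economies of a scalar moment.
    moment_fn(boot_programs, boot_residents, rng) stands in for step 3."""
    rng = random.Random(seed)
    draws = []
    for _ in range(B):
        boot_programs, boot_residents = bootstrap_economy(programs, residents, rng)
        draws.append(moment_fn(boot_programs, boot_residents, rng))
    mean = sum(draws) / B
    return sum((d - mean) ** 2 for d in draws) / (B - 1)


# Toy market (illustrative only).
programs = [{"capacity": 2}, {"capacity": 3}]
residents = [{"x": 1.0}, {"x": 2.0}, {"x": 3.0}, {"x": 4.0}]
```

With a vector of moments, the same loop would collect vectors and the sample covariance matrix across bootstrap iterations would take the place of the scalar variance.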

Essentially, the bootstrap mimics the data generating process to sample a new set of agents
from the population distribution to form an economy. It replaces the set of observed characteristics
of the residents and programs with the empirical distribution observed in the data. Given this
economy, it computes $\hat\nu_{jt}$ and the moments at a pairwise stable match at $\hat\theta$. The covariance of
the moments across bootstrap iterations is the estimate $\hat V$. The uncertainty due to simulation
error $\hat V^S$ is approximated by drawing just the unobserved characteristics.47 In a large economy,
consistency of each of these quantities implies the consistency of the estimate

\[ \hat\Sigma = \left( \hat\Gamma' W \hat\Gamma \right)^{-1} \hat\Gamma' W \left( \hat V + \frac{1}{S} \hat V^S \right) W' \hat\Gamma \left( \hat\Gamma' W \hat\Gamma \right)^{-1}. \qquad (21) \]

A.2.1 Weight Matrix


It is well known that the choice of weight matrix can affect efficiency. This choice is particularly
important when the number of moments is much larger than the number of parameters. A common
method uses a first stage consistent estimate of $\theta_0$ to obtain variance estimates $\hat V$ and $\hat V^S$ to compute
the optimal weight matrix $\hat W = \left( \hat V + \frac{1}{S} \hat V^S \right)^{-1}$ that can be used in the second stage. One may
implement the first step of obtaining a consistent estimate of $\theta_0$ using any positive definite matrix
W, with the identity matrix as the most commonly used first-step weight matrix. In this application,
a two-step procedure is computationally prohibitive. In Monte Carlo simulations with this dataset,
I found that using the identity matrix was often inaccurate and left us with a poor estimate
of $\theta_0$. Instead, a weight matrix $\tilde W$ calculated using the following bootstrap procedure seemed to
approximate the optimal weights fairly well. For each market t, with replacement, randomly sample
$J_t$ programs and the residents matched with them. Treat the observed matches as the matches in
the bootstrap sample as well.48 Compute moments $\{\tilde m^b\}_{b=1}^{B}$ from the sample and compute the
variance $\tilde V$ and set $\tilde W = \tilde V^{-1}$. While this weight matrix need not converge to the optimal weight
matrix, the only theoretical loss is in the efficiency of the estimator. This weight matrix also turns
out to be close to one that would be calculated as $\hat W = \left( \tilde V(\hat\theta_{msm}) + \frac{1}{S} \hat V^S(\hat\theta_{msm}) \right)^{-1}$ where
$\hat\theta_{msm}$ is the estimate of $\theta_0$ using $\hat W^{sub}$ as the weight matrix, and $\tilde V(\hat\theta_{msm})$ and $\hat V^S(\hat\theta_{msm})$ are
the sample and simulation variance that are estimated as described earlier.

47 Justifying the use of a finite number of simulation draws S as $J \to \infty$ needs a stochastic equicontinuity condition
on the empirical objective function (see Pakes and Pollard, 1989). Given the incomplete econometric theory, I use
1,000 simulations to mitigate concerns on this front.
48 Note that a submatch of a stable match is also stable. Hence, the constructed bootstrap match is also stable.

A.3 Optimization Algorithm
The function defined in equation (10) may be non-convex and may have local minima. Further,
$\hat m^S(\theta)$ is not smooth because it is simulated. Gradient-based global search methods can perform
very poorly in such settings. I use an extensive derivative-free global search followed by a refinement
step that uses a derivative-free local search to compute the estimate $\hat\theta_{msm}$.
The global search is implemented using MATLAB's genetic algorithm and a bounded parameter
space based on initial runs (Goldberg, 1989). The algorithm is derivative free, making it
particularly useful for non-smooth problems. Further, the stochastic search method retains parameter
values with low fitness (poor values of the objective function) for a significant number of
generations in the population but explores the rest of the parameter space using random innovations.
This feature makes it attractive for use in settings where some other algorithms may "get stuck"
in local minima. In Monte Carlo experiments the algorithm seemed
to out-perform other commonly used global optimization techniques such as multi-start algorithms
with local search, directed search and simulated annealing.
As with the vast majority of optimizers working with non-convex problems, there is no guarantee
that the genetic algorithm finds the global optimum. I conducted three initial genetic algorithm
runs with separately seeded populations of size 40, a cross-over fraction of 0.75, one elite child,
an adaptive mutation scale of 4 and shrinkage of 0.25. These extensive runs were used to generate
starting values for the local searches.
Local searches were implemented using the starting values yielding the lowest two to three objective
function values and starting values from similar models. This step is conducted to refine the estimate
$\hat\theta_{msm}$ and to be thorough in the search for the global minimum. I used the subplex algorithm (Rowan, 1990), a
derivative-free optimization routine. It is a variant of the Nelder-Mead algorithm that is more
robust for problems with more than a few dimensions. The refined parameter was always close to
the one found by the global optimization routine. However, the subplex algorithm may fail to converge to a
minimum. For this reason, I use up to three successive runs of the subplex algorithm implemented
in the toolbox NLopt for these local runs (Johnson, 2011). Each run restarts the algorithm using
the optimum found in the previous run. I do not repeat the local search if the change in the point
estimate between the starting value and the optimum is less than $10^{-6}$ in Euclidean norm. Two
iterations were always sufficient. I also verified that the reported point is at least a local minimum
using one dimensional slices of the parameter space and profiling the objective function in the
direction of other global search results and local minima that may have been found.
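
The restart rule for the local search can be sketched as follows. This is not the subplex implementation from NLopt: the derivative-free local step here is a simple compass (coordinate pattern) search that I substitute purely for illustration, and the names `compass_search` and `refine_with_restarts` are hypothetical. Only the restart logic (up to three runs, stopping once a run moves the point by less than $10^{-6}$ in Euclidean norm) follows the description above.

```python
import math


def compass_search(f, x0, step=0.5, tol=1e-8, max_iter=10_000):
    """Stand-in derivative-free local search: try +/- step along each
    coordinate, keep improvements, and halve the step when none is found."""
    x = list(x0)
    fx = f(x)
    for _ in range(max_iter):
        improved = False
        for d in range(len(x)):
            for sign in (1.0, -1.0):
                trial = list(x)
                trial[d] += sign * step
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx = trial, f_trial
                    improved = True
        if not improved:
            step *= 0.5
            if step < tol:
                break
    return x, fx


def refine_with_restarts(f, start, local_search=compass_search,
                         max_runs=3, tol=1e-6):
    """Restart the local search from its own optimum, up to max_runs times,
    stopping once a run changes the point by less than tol."""
    x = list(start)
    runs = 0
    for _ in range(max_runs):
        new_x, _ = local_search(f, x)
        runs += 1
        change = math.dist(new_x, x)
        x = new_x
        if change < tol:
            break
    return x, f(x), runs


# Toy smooth objective with minimum at (1, -2), illustrative only.
def toy_objective(theta):
    return (theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2


print(refine_with_restarts(toy_objective, [0.0, 0.0]))
```

Any derivative-free local optimizer with the same call shape could be passed as `local_search`; the restart wrapper does not depend on the particular method.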
My experience with Monte Carlo experiments suggests that this method is very successful in
finding a parameter value close to the true parameter. Although I did not extensively benchmark
this procedure against other optimization procedures, the method also seems faster than grid search,
multi-start with a local optimization using subplex and the simulated annealing algorithm.

B Parameter Estimates
Table B.1 presents point estimates of the models discussed in Section 7 and three additional models.
Two of the additional models do not allow for heterogeneity in preferences. The final additional
model is a version of specification (1) in Table 8 that uses the instrument.
Panel A presents parameter estimates for the distribution of residents' preferences and Panel
B presents estimates for the human capital index. As mentioned in the text, these point estimates
are not directly interpretable in economically meaningful terms. Table 8 translates a subset of
coefficients from Panel A into monetized values by dividing a given coefficient by the coefficient on
salaries, and scaling them into dollar equivalents for a one standard deviation change.
First, comparing coefficients on salaries from specifications (1) through (3) to the corresponding
specifications (4) through (6), we see that accounting for endogeneity in salaries reduces the point
estimate on the salary coefficient. Many of the other coefficients are not substantially altered by
the inclusion of the control variable and the program's own reimbursement rates. The annual rent
and NIH funding of major affiliates are two exceptions. This may be a consequence of correlation
between reimbursement rates and these covariates.
Unfortunately, the estimates from specification (6) are not economically interpretable because
of the negative coefficient on salaries, although this is consistent with the general drop in the coefficient when
using wage instruments. The primary economic implication of the drop in the coefficient on salaries upon
including the instrument, at least for specifications (4) and (5), is that the willingness to pay for
programs increases substantially. Specification (4) results in willingness to pay measures that are
implausibly large. I attribute this non-robustness to a weak instrument due to the limited variation
in salaries. Methods for weak-identification robust estimation are not well developed for non-linear
models such as this and are computationally burdensome (Stock, Wright, and Yogo, 2002).
Comparing estimates from specifications (1) and (2), we see changes in the estimated coefficients
on NIH funding of major affiliates, salaries, the Medicare wage index, and rent. Note that
the change in the coefficient on rent does not appear to have an economically meaningful impact on the
willingness to pay for programs located in high rent areas as compared to programs in low rent
areas. Table 9 shows that specifications (1) and (2) yield similar quantities on this front. A reason
for this is that the Medicare wage index and rents are highly correlated with each other. We also see
that the coefficients on rural birth interacted with rural program, program
location in birth state and program location in medical school state have similar relative magnitudes
across the two specifications, although they are larger in overall magnitude in specification (1). I attribute this difference to additional
unobserved heterogeneity in specification (1), due to which similar geographic sorting needs to be
explained with higher preference for these characteristics.

C Wage Competition
C.1 Expressions for Competitive Outcomes
I first characterize the competitive equilibria of the model. The expression in equation (17) follows
as a corollary. For clarity, I refer to the quality of program 1 as $q_1$ although I normalize it to 0 in
the model presented in the text.

Proposition 3 The wage $w_k$ paid to resident k by program k in a competitive equilibrium is characterized by

\[ w_1 \in \left[ -a q_1, \; f(h_1, q_1) \right] \]
\[ w_k - w_{k-1} + a(q_k - q_{k-1}) \in \left[ f(h_k, q_{k-1}) - f(h_{k-1}, q_{k-1}), \; f(h_k, q_k) - f(h_{k-1}, q_k) \right] \]

Proof. Since the competitive equilibrium maximizes total surplus, resident i is matched with
program i in a competitive equilibrium. The wages are characterized by

\[ IC(k, i): \quad f(h_k, q_k) - w_k \ge f(h_i, q_k) - w_i + a(q_k - q_i) \]
\[ IR(k): \quad a q_k + w_k \ge 0, \quad w_k \le f(h_k, q_k). \]

First, I show that IR(k) is slack for k > 1 as long as IR(1) and IC(k, i) are satisfied for all
i, k. Since IC(1, k) is satisfied,

\[ f(h_1, q_1) - w_1 \ge f(h_k, q_1) - w_k + a(q_1 - q_k) \]
\[ \Rightarrow \; w_k \ge w_1 + f(h_k, q_1) - f(h_1, q_1) + a(q_1 - q_k) \ge -a q_k \qquad (22) \]

where the last inequality follows from $f(h_k, q_1) - f(h_1, q_1) \ge 0$ and $w_1 + a q_1 \ge 0$ from IR(1).
Also, IC(k, 1) implies that

\[ f(h_k, q_k) - w_k \ge f(h_1, q_k) - w_1 + a(q_k - q_1) \]
\[ \Rightarrow \; w_k \le f(h_k, q_k) - f(h_1, q_k) + w_1 - a(q_k - q_1) \]
\[ \le f(h_k, q_k) - f(h_1, q_1) + w_1 - a(q_k - q_1) \]
\[ \le f(h_k, q_k) \qquad (23) \]

where the last two inequalities follow since $f(h_1, q_k) \ge f(h_1, q_1)$, $w_1 \le f(h_1, q_1)$ from IR(1), and $a(q_k - q_1) \ge 0$.
Equations (22) and (23) imply IR(k).
Second, I show that it is sufficient to only consider local incentive constraints, i.e. $IC(i, i-1)$
and $IC(i, i+1)$ for all $i$ imply $IC(k, m)$ for all $k, m$. Assume that $IC(i, i-1)$ is satisfied for all
$i$. For firms $i \in \{m, \dots, k\}$, this hypothesis implies that

\[
f(h_i, q_i) - w_i \ge f(h_{i-1}, q_i) - w_{i-1} + a\,(q_i - q_{i-1}).
\]

Summing each side of the inequality from $i = m$ to $k$ yields that

\[
f(h_k, q_k) - w_k \ge \sum_{i=m+1}^{k} \left[f(h_{i-1}, q_i) - f(h_{i-1}, q_{i-1})\right] + f(h_{m-1}, q_m) + a\,(q_k - q_{m-1}) - w_{m-1}.
\]

Since each $f(h_{i-1}, q_i) - f(h_{i-1}, q_{i-1}) \ge f(h_{m-1}, q_i) - f(h_{m-1}, q_{i-1})$ for $i \ge m$,

\[
\begin{aligned}
f(h_k, q_k) - w_k &\ge \sum_{i=m+1}^{k} \left[f(h_{m-1}, q_i) - f(h_{m-1}, q_{i-1})\right] + f(h_{m-1}, q_m) + a\,(q_k - q_{m-1}) - w_{m-1} \\
&= f(h_{m-1}, q_k) + a\,(q_k - q_{m-1}) - w_{m-1}.
\end{aligned}
\tag{24}
\]

Hence, $IC(k, m)$ is satisfied for all $m \in \{1, \dots, k\}$. A symmetric argument shows that if $IC(i, i+1)$
is satisfied for all $i$, then $IC(k, m)$ is satisfied for all $m \in \{k, \dots, N\}$.
To complete the proof, note that local ICs yield the desired upper and lower bounds.

Corollary 4 The worker optimal competitive equilibrium wages are given by

\[
w_k = f(h_1, q_1) - a\,(q_k - q_1) + \sum_{i=2}^{k} \left[f(h_i, q_i) - f(h_{i-1}, q_i)\right]
\]

and the firm optimal competitive equilibrium wages are given by

\[
w_k = -a\,(q_k - q_1) + \sum_{i=2}^{k} \left[f(h_i, q_{i-1}) - f(h_{i-1}, q_{i-1})\right]
\]
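The expressions in Corollary 4 can be evaluated directly on a small example. The sketch below is illustrative only: the production function `f(h, q) = h * (1 + q)`, the value `a = 0.5` and the vectors `h` and `q` are hypothetical choices, with `q[0] = 0` as in the normalization used in the text.

```python
def competitive_wages(f, a, h, q):
    """Evaluate the worker-optimal and firm-optimal wages of Corollary 4.

    h and q are listed in increasing order, so worker k is matched to
    program k (positive assortative matching). Index 0 here corresponds
    to worker/program 1 in the text.
    """
    worker_opt, firm_opt = [], []
    for k in range(len(h)):
        # Sums over i = 2, ..., k in the text's 1-based indexing.
        wo_sum = sum(f(h[i], q[i]) - f(h[i - 1], q[i]) for i in range(1, k + 1))
        fo_sum = sum(f(h[i], q[i - 1]) - f(h[i - 1], q[i - 1]) for i in range(1, k + 1))
        worker_opt.append(f(h[0], q[0]) - a * (q[k] - q[0]) + wo_sum)
        firm_opt.append(-a * (q[k] - q[0]) + fo_sum)
    return worker_opt, firm_opt


# Hypothetical example: three workers and three programs.
h = [1, 2, 3]
q = [0, 1, 2]  # q[0] = 0 is the normalization used in the text
worker_opt, firm_opt = competitive_wages(lambda hh, qq: hh * (1 + qq), 0.5, h, q)
print(worker_opt)  # [1.0, 2.5, 5.0]
print(firm_opt)    # [0.0, 0.5, 2.0]
```

In this example the wage increments of the two schedules sit exactly at the upper and lower bounds of Proposition 3, and every firm-optimal wage is weakly below the corresponding worker-optimal wage.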

C.2 Proposition 1
For clarity, I refer to the quality of program 1 as $q_1$ although I normalize it to 0 in the model
presented in the text. As before, I limit attention to production technologies that lead to positive
assortative matching between $h$ and $q$. To focus on the split of the total production, consider two
production technologies for which the total output produced by each matched pair is the same for
the two technologies. Thus, each $N$-vector of outputs $y = (y_1, \dots, y_N)$ defines a family of production
functions $F(y) = \{f : f(h_k, q_k) = y_k\}$ where $y_k$ denotes the output produced by the pair $(h_k, q_k)$.
The two extremal technologies in this family are given by $\bar{f}_y(h_k, q_l) = y_k$ and $\underline{f}_y(h_l, q_k) = y_k$
for all $l \in \{1, \dots, N\}$. Let $w_k^{fo}(f)$ (likewise $w_k^{wo}(f)$) denote the firm-optimal (worker-optimal)
competitive wage under technology $f$.
I prove a slightly stronger result here as it may be of independent interest. This result shows
that the split of surplus in cases other than $\bar{f}_y$ and $\underline{f}_y$ is intermediate.

Theorem 5 In the worker-optimal (firm-optimal) competitive equilibria, each worker's wage under
$f \in F(y)$ is bounded above by her wage under $\bar{f}_y$ and below by her wage under $\underline{f}_y$.
Hence, for all $f \in F(y)$, the set of competitive equilibrium wages of worker $k$ is bounded below
by $w_k^{fo}(\underline{f}_y) = -a q_k$ and above by $w_k^{wo}(\bar{f}_y) = y_k - a q_k$.

Proof. I only derive the bounds for the worker optimal equilibrium since the calculation for the
firm optimal equilibrium is analogous. From the expressions in corollary 4,

\[
w_k^{wo}(\underline{f}_y) = \underline{f}_y(h_1, q_1) - a\,(q_k - q_1) = y_1 - a\,(q_k - q_1)
\]

since the terms in the summation are identically 0. For any production function $f \in F(y)$,

\[
\begin{aligned}
w_k^{wo}(f) &= f(h_1, q_1) - a\,(q_k - q_1) + \sum_{i=2}^{k} \left[f(h_i, q_i) - f(h_{i-1}, q_i)\right] \\
&\ge y_1 - a\,(q_k - q_1) = w_k^{wo}(\underline{f}_y)
\end{aligned}
\]

since $f(h_1, q_1) = y_1$ and $f(h_i, q_i) - f(h_{i-1}, q_i) \ge 0$. Similarly, note that

\[
w_k^{wo}(\bar{f}_y) = y_k - a\,(q_k - q_1)
\]

and since each $f(h_i, q_i) - f(h_{i-1}, q_i) \le f(h_i, q_i) - f(h_{i-1}, q_{i-1})$,

\[
w_k^{wo}(f) \le f(h_k, q_k) - a\,(q_k - q_1) = y_k - a\,(q_k - q_1) = w_k^{wo}(\bar{f}_y).
\]

Proposition 1 follows as a corollary.

Proof. For any $y = (y_1, \dots, y_N)$ and production function $f \in F(y)$, the profit of firm $k$ is given by

\[
\begin{aligned}
f(h_k, q_k) - w_k &= y_k - w_k \\
&\ge y_k - w_k^{wo}(\bar{f}_y) \\
&= a\,(q_k - q_1)
\end{aligned}
\]
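As a purely illustrative check of these bounds, take the hypothetical values $N = 2$, $y = (10, 14)$, $a = 1$ and $q = (0, 3)$. Under the technology in which output depends only on program quality, the summation in the worker-optimal wage vanishes, while under the technology in which output depends only on human capital it equals $y_2 - y_1$:

\[
\begin{aligned}
w_2^{wo}(\underline{f}_y) &= y_1 - a\,(q_2 - q_1) = 10 - 3 = 7, \\
w_2^{wo}(\bar{f}_y) &= y_1 - a\,(q_2 - q_1) + (y_2 - y_1) = 14 - 3 = 11.
\end{aligned}
\]

For any $f \in F(y)$ the worker-optimal wage of worker 2 therefore lies in $[7, 11]$, and the profit of firm 2 is at least $y_2 - 11 = 3 = a\,(q_2 - q_1)$.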

C.3 Worker Optimal Equilibrium: Algorithm


The first step uses a linear program to solve for the assignment that produces the maximum total
surplus. Let $a_{ij}$ be the total surplus produced by the match of resident $i$ with program $j$. This
surplus is the sum of the value of the product produced by resident $i$ at program $j$ and the dollar
value of resident $i$'s utility for program $j$ at a wage of 0.[49] With an abuse of notation of the letter
$x$, let $x_{ij}$ denote the fraction of resident $i$ that is matched with program $j$. Sotomayor (1999)
shows that the surplus maximizing (fractional) matching is the solution to the linear program

\[
\max_{\{x_{ij}\}} \sum_{i,j} x_{ij}\, a_{ij} \tag{25}
\]

subject to

\[
0 \le x_{ij} \le 1, \qquad \sum_{j} x_{ij} \le 1, \qquad \sum_{i} x_{ij} \le c_j.
\]
i
[49] As mentioned in footnote 41, I assume that the equilibrium is characterized by full employment. If utilities are
normalized so that an allocation is individually rational if the resident obtains non-negative utility, then ij at
resident $i$'s least preferred program $j$ must exceed the negative of the dollar monetized utility resident $i$ obtains at $j$
at a wage of zero.

Interpreting $x_{ij}$ as the fraction of total available time resident $i$ spends at program $j$, the first
two constraints are feasibility constraints on the resident's time. The third constraint says that the
program does not hire more than its capacity $c_j$. For a generic value of $a_{ij}$, the linear program has an
integer solution. This formulation is computationally quicker than solving for the binary program
with $x_{ij}$ restricted to the set $\{0, 1\}$. I check to ensure that the solutions I obtain are binary.
The second step seeks to find the worker optimal wages supporting this assignment. The
algorithm is based on the dual formulation of the one-to-one assignment problem, which has an
economic interpretation given by Shapley and Shubik (1971). Assume for now that $c_j = 1$ for all
$j$. If $u_i$ is the utility imputation for resident $i$ and $v_j$ is the imputation for program $j$, then a core
allocation ensures that $u_i + v_j \ge a_{ij}$ for all $i$, $j$. This inequality holds for a core allocation if $i$ and
$j$ are matched since utility is fully transferable, and if $i$ and $j$ are not matched since otherwise $i$
and $j$ would block the allocation.[50] A particular element in the core can be found by solving the
problem

\[
\min_{\{u_i\},\{v_j\}} \sum_{i} u_i + \sum_{j} v_j
\]

subject to

\[
u_i \ge 0, \quad v_j \ge 0, \qquad u_i + v_j \ge a_{ij}
\]

where the first inequalities are the individual rationality inequalities and the second is the no
blocking or incentive compatibility inequality.
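The core conditions above can be illustrated on a toy one-to-one example. The sketch below is not the linear program used in the paper: it enumerates assignments by brute force and relies on a standard property of the one-to-one assignment game with non-negative surpluses, namely that each worker's payoff at the worker-optimal core point equals her marginal contribution to the maximal total surplus. The surplus matrix `a` and all function names are hypothetical, and the code assumes at least as many programs as residents.

```python
from itertools import permutations


def max_surplus(a, workers):
    """Maximal total surplus (and a maximizing match) when only `workers` are present."""
    best_value, best_match = float("-inf"), {}
    for chosen in permutations(range(len(a[0])), len(workers)):
        value = sum(a[i][j] for i, j in zip(workers, chosen))
        if value > best_value:
            best_value, best_match = value, dict(zip(workers, chosen))
    return best_value, best_match


def worker_optimal_imputation(a):
    """Return (match, u, v) for the worker-optimal core point of a small game."""
    everyone = list(range(len(a)))
    total, match = max_surplus(a, everyone)
    # Worker i's payoff is total surplus minus the surplus attainable without i.
    u = [total - max_surplus(a, [k for k in everyone if k != i])[0] for i in everyone]
    v = [0] * len(a[0])  # unmatched programs receive zero
    for i, j in match.items():
        v[j] = a[i][j] - u[i]  # lossless transfers within a matched pair
    return match, u, v


# Hypothetical surplus matrix: rows are residents, columns are programs.
a = [[6, 3],
     [4, 2]]
match, u, v = worker_optimal_imputation(a)
print(match, u, v)  # {0: 0, 1: 1} [4, 2] [2, 0]
```

The resulting imputations satisfy $u_i + v_j \ge a_{ij}$ for every pair, with equality for matched pairs. Brute-force enumeration is only practical for very small examples, which is why a linear programming formulation is the natural choice at scale.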
In the many-to-one assignment problem I solve, the total production from a set of residents $R$
for a program $j$ is given by $\sum_{i \in R} f_{ij}$ where $f_{ij}$ is the production from $i$ matching with $j$. Hence,
the total surplus from assignments to program $j$ is given by $\sum_{i \in R} a_{ij}$. Since the total surplus at a
program is the sum of the surpluses from each residency position, one could rewrite this many-to-one
problem as a one-to-one problem between residents and residency positions. This reformulation
needs the additional restriction that a resident may not block an allocation with another position
at the same program. Let $k$ denote a residency position and $j_k$ denote the program that offers this
position. An assignment to positions $\{y_{ik}\}$ with imputations $\{u_i\}$ and $\{v_k\}$ is blocked if there exist
$i$ and $k$ such that $u_i + v_k < a_{i j_k}$ and $y_{ik'} = 0$ for all positions $k'$ at program $j_k$. In other words,
an allocation is blocked only by a resident and position pair in which the position is at a program
other than the resident's assignment.
Let $x_{ij}$ denote the optimal assignment found in the first step and $\{y_{ik}\}$ be an
associated optimal position assignment. The solution to the following linear program gives us
imputations corresponding to the worker-optimal allocation:

\[
\max_{\{u_i\},\{v_k\}} \sum_{i} u_i \tag{26}
\]

subject to

\[
\begin{aligned}
& u_i \ge 0, \quad v_k \ge 0 \\
& \sum_{i} u_i + \sum_{k} v_k \le \sum_{i,j} x_{ij}\, a_{ij} \\
& u_i + v_k = a_{i j_k} \quad \text{if } y_{ik} = 1 \\
& u_i + v_k \ge a_{i j_k} \quad \text{if } x_{i j_k} = 0.
\end{aligned}
\]
[50] See Roth and Sotomayor (1992) for a more detailed discussion of core allocations and the no blocking condition.
Sotomayor (1999) constructs the dual formulation of the many-to-one problem.

The second constraint is implied by the optimality of the assignment $x$ as no feasible imputation
may provide a larger total surplus. This constraint always binds since the problem maximizes the
surplus that accrues to the residents and none of the other constraints bound this surplus. The
third constraint asserts that the imputations supporting $y$ result from lossless transfers between
a resident and her matched program. The final constraints are no blocking constraints between worker
$i$ and a position at an unmatched program. Calculating the transfers implied by a solution to this
problem is straightforward.
The linear programs were solved using Gurobi Optimizer (http://www.gurobi.com).

C.4 Implicit Tuition


I prove a more general result for many-to-one assignment games that subsumes Proposition 2. To
do this, I first need to introduce some notation. Consider a many-to-one assignment game between workers
$i \in \{1, \dots, N\}$ and firms $j \in \{1, \dots, J\}$. The capacity of firm $j$ is $c_j$. I focus on the case when
$\sum_j c_j \ge N$. Worker $i$, with human capital $h_i$, produces $f(h_i) \ge 0$ at firm $j$, independently of the
other workers at the firm. An empty slot produces 0. The utility worker $i$ receives from working
at firm $j$ at a wage of $w$ is $u_{ij} + w$. Since the wage transfer is lossless, the total surplus produced
by the pair $i, j$ under the production function $f$ is $a^f_{ij} = u_{ij} + f(h_i)$. I assume that each $u_{ij} \ge 0$.
Rigorous treatments of these concepts are given in Roth and Sotomayor (1992), but I recall
definitions for clarity. For a one-to-one assignment game, an assignment is a vector $x = \{x_{ij}\}_{i,j}$
where $x_{ij} \in \{0, 1\}$ and $x_{ij} = 1$ denotes that $i$ is assigned to $j$. The assignment $x$ is feasible
if $\sum_j x_{ij} \le 1$ and $\sum_i x_{ij} \le c_j$. An allocation is the pair $(x, w)$ of an assignment $x$ and wages
$w = \{w_{ij}\}_{ij}$ with $w_{ij} \in \mathbb{R}$. The allocation is feasible if $x$ is feasible. An outcome is a pair $((u, v), x)$
of payoffs $u = \{u_i\}_i$ and $v = \{v_j\}_j$ and an assignment $x$. Given an allocation, we can compute the
outcome $u_i = \sum_j x_{ij}\,(u_{ij} + w_{ij})$ and $v_j = \sum_i x_{ij}\,(f(h_i) - w_{ij})$. The outcome is feasible if it can be
supported by a feasible allocation $(x, w)$.
In the many-to-one case, we refer to an assignment of positions $\{y_{ip}\}_{i,p}$ where $p \in \{1, \dots, \sum_j c_j\}$
denotes a position at a firm. Let $j_p$ denote the firm offering position $p$. Each assignment $x$
induces a unique canonical assignment of positions $y$ where the positions in the firm are filled by
residents in order of their index $i$. It is clear that the function between an assignment and its
canonical assignment of positions is bijective. Likewise, with a slight abuse of notation, we can
define an allocation of positions using a pair $(y, w)$, where $w = \{w_{ip}\}$. For an allocation
$(x, w)$ we can obtain an allocation of positions $(y, \tilde{w})$ by setting $y$ to the canonical assignment
and the salaries to $\tilde{w}_{ip} = w_{i j_p}$. The surplus of position $p$ is defined as $v^f_p = \sum_i y_{ip}\,(f(h_i) - w_{ip})$
and of worker $i$ by $u^f_i = \sum_p y_{ip}\,(u_{i j_p} + w_{ip})$. Feasibility of outcomes in this setting can be defined
analogously to the previous case. Rigorous treatments of these concepts are given in Camina (2006)
and Sotomayor (1999).
A feasible outcome $((u, v), x)$ is stable if $u_i \ge 0$, $v_j \ge 0$ and $u_i + v_j \ge a_{ij}$ for all $i$, $j$. The
allocation $(x, w)$ is a competitive equilibrium if the assignment $x$ is consistent with the demand of each
worker and firm at the prices given by $w$. The equivalence of stable outcomes and competitive equilibria is well known. For the
many-to-one case, an outcome $((u, v), y)$ is stable if for all $i$, $p$, $u_i \ge 0$, $v_p \ge 0$, and $u_i + v_p \ge a_{i j_p}$ if $y_{ip} = 1$
or $x_{i j_p} = 0$. Consequently, unmatched workers and firms can block if they can agree to a
mutually beneficial outcome. A matched worker and firm pair can also block an outcome if the
sum of their payoffs is lower than the total surplus they produce. The correspondence between
many-to-one stable outcomes and competitive equilibria is noted in Camina (2006). In many-to-one
settings, the demand for firm positions is defined by restricting the wages for each position at
a firm to be the same for a given worker. Different workers may, however, face different prices.

Now, we are ready to prove the desired result from which the one-to-one matching case follows
trivially by allowing for only one position at each firm.

Proposition 6 The equilibrium assignments of positions for the games $a^f_{ij}$ and $a^{\tilde{f}}_{ij}$ coincide. Further,
if $u^f_i$ and $v^f_p$ are position payoffs for the game $a^f$, then $u^{\tilde{f}}_i = u^f_i + \tilde{f}(h_i) - f(h_i)$ and $v^{\tilde{f}}_p = v^f_p$
are equilibrium payoffs under the surplus $a^{\tilde{f}}_{ij}$. Consequently the implicit tuition for each position is
the same for the games $a^f$ and $a^{\tilde{f}}$.
~
Proof. Sotomayor (1999) shows that equilibria for $a^f$ and $a^{\tilde{f}}$ exist and maximize the total surplus
in the set of feasible assignments. Towards a contradiction, assume that $y^{\tilde{f}}$ is an equilibrium for
$a^{\tilde{f}}$ but not for $a^f$. The feasibility constraints are identical in the two games, and so both $y^f$ and
$y^{\tilde{f}}$ are feasible for both games. Since $y^{\tilde{f}}$ maximizes the total surplus under $a^{\tilde{f}}$,

\[
\begin{aligned}
& \sum_{i,p} a^{\tilde{f}}_{i j_p}\, y^{\tilde{f}}_{ip} > \sum_{i,p} a^{\tilde{f}}_{i j_p}\, y^{f}_{ip} \\
\Rightarrow \quad & \sum_{i,p} a^{f}_{i j_p}\, y^{\tilde{f}}_{ip} + \sum_{i} \sum_{p} \left(\tilde{f}(h_i) - f(h_i)\right) y^{\tilde{f}}_{ip}
> \sum_{i,p} a^{f}_{i j_p}\, y^{f}_{ip} + \sum_{i} \sum_{p} \left(\tilde{f}(h_i) - f(h_i)\right) y^{f}_{ip}.
\end{aligned}
\tag{27}
\]

Since every worker-firm pair produces positive surplus and the total capacity exceeds the number
of workers, there cannot be any unassigned workers in any feasible surplus maximizing
allocation, i.e. $\sum_p y^f_{ip} = \sum_p y^{\tilde{f}}_{ip} = 1$ for all $i$. Hence, we have that
$\sum_i \sum_p (\tilde{f}(h_i) - f(h_i))\, y^{\tilde{f}}_{ip} = \sum_i \sum_p (\tilde{f}(h_i) - f(h_i))\, y^{f}_{ip}$.
The inequality in equation (27) reduces to $\sum_{i,p} a^f_{i j_p}\, y^{\tilde{f}}_{ip} > \sum_{i,p} a^f_{i j_p}\, y^{f}_{ip}$, a
contradiction to the assumption that $y^f$ is an equilibrium assignment for $a^f$. This contradiction
implies that the equilibrium assignments of positions under the two games coincide.
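To show the second part of the result, consider the payoffs for $a^{\bar{f}}$ where
$\bar{f}(h_i) = \max\{\tilde{f}(h_i), f(h_i)\}$. I show that $u^{\bar{f}}_i = u^f_i + (\bar{f}(h_i) - f(h_i))$ and $v^{\bar{f}}_p = v^f_p$. The comparison
of equilibrium payoffs for $\tilde{f}$ and $f$ follows immediately from this. Note that for all $i$ and $p$, $u^f_i \ge 0$
and $v^f_p \ge 0$ imply $v^{\bar{f}}_p \ge 0$ and $u^{\bar{f}}_i \ge 0$ since $\bar{f}(h_i) - f(h_i) \ge 0$. It remains to show that $u^{\bar{f}}_i + v^{\bar{f}}_p \ge a^{\bar{f}}_{i j_p}$
if $i$ is assigned to position $p$ or if $i$ is not assigned to firm $j_p$. Note that for all $i$ and $p$, we have that
if $u^f_i + v^f_p \ge a^f_{i j_p}$,

\[
\begin{aligned}
u^{\bar{f}}_i + v^{\bar{f}}_p &= u^f_i + \bar{f}(h_i) - f(h_i) + v^f_p \\
&\ge a^f_{i j_p} + \bar{f}(h_i) - f(h_i) \\
&= a^{\bar{f}}_{i j_p}.
\end{aligned}
\]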

To complete the proof I need to show that the payoffs to each position coincide under the
worker-optimal stable outcome. Let $u^f_i$ and $v^f_p$ denote this outcome for the game $a^f$. Let $u^0_i$ and $v^0_p$
be the worker-optimal outcome under the function $f(h_i) = 0$ for all $h_i$. I showed earlier that the
optimal assignments coincide for these two cases. I have shown that $u^0_i + f(h_i)$ and $v^0_p$ is stable for
$a^f$. Towards a contradiction, assume that $u^f_i \ge u^0_i + f(h_i)$ with strict inequality for at least one $i$.
This implies that $u^f_i - f(h_i)$ is stable for $a^0$. Hence, $u^f_i - f(h_i) \ge u^0_i$ with strict inequality for at
least one $i$, contradicting the assumption that $u^0_i$ and $v^0_p$ are part of the worker-optimal outcome.
If $y$ is the optimal assignment, this shows that $v^0_p = \sum_i y_{ip}\,(a^0_{ip} - u^0_i) = \sum_i y_{ip}\,(a^f_{ip} - u^f_i) = v^f_p$,
proving the result.
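The invariance in Proposition 6 can be checked numerically on a toy one-to-one game. The sketch below is an illustration under assumptions, not the paper's procedure: it finds the surplus-maximizing assignment by brute force and computes worker-optimal payoffs as marginal contributions to total surplus, a standard property of one-to-one assignment games with non-negative surpluses. The utilities `util` and the two production vectors `f` and `f_tilde` are hypothetical.

```python
from itertools import permutations


def worker_optimal(surplus):
    """Brute-force a small one-to-one game: (match, worker payoffs, firm payoffs)."""
    n, m = len(surplus), len(surplus[0])

    def best(workers):
        # Highest total surplus over all ways of placing `workers` at distinct firms.
        return max(
            (sum(surplus[i][j] for i, j in zip(workers, firms)), firms)
            for firms in permutations(range(m), len(workers))
        )

    everyone = tuple(range(n))
    total, firms = best(everyone)
    # Worker-optimal payoff: total surplus minus surplus attainable without worker i.
    u = [total - best(everyone[:i] + everyone[i + 1:])[0] for i in everyone]
    v = [0] * m
    for i, j in zip(everyone, firms):
        v[j] = surplus[i][j] - u[i]
    return dict(zip(everyone, firms)), u, v


util = [[3, 0], [2, 0]]  # hypothetical u_ij >= 0
f = [3, 2]               # hypothetical f(h_i)
f_tilde = [10, 5]        # hypothetical alternative production levels

a_f = [[util[i][j] + f[i] for j in range(2)] for i in range(2)]
a_ft = [[util[i][j] + f_tilde[i] for j in range(2)] for i in range(2)]

match_f, u_f, v_f = worker_optimal(a_f)
match_ft, u_ft, v_ft = worker_optimal(a_ft)
print(match_f == match_ft)  # True
print(u_f, u_ft)            # [4, 2] [11, 5]
print(v_f, v_ft)            # [2, 0] [2, 0]
```

In this example the assignment and the firm (position) payoffs are identical across the two games, while each worker's payoff rises by exactly the change in her own output.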

D Rural Hospitals
D.1 Suggestive Evidence on Preference Heterogeneity for Rural Doctors
If preferences for resident traits other than a single human capital index were important, one
expects that two residents at the same program have dissimilar academic qualifications if they
differ on these dimensions. More concretely, one may expect that at rural programs, a rural born
resident is academically less qualified than her peer born in an urban location. This may happen
because a rural program prefers a rural born resident to an equally qualified urban born resident. To
assess whether rural born residents in rural hospitals are more qualified than their urban colleagues,
I estimated the regression

\[
x_i = \beta \cdot rural_i + program\_fe(i) + e_i,
\]

where $x_i$ is a measure of medical school quality for resident $i$, $rural_i$ is a dummy for a rural born
resident and $program\_fe(i)$ is a fixed effect for $program(i)$, resident $i$'s match.
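A regression of this form, with a single regressor and program fixed effects, can be computed by demeaning both variables within each program and running OLS on the demeaned data (the within transformation). The sketch below uses made-up data and hypothetical names; it illustrates the estimator only and is not the code or data behind Table D.3.

```python
from collections import defaultdict


def within_estimator(x, rural, program):
    """Coefficient on `rural` after absorbing program fixed effects by demeaning."""
    groups = defaultdict(list)
    for idx, p in enumerate(program):
        groups[p].append(idx)
    num = den = 0.0
    for members in groups.values():
        x_bar = sum(x[i] for i in members) / len(members)
        r_bar = sum(rural[i] for i in members) / len(members)
        for i in members:
            num += (rural[i] - r_bar) * (x[i] - x_bar)
            den += (rural[i] - r_bar) ** 2
    # Programs with no variation in `rural` contribute nothing to either sum.
    return num / den


# Made-up data generated with program effects 5 and 2 and a coefficient of -1.
x = [5.0, 4.0, 1.0, 2.0, 2.0]
rural = [0, 1, 1, 0, 0]
program = ["A", "A", "B", "B", "B"]
beta_hat = within_estimator(x, rural, program)
print(round(beta_hat, 6))  # -1.0
```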
The results presented in Table D.3 suggest that this may not be of primary importance.
Columns (1) and (6) show that rural born residents matched with rural hospitals hail from medical
schools that have, on average, only about 0.06 log points less NIH research funding than their peers
born in urban areas and are about one percentage point more likely to have an MD degree. Note
that the standard deviation in log NIH funding is 1.23. Neither estimate is statistically significant.
Although not presented here, the conclusion is robust to using median MCAT score as an indicator
of a resident's quality in place of research funding or medical school ranks. If program-year
fixed effects are included in place of program fixed effects, the estimates are more imprecise and
the hypothesis that the medical school quality of the rural born residents at rural hospitals is
the same as that of their urban born peers still cannot be rejected. Columns (3) and (8) of the table show
that this observation holds despite the fact that the average rural born resident hails from an observably
different medical school than her urban counterparts.
As a validation exercise, I ran similar regressions using gender in place of rural birth. Since
accreditation guidelines prohibit programs from discriminating on the basis of sex,[51] one may
reasonably expect that there is no gender based discrimination by residency programs. Columns
(5) and (6) show that although the average female resident hails from a medical school that is better
funded than those of male residents in her cohort, her medical school quality is no different from that of her
male colleagues in her residency program.
While these results are reassuring, they are not definitive on the lack of preference heterogeneity.
The somewhat large standard errors and the fact that these observables are proxies for resident
quality are the primary reasons for this reserved interpretation. Nonetheless, they suggest that
estimates may not suffer from large biases.

D.2 General and Partial Equilibrium Effects of Financial Incentives


I consider a partial equilibrium alternative to simulations presented in Section 9.1 that may be
analytically inexpensive but could, in some situations, perform fairly well. Suppose a policymaker
could survey rural residency program directors to determine the impact of incentives for rural
training on the residents that choose to train there. For instance, a survey such as the National
GME census used in this study could also solicit a program director's judgement of the number
and quality of residents that would match to the program if it unilaterally raised its salary. The
responses could be used to predict the impact of the financial incentives studied earlier by simply
aggregating the number of positions filled and resident types in rural areas. Such a calculation
ignores the influence of a resident who is on the margin between two rural programs and an urban
program on the final results. By ignoring the fact that salaries at all rural programs would be
increased simultaneously, the calculation acts as if program directors at both rural programs believe
that this resident is matched to their program.

[51] The institutional requirements from the Accreditation Council for Graduate Medical Education (ACGME) state
that "ACGME-accredited programs must not discriminate with regard to sex, race, age, religion, color, national
origin, disability, or any other applicable legally protected status."
The hypothetical benchmark can be simulated using the estimated model by aggregating
predicted changes in the matches from the unilateral salary increases at rural hospitals. Panel A of
Table D.4 compares results for $5,000, $10,000 and $20,000 increases in salary at rural programs.
Comparing the results with those in Panel A, it appears that this simple partial equilibrium analysis
would do fairly well at predicting the overall impact of subsidies to rural programs. The impacts
on resident quality and numbers are only slightly overstated. This is because, at the
estimated parameters, most residents are indifferent between a rural hospital and an urban hospital
rather than between two rural hospitals, and the number of rural positions is only about a tenth of all
positions in the market. This fact is reflected in the distribution graphed in Figure 2.
Panel B of Table D.4 compares outcomes for incentives for training in rural programs as well as
medically underserved states. The ACA redistributes funding previously allocated to urban programs
but currently unused for residency training to (i) rural programs, (ii) states in the bottom quartile of
the physician to population ratio and (iii) states in the top 10 in numbers of people living in a Health
Professional Shortage Area. I label these states[52] as medically underserved states and compare the
partial and general equilibrium impacts of financial incentives. We see that for a $5,000 incentive,
the partial equilibrium analysis predicts an 11% larger impact of subsidies. Notice that for larger
subsidies, the difference between the partial and general equilibrium predictions in the change in
the number of matches is smaller. For a larger subsidy, the partial equilibrium analysis overstates
the change in quality of residents matched at programs in medically underserved states.
Qualitatively similar, but quantitatively larger answers were obtained from a simulation exercise
in which I randomly subsidized one-quarter of the residency programs. Panel C presents these
results. This simulation experiment shows that the model is capable of capturing potentially
important general equilibrium effects of policy interventions. The size of these effects depends on
the primitive preferences in the market structure as well as the scope of the intervention.

[52] CMS identified Montana, Idaho, Alaska, Wyoming, Nevada, South Dakota, North Dakota, Mississippi, Florida,
Puerto Rico, Indiana, Arizona and Georgia as in the bottom quartile of physicians to population ratio. Louisiana,
Mississippi, Puerto Rico, New Mexico, South Dakota, District of Columbia, Montana, North Dakota, Wyoming and
Alabama are in the top 10 in numbers of people living in primary care HPSAs. Puerto Rico is excluded from this
analysis.

E Data Construction
E.1 National GME Census
The American Medical Association (AMA) and the Association of American Medical Colleges
(AAMC) jointly conduct an annual National Graduate Medical Education Census (GME Track)
of all residency programs accredited by the Accreditation Council for Graduate Medical Education
(ACGME). There are two main components of the census: the program survey and the sponsoring
institution survey. The program survey, which is completed by the program directors, also gathers
information about the residents training at the programs. Fields from the surveys are used to
update FREIDA Online, a publicly accessible database, and the AMA physician masterfile. Since
2000, the GME Track has been pre-augmented with data from the Electronic Residency Application
Service (ERAS) and the National Residency Matching Program (NRMP).[53] The AMA provided
records from the National GME census on all family medicine residency training programs in the
United States between 2003-2004 and 2010-2011. The 2011-2012 data were provided after the initial
empirical analysis was completed.
The data files and identifiers are structured as follows:

1. Program file with program name, characteristics, and a unique identifier for the program. This
file also contains the identifier for the program's affiliated hospitals.

2. Resident file with resident characteristics, program code, country code and medical school
code. Two separate files identify the country and MD granting medical schools by name.

3. Institution file with the institution name, characteristics and a unique identifier.

4. Two bridge files. One delineating the relationships between programs and institutions (usually
hospitals) as primary institution, sponsoring institution or clinical affiliate, and the other
delineating the relationships between institutions and medical schools as major affiliate, graduate
affiliate or limited affiliation.

E.1.1 Sample Construction


The baseline sample is constructed from the set of all family medicine residency programs accredited
by the ACGME and first-year residents training at such programs. From this set, I exclude
programs in Puerto Rico, military programs and their first-year residents. Fewer than 20 programs
and 123 residents are excluded due to these cuts. I also exclude programs that do not participate
in the National Residency Matching Program and the residents matched to these programs. These
constitute fewer than 9 programs and 22 residents in each year. Finally, I also exclude the set of programs
not offering any first-year positions, and programs that have no reported first-year matches
during the entire sample period from the analysis. This final exclusion leads to 21 programs being
dropped from the sample in 2003-2004, and fewer than 5 programs being dropped in the other years.
A detailed breakdown of the annual counts of the sample selection procedure is provided in
Table E.6.
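The exclusion rules above amount to a sequence of filters on program records. The sketch below shows one way to express them; the record layout and every field name (`state`, `military`, `in_nrmp`, `first_year_positions`, `first_year_matches_all_years`) are hypothetical and do not correspond to the actual GME Track file layout.

```python
def baseline_sample(programs):
    """Apply the sample exclusions, in order, to a list of program records."""
    kept = []
    for p in programs:
        if p["state"] == "PR" or p["military"]:
            continue  # Puerto Rico and military programs
        if not p["in_nrmp"]:
            continue  # programs that do not participate in the NRMP
        if p["first_year_positions"] == 0:
            continue  # no first-year positions offered
        if p["first_year_matches_all_years"] == 0:
            continue  # no reported first-year matches in the whole sample period
        kept.append(p)
    return kept


# Hypothetical records for illustration.
programs = [
    {"id": 1, "state": "MA", "military": False, "in_nrmp": True,
     "first_year_positions": 8, "first_year_matches_all_years": 60},
    {"id": 2, "state": "PR", "military": False, "in_nrmp": True,
     "first_year_positions": 6, "first_year_matches_all_years": 40},
    {"id": 3, "state": "TX", "military": True, "in_nrmp": True,
     "first_year_positions": 10, "first_year_matches_all_years": 75},
    {"id": 4, "state": "OH", "military": False, "in_nrmp": False,
     "first_year_positions": 4, "first_year_matches_all_years": 20},
    {"id": 5, "state": "CA", "military": False, "in_nrmp": True,
     "first_year_positions": 0, "first_year_matches_all_years": 0},
]
sample_ids = [p["id"] for p in baseline_sample(programs)]
print(sample_ids)  # [1]
```

Residents would then be restricted to those matched to a retained program.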
[53] The details of the data collection procedure are outlined on http://www.ama-assn.org/ama/pub/education-careers/graduate-medical-education/freida-online/about-freida-online/national-gme-census.page.

E.1.2 Merging GME Track Data
Programs to Clinical Site. I wish to identify the primary hospital at which the clinical training
of the residents in the programs occurs. The AMA data identify the relationship between programs
and sponsoring institutions and hospitals in two ways. The program file records list each program's
primary site. The program-institution bridge file records the sponsoring institution, (a second)
primary clinical site and other affiliated institutions.
The program-institution bridge has two drawbacks: the clinical site of the program is poorly
reported, with at most 94 observations (amongst all ACGME
family medicine programs) in any given year, and the sponsoring institutions are often medical
schools or health systems. In order to avoid prioritizing sponsoring institutions or clinical sites
from the bridge file, I pick the primary clinical site as reported in the program file as the starting
point.
In a large number of cases, the institution type of the primary institution was a medical
school or a health system, not a hospital. Consequently, the hospital institution data for these
observations were not available. In the vast majority of these cases, the primary institution was,
at some point during the sample period, reported as a different site, one that was a hospital. I
checked all cases in which the primary institution was not a hospital or clinic as identified by an
institution type field in the institution file, or had a bed count of zero. When possible, I changed
the primary hospital of a program from the one listed in the program file according to the following rules:

1. I first checked the program-institution bridge for a listed primary clinical site that was a
hospital and changed the primary hospital to that primary clinical site.

2. I looked at the closest year in which the program listed a primary clinical site that is a
hospital or clinic and changed it to that hospital or clinic only if the institution was listed as
an affiliate or sponsor in that year as well.

The changes affected a total of 285 out of 3441 program-year primary clinical institution
relationships in 109 out of 462 programs in the unrestricted sample of all family medicine residency programs
between 2003-2004 and 2010-2011. No more than 43 programs were affected in
any given year.
Finally, 82 program-year observations did not have institution data from the primary sites
based on the designation of primary sites above. These programs were solely sponsored by health
systems or medical schools, and not primarily associated with a hospital. I imputed the hospital
characteristics by taking the mean characteristics of all hospital affiliates for these programs. This
imputation populated records for 11 programs in each of 2003-2004 and 2004-2005 and for 10 programs
in each of the other years.

Programs with Medical Schools. The link between medical schools and programs is provided
by the AMA through the program-institution bridge followed by the institution-medical school
bridge. The program-institution relationships are categorized into primary clinical sites, sponsors
and affiliates. The institution-medical school relationships are categorized as limited, graduate and
major.
I use these relationships to define two types of affiliations for programs to medical schools, major
and minor. A program has a major affiliation to a medical school if the primary or sponsoring
institution has a major affiliation with a medical school. All other relationships are regarded as
minor relationships. The relationships between programs and medical schools are imputed for all
years between the first and last year of a major (likewise minor) relationship. I used all relationships
since 1996 for this imputation and for 2010-2011, I used the relationships in 2009-2010 as well. For
the unselected sample of family medicine programs between 2002-2003 and 2010-2011, I imputed
relationships for 144 out of 2797 major affiliations and 702 out of 3337 minor affiliations. The means of
NIH funding across all major and across all minor affiliations are used as the variables for this merge.
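The imputation of relationships for the years between the first and last observed year can be sketched as a simple gap fill. The function below is a hypothetical illustration, with academic years labeled by their starting calendar year as an assumption; it does not reproduce the additional rule for 2010-2011 described above.

```python
def impute_years(observed_years):
    """Return every year from the first to the last year a relationship is observed."""
    if not observed_years:
        return []
    return list(range(min(observed_years), max(observed_years) + 1))


# Hypothetical example: a major affiliation reported in 2003 and 2006 only.
observed = {2003, 2006}
filled = impute_years(observed)
imputed = [year for year in filled if year not in observed]
print(filled)   # [2003, 2004, 2005, 2006]
print(imputed)  # [2004, 2005]
```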

E.2 Medical School Characteristics


The National GME Census does not provide data on medical school characteristics. Each medical
school is identified by a number, and only the medical school names for MD granting medical
schools are identified. According to the AAMC, there are 134 accredited MD-granting medical
schools in the United States. In the dataset, I found 135 medical school identifiers for MD granting
institutions. Texas Tech University Health Sciences Center School of Medicine appeared with two
different ids. I duplicate the fields throughout for that medical school. I next describe the sources
of the data on medical schools and the process used to merge and construct the fields.

E.2.1 NIH Funding Data


The National Institutes of Health organizes the data on its expenditures and makes it available
through RePORT. The records of each project funded by the NIH are available for download through
http://projectreporter.nih.gov/reporter.cfm. The records identify the projects by an application
id and fields include the institution type, total cost and project categories. I include funding for
projects designated to Schools of Medicine, Schools of Medicine and Dentistry, and Overall Medical
as these categories were the major categories at which the recipient was affiliated with an MD
medical school. I wished to include funding only for extramural and cooperative research activities,
and training and fellowship programs funded by the NIH in a medical school. So, I dropped activity
codes beginning with G, C, H as these were designated for construction, resource development and
community service. Further, I dropped activity codes beginning with N and Z since those data are
available only after 2007.
I used the records from all project costs incurred in the financial years 2000 to 2010 that satisfy
the criteria above and aggregated the project costs to the organization name. I wish to construct
the average annual NIH research costs incurred at these medical schools during this period. I infer
that a school was operating during a given year if it secured some NIH funding. All but thirteen
schools secured NIH funding during each of the eleven years in the sample. Six schools did not
receive any NIH funds during this period even though they were operating (as indicated by online
sources) and their eleven year annual average NIH costs were set to zero. For the remaining seven
medical schools, I established the number of years the school was operating by consulting the
history of the medical college published on its website.
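The construction of the average annual funding variable can be sketched as a filter followed by an average. In the sketch below the record layout (tuples of activity code, fiscal year and total cost), the example activity codes and the function name are hypothetical; the excluded prefixes and the 2000 to 2010 window follow the description above.

```python
EXCLUDED_PREFIXES = ("G", "C", "H", "N", "Z")  # activity codes dropped in the text


def average_annual_funding(projects, years_operating):
    """Average annual NIH cost for one school over the years it was operating.

    projects: iterable of (activity_code, fiscal_year, total_cost) tuples.
    years_operating: number of fiscal years in 2000-2010 the school operated.
    """
    total = sum(
        cost
        for code, year, cost in projects
        if 2000 <= year <= 2010 and not code.startswith(EXCLUDED_PREFIXES)
    )
    return total / years_operating if years_operating else 0.0


# Hypothetical project records for a school operating in all eleven years.
projects = [
    ("R01", 2005, 300.0),
    ("G12", 2005, 100.0),  # dropped: activity code begins with G
    ("T32", 2011, 50.0),   # dropped: outside fiscal years 2000-2010
    ("K08", 2000, 250.0),
]
average = average_annual_funding(projects, years_operating=11)
print(average)  # 50.0
```

A school with no qualifying records that operated throughout the period receives an average of zero, matching the treatment of the six schools without NIH funds.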
These data were merged with the data from the National GME Census using the medical school
names. Of the 135 MD medical schools in the GME Census, 129 medical schools were matched
successfully to a counterpart in the NIH funding data. I verified that the remaining six schools did
not have any records in project RePORT in the categories considered.
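The filtering and averaging steps described above can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout (organization, fiscal year, activity code, total cost), the function name `average_annual_funding` and the `operating_years` override for hand-checked schools are hypothetical and do not come from the RePORT files themselves.

```python
from collections import defaultdict

# Activity-code prefixes dropped in the text: G, C, H (construction, resource
# development, community service) and N, Z (available only after 2007).
EXCLUDED_PREFIXES = ("G", "C", "H", "N", "Z")


def average_annual_funding(records, operating_years=None):
    """Average annual NIH cost per organization over fiscal years 2000-2010.

    records: iterable of (organization, fiscal_year, activity_code, total_cost).
    operating_years: optional dict giving the number of operating years for
        schools whose history was looked up by hand (hypothetical hook).
    By default a school counts as operating in every year it received funding.
    """
    totals = defaultdict(float)
    funded_years = defaultdict(set)
    for org, year, code, cost in records:
        if not 2000 <= year <= 2010:
            continue  # outside the sample window
        if code.startswith(EXCLUDED_PREFIXES):
            continue  # excluded activity category
        totals[org] += cost
        funded_years[org].add(year)
    overrides = operating_years or {}
    return {
        org: totals[org] / overrides.get(org, len(funded_years[org]))
        for org in totals
    }
```

Schools with no qualifying records never appear in the result, which matches the treatment in the text, where their average is set to zero separately.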

E.2.2 Medical School Admission Requirements (MSAR)


I used the records from the 2010-2011 MSAR publication of the AAMC to augment the medical
school characteristics with the state and the median MCAT score of the admits into a medical
school. The merge was done using the medical school name, and MCAT score data were found for all
but seven of the 135 MD-granting medical schools. Data on the state the medical school is located
in was found for all MD medical schools.

E.3 Medicare Data
Here, I describe the merge and construction of the Case Mix Index and Wage Index variables. The
instrument, based on Medicare reimbursement rates, is described in Section G.
I use the records from the Medicare provider files to construct the variables primary care
reimbursement rates, the Medicare wage index and the case mix index. The institution ids for
all affiliates were merged with Medicare provider identifiers by the name of the provider, first using
the 1997 PPS files and then using the 2010 Impact Files. A second check was conducted for
primary institutions of the programs, and for affiliates when primary institutions were not matched
to Medicare data. In a small number of instances, there are multiple matched CMS identifiers for
a single institution. Medicare variables were averaged across these multiple matches.

E.3.1 Medicare Wage Index and Case Mix Index


The Centers for Medicare & Medicaid Services (CMS) calculated a Wage Index and Case Mix Index for each provider.54
I merged the CMS data with the primary institution. In a small number of instances, the primary
institution did not have a match with Medicare data. In these cases, I calculated the average of
the variable for all affiliates with Medicare data. In a total of 63 out of 3441 cases, the case mix
index was not available even for affiliates. Here, in the structural estimates, I used an imputed
value from a linear regression on all other characteristics included in the demand system. Finally,
missing values of the wage index were imputed using the geographic definitions Medicare uses to
calculate the wage index.

E.4 Identifying Rural Programs


I use two sources of data to identify the set of rural family medicine programs.

1. The American Academy of Family Physicians has a program directory of all family medicine
programs in the United States. The program directory lists the community setting of the
program as one or more of Urban, Suburban, Rural, Inner-city. Programs for which only
rural was listed as the community setting are considered rural programs by this definition.
The records from this directory were scraped on 01/05/2012. I manually merged the set of
rural programs to AMA data using the name of the program, the hospital and the street
address. In the years 2003-2004 to 2010-2011, this procedure identified 438 program-year
observations as rural programs.

2. The program names in the AMA data often directly indicate whether a program is a rural
program or not. For instance, the University of Wisconsin sponsors several programs in family
medicine, one of which is named "University of Wisconsin (Madison) Program" and the other
named "University of Wisconsin (Baraboo) Rural Program." I consider all programs with
rural in the name during the same period to be rural programs. This procedure
identified a total of 159 program-year observations as rural programs in the years 2003-2004 to
2010-2011, of which a total of 115 program-year observations overlapped with program-year
observations identified as a rural program using the previous procedure.
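The two classification rules above can be sketched as simple predicates. The function names and the input layout (a program name paired with the list of community settings from the directory) are assumptions made for illustration; the actual merge was done by hand on name, hospital and street address.

```python
def rural_by_directory(settings):
    """Rule 1: 'rural' is the only community setting listed in the directory."""
    return {s.strip().lower() for s in settings} == {"rural"}


def rural_by_name(program_name):
    """Rule 2: the program name contains the word 'rural'."""
    return "rural" in program_name.lower()


def compare_rules(programs):
    """Count programs flagged by each rule and by both rules.

    programs: iterable of (program_name, list_of_directory_settings).
    """
    counts = {"rule1": 0, "rule2": 0, "both": 0}
    for name, settings in programs:
        r1 = rural_by_directory(settings)
        r2 = rural_by_name(name)
        counts["rule1"] += r1
        counts["rule2"] += r2
        counts["both"] += r1 and r2
    return counts
```

A program with an empty directory entry fails rule 1 but can still pass rule 2, which is one way the two rules can disagree.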

In 2010-2011, I checked for contradictions where a program with rural in the program name
listed a community setting other than rural in the AAFP directory. There were a total of 5 programs
that were classified as rural according to rule 2 but not rule 1. Of these, in four cases, the program
directory did not have any information other than the name and address of the program. The
community setting for the remaining program was listed as suburban as well as rural.

54
The files and the description of the calculation for the wage index are given at
http://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/AcuteInpatientPPS/wageindex.html and the
Case Mix Index is described at http://en.wikipedia.org/wiki/Case_mix_index

E.5 Resident Birth Location


The birth location of the resident is recorded as city, state and country code. The following steps
were carried out to improve the quality of the data and then to identify whether a resident was
born in a rural location in the United States:

1. I convert the AMA country identifiers, which are not unique across years, to the corresponding
ISO 3166-1 alpha-3 identifier using the country name provided by the AMA. Except for some
former Soviet nations and territories of the UK, US and Netherlands, a unique match was
available.

2. The state and country for observations with only the city name were imputed using the state
and country for an identically spelled city if that state-country combination constituted more
than 50% of the observations for that city. This imputation was carried out using the GME
Census data from 1996-1997 to 2010-2011 in five specialties: internal medicine, pediatrics,
OB/GYN, pathology and family medicine.

3. For US born residents, city-state combinations were geocoded. The observations for which
the geocoder indicated a match with unexpected accuracy (more than, or less than city level
accuracy) were checked by hand and minor spelling errors were corrected. The corrections
were put through the geocoder for a second time. Ambiguous entries were coded as missing
data.

4. The county of birth for US born residents was extracted and matched with a list of counties that
belong to a Metropolitan Statistical Area in order to construct the rural birth indicator.
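The majority-rule imputation in step 2 can be sketched as follows. This is a minimal illustration, assuming that the 50% share is computed among observations for which both state and country are recorded; the text does not spell out the denominator, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def build_city_lookup(observations, threshold=0.5):
    """Map city -> (state, country) when one pair dominates that city.

    observations: iterable of (city, state, country); state and country may
    be None when only the city name was recorded.
    A pair is used only if it accounts for more than `threshold` of the
    complete observations for the identically spelled city (an assumption).
    """
    counts = defaultdict(Counter)
    for city, state, country in observations:
        if state is not None and country is not None:
            counts[city][(state, country)] += 1
    lookup = {}
    for city, counter in counts.items():
        pair, n = counter.most_common(1)[0]
        if n / sum(counter.values()) > threshold:
            lookup[city] = pair
    return lookup
```

Observations with only a city name would then take `lookup.get(city)` as their imputed state and country, and stay missing when no pair clears the threshold.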

E.6 Other Data


E.6.1 CPI-U
I downloaded the records of the monthly Consumer Price Index for All Urban Consumers from the
Federal Reserve Economic Data (FRED) website. I use the December observation of each year as
that year's CPI-U.

E.6.2 Rent
Census Data from the 2000 US Census was downloaded from nhgis.org. I used county level aggregates
from sample file 1 for population, age and race variables, and from sample file 2 for income
and rent variables. The median gross rent is used as the measure of rent as it adjusts for utility
payments.
The 2010 US Census did not use the long form on which data on the rent paid is collected.
Consequently, data on the county level median gross rent was downloaded from the 2006-2010
American Community Survey using Social Explorer. These rent numbers are adjusted to 2010
dollars by Social Explorer. The five-year aggregate was preferred to the annual or three-year
aggregates since the latter did not cover all counties in the US.

To construct the median gross rent variable, I convert the median rent data from the 2000 US
Census into 2010 dollars by using CPI-U. I then linearly interpolate between the 2000 and 2010 rent
data for the interim years.
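The deflation and interpolation step can be illustrated with a short sketch. The function name and the use of a single calendar year as the interpolation argument are assumptions; the text does not say how academic years are mapped to calendar years.

```python
def interpolate_rent(rent_2000_nominal, rent_2010, cpi_2000, cpi_2010, year):
    """Median gross rent in 2010 dollars for a year between 2000 and 2010.

    rent_2000_nominal: 2000 Census median gross rent in 2000 dollars.
    rent_2010: 2006-2010 ACS median gross rent, already in 2010 dollars.
    cpi_2000, cpi_2010: December CPI-U values for the two years.
    """
    # Convert the 2000 figure into 2010 dollars using CPI-U.
    rent_2000_real = rent_2000_nominal * cpi_2010 / cpi_2000
    # Linear interpolation between the two endpoints.
    weight = (year - 2000) / 10
    return (1 - weight) * rent_2000_real + weight * rent_2010
```

For example, with illustrative inputs of a 2000 rent of 500, a 2010 rent of 725 and CPI-U values of 100 and 125, the 2000 rent becomes 625 in 2010 dollars and the 2005 value lies halfway between 625 and 725.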

Merging. The city, state and zip code of the program and institutions were used to geocode the
latitude and longitude of the zip code's centroid. These latitudes and longitudes were then used
to determine the county in which the program or institution is located using county shape files
provided by NHGIS. The geographic ids from this process were used to merge these with the data
files. Every program in the sample was successfully matched in this process.

E.7 Miscellaneous Issues


1. For the preference estimates, imputation of salaries for missing data was done for 23 observations
out of 3441 using a linear regression on the other characteristics included in the
model.

2. The program survey asks for the number of first year positions offered in the next academic
year. I use this as the preferred measure of the program's capacity when available. In ten
instances, this field was not available and for nine of these instances, it was imputed from the
value of the field from the previous year. In the remaining instance, the number of first year
residents in the program was taken to be the number of positions offered. I checked to ensure
that the reported number of positions offered next year is more often equal to the number matched than
the value of the field from the previous year.
I find instances when the number of residents in first year positions exceeds this capacity
measure. In these cases, I take the maximum of the number matched to the program and
the lagged response to the first year enrollment as the program's capacity. In more than 75%
of the cases, the number matched did not exceed the reported number of positions by more
than one. Table E.7 summarizes the number of observations affected by this change and the
mean size of the change. One reason for the discrepancy may be residents who repeated their
first year training or deferred enrollment.
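The capacity rule and the summary reported in Table E.7 can be sketched as below. The function names and the pair layout (reported positions, number matched) are hypothetical; the sketch only mirrors the rule that capacities are adjusted upwards when matches exceed the reported figure.

```python
def program_capacity(reported_positions, matched_residents):
    """Capacity is the reported number of positions, raised to the number
    of matched residents when the latter is larger."""
    return max(reported_positions, matched_residents)


def adjustment_summary(programs):
    """Summarize upward adjustments for one year.

    programs: iterable of (reported_positions, matched_residents).
    Returns (number adjusted, average adjustment, maximum adjustment),
    with the average taken only over programs that were adjusted.
    """
    adjustments = [m - r for r, m in programs if m > r]
    if not adjustments:
        return 0, 0.0, 0
    return len(adjustments), sum(adjustments) / len(adjustments), max(adjustments)
```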

F The Distribution of Physician Starting Salaries
The experience adjustment uses the following Mincerian wage regression to capture the impact on
physician productivity:55
\[ \ln y_i = \beta_0 + \beta_1 t_i + \beta_2 t_i^2 + \beta_c c_i + e_i. \qquad (28) \]
Here, \(y_i\) is the earnings of physician \(i\), \(t_i\) is the experience of physician \(i\), \(c_i\) is a vector of controls
and \(e_i\) is a mean zero error. The functional form is motivated by a multiplicative return to human
capital, which increases with job experience up to a maximum before depreciating.
I use records from the restricted-use file of family practice physicians from the Health Tracking
Physician Survey of 2008 to estimate \(\beta\). The survey collects data on the income category of physicians
in the United States, with medical specialty, years practicing medicine and a variety of other fields
related to their medical practice. The survey asks for the income earned by the physician in 2006
from medically related activities, excluding returns on investments in stocks or assets in their
practice. The income field is coded into groups of $50,000, with the lowest category for physicians
with an income under $100,000 and the highest category for physicians with an income of $300,000
or more. I use an interval regression in which \(e_i \sim N(0, \sigma_e)\) to estimate \((\beta, \sigma_e)\).
Table F.8 presents summaries from the subpopulation of physicians under the age of 60 in 2006,
the year of the income data in the survey. The vast majority of family physicians are salaried and
earn $200,000 or less. Table F.9 presents maximum likelihood estimates from the interval regression
model. The point estimates provide evidence for concavity in returns to experience and a gender pay gap
that is well-documented in the empirical literature. A comparison of estimates in columns (2)
and (3) also suggests some heteroskedasticity in the distribution of pay across experience levels.
Column (4) estimates a quadratic functional form for this heteroskedasticity and finds a concave
relationship, with a higher cross-sectional variation in earnings for physicians in the middle of their
career than for physicians early or late in their career.
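For reference, the likelihood behind an interval regression of this kind can be written out. The notation is an assumption made for this sketch: \(x_i\) collects the regressors of equation (28), \(\beta\) their coefficients, \(\sigma_e\) is treated as the standard deviation of the error, \([L_i, U_i)\) is the income bracket reported by physician \(i\), and \(w_i\) is an optional sampling weight (set to one if the estimates are unweighted).

```latex
\Pr\left(L_i \le y_i < U_i \mid x_i\right)
  = \Phi\!\left(\frac{\ln U_i - x_i'\beta}{\sigma_e}\right)
  - \Phi\!\left(\frac{\ln L_i - x_i'\beta}{\sigma_e}\right),
\qquad
\ell(\beta, \sigma_e)
  = \sum_i w_i \,\ln \Pr\left(L_i \le y_i < U_i \mid x_i\right).
```

For the lowest bracket (income under $100,000) the second term is zero, and for the top bracket ($300,000 or more) the first term is one, so the open-ended categories still contribute to the likelihood.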

55
See the motivating theoretical model in Ben-Porath (1967) and some early empirical work in Mincer (1974).
Lemieux (2006) and Heckman, Lochner, and Todd (2003) survey the literature on Mincer regressions.

G Medicare Reimbursement Rates and Instrument Details
G.1 Description of Medicare Reimbursement Regulations
Medicare Direct Graduate Medical Education (DGME) payments are designed to compensate
teaching hospitals for expenses directly incurred due to the training of residents. The methodology
used to determine these payments was established in the Consolidated Omnibus Budget Reconciliation
Act (COBRA) of 1985, and is implemented as per 42 CFR 413.75 to 413.83. Here,
I provide a broad outline of the method used to determine Medicare DGME payments and the
PCPRA variable used in the analysis.
Roughly, the total DGME reimbursement to a hospital is the product of the hospital-specific
per resident amount (PRA), the weighted number of full-time equivalent residents (FTE) and
Medicare's share of total inpatient days. The PRA is determined using the total costs of salaries
and fringe benefits of residents, faculty and administrative staff of the residency program and
allocated institutional overhead costs, divided by the total number of full-time equivalent residents
in a base year, usually 1984 or 1985. Hospitals that began sponsoring residency training after 1985
were grandfathered into the program using their first year of reported costs as the base year. After
1997, a new hospital's per resident amount was based on the reported costs of other programs in
the geographic area, which is an MSA/NECMA, rest of state or a census division depending on
the number of other providers sponsoring GME. The Balanced Budget Act of 1997 also introduced
certain ceilings and floors on the per resident amount. See Gentile Jr. and Buckley (2009) for a
more comprehensive legislative history of Medicare reimbursement of Graduate Medical Education.
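The rough payment rule in this paragraph amounts to two short calculations, sketched below. The function names and the numbers in the usage note are illustrative assumptions; the sketch ignores inflation updates, floors and ceilings.

```python
def per_resident_amount(base_year_costs, base_year_fte):
    """PRA: base-year direct training costs (salaries, fringe benefits and
    allocated overhead) divided by base-year full-time equivalent residents."""
    return base_year_costs / base_year_fte


def dgme_payment(pra, weighted_fte, medicare_share):
    """Rough DGME payment: PRA x weighted FTE count x Medicare's share of
    total inpatient days."""
    return pra * weighted_fte * medicare_share
```

As an example with made-up inputs, base-year costs of $4,000,000 for 50 FTE residents give a PRA of $80,000; with 45.5 weighted FTEs and a Medicare inpatient share of 35%, the implied payment is about $1,274,000.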
Between 1985 and 2000, the PRA for a hospital was revised by adjusting for the 12-month
change in CPI-U, and minor changes on previously misallocated costs. An exception was made in
1993 and 1994 when two separate PRAs were effectively created, one for primary care and obstetrics
and gynecology residents and the other for all other residents. In these two years, the non-primary
care PRA was not adjusted for inflation.
Subsequent to 2000, the per resident amounts were also adjusted using the change in CPI-U
but were subject to a floor and ceiling put in place by the Balanced Budget Act of 1997.
The floor increased the PRAs of hospitals that were below 70% of the (locally-adjusted) national
average per-resident amount to 70% of that average and later to 85%. The ceiling gradually decreased
the PRAs of hospitals that were above 140% of the (locally-adjusted) national average per-resident
amount until the PRA of a hospital fell below the ceiling. The exact procedure used to make these
adjustments is detailed in 42 CFR 413.77. The Balanced Budget Act of 1997 also created new
regulations on the manner in which the number of full-time equivalent residents was determined.
These regulations are detailed in 42 CFR 413.86.

G.2 The Instrument: Competitor Reimbursement Rates


To construct competitor reimbursements, I first extract the records from the fields "Updated per
resident amount for OB/GYN and primary care" and "Number of FTE residents for OB/GYN
and primary care" on lines 2 and 1 respectively in form CMS-2552-96, Worksheet E-3, Part-IV for
cost reporting periods beginning between October 1, 1996 and September 30, 1997. As per the
instructions for this form (3633.4), this is the latest period for which the response to the field was
required of the hospitals. Indeed, I found only five observations for this field in the cost reporting
period ending October 1, 1998 and no observations in the next period. The per resident amount
variable is recorded in cents, and so is first converted into dollars. Both fields were winsorized
at the bottom and top 1 percent since the range of values was extreme. Barring the effects of
winsorizing the data, the distribution of the per resident amount variable is similar to Figure G.1,
taken from Newhouse and Wilensky (2001). While some institutions have per resident amounts
less than $40,000, others are reimbursed at rates higher than $200,000.
The Competitor Reimbursement variable for an institution is constructed in order to mimic the
per resident amount calculation done by Medicare for new sponsors. As given in equation (7), the
(weighted) Competitor Reimbursement variable for a program is the average (weighted by FTE) of
all primary care per resident amounts in the primary institution's geographic area (MSA/NECMA
or the rest of the state) other than that of the primary institution. When this average is constructed
from fewer than three observations, the census division is used. This variable is then merged to the
primary institution of a program as defined earlier.
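The construction just described is a leave-one-out, FTE-weighted average with a geographic fallback. The sketch below illustrates it under assumptions: the dictionary keys (`id`, `area`, `division`, `pra`, `fte`) and function names are hypothetical, and the fallback pools all other hospitals in the census division.

```python
def weighted_mean(pairs):
    """FTE-weighted mean of (pra, fte) pairs; None if there is no weight."""
    total_fte = sum(fte for _, fte in pairs)
    if total_fte == 0:
        return None
    return sum(pra * fte for pra, fte in pairs) / total_fte


def competitor_reimbursement(own_id, hospitals, min_obs=3):
    """Average primary care PRA of other hospitals in the same area.

    hospitals: list of dicts with keys id, area, division, pra, fte.
    Uses the local area (MSA/NECMA or rest of state) when at least
    `min_obs` other hospitals are observed there, else the census division.
    """
    own = next(h for h in hospitals if h["id"] == own_id)
    others = [h for h in hospitals if h["id"] != own_id]
    local = [(h["pra"], h["fte"]) for h in others if h["area"] == own["area"]]
    if len(local) >= min_obs:
        return weighted_mean(local)
    division = [
        (h["pra"], h["fte"]) for h in others if h["division"] == own["division"]
    ]
    return weighted_mean(division)
```

Excluding the hospital's own PRA is what makes the variable a measure of competitors' reimbursement rather than the hospital's own rate.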
Figure G.4 depicts the state-averaged variation in the instrument that is not explained by the
controls included in the preference estimates and a program's own reimbursement rate. A degree
of spatial correlation within a census division is noticeable due to the definition of the geographical
units used. Table G.10 presents regressions of the instrumental variable on characteristics included
in the preference estimation, as well as location characteristics such as median age, median household
income, crime rates, total population and college share. These location characteristics, together
with program characteristics, explain only 27% of the variation in the instrument. Strictly speaking,
a test for exogeneity with respect to the additional location characteristics would be rejected at the
1% level. However, the location characteristics together explain only about 6% of the variation not
explained by the other controls that are included in the preference estimates. Columns (4-6) show
that characteristics of the program itself explain about 35% of the variation in its reimbursement
rates and the addition of location characteristics is not important. These findings are consistent
with Anderson (1996), which argues against this reimbursement scheme on the basis that other
cost predictors do not correlate very strongly with per resident amounts. Strictly speaking, these
findings do not fully support strict exogeneity of the instrument.

Table B.1: Detailed Preference Estimates

                                      w/o Wage Instruments             w/ Wage Instruments
                                      Full Het.  Geo. Het.  No Het.    Full Het.  Geo. Het.  No Het.
                                      (1)        (2)        (3)        (4)        (5)        (6)
Panel A: Preference for Programs
First Year Salary ($10,000)           2.3099     4.5888     0.6180     0.4983     1.9531     -1.1157
                                      (0.3205)   (0.4500)   (0.0593)   (0.3174)   (0.3533)   (0.1338)
Log Beds (Primary Inst)               2.5652     2.6058     -0.4044    1.6392     2.7780     -0.2000
                                      (0.3371)   (0.2213)   (0.0512)   (0.2656)   (0.2399)   (0.0534)
Log NIH Fund (Major)                  0.0876     2.3046     0.3729     -0.0474    0.6645     0.5228
                                      (0.1284)   (0.1646)   (0.0257)   (0.1350)   (0.0735)   (0.0343)
Log NIH Fund (Minor)                  1.0351     2.2898     0.4160     1.3589     1.3357     0.5428
                                      (0.1272)   (0.1410)   (0.0274)   (0.1461)   (0.1447)   (0.0315)
Medicare Case Mix Index               4.9815     4.7917     2.4396     7.9283     5.3517     3.1541
                                      (0.6724)   (0.5733)   (0.1409)   (0.9053)   (0.5163)   (0.1961)
Medicare Wage Index                   -5.5213    1.9601     -0.2240    -5.1235    1.4322     -1.1891
                                      (1.0418)   (0.5107)   (0.1385)   (0.9917)   (0.3742)   (0.1456)
Annual Median Rent ($10,000)          5.9901     -0.5741    1.8420     7.1745     6.1311     3.0188
                                      (0.8155)   (0.3137)   (0.1371)   (0.7448)   (0.6117)   (0.1946)
Rural Program                         1.6925     2.5747     0.2365     1.2727     3.3816     0.7187
                                      (0.3457)   (0.3540)   (0.0804)   (0.3573)   (0.4332)   (0.0952)
University Based Program              3.6464     5.0845     0.7694     3.6610     4.9082     1.0441
                                      (0.4098)   (0.5451)   (0.1022)   (0.4372)   (0.5636)   (0.1067)
Community/University Program          -1.1552    -1.0174    -0.3486    -1.7033    -1.4662    -0.5667
                                      (0.1969)   (0.1645)   (0.0480)   (0.2180)   (0.2114)   (0.0631)
Reimbursement Rate                                                     -0.0966    0.2569     0.1138
                                                                       (0.0466)   (0.0433)   (0.0142)
Control Variable                                                       2.4889     8.7394     2.1200
                                                                       (0.5335)   (0.7762)   (0.1571)
Rural Program x Rural Born Resident   0.2746     0.0500                0.2484     0.0455
                                      (0.0476)   (0.0113)              (0.0506)   (0.0093)
Program in Medical School State       2.2682     1.0563                2.2592     0.8846
                                      (0.1869)   (0.0747)              (0.1950)   (0.0555)
Program in Birth State                1.4650     0.6057                1.4643     0.4787
                                      (0.1250)   (0.0443)              (0.1269)   (0.0296)
Sigma Log NIH Fund (Major)            0.9814                           1.1229
                                      (0.1833)                         (0.1928)
Sigma Log Beds                        4.1294                           3.8453
                                      (0.5608)                         (0.5114)
Sigma Medicare Case Mix               4.6807                           3.2150
                                      (0.9656)                         (0.9127)

Table B.1: Detailed Preference Estimates (contd)
w/o Wage Instruments w/ Wage Instruments
Full Het. Geo. Het. No Het. Full Het. Geo. Het. No Het.
(1) (2) (3) (4) (5) (6)
Panel B: Human Capital
Log NIH Fund (MD) 0.1153 0.1269 0.1468 0.1191 0.0941 0.1429
(0.0164) (0.0139) (0.0116) (0.0156) (0.0131) (0.0129)
Median MCAT (MD) 0.0814 0.0666 0.0697 0.0797 0.0413 0.0718
(0.0070) (0.0038) (0.0027) (0.0056) (0.0030) (0.0030)
US Born (Foreign Grad) 0.1503 -0.2470 0.4651 0.2083 0.2927 0.5964
(0.1021) (0.0801) (0.0458) (0.0989) (0.0705) (0.0486)
Sigma (DO) 0.8845 0.7944 0.7454 0.9321 0.7275 0.8168
(0.0359) (0.0285) (0.0319) (0.0370) (0.0292) (0.0399)
Sigma (Foreign) 3.6190 3.0709 1.2850 3.5549 2.8215 1.5483
(0.1469) (0.1102) (0.0550) (0.1411) (0.1131) (0.0756)
Medical School Type Dummies Y Y Y Y Y Y

Moments 106 106 106 118 118 118


Parameters 25 22 19 27 24 21
Objective Function 951.31 1122.78 6136.30 1032.24 1090.10 6191.08

Notes: See Table 8 for Panel A estimates monetized in dollar units. Indicator for zero NIH funding of major
associates and for minor associates. In uninstrumented specifications, the variance of the vertical unobservable jt
is normalized to 1 and in instrumented specifications, the variance of jt is normalized to 1. In all specifications,
the variance of unobservable determinants of the human capital index of MD graduates is normalized to 1. All
specifications normalize the mean utility from a program with zeros on all characteristics to 0. All specifications
normalize the mean human capital index of residents with zeros for all characteristics to 0. Point estimates using
1000 simulation draws. Standard errors in parentheses. Optimization and estimation details described in an
appendix.

Table B.2: Out-of-Sample Fit: Regressions
MD Degree Foreign Degree
(1) (2)
Data Simulated (s.e.) Data Simulated (s.e.)
First Year Salary ($10,000) 0.129 0.110 (0.036) -0.178 -0.094 (0.038)
Median Annual Rent 0.261 0.359 (0.074) -0.328 -0.355 (0.076)
Log # Beds -0.017 0.084 (0.021) 0.009 -0.083 (0.022)
Log NIH Fund (Major) 0.050 0.047 (0.012) -0.042 -0.051 (0.013)
Log NIH Fund (Minor) 0.046 0.022 (0.017) -0.051 -0.022 (0.017)
Rural Program -0.019 0.128 (0.042) -0.004 -0.110 (0.044)
Case Mix Index 0.238 0.211 (0.056) -0.220 -0.205 (0.058)
Medicare Wage Index -0.233 -0.365 (0.116) 0.257 0.387 (0.124)
Log NIH Fund (MD) Median MCAT Score
(3) (4)
Data Simulated (s.e.) Data Simulated (s.e.)
First Year Salary ($10,000) 0.135 0.123 (0.096) 0.512 0.484 (0.196)
Median Annual Rent -0.438 0.206 (0.224) 0.065 0.849 (0.421)
Log # Beds -0.067 0.084 (0.065) 0.130 0.180 (0.128)
Log NIH Fund (Major) 0.397 0.143 (0.040) 0.518 0.172 (0.074)
Log NIH Fund (Minor) 0.097 0.198 (0.042) 0.137 0.147 (0.085)
Rural Program -0.172 0.225 (0.122) -0.224 0.065 (0.242)
Case Mix Index 0.237 0.458 (0.179) -0.218 0.533 (0.340)
Medicare Wage Index 1.225 0.309 (0.342) 3.060 1.145 (0.678)

Notes: Linear regressions using 2011-2012 data. Each simulation draws a parameter from the estimated asymptotic
distribution of specification (1), and unobservables independently. The vector of coefficients is computed for each
draw. The table reports the mean estimate and bootstrapped standard error of simulated estimates in parentheses.

Table D.3: Preferences for Rural Born Doctors

                      Log NIH Funding (MD)                                             Allopathic/MD Degree
                      Rural Pgms.  Urban Pgms.  All          All          All          Rural Pgms.  Urban Pgms.  All
                      (1)          (2)          (3)          (4)          (5)          (6)          (7)          (8)
Rural Born Resident   -0.0582      0.0015       -0.1284***                             0.0122       0.0176       0.0263**
                      (0.0811)     (0.0364)     (0.0339)                               (0.0324)     (0.0119)     (0.0107)
Female                                                       -0.0153      0.0681***
                                                             (0.0234)     (0.0255)
Program Fixed Effect  Y            Y                         Y                         Y            Y
Observations          750          7,885        8,635        9,599        9,599        1,200        11,260       12,460
R-squared             0.2916       0.2461       0.0017       0.2535       0.0007       0.2568       0.2662       0.0005

Notes: Linear regression of a resident's graduating medical school characteristic on other resident characteristics. Column header Rural (Urban, All) indicates regressions
using residents matched to rural (urban, all) programs. Samples pooled from the academic years 2003-2004 to 2010-2011. Columns (1-5) restrict to residents
graduating from medical schools with non-zero average annual NIH funding. Columns (1-3) and (6-8) restrict to the subset of residents who have reliable city of birth
information and were born in the United States. Standard errors clustered at the program level in parentheses. Significance at 90% (*), 95% (**) and 99% (***)
confidence.
Table D.4: General vs. Partial Equilibrium Effects of Price Incentives

Full Heterogeneity
w/o Wage Instruments
Subsidy Size $5,000 $10,000 $20,000
(1) (2) (3)
Panel A: Rural Programs
Total Capacity 334
Observed # Matches 310
Baseline Simulated Matches 313.33
Baseline Prob Rural Match > Urban Match 52.76%

# Matches (General Equilibrium) 10.23 17.3 20.63


Prob Rural Match > Urban Match (GE) 9.38% 17.70% 31.28%
# Matches (Partial Equilibrium) 10.31 17.59 20.63
Prob Rural Match > Urban Match (PE) 10.22% 19.56% 34.22%

Panel B: Medically Underserved States and Rural Programs (MUA)


Total Capacity 751
Observed # Matches 720
Baseline Simulated Matches 721.79
Baseline Prob MUA Match > Other Matches 53.53%

# Matches (General Equilibrium) 14.72 24.7 29.17


Prob MUA Match > Other Matches (GE) 8.73% 16.82% 29.93%
# Matches (Partial Equilibrium) 16.46 25.88 29.17
Prob MUA Match > Other Matches (PE) 9.31% 18.25% 32.70%

Panel C: 1 in 4 Randomly Chosen Programs


# Matches (General Equilibrium) 21.54 32.23 38.74
# Matches (Partial Equilibrium) 25.45 34.04 39.05
Prob PE Match > GE Match 52.59% 56.43% 67.58%

Notes: Medically underserved states are in the bottom quartile of physician to population ratios or in the top 10 in
total area designated as a Health Professional Shortage Area (HPSA). All simulations use the 2010-2011 sample with
3,148 residents and 3,297 total positions. Baseline and counterfactual simulations use 100 draws of
structural unobservables. Inter-quartile range in parentheses. Prob. X > Y is the Wilcoxon statistic: the probability
that the human capital of a resident drawn from population X is greater than that of a resident drawn from
population Y.
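The Prob. X > Y statistic in the notes can be computed directly from two samples of human capital values. The sketch below is an illustration only: the function name is hypothetical, and counting ties as one half is an assumption that the notes do not state.

```python
def prob_greater(xs, ys):
    """Share of (x, y) pairs with x > y, counting ties as one half.

    xs, ys: non-empty sequences of human capital values for the two groups.
    """
    wins = sum((x > y) + 0.5 * (x == y) for x in xs for y in ys)
    return wins / (len(xs) * len(ys))
```

A value of 0.5 means neither group tends to have higher human capital; values above 0.5 favor the first group.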

Table D.5: Recruitment Into Rural Practice

                                        Urban Born Resident               Rural Born Resident
                                        Urban Program    Rural Program    Urban Program    Rural Program
Percent Practicing in a Rural County    19.52%           50.45%           46.35%           79.19%

Notes: Means of location outcomes for US born residents entering a non-academic practice and with good data on
birth city and practice city. Post-graduation plans from graduating resident survey administered to residency
program directors in the National GME Census Track. The headers Urban (Rural) Program indicate whether a
resident graduated from an urban (rural) program. Results from 5878 resident observations and 2027 program-year
observations.

Table E.6: Sample Construction

Year 2003-2004 2004-2005 2005-2006 2006-2007 2007-2008 2008-2009 2009-2010 2010-2011


Panel A: Programs
Total number of ACGME Programs 475 462 463 460 457 453 451 451
Excluding programs in Puerto Rico 469 458 459 456 453 449 448 448
Excluding military programs 455 444 445 442 440 436 432 434
Excluding programs that do not participate in the NRMP 446 438 443 438 432 432 427 429
Excluding programs that are not offering positions 445 438 441 438 431 432 427 429
Excluding programs with no matches in the sample period 425 433 439 436 427 430 423 428

Panel B: Residents
Total number of Residents in ACGME programs 3118 3066 3166 3148 3095 3154 3133 3268
Excluding residents matched with Puerto Rico programs 3097 3048 3154 3140 3085 3143 3126 3254
Excluding residents matched with military programs 2995 2945 3041 3026 2996 3051 3009 3160
Excluding residents matched with NRMP non-participants 2976 2925 3035 3021 2974 3040 2996 3148
Table E.7: Capacity Adjustments

Year         Number of program     Average adjustment   Maximum adjustment
             capacities adjusted
2003-2004    51                    1.25                 3
2004-2005    53                    1.32                 5
2005-2006    72                    1.32                 4
2006-2007    57                    1.14                 2
2007-2008    74                    1.35                 5
2008-2009    67                    1.40                 4
2009-2010    65                    1.35                 5
2010-2011    71                    1.54                 6
Notes: Capacities are adjusted upwards only. Average adjustment is reported conditional on adjustment.

Table F.8: Characteristics of Family Medicine Doctors in the US

Mean Std. Dev

Observations 698

Income less than $100K 16.64%


Income between $100K to $150K 35.43%
Income between $150K to $200K 27.76%
Income between $200K to $250K 9.95%
Income between $250K to $300K 6.36%
Income more than $300K 3.86%
Income Type: Hourly 4.48%
Income Type: Salary 71.73%
Income Type: Profits from Practice 23.79%

Hours Last Week 50.19 13.75


Weeks Worked 47.48 4.63
Full Time 87.95%

Experience 13.69 8.35


Foreign Medical Graduate 15.17%

Female 30.83%

Practice Type: Solo/Two Physician 31.82%


Practice Type: Group 46.27%
Practice Type: Other 21.91%

Large Metropolitan Area 46.89%


Small Metropolitan Area 32.44%
Non-Metropolitan Area 20.67%

Notes: Sample of Family Practice Physicians in the Health Tracking Physician Survey of 2006 with non-missing
income, starting medical practice in or before 2006. Income from medically related activities in 2006. Hours
reported for medically-related activities. Income excludes returns from investments in financial and medical
capital. Experience defined as number of years since beginning medical practice. Full-time defined as more than 35
hours spent on medical activities and more than 40 weeks worked in 2006. Large Metropolitan Area has more than
1 million residents.

Table F.9: Income of Family Medicine Doctors

Dependent Variable Log Income from Practice


(1) (2) (3) (4) (5) (6)
Panel A: Interval Regression Estimates
Experience 0.0144* 0.0124 0.0819** 0.0117 0.0147* 0.0126
(0.0070) (0.0070) (0.0256) (0.0073) (0.0069) (0.0069)
Experience-squared -0.0003 -0.0004 -0.0063** -0.0004 -0.0005 -0.0004
(0.0002) (0.0002) (0.0024) (0.0002) (0.0002) (0.0002)
Female -0.2617*** -0.2621*** -0.2121*** -0.2581*** -0.2759***
(0.0352) (0.0421) (0.0372) (0.0346) (0.0347)
Foreign Medical Graduate -0.0446
(0.0441)
Practice Type: Solo/Two Physician -0.1392***
(0.0365)
Practice Type: Other 0.0062
(0.0345)
Small Metropolitan Area 0.0544
(0.0347)
Non-Metropolitan Area 0.0647
(0.0398)
Constant 11.7822*** 11.9014*** 11.7658*** 11.9388*** 11.8955*** 11.9228***
(0.0413) (0.0449) (0.0596) (0.0465) (0.0433) (0.0500)
Heteroskedasticity by experience Y
Sample Young Full-time

Observations 698 698 295 616 698 698


Total Sample Weight 60620 60620 25612 53318 60620 60620
Panel B: Estimated Distribution Statistics at Zero Experience
Mean 153660.74 148524.83 127612.97 157920.84 144746.55 146895.78
Std. Dev. 68769.35 63368.95 47622.36 65432.74 50911.87 61416.25

Notes: Interval regressions with normally distributed error. Baseline sample and characteristics as defined in Table
F.8. Column (3) restricts to physicians with less than 10 years of experience. Column (4) estimates sigma as a
quadratic function of experience. Earnings statistics in 2010 dollars, calculated at zero experience, mean gender and
foreign graduate fractions observed in the resident population, and means of practice location and type
characteristics (only for column (6)). Significance at 90% (*), 95% (**) and 99% (***) confidence.

Table G.10: Medicare Reimbursement Rates on Characteristics

Dependent Variable                       Log Competitor Reimbursements       Log Reimbursements
                                         (1)         (2)         (3)         (4)         (5)         (6)
Log Rent                                 -0.0057     -0.0282     -0.1746**   -0.0004     -0.2023     -0.1330
                                         (0.0632)    (0.0737)    (0.0879)    (0.1219)    (0.1579)    (0.1973)
Log Wage Index                           0.3924***   0.3425***   0.0937      0.7497***   0.6509***   0.4406
                                         (0.1036)    (0.1038)    (0.1042)    (0.1977)    (0.2042)    (0.2679)
Log Reimbursement                        0.1701***   0.1538***   0.1410***
                                         (0.0227)    (0.0221)    (0.0232)
Location Characteristics                             Y           Y                       Y           Y
Small Cities (<3 million in Population)                          Y                                   Y
Observations                             3,441       3,441       2,407       3,441       3,441       2,407
R-squared                                0.2335      0.2934      0.2719      0.3528      0.3731      0.3550

Notes: Linear regressions. Location characteristics include median age (county), log median household income (county), log total population (MSA/county),
violent crime and property crime rates from FBI's Crime Statistics/UCR (25 mi radius weighted by 1/distance), dummies for no data in that radius and log
college share (MSA/rest of state). All columns include a constant term, log # beds, log NIH Fund (Major), log NIH Fund (Minor), Log Case Mix Index,
Program Type Dummies, Rural Program Dummy and dummies for programs with no NIH funding at major affiliates, for no NIH funding at minor affiliates,
and a dummy for missing Medicare ID at program institutions. The Competitor Reimbursement is a weighted average of the Medicare primary care per resident
amounts of institutions in the geographic area of a program other than the primary institutional affiliate of the program. Geographic area defined as in Medicare
DGME payments: MSA/NECMA or Rest of State unless fewer than 3 other observations constitute the area, in which case the census division is used. See data
appendix for description of variables and details on the construction of the reimbursement variables. For columns (4-6), a program's reimbursement rate is
truncated below at $5,000 and a dummy for these 46 truncated observations is estimated as well. Standard errors clustered at the program level in parentheses.
Significance at 90% (*), 95% (**) and 99% (***) confidence.
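
The construction of the Competitor Reimbursement described in the notes can be sketched roughly as follows. This is an illustrative sketch only: the record layout (`id`, `area`, `division`, `amount`, `weight`), the choice of weights, and the toy data are assumptions, since the notes do not specify them here; the data appendix describes the actual construction.

```python
def weighted_mean(rows):
    """Weighted average of per resident amounts."""
    total_weight = sum(r["weight"] for r in rows)
    return sum(r["weight"] * r["amount"] for r in rows) / total_weight

def competitor_reimbursement(primary_id, institutions, min_others=3):
    """Leave-one-out weighted average over the program's geographic area.

    Falls back to the census division when fewer than `min_others`
    other institutions are observed in the area.
    """
    own = next(r for r in institutions if r["id"] == primary_id)
    others = [r for r in institutions
              if r["id"] != primary_id and r["area"] == own["area"]]
    if len(others) < min_others:
        others = [r for r in institutions
                  if r["id"] != primary_id and r["division"] == own["division"]]
    return weighted_mean(others)

# Hypothetical institutions (amounts in arbitrary units).
institutions = [
    {"id": "A", "area": "MSA1", "division": "D1", "amount": 100, "weight": 1},
    {"id": "B", "area": "MSA1", "division": "D1", "amount": 80, "weight": 2},
    {"id": "C", "area": "MSA1", "division": "D1", "amount": 120, "weight": 1},
    {"id": "D", "area": "MSA1", "division": "D1", "amount": 60, "weight": 1},
    {"id": "E", "area": "MSA2", "division": "D1", "amount": 200, "weight": 1},
]

area_based = competitor_reimbursement("A", institutions)      # 3 others in MSA1
division_based = competitor_reimbursement("E", institutions)  # falls back to D1
```

Excluding the program's own primary affiliate is what makes the measure a "competitor" rate: variation in it comes only from other institutions in the same area.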
Figure G.1: Distribution of Per Resident Amounts

Notes: Secondary source from Newhouse and Wilensky (2001). A similar distribution can be roughly reproduced
using the Medicare cost report data used in this study.

Figure G.2: Relationship Between Wages and Competitor Reimbursements

Notes: Sample restricted to academic year 2010-2011. To construct the residualized scatter plot, I first regressed the
X-axis and Y-axis variables on County Median Rent (Gross), Rural Program, Medicare Wage Index, Log NIH Fund
(Major), Log NIH Fund (Minor), Log # Beds, Medicare Case-Mix Index and dummies for No NIH Fund (Major),
No NIH Fund (Minor), missing Medicare ID. The X-axis and Y-axis residuals estimated from these regressions are
plotted against each other.
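
The residualization step described in these notes can be sketched as follows. This is a minimal illustration that uses a single control (log rent) rather than the full list above, and the variable names and values are made up. Each axis variable is regressed on a constant and the controls by ordinary least squares, and the residuals are kept for plotting.

```python
def ols_residuals(y, controls):
    """Residuals from regressing y on a constant and the control columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in controls] for i in range(n)]
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented k x (k+1) matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for p in range(k):
        pivot = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[pivot] = A[pivot], A[p]
        for r in range(k):
            if r != p:
                f = A[r][p] / A[p][p]
                A[r] = [a - f * b for a, b in zip(A[r], A[p])]
    beta = [A[r][k] / A[r][r] for r in range(k)]
    return [y[i] - sum(b * x for b, x in zip(beta, X[i])) for i in range(n)]

# Hypothetical data: one control, six programs.
log_rent = [6.5, 6.8, 7.0, 7.2, 6.9, 7.4]
log_wage = [10.70, 10.74, 10.78, 10.80, 10.75, 10.83]
log_competitor_reimb = [11.2, 11.5, 11.3, 11.9, 11.6, 11.8]

wage_resid = ols_residuals(log_wage, [log_rent])               # Y-axis
reimb_resid = ols_residuals(log_competitor_reimb, [log_rent])  # X-axis
```

By the Frisch-Waugh-Lovell theorem, the slope of a line fitted through the residualized points equals the coefficient on the X-axis variable in a regression of the Y-axis variable on that variable and the same controls.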

Figure G.3: Heteroskedasticity in First Stage Residuals

Notes: To construct the fitted salaries, I regressed the First Year Salary on Competitor Reimbursements, County
Median Rent (Gross), Rural Program, Medicare Wage Index, Log NIH Fund (Major), Log NIH Fund (Minor), Log
# Beds, Medicare Case-Mix Index and dummies for No NIH Fund (Major), No NIH Fund (Minor), missing
Medicare ID. The regression was estimated on the full sample from the academic years 2002-2003 to 2010-2011. The
scatter plot shows the salaries and fitted values from the academic year 2010-2011 alone. The Competitor
Reimbursement is a weighted average of the Medicare primary care per resident amounts of institutions in the
geographic area of a program other than the primary institutional affiliate of the program. Geographic area defined
as in Medicare DGME payments: MSA/NECMA unless fewer than 3 other observations constitute the area, in which
case the census division is used. See data appendix for description of variables and details on the construction of
the reimbursement variables.

Figure G.4: Geographic Distribution of Competitor Reimbursements

Notes: Average residuals of the Competitor Medicare Reimbursements by state. Colors are categorized into 10 equally
sized quantiles, with darker colors indicating higher values. Program sample restricted to academic year 2010-2011. To
construct the average residuals by state, I first regressed Competitor Medicare Reimbursements on County Median
Rent (Gross), Rural Program, Medicare Wage Index, Log NIH Fund (Major), Log NIH Fund (Minor), Log # Beds,
Medicare Case-Mix Index and dummies for No NIH Fund (Major), No NIH Fund (Minor), missing Medicare ID.
The residuals estimated from these regressions were averaged by the state in which a program is located. The Competitor
Reimbursement is a weighted average of the Medicare primary care per resident amounts of institutions in the
geographic area of a program other than the primary institutional affiliate of the program. Geographic area defined
as in Medicare DGME payments: MSA/NECMA unless fewer than 3 other observations constitute the area, in which
case the census division is used. See data appendix for description of variables and details on the construction of
the reimbursement variables.
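
The aggregation behind the map can be sketched as follows, taking program-level residuals as given. This is an illustrative sketch: the function names and the rank-based binning rule are assumptions, and ties or states without programs are not handled.

```python
from collections import defaultdict

def state_means(residuals, states):
    """Average program-level residuals within each state."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for resid, state in zip(residuals, states):
        sums[state] += resid
        counts[state] += 1
    return {state: sums[state] / counts[state] for state in sums}

def quantile_bins(values_by_key, n_bins=10):
    """Assign each key to one of n_bins equally sized bins by rank.

    Bin 0 holds the lowest values and bin n_bins - 1 the highest,
    so a darker color would correspond to a higher bin index.
    """
    ranked = sorted(values_by_key, key=values_by_key.get)
    n = len(ranked)
    return {key: (rank * n_bins) // n for rank, key in enumerate(ranked)}
```

For example, `quantile_bins(state_means(residuals, states))` would return a bin index from 0 to 9 for each state, which can then be mapped to a color scale.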
