Modern Microeconomic Analysis
for Business Strategy
(a Provisional and Incomplete Text)
Spring 2010
Jim Dewey
University of Florida
Sam Selikoff
University of Florida and Gator Tutoring
© 2010 James F. Dewey
Preface
Like top colleges and universities across the nation, UF has become very
selective in admissions. The SAT math scores of incoming UF freshmen are on par
with those at the universities with the nation’s top 20 undergraduate business
programs. This increase in selectivity at top schools has been accompanied by a
decrease in selectivity at other schools, where most students attend and where most
books are sold. As a result, managerial economics texts have deemphasized
mathematical rigor at just the point in time when it is of most value for UF students.
At the same time, textbook prices have soared. The Florida legislature has
required explicit justification for the use of any expensive new textbook, and UF has
encouraged faculty to provide low cost materials. This mirrors a national movement
among the best universities to make online course materials available free – two
examples are the extensive Open Courseware available from MIT and Preston
McAfee’s (Cal Tech) text Introduction to Economic Analysis. Provision of high quality
free online course materials is becoming a hallmark of top universities.
This draft textbook addresses the need for a text that is appropriate in its level of
academic rigor and topic selection for UF students taking Managerial Economics and
is also available at low cost to my students. The text is available free on the course
website and a printed version is available at Target Copy for approximately $20.
The text started as a very detailed set of notes based on my lectures, created by
Sam Selikoff, a former teaching assistant. I have gone through them adding,
clarifying, editing, and shaping them into chapters, with Sam’s help. Further, they
have been proofread and edited twice by another teaching assistant, Michael
Canencia. In addition, all errors noted by students during the Fall 2009 semester
have been corrected in the current version. However, some chapters have received
much more attention and revision than others at this stage!
Writing a textbook is a long and difficult task ‐ this is very much a work in
progress. The text is provisional and incomplete, and I have no doubt many errors
remain ‐ you must not rely solely on it in my class. Anything covered in lecture is fair
game for exams, even if the corresponding material in the text exhibits errors or is
incomplete. Even with these imperfections, most students said they found it very
useful last semester – very much more useful than students have found alternative
textbooks. So, even with its flaws, I think it will be quite helpful in your studies, as
long as you remember it is an imperfect work in progress and therefore…
USE WITH CAUTION!
Jim Dewey
1/1/2010
Contents
Part 1: Analytical Approach and Tools
Chapter 1: Introduction
Appendix to Chapter 1: Math Used in Managerial Economics
Chapter 2: Cost, Demand, and Profit Maximization
Chapter 3: Applications and Extensions of Optimal Production and Pricing
Part 2: Empirical Approximations and Econometrics
Chapter 4: Estimating and Interpreting Approximations
Chapter 5: Evaluating Regression Analyses
Chapter 6: Omitted Variables Bias
Part 3: A Closer Look at Some of the Tools
Chapter 7: Individual Choice
Chapter 8: Applications and Extensions of Consumer Theory
Chapter 9: Non‐Linear Pricing
Chapter 10: Uncertainty with Risk Aversion
Chapter 11: More on Production and Cost
Part 4: Game Theory – Modeling Strategic Interaction
Chapter 12: One Shot Games with Discrete Strategies
Chapter 13: One Shot Games with Continuous Strategies
Chapter 14: Repeated Games
Part 5: Product Market Structure, Strategy, and Analysis
Chapter 15: Homogenous Product Markets
Chapter 16: Differentiated Product Markets
Chapter 17: Perfect Competition
Chapter 18: Applications of Supply and Demand Analysis
Chapter 19: Market Structure Wrap Up
Part 6: Firm Structure
Chapter 20: Input Procurement and Contracting
Chapter 21: The Firm
Part 1
Analytical Approach and Tools
Chapter 1
Introduction
Aim of the Course
Microeconomics is the study of how individuals and organizations allocate
scarce resources to achieve their ends. That includes the nature of the interactions
between those individuals and organizations, especially in markets. All
microeconomic analysis texts share much in common, since the core tools of
microeconomic analysis are relatively few. However, this book focuses on applying
those tools systematically to problems faced by firms and their managers.
Modern microeconomics offers an analytical approach that can help firms and
their managers efficiently organize their thoughts when faced with business
decisions. Further, applied microeconomic analysis underpins much of the
information upon which business students will base professional decisions
throughout their careers. Some examples include: 1) research referenced in
newspapers, magazines, and trade journal articles about the market in which their
firm operates, 2) reports conducted for firms by in‐house analysis groups, 3)
research conducted for firms by consultants, and 4) studies conducted by
government agencies pursuant to regulatory or other proceedings. Some of these
analyses will be quite good. Others will be quite bad. Relying on bad analyses, or
using good ones incorrectly, leads to wasteful, and potentially disastrous, decisions.
Managers need to understand the tools of microeconomic analysis well enough to be
able to spot bad information and to use the good information appropriately.
The book first builds an analytical toolbox and then applies it in a rigorous
manner to individual topics. While the individual topics are of some interest in their
own right, the overarching goal is to show commonalities in the analytical process
through repeated application. It is impossible to learn to undertake advanced
analyses from a single book or a single undergraduate class. But, through repeated
application of the basic tools of economic analysis, it is possible to gain enough
insight into economists’ basic tools to focus clearly and critically on the important
economic aspects of a problem, and to evaluate intelligently the advanced
analyses conducted by others.
This book is intended for use in a course focused on intermediate microeconomic
analysis, especially of business problems (managerial economics). As such, it
presumes both a solid grounding in microeconomic principles and basic calculus.
However, the text is sufficiently self‐contained for a diligent student who has never
had a principles of microeconomics course and whose calculus knowledge has
become quite rusty to master the material.
A widely accepted way of classifying the learning objectives common in
traditional education is given by the cognitive domain of Bloom’s taxonomy
(see http://en.wikipedia.org/wiki/Bloom%27s_Taxonomy). It
places learning objectives into six (more or less) hierarchical categories: 1)
knowledge (remembering), 2) comprehension (understanding), 3) application, 4)
analysis, 5) synthesis, and 6) evaluation. While many lower level college courses
focus largely on the first three levels, as the words themselves imply,
microeconomic analysis focuses on higher level learning objectives.
Memorizing terms, understanding concepts, and being able to apply them in
situations you have seen before are necessary to learning economic analysis, but
far from sufficient. Repeatedly working the largest possible set of practice problems
until you can do them backwards and forwards will help with the first two levels
and somewhat with the third, but will not help you master the higher order learning
objectives. You must deconstruct each new concept you encounter into its
constituent pieces and make sure you understand it from any possible angle and can
generalize it and adapt it to completely new situations. You must practice analyzing
situations you have never before encountered, synthesizing information from
various sources to reach insights that have never been explained to you in the past,
and using the results of your analyses to evaluate alternative courses of action or
potential solutions to problems.
The Goal of the Firm (and its Management)
Firms procure inputs and use them in turn to produce goods or services that are
of value to their customers. The difference between the value of the firm’s output
and the cost of the inputs used is the value added by the firm. If the share of the
value of its products which it is able to capture in the form of revenue exceeds the
cost of production, some residual revenue will be left over as profit. Mathematically,
\[ \pi = R - C \tag{1.1} \]
where π is profit, R is total revenue, and C is total cost.
Who gets to claim this residual revenue (profit)? The firm’s shareholders are the
residual claimants in this case. The value of all shares in the firm, in fact, will equal
(approximately) the expected present value of all future profits, though, we still need
to define exactly what we mean by expected present value. Since the shareholders
prefer to be wealthier, they will seek to provide their agents, the firm’s
management, with incentives to maximize the value of their shares. Thus, the
primary goal of the firm’s management is to utilize the limited resources available to
them to maximize the expected present value of future profits.
Present Value
The difference between the future value and present value of a sum of money
depends on the time value of money as reflected in the interest rate (r). If the
interest rate is 5%, for example, after one period an initial $100 will have earned $5
in interest, so the initial amount will have grown to $105. Every time interest
compounds, the value at the end of the period is 1.05 times the value at the end of
the previous period. After two periods, the same $100 would be worth 1.05 times
$105, or $110.25. Generally, if the interest rate is r, the future value (denoted FV) of
any given initial or present value (denoted PV) after t time periods is
\[ FV = PV(1 + r)^t. \tag{1.2} \]
This equation compounds the present value forward by a factor, determined by
the length of time and the interest rate, to obtain the future value.
Since any initial sum growing at a given (positive) interest rate will grow into a larger
sum in the future, any given future amount is worth less than that amount at the
present time. Dividing both sides of equation (1.2) by (1 + r)^t, we find the following
expression for the present value of a future amount:
\[ PV = \frac{FV}{(1 + r)^t}. \tag{1.3} \]
This equation discounts the future value backward by a factor, determined by the
length of time and the interest rate, to obtain the present value.
Example: Present/Future Value
If the interest rate is 10% and you invest $1 today, what is the future value three
years from now?
Solution: Use the equation for future value, observing PV = $1, r = 0.10 and t = 3.
\[ FV = PV(1 + r)^t \]
\[ FV = \$1(1 + 0.1)^3 \]
\[ FV = \$1(1.1)^3 = \$1.331 \]
If the interest rate is 10% and you receive $3 two years from today, what is the
present value?
Solution: Use the expression for present value, observing FV=$3, r=0.10 and t=2.
\[ PV = \frac{FV}{(1 + r)^t} \]
\[ PV = \frac{3}{(1 + 0.1)^2} \]
\[ PV = \frac{3}{1.1^2} \approx 2.48 \]
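The compounding and discounting formulas (1.2) and (1.3) can be checked with a short Python sketch. The function names `future_value` and `present_value` are illustrative choices, not part of the text, and the sketch assumes interest compounds once per period.

```python
def future_value(pv: float, r: float, t: int) -> float:
    """Compound a present value forward t periods at rate r, as in equation (1.2)."""
    return pv * (1 + r) ** t


def present_value(fv: float, r: float, t: int) -> float:
    """Discount a future value back t periods at rate r, as in equation (1.3)."""
    return fv / (1 + r) ** t


# The two worked examples above.
fv_example = future_value(1.0, 0.10, 3)   # $1 invested for three years at 10%
pv_example = present_value(3.0, 0.10, 2)  # $3 received two years from now at 10%
print(round(fv_example, 3), round(pv_example, 2))
```

Discounting a compounded amount at the same rate over the same number of periods returns the original sum, which is a quick way to confirm the two formulas are inverses of each other.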
Of course, future profits are not realized in only one period. To find the present
value of a series of profits realized over any number of years, we simply add up the
present value of each individual profit realization. So, if πt represents the value of
profit realized after period t, the value of the firm at time 0 (the present) would
simply be
\[ V_0 = \pi_0 + \frac{\pi_1}{1 + r} + \frac{\pi_2}{(1 + r)^2} + \frac{\pi_3}{(1 + r)^3} + \cdots = \sum_t \frac{\pi_t}{(1 + r)^t}. \tag{1.4} \]
Example: Net Present Value of a Project
Suppose a project involves an expenditure of $124 currently, and will return $21
after one year, $72 more after 2 years, and $45 after the third year, at which
point the project ends. If the interest rate is 8%, what is the net present value of
the project?
Solution: Use the equation for net present value, observing r = 0.08, π0 = ‐124, π1
= 21, π2 = 72, and π3 = 45.
\[ PV = -124 + \frac{21}{1.08} + \frac{72}{1.08^2} + \frac{45}{1.08^3} \approx -7.10. \]
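As a rough check on the calculation above, equation (1.4) can be written as a few lines of Python. The helper name `net_present_value` is hypothetical, and the sketch assumes the first entry of the list is the cash flow at time 0 and each later entry arrives one period after the previous one.

```python
def net_present_value(cash_flows, r):
    """Sum of cash_flows[t] / (1 + r)**t, where cash_flows[0] occurs today."""
    return sum(cf / (1 + r) ** t for t, cf in enumerate(cash_flows))


# The project from the example: pay $124 now, then receive $21, $72, and $45.
project = [-124, 21, 72, 45]
npv = net_present_value(project, 0.08)
print(round(npv, 2))
```

Because the result is negative, the discounted returns do not cover the initial outlay at an 8% interest rate.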
Information Structure
Often calculating a firm’s value, or the net present value of any particular
investment or project, is impossible to do with complete precision since there is no
way to know for sure what the future will bring. The structure of information ‐ who
knows what, when they know it, and how certain they are in that knowledge ‐ is
important in analyzing any decision. We consider three possible information
structures.
Complete and Perfect Information
When everyone knows exactly what the future holds, and, knows that everyone
has the same information, it is referred to as complete and perfect information.
This is, obviously, the simplest possible information structure. In this case,
calculating the present value of profit, or the present value of any cash flow project,
is a straightforward application of equation (1.4).
Simple Risk and Uncertainty
We will think of situations involving uncertain outcomes as lotteries, in which
every possible outcome might occur with some probability. Risk refers to a situation
where everyone shares (more or less) common estimates of the probabilities of
each possible outcome based on the laws of probability or rigorous empirical
analysis. Uncertainty encompasses risk and broader cases of imperfect information
where there is not enough information to form assessments of the probabilities of
every possible outcome based on agreed upon laws of probability or empirical
regularities. In that case, everyone must form their own subjective probability
assessments that reflect their best guesses about the probability of each possible
outcome based upon whatever information is available to them. If everyone is
exactly identical and all information is shared in common, everyone will reach
common probability estimates. When people differ in their experiences, knowledge,
and abilities, their subjective probability estimates will differ.
Uncertainty affects virtually every decision taken by a firm’s management. For
our purposes, sometimes we may assume away the uncertainty to focus on other
important aspects of a problem in situations where the uncertainty itself is not
important to the main point under consideration. However, a great deal of the
information in the text will explicitly incorporate incomplete information in the
form of uncertainty about future events.
Asymmetric Information
Sometimes, some individuals have better or more accurate information about
uncertain contingencies than everyone else. While such asymmetric information
may play a role in a number of important markets, it is an advanced topic which we
will not return to in detail until much later. At this point, though, it is worth noting
that this phenomenon can lead to severe market failure. Typically, such market
failures take one of two forms.
Adverse selection occurs when one individual has information about inherent
characteristics of the problem that is not available to others. To see why this
matters, consider the market for used cars. For purposes of our example, suppose
used cars are either high quality or low quality. Further, suppose individuals who
own used cars and are seeking to sell them know the quality of the car, but potential
buyers do not. Those looking to sell high quality cars will only sell them at a high
price. However, if there are enough low quality cars out there, buyers will not be
willing to pay a high price for any used car, for fear they will get a low quality car
that was not worth the money. Anticipating this, owners of the highest quality used
cars will simply not offer them for sale, and will instead hold onto them longer than
they otherwise would. The market for the highest quality used cars would simply
not exist if information were too asymmetric.
Moral hazard occurs when one individual has information about an action they
have taken that is not available to others, and is therefore shielded from the
consequences of that action. For example, an hourly employee whose work rate is
hard to monitor may not work as hard as someone whose work rate is easier to
monitor. This can create a role for incentive contracts. As another example, all else
equal, someone with complete homeowners insurance coverage may take less care
in preventing losses due to, say, fire damage. This creates a need for deductibles and
for explicit incentives in the form of policy discounts for homeowners to undertake
safety investments.
Expected Value and Attitudes Toward Risk
When an entity such as an individual or a firm faces uncertainty (whether or not
information is symmetric), their attitude toward risk, as well as the degree of the
risk itself, affects their evaluation of the options they face. Therefore, we will
consider the implications of different information structures and attitudes toward
risk. Before doing so, however, it is useful to first introduce two concepts, expected
value (EV) and the certainty equivalent (CE).
Roughly speaking, expected value is the average outcome if a given lottery is
played a very large number of times. More specifically, if i is an index of the possible
outcomes (so that i = 1 for the first possible outcome, and so on), x_i is the value if
outcome i occurs, and f_i is the probability of outcome i, the definition of expected
value of x, denoted E(x), is
\[ E(x) = \sum_i f_i x_i. \tag{1.5} \]
We will often use f to denote probabilities. But, sometimes we will write Pr(x_i) to
denote the probability that x takes on the specific value x_i.
Example: Expected Value
Suppose the probability that profit is $40 is 0.8; otherwise, profit is ‐$100. Find
the expected value of this gamble.
Solution: Use the definition of expected value, observing that Pr(π=40) = 0.8 and
Pr(π=‐100) = 0.2.
\[ E(x) = \sum_{i=1}^{n} \Pr(x_i)\, x_i \]
\[ E(x) = 0.8(40) + 0.2(-100) = 12 \]
Notice that we found the expected value to be $12. When the uncertainty is
resolved, profit will either be $40 or it will be ‐$100. What, then, is the
interpretation of the $12? If this gamble were taken 100 times, about 80 times
profit would be $40, and the other 20 times profit would be ‐$100; the average
per‐period profit would then be $12.
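The definition in (1.5) translates directly into code. The following is a minimal sketch, assuming a lottery is represented as a list of (probability, value) pairs; that representation and the name `expected_value` are illustrative choices, not notation from the text.

```python
import math


def expected_value(lottery):
    """Probability-weighted average of the outcomes, as in equation (1.5).

    lottery is a list of (probability, value) pairs covering every outcome.
    """
    total_probability = sum(p for p, _ in lottery)
    if not math.isclose(total_probability, 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(p * x for p, x in lottery)


# The gamble from the example: $40 with probability 0.8, otherwise -$100.
gamble = [(0.8, 40), (0.2, -100)]
ev = expected_value(gamble)
print(ev)
```

The check that the probabilities sum to 1 guards against the most common mistake in setting up such a problem, namely leaving out a possible outcome.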
Now imagine facing a choice between a lottery on one hand and a certain sum of
money on the other. If the sure thing is a low enough value, the lottery will be
preferred, and if the sure thing is a high enough outcome, it will be preferred. The
certainty equivalent (CE) of a lottery is the sum of money for certain that is viewed
as exactly equivalent to the lottery.
A risk neutral entity is indifferent toward risk – they care only about the
expected value. For them, the certainty equivalent and the expected value are equal
(CE=EV). Faced with a choice between the lottery in the example above and $12 for
certain, a risk neutral individual would be indifferent. Most individuals, however,
are risk averse, meaning the certainty equivalent of a lottery is less than its
expected value from their point of view (CE<EV). They would choose $12 for sure
rather than the lottery above. For someone who is risk loving, the certainty
equivalent of a lottery is higher than the expected value (CE>EV). They would
choose the lottery above over $12 for certain.
Expected Present Value and the Value of the Firm
While most individuals are risk averse, we will assume firms evaluate uncertain
options in an approximately risk neutral manner. Why? First, it is the simplest
possible model, allowing us to focus on other, more important issues until we learn
to model risk aversion (in Chapter 10). Second, individual stockholders can diversify
by buying shares in many different firms and holding any number of other types of
assets, such as bonds, currency, or real estate. To the extent the risks associated
with different assets in the portfolio are independent, diversification reduces the
aggregate risk in the portfolio. In the most extreme case, if shareholders could
diversify away all the risk in their portfolio (so below-expected returns on some
investments are exactly offset by above-expected returns on others), they would
simply want the firms in which they held shares to make the choices that maximize
expected profits, regardless of the apparent risk to the individual firm.
In reality it is not possible to diversify away all risk in a portfolio. But, the
uncertainty in the return of an individual stockholder’s diversified portfolio is far
less than the uncertainty of the expected profit of an individual firm in that portfolio.
Further, the relationship between the expected return of a well diversified portfolio
and the expected return to any given stock in that portfolio is much stronger than
the relationship between the overall uncertainty about the return of the portfolio
and the uncertainty about the return of an individual firm’s stock in that portfolio.
So, it makes sense to model firms as if managers care predominantly about expected
profit, with the degree of uncertainty only a secondary concern.
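To see how quickly diversification can shrink risk in the idealized case, consider a sketch under strong simplifying assumptions that go beyond the text: every asset's return has the same standard deviation, returns are mutually independent, and the portfolio is equally weighted. The 30% figure and the function name are hypothetical.

```python
import math


def equal_weight_portfolio_std(sigma: float, n_assets: int) -> float:
    """Standard deviation of the average return on n_assets holdings.

    Assumes (for illustration only) that every asset's return has the same
    standard deviation sigma, that returns are mutually independent, and that
    the portfolio is equally weighted. Under those assumptions the variance of
    the average is sigma**2 / n_assets.
    """
    if n_assets < 1:
        raise ValueError("n_assets must be at least 1")
    return sigma / math.sqrt(n_assets)


# Hypothetical figure: each stock's return has a standard deviation of 30%.
for n in (1, 4, 25, 100):
    print(n, round(equal_weight_portfolio_std(0.30, n), 3))
```

Real asset returns are correlated, so actual portfolios retain more risk than this sketch suggests, which is the point made above about not being able to diversify away all risk.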
Above, we argued the value of a firm was approximately equal to the present
value of future profits when there was no uncertainty. If the value of a firm is
evaluated in an approximately risk neutral manner, the value of the firm will be the
expected present value of future profits (EPV). The idea is to find the expected value
of profit in every future period and then discount those values back to the present.
Mathematically, EPV is
\[ EPV = \sum_t \frac{E_i(x_{it})}{(1 + r)^t} = \sum_t \frac{\sum_i f_{it} x_{it}}{(1 + r)^t}. \tag{1.6} \]
The value of the firm at time 0 is then
\[ V_0 = \sum_t \frac{\sum_i f_{it} \pi_{it}}{(1 + r)^t}. \tag{1.7} \]
Example: Expected Present Value
Suppose a firm is considering acquiring the rights to a project that has an
uncertain return over the following two years. After one year, the project will
return $100 with probability 0.4, otherwise it will return −$200 at that time.
After two years, there is a 0.7 probability the project will return $400 in addition
to the first year returns, otherwise it will return an additional ‐$200. What is the
most the firm should be willing to pay to acquire this project? Assume an
interest rate of 7%.
Solution: Using the definition of expected present value, calculate the expected
value of the return each period, discount them back to the present, and sum.
\[ EPV = \frac{0.4(100) + 0.6(-200)}{1.07} + \frac{0.7(400) + 0.3(-200)}{(1.07)^2} \]
\[ EPV = -74.76636 + 192.15652 = 117.39017 \]
This expected present value, about $117.39, is what the project is worth to us today, and so is the most the firm should be willing to pay for it. The value of
a firm is nothing more than the expected present value of all future cash flows
the firm may generate from all projects it may undertake.
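The two steps in this example, taking the expected value period by period and then discounting, can be sketched in Python. The data layout (one list of (probability, payoff) pairs per period, with the first list belonging to period 1) and the name `expected_present_value` are assumptions made for illustration.

```python
def expected_present_value(lotteries, r):
    """Expected present value in the spirit of equation (1.6).

    lotteries[k] holds the (probability, payoff) pairs for period k + 1, so the
    first payoff is discounted one period. A time-0 payoff is not modeled here.
    """
    total = 0.0
    for t, lottery in enumerate(lotteries, start=1):
        expected_payoff = sum(p * x for p, x in lottery)
        total += expected_payoff / (1 + r) ** t
    return total


# The project from the example, with an interest rate of 7%.
project = [
    [(0.4, 100), (0.6, -200)],  # year 1
    [(0.7, 400), (0.3, -200)],  # year 2
]
epv = expected_present_value(project, 0.07)
print(round(epv, 2))
```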
Value of Information – Part 1 – Yes/No Decisions
In a world of uncertainty, supplemental information helps managers make better
profit‐maximizing decisions. Every manager has their own “best guess” about the
future, based on the internal knowledge of the firm. However, if a manager is able to
accumulate additional outside information, perhaps through hiring a consultant,
they may be able to readjust their perceptions of the future. It follows from this that
information can be valuable to managers. In fact, it is valuable to the extent it leads
the manager to update their probability estimates and to therefore alter their
decisions.
We start with the simplest kind of decision, where a firm faces a single,
dichotomous decision. Imagine, for example, we are considering drilling an oil well,
but are uncertain as to whether or not we will strike oil. It costs money to build the
rig, so if we hit oil we will make a profit, but if we don’t hit oil we will have lost the
cost of building the rig. We have our own ideas about how likely it is that we will
strike oil, based on our previous knowledge, experience, and other freely available
information. But, we also have the option to hire a geological consultant who could
run tests and analyze samples and tell us what their “informed” opinion is.
Consultants aren’t perfectly accurate; they merely have information that allows us
to update our own probability estimates. So the question becomes, what is it worth
for us to have better, though still imperfect, information?
First, consider whether or not we should proceed based only on our own
information. Suppose this project has two possible outcomes, call them success (S)
or failure (F), and that the probability of success is Pr(S) and the probability of
failure is Pr(F) = 1 − Pr(S). The payoff for success is π_S and the payoff for failure is
π_F, where π_S > 0 and π_F < 0. Finally, assume we do not need to worry about
discounting, for simplicity.
The figure below shows the situation in a simple decision tree. The firm’s
decision to proceed or not is represented graphically by the white square at the left.
If they choose not to proceed (the top path through the figure), the probability is 1
(the dashed box) that they will earn a payoff of $0. If, however, they proceed, they
are not sure what will happen. In that case, whether or not they succeed depends on
the resolution of uncertainty regarding the underlying state of nature, represented
by the shaded circle. If conditions turn out to be favorable, which occurs with
probability Pr(S), the firm makes a profit of πS. If conditions turn out to be
unfavorable, which occurs with probability Pr(F), the firm makes a profit of πF.
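[Figure: decision tree. Don’t Proceed leads to a payoff of 0 with probability 1. Proceed leads to π_S with probability Pr(S) and to π_F with probability Pr(F).]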
How should we decide whether or not to proceed? If both payoffs are
positive, there is no ambiguity: undertake the project. Similarly, if both payoffs are
negative, reject the project. In either case, additional information cannot change the
decision, and so has no value. Otherwise, to determine if we should undertake the
project or reject it, calculate expected profit:
\[ E(\pi) = \sum_i \Pr(\pi_i)\, \pi_i \]
\[ E(\pi) = \Pr(S)\, \pi_S + \Pr(F)\, \pi_F \]
If this value is positive, undertake the project; otherwise, reject it. Given only this
information about the project, we can say that our expected profit with no
additional information beyond our guesses about these probabilities (NoInfo) is
\[ E(\pi \mid \mathrm{NoInfo}) = \max\big(\Pr(S)\, \pi_S + \Pr(F)\, \pi_F,\ 0\big). \tag{1.8} \]
The reason the above expression is written as the maximum of two arguments is that
if the expected profit were negative, you would not proceed, and would actually
earn 0, not a negative amount. So, 0 is the “worst” possible expected profit in this
case.
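The rule in (1.8) amounts to comparing the expected profit from proceeding with the sure payoff of 0 from walking away. Here is a minimal sketch, with a hypothetical function name and with the second set of numbers invented purely to show the case where the firm declines the project.

```python
def expected_profit_no_info(p_success, payoff_success, payoff_failure):
    """Equation (1.8): proceed only if expected profit is positive, else earn 0."""
    proceed = p_success * payoff_success + (1 - p_success) * payoff_failure
    return max(proceed, 0.0)


# Success pays 100 and failure costs 60 in both cases below.
even_odds = expected_profit_no_info(0.5, 100, -60)  # proceeding is worthwhile
long_odds = expected_profit_no_info(0.3, 100, -60)  # hypothetical: better to walk away
print(even_odds, long_odds)
```

Taking the maximum with 0 encodes the option not to proceed, so the function never returns a negative expected profit.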
Example: Expected Profit with Initial Information
Suppose a project has the following values: Pr(S) = 0.5, π_S = 100, and π_F = −60.
What is the expected profit without additional information?
Solution: Use the above definition of the expected profit given no additional
info.
\[ E(\pi \mid \mathrm{NoInfo}) = \max\big(\Pr(S)\, \pi_S + \Pr(F)\, \pi_F,\ 0\big) \]
\[ E(\pi \mid \mathrm{NoInfo}) = \max\big(0.5(100) + 0.5(-60),\ 0\big) \]
\[ E(\pi \mid \mathrm{NoInfo}) = \max(20, 0) = 20 \]
Since the expected profit is positive, we will undertake the project with an
expected profit of $20. The scenario is illustrated in the decision tree below.
[Figure: decision tree for this example. Don’t Proceed leads to a payoff of 0 with probability 1.]
probabilities. This is easiest to illustrate with an example. Suppose we purchased
similar information several times in the past. Each time, either we proceeded with
our project whether the report signaled good news or bad, or, on the occasions we did not
proceed, we later learned whether the project would have succeeded if we
had gone forward. Further, suppose we kept detailed records and we believe past
outcomes and observations are a reasonable basis from which to accurately infer
future probabilities. The results are shown in the table below.
                       Outcome
Report        Success   Failure   Total
Good News        6         2         8
Bad News         4         8        12
Total           10        10        20
From this historical data, we can estimate the probability of success given good
news or bad news. The probability of success given good news is how many times
we’ve succeeded after receiving a report of good news, divided by how many times
we’ve had good news, or
\[ \Pr(S \mid GN) = \frac{6}{8} = \frac{3}{4}. \]
Given that we received a good report, we now think the probability of success is
0.75. Those who have had and remember statistics will recognize this as a
straightforward calculation of a conditional probability. Similarly, the probability of
success given bad news is how many times we’ve succeeded after receiving bad
news, divided by how many times we’ve been given bad news, or
\[ \Pr(S \mid BN) = \frac{4}{12} = \frac{1}{3}. \]
Before moving on, note that we can also use this information to calculate the
probability of success with no additional information and the probability that we
will receive good news if we buy the report. Since success occurs ten times out of
twenty in total, the probability of success if no additional information is observed is
\[ \Pr(S \mid \mathrm{NoInfo}) = \frac{10}{20} = \frac{1}{2}. \]
Similarly, of the 20 times the report was purchased, it yielded good news eight
times. So, the probability we will receive good news if we buy the report is
\[ \Pr(GN) = \frac{8}{20} = \frac{2}{5}. \]
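The four probabilities just computed all come from the same table of counts, which the following sketch makes explicit. It uses Python's `fractions.Fraction` so the results stay exact; the dictionary layout and the labels "GN", "BN", "S", and "F" are illustrative conventions.

```python
from fractions import Fraction

# Historical counts from the table: (report, outcome) -> number of occurrences.
counts = {
    ("GN", "S"): 6, ("GN", "F"): 2,
    ("BN", "S"): 4, ("BN", "F"): 8,
}


def pr_success_given(report):
    """Share of past projects that succeeded among those with the given report."""
    successes = counts[(report, "S")]
    total = successes + counts[(report, "F")]
    return Fraction(successes, total)


total_trials = sum(counts.values())
pr_s_gn = pr_success_given("GN")
pr_s_bn = pr_success_given("BN")
pr_s = Fraction(counts[("GN", "S")] + counts[("BN", "S")], total_trials)
pr_gn = Fraction(counts[("GN", "S")] + counts[("GN", "F")], total_trials)
print(pr_s_gn, pr_s_bn, pr_s, pr_gn)
```

As a consistency check, weighting the two conditional probabilities by the chance of each report recovers the unconditional probability of success: (2/5)(3/4) + (3/5)(1/3) = 1/2.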
In practice, it is unlikely that a manager’s probability estimates will be based
entirely on this sort of precise empirical analysis. At the other extreme, a manager
may have only their best subjective guess based on their previous experiences in
cases that are roughly similar and the impressions they get from talking with
whomever they are considering purchasing the additional information from. Often,
the situation will fall between those two extremes. There may be some data based
on previous situations. However, those situations may not be a perfect match for the
current one. And, while experts who can provide additional information will have
established track records, those track records may not be perfect indicators of
future performance for many possible reasons. So, the manager is left to fill in the
gaps in what they can empirically estimate with their own subjective evaluation of
the situation and the quality of the information they are considering purchasing.
“Good” managers are good at taking whatever information is available and
formulating guesses about the actual but unknown state of nature and acting
accordingly.
With this understanding of how we might arrive at probability estimates, we
return to our consideration of the value of additional, but still imperfect,
information. The decision tree below summarizes the situation, as so far described.
The open square at the far left denotes the initial decision we face, to buy the
information or not. If we decide not to buy the additional information, the situation
is just like before – we face a decision to drill or not based only on our initial
probability estimates. If however, we decide to buy the information, uncertainty
about the nature of the report is resolved (indicated by the shaded circle) and we
get either good news or bad news. We then face another decision – whether to drill
or not. However, our assessment of the likelihood of success is different with good
news than with bad news, hence, we may make a different decision with good news
than with bad news. That potential to change our decision regarding whether or not
to proceed with the project is what gives additional information value. Information
that is too imprecise to make a difference in our decision has no value. The more
closely the new information allows us to estimate the true underlying state of
nature, the more valuable it is.
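[Figure: decision tree. The first decision is whether to buy the report. Don’t Buy: either Don’t Proceed (payoff 0) or Proceed (π_S with probability Pr(S), π_F with probability Pr(F)). Buy: Good News arrives with probability Pr(GN), after which we either Don’t Proceed (payoff 0) or Proceed (π_S with probability Pr(S|GN), π_F with probability Pr(F|GN)); Bad News arrives with probability Pr(BN), after which we either Don’t Proceed (payoff 0) or Proceed (π_S with probability Pr(S|BN), π_F with probability Pr(F|BN)).]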
So, how do we go about determining whether we should buy the report and its
value? We need to calculate expected profit conditional on purchasing the additional
information, E (π | Info) . Assuming we perform that calculation ignoring the cost of
attaining the additional information, the value of the report is simply the increase in
expected profit, and we should buy the report if its value exceeds its cost.
How, in turn, do we determine expected profit conditional on having purchased
the information? In this sort of problem, we will usually start at the end and work
our way back toward the beginning. That means we must calculate what expected
profit would be with good news, E (π | GN ) , and with bad news E (π | BN ) . Each of
those is calculated just like we calculated expected profit with no additional
information, except that we have different probabilities, depending on the
information we received. Expected profit with the information is then just the probability
of good news times expected profit conditional on good news plus the probability of
bad news times expected profit conditional on bad news.
Whether we receive good news or bad news, we will proceed only if expected
profit based on our new estimate of the probability is positive, otherwise we will not
and profit will be 0. Therefore, expected profit given good news is
E(π | GN) = Max ((Pr(S | GN)π S + Pr(F | GN)π F ),0). (1.9)
It is important to note that if expected profit given good news is negative, we will
not proceed even with good news and the highest possible estimate of the
probability of success. We therefore would not proceed with no information or with
a report of bad news, either. Since the information cannot affect our decision, it has
no value in that circumstance.
Example: Expected Profit with Good News
Suppose the probability of a successful project given a good report (good news)
is ¾, and that success will result in a cash flow of $100, while a failure will result
in a loss of $60. Find the expected profit given a good report.
Solution: Use the information given (observing that the probability of failure
given a good report is 1‐ ¾ = ¼) and the definition of expected profit given good
news.
$$E(\pi \mid GN) = \max\big(\Pr(S \mid GN)\,\pi_S + \Pr(F \mid GN)\,\pi_F,\ 0\big)$$
$$E(\pi \mid GN) = \max\big(\tfrac{3}{4}(100) + \tfrac{1}{4}(-60),\ 0\big)$$
$$E(\pi \mid GN) = \max(60,\ 0) = 60$$
Because expected profit with good news is positive, we would want to continue
with the project if we receive good news. If this value had been negative, we
could immediately conclude that the report would be useless.
Moving on, expected profit given bad news is
E(π | BN) = Max ((Pr(S | BN)π S + Pr(F | BN)π F ),0). (1.10)
If expected profit given bad news is negative, we will abandon the project,
getting a payoff of 0; if it’s positive, we will proceed. A similar conclusion to the one
above is that if our expected profit given bad news is positive and we proceed with
the project even with the worst possible news about the chances of success, the
information has no value. This is because if we were to buy a report and it came up
unfavorable, we would still proceed with the project. That means we will of course
proceed with no information or with good news. Since the report would have no
effect on our final decision in such a case, it would have no value.
Example: Expected Profit with Bad News
Suppose that based on previous reports, a firm estimates that the probability of
a successful project given a bad report (bad news) is 1/3, and that success will
result in a cash flow of $100, while a failure will result in a loss of $60. Find the
expected profit given a bad report.
Solution: Use the information given (observing that the probability of failure
given a bad report is 1 – 1/3 = 2/3) and the definition of expected profit given bad
news.
$$E(\pi \mid BN) = \max\big(\Pr(S \mid BN)\,\pi_S + \Pr(F \mid BN)\,\pi_F,\ 0\big)$$
$$E(\pi \mid BN) = \max\big(\tfrac{1}{3}(100) + \tfrac{2}{3}(-60),\ 0\big)$$
$$E(\pi \mid BN) = \max\big(-\tfrac{20}{3},\ 0\big) = 0$$
Since expected profit from proceeding is negative, we would not proceed after
receiving bad news. The report can therefore change our decision, so it may have
some value, and we can continue with our valuation. If this number had
been positive, the report would have been worthless.
Now that we know how to calculate both expected profit given good news and
expected profit given bad news, we can calculate the expected profit after buying the
additional information. There are two possible outcomes for the report: good news
or bad news. Each outcome has an expected profit associated with it ‐ the two
expected profits we just defined. So, expected profit with additional information is
E (π | Info) = Pr(GN ) E (π | GN ) + Pr(BN ) E (π | BN )
where Pr(GN) is the probability the report gives good news, and Pr(BN) is the
probability the report gives bad news. We know from our above discussion that for
the information to be valuable E(π|BN) must be 0, so the second half of the equation
falls out. Assuming the conditions outlined above hold, so that the report is in fact
valuable, expected profit with information then becomes
E(π | Info) = Pr(GN)E (π | GN) .
The maximum amount that the firm would be willing to pay for the information
is simply how much higher the firm expects profits to be with the information. Thus,
the value of information is
InfoValue = E (π | Info) − E (π | NoInfo) .
As long as the report costs less than this amount, the firm will buy it. (The firm is
exactly indifferent if the cost of the information equals its value.)
Example: Expected Profit with Information and Information Value (Continuation)
Suppose, based on previous reports, a firm estimates that the probability of
receiving good news if they buy a report is 0.4. Using the previously calculated
expected profits with good news and bad news, calculate the firm’s expected
profit with additional information. Based on the previous examples, how
valuable is a report to the firm?
Solution: Use the definition of expected profit given information, then find the
difference between the firm’s expected profits with and without the information.
E (π | Info) = Pr(GN ) E (π | GN )
E (π | Info) = 0.4(60) = 24
InfoValue = E(π | Info) − E(π | NoInfo)
InfoValue = 24 − 20 = 4
Our running example is illustrated in full in the decision tree below.
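[Figure: decision tree for the running example. Don’t Buy: proceeding pays 100 with probability 1/2 and ‐60 with probability 1/2, so expected profit is (1/2)100 ‐ (1/2)60 = 20; not proceeding pays 0. Buy: Good News arrives with probability 2/5, after which proceeding pays 100 with probability 3/4 and ‐60 with probability 1/4, so expected profit is (3/4)100 ‐ (1/4)60 = 60. Bad News arrives with probability 3/5, after which proceeding pays 100 with probability 1/3 and ‐60 with probability 2/3, so expected profit is (1/3)100 ‐ (2/3)60 = ‐20/3 and the firm does not proceed (payoff 0). Expected profit with the report is (2/5)(60) + (3/5)(0) = 24, so the value of the report is 24 ‐ 20 = 4.]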
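The backward calculation in the running example can be sketched in a few lines of Python. This is only an illustrative sketch: the function and variable names are our own, and exact fractions are used so that the results match the hand calculations above.

```python
from fractions import Fraction as F

def expected_profit(p_success, profit_success, profit_failure):
    """Expected profit at a proceed/don't-proceed decision.

    The firm can always decline to proceed and earn 0, so the result
    is never negative.
    """
    proceed = p_success * profit_success + (1 - p_success) * profit_failure
    return max(proceed, 0)

# Numbers from the running example in the text.
PI_S, PI_F = 100, -60      # payoff from success and from failure
P_S = F(1, 2)              # Pr(S) with no report
P_GN = F(2, 5)             # Pr(GN); Pr(BN) is 1 - Pr(GN)
P_S_GN = F(3, 4)           # Pr(S | GN)
P_S_BN = F(1, 3)           # Pr(S | BN)

# Work backward: first the final decisions, then the decision to buy.
e_no_info = expected_profit(P_S, PI_S, PI_F)
e_good = expected_profit(P_S_GN, PI_S, PI_F)
e_bad = expected_profit(P_S_BN, PI_S, PI_F)
e_info = P_GN * e_good + (1 - P_GN) * e_bad
info_value = e_info - e_no_info

print(e_no_info, e_good, e_bad, e_info, info_value)  # 20 60 0 24 4
```

Changing P_S_BN to a value that makes proceeding profitable even after bad news drives the value of the report to zero in this sketch, provided the no-report probability P_S is adjusted to stay consistent with the conditional probabilities.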
What’s to Come
Math is an incredibly useful modeling tool in economics. It allows us to give clear
and concise expression to the most important and fundamental concepts. In
addition, as is probably obvious from the material in the first chapter, we will make
extensive use of mathematical examples to illustrate the main concepts and to
practice using the tools of economic analysis. Therefore, the appendix to this
chapter provides a reasonably thorough and self-contained math review. However, if
you are really rusty on your math, you may need to pull out the old textbooks for a
more in-depth review.
The remainder of the book will build on the material introduced in this chapter
to analyze decisions firms and their managers must make when trying to maximize
the expected present value of profit. The figure below is a conceptual depiction of
the flows of inputs, goods and services, and funds and of the interactions between
the firm, suppliers, customers, and competitors that govern the firm’s profitability.
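[Figure: conceptual diagram of the firm and its markets. Labels recoverable from the original: input markets (labor, materials, capital) and product markets (customers).]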
Chapter 7 takes a step back to consider the basic theory of individual preference
and consumer behavior. That theory underpins demand analysis and lends itself to a
number of other applied analyses. Chapter 8 considers some of those applications
and extensions. Chapter 9 considers more complex forms of pricing – block pricing,
two-part pricing, and menu pricing. Chapter 10 presents the basic model of
individual choice when faced with uncertainty and considers the ramifications of
risk aversion. Chapter 11 considers cost in detail.
Game theory, the last tool needed to complete our analytical toolbox, is the topic
of chapters 12, 13, and 14. We will use game theory to model multiple firms
simultaneously seeking to maximize their own profits, given their best guesses
about what all the other firms are going to do, recognizing that all the other firms
are doing the same thing and that every firm’s decision affects all the others. More
generally, non‐cooperative game theory is used to analyze any strategic situation
where the players all realize that their best play depends on what all the other
players are going to do.
Part 2 of the book uses the tools developed in Part 1 to study market structure –
the ways the environment in which firms operate affects the decisions available to
them, and the ways the decisions of firms interacting in a market affect other firms
and the market. Chapter 15 analyzes markets in which relatively large firms all
produce completely identical products. In such markets, gaining a strategic
advantage over competitors boils down to taking market share through having a
leaner cost structure or establishing an aggressive strategy early on, since nothing
but price can distinguish one product from the next. Chapter 16 considers markets
where products are differentiated. While cost structure still matters, product
positioning and advertising also become important elements of firm strategy.
Chapter 17 considers product markets where economies of scale are low and
there are few other entry barriers, so that the market can support a large number of
relatively small firms. In the limiting case, such markets are perfectly competitive,
which you should be familiar with from your principles of microeconomics class.
When there are enough firms in a market for the perfectly competitive model to be a
reasonably good approximation for whatever question an analyst wishes to answer,
its simplicity is a tremendous advantage. In particular, complications arising due to
strategic interdependence can be ignored. In that case, equilibrium prices and
quantities and the welfare of consumers and producers may be appropriately
analyzed with the simple supply and demand model. Applications of supply and
demand are the topic of chapter 18. Chapter 19 closes out the study of market
structure by summarizing the various models and placing them in the context of
data on various U.S. industries. It also considers several specific strategies firms may
pursue that are aimed at altering the nature of competition and the structure of the
market itself.
While something called a “firm” has played an important role in most of the first
eighteen chapters of the book, we have not looked at the reasons firms exist, or, why
they are structured as they are. Why don’t individuals simply specialize in the single
thing they do best and then trade with one another? How do the existence and
structure of the firm, in their own right, increase value added? That is the subject of
Part 3 of the book. In order to produce goods and services to sell to their customers,
firms must procure inputs. Those inputs may be purchased in the spot market, the
firm may contract with input suppliers, or the firm might vertically integrate and
produce some of the intermediate inputs itself. Chapter 20 considers the
choice between the spot market and contracting. It then moves on to consider issues
that arise in contracting for inputs when the agent (the input provider) has private
information that is not available to the principal (the firm). Lest I give the wrong
impression, Chapter 20 barely scratches the surface of the economics of asymmetric
information, but it does give the basic insights. Chapter 21 considers three problems
other than information asymmetry that can cause the spot market or contracting to
break down, creating a role for a vertically integrated firm. These are team
production and the resulting free-riding problem, relationship-specific investments
and the resulting hold-up problem, and double marginalization.
Uses and limits of models
Before concluding this introductory chapter, several words of caution are in
order regarding the use of models in this book. It is important that we keep in mind
that they are only models. By definition, models are oversimplified representations
of a few aspects of reality. They allow us to focus on a few aspects of a problem that
we think are particularly important. In that way, they keep the analytical task
manageable. In making any decision we must consider multiple models and
carefully consider if anything has been left out of the models that might have serious
consequences for our decision. As noted by MIT economist Peter Diamond, “To me,
taking a model literally is not taking the model seriously.”2
Forgetting this can cause trouble in two ways. Some people who forget the models are intended
to be a simplified story, not a precise description of reality, reject all conclusions of
economic analysis on the grounds that the models are unrealistic, never mind the
fact that it would be impossible to draw any conclusions from, or even to construct,
a fully realistic model. That is, they throw the baby out with the bathwater. Others
try to force their actual decisions and opinions to conform narrowly to the results of
their pet model, and ignore the potential serious ramifications of things that lie
outside their models. This can lead to catastrophe when rare but serious events
occur that are not accounted for in the model. For example, Long-Term Capital
Management collapsed in the late 1990s not because they calculated incorrectly in
applying models such as the Black‐Scholes‐Merton option pricing model. Rather, the
collapse was due to the impact of rare but serious negative external shocks outside
of the model coupled with the highly leveraged positions they took as a consequence
of the conclusions reached from their models.3
Consider the assumption that we can treat firms as if they maximize expected
returns and ignore risk aversion (at least as a first approximation). It is useful
because it simplifies matters considerably, with two benefits. First, it lets us get started
while we are developing the tools needed to study risk aversion. Second, and more
importantly, it allows us to go into more detail in our analysis of other issues where
the central area of concern is not the degree of risk aversion, but the nature of
profit-maximizing decisions and the direct impact of uncertainty on those decisions.
Without this assumption, we would never be able to gain a number of insights about
profit-maximizing decisions. The drawback is that some can become so wrapped up
in the model that they forget it is just a model. Individual managers and board
members with large stakes in the company are, in fact, probably risk averse.
Further, focusing on the expected profits of an individual project within a firm can
miss important links to the big picture. If failure of a $50 million project bankrupts a
firm with expected profits from other projects of $500 billion, the project may not
be worth it even if it looks good evaluated on its own. We should always take
account of the things left out of our models – like the potential to bankrupt the
whole firm – before reaching any final decisions.
2 Peter Diamond, “Taxes and Pensions,” Southern Economic Journal 2009, 76(1), page 2.
3 http://en.wikipedia.org/wiki/Long‐Term_Capital_Management
Chapter 1 Terminology
The following is a list of terms that you should know in order to discuss and apply
the material from this chapter.
Adverse Selection: A case of information asymmetry in which one party’s
characteristics are hidden from another party.
Asymmetric Information: A state in which one party knows more than others.
Certainty Equivalent (CE): The amount of wealth received for certain that gives an
individual the same utility as the gamble itself. For risk-neutral players, the
certainty equivalent is equal to the expected value of the gamble. For risk-averse
players, it is less than the expected value of the gamble.
Expected Value: The monetary value that one expects to receive, on average, from a
particular gamble: the sum of the payoffs, each weighted by its probability. This can
be applied to anything that has probabilities associated with payoffs.
Incomplete Information: A state in which there is risk or uncertainty.
Marginal Cost: The cost incurred when a firm produces one additional unit of a good. It
is the rate of change (or derivative with respect to quantity) of total cost. It is
increasing because of the presence of a fixed factor. In order to maximize profit, this
should be set equal to marginal revenue.
Marginal Revenue: The additional revenue from selling one more unit. It is the rate of
change (or derivative with respect to quantity) of total revenue. In order to
maximize profit, this should be set equal to marginal cost.
Moral Hazard: A case of information asymmetry in which one party’s actions are
hidden from another party.
Perfect Competition: A type of market structure with many firms where each
individual firm has no control over price levels; they are price takers. These firms
have no strategic decisions to make, since they take their price from the market.
Nothing they do will have any impact on the other firms in the market.
Perfect Information: A state in which no hidden characteristics or actions exist.
The information the buyers have is the same as that of the seller and everyone
knows all information about the product at all times.
Risk Aversion: An attitude toward risk that describes someone who values a
gamble less than the expected value of that gamble. A risk-averse person would take
a guaranteed payoff that is less than the expected value of the gamble rather than
taking the gamble with the possibility of getting nothing or losing money.
Risk Neutrality: An attitude toward risk that describes someone who values a
gamble at its expected value.
Risk: Occurs when the probabilities that certain events will occur are somewhat
objective, i.e. probabilities are known to all parties.
Risk-Loving: Describes someone who must receive a guaranteed payoff that is
more than the expected value of a gamble in order to not take the gamble.
Uncertainty: Occurs when the probabilities that certain events will occur are
subjective, i.e. probabilities are different depending on the party.
Appendix to Chapter 1
Math Used in Managerial Economics
Introduction
As mentioned in Chapter 1, economics attempts to develop systematic methods
for modeling events and decisions concerning individuals and organizations.
Because economics deals with people, it is by nature an inexact science; people are
irrational, whimsical, and often unquantifiable. In order to draw any useful
conclusions about these individuals, it is important that our methods be as reliable
as possible. Applied mathematics is the best way we have for being consistent,
accurate, and logical in developing economic theory because it is a closed system in
which we are unable to contradict ourselves. So, if we equip ourselves with relevant
mathematical knowledge, we will have access to useful tools that we can apply to
various economic situations. This also enables us to check our economic theory
against situations and data that we observe in the real world.
This review is intended to address the most common mathematical concepts
that students will encounter throughout the book. Any lack of understanding of
these basic mathematical concepts will seriously impede the student’s ability to
move forward with the economics. We cannot stress this enough: simple
observation of students’ struggles in learning managerial economics highlights the
importance of this material. While some of it may seem very basic, it is critical that
both method and theory are fully grasped. So, please read it!
Functions and Functional Notation
The most basic mathematical concept that can help us formulate applied
economic theory is a function. A function matches each element of one set to a
single element of another set. The sets that we most commonly talk about are
variables like price and quantity; they are just groups of several elements. Elements
of the set price are simply different prices that a firm could charge, such as $2 or
$5.50. Similarly, elements of the set quantity are different amounts of a good that a
firm could sell, such as 100 boxes or 22.5 widgets. A function, then, is really nothing
more than a relationship between two sets, such as price and quantity.
There’s some additional terminology that will help us describe functions in more
detail. First, let’s look at a basic function, and then use it to introduce these
additional details. The following function shows a relationship between quantity q
and price p:
q = 4000 − 100 p
What does this function tell us about this particular relationship between price and
quantity? First, we are able to find solutions to this function. A solution to a function
is just a collection of values (elements) that each variable (set) takes on which make
the function true. So, if we let price have a value of 1, we can determine what
quantity would have to be to satisfy the function:
q = 4000 − 100(1)
q = 3900 .
Thus, if price is $1, our function tells us that 3900 units will be sold. This is
essentially what a function does ‐ it describes the relationship between variables
over all their possible values.
We just found out what the quantity would be if we chose a price of $1, but if we
wanted to find out what price would be if we started by choosing a quantity of 3900,
the math would be a little bit more difficult. This is because our function was solved
for q – and there is a reason for this. When a function is solved for a single variable
that occurs only on the left‐hand side, that variable is known as the dependent
variable (quantity, in our example). As the name suggests, the dependent variable
in a function depends on the other variable(s) in the function. Hence, our function
implies that the quantity sold actually depends on the price that is being charged.
This is why it seems more natural to choose a price first, and then solve for the
resulting quantity. The variable that determines the dependent variable (price, in
our example) is known as the independent variable. Again, as the name suggests,
this variable is chosen independently of the function, and helps determine what
value the dependent variable will take on.
So, a function expresses a relationship between a dependent variable and an
independent variable. In order to save us the trouble of denoting which variable is
dependent and which is independent every time we introduce a function, functional
notation expresses these relationships in a very general way. In the above example,
q was the dependent variable and p was the independent variable – so we would say
that q is a function of p. In this way, we have stated there is a relationship between
the two variables, and that since q is a function of (depends on) p, we have also
identified which variable is independent and which is dependent.
While the statement “q is a function of p” is rather terse, there is an even shorter,
more convenient way to represent this relationship: q = f ( p) . This notation is
equivalent to saying that quantity is a function of price, so again we’ve
communicated all the essential information about the variables. More often, rather
than naming the function f (⋅) , we will write something like q = q( p) , which means
(again) that the variable q is determined by a function named q, which depends on
the variable p. So, using functional notation, our previous example can be written as
q( p) = 4000 − 100 p
and can be referred to in a question simply as q(p) (read q of p). This helps to cut
down on the number of letters and symbols we need to remember.
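Functional notation maps directly onto how functions are written in a programming language. The short Python sketch below encodes the example demand function; the names quantity and price are our own, and the second function is simply the same relationship solved algebraically for p (p = 40 − q/100), which is not written out in the text above.

```python
def quantity(p):
    """Demand function from the text: q(p) = 4000 - 100p."""
    return 4000 - 100 * p

def price(q):
    """The same relationship solved for p: p = 40 - q/100."""
    return 40 - q / 100

print(quantity(1))   # 3900
print(price(3900))   # 1.0
```

Here p is the independent variable of quantity, while q is the independent variable of price. Both functions describe the same relationship between the two variables.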
So far, we’ve been talking about functions with one dependent variable and one
independent variable. If we think about what affects the sales of a firm, price is
certainly dominant, but, as is always the case in economics, it is certain that other
variables also impact quantity demanded. Suppose a firm assumed that income per
capita in the market also impacted quantity sold and they wanted to model this
additional relationship. The firm is supposing that quantity depends on income, in
addition to price. So, income (M) is an additional independent variable. Perhaps the
function will look like the following:
q = 4000 − 100 p + 50M .
Thus, if a function has more than two variables, only one will be the dependent
variable, and the others will be the independent variables. The notational difference
is intuitive: here, quantity is a function of price and income, or q = q( p, M ) .
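A function of two independent variables can be sketched the same way. The function name and the income values below are hypothetical and chosen only for illustration; the coefficients are those of the example above.

```python
def quantity_with_income(p, M):
    """Demand from the text: q(p, M) = 4000 - 100p + 50M."""
    return 4000 - 100 * p + 50 * M

# Hypothetical inputs: a price of 1 with income M equal to 0 or 10.
print(quantity_with_income(1, 0))    # 3900
print(quantity_with_income(1, 10))   # 4400

# Holding income fixed, raising the price by one unit lowers quantity by 100.
print(quantity_with_income(2, 10) - quantity_with_income(1, 10))  # -100
```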
Finally, we introduce notation for dealing with multiple prices or quantities.
Suppose a company sells its product in two different states, or sets multiple prices
for different customers, and wants to differentiate between these groups. In order to
identify which price and quantity are associated with which group, we use
subscripts on our variables. So, q_1 may denote the quantity sold in Florida, while q_2
denotes the quantity sold in New York; or p_H could represent the price charged
during high‐demand hours, and p_L the price charged during low‐demand hours. In
the text, subscripts will always be used to differentiate variables according to
location, time, or some other dimension, while superscripts will be reserved for
exponents of variables.
Equation of a Line
Slope
One of the most common mathematical tools we will use throughout the book is
a line and its equation. Not only does the equation of a line lend itself to rigorous
quantitative application, the graphical representation is able to communicate
economic relationships between variables clearly and effectively. It is crucial to
understand the basic components of a line, so let’s start with a simple example using
a generic x/y coordinate plane.
The first thing we notice about this line is that it is sloping down; that is, as we
start from where the line crosses the Y Axis at 8 and increase the x value (move
right), the y value decreases (moves down). When a line looks like this, its slope is
said to be negative. The slope of a line immediately tells us a lot about the line.
Without knowing anything else, we can conclude that there is an inverse
relationship between the y variable and the x variable: as one increases, the other
decreases, and vice versa. This type of relationship will have several implications
when we begin to apply lines to economic theory.
Of course, there are many lines that have negative slopes. For example,
both lines L1 and L2 have negative slopes. How can we differentiate between these
two lines? The slope of a line is actually a specific number that we can calculate. The
definition of the slope of a line, which we will usually refer to generically as b, is the
change in the dependent variable per unit change in the independent variable. That
may sound like a complex definition at first, but think back to what dependent and
independent variables are – they are simply names for the variables whose
relationship we are describing. Since a line is just a relationship between two
variables, the x and y variables are the independent and dependent variables. But
which one is which? By convention, the variable on the Y Axis is the dependent
variable, and the variable on the X Axis is the independent variable. This will be true
when we start introducing different economic variables, but for now, since we’re
using a simple example, y is our dependent variable and x is our independent
variable.
Now that we know which variable is independent and which is dependent, let’s
revisit the definition of slope. The slope of a line, again, is the change in the
dependent variable per unit change in the independent variable. Since we’ve
identified our variables, we can rewrite this definition: The slope of a line is the
change in y per unit change in x. This definition seems much more manageable. In
order to translate it into a mathematical expression, first we need to understand
exactly what it’s saying.
If we had $10 to spend and we bought 2 apples, how would we find out how
much we spent per apple? We would divide the $10 by the 2 apples and find that we
spent $5 per apple. Similarly, to find the change in y per unit change in x, we want to
divide the change in y by the change in x. By dividing, we will have a number that
represents how much y changes each time x increases by a single unit.
How do we find the change in x and the change in y on a graph? The change in a
variable is just the difference in its beginning and ending values over some interval.
So, the change in x is x1 minus x0, where x0 is the starting x‐value and x1 is the ending
x‐value. Likewise, the change in y over some interval of a line is y1 minus y0. Note
that the change in a variable can either be negative or positive, depending on the
beginning and ending points of the interval.
Since we know how to mathematically represent changes in our variables, we
can rewrite the definition of slope again. The slope of a line (b) is the change in y per
unit change in x, or
$$b = \frac{y_1 - y_0}{x_1 - x_0}. \tag{1.1}$$
Remember, the two values for each variable that we use to find the change are over
an interval of the line; when calculating slope, we need to use the same interval
when finding the change for each variable.
We also will often use the notation Δ (read delta) to denote “change in”, which
means we can write slope in an even simpler way:
$$b = \frac{\Delta y}{\Delta x}. \tag{1.2}$$
Now, let’s revisit our first line and calculate its slope.
To make the calculation easy, let’s use the entire line as our interval. We need to
calculate the change in y over the entire line and divide it by the change in x.
Starting at point a, the beginning y value (y0) is 8 and at point b, the ending y value
(y1) is 0. Similarly, x0 is 0 and x1 is 4. This means the slope is
$$b = \frac{\Delta y}{\Delta x} = \frac{0 - 8}{4 - 0} = -2.$$
Let’s apply this number to our original definition of slope. The slope is the change in
y per unit change in x. So each time x increases by one, y decreases by two. This is
what the slope tells us.
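The slope calculation can be sketched in Python as follows. The function name is our own, and the sketch assumes integer coordinates so that Fraction can return an exact result.

```python
from fractions import Fraction

def slope(x0, y0, x1, y1):
    """Change in y per unit change in x between (x0, y0) and (x1, y1).

    Assumes integer coordinates. A vertical line has no defined slope.
    """
    if x1 == x0:
        raise ValueError("slope is undefined when x does not change")
    return Fraction(y1 - y0, x1 - x0)

# The line in the text runs from (0, 8) to (4, 0).
print(slope(0, 8, 4, 0))   # -2

# Reversing the start and end points gives the same slope, because
# both changes flip sign together.
print(slope(4, 0, 0, 8))   # -2
```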
To see how this applies to economics, let’s change the variables in this example to
price and quantity. Usually, we write price on the Y Axis and quantity on the X Axis.
So, price is assumed to be our dependent variable, and quantity our independent
variable.
Because the slope is ‐2, we know if quantity were to increase by one unit, price
would have to decrease by two units. If this line represented a demand function, a
manager would know that in order to sell another unit, he would have to lower his
price by $2. This is a simple example of how lines can be applied to economic theory.
Equation
Once we have calculated the slope of a line, writing an equation for the line is
easy. The slope-intercept form of a line is
y = a + bx , (1.3)
where a is the value of y when x equals 0 (i.e. the y‐intercept), and b is the slope.
Let’s write the equation of our first line.
It is clear that the y‐intercept is 8 because when x is 0, y takes a value of 8. Since we
know the slope of our line is ‐2, the equation for our line is
y = 8 − 2x .
If we wanted, we could solve this line for x, obtaining x = 4 − 0.5y . It is important
to understand that both of these equations represent exactly the same line and thus
the same specific relationship between the variables y and x. However, since x is
now on the left‐hand side, we may be tempted to re‐label which variable is
independent and which is dependent. We will encounter situations similar to this
often throughout the book. For example, whether price determines quantity or
quantity determines price depends on the type of firm, the market, and many other
factors. More often than not, they both determine each other. Because of this
intrinsic quality of the marketplace, variables may not have an exact category, and
will frequently be labeled independent in one scenario and dependent in the next.
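A quick way to see that the two equations describe the same line is to check that each one undoes the other. The Python sketch below uses function names of our own choosing and a few arbitrary x values.

```python
def y_of_x(x):
    """The line solved for y: y = 8 - 2x."""
    return 8 - 2 * x

def x_of_y(y):
    """The same line solved for x: x = 4 - 0.5y."""
    return 4 - 0.5 * y

# Every point generated by one form also satisfies the other form.
for x in [0, 1, 2, 4]:
    y = y_of_x(x)
    print(x, y, x_of_y(y) == x)   # the last column is always True
```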
Solving Two Linear Equations
Often we will want to calculate the intersection of two lines. Any two distinct,
non‐parallel lines cross at a single point:
This point (or solution) is an ordered pair of values that satisfy both equations.
There are several ways to find the solution, but one of the simplest and most
consistent is to use substitution. Given two equations, y = f (x) and y = g(x) , their
intersection (x*, y*) can be found as follows:
i) Set f (x) = g(x) and solve for x*.
ii) Substitute x* into either y = f (x) or y = g(x) and solve for y*.
To illustrate using the method of substitution to solve two linear equations
simultaneously, let’s find where the equations y = 8 − 2x and x = −2 + y intersect.
First, we need to get the second equation in the form y = g(x) by solving for y :
x = −2 + y → y = 2 + x .
Now, as per the above points, we can set 8 − 2x = 2 + x and solve for x*:
8 − 2x = 2 + x
6 = 3x
x* = 2
To find y*, substitute x* into either equation. Let’s use the first one:
y = 8 − 2(2)
y* = 4
Therefore, (2,4) is the solution for the system of equations y = 8 − 2x and
x = −2 + y .
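The two substitution steps can be sketched in a few lines of Python. This is only an illustration, not part of the original notes: the function name intersect_lines and its argument names are assumptions, and both lines are assumed to be written in slope-intercept form y = a + bx.

```python
def intersect_lines(a1, b1, a2, b2):
    """Intersection of y = a1 + b1*x and y = a2 + b2*x (hypothetical helper)."""
    if b1 == b2:
        raise ValueError("parallel lines do not cross at a single point")
    # Step i: set a1 + b1*x = a2 + b2*x and solve for x*.
    x_star = (a2 - a1) / (b1 - b2)
    # Step ii: substitute x* into either equation to get y*.
    y_star = a1 + b1 * x_star
    return x_star, y_star


# The example from the text: y = 8 - 2x and y = 2 + x.
x_star, y_star = intersect_lines(8, -2, 2, 1)
print(x_star, y_star)
```

Substituting x* into the second line instead (a2 + b2*x_star) gives the same y*, which is a quick way to check the answer.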
Area of a Rectangle and a Triangle
We will frequently come across geometric shapes in our graphs that represent
areas of interest. For example, the rectangle in the following graph represents a
firm’s profit:
To calculate how much profit this firm is making, we need to calculate the area of
this rectangle. The area of a rectangle is
A = bh , (1.4)
where b is the base of the rectangle and h is the height. In our example, the base is q
and the height is ( p − c) , so the firm’s total profit is π = ( p − c)q .
Another common geometric shape that we will encounter in economics is a
triangle. The shaded triangle below shows the deadweight loss in a market as a
result of a tax:
Since a right triangle is just a rectangle cut in half along its diagonal, the area of a
triangle is
A = bh/2 , (1.5)
where b is the base of the triangle and h is the height. So, the total deadweight loss
for this market is DWL = (q0 − q1)((p + t) − p)/2 .
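As a quick numerical sketch of the two area formulas, the snippet below computes a profit rectangle and a deadweight-loss triangle. All of the numbers (price, unit cost, quantities, and the tax t) are hypothetical values chosen for illustration, and the function names are assumptions rather than anything defined in the text.

```python
def rectangle_area(base, height):
    """Area of a rectangle: A = b*h."""
    return base * height


def triangle_area(base, height):
    """Area of a triangle: A = b*h/2."""
    return base * height / 2


# Hypothetical profit rectangle: base q, height (p - c).
p, c, q = 10.0, 6.0, 50.0
profit = rectangle_area(q, p - c)

# Hypothetical deadweight-loss triangle: base (q0 - q1), height (p + t) - p.
q0, q1, t = 100.0, 80.0, 2.0
dwl = triangle_area(q0 - q1, (p + t) - p)

print(profit, dwl)
```

Note that the height of the triangle, (p + t) − p, is just the tax t, so the deadweight loss here is half the tax times the drop in quantity.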
Exponents, Exponential Functions, and Logarithms
When taking account of a production facility’s returns to scale, or making
assumptions about the elasticity of a market or compound growth, exponents,
exponential functions, and logarithms can be very useful. Thus, it is important to
know the transformation rules for these concepts, as they will
show up throughout the book.
Let’s start with a basic power function where our independent variable, x, is
raised to some exponent, a. What is x^a? Here a is just the number of times you multiply 1
by x. So x^1 is the same as 1·x, x^2 is 1·x·x, and so on. x^0 means multiply 1 by x zero times, so x^0 = 1.
What is x^(−a)? It is the opposite of multiplying 1 by x a times. Dividing is the opposite
of multiplying, so a is the number of times you divide 1 by x. x^(−1) is 1/x, x^(−2) is 1/(x·x),
and so on. All of the standard “rules” for exponents follow from this definition.
i) x^(a+b) = x^a x^b
ii) (x^a)^b = x^(ab)
iii) x^(−a) = 1/x^a
Now that we understand how the basic power function and exponents work, we
note that the power function can be generalized to
f(x) = kx^b , (1.6)
where k and b are just constants. The constant k in the general power function just
represents a scaling factor.
The general exponential function moves the independent variable into the
exponent of the function and is written as
f(x) = ae^(bx) , (1.7)
where a and b are constants and e is approximately 2.71828. But, what is e, and what
is special about it? It is the base of continuous growth processes. Imagine a
process that doubles itself every time period so that the interest rate is r = 1. At the
end of one period, the future value of 1 unit growing for 1 period is FV = (1 + 1)^1 = 2. Of
course, it is doubling!
Now imagine compounding that 100% per period growth n times over the
period. Then FV = (1 + 1/n)^n. Suppose we compound monthly. FV=2.61304. Daily?
FV=2.71456. Hourly? FV=2.71812. By the minute? 2.71828. With continuous
growth, each little bit of growth starts growing as soon as it emerges. In the limit, as
we approach continuous growth, the FV of one unit growing continuously for one
period of time is e. If we let it grow for 2 periods, we have FV = e·e = e^2. For x periods,
FV = e^x.
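The convergence of compound growth to e is easy to see numerically. The sketch below assumes a 365-day year when counting hours and minutes, and the function name future_value is a label chosen here for illustration.

```python
import math


def future_value(n):
    """Value of 1 unit after one period of 100% growth compounded n times."""
    return (1 + 1 / n) ** n


# Compounding frequencies, assuming a 365-day year.
frequencies = [
    ("monthly", 12),
    ("daily", 365),
    ("hourly", 365 * 24),
    ("by the minute", 365 * 24 * 60),
]

for label, n in frequencies:
    print(f"{label:>14}: n = {n:>6}, FV = {future_value(n):.4f}")
print(f"{'continuous':>14}: e = {math.e:.4f}")
```

Each increase in the compounding frequency raises the future value, but by less and less, and the values never exceed e.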
Suppose it grows for one period at r = 2. We could think of it as 100% growth
occurring twice, so FV = e·e = e^2. If it grows for one period at rate x, we have FV = e^x.
Generally, then, if an initial unit grows continuously at rate r for t periods, FV = e^(rt). If
the initial amount is PV, instead of 1, we just have FV = PV·e^(rt). Thus, the general
exponential function gives the future amount, y, that started as an initial amount, a,
and then grew exponentially, or in a compound fashion, at rate b for each unit
increase in x, so y = ae^(bx) .
So, e has tons of applications with any natural growth process or for modeling
any variable that is affected in an exponential or compound way by another
variable. As it happens, we will not use it a lot in this class, but, we will use it a time
or two. More importantly, you can’t understand natural logs without e, and we will
use natural logs often.
So, what is a natural log? The natural logarithm of y, ln(y), is the power to which
e ≈ 2.71828 must be raised to yield y. So, if x = ln(y) , then y = e^x . So, the natural log
undoes exponentiation, that is, it is the inverse of the exponential function. Fine, but,
intuitively, what is the natural log? Since e^x is the amount into which one unit grows
after growing continuously for one period at rate x, x periods at rate r = 1, or, t
periods at rate r where rt = x, ln(y) is the combination of growth rate and growth
time, rt, needed for one unit to grow continuously into y units. More generally, if y
increases from an initial amount, a, at an exponential rate of b with increases in x,
ln(y/a)/b gives the value of x needed for the initial amount to grow to y units.
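To make this concrete, start from y = ae^(bx) and solve for x: dividing by a and taking logs gives x = ln(y/a)/b, which reduces to ln(y)/b when the initial amount a is 1. The sketch below applies this to a hypothetical doubling-time question; the 5% rate and the function name input_needed are assumptions made for illustration.

```python
import math


def input_needed(y, a, b):
    """Solve y = a * e^(b*x) for x, giving x = ln(y/a) / b."""
    return math.log(y / a) / b


# Hypothetical example: how many periods does 1 unit need to double
# when it grows continuously at a rate of 5% per period?
periods_to_double = input_needed(2, 1, 0.05)

# Round trip: growing 1 unit for that long should give back 2 units.
check = 1 * math.exp(0.05 * periods_to_double)

print(periods_to_double, check)
```

The answer depends only on the ratio y/a, so doubling from 100 to 200 takes exactly as long as doubling from 1 to 2.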
The following properties of natural logarithms follow directly from the
definition of the natural log and from the basic rules for exponents given above.
iv) ln(xy) = ln(x) + ln(y)
v) ln(x/y) = ln(x) − ln(y)
Solving Two NonLinear Equations
Any function whose output does not change in proportion to changes in its input is
called non‐linear. The equations we solved earlier were linear, but
whenever a logarithm or an exponent (other than 1 or 0) is applied
to one of the variables, the equation becomes non‐linear.
Non‐linear equations, like linear equations, can represent demand curves, cost
functions, and many other economic concepts. So, it will often be desirable to solve
two non‐linear equations simultaneously, just as it was with linear equations. The
process for solving a system of non‐linear equations is similar to solving a system of
linear equations, except solving for the final answer is often more tedious.
To demonstrate, let’s solve the following system of non‐linear equations:
2x + y^2 − y = 4
y^2 − 3 = x .
In our previous section dealing with linear equations, we followed a general rule of
solving each equation as a function of x alone, and then setting these two functions
equal to each other. We could do that here, but looking at the second equation, we
can see that it’s already solved for x as a function of y alone. This means we can
substitute it in for x in our first equation, and solve the remaining equation for y:
2(y^2 − 3) + y^2 − y = 4
2y^2 − 6 + y^2 − y = 4
3y^2 − y − 10 = 0
(3y + 5)(y − 2) = 0
3y + 5 = 0 or y − 2 = 0
y = −5/3 or y = 2 .
We can now plug these two solutions into either of the original equations to find the
x values:
(−5/3)^2 − 3 = x or (2)^2 − 3 = x
25/9 − 3 = x or 4 − 3 = x
x = −2/9 or x = 1 .
So our solution set, which consists of two ordered pairs, is {(−2/9, −5/3), (1, 2)}.
Sometimes we will end up with a quadratic equation that does not factor as neatly
as this one. In that case, it may be necessary to use the quadratic formula.
For an equation of the form ax^2 + bx + c = 0 , the quadratic formula tells us the
solution(s) are
x = (−b ± √(b^2 − 4ac)) / (2a) . (1.8)
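Formula (1.8) can be applied directly in code. As a sketch, the snippet below uses it on the quadratic from the example above, 3y^2 − y − 10 = 0, and then recovers x from y^2 − 3 = x. The function name quadratic_roots is an assumption, and the helper only returns real solutions.

```python
import math


def quadratic_roots(a, b, c):
    """Real solutions of a*x^2 + b*x + c = 0, assuming a is not zero."""
    discriminant = b * b - 4 * a * c
    if discriminant < 0:
        return []  # no real solutions
    root = math.sqrt(discriminant)
    # A set removes the duplicate when the discriminant is zero.
    return sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)})


# The quadratic from the worked example: 3y^2 - y - 10 = 0.
ys = quadratic_roots(3, -1, -10)

# Recover x from the second equation, x = y^2 - 3, for each y.
solutions = [(y * y - 3, y) for y in ys]
print(solutions)
```

Plugging each (x, y) pair back into the first equation, 2x + y^2 − y = 4, is a useful check that no algebra slip occurred along the way.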
To illustrate how logarithmic transformations may be necessary, let’s solve the
following system:
ln(q) = 3 + 2 ln( p)
ln(q) = 4.5 + 1.6 ln( p)
Observing that both equations are already solved for ln(q), we can set the right side
of each equation equal to each other:
3 + 2 ln( p) = 4.5 + 1.6 ln( p)
0.4 ln(p) = 1.5
ln(p) = 3.75
Using the definition of the natural log (if x = ln(y), then y = e^x), this gives
p = e^3.75
p ≈ 42.52 .
To find q, plug p into either original equation:
ln(q) = 3 + 2 ln(42.52)
ln(q) ≈ 3 + 7.5
ln(q) ≈ 10.5
q ≈ e^10.5
q ≈ 36,315.5 .
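Because both equations are linear in ln(p) and ln(q), the same substitution method used for straight lines applies once we work in logs. The following sketch reproduces the calculation; the function name solve_log_linear and its argument names are assumptions made for illustration.

```python
import math


def solve_log_linear(c1, e1, c2, e2):
    """Solve ln(q) = c1 + e1*ln(p) and ln(q) = c2 + e2*ln(p) for (p, q)."""
    if e1 == e2:
        raise ValueError("equal coefficients on ln(p): no single solution")
    # Set the right-hand sides equal and solve for ln(p).
    ln_p = (c2 - c1) / (e1 - e2)
    # Substitute ln(p) into the first equation to get ln(q).
    ln_q = c1 + e1 * ln_p
    # Undo the logs with the exponential function.
    return math.exp(ln_p), math.exp(ln_q)


p, q = solve_log_linear(3, 2, 4.5, 1.6)
print(round(p, 2), round(q, 1))
```

Substituting ln(p) into the second equation instead gives the same ln(q), since 4.5 + 1.6(3.75) is also 10.5.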
Definition of Derivative, Relationship to Max/Min
Definition
When dealing with linear equations (lines), we introduced the concept of slope,
and were able to calculate this value with relative ease. The slope of a line revealed
information about the rate at which the two variables changed – for example, when
x increased by 1, y decreased by 2. This was also true for every interval on the line.
How do rates of change apply to non‐linear equations? Above, we defined a non‐
linear equation as a function whose output is not proportional to its input;
graphically, this amounts to any curve that is not a line. Take the following function
f (x) :
It is clear that this function does not have a constant slope, as lines do. The slope
changes based on what part of the function we’re looking at. This is why discussing
rates of change as they apply to non‐linear functions requires a more sophisticated
concept: the derivative.
Before defining what a derivative is, let’s take another look at our function.
Suppose we wanted to find the average rate of change between two points, a and b:
Between these two points, y increases by y1 − y0 ( Δy , the change in y) and x
increases by x1 − x0 ( Δx ). So, the average rate of change is simply
Δy/Δx = (y1 − y0)/(x1 − x0) .
This is merely the slope of the thick dotted line between points a and b. Now, if we
rewrite y1 as y0 + Δy , we can express the average rate of change as
Δy/Δx = ((y0 + Δy) − y0)/Δx .
Using functional notation, this becomes
Δy/Δx = (f(x0 + Δx) − f(x0))/Δx .
But how can we use this to find what the rate of change is at each individual point?
Suppose we moved x1 closer to x0, decreasing Δx . If we continue shrinking Δx
until it is infinitesimal, this is what our average rate of change would look like
between the two points:
The line segment between the two points comes closer and closer to being the line
that is tangent to the curve at point a. The slope of this line segment, then, converges
to the slope of the tangent at a as Δx approaches zero. This leads to the formal
definition of a derivative, denoted dy/dx, as the limit of the average rate of change as
the change in the independent variable approaches 0:
dy/dx = lim_{Δx→0} (f(x0 + Δx) − f(x0))/Δx (1.9)
In essence, the derivative of a function at a point is the rate of change of y with
respect to small changes in x; it captures how fast the curve is changing at that point.
Since the derivative is the slope of the tangent, it is clear that for any non‐linear
function, the derivative will change based on where it is being taken:
In fact, the converse of this is true as well: given any linear function, its
derivative will be constant along the entire function. This is because the derivative
of a linear function is simply its slope.
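The limit in the definition can be watched numerically by shrinking Δx. The sketch below uses f(x) = x^2 at x0 = 3 as a hypothetical example (it is not the function drawn in the figures), where the average rate of change works out to 6 + Δx and so approaches 6.

```python
def average_rate_of_change(f, x0, dx):
    """Slope of the segment from (x0, f(x0)) to (x0 + dx, f(x0 + dx))."""
    return (f(x0 + dx) - f(x0)) / dx


def f(x):
    # Hypothetical non-linear example function.
    return x ** 2


def line(x):
    # The line y = 8 - 2x from earlier in the chapter.
    return 8 - 2 * x


# Shrinking dx moves the average rate of change toward the derivative.
for dx in [1.0, 0.1, 0.01, 0.001]:
    print(dx, average_rate_of_change(f, 3.0, dx))

# For a line, the rate of change is the slope no matter how large dx is.
print(average_rate_of_change(line, 1.0, 0.5))
```

For the curve the answer depends on both the point and the size of Δx; for the line it does not, which is why a single slope describes the whole line.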
Max/Min
The derivative of a function can also help us identify when we are at a “peak” or
a “valley”; that is, when a function is being maximized or minimized. Consider the
following function:
Since the derivative is the slope of the tangent at a specific point on a curve, and
the slope of a horizontal line is zero, it is clear that if the derivative is zero for a
given critical point, that point will be a maximum of the curve. What if a function has
a local minimum, in addition to a local maximum?
We can see from the above figure that the derivative will also be zero at local
minimums. How can we tell whether we’re at a maximum or a minimum? First, we
know a given x value is a candidate for a maximum or a minimum if the
first derivative at that value is zero; this is known as the First Order Condition
(FOC). If the FOC holds, we can check whether the point is a minimum or a
maximum by looking at the curvature of the function at that point, which is given
by the second derivative. If the second derivative is negative, x is a local
maximum, and if the second derivative is positive, x is a local minimum. This is
known as the Second Order Condition (SOC), and by it we can tell whether we are
maximizing a function or minimizing it.
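The two conditions translate into a short checking routine. The sketch below is illustrative only: the function f(x) = x^3 − 3x is a hypothetical example, its derivatives are supplied by hand, and the name classify_critical_point is an assumption. When the second derivative is exactly zero the test does not settle the question, so the routine reports that case separately.

```python
def classify_critical_point(first_derivative, second_derivative, x, tol=1e-9):
    """Apply the FOC and then the SOC at the point x."""
    # First Order Condition: the slope must be zero.
    if abs(first_derivative(x)) > tol:
        return "not a critical point"
    # Second Order Condition: the sign of the curvature decides.
    curvature = second_derivative(x)
    if curvature < 0:
        return "local maximum"
    if curvature > 0:
        return "local minimum"
    return "inconclusive"


# Hypothetical example: f(x) = x^3 - 3x.
def f_prime(x):
    return 3 * x ** 2 - 3


def f_double_prime(x):
    return 6 * x


for x in [-1.0, 0.0, 1.0]:
    print(x, classify_critical_point(f_prime, f_double_prime, x))
```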
Derivative Rules
We’ve introduced the concept of a derivative and how it relates to local
maximums and minimums of a function. But how do we actually calculate the
derivative of a function? Depending on the given function, the rules required to
produce that function’s derivative vary. In this section, we discuss some common
rules for derivatives and how they apply to general cases.
Far and away, we will make the most use of the power rule:
i. y = ax^b ⇒ dy/dx = bax^(b−1)
For example, the derivative of
y = 4x^3
is
dy/dx = 12x^2 .
Three common special cases are:
ii. y = a ⇒ dy/dx = 0
iii. y = a + bx ⇒ dy/dx = b
iv. y = a/x^b ⇒ dy/dx = −ba/x^(b+1)
Based on these rules, the derivative of
y = 3 + 2/x^0.5
is
dy/dx = −1/x^1.5 .
We will use the sum rule and the product rule often. The sum rule is
v. y = f(x) + g(x) ⇒ dy/dx = f'(x) + g'(x) ,
which says that the rate of change in y is the rate of change in f(x) plus the rate of
change in g(x). So, the derivative of
y = 3x + 4x^2
is
dy/dx = 3 + 8x .
The product rule is
vi. y = f(x)g(x) ⇒ dy/dx = f'(x)g(x) + g'(x)f(x) ,
which says that when finding the rate of change in y, the rate of change in f(x) gets
multiplied by g(x) since y depends on the product of f(x) and g(x). Similarly, the
impact of changes in g(x) is multiplied by f(x). The total rate of change, then, is the
sum of the rates of change due to each of these parts. Thus, the derivative of
y = (9 − x^2)2x
is
dy/dx = −2x(2x) + (9 − x^2)2
= −4x^2 + 18 − 2x^2
= −6x^2 + 18 .
A special case is
vii. y = f(x)/g(x) = f(x)(g(x))^(−1) ⇒ dy/dx = (f'(x)g(x) − f(x)g'(x))/g(x)^2 ,
so the derivative of
y = 2x^2/(2x − 1)
is
dy/dx = (4x(2x − 1) − 2x^2(2))/(2x − 1)^2 .
Less often, we will use exponential and logarithmic functions:
viii. y = a ln(x) ⇒ dy/dx = a/x
ix. y = ae^(bx) ⇒ dy/dx = bae^(bx)
So, the derivative of
y = 2 ln(x)
is
dy/dx = 2/x
and the derivative of
y = 3e^(2x)
is
dy/dx = 6e^(2x) .
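A handy way to check a derivative worked out by hand is to compare it with a numerical slope taken over a very small interval. The sketch below does this for the worked examples of rules i through ix. The central-difference helper and the choice of x = 2 as the test point are assumptions made for illustration, not a method used in the text.

```python
import math


def numerical_derivative(f, x, h=1e-6):
    """Central-difference approximation to dy/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Each entry pairs a function with the derivative found by the rules.
examples = {
    "power rule": (lambda x: 4 * x ** 3, lambda x: 12 * x ** 2),
    "special case iv": (lambda x: 3 + 2 / x ** 0.5, lambda x: -1 / x ** 1.5),
    "sum rule": (lambda x: 3 * x + 4 * x ** 2, lambda x: 3 + 8 * x),
    "product rule": (lambda x: (9 - x ** 2) * 2 * x, lambda x: -6 * x ** 2 + 18),
    "quotient rule": (
        lambda x: 2 * x ** 2 / (2 * x - 1),
        lambda x: (4 * x * (2 * x - 1) - 2 * x ** 2 * 2) / (2 * x - 1) ** 2,
    ),
    "log rule": (lambda x: 2 * math.log(x), lambda x: 2 / x),
    "exponential rule": (lambda x: 3 * math.exp(2 * x), lambda x: 6 * math.exp(2 * x)),
}

x = 2.0  # arbitrary test point where every example is defined
for name, (f, df) in examples.items():
    print(f"{name:>16}: numerical = {numerical_derivative(f, x):.4f}, rule = {df(x):.4f}")
```

If the two columns disagree by more than rounding error, the hand calculation (or the typing) has gone wrong somewhere.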
What if one variable depends on another that is itself a function of a third variable? For
example, cost depends on quantity, but quantity depends on price. For this, we
need the chain rule. If z = g(y) and y = f(x) ,
x. z = g(f(x)) ⇒ dz/dx = g'(f(x)) f'(x) ,
which says that the rate of change in z with respect to x is the rate of change in z
with respect to y, times the rate of change in y with respect to x. For example, the
derivative of
y = (4 − 0.5x)^2
is, using first the power rule and second the chain rule,
dy/dx = 2(4 − 0.5x)(−0.5) = −(4 − 0.5x) .
Partial Derivatives
Up to this point, we have talked about differentiation in the context of two
variables, one dependent and one independent. To measure the rate of change of the
dependent variable with changes in the independent variable, we can use the rules
described above. But, most interesting phenomena in economics depend on more
than one variable. For instance, a manager may find that the quantity his firm is able
to sell depends not only on price, but also on income. When a function has more
than one independent variable, the rules for differentiation are the same, but the
notation is slightly different.
Take, for example, the following function:
y = 5x + 3z
This function has one dependent variable (y) and two independent variables (x and
z). Since the rules for differentiation listed above apply to equations with two
variables only, we cannot apply them directly here. Recall, however, what a
derivative measures – the rate of change between two variables. If we can mimic an
equation that has only one independent variable by holding the second one
constant, we’ll have an equation with two variables and thus we’ll be able to apply
the rules of differentiation.
So, if we were to hold z constant in the above equation, we could take the
derivative of y with respect to x only. Since we are taking the derivative with respect
to only one variable at a time, this is called the partial derivative of y with respect to
x, and is denoted ∂y/∂x.
When finding a partial derivative, we are looking for the rate of change of the
dependent variable for small changes in only one of the independent variables. Thus,
the other independent variables are treated as constants in this process (they are
not changing). Partial derivatives are then found by applying the standard rules for
differentiation, treating the other variables just as constants are treated. Two simple
examples follow:
i. y = ax + bz ⇒ ∂y/∂x = a and ∂y/∂z = b
ii. y = ax^b z^c ⇒ ∂y/∂x = abx^(b−1) z^c and ∂y/∂z = acx^b z^(c−1)
Based on these rules, the partial derivatives of
y = 3x^2 + 4xz + z
are
∂y/∂x = 6x + 4z and ∂y/∂z = 4x + 1 .
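Holding the other variable constant is exactly what a numerical check does: nudge one input, leave the rest alone, and measure the change in y. The sketch below applies this to the example above at the hypothetical point x = 2, z = 5; the helper name partial_derivative is an assumption.

```python
def partial_derivative(f, point, index, h=1e-6):
    """Central-difference estimate of the partial derivative of f with
    respect to the input at position `index`, holding the others fixed."""
    up = list(point)
    down = list(point)
    up[index] += h
    down[index] -= h
    return (f(*up) - f(*down)) / (2 * h)


def y(x, z):
    # The example from the text: y = 3x^2 + 4xz + z.
    return 3 * x ** 2 + 4 * x * z + z


point = (2.0, 5.0)  # hypothetical values of x and z
dy_dx = partial_derivative(y, point, 0)  # compare with 6x + 4z
dy_dz = partial_derivative(y, point, 1)  # compare with 4x + 1
print(dy_dx, dy_dz)
```

At this point the rules give 6(2) + 4(5) = 32 and 4(2) + 1 = 9, which the numerical estimates should match up to rounding error.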
Chapter 2
Cost, Demand, and Profit Maximization
As discussed in Chapter 1, profit is simply the difference between revenue and
cost. Revenues are determined on the demand side and costs are determined on the
production side. Therefore, before studying profit‐maximizing decisions, we need to
know a bit about the way economists model cost and demand. This chapter will
touch on those models briefly, and then turn to profit maximization. We will
consider models of demand and cost in more detail in later chapters.
Cost, Its Determinants, and Marginal Cost
In order to produce products to sell to its customers, a firm must procure inputs.
Expenditures for these inputs are costs. A firm’s cost function, C(q), represents the
minimum possible total cost of producing q units of output. Thus, in using a cost
function to model cost, we are assuming two things. First, we assume there is no
simple waste of inputs. In other words, we assume that the firm is able to use all of
its facilities and labor exhaustively. Second, we assume the firm chooses the most
efficient production technique, or combination of inputs, for producing that level of
output. So, a firm has already figured out the optimum amount of its different input
types (labor, plants, etc.) and uses them accordingly. How a firm makes this decision
about how much of each input to use is based on the prices of each input, what
technology is available, how much time is available for production, and myriad other
factors. Optimizing inputs will be discussed in greater detail in Chapter 6.
In general, the notation for a firm’s cost function is C = C(q; w, r, z) , which says
cost depends on the quantity produced, q; the wage rate, w, which is what a firm pays its
labor; the interest rate or rental rate, r, which is what it pays for investments in plants and
equipment (capital); and any other variables that affect cost, lumped into the single
variable z.
Sometimes it is important, or just convenient, to distinguish between fixed costs
and variable costs. Fixed costs are those that do not depend on the output of the
firm. Fixed costs are inherently a short run concept. Over a short time span, costs
such as the lease on office space or the payment on a loan for plant construction, are
fixed, regardless of what level of output is chosen. However, with more time, the
lease need not be renewed or the plant can be sold or expanded. So, given enough
time, there are no completely fixed costs. Variable costs are the costs that increase
as the firm increases its output.
The cost of producing an additional unit is the firm’s marginal cost. In other
words, it is the rate at which total cost changes. Thus, marginal cost is defined as the
derivative of total cost, or
MC = dC/dq . (2.1)
Upon taking the derivative of the total cost function, it is clear that the fixed
component of cost will fall out since it never changes. Thus, marginal cost may be
viewed as either the change in total cost or the change in total variable cost when
one more unit is produced. For this reason, marginal cost may seem like the same
thing as variable cost, or, perhaps, variable cost per unit of output; this is a
misconception. Marginal cost is simply the cost of producing an additional unit of
output. It does follow immediately from these definitions that the sum of the
marginal costs of all the units that have been produced is equal to total variable cost.
The graphs to the right show a total cost curve
and a marginal cost curve. These have the “typical”
textbook shape. First, cost rises with output at a
decreasing rate, meaning marginal cost is falling.
This might reflect increasing opportunities for
specialization. Then, at some point, some form of
diminishing returns sets in and marginal cost
starts to rise, meaning total costs increase at an
increasing rate. However, cost curves need not
always have this “typical” textbook shape.
When we actually want to estimate a cost
function, or, to specify one for a practice,
homework, or exam problem, we have to be more
specific mathematically. In practice, we will use
four alternative functional forms to approximate
cost functions. They are:
1) C(q) = F + cq where c>0 and F≥0 and for
which MC = c ,
2) C(q) = F + cq^d where c>0, d>0, and F≥0 and
for which MC = cdq^(d−1) ,
3) C(q) = F + aq + bq^2 where a>0, b>0, and F≥0 and for which MC = a + 2bq , and
4) C(q) = F + aq + bq^2 + cq^3 where a>0, b<0, c>0 and F≥0 and for which
MC = a + 2bq + 3cq^2 .
The first approximation is perhaps the simplest. It simply assumes total cost is
the sum of a fixed component, F, and a variable cost component that is constant per
unit produced at c. The second allows for either increasing or decreasing marginal
cost, depending on whether d>1 or d<1. If d=1, we just get back the first
approximation, which is obviously just a special case of the second. The third
approximation gives a linear and increasing marginal cost. Finally, for the right
parameter values, the fourth approximation gives the “typical” textbook case where
marginal cost first falls then rises. These are all just approximations to be used in
models. Which is more appropriate to use depends on the particulars of the
situation under study, and is a matter to be decided using both data and careful
judgment.
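To see the “typical” shape emerge from the fourth approximation, it helps to plug in numbers. The parameter values below (F = 100, a = 10, b = −2, c = 0.5) are hypothetical and chosen only so that marginal cost first falls and then rises; the function names are likewise assumptions.

```python
def cubic_cost(q, F, a, b, c):
    """Total cost C(q) = F + a*q + b*q^2 + c*q^3."""
    return F + a * q + b * q ** 2 + c * q ** 3


def cubic_marginal_cost(q, a, b, c):
    """Marginal cost MC = a + 2*b*q + 3*c*q^2 (the fixed cost F drops out)."""
    return a + 2 * b * q + 3 * c * q ** 2


# Hypothetical parameter values with a > 0, b < 0, c > 0.
F, a, b, c = 100.0, 10.0, -2.0, 0.5

# Marginal cost stops falling where its own derivative, 2b + 6cq, is zero.
q_min_mc = -b / (3 * c)

for q in [0.0, 1.0, q_min_mc, 2.0, 3.0, 4.0]:
    print(f"q = {q:.2f}, C = {cubic_cost(q, F, a, b, c):.2f}, "
          f"MC = {cubic_marginal_cost(q, a, b, c):.2f}")
```

Changing F shifts every total cost figure by the same amount but leaves the marginal cost column untouched, which is the sense in which fixed cost falls out of marginal cost.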
Demand and Its Determinants, Inverse Demand
When a firm sets a price for its product, it is ultimately making a decision
about how many consumers will buy the product; this is because some customers
value the product enough to pay a high price and others are only willing to pay a low
price. The quantity consumers will purchase, then, is a function of the price charged,
and is represented as
q = q(p) .
In general, if the quantity is some “total” quantity, such as total output of an
industry, it will be denoted as Q; otherwise, the quantity will represent that of an
individual firm and will be denoted as q.
Many factors besides price influence consumers’ buying decisions. Consider the
decision to buy a car. Your annual income will weigh heavily on which car you
choose, as will the type of terrain that surrounds your home city, the prices of other
cars in the same category as the one you are considering, and countless other
factors. For some purposes, for example when we wish to focus on how to choose a
profit‐maximizing price or production level, it is convenient to ignore such factors.
In other cases, we may want to take explicit note of the impact of such other factors.
The notational differences are intuitive: if quantity depends on price, as well as
income (m), the prices of substitutes and complements (pS and pC), and the size of
the market or the number of consumers (n), and we let z represent other variables
that affect demand but aren’t being explicitly measured, then we express quantity as
a function of all of these variables:
q = q(p, m, pS, pC, n, z) .
Later, in Chapter 7, we will explore the theory underlying demand curves in more
detail. For now, we focus on understanding and using the simple notion that the
quantity demanded can be expressed as a function of these variables.
Representing demand in this way implies that quantity depends on price, among
other things. Since economics is an application of science to the real world, it is often
the case that the variables within a system determine each other, as opposed to
being either exclusively dependent or independent. For this reason, it is sometimes
sensible (or simply more convenient) to represent the price that’s being charged as
a function of the quantity the firm wants to sell. When the relationship is expressed
this way, it is called inverse demand.
Consistent with our earlier notation for demand, if we want to account for the
dependence of inverse demand on several variables, it could be represented as
p = p (q, m, pS , pC , n, z ) .
If we want to focus on the relationship between price and quantity alone and
suppress the other variables, it is represented as
p = p(q) .
When we illustrate demand curves with
price on the vertical axis and quantity on
the horizontal, as in the figure to the right,
we are actually drawing an inverse demand
curve. In the figure, the demand curves
follow the law of demand. That is, at higher
prices, the quantity demanded is lower.
The shift from demand curve d0 to demand
curve d1 illustrates the impact of one of the
other variables that affect demand. The
increase in demand might be due to an
increase in income (assuming the good is
normal), an increase in the number of
consumers, an increase in the price of a substitute, or a decrease in the price of a
complement. Any of those changes would cause the quantity demanded to increase
at any given price; thus, the whole curve shifts right.
Measuring the Sensitivity of Quantity Demanded to Price
We know that if the price of a product changes, it affects the quantity demanded.
More specifically, the law of demand tells us that if price falls, quantity demanded
rises, and vice versa. But what about the rate at which demand rises and falls? The
slope of the demand curve tells us how fast quantity changes with respect to price. If
we are given a demand curve q(p), the slope is simply the derivative of quantity with
respect to price:
dq(p)/dp < 0. (2.2)
Consider the graph of a demand curve to the
right. Every time price falls by one dollar, quantity
demanded increases by two. This change in
quantity on a per‐dollar basis is defined as the
slope of the demand curve. Since price and quantity
are inversely related on a demand curve, this value
will always be negative. In our example, the slope
of the demand curve is ‐2. Since demand is linear,
the slope is the same all along the curve. If demand
were non‐linear, the slope would change
depending on what price were being charged, but it
would still always be a negative value.
Notice that our calculation of a slope of ‐2 is inconsistent with the definition of
the slope of a line that you are probably familiar with. If you take price to be the “y”
axis and quantity to be the “x” axis and define slope as rise over run, or the change
in y divided by the change in x, you would find a slope of ‐½. Referring back to the
graph, if we were to look instead at a single unit increase in quantity, we could infer
that price falls by $0.50. This change in price per unit increase in quantity is the slope
of the inverse demand curve, which is what is shown in the figure above. It is in
direct accord with our understanding of the slope of the line in the figure above: the
slope of the inverse demand curve is ‐½.
The slope of the inverse demand curve tells us how fast price changes with
respect to quantity. If we are given an inverse demand curve p(q), the slope of the
inverse demand is just the derivative of price with respect to quantity:
dp(q)/dq < 0. (2.3)
Again, the slope of the inverse demand curve can change depending on what price is
being charged, but it will always be negative.
It is often desirable to compare demand responsiveness across several firms,
regions, nations, or even time periods. Yet, the units in which both prices and
quantities are quoted vary. For example, prices may be in dollars, cents, yen, or
euros, and quantities may be in ounces, pounds, grams, dozens, hundreds, or
thousands. Since slope depends on the units in which both price and quantity are
quoted, it can be an inconvenient way to summarize the price sensitivity of demand.
Elasticity measures demand responsiveness in percentage terms, making it
units‐free. Because elasticity is units‐free, it can be easily used to compare demand
across multiple firms, industries, locations, or time periods. The elasticity of demand
with respect to price, denoted η (eta), is defined as the percentage change in
quantity relative to the percentage change in price, or
η = %Δq/%Δp = (Δq/q)/(Δp/p) = (Δq/Δp)(p/q) . (2.4)
In equation (2.4), Δq is the change in quantity and Δp is the change in price.
Elasticity can be measured over an interval of prices and quantities, or at a single
price and quantity. If it is measured over an interval, Δq and Δp are the differences
between the endpoints of the interval (although we would then need to decide
which point along that interval to use for p and q). If it is measured at a single point,
Δq and Δp are assumed to be infinitesimal, so this fraction becomes the rate of
change of quantity with respect to price at that point, which is simply the derivative.
Elasticity then becomes
η = (dq/dp)(p/q) . (2.5)
This is simply the slope of the demand curve times price divided by quantity.
Since there is an inverse relationship between price and quantity on a demand
curve, the first term in the equation for elasticity will always be negative, and thus
elasticity of demand will always be negative. In general, the more negative elasticity
of demand is, the more responsive customers are to changes in price. When
elasticity of demand is between 0 and ‐1, demand is considered to be inelastic; when
it is ‐1 exactly, demand is said to be unitary elastic; when elasticity is less than ‐1,
demand is said to be elastic.
Elasticity is generally not constant over a whole demand curve. For a linear
demand curve with constant slope, it is clear from equation (2.5) that elasticity is
large in magnitude at high prices, where p/q is large, and small in magnitude at low
prices, where p/q is small. This is often the case even when demand is
not linear. Intuitively, when prices rise, consumers tend to become more price‐
sensitive. At very low prices, consumers tend to be relatively insensitive to price
changes.
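The pattern is easy to verify with numbers. The sketch below assumes a hypothetical linear demand curve q = 100 − 2p, so the slope dq/dp is −2 everywhere, and evaluates equation (2.5) at three prices. The parameter names A and B and the function names are illustrative assumptions.

```python
def linear_demand(p, A, B):
    """Quantity demanded on the hypothetical linear curve q = A - B*p."""
    return A - B * p


def point_elasticity(p, A, B):
    """Elasticity (dq/dp)*(p/q), where dq/dp = -B on a linear curve."""
    q = linear_demand(p, A, B)
    if q <= 0:
        raise ValueError("price is too high: quantity demanded is not positive")
    return -B * p / q


def classify(eta):
    """Label demand using the cutoffs given in the text."""
    if eta < -1:
        return "elastic"
    if eta == -1:
        return "unitary elastic"
    return "inelastic"


A, B = 100.0, 2.0  # hypothetical intercept and slope (in absolute value)
for p in [10.0, 25.0, 40.0]:
    eta = point_elasticity(p, A, B)
    print(f"p = {p:.0f}, q = {linear_demand(p, A, B):.0f}, "
          f"elasticity = {eta:.2f} ({classify(eta)})")
```

The slope never changes, yet demand moves from inelastic to unitary elastic to elastic as price rises, because the ratio p/q keeps growing.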
Demand Approximations
Economic theory only says demand slopes down – it says nothing directly about
the shape of the demand curve. For applications (and for writing homework and
exam problems) we need to specify more about the demand relationship. This often
involves simply assuming that a particular shape is a good enough approximation of
the shape of the true underlying demand curve and then choosing the parameters of
the approximation to fit the actual demand curve as closely as possible. In practice,
one of two assumptions is almost always made about the shape of the demand curve
– either slope is assumed to be constant or else elasticity is assumed to be constant.
More precisely, it is assumed that either slope or elasticity is relatively constant
over the range of prices under consideration.
Linear Demand Approximations
If we assume that the slope of the demand curve is constant, we are using a
linear approximation to model demand. The name linear demand comes from the
assumption that each variable that is being measured (price, income, etc.) affects
demand at a constant (linear) rate, regardless of how big or small the variable is. For
example, if an increase in per capita income of $5,000 leads to an increase in
quantity demanded of 30 units at a given level of price and income, the same is true
at any other level of price or income.
In a linear demand approximation, the coefficient of each variable represents the
change in quantity demanded per unit change in the variable. A generic linear
demand representation looks like the following:
qD = b0 + bp p + bM M + bS pS + bC pC + bN N + bZ Z + ε . (2.6)
The coefficients, or parameters, are chosen to fit the observed data on demand as
closely as possible. We will cover that in the next chapter. For now, we simply want
to introduce the idea of using a straight line to approximate demand and focus on
how to use such a model once we have it.
In equation (2.6) p is price, M is income, pS and pC are the prices of substitutes
and complements, N is the number of customers (or size of the market), Z
represents any other factors that are important in a particular situation, and ε
represents a random error term. The error term encompasses all the factors that
cannot be readily understood and measured. Note that bp will always be negative
because price and quantity are inversely related. Recall that for a specific point on
the demand curve, elasticity is
η = (dq/dp)(p/q)
thus, elasticity is
η = bp (p/q) . (2.7)
This elasticity is large in absolute value at high prices and approaches 0 as price
decreases, as described previously.
When focusing on setting price or choosing production levels, it is simplest to
represent quantity demanded as a function of price only. To do so we will lump all of
the other variables into a single coefficient A, and let B, which is positive, be the
absolute value of the slope. This single‐variable linear demand function is then
qD = A − Bp . (2.8)
Note that the slope of the demand curve is still negative because –B is negative. It is
important to understand that by simplifying the demand approximation to a
function of a single variable, all of the other effects (income, price of substitutes,
etc.) are being represented in the intercept A.
It is also often more convenient to rearrange the demand curve and deal with
inverse demand. Doing so gives
q = A − Bp
q − A = −Bp
(1/(−B))(−Bp) = (1/(−B))(q − A)
p = A/B − (1/B)q
So, inverse demand has a positive intercept and a negative slope. It is easier to write
this as
p = a − bq (2.9)
where a=A/B and b=1/B.
Example: Linear Demand
Suppose demand is approximated by q = 10000 − 250 p . What is the
interpretation of the slope, what is inverse demand, and what are quantity and
elasticity if price is 10?
Solution: The slope of ‐250 means if price increases by 1 unit, quantity falls by
250 units. To get inverse demand, just solve for p in terms of q.
q = 10000 − 250 p
250 p = 10000 − q
p = 10000/250 − (1/250)q
p = 40 − 0.004q
To find quantity and elasticity if price is 10, plug 10 into the demand function
to find q. Then plug slope, price, and quantity into the definition of elasticity.
q = 10000 − 250(10) = 7500
η = bp (p/q) = −250 (10/7500) = −1/3
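The arithmetic in this example can be checked with a short script. The Python sketch below is only an illustration: the function names quantity_demanded, inverse_demand, and point_elasticity are hypothetical names chosen here, and the coefficients A = 10000 and B = 250 are those of the example.

```python
# Linear demand q = A - B*p, using the coefficients from the example.
A = 10000.0  # intercept: quantity demanded at a price of 0
B = 250.0    # absolute value of the slope dq/dp


def quantity_demanded(p):
    """Quantity demanded at price p under q = A - B*p."""
    return A - B * p


def inverse_demand(q):
    """Price at which q units are demanded: p = A/B - (1/B)*q."""
    return A / B - q / B


def point_elasticity(p):
    """Point elasticity (dq/dp)*(p/q), where dq/dp = -B for linear demand."""
    return -B * p / quantity_demanded(p)


q_at_10 = quantity_demanded(10)
eta_at_10 = point_elasticity(10)
print("quantity at p = 10:", q_at_10)
print("elasticity at p = 10:", eta_at_10)
print("inverse demand intercept:", inverse_demand(0))
```

Evaluating point_elasticity at several prices is a quick way to see that elasticity is not constant along a linear demand curve.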
Log linear (constant elasticity) Demand Approximations
Another approach when approximating demand is to assume the elasticity of
demand is constant. This is called a log‐linear demand model – the reason why will
become apparent later. By assuming constant elasticity, log linear demand is making
a presumption about how consumers’ buying habits change as demand
determinants change, on a percentage basis. For instance, if a firm assumed that an
increase in per capita income of 10% would result in a 5% increase in quantity
demanded, regardless of the current level of income or other demand determinants,
it would be assuming a constant income elasticity of demand of 0.5.
A generic constant elasticity demand approximation looks like the following,
qD = e^(b0+ε) p^bp M^bM pS^bS pC^bC N^bN Z^bZ , (2.10)
where the exponents on the demand determinants turn out to be the constant
demand elasticities – again the reason why will become apparent soon. Just as in the
linear demand model above, ε is a random error term and bp is negative. If we were
to take the logarithm of both sides of this approximation and apply the laws of
logarithms, we would obtain
ln(qD) = b0 + bp ln(p) + bS ln(pS) + bC ln(pC)
         + bM ln(M) + bN ln(N) + bZ ln(Z) + ε . (2.11)
Note that this approximation is linear in the natural logs of the variables. That is,
treating the logs of the original variables as the dependent and independent
variables, we have a linear equation, thus the name log‐linear demand model.
If we wanted a simplified log linear approximation with price as the only
variable, it would look like
q = A p^(−B) (2.12)
or
ln ( q ) = ln ( A ) − B ln ( p ) , (2.13)
where again A is just standing in for all the other variables and where we are again
letting B represent the absolute value of the coefficient on price, which is an
exponent in the demand curve in this case. In this simplified form, we can easily look
at both the slope and the elasticity of the demand approximation. The slope is the
derivative of quantity with respect to price,
dq/dp = −B A p^(−B−1) .
This can also be written as
dq/dp = −B (A p^(−B) / p) .
Finally, noting that q = A p^(−B), this is just
dq/dp = −B (q/p) . (2.14)
With that, it is easy to find elasticity using its definition from equation (2.5)
η = (dq/dp)(p/q)
  = −B (q/p)(p/q)
  = −B .
Thus, with a log‐linear demand approximation, the elasticities of demand with
respect to any of the independent variables are constant and equal to the coefficient
on that variable.
Example: Log Linear Demand
Suppose demand is approximated by q = 20000 p^(−2) . What is the elasticity of
demand, what quantity is demanded at a price of $20, what is the slope at that
price, and what is inverse demand?
Solution: The elasticity is given by the exponent, ‐2. To find quantity demanded
at a price of $20, just plug 20 into the demand curve.
q = 20000(20)^(−2)
  = 20000/400
  = 50
To find the slope at that point, take the derivative of demand and plug in the
values for the given point.
dq/dp = (−2) 20000 p^(−3)
      = −40000/20^3
      = −5
Finally, to get inverse demand, rearrange the demand curve to express price in
terms of quantity.
q = 20000/p^2
First, multiply both sides by p^2 and divide both sides by q.
p^2 = 20000/q
Second, raise both sides to the ½ power to isolate price.
(p^2)^(1/2) = (20000/q)^(1/2)
p ≈ 141.42 q^(−0.5)
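As a check on this example, the following Python sketch evaluates the same constant elasticity demand curve numerically. The function names are hypothetical and chosen only for illustration; A = 20000 and B = 2 come from the example.

```python
import math

# Constant elasticity demand q = A * p**(-B), as in the example.
A = 20000.0  # stands in for all other demand determinants
B = 2.0      # absolute value of the price exponent


def quantity_demanded(p):
    """Quantity demanded at price p."""
    return A * p ** (-B)


def slope(p):
    """Derivative dq/dp = -B * A * p**(-B - 1)."""
    return -B * A * p ** (-B - 1)


def elasticity(p):
    """Point elasticity (dq/dp)*(p/q); should equal -B at every price."""
    return slope(p) * p / quantity_demanded(p)


def inverse_demand(q):
    """Price at which q units are demanded: p = (A/q)**(1/B)."""
    return (A / q) ** (1 / B)


print("quantity at p = 20:", quantity_demanded(20))
print("slope at p = 20:", slope(20))
print("elasticity at p = 5 and p = 20:", elasticity(5), elasticity(20))
print("inverse demand coefficient:", math.sqrt(A))
```

Unlike the linear case, the computed elasticity is the same at every price, which is the defining feature of this approximation.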
Revenue and Marginal Revenue
A firm’s total revenue is simply the price it charges times the quantity it sells.
Revenue = price×quantity , or
R = pq . (2.15)
Of course, the firm is not free to choose both price and quantity. If the firm is so
small that its actions can have no noticeable effect on the market price, it is said to
be a price taker, or perfectly competitive. In that case, the firm may choose only how
much to sell at the going market price.
If the firm is large relative to its product market, its actions will have non‐
negligible effects on the market price. In that case, the firm is said to be a price
maker, or to possess a degree of market power. Still, the firm is not free to choose
both price and quantity because it is constrained by consumer demand. If the firm
wants to sell a lot, it must set a lower price. If it wants to charge a high price, it must
resign itself to lower sales. So, the firm can choose price or quantity, but not both.
When demand slopes down, if the firm sells so many units as to drive the price
all the way to 0, revenue is 0. Similarly, revenue is 0 if the firm sells 0 units. In
between, revenue is positive, reaching a maximum possible value at some
intermediate quantity. In general, selling additional output has two offsetting effects
on revenue. First, the additional sale brings in new revenue. Second, in order to sell
the additional unit, price must be set lower. This reduces revenue.
The graph to the right illustrates this
important point. Suppose we begin at a price of
p1 and sell a quantity of q1. In order to sell more
units, we lower our price to p2, at which price we
sell q2 units. What are the effects on revenue? For
each unit we initially sell, we receive less money
per unit, due to the decrease in price. This loss in
revenue is represented by the rectangle labeled
“LOSS”. However, our lower price causes more
people to want to buy our product, so we sell
more units. The increase in unit sales adds to our
revenue and this increase is represented by the rectangle labeled “GAIN”.
If the gains outweigh the losses, selling the additional units adds to the firm’s
total revenue. If the losses outweigh the gains, the change in revenue from selling
the additional units is negative. This change in total revenue from selling a single
additional unit is called marginal revenue. In other words, it’s the rate at which
total revenue changes when more output is sold. Starting from a high price and low
quantity, marginal revenue is positive – the direct increase in revenue from the
extra unit sold outweighs the decrease in price. At high quantities and low prices,
marginal revenue is negative – the direct increase in revenue from the additional
unit is overwhelmed by the decrease in price.
The top panel of the figure to the right depicts
the relationship between revenue and quantity
when demand slopes down, R(q). The bottom
panel shows marginal revenue, MR(q). When
quantity is 0, so is revenue. When quantity is so
high as to drive price to 0, revenue is again 0. At
low quantities, revenue is rising with additional
sales, so marginal revenue is positive. At high
quantities, revenue falls with additional sales and
marginal revenue is negative. At the maximum
revenue, marginal revenue exactly equals 0 – the
direct gain from selling an additional unit is exactly
offset by the decline in price needed to generate a
higher sales total.
To model revenue mathematically, starting
from an inverse demand curve, p(q), total revenue
would be price times quantity
R = p(q)q . (2.16)
Marginal revenue is the derivative of total revenue
with respect to quantity. Using the product rule, this is
MR = dR/dq = p + (dp/dq) q . (2.17)
It is important to really understand how equation (2.17) tells, in part of one line,
the whole story about marginal revenue that took three paragraphs of words
above. When another unit is sold, the direct effect is to bring in additional revenue
equal to the price charged for that unit, p, the first term in the derivative. The
indirect effect is to lower price by the sensitivity of price to quantity sold, dp/dq.
That decrease in price is applied to all units that are to be sold, q. So, we sell an
additional unit at price p but receive a lower price on all q units sold.
It may initially seem that marginal revenue is always simply the price that the
firm sets for its product. However, looking at the above equation, it is clear that this
is only true if dp/dq is 0 ‐ that is if the firm’s sales have no effect on price. That
means the firm is a price taker and the inverse demand curve faced by the firm is a
horizontal line at whatever the going market price is. Markets where all firms are
price takers are called perfectly competitive. Most firms have some control over the
price they charge for their products though, so dp/dq will be negative and marginal
revenue will be less than price. However, in cases where the individual firms have
only a little control over price, it is often more convenient, and realistic enough, to
just treat them as price takers and ignore their tiny impact on market prices.
Example: Revenue and Marginal Revenue
Suppose inverse demand is approximated by p = 7 − 0.3q . What are the
equations for revenue and marginal revenue, what quantity and price would
maximize revenue, and what is the maximum value of revenue?
Solution: Revenue is price times quantity, so
R = ( 7 − 0.3q ) q .
Marginal revenue is the derivative of revenue with respect to q. Using the
product rule,
MR = 7 − 0.3q − 0.3q = 7 − 0.6q .
(In general, with linear demand, marginal revenue has the same intercept and
double the slope of the inverse demand curve. You should prove that to yourself
using the general form of an inverse demand curve, p = a − bq .)
Revenue is maximized where marginal revenue is 0, so
MR = 0
7 − 0.6q = 0
0.6q = 7
q = 70/6
p = 7 − 0.3q
  = 7 − (3/10)(70/6)
  = 42/6 − 21/6 = 21/6
R = (21/6)(70/6) ≈ 40.83
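The same steps can be verified numerically. This Python sketch assumes the linear inverse demand p = a − bq from the example (a = 7, b = 0.3) and uses the result that marginal revenue is a − 2bq; the function names are hypothetical.

```python
# Linear inverse demand p = a - b*q from the example.
a = 7.0
b = 0.3


def price(q):
    """Inverse demand: the price at which q units can be sold."""
    return a - b * q


def revenue(q):
    """Total revenue R = p(q) * q."""
    return price(q) * q


def marginal_revenue(q):
    """MR = a - 2*b*q: same intercept, double the slope of inverse demand."""
    return a - 2 * b * q


# Revenue is maximized where MR = 0, i.e. q = a / (2b).
q_star = a / (2 * b)
p_star = price(q_star)
r_star = revenue(q_star)
print("revenue-maximizing quantity:", q_star)
print("price:", p_star)
print("maximum revenue:", r_star)
```

Comparing revenue(q_star) with revenue at nearby quantities confirms that revenue falls on either side of the point where marginal revenue is 0.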
Marginal Revenue and Elasticity
Recall that, from the definition of marginal revenue,
dR/dq = p + (dp/dq) q .
If we multiply the second term by (p/p) and rearrange, we obtain
dR/dq = p + (dp/dq) q (p/p)
dR/dq = p + (dp/dq)(q/p) p .
Observing that (dp/dq)(q/p) is the inverse of elasticity, we can write
dR/dq = p + (1/η) p
dR/dq = p (1 + 1/η) . (2.18)
This equation describes the relationship between marginal revenue and
elasticity. If the elasticity of demand exceeds 1 in absolute value so that demand is
elastic, marginal revenue is positive. That is because the increase in quantity sold
will swamp the decline in price. As elasticity increases in absolute value, 1/η
approaches zero, so marginal revenue approaches price. As elasticity becomes very
large, the marginal revenue from the next unit sold will be almost the same as the
current price.
Notice that if demand is inelastic (elasticity is between 0 and ‐1), marginal
revenue will be negative. This occurs because lowering price when demand is
inelastic will cut revenue. Since more units sold means higher cost, it will also cut
profit. In other words, if you truly believe demand is inelastic, you ought to raise
price.
Example: Marginal Revenue and Elasticity
Suppose elasticity is ‐0.25. What is marginal revenue?
Solution: From equation (2.18)
MR = p (1 − 1/(1/4))
   = p (1 − 4)
   = −3p .
Since elasticity is less than 1 in absolute value, MR is negative! Price should be
increased.
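Equation (2.18) is easy to evaluate for any elasticity. The short Python sketch below is an illustration only; the function name marginal_revenue is a hypothetical choice, and price is normalized to 1 so the result reads as a multiple of p.

```python
def marginal_revenue(price, elasticity):
    """Equation (2.18): MR = p * (1 + 1/eta). Elasticity must be nonzero."""
    return price * (1 + 1 / elasticity)


# With price normalized to 1, each result is MR as a multiple of price.
for eta in (-0.25, -1.0, -5.0, -100.0):
    print("elasticity", eta, "-> MR/p =", marginal_revenue(1.0, eta))
```

The loop covers an inelastic case (negative marginal revenue), the unitary case (marginal revenue of 0), and two elastic cases in which marginal revenue moves toward price as elasticity grows in absolute value.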
Profit Maximization
We have already established that it is a manager’s ultimate duty to maximize the
present value of the profits created by the firm. If a manager sets a price that is too
low, the firm may sell a lot of units, but profit per unit will be so low that total
profits are small. On the other hand, if a manager sets a price that is too high,
quantity demanded will decrease too sharply, leading potentially to high profit per
unit but again low total profit. How do we model choosing a price and production
level to maximize profit?
The figure at right shows a generic profit
function, π = R(q ) − C (q) . Just as in the above
discussion, it is clear that selling too many or too
few units will result in a loss of potential profits.
Using calculus, we can maximize this profit function
by setting its derivative equal to zero:
dπ/dq = (p + (dp/dq) q) − dC/dq = 0 .
Based on the definitions of marginal revenue and
marginal cost, this profit‐maximizing condition can
be written as
MR − MC = 0
or
MR = MC (2.19)
So, when the revenue a firm gets from selling its last unit equals how much it costs
to make that last unit, a firm is maximizing profit. If marginal revenue is greater
(less) than marginal cost, the change in profit from selling another unit will be
positive (negative) – so the firm should sell more
(less). The graph to the right shows this profit‐
maximizing condition in general. In order to sell
the profit‐maximizing quantity, q*, where
marginal revenue equals marginal cost, the firm
sets a price of p*.
This is the basic idea at the core of the class. If
doing something increases profit, do it more. If it
lowers profit, do it less. Profit can only be
maximized where the marginal benefits and
marginal costs of any action balance one another
exactly. Understanding what this means for decisions in various scenarios with
information of varying completeness and complexity and in situations where many
decision makers are simultaneously trying to predict what everyone else will do so
as to strategically maximize their own profits will fill this course. Indeed, broadly
understood, that basic pursuit fills all courses that fall under the large heading of
microeconomics. So, the trick to mastering the course is to really understand what is
going on here and then hone your critical thinking skills and analytical abilities so
that you can apply this basic insight in many more complex situations.
What if we had maximized revenue instead of profit? Then we would set
marginal revenue equal to 0. Price would be lower and quantity would be higher.
Importantly, profits would be lower. That is because the last units sold would have
marginal revenue near 0 but a positive marginal cost. If the bottom line is profit,
both cost and revenue must be factored in. Beware the recommendations of a sales
or marketing department that is paid commissions based on revenue, not profit. It
may be in their interest to recommend prices too low and sales too high for the good
of the company!
Example: Profit Maximization
Suppose a firm faces an inverse demand curve of p = 7 − 0.3q and has a cost
function of C(q) = 9 + 1.1q . Find the profit‐maximizing price and quantity, and
calculate the firm’s profit.
Solution: Set up the firm’s profit function and then maximize by setting the
derivative equal to 0.
π = pq − C(q)
π = ( 7 − 0.3q ) q − ( 9 + 1.1q ) (Substituting for p from inverse demand)
dπ/dq = 7 − 0.6q − 1.1 = 0
0.6q = 5.9
q = 5.9/0.6
p = 7 − 0.3(5.9/0.6) = 4.05
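Alternatively, work from the demand curve, q = 70/3 − (10/3)p, and maximize
profit, π = pq − (9 + 1.1q), with respect to price.
dπ/dp = 70/3 − (10/3)p − (10/3)p + 11/3 = 0
(20/3)p = 81/3
p = 4.05
q = 70/3 − (10/3)(4.05) ≈ 9.83
π = 4.05(9.83) − (9 + 1.1(9.83)) ≈ 20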
Whether you should work from the demand curve or the inverse demand curve
just depends on convenience. Most of the time, it will be easier to work from
inverse demand in problems like this, but not always.
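The example can also be checked numerically, and the same script shows why maximizing revenue is not the same as maximizing profit. This Python sketch assumes the example's inverse demand p = 7 − 0.3q and cost C(q) = 9 + 1.1q; the variable and function names are hypothetical.

```python
# Inverse demand p = a - b*q and cost C(q) = F + c*q from the example.
a, b = 7.0, 0.3
F, c = 9.0, 1.1


def profit(q):
    """Profit = revenue - cost at quantity q."""
    return (a - b * q) * q - (F + c * q)


# MR = MC: a - 2*b*q = c, so q = (a - c) / (2b).
q_profit = (a - c) / (2 * b)
p_profit = a - b * q_profit

# MR = 0: the revenue-maximizing quantity, for comparison.
q_revenue = a / (2 * b)
p_revenue = a - b * q_revenue

print("profit-maximizing q, p, profit:", q_profit, p_profit, profit(q_profit))
print("revenue-maximizing q, p, profit:", q_revenue, p_revenue, profit(q_revenue))
```

The revenue-maximizing choice sells more units at a lower price but earns less profit, because the last units sold bring in less revenue than they cost to produce.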
Estimates of demand elasticity can also be used to maximize profit. Recall the
equation relating marginal revenue to elasticity:
MR = p (1 + 1/η) .
This relationship between elasticity and marginal revenue always holds since it was
derived from a generic revenue formula. But how does it relate to marginal cost and
overall profit? Now we know that if profit is being maximized, marginal revenue
equals marginal cost. So, if a firm is maximizing profit, we can rewrite this
relationship as
MR = p (1 + 1/η) = MC
and, solving for price, we obtain
p = (η / (1 + η)) MC (2.20)
In this form, the coefficient on marginal cost, η/(1 + η), can be thought of as a “mark‐up”
factor. This factor tells us how much to mark up price above marginal cost to
maximize profit based on our customers’ sensitivity to price. This mark‐up factor
increases as elasticity decreases in absolute value, thus profit‐maximizing prices are
higher where consumers aren’t as price sensitive.
While this relationship always holds, it is particularly useful in situations where
both marginal cost and elasticity are roughly constant in the face of small
changes in quantity. In that case, equation (2.20) is not simply a condition that must
be true if profits are maximized, but is a simple formula for estimating the profit‐
maximizing price, as long as it is not too far from the current price.
Example: Profit Maximization and Elasticity
Suppose elasticity is ‐5, marginal cost is 3, and price is currently 5.50. Without
assuming elasticity and marginal cost are constant, is price too high, too low, or
just right? If we assume elasticity and marginal cost are relatively constant,
what price maximizes profit?
Solution: If elasticity is ‐5 and price is 5.5, marginal revenue is 5.5(1 − 1/5) = 4.4
from equation (2.18). Since this is higher than marginal cost, more units should
be sold, meaning the current price is too high.
To maximize profit, the mark up over marginal cost should be ‐5/(1‐5), or 1.25.
So, price should be (1.25)(3)=3.75.
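Both parts of this example reduce to one-line calculations. The Python sketch below is illustrative; the function names are hypothetical, and the check that elasticity is below −1 reflects the fact that the mark‐up formula only gives a positive price when demand is elastic.

```python
def marginal_revenue(price, elasticity):
    """Equation (2.18): MR = p * (1 + 1/eta)."""
    return price * (1 + 1 / elasticity)


def markup_factor(elasticity):
    """Equation (2.20) mark-up factor eta / (1 + eta); needs elastic demand."""
    if elasticity >= -1:
        raise ValueError("mark-up formula requires elasticity below -1")
    return elasticity / (1 + elasticity)


eta, mc, current_price = -5.0, 3.0, 5.50

mr_now = marginal_revenue(current_price, eta)
# MR above MC means another unit adds to profit, so the price is too high.
print("MR at current price:", mr_now, "price too high:", mr_now > mc)

# Treating elasticity and marginal cost as roughly constant:
best_price = markup_factor(eta) * mc
print("mark-up factor:", markup_factor(eta), "estimated best price:", best_price)
```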
Example: Profit Maximization using Mark‐up
Suppose a firm faces a demand curve of q = 10,000 p^(−3) and unit cost is constant
at $4 per unit. Find the profit‐maximizing price and quantity, and calculate the
firm’s profit.
Solution: This problem could be solved like the earlier example with linear
demand, but given that we have a demand curve with constant elasticity, as well
as a constant marginal cost, we can use Equation (2.20) to find the price.
Observing that the elasticity is ‐3 and the marginal cost is 4, price is
p = (η / (1 + η)) MC
p = (−3 / (1 − 3)) 4
p = (1.5)4 = 6
and quantity is
q = 10,000 (6)^(−3) ≈ 46.3 .
The firm’s profit is
π = (6 − 4)(46.3) = 92.6
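One way to gain confidence in the mark‐up rule is to compare its answer with a brute-force search over prices. The Python sketch below does this for the example's demand curve and unit cost; the one-cent price grid and the function names are assumptions made for illustration.

```python
# Constant elasticity demand q = A * p**(-B) with constant unit cost mc.
A, B = 10000.0, 3.0
mc = 4.0


def quantity(p):
    """Quantity demanded at price p."""
    return A * p ** (-B)


def profit(p):
    """Profit (p - mc) * q(p); there is no fixed cost in the example."""
    return (p - mc) * quantity(p)


# Mark-up rule, equation (2.20), with elasticity eta = -B.
eta = -B
p_rule = eta / (1 + eta) * mc

# Brute-force check: try every price from 4.01 to 14.00 in one-cent steps.
grid = [4 + k / 100 for k in range(1, 1001)]
p_grid = max(grid, key=profit)

print("price from mark-up rule:", p_rule)
print("best price on the grid:", p_grid)
print("quantity and profit at that price:", quantity(p_rule), profit(p_rule))
```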
Chapter 2 Terminology
The following is a list of terms that you should know in order to discuss and
apply the material from this chapter.
Fixed Cost A cost that remains constant (in a relative range of time) regardless of a
change in the number of units produced. In the short run, this type of cost cannot be
avoided whereas in the long run, it can.
Law of Demand Law that states that as price increases, quantity demanded
decreases. This explains why demand curves are downward sloping and the price
coefficient is always negative.
Linear Demand An additive representation of demand that holds the slope
constant. It assumes each variable that affects demand does so linearly, as opposed
to having a squared or cubed relationship. The coefficient of each variable
represents the way in which the variable affects demand.
Log‐Linear Demand A multiplicative representation of demand in which each
variable is raised to a certain power. This model assumes that elasticity is constant
and is the power to which the price variable is raised.
Marginal Cost The cost incurred when a firm produces one additional unit of a good. It
is the rate of change (or derivative with respect to quantity) of total cost. It is
increasing because of the presence of a fixed factor. In order to maximize profit, this
should be set equal to marginal revenue.
Marginal Revenue The additional income of selling one more unit. It is the rate of
change (or derivative with respect to quantity) of total revenue. In order to
maximize profit, this should be set equal to marginal cost.
Markup Factor A factor that shows by how much more a firm can charge
consumers for a product than it costs to make. As consumers become more price
sensitive, the markup factor approaches 1, that is, price approaches marginal cost.
The less price sensitive consumers are, the higher the markup factor. The markup
factor is multiplied by the marginal cost in order to find a firm’s profit maximizing
price.
Market Power A firm’s ability to control and change the market price of a good.
Whereas a perfectly competitive firm (price taker) has the least market power, a
monopolistic firm (price maker) has the most.
Price Elasticity of Demand The percentage change of quantity sold with respect to
a percentage change in price. It is a tool that tells us how sensitive customers are to
changes in price. Elasticity of demand will always be a negative number and the
more elastic demand is, the bigger the absolute value of elasticity will be.
Price Taker A firm that has no control over the price of a good (e.g. a wheat
farmer) and simply takes the going market price of that good as given. This type of
firm has no market power; it is a perfectly competitive firm.
Profit A firm’s financial gain; the difference between total revenue and total costs.
This is the amount that rational firms care most about and will always want to
maximize when making decisions.
Revenue A firm’s income; the total amount received from consumers, or the price per
unit of a good multiplied by the number of units sold.
Total Cost The total amount it costs a firm to produce a unit of a good multiplied by
the number of units sold. This is the amount to be subtracted from revenue in order
to find a profit or a loss.
Chapter 3
Applications and Extensions of Optimal Production and Pricing
In this chapter, we consider a number of related applications and extensions of
the basic principles set out in Chapter 2. The specific extensions considered are:
price discrimination, setting price across locations that differ in market size but are
otherwise similar, the implications of a capacity limit, the optimal choice of capacity
when demand varies over time, maximizing profit when there is uncertainty about a
determinant of profit, and the value of additional information when faced with
uncertainty. Rather than viewing them as separate topics, you should look at them
as small additions or adjustments to the basic ideas introduced in the last chapter,
or as small variations in the basic tools you have already learned to use. The
primary goal of this course is for you to learn to use the tools of microeconomics to
understand, analyze, and evaluate business decisions. Since the number of possible
situations is unlimited, there is no hope of simply learning by rote what economic
analysis concludes about every possible situation. Rather, you must practice
generalizing and adapting the basic tools to ever new and changing situations.
Simple (3rd degree) Price Discrimination
Maximizing profit with a single product and a single customer type is easy
enough. But sometimes, firms are able to extract additional profit by charging
different types of customers different prices. This is known as simple (or 3rd degree)
price discrimination. A common example of simple price discrimination is when a
movie theater offers discounts to senior citizens. By doing this, it is charging two
different groups of customers (senior citizens and non‐senior citizens) two different
prices for the same product. By charging each customer group a price based on that
group’s demand elasticity, the theater is able to increase its overall profit.
For a firm to effectively use simple price discrimination, its customers must fall
into distinct groups that differ in their willingness to pay. These customer groups
must also be easily identifiable, such as by IDs. Also, because the firm is charging
two different prices for the same product, resale must not be possible. Otherwise,
the customer that gets the product for the cheaper price could make a profit by
reselling to the other customer at a slightly lower price than the firm charges.
Assuming these conditions are present, a firm is able to maximize its
profit by charging two different prices. Letting subscripts 1 and 2 denote two
different consumer groups, the firm’s profit is
π = p1q1 + p2 q2 − C (q1 + q2 ) . (3.1)
Cost depends on the total amount produced, which is q = q1 + q2 . Since the cost of
producing a unit is the same whether a type 1 customer or a type 2 customer
ends up purchasing it, it follows that a firm will maximize profit when
MR1 = MR2 = MC .
That is, marginal revenue for both customer types must equal the common marginal
cost of production. This makes sense. If marginal revenue exceeds (falls short of)
marginal cost in either market, more (fewer) units should be sold in that market.
Which market is charged the higher price? The one with the less elastic (less
price‐sensitive) demand. From the rule established above for the relationship
between the profit maximizing mark‐up and elasticity, saying that marginal revenue
must equal the common marginal cost in both markets means:
p1 = (η1 / (1 + η1)) MC and p2 = (η2 / (1 + η2)) MC .
Thus, if the absolute value of elasticity is lower in market 1, the price is higher in
that market. That is why senior citizens and students get discounts – they tend to
have less disposable income and/or more time to shop, thus they are more sensitive
to price.
Example: 3rd Degree Price Discrimination – Linear Demand
Suppose there are two types of customers with inverse demand curves
p1 = 20 − q1 and p2 = 30 − q2 . If the firm’s cost is C(Q) = 0.5(q1 + q2)^2 , find how
much more profit the firm can make by implementing 3rd degree price
discrimination.
Solution: We need to find the firm’s profit when they use discrimination and
when they charge a single price, and then find the difference. The firm’s profit
when charging two separate prices is
π = (20 − q1)q1 + (30 − q2)q2 − 0.5(q1 + q2)^2
and in order to maximize profit, MR1 = MR2 = MC .
MR1 = MC
20 − 2q1 = q1 + q2
MR2 = MC
30 − 2q2 = q1 + q2
20 − 2q1 = 30 − 2q2
2q2 = 10 + 2q1
q2 = 5 + q1
20 − 2q1 = q1 + 5 + q1
15 = 4q1
q1 = 3.75
q2 = 5 + q1 = 8.75
p1 = 20 − 3.75 = 16.25
p2 = 30 − 8.75 = 21.25
π = 16.25(3.75) + 21.25(8.75) − 0.5(3.75 + 8.75)^2
π = 168.75
Now suppose that the firm can only charge a single price. We need to find the
firm’s total demand by combining the individual consumer types’ demand
curves. Since the total amount the firm sells is the sum of the amount they sell to
type 1 and type 2, we need to solve the original inverse demand curves for
quantity:
p1 = 20 − q1 → q1 = 20 − p1
p2 = 30 − q2 → q2 = 30 − p2
To find the firm’s total demand curve with one price ( p1 = p2 = p ), add both of
these quantities:
q = 50 − 2 p
Now, rearranging to find inverse demand
p = 25 − 0.5q
we can solve for the quantity and price that maximize profit.
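π = (25 − 0.5q)q − 0.5q^2
MR = MC
25 − q = q
q = 12.5
p = 25 − 0.5(12.5) = 18.75
π = 18.75(12.5) − 0.5(12.5)^2 = 156.25
So, price discrimination raises the firm’s profit by 168.75 − 156.25 = 12.50.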
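The comparison between the two pricing schemes can be reproduced with a few lines of code. This Python sketch uses the example's demand curves and cost function; solving the two conditions MR1 = MC and MR2 = MC by Cramer's rule is a choice made here for compactness, and all names are hypothetical.

```python
def cost(q):
    """Cost of producing q total units, C = 0.5 * q**2, so MC = q."""
    return 0.5 * q ** 2


# Price discrimination: MR1 = MC and MR2 = MC rearrange to
#   3*q1 + 1*q2 = 20
#   1*q1 + 3*q2 = 30
det = 3 * 3 - 1 * 1
q1 = (20 * 3 - 1 * 30) / det
q2 = (3 * 30 - 1 * 20) / det
p1, p2 = 20 - q1, 30 - q2
profit_discrimination = p1 * q1 + p2 * q2 - cost(q1 + q2)

# Single price: combined inverse demand p = 25 - 0.5*q, so MR = 25 - q = MC = q.
q_single = 25 / 2
p_single = 25 - 0.5 * q_single
profit_single = p_single * q_single - cost(q_single)

# The combined demand curve is only valid if both groups buy at this price.
assert 20 - p_single > 0 and 30 - p_single > 0

print("two prices:", p1, p2, "profit:", profit_discrimination)
print("one price:", p_single, "profit:", profit_single)
print("gain from discrimination:", profit_discrimination - profit_single)
```

Note that the total quantity sold is the same under both schemes in this example; the extra profit comes entirely from charging the less price‐sensitive group more and the other group less.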
Profit maximization when purchases per capita do not depend on
market size
It is often reasonable to assume purchases per capita do not depend on the size
of the market. That is, in two markets that are similar in terms of income,
demographics, and anything else that affects demand other than market size,
purchases will be proportional to population at a given price unless there is some
specific reason that preferences for the product are related to population. For
example, the demand for public transportation may be greater in larger cities due to
higher congestion. But, for something like movie tickets or hamburgers, it is hard to
see why population size, in itself, should affect the demand of an individual
consumer. One natural application of this model occurs when goods are highly
durable or infrequently purchased for some other reason, in which case each
consumer purchases either 0 or 1 unit in any given time period. For example, few
individuals buy more than one house or car in any given year, or more than one
concert ticket for any given tour, more than one football ticket for a given game, or
more than one movie ticket for a given showing. In that case, purchases per capita
can be thought of as (very nearly) the fraction of consumers that purchase the good.
In this case, demand can be written as
qD / N = f(p, pS, pC, M, z)
where N is the city size and f ( p, pS , pC , M , z ) is purchases per capita, or the fraction
of the population that buys the good for goods that are durable or otherwise
infrequently purchased. This model is useful because if you know a lot about
demand in Gainesville and you want to estimate demand in Jacksonville, you can
draw conclusions as long as you assume that everything is similar enough between
the locations, except city size.
How does city size affect demand elasticity, and thus the profit‐maximizing mark
up, if it does not affect demand per capita? It doesn’t. To see why, first note we can
write demand as
qD = Nf ( p) . (3.2)
Thus, using the definition of elasticity and equation (3.2),
η = (dq/dp)(p/q)
  = N (df/dp) (p / (N f(p)))
  = (df/dp) (p / f(p)) . (3.3)
So, if a firm’s marginal cost is roughly constant, this model shows the profit‐
maximizing price doesn’t vary with city size, only with other demand shifters.
Example: Demand for Infrequently Purchased Goods
Of 1,000 potential customers, the fraction purchasing is approximated by
f ( p) = 1 − 0.1 p , and constant unit cost is $2. Find the price and quantity that
maximizes profits.
Solution: The quantity of tickets sold is the number of people times the fraction
that will buy them
q = 1000(1 − 0.1 p )
and profit is then
π = pq − 2q
π = ( p − 2)q
π = (p − 2)·1000(1 − 0.1p)
dπ/dp = 1000((1 − 0.1p) − 0.1(p − 2)) = 0
1 − 0.1 p − 0.1 p + 0.2 = 0
0.2 p = 1.2
p = 6 , q = 1000(1 − 0.1(6)) = 400 , and π = (6 − 2)(400) = 1600
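To see that market size changes quantity and profit but not the best price, the search can be repeated for several values of N. In this Python sketch the market sizes other than 1,000, the one-cent price grid, and the function names are all assumptions made for illustration.

```python
def fraction_purchasing(p):
    """Fraction of potential customers who buy at price p (from the example)."""
    return 1 - 0.1 * p


def best_price(n, unit_cost=2.0):
    """Search prices from 0.00 to 10.00 in one-cent steps for maximum profit."""
    prices = [k / 100 for k in range(1001)]
    return max(prices, key=lambda p: (p - unit_cost) * n * fraction_purchasing(p))


for n in (1000, 5000, 250000):
    p = best_price(n)
    q = n * fraction_purchasing(p)
    print("N =", n, "price =", p, "quantity =", round(q), "profit =", round((p - 2.0) * q))
```

Because N multiplies profit at every price, it scales the profit function up or down without moving its peak.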
Maximizing Profit with a predetermined capacity constraint
Something that is common among many examples of infrequently purchased
goods – movie theaters, sporting events, etc. – is the presence of venues, which have
a limited amount of seating. In the previous section, we faced a demand function
that was proportional to population size and maximized profit, but what if the
profit‐maximizing price sold too many seats and the venue ran out of space?
In practice, firms often have to deal with capacity constraints. If our profit‐
maximizing quantity is less than our venue’s capacity, the constraint doesn’t affect
our decision. If, however, our optimum price sells more seats than we have
available, the next best thing we can do is raise price until the excess tickets are no
longer demanded. In this way, our price will be set where the quantity demanded is
exactly equal to our venue’s capacity (we will sell out). This situation is depicted in
the figure below. The figure shows inverse demand per capita (or the fraction
purchasing). Capacity is limited to q units, so the fraction purchasing cannot exceed
q/N. If the profit maximizing price is p* ignoring capacity limits, the fraction
purchasing would be too high. So, price must rise to p** to equate demand and
capacity.
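[Figure: per capita inverse demand, showing the unconstrained profit‐maximizing price p* and the higher capacity‐constrained price p**.]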
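The capacity rule described above can be sketched in a few lines. This is an illustration only, assuming the earlier example's demand (1,000 potential customers, f(p) = 1 − 0.1p, unit cost $2); the function name and the 200 and 500 seat capacities are assumptions chosen for the demonstration.

```python
def price_with_capacity(population, capacity, unit_cost=2.0):
    """Return (price, quantity, profit) for f(p) = 1 - 0.1p and a seat limit."""
    # Unconstrained optimum from d(profit)/dp = 0: p* = (10 + unit_cost) / 2.
    price = (10 + unit_cost) / 2
    quantity = population * (1 - 0.1 * price)
    if quantity > capacity:
        # Capacity binds: raise price until quantity demanded equals capacity.
        quantity = capacity
        price = 10 * (1 - capacity / population)
    return price, quantity, (price - unit_cost) * quantity


# A 200 seat venue forces the price up from 6 to 8.
print([round(x, 2) for x in price_with_capacity(1_000, 200)])
# A 500 seat venue leaves the unconstrained solution unchanged.
print([round(x, 2) for x in price_with_capacity(1_000, 500)])
```

With 200 seats the sketch gives a price of 8 and profit of 1200, compared with 1600 when capacity does not bind.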
Peak‐Load Pricing: Determining capacity when demand varies over time
In the last few applications, we showed that ticket sales of 400 would have given
the firm $1600 in profit, but when their capacity limited their sales to 200, their
profit was reduced to $1200. If the firm had been able to choose how large they
wanted their venue to be, 400 seats would be optimal, since 400 tickets would
maximize their profit. But, this number was only based on a single demand function.
If the profit‐maximizing quantity was 400 at some points of the year but only 100 at
other points, it may make sense to build a venue that has a capacity of 200 seats.
In situations where demand differs by time of day (or season), it is natural to
suggest a firm ought to charge a higher price during the time of day when customers
have a higher willingness to pay; this is the notion of peak‐load pricing. For example,
suppose a restaurant experiences higher demand during dinner than it does during
lunch. To maximize profit, it should charge dinner customers a higher price than it
charges lunch customers. Since the assumption for this example is that demand will
be higher at dinner, the quantity it sells during dinner will be greater than or equal
to the quantity it sells during lunch.
To proceed, we need to model both operating cost and capacity cost. Assume the
marginal cost of selling one more unit is constant at c, the operating cost per unit. In
our example, that would be the cost of an additional meal: setting one more place,
preparing one more meal, bussing one more table, etc. If there is plenty of capacity
already at hand, these are the only costs incurred if one more unit is produced.
If, however, there is not extra capacity available, there is an additional capacity
cost. We assume the marginal cost of an additional unit of capacity is constant and
equal to k, the capacity cost per unit. In the restaurant example, think of the capacity
cost per unit as the opportunity cost of seating one more customer during peak
hours: the cost of getting additional floor space, another table, another chair,
perhaps a larger kitchen so more meals can be simultaneously prepared, etc.
Assuming the quantity sold during peak hours is greater than during off‐peak
hours, demand during peak hours determines our capacity. That means that, in our
restaurant example, we will only need to worry about capacity costs at dinner, while
at lunch, sections of the restaurant will be roped off and not used. So, if we decide to
sell another unit during peak hours, we must increase the capacity of our restaurant.
The marginal cost of selling one more meal during peak times is c + k .
Therefore, to maximize profit with high‐demand times and low‐demand times, a
firm should set
MRL = MCL = c (3.4)
and
MRH = MCH = c + k (3.5)
where the H and L subscripts refer to high and low demand respectively.
The figure to the right shows inverse demand
and marginal revenue for both high (H) and low (L)
demand times, along with peak and off‐peak
marginal cost. During low demand, the optimal
quantity sets marginal revenue equal to marginal
operating cost, and the optimal price is chosen
accordingly. During high demand times, marginal
revenue is equated to the sum of marginal operating
and capacity costs to find the optimal quantity, and
price is chosen accordingly. In the restaurant
example, this corresponds to offering early bird and lunch specials and charging full
prices in the evenings.
In the situation depicted in the figure above, the quantity of meals sold in the
high‐demand period (qH) is higher than the quantity of meals sold in the low‐
demand period (qL), as we assumed it would be. If the willingness to pay in the high
demand period is not much greater than the willingness to pay in the low demand
period, the quantity of meals sold at low‐demand times (lunch) may turn out to be
higher than the quantity of meals sold at high‐demand times (dinner) when we
apply the technique described above to determine prices.
To illustrate, consider the figure to the right.
Remember, we assumed capacity is determined
only by the high‐demand period of the day. In the
situation above, we find that qH is actually less than
qL. But, since we’ve built capacity based on high‐
demand, we won’t have enough capacity to serve
the low‐demand. This is known as a “shifting peak”
and is even more likely when the off‐peak price
affects peak demand and vice versa – that is when
peak and off‐peak consumption are, to some
degree, substitutes.
Thus, whenever working these problems, it’s important to check that this
assumption holds; namely, that qH ≥ qL. The number of units sold at either time of
day can at most equal capacity, and we would never actually sell less when demand
was higher. If your solution violates that working assumption, it’s back to the
drawing board. So, if the solution calls for the low demand quantity to exceed the
high demand quantity, we must go back and impose on the problem the constraint
that qH = qL, so that all of our capacity is used at both times of day.
So, when solving a peak‐load problem, maximize the firm’s profit, which is
π = pH (qH )qH + pL (qL )qL − cqL − (c + k )qH . (3.6)
If, upon solving, you find qH ≥ qL , your answer is correct. If, however, qH < qL , you
must rework the problem assuming that qH = qL . Since quantities are the same at
peak and off peak demand, we drop the subscripts and denote each by simply q. It is
important to realize that total output, the sum of peak and off peak sales, will then
equal 2q. While operating cost is incurred for all units produced at each time of day,
capacity cost is incurred only once for both periods. In this case, profit would
become
π = pH (q)q + pL (q)q − (2c + k )q . (3.7)
Maximizing, we obtain
\[
\frac{d\pi}{dq} = MR_H + MR_L - (2c + k) = 0
\]
\[
MR_H + MR_L = 2c + k. \tag{3.8}
\]
This says the combined marginal revenue of the last unit sold at peak demand and
the last unit sold at off‐peak demand must equal the marginal operating cost of
producing each unit plus the marginal cost of adding the last unit of capacity that
allows each unit to be produced.
Example: Peak Load Pricing
Suppose a restaurant faces inverse demand of pH = 14 − 0.5qH at high demand
times and pL = 12 − 0.5qL at low demand times. If operating costs are $2 per unit
and capacity costs are $4 per unit, find the profit‐maximizing prices for both
times.
Solution: First, treat the problem as if the constraint qH ≥ qL is not violated. So,
profit is
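\[
\begin{aligned}
\pi &= (14 - 0.5q_H)q_H + (12 - 0.5q_L)q_L - 2q_L - 2q_H - 4q_H \\
\frac{\partial \pi}{\partial q_H} &= 14 - q_H - 2 - 4 = 0 \quad\Rightarrow\quad q_H = 8 \\
\frac{\partial \pi}{\partial q_L} &= 12 - q_L - 2 = 0 \quad\Rightarrow\quad q_L = 10
\end{aligned}
\]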
Since the constraint is violated, we have broken our assumption that we sell
more during peak demand than we do during off‐peak demand. So, we must sell
the same quantity during both demand times. Assuming qH = qL , profit becomes
\[
\begin{aligned}
\pi &= (14 - 0.5q)q + (12 - 0.5q)q - 2q - 2q - 4q \\
\frac{d\pi}{dq} &= 14 - q + 12 - q - 2 - 2 - 4 = 0 \\
q &= 9 \\
p_H &= 14 - 0.5(9) = 9.5 \\
p_L &= 12 - 0.5(9) = 7.5
\end{aligned}
\]
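The two‐step procedure used in this example (solve each period separately, then impose qH = qL if the peak shifts) can be written as a short routine. This is a sketch under the assumption that both inverse demands are linear with a common slope, p = a − bq; the function and parameter names are illustrative.

```python
def peak_load(a_high, a_low, b, c, k):
    """Peak-load quantities and prices for inverse demands p = a - b*q.

    c is operating cost per unit, k is capacity cost per unit.
    Returns (q_high, q_low, p_high, p_low).
    """
    # Step 1: MR_H = c + k and MR_L = c, where MR = a - 2*b*q.
    q_high = (a_high - c - k) / (2 * b)
    q_low = (a_low - c) / (2 * b)
    if q_high < q_low:
        # Step 2: shifting peak, so impose q_H = q_L and use equation (3.8):
        # MR_H + MR_L = 2c + k.
        q_high = q_low = (a_high + a_low - 2 * c - k) / (4 * b)
    return q_high, q_low, a_high - b * q_high, a_low - b * q_low


print(peak_load(14, 12, 0.5, 2, 4))  # (9.0, 9.0, 9.5, 7.5)
```

Lowering the capacity cost k far enough makes the first step's answer satisfy qH ≥ qL, in which case the second step is skipped.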
Profit maximization with uncertainty
In Chapter 1, we saw how the presence of uncertainty creates a situation where
decisions must be based on probability estimates, reducing expected profit. When
there is uncertainty about demand or cost conditions, managers are similarly
limited in their ability to make optimal decisions. On the demand side, for example,
the demand for trucks and SUVs three years from now may depend on the future
cost of gasoline. Similarly, uncertainty about future fuel costs may mean there is
important uncertainty about operating costs in industries where fuel is a high share
of costs.
We are interested in situations in which the presence of uncertainty affects the
choice of production levels and pricing. Therefore, we are interested in situations in
which the production decision must be made before significant uncertainty about
the state of demand or marginal cost is resolved. For example, the profitability of a
decision to drop some production lines devoted to SUVs in favor of adding lines to
produce more fuel efficient hybrid vehicles ultimately depends on the unknown
future demand for SUVs and hybrids. In this section, we analyze decisions about
profit maximization in the face of such uncertainty. While our discussion will focus
on uncertainty about the level of demand, the same general insights and approach
are applicable to uncertainty on the cost side as well.
To focus our discussion, let’s think in terms of a very simple example. Consider a
hot dog vendor on a beach. If the weather stays sunny, he will have a large
lunchtime crowd. If a thunderstorm breaks out, he will have very few customers.
Just before lunch time, he must put some hot dogs on to cook and buns on to warm.
If more people show up than hotdogs he has prepared, they will leave and go
elsewhere before he can cook more. If fewer people show up than hotdogs he has
prepared, the extra hotdogs and buns are wasted. The problem is, he does not know
if a thunderstorm will break out or not ‐ he has only a probability estimate. How
should he determine how many hot dogs to prepare?
To help us answer this question, let us first consider what the vendor would do if
the uncertainty were resolved before he had to make a decision. In that case, he
would find the optimal quantity and price for each possible outcome (good weather
or poor weather), wait until he knew which outcome was going to occur for certain,
and subsequently cook the corresponding number of hotdogs.
Let pH(qH) be the (inverse) demand curve if demand is high (good weather),
and pL(qL) be the (inverse) demand curve if demand is low (poor weather), where qH
is the quantity sold if demand is high and qL is the quantity sold if demand is low.
Then, if demand turns out to be high, the vendor will maximize profits by choosing
the quantity, qH*, that satisfies MRH = MCH and will charge a price of pH(qH*) = pH*.
Likewise, if demand turns out to be low, the vendor will sell a quantity, qL*, which
satisfies MRL = MCL and charge a price of pL(qL*) = pL*. Given relative probability
assessments about the likelihood of each of these outcomes occurring, we can
calculate the firm’s expected profit from the point of view of someone evaluating the
future before the state of nature is revealed. If Pr(H ) is the chance that demand is
high, and Pr(L) the chance that demand is low, the firm’s expected profit is
\[
E(\pi) = \Pr(H)\bigl(p_H^* q_H^* - C(q_H^*)\bigr) + \Pr(L)\bigl(p_L^* q_L^* - C(q_L^*)\bigr). \tag{3.9}
\]
Example: Expected profit when uncertainty is resolved before production
Suppose a firm faces uncertain demand, where inverse demand for high demand
periods is pH = 20 − qH/4, and inverse demand for low demand periods is
pL = 10 − qL/4. If the probability of high demand is 50%, and the firm’s marginal
cost is constant at $2, what is the firm’s expected profit if the uncertainty is
resolved before production decisions are made?
Solution: First, find the prices that the manager would set at each demand level.
If demand is high, the profit‐maximizing condition is MRH = MCH , so
\[
20 - \frac{q_H}{2} = 2 \quad\Rightarrow\quad \frac{q_H}{2} = 18 \quad\Rightarrow\quad q_H = 36
\]
and the price that sells this quantity is
\[
p_H(36) = 20 - \frac{36}{4} = 11.
\]
If demand turns out to be high, profit is
\[
\pi_H = 11(36) - 2(36) = 324.
\]
For the low demand period, the optimality condition is MRL = MCL, so
\[
10 - \frac{q_L}{2} = 2 \quad\Rightarrow\quad \frac{q_L}{2} = 8 \quad\Rightarrow\quad q_L = 16
\]
and the price is
\[
p_L = 10 - \frac{16}{4} = 6.
\]
Profit in low demand periods, then, is
\[
\pi_L = 6(16) - 2(16) = 64.
\]
The firm’s expected profit is the probability of each outcome, times the profit it
receives in that case, or
E(π ) = Pr(H)π H + Pr(L)π L
E(π ) = 0.5(324) + 0.5(64) = 194
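The benchmark calculation can be reproduced with a small helper. This is a sketch, assuming linear inverse demand p = a − bq and constant marginal cost; the function name and its parameters are illustrative rather than taken from the text.

```python
def full_information_profit(a, b, mc):
    """Maximum profit when demand p = a - b*q is known before producing."""
    q = (a - mc) / (2 * b)   # from MR = a - 2*b*q = mc
    p = a - b * q
    return (p - mc) * q


profit_high = full_information_profit(20, 0.25, 2)   # 324.0
profit_low = full_information_profit(10, 0.25, 2)    # 64.0
expected_profit = 0.5 * profit_high + 0.5 * profit_low
print(expected_profit)  # 194.0
```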
Now let’s return to the question of making a production decision before the
uncertainty is resolved. We can use the answer to the question above as a
benchmark to see how much uncertainty reduces the firm’s expected profit. One
way to approach this situation is to choose a single quantity to produce and then set
whatever price is necessary to sell that quantity, whether demand turns out to be
high or low. You might think of it in terms of offering a price discount on rainy days
large enough to sell everything you already prepared even if demand is low. It turns
out this approach is not quite right (even though this is the solution presented in
some managerial economics textbooks). But, working through it is instructive
nonetheless, because, sometimes, it is right, and because we need to understand
when and how it can go wrong so that we may see how to correct it.
Since there’s only one quantity no matter the weather, qH = qL = q , and expected
profit becomes
E (π ) = Pr( H ) pH (q )q + Pr( L) pL (q)q − C (q ) . (3.10)
Maximizing, we obtain:
\[
\frac{dE(\pi)}{dq} = \Pr(H)\left(\frac{dp_H}{dq}\,q + p_H\right) + \Pr(L)\left(\frac{dp_L}{dq}\,q + p_L\right) - \frac{dC}{dq} = 0.
\]
Note that the term multiplied by the probability of high demand is marginal revenue
if demand is high, evaluated at quantity q, and the term multiplied by the probability
of low demand is marginal revenue if demand is low, evaluated at quantity q. So, this
expression can be rewritten as:
Pr( H ) MRH (q) + Pr( L) MRL (q) = MC (q) . (3.11)
The left‐hand side of this equation can be thought of as the expected marginal
revenue: the marginal revenue in each state of nature weighted by the probability
that the state occurs. This reflects the fact that we must make our production
decision before demand is known. This expected marginal revenue is then balanced
against marginal cost.
Example: Profit maximization with uncertainty when demand is known after production,
choosing one quantity
Using the demand and cost conditions from the previous example
(pH = 20 − 0.25qH, pL = 10 − 0.25qL, Pr(H) = 0.5, and C(q) = 2q), find the expected
profit if the firm is unable to resolve uncertainty before choosing quantity.
Solution: Since we’re only planning on selling one quantity, expected profit is
E(π ) = 0.5 (20 − 0.25q )q + 0.5 (10 − 0.25q )q − 2q .
Maximizing we obtain
\[
\begin{aligned}
\frac{dE(\pi)}{dq} &= 0.5(20 - 0.5q) + 0.5(10 - 0.5q) - 2 = 0 \\
0.5(20 - 0.5q) + 0.5(10 - 0.5q) &= 2 \\
10 - 0.25q + 5 - 0.25q &= 2 \\
13 &= 0.5q \\
q &= 26
\end{aligned}
\]
If demand turns out high,
\[
p_H = 20 - 0.25(26) = 13.50, \qquad \pi_H = (13.50 - 2)(26) = 299.
\]
If demand turns out low,
\[
p_L = 10 - 0.25(26) = 3.50, \qquad \pi_L = (3.50 - 2)(26) = 39.
\]
So, expected profit is
E(π ) = 0.5(299) + 0.5(39) = 169 .
Note that this is lower than the expected profit of 194 when the firm was able to
postpone the production decision until after the uncertainty was resolved.
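Equation (3.11) has a simple closed form when both inverse demands are linear with the same slope. The sketch below assumes p = a − bq in each state and constant marginal cost; the function name and parameters are illustrative.

```python
def single_quantity(a_high, a_low, b, mc, pr_high):
    """Produce one quantity and always sell all of it (equation 3.11).

    Returns (q, p_high, p_low, expected_profit).
    """
    pr_low = 1 - pr_high
    # Expected MR = Pr(H)(a_H - 2bq) + Pr(L)(a_L - 2bq) = mc.
    expected_intercept = pr_high * a_high + pr_low * a_low
    q = (expected_intercept - mc) / (2 * b)
    p_high = a_high - b * q
    p_low = a_low - b * q
    expected_profit = (pr_high * p_high + pr_low * p_low - mc) * q
    return q, p_high, p_low, expected_profit


print(single_quantity(20, 10, 0.25, 2, 0.5))  # (26.0, 13.5, 3.5, 169.0)
```

This routine deliberately keeps the sell‐out assumption, so it reproduces the example rather than the best the firm can do.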
So, what is wrong with the approach above? It is based on a potentially faulty
assumption. We assumed that we would lower price until we sold everything we had
produced, even when demand was low. If demand is low, but not too low, so that the
price to sell out is not too low, that might make sense. But, would the vendor always
do that? In the extreme, suppose they had to lower price all the way to 0 to unload
everything they had prepared if it was raining. Then, revenue would be 0. Absent
outside influences not present in this example or model, no firm would ever want to
sell more than the quantity that maximizes revenue. Further, the fact that we assumed the firm would
lower price enough to sell out when demand was low held down production, and,
thus, the amount that could be sold if demand was high.
To answer the question correctly, we must recognize that there is no reason to
sell the same quantity when demand is low as when demand is high, and no reason
we have to sell everything we produced. In some scenarios, we can store extra
output, at some cost, as inventory. Inventory accumulation has future benefits
because it saves on next period’s production costs. But, holding inventory also has
costs. With inventory, both the benefits and the costs must be accounted for in the
profit function before optimizing.
If inventory is too expensive, it may make more sense to throw out unused
output. In the hotdog vendor example, the food is perishable and must be disposed
of if not used. Generally, there may be disposal costs associated with throwing
output out. If so, they must be accounted for in the profit function. For the hotdog
vendor example, it is sensible to assume free disposal – that it does not cost
anything significant to throw some hot dogs and buns in the dumpster or to feed
them to the dog when the vendor gets home.
At this point, a student often raises the following objection: why throw
hotdogs out instead of getting at least something for them? Set a price that raises
some revenue, sell what you can, and then lower the price to get rid of the rest. They
have in mind something like what happens to unsold Halloween candy the day after
Halloween. The problem with that line of thinking lies in a misconception of the
definitions of the product and the market. Products must be thought of in terms of
all of the characteristics needed to meet a customer’s intended use, including
location and timing. Just as available land in Georgia does me no good if I am building
a house in Gainesville, candy the day after Halloween does me no good if I need it to
hand out to kids on Halloween. So, the store can lower the price the day after
Halloween without too much effect on demand before Halloween.
On the other hand, if they made a habit of marking down Halloween candy a few
days before Halloween, all their customers would just wait until the markdown to
buy. Similarly, if hotdog vendors sold at a high price for 45 minutes after the start of
the lunch rush and then cut the price, many customers would anticipate this and
wait for the price to be cut. So, that strategy is self defeating. Another way to view it
is that this form of price cutting is an attempt at price discrimination – trying to
charge a high price to those willing to pay it and a lower one to other customers.
That only works if the vendor can tell up front which customers are willing to pay a
high price and which are not, and can get away with segmenting the market
and offering discriminating prices. The bottom line is that, for a product with given
physical characteristics at a given location and over a narrow time interval, unless
there is a way to explicitly price discriminate, the ability of customers to wait for the
price cut before purchasing forces the seller to pick a single point on the market
demand curve and stick to it.
So, let’s return to the hotdog vendor’s problem. They have to decide how many
hot dogs to produce, q, how many to sell if demand is high, qH, and how many to sell
if demand is low, qL. That seems like three choices to make. And, in some sense, it is.
But, we can simplify a lot by making a couple of simple observations. First, it would
be a bad idea to make more than the most that the vendor would ever want to sell.
Second, the vendor will never want to sell more when demand is low than they want
to sell when demand is high. So, we know the level of production, and therefore cost,
is determined by the high demand sales target. If we produce qH and demand turns
out to be low, we have plenty of units on hand to meet that demand; if demand turns
out to be high, we’ve produced exactly the right amount.
Expected profit then becomes:
E (π) = Pr( H ) ( pH (qH ) ) qH + Pr( L) ( pL (qL ) ) qL − C (qH ) . (3.12)
We are, quite reasonably, assuming that we would never want to sell more when
demand is low than when demand is high. We therefore built our expression for
expected profit on the assumption that qH ≥ qL . However, there is nothing in
equation (3.12) to guarantee that when we maximize it, we will end up with
qH ≥ qL . So, once we have maximized this expected profit, we will need to check that
our solution actually satisfies that constraint. If it does not, we will have to go back
to the drawing board. We will return to why this might happen mathematically and
what to do about it later.
For now, there are two choice variables in the vendor’s expected profit function.
Maximizing with respect to qL gives:
\[
\frac{\partial E(\pi)}{\partial q_L} = \Pr(L)\,MR_L = 0. \tag{3.13}
\]
From that, it follows that MRL = 0 . What about marginal cost? We are assuming we
want to sell more when demand is high than when demand is low. Therefore, once it
is time to actually sell the hotdogs, we will have produced more than we plan to sell
if demand is low. So, if demand is low, we will have extra output just sitting around.
The marginal cost of getting another unit to sell if demand is low is zero! So, if we
actually want to sell more when demand is high than when demand is low, we just
maximize revenue if demand turns out to be low.
Maximizing with respect to qH we get
\[
\frac{\partial E(\pi)}{\partial q_H} = \Pr(H)\,MR_H - MC(q_H) = 0. \tag{3.14}
\]
Notice how the marginal revenue at high demand periods is weighted by the
probability we will experience high demand, whereas marginal cost is not weighted
by any probability. This is because we will always produce qH, even though we may
not actually end up selling all of it.
Example: Profit maximization with uncertainty when demand is known after production, and
restricting low demand quantity
Using the previous demand curves ( pH = 20 − 0.25 qH and pL = 10 − 0.25 qL ),
probability estimates, and constant marginal cost given, what is the firm’s
maximum expected profit if it has to produce before demand is known?
Solution: The firm should produce what it expects to sell if demand is high, and
restrict its quantity sold if demand is low. So, it should maximize its expected
profit:
E (π ) = 0.5 (20 − 0.25 qH )qH + 0.5 (10 − 0.25 qL )qL − 2(qH )
The quantity for high demand times is
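\[
\begin{aligned}
\frac{\partial E(\pi)}{\partial q_H} &= 0.5(20 - 0.5q_H) - 2 = 0 \\
20 - 0.5q_H &= 4 \\
q_H &= 32
\end{aligned}
\]
and the quantity for low demand times is
\[
\begin{aligned}
\frac{\partial E(\pi)}{\partial q_L} &= 0.5(10 - 0.5q_L) = 0 \\
10 - 0.5q_L &= 0 \\
q_L &= 20
\end{aligned}
\]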
Since qH ≥ qL , this is a reasonable solution. Therefore, the expected profit is
E (π ) = 0.5 (20 − 0.25(32))(32) + 0.5 (10 − 0.25(20))(20) − 2(32) = 178
Notice that this profit is higher than when we sold the same quantity at both
high and low demand periods, and lower than when the uncertainty was
resolved before the production decision was made.
It is instructive to compare the solutions implied by equations (3.13) and (3.14),
for the case where uncertainty is not resolved before the production decision is
made to the choices that would be made if complete information were available at
the time of the production decision. With complete information, the quantity to sell
when demand is low equates marginal revenue and marginal cost; with uncertainty,
it maximizes revenue by setting marginal revenue equal to zero. Thus, more is sold
when demand is low with imperfect information. When demand is high with
uncertainty, a fraction of marginal revenue equal to the probability of high demand
is equated to marginal cost in determining production levels. Thus, sales are lower
with high demand when
uncertainty exists at the time of production than they would be without the
uncertainty. Basically, the presence of uncertainty is causing underproduction
relative to the most profitable high demand output and overproduction relative to
the most profitable low demand output. Some of that overproduction at low demand
is sold and some is disposed of.
[Figure: price ($) against Quantity, showing high and low inverse demand (pH, pL), marginal revenue (MRH, MRL), probability weighted marginal revenue Pr(H)MRH, and marginal cost MC, with the complete information solution (qL*, qH*, pL*, pH*) and the solution under uncertainty (qLU, qHU, pLU, pHU) marked.]
The graph to the right shows both solutions: when uncertainty is resolved prior to
the production decision and when it is not. Prices and quantities marked with a *
represent the solution with complete information. The solution with uncertainty is
denoted with a superscript U. The graph mirrors the math and discussion above.
With certainty, production occurs where marginal cost crosses the relevant
marginal revenue. With uncertainty, production occurs where the probability weighted
marginal revenue when demand is high crosses marginal cost, qHU, and price is
correspondingly pHU. That means that the effective marginal cost is 0 at low demand,
so sales if demand is low occurs where marginal revenue equals marginal cost,
which is zero in this case, at qLU, and price is pLU. It is obvious from the graph that, as
compared to certainty, the low demand quantity is higher, the low demand price is
lower, the high demand quantity is lower, and the high demand price is higher. Since
we are diverging from what would be chosen if we had complete information when
the production decision was made, expected profit must be lower.
The solution technique outlined above works fine so long as the solution for the
quantity produced to sell when demand is high actually exceeds the solution for the
quantity to sell when demand is low. If the solution does not satisfy that condition, it
contradicts the assumption underlying our formulation of the problem – we will not
have produced enough to sell more when demand is low than when demand is high.
How might we end up in a situation where the solution contradicts the
assumption that qH ≥ qL ? It can be seen readily in the figure above. As the marginal
cost gets higher or the probability weighted high demand marginal revenue gets
closer to the low demand marginal revenue, the difference between qHU and qLU gets
smaller. So, if marginal cost or low demand are high enough, or if high demand or
the probability of high demand are low enough, this solution technique will yield a
solution where qH < qL .
What should we do in that case? We can never actually sell more than was
produced. Further, we would never sell more when demand is low than when
demand is high. Therefore, if qL is not less than qH, at most, qL will equal qH. So, if
assuming that qH ≥ qL will be satisfied does not work, we must impose the
constraint on the problem that qH = qL . In that case, everything produced is sold
whether demand turns out to be high or low. Thus, this is the case in which the set
up of equation (3.10) is correct, and the solution is that implied by equation (3.11).
[Figure: price ($) against Quantity, showing pH, pL, MRH, MRL, Pr(H)MRH, the summed probability weighted marginal revenue Pr(H)MRH + Pr(L)MRL, and MC, with quantities qHX, qU, and qLX and prices pHU and pLU marked.]
The situation and the solution are shown in the figure to the right. The quantity at
which the probability weighted marginal revenue when demand is high equals
marginal cost, qHX, is less than the quantity that maximizes revenue when demand is
low, qLX. The solution is to equate the sum of the probability weighted marginal
revenues with marginal cost, which occurs at the quantity labeled qU. If demand is
low, the price charged is pLU, and, if demand is high, it is pHU.
Example: Profit maximization with uncertainty when demand is known after production, and
restricting low demand quantity
Suppose pH = 20 − 0.25 qH , pL = 10 − 0.25 qL , C(q)=3q, and Pr(H)=0.2. How much
should the firm sell, and what price should they charge at high and low demand?
Solution: First, allow for the possibility that the quantity sold when demand is
low is less than produced.
E (π) = 0.2 ( 20 − 0.25qH ) qH + 0.8 (10 − 0.25qL ) qL − 3qH
At high demand:
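\[
\begin{aligned}
\frac{\partial E(\pi)}{\partial q_H} &= 0.2(20 - 0.5q_H) - 3 = 0 \\
20 - 0.5q_H &= 15 \\
q_H &= 10.
\end{aligned}
\]
At low demand:
\[
\begin{aligned}
\frac{\partial E(\pi)}{\partial q_L} &= 0.8(10 - 0.5q_L) = 0 \\
10 - 0.5q_L &= 0 \\
q_L &= 20.
\end{aligned}
\]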
Since qH < qL , this is the wrong approach to this problem. So, assume all output
is sold whether demand is high or low and work the problem again.
E (π) = 0.2 ( 20 − 0.25q ) q + 0.8 (10 − 0.25q ) q − 3q
Maximizing:
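\[
\begin{aligned}
\frac{\partial E(\pi)}{\partial q} &= 0.2(20 - 0.5q) + 0.8(10 - 0.5q) - 3 = 0 \\
12 - 0.5q &= 3 \\
q &= 18.
\end{aligned}
\]
The firm produces and sells 18 units whether demand is high or low, charging
pH = 20 − 0.25(18) = 15.50 if demand is high and pL = 10 − 0.25(18) = 5.50 if
demand is low.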
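The full procedure, which first tries the free disposal solution of equations (3.13) and (3.14) and falls back to equation (3.11) when qH < qL, can be collected into one routine. The sketch below assumes linear inverse demand p = a − bq with a common slope, constant marginal cost, free disposal, and Pr(H) > 0; the function and parameter names are illustrative.

```python
def produce_before_demand_known(a_high, a_low, b, mc, pr_high):
    """Return (q_high, q_low, p_high, p_low, expected_profit).

    Output is produced before demand is known; unsold units are discarded
    at no cost. Requires pr_high > 0.
    """
    pr_low = 1 - pr_high
    # Step 1: Pr(H) * MR_H = MC (3.14) and MR_L = 0 (3.13), with MR = a - 2bq.
    q_high = (a_high - mc / pr_high) / (2 * b)
    q_low = a_low / (2 * b)
    if q_high < q_low:
        # Step 2: the assumption q_H >= q_L fails, so sell everything in
        # both states: Pr(H) MR_H + Pr(L) MR_L = MC (3.11).
        q_high = q_low = (pr_high * a_high + pr_low * a_low - mc) / (2 * b)
    p_high = a_high - b * q_high
    p_low = a_low - b * q_low
    # Cost is incurred on q_high, the amount actually produced.
    expected_profit = (pr_high * p_high * q_high
                       + pr_low * p_low * q_low
                       - mc * q_high)
    return q_high, q_low, p_high, p_low, expected_profit


# Pr(H) = 0.5, MC = 2: the free disposal solution is valid.
print([round(x, 2) for x in produce_before_demand_known(20, 10, 0.25, 2, 0.5)])
# Pr(H) = 0.2, MC = 3: the constraint binds and one quantity is sold.
print([round(x, 2) for x in produce_before_demand_known(20, 10, 0.25, 3, 0.2)])
```

The first call gives quantities of 32 and 20 with expected profit 178; the second gives a single quantity of 18 with prices 15.50 and 5.50.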
Value of Information with Continuous Decisions
In Chapter 1, in the context of discrete decisions, we saw how acquiring
additional information can increase expected profit if it allows a decision maker to
update their probability estimates. We can now revisit the question of information
value in a context where the decision to be made is continuous – namely, what
output to choose to maximize profit when there is some uncertainty about some
determinant of profit. It is easy to determine the value of perfect information by
comparing the expected profit that would result with perfect information to
expected profit with whatever information is at hand.
A much more interesting question is how valuable additional but still imperfect
information is to a decision maker, since imperfect information is all they are likely
to actually have access to. An example of imperfect information is a consultant’s
report about the likelihood of success of a new product line. It is imperfect because,
despite his expertise, the consultant is unable to make a 100% accurate assessment
of the future. However, it is possible that the report can still be valuable to the firm
because it (presumably) provides additional information that the firm does not have
otherwise.
We will consider information in the context of maximizing profit while facing
uncertain demand. The effects of cost uncertainty would be treated in a similar
fashion. As in Chapter 1, we assume the additional report results in a signal of either
good news or bad news, where good news leads us to revise our initial assessment
of the chance of high demand, Pr(H), upward to Pr(H|GN), and bad news leads us to
revise it downward to Pr(H|BN). The more reliable we think the information is, the
more it will affect our probability assessments and the more valuable it will be.
The most objective basis for using additional information to update probability
estimates arises when we have a sample of reports from the same source for very
similar situations in the past, which we think forms a valid basis for inferences
about the future.
This is illustrated in the example below. Generally, though, the process must involve
some degree of subjective judgment on the part of the decision maker. Better
decision makers are better at using all the information available to them, including
their gut feel or intuition after meeting with the consultants providing the additional
information, to formulate their probability estimates.
Once we have the report, and a new assessment of the probability of high
demand, the problem is just like the one in the previous section, choosing the
quantity to sell at high demand and at low demand to maximize expected profit
based on the new probability of high demand. The solution will be different with
good news than with bad news, and both of those solutions will differ from the
solution with no additional report. We now present an extended example to make
these ideas more concrete. This is a continuation of the example from the previous
section.
Example: Value of Imperfect Information
Part 1 – Updating Probability Estimates
Suppose that a manager is uncertain about whether demand will be high or low,
and is considering buying a forecast to improve his assessment of the future. The
manager wants to determine how reliable a new report will be, and he has data
from previous forecasts he has purchased. Forty percent of the time, the forecast
was good news and demand was high and forty percent of the time the forecast
was bad news and demand was low. This is represented in the table below. What
is the probability the firm will actually experience high demand if it receives a
good forecast? What is the probability that it will experience high demand if it
receives a bad forecast? If the firm buys a new forecast, what is the probability
that it will return good news?

                      Demand
                   High     Low
Report    GN       0.4      0.1
          BN       0.1      0.4

Solution: Looking at the table, we see that 40% of the time the report gave good
news and demand turned out to be high. Also, we see that 50% of the time the
report gave good news (0.4 + 0.1, the sum of the first row). Thus, the
probability of a high demand period given a good report is
\[
\Pr(H \mid GN) = \frac{0.4}{0.5} = 0.8.
\]
Similarly, the probability of high demand given bad news is
\[
\Pr(H \mid BN) = \frac{0.1}{0.5} = 0.2.
\]
Finally, we see that 40% of the time the report gave good news when demand
turned out to be high, and 10% of the time the report gave good news and
demand turned out to be low; so, the probability that the report will return good
news is
Pr(GN ) = 0.4 + 0.1 = 0.5
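The table arithmetic above can be sketched in a few lines of Python. This is only an illustration: the dictionary `joint` and the helper names `prob_report` and `prob_high_given` are hypothetical names introduced here, and the numbers are the relative frequencies from the table.

```python
# Joint relative frequencies from the table: (report, demand) -> share of past forecasts.
joint = {
    ("GN", "High"): 0.4,
    ("GN", "Low"): 0.1,
    ("BN", "High"): 0.1,
    ("BN", "Low"): 0.4,
}


def prob_report(report):
    """Marginal probability of a report: the sum across its row of the table."""
    return sum(p for (r, _), p in joint.items() if r == report)


def prob_high_given(report):
    """Conditional probability of high demand given the report."""
    return joint[(report, "High")] / prob_report(report)


pr_gn = prob_report("GN")
pr_h_gn = prob_high_given("GN")
pr_h_bn = prob_high_given("BN")
print(pr_gn, pr_h_gn, pr_h_bn)
```

The same two helpers would work for any table of joint frequencies whose entries sum to one.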
Part 2 ‐ Finding Expected Profit for Each Report
Continuing from the example in the previous section, a firm faces uncertain
demand where inverse demand for high demand periods is \( p_H = 20 - \frac{q_H}{4} \), and
inverse demand for low demand periods is \( p_L = 10 - \frac{q_L}{4} \). The firm’s marginal
cost is constant at $2. What is the firm’s expected profit with good news and with
bad news?
Solution: From above, the probability of high demand with a good report is 0.8,
which means the probability that demand will be low is 1 – 0.8 = 0.2. The
expected cost is simply the cost of producing qH units, since (as described in the
previous section) we want to make sure we have enough inventory on hand to
sell if demand turns out to be high. Thus, expected profit given good news is
\[ E(\pi \mid GN) = 0.8\left(20 - \frac{q_H}{4}\right) q_H + 0.2\left(10 - \frac{q_L}{4}\right) q_L - 2 q_H, \]
subject to the constraint \( q_H \ge q_L \). The constraint must hold, since we are
associating all of our costs with the high demand period. Maximizing, we find the
quantities as follows.
\[
\begin{aligned}
\frac{\partial E(\pi \mid GN)}{\partial q_H} &= 0.8\left(20 - \frac{q_H}{2}\right) - 2 = 0 \\
0.8\left(20 - \frac{q_H}{2}\right) &= 2 \\
\frac{q_H}{2} &= 17.5 \\
q_H &= 35
\end{aligned}
\]
\[
\begin{aligned}
\frac{\partial E(\pi \mid GN)}{\partial q_L} &= 0.2\left(10 - \frac{q_L}{2}\right) = 0 \\
10 - \frac{q_L}{2} &= 0 \\
q_L &= 20
\end{aligned}
\]
Since the constraint \( q_H \ge q_L \) is satisfied, we can use these quantities to find
expected profit given good news:
\[ E(\pi \mid GN) = 0.8\left(20 - \frac{35}{4}\right) 35 + 0.2\left(10 - \frac{20}{4}\right) 20 - 2(35) = 265 \]
Similarly, with bad news the probability of high demand is 0.2 and the
probability of low demand 0.8. Thus, expected profit given bad news is
\[ E(\pi \mid BN) = 0.2\left(20 - \frac{q_H}{4}\right) q_H + 0.8\left(10 - \frac{q_L}{4}\right) q_L - 2 q_H, \]
subject to the constraint \( q_H \ge q_L \). We find the quantities as follows.
\[
\begin{aligned}
\frac{\partial E(\pi \mid BN)}{\partial q_H} &= 0.2\left(20 - \frac{q_H}{2}\right) - 2 = 0 \\
20 - \frac{q_H}{2} &= 10 \\
q_H &= 20
\end{aligned}
\]
\[
\begin{aligned}
\frac{\partial E(\pi \mid BN)}{\partial q_L} &= 0.8\left(10 - \frac{q_L}{2}\right) = 0 \\
10 - \frac{q_L}{2} &= 0 \\
q_L &= 20 .
\end{aligned}
\]
Since the constraint is met, we can plug these quantities back into our profit
function to find the expected profit given bad news.
\[ E(\pi \mid BN) = 0.2\left(20 - \frac{20}{4}\right) 20 + 0.8\left(10 - \frac{20}{4}\right) 20 - 2(20) = 100 \]
How then do we determine the value of the report? It is the difference in
expected profit with and without the report. In turn, expected profit with the report
is just the probability of good news, Pr(GN), times expected profit with good news,
E (π | GN ) , plus the probability of bad news Pr(BN), times expected profit with bad
news, E (π | BN ) . Letting E (π | Info) represent expected profit with the additional
information, this is
E (π | Info) = Pr(GN ) E (π | GN ) + Pr( BN ) E (π | BN ) .
Letting E (π | NoInfo) represent expected profit with no information, the value of
information is
Value = E(π | Info) − E (π | NoInfo) .
Thus, once we have determined the maximum expected profit for each possible
probability of high demand, the value of information is determined exactly as in
Chapter 1.
Example: Value of Imperfect Information
Part 3 ‐ Finding the Value of the Information
Using the data from the previous examples, calculate the expected profit of the
firm if they obtain a report. Then, determine the value of the information.
Solution: We know the report has a 50% chance to return good news (which
means it also has a 50% chance to return bad news). We also know that the
firm’s expected profit if they buy a report and receive good news is 265, whereas
the expected profit if the report returns bad news is 100. Thus, the firm’s
expected profit if they buy a report is
E(π | Info) = 0.5(265) + 0.5(100) = 182.50
The value of the information is how much higher the firm expects profits to be
with it than without it. From the examples in the previous section, we know the
highest profit the firm can expect to receive without any additional information
is 178. Thus, the value of the information is
Value of Info = 182.50 − 178 = 4.50 .
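The three parts of this example can be tied together in a short Python sketch. It is illustrative only: the function name `best_plan` is hypothetical, and the no-information case assumes the prior probability of high demand is 0.5 (the column sum of the table in Part 1), which reproduces the 178 quoted from the previous section.

```python
MC = 2.0  # constant marginal cost from the example


def best_plan(pr_high):
    """Return (qH, qL, expected profit) for a given probability of high demand.

    Applies the first-order conditions from the text and then checks the
    constraint qH >= qL rather than solving the constrained problem.
    """
    q_high = 2.0 * (20.0 - MC / pr_high)   # from pr_high*(20 - qH/2) - MC = 0
    q_low = 20.0                           # from 10 - qL/2 = 0
    if q_high < q_low - 1e-9:
        raise ValueError("qH >= qL binds; these first-order conditions do not apply")
    profit = (pr_high * (20.0 - q_high / 4.0) * q_high
              + (1.0 - pr_high) * (10.0 - q_low / 4.0) * q_low
              - MC * q_high)
    return q_high, q_low, profit


pr_gn, pr_h_gn, pr_h_bn = 0.5, 0.8, 0.2   # probabilities found in Part 1
prior_high = 0.5  # assumption: prior Pr(H) is the table's column sum 0.4 + 0.1

_, _, profit_gn = best_plan(pr_h_gn)
_, _, profit_bn = best_plan(pr_h_bn)
_, _, profit_no_info = best_plan(prior_high)

profit_with_info = pr_gn * profit_gn + (1.0 - pr_gn) * profit_bn
value_of_info = profit_with_info - profit_no_info
print(profit_gn, profit_bn, profit_with_info, profit_no_info, value_of_info)
```

If the forecast cost more than the resulting value of information, the sketch suggests the firm would do better not to buy it.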
The decision of whether or not to buy additional information can be illustrated
using a decision tree:
[Figure: decision tree for the decision to buy the report. The Don’t Buy branch yields E(π) = 178.]
Chapter 3 Terminology
The following is a list of terms that you should know in order to discuss and
apply the material from this chapter.
3rd Degree Price Discrimination: Charging separate groups different prices for a
product based on their willingness to pay for that product. The firm must be able to
identify and separate these groups of customers, it must be feasible to charge
different prices, and resale must not be possible.
Peak-Load Pricing: A technique of charging different prices at different times when
demand varies by time of day or season. When the firm experiences its peak
demand, customers have a higher willingness to pay, so it can charge higher
prices but must incur a per‐unit capacity cost. At off‐peak demand, the firm will set
prices lower, in line with the lower willingness to pay at that time, but will not
incur any capacity cost.
Part 2
Empirical Approximations
and Econometrics
Chapter 4
Estimating and Interpreting Approximations
Economics is a set of tools that, among other things, allows us to make inferences
about the consequences of various actions. These inferences are based on models
rooted in economic theory and tested against real‐world data. Because economics
deals with constantly changing individuals and markets, complete or perfect data is
usually unrealistic; but, as we saw in Chapter 1, inferences can be improved through
supplemental information, even if imperfect.
One way that we can model economic phenomena is to approximate empirical
relationships among historical data, using economic theory as a guide. For example,
we can estimate a firm’s demand curve by collecting data on different prices
charged and the resulting quantities sold over the past several years, and then
attempt to describe this relationship using an equation. Once we have a function
that potentially describes a certain relationship between two variables, we can test
this function’s reliability by comparing predictions of the model to external data that
wasn’t used in constructing the function.
To illustrate how these models can be useful, let’s look at the following scenario.
Suppose Little and Small Inc. currently sells 60 units at a price of $4 per unit. They
then lower their price to $3, and as a result sell 70 units. Their cost function is
C = 5 + 2 q . What should the manager set price to in order to maximize profit?
First, let’s look at Little and Small Inc.’s profit at each price. At $4, profit is
π = 4 ⋅ 60 − 5 − 2 ⋅ 60 = 115
and at $3 profit is
π = 3 ⋅ 70 − 5 − 2 ⋅ 70 = 65
So, they’d be better off at a price of $4 than at $3.
Can they raise profits even more by charging a different price? If so, would that
price be higher than $4, or somewhere between $3 and $4? The graph below
illustrates why there is not an obvious answer to this question.
[Figure: possible profit hills. Vertical axis: profit (π); horizontal axis: quantity (q), with q = 60 and q = 70 marked. Labels on the graph: "Max Profit?", "P>$4", "P<$4".]
In order to see what the profit hill looks like, we need a demand approximation. In
Chapter 2, we introduced linear and log‐linear approximations and how they can be
used to represent a firm’s demand curve. The coefficients of the variables in these
approximations are what actually determine the relationship between these
variables, but how are they found? In the next two sections, we will see how we can
estimate these coefficients (parameters) given two points of data. From there, we
generalize to estimating the coefficients when we have many data points.
Fitting a Linear Demand Approximation with 2 Points
Recall the main assumption when using a linear demand approximation is that
the slope of the demand curve is constant. Basically, given any two observed points
(p0,q0) and (p1,q1) on the demand curve, we are simply looking for the line that
passes through them. This is illustrated in the figure below. The first step in
deriving a linear expression for our two data points is to calculate the slope between
the two points; we can then use this to define the approximation using either
original data point. Once we have an approximation of the demand curve, we can
calculate marginal revenue and get an estimate of the profit‐maximizing price and
quantity.
[Figure: linear approximation p(q) drawn through the two observed points (q0, p0) and (q1, p1). Vertical axis: p; horizontal axis: q.]
Example: Fitting and Using a Linear Approximation with Two Points
Using Little and Small Inc.’s two observed data points, (60, $4) and (70, $3),
estimate an inverse linear demand curve for the firm.
Solution: First, we need to find the slope of the inverse demand curve, which we
are assuming constant. The slope is Δp/Δq, or
\[ b = \frac{4 - 3}{60 - 70} = -0.1 \]
The slope tells us that from any starting point, the change in price is ‐0.1 times
the change in quantity, or
\[ \Delta p = -0.1\,\Delta q . \]
If we choose a price of 4 and a quantity of 60 as the starting point and then
move along the demand curve to any other quantity, q, with resulting price, p,
this relationship becomes
\[ (p - 4) = -0.1\,(q - 60) . \]
This can easily be rearranged to express the inverse demand relationship more
concisely as follows.
p = 4 − 0.1(q − 60)
p = 4 − 0.1q + 6
p = 10 − 0.1q
Given the two points we observed, we
assumed the slope of the curve was constant
and found the line that passed through the
points, as illustrated in the figure to the right.
Given the linear demand function found in the
previous example and a cost function of
C = 5 + 2q , find the optimal price and
associated profit.
Solution: Set up the profit function, and
maximize. Profit is
π = (10 − 0.1q)q − 5 − 2q .
Maximizing it:
\[
\begin{aligned}
\frac{d\pi}{dq} &= 10 - 0.2q - 2 = 0 \\
8 - 0.2q &= 0 \\
0.2q &= 8 \\
q &= 40
\end{aligned}
\]
To find the price, plug this quantity back into the inverse demand
approximation.
p = 10 − 0.1(40) = 6
At $6, profit is
π = 6(40) − 5 − 2(40) = 155
Based on the example, it seems that we could conclude it would profit the firm to
charge a price of $6. However, this estimate of the profit‐maximizing price was
based on a very limited amount of data. Our major assumption when determining
the profit‐maximizing price was that if price falls by 0.10, quantity would increase
by 1, and that this relationship held at all prices. In fact, we’ve only observed data for
prices around $3 and $4; we really don’t have any idea what will happen if we
charge $6. This common mistake is called extrapolating beyond the data range and
we will revisit this in detail later in the chapter. So, what our model really suggests
is that the profit‐maximizing price may well be higher than $4, but there simply is
not enough information to say with any certainty that it should, in fact, be $6.
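The two-point linear fit and the resulting profit maximization can be sketched in Python as follows. The function names are hypothetical, chosen here for illustration, and the final flag simply records whether the implied price lies inside the observed $3 to $4 range.

```python
def fit_linear_inverse_demand(q0, p0, q1, p1):
    """Return (intercept, slope) of the line p = intercept + slope*q through two points."""
    slope = (p1 - p0) / (q1 - q0)
    intercept = p0 - slope * q0
    return intercept, slope


def profit_max_linear(intercept, slope, fixed_cost, marginal_cost):
    """Maximize (intercept + slope*q)*q - fixed_cost - marginal_cost*q, assuming slope < 0."""
    q_star = (marginal_cost - intercept) / (2.0 * slope)  # intercept + 2*slope*q - MC = 0
    p_star = intercept + slope * q_star
    profit = p_star * q_star - fixed_cost - marginal_cost * q_star
    return q_star, p_star, profit


# Little and Small Inc.: observed (q, p) points (60, 4) and (70, 3); C = 5 + 2q.
a, b = fit_linear_inverse_demand(60, 4, 70, 3)
q_star, p_star, profit = profit_max_linear(a, b, fixed_cost=5, marginal_cost=2)

# Flag whether the recommended price lies inside the range of observed prices.
in_range = 3 <= p_star <= 4
print(a, b, q_star, p_star, profit, in_range)
```

Because `in_range` comes out false here, the sketch makes the extrapolation concern explicit: the recommended price lies outside the prices for which we have data.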
Fitting a Log-Linear Demand Approximation with 2 Points
By using a log‐linear model to approximate demand, we are assuming the
elasticity is the same at any price. From Chapter 2, the general formula for a log‐
linear approximation is
\[ q_D = a p^{b} \tag{4.1} \]
where a is a coefficient that captures the scale of demand and b is the price elasticity
of demand. With two points on the demand curve, (p0,q0) and (p1,q1), we can solve
for the unknown parameters (a and b) that make the approximation go through the
two points. This is illustrated in the figure below. The first step in deriving a
log‐linear approximation through our two data points is to calculate the elasticity;
we can then use this to define the approximation using either original data point.
Once we have an approximation of the demand curve, we can calculate marginal
revenue and get an estimate of the profit‐maximizing price and quantity.
[Figure: log‐linear approximation p(q) drawn through the two observed points (q0, p0) and (q1, p1). Vertical axis: p; horizontal axis: q.]
Example: Log‐Linear Demand: Finding Demand
Using Little and Small Inc.’s two observed data points, (60, $4) and (70, $3),
estimate a log‐linear demand approximation.
Solution: First, we need to find the elasticity given these two points, assuming it
is constant. Writing the elasticity (b in equation (4.1)) as η, we can plug both
points into equation (4.1).
\[
\begin{aligned}
60 &= a\,4^{\eta} \\
70 &= a\,3^{\eta}
\end{aligned}
\]
Since a and η are both constants, we can solve this system of equations for the
two unknowns, a and η. Dividing the left‐hand side of the first by the left‐hand
side of the second and similarly for the right‐hand side, then solving, we obtain
the following.
\[
\begin{aligned}
\frac{60}{70} &= \frac{a\,4^{\eta}}{a\,3^{\eta}} \\
\frac{6}{7} &= \left(\frac{4}{3}\right)^{\eta} \\
\ln\left(\frac{6}{7}\right) &= \eta \ln\left(\frac{4}{3}\right) \\
\eta &= \frac{\ln(6/7)}{\ln(4/3)} = -0.5358
\end{aligned}
\]
To find a, plug η back into either equation and solve:
\[
\begin{aligned}
60 &= a\,4^{-0.5358} \\
a &= 126.11
\end{aligned}
\]
So, our log‐linear demand approximation is
\[ q = 126.11\,p^{-0.5358} \]
The graph below shows the log‐linear demand
curve from the example. Elasticity of demand is
constant along the entire curve but slope
depends on what price is being charged.
At this point, since we’re assuming elasticity is
constant, and since marginal cost was constant
at 2 ( C = 5 + 2q ), we can use the expression for
the profit‐maximizing mark up from Chapter 2.
\[
\begin{aligned}
p^{*} &= \left(\frac{\eta}{1 + \eta}\right) MC \\
p^{*} &= \left(\frac{-0.536}{1 - 0.536}\right) 2 = -2.31
\end{aligned}
\]
This answer is absurd! Recall from Chapter 2 that when elasticity is less than 1 in
absolute value, raising price increases profit. So, the log linear approximation
tells us only to raise price – not how far to raise it!
The technical reason we get an absurd answer for price in the above example is
that the second order conditions fail at the solution implied by our mark up formula.
Intuitively, though, it should make sense. Remember, we are assuming constant
elasticity, and we found demand to be inelastic (elasticity less than 1 in absolute
value). What does it mean for demand to be inelastic? Inelastic demand means for a
10% increase in price, quantity will decrease by less than 10%. Revenue will
increase and cost will decrease ‐ so it’s profitable to raise price. Since elasticity is
assumed constant, it will always be profitable to raise price; so the manager should
theoretically charge a price of infinity!
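A short Python sketch can make this concrete. It refits the log‐linear approximation from the two observed points and then evaluates profit at a few prices; the function names and the particular list of trial prices are illustrative choices made here, not part of the original example.

```python
import math


def fit_log_linear_demand(p0, q0, p1, q1):
    """Return (a, eta) so that q = a * p**eta passes through both observed points."""
    eta = math.log(q0 / q1) / math.log(p0 / p1)
    a = q0 / p0 ** eta
    return a, eta


def profit(p, a, eta, fixed_cost=5.0, marginal_cost=2.0):
    """Profit at price p under q = a * p**eta and cost C = fixed_cost + marginal_cost*q."""
    q = a * p ** eta
    return p * q - fixed_cost - marginal_cost * q


a, eta = fit_log_linear_demand(4, 60, 3, 70)

# The mark up formula gives a negative "price" because demand is inelastic.
markup_price = eta / (1.0 + eta) * 2.0

# Profit keeps rising as price rises, so no interior maximum exists.
trial_prices = (3, 4, 6, 10, 20, 50)
profits = [profit(p, a, eta) for p in trial_prices]
print(a, eta, markup_price, profits)
```

Under the fitted constant elasticity, every price in the list yields more profit than the one before it, which is the numerical counterpart of the "raise price without limit" conclusion.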
We know that constant elasticity was just an assumption used for our
approximation. It is not literally true. In fact, demand tends to be inelastic at low
prices and elastic at high prices. In using the approximation, we hope only that
elasticity is roughly constant over the relatively small range of prices we are likely
to charge, so that the approximation is relatively accurate. However, if the price range we have is so
low that we find demand is inelastic, the calculus of profit maximization, literally
interpreted, suggests we should raise price A LOT, until demand is elastic. That large
change in price means assuming elasticity is constant is not reasonable for the
problem at hand.
If we were to raise prices to a higher level and collect data again, the
approximation would become useful if the range for which we had data included
prices near the profit‐maximizing price. Thus, again, it boils down to saying we
should not use an approximation to extrapolate much beyond the range of observed
data.
An additional word of caution is in order here before moving on. Time is a very
important determinant of the elasticity of demand. If gas prices were to increase
permanently by 100%, initially, consumer response would be relatively restrained.
People have to get back and forth to work in the cars they already have, they have
already scheduled appointments and trips, etc. Given more time to adjust, they can
move closer to work, get a job closer to home, buy a more fuel efficient car, etc. In
short, the substitution possibilities are higher in the long run than in the short run. A
finding that demand is inelastic in the short run is no reason to believe it is inelastic
in the long run. Before deciding that an inelastic response to a price increase means
more price increases are in order, make sure that the long run response, not just the
very short run response, is inelastic. If you raise price because short run demand is
inelastic without realizing demand is more elastic in the long run, once consumers
go through the trouble of finding substitutes for your product, you may not get them
back if you lower prices later on.
Regression – Fitting the Best Approximation with Many Data Points
Above, we took two points of data and fit a curve through them, attempting to
describe a relationship between two variables. The general shape of the curve was
determined by our assumptions about what would make a reasonable
approximation. The exact position of the curve was determined by the two data
points – we chose the coefficients or parameters of the approximation so the curve
would precisely fit the observed data by going through both points. With many data
points, it is not possible to fit the approximation precisely to all the data points.
Instead, we want to choose the coefficients of the approximation to fit the observed
data as closely as possible. Econometrics is the use of quantitative mathematical
and statistical techniques to study economic phenomena. Regression analysis is the
branch of econometrics concerned with fitting and evaluating empirical
approximations of underlying economic relationships.
To make things concrete, let’s consider an example. Suppose we want to know
what determines variation in consumer electric bills. If the utility has only one price
per kilowatt hour, the expenditure per household is just the price per kilowatt hour
times the number of kilowatt hours consumed. Sometimes electric utilities have
more complex price structures, where the price per unit increases or decreases with
the consumption level. Either way, suppose that the electric utility has held the price
structure constant, and what we want to explain is what else drives electricity
spending. One thing that comes immediately to mind is the size of the residence.
The table below shows twenty (hypothetical) observations on the size of a
residence and the electric bill incurred at that residence. We might think a very
reasonable place to start is by plotting the data to see if the relationship looks
linear or log linear. This is done in the figure below. Simple visual inspection
reveals two things. First the data may well follow a general linear positive
relationship between residence size and electricity expenditure. Second, no single
line is going to fit with perfect precision. So, how do we choose an approximation
to fit as closely as possible?

Obs   Bill ($)   Size (1000's Sq Ft)
  1    343.32    3.2
  2    299.21    3.4
  3    302.02    1.6
  4    167.94    1.2
  5    209.55    2.7
  6    367.80    3.1
  7    390.06    2.5
  8    398.46    3.2
  9    224.87    1.6
 10    313.27    3.0
 11    209.36    1.6
 12    355.15    3.1
 13    344.13    1.7
 14    453.55    3.7
 15    184.73    1.2
 16    372.66    3.8
 17    264.10    2.2
 18    325.37    2.6
 19    204.54    1.8
 20    205.89    2.2
To begin to answer that question, we have to introduce the notion of an
approximation error. What we have in mind is that the bill is a linear function of
size, plus some component that we are unable to model or explain which we take as
random for our purposes. Thus, we are positing the following relationship:
\[ \text{Bill}_i = \beta_0 + \beta_1 \text{Size}_i + \varepsilon_i . \tag{4.2} \]
Use of the lower case Greek beta (β) to represent regression coefficients is
ubiquitous in applied regression analysis. β0, or “beta zero”, is the intercept and β1,
or “beta one”, is the slope. The subscript i refers to the observation and takes on any
value from 1 (the first observation) to N (the last), where N is the total number of
observations. The lowercase Greek epsilon (ε) represents the (hopefully small)
random component outside the model.
We never observe the true values of the coefficients. Instead, we estimate them.
We denote the estimates by placing a caret, or a “hat”, over the betas. So β̂1 , or beta
one hat, is the estimated value of β1 . We also do not observe the value of the random
component, by definition. Instead, we observe an approximation error, which
reflects both that there is a random component we do not observe and the fact that
we have only estimates of the coefficients. The approximation error is the difference
between the actual electric bill and the one we would predict based on our
imperfectly estimated coefficients. That is, the predicted bill is
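\[ \widehat{\text{Bill}}_i = \hat{\beta}_0 + \hat{\beta}_1 \text{Size}_i \tag{4.3} \]
and the approximation error, or residual, is
\[ \hat{\varepsilon}_i = \text{Bill}_i - \widehat{\text{Bill}}_i = \text{Bill}_i - \hat{\beta}_0 - \hat{\beta}_1 \text{Size}_i . \tag{4.4} \]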
There are infinitely many slope coefficients and intercepts that might be
chosen to approximate the data. Two possibilities, the lines labeled A and B, are
shown in the figure below. Our goal is to choose values for the estimated coefficients
so as to make the errors as small as possible overall. How to do that is the question.
We could literally add up the errors and minimize their total. But, there is a serious
problem with that approach since positive and negative errors cancel each other
out. For example, if one point has an error of 10 and another point has an error of ‐
10, their sum cancels to 0, even though the total error is 20 in absolute value. So,
really bad approximations might have a total error of 0 using this flawed approach.
The most common regression technique is called Ordinary Least Squares (OLS).
The goal of an OLS regression is to produce a line (or curve) as close to the data
points as possible by minimizing the sum of the squared errors, which are always
positive. Squaring each error, the sum of squared errors, or SSE, is
\[ SSE = \sum_i \hat{\varepsilon}_i^{2} = \sum_i \left(\text{Bill}_i - \hat{\beta}_0 - \hat{\beta}_1 \text{Size}_i\right)^{2} . \tag{4.5} \]
Choosing the coefficient estimates to minimize the SSE is a straightforward
exercise in basic differential calculus and algebra. There is nothing magic about it,
nothing that even requires a high level of expertise to understand. We simply take
the partial derivative with respect to each of the coefficients and set them equal to
zero:
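\[
\begin{aligned}
\frac{\partial SSE}{\partial \hat{\beta}_0} &= -2 \sum_i \left(\text{Bill}_i - \hat{\beta}_0 - \hat{\beta}_1 \text{Size}_i\right) = 0 \\
\frac{\partial SSE}{\partial \hat{\beta}_1} &= -2 \sum_i \text{Size}_i \left(\text{Bill}_i - \hat{\beta}_0 - \hat{\beta}_1 \text{Size}_i\right) = 0
\end{aligned}
\tag{4.6}
\]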
Substitution shows that this could also be written as
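\[
\begin{aligned}
\frac{\partial SSE}{\partial \hat{\beta}_0} &= -2 \sum_i \hat{\varepsilon}_i = 0 \\
\frac{\partial SSE}{\partial \hat{\beta}_1} &= -2 \sum_i \text{Size}_i \hat{\varepsilon}_i = 0
\end{aligned}
\tag{4.7}
\]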
Thus, the two equations show that the sum of the approximation errors is 0 and that
the sum of the product of the approximation errors and residence size is also zero.
Of course, actually solving these two equations for the two unknown parameters
is messy, and much better done by computer. But, it is important to understand
what the computer is doing when it calculates regression coefficients. It is solving
what is in principle a simple calculus problem that anyone who passed survey of
calculus should understand. It is just that the computations themselves get messy
enough that it makes much more sense to have them performed by a statistical
software package.
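For the one-variable case, the algebra can still be carried through by hand. As a supplementary sketch (the bars denoting sample means are notation introduced here, not used elsewhere in the text), rearranging the first condition in (4.6) gives the intercept in terms of the slope, and substituting that into the second condition gives the slope:

\[
\hat{\beta}_0 = \overline{\text{Bill}} - \hat{\beta}_1\,\overline{\text{Size}},
\qquad
\hat{\beta}_1 = \frac{\sum_i \left(\text{Size}_i - \overline{\text{Size}}\right)\left(\text{Bill}_i - \overline{\text{Bill}}\right)}{\sum_i \left(\text{Size}_i - \overline{\text{Size}}\right)^{2}}
\]

The first expression says the fitted line passes through the point of sample means; the second says the slope is the co-movement of size and bill scaled by the variation in size.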
Performing ordinary least squares regression on the example we have been
working yields the results shown in the table below. This particular output is from
Microsoft Excel, but most spreadsheet packages will include a regression feature.
For more involved work, statistical packages, such as STATA, are easier to use. But,
all provide substantially the same information. While we will eventually make sense
of all the information in the regression output, for now, let’s focus on the estimated
coefficients, which are highlighted.
Regression Statistics
Multiple R           0.7501
R Square             0.5627
Adjusted R Square    0.5384
Standard Error       56.37
Observations         20

ANOVA
              df    SS        MS       F        P-value
Regression     1    73580     73580    23.16    0.0001
Residual      18    57190     3177
Total         19    130770

              Coef      Std Err   t Stat   P-value   Lower 95%   Upper 95%
Intercept     111.27    40.56     2.74     0.01      26.06       196.49
Sq Ft         75.11     15.61     4.81     0.00      42.32       107.90
The estimated coefficients yield the following expression for the predicted
electric bill:
\[ \widehat{\text{Bill}} = 111.27 + 75.11\,\text{Size} . \tag{4.8} \]
The interpretation is that every additional one thousand square feet of space is
associated with a predicted increase of $75.11 in the electric bill. Taken literally, the
intercept means a home with zero square feet would incur a bill of $111.27.
Hopefully, we can all agree that that would be taking the model too literally. All it
really means is that for common house sizes, the electric bill is best approximated
by adding $111.27 to a charge of $75.11 per thousand square feet. The resulting
fitted line, or approximation, is shown in the figure below.
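For readers who want to see what the software is doing, the estimates can be reproduced with a few lines of Python. This is a minimal sketch, not the Excel procedure itself: it uses the closed-form solution of the two first-order conditions in (4.6), with the slope equal to the sum of cross-deviations divided by the sum of squared deviations of size, and the variable names are chosen here for illustration. The data are transcribed from the table of twenty hypothetical observations.

```python
# Data transcribed from the table of twenty hypothetical observations.
bill = [343.32, 299.21, 302.02, 167.94, 209.55, 367.80, 390.06, 398.46,
        224.87, 313.27, 209.36, 355.15, 344.13, 453.55, 184.73, 372.66,
        264.10, 325.37, 204.54, 205.89]
size = [3.2, 3.4, 1.6, 1.2, 2.7, 3.1, 2.5, 3.2, 1.6, 3.0,
        1.6, 3.1, 1.7, 3.7, 1.2, 3.8, 2.2, 2.6, 1.8, 2.2]

n = len(bill)
mean_size = sum(size) / n
mean_bill = sum(bill) / n

# Closed-form solution of the two first-order conditions in (4.6).
s_xy = sum((s - mean_size) * (b - mean_bill) for s, b in zip(size, bill))
s_xx = sum((s - mean_size) ** 2 for s in size)
beta1_hat = s_xy / s_xx
beta0_hat = mean_bill - beta1_hat * mean_size

# Residuals and the two sums that (4.7) says must equal zero.
residuals = [b - beta0_hat - beta1_hat * s for s, b in zip(size, bill)]
sum_resid = sum(residuals)
sum_size_resid = sum(s * e for s, e in zip(size, residuals))
sse = sum(e ** 2 for e in residuals)

print(round(beta0_hat, 2), round(beta1_hat, 2))
```

The two sums `sum_resid` and `sum_size_resid` come out at zero up to floating-point rounding, which is exactly what the conditions in (4.7) require.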
Of course, many things other than the size of a residence have important effects
on electric bills, just as more than quantity affects cost and more than price affects
demand. Generally, we want our approximations to allow for the effects of many
variables. So, before we can add to our understanding of regression analysis, it is
necessary to introduce some terminology and notation to let us talk about
approximations involving many explanatory or independent variables in a very
general way.
As we have noted before, it is conventional to let Y denote the dependent
variable. We will also let X1, X2, … XK denote K different independent variables, where
a subscript ki may be used to index the kth independent variable for observation i (k
takes on values between 1 and K, and i takes on values between 1 and N). A
regression, then, is attempting to explain or approximate Y using X1, X2, … XK. For
this reason, we will often refer to Y as an endogenous variable since it is determined
within the regression, and X1, X2, … XK as exogenous variables since they are
determined outside of the regression.
This is the ideal case. As their labels suggest, it is important for the independent
variables to be truly exogenous; otherwise, the reliability of the regression is
compromised. In economics, it is very difficult to determine which variables actually
are exogenous and which are endogenous. In fact, it is often the case that two
variables (such as price and quantity) influence each other and are therefore both
endogenous. This problem is so significant that it requires an entire chapter to
address it properly; for now, we will mostly ignore it and assume the variable on the
left‐hand side of the equation is endogenous and the variables on the right‐hand
side are exogenous.
With this notation, the general form for a linear regression model is
\[ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_K X_{Ki} + \varepsilon_i . \tag{4.9} \]
If we define a new explanatory variable, X0, that is equal to 1 for every observation,
this can be written more compactly as
\[ Y_i = \sum_{k=0}^{K} \beta_k X_{ki} + \varepsilon_i . \tag{4.10} \]
The betas, β0, β1, etc., are the parameters we are trying to estimate. In order to
approximate the parameters, regression analysis simply assumes that the equation
we’ve described is a good approximation, and then finds the line that is closest to all
of the points in the dataset.
No regression is perfect. Theoretically, there exists an equation that will
completely describe our dependent variable without error. By estimating the
parameters in our approximation, we are attempting to recreate the “true” equation
as best as possible. Since we will never be perfectly accurate, we represent our
estimated betas by placing carets, or hats, over them. The predicted value of the
dependent variable, Y‐hat, is determined by the estimated coefficients:
\[ \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \dots + \hat{\beta}_K X_{Ki} = \sum_{k=0}^{K} \hat{\beta}_k X_{ki} . \tag{4.11} \]
The regression error (approximation error) is the difference between the actual and
predicted values of the dependent variable. So, the sum of squared errors (SSE) is
\[ SSE = \sum_{i=1}^{N} \left(Y_i - \hat{Y}_i\right)^{2} . \tag{4.12} \]
This is equivalent to
\[ SSE = \sum_{i=1}^{N} \left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i} - \dots - \hat{\beta}_K X_{Ki}\right)^{2} \tag{4.13} \]
or to
\[ SSE = \sum_{i=1}^{N} \left(Y_i - \sum_{k=0}^{K} \hat{\beta}_k X_{ki}\right)^{2} . \]
In interpreting this, remember \( Y_i \) is the actual observation from the ith trial and
\( \hat{Y}_i \) is the predicted observation based on our estimated parameters. In short, our
dependent variables (Y’s) and independent variables (X1’s, X2’s, etc.) are our
observed data, and the parameters (β0, β1, etc.) are the unknown quantities that we
are trying to estimate.
To minimize SSE, simply set the partial derivative equal to zero for each
parameter:
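\[ \frac{\partial SSE}{\partial \hat{\beta}_0} = 0, \quad \frac{\partial SSE}{\partial \hat{\beta}_1} = 0, \quad \dots, \quad \frac{\partial SSE}{\partial \hat{\beta}_K} = 0 . \tag{4.14} \]
For the intercept, taking the derivative of equation (4.13) yields:
\[ \frac{\partial SSE}{\partial \hat{\beta}_0} = -2 \sum_{i=1}^{N} \left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i} - \dots - \hat{\beta}_K X_{Ki}\right) = 0 . \tag{4.15} \]
Substituting, this is just:
\[
\begin{aligned}
-2 \sum_{i=1}^{N} \hat{\varepsilon}_i &= 0 \\
\sum_{i=1}^{N} \hat{\varepsilon}_i &= 0
\end{aligned}
\tag{4.16}
\]
For any of the other K parameters, taking the derivative of equation (4.13) yields:
\[ \frac{\partial SSE}{\partial \hat{\beta}_k} = -2 \sum_{i=1}^{N} X_{ki} \left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i} - \dots - \hat{\beta}_K X_{Ki}\right) = 0 . \tag{4.17} \]
Substituting, this is just:
\[
\begin{aligned}
-2 \sum_{i=1}^{N} X_{ki} \hat{\varepsilon}_i &= 0 \\
\sum_{i=1}^{N} X_{ki} \hat{\varepsilon}_i &= 0
\end{aligned}
\tag{4.18}
\]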
This gives K+1 equations to solve for the K+1 unknown coefficients. The K+1
equations are that the sum of the approximation errors is zero, and that the K
sums of the product of each independent variable and the approximation error are
all individually zero. These K+1 equations are referred to as the normal equations.
There is now even more reason to use computers to perform the actual calculations
compared to our one independent variable example above, but there is nothing
mysterious, or even complicated, about the idea involved.
The most important assumption here is that all of the explanatory variables are
exogenous and are completely uncorrelated with the unobserved random error
component. If that is not the case, the calculation cannot sort out the effect of the
included variables from the effects of the random error with which they are
correlated. This is known as omitted variables bias. While we will touch on this
lightly later in this chapter and the next, for the most part we will simply assume
the explanatory variables are exogenous and uncorrelated with the random error
component. A thorough discussion of omitted variables bias and what can be done
about it must wait until Chapter 6.
To make this more concrete, let’s return to the electric bill example and add
another explanatory variable. The table below now includes the monthly average
temperature. It makes sense that, when temperatures are higher, electricity used for
air conditioning will be higher than when temperatures are moderate. Of course, for
houses with electric heat, expenses will be higher at colder temperatures too. That
could cause the relationship to be U shaped, not linear. However, most of our data
reflects moderate to warm temperatures; so, a positive linear relationship seems
likely.

Obs   Bill ($)   Size (1000s)   Temp
  1    343.32    3.2            63
  2    299.21    3.4            51
  3    302.02    1.6            99
  4    167.94    1.2            60
  5    209.55    2.7            45
  6    367.80    3.1            95
  7    390.06    2.5            94
  8    398.46    3.2            97
  9    224.87    1.6            64
 10    313.27    3.0            51
 11    209.36    1.6            54
 12    355.15    3.1            81
 13    344.13    1.7            91
 14    453.55    3.7            99
 15    184.73    1.2            72
 16    372.66    3.8            82
 17    264.10    2.2            88
 18    325.37    2.6            74
 19    204.54    1.8            45
 20    205.89    2.2            64

Running an OLS regression on this data yields the following results:
temperatures and sizes, you should subtract $43.66 from the sum of 64.07 times
size in thousands of square feet and 2.48 times temperature to get the best fit to the
data.
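The coefficients quoted above can be reproduced by building and solving the normal equations directly. The sketch below uses only the Python standard library; the helper `solve_linear_system` is a hypothetical name for a small Gaussian elimination routine written here, standing in for whatever a statistical package does internally. The data are transcribed from the table with the Temp column.

```python
bill = [343.32, 299.21, 302.02, 167.94, 209.55, 367.80, 390.06, 398.46,
        224.87, 313.27, 209.36, 355.15, 344.13, 453.55, 184.73, 372.66,
        264.10, 325.37, 204.54, 205.89]
size = [3.2, 3.4, 1.6, 1.2, 2.7, 3.1, 2.5, 3.2, 1.6, 3.0,
        1.6, 3.1, 1.7, 3.7, 1.2, 3.8, 2.2, 2.6, 1.8, 2.2]
temp = [63, 51, 99, 60, 45, 95, 94, 97, 64, 51,
        54, 81, 91, 99, 72, 82, 88, 74, 45, 64]


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


# Each row holds the regressors for one observation: X0 = 1, Size, Temp.
rows = [[1.0, s, float(t)] for s, t in zip(size, temp)]
k = 3

# Normal equations: (X'X) beta_hat = X'y.
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
xty = [sum(r[i] * y for r, y in zip(rows, bill)) for i in range(k)]
beta_hat = solve_linear_system(xtx, xty)

# Each normal equation says a regressor is orthogonal to the residuals.
residuals = [y - sum(b * x for b, x in zip(beta_hat, r)) for r, y in zip(rows, bill)]
normal_eq_check = [sum(r[i] * e for r, e in zip(rows, residuals)) for i in range(k)]
print([round(b, 2) for b in beta_hat])
```

The three entries of `normal_eq_check` are zero up to floating-point rounding, one for each of the K+1 = 3 normal equations.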
Notice how the additional data caused both the intercept and the coefficient on
Size to change as compared to the initial estimates from the regression when only
size was included. Why would that be? In part, it may be because of a positive
correlation, or co‐linearity, between size and temperature in our data. For some
reason, houses tend to be larger where temperatures are higher. When temperature
is left out, some of the effect of temperature is picked up by the coefficient on size,
causing it to be higher when temperature is left out.
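The change in the Size coefficient can be tied out with a standard OLS identity, included here as a supplementary sketch rather than part of the original derivation. Let \( \hat{\delta} \) (a symbol introduced here) be the slope from an auxiliary regression of Temp on Size. Then the coefficient on Size when Temp is left out equals the coefficient on Size when Temp is included, plus the Temp coefficient times \( \hat{\delta} \):

\[ \hat{\beta}_1^{\text{Size only}} = \hat{\beta}_1^{\text{Size and Temp}} + \hat{\beta}_2^{\text{Size and Temp}}\,\hat{\delta} \]

Computing \( \hat{\delta} \) from the table's twenty observations gives roughly 4.45 degrees per thousand square feet (a value calculated here, not reported in the text), so

\[ 64.07 + 2.48 \times 4.45 \approx 75.11 , \]

which matches the Size coefficient from the one-variable regression up to rounding.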
More generally, both Size and Temp may be correlated with other variables that
have been omitted from the model and lumped into the error component but which
nonetheless influence the electric bill. Such omitted variables may cause the
coefficients on the included variables to be biased if they are correlated with
included variables and have their own direct effect on the dependent variable,
because the included variables proxy the effects of the omitted variables in addition
to the effects of the included variables. Thus, including variables that had been
omitted can change the coefficients on all the variables that had previously been
included. Since the omitted variables are often unknown, so are their correlations
with the included variables. So, the coefficients on the included variables can be
biased in unpredictable ways if the included variables are correlated with omitted
variables. That is why the assumption that the included variables are exogenous and
uncorrelated with the error component is so important. Unfortunately, it never holds
exactly. All we can hope for is that the included variables are not too correlated with
variables that have been omitted, so that the bias in the estimated coefficients is not
too large.
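The mechanism can be seen in a small simulation. The sketch below is an illustration only, not part of the electric bill data: the variables `x1` and `x2`, the "true" coefficients, and the strength of the correlation between them are all made-up assumptions. It regresses `y` on `x1` alone, leaving out `x2`, and compares the result with the true effect of `x1`.

```python
import random

def slope(x, y):
    """OLS slope from a regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)          # fixed seed so the run is repeatable
n = 5000
true_b1, true_b2 = 2.0, 3.0     # hypothetical "true" effects

x1 = [rng.gauss(0, 1) for _ in range(n)]
# x2 is positively correlated with x1, as Temp is with Size in the text.
x2 = [0.5 * a + rng.gauss(0, 1) for a in x1]
y = [true_b1 * a + true_b2 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

short_slope = slope(x1, y)      # regression with x2 omitted
link = slope(x1, x2)            # how strongly x2 moves with x1

print("true effect of x1:       ", true_b1)
print("estimate with x2 omitted:", round(short_slope, 3))
print("true effect + b2 * link: ", round(true_b1 + true_b2 * link, 3))
```

In this setup the coefficient on `x1` absorbs roughly the effect of `x2` times the strength of the link between the two, which is why it overstates the true effect. With real data the omitted variable and its link to the included variables are unknown, so the size and direction of the bias cannot be computed this way.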
Interactions
What if we want to model an interactive relationship between two (or more) of
the independent variables themselves? For example, in the above regression, we
estimated that both house size and temperature positively affect the size of the bill.
If it takes more energy to cool a house when temperature rises, it seems likely
that the additional cost would be higher still in larger houses. If we believed that
the effect of temperature was larger in bigger houses, we would need an interaction
term. An interaction term is a new independent variable in a regression that is
simply the product of two or more other independent variables. So, if we wanted to
capture the interactive effects of house size and temperature, our regression model
would become
\[ Bill_i = \beta_0 + \beta_1 Size_i + \beta_2 Temp_i + \beta_3 Size_i \times Temp_i + \varepsilon_i \qquad (4.20) \]
This new model adds the product of Size and Temp as the new variable,
Size × Temp. To see the effects of this term, we can look at the partial derivative with
respect to Temp (holding Size constant):
\[ \frac{\partial Bill_i}{\partial Temp_i} = \beta_2 + \beta_3 Size_i \qquad (4.21) \]
Including the interaction introduces a kind of non‐linearity into the model, in that
the effects of temperature and size on the electric bill are no longer constant. As
long as β3 is positive, the effect of temperature on the electric bill is greater the
larger the house.
OLS is a kind of linear regression. So, it may occur to you to wonder if we can
actually estimate this model that has some sort of non‐linearity with a linear
regression model. Well, we can. Linear regression models must be linear in the
unknown values to be estimated – which are the coefficients. They do not need to be
linear in the observed variables. Indeed, we can make any sort of transformation we
would like to the underlying data. Once that is done, the independent and dependent
variables become constants from the viewpoint of estimating the coefficients, which
is done through solving a set of linear equations – that is, linear in the unknown
coefficients.
The table below displays the dataset with our new variable, which is simply the
product of Size and Temp. Running a new regression on this data, we obtain the
following:
\[ \widehat{Bill}_i = -82.51 + 79.57\,Size_i + 3.01\,Temp_i - 0.21\,Size_i \times Temp_i \qquad (4.22) \]
The interpretation is that the effect of a one degree increase in temperature on the
bill is $0.21 lower for every 1,000 square foot increase in size. This is contrary to
our expectations. What might explain this? Most likely, some sort of correlation
between the new variable and something omitted from our model. One obvious
candidate would be the age of the structure. Newer structures are both larger and
better insulated, on average. Correlation between size and unmeasured insulation
quality might overwhelm the tendency of cooling costs to rise faster with
temperature in larger residences. If we included a measure of residence age or
insulation quality, we might find a positive effect of the interaction between size
and temperature.

Obs   Bill ($)   Size (1000s)   Temp   Size × Temp
1     343.32     3.2            63     201.6
2     299.21     3.4            51     173.4
3     302.02     1.6            99     158.4
4     167.94     1.2            60     72
5     209.55     2.7            45     121.5
6     367.80     3.1            95     294.5
7     390.06     2.5            94     235
8     398.46     3.2            97     310.4
9     224.87     1.6            64     102.4
10    313.27     3              51     153
11    209.36     1.6            54     86.4
12    355.15     3.1            81     251.1
13    344.13     1.7            91     154.7
14    453.55     3.7            99     366.3
15    184.73     1.2            72     86.4
16    372.66     3.8            82     311.6
17    264.10     2.2            88     193.6
18    325.37     2.6            74     192.4
19    204.54     1.8            45     81
20    205.89     2.2            64     140.8
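The point that the interaction model is still linear in the coefficients can be made concrete. The sketch below is one possible way to do it, not the procedure used to produce the results in the text: it builds the Size × Temp column from the tabulated data and solves the least squares normal equations directly in plain Python. The helper names `solve` and `ols` are illustrative. If the tabulated values are the ones the text used, the printed coefficients should be close to those in (4.22), though that has not been verified here.

```python
# Columns: Bill ($), Size (1000s of sq ft), Temp, copied from the table.
DATA = [
    (343.32, 3.2, 63), (299.21, 3.4, 51), (302.02, 1.6, 99), (167.94, 1.2, 60),
    (209.55, 2.7, 45), (367.80, 3.1, 95), (390.06, 2.5, 94), (398.46, 3.2, 97),
    (224.87, 1.6, 64), (313.27, 3.0, 51), (209.36, 1.6, 54), (355.15, 3.1, 81),
    (344.13, 1.7, 91), (453.55, 3.7, 99), (184.73, 1.2, 72), (372.66, 3.8, 82),
    (264.10, 2.2, 88), (325.37, 2.6, 74), (204.54, 1.8, 45), (205.89, 2.2, 64),
]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols(rows, y):
    """Least squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

y = [bill for bill, _, _ in DATA]
# Each row: constant, Size, Temp, and the transformed column Size * Temp.
X = [[1.0, size, temp, size * temp] for _, size, temp in DATA]

coefs = ols(X, y)
residuals = [yi - sum(b * v for b, v in zip(coefs, row)) for row, yi in zip(X, y)]
for name, b in zip(["Intercept", "Size", "Temp", "Size x Temp"], coefs):
    print(f"{name:12s} {b:10.3f}")
```

Once the product column has been computed, it is just another column of numbers, and the system being solved is linear in the four unknown coefficients.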
Categorical or Dummy Variables
Whereas the variables we’ve considered thus far have all been continuous,
categorical, or dummy, variables are discrete. They allow us to capture whether
certain discrete criteria are met or not. Examples include the effects of things such
as race, gender, or holding an advanced degree. Dummy variables take on a value of
1 if a condition is true, and are 0 otherwise. In a way, this is another kind of non‐
linearity in that the effect of these variables on the dependent variable is not
gradual, it is either all there or not there at all.
The number of dummy variables required to represent a categorical
classification is one less than the number of possible categories. To capture the sex
of a consumer, we would need one variable. It could be called Male and take on a
value 1 if the subject were male and 0 otherwise, or it could be called Female and
take on the value 1 if the subject is female and 0 otherwise. Both variables, though,
are not needed. If the subject is female, we know they are not male. If they are not
female, we know they are male.
For a more complex example, suppose we have data that classify highest degree
earned into: 1) less than a high school diploma, 2) a high school diploma, 3) an
Associate’s degree, 4) a Bachelor’s degree, and 5) higher than a Bachelor’s degree.
We would need to introduce four binary variables to represent these five possible
categories. Suppose we chose to “omit” the category “a high school diploma”.
Anytime all of the other variables were 0, we would know that that observation
corresponded to someone who had completed high school but had not completed
any college level degrees.
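The degree example can be sketched in code. The category labels and the function name `encode` below are illustrative choices, not names from any particular statistics package. Five categories produce four dummy variables, and the omitted category is the one for which all four are zero.

```python
CATEGORIES = [
    "less than high school",
    "high school diploma",
    "associate's degree",
    "bachelor's degree",
    "higher than bachelor's",
]
OMITTED = "high school diploma"   # the reference category chosen in the text

# One dummy per category except the omitted one: 5 categories give 4 dummies.
DUMMY_NAMES = [c for c in CATEGORIES if c != OMITTED]

def encode(degree):
    """Return the 0/1 dummy values for one observation's highest degree."""
    if degree not in CATEGORIES:
        raise ValueError(f"unknown category: {degree!r}")
    return [1 if degree == name else 0 for name in DUMMY_NAMES]

for degree in CATEGORIES:
    print(f"{degree:25s} {encode(degree)}")
```

The coefficient on each dummy is then read relative to the omitted category, here someone whose highest degree is a high school diploma.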
To return to our electric bill example, we might reasonably suspect the presence
of a swimming pool to affect the electric bill, due to the need to run a pool pump. We
could add this to the regression with the inclusion of a categorical variable, Pool,
which takes on the value of 1 if a pool is present at the residence and 0 otherwise.
The model would then become
\[ Bill_i = \beta_0 + \beta_1 Size_i + \beta_2 Temp_i + \beta_3 Size_i \times Temp_i + \beta_4 Pool_i + \varepsilon_i \qquad (4.23) \]
The coefficient on Pool, β4, represents the addition to the bill when a pool is
present. When no pool is present, Pool is 0, so there is no addition to the bill. Thus,
our estimate for β4 represents the average change in the electric bill due to the
presence of a pool.
The table below shows the new dataset. Running this new regression produces
the following result:
\[ \widehat{Bill}_i = 59.64 + 20.12\,Size_i + 1.62\,Temp_i + 0.15\,Size_i \times Temp_i + 74.93\,Pool_i \qquad (4.24) \]
On average, the presence of a pool adds $74.93 to the electric bill.
Note the changes in the other parameter values. In particular, the coefficients on
Size and Temp are much smaller and the coefficient on the interaction of Size and
Temp is now positive, at 0.15. So, controlling for the presence of a pool, the effect
of a one degree increase in temperature on the electric bill is $0.15 higher per
thousand square foot increase in the size of the residence. So, for example, an
increase in temperature from 60 degrees to 90 degrees would increase the electric
bill by 0.15 times 30, or $4.50, more in a 2,500 square foot home than in a 1,500
square foot home.

Obs   Bill ($)   Size (1000s)   Temp   Size × Temp   Pool
1     343.32     3.2            63     201.6         1
2     299.21     3.4            51     173.4         1
3     302.02     1.6            99     158.4         0
4     167.94     1.2            60     72            0
5     209.55     2.7            45     121.5         0
6     367.80     3.1            95     294.5         1
7     390.06     2.5            94     235           1
8     398.46     3.2            97     310.4         1
9     224.87     1.6            64     102.4         0
10    313.27     3              51     153           1
11    209.36     1.6            54     86.4          0
12    355.15     3.1            81     251.1         1
13    344.13     1.7            91     154.7         1
14    453.55     3.7            99     366.3         1
15    184.73     1.2            72     86.4          0
16    372.66     3.8            82     311.6         1
17    264.10     2.2            88     193.6         0
18    325.37     2.6            74     192.4         1
19    204.54     1.8            45     81            0
20    205.89     2.2            64     140.8         0

What caused these changes when we introduced the variable Pool? First, there
must have been some correlation in multiple dimensions, known as
multicollinearity, between Size, Temp, and Pool in this data. Checking this out, you
would find a very strong positive correlation between size and the presence of a
pool, and a weaker correlation between average temperature and the presence of a
pool. This induces, of course, a correlation between Pool and Size × Temp. This
correlation between the included variables and the omitted variable Pool
introduced bias into the previous coefficient estimates, since Pool has a direct and
important effect on electric bills. Including Pool removed this source of
contamination and changed the results. Second, it is entirely possible that any or all
of these variables are still correlated in unknown ways with the remaining random
error component which is omitted from the regression.
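The arithmetic behind the 60 to 90 degree comparison can be checked with a few lines of code. This is a sketch that simply plugs the estimates from equation (4.24) into the model; the function names are illustrative.

```python
# Coefficient estimates copied from equation (4.24).
B0, B_SIZE, B_TEMP, B_INTERACT, B_POOL = 59.64, 20.12, 1.62, 0.15, 74.93

def predicted_bill(size, temp, pool):
    """Predicted monthly bill; size in 1000s of sq ft, pool is 0 or 1."""
    return (B0 + B_SIZE * size + B_TEMP * temp
            + B_INTERACT * size * temp + B_POOL * pool)

def temp_effect(size):
    """Change in the predicted bill per one degree rise in temperature."""
    return B_TEMP + B_INTERACT * size

def bill_change(size, temp_from, temp_to, pool=0):
    """Change in the predicted bill when temperature moves between two levels."""
    return predicted_bill(size, temp_to, pool) - predicted_bill(size, temp_from, pool)

large = bill_change(2.5, 60, 90)   # 2,500 square foot home
small = bill_change(1.5, 60, 90)   # 1,500 square foot home
print(round(large - small, 2))     # 4.5, matching 0.15 times 30 in the text
```

Because Pool enters the model additively, the difference is the same whether or not the homes have pools.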
Flexibility of Functional Form – Log and Other Transformations
We have seen two limited examples of non‐linearity in our explanatory variables
above: interactions and dummy variables. But, even though the model must be
linear in the coefficients to use standard linear regression techniques, the sky is the
limit with respect to non‐linearity of the variables.⁴ If there is reason to believe a
relationship is quadratic (potentially U‐shaped or shaped like an inverted U), we can
include the square of one of the explanatory variables as a new independent
variable. If we think the relationship may be cubic, for example the typical cost
function from microeconomics, we can include both the square and the cube of an
explanatory variable as additional independent variables. If we think the dependent
variable is inversely proportional to one of the explanatory variables, we can
include 1/X as an independent variable. All of these transformations of the
independent variables leave the regression model linear in the coefficients to be
estimated. All of these can be estimated using OLS. Thus, while the standard
technique may be referred to as linear regression, it offers a great deal of flexibility
with regard to the shape of the approximation.

⁴ There are non‐linear regression techniques that allow non‐linearity in the parameters, though they
are less commonly used. They are, however, beyond the scope of this class.
Of particular interest is the log‐linear or constant elasticity model. Anytime we
might expect the percentage response of the dependent variable to be a roughly
constant multiple of the percentage change in the independent variable, this may be
a good approximation to use. For example, suppose we think a 10% increase in
income will cause a 10% increase in purchases of quarter pound burgers whether
the average price of a burger is $2 or $4. Such a relationship is not captured with a
simple linear model.
Returning to the electric bill example, it may be reasonable to think the percentage
change in the electric bill is a multiple of the percentage change in house size or
other variables. The absolute increase in the electric bill when house size increases
from 1,500 to 2,000 square feet would then be larger if the temperature is higher,
but the same in percentage terms. Thus, we could approximate the electric bill with
the following power function:
\[ Bill_i = e^{\beta_0 + \beta_4 Pool_i + \varepsilon_i} Size_i^{\beta_1} Temp_i^{\beta_2} \qquad (4.25) \]
We dropped the separate variable equal to the product of temperature and size
because, in this form, the two variables are already multiplied by one another, albeit
after being raised to potentially different powers. So, this functional form already
builds in interactions between the independent variables. Taking the partial
derivative with respect to temperature yields the following:
\[ \frac{\partial Bill_i}{\partial Temp_i} = \beta_2 e^{\beta_0 + \beta_4 Pool_i + \varepsilon_i} Size_i^{\beta_1} Temp_i^{\beta_2 - 1} = \beta_2 \frac{Bill_i}{Temp_i} \qquad (4.26) \]
Thus, once we have taken the natural logs of the variables, the model is linear in the
logs, and we can use linear regression techniques to estimate the parameter values.
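The two steps that connect the power function to a model that is linear in logs can be written out explicitly. The lines below follow directly from (4.25) and (4.26) by algebra; they carry no equation numbers because they are a reconstruction rather than equations from the original text.

```latex
% Rearranging (4.26): the elasticity of the bill with respect to temperature is constant.
\frac{\partial Bill_i}{\partial Temp_i} \cdot \frac{Temp_i}{Bill_i} = \beta_2

% Taking natural logs of both sides of (4.25):
\ln Bill_i = \beta_0 + \beta_1 \ln Size_i + \beta_2 \ln Temp_i + \beta_4 Pool_i + \varepsilon_i
```

The second line is linear in the coefficients, with ln Size, ln Temp, and Pool playing the role of the independent variables. Pool is left untransformed because it enters the exponent of (4.25) directly.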
The transformed data for our electric bill example is given in the table below.
Running the regression model above on this transformed data produces the
following result:
The interpretation of the coefficients of the first two variables is that a 10% increase
in house size increases the electric bill by 2.8%, and a 10% increase in temperature
increases the bill by 4.6%.
The interpretation of the coefficient on Pool is different. Recall that in the original
power function version of the model, the electric bill is
\( e^{\beta_0 + \beta_4 Pool_i + \varepsilon_i} Size_i^{\beta_1} Temp_i^{\beta_2} \).
So, when a pool is present, the electric bill that would be incurred without the
pool is multiplied by \( e^{\beta_4} \), here \( e^{0.26} = 1.2969 \). So, the presence of a pool increases electric bills
by about 30%. It seems unreasonable to think a pool would have that large an effect.
But, remember, the coefficient is measuring not only the effect of a pool, but, also the
effect of any omitted appliances and activities that are closely correlated with the
presence of a pool (plus, this is hypothetical data anyway). In general, when the
dependent variable is in log form, the coefficients on dummy variables are closely
related to the percentage change arising from the presence of the condition
indicated by the dummy variable. To estimate this percentage effect, we
exponentiate the coefficient and subtract 1. (The result will be close to the
coefficient, as long as the coefficient is not too far from 0. Those familiar with how
natural logs and exponential functions work may see why that is.)
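The rule "exponentiate the coefficient and subtract 1" is easy to check numerically. The sketch below uses the Pool coefficient of 0.26 from the text; the other two coefficients are made-up values, included only to show how the shortcut of reading the coefficient itself as the percentage change behaves near and far from zero.

```python
import math

def dummy_pct_effect(coef):
    """Percentage change implied by a dummy coefficient when the dependent variable is in logs."""
    return (math.exp(coef) - 1) * 100

pool_coef = 0.26   # Pool coefficient used in the text
print(round(math.exp(pool_coef), 4))           # multiplier on the bill: 1.2969
print(round(dummy_pct_effect(pool_coef), 1))   # percentage effect: 29.7

# Compare the coefficient (times 100) with the exact percentage effect.
for coef in (0.05, 0.26, 1.0):
    print(coef, round(dummy_pct_effect(coef), 1))
```

For a coefficient of 0.05 the exact effect is about 5.1%, very close to the coefficient itself, while for a coefficient of 1.0 it is about 172%, nowhere near 100%.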
In this chapter, we have shown how to use regression analysis to estimate
approximations of underlying economic relationships and how to interpret the
results. The technique of regression analysis involves using calculus and algebra to
find coefficient estimates that minimize the total (squared) difference between the
approximation and the data in a given sample. While the approximation must be
linear in the parameters, it can be very non‐linear in the variables. However, just
because we have estimated an approximation does not mean the approximation is
reliable, reasonable, or accurate enough for its intended application. The next
chapter takes up evaluation of regression results.
Chapter 4 Terminology
The following is a list of terms that you should know in order to discuss and
apply the material from this chapter.
Chapter 5
Evaluating Regression Analyses
Up to this point, we’ve discussed the process for constructing an approximation
based on a regression. However, how do we evaluate the accuracy of a regression?
For example, above we presented the results of five different regressions to model
electric bills in our hypothetical dataset. How would we decide which was most
accurate? How would we decide if the most accurate one was accurate enough?
Bias and Imprecision
The first thing to understand is the difference between imprecision and bias.
This idea is easy to illustrate with an example that has nothing to do with
economics. Suppose the Three Stooges are playing darts. Larry is all over the board
with no pattern at all, Curly throws a tight group but he is always several inches
high and right, while Moe throws a tight group that tends to be a little low and left.
Larry is unbiased, because he is not consistently off in any particular way, but he is
very imprecise. Curly is precise because he throws tight groups, but he is biased
because he is always high and right by a wide margin. Moe is relatively precise and
only slightly biased.
Imprecision means a lack of consistency or a high degree of random variation
in our results, whether or not they are right “on average”. That is, if we repeated a
regression numerous times on different random samples, our estimates would vary
a great deal from one regression to another. Bias refers to a
non‐random difference between the results of a regression and the true underlying
model.
As touched on in Chapter 4, potential correlation between independent variables
in the regression and variables that have been omitted and thus are part of the error
component introduces omitted variables bias. Because there are always unknown
unknowns, this reflects uncertainty about the true underlying relationships at a very
deep level. While there are techniques to limit the impact of such bias, a large
measure of judgment is required. How can we speak with complete objectivity about
the impact of the unknown unknowns on our regression models? Before tackling the
topic of omitted variables bias, it is sensible to increase the quality of our judgment
about economic models. So, an in depth discussion of bias in regression models is
postponed until much later in the book. For now, we will assume that all inaccuracy
within our models is due to imprecision and focus on how to quantify, evaluate, and
improve precision.
Evaluating the Model Specification and Data
Before we discuss quantitative measures of accuracy, we consider some general
principles that should be followed to increase the reliability of our results. When
they are not followed, they give us reason to suspect the validity of the findings.
It’s important to include all variables that our theory indicates are major
determinants in the approximation. For example, if we wanted to predict how many
ice cream cones we plan on selling at a beachfront shop on a given day, it would be
important to include a variable that controls for the weather, since rain probably
reduces demand.
Similarly, it is important NOT to include variables for which there is no
theoretical rationale. It may seem natural to include every variable for which data is
available, just in case it has some correlation with the dependent variable. This is
referred to as data mining, or as overfitting when we decide which variables to
include based only on their contribution to the measured “fit” of the approximation
to the data. It should be avoided to maintain the integrity of the approximation. It is
almost certain that a large number of variables are correlated with past
observations of the dependent variable through sheer random chance. If you search
long enough, you will find some for sure. But, there is no reason at all to expect such
correlations to be due to any actual underlying relationships or for them to hold up
in the future – they may simply be spurious. So, no inferences about underlying
causation or future outcomes should be drawn from a regression that was arrived at
by throwing in every possible variable and keeping the ones that “worked” in that
they were correlated. Further, every irrelevant variable included in a regression in
essence uses up some of the data, reducing the ability to test the importance of the
other, more relevant, variables with limited data. In short, economic theory should
be the backbone of our choice of variables.
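How easily a search turns up spurious correlations can be illustrated with a simulation. In the sketch below everything is random noise by construction: the number of observations (20), the number of candidate variables (200), and the seed are arbitrary assumptions chosen for illustration.

```python
import random

def corr(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1)            # fixed seed so the run is repeatable
n_obs, n_candidates = 20, 200

# The dependent variable and every candidate variable are independent noise,
# so there is no real relationship to be found.
y = [rng.gauss(0, 1) for _ in range(n_obs)]
candidates = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_candidates)]

correlations = [corr(x, y) for x in candidates]
best = max(correlations, key=abs)
print("strongest correlation found in pure noise:", round(best, 2))
```

With only 20 observations, the strongest of 200 unrelated candidates will typically show a sizable correlation with the dependent variable, even though none of them has any connection to it. A variable chosen this way has no reason to keep "working" on new data.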
The actual sample data must be as appropriate to our purpose, reliable, and
large as is feasible. Often, there are trade‐offs between the quality of a dataset and
your sample size. For example, a city level analysis may allow you to examine the
impact of variations in price, income, and demographic characteristics on demand
more easily than state level data, but state level data on many variables such as
income and demographic characteristics is often more accurate and available much
more frequently than city level data. More data will always improve the accuracy of
our regression – as long as it is good data. But, more data that is not suited to our
purpose is useless – worse than a small dataset containing accurate observations on
the right variables.
When designing a regression, we need to decide what form it will take, not only
what variables to include. Should the model be linear, quadratic, log‐linear, or take
some other form? Is there good reason to assume interaction between two or more
of the independent variables? Often, economics suggests important normalizations
to make before running a model. For example, demand should often be expressed on
a per capita basis and dollar amounts should be adjusted for inflation. These
decisions should be based on economic theory insofar as possible, though theory
offers less guidance about the detailed shape of an approximation than about the
variables that should be included.
Evaluating the Signs and Magnitudes of the Coefficient Estimates
Once we have the regression results, the first thing to do is to check the signs of
the coefficient estimates against our expectations based on economic theory. For
example, the law of demand states that increasing prices yield decreasing
quantities; thus, we should find negative coefficients on product price when
approximating quantity demanded. Or, if one of our variables was income and we
were dealing with a normal good (one whose demand increases as income
increases), we would expect the coefficient on income to be positive. If the signs of
the coefficients depart from well established theory it is a reason to suspect there is
something wrong with our model. If there is something wrong with our model, there
is no reason to expect it to be stable from one time to the next and therefore no
reason to think it has any predictive power.
We may have strong expectations about the signs of some coefficients for
reasons specific to an individual case. If these expectations are not borne out, we
should think carefully about what that means. Is it most likely that our expectations
were wrong or that there is some flaw in our data or model? Is it likely that omitted
variables are causing the confusion?
We should also consider the reasonableness of the magnitude of the coefficients,
though there is no hard and fast rule for what “reasonable” means. Basically, the
coefficient estimates should meet the straight face test. For example, if you find that
a 1% increase in the price of gasoline decreases market demand by 100%, or that a
100% increase in income increases demand for travel by only 1%, something is wrong.
Consulting other studies of similar markets is a good way to get an idea if your
coefficient estimates are in the ballpark of reasonable.
Evaluating the Statistical Significance of the Results
After checking that the signs of the estimated coefficients are consistent with
established economic theory and the results of other empirical studies, that their
magnitudes are reasonable, and making sure that there is a reasonable explanation
for any anomalies, it is time to move on to quantitative measures of the reliability
and precision of the results. For this, we need to revisit what the regression is doing
mathematically. As seen in equations (4.12) and (4.14), the coefficient estimates are
chosen to minimize the sum of squared errors, SSE.
Another way of looking at it is that we are trying to choose the coefficients to
account for as much of the variation in the dependent variable around its mean
value as possible. Letting \( \bar{Y} \) represent the mean value of the dependent variable, the
total sum of squared variation in the dependent variable, SST, is:
\[ SST = \sum_i (Y_i - \bar{Y})^2 \qquad (5.1) \]
Similarly, the sum of squared variation attributable to the model, SSM (sometimes
known as the regression sum of squares or the explained sum of squares), is:
\[ SSM = \sum_i (\hat{Y}_i - \bar{Y})^2 \qquad (5.2) \]
Making use of equations (4.14) and a good bit of algebra, it is possible to show that
the sum of squares total is equal to the sum of squares attributable to the model plus
the sum of squared errors:
SST = SSM + SSE . (5.3)
So, minimizing SSE maximizes SSM. These definitions and this last equality are
useful in evaluating the model. Analysis based on these types of calculations
pertaining to the variation in the data is known as Analysis of Variance, or ANOVA.
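The decomposition in (5.3) can be checked numerically. The sketch below uses the Bill and Size columns of the hypothetical electric bill data from Chapter 4 and fits the one-variable model with the standard closed-form least squares formulas. It is meant only to illustrate the identity, and the printed sums of squares are not figures reported in the text.

```python
# Bill ($) and Size (1000s of sq ft) for the 20 observations in Chapter 4.
bill = [343.32, 299.21, 302.02, 167.94, 209.55, 367.80, 390.06, 398.46, 224.87, 313.27,
        209.36, 355.15, 344.13, 453.55, 184.73, 372.66, 264.10, 325.37, 204.54, 205.89]
size = [3.2, 3.4, 1.6, 1.2, 2.7, 3.1, 2.5, 3.2, 1.6, 3.0,
        1.6, 3.1, 1.7, 3.7, 1.2, 3.8, 2.2, 2.6, 1.8, 2.2]

n = len(bill)
mean_bill = sum(bill) / n
mean_size = sum(size) / n

# Closed-form OLS for one explanatory variable plus an intercept.
slope = (sum((s - mean_size) * (b - mean_bill) for s, b in zip(size, bill))
         / sum((s - mean_size) ** 2 for s in size))
intercept = mean_bill - slope * mean_size
fitted = [intercept + slope * s for s in size]

sst = sum((b - mean_bill) ** 2 for b in bill)            # total variation
ssm = sum((f - mean_bill) ** 2 for f in fitted)          # variation due to the model
sse = sum((b - f) ** 2 for b, f in zip(bill, fitted))    # squared errors

print(f"SST = {sst:.2f}")
print(f"SSM = {ssm:.2f}")
print(f"SSE = {sse:.2f}")
print(f"SSM + SSE = {ssm + sse:.2f}")
```

The equality holds for least squares estimates from a model that includes an intercept. For arbitrary predicted values, SSM and SSE need not add up to SST.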
In evaluation of the model, a lower SSE is better, all else equal. But, it is always
good to have more independent data to use to estimate the model, and the more
observations, the higher SSE. What we need is some measure of the typical squared
error. Rather than dividing by the number of observations, n, we divide by n‐K‐1,
which corresponds to the degrees of freedom of SSE. Recall that when fitting a line,
or any two‐parameter approximation, to two data points, we accounted for the
observed data completely. Similarly, since we are estimating K+1 coefficients, we
can fit K+1 points perfectly. We therefore only have n‐K‐1 independent
contributions to SSE. The smaller SSE/(n‐K‐1), the more power the model has to
account for the dependent variable. For later reference, this ratio is known as the
mean square error, or MSE:
\[ MSE = \frac{SSE}{n - K - 1} \qquad (5.4) \]
Similarly, the more variation picked up by the model per explanatory variable,
that is, the higher SSM/K, the more explanatory power the variables would seem to
have.⁵ This is the mean square due to the model, or MSM:
\[ MSM = \frac{SSM}{K} \qquad (5.5) \]
The ratio of these two measures is an F‐statistic:
\[ F = \frac{MSM}{MSE} = \frac{SSM / K}{SSE / (n - K - 1)} \qquad (5.6) \]
The F‐statistic is so named because it follows a statistical distribution known as the
F‐distribution. We will not be proving that the expression in equation (5.6) follows
the F‐distribution. But, it would be good to know if there is reason to believe the K
independent variables we selected tell us more as a group about the dependent
variable than would K randomly selected variables. The fact that the properties of
the F‐distribution are known means that we can use the F‐statistic to shed light on
just that question.
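Equations (5.4) through (5.6) translate directly into a short calculation. In the sketch below the sums of squares, the sample size, and the number of explanatory variables are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def f_statistic(ssm, sse, n, k):
    """Return (MSM, MSE, F) for a model with k explanatory variables and n observations."""
    df_error = n - k - 1
    if k < 1 or df_error < 1:
        raise ValueError("need k >= 1 and n > k + 1")
    msm = ssm / k            # equation (5.5)
    mse = sse / df_error     # equation (5.4)
    return msm, mse, msm / mse

# Hypothetical inputs: SSM = 600, SSE = 400, 20 observations, 2 explanatory variables.
msm, mse, f = f_statistic(ssm=600.0, sse=400.0, n=20, k=2)
print(f"MSM = {msm:.2f}, MSE = {mse:.2f}, F = {f:.2f}")
```

Turning F into a p‐value requires the F‐distribution with K and n‐K‐1 degrees of freedom, which regression software supplies. In SciPy, for example, `scipy.stats.f.sf(f, k, n - k - 1)` gives the upper tail probability.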
⁵ K is the degrees of freedom of the sum of squares accounted for by the model, even though K+1
coefficients are estimated. That is because \( SSM = \sum_i (\hat{Y}_i - \bar{Y})^2 \), and, while K+1 coefficients are
used to calculate the predicted values, one is also used to calculate the mean, so, the independent
variation around the mean can move in K dimensions, not K+1.
Regression output provides the F‐statistic and a p‐value associated with it.
Roughly speaking, that p‐value is the probability that K randomly selected variables
with no systematic relationship with the dependent variable would be as highly
correlated with the dependent variable as are the K variables of our model. Put
differently, it tells us how easily a fit this good could arise from purely spurious
correlation. Thus, the lower the p‐value, the stronger the evidence that
our explanatory variables are systematically related to the dependent
variable. If the p‐value is low enough, the model as a whole is said to be statistically
significant – that is, there is statistical evidence that there really are underlying
relationships in the data. What “low‐enough” means is up to the judgment of the
researcher and the user, though 10%, 5%, and 1% are often points of focus in
discussions. For those familiar with hypothesis testing, it is a p‐value for a test of the
null hypothesis that all of the independent variables are unrelated to the dependent
variable.
Once we have established that the model as a whole is statistically significant,
the statistical significance of the individual coefficient estimates should be
evaluated. Just as the sign of the coefficient on price should be negative, since there
is a strong theoretical reason to think price is an important determinant of the
quantity demanded, there should be statistical evidence that price increases reduce
demand. Regression output provides estimates of the coefficient, the standard error
of the coefficient, a t‐statistic for the coefficient, and a p‐value associated with that t‐
statistic.
While deriving the formula for the standard error of the coefficients is beyond
our scope, it is important to understand what it means. The standard error of the
estimated coefficient on independent variable k is denoted \( \sigma_{\beta_k} \). Specifically, it is the
square root of the variance of the estimated coefficient. The variance of the
estimated coefficient, denoted \( \sigma^2_{\beta_k} \), is the expected value of the square of the
difference between the estimated coefficient and the true value:
\[ \sigma^2_{\beta_k} = E\left( (\hat{\beta}_k - \beta_k)^2 \right) \qquad (5.7) \]
Imagine collecting many independent samples and running the same regression
repeatedly. Due to the random error component, you will not get the same
coefficient estimates each time. The variance of the coefficient is an estimate of the
typical squared difference between the estimated coefficients and the true
underlying coefficient. The standard error of the coefficient is the square root of its
variance, so it is an estimate of the typical difference (in absolute terms) to be
expected between a particular estimate and the true value.
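The thought experiment of running the same regression on many independent samples can be carried out by simulation. The sketch below assumes a hypothetical true model with one explanatory variable. The intercept, slope, error standard deviation, and x values are all made up. The comparison formula, sigma divided by the square root of the sum of squared deviations of x, is the standard result for this simple case and is not derived in the text.

```python
import random
import statistics

rng = random.Random(42)                              # fixed seed for repeatability
true_intercept, true_slope, sigma = 1.0, 0.5, 2.0    # hypothetical true model
x = list(range(20))                                  # same x values in every sample
mean_x = sum(x) / len(x)
sxx = sum((v - mean_x) ** 2 for v in x)

def estimate_slope():
    """Draw one fresh sample of y and return the OLS slope estimate."""
    y = [true_intercept + true_slope * v + rng.gauss(0, sigma) for v in x]
    mean_y = sum(y) / len(y)
    return sum((v - mean_x) * (w - mean_y) for v, w in zip(x, y)) / sxx

slopes = [estimate_slope() for _ in range(2000)]

# Root of the mean squared distance between the estimates and the true slope,
# which is the quantity defined in equation (5.7) before taking the square root.
empirical_se = statistics.pstdev(slopes, mu=true_slope)
theoretical_se = sigma / sxx ** 0.5

print("average estimate:   ", round(statistics.mean(slopes), 3))
print("spread of estimates:", round(empirical_se, 4))
print("sigma / sqrt(Sxx):  ", round(theoretical_se, 4))
```

In practice only one sample is available, so regression software estimates the standard error from that sample alone, replacing the unknown sigma with the root mean square error of the regression.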
The t‐statistic is just the ratio of the coefficient to its standard error:
\[ t_{\beta_k} = \frac{\hat{\beta}_k}{\sigma_{\beta_k}} \qquad (5.8) \]
Intuitively, if the coefficient is small relative to the typical error in estimating it,
there is no real reason to think it is anything other than zero. Suppose we estimate a
log‐linear demand approximation and find an estimated price elasticity of demand
equal to ‐0.25 but the standard error is 0.5. The typical error is bigger than the
coefficient, so the “true” coefficient could easily be 0, and our finding of ‐0.25 could
be random variation. Thus, there would seem to be no statistical evidence that the
demand curve slopes down. That should make us very suspicious of our model
specification, the data we used, or both.
The statistical distribution of the t‐statistic is similar, but not identical, to the
normal distribution in that it has a similar shape and becomes closer to the normal
distribution as the sample size gets bigger. Roughly speaking, as long as there are a
reasonably large number of observations relative to the number of parameters
being estimated, say at least 30, the chances of observing a t‐statistic of 2 or more in
absolute value when the true coefficient is zero are about 5%. So, a good rule of
thumb is that a t‐statistic near 2 or larger in absolute
value indicates statistical significance. For the example used in the previous
paragraph, where the coefficient was ‐0.25 and the standard error was 0.5, the t‐
statistic is ‐0.5.
Alternatively, we could use the standard error of the coefficient and knowledge
of the t‐distribution to construct a confidence interval for the true value of the
coefficient. Using the rule of thumb described above, an approximate 95%
confidence interval for the coefficient is the estimated coefficient plus or minus
twice the standard error:
\[ \hat{\beta}_k - 2\sigma_{\beta_k} \le \beta_k \le \hat{\beta}_k + 2\sigma_{\beta_k} \qquad (5.9) \]
The interpretation is that if this regression were run repeatedly on independent
samples, an interval constructed like this would contain the true value about 95% of
the time. We can test the hypothesis that the true value of the coefficient is zero by
checking to see whether the confidence interval contains zero. If not, there is
statistical evidence that there is some underlying relationship.
For the example above, the 95% confidence interval for demand elasticity would
be constructed as follows:
\[ -0.25 - 2 \cdot 0.5 \le \eta \le -0.25 + 2 \cdot 0.5 \]
\[ -1.25 \le \eta \le 0.75 \]
Speaking approximately, in the example we are 95% sure the true elasticity of
demand falls between ‐1.25 and 0.75. Thus, the model does not provide solid
evidence that elasticity of demand is even negative!
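The t‐statistic and the rule-of-thumb interval for the elasticity example can be reproduced in a few lines. The function names below are illustrative, and the multiplier of 2 is the text's approximation rather than an exact critical value from the t‐distribution.

```python
def t_statistic(coef, std_err):
    """Ratio of the estimated coefficient to its standard error, as in (5.8)."""
    return coef / std_err

def approx_ci95(coef, std_err):
    """Rule-of-thumb 95% interval: the estimate plus or minus two standard errors, as in (5.9)."""
    return coef - 2 * std_err, coef + 2 * std_err

coef, std_err = -0.25, 0.5        # the elasticity example from the text
t = t_statistic(coef, std_err)
low, high = approx_ci95(coef, std_err)

print("t-statistic:", t)                                   # -0.5
print("interval:", (low, high))                            # (-1.25, 0.75)
print("passes the |t| >= 2 rule of thumb:", abs(t) >= 2)   # False
print("interval contains zero:", low <= 0 <= high)         # True
```

The two checks always agree: the interval excludes zero exactly when the t‐statistic exceeds 2 in absolute value.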
To formalize the idea of statistical significance precisely, rather than by rule of
thumb, we need to know precisely the statistical distribution of the t‐statistic. It is
similar, but not identical, to the normal distribution, in that it has a similar shape
and becomes closer to the normal distribution as the sample size gets bigger.
Importantly, the reported p‐value of the t‐statistic tells us the chances of observing a
t‐statistic at least as large in absolute value as the one we obtained if the true value
of the coefficient is zero. Thus, if the p‐value is “low enough”, the coefficient is said
to be statistically significant. The null hypothesis that the true value of the
coefficient is zero may be
rejected at a level of significance given by the p‐value. This test is only strictly valid
under the assumptions of the linear regression model, the most important of which
is that there is no omitted variables bias. Again, what “low enough” means is up to
the ones using the results.
The qualitative and quantitative criteria discussed above can tell us if the results
of the model are reasonable based on economic theory and if they appear to
represent real relationships based on statistical evaluation. Those hurdles must be
cleared by any good model. But, they do not tell us whether the model is “good
enough”. How to make that determination depends on the purpose of the
regression. There are two possibilities: to generate a model that can be used to
predict the values of the dependent variable, or to generate coefficient estimates to
use for decision making purposes.
Evaluating the Accuracy of Results – Dependent Variable Prediction
If the purpose is to predict the dependent variable, we need to know how the
model fits the data. One measure of that is the R‐Squared, or R2, defined as:
\[ R^2 = \frac{SSM}{SST}. \tag{5.10} \]
Thus, the R2 is the fraction of the total variation of the dependent variable around its
mean that is explained by the model. While the R2 is often used as a measure of
“goodness of fit”, it is of limited usefulness. It does not tell you how big the errors
will be and how much that matters.
The root mean square error, also known as the standard error of the regression
or the standard error of the estimate, is much more informative. It is equal to the
square root of the mean square error and is denoted RMSE or σˆ :
\[ RMSE = \sqrt{MSE} = \sqrt{\frac{\sum_i (Y_i - \hat{Y}_i)^2}{n - K - 1}} \tag{5.11} \]
Remember, MSE is found by dividing the sum of the squared errors by the degrees of
freedom to accurately represent the average squared error per data point that the
regression is actually predicting. In short, this number represents the explanatory
power of the regression. Since we take the square root, we get an estimate of the
typical amount of error in the predicted values based on our model.
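As a rough illustration of (5.10) and (5.11), the sketch below computes the R2 and the RMSE from lists of actual and predicted values. The function names and the four data points are hypothetical. The R2 is computed as 1 - SSE/SST, which equals SSM/SST only when the predictions come from a least squares regression that includes an intercept.

```python
import math


def sum_squared_errors(actual, predicted):
    """SSE: sum of squared differences between actual and predicted values."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))


def r_squared(actual, predicted):
    """Fraction of variation around the mean accounted for by the model.

    Uses 1 - SSE/SST, which matches SSM/SST for least squares with an intercept.
    """
    mean_y = sum(actual) / len(actual)
    sst = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - sum_squared_errors(actual, predicted) / sst


def rmse(actual, predicted, num_regressors):
    """Root mean square error using n - K - 1 degrees of freedom."""
    degrees_of_freedom = len(actual) - num_regressors - 1
    return math.sqrt(sum_squared_errors(actual, predicted) / degrees_of_freedom)


# Hypothetical values, for illustration only (K = 1 independent variable).
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 3.0, 7.0, 7.0]
print(r_squared(actual, predicted))
print(rmse(actual, predicted, num_regressors=1))
```

Here SSE is 4 and there are 4 - 1 - 1 = 2 degrees of freedom, so the MSE is 2 and the RMSE is its square root.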
The RMSE can then be compared to the predicted values to get an idea whether
or not the margin of error is acceptable. There are two ways we might do this. First,
for approximately normally distributed data, about 95% of the distribution lies
within two standard deviations of the mean value (we used this rule of thumb
above). That means an approximate 95% confidence interval for the dependent
variable given a predicted value is:
\[ \hat{Y}_i - 2\hat{\sigma} \le Y_i \le \hat{Y}_i + 2\hat{\sigma} \tag{5.12} \]
This interval will give us a good idea of the variability of the actual values from a
predicted value. Suppose $\hat{Y} = 10{,}000$ and the standard error is $\hat{\sigma} = 10$. A 95%
confidence interval for the actual value is [9980, 10020]. If we need to order
inventory to meet demand, this might be a very good estimate. We could order
10,050 units and could meet demand with a very small amount of unsold product
relative to the initial order, even if demand turned out on the low end of the interval.
While confidence intervals give us a starting point for determining the accuracy
of the regression, their width does not tell us everything. What if our point estimate
was $\hat{Y} = 50$ but our standard error was still 10? The new confidence interval is
[30, 70], which is just as wide but far less precise relative to the size of the prediction. If this is a prediction of
demand and we order, say, 75 units to cover the upper end of the interval, there is a
very good chance that over 40% of our order will remain unsold.
Another, perhaps more useful, way to measure the accuracy of the predictions of
a regression is to look at the size of the standard error relative to our point estimate,
or the relative standard error, RSE:
\[ RSE = \frac{\hat{\sigma}}{\hat{Y}}. \tag{5.13} \]
If the standard error is 10 and the predicted value is 10,000, the RSE is
10/10,000 = 0.001. If instead the predicted value is 50, the RSE is 10/50 = 0.2. So, it
is not the standard error in isolation that matters, but its magnitude relative to the
predicted value of interest to the user. There is no concrete rule for what value is
acceptable; it all depends on the situation at hand. If inventory costs are small, for
example, maybe a large RSE won’t be a problem. If storage is expensive, the RSE will
need to be smaller before the regression is “good enough”.
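The two inventory examples can be compared side by side. The sketch below, using hypothetical function names, computes the rough interval from (5.12) and the relative standard error from (5.13) for predictions of 10,000 and 50, each with a standard error of 10.

```python
def prediction_interval(predicted, std_error, multiple=2.0):
    """Rough 95% interval for the actual value, as in (5.12)."""
    return predicted - multiple * std_error, predicted + multiple * std_error


def relative_standard_error(predicted, std_error):
    """Standard error as a fraction of the predicted value, as in (5.13)."""
    return std_error / predicted


std_error = 10.0
for predicted in (10_000.0, 50.0):
    low, high = prediction_interval(predicted, std_error)
    rse = relative_standard_error(predicted, std_error)
    print(f"prediction {predicted:>8.0f}: interval [{low:.0f}, {high:.0f}], RSE {rse:.3f}")
```

Both intervals are 40 units wide, but the RSE shows that the error is 0.1% of the first prediction and 20% of the second.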
Special problems arise when evaluating the accuracy of models in which the
dependent variable has been transformed, such as in log‐linear models. That is
because the root mean square error of the log variable is not the same as the root
mean square error of the untransformed variable, which is ultimately of interest.
These problems are exacerbated even more when we are comparing models in
which the variable has been transformed to models in which it has not been
transformed. This discussion gets technical – sorry, I could not come up with any
way around it!
To make the comparison, we have to transform the predicted transformed
values back into the untransformed state. For example, if we had a prediction of the
natural log of Y, $\widehat{\ln Y}$, we need to transform that to a prediction of Y. How would we
do that? It seems the answer would be to exponentiate $\widehat{\ln Y}$ to get $\hat{Y} = e^{\widehat{\ln Y}}$. That is, if
the predicted log of Y is 3.168, it seems the predicted value of Y would be $\hat{Y} = e^{3.168}$.
That is not quite true, darn! Why not?
The log linear model looks something like
\[ \ln(Y) = \beta_0 + \sum_k \beta_k \ln(X_k) + \varepsilon, \tag{5.14} \]
where the error term is approximately normally distributed and has an expected
value of zero. But, the constant elasticity model from which it is derived looks like:
\[ Y = e^{\beta_0} X_1^{\beta_1} X_2^{\beta_2} \cdots X_K^{\beta_K} e^{\varepsilon}. \tag{5.15} \]
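Once we have estimated coefficients, we can get predicted values, so we have
\[ \widehat{\ln Y} = \hat{\beta}_0 + \sum_k \hat{\beta}_k \ln(X_k). \tag{5.16} \]
Exponentiating these indeed gives
\[ e^{\widehat{\ln Y}} = e^{\hat{\beta}_0} X_1^{\hat{\beta}_1} X_2^{\hat{\beta}_2} \cdots X_K^{\hat{\beta}_K}. \tag{5.17} \]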
Evaluating the Accuracy of Results – Coefficient Estimates
Similar considerations occur when the objects of interest are the coefficient
estimates themselves. Above, we discussed constructing confidence intervals for the
coefficients. Recall that, speaking approximately, 95% of the time:
\[ \hat{\beta} - 2\hat{\sigma}_{\hat{\beta}} < \beta < \hat{\beta} + 2\hat{\sigma}_{\hat{\beta}}. \]
Looking at the interval can give us a good idea of the general variability of our
estimate. But, we should further use this understanding of the margin of error in our
estimates to get some idea of what a typical error might cost us.
For example, suppose we want an estimate of the price elasticity of demand to
set the profit‐maximizing price. Suppose the estimate were ‐6 with a standard error
of 2. Then, a 95% confidence interval for the elasticity of demand would be
\[ -10 < \beta < -2. \]
Assuming we were going to use this to determine the profit‐maximizing price, we
can use equation 2.18 to construct a point estimate and a confidence interval for the
markup factor, $\frac{\eta}{1+\eta}$. Plugging in our estimates for elasticity, we find the interval for
the markup factor to be
\[ \left[ \frac{-10}{1 - 10}, \frac{-2}{1 - 2} \right] \]
or
\[ [1.11, 2] \]
and the point estimate to be
\[ \frac{-6}{1 - 6} = 1.2. \]
This suggests that our profit‐maximizing markup over marginal cost
should be somewhere between 11% and 100%, with a point estimate of 20%. Or, if
marginal cost is $100, it suggests a price between $111.11 and $200, with a point
estimate of $120. Clearly, this is a very broad range for the optimal price. Setting a
price of $120 might cost a lot of profit if the optimal price were $111 or $200. So, it
probably makes sense to go back and try to improve the model.
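The pricing calculation can be traced step by step. The sketch below evaluates the markup factor at the two ends of the elasticity interval and at the point estimate, then multiplies by a marginal cost of $100. The function name `markup_factor` is hypothetical, and the check that demand is elastic is an added safeguard rather than something stated in the text.

```python
def markup_factor(elasticity):
    """Markup factor eta / (1 + eta); only meaningful when eta < -1."""
    if elasticity >= -1:
        raise ValueError("the markup factor requires elastic demand (eta < -1)")
    return elasticity / (1 + elasticity)


marginal_cost = 100.0
for eta in (-10.0, -6.0, -2.0):
    factor = markup_factor(eta)
    price = marginal_cost * factor
    print(f"elasticity {eta:6.1f}: markup factor {factor:.2f}, price ${price:.2f}")
```

The more elastic end of the interval (-10) gives the low price of about $111.11, and the less elastic end (-2) gives the high price of $200.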
Improving Precision
After looking at the regression results quantitatively, we may want to improve
the soundness and accuracy of the model. It is possible to improve the model by
collecting more data. More data can mean more variables or more observations. It
may seem natural that more data is always better, but this is not the case. In fact,
variables that have little or no systematic relationship with the dependent variable
can actually distort any “good” data that may be in the regression. Similarly, adding
more observations can be a bad idea if it means going to lower quality data – that is,
less precisely measured data. So, the solution for increasing accuracy is to make
sure you have more data of high quality on variables that ought to be included on a
basis of economic theory. The problem is that this can become expensive. Firms will
face a trade‐off between the quantity and quality of the data available for analysis
and the cost of that data. The more important it is to have an accurate
approximation of demand, cost, or whatever is being studied, the more it will make
sense to spend to obtain more and better observations and observations of more of
the important variables.
On a similar note, discarding data that shouldn’t be there will also increase the
reliability of the results. The computer is ultimately responsible only for the
relatively simple calculations described earlier; it has no way of judging the
substance of the economic relationships we are trying to measure.
Aside from adding additional observations, and from adding appropriate
variables or dropping inappropriate ones (based on theory and solid reasoning, not
data mining), reflection may allow us to find ways to improve the specification. We
may find that interactions between price and income are important, or that senior
citizens are more price‐sensitive, so that an interaction between age and price is
important. Careful thought can often be used to improve models that are not
performing well. Still, the performance of the model is ultimately limited by both
the care of the researcher and the quality of data available.
Evaluating Results – an Extended Example
To make all of this more concrete, let's tie it to some regression output. The
output below is from Microsoft Excel and corresponds to the model of electric bills
expressed in equation (4.24). The independent variables are Size, Temp, SizeXTemp,
and Pool. The R Square of 0.949 means the model accounts for 95% of the variation
of the electric bill around its mean value which was observed in the data. The output
under the heading ANOVA corresponds to various sum of squares calculations. The
line labeled “Regression” gives sum of squares data for the model. So, SSM is
124,040. Since SST is 130,770, we see where the R square came from:
\[ R^2 = \frac{SSM}{SST} = \frac{124040}{130770} = 0.95. \tag{5.19} \]
The standard error of the estimate, or RMSE, is given in the first set of numbers
at the top as 21.2. Under the ANOVA section, the column labeled MS corresponds to
mean square calculations where each sum of squares is divided by the
appropriate degrees of freedom. MSE is 449 and the square root of that gives:
\[ RMSE = \sqrt{MSE} = \sqrt{449} = 21.2. \tag{5.20} \]
Suppose we are interested in the accuracy of a predicted electric bill for a house
with 2,500 square feet and a pool when the temperature is 85 degrees. The
predicted value would be calculated as follows:
\[ \widehat{Bill} = 59.64 + 20.12 \cdot 2.5 + 1.62 \cdot 85 + 0.15 \cdot 2.5 \cdot 85 + 74.93 = 354. \tag{5.21} \]
Given the RMSE, the relative standard error is:
\[ RSE = \frac{21.2}{354} = 0.06. \tag{5.22} \]
That means the model will typically miss the actual bill on a house with these
characteristics by about 6%. Alternatively, a rough 95% confidence interval is:
\[ 354 - 2 \cdot 21.2 \le Bill \le 354 + 2 \cdot 21.2 \]
\[ 312 \le Bill \le 396 \tag{5.23} \]
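The prediction, relative standard error, and rough interval in (5.21) through (5.23) can be reproduced with a few lines of Python. This is only a sketch: the dictionary keys and the function name `predict_bill` are hypothetical, while the coefficients and the RMSE of 21.2 are taken from the regression output.

```python
# Coefficients from the Excel output; Size is in thousands of square feet.
coefficients = {
    "Intercept": 59.64,
    "Size": 20.12,
    "Temp": 1.62,
    "SizeXTemp": 0.15,
    "Pool": 74.93,
}
RMSE = 21.2


def predict_bill(size, temp, pool):
    """Predicted bill from the model with Size, Temp, SizeXTemp, and Pool."""
    return (coefficients["Intercept"]
            + coefficients["Size"] * size
            + coefficients["Temp"] * temp
            + coefficients["SizeXTemp"] * size * temp
            + coefficients["Pool"] * pool)


bill = predict_bill(size=2.5, temp=85, pool=1)
rse = RMSE / bill
low, high = bill - 2 * RMSE, bill + 2 * RMSE
print(f"predicted bill {bill:.2f}, RSE {rse:.2f}, interval [{low:.1f}, {high:.1f}]")
```

The unrounded prediction is a little above 354, so the upper bound computed here is slightly higher than the 396 obtained in the text by rounding the prediction first.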
From the row of the ANOVA table labeled regression, the MSM is 31010. Dividing
by the MSE gives the F‐statistic:
\[ F = \frac{MSM}{MSE} = \frac{31010}{449} = 69.1. \tag{5.24} \]
The P‐value for that F‐statistic is given as 0.000000002. Thus, the null hypothesis
that the true coefficients are zero for all independent variables is almost certainly
not true. Put differently, the chances of getting this high of a completely spurious
correlation are negligible. That is a good indicator about the quality of the model.
Regression Statistics
Multiple R           0.97
R Square             0.95
Adjusted R Square    0.93
Standard Error       21.2
Observations         20

ANOVA
              df    SS        MS       F       P-value
Regression     4    124040    31010    69.1    2E-09
Residual      15      6730      449
Total         19    130770

              Coef.    Std Err   t Stat   P-value   Lower 95%   Upper 95%
Intercept     59.64    74.61     0.80     0.44      -99.39      218.66
Size          20.12    29.28     0.69     0.50      -42.29       82.54
Temp           1.62     0.96     1.69     0.11       -0.43        3.68
Size X Temp    0.15     0.35     0.41     0.68       -0.61        0.90
Pool          74.93    16.21     4.62     0.00       40.38      109.48

Looking at the individual coefficients, we can see that they all have the expected
signs. Further, a bit of calculation would reveal that the magnitudes of the
coefficients are not unreasonable. The standard error for the coefficient on Pool is
small relative to the coefficient, so the t‐statistic is large, the P‐value is small, and
zero does not lie within the bounds of the confidence interval provided. The
standard error of the coefficient on temperature is larger relative to the coefficient,
but the t‐stat is somewhat close to 2, the P‐value is relatively small, and the
confidence interval barely contains zero. So, statistically, it seems there is evidence
of a relationship between temperature and the electric bill, though the evidence is
perhaps somewhat weaker than we might have hoped.
The standard errors are large relative to the coefficients for both size and for the
interaction of size and temperature. That means the t‐statistics are small and the
corresponding P‐values are large. Looking at the upper and lower limits of a 95%
confidence interval provided in the output, we see that zero is well within the
confidence interval for these two variables. In other words, statistically, there is no
particularly strong evidence that either size or its interaction with
temperature matters individually. A more advanced test would be
to see if there is evidence that the two matter when taken together – but that is
beyond our scope. For now, we note that while we may have strong theoretical and
logical reasons to suspect that both those variables actually matter, those
parameters are not measured very precisely.
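Most of the summary numbers in the Excel output follow directly from the sums of squares, degrees of freedom, coefficients, and standard errors. The sketch below recomputes them, with hypothetical variable names. Because the printed coefficients and standard errors are rounded, the recomputed t-statistics can differ slightly from the printed ones.

```python
import math

# Sums of squares and degrees of freedom from the ANOVA table.
ssm, sse, sst = 124040.0, 6730.0, 130770.0
df_model, df_residual = 4, 15

r2 = ssm / sst                 # fraction of variation explained
msm = ssm / df_model           # mean square for the model
mse = sse / df_residual        # mean square error
rmse = math.sqrt(mse)          # standard error of the regression
f_stat = msm / mse             # F-statistic

# (coefficient, standard error) pairs as printed in the output.
estimates = {
    "Intercept": (59.64, 74.61),
    "Size": (20.12, 29.28),
    "Temp": (1.62, 0.96),
    "Size X Temp": (0.15, 0.35),
    "Pool": (74.93, 16.21),
}
t_stats = {name: coef / se for name, (coef, se) in estimates.items()}

print(f"R2 {r2:.3f}, MSE {mse:.1f}, RMSE {rmse:.2f}, F {f_stat:.2f}")
for name, t in t_stats.items():
    print(f"{name:12s} t = {t:.2f}")
```

Only Pool has a t-statistic well above 2 in absolute value, which matches the discussion of the individual coefficients.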
If we need the actual parameter estimates for Size or its interaction with Temp,
we probably need to try to collect more data. This would be especially true if we
have any guesses about variables that are missing from our dataset that might be
confounding our attempts to measure the effect of size precisely. It would also be
true if we could collect data in which Size and SizeXTemp exhibit greater variation,
since we can trace out the effects of independent variables more accurately the
more variation we observe in them in the sample data.
Let’s use the information above to compare and ultimately choose between
various models of electricity bills. The table below collects the coefficients, their
standard errors, the R2, the RMSE, and the F‐statistic and its P‐value for all six
regressions in one place for comparison. The first four are the primarily linear
models estimated earlier in the chapter. The fifth is an additional linear model that
includes the pool indicator variable but does not include the interaction term. The
sixth is the log‐linear model from earlier in the chapter. In the table, asterisks are
used to indicate the general range of the P‐value associated with a coefficient. One
asterisk means the P‐value was less than or equal to 0.1, two means less than or
equal to 0.05, and three means less than or equal to 0.01. So, they indicate
approximate levels of statistical significance.
All of our regressions are highly significant taken as a whole, as indicated by the
P‐values of the F‐statistics. In the linear regressions, the RMSE takes a large jump
down when temperature is added moving from model 1 to model 2. When the
interaction term between size and temperature is added in model 3, the RMSE
actually goes up and the significance of the regression as a whole goes down.
Looking at the coefficients, while both of those variables were very significant
statistically in model 2, with small standard errors, their standard errors go up a lot
when the interaction is included in model 3. Moreover, the sign of the interaction
term is contrary to our expectations.
It is hard to say whether model 2 or 3 is “better”. Within the sample, model 2
gives better predictions (slightly, as measured by RMSE). But, there is a strong
reason to suspect the interaction matters. In trying to sort out what went wrong in
model 3, there is a ready suspect ‐ we may simply not have enough independent
variation in size and temperature. In that case, the product of the two will be very
highly correlated with the individual variables, making it impossible to sort out
what part of the variation in electric bills is driven by size, temperature, or their
interaction, since they all tend to move together. This is true in spite of the fact that
we know from the high significance of the regression as a whole that the group of
variables as a whole is strongly associated with variation in the electric bill.
Summary of Regression Output for Electric Bill Example
Standard errors in parentheses.
* P-value ≤ 0.1. ** P-value ≤ 0.05. *** P-value ≤ 0.01.

Model #        (1)         (2)         (3)         (4)         (5)         (6)
Linear or      Linear      Linear      Linear      Linear      Linear      Log-Linear
Log-Linear
Size           75.11***    64.07***    79.57*      20.12       31.65***    0.28***
               (15.61)     (8.79)      (39.67)     (29.28)     (8.95)      (0.08)
Temp           --          2.48***     3.01**      1.62        2.01***     0.46***
               --          (0.38)      (1.38)      (0.96)      (0.27)      (0.08)
SizeXTemp      --          --          -0.21       0.15        --          --
               --          --          (0.52)      (0.35)      --          --
Pool           --          --          --          74.93***    73.47***    0.26***
               --          --          --          (16.21)     (15.41)     (0.06)
Intercept      111.27**    -43.66      -82.51      59.64       30.88       3.31***
               (40.56)     (32.77)     (102.49)    (74.61)     (26.75)     (0.33)
F-statistic    23.16       58.94       37.41       69.12       97.13       77.08
P-value        1.4E-04     2.3E-08     1.8E-07     1.8E-09     1.8E-10     1.0E-09
R Square       0.563       0.874       0.875       0.949       0.948       0.935
RMSE           56.37       31.14       31.93       21.18       20.63       0.08

Analysis of the raw data shows the linear correlation coefficient (which ranges
from ‐1 for a perfect negative relationship to +1 for a perfect positive relationship)
is 0.19 between size and temperature, 0.82 between size and the interaction of size
and temperature, and 0.70 between temperature and the interaction variable. It is
not surprising that we were unable to sort out the independent effects of the three
variables precisely. The solution is to collect more data. We need data with more
variation in size, more variation in temperature, and, in particular, we need more
variation in temperature for each given size and more variation in size for each
given temperature. If it is not possible to obtain data that exhibits such variation, it
will not be possible to identify the separate effect of the interaction term.
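To see why an interaction term tends to move together with its components, it can help to compute the correlations directly. The sketch below implements the linear correlation coefficient and applies it to five made-up houses. These sizes and temperatures are purely hypothetical and are not the data behind the regressions in this chapter.

```python
import math


def correlation(x, y):
    """Linear (Pearson) correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical sample: size in thousands of square feet, temperature in degrees.
size = [1.5, 2.0, 2.5, 3.0, 3.5]
temp = [80.0, 70.0, 90.0, 75.0, 85.0]
size_x_temp = [s * t for s, t in zip(size, temp)]

print("size vs temp:       ", round(correlation(size, temp), 2))
print("size vs interaction:", round(correlation(size, size_x_temp), 2))
print("temp vs interaction:", round(correlation(temp, size_x_temp), 2))
```

Even though size and temperature are only weakly correlated in this made-up sample, their product is strongly correlated with size, which is the same pattern described above for the actual data.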
While the negative sign on the interaction in model 3 may be due simply to the
inability to sort out the effects of the three independent variables from one another,
there is another possible culprit. As alluded to previously, it is possible that bigger
houses are newer and therefore better insulated. The electric bill will rise less with
temperature increases in a well insulated home than in a poorly insulated one. The
interaction of age, and therefore insulation, with temperature may be getting mixed
up with the interaction of size and temperature. To get any idea about that
possibility, we need to get data on age. Separate data on the degree of insulation
would be good to have, too.
When Pool is added in model 4, the RMSE takes another large step down, and
correspondingly, the R2 jumps up. As discussed earlier in the chapter, the sign on
Pool makes sense, even if the magnitude may seem a little large to be due to the pool
in and of itself, and the coefficient is statistically very significant. In model 4, the sign
of the interaction term becomes positive but it remains small, imprecisely
estimated, and statistically insignificant.
Because Pool clearly seems to work, but it is not clear if the interaction should be
included, Model 5 represents a regression with Size, Temp, and Pool, but without
the interaction term. The RMSE is again slightly lower without the interaction. With
Pool included, and, without the interaction, the coefficients on Size and Temp are
again highly significant, as in Model 2.
Of the first five models in the table, which is best? Since there are solid logical
and theoretical reasons to think Size, Temp, and Pool all matter, and since the
evidence is strongly consistent with that hypothesis, it is between model 4 and
model 5. If we were to adopt the criterion that a variable should only be added when
it reduces the prediction error, that is, the RMSE, we would go with model 5.
However, there is a good reason to think that the interaction does matter. The best
approach is to try to collect more and better data. If we have to use one of these
models, which is better is a toss up. Since we have only 20 observations, a good
argument may be made for dropping the interaction and going with model 5 since
we just do not have enough data to do a good job of estimating 5 coefficients. If we
had 100 observations and similar results, that argument would not apply.
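The RMSE comparison among the linear models can be laid out explicitly. The sketch below uses the RMSE values from the summary table and checks what happens to the RMSE each time the interaction term is added to an otherwise identical model. The variable names are hypothetical.

```python
# RMSE of the five linear models, from the summary table.
rmse_by_model = {1: 56.37, 2: 31.14, 3: 31.93, 4: 21.18, 5: 20.63}

# Each pair is (model without SizeXTemp, same model with SizeXTemp added).
interaction_pairs = [(2, 3), (5, 4)]

for without, with_interaction in interaction_pairs:
    change = rmse_by_model[with_interaction] - rmse_by_model[without]
    print(f"model {without} -> model {with_interaction}: RMSE changes by {change:+.2f}")

lowest = min(rmse_by_model, key=rmse_by_model.get)
print("lowest RMSE among the linear models: model", lowest)
```

In both comparisons the RMSE rises when the interaction is added, so a rule based purely on the RMSE favors the model without it.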
The last column of the table contains the results of the log‐linear regression. The
coefficients all have the right signs and are very statistically significant. Remember,
the coefficients on Size and Temp represent elasticities. Judging from the P‐value of
the F‐statistic, the model as a whole is very statistically significant. While the model
explains 93.5 percent of the variation in the log of the electric bill with an RMSE of
only 0.08, those numbers cannot be compared with the results from the other
regressions, since the dependent variable has been transformed. Instead, we have to
use equation (5.18) to convert the predicted log of the bill to a predicted value for
the mean realization of the bill for each observation. We then calculate the squared
residual for each observation, add them up, and divide by n‐K‐1 to get the mean
square error. Taking the square root gives the RMSE as 21.87. This calculation is
performed in the table below. Thus, the log‐linear model produces a slightly larger
prediction error than do models 4 and 5, but only slightly so. In addition, model 6
does build in some interaction between size and temperature, as shown in equation
(4.26).
Predicted Bill (Mean) is computed as $e^{\widehat{\ln Y} + \hat{\sigma}^2/2}$.

Obs   Bill      Log Bill   Predicted   Predicted     Residual   Squared
                           Log Bill    Bill (Mean)              Residual
1     343.32    5.84       5.8051      333.11         11.31      127.95
2     299.21    5.70       5.7251      307.49         -7.26       52.67
3     302.02    5.71       5.5578      260.11         42.78     1830.11
4     167.94    5.12       5.2458      190.40        -21.83      476.69
5     209.55    5.34       5.3433      209.90          0.35        0.12
6     367.80    5.91       5.9851      398.80        -29.67      880.35
7     390.06    5.97       5.9193      373.39         17.91      320.80
8     398.46    5.99       6.0037      406.28         -6.47       41.86
9     224.87    5.42       5.3570      212.81         12.77      163.04
10    313.27    5.75       5.6896      296.77         17.49      305.74
11    209.36    5.34       5.2789      196.80         13.21      174.63
12    355.15    5.87       5.9118      370.60        -14.22      202.11
13    344.13    5.84       5.7951      329.77         15.46      238.96
14    453.55    6.12       6.0543      427.35         27.63      763.24
15    184.73    5.22       5.3297      207.06        -21.64      468.44
16    372.66    5.92       5.9751      394.84        -20.86      435.23
17    264.10    5.58       5.5938      269.66         -4.66       21.74
18    325.37    5.78       5.8203      338.21        -11.72      137.26
19    204.54    5.32       5.2284      187.11         18.05      325.94
20    205.89    5.33       5.4473      232.91        -26.24      688.40

SSE     7655.27
MSE      478.45
RMSE      21.87

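The last three rows of the table above can be reproduced from the squared residual column. The sketch below sums the squared residuals, divides by n - K - 1 = 20 - 3 - 1 = 16, and takes the square root. It also includes a hypothetical helper, `mean_prediction`, that applies the adjustment shown in the heading of the predicted bill column, under the assumption that the log errors are approximately normal.

```python
import math

# Squared residuals for the 20 observations, copied from the table.
squared_residuals = [
    127.95, 52.67, 1830.11, 476.69, 0.12, 880.35, 320.80, 41.86, 163.04, 305.74,
    174.63, 202.11, 238.96, 763.24, 468.44, 435.23, 21.74, 137.26, 325.94, 688.40,
]

n = len(squared_residuals)   # 20 observations
k = 3                        # Size, Temp, and Pool in the log-linear model

sse = sum(squared_residuals)
mse = sse / (n - k - 1)
rmse = math.sqrt(mse)
print(f"SSE {sse:.2f}, MSE {mse:.2f}, RMSE {rmse:.2f}")


def mean_prediction(predicted_log, sigma_hat):
    """Predicted mean of Y from a predicted log: exp(lnY_hat + sigma_hat^2 / 2)."""
    return math.exp(predicted_log + sigma_hat ** 2 / 2)


# With an RMSE of about 0.08 in logs the adjustment is small.
print(mean_prediction(5.3433, 0.08), math.exp(5.3433))
```

The SSE computed from the rounded column differs from the printed value only in the second decimal place.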
Models 4, 5, and 6 all have similar explanatory power, within the sample, with a
very slight edge to model 5. But, models 4 and 6 allow for interaction between size
and temperature. In short, each of these models seems about as good as the others,
based on the available data. Additional data that showed more variability in the
independent variables and included other relevant information such as the age and
insulation quality of the homes and family size and income might produce a better
model and a clearer choice of which model is best. If this is all the data that it is
cost effective to gather and we are going to use the model to make predictions of
electric bills within the range of the data, which we choose will not matter all that
much because the RMSE is similar for all three models. Since the RMSE is relatively
small and the R2 correspondingly high, any of these models may be accurate enough
for our purely predictive purposes. Depending on how much accuracy is needed for
the particular application, it may not be worth the cost of collecting more data.
Suppose, however, a property management company wants a good estimate of
the effect of temperature on the electric bills of different size residences in order to
help them decide on a plan for reducing energy expenses for the properties they
manage. Then we have a problem. Model 5 does not allow for interaction, but we
don’t get precise estimates from model 4. Perhaps it would make sense to use model
6. But, the fact that the effects in model 4 can’t be precisely identified should cause
some hesitation about relying on the results of model 6 for this purpose, too. If we
need precise estimates of the effects of size or temperature on the bill, it is
absolutely necessary to collect more data, with more variables and more variability
in the independent variables. That question just is not adequately answered yet.
This hypothetical example was generated from a known “true” model. So, unlike
in actual applications, we can compare each of the models above to the “truth”. The
underlying model is:
\[ Bill_i = 100 + 6\,Size_i + 0.9\,Temp_i + 0.4\,Size_i \times Temp_i + 80\,Pool_i + \varepsilon_i \tag{5.25} \]
The independent variables for the example, including the random disturbance, were
generated randomly. The random disturbance was generated to have a correlation
of −0.28 with size. This is to represent an inverse relationship between age and size,
where age affects the bill but has been omitted from the model and left as part of the
error component.
The estimated effect of pool is close to correct in models 4 and 5. However, the
other coefficient estimates are far off in model 4, even though it is the closest. The
true marginal effect of temperature on the electric bill is:
\[ \frac{\partial Bill}{\partial Temp} = 0.9 + 0.4\,Size. \tag{5.26} \]
However, based on model 4, the estimate would be:
\[ \frac{\partial Bill}{\partial Temp} = 1.62 + 0.15\,Size. \tag{5.27} \]
The effect of temperature is much more dependent on the size of the residence than
our empirical models found. The “true” and “estimated” effect of temperature on
houses of different sizes is shown in the figure below. Below sizes of 2880 square
feet, model 4 overestimates the effect of temperature on the electric bill. Above
2880, model 4 underestimates the effect. The figure makes it clear that residence
size actually has a much larger impact on the effect of temperature on electric bills
than model 4 would indicate.
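The crossover point between (5.26) and (5.27) comes from setting the two expressions equal and solving for Size. The sketch below does that and evaluates both lines on either side of the crossover. The function names are hypothetical, and Size is measured in thousands of square feet.

```python
def true_effect(size):
    """Marginal effect of temperature in the true model, equation (5.26)."""
    return 0.9 + 0.4 * size


def estimated_effect(size):
    """Marginal effect of temperature implied by model 4, equation (5.27)."""
    return 1.62 + 0.15 * size


# Solve 0.9 + 0.4 * size = 1.62 + 0.15 * size for size.
crossover = (1.62 - 0.9) / (0.4 - 0.15)
print(f"crossover at {crossover:.2f} thousand square feet")

for size in (1.0, 2.0, 3.5, 4.0):
    gap = estimated_effect(size) - true_effect(size)
    print(f"size {size:.1f}: true {true_effect(size):.2f}, "
          f"estimated {estimated_effect(size):.2f}, estimated minus true {gap:+.2f}")
```

Below the crossover the estimated effect lies above the true effect, and above the crossover it lies below, because the estimated line is flatter than the true one.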
[Figure: True and Estimated effect of temperature on the electric bill, plotted against Size (Thousands of Square Feet)]
What should we take away from this example? First, it is critical to have lots of
independent variation in the independent variables or it may be impossible to
identify the effects of the independent variables accurately. Second, important
factors, such as the age of the building, should not be left out of the model. Doing so
may make it impossible to identify the effects of the individual variables. Third, you
are never sure you have not left something important out. Thus, it is likely that any
empirical model suffers from omitted variables bias to one degree or another.
Fourth, it is harder to be sure your coefficient estimates are right than to get good
predictions of the dependent variable, largely due to the first three things in this list.
Fifth, even when your specification is “right”, you may not be able to tell that it is
any better than other similar models. Remember, the RMSE was lowest for model 5,
not model 4, and most of the coefficients in model 4 were not statistically significant. Even
though model 4 was “right”, there was no way to be sure of that from the output.
Finally, even though the statistical procedures associated with estimating models
can look very sophisticated and technical, a great deal of judgment, art, and humility
should go into formulating, interpreting, and using any empirical approximation.
Limits of Approximations
While regression analysis allows us to estimate approximations for both
predicting dependent variables and estimating coefficients, and to test hypotheses,
it has limitations even when the models are well estimated, precisely and carefully
evaluated and interpreted, and cautiously applied. In closing, we review the ones we
have already mentioned here, and discuss some additional ones as well.
First, remember not to extrapolate beyond the range of the data. Determining
the range of data can be more difficult than simply checking whether all the
independent variables for the case we are predicting fall within the ranges of the
corresponding independent variables in the sample. For example, suppose we have
data containing observations where income is $30,000 to $40,000 and price is $2 to
$4, and other observations where income is $60,000 to $70,000 and price is $4 to
$6. We use this to estimate a demand curve. Suppose we want to use these results to
predict demand when income is $35,000 and price is $5.50. Since the range of
income is $30,000 to $70,000 and price ranges from $2 to $6, it may seem as if the
point we wish to predict is well within the observed range. However, we never
observe prices over $4 in places with incomes below $60,000. So, there is no
support in the data for a prediction when price is $5.50 and income is $35,000.
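The difference between checking each variable's range separately and checking the joint range can be made concrete. The sketch below encodes the two groups of observations from the example as rectangles in income and price, then tests the point with income of $35,000 and price of $5.50 both ways. Treating each group as a rectangle is a simplifying assumption, and the function names are hypothetical.

```python
# Each group of observations: (income_low, income_high, price_low, price_high).
observed_groups = [
    (30_000, 40_000, 2.0, 4.0),
    (60_000, 70_000, 4.0, 6.0),
]


def within_marginal_ranges(income, price):
    """Check each variable against its own overall range, ignoring the other."""
    income_low = min(g[0] for g in observed_groups)
    income_high = max(g[1] for g in observed_groups)
    price_low = min(g[2] for g in observed_groups)
    price_high = max(g[3] for g in observed_groups)
    return income_low <= income <= income_high and price_low <= price <= price_high


def within_joint_range(income, price):
    """Check whether some observed group covers this income and price together."""
    return any(lo_i <= income <= hi_i and lo_p <= price <= hi_p
               for lo_i, hi_i, lo_p, hi_p in observed_groups)


print(within_marginal_ranges(35_000, 5.50))  # True
print(within_joint_range(35_000, 5.50))      # False
```

The point passes the variable-by-variable check but fails the joint check, which is why a prediction there amounts to extrapolation.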
Second, even a model that accounts very well for variation within the sample
may not hold up to application to other data, even when the range of data in the
sample covers the range for which we are predicting. It is best to verify the model on
data that was not included in the sample on which the model was estimated, if
enough data is available for that. That will help make certain that the model you
ended up choosing did not fit your sample best due simply to spurious correlations
in the particular subset of data you used to estimate the model. However, that may
not be enough to guarantee the model is applicable to other data points. Past
performance is no guarantee of future success!
Third, the last point can be pushed further. A regression is estimated given the
underlying structure of the market and institutions governing the processes
generating the data at a given interval of time. However, the nature of markets and
institutions changes over time. Such structural breaks can lead to large changes in
the relationships between dependent and independent variables. Don’t make the
mistake of thinking that the “right” model is necessarily “right” outside of the
specific institutional context which gave rise to the data from which it was
estimated.
Finally, the gold standard for estimating the effect of an independent variable on
an outcome variable is a double blind randomized trial. Making the trial double
blind and randomized ensures that the independent variable under study is not
systematically related to other factors that affect the outcome variable. When we
estimate a regression model, the most important assumption we make is that the
included independent variables are not confounded with other variables that affect
the outcome but are not included in the model. Without explicitly randomizing the
assignment of individuals to things like income levels and prices, there is just no
way to ensure this assumption is met. It is a huge assumption; violating it has major
implications for the validity and reliability of empirical analyses, and we know it is
never strictly true. That does not mean we should abandon empirical investigations
and applications of economic theory. Imperfect models and approximations are
more useful than no models and approximations, provided we are humble and
cautious in their application. It also means we should take whatever steps are
reasonable to mitigate omitted variables bias. Chapter 6 is devoted to this topic.
Chapter 5 Terminology
The following is a list of terms that you should know in order to discuss and
apply the material from this chapter.
Bias – Occurs when an estimate is inaccurate in a consistent, systematic way.
Imprecision – Occurs when there is nonsystematic random error in an estimate.
Data Mining – Also called over‐fitting, it occurs when variables that have nothing to
do with the relationship being modeled are added in order to find a correlation that
improves within‐sample explanatory power.
Standard Error – An estimate of the amount, on average, that a prediction or
coefficient estimate will be in error. The square root of the mean square error.
Degrees of Freedom – The number of values in the final calculation of a statistic that
are free to vary.
Root Mean Squared Error (RMSE) – The standard error of an unbiased estimator. It
is the square root of variance.
Coefficient of Determination (R2) – The fraction of the variation in the dependent
variable (y) that is explained by the independent variables (x variables). However,
because your variables can be highly correlated but not necessarily cause each
other, the RMSE is a much better indicator of the accuracy of the model.
F‐statistic – The ratio of the Mean Square Deviation from the Model to the Mean Square
Error.
t‐statistic – The coefficient estimate divided by its standard error.
P‐value – The probability of observing a statistic at least as large in magnitude as the
one estimated if the true value is 0. It shows how likely it is that pure chance would
account for as much variation of the dependent variable as the given independent
variable appears to explain. The lower the p‐value, the stronger the evidence that the
independent variable is truly related to the dependent variable.
Chapter 6
Omitted Variables Bias
When we first talked about estimating approximations and regression analysis,
we introduced the concepts of precision and accuracy. Imagine that three archers
are shooting at a target, and the results are as shown in the figure. Archer 1’s
shots are all over the place, and they aren’t near the center. Since the way in which
they are off target does not show any strong tendency, he is not biased. But, the
shots are very imprecise. This corresponds to a large standard error. Archer 2’s shot
placement does not vary a lot, but he is not accurate, since he has a strong tendency
to be high and right. This corresponds to bias. Archer 3’s shot placement shows as
little variance as that of Archer 2, but he is much less biased.
[Figure: target showing the shot placements of Archers 1, 2, and 3]
The ideal empirical method is the randomized double blind trial. The
randomized part comes from selecting a sample, and randomly dividing them into
two groups. You administer the treatment to one group, the treatment group, and
administer a placebo to the other one, the control group. The double blind part
means both those that administer the treatment and those that receive it don’t know
which is the control group and which is the treatment group until the end. With
group assignment randomized and completely unknown to all participants, there is
no way for the outcome to be biased.
An economic experiment can in theory be conducted using these same
principles. Suppose we are testing the effect of a marketing campaign on a certain
product, and we use two randomly assigned groups of customers, one being the
control. Only the treatment group is exposed to the campaign. The results of the
study are shown in the table.

                 Avg Quantity
Group          Before    After
Treatment        26        32
Control          25        21

The change in quantity for the treatment group was +6 and the change for the
control group was ‐4; but this is not what we’re concerned with. What we’re
concerned with is the difference between these changes. In our example, the
difference in difference (D.I.D.) is (6) − (−4) = 10. You could calculate some measure of
accuracy using ANOVA tables (or any other statistical tools) and if you conducted
multiple experiments that returned the same estimate, you could start drawing
some conclusions about the effect of the marketing campaign.
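The difference in difference calculation can be sketched in a few lines of Python. The numbers are those from the marketing campaign table; the function name diff_in_diff and the dictionary layout are illustrative choices, not part of the text.

```python
# Average quantities from the marketing campaign table in the text.
avg_quantity = {
    "treatment": {"before": 26, "after": 32},
    "control": {"before": 25, "after": 21},
}

def diff_in_diff(data):
    """Change in the treatment group minus change in the control group."""
    treatment_change = data["treatment"]["after"] - data["treatment"]["before"]
    control_change = data["control"]["after"] - data["control"]["before"]
    return treatment_change - control_change

did = diff_in_diff(avg_quantity)
print(did)  # 10
```

Subtracting the control group's change removes whatever would have happened to sales without the campaign, which is why the estimate (10) is larger than the raw change in the treatment group (6).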
The problem with empirical research in economics is that it is nearly impossible
to conduct randomized double blind experiments on a large scale with stakes large
enough to be meaningful. How could we randomly assign customers to markets with
different prices, for example? Even if we could, how could we keep both the subjects
and the data collectors ignorant of prices in other cities?
An alternative to a controlled experiment is to use observational or
non‐experimental data and use regression methods (such as ordinary least squares, or
OLS) to fit a model, and draw conclusions based on the results. Recall the general
form for a linear regression model is
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + \varepsilon = \sum_{k=0}^{K} \beta_k x_k + \varepsilon, \qquad x_0 = 1,$$
where ε is the random error, the x’s are the independent variables, and y is the
dependent variable. The fundamental assumption when using OLS regressions is
that the x’s are uncorrelated with the error term. If this is true, then the regression
is not biased.
In economics, however, many variables you might want to use as independent
variables to explain a dependent variable depend in turn on other variables within
the system; that is, most variables are endogenous by nature. Income and price are
important demand determinants. But, price depends on the interaction of supply
and demand, so, price depends on income and on everything else that determines
demand. Similarly, income is determined by the supply and demand for labor of
various types, which depends on many of the same things that affect demand. Thus,
it is very likely that the different independent variables in a regression will be
correlated with omitted variables present in the error term; but since the error
term is, by definition, not measured (otherwise its contents would not be in the error
term), there’s no way to know. This makes the identification of the effects of the individual
variables in your regression very difficult, and since you can’t identify which
variables have been omitted, it is difficult to know how biased your regression
results are.
This can be illustrated using supply and demand. Suppose at a particular date
and time you observe a price of p1 and a quantity of Q1, at a different date and time
you observe a price of p2 and a quantity of Q2, and on yet another day you observe a
price of p3 and a quantity of Q3. You then use these points to estimate a demand
curve. Each observation of equilibrium price and quantity represents the
interaction of a potentially different supply and demand curve. This is shown in the
figure. If you find the line that best fits these three points, it will look something
like the dashed line labeled D̂.
[Figure: supply curves S1, S2, S3 and demand curves D1, D2, D3, with price p on the
vertical axis and quantity Q on the horizontal axis; the dashed line D̂ is fitted
through the three observed equilibrium points]
The line that best fits the data is hardly the demand curve. It is just a line that
comes closest to fitting points that depend on supply as much as demand.
Algebraically, let’s assume QD = a – bp + εD and QS = c + dp + εS. We know in
equilibrium quantity demanded has to equal quantity supplied. We can thus solve
for the equilibrium price as follows.
$$Q_D = a - bp + \varepsilon_D = c + dp + \varepsilon_S = Q_S$$
$$(d + b)\,p = a - c + \varepsilon_D - \varepsilon_S$$
$$p_e = \frac{a - c + \varepsilon_D - \varepsilon_S}{d + b}$$
Equilibrium price depends on everything captured by a and c (income, costs, etc…)
AND the error in the demand equation. Thus the actual equilibrium price is directly
correlated with the omitted variables that affect quantity demanded.
If we try to use standard regression analysis to estimate the demand curve, we
would put quantity on the left and price and other demand determinants on the
right. But, the higher the demand error, the higher price will be. Since the right hand
side variable, price, is directly related to the error term, when there is a positive
demand shock, the direct effect will be to boost demand, but, the increase in price
that comes with the demand shock will cut quantity demanded. It is therefore not
obvious how price and quantity will be related in the data, and an OLS regression
certainly can’t sort it out. It could
easily look like increases in price are associated with increases in quantity
demanded when all that is happening is that positive demand shocks are pulling up
demand, thereby increasing both quantity and price.
Thus, the higher the error in the demand equation, the higher price
will be, and since we can’t measure the demand error, we’re stuck with this
endogeneity problem. Basically, both price and quantity depend on each other,
as well as on everything else inside the system (income, wages, etc.), some of which
is omitted from the model. Therefore, the fact that many things are endogenous, or
determined simultaneously, means many variables on the right side of an equation
will be correlated with the error term. This form of omitted variables bias is called
endogeneity bias or simultaneous equations bias. The fact that so many things in
economics are potentially endogenous means that OVB is systematic and
widespread, not simply confined to circumstances where explanatory variables just
happen by chance to be correlated with omitted variables – there are systemic
reasons to expect such correlations.
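The simultaneity problem can be illustrated with a small simulation. The sketch below draws equilibrium prices and quantities from the two-equation model QD = a – bp + εD and QS = c + dp + εS and then fits a line to them. All parameter values, the seed, and the function names are made up for illustration. Demand shocks are assumed to be larger than supply shocks, which is the case where the fitted line looks most like a supply curve.

```python
import random

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

def simulate_market(n, a=100.0, b=2.0, c=10.0, d=1.0,
                    sd_demand=5.0, sd_supply=1.0, seed=0):
    """Draw n equilibrium (price, quantity) pairs from the two-equation model."""
    rng = random.Random(seed)
    prices, quantities = [], []
    for _ in range(n):
        eps_d = rng.gauss(0.0, sd_demand)      # demand shock
        eps_s = rng.gauss(0.0, sd_supply)      # supply shock
        p = (a - c + eps_d - eps_s) / (d + b)  # equilibrium price
        q = a - b * p + eps_d                  # quantity on the demand curve
        prices.append(p)
        quantities.append(q)
    return prices, quantities

TRUE_DEMAND_SLOPE = -2.0  # equals -b in simulate_market
prices, quantities = simulate_market(5000)
estimated_slope = ols_slope(prices, quantities)
print("true demand slope:", TRUE_DEMAND_SLOPE)
print("OLS slope of quantity on price:", round(estimated_slope, 2))
```

With these assumed values the fitted slope comes out positive even though the true demand slope is −2, because most of the variation in price is driven by demand shocks that also raise quantity.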
Another systematic source of omitted variables bias is measurement error. We
know the dependent variable of a regression has an error term associated with it
due to the influence of variables we have not included in the empirical model. But,
variables, both dependent and independent, are also observed and measured with
error. Think of this type of error as ordinary data entry error, or error that results from
measurement to a limited number of significant digits, etc. Suppose we have the
following “true” model for y:
$$y = b_0 + b_1 x + \varepsilon_y.$$
Suppose that rather than observing the true value of x, we observe or measure a
value of xm, which is the true value plus our measurement error:
$$x_m = x + \varepsilon_m.$$
Solving this for x, we have:
$$x = x_m - \varepsilon_m.$$
Substituting this into the model above we get
$$y = b_0 + b_1 (x_m - \varepsilon_m) + \varepsilon_y$$
$$y = b_0 + b_1 x_m + (\varepsilon_y - b_1 \varepsilon_m).$$
This is the model that is feasible to estimate, given our observations. The new
error term is (εy–b1εm). The measured value of x is correlated with εm, so it is
correlated with the error term. Thus, measurement error introduces OVB. This is
not the same indirect bias that we were concerned with when talking about the
supply and demand example – this is simply because there was noise
(imperfections) when we measured x.
Intuitively, when x changes, there is no way to know if it is changing in truth, or,
only as the result of a measurement error. So, if no change in y is observed, there is
no way to know if that is because y is not related to x (b1=0) or because the
observed change in x was only measurement error. Thus, OLS cannot identify the
true underlying relationship and separate it from the noise.
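A short simulation can make the effect of measurement error concrete. The sketch below generates data from the “true” model, then regresses y first on the true x and then on a noisy measurement of it. The parameter values, sample size, and seed are assumptions chosen for illustration. The measurement error is assumed to be unrelated to both x and εy, the classical case, in which the estimated slope is pulled toward zero.

```python
import random

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(1)
B0, B1 = 1.0, 2.0   # assumed "true" coefficients
n = 5000

x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [B0 + B1 * x + rng.gauss(0.0, 1.0) for x in x_true]
# Measured x is the true value plus independent measurement error.
x_measured = [x + rng.gauss(0.0, 1.0) for x in x_true]

slope_true_x = ols_slope(x_true, y)
slope_measured_x = ols_slope(x_measured, y)
print("slope using true x:", round(slope_true_x, 2))
print("slope using measured x:", round(slope_measured_x, 2))
```

Because the measurement error here has the same variance as x itself, about half of the observed variation in the measured variable is noise, and the estimated slope is roughly half of b1.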
Reducing Bias
It is possible to take steps to lessen the degree and impact of omitted variables
bias. We will briefly discuss five: market trials, natural experiments, instrumental
variables, panel data techniques, and regression discontinuity designs. The
treatment of each of these will be very brief and incomplete. It is intended to give you
an intuitive idea of the types of things that are done to reduce OVB so that you will
be in a better position to evaluate published studies and consultant reports, not to
give you the technical expertise to follow every detail of such studies or to conduct
them yourselves. For those interested, a quick search of any of these topics online
will turn up a large number of examples and references.
Market Trials
First, a firm could run a market trial; that is, it could release a product in “test
markets” that are essentially a treatment group, while holding other markets
constant. One problem with this method is that you don’t know how randomized
your treatment group is. Customers in the trial markets are likely to exhibit
systematic differences from those in other markets (the control group) even if the
trial markets are randomly selected – unless the number of trial markets is very
large. Second, it is unlikely customers will be ignorant of the experiment for long. If
they figure out what’s going on, the experiment is no longer double blind, and if
certain customers are getting a lower price and realize it, they may buy more than
they otherwise would and invalidate the experiment. Another problem is that
charging the treatment group a price different from the expected profit‐maximizing
price means a firm is potentially forgoing several months of profit.
Natural Experiments
The second technique is to find a “natural experiment” with which to identify the
effect of the variables of interest. These are events which are clearly outside of the
influence of markets or their participants which introduce exogenous (outside the
model) changes in explanatory variables that are otherwise determined in the
model. A potential example of a natural experiment is a government policy that has
large impacts on some locations and no impact on other locations. The key to
identifying a natural experiment is finding a place where an outside influence
caused an exogenous change in explanatory variables BUT where that outside
influence should have no direct effect on the dependent variable itself.
Take the following as an example. Say you want to know what the effects of
minimum wage are on employment. You know demand curves slope down, and you
think raising the wage will decrease the quantity of unskilled labor demanded. So,
you hypothesize that minimum wages (just like other price supports) cause
surpluses of low skill labor, and you test that. Minimum wage varies by state, so you
gather data for the whole country and regress unemployment rates against the
minimum wage data. Suppose your results tell you that states with higher minimum
wages have lower unemployment. This is the opposite of your original hypothesis –
but the problem is, this research design likely suffers from omitted variables bias. It
is entirely possible that states with higher unemployment have lower minimum
wages for reasons totally unrelated to the direct effect of minimum wages on
employment levels. Perhaps locations with a high demand for labor and low
unemployment set a wage which is nominally high, but, does not matter much
because few people would work at that wage in equilibrium anyway. So, it’s possible
that states with high minimum wages have other characteristics that would make
unemployment lower that have nothing to do with minimum wage. This is the
problem of causal identification. How can natural experiments help in this example?
Suppose the federal government institutes an increase in the federal minimum
wage, and some states are affected by it and others are not. This is an example of a
natural experiment, since the individual states were not in control of the change.
Now, you can compare the change in unemployment between the states that were
affected by the increase, and those that weren’t, using D.I.D. measures. You
didn’t have to design or pay for this experiment; it arose as a natural
byproduct of an exogenous change. The fact
that the change comes from the federal government arguably makes assignment of
individual states to the “treatment” group(s) random enough for the statistical
analysis to be meaningful.
Or, maybe, not so much. In our example, it’s possible the states where minimum
wages were already high had systematic differences in their related policies (such as
welfare, subsidized housing, etc.) and market characteristics (industries, education,
demographics) that mitigate the effects of changes in the minimum wage (that is,
interact with the change in minimum wage in determining labor supply and
demand). Or, perhaps changes in the expected impact of a high minimum wage in
the states where the federal change led to real changes in the minimum wage are
what led to the federal policy change in the first place. If changes in underlying
conditions in the states affected by the policy are what caused the policy to be
enacted at any given time, the change was not exogenous, and the experiment would
no longer avoid the endogeneity problem.
Instrumental Variables
A third way to reduce OVB is to use instrumental variables. Let’s start with an
example of an instrumental variable, and then proceed with a more general
definition. Suppose quantity demanded is
$$Q_D = a - b_p p + b_m m + \varepsilon_D$$
where p is price and m is income; quantity supplied is
$$Q_S = c + d_p p - d_w w + \varepsilon_S$$
where p is price and w is the wage rate. Now, looking at the equation for demand,
we know price is correlated with the demand error term since price is endogenously
determined. The idea with instrumental variables is to find a variable that is
correlated with price, but not with the demand error. Then, we can use that variable
to identify changes in price that have nothing to do with changes in the demand
error, and use those changes to estimate the demand curve. A variable of this type is
called an instrument because you use it as an instrument to identify exogenous
variation in endogenous variables.
Looking at our equations for supply and demand, if we were to solve for the
equilibrium, we would get both quantity and price as functions of income,
wages, and the two error terms, or
$$Q_E = f_Q(m, w, \varepsilon_D, \varepsilon_S)$$
$$p_E = f_p(m, w, \varepsilon_D, \varepsilon_S)$$
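For the linear demand and supply equations given above, these functions can be written out explicitly by setting quantity demanded equal to quantity supplied. The following is a worked sketch, assuming b_p + d_p is not zero:

```latex
\begin{aligned}
a - b_p p + b_m m + \varepsilon_D &= c + d_p p - d_w w + \varepsilon_S \\
p_E &= \frac{a - c + b_m m + d_w w + \varepsilon_D - \varepsilon_S}{b_p + d_p} \\
Q_E &= \frac{a d_p + c b_p + d_p b_m m - b_p d_w w + d_p \varepsilon_D + b_p \varepsilon_S}{b_p + d_p}
\end{aligned}
```

Both expressions contain the demand error, which is why price is correlated with it. The wage w, by contrast, appears in the demand equation only through its effect on price.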
If we were now to regress equilibrium price on income and wages, we
would have a predicted equilibrium price given by the following:
$$\hat{p}_E = \alpha_0 + \alpha_1 m + \alpha_2 w$$
This predicted price is determined by income and wages, and is not correlated with
the demand error so long as income and wages are exogenous to the model. Thus,
we can use it as an instrumental variable, and plugging it into our equation for
quantity demanded we can run a new regression:
$$Q_D = a - b_p \hat{p}_E + b_m m + e$$
where e is a different error term than the one in the original equation.
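The two-step procedure just described can be sketched as a simulation. The code below generates data from the supply and demand equations, assuming income and wages really are exogenous, and compares a naive regression of quantity on price and income with the two-stage version that uses predicted price. All parameter values, the seed, and the small ols helper are assumptions made for illustration; in practice a statistics package would be used and would also report correct standard errors.

```python
import random

def ols(columns, y):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))]
           for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(k):
        pivot = max(range(i, k), key=lambda row: abs(aug[row][i]))
        aug[i], aug[pivot] = aug[pivot], aug[i]
        for row in range(k):
            if row != i:
                factor = aug[row][i] / aug[i][i]
                aug[row] = [v - factor * u for v, u in zip(aug[row], aug[i])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

rng = random.Random(2)
A, BP, BM = 100.0, 2.0, 0.5   # demand: Q = A - BP*p + BM*m + eps_d
C, DP, DW = 10.0, 1.0, 1.5    # supply: Q = C + DP*p - DW*w + eps_s
n = 5000

income = [rng.gauss(50.0, 10.0) for _ in range(n)]  # exogenous by assumption
wage = [rng.gauss(20.0, 5.0) for _ in range(n)]     # exogenous by assumption
price, quantity = [], []
for m, w in zip(income, wage):
    eps_d = rng.gauss(0.0, 5.0)
    eps_s = rng.gauss(0.0, 2.0)
    p = (A - C + BM * m + DW * w + eps_d - eps_s) / (BP + DP)  # equilibrium price
    price.append(p)
    quantity.append(A - BP * p + BM * m + eps_d)

# Naive regression: quantity on actual price and income.
naive_price_coef = ols([price, income], quantity)[1]

# First stage: predict price from income and wages.
a0, a1, a2 = ols([income, wage], price)
price_hat = [a0 + a1 * m + a2 * w for m, w in zip(income, wage)]

# Second stage: quantity on predicted price and income.
iv_price_coef = ols([price_hat, income], quantity)[1]

print("true price coefficient:", -BP)
print("naive OLS estimate:", round(naive_price_coef, 2))
print("two-stage estimate:", round(iv_price_coef, 2))
```

The naive estimate is pulled toward zero by the positive correlation between price and the demand error, while the two-stage estimate lands close to the true value. This only works because the simulation builds in exogenous income and wages, which is exactly the assumption that is hard to defend with real data.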
The major problem with using instrumental variables is that if you look at a
broad enough picture of the economy, most every variable is dependent on most
every other variable. Since we’ve developed a function to estimate price based on
income and wages, we’re assuming that income and wages are exogenous – but that
is not true. In fact, wages are a primary determinant of income, and, many factors
that affect income and wages are likely to also affect demand directly. So, for a
variable to be an instrument it has to be (a) not correlated with the error term, (b)
not directly in the equation, but (c) correlated with the endogenous variables that
are in the equation – and this can be quite difficult to find.
Panel Data Techniques
A fourth method for reducing OVB is to collect panel data. The idea is to follow a
group of people (or cities, etc.) over a long period of time. In a time series, data is
collected on one individual from time period t = 1, 2, …, T. Cross section data is
information on several individuals i = 1, 2, …, N in a single time period. Panel data
combines these ideas, and follows N individuals over T time periods.
Panel data can help by isolating idiosyncratic differences between individual
observations. That is, suppose customers in Atlanta like your product for some
unknowable reason, and customers in Gainesville don’t like your product for some
unknowable reason. As a result, the manager in Atlanta sets a higher price and sells
more units. You don’t want to conclude that just because you can charge a higher
price in Atlanta and sell more your demand curve slopes up.
We might try to correct for systematic but unobserved differences across cities
(or individuals) by including an indicator, or dummy, variable for each city. The
model would look something like
$$y_{it} = \beta_0 + \beta_1 x_{1it} + \beta_2 x_{2it} + \dots + \beta_K x_{Kit} + \alpha t + \delta_2 d_2 + \delta_3 d_3 + \dots + \delta_N d_N + \varepsilon_{it}.$$
In the model, yit is the dependent variable (price, etc.) in city i and time period t. It
depends on K independent variables x1it, x2it, and so on, which are also associated
with that city and time period. A possible time trend is captured by including t, and
α represents the general increase in the dependent variable per unit of time which
is constant for all years and all cities. The model also contains indicator variables
(the d’s) for each different city, in order to statistically pick up the idiosyncratic
differences between them. So, d2 could be for Gainesville, d3 for Ocala, d4 for
Orlando, etc. and would be 0 if you are not observing that city, and a 1 if you are. The
coefficients (the δs) measure the average difference between each category and the
“base” category, which is taken to be category 1 in the equation above.
Using summation notation, the model could be written more compactly as:
$$y_{it} = \alpha t + \sum_{k=0}^{K} \beta_k x_{kit} + \sum_{i=2}^{N} \delta_i d_i + \varepsilon_{it}.$$
In this notation, the first x variable is simply a 1 for each observation, as in the
earlier chapters.
You can then compare the data across the observed years. The change in y is
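$$\Delta y_{it} = y_{it} - y_{i(t-1)}$$
and, plugging our equations for yit and yi(t−1) into this equation we get
$$y_{it} - y_{i(t-1)} = \alpha (t - (t-1)) + (\beta_0 - \beta_0) + \sum_{k=1}^{K} \beta_k (x_{kit} - x_{ki(t-1)}) + \sum_{i=2}^{N} \delta_i (d_i - d_i) + \varepsilon_{it} - \varepsilon_{i(t-1)}$$
or
$$\Delta y_{it} = \alpha + \sum_{k=1}^{K} \beta_k \Delta x_{kit} + \Delta \varepsilon_{it}.$$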
In this equation, the intercept, α, reflects the average growth across all cities
over time. So, when looking at the change in y, the constant and the dummy
variables drop out. In essence, looking at the change in yit has wiped out all of the
fixed idiosyncratic differences that don’t change over time. It does not matter
whether we actually estimate this model in the form of differences or just include
dummy variables for each city; the effect is the same in practice.
The problem that is inherent in this method is that, while the model eliminates
fixed city‐specific differences, it has no way of controlling for variable city‐specific
differences. For example, if demand is higher for whatever reason in Atlanta than in
Gainesville in year 1, there is no reason to think that it will stay the same amount
higher in year 2, and thus endogeneity will be present in the regression. If, for
example, unmeasured income is trending up more in Atlanta than in Gainesville, a
good manager will likely be raising price every period MORE in Atlanta, and,
experiencing relative increases in quantity sold. Dummy variables, or differencing
the data, which is equivalent, will only take care of fixed idiosyncratic differences
between cities, and to this extent let you identify causal relationships. They do
nothing for time varying idiosyncrasies.
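The way differencing removes fixed city effects can be illustrated with a simulation in the spirit of the Atlanta and Gainesville example. In the sketch below each hypothetical city has a fixed, unobserved taste for the product that raises both the price its manager sets and the quantity sold. The number of cities, the parameter values, and the seed are all assumptions, and the time trend is left out to keep the example short.

```python
import random

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(3)
TRUE_SLOPE = -2.0            # assumed effect of price on quantity
n_cities, n_periods = 200, 5

pooled_x, pooled_y = [], []  # levels, all cities and periods together
diff_x, diff_y = [], []      # period-to-period changes within each city
for city in range(n_cities):
    taste = rng.gauss(0.0, 5.0)  # fixed, unobserved city effect
    previous = None
    for period in range(n_periods):
        price = 10.0 + 0.5 * taste + rng.gauss(0.0, 1.0)
        quantity = 100.0 + TRUE_SLOPE * price + 2.0 * taste + rng.gauss(0.0, 1.0)
        pooled_x.append(price)
        pooled_y.append(quantity)
        if previous is not None:
            diff_x.append(price - previous[0])
            diff_y.append(quantity - previous[1])
        previous = (price, quantity)

pooled_slope = ols_slope(pooled_x, pooled_y)
differenced_slope = ols_slope(diff_x, diff_y)
print("true slope:", TRUE_SLOPE)
print("pooled slope (ignores city effects):", round(pooled_slope, 2))
print("first-difference slope:", round(differenced_slope, 2))
```

The pooled regression makes demand appear to slope up, while the differenced regression recovers a slope near −2. If the taste variable were allowed to drift over time at different rates in different cities, the differenced estimate would be biased as well.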
Regression Discontinuity Designs
Let’s consider an example. Florida has what is generally considered to be a
stringent accountability system for its public K‐12 schools, at least compared to the
systems put in place in other states to comply with The No Child Left Behind Act of
2001. Suppose you would like to study the effect of school accountability measures.
Specifically, suppose you want to know whether the extra resources and pressure
that a school faces after receiving a “failing” grade result in
improvements in student performance, or, changes in something like class size or
hours of instruction. That is, consider a school that received a “D” in 2005 and an “F”
in 2006. If the additional pressure and resources lead to improvements in student
performance, then the grades of individual students in those schools in 2007 should
increase relative to their 2006 level, at least as compared to schools that did not
“fail” in either 2005 or 2006.
To test this, we might compare the change in student scores on standardized
tests in schools that went from a “D” to an “F” the previous year to changes in
standardized scores for schools that were a “D” in both years. We could just test for
differences in means. Alternatively, we could do it as a regression model. First,
define a dummy variable, Failit, which is 1 for student i if their school received a
failing grade in year t after receiving a D in year t−1 and 0 if it received a D both
years. (We are excluding students at schools that received other grades, to keep the
comparison as clean as possible.) If Testit is student i’s test score in year t, and again
∆ represents “change in,” the model is:
$$\Delta \mathit{Test}_{it} = \beta_0 + \beta_1 \mathit{Fail}_{i(t-1)} + \varepsilon_{it}.$$
The hypothesis implies β1 is positive. The hypothesis is not that students do
better in failing schools – by definition they do worse. The idea is that in the year
after a school fails, the students learn more than they did the year the school failed,
due to the pressure put on a failing school and the extra resources that flow to it.
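The link between the regression on a dummy variable and a simple difference in means can be checked directly. The sketch below uses a handful of made-up score changes (they are not real data) and shows that the estimated slope equals the difference in group means and the intercept equals the mean of the group with the dummy equal to 0.

```python
def mean(values):
    return sum(values) / len(values)

def ols_intercept_slope(x, y):
    """Intercept and slope from a simple least-squares regression of y on x."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data: 1 = school failed last year, 0 = school received a D both years.
fail = [1, 1, 1, 1, 0, 0, 0, 0, 0]
delta_test = [4.0, 6.0, 5.0, 7.0, 2.0, 3.0, 1.0, 2.0, 4.0]

beta0, beta1 = ols_intercept_slope(fail, delta_test)
mean_failed = mean([d for f, d in zip(fail, delta_test) if f == 1])
mean_not_failed = mean([d for f, d in zip(fail, delta_test) if f == 0])

print("beta1:", round(beta1, 3))
print("difference in means:", round(mean_failed - mean_not_failed, 3))
```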
The problem with running the model above, or testing for differences in mean
test score changes across the groups (the two are statistically equivalent) is that
school grades are likely to be correlated with a number of other characteristics of
the school and its students which also affect test scores. It will be possible to control
for some of these things, but, not all of them. And, you will never know how many
confounding omitted variables there are, or, how important they are. So, what to do?
A regression discontinuity approach potentially offers a way out. It works
basically as follows. Find the schools that just barely passed and the schools that just
barely failed, and, compare changes in student test scores in those schools. Except
for random differences, these schools should be similar. As long as the threshold for
passing or failing is hard, and as long as every school that “fails” gets the same
“treatment” and no school that does not fail gets that treatment, this is essentially
equivalent to a randomized trial.
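The comparison of schools just above and just below the threshold can be sketched with simulated data. In the code below each hypothetical school has a numeric score, schools below an assumed cutoff of 50 “fail” and receive the treatment, and score changes also drift with the underlying score, which stands in for the confounding school characteristics. The cutoff, bandwidth, effect size, and seed are all assumptions made for illustration.

```python
import random

def mean(values):
    return sum(values) / len(values)

rng = random.Random(4)
CUTOFF = 50.0        # assumed passing threshold
TRUE_EFFECT = 3.0    # assumed effect of failing on the next year's score change
BANDWIDTH = 2.0      # how close to the cutoff counts as "just barely"

schools = []
for _ in range(20000):
    score = rng.uniform(0.0, 100.0)   # variable that determines pass or fail
    failed = score < CUTOFF
    # The drift term makes failing and passing schools differ for other reasons.
    change = (0.1 * (score - CUTOFF)
              + (TRUE_EFFECT if failed else 0.0)
              + rng.gauss(0.0, 2.0))
    schools.append((score, failed, change))

def gap(rows):
    """Mean score change of failing schools minus that of passing schools."""
    failed_changes = [c for _, f, c in rows if f]
    passed_changes = [c for _, f, c in rows if not f]
    return mean(failed_changes) - mean(passed_changes)

naive_gap = gap(schools)
near_cutoff = [row for row in schools if abs(row[0] - CUTOFF) <= BANDWIDTH]
rd_gap = gap(near_cutoff)

print("true effect:", TRUE_EFFECT)
print("all failing vs all passing schools:", round(naive_gap, 2))
print("schools within the bandwidth only:", round(rd_gap, 2))
```

Comparing all failing schools with all passing schools gets even the sign of the effect wrong here, while the comparison near the cutoff comes close to the assumed effect. A narrower bandwidth reduces the remaining bias from the drift but leaves fewer schools to compare, so the estimate becomes noisier.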
The technique is limited in that it can only be applied to cases where the
“treatment” is only received when some variable crosses a well defined threshold
(here when a school’s test scores fall below an established level). Further, when the
threshold must be met to receive the treatment, but not everyone that meets the
threshold gets the treatment (crossing the threshold is necessary for “eligibility” but
does not guarantee all that are eligible will get the same treatment), the technique
becomes more complicated and more prone to bias.
In summary, all five of these techniques can be used to reduce OVB, but none of
them are inerrant. It is important to keep this in mind when looking at empirical
results of any kind. This is a good point to again emphasize that no result in
economics, or any other field of study that does not have ready recourse to carefully
controlled experimentation, should be taken seriously unless all three of the
following conditions are met.
First, the result is consistent with a believable, reasonable, understandable story.
You may have to think hard to grasp why the story is sensible, but, if you don’t
eventually understand it at an intuitive level, be suspicious. Second, the result is
consistent with a simple but formal mathematical model that captures the essence of
the argument, even if it is based on vast oversimplifications. The reason for this is
that thinking about the problem mathematically forces your logic to be
consistent. Without this, a persuasive speaker who knows how to appeal to human
fallibilities and tendencies to systematic biases in reasoning can make just about
anything sound reasonable. Third, the result is confirmed repeatedly, using multiple
identification strategies, by multiple independent researchers using multiple
independent sources of data. This is not enough on its own, because, absent the
ability to conduct repeated randomized double blind trials, a dishonest person can
use statistical tools to make just about anything seem to be supported by the data if
they try long enough, and even the work of the most careful and honest researchers
is subject to tremendous potential biases and difficulty definitively identifying
causal relationships.
If some finding or recommendation cannot be supported in all three of these
ways, it should be treated with a high degree of skepticism. Even if it meets all three
criteria, that does not prove the finding is correct in any fundamental way. It just
means we have a working theory that has not yet been disproved or supplanted by
some better theory, and which therefore might provide the most reasonable basis
currently available on which to proceed. On one hand, it is always best to retain
some degree of skepticism and proceed with caution. On the other hand, being
completely skeptical of everything is simply not feasible – it leads to indefinite
vacillation. The only way forward is to make use of the “best” findings in economics
and other policy related disciplines, but, to keep in mind that they are at best
incomplete approximations, more applicable to some situations than others, likely
biased, and, certainly not the final truth.
Chapter 6 Terminology
The following is a list of terms that you should know in order to discuss and
apply the material from this chapter.
Difference in Difference (DID) – An econometric tool used to measure the effect
of a treatment by comparing the change in the treatment group from before to after
treatment with the change in a control group over the same period.
Omitted Variable Bias (OVB) – A type of bias that occurs when a variable that
determines the dependent variable is omitted from the model and is correlated
with the included explanatory variables. It is a problem because it makes it
impossible to separate the effect of the explanatory variable and the omitted
variable.
Randomized Double‐Blind Trial (RDBT) – An experiment, designed to eliminate
bias, in which subjects are randomly assigned to groups and neither the
experimenter nor the subjects know which group is the control group and which is
the treatment group. This can become extremely costly and time‐consuming when
dealing with economics.
Market Trial – A type of experiment that uses a test market to test price changes.
The test market is essentially a treatment group while holding other markets
constant. Problems can arise, however, because market trials are not double‐blind,
customers are not randomly assigned across markets so it is impossible to know for
sure that the results are unbiased, and they can be very expensive because the firm
is not setting price at the profit‐maximizing level.
Endogeneity/Simultaneous Equations Bias – A type of bias that occurs when an
independent variable is correlated with the error term in a regression model
because it is determined within the same system as the dependent variable.
Natural Experiment – A type of experiment that finds exogenous changes that cause
big changes within the system of variables that are being measured. In the minimum
wage example, D.I.D. measures are used to compare states affected by a federal
change with states that are not. Problems can arise if the states change over time at
different rates or the timing of the natural experiment is endogenous.
Panel Data – Method of following a group of individuals over a long period of time.
In the example, customers are grouped by city, which isolates the idiosyncratic
differences between cities. Indicator variables are used to pick up all of these
idiosyncrasies. The problem with this method is that it only works for
individual‐specific effects fixed over time.
Instrumental Variable – A variable that is not directly included in the model, is
correlated with the endogenous right‐hand side variable, and is not correlated with
the error term. In the demand example, it is used as an instrument to identify which
changes in price have nothing to do with the demand error and then estimate the
slope of the demand curve using these price changes. However, these variables are
hard to find.
Part 3
A Closer Look at Some of the Tools
Chapter 7
Individual Choice
In chapters two and three we used demand curves to represent the consumer
side of the market, in particular the way price is related to the quantity demanded
and the way income levels and the prices of substitutes and complements affect
demand. Consumer theory is the study of how the underlying preferences of
individuals and the limits imposed by income and market prices shape individual
choices. This framework gives rise to a more detailed theory of demand. It also has
applications to interesting economic questions in its own right.
Underlying Assumptions about Preferences
Consumers choose from among many possible combinations of any number of
goods to consume. Each particular combination of the goods is called a bundle of
those goods. We model consumers as if they choose their most preferred bundle
from among the feasible options. Utility is the name of the metric economists use to
measure preference. It is what is known as an ordinal measure, not a cardinal one.
That is, it assigns a higher number to more preferred bundles than to less preferred
alternative bundles. The units of utility are arbitrary; it is only the ranking that
matters. If one bundle is preferred to a second bundle, it does not matter if the
utility of the first is 10,000 and the utility of the second is 0.0001, or if the utility of
the first is 4 and the utility of the second is 3.9999. All that matters is that the number
associated with the first is higher than the number associated with the second.
Statements like “bundle A is twice as good as bundle B” are not meaningful with the
utility metric, as “twice as good as” has absolutely no meaning with an ordinal
measure.
In order to develop the theory of individual choice, we first list and describe our
basic assumptions about individual preferences. The first assumption is
completeness. This means that when faced with a choice between any two bundles,
call them bundle A and bundle B, the consumer either prefers A to B (denoted
A ≻ B), prefers B to A (denoted A ≺ B), or is indifferent between the two (denoted
A ~ B). These are the only three logical possibilities, and preferences are complete
in the sense that the consumer can make one and only one of these three statements
about any two possible bundles of goods.
The second assumption is that more is better. This means consumers are better
off if they have more of one good and no less of all the others, all else equal. So, if B
has more pizza than A and as much of everything else, B ≻ A. Keep in mind here we
are discussing preferences only, not the cost of attaining the bundles or what the
consumer would ultimately choose given the cost. The assumption that more is
better is assumed to hold in the range of combinations of goods a consumer would
normally purchase, not necessarily for any possible combination. For example, when
choosing between five pieces of pizza and 10,000 pieces of pizza, a consumer
invariably would choose the five pieces, due to the infeasibility of eating and storing
all the extra pizza. Although this seems to violate the assumption that more is better,
10,000 pieces of pizza is not in the relevant range – we will not be analyzing choices
over bundles with 10,000 pieces of pizza. What about trash? People prefer less trash
‐ does that violate the assumption? Not if we define things in terms of trash removal.
We assume the good is trash removal, which people would prefer to have more of
(trash is a “bad”, not a “good”).
Third, when faced with more than two bundles to choose from, the consumer’s
choices must satisfy transitivity. That is, if A ≻ B and B ≻ C, then A ≻ C. For
example, a consumer cannot say they’d rather have a hamburger than a pizza and a
pizza rather than a burrito, but that they’d rather have a burrito than a hamburger.
The fourth and final assumption relates to a consumer’s marginal rate of
substitution between two goods, so we must first define it. A consumer’s marginal
rate of substitution of good X for good Y (denoted MRSXY) is the amount of good Y he
is willing to give up for one more unit of good X that preserves his level of
happiness. For example, if a consumer currently has two pieces of pizza and three
cokes, he may be willing to give up two cokes in exchange for one more piece of
pizza. Thus, his marginal rate of substitution of pizza for coke is 2. Note that this
value is different than his marginal rate of substitution of coke for pizza.
It will have occurred to the thoughtful reader that this rate varies according to
how much of each good the consumer has. For example, the consumer may be
willing to give up two cokes for one more piece of pizza when he has three cokes;
however, this number will surely change if he has only one coke. In fact, as we can
intuitively conclude, the less coke the consumer has, the less he is willing to give up
for one more piece of pizza. This is actually the fourth assumption, and it is
succinctly known as a diminishing marginal rate of substitution.
If you believe these four assumptions to be true, then a consumer’s choice
between bundles of goods when faced with a decision may be expressed as a
mathematical maximization problem where the consumer maximizes a utility
function (where the ranking matters but the units are arbitrary) subject to the
constraint imposed by their limited income and market prices.
Indifference Curves and Preferences
An indifference curve shows different combinations of two goods that provide
the same amount of satisfaction, or utility. If we are equally happy with two cokes
and one piece of pizza as we are with one coke and three pieces of pizza, we are
indifferent between those two options – hence the term indifference curve. An
indifference curve for two goods can be depicted on a graph where one axis
measures the amount of one good (say good X) and the other axis measures the
amount of the other good (say good Y). The figure below depicts three types of
indifference curves.
The “L” shaped indifference curve depicts the preference of a consumer who
views goods X and Y as perfect complements. Remember, the indifference curve
shows different combinations of X and Y that make the consumer equally happy. So,
for a given amount of good X, additional units of good Y do not move the consumer
off of his indifference curve; that is, they don’t increase his satisfaction. In this case,
the goods have no substitutability, and utility can only be increased if he has
some additional amount of both goods – they are consumed together or not at all.
For example, someone who has no use for either peanut butter or jelly
individually, ever, under any circumstances, but will consume them
together, views them as perfect complements.
[Figure: three indifference curves plotted with X on the horizontal axis and Y on the vertical axis, labeled Perfect Complements, Perfect Substitutes, and Imperfect Substitutes.]
If the consumer’s indifference curve looks like the straight line in the figure,
goods X and Y are perfect substitutes. In
essence, the consumer is willing to trade a specific amount of Y for one more unit of
X, and this specific amount never changes. For example, if a consumer would always
be willing to give up two cokes for one piece of pizza, no matter how much coke and
pizza he has, the goods would be perfect substitutes to him.
The third indifference curve in the figure represents imperfect substitutes.
The consumer is willing to trade some Y for one more unit of X, but the amount of Y
he is willing to give up for another unit of X decreases as he has more X and less Y.
This indifference curve is consistent with the assumption that more is better and the
assumption of a diminishing MRS.
Beginning at point a in the figure to the right, if the amount of good X increases by one unit, in
order to stay on the same indifference curve the amount of Y consumed must decrease from Y3
to Y2. Otherwise, we’d have more utility since more is better. This is why indifference curves
have a negative slope.
[Figure: a convex indifference curve through points a, b, and c. Each step from a to b to c adds one unit of X, while Y falls from Y3 to Y2 to Y1.]
Indifference curves with this shape also exhibit a diminishing MRS. Going from point a
to point b, the consumer gains one unit of X. For
this unit of X, the consumer is willing to reduce their consumption of Y from Y3 to Y2.
But, going from point b to point c, which is also a one‐unit gain of X, the consumer is
only willing to decrease Y from Y2 to Y1, a smaller decrease. This is because at point b
the consumer has less Y and more X, thus Y becomes more valuable in terms of X.
The diminishing marginal rate of substitution is why indifference curves are convex
to the origin. This property is also sometimes expressed as a preference for
variety.
From the last figure, it should be clear that the MRSXY is just the absolute value of
the rate of change of Y with X along the indifference curve. The rate of change at a
particular point is just the slope of the tangent at that point. Saying the MRSXY
diminishes along an indifference curve is saying that a line tangent to the
indifference curve gets flatter as X increases, since the absolute value of the
tangent’s slope is the graphical representation of the MRS.
Indifference curves cannot cross, as do the two in the figure to the right. Why? We know
that a has more of both good X and good Y than b, so a ≻ b. But b and c are on the same
indifference curve, and since indifference curves are defined to be different combinations
of goods that give the same amount of utility, b ~ c. Transitivity then tells us that since a ≻ b
and b ~ c, a ≻ c. But a and c are on the same indifference curve, so this cannot be true. So
indifference curves cannot cross.
[Figure: two crossing indifference curves. Points a and c lie on one curve, points b and c lie on the other, and a has more of both goods than b.]
Instead, a graph with multiple indifference curves must look something like the
one in the next figure, below and to the right. The figure shows three indifference
curves labeled 1, 2, and 3. The figure labels each indifference curve according to the
X or Y value where it crosses the 45 degree line, at which point Y=X. Completeness
means that every possible consumption bundle is on some indifference curve which
is associated with some (ordinal) measure of utility. That means that any number of
indifference curves may pass between indifference curve 1 and indifference curve 2,
and between any two other indifference curves.
Indifference curve 2 represents higher utility than indifference curve 1.
This is obvious since, at the points where they cross the 45 degree line,
where Y=X, indifference curve 2 has more of both goods than indifference
curve 1. Since indifference curves above and to the right involve higher
consumption levels, they represent higher utility, or more preferred
combinations of goods. Similarly, indifference curve 3 represents a higher
level of utility than indifference curve 2. Since the units of utility are arbitrary, and
only ranking matters, we could have as easily called the curves 0.7, 26.263, and
1,140.3.
[Figure: indifference curves 1, 2, and 3, which cross the 45 degree line Y=X at X = Y = 1, 2, and 3, respectively.]
Utility Functions, Marginal Utility, and the MRS
Graphical representations of indifference curves are quite useful. For some
purposes though, it is far more convenient and powerful to have a mathematical tool
to focus and check our reasoning. Above we claimed it is possible to represent any
preferences that satisfy the four assumptions described above with a mathematical
function. Such a function is known as a utility function, and would be denoted U(X,Y)
if the two goods consumed are X and Y. This claim may sound unreasonable. After
all, how can something as subjective as preferences, which have no natural numeric
scale beyond a simple rank ordering, be represented in an equation?
Actually, we already showed why and how the claim is true in the last figure
above. Any bundle of X and Y lies on some indifference curve. That indifference
curve represents all bundles of X and Y that provide the same level of satisfaction.
So, we can map the bundle (X,Y) to a specific number that signifies the indifference
curve it falls on. Call that number U(X,Y). All points on that indifference curve get
that number. All higher indifference curves get progressively higher numbers. All
lower indifference curves get progressively lower numbers. In the figure, we
assigned those utility numbers by the point where the indifference curve crosses the
45 degree line, but the exact numbers are not important. Saying that combination
(X3, Y3) ≻ (X2, Y2) is equivalent to saying U(X3, Y3) > U(X2, Y2). Thus, even if we have
no idea what form it takes, we know a utility function exists that represents any
consumer’s preferences that satisfy the four assumptions stated above.
As we’ve stated before, the units of utility are arbitrary; all that is necessary for a
utility function to be representative of an indifference curve is that it preserves the
rank given by the consumer’s preferences. Let’s illustrate this point with an
example. Consider the following three utility functions:
U1 = X + Y + XY, U2 = ln(X + Y + XY), and U3 = 2(X + Y + XY).
All three increase as X + Y + XY increases. So, we could write the second two as
U2 = ln(U1) and U3 = 2U1. While the scale and exact shape of each is different, it
should be clear that they rank any combination of X and Y in the same order. To
be concrete, two possible combinations of X and Y are considered in the table below.

X    Y    U1    U2      U3
3    1    7     1.95    14
2    2    8     2.08    16

We can see that utility function U1 ranks the
second bundle (2,2) higher than the first (3,1), as do the other two. The scale doesn’t
matter – only the ranking. All three of these utility functions map the bundles to the
same indifference curves. While a utility function exists for every indifference map,
it is by no means unique. But, there is only one underlying ranking of possible
consumption bundles that is consistent with any consumer’s preference.
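The ranking claim can be checked numerically. The short Python sketch below evaluates the three utility functions from the text on the two bundles in the table, plus three extra bundles that we have made up purely for illustration, and confirms that each function sorts the bundles in the same order. The function and variable names are our own.

```python
import math

def u1(x, y):
    """U1 = X + Y + XY, the first utility function in the text."""
    return x + y + x * y

def u2(x, y):
    """U2 = ln(U1), an increasing transformation of U1."""
    return math.log(u1(x, y))

def u3(x, y):
    """U3 = 2 * U1, another increasing transformation of U1."""
    return 2 * u1(x, y)

def rank(bundles, utility):
    """Return the bundles sorted from least to most preferred under `utility`."""
    return sorted(bundles, key=lambda bundle: utility(*bundle))

# The first two bundles come from the table in the text.
# The last three are illustrative assumptions, chosen so that no two bundles tie.
bundles = [(3, 1), (2, 2), (1, 1), (5, 1), (0.5, 4)]

rankings = [rank(bundles, u) for u in (u1, u2, u3)]
same_ranking = rankings[0] == rankings[1] == rankings[2]
print("All three functions give the same ranking:", same_ranking)
```

Any other strictly increasing transformation of U1 could be added to the list and would pass the same check, which is all that "ordinal" means here.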
The marginal utility of a good is the change in utility that results from a small
increase in consumption of that good. The partial derivative of the utility function
with respect to the amount of good X consumed gives the marginal utility of X, MUX.
So, if utility depends only on X and Y, the marginal utilities are as follows.
$$ MU_X = \frac{\partial U}{\partial X}, \qquad MU_Y = \frac{\partial U}{\partial Y}. \tag{7.1} $$
The marginal utilities are basically how far up the indifference map the
consumer moves when they have a little more of one good. Suppose the marginal
utility of X is 3 and the marginal utility of Y is 0.5. Then, if the consumer gets one
more unit of X, how many units of Y could they give up while retaining the same
level of utility? Since another X increases utility by six times the amount of another Y
at the margin, the consumer could give up 6 units of Y for one X while remaining on
the same indifference curve. Thus, the marginal rate of substitution of X for Y is
equal to the ratio of the marginal utility of X to the marginal utility of Y:
$$ MRS_{XY} = \frac{MU_X}{MU_Y}. \tag{7.2} $$
We noted above that the MRS was just the absolute value of the slope of the
indifference curve. Another way to see that is to note that the change in utility, dU,
for small changes in X and Y, dX and dY, can be expressed as:
$$ dU = MU_X \, dX + MU_Y \, dY. \tag{7.3} $$
This just says utility increases by MUX per unit increase in X plus MUY per unit
change in Y, at least for very small changes. Along an indifference curve, dU=0. So,
this can be rearranged as follows:
$$ \begin{aligned} MU_X \, dX + MU_Y \, dY &= 0 \\ MU_Y \, dY &= -MU_X \, dX \\ \frac{dY}{dX} &= -\frac{MU_X}{MU_Y}. \end{aligned} \tag{7.4} $$
Thus, along a given indifference curve, the slope of the indifference curve at any
given point is given by the negative of the ratio of the marginal utility of X to the
marginal utility of Y, which is the negative of the MRSXY as well.
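A numerical sketch may help tie equations (7.2) and (7.4) together. Using U1 = X + Y + XY from the earlier example, the code below approximates the marginal utilities with finite differences, forms their ratio, and evaluates it at three points along one indifference curve. The step size, the chosen points, and all function names are our own assumptions. It also repeats the calculation for ln(U1) to illustrate that the MRS does not depend on the scale of utility.

```python
import math

def u1(x, y):
    """U1 = X + Y + XY, taken from the earlier example in the text."""
    return x + y + x * y

def log_u1(x, y):
    """ln(U1): same preferences, different utility scale."""
    return math.log(u1(x, y))

def marginal_utilities(utility, x, y, h=1e-6):
    """Approximate MU_X and MU_Y with central finite differences."""
    mu_x = (utility(x + h, y) - utility(x - h, y)) / (2 * h)
    mu_y = (utility(x, y + h) - utility(x, y - h)) / (2 * h)
    return mu_x, mu_y

def mrs_xy(utility, x, y):
    """MRS_XY = MU_X / MU_Y, as in equation (7.2)."""
    mu_x, mu_y = marginal_utilities(utility, x, y)
    return mu_x / mu_y

def y_on_curve(x, level):
    """Solve X + Y + XY = level for Y (valid for U1 only)."""
    return (level - x) / (1 + x)

level = u1(1, 3)                      # the indifference curve through (1, 3)
xs = [1, 2, 3]                        # move right along that curve
mrs_values = [mrs_xy(u1, x, y_on_curve(x, level)) for x in xs]
mrs_values_log = [mrs_xy(log_u1, x, y_on_curve(x, level)) for x in xs]

for x, mrs in zip(xs, mrs_values):
    print(f"X = {x}: MRS_XY is approximately {mrs:.3f}")
```

The MRS falls as X rises along the curve, which is the diminishing marginal rate of substitution, and the two lists agree up to numerical error even though the marginal utilities themselves differ between the two functions.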
Budget Constraints
Preferences are the first major component of consumer theory. Budget
constraints are the second. The way we graphically represent budget constraints is
by using, creatively enough, a budget line. When there are only two goods, the
equation of the budget line is
$$ m = p_X X + p_Y Y, \tag{7.5} $$
where m is the consumer’s total budget, pX and pY are the prices of good X and good
Y, and X and Y are the amounts of good X and good Y purchased. So, if we are on this
line, our income, which is the amount of money we have to spend on both goods, is
equal to the amount of money that we actually spend on the goods. Generally, the
budget line can be defined for any number of goods. It is just the fact that income
must equal expenditures. This does not preclude saving. If we want to analyze
saving, we would just let savings be one of the goods upon which income is spent.
The budget line simply depicts the bundles of goods the consumer can afford.
Solving this equation for Y, we obtain
$$ \begin{aligned} p_Y Y &= m - p_X X \\ Y &= \frac{m}{p_Y} - \frac{p_X}{p_Y} X. \end{aligned} $$
This is the budget line in slope‐intercept form. The slope of the budget line is the negative of
the ratio of the price of X to the price of Y, −pX/pY. The Y intercept in the equation above
is m/pY. That is the amount of Y that could be purchased if no X were purchased. Similarly,
the X intercept is m/pX. A graph of the budget line looks like the figure to the right.
[Figure: budget line with Y intercept m/pY, X intercept m/pX, and slope −pX/pY.]
The interpretation of the slope of the
budget line will prove to be important. In this context, the slope tells us the rate at
which Y can be exchanged for another unit of X at market prices. For example, if the
price of a coke is $1.50 and the price of a slice of pizza is $3, we have to give up half a
slice of pizza per additional coke purchased. Generally, since income is fixed, as the
amount of good X purchased increases, the amount of Y purchased must fall. Since
how much of good Y we have to give up in order to obtain one more unit of good X is
based on their relative prices, it should make sense that the slope of the budget line
is the price ratio.
Example: Budget Line
If the price of beer is $1, the price of pizza is $2, and a consumer has $50 to
spend on pizza and beer, find and graph the consumer’s budget line.
Solution: The budget line is
m = pB B + pP P
50 = B + 2P
Since we have $50, and one piece of pizza costs $2, we can buy a
maximum of 50/2 = 25 pieces of pizza. Similarly, if we buy no
pizza, at $1 per beer we can buy a maximum of 50/1 = 50 beers.
These intercepts give us our budget line:
[Figure: budget line with pizza on the vertical axis and beer on the horizontal axis. The pizza intercept is m/pP = 25, the beer intercept is m/pB = 50, and the slope is −pB/pP = −0.5.]
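The same calculation can be written as a small Python sketch. The helper functions and their names are our own and are only meant to illustrate the arithmetic. Following the example, beer is treated as the good on the horizontal axis and pizza as the good on the vertical axis.

```python
def budget_line(m, p_x, p_y):
    """Intercepts and slope of m = p_x*X + p_y*Y, with Y on the vertical axis."""
    return {
        "x_intercept": m / p_x,   # units of X if no Y is bought
        "y_intercept": m / p_y,   # units of Y if no X is bought
        "slope": -p_x / p_y,      # Y given up per extra unit of X
    }

def is_affordable(x, y, m, p_x, p_y):
    """True if the bundle (x, y) costs no more than the budget m."""
    return p_x * x + p_y * y <= m

# Beer (X) costs $1, pizza (Y) costs $2, and the budget is $50.
line = budget_line(m=50, p_x=1, p_y=2)
print(line)

# Two hypothetical bundles: one exactly on the line, one out of reach.
on_line = is_affordable(20, 15, m=50, p_x=1, p_y=2)
out_of_reach = is_affordable(30, 15, m=50, p_x=1, p_y=2)
```

Swapping which good is called X only swaps the two intercepts and inverts the slope; the set of affordable bundles is unchanged.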
Individual Choice
Now that we have the two basic tools of consumer theory, we combine them to
analyze consumer decisions. Consumers choose their preferred bundle from among
those that are possible. That is, they want to maximize their utility or reach the
highest possible indifference curve given their budget constraint. First we look at
the problem graphically, then mathematically.
The figure on the right shows three indifference curves for a hypothetical
consumer, along with their budget line. Indifference curve 3 is out of reach, given
their budget. Indifference curve 1 is in reach, but it should be obvious that it is
not the highest indifference curve that can be reached. The highest indifference
curve that can be reached is represented by indifference curve 2. While it is easy
to understand this graph, as far as it goes, it is important to have a more precise
understanding of the solution, and to be able to explain that understanding
precisely in words.
[Figure: budget line with intercepts m/pY and m/pX and indifference curves 1, 2, and 3. Curve 2 is tangent to the budget line.]
Interpretation of the Solution of the Individual’s Choice Problem
Notice that at the solution, the budget line is tangent to the indifference curve.
We know that the slope of the budget line is the negative of the price ratio, and that
the slope of the tangent to the indifference curve is the negative of the marginal rate
of substitution. Thus, at the point on the budget line where the individual reaches
the highest possible indifference curve,
$$ MRS_{XY} = \frac{p_X}{p_Y}. \tag{7.6} $$
We will refer to this type of condition as an optimality condition.
At the solution, the amount of Y the consumer is willing to give up for one more
unit of X must equal the amount of Y he would have to give up to get another unit of
X in the market. Intuitively, if the value of another unit of X in terms of Y is higher
than the cost of another unit of X in terms of Y, the consumer will choose to buy
more X and less Y. On the other hand, if the value of another unit of X in terms of Y is
less than the cost of another unit of X in terms of Y, the consumer will choose to buy
more Y and less X.
Making use of the fact that the marginal rate of substitution is equal to the ratio
of the marginal utilities, we can rewrite the optimality condition as follows:
$$ \begin{aligned} \frac{MU_X}{MU_Y} &= \frac{p_X}{p_Y} \\ \frac{MU_X}{p_X} &= \frac{MU_Y}{p_Y}. \end{aligned} \tag{7.7} $$
The interpretation of this version is that the marginal utility per dollar spent on
good X must equal the marginal utility per dollar spent on good Y. If the marginal
utility of the last dollar spent on X exceeds the marginal utility of the last dollar
spent on Y, the consumer could increase utility by buying more X and less Y, and
vice versa. Remember, utility just corresponds to indifference curves. If the marginal
utility per dollar spent is higher for good X than for good Y, that just means that
spending one dollar more on X will move the consumer up more indifference curves
than he would move down by spending one dollar less on Y in order to free up a
dollar to spend on X.
This optimality condition readily generalizes to more than two goods. It just says
that the relative value of any two goods consumed must equal the relative cost, or
that the marginal utility per dollar, or “bang per buck”, should be equal for all goods.
At a consumer’s most preferred feasible choice, two things must then be true in
general. First, they must satisfy the budget constraint – they can’t spend more than
they have. Second, the optimality condition(s) must hold – it should not be possible
to reallocate expenditures so as to reach a more preferred bundle, or no available
good should have a value in terms of the other goods consumed that exceeds its
price relative to those goods.
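The reallocation argument can be acted out in a few lines of Python. The text does not specify a utility function for the beer and pizza example, so the sketch below assumes a hypothetical one, U = (beer)(pizza), purely for illustration; the starting bundle and the one dollar step are also our own choices. The loop keeps shifting a dollar of spending toward whichever good raises utility and stops when no shift helps.

```python
P_BEER, P_PIZZA, BUDGET = 1.0, 2.0, 50.0

def utility(beer, pizza):
    """Hypothetical utility function, assumed only for this illustration."""
    return beer * pizza

def reallocate(beer, pizza, step=1.0):
    """Shift `step` dollars at a time between goods while utility rises."""
    while True:
        current = utility(beer, pizza)
        # Candidate 1: spend `step` dollars less on beer and more on pizza.
        more_pizza = (beer - step / P_BEER, pizza + step / P_PIZZA)
        # Candidate 2: spend `step` dollars less on pizza and more on beer.
        more_beer = (beer + step / P_BEER, pizza - step / P_PIZZA)
        best = max((more_pizza, more_beer), key=lambda b: utility(*b))
        # Stop if the better candidate is infeasible or does not raise utility.
        if min(best) < 0 or utility(*best) <= current:
            return beer, pizza
        beer, pizza = best

# Start on the budget line with "too much" beer: 40 beers and 5 slices.
best_beer, best_pizza = reallocate(40.0, 5.0)

# With U = beer * pizza, MU_beer = pizza and MU_pizza = beer.
mu_per_dollar_beer = best_pizza / P_BEER
mu_per_dollar_pizza = best_beer / P_PIZZA
spending = P_BEER * best_beer + P_PIZZA * best_pizza
print(best_beer, best_pizza, mu_per_dollar_beer, mu_per_dollar_pizza)
```

Under this assumed utility function the loop stops where the marginal utility per dollar is the same for both goods and the whole budget is spent, which are the two conditions stated above.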
Let’s now tie this logical and intuitive reasoning to the graphical analysis using
the budget constraint from our earlier example where the budget is $50, one beer
costs $1, and one slice of pizza costs $2. The figure below shows the budget line and
two indifference curves. Indifference curve III, based on the graph, is unattainable
given the budget. Indifference curve I crosses the budget line in two places, points a
and b.
The line tangent to the indifference curve I at point a is shown by the dotted
line. The slope of this tangent line represents the rate at which we can
trade pizza for beer at that point while maintaining our level of utility; in other
words, the absolute value of the slope of the tangent line is the marginal rate of
substitution of pizza for beer at that point. Intuitively, it means how much beer
the consumer is willing to give up to get one more piece of pizza at point a.
[Figure: budget line running from 50 on the beer (vertical) axis to 25 on the pizza (horizontal) axis, with indifference curves I and III. Curve I crosses the budget line at points a and b, and the dotted line is tangent to curve I at point a.]
Notice that the tangent line to the indifference curve at point a is steeper than
the budget line. Therefore, at point a,
$$ MRS_{PB} > \frac{p_{Pizza}}{p_{Beer}}. $$
Remember, the slope of the budget line represents the rate at which the market will
allow us to trade pizza for beer. Since pBeer = $1 and pPizza = $2, the market allows us
to trade two beers for one slice of pizza; equivalently, if we wanted one more pizza
slice, we would need to give up 2 beers. Because the slope of the tangent line at
point a is steeper than the slope of the budget line, the consumer is willing to give up
more than 2 beers for one more piece of pizza to stay at the same utility level. Since
the consumer can give up exactly 2 beers for one more piece of pizza (based on the
market prices), he will be better off if he does so. The consumer will buy more pizza
and less beer if he starts at point a. As they do so, they have more pizza and less
beer, so the value of another slice of pizza in terms of beer falls, or the MRSPB
diminishes. Eventually, the consumer will reach a point where the MRS equals the
price ratio, or the relative value equals the relative cost at the margin.
At point b, it’s just the opposite; the slope of the tangent is flatter, which means
the consumer is willing to give up less than 2 beers for one more pizza. So, at point
b,
$$ MRS_{PB} < \frac{p_{Pizza}}{p_{Beer}}. $$
So, if the consumer buys more beer by giving up pizza, he will be better off, since he
can get exactly 2 beers for one pizza. As they buy more beer and less pizza, the value
of pizza in terms of beer increases, until the value of another slice of pizza in terms
of beer equals the relative cost.
The consumer, then, will reach the highest indifference curve when the rate at
which he’s willing to trade pizza for beer is exactly the same as the rate at which the
market allows him to, or when
$$ MRS_{PB} = \frac{p_{Pizza}}{p_{Beer}}. $$
to say the same thing). Nowhere did we say anything about diminishing marginal
utility. The scale of utility is arbitrary, so, in this context, it is completely
meaningless to refer to “diminishing marginal utility.” All that matters is the ratio of
the marginal utilities and how that ratio changes with the consumption bundle.
Price Changes and Income Shifts
The model above describes determination of a consumer’s individual quantity
demanded at a given set of prices and a given income level. As we have explained,
the budget line is determined by both the prices of the two goods and the
consumer’s level of income. Changes in either of these affect the budget line. Let’s
first look at what happens when the income changes.
Suppose that the consumer’s income m increases in the figure to the right. The effect
is that the consumer can buy more of both goods if they so choose, but as the slope of
the budget line is determined by the prices of the goods – which haven’t changed – the
budget line will simply shift outward, parallel to the original line. The opposite
would happen if the consumer’s income fell. In the figure, both X and Y are normal goods
for this consumer – he consumes more of each when income increases.
[Figure: the budget line shifts outward in parallel as income rises from m0 to m1, with m1 > m0. The intercepts move from m0/pY and m0/pX to m1/pY and m1/pX, and the chosen bundle moves from (X0, Y0) to (X1, Y1).]
The reader should experiment with different shaped indifference curves to
verify that it is possible for one of the two goods to be an inferior good, for which
consumption will fall when incomes increase, but not for both goods to be
simultaneously inferior. In addition, if the prices of both goods increase by a given
factor, it is just like a decrease in income. Or, if both prices fall by a given factor, it is
just like an increase in income. For example, if both prices increase by 50%, it is just
as if income fell by one third. It would be a good idea for the reader to show this
graphically and to verify the equivalence using algebra and the equation of a budget line.
Finally, if prices and income all increase by the same factor, nothing changes.
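The algebra for the 50% example takes only a few lines. The derivation below starts from the budget line (7.5) with both prices multiplied by 1.5 and income unchanged; nothing is assumed beyond the notation already used in this chapter.

```latex
% Both prices rise by 50%; income m is unchanged.
\begin{align*}
m &= (1.5\,p_X)\,X + (1.5\,p_Y)\,Y \\
\frac{m}{1.5} &= p_X X + p_Y Y \\
\tfrac{2}{3}\,m &= p_X X + p_Y Y .
\end{align*}
```

The last line is the original budget line with income reduced by one third. If income were also multiplied by 1.5, the factor would cancel from both sides and the original budget line would be recovered, which is why nothing changes when prices and income all rise by the same factor.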
Suppose the price of good X increases from pX0 to pX1 in the figure to the right.
Since the X intercept of the budget line is determined by how many units of good X
we can buy with a fixed amount of income, an increase in the price means we can buy
fewer units. Thus, the budget line pivots inward. As we would expect, a higher price
of good X means that the new chosen bundle will contain fewer units of X than the
old bundle. If the price of good X were to decrease, exactly the opposite would
happen.
[Figure: the budget line pivots inward around the Y intercept m/pY as the price of X rises from pX0 to pX1. The X intercept falls from m/pX0 to m/pX1 and the chosen quantity of X falls from X0 to X1.]
Note that the change in the price of
X affects the demand for Y in the figure. At the higher price of X, in this example the
demand for Y increases. Thus, X and Y would be thought of as substitutes. The reader
should experiment drawing different shaped indifference curves to show that it
would be possible for consumption of Y to decrease, in which case the goods are
complements, or to remain the same. It is also left to the reader to determine how a
change in the price of good Y would affect the budget line and equilibrium bundle.
There are really two things going on when a price increases. First of all, the
relative price changes. At the initial solution and original prices, the value of X in
terms of Y equaled the relative cost. Once the price of X increases, the value of X in
terms of Y is suddenly less than the cost. That induces the consumer to want to
substitute some Y for X. The tendency for substitution induced by a price change is
known as the substitution effect. At the same time, the increase in price reduces
purchasing power. That means the consumer is poorer. If the good is normal, this
will result in a further decrease in the consumption of X. If the good is inferior, this
will actually work against the tendency to substitute away from X. The effect of the
reduction in purchasing power, or, real income, due to a price change on the
quantities chosen is known as the income effect. Changes in prices have larger
effects on purchasing power if the good accounts for a large share of the consumer’s
budget. So, the income effect is more pronounced for goods that are a large share of
the budget.
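A numerical sketch can make the two effects concrete. The text does not say how to separate them, so the code below makes two assumptions that should be read as ours: a hypothetical utility function U = XY, and a decomposition that holds utility constant (the consumer is hypothetically given just enough extra income to reach the original indifference curve at the new prices). The change in X up to that compensated bundle is treated as the substitution effect, and the rest as the income effect. The prices and income are also made up.

```python
import math

def demand(m, p_x, p_y):
    """Chosen bundle for the assumed utility U = X*Y.

    The tangency condition Y/X = p_x/p_y together with the budget line
    gives X = m/(2*p_x) and Y = m/(2*p_y).
    """
    return m / (2 * p_x), m / (2 * p_y)

def utility(x, y):
    """Hypothetical utility function, assumed only for this illustration."""
    return x * y

def decompose(m, p_x_old, p_x_new, p_y):
    """Split the change in X into substitution and income effects."""
    x_old, y_old = demand(m, p_x_old, p_y)
    x_new, _ = demand(m, p_x_new, p_y)
    # Income needed to reach the original utility at the new prices.
    # With U = X*Y, maximized utility is m^2 / (4*p_x*p_y); solve for m.
    m_comp = 2 * math.sqrt(utility(x_old, y_old) * p_x_new * p_y)
    x_comp, _ = demand(m_comp, p_x_new, p_y)
    return {
        "substitution": x_comp - x_old,  # same utility, new relative price
        "income": x_new - x_comp,        # loss of purchasing power
        "total": x_new - x_old,
    }

# Illustrative numbers: income 40, p_Y = 1, and p_X rises from 1 to 4.
effects = decompose(m=40, p_x_old=1, p_x_new=4, p_y=1)
print(effects)
```

In this example X is a normal good, so both effects are negative and reinforce each other. For a Giffen good the income effect would have to be positive and larger in size than the substitution effect.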
The effect of a price increase above led to a decrease in the quantity demanded.
This is consistent with the law of demand. However, the analysis above, in and of
itself, does not “prove” the law of demand. To the contrary, the tools of consumer
theory can be used to illustrate the strange case where the law of demand would not
hold. While it is doubtful that we would ever observe it in practice, especially at the
market level, here is how the theoretical argument goes. Suppose someone very
poor spends a lot of their budget on an inferior good. If the price of that inferior good
goes up, the substitution effect says buy less, while the income effect says buy more. Since
the good is a large share of their budget, the income effect could win out! Such a
good would be known as a Giffen good.
For example, suppose a college student is so poor that they eat instant Ramen
noodles for all except one meal every week. For that last meal, they order from the
dollar menu at McDonald’s. They regard Ramen noodles as an inferior good. If their
income increased a little, they would order from the dollar menu at McDonald’s twice
each week. Now, suppose the price of Ramen goes down. With the money they
save, they might choose to buy from McDonald’s one more time, and therefore
eat less Ramen. This is shown in the figure on the right.
[Figure: a Giffen good. Budget lines 1 and 2 share the Y intercept M/py, with px1 > px2. The chosen quantity of X falls from x1 to x2 when the price of X falls.]
This situation is pretty unlikely at an individual level. From the perspective of a
market, taken as a whole, it seems almost impossible, because while some may
spend a lot of their income on instant Ramen, the fraction of the average consumer’s
income spent on it will be too small for the income effect to outweigh the
substitution effect. As the reader should be able to tell from the figure, it is even
hard to draw! The main reason to cover it is that it is a good extreme case for the
reader to use to check their understanding of the theory.
Individual Choice – the Calculus Version
Recall the consumer’s original goal: reaching the highest indifference curve with
their limited budget. Since we have shown that utility functions represent
indifference curves, this goal is equivalently stated as a consumer maximizing his
utility given their budget constraint. With only two goods, it is possible to solve the
budget constraint for Y or X, substitute that expression into the utility function for Y
or X, and then just maximize as usual. In the two‐good case, the way to think about
the problem mathematically is:
$$ \max_{X,Y} \; U(X,Y) \quad \text{subject to} \quad p_X X + p_Y Y \le m. \tag{7.8} $$
How is this problem attacked? Assuming we have defined things so all income is
spent, as we did above, the budget constraint could be written as Y = m/pY − (pX/pY)X, so
utility would be U(X, m/pY − (pX/pY)X). For example, if the utility function is
U = 2XY + X, the price of X is 4, the price of Y is 2, and income is 20, this would
become U = 2X(10 − 2X) + X. Maximizing that will give you the consumer’s choice.
But, it is not a particularly insightful way to accomplish anything.
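For readers who want to see the substitution approach carried through, here is a short Python sketch using the numbers from the example. Expanding U = 2X(10 − 2X) + X gives the quadratic 21X − 4X², which peaks at its vertex. The variable names are our own.

```python
P_X, P_Y, M = 4, 2, 20

def utility(x, y):
    """U = 2XY + X, the utility function from the example."""
    return 2 * x * y + x

def y_from_budget(x):
    """Budget line solved for Y: Y = m/p_Y - (p_X/p_Y) X = 10 - 2X."""
    return M / P_Y - (P_X / P_Y) * x

# After substitution, U(X) = 2X(10 - 2X) + X = 21X - 4X^2 = a*X^2 + b*X.
a, b = -4, 21
x_star = -b / (2 * a)            # vertex of the quadratic
y_star = y_from_budget(x_star)
u_star = utility(x_star, y_star)

# Check the optimality condition MU_X / MU_Y = p_X / p_Y at the solution.
mu_x = 2 * y_star + 1            # partial derivative of U with respect to X
mu_y = 2 * x_star                # partial derivative of U with respect to Y
mrs = mu_x / mu_y

print(x_star, y_star, u_star, mrs)
```

The chosen bundle spends the whole budget, and the ratio of marginal utilities at that bundle equals the price ratio of 2, so the substitution method lands on the same tangency condition as (7.6).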
Instead, we can set up and solve this problem by appending what is known as a
Lagrange multiplier to the constraint and creating a new form of the problem
known as a Lagrangian. For the two‐good case, that looks like the following:
$$ L = U(X,Y) + \lambda \left[ m - p_X X - p_Y Y \right]. \tag{7.9} $$
Why do this? This is basically a “cooked” problem that is structured to account for
the constraint by introducing a new choice variable, λ (lambda). With the problem
set up this way, we can take three partial derivatives and end up with three
equations in three unknowns, (X, Y, and λ). Solving those equations will give us the
solution to the consumer’s problem. Thus, we can apply what we know about
unconstrained optimization to solve the problem once it is written in this way.
Taking the partial with respect to X we have
$$ \frac{\partial L}{\partial X} = \frac{\partial U}{\partial X} - \lambda p_X = 0. \tag{7.10} $$
Since the partial derivative of utility with respect to X is just the marginal utility of X,
the equation becomes
$$ MU_X - \lambda p_X = 0. \tag{7.11} $$
Similarly, the partial with respect to Y yields
$$ MU_Y - \lambda p_Y = 0. \tag{7.12} $$
Solving each of the first two equations for λ, setting the results equal, and rearranging terms gives
$$ \frac{MU_X}{MU_Y} = \frac{p_X}{p_Y}. \tag{7.13} $$
Looking at our result from the first two partials, we see that equation (7.13) is the
same optimality condition obtained above.
The third and final partial derivative of the Lagrangian, with respect to λ, is
$$\frac{\partial L}{\partial \lambda} = m - p_X X - p_Y Y = 0. \qquad (7.14)$$
Rearranging, this is just the budget constraint,
$$m = p_X X + p_Y Y. \qquad (7.15)$$
The solutions to equations (7.13) and (7.15) yield the individual consumer’s
demand functions for each good in terms of income and the prices of both goods.
We could rearrange the first two partials to get an expression for λ. Doing so, we
find the following is true at the solution:
$$\lambda = \frac{MU_X}{p_X} = \frac{MU_Y}{p_Y}. \qquad (7.16)$$
This can be interpreted as the marginal utility of the last dollar spent. In other
words, it is a measure related to how far up the indifference map the consumer
would move with a little more income. Thus, the solution for λ can be thought of as
the marginal utility of another dollar’s worth of income. Since λ was attached to the
income constraint, it should not be surprising that it turns out to tell us something
about the utility value of another dollar.
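As a check on the algebra, the Python sketch below solves the three first‐order conditions for the earlier example, U = 2XY + X, for which MU_X = 2Y + 1 and MU_Y = 2X. The function names are ours, and the code assumes an interior solution (both goods purchased). It then confirms numerically that λ equals the gain in maximized utility from one more dollar of income.

```python
from fractions import Fraction

def solve_first_order_conditions(p_x, p_y, m):
    """Solve the first-order conditions of the Lagrangian for U = 2XY + X.

    The three conditions are
        2Y + 1 - lam * p_x = 0     (partial with respect to X)
        2X     - lam * p_y = 0     (partial with respect to Y)
        m - p_x * X - p_y * Y = 0  (partial with respect to lambda)
    An interior solution is assumed.
    """
    # The first two conditions give Y = (p_x / p_y) X - 1/2; substituting
    # that into the budget constraint pins down X.
    x = (m + p_y / 2) / (2 * p_x)
    y = (p_x / p_y) * x - Fraction(1, 2)
    lam = 2 * x / p_y
    return x, y, lam

def max_utility(p_x, p_y, m):
    """Utility at the chosen bundle, as a function of prices and income."""
    x, y, _ = solve_first_order_conditions(p_x, p_y, m)
    return 2 * x * y + x

p_x, p_y, m = Fraction(4), Fraction(2), Fraction(20)
x, y, lam = solve_first_order_conditions(p_x, p_y, m)

# Marginal utility of income, approximated by a central difference.
h = Fraction(1, 100)
marginal_utility_of_income = (
    max_utility(p_x, p_y, m + h) - max_utility(p_x, p_y, m - h)
) / (2 * h)

print(float(x), float(y), float(lam), float(marginal_utility_of_income))
```

Here maximized utility happens to be quadratic in income, so the central difference matches λ exactly; for other utility functions the match would only be approximate.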
You will not have to use the Lagrangian on any exam in Managerial Economics.
So, why did we go through it? There are several reasons. The first was to show that
what we argued above based on logic, intuition, and graphical analysis was entirely
internally consistent. The second was because students who are very
mathematically inclined but who had trouble with the intuition may be able to get a
better picture of the logic and intuition having seen the math. Third, many, even
most, of the students who go on to graduate school in business disciplines,
including not just economics but also finance, accounting, operations management,
and decision science, will use this technique extensively later on. So, exposure to it at
this point may be helpful in preparing you for that experience. Finally, many of the
software applications that perform constrained optimization rely on this technique
or on methods built from it.
Many of you who move on to careers in business will have to make sense of reports
colleagues or consultants have prepared using such tools. Having some knowledge
of the technique underlying the solution should help you make sense of such
reports, and allow you to evaluate the results more intelligently.
Summary and Example Problem
In summary, two conditions must be met at the consumer’s most preferred
feasible bundle.
1. Optimization condition: A consumer allocates a given budget optimally
between good X and good Y:
$$MRS_{XY} = \frac{p_X}{p_Y}, \quad \frac{MU_X}{MU_Y} = \frac{p_X}{p_Y}, \quad \text{or} \quad \frac{MU_X}{p_X} = \frac{MU_Y}{p_Y}.$$
2. Budget constraint: A consumer cannot spend more than they have, and spends all of it:
$$m = p_X X + p_Y Y$$
Example: Utility Maximization
Suppose a consumer’s utility function for different combinations of pizza (Z) and
beer (B) is given by the function U(Z,B) = 4√B + 2√Z. If the price of beer is $1,
the price of pizza is $2 and the consumer has $50 to spend on beer and pizza,
find the amount of pizza and beer he will buy.
Solution: We begin by finding the optimization condition. The marginal utility of
beer is the partial derivative of the utility function with respect to B, or
$$MU_B = \frac{2}{\sqrt{B}}$$
and the marginal utility of pizza is
$$MU_Z = \frac{1}{\sqrt{Z}}.$$
The marginal rate of substitution of pizza for beer is the quotient of the two
marginal utilities, or
$$MRS_{ZB} = \frac{MU_Z}{MU_B} = \frac{\sqrt{B}}{2\sqrt{Z}}.$$
Now we can equate this to the ratio of the prices of the goods to obtain the
optimization condition:
$$MRS_{ZB} = \frac{\sqrt{B}}{2\sqrt{Z}} = \frac{2}{1} = \frac{p_Z}{p_B}$$
$$\sqrt{B} = 4\sqrt{Z}$$
$$B = 16Z$$
So, based on our optimization condition, we have found the ratio at which the
consumer will consume beer and pizza. We now must find the second condition,
the budget constraint. Plugging in the level of income and the prices of the goods
to the budget constraint, we have
$$50 = 2Z + B$$
To solve these two equations simultaneously, substitute the optimization
condition into the budget constraint:
$$50 = 2Z + 16Z = 18Z$$
$$Z = 50/18 \approx 2.8 \quad \text{and} \quad B = 16Z \approx 44.4$$
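The Python sketch below re‐computes this example and checks both conditions from the summary: the marginal rate of substitution equals the price ratio, and the budget is exhausted. The names are ours; the only generalization beyond the text is writing the tangency condition in terms of the prices, B = 4(pZ/pB)²Z, which reduces to B = 16Z here.

```python
import math

P_BEER, P_PIZZA, INCOME = 1.0, 2.0, 50.0

def utility(z, b):
    """U(Z, B) = 4*sqrt(B) + 2*sqrt(Z), as in the example."""
    return 4 * math.sqrt(b) + 2 * math.sqrt(z)

def mrs_pizza_for_beer(z, b):
    """MU_Z / MU_B = sqrt(B) / (2 * sqrt(Z))."""
    return math.sqrt(b) / (2 * math.sqrt(z))

# Optimization condition: sqrt(B) / (2 sqrt(Z)) = pZ / pB,
# which rearranges to B = 4 * (pZ / pB)**2 * Z (here B = 16 Z).
beer_per_pizza = 4 * (P_PIZZA / P_BEER) ** 2

# Budget constraint: INCOME = pZ * Z + pB * B, with B replaced by 16 Z.
pizza = INCOME / (P_PIZZA + P_BEER * beer_per_pizza)
beer = beer_per_pizza * pizza

print(round(pizza, 2), round(beer, 2))
print(mrs_pizza_for_beer(pizza, beer), P_PIZZA / P_BEER)
```

Moving slightly along the budget line in either direction from this bundle lowers utility, which is what the tangency condition predicts.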
Chapter 7 Terminology
The following is a list of terms that you should know in order to discuss and
apply the material from this chapter.
Completeness: An underlying assumption of consumer theory that states a
consumer must either prefer one product to another or be indifferent between
them. In other words, preferences are exhaustive: A ≻ B, B ≻ A, or A ~ B.
More is Better: An underlying assumption of consumer theory that states
consumers are better off if they have more of something. This assumption is
bounded by the relevant range of what consumers might normally purchase.
Transitivity: An underlying assumption of consumer theory that states if A ≻ B and
B ≻ C, it must hold true that A ≻ C.
Utility: A metric used to apply math to preference rankings. The units of utility are
not important; only the rank order matters.
Marginal Utility: The added satisfaction of consuming one more unit of a good.
Marginal Rate of Substitution (MRS): The amount of one good a consumer is
willing to give up for one more unit of another good.
Diminishing Marginal Rate of Substitution: An underlying assumption of
consumer theory that states as a consumer gets more of one good and less of the
other along an indifference curve, the value of the second good increases
relative to the first good. This implies a preference for variety.
Indifference Curve: A curve that shows different combinations of two goods that
provide the same amount of utility. Indifference curves do not cross because of
transitivity.
Budget Line: A line that shows the combinations of two goods a consumer can just
afford, given income and the prices of the goods.
Chapter 8
Applications and Extensions of Consumer Theory
Compensation Indexing and Compensating Differentials
Even though utility is a nebulous term that has no fixed scale, and thus cannot be
directly related to a consumer’s actual “happiness”, the theory of individual choice
still has many useful applications. Often, firms find it in their interest to relocate
employees from one city to another. Since housing and other costs may differ across
cities, management often adjusts salaries across cities to compensate employees for
such differences. How much should compensation adjust for changes in the costs of
housing and other goods and services across cities?
In setting up the model, we need to consider what is most important to include
and what we are to ignore. For items that are easily transported from one city to
another, arbitrage means we should expect the “law of one price” to hold, apart from
relatively small transportation costs. That means we can ignore variation in the
prices of transportable goods, and any goods and services that are available on the
internet. At the other extreme, land simply cannot be moved from one city to the
next, no matter how large the price difference. That, coupled with the fact that
housing accounts for such a large share of a typical consumer’s income, means we
must account for differences in the price of housing in the model. Services which are
highly labor intensive are an intermediate case. Their costs will vary somewhat, due
to the effect of cross city differences in wages, but not as much as housing costs. To
keep the model simple, we will ignore the role of variations in services costs across
cities. But, in interpreting the model, we should not forget that in reality, service
costs will vary, too.
We will therefore assume that utility depends on the amount of housing
consumed, H, and the amount of everything else consumed, E. So, the utility function
is written as U(H, E) . The price of housing, or the rental rate per unit of housing, is
R. The units of housing are most easily thought of as square feet. Since we are
assuming everything other than housing costs the same in all cities, the cost of one
dollar’s worth of everything else, pE, is just 1. Thus, the budget constraint is
M = RH + E. We will assume, for simplicity, that income is entirely determined by the
salary the employee is paid by the firm.
Let’s say an employee currently works in Gainesville and his firm would like to
relocate him to Atlanta. The current budget line in Gainesville is shown in the figure
below. If they just buy everything else and no housing, they could consume MG units
of E. If they just buy housing and nothing else, they could consume (MG/RG) units of
housing. Given the budget constraint, they can reach the indifference curve UG in
Gainesville, and they will consume HG units of housing and EG units of everything
else. At the solution, the value of another unit of housing in terms of everything else,
MRSEH, is equal to pH/pE, or R:
$$\frac{MU_H}{MU_E} = R \qquad (8.1)$$
[Figure: Gainesville budget line with intercept MG on the E axis and MG/RG on the H axis; the chosen bundle (HG, EG) lies on indifference curve UG.]
Suppose the firm wants to move the employee to their Atlanta branch, and that the
price of housing in Atlanta, RA, is higher than the price in Gainesville. If they were
offered the same salary, their budget line would pivot in, due to the higher cost of
housing. At the new budget line, the employee can no longer reach indifference
curve UG, and thus wouldn’t be willing to move (assuming they can find a similar job
in Gainesville).
[Figure: with salary MG unchanged, the budget line pivots inward around the E intercept MG; the H intercept falls from MG/RG to MG/RA, so the bundle (HG, EG) on UG is no longer affordable.]
To get the employee to move, one option
would be for the firm to pay enough more in Atlanta to allow the employee to
consume the same bundle that they consumed in Gainesville. That level of income,
denoted M̂A, is calculated as
$$\hat{M}_A = R_A H_G + E_G. \qquad (8.2)$$
Graphically, this is a shift in the new budget line out until it reaches the original
consumption bundle, (HG, EG). Remember, the slope of the budget line is −(pH/pE),
and since pE is $1 and pH is RA, the slope of the new, higher budget line for M̂A
doesn’t change; we’re just shifting up the intercept to get back to the original bundle.
[Figure: the Atlanta budget line shifted out in parallel to the E intercept M̂A, so that it passes through the original bundle (HG, EG) on UG.]
Is this the best option from the perspective of the firm? It’s true that M̂A will
get the employee to move, but is there a cheaper way to accomplish this? Looking at
the graph, we see that with the new budget line, relating to the level of income M̂A,
the employee can actually reach a higher indifference curve than they reached in
Gainesville. So, with new income M̂A, the employee will actually be better off. The
intuition is that when people move to places with more expensive housing, they
economize on housing. That is, they are willing to substitute away from housing
when its relative price increases. You can see that at the new higher indifference
curve, the bundle would include a smaller amount of housing than the employee
consumed in Gainesville.
Thus, when considering how to compensate the employee for moving at the least
possible expense to the firm, we don’t have to provide him with the same bundle as
he had in Gainesville. Instead, we simply need to make sure he can reach the same
indifference curve in the new city (i.e., is just as happy). So, management should
raise salary in Atlanta above the level in Gainesville by something less than it takes
to reach the same bundle. The necessary salary corresponds to the budget line
labeled MA in the figure. This allows the employee to reach the same indifference
curve, UG, by buying the combination of goods (HA, EA).
[Figure: budget lines with E intercepts M̂A, MA, and MG (from top to bottom); the MA line has the Atlanta slope and just touches UG at (HA, EA), with less housing and more of everything else than at (HG, EG).]
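To put numbers on the three salary levels, the Python sketch below assumes a specific utility function, which the text does not: Cobb‐Douglas utility U(H, E) = H^a E^(1−a) with a housing share of a = 0.3. The Gainesville salary and the two rental rates are likewise hypothetical values chosen only for illustration. Under that assumption, the chosen bundle is H = aM/R and E = (1−a)M, and the salary needed to stay on UG works out to MG(RA/RG)^a.

```python
import math

ALPHA = 0.3  # hypothetical housing share in U(H, E) = H**ALPHA * E**(1 - ALPHA)

def utility(h, e):
    """Assumed Cobb-Douglas utility over housing H and everything else E."""
    return h ** ALPHA * e ** (1 - ALPHA)

def best_bundle(income, rent):
    """Utility-maximizing (H, E) under Cobb-Douglas utility with pE = 1."""
    return ALPHA * income / rent, (1 - ALPHA) * income

# Hypothetical numbers, for illustration only.
m_g, r_g, r_a = 50_000.0, 1.0, 1.5

h_g, e_g = best_bundle(m_g, r_g)
u_g = utility(h_g, e_g)

# Equation (8.2): income needed to buy the Gainesville bundle in Atlanta.
m_hat_a = r_a * h_g + e_g

# Smallest Atlanta income that still reaches U_G. With this utility function,
# maximized utility is proportional to income / rent**ALPHA.
m_a = m_g * (r_a / r_g) ** ALPHA
h_a, e_a = best_bundle(m_a, r_a)

print(round(m_g), round(m_a), round(m_hat_a))
print(math.isclose(utility(h_a, e_a), u_g))  # same indifference curve
print(h_a < h_g, e_a > e_g)                  # less housing, more of everything else
```

With these assumptions the ordering MG < MA < M̂A from the figure appears, and the employee economizes on housing in Atlanta. A different utility function would change the size of the gap between MA and M̂A, but not the logic.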
Now, let’s say the firm wants to move the employee from Gainesville to the Keys.
Housing is more expensive in the Keys. But, the employee really likes the beach,
fishing, etc. So, there is an amenity difference: the Keys are a more alluring location
than Gainesville to the employee, all else equal. In this case, the indifference curve
for housing and other consumption that the employee would have to reach in the
Keys to stay just as happy as he was in Gainesville would be below the old
indifference curve that he had in Gainesville, because the amenities themselves are
compensation for the move, at least in part. We will call this new indifference curve
UK=G, since the utility the employee gets from it (when he is in the Keys) is the same
as the utility he gets from his old, higher indifference curve (when he is in
Gainesville). The situation is shown in the figure.
[Figure: indifference curve UK=G lying below UG; budget lines from the E intercept MG with H intercepts MG/RK and MG/RG.]
We know the level of income required to obtain the original bundle from Gainesville
in the Keys, M̂K in the figure, is more than the firm needs to pay. Moreover, since the
employee gets extra utility just from being in the Keys, we don’t have to pay him as
much as it would cost to reach a point on indifference curve UG. Since UG (in
Gainesville) and UK=G (in the Keys) correspond to the same level of utility, we need
only reach the indifference curve UK=G. This takes a salary of MK.
[Figure: budget lines with E intercepts M̂K, MK, and MG (from top to bottom); the MK line has the Keys slope and just touches UK=G, which lies below UG.]
Thus, we have seen that differences in housing cost and differences in inherent
utility bearing conditions, called amenities, create differences in the wages paid to
similar workers doing similar jobs in different cities. Such wage differences are
known as compensating differentials. In locations that are more pleasant or cheaper
to live in, wages are lower, all else equal. In places that are cold, dreary, or otherwise
unappealing to an employee, workers will demand higher wages, as they will in
places that are more expensive. This concept can also be applied to jobs that are
risky, difficult, and unpleasant versus jobs that are relaxed, non‐stressful and safe.
These types of differences between jobs will be reflected in the wages that firms
have to pay.
Individual Choice, Individual Demand, and Market Demand
Our analysis of consumer theory allowed us to find the quantity of a good a
consumer purchased for given market prices and income. We were also able to
determine how changes in market prices affected the quantity the consumer
purchased. By tracing out the quantity of good X a consumer purchases at different
prices of X, holding income and the price of good Y fixed, we arrive at that
consumer’s individual demand curve.
The figure shows how this works. For a fixed level of income m and a price of
good Y of pY, we have three different prices for good X: pX1 is the most expensive,
pX2 is cheaper, and pX3 is the cheapest. We can see, based on consumer optimization,
that as the price of good X increases, the quantity that the consumer demands
decreases.
[Figure, upper panel: budget lines from the common Y intercept m/pY for pX1 > pX2 > pX3, with chosen bundles 1, 2, and 3 at quantities X1 < X2 < X3. Lower panel: price p on the vertical axis and quantity qX on the horizontal axis, with the points (q1, p1), (q2, p2), and (q3, p3) tracing out the demand curve d.]
If we transfer these three points of consumption to a graph with price on the
vertical axis and quantity demanded on the horizontal, we will obtain a graph of this
individual’s demand curve for good X, as shown in the lower panel of the figure. As
we can see, an increase in the price of good X tends to decrease the quantity of
good X this consumer demands. While theoretically Giffen goods could exist for
some consumers in some rare cases, this is the usual case.
Suppose the market for good X contained only this consumer, whose demand
curve we have just modeled. Then, the market demand curve for good X would be
identical to the consumer’s demand curve. Typically, however, there is more than
one consumer in a market. How do we go from individual to market demand?
Assume the market for good X consisted of three consumers. For a given price of
good X, each customer would demand a certain quantity. At each price, the market
demand curve represents the total quantity demanded by all the consumers in the
market. This is shown in the figure.
[Figure: individual demand curves d1, d2, and d3 and the market demand curve DX; prices p1, p2, p3 on the vertical axis and quantities x1, x2, x3 on the horizontal axis QX.]
The three individuals’ demand curves are shown by d1, d2, and d3. The quantities of
good X that consumer 1 demands at each price are shown by x1, x2, and x3, but
the other two consumers also demand a certain amount at each price. The market
demand curve, DX, is simply the sum of these three consumers’ quantities demanded
at each price; in other words, it is the horizontal summation of the individual
consumers’ demand curves. This generalizes to any market demand curve,
regardless of how many consumers there are.
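Horizontal summation is easy to mistake for adding prices rather than quantities, so a small Python sketch may help. The three linear demand curves below are hypothetical (the text does not give functional forms); each consumer’s quantity is cut off at zero once the price exceeds what he is willing to pay for the first unit.

```python
def individual_demand(intercept, slope):
    """Return q(p) = max(0, intercept - slope * p) for one consumer."""
    def quantity(price):
        return max(0.0, intercept - slope * price)
    return quantity

# Three hypothetical consumers; the parameters are illustrative only.
consumers = [
    individual_demand(10.0, 1.0),
    individual_demand(8.0, 2.0),
    individual_demand(6.0, 0.5),
]

def market_demand(price):
    """Horizontal summation: add the quantities demanded at a given price."""
    return sum(quantity(price) for quantity in consumers)

for price in (2.0, 5.0, 9.0):
    print(price, market_demand(price))
```

At a price of 5, the second consumer has dropped out of the market, so market demand there is the sum of only the first and third consumers’ quantities. This is why a market demand curve built from linear individual curves is generally kinked.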
Our conclusions from consumer theory about changes in the prices of related
goods and income levels also carry over to market demand. Recall from our analysis
of indifference curves and budget lines that if the price of good Y were to increase,
the individual consumer would not only buy less of good Y, but he would also
substitute into good X, which is now relatively cheaper than at the initial price
levels, if the goods are substitutes. Thus, if good Y is a substitute for X to enough
consumers, the demand curve for good X would shift outward, as in the figure.
Similarly, if the price of good Y were to decrease, the demand curve for good X
would shift inward.
[Figure: an increase in pY shifts the market demand curve for good X outward from DX to DX’; price p on the vertical axis and QX on the horizontal axis.]
Finally, recall that if a consumer’s income, m, were to increase, he would buy
more of good X at all prices if it is a normal good, and less if it is inferior. Assuming
a good is a normal good for most, if not all, of those who consume it, an increase in
market income, M, would shift the demand curve for good X outward. However, if
it is inferior for enough consumers, an increase in market income, M, shifts
demand inward. A decrease in income works in the opposite way.
[Figure: an increase in market income M shifts the market demand curve for a normal good outward from DX to DX’; price p on the vertical axis and QX on the horizontal axis.]
Willingness to Pay and Consumer Surplus
Generally, as the price of a good increases, the customer responds by buying
less; but how much worse off is he because of it? Or, equivalently, how much better
off is a consumer when he can buy a product at a lower price?
Let’s start with the simplest possible case where a consumer either buys one
unit of a good or none. The consumer has a reservation price reflecting the value
they place on the good. If the price of the good is above this reservation price, the
consumer is not willing to buy the good. If it is at or below this reservation price, the
consumer is willing to buy the good. We call this reservation price the consumer’s
willingness to pay for the good, denoted v.
Often, consumers are able to purchase goods at a price that is below their
maximum willingness to pay. For example, imagine a consumer values a certain car
at $20,000, but the car is currently listed at a price of $17,000. Because the
consumer is able to pay $17,000 for a car that he values at $20,000, he would be
$3,000 better off. This dollar measure of how much better off a consumer is because
he gets to purchase a good at the prevailing market price is called his consumer
surplus.
The calculation of consumer surplus is simple when a consumer has a choice
between buying either one unit of a good or zero units. When a consumer
potentially buys more than one unit, things become more complicated. In this
context, consumer surplus represents a very simplified measure of consumer
welfare. But, this measure is used extensively. To be exact, we would have to relate
the measure of consumer welfare to consumer theory more precisely and speak in
terms of compensating differentials for price differentials.
Instead, we start by letting v(q) denote a consumer’s willingness to pay for q
units in total. Thus v(10) represents the maximum amount the consumer would pay
in total for 10 units of the good if his alternative were to purchase zero units. We
then assume the consumer behaves as if to maximize the difference between this
total willingness to pay for q units and what he actually pays (at price p per unit),
which is the consumer’s surplus, cs(q):
$$cs(q) = v(q) - pq. \qquad (8.3)$$
Maximizing consumer surplus with respect to quantity gives
$$\frac{d\,cs}{dq} = \frac{dv}{dq} - p = 0. \qquad (8.4)$$
Letting v'(q) denote the derivative of willingness to pay, or marginal willingness
to pay, this could be written as
$$p = v'(q). \qquad (8.5)$$
This says a consumer will continue buying as long as their marginal willingness to
pay, or their willingness to pay for just a little more, exceeds the cost of a little more.
At any point where the value of another unit exceeds (is less than) the cost, there is
an incentive to buy more (less). That is, this tells us the price at which the consumer
will buy q units. So, it follows from this that marginal willingness to pay is the
individual consumer’s inverse demand curve in this model.
Let’s consider an example. The first two columns of the table below show an inverse
demand curve. The third column adds up the marginal willingness to pay for each
unit to get (total) willingness to pay. The last column shows consumer surplus if
price is 4 at each possible quantity.

q    p(q) = v'(q)    v(q)    cs(q)
1    7               7       3
2    6               13      5
3    5               18      6
4    4               22      6
5    3               25      5

For the first unit, we can see that his maximum willingness to pay is 7; as he only
pays 4, he has a surplus of 3. If 2 are purchased, total willingness to pay is 13 (7+6)
and consumer surplus is 5 (13‐8). The third unit brings total willingness to pay to 18
and purchasing it brings consumer surplus to 6. The fourth unit increases
willingness to pay and total payments in equal amounts, leaving consumer surplus
unchanged. As we will discuss more below, this is simply because the price exactly
equals marginal willingness to pay at 4 units and because we are assuming discrete
units. If units were both valued and available in fractions, or if the price were 3.50,
the fourth unit would increase consumer surplus. Purchasing more than 4 would
lower consumer surplus, since the marginal value falls below the market price.
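The table can be rebuilt in a few lines of Python, which also makes the rule “keep buying while marginal willingness to pay is at least the price” explicit. The variable names are ours; the numbers are those in the example.

```python
from itertools import accumulate

marginal_wtp = [7, 6, 5, 4, 3]  # v'(q) for units 1 through 5, from the table
price = 4

# v(q): running total of marginal willingness to pay.
total_wtp = list(accumulate(marginal_wtp))

# cs(q) = v(q) - p*q at each possible quantity.
surplus = [v - price * q for q, v in enumerate(total_wtp, start=1)]

# Units whose marginal value is at least the price are worth buying.
units_worth_buying = sum(1 for value in marginal_wtp if value >= price)

for q, (mv, v, cs) in enumerate(zip(marginal_wtp, total_wtp, surplus), start=1):
    print(q, mv, v, cs)
print("units worth buying:", units_worth_buying)
```

Consumer surplus peaks at 6 for both 3 and 4 units, matching the observation that the fourth unit leaves surplus unchanged when its marginal value exactly equals the price.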
The figure shows the example from the table above in graphical form. Since we
have discrete units (we did not specify the value for half a unit, or a tenth of a unit),
the shaded areas correspond to consumer surplus, the amount by which willingness
to pay exceeds actual payment. Willingness to pay in the figure would include not
only the surplus, but also the amount paid.
[Figure: step‐shaped inverse demand v'(q) = p(q) with heights 7, 6, 5, and 4 for units 1 through 4; the areas above the price are shaded as consumer surplus.]
Now that we know the demand curve tells us the consumer’s marginal
willingness to pay for the qth unit, it is clear that v(q), his total willingness to pay for
all q units, is the total area under the curve up to q. With a linear demand, this can be
calculated as the area of a triangle plus the area of the rectangle beneath it. For those
comfortable with the concept of an integral, it is also the integral of the (inverse)
demand curve, p(q), up to the quantity purchased. Letting x be the dummy variable
of integration, this is written as
$$v(q) = \int_0^q p(x)\,dx. \qquad (8.6)$$
If we subtract the total payment for q goods at a price of p, the expression for
consumer surplus is
$$cs(q) = \int_0^q p(x)\,dx - pq. \qquad (8.7)$$
This is the area in the graph that’s under the demand curve and above the current
market price.
Market consumer surplus is nothing more than the summation of all the
individual consumer surpluses, or, with p(x) now denoting the market inverse
demand curve and Q the market quantity,
$$CS(Q) = \int_0^Q p(x)\,dx - pQ.$$
Market consumer surplus can still be found geometrically, as it is simply the area
below the market demand curve, and above the market price.
Example: Consumer Surplus
Part 1
Suppose a consumer’s inverse demand is given by p = 10 − 0.4q and the current
price is $2. How much does the consumer buy and what is their consumer
surplus and willingness to pay at that quantity?
Solution: The consumer purchases where their marginal willingness to pay
equals the market price.
2 = 10 − 0.4q
0.4q = 8
q = 20
The easiest way to find cs(q) is to graph the demand curve and calculate the area
under it and above the price at the chosen quantity. This is shown in the figure.
Consumer surplus is just the area of the indicated triangle. The base of the triangle
is 20 and the height is 10‐2=8, so consumer surplus is
$$cs = \frac{20(8)}{2} = 80.$$
[Figure: linear demand d with vertical intercept 10; at the price of 2 the quantity is 20. The triangle above the price is labeled cs and the rectangle below it is labeled pq.]
We also observe that the total amount actually paid by the consumer is
$$pq = 2(20) = 40$$
so total willingness to pay is
$$v(q) = 80 + 40 = 120.$$
We could have found the same answer using integration.
Part 2
Assume there are 100 identical consumers in the market. What is the total
willingness to pay of all consumers and market consumer surplus when each
consumer maximizes their consumer surplus and price is 2?
Solution: The total value of all units sold and consumer surplus are simply 100
times the values per consumer in this case: total willingness to pay is
100(120) = 12,000 and market consumer surplus is 100(80) = 8,000.
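For readers who want to see the integration route, the Python sketch below evaluates equations (8.6) and (8.7) for this example numerically. The midpoint rule is our choice of method, not something the text prescribes; it happens to be exact for a linear demand curve, and exact fractions are used so the results come out as whole numbers.

```python
from fractions import Fraction

INTERCEPT = Fraction(10)
SLOPE = Fraction(2, 5)   # the 0.4 in p = 10 - 0.4q
PRICE = Fraction(2)
N_CONSUMERS = 100

def inverse_demand(q):
    """p(q) = 10 - 0.4q, the consumer's marginal willingness to pay."""
    return INTERCEPT - SLOPE * q

def integrate_midpoint(f, lower, upper, steps=1000):
    """Approximate the integral of f from lower to upper (midpoint rule)."""
    width = (upper - lower) / steps
    return sum(f(lower + (i + Fraction(1, 2)) * width) for i in range(steps)) * width

# Part 1: the consumer buys where marginal willingness to pay equals price.
quantity = (INTERCEPT - PRICE) / SLOPE
total_wtp = integrate_midpoint(inverse_demand, Fraction(0), quantity)  # (8.6)
consumer_surplus = total_wtp - PRICE * quantity                        # (8.7)
triangle_area = quantity * (INTERCEPT - PRICE) / 2

# Part 2: with identical consumers, market totals are simple multiples.
market_wtp = N_CONSUMERS * total_wtp
market_surplus = N_CONSUMERS * consumer_surplus

print(quantity, total_wtp, consumer_surplus, triangle_area)
print(market_wtp, market_surplus)
```

The integral and the triangle formula agree, as they must for a linear demand curve.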
Chapter 8 Terminology
The following is a list of terms that you should know in order to discuss and
apply the material from this chapter.
Compensation Indexing: A tool used by managers to find how much more or less
they must pay an employee for relocating them. It takes into account housing prices
in the different regions, the cost of everything else, and the employee’s amenity
preferences.
Compensating Differential: The additional amount of income needed by an
employee to do an undesirable or dangerous job or work in an undesirable town.
Amenity: A desirable feature of an area to which an employee is relocating. As
amenities for a certain town increase, the extra income a manager must pay an
employee for relocating decreases.
Consumer Surplus: The benefit received by consumers who can buy a product for
less than their willingness to pay for it. Approximately, it is the triangular area
under the demand curve and above the market price.
Chapter 9
Non‐Linear Pricing
Block Pricing and Two‐Part Pricing
In the last chapter, we developed the concepts of (total) willingness to pay and
consumer surplus. In this section, we apply those ideas to study more advanced
pricing strategies. Consider the standard way a firm with market power maximizes
profit. Assume, for purposes of the current discussion, that they are selling to a
single customer, who is a price taker. The figure is the standard depiction. The
monopolist sets MR = MC and sells the profit‐maximizing quantity qMON by charging
a price of pMON. The area above cost and below price is the firm’s profit (π); the area
above price and below the demand curve is consumer surplus (cs).
[Figure: demand p(q), marginal revenue MR, and constant MC. The monopoly outcome is (qMON, pMON); consumer surplus cs lies above pMON, profit π lies between pMON and MC, and the triangle DWL lies between qMON and q*, where p(q) crosses MC.]
What does the triangle labeled DWL represent? Note that units to the right of
qMON but to the left of q* have a higher value to the consumer than the marginal cost
of producing them (that is, p(q) > MC) and aren’t being sold. Put differently, total
value added by the firm could be increased by producing and selling to the point
where the marginal willingness to pay, or marginal value of another unit, is equal to
marginal cost. However, to do so, the firm would have to lower the price charged per
unit to marginal cost. This would reduce profits, since marginal revenue is less than
price for a firm with market power. In the figure, with a constant per unit cost,
this would eliminate profits altogether. The fact that the firm’s profit‐maximizing
output falls short of the level that would maximize value added results in a
deadweight loss (DWL).
So, from the firm’s point of view, there are two related problems with the picture
above. First, there is value added left to be had by increasing sales to q*, but they
can’t get any of it because they would have to lower price. Second, consumers get to
keep some of the value added by the firm in the form of consumer surplus. Block
and two‐part pricing attempt to rectify some of these deficiencies that are inherent in
simple linear pricing from the firm’s point of view.
The only way the firm can get rid of the deadweight loss is to sell every unit that
has a higher value to consumers than its cost to produce; in other words, it must sell
where marginal willingness to pay equals marginal cost. But, with simple linear
pricing, lowering price below the monopoly level lowers profit. The only way to take
away consumer surplus is to charge them more, but if the linear price is higher, they
buy less and profit is lower. The question is: how can the firm sell more while
simultaneously charging the customer enough in total to capture the new value
added and also leave them with little consumer surplus, instead taking it as profit?
Simple linear pricing is simply not up to that challenge.
Using block pricing, the firm “bundles” together q* units of its goods and
charges a single price P for the entire bundle, or package. Since it is selling q* units
as a bundle, and charging a single price for the entire package, intuition tells us that
it should charge a consumer his total willingness to pay for q* units. This is
depicted in the figure.
[Figure: demand p(q) and constant MC crossing at (q*, p*); the entire area under p(q) up to q* is the bundle price P = v(q*).]
By charging one price for the block of units (for example, a 24‐pack of Coke), the
firm is capturing a customer’s total willingness to pay as revenue, leaving no value
added in the form of consumer surplus.
If there are n identical customers who each buy one bundle for price P, profit is
$$\pi = nP - C(nq^*) \qquad (9.1)$$
where nq* is the total amount of individual units (i.e. single cans of Coke) produced.
This is shown in the figure. The portion of the consumer’s willingness to pay that
does not cover costs is left as profit. Since value added is maximized, this is as
large a profit as the firm could possibly make from one customer.
[Figure: the area under p(q) up to q* is split into cost, labeled C(nq*), below MC, and profit π between MC and p(q).]
Now let’s look at the math behind block
pricing, which we generalize to allow for
both non‐constant marginal cost and multiple (n) identical customers. Each
customer still buys a single bundle of q units at a price of P for the whole bundle.
The firm’s total cost is C(nq). The firm wishes to maximize profit, which is total
payment less cost, or
$$\pi = nP - C(nq), \qquad (9.2)$$
subject to the constraint that the total charge for a bundle cannot exceed a
customer’s willingness to pay, or
$$P \le v(q).$$
Since the firm wants to capture as much value added as possible, it would not
set the bundle price less than the consumer’s maximum willingness to pay, since
that would leave some surplus for the consumer. Thus, it wants the two to equal
each other, which means we can substitute v(q) for P into our original profit
function:
$$\pi = nv(q) - C(nq). \qquad (9.3)$$
Maximizing profit, we get
$$\frac{d\pi}{dq} = n\,\frac{dv}{dq} - \frac{dC}{d(nq)}\,\frac{d(nq)}{dq}.$$
Note that dv/dq is marginal willingness to pay, or p(q); dC/d(nq) is the rate at which
cost changes for a one unit change in total quantity, or marginal cost; and d(nq)/dq
is the rate at which total quantity changes for a one unit change in the amount per
bundle, or n (since increasing each bundle by one unit will result in n more total
units being produced). We can now rewrite our derivative as
$$\frac{d\pi}{dq} = n\,p(q) - n\,MC = 0, \quad \text{or} \qquad (9.4)$$
$$p(q) = MC. \qquad (9.5)$$
Notice that this is the same conclusion we saw graphically; that is, we want to
produce where marginal willingness to pay equals marginal cost. Solving this gives
the optimal bundle size q*, which can be used to find the price of the bundle P:
$$P = v(q^*). \qquad (9.6)$$
Two‐part pricing accomplishes the same thing in a different way. In two‐part
pricing, there is a price per unit p, and a fixed fee f that is paid for the right to
purchase the goods. An example of a firm that uses this method is Sam’s Club, where
a certain fee gets you monthly or annual membership to access the goods within the
store at a low per‐unit price. Notice that this pricing strategy is infeasible if resale is
possible, since customers could make a profit by buying a membership, paying for
goods at the low price, and reselling them on the market for a slightly higher price,
saving other consumers the membership fee.
The idea is to choose the per unit price so that consumers will choose the quantity
that maximizes value added, and then to charge a membership fee high enough to
capture all consumer surplus in the form of profit. In order to sell q* units, the firm
must set price equal to marginal cost. Then, the consumer surplus that remains is
captured as profit by setting the fixed fee equal to that amount, as shown in the
figure when dealing with one consumer.
[Figure: demand p(q) and constant MC crossing at q*; the per‐unit price p equals MC, and the triangle between p(q) and the price line is the fixed fee f = cs(q*).]
Now let’s look at the math behind two‐part pricing, which we generalize to allow
for both non‐constant marginal cost and multiple identical customers. Since there
are n consumers who each buy q units at a price of p per unit, and also pay a fixed
fee f, our profit function is
π = n( f + pq) − C (nq)
subject to the two constraints
1. f + pq ≤ v(q)
which says the total price the customer pays must not be greater than their total
willingness to pay for q units, and
2. p(q) = p or V '(q) = p
which are two equivalent ways of saying that the consumer will maximize their
individual surplus. In other words, customers will buy a certain quantity at a certain
price based on their own marginal willingness to pay – the firm cannot dictate both
price and quantity independently. The firm will raise the fee until the first constraint
holds with equality, so f + pq = v(q). Substituting this into our profit function gives
π = nv(q) − C(nq),
which is the same form it took in the block pricing example. Therefore, we
know the solution is the same when it is maximized, namely, the quantity q* at
which p = MC. We can then plug q* into the constraints to find p and f as
p* = MC (q*) (9.7)
and
f = v(q*) − p*q*. (9.8)
Example: Block and Two‐Part Pricing
Suppose there are n identical customers with willingness to pay
v(q) = 10q − 0.25q². If the firm has a constant marginal cost of $2, find profits
using simple linear pricing, the bundle price and profits using block pricing, and
the fixed fee and profits using two‐part pricing.
Solution: With simple linear pricing, the firm will set marginal revenue equal to
marginal cost:
\[ p = 10 - 0.5q \]
\[ \pi = n(10 - 0.5q)q - 2nq \]
\[ \frac{d\pi}{dq} = n(10 - q - 2) = 0 \]
\[ q = 8, \quad p = 6 \]
\[ \pi = n(6 - 2)8 = 32n \]
For both block and two‐part pricing, the firm needs to sell the quantity q* that
maximizes total surplus. Total surplus is value minus cost, so it is maximized where
marginal willingness to pay equals marginal cost:
\[ 10 - 0.5q = 2 \quad \Rightarrow \quad q^* = 16 \]
With block pricing, the price of the bundle is equal to the total willingness to pay
for 16 units:
\[ P = v(16) = 10(16) - 0.25(16^2) = 96 \]
\[ \pi = n(96 - 2(16)) = 64n \]
With two‐part pricing, the per‐unit price is equal to marginal cost, and the fee is
equal to the leftover consumer surplus:
\[ p = 2 \]
\[ f = cs = v(q^*) - pq^* = 96 - 2(16) = 64 \]
\[ \pi = 64n \]
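The arithmetic in this example can be checked numerically. The sketch below is only an illustration, not part of the original solution: the helper `bisect_root` and all variable names are hypothetical, and profits are reported per customer (multiply by n for the totals in the text).

```python
def bisect_root(g, lo, hi, tol=1e-10):
    """Find q in [lo, hi] with g(q) = 0, assuming g(lo) > 0 > g(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

MC = 2.0  # constant marginal cost from the example

def v(q):
    """Total willingness to pay for q units."""
    return 10 * q - 0.25 * q ** 2

def mwtp(q):
    """Marginal willingness to pay v'(q), i.e. inverse demand p(q)."""
    return 10 - 0.5 * q

def mr(q):
    """Marginal revenue under simple linear pricing, d/dq [p(q) * q]."""
    return 10 - q

# Simple linear pricing: marginal revenue equals marginal cost.
q_lin = bisect_root(lambda q: mr(q) - MC, 0, 20)
p_lin = mwtp(q_lin)
profit_lin = (p_lin - MC) * q_lin

# Block and two-part pricing: marginal willingness to pay equals marginal cost.
q_star = bisect_root(lambda q: mwtp(q) - MC, 0, 20)
P_block = v(q_star)                     # bundle price = total willingness to pay
profit_block = P_block - MC * q_star

p_two = MC                              # per-unit price
fee = v(q_star) - p_two * q_star        # fixed fee = leftover consumer surplus
profit_two = fee + (p_two - MC) * q_star

print(q_lin, p_lin, profit_lin)
print(q_star, P_block, profit_block)
print(p_two, fee, profit_two)
```

Up to the bisection tolerance, the three printed lines should reproduce the values worked out above: 8, 6, 32 for linear pricing, 16, 96, 64 for block pricing, and 2, 64, 64 for two‐part pricing.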
One of the main assumptions for both of these pricing models was that the
customers were identical. Typically, this is not the case, and different customers will
have different valuations of a firm’s products. When this is the case, a single fixed fee
or bundle will not capture as much consumer surplus as when all the customers
were identical.
If the firm is able to segment its customers into separate groups based on their
willingness to pay, it could charge them each a separate “membership” fee and
extract the maximum amount of surplus from each customer group. For example,
when a golf club offers discounted membership fees to seniors, this is
exactly what it is doing – charging them less based on their willingness to pay.
It is not always possible to separate different customer groups by their
willingness to pay, or if it is, it may not be feasible to charge them accordingly. If a
firm cannot explicitly separate its customers, it has a few options. The first is to
continue using two‐part pricing, but to change the pricing mechanism slightly. When
doing this, there are two tradeoffs the firm must balance: i) as the firm increases the
fixed fee, in order to capture more consumer surplus from the consumers with the
higher willingness to pay, it may cut out some of the consumers with the lower
willingness to pay completely from the market; ii) as the firm increases the per‐unit
price, in order to capture more revenue, it will lose sales per customer, but
customers won’t drop out of the market completely. This flexibility of two‐part
pricing means the firm has options in the presence of customers that aren’t
identical.
The second option the firm has is to use “menu pricing” to get customers to
segment themselves. By constructing different bundles that have specific quantities
and prices based on different customer groups’ willingness to pay, the firm can
effectively charge different customers different prices, thereby increasing its overall
profits. This topic is the subject of the next section.
Menu Pricing
In block and 2‐part pricing, we assumed we had identical customers. If we had
multiple customer types, but still had complete knowledge about their willingness
to pay, we would be able to segment the market and just have discriminating block
or 2‐part prices. We talked briefly about how a two‐part price might be determined
when there are multiple consumer types but the seller cannot identify them
explicitly. In other words, there is asymmetric information about customer
demand: the customer knows their individual demand type and the firm does not.
This is a form of adverse selection.
As we briefly discussed when considering block pricing, it is possible to set a
single block price, perhaps based on some sort of “average” customer’s willingness
to pay. But, this will cause customers that have “less than average” willingness to
pay to reject the bundle entirely.
Another approach is to use 2‐part pricing, but, now we have to consider “small”
value customers and “high” value customers. When choosing a single fee and a
single per‐unit price for multiple customer types, there is a trade‐off between
lowering the fixed fee and raising the per‐unit price in order to gain some small
value customers, and raising the fixed fee and lowering the per‐unit price in order to
capture more surplus from high value customers. Intuitively, the ideal is when these
effects essentially offset each other, but we did not derive the precise solution.
Two‐part pricing is thus more flexible, and so more profitable, than block pricing in
this setting.
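The trade‐off in choosing a single fee and a single per‐unit price can be explored with a small grid search. This is a rough sketch under assumptions that are not part of this passage: marginal willingness to pay is taken to be linear, a − q, with intercepts 10 and 15, there are 10 low value and 5 high value customers, and marginal cost is constant at 2. All names are hypothetical.

```python
N_L, N_H = 10, 5        # assumed numbers of low and high value customers
A_L, A_H = 10.0, 15.0   # assumed intercepts of marginal willingness to pay a - q
MC = 2.0                # assumed constant marginal cost

def quantity(a, p):
    """Units bought at per-unit price p by a customer with marginal value a - q."""
    return max(a - p, 0.0)

def surplus(a, p):
    """Consumer surplus before paying the fee: area between a - q and p."""
    return 0.5 * quantity(a, p) ** 2

def profit_serving_both(p):
    """Fee set to the low type's surplus, so both types join."""
    fee = surplus(A_L, p)
    units = N_L * quantity(A_L, p) + N_H * quantity(A_H, p)
    return (N_L + N_H) * fee + (p - MC) * units

def profit_high_only(p):
    """Fee set to the high type's surplus, so low types drop out."""
    fee = surplus(A_H, p)
    return N_H * (fee + (p - MC) * quantity(A_H, p))

prices = [MC + 0.01 * k for k in range(801)]   # per-unit prices from 2.00 to 10.00
best_both = max(prices, key=profit_serving_both)
best_high = max(prices, key=profit_high_only)

print(best_both, profit_serving_both(best_both))
print(best_high, profit_high_only(best_high))
```

With these assumed numbers, keeping the low value customers in the market means pricing above marginal cost and charging a smaller fee, while serving only the high value customers means pricing at marginal cost with a large fee. Which option earns more depends on the mix of customer types.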
Both of the above methods are ways of coping with multiple customer types
using a single pricing mechanism; however, with a more sophisticated pricing
system – menu pricing – it is possible to do better. Menu pricing attempts to deal
with the problem of asymmetric information about customer types by offering
multiple bundles at different prices, with a different bundle and price designed for
each type of customer. There is no way to force a customer to buy the intended
bundle, due to incomplete information (i.e. there is no potential for 1st degree price
discrimination), so the firm must design the bundles so that customers will
voluntarily make the choice the firm intended.
With menu pricing, if there are two consumer types, high value customers and
low value customers, but the firm can’t tell them apart, the firm offers two bundles.
The first bundle is targeted towards big value customers, and thus has a high
quantity q, a high bundle price P, and a low price per unit p, where p = P/q. This
should make intuitive sense – a customer that values a product a lot will, for a given
price, want to buy a lot of the product. Thus, this customer will be more attracted to
a bundle that has a large quantity of the product at a low price per unit. The second
bundle is targeted towards small value customers; these customers aren’t
concerned with buying copious amounts of the product, and so will buy a bundle
with a low quantity q, a low bundle price P, and a high price per unit p = P/q. The
reason they are willing to pay a higher price per unit is because they aren’t buying
that many units to begin with, and as such will avoid the higher total price of the
first bundle that’s intended for the big customers.
This reveals a subtle but important point as to why profits won’t be as high with
menu pricing as they would be if a firm had complete knowledge about the market,
and was able to engage in perfect price discrimination – charging each customer
their maximum willingness to pay. Since the big value customers have a higher
willingness to pay than the small value customers, regardless of how the small
bundle is designed, the big customers will always value it higher than the small
customers; in other words, as long as the small customers are willing to buy the
small bundle, the big customers will always get some surplus from buying the small
bundle. When designing the big bundle, the firm would like to charge a bundle price
P that is equal to the big customers’ total willingness to pay, just as we showed when
we introduced block pricing – to get all of the surplus as profit. But, if the firm did this, leaving
the big customers zero surplus from buying the big bundle, the big customers would
simply buy the small bundle, since they will always get some surplus from that. So,
when designing the big bundle, the firm must allow the big customers to retain
some of their consumer surplus to ensure that they will buy it. This is why it is not
as profitable as if the firm could discriminate and only offer the big bundle to the big
customers.
Now let’s define terms with regard to the menu pricing problem.
nH: the number of high demand customers, whose value is VH(qH) each
nL: the number of low demand customers, whose value is VL(qL) each
Total Cost: C(nLqL + nHqH), where nLqL + nHqH is the total quantity sold
Two bundles, one big (PH, qH) and one small (PL, qL)
Using this notation, profit is
π = nL PL + nH PH − C (nL qL + nH qH )
subject to the following constraints.
(1) PL ≤ VL (qL ) : You cannot charge the low type more than their total
willingness to pay, or they won’t buy
(2) PH ≤ VH (q H ) : You cannot charge the high type more than their total
willingness to pay, or they won’t buy
(3) VL (qL ) − PL ≥ VL (q H ) − PH : The low value customers must get at least as
much consumer surplus from their bundle as they do from the big
bundle
(4) VH (q H ) − PH ≥ VH (qL ) − PL : The high value customers must get at least as
much consumer surplus from their bundle as they do from the small
bundle
Constraints (1) and (2) are called participation constraints because customers
will not participate if you violate them. Based on what was said earlier, we know we
won’t be able to ever charge the high value customers their total willingness to pay,
since they could just buy the low value bundle and retain surplus; so, the second
constraint is non‐binding, in the sense that it will be automatically satisfied given
the setup of the rest of the problem. The first constraint, though, will “bind.” That is
because we can make more profit from charging the small customer a higher
package price as long as they buy. So, we will want to charge them the highest price
possible, equal to their total willingness to pay. Any higher and they would not buy.
So, constraint (1) actually constrains our choice, and holds with equality, while
constraint (2) does not actually constrain our choice.
Constraints (3) and (4) are called selection constraints because customers will
select the “wrong” bundle if you violate them; they are also sometimes called
incentive constraints. We know V(q) – P is consumer surplus, so the above
constraints can be rewritten as
3’. CSL(qL) ≥ CSL(qH)
4’. CSH(qH) ≥ CSH(qL).
Since we never need to worry about the low value customers buying the big bundle,
constraint (3) will never be violated, and, we can ignore it. We would like to charge
the big customer their whole willingness to pay, leaving them no surplus. But, as
we’ve already discussed, trying to do so would cause the high value customers to
buy the small bundle. So, constraint (4) is binding: left unchecked, we would violate
it, so at the optimum it holds with equality and we cannot ignore it.
Since we’ve narrowed these four constraints down to two that bind (1 and 4)
let’s look at these two again. Constraint (1) says that the price of the small bundle
should be less than or equal to the small value customers’ total willingness to pay. Is
there any reason the firm should charge them a lower price than their total
willingness to pay? Since they don’t have any other option (i.e. they won’t consider
the big package), no. Thus, the firm should set (total package) price equal to their
total willingness to pay, or PL = VL(qL). (Another reason equality should hold is that
the lower the price is on the small bundle, the more consumer surplus a high value
customer will keep if they bought the small bundle, so the more likely it is that the
high value customer will buy the small bundle).
Constraint (4) says high value customers should get at least as much consumer
surplus from the big bundle as they would from the small bundle. Is there any
reason they should get more? Remember, the objective is to incentivize the high
value customer to buy the big bundle; giving them any more consumer
surplus than that does nothing but reduce profits for the firm. Thus,
equality also holds for the fourth constraint, or VH(qH) – PH = VH(qL) – PL.
Constraint (1) implies PL = VL(qL) and constraint (4) implies VH(qH) – PH = VH(qL)
– PL. We can rearrange constraint (4) and, solving for PH we get
PH = VH(qH) – VH(qL) + PL.
This says we can set the price for the big package PH equal to the price of the small
package PL plus the extra value a high value customer gets from buying the big
package rather than the small package, VH(qH) – VH(qL). Substituting constraint (1)
into the rearranged constraint (4) we get
PH = VH(qH) – VH(qL) + VL(qL).
Substituting PL (from constraint (1)) and PH (from constraint (4), in its final form)
into our original profit function, we get
π = n LVL (qL ) + n H (VH (q H ) − VH (qL ) + VL (qL )) − C(n L qL + n H q H ) .
Remember, qH and qL are the sizes of the big and small bundle, respectively – which
the firm chooses. Maximizing with respect to qH gives:
\[ \frac{\partial \pi}{\partial q_H} = n_H \frac{dV_H}{dq_H} - n_H \frac{dC}{dQ} = 0. \]
We can factor out nH, and, recognizing that dVH/dqH is the marginal willingness to
pay pH(qH) and dC/dQ is MC, we obtain
pH (qH ) = MC .
We know that when marginal willingness to pay equals marginal cost, we are
maximizing value added. This is shown in the figure below, where the quantity that
maximizes value added for the big demander is qH*. So, the result says for high value
customers, we should sell the quantity that maximizes value added.
Maximizing with respect to qL we obtain
\[ \frac{\partial \pi}{\partial q_L} = n_H \left( -\frac{dV_H}{dq}(q_L) + \frac{dV_L}{dq}(q_L) \right) + n_L \frac{dV_L}{dq_L} - n_L MC = 0. \]
Because dVH/dq(qL) is pH(qL), the high value customers’ marginal willingness to pay
at the low quantity, dVL/dq(qL) is pL(qL), the low value customers’ marginal
willingness to pay at the low quantity, and dVL/dqL is also pL(qL), we have
nL (pL (qL ) − MC) = n H (p H (qL ) − pL (qL ))
and dividing by nL we obtain
\[ p_L(q_L) - MC = \frac{n_H}{n_L} \left( p_H(q_L) - p_L(q_L) \right). \]
Consider the right side of the equation. We know that for a given quantity, high
value customers will pay more than low value customers; so, pH(qL) – pL(qL) must be
positive. Since the right side is positive, it follows that pL(qL) > MC.
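The two first‐order conditions can also be solved numerically for any downward‐sloping marginal willingness to pay curves. The sketch below is illustrative only: the function names are hypothetical, marginal cost is assumed constant, and each condition is assumed to cross zero exactly once on the search interval. The linear curves used at the bottom are assumed values.

```python
def bisect_root(g, lo, hi, tol=1e-10):
    """Find q in [lo, hi] with g(q) = 0, assuming g(lo) > 0 > g(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def menu_quantities(p_low, p_high, mc, n_low, n_high, q_max):
    """Bundle sizes implied by the two first-order conditions.

    p_low and p_high are the marginal willingness to pay functions of the
    two customer types; mc is a constant marginal cost (an assumption here).
    """
    # Big bundle: p_H(q_H) = MC.
    q_high = bisect_root(lambda q: p_high(q) - mc, 0.0, q_max)
    # Small bundle: n_L (p_L(q_L) - MC) = n_H (p_H(q_L) - p_L(q_L)).
    q_low = bisect_root(
        lambda q: n_low * (p_low(q) - mc) - n_high * (p_high(q) - p_low(q)),
        0.0,
        q_max,
    )
    return q_low, q_high

# Assumed linear marginal willingness to pay curves for illustration.
p_low = lambda q: 10 - q
p_high = lambda q: 15 - q
q_low, q_high = menu_quantities(p_low, p_high, mc=2.0, n_low=10, n_high=5, q_max=15.0)

print(q_low, q_high)
print(p_low(q_low) > 2.0)   # the small bundle stops where p_L is still above MC
```

In this illustration the small bundle ends at a quantity where the low value customers' marginal willingness to pay is still above marginal cost, which is the pL(qL) > MC result.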
Look at a graph of the inverse demand curves for each customer type, where we have
assumed marginal cost is constant to keep the graph as clear as possible.
From our first partial derivative, we found pH(qH)=MC; so, the quantity in the big
bundle will be qH*, and this maximizes value added. The quantity that maximizes
value added for the small bundle is qLe (e for socially efficient) since that is where
pL(qL)=MC; but, in our second partial derivative we found that pL(qL)>MC. So
the actual quantity of the small bundle, qL*, must be somewhere where the price is
higher than the cost; in other words, to the left of qLe. The intuitive reason as to why
we don’t maximize value added with the small bundle is because we are unable to
separate our customers. If we were able to separate them and charge each customer
type by their willingness to pay, we’d simply maximize value added, and capture the
entire consumer surplus using either block or 2‐part pricing. In menu pricing, we
cannot separate our customer types, so the small bundle quantity doesn’t maximize
value added.
[Figure: inverse demand curves pL(qL) and pH(qH) with constant marginal cost MC;
quantities qL*, qLe, and qH* marked in that order on the q axis.]
Example
Type 1 customers have a willingness to pay of V1 = 10q1 – q1²/2 and type 2
customers have a willingness to pay of V2 = 15q2 – q2²/2, n1 = 10 and n2 = 5, and the
cost function is C(n1q1 + n2q2) = 2(n1q1 + n2q2). (Type 2 customers are “high value”
since for any given quantity they have a higher willingness to pay.)
There are two binding constraints. The first is that the price of the small bundle
must equal the low value customers’ willingness to pay, or
\[ V_1 = 10q_1 - \frac{q_1^2}{2} = P_1. \]
Second, the high value customers must get the same amount of consumer surplus
(or slightly more) from buying their bundle as from buying the small bundle, or
\[ V_2(q_2) - P_2 = V_2(q_1) - P_1 \]
\[ 15q_2 - \frac{q_2^2}{2} - P_2 = 15q_1 - \frac{q_1^2}{2} - P_1. \]
Substituting the first constraint in for P1 in the second constraint and solving for P2
we obtain
\[ P_2 = 15q_2 - \frac{q_2^2}{2} - 15q_1 + \frac{q_1^2}{2} + 10q_1 - \frac{q_1^2}{2} \]
\[ P_2 = 15q_2 - \frac{q_2^2}{2} - 5q_1. \]
Since we have an expression for the price of each bundle as a function only of the
quantities, we can now set up a profit function.
\[ \pi = 10\left(10q_1 - \frac{q_1^2}{2}\right) + 5\left(15q_2 - \frac{q_2^2}{2} - 5q_1\right) - 2(10q_1 + 5q_2). \]
To find the quantities of the two bundles, take the partial derivatives. We’ll start
with q2 since it only occurs in 3 places:
\[ \frac{\partial \pi}{\partial q_2} = 5(15 - q_2) - 2(5) = 0 \]
\[ 15 - q_2 = 2 \]
\[ q_2 = 13 \]
Now, take the derivative with respect to q1.
\[ \frac{\partial \pi}{\partial q_1} = 10(10 - q_1) + 5(-5) - 10(2) = 0 \]
\[ 100 - 10q_1 - 25 - 20 = 0 \]
\[ 10q_1 = 55 \]
\[ q_1 = 5.5 \]
To find the prices of the bundles, plug the quantities into the equations for P1 and
P2, which gives P1 = 39.875 and P2 = 83.
Our earlier conclusion was that the quantity in the small bundle would be less than
the quantity that maximizes value added. We can check that by setting the low value
customers’ marginal willingness to pay equal to the marginal cost:
\[ \frac{dV_1}{dq_1} = 10 - q_1 = 2 = MC \]
\[ q_1 = 8 \]
Since 8 > 5.5, this fits. The following graph illustrates the values in this example. In
essence, we are restricting the quantity of the small bundle to make it less likely a
high value customer will be tempted to buy it. This keeps the high value customers
“honest,” in that they will purchase the bundle that was designed for them – the big
one.
[Figure: inverse demand curves pL(qL) and pH(qH) with MC = 2; quantities 5.5, 8, and
13 marked on the q axis.]
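As a check on this example, the bundle prices, total profit, and the slack in each of the four constraints can be computed directly from the quantities. This is a minimal sketch; the variable names and the `slack` dictionary are hypothetical and simply restate constraints (1) through (4) for type 1 (low value) and type 2 (high value) customers.

```python
N1, N2 = 10, 5   # numbers of type 1 (low value) and type 2 (high value) customers
MC = 2.0         # constant marginal cost from the example

def V1(q):
    """Type 1 total willingness to pay."""
    return 10 * q - q ** 2 / 2

def V2(q):
    """Type 2 total willingness to pay."""
    return 15 * q - q ** 2 / 2

q1, q2 = 5.5, 13.0                 # bundle sizes found above
P1 = V1(q1)                        # constraint (1) holds with equality
P2 = V2(q2) - V2(q1) + P1          # constraint (4) holds with equality
profit = N1 * P1 + N2 * P2 - MC * (N1 * q1 + N2 * q2)

# Slack in each constraint; a value of zero means the constraint binds.
slack = {
    1: V1(q1) - P1,                          # low type participation
    2: V2(q2) - P2,                          # high type participation
    3: (V1(q1) - P1) - (V1(q2) - P2),        # low type selection
    4: (V2(q2) - P2) - (V2(q1) - P1),        # high type selection
}

print(P1, P2, profit)
print(slack)
```

Constraints (1) and (4) should show zero slack, while (2) and (3) should be strictly positive, matching the argument that only the low type's participation constraint and the high type's selection constraint bind.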
To intuitively wrap up what’s going on with menu pricing, let’s try to see the
concepts using a graph. We know the quantities that maximize value added, qHE for
the big bundle and qLE for the small bundle, occur where marginal value equals
marginal cost. If we let the small bundle have quantity qLE, the low value customers’
total willingness to pay, which is the price of the small bundle (PL), is shown in the
left figure below. What is profit from the small demanders? We’re getting the entire
area VL(qLE) as revenue, and our (variable) costs are simply MC×qLE, so profit is the
triangle labeled “π” in the next figure.
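[Figure, two panels: inverse demand curves pL(q) and pH(q) with MC, and quantities
qLE and qHE on the q axis. Left panel: the area VL(qLE). Right panel: the profit
triangle labeled π.]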
Suppose we go through the same process for the big bundle, and set the price of
the big bundle PH as the high value customers’ total willingness to pay for qHE units.
Now, when a high value customer buys the big bundle, he gets no consumer surplus
(since the price is exactly his total willingness to pay); however, if he buys the small
bundle, his total willingness to pay is shown in the figure on the left. Since the price
of the small bundle is only the low value customers’ total willingness to pay [shown
in an earlier graph as VL(qLE)], the high value customers’ consumer surplus from
buying the small bundle is shown in the next figure.
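[Figure, two panels: inverse demand curves pL(q) and pH(q) with MC, and quantities
qLE and qHE on the q axis. Left panel: the area VH(qLE). Right panel: the area CSH.]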
Since the high value customers get this amount of consumer surplus from buying
the small bundle, but none from buying the big bundle if its package price were to
equal their entire willingness to pay, they would never buy the big bundle. Thus, we
cannot set the price of the big bundle as the high value customers’ total willingness
to pay.
In order for the high value customers to buy the big bundle, they must receive at
least the same amount of consumer surplus as they would if they bought the small
bundle. Their total willingness to pay for qHE units is shown in the left figure below.
The price we can charge them is this area, minus the consumer surplus they would
get from buying the small bundle, as shown in the right figure below.
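[Figure, two panels: inverse demand curves pL(q) and pH(q) with MC, and quantities
qLE and qHE on the q axis. Left panel: the area VH(qHE). Right panel: the price PH,
which is VH(qHE) less the area CSH.]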
Our profit from the big bundle, then, is just the price (PH) minus the cost
(everything that falls below MC). This is shown in the left panel of the next figure.
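[Figure, two panels: inverse demand curves pL(q) and pH(q) with MC, and quantities
qLE and qHE on the q axis. Left panel: profit π from the big bundle above the cost
area C(qHE). Right panel: the small bundle reduced by ∆qL, with the lost profit −∆πL.]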
Our earlier claim was that this is not the way to maximize profit. We stated that
restricting the size of the small bundle would actually increase our overall profit. So,
let’s look at our profit if the size of the small bundle is less than qLE. Shrinking the
small bundle lowers the amount of surplus we have to let the high demand customer
retain. In the
figure on the right above, we’ve lowered the quantity of the small bundle by ∆qL.
Since we can only charge the low value customers their total willingness to pay, and
we’ve shrunk the size of the bundle, we can no longer capture the gray triangle as
profit. Therefore, the triangle is lost profit from our low type customers (labeled
−∆πL in the graph).
Since we’ve shrunk the size of the small bundle, the consumer surplus the high
value customers would get if they bought the small bundle shrinks to CSH’, which
means the consumer surplus we must leave them when pricing the big bundle
shrinks to that value as well. Thus, we can charge more for the big bundle,
specifically by the amount labeled ∆πH (the loss of the high value customers’
consumer surplus from buying the new smaller bundle).
Now let’s assume there is just one of each type of consumer (for now, just to
make the graphical analysis simple). Since the increase in profits from the high value
customers is bigger than the decrease in profits from the low value customers (∆πH
> ∆πL), total profit is greater. So, now we know the size of the bundle for the small
demander is less than qLE, but how much? The first thing is to notice that in the
previous graph, the change in profits ∆π for either customer type is at the margin; in
other words, it’s the difference in profits for lowering the quantity of the small
bundle by a very small amount. In the graph, the change in quantity ∆qL is bigger to
illustrate the point, but it’s important to understand that the theory applies for
minute changes in quantity.
Since we know we want to keep lowering qL as long as ∆πH > ∆πL, it follows that
we want to keep lowering the quantity until these two are equal at the margin; in
other words, until the profit we lose from the low value customers is exactly equal
to the profit we gain from the high value customers for a marginal (tiny) change in
qL. This is shown in the figure. The optimal quantity for the small bundle is qL*.
[Figure: inverse demand curves pL(q) and pH(q) with MC; at qL* the marginal gain
∆πH equals the marginal loss −∆πL; quantities qL*, qLE, and qHE marked on the q
axis.]
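The stopping rule can be illustrated with one customer of each type. The numbers below are assumptions chosen for illustration (linear marginal willingness to pay with intercepts 10 and 15, and a marginal cost of 2), and the function names are hypothetical.

```python
A_L, A_H, MC = 10.0, 15.0, 2.0   # assumed intercepts and marginal cost

def p_low(q):
    """Low value customer's marginal willingness to pay."""
    return A_L - q

def p_high(q):
    """High value customer's marginal willingness to pay."""
    return A_H - q

def marginal_loss_low(q_low):
    """Profit lost from the low type per unit removed from the small bundle."""
    return p_low(q_low) - MC

def marginal_gain_high(q_low):
    """Extra amount that can be charged to the high type per unit removed."""
    return p_high(q_low) - p_low(q_low)

# Start at the efficient size (where p_low = MC) and shrink the small bundle.
for q in [8.0, 6.0, 4.0, 3.0]:
    print(q, marginal_loss_low(q), marginal_gain_high(q))
```

At the efficient size the marginal loss is zero while the marginal gain is positive, so shrinking the bundle raises profit. With these assumed numbers the two are equal at a quantity of 3, which is where shrinking should stop.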
Remember, we are assuming there is one customer of each type (or, that there
are equal numbers of each type). If this were not the case, you would have to take
that into account when deciding how much profit to take away from the low value
customers. If we think back to our first order condition that we found when first
deriving our profit function, we had
\[ p_L(q_L) - MC = \frac{n_H}{n_L} \left( p_H(q_L) - p_L(q_L) \right). \]
Looking at the above graph, we can connect our conclusions to this mathematical
equation. The loss in profit from low value customers if we decrease qL slightly, ∆πL,
is the difference between their marginal willingness to pay and marginal cost, or
pL (qL ) − MC , which is just the left‐hand side of the equation. The gain in profit from
high value customers from decreasing qL slightly, ∆πH, is the difference between
their marginal willingness to pay and the low value customers’ marginal willingness
to pay, or p H (qL ) − pL (qL ) , which is just the right‐hand side of the equation when nH
= nL. So, we see that the graph relates to our first derivative of profit. Furthermore,
since the equation has nH/nL on the right‐hand side, we can analyze what would
happen if the ratio of high to low value demanders were not equal to 1.
If the number of high value customers increases, then the ratio nH/nL will
increase, which means the profit we gain from high value customers (on the
margin) is more important relative to the profit we lose from low value customers.
Since we gain profit from high value customers by lowering the quantity of the small
bundle, the quantity in the small bundle will be less (compared to qL* in the figure).
If the number of high value customers decreases, then the ratio nH/nL will decrease,
which means the profit we lose from the low value customers (on the margin) is
more important relative to the profit we gain from the high value customers. Since
we lose profit from low value customers by lowering the quantity of the small
bundle, the quantity in the small bundle will be greater (compared to qL* in the
figure).
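The effect of the customer mix on the small bundle can be seen by solving the first‐order condition for several values of nH/nL. This sketch assumes linear marginal willingness to pay, a − q, with illustrative intercepts and marginal cost; the function name and default values are hypothetical.

```python
def small_bundle(ratio, a_low=10.0, a_high=15.0, mc=2.0):
    """Small bundle size solving p_L(q) - MC = ratio * (p_H(q) - p_L(q)).

    Assumes p_i(q) = a_i - q, so p_H - p_L is the constant a_high - a_low.
    A result of 0 means the low value customers are not served at all.
    """
    return max(a_low - mc - ratio * (a_high - a_low), 0.0)

ratios = [0.0, 0.25, 0.5, 1.0, 2.0]    # values of n_H / n_L
sizes = [small_bundle(r) for r in ratios]
for r, q in zip(ratios, sizes):
    print(r, q)
```

As the ratio rises, the small bundle shrinks. When high value customers are numerous enough, the formula would call for a negative quantity, which this sketch interprets as dropping the small bundle altogether.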
The way we’ve introduced menu pricing has been through offering different size
packages at different package prices; essentially, multiple customer block pricing.
You can also use 2‐part pricing in a similar way, by offering multiple combinations
of fees and per unit prices (and, often, other sorts of benefits as well). Customers
then choose which kind of “membership” they want to have. Using the same
principles from bundle pricing, we know the quantity we want the low value
customers to buy will be less than the efficient quantity, qL* in the figure. To get
them to buy that amount, the price must be pL in the figure. Then, we set their fixed
fee as their consumer surplus (the triangle labeled fL). We still want to sell the
efficient quantity to the high value customers (qHE). The price that gets them to buy
that quantity is pH, which is the marginal cost.
[Figure: inverse demand curves pL and pH with MC; the per‐unit price pL at quantity
qL*, the triangle fL above pL, and quantities qL*, qLE, and qHE marked on the q axis.]
The consumer surplus the high value customers will get if they buy the small
membership and pay the low fee is labeled CSH in the first graph below. Since they
could get this amount of surplus by choosing the option intended for the low
demand type, they must keep this amount of surplus if they choose the option
intended for them, or, they will not choose it. Therefore, the high fee will be their
total consumer surplus when price equals marginal cost, less that amount of
surplus. This is shown as fH in the second figure below. So instead of offering two
different size bundles for two different prices, we’re now offering two different
memberships, one with fee fL that gets you prices of pL, and another with fee fH that
gets you prices of pH, where fL < fH but pL > pH.
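[Figure, two panels: per‐unit prices pL and pH = MC. First panel: the fee fL and the
area CSH. Second panel: the area CSH and the fee fH.]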
An important point to take away is that we always want the high value
customers to buy the quantity that maximizes value added, i.e. the socially efficient
quantity. This problem is analogous to a number of adverse selection problems in
other areas of economics, such as choosing income tax rates. If the government has a
certain amount of revenue it wants to raise in the form of tax, and workers can be
classified as “high productive” workers and “low productive” workers, the
government will want to get most of its tax revenue from the “high productive”
workers, as they provide a greater share of the taxable income (just as we wanted to
take most of our profits from the high value customers, as they have the highest
willingness to pay). The problem is, increasing the marginal tax rate on the “high
productive” workers will cause them to want to act like “low productive” workers,
so they can escape the higher tax rates (just as the high value customers in our
problem were tempted to buy the small bundle in order to get more consumer
surplus).
The conclusion is that you want the marginal tax rate on the “most productive”
worker to be 0, in order to incentivize that worker to continue being productive
(take on another project, etc.) and not to be less productive; in our example, it’s that
you want the high value customers to buy a large quantity, and not to be tempted to
buy a small quantity. That does not mean you want their TOTAL tax payments to be
low, or even their AVERAGE tax rate to be low. Indeed, they will pay the highest total
tax and quite possibly the highest share of their incomes as tax. It just means that, at
the margin, since they are the ones that can earn the most and therefore pay the
most in taxes, you want to avoid giving them the incentive to produce less – which
would mean the tax burden would have to be higher on others with less ability to
pay to raise the same total revenue.
Chapter 9 Terminology
The following is a list of terms that you should know in order to discuss and
apply the material from this chapter.
Asymmetric Information A state in which one party knows more than others.
Adverse Selection A case of information asymmetry in which one party’s
characteristics are hidden from another party.
Consumer Surplus The benefit received by consumers who can buy a product for
less than their willingness to pay for it. Approximately, it is the triangular area
under the demand curve and above the market price.
Block Pricing A way of avoiding deadweight loss by choosing the quantity
where total surplus is maximized (where marginal willingness to pay equals
marginal cost), then extracting all of the surplus as profit by bundling the units and
setting a single price for that bundle. The bundle price is equal to the customer’s
total willingness to pay for the bundle.
2‐Part Pricing A way of avoiding deadweight loss by setting the per‐unit price
where total surplus is maximized (where marginal willingness to pay equals
marginal cost), then extracting all of the surplus as profit by charging a fixed fee that
grants the customer the right to purchase the goods. This fee is equal to the
consumer surplus.
Menu Pricing A way for a firm to maximize profits when there are different types
of customers and the firm is unable to identify and separate them into groups. The
firm must deal with this asymmetric information about customer types by offering
multiple bundles at different prices or offering different fixed membership fees,
higher fees allowing customers to buy a good for a lower per‐unit cost.
Participation Constraint Constraint that must hold true in order for a party to
participate. In the case of Menu Pricing, the surplus the customer receives from
participating must be at least as much as the surplus he would receive from not
participating. In this case, the participation constraints would be that the price for
the low type consumer must be less than or equal to his willingness to pay, and the
price for the high type consumer must be less than or equal to his willingness to pay.
Incentive/Selection Constraint Constraint that must hold true in order for a party
to act a certain way or buy a certain membership or bundle. In the case of Menu
Pricing, the incentive constraints would be that the surplus received by the high and
low type consumers for buying the package or paying the fee meant for them must
be at least as much as the surplus they would receive from buying the other package
or paying the other fee.
Chapter 10
Uncertainty with Risk Aversion
When discussing firms making decisions under uncertainty, we assumed they
behaved in a risk‐neutral manner. This was both because one decision made by one
firm represents a very small portion of a well‐diversified portfolio, and because it is
the simplest way of dealing with the consequences of uncertainty for decisions
when the impact of the degree of risk aversion is not, in its own right, particularly
important to the decision.
This isn’t the case with individuals, though, because as single holders of the
gambles that we experience (chance of sickness, bonus packages, etc.), we bear the
full amount of the risk. Therefore, we generally assume that individuals are risk‐
averse.
Expected Utility
Suppose there are different possible outcomes for any given scenario, and each
outcome has an associated wealth level. Let wi denote the wealth level in outcome i,
where i goes from 1 to n possible outcomes, and let fi be the probability of the ith
outcome. Given this information, the expected utility theorem says there exists a
utility function u(wi) such that the option with the highest expected utility,
E(u) = f1u(w1) + f2u(w2) + f3u(w3) + ... ,
or just
EU = ∑i fi u(wi),
is chosen.
We’ll expound on what this theorem is really saying, but it’s important to
understand that this is just a model. The point of the model is not to provide a
detailed explanation for how everybody will act all the time; instead, think of it
when looking at the market as a whole. We’re assuming events in the market often
play themselves out as if consumers, when making decisions, act in this way.
The four major assumptions we had for consumer theory (completeness, more is
better, transitivity, and preference for variety) also apply to this model. In addition,
there is one more important assumption, the independence axiom.
To explain the independence axiom, let’s use an example. Remember, we’re
talking about individuals making decisions about respective gambles, so A, B, and C
are different gambles that the individual is faced with. Let’s assume that you have a
car, and you have to buy car insurance. There is some chance the car will be
wrecked and you will face a large repair bill. Gamble A is if you buy minimal
insurance, and gamble B is if you buy extensive insurance. Assume that we prefer A
to B (A ≻ B). The axiom says that if we mix some new gamble C, with equal
weight, into both of the original gambles, it shouldn’t affect our preference for A over
B. Suppose C is a given probability that you will die before taking delivery of your new
car covered by the policy, and the complementary probability that you will not.
Adding this gamble to both of the other gambles, creating a compound lottery,
shouldn’t change the fact that you prefer A over B; so, you will choose A&C over B&C.
This is in essence what the axiom is saying.
Let’s look at an example. Suppose an individual’s utility function is u = 10√w,
and they are faced with a choice between A and B, where A pays $0 with probability
.5 and $100 with probability .5, and B pays $36 for sure.
For a risk‐neutral firm, the expected value of the gambles is
\[ EV_A = .5(0) + .5(100) = 50 \]
\[ EV_B = 36 \]
The expected utility of gamble A is
\[ EU_A = .5\left(10\sqrt{0}\right) + .5\left(10\sqrt{100}\right) = 0 + 50 = 50 \]
and the expected utility of gamble B is
\[ EU_B = 1\left(10\sqrt{36}\right) = 10(6) = 60 \]
So, the expected value (or payoff) is higher for gamble A, but the expected utility is
higher for gamble B. This is a result of the individual’s risk‐aversion; even though
the expected payoff is higher for A, this individual gets a higher utility from B
because there’s no risk in the payoff – it’s $36 for sure.
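Let’s look at another example. The utility function is the same as in the previous
example, but now the gambles are C and D, where C pays $36 with probability .8 and
$0 with probability .2, and D pays $64 with probability .5 and $0 with probability .5.
The expected values are
\[ EV_C = .8(36) + .2(0) = 28.8 \]
\[ EV_D = .5(64) + .5(0) = 32 \]
which means a risk‐neutral firm would choose D over C. The expected utilities are
\[ EU_C = .8\left(10\sqrt{36}\right) + .2\left(10\sqrt{0}\right) = 8(6) = 48 \]
\[ EU_D = .5\left(10\sqrt{64}\right) + .5\left(10\sqrt{0}\right) = 5(8) = 40 \]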
so the individual will prefer C over D. Again, this consumer prefers the gamble with
less risk (smaller variation in potential outcomes) and the lower expected payoff.
This will not always be the case; you have to work out each consumer’s expected
utility using his or her individual utility function to find out whether the lower
risk is worth the lower expected payoff.
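The two comparisons above can be checked with a short Python sketch. It is only an illustration: the gambles are written as lists of (probability, wealth) pairs implied by the calculations above, and the function and variable names are our own.

```python
import math


def u(w):
    """Utility of wealth used in the examples above: u(w) = 10 * sqrt(w)."""
    return 10 * math.sqrt(w)


def expected_value(gamble):
    """Probability-weighted average of the wealth levels."""
    return sum(p * w for p, w in gamble)


def expected_utility(gamble, utility=u):
    """Probability-weighted average of the utility of each wealth level."""
    return sum(p * utility(w) for p, w in gamble)


# Each gamble is a list of (probability, wealth) pairs.
gambles = {
    "A": [(0.5, 0), (0.5, 100)],
    "B": [(1.0, 36)],
    "C": [(0.8, 36), (0.2, 0)],
    "D": [(0.5, 64), (0.5, 0)],
}

for name, gamble in gambles.items():
    print(name, "EV =", expected_value(gamble), "EU =", expected_utility(gamble))
```

A risk‐neutral firm ranks the gambles by the EV column, while this individual ranks them by the EU column, which is why the two can disagree.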
Now that we know how to calculate expected utilities, we need one more tool in
order to convert them back into values of wealth that have a more concrete
meaning. The certainty equivalent of a gamble is the certain amount of wealth that
gives you the same utility as the gamble gives you. For a risk neutral entity, such as a
firm, the certainty equivalent equals the expected value of a gamble (CE = EV). For a
risk averse entity, such as an individual, the certainty equivalent is strictly less than
the expected value of a gamble (CE < EV).
To find the certainty equivalent of a gamble for a risk averse individual, the
utility of the certainty equivalent must equal the expected utility of the gamble, or
u (CE ) = ∑ fi u ( wi )
and since the certainty equivalent is an amount of wealth, this is a way to assign a
monetary value to a gamble.
Let’s look at an example. Suppose we are faced with a gamble where there is a
50% chance we get a payoff of $100, and a 50% chance we get a payoff of $0.
Assuming our utility function is still u = 10√w, we have
\[ u(CE) = .5\left(10\sqrt{100}\right) + .5\left(10\sqrt{0}\right) \]
\[ 10\sqrt{CE} = 50 \quad \text{(plugging CE into our utility function)} \]
\[ \sqrt{CE} = 5 \]
\[ CE = 25 \]
So for this individual, they’d be indifferent between the gamble described above,
and receiving $25 for sure. Recall that the expected value of the gamble was $50.
Someone who was risk neutral would value the gamble at $50.
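As a rough check on this calculation, the sketch below computes the certainty equivalent by inverting the utility function. It assumes u(w) = 10√w, so the inverse is w = (u/10)²; the helper names are illustrative only.

```python
import math


def u(w):
    """Utility of wealth: u(w) = 10 * sqrt(w)."""
    return 10 * math.sqrt(w)


def u_inverse(utility_level):
    """Wealth that yields a given utility level: w = (u / 10) ** 2."""
    return (utility_level / 10) ** 2


def certainty_equivalent(gamble):
    """Certain wealth whose utility equals the gamble's expected utility."""
    expected_utility = sum(p * u(w) for p, w in gamble)
    return u_inverse(expected_utility)


gamble = [(0.5, 100), (0.5, 0)]  # (probability, wealth) pairs
ce = certainty_equivalent(gamble)
ev = sum(p * w for p, w in gamble)
print("CE =", ce, "EV =", ev)
```

The same two steps (compute expected utility, then invert u) work for any gamble once the utility function is known.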
Constructing a Utility Function for Uncertain Outcomes
Earlier we introduced the concept of expected utility. But, “expected utility” may
still seem like a vague abstraction that is hard to relate to. To make the notion of
expected utility more concrete, consider the following scenario. Suppose you are
faced with a gamble, where there is some probability f that you receive $100, and
some probability 1 − f that you receive $0. Therefore, the expected utility of this
gamble is
\[ E(u) = f\,u(100) + (1 - f)\,u(0) . \]
The certainty equivalent is the single amount of wealth, received for certain, that
provides the same amount of utility as the gamble. In other words, the individual
values this amount of wealth as equivalent to the gamble, which is why it is called
the certainty equivalent. Notice that a certainty equivalent only exists in relation
to a particular gamble; there is no such thing as an overall certainty equivalent for
a utility function. The certainty equivalent for our example, then, is given by
\[ u(CE) = f\,u(100) + (1 - f)\,u(0) . \]
Now that we have a function describing our expected utility of a gamble and our
certainty equivalent for that gamble, we can experiment with different values
of f to see how they affect these values. Remember, we are completely indifferent
between the certainty equivalent and the gamble, since the utility that each provides
us is the same. Keeping that in mind, consider the following thought experiment.
Consider each of the levels of wealth in the left column of the table below (say in
thousands of dollars). Then, ask yourself how high the probability of winning 100 (f)
would have to be to make you indifferent between the gamble and the certain wealth
level.

    CE     f
    0      0
    16     .4
    25     .5
    49     .7
    81     .9
    100    1

The second column gives the answer to this question for a hypothetical individual,
but you should go through the thought experiment and find your own values, too. If
one had only $0 for sure, one would be willing to take the gamble even if f were 0;
that is, one would be indifferent between $0 for sure and a gamble with a 0% chance
at $100 and a 100% chance at $0. Similarly, if one could have $100 for sure, one
would only be willing to take the gamble if f were 1; with any less than a 100%
chance of winning $100, one would keep the original $100, assuming risk aversion.
Now let’s consider a certain wealth of $16. Ask yourself, if you had $16 for sure,
how high must the shot at $100 be for you to give that $16 up? This is exactly what
this value of f represents. So, for this individual, suppose f would have to be .4. That
is, he would be indifferent between $16 for sure, and a gamble with a 40% chance of
$100 and a 60% chance of $0. Moving down the table we see the rest of this
individual’s probabilities, which are just the values of f where he’d be indifferent
between keeping the initial wealth level and taking the gamble.
Going through this exercise is equivalent to defining a utility function. If you can
assign values of f that would make you just as happy with the gamble as you would
be with the initial value of wealth, you have essentially defined your utility function.
To find this individual’s utility function, let’s begin by defining the utility of $0 as 0,
and the utility of $100 as 100. The units for utility have no intrinsic value, so
defining them this way (0 units and 100 units) simply sets the scale in a convenient
way. (We will consider the allowable transformations of the utility function in more
detail in the next section. But, basically, multiplying by a constant and adding a
constant will not change the expectation or the certainty equivalents.)
Since the certainty equivalent satisfies
\[ u(CE) = f\,u(100) + (1 - f)\,u(0) , \]
plugging in 0 for u(0) and 100 for u(100) gives
\[ u(CE) = f(100) + (1 - f)(0) , \]
or just
\[ u(CE) = 100f . \]
This makes it clear that identifying the probability that makes a hypothetical gamble
equivalent to a specified certain wealth level IS the same thing as identifying the
utility function itself (assuming the individual’s preferences satisfy the standard
assumptions, in particular the independence axiom).
Going back to our table (recreated below), and observing each initial wealth value’s
respective probability, we can solve for utility by plugging f into the above
equation. This is shown in the third column. Remember, the units of utility are
arbitrary; so, as long as you can determine the probability for each initial wealth
value, you can define a utility function.

    CE     f      u(CE) = 100f
    0      0      0
    16     .4     40
    25     .5     50
    49     .7     70
    81     .9     90
    100    1      100
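The construction in the table can be sketched in Python as follows. The elicited probabilities are the hypothetical individual’s answers from the table; the variable names and the comparison with u = 10√w are our own additions, included because this individual’s answers happen to coincide with that function at every tabulated wealth level.

```python
import math

# Elicited indifference probabilities f for each certain wealth level (from the table).
elicited_f = {0: 0.0, 16: 0.4, 25: 0.5, 49: 0.7, 81: 0.9, 100: 1.0}

# Scale choice from the text: u(0) = 0 and u(100) = 100.
U_LOW, U_HIGH = 0, 100

# u(CE) = f * u(100) + (1 - f) * u(0), which reduces to 100 f under this scale.
utility_table = {
    wealth: f * U_HIGH + (1 - f) * U_LOW for wealth, f in elicited_f.items()
}

for wealth, utility in utility_table.items():
    print(wealth, utility, 10 * math.sqrt(wealth))
```

Choosing a different scale for u(0) and u(100) would change the numbers in the second printed column but not the certainty equivalents they imply.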
The graph below shows this particular utility function. We’ve illustrated the case
where f is .5; that is, where we have a 50% chance at $100. The expected value of
this gamble is $50, and the expected utility turns out to be 50 units (due to the way
we scaled our utility function). The table tells us the CE of this gamble is 25, and
this is shown on the graph; notice that the utility of the CE is 50, which is also the
expected utility of the gamble. That the expected utility (50) coincides with the
expected value ($50) is simply because we chose the scale (which is arbitrary) so
that u = 100f, which is also the expected value for this example.
[Figure: utility u plotted against wealth w, with w = 25, 50, and $100 marked on the
horizontal axis and u = 50 and 100 marked on the vertical axis.]
In the figure below, the horizontal distance between the expected value of the
gamble, 50, and the certainty equivalent, 25, is labeled as the risk premium. The
risk premium is just the amount an individual is willing to pay to get rid of the risk
they face. Suppose this individual has an initial wealth of 100 but faces a 50% chance
of losing it all. Expected losses are 50 (100 times 0.5), so the expected value of their
wealth is only 50. But the individual would happily give up another 25 to be rid of
the risk; this 25 is the risk premium.
[Figure: the same utility function, with u(0) = 0, 50, and u(100) = 100 marked on the
vertical axis and CE = 25, EV = 50, and 100 marked on the horizontal axis; the
distance from CE to EV is labeled Risk Premium.]
Observe that the CE is determined by the curvature of the utility function. If this
utility function were more concave, the single wealth amount that gives a utility of
50 (the CE) would be less than 25. Thus, any operation on a utility function that
changes its curvature (such as squaring) creates a new utility function that is
entirely different from the original, because it will produce different certainty
equivalents.
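To see how curvature drives the certainty equivalent, the sketch below uses a hypothetical family of utility functions, u(w) = w^a with 0 < a ≤ 1, which is our own assumption and not from the text. A smaller a means a more concave function; a = 0.5 has the same curvature as 10√w (it differs only by scale), and a = 1 is a straight line.

```python
def certainty_equivalent(a, gamble):
    """CE under the hypothetical utility u(w) = w ** a, with 0 < a <= 1."""
    expected_utility = sum(p * w ** a for p, w in gamble)
    # Invert u: if u(CE) = EU then CE = EU ** (1 / a).
    return expected_utility ** (1 / a)


gamble = [(0.5, 100), (0.5, 0)]  # (probability, wealth) pairs
expected_wealth = sum(p * w for p, w in gamble)

for a in (1.0, 0.5, 0.25):
    ce = certainty_equivalent(a, gamble)
    print("a =", a, "CE =", ce, "risk premium =", expected_wealth - ce)
```

Under these assumptions the certainty equivalent falls from 50 to 25 to 6.25 as the function becomes more concave, so the risk premium rises from 0 to 25 to 43.75.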
To stress the point that the scale chosen in the above example, u(0) = 0 and
u(100) = 100, had no impact on the utility function, let’s look at a more general
example. Let u(w) be our utility function, and suppose we are faced with a gamble
where we get $60 with probability f and $10 with probability (1 − f). Our expected
wealth is just
\[ E(w) = f(60) + (1 - f)(10) \]
and expected utility is
\[ E(u) = f\,u(60) + (1 - f)\,u(10) . \]
Since the probability f is associated with the payoff of $60, the expected wealth
will just be the fraction f of the way from 10 to 60. If there’s a 50% chance of $60,
E(w) will be halfway from 10 to 60; if there’s a 75% chance of $60, E(w) will be 75%
of the way from 10 to 60. The point is to realize that this will be true for expected
utility as well: E(u) lies the same fraction f of the way from u(10) to u(60). This is
why we draw the straight line from [10, u(10)] to [60, u(60)]: however close E(w) is
to 60, that’s how close E(u) will be to u(60). This is shown in the figure below when
f is about 2/3. The risk premium is the difference between expected wealth and the
certainty equivalent, E(w) − CE.
[Figure: concave utility function u(w) with the straight line from [10, u(10)] to
[60, u(60)]; u(10), E(u), and u(60) marked on the vertical axis; 10, CE, E(w), and 60
marked on the horizontal axis; the distance from CE to E(w) is labeled Risk Premium.]
The greater the risk premium, the more risk averse an individual is. Consider the
utility functions in the figures below. In the graph on the left, u2 has a sharper
curvature than u1; thus, the risk premium for u2 will be greater, so the individual
with the utility function u2 is more risk averse than the individual with the utility
function u1.
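[Figure: left panel, concave utility functions u1 and u2 plotted against w, with u2
more sharply curved; right panel, a linear utility function u3 and a convex utility
function u4 plotted against w.]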
In the graph on the right, u3 is a straight line, so the risk premium will be 0 (this
individual values gambles at their expected wealth). This individual is risk‐neutral.
Since u4 is convex, not concave, the risk premium will actually be negative; in other
words, the certainty equivalent will be greater than the expected wealth of the
gamble. Because of this, the individual with the utility function u4 is risk loving.
Uniqueness and Scale of the Expected Utility Function
Recall that in the last chapter, the units of utility were arbitrary; all that mattered
was the ranking of the different consumption bundles. As a result, any increasing
transformation of the original utility function represented the same preferences:
add to it, multiply it by a positive constant, take the log, square it, whatever. The
units, level, and scale of the expected utility function are also arbitrary. But any
transformation must keep the ranking of the expected utility the same, not just the
ranking of the utility of wealth. Also, and equivalently, it must keep the certainty
equivalents the same. What that means is that any increasing transformation of the
expectation of the utility function, E[u(w)] = Σ_i f_i u(w_i), is fine. For example, in
theory taking the natural log,
\[ \ln\left(E[u(w)]\right) = \ln\left(\sum_i f_i u(w_i)\right) , \]
would give the same preferences across gambles, although it is exceedingly hard to
conceive of any circumstance where that particular transformation would be at all
helpful.
Of more interest, though, are transformations of the utility function itself, u ( w ) .
We showed above that the certainty equivalents are determined by the curvature of
the utility function, so, any transformation that changes the curvature represents
different preferences. But, the scale of the utility function, that is the units and level
of utility, does not matter. So, u(w) and a + bu(w), where a and b are constants
with b > 0, give the same certainty equivalents and preserve the ranking of expected
utilities. Mathematically, it is relatively easy to demonstrate this is the case, using
the fact that the probabilities sum to one:
\[
\begin{aligned}
E[a + b\,u(w)] &= \sum_i f_i \left( a + b\,u(w_i) \right) \\
&= a \sum_i f_i + b \sum_i f_i\,u(w_i) \\
&= a + b\,E[u(w)] .
\end{aligned}
\]
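A numerical illustration of the same point, under stated assumptions: the sketch below finds certainty equivalents by bisection for u(w) = 10√w, for a rescaled version a + bu(w) with arbitrarily chosen constants a = 7 and b = 3, and for the squared function u(w)², which changes the curvature. The helper and its parameters are hypothetical, not part of the text.

```python
import math


def certainty_equivalent(utility, gamble, lo=0.0, hi=100.0, tol=1e-10):
    """Solve utility(CE) = expected utility by bisection.

    Assumes utility is increasing on [lo, hi] and the CE lies in that range.
    """
    target = sum(p * utility(w) for p, w in gamble)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if utility(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def u(w):
    return 10 * math.sqrt(w)


def rescaled(w):
    return 7 + 3 * u(w)  # a + b*u(w) with a = 7, b = 3 (arbitrary choices)


def squared(w):
    return u(w) ** 2  # equals 100*w, so the curvature is gone


gamble = [(0.5, 100), (0.5, 0)]  # (probability, wealth) pairs
ce_original = certainty_equivalent(u, gamble)
ce_rescaled = certainty_equivalent(rescaled, gamble)
ce_squared = certainty_equivalent(squared, gamble)
print(ce_original, ce_rescaled, ce_squared)
```

The first two values agree (about 25), while squaring yields a linear function and a certainty equivalent of about 50, the expected value.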
The Value of Insurance
In the example above, where u = 10w^0.5, if an individual had an initial wealth of
100 but faced a 50% probability of losing it all, their expected wealth would be 50
and the certainty equivalent 25, leaving the risk premium at 25. That means the
individual would accept 25 for sure in exchange for their gamble. A risk neutral firm,
on the other hand, would be willing to pay 50 for it. So, if the risk can be reallocated
from the risk averse individual to a risk neutral firm, the potential gains from trade
are 25, ignoring the costs of facilitating the transaction itself for now.
This concept is the basis for insurance markets. If a risk neutral insurance
industry serves identical customers with independent risks, the value added by the
insurance industry is just the number of consumers times the value of insurance per
consumer (the risk premium), less the costs of writing and administering the
policies. That is:
Value Added = (Number of Consumers × Risk Premium) − Total Administrative Costs.
The distribution of that value between insurance firms and customers depends
on the market structure. If the firms have market power, they will capture some of it
as profit. If the firms are perfectly competitive, the price they charge for a policy will
simply reflect the cost of a policy. This will include expected losses plus the costs of
writing and administering a policy. That is, in a competitive insurance industry,
Policy Price = Expected Losses + Administrative Costs per Insured.
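Consumers are guaranteed a wealth level equal to their initial wealth less the policy
price. If a loss is incurred, they are fully compensated. Without insurance, the value
of their gamble is the CE. The gain, or surplus, to each customer is
Surplus = Initial Wealth − Policy Price − CE
        = Initial Wealth − Expected Losses − Administrative Costs per Insured − CE.
Since initial wealth less expected losses is just expected wealth, this means
Surplus = Expected Wealth − CE − Administrative Costs per Insured
        = Risk Premium − Administrative Costs per Insured.
In a perfectly competitive insurance industry, consumers capture all value added as
consumer surplus.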
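The competitive case can be worked through numerically. The sketch below reuses the earlier example (initial wealth 100, a 50% chance of losing it all, u = 10√w); the administrative cost of 5 per insured and the 1,000 customers are hypothetical numbers chosen only for illustration.

```python
import math


def u(w):
    return 10 * math.sqrt(w)


initial_wealth = 100
loss = 100
p_loss = 0.5
admin_cost_per_insured = 5  # hypothetical
n_customers = 1000          # hypothetical

expected_loss = p_loss * loss
expected_wealth = initial_wealth - expected_loss

# Value of going uninsured: the certainty equivalent of the gamble.
expected_utility = p_loss * u(initial_wealth - loss) + (1 - p_loss) * u(initial_wealth)
certainty_equivalent = (expected_utility / 10) ** 2  # inverse of u
risk_premium = expected_wealth - certainty_equivalent

# Competitive pricing: price covers expected losses plus administrative costs.
policy_price = expected_loss + admin_cost_per_insured
insured_wealth = initial_wealth - policy_price
surplus_per_customer = insured_wealth - certainty_equivalent
industry_value_added = n_customers * (risk_premium - admin_cost_per_insured)

print(policy_price, insured_wealth, surplus_per_customer, industry_value_added)
```

With these numbers the policy costs 55, the insured customer is guaranteed 45, and the surplus of 20 per customer equals the risk premium (25) less the administrative cost (5).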
The ability of the insurance industry to add economic value stems from the fact
that insurance companies value gambles at their expected value. Since individuals
are risk averse, their certainty equivalent is less than the expected value of the
gamble, so insurance companies create value based on the difference of the
expected value and the certainty equivalent – the risk premium.
Why do insurance companies value gambles at their expected value? The reason
individuals have certainty equivalents less than the expected wealth level is because
they are risk averse – they get less utility in the face of uncertainty. Insurance
companies, though, can pool their risk by insuring many clients, which allows them
to act as risk‐neutral entities. Consider the following simple example to illustrate
this point.
Suppose any individual can make $100,000 next year, but there’s a 10% chance
they will get sick and lose $90,000, ending up with only $10,000. For the individual,
the possibility of ending up with only $10,000 is devastating, since it will directly
affect the amount of money he has for food, rent, etc. In other words, the fact that
there is uncertainty present and that the individual may get sick and lose most of his
income will be crippling for that individual, since his income is what he depends on.
This is what we mean when we say individuals are risk averse. He doesn’t care that
the expected wealth of the gamble is .9(100,000) + .1(10,000) = $91,000, because if
he gets sick, he will be in trouble. Suppose his certainty equivalent is $80,000. This
means he’d be willing to take the $80,000 for sure in order to pass on that risk to
someone else (the insurance company).
Now, the insurance company has many clients. Suppose each client faces the
same gamble as the one described above. If the insurance company only had one
client, it would face exactly the same risk as the individual, and would behave in a
similar risk averse manner. This is because, with only one client, there is still a 10%
chance that the company’s entire client base gets sick, costing it $90,000. But say
the insurance company has 10,000 clients. Then, on average, 10% of them (1,000
clients) will get sick, and because there are so many clients, it is unlikely that the
share who get sick will be much more or much less than 10%. In other words, the
probability that every client will get sick is much, much lower than when the
company had one client. This is why insurance companies can afford to just look at
the expected wealth of the gamble. Through risk pooling, they have drastically
reduced the chance that they will have to pay out claims to many more than 10% of
their clients.
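A rough way to quantify this is the standard deviation of the share of clients who file a claim. Assuming independent risks, each with claim probability p, that share has mean p and standard deviation √(p(1 − p)/n) for n clients (the usual binomial result). The sketch below applies this to the example; the function name is our own.

```python
import math


def claim_share_std(p, n):
    """Standard deviation of the share of n independent clients who claim."""
    return math.sqrt(p * (1 - p) / n)


p_sick = 0.10
loss = 90_000
expected_payout_per_client = p_sick * loss

for n in (1, 100, 10_000):
    share_sd = claim_share_std(p_sick, n)
    # Spread of the average payout per client around its expected value.
    print(n, expected_payout_per_client, share_sd * loss)
```

With one client the spread around the expected share of 10% is 30 percentage points; with 10,000 clients it is 0.3 percentage points, so the average payout per client stays very close to its expected value.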
In this way, insurance companies diversify away their uncertainty through risk
pooling. This is why they act as risk neutral entities. In the previous example, the
insurance company was able to diversify away the risk by adding more clients
because each client’s risk was independent; that is, the probability that one client
got sick and filed a claim to the insurance company was completely separate from
the probability that another client got sick. This is an important assumption for risk
pooling. If the risks were not independent, the diversification would not be effective.
Imagine providing windstorm insurance for homes on the coast in Florida. If each
home has a 10% chance of getting destroyed by a hurricane, you cannot diversify
away that risk by insuring many homes in the same area, since a single hurricane
could destroy many of them at once. So, a company that provides windstorm
insurance must insure homes in separate geographical locations (i.e., independent
risks) in order to diversify its risk properly.
Limitations of the Expected Utility Model and Rational Man Models
We’ve been talking about the expected utility model and how it allows us to
model choices of rational individuals when faced with uncertainty. There are limits
on how accurate this model really is. Similarly, there are limits to the accuracy of
any rational model of decision making, with or without uncertainty. We discuss
three apparent violations of simple rational man models: the endowment effect, the
Allais paradox, and the Ellsberg paradox.
An experiment conducted at Cornell illustrates the endowment effect. A class was
given a survey asking whether they would want a candy bar or a coffee mug, two
items that have roughly the same market value. About half of the class wanted a
coffee mug, and about half of the class wanted a candy bar. Then, the items were
distributed randomly to the class, so that half were given a candy bar and half were
given a coffee mug. The idea is that, since the items were distributed randomly,
about 25% of the class should have wanted a coffee mug and received one, 25%
should have wanted a candy bar and received one, and the other 50% should have
received one item but wanted the other. Then, the class was allowed to trade their
items freely. It was predicted that the other 50% would want to trade their items,
but in fact nobody traded. This, and other experiments of a similar nature, are
claimed to be evidence of the endowment effect: the people who didn’t get what
they wanted now valued their items more highly simply because they now had them.
The problem with this experiment, though, is that it ignores the transaction costs
of trading. The transaction cost of a trade is anything other than the actual prices of
the goods that it costs you to facilitate the transaction. So, in the example of the
Cornell experiment, the transaction cost of one student trading with another would
be to get up, go introduce himself, trade the item, and go back to his seat. Also, since
the goods are worth relatively little (only a couple of bucks) the gains from the trade
are small. Since some students may be shy, or some students may have better things
to do with the time they were given to trade, the transaction costs may be too high
relative to the small value they would gain by getting the good that they wanted.
Thus, for trade to occur, the gains from the trade must outweigh the transaction
costs. In light of this problem with the Cornell experiment, a researcher by the name
of John List conducted a different experiment. List looked at the market for
collectible trading cards online, such as baseball cards. The transaction costs for
trading online are relatively low, and the stakes for getting certain cards that are
worth a lot are high, so this setting ensures that the gains from trade outweigh the
transaction costs of trading. The point of the experiment was to see whether or not
this market is efficient. What he found was that beginning traders didn’t trade
when they should have, but more experienced traders were pretty efficient. So,
there may have been some evidence of the endowment effect among the novice
traders, in the sense that they were nervous about other traders taking advantage
of them, so they were unable to properly value the cards that they had. But, overall,
the market for cards was pretty efficient since there was a group of traders who
were experienced and knew how to properly value their cards. This is analogous to
the stock market, and the takeaway is that in markets where there is a significant
number of traders who are knowledgeable about the value of the stock they are
trading, they will tend to push the market prices toward their efficient level.
This is not to say the endowment effect is not a real thing. People are subject to
all manner of quirks in their thinking – among them the tendency to focus more on
the positive aspects of the decisions they have already made, which leads to the
endowment effect. The point is that the fact that rational man models are not perfect
descriptions of the way we all make everyday decisions does not mean they can
shed no light on the workings of markets – they capture the essence of some of the
important aspects of decision making. We just need to keep in mind that they don’t
explain everything about the way anyone makes any particular decision.
Also, a point needs to be made about the difference between the endowment
effect and sentimental value. If you have a watch that most people think has a value
of $50, but is worth $150 to you because it’s been in your family line for three
generations, there is nothing irrational about this. The sentimental value of a good is
a perfectly rational reason to value a good more than someone else does. The
endowment effect, however, is when someone values a good more than someone
else simply because they own it. This is irrational and means that the person isn’t
able to properly value his or her own goods.
The Allais Paradox is best explained using an example. Suppose there are four
gambles A, B, C and D, and each has the following probability distribution with
respect to three different prizes of $0, $1,000,000 and $5,000,000:
                  Probabilities
    Prize         A      B      C      D
    $0            0      0.01   0.89   0.9
    $1,000,000    1      0.89   0.11   0
    $5,000,000    0      0.1    0      0.1
First, let’s choose between gambles A and B. Lottery A is $1M for sure. Lottery B
is a 1% chance of $0, an 89% chance of $1M, and a 10% chance of $5M. Typically, an
individual will choose gamble A over B, to avoid the 1% chance of getting $0.
Now, let’s choose between gambles C and D. Lottery C is an 89% chance of $0
and an 11% chance of $1M. Lottery D is a 90% chance of $0 and a 10% chance of
$5M. Typically, an individual will choose gamble D over C, since for an additional 1%
chance of $0 they can have a 10% chance of $5M.
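To find out why this is paradoxical, let’s work out the expected utilities of each
gamble, writing the prizes in millions of dollars. The expected utilities of A and B are
\[ E(U_A) = 0\,U(0) + 1\,U(1) + 0\,U(5) = U(1) \]
\[ E(U_B) = .01\,U(0) + .89\,U(1) + .1\,U(5) \]
Then, A is only preferred to B if the expected utility of A is greater than the expected
utility of B, or
\[ E(U_A) > E(U_B) \]
\[ U(1) > .01\,U(0) + .89\,U(1) + .1\,U(5) \]
\[ .11\,U(1) > .01\,U(0) + .1\,U(5) \]
The expected utilities of C and D are
\[ E(U_C) = .89\,U(0) + .11\,U(1) \]
\[ E(U_D) = .9\,U(0) + .1\,U(5) \]
Similarly, D is only preferred to C if
\[ E(U_D) > E(U_C) \]
\[ .9\,U(0) + .1\,U(5) > .89\,U(0) + .11\,U(1) \]
\[ .01\,U(0) + .1\,U(5) > .11\,U(1) \]
Since it is impossible for .11U(1) to be both less than and greater than .01U(0) +
.1U(5), choosing A over B and choosing D over C is inconsistent.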
This inconsistency is taken to be a violation of the independence axiom. The
independence axiom, in this context, means things that are the same between two
options shouldn’t affect the decision. So, looking back at the table of probabilities for
each gamble, if you look at A and B, they both have an 89% chance of getting $1M.
The real difference between A and B is that A has an extra 11% chance of getting
$1M, and B has an extra 1% chance of getting 0 and an extra 10% chance of getting
$5M. The 89% chance of getting $1M is common between the two gambles, so by the
independence axiom it shouldn’t affect our choice. If we now look at gambles C and
D, they have in common an 89% chance of getting $0 – so this shouldn’t affect our
choice. Looking at everything else, C has an extra 11% chance of getting $1M, and D
has an extra 1% chance of getting 0 and an extra 10% chance of getting $5M, which
is exactly the same difference between gambles A and B. This is why if someone
chooses A over B, but chooses D over C, they are acting inconsistently with the
expected utility model.
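The inconsistency does not depend on any particular utility function. As a sketch, the code below draws many hypothetical sets of utilities for the three prizes at random and checks that the expected utility advantage of A over B always equals the advantage of C over D, so any expected utility maximizer who picks A over B must also pick C over D. The probabilities come from the table above; everything else (names, the random draws) is our own illustration.

```python
import random

# Probabilities of the prizes ($0, $1M, $5M) for each gamble, from the table.
LOTTERIES = {
    "A": [0.00, 1.00, 0.00],
    "B": [0.01, 0.89, 0.10],
    "C": [0.89, 0.11, 0.00],
    "D": [0.90, 0.00, 0.10],
}


def expected_utility(probs, utils):
    return sum(p * util for p, util in zip(probs, utils))


def gaps_always_match(trials=1000, seed=0):
    """Check EU(A) - EU(B) == EU(C) - EU(D) for random increasing utilities."""
    rng = random.Random(seed)
    for _ in range(trials):
        # Hypothetical utilities with U(0) <= U(1) <= U(5).
        utils = sorted(rng.uniform(0, 100) for _ in range(3))
        gap_ab = expected_utility(LOTTERIES["A"], utils) - expected_utility(LOTTERIES["B"], utils)
        gap_cd = expected_utility(LOTTERIES["C"], utils) - expected_utility(LOTTERIES["D"], utils)
        if abs(gap_ab - gap_cd) > 1e-9:
            return False
    return True


print(gaps_always_match())
```

Both gaps reduce to .11U(1) − .01U(0) − .1U(5), which is the quantity that cannot be both positive and negative.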
The Ellsberg Paradox is also best explained using an example. Suppose there is
an urn with 60 marbles, where 1/3 of them are green, and the other 2/3 are either
orange or blue. Now suppose we are offered the following gambles:
I. You are paid $1,000,000 if you draw a green marble
II. You are paid $1,000,000 if you draw a blue marble
In general, an individual will choose option I over option II. Now suppose there are
two more gambles we are offered:
III. You are paid $1,000,000 if you draw a green or an orange marble
IV. You are paid $1,000,000 if you draw a blue or an orange marble
In general, an individual will choose option IV over option III. These two decisions
illustrate a paradox. Why? Let’s look at expected utilities of gambles I and II,
measuring the prize in millions of dollars and setting U(0) = 0:
\[ E(U_I) = \tfrac{1}{3}\,U(1) \]
\[ E(U_{II}) = \Pr(B)\,U(1) \]
where Pr(B) is the probability that you draw a blue marble. Remember, we aren’t
told how many are blue and how many are orange, so this probability is unknown to
us; that is, we have to make our own subjective guess about it. If we prefer I to II,
then the expected utility of I must be greater than the expected utility of II, or
\[ E(U_I) > E(U_{II}) \]
\[ \tfrac{1}{3} > \Pr(B) \]
Basically, preferring I to II means that we think the probability of drawing a blue
marble is less than 1/3. Now, let’s look at the expected utilities of gambles III and IV:
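\[ E(U_{III}) = \left[ \tfrac{1}{3} + \Pr(Or) \right] U(1) \]
\[ E(U_{IV}) = \left[ \Pr(B) + \Pr(Or) \right] U(1) \]
where Pr(Or) is the probability that you draw an orange marble. Now, we will only
prefer IV over III if the expected utility of IV is greater than the expected utility of III,
or
\[ E(U_{IV}) > E(U_{III}) \]
\[ \Pr(B) + \Pr(Or) > \tfrac{1}{3} + \Pr(Or) \]
\[ \Pr(B) > \tfrac{1}{3} \]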
So preferring IV to III means that we think the probability of drawing a blue marble
is greater than 1/3. But we’ve already concluded from choosing I to II that we think
the probability of drawing a blue marble is less than 1/3. So, we are acting
inconsistently in the context of this model. This is referred to as ambiguity aversion,
and it basically means that people don’t like gambles where they don’t know the
true, objective probabilities of each outcome.
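The same conclusion can be checked by brute force. The sketch below assumes the individual holds a single belief about how many of the 40 non‐green marbles are blue, tries every possible number, and records those under which both observed choices (I over II, and IV over III) would maximize the chance of winning. The variable names are illustrative.

```python
TOTAL_MARBLES = 60
GREEN = TOTAL_MARBLES // 3          # 20 green marbles
NON_GREEN = TOTAL_MARBLES - GREEN   # 40 marbles that are blue or orange

consistent_beliefs = []
for blue in range(NON_GREEN + 1):
    orange = NON_GREEN - blue
    # Number of winning marbles under each gamble, given this belief.
    winning = {
        "I": GREEN,
        "II": blue,
        "III": GREEN + orange,
        "IV": blue + orange,
    }
    prefers_I_to_II = winning["I"] > winning["II"]
    prefers_IV_to_III = winning["IV"] > winning["III"]
    if prefers_I_to_II and prefers_IV_to_III:
        consistent_beliefs.append(blue)

print(consistent_beliefs)
```

The list comes back empty: choosing I over II requires fewer than 20 blue marbles, while choosing IV over III requires more than 20.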
Uses of the Expected Utility Model
Despite these problems with the model for expected utility, the model still has
its uses. Some of the major ones are:
1. To guide individual decisions. If you believe that more is better, that you have
a preference for variety, that your decisions should be transitive, etc. then the
model can help guide your individual decision‐making processes. When the
model was first developed, it was thought that this was to be its primary
function; that is, to actually present individuals with their own individual
utility function in order to aid them to make logical and consistent decisions.
It turns out this hasn’t really been the main application, but it is still possible
for individuals to use the model in order to prevent errors in decision‐
making.
2. Description of individual behavior. Given our above discussion of the Allais
and Ellsberg paradoxes, the model is not an accurate description of every
individual’s behavior. Also, with regard to the Cornell and List experiments,
people may act in accordance with this model to some extent; but this model
is simply not accurate enough to generalize to all individual behavior.
3. As‐if model for major decisions and experienced traders. We’ve seen that this
model is most accurate when applied to individuals who are experienced in
their market and are dealing with gambles that have high stakes. The claim is
not necessarily that these people use this model explicitly when going
through their decision‐making process; only that this model will help
analyze and predict the outcomes of experienced players making key
decisions.
Henceforth we will be using the model as it is described in number three above. We
will soon be talking about contracting, where two parties are entering into an
agreement that is worth a lot of money and about which both parties are well
informed. In this setting, the model of expected utility is applicable to the
extent that it helps us predict the outcomes of these big decisions.
Chapter 10 Terminology
The following is a list of terms that you should know in order to discuss and
apply the material from this chapter.
Expected Utility Theorem‐ Theorem that states that, under the assumptions of the
model (including the independence axiom), there exists a utility function over wealth
such that the individual chooses the gamble with the highest expected utility.
Expected Utility‐ The sum of the probability of each outcome multiplied by the
utility of each outcome.
Independence Axiom‐ An underlying assumption of the expected utility theorem
that states if a new lottery is compounded with two original lotteries with a given
probability, it shouldn’t affect a consumer’s choice because the difference between
those two choices remains unchanged.
Certainty Equivalent (CE)‐ The single amount of wealth, received for certain, that
provides an individual the same amount of utility as the gamble itself. For risk
neutral players, the certainty equivalent is equal to the expected value of the gamble.
For risk averse players, it is less than the expected value of the gamble.
Risk Premium‐ The value added by the insurance company by charging risk averse
customers their certainty equivalent while the company values the gamble at the
expected wealth.
Risk Pooling‐ Strategy used by insurance companies that allows them to behave
risk neutrally by selling insurance to many customers. Because of the law of large
numbers, actual value will approach expected value.
Diversification‐ Strategy used by insurance companies that works to reduce risk by
serving clients in many locations. This makes the risks independent and can help an
insurance company create more value by acting risk neutrally.
Endowment Effect‐ A phenomenon that occurs when a person places a certain
amount of value on a good before buying it, then, after buying it, places a higher
value on that good because he owns it.
Allais Paradox‐ An inconsistency in the expected utility model that violates the
independence axiom. This occurs when things that are the same between two
options affect the decision.
Ellsberg Paradox‐ An inconsistency in the expected utility model that occurs when
individuals fail to formulate consistent subjective probabilities in the face of
ambiguous uncertainty.
Chapter 11
More on Production and Cost
Recall
π = pq – C(q).
C(q) here is our cost function. In reality, a cost function is very difficult to predict
because everything from engineering to tax policy affects it. In this class, our cost
functions represent economic cost, which is the cost of everything you give up for a
specific decision, whether or not it’s readily quantifiable. Note economic cost does
not always line up with accounting cost. For example, if you have a job with an
annual salary of $100,000 and you are contemplating pursuing an MBA, part of the
cost of that degree would be the $100,000 a year you would otherwise be making.
The equation
$$\text{Cost} = \sum_i p_i x_i$$
is a general equation for total cost of production, where p is the cost of a certain
input (e.g. labor), and x is the amount of that input used. We can always produce a
certain quantity by simply buying more inputs, but what we’re interested in is
minimizing our cost. Thus, when we refer to a cost function [or C(q)] we mean the
minimum possible cost of q units. There are two important assumptions to this idea:
1. No pure waste – your managers are not buying extraneous inputs.
2. Choosing best production process – among the alternatives, choosing the
most efficient way to produce q units.
The first one is just generally good business practice; the second one requires more
attention and will be one of the main subjects of this section.
The next thing to define is our production function. This function tells us how
many units we can produce based on different amounts of our inputs. The general
form is
q = f(x1,x2,…,xn)
where q depends, or varies, on inputs x1 through xn. The marginal product of input
xi is defined as
$$MP_i = \frac{\partial q}{\partial x_i}$$
which is the partial derivative of the entire production function with respect to the
variable xi. This is just the rate of change of quantity with respect to xi. We usually
restrict our production function to just two inputs, capital (K), and labor (L), and so
for a production function,
q = f(K,L)
and the marginal products of labor and capital are
$$MP_L = \frac{\partial q}{\partial L}, \qquad MP_K = \frac{\partial q}{\partial K}.$$
Note that when looking at the marginal product of labor, we are looking at how
much quantity changes for an additional unit of labor. It’s how much product one
more unit of labor provides. This is an important concept and will make later
material much easier to understand.
Input Substitution
Looking at a standard job, such as digging a hole for a pool, we could either
spend a lot of money on capital (machinery) to do the digging, or we could spend a
lot of money on labor (man hours) and let the work force do the digging. Since there
are multiple ways of doing it, it’s easy to imagine that there’s some optimal
allocation where we are maximizing our inputs’ cost effectiveness. Now let’s define
two extreme types of substitutes.
1. Perfect substitutes – these are things that when substituted do not have
any effect on output. Red pencils and blue pencils are examples, because
they both accomplish exactly the same thing.
2. Perfect complements – these are inputs that must be used in fixed
proportions. For example, there is a very specific ratio of carbonated
water to flavoring to make the drink Coca‐Cola. As soon as that ratio is
changed at all, the end product is no longer the same.
If you are dealing with perfect substitutes, you just buy whichever input is
cheapest. With perfect complements, you have no discretion over choice of inputs.
In most cases, however, inputs are imperfect substitutes, which fall somewhere
between perfect substitutes and perfect complements. This is where the decision‐
making process comes in. It follows that the degree of substitutability, or basically
the efficiency of your inputs, comes into play when determining the optimum
allocation of your inputs.
The marginal rate of technical substitution of input i for input j, or MRTSij, is one
way to measure substitutability. In our example, the MRTSLK is how much less
capital you could use if you had one more unit of labor, while maintaining the same
output level.
Suppose MPL = 10 and MPK = 5. If you had one more unit of labor, you would
produce 10 more units of output; thus, you could give up two units of capital (each
worth 5 units of output), and you’d maintain your output. In this example,
MRTSLK = 2. In general, the definition is
$$MRTS_{LK} = \frac{MP_L}{MP_K}$$
so the more productive labor is relative to capital, the more capital you could give
up for one more unit of labor.
204
Suppose we increase labor and decrease capital to take advantage of this relative
productivity advantage. What happens to the marginal products? Well, with the
addition of a person to a work force, there are fewer tools per person, so MPL falls
(remember, MPL is the increase in output from one more unit of labor). What
about the tools themselves? There are fewer of them, so each tool is being put to
more use. Thus, the MPK increases. Putting both of these together, it follows that
MRTSLK decreases. This should make intuitive sense; as you add more and more
labor, the amount of capital that can be freed up for one additional person decreases.
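The falling MRTS can be checked numerically. The following is a minimal sketch, assuming purely for illustration the production function q = 4L^0.5K^0.5; the function names and the finite‐difference step are our own choices, not part of the course material. It approximates each marginal product with a small change in the input and then moves along one isoquant, trading capital for labor.

```python
# Illustrative sketch only: the production function and all names are assumptions.

def output(labor, capital):
    """Assumed production function q = 4 * L^0.5 * K^0.5."""
    return 4 * labor ** 0.5 * capital ** 0.5


def marginal_product(labor, capital, wrt, h=1e-6):
    """Central-difference approximation of the partial derivative of output."""
    if wrt == "L":
        return (output(labor + h, capital) - output(labor - h, capital)) / (2 * h)
    return (output(labor, capital + h) - output(labor, capital - h)) / (2 * h)


def mrts_lk(labor, capital):
    """MRTS_LK = MP_L / MP_K: capital that one more unit of labor can replace."""
    return marginal_product(labor, capital, "L") / marginal_product(labor, capital, "K")


# Walk along the isoquant q = 16 (where L * K = 16), adding labor and removing capital.
for labor in (1.0, 2.0, 4.0, 8.0):
    capital = 16.0 / labor
    print(f"L={labor:4.1f}  K={capital:4.1f}  q={output(labor, capital):5.2f}  "
          f"MRTS_LK={mrts_lk(labor, capital):7.3f}")
```

Output stays at 16 on every row, while the MRTS shrinks as labor replaces capital, which is the pattern described above.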
Example: Cost MRTS
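Suppose our production function is $q = 4L^{0.5}K^{0.5}$. Find MRTSLK.
Solution: First find the MPL. Taking the partial derivative with respect to L, we
see
$$MP_L = \frac{\partial q}{\partial L} = 2L^{-0.5}K^{0.5} = 2\left(\frac{K}{L}\right)^{0.5}$$
By the same steps, the partial derivative with respect to K is
$$MP_K = \frac{\partial q}{\partial K} = 2L^{0.5}K^{-0.5} = 2\left(\frac{L}{K}\right)^{0.5}$$
and dividing the two gives
$$MRTS_{LK} = \frac{MP_L}{MP_K} = \frac{K}{L}$$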
There are three important takeaways from the graph of imperfect substitutes (an
isoquant, the curve of input combinations that produce the same output):
1. Slopes down – this is just another way of saying that marginal product is
positive. Since marginal product is positive, one more of X1 will
contribute to q, and thus we will need some amount less of X2.
2. Bowed in – this is the more important conclusion. You can see that as X1
increases a lot, the graph flattens out. This can be interpreted as: the
more of one input you have, the lower its marginal product.
3. Isoquants cannot cross.
To look more specifically at #2, look at the following graph, which produces q0
units:
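[Figure: the isoquant for q0 units, with K on the vertical axis and L on the
horizontal axis. Each one‐unit increase in L is matched by a drop ∆K in K, and
(‐slope) = MRTSLK.]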
At the top of the graph, 1 more unit of labor causes lots of capital to be freed up,
represented by the large ∆K; however, as we add more labor, ∆K becomes smaller.
Also, the negative of the slope of the above graph is just MRTSLK.
For the final point about isoquants, look at the following graph:
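[Figure: two crossing isoquants, q1 and q2, with K on the vertical axis and L on
the horizontal axis; points a, b, c, and d are marked on the curves.]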
Comparing points a and b, we notice that point a has more of both capital and
labor. Thus, we can conclude that q2 > q1. However, looking at points c and d, we
can see that c has more of both labor and capital, and thus q1 > q2. Since both of
these cannot be true, isoquants cannot cross, which is our third takeaway.
Now that we have defined isoquants, we can use them to illustrate how we will
minimize cost. First it is necessary to introduce our cost curves. These are called
isocosts, as they represent one cost. Since we are using only labor and capital, our
cost equation is
C = wL + rK
where w is the wage rate, or the cost of labor, and r is the rate of interest, or the cost
of capital (think of it as the rate of interest being charged on the machinery we’re
renting, or the amount of interest being charged on a loan we took to buy the
machinery). It’s easier to draw if we rearrange it to the following (solving for K):
$$K = \frac{C}{r} - \frac{w}{r}L$$
and since K is on the y‐axis, and the equation is in y=mx+b form, the intercept is
C/r and the slope is −w/r. Let’s look at a graph with both curves in it:
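[Figure: the isoquant q0 and three parallel iso‐cost lines with K‐intercepts c1/r,
c0/r, and c2/r, drawn with K on the vertical axis and L on the horizontal axis. All
three iso‐cost curves have a slope m = −w/r.]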
In this illustration there are three (blue) iso‐cost curves. The cost of labor (w)
and cost of capital (r) do not change across them; the only thing that changes is the
total amount we spend on inputs. The iso‐cost curve C1 doesn’t buy enough inputs
anywhere along it to reach the q0 isoquant. The curve C2 reaches it in two places,
but there is a cheaper solution. The curve C0 is the best solution, since it reaches
the isoquant at the minimum cost. This is because C0 touches the isoquant in just
one place. Thus, to minimize cost, you want an iso‐cost curve that is tangent to your
isoquant curve.
In the figure to the right, K* is the optimum amount of capital, and L* is the
optimum amount of labor. Since the derivative is nothing more than the slope of the
tangent, the derivative of the isoquant at (L*, K*) and the slope of the iso‐cost curve
are the same. Thus, when
$$-MRTS_{LK} = -\frac{w}{r} \quad\text{or}\quad MRTS_{LK} = \frac{w}{r}$$
you are producing at minimum cost. Looking at the definitions of both sides again, it
should make sense why this is true. MRTSLK is the rate at which (internally) you can
substitute capital for one unit of labor, keeping output constant. The ratio w/r is the
market rate at which you can give up capital to buy another unit of labor, keeping
cost constant. In order to minimize cost, the rate at which you can substitute capital
for labor internally has to equal the rate at which the market allows you to do so.
[Figure: an iso‐cost line tangent to the isoquant at the point (L*, K*), with K on the
vertical axis and L on the horizontal axis.]
If we write MRTSLK as MPL divided by MPK (which is just its definition), we can
rearrange the equation:
$$\frac{MP_L}{MP_K} = \frac{w}{r} \iff \frac{MP_L}{w} = \frac{MP_K}{r}$$
which says the marginal product of each input divided by its cost should be equal.
To further explain, the marginal product of labor divided by its cost is how much
additional output you can get for spending $1 more on labor. Think of this as “bang
per buck.” Thus, when this equality holds, the last dollar spent on labor provides
exactly the same additional output as your last dollar spent on capital. Note that if
one side is higher, you should use more of that input, as it is more productive for
the same amount of money ($1).
The final way to explain this equality requires that we rearrange the equation as
follows:
$$\frac{MP_L}{MP_K} = \frac{w}{r} \iff \frac{w}{MP_L} = \frac{r}{MP_K}$$
which says the cost of each input divided by its marginal product should be equal.
Each ratio is the cost of obtaining one more unit of output using that input. Thus,
when it costs the same to obtain one more unit of output using either labor or
capital, your inputs are optimally allocated and you are minimizing cost. Note in
this case the smaller the number the better, since it’s essentially the marginal cost
of obtaining one more unit.
_________________________________________________________________________________________________
Example: Cost – Optimization Condition
Suppose MPL = 10, MPK = 5, w = 20, and r = 5. Which input (labor or capital)
should we use more of?
Solution: We can look at this any of the three ways described above. Let’s look at
the “bang per buck” method.
$$\frac{MP_L}{w} = \frac{10}{20} = 0.5 \qquad \frac{MP_K}{r} = \frac{5}{5} = 1$$
This says that for $1 spent on labor we can get 0.5 more units, but for $1 spent
on capital we can get 1 more unit. Thus, we should be using more capital.
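The same comparison can be written as a few lines of code. This is only a sketch of the “bang per buck” rule; the function names are hypothetical and chosen for this illustration.

```python
import math


def bang_per_buck(marginal_product, input_price):
    """Additional output obtained from the last dollar spent on an input."""
    return marginal_product / input_price


def input_to_expand(mp_labor, mp_capital, wage, rental_rate):
    """Return which input gives more output per dollar, or 'neither' if they tie."""
    labor = bang_per_buck(mp_labor, wage)
    capital = bang_per_buck(mp_capital, rental_rate)
    if math.isclose(labor, capital):
        return "neither"  # the optimization condition already holds
    return "labor" if labor > capital else "capital"


# Numbers from the example: MPL = 10, MPK = 5, w = 20, r = 5.
print(bang_per_buck(10, 20), bang_per_buck(5, 5))
print(input_to_expand(10, 5, 20, 5))
```

With the numbers from the example, the comparison is 0.5 against 1, so the rule points to capital.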
_________________________________________________________________________________________________
So in order to minimize cost, two things must hold:
1. Optimization condition:
$$MRTS_{ij} = \frac{P_i}{P_j} \quad\text{or}\quad \frac{MP_i}{P_i} = \frac{MP_j}{P_j} \quad\text{or}\quad \frac{P_i}{MP_i} = \frac{P_j}{MP_j}$$
2. Production constraint: q = f(x1,x2,…,xn)
The optimization condition can be expressed in any of the three above ways, and
it’s important that you be able to explain what they mean. It basically means your
inputs are allocated efficiently. The production constraint just means that if you
want to produce 10 units of output, you must have enough inputs to physically make
10 units. Note that the production constraint doesn’t take into account any sort of
efficient allocation of inputs. Thus, solving 1 and 2 above together (simultaneously)
will provide us with a cost function [C(q)], which is a function that tells us the
minimum cost of producing q units.
_________________________________________________________________________________________________
Example: Cost function
Suppose our production function is $q = 4L^{0.5}K^{0.5}$, the cost of labor is w=20 and
the cost of capital is r=5. Find the cost function.
Solution: We know both conditions have to hold (optimization, production).
Let’s first look at the optimization. Using the form MRTSLK = (w/r), we see
$$MRTS_{LK} = \frac{MP_L}{MP_K} = \frac{K}{L}$$
which was found by dividing the two partial derivatives (marginal products). For
a more detailed explanation of the algebra, refer back to the example on MRTS. The
optimization condition tells us
$$MRTS_{LK} = \frac{w}{r}$$
so
$$\frac{K}{L} = \frac{20}{5} = 4 \iff K = 4L$$
which is telling us for each unit of labor we use, we should use 4 units of capital.
Now that we have the optimization condition, we can substitute it into the
production function to solve our system of equations. The production function was
given to us in the problem. Thus,
$$q = 4L^{0.5}K^{0.5}$$
$$q = 4L^{0.5}(4L)^{0.5} \quad \text{(substituting in our optimization condition)}$$
$$q = 8L$$
$$L^* = \frac{q}{8} \quad\text{and}\quad K = 4L \quad\text{so}\quad K^* = \frac{q}{2}$$
where L*, K* are the optimum amounts of labor and capital. This tells us how
much capital and labor we need for q units of output in the most efficient way. To
represent our cost function, we simply need to multiply how much each input cost
by how many units of input we’re using. So
$$C(q) = wL^* + rK^*$$
$$C(q) = 20\left(\frac{q}{8}\right) + 5\left(\frac{q}{2}\right) = 5q$$
and the minimum cost of producing q units is 5q. Note that if the question had
asked what amount of labor would be needed to minimize the cost of producing 15
units, we could simply plug 15 into our equation L = (q/8).
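As a check on the algebra, here is a minimal sketch in code. It assumes the production function has the form q = scale × √L × √K (with scale = 4 as in this example); the function names, the grid step, and the brute‐force check are our own illustration, not a method from the course. The first function applies the two conditions directly; the last one ignores the tangency condition and simply searches along the isoquant for the cheapest input mix.

```python
import math


def cost_minimizing_inputs(q, w, r, scale=4.0):
    """Inputs minimizing w*L + r*K subject to q = scale * sqrt(L) * sqrt(K).

    The optimization condition K/L = w/r gives K = (w/r) * L; substituting
    that into the production constraint pins down L.
    """
    ratio = w / r                              # optimal K/L
    labor = q / (scale * math.sqrt(ratio))
    capital = ratio * labor
    return labor, capital


def cost_function(q, w, r, scale=4.0):
    """Minimum cost C(q) of producing q units at input prices w and r."""
    labor, capital = cost_minimizing_inputs(q, w, r, scale)
    return w * labor + r * capital


def brute_force_cost(q, w, r, scale=4.0, step=0.001, steps=20000):
    """Cheapest point found by trying a grid of L values along the isoquant."""
    best = math.inf
    for i in range(1, steps + 1):
        labor = i * step
        capital = (q / scale) ** 2 / labor     # capital that keeps output at q
        best = min(best, w * labor + r * capital)
    return best


q = 10
print(cost_minimizing_inputs(q, 20, 5))        # L* = q/8 and K* = q/2
print(cost_function(q, 20, 5))                 # 5q
print(round(brute_force_cost(q, 20, 5), 4))    # grid search lands on about the same cost
```

The grid search is slow and approximate, but it agrees with C(q) = 5q, which is some reassurance that the tangency condition really does pick out the cheapest point on the isoquant.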
_________________________________________________________________________________________________
Now let’s consider changing input prices. Looking at the optimization condition,
$$\frac{MP_L}{w} = \frac{MP_K}{r},$$
suppose w (the cost of labor) increases. This must mean that MPL/w decreases.
Thus, to restore the equality, we will use less labor and more capital. Graphically,
we see in the top figure to the right that as the wage increases from w0 to w1, our
iso‐cost line changes slope and no longer reaches our isoquant of q units. Notice
that the y‐intercept for the original iso‐cost line doesn’t change, since the cost of
capital (r) didn’t change. Thus, we have to increase our total spending on inputs in
order to get the new (black) iso‐cost line C1 to reach our isoquant. It’s important to
realize that C1 is greater than C0; this is because, at the new wage, C0 could not
reach the isoquant. Looking at the second figure to the right, we see that the amount
of labor (L*) decreases and the amount of capital (K*) increases, as we would expect
following an increase in the cost of labor.
[Figures: (top) the isoquant q with iso‐cost lines whose K‐intercepts are c0/r and
c1/r and whose L‐intercepts are c0/w0 and c0/w1; (bottom) the optimal inputs
moving from (L*0, K*0) to (L*1, K*1).]
The fact that our allocation of labor and capital changed is not enough to
conclude that C1>C0. This conclusion is drawn because once wage has increased, our
original iso‐cost line (which, remember, shows combinations of labor and capital for
the same cost C0) wasn’t able to reach the isoquant curve. This is why the new cost is
higher than the old.
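To put numbers on this, the following worked sketch reuses the production function q = 4L^0.5K^0.5 with w0 = 20 and r = 5 (for which L*0 = q/8, K*0 = q/2, and C0 = 5q), and assumes, purely for illustration, that the wage rises to a hypothetical w1 = 80 while r stays at 5.

```latex
\begin{align*}
\frac{K}{L} &= \frac{w_1}{r} = \frac{80}{5} = 16 \;\Rightarrow\; K = 16L \\
q &= 4L^{0.5}(16L)^{0.5} = 16L \;\Rightarrow\; L_1^* = \frac{q}{16}, \qquad K_1^* = 16L_1^* = q \\
C_1(q) &= 80\left(\frac{q}{16}\right) + 5\,(q) = 10q
\end{align*}
```

Labor falls from q/8 to q/16, capital rises from q/2 to q, and the minimum cost rises from 5q to 10q. Had we kept the old input mix at the new wage, the cost would have been 80(q/8) + 5(q/2) = 12.5q, so substituting toward capital softens the cost increase but does not undo it.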
We’ve just seen how to use the optimization condition and production function
to derive a cost function. Now when you see a general profit function π=p*q – C(q),
you know what C(q) represents. The following is another way of arriving at the
same cost function. It represents the method that the computer uses to solve the
same system of equations that we did by algebra. It is a supplemental topic, not
essential for the class, but occasionally has been asked as an extra credit question on
a test.
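The Calculus of Cost Minimization – the Lagrangian
The way the computer looks at the problem is the following.
Minimize wL + rK subject to q ≤ f(L,K)
which just means the computer will use calculus to minimize the cost expression
wL + rK, while making sure there are enough inputs to achieve the desired level of
output. The equation is
$$\mathcal{L} = wL + rK + \lambda[q - f(L,K)]$$
where λ is the Lagrange multiplier and $\mathcal{L}$ is the Lagrangian (not to be
confused with labor, L). Then, minimize by setting the partial derivatives with
respect to L, K, and λ equal to zero and solving. Looking at the partials, we see
$$\frac{\partial \mathcal{L}}{\partial \lambda} = q - f(L,K) = 0 \Rightarrow q = f(L,K)$$
which is just our production function. The partials for the next two variables are
$$\frac{\partial \mathcal{L}}{\partial L} = w - \lambda MP_L = 0 \Rightarrow w = \lambda MP_L \quad\text{and}\quad \frac{\partial \mathcal{L}}{\partial K} = r - \lambda MP_K = 0 \Rightarrow r = \lambda MP_K$$
Dividing these two equations we get
$$\frac{w}{r} = \frac{\lambda MP_L}{\lambda MP_K} \Rightarrow \frac{w}{r} = \frac{MP_L}{MP_K}$$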
which is just our optimization condition. Thus we see the use of the Lagrange
multiplier leads to the same set of equations that need to hold. Some students like to
use this method to solve problems, but again, there will never be a problem that will
explicitly require you to use this method.
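For readers who want to see the method run on actual numbers, here is a sketch applying it to the production function q = 4L^0.5K^0.5 with w = 20 and r = 5. The symbol $\mathcal{L}$ for the Lagrangian is our notation, chosen to keep it distinct from labor.

```latex
\begin{align*}
\mathcal{L} &= 20L + 5K + \lambda\left[q - 4L^{0.5}K^{0.5}\right] \\
\frac{\partial \mathcal{L}}{\partial L} &= 20 - 2\lambda\left(\frac{K}{L}\right)^{0.5} = 0 \\
\frac{\partial \mathcal{L}}{\partial K} &= 5 - 2\lambda\left(\frac{L}{K}\right)^{0.5} = 0 \\
\frac{\partial \mathcal{L}}{\partial \lambda} &= q - 4L^{0.5}K^{0.5} = 0 \\
\text{Dividing the first two conditions:}\quad \frac{20}{5} &= \frac{K}{L} \;\Rightarrow\; K = 4L \\
\text{Substituting into the third:}\quad q &= 8L \;\Rightarrow\; L^* = \frac{q}{8}, \quad K^* = \frac{q}{2}, \quad \lambda = 5
\end{align*}
```

These are the same input demands we found by algebra, so C(q) = 5q again. In this example λ = 5 also matches the derivative of C(q) = 5q with respect to q; the multiplier can be read as the cost of producing one more unit.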
Cost in the Short Run
Up until this point, we’ve assumed that managers have complete control over
how much capital or labor to buy, or that they are not limited in their decisions of
inputs. In the short run, however, some of the inputs may be fixed. Thus, STC(q),
the short‐run (total) cost function, is the minimum cost of producing q units
given a fixed amount of one (or more) input(s).
In the short‐run, there are fixed and sunk costs. It’s important to understand the
difference. A cost is fixed if it doesn’t change as quantity changes. Think of it as the
opposite of variable cost. A sunk cost is money that cannot be recovered. If we take
out a loan to build a factory and pay $10,000 a month in interest, that cost is fixed. If
the factory cost $1,000,000 to build, it’s not necessarily all sunk cost, because if
we could sell the factory for $900,000 we could recover most of it. In this
example, the $100,000 of the factory’s cost that we can’t get back would be our
sunk cost.
Given the definition of short‐run, we can define some other common terms that
will be helpful.
• SMC (short‐run marginal cost): the derivative of STC, or $\frac{dSTC}{dq}$
• SATC (short‐run average total cost): the average cost per unit, or $\frac{STC}{q}$
• SAVC (short‐run average variable cost): variable (total − fixed) cost per unit, or
$\frac{STC - SFC}{q}$
• SAFC (short‐run average fixed cost): average fixed cost per unit, or $\frac{SFC}{q}$
We’re now going to explain the graphs of all the curves. Let’s first look at SMC in
the figure to the right. Why do we illustrate marginal costs as increasing?
Diminishing returns to a fixed factor. Remember that in the short‐run, there is some
input that is fixed; for example, a plant. If you’re producing in a single plant, at some
point if you try to cram more and more workers in that one plant, each worker’s
effectiveness will decrease. Thus, the decreasing marginal productivity with a fixed
factor means increasing marginal costs.
[Figure: an upward‐sloping SMC curve, with $ on the vertical axis and q on the
horizontal axis.]
The main assumption in the short‐run is that there is a fixed factor, so marginal
cost is increasing. Average fixed cost decreases as quantity increases since our fixed
costs are held constant in the short‐run. Average variable cost may fall at first if
adding workers to a factory initially makes everyone more specialized and
productive. However, for the same reason that marginal cost increases ‐ diminishing
returns to a fixed factor ‐ average variable costs will eventually rise. Average total
cost, which is just the sum of the AFC and AVC curves, therefore must eventually rise
as well.
[Figure: the MC, ATC, AVC, and AFC curves, with $ on the vertical axis and q on the
horizontal axis; ATC = AFC + AVC.]
One important thing to note about the above graph is that the marginal cost
curve crosses the average variable cost curve where average variable cost is at its
minimum. There is no economic reason for this; it’s just the way the math works
out. The easy way to think of it is in terms of test grades. If your average is an 80 and
you get a 90 on a test, it will pull up your average. Think of the marginal cost as your
next test grade, and the average variable cost as your overall grade. The only time
that the marginal cost curve would not change your average is if it were equal to
your average (i.e. a test score of 80 wouldn’t change your class grade of 80). Thus,
when they cross, average variable cost won’t go up or down, which is why it crosses
it at the minimum point on the graph. This is the same reason why the MC curve
crosses the ATC curve at its minimum.
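The test‐grade intuition can be backed up with one line of calculus. In the sketch below, VC(q) = STC(q) − SFC denotes variable cost (our shorthand), so that AVC = VC/q and, because fixed cost does not change with q, MC = dVC/dq.

```latex
\begin{align*}
\frac{d\,AVC}{dq} = \frac{d}{dq}\left[\frac{VC(q)}{q}\right]
= \frac{q\,VC'(q) - VC(q)}{q^2}
= \frac{1}{q}\left[MC(q) - AVC(q)\right]
\end{align*}
```

So AVC is falling whenever MC < AVC, rising whenever MC > AVC, and flat (at its minimum on a U‐shaped curve) exactly where MC = AVC. Replacing VC with STC gives the same result for ATC.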
To clear up some terminology issues that may arise later, let’s define a tricky
cost. Say your factory is up, but you aren’t producing any units. Any money you’re
paying just for the factory is the fixed cost, since it doesn’t vary with output (as your
current production level is 0). If you decide to produce even one unit, and have to
incur another $10,000 in order to start your assembly line, clean your machines, etc.
we refer to these start‐up costs as variable costs because they are avoidable. Even
though these don’t change between producing 1 and 100 units, they are incurred
between producing 0 and 1 unit, and thus are lumped into the variable cost
category. Thus, the costs that are wrapped up into the AFC curve are those that are
not avoidable; those that are avoidable are contained in the AVC curve.
_________________________________________________________________________________________________
Example: Short‐run Cost
Using the same production function, cost of labor, and cost of capital as in the
cost function example, if our amount of capital is fixed at K = 4, what is the new
minimum cost function?
Solution: From earlier, our production function was $q = 4\sqrt{L}\sqrt{K}$, r=5, and w=20.
Since K is fixed, we can just plug it in to our production function.
$$q = 4\sqrt{L}\sqrt{4}$$
$$q = 8\sqrt{L}$$
$$q^2 = 64L$$
$$L = \frac{q^2}{64}$$
Notice there is no need to efficiently allocate resources to both labor and capital
(as in the previous example) since the amount of capital we have is fixed; in other
words, we don’t have the freedom to parcel out our resources as economically as
possible. Our short‐run total cost is
$$STC = wL + rK$$
$$STC = 20 \cdot \frac{q^2}{64} + 5 \cdot 4$$
$$STC = \frac{5}{16}q^2 + 20$$
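If you remember from our first example (where capital wasn’t fixed) our cost
function was 5q. The short‐run cost is never below this: STC − 5q = (5/16)(q − 8)^2,
which is zero only at q = 8, the output for which K = 4 is the cost‐minimizing
amount of capital (K* = q/2). Thus, the fixed factor has increased our minimum cost
at every other output level, as we would expect. The rest of our costs are as follows
(remember in this case, capital is fixed and labor is variable):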
$$AFC = \frac{rK}{q} = \frac{20}{q}$$
$$MC = \frac{d}{dq}\left(\frac{5}{16}q^2 + 20\right) = \frac{10}{16}q = \frac{5}{8}q$$
$$AVC = \frac{wL}{q} = \frac{\frac{5}{16}q^2}{q} = \frac{5}{16}q$$
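The short‐run curves from this example can be tabulated with a short script. This is a sketch under the example’s assumptions (q = 4√L√K, w = 20, r = 5, K fixed at 4); the function and variable names are ours.

```python
def short_run_costs(q, w=20.0, r=5.0, capital=4.0, scale=4.0):
    """Short-run cost measures for q = scale * sqrt(L) * sqrt(K) with K fixed."""
    labor = (q / (scale * capital ** 0.5)) ** 2   # labor needed to produce q
    variable_cost = w * labor                     # wL, the only cost that varies
    fixed_cost = r * capital                      # rK, paid at any output level
    return {
        "STC": variable_cost + fixed_cost,
        "AFC": fixed_cost / q,
        "AVC": variable_cost / q,
        # variable cost is proportional to q**2, so dSTC/dq = 2 * VC / q
        "MC": 2 * variable_cost / q,
    }


def long_run_cost(q):
    """Long-run cost function C(q) = 5q from the cost function example."""
    return 5.0 * q


for q in (4, 8, 16):
    costs = short_run_costs(q)
    print(f"q={q:2d}  STC={costs['STC']:6.2f}  long-run C={long_run_cost(q):6.2f}  "
          f"MC={costs['MC']:5.2f}  AVC={costs['AVC']:5.2f}  AFC={costs['AFC']:5.2f}")
```

The short‐run and long‐run totals coincide only at q = 8; at the other quantities the fixed amount of capital makes production more expensive than it would be if K could be adjusted.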
_________________________________________________________________________________________________
Long‐run Cost Curves
In the long‐run, there are no sunk costs. Thus, you have complete control over
the amount of each input you choose to use. We assume that all costs are variable,
since the variable costs by definition are those that can be avoided. In the long‐run,
you can plan to either make or sell a factory, which is why there are no fixed
(unavoidable) costs. The standard long‐run average cost curve is shown in the
figure to the right.
[Figure: a U‐shaped LRAC curve, with $ on the vertical axis and q on the horizontal
axis; the regions are labeled IRS, CRS, and DRS from left to right, and M.E.S. is
marked on the q axis.]
The reason the LRAC curve is U‐shaped is the following. At low quantities, we
can utilize economies of scale and our average cost falls. It’s at this point that adding
more machinery or workers is beneficial. This is also called increasing returns to
scale (IRS). Then there’s a period in the middle where adding another machine or
worker costs about the same amount of money as it provides; thus, average cost
doesn’t change a lot. This is called constant returns to scale (CRS).
The final part of the cost curve increases, which suggests some decreasing
efficiency as output increases. In our short‐run model, this is because we were
restricted to a fixed factor (i.e. one plot of land, one factory). Why would average
cost increase in the long‐run? We can build as many factories as we like, or hire as
much labor as we need; we aren’t restricted by any fixed factors. The replication
argument suggests that once each of your production facilities reaches constant
returns to scale (the output at which this first happens is called the minimum
efficient scale), you should just replicate that facility and thus avoid the increasing
costs at higher outputs. There is a fallacy in this argument, however.
Looking at an individual firm, we know that there are several workers at the
bottom, some middle workers who manage the bottom workers, and then some sort
of governance structure at the top that manages the long‐term direction of the firm.
If the replication argument holds, in order to obtain higher outputs we simply open
up new factories, each at our M.E.S. (minimum efficient scale) to avoid higher costs.
The problem is that as the efficient process is being duplicated, there is more
information for the managers at the top to process. Thus, as the firm grows in size
managers become less efficient, and costs go up. This is the reason for increasing
average costs at high quantities of output, or diminishing returns to scale (DRS).
The long‐run marginal cost curve has the same property as the short‐run
marginal cost line, in the sense that it crosses the average cost curve at its
minimum. This is shown in the top figure to the right. If we add a demand curve to
our LRAC curve, as in the bottom figure to the right, we can see that the different
economies of scale that a firm may encounter as it increases production may not
matter. Looking at demand curve d0 we see that the total industry demand will
never be enough for our factories to encounter constant or even decreasing returns
to scale. This situation lends itself to a monopoly, since it’s most efficient for one
firm to produce the quantity demanded.
[Figures: (top) the LRMC curve crossing the LRAC curve at its minimum; (bottom)
the LRAC curve with demand curves d0 and d1. Both have $ on the vertical axis and
q on the horizontal axis.]
If instead demand is given by demand curve d1, there may be several firms
producing at M.E.S. without exceeding demand. In general, as demand for a product
increases, more firms may “fit” in the market, which makes intuitive sense.
Finally, let’s look at both short‐run and long‐run average cost curves on the same
graph. Remember, LRAC is the minimum average cost of producing q units; it isn’t
subject to any sunk (or fixed) costs. At quantity q0 we get our minimum cost by
looking at the LRAC curve. If we build a plant specifically designed to produce q0
units efficiently, SRAC and LRAC will be the same at q0. However, if we use that plant
to produce anything other than q0 units, SRAC>LRAC because in the short‐run we
are limited to a fixed factor. This is true in general; short‐run average costs are
higher than long‐run average costs at any given quantity, unless your plant is
producing the amount of output it was designed for.
[Figure: the SRAC curve touching the LRAC curve from above at q0, with $ on the
vertical axis and q on the horizontal axis.]
Minimizing Costs with Multiple Plants
Many firms have multiple plants, which are designed to use different production
techniques. For a simple introduction, let’s assume we have two plants A and B. If all
“fixed” costs are sunk, then the decision becomes how much to produce in plant A
and how much to produce in plant B. Suppose that initially MCA<MCB. You should
produce initially in plant A, since it’s cheaper to do so. However, we know that as we
produce more in one plant, diminishing returns to a fixed factor will increase MCA.
At some point, it will be as expensive to produce another unit in A as in B. At that
point, it makes sense to use both plants. If ever the marginal cost is lower in one
than in the other, cost could be reduced by reallocating output to produce more in
the cheaper plant, which would increase its MC, and less in the other, which would
decrease its MC. The only time output should not be reallocated is when the
marginal costs are equal across plants. Thus, the optimum allocation of production
across both plants occurs when MCA=MCB.
This is illustrated in the figure to the right. We see that at quantities up until q0,
MCA<MCB, so we just produce in A. Eventually MCA increases (because of
diminishing returns) and we switch to using both plants. Remember, we want
MCA=MCB, so for a given level of marginal cost ($*), we see how much we can
produce in plant A (qA*), how much we can produce in plant B (qB*) and add
together the horizontal distances to obtain how much our firm can produce at a
given marginal cost (q*). Since this is the minimum‐cost way of producing q units
for our entire firm (using both plants), this becomes our firm’s MC curve. In
summary, if fixed costs are sunk, produce where MCA=MCB.
[Figure: the marginal cost curves MCA, MCB, and MCFIRM, with $ on the vertical axis
and q on the horizontal axis; q0 is marked, and at marginal cost $* the quantities
qB*, qA*, and q* are marked.]
Suppose “start‐up” costs are not yet sunk. In other words, we can avoid certain
costs if we don’t produce any units in a certain plant, but, as soon as we produce any
output at all, the cost becomes fixed. For example, it might cost $10,000 to start up a
production line. Once the line is ready to go into production, that $10,000 is sunk
and cannot be recouped. Thus, for very low quantities, we would want to use only
the plant with the lowest start up costs. Suppose plant B has the lowest start up
costs, even though it has a higher marginal cost than plant A. Then we would use
only plant B for a small enough output level to save on start up costs. For very high
quantities, the start‐up costs become less important, and you would want to
minimize the variable costs. This becomes the same situation as earlier, and you
would want to use both plants and produce where MCA=MCB. At some intermediate
quantity, it may make sense to use just the plant with the lowest MC, but this is not
certain – it depends on the particulars of the situation.
To conclude, note that there are three possibilities for producing q units:
produce them all in plant A, produce them all in plant B, or (efficiently) split up
production between plants A and B. The previous paragraph had some rules of
thumb for choosing which plants to use at certain quantities to obtain minimum
cost, but these can always be verified simply by testing all three methods.
_________________________________________________________________________________________________
Example: Multiple Plants
Suppose there are two plants with cost functions
$$C(q_1) = 20 + 0.25q_1^2 \quad\text{and}\quad C(q_2) = 10 + q_2^2$$
and the fixed costs in plant 1 are sunk. What is the minimum cost of producing
20 units?
Solution: A production level of 20 units is neither especially high nor low, so
let’s test all three methods and see which one is cheapest. Note the fixed costs in
each plant are the part of the cost function that does not depend on q (20 in plant
1 and 10 in plant 2).
First let’s see how much it costs to produce 20 units in only plant 1:
$$C(q_1) = 20 + 0.25(20)^2 = 20 + 0.25(400) = 120$$
Now, let’s look at producing all 20 units in only plant 2. Notice that the problem
tells us that the $20 of fixed costs in plant 1 are sunk, so even though we’re not
producing any units there, we have to take them into account when looking at
using only plant 2:
$$C(q_2) = 10 + (20)^2 + 20 = 30 + 400 = 430$$
Finally, let’s look at the cost of producing 20 units using both plants. When using
both plants we want to set MC1=MC2.
$$MC_1 = \frac{d}{dq_1}[C(q_1)] = 0.5q_1 \quad\text{and}\quad MC_2 = \frac{d}{dq_2}[C(q_2)] = 2q_2$$
Setting these two equal we get
.5q1=2q2
q1=4q2
which says that for every unit we produce in plant 2 we should produce 4 in
plant 1. Since we have only one equation and two variables, we can’t solve it.
However, we know that the total amount of units that we are going to produce, q,
must be the amount we produce in plant 1 plus the amount we produce in plant
2:
q = q1 + q2
and substituting in our equation from setting marginal costs equal:
q = (4q2) + q2
q = 5q2
q2 = (q/5)
which means (1/5) of total quantity will be in plant 2. Looking again at the total
quantity equation we see
q = q1 + q2
q = q1 + (q/5)
q1 = (4q/5)
so the other (4/5) of total quantity is produced in plant 1. To find the total cost
of production using both plants, we just add the cost of producing q1 units in
plant 1 to producing q2 units in plant 2:
$C(q) = 20 + 0.25q_1^2 + 10 + q_2^2$
$C(q) = 30 + 0.25(0.8q)^2 + (0.2q)^2$
$C(q) = 30 + 0.2q^2$
Since q = 20 units, we can see how much this would cost:
$C(20) = 30 + 0.2(20^2) = 110$
Thus, to minimize cost, we should use both plants, producing 16 in plant 1 and 4
in plant 2 (found by using the above equations for q1, q2).
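As a quick check on the arithmetic, the three methods can be compared in a few lines of Python. This is only a sketch: the function names are made up for illustration, and it assumes, as the example does, that plant 1's fixed cost of 20 is sunk (paid even at zero output) while plant 2's fixed cost of 10 is paid only if plant 2 is used. The 4/5 and 1/5 split is specific to these two cost functions.

```python
def cost_plant1(q1):
    """Plant 1: sunk fixed cost of 20 plus 0.25*q1^2 (the 20 is paid even if q1 = 0)."""
    return 20 + 0.25 * q1 ** 2


def cost_plant2(q2):
    """Plant 2: fixed cost of 10 plus q2^2, with the 10 avoided if the plant is idle."""
    return 10 + q2 ** 2 if q2 > 0 else 0


def three_methods(q):
    """Total cost of producing q using plant 1 only, plant 2 only, or both."""
    q1, q2 = 4 * q / 5, q / 5  # split from MC1 = MC2, i.e. q1 = 4*q2
    return {
        "plant 1 only": cost_plant1(q) + cost_plant2(0),
        "plant 2 only": cost_plant1(0) + cost_plant2(q),
        "both plants": cost_plant1(q1) + cost_plant2(q2),
    }


costs = three_methods(20)
cheapest = min(costs, key=costs.get)
print(costs, cheapest)
```

For q = 20 this reproduces the costs of 120, 430, and 110 found above, so using both plants is cheapest.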
________________________________________________________________________________________________
Chapter 11 Terminology
The following is a list of terms that you should know in order to discuss and
apply the material from this chapter.
Economic Cost The cost of the decision made plus the opportunity cost of choosing
the next best alternative. This includes both quantifiable and qualitative costs.
Opportunity Cost The value of an alternative given up when making a decision.
Accounting Cost Strictly the monetary cost of a decision. This does not include
opportunity costs.
Marginal Product The additional output achieved when adding another unit of
input. Mathematically, it is the partial derivative of the entire production function
with respect to the input in question.
Capital (K) Factors of production used to make goods and services.
Perfect Substitutes Goods that share similarities and thus can be replaced or
substituted by one another. In input substitution, these are inputs that when
substituted do not have any effect on output. In this case, a manager would just buy
whichever input is the cheapest.
Perfect Complements Goods that are purchased with one another. In input
substitution, these inputs have fixed proportions. In this case, a manager has no
discretion over the choice of inputs.
Imperfect Substitutes Goods that fall somewhere between perfect substitutes and
perfect complements. Decision‐making is important with these goods because a
manager must choose an input combination that is the most efficient in order to
determine the optimum allocation of inputs.
Marginal Rate of Technical Substitution (MRTS) A tool for measuring
substitutability. It is the rate at which one input can be substituted for the other
while producing the same level of output.
Isoquant A graphical representation of the substitutability of inputs. It shows
different combinations of inputs that produce a single amount of output.
Isocost A line that shows different combinations of inputs that cost the same
amount.
Lagrangian A way to set up a constrained optimization problem for finding the
minimum or maximum of a function subject to constraints. This can be used to minimize
cost with respect to a production constraint or maximize utility with respect to a
budget constraint.
Sunk Cost Money spent in the past that can never be retrieved, regardless of
any decision made.
Short Run Time period in which there is a fixed factor of production (e.g. a
production plant) and fixed costs cannot be avoided if the manager decides to shut
down the firm.
Long Run Time period in which there are no fixed factors of production. All fixed
costs can be avoided and there are no sunk costs. A manager in a long run situation
would have complete control over the amount of each input used.
Start-Up Cost Cost incurred only when a manager chooses to start production in a
firm. Since this type of cost can be avoided, it is referred to as a variable cost.
Long Run Average Cost A curve made up of the minimum points of all of the short
run average cost curves.
Economies of Scale Benefits that a firm experiences from expansion up until a
certain point in the long run.
Increasing Returns to Scale (IRS) In the long run, the range of low quantities
over which output can be increased with a less than proportional increase in inputs,
so average cost falls as output rises.
Constant Returns to Scale (CRS) In the long run, the range of medium output levels
over which output rises in proportion to inputs, so average cost stays the same.
Decreasing Returns to Scale (DRS) In the long run, the range of high output levels
over which producing more requires a more than proportional increase in inputs,
so average cost rises as output rises.
Minimum Efficient Scale (MES) The point at which long run average costs are
minimized. When producing high levels of output, a manager will want to open
many plants operating at MES instead of producing the output in one plant.
Replication Argument Argument that suggests that as each production facility
reaches its constant returns to scale or MES, the facility should be reproduced, thus
avoiding the increasing costs at higher outputs. A weakness of this argument is the high
level of information to be processed by top-level management, leading to
inefficiency and higher costs (decreasing returns to scale).
Part 4
Game Theory ‐
Modeling Strategic Interaction
Chapter 12
One Shot Games with Discrete Choices
In markets where firms are neither pure monopolists nor perfect competitors
(price takers), one firm’s actions have significant impacts on the others. Thus, each
firm takes account of this interdependence when making decisions. Game theory is a
tool that allows us to analyze how decisions are made in environments involving
such strategic interdependence. When we analyze situations (games) in which
strategic decisions are weighed against one another, we are basically searching for
“reasonable” solutions to the game – that is, predictions about the way “rational”
players would play.
A simultaneous game is one in which both players move at the same time.
Think of it as a football game where, during each play, both players have to make
their best guess about what they think the other will do. A sequential game is
where the first player moves, and then the second player moves. Think of it as a
game of chess; the second player that moves already knows what decision the first
player has made, and factors that into what they plan on doing. Finally, a one-shot
game is one that is played just once, while a repeated game is one that is played
many times.
The basic building blocks for describing a game are the following:
1. List of players
2. List of decision nodes (points in a game where a decision is to be made)
and choices available at each node
3. Payoffs (what they are playing for) for every possible outcome.
In order to analyze a game, we take the basic description of the game and use it
to identify all possible strategies for each player, that is, all the ways it is possible for
them to play the game, conditional on all the different situations in which they might
possibly find themselves while playing the game. The number of potential ways
even a simple game of tic-tac-toe can be played is huge. (There are 9!, or
362,880, orders in which the nine squares can be filled.) So, strategies get complicated even for
simple games. Fortunately, the games we will look at are very simple, focusing on
only the one or two most important strategic aspects of a situation.
One Shot Simultaneous Move Games
We will start with the simplest case, a one‐shot, simultaneous move game.
Remember, a one‐shot game means after the game the players aren’t concerned
about any ramifications their decisions may have. Also, a simultaneous game is one
in which all players must act without knowledge of their competitors’ decisions.
The Prisoner’s Dilemma
Imagine Luke and Sam are criminals who have been arrested for a certain crime.
The prosecution says that there is enough evidence to convict both of them for some
minor crime, and they will assuredly go to jail for 6 months. They then take each
criminal into a separate room and tell them if they will testify against their partner,
they will drop the 6‐month charge, and their partner will get 24 months for being
convicted of the larger crime. Each criminal has no idea what his partner chose, and
has to make the decision without cooperation. To look at this game, we will set up a
matrix describing all possible outcomes (which is called the normal form):
                            SAM
                 Testify            Not
LUKE  Testify    L: 24 | S: 24      L: 0 | S: 30
      Not        L: 30 | S: 0       L: 6 | S: 6
To clarify the payoffs:
• TOP-LEFT: Luke testifies, and Sam testifies. Since the deal was to get the
minor charge dropped if you testify, both players are guilty of only the
major charge (24 mos.).
• TOP‐RIGHT: Luke testifies, Sam doesn’t. Therefore Luke gets the minor
charge dropped, and Sam is guilty of both the minor (6) and major (24)
charges.
• BOTTOM‐LEFT: Luke doesn’t testify, Sam does. Sam gets the minor charge
dropped, and Luke is guilty of both (30).
• BOTTOM‐RIGHT: Neither testifies. Thus, they are both guilty of the minor
charge (6).
Now that we have the payoffs, we can look at this game from each player’s
perspective, and look at their best responses given their isolated knowledge. First,
let’s define some special types of strategies:
• Strongly Dominant: A strategy that always has a higher payoff than your other
strategies, regardless of your opponent’s choice.
• Weakly Dominant: A strategy that always has at least as high a payoff as your
other strategies, regardless of your opponent’s choice.
These will make more sense when we finish our example of the Prisoner’s Dilemma.
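The two definitions above can be checked mechanically against the payoff table. The sketch below is illustrative only: the function and variable names are assumptions, and because the payoffs are months in jail, a player's payoff is taken to be the negative of the jail term. "Weakly dominant" follows the definition in the text (at least as high a payoff in every case).

```python
CHOICES = ["Testify", "Not"]

# Months in jail from the table, indexed [Luke's choice][Sam's choice].
LUKE_MONTHS = [[24, 0], [30, 6]]
SAM_MONTHS = [[24, 30], [0, 6]]


def payoff(months, own, other, own_is_row):
    """Payoff is minus the jail term, so fewer months means a higher payoff."""
    return -(months[own][other] if own_is_row else months[other][own])


def dominance(months, own_is_row):
    """Return (choice, kind) if the player has a dominant strategy, else None."""
    n = len(CHOICES)
    for s in range(n):
        diffs = [
            payoff(months, s, o, own_is_row) - payoff(months, t, o, own_is_row)
            for t in range(n) if t != s
            for o in range(n)
        ]
        if all(d > 0 for d in diffs):
            return CHOICES[s], "strongly dominant"
        if all(d >= 0 for d in diffs):
            return CHOICES[s], "weakly dominant"
    return None


print("Luke:", dominance(LUKE_MONTHS, own_is_row=True))
print("Sam:", dominance(SAM_MONTHS, own_is_row=False))
```

With the jail terms above, testifying comes out as strongly dominant for both players: whatever the partner does, testifying saves 6 months.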
[Tables: Prisoner’s Dilemma payoff matrices with best responses marked; columns are Sam’s choices, Testify and Not]
Entry Game
This is a game where there is an incumbent firm, and a firm that is considering
entry into the market. The incumbent has to decide whether or not to expand their
business to meet demand. The entrant has to decide whether or not to enter the
market. The payoffs in this game are profits in dollars.
                             Entrant
                    In                 Out
Incumbent  Expand   I: 20 | E: -20     I: 50 | E: 0
           Not      I: 30 | E: 20      I: 80 | E: 0
To provide a story to go with these payoffs, assume that it’s inefficient for the
incumbent to expand; they’d rather keep the status quo than try to satisfy the entire
market. If they don’t expand, and the entrant comes in, they will lose a lot of
business. Finally, if the entrant comes in and the incumbent expands, there’s an
excess of capacity, but the incumbent will still make some profit due to brand
recognition.
Now let’s look at the best responses:
                             Entrant
                    In                 Out
Incumbent  Expand   I: 20 | E: -20     I: 50 | E: 0*
           Not      I: 30* | E: 20*    I: 80* | E: 0
(Best responses are marked with *.)
The incumbent has a strongly dominant strategy, since not expanding will
always provide a higher payoff. The entrant does not have a dominant strategy,
since they want to enter only if the incumbent is not expanding.
The solution to this game is not as clear, since only one of the players has a dominant
strategy. However, there is still a “reasonable” solution. From the incumbent’s
perspective, they’d rather not expand regardless of the entrant’s strategy. So when
the entrant is looking at this game, it’s reasonable to assume that the incumbent will
never expand. Given that piece of information, the rational decision for the entrant
to make is to enter the market. This process is called iterated elimination of
dominated strategies, and is not as definitive a solution as if both firms had
strongly dominant strategies. Once the “dominated” strategy of expanding is
eliminated from consideration, the entrant has a clear choice – to enter. The
incumbent would prefer the entrant not to enter to achieve a profit of 80, but this
would require the entrant to believe the incumbent was expanding. This is possible
in a sequential game, however, and will be covered after some more discussion on
simultaneous games.
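The elimination argument for the simultaneous entry game can be sketched in code. The names here are hypothetical, and the check covers only strict dominance by another pure strategy, which is all this game needs.

```python
# Payoffs (incumbent, entrant) from the entry game table.
PAYOFFS = {
    ("Expand", "In"): (20, -20),
    ("Expand", "Out"): (50, 0),
    ("Not", "In"): (30, 20),
    ("Not", "Out"): (80, 0),
}


def strictly_dominated(player, strategy, rows, cols):
    """True if another remaining strategy of `player` always pays strictly more."""
    own, other = (rows, cols) if player == 0 else (cols, rows)

    def pay(s, o):
        key = (s, o) if player == 0 else (o, s)
        return PAYOFFS[key][player]

    return any(
        all(pay(alt, o) > pay(strategy, o) for o in other)
        for alt in own if alt != strategy
    )


def iterated_elimination(rows, cols):
    """Remove strictly dominated strategies until none are left to remove."""
    rows, cols = list(rows), list(cols)
    changed = True
    while changed:
        changed = False
        for player, own in ((0, rows), (1, cols)):
            for s in list(own):
                if len(own) > 1 and strictly_dominated(player, s, rows, cols):
                    own.remove(s)
                    changed = True
    return rows, cols


print(iterated_elimination(["Expand", "Not"], ["In", "Out"]))
```

Expand is removed first because Not always pays the incumbent more; once it is gone, Out is dominated for the entrant, leaving (Not, In).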
Battle of the Sexes
Suppose a couple is going out after work to either a basketball game or a show.
They plan to talk by cell phone after work to decide whether to meet at the game or
the show. When they get off work, cell service is out, and they can’t talk. Each one
has to decide where to go without communicating. The normal form of the game is
shown in the table below.
                        Girl
              Show             Game
Boy   Show    B: 3 | G: 2      B: 1 | G: 1
      Game    B: 0 | G: 0      B: 2 | G: 3
To explain the payoffs:
Assume the boy’s favorite activity is the show, and the girl’s favorite activity is
the game. They would both rather be together at their least favorite activity than
apart at their favorite activity. So, when they are together at the show (top-left) the
boy gets one more unit of utility than the girl, since he likes the show. When they are
together at the game, the girl gets one more unit of utility than the boy, since she likes the
game. When they are apart but at the activities they respectively like, they each get
one unit of utility. When they are apart but at the activities they don’t like, they both get 0.
Let’s analyze this game.
                        Girl
              Show               Game
Boy   Show    B: 3* | G: 2*      B: 1 | G: 1
      Game    B: 0 | G: 0        B: 2* | G: 3*
We see neither player has a dominant strategy. In some cases they would each
like to go to the show, but in others they’d like to go to the game. Since the idea of
dominance doesn’t get us anywhere in this game, it’s time to introduce some new
terminology to cope with games such as the one above.
• Maximin or Secure or Safe: This is essentially a strategy that has the best
downside or avoids the worst outcome.
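One way to make the maximin idea concrete is to compute, for each choice, the worst payoff it can produce, and then pick the choice whose worst case is best. The sketch below does this for the Battle of the Sexes payoffs; the function name is an assumption made for illustration.

```python
CHOICES = ["Show", "Game"]

# Payoffs (boy, girl), keyed by (boy's choice, girl's choice).
PAYOFFS = {
    ("Show", "Show"): (3, 2),
    ("Show", "Game"): (1, 1),
    ("Game", "Show"): (0, 0),
    ("Game", "Game"): (2, 3),
}


def secure_strategy(player):
    """Choice with the highest worst-case payoff (player 0 = boy, 1 = girl)."""

    def worst_case(own):
        cells = [(own, o) if player == 0 else (o, own) for o in CHOICES]
        return min(PAYOFFS[cell][player] for cell in cells)

    return max(CHOICES, key=worst_case)


print("Boy:", secure_strategy(0))
print("Girl:", secure_strategy(1))
```

Under these payoffs the boy's secure choice is the show (worst case 1 rather than 0) and the girl's is the game, so if both play safe they end up apart with a payoff of 1 each.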
Looking at this game again, let’s see what both players’ secure strategies are.
[Table: Battle of the Sexes payoff matrix with each player’s secure strategy marked]
he will too. This is also the case for the girl. The Nash equilibria are where the
reaction functions intersect. The graph below illustrates this.
                        Girl
              Show                           Game
Boy   Show    B: 3* | G: 2*  (Nash eq.)      B: 1 | G: 1
      Game    B: 0 | G: 0                    B: 2* | G: 3*  (Nash eq.)
From before, we had shown that neither player had a dominant strategy, as their
best responses were dependent on the other player’s strategy. Their best responses
are marked with asterisks (*) in the above table. Since the Nash equilibrium is where their best
responses intersect, there are two Nash equilibria in this game. This suggests that
there’s no real solution that game theory can come up with for this game, even
though there are two outcomes where each player is playing their best response to
the other.
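The search for cells where both players are best-responding can be written as a short loop. This is a minimal sketch with assumed names, covering pure strategies only.

```python
# Battle of the Sexes payoffs (boy, girl), keyed by (boy's choice, girl's choice).
PAYOFFS = {
    ("Show", "Show"): (3, 2),
    ("Show", "Game"): (1, 1),
    ("Game", "Show"): (0, 0),
    ("Game", "Game"): (2, 3),
}


def pure_nash(payoffs, rows, cols):
    """Return every cell in which each player's choice is a best response."""
    equilibria = []
    for r in rows:
        for c in cols:
            row_best = all(payoffs[(r, c)][0] >= payoffs[(r2, c)][0] for r2 in rows)
            col_best = all(payoffs[(r, c)][1] >= payoffs[(r, c2)][1] for c2 in cols)
            if row_best and col_best:
                equilibria.append((r, c))
    return equilibria


print(pure_nash(PAYOFFS, ["Show", "Game"], ["Show", "Game"]))
```

The loop finds the same two cells as the asterisks do: both at the show, or both at the game. It says nothing about which of the two will actually be played.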
We can imagine that all of the influences dictating the choice of the boy and girl
in the above game are not included in the payoffs. For example, the boy may want
simply to act chivalrous and defer to the girl’s desires, regardless of his own
individual payoff. In this way, social norms may create focal points, which are
obvious solutions to those who know the richer context in which a game takes place.
The following is an example of a focal point. Imagine the game the girl wanted to see
was the national championship – the Nash equilibria would be the same, but most
likely the boy would know how much the girl wanted to see the game, and that
would be the outcome. Really, of course, if such other factors affect the payoffs, we
should include them in the game, and then the set of Nash equilibria might look
different. But, it may make sense to model a game based on some notion of “typical”
payoffs and then wonder outside the formal context of the model what sorts of
conditions might favor one choice over another in any particular situation.
Referring back to the prisoner’s dilemma and the entry game, there was only one
cell where both players’ best responses were chosen, and since those cells were also
intersections of their reaction functions, they were the Nash equilibria.
Prisoner’s Dilemma
                            SAM
                 Testify                     Not
LUKE  Testify    L: 24* | S: 24*  (N.E.)     L: 0* | S: 30
      Not        L: 30 | S: 0*               L: 6 | S: 6

Entry Game
                             Entrant
                    In                       Out
Incumbent  Expand   I: 20 | E: -20           I: 50 | E: 0
           Not      I: 30 | E: 20  (N.E.)    I: 80 | E: 0
Monitoring Game
Suppose Jen is an employee, and Eric is her manager. Jen can choose to work
hard on any given day, or shirk (slack off). Eric can choose to check up on her, or not
to.
                        Jen
               Hard              Shirk
Eric  Check    E: 30 | J: 20     E: 30 | J: 0
      Not      E: 40 | J: 20     E: 10 | J: 30
To tell a story about the payoffs, imagine the profits of the business are just
based on Jen’s work. If she works hard, the business earns its normal profits.
If Eric takes time out of his day to check up on Jen, it is going to cost him
something (in the form of opportunity costs). So, if he doesn’t check up on Jen and
she works hard, his payoff is 40, but if he checks, he loses 10 units in the form of
time spent checking up, and his payoff is 30. Thus, he would rather her work hard
without requiring supervision. If she does shirk and he checks, suppose he can fix
the problem, so the business still makes the normal profits, but Jen receives
discipline and gets 0 as payoff. Jen gets paid 20 (unless she is caught shirking) but
receives an extra 10 units from shirking, as long as she isn’t caught.
Now let’s mark the best responses.
                        Jen
               Hard               Shirk
Eric  Check    E: 30 | J: 20*     E: 30* | J: 0
      Not      E: 40* | J: 20     E: 10 | J: 30*
Eric will check if he thinks Jen will shirk, but he won’t if he thinks she’ll work. Jen
will work hard if she thinks Eric will check, but she’ll shirk if she thinks he won’t
check. Thus, it seems as if this game has no Nash equilibrium, since the players’
best responses never intersect. However, that is only because we have been looking at one of
two types of Nash equilibrium. They are the following:
• Pure strategy N.E.: Both players make definitive choices.
• Mixed strategy N.E.: Each player chooses some probability associated with
each response. The choices aren’t definite; they are probabilistic.
Since there is no pure strategy N.E. for the game, there has to be some
probability associated with all of the responses. From Jen’s perspective, if she works
hard she will get 20 if Eric checks, and 20 if Eric does not check. An equation that
represents this, her expected profit from working hard, is
20fc + 20(1‐fc)
where fc is the probability that Eric checks (and 1‐fc is the probability that Eric
doesn’t check). Similarly, her expected profit from shirking is
0fc + 30(1‐fc)
and based on Jen’s estimation of Eric’s probability of checking (fc) she can choose
which response (working hard or shirking) results in a higher expected profit. If the
first equation is higher than the second equation, she will work hard; if it’s less, she
will shirk.
Now it’s important to understand that if the probability of checking is very low,
Jen’s expected profit from shirking will be higher, and Jen will always shirk. If the
probability is really high that Eric will check, Jen will always work hard. The only
time that Jen will randomize her response is when the two expected profits are
equal. She will do this in order to “fool” Eric, because if she were not to randomize
her response, Eric would know how to react every time. In order to find out when
the expected profits are equal, we just set the two expressions equal:
20fc + 20(1‐fc) = 0fc + 30(1‐fc)
20fc + 20 – 20fc = 0 + 30 – 30fc
20 = 30 – 30fc
30fc = 10
fc = 1/3
Therefore, the probability that Eric checks up on Jen is 33%. If it were lower than
33%, Jen would always shirk. If it were higher, Jen would always work hard. A
probability of 33% keeps Jen randomizing her response.
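To find Jen’s probability of working hard, we look at Eric’s payoffs. Let fh be the
probability that Jen works hard. Eric’s expected payoff from checking is 30fh + 30(1-fh),
and his expected payoff from not checking is 40fh + 10(1-fh). Setting the two equal:
30fh + 30(1-fh) = 40fh + 10(1-fh)
30 = 40fh + 10 - 10fh
30fh = 20
fh = 2/3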
and thus the probability that Jen will work hard is 67%. The interpretation of these
two probabilities is the following: If you are in a situation where Jen thinks there’s a
33% chance that Eric will check on her, she’s happy to work hard 67% of the time;
and if you are in a situation where Eric thinks Jen will work hard 67% of the time, he
is happy checking up on her 33% of the time. The only time you get into a situation
where both players are happy with their strategies and are guessing right about the
other player’s strategies, is if they are randomizing their responses with these
probabilities.
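Both indifference conditions have the same form, so they can be solved with one small helper. The sketch below uses exact fractions; the helper name is made up, and it assumes the denominator is nonzero, which holds for the payoffs in this game.

```python
from fractions import Fraction

# Payoffs (Eric, Jen), keyed by (Eric's choice, Jen's choice).
PAYOFFS = {
    ("Check", "Hard"): (30, 20),
    ("Check", "Shirk"): (30, 0),
    ("Not", "Hard"): (40, 20),
    ("Not", "Shirk"): (10, 30),
}


def indifference_probability(a, b, c, d):
    """Solve a*p + b*(1-p) = c*p + d*(1-p) for p (assumes a - b - c + d != 0)."""
    return Fraction(d - b, a - b - c + d)


# Probability Eric checks (fc): makes Jen indifferent between Hard and Shirk.
f_check = indifference_probability(
    PAYOFFS[("Check", "Hard")][1], PAYOFFS[("Not", "Hard")][1],
    PAYOFFS[("Check", "Shirk")][1], PAYOFFS[("Not", "Shirk")][1],
)

# Probability Jen works hard (fh): makes Eric indifferent between Check and Not.
f_hard = indifference_probability(
    PAYOFFS[("Check", "Hard")][0], PAYOFFS[("Check", "Shirk")][0],
    PAYOFFS[("Not", "Hard")][0], PAYOFFS[("Not", "Shirk")][0],
)

print(f_check, f_hard)
```

This gives 1/3 for Eric's probability of checking and 2/3 for Jen's probability of working hard, matching the algebra. Note that each player's probability is determined by the other player's payoffs.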
The reason randomizing is important in the context of this game is because
players can exploit predictability. Here, if Jen knew Eric was just too lazy to check,
she could exploit that and shirk. That’s why in equilibrium both players are
randomizing to maximize their expected payoffs.
Note that being unpredictable (randomizing your response) doesn’t necessarily
mean playing the game with a 50% chance either way. We just solved the game
above and found out that most of the time (67%) Jen will work hard and seldom
(33%) will Eric check up on her. This is because the payoffs of the different
outcomes influence how tempting it is to either shirk or check up. Imagine, for
example, Jen’s payoff for shirking and not being caught increases from 30 to 40; Eric’s
probability of checking would have to rise from 1/3 to 1/2 to keep her indifferent, since
20 = 40(1-fc) gives fc = 1/2. Jen’s probability of working hard would stay at 2/3, because
it is pinned down by Eric’s payoffs, which have not changed.
One Shot Sequential Move Games
A sequential game has a first mover and a second mover. The important thing
here is that since the first mover knows the second mover will act based on the first
response, the first mover has control over how the game will be played. Let’s look at
the entry game again, but in sequential form.
[Table: the entry game payoff matrix, with the entrant’s choices In and Out as columns]
actually has to expand during his move; he just has to credibly and irrevocably
commit to expanding.
To model the sequential game where the incumbent moves first, we need to look
at the entrant’s strategies. These become more complex, since they are conditional
on the decision made by the incumbent.
                        Incumbent’s 1st Move
                        Expand      Not
Entrant’s Strategies    In          In
                        In          Out
                        Out         In
                        Out         Out
Each row represents a different set of strategies based on what the incumbent
does in the first move. Think of each row as a set of instructions about what to do in
each situation. Let’s represent all four of these strategies in a normal form table, as
we have been in the previous games.
                                            Entrant
                    In, In            In, Out           Out, In           Out, Out
Incumbent  Expand   I: 20 | E: -20    I: 20 | E: -20    I: 50 | E: 0      I: 50 | E: 0
           Not      I: 30 | E: 20     I: 80 | E: 0      I: 30 | E: 20     I: 80 | E: 0
For the column headings (the entrant’s strategies), the first word represents the
response given the incumbent expands, and the second word (after the comma)
represents the response given the incumbent does not expand. These payoffs come
from looking at the initial table of the game. The far left column is the same as the
original left column, and the far right is the same as the original right column. The
difference is the middle two columns, which represent the entrant’s ability to
choose what to do based on the incumbent’s first move.
Let’s look at the best responses:
                                            Entrant
                    In, In             In, Out           Out, In            Out, Out
Incumbent  Expand   I: 20 | E: -20     I: 20 | E: -20    I: 50* | E: 0*     I: 50 | E: 0*
           Not      I: 30* | E: 20*    I: 80* | E: 0     I: 30 | E: 20*     I: 80* | E: 0
We see the players’ best responses intersect twice, and thus there are two Nash
equilibria. However, one of them doesn’t make much sense. The first column (In, In)
means the entrant will enter no matter what. However, since this is a sequential
one‐shot game, it doesn’t make sense for the entrant to play this strategy, since if the
incumbent expanded, they would lose 20. Looking at the other strategy that
contains the Nash equilibrium (Out, In) we see that the entrant could either get 0 or
20, which is always at least as good as the strategy (In, In). Therefore, the strategy
(Out, In) weakly dominates the strategy (In, In), and we can conclude that the
strategy (In, In) will likely not be played.
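To see where the four columns come from, the sketch below builds the entrant's conditional strategies from the payoffs, lists the Nash equilibria of the resulting table, and checks the weak dominance claim. All names are hypothetical, and a strategy is written as a pair (reply if the incumbent expands, reply if not).

```python
from itertools import product

MOVES = ["Expand", "Not"]

# Payoffs (incumbent, entrant) for each incumbent move and entrant reply.
TREE = {
    "Expand": {"In": (20, -20), "Out": (50, 0)},
    "Not": {"In": (30, 20), "Out": (80, 0)},
}

# (In, In), (In, Out), (Out, In), (Out, Out): one reply for each incumbent move.
STRATEGIES = list(product(["In", "Out"], repeat=2))


def payoff(move, strategy):
    """Payoffs when the incumbent plays `move` and the entrant follows `strategy`."""
    reply = strategy[MOVES.index(move)]
    return TREE[move][reply]


def nash_equilibria():
    """Cells of the 2 x 4 table where both players are best-responding."""
    found = []
    for m in MOVES:
        for s in STRATEGIES:
            inc_best = all(payoff(m, s)[0] >= payoff(m2, s)[0] for m2 in MOVES)
            ent_best = all(payoff(m, s)[1] >= payoff(m, s2)[1] for s2 in STRATEGIES)
            if inc_best and ent_best:
                found.append((m, s))
    return found


def weakly_dominates(s, t):
    """True if entrant strategy s pays at least as much as t after every move."""
    return all(payoff(m, s)[1] >= payoff(m, t)[1] for m in MOVES)


print(nash_equilibria())
print(weakly_dominates(("Out", "In"), ("In", "In")))
```

The two equilibria are (Expand; Out, In) and (Not; In, In), and (Out, In) pays the entrant at least as much as (In, In) after either move by the incumbent.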
Until now we’ve been looking at games in the normal form, such as above. Since
we’re dealing with sequential games, we can also write them out as a game tree, or
in extensive form.
Incumbent
  Expand -> Entrant
              In  -> I: 20 | E: -20
              Out -> I: 50 | E: 0
  Don’t  -> Entrant
              In  -> I: 30 | E: 20
              Out -> I: 80 | E: 0
The way we find out the solution to this game is through backwards induction.
Looking at the top two responses, which is the scenario where the incumbent
expands, the entrant has a choice between a payoff of ‐20 and 0. Thus, we can cross
off ‐20 (the entrant entering).
Incumbent
  Expand -> Entrant
              In  -> I: 20 | E: -20   (crossed off)
              Out -> I: 50 | E: 0
  Don’t  -> Entrant
              In  -> I: 30 | E: 20
              Out -> I: 80 | E: 0
Now looking at the bottom two responses, which is the scenario where the
incumbent does not expand, the entrant has two possible payoffs, 20 and 0. He
chooses 20, so we can eliminate the last strategy.
This shows the entrant’s best responses to the incumbent’s decision. Remember
when we first introduced sequential games we said it’s the first mover that has
control over the final outcome of the game. We can imagine that the incumbent went
through the same exercise that we just did, and knows how the entrant will respond
to each decision. Thus, the incumbent essentially has a choice over 50 and 30. He
chooses 50, so he will expand, knowing that the entrant will stay out, giving us the
solution to the game: the incumbent expands and the entrant stays out (I: 50 | E: 0).
This solution is called the subgame perfect Nash equilibrium (or SPNE); it is
also referred to as the rollback equilibrium. The main assumption with SPNE is that
every player acts rationally at every fork in the tree, whether or not they get to that
particular fork. This is why we were able to eliminate two of the entrant’s four
strategies, which led us to eliminating one of the incumbent’s two strategies. Think
back to the table that gave us two Nash equilibria. We now see the first one (I:30 E:
20) didn’t make sense, because it involved the entrant acting irrationally at a node
of the game tree – he would have had to enter into the market even if the incumbent
expanded, giving him a payoff of ‐20 instead of 0. This is why we call it the SPNE; it
is perfect in the sense that every player is acting rationally, and that there are no
non-credible threats. This is an important concept when using the game tree.
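Backwards induction on this two-stage tree takes only two steps, which the following sketch spells out. The names are assumptions, and the incumbent's second branch is labeled "Not" here to match the payoff tables.

```python
# Leaves hold payoffs as (incumbent, entrant).
TREE = {
    "Expand": {"In": (20, -20), "Out": (50, 0)},
    "Not": {"In": (30, 20), "Out": (80, 0)},
}


def backward_induction(tree):
    """Solve the entrant's nodes first, then the incumbent's choice."""
    # Step 1: at each entrant node, keep the reply with the highest entrant payoff.
    entrant_plan = {
        move: max(replies, key=lambda r: replies[r][1])
        for move, replies in tree.items()
    }
    # Step 2: the incumbent picks the move that pays most given those replies.
    incumbent_move = max(tree, key=lambda m: tree[m][entrant_plan[m]][0])
    outcome = tree[incumbent_move][entrant_plan[incumbent_move]]
    return incumbent_move, entrant_plan, outcome


print(backward_induction(TREE))
```

The entrant's plan is to stay out after Expand and enter after Not, so the incumbent compares 50 with 30 and expands, ending at payoffs of 50 and 0.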
More Than Two Strategies and More than Two Players
Imagine player A and B are competing with each other, and each strategy
represents how hard they compete. Player A has strategies top, middle and bottom.
Player B has strategies left, middle, and right. Below is the game illustrated, showing
each player’s best responses.
                                       B
             Left               Middle                      Right
A  Top       A: 0 | B: 10       A: 10 | B: 0                A: 30* | B: 20*  (N.E.)
   Middle    A: 10 | B: 20*     A: 20* | B: 20*  (N.E.)     A: 0 | B: 10
   Bottom    A: 30* | B: 10     A: 10 | B: 20*              A: 10 | B: 0
Looking at player A: If B chooses left, A chooses bottom. If B chooses middle, A
chooses middle. If B chooses right, A chooses top.
Looking at player B: If A chooses top, B chooses right. If A chooses middle, B
chooses left or middle. If A chooses bottom, B chooses middle.
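With three strategies each, listing best responses by hand gets tedious, so here is a sketch that does the bookkeeping. The matrix and function names are made up for illustration; rows are player A's strategies and columns are player B's.

```python
ROWS = ["Top", "Middle", "Bottom"]   # player A
COLS = ["Left", "Middle", "Right"]   # player B

# Payoffs indexed [A's row][B's column], taken from the table.
A = [[0, 10, 30], [10, 20, 0], [30, 10, 10]]
B = [[10, 0, 20], [20, 20, 10], [10, 20, 0]]


def best_responses_a():
    """For each column B might choose, the rows that maximize A's payoff."""
    result = {}
    for j, col in enumerate(COLS):
        best = max(A[i][j] for i in range(len(ROWS)))
        result[col] = [ROWS[i] for i in range(len(ROWS)) if A[i][j] == best]
    return result


def best_responses_b():
    """For each row A might choose, the columns that maximize B's payoff."""
    result = {}
    for i, row in enumerate(ROWS):
        best = max(B[i])
        result[row] = [COLS[j] for j in range(len(COLS)) if B[i][j] == best]
    return result


def nash_equilibria():
    """Cells where A's row and B's column are best responses to each other."""
    bra, brb = best_responses_a(), best_responses_b()
    return [(r, c) for r in ROWS for c in COLS if r in bra[c] and c in brb[r]]


print(best_responses_a())
print(best_responses_b())
print(nash_equilibria())
```

The tie in B's payoffs when A plays middle shows up as two best responses, and the intersections are (Top, Right) and (Middle, Middle).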
We see there are two Nash equilibria. We would need more information to solve
this game. Perhaps one of the two solutions is a focal point under some
circumstances. Perhaps A is a market leader, and everyone expects them to play top.
Perhaps, since player B does not care, they both expect A to play top because A does
care.
The main point here is that there is nothing special about games with two
strategies. The same solution concepts apply when there are many strategies; they
are just more complicated in the application. In particular, iterated dominance
becomes more complex. To apply that technique, we would first identify any
strategies that are always worse (strongly dominated) than some combination of a
player’s other strategies, OR always worse or at least no better (weakly dominated)
compared to some combination of the player’s strategies. These dominated
strategies are then eliminated for both players, and, the new, reduced game is
examined. Once again, dominated strategies are eliminated. That is why this
technique is called iterated dominance. This continues until none of the strategies
remaining are dominated. If only one strategy is left for each player, the game has a
solution by iterated dominance.
Now, suppose there were three players in the above game, instead of just two.
We could imagine that player A still chooses a “row,” player B chooses a “column,”
and, that there are three different game tables, like the one above, not just one, and
the third player, player C, gets to choose the “table.” The idea would remain the
same, although it would be more cumbersome to put into practice.
Chapter 12 Terminology
The following is a list of terms that you should know in order to discuss and
apply the material from this chapter.
Strategic Interdependence A theory that, in a market with a few players, each
firm sets prices and quantities based on the other firms’ behavior. Each firm has
some degree of influence over supply and demand, thus all firms are dependent on
each other to make the best managerial decisions.
Game Theory A study of behavior in strategic situations. It allows economists to
analyze how competing firms will make decisions based on all available information.
It is a search for the most reasonable way a player would act when strategic
decisions are being weighed against one another.
Simultaneous Game A game in which both players move at the same time.
Sequential Game A game in which there is a first and a second mover and the
second mover will know how the first mover behaved.
One-Shot Game A game that is not repeated.
Repeated Game A game that is played many times.
Prisoner’s Dilemma A one shot, simultaneous move game in which both players
would benefit more by cooperating but whose Nash Equilibrium choice offers a
smaller benefit because players are looking to minimize their losses.
Dominant Strategy Strategy that is played regardless of what the player’s
opponent plays.
Strongly Dominant Strategy A player’s dominant strategy that always has a
higher payoff than his other strategies.
Weakly Dominant Strategy A player’s dominant strategy that always has at least
as high a payoff as his other strategies.
Iterated Elimination of Dominated Strategies A solution technique that involves
eliminating strategies that are dominated, taking the remaining ones as a new game,
and repeating until a solution is found.
MaxiMin Strategy Also called a secure or safe strategy; it involves the players
choosing strategies with the least amount of risk. The players want to minimize
their potential losses.
Nash Equilibrium A solution to a game in which every player is choosing a best
response to the strategies of the other players, so no player can do better by
changing strategy alone. Every finite game that can be written in normal form has at
least one Nash equilibrium, possibly in mixed strategies. It occurs
where the players’ reaction functions cross.
Reaction Function A function that puts together all of a player’s best responses to
his opponent’s decision.
Focal Point An outcome that stands out as the obvious solution to players who know
the richer context, outside of the model, in which a game takes place.
Pure Strategy Nash Equilibrium Solution in which both players make definitive
choices.
Mixed Strategy Nash Equilibrium Solution in which each player chooses some
probability associated with each response. Choices aren’t definite; they are
probabilistic.
Non-Credible Threat An empty threat made by a rational player. It is not in the
player’s best interest to carry out this threat; opponents will know this and
therefore never believe the threat.
Extensive Form A game tree used to represent sequential games. Backwards
induction is used to find the solution.
Backwards Induction Strategy of finding a solution to a sequential move game in
which a solution for the second mover is found, then the solution for the first mover
is found based on the second mover solution.
Subgame Perfect Nash Equilibrium (SPNE) Also called rollback equilibrium, it is
a solution to a sequential game in which players act rationally at every fork in the
tree, whether or not they get to that particular fork.
Chapter 13
One Shot Games with Continuous Strategies
When a firm makes a choice about the price to charge or the quantity to produce,
it faces a continuum of choices, not just a small discrete set. How do the above
arguments extend to such cases? The basic ideas are the same. Each player makes its
best guess about what other players will do, and chooses its best response. In a Nash
equilibrium, all players choose best responses to the strategies of all other players.
We begin by first considering a simultaneous move game and then sequential moves
in an extended example. After that, we briefly consider a general version of a game
with continuous strategies to show that the techniques and intuition applied in the
example apply to any game.
An Extended Example – Advertising Simultaneously
In chapters 12 and 13, we will take up the choices of output and price in
situations of strategic product market competition. To avoid monotony we use a
different example to illustrate the ideas here – the decision of how much to
advertise when each firm’s advertising has significant effects on the sales of the
other firms.
Kevin’s Big Store and Morgan’s Monster Market have to decide how much to
advertise. Assume product prices are predetermined and each customer will
generate $10 in profit for whichever firm they purchase from, before advertising
expenses are subtracted. Mainly, we make that assumption to simplify things. If you
want a more concrete setting, suppose the competing firms are retailers who are
bound by contracts with manufacturers to charge the manufacturer’s suggested retail
price (MSRP).
Think of advertising as producing a series of messages from the firms to
potential customers. Let aK and aM represent advertising expenditures by Kevin and
Morgan, respectively. For our example, assume the quantity sold by Kevin is
\[ q_K = 10 + 0.5a_K - 0.25a_M - 0.005\,(a_K + \alpha a_M)^2 \tag{13.1} \]
and the quantity sold by Morgan is
\[ q_M = 10 + 0.5a_M - 0.25a_K - 0.005\,(a_M + \alpha a_K)^2. \tag{13.2} \]
What sort of situation might demand functions like these represent? First, if
neither advertises, each has sales of 10. However, if one firm does advertise, they
initially gain sales at a rate of 0.5 per unit of advertising, but they reduce the other
firm’s sales by 0.25 units. Thus, advertising both draws in new customers and takes
some away from the other firm. Diminishing returns are reflected in the quadratic
term – they can’t bring in new customers indefinitely or take customers away from
the other firm indefinitely. As Kevin advertises more, holding Morgan’s advertising
constant, the marginal return to his advertising falls. However, the quadratic term in
Kevin’s demand depends on both Kevin’s advertising and Morgan’s. That means
changes in Morgan’s advertising affect the rate at which the marginal returns to
Kevin’s advertising fall.
Perhaps an increase in Morgan’s advertising causes the marginal productivity of
Kevin’s advertising to fall more slowly. That might occur if the customers Morgan
gains by advertising, including new ones brought into the market and customers
lured away from Kevin, are more easily lured away from her by Kevin’s advertising,
and vice versa. Of course, it is also possible that an increase in Morgan’s advertising
might cause the marginal returns to Kevin’s advertising to diminish faster. For
example, Morgan’s advertising may create brand loyalty and reduce Kevin’s ability
to lure customers away with his own advertising. As we will later see, the effect of
one player’s actions on the marginal productivity of the other player’s actions is
CRUCIAL in determining the character of strategic interactions. In this example,
whether Morgan’s advertising increases or decreases the productivity of Kevin’s
advertising depends on the parameter α. If α<0, an increase in Morgan’s advertising
means the marginal returns to Kevin’s advertising diminish more slowly – so an
increase in Morgan’s advertising increases the marginal productivity of Kevin’s
advertising. If α>0, an increase in Morgan’s advertising means the marginal returns
to Kevin’s advertising diminish more rapidly – so an increase in Morgan’s
advertising decreases the marginal productivity of Kevin’s advertising.
With these demand functions and a predetermined price such that each sale
brings in $10 in profit before advertising costs are subtracted, profits net of
advertising are
\[ \pi_K = 10\left(10 + 0.5a_K - 0.25a_M - 0.005\,(a_K + \alpha a_M)^2\right) - a_K, \tag{13.3} \]
and
\[ \pi_M = 10\left(10 + 0.5a_M - 0.25a_K - 0.005\,(a_M + \alpha a_K)^2\right) - a_M. \tag{13.4} \]
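The demand and profit functions can be written out directly, which makes it easy to experiment with the role of α. The following is a minimal sketch, not part of the original text; the function names (`quantity`, `profit`, `marginal_sales`) and the `margin` parameter are hypothetical labels chosen for illustration, and the formulas simply transcribe equations (13.1) through (13.4).

```python
def quantity(own_ads, rival_ads, alpha):
    """Units sold by a firm, following the form of equations (13.1) and (13.2)."""
    return (10 + 0.5 * own_ads - 0.25 * rival_ads
            - 0.005 * (own_ads + alpha * rival_ads) ** 2)


def profit(own_ads, rival_ads, alpha, margin=10.0):
    """Profit net of advertising, following equations (13.3) and (13.4).

    `margin` is the profit per customer before advertising ($10 in the text).
    """
    return margin * quantity(own_ads, rival_ads, alpha) - own_ads


def marginal_sales(own_ads, rival_ads, alpha):
    """Derivative of quantity with respect to the firm's own advertising."""
    return 0.5 - 0.01 * (own_ads + alpha * rival_ads)


if __name__ == "__main__":
    # With no advertising by either firm, each sells 10 units.
    print("no advertising:", quantity(0, 0, alpha=-0.5), profit(0, 0, alpha=-0.5))
    # Compare the marginal productivity of Kevin's advertising when Morgan
    # spends 0 versus 40, for a negative and a positive alpha.
    for alpha in (-0.5, 0.5):
        print(alpha, marginal_sales(20, 0, alpha), marginal_sales(20, 40, alpha))
```

Running the loop shows the point made above: with a negative α, the rival's advertising raises the marginal productivity of a firm's own advertising, and with a positive α it lowers it.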
Whatever his guess about Morgan’s advertising level, Kevin will choose his own
advertising to maximize his profit. Taking the derivative of (13.3) with respect to aK and setting it equal to zero gives:
\[ \frac{d\pi_K}{da_K} = 10\left(0.5 - 0.01\,(a_K + \alpha a_M)\right) - 1 = 0. \tag{13.5} \]
We will refer to the derivative of Kevin’s payoff with respect to his advertising as the
net marginal benefit, NMB. If the NMB is positive (negative), Kevin should increase
(decrease) advertising. At the maximum, the NMB is 0. This can readily be
rearranged as follows:
\[ 5 - 0.1a_K - 0.1\alpha a_M = 1. \tag{13.6} \]
This sets the marginal benefit of an advertising message equal to marginal cost ($1).
Solving equation (13.6) for Kevin’s advertising level gives Kevin’s reaction
function, or, his best response function, RK(aM). The reaction function gives the level
of advertising that will maximize his profit for any level of Morgan’s advertising.
Solving gives
\[ a_K = R_K(a_M) = 40 - \alpha a_M. \tag{13.7} \]
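One way to check a reaction function is to confirm that the net marginal benefit is zero along it and that a brute-force search over advertising levels picks the same best response. The sketch below does both for Kevin. It is only an illustration: the function names, the grid step, and the search ceiling of 200 are assumptions introduced here, not part of the text.

```python
def nmb(a_k, a_m, alpha, margin=10.0):
    """Net marginal benefit of Kevin's advertising, as in equation (13.5)."""
    return margin * (0.5 - 0.01 * (a_k + alpha * a_m)) - 1.0


def reaction(a_m, alpha):
    """Kevin's best response to Morgan's advertising, equation (13.7)."""
    return 40.0 - alpha * a_m


def best_response_by_search(a_m, alpha, step=0.01, upper=200.0):
    """Brute-force check: scan a grid of a_K values for the highest profit."""
    def kevin_profit(a_k):
        q = 10 + 0.5 * a_k - 0.25 * a_m - 0.005 * (a_k + alpha * a_m) ** 2
        return 10 * q - a_k

    grid = [i * step for i in range(int(upper / step) + 1)]
    return max(grid, key=kevin_profit)


if __name__ == "__main__":
    a_m, alpha = 30.0, -0.5
    a_k = reaction(a_m, alpha)
    print("best response from (13.7):", a_k)
    print("NMB at that point:", nmb(a_k, a_m, alpha))
    print("best response from grid search:", best_response_by_search(a_m, alpha))
```

The two approaches should agree up to the grid step, which is a useful sanity check whenever a reaction function is derived by hand.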
If Morgan does not advertise, Kevin spends $40. The slope of a reaction function has
important strategic implications. We can find the slope of the reaction function by
simply taking its derivative. In this case \(dR_K/da_M\), or \(da_K/da_M\), is equal to \(-\alpha\). For every
additional dollar Morgan spends, Kevin changes his spending by \(-\alpha\) dollars. If α is negative,
Kevin’s reaction function slopes up. This is shown in the left panel of the figure
below. If α is positive, Kevin’s reaction function slopes down. This is shown in the
right panel of the figure below.
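[Figure: Kevin’s reaction function RK(aM), drawn with aM on the horizontal axis and aK on the vertical axis, with an intercept of 40 on the aK axis. Left panel: RK(aM) slopes up (α negative). Right panel: RK(aM) slopes down (α positive).]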
It is important to clearly understand what determines the slope of a reaction
function. At Kevin’s profit maximizing advertising level for any level of Morgan’s
advertising, NMBK is 0. That is, NMBK is zero at every point on Kevin’s reaction
function. So, beginning from a point on Kevin’s reaction function, if Morgan’s
advertising increases the NMB of Kevin’s advertising, when she advertises more
Kevin’s NMB becomes positive. Then, to maximize profit he must advertise more,
increasing advertising until once again NMBK is 0. Thus the reaction function slopes
up. On the other hand, if Morgan’s advertising decreases the NMB of Kevin’s
advertising, when she advertises more Kevin’s NMB becomes negative if we began
at a point on the reaction function (where NMBK is 0). To maximize profit he must
advertise less, decreasing advertising until NMBK is 0. Thus the reaction function
slopes down.
We can find the impact of Morgan’s advertising on the NMB of Kevin’s
advertising by taking the derivative of equation (13.5) with respect to aM. Doing so gives⁶
\[ \frac{dNMB_K}{da_M} = -10\,(0.01\alpha) = -0.1\alpha. \tag{13.8} \]
If α is negative, this derivative is positive and Kevin’s reaction function slopes up. If, however,
α is positive, Morgan’s advertising reduces the NMB of Kevin’s advertising, this derivative is
negative, and his reaction function slopes down, not up.
Taking the derivative of Morgan’s profit function and solving for her reaction
function would work the same way. Since her profit function is basically the same as
Kevin’s and the calculus and algebra work the same way, let’s just skip to the
reaction function:
\[ a_M = R_M(a_K) = 40 - \alpha a_K. \tag{13.9} \]
Above, we defined a Nash Equilibrium technically as the point of intersection of
two reaction functions. This is shown in the figure below, with aKne and aMne
representing the Nash Equilibrium advertising levels of Kevin and Morgan,
respectively. In the panel to the left, the solution is for each to spend more than $40
because each responds to the other’s advertising by advertising more. In the panel
to the right, they each spend less than $40, because each responds to the other’s
advertising by advertising less.
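[Figure: reaction functions RK(aM) and RM(aK), drawn with aM on the horizontal axis and aK on the vertical axis; they cross at the Nash Equilibrium (aMne, aKne). Left panel: both reaction functions slope up and the equilibrium lies above 40 for each firm. Right panel: both slope down and the equilibrium lies below 40 for each firm.]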
When the reaction functions slope up, the players’ strategies ‐ here how much to
advertise ‐ are called strategic complements. This is the case in the left panel. When
the reaction functions slope down, the players’ strategies are called strategic
substitutes. This is the case in the right panel.
6 This is just the derivative with respect to Morgan’s advertising of the derivative of Kevin’s profit
with respect to Kevin’s advertising; that is, it is the cross partial derivative of Kevin’s profit function,
\(\partial^2 \pi_K / \partial a_K\,\partial a_M\).
To solve for the numeric value of advertising in the Nash Equilibrium, we need to
specify a value for α. For purposes of an example, let’s assume α=‐0.5. In that case,
the reaction functions are
\[ R_K(a_M) = 40 + 0.5a_M \tag{13.10} \]
and
\[ R_M(a_K) = 40 + 0.5a_K \tag{13.11} \]
respectively. Suppose Kevin expected Morgan to spend $40. Then, he would want to
spend $60 (40+0.5⋅40). But, if he spent $60, Morgan would want to spend $70, in
which case Kevin would want to spend $75, and so on. If instead Kevin expected
Morgan to spend $120, he would spend $100. But if he spent $100 Morgan would spend
$90, in which case Kevin would spend $85, and so on. If you continued with this
process, it would converge to $80. The only advertising level where both correctly
anticipate their opponent’s play and react optimally is when both spend $80. This is
illustrated in the following figure.
[Figure: reaction functions RK(aM) and RM(aK) for α=‐0.5, with aM on the horizontal axis (ticks at 40, 80, 120) and aK on the vertical axis (ticks at 40, 80); the two lines cross where both firms spend 80.]
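The back-and-forth adjustment described above can be traced step by step. The sketch below alternates best responses using the reaction functions (13.10) and (13.11), starting from Kevin's initial guess about Morgan. The function names and the number of rounds are hypothetical choices made for illustration.

```python
def reaction(rival_ads, alpha=-0.5):
    """Best response to the rival's advertising: 40 - alpha * rival_ads."""
    return 40.0 - alpha * rival_ads


def iterate_best_responses(initial_guess_about_morgan, rounds=50, alpha=-0.5):
    """Alternate best responses, starting from Kevin's guess about Morgan.

    Returns a list of (a_K, a_M) pairs, one per round.
    """
    a_m = initial_guess_about_morgan
    path = []
    for _ in range(rounds):
        a_k = reaction(a_m, alpha)  # Kevin responds to Morgan's level
        a_m = reaction(a_k, alpha)  # Morgan responds to Kevin's level
        path.append((a_k, a_m))
    return path


if __name__ == "__main__":
    for start in (40.0, 120.0):
        path = iterate_best_responses(start)
        print("start:", start, "first rounds:", path[:3], "last round:", path[-1])
```

Starting from either 40 or 120, the sequence approaches 80 for both firms. In this linear example the process settles down because each reaction function has a slope smaller than one in absolute value; with steeper reaction functions the same procedure need not converge.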
Equations (13.10) and (13.11) provide two equations in two unknowns. To find
the solution algebraically instead of graphically, substitute from one into the other
and solve. Substituting for aM in Kevin’s reaction function gives:
\[ \begin{aligned} a_K &= 40 + 0.5\,(40 + 0.5a_K) \\ &= 60 + 0.25a_K \\ 0.75a_K &= 60 \\ a_K &= 80. \end{aligned} \tag{13.12} \]
Substituting the solution for Kevin’s advertising into Morgan’s reaction function
gives the solution for Morgan’s advertising,
\[ a_M = 40 + 0.5\,(80) = 80. \tag{13.13} \]
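The substitution steps work for any pair of linear reaction functions, so they can be packaged as a small solver. The following is a sketch under the assumption that both reaction functions are straight lines; the function name and its parameters (intercepts and slopes) are hypothetical and do not appear in the text.

```python
def nash_from_linear_reactions(intercept_k, slope_k, intercept_m, slope_m):
    """Intersect a_K = intercept_k + slope_k * a_M with a_M = intercept_m + slope_m * a_K.

    Substituting the second equation into the first gives
    a_K * (1 - slope_k * slope_m) = intercept_k + slope_k * intercept_m.
    """
    denom = 1.0 - slope_k * slope_m
    if abs(denom) < 1e-12:
        raise ValueError("reaction functions are parallel; no unique intersection")
    a_k = (intercept_k + slope_k * intercept_m) / denom
    a_m = intercept_m + slope_m * a_k
    return a_k, a_m


if __name__ == "__main__":
    # In the advertising example the slope of each reaction function is -alpha.
    for alpha in (-0.5, 0.5):
        print(alpha, nash_from_linear_reactions(40.0, -alpha, 40.0, -alpha))
```

With α=‐0.5 this reproduces the solution of 80 for each firm. Because the intercepts are separate arguments, the same function also handles cases where the two firms' reaction functions differ.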
This game is symmetric. For our purposes, that simply means the only
difference between the players is their names. If you take Kevin’s profit function and
replace the Ks with Ms and the Ms with Ks, you get Morgan’s profit function. When
a game is symmetric, it will generally have a symmetric equilibrium, that is, an
equilibrium where the players play the same strategy.7 In a game where the reaction
functions can only cross one time, like this one, we know therefore that there is only
one solution and that it will involve both players playing identical strategies.
7 At least, this is true for the most part and for all the cases we will be interested in. There are some
technical issues involved in establishing this with complete generality.
Symmetry can be very helpful when analyzing and solving games. This one was
easy to solve with simple substitution, but that is not always the case. However,
symmetry means we know that in the solution aM=aK. Once we find the expression
for NMB and equate it to 0, we can use this fact to solve immediately for the
equilibrium. In this case, NMB=0 and aM=aK become two equations in two
unknowns, with the added benefit that one is very simple, which makes solving easy.
For example, substituting ‐0.5 for α in equation (13.5) and applying symmetry gives
\[ \begin{aligned} 10\left(0.5 - 0.01\,(a - 0.5a)\right) - 1 &= 0 \\ 5 - 0.05a - 1 &= 0 \\ 0.05a &= 4 \\ a &= 80, \end{aligned} \tag{13.14} \]
where a, sans subscript, represents the equilibrium level of advertising common to
both players.
So, in the Nash Equilibrium in our example, each spends 80 on advertising.
Plugging this in to the demand and profit functions, we find each sells 22 units and
makes a profit of $140 (10⋅22‐80). If neither advertised, each would sell 10 units
and make a profit of $100. In this example advertising results in higher profits.
However, each firm’s advertising has a negative effect on the other’s profits.
Therefore, if the two firms cooperated with one another, or colluded, profit would
be higher. To see this, calculate what would happen if both advertising levels were
chosen jointly, or cooperatively, to maximize total profit.
Total profit is just the sum of the two firms’ profits:
\[ \begin{aligned} \pi_T &= \pi_K + \pi_M \\ &= 10\left(10 + 0.5a_K - 0.25a_M - 0.005\,(a_K - 0.5a_M)^2\right) - a_K \\ &\quad + 10\left(10 + 0.5a_M - 0.25a_K - 0.005\,(a_M - 0.5a_K)^2\right) - a_M. \end{aligned} \tag{13.15} \]
To maximize, take the partial derivatives, set them equal to zero, and solve. Taking the
derivative with respect to advertising for Kevin gives:
\[ \frac{\partial \pi_T}{\partial a_K} = 10\left(0.25 - 0.01\,(a_K - 0.5a_M) - 0.01\,(a_M - 0.5a_K)(-0.5)\right) - 1 = 0. \tag{13.16} \]
The derivative with respect to advertising for Morgan is the mirror image of the
above derivative. In the solution to this cooperative problem, advertising will be the
same for each firm, that is aK=aM, due to the symmetric way each firm’s advertising
enters the problem. We can use that to simplify the solution process. Working from
equation (13.16) and letting a represent the common advertising level gives:
\[ \begin{aligned} 2.5 - 0.1a_K + 0.05a_M + 0.05a_M - 0.025a_K - 1 &= 0 \\ 0.025a &= 1.5 \\ a &= 60. \end{aligned} \tag{13.17} \]
If both spend 60, both sell 20.5 units and make a profit of $145. Yet, if either firm
unilaterally tries to advertise at a level of $60, the other firm has an incentive to
respond by spending $70 on advertising to maximize their own profit. So, the
cooperative solution is not an equilibrium solution. Thus, there is an aspect of the
prisoner’s dilemma in this advertising game.
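The comparison between the Nash Equilibrium, the cooperative outcome, and a unilateral deviation can be tabulated in a few lines. The sketch below assumes α=‐0.5 and a margin of $10 per customer, as in the example; the function names are hypothetical labels used only here.

```python
def quantity(own_ads, rival_ads, alpha=-0.5):
    """Units sold, following equations (13.1) and (13.2)."""
    return (10 + 0.5 * own_ads - 0.25 * rival_ads
            - 0.005 * (own_ads + alpha * rival_ads) ** 2)


def profit(own_ads, rival_ads, alpha=-0.5, margin=10.0):
    """Profit net of advertising, following equations (13.3) and (13.4)."""
    return margin * quantity(own_ads, rival_ads, alpha) - own_ads


def best_response(rival_ads, alpha=-0.5):
    """Reaction function 40 - alpha * rival_ads."""
    return 40.0 - alpha * rival_ads


if __name__ == "__main__":
    print("Nash (80, 80): profit per firm =", profit(80, 80))
    print("Cooperative (60, 60): profit per firm =", profit(60, 60))
    # If the rival sticks to 60, the best unilateral response is higher.
    deviation = best_response(60)
    print("best response to 60:", deviation)
    print("deviator's profit:", profit(deviation, 60))
    print("profit of the firm that stayed at 60:", profit(60, deviation))
```

With these numbers the deviating firm earns 150 instead of 145, while the firm that stays at 60 falls to 133.75. That gap is what makes the cooperative outcome unstable.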
Suppose Kevin developed some cost advantage that means he makes $12 per
unit sold instead of $10. How would the game change? First, it would no longer be
symmetric, and we could not use that to simplify finding a solution. But, we can use
our understanding of reaction functions to offer a general analysis of how the
equilibrium changes without solving explicitly for the new equilibrium. Regardless
of the level of Morgan’s advertising, the NMB of Kevin’s advertising increases. That
means his reaction function will shift up – for every level of Morgan’s advertising, he
spends more. This is shown in the figure below. RK0 and RM0 are the initial reaction
functions, and aK0 and aM0 are the initial Nash Equilibrium values. RK1 represents
Kevin’s reaction function after the decrease in his cost, and aK1 and aM1 are the new
Nash Equilibrium values.
In the left panel, advertising levels are strategic complements. When Kevin’s
reaction function shifts up and he advertises more, Morgan responds by advertising
more as well, which drives a further increase in Kevin’s advertising in the new
equilibrium. So, both firms advertise more in the Nash equilibrium. The indirect, or
strategic, effect of the change in Kevin’s cost is the increase in Morgan’s advertising.
This strategic effect offsets some of the increase in Kevin’s profit from the
direct effect of his cost reduction.
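[Figure: two panels with aK on the vertical axis. In each, Kevin’s reaction function shifts from RK0 to RK1 while Morgan’s reaction function RM is unchanged, and Kevin’s equilibrium advertising moves from aK0 to aK1. The left panel shows the case of strategic complements.]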
It would be a useful exercise for the reader to calculate the Nash Equilibrium of
the non‐symmetric game when Kevin makes $12 per customer and Morgan makes
only $10.
An Extended Example Continued – Advertising Sequentially
Return to the original demand functions where we had not yet specified a value
for α, and suppose Morgan moves first. Will she advertise more or less and how will
Kevin respond? As usual in a sequential move game, we begin at the end. First,
suppose Kevin’s reaction function slopes up. Then, if Morgan advertises more (less)
in the first stage, Kevin sees her move and he responds by advertising more (less) as
well in the second stage. Anticipating this at the first stage, Morgan should ask
herself the following question ‐ “Do I want Kevin to advertise more or less?” Since
Kevin’s advertising reduces Morgan’s profit, she wants him to advertise less. At the
simultaneous play equilibrium, the NMB of Morgan’s advertising is 0. That means if
she spends just a little more or a little less, it has no direct impact on her profit. It
does have an important indirect effect, though. If she spends less, since she moves
first, Kevin will see that and spend less in turn. This will increase Morgan’s profit. It
will also increase Kevin’s.
In fact, Kevin’s profit increases by more than Morgan’s when Morgan moves first.
Why? Because he could obtain the same increase in profit by adopting the same
advertising level as Morgan, but he need not do so. Instead, since he moves second,
he is free to choose the advertising level that maximizes his profit, given Morgan has
advertised less. If he makes a different choice, it must be because it results in greater
profit. Thus, his increase in profit must be at least as big as Morgan’s, and is typically
larger.
Now suppose Kevin’s reaction function slopes down. If Morgan advertises more
at the first move, Kevin will see that and respond by advertising less at the second
move. Since Morgan wants Kevin to advertise less, she should advertise more. Again,
at the simultaneous play equilibrium, the NMB of Morgan’s advertising is 0. That
means if she spends just a little more, it has no direct impact on her profit. The
indirect effect is important ‐ if she spends more at the first move Kevin will see that
and spend less in turn at the second move. This will increase Morgan’s profit. Since
Morgan advertises more, Kevin’s profit is lower. Thus, there is a clear first mover
advantage when advertising levels are strategic substitutes.
To work an example, let’s return to the given demand functions and again
assume α=‐0.5. Morgan can anticipate that Kevin’s reaction function is
RK = 40 + 0.5aM . Knowing that Kevin will respond this way at the second move after
observing her choice at the first move, she is free to, in effect, choose the point on his
reaction function that she would prefer through her choice of advertising. So, we can
simply substitute this reaction function into her profit function for aK and then
maximize her profit. Her profit function becomes
\[ \pi_M = 10\left(10 + 0.5a_M - 0.25\,(40 + 0.5a_M) - 0.005\left(a_M - 0.5\,(40 + 0.5a_M)\right)^2\right) - a_M. \tag{13.18} \]
This simplifies to
\[ \pi_M = 10\left(0.375a_M - 0.005\,(0.75a_M - 20)^2\right) - a_M. \tag{13.19} \]
Maximizing gives
\[ \frac{d\pi_M}{da_M} = 10\left(0.375 - 0.01\,(0.75a_M - 20)(0.75)\right) - 1 = 0. \tag{13.20} \]
Solving equation (13.20), we find Morgan spends 75.56 on advertising. This is less
than the 80 spent in the simultaneous move game. That induces Kevin to spend less
as well. From Kevin’s reaction function, we find he would spend 77.78
(40+0.5⋅75.56). Plugging those values in to find profit, we find moving first
increased Morgan’s profit from 140 to 140.56. However, Kevin’s profit increased by
more, from 140 in the simultaneous play version to 142.22.
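The sequential solution can also be found numerically: substitute the follower's reaction function into the leader's profit and search for the leader's best advertising level. The sketch below does this with a simple ternary search, which is an assumption made here for illustration (it relies on the leader's profit being single-peaked, as it is in this quadratic example). All function names and the search interval from 0 to 200 are hypothetical.

```python
def profit(own_ads, rival_ads, alpha, margin=10.0):
    """Profit net of advertising, following equations (13.3) and (13.4)."""
    q = (10 + 0.5 * own_ads - 0.25 * rival_ads
         - 0.005 * (own_ads + alpha * rival_ads) ** 2)
    return margin * q - own_ads


def follower_reaction(leader_ads, alpha):
    """The second mover's best response, 40 - alpha * leader_ads."""
    return 40.0 - alpha * leader_ads


def leader_profit(leader_ads, alpha):
    """Leader's profit once the follower's response is substituted in."""
    return profit(leader_ads, follower_reaction(leader_ads, alpha), alpha)


def maximize(f, lo, hi, iterations=200):
    """Ternary search for the maximum of a single-peaked function on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1  # the peak lies to the right of m1
        else:
            hi = m2  # the peak lies to the left of m2
    return (lo + hi) / 2.0


def solve_sequential(alpha):
    """Return (leader ads, follower ads, leader profit, follower profit)."""
    a_lead = maximize(lambda a: leader_profit(a, alpha), 0.0, 200.0)
    a_follow = follower_reaction(a_lead, alpha)
    return a_lead, a_follow, profit(a_lead, a_follow, alpha), profit(a_follow, a_lead, alpha)


if __name__ == "__main__":
    print(solve_sequential(alpha=-0.5))
```

For α=‐0.5 the search returns roughly 75.56 for Morgan and 77.78 for Kevin. Calling `solve_sequential` with a positive α lets you see the strategic substitutes case, where the first mover advertises more than in the simultaneous game.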
Continuous Strategies – General Analysis
This section discusses games with continuous strategies in a very general way. It
therefore appears more abstract and more technical than the previous extended
example. But the ideas and techniques are the same as those above. The main point
is to show that the intuition and technique discussed above are perfectly general and
can be correctly applied to a broad array of situations involving strategic
interdependence. In fact, if you come away with a good intuitive understanding of
the basic concepts, you will be in very good shape for most of the remaining
chapters – much of which will consist of applying these ideas to the analysis of
market and firm structure.
Let’s first define some notation. The two players are A and B. They may be firms,
individuals, or nations, as the situation warrants. Their strategies are represented
by xA and xB. Strategies may be the choice of advertising, price, or quantity by a firm,
the choice of how much to spend on national defense for a nation, or any other
continuous variable that must be chosen in a strategically interdependent
environment. We will let π represent the payoffs. These may be profits for firms,
utility for individuals, or whatever is most appropriate in any particular case.
The important thing is that each player’s payoff is a function of the choices of
both players. So, player A’s payoff is denoted πA(xA,xB) and player B’s πB(xB,xA). As
defined above, we will refer to the partial derivative of a player’s payoff with respect
to their own strategy as the Net Marginal Benefit, NMB, of their strategy. That is,
the rate at which their payoff increases with a small increase in their strategy
choice. The NMB of each player’s strategy is a function of the level of both strategies,
that is, \(\partial\pi_A/\partial x_A = NMB_A(x_A, x_B)\) and \(\partial\pi_B/\partial x_B = NMB_B(x_B, x_A)\).
Each player increases the level of their strategy as long as their NMB is positive.
So, to find each player’s reaction function, set their NMB equal to 0. Thus, the Nash
Equilibrium is described by two conditions,
\[ NMB_A(x_A, x_B) = 0 \quad\text{and}\quad NMB_B(x_B, x_A) = 0. \tag{13.21} \]
Solving each condition for the player’s own strategy gives the solution for each choice
in terms of the level of the other player’s strategy,
\[ x_A^R(x_B) \text{ and } x_B^R(x_A), \quad\text{or}\quad R_A(x_B) \text{ and } R_B(x_A). \tag{13.22} \]
While we are presenting the case of only 2 players, it works the same way with any
number of players, n. The only difference is that there would be n NMBs to set equal
to 0, which would give n reaction functions, which would provide n equations in n
unknowns that could be solved for the solution to the game.
How do we find the slope of the reaction functions? That is, what determines
whether A will increase or decrease xA when they expect xB to be higher? We
described the intuitive arguments above. If an increase in xB increases NMBA, player
A responds by increasing xA, and the reaction function slopes up. If an increase in xB
decreases NMBA, player A responds by decreasing xA, and the reaction function
slopes down.
Let’s confirm this mathematically for A (the procedure would be the same for B).
First, substitute the solution for A’s choice into the rule that NMB is 0 for A’s choice,
so we write
\[ NMB_A\left(x_A^R(x_B),\, x_B\right) = 0. \tag{13.23} \]
Then, take the derivative of both sides of (13.23) with respect to xB:
\[ \frac{\partial NMB_A}{\partial x_A}\,\frac{dx_A^R}{dx_B} + \frac{\partial NMB_A}{\partial x_B} = 0. \tag{13.24} \]
Intuitively, changes in xB have two effects on NMBA when xA has been chosen
optimally. First, the direct effect is the second term in equation (13.24). In the
advertising example, B’s advertising increased NMBA if α was negative. If firms were
choosing quantities to sell and the goods are substitutes, if B sold more, it would
decrease the NMB to A of selling more. Second is the indirect effect. In order to keep
NMB equal to 0, xA must change to induce a change in NMBA to counter the effect of
the change in xB on NMBA. This is the first term in equation (13.24). From the chain
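rule, the indirect effect is the derivative of NMBA with respect to xA times the
derivative of \(x_A^R\) with respect to xB.
Equation (13.24) can be rearranged as follows:
\[ \frac{\partial NMB_A}{\partial x_A}\,\frac{dx_A^R}{dx_B} = -\frac{\partial NMB_A}{\partial x_B} \tag{13.25} \]
or
\[ \frac{dx_A^R}{dx_B} = -\frac{\partial NMB_A / \partial x_B}{\partial NMB_A / \partial x_A}. \tag{13.26} \]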
This says the slope of the reaction function is the opposite of the ratio of the
derivative of NMBA with respect to B’s strategy to the derivative of NMBA with
respect to A’s strategy. Diminishing marginal returns implies the effect of xA on NMB
is negative. In fact, if it were positive, setting NMB equal to 0 would find a minimum
(a valley) not a maximum (a peak) of A’s profit. The fact that the denominator of the
right hand side of equation (13.26) is negative cancels the fact that we are looking
for the negative of the ratio. Thus, the sign of the slope of the reaction function,
\(dx_A^R/dx_B\), is the same as the sign of \(\partial NMB_A/\partial x_B\).
So, mathematically, if an increase in xB increases NMBA, A’s reaction function
slopes up, and, vice versa. This is exactly as we argued intuitively and as we showed
to be true in the advertising example. However, we have now established that it is
true in general. This means we can use knowledge of the effect of one player’s
strategy on the net marginal benefit of the other player’s strategy to determine the
shape of the reaction functions generally. We can then use that knowledge to
analyze the effects of outside factors on the equilibrium outcome, for example, the
impact of a change in cost on the outcome of the advertising game above. The same
type of analysis can be made whether the players are choosing prices, quantities,
advertising, or some other variable.
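The slope formula in equation (13.26) can be checked numerically for any NMB function by approximating the two partial derivatives with finite differences. The sketch below applies it to the NMB from the advertising example, where the slope should equal the negative of α. The helper names, the step size `h`, and the evaluation point are assumptions introduced for illustration.

```python
def nmb_a(x_a, x_b, alpha=-0.5):
    """NMB of player A's strategy in the advertising example, equation (13.5)."""
    return 10 * (0.5 - 0.01 * (x_a + alpha * x_b)) - 1


def partial(f, x_a, x_b, wrt, h=1e-5):
    """Central finite-difference approximation of a partial derivative of f."""
    if wrt == "a":
        return (f(x_a + h, x_b) - f(x_a - h, x_b)) / (2 * h)
    return (f(x_a, x_b + h) - f(x_a, x_b - h)) / (2 * h)


def reaction_slope(f, x_a, x_b):
    """Slope of A's reaction function from equation (13.26)."""
    return -partial(f, x_a, x_b, "b") / partial(f, x_a, x_b, "a")


if __name__ == "__main__":
    # Strategic complements: alpha = -0.5, so the slope should be about +0.5.
    print(reaction_slope(nmb_a, 80.0, 80.0))
    # Strategic substitutes: alpha = +0.5, so the slope should be about -0.5.
    print(reaction_slope(lambda x_a, x_b: nmb_a(x_a, x_b, alpha=0.5), 80.0, 80.0))
```

Because `reaction_slope` takes the NMB function as an argument, the same check can be run on a pricing or quantity game by passing in a different function.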
We can also use this to determine how a game will change when one player
moves first. Suppose player B moves first. Since B moves first, they can count on A
responding optimally at the second stage. We can therefore substitute A’s reaction
function into B’s payoff, to get πB(xB,xA(xB)). Maximizing this gives
\[ \frac{d\pi_B}{dx_B} = \frac{\partial \pi_B}{\partial x_B} + \frac{\partial \pi_B}{\partial x_A}\,\frac{dx_A}{dx_B} = 0. \tag{13.27} \]
In equation (13.27), player B takes account of the direct effect of their choice on
their payoff and also the indirect effect of their choice on their opponent’s ensuing
choice and the effect of that choice on their payoff.
If an increase in xA decreases B’s payoff, then B wants to encourage A to select a
lower level of xA. If the reaction functions slope up, B chooses a lower level of xB to
induce A to choose a lower xA. This was the case with the advertising example. If the
reaction functions slope down, B chooses a higher level of xB to induce A to choose a
lower xA. For example, if x is a choice of quantity, B wants A to produce less, so, B
would produce more when they move first.
The other possibility is that an increase in xA increases B’s payoff, in which case
B wants to encourage A to select a higher level of xA. If the reaction functions slope
up, B chooses a higher level of xB to induce A to choose a higher xA. This is the case if
x is price and the two players sell substitute goods. To get the second mover to
charge a higher price, the first mover chooses a higher price. If the reaction
functions slope down, B chooses a lower level of xB to induce A to choose a higher
level of xA.
Chapter 14
Repeated Games
In repeated games, at least one player plays the game more than once. One of the
more important things about this type of game is that some responses that were not
rational in a one‐shot game may be rational in a repeated game. Let’s look at a
scenario where two firms are competing in an industry, and they each have the
option to compete hard (low prices and high quantities) or soft (high prices and low
quantities):
                 David: Hard        David: Soft
Mike: Hard       M: 0, D: 0         M: 15, D: ‐10
Mike: Soft       M: ‐10, D: 15      M: 10, D: 10
Looking at the best responses (highlighted) we can see they intersect in the top‐
left cell. This is the standard one‐shot equilibrium, similar to the prisoner’s dilemma.
However, if there were some way for Mike and David to cooperate, they could both
make a payoff of 10 (bottom‐right cell) and be better off. Let’s suppose they both
knew that they were going to play the game 10 times. Is it possible that the two
might now have an incentive to cooperate by playing soft? That is, might each player
try to earn and retain the good will of the opposing player by playing soft initially
and then continuing to play soft as long as the other player does so?
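The one‐shot equilibrium of this game can be confirmed by checking every cell for profitable unilateral deviations. The sketch below encodes the payoff table and lists the cells where neither player can gain by switching. The dictionary layout and function name are hypothetical choices made for illustration.

```python
MOVES = ("Hard", "Soft")

# (Mike's move, David's move) -> (Mike's payoff, David's payoff)
PAYOFFS = {
    ("Hard", "Hard"): (0, 0),
    ("Hard", "Soft"): (15, -10),
    ("Soft", "Hard"): (-10, 15),
    ("Soft", "Soft"): (10, 10),
}


def pure_nash_equilibria(payoffs, moves=MOVES):
    """Return every cell where each player's move is a best response."""
    equilibria = []
    for mike in moves:
        for david in moves:
            mike_payoff, david_payoff = payoffs[(mike, david)]
            # Mike cannot do better by switching, holding David's move fixed.
            mike_ok = all(mike_payoff >= payoffs[(alt, david)][0] for alt in moves)
            # David cannot do better by switching, holding Mike's move fixed.
            david_ok = all(david_payoff >= payoffs[(mike, alt)][1] for alt in moves)
            if mike_ok and david_ok:
                equilibria.append((mike, david))
    return equilibria


if __name__ == "__main__":
    print(pure_nash_equilibria(PAYOFFS))
```

The only cell that survives is both players competing hard, even though both would earn more in the cell where each plays soft.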
Ask yourself what would happen the 10th and final time the game is played. At
this point, both players know it is the last round. There is no longer any reason to
retain the other player’s good will, since there is no future. Rationally, each player
would play hard in round 10. If they expect the other to play hard, this is protecting
themselves. If they thought for some reason the other would play soft, they make
more profit playing hard and there is no cost in terms of a loss of future good will.
Each player should thus predict that the Nash Equilibrium play in the last round
will be for both players to play Hard. Knowing this, there is no reason to
bother maintaining good will in the next to last period either. Thus, they should both
anticipate that equilibrium play in the next to last period is for both to play hard.
Knowing that, there is no reason to try to cooperate in the third to last period, and
so on. Cooperation unravels from the known final period all the way to
the first period of the game. In a repeated game with an end period that is known with
certainty, the Nash Equilibrium of the repeated game is just the Nash Equilibrium of
the one shot game repeated again and again.
Infinitely or Indefinitely Repeated Games
What if the end period is not known with certainty? There are two possibilities
here. The first is that the game is literally repeated infinitely ‐ it simply never ends.
The second possibility is that no one knows when the game will end. That is, there is
some chance the game may end after any period. That probability may represent the
chance a product becomes obsolete, or a player dies or retires, or that there is some
major change in the structure of the market. We will focus on the latter type of
game, in which there is some chance the game will end after any given round. In
these types of games, there may be an incentive to maintain good will, since the
players don’t know if the game will continue next period or not when choosing their
strategies.
We will let f represent the (constant) probability that the game ends after any
play, and r represent the interest rate used to discount future payoffs to obtain their
present value. Since the game could potentially continue forever, there are too many
strategies to define individually. Therefore, we will focus on a couple of strategies
that seem particularly reasonable in repeated games. Both are examples of trigger
strategies – strategies where an observed action on the part of your opponent
triggers a change in your play in future rounds.
• Tit for tat: In the first round, play cooperatively. Thereafter play whatever the
opponent played in the previous round. For example, in the price competition
game, if Mike were to play soft in period one, and then every round thereafter
play what David played the round before, Mike would be playing a tit‐for‐tat
strategy. In other words, Mike “pays back” David with whatever response
David gave Mike in the previous play. An important variation is tit for tat with
forgiveness, whereby non‐cooperative play is punished for a while, but then
the player attempts to establish cooperation again.
• Grim: In the first period, play cooperatively. Then, cooperate as long as your
competitor cooperates, but as soon as he stops cooperating, you punish him
every period thereafter. Think of it as rewarding cooperation forever, and
punishing non‐cooperation forever. We will focus on this strategy because it
gives the strongest possible incentives to cooperate. If Grim can’t induce your
opponent to cooperate with you, nothing can.
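The two trigger strategies above can be written as rules that map the history of play into a move, which makes their different responses to a single defection easy to compare. In the sketch below, "Soft" stands for cooperative play and "Hard" for punishment, as in the Mike and David game. The function names and the `defect_once` opponent are hypothetical, introduced only to provoke the triggers.

```python
def tit_for_tat(own_history, rival_history):
    """Cooperate first, then copy the rival's previous move."""
    return "Soft" if not rival_history else rival_history[-1]


def grim(own_history, rival_history):
    """Cooperate until the rival has ever played Hard, then punish forever."""
    return "Hard" if "Hard" in rival_history else "Soft"


def defect_once(round_to_defect):
    """Hypothetical opponent: plays Hard in one chosen round, Soft otherwise."""
    def strategy(own_history, rival_history):
        return "Hard" if len(own_history) == round_to_defect else "Soft"
    return strategy


def play(strategy_a, strategy_b, rounds):
    """Play the repeated game and return both players' move histories."""
    history_a, history_b = [], []
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        history_a.append(move_a)
        history_b.append(move_b)
    return history_a, history_b


if __name__ == "__main__":
    print("tit for tat:", play(tit_for_tat, defect_once(2), rounds=6)[0])
    print("grim:       ", play(grim, defect_once(2), rounds=6)[0])
```

Against an opponent who defects once, tit for tat punishes for a single round and then returns to cooperation, while grim never cooperates again.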
Let’s see if Grim is an equilibrium for our previous game if we assume r=0.1 and
f=0.12. Note the payoffs for cooperating are 10 (the bottom‐right cell).
                 David: Hard        David: Soft
Mike: Hard       M: 0, D: 0         M: 15, D: ‐10
Mike: Soft       M: ‐10, D: 15      M: 10, D: 10
Assume that Mike plays grim. Is it a best response for David to play grim in return?
If David plays grim, he will get 10 per round as long as the game goes on. His
expected profit is:
\[ E(\pi \mid \text{grim}) = 10 + 10\,\frac{0.88}{1 + 0.1} + 10\,\frac{(0.88)^2}{(1 + 0.1)^2} + \dots \]
The first 10 is from the first period and is not discounted. The second 10 he gets
only if the game continues (with probability 1‐f = 1‐0.12 = 0.88), and if he does get
the second 10 it has to be discounted back to its present value (dividing by 1+0.1).
The third 10 only happens if the game continues in both periods (probability \((0.88)^2\)) and then is
discounted back two periods, etc. Thus, in general, the expected profit is
\[ 10\sum_{t=0}^{\infty}\left(\frac{1-f}{1+r}\right)^t = 10\sum_{t=0}^{\infty}\left(\frac{0.88}{1.1}\right)^t = 10\sum_{t=0}^{\infty}(0.8)^t, \]
where t is the time period (round of play).
To find the value of \(\sum_{t=0}^{\infty}(0.8)^t\), note that it is simply an infinite geometric series.
So, we can make use of the following convenient formula (which we will not prove), valid when \(|a| < 1\):
\[ \sum_{t=0}^{\infty} a^t = \frac{1}{1-a}. \]
(The familiar formula for the present value of a perpetuity with a periodic payment
of A beginning at time 0 with interest rate r is just a special application of this
result.) For our purpose, a=0.8, so:
\[ \sum_{t=0}^{\infty}(0.8)^t = \frac{1}{1-0.8} = \frac{1}{0.2} = 5. \]
So the expected payoff above, in which David plays grim in response to Mike’s play
of grim boils down to
\[ 10\sum_{t=0}^{\infty}\left(\frac{1-f}{1+r}\right)^t = 10 \cdot 5 = 50. \]
Now let’s consider what would happen if David did not play grim. We will refer
to this as “cheating”, since David is not cooperating when the other player is trying
to. If David does not cooperate in the current round, he would play Hard. Matched
against Mike’s play of Soft, David would earn a payoff of $15. However, thereafter
Mike would play Hard, since his strategy is grim. David’s best play in all later rounds
is Hard. Therefore, in all subsequent periods, David’s payoff will be 0. Then, David’s
expected profit from the time he cheats going forward is
$$E(\pi \mid \text{cheat}) = 15 + \left(\frac{1-0.12}{1+0.1}\right)^{1}\cdot 0 + \left(\frac{1-0.12}{1+0.1}\right)^{2}\cdot 0 + \cdots = 15.$$
Since 50>15, David would rather play grim and “cooperate” each round by playing
Soft until Mike fails to play Soft, rather than “cheat” at some point and play Hard.
Since it is in David’s interest to play grim in response to Mike’s play of grim, we can
conclude it would also be in Mike’s interest to play grim in response to David’s play
of grim, because the game is symmetric. Thus, both players playing grim constitutes
a Nash Equilibrium in the infinitely (or indefinitely) repeated game.
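As a quick numerical check of the comparison above, here is a minimal Python sketch. It uses the example's values (r = 0.1, f = 0.12, payoffs 10, 15 and 0); the function name expected_pv and the option to truncate the sum are illustrative choices of ours, not part of the original example.

```python
# Parameters from the example above.
r = 0.10                 # interest rate
f = 0.12                 # probability the game ends after any round
a = (1 - f) / (1 + r)    # per-round discount factor, 0.88 / 1.1 = 0.8

def expected_pv(first_payoff, later_payoff, a, rounds=None):
    """Expected present value of first_payoff now plus later_payoff in each later round.

    With rounds=None the closed-form geometric series is used;
    otherwise the sum is truncated after `rounds` rounds in total.
    """
    if rounds is None:
        return first_payoff + later_payoff * a / (1 - a)
    return first_payoff + sum(later_payoff * a**t for t in range(1, rounds))

pv_grim = expected_pv(10, 10, a)   # cooperate forever against grim
pv_cheat = expected_pv(15, 0, a)   # cheat once, then (Hard, Hard) forever

print("grim:", round(pv_grim, 6), "cheat:", round(pv_cheat, 6))
```

Passing a large value for rounds gives a truncated sum that should approach the closed-form value, which is a simple way to convince yourself the geometric series formula is doing what we claim.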
Since playing grim is an equilibrium in this game (in other words, playing grim is
a best response to playing grim), cooperation is possible. Note that this doesn’t
mean that non‐cooperation isn’t also possible. Since non‐cooperation (both playing
hard) was our original Nash equilibrium when viewing this as a one‐shot game,
playing hard forever is also an equilibrium in a repeated game. The fact that it is
repeated doesn’t eliminate the original equilibrium; it just adds new potential
equilibria. In any repeated game, repeating the one‐shot equilibrium play each
round is always an equilibrium.
To generalize the above example, let’s define some notation:
• πCOOP: this is the one‐period payoff when both players cooperate
• πNE: this is the payoff from the one‐shot Nash equilibrium of the game
• πCHEAT: this is the one‐period payoff when one player exploits the other; note,
the “non‐cheating” player must be cooperating in expectation that the
“cheating” player will also cooperate
Assume there are two players, A and B. Assume one player plays grim. Is it a best
response for the other to play grim as well? The expected payoff of grim against
grim is:
$$E(\pi \mid \text{grim}) = \pi_{COOP} + \left(\frac{1-f}{1+r}\right)\pi_{COOP} + \left(\frac{1-f}{1+r}\right)^{2}\pi_{COOP} + \cdots$$
$$E(\pi \mid \text{grim}) = \pi_{COOP}\sum_{t=0}^{\infty}\left(\frac{1-f}{1+r}\right)^{t}.$$
Using the formula for the value of an infinite geometric series, we can simplify
the above summation. For our purpose, first note
$$a = \frac{1-f}{1+r}.$$
Then,
$$\frac{1}{1-a} = \frac{1}{1-\frac{1-f}{1+r}} = \frac{1}{\frac{1+r}{1+r}-\frac{1-f}{1+r}} = \frac{1}{\frac{1+r-1+f}{1+r}} = \frac{1+r}{r+f}.$$
So, to convert a flow of uncertain future payments starting immediately to its
present value, multiply by (1+r)/(r+f). If instead the payment starts one period
from now (either at the end of the current period or the beginning of the next), we
simply subtract one from this, to capture the fact that there is no current payment.
That yields
$$\frac{1+r}{r+f} - 1 = \frac{1+r-(r+f)}{r+f} = \frac{1-f}{r+f}.$$
So, to convert a flow of uncertain future payments starting in one period to its
present value, multiply by (1‐f)/(r+f). Note that if
f=0, this gives the well known multiplier to find the value of a perpetuity, 1/r.
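The two multipliers can be checked numerically. The following is a small sketch; the function names multiplier_now, multiplier_next and truncated_sum are ours, chosen only for illustration.

```python
def multiplier_now(r, f):
    """PV multiplier for an uncertain payment stream starting immediately."""
    return (1 + r) / (r + f)

def multiplier_next(r, f):
    """PV multiplier for an uncertain payment stream starting one period from now."""
    return (1 - f) / (r + f)

def truncated_sum(r, f, start, rounds=2000):
    """Brute-force sum of ((1 - f) / (1 + r))**t for t = start, ..., rounds - 1."""
    a = (1 - f) / (1 + r)
    return sum(a**t for t in range(start, rounds))

r, f = 0.10, 0.12
print(multiplier_now(r, f), truncated_sum(r, f, start=0))
print(multiplier_next(r, f), truncated_sum(r, f, start=1))
print(multiplier_next(0.05, 0.0), 1 / 0.05)   # with f = 0 this is the perpetuity factor 1/r
```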
Returning to the problem at hand, we have:
$$E(\pi \mid \text{grim}) = \pi_{COOP}\left(\frac{1+r}{r+f}\right).$$
This expression is just the expected present value from playing grim against grim
and cooperating forever. It will soon be useful to note that we could also express
this as:
$$E(\pi \mid \text{grim}) = \pi_{COOP} + \pi_{COOP}\left(\frac{1-f}{r+f}\right).$$
This latter expression simply breaks the expected present value into the initial
payoff plus the expected present value of the uncertain perpetuity beginning in one
period.
Now let’s look at the expected value from cheating, evaluated from the period at
which the player cheats going forward. Remember, the player that cheats gets one
period of πCHEAT followed by the one‐shot Nash equilibrium forever when playing
against grim.
$$E(\pi \mid \text{cheat}) = \pi_{CHEAT} + \left(\frac{1-f}{1+r}\right)\pi_{NE} + \left(\frac{1-f}{1+r}\right)^{2}\pi_{NE} + \cdots$$
The payoff is the profit of “cheating” plus the expected present value of receiving
the one shot Nash equilibrium profit beginning one period out and continuing until
the game ends. This is just:
$$E(\pi \mid \text{cheat}) = \pi_{CHEAT} + \pi_{NE}\sum_{t=1}^{\infty}\left(\frac{1-f}{1+r}\right)^{t}.$$
Making use of the fact that the expected present value of an uncertain perpetuity
beginning in one period is found by multiplying by (1‐f)/(r+f), this can be written as:
$$E(\pi \mid \text{cheat}) = \pi_{CHEAT} + \left(\frac{1-f}{r+f}\right)\pi_{NE}.$$
Cooperation in a repeated game is possible if the expected present value of
playing grim in response to grim exceeds the expected present value of “cheating”,
that is, if E(π|grim) ≥ E(π|cheat). From the work above, this is:
$$\pi_{COOP} + \pi_{COOP}\left(\frac{1-f}{r+f}\right) \ge \pi_{CHEAT} + \left(\frac{1-f}{r+f}\right)\pi_{NE}.$$
Rearranging this equation, we get
$$\left(\pi_{CHEAT} - \pi_{COOP}\right) \le \left(\frac{1-f}{r+f}\right)\left(\pi_{COOP} - \pi_{NE}\right).$$
The left hand side is the current gain from cheating instead of cooperating. The cost
of cheating this period is that in all future periods a cheater will get the Nash
equilibrium profit, not the cooperative profit. So, the right side is the expected
present value of the future profits that are given up by a cheater. So, this inequality
just says that if the expected present value of the future profits lost if the player
cheats at the current play exceeds the current gain from cheating instead of
cooperating, cooperation is a Nash equilibrium.
Since we’re saying the right‐hand side must be at least as large as the left‐hand side
for cooperation to be in the player’s self interest, several important conclusions
follow from the previous paragraph. If πCOOP increases, cooperation will be more likely
(the left side falls and the right side rises). Conversely, if πCHEAT increases, cooperation will be less likely. If r, the
interest rate, increases, that means we’re valuing present profits higher relative to
future profits, making cheating more attractive. If f, the probability that the game
ends, increases, the future is less valuable, and again it will be harder to cooperate.
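The condition and the comparative statics just described can be explored with a short Python sketch. The function name cooperation_sustainable is ours, and the parameter values in the loop are only illustrative (the first line of output uses the Mike and David example).

```python
def cooperation_sustainable(pi_coop, pi_cheat, pi_ne, r, f):
    """True if grim against grim is a best response, i.e. if
    (pi_cheat - pi_coop) <= ((1 - f) / (r + f)) * (pi_coop - pi_ne)."""
    gain_from_cheating_now = pi_cheat - pi_coop
    future_profit_lost = (1 - f) / (r + f) * (pi_coop - pi_ne)
    return gain_from_cheating_now <= future_profit_lost

# Illustrative parameter sets: (pi_coop, pi_cheat, pi_ne, r, f).
cases = [
    (10, 15, 0, 0.10, 0.12),   # the example above
    (10, 15, 0, 0.10, 0.70),   # game very likely to end
    (10, 15, 0, 2.00, 0.12),   # very high interest rate
    (10, 60, 0, 0.10, 0.12),   # very large payoff from cheating
]
for case in cases:
    print(case, cooperation_sustainable(*case))
```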
An increase in the number of players makes cooperation more difficult to sustain
for two reasons. First, the gains to cooperating are likely to fall with the number of
players. If a cartel makes all the firms act collectively as a monopoly, the profit each
firm gets individually is (1/n)th of the monopoly profit, πMONOPOLY/n. Thus, as n
increases, each firm’s “cooperating” profit decreases, which decreases the incentive
to cooperate. Second, cooperation hangs on the promise of reward versus threat of
punishment. In the real world, punishment can be much more difficult because it
can be hard to detect cheating. If we let c be the cost of monitoring one firm, and n
be the number of players, then each player has to monitor (n‐1) firms (since they
don’t have to monitor themselves) to make sure no one is “cheating.” Thus, an
individual’s monitoring costs are c(n‐1). So, monitoring costs are proportional to n‐1
while the gain to cooperating will tend to be inversely proportional to n.
Individually, the gain to cooperating thus falls for two reasons. From the cartel point
of view, the total cost of monitoring is c(n‐1)⋅n since there are n firms each
incurring costs of c(n‐1). So, total monitoring costs increase with the square of n,
and thus can seriously erode the total gain to cooperation in a large cartel.
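To see how quickly these two effects bite, here is a small sketch. The monopoly profit of 1000 and the monitoring cost of 5 per firm are hypothetical numbers chosen for illustration, and the function names are ours.

```python
def cartel_profit_share(pi_monopoly, n):
    """Each firm's share of monopoly profit when n firms split it evenly."""
    return pi_monopoly / n

def firm_monitoring_cost(c, n):
    """Cost to one firm of monitoring the other n - 1 firms."""
    return c * (n - 1)

def total_monitoring_cost(c, n):
    """Cartel-wide monitoring cost: n firms each paying c * (n - 1)."""
    return c * (n - 1) * n

PI_MONOPOLY = 1000.0   # hypothetical monopoly profit
C_MONITOR = 5.0        # hypothetical cost of monitoring one firm

for n in (2, 5, 10, 20):
    print(n,
          cartel_profit_share(PI_MONOPOLY, n),
          firm_monitoring_cost(C_MONITOR, n),
          total_monitoring_cost(C_MONITOR, n))
```

With these made-up numbers, a firm's monitoring bill overtakes its share of the monopoly profit somewhere between 10 and 20 members.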
If information were free and perfect, monitoring would not be needed. If any
player’s profit deviated from what they expected based on the “cooperative”
agreement, they would know someone “cheated” on the agreement. However,
monitoring and punishing players can be complicated due to the presence of noise.
Some things happen in the real world due to random events we have not explicitly
included in the model. The triggers for punishments in cooperative strategies must
be observable signals that everyone agrees on. But, noise means that the signal may
be received sometimes when cheating occurs, and, it may be received other times
when cheating did not occur.
In fact, if the cooperative agreement (whether implicit or explicit, like a cartel)
represents a cooperative Nash equilibrium, then it is in no one’s interest to cheat. In
that case, when a signal is received that indicates cheating, the members should
KNOW that no one cheated. But, the signal still triggers punishment, even though
everyone knows they are punishing when no one actually cheated. So, maybe it would
make sense not to pull the trigger on punishment when the agreed upon signal says
to do so? While the reasoning may seem subtle, doing that would entirely
undermine cooperation. If all players believed that everyone believed no one would
cheat, and thus would not punish, it becomes in everyone’s interest to cheat, and,
cooperation totally breaks down. For cooperation to work, there has to be an
“enforcer” in the group who everyone believes will pull the trigger when the agreed
upon signal arrives, whether or not anyone actually believes anyone cheated.
Otherwise, everyone cheats all the time.
An example: Suppose OPEC (the oil cartel, the Organization of the Petroleum
Exporting Countries) agrees on a new production quota, and expects crude prices
to average $95 per barrel. Suppose, then, they actually observe average prices of
$75 per barrel. Does that mean that some members are “cheating” by producing
over their quota and thus driving prices down? Or, does it mean that supplies from
other producing nations are unexpectedly high for some unknown reason, or, that
world oil demand is unexpectedly low? It may not be possible to know the answer
for certain. Further, to the extent the answer can be determined, it may take a long
time. By then, if it is due to “cheaters,” they have had a long time to cheat. But, if the
members start punishing one another, they may be doing so only because demand
was lower than expected, not because anyone cheated. The “enforcer” in OPEC is
Saudi Arabia. They are the enforcer due to their large excess capacity. They maintain
discipline in OPEC because they have the power to flood the market and reduce
prices for everyone. That threat is useless unless they are willing to punish everyone
even if they are not completely sure how much any particular member cheated, and,
how much the observed price is due to factors outside every member’s control.
Noise means it is a good idea to build some “forgiveness” into trigger strategies.
Perhaps, if everyone cooperates, the average price is expected to be $95. But, since
there is no certainty, perhaps punishment is only triggered if average price falls
below $80. Then, after some period of punishment, the strategy should allow for the
establishment of new targets. The occasional delivery of punishment (for random
reasons in equilibrium) ensures that no one ever cheats. The lower target for the
trigger reduces the frequency of the punishment, and, the fact that the punishment
is limited in duration allows the reestablishment of cooperation. But, all of these
things make it harder to punish cheaters, and, because punishment is somewhat
random, reduce the gains to cooperation. So, noise makes cooperation harder to
sustain. Noise is also related to monitoring costs, in that the noisier the environment
the more expensive it may be to collect useful (but imperfect) signals on the
behavior of other players.
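The tradeoff in choosing a trigger price can be illustrated with a rough sketch. We assume, purely for illustration, that when everyone cooperates the observed average price is normally distributed around the expected price of $95 with a standard deviation of $10; neither the normal distribution nor the $10 figure comes from the discussion above. The function name false_alarm_probability is ours.

```python
from statistics import NormalDist

def false_alarm_probability(expected_price, noise_sd, trigger_price):
    """Probability that the observed price falls below the trigger even
    though nobody cheated, under the assumed normal noise."""
    return NormalDist(mu=expected_price, sigma=noise_sd).cdf(trigger_price)

EXPECTED_PRICE = 95.0   # price expected if everyone cooperates (from the example)
NOISE_SD = 10.0         # assumed standard deviation of price noise (hypothetical)

for trigger in (90.0, 85.0, 80.0):
    p = false_alarm_probability(EXPECTED_PRICE, NOISE_SD, trigger)
    print(trigger, round(p, 4))
```

A lower trigger means punishment is set off by accident less often, but it also gives a cheater more room to shade price before anyone reacts.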
In summary, cooperation is more likely where there is less noise, fewer players,
less probability of the game ending, less profit from cheating, and lower interest
rates. To hold a cartel together, you need all of those things favorable; otherwise, the
gains from cheating become too large.
Repeated Games with Reputation Effects
We are now going to complicate a typical game by considering reputation as a
factor. Reputation is just what it sounds like – an intangible benefit that influences
players’ behavior. Up to this point, we’ve assumed all players act rationally with
respect to their payoffs; in other words, players maximize their expected payoff in a
given situation. However, some players may care about other things, for example
simply being honest. So, perhaps there are players out there that will cooperate if
they say they will cooperate. Or, who will enter a market if they say they will. They
do these things regardless of whether or not it appears to be in their self‐interest at
the time. Cultural and social references to such behavior are common, for example,
the following two.
“Recompense injury with justice, and recompense kindness with kindness”
(Confucius).
“An honest man's word is as good as his bond” (John Ray's English Proverbs, 1670).
We will call players that play in ways like this “crazy.” We don’t mean “crazy” in
the usual sense. Only in that they do not strictly maximize their payoffs within the
rules of the game as it is written. They may simply care about things we have not
included in the model. That makes them behave in ways our model would not
predict.
The presence of a small number of such “crazy” players can completely change
the way a game is played. A player that everyone thinks is “crazy” in this way can
change the way others play, because they do not expect the “crazy” player to be
rational. Therefore, players who are not crazy may nonetheless seek to gain a
reputation as “crazy” to change the way other players play against them.
Let’s look at a very simple example of this first. Consider the following game in
which two firms engage in price competition. Each strategy is just how competitively
the firms price their products, with hard being pricing below cost in an attempt to
undercut their competitor, soft being something like monopoly prices, and, medium
somewhere in between.
                                   Player B
                       Soft            Medium          Hard
           Soft     A: 10, B: 10    A: -5, B: 15    A: -10, B: 0
Player A   Medium   A: 15, B: -5    A: 0, B: 0      A: -5, B: -5
           Hard     A: 0, B: -10    A: -5, B: -5    A: -10, B: -10
Looking at their best responses (highlighted), we see both players have a
dominant strategy to play medium in a one shot game, and thus the one shot Nash
equilibrium is the middle cell. Let’s assume that this is a repeated game with a
known end period ‐ it ends in time period T. With only rational players, unraveling
means cooperation is not possible at all with the known end period. However, let us
allow for the possibility of “crazy” players in this game. Suppose there is a
probability f that any given player is crazy (here f no longer denotes the probability
that the game ends). For purposes of this game assume a “crazy” player
plays soft in the first period and continues to do so as long as their competitor has
played soft in previous rounds. But, if the competitor plays medium or hard in one
period, the “crazy” player plays hard the following period to punish the competitor.
After the period in which the punishment is delivered, the “crazy” player will play
medium forever.
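The claim that medium is a dominant strategy in the one shot game can be verified mechanically. The sketch below stores Player A's payoffs from the table; since the game is symmetric, the same check applies to Player B. The names PAYOFF_A and strictly_dominant_strategy are ours.

```python
STRATEGIES = ["Soft", "Medium", "Hard"]

# PAYOFF_A[a][b] is Player A's payoff when A plays a and B plays b (from the table).
PAYOFF_A = {
    "Soft":   {"Soft": 10, "Medium": -5, "Hard": -10},
    "Medium": {"Soft": 15, "Medium": 0,  "Hard": -5},
    "Hard":   {"Soft": 0,  "Medium": -5, "Hard": -10},
}

def strictly_dominant_strategy(payoff):
    """Return the strategy that beats every other strategy against every
    opposing strategy, or None if no such strategy exists."""
    for own in STRATEGIES:
        others = [s for s in STRATEGIES if s != own]
        if all(payoff[own][opp] > payoff[alt][opp]
               for alt in others for opp in STRATEGIES):
            return own
    return None

print(strictly_dominant_strategy(PAYOFF_A))
```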
Can this change the way a rational player will act? Might it be rational for a
player that is not crazy to mimic the behavior of a crazy player, at least for a while?
Looking at the end of the game in time period T, all players know the game will be
over. Thus, a rational player will play their strongly dominant strategy of medium,
since even if their competitor were crazy, there isn’t time left for the crazy player to
punish them.
What about period T‐1, one period before the final period of the game? Might it
be in their interest to play soft in period T‐1? Or, are they sure to play medium, in
which case the cooperation will unravel all the way back to the beginning?
Consider the following strategy on the part of sane players – in all periods but
the last, play soft until an opponent plays medium, after which play medium, and, in
the last period, play medium. Suppose player A thinks B is playing that strategy if
they are sane, but, there is a chance, f, that B is crazy. Is it in A’s best interest to
adopt the same strategy as B? Or, will they just play their dominant strategy?
Let’s look at period T‐1. If A plays soft, he gets 10 in period T‐1. Then, in period T,
if B is crazy, B plays soft, A plays Medium, and, A earns a payoff of 15. But, if B is not
crazy, they both play medium and both earn 0. Evaluated at period T‐1, the expected
present value of playing soft in period T‐1 then medium in T, EA(πA|S,M), is then
$$E(\pi_A \mid S, M) = 10 + \frac{f}{1+r}\,15 + \frac{1-f}{1+r}\,0 = 10 + \frac{f}{1+r}\,15.$$
The alternative would be to play medium at T‐1. Then, A makes 15 in period T‐1. If
B is sane, both play medium in period T, and, A makes 0. If B is crazy, B plays hard in
period T and A makes ‐5. The expected payoff, evaluated at period T‐1, of playing
medium in both remaining periods, EA(πA|M,M), is
$$E(\pi_A \mid M, M) = 15 - \frac{f}{1+r}\,5 + \frac{1-f}{1+r}\,0 = 15 - \frac{f}{1+r}\,5.$$
Now that we have the expected profit for both of A’s strategies, we want to know
when the first is more profitable than the second. That is, when:
$$E(\pi_A \mid S, M) \ge E(\pi_A \mid M, M)$$
$$10 + \frac{f}{1+r}\,15 \ge 15 - \frac{f}{1+r}\,5$$
$$\frac{f}{1+r}\,20 \ge 5$$
$$f \ge \frac{1+r}{4}.$$
If the interest rate is low, f need be only slightly larger than 0.25 for it to be
worth playing soft in period T‐1 to maintain the possibility of the good will of a
crazy opponent in period T. Similar reasoning means it is worthwhile to play soft in
all earlier periods, too. The (S,M) strategy essentially boils down to player A acting
“crazy” up to time period T, the last time period.
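The threshold on f can be checked numerically with a minimal sketch. The function names are ours, and the interest rate of 4% is just an illustrative value.

```python
def ev_soft_then_medium(f, r):
    """E(pi_A | S, M): 10 now, then 15 next period if B turns out to be crazy."""
    return 10 + f / (1 + r) * 15

def ev_medium_both(f, r):
    """E(pi_A | M, M): 15 now, then -5 next period if B turns out to be crazy."""
    return 15 - f / (1 + r) * 5

def min_crazy_probability(r):
    """Smallest f for which playing soft at T-1 is at least as good: (1 + r) / 4."""
    return (1 + r) / 4

r = 0.04   # illustrative interest rate
print(min_crazy_probability(r))
for f in (0.20, 0.26, 0.30):
    print(f, ev_soft_then_medium(f, r), ev_medium_both(f, r))
```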
There are several important things to understand about the previous example.
First, we are not saying it is in player A’s best interest to actually be “crazy.” A
“crazy” player will always play soft if their competitor did, even in the final round.
We know that it is never rational to play soft in the final round; so what player A is
considering is not actually whether to become “crazy,” but whether it is worth
acting as if she were “crazy” to take advantage of the value of that reputation.
Further, actually being “crazy” is not a choice. Rational players may choose to mimic
crazy ones, at least for a while, to develop a valuable reputation as a cooperator that
allows them to make $10 every period before the last instead of 0, and, then to have
a chance at making 15 the last period.
Another conclusion that follows from this is a reputation only has value as long
as there is time left in the game to make use of it. The closer to the end of the game
you are, the less valuable a reputation becomes. Finally, a rational player that has
gained a reputation will want to “milk” that reputation in the last period. This means
that even though it may be rational for player A to act “crazy” in all periods up to
period T, she will “milk” her reputation by playing medium in the final period, period
T. This is simply because it is no longer beneficial to maintain her reputation once
the game has ended.
Now that we have considered the case where a reputation as a cooperator may
be beneficial, let’s look at the case where a reputation as a fighter may be beneficial.
For this situation, we will use an entry game as an example, but, allow the entrant to
move first. The entrant can enter or not, then, the incumbent has the option to fight
or accommodate the entrant. The game tree is shown below:
Entrant
  Enter -> Incumbent
             Accommodate -> E: 10, I: 20
             Fight       -> E: -10, I: 10
  No    -> E: 0, I: 40
Using the red lines to eliminate irrational responses, we see that the sub‐game
perfect Nash equilibrium is that the entrant enters, and the incumbent
accommodates. Now, just like when we were considering cooperation in the
previous game, we want to know whether or not it is rational for the incumbent to
act as a fighter. Suppose there is some chance, f, that an entrant may encounter a
crazy incumbent who always fights. Will a rational incumbent want to mimic the
crazy one, and, will that deter the entrant? Let’s look at the game again, adding the
value of that reputation:
Entrant
  Enter -> Incumbent
             Accommodate -> E: 10, I: 20 + VA
             Fight       -> E: -10, I: 10 + VF
  No    -> E: 0, I: 40 + VN
Notice the addition of the variables to the incumbent’s payoffs. VF is added to the
incumbent’s payoff when he fights. This signifies the expected present value of
future payoffs if the incumbent fights; think of it as the future profits he earns by
keeping other competitors out of the industry. VA is added to the incumbent’s payoff
when he accommodates. Similarly, it’s the expected present value of future payoffs if
the incumbent accommodates. Finally, VN is the expected present value of future
profits if the incumbent neither fights nor accommodates because the entrant does
not enter.
Before we analyze this game, it’s important to note that the incumbent is playing
this game many times, while the entrant is only playing it once. This is where the
incumbent’s reputation comes in to play; he is trying to value the reputation of
acting like a fighter versus acting like an accommodator. He will face many entrants,
and what he does now will impact how they act in the future. Also, you can look at
fighting in this game similar to acting “crazy” in the last, since fighting isn’t the SPNE.
The incumbent is deciding whether or not to act “crazy,” not whether or not to
actually become “crazy.”
Now let’s ask a question. When is it rational for the incumbent to fight (i.e. act
“crazy”)? He fights when his expected payoff from fighting is greater than his
expected payoff from accommodating. Looking at his payoffs, we see
E(π|fight) > E(π|accom)
10 + VF > 20 + VA
VF – VA > 10
So when the difference between the value of a reputation as a fighter and the
value of a reputation as an accommodator is greater than 10, the incumbent would
rather fight. We’re not going to solve this game explicitly, like we did the last. But we
can make some general observations.
Imagine that the incumbent fights off the first entrant, and that that is enough
to keep all future entrants from entering. What would VF be? Well, the period after
the incumbent fights, no more entrants will ever enter again. Thus, he will earn a
payoff of 40 forever. Taking into account discounting, the present value of that 40 is
(40/r). Since that is what he earns due to him fighting in the first round, VF = (40/r).
Similarly, imagine if the incumbent accommodates the first entrant, and that that
is enough to show all future entrants that he is an accommodator. What would VA
be? After the first period, future entrants will always enter, since they know the
incumbent will always accommodate. Since he earns a payoff of 20 for
accommodating, that payoff forever is just (20/r). Since that is what he earns as a
result of accommodating in the first round, VA = (20/r). So the conclusion we can
make about our above inequality, VF – VA > 10, is that if the incumbent plays the
game a long time, the left hand side of the inequality, VF – VA, could approach
(40 − 20)/r = 20/r, which clearly exceeds 10 for any r below 2.
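The incumbent's decision can be sketched in a few lines. The payoffs 10 and 20 come from the game tree, and the values 40/r and 20/r are the limiting reputation values just discussed; the function names and the sample interest rates are ours, for illustration only.

```python
def perpetuity_value(payment, r):
    """Present value of receiving `payment` every period, starting next period."""
    return payment / r

def incumbent_fights(v_fight, v_accommodate):
    """True if fighting pays: 10 + VF > 20 + VA."""
    return 10 + v_fight > 20 + v_accommodate

# One-shot game: no reputation value, so the incumbent accommodates.
print(incumbent_fights(0, 0))

# Limiting case from the text: VF = 40/r and VA = 20/r.
for r in (0.05, 0.10, 2.50):
    v_f = perpetuity_value(40, r)
    v_a = perpetuity_value(20, r)
    print(r, v_f - v_a, incumbent_fights(v_f, v_a))
```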
Let’s now define g to be the probability that a “sane” incumbent fights;
remember that the incumbent is just acting like a fighter in order to secure that
reputation. We already solved the inequality that tells us when it is worthwhile to
fight, so g is just the probability that it holds, or
g = Pr(VF – VA > 10).
Now let’s look at the game from the entrant’s perspective. When the entrant
decides upon entering or not entering, he ultimately wants to predict whether or
not the incumbent will fight him. If he does, the entrant will stay out, since his payoff
would be ‐10. The entrant may be facing one of three types of incumbents ‐ a crazy
incumbent, a sane one that fights anyway, or, a sane one that will accommodate. The
total probability of a fight if the entrant enters is f+g. The entrant stays out if
E(π|Enter) < 0
(1‐(f+g))*10 – (f+g)*10 < 0
10 < 20(f+g)
0.5 < (f+g)
which says that if the total chance that the incumbent will fight is greater than 50%,
the entrant will stay out.
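The entrant's side of the calculation is equally short. In this sketch the payoffs 10 and ‐10 come from the game tree; the function names and the sample values of f and g are ours.

```python
def entrant_expected_payoff(p_fight):
    """Expected payoff from entering: 10 if accommodated, -10 if fought."""
    return (1 - p_fight) * 10 + p_fight * (-10)

def entrant_stays_out(f, g):
    """True if entering has a negative expected payoff (staying out pays 0)."""
    return entrant_expected_payoff(f + g) < 0

# Illustrative values of (f, g).
for f, g in [(0.1, 0.2), (0.1, 0.5), (0.3, 0.3)]:
    print(f, g, entrant_expected_payoff(f + g), entrant_stays_out(f, g))
```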
The following observations are important about reputations in repeated games:
• It is possible for g and VF – VA to be high, even if f is low, if r is low and the
incumbent will play a long time. The intuitive interpretation of this is that an
incumbent that is playing a long time, who may not be crazy (low f), but who
values the future a lot (low r), will still fight to secure a reputation as a fighter
to keep entrants out.
• A reputation is not valuable if it is “cheap” in the sense that it is easy to come
by. Remember, to gain the reputation as a fighter, the incumbent had to fight
occasionally, which cost him payoffs in the period. If the forgone payoffs were
really low, however, the entrant would know that the incumbent could get
back that reputation at any time, and therefore that the incumbent will not be
willing to sacrifice much to keep it.
• The value of a reputation drops as “retirement” nears. The inherent value of a
reputation is in the fact that it secures future profits. As the remaining time
periods of the game dwindle down, the reputation won’t be as valuable to the
incumbent, and thus it will be less likely that he will try to maintain it.
Chapter 14 Terminology
The following is a list of terms that you should know in order to discuss and
apply the material from this chapter.
Unraveling: The process of starting at the end of a finite game to see how players
will act. Because players know when the final stage of the game will occur, they will
have no incentive to cooperate in that stage, and thus the penultimate stage, and so
on until the first round.
Probability of Obsolescence: The probability that the game will end after any play.
As this probability increases, cooperation becomes less likely to occur.
Trigger Strategy: Strategy in which a player acts a certain way until the opponent
acts differently, at which point the player changes his behavior.
Tit for Tat Strategy: Trigger strategy in which a player does anything his opponent
did in the previous period.
Grim Trigger Strategy: Strategy in which the players play cooperatively in the first
period, but as soon as one player stops cooperating, the other punishes him every
period thereafter.
Monitoring Cost: The cost of enforcing and punishing players for not cooperating.
As this cost increases, cooperation becomes less likely to occur.
Noise: The sheer randomness in the real world that cannot be modeled but can
affect outcomes. As the level of noise increases, cooperation becomes less likely to
occur.
Cartel: A cooperative and informal agreement between two or more firms, usually
in an oligopolistic industry.
Reputation: An intangible benefit that influences other players’ behavior. Certain
players may forgo the highest payoff today in order to secure a reputation for the
future.
Crazy Player: A player who is not maximizing his current possible payoff. This
could be because of the value of having a reputation.
Part 5
Product Market Structure,
Strategy, and Analysis
Chapter 15
Homogenous Product Markets
When talking about market structure, firms usually fall somewhere along a
spectrum that describes how competitive their industry is. At one end is a
monopoly, which consists of a single firm that faces no competition, and has
complete control over the market price. At the other end is perfect competition, in
which there are many firms, and each individual firm has no control over price
levels; they are price takers, as described earlier in the course. Firms that are
engaging in perfect competition have no strategic decisions to make, since they take
their price from the market; nothing they do will have any significant impact on the
other firms in the market.
An oligopoly falls somewhere in the middle of the spectrum. An industry that is
an oligopoly consists of at least two firms, each having significant market share.
Since there are a relatively small number of large players, the managerial decisions
that each firm makes have a significant impact on the demand that other firms
experience. It is a market in which game theory can be used to help predict how
firms will act, since all of the firms are strategically connected.
Before talking more about an oligopoly, let’s illustrate a typical monopoly on a
graph. Demand is D and marginal revenue is MR. Remember, MR<p since you have to
lower price to sell one more unit, so the MR curve will lie below the D curve. The red
lines are marginal cost (MC) and average cost (AC). MC must cross AC where the AC
line is at a minimum, because a higher marginal cost will pull average cost up, and a
lower marginal cost will pull average cost down.
[Figure: monopoly graph with price p on the vertical axis and quantity Q on the
horizontal axis, showing the curves D, MR, MC, and AC, the points pMON, ACMON, and
QMON, and the profit area π.]
A monopoly will maximize profit where MR=MC, shown by pMON and QMON. The
profit is just total revenue minus total cost. The AC curve tells us the average cost of
each unit, so the profit is just the area above ACMON and below pMON out to QMON, shown by the
blue box in the graph.
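To make the MR=MC rule concrete, here is a small numerical sketch. The linear inverse demand curve and the quadratic cost function (which gives a U‐shaped AC curve) are hypothetical forms with made‐up numbers; they are not taken from the graph.

```python
# Hypothetical functional forms (assumptions for illustration only):
#   inverse demand  p(Q)  = A - B*Q
#   total cost      TC(Q) = F + C*Q + D*Q**2
A, B = 100.0, 1.0
F, C, D = 200.0, 10.0, 0.5

def price(q):
    return A - B * q

def marginal_revenue(q):
    return A - 2 * B * q        # derivative of p(Q) * Q

def marginal_cost(q):
    return C + 2 * D * q        # derivative of TC(Q)

def average_cost(q):
    return F / q + C + D * q    # TC(Q) / Q

# Setting MR = MC and solving for Q: A - 2*B*Q = C + 2*D*Q.
q_mon = (A - C) / (2 * (B + D))
p_mon = price(q_mon)
ac_mon = average_cost(q_mon)
profit = (p_mon - ac_mon) * q_mon   # the box between pMON and ACMON, out to QMON

print(q_mon, p_mon, round(ac_mon, 4), round(profit, 4))
```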
In the free market system, when a firm is making profit, other firms are bound to
enter. This will happen unless there are barriers to entry. These could be legal
barriers, such as regulations or patents; or they could be due to sheer economies of
scale, such as one firm always being able to produce a given output more cheaply
than two firms could. Unless there are barriers to entry, profits will attract entrants.
(Note: monopoly rights don’t necessarily guarantee profits if, for example, you had
monopoly rights to produce VHS tapes or some other obsolete product).
Bertrand (Price) Competition with Homogenous Products
Now let’s imagine we have two players, A and B, that are engaging in price
competition. Assume these two firms make and sell homogenous products; that
is, assume that they are selling the same thing, and the only difference is the price
they charge. Therefore, the customer base is indifferent between their products and
will just buy from whichever firm has the lower price. Next, assume both players
announce their price simultaneously, and the firm with the lowest price meets
whatever level of demand they encounter at that price. If they announce the same
price, they split the market evenly. Finally, assume c is the constant per unit cost for
both firms, and ε is the smallest increment by which price can be changed.
To determine the Nash equilibrium of this game, let’s look at player A’s best
responses. (Since the game is symmetric, these mirror player B’s best responses.)
If player B sets a price of pB, as long as pB>c+ε, player A will set a price of pA=pB‐ε
in order to undercut player B and get the whole market.
If pB=c+ε, player A won’t want to undercut player B anymore, since doing so
would cause pA=c, which means player A is earning just enough to cover costs and
thus not earning profit. Therefore, if pB=c+ε, player A will set the same price at
pA=c+ε, and they will earn a profit of ε per unit, and split the market evenly.
If pB=c, player A doesn’t really care what price it sets, since setting it at c would
mean 0 profit, and setting it above would not get any market share.
Just as when we were looking at game theory tables, we want to know where the
intersections of best responses are. We can see the only two points where their
responses intersect are either pricing at cost, or pricing just slightly above cost.
Since each firm has the incentive to lower price and capture the entire market, their
prices will be driven down to the competitive level (unit cost or unit cost plus ε) and
neither firm will make any significant profit.
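The best response argument can be checked by brute force on a price grid. In the sketch below the unit cost of 10, the increment ε of 1, and the linear demand curve are hypothetical values chosen for illustration. With a downward sloping demand curve, the best response to a very high rival price is the monopoly price rather than the rival's price minus ε, but that does not change where the best responses intersect.

```python
C = 10                       # constant unit cost (hypothetical)
EPS = 1                      # smallest price increment (hypothetical)
PRICES = range(0, 101, EPS)  # price grid

def demand(p):
    """Hypothetical linear market demand."""
    return max(0, 100 - p)

def profit(p_own, p_rival):
    """Low price takes the whole market; equal prices split it evenly."""
    if p_own < p_rival:
        return (p_own - C) * demand(p_own)
    if p_own == p_rival:
        return (p_own - C) * demand(p_own) / 2
    return 0

def best_responses(p_rival):
    """All prices on the grid that maximize profit against p_rival."""
    best = max(profit(p, p_rival) for p in PRICES)
    return {p for p in PRICES if profit(p, p_rival) == best}

BR = {p: best_responses(p) for p in PRICES}

# A Nash equilibrium is a pair of prices that are best responses to each other.
equilibria = [(pa, pb) for pa in PRICES for pb in PRICES
              if pa in BR[pb] and pb in BR[pa]]
print(equilibria)
```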
This result also has applicability to a monopolist producing a highly durable
good. Imagine for simplicity that the good lasts forever, such as diamonds, that these
goods never lose any value, and, that the inflation adjusted interest rate is near 0.
Suppose consumers expect that the price charged today will be the same as the price
charged tomorrow. Once the monopolist sets price, everyone willing to buy at that
price buys. When tomorrow rolls around, if the monopolist wants to sell more, he
will have to lower price. The next day he would have to lower price again, and, so on,
until he was pricing at cost. But the customers today would be able to predict that,
and since it’s the same product tomorrow and the next day, they would just wait.
Thus, today’s durable goods monopolist faces price competition from his future self,
and, that competition forces him to sell his product near cost today.
In the previous example, we saw how price competition can drive a two‐firm
industry (or even a durable goods monopolist) to sell at or near cost;
however, we don’t readily observe this in the real world. This is because the last
two games relied on an underlying assumption that typically does not hold.
Referring to the first game, we saw that the Nash equilibrium was for each firm
to produce half of the industry quantity, or Q/2. We also saw that player A had the
incentive to undercut player B (when pB>c+ε) in order to capture the entire market.
However, if player A were to capture the entire market, she would need to be able to
produce Q units; that is, A’s capacity must be Q units. Similarly, for B to
undercut A, his capacity must be Q units, too. Capacity is expensive, and it takes
lots of labor and time to get factories and workers in place. Why would either player
ever expend resources to build Q units of capacity when they expect to produce only
Q/2 units in the Nash equilibrium? Surely the managers would be fired for wasteful
spending if they built plants twice as large as needed. Therefore, the players will not
always be able to undercut one another, for lack of capacity. This is true in general;
price competition doesn’t make sense when dealing with homogenous (or durable)
products and when the firms must also choose their capacity.
Simultaneous Homogenous Product Price Competition with Capacity
Choice
So, to reasonably model homogenous product oligopoly, we have to include the
choice of capacity. Remember we said that the only reason a firm would engage in
price competition with another firm in this industry is if it had excess capacity to do
so; but the Nash equilibrium of that game was each firm splitting the market.
Therefore, no rational player would have excess capacity, since it is costly. We will
now see that adding capacity limits changes the strategic decisions in this game.
Let’s assume an inverse demand curve of p(Q), which just describes price as a
function of market quantity. Let’s let Q be industry quantity, and, allowing for two
firms A and B, let
Q = qA + qB
where qA, qB are player A’s and B’s quantities, respectively.
We will model price competition in a homogenous product market as a two‐
stage simultaneous move game. In the last stage, each firm announces a price and
sells what is demanded at that price, up to its capacity. In the first stage, each firm
chooses its capacity. Capacity cost is constant at k per unit and operating cost is
constant at c per unit. Letting $\bar{q}$ represent capacity, A’s capacity is $\bar{q}_A$ and B’s is $\bar{q}_B$.
In choosing capacity, neither firm will have any incentive in this game to build
significant excess capacity. So, they both expect that total sales in the second stage
will be the sum of the two firms’ capacities, or
$$Q = \bar{q}_A + \bar{q}_B.$$
Thus, looking ahead from the first stage, they expect that market price will be
$$p = p(\bar{q}_A + \bar{q}_B).$$
Now, let’s look at player A’s profit function.
π = pq – C(q).
Capacity cost is $k\bar{q}_A$ and, with no excess capacity, operating cost is $c\bar{q}_A$. Substituting
$p = p(\bar{q}_A + \bar{q}_B)$ for price, this becomes
$$\pi_A = p(\bar{q}_A + \bar{q}_B)\,\bar{q}_A - (c + k)\,\bar{q}_A.$$
Simultaneous Quantity (Cournot) Competition
To generalize, we know the amount firm A will produce implicitly depends on
capacity, so instead of writing $\bar{q}_A$ for capacity, we can simply say that firm A will
produce qA units, and it will be understood that this was based on her choice of
capacity. Also, we can define player A’s total cost in the form of a cost function
C(qA). Rewriting her profit function with these generalizations, it becomes
πA = p(qA+qB)qA – C(qA)
which says the price determined in the second stage is just the price that clears the
market given total available quantity (capacity).
Let’s look at maximizing this graphically. Market demand is shown by p(Q). Since A
will maximize profit based on their best guess about what B will produce, the residual
demand that she faces will be the industry demand minus the amount B produces
(qB), shown by the double arrow. Since A will produce where MR=MC, qA is where A’s
MR curve intersects her MC.
[Figure: price p on the vertical axis and quantity Q on the horizontal axis, showing
market demand p(Q), A’s residual demand, MRA, and MCA; qB is marked by the double
arrow and qA on the quantity axis.]
We can see given the information above, this problem looks like any other profit
maximization question. The problem becomes a bit more complicated when we
analyze how exactly player A makes her best guess about the quantity that B will
produce. Let’s look at A’s profit function again, and set the derivative equal to 0:
πA = p(qA + qB)qA – C(qA)
$$\frac{d\pi_A}{dq_A} = p + \frac{\partial p}{\partial q_A}\,q_A - \frac{dC_A}{dq_A} = 0.$$
(Using the product rule to get marginal revenue.)
Notice that the term (dC/dqA) is nothing more than A’s marginal cost, or MCA.
Now, when we used the product rule to calculate A’s marginal revenue (the first two
terms of the derivative) we said the derivative of p(qA + qB) was (dp/dqA); we can
now rewrite this as
$$\frac{\partial p}{\partial q_A} = \frac{dp}{dQ}\,\frac{\partial Q}{\partial q_A},$$
since the right side says first how price changes with respect to industry quantity
(dp/dQ), and then how industry quantity changes with respect to A’s quantity
(∂Q/∂qA). This is the same as writing it as how price changes with respect to A’s
quantity (∂p/∂qA) as it is on the left; it’s just splitting it up into two different
derivatives. Splitting it up allows us to simplify it, however, because we know
Q=qA+qB which means
$$\frac{\partial Q}{\partial q_A} = 1.$$
So the equation becomes
$$\frac{d\pi_A}{dq_A} = p + \frac{\partial p}{\partial q_A}\,q_A - \frac{dC_A}{dq_A} = 0$$
$$\frac{d\pi_A}{dq_A} = p + \frac{dp}{dQ}\,q_A - MC_A = 0$$
$$p + \frac{dp}{dQ}\,q_A = MC_A.$$
Now, let’s let sA be A’s market share. This is just the share of the total industry
quantity that qA is; in other words, sA = (qA/Q). Solving for qA we get qA=sAQ and
plugging this in to the above equation we get
$$p + \frac{dp}{dQ}\,Q s_A = MC_A.$$
The reason we’ve gone through this mathematical manipulation is to express a
point regarding competition. In the above equation, the left side is marginal
revenue, and the right side is marginal cost, which we know are equal if a firm is
maximizing profit. Now consider a firm in a monopoly. What is its market share?
100%, or sMON=1. So the marginal revenue of a monopolist is
$$MR_{MON} = p + \frac{dp}{dQ}\,Q.$$
We also know a monopolist has to lower price in order to sell another unit. This
implies (dp/dQ) is negative, and it follows that MR<p. The marginal revenue of firm
A in competition with other firms is
$$MR_A = p + \frac{dp}{dQ}\,Q s_A$$
and we can see that as more firms share the industry, A’s market share sA goes down
and MRA moves closer to price. For firms in perfect competition, sA approaches 0, and MR
approaches price. For firms in an oligopoly, sA will be somewhere between 0 and 1.
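To see how market share moves marginal revenue between the monopoly case and the competitive case, here is a minimal sketch. It assumes, purely for illustration, a linear demand p(Q) = 20 − 0.25Q evaluated at Q = 40; the function name and the list of shares are made up for the example.

```python
# Illustrative sketch: MR = p + (dp/dQ) * Q * s for different market shares s.
# Assumed market: p(Q) = 20 - 0.25 Q, evaluated at total quantity Q = 40.

def marginal_revenue(price, slope, total_quantity, share):
    """Marginal revenue of a firm whose market share is `share`."""
    return price + slope * total_quantity * share

slope = -0.25                          # dp/dQ
total_quantity = 40.0
price = 20 + slope * total_quantity    # market price at Q = 40

for share in (1.0, 0.5, 0.1, 0.01):
    mr = marginal_revenue(price, slope, total_quantity, share)
    print(f"share = {share:4.2f}   MR = {mr:5.2f}   price - MR = {price - mr:5.2f}")
```

The gap between price and marginal revenue shrinks in proportion to the share, which is the sense in which a small firm behaves almost like a price taker.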
Looking back at the solution to player A’s maximization problem
$$p + \frac{dp}{dQ}\,q_A = MC_A$$
remember that market demand p(Q) depends not only on qA but also on qB; thus,
when we solve this equation, we will get qA as a function of qB. This is because player
A is maximizing her profit given a “best guess” about player B’s quantity. From our
section on game theory, we described a set of best responses as a reaction function,
and this is exactly the same thing. So, solving the above will give us A’s reaction
function which depends on qB, or
qA = RA(qB)
and doing the same for player B will give us
qB = RB(qA)
which is just a function telling player B what to produce for a given “guess” of qA.
Notice the only difference between these maximization problems and the
previous ones is that there is now an unknown present that each player has to make
assumptions about. The theory of setting marginal revenue equal to marginal cost
still holds, as it always will; it’s just that what marginal revenue exactly is has
become slightly more complicated.
Example
Let’s look at a numerical example, and see how this uncertain variable affects a
firm’s profits. Suppose cost/unit is $5, and let market demand be P(Q)=20 ‐ .25Q.
The profit of a monopolist is
$$\pi_{MON} = (20 - .25Q)Q - 5Q$$
$$MR = 20 - \frac{Q}{2} = 5 = MC$$
(taking the derivative and setting equal to 0)
$$\frac{Q}{2} = 15 \Rightarrow Q = 30$$
$$P = 20 - .25(30) \Rightarrow P = 12.50$$
$$\pi_{MON} = 30(12.50 - 5) = 225$$
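The monopoly calculation can be checked numerically. The sketch below assumes a general linear demand p(Q) = a − bQ with constant unit cost c, so that MR = a − 2bQ; the function name and the parameter names a, b, c are introduced here only for illustration.

```python
# Illustrative sketch: monopoly outcome for linear demand p(Q) = a - b*Q
# and constant unit cost c. Setting MR = a - 2bQ equal to c gives Q = (a - c)/(2b).

def monopoly_outcome(a, b, c):
    """Return (quantity, price, profit) for the profit-maximizing monopolist."""
    quantity = (a - c) / (2 * b)
    price = a - b * quantity
    profit = (price - c) * quantity
    return quantity, price, profit

quantity, price, profit = monopoly_outcome(a=20, b=0.25, c=5)
print(quantity, price, profit)   # 30.0 12.5 225.0
```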
The profit of a firm in oligopoly (Cournot) is
$$\pi_A = (20 - .25(q_A + q_B))q_A - 5q_A$$
$$\pi_A = (20 - .25q_A - .25q_B)q_A - 5q_A$$
$$\frac{d\pi_A}{dq_A} = 20 - .25q_A - .25q_B - .25q_A - 5 = 0$$
(To make a point, refer back to the graph of A’s residual market demand; it was
shifted down by expectations of how much B was going to produce. In this equation,
the ‐.25qB represents that shift.)
20 − .25qB − .5qA = 5
15 − .25qB = .5qA
qA = RA (qB ) = 30 − .5qB
(Another point: if A is a monopolist, B doesn’t produce anything, so qB=0 and qA
would be 30, the same quantity that our monopolist above produced.)
Since B’s cost is the same, this game is “symmetric”, so going through the same
process will yield an identical reaction function:
qB = RB (qA ) = 30 − .5qA
Now let’s look at a graph of both players’ reaction functions. The RA is A’s reaction
function, and the RB is B’s reaction function, while the horizontal axis is the quantity A
produces, and the vertical axis is the quantity B produces. We said if B produces 0, A
will want to produce 30; that is why the horizontal intercept for A’s reaction function
is 30. To find the vertical intercept for A’s reaction function, just solve the equation
when qA = 0. By a similar process we can find the intercepts for B’s reaction function.
[Figure: reaction functions RA and RB, with qA on the horizontal axis and qB on the
vertical axis. Both axes are marked at 30 and 60, and the curves cross at (qANE, qBNE).]
Now, let’s think about the (only) equilibrium of this graph. Suppose player A
thinks player B will produce 30 units. RA tells us what player A’s best response is to
that quantity; so, follow the dotted green line to see what player A would want to
produce. However, if A produces this quantity, follow the line again to RB to see what
player B would want to produce. It is clear that this process will continue until both
players reach the point of intersection, the green dot. It is at this point where each
player is making a guess about the other player’s quantity, maximizing their profit
with respect to that guess, and both players are happy. Looking at the graph we can
see that this is the only reasonable solution to this game, since it’s the only point
where the two players’ best responses intersect.
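The back-and-forth process described above can be run directly. The sketch below uses the two reaction functions from the example and starts from the guess that B produces 30; the function names and the number of rounds are assumptions made for the illustration.

```python
# Illustrative sketch: follow the best responses back and forth until they settle.

def reaction(q_other):
    """Best response from the example: q = 30 - 0.5 * q_other (never negative)."""
    return max(0.0, 30 - 0.5 * q_other)

def iterate_best_responses(q_b_guess, rounds=50):
    """Start from a guess about B's quantity and alternate best responses."""
    q_a, q_b = reaction(q_b_guess), q_b_guess
    for _ in range(rounds):
        q_b = reaction(q_a)   # B responds to A's current quantity
        q_a = reaction(q_b)   # A responds to B's new quantity
    return q_a, q_b

q_a, q_b = iterate_best_responses(q_b_guess=30)
print(round(q_a, 6), round(q_b, 6))
```

Each round cuts the distance to the intersection to a quarter of what it was, so the quantities settle quickly at the point where the two reaction functions cross.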
We know we want the point where both lines intersect, and we have both
players’ reaction functions; thus, we can just solve the two functions as a system of
equations:
qA = RA (qB ) = 30 − .5qB qB = RB (qA ) = 30 − .5qA
qA = 30 − .5(30 − .5qA ) (substituting in qB’s reaction function)
qA = 15 + .25qA
.75qA = 15 ⇒ qA = 20
(Note: If the game is symmetric, where the only difference between the two
firms is their names, and there’s only one Nash equilibrium, as in our game, you can
save time by concluding that qA=qB. Thus, after finding player A’s reaction function
qA=30‐.5qB you could substitute qA for qB and get qA=30‐.5qA or 1.5qA=30, which
gives us the same answer of 20 for each firm. This is handy if you need to save time
on an exam.)
Since the game is symmetric, and there’s only one equilibrium, we know qA=qB,
so qB=20. We can now find industry quantity:
Q = qA + qB = 20 + 20 = 40
and referring back to our market demand curve to determine our price we get
P(Q) = 20 − .25Q = 20 − .25(40) = 10
so player A’s profit is
π A = 20(10 − 5) = 100
and since we know the game is symmetric, πB = 100 as well. Total industry profit is
πA + πB = 100 + 100 = 200
Notice that the total industry profit is 200 with competition, whereas the total
industry profit in the case of a monopolist was 225. This should make intuitive
sense; competition drives down profits. This is still higher than 0 profit, however,
which was the case when we were considering price competition with two firms
that had unlimited capacity.
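As a quick check on this comparison, the sketch below computes total industry profit at three total quantities for the example’s demand and cost: the monopoly quantity (30), the Cournot quantity (40), and the quantity at which price equals cost (60), which is where unlimited-capacity price competition ends up. The function name and default parameters are chosen only for illustration.

```python
# Illustrative sketch: industry profit as a function of total quantity,
# using p(Q) = 20 - 0.25 Q and unit cost 5 from the example.

def industry_profit(total_quantity, a=20.0, b=0.25, c=5.0):
    price = a - b * total_quantity
    return (price - c) * total_quantity

cases = [("monopoly", 30), ("Cournot duopoly", 40), ("price equals cost", 60)]
for label, total_quantity in cases:
    print(f"{label:18s} Q = {total_quantity:2d}   profit = {industry_profit(total_quantity):6.2f}")
```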
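Now, let’s generalize the graph of the reaction functions. Just as in our example,
when player A produces nothing, player B will want to produce the monopoly
quantity, and if player B produces nothing, player A will want to produce the
monopoly quantity.
[Figure: generalized reaction functions, with qB on the vertical axis; RA and B’s
monopoly quantity qBMON are marked.]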
We can look at the graph of the players’ reaction functions to find out. There are two
results from the reduction in marginal cost. Since B lowered his marginal cost, for
the initial quantity that A was producing (qA0), B will be able to produce more;
therefore, his reaction function will shift out. At A’s original quantity of qA0, player B
will now produce at $\tilde{q}_{B1}$. This reduction in MC will increase B’s profits.
[Figure: B’s reaction function shifts out from RB0 to RB1 while RA is unchanged; qA on
the horizontal axis (qA1 and qA0 marked) and qB on the vertical axis (qB0, $\tilde{q}_{B1}$,
and qB1 marked).]
As a result of player B’s increase in quantity, player A will reduce her quantity.
This is because her residual demand, which is based on how much B produces, will
fall. This decrease in qA will also increase B’s profits.
To better interpret these results, let’s first discuss some terminology. Within the
context of Cournot competition, we say that quantities are “strategic substitutes”;
that means that the more player A produces, the less player B will want to produce.
It’s basically a fancy way of saying that reaction functions slope down. In a strategic
sense, A’s quantity substitutes for B’s, and vice versa.
Knowing this, we see that the effect of a cost reduction in Cournot competition is
two‐fold. First, player B experiences an increase in profits from the direct effect of
having a lower marginal cost. This simply means that his cost per unit is lower so he
is able to produce a higher quantity. Secondly, player B earns higher profits due to
the strategic effect of having A lower her quantity as a result of player B’s increase
in quantity. Again, this goes back to the fact that their reaction functions are
negatively sloped. We see that the strategic effect moves in the same direction as the
direct effect, and both of them increase B’s profits. Thus, the advantage of a cost
reduction is magnified in Cournot competition.
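The two effects can be separated numerically. The sketch below assumes the linear demand p(Q) = 20 − 0.25Q from the earlier example, keeps A’s unit cost at 5, and lets B’s unit cost fall from 5 to 2 (a made-up number). The direct effect holds A’s quantity at its old level; the strategic effect then lets A respond. All function and variable names are illustrative.

```python
# Illustrative sketch: direct vs. strategic effect of a cost cut in Cournot duopoly.
# Assumes p(Q) = 20 - 0.25 Q; B's new cost of 2 is a hypothetical value.
INTERCEPT, SLOPE = 20.0, 0.25

def reaction(own_cost, rival_quantity):
    """Best-response quantity to the rival's quantity (never negative)."""
    return max(0.0, (INTERCEPT - own_cost) / (2 * SLOPE) - 0.5 * rival_quantity)

def profit(own_cost, own_quantity, rival_quantity):
    price = INTERCEPT - SLOPE * (own_quantity + rival_quantity)
    return (price - own_cost) * own_quantity

def equilibrium(cost_a, cost_b, rounds=200):
    """Iterate best responses until quantities settle at the Nash equilibrium."""
    q_a = q_b = 0.0
    for _ in range(rounds):
        q_a = reaction(cost_a, q_b)
        q_b = reaction(cost_b, q_a)
    return q_a, q_b

q_a0, q_b0 = equilibrium(5.0, 5.0)              # before the cost cut
profit_before = profit(5.0, q_b0, q_a0)

q_b_tilde = reaction(2.0, q_a0)                 # B re-optimizes, A held at q_a0
profit_direct = profit(2.0, q_b_tilde, q_a0)

q_a1, q_b1 = equilibrium(5.0, 2.0)              # A also responds
profit_after = profit(2.0, q_b1, q_a1)

print(f"direct effect:    {profit_direct - profit_before:6.2f}")
print(f"strategic effect: {profit_after - profit_direct:6.2f}")
```

Under these assumptions both pieces are positive, matching the claim that the strategic effect reinforces the direct effect.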
Another way a firm could increase profits is through advertising. The problem
with advertising is that it will benefit every firm; this is because in Cournot
competition, all products are identical. For example, if an orange juice producer
buys advertising for orange juice, the market demand for all orange juice will
increase, and since all products are the same, every orange juice producer will
benefit from the increase in demand. In fact, each firm will benefit relative to their
market share of the industry. This is why in homogenous industries, firms will
usually get together and form trade associations to advertise. This way, each firm is
contributing to the cost of advertising, since each firm benefits.
Suppose firm A increases advertising. We know this grows A’s demand, so for a
given level of B’s output, A will produce more. However, since the products are
identical, B’s level of demand will grow by approximately the same amount. The
graph below illustrates this point.
As you can see, both reaction functions shift out, since each firm wants to produce
more given the increase in demand. The direct effect of A’s increase in advertising is
that she experiences greater demand, and thus earns higher profits. However, the
strategic effect is that B will want to produce more due to him also experiencing
greater demand; this will cause A to earn lower profits. We see that in the case of
advertising with quantity competition and homogenous products, the direct effect is
diluted by the homogenous nature of the products and the direct effect and the
strategic effect work against each other.
[Figure: A ↑ advertising. Both reaction functions shift out, from RA0 to RA1 and from
RB0 to RB1; qA on the horizontal axis and qB on the vertical axis.]
In summary, for a firm in a homogenous product industry, capacity limits
become very important. For two firms without capacity limits, prices will be driven
down to cost, and neither firm will make a profit. Firms in a homogenous product
industry shouldn’t heavily pursue individual advertising, since the spillover effects
and the strategic effects decrease the benefits from advertising. Instead, firms
should try to get into trade associations when advertising to split up costs among all
the firms that will benefit from it. When a firm invests in technology to reduce
marginal cost, the benefits are magnified since the direct effect and strategic effect
move in the same direction. Thus, the most beneficial thing a firm can do in a
homogenous product industry is to cut cost more efficiently than the competition.
First Mover (Stackelberg) Quantity Competition
Up to now, we have assumed that player A picks her quantity given her “best
guess” about player B’s quantity, and player B picks his quantity given his “best
guess” about player A’s quantity. Let’s assume now that this is a sequential game,
and that player A picks her quantity first. Remember from our discussion on game
theory we said this means player A gets to basically dictate how the game will be
played, because she knows how player B will respond to the quantity she picks. This
does not necessarily mean A gains and B loses. Player B may be better off by the fact
that he gets to maximize his profit based on a known quantity for A. Each game is
different, and must be analyzed individually to see whether there is a first mover
advantage or a second mover advantage.
Since player A moves first, we know player B will set his quantity equal to
qB = RB (qA )
which intuitively says player B will maximize profit given player A’s quantity of
qA. Since player A knows this is how player B will respond, she wants to incorporate
this into her own profit function. Thus, her profit becomes
$$\pi_A = P(q_A + q_B)\,q_A - C(q_A)$$
$$\pi_A = P\big(q_A + R_B(q_A)\big)\,q_A - C(q_A)$$
where she has just substituted in B’s reaction function for qB. Maximizing profit:
$$\frac{d\pi_A}{dq_A} = P + \frac{dP}{dQ}\left(1 + \frac{dR_B}{dq_A}\right) q_A - MC_A$$
which is the same answer we got when the players were playing simultaneously
with the exception of the (dRB/dqA) term. What does this term signify? Intuitively,
it’s how much player B changes his quantity based on player A producing 1 more
unit. This is important to understand: we know the players’ reaction functions slope
downward. This means as player A produces more, player B produces less. Now that
player A moves first, she knows that after she sets her quantity, player B will set his
based on what she chose. So, if she increases her quantity by one, it will lower what
player B will produce by some amount; this amount is exactly what this new term
represents.
The reason this is important is because of the following. Say we are in a
simultaneous move game. If player A increases quantity by one, market quantity
goes up by one, and price will decrease by some amount. This is because player B
picks quantity at the same time player A does, and has no time to respond to the
increase in quantity. However, in this sequential move game, if player A increases
quantity by one, B moves second and as such will produce some amount less (due to
his negatively‐sloped reaction function). As a result, market quantity will still
increase, but not by as much due to B’s decrease in quantity. Thus, market price will
go down by less than it did earlier. Therefore, market price is less sensitive to player
A’s quantity in a sequential game. Because of this, player A will sell more when she
is the first mover than when the game is simultaneous.
Example
Let’s verify this using our specific example from earlier. Remember, constant
marginal cost is $5, and P(Q) = 20 − (Q/4) . We also know that RB = 30 − (qA /2) . So
$$\pi_A = \left(20 - \frac{q_A}{4} - \frac{30 - (q_A/2)}{4}\right) q_A - 5q_A$$
$$\pi_A = \left(20 - 7.5 - \frac{q_A}{4} + \frac{q_A}{8}\right) q_A - 5q_A$$
$$\pi_A = \left(12.5 - \frac{q_A}{8}\right) q_A - 5q_A$$
Maximizing:
$$MR_A = 12.5 - \frac{q_A}{4} = 5 = MC_A$$
$$7.5 = \frac{q_A}{4} \Rightarrow q_A = 30$$
Solving for B’s quantity:
$$R_B(30) = 30 - \frac{30}{2} = 15$$
Total market quantity is
$$Q = q_A + q_B = 30 + 15 = 45$$
and price is
$$P(Q) = P(45) = 20 - \frac{45}{4} = 8.75$$
We can see in the Cournot game each player produced 20; by moving first, A
produces more (30) which causes B to produce less (15), but not one for one, so
total industry output has increased from 40 to 45. Since total output has increased,
price decreased from 10 to 8.75. Looking at their profits we see
π A = 30(8.75 − 5) = 112.50 , π B = 15(8.75 − 5) = 56.25
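The first mover’s problem can also be solved by brute force, which makes the logic easy to see: for every quantity A might choose, work out how B would respond, and keep the quantity that gives A the most profit. The sketch below assumes the example’s demand, cost, and reaction function; the grid of candidate quantities (0 to 60 in steps of 0.5) and the function names are choices made for the illustration.

```python
# Illustrative sketch: Stackelberg leader's choice by grid search.
# Uses P(Q) = 20 - Q/4, unit cost 5, and B's reaction function R_B = 30 - q_A/2.

def follower_reaction(q_a):
    return max(0.0, 30 - 0.5 * q_a)

def price(total_quantity):
    return 20 - 0.25 * total_quantity

def leader_profit(q_a):
    """A's profit when B responds optimally to q_a."""
    q_b = follower_reaction(q_a)
    return (price(q_a + q_b) - 5) * q_a

candidates = [i * 0.5 for i in range(121)]      # 0, 0.5, ..., 60
q_a = max(candidates, key=leader_profit)
q_b = follower_reaction(q_a)
p = price(q_a + q_b)
print(q_a, q_b, p)                     # 30.0 15.0 8.75
print((p - 5) * q_a, (p - 5) * q_b)    # 112.5 56.25
```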
Long‐Run Equilibrium
Earlier, we said in the long‐run, if firms are making a profit in an industry, other
firms tend to enter, until each firm’s profits are 0 unless there are barriers to entry.
Remember, barriers to entry could be legal, or simply economies of scale.
Let’s now consider the long‐run implications of a homogenous product industry.
Consider the most efficient firm that has not entered into the market. Assume the
demand that is “left over” in the industry is dresidual, and it is the demand curve that
the firm faces. The firm has a long‐run average cost that is represented by the LRAC
curve. Since there are quantities along the residual demand curve that have a higher
price than the LRAC curve, the firm will enter, as it can make a profit. We said that in
equilibrium, firms don’t make a profit; the only way for this to be the case is if the
residual demand left for a firm is always below its LRAC curve. Thus, as firms enter,
the residual demand will shift down to d’residual. When a firm faces this demand, we
say that it is the marginal firm, in that it is just barely willing to enter. Of course,
there may be no firm that is exactly indifferent to entering. The last firm that actually
enters may make a profit, while if the next firm in line entered, it would make a loss.
[Figure: the LRAC curve with residual demand curves dresidual and d’residual; P on the
vertical axis and q on the horizontal axis.]
Example
Let’s look at an example using the marginal firm. Assume, as before, that
P = 20 − (Q/4) and the cost function for firm 4 is C4(q4) = 5q4 + F, where F is the fixed
cost. Now suppose that there are n = 4 firms in the industry, and that
q1 + q2 + q3 = 40 in the Nash equilibrium. Will entry occur? If there is profit left in the
industry, more firms will enter; so, to find out whether this is the case, we need to
find out firm 4’s profit.
$$\pi_4 = \left(20 - \frac{40}{4} - \frac{q_4}{4}\right) q_4 - 5q_4 - F$$
$$\pi_4 = \left(10 - \frac{q_4}{4}\right) q_4 - 5q_4 - F$$
$$MR_4 = 10 - \frac{q_4}{2} = 5 = MC_4$$
q4 = 10 and Q = 40 + 10 = 50
so price is
$$P(Q) = P(50) = 20 - \frac{50}{4} = 7.50$$
which means firm 4’s profit is
$$\pi_4 = (7.50 - 5)10 - F = 25 - F$$
So if F > 25, firm 4 will exit, and there are too many firms in the industry. If F <
25, firm 4 is making a profit, which means entry may occur. Entry won’t necessarily
occur because there may only be room for four firms. To figure out if the fifth firm
will enter, simply do the above calculation for that firm and see if it would make a
profit. If F is exactly 25, firm 4 is the “marginal” firm and makes 0 profit exactly.
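The entry decision is easy to put in code. The sketch below takes the other firms’ total output as given (40, as in the example), has the potential entrant set MR = MC on its residual demand, and reports the resulting profit for a few fixed costs. The function name and the three values of F are hypothetical choices for the illustration.

```python
# Illustrative sketch: a potential entrant's best response and profit,
# using P = 20 - Q/4 and cost 5q + F, with rivals' total output held fixed.

def entrant_outcome(rivals_output, fixed_cost, a=20.0, b=0.25, c=5.0):
    """Return (quantity, price, profit) for a firm facing the residual demand."""
    quantity = max(0.0, (a - b * rivals_output - c) / (2 * b))  # MR = MC
    price = a - b * (rivals_output + quantity)
    profit = (price - c) * quantity - fixed_cost
    return quantity, price, profit

for fixed_cost in (20, 25, 30):
    quantity, price, profit = entrant_outcome(rivals_output=40, fixed_cost=fixed_cost)
    if profit > 0:
        verdict = "profit"
    elif profit == 0:
        verdict = "marginal firm"
    else:
        verdict = "loss"
    print(f"F = {fixed_cost}: q = {quantity}, P = {price}, profit = {profit} ({verdict})")
```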
Now let’s generalize long‐run oligopoly theory. We know in the long‐run that in a
given industry no excess profit will be available; if there were, more firms would
have entered. Suppose it takes n firms to eliminate all profit potential for additional
firms. Firm (n+1) will not enter because it would make a loss if it did so. Firm n is
the marginal firm if the residual demand left over for firm (n+1) is never above its
long‐run average cost curve. This is illustrated below.
The demand that firm n+1 faces is the dresidual,n+1 line, and its average cost curve is
LRACn+1 curve. Notice that the price that customers are willing to pay (given by
dresidual,n+1) is never high enough to cover the firm’s costs (given by LRACn+1), which
is consistent with long‐run theory. If the D line gives the demand for the entire
market, the horizontal distance shown by the dotted line represents the total
output of the first n firms. This total output is the Nash equilibrium quantity with n
firms entering and all other firms not entering.
[Figure: market demand D, residual demand dresidual,n+1, and LRACn+1; p on the vertical
axis and q on the horizontal axis; the dotted horizontal distance is labeled Total
Output.]
Now let’s look at the general mathematics of our long‐run model. Let i be a firm
in the industry, and let there be i=1, 2, …, n firms with i=1 being the most efficient,
i=2 being the second‐most efficient, etc. and i=n being the least efficient firm that is
still in the industry. The profit for a random firm j is
$$\pi_j = p\Big(\sum_{i=1}^{n} q_i\Big)\,q_j - C_j(q_j).$$
To clarify, ∑qi is simply the total industry output Q. Then, if firm j is in the
market, we know
MRj = MCj (for all firms in the market)
since they are maximizing profit. We also know that for firm n, which is the least
efficient firm that is in the market,
πn ≈ 0
since it is the marginal firm. Another way of saying this is that the marginal firm
barely breaks even. How precisely this holds, however, also depends on the degree
of economies of scale and the details of input markets and firm structure. If
economies of scale are significant, and say only two firms can enter before a third
firm drives profits below 0, the two firms may make a profit greater than 0; if
economies of scale are small, though, and there are 25 firms, firm n’s profit will be
close to 0.
Finally, since we know that firm n is making approximately 0 profit, this last
equation can be stated as p ≈ LRACn . The two conditions that define long run
equilibrium in an industry are then:
1) MRj = MCj (for all n firms in the market)
and
2) p ≈ LRACn .
Another important observation follows. We know MRj = MCj is the same as
$$p + \frac{dp}{dQ}\,Q s_j = MC_j$$
and as s (the market share) goes to 0 (i.e. the number of firms in the industry
increases), p approaches MC. Since p ≈ LRACn in long‐run equilibrium, that must
mean MC approaches LRAC for the marginal firm as more and more firms fit in the
industry. In other words, as economies of scale become smaller, the industry
equilibrium becomes closer and closer to one in which firms are producing and
pricing at minimum LRAC.
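To see price approach marginal cost as market shares shrink, consider a minimal sketch with n identical firms. It assumes the linear demand p(Q) = 20 − 0.25Q and unit cost 5 from the earlier examples and ignores fixed costs, which is a simplification made here for illustration only. With identical firms, the condition p + (dp/dQ)q = MC gives each firm’s quantity as (a − c)/(b(n + 1)).

```python
# Illustrative sketch: symmetric n-firm Cournot with p(Q) = a - b*Q, unit cost c,
# and no fixed costs (a simplifying assumption for this example).

def symmetric_cournot(n, a=20.0, b=0.25, c=5.0):
    """Return (quantity per firm, price, market share) with n identical firms."""
    q_each = (a - c) / (b * (n + 1))   # from a - b*n*q - b*q = c
    price = a - b * n * q_each
    return q_each, price, 1.0 / n

for n in (1, 2, 5, 25, 1000):
    q_each, price, share = symmetric_cournot(n)
    print(f"n = {n:4d}   share = {share:6.3f}   price = {price:6.3f}   price - MC = {price - 5:6.3f}")
```

With one firm this reproduces the monopoly price of 12.50, with two firms the Cournot price of 10, and the markup over marginal cost keeps shrinking as n grows.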
Chapter 15 Terminology
The following is a list of terms that you should know in order to discuss and
apply the material from this chapter.
Oligopoly A type of firm structure with at least two firms, each having significant
market share. Since there is a relatively small number of large players, the managerial
decisions that each firm makes have an impact on the demand that other firms
experience.
Monopoly A type of firm structure with a single firm that faces no competition and
has complete autonomous control over setting prices and production levels.
Perfect Competition A type of firm structure with many firms where each
individual firm has no control over price levels; they are price takers. These firms
have no strategic decisions to make, since they take their price from the market.
Nothing they do will have any impact on the other firms in the market.
Barriers to Entry Obstacles faced by potential entrants to a market. These could
include legal barriers, such as regulations or patents, or economies of scale that allow
the current firm to produce a given output more cheaply than two firms
could.
Homogenous Products Products that are identical, making price competition
fierce.
Homogenous Product Bertrand Competition A model that shows the fierceness
of price competition. It represents two firms competing in price, producing
homogenous products, and sharing the same marginal cost in which the price they
charge will be driven down to the competitive level (at cost) and no profit will be
made. This model assumes unlimited capacity.
Durable Good A good that lasts a long time. A monopolist selling a durable good
today would be in competition with himself tomorrow, and so on, until price is
driven down to cost.
Cournot Competition A model that represents a homogenous product oligopoly in
which firms have capacity limits and move simultaneously. The firms in this model
use reaction functions to make decisions on how much to produce.
Strategic Substitutes Within Cournot competition, these are the quantities
produced by the firms. The more one player produces, the less the other player will
produce. This means that the firms’ reaction functions slope down because one
firm’s reaction function is inversely related to the quantity produced by the other
firm and vice versa.
Direct Effect The benefits experienced by a firm that tries to get ahead of its
competition by either reducing its marginal cost or increasing its advertising
expenditure.
Strategic Effect The counteracting or magnifying effect caused by a competitor’s
reaction to a firm’s decision to reduce marginal cost or increase advertising
expenditures.
Stackelberg Competition A sequential move model that represents two firms
competing on quantity in which one firm is the leader and the other is the follower.
Residual Demand Demand left over for the most efficient firm not yet in the
industry. If this is above its long run average cost curve, the firm will enter. If it is
below LRAC, it will stay out.
Marginal Firm The least efficient firm that is still in the market; it is just barely
willing to enter and roughly breaks even. The residual demand left over for the next
potential entrant never rises above that firm’s LRAC curve.
Chapter 16
Differentiated Product Markets
In the previous chapter, we assumed products were completely identical, and
customers simply bought from the cheapest firm. In the real world, however,
perfectly homogeneous goods rarely exist. Even if two products are identical, their
location can itself be an aspect of differentiation; consider buying gasoline, and how
you’d rather buy from a convenient location all other things equal. The fact that
products are differentiated means, ultimately, that the effects of price competition
are dampened; that is, undercutting an opponent won’t take away all of his demand,
since there are certain customers that strictly prefer his product. Let’s look at a
model for products that are identical except for their location; this is called spatial
differentiation.
Imagine a city with only one street, with one firm located at each end. There are
N customers uniformly distributed along the street, and the length of the street is 1
unit (miles, blocks, kilometers, whatever). Let 0 be the location of firm A, and,
similarly, let 1 be the location of firm B. Each customer is located at some point x
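between 0 and 1. This city is represented in the figure below.
[Figure: the street from firm A at 0 to firm B at 1, with a customer at x; dollars ($)
on the vertical axis; net value lines v − tx and v − t(1 − x).]
v : willingness to pay
t : transportation cost/unit of distance
x : location of the customer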
Each consumer’s reservation value for the product is v. Consumers incur a
(round trip) transportation cost of t per unit of distance traveled. If a consumer
located at point x (somewhere between 0 and 1) buys from firm A at location 0, they
have to travel a one‐way distance of x to make the purchase. Thus, the value they
receive net of transportation cost is v − tx. If they buy from firm B, they have to travel
1 − x units of distance, and the net value they receive is v − t(1 − x). The net values
from purchasing from each firm as a function of location (x) are also shown in the figure
above. These net values (reservation price less transportation costs) represent the
maximum amount a consumer located at x would be willing to pay for each firm’s
product. Willingness to pay declines with distance, due to transportation costs.
Consumer surplus is the difference between a consumer’s willingness to pay and what
the consumer actually pays. For a consumer located at x, the surpluses from buying
from A or B respectively are:
SA = v − tx − pA and
SB = v − t(1 − x) − pB.
Each consumer will buy from whichever firm offers them the highest surplus.
Assuming the firms’ prices are such that some consumers buy from each, a consumer
at some location, x, will be indifferent between the firms. To find this location, x,
just equate the surpluses and solve.
$$S_A = S_B$$
$$tx + p_A = t - tx + p_B$$
$$2tx = t - p_A + p_B$$
$$x = \frac{t + p_B - p_A}{2t}$$
This location, x, defines the point of indifference. Everyone located to the left of x
(closer to A) will buy from A and everyone to the right (closer to B) will buy from B.
Since consumers are uniformly distributed and the street has length 1, x is also the
fraction of the N consumers that buy from A, and (1 − x) is the fraction of the N
consumers that buy from B. Thus, demands
are:
$$q_A = Nx = N\left(\frac{1}{2} + \frac{p_B - p_A}{2t}\right) \qquad (16.1)$$
and
$$q_B = N(1 - x) = N\left(\frac{1}{2} + \frac{p_A - p_B}{2t}\right). \qquad (16.2)$$
Looking at the demand functions, we can make some conceptual connections.
First, notice if pA=pB, qA and qB are N(1/2). This should make sense – if the only
difference between firm A’s product and firm B’s product is the location, and they
charge the same price, each firm will get the half of the market that is closest to
them.
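The demand functions (16.1) and (16.2) can be sketched in a few lines of Python. This is only an illustration: the function names and the numbers (t, N, and the prices) are made up for the example, and clamping x to the interval from 0 to 1 is an added assumption to handle prices so far apart that one firm serves everyone, a case the derivation above rules out.

```python
def indifferent_location(p_a, p_b, t):
    """Location x of the consumer indifferent between A (at 0) and B (at 1)."""
    x = (t + p_b - p_a) / (2 * t)
    # Assumption beyond the text: clamp to [0, 1] so that one firm simply
    # serves the whole line when the price gap is larger than t.
    return min(1.0, max(0.0, x))


def demands(p_a, p_b, t, n):
    """Quantities (q_A, q_B) from equations (16.1) and (16.2)."""
    x = indifferent_location(p_a, p_b, t)
    return n * x, n * (1 - x)


if __name__ == "__main__":
    # Equal prices split the market evenly.
    print(demands(p_a=5.0, p_b=5.0, t=2.0, n=100))
    # A undercuts B by 1; the smaller t is, the more customers A gains.
    print(demands(p_a=4.0, p_b=5.0, t=2.0, n=100))
    print(demands(p_a=4.0, p_b=5.0, t=1.0, n=100))
```

With equal prices each firm gets half of the N consumers, and the same price cut by A wins more customers when t is smaller.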
Earlier we defined t to be the “transportation costs per unit of distance,” and we
said that this is the only thing that differentiates the products. Now, however, we
can think of geographic distance as a metaphor for any characteristic that
differentiates the products, and we can think of t as a metaphor for how important
that product differentiation is. The higher t is, the more important differentiation
is, since a customer loses more surplus the “further” away they are from their ideal
product.
Looking back at our formula we see that this value of t influences the impact of
the difference in prices (pB – pA). If t is very small, the effect on quantity of the
difference in prices will be magnified (since you will be dividing by a small number).
This should make sense, since the smaller t is, the more homogenous the products
are (the less the difference matters to consumers), and thus the effects of price
competition are drastic, as they were when we first talked about price competition
with homogenous products. If t is very large, the effect on quantity of the difference
in prices will be small. A larger t symbolizes more emphasis on differentiation, and
as we just showed, undercutting will take less of the market away from your
competitor if non‐price differentiation is more important.
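The same point can be read off equation (16.1) by differentiating A’s demand with respect to A’s own price. This is a short worked step, valid as long as both firms sell positive quantities:

$$\frac{\partial q_A}{\partial p_A} = -\frac{N}{2t}.$$

As an illustration with made-up numbers, if N = 100, a one-unit price cut wins A 50 customers when t = 1 but only 5 customers when t = 10.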
Now let’s model this as a simultaneous play game where firms compete on prices
and solve for the Nash equilibrium. This is known as differentiated product
Bertrand competition. Assume both firms have an identical constant cost per unit of
c. Then:
$$\pi_A = (p_A - c)\,N\left(\frac{1}{2} + \frac{p_B - p_A}{2t}\right). \tag{16.3}$$
Maximizing:
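$$\frac{d\pi_A}{dp_A} = N\left[\frac{1}{2} + \frac{p_B - p_A}{2t} + \frac{-1}{2t}(p_A - c)\right] = 0 \quad \text{(by the product rule)}$$
$$\frac{d\pi_A}{dp_A} = N\left[\frac{1}{2} + \frac{p_B}{2t} - \frac{p_A}{2t} - \frac{p_A}{2t} + \frac{c}{2t}\right] = 0 \quad \text{(simplifying)}$$
$$t + c + p_B - 2p_A = 0 \quad \text{(multiplying both sides by } 2t/N\text{)}$$
$$p_A = R_A(p_B) = \frac{c + t}{2} + \frac{p_B}{2} \quad \text{(solving for } p_A\text{)}$$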
So maximizing firm A’s profit with respect to price (since now price is what the
firms are choosing, not quantity) we get a reaction function for A’s best price given a
“best guess” about what price firm B will charge. Just as in Cournot competition, we
can graph this reaction function, but instead of modeling responses to estimates of
quantity, we will be modeling responses to estimates of price.
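[Figure: reaction functions RA and RB, with pA on the horizontal axis and pB on the vertical axis; (c+t)/2 is marked on each axis.]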
The stark difference between reaction functions in differentiated product price
competition compared to quantity competition is that they slope up. This is because
as player A raises her price, player B will want to raise his as well; remember in
Cournot (quantity) competition, as player A raised her quantity, player B wanted to
lower his quantity.
The Nash equilibrium is simply where the reaction functions cross, just as it was
in Cournot competition. We have player A’s reaction function, and since this game is
symmetric, we know B’s reaction function would mirror A’s. But, we can just use
symmetry to solve, since in equilibrium pA=pB. Thus:
$$\begin{aligned} p_A &= \frac{c + t}{2} + \frac{p_A}{2} \\ \frac{p_A}{2} &= \frac{c + t}{2} \\ p_A^{NE} &= p_B^{NE} = c + t \end{aligned} \tag{16.4}$$
Note that the players’ prices are the same only because the game is symmetric.
Below, the solution is added to the figure.
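[Figure: reaction functions RA and RB, with pA on the horizontal axis and pB on the vertical axis; (c+t)/2 and the Nash equilibrium price c+t are marked on each axis.]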
We can now see that as t → 0, pNE → c. Since we said t can be seen as a
metaphor for any characteristic that differentiates player A’s product from player
B’s, this should make sense. When t is 0, this (metaphorically) means there is no
difference between the two products; thus, price will approach the constant per unit
cost, just as it did when we were examining price competition in homogenous
product industries. A conclusion is that price becomes higher than cost if (a) we
have capacity limits, like we saw in Cournot competition, or (b) we have
differentiation, like we’ve seen in Bertrand competition.
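One way to see the equilibrium price c + t emerge, and to watch it fall toward c as t shrinks, is to let the two firms repeatedly best-respond to each other using the reaction function derived above. The sketch below is illustrative only: the starting price, the number of rounds, and the cost c = 2 are arbitrary choices, and t = 0 is included as the limiting case of the reaction function rather than of the demand formula (which divides by t).

```python
def best_response(p_other, c, t):
    """Reaction function R(p_other) = (c + t)/2 + p_other/2."""
    return (c + t) / 2 + p_other / 2


def nash_price(c, t, start=0.0, rounds=200):
    """Iterate both firms' best responses until prices settle."""
    p_a = p_b = start
    for _ in range(rounds):
        # Both firms respond simultaneously to last round's rival price.
        p_a, p_b = best_response(p_b, c, t), best_response(p_a, c, t)
    return p_a, p_b


if __name__ == "__main__":
    for t in (4.0, 1.0, 0.1, 0.0):
        p_a, p_b = nash_price(c=2.0, t=t)
        print(f"t = {t}: p_A = {p_a:.4f}, p_B = {p_b:.4f}")
```

Because each reaction function has slope 1/2, the gap between the current price and c + t halves every round, so the iteration settles on the Nash equilibrium from any starting point.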
To generalize this concept of differentiation, let’s look at what’s true in general.
Assume that MCA=cA and MCB=cB. We know that how much A sells depends on how
much A charges, but also on how much B charges. Thus, qA = qA(pA, pB), and
$$\pi_A = (p_A - c_A)\,q_A(p_A, p_B).$$
Maximizing:
$$\frac{d\pi_A}{dp_A} = q_A + (p_A - c_A)\frac{dq_A}{dp_A} = 0. \quad \text{(using the product rule)}$$
Notice when A maximizes her profit, B’s price (potentially) shows up in two
places; since we know qA is a function of both pA and pB, both A’s and B’s prices
influence the first qA term, as well as the (dqA/dpA) term. Thus, when you solve this
equation for A’s price, you will get a reaction function that depends on B’s price, or
RA(pB). This is true for B’s price as well, in the sense that it will be a function of
A’s price, or RB(pA). The graph of their reaction functions will be upward sloping, as
they were in the last example, since when competing on price, firm A will want to
respond to an increase in B’s price by increasing her own price, at least as long as
prices are below the monopoly level (which they will be, due to competition).
[Figure: upward-sloping reaction functions RA and RB, with pA on the horizontal axis and pB on the vertical axis.]
Example
Assume
$$q_A = 10 - p_A + 0.5p_B, \qquad q_B = 18 - 2p_B + 0.5p_A$$
and that cA = 2 and cB = 1.
Solving for A’s reaction function
$$\begin{aligned} \pi_A &= (10 - p_A + 0.5p_B)(p_A - 2) \\ \frac{d\pi_A}{dp_A} &= 10 - p_A + 0.5p_B - p_A + 2 = 0 \quad \text{(product rule)} \\ 2p_A &= 12 + 0.5p_B \\ p_A &= 6 + 0.25p_B \end{aligned}$$
and solving for B’s reaction function
$$\begin{aligned} \pi_B &= (18 - 2p_B + 0.5p_A)(p_B - 1) \\ \frac{d\pi_B}{dp_B} &= 18 - 2p_B + 0.5p_A - 2p_B + 2 = 0 \quad \text{(product rule)} \\ 4p_B &= 20 + 0.5p_A \\ p_B &= 5 + 0.125p_A \end{aligned}$$
To solve for the Nash equilibrium, find where the reaction functions intersect:
$$p_A = 6 + 0.25p_B \quad \text{and} \quad p_B = 5 + 0.125p_A$$
$$p_A = 6 + 0.25\,(5 + 0.125p_A)$$
$$p_A = 7.25 + 0.03125p_A$$
$$p_A = 7.48 \quad \text{so} \quad p_B = 5 + 0.125(7.48) = 5.94$$
and these are the players’ respective Nash equilibrium prices.
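The substitution above can be checked with a short script. This is a minimal sketch that hard-codes the two reaction functions from the example; the function names are chosen here for illustration.

```python
def reaction_a(p_b):
    """Firm A's reaction function from the example: p_A = 6 + 0.25 p_B."""
    return 6 + 0.25 * p_b


def reaction_b(p_a):
    """Firm B's reaction function from the example: p_B = 5 + 0.125 p_A."""
    return 5 + 0.125 * p_a


def solve_nash():
    """Substitute R_B into R_A and solve the resulting linear equation."""
    # p_A = 6 + 0.25 * (5 + 0.125 * p_A)  =>  p_A * (1 - 0.03125) = 7.25
    p_a = 7.25 / (1 - 0.25 * 0.125)
    p_b = reaction_b(p_a)
    return p_a, p_b


if __name__ == "__main__":
    p_a, p_b = solve_nash()
    print(round(p_a, 2), round(p_b, 2))  # 7.48 5.94
    # At the equilibrium each price is a best response to the other.
    print(abs(p_a - reaction_a(p_b)) < 1e-12)
```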
Price (Bertrand) Competition with a First Mover
First, it’s important to understand that the first mover in this game will do at
least as well for himself as in the Nash equilibrium of the simultaneous game. The
reason is that the simultaneous game had a pure strategy equilibrium where each player played
their best response to the other. If the first mover sets the same price when they go
first that they would have set moving at the same time, the second mover will
respond in the same way. So, equilibrium prices and payoffs would be the same. The
first mover has additional control over how the game will be played, since he is
setting up the stage for the second mover. Knowing how the second mover will
respond, the first mover would never set a price that, after the second mover moves,
would give him less of a profit than he could achieve by choosing the same price as
chosen in the simultaneous move game.
Relative to the original Nash equilibrium price, what will the first mover do?
Remember, in the simultaneous move Nash equilibrium, both players are on their
reaction functions. That means, the net marginal benefit, NMB, of a change in price
is 0 for both players. (Setting NMB equal to 0 defines the reaction function with
continuous strategies.) So, holding the second mover’s price fixed, a slight increase
or decrease in the first mover’s price has no first‐order effect on their own profit.
However, an increase in the second mover’s
price would boost the first mover’s profit. Since the first mover knows the reaction
functions slope up, they can get the second mover to charge a higher price by
increasing their own price. This boosts the first mover’s profits.
Since the first mover chooses to increase price, when they did not have to, their
profits will be higher, due to the induced increase in the second mover’s price, as
compared to the simultaneous move game. The second mover, however, benefits
from the increase in the first mover’s price AND gets to choose their price optimally
in response. This gives them an opportunity to raise their price somewhat but to
also undercut the first mover relative to the simultaneous move game. Thus, the
second mover profits relatively more than the first mover. Therefore, when there is
a first mover, both firms obtain higher profits, but both firms would rather be the
second mover than the first.
In general, if player B is the first mover, his profit will look like the following:
$$\pi_B = (p_B - c_B)\,q_B\big(p_B, R_A(p_B)\big).$$
We substituted RA(pB) for firm A’s price because, since B moves first, B knows A
will respond according to her reaction function. Thus, when choosing his price,
the first mover knows that the second mover’s price is dictated by the first mover’s
choice. Maximizing B’s profit:
$$\frac{d\pi_B}{dp_B} = q_B + (p_B - c_B)\left[\frac{dq_B}{dp_B} + \frac{dq_B}{dp_A}\frac{dp_A}{dp_B}\right].$$
Compared to the simultaneous game, the second term in brackets,
(dqB/dpA)(dpA/dpB), is new. Remember, B is moving first. The new term says that
player B’s price influences player A’s price through the reaction function (dpA/dpB),
and that player A’s price influences player B’s quantity (dqB/dpA). Intuitively, this
term is saying that as player B moves first, he knows what price he charges will
affect what price A charges, which in turn will affect what quantity B sells.
Example
From our earlier example, qB = 18 − 2pB + 0.5pA and cB = 1. We also found player
A’s reaction function to be RA = 6 + 0.25pB. Find the equilibrium prices assuming B
moves first.
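Substituting A’s reaction function for pA in B’s profit gives
$$\pi_B = (p_B - 1)\big(18 - 2p_B + 0.5\,(6 + 0.25p_B)\big).$$
Maximizing:
$$\frac{d\pi_B}{dp_B} = 18 - 2p_B + 3 + \frac{p_B}{8} + \left(-2 + \frac{1}{8}\right)(p_B - 1) = 0 \quad \text{(using the product rule)}$$
$$21 - \frac{15}{8}p_B - \frac{15}{8}p_B + \frac{15}{8} = 0$$
$$p_B = 6.10 > 5.94$$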
Player B indeed raises his price from 5.94 to 6.10 when he is the first mover. To find
player A’s price, use her reaction function:
$$R_A = 6 + 6.10/4 = 7.53 > 7.48.$$
Player A raises her price as well. But, it took a relatively large increase in B’s price to
induce a relatively small change in A’s price. If we were to plug these prices back
into the players’ profit functions, we would see that both players’ profits increased,
but, A’s increased relatively more. The reader should verify that for themselves.
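The verification can be sketched in Python. The script below uses only the demand functions, costs, and reaction function from the example; the helper names are chosen here for illustration, and the leader’s price is computed from the first-order condition of (pB − 1)(21 − 1.875pB), which is B’s profit after substituting A’s reaction function.

```python
def q_a(p_a, p_b):
    """Firm A's demand from the example."""
    return 10 - p_a + 0.5 * p_b


def q_b(p_a, p_b):
    """Firm B's demand from the example."""
    return 18 - 2 * p_b + 0.5 * p_a


def profits(p_a, p_b, c_a=2.0, c_b=1.0):
    """Return (profit_A, profit_B) at the given prices."""
    return (p_a - c_a) * q_a(p_a, p_b), (p_b - c_b) * q_b(p_a, p_b)


def simultaneous_prices():
    """Nash equilibrium of the simultaneous-move game."""
    p_a = 7.25 / (1 - 0.25 * 0.125)
    return p_a, 5 + 0.125 * p_a


def leader_prices():
    """Prices when B moves first and A follows her reaction function."""
    # d/dp_B of (p_B - 1)(21 - 1.875 p_B) = 22.875 - 3.75 p_B = 0
    p_b = 22.875 / 3.75
    return 6 + 0.25 * p_b, p_b


if __name__ == "__main__":
    sim = profits(*simultaneous_prices())
    seq = profits(*leader_prices())
    print("simultaneous:", [round(x, 2) for x in sim])
    print("B moves first:", [round(x, 2) for x in seq])
    print("gains (A, B):", round(seq[0] - sim[0], 3), round(seq[1] - sim[1], 3))
```

Under these numbers A’s profit rises from about 30.07 to about 30.53, while B’s rises from about 48.72 to about 48.77, so the second mover (A) gains more.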
Incentives for Cost Reduction
Let’s see how a cost reduction by one player affects price competition in a
differentiated product market. Recall that players competing in Bertrand
(differentiated) markets face upward sloping reaction functions. Let’s see what
happens if player A invests in some technology to reduce her marginal cost. The
changes are depicted in the two figures below.
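[Figure, two panels. Left: firm A’s demand (DA, DA’), marginal revenue (MRA, MRA’) and marginal cost (MCA, MCA’) curves against qA, with prices pA0ne, pA’ and pA1ne marked. Right: reaction functions RA, RA’ and RB, with pA on the horizontal axis and pB on the vertical axis, and prices pA0ne, pA’, pA1ne, pB0ne and pB1ne marked.]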
Initially, from the left panel of the figure we see player A wants to charge a lower
price (from pA0ne to pA’) since her MR equals her MC at a higher quantity. Since
player A now wants to charge a lower price regardless of what price B charges, her
reaction function RA will shift left (remember, A’s price is on the horizontal axis, so
for her price to decrease, the reaction function must shift left). If player B were not
to respond to this new reaction function, and were to continue charging pB0ne, player
A would be content charging pA’. However, since player B is rational, he will react to
A’s lower price by charging a lower price himself, moving along his reaction
function. Because of this strategic effect, player A will in turn lower her price even
more. The new Nash equilibrium prices are pA1ne and pB1ne.
There are a few things going on here. The direct effect of player A’s cost
reduction is a decrease in A’s price and an increase in A’s profits. Since player A
lowered her price, player B loses some market share; as a result, he will lower his
price to get some back (just as his reaction function tells us). This strategic effect
will take back some of B’s lost market share from A, decreasing A’s profits. The fact
that the strategic effect moves in the opposite direction of the direct effect in price
competition dampens the benefits from a cost reduction. Recall that these effects
moved together when firms were competing in quantity (Cournot competition).
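To put rough numbers on the direct and strategic effects, the sketch below reuses the demand functions from the earlier example and supposes, purely for illustration, that A’s marginal cost falls from 2 to 1. The reaction functions are written with cost as a parameter (obtained from the same first-order conditions as in the example), and the equilibrium is found by iterating best responses; none of these specific numbers for the cost reduction appear in the text.

```python
def reaction_a(p_b, c_a):
    """A's best price: maximizes (p_A - c_A)(10 - p_A + 0.5 p_B)."""
    return (10 + c_a + 0.5 * p_b) / 2


def reaction_b(p_a, c_b=1.0):
    """B's best price: maximizes (p_B - c_B)(18 - 2 p_B + 0.5 p_A)."""
    return (18 + 2 * c_b + 0.5 * p_a) / 4


def nash(c_a, c_b=1.0, rounds=200):
    """Iterate best responses until prices settle at the Nash equilibrium."""
    p_a = p_b = 0.0
    for _ in range(rounds):
        p_a, p_b = reaction_a(p_b, c_a), reaction_b(p_a, c_b)
    return p_a, p_b


def profit_a(p_a, p_b, c_a):
    return (p_a - c_a) * (10 - p_a + 0.5 * p_b)


if __name__ == "__main__":
    p_a0, p_b0 = nash(c_a=2.0)         # before the cost reduction
    p_a_direct = reaction_a(p_b0, 1.0)  # A's new price if B did not respond
    p_a1, p_b1 = nash(c_a=1.0)         # new Nash equilibrium
    print(f"before:       p_A = {p_a0:.3f}, p_B = {p_b0:.3f}")
    print(f"direct only:  p_A = {p_a_direct:.3f}, "
          f"profit_A = {profit_a(p_a_direct, p_b0, 1.0):.3f}")
    print(f"new equil.:   p_A = {p_a1:.3f}, p_B = {p_b1:.3f}, "
          f"profit_A = {profit_a(p_a1, p_b1, 1.0):.3f}")
```

In this illustration A’s price falls in two steps, B’s price falls as well, and A’s profit at the new equilibrium is lower than it would be if B had not responded, which is the dampening described above.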
Thus, in Cournot competition, firms wanted to move first to get a bigger share of
the market and had greater incentives to invest in cost reductions, as the strategic
effect enhanced the direct effect. In Bertrand competition, firms want to move
second to raise price while undercutting their competitor, and they have less
incentive to engage in cost reduction as the strategic effect lessens the gains from
the direct effect. Now let’s consider advertising.
Advertising
When firms were competing in a homogenous industry, advertising only made
sense if all of the firms were involved in the cost. This is due to the spillover effects
of advertising for identical products. In differentiated markets, however, firms have
a greater incentive to advertise since spillover effects are lessened.
We can classify two different types of advertising as follows. First is informative
advertising. Imagine the following is the distribution of potential customers for a
certain type of product. How far along the line you are represents your preference
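for a certain characteristic (location to product, sweetness of cola, etc.).
[Figure: distribution of potential customers along the line, with firm B on the left and firm A on the right, separated by a dotted line.]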
Imagine firm A and B are located as above. We know customers who are “closer”
to either firm will buy from that firm assuming equal prices; thus, customers on the
right side of the dotted line will buy from firm A, and those to the left will buy from
firm B if prices are equivalent. However, this is assuming that all the customers are
perfectly informed about where firm A is and where firm B is. If customers weren’t
aware of firm B’s location, it’s possible those to the left of the dotted line would buy
from A, although they’d prefer to buy from B if they knew it existed. Thus, firm B
may invest in informative advertising to simply inform the potential customers that
he exists. This kind of advertising is good for society since it simply informs
customers. It is also good for managers because it creates awareness about their
product.
Next is persuasive advertising. Assume we have the same picture as above. If
firm A were to invest in persuasive advertising, they would be attempting to shift
consumer preferences to be more in favor of A’s product as opposed to B’s product.
Graphically, this would look like a shift in the curve of potential consumers, shown
below.
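[Figure: the distribution of potential customers after it has shifted toward firm A; firm B is on the left and firm A on the right of the dotted line.]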
Now, the group of customers that are closer to A (still to the right of the dotted
line) is much larger than the group closer to B. This advertising may or may not be
beneficial to society. If both firms invest in persuasive advertising and end up
canceling out each other’s effects, the money has just been wasted. From a
manager’s perspective, though, it becomes a question of maximizing profit.
If we were to add advertising to the firm’s profit function, it would become
$$\pi = p\,q(p, A) - C\big(q(p, A)\big) - A$$
which means that the quantity the firm sells now depends both on the price it
charges and on its amount of advertising. So, since there are two variables under
the firm’s control, maximizing profit requires setting two partial derivatives to zero:
$$\frac{\partial \pi}{\partial p} = 0 \quad \text{and} \quad \frac{\partial \pi}{\partial A} = p\,\frac{\partial q}{\partial A} - \frac{dC}{dq}\frac{\partial q}{\partial A} - 1 = 0.$$
This will give us two equations with two unknowns (price and advertising), so
we can solve for each variable. Looking at the second condition, we know the
derivative of cost with respect to quantity (dC/dq) is marginal cost, so rewriting the
equation we get
$$(p - MC)\frac{\partial q}{\partial A} = 1.$$
This says the marginal benefit of advertising should equal its marginal cost. The
marginal benefit of advertising is marginal profit per unit sold (p‐MC) times how
many more units you sell for one more unit of advertising (∂q/∂A). The cost of
spending another dollar on advertising is 1.
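For completeness, the first condition can be expanded in the same way. This is a sketch that simply applies the product and chain rules to the profit function with advertising; the expansion is not written out in the text:

$$\frac{\partial \pi}{\partial p} = q + p\,\frac{\partial q}{\partial p} - \frac{dC}{dq}\frac{\partial q}{\partial p} = q + (p - MC)\frac{\partial q}{\partial p} = 0.$$

This has the same form as the pricing condition without advertising, except that q and ∂q/∂p now depend on the level of advertising chosen.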
A numerical example will clarify: suppose price is 4 and MC is 1, so marginal
profit per unit is 3. If one unit of advertising increases sales by 0.5 units, then
$$(4 - 1) \cdot 0.5 = 3 \cdot 0.5 = 1.5 > 1.$$
Since the additional profit generated by one more unit of advertising (1.5) is greater
than the cost of one more unit of advertising (1), you should spend more on advertising.
You should keep adding advertising until the marginal benefit of advertising equals
the marginal cost of advertising, or MBA = MCA.
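The rule can be illustrated with a small script. Everything specific here is a hypothetical assumption made for the sketch: the sales response q(A) = 2√A is invented (it was chosen so that one more unit of advertising adds 0.5 units of sales at A = 4, matching the numbers above), and price and marginal cost are held fixed at 4 and 1 rather than chosen jointly with advertising.

```python
import math


def marginal_benefit(price, mc, dq_dadv):
    """Extra profit from one more unit of advertising: (p - MC) * dq/dA."""
    return (price - mc) * dq_dadv


def sales_response(adv):
    """Hypothetical sales response q(A) = 2 * sqrt(A); not from the text."""
    return 2 * math.sqrt(adv)


def dq_dadv(adv):
    """Derivative of the hypothetical response: dq/dA = 1 / sqrt(A)."""
    return 1 / math.sqrt(adv)


def optimal_advertising(price, mc, lo=1e-6, hi=1e6, tol=1e-9):
    """Bisection on (p - MC) * dq/dA = 1, holding price fixed."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if marginal_benefit(price, mc, dq_dadv(mid)) > 1:
            lo = mid  # advertising still pays for itself: spend more
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(marginal_benefit(4, 1, dq_dadv(4)))  # 1.5, as in the example
    best = optimal_advertising(4, 1)
    print(round(best, 4), round(sales_response(best), 4))
```

Under these assumptions the marginal benefit at A = 4 is 1.5, so more advertising is worthwhile, and the marginal benefit falls to 1 at A = 9.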
If we look at a graph with both firms’ reaction functions on it, we can examine
the strategic implications of an increase in advertising on the choice of price. This is
shown in the figure below. Say firm B increases their advertising budget; this will
allow him to charge higher prices regardless of the price charged by A, shifting his
reaction function up.
If B’s advertising has no direct effect on A’s demand, instead only affecting it
indirectly through changes in prices, in equilibrium prices will change from p0 to p1.
However, B’s additional advertising may take a significant number of customers from
A, lowering A’s demand curve at any given price. Firm A will respond by lowering
price (to maintain some customers), so A’s reaction function shifts to the left. In
this case, the new prices are pA1’ and pB1’.
[Figure: reaction functions RA, RA’, RB and RB’, with pA on the horizontal axis and pB on the vertical axis; prices pA1’, pA0, pA1 and pB0, pB1’, pB1 are marked.]
We won’t be running through a concrete example in which both price and
advertising are chosen strategically, primarily because the algebra becomes very
tedious when there are 4 equations to solve for 4 unknowns. But it is important
to remember that when choosing advertising a firm needs to be aware not only of
the ideal price and advertising level it should choose holding its opponents’ actions
constant, but also of how the firm’s advertising affects demand for its competitors,
and that changes in advertising will impact equilibrium prices and vice versa.
Long Run Equilibrium
Just as with homogenous products, for an industry to be in long‐run equilibrium, the
marginal firm must not make any profit, or more firms would enter. In other words,
it must be the case that the residual demand that the marginal firm faces never
reaches his long‐run average cost curve. The residual demand that a firm faces in a
differentiated industry may at first not seem analogous to that in a homogenous
industry; however, it can be viewed in a similar way, since differentiated products
are still substitutes for one another, albeit not perfect substitutes. The figure
illustrates the residual demand faced by the (n+1)th firm if the market can sustain
n firms in equilibrium.
[Figure: residual demand dresidual and long‐run average cost curve LRACn+1 for the (n+1)th firm, with q on the horizontal axis and p on the vertical axis.]
Since the marginal firm doesn’t make any profit in the long‐run, it may be
tempting to say that in the long‐run, firms that are in the market don’t have to worry
about strategy – market forces mean profits are driven to zero. This is not the case.
The reason firms that are in the market must continue to think about strategy and
good management is they still owe their shareholders a fair rate of return on their
investment, whatever that is. When we say the marginal firm makes zero profit in
the long‐run, the profit we’re talking about is economic profit; that is, profit in
addition to normal returns.
Also, if an industry is limited by economies of scale so that it supports only
relatively few firms, strategy is important even if the marginal firm makes zero
profit. Basically, if there are large economies of scale to entry, and there is only room
for three or four firms, strategic decisions can help firm two take profits away from
firm three, for instance. On the other hand, if economies of scale are small, there will
be many small firms, and strategy is not as important. The name for this type of
industry is “monopolistic competition.” However, since the products are
differentiated, strategy is still somewhat important “locally” (as opposed to “perfect
competition”). Imagine two fast‐food restaurants that are across the street from one
another. Even though their individual strategic decisions won’t impact the entire
fast‐food industry that much, they are highly concerned with the decisions that the
other is making as they impact directly their potential customers. So, with
differentiated products, even if firms are small, a firm’s choices have non‐negligible
impacts on its “closest” competitors.
Summary
Strategies and outcomes vary greatly between homogenous products and
differentiated products and between quantity competition and price competition.
With identical products, since firms will not build lots of excess capacity unless
there is a specific reason, competition tends to be about quantity and capacity, not
about price directly. That means cost management is important, advertising less so,
and, there is a first mover advantage. As soon as there is some aspect of
differentiation between the products, it is possible for competition to be about price,
not just capacity. With differentiated products and price competition, reaction
functions slope up. Cost management, while important, is less important than with
quantity competition. Advertising now becomes an important strategic
consideration. And, it is better to move second than first, though moving first still
beats moving simultaneously: firms have an incentive to be the second mover to
exploit the first mover’s posted price.
It is possible to have quantity competition with differentiated products. This
kind of model is especially important in cases where lead times in production are
significant and when the degree of substitutability is high, though not perfect. Then
strategic decisions are more about capacity than price. But, the imperfect
substitutability means there is room for advertising and that the strategic effects of
cost management alone are not as large as they would be for identical products.
Ultimately, it’s a question of how differentiated a product really is. The more
homogenous a product, the greater the incentive to move first and capture a larger
market share, the more important cost management, the more likely capacity has
major strategic implications, and the less important individual advertising. The
more differentiated a product, the less important capacity and cost management
become, and, the more important individual advertising becomes.
Chapter 16 Terminology
The following is a list of terms that you should know in order to discuss and
apply the material from this chapter.
Differentiated Products: Products that are not identical, making price competition
less fierce because factors other than price are important.
Transportation Cost: A metaphor for any characteristic that differentiates
products. A higher transportation cost means product differentiation is more
important, since the customer will lose greater surplus the farther away they are
from their ideal product.
Consumer Surplus: The benefit received by consumers who can buy a product for
less than their willingness to pay for it.
Differentiated Product Bertrand Competition: A model that shows how fierce
price competition can be dampened by the presence of other factors. Reaction
functions slope up because the price charged by one firm is directly related to the
price charged by the other firm.
Informative Advertising: A firm’s advertising meant to let customers know of
its product. This type of advertising is good for society since it simply informs
customers.
Persuasive Advertising: A firm’s advertising meant to change the consumers’
preferences to be more in favor of its product as opposed to a competitor’s product.
If both firms engage in this type of advertising, their effects may cancel each other
out, wasting money.
Product Positioning: Occurs when a firm introduces a new product, similar to its
other products, taking advantage of a vulnerable niche in the market. Even though
the firm is losing profit, taking demand away from its existing products
(cannibalization), it may be beneficial in preempting another firm’s entry, increasing
profit in the long run.
Chapter 17
Perfect Competition
In the long‐run in an industry producing a reasonably homogenous product, the
strategic effects of the decisions of individual firms become trivial if economies of
scale are so small that “many” firms must enter the industry before profits are
eliminated. Think of the global market for wheat; a single Kansas wheat farmer’s
strategies have insignificant effects on what a European farmer decides to do.
We can see this directly from an expression for marginal revenue derived in
chapter 12. Letting s represent an individual firm’s market share, q/Q, the marginal
revenue for firm i, a firm in a homogenous market, is:
$$MR_i = p + \frac{dp}{dQ}\,q_i = p + \frac{dp}{dQ}\,Q s_i.$$
This says that the revenue a firm generates from selling one more unit is the price
(p) that it charges, plus the effect on price that the sale itself has (dp/dQ) times
how many units the firm is selling. Every time a firm sells another unit, it drives
down market price (because of the law of demand), which means everyone selling
output in the market is going to get less revenue. So, the firm selling the extra good
drives down price a little bit for everyone (including himself), but collects the price
from the new good he just sold. The effect of the price reduction on all of the other
products he would have sold is only relevant to the part of the market the firm itself
owns, si. In other words, he’s only “losing” profits from the price reduction on his
market share (si).
As the number of firms increases, each firm’s market share decreases, and as si
approaches 0, marginal revenue approaches price (since the second term in the
above equation falls out). If a firm is small enough, they can just ignore the effect of
their sales on market price when choosing quantity. So, for them, marginal revenue
and price are the same thing. So, firms that are “small enough” can just be modeled as
price takers. For them, setting MR=MC means setting p=MC.
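The convergence of marginal revenue to price can be illustrated numerically. The sketch below assumes a hypothetical linear inverse demand p = 20 − 0.1Q evaluated at Q = 100 (these numbers are not from the text) and gives each of n identical firms a market share of 1/n.

```python
def marginal_revenue(price, dp_dq, market_q, share):
    """MR_i = p + (dp/dQ) * Q * s_i for a firm in a homogeneous market."""
    return price + dp_dq * market_q * share


if __name__ == "__main__":
    # Hypothetical linear inverse demand p = a - b*Q, evaluated at Q = 100.
    a, b, market_q = 20.0, 0.1, 100.0
    price = a - b * market_q
    for n in (1, 2, 10, 100, 1000):
        mr = marginal_revenue(price, -b, market_q, 1 / n)
        print(f"n = {n:5d}: price = {price:.2f}, MR = {mr:.3f}")
```

With a single firm the gap between price and marginal revenue is as large as it can be; as n grows, the gap shrinks in proportion to the firm’s market share.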
This turns out also to be the case with differentiated product markets when
economies of scale and market size allow many firms to enter. This is because even
though there may be several firms selling slightly different products, they are still
substitutes for one another. As the product space becomes more and more tightly
packed with the products of competing firms, every individual consumer finds that
they have increasingly appealing substitutes for their preferred product. That
means the demand for any individual firm’s product becomes increasingly elastic as
more and more firms introduce products.
Mathematically, we cannot combine all the individual firms’ quantities, q, into a
total market quantity, Q, since the products are not the same thing. So, we cannot
work from the expression for marginal revenue given above. We can, however, work
from an alternative expression. Recall the following expression for marginal
revenue.
$$MR_i = p_i + \frac{dp_i}{dq_i}\,q_i = p_i + \frac{dp_i}{dq_i}\frac{q_i}{p_i}\,p_i = p_i\left(1 + \frac{1}{\eta_i}\right)$$
As the number of substitutes increases, the absolute value of customers’
elasticity (η) increases, the ratio (1/η) approaches 0, so marginal revenue
approaches price. With differentiated products, as the number of good substitutes
becomes large, marginal revenue again becomes essentially equal to price and
MR=MC becomes p=MC.
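A few illustrative values show how quickly this happens. The price of 10 and the elasticities below are arbitrary choices made for the sketch; the elasticity is entered as a negative number, as in the formula above.

```python
def marginal_revenue(price, elasticity):
    """MR_i = p_i * (1 + 1/eta_i), with eta_i the (negative) own-price elasticity."""
    return price * (1 + 1 / elasticity)


if __name__ == "__main__":
    for eta in (-1.5, -2.0, -10.0, -100.0):
        print(f"eta = {eta:7.1f}: MR = {marginal_revenue(10.0, eta):.3f}")
```

As the absolute value of the elasticity grows, marginal revenue moves toward the price of 10.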
The model of perfect competition, then, just begins from the assumption that
firms are price takers so that it can analyze market outcomes while ignoring the
intricate complexities that arise from strategic interdependence. It is applicable
when differentiation is “small enough” and economies of scale are “small enough”
that the number of firms in the industry is “large enough” that the implications of
strategic interdependence become of only secondary importance when looking at
outcomes for a market as a whole. For any specific firms in the market, facing any
particular decision, strategic implications may still be of importance. But, the
strategic interplay has little impact on the performance of the market as a whole and
is thus ignored so that we are not prevented from seeing the forest by the
complexity of all the trees.
The model for perfect competition requires a lot of assumptions (rational
players, complete information, price takers, etc.) and thus is not a close
representation of “reality.” As such, its predictions will not exactly hold for any
particular firm in any particular industry. The goal of any model we’ve used is to
highlight the implications of the most important decisions in any situation and to
predict the most important general effects of changes in outside factors on the
market and players being modeled. If there are “enough” firms and if they are “small
enough” so that taking them to be price takers will not cause us to ignore strategic
interactions that have large effects on the whole market, assuming all firms are price
takers becomes a useful if unrealistic simplification that allows us to focus on the big
picture.
Perfect Competition in the Short‐run
Looking at the short‐run, firm i’s profit is
$$\pi_i = p\,q_i - STC_i(q_i).$$
Maximizing gives:
$$\frac{d\pi_i}{dq_i} = p - SMC_i = 0 \;\Rightarrow\; p = SMC_i.$$
Looking at this graphically gives the figure below.
Firms produce where MR=MC to maximize profits. Since p=MR here, that means
producing where p=MC. So, for a given price (two are shown), a firm simply looks to
its marginal cost curve to determine how many units to sell. Thus, the portion of the
firm’s MC curve that lies above its AVC curve can be thought of as its supply curve.
If price were less than the minimum possible AVC, the firm could not even cover
all of its variable costs. Thus, it would do better to shut down and simply lose its
fixed costs rather than to produce and lose all fixed costs plus some variable costs.
[Figure: firm i’s MCi and AVCi curves against qi; the part of MCi above AVCi is labeled as firm i’s supply curve, and prices p1 and p2 correspond to quantities q1 and q2.]
If you were to solve the equation p – MCi(qi) = 0 for qi, you would obtain firm i’s
supply curve, or $q_i^s(p)$. The firm’s supply curve gives the quantity the firm produces
to maximize profit given the market price. Everything the individual firm needs to
know about demand and all the other firms is summed up in the market price. The
overall supply curve of the market is then the sum of all of the individual firms’
supply curves, or
$$S(p) = \sum_{i=1}^{n} q_i(p).$$
Graphically, market supply is the horizontal sum
of the individual firms’ supply curves. With the market supply, we can readily look
at firm behavior and market equilibrium, the interconnections between the two,
and, firm and market responses to changes in parameters such as taxes, regulations,
wages, interest rates, incomes, or anything else that affects supply or demand.
By making our earlier assumptions that led to marginal revenue approaching
price, we assumed away the strategic interactions between firms. Note again that
this doesn’t mean they don’t exist; it’s just that making these assumptions allows us
to add in a supply curve and talk about markets as a whole. That is, it allows us to
simultaneously analyze a representative firm and the market as a whole in the two
simple figures above.
Example
There are 25 identical firms, each firm with short run total cost of
$$STC = 10 + 2q + \frac{q^2}{4}.$$
Market demand is
$$D(p) = 200 - 10p.$$
To find the marginal cost of each firm, simply take the derivative.
$$SMC = 2 + \frac{q}{2}$$
To find supply, set p = SMC and solve for q.
$$\begin{aligned} MC = 2 + \frac{q}{2} &= p \\ \frac{q}{2} &= p - 2 \\ q &= 2p - 4 \end{aligned}$$
Since the market supply curve is simply the sum of the individual firms’ supply
curves, with identical firms, just multiply an individual firm’s supply by n.
$$\begin{aligned} S(p) &= 25q_i \\ &= 25(2p - 4) \\ &= 50p - 100 \end{aligned}$$
To find equilibrium price and quantity, set supply equal to demand.
$$\begin{aligned} S(p) &= D(p) \\ 50p - 100 &= 200 - 10p \\ 60p &= 300 \\ p &= 5 \end{aligned}$$
Putting this in a graph gives the figure below.
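[Figure, two panels. Firm: MC and AVC curves against q, with q(pe) = 6 at pe = 5 and MinAVC marked by a lower dotted line. Market: supply S and demand D against Q, with Qe = 150 at pe = 5.]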
Equilibrium firm‐level and market‐level quantities are shown, along with
equilibrium price. We know that firms will only produce where price is at least as
great as AVC, or else they’re not even covering their variable costs. So, at the firm’s
minimum average variable cost, the market supply curve cuts off. Market price will
never fall below the lower dotted line in the short run for this reason.
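The example’s equilibrium can be reproduced numerically. The sketch below uses the cost and demand functions from the example; the function names and the use of bisection are illustrative choices, and the shutdown price of 2 is derived here from the example’s cost function, since AVC = 2 + q/4 has a minimum of 2.

```python
def firm_supply(p):
    """Firm supply from p = SMC = 2 + q/2; zero when p is below min AVC = 2."""
    return max(0.0, 2 * p - 4)


def market_supply(p, n_firms=25):
    """Horizontal sum of identical firms' supply curves."""
    return n_firms * firm_supply(p)


def market_demand(p):
    """Market demand from the example: D(p) = 200 - 10p."""
    return 200 - 10 * p


def equilibrium_price(lo=0.0, hi=20.0, tol=1e-10):
    """Bisection on excess demand D(p) - S(p), which falls as p rises."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if market_demand(mid) > market_supply(mid):
            lo = mid  # excess demand: price must rise
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    p = equilibrium_price()
    print(round(p, 4), round(firm_supply(p), 4), round(market_supply(p), 4))
```

The script recovers the price of 5, a firm-level quantity of 6, and a market quantity of 150 shown in the figure.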
Perfect Competition in the Long‐run
If firms are making economic profits (or π>0), there will be entry. Thus, for an
industry to be in equilibrium, it must be the case that no firm can enter and receive a
price higher than their minimum average cost. Even though firms aren’t making an
economic profit, they are still all maximizing their individual profits. Thus, every
firm in the industry sets marginal revenue equal to marginal cost, and since price
equals marginal revenue,
pi = LRMCi,
where LRMCi is the long‐run marginal cost of firm i. For the marginal firm, firm n, we
assume that pn ≈ LRACn. The reason it is approximately equal to and not exactly
equal to LRAC is that economies of scale may be such that firm n, which is in the
industry, makes a profit, but firm n+1, which is not in the industry, would make a
loss if it entered. When we work problems, however, we assume that pn = LRACn for
simplicity.
Since the marginal firm produces where the (given) market price equals long‐run
average cost (pn ≈ LRACn) and for every firm in the industry price equals long‐run
marginal cost (pi = LRMCi), it follows that LRMCn = LRACn. We know that
marginal cost equals average cost at the minimum of the average cost curve; so, in
the long‐run, the marginal firm will produce where price equals minimum average
cost (p=MinLRAC) which is also where marginal cost equals average cost. If price
were higher than MinLRAC, the marginal firm would be making a profit, and firms
would enter; if it were lower, the marginal firm would be making a loss, and firms
would exit.
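The claim that marginal cost equals average cost at the minimum of the average cost curve follows from the quotient rule. As a short sketch, write C(q) for long‐run total cost (notation assumed here for illustration), so that AC(q) = C(q)/q and MC(q) = C'(q):

```latex
\frac{d}{dq}\,AC(q)
  = \frac{d}{dq}\left(\frac{C(q)}{q}\right)
  = \frac{q\,C'(q) - C(q)}{q^{2}}
  = \frac{MC(q) - AC(q)}{q}
```

So average cost is falling wherever MC < AC, rising wherever MC > AC, and flat exactly where MC = AC, which for a U‐shaped average cost curve is its minimum.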
At this point, it is necessary to distinguish between two possible types of
industries: constant cost industries and increasing cost industries. In an increasing
cost industry, as the industry expands, it drives up the prices of the inputs it uses, or
it has to put into use inputs that are less productive. If the industry must put to use
inputs that are less productive, that has the effect of driving up the price of the more
productive inputs above their “opportunity” cost. Such increases in the prices
fetched by the more productive inputs are known as “rents.”
To take the simplest and, perhaps, most compelling example, consider the
market for agricultural grains. If little grain is to be produced, it is produced only on
land that is both well suited to grain production (fertile) and not well suited to
other uses. Since the land is not well suited to other uses, it will have a relatively low
price. However, as the market expands, more land must be put into production.
Competition for the more productive land will drive up its price. So, it will become
profitable to use land that is slightly less productive, or, to divert productive land
that had some other productive use (say in ranching) to grain production. The land
that was most fertile but had few other uses now fetches a higher price. That is an
economic “rent” – a payment in excess of the reservation value at which the
landowner would have been willing to lease the land for grain production.
Economic rents can accrue to other types of inputs as well. Michael Jordan,
Peyton Manning, Angelina Jolie, and Johnny Cash would all have been happy to
perform for far less than they actually earned at the height of their careers. If the
demand for professional women’s soccer were higher, Mia Hamm and Abby
Wambach would be much wealthier than they are. On the other hand, if the demand
for professional football were lower, Peyton Manning would not be as wealthy as he
is and many journeymen NFL quarterbacks would be in other lines of work.
At the most basic level, individual workers, land, materials, etc… are
idiosyncratic and thus better suited to some uses than to others. If an industry is
large enough, and, thus uses a large share of a certain kind of input, as the industry
expands or contracts, it drives up or down the going rates for the best suited inputs,
and brings less well suited inputs into the industry or expels them from it. So, if an
industry consumes a large share of some of the things it uses as inputs, as it grows,
the minimum long run average cost of running a firm in the industry goes up. The
construction industry drives up the wages of unskilled manual laborers as it
expands, and, drives them down as it contracts. The agricultural industry drives up
the price of fertile land as it expands, and, drives it down as it contracts. Sports
leagues drive up the salaries of the most skilled players as they expand, and, the
movie industry drives up the payments to the most skilled actors and actresses as it
expands. These would all be increasing cost industries, though, depending on
institutions and the question at hand, they may or may not be reasonably
approximated as perfectly competitive.
For some cases, though, the industry is a small player in the markets for the
factors of production it uses, (for example the furniture industry), or, the effect of
the industry on the prices of the inputs it uses is not of much direct interest to the
question under study (for example, when studying policies to encourage energy
efficiency in residential construction). In such cases, it is more accurate, or at least
simpler and accurate enough, to just assume the industry is a constant cost industry.
We also assume that all firms, whether in the industry or not, have an identical cost
structure. In a constant cost industry, the industry can expand and contract without
affecting the cost or productivity of the factors of production used in the industry.
Therefore, as the industry expands or contracts, the cost curves of its representative
firms remain constant.
In any event, when the industry is in long run equilibrium, price equals the
marginal firm’s minimum long run average cost, as shown in the figure below. The
difference is whether the long run industry supply curve slopes up or is flat
(perfectly elastic). We have drawn two potential long‐run industry supply curves
(LRIS) in the figure. The flat LRIS curve represents a constant cost industry. The
other possibility is that average costs increase as the industry expands, in which
case the LRIS slopes up, representing an increasing cost industry.
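[Figure: firm panel (axis q, price peLR) and market panel (axis Q) showing demand D0, long‐run equilibrium price peLR0 and quantity QeLR0, and two LRIS curves: a flat one (constant cost industry) and an upward‐sloping one.]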
Relationship Between the Long Run and the Short Run
In the next figure, we deal with a constant cost industry and add a short run
supply curve to the initial picture of the market equilibrium and also add the firm’s
short run marginal cost curve. Suppose market demand shifts out, for whatever
reason. New demand intersects the short‐run supply curve at a price of ptemp, and
we see that our marginal firm is making a profit (since this price is above average
cost). Thus, firms will enter, shifting the short‐run supply curve to SSR1. Since all of
the firms have the same cost structure, and the addition of new firms doesn’t affect
cost (constant‐cost industry), price will tend towards the original equilibrium price,
which is why the long‐run industry supply is flat.
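[Figure: Firm panel ($ against q): LRMC, SRMC, and LRAC curves, with prices ptemp and peLR marked. Market panel ($ against Q): short‐run supply curves SSR0 and SSR1, flat LRIS at peLR, demand curves D0 and D1, and the temporary price ptemp.]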
The other possibility is that if demand increases, average costs increase as firms
are added to an industry, either because the firms being added are less efficient than
the original firms, or just because of competition for the factors of production best
suited to the industry. This is the case in the figure below. In the left panel of the
figure, the average cost curve moves from LRAC0 to LRAC1 as entry occurs. Thus the
long‐run industry supply curve (LRIS) slopes up and the long‐run equilibrium price
increases from peLR0 to peLR1.
Example
The long run cost function for firms in a constant cost industry is
C(q) = 2q − 0.2q^2 + 0.01q^3. Market demand is QD = 450 − 100p. Find equilibrium
price, supply, equilibrium quantity, and the number of firms in the industry in the
long run.
We know firms produce at their minimum average cost; in other words, where
marginal cost equals average cost. So, set the two equal, and remember average cost
is simply cost divided by q.
AC = 2 − 0.2q + 0.01q^2 = 2 − 0.4q + 0.03q^2 = MC
0.2q = 0.02q^2
0.2 = 0.02q
q = 10
How now to find price? It may help to look at a graph. We found firm quantity to be
10 when average cost is at its minimum. We see we can plug that into either the LRMC
curve or the LRAC curve to find price.
[Figure: LRMC and LRAC curves ($ against q); LRMC crosses LRAC at its minimum, where q = 10 and price is peLR.]
LRAC(10) = 2 − 0.2(10) + 0.01(10)^2
peLR = 2 − 2 + 1 = 1
Since this industry is constant cost, our long‐run industry supply curve will be flat
at a price of $1. To find the quantity demanded, plug the long‐run price of $1 into
the demand curve.
QD(1) = 450 − 100 = 350
[Figure: market panel ($ against Q) with flat LRIS at $1 crossing demand D at Q = 350.]
Since we know each firm produces 10 units, there are 350/10 = 35 firms
in the industry in the long‐run.
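The steps of this example can be sketched in a few lines of code. This is an illustrative check, not part of the original text; the function and variable names are assumptions, and the minimum of LRAC is found from the vertex formula for a parabola.

```python
# Long-run equilibrium in a constant cost industry with
# C(q) = 2q - 0.2q^2 + 0.01q^3 and demand QD = 450 - 100p.
# Names are illustrative, not from the text.

def avg_cost(q):
    """LRAC = C(q)/q."""
    return 2.0 - 0.2 * q + 0.01 * q ** 2

def marg_cost(q):
    """LRMC = dC/dq."""
    return 2.0 - 0.4 * q + 0.03 * q ** 2

def demand(p):
    return 450.0 - 100.0 * p

# LRAC is a parabola a*q^2 + b*q + c with a = 0.01, b = -0.2,
# so its minimum is at q = -b / (2a).
q_firm = 0.2 / (2 * 0.01)

p_lr = avg_cost(q_firm)    # long-run price = min LRAC
Q_lr = demand(p_lr)        # market quantity at that price
n_firms = Q_lr / q_firm    # number of firms

print(round(q_firm, 6), round(p_lr, 6), round(Q_lr, 6), round(n_firms, 6))
```

At the computed firm quantity, marg_cost and avg_cost agree, which is the MC = AC condition used in the text.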
Value Added and Social Welfare in (more or less perfectly)
Competitive Markets
What we want to do now is measure welfare in competitive markets, that is, how
much value is added because the product market exists. Part of that value is
captured by consumers in the form of consumer surplus (CS). Consumer surplus
was defined earlier in Chapter 9. It is simply the difference between what they
would be willing to pay for the amount of the product purchased, as measured by
the area under the demand curve, and what they do pay, which is price times
quantity. So, in a normal market equilibrium, consumer surplus is the area under
the demand curve and above price. Mathematically, at price p, if demand is QD(p)
and inverse demand is denoted pD(Q):
$$CS = \int_0^{Q_D(p)} p_D(x)\,dx - p\,Q_D(p).$$
Remember CS is just an approximation. First, it relies on obtaining a reasonable
estimate of a demand curve, which is difficult to do in the first place. Second, it
assumes consumers maximize [V(q) – pq], which ignores income effects. Suppose
you’d be willing to buy at most three burritos in any given week. If the cost of
everything else you buy goes down, you can imagine you’d be willing to pay more for
any particular number of burritos, because of your increased purchasing power.
This model doesn’t take that into account, and so it’s not completely accurate, but it
is accurate enough for many purposes, is easy to use, and allows us to gain many
additional insights into the workings of markets.
We need a similar measure for the benefits captured by the producer side of the
market. Again, in a normal equilibrium, revenue received by the producers is simply
price times quantity. Since the supply curve represents the marginal cost curves of
the producers, the area under the supply curve represents total variable costs of all
producers. The area under price and above the supply curve therefore represents
revenue received in excess of variable costs. This is producer surplus, PS, illustrated
in the figure.
[Figure: supply curve S and demand curve D; PS is the area below the equilibrium price pe and above S.]
Since producer surplus just adds up the area between the supply curve and the
equilibrium price, in a normal equilibrium it can be expressed as the corresponding
integral. Specifically, if p is price, S(p) is the supply curve, and pS(Q) the inverse
supply curve:
$$PS = p\,S(p) - \int_0^{S(p)} p_S(x)\,dx.$$
Since the supply curve represents firms’ marginal costs, producer surplus is
closely related to profit. In the short run, profit is equal to producer surplus less any
fixed costs that are sunk and also less quasi fixed (start up) costs that are not sunk
until output exceeds 0. So, in the short run producer surplus is profit plus those
fixed costs. That is:
$$PS = \sum_i (\pi_i + F_i).$$
come up in the long run model because LRIS depends on average cost, not just
marginal cost, and, average cost includes the quasi fixed costs (which are possible
even in the long run).
The other imperfection with producer surplus in the short run is that we are
thinking in terms of the marginal costs faced by the firms under the current
situation, current labor contracts, etc… But, some of the inputs will be earning
economic rents. Those are not truly costs, but, rather, represent “surplus” received
by owners of the factors of production that are differentially more productive than
the marginal unit. That surplus gets ignored in the standard treatment of producer
surplus in a short run model.
Now that we’ve defined CS and PS, we define total surplus to be the sum of the
two:
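$$TS = CS + PS = \int_0^{Q_e} p_D(x)\,dx - \int_0^{Q_e} p_S(x)\,dx = \int_0^{Q_e} \left[p_D(x) - p_S(x)\right]dx.$$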
What outcome would maximize value added? Producing right to the point where the
marginal value of another unit equals its marginal cost. The demand curve
represents marginal value, and the supply curve represents marginal cost, so to
maximize TS, produce where supply equals demand, that is, at the competitive
market equilibrium, Q* in the figure. Moreover, in a competitive market,
production takes place at minimum long run average cost, so there is no cheaper
way to attain the given output.
[Figure: supply S and demand D (p against Q) crossing at the surplus‐maximizing quantity Q*.]
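To make the surplus integrals concrete, the sketch below evaluates them for the earlier short‐run example (D(p) = 200 − 10p, S(p) = 50p − 100, equilibrium at p = 5, Q = 150). It is an illustration under those assumptions only; the helper names and the midpoint‐rule integrator are not from the text.

```python
# Consumer, producer, and total surplus for the earlier linear example.
# Names and the numerical integrator are illustrative assumptions.

def inv_demand(x):
    """p_D(Q), inverted from D(p) = 200 - 10p."""
    return 20.0 - x / 10.0

def inv_supply(x):
    """p_S(Q), inverted from S(p) = 50p - 100."""
    return 2.0 + x / 50.0

def integrate(f, a, b, steps=10000):
    """Midpoint rule; exact for linear f up to rounding error."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

def total_surplus(Q):
    """Area between inverse demand and inverse supply from 0 to Q."""
    return integrate(lambda x: inv_demand(x) - inv_supply(x), 0.0, Q)

Q_e, p_e = 150.0, 5.0
cs = integrate(inv_demand, 0.0, Q_e) - p_e * Q_e   # area under demand less spending
ps = p_e * Q_e - integrate(inv_supply, 0.0, Q_e)   # revenue less area under supply
ts = cs + ps

print(round(cs, 4), round(ps, 4), round(ts, 4))
print(round(total_surplus(140.0), 4), round(total_surplus(160.0), 4))
```

Producing either 10 units fewer or 10 units more than Q = 150 gives a lower total surplus, which is the sense in which the competitive quantity maximizes value added.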
In general, competitive markets allocate resources efficiently, tending toward
this equilibrium quantity where TS is maximized. There are a lot of underlying
assumptions. A big one is that the only ones who receive benefits from the products
are consumers and the only ones who pay costs are the suppliers. If third parties
outside the market transactions are helped or harmed as a result of what goes on in
the market, that is if there are external or spillover benefits or costs, supply and
demand no longer represent the true marginal value and marginal cost. For
example, if an industry causes pollution, it’s easy to see how other people could
effectively be paying the costs that are outside our classification of consumer and
producer. Another is that the allocation is “efficient” GIVEN the existing distribution
of wealth – the analysis above said nothing about whether or not that initial
distribution was a desirable one. Nonetheless, this tendency toward maximizing
value added, or social welfare conditional on the given initial distribution of wealth,
is why the free operation of supply and demand in competitive markets is the
standard by which markets are judged.
Chapter 17 Terminology
The following is a list of terms that you should know in order to discuss and
apply the material from this chapter.
Constant Cost Industry An industry in which all firms have an identical cost
structure and the industry can expand or contract without affecting input prices or
productivity. The long run equilibrium price stays at minimum long run average cost.
Increasing Cost Industry An industry in which new firms are not as efficient as
existing firms, driving up long run average cost for the last firm, causing higher
equilibrium prices. Some markets are so big that they drive the price of their own
inputs up as they expand.
Chapter 18
Applications of Supply and Demand Analysis
We know that in equilibrium, demand D(p) equals supply S(p). Quantity
demanded can be written as follows:
QD = D(p, m, pRC, nC, zC)
which says demand depends on price (p), income (m), the prices of related goods,
such as substitutes and complements (pRC), the number of consumers (nC), and
anything else unaccounted for (zC).
Based on our model of supply, we can write quantity supplied as follows:
QS = S(p, pI, pRP, nF, zS)
which says supply depends on price (p), input prices, such as wage and capital rates
(pI), the price of goods related in production, such as assembly lines that can be
changed from producing one good to another relatively quickly (pRP), the number of
suppliers (nF), and anything else unaccounted for (zS).
Basic Comparative Statics
Since we know supply and demand depend on all of the above factors (and more),
we can ask what happens to each if one of the factors changes. For example,
suppose we’re looking at the market for construction services. If, for some
reason, fewer illegal aliens are available to work on construction jobs, labor
costs will rise. This increase in costs will shift the supply curve for the
construction industry left, and the simple shift shown in the figure tells us price
will increase and quantity will decrease. Essentially, eliminating the strategic
interactions by assuming all firms are price takers allows us to use very simple
graphs to see the overall effects of a change in any of the variables that influence
supply or demand.
[Figure: supply shifts left from S to S’ along demand D; price rises from P1 to P2 and quantity falls from Q1 to Q2.]
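A small numerical sketch of this kind of comparative statics exercise, reusing the 25‐firm example from the previous chapter. The cost increase is modeled as an upward shift of c in each firm's marginal cost (SMC = 2 + c + q/2); both that modeling choice and the value c = 1.2 are assumptions made only for illustration.

```python
# Comparative statics: a cost increase shifts supply left.
# The shift parameter and its value are hypothetical.

def equilibrium(cost_shift=0.0, n_firms=25):
    """Solve 200 - 10p = n * (2(p - c) - 4) for p, then find Q on demand."""
    # n(2p - 2c - 4) = 200 - 10p  =>  p(2n + 10) = 200 + n(2c + 4)
    p = (200.0 + n_firms * (2.0 * cost_shift + 4.0)) / (2.0 * n_firms + 10.0)
    q = 200.0 - 10.0 * p
    return p, q

p1, q1 = equilibrium(cost_shift=0.0)   # before the cost increase
p2, q2 = equilibrium(cost_shift=1.2)   # after the cost increase

print(round(p1, 4), round(q1, 4))
print(round(p2, 4), round(q2, 4))
```

Price rises by less than the cost increase because demand slopes down, so part of the higher cost is absorbed through lower quantity.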
Impact of a Tax
Suppose we start off with a constant cost industry, and impose an excise tax of $t
per unit on producers. If pD is the price paid by demanders and pS is the price
received by suppliers:
pD = pS + t.
[Figure: flat LRIS at p0 shifts up to LRIS+t at p0+t; along demand D, quantity falls from Q0 to Q1. The rectangle between p0 and p0+t out to Q1 is tax revenue; the triangle between Q1 and Q0 is DWL.]
Since producers pay the tax, we can think of the tax as an increase in costs of
suppliers, causing an upward shift in the LRIS curve, shown in the figure. The
revenue raised by the tax is simply t dollars per unit times the Q1 units that are
sold after the tax is instated (or the area of the rectangle). From a social perspective,
the tax revenue is just a transfer of wealth, and doesn’t cause any loss in surplus.
However, due to the fact that every unit now costs p0+t, consumers will buy less.
The value of the last unit (the Q1th unit) is MC+t, and so this unit still has a value that
exceeds the actual cost of producing it; but the tax means it is not purchased. The tax
prevents the production and consumption of the last (Q0 – Q1) units that would have
otherwise been sold from being transacted in the market, and, these are valued
above their cost. Thus the tax creates a deadweight loss – the triangle in the figure. It
represents surplus that someone used to get (consumers in this example) that
vanishes due to the tax. That represents a net loss of social welfare. Depending on
what statistics you look at, $1 of government revenue costs taxpayers $1.20 ‐ $1.30,
exactly because taxes destroy some social surplus. Theoretically this means that you
should only use tax revenue to fund public ventures that have benefits 20‐30%
greater than their cost, since it costs an extra 20‐30% just to raise the revenue. (This
is quite aside from arguments about fiscal stimulus spending when/if monetary
policy is insufficient in a severe recession – but that is a topic for a class in
macroeconomics.)
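The rectangle and triangle can be computed for the constant cost industry from the earlier long‐run example (LRIS flat at $1, demand QD = 450 − 100p). This is a hedged sketch: the tax of $0.50 per unit is an assumed value chosen only for illustration, and the variable names are not from the text.

```python
# Excise tax in a constant cost industry (flat LRIS).
# The tax level t is a hypothetical value.

def demand(p):
    return 450.0 - 100.0 * p

p0 = 1.0    # long-run price = min LRAC from the earlier example
t = 0.50    # assumed tax per unit

Q0 = demand(p0)        # quantity before the tax
Q1 = demand(p0 + t)    # quantity after the tax; consumers pay p0 + t

tax_revenue = t * Q1              # rectangle: t dollars on each of Q1 units
dwl = 0.5 * t * (Q0 - Q1)         # triangle under linear demand
cost_per_dollar = (tax_revenue + dwl) / tax_revenue

print(Q0, Q1, tax_revenue, dwl, round(cost_per_dollar, 4))
```

With these made‐up numbers each dollar of revenue costs a little over $1.08 in lost surplus; the $1.20 to $1.30 range quoted in the text comes from empirical estimates, not from this example.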
Now let’s look at an ad valorem tax, and further let’s suppose we have an
increasing cost industry (or that we are looking at the short run impact). An ad
valorem tax is just one that’s proportional to value; so a typical sales or property tax
is an example of an ad valorem tax. If pD is the price paid by demanders and pS is the
price received by suppliers, and t is the tax rate, the three are related by the
following equation:
pD = (1 + t)pS.
To find the equilibrium price(s) and quantity after the tax, simply use that
relationship to substitute into the demand curve, and then equate supply and
demand. That is:
D(pD) = D((1 + t)pS) = S(pS).
Since the tax is a percent of supplier price, the after‐tax supply curve can be seen
as an upward rotation from the original supply curve, as shown in the figure.
[Figure: supply LRIS rotates upward to LRIS×(1+t); along demand D, quantity falls from QOLD to QNEW, demanders pay pD, suppliers receive pS, and the old price pOLD lies between them. The rectangle between pD and pS out to QNEW is tax revenue; the triangle between QNEW and QOLD is DWL.]
The first thing to note is that compared to the old price, the price demanders pay
doesn’t go up by the full amount of the tax, and the price suppliers receive
doesn’t go down by the full amount of the tax. The difference between pD and pS
is the value of the tax, but how it is split up between suppliers and demanders
depends on the sensitivity of supply and demand to price changes (on the elasticity
of supply and demand). The only time consumers pay the old price plus the entire
tax is when we’re looking at a constant cost industry, where the supply curve is
completely flat, or when the demand curve is vertical.
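The dependence of the split on the slopes of supply and demand can be sketched with a short derivation. For simplicity it uses a small per‐unit tax t (so pD = pS + t) rather than the ad valorem tax, and writes D' and S' for the slopes of demand and supply; this shorthand is assumed here for illustration.

```latex
D(p_S + t) = S(p_S)
\;\Longrightarrow\;
D'\left(\frac{dp_S}{dt} + 1\right) = S'\,\frac{dp_S}{dt}
\;\Longrightarrow\;
\frac{dp_S}{dt} = \frac{D'}{S' - D'},
\qquad
\frac{dp_D}{dt} = 1 + \frac{dp_S}{dt} = \frac{S'}{S' - D'}
```

With D' < 0 < S', the demander price rises by a fraction between 0 and 1 of the tax. That fraction goes to 1 as supply becomes flat (S' grows without bound) or when demand is vertical (D' = 0), matching the two special cases in the text.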
The intersection of the upward pivoted supply with demand determines the
price demanders pay (pD) and the new equilibrium quantity (QNEW). Suppliers
receive that new price less what they pay as tax: pS = pD/(1 + t). The gray rectangle
has a height of pD–pS which is the value of the tax (the difference between what
demanders pay and what suppliers receive), and a length of QNEW, the quantity sold
after the tax; thus, the area of the rectangle represents the tax revenue. The area of
the rectangle that falls above the old price (pOLD) can be thought of as the portion of
the tax paid by consumers, and the area that falls below old price can be thought of
as the portion of the tax paid by suppliers. The area of the triangle labeled DWL is
deadweight loss.
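As a worked sketch of the substitution D((1 + t)pS) = S(pS), the code below applies an ad valorem tax to the earlier short‐run example (D(p) = 200 − 10p, S(p) = 50p − 100, old equilibrium p = 5, Q = 150). The 20% rate is an assumed value for illustration, and the variable names are not from the text.

```python
# Ad valorem tax with upward-sloping supply.
# The tax rate t is a hypothetical value.

t = 0.20                    # assumed tax rate on the supplier price
p_old, q_old = 5.0, 150.0   # pre-tax equilibrium from the earlier example

# 200 - 10(1 + t)pS = 50pS - 100  =>  pS = 300 / (50 + 10(1 + t))
p_s = 300.0 / (50.0 + 10.0 * (1.0 + t))   # price suppliers receive
p_d = (1.0 + t) * p_s                     # price demanders pay
q_new = 50.0 * p_s - 100.0                # quantity after the tax

tax_revenue = (p_d - p_s) * q_new              # rectangle
dwl = 0.5 * (p_d - p_s) * (q_old - q_new)      # triangle (linear curves)
consumer_share = (p_d - p_old) / (p_d - p_s)   # part of the wedge above p_old

print(round(p_s, 4), round(p_d, 4), round(q_new, 4))
print(round(tax_revenue, 4), round(dwl, 4), round(consumer_share, 4))
```

Here consumers bear most of the tax because supply is much more price sensitive than demand in this example (slopes of 50 and −10).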
Impact of a Subsidy or a Price Floor
Now we’re going to look at an example where the government provides
subsidies to an agricultural industry. The first figure represents the market before
the subsidy. In the figure, short‐run supply (SS) is less elastic than long‐run supply
(LRIS). This is because in the short‐run, firms face a fixed factor, which limits their
ability to respond to price changes. In the long‐run, producers can enter or exit the
industry, increasing their capacity for production and their ability to respond to
price changes. Similarly for consumers, if the price of gas goes up, there’s not much
they can do about it in the short‐run and thus the quantity demanded won’t drop as
much; in the long‐run, consumers can buy more fuel‐efficient cars, find alternative
ways to get to work, etc., and as the number of substitutes for gas increases, demand
for gas becomes more elastic.
[Figure: market before the subsidy (p against Q), showing short‐run supply SS, long‐run supply LRIS, short‐run demand DS, and long‐run demand DL.]
Now suppose the government imposes a subsidy of $s per unit on the market. A
subsidy means the government is providing a certain amount of money per unit to
suppliers or consumers. Let’s assume it is paid to suppliers for this example, in
which case we can interpret the subsidy as a downward shift in supply (reduction in
cost). Note that this affects both short‐run and long‐run supply. Initially, quantity
demanded increases from its initial level (Q0) to the short‐run level with the subsidy
(QS). The price paid by demanders goes down after the subsidy (to pDS). Just like in
our last example, to find the price suppliers have to receive in order to produce QS
units, look up from QS to the original supply curve; we see that the price suppliers
receive goes up to pSS. The cost of the subsidy in tax revenue needed to support it in
the short‐run is the cost of the subsidy per unit ($s, which is the difference between
pSS and pDS) times the new short‐run quantity after the subsidy (QS), or the area of
the rectangle in the figure.
[Figure: short‐run supply SS and long‐run supply LRIS each shift down by $s (to SS − $s and LRIS − $s); with short‐run and long‐run demand DS and DL, quantity rises from Q0 to QS in the short run and to QL in the long run. Marked prices: P0 (initial), PSS (received by suppliers), PDS (paid by demanders).]
When we look at the quantity demanded in the long‐run (QL), we see that it is
higher than in the short‐run. This is because suppliers and demanders are more
price sensitive in the long‐run, so the changes in quantities brought about by taxes
and subsidies are bigger in the long‐run than they are in the short‐run. The cost of
the subsidy in tax revenue and the additional deadweight loss it creates in the
long‐run are shown in the figure.
[Figure: long‐run supply shifts down from LRIS to LRIS − s; quantity rises from Q0 to Q1, suppliers receive PS, demanders pay PD, and P0 is the initial price. The rectangle between PS and PD out to Q1 is the cost of the subsidy; the triangle to the right of Q0 is DWL.]
We know that the optimum quantity is where supply crosses demand, or Q0. We
see the subsidy causes more units than this amount to be produced, and so the
deadweight loss is due to over‐production. Thus, when we’re looking at an example
with a subsidy, deadweight loss occurs because the last Q1 – Q0 units that were
produced had a higher marginal cost than they did marginal value. The DWL is
illustrated accordingly in the above graph as the difference between the supply
curve and the demand curve to the right of equilibrium quantity (the area of the
triangle). The cost of the subsidy itself in tax revenue needed to fund it is the gray
rectangle, which is just the subsidy amount per unit ($s, or pS – pD) times the
quantity sold after the subsidy (Q1).
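A numerical sketch of the subsidy rectangle and the over‐production triangle, using the linear curves from the earlier example (D(p) = 200 − 10p, S(p) = 50p − 100, unsubsidized equilibrium p = 5, Q = 150). The subsidy of $1.20 per unit is an assumed value for illustration, and the names are not from the text.

```python
# Per-unit subsidy paid to suppliers, so pD = pS - s.
# The subsidy level s is a hypothetical value.

s = 1.20                # assumed subsidy per unit
p0, q0 = 5.0, 150.0     # equilibrium without the subsidy

# D(pS - s) = S(pS): 200 - 10(pS - s) = 50pS - 100  =>  pS = (300 + 10s) / 60
p_s = (300.0 + 10.0 * s) / 60.0   # price suppliers receive
p_d = p_s - s                     # price demanders pay
q1 = 50.0 * p_s - 100.0           # quantity with the subsidy

subsidy_cost = s * q1             # rectangle: s dollars on each of q1 units
dwl = 0.5 * s * (q1 - q0)         # triangle: units whose cost exceeds their value

print(round(p_s, 4), round(p_d, 4), round(q1, 4))
print(round(subsidy_cost, 4), round(dwl, 4))
```

The suppliers' price rises and the demanders' price falls, but by less than s in each case, mirroring the split of a tax.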
Another way the government encourages higher prices to suppliers in certain
industries is by a price support. A price support is a mandated price that’s higher
than the equilibrium price. We know that at prices higher than equilibrium there’s a
surplus; so basically, the government promises to buy up any surplus created by the
price support. Note that a price support is the same thing as a price floor, since
the government is saying price can’t go below a certain point.
[Figure: supply S and demand D with equilibrium at (Q0, P0); at the support price PSUP, quantity demanded is QDF and quantity supplied is QSF, and the gap between them is the surplus.]
We see that with the floor, quantity demanded is QDF and quantity supplied is QSF, so
there’s a surplus. The amount that it costs for the government to buy up the surplus
is just the price being charged (the price support level, pSUP) times the quantity of
the surplus (QSF − QDF), or the area of the
rectangle. The deadweight loss is the area of the rectangle, minus the triangle on top
that falls above the supply and demand curves; this is because below the demand
but above supply is surplus value that could have gone to consumers and producers
if price were at equilibrium, and the area below supply is the cost of producing the
surplus of units that are of no value to anyone in the end. The rest of the rectangle is
just part of the transfer of wealth from taxpayers to firms.
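The rectangle‐minus‐triangle calculation can be illustrated with the linear curves from the earlier example (D(p) = 200 − 10p, S(p) = 50p − 100, equilibrium p = 5, Q = 150). The support price of $6 is an assumed value, the names are illustrative, and the sketch follows the text in treating the purchased surplus as having no value to anyone.

```python
# Price support with government purchase of the surplus.
# The support price p_sup is a hypothetical value.

def demand(p):
    return 200.0 - 10.0 * p

def supply(p):
    return max(0.0, 50.0 * p - 100.0)

p0, q0 = 5.0, 150.0   # unregulated equilibrium
p_sup = 6.0           # assumed support price

q_df = demand(p_sup)               # quantity demanded at the floor
q_sf = supply(p_sup)               # quantity supplied at the floor
surplus_units = q_sf - q_df
gov_cost = p_sup * surplus_units   # rectangle: purchase of the surplus

# Triangle on top: below p_sup, above demand (left of q0) and supply (right of q0).
triangle = 0.5 * surplus_units * (p_sup - p0)
dwl = gov_cost - triangle

# Cross-check by welfare accounting with triangular CS and PS areas.
def cs(p):
    return 0.5 * (20.0 - p) * demand(p)

def ps(p):
    return 0.5 * (p - 2.0) * supply(p)

dwl_check = (cs(p0) + ps(p0)) - (cs(p_sup) + ps(p_sup) - gov_cost)

print(q_df, q_sf, surplus_units, gov_cost, dwl, dwl_check)
```

The two ways of computing the loss agree, which is a useful check that the "rectangle minus triangle" description accounts for every piece of the welfare change.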
The question of what is actually done with the surplus that the government buys
is tricky. Imagine the price support is instituted on the wheat industry. If the
government buys up a surplus of wheat, it’s tempting to want to say that they could
just give it to charity; the problem is, if they give it to charity, the charities are
buying less from the markets, driving price down and causing more of a surplus.
What usually happens, then, is the government either pays the producers not to
grow it in the first place, or they store the surplus until it rots.
Price Ceilings and Price Gouging Laws
Suppose we’re looking at the market for some scarce resource, such as ice, that
gets impacted during a natural disaster. If a hurricane hits, the demand for the good
will increase, shifting the demand curve right; however, since the hurricane has
knocked down power lines and closed off roads, it’s harder for suppliers to transport
or produce ice, and so supply is reduced, shifting the supply curve left.
[Figure: demand shifts right from D to D’ and supply shifts left from S to S’. The initial equilibrium is (Q0, p0) and the new one is (Q’, p’). With a price ceiling at p0, quantity supplied is QS and quantity demanded is QD; pD is the height of D’ at QS.]
It’s not clear what will happen
to quantity sold, but it is certain that equilibrium price will increase. What happens
sometimes is that people will complain about increases in prices during disasters,
and the government may step in and tell suppliers that they can’t raise price. If this
is the case, and p0 becomes the price ceiling, at new levels of supply and demand
there will be a shortage (QD – QS). The argument in favor of the price ceiling is that
even though quantity supplied is lower than it would be if price were allowed to
increase, the price is kept down and thus people of all incomes can afford the
commodities they need most during disasters. In other words, if price were allowed
to increase, the argument goes, only the rich would be able to buy ice. The problem
with this is that the task of rationing the limited quantity of the good falls on the
firms, and they have to put people in lines to determine who gets what. As a result,
many people who would otherwise be willing to pay a higher price are unable to
purchase what they need.
Q’ would be new equilibrium quantity if price were allowed to rise to the new
equilibrium level, p’ (note, Q’ could be higher or lower than Q0; in this picture, we’ve
drawn it as being higher). The price that demanders would be willing to pay for
another unit given the shortage is pD. So, the “standard” deadweight loss is just the
value that would have been created if price were allowed to increase, and more units
were sold, or the area of the triangle in the graph.
The shaded rectangle to the left of the triangle may be deadweight loss as well.
This is because with a price ceiling, people are still willing to pay pD. So, they will use
their extra resources to try to increase their likelihood of obtaining the limited
goods. This could be in the form of bribery, hiring someone to wait in line for you,
etc. When taking this extra cost into account, the effective price that customers pay
approaches pD, but potentially only p0 goes to producers, and the rest may
largely be wasted (time spent standing in a line instead of doing something
productive). If this is the case, the shaded rectangle could be considered deadweight
loss as well.
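The shortage, the "standard" triangle, and the possible rationing rectangle can be illustrated with made‐up numbers. Everything about the post‐disaster curves below is hypothetical: D'(p) = 260 − 10p and S'(p) = 50p − 160 are assumed shifts of the earlier example's curves, with the ceiling set at the old price p0 = 5.

```python
# Price ceiling after a demand increase and supply decrease.
# The shifted curves D' and S' are hypothetical.

def demand_after(p):
    """D': assumed post-disaster demand."""
    return 260.0 - 10.0 * p

def supply_after(p):
    """S': assumed post-disaster supply."""
    return max(0.0, 50.0 * p - 160.0)

p0 = 5.0                          # old price, imposed as the ceiling

q_s = supply_after(p0)            # quantity supplied under the ceiling
q_d = demand_after(p0)            # quantity demanded under the ceiling
shortage = q_d - q_s

# Unregulated new equilibrium: 260 - 10p = 50p - 160  =>  p' = 420 / 60
p_new = 420.0 / 60.0
q_new = demand_after(p_new)

# Willingness to pay for another unit when only q_s units are available.
p_d = (260.0 - q_s) / 10.0

dwl_triangle = 0.5 * (p_d - p0) * (q_new - q_s)   # units not traded
rationing_rect = (p_d - p0) * q_s                 # possible waste from queuing

print(q_s, q_d, shortage, round(p_new, 4), round(q_new, 4))
print(p_d, dwl_triangle, rationing_rect)
```

In this sketch the rectangle is larger than the triangle, which shows why the cost of non‐price rationing can matter more than the lost trades themselves; how much of the rectangle is actually wasted depends on how the good ends up being rationed.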
Since we’ve just shown how taxes, subsidies, price floors and price ceilings all
create deadweight losses, it seems that there’s no reason they should exist.
However, in general, economists won’t mind taxing industries with spillover costs
(such as pollution) or subsidizing industries with spillover benefits, as this actually
causes the amount in production to approach the optimum.
Chapter 18 Terminology
The following is a list of terms that you should know in order to discuss and
apply the material from this chapter.
Consumer Surplus The benefit received by consumers who can buy a product for
less than their willingness to pay for it. Approximately, it is the triangular area
under the demand curve and above the market price.
Producer Surplus The benefit received by producers for selling goods at a certain
price. In essence, it is the revenue they receive in excess of their variable costs.
Approximately, it is the triangular area above the supply curve and below the
market price.
Ad Valorem Tax A tax proportional to value, charged as a percentage of the price
suppliers receive. This type of tax rotates the supply curve upward, increasing its slope.
Per‐Unit Tax A tax charged to suppliers per unit. This type of tax would cause the
supply curve to shift up (left), i.e. decrease supply.
Tax Revenue The amount of money received by the government created by
implementing the tax.
Deadweight Loss The burned up value that occurs when the amount of resources
devoted to an industry is not optimal, i.e. when the consumer and producer surplus
is not maximized.
Price Floor An inefficient minimum level above equilibrium price that can be
charged for a product. The government usually imposes this because equilibrium
price is considered too low. The government will decide to buy any surplus created
by the floor.
Price Ceiling An inefficient maximum level below equilibrium price that can be
charged for a product. The government usually imposes this after a disaster to
prevent price gouging. The argument in favor of this type of intervention is
increased fairness.
Chapter 19
Market Structure Wrap Up
Summary of Models
The figure below provides a visual summary of the models we covered in the last
four chapters. At the far left, we have the simplest, monopoly. A monopoly’s product
is so differentiated from its nearest competitor that its pricing and output decisions
have no impact on any other firms, and no other firm’s decisions have any impact on
it. Of course, it is hard to imagine any firms whose market power is that extreme in
reality, but, it provides a very useful theoretical benchmark.
Assuming the monopoly generates economic profit (which is not a sure thing;
who wants a monopoly on 8‐track cassette manufacturing?), other firms will want to
enter the market to claim their share of the pie, if entry is possible. Three kinds of
things may stand in the way of entrants – legal barriers, the incumbent’s strategy,
and sheer economies of scale. Broadly speaking, legal barriers would include patent
protections, licensing requirements, and any other form of regulation, tax, or
government policy that makes it hard for a new competitor to enter a market.
Possible incumbent strategies – like entry‐limit pricing – we will cover later.
[Figure: summary of market structure models, running from Monopoly at the far left to Perfect Competition, with the note "Fewer Entry Barriers Mean More Firms in LR." Panel labels:
Monopoly.
Homogenous Products: capacity limits quantity comp, excess capacity fierce price comp, cost management accentuated, advertising at industry level.
Oligopoly (homogenous products): reaction functions slope down in quantity comp; price comp can yield π=0 with 2 firms.
Perfect Competition: Supply and Demand where π≈0.
Differentiated Products: price or quantity comp (depends on determinants of capacity limits), firm level advertising accentuated, cost management muted.
Oligopoly (differentiated products): reaction functions slope up in price comp, down in quantity comp.
Monopolistic Competition.]
If economies of scale are so large that if one firm is in the market it makes a
profit but if a second enters the market both firms make losses, the industry is
characterized by natural monopoly. Traditionally, many utilities are examples of
natural monopoly. For example, the electric power industry, at least on the
distribution end, is taken to be a natural monopoly. The reason is that it is so
expensive to put in place the infrastructure for electricity distribution that it would
never make sense to operate two parallel distribution grids at the same time. This is
one reason why utilities tend to be regulated – there is little room for market
competition to bring prices down closer to marginal costs, and thus to bring the
equilibrium output close to the socially efficient level.
Ignoring legal and strategic barriers, if economies of scale are not too severe,
more firms enter. If the entering firms produce products that are essentially
identical, we have a homogenous product oligopoly (moving clockwise along the top
of the figure). If capacity is essentially unlimited, in that each firm has enough
capacity to meet any possible quantity demanded, price competition between two
firms can be so fierce that profits are driven down to zero, and the firms split the
market. Logically, it would then make no sense to build the capacity needed to serve
the whole market.
So, with homogenous products, the strategic decision has to do with capacity,
which indirectly determines price. The fact that products are homogenous means
firm strategy has more to do with cost reduction than with stimulating demand.
Indeed, the strategic effects of a cost reduction in quantity competition magnify its
benefits, while the benefits of any firm level advertising spill over to all firms. Thus,
firms in truly homogenous product industries tend to advertise cooperatively
through trade councils and industry associations, not individually and in a
competitive manner.
As long as additional firms can expect to earn profits by entering, they will
continue to do so. In the long run, no additional firms could hope to earn a profit if
they entered. If economies of scale are such that the industry can support only a
small number of firms, strategic considerations will remain important. Indeed, one
way for a firm or an entrepreneur to make a profit, at least for some time, is to find a
more efficient way to fill the needs of customers, reducing costs and reaping the
direct and strategic benefits of doing so. But market forces mean these profits
will be eroded over time.
On the other hand, if economies of scale are very small relative to the scale of
market demand, many firms can enter. As they do so, the market share of any
individual firm gets small. As this happens, the effect of any one firm’s decision to
sell another unit of output is spread over more and more firms. So, marginal
revenue grows ever closer to market price. In the extreme, marginal revenue is
exactly equal to price when market share goes to zero. No firm has a market share of
0, but, in practical terms, the strategic effects of one firm on others become trivial as
market share gets small. So, if we want to consider the performance of a market as a
whole, it makes sense to treat firms as “price takers” and thus ignore complications
arising from strategic effects when the number of firms is “reasonably” large. What
constitutes “reasonably” large is in the eye of the beholder and depends on your
judgment, given the needs of the question at hand. Industries of price taking firms
are termed perfectly competitive.
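The convergence of marginal revenue to price can be illustrated numerically. The sketch below is only an illustration under assumptions that are not part of the discussion above: linear inverse demand p = a − bQ, a constant marginal cost c, and n identical firms competing in quantities. The function name and the parameter values are hypothetical. Under these assumptions a firm's marginal revenue is p − b·q, so the gap between price and marginal revenue is b·q, which shrinks as each firm's share 1/n shrinks.

```python
def symmetric_cournot(a, b, c, n):
    """Symmetric quantity-competition equilibrium with n identical firms.

    Illustrative assumptions: inverse demand p = a - b*Q and constant
    marginal cost c. Each firm's first order condition is
    a - b*Q - b*q - c = 0, and Q = n*q in a symmetric equilibrium.
    """
    q = (a - c) / (b * (n + 1))   # output per firm
    p = a - b * n * q             # market price
    mr = p - b * q                # firm's marginal revenue, d(p*q)/dq
    return {"share": 1 / n, "price": p, "marginal_revenue": mr, "gap": p - mr}


if __name__ == "__main__":
    # Hypothetical demand and cost parameters.
    for n in (1, 2, 10, 100, 1000):
        eq = symmetric_cournot(a=100.0, b=1.0, c=20.0, n=n)
        print(n, round(eq["price"], 2), round(eq["gap"], 4))
```

With these numbers the gap between price and marginal revenue falls from 40 with one firm to less than 0.1 with a thousand firms, which is the sense in which treating firms as price takers becomes a reasonable approximation.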
On the other hand, if each entrant produces a slightly different product from the
other firms in the industry, we need models of differentiated product industries.
Since the products are differentiated, entrants will not take all profit from
incumbent firms simply by undercutting their prices a bit. That softens price
competition – even when there is excess capacity, firms may make economic profits
if the number of competitors is limited. In that case, strategic decisions may involve
either price or quantity. In markets where lead and lag times are short, so that it is
easy to expand or contract production sizably at a moment’s notice, it may make the
most sense to think about these firms engaging in price competition. In markets
where lead and lag times are significant, and in which it takes a long time to ramp
production up or down, it makes sense to think in terms of quantity competition –
with capacity being the strategic variable.
With differentiated products, the role of cost management is muted. If a firm
can’t capture as many of an opponent’s customers by undercutting their price, there
is less reason to invest in cost reduction. Further, the opponents will respond by
lowering their price as well, strategically offsetting part of the direct benefit of the
cost reduction. On the other hand, the role of advertising is enhanced. This might be
advertising to inform customers of a product’s existence and characteristics, to
persuade customers to switch, or, to build brand loyalty and thus prevent customers
from switching. Such advertising might emphasize differences that are potentially
important to customers, or, exploit human foibles and induce customers to prefer
one product over another for no reason embodied in the product itself.
When advertising is thrown into the strategic mix, the picture becomes much
more complex. That is because both price and advertising affect demand and
therefore marginal revenue. So, when one firm advertises more, their reaction
function shifts up – they will charge higher prices. That leads the other firm to raise
their price as well. However, the change in advertising might also decrease the other
firm’s demand, shifting their reaction function. To correctly analyze the situation, it
is necessary to simultaneously determine price or quantity and advertising in the
Nash equilibrium. While the complexity of that problem renders its solution
somewhat beyond the scope of the class, the important thing to take away is that
decisions about price or advertising should be made jointly, and, that changes that
lead one firm to advertise more or less will induce changes in both firms’ prices and
the other firm’s advertising.
Firms will continue to enter as long as they can do so profitably. When
economies of scale are such that more than a few firms can enter, the market
structure may be referred to as monopolistic competition. Since products are
differentiated, entry means an increase in product variety. As there are more and
better substitutes, each firm will face a more elastic demand. As the product space
becomes more densely packed, the firms have less and less control over their own
price. In the limit, marginal revenue approaches price as the number of firms
becomes very large. So, when economies of scale allow the number of firms in a
differentiated product market to become large, those firms essentially become price
takers.
Thus, whether entry occurs in the form of identical products (the top of the
figure) or differentiated products (the bottom of the figure), if enough firms enter,
the firms become price takers in the limit. When firms are taken as price takers, we
think of them as perfectly competitive. Once we decide to just treat firms as perfectly
competitive, it becomes quite simple to relate the conditions facing a representative
firm to market supply, and, to relate the interaction of market supply and demand to
the decisions of a representative firm. That gives us a simple yet powerful tool to
structure our thinking about market outcomes.
Generally, as markets become more competitive, prices fall, as do profits. Firms
may wish to undertake strategies to restrict the level of competition, or, at least, to
soften it. On the other hand, as markets become more competitive, output increases,
as does value added in the form of consumer and total surplus. Absent externalities
of one sort or another, and ignoring concerns about the initial distribution of wealth,
outcomes in a competitive market are socially efficient, in that everything with a
marginal value in excess of its cost is produced, and, in the long run, everything is
produced at the minimum long run average cost, so there is no excess capacity.
Thus, government rules and regulations and antitrust policies ostensibly aim to
promote pro‐competitive practices and to curb anti‐competitive practices and to
smooth over market imperfections.
In order to gauge the competitiveness of markets, it is necessary to measure
markets and market power. Concentration ratios are one way to measure how
competitive an industry is. Cm is the total market share of the largest m firms in an
industry. The idea is that the larger the ratio, the more concentrated the industry,
and, the less likely it is to be highly competitive. If si is the market share of firm i,
and the n firms in the industry are indexed in descending order of their market
share, so i=1 is the largest firm and i=n is the smallest firm:
$$ C_m = \sum_{i=1}^{m} s_i. $$
C4, which is the most commonly used, measures the total market share of the
largest 4 firms.
Simple concentration ratios are limited because they say nothing about the other
firms in a market. If the largest four firms hold 60% of a market, it may matter a
great deal whether the next two largest firms have a 30% market share or a 5%
market share. A Herfindahl Hirschman index is an index that measures the
concentration of all firms in an industry. The formula is
$$ \mathrm{HHI} = 10000 \sum_{i=1}^{n} s_i^2. $$
Firm     100s    10000s²
1          20       400
2          20       400
3          10       100
4          10       100
5          10       100
6          10       100
7          10       100
8           4        16
9           4        16
10          2         4
Total     100      1336
The sum of squared shares is multiplied by 10,000 simply to get rid of the large
number of decimals that arises when squaring small market shares, e.g.
0.01² = 0.0001. If an industry were a pure monopoly, its HHI would be 10,000. As
every firm’s market share approaches zero (perfect competition), HHI approaches 0.
The table above shows a sample calculation for a hypothetical 10 firm industry.
Concentration measures for selected U.S. industries are shown in the table below.
Concentration Measures for Selected US Industries
Industry                    C4   HHI    Industry                     C4   HHI
Fluid milk                   4   101    Men's & boys' neckwear        6   140
Soft drink                   4   710    Printing                      1    48
Breakfast cereal             8   300    Pen & mechanical pencil       7   195
Bread & bakery products      4   581    Ready‐mix concrete            1    57
Sugar                        5   856    Dental equipment & supply     3   437
Distilleries                 7   209    Basic chemical                1   160
Furniture & related          1    57    Battery                       5   958
Wood container & pallet      7    26    Petroleum refineries          4   809
Paper mills                  5   883    Automobile                    8   275
Corrugated & solid fiber     3   392    Boat building                 3   573
US Census Bureau, http://www.census.gov/epcd/www/concentration.html
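The sample calculation for the hypothetical 10 firm industry can be checked with a few lines of code. This is a minimal sketch; the function names are our own, and shares are entered in percent so that squaring them gives 10000s² directly.

```python
def concentration_ratio(shares_pct, m=4):
    """C_m: combined market share (in percent) of the m largest firms."""
    return sum(sorted(shares_pct, reverse=True)[:m])


def hhi(shares_pct):
    """Herfindahl Hirschman index from market shares given in percent.

    If s is a share written as a fraction, 10000 * s**2 == (100 * s)**2,
    so squaring the percentage shares and summing gives the index.
    """
    return sum(s ** 2 for s in shares_pct)


# Market shares (percent) of the hypothetical 10 firm industry in the text.
sample_industry = [20, 20, 10, 10, 10, 10, 10, 4, 4, 2]

if __name__ == "__main__":
    print("C4  =", concentration_ratio(sample_industry))  # 60
    print("HHI =", hhi(sample_industry))                   # 1336
    print("Monopoly HHI =", hhi([100]))                    # 10000
    print("Equal duopoly HHI =", hhi([50, 50]))            # 5000
```

Note that the same C4 of 60 is consistent with many different values of HHI, depending on how the remaining 40% of the market is divided.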
There are some inherent problems with using these ratios to measure
competitiveness. First, just defining the market can be very problematic. Geography
comes into play. For example, if you’re trying to measure the concentration of the
distillery industry and you leave out imports, you’re going to overstate the
concentration of the industry. If you’re looking at the ready mix concrete industry,
you must look city by city, since suppliers in New York won’t ship ready mix
concrete to Florida, due to basic logistics. The numbers in the table above are flawed
for both of these industries, since they include only US firms but include all US
firms regardless of location. Thus, the more closely the market definition matches
the set of products that are truly substitutes for one another, the better the
concentration ratio represents competitive conditions. Beyond geography, a product
must also match the customer’s timing and intended use in order to be a real
substitute.
An even bigger problem with these concentration ratios is that they ignore
strategic interaction – that is, they ignore the nature of the competition between the
firms! We’ve seen models in which two firms sell homogenous products, each has
unlimited capacity, and price competition drives profits down to zero. The HHI for
this industry would be 5,000, which would imply that it is highly concentrated, but
the firms are just breaking even. It would seem better, then, to look at more direct
measures of market power.
The most direct measure is the gap between price and marginal cost, summarized
by the mark up factor. Recall
$$ p^* = \left( \frac{\eta}{1+\eta} \right) MC, $$
where η/(1+η) is the mark up factor. For a perfectly competitive industry, the mark
up factor is 1, or, price equals marginal cost. To the extent that this factor increases,
the market becomes less competitive.
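To get a feel for the mark up factor, the short sketch below evaluates η/(1+η) for a few firm level elasticities. The function names and the elasticity values are hypothetical, chosen only for illustration; the formula is the one recalled above.

```python
def markup_factor(eta):
    """Mark up factor eta / (1 + eta) for a firm level demand elasticity eta.

    Only meaningful for elastic demand (eta < -1), where marginal revenue
    is positive.
    """
    if eta >= -1:
        raise ValueError("the mark up formula requires eta < -1")
    return eta / (1 + eta)


def profit_maximizing_price(eta, marginal_cost):
    """p* = [eta / (1 + eta)] * MC."""
    return markup_factor(eta) * marginal_cost


if __name__ == "__main__":
    # Hypothetical elasticities, from fairly inelastic to nearly price taking.
    for eta in (-2.0, -5.0, -50.0, -1000.0):
        print(eta, round(markup_factor(eta), 3))
```

As demand facing the firm becomes more elastic, the factor falls toward 1 and price falls toward marginal cost.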
The biggest problem with this method is that it is almost impossible to measure
short run marginal cost at a specific point in time accurately. First of all, it can be
hard to disentangle which costs are actually attributable to production of the
marginal unit. Second, we must measure the full economic cost of production,
including a normal (risk adjusted, after tax) return on capital. Often, about the best
we can hope for is that average cost is close to marginal cost and that the
accountants’ measures of total cost come close to capturing economic cost. In that
case, total accounting cost divided by output may be a reasonably good proxy for
marginal cost.
A 1987 working paper by Matthew Shapiro provides the most comprehensive
attempt to measure market power in US industries. He undertook an econometric
analysis of data from the U.S. National Income and Product Accounts. Not only is the
work just over 20 years old (at the time of this writing), but, the assumptions
involved in the econometrics are extremely tenuous – constant demand elasticity at
the firm level, constant returns to scale, constant share of labor in total cost. But, it is
still the best comprehensive set of estimates available. For any individual case, it
would be better to rely on accounting data specific to the firm(s) involved (which
may be proprietary and would not generally be easily available other than as a part
of some legal proceeding). But, our purpose is to paint the U.S. market in broad
strokes, and for that Shapiro’s paper is the best available. Calculations based on his
research are shown in the table below.
Market Power Measures for Broad US Industries, 1949–1985
                                 Mark Up    Demand Elasticity     Market/
Industry                         Factor     Firm       Market     Firm
Agriculture                      1.01       -96.2      -1.8       0.02
Construction                     1.24       -5.2       -1         0.19
Durable Manufacturing            1.40       -3.5       -1.4       0.40
Nondurable Manufacturing         1.42       -3.4       -1.3       0.38
Transportation                   2.11       -1.9       -1         0.53
Communication and Utilities      2.25       -1.8       -1.2       0.67
Wholesale Trade                  2.67       -1.6       -1.5       0.94
Retail Trade                     2.25       -1.8       -1.2       0.67
Finance                          1.22       -5.5       -0.1       0.02
Services                         1.04       -26.4      -1.2       0.05
Shapiro, Matthew D., "Measuring Market Power in U.S. Industry," NBER Working
Paper No. 2212, 1987
The first column of the table above shows Shapiro’s estimate of the firm level
profit‐maximizing mark up factor. Not surprisingly, it looks like agriculture and
services are highly competitive, with a mark up factor essentially equal to 1.
Communications and utilities show considerably more market power (though the
industries were regulated, so, it is hard to know what to make of that).
Transportation and wholesale and retail trade also showed considerable market
power over the period in question. With declines in transportation costs and the rise
of the internet, both of which make it easier to access substitute products, it is
almost certain that market power in those industries is lower now than it was on
average over the time period of Shapiro’s sample. Construction and both durable
and non‐durable manufacturing were intermediate cases – some market power, but,
not much.
The mark up factor on its own lacks an upper benchmark. It would be 1 for
perfect competition. But, with a pure monopoly, the mark up would depend on the
market level elasticity of demand. Shapiro and others have suggested comparing the
elasticity of demand faced by a firm (ηFirm) with the elasticity of demand for the
market as a whole (ηMarket) by dividing the market elasticity by the firm level
elasticity, (ηMarket/ηFirm). A perfectly competitive firm faces a demand elasticity of ‐∞,
so the ratio would be 0. A monopoly faces the whole market, so the ratio would be 1.
The ratio thus has a neat upper and lower bound corresponding to monopoly
and competition. There are two additional (related) problems, though. While the
firm level demand elasticity can be inferred from the mark up factor, we have to
obtain a measure of the market demand elasticity. That requires a lot of data and
provides more places to go wrong. Second, with differentiated products, it is difficult
to even define what “market” demand is. All that really exists are separate but
related demands for the different but related products. The best interpretation is
something like the average percentage decline in purchases when ALL prices go up
by 1%. That, obviously, is a hard thing to infer from any actually observed data.
Shapiro, however, provides econometric estimates of this additional parameter as
well.
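The link between the columns of the table can be sketched in code. Inverting the mark up formula gives ηFirm = m/(1 − m) for a mark up factor m, and the ratio is then ηMarket/ηFirm. The function names are our own, and the three rows used are taken from the table above; because the published mark up factors are rounded, the results should match the last column only approximately.

```python
def firm_elasticity_from_markup(markup):
    """Invert markup = eta / (1 + eta) to get eta = markup / (1 - markup)."""
    if markup <= 1:
        raise ValueError("a mark up factor above 1 is needed for a finite elasticity")
    return markup / (1 - markup)


def market_power_ratio(eta_market, eta_firm):
    """eta_market / eta_firm: near 0 for a price taker, 1 for a monopoly."""
    return eta_market / eta_firm


# (mark up factor, market demand elasticity) for three rows of the table.
rows = {
    "Construction": (1.24, -1.0),
    "Durable Manufacturing": (1.40, -1.4),
    "Wholesale Trade": (2.67, -1.5),
}

if __name__ == "__main__":
    for industry, (markup, eta_market) in rows.items():
        eta_firm = firm_elasticity_from_markup(markup)
        ratio = market_power_ratio(eta_market, eta_firm)
        print(industry, round(eta_firm, 1), round(ratio, 2))
```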
The last column of the table above presents his estimates of (ηMarket/ηFirm).
Wholesale trade was quite monopolized over the sample period, as were retail trade
and communications and utilities. Communications and utilities were largely
regulated natural monopolies at the time. Wholesale and retail trade likely had
monopoly power due to a combination of economies of scale and geographic
isolation. Most likely, they are far more competitive now. Agriculture, construction,
finance, and services appear quite competitive, overall.
Any attempt to quantify market power will be imprecise. If it is based only on
observed market share and concentration indices, it will be plagued by problems of
market definition and the fact that it ignores the nature of competition. If based on
observed mark ups, it will suffer from the difficulty of measuring actual economic
marginal cost at any particular point in time. Estimates of concentration and mark
up are important clues about market power, but, are not sufficient in and of
themselves to provide evidence of either market power or anti‐competitive
practices. It is also important to consider things like the availability of substitutes,
the presence or absence of economies of scale and excess capacity, and, other
potential barriers to entry.
Antitrust Law and Policy
Since perfectly competitive markets allocate resources efficiently, the U.S.
government, through the Department of Justice (DOJ), uses antitrust laws to restrict
anti‐competitive practices. What follows is by no means an exhaustive or precise
consideration of antitrust law or policy. It is merely a rough pass at some of the
larger issues.
The US Department of Justice considers industries with an HHI above 1800
“highly concentrated” and gives special attention to mergers and other practices in
such industries. It also pays special attention to mergers that cause the HHI to
increase by over 100. These are benchmarks for attention, not hard and fast rules
about what mergers to allow or when to break up a large company. Market definition
plays a crucial role in making such judgments, and many economists and other
consultants are employed as experts in litigation to argue over exactly which
products are substitutes for the products of firms under DOJ antitrust scrutiny.
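To see how these benchmarks might be applied, the sketch below compares the HHI before and after a hypothetical merger in the 10 firm sample industry used earlier in the chapter. The function names and the choice to report the two benchmarks as separate flags are our own assumptions, not DOJ procedure. When firms with percentage shares a and b merge, the index rises by (a+b)² − a² − b² = 2ab.

```python
def hhi(shares_pct):
    """Herfindahl Hirschman index from market shares given in percent."""
    return sum(s ** 2 for s in shares_pct)


def merger_screen(shares_pct, i, j, level=1800, change=100):
    """Compare a merger of the firms at positions i and j (0-based) with
    the two benchmarks mentioned in the text. Illustrative only."""
    if i == j:
        raise ValueError("a merger needs two different firms")
    pre = hhi(shares_pct)
    merged = [s for k, s in enumerate(shares_pct) if k not in (i, j)]
    merged.append(shares_pct[i] + shares_pct[j])
    post = hhi(merged)
    return {
        "pre": pre,
        "post": post,
        "delta": post - pre,                  # equals 2 * share_i * share_j
        "highly_concentrated": post > level,  # HHI above 1800 after merger
        "large_increase": post - pre > change,
    }


sample_industry = [20, 20, 10, 10, 10, 10, 10, 4, 4, 2]

if __name__ == "__main__":
    print(merger_screen(sample_industry, 0, 1))  # the two largest firms
    print(merger_screen(sample_industry, 8, 9))  # the two smallest firms
```

A merger of the two largest firms raises the index from 1336 to 2136, crossing both benchmarks, while a merger of the two smallest firms raises it by only 16.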
U.S. antitrust policy depends greatly on the “Rule of Reason” articulated in the
1911 U.S. Supreme Court ruling in Standard Oil Co. of New Jersey v. United States, 221
U.S. 1. The ruling held that only unreasonable restraints on trade are subject to
actions under U.S. antitrust laws, and that market power in its own right is not
illegal. For our purposes, that means things are taken on a case‐by‐case basis. For
example, a merger that increases concentration and mark up, but, allows the new
larger firm to exploit economies of scale and scope and therefore to actually offer
lower prices to consumers would not be blocked simply because it might increase
market power by some measures. A contract or practice that may appear to restrain
trade on its face may be allowed if there is some sound economic reason for it
beyond simply restraining trade to boost profits at the expense of customers.
Anti‐ and Pro‐Competitive Strategies
Firms, however, are perfectly happy to restrain trade simply to boost profits. We
regularly hear accusations in the press that some big firm or another is engaged in
predatory pricing, for example. In this section, we consider a number of pricing
strategies that have anti‐ or pro‐competitive effects. Whether or not pricing
strategies that drive out competitors are in a firm’s interest, or even feasible, is not
as straightforward as it might seem. And, when they are feasible and in a firm’s
interests, they might or might not be found to violate antitrust policy – that is an issue
for a class in antitrust law or economics.
Preventing Entry with Entry‐Limit Pricing
One of the tools a monopolist has in order to keep out potential competition is
entry‐limit pricing. The idea is that the monopolist, when faced with possible entry,
could lower prices to a point where the entrant cannot compete. That would keep
the entrant out and leave the monopolist the sole firm in the industry.
However, remember our discussion on credible threats. The only incentive the
monopolist has for lowering price is to keep the entrant out. If the entrant actually
does enter, the monopolist has no incentive to keep prices low. So, the entrant can
predict that if they entered, the rational monopolist would accommodate the entry.
So, pricing below the profit‐maximizing level simply to deter entry is not a credible
threat that post‐entry prices will remain low.
Entry‐limit pricing is only feasible if the monopolist can commit to the threat of
keeping prices low after entry. The goal of entry‐limit pricing is to credibly
commit to a price and output level that leaves the entrant’s residual demand
below his long‐run average cost curve, as in the figure below. If the monopolist can
credibly commit to a strategy that leads to this outcome, the entrant will never be
profitable and thus won’t enter. How can the monopolist make his threat credible?
[Figure: the entrant’s residual demand curve (dEntrant) lies below the entrant’s
long‐run average cost curve (LRACEntrant).]
One way is to reduce marginal cost, so that the profit‐maximizing price itself limits
entry. This might be done by adopting a technologically advanced factory that has
high fixed costs but low marginal costs. With low marginal cost, the profit‐maximizing
price (where MR=MC) will be low and quantity will be high. It might also be done by
moving first and building a much larger capacity than would otherwise be profitable.
This is related to the type of production technology adopted, trading a high fixed
cost for a low marginal cost to deter entry.
A first mover also has the chance to reduce marginal costs by moving far
along the learning curve. Producing more than would otherwise maximize profit
early on can lead to opportunities to learn by doing that reduce cost and improve
efficiency. An aggressive first mover may learn a lot about the industry and develop
such an advantage in their distribution network, operating structure, etc. that the
competition is never able to “catch up,” making entry unattractive.
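The problem is that even if you accomplish both of these things, your marginal
cost may still not be low enough to keep out potential entrants. Even if marginal
cost is 0, the profit‐maximizing price is positive. If that price is higher than the
entrant’s cost, they will still have an incentive to enter. This is shown in the figure
below.
[Figure: monopoly pricing with MC=0. Axes: quantity q (horizontal), price p
(vertical). Curves: demand D and marginal revenue MR. MR reaches zero at q*, and
the price p* read off the demand curve at q* is positive.]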
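A quick numerical check of this point, assuming a linear inverse demand curve p = a − bq (an assumption made only for this sketch, with hypothetical parameter values and function names). Marginal revenue is then a − 2bq, so with MC = 0 the monopolist sells q* = a/(2b) at p* = a/2, which is positive no matter how low cost has been pushed.

```python
def monopoly_outcome(a, b, mc=0.0):
    """Profit-maximizing quantity and price for inverse demand p = a - b*q.

    Marginal revenue is a - 2*b*q; setting MR = mc gives q* = (a - mc) / (2*b).
    """
    q_star = (a - mc) / (2 * b)
    p_star = a - b * q_star
    return q_star, p_star


def entry_still_attractive(p_star, entrant_unit_cost):
    """Entry remains tempting if the incumbent's price exceeds the entrant's cost."""
    return p_star > entrant_unit_cost


if __name__ == "__main__":
    q_star, p_star = monopoly_outcome(a=100.0, b=1.0, mc=0.0)
    print(q_star, p_star)                         # 50.0 50.0
    print(entry_still_attractive(p_star, 30.0))   # True
    print(entry_still_attractive(p_star, 60.0))   # False
```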
The next possibility is to develop a
reputation as a fighter. In this case,
the monopolist deters entry through his reputation, as opposed to a physical
reduction in marginal cost. What makes the threat credible here is the present value
of future payoffs from keeping entrants out; in other words, if this present value is
high enough, all players will rationally believe that the monopolist will lower prices
to secure his reputation as a fighter (as long as the monopolist doesn’t violate
antitrust legislation).
Given a monopolist that has secured a strategy that is credible enough to keep
entrants out, we now have to ask whether or not this strategy is a good one. To do
this, we need to compare the expected present value of profits when we engage in
entry‐limit pricing to the expected present value of profits without entry‐limit
pricing, or
$$ \mathrm{EPV}(\pi \mid \text{Limit}) > \mathrm{EPV}(\pi \mid \text{Accommodate}). $$
If the profit from limiting is higher, it is a reasonable strategy. In a simple world, we
can imagine a monopolist who limits entry and gets a lower profit (πL) this period
and every period thereafter, versus one who doesn’t limit entry and gets a
monopoly profit (πM) this period, and an accommodating profit (πA) every period
thereafter, or
$$ \pi_L + \frac{\pi_L}{r} > \pi_M + \frac{\pi_A}{r}, $$
where r is the interest rate. Comparing πM with πL, we know the entry‐limiting
monopolist has taken steps, such as an investment in a plant with a very low
marginal cost or a very high capacity, but a very high fixed cost, in order to secure a
strategic advantage for the future of credibly keeping entrants out. Therefore, the
per‐period profit of the monopolist that is attempting to limit entry will be lower
than a monopolist that is simply maximizing profit (πM > πL). The deciding question
now becomes how much higher is the entry‐limit profit than the accommodating
profit (πL – πA). This will ultimately decide whether the strategic advantages are
worth the costs of obtaining a reputation as a fighter or reducing marginal cost to
the point that entrants are just not interested. That is, is the flow of future extra
profits from limiting entry high enough to justify the loss of today’s monopoly profit:
$$ \frac{\pi_L - \pi_A}{r} > \pi_M - \pi_L. $$
There is no reason to think this will necessarily hold, even when entry‐limit pricing
is feasible, which it may not be.
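The comparison can be worked through with numbers. In the sketch below the profit levels and interest rates are hypothetical, and the function names are our own. The condition (πL − πA)/r > πM − πL is equivalent to r < (πL − πA)/(πM − πL), so limiting entry pays only when the interest rate is below that break‐even level.

```python
def pv_limit(pi_l, r):
    """Present value of earning pi_l now and in every later period."""
    return pi_l + pi_l / r


def pv_accommodate(pi_m, pi_a, r):
    """Present value of pi_m now and pi_a in every later period."""
    return pi_m + pi_a / r


def limiting_entry_pays(pi_l, pi_m, pi_a, r):
    """True when limiting entry has the higher present value."""
    return pv_limit(pi_l, r) > pv_accommodate(pi_m, pi_a, r)


def break_even_rate(pi_l, pi_m, pi_a):
    """Interest rate at which the two strategies tie (requires pi_m > pi_l)."""
    return (pi_l - pi_a) / (pi_m - pi_l)


if __name__ == "__main__":
    # Hypothetical per-period profits: monopoly 100, limit 80, accommodate 40.
    print(pv_limit(80.0, 0.25), pv_accommodate(100.0, 40.0, 0.25))  # 400.0 260.0
    print(limiting_entry_pays(80.0, 100.0, 40.0, 0.25))             # True
    print(break_even_rate(80.0, 100.0, 40.0))                       # 2.0
```

With these numbers limiting entry pays at any plausible interest rate. If the limit profit were only slightly above the accommodating profit, the break‐even rate would be low and accommodation could easily be the better choice.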
Eliminating Competitors with Predatory Pricing
This is a tactic used to drive an existing competitor out of the market; however, it
relates to entry‐limit pricing in the sense that the threat must be credible. Thus,
temporarily lowering prices to drive out a competitor and then, once your
competitor is gone, raising prices to monopoly levels will only attract new
competition. For this to really work, it must be rational to lower prices today to
push a competitor out of the market, but also to keep prices low enough in the long
run to dissuade future entrants from entering the market.
To determine if it is rational to use predatory pricing to drive out current
competitors, simply compare the present value of profits using predatory pricing
with the present value of profits without using predatory pricing. Regarding legality,
it is not inherently illegal to price below your opponent’s cost. If you eliminate
competitors by having a lower marginal cost, that may even be good for customers.
Having a reputation for pricing below your own cost to harm competitors, though, is
grounds for more serious legal antitrust trouble.
Pre‐Emptive Product Introduction in Differentiated Product Markets
Unless there are barriers to entry, π > 0 leads to entry. When we first introduced
differentiated products, there were two firms on opposing ends of a line that
represented different levels of a certain characteristic of a good. In reality, there are
many different characteristics that firms can use to differentiate their products.
Imagine two different properties of a good, x1 and x2, and imagine each point in the
following graph represents products of competing firms containing different
amounts of each characteristic. Remember, each characteristic could be anything
that affects customers’ preferences, such as sweetness, location, etc.
Suppose the dots in the figure below represent actual products, your firm’s
product is the light‐colored dot, and you are making a profit. The squares represent
potential points where another product may be introduced that would attract
enough customers to be profitable, but which would reduce your firm’s profits. That
is, you are vulnerable to entry in the market niches corresponding to the squares.
One way to combat this potential threat is to consider preemptive product
introduction. In other words, you could sell new products that would take some of
the demand away from your initial product, in order to keep other competitors out
of the market. Basically, product positioning is a very strategic decision that can be
used to keep other competitors out, as well as keep demand focused on your most
profitable product.
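[Figure: product space with characteristic x1 on the horizontal axis and x2 on the
vertical axis. Dots mark existing products, and squares mark niches vulnerable to
entry.]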
Softening Competition with a Price Match Guarantee
This is where a firm guarantees that, on a similar product, any lower price
charged by a competitor will be matched, and possibly beaten by some additional
amount. On the surface, this seems like it protects customers that are buying from
the firm with the price match guarantee; however, it actually serves to insulate
firms from price competition. Suppose two firms each have price match guarantees,
and imagine they are selling homogenous products. Without the guarantee, each firm
would want to undercut the other to take away market share, driving price down to
marginal cost. With the guarantee, neither firm has any incentive to undercut the
other, since the rival will just match that price; thus, each firm leaves prices at
the monopoly level. Price match guarantees therefore effectively restrain
competition, and are a mechanism firms use to keep prices high.
Limiting Entry by Raising Costs
Another way to limit entry is to increase costs for the entire industry. One way to
do this is by increasing fixed costs. Suppose there is a monopolist in an industry and
his profit is πM = 100. If entry were to occur, both firms would make a profit of π =
20. Now imagine the monopolist goes to the government and says his industry
contributes to global warming, and the government should charge every firm in the
industry $21 per year. This is essentially an increase in fixed costs, since it’s not
dependent on a firm’s output. The profit of each firm if entry occurs is now π = ‐1, so
the entrant does not enter. The monopoly profit becomes πM = 100 – 21 = 79, which
is better than 20, so this is a rational strategy to keep potential entrants out.
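The arithmetic of this example generalizes to any lump sum fee. The sketch below uses the numbers from the text (monopoly profit 100, profit of 20 per firm after entry); the function name is hypothetical, and it assumes the entrant stays out only when its profit after the fee would be negative. The fee deters entry once it exceeds the post‐entry profit, and it is worthwhile for the incumbent as long as what is left of the monopoly profit exceeds 20.

```python
def fee_outcome(pi_monopoly, pi_after_entry, fee):
    """Effect of a lump sum annual fee charged to every firm in the industry.

    Returns (entry_deterred, incumbent_profit, worthwhile), where worthwhile
    compares the incumbent's profit with what it would earn with no fee and
    entry (pi_after_entry).
    """
    profit_if_entry = pi_after_entry - fee
    entry_deterred = profit_if_entry < 0   # assumption: enter unless a loss results
    if entry_deterred:
        incumbent_profit = pi_monopoly - fee
    else:
        incumbent_profit = profit_if_entry
    worthwhile = entry_deterred and incumbent_profit > pi_after_entry
    return entry_deterred, incumbent_profit, worthwhile


if __name__ == "__main__":
    print(fee_outcome(100, 20, 21))  # (True, 79, True): the case in the text
    print(fee_outcome(100, 20, 10))  # (False, 10, False): fee too small to deter
    print(fee_outcome(100, 20, 85))  # (True, 15, False): fee too costly
```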
You can also attempt to increase variable costs for the entire industry. Generally
you will lose more profits (due to the higher costs) when compared to simply raising
fixed costs. As the figure below illustrates, increasing marginal cost will result in
higher prices, meaning a deadweight loss for the industry and lower consumer
surplus (due to the higher prices). So, when limiting entry by raising costs, it usually
makes more sense to raise fixed costs rather than raising variable costs.
[Figure: raising marginal cost from MC0 to MC1. Axes: quantity q (horizontal),
price p (vertical). With demand D and marginal revenue MR, quantity falls from q0
to q1 and price rises from p0 to p1.]
Pricing Below Cost to Promote Competition: Penetration Pricing
This pricing strategy is used to “penetrate” a market, typically when introducing
a new product. A supplier will charge an extremely low price, offer the product for
free, or even pay customers to try a very limited amount of their product to create
awareness in a market. This strategy is particularly effective when the incumbent
firm has some sort of a “lock‐in,” which is when a large group of customers are
already loyal to a particular product. A good example would be if a software
company that produced a substitute for Microsoft Excel were to pay consumers to
use its product, in order to break into the market. While this is pricing below cost,
since it promotes rather than hinders competition, it does not run afoul of antitrust
rules.
Vertical Foreclosure
Suppose there is a supply chain that has two downstream firms that compete
with one another for sales of final products to consumers, and a single upstream
firm that provides each downstream firm with a needed input. Now, if one of the
downstream firms merges with the upstream firm, there will be a new vertically
integrated firm, as well as the other downstream firm. The new firm can now
attempt to squeeze the extra downstream firm out of the market by marking up the
price of the components they sell to them. This is known as vertical foreclosure –
using market power at a different vertical point in the supply chain to reduce
competition at a given horizontal location.
Such behavior is an antitrust violation. Even setting that aside, it may not be in the
new integrated firm’s best interest to engage in this type of pricing. Doing so may
boost the downstream division’s profits, but it does so at the expense of the upstream
division’s profit. If the upstream division sells components to both the downstream
division and the competing downstream firm, it may actually be more profitable for
the upstream division to continue selling to both entities. The best pricing decision
depends on the strategic situation the firms are in.
Chapter 19 Terminology
The following is a list of terms that you should know in order to discuss and
apply the material from this chapter.
Concentration Ratio A measure of how competitive an industry is. Only the largest
firms are measured and compared to the industry as a whole. The larger this ratio is,
the less competitive the industry.
Entry‐Limit Pricing A tool used by a monopolist to lower prices to a point where
an entrant cannot compete. This drives the entrant out, leaving the monopolist the
sole firm in the industry, enabling him to again raise prices. This will be successful
only when a credible commitment is made to maintaining low prices after the
entrant enters.
Herfindahl‐Hirschman Index An index that measures the concentration of all
firms in an industry. Industries range from 0 (perfectly competitive) to 10,000 (pure
monopoly). The U.S. Department of Justice considers an HHI above 1800 “highly
concentrated.”
Penetration Pricing A pricing strategy that is used to enter a market, typically by
introducing a new product. Charging an extremely low price, offering the product
for free, or even paying customers to try the product are ways to create awareness
in the market.
Predatory Pricing A tool used to drive an existing competitor out of a market by
lowering prices to a point where that competitor cannot compete. This will be
successful only when a credible commitment is made to maintaining low prices after
the competitor exits.
Price Match Guarantee A strategy used by firms in an oligopolistic industry to
soften price competition and keep prices high. The firms guarantee that, on a
similar product, any lower price charged by a competitor will be matched. When all
firms use this strategy, they will have no incentive to undercut each other and can
set prices at the monopoly level.
Vertical Foreclosure Occurs when there are two downstream firms and one
upstream firm in a supply chain, one of the downstream firms merges with the
upstream firm, and the integrated firm marks up the price of the input it sells to the
remaining downstream firm in order to squeeze it out of the market.
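As a rough illustration of the two concentration measures defined above, the following Python sketch computes both for a made‐up industry. It assumes the usual conventions, which this list does not spell out: market shares are measured in percent, the concentration ratio adds up the shares of the n largest firms (n = 4 here is an assumption), and the HHI adds up the squared shares of all firms. The share figures are hypothetical.

```python
def concentration_ratio(shares, n=4):
    """Combined market share (in percent) of the n largest firms."""
    return sum(sorted(shares, reverse=True)[:n])


def hhi(shares):
    """Herfindahl-Hirschman Index: sum of squared percentage shares."""
    return sum(s ** 2 for s in shares)


# Hypothetical industry: market shares in percent, summing to 100.
example_shares = [40, 30, 15, 10, 5]

cr4 = concentration_ratio(example_shares)   # 40 + 30 + 15 + 10
index = hhi(example_shares)                 # 1600 + 900 + 225 + 100 + 25
print(f"CR4 = {cr4}%, HHI = {index}")
```

A single firm with a 100 percent share gives an HHI of 10,000, which matches the upper end of the range stated in the definition.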
Part 6
Firm Structure
Chapter 20
Input Procurement and Contracting
Ways to Obtain Inputs
Every stage of the production process requires inputs. There are three main
ways to get them: 1) buy them in the spot market, 2) make them, or 3) sign a
contract with an input supplier. Each is appropriate in different situations.
The spot market is just the free‐market economy in its purest sense; you go to
the market to determine the market price of whatever input you need, and buy as
many of them as you require. Using the spot market is the simplest way to procure
an input. It has the major advantage of allowing the firm to focus more of its
resources on producing the goods it will ultimately sell. Thus, using the spot market
to obtain inputs keeps the firm more specialized.
The spot market functions well as long as there is near perfect information,
products are very standardized, and transactions costs are low. If all of these related
conditions are met, supply and demand will be able to accurately represent the cost
and value of the good, and the spot market will be able to facilitate trade efficiently.
But, the spot market tends to break down if any of these three conditions are not
met. The more differentiated an input is, or the higher the transactions costs of
finding it, the thinner the market will be. These are related; a lack of standardization
means a thin market in which much depends on search and negotiation, which
means high transactions costs. Similarly, when information is incomplete, it may
take a great deal of time just to gather enough information on which to base a
decision, and when information is too asymmetric, every transaction can become a
time consuming negotiation.
A firm that vertically integrates several stages of the production process is
internally creating some of the intermediate inputs needed for its final product. This
gives the firm the most direct control over the production of the input. When the
input needed by the firm is highly specialized and it is simply too hard to negotiate
with an input supplier to provide it, this may be the best solution. But, it also diverts
more of the firm’s focus from producing the goods it will ultimately sell to its
consumers. So, the firm becomes less specialized.
Signing a contract with an input supplier is an intermediate solution. The firm
remains more specialized, but some of its resources are devoted to negotiating
contracts and maintaining the relationship with the input supplier. The economic
aspects of contracting with input suppliers, especially when faced with adverse
selection or moral hazard, are the subject of the rest of this chapter. It should go
without saying that this is not a course in contract law – but I feel it should be said
anyway. The text provides only an overview of the economic aspects of contracting
– when and how contracting can increase the profitability of the firm. How to
actually negotiate, write, and enforce a contract is, of course, well beyond the scope
of a class in managerial economics.
Contracting and Optimal Contract Length
There are both benefits and costs to contracting, and specifically, to contract
length. The major benefit of signing a contract is that it avoids the costs of search,
bargaining, negotiation, and other transactions costs that would be incurred if the
firm had to seek out an input supplier every time it needed more of the input, spell
out in detail exactly what they need, and negotiate purchases one by one. With a
contract, all those details are worked out one time in advance, and then, for the
length of the contract, they are not incurred again. The higher those types of
transactions costs, the higher the marginal benefit of extending contract length a bit.
One of the major costs of contracting has to do with the complexity of the
contracting environment. How many contingencies can be foreseen? How easily can
what to do in each contingency be agreed upon? What must be done so the contract
is readily enforceable in each foreseeable contingency? How expensive is it to pay
for enough time for the firm’s managers, lawyers, etc. to actually negotiate the
contract? The more complex the contracting environment, and the more expensive
it is to negotiate the contract, the higher the marginal cost of extending the contract
length a bit.
The other major cost of contracting has to do with the cost of being tied down in
the future. This cost has primarily to do with raw uncertainty ‐ the unknown
unknowns. To the extent that unknowns can be enumerated and assigned rough
probabilities, they can be dealt with (perhaps at great expense) in the contingencies
spelled out in the contract. But, what to do in contingencies that cannot be
reasonably foreseen beforehand obviously cannot be spelled out in a contract. The
firm might find being tied into a contract excessively restrictive if radically new
technologies lead them to adopt a production process that no longer needs the input
specified in the contract, for example. The more important this type of raw
uncertainty, the higher the marginal cost of extending contract length a bit.
The optimal length of a contract balances the marginal benefits of additional
contract length against their marginal costs. This is shown in the figure (L*). An
increase in the transactions costs of obtaining the input without a contract increases
the marginal benefit of contract length (to MB’) and increases contract length (to L’)
given the initial marginal cost curve. An increase in the complexity of the contracting
environment or the costs of being tied into a longer contract in the face of uncertainty
increases the marginal cost of contract length (to MC”) and decreases contract length
(to L”) given the initial marginal benefit curve.
[Figure: marginal benefit (MB, MB’) and marginal cost (MC, MC”) plotted against contract length, with optimal lengths L” < L* < L’.]
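These comparative statics can be sketched numerically. The following Python fragment assumes, purely for illustration, a linear marginal benefit curve MB(L) = a − bL and a linear marginal cost curve MC(L) = c + dL. The functional forms and all parameter values are hypothetical and are not taken from the text.

```python
def optimal_length(a, b, c, d):
    """Contract length L at which MB(L) = a - b*L equals MC(L) = c + d*L."""
    return (a - c) / (b + d)


# Hypothetical baseline: MB(L) = 10 - L and MC(L) = 2 + L.
l_star = optimal_length(a=10, b=1, c=2, d=1)

# Higher transactions costs of buying without a contract shift MB up (to MB').
l_prime = optimal_length(a=12, b=1, c=2, d=1)

# A more complex or more uncertain environment shifts MC up (to MC'').
l_double_prime = optimal_length(a=10, b=1, c=4, d=1)

print(l_double_prime, l_star, l_prime)
```

Under these assumed curves the three lengths come out in the order L” < L* < L’, as in the discussion above.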
Contracting and Asymmetric Information
Sometimes different parties to a transaction have significantly different
information regarding certain aspects of the transaction – there is asymmetric
information. The simplest example is if an individual wished to sell a box full of
money, but only he (and not the buyer) knew how much was inside. The buyer could
reasonably conclude that the seller wouldn’t be willing to sell the box for anything
less than the money in it, and thus would never want to buy the box. This is because
of the asymmetric knowledge the seller has.
In general, there are two categories of asymmetric information. The first is
adverse selection, which is when one party knows some characteristics about
themselves that are hidden from the other. Insurance markets provide a good
example of an adverse selection problem. Suppose an insurance company provides
health insurance to two types of people, one that is high‐risk and one that is low‐
risk. In order for the insurance company to break even on expected losses, they
must charge a premium somewhere in between what they expect to lose from both
low‐risk and high‐risk types. Thus, it is possible the low‐risk types will determine
they are being over‐charged by the insurance company to compensate for the high‐
risk types, and decide insurance is not worth buying. The company can reasonably
predict this, and as a result they will charge the remaining high‐risk types a higher
premium. This result is where the name adverse selection comes from ‐ because
there are hidden characteristics about the people who desire health insurance, the
company ends up insuring the people who are most likely to get sick and thus to file
claims.
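The unraveling described in the insurance example can be traced with a few made‐up numbers. The sketch below is only an illustration: the expected losses, the maximum amounts each type is willing to pay, and the assumption of equal numbers of each type are all hypothetical.

```python
def break_even_premium(pool):
    """Premium equal to the average expected loss of the people insured."""
    return sum(person["expected_loss"] for person in pool) / len(pool)


# Hypothetical types, one person of each (equal numbers in the population).
low_risk = {"expected_loss": 100, "max_willing_to_pay": 150}
high_risk = {"expected_loss": 500, "max_willing_to_pay": 650}

pool = [low_risk, high_risk]
pooled_premium = break_even_premium(pool)   # premium if everyone buys
premium = pooled_premium

# Anyone asked to pay more than insurance is worth to them drops out,
# and the insurer then re-prices for whoever remains.
while True:
    stayers = [p for p in pool if p["max_willing_to_pay"] >= premium]
    if not stayers or len(stayers) == len(pool):
        pool = stayers
        break
    pool = stayers
    premium = break_even_premium(pool)

print(pooled_premium, premium, len(pool))
```

With these numbers the pooled premium exceeds what the low‐risk type will pay, so only the high‐risk type stays in the pool and the premium rises to that type's expected loss.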
The second category of asymmetric information is moral hazard, which is when
one party’s actions are hidden from another party. An example of a moral hazard
problem is a driver who has auto insurance. Since the insurance company cannot
monitor the driver’s behavior all of the time, the driver’s actions are hidden from
the company. If the driver is fully insured, he has less of an incentive to drive
carefully.
Both types of information asymmetry affect the contracting problems faced by
firms when procuring inputs. When purchasing an intermediate good, it is very
likely that the input supplier will eventually know more about the cost of producing
the input than the firm purchasing it will. That will give the supplier an incentive to
overstate costs in an attempt to extract a higher payment. When a board of directors
hires a CEO, the CEO will know much more about how hard he worked than will the
board of directors. We consider each of these contracting situations in turn.
Contracting in Principal‐Agent Relationships with Adverse Selection
Imagine a principal is contracting with an agent for a task, such as fixing a car.
At the time of the contract, the principal does not know how hard or expensive (in
time, money, or utility terms) the work will be. The agent may or may not know how
difficult the task will be at the time the contract is signed, but he will find out before
the task is undertaken. If the task is expensive, we will assume the agent can walk
away without fulfilling the contract if doing so would bankrupt them. That is, the
contracts are enforceable only as long as the agent remains solvent – bankruptcy
laws do not allow the principal to force the agent to take a loss. The principal wants
to make sure that the agent does not lie and overstate cost in the event the job is
easy, but also to make sure payment is high enough if costs are high to prevent the
agent from taking a loss, because then the agent would walk away and the task
would not get done. Because of this, the principal has to allow for two different
payments within the contract: a payment if the work turns out to be hard, and a
payment if the work turns out to be easy.
The principal will never know by direct observation whether the work was hard
or easy; he has to rely only on what the agent tells him. So, the principal is at an
information disadvantage resulting from hidden characteristics about the nature of
the agent’s work. Assuming the principal must get the job completed, such as the
repair of his car, he must allow in the contract the possibility that the repair will be
expensive, and thus the possibility that he will have to pay a high price. The problem
with allowing for this possibility is that the agent now has the incentive to lie about
the difficulty of the job in order to get the high payment. The solution to this
problem is to arrange the contract in a way that incentivizes the agent to be honest
about the true costs of his work. Even so, the information disadvantage will prove
costly to the principal.
Model
To develop a model for this contracting problem with adverse selection, assume
we want to procure some inputs from an agent, such as bolts for some piece of
machinery we are building. We don’t know ultimately how much it’s going to cost
the agent to produce these inputs, but at the time of the contract we have some idea
of how likely it is that cost will be high or low. For the purposes of this model, we
make the following definitions.
q is the number of inputs the agent produces, and it can either be the amount
that he produces if costs are high (qH) or the amount that he produces
if costs are low (qL). Note qL > qH.
V(q) is the value to our firm of having q units of the input; it is the profit that
we can make in the future using these inputs, before subtracting the
costs of input procurement.
f is the probability the cost (to the agent) of making our inputs is low (CL(q))
1‐f is the probability that the cost (to the agent) of making our inputs is high
(CH(q))
P is the total (delivery) price called for in the contract, which is based on how
many units the agent produces; it can either be the total delivery price
if costs are high (PH) or the total delivery price if costs are low (PL).
We assume that the principal has the bargaining power; in other words, the
principal is creating this contract as a “take it or leave it” deal which is offered to
several competing production firms. As described above, we also assume the agent
has the ability to file for bankruptcy. This implies that if we don’t pay the agent
enough to cover his costs of production, he will file for bankruptcy and get out of the
contract. We have two participation constraints that follow directly from this:
1. PH ≥ CH (qH )
2. PL ≥ CL (qL ) .
These constraints ensure that the contract prices cover the agent’s costs of
production. We also have two incentive constraints that are used to make sure the
agent is honest about his costs. The first one is that the agent would get at least as
much (or a little more) profit from producing qH units if cost turns out to be high
than he does from producing qL units, or
3. PH − CH (qH ) ≥ PL − CH (qL )
Similarly, we want the agent to get at least as much (or a little more) profit from
producing qL units if cost turns out to be low than he does from producing qH units,
or
4. PL − CL (qL ) ≥ PH − CL (qH )
Just like in menu pricing, we have two participation constraints and two incentive
constraints; and just like in menu pricing, only one of each type binds. Looking at the
incentive constraints, #3 induces the agent not to lie about costs being low when
they are really high. Since a supplier would never come to a principal and say cost is
only $5 when in reality it is $10, we don’t have to worry about constraint #3
binding.
Looking at the two participation constraints, #2 says the price of the low‐cost
contract must be greater than or equal to the cost of producing qL units. If this
contract price were to ever be below the cost of production, the supplier could
simply say that costs were high and produce the lesser quantity qH and still make a
profit. So, we don’t have to worry about constraint #2 binding.
Given this information, the model tells us that we want to maximize value less
input procurement costs, which is
f (V (qL ) − PL ) + (1− f )(V (q H ) − PH ),
subject to the constraints
1. PH = CH (qH )
4. PL − CL (qL ) = PH − CL ( qH ) .
Rearranging constraint #4, and plugging in constraint #1, we get
PL = PH − CL (qH ) + CL ( qL )
PL = CH (qH ) − CL ( qH ) + CL (qL ) .
This last version of constraint #4 should make intuitive sense. It just says that if the
agent says cost is low, we pay them their costs of production, CL (qL ) , plus the profit
they would make if they lied and claimed cost was high, CH (qH ) − CL (qH ) . That
covers their cost and keeps them honest.
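To see the two binding constraints at work, the following Python sketch computes the contract prices they imply and then checks all four constraints. The linear cost functions and the quantities are hypothetical choices made only for illustration; the function names are likewise not from the text.

```python
def contract_prices(cost_low, cost_high, q_low, q_high):
    """Prices implied by binding constraints #1 and #4."""
    p_high = cost_high(q_high)                     # constraint 1 with equality
    rent = cost_high(q_high) - cost_low(q_high)    # profit from falsely claiming high cost
    p_low = cost_low(q_low) + rent                 # constraint 4 with equality
    return p_low, p_high


def check_constraints(cost_low, cost_high, q_low, q_high, p_low, p_high):
    """Evaluate constraints 1 to 4 from the text at the given contract."""
    return {
        1: p_high >= cost_high(q_high),
        2: p_low >= cost_low(q_low),
        3: p_high - cost_high(q_high) >= p_low - cost_high(q_low),
        4: p_low - cost_low(q_low) >= p_high - cost_low(q_high),
    }


# Hypothetical cost functions and quantities (not from the text).
def cost_low(q):
    return 1 * q


def cost_high(q):
    return 3 * q


q_low, q_high = 10, 6
p_low, p_high = contract_prices(cost_low, cost_high, q_low, q_high)
checks = check_constraints(cost_low, cost_high, q_low, q_high, p_low, p_high)
print(p_low, p_high, checks)
```

In this illustration constraints #1 and #4 hold with equality while #2 and #3 hold with room to spare, which is the pattern the argument above predicts.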
Substituting our constraints into the original problem we get
f (V (qL ) − CH (qH ) + CL (qH ) − CL (qL ) ) + (1 − f ) (V (qH ) − CH ( qH ) )
Now we can solve for qL and qH by maximizing this expression. Maximizing with
respect to qL, we get
$$\frac{\partial E(\pi)}{\partial q_L} = f\left(\frac{dV(q_L)}{dq_L} - \frac{dC_L(q_L)}{dq_L}\right) = 0$$
and dividing by f we get
MB (qL ) = MCL (qL )
which simply says marginal benefit at qL equals marginal cost.
Maximizing with respect to qH, we get
$$\frac{\partial E(\pi)}{\partial q_H} = f\left(-MC_H(q_H) + MC_L(q_H)\right) + (1-f)\left(MB(q_H) - MC_H(q_H)\right) = 0.$$
Rearranging this, we get
$$MB(q_H) - MC_H(q_H) = \frac{f}{1-f}\left(MC_H(q_H) - MC_L(q_H)\right).$$
Looking at this equation, we see that marginal benefit equals marginal cost plus the
extra term on the right. This is because the cost to us of increasing qH in our contract
is the production cost plus the fact that the low‐cost producer becomes more
tempted to lie and say cost is high. Thus, we request fewer units for qH than we
would if we didn’t have to worry about the low‐cost producer lying.
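The two conditions for qL and qH can be solved in closed form once functional forms are chosen. The sketch below assumes, for illustration only, a linear marginal benefit MB(q) = a − bq and constant marginal costs; these forms and every parameter value are hypothetical.

```python
def contract_quantities(a, b, mc_low, mc_high, f):
    """Quantities with MB(q) = a - b*q and constant marginal costs.

    q_low solves  MB(q) = mc_low.
    q_high solves MB(q) - mc_high = f / (1 - f) * (mc_high - mc_low).
    """
    q_low = (a - mc_low) / b
    q_high = (a - mc_high - f / (1 - f) * (mc_high - mc_low)) / b
    return q_low, q_high


# Hypothetical parameters: MB(q) = 20 - q, MC_L = 2, MC_H = 5.
efficient_q_high = (20 - 5) / 1   # quantity where MB = MC_H

# Note: for f close to 1 the formula can turn negative, so a fuller
# treatment would also impose q_high >= 0.
for f in (0.25, 0.5, 0.75):
    q_low, q_high = contract_quantities(a=20, b=1, mc_low=2, mc_high=5, f=f)
    print(f, q_low, q_high, efficient_q_high)
```

Under these assumptions qL does not depend on f, while qH lies below the quantity at which MB = MCH and falls as f rises.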
The following illustrates the adverse selection contracting problem in a graph.
Since the firm is the principal and is paying the agent to procure a certain quantity of
inputs, the marginal benefit curve (MB) is the firm’s marginal benefit from q units. The
agent will either be a high cost producer (MCH) or a low cost producer (MCL). Then, qLe
is the quantity that maximizes value added if cost is low, and qHe is the quantity that
maximizes value added if cost is high.
[Figure: MB(q), MCH, and MCL plotted against q, with qHe < qLe marked on the horizontal axis.]
From the above discussion, we know when cost is low we want MB = MCL; so, the
quantity in the low‐cost contract will be qLe. The quantity in the high‐cost contract,
however, won’t be the quantity that maximizes value added. This is due to the
propensity the low‐cost producer has for lying and saying costs are high.
Let’s look at another graph to illustrate this point. Let’s suppose for now the
probabilities of high and low cost are equal. We know the area under the MC curve
is total variable cost (ignoring any quasi fixed costs), and we know from the
participation constraints that the contract price (P) has to cover the total cost. If we
were to demand qHe units in the high‐cost contract, the price of the high contract
would be as shown in the left figure below. Now, suppose the cost to the producer
turns out to be low. If he lies and says that cost is high, he will get this high contract
price; since his actual costs are based on the MCL curve, his profit will be as shown in
the figure on the right.
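[Figure, two panels, each showing MCH, MCL, and MB(q) against q with qHe marked. Left panel: the high‐cost contract price PH. Right panel: the profit π to a low‐cost producer who claims cost is high.]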
The only way to reduce the producer’s incentive to lie is to restrict qH; by doing
this, we reduce the profit he makes from lying when costs are low, and with it the
premium we must add to the low‐cost contract price to keep him honest. But what
happens to our profits if costs are actually high? Since we’d be demanding a quantity
that is less than the quantity that maximizes value added (qHe), we are losing
potential profits if costs turn out to be high. This is shown in the left panel of the
figure below.
So, every time we restrict qH, we lose some profit because the next unit has a
higher marginal benefit than its marginal cost, but we gain some profit because the
premium we must pay the low‐cost producer to keep him honest is lower. We will then
continue to restrict qH until these two marginal effects offset each other. This is
shown in the right panel of the figure below.
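[Figure, two panels, each showing MCH and MB(q) against q, with the lost profit –∆πH marked. Left panel: restricting quantity below qHe. Right panel: the optimal quantity qH*.]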
This is precisely where our earlier solution for qH comes from; the distance –∆πH
is MB(qH) − MCH(qH), and the distance ∆πL is MCH(qH) − MCL(qH). For us to find qH, these
have to be equal. Note that this is only exactly true when f = 1 − f; in general, ∆πL is
weighted by f/(1 − f). As f increases, the
probability of low cost increases, and as a result we have to restrict qH further as it
becomes more likely the producer will lie about having a high cost. As f decreases, it
is more likely that cost will be high, so the gain to reducing the incentive for lying
when cost is low is smaller and, as a result, qH will be higher.
Example
Let $V(q) = 10q - \tfrac{1}{4}q^2$, $f = 0.5$, $C_L = 0.5q_L$ and $C_H = 2q_H$.
The two binding constraints are
$$P_H = 2q_H \quad\text{and}\quad P_L - \tfrac{1}{2}q_L = P_H - \tfrac{1}{2}q_H.$$
Substituting the first into the second and solving for $P_L$ we obtain
$$P_L = \tfrac{3}{2}q_H + \tfrac{1}{2}q_L.$$
The expected profit function is
$$E(\pi) = f\,(V(q_L) - P_L) + (1-f)(V(q_H) - P_H)$$
and substituting from the constraints gives:
$$E(\pi) = \tfrac{1}{2}\left(10q_L - \tfrac{1}{4}q_L^2 - \tfrac{3}{2}q_H - \tfrac{1}{2}q_L\right) + \tfrac{1}{2}\left(10q_H - \tfrac{1}{4}q_H^2 - 2q_H\right).$$
Maximizing this we find
$$\frac{dE(\pi)}{dq_L} = \tfrac{1}{2}\left(10 - 0.5q_L - 0.5\right) = 0 \quad\Rightarrow\quad q_L = 19$$
and
$$\frac{dE(\pi)}{dq_H} = 0.5(-1.5) + 0.5\left(10 - 0.5q_H - 2\right) = 0 \quad\Rightarrow\quad q_H = 13.$$
If, on the other hand, we had perfect information, we would simply find our
quantities by setting the marginal benefit equal to the marginal cost:
$$10 - \tfrac{1}{2}q_L = \tfrac{1}{2} \quad\text{and}\quad 10 - \tfrac{1}{2}q_H = 2, \quad\text{so}\quad q_L = 19 \text{ and } q_H = 16.$$
So, we can see that we reduce our contract for the high‐cost producer by 3 units
because of the information asymmetry.
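As a check on the algebra, the example can also be solved by brute force. The following Python sketch plugs the binding constraints into expected profit and searches over whole‐number quantities; the search grid (0 to 40) and the function names are assumptions made for this illustration, while the functional forms are those of the example.

```python
def value(q):
    """V(q) = 10q - q^2/4 from the example."""
    return 10 * q - 0.25 * q ** 2


def expected_profit(q_low, q_high, f=0.5):
    """Principal's expected profit with the binding constraints substituted in."""
    p_high = 2 * q_high                      # P_H = C_H(q_H)
    p_low = 1.5 * q_high + 0.5 * q_low       # P_L = (3/2) q_H + (1/2) q_L
    return f * (value(q_low) - p_low) + (1 - f) * (value(q_high) - p_high)


# Search over whole-number quantities from 0 to 40 (the grid is an assumption).
grid = range(41)
best_q_low, best_q_high = max(
    ((ql, qh) for ql in grid for qh in grid),
    key=lambda pair: expected_profit(*pair),
)
print(best_q_low, best_q_high)
```

The search should land on the same quantities as the first‐order conditions, qL = 19 and qH = 13.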
Up to this point, we’ve been discussing adverse selection contracts in scenarios
where the producer is risk neutral but can file for bankruptcy. The reason this was
important was that it required us to fully compensate the producer if cost turned
out to be either high or low. Suppose, however, that the agent can afford to take the
loss if cost turns out to be high. Then, if the agent signed a contract that will lead to a
loss if cost is high, they can’t get out of it if cost really does turn out to be high; they
just have to take the loss.
If this is the case, the timing of the uncertainty of the contract and the attitude of
the agent toward risk become important. If neither party knows what costs are
going to be at the time of the formation of the contract, AND the agent is risk neutral,
the principal can simply demand the quantity that maximizes value added and pay a
contract price that compensates the agent for expected costs; that is, the price of the
contract takes into account the likelihood of costs being either high or low. The
participation constraint is simply:
f ( PL − CL (qL ) ) + (1 − f )( PH − CH (qH ) ) ≥ 0 .
Then, if the agent takes the contract, he will be forced to produce the requested
quantity; if cost turns out to be low, he will make a profit, and if costs turn out to be
high, he will take a loss. So, if the agent can afford to take a loss on a high‐cost
contract, we don’t face the “bankruptcy constraint” that we dealt with in the first
method. In that case, we get the efficient quantity for either high or low cost
because neither party knows the cost when the contract is signed and there is no way
out of the contract once it is signed. So, there is no information advantage to the
agent and no disadvantage to the principal.
Now suppose that the agent we’re contracting with is not a firm that produces
inputs, but rather is an individual. When we were dealing with a firm, we just had to
worry about covering their costs of production. This is because firms are risk‐
neutral. We know, however, that individuals are risk‐averse. So, if we want an
individual to participate in our contract, we need to look at their expected utility, as
opposed to just expected costs. If f is the probability that the job is easy and uR is the
individual’s reservation utility, the participation constraint then becomes something
like:
fu ( PL − CL (qL ) ) + (1 − f )u ( PH − CH (qH ) ) ≥ uR .
If we’re dealing with an individual, however, introducing uncertainty about the
outcome introduces risk into the individual’s compensation. Since the individual is
risk averse, they will demand higher compensation on average to make up for the
risk. So, there is a tradeoff – making the high cost option less attractive increases
risk and therefore increases expected compensation costs, but, it also decreases the
incentive to lie, reducing expected compensation costs. The optimal contract will
strike a balance between these two opposing forces.
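The extra compensation a risk‐averse individual demands can be illustrated with a specific utility function. The sketch below assumes u(w) = √w, which is only one convenient example of a risk‐averse utility function, and uses hypothetical net payoffs for the easy and hard outcomes.

```python
import math


def expected_utility(f, payoff_easy, payoff_hard, u=math.sqrt):
    """f*u(P_L - C_L(q_L)) + (1 - f)*u(P_H - C_H(q_H))."""
    return f * u(payoff_easy) + (1 - f) * u(payoff_hard)


f = 0.5
payoff_easy = 100.0   # hypothetical P_L - C_L(q_L)
payoff_hard = 36.0    # hypothetical P_H - C_H(q_H)

expected_payoff = f * payoff_easy + (1 - f) * payoff_hard
eu = expected_utility(f, payoff_easy, payoff_hard)
certainty_equivalent = eu ** 2          # inverse of u(w) = sqrt(w)
risk_premium = expected_payoff - certainty_equivalent
print(expected_payoff, eu, certainty_equivalent, risk_premium)
```

The gap between the expected payoff and the certainty equivalent is what the principal pays, on average, for loading risk onto the individual. A riskless payment equal to the certainty equivalent would satisfy the same participation constraint at lower expected cost, but it would do nothing to discourage lying.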
Contracting in Principal‐Agent Relationships with Moral Hazard
Now we’re going to focus on contracting issues between principals and agents in
the context of a moral hazard problem – when the agent knows more about their
actions than does the principal. Suppose a board of directors is hiring a new CEO for
a company; then, the board of directors is the principal of the relationship, and the
CEO is the agent of the relationship. The issue of moral hazard arises immediately
because the principal is unable to actively observe all the work that the agent does.
As a result of this, even if the principal feels the agent didn’t work hard at the task, it
would be hard to tangibly prove it in a court. This leads us into a discussion about
what the principal can do to motivate the agent to work diligently at their tasks.
The natural solution to this difficulty is for the board of directors (the principal)
simply to pay the CEO (the agent) based on company performance. That is, the
board is hiring the CEO to make the firm profitable; so, pay him if the firm ends up
making a profit, and don’t pay him if the firm doesn’t make a profit. This is a suitable
solution in a world with no uncertainty. If we could be certain the firm would be
profitable if the CEO worked hard, and unprofitable otherwise, then we could pay
him accordingly. But what if the firm could be profitable even if the
CEO did not work hard, and unprofitable even if he did work hard? We wouldn’t
want to punish this industrious CEO simply because of something that wasn’t under
his control, or, overly reward those who just get lucky.
Because these types of situations occur in the real world, we have to find a way
to reconcile this uncertainty. The only way the board will hire the CEO is if they pay
him in a way that is contingent on company profits, in order to incentivize him to
work hard. From the CEO’s perspective, however, the contract has become risky;
that is, he knows there are certain elements that influence company profits that
aren’t under his control. We know individuals are risk averse. Since his salary is now
tied to company profits, and company profits are contingent on uncertain things
that he has no control over, the company will have to offer the CEO more money
to compensate for this added risk.
The principal thus faces a trade‐off between risk and incentives when designing
a contract for the agent. To make it more likely that the CEO will work hard, the
principal has to add greater incentives for working hard. Incentivizing the CEO to
work hard leads to higher profits for the company. In the face of these high
incentives, though, the contract has become more risky; the CEO will demand a
higher salary because of this, which leads to lower profits for the company. The
optimum contract balances these two effects.
In addition to profit, there are other indicators to use as signals of the effort of
the agent, such as costs, sales, complaints, reports of monitors, and the performance
of other firms (yardstick competition). The idea is to tie the pay of the agent to
signals that best represent the job he is being hired for. So, if the agent is a CEO,
overall company profit is a good signal. If the agent is a production manager, tying
his contract to the firm’s production costs is a better way to incentivize him to work
hard at what he’s supposed to do – it is less random and more under his influence.
Model
For the purposes of this model, we make the following definitions.
V is the value of the firm, which can either be high (VH) or low (VL)
e is the effort of the agent, which can either be high (eH) or low (eL)
f is the probability that the value of the firm is high given the agent works
hard, or Pr(VH|eH)
g is the probability that the value of the firm is high given the agent doesn’t
work hard, or Pr(VH|eL)
u(w) – c(e) is the agent’s utility function, which is the utility he gets from the
wealth of the contract, u, minus the utility cost, or disutility, from putting
forth effort. For simplicity, assume the cost of high effort is c (that is, c(eH)
= c) and assume the cost of low effort is 0 (that is, c(eL) = 0).
uR is the reservation utility, which is the utility of the agent’s next‐best
option. If you don’t meet this level of utility with your contract, the agent
won’t take it.
wR is the reservation wage, which is just the single amount of wealth that
corresponds to the reservation utility for a job with low effort required;
that is, uR = u(wR).
Since we have limited the possibilities to two (high value or low value), this is
the only signal on which to base a contract. Let wH represent the wage paid to the
agent if value is high, and wL represent the wage if value is low. Note that we can
also think of these two wages in terms of a base pay and a bonus; that is, wL = wBASE
and wH = wBASE + bonus.
We will assume that the principal wants the agent to work hard. Given this, the
principal wants to minimize expected compensation costs. The expected cost of
hiring the agent is the probability of high value times the high wage, plus the
probability of low value times the low wage, assuming the agent works hard, or
fwH + (1 − f )wL .
This minimization problem is subject to two constraints. The first constraint is
the participation constraint. This says that you must pay the agent enough to
actually induce him to sign the contract, which means his expected utility from the
contract must be greater than or equal to his reservation utility, or
E (u | eH ) = f (u ( wH ) − c) + (1 − f )(u ( wL ) − c) ≥ uR .
We use f here because we are only concerned with getting the agent to participate
when he works hard; we aren’t trying to get him to participate when he doesn’t work
hard. If we expand the above equation, we get
E (u | eH ) = fu ( wH ) + (1 − f )u ( wL ) − c ≥ uR .
Since the opportunity cost of the agent signing our contract is uR, this constraint
binds; that is, if we don’t meet this constraint, the agent walks away. We want to
make sure that his expected utility from the contract is as much as his reservation
utility – but making it greater just takes profits away from the firm. So, equality
holds for this constraint.
The second constraint is the incentive constraint. This constraint says the
agent’s expected utility from working hard must be greater than or equal to his
expected utility from shirking, or
E (u | eH ) ≥ E (u | eL ) .
We’ve defined the expected utility from working hard above; the expected utility
from shirking is the chance value is high given that he doesn’t work hard times the
high wage, plus the chance value is low given that he doesn’t work hard times the
low wage, or
E (u | eL ) = gu ( wH ) + (1 − g )u ( wL ) − 0 .
Notice the cost of effort here is 0, since the cost of shirking (using our notation) is 0.
Then, the incentive constraint becomes
fu ( wH ) + (1 − f )u ( wL ) − c ≥ gu ( wH ) + (1 − g )u ( wL ) .
This constraint also binds, meaning it holds with equality. If we don’t satisfy it,
the agent would have no incentive to work hard. If the agent works hard, they incur
an extra cost of c and get a higher probability of wH since f > g. So, the only way to
compensate for the cost of effort c is to increase the bonus given for success by
increasing wH or decreasing wL. Recall from our earlier discussion that increasing
the bonus introduces risk into the contract, which increases the average salary that
we have to pay the agent; so, we want to increase the “bonus” to the point where he
gets just as much utility from working hard as he does from shirking, but not any
higher, as this will just increase the firm’s contracting costs. So, the constraint will
be met with equality.
To solve this problem, let’s start out by working from the incentive constraint.
Rearranging the constraint for the cost of effort we get
( f − g )u ( wH ) + (1 − f − 1 + g )u ( wL ) = c or
( f − g ) ( u ( wH ) − u ( wL ) ) = c .
This form of the incentive constraint has an intuitive explanation. The right‐hand
side is just the disutility from working hard. The left‐hand side is the extra
probability of high value from working hard, times the extra utility from the high
wage. So, the left‐hand side is the expected benefit from working hard, and the right‐
hand side is the expected cost of working hard; as long as the agent “covers” the cost
of high effort by the expected benefit he gets from putting forth high effort, he’ll be
incentivized to work hard.
Finally, we will rewrite the equation once more in order to make our calculations
easier:
$$u(w_H) - u(w_L) = \frac{c}{f - g}$$
Now let’s rearrange the participation constraint:
u ( wL ) + f ( u ( wH ) − u ( wL ) ) − c = u R
In this form, the participation constraint intuitively says the agent’s utility from the
base pay, plus his expected utility from the bonus, minus his disutility from working
hard should equal his reservation utility.
Substituting the final form of our incentive constraint into our participation
constraint and rearranging, we get
$$u(w_L) + \frac{f}{f - g}c - c = u_R$$
$$u(w_L) = u_R + c - \frac{f}{f - g}c$$
$$u(w_L) = u(w_R) - \frac{g}{f - g}c ,$$
where wR denotes the reservation wage, so that u(wR) = uR.
This says that the utility of the base wage equals the utility of the reservation wage
minus a multiple of the cost of effort. The more likely it is that the agent gets the
bonus for doing nothing, the lower the base wage is, in order to discourage the agent
from doing nothing. Similarly, the higher the probability that the agent gets the high
wage from working hard, the higher you can make the base wage; this is because the
more likely it is that hard work leads to high pay, the less you have to discourage the
agent from doing nothing.
We now need an expression for the high wage. We can obtain this by plugging in
the expression for the low wage into the incentive constraint. This gives:
$$u(w_H) = u(w_L) + \frac{c}{f - g}$$
$$u(w_H) = u(w_R) - \frac{g}{f - g}c + \frac{c}{f - g}$$
$$u(w_H) = u(w_R) + \frac{1 - g}{f - g}c$$
This says the high wage must cover the reservation wage plus a multiple of the cost
of effort. Looking at the multiple, we see the higher f is, the lower we can make our
high wage. This is because the more likely it is that the agent gets the bonus from
working hard, the less we have to worry about incentivizing him to work hard,
and thus we can reduce the riskiness of the contract by reducing the high wage.
Conversely, as g increases, we must make the high wage higher. This is because
g is essentially the probability that the agent gets lucky while shirking, and the higher
this probability becomes, the more we have to incentivize the agent to work hard by
raising the bonus.
Let’s summarize the implications of changes in the exogenous variables of this
model on the low and high wages we have to pay our agent in the following table.
Exogenous variable    wH    wL
wR                    +     +
c                     +     −
f                     −     +
g                     +     −
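The signs in the table can be checked numerically. The following Python sketch is an illustration rather than part of the original derivation. It assumes the square-root utility u(w) = w^0.5 used in this chapter's numerical example, and the baseline values and function names are chosen here purely for illustration. It nudges each exogenous variable upward and reports whether each wage rises or falls.

```python
import math

def contract_wages(f, g, c, w_R):
    """Wages from the two binding constraints, assuming u(w) = sqrt(w)."""
    u_R = math.sqrt(w_R)                  # reservation utility
    u_L = u_R - g * c / (f - g)           # utility of the low (base) wage
    u_H = u_R + (1 - g) * c / (f - g)     # utility of the high wage
    # Inverting u(w) = sqrt(w) by squaring is valid here because u_L >= 0
    # at the illustrative parameter values used below.
    return u_H ** 2, u_L ** 2

def comparative_statics(base, step=1e-4):
    """Sign of the change in (w_H, w_L) when each parameter rises slightly."""
    w_H0, w_L0 = contract_wages(**base)
    signs = {}
    for name in base:
        bumped = dict(base, **{name: base[name] + step})
        w_H1, w_L1 = contract_wages(**bumped)
        signs[name] = ("+" if w_H1 > w_H0 else "-",
                       "+" if w_L1 > w_L0 else "-")
    return signs

# Illustrative baseline (an assumption, not taken from the table itself).
base = {"f": 0.8, "g": 0.4, "c": 8.0, "w_R": 100.0}
signs = comparative_statics(base)
for name, (s_H, s_L) in signs.items():
    print(f"{name:>3}: w_H {s_H}   w_L {s_L}")
```

A finite-difference check like this only confirms the signs at one parameter point; the general result still rests on the closed-form expressions for u(wH) and u(wL).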
Now, let’s compare this scenario with one in which the agent completely
determines the value of the firm, and the firm has complete information about the
agent. In this case, f = 1 and g = 0. As f approaches 1, observe what happens to the
high wage. Looking at the utility of the high wage
$$u(w_H) = u(w_R) + \frac{1 - g}{f - g}c$$
and plugging in f = 1 we get
$$u(w_H) = u(w_R) + \frac{1 - g}{1 - g}c$$
$$u(w_H) = u(w_R) + c$$
which says that we just have to set the high wage high enough to compensate for the
reservation wage, plus the cost of working hard. As g approaches 0, and looking at
the utility of the low wage
$$u(w_L) = u(w_R) - \frac{g}{f - g}c$$
when we plug in g = 0 we get
$$u(w_L) = u(w_R) - \frac{0}{f - 0}c$$
$$u(w_L) = u(w_R) .$$
This says base pay is just equal to the reservation wage.
So, with perfect information, we’d pay a hard‐working CEO a wage whose utility equals
his reservation utility plus the cost of working hard, and we’d simply pay a CEO that
shirks his reservation wage. The extent to which our actual numbers differ represents
the cost of uncertainty due to moral hazard; this cost can be attributed to the fact that
the signals we are using are imperfect, or noisy.
Example
Let f = 0.8, g = 0.4, u(w) = w^0.5, uR = 10, and c = 8. The participation constraint
is
$$0.8\sqrt{w_H} + 0.2\sqrt{w_L} - 8 = 10$$
and the incentive constraint is
$$0.8\sqrt{w_H} + 0.2\sqrt{w_L} - 8 = 0.4\sqrt{w_H} + 0.6\sqrt{w_L} .$$
Solving these two equations, we get
$$w_L = 4 \quad \text{and} \quad w_H = 484 .$$
Since we’ve satisfied both constraints (indeed, that’s how we found our wages) we
know for sure that our agent will work hard; then, the expected wage that we will pay
our agent is
$$E(w) = 0.8(484) + 0.2(4) = 388 .$$
If we were operating under complete certainty, we would set their wage where their
utility from working hard is equal to their reservation utility, or
$$\sqrt{w_{CERTAINTY}} - 8 = 10$$
$$w_{CERTAINTY} = 324 .$$
Thus, the cost due to asymmetric information is
$$E(w) - w_{CERTAINTY} = 388 - 324 = 64 .$$
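The arithmetic in this example can be reproduced with a few lines of Python. This is a minimal sketch, assuming the square-root utility from the example; the function and variable names are made up here for illustration. It solves the two binding constraints in utility terms and then converts utilities back into wages.

```python
import math

def solve_contract(f, g, c, u_R):
    """Solve the binding incentive and participation constraints
    for u(w) = sqrt(w). Returns (w_L, w_H, expected wage, certainty wage)."""
    # Binding incentive constraint: u_H - u_L = c / (f - g)
    spread = c / (f - g)
    # Binding participation constraint: u_L + f * spread - c = u_R
    u_L = u_R + c - f * spread
    u_H = u_L + spread
    # Invert u(w) = sqrt(w) to recover the wages.
    w_L, w_H = u_L ** 2, u_H ** 2
    expected_wage = f * w_H + (1 - f) * w_L
    # With observable effort, one wage suffices: sqrt(w) - c = u_R.
    w_certainty = (u_R + c) ** 2
    return w_L, w_H, expected_wage, w_certainty

w_L, w_H, expected_wage, w_certainty = solve_contract(f=0.8, g=0.4, c=8.0, u_R=10.0)
print(f"w_L = {w_L:.2f}, w_H = {w_H:.2f}")
print(f"E(w) = {expected_wage:.2f}, w_certainty = {w_certainty:.2f}")
print(f"cost of asymmetric information = {expected_wage - w_certainty:.2f}")
```

Up to floating-point rounding, the results should match the values derived in the example: wages of 4 and 484, an expected wage of 388, and a cost of asymmetric information of 64.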
Pitfalls of Incentive Contracts
The first thing that can go wrong with incentive contracts is when a principal
acts in the best interest of the agent, as opposed to the company. For example, if a
board of directors were hiring a new CEO, but there were some ties between the
board members and the CEO, clauses built into the contract may actually just be
golden parachutes that serve no purpose in actually incentivizing the CEO to work
hard. We won’t elaborate any more on this issue because in general we are
assuming that the management of the company will act in the shareholders’ best
interests.
Given that the principal is acting on behalf of the company, commitment
becomes the next major issue. We’ve designed an incentive contract to ensure that
the CEO behaves a certain way; it is crucial that we act exactly as we’ve specified in
the contract in order for it to work. The problem is that we know we’re only going to
pay the CEO the high wage if the company is profitable; but even if the CEO works hard
(which we know he will, since we’ve designed the contract that way) there’s a chance the
company will not be profitable, and thus we will have to pay him the low wage. Even
if we know he’s worked hard, and the company fails due to something out of his
control, we cannot pay him any sort of “sympathy” pay, as anticipating it would weaken
his incentive to work hard from the beginning. So, even though it may seem brutal, both
parties must commit to everything in the incentive contract for it to work properly.
Another issue arises when parties start to renegotiate their contracts. If
management has an incentive contract with a new CEO, and after the CEO’s first year
he does well, it’s possible the management will rewrite the contract at the beginning
of the next year with “higher” incentives to induce the CEO to improve upon last
year. The problem with this is that the CEO may realize this, and as a result he would
have motivation to not improve as much as he could in order to leave himself room
for improvement over the long‐term. This is known as the ratchet effect.
Finally, since we are talking about powerful incentive contracts that are
designed to overcome massive information problems, it is good to be aware of the
law of unintended consequences. For instance, in our earlier example we found the
high wage to be 484; if this were in millions of dollars, and this bonus wage
were used as a way to get management to work hard, you would have just given the
accountants an incentive to falsify the balance sheets. So, it’s important to make
sure your contracts measure things that actually add value to the company in order
to prevent the agent from finding an easier way of earning the bonus.
Chapter 20 Terminology
The following is a list of terms that you should know in order to discuss and
apply the material from this chapter.
Asymmetric Information A state in which one party knows more than others.
Moral Hazard A case of information asymmetry in which one party’s actions are
hidden from another party.
Incentive Contract Contract that works to decrease information asymmetry by
allowing the contractor to legally bind an agent to work hard instead of shirking.
Adverse Selection A case of information asymmetry in which one party’s
characteristics are hidden from another party.
Procurement Contract Contract that works to decrease information asymmetry
by allowing the contractor to legally bind the agent to tell the truth about the costs
of production.
Participation Constraint Constraint that must hold true in order for a party to
participate. In the case of contracting with moral hazard, the contractor must pay
the agent enough to actually make him sign the contract; the agent’s expected utility
from the contract must be at least as much as his reservation utility. In the case of
contracting with adverse selection, the contractor must pay the agent enough to
cover costs for the agent to sign the contract; the price the contractor pays the agent
if costs end up being high must be at least as much as the high cost of the job and the
price the contractor pays the agent if costs end up being low must be at least as
much as the low cost of the job.
Reservation Utility The utility that can be received at an agent’s next best job.
Incentive/Selection Constraint Constraint that must hold true in order for a party
to act a certain way or buy a certain membership or bundle. In the case of moral
hazard and contracting, the contractor will want to pay the agent enough to work
hard instead of shirking; the agent’s expected utility from working hard must be at
least as much as his expected utility from shirking. In the case of contracting with
adverse selection, the contractor must pay the agent enough to be honest about the
cost of the job; the agent must get at least as much profit from producing the high‐
cost number of units when costs are high as he does from producing the low‐cost
number of units. The agent must also get at least as much profit from producing the
low‐cost number of units if costs are low as he does from producing the high‐cost
number of units.
Chapter 21
The Firm
In focusing on maximizing profit, and therefore shareholder value, we simply
assumed firms exist and that they are owned by shareholders. But, why do firms
exist in the first place? As argued long ago by Adam Smith in “The Wealth of
Nations,” division of labor and the resulting specialization boost productivity.
Imagine that we are dealing with some long process for producing a final good
valued by an end user. The good begins as raw materials and undergoes
transformations at many intermediate steps until the final product is ready for the
end consumer. We can imagine an economy in which everyone specializes in
whatever they are comparatively best at. Each individual would purchase “inputs”
from the individuals who best perform the previous intermediate step, perform
their own part of the task, and then sell the resulting “output” to the individuals who
perform the next intermediate step, and so on, until the final product is produced.
This hypothetical economy would function through the interaction of very
specialized individuals, or proprietors, with no role for firms. In this hypothetical
completely decentralized economy, if the value of a good increases or decreases, so
does the good’s price through the interaction of supply and demand. These price
signals induce resources to flow towards their highest valued uses.
In contrast, many distinct operations occur within a single firm, which operates
under a single centralized ultimate decision making authority. The same firm can
buy raw inputs, turn them into numerous intermediate goods, assemble them to
produce a final good through many related steps, and employ accountants,
marketers, and even janitors to facilitate the firm’s activities. This is the opposite of
specialization in some sense, since a single decision making entity is performing a
multitude of functions, some of which are only very loosely related to others. Within
the firm, there are no price signals since everything is directly under control of the
management; thus, resources won’t be automatically allocated to the areas of the
highest value. Rather, the firm relies on the centralized authority to formulate the
plan on which the firm operates. Why not rely on the market for those decisions and
have more specialization? Why have firms at all?
To see the answer, we take a step back. Every stage of the production process
requires inputs. Those inputs must be procured in some way. There are three main
ways that an entity can obtain inputs. The first is to simply buy them in the spot
market. The spot market is just the free‐market economy in its purest sense; you go
to the market to determine the market price of whatever input you need, and buy as
many of them as you require. If all inputs could be readily and efficiently obtained in
the spot market, firms would be superfluous.
As covered in the last chapter, the spot market functions well as long as there is
near perfect information, products are very standardized, and transactions costs are
low. If all of these related conditions are met, supply and demand will be able to
accurately represent the cost and value of a good, and the spot market will be able to
facilitate trade efficiently. But, the spot market tends to break down if any of these
three conditions aren’t met. The more differentiated an input is, or the higher the
transactions costs of finding it, the thinner the market will be. These are related; a
lack of standardization means a thin market in which much depends on search and
negotiation, which means high transactions costs. Similarly, when information is
incomplete, it may take a great deal of time just to gather enough information on
which to base a decision, and when information is too asymmetric, every
transaction can become a time consuming negotiation.
If the spot market is inadequate, the next possibility is to negotiate a contract to
procure the input. That was considered in detail in the last chapter. If contracting is
too costly, the only remaining alternative is to make the input yourself – which is
exactly what a firm does. A firm that vertically integrates several stages of the
production process is internally creating some of the intermediate inputs needed for
its final product. So, the firm may be viewed as a way to improve efficiency by
internalizing spot market failures and centralizing decision making authority
related to that failure when it is not cost effective to overcome it via contracting.
We will now look more closely at three specific types of input market failures
that can occur when inputs are not standard, information is imperfect, transactions
costs are high, or input markets are not competitive. These are team production,
relationship‐specific investment, and double marginalization.
Team Production and Free Riding
Team production refers to a situation when multiple aspects of a production
process are for some reason inseparable. An example is the design of a new car; this
is a very specialized procedure that requires several different groups of people with
specific knowledge about different aspects of automobiles to come together and
agree on the design, working in a way that is inherently interactive and
synergistic. Suppose a group of ten individuals gets together and begins work on
this new car, and imagine they plan on splitting whatever profit the group
ends up with evenly among all members. Now, suppose one of the team members
could cancel weekend plans with his family and thereby add $4,000 of value to the
project. If the work is not done that particular weekend, the opportunity will be
missed because the project will be at too late a stage to implement the change. Since
ultimately he will only get one tenth of the profit the team makes, his profit will
increase by $400. Suppose he would only be willing to cancel his weekend plans if
he made at least $500. Then, he will not make the change, and $3,500 of value added
will be lost (the 4,000 increase less the opportunity cost of the canceled plans).
This is the nature of the free riding problem in team production situations. It is a
result of individuals bearing all the costs of their individual efforts but only
receiving a share of the benefits. This leads to individual team members putting in
less effort than optimal and instead “free riding” on the efforts of other group
members.
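The weekend example can be written out as a short calculation. The sketch below is illustrative only; the function name and the extra "largest team" calculation are assumptions added here, not part of the original example. It compares the member's private share of the gain with his private cost and reports the surplus lost when he declines.

```python
def free_rider_decision(value_added, team_size, private_cost):
    """Return (private gain, whether the member does the work, surplus lost)."""
    private_gain = value_added / team_size        # his equal share of the gain
    does_the_work = private_gain >= private_cost  # he bears the full cost alone
    if does_the_work:
        surplus_lost = 0.0
    else:
        # Value added net of his opportunity cost, lost only if it is positive.
        surplus_lost = max(value_added - private_cost, 0.0)
    return private_gain, does_the_work, surplus_lost

gain, works, lost = free_rider_decision(4000.0, 10, 500.0)
print(f"private gain = {gain:.0f}, works = {works}, surplus lost = {lost:.0f}")

# Largest equally sharing team for which he would still give up the weekend.
largest_team = max(n for n in range(1, 101) if 4000.0 / n >= 500.0)
print(f"largest team size at which he works: {largest_team}")
```

With these numbers his share is 400, below his required 500, so the work is not done and 3,500 of value added is lost; under equal sharing he would only do the work on a team of eight or fewer.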
Model
Suppose players A and B are working together in a team production situation.
They split all proceeds evenly. For the purposes of this model, we make the
following definitions.
e: The amount of effort each individual puts forth. Player A’s effort is eA and
player B’s is eB. Effort measures both the duration and intensity of work.
V: The value generated by the project. It is a function of the total effort put in to
the project by all team members. In this case, the value of the project is V(eA,
eB).
C: The cost of effort, measured in terms of the amount of monetary
compensation the individuals would require to voluntarily put forth a
specified amount of effort. Player A’s cost of effort is CA(eA) and player B’s is
CB(eB).
S: The surplus generated by the project, or gross value less the cost of effort.
Suppose both players work together to create as much joint, or total, surplus as
possible. In this case, the total surplus of the project is
STOT = V (eA , eB ) − C A (eA ) − CB (eB ) .
Maximizing total surplus with respect to player A’s effort, we get
$$\frac{\partial S_{TOT}}{\partial e_A} = MB_A - MC_A = 0$$
where MBA is the marginal benefit of player A’s effort, and MCA is the marginal cost
of player A’s effort. For player B, we get the same thing:
$$\frac{\partial S_{TOT}}{\partial e_B} = MB_B - MC_B = 0 .$$
If everyone on the team is working efficiently, they should all be working where
the marginal benefit of their effort is equal to the marginal cost of their effort. Let
this level of effort be called e*. The figure shows A’s marginal cost and marginal
benefit of effort assuming B puts forth effort eB*. The “efficient” joint effort level
for A is eA*. The same would be true for B.
[Figure: A’s marginal benefit (MBA) and marginal cost (MCA) plotted against eA,
crossing at eA*.]
Let’s now consider the Nash equilibrium if the players play this game
uncooperatively, that is maximizing their own surplus, not the team’s total surplus.
Player A’s surplus is
$$S_A = \frac{V(e_A, e_B)}{2} - C_A(e_A) ,$$
since he only gets half of the value of the project, and bears all of the costs of his own
effort. Maximizing we get
$$\frac{dS_A}{de_A} = \frac{MB_A}{2} - MC_A = 0 .$$
The figure shows the marginal benefit of A’s effort, assuming player B puts in the
jointly efficient effort level, eB*. Since there are two people in the group, the
marginal benefit that player A actually receives is MBA/2; the rest goes to player B.
If player A acts in his own self‐interest, he will put forth effort êA; this level
of effort maximizes player A’s surplus by setting his marginal cost equal to the
portion of his marginal benefit that he receives.
[Figure: A’s marginal benefit and marginal cost curves plotted against eA, with êA
lying to the left of eA*.]
For any given level of effort put forth by player B, player A will want to work less
than the effort level that maximizes total surplus. Player B will act in the same way.
Thus, each player has an incentive to “slack,” that is, to work less than the amount
that would be best for the team as a whole. At the Nash equilibrium, both players
are putting in less than the efficient level of effort, but each is putting in the level
that maximizes their own surplus given the other player’s effort.
The value of the project depends on both the level of effort put forth by player A
as well as player B, so MBA depends on both eA and eB. Therefore, we can solve the
equation above for A’s choice of effort as a function of B’s effort level, that is, A’s
reaction function:
$$e_A = R_A(e_B) .$$
If you were to do the same thing for player B, you’d obtain a similar equation:
$$e_B = R_B(e_A)$$
As we have seen many times, the Nash equilibrium occurs where the reaction
functions intersect.
[Figure: reaction functions RA and RB plotted with eA on the horizontal axis and eB
on the vertical axis, intersecting at the Nash equilibrium effort levels eANE and eBNE.]
To summarize, since players on the team
receive only a fraction of the benefit of the
work they put in, everyone will free ride, putting in less effort than is in the team’s
best interest. As a result, total effort and thus total surplus will be below the efficient
level. That means that each player’s individual surplus when they play in a narrowly
self interested way is LESS than it would be if each instead played for the good of the
team. Thus, free riding in team production situations is very similar to testifying in
the prisoner’s dilemma. As the number of members on the team grows, this problem
gets worse, since each player receives only 1/n of their marginal benefit. The shares
need not be equal, but, if one player receives more than 1/n, that means another
must receive less than 1/n.
Example
Suppose
$$V = 8 e_A^{0.5} e_B^{0.5}, \quad c(e_A) = \tfrac{1}{4} e_A^2, \quad \text{and} \quad c(e_B) = \tfrac{1}{4} e_B^2 .$$
Total surplus is
$$S_{TOT} = 8 e_A^{0.5} e_B^{0.5} - \tfrac{1}{4} e_A^2 - \tfrac{1}{4} e_B^2 .$$
Maximizing with respect to player A we get
$$\frac{\partial S_{TOT}}{\partial e_A} = 4\left(\frac{e_B}{e_A}\right)^{0.5} - \frac{1}{2} e_A = 0 .$$
Maximizing with respect to player B we get
$$\frac{\partial S_{TOT}}{\partial e_B} = 4\left(\frac{e_A}{e_B}\right)^{0.5} - \frac{1}{2} e_B = 0 .$$
This problem is symmetric, so eA = eB in the solution to the two equations above.
Using that simplifies the solution. Substituting into the first partial derivative
above gives:
$$4\left(\frac{e_A}{e_A}\right)^{0.5} - \frac{1}{2} e_A = 0$$
$$4(1)^{0.5} - \tfrac{1}{2} e_A = 0$$
$$4 = \tfrac{1}{2} e_A$$
$$e_A = 8 = e_B$$
At the efficient joint solution, both A and B put forth 8 units of effort, the gross
value of the project is
$$V = 8(8)^{0.5}(8)^{0.5} = 64$$
and total surplus is
$$S_{TOT} = 64 - \tfrac{1}{4}(8)^2 - \tfrac{1}{4}(8)^2 = 32 .$$
Since each player gets half the value and incurs the whole cost of their effort, A’s
individual surplus is:
$$S_A = \frac{64}{2} - \frac{1}{4}(8)^2 = 16 .$$
B’s surplus is the same in this instance, since the problem is symmetric. This is the
solution if both players cooperate and put forth the efficient level of effort.
Now suppose the two players do not cooperate; instead each maximizes their own
individual surplus. Player A’s surplus is
$$S_A = \frac{8 e_A^{0.5} e_B^{0.5}}{2} - \frac{1}{4} e_A^2 ,$$
and maximizing gives
$$\frac{dS_A}{de_A} = 2\left(\frac{e_B}{e_A}\right)^{0.5} - \frac{1}{2} e_A = 0 .$$
Since the game is symmetric, eA = eB in the Nash equilibrium. We can solve for the
equilibrium using that fact and substituting into the first order condition above. We
could also proceed to find A’s reaction function.
$$\left(\frac{e_B}{e_A}\right)^{0.5} = \frac{1}{4} e_A$$
$$\frac{e_B}{e_A} = \left(\frac{1}{4} e_A\right)^2$$
$$e_B = \frac{1}{16} e_A^3$$
$$e_A = 16^{1/3} e_B^{1/3}$$
Player B’s reaction function is found the same way. We now use symmetry to find the
equilibrium effort levels from the first order condition above.
$$2\left(\frac{e_A}{e_A}\right)^{0.5} - \frac{1}{2} e_A = 0$$
$$2 = \tfrac{1}{2} e_A$$
$$e_A = e_B = 4$$
Value in this instance is
$$V = 8 \cdot 4^{0.5} \cdot 4^{0.5} = 32 ,$$
player A’s surplus is then
$$S_A = \frac{32}{2} - \frac{1}{4}(4)^2 = 12 .$$
Player B’s surplus is the same, so total surplus is
$$S_{TOT} = 24 .$$
By not working together and instead maximizing their own individual surplus, they
end up worse off individually and in total.
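The example's two solutions can also be found numerically by iterating on the reaction functions. The Python sketch below is an illustration under the example's functional forms; the function names, the starting guess, and the iteration count are assumptions made here. Setting the share parameter to 0.5 gives each player's self-interested first order condition, while setting it to 1 reproduces the joint surplus-maximizing condition.

```python
import math

def value(e_a, e_b):
    """Gross project value V = 8 * eA^0.5 * eB^0.5."""
    return 8 * math.sqrt(e_a) * math.sqrt(e_b)

def cost(e):
    """Cost of effort c(e) = e^2 / 4."""
    return 0.25 * e ** 2

def best_response(e_other, share):
    """Effort e solving share * 4 * (e_other / e)^0.5 - e / 2 = 0.
    With share = 0.5 this is eA = 16^(1/3) * eB^(1/3), as in the example."""
    return (64 * share ** 2 * e_other) ** (1 / 3)

def symmetric_fixed_point(share, start=1.0, rounds=200):
    """Repeatedly apply both players' responses until efforts settle down."""
    e_a = e_b = start
    for _ in range(rounds):
        e_a, e_b = best_response(e_b, share), best_response(e_a, share)
    return e_a, e_b

def total_surplus(e_a, e_b):
    return value(e_a, e_b) - cost(e_a) - cost(e_b)

nash = symmetric_fixed_point(share=0.5)       # each player keeps half of V
efficient = symmetric_fixed_point(share=1.0)  # each acts as if keeping all of V

for label, (e_a, e_b) in [("Nash", nash), ("Efficient", efficient)]:
    print(f"{label}: eA = {e_a:.3f}, eB = {e_b:.3f}, "
          f"total surplus = {total_surplus(e_a, e_b):.3f}")
```

Up to rounding, the iteration should settle at effort of 4 each with total surplus 24 in the Nash case, and effort of 8 each with total surplus 32 in the efficient case, matching the example.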
We next consider ways to overcome, or at least reduce, free riding in team
situations. One option is to work only with players who have strong reputations for
honesty and cooperation. Another, related, option is to play repeatedly with the
same partners, employing trigger strategies to obtain cooperative behavior.
A third possibility is to try to overcome free riding through contracting. If effort
is observable, the contract would specify that a player would not get paid unless
they put forth the jointly efficient effort – total value would be split among only the
players that put in the efficient effort. But effort is generally not measurable and
observable, only value is. In that case, the contract could only be contingent on the
total value produced. Imagine a contract that specifies a base pay level for everyone
and a bonus which is paid only if total value equals or exceeds what is expected if
everyone puts in the efficient effort. The strongest form of such a contract would be
to pay everyone 0 unless value meets the target, in which case the value is divided
among the team members. What if just one player free‐rides? If the contract is
enforced, no one in the group gets paid the bonus. So, no one has any incentive to
enforce the contract; they would rather be paid a share of the bonus than none of it.
Since everyone knows that no one has any incentive to enforce the contract, such a
contract cannot prevent free riding. The difficulties of overcoming free riding through
contracting are even worse when effort is not only unobservable, but in addition
value is partially determined by random influences outside the team’s control.
The final possibility for overcoming free riding is to form a firm where the
shareholders are different individuals than the members of the production team.
Shareholders are “residual claimants”; that is, they have a claim to any residual
profits the firm has after everyone is paid. In the above contracting example, if one
person slacked off, nobody had any incentive to enforce the contract, which would
deny them all their bonuses. However, shareholders would indeed have the
incentive to enforce such a contract with their workers, because they get to keep the
value of the bonuses if gross value falls short of the target level. This is a very
important reason for the existence of firms – to create a body of shareholders that
has the incentive to enforce incentive contracts with members of the production
team.
This solution is not perfect. Imperfect information can get in the way of
effectively monitoring the team. In that case, writing contracts on observed value
places risk in the agents’ contracts, increasing what they must be paid. It can be
costly to negotiate a contract that everyone is actually willing to sign. Having a CEO
that the production team members, the shareholders, and the shareholders’ board
of directors believe in can be a tremendous advantage in this situation. Suppose the
CEO can observe effort reasonably closely, but, effort cannot be proven in court.
Then, the production team’s contracts can be simple, calling for a given base pay
plus a bonus IF the CEO gives them a strong performance evaluation. As long as
everyone believes that the CEO will do a good job of evaluating effort and will do so
honestly, the production team members will be willing to sign such simple
contracts. As long as the shareholders also believe in the CEO’s ability and honesty,
they are happy to have that sort of contract with their production team, and, to
enforce the contract in court if needed. Suppose a contract is formed that has a base
pay of $50,000, and a bonus pay of $50,000 if the group works hard. The explicit
part of this contract is the amount of money specified; but the contract also has an
implicit part that says it is up to the CEO (or whoever the monitor is) to decide how
hard the group members worked. Thus, having a trustworthy CEO who is skilled in
evaluating performance is critical in getting the group members to sign the contract
while keeping the cost of negotiating and enforcing such contracts low.
Relationship Specific Investments and the Hold Up Problem
A relationship specific investment is an investment made solely, or at least
largely, to facilitate a certain market transaction. It therefore loses all (or at least
much) of its value if the transaction does not occur. Consider an electric company
that builds a plant near a coal mine owned by another company and lays railroad
track to get the coal from the mine to the plant. If the two firms are unable to then
negotiate agreeable terms for coal prices, the investments embodied in the plant
and the track are worthless. Thus, those investments are specific to the relationship
between the electric company and the mining company, and are not valuable outside
of that relationship.
There are three types of relationship specificity. The first is location specificity,
as in the example above. Due to where the asset must be located and the inability to
move it from one location to another, its value depends on a specific relationship.
The second is physical asset specificity. In this case, an asset is produced in a very
specialized way that is not valuable outside of the transaction it was designed to
facilitate. For example, if your firm invests in highly specialized machinery that can
only be used to build parts for a particular jet, this machinery is only useful as long
as you have a contract with the government to work on that jet; if this contract is
rescinded, the machines become useless. The final type is human capital specificity.
Many transactions between firms require managers and other workers of both firms
to get to know the other company’s finances, management structure, production
processes, etc… If the relationship between the firms falls apart, all of this human
capital knowledge becomes useless.
How is it that the specific nature of these investments can lead to market failure
and thus an additional role for firms? Their very specificity means no spot market
exists for them – everything depends on bargaining and negotiation, meaning there
are significant transactions costs and correspondingly there is no large group of
suppliers and consumers determining market prices through their interactions.
Bargaining and negotiation in this circumstance pose a glaring problem. Once
the party that must undertake the relationship specific investment has actually sunk
the investment, the other party KNOWS the investment has little or no value outside
the transaction. As a result, the other party has them “over a barrel,” so to speak,
and will attempt to renegotiate the terms of the agreement, to extract more surplus
for themselves. That is, if a party to a transaction sinks a lot into a relationship
specific investment, they open themselves up to being “held‐up” by the other party
down the road.
For example, suppose Michael wants to procure a measuring device that hasn’t
been designed yet and is very specific to his particular needs. He approaches Susie
and proposes an agreement whereby Susie invests time and money to create and
design these devices, which Michael will then buy from her. Suppose Susie must
invest $2,000 in time and money to create these devices, but after the initial
investment the cost of producing a single device is only $10. Michael agrees to pay
$40 per unit in order to compensate Susie for the initial investment; so Susie agrees.
But what if, after Susie has designed the devices and begun to manufacture them,
Michael approaches her and offers to pay only $11 per unit? It is too late for Susie to
recoup her losses on the initial investment, and since the devices are specific to
Michael’s venture, she cannot sell them to anyone else. So, she will have to agree to a
renegotiated price. Exactly how high that price will be depends on the bargaining
power of the two parties. The important point is the specific nature of the
investment makes it unrecoverable outside of this transaction, which means Susie is
vulnerable to opportunistic behavior on the part of Michael. To protect herself from
this, Susie will tend to under‐invest in product development, since she will have less
to lose. As a result, both parties will be worse off.
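The arithmetic behind Susie's position can be sketched in a few lines of Python. This is only an illustration: the order size of 100 units and the function names are assumptions (the text does not give a quantity), while the $2,000 design cost, the $10 unit cost, and the $40 and $11 prices come from the example.

```python
def profit_before_investing(price, quantity, unit_cost=10.0, design_cost=2000.0):
    """Susie's profit on the whole deal, counting the up-front design cost."""
    return (price - unit_cost) * quantity - design_cost


def profit_after_sinking(price, quantity, unit_cost=10.0):
    """Susie's gain from producing once the design cost is already sunk."""
    return (price - unit_cost) * quantity


def break_even_quantity(price, unit_cost=10.0, design_cost=2000.0):
    """Units Susie must sell at this price to recover the design cost."""
    return design_cost / (price - unit_cost)


quantity = 100  # hypothetical order size, not given in the text

for price in (40.0, 11.0):
    print(
        price,
        profit_before_investing(price, quantity),
        profit_after_sinking(price, quantity),
        break_even_quantity(price),
    )
```

At the renegotiated price, producing still beats walking away once the design cost is sunk, which is why Susie gives in; counted from the start, though, the deal loses money at this order size. Foreseeing that is what makes her reluctant to invest in the first place.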
With relationship specificity, unless there is an explicit and readily enforceable
contract in place, the party making the specific investment can be “held up” and
pressured for renegotiation by the other party. Because the party that is making the
investment can predict this type of opportunistic behavior, they will tend to sink
less into the initial investment. This reduces value added, and potentially makes
both parties worse off compared to the jointly efficient solution. Theoretically, both
parties may need to sink resources into relationship specific investments, and each can
then try to exploit the other, meaning both parties may under‐invest in the
relationship.
Model
To model this situation, we assume that there is a single transaction between a
buyer and a seller, each of whom must make specific investments to facilitate the
transaction. We further assume that any negotiation or renegotiation will result in
an even split of the surplus which is at stake in the negotiation. We make that
assumption to keep things simple. The actual split will depend on the bargaining
power of the parties. But, different splits will not change the basic result, nor would
more complex bargaining models. We also define the following notation.
I: The amount of the relationship specific investment; the seller’s investment
is IS and the buyer’s investment is IB.
V(IS,IB): The value generated by the transaction, before subtracting the costs
of the investment, as a function of the investments made by the seller and the
buyer.
Total surplus, or value added, is the value generated by the transaction minus
the costs of the investment. In this case, it is
\[ S_{TOT} = V(I_S, I_B) - I_S - I_B . \]
If both parties cooperate, they will want to maximize total surplus. Maximizing with
respect to the seller’s investment we obtain
\[ \frac{\partial S_{TOT}}{\partial I_S} = \frac{\partial V}{\partial I_S} - 1 = 0 . \]
This just says that the seller should continue to invest until the last dollar he invests
generates exactly one dollar of value. Maximizing with respect to the buyer’s
investment gives a similar result:
\[ \frac{\partial S_{TOT}}{\partial I_B} = \frac{\partial V}{\partial I_B} - 1 = 0 . \]
Working together, both buyer and seller should invest until the marginal benefit of
investment equals the marginal cost of investment. We denote those jointly efficient
investment levels IS* and IB*.
What if both parties anticipate renegotiation after the investment is sunk (the
hold‐up problem)? Then each party would want to maximize their own individual
surplus, as opposed to maximizing total surplus. Remember, we have assumed that
any later renegotiation will result in each party keeping half of what is at stake in
the negotiation. Then, the buyer’s individual surplus is the share of the value he
would take away from the renegotiation less his investment, or
\[ S_B = \frac{1}{2} V(I_S, I_B) - I_B . \]
Maximizing with respect to the buyer’s investment gives:
\[ \frac{\partial S_B}{\partial I_B} = \frac{1}{2} MB_B - 1 = 0 , \qquad MB_B \equiv \frac{\partial V}{\partial I_B} . \]
This means the buyer will only invest up to the point where half the marginal
benefit of his investment equals his marginal cost; in other words, he will under‐
invest. The same is true for the supplier.
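To see the size of the under‐investment in a concrete case, here is a minimal Python sketch. The value function V = 4√IS + 4√IB, the seller's investment of 4, and the function names are all assumptions made for illustration; the model above leaves V general. The sketch searches a grid for the buyer's investment that maximizes share × V − IB, first with the buyer weighing the full marginal benefit (share = 1, which selects the same investment as maximizing total surplus, because the seller's cost is a constant from the buyer's point of view) and then with the even split (share = ½).

```python
import math


def value(i_s, i_b, scale=4.0):
    """Hypothetical value function V(I_S, I_B) = scale*sqrt(I_S) + scale*sqrt(I_B)."""
    return scale * math.sqrt(i_s) + scale * math.sqrt(i_b)


def best_buyer_investment(share, i_s, grid_max=10.0, steps=10000):
    """Grid-search the I_B that maximizes share * V(I_S, I_B) - I_B."""
    best_i, best_payoff = 0.0, float("-inf")
    for k in range(steps + 1):
        i_b = grid_max * k / steps
        payoff = share * value(i_s, i_b) - i_b
        if payoff > best_payoff:
            best_i, best_payoff = i_b, payoff
    return best_i


seller_investment = 4.0  # assumed: the seller invests at the efficient level

efficient = best_buyer_investment(share=1.0, i_s=seller_investment)
hold_up = best_buyer_investment(share=0.5, i_s=seller_investment)
print(efficient, hold_up)
```

Under this assumed V the efficient investment is about 4 and the investment under anticipated hold‐up is about 1, one quarter as much. Because the assumed V is additively separable, the buyer's choice here does not depend on the seller's; with complementary investments, each party's under‐investment would typically reduce the other's incentive to invest as well.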
In the figure, we show the marginal benefit of the buyer’s investment assuming the
seller makes the jointly efficient level of investment. We can see that the level of
investment put in if the buyer maximizes his own surplus (ÎB) is less than the level of
investment put in if the buyer maximizes total surplus (IB*).
[Figure: MBB and ½MBB plotted against investment I, with marginal cost equal to 1;
½MBB reaches 1 at ÎB and MBB reaches 1 at IB*, with ÎB < IB*.]
Assuming both the buyer and seller maximize their own individual surplus, solving the
first order condition above will give the buyer’s investment as a function of the
seller’s choice (the buyer’s reaction …
[Figure: the buyer’s and seller’s reaction curves, RB and RS, intersecting at NE.]
further research reveals that you no longer have any need for the devices that you
originally contracted for, you may wish to change the terms for reasons that have
nothing to do with opportunistic behavior; but, the contract has to be airtight to
properly solve the hold‐up problem, so you’re stuck. The question then becomes
balancing the marginal benefit of a longer contract with the marginal cost of a longer
contract. If avoiding renegotiation were critical, a longer contract would be
preferred; but if you had to have multiple contingencies built into the contract for
every possible future outcome, the cost of writing the contract increases with its
length and thus a shorter contract would be preferred.
If neither of these methods solves the hold‐up problem, the only other
alternative is to vertically integrate and produce the inputs yourself. This creates an
upstream firm that takes raw materials and converts them into parts, which the
downstream firm then buys and converts into the final product. One problem that
results from this is that the new firm loses the specialization it had. Since the firm
now has several areas of production, management is required to process much more
information in order to efficiently run the firm. Another issue that may emerge is an
internal form of the hold‐up problem; that is, if there is relationship specific
investment going on between the upstream division and the downstream division,
the divisions now have an incentive to try to take advantage of one another. However,
there is no worry about enforcing contracts between the divisions in court; the CEO
assumes this role. So, assuming you have a trustworthy and knowledgeable CEO
who can make good decisions with the entire firm’s interest in mind, it is enough for
the CEO to directly manage the divisions’ actions and hold both divisions to the
firm’s internal agreements.
Double Marginalization
The third and final type of spot market failure leading to a role for firms which
we will consider is known as double marginalization. It occurs when multiple firms
in a supply chain have significant market power. Each adds their own profit margin,
resulting in a final price higher than what would be charged if all the firms jointly
chose prices to maximize profits. Profits are lower in total, final prices are higher,
and final output and consumer surplus are lower.
Consider the simplest possible supply chain ‐ a single upstream firm that
produces parts for a single downstream firm, which then buys these parts and
produces the final product for the end consumer (both are monopolists). Assume
the downstream firm needs one part from the upstream firm to produce one unit of
the final product. An example of this would be a firm that manufactures cars, where
the upstream firm produces engines for the downstream firm, and the downstream
firm uses one engine to make one car. Then, the total cost of making q cars is the
upstream cost of producing q engines, plus the downstream cost of using the
engines to produce q cars, or
\[ C_{TOT}(q) = C_U(q) + C_D(q) \]
and the total marginal cost is
\[ MC_{TOT} = MC_U + MC_D . \]
To maximize total profit, we want the marginal revenue of our final product to be
equal to the marginal cost of the product. Since the downstream firm sells the final
product, MRD is the marginal revenue. Thus, to maximize total profit,
\[ MR_D = MC_{TOT} . \]
Now, consider the upstream firm. They are selling products to the downstream
firm, and assume that they are the only supplier; thus, they are a monopoly. If they
maximize their own profit, they set their marginal revenue equal to their marginal
cost:
\[ MR_U = MC_U . \]
Since they are a monopoly, the price charged exceeds their marginal cost:
\[ p_U > MC_U . \]
What now is the effective marginal cost to the downstream firm? The
downstream firm is buying products from the upstream firm at price pU, and
assembling them into end products at a cost of MCD; so their effective marginal cost
is
\[ MC_D' = MC_D + p_U . \]
If the downstream firm maximizes their own profit, they set their perceived
marginal cost equal to their marginal revenue:
\[ MR_D = MC_D' = MC_D + p_U . \]
Observing that pU > MCU, we get
\[ MR_D > MC_U + MC_D . \]
This says marginal revenue from the last unit sold is higher than the total
marginal production cost. This is because the upstream firm maximizes profit by
charging a price that’s higher than their marginal cost, and so does the downstream
firm. In effect, the downstream firm is buying a product that’s already been marked
up, and then marking it up again upon selling it to the end customer ‐ which is
where the term double marginalization comes from. What this means for the final
price of the downstream firm’s end product is that it will be higher than the price
that maximizes total profits.
If we look at the downstream firm’s profit function, they face demand
represented by p(q) and have costs of production CD(q) as well as the costs of buying
the upstream firm’s components at pU; so, their profit function is
\[ \pi_D = p(q)\,q - p_U q - C_D(q) . \]
Maximizing we get
\[ MR - p_U - MC_D = 0 . \]
If we solve for pU we get
\[ p_U = MR - MC_D . \]
Thus, we have a function that describes the price that the upstream firm can charge
the downstream firm; in other words, we have an inverse demand function for the
upstream firm. This says that the highest price the upstream firm can charge is the
difference between the downstream firm’s marginal revenue and their marginal
cost.
The figure illustrates this scenario graphically. The upstream firm maximizes profit,
setting MRU equal to MCU. Thus, they sell q components to the downstream firm at
a price of pU, which is determined by their demand curve obtained from the
previous discussion (pU = MR – MCD). The downstream firm’s effective marginal cost
is the price they pay for the component plus their marginal cost of production
(pU + MCD), and when they maximize profit they set their effective marginal cost
equal to their marginal revenue. This gives q final products sold at a price of pD.
[Figure: final demand p(q) and marginal revenue MR; the upstream firm’s demand
pU = MR – MCD and marginal revenue MRU; the cost curves MCU, MCD, and pU + MCD;
the resulting quantity q and prices pU and pD.]
Now suppose these two firms vertically integrated, thereby avoiding the double
marginalization problem. Then, the total marginal cost of producing the end product
is just MCU + MCD. The situation would look like the following figure. The firm
maximizes profit by setting MR equal to MCTOT, and ends up with a quantity of q*
units at a price of p*.
[Figure: demand p(q), marginal revenue MR, the curves MCU, MCD, pU + MCD, MRU, and
pU = MR – MCD, with MCTOT = MCU + MCD added; MR = MCTOT gives quantity q* > q and
price p* < pD.]
So, we can see through the two graphs that firms acting individually add two
mark‐ups to the same product, and end up charging a price that’s higher than the
price that maximizes profit.
Example
… and solving for pU we obtain the upstream firm’s inverse demand curve:
\[ p_U = 10.2 - \frac{q}{2} . \]
Now, the upstream firm will maximize their profit. Their profit function is
\[ \pi_U = \left( 10.2 - \frac{q}{2} \right) q - 0.2q . \]
Maximizing we find the following.
\[ \frac{d\pi_U}{dq} = 10.2 - q - 0.2 = 0 \]
\[ q = 10 \]
If the firms instead cooperate, total profit is
\[ \pi = \left( 11 - \frac{q}{4} \right) q - 0.8q - 0.2q . \]
Maximizing gives the following solution.
\[ \frac{d\pi}{dq} = 11 - \frac{q}{2} - 1 = 0 \]
\[ q = 20 \]
\[ p = 11 - \frac{20}{4} = 6 \]
\[ \pi_{TOT} = (6 - 1)\,20 = 100 \]
So, prices are lower, quantity is higher, and total profits are higher when the firms
cooperate than when they operate independently.
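The two cases in the example can be checked numerically. The following Python sketch assumes linear final demand p = a − bq and constant marginal costs, which matches the numbers in the example (p = 11 − q/4, MCD = 0.8, MCU = 0.2); the function and variable names are illustrative choices, not notation from the text.

```python
def independent_outcome(a, b, mc_u, mc_d):
    """Upstream and downstream monopolists each maximize their own profit.

    Final demand is p = a - b*q, so downstream marginal revenue is a - 2*b*q.
    The downstream first-order condition gives the upstream firm's inverse
    demand p_U = a - 2*b*q - mc_d; the upstream firm then sets its own
    marginal revenue, a - mc_d - 4*b*q, equal to mc_u.
    """
    q = (a - mc_d - mc_u) / (4 * b)
    p_u = a - 2 * b * q - mc_d
    p_d = a - b * q
    profit_u = (p_u - mc_u) * q
    profit_d = (p_d - p_u - mc_d) * q
    return {"q": q, "p_u": p_u, "p_d": p_d, "total_profit": profit_u + profit_d}


def integrated_outcome(a, b, mc_u, mc_d):
    """A single firm sets marginal revenue a - 2*b*q equal to mc_u + mc_d."""
    q = (a - mc_u - mc_d) / (2 * b)
    p = a - b * q
    return {"q": q, "p": p, "total_profit": (p - mc_u - mc_d) * q}


# Numbers from the example: p = 11 - q/4, MC_D = 0.8, MC_U = 0.2.
separate = independent_outcome(a=11, b=0.25, mc_u=0.2, mc_d=0.8)
merged = integrated_outcome(a=11, b=0.25, mc_u=0.2, mc_d=0.8)
print(separate)
print(merged)
```

With these numbers the independent firms sell 10 units at a final price of 8.5 (with pU = 5.2) and earn a combined profit of 75, while the cooperating firms sell 20 units at a price of 6 and earn 100.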
Just as with the other problems considered above, there are several potential
ways to deal with double marginalization. One is repeated interaction. We know
from our study of game theory that where cooperation may not be possible in a one‐
shot game, it may be possible in a repeated game. But, cheating is always a
possibility for one of the players, so this is not an end‐all solution to the problem.
Another is to form a contract that specifies what the price will be between the two
firms. But, as we know, contracts don’t always work. Plus, being tied down to a
long‐term contract can sometimes hurt the parties more than it benefits them.
The final way to deal with this issue is to merge the two firms into a single firm.
This is not without its limitations; after forming the firm, you now have two
different divisions, and if the managers of each of these divisions are paid based on
how well their division performs, they will be incentivized to increase the profit of
their division, rather than the entire firm. But, if you are able to have a CEO that can
ensure that the price the upstream firm charges the downstream firm is just the
upstream firm’s marginal cost (that is, pU = MCU), then the issue will be resolved.
This is known as transfer pricing, and the idea is that within a firm, the transfer
prices should reflect marginal costs as opposed to some other value that is being
marked up.
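A short Python sketch can illustrate why a transfer price equal to marginal cost resolves the problem. It reuses the numbers from the chapter's example (p = 11 − q/4, MCD = 0.8, MCU = 0.2) and assumes the downstream division chooses quantity to maximize its own divisional profit; the function names and the marked‐up comparison price of 5.2 (the price an independent upstream monopolist would charge under these numbers) are illustrative assumptions.

```python
def downstream_quantity(transfer_price, a=11.0, b=0.25, mc_d=0.8):
    """Quantity a downstream division picks when maximizing its own profit.

    It sets marginal revenue a - 2*b*q equal to mc_d + transfer_price.
    """
    return (a - mc_d - transfer_price) / (2 * b)


def firm_profit(q, a=11.0, b=0.25, mc_u=0.2, mc_d=0.8):
    """Profit of the whole firm; the internal transfer payment cancels out."""
    return (a - b * q - mc_u - mc_d) * q


results = {}
for transfer_price in (0.2, 5.2):  # marginal cost versus a marked-up price
    q = downstream_quantity(transfer_price)
    results[transfer_price] = (q, firm_profit(q))
    print(transfer_price, q, firm_profit(q))
```

With the transfer price set at the upstream marginal cost of 0.2, the downstream division chooses 20 units and the firm earns 100, the jointly efficient outcome. With the marked‐up transfer price it chooses only 10 units and firm profit falls to 75.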
The Role of the Firm
In summary, creating a firm is simply a last resort that can be used to avoid
problems encountered when trying to procure inputs from either the spot market or
through contracting. The firm is a collection of agreements between agents
(management and workers) and principals (stockholders and management) that
defines who is the final arbiter of disputes. The reason a firm is a last resort is
because by creating a firm, you lose specialization and division of labor.
Nevertheless, the firm is a unique solution to problems inherent in dealing with
complex contracting environments, unique products, transactions costs, and
information problems. It targets areas of a free market economy that don’t function
as they are supposed to, internalizes the inefficiencies, solves them using a
centralized structure, and creates residual claimants to enforce agreements and
profit from them. This is summarized in the figure below.
[Figure: decision flowchart]
1. High transactions costs? Team production? Large specific investments? Too much
   market power up or down stream?
   No: use the spot market.
   Yes: go to question 2.
2. Complex contracting environment? High degree of uncertainty? Unenforceable
   contracts?
   No: use a contract.
   Yes: integrate the activity into the firm.
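The logic of the figure can also be written as a small decision function. The Python sketch below is one reading of the flowchart, not a rule stated in the text: it assumes that a "yes" to any question in a box counts as a "yes" for that box, and the class, field, and function names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    """Answers to the flowchart's questions for one activity (hypothetical fields)."""
    high_transactions_costs: bool = False
    team_production: bool = False
    large_specific_investments: bool = False
    excess_market_power: bool = False
    complex_contracting: bool = False
    high_uncertainty: bool = False
    unenforceable_contracts: bool = False


def procurement_mode(activity: Activity) -> str:
    """Walk the two boxes of the flowchart in order."""
    spot_market_fails = any([
        activity.high_transactions_costs,
        activity.team_production,
        activity.large_specific_investments,
        activity.excess_market_power,
    ])
    if not spot_market_fails:
        return "spot market"

    contracting_fails = any([
        activity.complex_contracting,
        activity.high_uncertainty,
        activity.unenforceable_contracts,
    ])
    if not contracting_fails:
        return "contract"
    return "integrate into the firm"


print(procurement_mode(Activity()))
print(procurement_mode(Activity(team_production=True)))
print(procurement_mode(Activity(large_specific_investments=True,
                                unenforceable_contracts=True)))
```

The ordering mirrors the chapter's argument: integration is reached only after both the spot market and contracting have been ruled out.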
Chapter 21 Terminology
The following is a list of terms that you should know in order to discuss and
apply the material from this chapter.
Spot Market: A free market economy in its purest sense. People go to this market to
determine the market price of whatever input they need and buy as many of them as
they require. It functions well as long as there is perfect information, products are
standardized, and transaction costs are low.
Free Riding: A problem that occurs in team production when a team member fails
to work hard, reducing his individual costs but receiving the same amount of benefit
from everyone else’s hard work. This can stem from a team production situation in
which individual team members bear all the costs of an action but only receive a
small share of the benefit.
Relationship‐Specific Investment: An investment made solely for the purpose of
facilitating a certain market transaction, which loses all of its value if the transaction
does not occur. Three types include geographic location, physical asset, and human
capital.
Hold‐Up Problem: A problem that occurs because the party making the relationship‐specific
investment is subject to opportunistic behavior by the other party in the
contract. Unless there is a very rigid and explicit contract in place that can be
enforced, the party making the investment can be “held up” and pressured for
renegotiation by the other party. Because the party that is making the investment
can predict this type of behavior, they will tend to sink less money into the initial
investment, causing the transaction to have less value than it otherwise could.
Vertical Integration: A strategy in which a firm uses a hierarchical structure to
produce all parts of a product, each part at a different level of the hierarchy, instead
of buying them from the spot market or other firms. This occurs when a
downstream and an upstream firm merge.
Double Marginalization: A problem that occurs when both a downstream and an
upstream firm (unmerged) mark up price over marginal cost. This causes the price paid
by consumers to be higher than the price that maximizes the firms’ total profit.