
Econometrica, Vol. 74, No. 2 (March, 2006), 499-519

NOTES AND COMMENTS

EFFICIENCY IN REPEATED GAMES REVISITED: THE ROLE OF PRIVATE STRATEGIES
BY MICHIHIRO KANDORI AND ICHIRO OBARA¹

Most theoretical or applied research on repeated games with imperfect monitoring has focused on public strategies: strategies that depend solely on the history of publicly observable signals. This paper sheds light on the role of private strategies: strategies that depend not only on public signals, but also on players' own actions in the past. Our main finding is that players can sometimes make better use of information by using private strategies and that efficiency in repeated games can be improved. Our private strategy for repeated prisoners' dilemma games consists of two states and has the property that each player's optimal strategy is independent of the other player's state.

KEYWORDS: Efficiency, imperfect public monitoring, mixed strategy, partnership game, private equilibrium, private strategy, repeated game, two-state machine.

1. INTRODUCTION

THE THEORY OF REPEATED GAMES under imperfect public monitoring provides a formal framework with which to explore the possibility of cooperation in long-term relationships, where each agent's action is indirectly and imperfectly observed through a public signal. It has been successfully applied to a number of economic problems: cartel enforcement, internal labor markets, and international policy coordination, to name a few. However, almost all existing works (including Abreu, Pearce, and Stacchetti (1990) and Fudenberg, Levine, and Maskin (1994)) focus on a simple class of strategies known as public strategies. In this paper, we show how players can make better use of information by using a more general class of strategies, private strategies, and show that efficiency in repeated games can often be improved.

A public strategy specifies a current action conditional only on the history of the public signal. A private strategy, by contrast, depends on one's own actions in the past as well as on the history of the public signal. To see why this generalization helps to improve efficiency, consider a simple repeated partnership game with two actions {C, D} and two outcomes of the public signal {good, bad}, where the stage game payoffs have the structure of the prisoners' dilemma. Assume that the signal is rather insensitive to a deviation at the cooperative point: Pr(bad|C, D) = Pr(bad|D, C) = Pr(bad|C, C) + ε.
¹We are grateful to a co-editor and two anonymous referees for their thoughtful comments. We also thank Drew Fudenberg and David Levine for an informative discussion. This paper stems from two independent papers: "Check Your Partner's Behavior by Randomization" by Kandori (1999) and "Private Strategy and Efficiency: Repeated Partnership Game Revisited" by Obara (1999). An earlier version of this paper is included in Chapter 1 of Obara's (2001) doctoral thesis. Obara is grateful to George Mailath and Andrew Postlewaite for their advice and encouragement throughout this project.


When ε is small, it is easy to show that there is no trigger-strategy equilibrium to support (C, C) by a mutual punishment. (A formal proof can be found in Example 1 (Section 3.3).) However, suppose that the signal is quite sensitive to one's deviation when the opponent is playing D: Pr(bad|D, C) = Pr(bad|C, D) << Pr(bad|D, D). Now consider a trigger strategy to support a point that mixes C and D. Although mixing in D causes inefficiency in the stage game payoffs, the public signal becomes more sensitive to deviations, and there is a possibility that some cooperation can be attained by such a trigger-strategy equilibrium. Note that because this trigger-strategy equilibrium uses only public strategies, punishment always occurs after the public outcome bad. However, we can further improve efficiency by using a private strategy equilibrium, where each player is punished only when the signal is most sensitive to her deviation, i.e., when the outcome is bad and the opponent chose D. Note the dependence of the punishment on the opponent's action. This is possible only when a private strategy is employed. Our first contribution is to point out this informational advantage of private strategies.

Our second contribution is to find a method by which to construct a private equilibrium that formalizes the idea above. Note that, with private strategies, each player may not know the other player's continuation strategy because it depends on unobservable past actions. Hence, players have to compute their beliefs (by Bayes' rule) about what the opponent is going to do, and the computation generally becomes fairly complex over time.² Our private strategy, which overcomes this difficulty, utilizes two states: a reward state and a punishment state. An action (possibly mixed) is played at each state, and the transition probability between the states depends on the realized action-signal pair. As we argued above, at each point in time, each player may not know the other player's state or which continuation strategy is being used by the other player. The trick is to choose the right mixed actions and the right transition probabilities to make one's incentives identical no matter which state the other player is in. Such a construction makes one's belief about the other player's state irrelevant; one's action is optimal independent of her belief. A similar idea first appeared in Piccione (2002) in repeated games with private monitoring. A simpler construction with the two-state strategy was independently found by Ely and Välimäki (2002) for repeated games with private monitoring, and by Obara (1999), on which this paper is based.
2. THE MODEL


In this paper, we study the following simple repeated partnership game. The action set for each player is {C, D} and actions are not directly observable.
²The same difficulty is found in repeated games with private monitoring. See Kandori (2002).


There is a public signal ω that can be either X or Y. We assume that the stage game expected payoffs have the prisoners' dilemma structure

                 C              D
      C        1, 1         -h, 1 + d
      D     1 + d, -h         0, 0

where d, h > 0 (D is dominant) and d - h < 1 ((C, C) is efficient).³ We also assume that (i) the signaling structure is symmetric (p(ω|C, D) = p(ω|D, C)) and (ii) defection at (C, C) makes X more likely (p(X|CC) < p(X|DC)). This game is repeated over time. The players share the same discount factor δ ∈ (0, 1). This is a simplified version of the model examined by Radner, Myerson, and Maskin (1986) (RMM from now on).⁴

A public strategy specifies a current action conditional solely on the history of public signals. A sequential equilibrium in public strategies is called a perfect public equilibrium (PPE).⁵ A private strategy is a strategy where the current action depends on the history of public signals and past private actions. We call a sequential equilibrium in private strategies a private equilibrium (PE).⁶ As shown by RMM, the folk theorem fails for PPE in this example because the space of the public signal is too small.
3. ADVANTAGE OF PRIVATE STRATEGIES

3.1. Upper Bound of PPE

First we present an analysis that helps to determine an upper bound of PPE payoffs. Our analysis is a generalization of RMM (1986) and Abreu, Milgrom, and Pearce (1991) (AMP from now on). The main difference is that we consider mixed public strategies. We show that the best PPE is sometimes in mixed strategies.⁷ The analysis also allows us to gain new insight into a general reason why private strategies can outperform public strategies.
³We assume that each player's realized payoff is a function of the public signal and her own action, so that the realized payoff does not reveal the opponent's action.
⁴The action set is a continuum in Radner, Myerson, and Maskin (1986).
⁵Deviations to nonpublic strategies are allowed.
⁶Note that it would be without loss of generality to restrict attention to public strategies if we were to consider only pure strategies (Abreu, Pearce, and Stacchetti (1990)) and the signal distribution had a full support. For any pure strategy, there exists a payoff equivalent public pure strategy. Thus the set of pure strategy Nash equilibrium payoffs and the set of public pure strategy Nash equilibrium payoffs coincide. Furthermore, with the assumption of full support, the former coincides with the set of pure strategy sequential equilibrium payoffs and the latter coincides with the set of pure strategy PPE payoffs.
⁷Fudenberg, Levine, and Maskin (1994) show that mixed strategies can be useful in discriminating different players' deviations (i.e., to make their pairwise full rank condition satisfied) around any efficient action profile. The present analysis reveals another reason why mixing helps to improve efficiency.


Let us represent a mixed action of player i by the probability of playing D, which is denoted by q_i. Let Q be the set of mixed profiles (q₁, q₂), where (i) each player plays C with a positive probability and (ii) playing D with probability 1 makes signal X more likely. Note that Q contains the efficient point (0, 0) (where (C, C) is played for sure) and its neighborhood. In our examples, the equilibria whose first-period action profile is in Q play a major role in determining a relevant upper bound of PPE payoffs.

When an action profile in Q is played in an equilibrium, each player has an incentive to play D to increase the current payoff. This short-term gain should be wiped out by future payoff reductions. Then condition (ii) implies that both players should be punished when X is observed. This is the well-known cause of inefficiency under imperfect monitoring: when there is no way to tell which player deviates, inefficient punishment has to occur on the equilibrium path.

Suppose that an action profile in Q is played in the most efficient PPE that maximizes the sum of the two players' payoffs. To characterize the associated efficient average equilibrium payoff for player i, denoted v_i, let us introduce some notation. Let g(a_i, q_j) be player i's expected payoff when player i chooses a_i = C, D and the opponent plays mixed action q_j. We employ a similar notation for the signal distributions. Recall also that the average payoff is defined to be (1 - δ) times the discounted sum of the stage game payoffs. Then the efficient average equilibrium payoff v_i satisfies

(1)  v_i = (1 - δ)g(C, q_j) + δ(v_i' - p(X|C, q_j)Δv_i)

and

(2)  v_i ≥ (1 - δ)g(D, q_j) + δ(v_i' - p(X|D, q_j)Δv_i),

where v_i' is the continuation payoff after Y and Δv_i is the (expected) additional future payoff reduction for player i that accrues when X is observed. If q_i mixes C and D, those actions must provide an equal payoff, so that (2) is satisfied with equality. Since v₁' + v₂' ≤ v₁ + v₂ by definition, we obtain, from (1) and (2), the expression

(3)  v₁ + v₂ ≤ g(C, q₂) + g(C, q₁) - d(q₂)/(L(q₂) - 1) - d(q₁)/(L(q₁) - 1),

where d(q_j) = g(D, q_j) - g(C, q_j) is the gain from deviation and

(4)  L(q_j) = p(X|D, q_j)/p(X|C, q_j) = [p(X|D, C)(1 - q_j) + p(X|D, D)q_j] / [p(X|C, C)(1 - q_j) + p(X|C, D)q_j]

is the likelihood ratio that measures the degree of observability of a deviation. Inequality (3) directly implies the following lemma:


LEMMA 1: Suppose the efficient PPE that maximizes v₁ + v₂ plays q ∈ Q in the first period.⁸ The maximum total payoff v₁ + v₂ is bounded above by 2v*, where v* is defined by

(5)  v* = max_{q∈[0,1]} [g(C, q) - d(q)/(L(q) - 1)].

We denote the maximizer of the right-hand side of (5) by q*. Provided that v* > 0 and δ is large enough, the upper bound (5) is exactly achieved by the following symmetric trigger-strategy equilibrium: (a) (q*, q*) is played in the first period, (b) permanent reversion to (D, D) is triggered with a certain probability p after observing X, and (c) p is chosen to satisfy the incentive constraint (2) with equality. Note that we have v_i = v_i' with this strategy profile and that (2) binds with equality. Hence the upper bound formula (3) is exactly achieved. When q* = 0, (5) reduces to AMP's (1991) formula for the best pure trigger-strategy equilibrium payoff.
Our formula (5) clarifies why mixed (public) strategies may help to improve efficiency. We can interpret g(C, q) = 1 - q - hq as the stage game payoff to be sustained and the last term d(q)/(L(q) - 1) = ((1 - q)d + qh)/(L(q) - 1) as the efficiency loss associated with the inefficient punishment. While taking the inefficient action D with a larger probability q reduces the stage game payoff (g(C, q)), it may improve the quality of monitoring (increase L(q)) and reduce the inefficiency associated with the punishment (it may or may not reduce the deviation gain d(q)). Thus q* may not be 0 in general. That is, a mixed trigger strategy may achieve a better outcome than the pure one. When D makes the signal more informative, however, we may further improve efficiency by a private strategy that imposes a penalty only after D is played. In what follows, we construct private equilibria that take advantage of this potential source of efficiency.
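As a concrete illustration of formula (5), the following minimal sketch evaluates the bound v* and its maximizer q* by a grid search. The signal probabilities p0 = p(X|CC), p1 = p(X|CD) = p(X|DC), p2 = p(X|DD) and the payoff parameters d and h below are illustrative assumptions, not values taken from the paper.

```python
# Sketch: evaluate the PPE payoff upper bound v* of formula (5) by grid search.
# All parameter values below are illustrative assumptions, not from the paper.
p0, p1, p2 = 0.3, 0.4, 0.8   # p(X|CC), p(X|CD) = p(X|DC), p(X|DD)
d, h = 0.3, 0.4              # deviation gain and sucker's loss

def L(q):
    """Likelihood ratio (4): p(X|D, q) / p(X|C, q) against a mixed action q."""
    return ((1 - q) * p1 + q * p2) / ((1 - q) * p0 + q * p1)

def bound(q):
    """g(C, q) minus the efficiency loss term d(q) / (L(q) - 1)."""
    g_C = 1 - q - h * q          # stage payoff from C against mixed action q
    d_q = (1 - q) * d + q * h    # deviation gain d(q) = g(D, q) - g(C, q)
    return g_C - d_q / (L(q) - 1)

qs = [i / 10000 for i in range(10001)]
q_star = max(qs, key=bound)
print(q_star, bound(q_star))     # maximizer q* and the bound v*
```

With these particular numbers the maximizer turns out to be interior (q* ≈ 0.08, with v* ≈ 0.12 versus 0.10 at q = 0), so the mixed trigger strategy strictly improves on the pure one, exactly as the discussion above suggests.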

3.2. An Example of Efficient Private Equilibrium

We first present a special case of the model above, where private equilibria asymptotically achieve full efficiency, while the PPE payoffs are bounded away from the efficient frontier. Let us assume 0 < p(Y|C, C) < 1, 0 < p(Y|D, D) < 1, and p(Y|C, D) = p(Y|D, C) = 0. Consider the following private strategy, which starts at phase (I) below.

(I) Mix C and D. Choose D with a (small) probability q ∈ (0, 1).
⁸The most efficient PPE may not use an action profile in Q in the first period. Corollary 1 addresses this issue and derives an upper bound in a special class of our model.


(II) If the signal is Y and one's own action was D, play D forever. Otherwise, go to (I).

Note that, under the assumption that p(Y|C, D) = p(Y|D, C) = 0, when a player chose D and observes Y, it is common knowledge that the other player also chose D (and, of course, observes Y). The symmetric equilibrium payoff v satisfies

(6)  v = (1 - δ)(1 - q - qh) + δv

and

(7)  v = (1 - δ)(1 - q)(1 + d) + δ{1 - qp(Y|D, D)}v.

Equation (6) represents the average payoff when a player plays C today (while the opponent is employing the above strategy). Note that punishment is surely avoided in this case. Equation (7) shows the average payoff when the player chooses D today. In this case, punishment is triggered when the opponent also chooses D and the signal is Y, which happens with probability qp(Y|D, D). Equations (6) and (7), taken together, imply that the player is indifferent between choosing C and D. From (6), we have

(8)  v = 1 - q - qh.

Also, by (6) and (7) we obtain (1 - δ){(1 - q)d + qh} = δqp(Y|D, D)v. This and (8) result in a quadratic equation in q, (1 - δ){(h - d)q + d} = δqp(Y|D, D)(1 - q - qh). Direct calculation shows that there is a root of this equation in (0, 1), which tends to 0 as δ → 1. Equation (8) then implies that, as q tends to 0, the average payoff tends to 1, the payoff from full cooperation.

Now we show that all the PPE payoffs are bounded away from (1, 1). Consider the most efficient PPE (v₁⁰, v₂⁰) that maximizes v₁ + v₂, and let us define V(Q) = {(g₁(q), g₂(q)) | q ∈ Q}. Note that V(Q) contains all feasible points sufficiently close to (1, 1). Suppose that (v₁⁰, v₂⁰) lies in this neighborhood of (1, 1). Now consider the current and continuation payoffs associated with the most efficient PPE:

(v₁⁰, v₂⁰) = (1 - δ)(g₁⁰, g₂⁰) + δ(v₁', v₂').

Since v₁' + v₂' ≤ v₁⁰ + v₂⁰ by definition, we have g₁⁰ + g₂⁰ ≥ v₁⁰ + v₂⁰. The last inequality implies that (g₁⁰, g₂⁰) also lies in the neighborhood of (1, 1), and hence in V(Q). In other words, the most efficient PPE plays a profile q ∈ Q in the first period. Then, the payoff upper bound (5) applies, which contradicts our premise that v₁⁰ + v₂⁰ is arbitrarily close to 1 + 1. Hence we obtain the following result.

PROPOSITION 1: In the game defined above, there is a private equilibrium that asymptotically attains the efficient point (1, 1) as δ → 1, while any perfect public equilibrium payoff profile is bounded away from (1, 1) for all δ.
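To see Proposition 1 at work numerically, the sketch below solves the quadratic (1 - δ){(h - d)q + d} = δqp(Y|D, D)(1 - q - qh) for its root in (0, 1) and reports the payoff v = 1 - q - qh from (8). The parameter values are illustrative assumptions, not values from the paper.

```python
# Sketch: the private equilibrium of Section 3.2.  Solve the quadratic
#   (1 - delta){(h - d)q + d} = delta * q * p * (1 - q - q*h),  p = p(Y|D,D),
# for the root that vanishes as delta -> 1, then report v = 1 - q - q*h.
# Parameter values are illustrative assumptions, not from the paper.
import math

d, h = 0.3, 0.4      # stage game parameters
p = 0.5              # p(Y|D, D); recall p(Y|C, D) = p(Y|D, C) = 0 here

def q_root(delta):
    # Rearranged: delta*p*(1+h) q^2 + [(1-delta)(h-d) - delta*p] q + (1-delta)d = 0
    a = delta * p * (1 + h)
    b = (1 - delta) * (h - d) - delta * p
    c = (1 - delta) * d
    return (-b - math.sqrt(b * b - 4 * a * c)) / (2 * a)   # smaller root

for delta in (0.90, 0.99, 0.999):
    q = q_root(delta)
    print(delta, q, 1 - q - q * h)   # q -> 0 and the payoff -> 1 as delta -> 1
```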


Whereas it is much easier to detect the opponent's defection when one defects herself, it is more efficient to trigger a punishment only after such a (private) history. More precisely, private strategies allow players to start a punishment after the realization of the action-signal pair for which the likelihood ratio (with respect to a defection) is maximized. This high likelihood ratio helps to reduce the inefficiency term in the PPE payoff formula (5). For this particular example, the inefficiency term indeed vanishes completely because the likelihood ratio p(Y|D, D)/p(Y|C, D) is infinite.

In this example, the public signal does not have a full support. As a result, it becomes common knowledge to start a mutual punishment after D is played and Y is observed. If the public signal has full support, neither player is sure whether the opponent is in the punishment mode. Therefore specifying the optimal action at each history can potentially be a formidable task. We address this issue next.

3.3. Two-State Machine Private Equilibrium

In this section, we demonstrate how to construct a private equilibrium when the signal has full support. We assume 0 < p(X|CC) < p(X|DC) = p(X|CD) < p(X|DD) < 1, which implies that the bad signal X is more likely to realize when more players defect. This assumption is commonly employed in partnership games, including RMM (1986). We also assume that the opponent's defection is easier to detect when one is playing D:

(9)  p(X|D, C)/p(X|C, C) < p(X|D, D)/p(X|C, D).

This is a natural assumption, which implies a form of decreasing returns to scale.⁹ Let us denote p(X|CC) = p₀, p(X|CD) = p(X|DC) = p₁, and p(X|DD) = p₂ in this section. With this notation, the likelihood ratio defined by (4) is expressed as

L(q) = [(1 - q)p₁ + qp₂] / [(1 - q)p₀ + qp₁],

which is strictly increasing in q under our likelihood assumption (9). Now consider the following private strategy, which we call a two-state machine. The strategy has two states, R and P, and it begins with state R. Furthermore, it has the following structure (see Figure 1):
⁹Condition (9) implies that the probability of "success" Y increases more for the first input of "effort" C than for the second input of effort. That is, p(Y|CD) - p(Y|DD) > p(Y|CC) - p(Y|CD).

FIGURE 1. The two-state machine. In the reward state R, q_R is very small: after (D, X) the player moves to the punishment state P with probability p_R and stays in R otherwise. In the punishment state P, q_P is very large: after (D, Y) the player moves back to R with probability p_P and stays in P otherwise.

* State R (state to reward the opponent): Choose D with probability q_R (a small number). Go to state P with probability p_R ∈ (0, 1) if D was taken and X was observed; otherwise, stay in state R.
* State P (state to punish the opponent): Choose D with probability q_P (a large number). Go to state R with probability p_P ∈ (0, 1) if D was taken and Y was observed; otherwise, stay in state P.

First note that this private strategy shares a feature similar to the strategy described in the previous section. Each player moves to state P only after (D, X): the most informative action-signal pair of defection. Similarly, the players move to state R only after (D, Y), which can be shown to be the most informative action-signal pair of cooperation. Second, note that there is always strategic uncertainty. Neither player knows exactly what the other player's current state is, and her belief is going to be updated all the time.

How can we check if this machine is a best response to itself at every history given such ever-changing beliefs? To resolve this problem, we choose (q_R, q_P, p_R, p_P) in such a way that no matter which state player 2 is in, player 1 is always indifferent between choosing C and D. This means that any repeated game strategy is a best response to the machine, hence so is the machine itself. Since the game and the strategy are symmetric, we suppress subscript i whenever possible. A set of parameters (q_R, q_P, p_R, p_P) is chosen to satisfy the following four equations. When player 2 is in state R, the equilibrium conditions for player 1 are as follows.

* Player 1 plays C today:

(10)  V_R = (1 - δ)(1 - q_R - q_Rh) + δ{(1 - q_Rp₁p_R)V_R + q_Rp₁p_RV_P}.

* Player 1 plays D today:

(11)  V_R = (1 - δ)(1 - q_R)(1 + d) + δ{(1 - q_Rp₂p_R)V_R + q_Rp₂p_RV_P}.

When player 2 is in state P, the equilibrium conditions for player 1 are as follows.

* Player 1 plays C today:

(12)  V_P = (1 - δ)(1 - q_P - q_Ph) + δ[q_P(1 - p₁)p_PV_R + {1 - q_P(1 - p₁)p_P}V_P].

* Player 1 plays D today:

(13)  V_P = (1 - δ)(1 - q_P)(1 + d) + δ[q_P(1 - p₂)p_PV_R + {1 - q_P(1 - p₂)p_P}V_P],
where V_Z is the average payoff for player 1 when player 2 is in state Z = R, P. Equations (10) and (11) imply that player 1 is indifferent between C and D when player 2 is in state R. A similar explanation applies to (12) and (13). This system of equations implies that player 2's state alone determines player 1's continuation payoff completely, and vice versa.

Each solution for this system of equations corresponds to an equilibrium two-state machine. Since the above conditions consist of four equations (10)-(13) and six unknowns (V_Z, q_Z, p_Z, Z = R, P), there is a manifold of two-state machine equilibria. Here we pick the solution that corresponds to the most efficient two-state machine. In the Appendix, we prove the following results. When δ is close to 1, this system has a solution such that (i) the probabilities (q_Z, p_Z, Z = R, P) are in [0, 1], (ii) q_P can be set to 1 and q_R → 0 as δ → 1, and (iii) V_R > V_P under the assumption p₂ - p₁ > p₁d + (1 - p₂)h. By a manipulation similar to the one used to derive the formula for the trigger-equilibrium payoff (5), we can obtain

V_R = 1 - q_R - q_Rh - [(1 - q_R)d + q_Rh]/(L(1) - 1).

Hence property (ii) means that a payoff arbitrarily close to 1 - d/(L(1) - 1) can be achieved as a PE as δ → 1. In summary we obtain the following result.

PROPOSITION 2: Suppose p₂ - p₁ > p₁d + (1 - p₂)h. Then there is a two-state machine private equilibrium whose payoff tends to 1 - d/(L(1) - 1) as δ → 1.

Note that our equilibrium payoff uses L(1), the highest likelihood ratio to detect a deviation: L(1) = max_{q∈[0,1]} L(q). Otherwise it is exactly like the formula (5) for the best trigger-strategy equilibrium payoff.


The advantage of the private equilibrium comes from its ability to punish the opponent when one takes an action (D in this example) that makes the public signal most informative.

As a corollary to Proposition 2, it can be shown that this PE is more efficient than any PPE. We show this under the assumption that public randomization is available for PPE. (Without public randomization, the set of PPE is even (weakly) smaller, so our claim is still true.) Note that, with public randomization, the best symmetric PPE payoff profile (v₁ = v₂) maximizes v₁ + v₂ among all PPE.¹⁰

COROLLARY 1: Suppose p₂ - p₁ > p₁d + (1 - p₂)h, h > d, and 1 - d/(L(1) - 1) > (1 + d - h)/2. Then there exists δ̄ such that for all δ ∈ (δ̄, 1), there is a two-state machine equilibrium that Pareto-dominates the best symmetric PPE payoff.

PROOF: First we apply Lemma 1, where Q in the present model is the set of all action profiles that play (C, C) with a positive probability. When the best symmetric equilibrium initially plays an action in Q, the payoff upper bound is, by Lemma 1,

v* = max_{q∈[0,1]} [1 - q - hq - ((1 - q)d + qh)/(L(q) - 1)],

which is strictly smaller than 1 - d/(L(1) - 1) under our assumptions h > d and L(q) < L(1) for all q ∈ [0, 1). Now consider the remaining case, where the best symmetric PPE does not use (C, C) in the first period. Let v̄ = max (v₁ + v₂)/2 denote the best average payoff (across players) in all PPE. When public randomization is available, this is also the best symmetric PPE payoff. When v̄ is achieved by a public randomization over (C, D), (D, C), and (D, D) in the first period, we must have v̄ ≤ (1 - δ)(1 + d - h)/2 + δv̄. This is because (i) the first period payoff to achieve v̄ is no greater than the payoff associated with the equal mixture of (C, D) and (D, C) (= (1 + d - h)/2), and (ii) the continuation payoff should be by definition no greater than v̄. From this inequality, v̄ satisfies v̄ ≤ (1 + d - h)/2. Hence in this case our private strategy equilibrium Pareto-dominates the best symmetric PPE if 1 - d/(L(1) - 1) > (1 + d - h)/2.  Q.E.D.

Although certain restrictions are imposed on the structure of the stage game for this corollary, there still exists an open set of parameters that satisfies all these restrictions. As a special case of Corollary 1, we construct an example where private equilibria are nearly efficient, while the only PPE is the repetition of the stage game Nash equilibrium.
¹⁰Suppose that a PPE payoff profile (v', v'') maximizes v₁ + v₂. By symmetry, (v'', v') is also a PPE payoff profile that maximizes v₁ + v₂. Then we can create another efficient PPE that achieves the symmetric payoff profile by randomizing between these two equilibria in the first period.


EXAMPLE 1: Assume that d = κ > 0, h = 1 + κ > 0, and

p(X|CC) = 1/2,  p(X|CD) = p(X|DC) = 1/2 + ε,  p(X|DD) = 1 - ε,

where κ and ε are small positive numbers. The two payoff bounds in the proof, max_{q∈[0,1]} [1 - q - hq - ((1 - q)d + qh)/(L(q) - 1)] and (1 + d - h)/2, are nonpositive in this example for any given small κ as ε → 0 (note that 1 - q - hq - ((1 - q)d + qh)/(L(q) - 1) tends to 1 - q(2 + κ) - ((1 - q)κ + q(1 + κ))/q = -q(2 + κ) - κ/q < 0 as ε → 0). Hence, for any given small κ, if we choose a small enough ε, the only PPE is the repetition of the stage game Nash equilibrium point (0, 0). On the other hand, the private equilibrium achieves 1 - d/(L(1) - 1) as δ → 1, which is approximately 1 - κ for small ε. Hence the private equilibrium is nearly efficient.
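The claims in Example 1 are easy to check numerically. The sketch below evaluates the two PPE bounds from the proof of Corollary 1 on a grid together with the private equilibrium payoff; the particular values of κ and ε are illustrative small numbers.

```python
# Sketch: numerical check of Example 1.  Both PPE payoff bounds from the
# proof of Corollary 1 are nonpositive, while the private equilibrium
# payoff 1 - d/(L(1) - 1) is close to 1 - kappa.
kappa, eps = 0.05, 0.001
d, h = kappa, 1 + kappa
p0, p1, p2 = 0.5, 0.5 + eps, 1 - eps   # p(X|CC), p(X|CD) = p(X|DC), p(X|DD)

def L(q):
    return ((1 - q) * p1 + q * p2) / ((1 - q) * p0 + q * p1)

# Bound 1: the mixed trigger-strategy bound v* of formula (5).
v_star = max(1 - q - h * q - ((1 - q) * d + q * h) / (L(q) - 1)
             for q in (i / 1000 for i in range(1001)))
# Bound 2: the public randomization bound (1 + d - h)/2.
v_mix = (1 + d - h) / 2
print(v_star, v_mix)           # both are nonpositive
print(1 - d / (L(1) - 1))      # private equilibrium payoff, roughly 1 - kappa
```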
The analysis thus far has focused on the case in which the folk theorem in public strategies fails. However, even when the folk theorem holds and efficiency is approximated by PPE as δ → 1, PE may do better than any PPE for each sufficiently high discount factor.¹¹ Consider the following example.

EXAMPLE 2: The stage game payoff parameters satisfy d = 1 and h = 6. The public signal takes on three values: Y, X₁, and X₂. The distributions of the public signal are given in Table I.

TABLE I

              Y         X₁           X₂
(C, C)       1/3       1/3          1/3
(D, C)        0      1/2 + ε      1/2 - ε
(C, D)        0      1/2 - ε      1/2 + ε
(D, D)       1/3       1/3          1/3

Note that, as long as ε > 0, the pairwise full rank condition (Fudenberg, Levine, and Maskin (1994)) is satisfied at (C, C). That is, the first three rows are linearly independent. This means that each player's defection at (C, C) is statistically discriminated (player i's deviation makes signal X_i more likely, i = 1, 2).

¹¹If δ is small, the only equilibrium is a trivial one (the repetition of the stage game equilibrium), which is by definition a PPE. In our example, the PE does strictly better than any PPE whenever a nontrivial equilibrium exists.


Thus the Fudenberg-Levine-Maskin folk theorem applies. In this case, efficiency can be supported by asymmetric punishments in public strategies without using a mutual punishment. On the other hand, this example is similar to the one in Section 3.2, where signal Y arises only when both players take the same action.¹² Therefore, by a similar construction, (1, 1) can also be achieved by a PE in the limit as δ → 1. (See the on-line supplemental material (Kandori and Obara (2006)) for details.)

Hence, in this example, both PPE and PE asymptotically achieve efficiency as δ → 1. However, we can show that PE does better than any PPE for all sufficiently large δ < 1 if ε is small enough. Note that our PE triggers a punishment after a realization of Y, whose probability is independent of ε. Hence the payoff associated with our PE is not affected by ε. On the other hand, a smaller ε requires a larger δ to approximate efficiency with PPE for the following reason. When ε is small, different players' defections are statistically discriminated only by a small margin. This means that, for example, when signal X₁ is realized, a large future payoff should be transferred from player 1 to player 2. For this to be feasible, the discount factor should be very close to 1. This is shown in Figure 2: the solid curve represents the private equilibrium payoff, while all the PPE payoffs are below the dotted line if δ is less than a certain threshold.

FIGURE 2. The private equilibrium payoff (solid curve) and the upper bound u(δ) of PPE payoffs (dotted line), plotted against the discount factor δ (vertical axis: payoff, 0.2 to 0.8; horizontal axis: δ near 0.995).

¹²We can modify this example so that the signal has full support and still show that a PE dominates any PPE by using the two-state machine equilibrium from Section 3.3.


Now we derive the upper bound of PPE payoffs in Figure 2. Let (v₁⁰, v₂⁰) be a maximizer of v₁ + v₂ over all PPE average payoff profiles. When v₁⁰ ≠ v₂⁰, the best symmetric PPE is achieved by publicly randomizing between (v₁⁰, v₂⁰) and (v₂⁰, v₁⁰) (the latter exists because of the symmetry of the game). Let g_i(q₁, q₂) and v_i(ω) be the current payoffs and the continuation payoffs to achieve v₁⁰ + v₂⁰. The best symmetric PPE payoff v̄ satisfies

(14)  2v̄ = v₁⁰ + v₂⁰ = (1 - δ){g₁(q₁, q₂) + g₂(q₁, q₂)} + δ Σ_ω {v₁(ω) + v₂(ω)}p(ω|q₁, q₂).

When the best symmetric PPE does not need public randomization, the same formula still applies. In either case, the best symmetric PPE payoff v̄ satisfies

(15)  2v̄ = v₁⁰ + v₂⁰ = max_{(v₁,v₂)∈V_PPE} (v₁ + v₂),

where V_PPE is the set of all PPE payoff profiles.


Rearranging formula (14), we obtain

(16)  2v̄ = g₁(q₁, q₂) + g₂(q₁, q₂) + Σ_ω {Δ₁(ω) + Δ₂(ω)}p(ω|q₁, q₂),

where Δ_i(ω) = [δ/(1 - δ)](v_i(ω) - v_i⁰) for i = 1, 2. The continuation payoffs (v₁(ω), v₂(ω)) must be in V_PPE and, by the definition of Δ_i(ω), this requirement is expressed as

(17)  ∀ω,  [(1 - δ)/δ](Δ₁(ω), Δ₂(ω)) + (v₁⁰, v₂⁰) ∈ V_PPE.

The definition of Δ_i(ω) and v₁(ω) + v₂(ω) ≤ v₁⁰ + v₂⁰ (by (15)) imply Δ₁(ω) + Δ₂(ω) ≤ 0 for all ω. From equation (16), we can also derive a lower bound of Δ₁(ω) + Δ₂(ω): for some positive constant L > 0 that is independent of ε, we have -L ≤ Δ₁(ω) + Δ₂(ω) for all ω. This is because (i) 2v̄ and g₁(q₁, q₂) + g₂(q₁, q₂) are bounded and (ii) p(ω|q₁, q₂) is bounded below by a positive constant that is independent of ε (Lemma 1 in Kandori and Obara (2006)).

Next we show that large variations in (Δ₁(ω), Δ₂(ω)) are required to sustain v̄ > 0 when the distinguishability parameter ε is small. Consider

D = {(Δ₁, Δ₂) | -L ≤ Δ₁ + Δ₂ ≤ 0 and Δ_i ≥ K for i = 1 or 2}.

For any (large) constant K, we can find a (small enough) ε̄ > 0 such that sustaining any v̄ > 0 requires (Δ₁(ω), Δ₂(ω)) ∈ D for some ω if ε < ε̄ (Lemma 2 in Kandori and Obara (2006)).


Hence, the feasibility condition (17) implies that v̄ = (v₁⁰ + v₂⁰)/2 > 0 is sustained only if ([(1 - δ)/δ]D + (v₁⁰, v₂⁰)) ∩ V_PPE ≠ ∅. This, in turn, is satisfied only if ([(1 - δ)/δ]D + (v₁⁰, v₂⁰)) ∩ V_F ≠ ∅, where V_F is the feasible payoff set. For any given v̄ > 0, direct calculation shows that a lower bound of δ to satisfy the condition is given by δ(v̄) = (3K - L)/{(3K - L) + 8(1 - v̄)} (the on-line proof contains the derivation). Then the inverse function of δ(·),

u(δ) = 1 - [(1 - δ)/δ] × (3K - L)/8,

provides an upper bound of PPE payoffs. Note that, by construction, K can be made arbitrarily large by a suitable choice of ε, while L is independent of ε. The dotted line in Figure 2 depicts u(δ) with (3K - L)/8 = 500.
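The dotted line in Figure 2 can be reproduced directly from this closed form. A minimal sketch, using the paper's normalization (3K - L)/8 = 500:

```python
# Sketch: the PPE payoff upper bound u(delta) that generates the dotted
# line in Figure 2, with the paper's normalization (3K - L)/8 = 500.
A = 500.0                                # the constant (3K - L) / 8

def u(delta):
    return 1 - (1 - delta) / delta * A   # u(delta) = 1 - ((1-delta)/delta)(3K-L)/8

for delta in (0.995, 0.998, 0.999, 0.9999):
    print(delta, u(delta))   # the bound becomes permissive only very near 1
```

With A = 500 the bound crosses zero only around δ ≈ 0.998, which is why, for small ε, the private equilibrium payoff lies above every PPE payoff over a nontrivial range of discount factors.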
4. GENERAL TWO-STATE MACHINE

Is our two-state machine, defined so far for the prisoners' dilemma, applicable to a more general two-person game? More specifically, in any given game, which action profiles (if any) can be supported by a two-state machine equilibrium? To address those issues, we present a systematic way to find a private equilibrium for general two-person games.

We use R and P to denote "reward" and "punishment" states as in Section 3.3.¹³ Let g: A₁ × A₂ → R² be the stage game payoff function and let A_i^Z denote the support of the equilibrium mixed action α_i^Z for state Z = R, P and A_i* = A_i^R ∪ A_i^P. As before, player i moves between the states based on realizations of the public signal and her private action so that player j is indifferent among A_j* = A_j^R ∪ A_j^P (and does not gain by playing any other action). In the partnership game example in Section 3.3, A_i^R = A_i* = {C, D} and A_i^P = {D}.

Our characterization result shows that such a two-state machine is an equilibrium for large δ if and only if the following simple system of linear inequalities holds:

(LI) For i, j = 1, 2 and j ≠ i, there exist V_i^R, V_i^P and x_i^R: Ω × A_j^R → [0, ∞), x_i^P: Ω × A_j^P → [0, ∞) such that

(18)  ∀a_i ∈ A_i*,  V_i^R = g_i(a_i, α_j^R) - E[x_i^R(ω, a_j)|a_i, α_j^R],

(19)  ∀a_i ∉ A_i*,  V_i^R ≥ g_i(a_i, α_j^R) - E[x_i^R(ω, a_j)|a_i, α_j^R],

¹³We think of a more general machine that consists of N_i ≤ ∞, i = 1, 2, states. In state n, player 1 can randomize over A₁ⁿ and move to the other states based on a realization of an action-signal pair so that player 2 is indifferent between all the actions in A₂* = ∪_{n=1}^{N₂} A₂ⁿ at every state, and vice versa. In the discussion version of this paper (Kandori and Obara (2003)), we showed that such a machine with countable (possibly infinite) states can be reduced to a two-state machine without loss of generality.


(20)  ∀a_i ∈ A_i*,  V_i^P = g_i(a_i, α_j^P) + E[x_i^P(ω, a_j)|a_i, α_j^P],

(21)  ∀a_i ∉ A_i*,  V_i^P ≥ g_i(a_i, α_j^P) + E[x_i^P(ω, a_j)|a_i, α_j^P],

(22)  V_i^R > V_i^P,

where V_i^R and V_i^P correspond to player i's equilibrium continuation payoff when player j ≠ i is in state R and state P, respectively. The formal statement of this result follows. Here, we confine our attention to nontrivial two-state machine equilibria, where no player always takes a myopic best reply.

PROPOSITION 3: If there is a nontrivial two-state machine equilibrium with equilibrium payoff V_i^Z and (mixed) action profile α^Z for state Z = R, P, then (LI) is satisfied for some x_i^Z, Z = R, P. Conversely, if (LI) holds for some V_i^Z, α^Z, and x_i^Z, Z = R, P, then there exists a two-state machine equilibrium with V_i^Z and α^Z, Z = R, P, provided that the discount factor δ is close enough to unity.

The proof is in the Appendix. Intuitively, x_i^R and x_i^P in (LI) represent, respectively, the penalty (in state R) and the bonus (in state P) used to discipline player i. Nonnegativity of x_i^Z and inequality (22) directly imply that there is a certain restriction on the actions that can be used in a two-state machine equilibrium.

COROLLARY 2: The (potentially mixed) actions used in a nontrivial two-state machine equilibrium, α^R and α^P, and their support A_i* = A_i^R ∪ A_i^P must satisfy the separation condition

(23)  min_{a_i∈A_i*} g_i(a_i, α_j^R) > max_{a_i∈A_i*} g_i(a_i, α_j^P).

This condition is useful to check which (mixed) action profiles can be used for a nontrivial two-state machine equilibrium. For example, we can show that there is no nontrivial two-state machine equilibrium for the stage game

       1, 0      0, 1
      -1, 1      1, -1

If A_i* is a singleton, then max_{a_i∈A_i} g_i(a_i, α_j^P) = 1 and (23) cannot be satisfied. Thus suppose that A_i* = A_i for i = 1, 2. Then, for any choice of α_j^R and α_j^P, (23) is violated because

max_{a_i∈A_i} g_i(a_i, α_j^P) ≥ min_{α_j∈ΔA_j} max_{a_i∈A_i} g_i(a_i, α_j) = 1/3 = max_{α_j∈ΔA_j} min_{a_i∈A_i*(=A_i)} g_i(a_i, α_j) ≥ min_{a_i∈A_i*} g_i(a_i, α_j^R),

where ΔA_j is the set of mixed actions of player j.
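This minimax argument can also be confirmed numerically. The sketch below scans a grid of opponent mixed actions for the 2 × 2 game above and verifies that the gap required by (23) is never positive when A_i* = A_i (the case treated in the argument above):

```python
# Sketch: check that the separation condition (23) fails for the 2x2 game
# above when A_i* = A_i, by scanning opponent mixed actions on a grid.
# g1[a][b] = player 1's payoff from row a when player 2 plays column b.
g1 = [[1, 0],
      [-1, 1]]

def payoff(row, q):
    """Player 1's expected payoff from `row` when player 2 plays
    the first column with probability q."""
    return g1[row][0] * q + g1[row][1] * (1 - q)

grid = [i / 200 for i in range(201)]
gap = max(
    min(payoff(a, qR) for a in (0, 1)) - max(payoff(a, qP) for a in (0, 1))
    for qR in grid for qP in grid
)
print(gap)   # at most ~0 (attained near q = 1/3), so the strict "min > max" of (23) fails
```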


5. RELATED LITERATURE AND COMMENTS

Private Monitoring

Our two-state machine also works for the private monitoring case, where each player i observes a private signal ω_i. If we modify our two-state machine strategy for player i in such a way that player i uses ω_i in place of the public signal ω, we can see that this constitutes an equilibrium under private monitoring as long as the marginal distribution of ω_i is identical to the public signal distribution. Our private strategies work because neither player needs to know the other player's state.

Ely and Välimäki (2002) independently found a similar two-state machine strategy in the framework of repeated games with private monitoring. As in this paper, a player is indifferent among all the repeated game strategies, regardless of the state the opponent is in. The idea behind these strategies goes back to Piccione (2002), which is essentially based on a machine with a countably infinite number of states. However, there is an important difference between our paper and that by Ely and Välimäki (2002). In their paper, a pure action is played at each state. Note that it is difficult to embody our efficient punishment (which occurs only when action D is played) in such a formulation. This is because a player would be more tempted to defect when the opponent is not likely to be in the state to play the "monitoring" action D. To utilize the most informative action-signal pair to sustain cooperation without being noticed by the opponent, one needs to play a mixed action at the reward state. Indeed, Ely and Välimäki's two-state machine, which uses a pure action in each state, can sometimes be strictly improved by using a mixed action.

A subsequent work by Ely, Hörner, and Olszewski (2005) generalizes the above ideas for two-player repeated games with private monitoring. Their formulation is more general than ours given by (LI). In our two-state machine, A^R and A^P are fixed throughout the game, whereas Ely, Hörner, and Olszewski (2005) allow them to vary over time or according to a realization of a public


correlation device. With such a generalization, they showed that the folk theorem holds for some class of stage games (but not for a general game) when monitoring is private and almost perfect.

Private Strategy

Exercise 5.10 in Fudenberg and Tirole (1991) is an early example of a game in which a private strategy makes a difference. It is a two-period repeated game with three players. Using a combination of private actions and public signals in the first period as a correlation device, players 1 and 2 can punish player 3 with the correlated minmax action profile in the second period. This severe punishment, which is not available for any PPE, deters player 1's deviation from the efficient action in the first period. Lehrer (1991) also used a private strategy as an endogenous correlation device (internal correlation) in repeated games without discounting to support correlated equilibrium payoffs.

Mailath, Matthews, and Sekiguchi (2002) showed three examples of two-period repeated games with imperfect public monitoring in which a PE dominates any PPE. In their first example, the first period serves as a correlation device, as in Fudenberg and Tirole's example, but the first period generates correlated signals to support the efficient correlated equilibrium in the second period (as in Lehrer (1991)). In their second and third examples, the efficient action profile can be supported in the first period only for some PE, because a harsh punishment is available for the PE. These two examples differ from Fudenberg and Tirole's example because they are based on an incorrect belief off the equilibrium path, not on correlated punishments. In both examples, when a player deviates, the other player responds with a tough punishment due to her incorrect belief. Our equilibrium is different from these and the Fudenberg and Tirole example, because we focus on the efficient use of information, whereas their examples emphasize the magnitude of possible punishments. Note that the magnitude of possible punishments becomes less of a problem as δ → 1 for infinitely repeated games.

Some papers use a certain type of private strategy whereby players randomize over their actions and send a message based on their realizations. Kandori (2003) used such a strategy to show that Fudenberg, Levine, and Maskin's sufficient condition for the folk theorem can be relaxed when players can communicate. Ben-Porath and Kahneman (2003) used a similar strategy to prove a folk theorem with costly private monitoring and communication. Obara (2003) also used a similar trick to achieve full surplus extraction with adverse selection and moral hazard in the context of mechanism design.

Open Issue

One open question remains. Although we were able to show that a PE can be far more efficient than any PPE, we have not yet characterized the best pri-


vate equilibrium payoff. This is due to the lack of recursive structure in private monitoring equilibria, which makes characterization of all private equilibria difficult (see Kandori (2002)). In general, when PPE are inefficient, is there also an efficiency bound for private equilibria? Alternatively, do private equilibria achieve full efficiency? This is an important topic for future research.

Faculty of Economics, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan; kandori@e.u-tokyo.ac.jp
and
Dept. of Economics, UCLA, 405 Hilgard Ave., Los Angeles, CA 90095-1477, U.S.A.; iobara@econ.ucla.edu; http://www.econ.ucla.edu/iobara/.
Manuscript received February, 2004; final revision received November, 2005.

APPENDIX: PROOFS

PROOF OF PROPOSITION 2: From (10) and (11), we can obtain

(24)  (1 - δ){(1 - q_R)d + q_Rh} = δq_Rp_R(p₂ - p₁)(V_R - V_P).

As before, we can use this equation to derive, from (10), the equation

(25)  V_R = 1 - q_R - q_Rh - [(1 - q_R)d + q_Rh]/(L(1) - 1).

Similarly, we can derive two equations from (12) and (13):

(26)  V_P = 1 - q_P - q_Ph + [(1 - p₁)/(p₂ - p₁)]{(1 - q_P)d + q_Ph},

(27)  (1 - δ){(1 - q_P)d + q_Ph} = δq_Pp_P(p₂ - p₁)(V_R - V_P).

This system of equations is equivalent to (10)-(13). First note that p_R should be set equal to 1. If there exists a solution of these equations with p_R < 1, then we can reduce q_R to increase V_R via (25) while increasing p_R and reducing p_P so that (24) and (27) are still satisfied. Note also that q_P can be set equal to 1. If not, one can increase q_P to reduce V_P via (26), while lowering p_R and p_P so that (24) and (27) are satisfied. This leads to

V_P = (1 - p₂)h/(p₂ - p₁)

from (26). Now we are left with three equations ((24), (25), and (27)) and three unknowns (q_R, p_P, V_R). Once q_R is obtained, V_R is also obtained from (25), and

p_P = q_Rh/{(1 - q_R)d + q_Rh} ∈ [0, 1]


is obtained from (24) and (27). Thus we need only to find q_R in [0, 1]. These three equations reduce to a quadratic equation for q_R:

F(q_R, δ) = c₂(δ)q_R² + c₁(δ)q_R + c₀(δ) = 0,

where c₂(δ) = δ{p₂(1 + h) - p₁(1 + d)}, c₁(δ) = (1 - δ)(h - d) + δ{p₁d + (1 - p₂)h - (p₂ - p₁)}, and c₀(δ) = (1 - δ)d. One root of this quadratic equation is clearly q_R = 0 when δ = 1. Because ∂F/∂q_R|(q_R,δ)=(0,1) ≠ 0 by the assumption p₂ - p₁ > p₁d + (1 - p₂)h, the implicit function theorem can be applied to obtain a C¹ function q_R(δ) around δ = 1 such that

dq_R(1)/dδ = -(∂F/∂δ)/(∂F/∂q_R)|(q_R,δ)=(0,1) = d/{p₁d + (1 - p₂)h - (p₂ - p₁)},

which is negative by assumption. Thus there exists a q_R(δ) ∈ (0, 1) for large enough δ such that q_R(δ) → 0 as δ → 1. Hence we obtain a solution for (24)-(27) parameterized by δ around δ = 1. Because lim_{δ→1} V_R(δ) = 1 - d/(L(1) - 1) is larger than V_P = (1 - p₂)h/(p₂ - p₁) by the assumption p₂ - p₁ > p₁d + (1 - p₂)h, this two-state machine is a sequential equilibrium for large δ, combined with the belief obtained via Bayes' rule.¹⁴ Therefore, for any η > 0, we can find δ̄ such that the payoff of this PE exceeds 1 - d/(L(1) - 1) - η for any δ ∈ (δ̄, 1).  Q.E.D.

¹⁴Beliefs can be simply derived by Bayes' rule at any history. Because no deviation is observable to the opponent, a player always updates her belief by assuming that the opponent has never deviated.
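The construction in this proof is straightforward to verify numerically. The following sketch sets p_R = 1 and q_P = 1, solves the quadratic for the root of F near 0, recovers (p_P, V_P, V_R), and confirms that (10)-(13) hold, i.e., that C and D yield the same payoff in both of the opponent's states. The parameter values are illustrative assumptions satisfying p₂ - p₁ > p₁d + (1 - p₂)h, not values from the paper.

```python
# Sketch: solve the two-state machine of Section 3.3 numerically and verify
# the indifference conditions (10)-(13).  Parameter values are illustrative
# assumptions; p_R = 1 and q_P = 1 as in the proof above.
import math

p0, p1, p2 = 0.3, 0.4, 0.8   # p(X|CC), p(X|CD) = p(X|DC), p(X|DD)
d, h = 0.3, 0.4              # note p2 - p1 > p1*d + (1 - p2)*h holds here
delta = 0.99
pR, qP = 1.0, 1.0

# Quadratic c2*qR^2 + c1*qR + c0 = 0; take the root that vanishes as delta -> 1.
c2 = delta * (p2 * (1 + h) - p1 * (1 + d))
c1 = (1 - delta) * (h - d) + delta * (p1 * d + (1 - p2) * h - (p2 - p1))
c0 = (1 - delta) * d
qR = (-c1 - math.sqrt(c1 * c1 - 4 * c2 * c0)) / (2 * c2)

pP = qR * h / ((1 - qR) * d + qR * h)        # from (24) and (27)
VP = (1 - p2) * h / (p2 - p1)                # from (26) with qP = 1
VR = 1 - qR * (1 + h) - ((1 - qR) * d + qR * h) / (p2 / p1 - 1)   # from (25)

def cont(V_now, V_other, move_prob):
    """Discounted continuation value when the opponent's state switches
    with probability move_prob."""
    return delta * ((1 - move_prob) * V_now + move_prob * V_other)

# (10)-(11): opponent in R; playing C and playing D must both give VR.
print(VR - ((1 - delta) * (1 - qR - qR * h) + cont(VR, VP, qR * p1 * pR)))
print(VR - ((1 - delta) * (1 - qR) * (1 + d) + cont(VR, VP, qR * p2 * pR)))
# (12)-(13): opponent in P; playing C and playing D must both give VP.
print(VP - ((1 - delta) * (1 - qP - qP * h) + cont(VP, VR, qP * (1 - p1) * pP)))
print(VP - ((1 - delta) * (1 - qP) * (1 + d) + cont(VP, VR, qP * (1 - p2) * pP)))
# All four residuals are ~0, and VR > VP, so the punishment is a real threat.
```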
PROOF OF PROPOSITION 3: Consider the following transition rule for player j: go to state P with probability p_j^z(ω, a_j) when the current state (for j) is z = R, P; otherwise, go to state R. The equilibrium condition for player i when j is in state z = R, P is

(28)  V_i^z ≥ (1 - δ)g_i(a_i, α_j^z) + δE[(1 - p_j^z(ω, a_j))V_i^R + p_j^z(ω, a_j)V_i^P | a_i, α_j^z],

where the equality should be satisfied for a_i ∈ A_i* = A_i^R ∪ A_i^P. Consider first the case z = R. Subtracting δV_i^R from both sides and dividing by (1 - δ), we obtain

V_i^R ≥ g_i(a_i, α_j^R) - E[(δ/(1 - δ))p_j^R(ω, a_j)(V_i^R - V_i^P) | a_i, α_j^R],

where the equality holds for a_i ∈ A_i*. A similar manipulation for state z = P leads to

V_i^P ≥ g_i(a_i, α_j^P) + E[(δ/(1 - δ))(1 - p_j^P(ω, a_j))(V_i^R - V_i^P) | a_i, α_j^P],

where the equality holds for a_i ∈ A_i*. Hence, if we have a two-state machine strategy equilibrium, conditions (18)-(22) are satisfied with

(29)  x_i^R(ω, a_j) = (δ/(1 - δ))p_j^R(ω, a_j)(V_i^R - V_i^P)

and

(30)  x_i^P(ω, a_j) = (δ/(1 - δ))(1 - p_j^P(ω, a_j))(V_i^R - V_i^P).

Note that V_i^R > V_i^P for a nontrivial two-state machine equilibrium. Conversely, suppose that conditions (18)-(22) are satisfied. Then (29) and (30) can be satisfied with p_j^z(ω, a_j) ∈ [0, 1], z = R, P, for sufficiently high δ. Hence we obtain the equilibrium condition (28) and the two-state machine equilibrium to support payoffs (V_i^R, V_i^P) for i = 1, 2.  Q.E.D.
for Note that VR > V7P a nontrivialtwo-state machine equilibrium. Conversely, suppose that conditions (18)-(22) are satisfied. Then (29) and (30) can be satisfiedfor pj(w, aj) E [0, 1], z = R, P, for sufficientlyhigh 6. Hence we obtain the equilibriumcondition (28) and the two-state machine Q.E.D. equilibriumto supportpayoffs (VR,VP) for i = 1,2.
REFERENCES

ABREU, D., P. MILGROM, AND D. PEARCE (1991): "Information and Timing in Repeated Partnerships," Econometrica, 59, 1713-1733.
ABREU, D., D. PEARCE, AND E. STACCHETTI (1990): "Toward a Theory of Discounted Repeated Games with Imperfect Monitoring," Econometrica, 58, 1041-1063.
BEN-PORATH, E., AND M. KAHNEMAN (2003): "Communication in Repeated Games with Costly Monitoring," Games and Economic Behavior, 44, 227-250.
ELY, J. C., J. HÖRNER, AND W. OLSZEWSKI (2005): "Belief-Free Equilibria in Repeated Games," Econometrica, 73, 377-415.
ELY, J. C., AND J. VÄLIMÄKI (2002): "A Robust Folk Theorem for the Prisoner's Dilemma," Journal of Economic Theory, 102, 84-105.
FUDENBERG, D., D. K. LEVINE, AND E. MASKIN (1994): "The Folk Theorem with Imperfect Public Information," Econometrica, 62, 997-1040.
FUDENBERG, D., AND J. TIROLE (1991): Game Theory. Cambridge, MA: MIT Press.
KANDORI, M. (1999): "Check Your Partners' Behavior by Randomization: New Efficiency Results on Repeated Games with Imperfect Monitoring," Technical Report CIRJE-F-49, University of Tokyo.
KANDORI, M. (2002): "Introduction to Repeated Games with Private Monitoring," Journal of Economic Theory, 102, 1-15.
KANDORI, M. (2003): "Randomization, Communication, and Efficiency in Repeated Games with Imperfect Public Monitoring," Econometrica, 71, 345-353.
KANDORI, M., AND I. OBARA (2003): "Efficiency in Repeated Games Revisited: The Role of Private Strategies," UCLA Working Paper 826.
KANDORI, M., AND I. OBARA (2006): "Supplement to 'Efficiency in Repeated Games Revisited: The Role of Private Strategies'," Econometrica Supplementary Material, 74, http://www.econometricsociety.org/ecta/supmat/5074Ex2.pdf.
LEHRER, E. (1991): "Internal Correlation in Repeated Games," International Journal of Game Theory, 19, 431-456.
MAILATH, G. J., S. A. MATTHEWS, AND T. SEKIGUCHI (2002): "Private Strategies in Finitely Repeated Games with Imperfect Public Monitoring," Contributions to Theoretical Economics, 2, http://www.bepress.com/bejte/contributions/vol2/iss1/art2.


OBARA, I. (1999): "Private Strategy and Efficiency: Repeated Partnership Game Revisited," Unpublished Manuscript, University of Pennsylvania.
OBARA, I. (2001): "Private Information in Repeated Games," Ph.D. Thesis, University of Pennsylvania.
OBARA, I. (2003): "The Full Surplus Extraction Theorem with Hidden Actions," Unpublished Manuscript, UCLA.
PICCIONE, M. (2002): "The Repeated Prisoner's Dilemma with Imperfect Private Monitoring," Journal of Economic Theory, 102, 70-83.
RADNER, R., R. MYERSON, AND E. MASKIN (1986): "An Example of a Repeated Partnership Game with Discounting and with Uniformly Inefficient Equilibria," Review of Economic Studies, 53, 59-69.
