
Game Theory and Evolutionary Games

Lacra Pavel
Systems Control Group
Dept. of Electrical and Computer Engineering
University of Toronto

2014
Contents

1 The Name of the Game
  1.1 Introduction
  1.2 Games in Extensive Form
  1.3 Games in Normal Form
  1.4 Game Features
    1.4.1 Competitive versus Cooperative
    1.4.2 Repetition
    1.4.3 Knowledge Information
  1.5 Solution Concepts
    1.5.1 Minimax Solution
    1.5.2 Best Response
    1.5.3 Nash Equilibrium Solution
    1.5.4 Pareto Optimality
    1.5.5 Examples

2 Matrix Games: 2-Player Zero-Sum
  2.1 Introduction
  2.2 Pure and Mixed Strategies
    2.2.1 Mixed-Strategy Cost Function
  2.3 Minimax Theorem
  2.4 Computation of Mixed-Strategy Equilibria
    2.4.1 Nash Equilibrium and Dominated Strategies
    2.4.2 Graphical Solution
    2.4.3 Linear Programming Formulation
  2.5 Notes

3 Matrix Games: N-Player Nonzero-Sum
  3.1 Introduction
  3.2 Bimatrix Games
    3.2.1 Mixed Strategies
    3.2.2 Computation of NE for Bimatrix 2 × 2 Games
    3.2.3 Symmetric 2 × 2 Games
  3.3 N-Player Finite Games
    3.3.1 Pure and Mixed Strategies
    3.3.2 Mixed-Strategy Payoff (Cost) Functions
  3.4 Dominance and Best Replies
    3.4.1 Best-Response Correspondence
  3.5 Nash Equilibria Theorem
  3.6 Nash Equilibria Characterization
  3.7 *Nash Equilibria Refinements
    3.7.1 Perfect Equilibrium
    3.7.2 Proper Equilibrium
    3.7.3 Strategically Stable Equilibrium
  3.8 Notes

4 Continuous-Kernel Games
  4.1 Introduction
  4.2 Game Formulation
  4.3 Extension to Mixed Strategies
  4.4 Nash Equilibria and Best-Response Correspondence
  4.5 Existence of Pure-Strategy NE
  4.6 Example: Optical Network OSNR Game
    4.6.1 Iterative Algorithm

5 Continuous-Kernel Games with Coupled Constraints
  5.1 Introduction
  5.2 Nash Equilibria and Relaxation via an Augmented Optimization
  5.3 Lagrangian Extension in a Game Setup
  5.4 Duality Extension
  5.5 Hierarchical Decomposition in a Game Setup
  5.6 Example: Optical Network Game with Constraints
    5.6.1 Indirect Approach
    5.6.2 Lagrangian Pricing Approach
    5.6.3 Iterative Algorithm
  5.7 Notes

6 Evolutionary Games and Evolutionary Stability
  6.1 Introduction
  6.2 Evolutionarily Stable Strategy (ESS)
  6.3 ESS in Population Games
  6.4 *Supplementary Material
    6.4.1 *Characterization of ESS
  6.5 Notes

7 Replicator Dynamics
  7.1 Introduction
  7.2 Derivation of Replicator Dynamics (RD)
  7.3 RD Equilibria vs. NE Strategies
  7.4 RD Equilibria vs. ESS
  7.5 Doubly Symmetric (Partnership) Games
  7.6 Potential Games
  7.7 *Supplementary Material
    7.7.1 *Proof of Proposition 7.11
    7.7.2 *Doubly Symmetric Games and NE Efficiency
    7.7.3 *Extensions to Multi-Populations
    7.7.4 *Adaptation (Strategy) Dynamics
  7.8 Notes

8 Learning in Games
  8.1 Introduction
  8.2 Payoff (Cost) Monotonic Dynamics
  8.3 Revision Protocol and Mean Dynamic
  8.4 Imitation Dynamics (ID)
    8.4.1 Pairwise Proportional Imitation
    8.4.2 Imitation Driven by Dissatisfaction
  8.5 Direct Dynamics
  8.6 Best Response (BR) Dynamics
  8.7 Fictitious Play (FP) and Stochastic FP
  8.8 Notes

A Fixed-Point Theorems
  A.1 Brouwer's Fixed-Point Theorem
  A.2 Kakutani's Fixed-Point Theorem

B Standard Optimization Review
  B.1 Convex Functions
  B.2 Standard Optimization Review
  B.3 Diagonally Dominant and M-Matrices
  B.4 Projection Theorem
  B.5 Lipschitz Continuity
List of Notations

R                          The set of real numbers
R+                         The set of nonnegative real numbers
R^m                        The set of all ordered m-tuples of real numbers
R^{m×n}                    The set of all m × n matrices
L+(x0)                     The positive limit set of x0
N                          The player index set {1, . . . , N}, typical element i
M                          The strategy index set {1, . . . , m}, typical elements j, k, l
Ωi                         (Infinite) set of pure strategies (actions) for player i, i ∈ N
Ω = ∏_{i∈N} Ωi             Set of pure-strategy profiles
Ai                         Finite set of actions for player i, i ∈ N
A = ∏_{i∈N} Ai             Set of action (pure-strategy) profiles
∆i(Ωi)                     (Infinite games) set of randomized/mixed strategies for player i, i ∈ N
∆i(Ai)                     (Finite games) set of randomized/mixed strategies for player i, i ∈ N
∆ = ∏_{i∈N} ∆i             Set of mixed-strategy profiles
ui                         Pure strategy (action) for player i, ui ∈ Ωi, i ∈ N
ai                         Pure strategy (action) for player i, ai ∈ Ai, i ∈ N
u = (u1, . . . , uN)       N-tuple of pure strategies (profile), u ∈ Ω
a = (a1, . . . , aN)       N-tuple of pure strategies (profile), a ∈ A
u = (ui, u−i)              Compact notation for pure-strategy profile u
a = (ai, a−i)              Compact notation for pure-strategy profile a
Mi = {1, . . . , mi}       Index set of finite pure strategies (actions) for player i, i ∈ N
Ωi = {e_i^1, . . . , e_i^{mi}}   Finite set of pure strategies (actions) for player i ∈ N
∆i = {xi ∈ R^{mi}_+ | ∑_{j=1}^{mi} xi,j = 1}   Simplex, the set of mixed strategies for player i, i ∈ N
e_i^j                      j-th pure strategy (action) for player i, e_i^j ∈ Ωi, j ∈ Mi, i ∈ N
xi                         Mixed strategy for player i, xi ∈ ∆i, xi = [xi,j], j ∈ Mi, i ∈ N
x = (x1, . . . , xN)       N-tuple of mixed strategies (profile), x ∈ ∆
x = (xi, x−i)              Compact notation for mixed-strategy profile x
Ji : Ω → R                 Cost function of player i, i ∈ N
J̄i : ∆ → R (or simply Ji)  Expected cost of player i, i ∈ N
Ui : Ω → R, Ui = −Ji       Utility (payoff) function for player i, i ∈ N
A ∈ R^{m×n}                Cost matrix in a two-player zero-sum (or symmetric) matrix game
A, B ∈ R^{m1×m2}           Cost matrices of players 1 and 2 in a two-player bimatrix game
∆ = {x ∈ R^m_+ | ∑_{j=1}^m xj = 1}   The set of points x in R^m such that xj ≥ 0 for every j ∈ M = {1, . . . , m} and ∑_{j∈M} xj = 1; here "|" means "such that" and "," means "and"
Introduction

Game theory, an area initiated more than fifty years ago [74], has been of interest to researchers working in a broad range of areas, from economics [15], computer science [53, 97], and social studies to, more recently, engineering and communication networks [16, 45, 117, 99, 7, 47, 90]. The popularity it has recently been enjoying in engineering has to do with the fact that it brings new perspectives to the optimization and control of distributed networks.
Game theory is a branch of applied mathematics concerned with the study of situations involving conflicting interests. The field was born with the book by John von Neumann and Oskar Morgenstern [115], although the theory was developed extensively in the 1950s by many researchers, among them John Nash [73, 74]. Game theory mathematically describes behavior in strategic situations, in which an individual's success in making choices depends on the choices of others. It incorporates paradigms such as Nash equilibrium and incentive compatibility, which can help quantify the individual preferences of decision-making agents. In fact, game theory provides a rigorous mathematical framework for modeling the actions of individual selfish or cooperating agents/players and the interactions among players. Furthermore, it has an inherently distributed nature and provides a foundation for developing distributed algorithms for dynamic resource allocation.
While initially developed to analyze competitions in which one individual does better at another's expense (zero-sum games), game theory has been expanded to treat a wide class of interactions, classified according to several criteria, among them cooperative versus noncooperative games. Typical classical games are used to model and predict the outcome of a wide variety of scenarios involving a finite number of "players" (or agents) that aim to optimize some individual objective.
Historically, the development of game theory was motivated by studies in economics; however, many interesting applications have emerged in diverse fields such as biology [106], computer science [37], social science, and engineering [54]. In engineering, the interest in noncooperative game theory is motivated by the possibility of designing large-scale systems that globally regulate their performance in a distributed, decentralized manner. Modelling a problem within a game-theoretic setting is particularly relevant to any practical application consisting of separate subsystems that compete for the use of some limited resource. Examples of such applications include congestion control in network traffic (e.g., the Internet or transportation networks), problems of optimal routing [10, 12, 13], and power allocation in wireless communications and optical networks [99, 90].
Moreover, recent interest has been in extending the standard game setup in various ways: some extensions have to do with the computation of equilibria [27, 91, 30], others are concerned with the inherently static nature of the traditional game setup and with how to extend it to a dynamic process by which the equilibria are to be achieved [64, 103], or with addressing the efficiency of equilibria [46, 1, 44, 94].


In a noncooperative (Nash) game [74, 96, 18] each player pursues the maximization of its own utility or, equivalently, the minimization of its own cost function, in response to the actions of all other players. The stable outcomes of the interactions of noncooperative selfish agents correspond to Nash equilibria. In a cooperative game framework, on the other hand, the natural players/agents can be the network nodes, routers, or switches (as software agents), or the users [8]. These players/agents cooperate to redistribute the network resources (bandwidth, wavelength capacity, power).
Why game theory? Consider the case of a multiple-access network problem: most optimization-based approaches find the multiple access control (MAC) and routing parameters that optimize network throughput, lifetime, delay, etc., and assume all nodes in the network use these parameters. But there is no reason to believe that nodes will adhere to the actions that optimize network performance. Cheaters may deviate in order to increase their payoffs, which in turn affects other users. In effect, any scenario where there is some strategic interaction among self-interested players is best modelled game-theoretically. Game theory helps to capture this interaction, namely the effect of the actions of rational players on the performance of the network. Although the selfish behavior of players causes a system performance loss in a Nash game [29, 53, 47, 1, 93], it has been shown in [47, 100] that proper selection of a network pricing mechanism can help prevent the degradation of network system performance. In the context of evolution, a game captures the essential features of settings where strategic interactions occur.

CGT, EGT and LGT

In these course notes we shall study game theory as a framework with its branches: classical game theory (CGT),
evolutionary game theory (EGT) and learning game theory (or learning in games) (LGT).
Game theory as a framework is a methodology used to build models of real-world social interactions. The result of such a process of abstraction is a formal model that typically comprises the set of individuals who interact (called players/agents), the different choices available to each of the players (called strategies), and a payoff function that assigns a (usually numerical) value to each player for each possible combination of choices made by every individual. Including different assumptions about how players behave, or should behave, gives rise to the different branches that compose game theory: CGT, EGT and LGT.
Classical game theory (CGT): Classical game theory was chronologically the first branch to be developed (von Neumann and Morgenstern, 1944), the one on which most of the work has historically focused, and the one with the largest representation in most game theory textbooks and courses. Classical game theory is a branch of mathematics devoted to the study of how instrumentally rational players should behave in order to obtain the maximum possible payoff in a formal game.
The main problem in classical game theory is that, in general, rational behaviour for any one player remains undefined in the absence of strong assumptions about other players' behaviour. Hence, in order to derive specific predictions about how rational players should behave, it is often necessary to make very stringent assumptions about everyone's beliefs (e.g., common knowledge of rationality) and their interdependent consistency. If a game involves only rational agents, each of whom believes all other agents to be rational, then theoretical results offer accurate predictions of the game outcomes.
Even when the most stringent assumptions are in place, it is often the case that several outcomes are possible, and it is not clear which, if any, will be achieved, or through which process this selection would happen. Thus, the general applicability of classical game theory is limited. A related limitation of classical game theory is that it is an inherently static theory: it is mainly focused on the study of end-states and possible equilibria, paying hardly any attention to how such equilibria might be reached. A more realistic modelling scenario involves players that are less than fully rational and a repeated game play.
Evolutionary Game Theory (EGT): Some time after the emergence of classical game theory, biologists realized the potential of game theory as a framework to formally study the adaptation and coevolution of biological populations. For those situations where the fitness of a phenotype is independent of its frequency, optimization theory is the proper mathematical tool. However, it is most common in nature that the fitness of a phenotype depends on the composition of the population [77]. In such cases game theory becomes the appropriate framework. In evolutionary game theory (EGT), players are no longer taken to be rational. Instead, each player, most often meant to represent an individual animal, always plays the same (potentially mixed) strategy, which represents its behavioural phenotype, and payoffs are usually interpreted as Darwinian fitness. The emphasis is placed on studying which behavioural phenotypes (i.e., strategies) are stable under some evolutionary dynamics, and how such evolutionarily stable states are reached. Despite having its origin in biology, the basic ideas behind evolutionary game theory, namely that successful strategies tend to spread more than unsuccessful ones and that fitness is frequency-dependent, have been extended to other fields. The study of dynamic systems often begins with the identification of their invariant or equilibrium states, called stable states in the EGT literature. This is often called static analysis, as it does not consider the dynamics of the system explicitly, but only its rest points. The most important concept in the static analysis of EGT is that of an Evolutionarily Stable Strategy (ESS), proposed by Maynard Smith and Price [108]. Very informally, a population playing an ESS is uninvadable by any other strategy.
The main shortcoming of mainstream EGT is that it is founded on assumptions made to ensure that the resulting models are mathematically tractable. Most of the work assumes one single infinite and homogeneous population, where players using one of a finite set of strategies are randomly matched to play an infinitely repeated 2-player symmetric game. Most of our treatment will assume this standard model. In the last few years various alternative models (e.g., finite populations, stochastic strategies, multi-player games, structured populations) have been explored. Extensive references on this topic can be found in [116], [41], [114], [98].
Learning game theory (LGT): Like evolutionary game theory (EGT), learning in games abandons the demanding assumptions of classical game theory on players' rationality and beliefs. However, unlike its evolutionary counterpart, learning game theory assumes that individual players adapt, learning over time about the game and the behaviour of others (e.g., through reinforcement, imitation, or belief updating). Therefore, instead of immediately playing a perfect move, players adapt their strategies based on the outcomes of previous matches [34], [103]; hence the name classical game with learning, or learning in games. Extensive references on this topic can be found in [34], [75].

Applications: Game Theory in Networks

A game-theoretic approach can be used for studying both the behavior of independent agents and the structure of the networks they create. Moreover, the formulation of a game-theoretic model leads directly to the development of distributed algorithms for finding equilibria, if they exist. Over the years there have been many research efforts, resulting in a rich literature on the application of game theory in transportation systems [97], the Internet [47], wireless networks [45, 99, 7], and optical networks [90, 86], [92].
Among recent game theory applications, communication networks is an area of particular interest. Applications can involve either cooperative or noncooperative games, static or repeated games, finite or continuous strategy games [13]. Typical applications include power control problems in different multiuser environments [31, 45, 7, 99, 90, 52], routing [11, 80], and congestion control [97, 10, 4, 12, 19, 55, 105, 117], extending the system-based optimization approaches [50, 66, 60, 59, 61, 111]. The many "players" interact within the network and make (sequential) decisions, i.e., play a game. For example, in a noncooperative (Nash) game framework the natural players/agents can be the Internet service providers (ISPs) or domain operators [58, 63], the routers [11, 19, 118, 80], or even the users themselves in an access network application with dedicated wavelengths [86]. As another example, in wired networks there could be two sets of players, telecom firms/ISPs and end users; the two sets of players have different objectives, and non-negligible interaction across players exists. In wireless networks there could be wireless LANs, where users/players communicate with a fixed access point, or wireless ad-hoc networks, where users/players communicate with each other in the absence of any fixed infrastructure support.
Chapter 1
The Name of the Game

Chapter Summary
"He thinks that I think that he thinks ... "

This chapter provides a brief overview of basic concepts in game theory. The following chapters will start introducing them formally.

1.1 Introduction

In this chapter we shall introduce the game-theoretical notions in the simplest terms. Our goal will be later on to study and formalize mathematically various game problems, by which we understand problems of conflict with common strategic features. These models are called "games" because they are built on actual games such as bridge and poker. The theory of games stresses the strategic aspects, i.e., the aspects controlled by the players, and in this it goes beyond the classical theory of probability, which treats games limited to aspects of pure chance. The theory of games was first introduced by Borel in 1921 and established by John von Neumann in 1928, who laid, together with Morgenstern, the basis of what John Nash later generalized into what is nowadays called the mathematical theory of games [74].
In any game there are a number of players (multiple decision-makers) who make a sequence of personal moves; at each move, each player has a number of choices from among several possibilities, and chance or random moves are also possible (think of throwing a die). Consider a few examples of games: chess has no chance moves once the game starts; bridge has chance moves, but skill is important; and roulette is entirely a game of chance. In fact, in the game of chess each player knows every move that has been made so far, while in bridge this information is imperfect. At the end of the game there is some payoff to be gained (cost to be paid) by the players, which depends on how the game was played. Noncooperative game theory studies the strategic interaction among self-interested players. This is in contrast to standard optimization, where there is only one decision-maker, who aims to minimize an objective function by choosing values of variables from a constrained set such that the system performance is optimized.
So far we have mentioned three elements: alternation of moves (individual or random (chance)), a possible lack of knowledge, and a payoff or cost function. A game G consists of a set of players (agents) N = {1, . . . , N}, an action
set (also referred to as a set of strategies) available to those players and an individual payoff (utility) Ui or cost
function Ji for each player i ∈ N . The convention used in most books on classical game theory (matrix form) and
evolutionary games is utility or payoff maximization. We shall use the convention of cost function minimization that
follows the control setup, keeping in mind the equivalence between the two approaches, [18]. Specifically, in a game
each player i ∈ N individually takes an optimal action to maximize its own payoff (utility) Ui , which is equivalent
to minimizing its own cost (loss) function, Ji , formally defined as Ji = −Ui . So we will always understand that when
a player aims to maximize its payoff (utility) this means to minimize its cost (loss) incurred during a game.
Each player's success in making decisions depends on the decisions of the others. Let Ωi denote the set of actions available to player i, which can be finite or infinite. This leads to either finite action set games, also known as matrix games, or infinite (continuous action set) games. In the latter case each player can choose its action from a continuum of (possibly vector-valued) alternatives. A strategy can be regarded as a rule for choosing an action, depending on external conditions. Once such a condition is observed, the strategy is implemented as an action. In the case of mixed strategies, this external condition is the result of some randomization process. Briefly, a mixed strategy for player i is a probability distribution xi over its action set Ωi. In some cases actions are pure, or independent of any external conditions, and the strategy space coincides with the action space. In discussing games in pure strategies we shall use the terms "strategy" and "action" interchangeably to refer to some u ∈ Ω, and the game G can simply be specified as G(N, Ωi, Ji).
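To make the randomization behind a mixed strategy concrete, here is a minimal Python sketch (the action labels, probabilities, and function name are purely illustrative, not part of the formal development): implementing a mixed strategy amounts to sampling a pure action from the distribution xi.

```python
import numpy as np

rng = np.random.default_rng(0)

actions = ["H", "T"]           # a two-action set, as in Matching Pennies
x = np.array([0.5, 0.5])       # a mixed strategy: a probability distribution over actions

def play(mixed_strategy, action_set):
    """Implement a mixed strategy: randomize, then play the drawn pure action."""
    j = rng.choice(len(action_set), p=mixed_strategy)
    return action_set[j]

print([play(x, actions) for _ in range(5)])  # e.g. ['H', 'T', 'T', 'H', 'H']
```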
In the next sections we introduce these concepts for the possible forms of a game, as well as the various solution concepts.

1.2 Games in Extensive Form

The extensive form of a game amounts to a translation of all the rules into the technical terms of a formal system designed to describe all games.
Extensive form games generally involve several acts or stages, and each player chooses a strategy at each stage. The game's information structure, i.e., how much information is revealed to which players concerning the game's outcomes and their opponents' actions in the previous stages, significantly affects the analysis of such games. Extensive form games are generally represented using a tree graph. Each node (called a decision node) represents a possible state of play of the game as it is played [18]. Play begins at a unique initial node and flows through the tree along a path determined by the players until a terminal node is reached, where play ends and costs are assigned to all players. Each non-terminal node belongs to a player; that player chooses among the possible moves at that node, each possible move being an edge leading from that node to another node. The analysis of such games becomes difficult with increasing numbers of players and game stages.
A formal definition is as follows.

Definition 1.1. An N-player game G in extensive form is defined as a graph-theoretic tree of vertices (states) connected by edges (decisions or choices) with the following properties:

1. G has a specific vertex called the starting point of the game;

2. there is a function, called the cost function, which assigns an N-vector (tuple) (J1, . . . , JN) to each terminal vertex (outcome) of the game G, where Ji denotes the cost of player i, N = {1, . . . , N};

3. the non-terminal vertices of G are partitioned into N + 1 sets, S0, S1, . . . , SN, called the player sets, where S0 stands for the choices of chance (nature);

4. each vertex of S0 has a probability distribution over the edges leading from it;

5. the vertices of each player set Si, i = 1, . . . , N, are partitioned into disjoint subsets known as information sets, S^i_j, such that two vertices in the same information set have the same number of immediate followers (choices/edges) and no vertex can follow another vertex in the same information set.

As a consequence of (5), a player knows which information set he is in, but not which vertex of the information set. A player i is said to have perfect information in a game G if each information set for this player consists of one element. The game G in extensive form is said to have perfect information if every player has perfect information.
A pure strategy for player i, denoted by ui, is defined as a function which assigns to each of player i's information sets S^i_j one of the edges leading from a representative vertex in this set. We denote by Ωi the set of all pure strategies of player i, ui ∈ Ωi, and by u = (u1, . . . , uN) the N-tuple of all players' strategies, with u ∈ Ω = Ω1 × · · · × ΩN. A game in extensive form is finite if it has a finite number of vertices, hence each player has only a finite number of strategies. Under this definition most parlor games are finite (think of chess). Let us look at a couple of examples.

Example 1.2. In the game of Matching Pennies (see Figure 1.1), player 1 chooses "heads" (H) or "tails" (T); player 2, not knowing this choice, also chooses between H and T. If the two choose alike (matching), then player 2 wins 1 cent from player 1 (hence +1 for player 2 and −1 for player 1); otherwise player 1 wins 1 cent from player 2 (the reverse of the above). The game tree is shown below, with the vectors at the terminal vertices indicating the cost function, while the numbers near the vertices denote the player to whom the move corresponds. The dotted (shaded) area indicates moves in the same information set.

[Game tree: player 1 chooses H or T; player 2, with both vertices in a single information set, chooses H or T; terminal cost vectors (−1, 1), (1, −1), (1, −1), (−1, 1).]

Figure 1.1: Game of matching pennies in extensive form
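As a side illustration (not part of the formal development), the tree of Figure 1.1 can be encoded directly as data; the dictionary layout and node names below are our own choices, not a standard format. Note how both of player 2's vertices carry the same information set label, reflecting that player 2 does not observe player 1's move.

```python
# Matching Pennies in extensive form as a dict-based tree (illustrative encoding).
tree = {
    "root": {"player": 1, "info_set": "S1_1", "edges": {"H": "v_H", "T": "v_T"}},
    "v_H":  {"player": 2, "info_set": "S2_1", "edges": {"H": "t_HH", "T": "t_HT"}},
    "v_T":  {"player": 2, "info_set": "S2_1", "edges": {"H": "t_TH", "T": "t_TT"}},
    # terminal vertices carry the cost vector (J1, J2), as in Figure 1.1
    "t_HH": {"costs": (-1, 1)}, "t_HT": {"costs": (1, -1)},
    "t_TH": {"costs": (1, -1)}, "t_TT": {"costs": (-1, 1)},
}

def outcome(tree, moves):
    """Follow a path of moves from the root and return the terminal cost vector."""
    node = "root"
    for m in moves:
        node = tree[node]["edges"][m]
    return tree[node]["costs"]

print(outcome(tree, ["H", "T"]))  # (1, -1), the vector at the H-T terminal
```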

The next two figures show two other zero-sum game examples, which differ by the information available to player 2 at the time of its play (its information set), denoted by the shaded (dotted) area. In the first case, Figure 1.2, the two possible nodes of player 2 are in the same information set, implying that even though player 1 acts before player 2 does, player 2 does not have access to its opponent's decision. This means that at the time of its play, player 2 does not know at which node (vertex) it is. This is the same as saying that both players act simultaneously. The extensive form in the second case, Figure 1.3, admits a different matrix game in normal form. In this case each node of player 2 is included in a separate information set, i.e., player 2 has perfect information as to which branch of the tree player 1 has chosen.

[Game tree: player 1 chooses L or R; player 2, with both vertices in a single information set, chooses among L, M, R; terminal cost vectors (−1, 1), (−3, 3), (0, 0), (−6, 6), (−2, 2), (−7, 7).]

Figure 1.2: A zero-sum game in extensive form in which player 2 does not observe player 1's move

[Game tree: same moves and terminal cost vectors as in Figure 1.2, but each of player 2's vertices lies in its own information set.]

Figure 1.3: The same game with player 2 observing player 1's move (perfect information)

1.3 Games in Normal Form

Games in normal form (strategic form) model scenarios in which two or more players must make a one-time decision simultaneously. These games are sometimes referred to as one-shot games or simultaneous-move games. The normal form is a more condensed description of the game, stripped of all features but the choice of each player's pure strategies, and it is more convenient to analyze. The fact that all players make their choice of strategy simultaneously has nothing to do with a temporal constraint, but rather with a constraint on the information structure particular to this type of game. The information structure of a game is a specification of how much each player knows at the time he chooses his strategy. For example, in Stackelberg games [18], where there are leaders and followers, some players (the followers) choose their strategies only after the strategic choices made by the leaders have already been revealed.

In order to describe a normal-form game we need to specify the players' strategy spaces and cost functions. A strategy space for a player is the set of all strategies available to that player, where a strategy is a complete plan of action for every stage of the game, regardless of whether that stage actually arises in play. A cost function of a player i, Ji, is a mapping from the cross-product of the players' strategy spaces to that player's set of costs (normally the set of real numbers); hence it depends on all players' strategies. We will mostly be concerned with this type of normal-form game herein. For any strategy profile (N-tuple of players' pure strategies) u = (u1, . . . , uN) ∈ Ω, where Ω = Ω1 × · · · × ΩN is the overall pure-strategy space, let Ji(u) ∈ R denote the associated cost for player i, i ∈ N. Now these costs depend on the context: in economics they represent a firm's profits or a consumer's (von Neumann-Morgenstern) utility, while in biology they represent fitness (the expected number of surviving offspring). We gather all these real numbers Ji(u) to form the combined pure-strategy vector cost function of the game, J : Ω → R^N, where

J(u) = [J1(u), . . . , JN(u)]^T
We shall denote a normal-form game by G(N, Ωi, Ji). It is possible to tabulate the function J for all possible values of u = (u1, . . . , uN) ∈ Ω either in the form of a relation (easier for continuous or infinite games) or as an N-dimensional array (table) in the case of finite games (when Ω is a finite set). In the latter case, when N = 2 this reduces to a matrix whose size is given by the number of available choices for the two players and whose elements are pairs of real numbers corresponding to the outcomes (costs) for the two players. That is the reason why, even when there are N > 2 players, such normal-form games are called matrix games.
Let us look at a few examples for N = 2, where we shall list player 1's choices as the rows and player 2's choices as the columns. Hence entry (j, k) indicates the outcome of player 1 using its j-th pure strategy and player 2 using its k-th.

Example 1.3. Matching Pennies:


Consider the game of Matching Pennies above, where each player has two strategies, "Heads" (H) or "Tails" (T). The normal form of this game is described by the table

                     player 2
                      H         T
player 1    H     (-1,1)    (1,-1)
            T     (1,-1)    (-1,1)

or given as the matrix

M = [ (−1, 1)   (1, −1) ]
    [ (1, −1)   (−1, 1) ]
Most of the time, instead of M, we shall use a pair of cost matrices (A, B) to indicate the outcome for each player separately, matrix A for player 1 and matrix B for player 2. For the above game this simply means the pair of matrices (A, B), where

A = [ −1   1 ]       B = [  1  −1 ]
    [  1  −1 ],          [ −1   1 ]
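As a quick check (a sketch only, using NumPy), the matrices above can be entered directly in code; the snippet verifies the zero-sum property B = −A and reads off the outcome of a pure-strategy pair:

```python
import numpy as np

# Matching Pennies: rows = player 1's strategies (H, T), columns = player 2's (H, T).
A = np.array([[-1,  1],
              [ 1, -1]])    # costs of player 1
B = np.array([[ 1, -1],
              [-1,  1]])    # costs of player 2

assert (B == -A).all()      # zero-sum: the two players' costs cancel

j, k = 0, 1                 # player 1 plays H (row 0), player 2 plays T (column 1)
print(A[j, k], B[j, k])     # 1 -1
```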

It turns out that one can transform any game in extensive-form into an equivalent game in normal-form, so we shall
restrict most of our theoretical development to games in normal-form only.
10 CHAPTER 1. THE NAME OF THE GAME

1.4 Game Features

Depending on various features of the game, one can classify games into different categories. Below we briefly discuss such a classification according to the competitive nature of the game, the knowledge/information available to the players, and the number of times the game is repeated.

1.4.1 Competitive versus Cooperative

A cooperative game is one in which there can be cooperation between the players and/or the players have the same cost (such games are also called team games). A non-cooperative game is one where an element of competition exists; here we can have coordination games, constant-sum games, and games of conflicting interests. We give examples of each below for the class of matrix games.

Coordination Games

In coordination games, what is good for one player is good for all players. An example coordination game in normal
form is

M = [ (−3, −3)   (0, 0) ]
    [ (0, 0)   (−4, −4) ]
In this game, players try to coordinate their actions. The joint action (j, k) = (2, 2) is the most desirable (least cost), but the joint action (j, k) = (1, 1) also produces negative costs for the players. This particular game is called a pure coordination game, since the players always receive the same payoff.
Other coordination games move more toward the domain of games of conflicting interests. For example, consider the Stag-Hunt (SH) game, where each player chooses between stag and hare (we shall come back to this example):

M = [ (−4, −4)   (0, −1) ]
    [ (−1, 0)   (−1, −1) ]
In this game, each player can choose to hunt stag (first row or first column) or hare (second row or second column). In order to catch a stag (the bigger animal, hence the bigger payoff, or lowest cost of −4), both players must choose to hunt the stag. However, a hunter does not need help to catch a hare, which yields a cost of −1. Thus, in general, it is best for the hunters to coordinate their efforts to hunt stag, but there is considerable risk in doing so (if the other player decides to hunt hare). In this game, the costs (payoffs) are the same for both players when they coordinate their actions, but not when they fail to coordinate.

Constant-Sum Games

Constant-sum games are games in which the players' payoffs sum to the same constant in every outcome. These are games of pure competition, of the type "my gain is your loss." Zero-sum games are a particular example of such games, and we shall study them in detail in the next chapter. An example is the Rock, Paper, Scissors game, with the matrix form below:

M = [ (0, 0)    (1, −1)   (−1, 1) ]
    [ (−1, 1)   (0, 0)    (1, −1) ]
    [ (1, −1)   (−1, 1)   (0, 0)  ]

Games of Conflicting Interests

These fall in between constant-sum games and coordination games and cover a large class in which the players have somewhat opposing interests, but all players can benefit from making certain compromises. People (and learning algorithms) are often tempted to play competitively in these games (both in the real world and in games), though they can often hurt themselves by doing so. One of the most celebrated games of this type is the Prisoners' Dilemma game, which we shall encounter many times. Here the two players are criminals, whose action choices are to "Confess" (defect) or "Not Confess" (cooperate), resulting in corresponding jail time:

M = [ (5, 5)    (0, 15) ]
    [ (15, 0)   (1, 1)  ]

1.4.2 Repetition

Any of the previously mentioned kinds of games can be played any number of times between the same players.

One-shot Games

In one-shot games, players interact for only a single round (or stage). Thus, in these situations there is no possible
way for players to reciprocate (by inflicting punishment or rewards) thereafter.

Repeated Games

In repeated games, players interact with each other for multiple rounds, each time playing the same game. In such situations, players have opportunities to adapt to each other's behaviours (i.e., to "learn") in order to try to become more successful. There can be finite-horizon repeated games, where the same game is repeated a fixed number of times by the same players, or infinite-horizon games, in which the play is repeated indefinitely.

Dynamic Games

The case where the game changes as players interact repeatedly is what can be called a repeated dynamic game, characterized by a state; in continuous time such games are called differential games. It is worth pointing out an important such class: stochastic games, or Markov games. These are extensions of Markov decision processes to the scenario with N multiple adapting players. In a stochastic game we can model probabilistic transitions; these games are similar to extensive form games, played in a sequence of stages. Formally, an N-player stochastic game is denoted by (Σ, Ωi, T, Ji), where Σ is a set of states, Ωi, i = 1, . . . , N, is the set of actions for player i and Ω = Ω1 × · · · × ΩN is the set of joint actions, T is a transition function T : Σ × Ω × Σ → [0, 1], and Ji : Σ × Ω → R is the cost function for player i.
We shall not cover these games in these notes.
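Although we do not develop stochastic games here, the tuple (Σ, Ωi, T, Ji) translates directly into a container type. The Python sketch below is only a skeleton, with illustrative field names of our own choosing and no solution logic:

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

@dataclass
class StochasticGame:
    """Bare-bones container mirroring the tuple (Sigma, Omega_i, T, J_i)."""
    states: Sequence                                       # Sigma, the set of states
    actions: Sequence[Sequence]                            # actions[i] = Omega_i for player i
    transition: Callable[[object, Tuple, object], float]   # T(s, a, s') in [0, 1]
    costs: Sequence[Callable[[object, Tuple], float]]      # costs[i] = J_i(s, a)
```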

1.4.3 Knowledge Information

Depending on the amount of information a player has, different plays and outcomes may be possible. For example, does a player know the costs (or preference orderings) of the other players? Does the player know its own cost (payoff) matrix? Can it view the actions and costs of other players? All of these (and other related) questions are important, as they can help determine how the player should learn and act. Theoretically, the more information a player has about the game, the better it should be able to do. In short, the information a player has about the game can vary along the following dimensions: knowledge of the player's own actions; knowledge of the player's own costs; knowledge of the existence of other players; knowledge of the other players' actions; knowledge of the other players' costs; and, in case learning is used, knowledge of the other players' learning algorithms.
In a game with complete information each player has knowledge of the payoffs and possible strategies of the other players. Thus, incomplete information refers to situations in which the payoffs and strategies of other players are not completely known. The term perfect information refers to situations in which the actual actions taken by associates are fully observable. Thus, imperfect information implies that the exact actions taken by associates are not fully known.

1.5 Solution Concepts

A solution concept describes how to use a certain set of mathematical rules to decide how to play the game. Various solution concepts have been developed in trying to indicate or predict how players will behave when they play a generic game. Here we only introduce these solution concepts briefly; many will be studied further in depth.

1.5.1 Minimax Solution

One of the most basic solution concepts for a game is the minimax solution (or minimax strategy), also called the security strategy. The minimax solution is the strategy that minimizes a player's maximum expected loss (cost). There is an alternative set of terminology (often used in the literature, as we mentioned before): rather than speaking of minimizing one's maximum expected loss, we can speak of maximizing one's minimum expected payoff. This is known as the maximin solution. Thus, the terms minimax and maximin can be used interchangeably. We shall study this solution in depth in the next chapter for zero-sum games.
Let us look at the Prisoners' Dilemma matrix game above. In the Prisoners' Dilemma (PD), both players are faced with the choice to "Confess" (defect) or "Not Confess" (cooperate). If both players defect (confess), they both receive a moderate cost (5 years in prison). However, if one of the players plays "Not Confess" and the other "Confesses" (defects), the defector gets a very low cost (0 years, i.e., goes free; called the temptation cost), and the other ("Not Confess") gets all the blame and receives a relatively high cost (15 years). If both players play "Not Confess" (cooperate), they get only 1 year (a low cost). So what should you do in this game? Well, there are a lot of ways to look at it, but if you want to play conservatively, you might want to invoke the minimax solution concept, which follows from the following reasoning. If you play "Confess" (defect), the worst you can do is incur a cost of 5 (thus, we say that the security level of defecting is 5 years). Likewise, if you play "Not Confess" (cooperate), the worst you can do is incur a cost of 15 years (the security level of cooperating is 15 years). The minimax strategy (the lower cost of the two) in this game is therefore "Confess" (defect), and the minimax value is 5. However, even though the minimax value is the lowest cost you can guarantee yourself without the cooperation of your associates, you might be able to do much better on average than the minimax strategy if you can either outsmart your associates or get them to cooperate (see above: only 1 year each) in a game that is not fully competitive. So we need other solution concepts as well.
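The reasoning above is a one-line computation on player 1's cost matrix. A minimal Python sketch (the matrix encodes the jail terms of the PD game above; row 0 is "Confess", row 1 is "Not Confess"):

```python
import numpy as np

A = np.array([[ 5,  0],
              [15,  1]])              # player 1's costs in the Prisoners' Dilemma

security = A.max(axis=1)              # worst-case cost of each row: [ 5, 15]
minimax_row = int(security.argmin())  # 0, i.e., "Confess"
minimax_value = int(security.min())   # 5

print(minimax_row, minimax_value)     # 0 5
```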

1.5.2 Best Response

Another basic solution concept in multi-player games is to play the strategy that gives you the lowest cost given your opponents' strategies. That is exactly what the notion of best response suggests. Suppose that you are player i, and your opponents play u−i. Then your best response in terms of pure strategies is u∗i such that

Ji (u∗i , u−i ) ≤ Ji (ui , u−i ), ∀ui ∈ Ωi

We shall see that the best response idea has had a huge impact on learning algorithms. If you know what the other players are going to do, why not get the lowest cost (highest payoff) you can (i.e., why not play a best response)? Taking this one step further, you might reason that if you think you know what the other players are going to do, why not play a best response to that belief? While this is not an unreasonable idea, it has two problems. The first is that your belief may be wrong, which might expose you to terrible risks. Second (and perhaps more importantly), we will see that this "best-response" approach can be quite unproductive in a repeated game when other players are also learning/adapting.
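In a finite game, computing a pure-strategy best response against a known opponent action is a single column scan of your cost matrix. A minimal sketch, reusing the Prisoners' Dilemma costs from the previous subsection:

```python
import numpy as np

def best_response(A, k):
    """Row minimizing player 1's cost A[:, k] against player 2's column k."""
    return int(np.argmin(A[:, k]))

A = np.array([[ 5,  0],
              [15,  1]])        # player 1's Prisoners' Dilemma costs
print(best_response(A, 0))      # 0: against "Confess", confessing is best
print(best_response(A, 1))      # 0: against "Not Confess", confessing is still best
```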
Best response dynamics
In evolutionary game theory, best response dynamics (BR dynamics) represents a class of strategy updating rules in which players' strategies in the next round are determined by their best responses. In a large population model, players choose their next action probabilistically based on which strategies are best responses to the population as a whole. In general this leads to a best-reaction correspondence (possibly multi-valued), with "jumps" from one strategy to another. Importantly, in these models players only choose the strategy that would give them the highest payoff on the next round; they do not consider the effect that choosing a strategy now would have on future play in the game. This constraint results in the dynamical rule often being called myopic best response. In order to avoid the use of multi-valued best response correspondences, some models use smoothed best response functions.
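One common smoothing is a logit choice rule, in which lower-cost strategies receive exponentially more probability mass; the function name and the sharpness parameter eta below are our own illustrative choices:

```python
import numpy as np

def logit_response(costs, eta=10.0):
    """Smoothed best response: a full-support distribution favouring low costs."""
    w = np.exp(-eta * (costs - costs.min()))   # shift by the minimum for numerical stability
    return w / w.sum()

print(logit_response(np.array([5.0, 15.0])))   # almost all mass on the first strategy
```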

1.5.3 Nash Equilibrium Solution

We now briefly introduce the most celebrated solution concept for an N-player non-cooperative game G. John Nash's identification of the Nash equilibrium concept has had perhaps the single biggest impact on game theory. Simply put, in a Nash equilibrium, no player has an incentive to unilaterally deviate from its current strategy. Put another way, if each player plays a best response to the strategies of all other players, we have a Nash equilibrium. We will discuss the extent to which this concept is satisfying by looking at a few examples later on.

Definition 1.4. Given a game G a strategy N-tuple (profile) u∗ = (u∗1 , . . . , u∗N ) is said to be a Nash equilibrium (or
in equilibrium) if and only if

Ji (u∗1 , . . . , u∗N ) ≤ Ji (u∗1 , . . . , u∗i−1 , ui , u∗i+1 , . . . , u∗N ), ∀ui ∈ Ωi , ∀i ∈ N (1.1)

or, in compact notation,


Ji (u∗i , u∗−i ) ≤ Ji (ui , u∗−i ), ∀ui ∈ Ωi , ∀i ∈ N

where u∗ = (u∗i , u∗−i ) and u∗−i denotes u∗ for all players except the ith one, N = {1, . . . , N}.

Thus u∗ is an equilibrium if no player has a positive incentive for a unilateral change of his strategy, i.e., assuming the others keep their strategies unchanged. In particular this means that once all choices of pure strategies have been revealed, no player has any cause for regret (hence the "point of no regret" concept).

Example 1.5. Consider the game with normal form (the entries here are payoffs, to be maximized)

                      player 2
                     u2,1      u2,2
player 1    u1,1    (3,1)     (0,0)
            u1,2    (0,0)     (1,3)

and note that both (u1,1, u2,1) and (u1,2, u2,2) are equilibrium pairs. For matrix games we shall use the matrix notation, and for the above we will say that (3,1) and (1,3) are equilibria.

If we look at another game (a coordination game),

M = [ (−3, −3)   (0, 0) ]
    [ (0, 0)   (−1, −1) ]

By inspection, as above, we can conclude that both joint actions (j, k) = (1, 1) and (j, k) = (2, 2) are Nash equilibria, since in both cases neither player can benefit by unilaterally changing its strategy. Note, however, that this illustrates that not all Nash equilibria are created equal. Some give better costs than others (and different players might have different preference orderings over the Nash equilibria). While all the Nash equilibria we have identified so far for these two games are pure-strategy Nash equilibria, they need not be so. In fact, there is also a third Nash equilibrium in the above coordination game in which both players play mixed strategies. Unfortunately, not every game has a pure equilibrium. Take a look at the game of Matching Pennies in Example 1.3, and you can see that it does not have (pure) equilibrium pairs.
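Pure-strategy Nash equilibria of a finite bimatrix game can be found by brute force, checking the inequality of Definition 1.4 at every cell of the cost matrices. A minimal sketch (cost-minimization convention; indices are 0-based), applied to the coordination game above:

```python
import numpy as np

def pure_nash(A, B):
    """All (j, k) where neither player can lower its cost by deviating alone."""
    return [(j, k)
            for j in range(A.shape[0]) for k in range(A.shape[1])
            if A[j, k] <= A[:, k].min() and B[j, k] <= B[j, :].min()]

A = np.array([[-3, 0], [0, -1]])   # the coordination game above: player 1's costs
B = np.array([[-3, 0], [0, -1]])   # player 2's costs (identical here)
print(pure_nash(A, B))             # [(0, 0), (1, 1)], i.e., (1,1) and (2,2) in 1-based indexing
```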
Strategic dominance is another solution concept that can be used in many games. Loosely, an action is strategically dominated if it never produces lower costs (higher payoffs) and (at least) sometimes gives higher costs (lower payoffs) than some other action. An action is strategically dominant if it strategically dominates all other actions. We shall formally define this later on. For example, in the Prisoners' Dilemma (PD) game, the action "Confess" (defect) strategically dominates "Not Confess" (cooperate) in the one-shot game. This concept of strategic dominance (or just dominance, as we will sometimes call it) can be used in some games (called iterated-dominance-solvable games) to compute a Nash equilibrium.
Here are a couple more observations about the Nash equilibrium as a solution concept:
• In constant-sum games, the minimax solution is a Nash equilibrium of the game. In fact, it is the unique Nash equilibrium of a constant-sum game as long as there is not more than one minimax solution (multiple minimax solutions occur only when two strategies have the same security level). We shall study this in the next chapter.
• Since a game can have multiple Nash equilibria, this concept does not tell us how to play a game (or how we would guess others would play the game). This poses another question: given multiple Nash equilibria, which one should (or will) be played? This leads to the idea of considering refinements of the NE concept.

1.5.4 Pareto Optimality

One of the features of a Nash equilibrium (NE) is that in general it does not correspond to a socially optimal outcome. That is, for a given game it may be possible for all the players to improve their costs (payoffs) by collectively agreeing to choose a strategy profile different from the NE. The reason such agreements fail is that, a posteriori, some players may choose to deviate from the cooperatively agreed-upon strategy in order to improve their payoffs further at the group's expense. A Pareto optimal outcome describes a social optimum in the sense that no individual player can improve his payoff (or lower his cost) without making at least one other player worse off. Pareto optimality is not a solution concept, but it can be an important attribute in determining what solution the players should play (or learn to play). Loosely, a Pareto optimal (also called Pareto efficient) solution is one for which there exists no other solution that gives every player in the game a higher payoff (lower cost). A Pareto efficient (PE) solution is formally defined as follows.

Definition 1.6. A solution u∗ is strictly Pareto dominated if there exists a joint action u ∈ Ω for which Ji(u) < Ji(u∗) for all i, and weakly Pareto dominated if there exists a joint action u ≠ u∗, u ∈ Ω, for which Ji(u) ≤ Ji(u∗) for all i.

Definition 1.7. A solution u∗ is weakly Pareto efficient (PE) if it is not strictly Pareto dominated and strictly PE if it
is not weakly Pareto dominated.
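In a finite game these conditions can be checked exhaustively. A minimal sketch testing strict Pareto dominance on the Prisoners' Dilemma costs; it confirms that mutual "Confess", although a Nash equilibrium, is strictly Pareto dominated by mutual "Not Confess":

```python
import numpy as np

def strictly_dominated(A, B, jk):
    """True if some joint action gives *every* player a strictly lower cost."""
    j, k = jk
    return any(A[p, q] < A[j, k] and B[p, q] < B[j, k]
               for p in range(A.shape[0]) for q in range(A.shape[1]))

A = np.array([[ 5,  0],
              [15,  1]])                 # Prisoners' Dilemma: player 1's costs
B = A.T                                  # player 2's costs, by symmetry of the game
print(strictly_dominated(A, B, (0, 0)))  # True: (Confess, Confess) is dominated by (1, 1)
```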

Often, a Nash equilibrium (NE) is not Pareto efficient (optimal). One then speaks of a loss of efficiency, quantified by what is referred to as the price of anarchy. An interesting problem is how to design games with improved Nash efficiency; pricing and mechanism design are concerned with such issues.
In addition to these solution concepts, other important ones include the Stackelberg equilibrium [18], which is relevant in games where the information structure plays an important role, and correlated equilibria [34], [79], which are relevant in games where the randomizations used to translate players' mixed strategies into actions are correlated.

1.5.5 Examples

Recall Example 1.3, the Matching Pennies game. This is a zero-sum game, or strictly competitive game, where the interests are diametrically opposed and B = −A. It turns out this game has no (pure) Nash equilibrium.
Let us look at a couple of other examples of the simplest two-player matrix games, where the costs of the two players take the form of two matrices, A and B.

Example 1.8. Battle of the Sexes (BoS) Game:


Battle of the sexes (BoS), also called Bach or Stravinsky is a two-player coordination game. Imagine a couple that
agreed to meet this evening, but cannot recall if they will be attending the opera (1st option) or a football match (2nd
option). The husband would most of all like to go to the football game. The wife would like to go to the opera. Both
would prefer to go to the same place rather than different ones. If they cannot communicate, where should they go?
The pair or cost matrices below is an example of Battle of the Sexes, where the wife chooses a row and the husband

chooses a column, A for wife, B for husband.


A = [ −3  0 ; 0  −2 ],   B = [ −2  0 ; 0  −3 ]

This representation does not account for the additional harm that might come from not only going to different
locations, but going to the wrong one as well (e.g. he goes to the opera while she goes to the football game,
satisfying neither). In order to account for this, the game is sometimes represented as in the pair below
A = [ −3  −1 ; 0  −2 ],   B = [ −2  −1 ; 0  −3 ]

This game has two pure-strategy Nash equilibria, one where both go to the opera and another where both go to the
football game. For the first game above, there is also a Nash equilibrium in mixed strategies, where each player goes
to his or her preferred event more often than to the other.
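As a quick numerical check of the two pure equilibria, one can scan the cost matrices for pairs that are simultaneously column-minimal for the wife and row-minimal for the husband. A minimal sketch in Python/NumPy (the helper name pure_nash_equilibria is ours):

    import numpy as np

    def pure_nash_equilibria(A, B):
        # (j, k) is a pure NE of the bimatrix cost game (A, B) if A[j, k] is
        # minimal in column k (row player cannot lower her cost by deviating)
        # and B[j, k] is minimal in row j (column player cannot lower his).
        A, B = np.asarray(A), np.asarray(B)
        return [(j, k)
                for j in range(A.shape[0]) for k in range(A.shape[1])
                if A[j, k] <= A[:, k].min() and B[j, k] <= B[j, :].min()]

    # Second BoS variant above: wife picks the row, husband picks the column.
    A = [[-3, -1], [0, -2]]
    B = [[-2, -1], [0, -3]]
    print(pure_nash_equilibria(A, B))  # [(0, 0), (1, 1)]: both opera, or both football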

Example 1.9. Hawk-and-Dove (HD) Game (Chicken Game):


Consider a two-player game in which each player is an animal fighting over some prey (or deciding whether or not to start a
conflict). Each can behave like a Hawk (1st choice) or a Dove (2nd choice), hence the parallel between belligerent and
non-belligerent behavior. The best outcome for each animal/player is the one in which it acts as a Hawk while the other acts as
a Dove; the worst outcome is the one in which both act like Hawks. Hence each animal prefers to be hawkish if the other
is dovish, and dovish if the opponent is hawkish. A game that captures this is shown below. Notice that it has two
Nash equilibria, (Hawk, Dove) and (Dove, Hawk), corresponding to two different conventions about the player who
yields.

A = [ 0  −4 ; −1  −3 ],   B = [ 0  −1 ; −4  −3 ]
This is again a symmetric game (B = A^T). In terms of a single matrix M with double entries it is described by
M = [ (0, 0)  (−4, −1) ; (−1, −4)  (−3, −3) ]

The earliest presentation of a form of the Hawk-Dove game was by John Maynard Smith and George Price in their
1973 Nature paper, “The logic of animal conflict". The game is also known as Chicken, an influential
model of conflict for two players in game theory. The principle of the game is that while each player prefers not to
yield to the other, the worst possible outcome occurs when both players do not yield. The name “Hawk-Dove" refers
to a situation in which there is a competition for a shared resource and the contestants can choose either conciliation
(“Dove") or conflict (“Hawk"); this terminology is most commonly used in biology and evolutionary game theory
(EGT). From a game-theoretic point of view, “Chicken" and “Hawk-Dove" are identical; the different names stem
from the parallel development of the basic principles in different research areas. We shall use the Hawk-Dove game in
the latter part of the notes concerned with the EGT approach.
Chapter 2
Matrix Games: 2-player Zero-Sum

Chapter Summary

This chapter provides basic concepts and results for two-player zero-sum finite (matrix) (2PZSM) games. Material
is mostly adapted from [18], [71, 78].

2.1 Introduction

Recall the Matching Pennies example in Chapter 1, a two-player game in which each player has only two pure
strategies, “Heads" (H) or “Tails" (T). The normal form (strategic form) of this game is described by

A = [ −1  1 ; 1  −1 ]

where the rows correspond to player 1 (P1) choosing (H) or (T), the columns to player 2 (P2) choosing (H) or (T),
and B = −A. Here we can think of player 1 (P1) winning a dollar from player 2 (P2) if their choices match (cost −1
for P1) and losing a dollar to player 2 (P2) if they do not (cost +1). This is an example of a two-player zero-sum
matrix game of the type we study in this chapter.
We start by formalizing such a game, the cost functions, strategy spaces and then move on to solution concepts for
the game.
Consider a two-player game, denoted G (N , Ωi , J ), where N = {1, 2} is the set of players, Ωi is the action set and
Ji the cost function for player i ∈ N . Thus J1 , J2 : Ω1 × Ω2 → R. Let the action of player 1 and 2 be denoted by
u1 ∈ Ω1 and u2 ∈ Ω2 , so that the two costs are J1 (u1 , u2 ), J2 (u1 , u2 ).
Such a game is called a two-player zero-sum game if

J1(u1, u2) + J2(u1, u2) = 0,  ∀(u1, u2) ∈ Ω1 × Ω2

where the overall action profile is u = (u1, u2) ∈ Ω1 × Ω2. Player 1 is the minimizer of J1, while player 2 is the
minimizer of J2. In a two-player zero-sum game, based on the above relation, player 2 is then the maximizer of J1. In such a


case, sometimes we shall drop the index of cost function, and use J = J1 so that we say that Player 1 is the minimizer
of J, while Player 2 is the maximizer of J.
Assume now that player 1 (P1) and player 2 (P2) each have a finite number of discrete options/actions or pure
strategies to choose from, m1 = m, m2 = n. Then the set of their actions Ω1 and Ω2 can be simply identified
with the set of indices M1 := {1, ..., m} and M2 := {1, ..., n} corresponding to these possible actions. The action
u1 ∈ Ω1 of player 1 can have m values, and the j-th action can be identified with the index j ∈ M1 . Thus we let
Ω1 := {e11 , . . . , e1j , . . . , e1m }, where e1j ∈ Rm is the j-th unit vector in Rm . Similarly for player 2, u2 ∈ Ω2 can have n
values, and the k-th action can be identified with the index k ∈ M2 . Thus we let Ω2 := {e21 , . . . , e2k , . . . , e2n }, where
e2k ∈ Rn is the k-th unit vector in Rn . Let a jk denote the game outcome for player 1 when the j-th action is used by
player 1 and the k-th by player 2, respectively. This leads to an overall (m × n) matrix A, hence the name matrix
game. In terms of the cost function for player 1, we can write that his cost when (u1 , u2 ) pair is set to the ( j, k)-th
action pair or (u1 , u2 ) = (e1j , e2k ), is
J1 (e1j , e2k ) = a jk = (e1j )T A e2k
Thus
J1 (u1 , u2 ) = (u1 )T A u2
where u1 ∈ Ω1 , u2 ∈ Ω2 . Similarly, a cost matrix B can be defined for player 2, and correspondingly its cost function

J2 (u1 , u2 ) = (u1 )T B u2

Specializing the definition of a zero-sum game to finite action sets (matrix games) gives the condition

A + B = 0, or B = −A

Hence, we can simply identify G by using only a single cost matrix A. Each entry of the matrix A is an outcome of
the game corresponding to a particular pair of decisions by the players. Player 1 can be seen as choosing one of the
m rows in matrix A and player 2 as choosing one of its n columns.
As an example, consider the case when the first player has m = 3 choices, while the second player has n = 2 choices,
so that

e^1_1 = [1, 0, 0]^T,  e^1_2 = [0, 1, 0]^T,  e^1_3 = [0, 0, 1]^T,   e^2_1 = [1, 0]^T,  e^2_2 = [0, 1]^T

Consider the game with cost matrix

A = [ 5  3  −3 ; 1  2  0 ; 3  4  1 ]   (2.1)
Assume that the game is played only once. Since player 1 is the minimizer, a reasonable choice for him is to
minimize his losses against any choice of player 2. Thus player 1 assumes the worst case, i.e., that player 2 picks
the column with the maximum entry in each row (hence 5, 2, or 4), and among these player 1 chooses the minimum,
which is 2; hence he will choose row 2. Mathematically this means that he chooses row j∗ such that

max_k a_{j∗k} ≤ max_k a_{jk},  ∀j = 1, . . . , m   (2.2)

If player 1 chooses like that, his losses will be at most 2, also called his loss ceiling, or the security level for his
losses, and strategy j∗ is called a security strategy of player 1. Similarly, player 2, who is the “maximizer", aims
to secure his gains, hence to get at least the gain floor, or the security level for his gains, and strategy k∗ is called a
security strategy of player 2. Assuming again the worst case, i.e., that player 1 picks the row with the minimum entry in each
column (hence 1, 2, or −3), player 2 will pick the maximum of them, which is 2; hence he will choose column
2. So in this case a security strategy of player 1 is row 2 and of player 2 is column 2, i.e., (2, 2), and the outcome is 2.
Then, denoting the left-hand side in the above as

J_U = min_j max_k a_{jk}

and similarly letting

J_L = max_k min_j a_{jk}

we note that in the above case

J_U = J_L = 2

and the pair (j∗, k∗) = (2, 2) ∈ M1 × M2 is called a saddle-point equilibrium; the value of the game is J = 2.
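This worst-case computation is easy to script. A minimal sketch in Python/NumPy (indices are 0-based, so row 2 and column 2 appear as index 1):

    import numpy as np

    A = np.array([[5, 3, -3],
                  [1, 2,  0],
                  [3, 4,  1]])              # cost matrix (2.1); player 1 minimizes

    J_U = A.max(axis=1).min()               # loss ceiling: min over rows of the row maxima
    J_L = A.min(axis=0).max()               # gain floor: max over columns of the column minima
    j_star = int(A.max(axis=1).argmin())    # a security strategy (row) of player 1
    k_star = int(A.min(axis=0).argmax())    # a security strategy (column) of player 2
    print(J_U, J_L, (j_star, k_star))       # 2 2 (1, 1), i.e. row 2, column 2, value 2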
Formally we can write

Definition 2.1. For a given m × n zero sum matrix game G with cost matrix A, let j∗ ∈ M1 be a (pure) strategy
chosen by player 1. Then j∗ is called a security strategy for player 1 if the following holds for all j ∈ M1

J_U := max_{k∈M2} a_{j∗k} = max_{k∈M2} (e^1_{j∗})^T A e^2_k ≤ max_{k∈M2} (e^1_j)^T A e^2_k = max_{k∈M2} a_{jk},  ∀j ∈ M1

hence

J_U := min_{j∈M1} max_{k∈M2} a_{jk} = min_{j∈M1} max_{k∈M2} (e^1_j)^T A e^2_k

Similarly, a security strategy k∗ for player 2 can be defined, giving his security level denoted by J_L. J_U and J_L are the
security levels of players 1 and 2, respectively, or the upper and lower values of the game.
Consider another example given by

A = [ 0  −1  3 ; 1  0  0 ; 3  4  1 ]   (2.3)
In this case the security level of player 1 is (for each row maximum over columns is [3, 1, 4]T and player 1 picks row
with min value, so row 2 is picked)
J_U = min_j max_k a_{jk} = 1

while the security level of player 2 is (for each column minimum over rows is [0, −1, 0] and player 2 picks column
with max value, so either column 1 or 3 is picked)

J_L = max_k min_j a_{jk} = 0

If both players play their security strategies, hence either they use the strategy pairs (2, 1) or (2, 3), then the outcome
of the game is either 1 or 0.
It is obvious that in any matrix game there always exists at least one security strategy for each player, since there is a
finite number of choices. Moreover, the (pure) maximin value is always less than or equal to the (pure) minimax value, i.e.,

J_L ≤ J_U

To prove this note that we have

min_j a_{jk} ≤ a_{jk},  ∀j, k

and

a_{jk} ≤ max_k a_{jk},  ∀j, k

Hence, taking some fixed j = j∗ and k = k∗,

min_j a_{jk∗} ≤ a_{j∗k∗} ≤ max_k a_{j∗k}

hence with the notation above, and taking j∗ and k∗ as the security strategies, this yields

min_j a_{jk∗} = J_L ≤ a_{j∗k∗} ≤ J_U = max_k a_{j∗k}   (2.4)

This says that in any game the outcome lies between J_L and J_U, which are also called the lower and upper values
(floor and ceiling values) of the game. If these two values are equal, then the game has a saddle-point equilibrium, and
the common value is called the value of the game; we saw that this is the case for the game with cost matrix (2.1), where the value is
J∗ = 2. Unfortunately, not all games possess such a saddle-point equilibrium.
Loosely speaking we shall call a strategy pair ( j∗ , k∗ ) an equilibrium (or to be in equilibrium) if after the game is
over, based on the outcome obtained, the players have no ground for regretting their past actions. In order to see
where this equilibrium concept is coming from let us look at another example given below in (2.5). We assume that
the two players act independently and the game is played once.
 
A = [ 4  0  −1 ; 0  −1  3 ; 1  2  1 ]   (2.5)

In this case the security level of player 1 is (for each row maximum over columns is [4, 3, 2]T and player 1 picks row
with min value, so row 3 is picked)
J_U = min_j max_k a_{jk} = 2

and his security strategy is row 3. The security level of player 2 is (for each column minimum over rows is
[0, −1, −1] and player 2 picks column with max value, so column 1 is picked)

J_L = max_k min_j a_{jk} = 0

and his security strategy is column 1. Now if both players play their security strategies, hence they use the strategy
pair (3, 1), then the outcome of the game is 1 which is in between JL and JU . To test the condition for regret, consider
player 1 evaluating this outcome and thinking: “If I knew that player 2 plays his security strategy (hence column
1), I would have chosen row 2 and get a smaller outcome of 0", hence regrets his actions. Similarly player 2 can

think, and we see that the security strategy pair does not possess equilibrium properties. On the other hand, such a
security strategy pair is conservative: had a player chosen a different strategy, he might have obtained a worse
outcome than his security level. Here we see that the knowledge one player has can make a difference in how he
plays and what outcome he gets. Such knowledge would, for example, be available to player 1 if the players do not
act independently but player 2 acts first, followed by player 1. In general we shall assume simultaneous play.
Let us look again at the game with cost matrix (2.1). The security strategy pair is (2, 2) with value 2 and neither player has any
cause for regret; we say the strategies are optimal against one another, or in equilibrium, called a saddle-point
equilibrium in pure strategies. Also, since the security levels coincide, it does not matter whether the players act simultaneously
(independently) or in a predetermined order. We give next the formal definition of such an equilibrium.

Definition 2.2. For a given m × n matrix game A = [a jk ], let row j∗ and column k∗ , i.e., ( j∗ , k∗ ) be a pair of (pure)
strategies chosen by the two players. Then if

a_{j∗k} ≤ a_{j∗k∗} ≤ a_{jk∗},  ∀j = 1, . . . , m, ∀k = 1, . . . , n

the pair ( j∗ , k∗ ) is a saddle-point equilibrium and the matrix game is said to have a saddle-point in pure strategies,
and the corresponding outcome J ∗ = a j∗ k∗ is the saddle-point value of the game.

Note that this point is both the minimum in its column and maximum in its row.
The following result gives properties relating saddle-points with security strategies in zero-sum matrix games.

Proposition 2.3. Consider an m × n matrix game A = [a jk ], with JL = JU . Then


(i) the game has a saddle-point in pure strategies.
(ii) a pair of (pure) strategies ( j∗ , k∗ ) is a saddle-point if and only if j∗ is a security strategy for player 1 and k∗ is a
security strategy for player 2.
(iii) the saddle-point value J ∗ is uniquely given as J ∗ = JU = JL .

Proof: (i) Let j∗ be a security strategy for player 1 and

J_U = max_k a_{j∗k} ≥ a_{j∗k},  ∀k   (2.6)

and k∗ be a security strategy for player 2, so that

J_L = min_j a_{jk∗} ≤ a_{jk∗},  ∀j   (2.7)

Note that j∗ and k∗ always exist since there are a finite number of choices of the actions of each player. Then taking
k = k∗ on the RHS of (2.6) we get
a_{j∗k∗} ≤ J_U = max_k a_{j∗k}
and taking j = j∗ on the RHS of (2.7)
J_L = min_j a_{jk∗} ≤ a_{j∗k∗}

This gives another justification for (2.4). By assumption J_U = J_L, so that from the above

a_{j∗k∗} ≤ max_k a_{j∗k} = J_U = J_L = min_j a_{jk∗} ≤ a_{j∗k∗}

hence J_U = J_L = a_{j∗k∗}. Using this on the LHS of (2.6) and (2.7) yields

a_{j∗k∗} ≥ a_{j∗k}, ∀k   and   a_{j∗k∗} ≤ a_{jk∗}, ∀j

which by Definition 2.2, shows that ( j∗ , k∗ ) is a saddle-point.


(ii) The sufficiency part has just been proved in (i). For necessity, assume (j∗, k∗) is a saddle-point. Then from Definition
2.2,
a j∗ k ≤ a j∗ k∗ ≤ a jk∗ , ∀ j = 1, . . . , m, ∀k = 1, . . . , n
and from the outer left and right side it follows that

a j∗ k ≤ a jk∗ , ∀ j = 1, . . . , m, ∀k = 1, . . . , n

hence
max_k a_{j∗k} ≤ a_{jk∗},  ∀j = 1, . . . , m
Now since a jk∗ ≤ maxk a jk is true always, combining with the above yields

max_k a_{j∗k} ≤ a_{jk∗} ≤ max_k a_{jk},  ∀j = 1, . . . , m

and now the outer left and right sides give (2.2), hence j∗ is a security strategy for player 1. Similarly it can be shown that
k∗ is a security strategy for player 2. Part (iii) is immediate from the above.
□
Remark 2.4. Any two saddle-point strategies are ordered interchangeable, i.e., if ( j1 , k1 ) and ( j2 , k2 ) are two saddle-
point strategies, then ( j1 , k2 ) and ( j2 , k1 ) are also saddle-point strategies.

As another example let us look again at the Matching Pennies game, with cost matrix
A = [ −1  1 ; 1  −1 ]

Note that in each row the maximum over columns is [1, 1]^T and player 1 chooses the minimum, so he can pick either row 1 or 2,
both giving J_U = 1. Similarly, in each column the minimum over rows is [−1, −1] and player 2 chooses the maximum, so he can pick
either column 1 or 2, giving J_L = −1. But J_L ≠ J_U in this case, so this game has no saddle point and hence no
solution in pure strategies.
As a way to get out of this difficulty one could decide not on a single (pure) strategy but on a choice between pure
strategies as dictated by chance (randomizing). Such a probability combination of the original pure strategies is
called a mixed strategy.

2.2 Pure and Mixed Strategies

A player is said to use a mixed strategy whenever he/she chooses to randomize over the set of available actions.
Formally, a mixed strategy is a probability distribution that assigns to each available action a likelihood of being
selected. If only one action has a positive probability of being selected, the player is said to use a pure strategy.
Thus when an equilibrium cannot be found in pure strategies the space of strategies can be enlarged, and the players
are allowed to base their decisions on random events, hence mixed strategies. This is similar to the case when one

tries to solve the equation x^2 + 1 = 0, which has no solution in the real numbers but does in the enlarged space
of complex numbers. Check that in the Matching Pennies game, if both players choose each of their pure strategies
with probability 1/2, then each will have an expected cost (loss) of 0, which seems an acceptable solution. This is the
case when the game is played repeatedly and player 1 (player 2) aims to minimize (maximize) the expected cost
(outcome) over individual plays.
The introduction of mixed strategies was successful as a new solution concept because von Neumann [115] was
able to show (as early as 1928) that for any matrix game the minimax value is equal to the maximin value in mixed
strategies; hence any such game has a solution in mixed strategies. This is known as von Neumann's Minimax
Theorem and is one of the key results of the theory of finite two-player zero-sum games (two-player zero-sum
matrix (2PZSM) games). We shall prove it soon, but for now let us look at some formal definitions and properties of
mixed strategies and expected costs in such strategies.
Consider a 2PZSM game with cost matrix A and u1 ∈ Ω1 := {e11 , . . . , e1j , . . . , e1m }, u2 ∈ Ω2 := {e21 , . . . , e2k , . . . , e2n }.
Recall that the cost (or game outcome) when the two players use pure strategy pair ( j, k), j ∈ M1 := {1, . . . , m},
k ∈ M2 := {1, . . . , n}, or when (u1 , u2 ) = (e1j , e2k ) is [A] j,k = a jk written as

J (u1 , u2 ) = J (e1j , e2k ) = (e1j )T A e2k = [A] j,k = a jk (2.8)

Thus for any (u1 , u2 ) ∈ Ω1 × Ω2 , J : Ω1 × Ω2 → R, J (u1 , u2 ) = (u1 )T A u2 .


Imagine player 1 can use a device to randomly choose one of these pure strategies with a preassigned probability;
this is called a mixed strategy. As an example of such a device, think of using a fair coin for choosing between
two strategies. The player chooses a mixed strategy (hence the probability distribution) and afterwards, in (one or
many) plays of the game, implements the pure strategy indicated by that device. This probability distribution,
i.e., the probabilities assigned to each of the pure strategies, is what constitutes a mixed strategy. Formally we define
this below.
this below.
A mixed strategy of player 1, denoted by x, is a probability distribution defined over the set Ω1 := {e^1_1, . . . , e^1_j, . . . , e^1_m}
of pure strategies; the set of all such distributions is denoted by ∆1(Ω1). Let χ denote the random variable selecting one of the pure
strategies in Ω1 and let x_j denote the probability that player 1 will choose the j-th action (pure strategy) e^1_j in
Ω1, i.e.,

Pr[χ = e^1_j] = x(e^1_j) := x_j

Since Ω1 is finite, the probability distribution x is the vector of probabilities associated with the pure actions, i.e.,
x = [x_j], j ∈ M1 = {1, . . . , m}. Thus x ∈ ∆1, where
∆1 := { x ∈ R^m | ∑_{j=1}^m x_j = 1, x_j ≥ 0, ∀j = 1, . . . , m }   (2.9)

This set ∆1 is the (m − 1)-dimensional unit simplex, written also as

∆1 = {x ∈ Rm | 1Tm x − 1 = 0, x j ≥ 0, ∀ j}

where 1m ∈ Rm is the all ones vector. We can write any mixed-strategy x ∈ ∆1 as


x = ∑_{j∈M1} e^1_j x(e^1_j) = ∑_{j=1}^m e^1_j x_j   (2.10)

where e^1_j ∈ R^m, j = 1, . . . , m, are the unit vectors, i.e., the pure strategies. Hence, by (2.10) and (2.9), any mixed strategy
is a convex combination of the pure strategies e^1_j. Moreover, pure strategies are just extreme cases of mixed
strategies (vertices of the simplex ∆1): e.g., x_j = 1 with the rest 0 gives e^1_j.

Similarly for player 2, we shall identify his mixed strategy with y = [yk ], k ∈ {1, . . . , n}, y ∈ ∆2 , where yk denotes
the probability that player 2 will choose action k from his n available (pure) alternatives in Ω2 and y ∈ ∆2 ,
∆2 := { y ∈ R^n | ∑_{k=1}^n y_k = 1, y_k ≥ 0, ∀k = 1, . . . , n }

and

y = ∑_{k=1}^n e^2_k y(e^2_k) = ∑_{k=1}^n y_k e^2_k

We sometimes denote ∆X = ∆1 × ∆2 .

2.2.1 Mixed-Strategy Cost Function

Recall the cost J (u1 , u2 ) when a pure strategy pair (u1 , u2 ) = (e1j , e2k ) is used. Assuming that the two players’
strategies are jointly independent, the probability of selecting pure strategy pair (u1 , u2 ) = (e1j , e2k ) ∈ Ω1 × Ω2 is
given by x(u1 ) · y(u2 ) = x(e1j ) · y(e2k ) = x j · yk . Then the expected (averaged) cost is
∑_{u1∈Ω1} ∑_{u2∈Ω2} J(u1, u2) x(u1) y(u2) = ∑_{j=1}^m ∑_{k=1}^n J(e^1_j, e^2_k) x(e^1_j) y(e^2_k) = ∑_{j=1}^m ∑_{k=1}^n x_j J(e^1_j, e^2_k) y_k := J(x, y)   (2.11)

and this defines the mixed-strategy cost function, when x ∈ ∆1 , y ∈ ∆2 . Using (2.8) yields for the (average) or
expected cost when mixed-strategy (x, y) is used,
J(x, y) = ∑_{j=1}^m ∑_{k=1}^n x_j a_{jk} y_k = x^T A y   (2.12)

where x ∈ ∆1 and y ∈ ∆2 and A is the cost matrix in pure strategies, with A = [a jk ]. When x, y are restricted to be
vertices of ∆1 , ∆2 , hence are pure strategies, e1j , e2k , we see that this recovers J (e1j , e2k ) = a jk .
A game could be denoted by G (N , Ω, J ) in its pure-strategy representation, or when we refer to its mixed-strategy
extension, by G (N , ∆X , J ).
Remark 2.5. Note that in mixed strategies, while player 1 is the minimizer of J̄_1(x, y) = J(x, y), player 2 is the
minimizer of J̄_2(x, y) = −J(x, y), hence the maximizer of J(x, y), and

J̄_1(x, y) + J̄_2(x, y) = 0

again indicating a zero-sum game. The function J(x, y) = x^T A y, which player 1 aims to minimize and player 2 aims to
maximize, is called the kernel of the game.

Now that we extended the space of strategies and the cost definition let us extend the concept of security strategies
in mixed strategies.

Definition 2.6. For a given two-player zero-sum matrix (2PZSM) game G (N , ∆X , J ) with m × n cost matrix A,
let x∗ ∈ ∆1 be a mixed strategy chosen by player 1. Then x∗ is called a mixed security strategy for player 1 if the
following holds for all x ∈ ∆1
J̄_U := max_{y∈∆2} (x∗)^T A y ≤ max_{y∈∆2} x^T A y,  ∀x ∈ ∆1

hence
J̄_U := min_{x∈∆1} max_{y∈∆2} x^T A y

Similarly, a mixed strategy chosen by player 2, y∗ ∈ ∆2 is called a mixed security strategy for player 2 if the following
holds for all y ∈ ∆2
J̄_L := min_{x∈∆1} x^T A y∗ ≥ min_{x∈∆1} x^T A y,  ∀y ∈ ∆2

hence
J̄_L := max_{y∈∆2} min_{x∈∆1} x^T A y

The quantities J̄_U and J̄_L are the expected (average) security levels of players 1 and 2, respectively, or the expected
(average) upper and lower values of the game.

Note that the maximum and minimum in Definition 2.6 are guaranteed to exist. This follows because, for J̄_U, the
quantity max_{y∈∆2} x^T A y is the maximum of finitely many linear functions of x, hence a continuous function of x ∈ ∆1,
and the simplex ∆1 over which the optimization is done is closed and bounded, hence compact. Similar properties hold for J̄_L.
First we give some properties of the security strategies and security levels in the mixed-strategy case.
Lemma 2.7. In every two-player zero-sum matrix (2PZSM) game with cost matrix A, the security levels in pure and
mixed strategies satisfy the following relations

J_L ≤ J̄_L ≤ J̄_U ≤ J_U

Proof: We claim that for any real-valued function J : ∆1 × ∆2 → R,

max_{y∈∆2} min_{x∈∆1} J(x, y) ≤ min_{x∈∆1} max_{y∈∆2} J(x, y)

holds. Then this holds in particular for J(x, y) = x^T A y, and the middle inequality

J̄_L ≤ J̄_U

follows.
The claim can be proved by using the fact that for all y ∈ ∆2 and all x ∈ ∆1,

min_{x∈∆1} J(x, y) ≤ J(x, y)

Taking the maximum on both sides with respect to y yields

max_{y∈∆2} min_{x∈∆1} J(x, y) ≤ max_{y∈∆2} J(x, y),  ∀x ∈ ∆1

Then, since this holds for all x ∈ ∆1, it still holds when we take the minimum with respect to x, so

max_{y∈∆2} min_{x∈∆1} J(x, y) ≤ min_{x∈∆1} max_{y∈∆2} J(x, y)

and the claim is proved.


The outer inequalities follow from the fact that pure strategies are a subset of the mixed strategies and any x ∈ ∆1 is
a convex combination of e1j .
!

Definition 2.8. For a given two-player zero-sum matrix (2PZSM) game with m × n cost matrix A = [a_{jk}], let (x∗, y∗) ∈
∆X be a pair of mixed strategies chosen by the two players. Then (x∗, y∗) is a saddle-point equilibrium in mixed
strategies if both

(x∗)^T A y∗ ≤ x^T A y∗,  ∀x ∈ ∆1

(x∗)^T A y ≤ (x∗)^T A y∗,  ∀y ∈ ∆2

hold. The quantity J̄∗ = (x∗)^T A y∗ is called the saddle-point value, or the value of the game in mixed strategies.

Note that if the above holds for (x∗, y∗) = (e^1_{j∗}, e^2_{k∗}), the equilibrium is pure, identified by (j∗, k∗), and Definition
2.2 is recovered. In the pure case we saw that the upper and lower values coincide only in special cases. It turns out that in
the mixed-strategy case the two values are always equal. This is one of the most important theorems in game theory
and we shall prove it below.
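Before that, the reader can verify Definition 2.8 on the Matching Pennies game with the uniform pair x∗ = y∗ = (1/2, 1/2); a minimal sketch (the check exploits A y∗ = 0 and (x∗)^T A = 0, so both inequalities hold, in fact with equality, for every x and y):

    import numpy as np

    A = np.array([[-1, 1],
                  [1, -1]])
    x_star = y_star = np.array([0.5, 0.5])

    print(A @ y_star)           # [0. 0.]: hence x^T A y* = 0 for every x in Delta_1
    print(x_star @ A)           # [0. 0.]: hence x*^T A y = 0 for every y in Delta_2
    print(x_star @ A @ y_star)  # 0.0: the value of the game in mixed strategies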

2.3 Minimax Theorem

The next result is one of the most important results in game theory and it has many proofs. We give the original proof
due to John von Neumann (1928), [115]. The one due to Nash (1950) is based on Kakutani's fixed-point theorem, a
much later result (1941).
The proof we give is based on the Separating Hyperplane theorem and uses the following lemma:

Lemma 2.9. Let Q be an arbitrary m × n matrix. Then either (i) or (ii) below must hold:
(i) there exists some y0 ∈ ∆2 such that x^T Q y0 ≤ 0, ∀x ∈ ∆1;
(ii) there exists some x0 ∈ ∆1 such that x0^T Q y ≥ 0, ∀y ∈ ∆2.

We prove this lemma a little later.

Theorem 2.10 (Minimax Theorem). In any two-player zero-sum matrix (2PZSM) game with cost A, where A is a
m × n matrix, or G (N , ∆X , J ), where J (x, y) = xT A y, we have

min_{x∈∆1} max_{y∈∆2} J(x, y) = max_{y∈∆2} min_{x∈∆1} J(x, y)

or

min_{x∈∆1} max_{y∈∆2} x^T A y = max_{y∈∆2} min_{x∈∆1} x^T A y

where ∆1 and ∆2 denote the simplices of appropriate dimensions

∆1 = {x ∈ Rm |1Tm x − 1 = 0, x j ≥ 0, ∀ j}

∆2 = {y ∈ Rn |1Tn y − 1 = 0, yk ≥ 0, ∀k}

Proof:
Step 1: We use Lemma 2.9 to show first that for any arbitrary constant c we have either

(a)  max_{y∈∆2} min_{x∈∆1} x^T A y ≥ c

or
(b)  min_{x∈∆1} max_{y∈∆2} x^T A y ≤ c

To do this, we use Lemma 2.9 for the matrix Q = −A + c 1_{m×n}, where 1_{m×n} denotes the m × n matrix with all entries
equal to 1 and c is any constant. Note that 1_{m×n} has the property that x^T 1_{m×n} y = 1 for every x ∈ ∆1, y ∈ ∆2.
If (i) in Lemma 2.9 holds, it follows that there exists some y0 ∈ ∆2 such that

0 ≥ xT (−A + c 1m×n ) y0 = −xT A y0 + c, ∀x ∈ ∆1

which implies that


xT A y0 ≥ c, ∀x ∈ ∆1

from which (a) follows.


Alternatively, if (ii) in Lemma 2.9 holds, it follows that there exists some x0 ∈ ∆1 such that

0 ≤ xT0 (−A + c 1m×n ) y = −xT0 A y + c xT0 1m×n y = −xT0 A y + c, ∀y ∈ ∆2

Hence
xT0 A y ≤ c, ∀y ∈ ∆2

from which (b) follows.


Step 2: Assume by contradiction that there is a gap between the two values, i.e., there exists some k > 0 such that

min_{x∈∆1} max_{y∈∆2} x^T A y = max_{y∈∆2} min_{x∈∆1} x^T A y + k   (2.13)

Now if a k > 0 exists such that (2.13) holds, then we can take c = max_{y∈∆2} min_{x∈∆1} x^T A y + k/2 and use Step 1, so that
either (a) or (b) has to hold for this c as well. However, one can check that neither (a) nor (b) holds for this c, which is a
contradiction, since at least one of them has to be true. Thus the assumption (2.13) is false, hence there is no gap between
the two values and the proof is complete.
□
We now prove the above Lemma 2.9.
Proof (of Lemma 2.9):
There are two possible cases:
(A) ∃ y0 ∈ ∆2, ∃ ξ0 ∈ R^m with ξ_{0,j} ≥ 0, ∀j, such that Q y0 + ξ0 = 0;
or,
(B) for all y ∈ ∆2 and all ξ ∈ R^m with ξ_j ≥ 0, ∀j ∈ {1, . . . , m}, Q y + ξ ≠ 0.
Consider first case (A). We show that (i) in the Lemma holds. Under case (A), for every x ∈ ∆1 , xT (Q y0 + ξ0 ) = 0,
i.e.,
xT Q y0 = −xT ξ0 ≤ 0

where we have used the fact that entries in x and ξ0 are nonnegative. The above inequality shows that (i) in the
Lemma statement holds.

Consider now case (B). We show that (ii) in the Lemma holds. This follows in three steps.

Step 1: From the matrix Q, define the convex hull of the columns of [Q  I_m], denoted by C. Specifically, let C be
defined as

C = { x ∈ R^m | x = ∑_{k=1}^n q_k α_k + ∑_{j=1}^m e^1_j β_j, for some α_k, β_j ≥ 0 s.t. ∑_{k=1}^n α_k + ∑_{j=1}^m β_j = 1 }

where q_k are the columns of Q, or

C = { x ∈ R^m | x = Q α + β, for some α ∈ R^n, β ∈ R^m s.t. α_k, β_j ≥ 0, 1_n^T α + 1_m^T β = 1 }

Step 2:
CLAIM: If case (B) holds, then the vector 0 ∈ R^m does not belong to the convex hull C, i.e., 0 ∉ C.
Let us prove this claim by contradiction. Assume that 0 ∈ C, while under (B),

Q y + ξ ≠ 0,  ∀y ∈ ∆2, ∀ξ ∈ R^m, ξ ≥ 0   (2.14)

By the definition of C above, since 0 ∈ C we could find some convex combination α, β such that

0 = Q α + β,  for some α = [α_1, . . . , α_n]^T ∈ R^n, β = [β_1, . . . , β_m]^T ∈ R^m, s.t. α_k ≥ 0, β_j ≥ 0, 1_n^T α + 1_m^T β = 1

Note that 1_n^T α = ∑_{k=1}^n α_k ≠ 0, since otherwise all α_k would be zero, which from Q α + β = 0 would mean all β_j are zero also,
impossible in a convex combination. Then dividing both sides in the above by 1_n^T α yields

Q ᾱ + β̄ = 0

where

ᾱ = (1/(1_n^T α)) α ∈ ∆2,   β̄ = (1/(1_n^T α)) β ∈ R^m,  β̄ ≥ 0
This contradicts (2.14) for y = ᾱ, ξ = β̄; hence our assumption that 0 ∈ C is false and the CLAIM is proved.

Step 3:
Using the CLAIM in Step 2, since 0 ∉ C and C is convex, we can use the Separating Hyperplane theorem.
A version of this states that two nonempty convex subsets of R^m can be properly separated by a hyperplane if and
only if their relative interiors are disjoint.
Thus, since 0 ∉ C, by the Separating Hyperplane theorem there exists a hyperplane H passing through 0 ∈ R^m,

H = {x ∈ Rm | γ T x = 0}

such that C is in one of the two half-spaces defined by H,

C ⊂ H+ = {x ∈ Rm | γ T x > 0} or C ⊂ H− = {x ∈ Rm | γ T x < 0}

where γ ∈ Rm denotes the normal to the hyperplane H. Suppose that C ⊂ H+ . Then it follows that for every η ∈ C we
have
γT η > 0 (2.15)

Any such η ∈ C can be written as

η = Q y + ξ,  for some y ∈ R^n, ξ ∈ R^m s.t. y_k ≥ 0, ξ_j ≥ 0, 1_n^T y + 1_m^T ξ = 1

Thus from this and (2.15) it follows that for every y_k ≥ 0, ξ_j ≥ 0 such that 1_n^T y + 1_m^T ξ = ∑_{k=1}^n y_k + ∑_{j=1}^m ξ_j = 1,

γ T (Q y + ξ ) > 0

In particular this holds for those convex combinations with all ξ j = 0, i.e., for all y such that 1Tn y = 1, i.e.,

γ T Q y > 0, ∀y ∈ ∆2 (2.16)

Note that this is part of (ii) with x0 taken as x0 = γ. The only missing part is that we need x0 ∈ ∆1. Now, for
those convex combinations with ξ_j = 1 and the rest 0, i.e., for ξ = e^1_j and y = 0, we get

γ^T e^1_j = γ_j > 0

Thus γ_j > 0, ∀j. If ∑_{j=1}^m γ_j = 1, this γ (the hyperplane normal) indeed provides the desired x0 for (ii). If not, we can
obtain the desired x0 by rescaling γ by 1_m^T γ. Indeed, from (2.16) we obtain

γ̄^T Q y > 0,  ∀y ∈ ∆2

where γ̄ = (1/(1_m^T γ)) γ ∈ ∆1 since ∑_{j=1}^m γ̄_j = 1. This shows (ii) holds with x0 = γ̄ and concludes the proof under case (B). □
In Proposition 2.3 we related pure security strategies to a pure saddle-point solution. Based on Theorem 2.10,
J̄_U = J̄_L, and we can obtain a similar result in the case of mixed strategies (the proof is similar and is omitted).

Corollary 2.11. Consider a two-player zero-sum matrix (2PZSM) game with m × n cost matrix A = [a jk ]. Then
(i) the game has a saddle-point in mixed strategies.
(ii) a pair of mixed strategies (x∗ , y∗ ) is a saddle-point if and only if x∗ is a security strategy for player 1 and y∗ is a
security strategy for player 2.
(iii) the saddle-point value J̄∗ is uniquely given as J̄∗ = J̄_U = J̄_L.
(iv) in case of multiple saddle points, the mixed saddle-point strategies have the ordered interchangeable property.

Remark 2.12. The same result as in Theorem 2.10 holds in a more general setting: let X and Y be convex, closed,
bounded subsets of Euclidean spaces, and consider a two-player zero-sum game with kernel J : X × Y → R continuous,
convex in x for each y and concave in y for each x. Then

min_{x∈X} max_{y∈Y} J(x, y) = max_{y∈Y} min_{x∈X} J(x, y)

This can be proved based on Kakutani's fixed point theorem, [48]. In the linear case above, X = ∆1, Y = ∆2, the
kernel x^T A y is linear in each variable separately, hence trivially convex in x and concave in y. The proof of Theorem
2.10 uses geometric arguments instead.

2.4 Computation of Mixed-Strategy Equilibria

The Minimax theorem guarantees that every two-player zero-sum matrix (2PZSM) game has optimal mixed strategies,
but its proof is an existence proof and unfortunately does not tell us how to compute them. In the simplest

case, when a saddle point exists, the corresponding pure strategy pair (j∗, k∗) is a Nash equilibrium, which is a
special case of the mixed strategies x∗ = e^1_{j∗}, y∗ = e^2_{k∗}. We give below some methods that can be used to solve the
easiest games.

2.4.1 Nash equilibrium & Dominated Strategies

Consider a two-player zero-sum matrix (2PZSM) game with m × n cost matrix A. We say that row j dominates
row r if

a_{jk} ≤ a_{rk},  ∀k

and

a_{jk} < a_{rk},  for at least one k

Then a strategy j dominates another strategy r if choosing the first (dominating) strategy is at least as good as choosing the
second (dominated) one, and in some cases strictly better. Similarly, since player 2 is the maximizer, column k dominates column c if

a_{jk} ≥ a_{jc},  ∀j

and

a_{jk} > a_{jc},  for at least one j

Dominated strategies can be eliminated since they will never be chosen. The following result states this.

Proposition 2.13. In a matrix game A assume that rows j1 , . . . , jl are dominated. Then player 1 has an optimal
strategy such that x j1 = · · · = x jl = 0. Moreover any optimal strategy for the game obtained after removing the
dominated strategies will be optimal for the original game.

A similar result holds for columns, hence we can work with smaller dimension matrices which can simplify the
process of finding an equilibrium.

Example 2.14. Consider the game with the matrix


 
A = [ 2  1  4 ; 0  2  1 ; 1  5  3 ; 4  3  2 ]

Note that the second row dominates the fourth row, hence player 1 will never use his 4th strategy. Discarding it we are
left with the reduced matrix

[ 2  1  4 ; 0  2  1 ; 1  5  3 ]
and we note that here column 3 dominates column 1, hence player 2 will never use his first strategy, and we can
delete column 1. In

[ 1  4 ; 2  1 ; 5  3 ]

row 3 is dominated by row 2, hence we can remove it and we are left with
A′ = [ 1  4 ; 2  1 ]

so a 2 × 2 matrix game.
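Iterated elimination of dominated rows and columns is mechanical and can be scripted. A minimal sketch in Python/NumPy, under the cost conventions of this chapter (player 1 minimizes over rows, player 2 maximizes over columns); note that the elimination order may differ from the one traced above, but for this example the same reduced game results:

    import numpy as np

    def eliminate_dominated(A):
        # Row j dominates row r if A[j] <= A[r] entrywise with at least one
        # strict inequality (player 1 minimizes); column k dominates column c
        # if A[:, k] >= A[:, c] entrywise with at least one strict inequality
        # (player 2 maximizes). Dominated rows/columns are deleted repeatedly.
        A = np.asarray(A, dtype=float)
        changed = True
        while changed:
            changed = False
            for r in range(A.shape[0]):
                if any(np.all(A[j] <= A[r]) and np.any(A[j] < A[r])
                       for j in range(A.shape[0]) if j != r):
                    A = np.delete(A, r, axis=0)
                    changed = True
                    break
            if changed:
                continue
            for c in range(A.shape[1]):
                if any(np.all(A[:, k] >= A[:, c]) and np.any(A[:, k] > A[:, c])
                       for k in range(A.shape[1]) if k != c):
                    A = np.delete(A, c, axis=1)
                    changed = True
                    break
        return A

    A = [[2, 1, 4], [0, 2, 1], [1, 5, 3], [4, 3, 2]]
    print(eliminate_dominated(A))  # [[1. 4.] [2. 1.]], the 2 x 2 game of Example 2.14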

Now if the 2 × 2 game has a saddle point, we are done. If, however, there is no such saddle point, we have to work
a bit more for the mixed-strategy equilibrium. An analytical method, applicable to zero-sum games as well, will be
discussed in the next chapter for more general bimatrix 2 × 2 games. Next we describe a graphical solution based
on minimax security strategies.

2.4.2 Graphical Solution

This method can be relatively easily applied to any 2 × n game, hence when one of the players has only 2 strategies
so that A has 2 rows and n columns.
Since every mixed strategy is a convex combination of pure strategies, a reasonable choice for the minimax strategy
of player 1 (a conservative strategy to limit his losses) is to choose x so as to secure (minimize) his losses against
every possible pure choice of player 2. When player 2 uses the k-th pure strategy u2 = e2k , while player 1 uses mixed
strategy x, the expected cost is by (2.12),

J (x, e2k ) = xT A e2k = (e2k )T AT x = ( AT )k x = ( AT x )k , k = 1, . . . , n (2.17)

where (·)_k denotes the k-th row of (·); hence (A^T)_k denotes the k-th row of A^T, i.e., the k-th column of A transposed.
Hence the minimax strategy for player 1 is to minimize his maximum loss or V1 (x) := maxk∈M2 J (x, e2k ), i.e.,

x∗ = arg min_{x∈∆1} V1(x) = arg min_{x∈∆1} max_{k∈M2} J(x, e^2_k)   (2.18)

leading to his cost being


J̄_U = min_{x∈∆1} V1(x) = min_{x∈∆1} max_{k∈M2} J(x, e^2_k) = min_{x∈∆1} max_{k∈M2} (A^T)_k x   (2.19)

Player 1 has to minimize


V1 (x) = max( AT )k x = max(a1k x1 + a2k x2 )
k∈M2 k

Then using x = [x1 , x2 ]T , with x2 = 1 − x1 , and denoting Rk (x1 ) = (a1k − a2k ) x1 + a2k , leads to minimization over
x1 ∈ [0, 1],
V1(x_1) = max_k R_k(x_1),  k = 1, . . . , n

Thus V1 (x1 ) is the maximum of n linear functions Rk (x1 ) in the single variable x1 . These are called the best-
response functions and can be plotted on the same graph. Then the maximum of these linear functions Rk (x1 ) can
be minimized by graphical methods leading to the mixed security strategy x∗ for P1.

Example 2.15. Consider

A = [ 2  3  1  5 ; 4  1  5  0 ]
In the figure below we show these functions, which pass through the points (0, a_{2k}) and (1, a_{1k}):

R1 = −2 x_1 + 4
R2 = 2 x_1 + 1
R3 = −4 x_1 + 5
R4 = 5 x_1

[Figure: the four lines R_k(x_1) plotted over x_1 ∈ [0, 1], their upper envelope V1(x_1) shown as a thick line, with the lowest point of the envelope marked at x_1∗.]

Figure 2.1: Graphical method for finding mixed security strategy for P1

The thick red line represents V1(x_1); notice that it is piecewise linear (continuous, but not differentiable at the kinks).
The lowest point on this line gives the coordinate x_1∗, hence x_2∗ = 1 − x_1∗, and the value of the game is V1(x_1∗).
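The envelope minimization can also be done numerically on a grid; a minimal sketch for Example 2.15 (for this A our direct computation gives x_1∗ = 4/7 and value 20/7 ≈ 2.857, which the grid search approximates):

    import numpy as np

    A = np.array([[2, 3, 1, 5],
                  [4, 1, 5, 0]])       # Example 2.15

    x1 = np.linspace(0.0, 1.0, 10001)  # grid over player 1's first probability
    # R_k(x1) = (a_1k - a_2k) x1 + a_2k, one line per column k of A
    R = np.outer(A[0] - A[1], x1) + A[1][:, None]
    V1 = R.max(axis=0)                 # upper envelope of the n lines
    i = int(V1.argmin())               # lowest point of the envelope
    print(x1[i], V1[i])                # approx. 0.5714 and 2.857 (x1* = 4/7, value 20/7)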

This method can be used to compute solutions for zero-sum matrix games where A is 2 × 2, and it extends to 2 × n matrices.
We still need to compute the mixed security strategy of player 2, P2. A conservative maximin strategy for player 2 is to
secure his gain, i.e., to choose y so as to maximize his payoff against every possible pure choice of player 1. When player
1 uses the j-th pure strategy u1 = e^1_j, while player 2 uses mixed strategy y,

J (e1j , y) = (e1j )T A y = ( A ) j y = ( A y ) j , j = 1, . . . , m (2.20)

where (·)_j denotes the j-th row of (·); hence (A)_j denotes the j-th row of A. Then the maximin strategy for player 2
is to maximize V2(y) := min_{j∈M1} J(e^1_j, y), i.e.,

y∗ = arg max_{y∈∆2} V2(y) = arg max_{y∈∆2} min_{j∈M1} J(e^1_j, y)   (2.21)

This is the maximin strategy for player 2, giving him the payoff

J̄_L = max_{y∈∆2} V2(y) = max_{y∈∆2} min_{j∈M1} J(e^1_j, y) = max_{y∈∆2} min_{j∈M1} (A)_j y   (2.22)

When n = 2, one immediately obtains y∗ as before, hence (x∗, y∗). In the case n = 3, pure strategies that give a worse
outcome (dominated strategies) are never used and can be eliminated, so a smaller 2 × 2 game can then be considered. A similar
extension to 2 × n games is possible (see [18]).

2.4.3 Linear Programming Formulation

An alternative to the graphical method is to convert the matrix game into a linear programming (LP) problem for
which efficient algorithms are available (simplex being the most famous). This relationship between a two-player
zero-sum matrix game and an LP problem is described below.
We assume that the A matrix has all entries positive, hence a jk > 0, ∀ j, k. Then the average (expected) value of the
game is given by
J = min_{x∈∆1} max_{y∈∆2} x^T A y = max_{y∈∆2} min_{x∈∆1} x^T A y > 0

We start with the left-hand side. For a given x ∈ ∆1, x^T A y is maximized over ∆2, leading to a y
that depends on the given x. Thus we denote V1(x) := max_{y∈∆2} x^T A y > 0, which satisfies

V1(x) ≥ x^T A y,  ∀y ∈ ∆2   (2.23)

Thus we can write J = min_{x∈∆1} max_{y∈∆2} x^T A y = min_{x∈∆1} V1(x), where V1(x) > 0. Since ∆2 is the unit
simplex in R^n, it suffices to write (2.23) for its vertices y = e^2_k, k ∈ M2 = {1, . . . , n}, i.e., V1(x) ≥ x^T A e^2_k, ∀k ∈ M2;
in equivalent form this leads to the vector inequality

1_n V1(x) ≥ [ (e^2_1)^T ; . . . ; (e^2_n)^T ] A^T x = A^T x

where 1_n is the all-ones vector, 1_n = [1, . . . , 1]^T ∈ R^n. Using the scaling x̃ = x/V1(x), or

x = x̃ V1(x)

the foregoing can be written as

1_n ≥ A^T x̃
Thus the minimization problem that player 1 has to solve is

min V1(x)

subject to

A^T x̃ ≤ 1_n,  x̃^T 1_m = 1/V1(x),  x̃ ≥ 0

Since minimizing V1(x) is the same as maximizing 1/V1(x) = x̃^T 1_m, this is equivalent to the maximization problem

max x̃^T 1_m

subject to

A^T x̃ ≤ 1_n,  x̃ ≥ 0
which is a standard LP problem. Solving this LP problem gives the mixed security strategy of player 1, normalized
by the average value of the game J. Similarly, starting with the player 2 case and introducing V2(y) :=
min_{x∈∆1} x^T A y ≤ x^T A y, ∀x ∈ ∆1, and the scaling ỹ = y/V2(y), we obtain that his equivalent minimization problem is

min ỹ^T 1_n

subject to

A ỹ ≥ 1_m,  ỹ ≥ 0

which is the dual of the maximization problem above. Thus we have shown that, given a matrix game with all entries
positive, there exist two LP problems (each the dual of the other) whose solutions yield the saddle-point solution of the
matrix game. In fact the positivity of A is only a convention; it can be removed by a translation transformation, i.e.,
by adding the same constant to all entries (see [18]).
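As a sketch of this LP route, assuming SciPy's linprog is available (the positivity of A is enforced by the translation just mentioned, which shifts the value by the same constant):

    import numpy as np
    from scipy.optimize import linprog

    A = np.array([[-1, 1],
                  [1, -1]])               # Matching Pennies; value 0 is expected

    shift = 1 - A.min()                   # make all entries positive
    Ap = A + shift
    m, n = Ap.shape

    # Player 1's LP: max 1^T xt s.t. Ap^T xt <= 1_n, xt >= 0 (linprog minimizes)
    res = linprog(c=-np.ones(m), A_ub=Ap.T, b_ub=np.ones(n), bounds=[(0, None)] * m)
    value = 1.0 / res.x.sum() - shift     # value of the original game
    x_opt = res.x / res.x.sum()           # mixed security strategy of player 1
    print(value, x_opt)                   # 0.0 [0.5 0.5]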
We mention here an online computation method called fictitious play (FP), which can be used in a repeated game:
players improve their payoffs (lower their losses) by keeping track of previous plays and making a decision about
the next move based on these previous plays. We shall discuss it in Chapter 8.

2.5 Notes

One feature of a mixed strategy equilibrium is that given the strategies chosen by the other players, each player is
indifferent among all the actions that he/she selects with positive probability. Hence, in the Matching Pennies game,
given that player 2 chooses each action with probability 1/2, player 1 is indifferent among choosing H, choosing T,
and randomizing in any way between the two. Because randomization is more complex and cognitively demanding
than is the deterministic selection of a single action, this raises the question of how mixed strategy equilibria can be
sustained and how mixed strategies should be interpreted.
A formal interpretation is given in [39] by John Harsanyi (1973), who showed that a mixed strategy equilibrium of
a game with complete information can be viewed as the limit point of a sequence of pure strategy equilibria of games
with incomplete information. Specifically, starting from a game with complete information, one can obtain a family of
games with incomplete information by allowing for the possibility that there are small random variations in payoffs
and that each player is not fully informed of the payoff functions of the other players. Harsanyi showed that the
frequency with which the various pure strategies are chosen in these perturbed games approaches the frequency with
which they are chosen in the mixed strategy equilibrium of the original game as the magnitude of the perturbation
becomes vanishingly small.
Another interpretation comes from the field of evolutionary biology. Consider a large population in which each
individual is programmed to play a particular pure strategy. Individuals are drawn at random from that population
and are matched in pairs to play a game. The cost that results from the adoption of any specific pure strategy will
depend on the frequencies with which the various strategies are represented in the population. Suppose that those
frequencies change over time in response to cost differentials. For specific classes of games any trajectory that
begins at an interior state in which all strategies are present converges to the unique mixed strategy equilibrium of
the game. Under this interpretation of a mixed strategy, the population frequency of each strategy corresponds to
the probability with which it is played in the mixed strategy equilibrium. We shall use this interpretation in the EGT
context in later chapters.
Bibliography

[1] D. Acemoglu and A. Ozaglar. Costs of competition in general networks. In Proceedings of the 44th IEEE
Conference on Decision and Control, pages 5324–5329, December 2005.

[2] G. P. Agrawal. Fiber-optic Communication Systems. John Wiley, 3rd edition, 2002.

[3] T. Alpcan. Noncooperative games for control of networked systems. Ph.D. Thesis, University of Illinois at
Urbana-Champaign, Illinois, USA, 2006.

[4] T. Alpcan and T. Basar. A game-theoretic framework for congestion control in general topology networks. In
Proceedings of the 41st IEEE Conference on Decision and Control, pages 1218–1224, December 2002.

[5] T. Alpcan and T. Basar. A hybrid systems model for power control in multicell wireless data networks.
Performance Evaluation, 57(4):477–495, August 2004.

[6] T. Alpcan and T. Basar. Distributed algorithms for Nash equilibria of flow control games. Advances in Dy-
namic Games: Applications to Economics, Finance, Optimization, and Stochastic Control, Annals of Dynamic
Games, 7:473–498, 2005.

[7] T. Alpcan, T. Basar, R. Srikant, and E. Altman. CDMA uplink power control as a noncooperative game. In
Proceedings of the 40th IEEE Conference on Decision and Control, pages 197–202, December 2001.

[8] T. Alpcan, X. Fan, T. Basar, M. Arcak, and J. T. Wen. Power control for multicell CDMA wireless networks:
A team optimization approach. Wireless Networks, 14(5):647–657, 2008.

[9] E. Altman and Z. Altman. S-modular games and power control in wireless networks. IEEE Transactions on
Automatic Control, 48(5):839–842, 2003.

[10] E. Altman and T. Basar. Multiuser rate-based flow control. IEEE Transactions on Communications,
46(7):940–949, 1998.

[11] E. Altman, T. Basar, T. Jimenez, and N. Shimkin. Routing into two parallel links: game-theoretic distributed
algorithms. Journal of Parallel and Distributed Computing, 61(9):1367–1381, 2001.

[12] E. Altman, T. Basar, and R. Srikant. Nash equilibria for combined flow control and routing in networks:
asymptotic behavior for a large number of users. IEEE Transactions on Automatic Control, 47(6):917–930,
2002.


[13] E. Altman, T. Boulogne, R. El-Azouzi, T. Jimenez, and L. Wynter. A survey on networking games in
telecommunications. Computers and Operations Research, 33(2):286–311, 2006.

[14] E. Altman and E. Solan. Constrained games: The impact of the attitude to adversary’s constraints. IEEE
Transactions on Automatic Control, 54(10):2435 – 2440, 2009.

[15] K. J. Arrow and G. Debreu. Existence of an equilibrium for a competitive economy. Econometrica, 22(3):265–
290, 1954.

[16] G. Arslan, J.R. Marden, and J.S. Shamma. Autonomous vehicle target assignment: A game-theoretical for-
mulation. ASME Journal of Dynamic Systems, Measurement and Control, (129):584–596, 2007.

[17] R. EI Azouzi and E. Altman. Constrained traffic equilibrium in routing. IEEE Transactions on Automatic
Control, 48(9):1656–1660, 2003.

[18] T. Basar and G. J. Olsder. Dynamic noncooperative game theory. SIAM Series Classics in Applied Mathe-
matics, 2nd edition, 1999.

[19] T. Basar and R. Srikant. Revenue-maximizing pricing and capacity expansion in a many-users regime. In
Proceedings of the 21st IEEE Conference on Computer Communications (INFOCOM), pages 294–301, June
2002.

[20] M. Benaïm and J. W. Weibull. Deterministic approximation of stochastic evolution in games. Econometrica,
(71):873–903, 2003.

[21] D. P. Bertsekas. Nonlinear programming. Athena Scientific, 2nd edition, 1999.

[22] D. P. Bertsekas and J. N. Tsitsiklis. Parallel and distributed computation: numerical methods. Prentice-Hall,
1989.

[23] G. W. Brown. Iterative solutions of games by fictitious play. In T. C. Koopmans et al., editors, Activity
Analysis of Production and Allocation, pages 374–376. Wiley, New York, 1951.

[24] G. C. Chasparis, J. S. Shamma, and A. Rantzer. Perturbed learning automata in potential games. In Proceed-
ings of the Conference on Decision and Control, pages 2453–2458, December 2011.

[25] R. Cressman and J. Hofbauer. Measure dynamics on a one-dimensional continuous trait space: Theoretical
foundations for adaptive dynamics. Theoretical Population Biology, 67:47–59, 2005.

[26] B. de Meyer. Repeated games, duality and the central limit theorem. Mathematics of Operations Research,
21(1):237–251, 1996.

[27] B. de Meyer and A. Marino. Duality and optimal strategies in the finitely repeated zero-sum games with
incomplete information on both sides. Cahiers de la Maison des Sciences Economiques, pages 1–7, 2005.

[28] G. Debreu. A social equilibrium existence theorem. Proceedings of National Academy Sciences of the United
States of America, 38(10):886–893, October 1952.

[29] P. Dubey. Inefficiency of Nash equilibria. Mathematics of Operations Research, 11(1):1–8, February 1986.

[30] F. Facchinei, A. Fischer, and V. Piccialli. On generalized Nash games and variational inequalities. Operations
Research Letters, 35(2):159–164, 2007.

[31] D. Falomari, N. Mandayam, and D. Goodman. A new framework for power control in wireless data net-
works: games utility and pricing. In Proceedings of the Allerton Conference on Communication, Control, and
Computing, pages 546–555, September 1998.

[32] R. A. Fisher. The Genetical Theory of Natural Selection. Oxford, Clarendon, 2nd (1958) edition, 1930.

[33] D. Friedman. Evolutionary games in economics. Econometrica, 59:637 – 666, 1991.

[34] D. Fudenberg and D.M. Kreps. Learning mixed equilibria. Games and Economic Behaviour, 5:320–367,
1993.

[35] D. Fudenberg and D. K. Levine. Theory of Learning in Games. The MIT Press, Cambridge, 1998.

[36] I. Gilboa and A. Matsui. Social stability and equilibrium. Econometrica, 59:859–867, 1991.

[37] A. Greenwald, E.J. Friedman, and S. Shenker. Learning in network contexts: experimental results from
simulations. Games and Economic Behaviour, 35:80–123, 2001.

[38] P. T. Harker. Generalized Nash games and quasi-variational inequalities. European Journal of Operational
Research, 54(1):81–94, 1991.

[39] J. C. Harsanyi. Games with randomly disturbed payoffs: A new rationale for mixed-strategy equilibrium
points. International Journal of Game Theory, 2:1–23, 1973.

[40] J. Hofbauer and W. H. Sandholm. Stable games and their dynamics. Journal of Economic Theory, 144:1665–
1693, 2009.

[41] J. Hofbauer and K. Sigmund. Evolutionary games and population dynamics. Cambridge University Press,
1998.

[42] J. Hofbauer and K. Sigmund. Evolutionary game dynamics. Bulletin of the American Mathematical Society
(New Series), 40:479–519, 2003.

[43] R. A. Horn and C. R. Johnson. Matrix analysis. Cambridge University Press, 1999.

[44] M. Huang, R. P. Malhame, and P. E. Caines. Nash certainty equivalence in large population stochastic dy-
namic games: connections with the physics of interacting particle systems. In Proceedings of the 45th IEEE
Conference on Decision and Control, pages 4921–4926, December 2006.

[45] H. Ji and C. Huang. Non-cooperative uplink power control in cellular radio systems. Wireless Networks,
4(3):233–240, 1998.

[46] R. Johari, S. Mannor, and J. N. Tsitsiklis. Efficiency loss in a resource allocation game: A single link in elastic
supply. In Proceedings of the 43rd IEEE Conference on Decision and Control, pages 4679–4683, December
2004.

[47] R. Johari and J. Tsitsiklis. Efficiency loss in a network resource allocation game. Mathematics of Operations
Research, 29(3):407–435, August 2004.

[48] S. Karlin. The Theory of Infinite Games. Addison-Wesley, 1959.



[49] F. Kelly. Charging and rate control for elastic traffic. European Transactions on Telecommunications, 8:33–
37, 1997.

[50] F. P. Kelly, A. K. Maulloo, and D. Tan. Rate control for communication networks: Shadow prices, proportional
fairness and stability. Journal of the Operational Research Society, 49(3):237–252, 1998.

[51] H. K. Khalil. Nonlinear systems. Prentice Hall, 3rd edition, 2002.

[52] S. Koskie and Z. Gajic. A Nash game algorithm for SIR-based power control in 3G wireless CDMA networks.
IEEE/ACM Transactions on Networking, 13(5):1017–1026, 2005.

[53] E. Koutsoupias and C. H. Papadimitriou. Worst-case equilibria. In Proceedings of the 16th Annual Symposium
STACS, pages 404–413, 1999.

[54] N.N. Krasovskii and A.I. Subbotin. Game-theoretical control problems. Springer-Verlag, 1988.

[55] S. Kunniyur and R. Srikant. End-to-end congestion control schemes: utility functions, random losses and
ECN marks. IEEE/ACM Transactions on Networking, 11(5):689–702, 2003.

[56] L. Panait and K. Tuyls. Theoretical advantages of lenient Q-learners: An evolutionary game theoretic perspec-
tive. In Proceedings of the Sixth Intl. Joint Conf. on Autonomous Agents and Multi-Agent Systems (IFAAMAS),
pages 180–187, 2007.

[57] L. Libman and A. Orda. The designer’s perspective to atomic noncooperative networks. IEEE/ACM Trans-
actions on Networking, 7(6):875–884, 1999.

[58] A. Loja et al. Inter-domain routing in multiprovider optical networks: game theory and simulations. In
Proceedings of Next Gen. Internet Networks, pages 157–164, May 2005.

[59] S. Low, F. Paganini, and J. Doyle. Internet congestion control. IEEE Control Systems Magazine, 22(1):28–43,
2002.

[60] S. H. Low and D. E. Lapsley. Optimization flow control-I: basic algorithm and convergence. IEEE/ACM
Transactions on Networking, 7(6):861–874, 1999.

[61] Z. Luo and J. Pang. Analysis of iterative waterfiling algorithm for multiuser power control in digital subscriber
lines. EURASIP Journal on Applied Signal Processing, pages 1–10, 2006.

[62] M. Maggiore. Introduction to nonlinear control systems, course notes for ECE1647, 2009. Available in the
handouts section on the course website.

[63] P. Marbach and R. Berry. Downlink resource allocation and pricing for wireless networks. In Proceedings of
the 21st IEEE Conference on Computer Communications (INFOCOM), pages 1470–1479, June 2002.

[64] J. R. Marden, G. Arslan, and J. S. Shamma. Joint strategy fictitious play with inertia for potential games. In
Proceedings of the 44th IEEE Conference on Decision and Control, pages 6692–6697, December 2005.

[65] A. Mecozzi. On the optimization of the gain distribution of transmission lines with unequal amplifier spacing.
IEEE Photonics Technology Letters, 10(7):1033–1035, 1998.

[66] D. Mitra. An asynchronous distributed algorithm for power control in cellular radio systems. In Proceedings
of the 4th WINLAB Workshop on Third Generation Wireless Information Networks, pages 249–257, November
1993.

[67] D. Monderer and L. S. Shapley. Potential games. Games and Economic Behavior, 14(1):124–143, 1996.

[68] D. Monderer and L. S. Shapley. Potential games. Games and Economic Behavior, 14:124–143, 1996.

[69] J. Morgan and M. Romaniello. Generalized quasi-variational inequalities and duality. Journal of Inequalities
in Pure and Applied Mathematics, 4(2):Article 28, 1–7, 2003.

[70] J. Morgan and M. Romaniello. Generalized quasi-variational inequalities: Duality under perturbations. Jour-
nal of Mathematical Analysis and Applications, 324(2):773–784, 2006.

[71] R. Myerson. Game Theory: Analysis of Conflict. Harvard University Press, 1991.

[72] H. Minc. Nonnegative Matrices. Wiley, 1988.

[73] J. Nash. Equilibrium points in n-person games. Proceedings of National Academy Sciences. USA, 36(1):48–
49, 1950.

[74] J. Nash. Non-cooperative games. The Annals of Mathematics, 54(2):286–295, September 1951.

[75] N. Nisan, T. Roughgarden, E. Tardos, and V. Vazirani (eds.). Algorithmic Game Theory. Cambridge Univer-
sity Press, 2007.

[76] M. A. Nowak. Evolutionary Dynamics: Exploring the Equations of Life. Belknap/Harvard, Cambridge, 2006.

[77] M. A. Nowak and K. Sigmund. Evolutionary dynamics of biological games. Science, 303:793–799, 2004.

[78] M. Osborne. An Introduction to Game Theory. Oxford University Press, 2004.

[79] G. Owen. Game Theory. Academic Press, 3rd edition, 1995.

[80] A. Ozdaglar and D. Bertsekas. Routing and wavelength assignment in optical networks. IEEE/ACM Trans-
actions on Networking, 11(2):259–272, 2003.

[81] Y. Pan and L. Pavel. OSNR optimization in optical networks: extension for capacity constraints. In Proceed-
ings of the 2005 American Control Conference, pages 2379–2385, June 2005.

[82] Y. Pan and L. Pavel. Global convergence of an iterative gradient algorithm for the Nash equilibrium in an
extended OSNR game. In Proceedings of the 26th IEEE Conference on Computer Communications (INFO-
COM), pages 206–212, May 2007.

[83] Y. Pan and L. Pavel. Iterative algorithms for Nash equilibrium of an extended OSNR optimization game. In
Proceedings of the 6th International Conference on Networking (ICN), April 2007.

[84] Y. Pan and L. Pavel. OSNR optimization with link capacity constraints in WDM networks: a cross layer
game approach. In Proceedings of the 4th IEEE Conference on Broadband Communications, Networks and
Systems (BroadNets), September 2007.

[85] Y. Pan and L. Pavel. A Nash game approach for OSNR optimization with capacity constraint in optical links.
IEEE Transactions on Communications, 56(11):1919–1928, November 2008.

[86] Y. Pan and L. Pavel. Games with coupled propagated constraints in optical networks with multi-link topolo-
gies. Automatica, 45(4):871–880, 2009.

[87] P. A. Parrilo. Polynomial games and sum of squares optimization. In Proceedings of the 45th IEEE Conference
on Decision and Control, pages 2855–2860, December 2006.

[88] L. Pavel. Power control for OSNR optimization in optical networks: a noncooperative game approach. In
Proceedings of the 43rd IEEE Conference on Decision and Control, pages 3033–3038, December 2004.

[89] L. Pavel. An extension of duality and hierarchical decomposition to a game-theoretic framework. In Proceed-
ings of the 44th IEEE Conference on Decision and Control, pages 5317–5323, December 2005.

[90] L. Pavel. A noncooperative game approach to OSNR optimization in optical networks. IEEE Transactions
on Automatic Control, 51(5):848–852, 2006.

[91] L. Pavel. An extension of duality to a game-theoretic framework. Automatica, 43(2):226–237, 2007.

[92] L. Pavel. Game Theory for Control of Optical Networks. Birkhäuser-Springer Science, 2012.

[93] G. Perakis. The price of anarchy when costs are non-separable and asymmetric. In Proceedings of the 10th
International Integer Programming and Combinatorial Optimization (IPCO) Conference, pages 46–58, June
2004.

[94] G. Perakis. The “price of anarchy” under nonlinear and asymmetric costs. Mathematics of Operations Re-
search, 32(3):614–628, August 2007.

[95] J. Robinson. An iterative method of solving a game. Annals of Mathematics, 54:296–301, 1951.

[96] J. B. Rosen. Existence and uniqueness of equilibrium points for concave n-person games. Econometrica,
33(3):520–534, 1965.

[97] T. Roughgarden. Selfish Routing and the Price of Anarchy. MIT Press, 2005.

[98] W. H. Sandholm. Population Games and Evolutionary Dynamics. Cambridge University Press, 2009.

[99] C. Saraydar, N. B. Mandayam, and D. J. Goodman. Pricing and power control in a multicell wireless data network. IEEE Journal on Selected Areas in Communications, 19(10):1883–1892, October 2001.

[100] C. Saraydar, N. B. Mandayam, and D. J. Goodman. Efficient power control via pricing in wireless data networks. IEEE Transactions on Communications, 50(2):291–303, 2002.

[101] G. Scutari, S. Barbarossa, and D. P. Palomar. Potential games: a framework for vector power control problems with coupled constraints. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2006.

[102] R. Selten. Re-examination of the perfectness concept for equilibrium points in extensive games. International
Journal of Game Theory, 4:25–55, 1975.

[103] J. S. Shamma and G. Arslan. Dynamic fictitious play, dynamic gradient play and distributed convergence to Nash equilibria. IEEE Transactions on Automatic Control, 50(1):312–326, 2005.

[104] U. V. Shanbhag et al. Nash equilibrium problems with congestion costs and shared constraints. In Proceedings of the 48th IEEE Conference on Decision and Control, December 2009.

[105] H. Shen and T. Basar. Differentiated Internet pricing using a hierarchical network game model. In Proceedings
of the 2004 American Control Conference, pages 2322–2327, June 2004.

[106] J. Maynard Smith. The theory of games and the evolution of animal conflicts. Journal of Theoretical Biology, 47:209–221, 1974.

[107] J. Maynard Smith. Evolution and the Theory of Games. Cambridge University Press, 1982.

[108] J. Maynard Smith and G. R. Price. The logic of animal conflict. Nature, 246:15–18, 1973.

[109] N. D. Stein, A. Ozdaglar, and P. A. Parrilo. Separable and low-rank continuous games. In Proceedings of the
45th IEEE Conference on Decision and Control, pages 2849–2854, December 2006.

[110] R. K. Sundaram. A First Course in Optimization Theory. Cambridge University Press, 1996.

[111] C. Tan, D. Palomar, and M. Chiang. Solving nonconvex power control problems in wireless networks: low
SIR regime and distributed algorithms. In Proceedings of the 48th IEEE Global Telecommunications Confer-
ence (GLOBECOM), November 2005.

[112] P. D. Taylor and L. Jonker. Evolutionarily stable strategies and game dynamics. Mathematical Biosciences,
40:145–156, 1978.

[113] E. van Damme. Stability and Perfection of Nash Equilibria. Springer-Verlag, Berlin, 2nd edition, 1987.

[114] T. L. Vincent and J. S. Brown. Evolutionary Game Theory, Natural Selection, and Darwinian Dynamics. Cambridge University Press, 2005.

[115] J. von Neumann and O. Morgenstern. Theory of Games and Economic Behavior. Princeton University Press,
1944.

[116] J. W. Weibull. Evolutionary Game Theory. MIT Press, 1995.

[117] H. Yaiche, R. Mazumdar, and C. Rosenberg. A game theoretic framework for bandwidth allocation and
pricing in broadband networks. IEEE/ACM Transactions on Networking, 8(5):667–678, October 2000.

[118] A. Yassine et al. Competitive game theoretic optimal routing in optical networks. In Proceedings of SPIE, pages 27–36, May 2002.
