
LEARNING IN GAMES

A STARTING QUESTION

There exist various equilibrium concepts


Iterated elimination of dominated strategies
Nash equilibrium and its refinements

Sub-game perfect equilibrium

A starting question for our investigation is

How do players reach the equilibrium that theory predicts? How do they figure out what to do?

AN IMPORTANT DISTINCTION

Games can be one-shot

Played only once

Or they can be repeated


Played many times
Different possible rules (e.g. rematching among players or not)

Learning models apply to repeated games

Chance for learning

GOAL OF THIS LECTURE

Provide an overview of the most important learning models


Underlying assumptions
How does learning take place
Empirical validity

THE VERY BEGINNING

Evolutionary approaches
Assumption: each player is born with a strategy and uses it
No proper learning: strategies that are more successful increase players' fitness and enable the player to survive longer or reproduce more
Empirical validity: unable to explain the rapid learning observed in the lab; better suited to studying animal behavior or cultural transmission


First attempt to look at the process behind strategy selection

LEARNING FROM OTHERS

Belief learning models


A way of learning one's way to the Nash equilibrium
Players update beliefs about what others will do based on history, and use those beliefs to decide which strategies are best
Assumption: opponents are playing stationary (possibly mixed) strategies

Fictitious play (1951)


Keep track of the relative frequency of play of each of the other player's strategies
Begin with an initial or prior sample
Use play frequencies as beliefs to compute the expected value of alternative strategies
Play a best response to historical frequencies
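As an illustration, the four steps above can be sketched in a few lines of Python (the matching-pennies-style payoffs and prior counts below are made up for the example):

```python
import numpy as np

def fictitious_play_step(payoffs, counts):
    """One step of classic fictitious play for the row player.

    payoffs: payoffs[i, j] = row payoff when row plays i and column plays j
    counts:  historical counts of each column strategy (including any
             initial or prior sample)
    Returns the row strategy that best responds to historical frequencies.
    """
    beliefs = counts / counts.sum()           # play frequencies used as beliefs
    expected = payoffs @ beliefs              # expected value of each row strategy
    return int(np.argmax(expected))           # best response to those beliefs

# Hypothetical matching-pennies payoffs for the row player:
payoffs = np.array([[1.0, -1.0],
                    [-1.0, 1.0]])
counts = np.array([3.0, 1.0])                 # opponent played strategy 0 three times
print(fictitious_play_step(payoffs, counts))  # prints 0: best response to beliefs (0.75, 0.25)
```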

LEARNING FROM OTHERS

Brown (1951) first introduced fictitious play (FP)


The name is due to the fact that Brown wanted to use it as an explanation for Nash equilibrium: he imagined that a player would "simulate" play of the game in his mind and update his future play based on this simulation
Not proper learning, but it brought attention to the process of reaching an equilibrium

Problem with classic fictitious play

It ignores when the other player made his choices in the past

Weighted fictitious play

Introduces a weight on past observations that decreases the further back in time they were collected

WEIGHTED FICTITIOUS PLAY

Fictitious play beliefs (reconstructed in the standard weighted form):

Bij(t+1) = [ I(aj, t) + Σ(τ=1..t−1) γ^τ · I(aj, t−τ) ] / [ 1 + Σ(τ=1..t−1) γ^τ ]

where
Bij(t+1) is the belief player i has that the opponent will play strategy aj at time t+1
γ^τ is the weight given to an observation of strategy aj collected τ periods ago
I(aj, t) is an indicator function, = 1 if strategy aj was played at time t
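A minimal sketch of this belief computation, assuming geometric discounting of older observations (the history and the value of γ below are hypothetical; γ = 1 recovers classic fictitious play):

```python
def weighted_beliefs(history, n_strategies, gamma):
    """Weighted fictitious play beliefs about the opponent.

    history:      list of the opponent's past strategy indices, oldest first
    n_strategies: number of strategies the opponent has
    gamma:        discount on older observations (gamma = 1: classic FP)
    Returns the belief that each strategy will be played next period.
    """
    weights = [0.0] * n_strategies
    for lag, a in enumerate(reversed(history)):  # lag 0 = most recent play
        weights[a] += gamma ** lag               # older plays weigh less
    total = sum(weights)
    return [w / total for w in weights]

# Opponent played 0, 0, 1 (most recent last); discount gamma = 0.5
print(weighted_beliefs([0, 0, 1], 2, 0.5))
```

With γ = 0.5 the single most recent observation (strategy 1) outweighs the two older plays of strategy 0, so the belief on strategy 1 exceeds one half.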

WEIGHTED FICTITIOUS PLAY IN ACTION

In the given game

ISSUES WITH FICTITIOUS PLAY

Empirical evidence does not support use of FP

Nyarko and Schotter (2002) look at (stated) beliefs and actual play

Beliefs are elicited in an incentive-compatible way
Behavior is not the one predicted by fictitious play
Given stated beliefs, FP makes predictions that are not empirically observed

It seems people are not very good at predicting others' behavior
People don't learn from others' choices!

THE FOCUS SHIFTS ON OWN EXPERIENCE

Some basic rules


Law of effect (Thorndike 1898)

Choices that have led to good outcomes are more likely to be repeated in the future

Power law of practice (Blackburn 1936)

Learning curves tend to be steep initially, then flatter

THE FOCUS SHIFTS ON OWN EXPERIENCE

Reinforcement models
Assumption: strategies are reinforced by their previous payoffs (and somewhat by neighbouring ones)
Learning: update the attraction you feel for a particular strategy by cumulating past payoffs (including spillovers)
Empirical validity: best for describing players with imperfect reasoning ability (animals)

Does not in any way account for others' choices or for foregone payoffs in the learning process

AN EXAMPLE OF REINFORCEMENT LEARNING

Define qnk(t) as the propensity of player n to play pure strategy k
Assume that initially players have the same propensity to play all of their pure strategies
The choice rule is probabilistic:

pnk(t) = qnk(t) / Σj qnj(t)

Reinforcement function: R(x) = x − xmin

x is the payoff of the chosen move; xmin is the smallest available payoff

Negative reinforcement is thus avoided

AN EXAMPLE OF REINFORCEMENT LEARNING

Cumulative update of propensities (n is the player, k is the strategy just played):

qnj(t+1) = qnj(t) + R(x)   if j = k (the chosen strategy is reinforced)
qnj(t+1) = qnj(t)          otherwise (the propensity towards that strategy is unchanged)

Corresponding update of probabilities:

pnj(t) = qnj(t) / Σ(k∈S) qnk(t)
AN EXAMPLE OF REINFORCEMENT LEARNING

Consider the following game structure

       h        l
H     9, 7    11, 2
L     5, 3     4, 6

Let's assume that

qrh = qrl = 10 and qch = qcl = 10
Players play each strategy with the following probability:

prh = prl = 10/(10+10) = 0.5
pch = pcl = 10/(10+10) = 0.5

At t = 1 the profile (H, l) is realized

AN EXAMPLE OF REINFORCEMENT LEARNING

Recall
qrh = qrl = 10 and qch = qcl = 10, and all p = 0.5
At t = 1 the profile (H, l) is realized

       h        l
H     9, 7    11, 2
L     5, 3     4, 6

Reinforcement learning implies

qrh(t+1) = 10 + (11 − 4) = 17, qrl = 10
qcl(t+1) = 10 + (2 − 2) = 10, qch = 10

The column player gets the minimum he could have gotten: there is no negative reinforcement, so at most there is NO reinforcement

Probabilities are modified: the law of effect!

prh(t+1) = 17/(10+17) = 0.63
prl(t+1) = 10/(10+17) = 0.37
pch(t+1) = 10/20 = 0.5
pcl(t+1) = 10/20 = 0.5

No reinforcement implies no modification in probabilities
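The worked example can be checked with a short simulation of the cumulative Roth-Erev update described above (the propensities and payoffs are those from the slides):

```python
def roth_erev_update(q, payoff, min_payoff, chosen):
    """Cumulative Roth-Erev update: only the chosen strategy is reinforced,
    by R(x) = x - x_min, so reinforcement is never negative."""
    q = dict(q)
    q[chosen] += payoff - min_payoff
    return q

# Initial propensities: both players start at 10 for each strategy
q_row = {'H': 10.0, 'L': 10.0}
q_col = {'h': 10.0, 'l': 10.0}

# At t = 1 the realized profile is (H, l): row earns 11, column earns 2
q_row = roth_erev_update(q_row, payoff=11, min_payoff=4, chosen='H')  # row's smallest payoff is 4
q_col = roth_erev_update(q_col, payoff=2, min_payoff=2, chosen='l')   # column's smallest payoff is 2

# Law of effect: probabilities shift toward the reinforced strategy
p_row = {k: v / sum(q_row.values()) for k, v in q_row.items()}
print(q_row)                 # {'H': 17.0, 'L': 10.0}
print(round(p_row['H'], 2))  # 0.63, as on the slide
```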

EMPIRICAL EVIDENCE ON REINFORCEMENT

There are more complicated versions of the RL model, which also include an experimentation and a forgetting parameter
Experimental studies (Erev and Roth 1995, 1998) show that
RL from own payoffs is able to approximate the direction of learning, but in some games only
Learning is slow

Players do not learn from counterfactual reasoning

Options that are never explored never teach anything!

COMBINING EXPERIENCE AND BELIEFS

Belief learning assumes players ignore information about their own past choices

They only look at what others did!

Reinforcement learning assumes players ignore information about foregone payoffs

never learn anything from choices not taken

New wave of models combining both types of information

EXPERIENCE-WEIGHTED ATTRACTION MODEL

Main features

The attraction of a strategy depends not only on its experienced payoff (when actually played), but also on its counterfactual payoff (the payoff it would have earned when not played)

Two variables updated after every period


attractions Aij(t)
experience weight N(t)

EWA IN ACTION

The attraction of a given strategy is determined by a weight on past attractions plus a weighted payoff term:

Aij(t) = { φ·N(t−1)·Aij(t−1) + [δ + (1−δ)·I(j, si(t))]·πi(j, s−i(t)) } / N(t)

where I(j, si(t)) = 1 if player i actually chose strategy j at time t

EWA: WEIGHT OF PAST ATTRACTIONS

The first term, φ·N(t−1)·Aij(t−1) / N(t), carries past attractions forward, where

φ is a forgetting parameter (for past attractions)
N(t) is the experience weight, updated by N(t) = ρ·N(t−1) + 1, so that it is weakly increasing
κ (with ρ = φ·(1−κ)) controls how attractions grow and indicates whether experience is cumulated rather than averaged: κ = 1 cumulated, κ = 0 averaged

EWA: WEIGHTED PAYOFF TERM

The second term is [δ + (1−δ)·I(j, si(t))]·πi(j, s−i(t)), where πi(j, s−i(t)) is the payoff of playing a given strategy j against the opponents' realized play

This term represents reinforcement and implies that

all strategies are updated by δ times the payoff they would have yielded, even when not chosen
the chosen strategy is further updated by (1−δ) times its payoff
when δ = 0, only experienced payoffs matter

δ represents the weight placed on foregone payoffs
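Putting the two terms together, one EWA attraction update can be sketched as follows (the parameter values for φ, δ, ρ and the initial attractions below are illustrative, not estimates):

```python
def ewa_update(A, N, payoffs, chosen, phi, delta, rho):
    """One EWA attraction update (Camerer & Ho, 1999).

    A:       current attractions, one per own strategy
    N:       current experience weight
    payoffs: payoffs[j] = payoff strategy j would have earned against the
             opponents' realized play this period (foregone payoffs included)
    chosen:  index of the strategy actually played
    phi:     forgetting parameter on past attractions
    delta:   weight on foregone payoffs
    rho:     depreciation of the experience weight
    """
    N_new = rho * N + 1.0                                  # N(t) = rho*N(t-1) + 1
    A_new = []
    for j, pi in enumerate(payoffs):
        w = delta + (1.0 - delta) * (1.0 if j == chosen else 0.0)
        A_new.append((phi * N * A[j] + w * pi) / N_new)    # weighted payoff term added
    return A_new, N_new

# Row player from the slides' game: against column's play of l,
# H would have earned 11 and L would have earned 4 (H was chosen).
A, N = ewa_update([0.0, 0.0], 1.0, payoffs=[11.0, 4.0], chosen=0,
                  phi=0.9, delta=0.5, rho=0.9)
print(A, N)
```

Note that the unchosen strategy L still gains attraction (δ times its foregone payoff), which is exactly what pure reinforcement learning would miss.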

FROM ATTRACTIONS TO PROBABILITIES

Attractions are translated into probabilities (of playing the different strategies) by the following logit response rule:

pij(t+1) = exp(λ·Aij(t)) / Σk exp(λ·Aik(t))

This portrays noisy players who are not always able to pick the best option
λ indicates the amount of noise in behavior
For low levels of λ there are many discrimination mistakes: strategies are played almost at random
For high levels of λ there is better discrimination between attractions
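A minimal sketch of the logit response rule, showing how λ governs noise (the attraction values are made up):

```python
import math

def logit_probs(attractions, lam):
    """Logit response rule: p_j proportional to exp(lam * A_j).
    Low lam: near-random play; high lam: sharp discrimination."""
    exps = [math.exp(lam * a) for a in attractions]
    total = sum(exps)
    return [e / total for e in exps]

A = [2.0, 1.0]
print(logit_probs(A, 0.1))  # close to uniform: noisy play
print(logit_probs(A, 5.0))  # almost all weight on the higher attraction
```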

EWA PREDICTIONS AND OTHER MODELS

EWA is able to reproduce results of other models


δ: weight on foregone payoffs
φ: forgetting parameter for past attractions
κ: indicates if experience is cumulated rather than averaged

Special cases: δ = 0 yields cumulative reinforcement learning; δ = 1 and κ = 0 yields weighted fictitious play

THE EWA CUBE AND OTHER MODELS

EWA ACCURACY IN PREDICTION

The EWA model is able to correctly predict choices across many games, especially when we look at out-of-sample behavior

We use half of the observations to tune the model and use the tuned model to predict the rest of the data

Example: Continental divide game


Decide where to locate your firm on a scale from 1 to 14 (1 is closer to Hollywood, 14 is closer to Silicon Valley), assuming you make a product that applies to both sectors
Players' payoffs depend on their own choice and the median choice

CONTINENTAL DIVIDE GAME

Two Pareto-ranked Nash equilibria
Convergence to one or the other depends on the starting point

EWA VS OTHER MODELS

All models capture bifurcating behavior but


Problems with reinforcement learning: No braking/acceleration (Too slow!)

Problems with belief learning: No asymmetric convergence (No effects of payoff differences in equilibria)

EWA model is the best at describing behavior in the continental divide game

A NEW VERSION OF EWA

Technically, EWA has four parameters

δ, φ, κ, λ
But with four parameters you can fit pretty much anything!

Criticism: maybe EWA overfits the data

Self-tuning EWA: a new version of EWA with fewer parameters (κ = 0, with φ and δ endogenously determined, so that only λ is left to estimate)

Performance of the self-tuning EWA is pretty much the same as the standard EWA

READINGS
Elective reading
Colin Camerer and Teck-Hua Ho, "Experience-Weighted Attraction Learning in Normal Form Games," Econometrica, Vol. 67, No. 4 (Jul. 1999), pp. 827-874, http://www.jstor.org/stable/2999459

Mandatory reading
Selected pages from Camerer, Behavioral Game Theory, ch. 6 (pp. 199-204, 209-221, 257)
Extract from Camerer, Chong, Ho (handed out in class)