A STARTING QUESTION
How do players reach the equilibrium that theory predicts? How do they figure out what to do?
AN IMPORTANT DISTINCTION
Evolutionary approaches
Assumption: each player is born with a strategy and simply uses it
No proper learning: strategies that are more successful increase players' fitness and enable them to survive longer or reproduce more
Empirical validity: unable to explain the rapid learning observed in the lab; better suited to the study of animal behavior or cultural transmission
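To make the selection mechanism concrete, here is a minimal sketch of a discrete-time replicator dynamic, the canonical formalization of this idea; the game and the payoff numbers are illustrative assumptions, not taken from the slides.

```python
import numpy as np

# Discrete-time replicator dynamic for a symmetric 2x2 game:
# strategies with above-average fitness grow in the population,
# with no individual-level learning at all.
A = np.array([[3.0, 0.0],   # payoff of strategy 0 against (0, 1)
              [1.0, 1.0]])  # payoff of strategy 1 against (0, 1)

x = np.array([0.1, 0.9])    # initial population shares
for _ in range(200):
    fitness = A @ x          # expected payoff of each strategy
    avg = x @ fitness        # population-average payoff
    x = x * fitness / avg    # shares grow in proportion to relative fitness
print(x)                     # converges to the strategy favored by selection
```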
Fictitious play ignores when the other player made each of his past choices. Weighted fictitious play introduces a weight on each past observation that decreases according to how far in the past it was collected.
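A minimal sketch of this geometric weighting, assuming a discount factor gamma chosen by the analyst (the function and names here are illustrative, not from the slides):

```python
import numpy as np

# Weighted fictitious play: beliefs about the opponent discount older
# observations geometrically; gamma = 1 recovers standard fictitious play.
def beliefs(history, n_strategies, gamma=0.9):
    # history: the opponent's past choices (strategy indices), oldest first
    w = np.zeros(n_strategies)
    for age, choice in enumerate(reversed(history)):  # age 0 = most recent
        w[choice] += gamma ** age                     # older -> smaller weight
    return w / w.sum()

# Recent play counts for more than early play:
print(beliefs([0, 0, 1, 1, 1], n_strategies=2))
```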
Nyarko and Schotter (2002) look at (stated) beliefs and actual play
Beliefs are elicited in an incentive-compatible way
Behavior is not the one predicted by fictitious play
Given stated beliefs, FP makes predictions that are not empirically observed
It seems people are not very good at predicting others' behavior. People don't learn from others' choices!
Choices that have led to good outcomes are more likely to be repeated in the future
Reinforcement models
Assumption: strategies are reinforced by their previous payoffs (and somewhat by neighbouring ones)
Learning: update the attraction you feel for a particular strategy by cumulating past payoffs (including spillovers)
Empirical validity: best for describing players with imperfect reasoning ability (e.g., animals)
Reinforcement models do not in any way account for the history of opponents' play or for foregone payoffs in the learning process
Attractions cumulate realized payoffs: if player $n$ chooses strategy $k$ at time $t$ and earns payoff $\pi_k(t)$,

$$q_{nk}(t+1) = q_{nk}(t) + \pi_k(t)$$

while the attractions of all other strategies stay unchanged, $q_{nj}(t+1) = q_{nj}(t)$ for $j \neq k$. Attractions are translated into choice probabilities proportionally:

$$p_{nj}(t) = \frac{q_{nj}(t)}{\sum_{j' \in S} q_{nj'}(t)}$$
Example: start with equal initial attractions, $q_{rH} = q_{rL} = 10$ for the row player and $q_{ch} = q_{cl} = 10$ for the column player, so that

$$p_{rH} = p_{rL} = \frac{10}{10+10} = 0.5, \qquad p_{ch} = p_{cl} = \frac{10}{10+10} = 0.5$$
Recall
$q_{rH} = q_{rL} = 10$ and $q_{ch} = q_{cl} = 10$, so all $p = 0.5$; at $t = 1$ the outcome $(H, l)$ is determined

          h        l
  H     9, 7    11, 2
  L     5, 3     4, 6
After $(H, l)$: $q_{rH}(2) = 10 + 11 = 21$ and $q_{cl}(2) = 10 + 2 = 12$, so $p_{rH}(2) = 21/31 \approx 0.68$ while $p_{cl}(2) = 12/22 \approx 0.55$. Column gets the minimum he could get, yet there is no negative reinforcement: at worst a strategy simply receives NO reinforcement.
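A short sketch that reproduces these numbers under the cumulative reinforcement rule above; the payoff matrix is the one from the example, everything else is a direct transcription of the formulas.

```python
import numpy as np

# Cumulative reinforcement in the 2x2 example above.
# Rows: row player's strategies (H, L); columns: column player's (h, l).
row_pay = np.array([[9.0, 11.0],
                    [5.0,  4.0]])
col_pay = np.array([[7.0,  2.0],
                    [3.0,  6.0]])

q_row = np.array([10.0, 10.0])   # initial attractions -> p = 0.5 each
q_col = np.array([10.0, 10.0])

r, c = 0, 1                      # first outcome fixed to (H, l), as above
q_row[r] += row_pay[r, c]        # 10 + 11 = 21
q_col[c] += col_pay[r, c]        # 10 + 2  = 12

print(q_row / q_row.sum())       # [0.677, 0.323]: H is now much more likely
print(q_col / q_col.sum())       # [0.455, 0.545]: l barely reinforced at all
```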
There are more complicated versions of the RL model which also include an experimentation and a forgetting parameter. Experimental studies (Erev and Roth 1995, 1998) show that:
RL from own payoffs is able to approximate the direction of learning in some games only
learning is slow
Belief learning, conversely, assumes players ignore information about what they themselves chose in the past
Main features
the attraction of a strategy depends not only on the experienced payoff (earned when it was actually played), but also on the counterfactual payoff (the payoff it would have earned when not played)
EWA IN ACTION
Attractions are updated according to

$$A_i^j(t) = \frac{\phi \, N(t-1) \, A_i^j(t-1) + \left[\delta + (1-\delta)\, I(s_i^j, s_i(t))\right] \pi_i(s_i^j, s_{-i}(t))}{N(t)}$$

where
$\phi$ is a forgetting parameter (it discounts past attractions)
$N(t)$ is the experience weight, updated by $N(t) = \phi (1-\kappa) N(t-1) + 1$, so that it is weakly increasing
$\kappa$ controls how attractions grow: it indicates whether experience is cumulated rather than averaged ($\kappa = 1$ cumulated, $\kappa = 0$ averaged)
$\delta$ weights foregone payoffs: all strategies are updated by $\delta$ times the payoff they would have yielded, even when not chosen; the chosen strategy is further updated by $(1-\delta)$ times its realized payoff
when $\delta = 0$ only experienced payoffs matter (the model reduces to reinforcement learning)
Attractions are translated into probabilities (of playing the different strategies) by the following logit response rule:

$$P_i^j(t+1) = \frac{e^{\lambda A_i^j(t)}}{\sum_{k \in S} e^{\lambda A_i^k(t)}}$$
Portrays noisy players that are not always able to pick the best option
For low levels of $\lambda$: many discrimination mistakes, strategies are played almost at random. For high levels of $\lambda$: better discrimination between attractions. $\lambda$ indicates the amount of noise in behavior
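A sketch of one EWA updating step plus the logit rule, transcribing the formulas above; the variable names and the sample parameter values are illustrative assumptions, not from the slides.

```python
import numpy as np

# One EWA updating step for a single player (notation as in the formulas above).
def ewa_step(A, N, chosen, payoffs, phi=0.9, delta=0.5, kappa=0.0):
    # A: attractions; N: experience weight; chosen: strategy actually played;
    # payoffs: what each strategy would have earned this period.
    N_new = phi * (1 - kappa) * N + 1
    I = np.zeros_like(A)
    I[chosen] = 1.0                  # indicator of the chosen strategy
    A_new = (phi * N * A + (delta + (1 - delta) * I) * payoffs) / N_new
    return A_new, N_new

def logit_probs(A, lam=1.0):
    # Higher lambda -> sharper discrimination between attractions.
    e = np.exp(lam * (A - A.max()))  # subtract max for numerical stability
    return e / e.sum()

A, N = np.zeros(2), 1.0
A, N = ewa_step(A, N, chosen=0, payoffs=np.array([9.0, 5.0]))
print(A, logit_probs(A, lam=2.0))    # the unchosen strategy also moved
```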
The EWA model is able to correctly predict choices across many games, especially when we look at out-of-sample behavior
We use half of the observations to tune the model and use the tuned model to predict the rest of the data
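Schematically, the procedure looks like this; the data and the one-parameter model are toy placeholders, and only the split-calibrate-predict logic mirrors the text.

```python
import numpy as np

# Tune a single parameter (here the logit lambda of a static choice model)
# on half of the data, then evaluate predictive fit on the held-out half.
rng = np.random.default_rng(0)
choices = rng.integers(0, 2, size=100)   # observed choices (synthetic)
attractions = np.array([1.0, 0.5])       # fixed attractions (synthetic)

def neg_loglik(lam, data):
    p = np.exp(lam * attractions) / np.exp(lam * attractions).sum()
    return -np.log(p[data]).sum()

calib, holdout = choices[:50], choices[50:]
lams = np.linspace(0.0, 5.0, 51)
best = min(lams, key=lambda l: neg_loglik(l, calib))  # tune on first half
print(best, neg_loglik(best, holdout))                # judge on second half
```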
Two Pareto-ranked Nash equilibria
Convergence to one or the other depends on the starting point
Problems with belief learning: no asymmetric convergence (payoff differences between the equilibria have no effect)
EWA model is the best at describing behavior in the continental divide game
Self-tuning EWA: a new version of EWA with fewer parameters ($\kappa = 0$; $\phi$ and $\delta$ endogenously determined)
Performance of the self-tuning EWA is pretty much the same as that of the standard EWA
READINGS
Elective reading: Colin Camerer and Teck-Hua Ho, "Experience-Weighted Attraction Learning in Normal Form Games", Econometrica, Vol. 67, No. 4 (Jul. 1999), pp. 827-874, http://www.jstor.org/stable/2999459
Mandatory reading: selected pages from Camerer, Behavioral Game Theory, ch. 6 (pp. 199-204, 209-221, 257); extract from Camerer, Chong, Ho (handed out in class)