
Effects of extrinsic motivation on tasks with performance-based rewards and completion-based rewards

Gene Levitzky

Abstract

The effect of extrinsic motivation on levels of intrinsic motivation and task performance is a hotly debated topic across the sub-fields of psychology. It is commonly believed that the use of extrinsic consequences results in a decrease in highly valued human behavior (Dickinson, 1989). In this paper I consider the idea that my computational models would reflect this decrease in intrinsic motivation, but only in tasks rewarded solely on completion rather than on performance (see Eisenberger & Cameron, 1996). I examined how well a computational simulation learned a simple task when it was rewarded merely for completing the task versus when it was rewarded according to how well it completed the task. The results show that for tasks with a completion-based reward, the presence of extrinsic motivation was significantly more harmful than for tasks with a performance-based reward.

Introduction
The notion of extrinsic motivation can be defined as behavior that is controlled by incentives that are not part of the activity (Eisenberger & Cameron, 1996), whereas intrinsic motivation can be defined as behavior that cannot be attributed to external controls (Dickinson, 1989). Put another way, extrinsic motivation is any form of incentive or pleasure that is not derived inherently from the task itself, but is instead derived from some external, often unrelated,

source (say, giving food pellets to a rat for successfully completing a maze). Intrinsic motivation, on the other hand, is the impetus that comes directly from performing the task itself (say, the pleasure someone who enjoys solving mazes might experience from leisurely solving a difficult one). Writers such as Dickinson and Deci & Ryan (Dickinson, 1989; Deci & Ryan, 1999) have claimed that if an activity is initially intrinsically motivated and salient extrinsic controls are later introduced, behavior will be attributed to those controls and, as a result, will not readily occur in their absence in the future (Dickinson, 1989). In this paper I will argue that this is indeed the case, but only for tasks where the salient extrinsic rewards are applied solely upon completion, independently of how well the task is completed.

For clarity, throughout this paper a performance-based reward will refer to a reward whose value is determined by how well the accompanying task is completed. For example, if a task of solving a puzzle is timed, and the reward for solving it depends on the completion time, then that task has a performance-based reward. Conversely, a completion-based reward is any reward that is dispensed simply upon completing the given task. For example, if a subject is rewarded purely for solving a puzzle, regardless of how well or how quickly he solved it, then that reward is completion-based. Eisenberger & Cameron argue that there is no evidence that extrinsic motivation has any detrimental effect on intrinsic motivation when it takes the form of a performance-based reward; in fact, they argue that an extrinsic performance-based reward "leads to increased expressed interest in the task" (Eisenberger & Cameron, 1996).
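The distinction between the two reward types can be made concrete with a small sketch. The function names, the base value, and the handling of the steps == k boundary below are my own hypothetical illustrations, not the paper's actual implementation:

```python
BASE_REWARD = 500.0  # hypothetical fixed payout for finishing the task

def completion_based_reward(solved: bool) -> float:
    """Paid in full for finishing, no matter how well or how fast."""
    return BASE_REWARD if solved else 0.0

def performance_based_reward(solved: bool, steps: int, k: int) -> float:
    """Scaled by performance: a 10% bonus for finishing in under k
    steps, and a 10% cut for every full k steps taken to finish.
    (The treatment of steps == k exactly is my own assumption.)"""
    if not solved:
        return 0.0
    if steps < k:
        return BASE_REWARD * 1.10
    return BASE_REWARD * max(0.0, 1.0 - 0.10 * (steps // k))
```

Under this sketch a fast solver earns a bonus while a slow solver is docked, whereas the completion-based variant pays the same amount in both cases.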

Thus, in this paper I will argue not only that extrinsic motivation can have detrimental effects on intrinsic motivation and overall task performance, but also that in certain instances extrinsic motivation will actually have a positive impact. More specifically, I will argue that when performing a task with a completion-based reward, extrinsic motivation is deleterious, whereas when performing a task with a performance-based reward, extrinsic motivation is beneficial. Finally, I will show that the effects of extrinsic motivation on tasks with performance-based rewards are much more benign than on tasks with completion-based rewards.

Methods and Procedure


The computational simulation consisted of sixteen states in a four-by-four grid world, where one of the states was terminal and concluded an episode when the agent transitioned into it.

Figure 1. Here the non-terminal states are seen in white, and the goal state is seen in pink. The agent is represented by the golden disk.

Initially, the agent would spawn in the corner opposite the terminal state and proceed to explore the environment. Upon reaching the terminal (or goal) node and concluding the episode, the agent would respawn in a random state in the grid and repeat the

process. Within the environment, the agent had four transitions available from any given state: North, South, West, and East. Prior to any exploration, each transition was equally likely. There were two ways in which the probability of a transition could be altered. The first was transitioning into the goal state and receiving the reward, in which case the probability of the transition that brought the agent into the terminal state was incremented (and the other three transitions in that state were proportionately reduced). The second was attempting to transition out of bounds, in which case the probability of the attempted transition was decremented (and, again, the other three transitions were proportionately inflated).

The study itself consisted of four sets of eight trials, where each trial consisted of 1000 episodes. The first two sets of trials were concerned only with the differences between a completion-based reward model and a performance-based reward model. In the former, the agent received a fixed scalar reward only when transitioning into the goal node. In the latter, the agent was similarly rewarded, but the reward's value was contingent on how good the agent's policy was: if the agent reached the goal quickly, the reward was incremented, and if it made too many moves, the reward was decremented. Here, "quickly" was defined by a value k equal to 1.5 times the side length of the grid; for the 4 x 4 grid, k = 6. If the agent reached the goal in under k steps, its reward was increased by a factor of 0.10; conversely, for every k steps the agent took without reaching the terminal state, the reward was reduced by a factor of 0.10. At the beginning of every episode, the reward was reset to its original value.
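The probability-adjustment scheme just described can be sketched as follows. This is a minimal illustration rather than the original code; the increment size, the clipping bounds, and the function names are my own assumptions:

```python
ACTIONS = ["North", "South", "West", "East"]

def make_policy(n_states: int = 16) -> list:
    # Prior to any exploration, each transition is equally likely.
    return [{a: 0.25 for a in ACTIONS} for _ in range(n_states)]

def adjust(probs: dict, action: str, delta: float) -> None:
    """Increment (delta > 0) or decrement (delta < 0) one transition's
    probability, proportionately rescaling the other three so the
    distribution still sums to 1."""
    probs[action] = min(max(probs[action] + delta, 0.01), 0.97)
    rest = [a for a in ACTIONS if a != action]
    remaining = 1.0 - probs[action]
    total = sum(probs[a] for a in rest)
    for a in rest:
        probs[a] = remaining * probs[a] / total
```

Transitioning into the goal state would call adjust with a positive delta for the responsible action; attempting to move out of bounds would call it with a negative delta.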
The final two sets of trials introduced the key variable of the experiment: the effect of extrinsic motivation on the performance of the agent. These sets were similar to the first two, but differed in that after 500 episodes the value of the reward given in the terminal state was reduced by half. The learning algorithm employed throughout the trials was SARSA. In brief, the algorithm "focuses on transitions from state-action pair to state-action pair, and learn[s] the value[s] of state-action pairs" (Sutton & Barto, 1998). Q-values are updated using the following rule:

Q(s_t, a_t) ← Q(s_t, a_t) + α [r_{t+1} + γ Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)]

where the value of r was zero everywhere except in the terminal state, the discount factor γ was 0.833, and the learning rate α was 0.100.
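As a sketch, one SARSA backup with the paper's α and γ might look like this; the dictionary-based Q-table is my own representational choice, not the original implementation:

```python
from collections import defaultdict

ALPHA = 0.100  # learning rate used in the paper
GAMMA = 0.833  # discount factor used in the paper

def sarsa_update(Q, s, a, r, s_next, a_next, terminal=False):
    """One SARSA backup:
    Q(s,a) <- Q(s,a) + alpha * [r + gamma * Q(s',a') - Q(s,a)].
    The bootstrapped term is dropped at the terminal state."""
    target = r if terminal else r + GAMMA * Q[(s_next, a_next)]
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)  # Q-values start at zero
sarsa_update(Q, (3, 3), "North", 500.0, (0, 0), "North", terminal=True)
```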

Results

Each set of trials consisted of 8 trials, and each trial consisted of 1000 episodes. During each trial, the number of steps per episode was recorded, along with the reward received for that episode (the reward was constant for completion-based trials). From these data, I computed the mean steps to goal and the mean reward received over the 1000 episodes, along with their respective standard deviations. The averages for my different sets of trials were as follows:

Trial Set 1:
Trial   Mean Steps   Mean Reward
1       4.744        500.000
2       5.074        500.000
3       4.986        500.000
4       4.997        500.000
5       5.009        500.000
6       5.255        500.000
7       4.703        500.000
8       4.760        500.000
Avg     4.941        500.000

Figure 2. First trial set, completion-based. Standard deviation for steps: 0.191.

Trial Set 2:
Trial   Mean Steps   Mean Reward
1       5.240        538.134
2       5.288        539.367
3       5.015        538.404
4       4.675        540.992
5       4.681        539.845
6       4.750        541.286
7       5.021        539.429
8       5.367        536.374
Avg     5.005        539.229

Figure 3. Second trial set, performance-based. Standard deviation for steps: 0.279. Standard deviation for rewards: 1.595.

Trial Set 3:
Trial   Mean Steps   Mean Reward
1       6.050        375.000
2       6.284        375.000
3       5.722        375.000
4       6.263        375.000
5       5.351        375.000
6       5.106        375.000
7       6.013        375.000
8       6.078        375.000
Avg     5.858        375.000

Figure 4. Third trial set, completion-based with extrinsic reward halved after 500 episodes. Standard deviation for steps: 0.430.

Trial Set 4:
Trial   Mean Steps   Mean Reward
1       4.876        404.019
2       4.878        402.876
3       4.965        405.447
4       5.111        401.235
5       4.798        403.675
6       4.870        403.957
7       6.429        393.685
8       4.963        402.770
Avg     5.111        402.208

Figure 5. Fourth trial set, performance-based with extrinsic reward halved after 500 episodes. Standard deviation for steps: 0.541. Standard deviation for rewards: 3.651.

A graph illustrating the number of steps to goal and the amount of reward received per episode in a typical trial can be found in the supplementary figures section. I found the difference in the average steps to goal between trial set 3 (completion-based with reward halved) and trial set 4 (performance-based with reward halved) to be significant: (5.858 - 5.111) / 0.430 = 1.737 standard deviations. Assuming a normal distribution, this result lies in the 96th percentile and is therefore statistically significant. Further, comparing trial set 1 (completion-based) and trial set 3 (completion-based with reward halved) shows an enormous change: in the third trial set the average number of steps to the terminal state was much greater, 4.801 standard deviations higher than in trial set 1. Conversely, comparing trial set 2 (performance-based) and trial set 4 (performance-based with reward halved) yields no significant change.
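The significance check for trial sets 3 and 4 can be reproduced in a few lines; the percentile comes from the standard normal CDF, expressed here through the error function:

```python
import math

def normal_percentile(z: float) -> float:
    """Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Difference in mean steps between trial sets 3 and 4, in units of
# trial set 3's standard deviation.
z = (5.858 - 5.111) / 0.430
print(round(z, 3))                     # 1.737
print(round(normal_percentile(z), 2))  # 0.96, i.e. the 96th percentile
```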

Discussion
The results of the computational simulations were consistent with the views of Dickinson, Deci & Ryan, and Eisenberger & Cameron. The data from the four trial sets highlighted a significant discrepancy between the first and third trial sets, with the agent performing much better in the first than in the third. If we take the reward given in the terminal state to represent extrinsic motivation, then these results demonstrate the detrimental effects of extrinsic incentive as argued by Deci & Ryan. In the first trial set, where the agent was always rewarded the same amount for completing the task, the agent performed relatively well, while in the third trial set, where the reward was unexpectedly halved halfway through the trial, the agent performed significantly worse. To understand the exact mechanism that caused this drop in performance, refer to figures 6-7. There we see that the performance of the agent was almost identical up to roughly 600 episodes into the trial, after which the overall trend in trial set one continued until the end, whereas in trial set three there was a dramatic spike in the number of steps the agent took to reach the terminal state: the TD error correction triggered by the halved reward caused the agent to temporarily lose interest in the task. Next, I examine the impact of reducing the reward in trials where the reward was dispensed based on how well the agent performed (trial sets two and four). The difference, as predicted by Eisenberger & Cameron, was insignificant; performance slipped only very slightly in trial set four, by less than a third of a standard deviation. This implies that extrinsic controls play a minimal role when the extrinsic rewards are awarded as a function of the quality of the

performance. Finally, I look at the different ways in which extrinsic motivation impacted tasks with completion-based rewards (trial set three) and tasks with performance-based rewards (trial set four). Again, as predicted by Eisenberger & Cameron, when an extrinsic incentive was lessened for an agent accustomed to receiving a constant reward for completing a task, the agent began to perform poorly. In contrast, when an extrinsic incentive was lessened for an agent that received a reward based on how well it performed, the agent performed at the same level as before. This disparity in performance between trial sets three and four illustrates the idea that extrinsic motivation is only detrimental when applied to tasks involving completion-based rewards.

While the results of the experiment tend to support my initial hypotheses, the experimental setup was not without its flaws. The system used to determine the performance-based reward could have saturated the agent with reward: after a few hundred episodes, once the agent has a good policy, reaching the goal within k steps is significantly easier than exceeding k steps. This saturation is evident from the fact that the agent always received more average reward in performance-based trials than in completion-based trials. While the difference in average reward between the two types of trials is not that large (between 5% and 10%), it could account for why extrinsic motivation has less of an impact on performance-based trials (the agent is still receiving a slightly higher reward than it would have in completion-based trials). In the future, more refined setups should be used, specifically setups where the performance-based reward is determined in a more balanced way. Moreover, the agent ought to be subjected to tasks that are more interesting and more complicated.

Supplementary Figures

Figure 6. Third trial set: completion-based, reward halved after 500 episodes. [Line chart of avg. steps to goal per episode; x-axis: Episode (0-1200), y-axis: Number of Steps (0-140).]

Figure 7. First trial set: completion-based, reward constant. [Line chart of avg. steps to goal per episode; x-axis: Episode (0-1200), y-axis: Number of Steps (0-140).]

Figure 8. Fourth trial set: performance-based, reward halved after 500 episodes. [Line chart of avg. steps to goal per episode; x-axis: Episode (0-1200), y-axis: Number of Steps (0-140).]

Figure 9. Fourth trial set: performance-based. [Line chart of reward per episode; x-axis: Episode (0-1200), y-axis: Reward (0-600).] The halving of reward is clearly evident here at the 500-episode mark.

References
Deci, E., & Ryan, R. (1999). A meta-analytic review of experiments examining the effects of extrinsic rewards on intrinsic motivation. Psychological Bulletin, 125(6), 628-667.

Dickinson, A. (1989). The detrimental effects of extrinsic reinforcement on "intrinsic motivation". The Behavior Analyst, 12(1), 1-15.

Eisenberger, R., & Cameron, J. (1996). Detrimental effects of reward. American Psychologist, 51(11), 1153-1166.

Sutton, R., & Barto, A. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.
