This dissertation is submitted in part requirement for the Degree of M.A. with Honours in Economics at the
University of Aberdeen, Scotland, and is solely the work of the above named candidate.
Word count: 9,997
Table of Contents
1. Introduction
2. Overview
2.1 Aspirations shift
2.2 The academic field of esports
2.3 The Rise of the Computer
2.4 The esports leader
2.5 Matchmaking potential
3. Current evaluation tools
3.1 The Dota matchmaker
3.1.1 MMR Specifications
3.1.2 Matchmaking inefficiencies
3.2 In-game statistics
3.2.1 In-game statistics inefficiency
3.2.2 In-game statistics inefficiencies
3.3.1 Complexity
3.3.2 One game is not enough: the competitive aspect
3.4 Post-game statistics
4. The prospect of improvement
5. Ambiguous endogenous player skill
5.1 Organisation
5.2 Short-term failure
6. Information asymmetry
6.1 Collective salience
6.2 The assumptions
6.3 The model
7. The survey setup
7.1 Findings and interpretation methods
7.2 Limitations
7.3 Results
7.4 Hypothesis testing
8. Conclusion and recommendations
9. Appendix A
10. References
The dissertation focuses on how the principles of information asymmetry affect perceptions of competence and performance of every Dota player. Given the lack of adequate qualitative tools to evaluate one's skill, players are left to interpret misleading quantitative statistics. Therefore, a new system that takes information inefficiencies into account has to be created to substantially increase the utility the community derives from pursuing a common passion. The tool itself is not built within the scope of the dissertation, but a theoretical model capable of achieving it is codified and valuable data is collected for further testing.
Acknowledgement
I would like to thank Dr Juergen Bracht for advising and guiding my work, Alistair Spragg for having the courage to code the uncodable, Ing Milan Krajcovic for immense help with interpreting the data, Dr Alan Bester for endorsing the research within the community, and Michal Simko, Michaela Debreceniova, Martin Certicky and Chris Koch for proofreading every draft and supporting my unorthodox thesis.
1. Introduction
This paper aims to introduce the reader to the intricate world of Dota 2 at a leisurely pace, explaining concepts gradually as they appear. It is structured to ease the transition from general concepts and simple truths to more complex Dota-specific concepts that many academics may not know much about.
Section 2 gives a general overview of the reasoning behind choosing this topic, particularly the physical vs. digital factors that support the idea that esports should be studied as a mixture of sports and game theory, in order to provide academic guidance in fixing issues the author has experienced personally. A quick overview of the game is given, with additional snippets of information scattered further in the text to avoid overwhelming the reader with too much game-specific jargon at once.
In section 3, the existing tools to evaluate performance are explained and critically assessed. This section argues that these quantitative measures are insufficient for tracking individual player competence, and concludes with the need to devise an improved metric that can continuously capture player performance and offer qualitative inference.
Section 4 specifies the potential benefits of such a metric and the groups affected by it, while sections 5 and 6 introduce the economic concepts of information asymmetry and the endogenous competence variable. These are briefly described with respect to player behaviour within the game environment. General theoretical assumptions are then set out to govern the model used to conduct a simple short-term experiment intended to probe and interpret the current situation. Its setup, limitations and findings can be found in section 7. The experiment yields some significant findings but could be heavily improved. The paper concludes that in order to create a functioning tool, the right research, the right promotion and the right statistical models need to be in place. The author expects to continue working on the topic in his own time and hopes to offer the community a semi-functioning collaborative rating system by The International 5 in August.
2. Overview
2.1 Aspirations shift
Given this paper's focus, the debate over whether esports should be defined as a sport is irrelevant to its purpose (Gestalt, 1999). There is nonetheless an apparent natural connection between the two. As in traditional sports, the gaming player base follows a Gaussian distribution[2] (Reddit, 2014), with only a small percentage of players practising deliberately (Rioult et al., 2014); each player expresses a self-set level of intrinsic motivation to reach her goals through a structured game environment with extremely well-defined and consistent rules identical for every player[3], which in turn allow a player to win a game by finding and executing strategies that outperform the opponent's strategy[4]. One significant difference between the two is the form of the activity and the evaluation feedback subsequently available. While physical sport can be physically measured and tracked, virtual activity within a digital environment requires new techniques to evaluate performance, as it does not involve such easily measurable metrics. That is also why an online game is often compared to a game of chess, and why the Elo system developed for rating chess players has been adopted in esports.
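For readers unfamiliar with Elo, the classic chess form of the system can be sketched in a few lines of Python. The constants (a 400-point logistic scale and a K-factor of 32) are common chess conventions assumed here for illustration; they are not values published for any Dota matchmaker.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the classic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_rating(rating: float, opponent: float, score: float, k: float = 32.0) -> float:
    """New rating after one game: score is 1 for a win, 0.5 for a draw, 0 for a loss."""
    return rating + k * (score - expected_score(rating, opponent))


if __name__ == "__main__":
    # Evenly matched players: each is expected to score 0.5, so a win is worth k / 2.
    print(expected_score(1500, 1500))      # 0.5
    print(update_rating(1500, 1500, 1.0))  # 1516.0
    # An upset over a 400-point stronger opponent earns most of the K-factor (about +29).
    print(round(update_rating(1500, 1900, 1.0), 1))
```

Because the rating change is proportional to the gap between expected and actual score, beating expected winners moves a rating very little, while upsets move it a lot.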
2.3 The Rise of the Computer
gamers on the same ground used for traditional athletes. Amazon bought Twitch, a game-streaming website, for nearly $1 billion last year. Esports viewership was estimated at over 70 million people in 2013 and is expected to have risen rapidly since (SuperData Research, 2013).
The potential of approaching esports through 'sports' science is that it would look beyond treating it as a sociocultural phenomenon. Esports is rooted deeply in digital youth culture. Children who are already very competent with modern information technology further train their competencies through playing computer games, rapidly widening the social-technological gap between them and older generations. The mastery of multimodal communication has become one of the fundamental capabilities needed to acquire high status within a group or a society, particularly in youth culture (Wagner, 2006).
Looking at esports through game theory would allow modern approaches and methodologies to be derived that actively improve the industry's progress and add the elements it is missing. However, despite massive growth in popularity (Ibid), academic focus on esports has so far revolved mainly around its sociological, ethical (Rioult et al., 2014) and cognitive aspects (Latham, Patston and Tippett, 2013). Little weight has been given to the lack of tools to measure digital activity and competence. Rioult et al. (2014) researched the possibility of analysing real-time sport through data mining of an online game, Latham et al. (2013) argued for the importance of distinguishing videogame expertise from experience, and Yang et al. (2014) focused on identifying success patterns, but the literature on evaluating digital performance and competence level leaves much room for improvement.
2.4 The esports leader
'Dota 2' and 'Dota' will be used interchangeably to refer to the same game.
Abbreviation for Multiplayer Online Battle Arena.
[8] It is now installed on 42 million computers worldwide.
[9] http://steamcharts.com
[10] Developed by Valve, Steam provides internet-based distribution of computer games and additional community features.
[11] (eSports Earnings, 2015)
another in dozens of annual online and offline tournaments. The most prominent of them is
The International (TI), famous for its prize pool (Wallace, 2014). Every professional player's goal is to win TI, which grants a team of five players lifetime reputation and prestige. During the rest of the year, fans and players struggle to evaluate who the overall best players are. Last year's TI4 winners, Team Newbee, dominated during the tournament, but their performance sank so much afterwards that the Dota community objects to the team being automatically invited to the next TI (which has been the tradition so far).
2.5 Matchmaking potential
The Dota matchmaking system tries to judge each player's competence level using a matchmaking rating (MMR), a metric that places each player at a percentile point on the Gaussian curve of the total player base (similar to Elo[12] ratings in chess). The exact formula used to calculate a player's MMR and the underlying matchmaking algorithm have never been published by Valve[13], so the following information is only descriptive. Every player who has played at least one game is assigned an uncalibrated numerical value based on specific data-driven rules (WhataBaller, 2014). For any potential match, players with similar MMR are expected to meet the ideal criteria described in 2.5. The algorithm assigns a score for each criterion and computes a weighted average. When the resulting score exceeds a set threshold, the match is considered 'good enough' and is formed. To maintain maximum match quality, the MMR is recalibrated after every match.
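The score-and-threshold step described above might look roughly like this. Valve has not published its criteria, weights or threshold, so the three criteria, their weights and the 0.8 cut-off below are hypothetical placeholders chosen only to show how a weighted average and a threshold interact.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A proposed match, described by hypothetical per-criterion scores in [0, 1]."""
    win_rate_balance: float   # 1.0 when the predicted win chance is 50/50
    experience_match: float   # 1.0 when both teams have similar games played
    language_match: float     # 1.0 when all ten players share a language


# Hypothetical weights; they sum to 1 so the weighted average stays in [0, 1].
WEIGHTS = {"win_rate_balance": 0.6, "experience_match": 0.25, "language_match": 0.15}
THRESHOLD = 0.8  # hypothetical "good enough" cut-off


def match_score(c: Candidate) -> float:
    """Weighted average of the criterion scores."""
    return sum(weight * getattr(c, name) for name, weight in WEIGHTS.items())


def is_good_enough(c: Candidate, threshold: float = THRESHOLD) -> bool:
    """Form the match only when its weighted score exceeds the threshold."""
    return match_score(c) > threshold


if __name__ == "__main__":
    balanced = Candidate(win_rate_balance=0.95, experience_match=0.8, language_match=1.0)
    lopsided = Candidate(win_rate_balance=0.4, experience_match=0.9, language_match=1.0)
    print(round(match_score(balanced), 3), is_good_enough(balanced))
    print(round(match_score(lopsided), 3), is_good_enough(lopsided))
```

In this sketch a balanced but otherwise imperfect match clears the bar, while a lopsided one is rejected even though the other criteria score well, because the win-rate criterion carries most of the weight.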
3.1.1 MMR Specifications
The ideal expected pre-match win rate criterion is the primary aspect of MMR calibration, although the algorithm does not force particular win rates on players[14]. Instead, its built-in Elo-type function places a player on a winning streak into progressively higher-level games, while a player on a losing streak is matched with progressively lower-skilled opponents and teammates. Players' win rates stabilising around 50% is therefore an indirect result of the system. An important characteristic to note is that an Elo system is not meant to give players a sense of progress. The algorithm is targeted at creating closely balanced games by replicating an assumed normal distribution of competence, so it is mathematically impossible for all players' MMR to keep rising indefinitely as more games are played. A player who mistakenly treats her MMR as a progress indicator would constantly be told she is making none (Fletcher, 2013). The MMR works as the mean of a probability distribution of performance in the next match. Uncertainty, serving as the standard deviation of that distribution, adjusts according to the relationship between actual and predicted match outcomes. Repeatedly correct predictions of match results by the algorithm reduce a player's uncertainty, while a surprise win or loss tends to increase it (Blog.dota2.com, 2015).
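The direction of the uncertainty adjustment described above can be illustrated with a small sketch. Valve's blog states only that correct predictions reduce uncertainty and surprises increase it, so the multiplicative factors (0.9 and 1.2), the bounds and the Elo-style prediction rule are all assumptions, not Valve's formula.

```python
from dataclasses import dataclass

SHRINK, GROW = 0.9, 1.2          # hypothetical adjustment factors
MIN_UNC, MAX_UNC = 50.0, 500.0   # hypothetical bounds on the uncertainty


@dataclass
class Rating:
    mmr: float          # mean of the player's performance distribution
    uncertainty: float  # standard deviation of that distribution


def expected_win(own_mmr: float, opponent_mmr: float) -> float:
    """Elo-style win expectation on a 400-point scale (an assumption)."""
    return 1.0 / (1.0 + 10 ** ((opponent_mmr - own_mmr) / 400.0))


def adjust_uncertainty(r: Rating, opponent_mmr: float, won: bool) -> Rating:
    """Shrink uncertainty after a correctly predicted result, grow it after a surprise.

    Only the uncertainty changes here; the MMR update itself is a separate step.
    """
    predicted_win = expected_win(r.mmr, opponent_mmr) >= 0.5
    factor = SHRINK if predicted_win == won else GROW
    new_uncertainty = min(MAX_UNC, max(MIN_UNC, r.uncertainty * factor))
    return Rating(r.mmr, new_uncertainty)


if __name__ == "__main__":
    player = Rating(mmr=3000, uncertainty=200)
    print(adjust_uncertainty(player, 2800, won=True).uncertainty)   # expected win: shrinks
    print(adjust_uncertainty(player, 2800, won=False).uncertainty)  # upset loss: grows
```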
Because Valve's data show that players of similar skill but dissimilar experience tend to have different gameplay expectations (Ibid), the matchmaking system also tries to match players with a similar total number of matches played. The system measures experience as an approximately logarithmic function of the total games played by a player (figure 2 below).[15] If two players are close enough in the diagram, they are placed in the same match. Valve states that it relies on MMR almost exclusively after around 150 Dota matches (known as the calibration phase) (Dev.dota2.com, 2013).
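The experience criterion can be pictured as a logarithmic transform followed by a distance check. The natural logarithm of (1 + games played) and the 0.5 tolerance are assumptions for illustration; the text only describes the curve as approximately logarithmic.

```python
import math

EXPERIENCE_TOLERANCE = 0.5  # hypothetical maximum gap on the log scale


def experience(games_played: int) -> float:
    """Hypothetical experience measure: grows quickly at first, then flattens."""
    return math.log1p(games_played)


def similar_experience(games_a: int, games_b: int, tol: float = EXPERIENCE_TOLERANCE) -> bool:
    """True when two players sit close enough on the experience curve to be matched."""
    return abs(experience(games_a) - experience(games_b)) <= tol


if __name__ == "__main__":
    print(similar_experience(20, 70))      # a 50-game gap early on matters
    print(similar_experience(1000, 1050))  # the same gap later barely registers
```

The same 50-game gap separates a 20-game newcomer from a 70-game player but is negligible between players with 1,000 and 1,050 games, which is the flattening a logarithmic shape implies.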
The global aspect of online gaming has to be taken into consideration, as players speak many different languages. A lack of a common language among teammates should be strongly avoided, as it denies players the strategic coordination crucial to winning a match. This is an issue Valve has been criticised for ever since it introduced a language preference matchmaking option. The problem is that the game has no way of verifying that a player speaks a particular language, so anyone can select any language to reduce their matchmaking queue time (English queue times are comparatively shorter than any others). The reasoning of a player who deliberately lies about her language proficiency is that the opportunity cost of not being able to coordinate with teammates is smaller than that of waiting longer to play.
[15] The typical new player (A) gains experience and gradually moves upwards as her skill increases over time. Player (B), with prior experience in the genre, follows a steeper rise and arrives at a similar MMR sooner, but becomes matched with similarly skilled players.
Citing Festinger, Chen et al. (2010) conjecture that online community members tend to compare themselves to others if comparative information is present. Moreover, social comparison theory proposes that people rely on social comparison in ambiguous situations (Ibid). MMR classification serves to increase match enjoyment for the majority (which it does fairly well[16]), and while it is not a direct evaluation of performance[17], it is often the only consistent, unambiguous player label available (Blog.dotacoach.org, 2015). The community often reduces players to mere numbers, showing little respect for players at the lower end of the MMR spectrum. What the continuous design of a fluctuating MMR misses is the ability to recognise exceptional players from the start (unlike in traditional sports) and place them in the brackets where they belong. Such players instead have to climb progressively towards a higher MMR with each easily won game.[18]
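The combined effect of the Elo-type adjustment described in 3.1.1 (win rates drifting towards 50%) and the slow climb of an under-rated player can be shown with a small simulation. It assumes a fixed gain or loss of 25 MMR per game, opponents whose true skill equals the player's current MMR, and Elo win probabilities on a 400-point scale; all three are simplifying assumptions, not Valve's rules.

```python
import random


def win_probability(true_skill: float, opponent_skill: float) -> float:
    """Elo-style chance that the player beats an opponent (400-point scale, assumed)."""
    return 1.0 / (1.0 + 10 ** ((opponent_skill - true_skill) / 400.0))


def simulate(true_skill: float, start_mmr: float, games: int,
             step: float = 25.0, seed: int = 1) -> tuple[list[float], list[bool]]:
    """Return the MMR after each game and each game's result for a player of fixed skill."""
    rng = random.Random(seed)
    mmr = start_mmr
    history: list[float] = []
    results: list[bool] = []
    for _ in range(games):
        # Matchmaking pairs her with opponents whose true skill equals her current MMR.
        won = rng.random() < win_probability(true_skill, mmr)
        mmr += step if won else -step
        history.append(mmr)
        results.append(won)
    return history, results


if __name__ == "__main__":
    history, results = simulate(true_skill=4000, start_mmr=2000, games=2000)
    climb = next((i + 1 for i, m in enumerate(history) if m >= 4000), None)
    late_win_rate = sum(results[-1000:]) / 1000
    print(f"games needed to first reach 4000 MMR: {climb}")
    print(f"win rate over the last 1000 games: {late_win_rate:.2f}")
```

Once the rating reaches the player's true skill, wins and losses roughly balance, so the long-run win rate sits near 50% even though she remains a 4000-level player throughout; the phase before that is the game-by-game climb described above.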
Matching inefficiencies exist on both tails of the MMR distribution, due to the game's learning curve and highly competitive aspect. The relation between MMR and skill competence is visualised in figure 4 (note that the skill curve is only illustrative; the actual curve is unknown). Sub-2000 MMR players often refer to their skill bracket as 'Elo Hell' or 'The Trench', claiming that, because the main factor in improving MMR is winning, having already reached a higher skill level does not by itself bring them up against higher-skilled opponents, so they get stuck in an undervalued MMR bracket. As Dota is a team game, it is very rare for an individual to single-handedly beat the five opposing players, so a player who perceives herself to be better has to cooperate with four teammates who, at that skill percentile, might not even have a general grasp of all the game mechanics.
Figure 4: An illustration of the relation between MMR and competence
On the other tail of the spectrum, players above 6000 MMR suffer from a lack of similarly skilled players to play non-competitive games with. Players in this bracket often focus on higher-order competitive ambitions (such as professional gaming or private leagues[19]), so highly skilled players often find themselves stuck in the matchmaking queue, waiting for another nine similarly skilled players. As one game of Dota requires on average a 20 to 60 minute commitment, the player-side assumption is that the payoff from waiting in a queue for a good match decreases the longer they wait. Server-side optimisation follows the assumption that the expected utility of looking for a game increases with time spent in the queue (as more tantamount players would be found). This problem is represented in figure 5. Long queues for high-skilled players gradually minimise their
[19] pro.faceit.com
enjoyment from winning games to secure them a tournament trophy than having a tantamount opponent in every game. Competitive gameplay is also considered a higher-order bracket above the MMR ceiling, which renders MMR meaningless for professional gaming. Winning a tournament is a team effort and, owing to the extremely high level of professional competition, requires additional deliberate practice to become accustomed to an aggregate team strategy. A team of the five highest-MMR players with no prior experience of coordinating together would have trouble beating a well-coordinated professional team consisting of players with a lower average MMR but prior experience in competitive gaming.
3.2 In-game statistics
3.2.1 In-game statistics inefficiency
The available individual player statistics are the same for every player, with no weighting for lane position or team role. A professional team traditionally includes two support players, who create early-game opportunities for the team to build momentum but offer comparatively less than the other team roles as the game progresses. Support players often end up with poor kill/death/assist (KDA)[23] scores, few last hits (LH)[24] and so on, sacrificing their resources for the team's collective strategic goals.
The other three roles, so-called cores, are the offlane, mid and carry roles; all three offer little early on but gradually get stronger and often decide the result of a game. The offlaner is expected to suffer early and start slowly but develops to provide utility for the team, creating space for the other two cores. The mid role has historically been a tempo-controlling role, dictating the
[23] Calculated simply as
[24] Final blow that kills a creep and rewards gold.
[25] Active gathering of in-game gold and experience.
competitive regions such as China or Europe. Without context, the metric is misleading, as multiple non-game factors affect a player's KDA, such as the team's play style or the standard of competition. Additionally, when it comes to tracking performance in specific team roles, simple metrics provide even less useful information. Gold per minute (GPM)[26] and experience per minute (XPM)[27] are only meaningful as comparisons between team cores. The KDA ratio, hero damage (HD)[28] and hero healing (HH)[29] only suggest team-fighting[30] capabilities, while a team does not necessarily have to take part in team fights in order to win the game, as in the so-called split-push strategy[31] (Herrer, 2015).
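The per-player metrics discussed above are simple ratios, which is part of why they carry no role context. The sketch below assumes the widely used community definition KDA = (kills + assists) / deaths, with deaths floored at one to avoid division by zero; the original footnote's formula is not preserved here, and the two stat lines are invented purely to illustrate the point about supports.

```python
from dataclasses import dataclass


@dataclass
class StatLine:
    kills: int
    deaths: int
    assists: int
    gold: int          # total gold earned
    experience: int    # total experience earned
    minutes: float     # match duration


def kda(s: StatLine) -> float:
    # Assumed definition: (kills + assists) / deaths, with deaths floored at 1.
    return (s.kills + s.assists) / max(1, s.deaths)


def gpm(s: StatLine) -> float:
    return s.gold / s.minutes


def xpm(s: StatLine) -> float:
    return s.experience / s.minutes


if __name__ == "__main__":
    # Hypothetical 40-minute game: a carry and a support from the same winning team.
    carry = StatLine(kills=12, deaths=2, assists=8, gold=26_000, experience=30_000, minutes=40)
    support = StatLine(kills=1, deaths=9, assists=15, gold=9_600, experience=14_000, minutes=40)
    for name, s in (("carry", carry), ("support", support)):
        print(f"{name}: KDA {kda(s):.2f}, GPM {gpm(s):.0f}, XPM {xpm(s):.0f}")
```

Nothing in these numbers records the vision, disables or space the support created, so on every tracked metric the support looks like the weaker player even in a shared victory.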
The quantitative scope of the currently tracked data favours roles focused on gathering as many resources as possible and undervalues a support player's comparably important contribution to winning the game. Supports are often referred to as the unsung heroes of a victory, as the game has simply not found a quantitative way to evaluate non-resource-based play so far. The community can nevertheless reach salient agreement on which support players excel in their role (Medium, 2015).
3.3.1 Complexity
In addition to the victory rule, it is the sheer complexity of the game that prevents further evaluative inferences. Just under 1.5 billion games have been played and recorded[32] so far, and no two games have been the same. This is due to the combination of specific game mechanics, 110 heroes each with four unique abilities, six item slots for 131 unique in-game items, individual player form, play style, strengths and weaknesses and all the
In order to win a match, professional players try to bypass the game's complexity with a strategy that chooses the set of variables (heroes, items, timeframes for specific actions and so on) most likely to win a game against the opposing team's composition, basing the choice on either statistical or salient principles. Such behaviour in competitive Dota gives rise to what is called the 'metagame'. The most prominent feature of the 'meta' is the pool of heroes picked or banned[34] (joinDOTA.com, 2015). A particular set of hero choices combined with a particular play style and itemisation leads to a higher statistical probability of winning than the opponents' hero set, creating trends in which a dominant strategy becomes salient. Returning to the complexity of the game, such trends are very unstable, because each hero has its own strengths and weaknesses and can be countered by picking specific other heroes against it, creating a never-ending cycle of strategies and counter-strategies[35] (Kelley, 2015). The 'meta' is closely watched and discussed as it is a simplified way of understanding the game. Each hero's win rate in professional games is tracked[36] and qualitative inferences are made.
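The scale of the complexity described in 3.3.1 can be made concrete with a rough count. Assuming the 110-hero pool cited above and that each hero can be picked by at most one player per match, the number of distinct ten-hero match-ups (five heroes per side, ignoring items, abilities and which player takes which hero) is C(110, 5) × C(105, 5):

```python
from math import comb

HEROES = 110   # hero pool size cited in the text
TEAM_SIZE = 5


def hero_matchups(pool: int = HEROES, team: int = TEAM_SIZE) -> int:
    """Ordered (first side, second side) pairs of disjoint five-hero teams."""
    return comb(pool, team) * comb(pool - team, team)


if __name__ == "__main__":
    print(f"{hero_matchups():.3e}")  # roughly 1.18e16
```

Hero line-ups alone therefore outnumber the roughly 1.5 billion recorded games by a factor of several million, before items, timings or player decisions are even considered.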
3.4 Post-game statistics
A similar approach is used for professional teams, but it has its own limitations. Steenhuisen (2015) uses the example of two teams, Cloud9 and MVP Phoenix. The first has a 217-167 record, translating into a 56.51% win rate, while the second has a misleadingly better record of 150-115 (56.60%). An average Cloud9 opponent, however, is much more skilled than an average MVP Phoenix opponent due to regional differences. To circumvent this, professional-tier Elo-based ladder rankings are commonly used to adjust the estimates of team performance (Noxville, 2015; Gosugamers.net, 2015). These are calculated from a combination of team win rate and subsequent tournament victories. While these rankings are sufficient to rank teams against each other, comparing player performances within these matches remains much more complicated (returning to the insufficient tracking of individual performance). While a team's win rate is a relatively better indication of overall long-term performance than winning a particular tournament or in-game statistics, it is only a hint about the outcome of a particular game.
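Steenhuisen's comparison reduces to two divisions, which is precisely why it cannot see opponent strength: nothing in the calculation refers to who the wins came against. A minimal Python check of the quoted figures:

```python
def win_rate(wins: int, losses: int) -> float:
    """Share of games won; opponent strength plays no part in the calculation."""
    return wins / (wins + losses)


# Records quoted from Steenhuisen (2015) in the text above.
records = {"Cloud9": (217, 167), "MVP Phoenix": (150, 115)}

for team, (wins, losses) in records.items():
    print(f"{team}: {wins}-{losses} -> {win_rate(wins, losses):.2%}")
# Cloud9: 217-167 -> 56.51%
# MVP Phoenix: 150-115 -> 56.60%
```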
[34] At the beginning of the match, both teams take turns choosing (pick) or removing (ban) 5 heroes each from the pool.
[35] http://www.joefkelley.com/dota2chart.html
[36] http://www.datdota.com/heroes.php
Even the most successful professional teams fluctuate below a 70% win rate, and the metric does not take into account influential variables such as the opposing team's standing, current form or the importance of a match (Haigh, 1999:261)[37]. Curiously, Savinoxo runs a betting model that tries to predict future match outcomes by tracking previous match results and selecting profitable bets using the Kelly criterion (Smartdotabetting.com, 2015).
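The Kelly criterion mentioned above sizes each bet from an estimated win probability and the offered odds. Below is a minimal sketch of the standard formula f* = (bp − q) / b, where b is the net decimal odds and q = 1 − p; the probabilities and odds in the example are hypothetical, and nothing here reproduces Savinoxo's actual model.

```python
def kelly_fraction(p_win: float, decimal_odds: float) -> float:
    """Kelly stake as a share of bankroll: f* = (b*p - q) / b, floored at zero.

    p_win is the bettor's own estimate of the win probability and
    b = decimal_odds - 1 is the net profit per unit staked.
    """
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be a probability")
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must exceed 1")
    b = decimal_odds - 1.0
    q = 1.0 - p_win
    return max(0.0, (b * p_win - q) / b)


if __name__ == "__main__":
    # Hypothetical estimates: a 60% favourite offered even money (decimal odds 2.0).
    print(round(kelly_fraction(0.60, 2.0), 3))  # 0.2, i.e. stake 20% of the bankroll
    # A 45% team at the same odds has a negative edge, so no bet is placed.
    print(kelly_fraction(0.45, 2.0))            # 0.0
```

The stake depends entirely on the probability estimate, so such a model is only as good as its prediction of match outcomes, which is where the limitations of win-rate data discussed above come back in.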
A player's performance will naturally fluctuate based on form. The game currently
cannot describe such fluctuations. Winning a tournament should inherently suggest that the
winning team consists of the five best players at the time of the tournament, but with data
being biased towards core team roles there is no way of confirming such a notion. While
the winning team has the highest probability of ranking first in collaborative rubrics (team
execution, cohesion etc.), individual contributions to such estimates cannot be inferred.
[37] Morris' relative match importance theory has not been acknowledged in esports so far.
[38] While professional football players still aim to be the best they can be, clubs pay their players to win trophies (as these bring in the most prize money) and buy popular players to increase non-game revenue (such as TV rights and merchandise).
grasp (Thurnsten, 2015) and thousands more of directed practice to 'master' (Yang and Roberts, 2013). Given the endogenous quality of the attribute, the word 'master' should be taken lightly, as the skill ceiling is an ambiguous concept. Dota 2 is considered one of the games with the steepest learning curve in the history of gaming (Cocilova, 2015). Notice the 'maybe 20%' indicator on the y-axis in the illustrative figure 6 below.[39]
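Figure 6: Illustrative Dota 2 learning curve (y-axis: skill, with a 'maybe 20%' marker; x-axis: time/task; annotated stages: 'learning heroes and items', 'getting good at it', 'when people actually start considering themselves pro'; a separate starting point marked 'start here with prior experience').[40]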
5.1 Organisation
While a high skill level is much more costly for the player to achieve, it is also much more valuable to the competitive scene. These costs and benefits raise the question of whether there is an optimal skill level. Most markets have the property that buyers are diverse in their willingness to pay for quality increases, so a variety of quality levels are sold at different prices (Holt, 2007). A similar concept applies to the player market in Dota, where teams and players look for varying levels of competence in players because their own skill levels are distributed over a spectrum.
[39] "In fact, once you have been playing for over a year you begin to hit the glass wall of mediocrity. Where only those who are talented with either natural skill or hard work can surpass." (Cutsrock, 2014)
[40] http://www.playdota.com/forums/attachment.php?attachmentid=37770&d=1299085266
Similar to what Rapaport (cited in Roth, Sonmez and Unver, 2003) suggested to minimise the elimination of immunologically incompatible volunteer kidney donors, the Dota community independently creates sub-top tournaments in which teams with sub-top collective competence and results compete against like-minded teams to improve, providing evidence for the motivations to play discussed in 2.1. While not as popular to spectate, such organisation promotes the enjoyment of playing the game along with increased personal competition, which can sustain a continuous supply of fresh talent to the highest leagues, similar to NHL farm teams (Latham, Patston and Tippett, 2013).
A player's willingness to join a team is intuitively affected by the difference between her current perceived competence and the potential team's average competence (shown with respect to MMR in the illustrative figure 7 below). A 50% willingness to join represents indifference.
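One way to formalise this willingness is a logistic function of the MMR gap that equals 50% when the gap is zero. Both the functional form and the 300-MMR scale are assumptions for illustration, and figure 7 may have a different shape; here a positive gap means the team is stronger than the player, which is assumed to raise her willingness to join.

```python
import math

SCALE = 300.0  # hypothetical MMR gap over which willingness shifts noticeably


def willingness_to_join(own_mmr: float, team_avg_mmr: float, scale: float = SCALE) -> float:
    """Logistic willingness to join: 0.5 (indifference) when the gap is zero.

    Written as 0.5 * (1 + tanh(x / 2)), which equals 1 / (1 + exp(-x)) but cannot
    overflow for very large gaps. Assumes a stronger team is more attractive.
    """
    gap = team_avg_mmr - own_mmr
    return 0.5 * (1.0 + math.tanh(gap / (2.0 * scale)))


if __name__ == "__main__":
    for team_avg in (2400, 3000, 3600):
        print(team_avg, round(willingness_to_join(3000, team_avg), 2))
```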
When offered a spot on a team, players with relatively lower skill face the temptation to exaggerate their quality, especially if others cannot perceive their endogenous skill level. Skill information can only be acquired and maintained through signals such as previous achievements, community reputation or previous teams' opinions of the player. The competitive organisation of Dota 2 is analogous to an experiment by DeJong, Forsythe and Lundholm (1985), which showed that even with unknown prices a market will not collapse, thanks to a process of reputation building. As a player claims a certain competence and others hold only imperfect information about the claimed competence, a set of acquired reputations prevents the scene from collapsing. Instead, an ascending, vertical skill-based system exists.
5.2 Short-term failure
Nonetheless, some short-term market failures have been observed at the top competence level, where the differences in skill level are already minimal. These notably concerned Jacky Mao, Tal Aizik and Johan Åström, three players who were considered top tier at the time they were, for one reason or another, removed from their teams (No Tidehunter, Team Secret and Cloud9 respectively) and afterwards struggled to find a high-tier team to play for. Without an objective measure of their competence, potential teams assumed the players carried some undesirable attributes and refrained from signing them, analogous to Akerlof's (1970) 'market for lemons'. Mao was even forced to create a new team.
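The lemon-market logic invoked here can be sketched numerically. In this stylised version (an assumption, not a model of the actual transfers), teams cannot observe a free agent's skill and therefore offer terms matching the average skill of players still available; any player whose skill exceeds that offer prefers to wait or form her own team, so the strongest players withdraw first and the pool unravels.

```python
def unravel(qualities: list[float], rounds: int = 20) -> list[float]:
    """Iterate the Akerlof logic on a pool of free agents.

    Each round, teams offer terms equal to the average quality still on the
    market; players worth more than the offer withdraw.
    """
    market = sorted(qualities)
    for _ in range(rounds):
        offer = sum(market) / len(market)
        remaining = [q for q in market if q <= offer]
        if remaining == market:  # nobody else withdraws: equilibrium reached
            break
        market = remaining
    return market


if __name__ == "__main__":
    # Hypothetical skill scores of ten free agents, 1 (weakest) to 10 (strongest).
    pool = list(range(1, 11))
    print(unravel(pool))  # [1]: only the weakest player is left willing to sign
```

An objective skill measure would break this loop by letting teams price each player individually instead of at the pool average, which is the role the proposed rating system is meant to play.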
6. Information asymmetry
Even though information asymmetry is traditionally understood in the economic context of buyers, sellers, price and quality (Holt, 2007), it can be reformulated using simple proxies to apply to the context of Dota. The key assumption of this analogy is that a player's skill level can be observed neither by others nor by the player herself, and its evaluation cannot be quantified. Skill is described as an
Player vs. self: No objective metric of the qualitative individual skill set exists for the player to rate herself on. Along with the ambiguous competence ceiling, the only valid way of assessing her own expertise is to compare her perception of her own competence with what she perceives others' to be. Valve deals with the problem by introducing the MMR metric, which provides players with a means of comparison, while professional players take the matter into their own hands by competing professionally with the ultimate goal of winning The International (analogous to being the best).
Player vs. other players: Following from the inability to evaluate their own skill, players can only guess what competence others have. With the addition of a community, however, a player can compare her perception of another player's competence against the collective perception of that competence. If the community finitely replicates this onto the player herself, some form of collective salience takes place. The resulting cross-referenced meta-perception, held by the community, becomes the focal point of her own competence. Such reasoning would explain why having only limited knowledge of a game is sufficient to judge a player's performance (however poorly). After watching a five-minute section of a football match, any fan familiar with the basic rules can, to some extent, tell which players are performing better than others. All else being equal, the same should apply to Dota players. The information asymmetry takes the form of players being unable to judge others objectively without sufficiently high personal competence.[41]
Professionals vs. casuals: The ability to use the community's salient perception to
judge others 'fairly' with respect to the level of one's own (unknown) competence
suggests that the closer a player gets to achieving the notional competence ceiling,
the more capable she becomes at judging other players objectively. An unintended
information asymmetry therefore develops between the player and all other players
who improve at a slower pace. Professional players who signal their competence via
winning competitive matches should therefore be assumed to possess the best
means to evaluate other players' performance.
Analogous to the driver ability experiment (McCormick, Walkey and Green, 1986), players use (and can expect other players to use) a similar conceptual scheme of what is meant by performance. Research by Sadler and Good (2006) remains optimistic about the evaluation validity and positive effects of introducing peer grading into a community. Piech et al. (2013) conclude that peer assessment offers a promising solution for scaling the grading of complex assignments in massive open online courses.
6.2 The assumptions
that consumers are more likely to think closely about opaque details when making "large, one time choices"
than when making small repeated purchases
curve (figure 4), with all other players distributed to the left. We can therefore suggest using collective saliency as the mean, and professional player evaluation as the standard deviation, of the probability distribution of endogenous player competence, potentially providing a valid evaluation of performance. Combined with the existing tools for quantifying performance, such an approach could yield the desired results described in section 4. It would ideally provide a highly reliable assessment, engage the community and be applicable to a diverse collection of problem settings (Piech et al., 2013).
6.3 The model
Rater bias: every user $u$ is associated with a bias $b_u \in \mathbb{R}$. This variable reflects the user's tendency to under- or overvalue a player in her rating.
Using Bayesian statistics, the model puts prior distributions over the latent variables and assumes that, while an individual user's bias may exist, the average bias of many users is zero:

$$\tau_u \sim \mathcal{G}(\alpha_0, \beta_0)$$
$$b_u \sim \mathcal{N}(0, 1/\eta_0)$$
$$r_p \sim \mathcal{N}(\mu_0, 1/\gamma_0)$$
$$z_u^p \sim \mathcal{N}(r_p + b_u, 1/\tau_u) \quad \text{for every observed (user) rating.}$$

$\mathcal{G}$ refers to a gamma distribution with fixed hyperparameters $\alpha_0$ and $\beta_0$, while $\eta_0$ is the hyperparameter of the prior over biases and $\mu_0$ and $\gamma_0$ are those of the prior over true ratings.
With this model in mind, we set up a survey intended to outline the findings and assumptions gathered throughout the paper.
Data gathering was implemented online via the RankDota website, created for this purpose.⁴⁴ After each specified match had finished, the page parsed selected information on it from the Dota 2 API (Application Programming Interface), namely the time of the match, the team names, their respective sides within the game, the players' names and the hero assigned to each player. The winner of a match was not shown, to reduce the experimenter effect. The site and a plea to participate were advertised during the first two days of the tournament (24 and 25 April) through Twitter⁴⁵ and a Dota-specific subreddit on Reddit⁴⁶. Later matches were not promoted. Users were kept anonymous to avoid the "culture shock" of assuming responsibility for rating (Sadler and Good, 2006).
After accessing the homepage, users were shown a list of finished Starladder games, with the ability to rank the players within every game individually (a match consisting of two games therefore had two separate ranking pages). It is important to note that two different matches were always played at the same time, with live coverage through Twitch⁴⁷ offered only for the more popular of the two. Users had the option to select their current MMR, and a Likert scale ranging from 1 to 10 was presented.

⁴³ http://wiki.teamliquid.net/dota2/Star_Ladder_Star_Series/Season_12
⁴⁴ http://rankdota.co.uk
⁴⁵ https://twitter.com/squartefaghoui/status/591599291065638912
⁴⁶ http://www.reddit.com/r/DotA2/comments/33phzi/launching_rankdota_please_give_it_a_go/ and http://www.reddit.com/r/DotA2/comments/33t330/rankdota_beta_test_day_2/
Users were asked to 'Please rank the performance of the players in this match (10 being the best)'. Performance was presented as an ambiguous concept, without any links to team roles, and no additional description of the scale was given. This let users define performance according to their own beliefs about what constitutes it, and base the rating labels on their own private descriptions of those beliefs. Each numerical rank was a member of a finite set of possible labels for performance. Each user faced one choice problem per player (10 per game), in which there was no incentive to make any particular choice apart from personal bias (Mehta, Starmer and Sugden, 1994). There was no risk associated with choosing a rating. Users' goal was not to coordinate on a similar rank, as no payoff was offered for doing so; it was to evaluate perceived performance as well as they could. Rating coordination would be either coincidental or based on a shared conceptual scheme, eliminating the possibility that a user chooses a rating different from what she believes to be the right label (Sugden, 1995). Apart from performance, no additional rubrics were codified for users to rank, owing to the experimenter's acknowledged secondary-level information deficiencies (Sadler and Good, 2006). The experiment did not tell users which team role each player held, only the heroes they played; general game knowledge and having watched the game are assumed to be sufficient for users to recognise roles. Due to personal coding limitations, duplicate entries could not be prevented. There is also no way of telling whether users answered truthfully, but no explicit or implicit incentive to sabotage the project has been discovered. Professional Dota matches are viewed by players and fans across the whole skill distribution, and intuitively, the more competent a player becomes, the more interest she shows in spectating competitive matches. The number of Starladder viewers fluctuated from 20,000 for early-morning matches to over 250,000 for the final match.
The reasoning behind this experiment was to draw inferences about user behaviour by outlining the findings and assumptions gathered so far, with the ultimate goal of laying the foundations for further research into an optimal performance evaluation tool. The following hypotheses were defined:⁴⁸
H1: A user's MMR affects the rating she assigns to professional players.
H2: A difference exists between the ratings of lower- and higher-MMR users.
H3: The ratings of winners and losers are different.
H4: True rating ($r_p$), rater bias ($b_u$) and rater reliability ($\tau_u$) can be observed.
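H4 asks whether $r_p$, $b_u$ and $\tau_u$ can be recovered from the ratings. As a rough illustration of what that estimation involves, the sketch below computes a maximum a posteriori (MAP) estimate of the three latent variables by coordinate ascent: each step sets one variable to the mode of its conditional posterior under the model above, given the others. This is a simplified stand-in chosen for brevity, not necessarily the inference procedure of Piech et al. (2013); the function name `fit_pg1_map`, the hyperparameter defaults and the toy data are hypothetical. It also presupposes what the survey lacked, namely the ability to link several ratings to the same user.

```python
def fit_pg1_map(observations, n_users, n_players, alpha0=2.0, beta0=1.0,
                eta0=1.0, mu0=5.5, gamma0=0.1, iterations=50):
    """MAP estimates of true ratings r_p, biases b_u and reliabilities tau_u.

    observations: iterable of (user, player, rating) triples.
    Each sweep sets every variable to the mode of its conditional posterior.
    Assumes alpha0 + n_u/2 > 1 for every user, so the tau mode stays positive.
    """
    by_player = [[] for _ in range(n_players)]
    by_user = [[] for _ in range(n_users)]
    for u, p, z in observations:
        by_player[p].append((u, z))
        by_user[u].append((p, z))
    r = [mu0] * n_players
    b = [0.0] * n_users
    tau = [1.0] * n_users
    for _ in range(iterations):
        # r_p | b, tau is Gaussian: the N(mu0, 1/gamma0) prior plus bias-corrected ratings.
        for p, obs in enumerate(by_player):
            precision = gamma0 + sum(tau[u] for u, _ in obs)
            r[p] = (gamma0 * mu0 + sum(tau[u] * (z - b[u]) for u, z in obs)) / precision
        for u, obs in enumerate(by_user):
            if not obs:
                continue
            # b_u | r, tau is Gaussian with prior mean 0, so biases shrink towards zero.
            b[u] = tau[u] * sum(z - r[p] for p, z in obs) / (eta0 + len(obs) * tau[u])
            # tau_u | r, b is Gamma(alpha0 + n/2, rate beta0 + SS/2); take its mode.
            ss = sum((z - r[p] - b[u]) ** 2 for p, z in obs)
            tau[u] = (alpha0 + len(obs) / 2 - 1) / (beta0 + ss / 2)
    return r, b, tau

# Toy data: users with biases +1, 0, -1 rate players whose true ratings are 8, 5, 3.
toy = [(u, p, r + b) for u, b in enumerate([1.0, 0.0, -1.0])
       for p, r in enumerate([8.0, 5.0, 3.0])]
r_hat, b_hat, tau_hat = fit_pg1_map(toy, n_users=3, n_players=3)
print([round(x, 2) for x in r_hat], [round(x, 2) for x in b_hat])
```

On this noiseless toy data the estimates should preserve the ordering of the true ratings and of the biases; the exact values depend on the hypothetical hyperparameters.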
7.1 Findings and interpretation methods
Data was gathered for 7 matches involving a total of 15 games, all taking place on the first two days of the tournament. We collected a total of 7961 responses, of which 6029 were valid (meaning the response had a rank > 0). 1986 responses included the optional MMR choice; these were used for the inference analysis of the effect of a respondent's MMR on their ranking.
We used univariate and multiway statistics, frequency tables and descriptive frequency-distribution methods to evaluate the data. The nonparametric Wilcoxon test was used to infer sample differences, and an analysis of independence was used to describe the enumerative data presented in contingency tables, while graphic methods (box plots, bar graphs and contingency graphs) helped to visualise the data. We used the Statit Custom QC statistical software⁴⁹ and Microsoft Excel to read and process the data. To avoid confusion in the following interpretations, note that the game client has no name for what this paper calls a match; instead, it assigns every game (a subset of a match) a match ID.

⁴⁸ While not an indicator of competence level, MMR correlates with it sufficiently and is the only useful label that is generally recognised.
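For readers who want to reproduce the logic of the rank-sum comparisons, the following is a minimal standard-library Python sketch of the Wilcoxon rank-sum (Mann-Whitney U) test for ordinal data supplied as frequency counts per category, using the normal approximation with a correction for ties. It is not Statit's implementation. The example counts are read off the Appendix A histograms for the 1001, 2001 and 6001 MMR groups; those histograms appear to pool ratings 9 and 10 into a single bar (an assumption, consistent with the overall frequency table), so the resulting statistics differ slightly from the ones reported there.

```python
import math

def rank_sum_test(counts_x, counts_y):
    """Two-sided Wilcoxon rank-sum / Mann-Whitney U test on grouped ordinal data.

    counts_x[k] and counts_y[k] are the numbers of ratings in category k,
    ordered from the lowest to the highest category.
    Returns (U statistic for x, z score, two-sided p-value).
    """
    if len(counts_x) != len(counts_y):
        raise ValueError("both samples must use the same categories")
    n1, n2 = sum(counts_x), sum(counts_y)
    n = n1 + n2
    # U counts, over all (x, y) pairs, 1 when x > y and 0.5 when they tie.
    u = 0.0
    y_below = 0
    for cx, cy in zip(counts_x, counts_y):
        u += cx * (y_below + 0.5 * cy)
        y_below += cy
    mean_u = n1 * n2 / 2
    # Every category is a block of tied values; ties shrink the variance of U.
    tie_term = sum((cx + cy) ** 3 - (cx + cy) for cx, cy in zip(counts_x, counts_y))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u, z, p_two_sided

# Histogram bars from Appendix A, lowest bar first (top bar assumed to pool 9 and 10).
mmr_1001 = [5, 7, 2, 2, 7, 7, 10, 6, 15]
mmr_2001 = [14, 0, 11, 6, 25, 21, 21, 21, 35]
mmr_6001 = [88, 16, 6, 6, 5, 4, 8, 9, 47]

u, z, p = rank_sum_test(mmr_1001, mmr_6001)
print(f"1001 vs 6001: U = {u}, z = {z:.2f}, p = {p:.5f}")
u, z, p = rank_sum_test(mmr_1001, mmr_2001)
print(f"1001 vs 2001: U = {u}, z = {z:.2f}, p = {p:.3f}")
```

For 1001 against 6001 this gives U = 7620, equivalent to a rank sum T = U + n1(n1 + 1)/2 = 9511 for the 1001 group (close to the 9583 reported), with p well below 0.001; the 1001 against 2001 comparison is far from significant, matching the direction of the Appendix A results.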
7.2 Limitations
Using a 10-point Likert scale limited our options for interpreting the data. We therefore did not treat the experiment results as categorical values but as discrete variables, which entitled us to use statistical functions for composite data. We are aware of the deficiency of processing a discrete scale through continuous statistical methods, but doing so greatly simplifies the interpretation, and we do not expect it to have negative implications for the results. We consider the scale to be sufficiently defined in terms of ordinality (1 minimum, 10 maximum). Community responses on the survey promotion page reassured us that respondents understood the scale, even though they considered it too soft. Due to the ambiguity of the scale, different users might have used different weightings (Sugden, 1995), and some users reported being unable to justify one rank over another. We imposed the assumption that player performance remains the same over the course of the game. This assumption is often incorrect but necessary without access to a live match's combat log⁵⁰ (Edge, 2013), which is a limitation of the Dota API.
Measuring performance through a simple 1 to 10 scale also does not allow for more sophisticated responses. Expanding the scale range to 100 would reduce agreement by chance but also dissatisfy users, who would encounter further difficulties with an even softer scale. If we had been able to track specific users, reducing the scale maximum by three points and implementing a weighted kappa to account for chance agreement could have provided sharper results (Abedi, 1996). Offering performance sub-rubrics and allowing users to input free-form text would have given us further insights into users' ratings (Sadler and Good, 2006), although many users might not have been prepared to give solid feedback in English, so a language barrier would have to be accounted for. We were also unable to probe users for post-ranking clarifications ourselves, which led some to contact us directly with their concerns (Watters, 2012). Ling et al. (2005) found that individuals contributed more when given specific challenging goals and reminded of their uniqueness. We replied to every user's comment, but due to the code issues we could not identify users. We were worried the survey might fail due to under-contribution or non-participation from the community. The success of the experiment lay in the proper execution of the promotional campaign (Butler, cited in Chen et al., 2010). No monetary incentives were offered to potential participants, as the Dota community relies on the voluntary contribution of time and effort rather than monetary encouragement. Such a move could have resulted in adverse reputation effects and a skewed sample. Users were instead prompted to rank more than one match, with a reminder that doing so generates a higher public benefit (Chen et al., 2010). If users were tracked, rating would become an impure public good, as users could signal their competence to the community, and we could create a database of rating history and possibly generate future match predictions for users betting on matches.

⁴⁹ Evaluation version
⁵⁰ A chronological in-game list of every interaction involving the player
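The weighted kappa mentioned above compares two raters who scored the same items, discounting the agreement expected by chance and penalising large disagreements more than small ones. A minimal sketch with quadratic weights follows; quadratic weighting is one common choice and is an assumption here (the text does not specify a weighting), as are the function name and example ratings. Applying it would require the user tracking that the survey lacked.

```python
def weighted_kappa(ratings_a, ratings_b, categories):
    """Cohen's weighted kappa with quadratic disagreement weights.

    ratings_a[i] and ratings_b[i] are two raters' scores for the same item i;
    categories lists the ordered scale points (e.g. range(1, 8) for a 1-7 scale).
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equally long, non-empty rating lists")
    cats = list(categories)
    k = len(cats)
    if k < 2:
        raise ValueError("the scale needs at least two categories")
    index = {c: i for i, c in enumerate(cats)}
    n = len(ratings_a)
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1
    row = [sum(r) for r in observed]
    col = [sum(c) for c in zip(*observed)]
    disagreement_obs = disagreement_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2              # 0 on the diagonal, 1 at the extremes
            disagreement_obs += w * observed[i][j]
            disagreement_exp += w * row[i] * col[j] / n  # chance-expected count
    if disagreement_exp == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - disagreement_obs / disagreement_exp

print(weighted_kappa([1, 2, 3, 4], [1, 2, 3, 4], range(1, 8)))  # 1.0: perfect agreement
```

A value of 1 means perfect agreement, 0 means agreement at chance level, and negative values mean systematic disagreement.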
The more popular matches streamed through Twitch were expected to be rated more often than the matches played simultaneously, endangering interpretation accuracy for the unbroadcast match (Ludford et al., 2004). A key challenge in preventing this was to motivate the community members who hold a fan allegiance to such teams. The only measure of success was the sample size for each match, which was also affected by other factors. Frequent users were more likely to encounter the survey promotion and to participate in studies on Dota due to community affinity, creating a volunteer effect and changing their ranking behaviour. This was to some extent controlled by using an MMR proxy (Rosenthal and Rosnow, 2009). There is no way of knowing whether similarly skilled users averaged on a performance rank at most one standard deviation away from the true rating, as expected by the information asymmetry assumptions. Although directly approached, no professional players provided their peer ratings to estimate $r_p$. Further optimisations of the model were therefore not possible.

The bias whereby users of similar MMR do not coordinate on a similar rating because of different fandom preferences would result in a skewed ratings sample for particular fan-favourite players (Sugden, 1995). No such skewness was recorded for any player.

Overall, we wanted to follow the best practice set by Piech et al. (2013), whereby each user's ranking pattern is pre-calibrated and, instead of ranking any number of games, she is assigned a randomly selected list of players to rank (one of whom has previously been ranked by a professional player). The voluntary spirit of the survey and the professional players' lack of response did not allow us to replicate the concept.
7.3 Results
All the processed results can be found in Appendix A; only a selection of results is interpreted in the main text. The sample gathered exhibited the distribution of rating frequencies shown in figure 9. An interesting fact is the abnormal overuse of the maximum rank compared to other values, which suggests respondents could identify the
Figure 9
Figure 10
Figure 11
Figure 12
The analysis of central tendency, shown in figure 13, gives an overall rating mean of 6.2 with a standard deviation of 3.0 across 6028 valid cases. The games are listed in ascending order by time. The spread of responses closely follows which games were broadcast through Twitch, which in turn correlates with team popularity. Generally, the top teams have the most fans and play the games perceived to be of the best quality, so the popularity bias could not be eliminated at this stage. Game 1420622700 had the most responses: a relatively weaker Team Malaysia⁵¹ sent the tournament favourite (Team Secret) into the loser bracket. The game also took place after the promotional post rose to the front page of the Dota subreddit, which supports the survey participation assumption. The inverse is apparent from the small numbers of responses to the last four matches recorded, as the survey was no longer being advertised.
Figure 13
⁵¹ Rank 8 on Gosugamers
The contingency graph below (figure 14) shows the effect of the unexpected game result on ranking with respect to selected MMR. Team Secret curiously received 232 more ratings than Team Malaysia. Respondents who selected their current MMR as 6001+ were considerably more critical of all players than the other MMR groups. Although Secret was the most popular team, its players received low scores from every MMR group.
Additional analysis of central tendency explored the effect of match result on rankings. We report a statistically significant difference (P < 0.001) between the two result states, shown in figures 15 and 16 respectively. This is an unsurprising result, given that winning is the ultimate goal of any competitive game and succeeding usually means performing better. As exceptions, some winning teams did not receive scores as high as others, suggesting that if no exceptionally competent performance can be seen, viewers coordinate on a lower rank. In contrast, game 1422859350 between Vici Gaming and Cloud9 recorded an unusually high mean rating for the losing team. This exception is probably due to the small sample size (36 ratings). More details on the relations in this section can be found in the Wilcoxon tests in Appendix A.
Figure 17 shows the average rating each team received. No emphasis was given to ranking teams in the survey, so the figure only shows the collective ratings of the players in each team. The day after the survey ended, Vici Gaming won the tournament. Although they received few ratings, their performance was rated among the best during the first two days of the tournament. Many fans expected Team Secret to be close contenders, but they had been eliminated with a 1–4 record, and their collective rating reflects the fans' frustration.
Figure 17
H1: The Wilcoxon tests in Appendix A show that ratings from the 6001+ MMR group differ significantly from those of lower MMR groups, which gives us sufficient grounds to accept H1.
H2: According to the same Wilcoxon tests, the ratings of 6001+ MMR players differ significantly from those of the lower MMR groups; we can therefore accept H2.
H3: The findings visualised in figure 14 and the χ² independence results in Appendix A suggest that winners are, on average, given higher ratings than losers (P < 0.001); we can therefore accept H3. The sample is large enough to minimise the assumed popularity bias ($b_u$), as theorised in sections 6.3 and 7.2.
H4: Due to the experiment design, the configuration of circumstances and the data types gathered, we are unable to judge this hypothesis in sufficient depth. The site lacks a sign-in option for users, so we were unable to track the additional factors (apart from MMR and rating given) required to define the features of H4.
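The independence statistics reported for the TeamID by Result table in Appendix A can be reproduced with a short Pearson chi-square computation. In the sketch below, the loss and win counts for every team are summed from the per-game rows of Appendix A (for example, Alliance: 230 + 64 losses and 12 + 28 wins). With these counts the code returns a chi-square of about 3105 on 7 degrees of freedom, a Cramér's V of about 0.718 and a contingency coefficient of about 0.583, in line with the reported values. The function is a plain illustration, not the Statit routine; a p-value would additionally require the chi-square survival function (for instance `scipy.stats.chi2.sf`).

```python
import math

def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts.

    Returns (chi-square, degrees of freedom, Cramer's V, contingency coefficient).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    r, c = len(table), len(col_totals)
    df = (r - 1) * (c - 1)
    cramers_v = math.sqrt(chi2 / (n * min(r - 1, c - 1)))
    contingency = math.sqrt(chi2 / (chi2 + n))
    return chi2, df, cramers_v, contingency

# (losses, wins) per team, summed from the per-game rows of Appendix A.
team_result = {
    "Alliance": (294, 40), "C9": (32, 749), "IG": (0, 280), "LC": (290, 0),
    "MY": (322, 1186), "Secret": (1535, 276), "TT": (703, 234), "VG": (0, 87),
}
chi2, df, v, c = chi_square_independence(list(team_result.values()))
print(f"chi2 = {chi2:.1f}, df = {df}, V = {v:.3f}, C = {c:.3f}")
```

Because the table has only two columns, Cramér's V coincides with the phi coefficient, which is why both are reported as 0.718.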
Competitive gameplay would still benefit from applied study of possible measurement tools, in order to improve further and to provide higher provision for fans, thereby minimising the third-tier information asymmetry.
Even though the Dota community is anonymous compared to the students involved in the original PG1 experiments, it seems to be attracted by the prospect of a common passion. The results suggest that the community's primary concern, that the survey would act only as a popularity contest, was minimised using the relatively small sample size (as stated in 6.3).
There is a long way to go to achieve higher provision via peer evaluation, but probing collective knowledge is a good place to start. We are now developing a second version of the website to optimise data gathering over a longer timeframe and across more tournaments, in order to stratify the data and capture additional parameters that affect ambiguous endogenous competence, aiming to achieve PG1 optimisation.
9. Appendix A
(Due to the margin standards set by the coordination, the graphs section cannot be perfectly centred on the page, for which we apologise. It annoys the author as much as the reader.)
Statistics for raterRating
Mean                  6.190113     Skewness              -0.406538
Std Error Mean        0.038456     Kurtosis              -1.024626
Std Deviation         2.985728     Minimum               1
C.O.V.                0.482338     Maximum               10
Geometric Mean        5.068968     Median                7
Std Error G.Mean      0.04828      Approx SE of Median   0.0644
Harmonic Mean         3.582832     IQR                   5
Std Error H.Mean      0.048175     IQR/Median            0.714286
Q1                    4            Range                 9
Q3                    9            Midrange              5.5
Valid cases           6028         Missing cases         0
Frequency distribution of raterRating
Interval   Frequency   Percent   Cumulative percent
0–1           780       12.940       12.940
1–2           238        3.948       16.888
2–3           300        4.977       21.865
3–4           379        6.287       28.152
4–5           555        9.207       37.359
5–6           652       10.816       48.175
6–7           736       12.210       60.385
7–8           731       12.127       72.512
8–9           615       10.202       82.714
9–10         1042       17.286      100.000
Total        6028      100.000
Valid cases = 6028
Missing cases = 0
TeamID by Result (each cell: count / % of total / row % / column %)
Team       Loss                            Win                             Total
Alliance   294 / 4.88 / 88.02 / 9.26       40 / 0.66 / 11.98 / 1.40        334 / 5.54
C9         32 / 0.53 / 4.10 / 1.01         749 / 12.43 / 95.90 / 26.26     781 / 12.96
IG         0 / 0.00 / 0.00 / 0.00          280 / 4.64 / 100.00 / 9.82      280 / 4.64
LC         290 / 4.81 / 100.00 / 9.13      0 / 0.00 / 0.00 / 0.00          290 / 4.81
MY         322 / 5.34 / 21.35 / 10.14      1186 / 19.67 / 78.65 / 41.58    1508 / 25.02
Secret     1535 / 25.46 / 84.76 / 48.33    276 / 4.58 / 15.24 / 9.68       1811 / 30.04
TT         703 / 11.66 / 75.03 / 22.13     234 / 3.88 / 24.97 / 8.20       937 / 15.54
VG         0 / 0.00 / 0.00 / 0.00          87 / 1.44 / 100.00 / 3.05       87 / 1.44
Total      3176 / 52.69                    2852 / 47.31                    6028 / 100.00

Statistic                      df   Value      P
Chi-Square                     7    3104.988   0.000
Likelihood Ratio Chi-Square    7    3663.826   0.000
Mantel-Haenszel Chi-Square     1    568.236    0.000
Phi Coefficient                     0.718
Contingency Coefficient             0.583
Cramer's V                          0.718
Missing cases = 0
raterRating by MatchID
MatchID       Freq   Mean   Std Deviation   Valid cases
All games     6028   6.2    3.0             6028
1420140358     799   6.8    2.6              799
1420142374      76   5.8    3.5               76
1420243583      36   5.7    4.0               36
1420367758     653   5.4    2.9              653
1420622700    2148   6.2    3.0             2148
1420633610     379   5.6    3.5              379
1420816727     598   6.5    2.7              598
1420832492      89   4.3    3.1               89
1421007628     518   6.4    2.6              518
1421280530     420   6.2    3.1              420
1421421139     154   6.0    3.2              154
1422406844      32   5.4    2.9               32
1422527977      63   6.3    3.2               63
1422859350      36   8.0    2.3               36
1423036681      27   7.2    1.5               27
raterRating by MatchID and Result
MatchID       Result   Freq   Mean   Std Deviation   Valid cases
1420140358    loss      352   5.9    2.7              352
              win       447   7.6    2.3              447
1420142374    loss       40   4.4    3.2               40
              win        36   7.4    3.2               36
1420243583    loss       16   4.2    3.9               16
              win        20   6.9    3.8               20
1420367758    loss      351   3.8    2.1              351
              win       302   7.3    2.6              302
1420622700    loss     1190   4.9    2.7             1190
              win       958   7.8    2.6              958
1420633610    loss      195   5.2    3.5              195
              win       184   5.9    3.4              184
1420816727    loss      322   5.9    2.6              322
              win       276   7.3    2.6              276
1420832492    loss       39   4.7    3.1               39
              win        50   4.0    3.1               50
1421007628    loss      290   5.1    2.3              290
              win       228   8.1    2.0              228
1421280530    loss      230   5.4    3.0              230
              win       190   7.3    2.8              190
1421421139    loss       64   4.6    3.0               64
              win        90   7.0    2.9               90
1422406844    loss       20   5.3    3.0               20
              win        12   5.5    2.8               12
1422527977    loss       35   5.6    3.2               35
              win        28   7.1    3.2               28
1422859350    loss       20   7.4    2.8               20
              win        16   8.8    1.4               16
1423036681    loss       12   7.0    1.7               12
              win        15   7.4    1.4               15
raterRating (loss) vs raterRating (win)
N = 3176 (loss), N = 2852 (win)
Histogram bar counts, top (highest rating) to bottom:
  loss: 424, 194, 358, 439, 461, 339, 270, 148, 543
  win: 1233, 537, 378, 213, 94, 40, 30, 90, 237
Sample medians: 5 (loss), 8 (win)
Approximate Standard Error: 0.071 (loss), 0.0562 (win)
Sample size: 3176 (loss), 2852 (win)
raterRating by MatchID, Result and TeamID
MatchID       Result   TeamID     Freq   Mean   Std Deviation   Valid cases
1420140358    loss     TT          352   5.9    2.7              352
              win      C9          447   7.6    2.3              447
1420142374    loss     LC           40   4.4    3.2               40
              win      VG           36   7.4    3.2               36
1420243583    loss     LC           16   4.2    3.9               16
              win      VG           20   6.9    3.8               20
1420367758    loss     TT          351   3.8    2.1              351
              win      C9          302   7.3    2.6              302
1420622700    loss     Secret     1190   4.9    2.7             1190
              win      MY          958   7.8    2.6              958
1420633610    loss     LC          195   5.2    3.5              195
              win      TT          184   5.9    3.4              184
1420816727    loss     MY          322   5.9    2.6              322
              win      Secret      276   7.3    2.6              276
1420832492    loss     LC           39   4.7    3.1               39
              win      TT           50   4.0    3.1               50
1421007628    loss     Secret      290   5.1    2.3              290
              win      MY          228   8.1    2.0              228
1421280530    loss     Alliance    230   5.4    3.0              230
              win      IG          190   7.3    2.8              190
1421421139    loss     Alliance     64   4.6    3.0               64
              win      IG           90   7.0    2.9               90
1422406844    loss     Secret       20   5.3    3.0               20
              win      Alliance     12   5.5    2.8               12
1422527977    loss     Secret       35   5.6    3.2               35
              win      Alliance     28   7.1    3.2               28
1422859350    loss     C9           20   7.4    2.8               20
              win      VG           16   8.8    1.4               16
1423036681    loss     C9           12   7.0    1.7               12
              win      VG           15   7.4    1.4               15
raterRating by TeamID
TeamID      Freq   Mean   Std Deviation   Valid cases
Alliance     334   5.4    3.1              334
C9           781   7.5    2.4              781
IG           280   7.2    2.8              280
LC           290   5.0    3.4              290
MY          1508   7.5    2.6             1508
Secret      1811   5.3    2.8             1811
TT           937   5.0    2.9              937
VG            87   7.5    2.9               87
raterRating by MatchID, Result, TeamID and raterMMR
MatchID       Result   TeamID     raterMMR   Freq   Mean   Std Deviation   Valid cases
1420140358    loss     TT         3001         36    7.0   1.8              36
                                  4001         40    5.5   2.7              40
                                  5001         32    6.2   2.4              32
                                  6001          4    7.8   1.3               4
              win      C9         3001         45    7.6   1.4              45
                                  4001         50    8.4   1.4              50
                                  5001         41    7.3   2.6              41
                                  6001          5    7.6   1.5               5
1420142374    loss     LC         3001          5    3.4   2.5               5
                                  4001         10    5.7   1.9              10
                                  5001          5    1.0   0.0               5
              win      VG         3001          4    6.8   1.5               4
                                  4001          8    6.1   2.9               8
                                  5001          4    1.0   0.0               4
1420367758    loss     TT         1001         15    5.6   1.6              15
                                  2001         11    4.9   1.8              11
                                  3001         15    5.1   1.0              15
                                  4001         40    4.1   2.4              40
                                  5001         25    4.7   1.8              25
                                  6001          5    4.2   0.4               5
              win      C9         1001         12    7.8   1.2              12
                                  2001          8    7.5   1.4               8
                                  3001         16    7.9   1.1              16
                                  4001         31    7.9   2.7              31
                                  5001         21    8.1   1.0              21
                                  6001          4    8.0   0.8               4
1420622700    loss     Secret     1001          9    1.4   0.5               9
                                  2001         25    5.4   1.7              25
                                  3001        132    5.6   2.5             132
                                  4001        135    5.1   2.7             135
                                  5001         67    5.2   2.7              67
                                  6001         50    3.1   3.5              50
              win      MY         1001          7    6.6   4.3               7
                                  2001         20    9.0   1.1              20
                                  3001        113    8.3   2.0             113
                                  4001        108    8.3   1.8             108
                                  5001         49    6.6   2.6              49
                                  6001         40    5.8   4.1              40
1420633610    loss     LC         2001         15    3.9   2.6              15
                                  3001         18    4.3   2.2              18
                                  4001         10    5.5   4.7              10
                                  5001         10    4.1   1.5              10
                                  6001         20    3.1   3.6              20
              win      TT         2001         12    4.8   3.0              12
                                  3001         16    7.4   1.8              16
                                  4001         12    5.2   3.9              12
                                  5001          8    8.1   1.5               8
                                  6001         16    1.0   0.0              16
1420816727    loss     MY         2001         15    5.9   1.6              15
                                  3001         15    6.9   2.4              15
                                  4001         30    5.7   1.7              30
                                  5001         10    6.7   3.6              10
                                  6001          5    4.4   1.5               5
              win      Secret     2001         12    7.8   1.3              12
                                  3001         12    8.1   1.3              12
                                  4001         32    7.9   1.5              32
                                  5001          8    8.6   1.8               8
                                  6001          4    8.8   0.5               4
1420832492    loss     LC         1001          4    5.5   1.3               4
                                  6001          4    2.2   0.5               4
              win      TT         1001          5    6.2   1.3               5
                                  6001          5    1.4   0.5               5
1421007628    loss     Secret     1001          5    9.2   0.8               5
                                  2001          5    6.2   2.2               5
                                  3001         35    4.7   1.8              35
                                  4001         40    5.1   1.6              40
                                  6001          5    4.0   2.4               5
              win      MY         1001          4   10.0   0.0               4
                                  2001          4    7.5   2.9               4
                                  3001         28    7.5   1.6              28
                                  4001         32    7.6   1.3              32
                                  6001          4    7.2   2.1               4
1421280530    loss     Alliance   2001         10    8.5   1.5              10
                                  3001         15    3.7   2.0              15
                                  4001         25    5.0   1.7              25
                                  5001         25    4.6   3.3              25
                                  6001          5    1.0   0.0               5
              win      IG         2001          8    4.8   4.1               8
                                  3001         12    6.8   2.2              12
                                  4001         20    8.1   1.3              20
                                  5001         20    8.2   1.9              20
                                  6001          4    5.5   5.2               4
1421421139    loss     Alliance   3001          8    3.7   2.0               8
                                  5001         12    2.7   2.5              12
              win      IG         3001         10    7.5   1.2              10
                                  5001         15    6.5   4.1              15
1422406844    loss     Secret     6001          –    9.0   0.0               –
              win      Alliance   6001          –    2.0   0.0               –
1422527977    loss     Secret     2001          5    4.8   2.0               5
                                  4001         10    2.2   1.7              10
                                  5001         15    7.5   2.7              15
              win      Alliance   2001          4    8.2   1.3               4
                                  4001          8    6.2   2.1               8
                                  5001         12    6.8   4.3              12
1422859350    loss     C9         3001          5    6.4   0.5               5
                                  5001          5   10.0   0.0               5
              win      VG         3001          4    7.5   1.0               4
                                  5001          4   10.0   0.0               4
1423036681    loss     C9         3001          –    7.2   2.2               –
              win      VG         3001          5    6.8   1.3               5
Note:
raterRating by raterMMR
raterMMR   Freq   Mean   Std Deviation   Valid cases
1001         61   6.1    2.9               61
2001        154   6.3    2.6              154
3001        553   6.6    2.4              553
4001        641   6.4    2.7              641
5001        388   6.2    3.0              388
6001        189   4.2    3.7              189
raterRating (1001) vs raterRating (2001)
N = 61 (1001), N = 154 (2001)
Histogram bar counts, top (highest rating) to bottom:
  1001: 15, 6, 10, 7, 7, 2, 2, 7, 5
  2001: 35, 21, 21, 21, 25, 6, 11, 0, 14
There is no significant difference between the samples.
If in fact the populations were THE SAME, the chance of this much evidence of greater values for variable raterRating (2001) than for variable raterRating (1001) would be approximately:
One-sided P-value = 0.4512.
The chance of this much evidence in EITHER direction would be twice that value, that is:
Two-sided P-value = 0.9025.
Sum of ranks for variable raterRating (1001): T = 6537.5
Sample medians: 7 (1001), 6.5 (2001)
Approximate Standard Error: 0.5762 (1001), 0.2417 (2001)
Sample size: 61 (1001), 154 (2001)
raterRating (1001) vs raterRating (3001)
N = 61 (1001), N = 553 (3001)
Histogram bar counts, top (highest rating) to bottom:
  1001: 15, 6, 10, 7, 7, 2, 2, 7, 5
  3001: 123, 97, 86, 91, 63, 29, 26, 5, 33
There is no significant difference between the samples.
If in fact the populations were THE SAME, the chance of this much evidence of greater values for variable raterRating (3001) than for variable raterRating (1001) would be approximately:
One-sided P-value = 0.1645.
The chance of this much evidence in EITHER direction would be twice that value, that is:
Two-sided P-value = 0.3289.
Sum of ranks for variable raterRating (1001): T = 17485
Sample medians: 7 (1001), 7 (3001)
Approximate Standard Error: 0.5762 (1001), 0.1276 (3001)
Sample size: 61 (1001), 553 (3001)
raterRating (1001) vs raterRating (4001)
N = 61 (1001), N = 641 (4001)
Histogram bar counts, top (highest rating) to bottom:
  1001: 15, 6, 10, 7, 7, 2, 2, 7, 5
  4001: 168, 82, 81, 82, 74, 56, 25, 21, 52
There is no significant difference between the samples.
If in fact the populations were THE SAME, the chance of this much evidence of greater values for variable raterRating (4001) than for variable raterRating (1001) would be approximately:
One-sided P-value = 0.3377.
The chance of this much evidence in EITHER direction would be twice that value, that is:
Two-sided P-value = 0.6753.
Sum of ranks for variable raterRating (1001): T = 20811.5
Sample medians: 7 (1001), 7 (4001)
Approximate Standard Error: 0.5762 (1001), 0.158 (4001)
Sample size: 61 (1001), 641 (4001)
raterRating (1001) vs raterRating (5001)
N = 61 (1001), N = 388 (5001)
Histogram bar counts, top (highest rating) to bottom:
  1001: 15, 6, 10, 7, 7, 2, 2, 7, 5
  5001: 108, 59, 37, 30, 39, 32, 20, 9, 54
There is no significant difference between the samples.
If in fact the populations were THE SAME, the chance of this much evidence of greater values for variable raterRating (5001) than for variable raterRating (1001) would be approximately:
One-sided P-value = 0.4323.
The chance of this much evidence in EITHER direction would be twice that value, that is:
Two-sided P-value = 0.8645.
Sum of ranks for variable raterRating (1001): T = 13565
Sample medians: 7 (1001), 7 (5001)
Approximate Standard Error: 0.5762 (1001), 0.2538 (5001)
Sample size: 61 (1001), 388 (5001)
raterRating (1001) vs raterRating (6001)
N = 61 (1001), N = 189 (6001)
Histogram bar counts, top (highest rating) to bottom:
  1001: 15, 6, 10, 7, 7, 2, 2, 7, 5
  6001: 47, 9, 8, 4, 5, 6, 6, 16, 88
There is a significant difference between the samples.
If in fact the populations were THE SAME, the chance of this much evidence of greater values for variable raterRating (1001) than for variable raterRating (6001) would be approximately:
One-sided P-value = 0.0000.
The chance of this much evidence in EITHER direction would be twice that value, that is:
Two-sided P-value = 0.0001.
Sum of ranks for variable raterRating (1001): T = 9583
Sample medians: 7 (1001), 2 (6001)
Approximate Standard Error: 0.5762 (1001), 0.5455 (6001)
Sample size: 61 (1001), 189 (6001)
raterRating (2001) vs raterRating (3001)
N = 154 (2001), N = 553 (3001)
Histogram bar counts, top (highest rating) to bottom:
  2001: 35, 21, 21, 21, 25, 6, 11, 0, 14
  3001: 123, 97, 86, 91, 63, 29, 26, 5, 33
There is a significant difference between the samples.
If in fact the populations were THE SAME, the chance of this much evidence of greater values for variable raterRating (3001) than for variable raterRating (2001) would be approximately:
One-sided P-value = 0.0951.
The chance of this much evidence in EITHER direction would be twice that value, that is:
Two-sided P-value = 0.1902.
Sum of ranks for variable raterRating (2001): T = 51605.5 (expected T = 54516, standard deviation of T = 2221.6147)
Sample medians: 6.5 (2001), 7 (3001)
Approximate Standard Error: 0.2417 (2001), 0.1276 (3001)
Sample size: 154 (2001), 553 (3001)
raterRating (2001) vs raterRating (4001)
N = 154 (2001), N = 641 (4001)
Histogram bar counts, top (highest rating) to bottom:
  2001: 35, 21, 21, 21, 25, 6, 11, 0, 14
  4001: 168, 82, 81, 82, 74, 56, 25, 21, 52
There is no significant difference between the samples.
If in fact the populations were THE SAME, the chance of this much evidence of greater values for variable raterRating (4001) than for variable raterRating (2001) would be approximately:
One-sided P-value = 0.3208.
The chance of this much evidence in EITHER direction would be twice that value, that is:
Two-sided P-value = 0.6417.
Sum of ranks for variable raterRating (2001): T = 60109
Sample medians: 6.5 (2001), 7 (4001)
Approximate Standard Error: 0.2417 (2001), 0.158 (4001)
Sample size: 154 (2001), 641 (4001)
raterRating (2001) vs raterRating (5001)
N = 154 (2001), N = 388 (5001)
Histogram bar counts, top (highest rating) to bottom:
  2001: 35, 21, 21, 21, 25, 6, 11, 0, 14
  5001: 108, 59, 37, 30, 39, 32, 20, 9, 54
There is no significant difference between the samples.
If in fact the populations were THE SAME, the chance of this much evidence of greater values for variable raterRating (5001) than for variable raterRating (2001) would be approximately:
One-sided P-value = 0.4418.
The chance of this much evidence in EITHER direction would be twice that value, that is:
Two-sided P-value = 0.8836.
Sum of ranks for variable raterRating (2001): T = 41571.5
Sample medians: 6.5 (2001), 7 (5001)
Approximate Standard Error: 0.2417 (2001), 0.2538 (5001)
Sample size: 154 (2001), 388 (5001)
[Figure: back-to-back histograms of raterRating (2001), N = 154, and raterRating (6001), N = 189; vertical axis from 0 to 10]
There is a significant difference between the samples.
If in fact the populations were THE SAME, the chance of this much evidence
of greater values for variable raterRating (2001) than for variable
raterRating (6001) would be approximately:
One-sided P-value = 0.0000.
The chance of this much evidence in EITHER direction would be twice that
value, that is:
Two-sided P-value = 0.0000.
Sum of ranks for variable raterRating (2001): T = 31273.5
                            raterRating (2001)  raterRating (6001)
Sample median               6.5                 2
Approximate standard error  0.2417              0.5455
Sample size                 154                 189
[Figure: back-to-back histograms of raterRating (3001), N = 553, and raterRating (4001), N = 641; vertical axis from 0 to 10]
There is no significant difference between the samples.
If in fact the populations were THE SAME, the chance of this much evidence
of greater values for variable raterRating (3001) than for variable
raterRating (4001) would be approximately:
One-sided P-value = 0.1110.
The chance of this much evidence in EITHER direction would be twice that
value, that is:
Two-sided P-value = 0.2220.
Sum of ranks for variable raterRating (3001): T = 337616.5
                            raterRating (3001)  raterRating (4001)
Sample median               7                   7
Approximate standard error  0.1276              0.158
Sample size                 553                 641
[Figure: back-to-back histograms of raterRating (3001), N = 553, and raterRating (5001), N = 388; vertical axis from 0 to 10]
There is no significant difference between the samples.
If in fact the populations were THE SAME, the chance of this much evidence
of greater values for variable raterRating (3001) than for variable
raterRating (5001) would be approximately:
One-sided P-value = 0.1020.
The chance of this much evidence in EITHER direction would be twice that
value, that is:
Two-sided P-value = 0.2040.
Sum of ranks for variable raterRating (5001): T = 177576.5
                            raterRating (3001)  raterRating (5001)
Sample median               7                   7
Approximate standard error  0.1276              0.2538
Sample size                 553                 388
[Figure: back-to-back histograms of raterRating (3001), N = 553, and raterRating (6001), N = 189; vertical axis from 0 to 10]
There is a significant difference between the samples.
If in fact the populations were THE SAME, the chance of this much evidence
of greater values for variable raterRating (3001) than for variable
raterRating (6001) would be approximately:
One-sided P-value = 0.0000.
The chance of this much evidence in EITHER direction would be twice that
value, that is:
Two-sided P-value = 0.0000.
Sum of ranks for variable raterRating (6001): T = 50947
                            raterRating (3001)  raterRating (6001)
Sample median               7                   2
Approximate standard error  0.1276              0.5455
Sample size                 553                 189
[Figure: back-to-back histograms of raterRating (4001), N = 641, and raterRating (5001), N = 388; vertical axis from 0 to 10]
There is no significant difference between the samples.
If in fact the populations were THE SAME, the chance of this much evidence
of greater values for variable raterRating (4001) than for variable
raterRating (5001) would be approximately:
One-sided P-value = 0.3288.
The chance of this much evidence in EITHER direction would be twice that
value, that is:
Two-sided P-value = 0.6576.
Sum of ranks for variable raterRating (5001): T = 197786
                            raterRating (4001)  raterRating (5001)
Sample median               7                   7
Approximate standard error  0.158               0.2538
Sample size                 641                 388
[Figure: back-to-back histograms of raterRating (4001), N = 641, and raterRating (6001), N = 189; vertical axis from 0 to 10]
There is a significant difference between the samples.
If in fact the populations were THE SAME, the chance of this much evidence
of greater values for variable raterRating (4001) than for variable
raterRating (6001) would be approximately:
One-sided P-value = 0.0000.
The chance of this much evidence in EITHER direction would be twice that
value, that is:
Two-sided P-value = 0.0000.
Sum of ranks for variable raterRating (6001): T = 57416 (expected value under the null hypothesis = 78529.5; standard deviation = 2875.1073)
                            raterRating (4001)  raterRating (6001)
Sample median               7                   2
Approximate standard error  0.158               0.5455
Sample size                 641                 189
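The figure 78529.5 printed after T matches the null expectation of the rank sum, E(T) = n1(N + 1)/2, with n1 = 189 and N = 830. The sketch below reproduces it. The tie-free standard deviation, sqrt(n1 n2 (N + 1)/12), comes out at about 2896.5, a little above the printed 2875.1073; that gap is consistent with a tie correction in the package, though the output does not say so. The function name is hypothetical.

```python
import math


def rank_sum_null_moments(n1: int, n2: int) -> tuple[float, float]:
    """Mean and tie-free standard deviation of sample 1's rank sum under H0."""
    n = n1 + n2
    mean = n1 * (n + 1) / 2
    sd = math.sqrt(n1 * n2 * (n + 1) / 12)
    return mean, sd


# raterRating (6001), n1 = 189, ranked with raterRating (4001), n2 = 641
mean, sd = rank_sum_null_moments(189, 641)
print(mean)          # 78529.5
print(round(sd, 1))  # 2896.5 (no tie correction)
```

The same call with (154, 553) returns 54516.0, the figure printed after T in the 2001 vs 3001 comparison.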
[Figure: back-to-back histograms of raterRating (5001), N = 388, and raterRating (6001), N = 189; vertical axis from 0 to 10]
There is a significant difference between the samples.
If in fact the populations were THE SAME, the chance of this much evidence
of greater values for variable raterRating (5001) than for variable
raterRating (6001) would be approximately:
One-sided P-value = 0.0000.
The chance of this much evidence in EITHER direction would be twice that
value, that is:
Two-sided P-value = 0.0000.
Sum of ranks for variable raterRating (6001): T = 43087
                            raterRating (5001)  raterRating (6001)
Sample median               7                   2
Approximate standard error  0.2538              0.5455
Sample size                 388                 189
Value   Frequency   Percent   Cumulative frequency   Cumulative percent
0            4042     67.05                   4042                67.05
1001           61      1.01                   4103                68.07
2001          154      2.55                   4257                70.62
3001          553      9.17                   4810                79.79
4001          641     10.63                   5451                90.43
5001          388      6.44                   5839                96.86
6001          189      3.14                   6028               100.00
Missing cases = 0
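The percentage columns follow directly from the frequencies: each percent is the frequency over the 6028 cases, and the cumulative columns are running totals. A short sketch that rebuilds the table; reading the four numeric columns as frequency, percent, cumulative frequency and cumulative percent is an inference from the arithmetic, and the function name is hypothetical.

```python
from itertools import accumulate


def frequency_table(values, freqs):
    """Rows of (value, frequency, percent, cumulative frequency, cumulative percent)."""
    n = sum(freqs)
    return [
        (v, f, round(100 * f / n, 2), c, round(100 * c / n, 2))
        for v, f, c in zip(values, freqs, accumulate(freqs))
    ]


values = [0, 1001, 2001, 3001, 4001, 5001, 6001]
freqs = [4042, 61, 154, 553, 641, 388, 189]
for row in frequency_table(values, freqs):
    print(*row)
```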
Row totals, rows 1 to 8 (frequency, percent): 246 (12.39), 58 (2.92), 90 (4.53), 131 (6.60), 213 (10.73), 235 (11.83), 243 (12.24), 274 (13.80)
Row   Statistic      1001    2001    3001    4001    5001    6001    Total
9     Frequency                                                        191
      Percent        0.15    0.86    2.32    3.42    1.91    0.96     9.62
      Row Pct        1.57    8.90   24.08   35.60   19.90    9.95
      Col Pct        4.92   11.04    8.32   10.61    9.79   10.05
10    Frequency        12      18      77     100      70      28      305
      Percent        0.60    0.91    3.88    5.04    3.52    1.41    15.36
      Row Pct        3.93    5.90   25.25   32.79   22.95    9.18
      Col Pct       19.67   11.69   13.92   15.60   18.04   14.81
Total Frequency        61     154     553     641     388     189     1986
      Percent        3.07    7.75   27.84   32.28   19.54    9.52   100.00
Statistic                      DF      Value    Prob
Chi-Square                     45    368.225   0.000
Likelihood Ratio Chi-Square    45    313.363   0.000
Mantel-Haenszel Chi-Square      1     44.031   0.000
Phi Coefficient                        0.431
Contingency Coefficient                0.395
Cramer's V                             0.193
Warning:
Note:
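The phi coefficient, contingency coefficient and Cramer's V are functions of the Pearson chi-square statistic, the number of observations n and the table dimensions. A sketch using the standard definitions; treating the table as 10 rows by 6 columns is an assumption inferred from DF = 45 = (10 - 1)(6 - 1) and the six column totals, and the function name is hypothetical.

```python
import math


def association_measures(chi2: float, n: int, rows: int, cols: int) -> dict:
    """Chi-square-based association measures for an r x c contingency table."""
    return {
        "df": (rows - 1) * (cols - 1),
        "phi": math.sqrt(chi2 / n),
        "contingency": math.sqrt(chi2 / (chi2 + n)),
        "cramers_v": math.sqrt(chi2 / (n * (min(rows, cols) - 1))),
    }


m = association_measures(368.225, 1986, rows=10, cols=6)
print({k: round(v, 3) for k, v in m.items()})
```

The same function with (209.189, 1056) and (270.249, 930) reproduces the phi and Cramer's V values of the two smaller tables (0.445 and 0.199; 0.539 and 0.241).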
Row totals, rows 1 to 6 (frequency, percent): 171 (16.19), 45 (4.26), 84 (7.95), 114 (10.80), 172 (16.29), 154 (14.58)
Row   Statistic      1001    2001    3001    4001    5001    6001    Total
6     Col Pct       15.15   16.28   20.14   15.29   10.68    2.91
7     Frequency         4      13      42      34      14       1      108
      Percent        0.38    1.23    3.98    3.22    1.33    0.09    10.23
      Row Pct        3.70   12.04   38.89   31.48   12.96    0.93
      Col Pct       12.12   15.12   14.58   10.00    6.80    0.97
8     Frequency         1       8      25      26      21       2       83
      Percent        0.09    0.76    2.37    2.46    1.99    0.19     7.86
      Row Pct        1.20    9.64   30.12   31.33   25.30    2.41
      Col Pct        3.03    9.30    8.68    7.65   10.19    1.94
9     Frequency         3       1       7       9      13      10       43
      Percent        0.28    0.09    0.66    0.85    1.23    0.95     4.07
      Row Pct        6.98    2.33   16.28   20.93   30.23   23.26
      Col Pct        9.09    1.16    2.43    2.65    6.31    9.71
10    Frequency         2       6      19      21      23      11       82
      Percent        0.19    0.57    1.80    1.99    2.18    1.04     7.77
      Row Pct        2.44    7.32   23.17   25.61   28.05   13.41
      Col Pct        6.06    6.98    6.60    6.18   11.17   10.68
Total Frequency        33      86     288     340     206     103     1056
      Percent        3.12    8.14   27.27   32.20   19.51    9.75   100.00
Statistic                      DF      Value    Prob
Chi-Square                     45    209.189   0.000
Likelihood Ratio Chi-Square    45    202.435   0.000
Mantel-Haenszel Chi-Square      1     17.971   0.000
Phi Coefficient                        0.445
Contingency Coefficient                0.407
Cramer's V                             0.199
Warning:
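Each body cell in these cross-tabulations stacks four numbers: the frequency, its percent of n, its row percent and its column percent (labels inferred from the arithmetic). A sketch that reproduces row 7 of the n = 1056 table from its frequencies and the column totals 33, 86, 288, 340, 206 and 103; the function name is hypothetical.

```python
def cell_percentages(row_counts, col_totals, n):
    """Percent of n, row percent and column percent for one row of a cross-tab."""
    row_total = sum(row_counts)
    percent = [round(100 * c / n, 2) for c in row_counts]
    row_pct = [round(100 * c / row_total, 2) for c in row_counts]
    col_pct = [round(100 * c / t, 2) for c, t in zip(row_counts, col_totals)]
    return row_total, percent, row_pct, col_pct


row7 = [4, 13, 42, 34, 14, 1]
col_totals = [33, 86, 288, 340, 206, 103]
total, percent, row_pct, col_pct = cell_percentages(row7, col_totals, 1056)
print(total)    # 108
print(percent)  # [0.38, 1.23, 3.98, 3.22, 1.33, 0.09]
print(row_pct)  # [3.7, 12.04, 38.89, 31.48, 12.96, 0.93]
print(col_pct)  # [12.12, 15.12, 14.58, 10.0, 6.8, 0.97]
```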
Row totals, rows 1 to 4 (frequency, percent): 75 (8.06), 13 (1.40), 6 (0.65), 17 (1.83)
Row   Statistic      1001    2001    3001    4001    5001    6001    Total
5     Frequency         2       4       9      13      11       2       41
      Percent        0.22    0.43    0.97    1.40    1.18    0.22     4.41
      Row Pct        4.88    9.76   21.95   31.71   26.83    4.88
      Col Pct        7.14    5.88    3.40    4.32    6.04    2.33
6     Frequency         2       7      33      30       8       1       81
      Percent        0.22    0.75    3.55    3.23    0.86    0.11     8.71
      Row Pct        2.47    8.64   40.74   37.04    9.88    1.23
      Col Pct        7.14   10.29   12.45    9.97    4.40    1.16
7     Frequency         6       8      44      47      23       7      135
      Percent        0.65    0.86    4.73    5.05    2.47    0.75    14.52
      Row Pct        4.44    5.93   32.59   34.81   17.04    5.19
      Col Pct       21.43   11.76   16.60   15.61   12.64    8.14
8     Frequency         5      13      72      56      38       7      191
      Percent        0.54    1.40    7.74    6.02    4.09    0.75    20.54
      Row Pct        2.62    6.81   37.70   29.32   19.90    3.66
      Col Pct       17.86   19.12   27.17   18.60   20.88    8.14
9     Frequency         0      16      39      59      25       9      148
      Percent        0.00    1.72    4.19    6.34    2.69    0.97    15.91
      Row Pct        0.00   10.81   26.35   39.86   16.89    6.08
      Col Pct        0.00   23.53   14.72   19.60   13.74   10.47
10    Frequency        10      12      58      79      47      17      223
      Percent        1.08    1.29    6.24    8.49    5.05    1.83    23.98
      Row Pct        4.48    5.38   26.01   35.43   21.08    7.62
      Col Pct       35.71   17.65   21.89   26.25   25.82   19.77
Total Frequency        28      68     265     301     182      86      930
      Percent        3.01    7.31   28.49   32.37   19.57    9.25   100.00
Statistic                      DF      Value    Prob
Chi-Square                     45    270.249   0.000
Likelihood Ratio Chi-Square    45    217.086   0.000
Mantel-Haenszel Chi-Square      1     37.715   0.000
Phi Coefficient                        0.539
Contingency Coefficient                0.475
Cramer's V                             0.241
Warning:
Note:
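The Mantel-Haenszel chi-square (1 DF) measures linear association between the row and column scores. If this output comes from SAS PROC FREQ, as its statistic names suggest (an assumption), that statistic equals (n - 1)r^2, where r is the Pearson correlation between row and column scores, so sqrt(Q_MH / (n - 1)) gives the size, though not the sign, of the implied correlation. A sketch applying that identity to the three tables above; the function name is hypothetical.

```python
import math


def implied_abs_correlation(q_mh: float, n: int) -> float:
    """|r| implied by a Mantel-Haenszel chi-square, using Q_MH = (n - 1) * r**2."""
    return math.sqrt(q_mh / (n - 1))


# (Mantel-Haenszel chi-square, n) for the three cross-tabulations above
for q_mh, n in [(44.031, 1986), (17.971, 1056), (37.715, 930)]:
    print(n, round(implied_abs_correlation(q_mh, n), 3))
```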
10. References
Abedi, J. (1996). Interrater/Test Reliability System (ITRS). Multivariate Behavioral
Research, 31(4), pp.409-417.
Åkerblom, A. (2015). syndereN om Kina vs. Världen: "I need a bigger sample size" -
Fragbite.se. [online] Fragbite.se. Available at:
http://fragbite.se/fragtv/video/2298/synderenomkinavsvarldenineedabiggersamplesize [Accessed 29 Apr. 2015].
Akerlof, G. (1970). The Market for "Lemons": Quality Uncertainty and the Market
Mechanism. The Quarterly Journal of Economics, 84(3), p.488.
Bacharach, M. (1993). Variable universe games. [S.l.]: [s.n.].
Beyond The Summit, (2014). TI4 Interview with Nahaz (statsman). [online] YouTube.
Available at: https://www.youtube.com/watch?v=KAFbouKJAQ [Accessed 29 Apr.
2015].
Blog.dota2.com, (2015). Matchmaking - Dota 2. [online] Available at:
http://blog.dota2.com/2013/12/matchmaking/ [Accessed 29 Apr. 2015].
Blog.dotacoach.org, (2015). DotaCoach Blog: Does coaching work? We spent $1,300 to
find out. [online] Available at: http://blog.dotacoach.org/2015/04/doescoachingworkwespent1300to.html [Accessed 29 Apr. 2015].
BobRawrley, (2015). 72.5% of all games are in normal bracket, 15.5% in high and 11.9%
in very high - the initial MMR of 2250 seems to have hardly moved /r/DotA2.
[online] reddit. Available at:
http://www.reddit.com/r/DotA2/comments/2wjo81/725_of_all_games_are_in_normal
_bracket_155_in/corh17g [Accessed 29 Apr. 2015].
Chen, Y., Harper, F., Konstan, J. and Li, S. (2010). Social Comparisons and Contributions
to Online Communities: A Field Experiment on MovieLens. American Economic
Review, 100(4), pp.1358-1398.
Cocilova, A. (2015). 10 great PC games with incredibly steep learning curves. [online]
PCWorld. Available at: http://www.pcworld.com/article/2061971/10greatpcgames
D., Terveen, L., Rashid, A., Resnick, P. and Kraut, R. (2005). Using Social
Psychology to Motivate Contributions to Online Communities. Journal of Computer-Mediated Communication, 10(4), pp.00-00.
Liquipedia, (2015). Liquipedia Dota 2 Wiki. [online] Wiki.teamliquid.net. Available at:
http://wiki.teamliquid.net/dota2/Main_Page [Accessed 29 Apr. 2015].
Luca, M. and Smith, J. (2013). Salience in quality disclosure: Evidence from the U.S. News
college rankings. Journal of Economics & Management Strategy, 22(1), pp.58-77.
Ludford, P., Cosley, D., Frankowski, D. and Terveen, L. (2004). Think different.
Proceedings of the 2004 conference on Human factors in computing systems - CHI
'04.
Lutz, T. (2013). Nigerian football team's 79-0 defeat: other infamous losses. [online] the
Guardian. Available at:
http://www.theguardian.com/football/blog/2013/jul/10/nigeriafootballteam790heavydefeats [Accessed 29 Apr. 2015].
McCormick, I., Walkey, F. and Green, D. (1986). Comparative perceptions of driver
ability: A confirmation and expansion. Accident Analysis & Prevention, 18(3),
pp.205-208.
Medium, (2015). The Fourth Core: AUI_2000's Enigma. [online] Available at:
https://medium.com/@theodore.yan/thefourthcoreaui_2000senigmaa53c3b0d4b47
[Accessed 29 Apr. 2015].
Mehta, J., Starmer, C. and Sugden, R. (1994). Focal points in pure coordination games: An
experimental investigation. Theory and Decision, 36(2), pp.163-185.
Mislevy, R., Almond, R., Yan, D. and Steinberg, L. (1999). Bayes nets in educational
assessment: Where the numbers come from. In: Proceedings of the fifteenth
conference on uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc,
pp.437-446.
Noxville, (2015). How the Ratings of the DAC Teams have changed (so far) - datdota.com.
[online] Datdota.com. Available at: http://www.datdota.com/blog/?p=1095 [Accessed
29 Apr. 2015].
Piech, C., Huang, J., Do, C., Ng, A., Chen, Z. and Koller, D. (2013). Tuned Models of Peer
Assessment in MOOCs. Cornell University Library.
Playdota.com, (2013). I'm gonna need a 3000-3500 account - DotA Forums. [online]
Available at: http://www.playdota.com/forums/showthread.php?t=1398477 [Accessed
29 Apr. 2015].
reddit, (2014). Ranked MMR survey - results update /r/DotA2. [online] Available at:
http://www.reddit.com/r/DotA2/comments/2124az/ranked_mmr_survey_results_update/ [Accessed 29 Apr. 2015].
Rioult, F., Métivier, J., Helleu, B., Scelles, N. and Durand, C. (2014). SECS 2014.
AASRI Procedia, pp.82-87.
Rosenthal, R. and Rosnow, R. (2009). Artifacts in behavioral research. New York: Oxford
University Press.
Roth, A., Sonmez, T. and Unver, M. (2003). Kidney Exchange.
Sadler, P. and Good, E. (2006). The Impact of Self- and Peer-Grading on Student Learning.
Educ. Assessment, 11(1), pp.1-31.
Schelling, T. (1960). The strategy of conflict. Cambridge: Harvard University Press.
Smartdotabetting.com, (2015). About - SmartDotaBetting. [online] Available at:
http://smartdotabetting.com/about/ [Accessed 29 Apr. 2015].
Steenhuisen, B. (2015). Refining our Notion of Hero Performance - datdota.com. [online]
Datdota.com. Available at: http://www.datdota.com/blog/?p=1110 [Accessed 29 Apr.
2015].
Sugden, R. (1995). A Theory of Focal Points. The Economic Journal, 105(430), p.533.
SuperData Research, (2013). eSports market brief: US accounts for almost half of total
viewership. [online] Available at: http://www.superdataresearch.com/blog/esportsbrief/ [Accessed 29 Apr. 2015].
The Economist, (2015). The once and future king. [online] Available at:
http://www.economist.com/blogs/gametheory/2015/03/statisticalanalysisfootball
[Accessed 29 Apr. 2015].
Thursten, C. (2015). Game is hard: how Dota 2 changed my view of the 'average' gamer.
[online] PC Gamer. Available at: http://www.pcgamer.com/gameishardhowthreeyearsofdota2changedmyviewoftheaveragegamer/ [Accessed 29 Apr. 2015].
Toft-Andersen, J. (2014). Jacob Toft-Andersen on Twitter. [online] Twitter. Available at:
https://twitter.com/themaelk/status/544946864110731264 [Accessed 29 Apr. 2015].
Wagner, M. (2006). On the Scientific Relevance of eSports. In: Proceedings of the 2006
International Conference on Internet Computing and Conference on Computer Game
Development. pp.437-440.
Wallace, A. (2014). [Infographic] Dota 2 International Prize Neck-And-Neck with
'Traditional' Sports Winnings. [online] Gameskinny.com. Available at:
http://www.gameskinny.com/9u6c1/infographicdota2internationalprizeneckandneckwithtraditionalsportswinnings [Accessed 29 Apr. 2015].
Watters, A. (2012). The Problems with Coursera's Peer Assessments. [online]
Hackeducation.com. Available at: http://hackeducation.com/2012/08/27/peerassessmentcoursera/ [Accessed 29 Apr. 2015].
WhatABaller, (2014). Matchmaking and ratings /r/DotA2. [online] reddit. Available
at: http://www.reddit.com/r/DotA2/comments/1y44o4/matchmaking_and_ratings/
[Accessed 29 Apr. 2015].
Whitehill, J., Ruvolo, P., fan Wu, T., Bergsma, J. and Movellan, J. (2009). Whose vote
should count more: Optimal integration of labels from labellers of unknown expertise.
Advances in Neural Information Processing Systems, (22), pp.2035-2043.
Wingfield, N. (2014). In ESports, Video Gamers Draw Real Crowds and Big Money.
[online] Nytimes.com. Available at:
http://www.nytimes.com/2014/08/31/technology/esportsexplosionbringsopportunityrichesforvideogamers.html [Accessed 29 Apr. 2015].