Analysis of Price
Variations in eBay
Video Game Auctions
Andrew Choe, Yuto Ezure, Eric Guan, Zimo Li, Tianlun Zhang
6/1/2012
I.
Introduction
In daily real world transactions, we observe people willing to pay different prices
for goods that are functionally homogeneous. Prices for the same good vary from location
to location, such as the price of the same brand-name shirt in different store branches, or
the same bottled beer at different bars. Many things explain these differences: perhaps
one bar is more popular and has been around longer, or maybe one of the stores is in a
busier part of town and gets more traffic. Reputation, location, even how appealing a
store looks: these and countless other factors contribute to the price, beyond the base
price of an item.
We investigate whether there are parallels to such effects in an online setting such as
eBay. In many ways, such effects are a lot simpler to measure on eBay: in the real world,
there are innumerable inputs that might affect price, many of which are practically
impossible to account for. On eBay, however, the anonymity and scarcity of information for
any given item allow us to focus on how particular, measurable differences explain
variations in the selling price of an item. A buyer only has access to information about an
item from its listing page: current price, starting price, condition, shipping price, seller
rating, number of bids, number of bidders, item description, auction duration, and
pictures of the item. eBay allows us to measure how these variables affect the final
selling price for a wide variety of items. From such analysis, we can examine
which traits of a listing have significant effects on its selling price. Primarily, it would
be interesting to examine whether more reputable sellers are at an advantage by testing
whether the seller's rating actually has a significant effect on the final sale price of the
item. Additionally, we can examine variables related to the auction mechanism, namely
starting price and number of bidders, and see whether they also influence the final price.
A.
games. There are two main reasons for this. First, video games are very standardized (there is
virtually no difference across two different copies of the same game), and at the same
time, the games on eBay come in a wide range of conditions of wear-and-tear. This gives
us an extremely nice opportunity to test for consumer behavior in the presence of limited
information, since besides a qualitative categorization of condition, there is very little
other information available to the buyer. Specifically, we seek to test whether the seller's
reputation has an influence in this regard, i.e. whether the extra degree of trust in a seller of
higher reputation results in a significant difference in the final price, which would
indicate that buyers are willing to pay a premium for an assurance of quality.
Secondly, while we limit our scope to only video game listings, there still exist
significant differences between the games. In many cases, different games cater to
completely different audiences. This allows us to obtain a wider range of regressor values.
For example, it is probable that older, more obscure games are only listed by sporadic
individuals who still happen to own them, while more popular games are listed not only
by individuals, but also by professional merchants and stores that have extremely high
seller ratings. Similarly, obscure games tend to have fewer bidders per listing, while
popular games can have fairly high bidding activity throughout all listings. Having more
variation in the independent variables will lead to smaller standard errors for our
estimates and ultimately more significant results for the regression.
B.
intermediary through which buyers and sellers interact. eBay auctions are second-price
auctions. The auction occurs in real time, with the current selling price (the
second-highest bid) displayed on the listing webpage. Some auctions have a Buy-It-Now option,
where buyers can bid some fixed price to immediately win the auction.
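The displayed-price rule described above can be sketched in a few lines of Python. This is a simplified illustration rather than eBay's actual mechanism: the function name `displayed_price` is hypothetical, and bid increments, reserve prices, and the Buy-It-Now option are all ignored.

```python
def displayed_price(bids, starting_price):
    """Return (current price, highest bid) for a simplified second-price auction.

    bids: maximum bids, one per bidder, assumed to be at or above starting_price.
    With no bids there is no sale; with one bid the price stays at the
    starting price; otherwise the price is the second-highest bid.
    """
    if not bids:
        return None, None
    ranked = sorted(bids, reverse=True)
    if len(ranked) == 1:
        return starting_price, ranked[0]
    return ranked[1], ranked[0]


# Made-up bids for illustration only.
price, top_bid = displayed_price([12.50, 20.00, 15.25], starting_price=0.99)
print(price, top_bid)
```

The winner's own maximum bid never sets the price in this sketch, which is why the final price reflects the valuation of the runner-up rather than that of the winner.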
When posting an auction, the seller specifies a starting price and an ending date.
When looking at a listing, the bidder sees several attributes, notably the current price,
the time left for the auction, and the seller's feedback rating. Also visible are the shipping
cost, the starting price, the number of bids, and the number of distinct bidders. Since
these variables are all that is visible to the buyer, these variables ought to explain most of
the variation in final sale price.
C.
Significant Results
We eventually find that log of seller rating, starting price, and number of bidders
are significantly positively correlated with final price, while quantity by day is statistically
significant and slightly negatively correlated with final price. Additionally, we find that
sellers with higher ratings tend to sell quality items at a higher premium, which suggests
that buyers do value the assurance of quality. Finally, we find that increasing numbers of
bidders not only drive up the final auction price, but also decrease the variation in final prices.
II.
Literature Review
There have been several papers researching factors that contribute to the final
price on eBay auctions. In "The Dynamics of Seller Reputation: Theory and Evidence
from eBay," Cabral and Hortaçsu find that sale prices of items listed by a seller drop
after the seller receives a first negative rating. Although we are not looking at exactly
the same variable, due to the difficulty of collecting negative ratings specifically,
Cabral and Hortaçsu's result is still significant to us in that it indicates that buyers care
about the reputation of the sellers. "Does a Seller's Ecommerce Reputation Matter?
Evidence from eBay Auctions" by Melnik and Alm further strengthens this point, as it
finds a statistically significant correlation between seller rating and price. The paper also
included some of our proposed control variables, such as length (duration) of the auction.
However, their research involved a uniform item, so we have to consider
additional control variables in our paper for across-item differences.
In fact, there are many papers involving a regression between rating and the final
price. In "Pennies from eBay: the Determinants of Price in Online Auctions," by
Lucking-Reiley et al., auction duration, starting price, and reserve prices were identified as having
positive effects on the final price. The effects of auction length and starting price may be
expected, as longer auction length will allow more bidders to see the product and higher
starting price will prevent the item from getting an unjustly low final price. The reserve
price would have a positive correlation with the final price by the same logic, although
we did not include reserve price as a control variable because the buyers do not have
access to the information.
Thus far, the directions of correlation on our proposed variables all seem to be
quite intuitive. However, intuitive interpretation may not hold true for some of the other
variables. For example, in "The Effect of Asymmetric Bidder Size on an Auction's
Performance: Are More Bidders Always Better?", Wedad Elmaghraby found that the number
of bidders need not always positively affect the final price, which is counterintuitive.
The result was derived from a constructed theory that differentiates
bidders based on their production capacity. With our regressions, we should be able to
determine whether the proposed effects are actually significant enough to change the
direction of correlation.
One more thing we should be aware of in our research is the winner's curse, which is
a commonplace occurrence in online auctions, as stated in "Economic Insights from
Internet Auctions" by Patrick Bajari and Ali Hortaçsu. The winner's curse phenomenon
states that the bidders who win the auctions only do so because they overvalue the item.
The paper explains that the seller reputation is one of the ways by which eBay hopes to
decrease the winner's curse effect; the availability of rating information would allow
bidders to make better estimates of the value of the item. However, the paper could not
gather enough evidence from previous research to determine whether or not the rating system
actually resolves the winner's curse problem. We hope to provide further insight on this
issue by observing whether or not seller ratings affect the final outcome of the auction.
The uniqueness of our research compared to similar literature is that our research
controls for almost all the information that a bidder sees when participating in an auction.
While there is some possible omitted-variable bias coming from our lack of
consideration of non-quantifiable information such as the quality of the product picture
and the product description, the inclusion of all other types of data available to the
bidders should make our results less prone to this bias than those of previous papers.
It should also be noted that we are doing our research on non-uniform products, with
dummy variables to represent each unique product. Therefore, our research results should
be more applicable to items that were not our direct research subjects.
III.
Model Specification
(Where Y is a vector of final prices, X is a matrix of variables of interest, C is a matrix of controls, and ε is
a vector of error terms).
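A linear specification consistent with these definitions can be sketched as follows. The coefficient vectors β and γ and the error symbol ε are notational assumptions made here for illustration; they are not taken from the original equation.

```latex
Y = X\beta + C\gamma + \varepsilon
```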
We regressed the final price of each item sold on the following variables of
interest: log rating of the sellers, the adjusted starting price of the auction (defined as the
sum of the starting price and the shipping cost), the number of bidders, the quantity
(defined as the number of concurrent sales of identical items), and the presence of a best
seller medal. We also controlled for all other relevant information that was available to
the buyers as well as to us, such as the condition of the item (coded as five dummy
variables: brand new, like new, very good, good, and acceptable), duration of the auction,
the date on which the auction ended (coded as dummy variables for each of the 17 days),
and the game title (coded as dummy variables for each game). Finally, we included an
interaction term between the log rating and the condition.
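As a minimal sketch of how one listing could be turned into the regressors described above, the Python function below builds the variables of interest, the condition dummies, and the rating-condition interaction terms. The function name, the dictionary keys, and the use of log(rating + 1) to cope with a rating of 0 are all assumptions made for illustration; the paper does not say how zero ratings were handled.

```python
import math

# Brand-New is the omitted baseline, so it gets no dummy of its own.
CONDITIONS = ["Like-New", "Very-Good", "Good", "Acceptable"]


def listing_features(rating, start_price, shipping, bidders, quantity,
                     top_seller, condition):
    """Return the regressors for one listing as a dict (hypothetical layout)."""
    if condition != "Brand-New" and condition not in CONDITIONS:
        raise ValueError("unknown condition: %r" % condition)

    # Assumption: log(rating + 1) keeps the transform defined for rating 0.
    log_rating = math.log1p(rating)

    features = {
        "log_rating": log_rating,
        # Adjusted starting price = starting price + shipping cost.
        "adj_start_price": start_price + shipping,
        "bidders": bidders,
        "quantity": quantity,
        "top_seller": int(top_seller),
    }
    for name in CONDITIONS:
        dummy = int(condition == name)
        features["cond_" + name] = dummy
        # Interaction term: condition dummy times log rating.
        features["cond_" + name + "_x_log_rating"] = dummy * log_rating
    return features


# Made-up listing for illustration only.
row = listing_features(rating=99, start_price=0.99, shipping=3.00,
                       bidders=5, quantity=2, top_seller=True,
                       condition="Good")
print(row)
```

Game-title and end-date dummies would be added in the same way as the condition dummies, one column per title or day with one category left out as the baseline.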
Essentially, we included all the information that was available to the buyers at
their time of decision-making to analyze how they respond to different attributes of the
listing. Under the assumption that buyer behavior is fairly stable, these variables
should explain all fluctuations in the final sale price. Seller rating has a very wide range
(from 0 to 500,000) and appeared to behave non-linearly. In Figure 1, seller rating has an
extremely skewed distribution, while log seller rating has a nice bell-curve shape. Based
on these observations, we elected to use log seller rating in our linear model. We chose to
specifically include the interaction terms between log seller rating and condition
dummies; the intuition is that different seller ratings grant different levels of
credibility to the condition of the item that the seller specified, and would therefore alter
the effect that the condition variable has on the final price.
Figure 1: Histograms of seller rating (excluding many outlier seller ratings > 2000) and of the logarithm of seller rating, with frequency on the vertical axis.
We expect the log rating to be a primary contributor to the price because it is the
sole measure of the seller's credibility, and the buyers would be willing to pay a premium
for seller credibility; the coefficient for log rating should be significantly positive.
Similarly, we expect that the presence of a best seller medal would have a positive
coefficient because it increases the confidence that the buyer has in the seller, so the
buyer is willing to pay more.
We also conjecture that the number of bidders will have a positive coefficient
with final price, as well as be negatively correlated with the variance of the final price.
The idea is that for auctions with more distinct bidders, the final sale value is closer to the
maximum price consumers are willing to pay for the good on the secondary market, and
therefore will be less volatile. Due to this, we further expect more significant t-stats for
games with higher average number of bids.
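The conjecture about the number of bidders can be illustrated with a small simulation. The sketch below assumes independent private valuations drawn uniformly between 10 and 30 dollars and a final price equal to the second-highest valuation; these are textbook simplifications chosen for illustration, not properties estimated from our data, and the function name is hypothetical.

```python
import random
import statistics


def simulate_final_prices(n_bidders, n_auctions=5000, seed=0):
    """Simulate final prices of second-price auctions with n_bidders each."""
    if n_bidders < 2:
        raise ValueError("need at least two bidders for a second-highest bid")
    rng = random.Random(seed)
    prices = []
    for _ in range(n_auctions):
        # Assumption: valuations are i.i.d. uniform on [10, 30] dollars.
        valuations = sorted(rng.uniform(10, 30) for _ in range(n_bidders))
        prices.append(valuations[-2])  # the second-highest valuation sets the price
    return prices


for n in (2, 5, 10):
    prices = simulate_final_prices(n)
    print(n, round(statistics.mean(prices), 2), round(statistics.stdev(prices), 2))
```

Under these assumptions the mean final price rises and its standard deviation falls as bidders are added, which is the pattern the conjecture predicts.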
Starting price is more puzzling; a low starting price might increase the number of
bidders following an item, but a high starting price guarantees some minimum sale price.
We expect that the quantity variable would have a negative coefficient, assuming that
buyer behavior is stable.
For the coefficients of control variables, we expect that the duration of the auction
would also have a positive coefficient since the price cannot decrease after a bid has been
made. We set the Brand-New condition as the default, so each of the dummy variables
for condition (Like-New, Very-Good, Good, and Acceptable) ought to have negative
coefficients with larger effects in that order (i.e. worst condition leads to lowest price).
Finally, we expect the interaction terms to have negative coefficients. This is because we
expect that higher seller rating gives more credibility to the condition specified, so seller
rating is most influential for brand new products. Since the default condition is set to
brand new, the coefficients on the interaction terms of lower conditions are expected to
be negative.
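The expected sign of the interaction terms follows from how they enter the marginal effect of log rating. As a sketch, write the relevant part of the model for listing i with rating r_i and condition dummies D_{ik}, where k ranges over Like-New, Very-Good, Good, and Acceptable. The symbols β_r, α_k, and δ_k are names assumed here for illustration, not notation from the original specification.

```latex
\begin{aligned}
P_i &= \dots + \beta_r \log r_i + \sum_k \alpha_k D_{ik}
       + \sum_k \delta_k \, D_{ik} \log r_i + \dots + \varepsilon_i \\
\frac{\partial P_i}{\partial \log r_i} &=
\begin{cases}
\beta_r & \text{if the item is Brand-New (all } D_{ik} = 0\text{)} \\
\beta_r + \delta_k & \text{if the item is in condition } k
\end{cases}
\end{aligned}
```

With every δ_k negative, the rating premium is largest for Brand-New items and smaller for each lower stated condition.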
All the explanatory variables are stochastic except for the dummy variables for
game title, which is the only variable that we deliberately selected for in collecting the
data. If buyer behavior is not influenced by previous auctions, then there will not be
autocorrelation. This assumption is somewhat unwarranted, but to account for this would
be difficult due to the stochastic nature of the date completed variable. Finally, one of our
main hypotheses was that more bidders would drive the final price closer to the real
intrinsic value of the item and thus decrease the variance in final price. Due to this
correlation between the number of bidders and variance of the dependent variable, we
assumed that there would be a heteroskedasticity problem, which we corrected for by
using the robust option in Stata.
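To make the correction concrete, the sketch below computes a heteroskedasticity-robust (White) standard error for a one-regressor model next to the classical one. It is an illustration in Python and not the Stata routine: the function name and the toy numbers are made up, and the n / (n - 2) scaling is the common HC1 convention, assumed here.

```python
import math


def ols_with_robust_se(x, y):
    """Fit y = a + b*x; return (slope, classical SE, HC1 robust SE)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Classical variance: one common error variance for every observation.
    s2 = sum(e ** 2 for e in resid) / (n - 2)
    classical_se = math.sqrt(s2 / sxx)

    # White's estimator: each squared residual is weighted by the squared
    # deviation of x from its mean; n / (n - 2) is the HC1 correction.
    meat = sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, resid))
    robust_se = math.sqrt(meat / sxx ** 2 * n / (n - 2))
    return slope, classical_se, robust_se


# Toy data for illustration only: number of bidders and final price.
bidders = [1, 2, 3, 4]
prices = [1.0, 3.0, 2.0, 6.0]
print(ols_with_robust_se(bidders, prices))
```

The slope estimate is identical under both approaches; only the standard error, and therefore the t-statistic, changes.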
IV.
Data
A.
Collection
In order to collect data, we wrote a web-crawler script in Python to browse
through eBay and parse out the desired information (refer to Appendix A for code). The
crawler operates in two steps. First, when given the URL for an eBay search query, the
crawler collects a list of listing URLs for the product. Then the crawler opens up each
listing via its URL and finds the following variables:
Final Price
Shipping Cost: either some specified value or varies based on location
Seller Rating: a numerical rating based on the seller's history and feedback
Best-Seller Medal (Dummy): a status awarded based on seller performance
Condition: the state of wear-and-tear of the item
Buy-It-Now (Dummy): whether the final purchase was via the Buy-It-Now option
Initial Price
Number of Bidders
Number of Bids
Date Ended (Dummy)
Auction Duration
We restricted our dataset to completed auctions in the US.
B.
Data Refinement
There were a few issues with the data. Only the last three weeks of eBay auctions
are publicly visible online; due to the time constraints of this project, we were only able
to collect data on recent eBay listings. Additionally, many listings were contaminated and
unusable. Some listings involved bundles of products; one listing we found was an iPad
bundled with WWE wrestling tickets. Other listings were simply mislabeled; there were
several listings selling the packaging for a game but not the game itself. Fortunately, most
of these contaminated listings either have extremely high/low prices or are unsold. We
were able to prune the data by removing extreme outlier prices and unsold listings.
In addition, we had to make several adjustments to the raw data in order to obtain
meaningful measurements of information relevant to consumer decisions:
Start and final prices. Starting and final prices have been adjusted to include
shipping.
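The shipping adjustment and the pruning step can be sketched as below. The field names, the treatment of a "FREE" shipping label as zero, and the price cutoffs passed to `prune` are assumptions for illustration; the exact cutoffs used to drop outliers are not stated here.

```python
def adjusted_price(price, shipping):
    """Add shipping to a price; 'FREE' counts as zero, unknown shipping gives None."""
    if shipping is None:
        return None
    if isinstance(shipping, str):
        # The crawler stores free shipping as text rather than as a number.
        return price if shipping.strip().upper() == "FREE" else None
    return price + shipping


def prune(listings, low, high):
    """Keep sold listings whose adjusted final price lies within [low, high]."""
    kept = []
    for item in listings:
        final = adjusted_price(item["final_price"], item["shipping"])
        if item["sold"] and final is not None and low <= final <= high:
            kept.append(dict(item, adj_final=final))
    return kept


# Made-up listings for illustration only.
raw = [
    {"final_price": 25.0, "shipping": 4.0, "sold": True},
    {"final_price": 18.0, "shipping": "FREE", "sold": True},
    {"final_price": 499.0, "shipping": 0.0, "sold": True},   # likely a bundle
    {"final_price": 20.0, "shipping": 3.0, "sold": False},   # unsold
]
print(prune(raw, low=1.0, high=100.0))
```

Listings with location-dependent shipping come back with no adjusted price in this sketch and are dropped, which is one possible way to handle them.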
C.
Data Summary
Table 1: Summary Statistics
$16.05 and $29.48. We can see from the histogram that the distribution of final price is
approximately normal:
[Histogram of adjusted final price; vertical axis: frequency.]
We can see from the histograms below that there is extremely high volatility in
both number of bidders and adjusted starting price across listings. This is especially
noticeable for starting price, where many sellers choose to start their listings near 0,
extremely far from the true value of the game. It is important to note that the number of
bidders depends on the game: more popular titles tend to have more bidders on average.
[Histograms of number of bidders and of adjusted starting price; vertical axis: frequency.]
V.
Results
The coefficients that are significant at 1% significance level are those of log of
seller rating, starting price, number of bidders, quantity, duration, the interaction term
for the Like-New condition and log of seller rating, and the interaction term for the
Very-Good condition and log of seller rating (see next page for regression output). As
stated in the model section, because our data is not time-dependent data, we do not have
any obvious autocorrelation to test for. We tested for heteroskedasticity using the
Breusch-Pagan test, which came back positive (with p-value of 0.000). We use the robust
option in Stata to correct for this in our main regressions. In order to test for the
robustness of our results, we run several additional regressions where we eliminate some
of the control variables to see if the coefficients for remaining variables still remain
significant. In particular, the control variables we eliminate are the date dummies,
condition dummies, auction duration, and the condition-rating interaction terms. We do
not use any instrumental variables in our regression because there are no variables in the
eBay listings that we could collect data on and instrument with.
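For readers unfamiliar with the test, the sketch below shows the idea behind a Breusch-Pagan style check in a one-regressor setting: regress the squared OLS residuals on the regressor and compare n times the R-squared of that auxiliary regression with a chi-squared distribution with one degree of freedom. This is the studentized (Koenker) variant written out by hand for illustration; it is not necessarily the exact statistic Stata reports, and the function name and toy data are made up.

```python
import math


def breusch_pagan(x, y):
    """Return (LM statistic, p-value) for a one-regressor Breusch-Pagan test."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar

    # Squared residuals from the main regression.
    e2 = [(yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)]
    e2_bar = sum(e2) / n

    # R-squared of the auxiliary regression of e^2 on x.
    sxe = sum((xi - x_bar) * (ei - e2_bar) for xi, ei in zip(x, e2))
    see = sum((ei - e2_bar) ** 2 for ei in e2)
    r2 = 0.0 if see == 0 else sxe ** 2 / (sxx * see)

    lm = n * r2
    # Upper tail of a chi-squared(1) variable: P(Z^2 > lm) = erfc(sqrt(lm / 2)).
    p_value = math.erfc(math.sqrt(lm / 2))
    return lm, p_value


# Toy data for illustration only.
print(breusch_pagan([1, 2, 3, 4], [1.0, 3.0, 2.0, 6.0]))
```

A small p-value rejects constant error variance; with several regressors the auxiliary regression uses all of them and the degrees of freedom grow accordingly.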
In addition to the aggregate regressions above, we also performed regression (4) for each
of the 47 individual game titles. This is to see how many of the coefficients are
significant at the individual game level, and to possibly explore the factors behind these
results. The results from this series of regressions are reproduced in Table 3.
VI.
Discussion
A.
Analysis of Coefficients
In our final regression, we have statistically significant values for the four regressors
we focused on: log-rating, adjusted starting price, number of bidders and quantity. The
values of the coefficients are, respectively, 0.368, 0.329, 0.850, and -0.079 [See Table 1].
The rating coefficient suggests that for a 1% increase in rating, we have around a 0.4 cent
increase in final price. The starting price coefficient suggests that for every additional
dollar in the initial price, the ending price will increase by around 33 cents. Likewise, for
each additional bidder we expect an 85 cent increase in final price. Finally, for every additional
unit sold we estimate around an 8 cent decrease in final price.
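The 0.4 cent figure can be checked with a short calculation, assuming the logarithm is natural and prices are measured in dollars (neither is stated explicitly). Here r denotes the seller rating and ΔP the change in final price.

```latex
\Delta P = 0.368 \times \left[ \ln(1.01\, r) - \ln r \right]
         = 0.368 \times \ln(1.01)
         \approx 0.368 \times 0.00995
         \approx 0.0037 \text{ dollars}
         \approx 0.37 \text{ cents}
```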
The coefficients on log-rating and number of bidders are quite unsurprising: more
reputable sellers get higher prices and more bidders will drive up the final price. In
"Pennies from eBay: the Determinants of Price in Online Auctions" (Lucking-Reiley et al.),
the authors found that starting price was highly correlated with final price, so our
coefficient on starting price is also as expected. However, this starting price
coefficient is probably contaminated by the fact that more expensive items tend to have
greater starting prices.
Interestingly, we find a slight negative correlation between equilibrium quantity
and final price. It is important to note that we cannot use the quantity relationship to
sketch out demand and supply relations; it is very likely that quantity is correlated with
popularity of a game, which is also correlated with final price. Additionally, shifts in
either demand or supply will affect both final price and quantity. However, it is notable
that the magnitude of this coefficient is very small; this suggests that the price a game is sold at
is functionally unaffected by how large the market for that game is. Though this is by no
means solid evidence, it is consistent with highly elastic demand. We obtained very
similar results when regressing over quantity supplied instead of equilibrium quantity
(see the data refinement section for details on the distinction).
Also very notable is that the interaction terms between condition and log rating
are significant and negative (-0.165, -0.273, -0.353, -0.203 for Like-New, Very-Good,
Good, and Acceptable, respectively) [See Table 1]. Since these are all relative to the
baseline of Brand-New, these negative interaction terms imply that seller rating matters
more for higher-quality goods, with the exception of Acceptable goods. This agrees with
our theoretical predictions; the quality premium that buyers pay is magnified when the
quality is higher.
Given the magnitudes of final price we are dealing with in our data, such results
are not economically insignificant. For example, if two different sellers have ratings that
are around 300 and 2400 (not an unusual occurrence), our model suggests that the latter
will get a final price that is $1.20 greater than the former. Given that games sell for around
30 dollars, this is quite a figure.
B.
game titles had statistically robust correlation between log rating and final price while
others did not. One possible explanation for this is the number of bidders that participated
in auctions for a particular title. If there are fewer people participating in the auction, then
there is a higher chance that the listing will not reach the equilibrium price. Additionally,
due to variation in the bidders' demand, fewer bidders implies that there will be a higher
volatility in the final price. The volatility transfers to the variance of the error term, which
directly increases the variance of the coefficient estimates. Thus, we conjecture that the
game titles without significant correlations did not have enough participating bidders to
reduce the volatility of the final price. To check the validity of this hypothesis, we plotted
the volatility of final price and the number of bidders for each game against the p-value
of the log-rating and final price correlation [See Figures 4 & 5]. The two regressions for
the p-value against the volatility of final price and the p-value against the number of
bidders both result in a p-value of less than 0.05. However, it should be noted that this
regression only involves 37 data points and thus is not terribly robust.
[Figures 4 & 5: two scatter plots, one with Mean Number of Bidders on the horizontal axis and one with SD / Mean of Final Price on the horizontal axis.]
C.
used a web-crawler to get our data, so we could only get as much data as eBay still had
publicly available on their website, which was a little over 2 weeks of completed auctions.
Given that the quantity vs. price data gave us interesting results, it might be worth the
effort to collect such information over time-spans longer than half a month. Also, we did
not take into account the presence and quality of the listing pictures, nor the quality of the
written description of the item in our regression due to the fact that we could not quantify
such data. In the future, it is possible we could come up with some rough metric of the
listing quality to account for in the regression.
Internal to the video game market, we could account for factors such as the age of
the video game, the number of copies sold and the demographics of the game's target
audience. These factors will affect the supply in the secondary market. Additionally, we
could have accounted for the critics' rating of the game, the console the game was for, as
well as other qualities such as online play or multiplayer capabilities. These factors
certainly affect the resale value of the game, but they are difficult to quantify.
On a slightly larger scale, this type of analysis could be extended to markets
outside of video games. We used video games for their standardization among copies of the
same game and significant differences among varying games. Other products might
fulfill these roles as adequately as, or even more adequately than, video games and should
be looked into.
VII.
Conclusion
Through our research, we have found that the bidders in fact care about the seller
ratings; people are willing to pay a reputation premium in the form of a higher price for
items listed by sellers with good reputations. We also found positive correlation between
condition of the products and the final price. Interestingly enough, the interaction terms
between seller ratings and the conditions had significant negative coefficients. This
indicates that the bidders care more about seller rating for claims of higher quality; this
reputation premium is magnified in high-quality items.
The number of bidders has two main effects: increasing the final price and
decreasing the volatility of the final price. Both of these effects are predicted by auction
theory and present in our data. As more people bid, the price converges to the maximum
price in the eBay secondary market; this causes both an increase in final price and a
decrease in the volatility. This decrease in the variance of the final price can help account
for the fact that only some of the game titles have significant correlation between log
rating and the final price. We found that games with high number of bidders tended to
have highly significant coefficients between log rating and the final price.
We also found that other variables such as starting price, quantity sold, and
duration all have significant correlations with the final price, which is consistent with our
predictions as well as the results from existing literature. However, all of these results are
not conclusive as they may be contaminated by other factors. For example, the positive
correlation between starting price and the final price may only be because of people
listing more expensive games with higher starting prices.
Bibliography
Bajari, Patrick & Hortaçsu, Ali, "Economic Insights from Internet Auctions," Journal of
Economic Literature, American Economic Association, vol. 42 (2004): 457-486.
Cabral, Luís M B & Hortaçsu, Ali, "The Dynamics of Seller Reputation: Theory and
Evidence from eBay," CEPR Discussion Papers 4345 (2004).
Elmaghraby, Wedad, "The Effect of Asymmetric Bidder Size on an Auction's
Performance: Are More Bidders Always Better?" Management Science,
vol. 51 (2005): 1763-1776.
Lucking-Reiley, David & Bryan, Doug & Prasad, Naghi & Reeves, Daniel, "Pennies
from eBay: the Determinants of Price in Online Auctions," Econometric Society
World Congress 2000 Contributed Papers (2000).
Melnik, Mikhail I & Alm, James, "Does a Seller's Ecommerce Reputation Matter?
Evidence from eBay Auctions," Journal of Industrial Economics, Wiley
Blackwell, vol. 50 (2000): 337-49.
VIII.
# NOTE: the start of the script (its imports and the helpers URLtoSoup,
# extractPrice, extractInt and extractFirst) is missing from this copy; those
# helpers are assumed to be defined above. The "def Penny(url):" line is
# restored from the recursive call inside the function.
import sys


def Penny(url):
    """Collect listing URLs from a search-results page, following 'Next' links."""
    soup = URLtoSoup(url)
    urls = []
    has_next = False
    # class="vip" indicates listings
    for a in soup('a', {'class': 'vip'}):
        urls.append(a['href'])
    # class="botpg-next" indicates the 'Next' button on the bottom of the page.
    # If the url doesn't begin with 'http', then the link is dead.
    for td in soup('td', {'class': 'botpg-next'}):
        for td1 in td(href=True):
            tdurl = td1['href']
            if tdurl[:4] == "http" and tdurl != url:
                urls.append(tdurl)
                has_next = True
            if tdurl[:4] == "/ccg":
                urls.append("http://www.ebay.com" + tdurl)
                has_next = True
    if has_next:
        # The last entry is the next results page: replace it with its listings.
        temp = urls[:-1]
        temp.extend(Penny(urls[-1]))
        urls = temp
    return urls


def Nickel(url):
    """Parse one listing page; return its fields followed by its bidding info."""
    soup = URLtoSoup(url)
    # price ("ASDF" is a placeholder kept when no price is found)
    price = "ASDF"
    for span in soup('span', {'itemprop': 'price'}):
        price = extractPrice(span.string)
    # shipping cost (doesn't work for variable shipping cost cases)
    shippingcost = None
    for span in soup('span', {'class': 'vi-is1-sh-srvcCost vi-is1hideElem vi-is1-showElem'}):
        shippingcost = extractPrice(span.string)  # May need better parsing
    # FREE
    for span in soup('span', {'class': 'vi-is1-tese'}):
        shippingcost = span.string.strip()
    # seller rating
    sellerrating = -1
    for a in soup('a', {'class': 'mbg-fb'}):
        sellerrating = int(a.contents[1])
    # best-seller medal
    topseller = len(soup('b', {'class': 'mbg-ts'})) > 0
    # condition
    condition = None
    for span in soup('span', {'class': 'vi-is1-condText'}):
        condition = span.string.strip()
    data = [price, shippingcost, sellerrating, topseller, condition]
    # bidding info link
    # doesn't work for unsold goods yet. It would go to the original
    # listing link, and then to the # bids link.
    bidurl = None
    for span in soup('span', {'itemprop': 'offers'}):
        if span.a is not None:
            bidurl = span.a['href']
            data.extend(Dime(bidurl))
        else:
            print("No bidding info link found.")
    return data


def Dime(url):
    """Parse the bid-history page of a listing."""
    soup = URLtoSoup(url)
    # data will contain:
    # [ Buy-It-Now, Starting Bid, Bidders, Bids, Time Ended, Duration ]
    # Buy-it-Now
    BiN = False
    for img in soup('img', {'alt': 'Buy It Now'}):
        BiN = True
    # Starting Bid
    startingbid = None
    cvf = soup('td', {'align': 'left', 'class': 'contentValueFont', 'style': 'color:#666'})
    if len(cvf) > 0:
        startingbid = extractPrice(cvf[-1].string)
    data = [BiN, startingbid]
    # bidders, bids, time ended, duration
    for dat in soup('span', {'class': 'titleValueFont'}):
        da = dat.string.strip()
        # Short strings are counts; longer ones are dates or durations.
        if len(da) < 15:
            d = extractInt(da)
        else:
            d = extractFirst(da)
        data.append(d)
    return data


# ==========================
if __name__ == "__main__":
    # Command-line arguments come in pairs: <search URL> <output name>
    name = "NAME"
    n = len(sys.argv)
    for i in range((n - 1) // 2):
        penny = Penny(sys.argv[2 * i + 1])
        name = sys.argv[2 * i + 2]
        print(penny)
        print(len(penny))
        print("=================")
        data = []
        for p in penny:
            nickel = Nickel(p)
            print(nickel)
            # Keep only listings for which bidding info was found.
            if len(nickel) >= 7:
                data.append(nickel)
        # Write one CSV file per search query.
        with open(name + ".csv", 'w') as f:
            for nick in data:
                for item in nick:
                    f.write("%s, " % item)
                f.write("\n")
IX.
X.