
ECON 20900 INTRO TO ECONOMETRICS: HONORS

Analysis of Price
Variations in eBay
Video Game Auctions
Andrew Choe, Yuto Ezure, Eric Guan, Zimo Li, Tianlun Zhang
6/1/2012

I. Introduction

In everyday transactions, we observe people willing to pay different prices
for goods that are functionally homogeneous. Prices for the same good vary from location
to location, such as the price of the same brand-name shirt in different store branches, or
the same bottled beer at different bars. Many things explain these differences: perhaps
one bar is more popular and has been around longer, or maybe one of the stores is in a
busier part of town and gets more traffic. Reputation, location, even how appealing a
store looks: these and countless other factors contribute to the price, beyond the base
price of an item.
We investigate whether such effects have parallels in an online setting such as
eBay. In many ways, such effects are a lot simpler to measure on eBay: in the real world,
there are innumerable inputs that might affect price, many of which are practically
impossible to account for. On eBay, however, the anonymity and scarcity of information for
any given item allow us to focus on how particular, measurable differences explain
variations in the selling price of an item. A buyer only has access to information about an
item from its listing webpage: current price, starting price, condition, shipping price, seller
rating, number of bids, number of bidders, item description, auction duration, and
pictures of the item. eBay allows us to measure how these variables affect the final
selling price for a wide variety of items. From such analysis, we can examine
which traits of an item have significant effects on its selling price. Primarily, it would
be interesting to examine whether more reputable sellers are at an advantage by testing
whether the seller's rating actually has a significant effect on the final sale price of the
item. Additionally, we can examine variables related to the auction mechanism (starting
price and number of bidders) and see whether they influence the final price.
A. Choice of Items: Video Games

To examine these relationships, we focus our study on listings of console video
games. There are two main reasons for this. First, video games are very standardized (there
is virtually no difference across two different copies of the same game) and, at the same
time, the games on eBay come in a wide range of conditions of wear-and-tear. This gives
us an excellent opportunity to test for consumer behavior in the presence of limited
information since, besides a qualitative categorization of condition, there is very little
other information available to the buyer. Specifically, we seek to test whether the seller's
reputation has an influence in this regard, i.e., whether the extra degree of trust in a seller
of higher reputation results in a significant difference in the final price; that is, whether
buyers are willing to pay a premium for an assurance of quality.
Secondly, while we limit our scope to only video game listings, there still exist
significant differences between the games. In many cases, different games cater to
completely different audiences. This allows us to obtain a wider range of regressor values.
For example, it is probable that older, more obscure games are only listed by sporadic
individuals who still happen to own them, while more popular games are listed not only
by individuals, but also by professional merchants and stores that have extremely high
seller ratings. Similarly, obscure games tend to have fewer bidders per listing, while
popular games can have fairly high bidding activity throughout all listings. Having more
variation in the independent variables will lead to smaller standard errors for our
estimates and ultimately more significant results for the regression.

B. Overview of the eBay Auction System

eBay does not manage payments, nor does it deliver goods; it is simply an
intermediary through which buyers and sellers interact. eBay auctions are second-price
auctions. The auction occurs in real time, with the current selling price (the second-highest
bid) displayed on the listing webpage. Some auctions have a Buy-It-Now option,
where buyers can pay some fixed price to win the auction immediately.
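To make the second-price rule concrete, the following is a minimal sketch in Python. It assumes each bidder submits a single maximum bid and it ignores bid increments and the Buy-It-Now option; the function name `second_price_outcome` and the sample bids are hypothetical and are not taken from our data.

```python
def second_price_outcome(max_bids, starting_price):
    """Return (winner, price) under a simplified second-price rule.

    max_bids maps a bidder name to that bidder's maximum bid.
    Assumptions: no bid increments; ties go to whichever bidder
    appears first in the dictionary.
    """
    # Only bids at or above the starting price count.
    valid = {b: v for b, v in max_bids.items() if v >= starting_price}
    if not valid:
        return None, None  # the item goes unsold
    ranked = sorted(valid.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    # With a single valid bidder, the item sells at the starting price.
    price = ranked[1][1] if len(ranked) > 1 else starting_price
    return winner, price


bids = {"alice": 25.00, "bob": 18.50, "carol": 21.00}
print(second_price_outcome(bids, starting_price=9.99))  # ('alice', 21.0)
```

The winner pays the second-highest bid rather than her own, which is why the displayed current price tracks the second-highest bid.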
When posting an auction, the seller specifies a starting price and an ending date.
When looking at a listing, the bidder sees several attributes: notably, the current price,
the time left for the auction, and the seller's feedback rating. Also visible are the shipping
cost, the starting price, the number of bids, and the number of distinct bidders. Since
these variables are all that is visible to the buyer, they ought to explain most of
the variation in final sale price.
C. Significant Results

We eventually find that log of seller rating, starting price, and number of bidders
are significantly positively correlated with final price, while quantity by day is statistically
significant and slightly negatively correlated with final price. Additionally, we find that
sellers with higher ratings tend to sell quality items at a higher premium; that is, buyers do
value the assurance of quality. Finally, we find that increasing numbers of bidders not
only drive up the final auction price, but also decrease the variation in final prices.

II. Literature Review

There have been several papers researching factors that contribute to the final
price on eBay auctions. In "The Dynamics of Seller Reputation: Theory and Evidence
from eBay," Cabral and Hortaçsu discover that sale prices of items listed by a seller drop
after the seller receives a first negative rating. Although we are not looking at exactly
the same variable, due to the difficulty of collecting negative ratings specifically,
Cabral and Hortaçsu's result is still significant to us in that it indicates that buyers care
about the reputation of sellers. "Does a Seller's Ecommerce Reputation Matter?
Evidence from eBay Auctions" by Melnik and Alm further strengthens this point, as it
finds a statistically significant correlation between seller rating and price. The paper also
included some of our proposed control variables, such as the length (duration) of the auction.
However, their research involved a uniform item, so we have to consider
additional control variables in our paper for across-item differences.
In fact, there are many papers involving a regression of the final price on rating.
In "Pennies from eBay: the Determinants of Price in Online Auctions," by Lucking-Reiley
et al., auction duration, starting price, and reserve prices were identified as having
positive effects on the final price. The effects of auction length and starting price may be
expected, as a longer auction allows more bidders to see the product and a higher
starting price prevents the item from getting an unduly low final price. The reserve
price would have a positive correlation with the final price by the same logic, although
we did not include reserve price as a control variable because buyers do not have
access to that information.
Thus far, the directions of correlation on our proposed variables all seem to be
quite intuitive. However, intuitive interpretation may not hold true for some of the other
variables. For example, in "The Effect of Asymmetric Bidder Size on an Auction's
Performance: Are More Bidders Always Better?", Wedad Elmaghraby found that the number
of bidders need not always positively affect the final price, which is counterintuitive.
The result was derived from a theoretical model that differentiates
bidders based on their production capacity. With our regressions, we should be able to
determine whether the proposed effects are actually significant enough to change the
direction of correlation.
One more thing we should be aware of in our research is the winner's curse, which is
a commonplace occurrence in online auctions, as stated in "Economic Insights from
Internet Auctions" by Patrick Bajari and Ali Hortaçsu. The winner's curse phenomenon
states that the bidders who win auctions only do so because they overvalue the item.
The paper explains that seller reputation is one of the ways by which eBay hopes to
decrease the winner's curse effect; the availability of rating information would allow
bidders to make better estimates of the value of the item. However, the paper could not
gather enough evidence from previous research to determine whether or not the rating system
actually resolves the winner's curse problem. We hope to provide further insight on this
issue by observing whether or not seller ratings affect the final outcome of the auction.
What distinguishes our research from similar literature is that it
controls for almost all the information that a bidder sees when participating in an auction.
While there may be some omitted-variable bias from our lack of
consideration of non-quantifiable information such as the quality of the product picture
and the product description, the inclusion of all other types of data available to
bidders should make our results less prone to such bias than those of previous papers.
It should also be noted that we are doing our research on non-uniform products, with
dummy variables to represent each unique product. Therefore, our research results should
be more applicable to items that were not our direct research subjects.

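III. Model Specification

$$Y = X\beta + C\gamma + \varepsilon$$

(where $Y$ is a vector of final prices, $X$ is a matrix of variables of interest, $C$ is a matrix of
controls, $\beta$ and $\gamma$ are the corresponding coefficient vectors, and $\varepsilon$ is a vector of
error terms).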

We regressed the final price of each item sold on the following variables of
interest: the log of the seller's rating, the adjusted starting price of the auction (defined as
the sum of the starting price and the shipping cost), the number of bidders, the quantity
(defined as the number of concurrent sales of identical items), and the presence of a
best-seller medal. We also controlled for all other relevant information that was available to
the buyers as well as to us, such as the condition of the item (five categories coded as dummy
variables: brand new, like new, very good, good, and acceptable), the duration of the auction,
the date on which the auction ended (coded as dummy variables for each of the 17 days),
and the game title (coded as dummy variables for each game). Finally, we included
interaction terms between the log rating and the condition.
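As an illustration of how these regressors fit together, the sketch below builds the variables of interest, the condition dummies, and the rating-condition interaction terms for a single listing. It is not the code we used for estimation; the field names, the helper `regressor_row`, and the sample listing are hypothetical, and the date, duration, and game-title controls are left out for brevity. Brand-New is treated as the omitted baseline category.

```python
import math

# Brand-New is the omitted baseline, so it gets no dummy of its own.
CONDITIONS = ["Like-New", "Very-Good", "Good", "Acceptable"]


def regressor_row(listing):
    """Build the variables of interest plus condition terms for one listing."""
    # A rating of 0 is mapped to log(0 + 1) = 0, as in the data refinement step.
    log_rating = math.log(listing["rating"]) if listing["rating"] > 0 else 0.0
    row = {
        "log_rating": log_rating,
        "adj_start": listing["start_price"] + listing["shipping"],
        "bidders": listing["bidders"],
        "quantity": listing["quantity"],
        "medal": 1 if listing["medal"] else 0,
    }
    for cond in CONDITIONS:
        dummy = 1 if listing["condition"] == cond else 0
        row["cond_" + cond] = dummy
        # Interaction term: lets the slope on log rating differ by condition.
        row["log_rating_x_" + cond] = dummy * log_rating
    return row


example = {"rating": 1200, "start_price": 0.99, "shipping": 3.00,
           "bidders": 5, "quantity": 12, "medal": False, "condition": "Good"}
for name, value in regressor_row(example).items():
    print(name, value)
```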
Essentially, we included all the information that was available to the buyers at
their time of decision-making to analyze how they respond to different attributes of the
listing. Under the assumption that buyer behavior is fairly stable, these variables
should explain all fluctuations in the final sale price. Seller rating has a very wide range
(from 0 to 500,000) and appeared to behave non-linearly. As Figure 1 shows, seller rating has
an extremely skewed distribution, while log seller rating has a roughly bell-shaped
distribution. Based on these observations, we elected to use log seller rating in our linear
model. We chose to specifically include the interaction terms between log seller rating and
the condition dummies; the intuition is that different seller ratings grant different levels of
credibility to the condition of the item that the seller specified, and would therefore alter
the effect that the condition variable has on the final price.

Figure 1: Histograms of rating (excluding many outlier seller ratings > 2000) and log-rating.
[Panels: Frequency vs. Seller Rating; Frequency vs. Logarithm of Seller Rating.]

We expect the log rating to be a primary contributor to the price because it is the
sole measure of the seller's credibility, and buyers would be willing to pay a premium
for seller credibility; the coefficient for log rating should be significantly positive.
Similarly, we expect that the presence of a best-seller medal would have a positive
coefficient because it increases the confidence that the buyer has in the seller, making the
buyer willing to pay more.
We also conjecture that the number of bidders will have a positive coefficient,
as well as be negatively correlated with the variance of the final price.
The idea is that for auctions with more distinct bidders, the final sale value is closer to the
maximum price consumers are willing to pay for the good on the secondary market, and
therefore will be less volatile. Because of this, we further expect more significant
t-statistics for games with a higher average number of bids.
Starting price is more puzzling; a low starting price might increase the number of
bidders following an item, but a high starting price guarantees some minimum sale price.
We expect that the quantity variable would have a negative coefficient, assuming that
buyer behavior is stable.
For the coefficients of control variables, we expect that the duration of the auction
would also have a positive coefficient since the price cannot decrease after a bid has been
made. We set the Brand-New condition as the default, so each of the dummy variables
for condition (Like-New, Very-Good, Good, and Acceptable) ought to have negative
coefficients with larger effects in that order (i.e. worst condition leads to lowest price).
Finally, we expect the interaction terms to have negative coefficients. This is because we
expect that higher seller rating gives more credibility to the condition specified, so seller
rating is most influential for brand new products. Since the default condition is set to
brand new, the coefficients on the interaction terms of lower conditions are expected to
be negative.
All the explanatory variables are stochastic except for the dummy variables for
game title, which is the only variable that we deliberately selected for in collecting the
data. If buyer behavior is not influenced by previous auctions, then there will not be
autocorrelation. This assumption is somewhat unwarranted, but to account for this would
be difficult due to the stochastic nature of the date completed variable. Finally, one of our
main hypotheses was that more bidders would drive the final price closer to the real
intrinsic value of the item and thus decrease the variance in final price. Due to this
correlation between the number of bidders and variance of the dependent variable, we
assumed that there would be a heteroskedasticity problem, which we corrected for by
using the robust option in Stata.

IV. Data

A. Collection

In order to collect data, we wrote a web-crawler script in Python to browse
through eBay and parse out the desired information (refer to Appendix A for code). The
crawler operates in two steps. First, when given the URL for an eBay search query, the
crawler collects a list of listing URLs for the product. Then the crawler opens each
listing via its URL and finds the following variables:

Final Price
Shipping Cost: either a specified value or a value that varies by location
Seller Rating: a numerical rating based on the seller's history and feedback
Best-Seller Medal (Dummy): a status awarded based on seller performance
Condition: the state of wear-and-tear of the item
Buy-It-Now (Dummy): whether the final purchase was made via the Buy-It-Now option
Initial Price
Number of Bidders
Number of Bids
Date Ended (Dummy)
Auction Duration
We restricted our dataset to completed auctions in the US.
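The two-step structure of the crawler can be sketched in a few lines of modern Python using only the standard library. This is an illustrative sketch rather than the script in Appendix A: the `vip` link class is the one that script looks for, but `fetch` and `parse_listing` are hypothetical callables supplied by the caller, and the sample HTML is made up.

```python
from html.parser import HTMLParser


class ListingLinkParser(HTMLParser):
    """Collect href values from <a> tags whose class list contains "vip"."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "vip" in classes and attrs.get("href"):
            self.urls.append(attrs["href"])


def collect_listing_urls(search_html):
    """Step 1: extract the listing URLs from a search-results page."""
    parser = ListingLinkParser()
    parser.feed(search_html)
    return parser.urls


def crawl(search_html, fetch, parse_listing):
    """Step 2: open each listing and parse out its variables.

    fetch(url) returns the page content and parse_listing(content) returns
    one record; both are placeholders for real network and parsing code.
    """
    return [parse_listing(fetch(url)) for url in collect_listing_urls(search_html)]


sample = '<a class="vip" href="http://example.com/itm/1">A</a><a href="/help">Help</a>'
print(collect_listing_urls(sample))  # ['http://example.com/itm/1']
```

Passing `fetch` and `parse_listing` in as arguments keeps the sketch testable without any network access.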

B. Data Refinement

There were a few issues with the data. Only the last three weeks of eBay auctions
are publicly visible online; due to the time constraints of this project, we were only able
to collect data on recent eBay listings. Additionally, many listings were contaminated and
unusable. Some listings involved bundles of products; one listing we found was an iPad
bundled with WWE wrestling tickets. Other listings were simply mislabeled; there were
several listings selling the packaging for a game but not the game itself. Fortunately, most
of these contaminated listings either have extremely high or low prices or are unsold. We
were able to prune the data by removing extreme outlier prices and unsold listings.
In addition, we had to make several adjustments to the raw data in order to obtain
meaningful measurements of information relevant to consumer decisions:

Start and final prices. Starting and final prices have been adjusted to include
shipping.

Log-rating. Since seller rating is a nonnegative number and the logarithm of 0 is
undefined, we added 1 to those with 0 rating before taking the logarithm. Given that ratings
reach into the 1000s, this is not a significant change.

Quantity supplied. Quantity supplied is measured by the number of listings posted on
any given day, including unsold listings. This is specific to each game.

Quantity sold. Quantity sold is measured by the number of transactions that occurred
on any given day, also specific to each game.
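The adjustments above can be sketched as follows. The field names (`start_price`, `final_price`, `shipping`, `game`, `date`, `sold`), the helper names, and the sample records are hypothetical; the natural logarithm is assumed, which is consistent with how the log-rating coefficient is interpreted in the discussion.

```python
import math
from collections import Counter


def log_rating(rating):
    """Natural log of the seller rating, adding 1 only when the rating is 0."""
    return math.log(rating + 1) if rating == 0 else math.log(rating)


def adjust_prices(listing):
    """Add the shipping cost to both the starting and the final price."""
    return {
        "adj_start": listing["start_price"] + listing["shipping"],
        "adj_final": listing["final_price"] + listing["shipping"],
    }


def quantity_by_day(listings, date_key, sold_only=False):
    """Count listings per (game, day).

    With sold_only=False this gives quantity supplied (unsold listings are
    included); with sold_only=True it gives quantity sold.
    """
    return Counter(
        (item["game"], item[date_key])
        for item in listings
        if item["sold"] or not sold_only
    )


# Made-up records, for illustration only.
records = [
    {"game": "Halo: Reach", "date": "day 1", "sold": True},
    {"game": "Halo: Reach", "date": "day 1", "sold": False},
    {"game": "Fable III", "date": "day 1", "sold": True},
]
supplied = quantity_by_day(records, "date")
sold = quantity_by_day(records, "date", sold_only=True)
print(supplied[("Halo: Reach", "day 1")], sold[("Halo: Reach", "day 1")])  # 2 1
```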

C. Data Summary

Table 1: Summary Statistics

In total, we collected 7,085 listings across 37 different video games. Different
titles had drastically different price ranges, with 50% of the listings selling between
$16.05 and $29.48. We can see from the histogram in Figure 2 that the distribution of final
price is approximately normal.

Figure 2: Histogram of Final Price
[Axes: Frequency vs. Adjusted Final Price.]

We can see from the histograms in Figure 3 that there is extremely high variation in
both number of bidders and adjusted starting price across listings. This is especially
noticeable for starting price, where many sellers choose to start their listings near 0,
extremely far from the true value of the game. It is important to note that the number of
bidders depends on the game: more popular titles tend to have more bidders on average.

Figure 3: Histograms of Number of Bidders and Adjusted Starting Price
[Panels: Frequency vs. Number of Bidders; Frequency vs. Adjusted Starting Price.]

V. Results

The coefficients that are significant at the 1% significance level are those of log of
seller rating, starting price, number of bidders, quantity, duration, the interaction term
for the Like-New condition and log of seller rating, and the interaction term for the
Very-Good condition and log of seller rating (see Table 2 for regression output). As
stated in the model section, because our data is not time-series data, we do not have
any obvious autocorrelation to test for. We tested for heteroskedasticity using the
Breusch-Pagan test, which rejected homoskedasticity (with a p-value of 0.000). We use the
robust option in Stata to correct for this in our main regressions. In order to test the
robustness of our results, we run several additional regressions where we eliminate some
of the control variables to see if the coefficients for the remaining variables remain
significant. In particular, the control variables we eliminate are the date dummies,
condition dummies, auction duration, and the condition-rating interaction terms. We do
not use any instrumental variables in our regression because there are no variables in the
eBay listings that we could collect data on and instrument with.
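For readers unfamiliar with the Breusch-Pagan idea, the sketch below implements a simplified version with a single regressor in plain Python. It is not the Stata procedure we ran: it uses the studentized (Koenker) form of the statistic, LM = n * R^2 from regressing the squared residuals on the regressor, and the data are synthetic, constructed so that the error spread shrinks as the number of bidders grows. All names and numbers in it are illustrative assumptions.

```python
import math


def ols_simple(x, y):
    """Fit y = a + b*x by least squares; return (a, b, residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a, b, resid


def breusch_pagan_simple(x, y):
    """Return (LM statistic, p-value) for a one-regressor Breusch-Pagan test.

    LM = n * R^2 of the auxiliary regression of squared residuals on x,
    compared with a chi-square distribution with 1 degree of freedom.
    """
    _, _, resid = ols_simple(x, y)
    u2 = [e * e for e in resid]
    _, _, aux_resid = ols_simple(x, u2)
    mean_u2 = sum(u2) / len(u2)
    sst = sum((v - mean_u2) ** 2 for v in u2)
    ssr = sum(e * e for e in aux_resid)
    lm = len(x) * (1.0 - ssr / sst)
    # For one degree of freedom, P(chi2 > lm) = erfc(sqrt(lm / 2)).
    p_value = math.erfc(math.sqrt(max(lm, 0.0) / 2.0))
    return lm, p_value


# Synthetic data: the noise magnitude 10 / x falls as bidders (x) increase.
bidders = list(range(1, 21))
price = [20 + 0.85 * b + ((-1) ** i) * (10.0 / b) for i, b in enumerate(bidders)]
lm, p = breusch_pagan_simple(bidders, price)
print("LM =", round(lm, 3), "p =", round(p, 4))
```

A small p-value indicates that the residual variance moves with the regressor, which is the situation that motivates heteroskedasticity-robust standard errors.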


Table 2: Main Regression Results

In addition to the aggregate regressions above, we also performed regression (4) for each
of the 37 individual game titles. This is to see how many of the coefficients are
significant at the individual game level, and to possibly explore the factors behind these
results. The results from this series of regressions are reproduced in Table 3.

Table 3: Results of Regression (4) for Individual Game Titles (Log of Seller Rating)

Game Name | Coefficient | t-Statistic
Assassin's Creed Revelations (Xbox 360, 2011) | -0.565 | -1.501
Batman: Arkham City (Sony Playstation 3, 2011) | -0.703 | -1.025
Battlefield 3 (Sony Playstation 3, 2011) | 1.039 | 3.450
Battlefield 3 (Xbox 360, 2011) | 1.182 | 2.373
Call of Duty: Black Ops (Sony Playstation 3, 2010) | 0.862 | 2.700
Call of Duty: Black Ops (Xbox 360, 2010) | 0.530 | 1.100
Call of Duty: Modern Warfare 2 (Sony Playstation 3, 2009) | 0.349 | 1.992
Call of Duty: Modern Warfare 2 (Xbox 360, 2009) | 0.278 | 1.160
Call Of Duty: Modern Warfare 3 (Sony Playstation 3, 2011) | 0.566 | 2.434
Call Of Duty: Modern Warfare 3 (Xbox 360, 2011) | -0.333 | -0.910
Fable III (Xbox 360, 2010) | 0.030 | 0.127
Gears of War 3 (Xbox 360) | 0.515 | 1.677
Grand Theft Auto IV (Sony Playstation 3, 2008) | 0.147 | 0.901
Grand Theft Auto IV (Xbox 360, 2008) | 0.155 | 0.486
Halo: Reach (Xbox 360, 2010) | 0.071 | 0.197
Kinect Sensor with Kinect Adventures (Xbox 360, 2010) | 1.870 | 1.310
Madden NFL 12 (Sony Playstation 3, 2011) | -0.217 | -0.651
Madden NFL 12 (Xbox 360, 2011) | 0.185 | 0.912
Mario Kart 64 (Nintendo 64, 1997) | 0.166 | 0.783
Mass Effect 3 (Sony Playstation 3, 2012) | 0.614 | 2.572
Mass Effect 3 (Xbox 360, 2012) | 0.894 | 1.844
NBA 2K12 (Xbox 360, 2011) | 1.451 | 3.779
New Super Mario Bros. (Nintendo DS, 2006) | -0.083 | -0.322
New Super Mario Bros. (Wii, 2009) | 0.375 | 1.039
Pokemon Black & White Version (Nintendo DS, 2011) | 0.006 | 0.012
Pokemon Diamond & Pearl Version (Nintendo DS, 2007) | 1.116 | 1.193
Pokemon Gold & Silver Version (Nintendo Game Boy Color, 2000) | 0.241 | 0.436
Pokemon Red & Blue Version (Nintendo Game Boy, 1998) | -0.890 | -1.204
Pokemon Ruby & Sapphire Version (Nintendo Game Boy Advance, 2003) | 0.034 | 0.153
The Elder Scrolls V: Skyrim (Xbox 360, 2011) | 0.221 | 0.294
Super Smash Bros. (Nintendo 64, 1999) | -0.119 | -0.591
Super Smash Bros. Brawl (Wii, 2008) | -1.362 | -1.517
Super Smash Bros. Melee (Nintendo GameCube, 2001) | 0.576 | 1.720
Super Mario 64 (Nintendo 64, 1996) | 0.774 | 2.054
Super Mario Galaxy (Wii, 2007) | 0.562 | 1.454
Uncharted 3: Drake's Deception (Sony Playstation 3, 2011) | 0.447 | 1.666
The Legend of Zelda: Ocarina of Time (Nintendo 64, 1998) | -0.267 | -1.035

VI. Discussion

A. Analysis of Coefficients

In our final regression, we have statistically significant values for the four regressors
we focused on: log-rating, adjusted starting price, number of bidders, and quantity. The
values of the coefficients are, respectively, 0.368, 0.329, 0.850, and -0.079 [See Table 2].
The rating coefficient suggests that for a 1% increase in rating, we have around a 0.4 cent
increase in final price. The starting price coefficient suggests that for every additional
dollar in the initial price, the ending price will increase by around 33 cents. Likewise, for
each additional bidder we expect an 85 cent increase in final price. Finally, for every
additional unit sold we estimate around an 8 cent decrease in final price.
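The arithmetic behind these interpretations can be checked with a short calculation. The coefficients are the ones reported above; the sketch assumes the rating enters the model as a natural logarithm and that prices are measured in dollars, and the function name `rating_effect` is introduced here only for illustration.

```python
import math

BETA_LOG_RATING = 0.368   # dollars per unit of log rating
BETA_START = 0.329        # dollars per dollar of adjusted starting price
BETA_BIDDERS = 0.850      # dollars per additional bidder
BETA_QUANTITY = -0.079    # dollars per additional unit sold


def rating_effect(pct_increase):
    """Predicted change in final price (dollars) for a given % rise in rating."""
    return BETA_LOG_RATING * math.log(1 + pct_increase / 100.0)


# A 1% rise in rating changes log rating by ln(1.01), roughly 0.00995.
print(round(100 * rating_effect(1), 2))  # 0.37 (cents)
```

The other three coefficients are in levels, so they read off directly as dollars per unit: about 33 cents per dollar of starting price, 85 cents per bidder, and minus 8 cents per unit sold.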
The coefficients on log-rating and number of bidders are quite unsurprising: more
reputable sellers get higher prices and more bidders drive up the final price. In
"Pennies from eBay: the Determinants of Price in Online Auctions" (Lucking-Reiley et al.),
the authors found that starting price was highly correlated with final price, so our
coefficient on starting price is also as expected. However, this starting price
coefficient is probably contaminated by the fact that more expensive items tend to have
greater starting prices.
Interestingly, we find a slight negative correlation between equilibrium quantity
and final price. It is important to note that we cannot use the quantity relationship to
sketch out demand and supply relations; it is very likely that quantity is correlated with
the popularity of a game, which is also correlated with final price. Additionally, shifts in
either demand or supply will affect both final price and quantity. However, it is notable
that the magnitude of this coefficient is very small; this suggests that the price a game is
sold at is functionally unaffected by how large the market for that game is. Though this is by
no means solid evidence, it is consistent with highly elastic demand. We obtained very
similar results when regressing on quantity supplied instead of equilibrium quantity
(see the data refinement section for details on the distinction).
Also very notable is that the interaction terms between condition and log rating
are significant and negative (-0.165, -0.273, -0.353, and -0.203 for Like-New, Very-Good,
Good, and Acceptable, respectively) [See Table 2]. Since these are all relative to the
baseline of Brand-New, these negative interaction terms imply that seller rating matters
more for higher-quality goods, with the exception of Acceptable goods. This agrees with
our theoretical predictions; the quality premium that buyers pay is magnified when the
quality is higher.
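One way to read the interaction terms is to add each one to the baseline log-rating coefficient, giving the implied slope on log rating for each condition. The sketch below does this with the point estimates quoted above; it assumes that the 0.368 coefficient and the interaction terms come from the same regression, and it ignores standard errors.

```python
BASE_SLOPE = 0.368  # coefficient on log rating for Brand-New (baseline) items
INTERACTIONS = {
    "Like-New": -0.165,
    "Very-Good": -0.273,
    "Good": -0.353,
    "Acceptable": -0.203,
}


def rating_slope(condition):
    """Implied effect of one unit of log rating on final price (dollars)."""
    return BASE_SLOPE + INTERACTIONS.get(condition, 0.0)


for cond in ["Brand-New", "Like-New", "Very-Good", "Good", "Acceptable"]:
    print(cond, round(rating_slope(cond), 3))
# Brand-New 0.368
# Like-New 0.203
# Very-Good 0.095
# Good 0.015
# Acceptable 0.165
```

The implied slope falls steadily from Brand-New to Good and then rises again for Acceptable, which is the exception noted in the text.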
Given the magnitudes of final price we are dealing with in our data, such results
are not economically insignificant. For example, if two different sellers have ratings that
are around 300 and 2400 (not an unusual occurrence), our model suggests that the latter
will get a final price that is $1.20 greater than the former. Given that games sell for around
30 dollars, this is a sizable difference.
B. Relationship between Number of Bidders and Volatility of Final Price

When we regress for each game title independently, we find that some of the
game titles have a statistically significant correlation between log rating and final price
while others do not. One possible explanation for this is the number of bidders that
participated in auctions for a particular title. If there are fewer people participating in the
auction, then there is a higher chance that the listing will not reach the equilibrium price.
Additionally, due to variation in the bidders' demand, fewer bidders implies
higher volatility in the final price. The volatility transfers to the variance of the error term,
which directly increases the variance of the coefficient estimates. Thus, we conjecture that
the game titles without significant correlations did not have enough participating bidders to
reduce the volatility of the final price. To check the validity of this hypothesis, we plotted
the volatility of final price and the number of bidders for each game against the significance
of the log-rating and final price correlation [See Figures 4 & 5]. The two regressions for
the p-value against the volatility of final price and the p-value against the number of
bidders both result in a p-value of less than 0.05. However, it should be noted that this
regression only involves 37 data points and thus is not terribly robust.
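The two per-game quantities used in this check, the volatility of final price (standard deviation divided by mean) and the mean number of bidders, can be computed as in the sketch below. The field names and the sample listings are hypothetical, and the sample standard deviation is an assumption, since the text does not say which variant was used.

```python
from statistics import mean, stdev


def volatility_and_bidders(listings):
    """Return {game: (SD / mean of final price, mean number of bidders)}.

    Each game needs at least two listings for the standard deviation.
    """
    by_game = {}
    for item in listings:
        by_game.setdefault(item["game"], []).append(item)
    summary = {}
    for game, rows in by_game.items():
        prices = [r["final_price"] for r in rows]
        bidders = [r["bidders"] for r in rows]
        summary[game] = (stdev(prices) / mean(prices), mean(bidders))
    return summary


# Made-up listings, for illustration only.
sample_listings = [
    {"game": "Game A", "final_price": 20.0, "bidders": 4},
    {"game": "Game A", "final_price": 30.0, "bidders": 6},
    {"game": "Game B", "final_price": 25.0, "bidders": 2},
    {"game": "Game B", "final_price": 25.0, "bidders": 3},
]
for game, (cv, avg_bidders) in volatility_and_bidders(sample_listings).items():
    print(game, round(cv, 3), avg_bidders)
```

Each game then contributes one point to a plot of the absolute t-statistic of its log-rating coefficient against either quantity.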

Figure 4: Significance of Log-Rating Coefficient vs. Number of Bidders
[Axes: |t-Statistic| of Log Rating Coefficient vs. Mean Number of Bidders.]

Figure 5: Significance of Log-Rating Coefficient vs. Volatility in Price
[Axes: |t-Statistic| of Log Rating Coefficient vs. SD / Mean of Final Price.]

C. Directions for Future Research

One of our limiting factors was the number of days we could collect data on. We
used a web-crawler to get our data, so we could only get as much data as eBay still had
publicly available on its website, which was a little over 2 weeks of completed auctions.
Given that the quantity vs. price data gave us interesting results, it might be worth the
effort to collect such information over time spans longer than half a month. Also, we did
not take into account the presence and quality of the listing pictures, nor the quality of the
written description of the item, in our regression because we could not quantify
such data. In the future, we could possibly devise some rough metric of
listing quality to account for in the regression.
Internal to the video game market, we could account for factors such as the age of
the video game, the number of copies sold, and the demographics of the game's target
audience. These factors will affect the supply in the secondary market. Additionally, we
could have accounted for the critics' rating of the game, the console the game was for, as
well as other qualities such as online play or multiplayer capabilities. These factors
certainly affect the resale value of the game, but they are difficult to quantify.
On a slightly larger scale, this type of analysis could be extended to markets
outside of video games. We used video games for their standardization among copies of the
same game and significant differences among varying games. Other products might
fulfill these roles as adequately as video games, or even more so, and should be looked
into.

VII. Conclusion

Through our research, we have found that bidders do in fact care about seller
ratings; people are willing to pay a reputation premium in the form of a higher price for
items listed by sellers with good reputations. We also found a positive correlation between
the condition of the products and the final price. Interestingly enough, the interaction terms
between seller ratings and the conditions had significant negative coefficients. This
indicates that bidders care more about seller rating for claims of higher quality; this
reputation premium is magnified in high-quality items.
The number of bidders has two main effects: increasing the final price and
decreasing the volatility of the final price. Both of these effects are predicted by auction
theory and present in our data. As more people bid, the price converges to the maximum
price in the eBay secondary market; this causes both an increase in final price and a
decrease in the volatility. This decrease in the variance of the final price can help account
for the fact that only some of the game titles have a significant correlation between log
rating and the final price. We found that games with a high number of bidders tended to
have highly significant coefficients on log rating.
We also found that other variables such as starting price, quantity sold, and
duration all have significant correlations with the final price, which is consistent with our
predictions as well as the results from existing literature. However, none of these results
are conclusive, as they may be contaminated by other factors. For example, the positive
correlation between starting price and the final price may only be because people
list more expensive games with higher starting prices.


Bibliography
Bajari, Patrick & Hortaçsu, Ali, "Economic Insights from Internet Auctions," Journal of
Economic Literature, American Economic Association, vol. 42 (2004): 457-486.
Cabral, Luís M. B. & Hortaçsu, Ali, "The Dynamics of Seller Reputation: Theory and
Evidence from eBay," CEPR Discussion Papers 4345 (2004).
Elmaghraby, Wedad, "The Effect of Asymmetric Bidder Size on an Auction's
Performance: Are More Bidders Always Better?" Management Science,
vol. 51 (2005): 1763-1776.
Lucking-Reiley, David & Bryan, Doug & Prasad, Naghi & Reeves, Daniel, "Pennies
from eBay: the Determinants of Price in Online Auctions," Econometric Society
World Congress 2000 Contributed Papers (2000).
Melnik, Mikhail I. & Alm, James, "Does a Seller's Ecommerce Reputation Matter?
Evidence from eBay Auctions," Journal of Industrial Economics, Wiley
Blackwell, vol. 50 (2002): 337-349.


VIII.

Appendix A Web Crawler Python Code


#!/usr/bin/python
# eBay web crawler
# Eric Guan
# Usage: $ ./spider.py "http://BLAHBLAHBLAH" [OutputName]
#        "http://BLAH2" [OutputName2] ...
import sys
import re
import urllib2
from BeautifulSoup import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) '
                         'AppleWebKit/535.1 (KHTML, like Gecko) '
                         'Chrome/14.0.835.202 Safari/535.1'}

# Returns the first decimal number (e.g. "12.50") found in the string
def extractPrice(string):
    pricere = re.compile(r"\d+\.\d*")
    return pricere.search(string).group(0)

# Returns the first run of digits found in the string
def extractInt(string):
    intre = re.compile(r"\d+")
    return intre.search(string).group(0)

# Returns the first whitespace-delimited token in the string
def extractFirst(string):
    firstre = re.compile(r"\S+")
    return firstre.search(string).group(0)

# Downloads a page and parses it; returns None on HTTP or URL errors
def URLtoSoup(url):
    try:
        request = urllib2.Request(url, headers=headers)
        handle = urllib2.build_opener()
    except IOError:
        return None
    if handle:
        try:
            content = unicode(handle.open(request).read(), "utf-8",
                              errors="replace")
            soup = BeautifulSoup(content)
        except urllib2.HTTPError, error:
            if error.code == 404:
                print "ERROR: %s -> %s" % (error, error.url)
            else:
                print "ERROR: %s" % error
            soup=None
        except urllib2.URLError, error:
            print "ERROR: %s" % error
            soup=None
        return soup

# Collects the URLs of all listings on a search results page,
# following the 'Next' button recursively
def Penny(url):
    soup = URLtoSoup(url)
    urls = []
    next=False
    # class="vip" indicates listings
    for a in soup('a',{'class':'vip'}):
        urls.append(a['href'])
    # class="botpg-next" indicates the 'Next' button on the bottom of
    # the page.
    # If the url doesn't begin with 'http', then the link is dead.
    for td in soup('td',{'class':'botpg-next'}):
        for td1 in td(href=True):
            tdurl = td1['href']
            if ((tdurl[:4] == "http") and (tdurl!=url)):
                urls.append(tdurl)
                next=True
            if (tdurl[:4] == "/ccg"):
                urls.append("http://www.ebay.com"+tdurl)
                next=True
    # The last entry in urls is the next results page, not a listing
    if (next):
        temp = urls[:-1]
        temp.extend(Penny(urls[-1]))
        urls = temp
    return urls

# Scrapes one listing page; appends bid-history data when available
def Nickel(url):
    soup = URLtoSoup(url)
    #price
    price = "ASDF"
    for span in soup('span',{'itemprop':'price'}):
        price = extractPrice(span.string)
    #shipping cost
    shippingcost = None
    #doesn't work for variable shipping cost cases
    for span in soup('span',{'class':'vi-is1-sh-srvcCost vi-is1hideElem vi-is1-showElem'}):
        shippingcost = extractPrice(span.string) #May need better parsing
    #FREE
    for span in soup('span',{'class':'vi-is1-tese'}):
        shippingcost = span.string.strip()
    #seller rating
    sellerrating = -1
    for a in soup('a',{'class':'mbg-fb'}):
        sellerrating = int(a.contents[1])
    #best-seller medal
    topseller = (len(soup('b',{'class':'mbg-ts'})) > 0)
    #condition
    condition = None
    for span in soup('span',{'class':'vi-is1-condText'}):
        condition = span.string.strip()
    data = [ price, shippingcost, sellerrating, topseller, condition]
    #bidding info link
    bidurl = None
    #doesn't work for unsold goods yet. It would go to the original
    #listing link, and then to the # bids link.
    for span in soup('span',{'itemprop':'offers'}):
        if (str(span.a) != "None"):
            bidurl = span.a['href']
            data.extend(Dime(bidurl))
        else:
            print "No bidding info link found."
    return data

# Scrapes the bid-history page of a listing
def Dime(url):
    soup = URLtoSoup(url)
    # data will contain:
    # [ Buy-It-Now, Starting Bid, Bidders, Bids, Time Ended, Duration ]
    #Buy-it-Now
    BiN = False
    for img in soup('img',{'alt':'Buy It Now'}):
        BiN = True
    #Starting Bid
    startingbid = None
    cvf = soup('td',{'align':'left','class':'contentValueFont','style':'color:#666'})
    if (len(cvf)>0):
        startingbid = extractPrice(cvf[-1].string)
    data = [BiN, startingbid]
    # bidders, bids, time ended, duration
    for dat in soup('span',{'class':'titleValueFont'}):
        da = dat.string.strip()
        # Short strings are counts; longer ones are dates or durations
        if (len(da)<15):
            d = extractInt(da)
        else:
            d = extractFirst(da)
        data.append(d)
    return data

# ==========================
# Arguments come in (search URL, output name) pairs
name="NAME"
n = len(sys.argv)
for i in range((n-1)/2):
    penny = Penny(sys.argv[2*i+1])
    name = sys.argv[2*i+2]
    print penny
    print len(penny)
    print "================="
    data=[]
    for p in penny:
        nickel = Nickel(p)
        print nickel
        # Keep only listings for which bid-history data was found
        if (len(nickel) >= 7):
            data.append(nickel)
    # One CSV file per search URL, one row per listing
    f = open(name+".csv",'w')
    for nick in data:
        for item in nick:
            f.write("%s, " % item)
        f.write("\n")
    f.close()
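
The three regular-expression helpers at the top of the crawler do most of the text parsing, so it may help to see what they return. The following is a minimal Python 3 sketch of the same patterns, not the authors' code: the snake_case names and the sample strings are hypothetical, and unlike the originals (which raise an exception when nothing matches) these versions return None.

```python
import re

# Same patterns as extractPrice, extractInt and extractFirst in the crawler.
PRICE_RE = re.compile(r"\d+\.\d*")
INT_RE = re.compile(r"\d+")
FIRST_RE = re.compile(r"\S+")

def _first_match(pattern, text):
    """Return the first match of pattern in text, or None if there is none."""
    match = pattern.search(text)
    return match.group(0) if match else None

def extract_price(text):
    """First number containing a decimal point, e.g. '12.50'."""
    return _first_match(PRICE_RE, text)

def extract_int(text):
    """First run of digits."""
    return _first_match(INT_RE, text)

def extract_first(text):
    """First whitespace-delimited token."""
    return _first_match(FIRST_RE, text)

# Hypothetical sample strings, chosen only to show the behaviour.
print(extract_price("US $12.50"))               # 12.50
print(extract_int("7 bids"))                    # 7
print(extract_first("May-20-12 18:03:11 PDT"))  # May-20-12
print(extract_price("FREE"))                    # None
```

Note that the price pattern requires a decimal point, so a string with a whole-dollar amount and no point would not match.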


IX.

Appendix B: Stata DO File

* ============= Data Cleaning ===============*


clear
* =============== Merged =============== *
local i=0
cap erase merged.dta
local files : dir . files "*.csv"
foreach f of local files {
    drop _all
    insheet using "`f'"
    rename v1 finalprice
    rename v2 shippingcost
    rename v3 rating
    rename v4 bestsellermedal
    rename v5 condition
    rename v6 buyitnow
    rename v7 startingprice
    rename v8 bidders
    rename v9 bids
    rename v10 datecompleted
    rename v11 duration
    if `i'==0 {
        gen gamestring = "`f'"
    }
    if `i'>0 {
        cap append using merged, gen(temp)
        replace gamestring = "`f'" if temp == 0
        disp "`f'"
    }
    save merged, replace
    local i=1
}
cap drop temp
replace gamestring = substr(gamestring, 1, length(gamestring[_n]) - 4)
* * =========== Variable Generation / Refinement =============== *
* Destrings various variables with "none" and otherwise numeric values
destring shippingcost, gen(shipping) force
destring startingprice, replace force
* Fixes bestsellermedal dummy variable
replace bestsellermedal = "1" if bestsellermedal == "True"
replace bestsellermedal = "0" if bestsellermedal == "False"
destring bestsellermedal, replace
* Fixes buyitnow dummy variable
replace buyitnow = "1" if buyitnow == "True"
replace buyitnow = "0" if buyitnow == "False"
destring buyitnow, replace


* Drop bad data


drop if buyitnow == 1
drop if finalprice > 75
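* Generates quantity supplied before taking out unsold listings
* (date and game do not exist yet at this point; with Stata's default variable
* abbreviation they resolve to datecompleted and gamestring)
bysort date game: gen quantitysupplied = _N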
* Drops unsold listings
drop if bidders == 0
* Throw out listings with variable shipping cost
replace shipping = 0 if (shippingcost == "FREE")
drop if (shippingcost == "None")
replace startingprice = 0 if (startingprice == .)
* Formats the date variable
gen date = date(datecompleted, "MD20Y")
format date %d
* Generates a few variables
gen price = finalprice + shipping
gen startprice = startingprice + shipping
gen lograting = log(rating)
replace lograting = 0 if rating == 0
* Generates quantity sold
bysort date game: gen quantity = _N
* Encodes game categories into numeric factors
encode gamestring, gen(game)
* Generates dummies for conditions
replace condition = "1Brand New" if condition == "Brand New"
replace condition = "2Like New" if condition == "Like New"
replace condition = "3Very Good" if condition == "Very Good"
replace condition = "4Good" if condition == "Good"
replace condition = "5Acceptable" if condition == "Acceptable"
quietly tab condition, gen(condition)
* Labels a few variables
label variable datecompleted "Date (Raw)"
label variable gamestring "Game Names - String (Raw)"
label variable shippingcost "Shipping Cost (Raw)"
label variable finalprice "Final Price (Raw)"
label variable startingprice "Starting Price (Raw)"
label variable condition "Condition - String (Raw)"
label variable rating "Seller Rating"
label variable buyitnow "Buy-It-Now (Dummy)"
label variable bidders "Number of Bidders"
label variable bids "Number of Bids"
label variable duration "Duration"
label variable bestsellermedal "Best Seller Medal"
label variable date "Date"
label variable condition "Condition - Factor"
label variable game "Game Names - Factor"
label variable shipping "Numeric Shipping Cost"
label variable price "Adjusted Final Price"
label variable startprice "Adjusted Starting Price"


label variable lograting "Logarithm of Seller Rating"


label variable quantity "Quantity Sold"
label variable quantitysupplied "Quantity Supplied"
* Add game ratings to each game
gen gamerating = 0
replace gamerating = -100 if (gamestring == "masseffect3(xbox360)")
save merged, replace

* ============= Regressions ===============*


* LaTeX output
sutex, label
latabstat bidders, by(game) stat(count mean sd)
* Regressions on individual games
eststo clear
quietly bysort gamestring: eststo: reg price condition2-condition5 date ///
    duration c.lograting#condition2-condition5 bestsellermedal quantity ///
    startprice bidders lograting, robust
* Outputs as a table - combine in excel and plot graph
est table _all, label title(Regression table\label{tab1}) stats(N) ///
    keep(lograting) ///
    b(%9.3f) t(%9.3f)
estout using IndivRegressionTable.tex, label keep(lograting) ///
    title(Regression table\label{tab1}) replace ///
    cells(b(star fmt(%9.3f)) se(par fmt(%9.3f) )) ///
    starlevels(* .10 ** 0.05 *** .01)
* Tabulates price and bidders by game
tabstat price, by(gamestring) s(count, mean, sd)
tabstat bidders, by(gamestring) s(count, mean, sd)
* ============= Main regressions ================ *
eststo clear
* Basic Regression (Log-Rating, Bidders, StartPrice, Quantity, BestSellerMedal,
* (F)Game)
eststo: reg price i.game bestsellermedal quantity startprice bidders ///
    lograting, robust
* Regression #2 (Log-Rating, Bidders, StartPrice, Quantity, BestSellerMedal,
* (F)Condition, (F)Game)
eststo: reg price i.game condition2-condition5 bestsellermedal quantity ///
    startprice bidders lograting, robust
* Regression #3 (Log-Rating, Bidders, StartPrice, Quantity, BestSellerMedal,
* Duration, (F)Date, (F)Condition, (F)Game)
eststo: reg price i.game condition2-condition5 i.date duration ///
    bestsellermedal quantity startprice bidders lograting, robust
* Full Regression (Log-Rating, Bidders, StartPrice, Quantity, BestSellerMedal,
* (F)Condition * Log-Rating, Duration, (F)Date, (F)Condition, (F)Game)
eststo: reg price i.game condition2-condition5 i.date duration ///
    c.lograting#condition2-condition5 bestsellermedal quantity startprice ///
    bidders lograting, robust
esttab using RegressionTable, tex label r2 noconstant ///
    title(Regression table\label{tab1}) replace ///
    cells(b(star fmt(%9.3f)) se(par fmt(%9.3f) )) ///
    starlevels(* .10 ** 0.05 *** .01)
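
As a cross-check of the cleaning rules in the DO file above, the shipping, starting-price and rating transformations can be sketched for a single listing in Python. This is only an illustration under stated assumptions: the function name clean_listing and its argument layout are hypothetical, the string codes "FREE" and "None" follow what the crawler writes, and a negative rating (the crawler's -1 placeholder) is mapped to None to mimic a Stata missing value.

```python
import math

def clean_listing(finalprice, shippingcost, startingprice, rating):
    """Return the derived variables for one listing, or None if it is dropped.

    Hypothetical helper mirroring a few steps of the DO file; it is not
    part of the original analysis code.
    """
    # Listings with variable shipping cost are recorded as "None": drop them.
    if shippingcost == "None":
        return None
    # Free shipping counts as a shipping cost of zero.
    shipping = 0.0 if shippingcost == "FREE" else float(shippingcost)
    # A missing starting price is treated as zero.
    start = 0.0 if startingprice is None else float(startingprice)
    # Natural log of the seller rating, with rating == 0 mapped to 0.
    if rating < 0:
        lograting = None  # log of a negative number is missing in Stata
    elif rating == 0:
        lograting = 0.0
    else:
        lograting = math.log(rating)
    return {
        "price": finalprice + shipping,    # adjusted final price
        "startprice": start + shipping,    # adjusted starting price
        "lograting": lograting,
    }

# Made-up example rows.
print(clean_listing(15.0, "3.99", "0.99", 100))
print(clean_listing(20.0, "FREE", None, 0))
print(clean_listing(20.0, "None", "0.99", 10))
```

The sketch leaves out the other filters in the DO file (Buy-It-Now listings, final prices above 75, and listings with no bidders), which would be applied before these transformations.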


X.

Appendix C: eBay Auction Interface

Figure A.1: Listing of Auctions of a Game

Figure A.2: Individual Auction of a Game



Figure A.3: Bid History of an Item
