In this paper, we present a method to support the analysis and visualization of market
structure by automatically eliciting product attributes and brands’ relative positions from online
customer reviews. First, we discover attributes and attribute dimensions using the “voice of the
consumer,” as reflected in customer reviews, rather than that of manufacturers. Second, the
approach runs automatically. Third, we support rather than supplant managerial judgment by
reinforcing or augmenting attributes and dimensions found through traditional surveys and focus
groups.
We test the approach on three different product domains: a rapidly evolving technology
market, a mature market, and a service good. We analyze and visualize results in several
different ways including comparisons to expert buying guides, a laboratory survey, and
Correspondence Analysis of automatically discovered attributes.
Key words: market structure analysis; online customer reviews; text mining
INTRODUCTION
Marketing research, the set of methods to collect and draw inferences from market-level
customer and business information, has been the lifeblood of the field of marketing practice and
the focus of much academic research in the last 30+ years. Simply put, marketing research, the
methods that surround it, and the inferences derived from it, have put marketing as an academic
discipline and as a functional area within the firm “on the map”. From a practical perspective,
this has brought forward the “stalwarts and toolbox” of the marketing researcher including
methods such as preference data collection via Conjoint Analysis (Green and Srinivasan 1978),
inferring market structure via multi-dimensional scaling (Elrod 1988; Elrod 1991; Elrod, Russell
et al. 2002), inferring market segments through clustering routines (DeSarbo, Howard et al.
1991), or simply understanding the sentiment and “voice of the customer” (Griffin and Hauser
1993). While these methods are here to stay, the radical changes wrought by the Internet and
user-generated media promise to fundamentally alter the data and collection methods that we use.
In this paper, we propose to harness the growing body of free, unsolicited, user-generated
online content for automated market research. Specifically, we describe a novel text-mining
algorithm for analyzing online customer reviews to facilitate the analysis of market structure in
two ways. First, the Voice of the Consumer, as presented in user-generated comments, provides
a simple, principled approach to selecting product attributes for market structure analysis (we
also discuss their use for conjoint studies). Traditional methods, by contrast, rely upon a
pre-defined set of attributes (external analysis) or induce attribute
dimensions from consumer surveys (internal analysis). Second, the preponderance of opinion as
represented in the continuous stream of reviews over time provides practical input to augment
traditional approaches (such as surveys or focus groups) for conducting brand sentiment analysis,
and can be done (unlike traditional methods) continuously, automatically, very inexpensively,
and in real-time.
Our focus on market structure analysis is not by chance, but rather due to its centrality in
marketing practice and its fit with text mining of user-generated content. Analysis of market
structure is a key step in the design and development of new products as well as the repositioning
of existing products (Urban and Hauser 1993). Market structure analysis describes the
substitution and complementary relationships between the brands (alternatives) that define the
market (Elrod, Russell et al. 2002). In addition to descriptive modeling, market structure
analysis is used for predicting marketplace responses to changes such as pricing (Kamakura and
Russell 1989), marketing strategy (Erdem and Keane 1996), product design, and new product
introduction (Srivastava, Alpert et al. 1984). Hence, if automated in a fast, inexpensive way (as
described here), it can have significant impact on marketing research and the decisions that follow from it.
There is a long history of research in market structure analysis. Approaches vary by the
type of data analyzed (panel-level scanner data, aggregate sales, consumer survey response) and
by the analytic approach (Elrod, Russell et al. 2002). Internal methods, characterized by
Multidimensional Scaling (MDS), simultaneously induce a market structure and those product
attributes that define substitutes and complements. By contrast, external methods, characterized
by conjoint analysis, begin with a pre-specified set of product attributes and
dimensions. Regardless of the use of an internal or external approach, with few exceptions,
approaches to market structure analysis begin with the same set of survey or transaction sales
data and assume that “all customers perceive all products the same way and differ only in their
evaluation of product attributes” (Elrod, Russell et al. 2002). Models that incorporate customer
uncertainty about product attributes (Erdem and Keane 1996) serve to highlight the colloquial
wisdom, “garbage in, garbage out.” Surprisingly, despite its importance, there is little extant
research to guide attribute selection for these methods (Wittink, Krishnamurthi et al. 1982).
There is literature on the sensitivity of conjoint results to changing attribute selection, omitting
an important attribute, level spacing, etc. (Green and Srinivasan 1978) but not much on how to
choose those attributes in the first place. We propose to fill this gap using automated analysis of
online customer reviews. Specifically, in this paper, we visualize market structure by using
correspondence analysis, a variant of multivariate data analysis used in external analysis, but we
apply it to product attributes mined from the “Voice of the Consumer” in an automated manner.
We note, however, that ours is far from the first use of user-generated content or even
specifically online reviews for the purposes of marketing action. The impact of customer
reviews on consumer behavior has long been a source of study. A large body of work explores
how reviews reflect or shape a seller's reputation (Eliashberg and Shugan 1997; Chevalier and
Mayzlin 2003; Dellarocas 2003; Ghose, Ipeirotis et al. 2006). Other researchers have studied
the implications of customer reviews for marketing strategy (Chen and Xie 2004). To stress its
importance and ubiquitous nature, a 2009 conference co-sponsored by the Marketing Science
Institute and the Wharton Interactive Media Initiative had more than 50 applicants, all doing work on
user-generated content.
With that said, there has been comparatively little work on what marketers might learn
from customer reviews for purposes of studying market structure. Early work combining
marketing and text mining focused on limited sets of attributes for purposes of analyzing price
premiums associated with specific characteristics (Archak, Ghose et al. 2007; Ghose and
Ipeirotis 2008; Ghose, Ipeirotis et al. 2009). By comparison, our objective is to learn the full
range of product attributes and attribute dimensions voiced in product reviews and to reflect that
in a visualization of market structure that can be used for marketing actions. Work that analyzes
online text to identify market competitors (Pant and Sheng 2009) focuses on the corporate level
and relies upon network linkages between Web pages and online news. Social networks have
also been used to segment customers for planning marketing campaigns, but that work relies upon
network structure rather than direct reference to user-generated comments (Hill, Provost et al.
2006). In this paper, we focus on the review content itself. The work closest to ours seeks to
automatically extract and visualize direct comparative relationships between product brands from
online blogs (Feldman, Fresko et al. 2007; Feldman, Fresko et al. 2008). Our work is
complementary in at least three ways. First, we present a simpler set of text-mining techniques
that is less dependent upon complex language processing and hand-coded parsing rules, requires
minimal human intervention (only in the analysis phase), and is better suited to customer reviews
(Hu and Liu 2004; Lee 2005). Second, our focus is on learning attributes as well as their
underlying dimensions and levels. Our goal in learning attributes, dimensions and levels is not
only to facilitate direct analysis but also to inform more traditional market structure and conjoint
analysis methods. Third, we specifically highlight the “Voice of the Consumer”. Different
customer segments, such as residential home users of personal computers versus hard-core
gamers may refer to the same product attribute(s) using different terminology (Randall,
Terwiesch et al. 2007). These subtle differences in vocabulary may prove particularly useful in
identifying consumer segments. In this paper, we describe an automated process for identifying
and analyzing online product reviews that is easily
repeatable over reviews for both physical products and services and requires minimal
human/managerial intervention. The process extends traditional market structure analysis in the
following ways:
- Provides a principled approach for selecting attributes for market structure analysis by
identifying what attributes customers are commenting on as well as the polarity of those
comments.
- Identifies not only what attributes customers are speaking about but also how they speak
about them. In particular, our approach identifies not only attributes but also elicits their
underlying dimensions and levels.
- Highlights the Voice of the Consumer. We can think of how a customer describes
product attributes not only in terms of attribute dimensions and levels but also in terms of
vocabulary (West, Brown et al. 1996). Both differences in granularity (dimensions and
levels) and differences in vocabulary can signal differences between consumer segments.
- Discovers attributes within user-generated comments that are not highlighted using more
traditional techniques for eliciting attributes and dimensions, such as those used to create
expert buying guides. Several attributes discovered within our approach are meaningful,
as determined in a follow-up survey conducted to assess the degree to which these
attributes were important to consumers.
- Facilitates market structure analysis over time by enabling the periodic (re)estimation of
attributes, dimensions, and brand positions.
- Supports rather than supplants traditional internal (perceptual mapping) and external
(conjoint analysis) approaches for market structure analysis by suggesting attributes and
dimensions in addition to those that emerge from traditional focus groups and surveys.
- Possesses high face validity in attribute and dimension selection for practical significance
because marketing managers are familiar with and have easy access to online reviews.
In the remainder of this paper, we provide an overview of our technical approach, revisit the
approach in detail, and analyze and evaluate the approach on three different product domains.
OVERVIEW
We first describe each step of our process in an “informal heuristic way,” as many of these techniques are new to the marketing audience; we
then revisit each step in greater detail later for those who would like to replicate our approach.
The process begins with a set of online reviews in a product category over a specified time
frame. For example, in this paper, we consider the reviews for all digital cameras available at
Epinions.com as of July 5, 2004. Figure 1, Step 1 shows three reviews, one for a camera
manufactured by Olympus, one for a camera by HP (Hewlett Packard), and one for a camera by
Fuji. In Step 2, screen scraping software automatically extracts details from each review
including the brand and a list of Pros and a list of Cons. Our goal is to group all phrases
discussing a common attribute into one or more clusters to reveal what customers are saying
about the product space and how they say it. While some review sites do not provide user-
authored Pro and Con summaries (e.g. Amazon.com), many others including Epinions.com,
BizRate, and CNet do (Hu and Liu 2004). Exploiting the structure provided by Pro and Con lists
spares us the harder problem of extracting attribute phrases from prose-like text. This allows our
process to be “automated,” whereas extant approaches require substantially more manual processing.
All of the Pros and Cons are then separated into individual phrases as depicted in column
1 of the table in Figure 1, Step 3. Preprocessing transforms the phrases from column 1 into a
normalized form. Column 2 depicts one step of pre-processing, the elimination of uninformative
stop-words such as articles (“the”) and prepositions (“of”). Each phrase is then rendered as a
word vector. The matrix of word vectors is depicted in the remaining columns of the table in
Step 3. Each row is the vector for one phrase. Each vector index represents a word and the
vector value records a weighted, normalized count of the number of times that the indexed word
appears in the phrase.
Phrases are automatically grouped together based upon their similarity. Similarity is
measured as the cosine of the angle between word vectors, a standard similarity measure in
text-mining research. Step 4 depicts the K-means (albeit other
algorithms can be easily used) clustering of the phrases from Step 1. Conceptually, we may
think of each cluster representing a different product attribute. The example shows clusters for
several attributes. Engineering design commonly decomposes a
product into components (Ulrich and Eppinger 2003). In the same way, Step 5 depicts the
hierarchical decomposition of a product attribute into its constituent dimensions (i.e. attribute
levels a la conjoint analysis). In Step 5a, we show a conceptual decomposition of the digital
camera product attribute “memory.” In Step 5b, we show an actual decomposition using only
the phrases from Step 1. The decomposition is treated as a linear programming assignment
problem (Hillier and Lieberman 2001). The objective is to assign each word in the attribute
cluster to an attribute dimension. Each word phrase defines a constraint on the assignment: any
two words that co-occur in the same phrase cannot be assigned to the same attribute dimension.
Thus, we know that “smart” and “media” cannot appear as a value for the attribute dimension
quantity (4, 8, 16) or for the attribute dimension of memory unit (“mb”). Note that not all
phrases include a word for every dimension. Intuitively, this is both reasonable and an important
aspect of capturing the Voice of the Consumer. We wish to know not only what customers say
but also how (at what level of detail) they say it. For the algorithm, phrases that do not include a
word for each dimension simply represent a smaller set of co-occurrence constraints than a phrase that includes a word for every dimension.
In this section we revisit various steps in the process described above in greater detail.
Specific algorithms and pseudo-code describing the implementation appear in the appendix.
Step 1 simply involves identifying a source for reviews and identifying the product (or set
of products) that we wish to analyze. For Step 2, we wrote a program in the Python
programming language. For each review, we get the product identifier used by Epinions.com to
uniquely identify a product, the list of Pros, and the list of Cons. Product brand names are
excerpted from the Epinions.com product identifier and inserted into a MySQL database. It is
also important to note that one could separate the selected reviews by pre-defined segments (i.e.,
demographic clusters, and hence produce segment-level Pro-Con lists that would be analyzed
distinctly), or attempt (though this is beyond the scope of this research) to simultaneously infer latent segments.
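For illustration, Step 2 can be sketched as follows. The record fields ("product_id", "pros", "cons") are assumptions for exposition, not the actual Epinions.com page structure, and the comma-splitting convention is likewise hypothetical.

```python
# Illustrative sketch of Step 2: reduce one scraped review record to a
# brand name plus its Pro and Con phrase lists. Field names and the
# comma-separated phrase format are assumptions for illustration.

def parse_review(record):
    """Reduce one scraped review to (brand, pro phrases, con phrases)."""
    # Brand taken as the leading token of the product identifier,
    # e.g. "Olympus_C-3000" -> "Olympus".
    brand = record["product_id"].split("_")[0]
    pros = [p.strip() for p in record["pros"].split(",") if p.strip()]
    cons = [c.strip() for c in record["cons"].split(",") if c.strip()]
    return brand, pros, cons

review = {
    "product_id": "Olympus_C-3000",
    "pros": "great pictures, compact body",
    "cons": "only 8 mb smart media card included",
}
brand, pros, cons = parse_review(review)
```

Each resulting (brand, phrase) pair is then stored for the phrase-level processing of Step 3.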
To construct the matrix of word vectors, we focus on each phrase. For now, we do not
distinguish between whether a phrase appears as a Pro or a Con, focusing only on grouping
together those phrases that discuss a common product attribute. Later, we will see whether
looking at Pros versus Cons separately provides differentiated market structure, an important
idea in that marketing managers may be interested in understanding market structure on the
“positive side” versus market structure on the “negative side” (i.e. we have the pluses of our
competitors but don’t share their negatives). As a standard preprocessing step in text mining,
we normalize words as follows. Delete all stop-words and stem the remaining text (Salton and
McGill 1983). Stop-words, like grammatical articles, conjunctions, prepositions, etc. are
meaningless for purposes of product attribute identification so they are removed. For example,
after pruning, the phrase "Only 8 mb Smart media card included" becomes "8 mb Smart media
card included." Reduce words to their root form by stemming. We use the Porter stemmer to
find equivalences between singular, plural, past and present tense forms of individual words used
by customers. Thus, "includes" and "included" are both reduced to the root "includ."
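The normalization step can be sketched as follows. This is a toy illustration: the stop-word list is abbreviated, and the suffix-stripper only gestures at the Porter stemmer we actually use.

```python
# Toy sketch of phrase normalization: stop-word removal plus a naive
# suffix-stripping stemmer. The real process uses a full stop-word list
# and the Porter stemmer.

STOP_WORDS = {"only", "the", "a", "an", "of", "and", "with", "for"}

def naive_stem(word):
    # Crude stand-in for the Porter stemmer: strip common suffixes.
    for suffix in ("ings", "ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def normalize(phrase):
    tokens = [t.lower() for t in phrase.split()]
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]
```

Applied to the example above, `normalize("Only 8 mb Smart media card included")` drops "Only" and reduces "included" to "includ".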
Each normalized phrase is then represented as a vector in the word space of the review corpus, so that the word vectors for all phrases, taken together, form a matrix. Every entry
(i, j) in the matrix measures the importance of word j in characterizing or defining the product
attribute discussed by phrase i. Every row of the matrix represents the corresponding phrase in
the vector space of words. The matrix values for each word are based upon word frequency and
derived from the TF-IDF (Term Frequency-Inverse Document Frequency) standard (Salton and
McGill 1983). Specifically, terms describing product attributes and sentiments tend to exhibit
unusually high frequencies in reviews. By considering reviews from unrelated product domains
(to form a control condition, i.e. a null importance matrix), we extend the TF-IDF measure to discount terms that are frequent across unrelated domains.
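The basic TF-IDF weighting (without our cross-domain extension) can be sketched as:

```python
import math
from collections import Counter

# Minimal TF-IDF sketch over normalized phrases (token lists). Our
# cross-domain extension (the null importance matrix) is omitted here.

def tfidf_matrix(phrases):
    n = len(phrases)
    df = Counter()                      # document frequency of each word
    for tokens in phrases:
        df.update(set(tokens))
    vocab = sorted(df)
    rows = []
    for tokens in phrases:
        tf = Counter(tokens)            # term frequency within the phrase
        rows.append([tf[w] * math.log(n / df[w]) for w in vocab])
    return vocab, rows

# Hypothetical mini-corpus of three normalized phrases
vocab, rows = tfidf_matrix(
    [["8", "mb", "card"], ["battery", "life"], ["mb", "card", "includ"]]
)
```

Words appearing in every phrase receive weight zero, while words concentrated in few phrases are up-weighted.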
Intuitively, our objective is now to cluster the vectors (the matrix rows) so that all phrases
describing the same product attribute are grouped together. More formally, given the phrase
word matrix over the set of I phrases and the set of words J, we seek to separate phrases into
a set C of k mutually exclusive and exhaustive clusters. We use the cosine measure of angular
distance between vectors to calculate similarity. The cosine measure is then applied to the
phrase word matrix using the K-means clustering algorithm. While any number of clustering
algorithms is acceptable, we selected K-means for its simplicity and its familiarity to both the
text-mining and marketing communities. More complex topic clustering algorithms like
Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA) also require
parameter estimation, and marketing approaches such as Latent Structure MDS (LSMDS) begin
with a pre-defined set of product attributes. As noted earlier, the principal contribution of our
approach is that it discovers those attributes automatically.
The quality, QC, of a K-means clustering, C, is calculated from the sum of the distances
from each vector in a cluster to that cluster's centroid. Following (Zhao and Karypis 2002), this
metric is more simply defined as the sum of the lengths of the clusters' composite vectors:

QC = Σ_{c ∈ C} ‖ Σ_{v ∈ c} v ‖     (1)
Because K-means is known to be extremely sensitive to its initial conditions, we repeat the
algorithm ten times, beginning with a new, random set of k centers and pick the solution that
maximizes QC.
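The clustering step, with restarts scored by QC, might be sketched as follows. This is a minimal pure-Python illustration under the assumptions stated in the comments, not our production implementation.

```python
import math
import random

# Pure-Python sketch of the clustering step: K-means over unit-length
# phrase vectors with cosine similarity, restarted from random centers,
# keeping the run that maximizes QC (sum of composite-vector lengths).

def unit(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def kmeans_cosine(vectors, k, iters=20):
    vectors = [unit(v) for v in vectors]
    centers = [list(v) for v in random.sample(vectors, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            # assign to the center with the largest cosine similarity
            best = max(range(k),
                       key=lambda c: sum(a * b for a, b in zip(v, centers[c])))
            clusters[best].append(v)
        centers = [unit([sum(col) for col in zip(*cl)]) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    # QC: sum over clusters of the length of the composite (summed) vector
    qc = sum(math.sqrt(sum(s * s for s in (sum(col) for col in zip(*cl))))
             for cl in clusters if cl)
    return clusters, qc

def best_of_restarts(vectors, k, restarts=10):
    return max((kmeans_cosine(vectors, k) for _ in range(restarts)),
               key=lambda res: res[1])
```

The `best_of_restarts` output corresponds to the clustering retained at the end of Step 4.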
A critical step in our approach is to discover not only what product attributes customers
are discussing but also how they say it. More particularly, we answer the question of how by
discovering the attribute dimensions that customers use in their reviews. To discover attributes,
we assume that each phrase corresponds to a distinct product attribute. To discover attribute
dimensions, we will assume that each word in the phrase corresponds to a distinct dimension.
Discovering dimensions then reduces to the assignment of particular words to attribute dimensions. Extending our
previous notation slightly, assume a set of phrases I composed from the set of words J and a set
of attribute dimensions D. We have |J| × |D| binary decision variables Xjd, where Xjd is 1 if word j
is assigned to dimension d and 0 otherwise, and Yij is 1 or 0 depending upon whether word j
appears in phrase i. Thus, our objective is to:

max Σ_{j ∈ J} Σ_{d ∈ D} Xjd
s.t. Σ_{j ∈ J} Yij · Xjd ≤ 1   for all i ∈ I, d ∈ D
Xjd binary
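The flavor of the assignment can be conveyed with a greedy heuristic (not the exact CLP formulation we solve): each word is placed on the first dimension containing no word it co-occurs with. The phrases and dimension count below are hypothetical.

```python
# Greedy illustration of the word-to-dimension assignment (the paper
# solves this exactly as a CLP): place each word on the first dimension
# that contains no co-occurring word. Example phrases are hypothetical.

def assign_dimensions(phrases, n_dims):
    co = {}                              # word -> set of co-occurring words
    for phrase in phrases:
        for w in phrase:
            co.setdefault(w, set()).update(x for x in phrase if x != w)
    dims = [set() for _ in range(n_dims)]
    for w in sorted(co):                 # deterministic order
        for d in dims:
            if co[w].isdisjoint(d):      # no co-occurrence conflict
                d.add(w)
                break
    return dims

phrases = [["8", "mb", "smart", "media"], ["16", "mb"], ["4", "mb", "card"]]
dims = assign_dimensions(phrases, 4)
```

Here the co-occurrence constraints separate the quantities (4, 8, 16), the unit ("mb"), and the media-type words into distinct dimensions, as in the memory example above.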
The graph partitioning algorithm used to set the parameters I, J, and D and the
constraint logic program (CLP) by which we solve the optimization are implemented in Python
and detailed in Appendix 1.2 and 1.3 respectively. Logical assignment is depicted in Table 1. A
maximal clique (see appendix) appears in the top row, the constraints represented by phrases of
normalized words appear in the middle column, and the corresponding assignment appears in the remaining cells of the table.
There are at least two ways of evaluating the quality and efficacy of approaches such as
ours. For a given product, we can “objectively” compare attributes that are automatically
discovered from customer reviews to an external reference standard (e.g. Consumer Reports).
Alternatively, and more directly, we can visualize and analyze the results by using the attributes
to map the relationship between competitors in the marketplace (i.e. the construction of a market
structure map). In this section, we do both and report results from the application of our
approach to an actual set of digital camera product reviews. We then ask how the results change:
if we consider only positive or negative comments (i.e., just Pros or just Cons), when we look at
reviews over different time periods, and when we apply the process to two additional product
domains. We then discuss implications for market structure analysis and consider the extension
of our approach to conjoint analysis, an external analysis method.
Digital cameras
Our initial data set consists of 8,226 online digital camera reviews downloaded from
Epinions.com on July 5, 2004. The reviews span 575 different products and product bundles that
range in price from $45 to more than $1000. Parsing the Pro and Con lists produces the
aforementioned phrase-word matrix that is 14,081 phrases by 3,364 words. We first set k (the
maximum number of clusters for K-means) at 50 by analyzing the set of all product attributes in
our reference buying guides in a manner that follows Popescu et al. (2004). While relying upon
experts is common, there are also a number of more general, statistical approaches for initializing
k, including the Gap (Tibshirani, Walther et al. 2005) and KL (Krzanowski and Lai 1988)
statistics. We computed the KL statistic for values of k up to 55 and found that it is maximized
at k = 50. Setting k = 50, we iterated K-means
clustering 10 times, selecting the best resulting output based upon QC (Eqn 1).
Given an initial set of 50 clusters (from K-means), our next step is to further filter the
initial clusters into attribute dimensions. The CLP process produced a total of 171 sub-clusters
describing attributes and dimensions within the 50 initial clusters. Applying a χ2 threshold of
.001 and further filtering the results using the Spearman Rank test rs (see Appendix 1.3) reduces
those 171 sub-clusters to 99. Within each cluster, sub-clusters may represent noise from K-means
or different ways in which customers express the dimensions of an attribute (e.g. the
number of batteries, the type of batteries, or battery life). To this point, the entire research
process is fully automated with no human intervention whatsoever. Finally, a manual reading
reveals which of the remaining sub-clusters discuss a common product attribute, and which are
noise; albeit, in the future even this could be automated. Our final reading identifies 39 clusters
of product attributes (see Table 2). Though we might have expected 50 sub-clusters, one for
each of the initial clusters, this is not the case. For some initial clusters, none of the sub-clusters
pass the statistical filters. In other cases, multiple sub-clusters within a single cluster may
indicate that the parent cluster does not cleanly distinguish a common product attribute.
To facilitate the presentation, we apply a naïve convention for naming each cluster (a
common practice in marketing studies): scan the cluster for the most frequent word(s) in each
cluster. Some resulting cluster names may only have meaning in the product context.
Comments are inserted in parentheses to provide context to the automatically generated name as
well as to indicate where certain product attributes are duplicated. A listing of automatically
generated dimensions for each of the 39 attributes is available in a separate appendix.
As an objective measure of the success of our approach, we compare attributes and levels
derived from our online customer reviews to those discovered using more traditional measures
such as those used in creating expert buying guides. In particular, we compute precision (P)
(Salton and McGill 1983), the number of automatically generated attributes and dimensions also used by
experts in published buying guides. Conversely, recall (R) counts the number of attributes and
levels named in professional buying guides that are automatically discovered in the Voice of the
Consumer. More formally, if X is the set of attributes from the Voice of the Consumer and Y is
the set of attributes identified in professional guides, P and R are defined as:
P = |X ∩ Y| / |X|   and   R = |X ∩ Y| / |Y|
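In code, with X and Y as sets of attribute names (the example sets below are hypothetical, not our actual results):

```python
# Precision and recall over attribute sets: X is the automatically
# discovered set, Y the set from a professional buying guide.
# The example sets are hypothetical.

def precision_recall(X, Y):
    overlap = len(X & Y)
    return overlap / len(X), overlap / len(Y)

auto = {"zoom", "battery", "memory", "lcd", "shutter lag"}
guide = {"zoom", "battery", "memory", "price"}
P, R = precision_recall(auto, guide)
```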
The first three columns of Table 3 describe the reference sources. Epinions (A)
represents the attributes and levels by which customers can browse the digital camera product
mix and Epinions (B) represents a buying guide available on the Epinions website. CR02 –
CR05 represent print buying guides regarding digital cameras from Consumer Reports for the
years 2002 through 2005. The next two columns report precision and recall where reference
attributes must exactly match an automatically generated cluster from our approach. Exact
matching can be overly strict: "optical zoom," for example, appears as an
attribute in some reference guides while the automatic process identified "optical" as a
dimension of “zoom.” Borrowing from Popescu et al. (Popescu, Yates et al. 2004), we further
define precision and recall containment (P+ and R+) to allow specific terms to qualify as a
match for more general terms, provided that the more specific term appears as a dimension or
level, and vice versa. In the final two columns, we report P+ and R+.
A quick review suggests that the automated extraction performs with varying quality
relative to the on-line buying guides. Given that different sources may be subject to different
biases and that technologies evolve over time, we also considered all pairwise comparisons
between the reference guides themselves. Table 4 reports the average precision and recall from
these pairwise comparisons.
The results suggest, at least in part, that the external sources are neither exhaustive nor
even consistent with one another. Hence, a more appropriate benchmark for evaluation may be
the internal consistency between the sources themselves. Assuming containment, the .72
average recall from our approach, labeled "Auto" in Table 4, exceeds all others. At the same
time, the precision is equal to the median consistency among all guides. Because an
automatically generated "larger" list can be further pruned by subsequent human review, such
results are both encouraging and practically useful.
Most importantly, our attributes come directly from customer reviews. In some cases,
our product attributes may not closely align with those in the marketing materials for
manufacturers and retailers. Our results therefore suggest that product reviews do reveal
information not used in the traditional methods. The managerial question is therefore not
whether online product reviews provide information; but instead, the question is what value that
information provides. Did we find “unseen” important attributes? We discuss that next.
To assess the value of our automated process, we conducted a laboratory survey that
asked subjects to evaluate the importance of different attributes for the purpose of purchasing a
new digital camera. We find that automated analysis of online product reviews can support
managerial decision-making in at least two ways. First, our approach can identify significant
attributes that are otherwise overlooked by the experts. Second, the reviews can serve as a filter
for other attribute elicitation methods; attributes that are identified by experts but also named by
customers may have more salience for purposes of product marketing and design.
Specifically, our survey, which took less than 5 minutes to complete, was administered as part of
a session of otherwise unrelated studies for which subjects were paid at an
average rate of $10/hour. Pre-testing suggested no interference between our survey and the
unrelated studies conducted during the same session. In total, 181 subjects at a large
Northeastern university participated in our web-based survey; responses were screened using validity checks.
A set of product attributes for testing was constructed by reconciling the 39 attributes
shown in Table 2 with all of the attributes identified in the 10 different reference buying guides
listed in Table 3. After duplicate attributes were eliminated, the resulting set of 55 attributes was
divided into overlapping thirds to reduce any individual respondent’s burden. See Appendix 3
for the complete list. A few attributes were repeated in each third as an additional validity check.
Each subject viewed between 20 and 21 attributes. Subjects were asked to rate their "familiarity
with" and the "importance of" each attribute using a 1- 7 scale. The specific questions were:
Imagine that you are about to buy a new digital camera. In the table below, we give a
list of camera attributes. For each attribute, please answer two questions:
First, from 1 – 7, please rate how familiar you are with the attribute. [1] means that you
have no idea what this attribute is and a [7] means that you know exactly what this is.
Second, from 1 – 7, please rate how much you care about this attribute when thinking
about buying a new digital camera. [1] means that you do not care about this attribute at all
and [7] means that this is critical. You would not think of buying without first asking about this
attribute.
Subjects were prompted to answer all questions. In particular, subjects were reminded to
answer the second question for each attribute even if they answered [1] for the first question. In
Part 2 of the survey, to understand the role that expertise might play, subjects were asked to
provide their self-assessed expertise on digital cameras (1 = “novice” and 7 = “expert”). Finally,
subjects were asked for a standard set of demographic variables such as age, gender, education-
level, etc. These variables were used as covariates to verify our main findings.
Our main results are summarized in Figure 2 where we plot the mean familiarity versus
mean importance for each of the 55 attributes. Attributes that appeared only in one or more
buying guides are labeled "Expert Only" and symbolized by diamonds. Attributes that appear in
at least one buying guide and also in our automated results are labeled "Expert + VOC (Voice of
the Consumer)" and are identified by squares. Finally, attributes that emerged only from our
automated analysis of reviews are labeled "VOC Only" and plotted as "X" symbols. As we
would expect, the graph indicates a general trend upwards and to the right. Users are more
familiar with those product attributes that they tend to consider important. Similarly, if a user is
unfamiliar with a particular attribute, they are unlikely to place a high value on that attribute. A
complete table of attributes and their respective familiarity and importance means is provided in
Appendix 3. We note that, in general, "importance" does not vary greatly by gender, expertise,
or any other demographic variable. In several instances, whether the subject owns a digital
camera does exhibit some significance. The significance of such covariates should be checked in
future applications of the approach.
Figure 2 suggests two significant managerial implications. First, there are eight attributes
labeled "VOC Only," and they tend towards the upper right-hand corner of the plot: (camera)
size, body (design), (computer) download, feel (durability), instructions, lcd (brightness), shutter
lag, and a cover (twist lcd for protection). The existence of product attributes that are both
familiar and important to users suggests the value of processing online reviews to augment expert-generated attribute lists.
Moreover, by comparing the "Expert Only" plot to the "Expert + VOC" label in Figure 2,
we see that most high-importance, high-familiarity attributes appear as "Expert + VOC" while
the lower-left hand region is populated primarily by "Expert Only" attributes. Comparing the two
groups reveals a significant difference (p < .01) in average familiarity (4.2 to 5.1) and in average importance (4.1 to 4.9).
The difference (p < .05) between "Expert Only" and "VOC Only" is equally large and suggests a
represented in online product reviews can serve as a filter, highlighting meaningful product
attributes.
The potential for applying "VOC" as a filter is also seen in the wide disagreement among
the Expert Buying Guides. There are only seven product attributes that appear in at least 50%
(five) of the Expert guides used in our evaluation. Labeled by "+" symbols in Figure 2, we see
that even when the Experts agree, there is wide variance in familiarity and importance among those attributes.
At the same time, it is worth noting that our procedure does miss some high-value attributes; brand itself is one such
missing attribute. Our approach relies on word frequencies. Specific references to any one
brand (e.g. “Canon”) may not appear with sufficient frequency to cluster as an attribute. We
might instead map all explicit brand names to a single common word (e.g. “brand”).
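Such a mapping might look like the following sketch (the brand list and function are hypothetical illustrations, not the system's actual lexicon):

```python
# Map explicit brand mentions to a single placeholder token so that
# brand-related phrases can accrue enough frequency to cluster.
# The brand list below is a hypothetical example.
BRAND_NAMES = {"canon", "nikon", "sony", "casio", "panasonic"}

def normalize_brands(tokens):
    """Replace any known brand token with the generic token 'brand'."""
    return ["brand" if t.lower() in BRAND_NAMES else t for t in tokens]
```

For instance, `normalize_brands(["great", "Canon", "pictures"])` yields `["great", "brand", "pictures"]`, letting all brand references count toward one candidate attribute.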
The second sub-class of attributes missed by our approach is largely due to differences in
how terms are classified. In Figure 2, the attribute ranked second in importance among "Expert Only" results is "battery source." While our automated approach does capture characteristics of
batteries and even battery type, these values are classified as properties of the "battery" attribute.
Likewise, we distinguish between “shutter lag” and “shutter delay.” Although the terms refer to
the same physical characteristic, distinctions in vocabulary may reveal distinct submarkets. In
this way, our analysis highlights differences between the voice-of-the-consumer as represented in
product reviews with that found in manufacturer and retailer marketing literature.
We next use the review-derived attributes to understand and visualize market structure. A product brand is associated with
each online review. Using the automatically generated attribute clusters, we generate a brand by
attribute matrix, counting the number of brand occurrences (number of phrases) for each attribute
and, as standard, normalized by the total number of phrases for that brand. To then turn this matrix into a perceptual map, we apply Correspondence Analysis (CA), a technique for analyzing two-way, two-mode frequency data (Everitt and Dunn 2001), making it more appropriate for this task than the continuously scaled MDS procedures that are commonly used in market structure analysis. We assess the adequacy of representing the data in a reduced space as measured by consulting the eigenvalues and the
corresponding scree plot (Greenacre 1992). To help interpret the dimensions in the reduced
space, we use (stepwise) regression of each brand’s (x,y) coordinates on the derived attributes.
The probability for entry is 0.05 and the probability for removal is 0.1, although other values were also explored.
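The construction of the normalized brand-by-attribute matrix can be sketched as follows. This is a minimal sketch: the phrase-to-attribute lookup is assumed to come from the earlier clustering step, and the CA itself would be delegated to a statistics package:

```python
from collections import defaultdict

def brand_attribute_matrix(reviews, phrase_to_attribute):
    """Count attribute mentions per brand, then row-normalize by the
    brand's total number of (attribute-bearing) phrases.

    reviews: iterable of (brand, phrases) pairs, one per review.
    phrase_to_attribute: dict mapping a normalized phrase to its attribute cluster.
    """
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for brand, phrases in reviews:
        for phrase in phrases:
            attribute = phrase_to_attribute.get(phrase)
            if attribute is None:
                continue  # phrase was filtered out during clustering
            counts[brand][attribute] += 1
            totals[brand] += 1
    return {
        brand: {a: n / totals[brand] for a, n in attrs.items()}
        for brand, attrs in counts.items()
    }
```

Each row then sums to one, so the CA operates on each brand's attribute-mention profile rather than on raw volume.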
Figure 3 depicts the brand map in two dimensions based upon the initial set of digital
camera reviews analyzed for this study alongside the Scree plot of the eigenvalues and
cumulative percentage of inertia. In the paper, the visualizations are limited to two dimensions.
However, we recognize that the reliability of any such model is dependent upon fit as
represented by the percentage of inertia captured by those dimensions. At the limit, we imagine
that marketing managers might use the two-dimensional figures as a point of departure for deeper analysis.
The percentages on the axes indicate the percentage of inertia for F1 and F2 respectively.
The isolation of Casio in the lower right quadrant of the coordinate axes and the positioning of
Panasonic as farthest from the origin or “average” representation in the dimensional space are
consistent with the empirical data. Of the nine brands in the brand map, only Casio and Panasonic are not represented among the 20 best-selling digital cameras of 2004 (InfoFaq).
Following convention, we rescale the axes and plot both brands and attributes in the same
space in Figure 4. Only A20, about “slow” or “fast” startup, boot-up, or shut down, appears in
the lower right quadrant perhaps explaining Casio’s isolation. Furthermore, attributes A18 and
A21 appear in the upper left quadrant. This suggests that “AC power adapters” and the camera’s
“body construction” are not relevant to customers when evaluating brand (dis)similarity. For the
marketing manager, this provides several initial hypotheses for improving or repositioning a brand.
Regressing the CA coordinates (F1 and F2) on the attribute clusters defines F1 as a
combination of comments about camera options (e.g. manual focus, auto focus, aperture) and
slow startup/shutter delay. F2 captures some combination of camera size (e.g. easily fits into a
pocket or hand) and external memory cards (e.g. SD card, Compact Flash, etc.). A summary of the stepwise variable-selection results is included as an appendix. An alternative approach
to naming the dimensions would have been to regress on the actual, manufacturer-provided
physical dimensions. However, reliance on consumer language rather than manufacturer specifications is exactly what makes our approach distinctive.
One additional benefit of using structured Pro-Con lists is the ability to immediately
identify the sentiment polarity (Pro phrases convey positive polarity and Con phrases convey
negative polarity). Beginning with the attributes derived from clustering review phrases, we
label each cluster based upon whether it includes only Pro phrases, only Con phrases, or both.
We construct the combined Pro-Con dimensional space by considering attributes that comprise
the intersection (attributes that are mentioned as both Pro and Con) or the union of all attributes. In either case, we use the frequency with which comments are made in constructing our market structure map. Because there was
little qualitative difference in the relative positioning of brands between using the intersection of
Pro and Con attributes versus the union of Pro and Con attributes, we depict only the union.
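The labeling and selection steps can be sketched as follows (a minimal illustration; the data structures are assumptions, not the system's actual representation):

```python
def label_clusters(clusters):
    """Label each attribute cluster by the polarity of its phrases.

    clusters: dict mapping attribute name -> list of (phrase, polarity)
    pairs, where polarity is "pro" or "con" depending on whether the
    phrase came from the Pro or Con list of a review summary.
    """
    labels = {}
    for attribute, members in clusters.items():
        polarities = {polarity for _, polarity in members}
        labels[attribute] = "both" if len(polarities) == 2 else polarities.pop()
    return labels

def combined_space_attributes(labels, mode="union"):
    """Select attributes for the combined Pro-Con space: the intersection
    keeps only attributes mentioned as both Pro and Con; the union keeps all."""
    if mode == "intersection":
        return {a for a, l in labels.items() if l == "both"}
    return set(labels)
```

Either selection then feeds the same brand-by-attribute counting and CA steps described above.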
As one would expect, in the combined space, there is a clear separation between Pro and
Con phrases. As stated earlier, this provides another layer of insight into brand positioning:
“Who are our competitors on the praise dimensions and criticism dimensions?” To the best of
our knowledge, this layer of analysis has not previously been explored and is a further contribution of this research.
Interestingly, Casio’s brand isolation exists only in the context of Pro comments, perhaps
suggesting that the differentiation is not due to an inferior product. Rotational distinctions aside,
we see that most brands retain a close relative positioning between Pro and Con with respect to
the average (or origin) in the combined space. By contrast, Sony is clearly further away with
respect to negative comments, suggesting the need for further exploration. In the interest of
space and readability, we omit the figure projecting attributes and brands in the same figure.
However, the regressions suggest that F1 is best explained by a combination of “slow” or “fast”
startup, boot-up, or shut down, controls (e.g. white balance, etc.), and the viewfinder. F2 is
explained by remarks about the LCD, about reliability and service, and about battery life.
A further advantage of mining online reviews is the ability to map changes in the environment over time in real-time. For the marketing manager, customer reviews offer an opportunity to measure the impact and diffusion of campaigns as reflected in the Voice of the Consumer. Conversely, as noted earlier, tapping reviews offers a complementary, real-time window on the competitive market.
We collected a parallel set of 5567 digital camera reviews from Epinions.com dated
between January 1, 2005 and January 28, 2008. The new set of reviews produced 39 initial
attribute clusters. After accounting for duplicates (e.g. multiple clusters referring to resolution),
there were a total of 30 unique digital camera product attributes. Five new attributes surfaced, replacing five previously formed clusters. The substitutions are listed in Table 5.
Although it is difficult to discern the underlying causes, the changes have face validity. As the
customer base becomes increasingly sophisticated and increasingly connected online, the need
for instructions and support has shifted towards online self-service. Likewise, the ubiquity of
personal computers and online photo management software may have shifted such functions
away from the camera. In keeping with the theme of a more technically sophisticated audience,
functions such as ISO settings, multiple shot modes, and white/color balance are more significant in the later period.
Even though many attributes remain the same across time periods, changes in both
customers and the marketplace may drive (or reflect) brand repositioning over time. To
construct a combined attribute space over two time periods, there are at least two approaches.
The first is to cluster reviews from each time period independently, derive attribute clusters for
each period, and then construct the combined space from attributes appearing in both periods
(intersection) or from either period assigning zero counts to brands in the time period where the
attribute does not appear (union). Manual intervention is required to align clusters from the
different periods to ensure that they reference the same items. A more automated approach, as
with our combined Pro-Con space, would cluster reviews from both time periods together. Each
attribute cluster is then labeled depending upon whether it includes phrases from period 1, from
period 2, or from both. The combined space could again include the intersection or union of
attributes. To minimize the amount of human intervention required in our process, we pursued
the second strategy. For example, Figure 6 depicts changes in brand positioning over time with
respect to positive (Pro) customer comments. Because there was little change in the
visualization of relative brand positioning when using intersection or union in the combined space, we again depict only the union.
Figure 6 depicts a clear trend of brands converging towards the origin. In a fast-moving market marked by innovation and new product introduction, one would expect a degree of convergence as manufacturers attempt to follow one another. The
exception is Panasonic, which is notable in that it moved away from the other brands.
We also explored the robustness of our approach by applying the technique to two additional product domains. In contrast to the rapidly evolving, technologically sophisticated digital camera market, we first considered a more stable product domain: toaster ovens.
We downloaded all Epinions.com toaster oven reviews available on 10/5/2007 and split
the data set as of January 1, 2005. Before 2005, there are 402 reviews, and after, there are 398
reviews. Product prices range from $20 to $380. Maximizing KL set the number of clusters (k)
at 25. After filtering the clusters, the resulting 18 attributes (including duplicates) are listed in
Table 6.
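The Krzanowski–Lai (KL) criterion used above to set the number of clusters can be sketched as follows. This is a minimal sketch: the within-cluster dispersions W_k are assumed to come from K-means runs at each candidate k, and p denotes the number of features (Krzanowski and Lai 1988):

```python
def krzanowski_lai(W, p):
    """Compute KL(k) = |DIFF(k) / DIFF(k+1)| for each candidate k, where
    DIFF(k) = (k-1)^(2/p) * W[k-1] - k^(2/p) * W[k] and W[k] is the
    within-cluster dispersion from a K-means run with k clusters.

    Returns a dict mapping k -> KL(k); the k maximizing KL is chosen.
    """
    def diff(k):
        return (k - 1) ** (2 / p) * W[k - 1] - k ** (2 / p) * W[k]
    return {
        k: abs(diff(k) / diff(k + 1))
        for k in sorted(W)
        if k - 1 in W and k + 1 in W and diff(k + 1) != 0
    }
```

Intuitively, KL peaks at the k where adding one more cluster stops reducing dispersion appreciably, i.e. at the elbow of the W_k curve.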
In Figure 7, we map the market structure for Toaster Ovens. F1 is explained by “(lack
of) reliability,” “ease of access to interior (for cleaning or inserting/removing food),” and “toast.”
F2 is explained by “browning (cooking method),” “ease of access,” and “reviews (what others
say, suggest, or have read).” In a mature market, one might expect greater stability over time.
The CA plot does show some convergence. Hamilton, Cuisinart, and Oster appear to move towards the origin. Other brands, like Toastmaster and Krups, move counter to that trend while others
(DeLonghi or Black and Decker) show little relative movement. Such activity is consistent with
the opportunity to innovate even in seemingly staid markets (Urban, Johnson et al. 1984; Urban
and Hauser 1993) but less so than what we observed for digital cameras.
For the second domain, we studied a single set of 3800 reviews of Philadelphia hotels drawn from TripAdvisor.
Maximizing KL and then filtering resulted in the 34 attributes (including some duplicates) in
Table 7. When describing attributes of services, customers may write in ways less amenable to
traditional NLP sentiment analysis (Pang and Lee 2008). Comments about service attributes and
dimensions may also prove less quantifiable or readily actionable (e.g. “staff friendliness”).
Consequently, work applying text and data mining to hotel reviews tends to exploit a pre-defined
set of amenities from hotel descriptions and/or limit the text mining to measures of readability
and subjectivity (Ghose, Ipeirotis et al. 2009). Our work, by contrast, uses the customer reviews.
We again used CA to visualize the market structure. When looking at digital cameras,
many different camera models were aggregated within a single brand name. For hotels, we
drilled down on brands to distinguish between distinct sub-markets within and between hotel
chains. Figure 8 looks only at positive comments. Despite attempts to draw distinctions
between sub-markets, it is interesting to see that distinct brand clusters may or may not emerge.
In Figure 8, F1 corresponds to room size and amenities, breakfast and other food and dining
resources, and the bed. F2 corresponds to price, location, and parking. Only the Limited Service
hotels are in the lower right quadrant. Full Service hotels primarily cluster above the X axis.
Interestingly, where more than one hotel brand is available, the parent company (Hilton,
Starwood, Holiday Inn, and Marriott) distributes its presence across the different quadrants of the map.
The market space is plotted with respect to negative comments in Figure 9. Again, the
intention is not to definitively define the market structure. Rather, our goal is to suggest that, on
its face, analyzing consumer reviews can reveal not only attributes (what people discuss) but the
actual Voice of the Consumer (how they discuss it). Such distinctions may offer a unique
opportunity to analyze the evolution of market structure from the perspective of the customer's own voice.
In contrast to CA and internal market structure analysis, Conjoint Analysis is perhaps the
most common approach to external market structure analysis (Elrod, Russell et al. 2002). We
might envision the (re)design of conjoint studies based upon attributes defined by the Voice of
the Consumer. Conjoint studies have been applied to new product introduction (Wittink and
Cattin 1989; Michalek, Feinberg et al. 2005), optimal product repositioning (Moore, Louviere et
al. 1999) and pricing (Goldberg, Green et al. 1984), and segmenting customers (Green and Krieger 1991). In each case, a respondent's stated preference for a product profile is an agglomeration of preferences for the underlying attribute levels that have been selected. This holds regardless of the preference elicitation format (e.g. full profile,
constant sum, or self-explicated) or method for determining the profiles (Huber and Zwerina
1996; Moore, Gray-Lee et al. 1998; Toubia, Simester et al. 2003; Evgeniou, Boussios et al.
2005).
The automated analysis of online product reviews could potentially assist in the design of
conjoint studies in at least three ways. First, as previously noted, customer reviews may reveal
attributes not included elsewhere. Second, reviews reveal not only what customers speak of but
how they speak about it. Analysis of reviews can help inform the granularity with which attributes are described and the vocabulary used to describe those attributes. Finally, analysis of
reviews can help produce meaningful levels for conjoint study design.
Our approach for revealing attribute dimensions (as described in Table 1) assigns words
to individual attribute dimensions. For example, “3x,” “4x”, “5x”, etc. might all be assigned to a
single attribute dimension for zoom “magnification.” Using techniques such as distributional
clustering (Pereira, Tishby et al. 1993), we can group words within the cluster of an attribute
dimension to meet a managerially specified target number of levels (Lehmann, Gupta et al.
1997). The result is a conjoint study designed by customers, in the words of customers, for
prospective customers. Our hope is that this research brings one of the key open issues in conjoint analysis closer to resolution.
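The level-generation step can be illustrated with a simple sketch. Equal-size grouping of parsed numeric values stands in here for the distributional clustering cited above, and the "3x"-style parsing rule is a hypothetical example for a magnification dimension:

```python
def words_to_levels(words, target_levels):
    """Group the words of one attribute dimension (e.g. zoom magnifications
    "3x", "4x", ...) into a managerially specified number of conjoint levels.

    Words are ordered by their numeric value and split into contiguous
    groups of (nearly) equal size.
    """
    values = sorted(words, key=lambda w: float(w.rstrip("x")))
    size, extra = divmod(len(values), target_levels)
    levels, start = [], 0
    for i in range(target_levels):
        end = start + size + (1 if i < extra else 0)
        levels.append(values[start:end])
        start = end
    return levels
```

A manager asking for three levels of zoom would thus obtain low, medium, and high groups drawn directly from the vocabulary customers actually use.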
In this paper, we have presented a system for automatically processing the text from
online customer reviews. Phrases are parsed from the original text. The phrases are then
normalized and clustered. A novel logical assignment approach exploits the structure of Pro-Con
review summaries to further separate phrases into sub-clusters and then assigns individual words
to unique categories. Notably, our work differs from sentiment-based strategies in that we do not
rely upon complex natural language processing techniques and do not rely upon the user to first
provide a set of representative examples or keywords (Turney 2002; Nasukawa and Yi 2003; Hu and Liu 2004, 2006). However, opportunities remain to refine our approach, and they are symptomatic of broader challenges.
Conceptual considerations
To generate initial clusters, the system requires phrases. Even though Epinions' customer
reviews provide lists of phrases, variations in human input are a source of noise. Moreover, we
assume that a phrase represents a single concept and that individual words represent distinct
levels of attribute dimensions. It is easily seen how this assumption creates difficulties with the
natural language in reviews (e.g. people who write lists like "digital and optical zoom").
One possible solution is to apply more sophisticated NLP techniques. Beginning with a
representative set of meaningful words, there are established techniques for expanding that set (Hearst 1992; Maedche and Staab 2000; Popescu, Yates et al. 2004; Popescu and Etzioni 2005).
However, our Pro-Con summaries are simply lists of phrases with no associated linguistic
context; Liu et al. (2005) demonstrate that techniques which rely upon representative examples
perform markedly less well in the context of Pro-Con phrases. Our constrained optimization
approach dispenses with the need for representative examples and knowledge of grammatical
rules (Lee 2005). An entirely different approach that would preserve the unsupervised nature of
our work is to attempt to identify phrases through frequent item-set analysis. Hu and Liu (2004)
demonstrate that by tuning support and confidence thresholds, it is possible to discover whether a
word pair represents one concept or two (e.g. “compact flash” versus “3x zoom”).
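A toy sketch of that support/confidence test (the thresholds here are illustrative, not Hu and Liu's settings):

```python
def support(items, transactions):
    """Fraction of transactions (phrases, as word sets) containing all items."""
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)

def is_one_concept(a, b, transactions, min_sup=0.3, min_conf=0.8):
    """Treat the pair (a, b) as a single concept when the words co-occur
    frequently (support) and each word strongly implies the other
    (confidence in both directions)."""
    pair_sup = support({a, b}, transactions)
    if pair_sup < min_sup:
        return False
    conf_ab = pair_sup / support({a}, transactions)
    conf_ba = pair_sup / support({b}, transactions)
    return min(conf_ab, conf_ba) >= min_conf
```

Here "compact" and "flash" co-occur almost exclusively with each other (one concept), while "zoom" pairs with many different magnitudes ("3x", "4x", ...), so each pairing stays below threshold (two concepts).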
Our assignment strategy uses word graphs based on co-occurrences. Although our use of
optimization to cluster words within a graph is unique, the graph representation is not new. Co-
clustering (Baker and McCallum 1998; Dhillon, Mallela et al. 2002) and categorical clustering
strategies (Gibson, Kleinberg et al. 1998; Ganti, Gehrke et al. 1999; Zaki and Peters 2005) model co-occurrence data with similar graph representations.
Pragmatic considerations
We need the ability to assess the stability of our clusters and concomitant product
features. One instance of stability is sensitivity to data sample size. Here, we relied upon a large
data set to yield the phrases from which we identify a maximal clique and hence do assignment.
The large data set is also a boon because we can liberally discard phrases to minimize the effects
of naïve parsing. To measure the sensitivity to sample size, we would cross-validate on smaller
sets of review samples. We can plot the trade-off between sample size and evaluation metrics to
identify diminishing returns and attempt to estimate a minimal number of required reviews. Care
needs to be taken to ensure sufficient heterogeneity in the sample selection with respect to brands and products.
While our approach is generalizable across different product domains, our dependence on
sources that provide phrase-like strings is a limitation. At least two factors ameliorate this
limitation. First, there are other domains where phrase-like text-strings apply as opposed to
prose. Progress notes in medical records and online movie reviews (Eliashberg and Shugan
1997) are two such examples. Second, recognizing the current limitations of natural language
processing tools, more online sources are soliciting customer feedback in the form of phrases and structured Pro-Con lists.
Conceptually, each attribute is parameterized by one or more dimensions and each dimension is
defined by the levels (the set of values) that a particular dimension may take. Operationally,
attributes, dimensions, and levels are defined by the clustering. Attributes name the clusters
from K-means. The sub-clusters formed from the clique-based assignment each represent an
attribute’s dimension. The values within each sub-cluster define the dimension’s levels.
Knowing that the distinction between attributes, dimensions, and levels is noisy emphasizes the role of our automated results as a complement to, rather than a replacement for, traditional processes.
Future work
In addition to work expanding the conceptual and pragmatic dimensions of our work,
there are a number of ways in which we might enrich the concept relationships that we are
learning. For example, some attribute properties are ordinal in nature. Recognizing order
facilitates aligning orderings across customers. For marketing and product design, alignment is
critical because different customers may address a concept using parallel categories. For
example, will 32 mb satisfy a customer seeking to store 130 images? We can apply concept
clustering (Gibson, Kleinberg et al. 1998; Ganti, Gehrke et al. 1999) in conjunction with our assignment approach to address such questions.
Structure exists not only at the market level but also at the level of individual consumers
(Elrod, Russell et al. 2002). A deeper understanding of the distinctions between segments
requires understanding not only the differences between product attributes but also the
differences in the underlying customer needs (Srivastava, Alpert et al. 1984; Allenby, Fennell et
al. 2002; Yang, Allenby et al. 2002). Unlike traditional data sources for market structure
analysis, online product reviews include comments about not only what and how, but why. In the
text of reviews, customers often relate why they purchased a product and what they use that
product for. Recent research mining user needs from online reviews (Lee 2004; Lee 2009) could enrich market structure analysis based upon online product reviews. However, traditional approaches to market structure analysis
produce models useful not only for describing existing markets but also for predictive purposes.
Exploring the integration of the Voice of the Consumer and product reviews into a predictive
market structure analysis is a great opportunity (Allenby, Fennell et al. 2002; Elrod, Russell et al.
2002).
Learning from reviews offers a principled approach to selecting key attributes for
studying the structure of the existing market as well as defining studies to enable product
repositioning and/or new product design within that market. Moreover, our approach
emphasizes not only what but how. Identifying distinct voices with which customers describe a
product and changes in those voices over time, may reveal subtle differences in sub-markets and
represent a significant opportunity for better managing the consumer – producer relationship.
We believe that our research can be an important first step in that direction.
REFERENCES
Allenby, G., G. Fennell, et al. (2002). "Market Segmentation Research: Beyond Within and
Archak, N., A. Ghose, et al. (2007). Show me the money! Deriving the Pricing Power of Product
Baker, D. and A. McCallum (1998). Distributional Clustering of Words for Text Classification.
SIGIR 98.
Chen, Y. and J. Xie (2004). Online Consumer Review: A New Element of Marketing
DeSarbo, W. S., D. J. Howard, et al. (1991). "Multiclus: A new method for simultaneously
Dhillon, I. S., S. Mallela, et al. (2002). Enhanced Word Clustering for Hierarchical Text
Elrod, T. (1988). "Choice Map: Inferring a Product-Market Map from Panel Data." Marketing
Elrod, T. (1991). "Internal Analysis of Market Structure: Recent Developments and Future
Elrod, T., G. J. Russell, et al. (2002). "Inferring Market Structure from Customer Response to
Choice Processes in Turbulent Consumer Good Markets." Marketing Science 15(1): 20.
Everitt, B. S. and G. Dunn (2001). Applied Multivariate Data Analysis. New York, Oxford
University Press.
Evgeniou, T., C. Boussios, et al. (2005). "Generalized Robust Conjoint Estimation." Marketing
Feldman, R., M. Fresko, et al. (2007). Extracting Product Comparisons from Discussion Boards.
IEEE ICDM.
Feldman, R., M. Fresko, et al. (2008). Using Text Mining to Analyze User Forums. International
Ganti, V., J. Gehrke, et al. (1999). CACTUS - Clustering categorical data using summaries. ACM
Ghose, A. and P. Ipeirotis (2008). Estimating the Socio-Economic Impact of Product Reviews:
Mining Text and Reviewer Characteristics. New York, New York University.
Ghose, A., P. Ipeirotis, et al. (2009). Towards Designing Ranking Systems for Hotels on Travel
Ghose, A., P. Ipeirotis, et al. (2006). The Dimensions of Reputation in Electronic Markets, New
Gibson, D., J. Kleinberg, et al. (1998). Clustering categorical data: an approach based on
Goldberg, S. M., P. E. Green, et al. (1984). "Conjoint Analysis of Price Premiums for Hotel
Green, P. E. and A. M. Krieger (1991). "Segmenting Markets with Conjoint Analysis." Journal
Green, P. E. and V. Srinivasan (1978). "Conjoint Analysis in Consumer Research: Issues and
Griffin, A. and J. R. Hauser (1993). "The Voice of the Customer." Marketing Science 12(1): 1-
27.
Hearst, M. (1992). Automatic acquisition of hyponyms from large text corpora. COLING.
Hill, S., F. Provost, et al. (2006). "Network-based Marketing: Identifying likely adopters via
Hu, M. and B. Liu (2004). Mining and Summarizing Customer Reviews. KDD04. Seattle, WA.
Huber, J. and K. Zwerina (1996). "The Importance of Utility Balance in Efficient Choice
Kamakura, W. A. and G. J. Russell (1989). "A Probabilistic Choice Model for Market
97-133.
Krzanowski, W. J. and Y. T. Lai (1988). "A criterion for determining the number of groups in a
Lee, T. (2004). Use-centric mining of customer reviews. WITS. Washington, D.C.: 146-151.
Lee, T. (2005). Ontology Induction for Mining Experiential Knowledge from Customer
Lee, T. Y. (2009). Automatically Learning User Needs from Online Reviews for New Product
Liu, B., M. Hu, et al. (2005). Opinion Observer: Analyzing and Comparing Opinions on the
Maedche, A. and S. Staab (2000). Semi-automatic Engineering of Ontologies from Text. SEKE.
Michalek, J. J., F. M. Feinberg, et al. (2005). "Linking Marketing and Engineering Product
22: 42-62.
Moore, W. L., J. Gray-Lee, et al. (1998). "A Cross-Validity Comparison of Conjoint Analysis and
Moore, W. L., J. J. Louviere, et al. (1999). "Using Conjoint Analysis to Help Design Product
Pang, B. and L. Lee (2008). Opinion Mining and Sentiment Analysis. Foundations and Trends in
Information Retrieval.
Pant, G. and O. Sheng (2009). Avoiding the Blind Spots: Competitor Identification Using Web
Pereira, F., N. Tishby, et al. (1993). Distributional Clustering of English Words. ACL93.
Popescu, A.-M. and O. Etzioni (2005). Extracting Product Features and Opinions from Reviews.
HLT-EMNLP.
Popescu, A.-M., A. Yates, et al. (2004). Class extraction from the World Wide Web. AAAI
Workshop on ATEM.
Randall, T., C. Terwiesch, et al. (2007). "User Design of Customized Products." Marketing
Salton, G. and M. McGill (1983). Introduction to modern information retrieval. New York,
McGraw-Hill.
Srivastava, R. K., M. I. Alpert, et al. (1984). "A Customer-Oriented Approach for Determining
Tibshirani, R., G. Walther, et al. (2005). "Cluster validation by prediction strength." Journal of
Toubia, O., D. I. Simester, et al. (2003). "Fast Polyhedral Adaptive Conjoint Estimation."
Urban, G. L. and J. R. Hauser (1993). Design and Marketing of New Products, Prentice Hall.
Urban, G. L., P. L. Johnson, et al. (1984). "Competitive Market Structure." Marketing Science
3(2): 30.
West, P. M., C. L. Brown, et al. (1996). "Consumption Vocabulary and Preference Formation."
Wittink, D. R., L. Krishnamurthi, et al. (1982). "Comparing derived importance weights across
Yang, S., G. Allenby, et al. (2002). "Modeling Variation in Brand Preference: The Roles of
Zaki, M. and M. Peters (2005). CLICKS: Mining Subspace Clusters in Categorical Data via K-
Zhao, Y. and G. Karypis (2002). Criterion Functions for Document Clustering: Experiments and
TABLES
Auto E(A) DP Mega Biz Cnet E(B) CR02 CR03 CR04 CR05 Mean
Precision 0.37 0.69 0.33 0.57 0.41 0.27 0.48 0.24 0.26 0.27 0.37 0.42
Recall 0.72 0.23 0.55 0.23 0.48 0.33 0.46 0.38 0.37 0.37 0.50 0.39
Table 4: Internal consistency: average precision and recall between one source and all others
Before 2005: Size | Support (service) | Feel (mfr) | Instruction | Edit (in camera)
After 2005: ISO | Modes | Accessories | Easy to use | White/color balance
Table 5. Changes in automatically generated attributes between 2004 and 2005-2008
Room (size, appearance) Bar Park (valet, fee, price, charge) Staff (front desk; friendly) Business Center Problem
Hotel (size, cleanliness) Car Surround (neighborhood) Internet wireless Feel Location
Shuttle (service) Pool Noise (street, highway, construction) Close (location, sights) Furnishings View
Concierg (Amenities) Cost Facil (conference, laundry, exercise) Downtown (park, location) Stay Lack
Air condit (noise) TV Water (bottle, shower, pressure) Staff (atmosphere) Public area
Place (look, modern) Bed Food, accommodations, pillow Property (architecture) Citi (location)
Table 7. Automatically generated product attributes from TripAdvisor
FIGURES
Borrowing from the information retrieval community, our phrase word matrix is a
representation of the vector-space model (VSM). More formally, j ∈ J is a word in the set of all words; i ∈ I is a phrase. A phrase is simply a finite sequence of words, and I is a subset of the set of finite word sequences over J. We define an initial phrase word matrix as a simple variation on the term-document TF-IDF matrix,

Matrix(i,j) = TFij · IPFj    (A1.1)

where the term frequency TFij counts the total number of occurrences of word j in the instances of phrase i. The inverse phrase frequency IPFj = log(|I|/nj) is a weighting factor for words that are more helpful in distinguishing between different product attributes because they only appear in a fraction of the total number of unique phrases. If |I| represents the total number of unique phrases in the review corpus, then nj denotes the number of unique phrases in which word j appears.
A limitation of the TF-IPF weighting is that there are still some terms (e.g. sentiment words like "great" or "good") that are neither stop words nor product attributes yet appear alongside product attributes in the TF-IPF matrix. As an additional discount factor beyond IPF, we automatically gather words from a
second set of K phrases using online reviews for an unrelated product domain. Intuitively, words
appearing in the reviews for unrelated products are less likely to represent relevant product attributes for
the focal one. For example, words describing digital camera attributes are less likely to also appear in reviews of an unrelated product category.
Formally, for a second set of phrases I′ drawn from the set of finite word sequences over J, we calculate rank(j) = rank(TF′j · IPF′j), where higher weighted frequencies correspond to higher rank. Note that multiple words may share the same rank, and words that do not appear in any phrase of the unrelated domain are assigned a rank of one. The discounted matrix is then

Matrix(i,j) = (TFij / rank(j)) · IPFj · IPF′j    (A1.2)

Thus, we scale TF by the rank of the word in the unrelated product domain and scale the IPF by IPF′.
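As a concrete illustration of the base TF·IPF weighting (a minimal sketch with toy phrases; the rank-based discount of Equation A1.2 would layer on top):

```python
import math
from collections import Counter

def tf_ipf(phrases):
    """Build the phrase-word matrix TFij * IPFj, where TFij counts
    occurrences of word j across the instances of phrase i and
    IPFj = log(|I| / nj), with nj the number of unique phrases containing j.

    phrases: list of phrases, each a tuple of words; duplicates in the
    list are treated as instances of the same phrase.
    """
    instances = Counter(phrases)   # phrase -> number of instances
    unique = list(instances)
    n = Counter()                  # word -> number of unique phrases containing it
    for phrase in unique:
        for word in set(phrase):
            n[word] += 1
    matrix = {}
    for phrase, count in instances.items():
        tf = Counter(phrase)
        matrix[phrase] = {
            w: count * tf[w] * math.log(len(unique) / n[w]) for w in tf
        }
    return matrix
```

With three phrases such as ("great", "zoom"), ("great", "battery"), ("great", "lens"), the ubiquitous word "great" receives weight log(3/3) = 0, while attribute words such as "zoom" retain positive weight, which is exactly the behavior motivating the additional discount.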
To discover attributes, we assume that each customer review phrase corresponds to a distinct
product attribute. To discover attribute dimensions, we assume that each word in the phrase corresponds
to a distinct dimension. Discovering attribute dimensions then reduces to the assignment of particular
words to attribute dimensions. But how do we know how many dimensions there are in the assignment
problem? Is it possible that the assignment optimization has no feasible solution because of conflicting
constraints due to noise from the vagaries of human language? To solve this problem, we generate a
graph of all words in the cluster. Each word is a node and arcs are defined by the co-occurrence of two
words in the same phrase. We partition the graph into (possibly overlapping) sub-graphs by searching for
maximal cliques. Intuitively, each sub-graph represents a maximal subset of words and phrases for which
an optimal solution exists. The size of the maximal clique sets the number of attribute dimensions |D|. The sub-graphs then seed the assignment of individual words to dimensions.
More formally, we assume that phrases and words are preprocessed and normalized into words as
before. A graph G = (V,E) is a pair of the set of vertices V and the set of edges E. An edge in E is a
connection between two vertices and may be represented as a pair (vi,vj) V. Each phrase (word)
represents a vertex v in the graph; edges are defined by phrase pairs within a review (word pairs within a
phrase). An N-partite graph is a connected graph where there are no edges in any set of vertices Vi. A
clique of size N simulates a plays the role of arelational schema and can be extended to an N-partite
graph by substituting each vertice vi of the clique with a set of vertices Vi. A database table with disjoint
columns thus represents an N-partite graph where the size of the clique defines the number of columns
and each word in the clique “names” a column. A maximal-complete-N-partite graph is a complete-N-
partite graph not contained in any other such graph; in other words, the initial clique is maximal. The
corresponding database table of phrases represents the existing product attribute space, and the maximal-
complete-N-partite graph includes possibly novel combinations of previously unpaired attributes and/or
attribute properties.
To relate the graph back to customer reviews, we say that a product attribute is constructed from k
dimensions. Each dimension names a domain (D). Each domain D is defined by a finite set of words that
includes the value NULL for review phrases where customers fail to mention one or more attribute
dimension(s). The Cartesian product of domains D_1 × … × D_k is the set of all k-tuples {(t_1, …, t_k) | t_i ∈ D_i}. Each
phrase is simply one such k-tuple and the set of all phrases in the cluster simply defines a finite subset of
the Cartesian product. A relational schema is simply a mapping of attribute properties A1 …Ak to domains
D1 … Dk. Note the strong, implicit assumption that a maximal clique, taken over a word graph, is a proxy
for the proper number of attribute dimensions. Under this assumption, it is easy to see how searching for maximal cliques recovers both the number of dimensions and the schema of each attribute table.
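The tuple view above can be made concrete with a toy example (the dimension names below are invented for illustration and are not learned by the method): each phrase in a hypothetical "zoom" cluster becomes a k-tuple over three domains, with NULL for unmentioned dimensions.

```python
# Hypothetical 3-dimension schema for a "zoom" attribute cluster;
# the dimension names are illustrative only.
NULL = None
schema = ("magnification", "zoom-type", "attribute")

# Each review phrase becomes a k-tuple over the domains D1..Dk;
# unmentioned dimensions take the value NULL.
rows = [
    ("3x", "optical", "zoom"),   # "3x optical zoom"
    (NULL, "digital", "zoom"),   # "digital zoom"
    ("10x", NULL, "zoom"),       # "10x zoom"
]

# Each domain Di is the finite set of values observed in column i;
# the table itself is a finite subset of D1 x D2 x D3.
domains = [set(col) - {NULL} for col in zip(*rows)]
```

The table of phrases is thus a finite subset of the Cartesian product of its domains, exactly as in the relational-schema reading above.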
To align words into their corresponding attribute dimensions, we frame the task as a
mathematical assignment problem and resolve the problem using a bounds consistency approach. We
process_phrases(p_list)
    schema = find_maximal_clique(p_list)
    order phrases by length
    for each phrase p:
        # initialize data structures
        tok_exclusion  - for each tok, its mutually exclusive tokens
        tok_candidates - for each tok, valid candidate assignments
        tok_assign     - for each tok, the dimension assignment
        # propagate the constraints for each successive phrase
        tok_candidates, tok_exclusion, tok_assign =
            propagate_bounds(p, tok_candidates, tok_exclusion,
                             tok_assign, schema)

Figure 2. Logical Assignment
define the assignment using the maximal clique that corresponds to the schema for each product attribute
table (see Figure 2). In the bounds consistency approach, we invert the constraints (tok_exclusion) to
express the complementary set of candidate assignments (tok_candidates) for each attribute dimension. If
the phrase constraints, taken together, are internally consistent, then the candidate assignments
(tok_assign) for a given token are simply the intersection of all candidate assignments defined by all phrases that contain the token.
We transform the mutual exclusivity constraint represented by each phrase into a set of candidate
assignments using the algorithm in Figure 3. Note that we need only propagate the mutual exclusivity of
words that are previously unassigned. Accordingly, for each unassigned token in a given phrase, the set
of candidate assignments is the intersection of the possible assignments based upon the current phrase and
all candidate assignments from earlier phrases containing the same token. We maintain a list of active
tokens, boundary_list, to avoid rescanning the set of all tokens every time the possible assignments for a token change.
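A simplified, self-contained stand-in for the propagate_bounds routine of Figure 2 can be sketched as follows; here the exclusion constraints are folded directly into set subtraction over candidate dimension sets, and the boundary_list optimization is omitted:

```python
def propagate_bounds(phrase, candidates, n_dims):
    """Intersect each token's candidate dimensions with those allowed by the
    current phrase: tokens of one phrase are mutually exclusive, so any
    dimension uniquely pinned to a co-occurring token is removed. An empty
    candidate set would signal an inconsistent (noisy) phrase."""
    for tok in phrase:
        candidates.setdefault(tok, set(range(n_dims)))
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for tok in phrase:
            pinned = {next(iter(candidates[t]))
                      for t in phrase
                      if t != tok and len(candidates[t]) == 1}
            narrowed = candidates[tok] - pinned
            if narrowed != candidates[tok]:
                candidates[tok] = narrowed
                changed = True
    return candidates
```

For example, processing the phrase "optical zoom" when "optical" is already pinned to dimension 1 forces "zoom" into the remaining dimension 0.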
Finally, the K-means clustering used to separate review phrases into distinct product attributes is
a noisy process. The clustering can easily result in the inclusion of spurious phrases. Both the initial
clustering of phrases into product attributes and the subsequent assignment of words to attribute
properties are inherently imperfect. Inconsistencies may emerge for any number of reasons including:
poor parsing, the legitimate appearance of one word multiple times within a single phrase (e.g., the phrase
‘digital zoom and optical zoom’ duplicates the word ‘zoom’) or even “inaccuracies” by the human
reviewers who write the text that is being automatically processed. This could result in a single attribute
property divided over multiple table columns. For example, some reviews might write "SmartMedia" as a
single word and others might use "Smart" and "media" as two separate words. Alternatively, multiple
product attributes may appear in the same cluster. '[C]ompact flash' and 'compact camera' are clustered
together based upon their common use of the word 'compact,' yet refer to distinct attributes.
To address the problem of robustness in the face of noisy clusters that include references to additional
product attributes or have different properties for the same attributes, we extend our CLP approach to
simultaneously cluster phrases and assign words. By modeling reviews as a graph of phrases, we can
apply the same CLP in a pre-assignment step to filter a single (noisy) cluster of phrases. As alluded to in
Appendix 2.2, we generate a graph where phrases are nodes, and edges represent the co-occurrence of
two phrases within the same review. The extended CLP then prunes phrases by recursively applying co-
occurrence constraints; two phrases in the same review cannot describe the same attribute just as two
words in the same phrase cannot describe the same attribute dimension. The same assignment
representation removes phrases that are not central to the product attribute at the heart of a particular
phrase cluster. Phrases that are not “connected,” in the graphical sense of belonging to a connected component or clique, are pruned from the cluster.
Unfortunately, even the extended CLP approach is imperfect. Some of the resulting tables represent
distinct product attributes; others simply constitute random noise. Because individual tables are supposed
to represent distinct product attributes, we assume that meaningful tables should contain minimal word
overlap. With this in mind, we apply a two-stage statistical filter to screen out the remaining noisy clusters.
First, because each table itself separates tokens into attribute properties (columns), meaningful
tables will not hold too small a percentage of the overall number of tokens. Second, we assume that
meaningful tables comprise a (predominately) disjoint token subset. If the tokens in a table appear in no
other table, then the intra-table token frequency should match the frequency of the initial k-means cluster;
likewise, the table's tokens, when ordered by frequency, should match the relative frequency-based order
of the same tokens within the initial cluster. The first stage of our statistical filter evaluates a χ²
statistic, comparing each table to its corresponding initial cluster. Although there is no hypothesis to be
tested per se, there is a history of applying the χ² statistic in linguistics research to compare different sets
of text with a measure that weights higher-frequency tokens with greater significance than lower-frequency
tokens (Kilgarriff 2001). In our case, we set a minimum threshold on the χ² statistic to ensure
that individual tables reflect an appropriate percentage of tokens from the initial cluster.
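A plain χ² computation in this spirit can be sketched as below; this is a generic goodness-of-fit sketch, and Kilgarriff's corpus-comparison measure includes refinements beyond it:

```python
def chi_square(table_counts, cluster_counts):
    """Chi-square statistic comparing a table's token counts against the
    counts expected if the table mirrored the initial cluster's token
    distribution. Higher-frequency tokens contribute larger expected
    counts and therefore weigh more heavily in the statistic."""
    total_table = sum(table_counts.values())
    total_cluster = sum(cluster_counts.values())
    stat = 0.0
    for tok, c in cluster_counts.items():
        expected = total_table * c / total_cluster
        observed = table_counts.get(tok, 0)
        stat += (observed - expected) ** 2 / expected
    return stat
```

A table whose token distribution matches its cluster scores zero; dropping a cluster token inflates the statistic, and the filter thresholds on this value.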
After filtering out tables that do not satisfy the χ² threshold, we use the same cluster token counts
to calculate rank order statistics. We compare the token rank order from each constituent table to that in
the corresponding initial cluster using a modified Spearman rank correlation coefficient (r_s). As a minor
extension, we use the relative token rank, meaning that we maintain order but keep only tokens that are in
both the initial and the iterated CLP cluster(s). We select as significant those tables that maximize r_s. In
the event that two or more tables maximize r_s, we promote all such subclusters, either as a noisy cluster or
as synonymous words for the same product attribute, as determined by a manual reading.
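The modified coefficient can be sketched as follows, assuming ties are broken by sort order (the text does not specify tie handling); only tokens present in both the table and the initial cluster enter the computation:

```python
def relative_spearman(table_counts, cluster_counts):
    """Spearman rank correlation computed over the tokens present in BOTH
    the table and the initial cluster (the 'relative' rank extension):
    frequency order is preserved, but tokens unique to either side drop out."""
    shared = sorted(set(table_counts) & set(cluster_counts))
    n = len(shared)
    if n < 2:
        return 0.0
    def ranks(counts):
        order = sorted(shared, key=lambda t: -counts[t])
        return {t: i for i, t in enumerate(order)}
    r_table, r_cluster = ranks(table_counts), ranks(cluster_counts)
    d2 = sum((r_table[t] - r_cluster[t]) ** 2 for t in shared)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

A table that preserves the cluster's frequency order over shared tokens scores 1.0; a fully reversed order scores -1.0.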
In this Appendix, we list the digital camera product attributes (with duplicates eliminated) that are
found exclusively in one or more online buying guides (Expert Only), learned automatically from the
product reviews (VOC Only), or in both (Expert + VOC). The means for Familiarity and Importance as
collected from our survey are reported on a 1 to 7 scale. To help align the attribute names used here with
those in Table 2, we include a mapping from the automatically derived attribute clusters (auto) to those 55
attributes used in the consumer survey. Note that in some cases, an automatically derived attribute is
mapped to more than one survey (expert) attribute name and vice versa due to inconsistencies between the
granularity with which an attribute is discussed in the expert guides and/or by the voice of the consumer.
Expert Only
Attribute Familiarity Importance
battery source 6.42 6.05
flash ext 4.91 3.09
flash range 4.17 3.90
image compress 4.80 4.62
image sensor 2.37 3.25
image stab 4.88 5.28
man light sens 3.69 3.69
manual exp 2.69 3.10
manual light meter 2.38 2.86
manual shut 3.09 3.24
mem qty built-in 5.68 5.05
movie fps 4.43 4.35
movie output 3.90 4.20
music play 4.57 3.80
num sensors 2.25 3.18
power adapt 6.05 4.87
time lapse 4.45 3.76
wide angle 3.57 3.57
VOC Only
Survey attribute Auto Familiarity Importance
camera size size 6.35 5.87
body (design) body 5.82 5.48
download time USB 5.68 4.75
feel (durability) feel; support service 5.30 5.43
instructions instruction 6.07 4.18
lcd brightness screen 4.62 4.57
shutter lag slow; shutter 3.87 3.80
twist lcd cover 3.80 3.62
Expert + VOC
Attribute Auto Familiarity Importance
battery life battery 6.50 6.30
cam soft edit 5.32 3.57
cam type shoot 4.65 5.06
comp cxn USB 6.13 5.75
comp soft software 5.77 4.60
ergonomic feel feel 5.15 4.83
flash built-in red-eye 6.45 6.26
flash mode low-light; red-eye 5.48 5.12
lcd viewfinder lcd 5.00 5.15
lens cap cover; lens 4.92 4.25
lens type macro 3.80 4.20
manual aper control 2.40 2.86
manual focus control; focus 5.14 3.72
mem capacity mb 5.30 5.68
mem stor type disk; floppy; flash (drive) 5.63 5.47
movie audio movie 4.85 4.48
movie length movie 5.73 5.47
movie res mpeg 4.97 5.15
navigation menu 5.93 5.40
optical viewfinder optical 4.50 4.33
picture quality photo qual; picture; print; image qual 6.35 6.60
price price 6.31 6.45
resolution resolution; megapixel 5.35 5.56
mode, two-way count data (Everitt and Dunn 2001). To interpret the dimensions in the reduced space, we
regressed each brand’s factor scores on the derived attributes. For each figure in the paper, we report the
results of the stepwise regression on F1 and F2. The reported results assume a probability for entry of
ENDNOTES
1
Interested readers may contact the corresponding author for the source code for all of the algorithms described in this paper.
2
Each subject also completed a six-item digital camera quiz after providing their self-
assessment. The correlation was high (r = 0.4) and hence we use the self-assessment score in
subsequent analysis.