Driver Analysis and Product Optimization Using Bayesian Networks

Tutorial on Driver Analysis and Product Optimization
with BayesiaLab
Stefan Conrady, stefan.conrady@conradyscience.com
Dr. Lionel Jouffe, jouffe@bayesia.com
December 1, 2010
Conrady Applied Science, LLC - Bayesia’s North American Partner for Sales and Consulting
Conrady Applied Science, LLC - www.conradyscience.com
Table of Contents
Tutorial on Driver Analysis and Product Optimization with BayesiaLab

Introduction
BayesiaLab
Conrady Applied Science
Acknowledgements
Abstract
Bayesian Networks
Structural Equation Models
Probabilistic Structural Equation Models
Tutorial
Model Development
Data Preparation
Consumer Research
Data Import
Unsupervised Learning
Preliminary Analysis
Variable Clustering
Multiple Clustering
Analysis of Factors
Completing the PSEM
Market Driver Analysis
Product Driver Analysis
Product Optimization
Conclusion
Contact Information
Conrady Applied Science, LLC
Bayesia SAS
Copyright

Introduction
This tutorial is intended for new or prospective users of BayesiaLab. The
example in this tutorial is taken from the field of marketing science and is
meant to illustrate the capabilities of BayesiaLab with a real-world case
study and actual consumer data. Beyond market researchers, analysts and
researchers in many fields will hopefully find the proposed methodology
valuable and intuitive. In this context, many of the technical steps, such as
data preparation and network learning, are outlined in great detail, as they
are applicable to research with BayesiaLab in general, regardless of the
domain.

BayesiaLab
Bayesia SAS, based in Laval, France, has been developing BayesiaLab since
1999 and it has emerged as the leading software package for knowledge
discovery, data mining and knowledge modeling using Bayesian networks.
BayesiaLab enjoys broad acceptance in academic communities as well as in
business and industry. The relevance of Bayesian networks, especially in the
context of market research, is highlighted by Bayesia’s strategic partnership
with Procter & Gamble, who has deployed BayesiaLab globally since 2007.

Conrady Applied Science
Conrady Applied Science, based in Franklin, TN, is a consulting firm
specializing in knowledge discovery and probabilistic reasoning with Bayesian
networks. In 2010, Conrady Applied Science was appointed Bayesia’s authorized
sales and consulting partner for North America.

Acknowledgements
We would like to express our gratitude to Ares Research
(www.ares-etudes.com) for generously providing data from their consumer
research for our case study.

Abstract
Market driver analysis and product optimization are among the central tasks
in Product Marketing and thus relevant to virtually all types of businesses.
BayesiaLab provides a unified software platform, which can, based on consumer
data,
1. provide deep understanding of the market preference structure
2. directly generate recommendations for prioritized product actions.
The proposed approach utilizes Probabilistic Structural Equation Models
(PSEM), based on machine-learned Bayesian networks. PSEMs provide an
efficient alternative to Structural Equation Models (SEM), which have been
used traditionally in market research.

Bayesian Networks
A Bayesian network, or belief network, is a probabilistic graphical model
that represents the joint probability distribution over a set of random
variables and their conditional dependencies via a directed acyclic graph
(DAG). For example, a Bayesian network could represent the probabilistic
relationships between diseases and symptoms. Given symptoms, the network can
be used to compute the probabilities of the presence of various diseases.

Structural Equation Models
Structural Equation Modeling (SEM) is a statistical technique for testing
and estimating causal relations using a combination of statistical data and
qualitative causal assumptions. This definition of SEM was articulated by the
geneticist Sewall Wright (1921), the economist Trygve Haavelmo (1943) and the
cognitive scientist Herbert Simon (1953), and formally defined by Judea Pearl
(2000).
Structural Equation Models (SEM) allow both confirmatory and exploratory
modeling, meaning they
are suited to both theory testing and theory development.

Probabilistic Structural Equation Models
Traditionally, specifying and estimating an SEM required a multitude of
manual steps, which are typically very time consuming, often requiring weeks
or even months of an analyst’s time. PSEMs are based on the idea of
leveraging machine learning for automatically generating a structural model.
As a result, creating PSEMs with BayesiaLab is extremely fast and can thus
form an immediate basis for much deeper analysis and optimization.

Tutorial
At the beginning of this tutorial, we want to emphasize the overarching
objectives of this case study, so we don’t lose sight of the “big picture” as
we immerse ourselves in the technicalities of BayesiaLab and Bayesian
networks.

In this study we want to examine how product attributes perceived by
consumers relate to purchase intention for specific products. Put simply, we
want to understand the key drivers for purchase intent. Given the large
number of attributes in our study, we also want to identify common concepts
among these attributes in order to make interpretation easier and
communication with managerial decision makers more effective.

Secondly, we want to utilize the generated understanding of consumer
dynamics, so product developers can optimize the characteristics of the
products under study in order to increase purchase intent among consumers,
which is our ultimate business objective.

Notation
In order to clearly distinguish between natural language, BayesiaLab-specific
functions and study-specific variable names, the following notation is used:
• BayesiaLab functions, keywords, commands, etc., are shown in bold type.
• Variable names are capitalized and italicized.

Model Development

Data Preparation

Consumer Research
This study is based on a monadic¹ consumer survey about perfumes, which was
conducted in France. In this example we use survey responses from 1,320
women, who have evaluated a total of 11 fragrances on a wide range of
attributes:
• 27 ratings on fragrance-related attributes, such as “sweet”, “flowery”,
  “feminine”, etc., measured on a 1-to-10 scale.
• 12 ratings on projected imagery related to someone who would be wearing
  the respective fragrance, e.g. “is sexy”, “is modern”, measured on a
  1-to-10 scale.
• 1 variable for Intensity, a measure reflecting the level of intensity,
  measured on a 1-to-5 scale.²
• 1 variable for Purchase Intent, measured on a 1-to-6 scale.
• 1 nominal variable, Product, for product identification purposes.

Data Import
To start the analysis with BayesiaLab, we first import the data set, which is
formatted as a CSV file.³ With Data>Open Data Source>Text File, we start the
Data Import wizard, which immediately provides a preview of the data file.

¹ A product test involving only one product, i.e. in our study each
  respondent evaluated only one perfume.
² The variable Intensity is listed separately due to the a-priori knowledge
  of its non-linearity and the existence of a “just-about-right” level.
³ CSV stands for “comma-separated values”, a common format for text-based
  data files.

The table displayed in the Data Import wizard shows the individual variables
as columns and the responses as rows. There are a number of options
available, e.g. for sampling. However, this is not necessary in our example
given the relatively small size of the database.

Clicking the Next button prompts a data type analysis, which provides
BayesiaLab’s best guess regarding the data type of each variable.

Furthermore, the Information box provides a brief summary regarding the
number of records, the number of missing values, filtered states, etc.⁴

For this example, we will need to override the default data type for the
Product variable, as each value is a nominal product identifier rather than a
numerical scale value. We can change the data type by highlighting the
Product variable and clicking the Discrete check box, which changes the color
of the Product column to red.

We will also define Purchase Intent and Intensity as discrete variables, as
the default number of states of these variables is already adequate for our
purposes.⁵

The next screen provides options as to how to treat any missing values. In
our case, there are no missing values, so the corresponding panel is grayed
out.

Clicking the small upside-down triangle next to the variable names brings up
a window with key statistics of the selected variable, in this case Fresh.

⁴ There are no missing values in our database and filtered states are not
  applicable in this survey.
⁵ The desired number of variable states is largely a function of the
  analyst’s judgment.

The next step is the Discretization and Aggregation dialogue, which allows
the analyst to determine the type of discretization, which must be performed
on all continuous variables.⁶ For this survey, and given the number of
observations, it is appropriate to reduce the number of states from the
original 10 states (1 through 10) to a smaller number. One could, for
instance, bin the 1-10 rating into low, mid and high, or apply any other
arbitrary method deemed appropriate by the analyst.
The screenshot shows the dialogue for the Manual selection of discretization
steps, which permits selecting binning thresholds by point-and-click.

Note
For choosing discretization algorithms beyond this example, the following
rule of thumb may be helpful:
• For supervised learning, choose Decision Tree.
• For unsupervised learning, choose, in the order of priority, K-Means,
  Equal Distances or Equal Frequencies.

For this particular example, we select Equal Distances with 5 intervals for
all continuous variables. This was the analyst’s choice in order to be
consistent with prior research.

Clicking Select All Continuous followed by Finish completes the import
process and the 49 variables (columns) from our database are now shown as
blue nodes in the Graph Panel, which is the main window for network editing.

This initial view represents a fully unconnected Bayesian network.
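
To make the Equal Distances choice concrete, the following is a minimal sketch of equal-width binning of a 1-to-10 rating into 5 intervals. It is not BayesiaLab code; the function names, the label format for the middle bins and the rule that a value on a threshold falls into the lower bin are assumptions made for illustration. The resulting outer thresholds, 2.8 and 8.2, agree with the state names “<=2.8” and “>8.2” used later in this tutorial.

```python
from bisect import bisect_left


def equal_distance_thresholds(low, high, n_bins):
    """Return the n_bins - 1 interior cut points of an equal-width binning."""
    width = (high - low) / n_bins
    # Rounding removes floating-point noise such as 6.400000000000001.
    return [round(low + k * width, 10) for k in range(1, n_bins)]


def discretize(value, thresholds):
    """Return the bin index; a value equal to a cut point goes to the lower bin (assumption)."""
    return bisect_left(thresholds, value)


thresholds = equal_distance_thresholds(1, 10, 5)
# Hypothetical state labels in the style "<=2.8" ... ">8.2".
labels = [f"<={t}" for t in thresholds] + [f">{thresholds[-1]}"]

ratings = [1, 3, 5, 8, 9, 10]
binned = [labels[discretize(r, thresholds)] for r in ratings]
print(thresholds)
print(binned)
```

With equal-width bins the thresholds depend only on the scale limits, not on the data, which is one reason such a binning is easy to keep consistent with prior research.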
For reasons that will become clear later, we will initially exclude two
variables, Product and Purchase Intent. We can do so by right-clicking the
nodes and selecting Properties>Exclusion. Alternatively, holding “x” while
double-clicking the nodes performs the same exclusion function.

⁶ BayesiaLab requires discrete distributions for all variables.

Unsupervised Learning
As the next step, we will perform the first unsupervised learning of a
network by selecting Learning>Association Discovering>EQ.

The resulting view shows the learned network with all the nodes in their
original position.

Needless to say, this view of the network is not very intuitive. BayesiaLab
has numerous built-in layout algorithms, of which the Force Directed Layout
is perhaps the most commonly used.

It can be invoked by View>Automatic Layout>Force Directed Layout or
alternatively through the keyboard shortcut “p”. This shortcut is worth
remembering as it is one of the most commonly used functions.

The resulting network will look similar to the following screenshot.
To optimize the use of the available screen, clicking the Best Fit button in
the toolbar “zooms to fit” the graph to the screen. In addition, rotating the
graph with the Rotate Left and Rotate Right buttons helps to create a
suitable view.

The final graph should closely resemble the following screenshot and, in this
view, the properties of this first learned Bayesian network become
immediately apparent. This network is now a compact representation of the 47
dimensions of the joint probability distribution of the underlying database.

It is very important to note that, although this learned graph happens to
have a tree structure, this is not the result of an imposed constraint.

Preliminary Analysis
The analyst can further examine this graph by switching into the Validation
Mode, which immediately opens up the Monitor Panel on the right side of the
screen.
This panel is initially empty, but by clicking on any node or multiple nodes
in the network, Monitors appear inside the Monitor Panel and the
corresponding nodes are highlighted in yellow.

By default, the Monitors show the marginal distributions of all selected
variables. This shows, for instance, that 9.7% of respondents rated their
perfume at <=2.8 in terms of the Fresh attribute.

On this basis, one can start to experiment with the properties of this
particular Bayesian network and query it. With BayesiaLab this can be done in
an extremely intuitive way, i.e. by setting evidence (or observations)
directly on the Monitors. For instance, we can compute the conditional
probability distribution of Flowery, given that we have observed a specific
value, i.e. a specific state of Fresh. In formal notation, this would be
P(Flowery | Fresh)

We will now set Flowery to the state that represents the highest rating
(>8.2) and we can immediately observe the conditional probability
distribution of Fresh, i.e.
P(Fresh | Flowery = ">8.2")

The gray arrows inside the bars indicate how the distributions have changed
compared to the previous distributions. This means that respondents who have
rated the Flowery attribute of a perfume at the top level will have a 67%
probability of also assigning a top rating to the Fresh attribute.
P(Fresh = ">8.2" | Flowery = ">8.2") = 66.9%

Note
The structure of our Bayesian network may be directed, but the directions of
the arcs do not necessarily have to be meaningful.
For observational inference, it is only necessary that the Bayesian network
correctly represents the joint probability distribution of the underlying
database.

Switching briefly back into the Modeling Mode and clicking on the Flowery
node, one can see the probabilistic relationship between Flowery and Fresh in
detail. By learning the network, BayesiaLab has automatically created a
contingency table for every single direct relationship between nodes.
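
How a contingency table yields both a marginal and a conditional distribution can be sketched in a few lines. The counts below are invented purely for illustration and are not the survey data; the two-state simplification and the function names are likewise assumptions, and nothing here reflects how BayesiaLab stores its tables internally.

```python
# Hypothetical respondent counts by (Flowery state, Fresh state).
# Invented numbers, NOT taken from the perfume survey.
counts = {
    "<=8.2": {"<=8.2": 700, ">8.2": 100},
    ">8.2": {"<=8.2": 60, ">8.2": 140},
}


def conditional_distribution(counts, parent_state):
    """Return P(Fresh | Flowery = parent_state) by normalizing one table row."""
    row = counts[parent_state]
    total = sum(row.values())
    return {child_state: n / total for child_state, n in row.items()}


def marginal_distribution(counts):
    """Return the marginal P(Fresh) by summing counts over all Flowery states."""
    totals = {}
    for row in counts.values():
        for child_state, n in row.items():
            totals[child_state] = totals.get(child_state, 0) + n
    grand_total = sum(totals.values())
    return {state: n / grand_total for state, n in totals.items()}


prior = marginal_distribution(counts)                 # what a Monitor shows by default
posterior = conditional_distribution(counts, ">8.2")  # after setting evidence on Flowery
print(prior)
print(posterior)
```

In this toy table, observing a top Flowery rating raises the probability of a top Fresh rating from 24% to 70%, which is the kind of shift the gray arrows in the Monitors indicate.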

All contingency tables, together with the graph structure, thus encode the
joint probability distribution of our original database.

Returning to the Validation Mode, we can further examine the properties of
our network. Of great interest is the strength of the probabilistic
relationships between the variables. In BayesiaLab this can be shown by
selecting Analysis>Graphic>Arcs’ Mutual Information.

The thickness of the arcs is now proportional to the Mutual Information, i.e.
the strength of the relationship between the nodes.

Intuitively, Mutual Information measures the information that X and Y share:
it measures how much knowing one of these variables reduces our uncertainty
about the other. For example, if X and Y are independent, then knowing X does
not provide any information about Y and vice versa, so their mutual
information is zero. At the other extreme, if X and Y are identical then all
information conveyed by X is shared with Y: knowing X determines the value of
Y and vice versa.

Formal Definition of Mutual Information
$$I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x,y) \log\left(\frac{p(x,y)}{p(x)\,p(y)}\right)$$

We can also show the values of the Mutual Information on the graph by
clicking on Display Arc Comments.

In the top part of the comment box attached to each arc the Mutual
Information of the arc is shown. Below, expressed as a percentage and
highlighted in blue, we see the relative Mutual Information in the direction
of the arc (parent node ➔ child node). And, at the bottom, we have the
relative mutual information in the opposite direction of the arc (child node
➔ parent node).

Variable Clustering
The information about the strength of the relationships between the manifest
variables can also be utilized for purposes of Variable Clustering. More
specifically, a concept related closely to the Mutual Information, namely the
Kullback-Leibler Divergence (K-L Divergence), is utilized for clustering.

For probability distributions P and Q of a discrete random variable, their
K-L divergence is defined to be
$$D_{KL}(P \,\|\, Q) = \sum_{i} P(i) \log\frac{P(i)}{Q(i)}$$
In words, it is the average of the logarithmic difference between the
probabilities P(i) and Q(i), where the average is taken using the
probabilities P(i).
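
The close relationship between the two measures can be checked numerically: Mutual Information equals the K-L divergence between the joint distribution p(x, y) and the product of its marginals p(x)p(y). The sketch below uses an invented joint distribution of two binary variables and base-2 logarithms (bits); the base is an assumption, since the formulas above leave it open. How BayesiaLab applies the K-L divergence inside its clustering algorithm is not specified here, and this example does not attempt to reproduce it.

```python
from math import log2

# Hypothetical joint distribution p(x, y) of two binary variables.
joint = {
    ("low", "low"): 0.4,
    ("low", "high"): 0.1,
    ("high", "low"): 0.1,
    ("high", "high"): 0.4,
}


def marginals(joint):
    """Return the marginal distributions p(x) and p(y) of a joint table."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py


def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i P(i) * log2(P(i) / Q(i)); terms with P(i) = 0 are skipped."""
    return sum(pi * log2(pi / q[i]) for i, pi in p.items() if pi > 0)


def mutual_information(joint):
    """I(X;Y) computed directly from its definition."""
    px, py = marginals(joint)
    return sum(
        p * log2(p / (px[x] * py[y])) for (x, y), p in joint.items() if p > 0
    )


px, py = marginals(joint)
# Joint distribution X and Y would have if they were independent.
independent = {(x, y): px[x] * py[y] for x in px for y in py}

mi = mutual_information(joint)
kl = kl_divergence(joint, independent)
print(round(mi, 4), round(kl, 4))
```

Both numbers coincide, and applying `mutual_information` to the `independent` table returns zero, matching the intuition that independent variables share no information.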
Such variable clusters will allow us to induce new latent variables, which
each represent a common concept among the manifest variables.⁷ From here on,
we will make a very clear distinction between manifest variables, which are
directly observed, such as the survey responses, and latent variables, which
are derived. In traditional statistics, deriving such latent variables or
factors is typically performed by means of Factor Analysis, e.g. Principal
Components Analysis (PCA).

In BayesiaLab, this “factor extraction” can be done very easily via the
Analysis>Graphics>Variable Clustering function, which is also accessible
through the keyboard shortcut “s”.

The speed with which this is performed is one of the strengths of BayesiaLab,
as the resulting variable clusters are presented instantly.

In this case, BayesiaLab has identified 15 variable clusters and each node is
color-coded according to the cluster membership. To interpret these
newly-found clusters, we can zoom in and visually examine the structure on
the graph panel.

To support the interpretation process, BayesiaLab can also display a
Dendrogram, which allows the analyst to review the linkage of nodes into
variable clusters.

⁷ An alternative approach is to interpret the derived concept or factor as a
  hidden common cause.

The analyst may also choose a different number of clusters, based on his own
judgement relating to the domain. A slider in the toolbar allows choosing
various numbers of clusters and the color association of the nodes will be
updated instantly.

By clicking the Validate Clustering button in the toolbar, the clusters are
saved and the color codes will be formally associated with the nodes. A
clustering report provides us with a formal summary of the new factors and
their associated manifest variables.⁸

The analyst also has the option to use his domain knowledge to modify which
manifest variables belong to specific factors. This can be done by
right-clicking on the Graph Panel and selecting Class Editor.

Multiple Clustering
As our next step towards building the PSEM, we will introduce these
newly-generated latent factors into our existing network and also estimate
their probabilistic relationships with the manifest variables. This means we
will create a new node for each latent factor, creating 15 new dimensions in
our network. For this step, we will need to return to the Modeling Mode,
because the introduction of the factor nodes into the networks requires the
learning algorithms.

⁸ Variable cluster = derived concept = unobserved latent variable = hidden
  cause = extracted factor.

More specifically, we select Learning>Multiple Clustering, which brings up
the Multiple Clustering dialogue. There is a range of settings, but we will
focus here on only a subset. Firstly, we need to specify an output directory
for the to-be-learned networks. Secondly, we need to set some parameters for
the clustering process, such as the minimum and maximum number of states,
which can be created during the learning process.

In our example, we select Automatic Selection of the Number of Classes, which
will allow the learning algorithm to find the optimum number of factor states
up to a maximum of five states. This means that each new factor will need to
represent the corresponding manifest variables with up to five states.

The Multiple Clustering process concludes with a report, which shows details
regarding the generated clustering. The top portion of the report is shown in
the following screenshot.

The detail section of Factor_0, as it relates to the manifest variables, is
worth highlighting. Here we can see the strength of the relationship between
the manifest variables, such as Trust, Bold, etc., and Factor_0. In a
traditional Factor Analysis, this would be the equivalent of factor loading.

After closing the report, we will now see a new (unconnected) network, with
15 additional nodes, one for each factor, i.e. Factor_0 through Factor_14,
highlighted in yellow in the screenshot.

Analysis of Factors
We can also further examine how the new factors relate to the manifest
variables and how well they represent them. In the case of Factor_0, we want
to understand how it can summarize our five manifest variables.

By going into our previously-specified output directory, using the Windows
Explorer or the Mac Finder, we can see that 15 new networks (in BayesiaLab’s
xbl format for networks) were generated. We open the specific network for
Factor_0, either by directly double-clicking the xbl file or by selecting
Network>Open. The factor-specific networks are identified by a
suffix/extension of the format “_[Factor_#].xbl” and “#” stands for the
factor number. We then see a network including the manifest variables and
with the factor being linked by arcs going from the factor to the manifest
variables.

Returning to the Validation Mode, we can see five states for Factor_0,
labeled C1 through C5, as well as their marginal distribution. As Factor_0 is
a target node by default, it automatically appears highlighted in red in the
Monitor Panel.

Here we can also study how the states of the manifest variables relate to the
states of Factor_0. This can be done easily by setting observations to the
monitors, e.g. setting C1 to 100%.

We now see that given that Factor_0 is in state C1, the variable Active has a
probability of approx. 75% of being in state <=2.8. Expressed more formally,
we would state P(Active = “<=2.8” | Factor_0 = C1) = 74.57%. This means that
for respondents who have been assigned to C1, it is likely that they would
rate the Active attribute very low as well.

In the Monitor for Factor_0, in parentheses behind the cluster name, we find
the expected mean value of the numeric equivalents of the states of the
manifest variables, e.g. “C1 (2.08)”. That means that given the state C1 of
Factor_0, we expect the mean value of Trust, Bold, Fulfilled, Active and
Character to be 2.08.

To go into even greater detail, we can actually look at every single
respondent, i.e. every record in the database, and see what cluster they were
assigned to. We select Inference>Interactive Inference, which will bring up a
record selector in the toolbar.

With this record selector, we can now scroll through the entire database,
review the actual ratings of the respondents and then see the estimation to
which cluster each respondent belongs.

In our first case, record 0, we see the ratings of this respondent indicated
by the manifest Monitors. In the highlighted Monitor for Factor_0 we read
that this respondent, given her responses, has an 82% probability of
belonging to Cluster 5 (C5) in Factor_0.

Moving to our second case, record 1, we see that the respondent belongs to
Cluster 3 (C3) with a 96% probability.
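
The cluster probabilities shown for each record follow from Bayes’ rule applied to the factor network, in which arcs run only from the factor to each manifest variable. The sketch below illustrates that computation with invented parameters for three factor states and two manifest variables in two states each; the numbers, the reduced state space and the function name are assumptions and do not come from the study.

```python
# Hypothetical parameters; none of these numbers come from the perfume study.
prior = {"C1": 0.3, "C2": 0.5, "C3": 0.2}
likelihood = {
    "Active": {
        "C1": {"low": 0.8, "high": 0.2},
        "C2": {"low": 0.4, "high": 0.6},
        "C3": {"low": 0.1, "high": 0.9},
    },
    "Bold": {
        "C1": {"low": 0.7, "high": 0.3},
        "C2": {"low": 0.5, "high": 0.5},
        "C3": {"low": 0.2, "high": 0.8},
    },
}


def cluster_membership(prior, likelihood, record):
    """Return P(factor state | record) via Bayes' rule.

    Assumes the manifest variables are conditionally independent given the
    factor, which is what a graph with arcs only from the factor to each
    manifest variable encodes.
    """
    scores = {}
    for state, p_state in prior.items():
        score = p_state
        for variable, observed in record.items():
            score *= likelihood[variable][state][observed]
        scores[state] = score
    evidence = sum(scores.values())
    return {state: score / evidence for state, score in scores.items()}


record = {"Active": "high", "Bold": "high"}  # one respondent's discretized ratings
membership = cluster_membership(prior, likelihood, record)
best = max(membership, key=membership.get)
print(best, {state: round(p, 3) for state, p in membership.items()})
```

Scrolling through records with the record selector corresponds to repeating this calculation with a different `record` each time.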

We can also evaluate the performance of our new network based on Factor_0 by
selecting Analysis>Network Performance>Global.

This will return the log-likelihood density function, as shown in the
following screenshot.

Completing the PSEM
We are now returning to our main task and our principal network, which has
been augmented by the 15 new factors.

Before we re-learn our network with the new factors, we need to include
Purchase Intent as a variable and also impose a number of constraints in the
form of Forbidden Arcs.

Being in the Modeling Mode, we can include Purchase Intent by right-clicking
the node and unchecking Exclusion.

This makes the Purchase Intent variable available in the next stage of
learning, which is reflected visually as well in the node color and the icon.

Our desired SEM-type network structure stipulates that manifest variables be
connected exclusively to the factors and that all the connections with
Purchase Intent must also go through the factors. We achieve such a structure
by imposing the following sets of forbidden arcs:
1. No arcs between manifest variables
2. No arcs from manifest variables to factors
3. No arcs between manifest variables and Purchase Intent
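
The three rules can be read as a filter on candidate arcs based on the class of the source and target node. The following sketch expresses that reading; it is not BayesiaLab’s Forbidden Arc Editor, and the class labels, the node-to-class mapping and the function name are assumptions chosen for illustration.

```python
# Hypothetical node-to-class assignment; node names follow the tutorial,
# the class labels and the helper are illustrative only.
node_class = {
    "Fresh": "Manifest",
    "Flowery": "Manifest",
    "Factor_0": "Factor",
    "Factor_2": "Factor",
    "Purchase Intent": "Target",
}


def is_forbidden(source, target):
    """Apply the three forbidden-arc rules to a directed arc source -> target."""
    src, dst = node_class[source], node_class[target]
    if src == "Manifest" and dst == "Manifest":
        return True  # rule 1: no arcs between manifest variables
    if src == "Manifest" and dst == "Factor":
        return True  # rule 2: no arcs from manifest variables to factors
    if {src, dst} == {"Manifest", "Target"}:
        return True  # rule 3: no arcs between manifest variables and Purchase Intent
    return False


candidate_arcs = [
    ("Fresh", "Flowery"),
    ("Fresh", "Factor_0"),
    ("Factor_0", "Fresh"),
    ("Purchase Intent", "Flowery"),
    ("Factor_2", "Purchase Intent"),
    ("Factor_0", "Factor_2"),
]
allowed = [arc for arc in candidate_arcs if not is_forbidden(*arc)]
print(allowed)
```

Only arcs from factors to manifest variables, between factors, and between factors and Purchase Intent survive, which is the SEM-type structure described above.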
We can define these forbidden arcs by right-clicking anywhere on the graph
panel, which brings up the following menu.

In BayesiaLab, all manifest variables and all factors are conveniently
grouped into classes, so we can easily define which arcs are forbidden in the
Forbidden Arc Editor.

Upon completing this step, we can proceed to learning our network again:
Learning>Association Discovering>EQ

The initial result will resemble the following screenshot.

Using the Force Directed Layout algorithm (shortcut “p”), as before, we can
quickly transform this network into a much more interpretable format.

Now we see manifest variables “laddering up” to the factors and we also see
how the factors are related to each other. Most importantly, we can observe
where the Purchase Intent node was attached to the network during the
learning process. The structure conveys that Purchase Intent has the
strongest link with Factor_2.

Now that we can see the big picture, it is perhaps appropriate to give the
factors more descriptive names. For obvious reasons, this task is the
responsibility of the analyst. In this case study, Factor_0 was given the
name “Self-Confident”. We add this name into the node comments by
double-clicking Factor_0 and scrolling to the right inside the Node Editor
until we see the Comments tab.

We repeat this for all other nodes and we can subsequently display the node
comments for all factors by clicking the Display Node Comment icon in the
toolbar or by selecting View>Display Node Comments from the menu.

Market Driver Analysis
Our model, the PSEM, is complete and we can now use it to perform the actual
analysis part of this exercise, namely to find out what “drives” Purchase
Intent.

We return to the Validation Mode and right-click on Purchase Intent and then
check Set As Target Node. Double-clicking the node while pressing “t” is a
helpful shortcut.
This will also change the appearance of the node and literally give it the
look of a target.

In order to understand the relationship between the factors and Purchase
Intent, we want to tune out all the manifest variables for the time being. We
can do so by right-clicking the Use of Classes icon in the bottom right
corner of the screen. This will bring up a list of all classes. By default,
all are checked and thus visible.

For our purposes, we want to deselect All and then only check the Factor
class.

The resulting view has all the manifest variables grayed out, so the
relationship between the factors becomes more prominent. By deselecting the
manifest variables, we also exclude them from subsequent analysis.

We will now right-click inside the (currently empty) Monitor Panel and select
Monitors Sorted wrt Target Variable Correlations. The keyboard shortcut “x”
will do the same.

“Correlations” is more of a metaphor here, as BayesiaLab actually orders the
factors by their mutual information relative to the target node, Purchase
Intent.

This brings up the monitor for the target node, Purchase Intent, plus all the
monitors for the factors, in the order of the strength of relationship with
the Target Node.

This immediately highlights the order of importance of the factors relative
to the Target Node, Purchase Intent. Another way of comprehensively
displaying the importance is by selecting Reports>Target
Analysis>Correlations With the Target Node

By clicking Quadrants, we can obtain a type of opportunity graph, which shows
the mean value of each factor on the x-axis and the relative Mutual
Information with Purchase Intent on the y-axis. Mutual Information can be
interpreted as importance in this context.
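
The logic of such an opportunity graph can be sketched as follows, assuming the chart is divided at the mean of each axis, so that the upper-left quadrant holds items of above-average importance and below-average rating. The factor names and values are invented for illustration, and the function is a hypothetical helper, not a BayesiaLab feature.

```python
from statistics import mean

# Hypothetical (mean rating, relative Mutual Information with the target)
# per factor; the names and values are invented for illustration.
factors = {
    "Factor_A": (4.1, 0.12),
    "Factor_B": (6.3, 0.10),
    "Factor_C": (5.8, 0.03),
    "Factor_D": (3.9, 0.02),
}


def quadrants(points):
    """Assign each point to a quadrant relative to the mean of each axis."""
    x_mean = mean(x for x, _ in points.values())
    y_mean = mean(y for _, y in points.values())
    result = {}
    for name, (x, y) in points.items():
        vertical = "upper" if y > y_mean else "lower"
        horizontal = "right" if x > x_mean else "left"
        result[name] = f"{vertical}-{horizontal}"
    return result


assignment = quadrants(factors)
# Above-average importance combined with a below-average rating.
opportunities = [name for name, q in assignment.items() if q == "upper-left"]
print(assignment)
print(opportunities)
```

In this toy data set only `Factor_A` lands in the upper-left quadrant and would therefore be flagged as an opportunity.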

By right-clicking on the graph, we can switch between the display of the
formal factor names, e.g. Factor_0, Factor_1, etc., and the factor comments,
such as Adequacy, Seduction, which is much easier for interpretation.

As in the previous views, it becomes very obvious that the factor Adequacy is
most important with regard to Purchase Intent, followed by the factor
Seduction. This is very helpful for understanding the overall market dynamics
and for communicating the key drivers to managerial decision makers.

The lines dividing the graph into quadrants reflect the mean values for each
axis. The upper-left quadrant highlights opportunities as these particular
factors are “above average” in importance, but “below average” in terms of
their rating.

Product Driver Analysis
Although this insight is relevant for the whole market, it does not yet allow
us to work on improving specific products. For this we need to look at
product-specific graphs. In addition, we may need to introduce constraints as
to where we may not have the ability to impact any attributes. Such
information must come from the domain expert, in our case from the perfumer,
who will determine if and how odoriferous compounds can affect the consumers’
perception of the product attributes.

These constraints can be entered into BayesiaLab’s Cost Editor, which is
accessible by right-clicking anywhere in the Graph Panel. Those attributes,
which cannot be changed (as determined by the expert), will be set to “Not
Observable”. As we proceed with our analysis, these constraints will be
extremely important when searching for realistic product scenarios.

On a side note, an example from the presumably more tangible auto industry
may better illustrate such kinds of constraints. For instance, a vehicle
platform may have an inherent wheelbase limitation, which thus sets a hard
limit regarding the maximum amount of rear passenger legroom. Even if
consumers perceived a need for improvement on this attribute, making such a
recommendation to the engineers would be futile. As we search for optimum
product solutions with our Bayesian network, this is very important to bear
in mind and thus we must formally encode these constraints of our domain
through the Cost Editor.

Product Optimization
We now return briefly to the Modeling Mode to include the Product variable,
which has been excluded from our analysis thus far. Right-clicking the node
and then unchecking Properties>Exclusion will achieve this.

At this time, we will also move beyond the analysis of factors and actually
look at the individual product attributes, so we select Manifest from the
Display Classes menu.

Back in the Validation Mode, we can perform a Multi Quadrant Analysis:
Tools>Multi Quadrant Analysis

This tool allows us to look at the attribute ratings of each product and
their respective importance, as expressed with the Mutual Information. Thus
we pick Product as the Selector Node and choose Mutual Information for
Analysis. In this case, we also want to check Linearize Nodes’ Values,
Regenerate Values and specify an Output Directory, where the product-specific
networks will be saved. In the process of generating the Multi Quadrant
Analysis, BayesiaLab will actually generate one Bayesian network for each
Product. For all Products the network structure will be identical to the
network for the entire market; however, the parameters, i.e. the contingency
tables, will be specific to each Product.
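
The idea of an identical structure with product-specific parameters can be illustrated by estimating the same conditional table separately from each product’s records. The sketch below uses invented, already-discretized records and plain relative frequencies; both the data and the estimation method are assumptions, as the tutorial does not state how BayesiaLab estimates these tables.

```python
from collections import Counter, defaultdict

# Hypothetical discretized records: (Product, Factor state, Purchase Intent state).
records = [
    (1, "C1", "low"), (1, "C1", "low"), (1, "C2", "high"), (1, "C2", "low"),
    (5, "C1", "low"), (5, "C2", "high"), (5, "C2", "high"), (5, "C2", "high"),
]


def tables_by_product(records):
    """Estimate P(Purchase Intent | Factor) separately for each product.

    The arc Factor -> Purchase Intent is the same for every product; only the
    table entries differ, because each product uses its own records.
    """
    counts = defaultdict(Counter)
    for product, factor_state, intent_state in records:
        counts[(product, factor_state)][intent_state] += 1
    tables = defaultdict(dict)
    for (product, factor_state), counter in counts.items():
        total = sum(counter.values())
        tables[product][factor_state] = {s: n / total for s, n in counter.items()}
    return dict(tables)


tables = tables_by_product(records)
print(tables[1]["C2"])
print(tables[5]["C2"])
```

Here the same factor state leads to different Purchase Intent distributions for the two products, which is what makes product-specific importance measures possible.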
However, before we proceed to the product-specific networks, we will first
see a Multi Quadrant Analysis by Product and we can select each product’s
graph simply by right-clicking and choosing the appropriate product
identification number.

Please note that only the observable variables are visible on the chart, i.e.
those variables which were not previously defined as “Not Observable” in the
Cost Editor.

For Product No. 5, Personality is at the very top of the importance scale.
But how will the Personality attribute compare in the competitive context? If
we Display Scales by right-clicking on the graph, it appears that Personality
is already at the best level among the competitors, i.e. to the far right of
the horizontal scale. On the other hand, on the Fresh attribute Product
No. 5⁹ marks the bottom end of the competitive range.

⁹ Any similarities of identifiers with actual product names are purely
  coincidental.

For a perfumer it would thus be reasonable to assume that there is limited room for improvement in regard to Personality and that Fresh offers perhaps significant opportunity for Product No. 5.

To highlight the differences between products, we will also show Product No. 1 in comparison.
For Product No. 1 it becomes apparent that Intensity is highly important, but that its rating is towards the bottom end of the scale. The perfumer may thus conclude that a bolder version of the same fragrance will improve Purchase Intent.

Finally, by hovering over any data point in the opportunity chart, BayesiaLab can also display the position of competitors compared to the reference product for any attribute. The screenshot shows Product No. 5 as the reference and the position of competitors on the Personality attribute.

BayesiaLab also allows us to measure and save the “gap to best level” (=variations) for each product and each variable through the Export Variations function. This formally captures our opportunity for improvement. Please note that these variations need to be saved individually by Product.
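The idea of a “gap to best level” can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `gap_to_best`, the input layout and the mean ratings are made up for this example, and the gap is assumed to be expressed as a percentage of the reference product’s own rating. The file format written by Export Variations is not reproduced here.

```python
def gap_to_best(mean_ratings, reference):
    """For each attribute, return the relative gap (in percent) between the
    reference product's mean rating and the best mean rating among all
    products. Assumption: the gap is relative to the reference's rating."""
    gaps = {}
    for attribute, by_product in mean_ratings.items():
        best = max(by_product.values())
        ref = by_product[reference]
        gaps[attribute] = 100.0 * (best - ref) / ref
    return gaps


# Toy, made-up mean ratings per attribute and product (not from the study).
mean_ratings = {
    "Personality": {1: 6.0, 5: 7.0, 7: 6.5},
    "Fresh": {1: 5.5, 5: 5.0, 7: 5.2},
}

gaps = gap_to_best(mean_ratings, reference=5)
for attribute, gap in gaps.items():
    print(f"{attribute}: {gap:.1f}% below the best-rated product")
```

A gap of zero means the reference product already sets the best level on that attribute, so no improvement within the competitive range is available there.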
By now we have all the components necessary for a comprehensive optimization of product attributes:
1. Constraints on “non-actionable” attributes, i.e. excluding those variables that cannot be affected through product changes.
2. A Bayesian network for each Product.
3. The current attribute rating of each Product and each attribute’s importance relative to Purchase Intent.
4. The “gap to best level” (variation) for each attribute and Product.

With the above, we are now in a position to search for realistic product configurations, based on the existing product, which would optimize Purchase Intent.

We proceed individually by Product and for illustration purposes we use Product No. 5 again. We load the product-specific network, which was previously saved when the Multi Quadrant Analysis was performed.
One of the powerful features of BayesiaLab is Target Dynamic Profile, which we will apply here on this network to optimize Purchase Intent: Analysis>Report>Target Analysis>Target Dynamic Profile

The Target Dynamic Profile provides a number of important options:
• Profile Search Criterion: we intend to optimize the mean of the Purchase Intent.
• Criterion Optimization: maximization is the objective.
• Search Method: We select Mean and also click on Edit Variations, which allows us to manually stipulate the range of possible variations of each attribute. In our case, however, we had saved the actual variations of Product No. 5 versus the competition, so we load that data set, which subsequently displays the values in the Variation Editor. For example, Fresh could be improved by 10.7% before catching up to the highest-rated product in this attribute.
• Search Stop Criterion: We check Maximum Number of Evidence Reached and set this parameter to 4. This means that no more than the top-four attributes will be suggested for improvement.

Initially, we have the marginal distribution of the attributes and the original mean value for Purchase Intent, i.e. 3.77.
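The search configured above can be pictured as a greedy loop: at each step, try moving every remaining attribute to the limit of its allowed variation, keep the single change that raises the expected Purchase Intent the most, and stop once the maximum number of pieces of evidence is reached. The Python sketch below illustrates this idea only. It is not BayesiaLab’s algorithm; the function `greedy_profile`, the linear `predict` stand-in (used here in place of inference in a Bayesian network), and all numbers are assumptions made for the example.

```python
def greedy_profile(predict, current, max_variation, max_evidence=4):
    """Greedily pick up to `max_evidence` attribute changes that maximize
    predict(profile). `max_variation` holds the allowed increase per
    attribute in percent of its current value (an assumption of this sketch)."""
    profile = dict(current)
    best_value = predict(profile)
    remaining = set(max_variation)
    actions = []
    while remaining and len(actions) < max_evidence:
        candidates = []
        for attribute in sorted(remaining):
            trial = dict(profile)
            trial[attribute] = current[attribute] * (1 + max_variation[attribute] / 100.0)
            candidates.append((predict(trial), attribute, trial))
        value, attribute, trial = max(candidates, key=lambda c: c[0])
        if value <= best_value:
            break  # no remaining change improves the target
        profile, best_value = trial, value
        remaining.remove(attribute)
        actions.append((attribute, value))
    return actions


# Toy stand-in for network inference: a made-up linear model.
weights = {"Fresh": 0.1, "Fruity": 0.1, "Flowery": 0.1, "Wooded": 0.1, "Intensity": 0.1}


def predict(profile):
    return 1.0 + sum(weights[a] * profile[a] for a in weights)


# Made-up current ratings and allowed variations (percent).
current = {a: 5.0 for a in weights}
max_variation = {"Fresh": 10.0, "Fruity": 8.0, "Flowery": 6.0, "Wooded": 4.0, "Intensity": 2.0}

actions = greedy_profile(predict, current, max_variation, max_evidence=4)
for attribute, value in actions:
    print(f"{attribute}: expected Purchase Intent {value:.2f}")
```

Because the variations are capped at the gap to the best competitor, every suggested step stays within what some product on the market already achieves, which is the point of loading the saved variations into the Variation Editor.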
Upon completion of all computations, we will obtain a list of product action priorities: Fresh, Fruity, Flowery and Wooded.

The highlighted Value/Mean column shows the successive improvement upon implementation of each action. From initially 3.76, the Purchase Intent improves to 3.92, which may seem like a fairly small step. However, the importance lies in the fact that this improvement is not based on utopian thinking, but rather on attainable product improvements within the range of competitive performance.

To further illustrate the impact of our product actions, we will simulate their implementation step-by-step, which is available through Inference>Interactive Inference.

With the selector in the toolbar, we can go through each product action step-by-step in the order in which they were recommended.

Upon implementation of the first product action, we obtain the following picture and Purchase Intent grows to 3.9. Please note that this is not a sea change in terms of Purchase Intent, but rather a realistic consumer response to a product change.

The second change results in further subtle improvement to Purchase Intent.

The third and fourth step are analogous and bring us to the final value for Purchase Intent of 3.92.

Although BayesiaLab generates these recommendations very quickly and easily, they represent a major innovation in the field of marketing science. This particular optimization task has not been tractable with traditional methods.

Conclusion
The presented case study demonstrates how BayesiaLab can transform simple survey data into a deep understanding of consumers’ thinking and quickly provide previously-inconceivable product recommendations. As such, BayesiaLab is a revolutionary tool, especially as the workflow shown here may take no more than a few hours for an analyst to implement. This kind of rapid and “actionable”10 insight is clearly a breakthrough and creates an entirely new level of relevance of research for business applications.

10 The authors cringe at the inflationary use of “actionable”, but here, for once, it actually seems appropriate.

Contact Information

Conrady Applied Science, LLC
312 Hamlet’s End Way
Franklin, TN 37067
USA
+1 888-386-8383
info@conradyscience.com
www.conradyscience.com

Bayesia SAS
6, rue Léonard de Vinci
BP 119
53001 Laval Cedex
France
+33(0)2 43 49 75 69
info@bayesia.com
www.bayesia.com

Copyright

© Conrady Applied Science, LLC and Bayesia SAS 2010. All rights reserved.

Any redistribution or reproduction of part or all of the contents in any form is prohibited other than the following:
• You may print or download this document for your personal and non-commercial use only.
• You may copy the content to individual third parties for their personal use, but only if you acknowledge Conrady Applied Science as the source of the material.
• You may not, except with our express written permission, distribute or commercially exploit the content. Nor may you transmit it or store it in any other website or other form of electronic retrieval system.
