
Food Quality and Preference 16 (2005) 315-325
www.elsevier.com/locate/foodqual

PLS methodology to study relationships between hedonic judgements and product characteristics
Michel Tenenhaus a,*, Jérôme Pagès b, Laurence Ambroisine c, Christiane Guinot c

a HEC School of Management (GREGHEC), 78351 Jouy-en-Josas, France
b Agrocampus, 65 rue de Saint-Brieuc, CS 84215, 35042 Rennes, France
c CE.R.I.E.S.¹, Biometrics and Epidemiology Unit, 20 rue Victor Noir, 92521 Neuilly-sur-Seine, France

Received 7 May 2003; received in revised form 15 April 2004; accepted 24 May 2004
Available online 10 July 2004

Abstract

This paper describes a methodology for the situation where a few products are described by many physico-chemical and sensory characteristics and are evaluated by consumers on a preference scale. The objective is to relate the block of hedonic variables to the physico-chemical and sensory blocks. Analysing the link between the responses and the predictors with PLS regression makes it possible to cluster the consumers into groups that are homogeneous with respect to their tastes, in such a way that each group's behaviour can be related to the characteristics of the products. For each group, PLS regression yields a graphical display of the products with their characteristics and a mapping of the consumers based on their preferences. Moreover, PLS path modelling allows a detailed analysis of each group by building a causal scheme: each block of consumers is related to the physico-chemical and sensory blocks, and the sensory block is itself related to the physico-chemical block. Finally, this PLS path modelling is compared with a hierarchical multi-block PLS model.
© 2004 Elsevier Ltd. All rights reserved.
Keywords: Hierarchical multi-block PLS model; PLS approach; PLS path modelling; PLS regression; Sensory analysis

* Corresponding author. Tel.: +33-01-39-67-72-49; fax: +33-01-39-67-71-09. E-mail addresses: tenenhaus@hec.fr (M. Tenenhaus), jerome.pages@agrocampus-rennes.fr (J. Pagès), laurence.ambroisine@ceries-lab.com (L. Ambroisine), christiane.guinot@ceries-lab.com (C. Guinot).
¹ The CE.R.I.E.S. is a skin research centre funded by CHANEL.

0950-3293/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.
doi:10.1016/j.foodqual.2004.05.013

1. Introduction

The relationship between a set of hedonic judgements (or preferences) and a set of characteristics (sensory or instrumental) observed on the same products is a fundamental problem of sensory analysis. It has given rise to numerous works, mostly derived from the PREFMAP (Carroll, 1972) and MDPREF (Chang & Carroll, 1969) methods, which have been widely used in sensory science (McEwan, 1996). Today the most frequently used methodology combines two points of view, known as external and internal analysis (Greenhoff & MacFie, 1994). Let us suppose that n products are described by p characteristics and are the subject of q hedonic judgements. In practice these judgements are given by consumers; therefore, when talking about hedonic judgements, we use the terms preferences, consumers, judgements and judges interchangeably. The product characteristics are contained in table X (dimensions n × p); the hedonic judgements are contained in table Y (dimensions n × q).

In the external analysis, each hedonic judgement is related to the main factors of variability of the product characteristics. More specifically, a principal component analysis is carried out on table X. Each hedonic judgement (column of Y) is then related to the principal components of X by means of a regression model: the model may contain only terms of degree one (vectorial model), or be supplemented by squared terms (circular model when the coefficients of the squared terms are equal by construction, elliptical model otherwise), or even by interactions. The vectorial model simply amounts to introducing the columns of Y as supplementary variables in the principal component analysis of X.
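The vectorial external model just described can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the authors' software: the function name `external_vector_model`, the SVD route to the principal components and the random stand-in data are ours. Because the principal component scores are centred and mutually orthogonal, no intercept is needed and each judge's coefficient on a component is simply a covariance ratio.

```python
import numpy as np

def standardise(a):
    """Centre each column and scale it to unit variance (n - 1 denominator)."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)

def external_vector_model(X, Y, n_pc=2):
    """Vectorial external preference mapping (illustrative sketch).

    X: n x p product characteristics, Y: n x q hedonic judgements.
    Returns the principal component scores F (n x n_pc) of standardised X and
    the OLS coefficients (n_pc x q) of every standardised judgement on F.
    """
    Xs, Ys = standardise(X), standardise(Y)
    U, s, _ = np.linalg.svd(Xs, full_matrices=False)
    F = U[:, :n_pc] * s[:n_pc]                      # principal component scores of X
    # No intercept is needed: F and Ys are both centred.
    coef, *_ = np.linalg.lstsq(F, Ys, rcond=None)   # one column of coefficients per judge
    return F, coef

# Illustration on random stand-in data (6 products, 8 characteristics, 5 judges).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))
Y = rng.normal(size=(6, 5))
F, coef = external_vector_model(X, Y)
print(coef.shape)  # (2, 5)
```

Adding squared columns of F to the design matrix would give the circular or elliptical variants mentioned above.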


In the internal analysis, we first look for the main factors of variability of the hedonic judgements. These factors are then linked to the characteristics of the products. More specifically, a principal component analysis is carried out on table Y and the X columns are projected as supplementary variables. Starting from this basic methodology, research has pursued different aims: the choice of a representation space, the display of preference areas, and consumer segmentation.

1.1. Choice of representation space

One compromise between internal and external analyses consists in choosing a space that represents at the same time the variability of the product characteristics and that of the hedonic judgements. A multiple factor analysis can thus be carried out on the juxtaposed tables [X, Y] (Pagès, 1996; Pagès & Tenenhaus, 2001), or even a hierarchical multiple factor analysis (Le Dien & Pagès, 2003) if we wish to distinguish groups within the X columns (e.g. sensory characteristics on the one hand and instrumental ones on the other). In this paper we follow the proposal of Huon de Kermadec (2001): in the external analysis, the X principal components are replaced by the X PLS components coming from the PLS regression of Y on X. This approach has been adopted for two reasons. The first is the idea of summarising the product characteristics X by components that explain the product characteristics X and the hedonic judgements Y simultaneously, which is precisely the objective of PLS regression. The second is the objective of dividing each judgement Y_k into two parts: the first one, $\hat{Y}_k$, is explained by the product characteristics and is calculated by PLS regression; the second one is not related to the product characteristics and is the residual of the PLS regression.

1.2. Display of the products, their characteristics X and the hedonic judgements Y

In the external or internal analysis, the products, their characteristics X and the hedonic judgements Y can be displayed on one plot. The judgements and the product characteristics are represented by points whose coordinates are their correlations with the first two principal or PLS components. The products are also represented using the correlations between the product dummy variables (a product dummy variable is equal to one for the specific product and zero for the other products) and the principal or PLS components. This representation is the correlation loading plot proposed by Martens and Martens (2001); it is available in the Unscrambler and SIMCA-P 10.5 software.

Another map is very useful when the number of judges is large. A biplot of the products and their characteristics X is obtained by superimposing the PLS components t1, t2 of the PLS regression of Y on X and the regression coefficient vectors p1, p2 of the regression of X on t1 and t2. For each point (t1, t2) it is possible to estimate the percentage of judges who rate favourably a product whose characteristics are summarised by (t1, t2): it is the percentage of predictions $\hat{Y}_k = c_{1k} t_1 + c_{2k} t_2$ above the average. Hence the idea of bringing together the response areas of all consumers by means of contour lines of judge preferences (Danzart, 1998).

1.3. Consumer segmentation

In many situations, the diversity of hedonic judgements is large. This suggests segmenting the consumers according to their preferences and studying each of the groups obtained separately. This partitioning may be based either on the raw hedonic judgements (Pagès, 1996; Courcoux & Chavanne, 2001; Vigneau, Qannari, Punter, & Knoops, 2001) or on the part of the hedonic judgements that can be explained by the product characteristics (Pagès & Tenenhaus, 2001; Vigneau & Qannari, 2002; Hilgesen, Solheim, & Næs, 1997; Huon de Kermadec, Durand, & Sabatier, 1997), so as to obtain groups of consumers whose preferences are more easily explained by the characteristics of the products.

The methodology presented here pursues these aims: we want to establish homogeneous groups of consumers whose preferences are, as far as possible, explainable by the characteristics of the products. To do so, we look for a representation space that depends only on the product characteristics (more specifically, a space spanned by linear combinations of these characteristics) but that represents both these characteristics and the hedonic judgements; such a space can be obtained by PLS regression. The homogeneous consumer groups are built from their representation in this space. Each group is then studied separately.
Each separate study involves a display of the group's preference areas, by means of a scatter of points and of contour lines, followed by the definition of a model relating preferences to the product characteristics by means of PLS regression, PLS path modelling and finally a hierarchical multi-block PLS model.
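The percentage-of-judges map of Section 1.2 can be sketched as below, assuming the vectorial model of Eq. (1) in Section 3.2, i.e. $\hat{Y}_k = \mathrm{Cor}(Y_k, t_1)\, t_1 / s_{t_1} + \mathrm{Cor}(Y_k, t_2)\, t_2 / s_{t_2}$ for standardised judgements. The function names, the grid bounds and the three judges of the grid example are hypothetical choices of ours; the single-judge check reuses the values reported for judge 2 in Section 4.3.2. Contour lines can then be drawn from the returned grid with any plotting library.

```python
import numpy as np

def predicted_marks(r1, r2, s1, s2, t1, t2):
    """Vectorial model: predicted standardised mark of each judge at the point (t1, t2).

    r1, r2: correlations Cor(Y_k, t1), Cor(Y_k, t2), one entry per judge.
    s1, s2: standard deviations of the PLS components t1 and t2.
    """
    return np.asarray(r1, float) * t1 / s1 + np.asarray(r2, float) * t2 / s2

def percent_favourable(r1, r2, s1, s2, t1_grid, t2_grid, h=0.0):
    """Percentage of judges whose predicted mark exceeds h at each grid node.

    The result has one row per value of t2 and one column per value of t1,
    the layout expected by typical contour-plot routines.
    """
    T1, T2 = np.meshgrid(t1_grid, t2_grid)
    r1 = np.asarray(r1, float)[:, None, None]   # judges along the first axis
    r2 = np.asarray(r2, float)[:, None, None]
    marks = r1 * T1 / s1 + r2 * T2 / s2
    return 100.0 * (marks > h).mean(axis=0)

# Judge 2 of group I at t1 = 1, t2 = 1.5 (values from Section 4.3.2).
y2 = predicted_marks(0.8148, 0.0023, 3.119, 1.563, 1.0, 1.5)
print(round(float(y2), 2))  # 0.26

# Grid for contour lines; the three judges below are hypothetical.
grid = percent_favourable([0.8, 0.6, -0.3], [0.1, -0.4, 0.5], 3.119, 1.563,
                          np.linspace(-4, 4, 41), np.linspace(-3, 3, 31))
print(grid.shape)  # (31, 41)
```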

2. The orange juices example

Six orange juices were selected from the best-known brands in France. Three products can be stored at room temperature (Joker, Pampryl and Tropicana, all at room temperature, r.t.) and three others have to be stored in refrigerated conditions (Fruivita, Pampryl and Tropicana, all refrigerated, refr.). Table 1 provides an extract of the data. The first nine variables correspond to the physico-chemical data, the following seven to the sensory assessments, and the last 96 variables are the marks of appreciation of the products given by students at ENSA, Rennes, on a five-level scale. These data have already been used in Pagès and Tenenhaus (2001) to illustrate the connection between multiple factor analysis and the PLS approach.

Table 1
Extract from the orange juice data file: physico-chemical parameters, mean values of the sensory characteristics, and hedonic mark given by each judge

Variable | Pampryl r.t. | Tropicana r.t. | Fruivita refr. | Joker r.t. | Tropicana refr. | Pampryl refr.
Physico-chemical
Glucose (g/l) | 25.32 | 17.33 | 23.65 | 32.42 | 22.70 | 27.16
Fructose (g/l) | 27.36 | 20.00 | 25.65 | 34.54 | 25.32 | 29.48
Saccharose (g/l) | 36.45 | 44.15 | 52.12 | 22.92 | 45.80 | 38.94
Sweetening power (g/l) | 89.95 | 82.55 | 102.22 | 90.71 | 94.87 | 96.51
pH before processing | 3.59 | 3.89 | 3.85 | 3.60 | 3.82 | 3.68
pH after centrifugation | 3.55 | 3.84 | 3.81 | 3.58 | 3.78 | 3.66
Titre (meq/l) | 13.98 | 11.14 | 11.51 | 15.75 | 11.80 | 12.21
Citric acid (g/l) | 0.84 | 0.67 | 0.69 | 0.95 | 0.71 | 0.74
Vitamin C (mg/100 g) | 43.44 | 32.70 | 37.00 | 36.60 | 39.50 | 27.00
Sensory
Smell intensity | 2.82 | 2.76 | 2.83 | 2.76 | 3.20 | 3.07
Odour typicity | 2.53 | 2.82 | 2.88 | 2.59 | 3.02 | 2.73
Pulp | 1.66 | 1.91 | 4.00 | 1.66 | 3.69 | 3.34
Taste intensity | 3.46 | 3.23 | 3.45 | 3.37 | 3.12 | 3.54
Acidity | 3.15 | 2.55 | 2.42 | 3.05 | 2.33 | 3.31
Bitterness | 2.97 | 2.08 | 1.76 | 2.56 | 1.97 | 2.63
Sweetness | 2.60 | 3.32 | 3.38 | 2.80 | 3.34 | 2.90
Hedonic
Judge 1 | 2.00 | 2.00 | 3.00 | 2.00 | 4.00 | 3.00
Judge 2 | 1.00 | 3.00 | 3.00 | 2.00 | 4.00 | 1.00
Judge 3 | 2.00 | 3.00 | 4.00 | 2.00 | 3.00 | 1.00
... | ... | ... | ... | ... | ... | ...
Judge 96 | 3.00 | 3.00 | 4.00 | 2.00 | 4.00 | 1.00

3. Choice of a representation space and search for homogeneous groups of consumers

3.1. Choice of a representation space

A PLS regression of the standardised hedonic judgements Y on the standardised product characteristics X is carried out. In practice, the choice of the number m of PLS components is quite pragmatic. In SIMCA-P the number of components is chosen automatically using various rules based on a statistic called Q². This statistic measures the importance of a new PLS component for predicting the whole set of Y or each individual Y_k, and it is calculated by cross-validation. A new component is considered significant if Q² is larger than some threshold. SIMCA-P also proposes a statistic called Q²cum, which is a kind of cross-validated R². A new component is introduced if Q²cum increases significantly, but there is no rule on what counts as a significant increase of this statistic.
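For readers who want to reproduce such a fit outside SIMCA-P, the sketch below implements the classical NIPALS algorithm for PLS2 regression in NumPy, together with a leave-one-out cross-validated R². It is our own illustration, not SIMCA-P's code: `q2_loo` returns 1 - PRESS/SS for a model with a fixed number of components, which is a simplification of the component-by-component Q² rules described above, and the data are random stand-ins.

```python
import numpy as np

def standardise(a):
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)

def pls2(X, Y, n_comp=2, tol=1e-12, max_iter=1000):
    """NIPALS PLS2 regression of centred Y (n x q) on centred X (n x p).

    Returns X scores T, X weights W, X loadings P, Y weights C and the
    coefficient matrix B, with X @ B equal to the fitted values T @ C.T.
    """
    X, Y = np.array(X, dtype=float), np.array(Y, dtype=float)  # working copies
    T, W, P, C = [], [], [], []
    for _ in range(n_comp):
        u = Y[:, np.argmax(Y.var(axis=0))].copy()   # start: most variable Y column
        for _ in range(max_iter):
            w = X.T @ u
            w /= np.linalg.norm(w)                  # X weights, unit norm
            t = X @ w                               # X score
            c = Y.T @ t / (t @ t)                   # Y weights
            u_new = Y @ c / (c @ c)                 # Y score
            converged = np.linalg.norm(u_new - u) <= tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p = X.T @ t / (t @ t)                       # X loadings
        X -= np.outer(t, p)                         # deflate X on t
        Y -= np.outer(t, c)                         # deflate Y on t
        T.append(t)
        W.append(w)
        P.append(p)
        C.append(c)
    T, W, P, C = (np.column_stack(m) for m in (T, W, P, C))
    B = W @ np.linalg.solve(P.T @ W, C.T)           # B = W (P'W)^-1 C'
    return T, W, P, C, B

def q2_loo(X, Y, n_comp=2):
    """Leave-one-out cross-validated R^2 (1 - PRESS/SS) of an n_comp-component model."""
    n = X.shape[0]
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        xm, ym = X[keep].mean(axis=0), Y[keep].mean(axis=0)
        *_, B = pls2(X[keep] - xm, Y[keep] - ym, n_comp)
        press += np.sum((Y[i] - ym - (X[i] - xm) @ B) ** 2)
    return 1.0 - press / np.sum((Y - Y.mean(axis=0)) ** 2)

# Random stand-in data: 12 "products", 6 characteristics, 4 "judges".
rng = np.random.default_rng(1)
Xs = standardise(rng.normal(size=(12, 6)))
Ys = standardise(Xs[:, :2] @ rng.normal(size=(2, 4)) + rng.normal(size=(12, 4)))
T, W, P, C, B = pls2(Xs, Ys, n_comp=2)
print(np.allclose(Xs @ B, T @ C.T))  # True
print(q2_loo(Xs, Ys, n_comp=2))
```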

Since release 9, SIMCA-P gives confidence intervals for the PLS regression coefficients b_kj, computed by jack-knife. These confidence intervals are very sensitive to the number of components: we have noticed in many applications that models with a large number of components may lead to very wide confidence intervals, so that PLS regression coefficients b_kj that were significant in models with fewer components may become non-significant. In sensory analysis, two components are generally sufficient owing to the low number of products. Usually the first component is globally significant and describes the tastes of the majority of judges, while the other components are more local and describe the tastes of smaller groups of judges. The first component is also probably more stable than the following ones on a new sample. Finally, the choice of two components allows convenient graphical displays to be made.

A single scatter plot summarising all the information provided by the method can be built. Variables X and Y are visualised on a graphical display, also called the correlation circle, using their correlations with the PLS components t1 and t2. It may be read in a similar way to a principal component analysis loading plot: in particular, the cosines of the angles formed by the vector-variables in the plane give an indication of the correlations between the original variables. This representation can be completed by a representation of the product dummy variables described above, using the same principle. This graphical display is now a common feature in external preference analysis and in PLS regression. It can be produced with the Unscrambler software, release 8 (2003), or the SIMCA-P software, release 10.5 (2004). The resulting map is very close to the usual loading plot of PLS regression (the w*c mapping of SIMCA-P), up to a normalisation, but has the great advantage of allowing supplementary variables to be represented.

In the orange juices example, the PLS regression of the 96 judges Y on the 16 physico-chemical and sensory variables X, using the first two components, leads to the Products-Characteristics-Judgements map shown in Fig. 1. This is the correlation circle between the Products, Characteristics and Judgements variables and the first two PLS components. By the usual criteria, the quality of this regression is low: the global R² between Y and (t1, t2) is equal to 0.53 (compared with 0.55 for the proportion of the variance of Y explained by the first two principal components of Y), but the R² resulting from cross-validation (Q²cum in SIMCA-P) only amounts to 0.06. This low value is due to the extreme diversity of the stated opinions.

Fig. 1. Correlation circle of the products, characteristics and judges with (t1, t2), and construction of the judge clustering using the two bisectors (the four groups are indicated by Roman numerals; horizontal axis: Axis 1, vertical axis: Axis 2).

3.2. Clustering of judges

Because of the very low number of products, a vectorial model relating the preferences to the PLS components t1 and t2 is preferable. Another argument in favour of the vectorial model is that PLS regression is a good compromise between regression on the principal components of X and multiple regression of Y on X: PLS regression is in some way optimised to give simultaneously good X components and good regressions of each Y_k on the X components. More generally, the methodology described in this paper is adequate when a vectorial model is used, and should be adapted when an ideal-point model is preferable. The correlations of judge k with the first two PLS components t1, t2 enable his preferences to be estimated by writing the usual PLS prediction formula in terms of the first two standardised PLS components $t_1^{*}$ and $t_2^{*}$:

$$\hat{Y}_k = c_{1k} t_1 + c_{2k} t_2 = \mathrm{Cor}(Y_k, t_1)\, t_1^{*} + \mathrm{Cor}(Y_k, t_2)\, t_2^{*} \qquad (1)$$

where $t_h^{*} = t_h / s_{t_h}$ and $s_{t_h}$ is the standard deviation of $t_h$.
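Eq. (1) rests on a simple fact: when t1 and t2 are centred and mutually orthogonal, the least-squares coefficients of a standardised judgement on the standardised components are exactly its correlations with them, which are also the judge's coordinates on the correlation circle. The sketch below checks this numerically and computes correlation-circle coordinates, including those of the product dummy variables of Section 1.2. The orthogonal components are simulated rather than taken from the orange juice data, and the helper names are ours.

```python
import numpy as np

def standardise(a):
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)

def circle_coordinates(V, T):
    """Correlations of every column of V with every column of T (one row per variable)."""
    return standardise(V).T @ standardise(T) / (V.shape[0] - 1)

rng = np.random.default_rng(2)
n, q = 6, 5
A = rng.normal(size=(n, 2))
A -= A.mean(axis=0)
Q, _ = np.linalg.qr(A)                 # centred, orthonormal columns
T = Q * np.array([3.0, 1.5])           # two orthogonal stand-in "PLS components"
Y = rng.normal(size=(n, q))            # stand-in judgements

# Coordinates of the judges and of the product dummy variables on the circle;
# column i of np.eye(n) is the dummy variable of product i.
judge_xy = circle_coordinates(Y, T)              # q x 2
product_xy = circle_coordinates(np.eye(n), T)    # n x 2

# Eq. (1): OLS of standardised Y_k on standardised (t1, t2) returns the correlations.
coef, *_ = np.linalg.lstsq(standardise(T), standardise(Y), rcond=None)
print(np.allclose(coef.T, judge_xy))  # True
```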

The correlations of Y_k with t1, t2 therefore reflect the portion of the preferences of judge k that can be explained by the characteristics X of the products. Fig. 1 shows that the judges have very different opinions on the products, and that the preferences of some judges cannot be related to the product characteristics, since they are located in the centre of the graphical display. It does not seem useful to take these judges into account in the clustering: we are trying to identify groups that can be interpreted clearly in terms of the characteristics of the products, not to give an exhaustive description of the population. We therefore decided, somewhat arbitrarily, to eliminate from the analysis the judges k with $R^2(Y_k; t_1, t_2) \le 0.20$, indicated by a dotted circle in Fig. 1. Since the PLS components are orthogonal (there are no missing data), this condition can be written in terms of the coordinates of judge k in the plane: $R^2(Y_k; t_1, t_2) = \mathrm{Cor}(Y_k, t_1)^2 + \mathrm{Cor}(Y_k, t_2)^2 \le 0.20$. Similarly, vitamin C plays a rather unimportant role in explaining the preferences, as shown in Fig. 1. This is not surprising, because vitamin C is known to play no role in the sensory aspects of orange juices; this variable was therefore removed from the analysis. An analysis without these poorly represented or scarcely explanatory elements led to a very similar map (not shown). Consequently, for the sake of simplicity, this first map is used.

In order to build a clustering of the judges based on the preferences explainable by the product characteristics, we simply use their coordinates (Cor(Y_k, t1), Cor(Y_k, t2)) in the plane. The plane is divided into four areas bordered by the two bisectors, which yields the four groups of judges shown in Fig. 1. The underlying idea is that, in this context of sensory analysis, a modified Y PLS component $\tilde{u}_h = \sum_{k=1}^{q} \mathrm{Cor}(Y_k, t_h)\, Y_k$ is an interpretable weighted sum of judges only when all the weights Cor(Y_k, t_h) are positive. When this is the case, the modified PLS component ũ_h summarises a group of homogeneous judges, and the weight of each judge is proportional to its correlation with the X PLS component t_h. The four groups of judges shown in Fig. 1 should be homogeneous, so it should make sense to summarise each group through a weighted sum of judges, considered as a typical judge for the group. In this approach we suppose that the PLS components are much more stable and meaningful than the individual judges. Each PLS component can be considered as a sensory dimension, and in order to analyse a PLS component more precisely it is useful to analyse each side of it separately. Therefore we have rerun a PLS regression for each group of judges, these being more related to the side considered than to the other PLS component sides. We could have used a clustering algorithm to classify the judges Y_k described by their correlations with (t1, t2). But our very simple approach fits our objective better, since that objective consists more in describing the PLS components than in classifying judges about whom we in fact have no information.
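The clustering rule above (discard the judges with R²(Y_k; t1, t2) ≤ 0.20, then split the plane along its two bisectors) and the modified component ũ_h can be sketched as follows. The sector names and the correlations in the example are hypothetical. Judging from Table 2, group I corresponds to the right-hand sector (large positive Cor(Y_k, t1)); we make no assumption about how the other Roman numerals map onto sectors. Judges lying exactly on a bisector are assigned to the horizontal sectors, an arbitrary tie-breaking choice.

```python
import numpy as np

def bisector_sectors(r1, r2, r2_min=0.20):
    """Assign each judge to a sector of the (Cor(Y_k, t1), Cor(Y_k, t2)) plane.

    Judges with r1**2 + r2**2 <= r2_min (i.e. R^2(Y_k; t1, t2) <= r2_min for
    orthogonal components) are labelled 'dropped'. Ties on a bisector go to
    'right' or 'left' (arbitrary choice).
    """
    r1, r2 = np.asarray(r1, float), np.asarray(r2, float)
    labels = np.full(r1.shape, "dropped", dtype=object)
    keep = r1**2 + r2**2 > r2_min
    labels[keep & (r1 >= np.abs(r2))] = "right"
    labels[keep & (-r1 >= np.abs(r2))] = "left"
    labels[keep & (r2 > np.abs(r1))] = "top"
    labels[keep & (-r2 > np.abs(r1))] = "bottom"
    return labels

def modified_component(Ys, r_h):
    """u~_h = sum_k Cor(Y_k, t_h) * Y_k over the judges (columns) of one group."""
    return np.asarray(Ys, float) @ np.asarray(r_h, float)

# Hypothetical correlations for five judges.
r1 = [0.80, -0.10, 0.10, 0.00, 0.30]
r2 = [0.10, 0.90, -0.70, -0.20, 0.20]
print(list(bisector_sectors(r1, r2)))
# ['right', 'top', 'bottom', 'dropped', 'dropped']
```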


4. Interpretation of the sensory dimensions by separate studies of each group of consumers

Each dimension can be interpreted more precisely by analysing each group of consumers separately. By way of example, the results of the analysis of the first group are presented below.

4.1. Representation space of group I

Fig. 2 shows the variable map of the PLS regression of the Y's of the 37 judges in group I alone (block denoted by Y_I) on the characteristics X of the products, excluding vitamin C. The quality of this two-component PLS regression is noticeably higher than that of the previous one: R² = 0.633 and Q²cum = 0.266. This group prefers Fruivita and the Tropicanas; it rejects Joker and the Pampryls. Comparing Figs. 1 and 2 shows that the map in Fig. 2 is virtually an extract of Fig. 1 in which only the judges of group I are represented. This confirms the stability and robustness of PLS regression for this type of application.

Fig. 2. Analysis of group I, correlation circle of the products, characteristics and judges with (t1, t2) (horizontal axis: Axis 1, vertical axis: Axis 2).

4.2. Construction of the average judge representing group I and modelling of his relationship with the product characteristics

The usual PLS component u1 is proportional to $\tilde{u}_1 = \sum_{k=1}^{q} \mathrm{Cor}(Y_k, t_1)\, Y_k$. In fact, u1 represents more or less an average of the standardised assessments of the judges, since the correlations Cor(Y_k, t1) obtained are very close and positive (Fig. 2 and Table 2). The variable u1 (Table 3) can therefore be deemed to represent the overall mark given to the products by the group I judges and, on that account, constitutes an average judge representing group I. This group prefers refrigerated Fruivita and Tropicana, followed by Tropicana at room temperature, and rejects the other products. A new PLS regression of the PLS component u1 on the characteristics of the products X (R² = 0.938 and Q²cum = 0.895 for a single component) indicates the significant characteristics, as shown in Fig. 3. This group prefers the product characteristics with large positive coefficients (products perceived as having a high sugar content or a high pH) and rejects the product characteristics with large negative coefficients (products perceived as strongly acidic and bitter). This property can be verified in the extract of the product characteristics shown in Table 3.

Table 2
SIMCA-P software output: correlations between Y_I and the first two PLS components of the PLS regression of Y_I on X

Judge | Cor(Y_k, t1) | Cor(Y_k, t2)
Judge 2 | 0.815 | 0.002
Judge 3 | 0.738 | -0.011
Judge 4 | 0.622 | -0.342
... | ... | ...
Judge 94 | 0.433 | -0.279
Judge 96 | 0.649 | -0.028

Table 3
SIMCA-P software output: PLS component u1 and main significant characteristics of the orange juices taken into consideration by the judges in group I

Product | u1 | pH before processing | pH after centrifugation | Sweetness | Acidity | Bitterness
Fruivita refr. | 3.61 | 3.85 | 3.81 | 3.4 | 2.42 | 1.76
Tropicana refr. | 3.00 | 3.82 | 3.78 | 3.3 | 2.33 | 1.97
Tropicana r.t. | 2.06 | 3.89 | 3.84 | 3.3 | 2.55 | 2.08
Pampryl refr. | -2.33 | 3.68 | 3.66 | 2.9 | 3.31 | 2.63
Joker r.t. | -2.94 | 3.60 | 3.58 | 2.8 | 3.05 | 2.56
Pampryl r.t. | -3.40 | 3.59 | 3.55 | 2.6 | 3.15 | 2.97

Fig. 3. SIMCA-P software output, 95% jack-knife confidence intervals of the PLS regression coefficients in the PLS regression of component u1 on X (one significant PLS component; vertical axis: CoeffCS[1](u1), horizontal axis: product characteristics).

4.3. Map of preference areas for each group of consumers

4.3.1. Principle

For each point (t1, t2) of the product map shown in Fig. 4, the PLS regression equation (1) enables us to estimate the standardised evaluation given by judge k to a theoretical product whose characteristics are summarised by (t1, t2). Since the variables Y_k have been standardised, judge k is considered to rate this product favourably if $\hat{Y}_k > h$ and unfavourably if $\hat{Y}_k < h$, where h is equal, for example, to 0 or 1. In practice, we calculate the percentage of judges rating the product favourably on a grid of points (t1, t2), and then build the iso-preference contour lines. This map also shows the products and characteristics using their coordinates (p1, p2), up to a normalisation that keeps the map legible. This biplot representation is justified because p1 and p2 are the regression coefficient vectors of X on t1 and t2.

4.3.2. Application to group I of consumers

Let us give an example of the calculation of the percentage of judges rating a product favourably. Here we choose h = 0, as we are interested in judges marking above the average. Let us select a product with characteristics X leading to t1 = 1 and t2 = 1.5. For each judge k, an estimated standardised evaluation of this product is calculated using the usual PLS prediction formula

$$\hat{Y}_k = c_{1k} t_1 + c_{2k} t_2 = \frac{\mathrm{Cor}(Y_k, t_1)}{s_{t_1}}\, t_1 + \frac{\mathrm{Cor}(Y_k, t_2)}{s_{t_2}}\, t_2,$$

where $s_{t_1}$ and $s_{t_2}$ are the standard deviations of t1 and t2. As these standard deviations are respectively equal to 3.119 and 1.563 in this example, the evaluation of this product by judge 2 is estimated as

$$\hat{Y}_2 = \frac{0.8148}{3.119} \times 1 + \frac{0.0023}{1.563} \times 1.5 = 0.26.$$

Extending this calculation to the 37 judges of group I, the proportion of judges marking this product favourably ($\hat{Y}_k > 0$) is found to be 0.81.

The contour lines for the orange juices example are shown in Fig. 4. They show that the estimated percentage of judges in group I rating products with characteristics close to those of refrigerated Fruivita and Tropicana above average is equal to 100%; this percentage is around 70% for products with characteristics close to those of Tropicana at room temperature. It can also be seen that these judges particularly appreciate products with a high sugar content and a high pH. They reject bitter or acid products and those with high fructose and glucose levels. More specifically, this map suggests theoretical products likely to be appreciated by this category of consumers: those situated in the 100% area. As a vectorial model has been used, one should take care to look at the part of this area not too far from the origin, since the quality of the forecast obtained by regression model (1) is weak for points far from the calibration data.

Fig. 4. Map showing the percentages of the judges in group I rating a product (t1, t2) above average, with the products and their characteristics X.

4.4. PLS path modelling of the relationships between preferences, physico-chemical and sensory characteristics of the products for each group of consumers

PLS regression links block Y (hedonic data) to block X (physico-chemical and sensory data). One may, however, wish to take into account the fact that there are actually two blocks of variables explaining the hedonic data Y: block X1 (physico-chemical data) and block X2 (sensory data).

4.4.1. Principle

Let us assume that the sensory variables depend on the physico-chemical variables and that the hedonic variables in turn depend on the physico-chemical and sensory variables. We can then build the arrow diagram shown in Fig. 5. The PLS approach proposed by Herman Wold (Wold, 1985; Lohmöller, 1989; Chin, 1998; Chin & Newsted, 1999; Tenenhaus, 1999; Pagès & Tenenhaus, 2001; Tenenhaus, Esposito Vinzi, Chatelin, & Lauro, 2004) is appropriate for studying this type of model. Such a model can also be studied with a maximum likelihood approach when the number of cases is large (more than 200 cases); the use of LISREL (Jöreskog, 1970; Jöreskog & Sörbom, 1989) is then recommended. In our example we have only six products, so we can only use the PLS approach to study the causal model shown in Fig. 5.

In the arrow diagram shown in Fig. 5, we assume that each block of manifest (observable) variables is summarised by a latent (non-observable) variable: the physico-chemical block X1 is summarised by ξ1, the sensory block X2 by ξ2 and the hedonic block Y by ξ3. These three latent variables are assumed to be standardised and are related to each other by the following two structural equations:

$$\xi_2 = \beta_{21}\, \xi_1 + \varepsilon_2 \qquad (2)$$
$$\xi_3 = \beta_{31}\, \xi_1 + \beta_{32}\, \xi_2 + \varepsilon_3 \qquad (3)$$

where the residuals ε_h satisfy the usual hypotheses of multiple regression. The relationship between the manifest (observable) variables and the latent (non-observable) variables may be formative, i.e. the function of the latent variable is to summarise the manifest variables of the block, or reflective, i.e. each manifest variable is considered as a reflection of a latent variable existing a priori, a theoretical concept measured through the observed variables. The formative mode does not require the blocks to be unidimensional, whereas this requirement is compulsory for the reflective mode. Here we are rather in a formative mode for the physico-chemical and sensory blocks, and in a reflective mode for the hedonic block, which is unidimensional by construction. The two modes are indicated by the direction of the arrows in Fig. 5. We thus get the following equations describing the links between manifest and latent variables:

$$\xi_1 = X_1 \varpi_1 + \delta_1 \quad \text{(formative mode)} \qquad (4)$$
$$\xi_2 = X_2 \varpi_2 + \delta_2 \quad \text{(formative mode)} \qquad (5)$$
$$Y_k = \pi_k\, \xi_3 + \varepsilon_k \quad \text{(reflective mode)} \qquad (6)$$

where δ1, δ2 and ε_k are residuals satisfying the usual hypotheses. Let us describe the PLS algorithm for this specific application. The external estimates $\hat{\xi}_h$ of the latent variables ξ_h are defined by

$$\hat{\xi}_1 = X_1 w_1 \qquad (7)$$
$$\hat{\xi}_2 = X_2 w_2 \qquad (8)$$
$$\hat{\xi}_3 = Y w_3 \qquad (9)$$

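where the w_hj are called the weights associated with the manifest variables. The causal model is used to define the internal estimates Z_h of the latent variables ξ_h. To calculate these internal estimates, we use the centroid scheme recommended by Herman Wold (1985): each internal estimate Z_h is computed as a signed sum of the external estimates $\hat{\xi}$ of the latent variables connected with ξ_h. These estimates are defined here by

$$Z_1 = \operatorname{sign}\big(\mathrm{Cor}(\hat{\xi}_1, \hat{\xi}_2)\big)\, \hat{\xi}_2 + \operatorname{sign}\big(\mathrm{Cor}(\hat{\xi}_1, \hat{\xi}_3)\big)\, \hat{\xi}_3 \qquad (10)$$
$$Z_2 = \operatorname{sign}\big(\mathrm{Cor}(\hat{\xi}_2, \hat{\xi}_1)\big)\, \hat{\xi}_1 + \operatorname{sign}\big(\mathrm{Cor}(\hat{\xi}_2, \hat{\xi}_3)\big)\, \hat{\xi}_3 \qquad (11)$$
$$Z_3 = \operatorname{sign}\big(\mathrm{Cor}(\hat{\xi}_3, \hat{\xi}_1)\big)\, \hat{\xi}_1 + \operatorname{sign}\big(\mathrm{Cor}(\hat{\xi}_3, \hat{\xi}_2)\big)\, \hat{\xi}_2 \qquad (12)$$

Fig. 5. Theoretical model of relationships between the hedonic, physico-chemical and sensory data (manifest variables: glucose, fructose, saccharose, sweetening power, pH before processing, pH after centrifugation, titre, citric acid and vitamin C for the physico-chemical block; smell intensity, odour typicity, pulp, taste intensity, acidity, bitterness and sweetness for the sensory block; Judge 2, Judge 3, ..., Judge 96 for the hedonic block).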

Table 4
PLS-Graph software output: estimates of the latent variables

Product | Physico-chemical | Sensory | Hedonic
Pampryl r.t. | -0.792 | -1.381 | -1.159
Tropicana r.t. | 1.154 | 0.465 | 0.673
Fruivita refr. | 0.882 | 0.958 | 1.248
Joker r.t. | -1.689 | -0.850 | -0.959
Tropicana refr. | 0.616 | 1.382 | 1.023
Pampryl refr. | -0.172 | -0.577 | -0.825

With regard to the PLS algorithm, it is recommended that the method of calculating the external estimates of the latent variables be selected according to the type of relationship between the manifest variables and their latent variables: Mode A for the reflective type and Mode B for the formative type (Wold, 1985). When Mode A is used, the weight w_hj associated with a manifest variable is calculated by simple regression of the manifest variable on the internal estimate of the latent variable. When Mode B is used, the vector of weights w_h associated with the manifest variables of a block is calculated by multiple regression of the internal estimate of the block's latent variable on these manifest variables. The PLS algorithm is iterative. It starts with an arbitrary choice of the weights w_h in Eqs. (7) to (9) for computing the external estimates of the latent variables. The internal estimates are then computed using Eqs. (10) to (12), and new weights w_h are obtained with Mode A or B. This process is repeated until the weights w_h converge. Finally, the structural equations (2) and (3) are estimated by multiple regressions on the external estimates of the latent variables. The model parameters are estimated with Wynne Chin's PLS-Graph software (Chin, 2001), and all variables are standardised. The low number of products obliged us to use Mode A to calculate the external estimates of the latent variables, although the relationship between the manifest and latent variables is formative for the physico-chemical and sensory blocks: Mode B would require a multiple regression on eight physico-chemical or seven sensory variables with only six observations.
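The iterative procedure described above (centroid scheme, Mode A for every block, standardised variables) can be sketched in NumPy as follows. This is our own minimal re-implementation for illustration, not PLS-Graph: the function names, the unit starting weights, the convergence tolerance and the simulated three-block data (n = 50 cases) are assumptions, and PLS-Graph's bootstrap validation is not reproduced.

```python
import numpy as np

def standardise(a):
    """Centre and scale to unit variance (works for vectors and matrices)."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)

def pls_path_mode_a(blocks, links, tol=1e-10, max_iter=500):
    """Illustrative PLS path algorithm: centroid scheme, Mode A for every block.

    blocks: list of n x p_h arrays of manifest variables, one per latent variable.
    links:  symmetric H x H 0/1 matrix; links[h][l] = 1 if xi_h and xi_l are connected.
    Returns the outer weights, the standardised latent-variable scores and a
    flag telling whether the weights converged.
    """
    blocks = [standardise(b) for b in blocks]
    links = np.asarray(links, dtype=float)
    w = [np.ones(b.shape[1]) for b in blocks]             # arbitrary starting weights
    converged = False
    for _ in range(max_iter):
        # External estimates (Eqs. 7-9), each rescaled to unit variance.
        xi = np.column_stack([standardise(b @ wh) for b, wh in zip(blocks, w)])
        # Internal estimates, centroid scheme (Eqs. 10-12):
        # Z_h = sum over connected l of sign(Cor(xi_h, xi_l)) * xi_l.
        signs = np.sign(np.corrcoef(xi, rowvar=False)) * links
        Z = xi @ signs
        # Mode A: weight = slope of each manifest variable regressed on Z_h.
        w_new = [b.T @ Z[:, h] / (Z[:, h] @ Z[:, h]) for h, b in enumerate(blocks)]
        # Normalise so that each external estimate has unit variance.
        w_new = [wh / np.std(b @ wh, ddof=1) for b, wh in zip(blocks, w_new)]
        delta = max(np.max(np.abs(new - old)) for new, old in zip(w_new, w))
        w = w_new
        if delta < tol:
            converged = True
            break
    xi = np.column_stack([standardise(b @ wh) for b, wh in zip(blocks, w)])
    return w, xi, converged

def path_coefficients(xi, target, predictors):
    """Structural equation: OLS of score `target` on the scores in `predictors`."""
    beta, *_ = np.linalg.lstsq(xi[:, predictors], xi[:, target], rcond=None)
    return beta

def noisy_block(factor, p, rng):
    """Simulation helper: p manifest variables = factor + independent noise."""
    return factor[:, None] + 0.5 * rng.normal(size=(factor.size, p))

# Simulated stand-in for the three blocks of Fig. 5 (not the orange juice data).
rng = np.random.default_rng(3)
n = 50
f1 = rng.normal(size=n)
f2 = 0.8 * f1 + 0.6 * rng.normal(size=n)
f3 = 0.3 * f1 + 0.6 * f2 + 0.4 * rng.normal(size=n)
blocks = [noisy_block(f1, 4, rng), noisy_block(f2, 3, rng), noisy_block(f3, 5, rng)]
links = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]                 # xi1-xi2, xi1-xi3, xi2-xi3

w, xi, ok = pls_path_mode_a(blocks, links)
b21 = path_coefficients(xi, 1, [0])                        # Eq. (2)
b31, b32 = path_coefficients(xi, 2, [0, 1])                # Eq. (3)
print(ok, b21, b31, b32)
```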

Table 5
PLS-Graph software output, correlation coefficients between the latent variables

                   Physico-chemical   Sensory   Hedonic
Physico-chemical        1.000
Sensory                 0.810          1.000
Hedonic                 0.864          0.968     1.000

4.4.2. Application to group I of consumers
Fig. 6 shows the regression coefficients between the latent variables of the model shown in Fig. 5 and the correlation coefficients between the manifest and latent variables. Table 4 shows the estimates of the latent variables (which we also call scores) and Table 5 their correlation coefficients. The correlation coefficients between the manifest and latent variables are validated by bootstrap on 100 samples; the non-significant correlations are shown in italic in Fig. 6. We may conclude from Fig. 6 that the regression equation relating the hedonic score to the physico-chemical and sensory scores is

$$\text{hedonic score} = 0.233 \times \text{physico-chemical score} + 0.780 \times \text{sensory score} + \text{residual} \qquad (13)$$
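Because all scores are standardised, the coefficients of equation (13) depend only on the three correlations of Table 5, through the normal equations of a two-predictor regression. The sketch below (the function name is illustrative) recovers them up to the rounding of the published three-decimal correlations (0.232 instead of 0.233), together with the variance inflation factor $1/(1-r^2)$ of the two predictors, which quantifies the collinearity between the physico-chemical and sensory scores discussed next.

```python
# Correlations between the latent variable scores of group I (Table 5)
r_phys_sens = 0.810
r_phys_hed = 0.864
r_sens_hed = 0.968

def standardized_ols_two_predictors(r12, r1y, r2y):
    """OLS of a standardized response on two standardized predictors.

    Solves the normal equations [[1, r12], [r12, 1]] b = [r1y, r2y].
    Returns (b1, b2, R^2, VIF), where VIF = 1 / (1 - r12**2).
    """
    det = 1.0 - r12 ** 2
    b1 = (r1y - r12 * r2y) / det
    b2 = (r2y - r12 * r1y) / det
    r_squared = b1 * r1y + b2 * r2y
    return b1, b2, r_squared, 1.0 / det

b_phys, b_sens, r2, vif = standardized_ols_two_predictors(r_phys_sens, r_phys_hed, r_sens_hed)
print(f"hedonic = {b_phys:.3f} * physico-chemical + {b_sens:.3f} * sensory, R2 = {r2:.3f}, VIF = {vif:.2f}")
# hedonic = 0.232 * physico-chemical + 0.780 * sensory, R2 = 0.956, VIF = 2.91
```

With only six products, a variance inflation of almost 3 comes on top of very few residual degrees of freedom.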


The physico-chemical score is not significant (t = 0.36), while the sensory score is (t = 1.62). There is also a significant connection between the physico-chemical and the sensory scores (t = 2.45). However, the hedonic score is strongly correlated with the physico-chemical score (r = 0.864). The non-significant influence of the physico-chemical score on the hedonic score in the multiple regression equation is therefore due to multicollinearity: the two explanatory scores are themselves strongly correlated (r = 0.810, Table 5). This suggests carrying out a PLS regression of the hedonic score on the physico-chemical and sensory scores, which leads to the following equation:

$$\text{hedonic score} = 0.484 \times \text{physico-chemical score} + 0.533 \times \text{sensory score} + \text{residual} \qquad (14)$$


with R² = 0.936, compared with R² = 0.956 for model (13). The two PLS regression coefficients, validated by jack-knife in SIMCA-P, are now significant (Fig. 7). The estimates of the hedonic score in Table 4 rank the products by order of preference:

Fruivita refr. > Tropicana refr. > Tropicana r.t. > Pampryl refr. > Joker r.t. > Pampryl r.t.


Fig. 6. Estimate of the internal model (regression coefficients and Student's t) and the external model (correlation coefficients). Non-significant correlations are in italic.


Fig. 7. SIMCA-P software output, validation of the PLS regression of the hedonic score on the physico-chemical and sensory scores.

Fig. 8. Estimate of the hierarchical multi-block PLS model (w and c loadings). Non-significant loadings are in italic.
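The three steps of the hierarchical multi-block PLS model of Fig. 8, described in Section 4.5, can be sketched as follows. This is a minimal illustration under stated assumptions, not SIMCA-P's algorithm: exactly one component is kept at each step instead of selecting components with Q², the first PLS2 component is computed from the singular value decomposition of $X'Y$ (the fixed point that NIPALS converges to), the block scores are standardised before the third step, and the data are random placeholders. The function names are illustrative. The sketch checks numerically the property quoted in Section 4.5: the regression coefficient of $u_1$ on $t_1$ equals 1.

```python
import numpy as np

def standardize(a):
    """Centre and scale each column to unit variance."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0)

def first_pls_component(X, Y):
    """First PLS2 component of standardized X and Y (NIPALS fixed point).

    w is the dominant left singular vector of X'Y, i.e. the weight vector NIPALS
    converges to; then t = Xw, c = Y't / t't and u = Yc / c'c.
    """
    U, _, _ = np.linalg.svd(X.T @ Y, full_matrices=False)
    w = U[:, 0]
    t = X @ w
    c = Y.T @ t / (t @ t)
    u = Y @ c / (c @ c)
    return t, u, w, c

def hierarchical_multiblock_pls(X1, X2, Y):
    """Three-step hierarchical model, keeping one component per step (assumption)."""
    X1, X2, Y = standardize(X1), standardize(X2), standardize(Y)
    t11, _, _, _ = first_pls_component(X1, Y)        # (1) Y on block X1
    t21, _, _, _ = first_pls_component(X2, Y)        # (2) Y on block X2
    T = standardize(np.column_stack([t11, t21]))     # block scores as super-level variables
    t1, u1, w, c = first_pls_component(T, Y)         # (3) Y on the block scores
    return t11, t21, t1, u1, w, c

# Hypothetical data: 6 products, 9 physico-chemical and 7 sensory variables, 10 judges
rng = np.random.default_rng(1)
f = rng.normal(size=6)
X1 = f[:, None] + 0.5 * rng.normal(size=(6, 9))
X2 = f[:, None] + 0.5 * rng.normal(size=(6, 7))
Y = f[:, None] + 0.5 * rng.normal(size=(6, 10))

t11, t21, t1, u1, w, c = hierarchical_multiblock_pls(X1, X2, Y)
slope = (u1 @ t1) / (t1 @ t1)   # regression coefficient of u1 on t1
print(f"regression coefficient of u1 on t1: {slope:.6f}")
# regression coefficient of u1 on t1: 1.000000
```

The slope is exactly 1 because $u_1' t_1 = c'Y't_1/(c'c)$ and $Y't_1 = c\,(t_1't_1)$, so $u_1't_1 = t_1't_1$ whatever the data.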

This is the same ranking as in Table 3, which uses the PLS component $u_1$ as the average judge of group I. The physico-chemical score is correlated negatively with fructose, glucose, titre and citric acid, and positively with saccharose, pH before processing and pH after centrifugation. The sensory score is correlated positively with odour typicity, pulp and sweetness, and negatively with taste intensity, acidity and bitterness. Table 5 shows that the hedonic score of the group I judges is correlated positively with the physico-chemical score (r = 0.864) and very strongly with the sensory score (r = 0.968). Consequently, the group I judges like products with a typical odour, pulp and sweetness, and reject products with a strong taste intensity and an acidic and bitter character. This result is confirmed by Table 6.

4.5. Hierarchical multi-block PLS modelling of the relationships between preferences, physico-chemical and sensory characteristics of the products for each group of consumers
Hierarchical multi-block PLS was proposed by Wold, Kettaneh, and Tjessem (1996) for easier model interpretation and as an alternative to variable selection. On the orange juice data, for group I of consumers, it consists of three steps: (1) a PLS regression of $Y_I$ on $X_1$, keeping the significant PLS components $t_{1h}$; (2) a PLS regression of $Y_I$ on $X_2$, keeping the significant PLS components $t_{2h}$; then (3) a PLS regression of $Y_I$ on the significant PLS components $t_{1h}$ and $t_{2h}$, leading to the components $t_h$ and $u_h$. A PLS component is considered significant if Wold's criterion $Q^2$ is larger than the usual limit 0.05, and the search for new components stops as soon as a non-significant component appears. On this example each of the three steps leads to one component. The results are summarised in Fig. 8. The loadings w and c are shown on the arrows relating the variables to the PLS components. The regression coefficient between $u_1$ and $t_1$ (equal to 1, a property of the PLS regression algorithm) is shown on the arrow relating $u_1$ to $t_1$. The non-significant X loadings are shown in italic. The values of the PLS components $t_{11}$, $t_{21}$ and $u_1$ are given in Table 7; their correlations with the corresponding latent variables obtained by PLS path modelling are all above 0.999. Finally, comparing Figs. 6 and 8, it appears that PLS path modelling followed by PLS regression
and the hierarchical multi-block PLS model give similar results, leading to the same interpretation on this example.

Table 7
SIMCA-P software output, hierarchical multi-block model: estimate of the latent variables

Product            Physico-chemical   Sensory   Hedonic
Pampryl r.t.             -1.76          -2.71     -1.46
Tropicana r.t.            2.59           0.95      0.87
Fruivita refr.            2.00           1.87      1.55
Joker r.t.               -3.80          -1.65     -1.24
Tropicana refr.           1.41           2.74      1.30
Pampryl refr.            -0.44          -1.19     -1.01

Table 6
Sensory characteristics of the products classified according to the preferences of group I

Product           Sweetness   Odour typicity   Pulp   Taste intensity   Acidity   Bitterness
Fruivita refr.       3.4          2.88         4.00        3.45          2.42        1.76
Tropicana refr.      3.3          3.02         3.69        3.12          2.33        1.97
Tropicana r.t.       3.3          2.82         1.91        3.23          2.55        2.08
Pampryl refr.        2.9          2.73         3.34        3.54          3.31        2.63
Joker r.t.           2.8          2.59         1.66        3.37          3.05        2.56
Pampryl r.t.         2.6          2.53         1.66        3.46          3.15        2.97

5. Conclusion
The proposed methodology presents several useful features:
- A space representing hedonic judgements built from product characteristics, in which the characteristics and the judgements are represented simultaneously. This representation is a compromise between the internal and external approaches, with the important specificity of relying on linear combinations of product characteristics, which guarantees the interpretability of the resulting preference structure. PLS regression fits this sensory problem well.
- A segmentation principle that defines groups of consumers with homogeneous hedonic judgements which, at the same time, can be explained by product characteristics. As long as there is no consensus among the hedonic judgements, segmentation is mandatory. The very simple procedure described here relies on the PLS components; it is useful only when these components have a strong interpretation.
- The possibility of visualising the preference zones of each group graphically and of expressing the percentage of favourable judgements numerically. This representation is necessary when a more complete model than the vector model is used, but it is also useful in this simple case: it quantifies the preferences at any point of the representation space.
- The possibility of summarising the relationships among the variables through latent variables. This synthesis complements the graphical display by quantifying the relationships between the various blocks of variables.
A major advantage of this methodology is its unified framework, based on PLS principles, which makes it possible to uncover the structural relationships between product characteristics and preference judgements.
Depending on their objectives and working methods, researchers may also focus on selected aspects of the methodology.

Acknowledgements
We would like to thank the referees for their numerous comments, which helped to improve and clarify the paper.

References
Carroll, J. D. (1972). Individual differences and multidimensional scaling. In R. N. Shepard, A. K. Romney, & S. B. Nerlove (Eds.), Multidimensional scaling: Theory and applications in the behavioral sciences (vol. 1, pp. 105-155). New York: Seminar Press.
Chang, J. J., & Carroll, J. D. (1969). How to use MDPREF, a computer program for multidimensional analysis of preference data. Computer manual. Murray Hill, NJ: Bell Labs.
Chin, W. W. (1998). The partial least squares approach for structural equation modeling. In G. A. Marcoulides (Ed.), Modern methods for business research (pp. 295-336). Mahwah: Lawrence Erlbaum Associates.
Chin, W. W. (2001). PLS-Graph user's guide. C.T. Bauer College of Business, University of Houston, USA.
Chin, W. W., & Newsted, P. R. (1999). Structural equation modeling analysis with small samples using partial least squares. In R. Hoyle (Ed.), Statistical strategies for small sample research (pp. 307-341). Thousand Oaks: Sage Publications.
Courcoux, P., & Chavanne, P. C. (2001). Preference mapping using a latent class vector model. Food Quality and Preference, 12, 369-372.
Danzart, M. (1998). Cartographie des préférences. In SSHA (Ed.), Évaluation sensorielle: Manuel méthodologique. Paris: Lavoisier.
Greenhoff, K., & MacFie, H. J. H. (1994). Preference mapping in practice. In H. J. H. MacFie & D. M. H. Thomson (Eds.), Measurement of food preferences. London: Blackie Academic & Professional.
Hilgesen, H., Solheim, R., & Næs, T. (1997). Consumer preference mapping of dry fermented lamb sausages. Food Quality and Preference, 8, 97-109.
Huon de Kermadec, F. (2001). Méthodes reliant données sensorielles descriptives, hédoniques et instrumentales. In I. Urpadilleta, C. Ton Nu, C. Saint Denis, & F. Huon de Kermadec (Eds.), Traité d'évaluation sensorielle: Aspects cognitifs et métrologiques des perceptions. Paris: Dunod.
Huon de Kermadec, F. H., Durand, J. F., & Sabatier, R. (1997). Comparison between linear and nonlinear PLS methods to explain overall liking from sensory characteristics. Food Quality and Preference, 8, 395-402.
Jöreskog, K. G. (1970). A general method for analysis of covariance structure. Biometrika, 57, 239-251.
Jöreskog, K. G., & Sörbom, D. (1989). LISREL-7 user's reference guide. Mooresville: Scientific Software.
Le Dien, S., & Pagès, J. (2003). Hierarchical multiple factor analysis: application to the comparison of sensory profiles. Food Quality and Preference, 14, 397-403.
Lohmöller, J.-B. (1989). Latent variables path modeling with partial least squares. Heidelberg: Physica-Verlag.
Martens, H., & Martens, M. (2001). Multivariate analysis of quality, an introduction. Chichester: John Wiley and Sons.
McEwan, J. A. (1996). Preference mapping for product optimization. In T. Næs & E. Risvik (Eds.), Multivariate analysis of data in sensory science. Amsterdam: Elsevier Science BV.
Pagès, J. (1996). Un exemple d'analyse simultanée de données hédoniques, descriptives et instrumentales. In Proceedings of third sensometrics meeting (pp. 27.1-27.4). Nantes: ENITIAA.
Pagès, J., & Tenenhaus, M. (2001). Multiple factor analysis and PLS path modeling. Chemometrics and Intelligent Laboratory Systems, 58, 261-273.
SIMCA-P (2004). User guide. Umeå: Umetrics AB.
Tenenhaus, M. (1999). L'approche PLS. Revue de Statistique Appliquée, 47, 5-40.
Tenenhaus, M., Esposito Vinzi, V., Chatelin, Y.-M., & Lauro, C. (2004). PLS path modeling. Computational Statistics and Data Analysis, 46.
Unscrambler (2003). User manual. Oslo: Camo Process AS.
Vigneau, E., & Qannari, E. M. (2002). Segmentation of consumers taking account of external data. A clustering of variables approach. Food Quality and Preference, 13, 515-521.
Vigneau, E., Qannari, E. M., Punter, P. H., & Knoops, S. (2001). Segmentation of a panel of consumers using clustering of variables around latent directions of preference. Food Quality and Preference, 12, 359-363.
Wold, H. (1985). Partial least squares. In S. Kotz & N. L. Johnson (Eds.), Encyclopedia of statistical sciences (vol. 6, pp. 581-591). New York: John Wiley and Sons.
Wold, S., Kettaneh, N., & Tjessem, K. (1996). Hierarchical multiblock PLS and PC models for easier interpretation and as an alternative to variable selection. Journal of Chemometrics, 10, 463-482.
