Astm 434

Copyright by ASTM Int'l (all rights reserved); Thu Jun 18 11:16:44 EDT 2015
Downloaded/printed by
Universidad Nacional De Colombia (Universidad Nacional De Colombia) pursuant to License Agreement. No further reprod
MANUAL ON
SENSORY TESTING METHODS
Sponsored by
ASTM Committee E-18 on
Sensory Evaluation of Materials and Products
AMERICAN SOCIETY FOR
TESTING AND MATERIALS
ASTM Special Technical Publication 434
published by the
AMERICAN SOCIETY FOR TESTING AND MATERIALS
1916 Race Street, Philadelphia, Pa. 19103
© BY AMERICAN SOCIETY FOR TESTING AND MATERIALS 1968
Library of Congress Catalog Card Number: 68-15545
ISBN 0-8031-0018-3
NOTE
The Society is not responsible, as a body,
for the statements and opinions
advanced in this publication.
Printed in Baltimore, Md.

First Printing, May 1968
Second Printing, May 1969
Third Printing, November 1969
Fourth Printing, April 1973
Fifth Printing, January 1976
Sixth Printing, September 1977
Printed in Philadelphia, Pa.
Seventh Printing, April 1982
Eighth Printing, September 1984
Ninth Printing, November 1986
Printed in Baltimore, Md.
Tenth Printing, July 1990
Foreword
Sensory testing is concerned with measuring physical properties by
psychological techniques. As part of the field of psychometrics, sensory
methods are used for measurements that cannot be made directly by
physical or chemical tests.
To conduct sensory tests may not seem particularly difficult to the lay-
man, but it is not as easy as it seems. One cannot simply proceed by rote
and expect to obtain meaningful and valid results. Such an approach
makes it highly probable that the data developed will not reveal the true
situation. It is necessary to be thoroughly familiar with the techniques
available, to know when and how to use them, and to have a panel that
has been carefully screened and trained.
This manual endeavors to guide the technical man who is not an expert
in the field, but who is confronted with the need to conduct sensory tests.
An attempt has been made to make the manual complete by including all
relevant areas but without exploring each one fully. The main purpose is
to show how to evaluate the properties of objects rather than to demon-
strate the underlying theories.
Both general and specialized procedures are given which should be
adequate for most situations encountered. Included are descriptions of
basic techniques for discrimination and preference testing, the screening
and training of panels, controls for test situations, when and where the
different techniques should be applied, and basic guidance in the statisti-
cal analysis and interpretation of the results. Literature references are
included for those interested in becoming familiar with the subject matter.
Although sensory testing techniques can be applied with all human
senses, this manual is mainly restricted to the senses of taste and smell.
Subsequent manuals are expected to be concerned with the other senses.
Also, the examples usually are concerned with food products, primarily
because the people who compiled the manual were food oriented. This
does not imply that the techniques are applicable only to foods; they can
be used wherever the senses of taste or olfaction are involved.
An effort has been made to organize the manual as simply as possible,
yet without being unduly repetitive. Thus, relevant information in regard
to a given topic, for example, a particular kind of test, might be found in
all sections. Cross referencing has been used as an additional aid.
Related
ASTM Publications
Basic Principles of Sensory Evaluation, STP 433 (1968)
Correlation of Subjective-Objective Methods in the Study of Odors and Taste, STP 440 (1968)
Contents
I. General Requirements
  A. Physical Conditions
    1. General
    2. Location
    3. Laboratory Layout
    4. Odor Control
    5. Lighting
    6. General Comfort
  B. Test Subjects
    1. Discrimination Tests
    2. Preference Tests
    3. Training of Subjects
    4. Motivation of Subjects
    5. Physiological Sensitivity of Subjects
    6. Psychological Control
  C. Samples of Materials
    1. Selection of Samples To Be Tested
    2. Preparation of Samples
    3. Presentation of Samples
II. Test Forms
  A. Paired Comparisons
    1. Scope and Application
    2. Summary of Method
    3. Procedure
    4. Special Considerations
    5. Analysis of Data
  B. Rating Scales
    2. Types of Rating Scales
    4. Analysis of Data
  C. Magnitude Estimation
    3. Analysis of Data
  D. Ranking Methods--Rank Order
    3. Procedure
    5. Analysis of Data
  E. Forced Choice Methods
    2. Summary of Method
    3. Description of Methods
    4. Design
    5. Special Considerations
    6. Analysis of Data
  F. Threshold Methods
    1. Scope and Application
    2. Preparation of Samples
    3. Selected Methods
    4. Analysis of Data
  G. Quality Attribute Analysis
    1. Scope and Application
    2. Summary of Methods
    3. Special Considerations
    4. Data Analysis
III. Special Applications
  1. Hedonic Scale Method
  2. Rating Scale Evaluation of Intensity
  3. Flavor Quality Control in the Production of Beverages
  4. Flavor Profile Method
  5. Quality Attribute Check List
  6. Flavor and Odor Characterization
  7. Food Action Scale (FACT) Method
  8. Triangle Test--Degree of Difference
  9. Triangle Test--Characterization of Difference
  10. Dilution Techniques
IV. Statistical Procedures
  A. Definitions
  B. Concepts
  C. Limitations and Qualifications
  D. Reference to Prepared Tables
  E. The t-Test
  F. Chi-Square Tests
  G. Analysis of Variance
  H. Problem of Multiple Comparisons
  I. Threshold Determination
Acknowledgment
Glossary of Statistical Symbols
Tables 1 to 12
References
STP434-EB/May 1968
MANUAL ON SENSORY TESTING METHODS
I. General Requirements
A. PHYSICAL CONDITIONS [1]¹
1. General
a. Sensory testing requires special controls of various kinds. If
they are not employed, results may be biased or sensitivity
may be reduced. Most of these controls depend directly upon,
or are affected by, the physical setting in which the tests are
conducted. Major ones include control of irrelevant odor
stimulation, elimination of psychological distraction, and
provision of a generally comfortable work environment.
b. This section describes in general terms the conditions which
are most desirable and indicates how they are usually attained
in laboratories which have been designed especially for sensory
testing. When sensory testing must be done using facilities
not designed for that purpose, control is more difficult
but not impossible. Then it is a matter of improvising to approximate
the optimal conditions as closely as possible.
2. Location
Many factors might be considered here, since the location of
the laboratory may determine how easy or difficult it is to es-
tablish and maintain some of the physical controls. In addition
there are two general considerations:
a. Accessibility
The laboratory should be located so that the majority of
the available test subjects can reach it conveniently, with a
minimum of disturbance in normal work routines. Otherwise,
motivation and performance will be adversely affected.
b. Freedom from Confusion
This requirement often conflicts with the above. It is un-
desirable to locate the laboratory where there is heavy traffic
flow (for example, adjacent to a main lobby or cafeteria)
because of the possibility of disturbance of the tests by well-
meant socialization. When located in such an area, special
procedures to control this factor are needed.
3. Laboratory Layout
a. The objective is to arrange the test area so as to achieve effi-
ciency of physical operations, to avoid distraction of test
subjects by the laboratory operations or by outside persons,
and to minimize mutual distraction among subjects.
b. The testing area should be divided into at least two parts:
one a work area for storage, sample preparation and presentation,
etc., and the other for the actual testing. These areas
should be separated by a complete partition if preparation
involves cooking or odorous materials.
¹ The italic numbers in brackets refer to the list of references at the end of this manual.
c. Individual panel booths are essential to avoid mutual distrac-
tion among test subjects.
d. It is convenient to provide a room where test subjects can wait
their turn without disturbing those who are testing.
4. Odor Control
a. The testing area must be kept as free from odors as possible.
This is sometimes difficult to attain, and the degree to which
one may compromise with the ideal is a matter of judgment.
Some of the desirable practices are listed here, but certain
circumstances will require special solutions.
b. Air conditioning with activated carbon filters installed in the
system is the best means of odor control. A slight positive
pressure in the testing room to reduce inflow of air from the
sample preparation room and other areas is recommended.
Air from the sample preparation room should not pass
through the filters.
c. All materials and equipment inside the room should be either
odor free or have a low odor level. (Transite partitions have
proved to be very effective as wall and ceiling material. If
highly odorous products are to be examined or high humidi-
ties are anticipated, these partitions may be sprayed with an
odorless, strippable, soft-colored coating which can be re-
placed if it becomes contaminated. Low-odor asphalt tile
has proved effective as floor material.)
d. Air in the testing room may become contaminated from the
experimental samples themselves as, for example, when testing
perfumes. Procedures should be developed, suitable to the
materials and the tests, so that odoriferous samples are ex-
posed for a minimum time.
5. Lighting
a. Most testing does not require special lighting. The objective
should be to have an adequate, comfortable level of illumina-
tion such as is provided by any good lighting system.
b. Special light effects may be desired to hide irrelevant differ-
ences in color and other aspects of appearance. One may
simply use a very low level of illumination, or may adjust the
color of illumination either with colored bulbs or by attaching
colored filters over standard lights.
6. General Comfort
There should be an atmosphere of comfort and relaxation in
the testing room, which will encourage panel members to concentrate
on the testing tasks. Air conditioning with controlled
temperature and humidity is desirable for this reason. Care
should be exercised in selecting chairs and stools, designing work
areas, etc. to ensure that this principle is not violated.
B. TEST SUBJECTS [2]
1. Discrimination Tests [3,4]
a. Selection of General Discrimination Panels
Two methods, representing different levels of selection, are
described here, which give a degree of selection adequate for
panels performing most of the discrimination functions re-
quired in sensory testing laboratories.
The rationale of these methods is that a panel member is
usually required to deal analytically with complex stimuli;
hence any series of tasks on simple stimuli will only partially
determine a person's value. It is necessary to take into con-
sideration the whole gamut of factors that may influence per-
formance, and this can be done only by using representative
tests on representative materials. This may be considered a
work-sample method. The selection process is started with a
large group of persons, and the objective is to rank all candidates
in order of skill. The size of the group initially tested
affects the efficiency of the ultimate panel, since the larger the
number of candidates, the greater the probability of finding
persons of superior ability. All available personnel should be
included in the screening trials, since it is possible to find
persons of superior ability in unexpected quarters. Do not
excuse any one from the selection tests on the grounds that he
is automatically qualified due to special experience or posi-
tion. Requalification of all panel members is required periodi-
cally.
(1) One basic procedure in the first method is the triangle test
(II.E.3a). The differences represented in the selection tests
should be similar to those likely to be encountered in the
actual operation of the panel. For example, if the panel is
to be used for only one product, use that product to
design selection tests. They should cover as broad a range
of the anticipated types of differences as possible.
Each test should represent a difference such that the
group of candidates as a whole will establish a significant
difference, but the percentage of correct responses should
not go above 80 percent. Each person should have two
trials at the same test session on each triangle test. It is
recommended that selection be based on no fewer than 20
judgments per subject made on 10 different tests in 10
different sessions. Candidates are ranked on the basis of
percentage of correct responses. The top-ranking people
are selected but with the proviso that no one scoring less
than 60 percent correct will be used. It is essential that
each candidate take all (or nearly all) of the tests. Other-
wise, the percentage correct may not be a valid basis of
comparison, because tests are likely to vary in degree of
difficulty.
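The ranking rule of this first method can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the candidate labels, and the screening data are hypothetical, and the chance probability of 1/3 is the usual guessing rate for a three-sample triangle test rather than a figure quoted in this section.

```python
from math import comb


def percent_correct(judgments):
    """Percentage of correct triangle-test judgments (True means correct)."""
    return 100.0 * sum(judgments) / len(judgments)


def chance_tail(correct, total, p_guess=1 / 3):
    """Probability of `correct` or more right answers out of `total` by guessing alone."""
    return sum(
        comb(total, k) * p_guess**k * (1 - p_guess) ** (total - k)
        for k in range(correct, total + 1)
    )


def select_panel(results, panel_size, min_judgments=20, min_percent=60.0):
    """Rank candidates by percent correct; keep top scorers at or above the floor."""
    scored = [
        (name, percent_correct(judgments))
        for name, judgments in results.items()
        if len(judgments) >= min_judgments  # too few judgments: scores not comparable
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(name, pct) for name, pct in scored if pct >= min_percent][:panel_size]


# Hypothetical screening data: 20 judgments per candidate.
results = {
    "A": [True] * 17 + [False] * 3,
    "B": [True] * 13 + [False] * 7,
    "C": [True] * 11 + [False] * 9,
    "D": [True] * 15 + [False] * 5,
}
panel = select_panel(results, panel_size=4)
# panel == [("A", 85.0), ("D", 75.0), ("B", 65.0)]; C falls below 60 percent
```

The helper chance_tail may also be applied to the pooled judgments of the whole group on a single test, as one possible check that the group as a whole establishes a significant difference; the choice of an exact binomial calculation here is an assumption of the sketch, not a prescription of the method.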
(2) The second method tests candidates in a situation in-
volving repetition of judgments on a series of samples
which represents a range of total quality. The type of
rating scale to be employed in the final operation of the
panel is used. A series of four to six samples, all variations
of a single product type and representing a range of total
quality, is established. If the panel is to be used on more
than one product type, then this series of samples should
be of the product type of major interest, or else the experi-
ment should be repeated on two or three product types.
Each candidate rates the series several times (a minimum
of four replications is recommended). The data for each
candidate are separately subjected to analysis of variance.
The ratio of between-samples variance to within-samples
variance is used as the measure of panel member skill.
The degree to which a person discriminates between
samples and is consistent in his replicate judgments will
be reflected in his F-ratio. Candidates are ranked in order
of F-ratios from highest to lowest. The panel is selected
from the highest ranking candidates whose F-ratios are
significant at or beyond the 5 percent level. (See IV.G.)
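The F-ratio of this second method may be illustrated with a short Python sketch. It assumes a simple one-way analysis of variance for each candidate (samples as the only factor, replicate ratings as the error term); a design that also removes a replication effect would give a different ratio. The function names and the ratings are hypothetical, and the critical value must be read from an F table for the degrees of freedom actually used.

```python
def f_ratio(ratings_by_sample):
    """One-way analysis of variance for a single candidate.

    ratings_by_sample holds one list of replicate ratings per sample.
    Returns (F, df_between, df_within).
    """
    k = len(ratings_by_sample)
    n_total = sum(len(r) for r in ratings_by_sample)
    means = [sum(r) / len(r) for r in ratings_by_sample]
    grand_mean = sum(sum(r) for r in ratings_by_sample) / n_total

    # Between-samples and within-samples sums of squares.
    ss_between = sum(
        len(r) * (m - grand_mean) ** 2 for r, m in zip(ratings_by_sample, means)
    )
    ss_within = sum(
        (x - m) ** 2 for r, m in zip(ratings_by_sample, means) for x in r
    )

    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    f = ms_between / ms_within if ms_within > 0 else float("inf")
    return f, df_between, df_within


def rank_by_f(candidates, critical_f):
    """Rank candidates by F-ratio, keeping only those at or beyond critical_f."""
    scored = [(name, f_ratio(ratings)[0]) for name, ratings in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(name, f) for name, f in scored if f >= critical_f]


# Hypothetical data: three samples, each rated four times by each candidate.
candidates = {
    "A": [[1, 2, 1, 2], [4, 5, 4, 5], [7, 8, 7, 8]],  # consistent, discriminating
    "B": [[1, 5, 3, 7], [2, 6, 4, 8], [3, 7, 5, 9]],  # erratic replicates
}
# 4.26 is roughly the tabulated 5 percent point of F for 2 and 9 degrees of freedom.
qualified = rank_by_f(candidates, critical_f=4.26)
```

In this example candidate A separates the samples with little replicate scatter and obtains a large F-ratio, while candidate B, whose replicate ratings vary more than the samples do, falls well below the critical value.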
b. Number of Panel Members
(1) The number of panel members used varies considerably
from one laboratory to another. The number most often
used is ten. Investigators use different criteria to determine
size. There is no "magic" number. Each situation may
have its own particular needs. Also, panel size may de-
pend upon the number of qualified persons available. A
panel should never include a person, or persons, with less
than satisfactory qualifications just to achieve a predeter-
mined panel size.
(2) Basically, the number should depend upon the variability
of the product, the reproducibility of judgments, and
whether there are basic differences between panel mem-
bers. When a panel is first organized such information is
usually unavailable, and panel size may be limited by the
number of qualified persons available. Specific instructions
regarding size are hardly in order because of the
many factors that must be considered. Instead, general
recommendations are made, limits are suggested, and the
factors to be considered are indicated.
(3) The minimum number of panel members for a given test
is five, since any fewer would represent too much de-
pendence upon any one individual's responses. With
appropriate replicate judgments, this number gives preci-
sion satisfactory for most discrimination problems.
(4) If at all possible, a pool of qualified persons (depending on
the amount of work anticipated and the number of people
available) should be maintained, from which the individ-
uals to take a given test or series of tests are drawn in
regular rotation. This has obvious advantages, including
the ready availability of replacements in emergencies,
improved motivation through reduction of the test load
on any one person, and the capability of handling peak
loads of testing.
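Drawing panel members from a pool in regular rotation can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function name, the member labels, and the handling of absent members are assumptions. The panel size of five in the example is the minimum given in (3) above.

```python
from collections import deque


def draw_panel(pool, panel_size, unavailable=frozenset()):
    """Take the next available members from the front of the pool, then
    move them to the back so that the test load rotates evenly."""
    selected = [m for m in pool if m not in unavailable][:panel_size]
    if len(selected) < panel_size:
        raise ValueError("not enough qualified, available members in the pool")
    for member in selected:
        pool.remove(member)
        pool.append(member)
    return selected


# Hypothetical pool of eight qualified persons.
pool = deque(["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"])
first = draw_panel(pool, 5)                       # P1 to P5
second = draw_panel(pool, 5, unavailable={"P7"})  # P6, P8, P1, P2, P3
```

A member who is unavailable keeps his place near the front of the pool and is therefore drawn at the next opportunity.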
2. Preference Tests [3]
a. Selection of Subjects
(1) Preference testing requires different criteria of selection
than discrimination tests. Ability to discriminate or perform
other complex acts is no longer a valid criterion. Instead, the
only criterion should be representativeness of some consumer
population. The approaches used in selection for
discrimination tests are antithetical to the preference
objectives, which are to predict direction of choice and
sometimes the extent to which a product appeals to some
population.
(2) Definition of the population of interest is required, but in
routine work many compromises are accepted. Then it
becomes a matter of assuring random selection, working
within the limitations which have been accepted. Sophisti-
cated sampling procedures are available, but they are
beyond the scope of this manual and are usually not ap-
plicable to laboratory testing. In the usual laboratory
situation realism demands compromise because of limita-
tions on the numbers and types of people available; how-
ever, it is possible to take precautionary steps which help
avoid the more serious errors.
(3) This approach is suggested: develop a roster of all persons
who may be available for testing. For any particular test,
select subjects from this roster by use of a random method.
Eliminate from consideration for any particular test all
persons who have expert knowledge of the product type
and all those who have specific knowledge of the samples
and variables being tested.
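The roster approach in (3) can be sketched in Python as follows. The function name, the roster labels, and the two exclusion lists are hypothetical; the standard-library random module is used here as one convenient random method among many.

```python
import random


def select_subjects(roster, n, experts=(), informed=(), seed=None):
    """Randomly draw n subjects from the roster, after removing persons with
    expert knowledge of the product type or specific knowledge of the samples."""
    excluded = set(experts) | set(informed)
    eligible = [person for person in roster if person not in excluded]
    if n > len(eligible):
        raise ValueError("roster too small after exclusions")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible for the record
    return rng.sample(eligible, n)


# Hypothetical roster of 40 available persons.
roster = [f"S{i:02d}" for i in range(1, 41)]
chosen = select_subjects(
    roster, 30, experts={"S01", "S02"}, informed={"S03"}, seed=1
)
```

Removing the excluded persons before the draw, rather than after it, keeps the number of subjects at the intended value.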
(4) Sometimes, and usually because no one else is available,
preference testing is done using members of trained
panels or people who have expert knowledge of the
products. The results should always be considered and
interpreted separately from those obtained from a prop-
erly selected group of subjects. The possibility of bias and
resultant error in predicting consumer preferences is much
greater.
b. Number of Subjects
(1) The same basic factors that apply to discrimination tests
also apply here. These are the magnitude of error, the
precision desired in the results, and breadth of sampling;
however, a greater emphasis is placed on the last factor.
Variability tends to be high in preference testing but is
relatively constant for a given type of test. Precision (for
example, in terms of the size of difference between treat-
ments that one wants to be able to detect) is a matter of
arbitrary choice. Breadth of sampling is related to the
probability that the sample will be representative of some
meaningful population. As more subjects are included, the
possibility of selection bias is reduced. Full consideration
of all of these factors is a technical matter beyond the
scope of this manual. Sampling within the usual limita-
tions seldom can be technically proper, but compromises
should be made with the above factors in mind.
(2) Some general guides in determining the number of sub-
jects to be used are given below:
(a) Conclusions based upon results from small laboratory
panels should be considered as tentative and subject
to further verification whenever important issues are
involved. Panels numbering as few as 16 to 20 people
are sometimes employed; however, the usual practice
is to require at least 30. Even this number is small and
represents rough screening. The error is large, and
important trends can go undetected. Moreover, the
breadth of sampling is dangerously narrow.
(b) About 50 to 100 people are usually considered ade-
quate for most of the problems handled in the labora-
tory, the exact number depending on the experimental
design. If properly selected they will be representative
of the available population. The error will be small
enough so that most important differences will be
detected.
(c) The use of larger numbers of people will improve
discrimination but will not do anything about possible
biases in the population. When the importance of the
problem, or of the decision that must be made, indi-
cates the need for a larger test, it is advisable to take
the additional data from a more carefully chosen
sample.
(d) Note that obtaining replicate judgments from a small
group of people does not serve the same purpose as
increasing the number of people. It will reduce vari-
able error, but it does not correct for limited breadth
of sampling.
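The effect of the number of subjects on variable error can be illustrated with a rough calculation. The sketch below is an assumption of this illustration and is not taken from the manual: it treats a two-sample preference test as a binomial proportion, uses the normal approximation, and reports the approximate 95 percent margin of error around an even split. The function name is hypothetical.

```python
from math import sqrt


def margin_of_error(n_subjects, proportion=0.5, z=1.96):
    """Approximate 95 percent margin of error for an observed preference
    proportion, assuming independent subjects and the normal approximation."""
    return z * sqrt(proportion * (1 - proportion) / n_subjects)


# Panel sizes mentioned in the guides above.
for n in (16, 30, 50, 100):
    print(f"{n:3d} subjects: +/- {100 * margin_of_error(n):.1f} percentage points")
```

Under these assumptions 16 subjects give a margin of about 24.5 percentage points and 100 subjects about 9.8, which is consistent with the statement that small panels represent only rough screening. The calculation says nothing about breadth of sampling: replicate judgments from the same few people are not independent subjects and do not enter this formula as additional people.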
c. Interpretation of Results
(1) A point that is touched on several times in the two sections
above should be made explicit. Drawing inferences and
conclusions from test results when they are not warranted
is a serious fault that must be guarded against. A prefer-
ence test made with an inadequate number of subjects, or
with biased subjects, is not inherently wrong, if its limita-
tions are recognized. The problem is that experimenters
are prone to overlook the limitations.
(2) The above leads to a strong recommendation. When
interpreting preference test results, particularly when the
test was small or the sampling limited, pay particular
attention to the possible effects of these factors. Do not
generalize too broadly.
3. Training of Subjects [5]
a. Discrimination Tests
(1) Panel members must become thoroughly familiar with the
tests with which they will be concerned. This includes
complete understanding of the nature of the judgments
required, of the test procedure, and of the test controls which
the individual is required to maintain. Initial training
usually can be accomplished during selection trials.
(2) Training may be continued through individual and group
sessions in which various samples of the product types
usually involved in the tests are evaluated and discussed.
This is particularly important for panel members who will
be required to make qualitative distinctions among
products. It is necessary for all panel members to learn
a common language.
(3) Training should concentrate on the subjects' perceptual
and judgmental tasks. They need not understand such
matters as test design and mathematical treatment and
interpretation of results. Training subjects to recognize
features of a set of physical standards may help them dis-
regard personal preferences and develop more stable
judgments.
b. Preference Tests
(1) Training is antithetical to the purpose of such tests. It
should consist simply in describing the mechanics of the
test. Any attempt to alter the subject's attitudes or his
manner of arriving at his decisions must be carefully
avoided.
(2) Incidental training, arising from the circumstance of con-
tinual testing, is often alleged to occur. Although this
probably does happen to a limited extent, there is no
evidence to indicate that it is a serious problem within
most testing programs.
4. Motivation of Subjects [6]
a. Obtaining useful results depends heavily on maintaining a
satisfactory level of motivation. The criteria for good motiva-
tion cannot be specific; however, poor motivation will gen-
erally be evidenced in hasty, careless testing, apparently poor
discrimination, and a lessened willingness to participate.
b. Motivation is a complex problem area. People's behavior is
caused by many factors which may be interacting in unpre-
dictable ways. Thus, motivation can only be dealt with in a
general way. Perhaps the important thing is that the experi-
menter recognize the importance of motivation, be aware of
the types of things that affect it, and be alert for evidences of
poor motivation. He must not assume that people are auto-
matons.
c. One of the most important factors contributing to good moti-
vation is interest in the test activity itself. With inexperienced
subjects, who test only occasionally, interest is usually spon-
taneous. In the course of long-term panel work, interest may
become reduced. Deliberate means must therefore be em-
ployed to supply motivation. One of the best means of achiev-
ing good motivation is to maintain a high degree of status for
the program and the subjects. This can be achieved if the
program is recognized as a useful and productive part of the
subjects' work, if those in charge appear to know what they
are doing, and if the tests are run efficiently. The subjects
should be made aware of the importance of their contribution.
A helpful practice is to publicize test results insofar as possible
without prejudicing future tests. Adequate facilities and
businesslike laboratory procedures, maintained day after day,
will develop respect for the program.
d. Favorable management attitudes are essential for a productive
program and should be publicized sufficiently to favorably
influence rank-and-file panel members.
e. Pleasant surroundings also contribute. An effort should be
made to make test participation a relaxing break in the day's
routine. In this connection, a "reward" system may be used.
5. Physiological Sensitivity of Subjects [7,8]
a. Rules for maintaining physiological sensitivity cannot be
specified in detail. Generally speaking, they consist in avoiding
conditions which might interfere with the normal functioning
of the taste and odor senses. Temporary adaptation from
substances eaten or smelled is the major problem. Odor is
particularly important, because people may become adapted
to an odor continually present and remain unaware of the fact.
b. There is some evidence that physiological sensitivity fluctuates
throughout the day; however, this time dependence is ap-
parently not strong enough to contraindicate testing at any
time during the normal working day (with exceptions as
indicated below).
c. Following are some general suggestions:
(1) Do not test for 1 h after meals.
(2) Wait at least 20 min after smoking, chewing gum, or eat-
ing or drinking between meals.
(3) Do not use panel members who are ill, particularly when
suffering from the common cold.
(4) Encourage panel members to avoid eating highly spiced
foods for lunch on days tests are to be run in the after-
noon.
(5) When running odor tests, ask panel members not to use
such cosmetics as perfumed face lotions or lipstick. It is
desirable to have subjects wash their hands with odorless
soap when they are required to handle the containers.
(6) In taste testing, as a precautionary measure have subjects
rinse out their mouths with water just prior to starting a
test.
d. One aspect to be considered is elimination of the effects of the
experimental samples themselves, in that the early members in
a series tend to adapt the senses for the later ones. This brings
up the question of rinses, or some other means of cancelling
the effects of a given sample.
(1) With odor stimuli, normal breathing usually suffices if one
waits 20 to 30 s. However, this is only a general guide. The
time required will vary with the adapting stimulus; some
substances may require considerably longer recovery
periods.
(2) With taste stimuli, rinsing with taste-neutral water seems
to be the best method. There is no evidence that foods such
as crackers or apples are any more effective than water.
Rinse water should be at room temperature, rather than
cold. Water above body temperature is advisable when
fatty foods are tested by trained panel members, but it
should not be used in preference tests because of its
generally unpleasant effect.
(3) Rinsing between samples is not done universally. For
example, there is evidence that subjects perform better
in the triangle test if they follow the practice of either
rinsing between samples or not, whichever they prefer.
6. Psychological Control
a. Sensory testing, whether discrimination or preference, is con-
cerned with the measurement and evaluation of stimuli by
means of human behavior. Thus, the technology outlined in
this manual may be considered as an example of applied
psychology. This does not mean that all operators need be
trained in that science, nor that they must at all times con-
sciously maintain the kinds of attitudes that are typically
psychological in the clinical sense; however, it does mean that
procedures must take account of relevant psychological vari-
ables. One must be generally aware of the complexity of
human behavior, in addition to learning how to deal with
specific factors, to anticipate and avoid sources of error or
bias.
b. It would be impossible to list here all possible psychological
factors and dictate measures for their control; nor is it neces-
sary. The same basic philosophy that applies to all experi-
mental methods is applicable. Throughout this manual special
procedures are described which incorporate elements of
psychological control. They are particularly evident in the
section on test methods, and many features of experimental
design are directed toward the same purpose. The purpose of
this section is to emphasize points which are considered par-
ticularly important and to list others which may not have
been touched on elsewhere.
c. A subject always responds to the total situation. For example,
in a preference test, a person's rating of a material reflects not
only the material itself but also feelings built up by many factors,
both transitory and relatively permanent, and generally all
irrelevant to the purposes of the experiment. This is the reason for attempting
to keep the experimental situation as constant as possible,
keeping it quiet and comfortable, and eliminating outside
pressures. Many features of test design and data analysis take
this into account. For example, it is both commonly accepted
and true that comparisons between samples served to the
same subject in the same session are more reliable than com-
parisons between samples served to different subjects or to the
same person at different times.
d. There is a related problem which pertains to the experimental
situation itself. This is the subject's tendency, conscious or
unconscious, to use all available information in reaching a
decision, even though he may know that it is irrelevant. This
is particularly important in the forced-choice methods. For
example, a subject may allow accidental variations in such
things as sample size, containers, etc. to determine his choice.
This source of error can nearly always be avoided by rigorously
adhering to the proper procedures of sample presentation.
e. Sensory testing usually seeks to evaluate the properties of a
sample per se, apart from its developmental history. Thus, one
eliminates subjects who are known to have special knowledge
about the materials under test and identifies samples by code.
The codes themselves can be biasing. For example, such code
designations as A-1, X in relation to another letter, 1 as compared
to 2, 13, or 7, and many others are likely to have acquired
meanings which could influence decisions. This source
of error can be eliminated rather easily. Recommendations
are:
(1) Use 2 or 3 digit codes generated from a table of random
numbers.
(2) Use multiple codes for a sample even in the course of a
single session.
(3) Avoid the temptation to use a certain code, or set of codes,
constantly to expedite tabulation of results.
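As an illustration of recommendations (1) and (2), the following Python sketch draws a separate 3-digit code for every serving of every sample. It assumes that a seeded pseudo-random generator is an acceptable stand-in for a table of random numbers. The function name assign_codes and the sample labels are hypothetical.

```python
import random

def assign_codes(samples, servings, digits=3, seed=None):
    """Return {sample: [codes]} with a distinct random code for each serving."""
    rng = random.Random(seed)
    low, high = 10 ** (digits - 1), 10 ** digits   # 100..999 when digits=3
    needed = len(samples) * servings
    # Draw without replacement so no code is reused within the session.
    drawn = rng.sample(range(low, high), needed)
    return {
        sample: drawn[i * servings:(i + 1) * servings]
        for i, sample in enumerate(samples)
    }

# Hypothetical session: two products, each served four times.
codes = assign_codes(["control", "experimental"], servings=4, seed=1968)
for sample, sample_codes in codes.items():
    print(sample, sample_codes)
```

Because each serving carries its own code, a subject cannot learn to associate one code with one product. A fresh seed for each session avoids the constant code sets that recommendation (3) warns against.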
f. It is a common phenomenon in psychological testing that
subjects want to "please" the experimenters. They want to
give "right" answers both to demonstrate their skills and to
expedite, so they believe, the progress of science. This kind of
cooperation must be avoided. Experimenters, particularly
the operators who are giving instructions and presenting
samples, must be aware of the possible effects of their own
attitudes and even of chance statements. The proper approach
is a careful, impersonal neutrality. Avoid giving any hint of
the expected results of an experiment, and do not discuss the
samples with subjects prior to testing. Let them know that
you are pleased to have them test (this is good for motivation),
and let it appear that you will be no less pleased whatever the
test results.
g. Related to the above is the factor of deliberate or unintentional
acts by management or fellow-workers that may influence
subjects. For example, the opinion, "Our new 'Product X' is
the best in the world," might thus become cloaked with
authority, with the result that subjects respond favorably to
anything which they suspect might be Product X. Such situa-
tions go beyond the laboratory and are sometimes hard to
handle. Education of the entire organization is the best solu-
tion.
h. Certain fairly standard and predictable influences are con-
trolled or neutralized in terms of test design and are discussed
elsewhere in this manual. They include time error, position
error, contrast effect, convergence effect, "halo" effect, and
others.
C. SAMPLES OF MATERIALS
I. Selection of Samples To Be Tested [9]
This topic requires no special treatment. The problems of
selection of materials for sensory testing are the same as selection
for any other experimental or quality control purpose; hence
they will not be reviewed here. The general principle is to select
material so that it is representative of the product or process
under study. Sometimes experimenters are much concerned about
selection of human subjects, but erroneously assume that the
sampling of materials needs no attention.
2. Preparation of Samples [10]
a. Procedures for preparing samples for testing shall be such
that no foreign tastes or odors are imparted. All samples
within a given test shall be identical with regard to preparation
factors which are subject to control and are not intrinsic to
the material tested.
b. In many instances there is freedom to select one of a variety
of methods of preparation of a given basic material. For
example, potatoes under test could be fried,
boiled, mashed, baked, or even eaten raw. It is impossible to
review all of the particular problems that may arise in this
connection; however, some of the important general factors
are listed below:
(1) For difference testing, select the method which is judged
most likely to permit detection of a difference. Simplicity
is the key. For instance, avoid preparations that may add
flavor to all samples, for example, frying or the addition
of seasoning.
(2) For preference testing, select a method judged to be typi-
cal of normal use of the product. With foods, for example,
it is sometimes desirable to run tests using several different
recipes. Generally, preference test subjects should be
allowed to use such "voluntary" additions as salt and
pepper, although they should be instructed to use uniform
amounts on all samples.
(3) The question of the need for a "carrier" in preference tests
is often pertinent. For example, does a test of cake frost-
ings require cake? This cannot be answered categorically.
There is evidence that valid comparisons among samples
of many auxiliary items can be made without using a
normal carrier; however, this depends upon the nature of
the material. Some materials (for example, foods such as
hot sauce, spices, vinegar, etc.) require dilution because of
their intense physiological effects. Each case must be
decided on its own merits.
c. Evaluation of materials (for example, food packaging) where
the main concern is whether tastes or odors will be imparted
to other substances may require the special approach known as
transfer testing [11], which makes use of flavor sensitive ac-
ceptor materials such as mineral oil, distilled water, butter,
or chocolate.
(1) The test sample is confined in a closed space (for example,
a bell jar) with samples of the acceptor material for a
period of 12 to 24 h, or it may be placed in direct contact
with the acceptor material for an appropriate period.
(2) Control samples of the acceptor material are prepared by
exposure under the same conditions except that the test
sample is absent.
(3) This approach may be used with a wide range of acceptor
materials. Selection of the particular material and the
conditions of exposure depend on the nature of the test
sample and the conditions of its intended use.
3. Presentation of Samples
a. The general principle is that samples shall be presented in
such a manner that subjects will respond only on the basis of
those factors which are intrinsic to the material tested. The
key is uniformity, particularly within a given test. It is even
desirable to maintain uniformity from one test to another
within a given product type. Important factors to consider
are: quantity of sample, the containers, eating utensils,
and temperature.
b. Size of Samples
The amount of sample to be presented may vary over a
considerable range. Usually considerations of preparation
effort and of availability of materials set the upper limit. In
difference tests, especially with foods, the criterion for the
lower limit is to provide an amount sufficient to permit the
average subject about three tastes, that is, normal sips or bites.
Sometimes the test procedures may dictate otherwise, for
example, subjects may be instructed to try each sample only
once. In such instances, the quantity of sample can be adjusted
accordingly. Generally, about 1/2 oz of a liquid and 1 oz of a
solid is enough. If the samples are to be eaten for preference
tests, the usual practice is to present samples about twice as
large, although for most people this is not necessary. Do not
serve full "normal-serving" quantities, even if the material is
available, unless only one sample is to be tested.
c. Temperature of Presentation
Whenever possible, samples should be presented at room
temperature since this is both more convenient and facilitates
control; however, there are other criteria. For difference
testing, temperature should be such as to optimize the prob-
ability of discrimination. Another criterion, which sometimes
may qualify the first, is that of normalcy, for example, it may
be unimportant to know that two products differ when hot if
they will never be used this way. For preference testing the
normalcy criterion becomes more important. The temperature
of presentation should approximate common practice with
the particular material; however, there is one qualification--
extremes should be avoided. For example, cold drinks should
not be lower than about 45 F, and hot food or drinks should
not be above about 170 F.
d. Elimination of Appearance and Other Factors
(1) Appearance factors come under the general topic of uni-
formity; however, there is a special feature that should be
noted. It is sometimes desirable to test samples for other
sensory characteristics, even when they differ in appear-
ance. Differences may be eliminated in one of several
ways, including reduced illumination, use of colored
lights, or the addition of coloring normal for the product
type.
(2) Similarly, differences in other nonpertinent factors may
be masked by appropriate means. For example, differences
in texture or consistency can be eliminated by subjecting
all samples to maceration or blending, perhaps
with the addition of water.
e. Order of Presentation [12,13]
(1) When a test involves more than one sample, the order in
which the samples are tested is very important. People
may respond differently to the samples simply because of
the order of presentation. This is related to the traditional
"time error" of psychophysical experimentation. Also,
they may react to a given sample differently because of
the qualities of the sample which preceded it. This refers
to "contrast effect" and "convergence effect." Experience
has proven that no amount of instruction or training will
avoid these effects without otherwise biasing results; nor
is it necessary, since the effects can be neutralized.
(2) The principle is to balance the order of presentation
among subjects so that over the entire test each sample
will have preceded and followed each other sample an
equal number of times. More simply stated, use all pos-
sible permutations of order of presentation an equal
number of times. The same objective may be accomplished
in a large experiment by randomizing order; however,
balancing is more efficient.
(3) When samples are served simultaneously, as in a triangle
or rank order test, the same problem exists. One sample
must be considered before another. When samples can be
apprehended almost simultaneously, as in visual compari-
sons, the phenomenon is called "position error." The
same solution applies here. Balance the geometric
(for example, left to right) arrangement of samples, and
instruct subjects so that over the entire experiment each
sample is considered in each position, or time sequence,
an equal number of times.
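The balancing principle in (2) and (3) can be sketched in Python as follows. This is a minimal illustration, assuming the number of subjects is a multiple of the number of possible orders. The function name balanced_orders and the sample codes are hypothetical.

```python
from collections import Counter
from itertools import permutations

def balanced_orders(samples, n_subjects):
    """Assign serving orders so every permutation is used equally often."""
    orders = list(permutations(samples))
    if n_subjects % len(orders) != 0:
        raise ValueError(
            f"{n_subjects} subjects cannot be divided evenly among "
            f"{len(orders)} possible orders"
        )
    return [orders[i % len(orders)] for i in range(n_subjects)]

# Hypothetical test: three coded samples, twelve subjects.
plan = balanced_orders(["417", "683", "259"], n_subjects=12)
first_position = Counter(order[0] for order in plan)
```

With three samples there are six possible orders, so twelve subjects use each order twice and each sample occupies each position four times. The sketch cycles through the orders in a fixed sequence. Which subject receives which order would in practice be assigned at random, a step not shown here.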
f. Number of Samples [14]
(1) The number of samples that should be presented in a
given test session is a function of the type of product
being tested. Obviously, the minimum number depends
upon the test method, and, in most testing, we are actually
concerned with the permissible maximum number.
(2) Generally, several samples or sets of samples may be
considered during a single session. The actual number
depends upon how quickly subjects may become fatigued
and, to some extent, whether the test products are pleasing.
If the series is extended beyond a certain point, the
test results inevitably become less discriminating. Strength
of flavor, persistence of flavor, and anesthetic and other
physiological effects all must be considered. Motivation is
an important factor, perhaps more so than physiology.
Subjects usually lose their desire to discriminate before
they lose their capability.
(3) Generally speaking, it is permissible to conduct much
longer sessions with trained panels than with preference
test subjects. Here the experimenter, working constantly
with the same group and, perhaps, the same materials,
can adjust session length on the basis of feedback from
the panel members.
(4) The following recommendations are made as general
guides to be used in the absence of more specific informa-
tion about a particular test situation for odor and taste:
(a) In single stimulus evaluation of preference, three or
four samples of most products may be presented.
Six is the maximum number.
(b) In paired comparison preference tests, serve a maxi-
mum of three pairs.
(c) In taste testing using rank order preference, serve a
maximum of four to six samples.
(d) In difference testing with trained panels, present a
maximum of six pairs or four triangles.
(e) In single stimulus evaluation with trained panels,
whenever possible, present no more than six samples.
(5) Further information in regard to number of samples is
provided in the section on particular test methods.
II. Test Forms [15]

The most commonly used test forms are paired comparisons, rating
scales, rank order, forced choice methods, threshold methods, and
methods of quality attribute analysis.
A. PAIRED COMPARISONS
1. Scope and Application
a. Paired comparisons, the oldest of the recognized psychometric
methods, is based on the simple act of making a choice be-
tween alternatives. Almost any kind of psychometric problem
can be presented in this form. It may be used whenever a
quantifiable psychological dimension can be specified and two
objects are available for comparison.
b. Its range of potential application is broadened by its simplicity.
The task is easy to understand and can be explained without
relying on written communication. It can be used even
with small children, illiterates, or any unsophisticated subjects.
c. As the number of samples in the set to be compared increases,
the number of possible comparisons rapidly increases to the
point where they cannot be handled in a single session because
of excess fatigue. Also, the method may fail to take advantage
of the subject's full range of ability to discriminate.
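The growth described in item c can be made concrete with a short calculation. The sketch below counts the distinct pairs among n samples, n(n-1)/2, and the total when each pair is served in both orders, n(n-1). The function name pair_counts is hypothetical.

```python
from math import comb

def pair_counts(n_samples):
    """Return (distinct pairs, pairs counted in both serving orders)."""
    distinct = comb(n_samples, 2)   # n(n-1)/2
    both_orders = 2 * distinct      # n(n-1)
    return distinct, both_orders

for n in (2, 4, 6, 8, 10):
    print(n, pair_counts(n))
```

Four samples already give 6 distinct pairs, and ten samples give 45, which is far more than a subject can handle in one session.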
2. Summary of Method
a. Two stimuli are presented, either simultaneously or succes-
sively, and the subject is asked to select one over the other on
the basis of some previously defined dimension.
b. In its simplest application, the method is used to compare
only two samples, but it can also be applied in experiments
designed to make multiple comparisons within a series of
samples. A subject may test only one pair of samples at a
single session or may make a series of comparisons up to the
physiological or psychological limit.
c. Results are obtained in terms of the relative frequencies of
choice of the two samples, usually as accumulated for all
subjects participating in the test. There are several methods of
determining the reliability of results. Methods are also avail-
able for developing scale values for a series of samples from
the results of comparisons made within each pair.
3. Procedure
a. The subject is told what attribute to judge. Usually this is a
simple matter and can be condensed along with procedural
instructions into a brief sentence.
b. The two samples of the pair may be presented at the same
time, or one at a time in succession. The first is more com-
monly done, since it saves time for the experimenter and is
necessary if the subject is to be permitted to try each sample
more than once. In successive presentation, the test operator
directly controls both the order of presentation of samples
and the time interval between them. When the samples are
presented simultaneously, these factors cannot be controlled
except through instructions to subjects.
c. The time interval between samples of a pair may vary from 10
to about 40 s. The longer intervals are used when stimuli are
strong, thus probably highly adapting. Intervals for odor tests
should be in the same range. With simultaneous presentation,
nearly all subjects will overestimate the passage of time and
try the second sample too soon. When multiple pairs are
presented at a session, the interval between pairs should not
be less than 40 s.
d. Subjects are often instructed to rinse their mouths between the
members of a pair; however, this is not always done, and there
is some evidence that failing to rinse does not diminish the
capability of discriminating. When multiple pairs are pre-
sented at a session, subjects should be required to rinse be-
tween pairs.
4. Special Considerations
a. Designs for Multiple Treatments
(1) The classical definition of the method requires that, when
more than two treatments are involved in an experiment,
all possible pairs should be tested by each subject and
that, over the entire experiment, all possible permutations
of pairs should be used an equal number of times; how-
ever, actual practice often compromises, particularly when
there are many samples. A single subject may test only a
subset of pairs, so that it will require several subjects to
finish all of one permutation of pairs.
(2) Frequently situations arise where one is interested only
in certain pairs within a set. For example, one may have
a logical control, such as the product currently being
produced, and want to know how each of a number of
experimental products compares with it.
b. Limitation on Interpretation of Results
It must be remembered that the results obtained with the
paired comparison method show only the relationship between
two samples, or among the members of a limited set. Thus,
their value is strictly dependent on the validity of any com-
parison standard which is used.
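The two designs described under 4.a can be sketched in Python as follows. The first function deals the full set of ordered pairs out in blocks, so that several subjects together complete one set. The second builds only the comparisons against a control. The function names, the product labels, and the block size of three pairs per subject are assumptions made for illustration.

```python
from itertools import permutations

def split_pairs(samples, pairs_per_subject):
    """Deal all ordered pairs out in blocks; several subjects finish one full set."""
    ordered_pairs = list(permutations(samples, 2))   # (served first, served second)
    return [
        ordered_pairs[i:i + pairs_per_subject]
        for i in range(0, len(ordered_pairs), pairs_per_subject)
    ]

def control_pairs(control, experimentals):
    """Pair each experimental product with the control, in both serving orders."""
    pairs = []
    for product in experimentals:
        pairs.append((control, product))
        pairs.append((product, control))
    return pairs

blocks = split_pairs(["A", "B", "C", "D"], pairs_per_subject=3)
versus_control = control_pairs("current", ["X1", "X2", "X3"])
```

Four samples yield twelve ordered pairs, so four subjects testing three pairs each complete one set. The blocks are dealt in a fixed sequence here. Assigning pairs to blocks at random would be a reasonable refinement.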
5. Analysis of Data
a. In any paired test, failures to respond may occur because the
subject cannot decide between two samples. (Some question-
naires are even designed to permit a "same" judgment, but
this is not recommended because it encourages higher fre-
quencies of failure to reach a decision.) These cases cannot be
included in the usual type of analysis. Sometimes they are
simply discarded, but this is not recommended because it
misrepresents the data. The other alternative is to split the
"no choice" cases equally between the two alternatives,
assigning any odd case randomly.
b. Methods of Analysis
(Section IV: Table 1, t-test for proportions, chi-square.)
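The handling of "no choice" responses described in 5.a, followed by a test of the resulting frequencies, can be sketched as below. The significance calculation is an exact two-sided binomial test against equal choice, used here as a stand-in. It is not necessarily the procedure tabulated in Section IV. The function names and the counts are hypothetical.

```python
import random
from math import comb

def split_no_choice(a, b, no_choice, rng=random):
    """Divide 'no choice' responses equally; an odd leftover goes to a random side."""
    half, leftover = divmod(no_choice, 2)
    a, b = a + half, b + half
    if leftover:
        if rng.random() < 0.5:
            a += 1
        else:
            b += 1
    return a, b

def two_sided_binomial_p(a, b):
    """Exact two-sided p-value for the hypothesis that both samples are chosen equally often."""
    n = a + b
    k = max(a, b)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)

# Hypothetical results: 30 chose A, 16 chose B, 4 made no choice.
a, b = split_no_choice(30, 16, 4, random.Random(0))
p_value = two_sided_binomial_p(a, b)
```

Splitting the undecided responses keeps the full number of judgments in the analysis, whereas discarding them would overstate the proportion of subjects who expressed a choice.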
B. RATING SCALES [15,16]
1. Scope and Application

a. The rating scale methods all provide the subjects with a scale
showing several degrees of magnitude. A dimension of evalua-
tion, for example, a characteristic of the product type, is also
specified. Stimulus objects (for example, food samples) are
presented, and the subject's task is to assign each a scale
magnitude to reflect the amount or intensity of the specified
characteristic or attribute.
b. This method, like paired comparisons and rank order, has
broad application. In theory it could be used with any psycho-
logical dimension that is quantifiable and can be perceived
or conceptually understood. The nature of the object rated
can vary widely. In the kinds of applications with which this
manual is concerned, the evaluation is based on the subject's
immediate perception of a material or his feelings about that
material, but rating scales can also apply to feelings or
opinions in more general situations.
c. In basic scope, rating scale methods cover almost the same
ground as paired comparisons and rank order. Also, there is
a great deal of overlapping in their application. However,
the rating scale approach is limited by certain inherent fea-
tures. Subjects must exercise a greater degree of sophistica-
tion, how much depending upon the complexity of the situa-
tion. Further, the method has no advantage over paired
comparisons, unless it is possible for the experimenter to
describe, and for the subject to perceive, more than two
degrees of the attribute to be measured. These, however, are
theoretical considerations. Seldom is it possible to base one's
choice of a method on full consideration of all of these factors.
d. Common applications of the rating scale method in this field
include:
(1) Evaluation of hedonic value (preference), that is, people's
feelings of "like" and "dislike".
(2) Evaluation of people's opinions about the quality of
materials.
(3) Evaluation, in either hedonic or quality terms, of the
response to certain general attributes of a product, such
as texture, appearance, flavor, consistency, etc.
(4) Evaluation of the degree or intensity of specific attributes
of a material, such as sweetness, hardness, redness,
smoothness, amount of off-flavor in a food, etc.
2. Types of Rating Scales
a. A notable feature of this method is the great variety of
particular scales that have been and are being used. Considera-
ble variation is permissible without affecting the value of the
results; however, freedom is not unlimited. Some variations
may adversely affect discrimination or reliability.
b. The central idea of a rating scale is to create the impression
of a continuum related to some unidimensional concept and
provide the subject a ready means of locating an object in
relative position on that continuum. The following types of
scales are recognized:
(1) Graphic scales: either a simple line or one marked off
into segments. Direction, that is, which end is "good"
and which "bad" or which is "more" and which is "less",
must be shown.
(2) A verbal scale: consisting of a series of brief written state-
ments, usually the name of the dimension with appro-
priate adverbial or adjectival modifiers, which are written
out in appropriate order.
(3) Numerical scales: consisting of a series of numbers
ranging from low-to-high, which are understood to
represent successive levels of quality or degrees of a
characteristic.
(4) Scale of standards: where the distinguishing feature is
the use of actual physical samples of material to represent
the scale categories. Sometimes such scales are partial,
that is, some but not all of the scale categories are repre-
sented by physical standards.
c. One type recommended is a line graph marked into segments,
with verbal anchors either at all or at alternate points. They
may be placed either horizontally or vertically.
d. The length of scales may vary. Physical extent may vary
within wide limits without affecting results as long as the
scale remains easy to read. The critical feature is the number
of segments or points which are specifically designated on
the scale. While an exact recommendation is hardly justified,
certain guides may be provided.
(1) In general, discrimination and reliability of results in-
crease with increased length; however, beyond nine
points this increase is slight. Longer scales do not appear
to be warranted except in special cases.
(2) The number of categories on a scale may be adjusted to
the extent of variation likely to be found in the products
or qualities evaluated.
(3) In general, rating scales should not have less than five
categories.
e. Another way of classifying scales is as single (unipolar) and
double (bipolar). An example of the first would be a scale
to evaluate the degree of a certain quality where it would be
meaningless to specify degrees of absence of the quality. An
example of the second would be a scale to evaluate hedonic
value or quality where degrees of good and bad are both
meaningful. Whether to use a single or double scale depends
upon the characteristic being evaluated. The designer of a
scale should determine how it can most clearly be presented.
Usually single scales should have fewer points.
f. An important factor, bearing on the use of rating scales and,
to some degree, on the use of other methods as well, is the
dimension of evaluation specified. It is a frequent fault to
specify a quality which may be meaningful to the experimenter
but which the subjects either do not understand or understand
in different ways. Some things, such as preference or ideas
about quality, sweetness, hardness, and the like cause no
problems. This matter should be carefully considered when-
ever a new dimension is specified.
g. In theory, the points on the scale should be equidistant. This
may be unattainable, so the practical objective should be to
ensure that the points of the scale are clearly successive. The
best policy in a verbal scale is to use simple adverbs and
adjectives that are likely to mean the same to all persons.
3. Procedure
a. For analysis, successive digits are assigned to the points of
the scale, usually beginning at the end representing either
zero-intensity or the greatest degree of negative feeling or
opinion. This follows the convention of having higher num-
bers represent greater magnitude or more of a given quality.
b. Although normally this method calls for presentation of
stimuli one at a time, rating scales may be used in test situa-
tions where all samples of the series are presented together.
However, when this is done, the order in which the samples
are tested should be controlled by instructions. When there
are only two samples, the rating scale method is similar to
paired comparisons, except that the subject assigns each a
rating on the scale rather than simply choosing one sample.
When there are more than two samples, it is analogous to
ranking, except that each of the samples is assigned to some
point on the scale rather than simply being placed in order.
c. In interpreting rating scale data, it should be kept in mind
that the actual average values have no great importance, since
they have been assigned arbitrarily, aside from a possible
backlog of data obtained using the same scale with comparable
populations. The relative values within a given experiment,
however, are not subject to this limitation.
4. Analysis of Data
(Section IV: analysis of variance, t-test, multiple comparisons.)
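The scoring convention described for rating scales, in which successive digits are assigned from the most negative end of the scale, can be illustrated with a short Python sketch. The nine verbal categories are those of the widely used 9-point hedonic scale and are included only as an example. The names HEDONIC_SCALE, SCORES, and mean_scores, as well as the responses, are hypothetical.

```python
from statistics import mean

# Illustrative bipolar verbal scale, ordered from most negative to most positive.
HEDONIC_SCALE = [
    "dislike extremely",
    "dislike very much",
    "dislike moderately",
    "dislike slightly",
    "neither like nor dislike",
    "like slightly",
    "like moderately",
    "like very much",
    "like extremely",
]

# Successive digits, so that higher numbers mean more liking.
SCORES = {label: digit for digit, label in enumerate(HEDONIC_SCALE, start=1)}

def mean_scores(responses):
    """Average the numeric scores for each sample."""
    return {
        sample: mean(SCORES[rating] for rating in ratings)
        for sample, ratings in responses.items()
    }

responses = {
    "sample 417": ["like moderately", "like slightly", "like very much"],
    "sample 683": ["dislike slightly", "neither like nor dislike", "like slightly"],
}
means = mean_scores(responses)
```

The difference between the two means is the quantity of interest. The absolute values depend on the arbitrary digits assigned to the scale.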
C. MAGNITUDE ESTIMATION [17]
1. Scope and Application
a. This method is like the rating scale method in that its purpose
is to assign degrees of magnitude to stimuli on a specified
psychological continuum; however, it merits separate descrip-
tion because of the radically different way of obtaining the
scale, which is developed by the subject rather than being
given a priori by the experimenter.
b. The range of application has been limited. For the most part
it has been employed in laboratory experiments to investigate
the relation of physical stimulus intensity to perceived magni-
tude. Isolated instances of its trial use for measuring prefer-
ences have been reported. It would seem to have the potential
for use in a variety of problems.
a. The subject is first familiarized with the range of stimuli to
be presented, or, as with food preferences, he has a "built-in"
understanding of the range.
b. He is then instructed to conceive of this range as lying within,
or represented by, a numbering system (for example, 0 to 10,
0 to 100, or zero to infinity). Special emphasis is placed on
the necessity of using the system as a ratio scale. For example,
assigning 10 to a stimulus should mean that it seems ten
times as strong as a stimulus which is called 1, half as strong
as one which is called 20, and one tenth as strong as one
which is called 100. Physical reference samples with pre-
designated values may be included.
c. Stimuli are presented singly, and he assigns a number to each.
d. A single subject may give replicate judgments, or results may
be based on averages for a group of subjects.
3. Analysis of Data
Data may be analyzed in the same ways as other rating scale
data. Also, there are analyses specific to such data [17].
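The ratio-scale instruction and the averaging over a group of subjects can be sketched in Python. This is only an illustration under stated assumptions, not the analysis prescribed in reference [17]: the geometric mean is assumed as the averaging rule because it preserves ratios, and the function names and data are hypothetical.

```python
import math

def geometric_mean(estimates):
    """Geometric mean of positive magnitude estimates (assumed averaging rule)."""
    if any(x <= 0 for x in estimates):
        raise ValueError("estimates must be positive to take logarithms")
    return math.exp(sum(math.log(x) for x in estimates) / len(estimates))

def group_scale(judgments):
    """Average each stimulus over subjects.

    judgments maps a stimulus label to the numbers assigned to it,
    one per subject (hypothetical data layout).
    """
    return {stimulus: geometric_mean(values)
            for stimulus, values in judgments.items()}

# Hypothetical data: three subjects judge two stimuli on a ratio scale.
judgments = {"weak": [1, 2, 4], "strong": [10, 20, 40]}
scale = group_scale(judgments)
ratio = scale["strong"] / scale["weak"]  # how many times stronger "strong" seems
```

Each subject here uses a different range of numbers but the same ratio of ten to one, and the ratio survives the averaging. A judgment of zero cannot be used with this rule.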
D. RANKING METHODS--RANK ORDER [2]
a. Again, in theory, this third traditional psychometric method
could be used to evaluate on any psychological dimension;
however, its use in sensory testing of materials has been
limited because of the difficulty arising when it is necessary
to consider many stimuli at the same time.
b. It is more useful when the samples can be considered in succes-
sion with minimum time lag between them, for example, in
judging visual characteristics. With some odor and taste
stimuli it tends to create confusion. It has a compensating
advantage in accomplishing the general purpose of rating
scale evaluation when no suitable rating scale is available.
c. Usually the ranking task can be done more quickly than
evaluation by other methods. Thus, one of the main applica-
tions of the method is for rapid preliminary screening.
a. A number of samples are presented to the subject. His task
is to arrange them in order according to the degree to which
they exhibit some specified characteristic or according to his
feelings or opinions about them.
b. It is, in effect, an extension of the paired-comparison approach
beyond two samples.
3. Procedure
a. Again a necessary preliminary is to ensure that subjects
understand and agree upon the dimension or criterion of
evaluation.
b. Usually all samples are presented at the same time, and
sufficient material is provided so that a subject can check
back on his first impression. Samples may be presented singly
in succession; however, the latter procedure loses a major
advantage of the method.
c. The subject is instructed to proceed in a certain order based
on spatial arrangement (usually left-to-right) for the first
examination and to allow a given time between samples,
which varies according to the nature of the material and the
characteristic being judged. He is permitted, and usually
encouraged, first to assign a preliminary order and then to
check back on the placement of particular samples if he is in
doubt.
d. The number of samples may vary from three up to a limit of
about ten. The limit is dependent on span of attention and
memory as well as physiological considerations. The permissi-
ble limit is greater for trained than for untrained subjects.
With the latter no more than four to six samples should be
included. The number varies with the sense modality involved.
It is greatest for the judgment of visual factors, next for odor,
and least for taste.
As with paired comparisons, rank order results evaluate
samples only in relation to each other. To avoid this limitation,
the rank order method of testing may be combined with the use
of a rating scale.
5. Analysis of Data
(Section IV: Tables 2 and 3, chi-square analysis of rank order
data.)
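As a rough illustration of a chi-square treatment of rank order data, the sketch below computes rank sums and the Friedman statistic in Python. It is an assumption that this form corresponds to the tables in Section IV; the function names and data are hypothetical, and tied ranks are not handled.

```python
def rank_sums(rankings):
    """rankings[i][j] is the rank subject i gave sample j (1 = first place).
    Returns the sum of ranks for each sample."""
    k = len(rankings[0])
    return [sum(subject[j] for subject in rankings) for j in range(k)]

def friedman_chi_square(rankings):
    """Friedman statistic for n subjects ranking k samples (no ties).
    It is referred to chi-square with k - 1 degrees of freedom."""
    n = len(rankings)
    k = len(rankings[0])
    sums = rank_sums(rankings)
    return 12.0 / (n * k * (k + 1)) * sum(s * s for s in sums) - 3.0 * n * (k + 1)

# Hypothetical data: four subjects rank three samples.
rankings = [[1, 2, 3], [1, 2, 3], [1, 3, 2], [2, 1, 3]]
sums = rank_sums(rankings)               # [5, 8, 11]
statistic = friedman_chi_square(rankings)  # 4.5
```

The statistic says only whether the samples differ in rank position; as with the method itself, it gives no information about the size of the differences.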
E. FORCED CHOICE METHODS [6]
1. Scope and Application
These are the difference testing methods. The various forms
are used either for determining whether two products differ in
any way or whether they differ in regard to a specified dimension
or characteristic. These are the most sensitive methods, hence
are most applicable where differences are slight. When large
differences are involved, these methods are less useful than the
rating scale. A major application is in flavor quality control of
the production of beverages and foods.
Several variants of this general type are described. Their
common element is that each creates an arrangement of samples
representing a problem which the subject tries to solve. He is
forced to choose one sample, and this choice can be designated
as either correct or incorrect. When the frequency of correct
solutions is above the chance level, a difference is inferred.
3. Description of Methods
a. Triangle Test [18]
Three samples are presented either simultaneously or
successively. Two are the same, representing a single lot; the
third represents another lot and may be different. The subject
is required to pick the sample which he believes to be different.
b. Duo-Trio Test [19]
The set of samples is the same as in the triangle test; how-
ever, now one of the identical samples is identified as the
"control," and the subject is required to pick the unidentified
sample which is different from the control. The control is
always considered first. Usually the samples are presented
successively with a controlled time interval between them;
however, they can be presented simultaneously.
c. Dual-Standard Test [1]
This is like the duo-trio except that four samples are involved.
The subject is first given both samples, identified as
"control" and "other," and allowed to examine them. Then he
is given the same samples as unknowns and is required to
identify them.
d. Paired Difference Test [6]
This method employs a standard external to the test itself.
Two samples are presented, and the subject is required to
choose one of them on the basis of some specified charac-
teristic.
e. A-not-A or Single Stimuli [6]
The subject is permitted to study the two materials until
he believes he can identify them. Then he is presented with a
series of control and experimental samples, the order of which
has been determined randomly, and is required to identify
each.
f. Multiple Standards [6]
This test is designed for a special type of problem, where
the standard cannot be represented by a single product. The
typical question is whether a test lot of product differs generically
from a product type within which there is, or may be,
considerable variability. Several (preferably two to five)
standards representing the product type are presented to the
subject together with the test sample. The situation is described
to the subject, and he is instructed to select the sample which
differs most from all of the others.
4. Design
a. Certain basic features of experimental design apply to the
forced choice methods just as with any other method. For
example, it is necessary to balance sample presentation to
control for time error and position error (I.C.3e). There are,
however, certain special problems which arise with tests of
this type.
b. With both the triangle test and the duo-trio test it is necessary
to determine for any given test unit which of the two samples
should be duplicated and which should be the "odd" sample.
This can be done in two ways. Either one of the samples
can be selected for use as the control throughout the whole
test, or the two samples can be used alternately as the control.
Which procedure to adopt should be decided by the following:
(1) When one has no knowledge about the possible difference
and, therefore, cannot form a valid hypothesis, the two
samples should be used equally often as control, that is,
half of the individual tests should employ one sample as
control and half should employ the other.
(2) There are several cases where it is advantageous to use
the same control throughout, since it increases the probability
of discrimination.
(a) Experience has shown that subjects are more likely
to make correct judgments when the control repre-
sents a familiar flavor, rather than a new or strange
one. For example, the situation often occurs where
one wants to determine whether a process variation
or a formula change with a standard product has
changed its flavor appreciably. Most panel members
are likely to be familiar with the company's products
and have probably tested them many times. An
analogous situation is where production samples are
constantly being checked against an accepted stand-
ard. In cases such as this, the relatively well-known
standard product should be selected as the control.
(b) Subjects are more likely to make correct judgments
when the odd sample is stronger in flavor than the
paired samples. Hence, the "weaker" sample should
be used as the control whenever it is known, or
suspected, that the major difference between the
samples will be in regard to flavor strength. For
example, one may want to determine a tolerance for
the addition of a strong flavored ingredient to a
product. Here one would use the sample with the
lesser amount of the ingredient as the control.
(c) Subjects are more likely to make correct judgments
when the odd sample tends to be less pleasant than
the control. In cases where one can make a reasonably
accurate prediction of the direction of preference if
the difference is detected, the suspected "better"
sample should be used as the control. This criterion
may sometimes conflict with the weaker sample
criterion. In such a case the suspected weaker sample,
rather than the suspected "poorer" sample, should
be used as the control.
c. When a subject is to be given two or more forced choice tests
in immediate succession, it is necessary to control for what
may be called "expectation" effects. The subject may come to
expect the position or time-sequence of the samples in the
second, third, etc. tests to bear some logical relation to those
in the earlier tests. For example, if he judged the first unknown
to be the odd sample in the first of two duo-trio tests, he might
expect the second unknown to be the odd sample in the second
test. Normal control procedures for time error or position
error dictate that each possible sequence or pattern of positions
shall be used equally often in the course of a given test.
To control for expectation effect, it is important that the
allocation of sequences or patterns be done randomly and
that the subjects are aware that this is so.
d. The A-not-A presents a special problem of expectation effect,
since with the longer series, a subject is particularly likely
to divert his attention to trying to "figure out" the series.
For example, he may decide that the experimenter will alter-
nate the control and experimental samples or that, after a
number of controls, the next sample will be an experimental
sample, or that he will be given exactly equal numbers of
experimental and control samples during the series. The
identity of each sample in the sequence must be determined
randomly and independently. Note that this does not mean
that the two samples are served equally often in the series.
Again, subjects must be aware of how the sequences are
established.
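The two allocation rules above can be sketched in Python: a triangle test allocation in which every presentation order is used equally often but in random sequence, and an A-not-A series in which each sample is drawn independently. The function names, the seed, and the probability parameter are hypothetical choices made for illustration.

```python
import random
from collections import Counter

# The six possible orders when one of two samples, A or B, is duplicated.
# Across the six, A and B each serve as the duplicated sample three times.
TRIANGLE_ORDERS = ["AAB", "ABA", "BAA", "BBA", "BAB", "ABB"]

def balanced_triangle_allocation(n_tests, rng):
    """Use every order equally often, then randomize which test gets which."""
    if n_tests % len(TRIANGLE_ORDERS) != 0:
        raise ValueError("n_tests must be a multiple of 6 for exact balance")
    orders = TRIANGLE_ORDERS * (n_tests // len(TRIANGLE_ORDERS))
    rng.shuffle(orders)
    return orders

def a_not_a_series(length, p_experimental, rng):
    """Draw each sample independently; counts need not come out equal."""
    return ["experimental" if rng.random() < p_experimental else "control"
            for _ in range(length)]

rng = random.Random(1968)  # fixed seed only so the sketch is reproducible
allocation = balanced_triangle_allocation(12, rng)
counts = Counter(allocation)   # each of the six orders appears twice
series = a_not_a_series(10, 0.5, rng)
```

The balanced allocation controls time and position error, while the shuffle removes any pattern a subject could anticipate from one test to the next.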
a. Number of Tests
Guidelines on the number of samples that should be tried
by a subject in a single test session are set forth above (I.C.3f)
in the form of safe maximum numbers from the standpoint
of adaptation; however, these maxima are sometimes exceeded
in special applications. Difference tests are often used in
relatively constant situations where the same type of material
is tested by the same panel members over a long period. This
permits experimentation with the system to determine how
far the maximum number of tests may be extended. The limits
will depend upon the type of material to be tested, the training
and motivation of panel members, and the extent to which
one may be willing to sacrifice discrimination in the interests
of economy of testing. This may be especially useful in quality
control applications when the number of available panel
members is small. Such extensions of the test series should
not be adopted without first experimentally proving their
feasibility in the particular situation in which they are to be
used.
b. Comparison of Methods
The triangle test is probably used more often and by more
laboratories than the other forced-choice methods. This may
be due to its historical priority but also is due, in part, to its
ease of administration and the formal simplicity with which
the problem may be presented to the subject. It appears to
have no real advantage over the duo-trio from the standpoint
of efficiency and precision of discrimination. A panel will
perform better using the method with which it has had the
most experience. Therefore, it is recommended that in a
given situation one of these methods be selected and used
exclusively. The paired difference test will provide better
discrimination in those problems where it is appropriate;
however, its breadth of application is limited by the need for
a subjective standard external to the test. It cannot be used
unless there is assurance that all panel members understand
the criterion in the same way. The dual-standard test has seen
little use. The presence of the second standard, in theory,
should improve discrimination by providing additional in-
formation, but it also makes the test situation more complex.
Its greatest potential is in odor testing; in taste testing, the
extra sample is a more serious disadvantage. The A-not-A
test has been used only experimentally up to now but may be
valuable in special applications. The multiple standards test
should not be used with problems where the others will serve.
c. Complex Sorting Tasks
All forced-choice methods may be considered as "sorting
tasks"--even the paired difference test which requires only
the sorting of two objects into two classes. The essence of
the methods described above is simplicity. Tasks of increasing
complexity can readily be designed. For example, the subject
may be presented with eight samples, four of one kind and
four of another, and asked to sort them into their two classes.
Usually the merit alleged for such tests is their efficiency in
the sense that the probability of a fully correct solution by
chance alone is very low; hence, a difference can be proved
with only a few trials. However, it is also true that, as com-
plexity increases, so also does the probability that a person
will make errors even though he may have the proven capa-
bility of detecting the difference between the products. More-
over, the frequent failures associated with such tests tend to
affect motivation adversely. Thus, while complex sorting
tasks are valid for experimenting on perception and problem
solving, the more simple test forms should always be used
for regular, product-oriented work.
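The efficiency claim for the eight-sample task can be checked with a short calculation in Python. The sketch assumes the subject knows there are four samples of each kind and has only to separate the two groups, not name them; the function name is hypothetical.

```python
from math import comb

def chance_of_perfect_sort(n_per_class):
    """Probability of a fully correct two-class sort by guessing alone,
    assuming known class sizes and unlabeled groups."""
    ways_to_pick_one_group = comb(2 * n_per_class, n_per_class)
    distinct_splits = ways_to_pick_one_group // 2  # each split is counted twice
    return 1 / distinct_splits

p_sort = chance_of_perfect_sort(4)  # eight samples, four of each kind: 1/35
p_triangle = 1 / 3                  # chance of a correct triangle choice
```

Under these assumptions one perfect sort (about 0.029) already falls below the 5 percent level, whereas three consecutive correct triangle choices are needed to do the same (1/27, about 0.037). The calculation says nothing about the added risk of error from the harder task.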
d. Interpretation of Results
(1) The usual analysis of forced-choice data is to compare
the observed number of correct responses with the number
that theoretically would result from chance alone and to
calculate the probability of the occurrence of the observed
number. If that probability is low, we say that a difference
has been established. One is more certain of a result at
the 1 percent level than of one at the 5 percent level;
however, it is not valid to consider these levels of signifi-
cance as a measure of the degree of difference between
products because they are critically dependent upon the
number of trials. The appropriate measure of degree of
difference is the percentage of correct responses.
(2) Difference tests are almost always run with small trained
panels. This, together with the specialized test conditions,
means that positive results (that is, establishing a differ-
ence) are seldom, if ever, representative of what would
be found with the general population; however, negative
results can be projected. For example, a significant
percentage correct even beyond the 1 percent level
cannot be interpreted to mean that the typical consumer
will notice the difference, but only that it is possible. On
the other hand, a "no-difference" result provides rea-
sonably good assurance that the consumer will not find
any difference.
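The comparison with chance described in (1) can be illustrated with an exact binomial tail in Python. This is a sketch, not a substitute for the tables in Section IV; the function name and the example counts are hypothetical, and the chance levels are one in three for the triangle test and one in two for the duo-trio.

```python
from math import comb

CHANCE = {"triangle": 1 / 3, "duo-trio": 1 / 2}

def tail_probability(correct, trials, method):
    """Probability of at least `correct` right answers in `trials`
    judgments if every subject were guessing."""
    p = CHANCE[method]
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(correct, trials + 1))

# The same 70 percent correct with different numbers of trials.
small = tail_probability(7, 10, "duo-trio")   # not significant at 5 percent
large = tail_probability(35, 50, "duo-trio")  # significant beyond 1 percent
```

Both results show the same percentage correct, hence the same apparent degree of difference; only the certainty changes with the number of trials.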
6. Analysis of Data
(Section IV: Tables 4 and 5, t-test for proportions.)
F. THRESHOLD METHODS
1. Scope and Application
a. These methods are designed for the specific purpose of deter-
mining the minimum detectable level or concentration (abso-
lute threshold) of a substance or the minimum detectable
change in concentration (difference threshold). Level or
concentration implies a substrate, usually one which is rela-
tively neutral from a sensory standpoint (for example, the
threshold of sucrose in distilled water or the threshold of
mercaptan in air). To make generalization of the data mean-
ingful the purity of the substrate used, as well as the chemical
purity of the stimulus material, should be as high as possible.
Thresholds should not be used as a measure of above-threshold
relative intensity. There are two basic criteria of response:
one is the detection threshold, in which the subject has only
to respond to difference from some neutral background, and
the second is the recognition threshold, where the subject
must name the specific stimulus, for example, salt, sweet.
The detection threshold is generally lower than the recognition
threshold.
b. Establishing thresholds reliably is time consuming, and they
are used mainly in research or specialized applications.
Certain forms (for example, dilution methods) are used
routinely in studies of air contamination.
2. Preparation of Samples
A series of samples is prepared representing increasing
concentrations of the substance of interest in the selected
substrate. The series is such that it brackets the range in which
the threshold lies with six to ten steps. Use of a log series of
concentrations (that is, with fixed ratios) is recommended.
Preliminary examination and testing is required to locate
the appropriate range. When determining the absolute thresh-
old, the series starts at zero concentration. With difference
thresholds, it starts at a point definitely lower than the stand-
ard and goes to a point definitely higher than the standard.
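A log series with fixed ratios between steps can be generated as in the Python sketch below. The function name, its parameters, and the example range are hypothetical; the optional blank stands for the zero concentration at which an absolute threshold series starts.

```python
def log_series(lowest, highest, steps, include_blank=False):
    """Concentrations from lowest to highest with a constant ratio
    between successive steps."""
    if steps < 2 or lowest <= 0 or highest <= lowest:
        raise ValueError("need steps >= 2 and 0 < lowest < highest")
    ratio = (highest / lowest) ** (1 / (steps - 1))
    series = [lowest * ratio**i for i in range(steps)]
    # Zero cannot be a member of a log series, so it is prepended separately.
    return [0.0] + series if include_blank else series

# Hypothetical example: six steps from 1 to 32 units, each double the last.
series = log_series(1.0, 32.0, 6)
```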
3. Selected Methods
a. Constant Stimulus Differences [20]
This method may be used for either absolute threshold or
difference threshold measurements. For the absolute thresh-
old, the standard is zero concentration. Each sample is paired
with the standard, and the pairs are presented in random
order. The subject judges which sample in each pair is
stronger. The point in the series of concentrations where 75
percent of the judgments are correct is designated as the
threshold. For the difference threshold, a series of standards
(about four) is used, each one bracketed by its appropriate
range of concentrations. Again the subject judges which of
each pair is the stronger, and the difference threshold is the
point where 75 percent of the judgments are correct, that is,
agree with the direction of the physical difference.
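Locating the 75 percent point usually requires interpolating between two tested concentrations. The Python sketch below interpolates linearly on log concentration, which is an assumption suited to a log series and not a rule stated in this manual; the function name and data are hypothetical.

```python
import math

def threshold_75(concentrations, proportions, criterion=0.75):
    """Concentration at which the proportion of correct judgments reaches
    the criterion. Concentrations must be positive and ascending."""
    points = list(zip(concentrations, proportions))
    for (c_lo, p_lo), (c_hi, p_hi) in zip(points, points[1:]):
        if p_lo < criterion <= p_hi:
            fraction = (criterion - p_lo) / (p_hi - p_lo)
            log_c = math.log(c_lo) + fraction * (math.log(c_hi) - math.log(c_lo))
            return math.exp(log_c)
    return None  # the series does not bracket the criterion

# Hypothetical proportions correct at four concentrations.
threshold = threshold_75([1, 2, 4, 8], [0.5, 0.6, 0.9, 1.0])
```

Here the criterion falls halfway between the results at 2 and 4 units, so the estimate is their geometric midpoint, about 2.83 units.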
b. Method of Limits
This method is used for absolute threshold determination.
The subject is previously trained in what to look for. The
samples in the series are presented in order of physical con-
centration, and the subject judges the presence or absence of
the designated quality. Ascending series (starting with a
below-threshold stimulus) and descending series are usually
given alternately. The series is continued until the judgment
changes (from no to yes or vice versa) and stays the same for
two successive presentations. The starting point is varied
among successive series of presentations. Blanks (zero con-
centrations) may be used in the series. A single threshold is
the average of the values obtained in an ascending and
descending series.
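The stopping rule and the averaging of an ascending and a descending series can be sketched in Python. Taking the concentration at which the changed judgment first appears as the value for a series is an assumption, as is the use of the arithmetic mean; the function names and data are hypothetical.

```python
def transition_value(series):
    """series is a list of (concentration, detected) pairs in the order
    presented. Returns the concentration at which the judgment changes
    and is repeated on the next presentation, or None if it never does."""
    first_judgment = series[0][1]
    for i in range(1, len(series) - 1):
        concentration, judgment = series[i]
        if judgment != first_judgment and series[i + 1][1] == judgment:
            return concentration
    return None

def single_threshold(ascending, descending):
    """Average of the values from one ascending and one descending series."""
    up = transition_value(ascending)
    down = transition_value(descending)
    if up is None or down is None:
        return None
    return (up + down) / 2

ascending = [(1, False), (2, False), (4, True), (8, True)]
descending = [(16, True), (8, True), (4, True), (2, False), (1, False)]
threshold = single_threshold(ascending, descending)  # (4 + 2) / 2
```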
4. Analysis of Data
(Section IV L)
G. QUALITY ATTRIBUTE ANALYSIS

This problem area is basically different from that with
which the other methods are concerned. The difference is
reflected in the distinction, quantitative versus qualitative.
The other methods have the purpose of measuring known
or designated dimensions; here the objective is to identify
those dimensions which should be evaluated, then to deter-
mine their relative intensity. They are descriptive techniques
and provide a different kind of information, although many
of them are semi-quantitative, too. Their usual application
is to investigate products which are relatively unknown and
determine their characteristics from the perceptual standpoint.
2. Summary of Methods
Methods in this area are usually complex and specific to given
problems. There are various special techniques. (See IIIA, 5
and 6.)
a. A notable feature of methods of quality attribute analysis
is their variability. A wide variety of approaches are used.
Most of them incorporate certain of the basic methods de-
scribed above, for example, rating scales or paired com-
parison. Some concentrate on the problem of identifying a
broad range of attributes; others may be limited to considera-
tion of relatively few readily identifiable attributes. Some
attach great importance to verbal descriptions; others attempt
to eliminate the language component as much as possible.
b. In general, the use of language is very important in quality
attribute analysis. One must be concerned not only with what
is perceived but also with how the information is transmitted
by the test subject. This becomes another major source of
variability in the test results. The "one word--one meaning"
assumption is often made but may be unwarranted. One
should be aware of the possibility of semantic error and do
something about it. Devices commonly employed to eliminate,
or reduce, the error arising from this source include the use
of physical reference standards and intensive training of
panel members.
c. Statistical analysis has been a serious problem with methods
of quality attribute analysis. It has been difficult to develop
statistical models to represent their overall operation, al-
though statistical tests can be used with certain aspects of the
data from many procedures. It is more common to assess
reliability subjectively by judging whether replicate tests
produce reasonably similar information.
4. Data Analysis
(Refer to Section IV for analysis which pertains to the basic
method employed.)
III. Special Applications

1. Hedonic Scale Method [21]
a. Scope and Application
This is a rating scale method of measuring the level of
liking for foods; however, it is not restricted to foods but
can be used for almost any kind of product where affective
tone is believed to be important. The method relies on people's
capacity to report, directly and reliably, their feelings of like
or dislike. An important advantage is that it may be used
with untrained people as well as experienced panel members;
however, at least a minimum level of verbal ability is required
for adequate performance.
b. Summary of Method
Samples are presented in succession, and the subject is
told to decide how much he likes or dislikes each one and to
mark the scales accordingly. The essence of the method is its
simplicity. Instructions to the subject are restricted to proce-
dures. No attempt is made to direct his response. He is allowed
to make his own inferences about the meaning of the scale
categories and determine for himself how he shall use them
to express his feelings about the samples. A separate printed
scale is provided for each sample presented in the test session.
The scales may be grouped together on a single page or may
be on separate pages.
The scale is verbally anchored with nine categories, as
follows: like extremely, like very much, like moderately,
like slightly, neither like nor dislike, dislike slightly, dislike
moderately, dislike very much, and dislike extremely. These
phrases are placed on a line-graphic scale which may be
oriented either vertically or horizontally. The physical length
of the scale may vary over a considerable range with little
or no effect on the results.
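For analysis, the nine categories are commonly converted to successive digits. The Python sketch below follows the convention stated earlier for rating scales, starting at the end with the greatest negative feeling so that higher numbers mean greater liking; the names and the example responses are hypothetical.

```python
HEDONIC_CATEGORIES = [
    "dislike extremely",
    "dislike very much",
    "dislike moderately",
    "dislike slightly",
    "neither like nor dislike",
    "like slightly",
    "like moderately",
    "like very much",
    "like extremely",
]

# dislike extremely = 1 ... like extremely = 9
SCORES = {label: value for value, label in enumerate(HEDONIC_CATEGORIES, start=1)}

def mean_rating(responses):
    """Average score for one sample over the subjects' marked categories."""
    return sum(SCORES[r] for r in responses) / len(responses)

# Hypothetical responses from two subjects for one sample.
average = mean_rating(["like very much", "like slightly"])  # (8 + 6) / 2
```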
c. Special Considerations
(1) Variations in Scale Form--Many different forms of
hedonic scales may be used without major effect on the
value of the results as long as the essential feature of
verbal anchoring of clearly successive categories is re-
tained. Some of the alternate forms are:
(a) Elimination of the neutral category.
(b) Use of more like than dislike categories.
(c) Reduction in the number of categories (use of fewer
than five is not recommended).
(d) Replacement of the line-scale with a simple listing
of the verbal categories.
(e) Replacement of the verbal categories with caricatures
representing degrees of pleasure and displeasure
(Smiley scale).
Variations in scale form are likely to cause marked
changes in the distributions of responses and, conse-
quently, in such statistical parameters as means and
variances; however, relative measures tend to remain
constant.
(2) Reliability of Results--The levels of rating obtained on
the hedonic scale may be affected by many factors other
than the quality of the test samples, such as characteristics
of the subjects or of the test situation, transitory attitudes
or expectations of the subjects, etc. Consequently, one
should be cautious about making inferences on the basis
of comparison of average ratings obtained in separate
experiments. This is permissible only when large numbers
of subjects have participated and test conditions have
been comparable. However, the relative preference levels
among samples tested together are usually found to be
very constant from one test to another.
2. Rating Scale Evaluation of Intensity
This method is designed to measure the perceived intensity
of some specified characteristic or attribute of a material.
The dimension of evaluation may be general (for example,
overall intensity of odor or flavor) or specific (for example,
hydrogen sulfide odor or sweetness of a beverage). It is not
to be used for determining the absolute threshold. It may be
used with any material or product and for any attribute
which can be clearly understood by the subjects.
Trained subjects, who have been specifically instructed
in regard to the attribute to be evaluated, are served a series
of samples. Each sample is rated for intensity on a 9-category
interval scale with alternate points anchored as follows:
none, slight, moderate, large, and extreme. Scales of other
lengths are often used.
c. Special Considerations
Usually all samples served in the same test session are to
be evaluated on a single dimension or attribute; however,
the method may also be used to evaluate multiple qualities.
If evaluating more than one attribute, more than one scale
should be used. Subjects must be instructed regarding all
dimensions.
3. Flavor Quality Control in the Production of Beverages [22]
There are many methods of quality control currently in
use. One method provides the basis for assuring flavor uni-
formity in commercial scale production of beverages. Produc-
tion lots are tested against a previously selected standard to
determine whether there is a detectable difference in flavor.
This information is used to decide whether the product is
suitable for release. It is most useful with a product whose
flavor is homogeneous and whose quality is reasonably
constant. When flavor characteristics tend to be variable or
quality is subject to wide fluctuations, it must be supported
by some other method designed to evaluate on the criterion
of relative quality.
A standard is selected and a procedure established for
physically replenishing it without flavor change. Each new
production lot is tested against the standard using the duo-
trio method and a selected, trained panel. The physical
standard is presented as the initial identified member of the
set. The panel should consist of fully qualified subjects. A
subject gives two judgments on a given test at a single session.
A complete test leading to the initial decision that the produc-
tion lot does not differ from the standard consists of at least
20 judgments from the panel. When results from a 20-judg-
ment test suggest that there may be a difference, but fail to
prove it by accepted statistical criteria, the test may be ex-
tended to 30 or 40 judgments. Customarily a lot is considered
rejected when the extended test indicates a significant differ-
ence between the standard and test sample at or beyond the
5 percent level.
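The number of correct duo-trio judgments needed to reach the 5 percent level at each test size can be worked out with an exact binomial calculation, sketched below in Python. The function names are hypothetical, and the tables in Section IV remain the reference for actual decisions.

```python
from math import comb

def upper_tail(correct, trials, p=0.5):
    """Chance of at least `correct` right answers from guessing alone
    (p = 1/2 for the duo-trio test)."""
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(correct, trials + 1))

def min_correct(trials, level=0.05):
    """Smallest number correct that is significant at or beyond `level`."""
    for correct in range(trials + 1):
        if upper_tail(correct, trials) <= level:
            return correct
    return None

# Test sizes named in the procedure: the initial 20 and the extensions.
critical = {n: min_correct(n) for n in (20, 30, 40)}  # {20: 15, 30: 20, 40: 26}
```

A lot with, say, 14 correct out of 20 falls just short of the criterion, which is the kind of result that would lead to extending the test.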
(1) Testing of Multiple Products--For efficiency, when there
are many lots to be tested, a subject may be asked to
make up to six judgments at a single session, consisting
of replicate judgments on each of three tests. A suitable
recovery period is required between tests, and a water
rinse is used. Whenever possible in such cases, the weaker-
flavored materials are tested first.
(2) "Warm-Up" Sample--With strong-flavored materials
(for example, alcoholic beverages), a conditioning sample
may be used consisting of one of the test samples pre-
sented before the trio of test samples. The subject tastes
it, rinses his mouth, then starts with the critical samples.
(3) Control Features--Strongly flavored materials may be
diluted with taste-neutral water for testing. Samples
should be small but adequate, and their quantity must
be controlled within narrow limits. For a given product
and panel, a time interval between samples is stand-
ardized, and this must also be controlled within a narrow
range.
(4) Panel Performance--Performance of individual panel
members, even after initial selection, should be con-
tinuously reviewed. Anyone whose average percentage
correct is consistently below the average for the whole
panel should be replaced.
4. Flavor Profile Method [23,24]
The primary use of this method is to describe the aroma
and flavor characteristics of food products. It may also be
used with non-foods possessing aromatic characteristics and
has been used to describe texture. It can be used in its entirety
to provide a complete description of a sample or to show
differences among a group of samples; also, it may be used
to identify the specific note, such as an off-flavor, or to show
changes in intensity of a particular quality.
This is a qualitative and semi-quantitative method of
describing an aroma or flavor complex in terms defined below.
(1) Character Notes--perceptible factors, including aro-
matics, tastes, and mouth feelings, described in qualitative
or associative terms.
(2) Intensity--as rated on a scale with categories anchored
as: barely perceptible, slight, moderate, and strong.
Additional points may be inserted between the anchors,
where small differences are anticipated.
(3) Order of Appearance--time sequence in which the various
aroma or flavor components are detected.
(4) Aftertaste--sensory impressions remaining after the
stimulus has been removed from the mouth.
(5) Amplitude--initial overall intensity impressions including
both the separately identifiable factors and the underlying
unidentifiable part of the flavor complex. It is based on
fullness, degree of blending, quality of separate factors
appearing as either first or last impressions, and appropriateness
of factors for the product. A frame of reference
is developed by examining samples representing the
product-type.
c. Procedure
(1) Orientation Prior to Formal Panel--This involves one or
two informal sessions (more for an inexperienced panel),
where the panel leader outlines the objectives of the
project and presents samples of the test product, including
similar market products where available, to the panel
for study. Optimum conditions for handling and testing
samples are established, and a vocabulary of descriptive
terms is developed through discussion and use of refer-
ence standards.
(2) Formal Panel Sessions--These consist of, first, a closed
session in which each panel member independently
examines the samples and records his findings, followed
by an open session for reporting and discussion of in-
dividually recorded findings. Language differences are
resolved, ideas are exchanged, disputed points are sorted
out for future study, and future sessions are planned.
(3) Analysis and Interpretation--This is the responsibility
of the panel leader, who should be capable of expressing
the panel results so that they are meaningful to others.
Normally the results are not treated statistically, but
statistical assessment is possible and sometimes is used.
d. Panel Members
(1) Selection--Panel members are selected on the basis of
interest, ability to identify the basic tastes, ability to
recognize and describe a series of odor stimuli, and
ability to discriminate flavor intensities.
(2) Training--This requires a formal initial phase consisting
of lectures and demonstrations to provide background
information on odor and taste perception, techniques
of evaluating different types of products, use of reference
standards, and development of descriptive vocabulary.
This is followed by months of daily practice sessions to
provide the opportunity to apply this knowledge, develop
individual skills, and to learn to work as a group. Any
panel member, once trained, is qualified to train new
panel members.
5. Quality Attribute Check List [25]
This is an exploratory method which may be used to develop
information about the characteristics of a product that may

be important to consumers. The results should not be con-
sidered as definitive measures.
A list of the sensory attributes that may apply to the
product type is developed. Samples are examined by the
subject who indicates those characteristics which he believes
apply. Sometimes the intensities of characteristics are also
indicated.
c. Procedure
(1) The first step is development of the list. Its length and
scope may vary with the test purpose. It may include
only a limited number of qualities, such as those which
are most likely to occur or those the experimenter is
interested in, or it may expand to include every charac-
teristic that might conceivably apply.
(2) A printed list is provided, usually with the attributes
grouped according to some logical scheme, for example,
by odor, texture, appearance, etc.
(3) Samples are served singly and in fairly large amount if
the list is extensive. While examining the sample and
checking back as often as necessary, the subject goes
through the list and checks those attributes which he
believes apply.
(4) Results are stated in terms of the percentage of times
each attribute is checked.
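As a minimal sketch of step (4), the tally below reports the percentage of subjects who checked each attribute. The attribute names and the four response sets are hypothetical, invented only for illustration, and the function name is likewise an assumption.

```python
from collections import Counter


def checklist_percentages(responses):
    """Return the percentage of subjects who checked each attribute.

    `responses` holds one set of checked attributes per subject.
    """
    counts = Counter(attr for checked in responses for attr in checked)
    n_subjects = len(responses)
    return {attr: 100.0 * count / n_subjects for attr, count in counts.items()}


# Hypothetical data: four subjects, invented attribute names.
responses = [
    {"sweet", "crisp"},
    {"sweet"},
    {"sweet", "stale"},
    {"crisp"},
]
percentages = checklist_percentages(responses)
print(percentages)
```

Because the method is exploratory, these percentages are best read as pointers to attributes worth further study, not as definitive measures.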
6. Flavor and Odor Characterization
This is a technique designed to evaluate flavors and odors
and to differentiate between them qualitatively as well as
quantitatively. The evaluation is accomplished by rating
individual characteristics which can be recognized in complex
flavors and odors.
A list of pertinent flavor and odor characteristics is devel-
oped, sometimes with the help of reference compounds. A
panel of nine to twelve trained subjects is used, although as
few as five panel members will suffice. Each panel member
rates the perceived characteristics on separate (9-point)
intensity scales. The tabulated data represent a flavor or odor
pattern.
c. Procedure
In an initial round-table discussion, the panel members
indicate what characteristics or attributes are perceived in
the test samples. The suggested characteristics are discussed,
so that the subjects are familiar with the meaning of the terms
used. The terms are edited to a usable number (generally no
more than twelve), taking into account both desirable and
undesirable characteristics. Score sheets are prepared which
provide for rating each attribute separately on a (9-point)
intensity scale with categories ranging from "none" to "very
strong". The subjects evaluate the test samples for the selected
characteristics.
Samples are coded and randomly presented to subjects
who are seated in individual booths. Test results are not
discussed with subjects as a group, and subjects are not
allowed to change scores once the test is completed.
The results may be analyzed by inspection, which means
that the averages are compared without further mathematical
treatment. Also, they may be analyzed by the same methods
which are applicable to other rating scale data.
7. Food Action Scale (FACT) Method [26]
This is a rating scale method of measuring the level of
acceptance of food products by a population. The method
relies on people's capacity to report, directly and reliably,
their attitudes and predicted actions toward a food stimulus.
It is primarily designed to be used with untrained consumers,
although it can be used with experienced panel members. A
minimum level of verbal facility is required for adequate
performance. The scale is not applicable to use for rating of
specific characteristics but rather for a measure of general
attitude toward the stimulus.
Samples are presented in succession, and the subject is
told to decide which of nine statements on a scale best repre-
sents his attitude toward the product. He is allowed to make
his own inferences about the meaning of the scale categories.
A separate printed scale is provided for each sample presented
in the test session.
The scale has nine categories verbally anchored as follows:
I would eat this every opportunity I had.
I would eat this very often.
I would frequently eat this.
I like this and would eat it now and then.
I would eat this if available but would not go out of my
way.
I don't like it but would eat it on an occasion.
I would hardly ever eat this.

I would eat this only if there were no other food choices.
I would eat this only if I were forced to.
These phrases are placed on a line-graphic scale which may
be oriented either vertically or horizontally.
(1) Variations in Scale Form--The word "eat" in the scale
may be replaced by "drink", "buy", or "use".
(2) Reliability of Results--The scale provides reliability at
the same level as that of the hedonic scale. The same
precautions as to generality of results apply to the FACT
scale as to the hedonic scale.
8. Triangle Test--Degree of Difference [27]
This is a special purpose procedure to be used in difference
testing where large differences are anticipated. In such cases
a high proportion of choices are likely to be correct, and this
combination technique provides a useful extension of the
triangle test. The method can be used with any difference
problem; however, for most tests the additional information
is usually not worth the extra time and effort. The method
has been used infrequently.
This is simply a combination of two methods already
described--the triangle test (II.E.3a) and a rating scale
(II.B).
c. Procedure
(1) The subject takes the triangle test in the usual way and
then is asked to rate the amount or degree of the perceived
difference, using the 9-point intensity scale with alternate
points anchored at none, slight, moderate, large, and
extreme.
(2) The triangle test results are interpreted in the usual way
in terms of the percentage of correct responses.
(3) The scale results are analyzed separately, using only the
ratings associated with correct triangle judgments.
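The two-part analysis in steps (2) and (3) can be sketched as follows. This is an illustration only: the five judgments are hypothetical, and the 9-point scale is assumed to be scored 1 to 9.

```python
from statistics import mean


def degree_of_difference(judgments):
    """Summarize triangle results with degree-of-difference ratings.

    `judgments` is a list of (correct, rating) pairs, where `correct` is
    True when the odd sample was identified and `rating` is the score on
    the 9-point scale (assumed here to run from 1 to 9).
    """
    correct_ratings = [rating for correct, rating in judgments if correct]
    pct_correct = 100.0 * len(correct_ratings) / len(judgments)
    # Only ratings tied to correct triangle judgments enter the scale analysis.
    mean_rating = mean(correct_ratings) if correct_ratings else None
    return pct_correct, mean_rating


# Hypothetical judgments, invented for illustration.
judgments = [(True, 7), (True, 5), (False, 3), (True, 6), (False, 1)]
pct_correct, mean_rating = degree_of_difference(judgments)
print(pct_correct, mean_rating)
```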
9. Triangle Test--Characterization of Difference [28]
This is another special purpose variant of the triangle test
designed to provide additional information (that is, beyond
the proportion of correct choices). It is most useful when used
routinely in quality control work or in the examination of
new products. Its primary advantage is efficiency, since, when
the basic triangle test finds a difference, the characterization

technique may provide clues as to the nature of the difference,
obviating the need for a special test.
This is a combination of the basic triangle method with a
simplified version of the flavor characterization method.
c. Procedure
(1) A list of sensory characteristics, judged to be important
in connection with the product type which is being tested,
is prepared.
(2) This list is shown on the questionnaire in a form which
asks for a paired-comparison judgment between the
odd sample and the two identical samples with an "equal"
response permitted. For example:
darker lighter equal
sweeter not as sweet equal
stronger vinegar weaker vinegar equal
spicier not as spicy equal
(3) The subject takes the triangle test in the usual way and
then is required to try to describe the difference between
the odd sample and the identical pair by checking each
of the 3-point characteristic scales in answer to the
question, "The different sample was . . . . "
d. Analysis and Interpretation of Results
(1) The triangle test results are analyzed in the usual way.
(2) For analysis of the flavor characterization results use
only the data associated with correct judgments on the
triangle test.
(3) They can be analyzed formally as paired-comparison
data, or they may be simply examined to obtain qualita-
tive impressions of possible trends.
10. Dilution Techniques [29]
Dilution methods represent the application of threshold
measurement techniques to practical problems. The methods
may vary with regard to certain particulars but are basically
the same. Their purpose is to obtain a measure of the differ-
ence between test products and a standard, when the differ-
ences are so large that one of the forced-choice methods
would not be appropriate. Such methods have been used with
dried whole milk, dried eggs, perfumes, and other products.
An appropriate standard product is selected. A series of
dilutions of the test product in the standard is made up such
that the weakest concentration is not perceptibly different

from the standard, but the strongest is definitely different
from it. The members of this series are tested against the
standard by a trained panel using the method of constant
stimulus differences to determine the weakest concentration
which is perceived as different.
c. Procedure
(1) Select the reference standard.
(2) Establish an upper limit for the series, that is, the highest
concentration which it is reasonable to subject to formal
testing. This has to be done by trial and error, but it is
usually safe to rely on the judgments of just two or three
people to indicate the lowest strength which will be
definitely perceptible.
(3) Decide on a lower limit of concentration for the series
such that it is very unlikely that any panel member will
be able reliably to detect a difference between it and the
standard.
(4) Define a series of concentrations including those which
represent the upper and lower limits. Usually six are
enough, although eight may be used if greater precision
is desired. A log series is more efficient in most cases,
although a series based on arithmetic progression can
be used.
(5) Test the series of concentrations against the standard
using the method of constant stimulus differences (II.F.3).
Use a trained panel and obtain 15 to 20 judgments for
each member of the series.
(6) The threshold, obtained by the method described above,
becomes the measure of the strength of the test product
and, by inference, also a measure of the degree of differ-
ence between it and the standard.
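A log series, as mentioned in step (4), is one in which each concentration is a constant multiple of the one before it. The sketch below generates such a series between chosen limits. The limits of 1 and 32 (in arbitrary concentration units) are hypothetical, and the function name is an assumption made for illustration.

```python
def log_series(lower, upper, steps=6):
    """Return `steps` concentrations from `lower` to `upper` inclusive,
    spaced so that successive members differ by a constant ratio."""
    ratio = (upper / lower) ** (1.0 / (steps - 1))
    return [lower * ratio ** i for i in range(steps)]


# Hypothetical limits: weakest member 1 unit, strongest 32 units.
series = log_series(1.0, 32.0, steps=6)
print([round(c, 3) for c in series])
```

With six members between these limits each concentration is double the previous one. Passing `steps=8` gives the finer series suggested when greater precision is desired.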
d. Special Considerations
(1) The triangle or duo-trio tests may be used to test the
concentrations individually. This will give greater preci-
sion but may require more time and effort. Testing should
start with a sample in the middle of the series, then
proceed to stronger or weaker concentrations until a pair
of adjacent samples is found such that one is significantly
(5 percent level) different from the standard, while the
next lower concentration fails to reach that level. Obtain
20 judgments for each concentration. When this procedure
is employed, the threshold is considered as that concen-
tration where the frequency of correct judgments just
reaches the 5 percent significance level. It may be found

by interpolation between two members of the series, if
no concentration gives exactly the number of correct
judgments required.
(2) The amount of testing required by the method of constant
stimulus differences may be reduced, after obtaining a
few judgments on the full series, by eliminating very
strong samples which are always correctly identified.
(3) The appropriateness of the reference standard has to be
a matter of judgment. It must be meaningful in relation
to the product and the problem represented. This cannot
be specified in advance; however, some examples are
fresh whole milk for testing whole milk powder, or fresh
eggs for testing stored dried eggs. In some situations
distilled water may be a meaningful standard, as in testing
the pungency of spices (Scoville test).
(4) An isolated result has little practical value. Thus, the
method is appropriate only when there is need for evaluat-
ing and comparing a number of treatments of the same
product type. This fact has implications for selection of
the reference standard. It must be a material whose flavor
characteristics will not change from one source of supply
to another.
IV. Statistical Procedures

This section presents certain methods that are necessary or particularly
useful for analysis of the kinds of data dealt with in sensory testing.
Some general background information is included; however, complete
exposition of methods has not been attempted. Instead the objective is
to provide guidance for the worker who has no extensive training in
statistics but may be required to analyze and interpret test results. This
section is referred to as appropriate throughout the manual. References
are given for those who feel the need for further information about the
techniques.
A. DEFINITIONS
1. This section is included for the convenience of the reader. It
summarizes what is believed to be key information; some of it
is basic material that would eventually be learned in making
routine applications of procedures. Terms that are frequently
encountered in statistical analyses are defined and in some
cases are described briefly in terms of operations.
2. Two aids are included:
a. Glossary of Statistical Notations (p. 62).
There is variation among different authors in regard to the
symbols used to denote certain quantities and statistics. This
glossary defines the symbols as used in this manual. An
attempt has been made to follow the most common practices.
b. Computation of Certain Statistics (Chart A, p. 43).
An arbitrary set of data is given, and the calculation of a
number of the measures which are described below is
demonstrated.
Chart A--Computation of Certain Statistics.
Given sets of scores for A and B as follows (N = 10):
        X_A    X_A^2    X_B    X_B^2
         5      25       3       9
         6      36       1       1
         4      16       4      16
         3       9       5      25
         5      25       6      36
         6      36       4      16
         4      16       2       4
         4      16       5      25
         3       9       4      16
         7      49       3       9
Sum     47     237      37     157
Mean:
$$\bar{X}_A = \frac{\sum X_A}{N} = \frac{47}{10} = 4.7$$
$$\bar{X}_B = \frac{\sum X_B}{N} = \frac{37}{10} = 3.7$$
Variance:
$$s_A^2 = \frac{N \sum X_A^2 - (\sum X_A)^2}{N(N - 1)} = \frac{10 \times 237 - 47^2}{10 \times 9} = 1.789$$
$$s_B^2 = \frac{N \sum X_B^2 - (\sum X_B)^2}{N(N - 1)} = \frac{10 \times 157 - 37^2}{10 \times 9} = 2.233$$
Standard deviation:
$$s_A = \sqrt{s_A^2} = \sqrt{1.789} = 1.337$$
$$s_B = \sqrt{s_B^2} = \sqrt{2.233} = 1.494$$
Standard error of mean:
$$SE_A = \frac{s_A}{\sqrt{N}} = \frac{1.337}{\sqrt{10}} = 0.423$$
$$SE_B = \frac{s_B}{\sqrt{N}} = \frac{1.494}{\sqrt{10}} = 0.472$$
Standard error of the difference between the means:
$$SE_{(A-B)} = \left(\frac{s_A^2 + s_B^2}{N}\right)^{1/2} = \left(\frac{1.789 + 2.233}{10}\right)^{1/2} = 0.6342$$
3. Commonly used measures of central tendency in a distribution
of data:
a. Mean (Chart A) : the arithmetic average of a set of values.
b. Median: the midpoint of an array of scores when they are
arranged from the lowest to the highest. It is the point such
that one half of the scores are higher and one half are lower.
c. Mode: the value of the score which occurs most frequently
in a set.
4. Commonly used measures of variation in a distribution of data
(Chart A):
a. Variance: the average of the squares of the differences between
the individual scores and the average score. Definition for-
mula
$$s^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$$
It also may be expressed in terms of raw scores, eliminating
the need to subtract each score from the mean of the scores
$$s^2 = \frac{N \sum X^2 - (\sum X)^2}{N(N - 1)}$$
where:
$\bar{X}$ = average score,
X = any score, and
N = number of scores.
b. Standard Deviation: the square root of the variance. This
statistic has special importance in describing distributions of
data and in computing the statistical significance of results
(Chart A).
c. Standard Error of the Mean: this is a key concept in many
tests of significance. It may be implied, even though it is not
directly involved in the computational formula. It is directly
related to the standard deviation and variance by the formula
$$SE = \frac{s}{\sqrt{N}}$$
It takes into account both the variation in the distribution

and the number of cases. It may be considered as the standard
deviation (estimated) of the distribution of means which
would be expected if the experiment were repeated ad in-
finitum. The standard error concept also applies to propor-
tions (percentages).
d. Standard Error of the Difference: this is a measure of the
degree to which the difference between two means would be
expected to vary due to random error. It is estimated from the
standard errors of the two distributions by the following
formula, which is applicable only when the distributions are
independent
$$SE_{(A-B)} = (SE_A^2 + SE_B^2)^{1/2}$$
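The measures defined above can be computed directly from raw scores. The following sketch, offered as an illustration and not as a prescribed procedure, applies the raw-score variance formula to the two score sets of Chart A. The function and variable names are assumptions.

```python
import math


def summary(scores):
    """Return mean, variance (N - 1 divisor), standard deviation,
    and standard error of the mean for a list of scores."""
    n = len(scores)
    total = sum(scores)
    total_sq = sum(x * x for x in scores)
    mean = total / n
    # Raw-score form of the variance: no need to subtract the mean first.
    variance = (n * total_sq - total ** 2) / (n * (n - 1))
    sd = math.sqrt(variance)
    se = sd / math.sqrt(n)
    return mean, variance, sd, se


# Score sets A and B from Chart A.
x_a = [5, 6, 4, 3, 5, 6, 4, 4, 3, 7]
x_b = [3, 1, 4, 5, 6, 4, 2, 5, 4, 3]

mean_a, var_a, sd_a, se_a = summary(x_a)
mean_b, var_b, sd_b, se_b = summary(x_b)

# Standard error of the difference (independent sets only).
se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
print(round(var_a, 3), round(var_b, 3), round(se_diff, 4))
```

Results computed without intermediate rounding may differ from the chart in the last decimal place.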
B. CONCEPTS
1. Degrees of Freedom (df) [30]
This is a key concept, and the symbol df is frequently en-
countered in statistical writing and statistical tables. Although
discussion of its rationale would not be appropriate to the level
of this manual, a brief exposition of its central meaning may be
helpful to the reader.
Degrees of freedom has to do with the functional independence
of measures. For example, in the expression, A + B = 0, A and
B are functionally dependent, since setting the value of either one
determines the value of the other. Thus, when n measures are
expressed as deviations from a sample mean, all measures except
one are free to vary if we place the restriction on the data that
the sum of the deviations must equal zero. In general, the number
of degrees of freedom is equal to the number of measures minus
the number of algebraically independent linear restrictions
occurring in that set of measures.
The t-test may be used to demonstrate the idea. The formula
for comparing two independent sets of measures is
$$t = \frac{\bar{X}_A - \bar{X}_B}{SE_{(A-B)}}$$
If:
$n_A$ = number of measures in Set A, and
$n_B$ = number of measures in Set B, then
$df = n_A + n_B - 2$.
The denominator of the t-ratio is based on the squared devia-
tions of the scores from the mean for each set of data. Thus one
algebraic restriction has been placed on each set of data; there-
fore, the total degrees of freedom is found by subtracting 1 from

each n and adding the results.
2. Null Hypothesis
This hypothesis is central to the testing of statistical signifi-
cance. Although many kinds of hypotheses can be made and
statistically tested, the most common, and the only one with
which we are concerned here, is the null hypothesis. This working
hypothesis states that there is no real difference between two
means, proportions, distributions, etc. Then tests are applied to
determine how reasonable the hypothesis is. This is stated in
terms of significance level, which tells one the probability that
one would be wrong in rejecting the hypothesis.
3. Statistical Significance
A brief, nontechnical, explanation of statistical significance is
that it shows the probability, usually expressed as a percentage,
that a given result could have occurred by chance. It is closely
associated with the null hypothesis. For example, to assert
significance at the 1 percent level means that there is only one
chance in 100 that a result this extreme would occur if the null
hypothesis were true. Occasionally, significance will be expressed
as confidence level,
which is simply the complement, for example, the 95 percent
confidence level is the same as the 5 percent significance level.
The explanation above is concerned with estimating what is
called "Type I error", which is the "error" of finding a difference
when, in fact, there is none. This is the most common application
of the concept, and the only way statistical significance is used in
this manual. Another aspect of statistical significance is con-
cerned with "Type II error", which is the "error" of failing to
find a difference when there is a real difference.
4. One-Tailed versus Two-Tailed Tests [31]
This has reference to the way one understands the situation
represented in his data and, more importantly, to the way one
uses the standard tables. The "tail" terminology is related to the
distributions involved in tests of significance. For instance, the
hypothesis that two means are different uses a two-tailed test.
However, the assertion that one mean is greater than or lesser
than the other mean requires a one-tailed test. A two-tailed test
involves no assumptions about the difference, if it exists; how-
ever, the one-tailed test goes a step farther since it involves an
assumption about the direction of the difference. If there is any
doubt about which test is appropriate in any given situation, use
the two-tailed test.
C. LIMITATIONS AND QUALIFICATIONS
1. What Is Significant?
Long usage has given the 5 percent level a special status. It is
most often considered as the cutoff point between a "real"

difference and one which can be accepted only with reservation,
but this is convention only. Obviously, a result which just misses
the 5 percent level (say 6 percent) is very little different from the
situation where it attained the 5 percent level. Choice of the
probability level is the prerogative of the experimenter and should
depend upon the circumstances of the experiment and the way
the results are to be used.
2. Multiple Tests of Significance
Sometimes there is a need for many tests of significance on the
same data, or on related sets of observations. This is permissible;
however, one must keep in mind the meaning of statistical significance.
The 5 percent level implies that, when there is no real difference,
a result this extreme will still occur by chance about five times in
100. As one continues to make and test hypotheses, the probability
of drawing at least one such chance result increases. For example,
if just one of 20 tests attains the 5 percent level, there is no
reason to conclude that there is anything
special about the case, because this is just what chance would
allow. Such multiple testing is sometimes frowned upon but is
often done. There are special ways of compensating for the altered
probabilities (IV.H.3, 4, and 5).
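The arithmetic behind this caution can be sketched in a few lines. The calculation assumes that the tests are independent of one another and that no real differences exist. Neither assumption is stated in the text, and related tests on the same data will generally not be fully independent.

```python
def chance_of_false_positive(alpha, n_tests):
    """Probability that at least one of `n_tests` independent tests reaches
    significance level `alpha` when no real differences exist."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Twenty tests at the 5 percent level.
expected_hits = 0.05 * 20          # average number of chance "significant" results
p_any = chance_of_false_positive(0.05, 20)
print(expected_hits, round(p_any, 3))
```

Under these assumptions one chance "significant" result is expected on average among 20 tests, and the probability of obtaining at least one is roughly 0.64.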
3. Reliability of Results
One way of interpreting significance is in terms of what one
would expect to happen if the experiment were repeated. For
example, a 5 percent level difference between the averages of two
experimental treatments suggests that the finding of "no differ-
ence" would be unlikely; it should happen no more often than
once in twenty repetitions. However, finding a significant dif-
ference does not mean that one should expect to find a difference
as large or at the same level of significance whenever the experi-
ment is replicated. It means only that one should expect to find
a difference greater than zero and in the same direction as before.
4. Theoretical Basis for Statistical Analyses
Most statistical analyses are based upon certain assumptions
about the data. For example, most assume that the data, or
random errors in the data, are normally distributed. Theoretical
statisticians are often concerned about whether these assumptions
are actually met, but the users of statistical computations in this
manual need not be concerned with this.
D. REFERENCE TO PREPARED TABLES
Tables have been developed for rapid determination of statistical
significance in certain kinds of situations. Many of these have been
published. All have been developed according to the standard for-
mulas for the various statistics. They are great time savers whenever
they will apply to a set of test results. They are usually limited in the
range of the numbers of subjects (or responses) covered. Five such
tables are reproduced in this manual and are discussed below.
1. Table 1--Significance of Paired-Comparison (Binomial) Results--
Two-Tailed
a. This table is for use in situations where either of the two sam-
ples may be chosen and where the chance probability is 50 to
50 percent. The preference test is the typical situation; how-
ever, it also applies to tests where comparison has been made
on the basis of other factors such as "which is sweeter",
"which is stronger", etc. The key point is that choice of either
sample, not just one of the samples, is permitted by the condi-
tions of the test.
b. Examples of Use
(1) A preference test was run with 50 subjects; 34 preferred
Sample A and 16 preferred Sample B. Enter the table at
50 in the first column and observe that 34 exceeds the
value in the 5 percent column but is less than the value in
the 1 percent column. Therefore, the result is considered
significant at the 5 percent level.
(2) A small-scale test was run to determine whether two
samples differed in degree of saltiness. Out of 16 subjects,
12 chose Sample A and 4 chose Sample B. Entering the
table at 16 in the first column, it will be seen that 13
choices are required for significance at the 5 percent level.
Therefore, one would conclude that this result was not
significant.
c. When the number of judgments exceeds the range of Table 1,
use the t-test for percentages (IV.E.4).
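The two examples above can also be checked by direct calculation. The sketch below computes an exact two-tailed binomial probability for a 50 to 50 chance split. It assumes that the tabled values rest on the exact binomial distribution, which the text does not state. The function name is hypothetical.

```python
from math import comb


def two_tailed_p(k, n):
    """Exact two-tailed probability of a split at least as lopsided as
    k versus n - k when each choice has chance probability 1/2."""
    larger = max(k, n - k)
    upper_tail = sum(comb(n, i) for i in range(larger, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


p_preference = two_tailed_p(34, 50)   # 34 of 50 preferred Sample A
p_saltiness = two_tailed_p(12, 16)    # 12 of 16 chose Sample A
print(p_preference, p_saltiness)
```

On this basis the preference result falls between the 1 and 5 percent levels, and the saltiness result does not reach the 5 percent level, in agreement with the readings from Table 1.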
2. Tables 2 and 3--Significance of Rank Order Test Results [32]
a. These tables are for use in determining the significance of
rank order results. They cover the range of situations repre-
sented by the ranking of 2 to 12 samples by 2 to 20 subjects.
(It may be noted that the "2 samples" column gives the same
information as Table 1).
b. How to Use
Assign the values 1, 2, 3, 4, etc. to the successive ranks. Add
these values for each sample across all subjects. Determine the
highest and the lowest values for any sample. Enter the first
column to the right for the number of samples. If the high
value from the experiment exceeds the higher value in Table 2,
or the low value is below the lower value in the table, the rank-
ing differs significantly from chance at the 5 percent level.
Then refer to Table 3 and repeat the operation to determine
whether the result reaches the 1 percent level of significance.
c. Example of Use
Ten subjects each ranked four samples of juice for sweetness.
The sums of the ranks were as follows: Sample A--16, Sam-
ple B--28, Sample C--18, and Sample D--28. Enter Table 2
at 10 in the first column and look in the column for four
samples. It shows that a low of 16 or a high of 34 would be
required; hence, Sample A is significantly low. Then check
Table 3, where it is seen that the low value required is 14.
Hence, the result is significantly different from chance at the
5 percent level but not at the 1 percent level.
d. When the number of samples exceeds 12, or the number of
rankings (subjects) exceeds 20, use chi-square (IV.F).
3. Tables 4 and 5--Significance of Results in Paired or Triangle
One-Tailed Situations
a. These tables are for use in situations where the choice of only
one of the samples will fulfill the conditions of the experiment.
Figures are given for 50 to 50 percent chance probability,
where there are only two samples involved, and for the 33.3
to 66.7 percent probability level of the triangle test. The duo-
trio, dual standard, or paired difference tests are examples of
two-sample situations.
b. Examples of Use
(1) A duo-trio test was run with 10 subjects each making two
trials. There were 16 correct identifications. Enter Table 4
at 20 in the first column, and note that the 16 correct is
exactly what is required for significance at the 1 percent
level.
(2) A triangle test was run with 25 subjects each making two
trials. There were 30 correct identifications. Enter Table 5
at 50 in the first column, and note that only 28 correct is
required for significance even at the 0.1 percent level.
Thus, the result is significant at beyond the 0.1 percent
level.
c. When the number of judgments exceeds the range of the table,
use the t-test for percentages (IV.E.4).
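For one-tailed situations the same kind of check applies, with the chance probability set to 1/2 for two-sample tests or 1/3 for the triangle test. This is a sketch that assumes the tables are based on the exact binomial distribution. The function name is hypothetical.

```python
from math import comb


def one_tailed_p(correct, n, chance):
    """Probability of at least `correct` successes in `n` trials when each
    trial succeeds by chance with probability `chance`."""
    return sum(
        comb(n, i) * chance ** i * (1 - chance) ** (n - i)
        for i in range(correct, n + 1)
    )


p_duo_trio = one_tailed_p(16, 20, 0.5)     # duo-trio: 16 correct of 20
p_triangle = one_tailed_p(30, 50, 1 / 3)   # triangle: 30 correct of 50
print(p_duo_trio, p_triangle)
```

The duo-trio result falls just inside the 1 percent level (15 correct would not), and the triangle result lies beyond the 0.1 percent level, matching the readings from Tables 4 and 5.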
E. THE t-TEST
1. The statistic, t, is often used in determining the significance of
differences, but it is not applicable when there are more than two
measures (means, proportions, etc.) to consider. It is defined as
the difference divided by the standard error of the difference. Its
distribution shows the probabilities associated with this ratio for
a given number of cases (df) (Table 6). There are many different
applications of this test, that is, for different kinds of data, but all
follow the same general procedure. Three methods are given here.
2. Generalized t-Test
a. This is used for testing the significance of the difference be-
tween the means of independent sets of observations, such as
measurements made on independently drawn samples of the
same population.
b. Example of Use
One group of 10 subjects rated Sample A on a 7-point scale.
A different group of 10 rated Sample B. (The same computa-
tions would be made if the two groups had rated the same
sample, but then we would be testing for differences between
groups.) The scores were:
                                   Sample A    Sample B
                                      4
                                      3
                                      7
                                      2
                                      1
                                      3           2
                                      3           6
                                      1           7
                                      4           7
                                      2           5
Sum of scores . . . . . . . . . .    30          47
Average score . . . . . . . . . .     3.0         4.7
Variance  . . . . . . . . . . . .     3.11        3.57
Standard deviation  . . . . . . .     1.76        1.89
Standard error  . . . . . . . . .     0.557       0.597
Degrees of freedom (N - 2) = 18
Standard error of the difference
$$SE_{(A-B)} = (0.557^2 + 0.597^2)^{1/2} = 0.816$$
$$t = \frac{\bar{X}_B - \bar{X}_A}{SE_{(A-B)}} = \frac{1.7}{0.816} = 2.08$$
Reference to Table 6 shows that with 18 degrees of freedom
a t of 2.10 is required for significance at the 5 percent level.
Therefore, the difference does not reach this criterion.
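The computation in this example can be reproduced from the summary figures alone. The sketch below is illustrative. It starts from the tabulated means and variances, not the raw scores, and the function name is an assumption.

```python
import math


def t_from_summary(mean_a, var_a, n_a, mean_b, var_b, n_b):
    """Generalized t for two independent sets, from means and variances."""
    se_a = math.sqrt(var_a / n_a)
    se_b = math.sqrt(var_b / n_b)
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    t = (mean_b - mean_a) / se_diff
    df = n_a + n_b - 2   # one restriction per set of data
    return t, df, se_diff


# Summary figures from the example: Sample A, then Sample B.
t, df, se_diff = t_from_summary(3.0, 3.11, 10, 4.7, 3.57, 10)
print(round(t, 2), df)
```

The resulting t of about 2.08 with 18 degrees of freedom falls just short of the 2.10 required at the 5 percent level.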
3. t-by-Difference
a. This is a short-cut method of obtaining t when paired scores
or ratings are available, for example, when each subject has
given a preference rating for two samples of food. When the
scores are positively correlated, it provides a more discriminat-
ing test by removing from the estimate of error the effect of
differences between levels of scores. This is analogous to taking

out subject effect in the analysis of variance (IV.G).
b. The formula is
$$t = \frac{\sum D}{\left[\dfrac{N \sum D^2 - (\sum D)^2}{N - 1}\right]^{1/2}}$$
where:
$\sum D$ = algebraic sum of the differences between the paired
scores,
$\sum D^2$ = sum of the squares of the differences, and
N = number of pairs of scores.
c. Example
Each of six subjects has rated both Sample A and Sample B:
Subject    Rating X_A    Rating X_B    X_A - X_B    (X_A - X_B)^2
X              3             4            -1              1
Y              8             2             6             36
Z              7             3             4             16
W              5             5             0              0
V              7             2             5             25
U              5             6            -1              1
Sum                                       13 (ΣD)        79 (ΣD^2)
$$t = \frac{13}{\left[\dfrac{6 \times 79 - 13^2}{6 - 1}\right]^{1/2}} = 1.66$$
Reference to Table 6 shows that with 5 degrees of freedom

(N - 1) a value of 2.02 is required for significance even at the
10 percent level. Hence, the difference between the sample
means would be called not significant.
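As a check on the t-by-difference formula, the sketch below recomputes the example from the six pairs of ratings. The function name `t_by_difference` is illustrative only; the data are those of the table above.

```python
import math

def t_by_difference(scores_a, scores_b):
    """Paired t computed from the differences between paired scores."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)                        # number of pairs
    sum_d = sum(diffs)                    # algebraic sum of differences
    sum_d2 = sum(d * d for d in diffs)    # sum of squared differences
    t = sum_d / math.sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))
    return sum_d, sum_d2, t, n - 1        # last value: degrees of freedom

# Ratings for subjects X, Y, Z, W, V, U.
x_a = [3, 8, 7, 5, 7, 5]
x_b = [4, 2, 3, 5, 2, 6]
sum_d, sum_d2, t, df = t_by_difference(x_a, x_b)
print(sum_d, sum_d2, round(t, 2), df)
```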
4. t-Test for Percentages (proportions)
a. The t-test can also be used with percentages with proper con-
version of the formulas.
b. The formulas are
(1) $SE_{per} = \left(\frac{pq}{N}\right)^{1/2}$
where:
p = higher percentage, and
q = (1 - p) = lower percentage.
(2) The standard error of the difference between proportions

would be the same as shown above for the difference
between means.
$SE_{(A-B)} = (SE_A^2 + SE_B^2)^{1/2}$
c. Uses
(1) It is most often used to test for the significance of the
difference between an experimentally obtained proportion
and a fixed hypothesis, for example, the 50 to 50 percent
theoretical chance distribution in a paired comparison
test. Here one uses the first formula above.
(2) If used to test for the significance of the difference between
two experimentally observed proportions, one would use
the formula for the standard error of the difference.
d. Example
A preference test was run between Sample A and Sample B
with 121 subjects; 72 chose A, and 49 chose B. The number of
subjects exceeds the range of Table 1 ; hence, computation is
necessary.
Compute the SE of the theoretical chance proportion where
p = 0.50, q = 0.50, and N = 121
$SE_{per} = \left(\frac{0.50 \times 0.50}{121}\right)^{1/2} = \frac{0.50}{11} = 0.045$
$t = \frac{\text{observed proportion} - \text{theoretical proportion}}{SE_{per}}$
Observed proportion = 72/121
= 0.595 (p) and 0.405 (q)
Theoretical proportion = 1/2
= 0.50 (p) and 0.50 (q)
$t = \frac{0.595 - 0.500}{0.045} = \frac{0.095}{0.045} = 2.11$
Enter Table 6 at 120 degrees of freedom (nearest value to N)

and look under the two-tailed columns, because this was a
preference test where either sample could have been chosen.
It is noted that a t of 1.98 is required for significance at the
5 percent level; therefore, this result is significant at that level.
A t of 2.62 is required for the 1 percent level; hence, it does
not meet that criterion.
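The computation can be sketched as follows. The function name `t_for_proportion` is hypothetical. Carried through without rounding the standard error to 0.045, t comes out near 2.09 rather than 2.11; the conclusion (significant at the 5 percent level, not at the 1 percent level) is unchanged.

```python
import math

def t_for_proportion(count, n, p0=0.5):
    """t for an observed proportion against a fixed chance proportion p0."""
    se = math.sqrt(p0 * (1 - p0) / n)   # SE of the theoretical proportion
    observed = count / n
    return observed, se, (observed - p0) / se

# 72 of 121 subjects chose Sample A; chance expectation is 0.50.
observed, se, t = t_for_proportion(72, 121)
print(round(observed, 3), round(se, 3), round(t, 2))
```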
F. CHI-SQUARE TEST
1. This is a method to determine whether the observed frequencies
in the categories of a distribution differ significantly from the
frequencies which might be expected according to some hypothe-
sis. This is a general method applicable to a wide range of types
of data and distributions. It is a test which requires no assump-
tions about the distribution of the variable in the population.
2. The distribution of chi-square is published in tables which appear
in most statistical texts. They show the values which are required
for statistical significance at various levels for various degrees of
freedom (Table 7).
3. It is important that the hypothesis be one which is meaningful in
regard to the particular experiment. In most cases this will be
obvious, for example, when using the null hypothesis that there
is no real difference between samples in regard to the character-
istic measured; the responses should be equally divided among
the categories. Another situation is where a test has been run in
two situations or with two different groups of people. This means
that we compare the two sets of data to see whether they could
have arisen from the same distribution. Here one averages the
two frequencies in each category to obtain the expected frequency.
4. The formula for chi-square is
$\chi^2 = \sum \frac{(O - E)^2}{E}$
where:
O = frequency observed, and
E = frequency expected.
The indicated summation is for all terms of the form shown.
There will be a term for each category of the distribution.
5. The calculated chi-square value is interpreted by reference to
published tables or to charts giving the same information which
show the values to be expected at selected probabilities according
to degrees of freedom.
6. The degrees of freedom in single applications is usually one less
than the total number of categories (and terms used in computing
the value of chi-square).
7. Example of application to paired comparison data:
Number of choices observed: A = 28, B = 12.

Number expected by the null hypothesis: A = 20, B = 20.
Degrees of freedom: 1
$\chi^2 = \frac{(28 - 20)^2}{20} + \frac{(12 - 20)^2}{20} = 3.2 + 3.2 = 6.4$
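Entering the chi-square table with one degree of freedom, we
find that a value as large as 6.4 is to be expected by chance only
slightly more than 1 percent of the time (3.84 is required at the
5 percent level and 6.63 at the 1 percent level); hence, the result
is significant at the 5 percent level and falls just short of the
1 percent level.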
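A minimal sketch of the chi-square computation for this paired comparison example follows. The function name `chi_square` is illustrative, and the two critical values quoted in the comment (3.84 and 6.63 for one degree of freedom) are the usual tabled figures, stated here as an assumption about what Table 7 contains.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 40 choices: A = 28, B = 12; the null hypothesis expects 20 and 20.
chi2 = chi_square([28, 12], [20, 20])
df = 2 - 1  # one less than the number of categories
print(round(chi2, 2), df)

# Usual tabled values for 1 degree of freedom: 3.84 (5 percent), 6.63 (1 percent).
print(chi2 > 3.84, chi2 > 6.63)
```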
8. Example of application of chi-square to rank order data [33]:
(Seven samples have been ranked for preference by 14 subjects.)
                                  Sample
Subject             A    B    C    D    E    F    G
1 ..............    1    3    2    6    5    4    7
2 ..............    1    2    3    5    6    4    7
3 ..............    1    2    3    4    6    5    7
4 ..............    1    4    2    3    7    5    6
5 ..............    1    3    2    4    5    6    7
6 ..............    1    2    3    6    5    4    7
7 ..............    1    3    4    2    5    6    7
8 ..............    2    1    3    5    4    7    6
9 ..............    3    1    2    6    4    5    7
10 .............    1    4    2    3    5    6    7
11 .............    1    3    2    5    4    6    7
12 .............    1    2    3    5    4    6    7
13 .............    1    3    4    2    7    6    5
14 .............    2    3    4    5    1    7    6
Rank Total .....   18   36   39   61   68   77   93
$\chi^2 = \frac{12}{NP(P + 1)} \sum (RT)^2 - 3N(P + 1)$
where:
N = number of subjects,
P = number of samples (and number of ranks),
RT = rank total,
$\sum (RT)^2$ = sum of the squares of the rank totals, and
P - 1 = degrees of freedom.
$\chi^2 = \frac{12 \times 26064}{14 \times 7 \times 8} - (3 \times 14 \times 8) = 62.94$
The chi-square table shows that for six degrees of freedom a value
as high as 16.8 will occur by chance only 1 percent of the time. There-
fore, differences among the samples have been established at beyond
the 1 percent level.
Note that these data could also be analyzed by the method in
IV.D.2, that is, by obtaining the sums of ranks and determining their
significance by reference to Tables 2 and 3.
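The rank-order computation can be sketched directly from the table of ranks. The function name `rank_order_chi_square` is hypothetical; the sketch simply totals the ranks for each sample and applies the formula given above, arriving at a chi-square of about 62.94 with six degrees of freedom.

```python
def rank_order_chi_square(ranks):
    """Chi-square for rank-order data: rows are subjects, columns are samples."""
    n = len(ranks)             # number of subjects
    p = len(ranks[0])          # number of samples (and ranks)
    rank_totals = [sum(row[j] for row in ranks) for j in range(p)]
    sum_sq = sum(rt * rt for rt in rank_totals)
    chi2 = 12 * sum_sq / (n * p * (p + 1)) - 3 * n * (p + 1)
    return rank_totals, chi2, p - 1

ranks = [
    [1, 3, 2, 6, 5, 4, 7], [1, 2, 3, 5, 6, 4, 7], [1, 2, 3, 4, 6, 5, 7],
    [1, 4, 2, 3, 7, 5, 6], [1, 3, 2, 4, 5, 6, 7], [1, 2, 3, 6, 5, 4, 7],
    [1, 3, 4, 2, 5, 6, 7], [2, 1, 3, 5, 4, 7, 6], [3, 1, 2, 6, 4, 5, 7],
    [1, 4, 2, 3, 5, 6, 7], [1, 3, 2, 5, 4, 6, 7], [1, 2, 3, 5, 4, 6, 7],
    [1, 3, 4, 2, 7, 6, 5], [2, 3, 4, 5, 1, 7, 6],
]
rank_totals, chi2, df = rank_order_chi_square(ranks)
print(rank_totals, round(chi2, 2), df)
```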
G. ANALYSIS OF VARIANCE [34,35]
1. Analysis of variance is a general method, with broad applications.
It can be very complex, going far beyond any exposition that

might be attempted within the scope of this manual. Here we will
attempt to present only the rudiments. Extension of knowledge
beyond this point should be by reference to appropriate texts.
2. Basic Ideas of Analysis of Variance
a. The total amount of variation that exists within an overall
distribution of scores (values, measures) can be segregated
into categories representing that due to various identifiable
effects and the variation due to unidentifiable causes, which is
called error. It is assumed that error variation is random and
normally distributed.
b. If the variance attributable to some identifiable cause exceeds
the variance attributable to random error by a certain ratio,
one may infer that this effect was not due to chance. If the
ratio falls below this level, one may infer that the effect of the
variable is no greater than might have resulted from chance,
therefore, it is not significant.
c. This is the F-ratio. It is interpreted by reference to standard
tables, which usually show the values which this ratio must
reach to be significant at the 5 and 1 percent levels. These
values are dependent upon the number of degrees of freedom
for both the numerator (the variable being tested) and the
denominator (estimate of error) (Tables 8 and 9).
3. Example: 2-Factor Problem
a. Three subjects have each scored three samples (A, B, and C).
Thus we can examine the effects of differences between sub-
jects (subject effect) and of differences between samples
(treatment effect).
b. Set up a computation table as follows:
                      Scores for Samples         Subject
Subject                A      B      C        ΣX     (ΣX)²
1 ...............      1      2      2         5       25
2 ...............      2      3      4         9       81
3 ...............      1      1      2         4       16
Totals ..........      4      6      8        18      122
Treatment ΣX = totals separately for A, B, and C (4, 6, and 8).
Subject ΣX = totals separately for each subject (5, 9, and 4).
Subject (ΣX)² = squares of above totals (25, 81, and 16).
ΣX² = square each score individually and get total (44).
Total ΣX = add all 9 scores (18).
(ΣX)² = square of above (324).
Correction factor (CF): $CF = \frac{(\sum X)^2}{\text{No. of scores}} = \frac{324}{9} = 36$.
Total sum of squares: ΣX² - CF (44 - 36 = 8).
Treatment sum of squares: get (ΣX)² for each sample sepa-
rately, sum across treatments, divide by number of subjects,
and subtract correction factor.
$\frac{4^2 + 6^2 + 8^2}{3} - 36 = \frac{116}{3} - 36 = 2.67$
Subject sum of squares: get (ΣX)² separately for each subject,
get total for all subjects, divide by the number of scores for
each subject, and subtract CF.
$\frac{122}{3} - 36.0 = 4.67$
c. Computation of F-ratios
                          Degrees of   Sum of     Mean                 Signifi-
Source of Variation       Freedom      Squares    Squares    F-Ratio   cance
Treatment ............       2          2.67       1.33        8.06      5%
Subject ..............       2          4.67       2.33       14.10      5%
Interaction (error) ..       4          0.66       0.165      ......
Total ................       8          8.0
Degrees of freedom: one less than the number of categories of

each variable. The total degrees of freedom is one less than the
total number of scores.
Interaction: subject-treatment interaction, here obtained as
the residual after subtracting the treatment and subject sums
of squares from the total sum of squares. This is the only
possible estimate of error here.
Mean squares: obtained by dividing each sum of squares by
its degrees of freedom.
F-ratios: obtained by dividing mean squares for treatment and
subject by mean square for error.
d. To determine significance, consult the tables of F-ratios (Tables
8 and 9) at two degrees of freedom in the numerator and four
in the denominator for both treatment and subject effects. It
will be seen that both of the above exceed the value required
for the 5 percent level but fail to reach the value required for
the 1 percent level.
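The whole 2-factor computation can be sketched in a few lines. The function name `two_factor_anova` and the dictionary keys are illustrative. Because the sketch carries full precision instead of rounding the error mean square to 0.165, the F-ratios come out as 8.0 and 14.0 rather than 8.06 and 14.10; the comparison with the tabled values is unaffected.

```python
def two_factor_anova(scores):
    """Subjects-by-treatments analysis; rows are subjects, columns are samples."""
    n_subjects, n_treatments = len(scores), len(scores[0])
    n_scores = n_subjects * n_treatments
    grand_total = sum(sum(row) for row in scores)
    cf = grand_total ** 2 / n_scores                       # correction factor
    total_ss = sum(x * x for row in scores for x in row) - cf
    treatment_totals = [sum(row[j] for row in scores) for j in range(n_treatments)]
    treatment_ss = sum(t * t for t in treatment_totals) / n_subjects - cf
    subject_ss = sum(sum(row) ** 2 for row in scores) / n_treatments - cf
    error_ss = total_ss - treatment_ss - subject_ss        # residual (interaction)
    df_t, df_s = n_treatments - 1, n_subjects - 1
    ms_error = error_ss / (df_t * df_s)
    return {
        "cf": cf, "total_ss": total_ss, "treatment_ss": treatment_ss,
        "subject_ss": subject_ss, "error_ss": error_ss,
        "f_treatment": (treatment_ss / df_t) / ms_error,
        "f_subject": (subject_ss / df_s) / ms_error,
    }

# Three subjects (rows) each scored Samples A, B, and C (columns).
result = two_factor_anova([[1, 2, 2], [2, 3, 4], [1, 1, 2]])
for name, value in result.items():
    print(name, round(value, 2))
```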
H. PROBLEM OF MULTIPLE COMPARISONS

1. When it is desired to make comparisons among the means de-
veloped from an experiment involving more than two samples,
a problem arises because the probabilities are altered (IV.C.2).
As the number of means increases, the likelihood of a given value
of the difference between the highest and lowest means being
equalled or exceeded by chance alone increases (becomes less
significant).
2. There are various ways of adjusting to this situation, including
that of disregarding it and simply computing the regular t-test
between any two means that are compared. (This is expedient,
but not recommended.)
3. Least Significant Difference (LSD) [30]
a. First verify that the overall effect of differences among the
means is significant as shown by the F-ratio.
b. Compute the LSD as follows
$LSD_{0.05} = \sqrt{2/n} \cdot s \cdot t_{0.05}$
where:
s = square root of the mean square for error from
analysis of variance, or standard deviation,
$t_{0.05}$ = t value for 5 percent significance level, and
n = number of observations on which the means are
based.
To obtain the LSD for the 1 percent, or other significance
level, would require only the substitution of the t value for the
particular level.
c. Any two means whose difference exceeds the computed value
are considered as significantly different.
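The manual gives no worked example for the LSD, so the sketch below borrows figures as an illustration only: the error mean square (1.876, with 60 degrees of freedom) and n = 16 are taken from the Duncan example in the next subsection, and the t value of 2.000 is the usual two-tailed 5 percent value for 60 degrees of freedom, assumed here rather than read from Table 6. The function name `least_significant_difference` is hypothetical.

```python
import math

def least_significant_difference(error_mean_square, n, t_value):
    """LSD = sqrt(2/n) * s * t, where s is the root of the error mean square."""
    s = math.sqrt(error_mean_square)
    return math.sqrt(2 / n) * s * t_value

# Assumed inputs: error mean square 1.876, n = 16, t = 2.000 (60 df, 5 percent).
lsd_05 = least_significant_difference(1.876, 16, 2.000)
print(round(lsd_05, 2))
```

Under these assumptions any two means differing by more than about 0.97 would be called significantly different, provided the overall F-ratio has first been found significant.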
4. Duncan Multiple Range Test [36]
a. Given a set of k means this test shows whether all are signifi-
cantly different, or whether some differ while others do not.
It requires four steps.
b. Arrange the means in order of magnitude.
Treatment ...... C A E B D
Mean . . . . . . . . . . . 7.03 6.63 5.64 5.62 4.57
c. Calculate the standard error of the mean.
$s_{\bar{x}} = \sqrt{s^2 / n}$
For this example, s², the error mean square from the analy-
sis of variance, is 1.876 with 60 degrees of freedom, and n =
16.
$s_{\bar{x}} = \sqrt{1.876 / 16} = 0.342$
d. Calculate the shortest significant range using Tables 10 and 11.
In these tables the rows depict the degrees of freedom in the
error term, and the columns show the number of means.
Since there are 5 means in the example, there will be 4 ranges
to find.
Using the 5 percent significance level and 60 degrees of
freedom, the significant ranges (r) are
k= 2 3 4 5
r = 2.83 2.98 3.08 3.14
Each of these is multiplied by the standard error (above) to
find the shortest significant range (R).
k = 2 3 4 5
R = 0.97 1.02 1.05 1.07
e. Calculate the range between two means in the array, and com-
pare with the shortest significant range for the relative position
in the array.
Begin by testing the largest against the smallest, then the
largest against the second smallest, and so on until the range
between the two means is less than the significant range for
the number of means grouped in that range. Underline the
means in this grouping to indicate they are not significantly
different.
It is not necessary to test means within this grouping, since
no difference between two means can be significant if both
means are contained in a group of means which has a non-
significant range.
When the largest mean has either been grouped, or found to
be significantly different from all other means, the second
largest mean is tested--first against the smallest mean, then
against the second smallest, and so on. Continue until the
second smallest mean is tested against the smallest, unless all
means have already been grouped.
f. In the example of means given, the calculations will proceed as
follows:
Treatment . . . . . . C A E B D
Mean . . . . . . . . . . . 7.03 6.63 5.64 5.62 4.57
C-D: 7.03 - 4.57 = 2.46 > 1.07 (R5)
C-B: 7.03 - 5.62 = 1.41 > 1.05 (R4)
C-E: 7.03 - 5.64 = 1.39 > 1.02 (R3)
C-A: 7.03 - 6.63 = 0.40 < 0.97 (R2)
R5 indicates a comparison between the 1st and 5th means;

R4 indicates a comparison between the 1st and 4th or 2nd
and 5th means, etc.
Underline C and A to show they are not significantly dif-
ferent.
A-D: 6.63 - 4.57 = 2.06 > 1.05 (R4)
A-B: 6.63 - 5.62 = 1.01 < 1.02 (R3)
Underline A through B. Note that it is not necessary to

test A and E. The difference is actually greater than the value
for R2, but the two means are contained in a group which
does not differ significantly.
E-D: 5.64 - 4.57 = 1.07 > 1.02 (R3)
E and B are not tested since they are already grouped.
B-D: 5.62 - 4.57 = 1.05 > 0.97 (R2)
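The stepwise procedure can be sketched as a short routine. This is one possible reading of the steps described above, not a definitive implementation: the function name `duncan_groups` is hypothetical, and the significant ranges are the 5 percent values quoted in the text for 60 degrees of freedom. Each mean is tested against the smallest mean first and then inward; the first non-significant range found forms a group (the means that would be underlined), unless it lies wholly inside a group already found.

```python
import math

def duncan_groups(means, error_mean_square, n, tabled_ranges):
    """Return the standard error, shortest significant ranges, and groups."""
    se = math.sqrt(error_mean_square / n)
    shortest = {k: r * se for k, r in tabled_ranges.items()}
    ordered = sorted(means.items(), key=lambda item: item[1], reverse=True)
    names = [name for name, _ in ordered]
    values = [value for _, value in ordered]
    groups = []  # (first index, last index) of each non-significant range
    for i in range(len(values)):
        for j in range(len(values) - 1, i, -1):   # smallest mean first
            k = j - i + 1                         # number of means spanned
            if values[i] - values[j] < shortest[k]:
                inside = any(a <= i and j <= b for a, b in groups)
                if not inside:
                    groups.append((i, j))
                break
    return se, shortest, [names[a:b + 1] for a, b in groups]

means = {"A": 6.63, "B": 5.62, "C": 7.03, "D": 4.57, "E": 5.64}
tabled_ranges = {2: 2.83, 3: 2.98, 4: 3.08, 5: 3.14}
se, shortest, groups = duncan_groups(means, 1.876, 16, tabled_ranges)
print(round(se, 3))
print(groups)
```

For the example data the routine reports two groups, C with A and A through B, and leaves D on its own, which agrees with the hand calculation.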
5. Dunnett Test [37]

a. This test applies to the special case where one of the treatments
is a predetermined control, and the objective is to determine
which of the other sample means is significantly different from
the mean of the control.
b. Dunnett has published tables to be used in conjunction with
his method (Table 12). They are tables of t-values, where t
has been adjusted upward. For selected levels of significance
they show the t-value to be used according to the degrees of
freedom and the number of treatment means in the set within
which comparisons are to be made.
c. Method of Computation
(1) One defines A as the amount of difference between a
sample mean and the control required for significance.
(2) The formula for A when the N's for all means are equal is
$A = t s (2/N)^{1/2}$
where:
t = value from Dunnett table,
p = number of samples excluding the control,
N = number of observations on which each mean is
based,
df = (p + 1)(N - 1), and
s = standard deviation (square root of the average
variance) of the p + 1 distributions.
(3) When the N's are not equal for all samples, the formula
becomes
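$A = t s \left(\frac{1}{N_1} + \frac{1}{N_2}\right)^{1/2}$
$df = \sum N - (p + 1)$
where N_1 and N_2 are the numbers of observations on which the
two means being compared are based.
Other computations are the same.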
d. Example
(1) Given Samples A and B representing two productions of
unknown quality. How does each compare with Sample C,
a control of known quality? The three samples were
scored using a 9-point scale by a panel of 10 subjects.
Therefore, p = 2 and N = 10.
(2) Distributions
                      Sample C        Sample A        Sample B
                      X      x²       X      x²       X      x²
                      7     0.01      3     0.81      5     0.81
                      8     0.81      4     0.01      6     0.01
                      6     1.21      5     1.21      8     4.41
                      6     1.21      2     3.61      2    15.21
                      8     0.81      6     4.41      7     1.21
                      9     3.61      4     0.01      7     1.21
                      7     0.01      3     0.81      3     8.41
                      5     4.41      2     3.61      8     4.41
                      8     0.81      5     1.21      6     0.01
                      7     0.01      5     1.21      7     1.21
Mean ............     7.1             3.9             5.9
s² = Σx²/(N - 1) ..   1.43            1.88            4.10
(3) Computation of standard deviation
(a) Obtain the average variance
$\frac{1.88 + 4.10 + 1.43}{3} = \frac{7.41}{3} = 2.47$
(b) Then
$s = (2.47)^{1/2} = 1.57$
(4) df = 3 × 9 = 27, and
t for the 5 percent level with 27 degrees of freedom = 2.00.
(5) $(2/N)^{1/2} = (0.20)^{1/2} = 0.45$.
(6) A = 2.00 × 1.57 × 0.45 = 1.40.
(7) Determination of significance
$\bar{X}_C - \bar{X}_A = 7.1 - 3.9 = 3.2$
Since this value is larger than A, one concludes that
Sample A is significantly inferior to Sample C.
$\bar{X}_C - \bar{X}_B = 7.1 - 5.9 = 1.2$
Since this value is smaller than A, one concludes that
Sample B and Sample C do not differ significantly.
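The Dunnett computation for the equal-N case can be sketched from the raw scores. The t value of 2.00 is the figure the example takes from the Dunnett table and is entered by hand here; the function names are hypothetical. Without intermediate rounding the required difference A comes out near 1.41 rather than 1.40, and the two conclusions are the same.

```python
import math

def variance(scores):
    """Sample variance: sum of squared deviations divided by N - 1."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores) / (len(scores) - 1)

def dunnett_required_difference(groups, t_value):
    """A = t * s * sqrt(2/N) for equal-sized groups (control included)."""
    n = len(groups[0])
    s = math.sqrt(sum(variance(g) for g in groups) / len(groups))
    return t_value * s * math.sqrt(2 / n)

control = [7, 8, 6, 6, 8, 9, 7, 5, 8, 7]   # Sample C
sample_a = [3, 4, 5, 2, 6, 4, 3, 2, 5, 5]
sample_b = [5, 6, 8, 2, 7, 7, 3, 8, 6, 7]

required = dunnett_required_difference([control, sample_a, sample_b], 2.00)
mean = lambda scores: sum(scores) / len(scores)
a_differs = mean(control) - mean(sample_a) > required
b_differs = mean(control) - mean(sample_b) > required
print(round(required, 2), a_differs, b_differs)
```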
I. THRESHOLD DETERMINATION
a. This graphic method for determining a difference or absolute
threshold is one which serves as a basis for the variations seen
in many of the psychophysical methods.
b. A series of stimuli have been presented on several occasions
and a judgment has been made each time on whether the
stimulus was noticed. If the stimulus was noted, one assigns
a value of 1 ; if the stimulus was not noticed, a value of 0 is
assigned. This gives the following table.
                            Stimulus Values
Occasions                  1      2      3      4
1 ....................     0      0      1      1
2 ....................     0      1      0      1
3 ....................     0      1      1      1
4 ....................     0      0      0      1
5 ....................     0      1      1      0
6 ....................     1      0      0      1
7 ....................     0      0      0      1
8 ....................     0      1      1      1
9 ....................     0      0      1      1
10 ...................     0      0      1      1
Frequency noticed ....     1      4      6      9
Proportion ...........    0.1    0.4    0.6    0.9
Stimulus 1 was noticed only once in ten occasions, while

stimulus 4 was noted 9 out of 10 times.
c. Procedure
(1) Determine the proportion of times each stimulus was
noticed. This is shown in the last row of the table.
(2) Draw a diagram showing proportions on the y-axis and
the stimulus values on the x-axis. Plot the proportions for
the stimulus values and draw a smooth curve through the
points.
(3) Note where the line crosses the 0.50 point on the y-axis,
and draw a straight line from the curve to the x-axis. The
point on the x-axis denotes the stimulus value which is at
the absolute threshold, that is, the stimulus value which is
noticed 50 percent of the time.
d. The difference threshold is defined as that stimulus noticed
75 percent of the time. Using the same figure as used in finding

the absolute threshold, draw the line from the x-axis to the
curve as shown at the point where the curve crosses the 0.75
point on the y-axis.
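The graphic procedure can be approximated numerically. The sketch below is an assumption-laden stand-in for the hand-drawn smooth curve: it joins neighboring points with straight lines and reads off where the 0.50 and 0.75 levels are crossed. The function name `interpolate_threshold` is hypothetical, and a fitted smooth curve could give slightly different values.

```python
def interpolate_threshold(stimuli, proportions, level):
    """Stimulus value at which the proportion noticed reaches `level`,
    using straight-line interpolation between neighboring points."""
    points = list(zip(stimuli, proportions))
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 <= level <= p1 and p1 > p0:
            return x0 + (level - p0) * (x1 - x0) / (p1 - p0)
    return None  # level is never reached within the range tested

# 1 = stimulus noticed, 0 = not noticed; rows are occasions 1 to 10.
responses = [
    [0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 1], [0, 0, 0, 1], [0, 1, 1, 0],
    [1, 0, 0, 1], [0, 0, 0, 1], [0, 1, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1],
]
stimuli = [1, 2, 3, 4]
proportions = [sum(row[j] for row in responses) / len(responses) for j in range(4)]

absolute = interpolate_threshold(stimuli, proportions, 0.50)
difference = interpolate_threshold(stimuli, proportions, 0.75)
print(proportions, round(absolute, 2), round(difference, 2))
```

With straight-line interpolation the absolute threshold falls at a stimulus value of about 2.5 and the 75 percent point at about 3.5.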
Acknowledgment
Production of this manual has taken eight years. So many Committee
E-18 members, past and present, have contributed to its development
that it would be impossible to list or even remember them all.
Subcommittee III was decided on at the organizational meeting of
Committee E-18 in 1958. Under the guidance of its first chairman, F. J.
Pilgrim, the group developed a work plan in terms of a list of procedures
which deserved authoritative codification. The initial intent was to
describe and publish each method separately; however, difficulties soon
became apparent. The most serious one was that the necessary back-
ground information was not readily available. In January 1963, the
decision was made to gather all of the material together in the form of a
manual. A basic outline was developed at that time.
Planning and replanning, writing and rewriting, false starts and re-
directions have consumed much time. The first of a series of drafts was
presented at the June 1964 meeting. Biannual revisions extended the series
to five before the manual was deemed sufficiently complete to submit to
the full membership of Committee E-18 for review. The final version was
approved for publication in January 1967.
Those who have made major contributions to this manual are listed
below. It is recognized that some may have been overlooked.
F. J. Pilgrim F. Sullivan
D. R. Peryam, ed. N. Oshinsky
K. S. Konigsbacher Isabel Shillestad, ed.
N. F. Girardot J. F. Elrod, ed.
M. D. Seaman R. A. Kluter
H. G. Schutz E. K. Robbins
D. A. Brandt E. Z. Skinner
We are indebted to the Literary Executor of the late Sir Ronald A.
Fisher, F.R.S., to Dr. Frank Yates, F.R.S., and to Oliver & Boyd Ltd.,
Edinburgh, for permission to reprint Table III from their book Statistical
Tables for Biological, Agricultural and Medical Research.
Glossary of Statistical Symbols
This glossary is designed simply as a ready reference. The definitions
and explanations are written for the nonstatistician to help him use the
manual. They are not necessarily authoritative nor complete; however,
none is incorrect.
Σ Summation sign. It indicates that all terms of the type which
follows are to be added together, for example, ΣX means add
all X's.
N Number of observations or cases in the distribution being con-
sidered. Also, the lower case letter is used sometimes.
X The value of a single observation, score, rating, etc. It is impor-
tant to distinguish between the capital and lower case letters.
x The difference between a score, X, and the mean of the distribu-
tion.
Bar In general, a bar written over a symbol means the average of the
distribution of values of that type.
X̄ The average of all X's in the distribution of scores, that is, the
mean.
Subscripts Used to label terms in a formula when there is a possibility of
confusion. For example, when comparing distributions A and B,
the two means would be indicated as X̄_A and X̄_B.
s² Variance of a distribution of scores.
s The standard deviation of a distribution of scores.
SE Standard error of a distribution of scores.
SE_X̄ Standard error of the mean of the scores.
SE_(A-B) Standard error of the difference between the means of distribu-
tions A and B.
SE_per Standard error of a percentage (proportion).
df Degrees of freedom.
t Difference between the means divided by the standard error of
the difference; value resulting from the t-test. Interpreted as a
level of significance by reference to Table 6.
F The ratio of two variances, relating to analysis of variance where
it is usually variance due to an identifiable cause divided by the
variance due to chance. Interpreted as a level of significance by
reference to Tables 8 and 9.
χ² Chi-square, a special statistic of known distribution used to
determine whether two distributions are or are not significantly
different. Interpreted by reference to Table 7.
D A difference between paired values (scores).
p and q Indicate the two values in a percentage or proportion, p is the
higher value, and q = 1.00 - p.
TABLE 1--Number of choices required for significance at various levels in a
paired-comparison test where either sample may be chosen. Chance probability is
50 percent, and the hypothesis is two-tailed.
Minimum Number Required Minimum Number Required
No. of Judgments No. of Judgments
5% 1% 0.1% 5% 1% 0.1%
1 ................... 41 ........... 28 30 32
2 .................. 42 ........... 28 30 32
3 .................. 43 ........... 29 31 33
4 .................. 44 ........... 29 31 34
5 .................. 45 ........... 30 32 34
6 ............. 6 .. 46 ........... 31 33 35
7 ............. 7 . 47 ........... 31 33 36
8 ............. 8 8 48 ........... 32 34 36
9 ............. 8 9 49 ........... 32 34 37
10 ............. 9 10 50 ........... 33 35 37
11 . . . . . . . . . . . . . 10 11 11 52 . . . . . . . . . . . 34 36 39
12 . . . . . . . . . . . . . 10 11 12 54 . . . . . . . . . . . 35 37 40
13 ............. 11 12 13 56 . . . . . . . . . . . 36 39 41
14 ............. 12 13 14 58 ........... 37 40 42
15 ............. 12 13 14 60 ........... 39 41 44
16 ............. 13 14 15
17 ............. 13 15 16 62 . . . . . . . . . . . 40 42 45
18 ............. 14 15 17 64 . . . . . . . . . . . 41 43 46
19 ............. 15 16 17 66 . . . . . . . . . . . 42 44 47
20 ............. 15 17 18 68 . . . . . . . . . . . 43 46 48
70 . . . . . . . . . . . 44 47 50
21 ............. 16 17 19
22 ............. 17 18 19 72 . . . . . . . . . . . 45 48 51
23 ............. 17 19 20 74 . . . . . . . . . . . 46 49 52
24 ............. 18 19 21 76 . . . . . . . . . . . 48 50 53
25 ............. 18 20 21 78 . . . . . . . . . . . 49 51 54
26 ............. 19 20 22 80 ........... 50 52 56
27 ............. 20 21 23
28 ............. 20 22 23 82 ........... 51 54 57
29 ............. 21 22 24 84 ........... 52 55 58
30 . . . . . . . . . . . . . 21 23 25 86 ........... 53 56 59
88 ........... 54 57 60
31 ............. 22 24 25 90 ........... 55 58 61
32 ............. 23 24 26
33 ............. 23 25 27 92 ........... 56 59 63
34 . . . . . . . . . . . . . 24 25 27 94 ........... 57 60 64
35 . . . . . . . . . . . . . 24 26 28 96 ........... 59 62 65
36 . . . . . . . . . . . . . 25 27 29 98 ........... 60 63 66
37 ............. 25 27 29 100 . . . . . . . . . . . 61 64 67
38 ............. 26 28 30
39 ............. 27 28 31
40 ............. 27 29 31
TABLE 2--Five percent level--rank totals required for significance. (Rank total must be lower than the first value or higher than the second
value.) a
Number of Number of Treatments, or Samples, Ranked
Rankings 2 3 4 5 6 7 8 9 10 11 12
2 . . . . . . . . . . . . . . . . . . . .
3 . . . . . . . . . . . . . . . . ' 4-i7 -25 i-34
4 .............. i.. ~i1 ~i5 6-18 6-22 7-25 7-29 8-32 8-36 8-39 9-43
5 ................. 6-14 7-18 8-22 9-26 9-31 10-35 11-39 12-43 12-48 13-52
6 .............. 7-11 8-16 9-21 10-26 11-31 12-36 13-41 14-46 15-51 17-55 18-60
7 .............. 8-13 10-18 11-24 12-30 14-35 15-41 17-46 18-52 19-58 21-63 22-69
8 .............. 9-15 11-21 13-27 15-33 17-39 18-46 20-52 22-58 24-64 25-71 27-77
9 .............. 11-16 13-23 15-30 17-37 19-44 22-50 24-57 26-64 28-71 30-78 32-85
10 .............. 12-18 15-25 17-33 20-40 22-48 25-55 27-63 30-70 32-78 35-85 37-93
11 .............. 13-20 16-28 19-36 22-44 25-52 28-60 31-68 34-76 36-85 39-93 42-101
12 .............. 15-21 18-30 21-39 25-47 28-56 31-65 34-74 38-82 41-91 44-100 47-109
13 .............. 16-23 20-32 24-41 27-51 31-60 35-69 38-79 42-88 45-98 49-107 52-117
14 .............. 17-25 22-34 26-44 30-54 34-64 38-74 42-84 46-94 50-104 54-114 57-125
15 .............. 19-26 23-37 28-47 32-58 37-68 41-79 46-89 50-100 54-111 58-122 63-132
16 .............. 20-28 25-39 30-50 35-61 40-72 45-83 49-95 54-106 59-117 63-129 68-140
17 .............. 22-29 27-41 32-53 38-64 43-76 48-88 53-100 58-112 63-124 68-136 73-148
18 .............. 23-31 29-43 34-56 40-68 46-80 52-92 57-105 62-118 68-130 73-143 79-155
19 .............. 24-33 30-46 37-58 43-71 49-84 55-97 61-110 67-123 73-136 78-150 84-163
20 .............. 26-34 32-48 39-61 45-75 52-88 58-102 65-115 71-129 77-143 83-157 90-170
a Abridged from: Kramer, Amihud, "A Rapid Method for Determining Significance of Differences from Rank Sums," Food Technology,
Vol. 14, 1960, pp. 576-581. Copyright © 1960 by the Institute of Food Technologists.
TABLE 3--One percent level--rank totals required for significance. (Rank total must be lower than the first value or higher than the second
value.) a
Number of Number of Treatments, or Samples, Ranked
Rankings 2 3 4 5 6 7 8 9 10 11 12
2 . . . . . . . . . . . . . . . .
3 . . . . . . . . . . . . . . . . 4-'29 4Si2 4~3'5
4 . . . . . . . . . . . . . . . . . . . . . . 52i9 5-23 5-27 6~30 6-34 6-38 6--42 7-45
5 . . . . . . . . . . . . . . . . iii (>-i9 7-23 7-28 8-32 8-37 9-41 9-46 10-50 10-55
6 . . . . . . . . . . . . . . . . 7-17 8-22 9-27 9-33 10-38 11-43 12-48 13-53 13-59 14-64
7 .............. 8-20 10-25 11-31 12-37 13-43 14-49 15-55 16-61 17-67 18-73
8 .............. 9-15 10-22 11-29 13-35 14-42 16-48 17-55 19-61 20-68 21-75 23-81
9 .............. 10-17 12-24 13-32 15-39 17-46 19-53 21-60 22-68 24-75 26-82 27-90
10 . . . . . . . . . . . . . . 11-19 13-27 15-35 18-42 20-50 22-58 24-66 26-74 28-82 30-90 32-98
11 . . . . . . . . . . . . . . 12-21 15-29 17-38 20-46 22-55 25-63 27-72 30-80 32-89 34-98 37-106
12 . . . . . . . . . . . . . . 14-22 17-31 19-41 22-50 25-59 28-68 31-77 33-87 36-96 39-105 42-114
13 . . . . . . . . . . . . . . 15-24 18-34 21-44 25-53 28-63 31-73 34-83 37-93 40-103 43-113 46-123
14 . . . . . . . . . . . . . . 16-26 20-36 24-46 27-57 31-67 34-78 38-88 41-98 45-109 48-120 51-131
15 . . . . . . . . . . . . . . 18-27 22-38 26-49 30-60 34-71 37-83 41-94 45-105 49-116 53-127 56-139
16 . . . . . . . . . . . . . . 19-29 23-41 28-52 32-64 36-76 41-87 45-99 49-111 53-123 57-135 62-146
17 . . . . . . . . . . . . . . 20-31 25-43 30-55 35-67 39-80 44-92 49-104 53-117 58-129 62-142 67-154
18 . . . . . . . . . . . . . . 22-32 27-45 32-58 37-71 42-84 47-97 52-110 57-123 62-136 67-149 72-162
19 . . . . . . . . . . . . . . 23-34 29-47 34-61 40-74 45-88 50-102 56-115 61-129 67-142 72-156 77-170
20 . . . . . . . . . . . . . . 24-36 30-50 36-64 42-78 48-92 54-106 60-120 65-135 71-149 77-163 82-178
a Abridged from: Kramer, Amihud, "A Rapid Method for Determining Significance of Differences from Rank Sums," Food Technology,
Vol. 14, 1960, pp. 576-581. Copyright © 1960 by the Institute of Food Technologists.
TABLE 5--Number of correct identifications required for significance at various
levels in triangle test. Chance probability is 33.3 percent, and the hypothesis is one-
tailed.
Minimum Number Required Minimum Number Required
No. of Judgments No. of Judgments
5% 1% 0.1% 5% 1% 0.1%
1 ................... 41 . . . . . . . . . . . 20 22 24
2 ................... 42 . . . . . . . . . . . 20 22 25
3 ............. 3 ... 43 . . . . . . . . . . . 21 23 25
4 ............. 4 44 . . . . . . . . . . . 21 23 26
5 ............. 4 5- 45 . . . . . . . . . . . 21 24 26
6 ............. 5 6 46 . . . . . . . . . . . 22 24 27
7 ............. 5 6 47 . . . . . . . . . . . 22 24 27
8 ............. 6 7 48 . . . . . . . . . . . 22 25 27
9 ............. 6 7 49 . . . . . . . . . . . 23 25 28
10 ............. 7 8 50 . . . . . . . . . . . 23 26 28
11 ............. 7 8 10 52 ........... 24 26 29
12 ............. 8 9 10 54 ........... 25 27 30
13 ............. 8 9 11 56 ........... 26 28 31
14 ............. 9 10 11 58 ........... 26 29 32
15 ............. 9 10 12 60 ........... 27 30 33
16 ............. 9 11 12
17 . . . . . . . . . . . . . 10 11 13 62 ........... 28 30 33
18 ............. 10 12 13 64 ........... 29 31 34
19 ............ 11 13 14 66 ........... 29 32 35
20 . . . . . . . . . . . 11 13 14 68 ........... 30 33 36
70 ........... 31 34 37
21 . . . . . . . . . . . 12 13 15
22 . . . . . . . . . . . 12 14 15 72 ........... 32 34 38
23 ............. 12 14 16 74 ........... 32 35 39
24 ............. 13 15 16 76 ........... 33 36 39
25 ............. 13 15 17 78 ........... 34 37 40
26 ............. 14 15 17 80 . . . . . . . . . . . 35 38 41
27 ............. 14 16 18
28 ............. 15 16 18 82 . . . . . . . . . . . 35 38 42
29 . . . . . . . . . . . . . 15 17 19 84 . . . . . . . . . . . 36 39 43
30 ............. 15 17 19 86 . . . . . . . . . . . 37 40 44
88 . . . . . . . . . . . 38 41 44
31 ............. 16 18 20 90 . . . . . . . . . . . 38 42 45
32 ............. 16 18 20
33 ............. 17 18 21 92 ........... 39 42 46
34 ............. 17 19 21 94 ........... 40 43 47
35 ............ 17 19 22 96 ........... 41 44 48
36 . . . . . . . . . . . . . 18 20 22 98 ........... 41 45 48
37 . . . . . . . . . . . . . 18 20 22 100 . . . . . . . . . . . 42 46 49
38 . . . . . . . . . . . . . 19 21 23
39 . . . . . . . . . . . . . 19 21 23
40 ............. 19 21 24
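
The tabled minimums can be checked from the binomial distribution. The sketch below assumes that a judgment is correct by chance with probability 1/3 (as in a triangle test), an assumption that agrees with the entries shown above; the function name `min_correct` and its parameters are illustrative and do not come from the manual.

```python
from fractions import Fraction
from math import comb


def min_correct(n, alpha, p_chance=Fraction(1, 3)):
    """Smallest x with P(X >= x) <= alpha for X ~ Binomial(n, p_chance).

    Returns None when even n correct judgments out of n are not significant.
    p_chance = 1/3 is an assumption (chance of a correct pick among three samples).
    """
    alpha = Fraction(str(alpha))          # exact arithmetic avoids rounding at the cutoff
    tail = Fraction(0)                    # running upper-tail probability P(X >= x)
    best = None
    for x in range(n, -1, -1):
        tail += comb(n, x) * p_chance**x * (1 - p_chance) ** (n - x)
        if tail > alpha:
            break
        best = x
    return best


# Entries for 11 judgments at the 5, 1, and 0.1 percent levels
print([min_correct(11, a) for a in (0.05, 0.01, 0.001)])
```

For 11 judgments this reproduces the tabled 7, 8, and 10, and for 1 or 2 judgments it returns None, matching the blank entries.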
TABLE 9--Values of F-ratio significant at the 1 percent level.a

n2 \ n1        1      2      3      4      5      6      7      8      9     10     12     15     20     24     30     40     60    120      ∞
  1 .....   4052 4999.5   5403   5625   5764   5859   5928   5982   6022   6056   6106   6157   6209   6235   6261   6287   6313   6339   6366
  2 .....  98.50  99.00  99.17  99.25  99.30  99.33  99.36  99.37  99.39  99.40  99.42  99.43  99.45  99.46  99.47  99.47  99.48  99.49  99.50
  3 .....  34.12  30.82  29.46  28.71  28.24  27.91  27.67  27.49  27.35  27.23  27.05  26.87  26.69  26.60  26.50  26.41  26.32  26.22  26.13
  4 .....  21.20  18.00  16.69  15.98  15.52  15.21  14.98  14.80  14.66  14.55  14.37  14.20  14.02  13.93  13.84  13.75  13.65  13.56  13.46
  5 .....  16.26  13.27  12.06  11.39  10.97  10.67  10.46  10.29  10.16  10.05   9.89   9.72   9.55   9.47   9.38   9.29   9.20   9.11   9.02
  6 .....  13.75  10.92   9.78   9.15   8.75   8.47   8.26   8.10   7.98   7.87   7.72   7.56   7.40   7.31   7.23   7.14   7.06   6.97   6.88
  7 .....  12.25   9.55   8.45   7.85   7.46   7.19   6.99   6.84   6.72   6.62   6.47   6.31   6.16   6.07   5.99   5.91   5.82   5.74   5.65
  8 .....  11.26   8.65   7.59   7.01   6.63   6.37   6.18   6.03   5.91   5.81   5.67   5.52   5.36   5.28   5.20   5.12   5.03   4.95   4.86
  9 .....  10.56   8.02   6.99   6.42   6.06   5.80   5.61   5.47   5.35   5.26   5.11   4.96   4.81   4.73   4.65   4.57   4.48   4.40   4.31
 10 .....  10.04   7.56   6.55   5.99   5.64   5.39   5.20   5.06   4.94   4.85   4.71   4.56   4.41   4.33   4.25   4.17   4.08   4.00   3.91
 11 .....   9.65   7.21   6.22   5.67   5.32   5.07   4.89   4.74   4.63   4.54   4.40   4.25   4.10   4.02   3.94   3.86   3.78   3.69   3.60
 12 .....   9.33   6.93   5.95   5.41   5.06   4.82   4.64   4.50   4.39   4.30   4.16   4.01   3.86   3.78   3.70   3.62   3.54   3.45   3.36
 13 .....   9.07   6.70   5.74   5.21   4.86   4.62   4.44   4.30   4.19   4.10   3.96   3.82   3.66   3.59   3.51   3.43   3.34   3.25   3.17
 14 .....   8.86   6.51   5.56   5.04   4.69   4.46   4.28   4.14   4.03   3.94   3.80   3.66   3.51   3.43   3.35   3.27   3.18   3.09   3.00
 15 .....   8.68   6.36   5.42   4.89   4.56   4.32   4.14   4.00   3.89   3.80   3.67   3.52   3.37   3.29   3.21   3.13   3.05   2.96   2.87
 16 .....   8.53   6.23   5.29   4.77   4.44   4.20   4.03   3.89   3.78   3.69   3.55   3.41   3.26   3.18   3.10   3.02   2.93   2.84   2.75
 17 .....   8.40   6.11   5.18   4.67   4.34   4.10   3.93   3.79   3.68   3.59   3.46   3.31   3.16   3.08   3.00   2.92   2.83   2.75   2.65
 18 .....   8.29   6.01   5.09   4.58   4.25   4.01   3.84   3.71   3.60   3.51   3.37   3.23   3.08   3.00   2.92   2.84   2.75   2.66   2.57
 19 .....   8.18   5.93   5.01   4.50   4.17   3.94   3.77   3.63   3.52   3.43   3.30   3.15   3.00   2.92   2.84   2.76   2.67   2.58   2.49
 20 .....   8.10   5.85   4.94   4.43   4.10   3.87   3.70   3.56   3.46   3.37   3.23   3.09   2.94   2.86   2.78   2.69   2.61   2.52   2.42
 21 .....   8.02   5.78   4.87   4.37   4.04   3.81   3.64   3.51   3.40   3.31   3.17   3.03   2.88   2.80   2.72   2.64   2.55   2.46   2.36
 22 .....   7.95   5.72   4.82   4.31   3.99   3.76   3.59   3.45   3.35   3.26   3.12   2.98   2.83   2.75   2.67   2.58   2.50   2.40   2.31
 23 .....   7.88   5.66   4.76   4.26   3.94   3.71   3.54   3.41   3.30   3.21   3.07   2.93   2.78   2.70   2.62   2.54   2.45   2.35   2.26
 24 .....   7.82   5.61   4.72   4.22   3.90   3.67   3.50   3.36   3.26   3.17   3.03   2.89   2.74   2.66   2.58   2.49   2.40   2.31   2.21
 25 .....   7.77   5.57   4.68   4.18   3.85   3.63   3.46   3.32   3.22   3.13   2.99   2.85   2.70   2.62   2.54   2.45   2.36   2.27   2.17
 26 .....   7.72   5.53   4.64   4.14   3.82   3.59   3.42   3.29   3.18   3.09   2.96   2.81   2.66   2.58   2.50   2.42   2.33   2.23   2.13
 27 .....   7.68   5.49   4.60   4.11   3.78   3.56   3.39   3.26   3.15   3.06   2.93   2.78   2.63   2.55   2.47   2.38   2.29   2.20   2.10
 28 .....   7.64   5.45   4.57   4.07   3.75   3.53   3.36   3.23   3.12   3.03   2.90   2.75   2.60   2.52   2.44   2.35   2.26   2.17   2.06
 29 .....   7.60   5.42   4.54   4.04   3.73   3.50   3.33   3.20   3.09   3.00   2.87   2.73   2.57   2.49   2.41   2.33   2.23   2.14   2.03
 30 .....   7.56   5.39   4.51   4.02   3.70   3.47   3.30   3.17   3.07   2.98   2.84   2.70   2.55   2.47   2.39   2.30   2.21   2.11   2.01
 40 .....   7.31   5.18   4.31   3.83   3.51   3.29   3.12   2.99   2.89   2.80   2.66   2.52   2.37   2.29   2.20   2.11   2.02   1.92   1.80
 60 .....   7.08   4.98   4.13   3.65   3.34   3.12   2.95   2.82   2.72   2.63   2.50   2.35   2.20   2.12   2.03   1.94   1.84   1.73   1.60
120 .....   6.85   4.79   3.95   3.48   3.17   2.96   2.79   2.66   2.56   2.47   2.34   2.19   2.03   1.95   1.86   1.76   1.66   1.53   1.38
  ∞ .....   6.63   4.61   3.78   3.32   3.02   2.80   2.64   2.51   2.41   2.32   2.18   2.04   1.88   1.79   1.70   1.59   1.47   1.32   1.00

a Adapted with permission from Biometrika Tables for Statisticians, 2nd ed., Vol. I, Pearson, E. S. and Hartley, H. O., eds., Copyright 1958, Cambridge University Press.
b n1 = degrees of freedom for numerator.
c n2 = degrees of freedom for denominator.
TABLE 12--Table of t for one-sided comparisons between p treatment means
and a control for a joint confidence coefficient of P = .95 and P = .99 (significance
levels of 5 and 1 percent).*

                      Number of Treatment Means Excluding Control
Error, df     P         1       2       3       4
 15 ......   .95      1.75    2.07    2.24    2.36
             .99      2.60    2.91    3.08    3.20
 16 ......   .95      1.75    2.06    2.23    2.34
             .99      2.58    2.88    3.05    3.17
 17 ......   .95      1.74    2.05    2.22    2.33
             .99      2.57    2.86    3.03    3.14
 18 ......   .95      1.73    2.04    2.21    2.32
             .99      2.55    2.84    3.01    3.12
 19 ......   .95      1.73    2.03    2.20    2.31
             .99      2.54    2.83    2.99    3.10
 20 ......   .95      1.72    2.03    2.19    2.30
             .99      2.53    2.81    2.97    3.08
 24 ......   .95      1.71    2.01    2.17    2.28
             .99      2.49    2.77    2.92    3.03
 30 ......   .95      1.70    1.99    2.15    2.25
             .99      2.46    2.72    2.87    2.97
 40 ......   .95      1.68    1.97    2.13    2.23
             .99      2.42    2.68    2.82    2.92
 60 ......   .95      1.67    1.95    2.10    2.21
             .99      2.39    2.64    2.78    2.87
120 ......   .95      1.66    1.93    2.08    2.18
             .99      2.36    2.60    2.73    2.82
  ∞ ......   .95      1.64    1.92    2.06    2.16
             .99      2.33    2.56    2.68    2.77

* This table is adapted from: Dunnett, C. W., "A Multiple Comparison Procedure
for Comparing Several Treatment Means with a Control," Journal of the
American Statistical Association, Vol. 50, 1955, pp. 1096-1121.
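
As a rough illustration of how Table 12 might be applied, the sketch below compares each treatment mean with the control mean using the tabled one-sided critical t values at P = .95. Only three rows of the table are copied in. The function name `exceeds_control`, the panel means, and the standard error of the difference are hypothetical values chosen for the example, not data from the manual.

```python
# Excerpt of Table 12 (one-sided, P = .95):
# error df -> critical t for 1, 2, 3, or 4 treatment means excluding the control
DUNNETT_T_95 = {
    15: (1.75, 2.07, 2.24, 2.36),
    20: (1.72, 2.03, 2.19, 2.30),
    30: (1.70, 1.99, 2.15, 2.25),
}


def exceeds_control(treatment_means, control_mean, se_diff, error_df):
    """Flag each treatment whose mean is significantly above the control.

    se_diff is the standard error of the difference between a treatment
    mean and the control mean, assumed equal for every treatment.
    """
    critical_t = DUNNETT_T_95[error_df][len(treatment_means) - 1]
    return [(mean - control_mean) / se_diff > critical_t
            for mean in treatment_means]


# Hypothetical scores: three treatments, control mean 5.0, SE of difference 0.5, 20 error df
print(exceeds_control([6.2, 5.1, 7.0], 5.0, 0.5, 20))
```

With these made-up numbers the t values are 2.4, 0.2, and 4.0 against a critical value of 2.19, so the first and third treatments are flagged.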
Cited References
[1] Peryam, D. R. and Swartz, V. W., "Measurement of Sensory Differences," Food
Technology, Vol. 5, 1950, pp. 207-210.
[2] Dawson, E. H., Brogdon, J. L., and McManus, S., "Sensory Testing of Differences
in Taste; I. Methods, II. Selection of Panel Members," Food Technology, Vol. 17,
1963, pp. 1125-1132; 1251-1256.
[3] Girardot, N. F., Peryam, D. R., and Shapiro, R. S., "Selection of Sensory Testing
Panels," Food Technology, Vol. 6, 1952, pp. 140-143.
[4] Kramer, A. et al, "Studies in Taste Panel Methodology," Journal of Agricultural
and Food Chemistry, Vol. 9, 1961, pp. 224-228.
[5] Bennett, G. B., Spahr, M., and Dodds, M. L., "The Value of Training a Sensory
Test Panel," Food Technology, Vol. 10, 1956, p. 205.
[6] Peryam, D. R., "Sensory Difference Tests," Food Technology, Vol. 12, 1958, pp.
231-236.
[7] Brandt, D. A. and Hutchinson, E. P., "Retention of Taste Sensitivity," Food
Technology, Vol. 10, 1956, pp. 419 and 420.
[8] Laue, E. A., Ishler, N. H., and Bullman, G. A., "Reliability of Taste Testing and
Consumer Testing Methods: Fatigue in Taste Testing," Food Technology, Vol. 8,
1954, p. 389.
[9] Cartwright, L. C., Snell, C. T., and Kelley, P. H., "Organoleptic Panel Testing as
a Research Tool," Analytical Chemistry, Vol. 24, 1952, pp. 503-506.
[10] Kroll, B. J. and Pilgrim, F. J., "Sensory Evaluation of Accessory Foods With
and Without Carriers," Journal of Food Science, Vol. 26, 1961, pp. 122-124.
[11] Sjöström, L. B. and Sullivan, F., "Testing of Polymers," Vol. 1, Chapter II, Inter-
science, 1965, pp. 371-373.
[12] Eindhoven, J. et al., "Effects of Sample Sequence on Food Preferences," Food
Research, Vol. 21, 1964, p. 534.
[13] Kamenetzky, J., "Contrast and Convergence Effects in Ratings of Foods," Journal
of Applied Psychology, Vol. 43, 1959, pp. 47-52.
[14] Gridgeman, N. T., "Group Size in Taste Sorting Trials," Food Research, Vol. 21,
1956, p. 534.
[15] Pilgrim, F. J. and Wood, K. R., "Comparative Sensitivity of Rating Scale and
Paired Comparison Methods for Measuring Consumer Preference," Food Tech-
nology, Vol. 9, 1955, pp. 385-387.
[16] Guilford, J. P., Psychometric Methods, 2nd ed., McGraw-Hill, New York, 1954.
[17] Stevens, S. S., "The Surprising Simplicity of Sensory Metrics," American Psycholo-
gist, Vol. 17, 1962, pp. 20-39.
[18] Byer, A. J. and Abrams, D., "A Comparison of the Triangular and 2-Sample Taste
Test Methods," Food Technology, Vol. 7, 1953, p. 185.
[19] Mitchell, J. W., "Duration of Sensitivity in Trio Taste Testing," Food Technology,
Vol. 10, 1956, pp. 169-171; 201-203; 218-220.
[20] Pilgrim, F. J., Schutz, H. G., and Peryam, D. R., "Influence of Monosodium
Glutamate on Taste Perception," Food Research, Vol. 20, 1955, pp. 310-314.
[21] Peryam, D. R. and Pilgrim, F. J., "Hedonic Scale Method of Measuring Food
Preferences," Food Technology, Vol. 11, No. 9, 1957, pp. 9-14.
[22] Brandt, D. A., "Quality Control in the Distilled Spirits Industry," Laboratory Prac-
tice, Vol. 13, No. 8, 1964, p. 717.
[23] Cairncross, S. E. and Sj6str6m, L. B., "Flavor Profile--A New Approach to Flavor
Problems," Food Technology, Vol. 4, 1950, pp. 308-311.
[24] Caul, J. F., "The Profile Method of Flavor Analysis," Advances in Food Research,
Vol. 7, 1956, pp. 1-40.
[25] Hasselstrom, T. et al, "Reaimed Sensory Testing Aids Food Analysis, Formula-
tion," Food Engineering, Aug. 1956.
[26] Schutz, H. G., "A Food Action Rating Scale for Measuring Food Acceptance,"
Journal of Food Science, Vol. 30, 1965, pp. 365-374.
[27] Mahoney, C. H., Stier, H. L., and Crosby, E. A., "Evaluating Flavor Differences
in Canned Foods," Food Technology, Vol. 11, 1957, p. 29.
[28] Gregson, R. A. M., "Bias in the Measurement of Food Preference by Triangular
Tests," Occupational Psychology, Vol. 34, 1960, pp. 249-257.
[29] Peryam, D. R. et al, "New Flavor Evaluation Method," Food Engineering, Vol.
23, No. 8, 1951, pp. 83-86.
[30] Cochran, W. G. and Cox, G. M., Experimental Designs, 2nd ed., Wiley, New York,
1950.
[31] Roessler, E. B., Baker, G. A., and Amerine, M. A., "1-Tailed and 2-Tailed Tests
in Organoleptic Comparisons," Food Research, Vol. 21, 1956, p. 117.
[32] Kramer, A., "A Rapid Method for Determining Significance of Differences from
Rank Sums," Food Technology, Vol. 14, 1960, p. 576.
[33] Friedman, M., "The Use of Ranks to Avoid the Assumption of Normality Im-
plicit in the Analysis of Variance," Journal of the American Statistical Association,
Vol. 32, 1937, p. 675.
[34] Scheffé, H., "An Analysis of Variance for Paired Comparisons," Journal of the
American Statistical Assn., Vol. 47, 1952, pp. 381-400.
[35] Snedecor, G. W., Statistical Methods, 5th ed., Iowa State College Press, Ames,
Iowa, 1956.
[36] Duncan, D. B., "A Significance Test for Differences Between Ranked Treatments
in an Analysis of Variance," Virginia Journal of Science, Vol. 2, 1951, pp. 171-189.
[37] Dunnett, C. W., "A Multiple Comparison Procedure for Comparing Several
Treatments with a Control," Journal of the American Statistical Association, Vol. 50, 1955,
pp. 1096-1121.
Suggested References
Gridgeman, N. T., "A Comparison of Some Taste Test Methods," Journal of Food
Science, Vol. 26, 1961, pp. 171-177.
Harper, R., "Psychological Aspects of Food Acceptance," Advancement of Science,
Vol. 13, 1957, pp. 297-299.
Tilgner, D. J., "Dilution Tests for Odor and Flavor Analysis," Food Technology,
Vol. 16, 1962, p. 26.
Amerine, M. A., Pangborn, R. M., and Roessler, E. B., Principles of Sensory
Evaluation of Food, Academic Press, New York, 1965.
Dawson, E. H. and Harris, B. L., "Sensory Methods for Measuring Differences in
Food Quality," Bulletin 34, 1951, U.S. Department of Agriculture, Washington,
D.C.
Ellis, B. H., Guide Book for Sensory Testing, Continental Can Co., Chicago, 1966.
Geldard, F. A., The Human Senses, Wiley, New York, 1953.
IFT Committee on Sensory Evaluation, "Sensory Testing Guide for Panel Evalua-
tion of Foods and Beverages," Food Technology, Vol. 18, 1964, pp. 1135-1141.
Krum, J. K., "Truest Evaluations in Sensory Panel Testing," Food Engineering,
1955, pp. 74-83.
Little, A. D., Inc., Flavor Research and Food Acceptance, Reinhold, New York,
1958.
"Sensory Food Analysis," Laboratory Practice, Vol. 13, 1964, pp. 596-641;
700-738.
An Introduction to Taste Testing of Foods, Merck & Co., Rahway, N.J., 1963.
Pangborn, R. M., "Sensory Evaluation of Foods: A Look Backward and Forward,"
Food Technology, Vol. 18, No. 9, 1964, pp. 63-67.