
Timbre Space as a Musical Control Structure

David L. Wessel
IRCAM
31, Rue St. Merri
75004 Paris, France

Introduction
Research on musical timbre typically seeks representations of the perceptual structure inherent in a set of sounds that have implications for expressive control over the sounds in composition and performance. With digital analysis-based sound synthesis and with experiments on tone quality perception, we can obtain representations of sounds that suggest ways to provide low-dimensional control over their perceptually important properties.

In this paper, we will describe a system for taking subjective measures of perceptual contrast between sound objects and using this data as input to some computer programs. The computer programs use multidimensional scaling algorithms to generate geometric representations from the input data. In the timbral spaces that result from the scaling programs, the various tones can be represented as points and a good statistical relationship can be sought between the distances in the space and the contrast judgments between the corresponding tones. The spatial representation is given a psychoacoustical interpretation by relating its dimensions to the acoustical properties of the tones. Controls are then applied directly to these properties in synthesis. The control schemes to be described are for additive synthesis and allow for the manipulation of the evolving spectral energy distribution and various temporal features of the tones. Tests of the control schemes have been carried out in musical contexts.* Particular emphasis will be given here to the construction of melodic lines in which the timbre is manipulated on a note-to-note basis. Implications for the design of human control interfaces and of software for real-time digital sound synthesizers will be discussed.

* Taped examples of various timbral manipulation schemes discussed in this article are available from IRCAM; get cassette order information from IRCAM at the address above.

Musical Timbre

Timbre refers to the "color" or quality of sounds, and is typically divorced conceptually from pitch and loudness. Perceptual research on timbre has demonstrated that the spectral energy distribution and temporal variation in this distribution provide the acoustical determinants of our perception of sound quality (see (Grey: 1975) for a thorough review). With one notable exception (Erickson: 1975), music theorists have directed little attention towards the compositional control of timbre. The primary emphasis has been on harmony and counterpoint. The reason for this probably lies in the fact that most acoustical instruments provide for very accurate control over pitch but provide little in the way of compositionally specifiable manipulation of timbre. With the potential of electroacoustic instruments the situation is quite different. Indeed one can now think in terms of providing accurate specifications for, by way of example, sequences of notes that change timbre one after another. It is this note-to-note change of timbre that will concern us in this paper.
D. Wessel: Timbre Space as a Musical Control Structure Page 45
Synthesis Technology

Digital technology offers powerful, general, and flexible sound synthesizers. A number of such synthesis machines have already been constructed and are producing musical results. Notable examples include the digital synthesis and processing system designed by Peter Samson (1977) now in operation at the Stanford Center for Computer Research in Music and Acoustics; the 256 digital oscillator bank designed and constructed by G. DiGiugno (1976) and the digital synthesizer of Hal Alles and DiGiugno (1977), both in operation at IRCAM; the Alles (1977a, b) synthesizer at Bell Labs; and the Dartmouth digital synthesizer (Alonso et al.: 1975). Some of these devices offer the alluring possibility of using a "brute force" additive approach to the synthesis of complex and musically rich time-variant spectra. In this report we will concentrate on this form of additive synthesis, because of its generality, and on the accompanying problem of providing direct control over the perceptual properties of the synthesized sounds.

Before beginning a description of a procedure for developing controls that could facilitate the musically expressive manipulation of complex time-variant spectra, we will examine the nature of both the acoustical and perceptual databases involved. Additive synthesis requires a considerable if not overwhelming amount of explicit information, and we shall explore ways to reduce this quantity of data without sacrificing richness in the sonic result. On the other hand, the data that we can obtain about our perceptual experience of timbre has quite a different character from the physical data of acoustics, and so we shall also examine such notions as subjective scales, perceptual dimensions, and structural representations of subjective data. We shall then see the extent to which we can give an account of the subjective experience by examining the relationship between the acoustical and the perceptual databases. We seek a psychoacoustics of timbre that has implications for timbral control in musical contexts.

Additive Synthesis and Possibilities for Data Reduction

In the additive model for sound synthesis, a tone is represented by the sum of sinusoidal components, each of which has time-varying amplitude and frequency. Moorer (1977) gives an excellent account of the details. To synthesize a sound, one specifies a number of software or hardware sinusoidal oscillators, each with its own amplitude and frequency control envelopes. Additive synthesis of this form has two important advantages. First, it is general, that is, with a sufficient number of independently controllable oscillators a very large and highly varied class of signals can be generated. At some sacrifice in computational efficiency and with an increase in the quantity of data for specifying the envelopes, one can mimic FM synthesis (Chowning: 1973) and other non-linear techniques (Arfib: 1977; Beauchamp: 1975; Le Brun: 1979; Moorer: 1976). Of course, one can produce effects not possible with these techniques. Second, one can analyze existing sounds and obtain data that can be used to resynthesize exactly the signal that was analyzed. Moorer (1976) has described the application of the phase vocoder (Portnoff: 1976) to the analysis and synthesis of musical sounds. The phase vocoder is an advance over methods like the heterodyne filter (Beauchamp: 1969; Moorer: 1975) in that the analysis does not have to be pitch synchronous nor do the waveforms have to contain more or less harmonic components, thus permitting the analysis of tones with pitch variation as well as inharmonic tones like those produced by percussion instruments. The method also guarantees that when the analysis data is not modified, the original signal is recovered exactly.

Let us say that we want to synthesize, using data from phase vocoder analysis, a musical instrument timbre with 25 harmonics. In this case 25 amplitude and 25 frequency envelopes will be required. Storing these functions in full detail demands considerable memory, and if we are using a computer-controlled digital synthesizer like those mentioned at the start of this article, then transfer of the envelopes may exceed the bandwidth of the link between the computer and the synthesizer. Furthermore, if the shapes of the envelope functions are to be modified, the computation time required to rescale every point of the functions may easily exceed the capabilities of real-time manipulation. Clearly some form of data reduction is required, if one wants to work in real time.

A particularly attractive procedure that produces a significant reduction in the quantity of data involved is to approximate curvilinear envelope functions with functions composed of a series of straight line segments (such as those given in (Moorer: 1977)). Such straight-line-segment approximations can be stored in terms of the coordinates of the break points of the function, thus greatly reducing memory demands. Furthermore, the digital synthesizers constructed by Alles, DiGiugno, and Samson provide digital oscillators that include straight-line ramp controls for both amplitude and frequency. In working with these oscillators, one provides as data the starting value for a ramp, its slope, and a terminating value or time that indicates when a new slope is required. The actual generation of the values along the specified line segments is provided within the oscillator itself. In supplying control data from the computer's memory to these synthesizers, the break points of the line-segment approximations can be passed directly from the computer to the synthesizer, thus greatly reducing the data rate demands on the interface. Finally, the straight-line-segment approximations make possible rapid modification of the function shapes, as only the coordinates of the break points need be modified.

But can we get away with such a drastic data reduction? If we approximate curvilinear functions with a small number of connected straight lines, will high audio quality and timbral richness be maintained?

Indications that the straight-line-segment approximations would provide satisfactory results have been obtained for brass tones described in (Risset and Mathews: 1969) and by (Beauchamp: 1969). Grey (1975) carried out a carefully controlled perceptual discrimination experiment to determine the extent to which the tones with completely detailed amplitude and frequency functions could be discriminated from those with line-segment approximations consisting of from five to seven segments per envelope (also Grey and Moorer: 1977). Grey used 16 different orchestral instrument tones, and in general he found the discriminations to be extremely difficult. His findings strongly suggest that it is not necessary to retain the highly complex temporal microstructure of the amplitude and frequency functions in order to preserve timbral quality. It would appear that the line-segment
Page 46 Computer Music Journal, Box E, Menlo Park, CA 94025 Volume 3 Number 2
approximations can be made with little harm. Besides the resulting data reduction, an important advantage of such approximations is that the resulting tones have more clearly defined acoustical properties. This will prove especially important when we wish to determine those physical properties that are especially important for perception.

Representation of Timbre Dissimilarities

In the next sections we will first describe a method for characterizing the structure of the relationships inherent in a set of sounds differing in timbre. We will then show how such representations of the timbres can be related to the underlying acoustical properties of the tones. We will also show that the representations can be used to compose timbral patterns with perceptual properties predictable from the structure of the representation. Finally it will be argued that when this is accomplished we have in some sense designed a systematic control scheme for the perceptually important acoustic properties of the sounds.

From a quantitative point of view, data from subjective judgments has a peculiar if not uncertain status (Luce: 1972). The notion of a unit of measurement such as the decibel or hertz is difficult if not impossible to establish for subjective scales. We can of course choose units for subjective scales like "sones" or "mels" as Stevens (1959) has done, but the so-called unit derived in one experimental context fails to remain fixed in other contexts. In fact, the "sone," the unit for subjective loudness, is not invariant across the two ears of individuals with normal hearing (Levelt, et al.: 1972). Such units are useful in that they provide a common language for discussing the auditory abilities of a population of listeners, but they cannot justifiably be treated with the algebra of dimensional analysis that underlies measurement in the physical sciences. I think it right to be pessimistic about the possibility of subjective scales being elevated to the same form of measurement as physical measurement. But subjective judgments, if collected over a sufficient number of objects, in this case sounds, can have a representable structure, and this structure can in turn be related to various acoustical parameters.

Perceptual judgments tend to be relative in nature. With few exceptions, we tend to judge an object in terms of the relationships it has with other objects. Relational judgments are of great interest in music, because music involves patterns composed of a variety of sounds, and it is the relational structure within and between the patterns that is of primary importance. Judgments of the extent of perceptual similarity or dissimilarity between two sounds can be made in a very intuitive fashion. One can say that sound A is more similar to sound B than to sound C without having to name or otherwise identify explicitly the attributes that were involved in the judgment. Research groups at IRCAM, at Michigan State University, and at the Stanford Center for Research in Music and Acoustics have been using perceptual dissimilarity judgments in a variety of musical and otherwise audio-related contexts. One of the general techniques is to represent the perceptual dissimilarities as distances in a spatial configuration. One begins with a set of dissimilarity judgments typically taken for all pairs of sounds that can be formed from the set. This matrix of dissimilarities is then processed by one of a variety of multidimensional scaling programs such as KYST (Kruskal: 1964a, b). The multidimensional scaling programs produce an n-dimensional spatial arrangement of points that represent the various sound objects. The programs operate to maximize a goodness-of-fit function relating the distances between the points to the corresponding dissimilarity ratings between the sounds.

Perhaps at this point it would be best to show how the multidimensional scaling experiments were carried out. At IRCAM we have recently developed a set of programs that greatly facilitate the design, execution, and interpretation of such experiments. In the following example we use the same set of sounds used in both (Grey: 1975) and (Grey and Gordon: 1978) and presented in the series "Lexicon of Analyzed Tones" (Moorer, et al.: 1977, 1978). This set consists of 16 synthetic orchestral instrument timbres and a group of eight hybrid instrument timbres produced by exchanging spectral envelopes between members of the original set. The goal of our experiment will be to provide a representation of these 24 timbres as points in a Euclidean space as well as an interpretation of this representation in terms of the acoustical properties of the tones.

General Method

The procedure that provides an interpreted representation of the sounds involves the following five steps: 1. Selection of materials for study; 2. Collection of the dissimilarity judgments; 3. Representation of the dissimilarity judgments with spatial and other schemes such as clusters and graphs; 4. Psychoacoustical interpretation of the structure; 5. Verification of the interpretation in musical situations.

Selecting the Materials for Study

In preparing the sounds that we wish to represent, attention must be paid to (A) the number of sound elements, (B) the problem of equalizing the sounds with respect to parameters we wish to ignore, and (C) the range of variation within the set of sounds.

A. The Number of Sound Elements

To obtain a meaningful representation, a certain minimal number of sound elements is required, in order that the dissimilarities impose a sufficient amount of constraint for fixing accurately the locations of the points in a space. Some of the multidimensional scaling programs, like KYST, operate with only qualitative or "ordinal" constraints on the distances in the space. These algorithms seek arrangements of the points such that the rank order of the interpoint distances in the space matches, in terms of a well-defined goodness-of-fit measure, the rank order of the dissimilarities between the corresponding stimuli. If we begin with interpoint distances from a known configuration of points, the programs provide a very accurate recovery of the positions of the points even when the distances are subjected to radical monotonic transformations or perturbations due to random error (Shepard: 1966). It is hard to set down a precise rule of thumb for determining the minimum number of elements to use. The choice depends on the number of dimensions and the distribution of the points in the space, an issue to which we shall return below. In most psychological research using multidimensional scaling, 10 points have seemed sufficient for two dimensions and 15 for three. In our recent research we have tended to encourage the use of between 20 and 30 points for spaces of two and three dimensions, respectively.
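
The role of the "ordinal" constraints discussed above can be illustrated with a minimal sketch. It is not KYST and performs no scaling; it only shows, for a hypothetical five-point configuration with made-up coordinates, that a strongly nonlinear but monotonic distortion of the interpoint distances leaves their rank order unchanged. That rank order is the only information an ordinal scaling program needs to match. The function names and the particular distortion are assumptions chosen for illustration.

```python
import itertools
import math


def pairwise_distances(points):
    """Return {(i, j): Euclidean distance} for every pair with i < j."""
    return {
        (i, j): math.dist(points[i], points[j])
        for i, j in itertools.combinations(range(len(points)), 2)
    }


def rank_order(dissimilarities):
    """Return the pairs sorted from least to most dissimilar."""
    return sorted(dissimilarities, key=dissimilarities.get)


# Hypothetical two-dimensional configuration (illustrative values only).
points = [(0.0, 0.0), (1.0, 0.2), (0.3, 1.5), (2.0, 2.0), (1.2, 0.9)]
distances = pairwise_distances(points)

# A "radical" but strictly increasing distortion of the distances,
# standing in for an unknown relation between distance and judged dissimilarity.
distorted = {pair: math.log1p(d) ** 3 for pair, d in distances.items()}

n_pairs = len(distances)  # n(n - 1)/2 pairs for n points
same_order = rank_order(distances) == rank_order(distorted)
print(n_pairs, same_order)
```

With more points the number of pairwise order constraints grows much faster than the number of coordinates to be fixed, which is one way to read the recommendation to use more sound elements than the bare minimum.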
B. Equalizing the Tones with Respect to Extraneous Parameters

If possible, the tones should be equalized with respect to the properties that are not to influence the judgments. When studying timbre, the usual procedure is to equalize the pitch, subjective duration, loudness, and room information aspects of the tones. On the other hand, if we are studying room information (that is, reverberation structure), then we probably want to use a standard source and manipulate only the reverberation parameters. Attention should also be paid to just what is being equalized. If we are equalizing with respect to loudness, for example, and the tones in the set have different spectral shapes and attack rates, then simply matching sound pressure level (in terms of decibels) will not provide the appropriate equalization. In this case we are without a satisfactory model for the perception of the loudness of complex time-variant spectra (Moorer: 1975) and we must resort to making empirical matches. Grey (1975) provides a good example of such subjective matching procedures for pitch, duration, and loudness.

C. Controlling the Range of Variation Within the Set

The range of variation in the timbres of the sounds will certainly differ from one type of study to another. In some instances we will want to investigate a timbral domain having a broad range of variation including perhaps inharmonic percussion sounds as well as sounds with more or less harmonic components. In other situations more restrictions are imposed on the range of variation, as in, for example, the study to be described in this paper where we use only sounds derived from standard orchestral instruments played in a conventional manner. For even more refined investigations of timbral nuance one might use a very limited range of sounds.

Once a general range of variation has been determined, some attention must be paid to the homogeneity of variation within the set. Consider the following example. If we choose eight distinctively percussive timbres and eight distinctively non-percussion timbres but provide no linking elements between the two domains, then it is likely that all the subjective dissimilarities that are made for pairs that cut across the two classes of sounds are larger than all the intra-class dissimilarities. In situations like this, some of the multidimensional scaling programs give what are called degenerate solutions and the intra-class structure is not fully displayed. Shepard (1974) provides a detailed discussion of this problem and prospects for its solution. In addition, in such a situation I find that in making the judgments I have a difficult time concentrating on the relatively subtle differences for pairs within a class in the context of the much larger dissimilarities for the pairs spanning the two categories. Unfortunately, since we will most often deal with timbral domains about which we have little knowledge, clear rules for the preliminary selection of the variation in the material are hard to set down. Selection procedures depend ultimately on our specific interests and desires for control.

To facilitate the preliminary screening and selection of the sounds, we have developed an interactive random access audio playback program on IRCAM's DEC-10 computer. This program is called KEYS and was written by Bennett Smith (Wessel and Smith: 1977). One supplies KEYS with a list of the sounds written out as a list of the names of the files in which each sound element is stored. Each sound file in the list is then related to a character that can be typed on the terminal keyboard. The relationships between the characters and the sounds are listed on the computer terminal's CRT display. When a character is typed the corresponding sound file is played through the digital-to-analog converters and the playing action is indicated by an increase in the brightness of the character-filename entry in the table displayed on the CRT. This program permits rapid auditory comparisons among a large number of different sounds. Currently the limit is 100 files of arbitrary length.

Collecting the Timbre Dissimilarity Judgments

Though there exist a variety of ways to collect perceptual dissimilarities, we have found simple ratings of the extent of dissimilarity to be the most efficient and least tedious way to make the judgments. At IRCAM we have been using direct estimates of the dissimilarity between two tones. Our listening judge, using a program written by Bennett Smith called ESQUISSE, sits before a CRT display terminal and an audio system fed by the computer's digital-to-analog converters. The two sounds in a pair are related to the terminal keys "m" and "n", allowing the listener to play the sounds at will. After listening, the judge enters a rating with one of the keys "0" through "9" on the terminal. Immediately after the judgment is entered, the next pair of tones is ready to be played from the keyboard. The sequencing of the judgment trials is random, and all of the n(n - 1)/2 pairs are used, where n is the number of sounds under study. After all the pairs have been judged, the random sequence is unscrambled and a matrix of dissimilarities is formed.

The data collection program includes what we call a "coffee-break" feature that allows the judgment session to be interrupted either by machine failure or human fatigue. The listener can then return to the experiment at a later time and continue from the point in the sequence where the interruption occurred. With this program, one is able to listen to the sounds rapidly and freely, and the judgments can be entered with ease.

A Two-dimensional Representation of 24 Orchestral Instrument Timbres

I served as a judge using the dissimilarity data collection program on a set of 24 orchestral instrument timbres that were obtained from John Grey. These sounds were synthesized using line segment envelopes and were equalized subjectively for pitch, loudness, and duration. A two-dimensional representation of the sounds provided by the KYST program is shown in Figure 1, along with an interpretation of the dimensions of this space. The vertical axis is related to the spectral energy distribution of the tones, and the horizontal, to the nature of the onset transient. The sounds at the top of the plot are bright in character, and as one moves towards the bottom the timbres become progressively more mellow. In a number of studies on timbre spaces (Weden and Goude: 1972; Wessel: 1973; Grey: 1975; Ehresman and Wessel: 1978; Grey and Gordon: 1978; Wessel and Grey: 1978) this dimension related to the spectral energy distribution has appeared. A consistent, quantitative, acoustical interpretation has also been provided by calculating an excitation pattern for the spectrum provided by Zwicker's model for loudness (Zwicker and Scharf: 1965). This transformation on the acoustical spectrum compensates for certain properties of the auditory system like critical bands and the asymmetric spread of masking from low to high frequencies. The centroid or mean of this compensated spectral energy distribution is then calculated and correlated with projections of the points on the axis assumed to be related to brightness. In all of the studies these correlations have been very high.

[Figure 1: plot of the 24 labeled points in the two-dimensional timbre space; the vertical axis is marked "BRIGHTER" and the horizontal axis "MORE BITE".]

FIGURE 1. Two-dimensional timbre space representation of 24 instrument-like sounds obtained from Grey. The space was produced by the KYST multidimensional scaling program from dissimilarity judgments made by Wessel. The lower-case letters identify the sounds as recorded on a cassette-tape prepared to accompany this paper and distributed by IRCAM. The upper-case subscripts at each point identify the tones as reported in (Grey: 1975, 1977), (Grey and Gordon: 1978) and (Gordon and Grey: 1978). The original analyzed tones upon which these synthesized versions are based are being presented in the Computer Music Journal series "Lexicon of Analyzed Tones."
Abbreviations for stimulus points: O1, O2 = oboes, FH = French horn, BN = bassoon, C1 = E-flat clarinet, C2 = bass clarinet, FL = flute, X1, X2, X3 = saxophones, TP = trumpet, EH = English horn, S1 = cello played sul ponticello, S2 = cello played normally, S3 = cello played muted sul tasto, FHZ = modified FH with spectral envelope, BNZ = modified BN with FH spectral envelope, S1Z = modified S1 with S2 spectral envelope, S2Z = modified S2 with S1 spectral envelope, TMZ = modified TM with TP spectral envelope, BCZ = modified C2 with O1 spectral envelope, O1Z = modified O1 with C2 spectral envelope.

The horizontal dimension is related to the quality of the "bite" in the attack. Possibilities for quantification of this dimension are discussed in the next section on timbre patterns formed in the space.

Predictions About Timbre Patterns

To a large extent, music consists of syntactic patterns. It is the nature of the relationships among the elements of the patterns that is of primary importance in their perception. In the next series of examples I would like to show that when note-to-note timbral changes are organized in terms of the timbre space which we just examined, then predictable perceptual organizations of timbre patterns can be obtained.

First we will examine some auditory effects that we can relate to the dimensions of the space and the distances spanned. In the following patterns the sequence of notes will alternate between two differing timbres, but otherwise the pitch sequence and rhythmic timing will remain fixed. The pitch sequence is the simple, repeating, three-note ascending line shown in Figure 2. The alternating timbre sequence is shown by the alternating notes marked respectively with "O" and "X". When the timbral distance between the adjacent notes is small, the repeating ascending pitch lines dominate our perception. However, when the timbre difference is enlarged along the "spectral energy distribution" axis, the perceptual organization of the pattern is radically altered. The line now splits at the wide timbral intervals and for many listeners two interwoven descending lines are formed, each with its own timbral identity. This type of effect is called "melodic fission" or "auditory stream segregation" in the psychoacoustic literature (Bregman and Campbell: 1971; Van Noorden: 1975) and is a consequence of the large spectral energy distribution difference between the alternating timbres.

FIGURE 2. Ascending pitch patterns in "three" with two alternating timbres ("O" and "X"). If the timbral difference between adjacent notes is large, then one tends to perceive interleaved descending lines formed by the notes of the same timbral type.

A different effect is obtained by moving along the dimension we interpreted as relating to the onset characteristics of the sounds. When this is done we obtain a perceptually irregular rhythm even though the acoustical onset times of the notes are the same. This observation has some important implications for the control of sound in synthesis. When we alter the properties of the attack of the tone, we are also likely to influence the temporal location of the perceived onset of the tone. This lack of accord between physical onset time and subjective onset time has been observed with speech sounds by Morton, et al. (1975). Morton's experimental procedure offers the possibility of determining the relative perceived onset times for a set of notes. The procedure uses a simple ABAB...AB alternating sequence similar to those just described. The listener adjusts the shift in onset for all the B's in the sequence until the sequence is perceived as regular, and the temporal displacement in the physical onset
is then noted. Perhapswith the applicationof such a method
to musicaltimbresand with the employment of a good model
of auditory temporal integrationof complex time-varying
spectra,we will be able to predictmore preciselywhere the
perceivedonsets of tones with differing spectralevolutions
will occur. Nevertheless, both the fine tuning of rhythm
in music and psychoacoustic researchwill benefit greatly if
the control software of our synthesissystems allows easy and
flexible adjustment of physical onset times in complex
musicalcontexts.
Withthe previousexampleswe have furtherverifiedthe
interpretationof the timbre space and have demonstrated
that to some extent the propertiesof the space retain their
validity in richer musical situations. In the next example
involvingtimbralanalogiesI hope to demonstratethat other
propertiesof the geometry of the timbre space allow us to
make predictionsabout patternperceptionas well.
!

Timbral Analogies

Composers frequently make transpositionsof pitch


patterns.It seemednaturalto ask if transpositionsof timbral
sequenceswork as well. To get some preliminaryindications
regardingthis possibility David Ehresmanand I (Ehresman
and Wessel: 1978) tested a parallelogrammodel of analogies
developed by Rumelhartand Abramson(1973). The basic
idea is illustratedin Figure3. If we make a two-note timbral
pattern,the sequenceA,B in Figure3, and wish to make an
analogous(or, for our purposes,transposed)sequencebegin-
ning on timbreC, then we choose the note D that best com-
pletes a parallelogramin the space.
To test this idea we presentedlisteners with four dif-
ferent solutions to timbral analogiesof the form A -+ B
FIGURE3. Parallelogrammodel of timbre analogies.A + B
as C -> Di. They were asked to orderthe alternativeanalogies
is a given change in timbre; C → D is a desired timbral analogy, with C given. D is the ideal solution point. D1, D2, D3, and D4 are the actual solutions offered to the listeners.

indicating the best formed, the next best formed, and so forth. The idea was to check if the listeners' ranking of the goodness of the analogy would be inversely related to the distance between the various alternative solutions, the Di's, and the ideal solution point specified by the parallelogram in the timbre space. In constructing the analogy problems we chose the alternatives so they would fall at graded distances from the ideal solution point. We selected 40 different analogy problems from a timbre space similar to the one just described. Our laboratory computer system allowed us to synthesize and store the tones and then to automate and analyze the experiment. One important feature of the system was that the listeners had essentially random access to the four alternative formations of an analogy problem and were able to make rapid auditory comparisons among them.

Table 1 shows the results of this experiment for nine listeners who each ordered the solutions for 40 different analogy problems. The entries in the table indicate the proportion of times, averaged over listeners and analogies, for which the Ith closest alternative to the ideal analogy point was ranked as the Jth best solution, where I is the row index and J is the column index. Column 1 of this table shows that the prediction was indeed fulfilled. In fact, the distance between an alternative and the ideal analogy point predicts not only the best solution but the rank ordering of most of the alternatives. Though more research needs to be done, the notion of transposing a sequence of timbres by forming another sequence geometrically parallel to it in timbre space thus appears to be a reasonable and musically viable idea.

Table 1. Rank order data averaged over nine listeners and all 40 analogies. Cf. Figure 3.
Rows (I): rank distance of the alternative from the ideal solution. Columns (J): listener-assigned rank.

    I \ J      1       2       3       4
      1      .422    .303    .156    .119
      2      .322    .283    .217    .178
      3      .169    .267    .358    .206
      4      .086    .147    .269    .497

The idea of transposing timbral patterns suggests another procedure for representing important perceptual relationships. Using a technique called simultaneous linear equation scaling (Carroll and Chang: 1972) one can derive distance estimates directly from the analogy quality judgments and then use the standard multidimensional scaling algorithms on these distances. Though we have yet to try this scheme, it seems promising.
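The parallelogram prediction tested in this experiment can be illustrated with a minimal sketch. It assumes a two-dimensional timbre space with Euclidean distance. The coordinates, the function names `ideal_analogy_point` and `rank_alternatives`, and the four alternatives are hypothetical and are not taken from the experiment.

```python
import math


def ideal_analogy_point(a, b, c):
    """Complete the parallelogram A -> B :: C -> D by returning D = C + (B - A)."""
    return tuple(ci + (bi - ai) for ai, bi, ci in zip(a, b, c))


def rank_alternatives(ideal, alternatives):
    """Return alternative names ordered from nearest to farthest from the ideal point.

    Euclidean distance is an assumption made for this sketch.
    """
    return sorted(alternatives, key=lambda name: math.dist(ideal, alternatives[name]))


# Hypothetical coordinates of three tones in a 2-D timbre space.
a, b, c = (0.0, 0.0), (1.0, 2.0), (3.0, 1.0)
ideal = ideal_analogy_point(a, b, c)

# Hypothetical alternative solutions placed at graded distances from the ideal point.
alternatives = {
    "D1": (4.2, 3.1),
    "D2": (3.0, 3.0),
    "D3": (5.5, 2.0),
    "D4": (1.0, 5.0),
}
predicted_order = rank_alternatives(ideal, alternatives)
print(ideal, predicted_order)
```

Under the parallelogram model, `predicted_order` is the ranking one would expect listeners to give most often. Table 1 reports how frequently each predicted rank I coincided with each listener-assigned rank J.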
Page 50 Computer Music Journal, Box E, Menlo Park, CA 94025 Volume 3 Number 2
Designing Control Systems from the Perceptual Representations

The timbre space representation suggests relatively straightforward schemes for controlling timbre. The basic idea is that by specifying coordinates in a particular timbre space, one could hear the timbre represented by those coordinates. If these coordinates should fall between existing tones in the space, we would want this interpolated timbre to relate to the other sounds in a manner consistent with the structure of the space. Evidence that such interpolated sounds are consistent with the geometry of the space has been provided by Grey (1975). Grey used selected pairs of sounds from his timbre space and formed sequences of interpolated sounds by modifying the envelope break points of the two sounds with a simple linear interpolation scheme. These interpolated sequences of sounds were perceptually smooth and did not exhibit abrupt changes in timbre. Members of the original set of sounds and the newly created interpolated timbres were then used in a dissimilarity judgment experiment to determine a new timbre space. This new space had essentially the same structure as the original space, with the interpolated tones appropriately located between the sounds used to construct them. It would appear from these results that the regions between the existing sounds in the space can be filled out, and that smooth, finely graded timbral transitions can be formed.

The most natural way to move about in the timbral space would be to attach the handles of control directly to the dimensions of the space. I examined such a control scheme in a real-time context (Wessel: 1976). A two-dimensional timbre space was represented on the graphics terminal of the computer that controlled the DiGiugno oscillator bank at IRCAM. One dimension of this space was used to manipulate the shape of the spectral energy distribution. This was accomplished by appropriately scaling the line segment amplitude envelopes according to a shaping function. The other axis of the space was used to control either the attack rate or the extent of synchronicity among the various components. Overall, the timbral trajectories in these spaces were smooth and otherwise perceptually well-behaved. To facilitate more complex forms of control, an efficient computer language for dealing with envelopes is needed. Grey and his colleagues at Stanford have developed a language for this purpose (Kahrs: 1977), and our group at IRCAM is making a similar effort. The basic idea behind such a language is to provide a flexible control structure that permits specification, sequencing, and combination of various procedures that create and modify envelopes. These procedures would include operations like stretching or shortening duration, changing pitch, reshaping spectrum, synchronizing or desynchronizing spectral components, and so forth. With such a language it will be possible to tie the operations on the envelope collections directly to the properties of the perceptual representations of the material.

Acknowledgments

I would like to thank John Grey and John Gordon for their tones and comments, and Gerald Bennett, Andy Moorer, Wayne Slawson, and John Strawn for their comments and critical reading of the manuscript.

References

Alles, H. G. (1977a) "A Portable Digital Sound Synthesis System." Computer Music Journal, Vol. 1, No. 4, pp. 5-6.
Alles, H. G. (1977b) "A Modular Approach to Building Large Digital Synthesis Systems." Computer Music Journal, Vol. 1, No. 4, pp. 10-13.
Alles, H. G., and DiGiugno, G. (1977c) "A One Card, 64-Channel Digital Synthesizer." Computer Music Journal, Vol. 1, No. 4, pp. 7-9.
Alonso, S., Appleton, J. H., Jones, C. (1975) "A Special-Purpose Digital System for the Instruction, Composition, and Performance of Music." Proc. of the 1975 Conference on Computers in Undergraduate Curricula, 6, 17-22.
Arfib, D. (1977) "Digital Synthesis of Complex Spectra by Means of Non-Linear Distortion of Sine Waves and Amplitude Modulation." Paper presented at the 1977 International Computer Music Conference, Center for Music Experiment, University of California at San Diego, La Jolla.
Beauchamp, J. W. (1969) "A Computer System for Time-Variant Harmonic Analysis and Synthesis of Musical Tones," in Music by Computers, edited by H. von Foerster and J. W. Beauchamp. John Wiley and Sons, Inc., New York.
Beauchamp, J. (1975) "Analysis and Synthesis of Cornet Tones Using Nonlinear Interharmonic Relationships." Journal of the Audio Engineering Society, Vol. 23, pp. 778-795.
Bregman, A. S., and Campbell, J. (1971) "Primary Auditory Stream Segregation and Perception of Order in Rapid Sequences of Tones." Journal of Experimental Psychology, Vol. 89, pp. 244-249.
Carroll, J. D., and Chang, J. J. (1972) "Simules: Simultaneous Linear Equation Scaling." Proceedings, 80th Annual Convention of American Psychological Association.
Chowning, J. M. (1973) "The Synthesis of Complex Audio Spectra by Means of Frequency Modulation." Journal of the Audio Engineering Society, Vol. 21, pp. 526-534. Reprinted in Computer Music Journal, Vol. 1, No. 2, pp. 46-54, 1977.
DiGiugno, G. (1976) "A 256 Digital Oscillator Bank." Paper presented at the 1976 International Computer Music Conference, Massachusetts Institute of Technology, Cambridge.
Ehresman, D., and Wessel, D. L. (1978) "Perception of Timbral Analogies." IRCAM Technical Report No. 13.
Erickson, R. (1975) Sound Structure in Music. University of California Press, Berkeley and Los Angeles.
Grey, J. M. (1975) Exploration of Musical Timbre. Stanford Univ. Dept. of Music Tech. Rep. STAN-M-2.
Grey, J. M., and Moorer, J. A. (1977) "Perceptual Evaluations of Synthesized Musical Instrument Tones." Journal of the Acoustical Society of America, Vol. 62, pp. 454-62, 1977.
Grey, J. M. (1977) "Multidimensional Perceptual Scaling of Musical Timbre." Journal of the Acoustical Society of America, Vol. 61, pp. 1270-1277, 1977.
Gordon, J., and Grey, J. M. (1978) "Perceptual Effects of Spectral Modifications on Orchestral Instrument Tones." Computer Music Journal, Vol. 2, No. 1, pp. 24-31.
Kahrs, M. (1977) "A Computer Language for Psychoacoustic Study and Musical Control of Timbre." Paper presented at the 1977 International Computer Music Conference, Center for Music Experiment, University of California at San Diego, La Jolla.
Kruskal, J. B. (1964a) "Multidimensional Scaling by Optimizing Goodness of Fit to a Nonmetric Hypothesis." Psychometrika, Vol. 29, pp. 1-27.
Kruskal, J. B. (1964b) "Nonmetric Multidimensional Scaling: A Numerical Method." Psychometrika, Vol. 29, pp. 115-129.
Le Brun, M. (1979) "Waveshaping Synthesis," Journal of the Audio Engineering Society, Vol. 27, No. 4, pp. 250-266.
Levelt, W. J. M., Riemersma, J. B., and Bunt, A. A. (1972) "Binaural Additivity of Loudness," British Journal of Mathematical and Statistical Psychology, Vol. 25, pp. 51-68.
Luce, R. D. (1972) "What Sort of Measurement is Psychophysical Measurement?" American Psychologist, February, pp. 96-106.
Moorer, J. A. (1975) "On the Loudness of Complex, Time-Variant Tones." Stanford University, Department of Music, Tech Report STAN-M-4.
Moorer, J. A. (1976) "The Synthesis of Complex Audio Spectra by Means of Discrete Summation Formulae." Journal of the Audio Engineering Society, Vol. 24, pp. 717-727.
Moorer, J. A. (1976) "The Use of the Phase Vocoder in Computer Music Applications." Presented at the 55th Convention of the Audio Engineering Society; available as Preprint No. 1146 (El).
Moorer, J. A. (1977) "Signal Processing Aspects of Computer Music - A Survey." Proceedings of the IEEE, Vol. 65, No. 8, pp. 1108-1137, and Computer Music Journal, Vol. 1, No. , pp. 4-38.
Moorer, J. A., Grey, J. M., and Snell, J. M. (1977) "Lexicon of Analyzed Tones (Part I: Violin Tone)," Computer Music Journal, Vol. 1, No. 2, pp. 39-45.
Moorer, J. A., Grey, J. M., and Strawn, J. (1977) "Lexicon of Analyzed Tones (Part II: Clarinet and Oboe Tones)," Computer Music Journal, Vol. 1, No. 3, pp. 12-29.
Moorer, J. A., Grey, J. M., and Strawn, J. (1978) "Lexicon of Analyzed Tones (Part III: the Trumpet)," Computer Music Journal, Vol. 2, No. 2, pp. 23-31.
Morton, J., Marcus, S., and Frankish, C. (1976) "Perceptual Centers (P-Centers)." Psychological Review, Vol. 83, No. 5, pp. 405-408.
Portnoff, M. R. (1976) "Implementation of the Digital Phase Vocoder Using the Fast Fourier Transform." IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-24, pp. 243-248.
Risset, J. C., and Mathews, M. V. (1969) "Analysis of Musical Instrument Tones." Physics Today, Vol. 22, pp. 23-30.
Rumelhart, D. E., and Abrahamson, A. A. (1973) "Toward a Theory of Analogical Reasoning." Cognitive Psychology, Vol. 5, pp. 1-28.
Samson, Peter (1977) "Systems Concepts Digital Synthesizer Specifications." Available from Systems Concepts, 520 Third Street, San Francisco, California 94107.
Shepard, R. N. (1966) "Metric Structures in Ordinal Data." Journal of Mathematical Psychology, Vol. 3, pp. 287-315.
Shepard, R. N. (1972) "Psychological Representation of Speech Sounds," in Human Communication, edited by E. E. David and P. B. Denes. McGraw-Hill, New York.
Shepard, R. N. (1974) "Representations of Structure in Similarity Data: Problems and Prospects." Psychometrika, Vol. 39, pp. 373-421.
van Noorden, L. (1975) "Temporal Coherence in the Perception of Tone Sequences." Instituut voor Perceptie Onderzoek, Eindhoven, Holland.
Wedin, L., and Goude, G. (1972) "Dimension Analysis of the Perception of Instrumental Timbre." Scandinavian Journ. Psych., Vol. 13, pp. 228-240.
Wessel, D. L. (1973) "Psychoacoustics and Music: A Report from Michigan State University." PAGE: Bulletin of the Computer Arts Society, Vol. 30.
Wessel, D. L., and Grey, J. M. (1978) "Conceptual Structures for the Representation of Musical Material." IRCAM Technical Report No. 14.
Wessel, D. L., and Smith, B. (1977) "Psychoacoustic Aids for the Musician's Exploration of New Material." Paper presented at the 1977 International Computer Music Conference, Center for Music Experiment, University of California at San Diego, La Jolla.
Wessel, D. L. (1976) "Perceptually Based Controls for Additive Synthesis." Paper presented at the 1976 International Computer Music Conference, Massachusetts Institute of Technology.
Zwicker, E., and Scharf, B. (1965) "A Model of Loudness Summation." Psych. Review, Vol. 72(1), pp. 3-26.
