
http://www.academia.edu/283594/The_Role_of_Syllables_In_Speech_Perception

Nusbaum, H. C., & DeGroot, J. The role of syllables in speech perception. In M. S. Ziolkowski, M. Noske, & K. Deaton (Eds.), Papers from the parasession on the syllable in phonetics and phonology. Chicago: Chicago Linguistic Society, 1991.

The Role of Syllables in Speech Perception

Traditionally, psychologists have viewed speech perception as a recognition process in which continuously varying acoustic patterns are matched to discrete phonemic representations in the listener's mind. Once phonemes are recognized, words can be identified from patterns of segments rather than from patterns of acoustic properties (Pisoni, 1981). According to this view, the waveform is segmented into chunks of acoustic information, and each chunk (or a portion of each chunk) is recognized as a phoneme. A classic problem facing theories of speech perception is to explain how the waveform is divided into acoustic segments that are appropriate for phoneme recognition. Although speech can be segmented according to acoustic criteria, it cannot be segmented according to the criteria that would be needed to provide an appropriate correspondence with phonemes (Fant, 1962). A further problem arises because the acoustic specification of individual phonemes changes with talker, phonetic context, and situation (Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967). Thus there are no context-independent, invariant acoustic properties that can be considered criterial for recognizing a particular phonological segment. These problems make it difficult to explain how listeners recognize the segmental structure of speech. One approach to dealing with these problems is to propose that speech perception is mediated by linguistic units other than the phoneme. The hope of this approach is that other types of linguistic units (e.g., phonetic features or syllables) may be less affected by the coarticulatory processes that shape the acoustic representation of phonemes.
For example, Stevens and Blumstein (1978) have argued that phonetic features may be related directly to unique acoustic patterns, whereas Massaro (1972) has argued that the syllable is a better candidate for the fundamental unit of speech perception. Unfortunately, there is empirical evidence against each of these candidates (e.g., Ohman, 1966; Walley & Carrell, 1983). Thus there has been a longstanding, unresolved question about the nature of the first level of linguistic representation of speech after auditory processing. Indeed, the question of how listeners represent the sound pattern of language may have achieved the same notoriety for being undecidable as the question of whether mental images are represented as analog "pictures" or propositional "descriptions" (Anderson, 1978; Kosslyn, 1980; Pylyshyn, 1981). For some time, the issue seemed to be resolved by the argument that listeners actively use several levels of linguistic representation simultaneously, such that no particular level has primacy and each may serve a different function. For example, Studdert-Kennedy (1976) suggested that speech may be segmented into syllables, which then serve as the basis for phonetic recognition. By this view, each individual syllable should provide all the information necessary to support phoneme recognition.

The initial psychological research investigating the perceptual representation of the sounds of speech did not provide any resolution of this issue. In these early studies, subjects were instructed to listen for a particular target phoneme or syllable. During each trial, subjects monitored a sequence of utterances, such as isolated syllables or vowels, for this target, pressing a button to indicate that they had recognized it. On the basis of their seminal study, Savin and Bever (1970) argued that the syllable is the primary unit of speech analysis: Subjects recognized whole syllables faster than phonemes within syllables. The logic behind this conclusion is relatively straightforward, but perhaps overly simplistic. Subjects should be able to respond most quickly to target representations that become available to consciousness in the shortest amount of time. If the sounds of speech are represented as whole syllables, then recognizing syllable targets should be faster than recognizing phoneme targets, because the phonemes have to be extracted or inferred after the syllables are recognized. Subsequent studies argued against this logic and conclusion. These experiments indicated that the results of target monitoring may reflect a listener's expectations about the structure of the stimuli even more than they reflect the perceptual units that mentally represent the sound patterns of those stimuli (Foss & Swinney, 1973; Healy & Cutting, 1976; McNeill & Lindig, 1973; Mills, 1980a, 1980b).
Furthermore, studies that have used the target monitoring paradigm to investigate how listeners represent the sound patterns of speech have been complicated by the problem of disentangling task-specific strategies from the basic perceptual mechanisms used in recognizing spoken language (Norris & Cutler, 1988). In fact, the target monitoring paradigm was not designed to investigate directly the perceptual units that represent the sounds of speech; it was developed instead as a probe task to measure the online demands of comprehension while listening to spoken language (Cutler & Norris, 1979). Although one could conclude that the question of how listeners represent speech is undecidable and better left to theoretical analysis than to empirical study, one could also conclude that previous studies have asked the question the wrong way. Perhaps it would be better to investigate the perceptual units of speech with a different empirical method rather than by trying to coerce the monitoring paradigm to serve that function.

To start with, in order to understand how the sounds of speech are initially encoded into linguistic units, we must have some idea of what we mean by a unit of perception. If we are asking about basic units of perception from which higher order representations are formed, then we could start from the assumption that these basic units are atomic, or perceptually indivisible (see Pomerantz, 1986). In other words, if a unit can be decomposed into other subordinate linguistic units, it is not basic. (For this definition to work, we must treat perceptual decomposition as different from inferential decomposition. Listeners can probably always use their metalinguistic knowledge to make inferences about linguistic units, but this ability is not what we are interested in investigating.) How can we tell whether a perceptual unit is composed of parts that are themselves psychological units? Garner (1970, 1974) has outlined the rationale by which we can investigate the psychological representation of perceptual units. According to this logic, we can start by defining analytically the dimensional or constituent structure of a particular unit. For example, a phoneme may have a constituent feature structure, or a syllable may have a constituent phoneme structure. If subjects can make a decision about one of the constituent parts independently of the other constituent parts, we can think of the unit as decomposable into that constituent structure. Consider the example of a syllable, which is composed of a sequence of phonemes. We can ask listeners to decide whether the consonant in a CV syllable is a /b/ or a /d/. If listeners can make this decision without being influenced by the other phoneme in the same syllable, namely the vowel, it seems reasonable to conclude that the phonemes within a syllable are represented as discrete and separate entities.

Garner described two experimental conditions that can be compared to determine whether a perceptual unit can be decomposed into smaller parts. (If it can be decomposed, then there are units more basic than the one being investigated, namely its constituents.) In one condition, the unidimensional condition, only the target dimension varies. In our syllable example, subjects would be instructed to decide whether each CV syllable contained a /b/ or a /d/, and the vowel would be held constant, so listeners hear only two different syllables. This condition provides baseline data about the time it takes listeners to recognize a consonant within a syllable. In a second condition, the orthogonal condition, the vowel also varies, so that there are four different stimuli: the two consonants combined with two vowels. In this condition, subjects perform exactly the same task of deciding whether the consonant is /b/ or /d/. If the consonant and vowel are completely independent perceptual units, listeners should be able to identify the target without being influenced by irrelevant variation in the context.
If this is the case, response times should be the same in both conditions. On the other hand, if the syllable is an integral unit, decisions about one part of a syllable should be affected by variation in another part; response times should be longer in the orthogonal condition than in the unidimensional condition. Wood and Day (1975) carried out this specific study to investigate the integrality of adjacent phonemes within a syllable. They found that recognition times for a consonant were significantly slowed by independent variation in the vowel and vice versa. These results demonstrate that the consonant and vowel are perceptually integral within a syllable. This suggests that the syllable may be the basic unit representing the sounds of speech. However, just as basic perceptual units should not be easily decomposed into more primitive units, they should not be integral with each other. Basic units of perception should be independent of one another. This means that listeners should be able to make decisions about the information contained within one syllable without depending on the information in adjacent syllables. If syllables are basic units for recognizing the sound patterns of spoken language, then we would expect each syllable to stand on its own. If the sound pattern information in one syllable is integral with the information in an adjacent syllable, then the basic unit representing sound patterns would be a higher order description than the syllable, or it would be a representation that organizes information regardless of syllable structure or boundaries.
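Garner's comparison of the two conditions can be made concrete with a short script. The Python sketch below is an illustration under stated assumptions, not a procedure taken from this paper: the function names `build_conditions` and `garner_interference` are hypothetical, each stimulus is modeled as a (target, irrelevant) pair, and interference is taken as the simple difference in mean response time between the orthogonal and unidimensional conditions. A real analysis would test that difference statistically, as implied by Wood and Day's (1975) report of significant slowing. The syllable items in the across-syllable example and the response times are invented placeholders, not data from any study.

```python
from itertools import product
from statistics import mean


def build_conditions(targets, irrelevant_values, baseline_value):
    """Stimulus sets for Garner's two speeded-classification conditions.

    Each stimulus is a (target, irrelevant) pair. In the unidimensional
    (baseline) condition the irrelevant dimension is fixed at baseline_value;
    in the orthogonal condition every target is crossed with every
    irrelevant value.
    """
    if baseline_value not in irrelevant_values:
        raise ValueError("baseline_value must be one of irrelevant_values")
    unidimensional = [(t, baseline_value) for t in targets]
    orthogonal = list(product(targets, irrelevant_values))
    return unidimensional, orthogonal


def garner_interference(rts_unidimensional, rts_orthogonal):
    """Mean response-time cost of irrelevant variation (orthogonal minus baseline).

    A value near zero fits separable dimensions; a reliably positive value
    is the pattern taken as evidence that the dimensions are integral.
    """
    return mean(rts_orthogonal) - mean(rts_unidimensional)


# Within a syllable: classify the consonant (/b/ vs /d/) and ignore the vowel.
uni, orth = build_conditions(("b", "d"), ("a", "i"), baseline_value="a")
print(uni)   # [('b', 'a'), ('d', 'a')]
print(orth)  # [('b', 'a'), ('b', 'i'), ('d', 'a'), ('d', 'i')]

# Across syllables (hypothetical items): classify the first syllable of a
# two-syllable sequence and ignore the second.
uni2, orth2 = build_conditions(("ba", "da"), ("ga", "ka"), baseline_value="ga")
print(len(uni2), len(orth2))  # 2 4

# Placeholder response times in ms, invented for illustration only.
print(garner_interference([500, 520, 510], [540, 560, 550]))  # 40
```

With these placeholder numbers the orthogonal condition is 40 ms slower on average, the direction expected if the two dimensions are integral; equal means would instead be the pattern expected of independent units. Making the target and irrelevant dimensions whole syllables, as in the second example, frames the adjacent-syllable question in the same way.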

Mills, C. B. (1980b). Effects of the match between listener expectancies and coarticulatory cues on the perception of speech. Journal of Experimental Psychology: Human Perception and Performance, 6, 528-535.

Murphy, G. L., & Medin, D. L. (1985). The role of theories in conceptual coherence. Psychological Review, 92, 289-316.

Norris, D., & Cutler, A. (1988). The relative accessibility of phonemes and syllables. Perception & Psychophysics, 43, 541-550.

Nusbaum, H. C. (in press). Understanding speech perception from the perspective of cognitive psychology. In J. Charles-Luce, P. Luce, & J. R. Sawusch (Eds.), Theories in spoken language: Perception, production and development. Norwood, NJ: Ablex Publishers.

Ohman, S. E. G. (1966). Coarticulation in VCV utterances: Spectrographic measurements. Journal of the Acoustical Society of America, 39, 151-168.

Pastore, R. E., Ahroon, W. A., Puleo, J. S., Crimmins, D. B., Golowner, L., & Berger, R. S. (1976). Processing interaction between two dimensions of nonphonetic auditory signals. Journal of Experimental Psychology: Human Perception and Performance, 2, 267-276.

Pisoni, D. B. (1978). Speech perception. In W. K. Estes (Ed.), Handbook of learning and cognitive processes (Vol. 6, pp. 167-233). Hillsdale, NJ: Lawrence Erlbaum.

Pisoni, D. B. (1981). In defense of segmental representations in speech processing. In Research on Speech Perception Progress Report No. 7 (pp. 215-227). Bloomington: Indiana University, Speech Research Laboratory.

Pomerantz, J. R. (1986). Visual form perception: An overview. In E. C. Schwab & H. C. Nusbaum (Eds.), Pattern recognition by humans and machines: Vol. 2. Visual perception (pp. 1-30). Orlando, FL: Academic Press.

Pylyshyn, Z. W. (1981). The imagery debate: Analogue media versus tacit knowledge. Psychological Review, 88, 16-45.

Quine, W. V. (1964). Speaking of objects. In J. A. Fodor & J. J. Katz (Eds.), The structure of language: Readings in the philosophy of language (pp. 479-518). Englewood Cliffs, NJ: Prentice-Hall.

Remez, R. E., Rubin, P. E., Pisoni, D. B., & Carrell, T. D. (1981). Speech perception without traditional speech cues. Science, 212, 947-950.

Samuel, A. G. (1977). The effect of discrimination training on speech perception: Noncategorical perception. Perception & Psychophysics, 22, 321-330.

Samuel, A. G. (1986). The role of the lexicon in speech perception. In E. C. Schwab & H. C. Nusbaum (Eds.), Pattern recognition by humans and machines: Vol. 1. Speech perception (pp. 89-111). Orlando, FL: Academic Press.

Savin, H. B., & Bever, T. G. (1970). The nonperceptual reality of the phoneme. Journal of Verbal Learning and Verbal Behavior, 9, 295-302.

Sawusch, J. R., & Nusbaum, H. C. (1983). Auditory and phonetic processes in place perception for stops. Perception & Psychophysics, 34, 560-568.

Selkirk, E. O. (1982). The syllable. In H. van der Hulst & N. Smith (Eds.), The structure of phonological representations (Part 2, pp. 337-383). Dordrecht, Holland: Foris.

Shand, M. A. (1976). Syllabic vs segmental perception: On the inability to ignore "irrelevant" stimulus parameters. Perception & Psychophysics, 20, 430-432.

Smith, E. E., & Medin, D. L. (1981). Categories and concepts. Cambridge, MA: Harvard University Press.

Smith, E. E., Shoben, E. J., & Rips, L. J. (1974). Structure and process in semantic memory: A featural model for semantic decisions. Psychological Review, 81, 214-241.

Stevens, K. N. (1960). Toward a model for speech recognition. Journal of the Acoustical Society of America, 32, 47-55.

Stevens, K. N., & Blumstein, S. E. (1978). Invariant cues for place of articulation in stop consonants. Journal of the Acoustical Society of America, 64, 1358-1368.

Stevens, K. N., & Halle, M. (1967). Remarks on analysis by synthesis and distinctive features. In W. Wathen-Dunn (Ed.), Models for the perception of speech and visual form (pp. 88-102). Cambridge, MA: MIT Press.

Stevens, K. N., & House, A. S. (1972). Speech perception. In J. Tobias (Ed.), Foundations of modern auditory theory (Vol. 2, pp. 1-62). New York: Academic Press.

Studdert-Kennedy, M. (1976). Speech perception. In N. J. Lass (Ed.), Contemporary issues in experimental phonetics (pp. 243-293). New York: Academic Press.

Tomiak, G. R., Mullennix, J. W., & Sawusch, J. R. (1987). Integral processing of phonemes: Evidence for a phonetic mode of perception. Journal of the Acoustical Society of America, 81, 755-764.

Treiman, R., Salasoo, A., Slowiaczek, L. M., & Pisoni, D. B. (1982). Effects of syllable structure on adults' phoneme monitoring performance. In Research on Speech Perception Progress Report No. 8 (pp. 63-81).

Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327-352.

Walley, A. C., & Carrell, T. D. (1983). Onset spectra and formant transitions in the adult's and child's perception of place of articulation in stop consonants. Journal of the Acoustical Society of America, 73, 1011-1021.

Wickelgren, W. A. (1969). Context-sensitive coding, associative memory, and serial order in (speech) behavior. Psychological Review, 76, 1-15.

Wittgenstein, L. (1953). Philosophical investigations. New York: Macmillan.

Wood, C. C., & Day, R. S. (1975). Failure of selective attention to phonetic segments in consonant-vowel syllables. Perception & Psychophysics, 17, 346-350.
