Boothroyd. Context effects………….

Saved as Salamanca 2002 paper.doc

Context Effects in Spoken Language Perception

Arthur Boothroyd
2550 Brant Street, San Diego, CA 92101
Distinguished Professor Emeritus, City University of New York
Scholar in Residence, San Diego State University
Visiting Scientist, House Ear Institute
(619) 231 7948 (Tel. and FAX)
(619) 392 1740 (Mobile)
[email protected]

Presentation to International Conference on Foniatry, Audiology, Logopedics, and Psycholinguistics. Salamanca, Spain, June 2002.

Abstract

Contextual evidence plays a significant role in perception. It influences the interpretation of sensory evidence and can increase both speed and accuracy. Researchers and clinicians have paid a lot of attention to the auditory sensory evidence available to hearing-impaired subjects and its enhancement by hearing aids and cochlear implants. Much less attention has been paid to the role and use of contextual evidence, perhaps because of a shortage of assessment tools and quantitative methods. In this paper, two metrics are introduced. One is the k-factor, which is the ratio of the logarithms of error probabilities when perceiving with and without a particular context. The other is the j-factor, which is the ratio of the logarithms of recognition probabilities for wholes and their constituent parts. Both are derived from probability and information theory. Data from normally hearing adults, and from hearing-impaired children who use cochlear implants, provide useful insights. For example, the contribution of sentence context is shown to be attributable mainly to sentence meaning, with little contribution from syntax. There are also substantial individual differences in the ability to take advantage of sentence context. Both types of subject show evidence of increased reliance on sentence context as the available sensory evidence is reduced. When perceiving low-redundancy sentences of known topic, under difficult listening conditions, the contribution of contextual evidence can be as much as 10 times that of the sensory evidence. It is further shown that contextual constraints within consonant-vowel-consonant words lead them to behave as though they consist of between 2 and 2.5 independently perceived phonemes. This was found to be true for both the normally hearing adults and most of the pediatric implantees. A few of the implantees, however, appeared not to be taking advantage of lexical context in the perception of phonemes. A model is developed that combines j and k factors so as to predict sentence-level speech perception from the perception of phonetic contrasts. Such models may be useful in predicting possible outcome on the basis of measurements of phonetic contrast perception that can be obtained from very young children.

Context and perception

Perception is the interpretation of sensory evidence that has been generated by the sense organs in response to patterns of physical stimulation originating from objects and events in the external world. Essentially, the perceiver chooses the most likely interpretation from the several possibilities that exist within his internal model of the external world. This internal model, which has been developed as a result of perceptual, cognitive, and social-cognitive development, represents prior knowledge that the perceiver brings to the perceptual process.

But perceptual decisions are not based on sensory evidence alone. Every object exists in a context and every event occurs in a context. The context provides the perceiver with additional evidence that can influence the interpretation of sensory evidence. As with sensory evidence, the value of the contextual evidence depends on the perceiver's prior knowledge.

To be useful, perception must be fast and accurate. By using contextual evidence effectively, the perceiver can increase both speed and accuracy. If reliance on context is taken to extremes, however, the probability of error increases. Perceptual skill requires an optimal balance between speed and accuracy, an optimal mix in the use of sensory and contextual evidence, and the ability to change both according to the demands of the immediate situation.

To summarize, successful perception depends on four factors: the sensory and contextual evidence that are available to the perceiver, and the knowledge and skills that the perceiver brings to the task (see also Boothroyd, 1994).

Context and spoken language perception

Although spoken language perception rests on the substrate of general perception, there are several aspects of the process that are unique (see also Boothroyd, 1997). For example:

1. The sound-generating events to be perceived are the movement patterns of speech. But these have no inherent value. Rather, they provide evidence about underlying language patterns that are, themselves, evidence about the meaning and purpose of the message being generated by the talker.
2. Similarly, the percepts consist, not of sound-generating events, but of language patterns and their meanings.
3. There are many situations in which auditory evidence about speech movements is enhanced by visual evidence. In extreme cases of profound hearing loss, vision can even be the only source of sensory evidence - as in lipreading.
4. To the physical and social context we must add a linguistic context. Phonemes occur in the context of words. Words occur in the context of sentences. And sentences
occur in the context of narratives and conversations, usually with a known topic. Contextual evidence, therefore, has a linguistic dimension.
5. To the world and social knowledge that the perceiver brings to the perceptual task, we must add linguistic knowledge - phonological, lexical, syntactic, semantic, and pragmatic. The presence of linguistic context effects has been recognized for many years (see, for example, Rosenzweig and Postman (1957) and Broadbent (1967) on the effects of frequency of word occurrence, and Miller, Heise, and Lichten (1951) on the effects of sentence context).
6. The speed of spoken language perception is determined, not by the perceiver, but by the talker. As a result, the perceiver is restricted when seeking an appropriate compromise between speed and accuracy or an appropriate balance between sensory and contextual evidence. Anyone trying to understand speech in a second language is familiar with the problems created when the speed at which the talker produces speech exceeds the speed with which it can be perceived - and the errors that occur when one deals with this problem by an excessive reliance on context.

Prelingual deafness and spoken language perception

The child with a congenital or prelingually acquired hearing loss is at a multiple disadvantage in the perception of spoken language. First, the hearing deficit reduces the quality and quantity of sensory evidence available from the speech stimulus. Second, the resulting language deficits reduce the quality and quantity of linguistic contextual evidence. Third, cognitive and intellectual deficits resulting from the language deficit reduce the quality and quantity of non-linguistic contextual evidence. Fourth, in an attempt to deal with the reduced sensory evidence, the child may adopt a perceptual strategy that satisfies short-term needs but at the expense of the long-term development of appropriate processing skills. All four aspects of perception are at risk: sensory evidence, contextual evidence, knowledge, and skill.

The combined effect of these difficulties is an increased error in spoken language perception. Moreover, the demands of perception with diminished sensory and contextual evidence often lead to rapid fatigue and a reduced attention span for spoken language. Naturally, the severity of the problems just described varies from child to child depending on the severity of the hearing loss, the aptitudes of the child, and the success of sensory and educational intervention.

In recent decades, we have learned a lot about the effects of hearing loss, amplification, and cochlear implants on the sensory component of spoken language perception and we have developed a range of assessment tools. But less attention has been paid to understanding and assessing the contextual component.

A valuable early contribution to the assessment of context effects was the work of Kalikow, Stevens, and Elliott (1977), which led to the development of the Speech in Noise (SPIN) test, later modified by Bilger (1984). In this test, recognition is measured for the last word in two types of sentence. High-probability sentences carry a lot of
semantic cues to the identity of the test word (for example, "Zebras have black and white stripes"). Low-probability sentences carry no semantic cues but do retain syntactic cues (for example, "They were talking about the stripes"). Noise is used to reduce sensory evidence. The difference between the two scores provides a measure of the use of semantic contextual evidence. Although these materials were originally developed for the express purpose of assessing the use of context, it is interesting to note that their major application has simply been as a sentence-level performance measure.

Word recognition with and without sentence context

k-factor theory

One problem with the use of a difference between two performance measures as an index of context effects is that the difference is range-dependent. As overall performance approaches 0% or 100%, the difference measure automatically approaches zero. A simple difference may not, therefore, accurately quantify the perceiver's use of context. In my own work on this topic, I have sought suitable transforms of the difference scores, using methods based on probability theory and information theory.

When listening under difficult conditions, all words in a sentence must be perceived with reduced sensory evidence. But each word need not be perceived independently, as if it were presented in isolation. Once some of the words are recognized, they provide contextual evidence that can increase the recognition probability for other words. This effect can be modeled as a proportional increase in the independent channels of information available to the perceiver. The resulting equation is:

$${}_{w}p_{s} = 1 - (1 - {}_{w}p_{i})^{k} \qquad (1)$$

Where:

${}_{w}p_{s}$ = word-recognition probability in sentences
${}_{w}p_{i}$ = word-recognition probability in isolation
$k$ = a dimensionless factor representing the effect of sentence context.

From equation (1) one can derive an expression for k, given two estimates of recognition probability:

$$k = \frac{\log(1 - {}_{w}p_{s})}{\log(1 - {}_{w}p_{i})} \qquad (2)$$
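Equations (1) and (2) can be checked numerically with a minimal sketch such as the one below. The function names and the example score of 40% are illustrative assumptions, not part of the original analysis. Probabilities are expressed as fractions between 0 and 1.

```python
import math


def word_in_sentence_probability(p_isolation: float, k: float) -> float:
    """Equation (1): predicted word recognition in sentences for a given k."""
    return 1.0 - (1.0 - p_isolation) ** k


def k_factor(p_sentence: float, p_isolation: float) -> float:
    """Equation (2): ratio of log error probabilities with and without context."""
    # Scores of exactly 0 or 1 make a logarithm zero or undefined,
    # so this sketch rejects them instead of returning a misleading value.
    if not (0.0 < p_sentence < 1.0 and 0.0 < p_isolation < 1.0):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    return math.log(1.0 - p_sentence) / math.log(1.0 - p_isolation)


if __name__ == "__main__":
    p_iso = 0.40  # hypothetical score for words in isolation
    p_sen = word_in_sentence_probability(p_iso, k=2.0)
    print(f"predicted score in sentences: {p_sen:.2f}")
    print(f"recovered k: {k_factor(p_sen, p_iso):.2f}")
```

Because equation (2) is simply the inverse of equation (1), applying one after the other recovers the k that was put in. The check on 0 and 1 reflects the fact that k cannot be estimated from scores at floor or ceiling.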

In other words, k is the ratio of the logarithms of error probability with and without sentence context. Note, again, that the k-factor can be thought of as a proportional increase in the effective number of channels of independent information available to the perceiver. A k-factor of 1 indicates that sentence context has no effect on word recognition. A k-factor of 2 indicates that the effect of adding sentence context is equivalent to doubling the number of channels of independent information that are available when the words are
presented in isolation, and so on. In a sense, one can think of the contextual evidence as multiplying the usefulness of the sensory evidence by the k-factor.

Figure 1 shows the predicted effects of sentence context on word recognition for various values of k ranging from 0 to 10. The horizontal axis shows word recognition probability without sentence context, expressed in percent. The vertical axis shows the predicted word recognition with sentence context.

Some k-factor data from normal adults

Clearly, the effect of sentence context will depend both on the redundancy of the sentence material itself and on the knowledge and skills of the perceiver. By studying adults with normal hearing and language, one can obtain estimates of the contextual information available in the speech materials, as well as data on the normal range of variability in the use of that information.

Boothroyd and Nittrouer (1988) attempted to separate the contributions of syntactic and semantic factors to the sentence context effect. Word recognition was measured in three types of sentence: a) random word strings, in which there was no sentence context (e.g., "green like went was"), b) semantically anomalous, or implausible, sentences in which syntax played the dominant role (e.g., "ducks eat old tape"), and c) highly plausible sentences in which both syntax and semantics played a role (e.g., "most birds can fly"). All sentences were four words long so as not to tax short-term memory. Noise was used to reduce sensory evidence for groups of eight young, normally hearing, adults.

The results are shown in Figure 2. It will be seen that the k-factor for syntax alone was only 1.35. The k-factor rose to 2.70 when plausible meaning was added. From these two values, it can be shown that the k-factor for semantics alone was 2.0. These data indicate that, although syntax plays a role, most of the sentence context effect is attributable to sentence meaning.
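One way to see where the semantic value of 2.0 comes from: under equation (1), applying one k-factor and then a second raises the error probability to the product of the two, so successive k-factors multiply. The sketch below illustrates this reading. The function names and the 30% starting score are hypothetical, and only the two k values are taken from the text.

```python
# k-factors reported in the text for the Boothroyd and Nittrouer (1988) data.
K_SYNTAX = 1.35  # syntax alone
K_FULL = 2.70    # syntax plus plausible meaning


def with_context(p_without: float, k: float) -> float:
    """Equation (1): recognition probability after applying a k-factor."""
    return 1.0 - (1.0 - p_without) ** k


def semantic_k(k_full: float, k_syntax: float) -> float:
    """k-factor for semantics alone, assuming successive k-factors multiply."""
    return k_full / k_syntax


if __name__ == "__main__":
    k_sem = semantic_k(K_FULL, K_SYNTAX)
    print(f"k for semantics alone: {k_sem:.2f}")

    p_random = 0.30  # hypothetical score for random word strings
    p_syntax = with_context(p_random, K_SYNTAX)   # add syntax only
    one_step = with_context(p_random, K_FULL)     # add syntax and meaning at once
    two_step = with_context(p_syntax, k_sem)      # add meaning on top of syntax
    print(f"one step: {one_step:.4f}, two steps: {two_step:.4f}")
```

The one-step and two-step predictions agree, which is what makes the division of 2.70 by 1.35 a meaningful estimate of the semantic contribution.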
There was no evidence from these data that the value of k was range-dependent. Similar values were obtained under both easy and difficult listening conditions. The sentences were, however, extremely short.

Boothroyd and Kosky (1991) measured the effect of sentence context in a more realistic situation. The sentence materials were the CUNY topic-related sentence sets. Each set consists of twelve sentences ranging in length from three through fourteen words. Each sentence is about one of twelve topics that remain constant throughout the sets. Before each sentence was heard, subjects were informed of the topic. The purpose was to simulate a conversational setting in which the topic of a sentence is usually known. Recognition of words in isolation was measured using the AB word lists. Each list contains ten consonant-vowel-consonant words. In this study the sensory evidence was reduced, not by noise, but by low-pass filtering.

Mean data for six normally hearing adults are shown in Figure 3. Two aspects of these data are noteworthy. First, the k-factors are considerably higher than those reported earlier. Three factors may account for the difference. One is the influence of prior knowledge of sentence topic. The second is the use of monosyllables for measuring word recognition in isolation. The sentences
contained many polysyllables, which are easier to perceive than monosyllables. Third, most of the sentences were longer and more complex than those used in the Boothroyd and Nittrouer study. They would be expected, therefore, to provide more contextual information, including that available from suprasegmental cues.

The second noteworthy aspect of the data in Figure 3 is that the value of k does not remain constant across listening conditions. Instead it falls as listening becomes easier. The implication is that the subjects were changing their perceptual strategy to take more advantage of contextual evidence as the amount of sensory evidence was reduced. The fact that this phenomenon was not observed in the Boothroyd and Nittrouer study is probably, again, related to the use of shorter sentences in the earlier study.

The heavy line in Figure 3 is a least-squares fit to equation (1) with the added assumption that k falls exponentially from some starting value, when word recognition in isolation is 0%, towards an asymptote of 1. Thus:

$$k_{i} = 1 + a\,e^{-i/b} \qquad (3)$$

Where:

$i$ = ${}_{w}p_{i}$ = word recognition in isolation
$k_{i}$ = value of k when word recognition = i
$a + 1$ = value of k for i = 0 (i.e., $k_{0}$)
$b$ = an empirically determined decay constant

The best fit to the data of Figure 3 was obtained with $k_{0}$ = 10.3 and b = 58.4, giving a predicted $k_{100}$ of 1.9. An important finding from this study was that k values differed significantly among subjects (p