Perception of voicing in fricatives1

Hyesun Cho and Maria Giavazzi
Department of Linguistics and Philosophy, MIT
77 Mass ave., Cambridge MA 02139 USA

Abstract

Stops and strident fricatives show a similar pattern of voicing neutralization. Voicing contrast is preserved more often (i) between sonorants than at word boundaries; (ii) in word-initial than in word-final position. In the case of stops, the asymmetry between word-initial and word-final position is due to the availability of VOT cues. However, which cues are responsible for fricative voicing neutralization has been less investigated. The present study identifies the important cues to the voicing distinction of fricatives in intervocalic position. It turned out that the cues in the surrounding vowels, as well as frication duration, are important. This explains the asymmetry between intervocalic position and word boundaries, though it does not explain the asymmetry between word-initial and word-final positions.

Keywords: fricatives, voicing contrast, neutralization, Licensing by Cue, perception

1 We thank Edward Flemming, Donca Steriade, and Adam Albright for their precious help. We are also grateful to the audience of CIL 18 for their comments. All errors are ours. The research was funded by the MIT Linguistics department. The names are in alphabetical order. Both authors contributed equally to this paper.

1. Introduction

Speech perception and speech production condition the phonological distribution of sounds and play an important role in determining whether the contrast between two classes of sounds is implemented or whether it is neutralized (Liljencrants and Lindblom 1972, Ohala 1981, Steriade 1997). Crucial to the interaction of perception and phonological processes is the concept of “cue,” defined following Wright (2001) as the information in the acoustic signal which allows the listener to apprehend the existence of a phonological contrast. The absence of an important cue to the perception of a contrast in a given phonological context has as a major consequence the triggering of neutralization. Steriade (1999:4) hypothesizes that “the likelihood that distinctive values of the feature F will occur in a given context is a function of the relative perceptibility of the F-contrast in that context” (Licensing by Cue Hypothesis).

Given that different phonological contrasts rely on different cues, different contrasts show specific and individual patterns of distribution. Also, the availability and the nature of the cues to a given contrast type vary systematically with the context in which the segments occur: different phonological contrasts are therefore ‘licensed’ in different positions. An extensive recent literature has shown that perceptual considerations shape the typology of contrast neutralization: more perceptible contrasts are preferred to less distinct contrasts, and contrasts are usually neutralized first in environments where they would be less distinct (cf. Dispersion Theory of Contrast, Flemming 1995, 2002).

Not all cues to a contrast are the same: a distinction has to be drawn between internal and external cues (Steriade 1997: 9). Whereas internal cues are realized during the segment itself, e.g. vowel formants and fricative noise, external cues are realized on adjacent segments, e.g. formant transitions, or they rely on the presence of adjacent segments, e.g. VOT (Flemming 2007). The availability of external cues depends on the environment of the target segment; the presence of external cues is thus highly variable across contexts. Internal cues, on the other hand, are less context-dependent, as they are intrinsic acoustic properties of the segment itself.

The paper is organized as follows. In Section 2, we report the pattern of voicing contrast neutralization of stops and that of fricatives. In Section 3, we present our experiment and the results. In Section 4, we discuss how the unavailability of important cues in certain conditions would result in the typological asymmetries in the patterns of voicing contrast neutralization. Section 5 is the conclusion.

2. Voicing contrast in stops and in fricatives

2.1 Voicing contrast in stops

The distribution of voicing contrast in stops is strongly conditioned by the availability of acoustic cues to voicing in different contexts (Kingston and Diehl 1994, Steriade 1997). Contexts in which more cues are available are those in which the voicing contrast is more likely to be preserved cross-linguistically. Among the acoustic correlates of stop voicing, cues in the transition following the release of the stop are more important to the perception of the contrast than cues in the closure (Raphael 1981). VOT is a more perceptually salient cue than closure duration (Lisker 1957). Other cues to the presence of voicing are the amplitude and duration of the release burst (Repp 1979), the amplitude of F1 at release (Lisker 1986), and F1/F0 adjacent to the closure (Haggard, Ambler and Callow 1970).

Cross-linguistic patterns of [±voice] neutralization provide evidence for the link between contrast preservation in a certain position and availability of perceptible cues in that context: no language neutralizes the voicing contrast in a more informative context unless it also does so in a less informative context. The patterns across languages are shown in (1).

(1) Patterns of voicing neutralization in stops (Steriade 1997)

Language          #_O, O_#      R_O           R_#           _R            R_R
Totontepec Mixe   no contrast   no contrast   no contrast   no contrast   contrast
Lithuanian        no contrast   no contrast   no contrast   contrast      contrast
French            no contrast   no contrast   contrast      contrast      contrast
Shilha            no contrast   contrast      contrast      contrast      contrast
Khasi             contrast      contrast      contrast      contrast      contrast

O = obstruent, R = sonorant
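The implicational reading of (1) can be checked mechanically. The sketch below is only an illustration: it transcribes the table into booleans (the names CONTEXTS, STOP_PATTERNS and respects_hierarchy are introduced here and are not part of the original study) and tests that, in each language, contrast in a less informative context implies contrast in every more informative one.

```python
# Contexts ordered from least to most informative, following the columns of (1).
CONTEXTS = ["#_O, O_#", "R_O", "R_#", "_R", "R_R"]

# True = voicing contrast attested, False = no contrast; transcribed from (1).
STOP_PATTERNS = {
    "Totontepec Mixe": [False, False, False, False, True],
    "Lithuanian":      [False, False, False, True, True],
    "French":          [False, False, True, True, True],
    "Shilha":          [False, True, True, True, True],
    "Khasi":           [True, True, True, True, True],
}


def respects_hierarchy(pattern):
    """True if contrast in a context implies contrast in the next, more informative one."""
    return all((not left) or right for left, right in zip(pattern, pattern[1:]))


# Languages whose row in (1) would contradict the implicational generalization.
violations = [lang for lang, row in STOP_PATTERNS.items() if not respects_hierarchy(row)]
print(violations)
```

An empty list of violations means the five-language sample is consistent with the generalization; it does not, of course, establish it for languages outside the sample.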

Contrast is very rarely attested in stop clusters: in both positions of the cluster, C1 and C2, important cues to contrast are missing. VOT information is not available to discriminate voicing in C1, and information about cues in the preceding vowel (such as duration) is not available in C2. The voicing contrast is even less frequent in obstruent clusters at the edges of a word: in a word-initial cluster, both cues in the preceding vowel and VOT cues in C1 are missing; the same is true for C2 in a word-final cluster. Only cues in the closure (duration and voicing) are preserved. As the contexts become more informative (moving rightwards in (1)), voicing identification becomes more reliable and thus more languages exhibit voicing contrast in those contexts. Examples from Russian are shown in (2).

(2) Russian (data from Padgett 2002), arranged by descending cue availability.

Contrast R_R       sled-a  'track (GEN.SG.)'       sviet-a  'light (NOM.SG.)'
Contrast _R        teatr   'theater'               kadr     'film sequence'
No Contrast R_#    slet    'track (NOM.SG.)'       sviet    'light (GEN.PL.)'
No Contrast #_O    gde     'where'   *kde, gte     kto      'who'   *kdo, gto

A crucial part of the Licensing by Cue hypothesis is that different phonological contrasts are licensed by different cues and can therefore show different distributions. There are no specific syllabic positions where contrast is in general less attested; rather, different contrasts are neutralized in positions which are specific to the cues signaling the contrast. Steriade (1997:30) shows that this hypothesis accounts not only for the distribution of voicing contrasts, but also for the distribution of other contrasts, e.g., pre- and post-aspiration and glottalization contrasts. There are languages which maintain the voicing contrast before all consonants (e.g., bt, pt, bn, pn (initial clusters)) but maintain the aspiration contrast only before sonorants (tʰm, pʰn, *tʰp, *pʰd) (e.g., Khasi (Henderson 1976)). Although voicing and post-aspiration contrasts are licensed by almost the same set of cues, the imperfect overlap of cues results in a different cross-linguistic distribution.

2.2 Voicing contrast in fricatives

2.2.1 Acoustic and articulatory properties of voiced and voiceless fricatives

Little attention has been devoted until now to the perceptual cues to voicing in fricatives, but the purely acoustic properties of voiced and voiceless fricatives have been described at length in the literature. Voicing during frication, like voicing during stops, is subject to the aerodynamic voicing constraint (Ohala 1983, Stevens 2000). In order for voicing to occur, two basic requirements have to be met. First, the vocal cords must have the appropriate degree of tension and the appropriate degree of adduction. Second, there must be air flowing through the glottis (Ohala 1983). In stops, voicing is hard to maintain because the air flowing through the vocal folds accumulates in the oral cavity, raising the oral pressure to the point where voicing ends (i.e. when oral pressure equals subglottal pressure).

In fricatives, the difficulty of maintaining voicing could theoretically be smaller, given that some of the air accumulated in the oral cavity escapes, lowering the oral pressure. However, this is not the case. The difficulty comes from the conflict between the production of voicing and the generation of the turbulence that is required for the identification of a fricative and its place of articulation. For the generation of strong frication turbulence, high oral air pressure is necessary and, as in stops, if the oral air pressure is raised too much, it becomes too close in magnitude to subglottal air pressure and the production of voicing will cease. In the production of strident fricatives such as /z/ this issue is even more relevant (Beckman et al. 2006): in order to distinguish strident from non-strident fricatives, strident fricatives need to be produced with a large amplitude of frication turbulence. Ohala (1997) observes that there is a statistically greater tendency for fricatives to favor voicelessness than for stops: 24% of the world’s languages have only voiceless stops and about 38% have only voiceless fricatives.

A rough comparison of the acoustic properties of voiced and voiceless fricatives is given in (3).

(3) Acoustic properties of voiceless and voiced fricatives in a V1-C-V2 sequence

Segment   Acoustic property          Voiceless fricative   Voiced fricative
C         Frication duration         Longer                Shorter
C         Voicing during frication   No                    Yes
V1, V2    V1 duration                Shorter               Longer
V1, V2    F0/V1, V2                  Higher                Lower
V1, V2    F1/V1, V2                  Higher                Lower

(Adapted from Stevens’ (2000, passim) description of fricatives)

2.2.2 Phonological patterns: distribution of the contrast and contrast neutralization

The distribution of voicing contrast in strident fricatives is to a certain extent similar to the distribution of voicing contrast in stops. The preservation of contrast in stops is dependent on the presence of a following sonorant segment, since important cues such as VOT are available in those contexts. Many languages neutralize voicing contrast in word-final strident fricatives as well as in fricatives preceding a stop, similarly to the pattern of stops. In this section we describe a few cases of fricative voicing neutralization that show strong similarities with the previously described stop cases. (4) illustrates the distribution of voicing contrast in strident fricatives with a sample of languages.

(4) Patterns of voicing neutralization in fricatives

Language                                  V_V           #_V           V_#
Italian (e.g. Tuscan dialect)             no contrast   no contrast   no contrast
German, Italian (e.g. Milanese dialect)   contrast      no contrast   no contrast
Russian, Polish, Czech                    contrast      contrast      no contrast
Hungarian, Turkish, Romanian, French      contrast      contrast      contrast

German

In German, the voicing contrast of strident fricatives is maintained only in the intervocalic environment; otherwise (word-finally, word-initially) it is neutralized. In word-initial position, fricatives neutralize to voiced [z], though in unassimilated loanwords [s] is also found (Wiese 1996: 12). In word-final position, fricatives neutralize to voiceless [s].

(5) a. Voicing contrast preserved between sonorants
       Contrast [+son]_[+son]      Gräser [z] 'grass (PL)'      Füße [s] 'foot (PL)'
    b. Voicing contrast is neutralized word-initially
       No Contrast #_[+son]        Saat [z] 'seed'              *Saat [s]
                                   Sahne [z] ‘cream’            *Sahne [s]
    c. Voicing contrast is neutralized word-finally
       No Contrast [+son]_#        Gras [s] 'grass (SG)'        *Gras [z]

Labio-dental fricatives display a different voicing pattern, in that they neutralize only in word-final position but not in word-initial and intervocalic positions:

(6) a. Voicing contrast is preserved in word-initial and intervocalic position
       wolle [v] ‘wool’                 volle [f] ‘full (PL)’
       Archive [v] ‘archives (PL)’      Wölfe [f] ‘wolves (PL)’
    b. Voicing contrast is neutralized in word-final position
       Archiv [f] ‘archive (SG)’        *Archiv [v]

This paper will focus on the voicing contrast in alveolar fricatives only; other fricatives are not discussed here.

Russian

In Russian, the loss of distinctive voicing in strident fricatives occurs in word-final position. Contrast is preserved word-initially and before sonorants. The data in (7) show this pattern.

(7) a. Voicing contrast is preserved before sonorants:
       Contrast [+son]_[+son]      les-a ‘forest (GEN.SG.)’      niz-a ‘bottom (GEN.SG.)’
       Contrast #_[+son]           sad- ‘orchard (NOM.SG.)’      zad- ‘back (NOM.SG.)’
    b. Voicing contrast is neutralized everywhere else
       No Contrast [+son]_#        les- ‘forest (NOM.SG.)’       nis- ‘bottom (NOM.SG.)’

Turkish

In Turkish, word-finally, stops neutralize but fricatives do not. The distribution of voicing in fricatives has the same pattern as in English, where the phonological voicing contrast is preserved everywhere (inter-vocalically, word-initially, and word-finally)2.

(8) a. Voicing contrast is preserved before sonorants
       Contrast [+son]_[+son]      asa ‘stick’      aza ‘member’
       Contrast #_[+son]           su ‘water’       zor ‘trouble’
    b. Voicing contrast is preserved word-finally
       Contrast [+son]_#           kaz ‘goose’      kas ‘muscle’

Although a comprehensive survey of the cross-linguistic distribution of fricative voicing neutralization has not been done, the small sample of languages illustrates a pattern which can be summarized by the implicational hierarchy below.

(9) No Contrast V_V → No Contrast #_V → No Contrast V_#, No Contrast V_O

The sample of languages presented in (4) suggests that, as is the case for stop voicing neutralization, word-final neutralization in fricatives is more common than word-initial neutralization. More precisely, word-initial neutralization seems to imply word-final neutralization but not vice versa.

2 In spite of the absence of phonological contrast neutralization, both stops and strident fricatives are phonetically devoiced word-finally, i.e. spectrograms reveal that most of the time voicing is not produced throughout the closure/frication but only in the vicinity of the preceding sonorant.

3. Experiment: Identifying important cues to voicing in fricatives

The similar contrast neutralization patterns in stops and fricatives (§2) suggest that the perception of fricative voicing would also depend on cues that rely on the presence of a following sonorant segment, just as the VOT cue for stops does. Until now, however, perceptual cues to voicing in alveolar fricatives have not been thoroughly analyzed, and therefore there is no evidence of any sonorant-dependent property as a cue. The experiment reported here was designed to assess the saliency of individual acoustic cues to the perception of voiced and voiceless alveolar fricatives in the intervocalic environment.

3.1. Experimental Procedure

Our perceptual study was a forced-choice identification task. Individual cues were edited one at a time, a method similar to Raphael’s (1981) work on the acoustic cues to stop voicing in American English. In order to investigate the contribution of individual cues to the perception of voicing in alveolar fricatives, we hypothesized that when a listener is presented with a stimulus which is overall ambiguous but contains one salient cue pointing him to one voicing category, e.g., [+voice], he will be prone to systematically categorize the stimulus as being [+voice] because of the high perceptibility of that cue. In contrast, when the listener is presented with a stimulus which is overall ambiguous but contains a marginal cue pointing him to one voicing category, e.g., [-voice], he will categorize the stimulus as being sometimes [+voice] and sometimes [-voice], because the marginal cue is not salient enough to override the overall ambiguity.

The experiment was therefore designed to investigate the strength of each cue in moving the stimulus away from the categorical boundary (chance-level [s] and [z] responses) into one of the voicing categories (above-chance [s] or [z] responses). To do so, we first created the ambiguous base, using the measurements from five speakers. We then moved this ambiguous stimulus in the two directions, towards [s] or towards [z], by editing one cue at a time.

3.1.1 Source material

Five American English speakers (males, age 25-35) were recorded while reading [asa] and [aza] in a sound-attenuated booth. Each item was repeated three times and embedded in a carrier sentence (“Say x please”) to eliminate the effect of the boundary tone on the final vowel that arises when items are read in isolation. The recording was made to obtain an average estimate of the acoustic properties of intervocalic /s/ and /z/. The speakers did not report any history of speaking or hearing disorder. The sound files were recorded at a sampling rate of 44.1 kHz.

The following acoustic properties were measured using Praat: duration and intensity of the frication; duration of the surrounding vowels; F0, F1 and F2 at the offset of V1 and at the onset of V2. The measurements were averaged across speakers and the standard deviation (SD) was calculated. The speaker whose values best fit within one SD of the average was chosen to be the representative speaker. In order to make the two tokens more representative, they were edited for the parameters which departed by more than 1 SD from the average across speakers. The resulting two tokens ([asa] and [aza]) were used as the source material to create the base stimulus for the editing. The acoustic properties of the two tokens were in line with the descriptions of intervocalic fricatives in (3) in Section 2. Below are the spectrograms of the two tokens.

(10) Spectrograms: a. [asa]; b. [aza]

The frication duration was longer in the voiceless fricative [s]. Voicing bars were present during the frication of [z] but not [s]. The preceding vowel (V1) was shorter with [s] than with [z]. F0 and F1 on the preceding and succeeding vowels (V1, V2) were higher with voiceless [s] than with voiced [z].

3.1.2 Preparing stimuli

Starting from the two tokens described above, an ambiguous stimulus was created through editing; its acoustic properties were exactly in the middle between those of the voiced and those of the voiceless fricative. Pretesting revealed that in order for the token to be perceptually ambiguous, there should be no voicing at all; therefore the frication of the ambiguous stimulus, the Base, was the frication of voiceless [asa], with edited duration. Moreover, we could not manipulate formant values of the vowels to the middle values since Praat does not allow manipulating formants without synthesis. We started with the vowels of [asa] to create the [Base], setting all the values to the averaged values of [asa] and [aza], except for the formant values, which remained the same as those of [asa]. Despite this apparent [s] bias, in the pretesting phase of the experiment, subjects reported that the Base stimulus was very ambiguous: half of the listeners categorized it as [s], and half as [z]. The values of the acoustic variables of the tokens [asa], [aza] and the Base are shown in (11).

(11) Values of the acoustic variables

         V1 dur.   V2 dur.   Fric. dur.   F0/V1   F0/V2   F1, F2/V1    F1, F2/V2
[asa]    122       77        94           129     153     812, 1248    525, 1472
[aza]    152       88        52           106     95      447, 1293    480, 1371
Base     137       82        73           118     121     837, 1247    495, 1424
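As an illustration of how the Base relates to the two source tokens, the sketch below computes the arithmetic midpoint of the [asa] and [aza] values in (11) for the duration and F0 variables and compares it with the reported Base values. The variable names are introduced here for illustration only; formants are left out because, as stated above, they were not set to middle values.

```python
# Duration and F0 values transcribed from (11); the keys are illustrative labels.
ASA = {"V1 dur": 122, "V2 dur": 77, "Fric dur": 94, "F0/V1": 129, "F0/V2": 153}
AZA = {"V1 dur": 152, "V2 dur": 88, "Fric dur": 52, "F0/V1": 106, "F0/V2": 95}
BASE = {"V1 dur": 137, "V2 dur": 82, "Fric dur": 73, "F0/V1": 118, "F0/V2": 121}

# Arithmetic midpoint between the voiceless and the voiced token for each cue.
midpoints = {cue: (ASA[cue] + AZA[cue]) / 2 for cue in ASA}

# Difference between the reported Base value and that midpoint.
deviation = {cue: BASE[cue] - midpoints[cue] for cue in ASA}

for cue in ASA:
    print(f"{cue:9s} midpoint={midpoints[cue]:6.1f} base={BASE[cue]:4d} diff={deviation[cue]:+.1f}")
```

Under this reading, the reported Base values coincide with the midpoints or lie within a few units of them; the small differences may reflect rounding or the limits of the editing procedure, which the text does not specify.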

The perceptual effect of each cue was tested by subjecting the Base to a number of editing operations which consisted of manipulating single variables in the direction of [s] and of [z]. The stimuli for the perceptual experiment therefore differed from the Base in only one single cue. Assuming that the ambiguous Base lies near the categorical boundary between [s] and [z], the editing of each cue towards its value for [s] or [z] made the stimulus closer to [s] or [z] respectively. The directions and the effects of the editing are schematized in (12):

(12) Stimulus editing

[asa]  ←  [a?a] (the ambiguous Base)  →  [aza]
          categorical boundary

(13) presents the values of the variables for each single-variable edited stimulus type. All the cues were manipulated one at a time in two directions, except for the formants. Formant cues were edited only in the [z] direction because the formants of the Base were the same as the formants of the vowels in [asa], for the reason described above.

(13) Single-variable editing

Cue Direction      V1 Dur   V2 Dur   Fric Dur   F0/V1   F0/V2   F1, F2/V1    F1, F2/V2
Unedited [asa]     122      77       94         129     153     812, 1248    525, 1472
Unedited [aza]     152      88       52         106     95      447, 1293    480, 1371
[Base]             137      82       73         118     121     837, 1247    495, 1424
V1 dur [s]         122      82       73         118     121     837, 1247    495, 1424
V1 dur [z]         152      82       73         118     121     837, 1247    495, 1424
V2 dur [s]         137      77       73         118     121     837, 1247    495, 1424
V2 dur [z]         137      88       73         118     121     837, 1247    495, 1424
Fric dur [s]       137      82       94         118     121     837, 1247    495, 1424
Fric dur [z]       137      82       52         118     121     837, 1247    495, 1424
F0/V1 [s]          137      82       73         129     121     837, 1247    495, 1424
F0/V1 [z]          137      82       73         106     121     837, 1247    495, 1424
F0/V2 [s]          137      82       73         118     153     837, 1247    495, 1424
F0/V2 [z]          137      82       73         118     95      837, 1247    495, 1424
F1, F2/V1 [z]      137      82       73         118     121     447, 1293    495, 1424
F1, F2/V2 [z]      137      82       73         118     121     837, 1247    480, 1371

In addition, a second set of stimuli was created from the Base by manipulating more than one variable at a time: the cumulative perceptual effect of vowel cues was tested by replacing (splicing) both vowels of the Base with the vowels of [asa] and [aza] respectively. Although the frication was ambiguous, formant transitions and vowel durations pointed towards one of the two voicing categories. Acoustic properties of these two stimuli are listed in (14).

(14) Vowel replacement

Cue Direction       V1 Dur   V2 Dur   Fric Dur   F0/V1   F0/V2   F1, F2/V1    F1, F2/V2
Unedited [asa]      122      77       94         129     153     812, 1248    525, 1472
Unedited [aza]      152      88       52         106     95      447, 1293    480, 1371
[Base]              137      82       73         118     121     837, 1247    495, 1424
V1 and V2 [asa]     122      77       73         129     153     812, 1248    525, 1472
V1 and V2 [aza]     152      88       73         106     95      447, 1293    480, 1371

Finally, a third set of stimuli was created with the aim of identifying the effect of voicing during the frication on the perception of the voicing distinction in fricatives. All cues other than frication voicing were kept as in the Base, as shown in (15). Voicing was added to the voiceless frication of the Base on a continuum from 10 to 50%.

(15) Frication voicing addition

Cue Direction      V1 Dur   V2 Dur   Fric. Dur   Voicing   F0/V1   F0/V2   F1, F2/V1    F1, F2/V2
Unedited [asa]     122      77       94          None      129     153     812, 1248    525, 1472
Unedited [aza]     152      88       52          None      106     95      447, 1293    480, 1371
[Base]             137      82       73          None      118     121     837, 1247    495, 1424
10% [z] voicing    137      82       73          10%       118     121     837, 1247    495, 1424
20% [z] voicing    137      82       73          20%       118     121     837, 1247    495, 1424
30% [z] voicing    137      82       73          30%       118     121     837, 1247    495, 1424
40% [z] voicing    137      82       73          40%       118     121     837, 1247    495, 1424
50% [z] voicing    137      82       73          50%       118     121     837, 1247    495, 1424

Fillers were added to the twenty-two target stimuli: these were sixteen stimuli of the form [VsV] and [VzV], where V= e, i, o, u. The entire stimulus set was RMS equalized using Becker’s (2002) Praat script.
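The stimulus counts can be tallied as a quick consistency check. The sketch below assumes that the twenty-two targets comprise the rows of (13), (14) and (15), with the two unedited tokens and the Base counted once each; the text does not enumerate the targets, so this breakdown is an assumption. The repetition and block figures are those reported in Section 3.1.3.

```python
# Assumed breakdown of the target stimuli, based on the rows of (13)-(15).
reference_tokens = 3        # unedited [asa], unedited [aza], Base
two_way_cues = 5            # V1 dur, V2 dur, Fric dur, F0/V1, F0/V2 (edited towards [s] and [z])
formant_edits = 2           # F1, F2/V1 and F1, F2/V2 (edited towards [z] only)
vowel_replacements = 2      # V1 and V2 from [asa], V1 and V2 from [aza]
voicing_steps = 5           # 10%, 20%, 30%, 40%, 50% added voicing

n_targets = (reference_tokens + two_way_cues * 2 + formant_edits
             + vowel_replacements + voicing_steps)

# Design figures reported in Section 3.1.3.
n_fillers = 16
repetitions_per_block = 6
n_blocks = 3

total_trials = (n_targets + n_fillers) * repetitions_per_block * n_blocks
presentations_per_stimulus = repetitions_per_block * n_blocks

print(n_targets, total_trials, presentations_per_stimulus)
```

If the assumed breakdown is right, the tally reproduces the twenty-two targets, the 684 trials per listener, and the 18 presentations of each stimulus stated in the text.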

3.1.3 The perceptual experiment

The subjects who participated in this study were nineteen native English speakers (age 25-55). None of them reported a history of hearing or speaking impairment. The experiment was a two-alternative forced-choice identification task. Stimuli were randomized and presented in three blocks (((22+16) stimuli * 6 repetitions) * 3 blocks = 684 tokens total) using Psyscope X B5 1D. Each stimulus was thus repeated 18 times. The listeners were asked to do a speeded categorization task by pressing one of two keys, labeled [S] and [Z], as fast as they could. Each trial timed out 2000 ms after the sound was played, and then a screen told subjects to respond faster. There was an interval of 200 ms between the key press and the beginning of the following stimulus.

3.2. Results

3.2.1 Effect of the single-variable manipulations on the subjects’ responses

The analysis of the subjects’ responses revealed that listeners are sensitive to most of the edited acoustic properties. The chart in (16) plots the probability of categorizing the stimulus as containing [s], p(s), for the stimuli for which one cue is edited at a time in both directions. Responses to the stimuli are plotted by comparing the effect of symmetric manipulations (editing the same cue in the two directions): the adjacent bars show that when the cue was edited towards the [s] value, p(s) rose, and when the cue was edited towards the [z] value, p(s) decreased. These results thus reveal that the direction of the manipulations is perceptually salient.

(16) Responses to single-variable edited stimuli (average across subjects)

Unedited instances of [asa] and [aza] were almost always correctly identified by the subjects. Among the stimuli edited by a single variable, frication duration has the biggest effect on the response type. Cues in V1 are also salient to the subjects. Cues in V2 (duration and F0) had a smaller effect on the perception of voicing. However, this should not be interpreted to mean that cues in V1 are generally more important than those in V2. This asymmetry between V1 and V2 may have come from the fact that our speaker put stress on V1 and substantially reduced and shortened V2 in both [asa] and [aza]. Thus, the V1’s were distinct enough from each other in [asa] and [aza], whereas the V2’s in both utterances were similar to each other. It is therefore not surprising that cues in V1 had a greater discriminating effect than cues in V2. It is plausible that cues in V2 appear less perceptually salient because the segment was less salient overall, and not because of its intrinsic marginality. Nevertheless, the results may also reflect the fact that the duration of the preceding vowel can often be a more crucial cue to voicing in both stops and fricatives than the following vowel, especially in a language like English where final phonological voicing neutralization does not occur, but where phonetic final devoicing is attested.

3.2.2 Effect of vowel replacement and frication voicing on the subjects’ responses

The chart below plots pairwise comparisons between the probability that the Base is categorized as voiceless and the probability that the edited stimuli are perceived as voiceless. Importantly, the results from the pretesting phase are confirmed during the experiment: although there is a slight bias to categorize the Base as [asa], the stimulus turned out to be ambiguous, with p(s) almost 0.5.

(17) Comparing the Base stimulus to the edited stimuli

The effects of the single-cue manipulations are discussed below.

Vowel formants: Formant editing had a big effect on the response type. It should be noted that F1 and F2 cannot be manipulated separately from the entire spectrum, and so our method of formant editing was rather indirect: instead of changing formant values in the Base stimulus, we had to splice the entire [asa] or [aza] vowels onto the Base and subsequently edit the other values (duration and F0) back to those of the Base, leaving only the formant values changed. Because of this, we cannot guarantee that acoustic properties other than the formants were not carried over to the Base beyond our control. Although for this reason the true effects of the F1 and F2 cues may be smaller than the results presented here suggest, the formant cue should still be considered one of the strongest. As for the formants on V2, the same caveat applies as in the previous section.

Vowel replacement: The replacement of both vowels had a great effect on the categorization: cues contained in the vowels are therefore very perceptually salient. This result is not surprising given that the replacement involves giving the listener at least three types of information regarding the voicing distinction: vowel duration, formant transitions into the frication, and formant transitions out of the frication. All of these cues were found to be individually salient, and therefore the additive effect was expected. The fact that replacement of the Base vowels with the vowels from [asa] has a much smaller perceptual effect than the replacement with the vowels from [aza] arises from the acoustic properties of the Base itself (§ 3.1.2): the smaller perceptual effect can be attributed to the slight bias of the Base towards [s], as mentioned above.

Frication voicing: The addition of frication voicing along a continuum interestingly had a linear effect on the subjects’ responses: as the frication of the [Base] becomes more voiced, subjects are less likely to categorize the stimulus as being voiceless. More precisely, a linear decrease of p(s) corresponds to a linear increase of frication voicing. This finding suggests not only that even a small percentage of voicing is very perceptually salient, but also that perception of voicing during the frication is linear.

3.2.3 Perceptual effect of the cues to frication voicing

A more telling method to quantify the importance of a cue to the perception of a phonological contrast is to quantify the perceptual distance between the individual editings and the categorical boundary. The greater the distance between the ambiguous stimulus and the target stimulus edited for one acoustic dimension, the greater the perceptual saliency of that dimension for the listener. More precisely, if the editing of one single cue shifts the distribution of the responses to the stimulus by a large amount with respect to the responses to the ambiguous [Base], that cue is salient to the perception of the contrast under investigation.

There is an open debate in the literature as to which distance function should be used for computing distance relationships between stimuli in the perceptual space. In our study, distance measures were obtained according to MacMillan and Creelman’s Detection Theory (2005). The stimuli were classified as one-dimensional on the assumption that the single edit of perceptual cues will only produce a one-dimensional change in the perception of the stimulus. We are nevertheless aware of the fact that a one-dimensional change in the acoustic property of a stimulus may sometimes result in a multidimensional change on a perceptual scale. “[…] the question of the perceptual dimensionality of a stimulus set is distinct from that of physical structure. Stimuli differing in one dimension can produce multi-dimensional perceptual changes.” (MacMillan and Creelman, 2004: 114). In the event that some of the editings had this unwanted result, our analysis will not capture the multi-dimensionality of the change, since it treats them as one-dimensional.

Perceptual distance was calculated between the Base stimulus and each edited target stimulus in order to find out whether the editing of any cue resulted in a bigger perceptual distance compared to the editing of other cues. Distance is measured by the sensitivity statistic d’, defined in terms of z, the inverse of the normal distribution function:

(19) d’ = z(S1) – z(S2)

where the z transformation converts the probability of a response into standard deviation units. A probability of 0.5 is converted into a z score of 0, small probabilities into negative z scores, and large probabilities into positive z scores. Probabilities of exactly 1 or exactly 0 yield an infinite d’. For this reason, we converted probabilities of 1 and 0 to 0.95 and 0.05, respectively. S1 was consistently chosen to be the stimulus with the higher p(s) of the two stimuli being compared, regardless of whether it was the Base or the edited stimulus.

(20) Converting p(s) into d’-values
d’ = z(S1) – z(S2)
S1: Stimulus with higher p(s)
S2: Stimulus with lower p(s)
where S1 or S2 is the Base
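The conversion in (19) and (20) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `z` and `d_prime` and the response proportions are hypothetical and are not taken from the study; the standard library's `NormalDist.inv_cdf` stands in for the z transformation.

```python
from statistics import NormalDist

def z(p, floor=0.05, ceil=0.95):
    """Inverse standard-normal CDF; p = 0 and p = 1 are replaced by
    0.05 and 0.95, as described in the text, to avoid infinite scores."""
    if p <= 0.0:
        p = floor
    elif p >= 1.0:
        p = ceil
    return NormalDist().inv_cdf(p)

def d_prime(p_a, p_b):
    """d' between two stimuli given their p(s).
    S1 is whichever stimulus has the higher p(s), so d' is never negative."""
    s1, s2 = max(p_a, p_b), min(p_a, p_b)
    return z(s1) - z(s2)

# Hypothetical response proportions (not the study's data).
p_base = 0.50   # ambiguous Base
p_edit = 0.80   # stimulus edited towards [s]
print(round(d_prime(p_base, p_edit), 3))  # prints 0.842
```

Because S1 is always the stimulus with the higher p(s), the order of the two arguments does not matter.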

Starting from the ambiguous base, most cues were edited in two directions, towards [s] and towards [z], yielding two symmetric stimuli. Since the manipulation was done along a single dimension, the d’ values of the same cue in the two directions could be added. For each symmetric cue, the cumulative d’ was calculated as in (21):

(21) d’(s, z) = d’(s, Base) + d’(Base, z)

For those stimuli which do not have a symmetric counterpart, the d’ values could not be added, and the individual d’ values are reported instead. The table in (22) shows the d’ values of the different cues, arranged in descending order.

(22) Effect of the manipulations on d’

Cues                      d’
V1 and V2 replacement     2.368
50% [z] voicing           1.059
F1, F2/V1                 0.985
Frication duration        0.859
V1 duration               0.749
40% [z] voicing           0.702
F0/V1                     0.690
F0/V2                     0.667
30% [z] voicing           0.603
20% [z] voicing           0.454
F1, F2/V2                 0.340
10% [z] voicing           0.308
V2 duration               0.151
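The cumulative measure in (21) can be illustrated with a short sketch. The p(s) values below are hypothetical, and the helper `cumulative_d_prime` is an assumption introduced only for this example. When each edit moves responses in its intended direction, the Base term cancels, so the sum equals the direct distance between the two edited stimuli.

```python
from statistics import NormalDist

def z(p):
    """Inverse standard-normal CDF (p assumed strictly between 0 and 1)."""
    return NormalDist().inv_cdf(p)

def cumulative_d_prime(p_s_edit, p_base, p_z_edit):
    """Sum of d'(s, Base) and d'(Base, z), as in (21).

    Assumes p_s_edit >= p_base >= p_z_edit, i.e. each edit shifts
    responses in its intended direction.
    """
    d_s_base = z(p_s_edit) - z(p_base)
    d_base_z = z(p_base) - z(p_z_edit)
    return d_s_base + d_base_z

# Hypothetical p(s) values, chosen only for illustration.
total = cumulative_d_prime(0.80, 0.50, 0.30)
direct = z(0.80) - z(0.30)
print(abs(total - direct) < 1e-12)  # the Base term cancels
```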

A repeated measures analysis of variance (ANOVA) was performed on the obtained d’ values. The analysis indicated a significant effect of the factor cue manipulation on the d’ value [F(11, 198), p < 0.001]. As already emerged from the p(s) results, the d’ analysis reveals that vowel replacement and frication voicing are the most informative cues to voicing categorization of the alveolar fricatives under analysis. Among the single-edited dimensions, durational cues in the frication and in the preceding vowel are most salient to listeners.

(23) Relative saliency of the edited cues
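For readers unfamiliar with the test reported above, the following is a minimal sketch of a one-way repeated-measures ANOVA, assuming a fully crossed design in which every subject contributes one d’ value per cue manipulation. The function `rm_anova_f` and the data are hypothetical and do not reproduce the study's analysis. With k conditions and n subjects, the degrees of freedom are k − 1 and (k − 1)(n − 1).

```python
def rm_anova_f(scores):
    """One-way repeated-measures ANOVA.

    scores[i][j] is the value for subject i under condition j.
    Returns (F, df_conditions, df_error).
    """
    n = len(scores)       # number of subjects
    k = len(scores[0])    # number of conditions (here: cue manipulations)
    grand = sum(sum(row) for row in scores) / (n * k)
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]

    # Partition the total sum of squares into conditions, subjects, error.
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, df_cond, df_error

# Hypothetical d' values: 4 subjects x 3 manipulations (not the study's data).
data = [
    [2.1, 1.0, 0.3],
    [2.5, 0.9, 0.2],
    [2.2, 1.2, 0.1],
    [2.4, 1.1, 0.4],
]
f_value, df1, df2 = rm_anova_f(data)
print(df1, df2)  # prints 2 6
```

Removing the between-subject sum of squares from the error term is what distinguishes this design from an ordinary one-way ANOVA.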

It should be noted that this experiment has investigated the relative importance of different cues to the perception of voicing in alveolar fricatives. It does not address the question of whether the same weighting system is applicable to non-strident fricatives; we do not have the relevant evidence to extend our findings to all fricatives.

4. Discussion

The present experiment has revealed two categories of cues that are important in the discrimination of voicing of alveolar fricatives in intervocalic position in English. The cues that proved most salient to the listeners are cues in the adjacent vowels (duration and formant transitions) and cues in the frication (duration and voicing). The Licensing by Cue hypothesis predicts that contrast will preferably be preserved in intervocalic position, and is likely to be neutralized word-initially and word-finally.

Unlike in the case of stops, no perceptual cues favoring the word-initial position were found for fricatives. When present, such cues, like VOT, make the word-initial position a better environment for the discrimination of contrast than the word-final position. The stronger influence of V1 was due to the fact that V1 was stressed and V2 was heavily reduced, so that there was no discernible difference between the V2s in [asa] and [aza]. Even so, if the differences between V1 and V2 found in this paper are reliable at all, the asymmetry predicts a pattern of contrast neutralization complementary to that of stops, i.e., contrast is better preserved in postvocalic (V_#) than in prevocalic (#_V) position. Contrary to this prediction, a typological study showed that the distribution of voicing contrast in strident fricatives is remarkably similar to the one observed among stops.

In the case of stop voicing contrast, the asymmetry is predicted by the nature of the strongest cue to the contrast, VOT, which depends on the following sonorant. Although only half of the formant cues are available at either edge of the word, the two edges differ greatly in how informative they are. Whereas VOT cues are preserved in a word-initial stop, no VOT cue is available in a word-final stop. The initial vs. final asymmetry is therefore accounted for by the Licensing by Cue hypothesis (Steriade 1997). The lack of a VOT cue in word-final position makes the identification of voicing less reliable, and therefore the contrast is more likely to undergo neutralization.
While the neutralization pattern of stops can be attributed to the lack of a VOT cue, it has been less clear what cues are missing in the case of fricatives. The current study showed that cues in the surrounding vowels are as important as those in the fricative itself, which explains the asymmetry between intervocalic and word-initial/final positions. However, our results do not explain the asymmetry between word-initial and word-final position, that is, why word-initial position is a better place to keep the contrast than word-final position. We have shown in Section 2 the implicational hierarchy of positional neutralization of voicing contrast: many languages neutralize the voicing contrast in word-final position (e.g., Russian); some neutralize the contrast both in word-final and in word-initial position (e.g., German). However, there is no language which preserves the distinction word-finally but neutralizes it word-initially. Our results do not justify any difference between the cues in V1 and V2, and thus do not account for the asymmetry between word-initial (prevocalic) and word-final (postvocalic) positions. Therefore, they neither support nor disprove the Licensing by Cue hypothesis.

We have shown that the duration of frication is one of the most important single cues. We can hypothesize that the duration of frication is hard to perceive in word-final position, because acoustic signals gradually die out at the end of an utterance, which may make the end point of frication in word-final position less clear than the beginning of frication in word-initial position. This may render frication duration harder to measure in word-final position than in word-initial position. This hypothesis is to be investigated in Giavazzi (2008).

In a nutshell, our results showed that cues in the surrounding sonorants, and cues that are more salient in the presence of surrounding sonorants, are crucial for distinguishing the fricative voicing contrast. However, our results do not explain the asymmetry between word-final and word-initial position. The Licensing by Cue hypothesis is neither supported nor disproved by our results as an explanation of the positional (word-initial vs. word-final) asymmetry of voicing contrast neutralization in fricatives.

5. Conclusion

The present study has identified the important cues to the voicing distinction of fricatives by looking at the intervocalic position. It turned out that sonorant-dependent properties, such as the cues in the surrounding vowels, as well as frication duration, were important. This explains why contrast is preserved more often in intervocalic position than at word boundaries. However, our results do not provide any evidence that a following sonorant should be more important than a preceding sonorant, as was the case for VOT in stops. We hypothesize that frication duration is less perceptible in word-final position; a further study is necessary to justify this speculation.
According to our results, the Licensing by Cue hypothesis is supported in explaining the asymmetry between intervocalic position and word boundaries, but it is neither supported nor disproved in explaining the word-final/word-initial asymmetry of voicing contrast neutralization in fricatives.

6. References

Beckman, J. et al. 2006. Phonetic Variation and Phonological Theory: German Fricative Voicing. Proceedings of the 25th West Coast Conference on Formal Linguistics. Cascadilla Proceedings Project.
Flemming, E. 2007. Course materials for 24.910 Topics in Linguistic Theory: Laboratory Phonology, Spring 2007. MIT OpenCourseWare (http://ocw.mit.edu/).
Giavazzi, M. 2008. On the durational aspects of the voicing contrast in alveolar fricatives. MIT, ms.
Haggard, M. P., Ambler, S. & Callow, M. 1970. Pitch as a voicing cue. Journal of the Acoustical Society of America 47. 613-17.
Liljencrants, J. & Lindblom, B. 1972. Numerical simulation of vowel quality systems: The role of perceptual contrast. Language 48. 839-862.
Lisker, L. 1957. Closure duration and the intervocalic voiced-voiceless distinction in English. Language 33. 42-49.
Lisker, L. 1975. Is it VOT or a first formant transition detector? Journal of the Acoustical Society of America 57. 1547-51.
Luce, R. D. 1963. Detection and recognition. Handbook of Mathematical Psychology (R. D. Luce, R. R. Bush, and E. Galanter, Eds.), 103-187.
Macmillan, N. A. & Creelman, C. D. 2005. Detection Theory: A User's Guide (2nd ed.). Mahwah, N.J.: Lawrence Erlbaum Associates.
Nosofsky, R. M. 1986. Attention, Similarity, and the Identification-Categorization Relationship. Journal of Experimental Psychology: General 115(1).
Ohala, J. J. 1997. Aerodynamics of Phonology. Proceedings of the 4th Seoul International Conference on Linguistics. 92-97.
Padgett, J. 2002. Russian voicing assimilation, final devoicing, and the problem of [v]. Natural Language and Linguistic Theory.
Raphael, L. 1981. Duration and contexts as cues to word-final cognate opposition in English. Phonetica 38. 126-47.
Repp, B. 1979. Relative amplitude of aspiration noise as a cue for syllable-initial stop consonants. Language and Speech 22. 947-950.
Steriade, D. 1997. Phonetics in Phonology: The Case of Laryngeal Neutralization. UCLA, ms.
Steriade, D. 1999. Alternatives to Syllable-Based Accounts of Consonantal Phonotactics. In O. Fujimura, B. Joseph and B. Palek (eds.) Proceedings of the 1998 Linguistics and Phonetics Conference. 205-242. The Karolinum Press.
Stevens, K. 2000. Acoustic Phonetics. Cambridge: The MIT Press.
Wiese, R. 1996. The Phonology of German. Oxford: Clarendon Press.