Journal of Phonetics (1985) 13, 299-308

On the relation between pitch excursion size and prominence

A. C. M. Rietveld* and C. Gussenhoven†

*Institute of Phonetics and †Department of English, Catholic University of Nijmegen, Erasmusplein 1, 6525 HT Nijmegen, The Netherlands

Received 23rd January 1985, and in revised form 21th June 1985

An experiment was carried out, first, to establish whether excursion size differences of 1.5 semitones (ST) in accent-lending F0 movements are sufficient to create a difference in the perception of prominence, and, second, to assess whether differences in the perception of prominence as a function of excursion size differences are more adequately described using a semitone scale or a Hertz scale. The results suggest that a difference of 1.5 ST is sufficient to cause a difference in the perception of prominence, and that prominence judgements of different excursion sizes follow a Hertz scale more closely than a semitone scale.

1. Introduction


As the chief phonetic correlate of accent, fundamental frequency (F0) has received a great deal of attention in experiments on the perception and production of accent. It is clearly of great importance, in speech analysis as well as speech synthesis, to understand what the perceptual tolerances are within which listeners operate. 't Hart (1981) quite rightly questions the relevance of results of psycho-acoustic experiments to the perception of F0 variations in speech, arguing that it is unlikely that the very small 'just noticeable differences' (JNDs) for F0 observed in those experiments play a significant role in speech perception. 't Hart, in fact, found much larger JNDs in experiments in which he asked listeners to judge F0 differences in speech. While observing that judges varied in the extent to which they were capable of discriminating between large and small pitch movements, some listeners requiring only 1.5-2.0 semitones (ST), others apparently as much as 4.0 ST, his conclusion is that differences of less than 3.0 ST are unlikely to play a communicative role in speech.

While the experiments by 't Hart thus placed the issue of discriminability of F0 differences in a linguistic context, the listeners' task remained non-linguistic in the sense that they had to decide which item of a stimulus pair contained the larger pitch movement. Linguistically, the size of accent-lending F0 excursions would in general appear to correlate with the prominence of the accent. Accordingly, in the experiment reported below, we decided to put 't Hart's claim that differences of less than 3.0 ST do not play a role in speech to the test in a linguistically oriented task: one which required judges to decide which of two accents that varied in F0 excursion size was more prominent, choosing 1.5 ST as our smallest interval.

As is well known, the relation between prominence and F0 excursion is confounded by overall intonation features. As Breckenridge & Liberman (1977) and Pierrehumbert (1979) have shown, the prominence impression of F0 excursions is a function of the serial position of the accent, later accents requiring smaller excursions than earlier ones, an effect which is generally attributed to declination (cf. Cohen, Collier & 't Hart, 1982). Since we wanted to avoid problems of interpretation caused by possible declination effects, we decided only to elicit judgements about different F0 excursions for the same accent in otherwise identical utterances.

A separate, and arguably more important, issue in the relation between differences in perceived prominence and F0 excursion size differences is that of the measure in which F0 differences should be expressed for the purposes of linguistic description. Some authors, e.g. Pierrehumbert (1979), Ladd (1983) and Liberman & Pierrehumbert (1984), present their data in Hertz, others, e.g. 't Hart & Collier (1975), Thorsen (1980) and 't Hart (1981), in semitones. Expression of F0 data in ST would seem to do justice to the perception of pitch intervals: a jump from 150 to 300 Hz is, musically, equal to one from 100 to 200 Hz. On the other hand, there are also indications that a given semitone interval in a low frequency range does not have the same perceptual effect as the same interval (expressed in semitones) in a higher frequency range. In a pilot study on the perceptual effect of F0 movements superimposed on a steeply descending baseline carried out by the first author, it was found that early movements created a stronger prominence impression than later movements with the same excursion in semitones. Perhaps Stevens' remarks (1975, p. 168) are relevant here:

". . . all musical intervals grow subjectively larger as frequency increases up to about four octaves above middle C. In other words, throughout the whole of what is usually called the musical range, intervals made up of equal frequency ratios (i.e. musical intervals) increase in perceived pitch extent with increasing frequency. . . . It is often thought that the musical scale based on frequency ratios is somehow a subjective scale. It is not."

In summary, then, we intended to address two questions with our experiment. (1) Is a difference in excursion size of 1.5 ST sufficient to create a difference in the perception of prominence? (2) Does an excursion size difference of a given ST interval create the same prominence impression in a higher register as it does in a lower?

We prepared our stimuli by means of analysis, manipulation and resynthesis of a number of recorded utterances. When increasing the size of an F0 excursion, while keeping its position as well as the segmental durations the same, as we did in our experiment, one inevitably increases the rate of change of the F0 movement. In an attempt to control for such concomitant variations in the rate of change, 't Hart (1981) had the F0 changes take place during voiceless segments in his speech stimuli. It is doubtful if such a precaution has any effect. As stressed by Hirst (1983), for example, an intonation contour that is interrupted by voiceless segments is functionally equivalent to the same intonation contour without such interruptions. We therefore allowed the rate of change to vary concomitantly with the F0 excursion, judging it less desirable either to adjust segmental durations or to alter the position of the F0 movement. Another inevitable, and less obvious, concomitant factor is the possible effect on the loudness of resynthesized utterances as a result of any spectral changes caused by the manipulation of F0.

0095-4470/85/030299+10 $03.00/0 © 1985 Academic Press Inc. (London) Ltd.
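The contrast drawn in the Introduction between the two measures can be made concrete with a small sketch. It assumes the standard equal-tempered definition of the semitone (12 semitones per octave); the function names are illustrative and do not come from the paper.

```python
import math


def interval_in_semitones(f_start_hz, f_end_hz):
    """Size of a pitch movement in semitones (12 semitones per octave)."""
    return 12.0 * math.log2(f_end_hz / f_start_hz)


def interval_in_hertz(f_start_hz, f_end_hz):
    """Size of the same pitch movement as a plain frequency difference."""
    return f_end_hz - f_start_hz


# The two jumps mentioned in the text: equal in semitones, unequal in Hertz.
for f_start, f_end in ((150.0, 300.0), (100.0, 200.0)):
    st = interval_in_semitones(f_start, f_end)
    hz = interval_in_hertz(f_start, f_end)
    print(f"{f_start:.0f} -> {f_end:.0f} Hz: {st:.1f} ST, {hz:.0f} Hz")
```

Both jumps span one octave, so the semitone measure treats them as identical, whereas the Hertz measure makes the higher one half as large again. The second research question asks which of these two descriptions listeners' prominence judgements follow.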


2. The experiment

2.1. Material

A trained female native speaker of Dutch produced four Dutch sentences, two of which contained one sentence accent and two of which contained two. All accented syllables contained a single sonorant onset consonant and a mid vowel, so that intrinsic pitch effects may be expected to be negligible. These sentences were:

(1) Ik NEEM het niet langer 'I don't acCEPT this any longer'
(2) Mijn idiOOM-tentamen gaat niet door 'My IDioms test has been cancelled'
(3) Dat RIJM is om het MOOI te maken 'That RHYME is to make it BEAUtiful'
(4) MarLEEN loopt nu al WEER met Leo 'MarLEEN is going out with Leo aGAIN'

All accents in the sentences were realized as rise-fall contours on a low base-line, which is the most neutral realization of a sentence accent in Dutch, corresponding to an English nuclear 'fall' ('t Hart & Collier, 1975). These utterances were recorded in a professional studio (tape speed 19 cm/s, distance from microphone 20 cm), digitalized (sampling rate 10 kHz) and analysed with the help of the ILS-package SIFT algorithm. The analysis window (10 ms) was shifted 5 ms for each output frame; the prediction order was 12, which is necessary to explain the formants in the 0-5 kHz band. Pre-emphasis was 0.95. The resultant pitch contours were checked by means of visual inspection and corrected for obvious octave jumps and voiced/unvoiced errors (cf. Van Rossum & Rietveld, 1984). Each rise-fall configuration (but not the rest of the contour) was subsequently stylized by replacing it with a 'pointed hat', a combination of a straight rise and a straight fall, linking up at the highest point of the original rise-fall configuration. Figure 1 gives the values for the beginning and end of the whole contour and of each straight-line stylization in semitones (ST). In addition to this semi-stylized experimental version of each of the four utterances, seven additional experimental versions were resynthesized.
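The 'pointed hat' stylization described above can be sketched as follows. This is a minimal illustration and not the authors' software: it assumes the contour is available as per-frame semitone values with strictly increasing frame times, and the function name, arguments and example values are hypothetical.

```python
def pointed_hat(times_ms, f0_st, start, end):
    """Replace frames start..end by a straight rise and a straight fall.

    The two lines meet at the highest point of the original rise-fall
    configuration; frames outside start..end are left untouched.
    """
    if not 0 <= start < end < len(f0_st):
        raise ValueError("start and end must delimit at least two frames")
    # Frame index of the highest point inside the configuration.
    peak = max(range(start, end + 1), key=lambda i: f0_st[i])
    stylized = list(f0_st)

    def draw_line(i0, i1):
        # Linear interpolation in time between the original values at i0 and i1.
        t0, t1 = times_ms[i0], times_ms[i1]
        for i in range(i0, i1 + 1):
            weight = (times_ms[i] - t0) / (t1 - t0)
            stylized[i] = f0_st[i0] + weight * (f0_st[i1] - f0_st[i0])

    if peak > start:
        draw_line(start, peak)  # straight rise
    if peak < end:
        draw_line(peak, end)    # straight fall
    return stylized


# Hypothetical contour sampled every 5 ms (the frame shift used in the text).
times = [0, 5, 10, 15, 20, 25, 30]
contour = [22.0, 22.0, 24.5, 28.0, 25.0, 22.5, 22.0]
print([round(v, 2) for v in pointed_hat(times, contour, 1, 5)])
```

Only the frames between the onset of the rise and the end of the fall are changed, which mirrors the statement that the rest of the contour was not stylized.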
In each utterance, the apex(es) of the pointed hat(s) were (1) increased by 1.5 ST, (2) increased by 3.0 ST and (3) decreased by 1.5 ST, yielding three additional versions. By lowering the entire F0 contour of all four versions of each utterance thus produced by 7 ST, four versions were obtained, which, apart from the lowered register, were exact copies of the other four. The synthesis quality of the lowered versions was informally observed to be equal to that of the original versions. We also observed that the difference in register created the impression that the two registers were produced by different speakers. The four versions in the (original) high register are referred to as 4.5+, 3.0+, 1.5+ (the original semi-stylized utterance) and 0.0+, while the four copies in the low register are referred to as 4.5-, 3.0-, 1.5- and 0.0-. Figure 2 illustrates these manipulations for one of the four utterances (the first given in Fig. 1).

Figure 1. Initial, peak and final values in semitones in schematic representation of the original contours, as realised by a female speaker [50 Hz = 0 ST (semitone)]. Semitone values are converted to Hertz values by 50 × 10^(ST/40), to give an indication of the corresponding Hz values: 22 ST = 177 Hz and 28 ST = 251 Hz.

Figure 2. Schematic representation of the eight stimuli produced on the basis of the utterance 'ik NEEM het niet langer'. The dashed version represents the original utterance.

2.2. The test tapes

All possible pairwise combinations of the eight versions of each utterance were randomized and recorded on magnetic tape. As combinations of equal versions were included as 'fillers', the number of stimulus pairs per sentence was 36 (n × (n + 1)/2). Separate test tapes were prepared for the sentences with one accent and the sentences with two accents, each tape containing 72 stimulus pairs (two sentences per tape). Moreover, in order to control for order effects, two versions of each test tape were produced, such that for every stimulus pair a-b on one tape, there was a pair b-a on the other. Thus, four test tapes were produced, each of which included six trial pairs. The silent interval between two stimuli in a pair was 1.5 s, while pauses between pairs were 5 s, long enough for judges to record a judgement.

2.3. Presentation of the test tape

Three groups of 30 untrained judges, all native speakers of Dutch, were recruited from the student population of the University of Nijmegen. The test tapes were presented to them through headphones in a language laboratory. The listeners gave their judgements on 5-point scales. The scale positions were defined as follows:

(2) I am certain that the accent in stimulus 2 is stronger than the accent in stimulus 1;
(1) I am fairly certain that the accent in stimulus 2 is stronger than the accent in stimulus 1;
(0) I judge both accents equally strong;
(-1) I am fairly certain that the accent in stimulus 1 is stronger than the accent in stimulus 2;
(-2) I am certain that the accent in stimulus 1 is stronger than the accent in stimulus 2.

One group was asked to judge the one-accent stimulus pairs, while the two-accent stimulus pairs were presented to two groups, the first judging the first accent, the second judging the second. In each group, 15 judges judged the stimuli in one order and 15 in the other. This procedure yielded six sets of difference scores for all comparisons of the eight peak configurations in the six experimental accents.
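The stimulus design of Section 2.2 can be checked with a short sketch. The version labels follow the text; everything else is an assumption made for illustration: the helper names, the placeholder sentence labels, the fixed random seed, and the choice to randomize across both sentences of a tape (the paper does not say how the randomization was organized, nor which member of a pair came first on which tape).

```python
import random
from itertools import combinations_with_replacement

EXCURSIONS = ["0.0", "1.5", "3.0", "4.5"]  # excursion labels used in the text
REGISTERS = ["+", "-"]                     # "+" original register, "-" lowered by 7 ST


def versions():
    """The eight version labels of one utterance, e.g. '1.5+' or '3.0-'."""
    return [exc + reg for reg in REGISTERS for exc in EXCURSIONS]


def stimulus_pairs(items):
    """All unordered pairs, identical 'filler' pairs included: n(n + 1)/2."""
    return list(combinations_with_replacement(items, 2))


def build_tape(sentences, seed=0):
    """Randomized list of (sentence, first, second) trials for one tape."""
    trials = [(s, a, b) for s in sentences for a, b in stimulus_pairs(versions())]
    random.Random(seed).shuffle(trials)  # seed is arbitrary, for repeatability
    return trials


def reverse_order(tape):
    """Companion tape: the same trials with the members of each pair swapped."""
    return [(s, b, a) for s, a, b in tape]


tape_a = build_tape(["sentence 1", "sentence 2"])  # placeholder sentence labels
tape_b = reverse_order(tape_a)
print(len(stimulus_pairs(versions())), len(tape_a), len(tape_b))
```

With eight versions the pair count is 8 × 9 / 2 = 36 per sentence, and two sentences per tape give the 72 stimulus pairs reported above. The six trial pairs mentioned in the text are not modelled here.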

3. Results and discussion

The aim of our analysis of the data was, first, to establish whether a peak-height difference of 1.5 ST is sufficient to create a perceptible difference in prominence, and, second, to establish whether the relative degrees of prominence perceived in each set of eight peak configurations corresponded with F0 ranges as measured in semitones or as measured in Hertz. To this end, an analysis of variance was performed on each of the six sets of difference scores. The particular analysis method chosen was one which was specifically developed for the analysis of difference scores (Scheffé, 1952). The analysis model is

$$X_{ijk} = (\alpha_i - \alpha_j) + \gamma_{ij} + \delta_{ij} + e_{ijk},$$


where $\alpha_i$ is a parameter characterizing the value of the ith object on a hypothesized scale underlying the judgements (the main effect for the ith object), $\gamma_{ij}$ is an interaction term, included to take account of the fact that object i is compared to object j rather than to any of the other objects, and $\delta_{ij}$ and $e_{ijk}$ express order effects and error respectively. Apart from yielding the usual analysis of variance table, the method also produces estimates of the scale values of the objects on a one-dimensional scale. In addition, by providing a 'yardstick', it enables the researcher to establish which scale values are significantly different, using a procedure analogous to that of the Multiple Range Tests.

In all analyses of variance there was a significant main effect (p < 0.01, df1,2 = 7, 784). This means that for all six accents, overall differences in prominence were observed. The hypothesis of 'subtractivity', i.e. the hypothesis that there is no interaction, had to be rejected for all six analyses. This means that the 'stronger prominence' of accent i compared to accent j is, statistically speaking, only true in an average sense when i and j are compared with the m - 2 other accents as well as with each other. In order to check the results obtained by means of Scheffé's analyses, we also applied Thurstone's one-dimensional scaling technique (also called 'Law of Comparative Judgement', cf. Torgerson, 1967) to a binary transformation of our data (Case V). In only three cases did we find that objects had a reversed order on the Thurstone scales; in these three cases the scale values were not significantly different on the Scheffé scale.

Figure 3 gives the prominence scales and the scale values of the eight peak configurations for each accent. The double-headed arrows indicate that the two scale positions concerned are not significantly different at the 5% level, using the 'Yardstick' method. To give an example of how the figure should be read: stimulus 1.5+ of Mijn idiOOM-tentamen gaat niet door has a scale value of 0.13; the size of the upward slope of the pointed hat on -OOM is 66 Hz; Pearson's r between scale values and ST values is 0.86, and that between scale values and Hz values 0.98*.

Inspection of the scales reveals that, within a register, the four F0 ranges result in significantly different scale values in each case (p < 0.01, df = 134). Moreover, the rank order 0, 1.5, 3, 4.5 ST is preserved in every case. This finding warrants the conclusion that a difference of 1.5 ST is sufficient to create an impression of a difference in prominence, provided the contour within which the difference is embedded is kept constant. However, when a comparison is made between the lower and higher registers, differences of 1.5 ST no longer correspond to differences in perceived prominence. For example, stimulus 0.0+ appears to create a prominence impression similar to that of stimulus 1.5-. By contrast, when we compare the scale values with the excursion sizes of the rises expressed in Hertz, we observe a closer correspondence to the scale values. The Pearson's correlation coefficients given to the right of every scale confirm this impression: the one for the Hertz values is higher than the one for the semitone values in every case except the last.

*By taking the excursion size of the rise of the 'pointed hat' as a measure of the F0 excursion, we do not intend to suggest that this is in any way a canonical Hertz measure, to be used, for example, in pitch scaling. If we had taken the excursion size of the fall, or of the distance between the peak and the end of the contour, we would have obtained equally high correlations between scale values and Hertz values.

Figure 3. Scale values of the eight peak configurations for six accents, with semitone values (ST) above the scale and Hertz values below the scale. For each scale Pearson's correlation coefficients are given between scale values and semitone values and between scale values and Hertz values. The double-headed arrows indicate that the two scale positions concerned are not significantly different.

4. Conclusion

Our results indicate that fairly small differences in pitch excursion can lead to differences in the perception of prominence, a finding which is relevant to research into peak scaling as well as to investigations into the phonetic correlates of phonological 'degrees of stress' in word groups and sentences (Rietveld, 1984). The results should not, of course, be taken to suggest that an interval of 1.5 ST is a linguistically significant 'unit' of some kind. While the linguistic significance of excursion size per se - 'pitch range' (Crystal, 1969), 'prominence' (Liberman & Pierrehumbert, 1984), 'range' (Gussenhoven, 1984) - is generally acknowledged, the question whether this dimension has categorically distinct levels or is 'gradient' (Bolinger, 1961) appears to be open.

Our results also indicate that the prominence judgements made by our listeners showed better agreement with a Hertz scale than with a semitone scale. It would, however, be premature to conclude from this that (re)synthesis programmes should specify accent-lending F0 excursions in Hertz rather than in semitones. It is conceivable that our results would have been different if we had increased, rather than decreased, our original utterances by 7 ST, as the differences between the two scales would have been larger. Further research would appear to be called for.
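The closing remark about raising rather than lowering the register can be illustrated numerically. The sketch below uses the semitone-to-Hertz conversion given in the caption of Figure 1 (0 ST = 50 Hz, Hz = 50 × 10^(ST/40)). The onset value and the excursion sizes are hypothetical choices for illustration, not measurements from the experiment, and the function names are not from the paper.

```python
def st_to_hz(st, ref_hz=50.0):
    """Semitones re 50 Hz to Hertz, using the conversion given for Figure 1."""
    return ref_hz * 10 ** (st / 40.0)


def rise_in_hz(onset_st, excursion_st):
    """Hertz size of a rise that starts at onset_st and spans excursion_st."""
    return st_to_hz(onset_st + excursion_st) - st_to_hz(onset_st)


ONSET_ST = 22.0  # hypothetical onset of the rise, not a value from the paper

for shift in (-7.0, 0.0, 7.0):  # lowered, original and raised register
    sizes = [round(rise_in_hz(ONSET_ST + shift, exc), 1) for exc in (1.5, 3.0, 4.5)]
    print(f"register shift {shift:+.0f} ST:", sizes)
```

Under these assumptions a given semitone excursion corresponds to fewer Hertz in the lowered register and to more Hertz in a raised one, which is why the two scales make different predictions once registers are compared.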

We should like to thank Bob Ladd, Mark Liberman and an anonymous reviewer for their helpful comments on an earlier version of this article.

References

Bolinger, D. L. (1961). Generality, Gradience and the All-or-none. The Hague: Mouton.
Breckenridge, J. & Liberman, M. Y. (1977). The declination effect in perception. Bell Laboratories Technical Memorandum. Murray Hill: Bell Laboratories.
Cohen, A., Collier, R. & 't Hart, J. (1982). Declination: construct or intrinsic feature of speech pitch? Phonetica, 39, 125-173.
Crystal, D. (1969). Prosodic Systems and Intonation in English. Cambridge: Cambridge University Press.
Cutler, A. & Ladd, D. R. (editors) (1983). Prosody: Models and Measurements. Berlin, Heidelberg, New York, Tokyo: Springer.
Gussenhoven, C. (1984). On the Grammar and Semantics of Sentence Accents. Dordrecht, Cinnaminson: Foris Publications.
't Hart, J. (1981). Differential sensitivity to pitch distance, particularly in speech, Journal of the Acoustical Society of America, 67, 811-821.
't Hart, J. & Collier, R. (1975). Integrating different levels of intonation analysis, Journal of Phonetics, 3, 235-255.
Hirst, D. (1983). Structure and categories in prosodic representations. In Prosody: Models and Measurements (A. Cutler & D. R. Ladd, editors), pp. 94-109. Berlin: Springer.
Ladd, D. R. (1983). Peak features and overall slope. In Prosody: Models and Measurements (A. Cutler & D. R. Ladd, editors), pp. 39-52. Berlin: Springer.
Liberman, M. & Pierrehumbert, J. (1984). Intonational invariance under changes in pitch range and length. In Language Sound Structure: Studies in Phonology Presented to Morris Halle by his Teacher and Students (M. Aronoff & R. T. Oehrle), pp. 157-233. Cambridge, Massachusetts: M.I.T. Press.
Pierrehumbert, J. (1979). The perception of fundamental frequency declination, Journal of the Acoustical Society of America, 66, 363-369.
Rietveld, A. C. M. (1984). Gradations in pitch accents? In Proceedings of the Tenth International Congress of Phonetic Sciences (M. P. R. van den Broecke & A. Cohen, editors), pp. 574-579. Dordrecht, Cinnaminson: Foris Publications.
Scheffé, H. (1952). An analysis of variance for paired comparisons, Journal of the American Statistical Association, 47, 381-400.
Stevens, S. S. (1975). Psychophysics: Introduction to its Perceptual, Neural and Social Prospects. New York: John Wiley.
Thorsen, N. (1980). A study of the perception of sentence intonation - evidence from Danish, Journal of the Acoustical Society of America, 67, 1014-1030.
Torgerson, W. S. (1967). Theory and Methods of Scaling. New York: John Wiley.
Van Rossum, N. J. T. M. & Rietveld, A. C. M. (1984). A perceptual evaluation of V/U detectors, Speech Communication, 3, 151-156.

Appendix

Raw data for all comparisons are given on the following two pages, in order for researchers to be able to subject them to other analyses.

[Raw data tables: not recoverable from the scanned source.]