Yshai Kalmanovitch (a)

The role of longterm acquaintances in speech accommodation

Abstract

Studies of speech accommodation almost exclusively observe accommodation processes between newly acquainted speakers. Although interactional experience over time has been found to increase inter-speaker convergence, studies of interpersonal speech accommodation seldom look at long-term effects on accommodation patterns. The case study reported here compares convergence levels between speakers in three interactional contexts, reflecting long-term, short-term and zero accommodation. Long-term accommodation shows an advantage over the other conditions in presenting consistent and robust evidence for phonetic convergence between interlocutors with regard to global phonetic features characterizing their speech.

Keywords Phonetic convergence, accommodation theory

(a) Corresponding author: [email protected] Phonetics Laboratory, Department of Comparative Linguistics, University of Zurich, Plattenstrasse 54, 8032 Zürich, Switzerland

1.0 Background

Speech Accommodation Theory, developed by Giles, Coupland and others since the end of the 1970s (Giles et al., 1991), describes temporary shifts in interlocutors’ speech during interaction, in which speakers converge towards or diverge from each other. This phenomenon is often described as reflecting social solidarity or distance between interlocutors (Giles et al., 1991; Pardo, 2006). As such, it is described as a dynamic process that displays changing trends and intensities within the same interaction, motivated by changes in social attitude (Bourhis & Giles, 1977), communicative role and status (Pardo, 2006), etc. Other studies show shifts towards phonetic convergence to be a more automatic process, independent of interlocutors’ conscious communicative intentions. Goldinger (1998) shows that, in an AXB subjective evaluation experiment, shadowers are perceived as converging towards the target stimuli despite the lack of any communicative intention in the experimental design. Lewandowski (2011) describes her two experimenters converging towards her subjects despite clear instructions to avoid such convergence. Yet, while evidence for convergence between interlocutors on most linguistic levels is very convincing (Pickering & Garrod, 2004), studies of phonetic convergence, though generally consistent, are less conclusive. Observations based on subjective evaluation experiments frequently only just reach significance and are often not supported by acoustic analyses (Pardo, 2006; Bulatov, 2009). Other studies show inconsistent accommodation patterns, with convergence in some features and divergence in others (Giles et al., 1991). Coupland, in one of the very few studies of speech accommodation in natural discourse, questions whether speakers react to the objective features of other interlocutors’ speech or rather converge to their own subjective expectations of such features (Coupland, 1984, p. 65).
Indeed, Bell (1991) observed radio moderators accommodating their speech towards their virtual addressees, which further supports Coupland’s doubts. Although phonetic convergence is often assumed to occur quite rapidly, studies indicate that it intensifies with time and experience. Speakers seem to converge more in later stages of an interaction than in earlier stages (Pardo, 2006) or after several trials (Goldinger, 1998; Babel, 2009). Yet there are hardly any studies on the effect of long-term interactional experience on speech accommodation in interpersonal interaction. One exception is Pardo et al. (2012), who studied speech accommodation in pairs of student room-mates over the year since
they started sharing an apartment. However, as the data were obtained in a reading task rather than in a task involving direct interaction, their findings may not necessarily reflect speech accommodation patterns within the pairs, but more likely long-term accommodation patterns of a more general nature, resulting from the subjects’ integration into their new common social surroundings and their linguistic adaptation to it (Pardo et al., 2012, p. 196). In the case study reported here, the speech patterns of four speakers long acquainted with each other were compared across conditions representing different degrees of social motivation and interactional experience with other interlocutors. The basic hypothesis underlying this study is that speakers do constantly accommodate their speech in different communicative contexts, but that convergence towards the objective speech characteristics of other interlocutors requires long interactional experience with the target speaker. The study thus makes a clear distinction between speech accommodation on the one hand, aimed primarily at expressing communicative intentions by signalling shifts in speech patterns but not necessarily converging towards the objective features presented by the addressee (Selting, 1985), and, on the other hand, a long-term automatic process of phonetic convergence, which is motivated by a cognitive need for homogeneity of the perceptual environment.

2.0 Method

Four female native speakers of German in their mid-20s volunteered for this study. The four (referred to as AK, EC, LG and MB) have known each other for several years and were studying and working together in Basel (CH). MB was the newest member of the group; the other three speakers had studied together for 3 years in Reutlingen (DE) prior to their arrival in Basel. The four speakers took part in two group sessions and further individual sessions to obtain data under three experimental conditions reflecting different degrees of social motivation and/or of familiarity with and exposure to other interlocutors’ speech:

Long-term accommodation (LTA): in two group interactions of 25 and 15 minutes (GI and GII respectively), recorded one year apart, the speakers were engaged in a free conversation with each other, lightly moderated by the experimenter. Questions from the experimenter were
mostly directed at the group. In a few cases an individual speaker was addressed directly if she seemed too passive. This condition reflects a high level of both social motivation and exposure of the speakers to each other’s speech characteristics.

Short-term accommodation (STA): in short individual interviews of 7 to 10 minutes (INT), recorded in the same session as GII, subjects were engaged in an informal conversation with the experimenter. This condition still reflects a high level of social motivation, but a very low level of exposure to the interlocutor’s speech characteristics. Furthermore, as will be discussed later, speech contributions from the experimenter were minimized in order to examine Coupland’s hypothesis that speech accommodation in interaction is based not on the objective characteristics of interlocutors’ speech but on subjective assumptions about them.

Zero accommodation (ZA): for this condition speakers were required to read out an unfamiliar text on a scientific theme (TXT). The data recorded in the same session as GII and INT were found to be quantitatively and qualitatively insufficient. Hence the task was recorded again in the following weeks. Regrettably, EC was no longer available at that time. The text was chosen to be as cognitively demanding as possible, to limit any potential social and communicative considerations in the task (see Bell, 1991). Thus, this condition reflects a low level of social motivation and a low level of exposure to other interlocutors’ speech.

Assuming that those two factors indeed motivate convergence between interlocutors, any condition including those factors is expected to show more convergence between speakers, provided that the differences between interlocutors are large enough to justify convergence. All recordings were made in the speakers’ natural surroundings using a ZOOM2 field recorder. The data were processed with Praat (Boersma & Weenink, 2010).
With the exception of TXT, special attention was given in the current study to obtaining data from natural speech. Unlike in former studies (e.g. Pardo, 2006), no attempts were made to manipulate speakers into using primed lexical items. Thus, rather than comparing identical items between speakers and conditions, global phonetic and phonological features were observed. Observing speech in natural discourse was intended to increase the generality and validity of the observed phenomenon. Goldinger (1998) found that convergence in shadowing has a temporally limited effect, which disappears in delayed shadowing (but see Pardo, 2006; Goldinger & Azuma, 2004).
Thus, experiments based on shadowing procedures – either explicit (Goldinger, 1998) or implicit (Pardo, 2006) – may indicate a local, temporally limited phenomenon rather than a general characteristic of verbal interaction in the more complex settings outside the laboratory (Goldinger, 1998, p. 268; Pickering & Garrod, 2004, p. 169). It is nevertheless important to consider how factors irrelevant to the observed phenomenon may influence the interpretation of the findings. When investigating global phonetic characteristics, the lack of control for lexical homogeneity may result in phonetic variability that reflects within-speaker variability conditioned by the different phonetic environments in different lexical items rather than shifts towards other interlocutors. While this point is valid as a possible explanation of within-speaker differences between the conditions, it is not likely to account for the qualitative predictions underlying this study. Since in TXT all subjects are faced with an identical text, greater homogeneity between speakers may be expected in this condition. This, however, would contradict the main hypothesis of the study. Phonetic shifts may also reflect different styles or registers used for pragmatic reasons. In fact, a widespread assumption predicts that contexts perceived as more formal and distanced (such as reading a text) will call for more standard varieties, and vice versa (Koch & Oesterreicher, 1994). Consequently, one should assume that GI and GII would show the greatest between-speaker variability, while in TXT a standard variety would be chosen, resulting in more convergence between speakers. Once more, however, this assumption runs completely opposite to the predictions underlying the current study. A problem of a more practical nature was the very unbalanced dataset, with great quantitative differences between the tokens provided by the different speakers within and between conditions.
AK and MB supplied most of the data, and LG the least. TXT and GI provided considerably more data than GII and INT. This had to be kept in mind in the quantitative analysis. As a general rule, a feature was included in the analysis only if each speaker supplied at least 8 tokens of it in each condition. Consequently, some features had to be excluded from the analysis.

3.0 Predictions and analysis

Two main predictions were investigated:

1. It is predicted that LTA will consistently show a higher level of convergence compared to the other conditions, in particular compared to ZA, with GII possibly showing more convergence than GI.
2. It is predicted that speech accommodation between STA and ZA will be characterized by mixed speech-shifting patterns with occasional convergence.

For this purpose, the overall distances between the speakers with respect to each observed feature in each condition were calculated (GI and GII were calculated separately). The analysis was thus simplified to include the overall distances as the dependent variable and the experimental condition as the independent variable. This simplification also helped to overcome some of the difficulties arising from the very unbalanced dataset. Since EC was not available for TXT, her data were excluded. There was, however, no indication that her exclusion might have affected the general trends observed. It should be emphasized that the sole subject of the analysis was the effect of the individual shifts on the relative level of convergence between speakers in the different conditions, and not the nature of the shifts themselves. Thus, psycho- and sociolinguistic questions (e.g. whether shifts were towards or away from the standard variety, or which speaker converged to which speaker) were ignored in the analysis, and could not be detected by it in any case. It should further be noted that the qualitative prediction about the well-acquainted speakers converging in LTA was based on the assumption that such convergence is automatic and serves to facilitate speech processing in interaction by arriving at relative homogeneity of the immediate perceptual environment (Pickering & Garrod, 2013). As such, it was enough to consider only the two most distant speakers in each condition to represent the overall distances.
For the approximately normally distributed variables, this was evaluated by calculating the difference distribution between the two samples provided by the observed speakers, which was then submitted to a one-way independent ANOVA based on the Brown-Forsythe formula to correct for the unbalanced data, with planned orthogonal contrasts (GI and GII vs. TXT, INT vs. TXT and GI vs. GII) to investigate the nature of the effect. For the investigation of the voicing realization of consonants, in which the dependent variable was categorical, the overall distance was characterized by the odds ratio between the two most diverging speakers in each condition.
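
The odds-ratio distance for the categorical voicing variable can be illustrated with a minimal sketch. The function names and the token counts below are hypothetical and are not taken from the study. The sketch also assumes that the ratio is oriented with the smaller odds in the numerator, so that it lies between 0 and 1 and equals 1 when both speakers show identical odds of a voiced realization.

```python
def odds(voiced: int, total: int) -> float:
    """Odds of a voiced realization: voiced tokens over non-voiced tokens."""
    unvoiced = total - voiced
    if voiced <= 0 or unvoiced <= 0:
        # The odds are zero or infinite when a speaker is at 0% or 100% voiced.
        raise ValueError("odds are undefined for 0% or 100% voiced tokens")
    return voiced / unvoiced


def convergence_odds_ratio(voiced_a: int, total_a: int,
                           voiced_b: int, total_b: int) -> float:
    """Odds ratio between two speakers, oriented to lie in (0, 1].

    Assumption: the smaller odds are divided by the larger odds,
    so that 1 means identical odds (total convergence).
    """
    odds_a = odds(voiced_a, total_a)
    odds_b = odds(voiced_b, total_b)
    return min(odds_a, odds_b) / max(odds_a, odds_b)


# Hypothetical counts: speaker A voices 5 of 20 tokens, speaker B 15 of 20.
print(convergence_odds_ratio(5, 20, 15, 20))
```

A value close to 0 then indicates that the two most diverging speakers differ strongly in how often they voice the consonant, while a value close to 1 indicates convergence.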

Considering the exclusion of the experimenter’s speech data (which were reduced and hence insufficient for any analysis), a seemingly problematic question is the validity of comparisons of the distances between the speakers in INT. As mentioned before, Coupland (1984, p. 65) suggests that unfamiliar interlocutors accommodate not towards what they objectively perceive to be the other interlocutor’s speech characteristics, but instead towards their subjective assumptions about those features, based on former interactional experience with speakers sharing the same background as the current interlocutor. In other words, such patterns are the result of an educated guess, leading to convergence when the accommodating party guesses right and to divergence when the guess is wrong. If Coupland’s doubts are justified, then direct comparison of the four speakers should still produce mixed patterns of accommodation with occasional convergence, reflecting similarities or differences between the assumptions made by the four speakers about the experimenter’s speech, based on their common knowledge of him (e.g. that he is a non-native speaker of German or that he studies German linguistics at the university). One should, however, not misinterpret this radical formulation of Coupland’s doubts. It is not argued that any observed pattern of convergence in past studies is pure coincidence. Speakers’ guesses about others’ speech are often very educated. Thus, if I meet a female speaker from Newcastle, I may assume in advance that she speaks English and that her voice is higher than mine, and I may also expect some salient regional characteristics. Researchers are also aware of such knowledge, which is why they find sociolinguistic information so relevant in their studies and why, based on this information, they often concentrate on specific salient characteristics, thus increasing their chances of finding converging patterns.

4.0 Results

4.1 Voicing

The voiced realization of Standard German /b, d, g, z, v/ shows great variability, mainly regionally distributed, with the feature “voiced” often replaced by the feature “lenis”, primarily in southern parts of Germany and in Switzerland (Hove & Haas, 2009). As the recordings were often quite noisy, an auditory judgment was initially used to determine the voiced realization of the observed tokens. A closer look at the spectral peaks corresponding to the harmonics expected in a voiced realization was taken as a more objective means of defining
voicing in the dental fricative /z/. Figure 1 summarizes the results for Standard German /z/.

Figure 1: Voiced realization of Standard German /z/ (y-axis: % voiced realization, 0 to 100; x-axis: conditions GI, GII, INT, TXT; one series per speaker: AK, EC, LG, MB)

The overall distances were calculated by computing the odds ratios between the two most diverging speakers (excluding EC) in each condition, giving odds ratios of 0.67 for GI, 0.53 for GII, 0.12 for INT and 0.05 for TXT (where 1 stands for total convergence). These were weighted to balance the different proportional contributions of each speaker in the individual conditions and were then entered into a 2 × 4 contingency table (the two speakers observed in each condition over the four conditions). A G-test for goodness of fit was highly significant (χ²(3) = 38, p < .001, N = 347). Post hoc tests showed that, compared to TXT, convergence was highly significant for GI (χ²(1) = 112, p < .001, N = 243) and GII (χ²(1) = 12.4, p < .001, N = 161). The comparison between INT and TXT was not significant (χ²(1) = 1.11, ns., N = 166), nor was the comparison between GI and GII (χ²(1) = 0.19, ns., N = 181). Standard German /g/ showed very similar patterns, but as the analysis was based on auditory judgment only, it will not be elaborated further here. The same is true for /d/, which showed almost perfect convergence for INT but no differences in levels of convergence between the other conditions. For /b/ and /v/ not enough tokens were collected.
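
As an illustration of the kind of test reported above, the following sketch computes a G statistic (log-likelihood ratio) for a contingency table. It is a generic textbook formulation, not the author’s actual procedure: the weighting step described in the text is not reproduced, the function name `g_test` is an assumption, and the example counts are hypothetical.

```python
import math


def g_test(table):
    """Return (G, df) for a contingency table given as a list of rows.

    G = 2 * sum(O * ln(O / E)), where E is the expected count under
    independence of rows and columns. Cells with O == 0 contribute 0.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if observed > 0:
                g += observed * math.log(observed / expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return 2.0 * g, df


# Hypothetical 2 x 4 table: rows are two speakers, columns are the
# conditions GI, GII, INT, TXT; cells are counts of voiced tokens.
example = [[30, 12, 4, 2],
           [35, 15, 20, 40]]
g, df = g_test(example)
print(g, df)
```

For a 2 × 4 table the statistic has 3 degrees of freedom, so it would be compared against the chi-square critical value of 7.815 at the .05 level.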

4.2 Vowel formants

After normalizing F1 and F2 to the Bark scale (Traunmüller, 1990), the Euclidean distances between the vowels as realized by the individual speakers in the different conditions were calculated (Babel, 2009). Table 1 gives the overall distances with the goodness-of-fit measure. Brown-Forsythe corrected degrees of freedom are quoted.

Table 1: Overall distances of vowel realizations

Vowel   GI     GII    INT    TXT    df*      FBF      p-value
/a/     0.74   0.34   2.18   1.67   3, 285   19.032   < .001
/aː/    0.74   0.38   1.82   1.81   3, 188   11.495   < .001
/i/     0.93   0.82   0.44   0.66   3, 335   1.95     = .06
/iː/    1.03   0.91   0.81   0.81   3, 262   0.66     ns.
/e/     0.74   0.79   0.91   0.9    3, 130   0.25     ns.
/o/     0.43   0.31   0.89   0.84   3, 136   1.34     ns.
/oː/    0.39   0.82   0.01   0.85   3, 128   2.26     sig.

Notes: Distances are given in Bark; tests were one-tailed; for /o/ and /oː/ only AK and MB supplied sufficient data for the analysis.
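
The distance measure used in Table 1 can be sketched as follows. The conversion uses the basic formula from Traunmüller (1990), z = 26.81 f / (1960 + f) − 0.53, without the end corrections for very low and very high values. Treating each vowel as a single (F1, F2) point, for example a speaker mean, is an assumption made here for illustration, as are the function names and the formant values.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Convert a frequency in Hz to Bark (Traunmueller 1990, basic formula)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_distance(vowel_a, vowel_b) -> float:
    """Euclidean distance in the F1-F2 Bark plane.

    Each vowel is an (F1, F2) pair in Hz; both formants are converted
    to Bark before the distance is taken.
    """
    f1_a, f2_a = (hz_to_bark(f) for f in vowel_a)
    f1_b, f2_b = (hz_to_bark(f) for f in vowel_b)
    return math.hypot(f1_a - f1_b, f2_a - f2_b)


# Hypothetical (F1, F2) values in Hz for /a/ as produced by two speakers.
speaker_1 = (850.0, 1400.0)
speaker_2 = (780.0, 1550.0)
print(bark_distance(speaker_1, speaker_2))
```

Smaller values correspond to greater convergence between the two speakers for that vowel.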

A highly significant effect was found in both /a/ and /aː/. The planned contrasts confirmed that convergence in GI and GII was significantly higher than in TXT for both /a/ (t(285) = 4.325, p < .001) and /aː/ (t(188) = 3.989, p < .001). GII shows greater convergence than GI (a pattern that repeats in most other vowels), but this difference was not significant (for /a/: t(285) = 0.925, ns.; for /aː/: t(188) = 0.677, ns.). Nor was the divergence in INT compared to TXT, especially noticeable for /a/, significant (t(285) = 0.925, ns., and t(285) = 1.008, ns., respectively). A significant effect was also found in /oː/. However, significance was detected neither for the convergence patterns of GI and GII compared to TXT (t(128) = 0.699, ns.) nor for that of INT compared to TXT (t(128) = 1.322, ns.),
nor for the divergence pattern of GII compared to GI (t(128) = 0.715, ns.). The marginally significant effect found in the realization of /i/, in which divergence was highest for GI and GII and convergence was highest for INT, was also investigated further, but again none of the contrasts was significant.

4.3 Voice Onset Time (VOT)

VOT was measured for Standard German voiceless stops in prevocalic position in stressed syllables. Unfortunately, the data were for the most part quantitatively insufficient. Hence only the distances between AK and MB with respect to Standard German /t/ were investigated, despite MB supplying only 6 tokens in GII. The overall distances showed relative convergence in GII and INT (6 ms (34) and 4 ms (54) respectively) compared to GI and TXT (26 ms (44) and 17 ms (61) respectively). The ANOVA failed to detect any significant effect (FBF(3, 54.2) = 0.37, ns.). Post hoc tests were nevertheless conducted and found the convergence level in GII to be significantly higher than in GI (t(42) = 2.78, p < .01). With an alpha level of .0167 to correct for family-wise error, the convergence patterns of GII compared to TXT (t(39) = 1.78, p = .04) and INT compared to TXT (t(42) = 1.84, p = .037) were found to be only marginally significant. The reliability of these results is admittedly questionable. Nevertheless, they conform to the general trend in the analysis.
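
The corrected alpha level of .0167 corresponds to a Bonferroni correction of a family-wise alpha of .05 over three comparisons, which can be checked with a short sketch. The family-wise alpha of .05 and the function name are assumptions made for illustration. The first p-value is reported in the text only as p < .01, so .01 is used below as an upper bound.

```python
def bonferroni(p_values: dict, family_alpha: float = 0.05):
    """Return the per-test alpha and a significance decision per contrast."""
    alpha = family_alpha / len(p_values)
    decisions = {name: p < alpha for name, p in p_values.items()}
    return alpha, decisions


# p-values of the three post hoc contrasts as reported in the text;
# 0.01 is an upper bound standing in for the reported "p < .01".
contrasts = {
    "GII vs. GI": 0.01,
    "GII vs. TXT": 0.04,
    "INT vs. TXT": 0.037,
}
alpha, decisions = bonferroni(contrasts)
print(round(alpha, 4))
for name, significant in decisions.items():
    print(name, "significant" if significant else "not significant")
```

Under this correction only the GII vs. GI contrast passes the threshold, which matches the description of the other two contrasts as marginal.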

5.0 Discussion and conclusions

As predicted, in LTA speakers consistently reached a higher level of convergence than in ZA. This is reflected in significantly greater convergence in both GI and GII where the distances between the speakers were wide in TXT, and in relative stability where those distances were smaller. In no case did speakers reliably diverge in LTA compared to ZA. It was also predicted that within LTA, GII might show more convergence than GI. While a general pattern in this direction could often be observed, it was not found to be reliable. The marginally significant convergence between AK and MB in the VOT analysis should be treated with great caution due to the quantitative problems mentioned above. It is noteworthy that, with regard to voicing, the degree of convergence stayed stable despite the general tendency towards voicing reduction in GII, which may be explained by the linguistic integration of the group during the one-year period between GI and GII into the
Swiss linguistic surroundings, which favor the non-voiced realization. This kind of stability is in many ways predicted by Pickering and Garrod (2004) as the result of an alignment process between interlocutors over time, leading to the emergence of a complicated and comprehensive priming mechanism in which aligned conceptual and behavioral levels reflect and prime each other in interaction. A second prediction was that the shifts of the individual speakers between STA and ZA would result in a mixed pattern of convergence and divergence between the speakers, similar to patterns observed in former studies. This was expected despite the lack of input about the target speaker’s speech and without direct comparison with his objective speech characteristics, which in the past were the basis for interpreting such results as reactions to objectively perceived speech characteristics of other interlocutors. Such a pattern could indeed be observed, e.g. in vowel quality, where /a/ showed very noticeable divergence between the speakers, while in /oː/ speakers converged practically completely. Admittedly, apart from the problematic instance of marginally significant convergence in the VOT analysis for /t/, none of the other patterns reached significance, which may be at least partly explained by the quantitative difficulties mentioned before and the conservative procedures needed to deal with them. Nevertheless, the pattern does seem to further support Coupland’s doubts about the objective motivation of speech accommodation between speakers in short-term accommodation. Thus, there seems to be a qualitative difference between two – possibly complementary – strategies underlying short-term and long-term accommodation.
Speakers may indeed begin a long-term process of converging towards other interlocutors, “experimenting” with their subjective assumptions and correcting them, possibly using a sensorimotor adaptation mechanism (Pickering & Garrod, 2013; Houde & Jordan, 2002), until convergence is sufficient. Support for the assumption that interlocutors “experiment” with their speech when lacking interactional experience may be found in the great variance in the vowel distances in STA, reflecting a much greater range exploited for the realization of the different vowels. The case study reported here is a novel attempt to look into the effect of long-term acquaintance on speech accommodation in interpersonal interaction. Admittedly, however, it also has flaws that must be addressed in future studies. Some of those problems – such as the lack of control for the quality and quantity of the data – cannot be completely resolved when investigating language in natural discourse, but the difficulties connected with them may be reduced by collecting much larger samples. For other problems – such as the
need for pragmatic consistency – better solutions can be found in the design itself. Nevertheless, this study shows the potential that this line of research may hold for a better understanding of the nature of speech accommodation in, and as a consequence of, interpersonal interaction.

Acknowledgements

This report is based on a study I conducted for my MA thesis at the University of Zurich. I would like to thank both my supervisors, Elvira Glaser from the Department of German Linguistics and Stephan Schmid from the Phonetics Laboratory, for their patience and the freedom they allowed me in developing my theses. I would also like to thank Volker Dellwo from the Phonetics Laboratory for his technical assistance. I owe special gratitude to the four anonymous speakers, and especially to AK for her organizational help.

References

Babel, M. (2009). Phonetic and social selectivity in speech accommodation. Doctoral dissertation, University of California, Berkeley.
Bell, A. (1991). Audience accommodation in the mass media. In: H. Giles et al. (Eds.), Contexts of accommodation (pp. 69–102). Cambridge: Cambridge University Press.
Boersma, P., & Weenink, D. (2010). Praat: Doing phonetics by computer. Version 5.2.01. www.praat.org.
Bourhis, R. Y., & Giles, H. (1977). The language of intergroup distinctiveness. In: H. Giles (Ed.), Language, ethnicity and intergroup relations (pp. 119–35). London: Academic Press.
Bulatov, D. (2009). The effect of fundamental frequency on phonetic convergence. UC Berkeley Phonology Lab Annual Report, 2009, 404–434.
Coupland, N. (1984). Accommodation at work: Some phonological data and their implications. International Journal of the Sociology of Language, 46, 49–70.
Giles, H., Coupland, N., & Coupland, J. (1991). Accommodation theory: Communication, context and consequences. In: H. Giles et al. (Eds.), Contexts of accommodation (pp. 1–68). Cambridge: Cambridge University Press.
Goldinger, S. D. (1998). Echoes of echoes? An episodic theory of lexical access. Psychological Review, 105(2), 251–279.
Goldinger, S. D., & Azuma, T. (2004). Episodic memory in printed word naming. Psychonomic Bulletin and Review, 11, 716–722.
Houde, J. F., & Jordan, M. L. (2002). Sensorimotor adaptation of speech I: Compensation and adaptation. Journal of Speech, Language, and Hearing Research, 45, 295–310.
Hove, I., & Haas, W. (2009). Die Standardaussprache in der deutschsprachigen Schweiz. In: E. M. Krech, E. Stock, U. Hirschfeld, & L. C. Anders (Eds.), Deutsches Aussprachewörterbuch (pp. 259–277). Berlin: de Gruyter.
Koch, P., & Oesterreicher, W. (1994). Schriftlichkeit und Sprache. In: H. Günther, & O. Ludwig (Eds.), Schrift und Schriftlichkeit: Ein interdisziplinäres Handbuch internationaler Forschung (pp. 587–604). Berlin: de Gruyter.
Lewandowski, N. (2011). Talent in nonnative phonetic convergence. Doctoral dissertation, University of Stuttgart.
Pardo, J. S. (2006). On phonetic convergence during conversational interaction. Journal of the Acoustical Society of America, 119, 2382–2393.
Pardo, J. S., Gibbons, R., Suppes, A., & Krauss, R. M. (2012). Phonetic convergence in college roommates. Journal of Phonetics, 40, 190–197.
Pickering, M. J., & Garrod, S. (2004). Toward a mechanistic psychology of dialogue. The Behavioral and Brain Sciences, 27, 169–190.
Pickering, M. J., & Garrod, S. (2013). An integrated theory of language production and comprehension. The Behavioral and Brain Sciences, 36, 329–347.
Selting, M. (1985). Levels of style-shifting: Exemplified in the interaction strategies of a moderator in a listener participation programme. Journal of Pragmatics, 9, 179–197.
Traunmüller, H. (1990). Analytical expressions for the tonotopical sensory scale. Journal of the Acoustical Society of America, 88(1), 97–100.