Processing prosodic boundaries in natural and hummed speech. An fMRI study
Anja K. Ischebeck (1,2), Angela D. Friederici (2), Kai Alter (2,3)
(1) Innsbruck Medical University, Clinical Department of Neurology, Austria
(2) Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
(3) Newcastle University Medical School, Newcastle upon Tyne, U.K.
Corresponding author: Anja Ischebeck, PhD Innsbruck Medical University Anichstrasse 35 6020 – Innsbruck (Austria) tel.: +43 512 504 23661 e-mail: [email protected]
Abstract
Speech contains prosodic cues such as pauses between different phrases of a sentence. These intonational phrase boundaries (IPBs) elicit a specific component in ERP studies, the so-called closure positive shift (CPS). The aim of the present fMRI study is to identify the neural correlates of this prosody-related component in sentences containing segmental and prosodic information (natural speech) and in hummed sentences containing only prosodic information. In both natural and hummed speech, sentences with two IPBs activated the middle superior temporal gyrus, the Rolandic operculum and the gyrus of Heschl more strongly than sentences with one IPB. The results from a region of interest (ROI) analysis of auditory cortex and auditory association areas suggest that the posterior Rolandic operculum, in particular, supports the processing of prosodic information. A comparison of natural speech and hummed sentences revealed a number of left-hemispheric areas within the temporal lobe as well as in the frontal and parietal lobes that were activated more strongly for natural speech than for hummed sentences. These areas constitute the neural network for the processing of natural speech. The finding that no area was activated more strongly for hummed sentences than for natural speech suggests that prosody is an integrated part of natural speech.
Introduction
The speech melody of an utterance can carry information that is critically important for understanding the meaning of a sentence (for a review, see Frazier et al., 2006; Friederici & Alter, 2004). In intonational languages such as German, Dutch, and English, prosodic information at the sentence level is conveyed mainly by the pitch contour of an utterance and the presence of speech pauses, among other cues. Sentences usually contain one or more major intonational phrases (IPh; Selkirk, 1995) that can be separated by speech pauses. Syntactically relevant speech breaks are also referred to as intonational phrase boundaries (IPBs). In studies using an event-related brain potential (ERP) paradigm, IPBs were observed to give rise to a positive shift in the EEG signal that is referred to as the closure positive shift or CPS component (Steinhauer et al., 1999). This component has been interpreted as being specifically related to the prosodic information contained in IPBs, as it has been observed using sentence materials that lacked semantic and syntactic information, such as pseudoword sentences (Pannekamp et al., 2005), filtered speech materials (Steinhauer & Friederici, 2001) or hummed speech (Pannekamp et al., 2005). The present fMRI study attempts to identify the brain regions that are involved in the processing of IPBs. IPBs are often employed by the speaker to clarify the structure of an otherwise syntactically ambiguous sentence. A sentence like 'Before Ben starts # the day dreaming has to stop' has a different meaning than 'Before Ben starts the day # dreaming has to stop' (# indicating a break). IPBs often separate major intonational phrases and correspond to major syntactic boundaries (Cooper & Paccia-Cooper, 1981). They represent a high level in the so-called prosodic hierarchy (Selkirk, 1995, 2000; Nespor & Vogel, 1983).
In psycholinguistic experiments, IPBs were shown to help resolve late closure ambiguities (Schafer et al., 2000; Grabe et al., 1994).
Experimental evidence suggests that humans can make use of the prosodic information contained in IPBs even in the absence of semantic or syntactic information. Behaviorally, it has been shown that listeners are able to detect major prosodic boundaries in meaningless speech materials such as reiterant speech (i.e., a sentence spoken as a string of repeated syllables while preserving the original prosodic contour) (de Rooij, 1976), spectrally scrambled and low-pass filtered speech (de Rooij, 1975; Kreiman, 1982), and hummed sentences ('t Hart & Cohen, 1990). It should be noted that the rationale for using stimuli of this kind is based on the assumption that speech is separable into layers, such as semantics, syntax and prosody. This assumption, however, may only be an approximation, as it has been argued that the possibility of separating these layers is inherently limited (Austin, 1975; Searle, 1969). The CPS observed at IPBs in ERP studies (Steinhauer et al., 1999) points in the same direction. Subsequent studies indicated that the CPS is specifically related to the prosodic aspects of an IPB, as it was also observed for sentence materials that were stripped of semantic information, such as pseudoword sentences, and for sentence materials with reduced or absent segmental information, such as hummed sentences (Pannekamp et al., 2005) and filtered speech materials (Steinhauer & Friederici, 2001). The CPS component is typically distributed bilaterally with a central maximum, and is shifted to the right for hummed sentences. However, due to the intrinsic difficulty of source localization in EEG studies, it is unclear which brain regions generate the CPS component. To our knowledge, only one imaging study has so far investigated the processing of IPBs in speech (Strelnikov et al., 2006). Strelnikov et al.
compared sentence materials in Russian that contained an IPB (e.g., 'To chop not # to saw', meaning 'One should not chop but saw') with sentences that did not contain an IPB ('Father bought him a coat'). When sentences with an IPB ('segmented') were compared to sentences without an IPB ('not segmented'), stronger activation was observed within the right posterior prefrontal cortex and an area within the right cerebellum. In the reverse comparison, stronger activation was observed in the gyrus of Heschl, bilaterally, and the left sylvian sulcus. However, the two types of sentence materials were used in two different tasks, thus confounding stimulus type and task. Although the comparison between segmented and non-segmented speech materials very likely yielded brain areas that are relevant for prosody processing, it cannot be excluded that differences due to the tasks contributed to the results. Ideally, a study investigating the processing of IPBs should compare conditions that differ only in the presence or absence of IPBs, keeping everything else constant. In the electrophysiological studies reviewed above (e.g., Steinhauer et al., 1999), the same task was used on sentence materials that had either one or two IPBs. The sentence materials in these studies were carefully constructed as sentence pairs with the same or similar words. It should be noted that the IPBs in these sentences were obligatory, entailing differences in syntactic and semantic structure between the two sentences of such a pair. However, these differences only play a role when the sentence materials are presented naturally spoken, but not when their segmental content is removed by filtering or humming. Hummed speech has the advantage that it preserves the prosody of natural speech while removing major aspects of the semantic and syntactic information of the utterance. Human speakers have been shown to be able to selectively preserve the original prosodic contour of an utterance.
When speakers were asked to produce reiterant speech (i.e., to selectively preserve the prosodic contour of a meaningful utterance by repeating a syllable), the resulting utterance preserved the prosodic aspects of normal speech, such as duration and pitch (Larkey, 1983), as well as accentuation and boundary marking (Rilliard & Aubergé, 1998). The hummed version of an utterance has also been shown to preserve the pitch contour and duration of the naturally spoken version (Pannekamp et al., 2005). Most importantly, the CPS component that Pannekamp et al. observed at IPBs within hummed speech materials was similar to the CPS observed for natural speech, but more lateralized to the right hemisphere. Hummed speech has the additional advantage that it is a familiar human vocalization. Unlike speech materials that are rendered unintelligible artificially and sound unfamiliar (Scott et al., 2000; Meyer et al., 2004), humming is known to be unintelligible, which effectively prevents participants from any attempt to decipher the original speech content of the signal. In the present functional magnetic resonance imaging (fMRI) study, sentence pairs containing one or two IPBs were presented auditorily to the participants as natural speech and as hummed sentences. Similar to previous electrophysiological studies (Steinhauer et al., 1999; Pannekamp et al., 2005; Isel et al., 2005), the sentence pairs were constructed using the same or equivalent content words, except for one or two critical words. All sentences were meaningful, syntactically correct and spoken with natural prosody. To ensure variability in the position of the additional IPB, two types of sentence pairs were constructed, one type with the additional IPB at an early position within the sentence (type A), the other type with the additional IPB at a later position in the sentence (type B; see Table 1 for examples of the sentence materials). As in previous electrophysiological studies, the IPBs contained in the sentences were obligatory. We used materials with obligatory IPBs because such materials had been investigated in those studies. A CPS had been observed for sentences of type A (Steinhauer et al., 1999; Pannekamp et al., 2005) as well as for coordination structures like type B (Steinhauer, 2003).
In the case of the naturally spoken sentences with obligatory IPBs, observed activation differences might in part be due to the additional semantic and syntactic differences between the sentences, rather than being solely due to the presence of IPBs. To differentiate the processing of prosody from associated syntactic and semantic processing, a hummed version of each sentence was also
produced by a trained speaker who was instructed to preserve the natural prosody of the original sentence. The aim of the present study was to identify the neural structures involved in the processing of sentence-level prosody by comparing sentences with two IPBs to sentences that have only one IPB. To differentiate prosodic processing from syntactic and semantic processing, differences between sentences with a different number of IPBs were investigated separately for natural speech and hummed sentences. With regard to the neural correlates of IPB processing we hypothesized that the primary auditory cortices and the auditory association areas play an important role. These areas are, among others, involved in the processing of complex auditory signals and speech (see, for a review, Griffiths et al., 2004). The superior temporal gyrus has been observed to be involved in processing of prosody (Hesling et al., 2005; Doherty et al., 2004). Activation in regions outside the temporal lobe has also been reported but appears to vary across different studies. These activations were observed to depend on the specifics of the task (Tong et al., 2005; Plante et al., 2002), the type of prosody involved (e.g., affective vs. linguistic, Wildgruber et al., 2004) and the degree of propositional information contained in the stimulus materials (Hesling et al., 2005; Tong et al., 2005; Gandour et al., 2002, 2004). It is possible, that areas outside the temporal lobe are also involved and that the auditory association areas are only part of a more extended processing network. IPBs are realized by variations in the prosody of an utterance. We therefore hypothesized that the superior temporal gyrus, among others, will show a modulation, positive or negative, due to the presence of an additional IPB. 
Furthermore, given the observed shift of the CPS from a bilateral distribution for natural speech to a more right-hemispheric distribution for hummed sentences, a stronger right-hemispheric lateralization may be observed for the prosodic aspects of speech. Evidence from patients with brain lesions inspired a first rough hypothesis about prosody processing, namely, that the right hemisphere plays a dominant role in prosody processing. Patients with lesions within the left hemisphere often suffer from aphasia, while non-aphasic patients with right-hemispheric damage seem to have difficulties perceiving or producing the prosodic aspects of speech (for a review, see Wong, 2002; but see Perkins et al., 1996). On the basis of this first rough hypothesis, two more sophisticated classes of hypotheses have been developed. According to the acoustic or cue-dependent class of hypotheses, hemispheric dominance is determined solely by the acoustic properties of the auditory signal. Zatorre and Belin (2001), for example, suggested that the left hemisphere is specialized in processing the high-frequency components that generate the vowels and consonants in speech, while the right hemisphere is specialized in processing the low-frequency patterns that make up the intonational contour of a syllable or sentence (for a similar view, see Poeppel, 2003). According to the class of functional or task-dependent hypotheses, hemispheric dominance depends on the function of prosodic information (van Lancker, 1980) or on the attentional focus required, for example, by an experimental task. A shift from the right hemisphere to the left is assumed to occur when the task or the function of the prosodic information contained in a speech signal involves language processing rather than prosody processing. A review of the available empirical studies suggests that lateralization is stimulus-dependent but can, in addition, vary as a function of task (Friederici & Alter, 2004). It should be noted, however, that the first rough hypothesis of right-hemispheric dominance for prosody is still under debate. If prosodic processes are mainly subserved by the right hemisphere, we should find right-hemispheric activation for both natural and hummed speech. If, moreover, prosody is represented neuronally as a separate layer, a direct comparison between natural and hummed speech should reveal more left-hemispheric activation for natural speech.
Methods
Participants
Sixteen healthy, right-handed young adults (8 female; mean age: 26.1 years, SD: 4.44) took part in the experiment. The data of two participants were discarded from the analysis, one because of scanner malfunction, the other because of too many errors. All participants were or had been students of the University of Leipzig. They were native speakers of German with normal hearing and had no history of neurological or psychiatric illness. Volunteers were paid for their cooperation and had given written consent before the experiment. Ethical approval for the present study was provided by the University of Leipzig.
Materials
Forty-eight German sentence pairs were constructed that had either one or two intonational phrase boundaries (IPBs). Ideally, the materials used should differ only with regard to the number of IPBs they contain. If possible, they should consist of the same words to ensure similar lexical retrieval processes. They should also contain the same number of words and syllables, so that they do not differ in length. To ensure some variability in the position of the additional IPB, two types of sentence pairs were constructed. One type (A) had an early additional phrase boundary, the other type (B) an additional phrase boundary at a later position in the sentence (see Table 1 for examples of the sentence materials). The IPBs contained in the sentences were obligatory because such materials had been investigated in previous electrophysiological studies. Sentences of type A had been used in earlier electrophysiological studies where they were observed to elicit the CPS component (Steinhauer et al., 1999). These materials were also found to elicit the CPS component when they were low-pass filtered (Steinhauer & Friederici, 2001) or hummed (Pannekamp et al., 2005).
In addition, coordination structures like the type B sentences used here have also been investigated in previous electrophysiological studies (Steinhauer, 2003), and a CPS was observed. The pairs of one-IPB and two-IPB sentences of types A and B were as similar as possible with regard to the words used. In type B, the same content words were used. In type A, the same content words were also used, with the exception of the verb, which was either transitive or intransitive. To ensure comparability, the frequency and number of syllables of the verb were matched within a type A sentence pair. We constructed these sentence materials with the aim of creating conditions that differ only in the aspect under scrutiny, namely, the number of IPBs. It should be noted that the sentence materials constructed for this experiment come close to this ideal but are not perfect. The sentences with one and two IPBs additionally differ from each other with regard to their syntactic structure (types A and B) and word order (type B). One way to circumvent these differences would have been to choose sentence materials with optional IPBs rather than obligatory IPBs as in the materials used here. While obligatory IPBs correspond to major syntactic boundaries, optional IPBs are on a lower level of the prosodic hierarchy, namely the level of phonological phrases (Selkirk, 1995). They do not necessarily correspond to syntactic phrases (Truckenbrodt, 2005). Furthermore, optional IPBs depend on several factors such as speech rate and hesitations (filled or unfilled pauses), which might make them more difficult for the listener to detect. Although it is highly probable that optional IPBs also elicit a CPS component, this has not yet been investigated. Our choice of materials was primarily motivated by the aim to ensure comparability with previous electrophysiological studies in which a CPS component was observed. Although not optimal, we think that our materials are sufficiently comparable to allow inferences on prosodic processing.
--- insert Table 1 about here ---
The sentences were spoken with natural prosody. Additionally, a hummed version of each sentence was recorded. All materials were recorded from a trained female native speaker of German.
The speaker was instructed to produce a hummed version of each sentence directly after its naturally spoken version, taking care to preserve the prosody, speed and total number of syllables of the original sentence. In Figure 1, spectrograms and fundamental frequency contours are given for a type A and a type B example sentence (hummed and naturally spoken). The recorded materials were digitized (44.1 kHz) and normalized (70%) with regard to amplitude envelope to ensure an equal volume for all stimuli.
--- insert Figure 1 about here ---
Design and Procedure
The task of the participants was to indicate in a two-alternative forced choice task whether a probe word presented after the sentence had been contained in the sentence or not. Probe words were chosen for each sentence and belonged to different word classes (e.g., nouns, verbs, determiners). The probes were spoken with final rising pitch, indicating a question. Of the 96 sentences in total, 72 were selected as experimental materials, 18 as probe sentences, and 6 as practice sentences. In each trial participants had to decide whether the probe word had occurred in the sentence or not. 'Yes' answers were given with the middle finger and 'no' answers with the index finger of the right hand. Sentences from the experimental materials never contained the probe. The probe sentences were not analyzed. In the case of the naturally spoken probe sentences, the probe word was always contained in the sentence and a 'yes' answer was expected. In the case of the hummed experimental sentence materials, the task was very easy. As soon as the presentation of the hummed sentence had finished, participants knew that 'no' had to be the answer. The main purpose of the task in the case of the hummed sentence materials was to have participants attend to the sentence for the duration of its presentation. In the hummed probe sentences, one word of the hummed sentence was naturally spoken. To prevent participants from suspending attention as soon as they detected a naturally spoken word in the hummed probe sentence, the probe word was identical to the naturally spoken word within the probe sentence in only half of the trials. This ensured that participants had to wait until the presentation of the probe word. Naturally spoken and hummed sentence materials were presented in alternating blocks.
Every block consisted of 2 practice sentences, 24 experimental sentences, 6 probe sentences, and 6 null events (i.e., trials in which no auditory stimulus was presented). The null events were included to increase the efficiency of the design (Liu et al., 2001) and to provide a baseline condition for analysis. The blocks of hummed materials had the same structure. The total experiment consisted of 6 blocks (3 hummed, 3 normal), that is, 228 trials in total. Trials were randomized with first-order transition probabilities between conditions held constant. Three randomizations were generated in total. Each trial began with the presentation of a hummed or normal sentence. A probe word was then presented 500 ms after the offset of the sentence. After the presentation of the probe, a waiting time was inserted to ensure a total inter-trial interval (ITI) of 11 seconds (see Figure 2). With a TR of 3 s and an ITI of 11 s, trial presentation and scanner triggering were synchronous every three trials. To ensure synchronization, the beginning of every third trial was synchronized using the scanner trigger. After synchronization, a random waiting time of 0 to 600 ms was inserted before the presentation of the sentence. Stimulus presentation and response time recording were controlled by a computer outside the scanner using Presentation software (Neurobehavioral Systems Inc.). The experimental materials were presented over earphones within the scanner. The participants were additionally protected from scanner noise by earplugs. They reported that they had no difficulties understanding the sentences over the scanner noise. A button box compatible with the scanner environment was used to record the responses of the participants.
--- insert Figure 2 about here ---
fMRI data acquisition
All imaging data were acquired using a 3T Bruker 30/100 Medspec MR scanner. A high-resolution structural scan using the 3D modified driven equilibrium Fourier transform (MDEFT) imaging technique was also obtained (128 sagittal slices, 1.5 mm thickness, in-plane resolution: 0.98 x 0.98 mm, field of view (FOV): 250 mm).
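The trial counts and the trial-to-scanner synchronization described in the Design and Procedure section follow from simple arithmetic. The following Python sketch reproduces these figures from the values reported in the text; the function and constant names are illustrative and not part of the original study.

```python
from math import gcd

# Block composition as reported in the text.
PRACTICE, EXPERIMENTAL, PROBE, NULL = 2, 24, 6, 6
N_BLOCKS = 6           # 3 hummed + 3 natural
TR_S, ITI_S = 3, 11    # repetition time and inter-trial interval in seconds


def trials_per_block() -> int:
    """Number of trials in one block, including practice trials and null events."""
    return PRACTICE + EXPERIMENTAL + PROBE + NULL


def total_trials() -> int:
    """Number of trials in the whole experiment."""
    return N_BLOCKS * trials_per_block()


def resync_period_s(tr_s: int, iti_s: int) -> int:
    """Least common multiple of TR and ITI: the interval after which
    trial onsets and volume onsets coincide again."""
    return tr_s * iti_s // gcd(tr_s, iti_s)


period = resync_period_s(TR_S, ITI_S)
print(total_trials())     # 228 trials in total
print(period // ITI_S)    # 3 trials between coinciding onsets
print(period // TR_S)     # 11 volumes between coinciding onsets
```

With a TR of 3 s and an ITI of 11 s the onsets coincide every 33 s, which corresponds to the reported synchronization on every third trial.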
Functional images were acquired using a T2*-weighted gradient-echo echo-planar imaging (EPI) sequence. For the functional measurement a noise-reduced sequence was chosen to ensure that scanner noise would not impair comprehension. Eighteen ascending axial slices per volume were acquired continuously, in the plane of the anterior and posterior commissure (in-plane resolution of 3 x 3 mm, FOV: 192 mm, 5 mm thickness, 1 mm gap, echo time (TE): 40 ms, repetition time (TR): 3000 ms). The 18 slices covered the temporal lobes and part of the parietal lobes and cerebellum. According to a short questionnaire given to the participants after the experiment, the noise level of the scanner did not critically impair the auditory quality and comprehensibility of the naturally spoken or the hummed stimulus materials.
Functional data analysis
FMRI data analysis was performed with statistical parametric mapping (SPM2) software (Wellcome Department of Cognitive Neurology, London, U.K.). The six blocks were measured as separate runs with 142 volumes each. The first two images of each functional run were discarded to ensure signal stabilization. The functional data of each participant were motion-corrected. The structural image of each participant was registered to the time series of functional images and normalized using the T1 template provided by SPM2, corresponding approximately to Talairach & Tournoux space (Talairach & Tournoux, 1988). The functional images were normalized using the normalization parameters of the structural image and then smoothed with a Gaussian kernel of 12 mm full width at half maximum (FWHM). A statistical analysis on the basis of the general linear model was performed, as implemented in SPM2. Although hummed and normal sentence materials were presented in blocks, an event-related analysis was chosen. This made it possible to compare the sentences to the null events interspersed within the blocks as a baseline, as well as to discard probe trials and error trials from the analysis. The delta function of the trial onsets per condition was convolved with the canonical form of the hemodynamic response function as given in SPM2 and its first and second temporal derivative to generate model time courses for the different conditions.
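To illustrate how a model time course of this kind arises, the following Python sketch convolves a delta function at the trial onsets with a double-gamma response function and samples the result once per TR. It is a minimal sketch under stated assumptions and not the SPM2 implementation: the shape parameters (6 and 16, undershoot ratio 1/6), the 0.1 s time grid and the function names are assumptions, and the temporal derivatives mentioned above are omitted.

```python
import math

TR_S = 3.0   # repetition time reported in the text
DT_S = 0.1   # fine time grid used for the convolution (assumption)


def double_gamma_hrf(t: float) -> float:
    """Double-gamma response with assumed shape parameters 6 and 16.
    The positive lobe peaks near 5 s, the undershoot near 15 s."""
    if t <= 0:
        return 0.0
    peak = t ** 5 * math.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * math.exp(-t) / math.gamma(16)
    return peak - undershoot / 6.0


def regressor(onsets_s, n_volumes, hrf_length_s=32.0):
    """Convolve a delta function at each onset with the response function
    and sample the result at the start of every volume."""
    n_fine = int(round(n_volumes * TR_S / DT_S))
    sticks = [0.0] * n_fine
    for onset in onsets_s:
        idx = int(round(onset / DT_S))
        if 0 <= idx < n_fine:
            sticks[idx] += 1.0
    kernel = [double_gamma_hrf(k * DT_S) for k in range(int(hrf_length_s / DT_S))]
    fine = [0.0] * n_fine
    for i, stick in enumerate(sticks):
        if stick:
            for k, h in enumerate(kernel):
                if i + k < n_fine:
                    fine[i + k] += stick * h
    step = int(round(TR_S / DT_S))
    return [fine[v * step] for v in range(n_volumes)]


# Two hypothetical trial onsets, 11 s apart, over 20 volumes.
model = regressor([0.0, 11.0], n_volumes=20)
```

One such regressor per condition, together with regressors of no interest, would form the columns of the design matrix of the general linear model.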
Each trial was modelled in SPM2 using the beginning of the sentence as trial onset. Because of the length of the auditorily presented sentence stimuli, the BOLD response was modelled with a duration of 5.345 s (average length of the sentences, hummed and natural). Errors and probe sentences were modelled as separate conditions and excluded from the contrasts calculated. The functional time series was high-pass filtered with a frequency cut-off of 1/80 Hz. No global normalization was used. Motion parameters and the lengths of the individual sentences per trial were entered into the analysis as parameters of no interest. For the random effects analysis, the calculated contrast images from the first-level analysis of each participant were entered into a second-level analysis.
ROI-Analysis: Eight regions of interest (ROIs) were created on the basis of the automated anatomical labelling (AAL) atlas (Tzourio-Mazoyer et al., 2002): superior temporal gyrus, the Rolandic operculum, the gyrus of Heschl and the inferior frontal gyrus (pars triangularis and pars opercularis). The ROI for the superior temporal gyrus was divided into three parts, anterior (y > -15), middle (-35 < y < -15) and posterior (y < -35), and the Rolandic operculum into two parts, anterior (y > -15) and posterior (y < -15). A visualization of the temporal ROIs is given in Figure 3. The ROI analysis was performed by averaging the effect sizes (contrast estimates) over all voxels for the left and right ROIs per participant.
--- insert Figure 3 about here ---
Results
Behavioral
Participants were excluded from further analysis if they made more than 30% errors in any condition. This led to the exclusion of one participant in addition to the one participant who was excluded due to scanner malfunction. Analyzing the data of the remaining fourteen participants yielded 144 errors (5.7%) in total. Response times were measured from the presentation of the probe word. Only correct responses to experimental trials, not probe trials, within the interval of 200 to 2000 ms were analyzed. Response times were entered into a repeated measures analysis of variance with SPEECH TYPE (natural vs. hummed) and IPB (one vs. two IPBs) as factors.
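The behavioral screening rules described above can be written down compactly. The following Python sketch is illustrative only: the `Trial` record, the function names and the inclusive treatment of the 200 ms and 2000 ms limits are assumptions. The last lines show one reading of the reported error rate; the text does not state the denominator, and counting experimental plus probe trials (excluding practice trials and null events) is an assumption that happens to reproduce 5.7%.

```python
from dataclasses import dataclass

RT_MIN_MS, RT_MAX_MS = 200, 2000   # response time window reported in the text
MAX_ERROR_RATE = 0.30              # exclusion criterion reported in the text


@dataclass
class Trial:
    kind: str       # "experimental", "probe", "practice" or "null" (hypothetical labels)
    correct: bool
    rt_ms: float


def exclude_participant(error_rate_by_condition: dict) -> bool:
    """A participant is excluded if the error rate exceeds 30% in any condition."""
    return any(rate > MAX_ERROR_RATE for rate in error_rate_by_condition.values())


def analysable_rts(trials):
    """Response times of correct experimental trials inside the analysis window."""
    return [
        t.rt_ms
        for t in trials
        if t.kind == "experimental" and t.correct and RT_MIN_MS <= t.rt_ms <= RT_MAX_MS
    ]


# Assumed denominator: 6 blocks x (24 experimental + 6 probe) trials, 14 participants.
responses_per_participant = 6 * (24 + 6)
error_rate_percent = round(100 * 144 / (14 * responses_per_participant), 1)
print(error_rate_percent)   # 5.7
```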
Participants reacted significantly faster to the hummed (473 ms, SD = 46.05) than to the naturally spoken sentence materials (753 ms, SD = 85.36), yielding a significant main effect of SPEECH TYPE (F(1,13) = 30.64, MSE = 35691, p < 0.001). The number of IPBs did not influence response times; the main effect of IPB and the interaction SPEECH TYPE x IPB were not significant.
FMRI-data: whole brain analysis
Comparisons of sentence materials with a different number of IPBs. For natural speech, sentences with two IPBs activated the superior temporal gyrus, extending to the Rolandic operculum and gyrus of Heschl, bilaterally, more strongly than sentences with one IPB (Figure 4a, Table 2). In the corresponding comparison for hummed sentences, a significant cluster was observed in the left supramarginal gyrus, extending to the left superior temporal gyrus and the left gyrus of Heschl. No significant clusters were observed in the reverse comparisons, either for natural speech or for hummed sentences.
--- insert Figure 4 & Table 2 about here ---
Comparisons of the two basic types of materials (hummed sentences and natural speech) to baseline. When natural speech was compared to baseline (null events), the strongest activations were observed within the superior temporal gyrus, bilaterally, and the SMA (Table 2). Further activations were observed within the right precentral gyrus, the right insula, the left superior parietal lobule and in the cerebellum. When hummed sentences were compared to baseline, activations were observed within the superior temporal gyrus, bilaterally, the precentral gyrus, bilaterally, the SMA and the right middle frontal gyrus, as well as the left inferior parietal lobule. Activations were also observed within the cerebellum, bilaterally.
Comparisons of hummed sentences to natural speech. Compared to hummed sentences, natural speech more strongly activated the frontal gyrus and middle temporal gyrus, bilaterally but with a left-hemispheric predominance, as well as the left angular gyrus and the left thalamus and caudate (Figure 4b, Table 2).
No brain area was significantly more activated for hummed sentences than for natural speech.
FMRI-data: ROI-Analysis
To further investigate and compare the behaviour of the brain areas potentially involved in IPB processing, we conducted an ROI analysis over eight ROIs (pars opercularis and triangularis of the inferior frontal gyrus, anterior, middle and posterior part of the superior temporal gyrus, gyrus of Heschl, and anterior and posterior Rolandic operculum). The effect sizes for each condition were averaged over all voxels contained within each ROI per side and participant. The results per ROI, hemisphere and condition are shown in Figure 5. They were entered into a repeated measures analysis of variance with ROI (8), HEMISPHERE (left vs. right), IPB (1 vs. 2 IPBs), and SPEECH TYPE (naturally spoken vs. hummed) as factors. The left hemisphere was on average more strongly activated than the right hemisphere, which is reflected in a significant main effect of HEMISPHERE (F(1,13) = 49.53, MSE = 2.279, p < 0.001). Natural speech activated the brain areas investigated more strongly than hummed speech, yielding a significant main effect of SPEECH TYPE (F(1,13) = 11.93, MSE = 4.415, p < 0.01). More activation was observed for sentences containing two intonational phrase boundaries than for sentences containing one intonational phrase boundary, giving a main effect of IPB (F(1,13) = 17.45, MSE = 0.385, p < 0.01). The different overall activation levels within the ROIs yielded a significant main effect of ROI (F(7,91) = 72.71, MSE = 1.435, p < 0.001). SPEECH TYPE and IPB yielded additive effects, as none of the interactions containing both factors reached significance. The type of speech materials (hummed or natural) influenced lateralization, yielding a significant SPEECH TYPE x HEMISPHERE interaction (F(1,13) = 30.59, MSE = 0.419, p < 0.001), and this influence depended on the ROI, giving a significant triple ROI x SPEECH TYPE x HEMISPHERE interaction (F(7,91) = 3.92, MSE = 0.103, p < 0.001).
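The ROI values entering this analysis are averages of voxel-wise effect sizes within anatomically defined subregions. The following Python sketch shows how voxels could be assigned to the subdivisions of the superior temporal gyrus (split at y = -15 and y = -35) and of the Rolandic operculum (split at y = -15) and then averaged per ROI and hemisphere. It is a minimal sketch, not the analysis code of the study: the region labels, the tuple layout and the assignment of voxels lying exactly on a boundary are assumptions, as the text leaves the boundary values unassigned.

```python
from collections import defaultdict
from statistics import mean


def stg_part(y_mm: float) -> str:
    """Subdivision of the superior temporal gyrus by y coordinate.
    Assumption: boundary voxels go to the more posterior part."""
    if y_mm > -15:
        return "anterior"
    if y_mm > -35:
        return "middle"
    return "posterior"


def rolandic_part(y_mm: float) -> str:
    """Subdivision of the Rolandic operculum by y coordinate (same boundary assumption)."""
    return "anterior" if y_mm > -15 else "posterior"


def roi_means(voxels):
    """Average effect sizes per (ROI, hemisphere).
    voxels: iterable of (region, hemisphere, y_mm, effect_size), where region is
    one of the hypothetical labels "STG", "RO", "Heschl", "IFG_tri", "IFG_oper"."""
    sums = defaultdict(list)
    for region, hemisphere, y_mm, effect in voxels:
        if region == "STG":
            roi = f"STG {stg_part(y_mm)}"
        elif region == "RO":
            roi = f"RO {rolandic_part(y_mm)}"
        else:
            roi = region
        sums[(roi, hemisphere)].append(effect)
    return {key: mean(values) for key, values in sums.items()}


# Hypothetical voxels for one participant and one condition.
example = roi_means([
    ("STG", "L", -10, 1.0),
    ("STG", "L", -12, 3.0),
    ("STG", "L", -20, 2.0),
    ("RO", "R", -40, 4.0),
])
```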
Lateralization, type of materials, and number of IPBs influenced activation differently in different ROIs, giving the significant two-way interactions ROI x HEMISPHERE (F(7,91) = 19.83, MSE = 0.437, p < 0.001), ROI x SPEECH TYPE (F(7,91) = 6.40, MSE = 0.175, p < 0.001) and ROI x IPB (F(7,91) = 16.00, MSE = 0.018, p < 0.001).
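The degrees of freedom of the F-tests reported here follow directly from the fully within-subject design. The following minimal sketch shows the arithmetic. The function name is hypothetical, and the participant count of 14 is an assumption inferred from the reported F(1,13); it is not stated in this section. No sphericity correction is modelled.

```python
from math import prod


def within_subject_df(levels, n_participants):
    """Return (df_effect, df_error) for an effect in a fully within-subject ANOVA.

    levels: number of levels of each factor involved in the effect,
            e.g. [8, 2] for ROI x HEMISPHERE.
    """
    df_effect = prod(k - 1 for k in levels)
    df_error = df_effect * (n_participants - 1)
    return df_effect, df_error


N = 14  # assumption: inferred from the reported F(1,13)

print(within_subject_df([2], N))        # HEMISPHERE: (1, 13)
print(within_subject_df([8], N))        # ROI: (7, 91)
print(within_subject_df([8, 2, 2], N))  # ROI x SPEECH TYPE x HEMISPHERE: (7, 91)
```

Under this assumption, every effect that includes the eight-level ROI factor is tested with (7, 91) degrees of freedom, and every effect involving only two-level factors with (1, 13), which matches the values reported in the text.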
The significant triple interactions between ROI x HEMISPHERE x IPB (F(7,91) = 6.14, MSE = 0.007, p < 0.001) and ROI x HEMISPHERE x SPEECH TYPE (F(7,91) = 3.92, MSE = 0.103, p < 0.001) indicate that SPEECH TYPE and IPB each interacted with hemisphere differently in different ROIs. The remaining interactions were not significant. To identify these differences between ROIs more closely, we conducted additional ANOVAs separately for the different ROIs. Significant main effects for IPB were observed in all ROIs except for the inferior frontal gyrus ROIs. The main effect of SPEECH TYPE was significant in all ROIs but the anterior and posterior Rolandic operculum. Significant main effects for HEMISPHERE were observed in all ROIs with the exception of the anterior STG and the anterior Rolandic operculum. The HEMISPHERE x SPEECH TYPE interaction was significant in all but three ROIs, the anterior and posterior Rolandic operculum and the gyrus of Heschl. A HEMISPHERE x IPB interaction was observed in the posterior STG and the gyrus of Heschl. No ROI showed a significant SPEECH TYPE x IPB interaction. The triple interaction HEMISPHERE x SPEECH TYPE x IPB was significant only in the anterior Rolandic operculum. To investigate differences with regard to lateralization for the two types of material in every ROI more closely, Newman-Keuls post-hoc tests were calculated for the interaction HEMISPHERE x SPEECH TYPE. A stronger activation within the left hemisphere compared to the right was observed for natural speech in nearly all ROIs, except the anterior Rolandic operculum and the anterior part of the superior temporal gyrus. For hummed speech a left-hemispheric dominance was observed in only four ROIs, the gyrus of Heschl, the middle and posterior part of the superior temporal gyrus and the posterior Rolandic operculum. No ROI showed a significantly stronger activation of the right hemisphere for hummed sentences compared to natural speech. --- insert Figure 3 about here --- Discussion
The aim of the present study was to identify the brain areas involved in the processing of sentence-level prosody by investigating the processing of intonational phrase boundaries (IPBs). We will first discuss our results with regard to differences in the processing of sentences with two IPBs as compared to sentences with one IPB, and later turn to the general differences between natural and hummed speech. Brain areas involved in IPB processing In the whole brain analysis stronger activation was observed for sentences with two IPBs than sentences with one IPB within the left superior temporal gyrus, for natural speech as well as for hummed sentences. An additional focus within the superior temporal gyrus on the right side was significant for natural speech but failed to reach significance for hummed speech. This indicates that processing an additional IPB activates the superior temporal gyrus similarly for natural speech and hummed sentences. In the ROI-analysis (pars opercularis and triangularis of the inferior frontal gyrus, anterior, middle and posterior part of the superior temporal gyrus, gyrus of Heschl, and anterior and posterior Rolandic operculum), more activation for two than for one IPB was observed only in the temporal ROIs, but not within the inferior frontal gyrus. This indicates that prosody processing mainly involves brain areas related to auditory processing. In addition, all but one ROIs, the posterior Rolandic operculum, showed a significant main effect or an interaction with SPEECH TYPE (natural speech or hummed sentences). This result could be taken to indicate that these regions are involved in the processing of the more complex segmental information contained in natural speech. The posterior Rolandic operculum (bilaterally), on the other hand, did not show stronger activation for natural speech compared to hummed sentences: there was no main effect or any interaction with SPEECH TYPE. 
This suggests that this region might play a less pronounced role in the processing of the specific spectral composition and the additional linguistic information (semantics, syntax) contained in natural speech. As this region, however, was more strongly activated by
materials containing two IPBs rather than one (main effect of IPB) it could be speculated that this region might be specifically involved in the processing of the prosodic information. Interestingly, there was no interaction of IPB and SPEECH TYPE in any of the ROIs investigated. This indicates that the activation elicited by the presence of an additional IPB in the ROIs investigated did not depend on the comprehensibility of the utterance. Although IPBs can aid the understanding of an utterance, we did not observe any evidence for an interaction in the respective ROIs, not even within the left inferior frontal gyrus, a region assumed to be involved in syntactic processing. This finding should not lead to the conclusion, however, that prosody and syntax do not interact. It is possible that syntax and prosody are processed independently from each other in different brain regions and that the interaction between both types of information occurs in higher associative areas within the brain. This is also suggested by a recent ERP study with patients suffering from lesions in the corpus callosum (Friederici et al., 2007). So far, only one neuroimaging study investigated the perception of IPBs (Strelnikov et al., 2006). In the condition with IPB, the IPB could be in one of two positions, changing the meaning of the sentence. 'To chop not # to saw' ('Rubit nelzya, pilit') means: 'not to chop but to saw', whereas 'To chop # not to saw' ('Rubit, nelzya pilit') means 'to chop and not to saw'. This is a possible construction in Russian because, different from Germanic languages such as German, Dutch or English, the negative word ‘not’ may have scope to its left or right depending on the position of an IPB. In this condition, participants were first presented with one of the two versions and had to select the appropriate alternative (one needs to: chop/saw). 
In the condition without IPB, participants were first presented with a simple statement ('Father bought him a coat') and then had to select the appropriate alternative ('His father bought him: a coat/a watch'). Strelnikov and colleagues reported stronger activation for sentences containing an IPB (e.g., 'To chop not # to saw', or 'To chop # not to saw') than for sentences without an IPB (e.g., 'Father bought him a coat') within the right posterior prefrontal cortex
and an area within the right cerebellum. Different from our results, Strelnikov et al. did not observe stronger activation within the superior temporal gyrus. However, as already outlined in the introduction, the task used was different for both sentences types. Although the task consisted in both cases of a visually presented question with two response alternatives ('Father bought him: a coat / a watch' and 'One needs to: saw / chop', for the two examples, respectively), the first type of materials (sentences without IPBs) required a semantic judgement based on the segmental content of the utterance while the second type of materials (sentences with an IPB) required a semantic judgement based on the segmental content as well as the prosodic information (IPB position) of the utterance. It is therefore possible that the differences observed by Strelnikov et al. (2006) might not only be due to the presence or absence of IPBs in the materials but also to task variables. To summarize our results so far, we observed a stronger activation within the superior temporal gyrus for sentences containing two IPBs compared to sentences containing one IPB. The activation extended to the gyrus of Heschl and the Rolandic operculum. To find out whether one of these areas might show a specialization with regard to prosody processing as compared to the processing of the segmental content of speech we also conducted an ROI analysis. In the ROI-analysis, the posterior part of the Rolandic operculum was the only brain region that showed a modulation of activation due to the presence of an additional IPB independent of the amount of segmental information contained in natural speech as compared to hummed sentences. The processing of hummed sentences compared to natural speech Hummed sentences also represent a human vocalization and a complex auditory signal. 
Comparing the processing of natural speech with hummed sentences can therefore yield information with regard to the brain areas processing the segmental content of speech as compared to brain areas processing the more basic aspects of speech. Comparing the processing of hummed sentences to baseline gives an initial overview of the brain areas
involved in the processing of hummed speech, as well as, potentially, in the processing of the prosodic aspects of speech. The strongest activations were observed within the superior temporal gyrus, extending to the Rolandic operculum and the gyrus of Heschl, areas involved in auditory processing. Strong activations were also observed within the SMA, the right precentral gyrus and the cerebellum. These activations are most likely due to the manual response required by the task. On the basis of the logic of cognitive subtraction, subtracting hummed sentences from natural speech should yield brain areas that are involved in the processing of segmental information, lexico-semantics and syntax. Stronger activations for natural speech than for hummed sentences were observed in the middle temporal gyrus, bilaterally, and within the left opercular and triangular part of the inferior frontal gyrus, corresponding approximately to BA 44 and 45. These fronto-temporal activations are related to the processing of syntactic and semantic information contained in natural speech. Similar activations have been reported in other studies comparing natural speech to unintelligible speech (Spitsyna et al., 2006). Natural speech also activated the superior temporal gyrus more strongly than hummed speech, possibly reflecting the involvement of the auditory cortex in processing the more complex spectral components carrying the segmental information of natural speech. It should be noted that the probe monitoring task used here might have had some influence on the brain activation patterns observed. Our task induced an attentional focus on lexico-semantic processing rather than pure prosody processing, which might also have influenced the activations we observed in temporal areas for hummed speech. In the following, we will further discuss our findings against the background of the organization of the auditory system with regard to its relation to speech and prosody processing. 
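The subtraction logic applied to the 2 x 2 design (SPEECH TYPE x IPB) can be illustrated with contrast weights. The following is a minimal sketch only: the condition order, the names, and the numerical effect sizes are hypothetical and are not taken from the study. It shows how a main effect of each factor and their interaction are obtained from four condition means, and why purely additive effects give an interaction contrast of zero.

```python
# Assumed condition order (for illustration only):
# natural/1 IPB, natural/2 IPBs, hummed/1 IPB, hummed/2 IPBs
CONTRASTS = {
    "natural > hummed": (1, 1, -1, -1),
    "2 IPBs > 1 IPB": (-1, 1, -1, 1),
    # (natural 2 - natural 1) - (hummed 2 - hummed 1)
    "SPEECH TYPE x IPB": (-1, 1, 1, -1),
}


def contrast_value(weights, means):
    """Weighted sum of condition means for one contrast."""
    return sum(w * m for w, m in zip(weights, means))


# Hypothetical effect sizes in which the two factors act additively:
# natural speech adds 1.0, a second IPB adds 0.5.
means = (2.0, 2.5, 1.0, 1.5)

for name, weights in CONTRASTS.items():
    print(name, contrast_value(weights, means))
# natural > hummed 2.0
# 2 IPBs > 1 IPB 1.0
# SPEECH TYPE x IPB 0.0
```

In this toy example the additional IPB raises the response by the same amount in both speech types, so the interaction contrast vanishes while both main-effect contrasts are positive.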
The brain areas involved in the processing of complex sounds and speech The auditory association cortex seems to represent mainly spectral and temporal properties of auditory stimuli rather than more abstract high-level properties such as auditory objects (see, for reviews, Griffiths et al., 2004; Griffiths & Giraud, 2004). The auditory
association areas are assumed to be processing more abstract properties than primary auditory cortex. Some of these properties are assumed to be processed in separate regions. Rauschecker and Tian (2000) proposed that anterior regions subserve auditory object identification while posterior regions process spatial information, similar to the ventral ('what') and dorsal ('where') pathways of the visual system (Ungerleider & Mishkin, 1982). Although such an anterior-posterior distinction is supported in part by single cell recordings in monkeys (Recanzone, 2000; Tian et al., 2001) as well as by lesion (Adriani et al., 2003; Clarke et al., 2000) and brain imaging studies (Maeder et al., 2001), the distinction is not clear cut. The functional nature of this anterior-posterior division is therefore still under debate (Zatorre et al., 2002). With regard to the processing of speech, the dual pathway model (Rauschecker & Tian, 2000) suggests that speech should activate regions anterior to the primary auditory cortex, as it is a familiar and highly complex auditory signal with no relation to spatial information. When intelligible speech is compared to non-speech, stronger left-lateralized activations are obtained in anterior and middle parts of the superior temporal sulcus, for single words as well as for sentences (Binder et al., 2000; Narain et al., 2003; Scott et al., 2000; Meyer et al., 2004). However, activation within the posterior temporal and inferior parietal regions is also often observed in imaging studies (Meyer et al., 2002; Spitsyna et al., 2006). Although not directly derivable from the dual pathway model, this finding is not altogether surprising. It has long been known from lesion studies that posterior temporal and inferior parietal regions are critical for supramodal language comprehension (Geschwind, 1965). 
The stronger activations for natural speech than for hummed sentences observed in the present study within the superior temporal gyrus are compatible with previous results. We observed stronger activation for natural than for hummed speech in middle and posterior parts of the superior temporal gyrus and the posterior part of the Rolandic operculum. With regard to the lack of activation in anterior parts of the superior temporal lobe, it could be speculated
that hummed speech is recognized as a familiar human vocalization. This familiarity might explain differences from the results of studies using speech materials that are rendered unintelligible artificially and sound more unfamiliar (Scott et al., 2000; Meyer et al., 2004). It should be noted, however, that the activations observed within the temporal cortex for speech may not be specific for speech. Similar activations have been observed for other complex auditory stimuli such as musical sounds and melodies (Griffiths et al., 1998; Patterson et al., 2002; see, for a review, Price et al., 2005). It is possible that differences between speech and other complex auditory stimuli are subtle and go beyond simple localization. A recent study by Tervaniemi et al. (2006), for example, showed that the superior temporal region reacts differently to changes in pitch or duration for speech syllables than for musical sounds. The brain areas involved in the perception of prosody Activations within the superior temporal gyri have been regularly observed in imaging studies when speech with or without prosodic modulation is presented and compared to baseline (e.g., Wildgruber et al., 2004; Hesling et al., 2005; Gandour et al., 2003, 2004). Regions outside the temporal lobe have also been observed to show a modulation of activation dependent on prosodic properties, but they appear to vary considerably across different studies. Prosody-related activations were observed to depend on the specifics of the task (Tong et al., 2005; Plante et al., 2002), the type of prosody involved (e.g., affective vs. linguistic, Wildgruber et al., 2004) and the degree of propositional information contained in the stimulus materials (Hesling et al., 2005; Tong et al., 2005; Gandour et al., 2003, 2004). 
Although the superior temporal region seems to be the main candidate for the perception of prosodic modulation in speech, other areas outside the temporal lobe are likely to be involved and the auditory association areas are likely to be only part of a more extended processing network for prosodic aspects of speech. Imaging studies investigating the processing of sentence-level prosody often compared natural speech to speech stimuli either with no segmental information or with no or little
prosodic information. One possibility is to remove the segmental information by filtering, preserving the intonational contour of the sentence (Meyer et al., 2002, 2004). The results from these studies point towards a fronto-temporal network with a right-hemispheric dominance. The Rolandic operculum, in particular in the right hemisphere, has been identified as part of this network. Another approach is to remove the intonational contour of sentences, for example, by high-pass filtering (Meyer et al., 2004; Herrmann et al., 2003), or to reduce prosody information by speaking with little prosodic expressiveness (Hesling et al., 2005). The rationale for this approach is to identify prosody-related activations by subtracting low-prosody speech from normal speech, thus ideally subtracting away activations related to the processing of the segmental content of speech while leaving prosody-related activations. The results from these studies, however, are mixed. Meyer et al. (2004) did not observe stronger activations for normal speech than for low-prosody speech. It is possible that this finding is due to the task, which required a prosody comparison between two successive sentence stimuli in an experimental setting in which flattened speech [-prosodic information] and degraded speech [-segmental information] were presented in a pseudo-randomized order. Hesling et al. (2005) observed different results for intelligible and nonintelligible (i.e., low-pass filtered) speech. In the case of high-prosody intelligible speech, the right superior temporal gyrus was more strongly activated than in the case of low-prosody intelligible speech. High-prosody nonintelligible speech did not activate any brain regions more strongly than low-prosody nonintelligible speech. These results indicate that the additional prosodic information contained in high-prosody speech activates the superior temporal gyrus, although not very strongly. 
Finally, a study by Doherty and colleagues (2004) compared sentences with a rising pitch (question) to sentences with a falling pitch (statement). Although their results might not exclusively reflect prosody processing, they observed, among others, a stronger activation within the superior temporal gyrus, bilaterally, for sentences with a rising pitch. These findings also indicate that the superior temporal gyrus might play a dominant role in the first
stages of prosody processing. The results of the present study, namely, that areas within and around the superior temporal gyrus are involved in IPB processing therefore agree well with previous results. The lateralization of prosody processing Although a right hemispheric dominance in prosody processing was initially suggested based on evidence from patients, the results with regard to the perception of sentence-level nonaffective prosody show a mixed picture. Some studies show greater impairment in RHD patients compared to LHD and controls (Bryan, 1989), whereas others find greater impairment for LHD (Pell & Baum, 1997; Perkins et al., 1996), or a more complicated pattern of preserved function alongside impairments for both patient groups compared to controls (Baum & Dwivedi, 2003; Baum et al., 1997; Walker et al., 2001; Imaizumi et al., 1998). A possible reason for these differences might be that the patient groups comprise patients with non-overlapping patterns of brain damage, often in fronto-parietal areas. Cognitive functioning in these patients might therefore be compromised in a number of domains, such as, for example, working memory and drawing inferences, faculties required in complex tasks such as prosodic disambiguation of sentences. Imaging studies investigating the perception of sentence-level prosody with healthy adults circumvent this problem and allow a separate assessment of lateralization for different brain regions. A number of imaging studies have investigated the lateralization of prosodic processing (Hesling et al., 2005; Meyer et al., 2002, 2004; Kotz et al., 2003). Meyer et al. (2002, 2004) observed stronger right-hemispheric activations within fronto-temporal areas for low-pass filtered speech than for natural speech whereas Kotz et al. (2003) using natural speech stimuli observed predominantly bilateral activations. Hesling et al. 
(2005) found activation within the right superior temporal gyrus when high-prosody intelligible speech was compared to low-prosody intelligible speech. Another way to remove the propositional content of speech while preserving its prosodic contour is to present foreign language
materials to listeners with no knowledge of this language. In an experiment by Tong et al. (2005), bilateral activation was observed for English speakers listening to Chinese language materials for eight out of nine ROIs investigated. A stronger right-hemispheric activation was observed for only one ROI within the middle frontal gyrus. Tong et al. also found mostly bilateral activation (six out of nine ROIs) for Chinese participants listening to Chinese, and a dependence of lateralization on the type of task. This evidence indicates that prosody, when presented together with segmental information, is subserved by a bilaterally distributed network of fronto-temporo-parietal brain areas. In the present study, hummed sentence materials were presented as well as natural speech. Under the hypothesis that the right hemisphere is specialized in the processing of sentence-level prosody, the processing of hummed speech compared to baseline was expected to show right-hemispheric lateralization. In the whole brain analysis, bilateral activation was observed in the superior temporal gyrus and the precentral gyrus, and stronger activation of the right hemisphere was observed within the middle frontal gyrus for hummed sentences. In the ROI-analysis, six out of eight ROIs showed stronger activation within the left hemisphere for natural speech, whereas only four out of eight ROIs showed a left-hemispheric dominance for hummed sentences. This suggests that prosodic processing is organized more bilaterally than the processing of other properties of speech, such as syntax or lexico-semantics. Further insights about the lateralization with regard to the processing of specific linguistic aspects of prosody can be derived from the lateralization of IPB processing. 
An interaction between the number of IPBs and lateralization was observed only in two ROIs (posterior STG, gyrus of Heschl) with both ROIs showing a stronger modulation of activation due to the number of IPBs in the left hemisphere, for natural speech as well as hummed sentences. This could be taken to indicate that these two temporal areas show a left-hemispheric dominance for prosody processing.
The left-hemispheric dominance in the temporal regions observed in the present study might be due to some degree to the task employed. While other studies required participants to attend directly to the prosodic information of the speech stimuli (Tong et al. 2005; Meyer et al., 2002, 2004), a probe detection task was used here that required participants, even in the hummed speech condition, to attend to and memorize a potentially appearing naturally spoken word carrying segmental information. The results of the present study are therefore compatible with the functional or task-dependent class of hypotheses. According to this class of hypotheses, the lateralization of prosody processing could shift from the right to the left hemisphere when the task or the stimulus materials promote attention to syntactic-semantic rather than prosodic properties of the materials.
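Hemispheric asymmetry of the kind discussed above is often summarized with a laterality index. This is not the analysis reported in this study, which relied on an ANOVA with Newman-Keuls post-hoc tests; the sketch below is only an illustration of the general idea. The function name and the input values are hypothetical, and the index is assumed to be applied to non-negative activation measures.

```python
def laterality_index(left, right):
    """Return (L - R) / (L + R).

    Positive values indicate left-hemispheric dominance, negative values
    right-hemispheric dominance. Assumes non-negative inputs.
    """
    if left < 0 or right < 0:
        raise ValueError("laterality index assumes non-negative activation values")
    total = left + right
    if total == 0:
        raise ValueError("laterality index is undefined when both values are zero")
    return (left - right) / total


# Hypothetical mean effect sizes for one ROI (not data from the study)
print(laterality_index(3.0, 1.0))  # 0.5, left-dominant
print(laterality_index(2.0, 2.0))  # 0.0, bilateral
```

Such an index describes the direction and size of an asymmetry but does not by itself test whether the asymmetry is statistically reliable.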
Conclusion This study aimed at identifying the brain areas involved in the processing of sentence-level prosody, and in particular, the processing of intonational phrase boundaries (IPBs). Sentences with two IPBs activated the superior temporal gyrus, bilaterally, more strongly than sentences with one intonational phrase boundary. This pattern of activation was very similar for natural speech and hummed sentences. The results from the ROI analysis suggest that the posterior Rolandic operculum might play a specific role in the processing of prosodic information, because it was the only ROI not showing an influence of the type of speech materials (hummed sentences or natural speech). When comparing natural speech and hummed sentence materials, we found natural speech to activate a number of areas in the left hemisphere more strongly than hummed sentences. The left-hemispheric dominance of temporal activations observed for hummed sentences, however, might be due to the attentional focus on segmental information required by the task employed in the present study.
References Adriani, M., Maeder, P., Meuli, R., Bellmann, A., Frischknecht, R., Villemure, J.-G., Mayer, J., Annoni, J.-M., Bogousslavsky, J., Fornari, E., Thiran, J.-P., Clarke, S. 2003. Sound recognition and localization in man: specialized cortical networks and effects of acute circumscribed lesions. Exp Brain Res 153:591-604. Austin, J. 1975. How to do things with words. (William James Lectures). 2nd edition. M. Sbisa, J. Urmson (Eds). Cambridge, Mass: Harvard University Press. Baum, S. R., Dwivedi, V. D. 2003. Sensitivity to prosodic structure in left- and right-hemisphere-damaged individuals. Brain Lang 87:278-289. Baum, S., Pell, M., Leonard, C., Gordon, J. 1997. The ability of right- and left-hemisphere-damaged individuals to produce and interpret prosodic cues marking phrasal boundaries. Lang Speech 40:313–330. Binder, J. R., Frost, J. A., Hammeke, T. A., Bellgowan, P. S. F., Springer, J. A., Kaufman, J. N., Possing, E. T. 2000. Human temporal lobe activation by speech and nonspeech sounds. Cereb Cortex 10:512–528. Brett, M., Christoff, K., Cusack, R., Lancaster, J. 2001. Using the Talairach atlas with the MNI template. Neuroimage 13:S85. Bryan, K. 1989. Language prosody and the right hemisphere. Aphasiology 3:285–299. Clarke, S., Bellmann, A., Meuli, R., Assal, G., Steck, A. J. 2000. Auditory agnosia and auditory spatial deficits following left hemispheric lesions: evidence for distinct processing pathways. Neuropsychologia 38:797-807. Cooper, W. E., Paccia-Cooper, J. 1981. Syntax and Speech. Harvard University Press. Doherty, C. P., West, W. C., Dilley, L. C., Shattuck-Hufnagel, S., Caplan, D. 2004. Question/statement judgments: An fMRI study of intonation processing. Hum Brain Mapp 23:85–98.
Frazier, L., Carlson, K., Clifton, C. 2006. Prosodic phrasing is central to language comprehension. Trends Cogn Sci 10:244-249. Friederici, A. D., Alter, K. 2004. Lateralization of auditory language functions: A dynamic dual pathway model. Brain Lang 89:267-276. Friederici, A. D., von Cramon, D. Y., Kotz, S. A. 2007. Role of the corpus callosum in speech comprehension: Interfacing syntax and prosody. Neuron 53:135-145. Gandour, J., Dzemidzic, M., Wong, D., Lowe, M., Tong, Y., Hsieh, L., Satthamnuwong, N., Lurito, J. 2003. Temporal integration of speech prosody is shaped by language experience: An fMRI study. Brain Lang 84:318-336. Gandour, J., Wong, D., Lowe, M., Dzemidzic, M., Satthamnuwong, N., Tong, Y., Li, X. 2002. A cross-linguistic fMRI study of spectral and temporal cues underlying phonological processing. J Cogn Neurosci 14:1076–1087. Gandour, J., Tong, Y., Wong, D., Talavage, T., Dzemidzic, M., Xu, Y., Li, X., Lowe, M. 2004. Hemispheric roles in the perception of speech prosody. Neuroimage 23:344-357. Geschwind, N. 1965. Disconnexion syndromes in animals and man: Part I. Brain 88:237–294. Grabe, E., Warren, P., Nolan, F. 1994. Resolving category ambiguities - evidence from stress shift. Speech Communication 15:101-114. Griffiths, T. D., Giraud, A. L. 2004. Auditory Function. In: Human Brain Function. 2nd ed. R. S. J. Frackowiak, K. J. Friston, C. D. Frith, R. J. Dolan, C. J. Price, S. Zeki, J. Ashburner, W. Penny (Eds). Amsterdam, NL: Elsevier. pp. 61-75. Griffiths, T. D., Warren, J. D., Scott, S. K., Nelken, I., King, A. J. 2004. Cortical processing of complex sound: a way forward? Trends Neurosci 27:181-185. Griffiths, T. D., Büchel, C., Frackowiak, R. S. J., Patterson, R. D. 1998. Analysis of temporal structure in sound by the human brain. Nat Neurosci 1:422–427.
Grosjean, F., Hirt, C. 1996. Using prosody to predict the end of sentences in English and French: Normal and brain-damaged subjects. Lang Cogn Proc 11:107-134. Herrmann, C. S., Friederici, A. D., Oertel, U., Maess, B., Hahne, A., Alter, K. 2003. The brain generates its own sentence melody: A Gestalt phenomenon in speech perception. Brain Lang 85:396-401. Hesling, I., Clement, S., Bordessoules, M., Allard, M. 2005. Cerebral mechanisms of prosodic integration: evidence from connected speech. Neuroimage 24:937-947. Imaizumi, S., Mori, K., Kiritani, S., Hiroshi, H., Tonoike, M. 1998. Task-dependent laterality for cue decoding during spoken language processing. Neuroreport 9:899–903. Isel, F., Alter, K., Friederici, A. D. 2005. Influence of prosodic information on the processing of split particles: ERP evidence from spoken German. J Cogn Neurosci 17:154-167. Kotz, S. A., Meyer, M., Alter, K., Besson, M., von Cramon, D. Y., Friederici, A. D. 2003. On the lateralization of emotional prosody: an event-related functional MR investigation. Brain Lang 86:366–376. Kreiman, J. 1982. Perception of sentence and paragraph boundaries in natural conversation. Journal of Phonetics 10:163-175. Larkey, L. S. 1983. Reiterant speech: an acoustic and perceptual validation. J Acoust Soc Am 73:1337-1345. Liu, T. T., Frank, L. R., Wong, E. C., Buxton, R. B. 2001. Detection power, estimation efficiency and predictability in event-related fMRI. Neuroimage 13:759–773. Maeder, P. P., Meuli, R. A., Adriani, M., Bellmann, A., Fornari, E., Thiran, J.-P., Pittet, A., Clarke, S. 2001. Distinct Pathways Involved in Sound Recognition and Localization: A Human fMRI Study. Neuroimage 14:802-816. Meyer, M., Alter, K., Friederici, A. D., Lohmann, G., von Cramon, D. Y. 2002. FMRI reveals brain regions mediating slow prosodic modulations in spoken sentences. Hum Brain Mapp 17:73–88.
Meyer, M., Steinhauer, K., Alter, K., Friederici, A. D., von Cramon, D. Y. 2004. Brain activity varies with modulation of dynamic pitch variance in sentence melody. Brain Lang. 89:277-289. Narain, C., Scott, S. K., Wise, R. J. S., Rosen, S., Leff, A., Iversen, S. D., Matthews, P. M. 2003. Defining a left-lateralized response specific to intelligible speech using fMRI. Cereb. Cortex. 13:1362–1368. Nespor, M., Vogel I. 1983. Prosodic structure above the word. In: Prosody: Models and Measurements. A. Cutler, D. R. Ladd (Eds.) Berlin: Springer Verlag. pp. 123-140. Pannekamp, A., Toepel, U., Alter, K., Hahne, A., Friederici, A.D. 2005. Prosody-driven Sentence Processing: An Event-related Brain Potential Study. J Cogn Neurosci 17:407-421. Patterson, R. D., Uppenkamp, S., Johnsrude, I., Griffiths, T. D. 2002. The Processing of Temporal Pitch and Melody Information in Auditory Cortex. Neuron 36:767–776. Pell, M., Baum, S. 1997. The ability to perceive and comprehend intonation in linguistic and affective contexts by brain-damaged adults. Brain Lang 57:80–99. Perkins, J. M., Baran, J. A., Gandour, J. 1996. Hemispheric specialization in processing intonation contours. Aphasiology 10:343–362. Plante, E., Creusere, M., Sabin, C. 2002. Dissociating sentential prosody from sentence processing: Activation interacts with task demands. Neuroimage. 17:401–410. Poeppel, D. 2003. The analysis of speech in different temporal integration windows: Cerebral lateralization as asymmetric sampling in time. Speech Comm. 41:245–255. Price, C., Thierry, G., Griffiths, T. 2005. Speech-specific auditory processing: where is it? Trends Cogn Sci. 9:271-276. Rauschecker, J. P., Tian, B. 2000. Mechanisms and streams for processing of ‘what’ and ‘where’ in auditory cortex. Proc Natl Acad Sci 97:11800–11806.
Recanzone, G. H. 2000. Spatial processing in the auditory cortex of the macaque monkey. Proc Natl Acad Sci 97:11829-11835. Rilliard, A., Aubergé, V. 1998. Reiterant Speech for the Evaluation of Natural vs. Synthetic Prosody. In: Third ESCA/COCOSDA Workshop on Speech Synthesis. Australia. 1998. 87-92. Rooij, J. J. de, 1975. Prosody and the perception of syntactic boundaries. IPO Annual Progress Report 10:36-39. Rooij, J. J. de, 1976. Perception of prosodic boundaries. IPO Annual Progress Report 11:20-24. Schafer, A. J., Speer, S. R., Warren, P., White, S. D. 2000. Intonational disambiguation in sentence production and comprehension. J Psycholing Res 29:169-182. Schmahmann, J. D., Doyon, J., McDonald, D., Holmes, C., Lavoie, K., Hurwitz, A. S., Kabani, N., Toga, A., Evans, A., Petrides, M. 1999. Three-Dimensional MRI Atlas of the Human Cerebellum in Proportional Stereotaxic Space. Neuroimage 10:233-260. Scott, S. K., Blank, C. C., Rosen, S., Wise, R. J. S. 2000. Identification of a pathway for intelligible speech in the left temporal lobe. Brain 123:2400–2406. Searle, J. 1969. An essay in the philosophy of language. Cambridge, MA: Cambridge University Press. Selkirk, E. 1995. Sentence prosody: intonation, stress, and phrasing. In: The Handbook of Phonological Theory. J. A. Goldsmith (Ed.). Cambridge, MA: Blackwell. pp. 550-569. Selkirk, E. 2000. The interaction of constraints on prosodic phrasing. In: Prosody: Theory and Experiment. M. Horne (Ed.). Dordrecht: Kluwer Academic Publishing. pp. 231-262. Spitsyna, G., Warren, J. E., Scott, S. K., Turkheimer, F. E., Wise, R. J. S. 2006. Converging Language Streams in the Human Temporal Lobe. J Neurosci 26:7328-7336. Stark, C. E. L., Squire, L. R. 2001. When zero is not zero: The problem of ambiguous baseline conditions in fMRI. Proc Natl Acad Sci 98:12760–12766.
Steinhauer, K. 2003. Electrophysiological correlates of prosody and punctuation. Brain Lang 86:142–164.
Steinhauer, K., Alter, K., Friederici, A. D. 1999. Brain potentials indicate immediate use of prosodic cues in natural speech processing. Nat Neurosci 2:191–196.
Steinhauer, K., Friederici, A. D. 2001. Prosodic boundaries, comma rules, and brain responses: The closure positive shift in ERPs as a universal marker for prosodic phrasing in listeners and readers. J Psycholing Res 30:267–295.
Strelnikov, K. N., Vorobyev, V. A., Chernigovskaya, T. V., Medvedeva, S. V. 2006. Prosodic clues to syntactic processing - a PET and ERP study. Neuroimage 29:1127–1134.
Talairach, J., Tournoux, P. 1988. Co-planar stereotaxic atlas of the human brain. New York: Thieme.
Tervaniemi, M., Szameitat, A., Kruck, S., Schröger, E., Alter, K., De Baene, W., Friederici, A. D. 2006. From air oscillations to music and speech: functional magnetic resonance imaging evidence for fine-tuned neural networks in audition. J Neurosci 26:8647–8652.
't Hart, J., Collier, R., Cohen, A. 1990. A perceptual study of intonation: an experimental-phonetic approach to speech melody. Cambridge, England: Cambridge University Press.
Tian, B., Reser, D., Durham, A., Kustov, A., Rauschecker, J. 2001. Functional specialization in rhesus monkey auditory cortex. Science 292:290–293.
Tomasi, D., Ernst, T., Caparelli, E. C., Chang, L. 2006. Common deactivation patterns during working memory and visual attention tasks: an intra-subject fMRI study at 4 Tesla. Hum Brain Mapp 27:694–705.
Tong, Y., Gandour, J., Talavage, T., Wong, D., Dzemidzic, M., Xu, Y., Li, X., Lowe, M. 2005. Neural circuitry underlying sentence-level linguistic prosody. Neuroimage 28:417–428.
Truckenbrodt, H. 2005. A short report on intonation phrase boundaries in German. Linguistische Berichte 203:273–296.
Tzourio-Mazoyer, N., Landeau, B., Papathanassiou, D., Crivello, F., Etard, O., Delcroix, N., Mazoyer, B., Joliot, M. 2002. Automated anatomical labelling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single subject brain. Neuroimage 15:273–289.
Ungerleider, L., Mishkin, M. 1982. Two cortical visual systems. In: Analysis of visual behavior. D. J. Ingle, M. A. Goodale, R. J. W. Mansfield (Eds.). Cambridge, MA: MIT Press. pp. 549–586.
Van Lancker, D. 1980. Cerebral lateralization of pitch cues in the linguistic signal. Papers in Linguistics: Int J Human Communication 13:200–277.
Walker, J. P., Fongemie, K., Daigle, T. 2001. Prosodic facilitation in the resolution of syntactic ambiguities in subjects with left and right hemisphere damage. Brain Lang 78:169–196.
Wildgruber, D., Hertrich, I., Riecker, A., Erb, M., Anders, S., Grodd, W., Ackermann, H. 2004. Distinct frontal regions subserve evaluation of linguistic and emotional aspects of speech intonation. Cereb Cortex 14:1384–1389.
Wong, P. C. M. 2002. Hemispheric specialization of linguistic pitch patterns. Brain Res Bull 59:83–95.
Zatorre, R. J., Belin, P. 2001. Spectral and temporal processing in human auditory cortex. Cereb Cortex 11:946–953.
Zatorre, R. J., Bouffard, M., Ahad, P., Belin, P. 2002. Where is ‘where’ in the human auditory cortex? Nat Neurosci 5:905–909.
This work was supported by a grant from the Human Frontier Science Program (HFSP RGP5300/2002-C102) awarded to Kai Alter.
Table 1. Example of the stimulus materials and mean length of the sentences in seconds.
Sentences (naturally spoken and hummed); # marks an intonational phrase boundary (IPB).
Type A
1 IPB: Peter verspricht Anna zu arbeiten # und das Büro zu putzen.
Peter promises Anna to work and to clean the office.
2 IPBs: Peter verspricht # Anna zu entlasten # und das Büro zu putzen.
Peter promises to support Anna and to clean the office.
Type B
1 IPB: Otto bringt Fleisch, # Ute und Georg kaufen Salat und Säfte für das Grillfest.
Otto contributes meat, Ute and Georg buy salad and soft drinks for the barbecue.
2 IPBs: Otto bringt Fleisch, # Ute kauft Salat # und Georg kauft Säfte für das Grillfest.
Otto contributes meat, Ute buys salad and Georg buys soft drinks for the barbecue.
Table 2. Contrasts are thresholded at p < 0.001 uncorrected, p < 0.05 corrected on cluster level. Coordinates are reported as given by SPM2 (MNI space) and correspond only approximately to Talairach and Tournoux space (Talairach and Tournoux, 1988; Brett et al., 2001). Anatomical labels are based on the classification of the AAL (automated anatomical labeling) atlas (Tzourio-Mazoyer et al., 2002). Cerebellar labels within the AAL atlas are based on Schmahmann et al. (1999). The first label denotes the location of the maximum; the following labels denote further areas containing a majority of the voxels of the activated cluster. Abbreviations: IFG_oper = inferior frontal gyrus, pars opercularis, IFG_tri = inferior frontal gyrus, pars triangularis, MFG = middle frontal gyrus, SFG = superior frontal gyrus, SMA = supplementary motor area, TP = temporal pole, MTG = middle temporal gyrus, STG = superior temporal gyrus, AG = angular gyrus, ROp = Rolandic operculum, HeschlG = gyrus of Heschl, SPL = superior parietal lobule, IPL = inferior parietal lobule, PrecG = precentral gyrus, PostG = postcentral gyrus, supramG = supramarginal gyrus, k = cluster size. *Activation is part of a bigger cluster. 1 Inclusively masked with natural speech > baseline (hummed sentences > baseline) at p < 0.05.
Side
x
Natural Speech > Baseline
Frontal
Left
Right SMA 3 15
Right Insula* 36 24
Right PrecG* 30 -6
Right PrecG* 54 3
Temporal
Left STG -60 -24
Right STG* 66 -18
Parietal
Left SPL -30 -57
Cerebellum
Left
Right
Right Crus1 42 -57
Natural > Hummed Sentences1
Frontal
Left IFG_tri, IFG_oper -39 30
Right SMA 6 15
Right IFG_tri 45 27
Right MFG 33 3
Temporal
Left TP, MTG, STG -57 9
Right MTG, STG 63 3
Parietal
Left AG, SPL -27 -63
Basal Ganglia
Left Thalamus, Caudate -15 -12
Natural Speech: 2 pauses > 1 pause
Temporo-parietal
Left HeschlG, STG, ROp -45 -18
Right STG, ROp, HeschlG 42 3
Natural Speech: 1 pause > 2 pauses ns
x Hummed Sentences > Baseline
54 6 30 48
7897 7897 7897 7897
5.36 4.77 4.63 4.09
SMA* MFG* PrecG
3 27 51
-3 36 -3
66 30 0
4928 4928 4928
5.81 4.42 6.40
-54 -36 -57
-36 -36 -36
92 93 87
4.00 4.04 3.82
0 54 18 54
600 209 50 65
5.13 5.42 4.13 4.30
Crus1 -42
VI 33
Crus1 45
Hummed > Natural Speech1 ns
Hummed sentences: 2 pauses > 1 pause 12 -12
SupramG, STG, HeschlG
Hummed sentences: 1 pause > 2 pauses ns
Figure 1. Spectrograms (0 - 5000 Hz) and fundamental frequency contours (pitch: 0 - 500 Hz) for sample stimuli of the sentence materials used for all conditions of the experiment. Natural speech is presented on the left, the respective hummed version of the sentence on the right. Spectrograms and pitch contours are given for each pair (1 and 2 IPBs) of sentence materials, for type A sentences in the top half, for type B sentences in the bottom half.
Figure 2. Trial timing scheme.
Figure 3. A visualization of the six temporal ROIs as an ascending series of axial slices (shown only for the left hemisphere). ROIs: anterior (y > -15, light blue), middle (-15 > y > -35, yellow) and posterior (y < -35, brown) part of the superior temporal gyrus, gyrus of Heschl (dark blue), anterior (y > -35, medium blue) and posterior (y < -35, orange) part of the Rolandic operculum.
Figure 4. a) Comparison of sentences with 2 IPBs to sentences with 1 IPB: 2 IPBs > 1 IPB. Natural speech (left) and hummed sentences (right). b) Comparison of speech types: Natural speech > hummed sentences. Threshold: p < 0.001, uncorrected, showing only clusters with more than 40 voxels, corresponding to p < 0.05 corrected on cluster level. Left is left in the image.
Figure 5. Results of the ROI analysis (bars with black stripes = left hemisphere, white bars = right hemisphere). Effect sizes were averaged over all voxels of each ROI. Error bars represent the standard error of the mean. Abbreviations: n1 = natural speech with 1 IPB, n2 = natural speech with 2 IPBs, h1 = hummed sentences with 1 IPB, h2 = hummed sentences with 2 IPBs, IFG oper = inferior frontal gyrus, pars opercularis, IFG tri = inferior frontal gyrus, pars triangularis, ant/post RolOp = anterior/posterior part of the Rolandic operculum, ant/mid/post STG = anterior/middle/posterior part of the superior temporal gyrus, Heschl = gyrus of Heschl.
Footnotes 1. Although the term 'activation' suggests an absolute value, it is, in the case of fMRI data, always relative. There are two reasons for this. First, the statistical analysis only evaluates differences between conditions. Second, it is generally difficult to define an absolute baseline in brain activation for physiological reasons (Stark & Squire, 2001; Tomasi et al., 2006). In the present article, the term 'activation' is used only when an experimental condition is compared to a low-level baseline. For comparisons between conditions, the term 'activation difference' is used, or the term 'activation' with an adjective indicating the direction of the difference (e.g., 'stronger').