Emotion 2008, Vol. 8, No. 6, 838 – 849

Copyright 2008 by the American Psychological Association 1528-3542/08/$12.00 DOI: 10.1037/a0014080

Emotional Intelligence, Not Music Training, Predicts Recognition of Emotional Speech Prosody

Christopher G. Trimmer and Lola L. Cuddy
Queen's University at Kingston

Is music training associated with greater sensitivity to emotional prosody in speech? University undergraduates (n = 100) were asked to identify the emotion conveyed in both semantically neutral utterances and melodic analogues that preserved the fundamental frequency contour and intensity pattern of the utterances. Utterances were expressed in four basic emotional tones (anger, fear, joy, sadness) and in a neutral condition. Participants also completed an extended questionnaire about music education and activities, and a battery of tests to assess emotional intelligence, musical perception and memory, and fluid intelligence. Emotional intelligence, not music training or music perception abilities, successfully predicted identification of intended emotion in speech and melodic analogues. The ability to recognize cues of emotion accurately and efficiently across domains may reflect the operation of a cross-modal processor that does not rely on gains of perceptual sensitivity such as those related to music training.

Keywords: emotional intelligence, music training, emotional prosody, speech, intonation

Music and language share many commonalities, which has led to ongoing inquiry about their degree of association at cognitive and neural levels. Evolutionary theorists have typically ascribed a common origin for language and song due to many of these shared properties (Fitch, 2006). Current research has suggested an overlap of both neural architecture and cognitive processes based on shared structural properties (e.g., Patel, 2003, 2008) and low-level sound-based features of music and language (e.g., Schön, 2007). Conversely, neuropsychological research has suggested a distinct modularity of music and language neural components (Peretz & Coltheart, 2003). Establishing an association or dissociation between music training and sensitivity to emotional speech prosody will contribute to the overall study of music and language relations. Controversies in that study include whether or not the acoustical and structural similarities between music and language are perceptually discernible, are subject to similar processes, or implicate shared neural resources.

We first consider the plausible hypothesis that similarity between musical melody and speech prosody points to a link between music training and sensitivity to emotional prosody. We note, however, that the status of this relation is far from clear. We next propose that sensitivity to emotional prosody may be an aspect of emotional intelligence, not music training, and present a study that supports this proposal.

Prosody and Melody

Both speech prosody and musical melody are acoustic-based forms of nonverbal communication. Prosody can be described as either linguistic—the local variations of acoustic cues that alter the meaning of an utterance; or emotional—the global pattern of acoustic cues that dictate the intended emotional tone of a speaker. Both linguistic and emotional prosody, along with musical melody, are characterized in terms of patterns of pitch, loudness, tempo, rhythm, and timbre. Studies of linguistic prosody typically examine sensitivity to statement versus question discrimination or discrimination of focus-shift utterances (e.g., Ayotte, Peretz, & Hyde, 2002; Patel, Foxton, & Griffiths, 2005). Statement versus question discrimination involves detecting pitch shifts—up or down—at the end of utterances. Focus-shift involves pitch movement associated with a single word in a sentence; a change in pitch movement alters the meaning of the utterance ("SING now please" vs. "Sing NOW, please"—an example from Ayotte et al., 2002, p. 244). By contrast, the study of emotional prosody involves the recognition of multiple acoustic cues that convey an emotional state of the speaker. The general trends of acoustic cues for emotional prosody and music were gathered by Juslin and Laukka (2003; see also Scherer, 1986). For example, anger in both speech and music is characterized by fast production rate, high intensity level, high variability of intensity, much high-frequency energy, high F0 (fundamental frequency) level, high F0 variability, rising F0 contour, rapid event attacks, and microstructural irregularity. Juslin and Laukka (2003) interpreted the consistencies in cues for anger, fear, joy, sadness, and tenderness as indicating the existence of similar codes, and possibly the use of similar or the same neural resources, for emotional speech prosody and music. Acoustic similarities across domains for prosody and melody have led to speculation that training on one domain will lead to enhanced sensitivity to the other.

Author Note

Christopher G. Trimmer and Lola L. Cuddy, Department of Psychology, Queen's University, Kingston, Ontario, Canada. Research was supported by a Discovery Grant to Lola L. Cuddy from the Natural Sciences and Engineering Research Council of Canada. We thank Ingrid Johnsrude and Kevin Munhall for support and advice, Elizabeth Alexander and Lianne Wong for the audiovisual check of the gliding tone analogues, and Carol L. Krumhansl and W. F. Thompson for reading a draft of the manuscript. We also thank Meagan Curtis and two anonymous reviewers for many valuable suggestions and directions. Portions of the research were presented by Christopher G. Trimmer in a thesis submitted to Queen's University for the M.A. degree (2006) and at Music and Language I, held at Cambridge University, United Kingdom (May, 2007). Correspondence concerning this article should be addressed to Lola L. Cuddy, Department of Psychology, 62 Arch St., Queen's University, Kingston, ON K7L 3N6, Canada. E-mail: [email protected]

Emotional Speech Prosody and Music Training

A recent resurgence of empirical research on the affective consequences of music has highlighted the effectiveness of the musician's ability to communicate emotion with music (Juslin & Sloboda, 2001). It is quite plausible to suppose that musicians, due to their intensive involvement with an additional form of nonverbal emotional communication, possess a processing advantage for recognizing emotional prosody. Indeed, evidence from studies of linguistic prosody supports this notion. Music training is associated with a pitch and rhythmic processing advantage for both musical melody and linguistic prosody (Marques, Moreno, Castro, & Besson, 2007; Schön, Magne, & Besson, 2004). Moreover, the presence of music processing difficulties (either congenital or acquired) is associated with lower accuracy in processing linguistic prosody (Ayotte et al., 2002; Nicholson, Baum, Cuddy, & Munhall, 2002; Nicholson et al., 2003; Patel et al., 2005; Patel, Peretz, Tramo, & Lebreque, 1998; Patel, Wong, Foxton, Lochy, & Peretz, 2008).

Empirical evidence with regard to emotional expression has been reported by Nilsonne and Sundberg (1985). Music students were superior to law students at identifying the emotional state of a speaker (depressed or nondepressed) from the fundamental frequency contours of various speech utterances. A more recent study, by Thompson, Schellenberg, and Husain (2004), addressed differences between musicians and nonmusicians in decoding emotional speech prosody. Thompson and colleagues (2004) presented musically trained and untrained participants with stimuli intended to express anger, fear, joy, and sadness. Stimuli were either melodic analogues of spoken utterances (Experiment [Exp.] 1) or both spoken utterances and melodic analogues (Exp. 2). Spoken utterances had neutral semantic content. Analogues were sequences of discrete tones derived from spoken utterances; they preserved the pitch and timing contours of the utterances. Musically trained participants were significantly more accurate at identifying intended emotions than untrained participants. As well, in Exp. 2, they were more accurate both for emotions conveyed in a familiar language (English) and for those conveyed in an unfamiliar language (Tagalog, a language spoken in the Philippines). In a final experiment (Exp. 3), children assigned to music lessons for a year were superior to those who had not been given lessons at identifying the emotional content of utterances.

The data imply a relation between music training and emotional sensitivity to prosody, but the conclusion is not straightforward. In the first experiment of Thompson et al. (2004) the untrained participants scored at chance; they may not have understood the requirements of the task. In the second experiment, the advantage for musicians held only for some of the intended emotions. There was no main effect of training. This result is difficult to interpret.


In the third experiment, children given drama lessons also outperformed those with no lessons and did not differ from the children given music lessons. Thus the effect of intervention was not specific to music. In sum, although the evidence to date is suggestive of a relation between music training and emotional sensitivity, the issue should be revisited before the findings may be judged reliable and a mechanism proposed to account for the relation.

Emotional Speech Prosody, Music Training, and Emotional Intelligence

Thompson et al. (2004) hypothesized that music lessons would accelerate age-related improvements in perceiving emotions. This hypothesis raises the possibility that aspects of emotional intelligence are mediating factors for a relation between music training and sensitivity to emotional prosody. However, two independent investigations (Resnicow, Salovey, & Repp, 2004; Schellenberg, 2006) reported no differences between musicians and nonmusicians on the Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT; Mayer, Salovey, & Caruso, 2002a). The MSCEIT is an ability-based test requiring judgments in four main branches of emotional intelligence: perceiving emotions, facilitating thought about emotions, understanding emotions, and managing one's own emotions. See Figure 1 for the general structure and levels of the MSCEIT. Sample items, similar to test items, are given in Salovey and Grewal (2005). The finding of no relation between music training and emotional intelligence is important, because it implies no mediating effect of emotional intelligence on the relation between music training and sensitivity to emotional prosody. In statistical terms, the assumptions of a mediational analysis are not met.

The study by Resnicow et al. (2004) suggests an intriguing alternative to account for individual variance in sensitivity to emotional prosody. Resnicow et al. (2004) asked undergraduate students to complete, in addition to the MSCEIT, a listening test that required identifying the intended emotions of musical performances. Each of several pieces of classical music was presented five times, with intent to convey either one of four emotions—happiness, sadness, anger, and fearfulness—or a neutral state. Recognition of intended emotion was significantly correlated with emotional intelligence scores but not music training. Further, the experiential area score (left-hand side of Figure 1), not the strategic area score (right-hand side), was responsible for the significant correlation. The experiential area score represents a respondent's ability to perceive emotion expressed by faces and abstract designs and to use perceived emotion to facilitate thought, while the strategic area score represents the ability to understand emotional information for the purpose of planning and self-management (Mayer, Salovey, & Caruso, 2002b). Resnicow et al. (2004) proposed that, although the listening test involved musical/auditory items and the experiential scores of the MSCEIT did not, both tests draw on an ability to understand or to empathize with the correlates of emotional expression. Resnicow et al.'s (2004) findings apply to sensitivity to musically expressed emotion, but they may apply to verbally expressed emotion as well. Indeed, the authors further hypothesized that recognition of emotion in speech is an important aspect of emotional intelligence.


[Figure 1 diagrams the MSCEIT hierarchy: the total score comprises the Experiential and Strategic area scores; the Experiential area comprises the Perceiving branch (Faces and Pictures tasks) and the Facilitating branch (Sensations and Facilitation tasks); the Strategic area comprises the Understanding branch (Blends and Changes tasks) and the Managing branch (Emotional Management and Emotional Relations tasks).]

Figure 1. Hierarchy of the Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT) (Total score > area scores > branch scores > task scores). From Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT) User's Manual by J. D. Mayer, P. Salovey, and D. Caruso, 2002b, p. 71. Copyright 2002 by Multi-Health Systems Publishers. Redrawn with permission.

The Present Study

The present study examines two hypotheses. First, it revisits the Thompson et al. (2004) study to examine the hypothesis that sensitivity to emotional prosody is related to music training. Second, it addresses the hypothesis that sensitivity to emotional prosody is an aspect of emotional intelligence. Sensitivity to emotional prosody was the outcome variable, and we examined its relation to the main predictor variables of interest—music training and emotional intelligence. The procedure of Thompson et al. (2004) was followed with a number of modifications and additions: data were collected from a greater number of participants, more information was obtained about music training and experiences, a larger sample of emotional prosody stimuli was presented, and a scoring method to accommodate response bias was used. Stronger evidence for the relation between music training and sensitivity to emotional prosody should be revealed with the greater power of the present design. We also assessed fluid intelligence (as did Thompson et al., 2004) and added a test of music perception and memory, to be used as control variables.

Another feature of the present study was a different construction of the melodic analogues of the utterances. Melodic analogues are intended to be devoid of verbal content while conveying the pitch and timing dimensions of prosody—that is, to capture the structure of the ongoing fundamental frequency (F0) contour considered critical in decoding emotion (Ladd, Silverman, Bergmann, & Scherer, 1985). Thompson et al. (2004) presented sequences of discrete tones, of equal amplitude, that preserved the modal fundamental frequency and duration of each syllable of a speech utterance and thus the segmentation and speech rate of the utterance. However, listeners found the decoding of the intended emotion difficult (in comparison to spoken utterances), a finding suggesting that preservation of segmentation and rate was not sufficient. We queried whether preservation of the transitions (glides) in the F0 contour might be more effective and might lead to better performance. We isolated the F0 contour as a continuous glide, preserving pitch, timing, and amplitude variation within utterances (see Patel et al., 2005, for a comparable manipulation of focus-shift utterances).

In sum, the present study was designed to connect several avenues of investigation—music training, emotional intelligence, and emotional perception in music and language.

Method

Measures

Sensitivity to emotional prosody was assessed by the ability to decode emotion in prerecorded, semantically neutral English utterances and in gliding tone analogues derived from the speech utterances, expressed in the four basic emotional tones (anger, fear, joy, sadness) and in a neutral condition. The original set of Thompson et al. (2004) utterances from the Florida Affect Battery (Bowers, Blonder, & Heilman, 1991) was kept, and two further sets were added, each with a different speaker. Each speech set contained four different emotionally neutral sentences (e.g., "The bottle is on the table") uttered by a female speaker with intent to convey one of four basic emotions (anger, fear, joy, sadness) or a neutral emotion. A complete list of sentences is given in Appendix A. There were in all 60 utterances (3 speech sets × 4 sentences × 5 intended emotions). Gliding tone analogues were included as a form of prosody that preserved the intonation contours (frequency, intensity, and timing of the fundamental frequency) of the original speech utterances.

Music training was assessed through a questionnaire developed by Cuddy, Balkwill, Peretz, and Holden (2005). The questionnaire yields a music-training factor that is a weighted combination of 28 questions. The questions with the highest factor loadings are listed in Table 1.

Emotional intelligence was assessed with the MSCEIT (Mayer et al., 2002a). According to the MSCEIT manual, an individual's overall score is a measure of general consensus; answers to each question are compared against answers given by individuals from a normative sample (Mayer et al., 2002b).


Table 1
Questionnaire Items With Factor Loadings Greater Than .50 for the Music Training Factor Score

Music training factor score
  Years of playing a musical instrument
  Number of instruments played
  Type(s) of music education (e.g., private, group, self-taught, conservatory examinations)
  At the peak of interest, number of hours per week practicing the main instrument

This scoring criterion has been externally validated against an expert consensus criterion—proportional responses from 21 members of the International Society for Research on Emotions—and yielded an extremely strong correlation, r(19) = .91, p < .01 (Mayer, Salovey, Caruso, & Sitarenios, 2003). To highlight the internal consistency of the MSCEIT, Mayer and colleagues chose 2,000 test-takers at random from the normative sample of 5,000 and found a split-half reliability of .93 for the overall score of the MSCEIT (Mayer et al., 2003).

Music perception and memory were assessed with the Montreal Battery of Evaluation of Amusia (MBEA; Peretz, Champod, & Hyde, 2003). There are six subtests—three address pitch organization (melodic contour, interval, scale), two address rhythmic organization (rhythm and meter), and a sixth test assesses incidental memory. Peretz et al. (2003; see also Cuddy et al., 2005; Peretz et al., 2008) report various psychometric properties of the MBEA, based on 160 neurologically intact participants, including sensitivity, test–retest reliability, and validation against another leading test of music abilities—Gordon's Musical Aptitude Profile (Gordon, 1995). Sample items for each of five tests are shown in Figure 2. Line (A) of Figure 2 is a sample standard stimulus melody for the pitch and rhythm tests. In each of the first four tests, the standard melody is paired with a comparison melody that may be same or different. Different comparison melodies are shown in lines (B–D) for the three pitch tests and in line (E) for the rhythm test. The listener is asked to judge whether the standard and the comparison melody are same or different. The fifth test is the metric test, line (F). It requires listeners to judge whether the stimulus is a march or a waltz. The sixth test, a test of incidental memory, requires listeners to discriminate melodies presented in the first five tests from novel melodies.

Fluid intelligence was assessed with the Raven's Advanced Progressive Matrices (APM), a nonverbal matrix-completion test used to assess nonsemantic intelligence (Raven, 1986). The test was used to control for general intelligence as a predictor of successful emotion recognition. The short form developed by Bors and Stokes (1998) was used; it has near-identical internal consistency and test–retest reliability to the long form of the Raven's APM but can be administered in half the time. The Raven's APM has been shown to correlate strongly with the full-scale Wechsler Adult Intelligence Scale (McLaurin, Jenkins, Farrar, & Rumore, 1973).
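The general-consensus scoring idea described for the MSCEIT at the start of this subsection can be illustrated with a toy computation. The sketch below shows only the proportion-consensus principle under invented data; the actual MSCEIT items, response options, and scoring algorithm are proprietary to Multi-Health Systems and are not reproduced here, and the item names are hypothetical.

```python
# Toy illustration of proportion-consensus scoring (not the proprietary
# MSCEIT algorithm): a respondent earns, on each item, credit equal to the
# proportion of the normative sample that chose the same response option.
from collections import Counter

def consensus_scores(respondent_answers, normative_answers):
    """respondent_answers: dict of item -> chosen option.
    normative_answers: dict of item -> list of options chosen by the norm sample."""
    scores = {}
    for item, choice in respondent_answers.items():
        counts = Counter(normative_answers[item])
        scores[item] = counts[choice] / len(normative_answers[item])
    return scores

# Hypothetical data: two items, ten normative respondents per item.
norm = {"faces_1": list("AAAABBBCCD"), "pictures_3": list("BBBBBBACDD")}
me = {"faces_1": "A", "pictures_3": "B"}
print(consensus_scores(me, norm))  # {'faces_1': 0.4, 'pictures_3': 0.6}
```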

Participants

One hundred self-reported native English speakers, 75 women and 25 men, participated in exchange for monetary compensation. Participants were undergraduate university students; mean age was 21.6 years (range 19–32 years) and mean reported years of music training was 6.5 years (range 0–17 years).

Apparatus

The two new sets of speech utterances added to the Thompson et al. (2004) set were recorded using a Shure SM57 microphone (Shure Incorporated, Niles, IL) patched through an M-Audio USB digital audio converter (Avid Technology, Tewksbury, MA). Utterances were recorded to an Apple Mac G4 computer using Audacity digital recording software. All utterances were recorded at a sampling rate of 44,100 Hz.

Gliding tone analogues were created with the acoustic manipulation software Melodyne (Neubacker, Gehle, & Sourgens, 2004). The software first uses an autocorrelation technique to extract fundamental frequency fluctuations over time. This F0 contour was then resynthesized in Melodyne to include octave harmonics (two above the F0, each of equal amplitude) in order to "make the fundamental clearly audible without sounding as dull as a simple sine wave" (Neubacker, personal communication, 2006). The perceptual result was that of a smooth and continuous signal. (A rough code sketch of this extraction-and-resynthesis idea appears at the end of this section.)

To verify that the gliding tone analogues retained the basic F0 contour characteristics of the original speech utterances, both an auditory and a visual test were conducted. For the auditory test, two research assistants (each with over 10 years of music training and uninformed about the hypotheses of the present study) independently assessed the similarity of each of the gliding tone analogues to the speech utterances from which they were derived. Each assistant listened to the 60 pairs of stimuli over headphones in a sound-attenuated booth. They were asked to respond "yes" or "no" to whether the glide appeared to match the prosodic contour of the speech utterance. For the visual test, the same two research assistants compared the similarity of visual displays of the waveform and fundamental frequency curve for each speech utterance and its corresponding gliding tone analogue.


Figure 2. Examples of the stimuli used in each of five tests of the Montreal Battery of Evaluation of Amusia. Asterisks in lines B–E represent the note altered from line A. From "Varieties of musical disorders: The Montreal Battery of Evaluation of Amusia" by I. Peretz, A. S. Champod, and K. Hyde, 2003, Annals of the New York Academy of Sciences, 999, p. 63. Copyright 2003 by Blackwell Publishing. Redrawn with permission.


Again they were asked to respond "yes" or "no" to whether the glide appeared to match the prosodic contour of the speech utterance. Visual output was obtained using the PRAAT acoustic analysis software (Boersma & Weenink, 2005). An example of the visual similarity between an emotional speech utterance and its respective gliding tone analogue is given in Figure 3. For both tests, the assessors judged that each gliding tone analogue preserved the contour of the original speech utterance. They found no mismatches between utterances and analogues. Agreement between assessors was 100%.

An Apple G4 computer, running PsyScope experimental-design software (Cohen, MacWhinney, Flatt, & Provost, 1993), controlled the presentation of the emotional prosodic stimuli (speech utterances and gliding tone analogues) and the Raven's APM. A PC computer running MediaLab software (Jarvis, 2006) was used to present the MBEA. Participants listened to all auditory stimuli on Sennheiser HD 480 headphones (Sennheiser Communications, Tullamore, Ireland). The MSCEIT was an online version provided by the test suppliers and was implemented using an Apple G4 computer. Each speech utterance was analyzed with PRAAT in order to compare acoustic cue patterns across specific emotions. A list of acoustic cues and descriptive statistics, across emotions, is outlined in Appendix B.
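As a rough companion to the Melodyne procedure described above, the following sketch shows one way to build a comparable gliding tone analogue with open-source tools: track the F0 contour, then resynthesize it as a continuous glide made of the fundamental plus two octave harmonics of equal amplitude, scaled by the utterance's intensity envelope. This is an assumed reimplementation, not the software chain actually used in the study; the pitch range (75–500 Hz), hop size, and file names are placeholders.

```python
# Sketch of a gliding tone analogue: F0 tracking plus resynthesis with
# octave harmonics. Assumed reimplementation; not the study's Melodyne chain.
import numpy as np
import librosa
import soundfile as sf

def gliding_tone_analogue(wav_in, wav_out, fmin=75.0, fmax=500.0):
    y, sr = librosa.load(wav_in, sr=None)      # original utterance
    hop = 512                                  # librosa's default hop length

    # Frame-level F0 contour (probabilistic YIN) and RMS intensity envelope.
    f0, voiced, _ = librosa.pyin(y, fmin=fmin, fmax=fmax, sr=sr)
    rms = librosa.feature.rms(y=y, hop_length=hop)[0]

    # Upsample both contours to one value per audio sample.
    t = np.arange(len(y))
    f0_s = np.interp(t, np.arange(len(f0)) * hop, np.nan_to_num(f0, nan=0.0))
    amp_s = np.interp(t, np.arange(len(rms)) * hop, rms)
    voiced_s = np.interp(t, np.arange(len(voiced)) * hop, voiced.astype(float)) > 0.5

    # Continuous glide: fundamental plus two octave harmonics, equal amplitude,
    # following the intensity envelope. Unvoiced regions are crudely silenced.
    phase = 2.0 * np.pi * np.cumsum(f0_s) / sr
    glide = sum(np.sin((2 ** k) * phase) for k in range(3)) / 3.0
    glide *= amp_s * voiced_s

    sf.write(wav_out, glide / (np.abs(glide).max() + 1e-9), sr)

# Hypothetical usage:
# gliding_tone_analogue("bottle_table_joy.wav", "bottle_table_joy_glide.wav")
```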

Procedure

Participant data were collected in two testing sessions. Participants were given the music questionnaire, Raven's APM, and MBEA at the first session, and the emotional prosody test and MSCEIT emotional intelligence test at the second session.

The presentation order of tasks within each session was randomized for all participants. Total testing time was around two and a half hours—approximately 75 minutes per session.

For the emotional prosody test, participants were told they would be required to make judgments of emotion expressed in both prerecorded spoken utterances and gliding tone analogues. On each trial, following the presentation of an utterance or analogue, participants were asked to provide a rating of 0 through 10 (0 = not present, 10 = extremely present) on the prominence of each of four emotions (joyful, sad, angry, and fearful, in that order). Each participant was advised that a response of 10 typified an extremely salient display of that particular emotion. Participants were told that gliding tone analogues mimicked the prosody (pitch and rhythmic variations) of spoken sentences to varying degrees, and that they should rate the presence of emotion(s) in the same manner as for the normal speech.

Six blocks of trials—three speech sets × two stimulus conditions—were presented to each participant. Each block consisted of 20 trials corresponding to 4 separate sentences or gliding tone analogues and 5 intended emotions. The order of trials within each block was randomized for each participant. The stimulus on each trial was presented once only; repeated listening to a trial was not allowed. The order of the six blocks was balanced to ensure that speech and tone analogues derived from the same set were separated by at least two intervening blocks. Participants were assigned to one of six groups adhering to this rule of presentation order. Before testing, participants were trained on the task with stimuli not used in the testing phase. Similarities between the speech stimuli and their resulting gliding tone analogues were highlighted. Six examples were given to make sure each participant understood the task.

Figure 3. Visual representation of the similarity between a normalized intensity waveform and F0 contour information of a) a typical emotive utterance, and b) its gliding tone analogue. The utterance is “The book is on the shelf” uttered in a joyful tone and analyzed by PRAAT acoustic analysis software. The gliding tone analogue was created with Melodyne software.


To assess music training and ability, the music questionnaire was administered in pencil-and-paper format (see Cuddy et al., 2005, for further details) and the MBEA was administered under standardized instructions and procedures (see Peretz et al., 2003, for further details). For the MSCEIT emotional intelligence test, participants were informed that they would be performing a test requiring emotional judgments based on visual stimuli and written scenarios. The test included all other necessary instructions. For the Raven's APM, participants were instructed to study a matrix of visual patterns containing a missing section, and to choose one of the eight possible answers below the matrix that best completed the patterns surrounding the missing piece. Each participant was given four examples prior to testing to ensure proper comprehension of the task.

Results

Emotional Prosody Data

Emotional prosody data were scored using the difference score procedure of Resnicow et al. (2004). The difference score expresses the extent to which an intended emotion was conveyed by the utterance or gliding tone analogue relative to when it was not specifically intended. For each intended emotion, response accuracy is calculated as the rating for the intended emotion divided by the sum of the other scale ratings. Response accuracy is then compared against a baseline score for that emotion derived from the scale ratings for the neutral utterances or analogues. The baseline score is the rating of the target emotion scale of the neutral utterance or analogue divided by the sum of all ratings for that utterance. The final difference score is equal to the response accuracy score minus the baseline score, with a score of zero representing chance accuracy. Table 2 displays mean difference scores across stimulus and emotion conditions and shows the 95% confidence intervals for each stimulus condition and intended emotion. Since none of the confidence intervals encompass zero, it can be concluded that each of the difference score means is significantly greater than zero.

Table 2
Emotional Prosody Mean Difference Scores (and Standard Deviations) and 95% Confidence Intervals Across Emotions and Stimulus Type

                              Speech        Gliding tone analogues
Difference scores
  Anger                       .60 (.18)     .22 (.14)
  Fear                        .49 (.15)     .16 (.12)
  Joy                         .74 (.20)     .52 (.21)
  Sadness                     .52 (.20)     .06 (.12)
  Overall                     .59 (.12)     .24 (.09)
95% confidence intervals
  Anger                       .57-.64       .19-.25
  Fear                        .46-.52       .14-.18
  Joy                         .70-.78       .47-.56
  Sadness                     .48-.56       .04-.09

Note. Possible difference scores range from -1.00 to 1.00, with chance equaling zero.
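For concreteness, the difference score described above can be illustrated with a small worked example. The ratings below are invented, and the sketch assumes that both the accuracy and baseline proportions are the target rating divided by the sum of all four emotion ratings, which is the reading consistent with the stated -1.00 to 1.00 range.

```python
# Worked illustration of the difference-score procedure (Resnicow et al., 2004).
# Ratings are invented; each proportion is the target rating over the sum of
# all four emotion ratings (the reading consistent with the -1 to 1 range).

def proportion(ratings, target):
    return ratings[target] / sum(ratings.values())

def difference_score(emotional_ratings, neutral_ratings, target):
    """Accuracy on the emotional stimulus minus the baseline from the neutral
    stimulus for the same target emotion; zero corresponds to chance."""
    return proportion(emotional_ratings, target) - proportion(neutral_ratings, target)

# Hypothetical 0-10 prominence ratings for one angry and one neutral utterance.
angry   = {"joyful": 1, "sad": 2, "angry": 8, "fearful": 3}
neutral = {"joyful": 2, "sad": 2, "angry": 2, "fearful": 2}
print(round(difference_score(angry, neutral, "angry"), 2))  # 0.32
```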


The data of Table 2 were subjected to an analysis of variance with two within-subject factors—stimulus condition and intended emotion. Mean difference scores were distinctly higher for speech stimuli than for gliding tone analogues, F(1, 99) = 922.3, p < .001. Mean difference scores for the different intended emotions varied significantly as well, F(3, 297) = 160.71, p < .001, with participants showing greater accuracy for joyful utterances over angry, fearful, and sad utterances, respectively. Finally, there was a significant interaction between stimulus condition and intended emotion, F(3, 297) = 24.7, p < .001. Table 2 shows that emotional recognition scores varied as a function of stimulus type; this is likely due to scores such as that for sadness, which is close to the average recognition score for speech but falls well below the average score for gliding tone analogue stimuli.

Two further analyses focused on the gliding tone analogues. The first analysis addressed whether the inferior performance with the gliding tone analogues was due solely to an increase in the ratings for neutral stimuli. For the neutral stimuli, the proportion of false identifications of an emotion was in fact higher for gliding tone analogues (M = .74) than for speech utterances (M = .64), t(99) = 7.00, p < .001. However, the proportion of correct identifications for non-neutral gliding tone analogues was much lower (M = .48) than for speech utterances (M = .78). Thus the recognition deficit for gliding tone analogues was not solely due to an increase in false identification of neutral stimuli.

The second analysis examined the effect of stimulus presentation order on response accuracy for gliding tone analogues—whether accuracy depended on whether a gliding tone analogue preceded or followed the speech utterance from which it was derived. The mean difference score for glides heard prior to their corresponding speech stimuli was .24, and the mean difference score for glides heard following their corresponding speech stimuli was .26. The two scores were not significantly different, t(99) = -1.68, p > .05.

Predictor Variable Data

Table 3 displays descriptive statistics of participant performance on the predictor variables. Multi-Health Systems' scoring procedure converted individual MSCEIT raw scores into standard scores with an overall mean of 100 and a standard deviation of 15. For the MSCEIT, participants' mean score (see Table 3) was somewhat lower than that obtained by Resnicow and colleagues (2004) for a similarly aged (18–24 years) sample (M = 110.2, SD = 16.4), a difference that reached significance, t(122) = 8.64, p < .05. MBEA scores are expressed as a composite of six individual tasks of music ability. Participant scores closely resembled those reported by Peretz et al. (2003) in their original report of the test's validity (M = 26.0, SD = 1.6). For the Raven's APM, expressed in proportion correct, the mean participant score was well above chance (.12). Participant scores were consistent with the scores of a similarly aged sample (M = 20.0 years, range = 17–30 years) tested by Bors and Stokes (1998) (M = .58, SD = .21, range = 0–1.0). A subsequent independent samples t test verified this similarity, t(604) = 0.97, p > .05.

Predicting Emotional Prosody Recognition

A correlation analysis for emotional prosody scores and predictor variables is given in Table 4.


Table 3
Predictor Variable Descriptive Statistics for Music Training Factor, MSCEIT Emotional Intelligence Test, MBEA, and the Raven's APM

Variable                          Mean     SD     Range
Music training factor score       0.04     0.9    -1.83-1.97
MSCEIT total score                102.6    9.0    80.7-124.7
  Experiential area               104.8    11.8   76.1-128.4
    Perceiving branch             107.7    11.8   70.4-132.1
    Facilitating thought branch   100.0    12.7   73.8-126.1
  Strategic area                  99.0     7.5    80.2-115.9
    Understanding branch          101.7    9.5    75.0-122.1
    Managing branch               96.2     7.0    80.7-111.7
MBEA average                      26.2     2.3    20-29.3
Raven's APM                       .63      .23    0-1.0

The main findings from the correlational analysis may be summarized as follows: (1) There was no relationship between emotional prosody (overall, speech, and gliding tone analogues) and the music training predictors—either the music training factor score or raw years of music training. The range of correlations was r(98) = -.01 to -.14, all ns. (2) There was no relationship between the experiential area score of the MSCEIT and the music training predictors. The range of correlations was r(98) = -.06 to .01, all ns. (3) Emotional prosody scores for speech utterances, gliding tone analogues, and overall scores were significantly related to the experiential area score of the MSCEIT. The range of correlations was r(98) = .46 to .57, all p < .01.

In order to confirm the contribution of the MSCEIT experiential area score to the prediction of emotional prosody scores, a multiple regression was conducted with all variables entered simultaneously into the equation. The MSCEIT experiential area score (consisting of MSCEIT branches 1 and 2) and the MSCEIT strategic area score (consisting of MSCEIT branches 3 and 4), along with the music training factor score, the MBEA overall score, and the Raven's APM score, were regressed against measures of emotional speech prosody recognition. Results are given in Table 5, where it may be seen that the MSCEIT experiential area score, relating to perceiving emotions and using emotions to facilitate thought, contributed significantly to the prediction of emotional prosody scores (for speech, gliding tone analogues, and overall scores).

Music training scores did not relate to emotional prosody scores, but they were significantly related to other tests previously associated with music training—MBEA scores (Cuddy et al., 2005) and the Raven's APM score of fluid intelligence (Thompson et al., 2004). Music training factor scores, but not raw years of music training, were significantly related to performance on the strategic area of the MSCEIT. However, performance on the strategic area was not reflected in the nonsignificant correlation between music factor scores and the overall MSCEIT score. Table 6 illustrates a regression of music training factor scores on the MSCEIT overall score, MBEA, and the Raven's APM scores.

Given that the MBEA scores were not related to emotional prosody scores, it may be anticipated that participants whose MBEA scores were indicative of potential congenital amusia (also known as "tone deafness"; see Peretz et al., 2003) would perform normally on the emotional prosody tests. Ten individuals obtained MBEA scores below 23, the cut-off for suspected congenital amusia (mean MBEA = 21.6, SD = 0.90). Emotional prosody scores for these participants were M = .62 (SD = .07) for speech utterances and M = .26 (SD = .11) for gliding tone analogues. These scores are close to the sample means (see Table 2) and slightly, but not significantly, higher than the scores for 12 participants with MBEA scores above the 90th percentile (mean MBEA = 29.1, SD = 0.09). Emotional prosody scores for the latter participants were M = .59 (SD = .14) for speech utterances and M = .22 (SD = .08) for gliding tone analogues. The differences between the two subgroups of participants were not significant: for emotional speech utterances, t(20) = -.90, p > .05, and for gliding tone analogues, t(20) = -.71, p > .05.
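The simultaneous-entry regression reported in Table 5 can be sketched as follows with statsmodels. The data frame, column names, and file name are placeholders rather than the study's data or code; z-scoring the columns first yields coefficients on the same scale as the standardized values in Table 5.

```python
# Sketch of the simultaneous-entry multiple regression summarized in Table 5.
# Placeholder data frame and column names; not the study's data or code.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("prosody_predictors.csv")  # hypothetical file, one row per participant
# z-score every column so the fitted coefficients resemble standardized betas.
df_z = (df - df.mean()) / df.std()

for outcome in ["prosody_overall", "prosody_speech", "prosody_glide"]:
    fit = smf.ols(
        f"{outcome} ~ msceit_experiential + msceit_strategic"
        " + music_training_factor + mbea_average + ravens_apm",
        data=df_z,
    ).fit()
    # Multiple R, omnibus F, and per-predictor t and p values, as in Table 5.
    print(outcome, round(fit.rsquared ** 0.5, 2), round(fit.fvalue, 2))
    print(fit.tvalues.round(2))
    print(fit.pvalues.round(3))
```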

Confirmation of Results

A series of correlation and regression analyses was conducted. These analyses were intended to confirm the stability of the principal findings regarding the relation of music training and the experiential area of the MSCEIT with emotional prosody.

Table 4
Correlations Between Predictor Variables and Scores of Emotional Prosody

Variables                            1      2      3      4      5      6      7      8      9
Emotional prosody
  1. Speech                          —
  2. Gliding tone analogues        .39**    —
  3. Overall                       .88**  .78**    —
Music training
  4. Factor score                  -.09   -.01   -.06     —
  5. Raw years                     -.14   -.03   -.11   .59**    —
MSCEIT emotional intelligence scores
  6. Area 1, experiential          .49**  .46**  .57**   .01   -.06     —
  7. Area 2, strategic              .08    .19    .15   .31**   .09    .18     —
  8. Overall score (branches 1-4)  .41**  .43**  .50**   .19    .01   .83**  .70**    —
  9. MBEA                          -.21*  -.05   -.17   .56**  .39**   .02    .20*   .13     —
 10. Raven's APM                    .01   -.09   -.04   .34**   .17   -.02    .15    .09   .31**

* p < .05. ** p < .01 (2-tailed).


Table 5
Summary of a Linear Regression Analysis With Emotional Prosody Scores Regressed on Scores of MSCEIT (Area Scores 1 and 2), Music Training Factor Score, MBEA, and the Raven's APM

Predictor                                 β        t        p
Emotional prosody (Overall)
  MSCEIT experiential area               0.56     6.67     <.01
  MSCEIT strategic area                  0.09     1.05     >.05
  Music training factor                  0.01     0.08     >.05
  MBEA average                          -0.21    -2.11     <.05
  Raven's APM                            0.03     0.31     >.05
Emotional prosody (Speech)
  MSCEIT experiential area               0.48     5.43     <.01
  MSCEIT strategic area                  0.01     0.10     >.05
  Music training factor                  0.03     0.26     >.05
  MBEA average                          -0.26    -2.44     <.05
  Raven's APM                            0.09     0.97     >.05
Emotional prosody (Gliding tone analogues)
  MSCEIT experiential area               0.46     5.10     <.01
  MSCEIT strategic area                  0.17     1.80     >.05
  Music training factor                 -0.02    -0.21     >.05
  MBEA average                          -0.07    -0.66     >.05
  Raven's APM                           -0.07    -0.71     >.05

Note. (a) R (Overall) = .61; F(5, 93) = 10.74, p < .001; (b) R (Speech) = .53; F(5, 93) = 7.42, p < .001; (c) R (Gliding tone analogues) = .52; F(5, 93) = 6.88, p < .001.

Different subsets of the data were examined: gender (male vs. female) and speaker set (three encoders: the Thompson et al. (2004) set vs. the two sets constructed for this research). As well, the measure of raw years of music training was entered into the regressions rather than the music factor score. Overall, for all analyses, only MSCEIT experiential area scores were significantly related to emotional prosody scores.

A subsequent consideration was the difference in scoring method from the percent correct score used by Thompson et al. (2004). The present difference score was adopted to reduce the possibility that response bias could produce benefits for some emotions over others—in particular, benefits for some emotions that appeared only for the musically trained (as was found by Thompson et al., 2004). In the present study, no correlations between music training and the difference scores for individual intended emotions were significant; the range for r(98) was -.18 to .09, all ns. Further, with respect to the percent correct score, it may be noted that it could not, with the present procedure, be scored for all trials—that is, participants did not clearly indicate a single preference for one emotion on 8.5% of the trials. However, two analyses were conducted with the remaining data where a single preference could be scored. First, overall percent correct was compared for the Thompson et al. (2004, Exp. 2) data and the data for the same set of utterances and analogues in the present study. Percent correct for the speech utterances was 84.8% in the Thompson et al. (2004) study and 89.3% in the current study. This difference was not significant, t(138) = 1.48, p > .05. Percent correct for the tone syllable sequences in the Thompson et al. (2004) study was 43.1% and for gliding tone analogues in the current study, 54.2%. This difference was significant, t(138) = 3.47, p < .01. Second, regressions were conducted on the available percent correct data in the current study.


The regressions implicated only the relation between MSCEIT scores and emotional prosody, and not music training. Details of these analyses are available from the authors.

Replication

As part of another ongoing study in the laboratory, 92 participants were also tested on emotional prosody, the MBEA, and the MSCEIT. Procedures were identical to those in the main study above, with the exception that the order of testing was changed; the MSCEIT and the music questionnaire were administered in one session, and the MBEA and the emotional prosody task were presented in another. The order of sessions was counterbalanced across participants. The Raven's APM was not administered.

Results confirmed the main results above. A multiple regression to predict emotional prosody scores implicated only the MSCEIT experiential area score (perceiving emotions and using emotions to facilitate thought) for speech, R = .48, F(1, 91) = 27.05, p < .001, gliding tone analogues, R = .26, F(1, 91) = 6.69, p < .05, and overall scores, R = .45, F(1, 91) = 23.26, p < .001. Music training factor scores were not significantly related to the emotional prosody scores; the range across the three tests for r(90) was .07 to .12, all ns. Again, music training was significantly related to MBEA scores, r(90) = .34, p < .001, but not to the overall MSCEIT, r(90) = -.06, p > .05, the MSCEIT experiential area score, r(90) = -.09, p > .05, or the MSCEIT strategic area score, r(90) = -.01, p > .05. This latter result suggests that the correlation in the main study between music training and the strategic area of the MSCEIT is unreliable.

Discussion

The main finding was that a test of emotional intelligence was a reliable predictor of emotional speech recognition in university-aged participants. In particular, the experiential area score of the MSCEIT, which includes the perceiving emotion and facilitating thought branch scores, predicted emotional prosody scores. Contrary to the findings of Thompson et al. (2004), however, there was no link between music training and emotional prosody recognition scores. Furthermore, emotional prosody recognition for participants with MBEA composite scores below 23 (a cut-off typically indicating "tone deafness" or congenital amusia [Peretz et al., 2003]) did not differ from that of participants with near-perfect MBEA scores. Emotion recognition in speech likely involves processes different from those involved with musical/perceptual abilities.

Table 6
Summary of a Linear Regression Analysis With Music Training Factor Score Regressed on MSCEIT Overall Score, MBEA, and the Raven's APM

Predictor                β        t        p
MSCEIT overall score    0.12     1.45     >.05
MBEA average            0.49     5.61     <.001
Raven's APM             0.17     1.89     >.05

Note. R (Overall) = .60; F(3, 95) = 17.41, p < .001. Raven's APM was marginally significant at p = .06.


Three aspects of the findings will be discussed: first, the lack of association between music training and both sensitivity to emotional prosody and the experiential area of emotional intelligence; second, the significant association between sensitivity to emotional prosody and the experiential area of emotional intelligence; and third, briefly, the striking difference between sensitivity to emotional speech utterances and sensitivity to melodic analogues.

Music Training, Emotional Speech Prosody, and Emotional Intelligence

With respect to music training and the lack of associations, it is always problematic to address a null result—that is, a failure to replicate and a failure to reject the null hypothesis. Nonetheless, correlations, and therefore effect sizes, in the present study were not merely close to significance but were essentially zero. Although correlation does not imply causation, the reverse, that causation should imply correlation, is arguably true. With no correlation significant here, we must question whether there is any causative role.

The lack of association between music training and sensitivity to emotional prosody is surprising not just because of the failure to replicate Thompson et al. (2004), but because it appears to run counter to multiple reports of cognitive enhancement associated with music training (for reviews, see Črnčec, Wilson, & Prior, 2006; Schellenberg, 2004, 2005). However, with further consideration of the literature on music training and sensitivity to musical emotion, two lines of supportive evidence may be found. The first is evidence of the remarkable sophistication of nonmusicians in dealing with various musical tasks—such as detecting patterns of tension/relaxation in music, generating musical expectancies, and learning new musical idioms (reviewed by Bigand & Poulin-Charronnat, 2006). The most relevant study for present purposes is an investigation of judgments of emotion conveyed by serious nonvocal music (Bigand, Vieillard, Madurell, Marozeau, & Dacquet, 2005). The judgments of both musicians and nonmusicians were highly discriminative, reliable, and essentially the same. The second is the finding that patients with acquired amusia may accurately judge the emotional content of music despite severe deficits in basic musical perceptual abilities (Lantz, Kilgour, Nicholson, & Cuddy, 2003; Peretz & Gagnon, 1999; Peretz, Gagnon, & Bouchard, 1998). The authors propose that different perceptual features may be relevant for emotional and recognition judgments. Taken together, these lines of evidence suggest that intensive musical training or skill is not required for decoding emotion in music and, by extension, may not be required for decoding emotion in speech.

The present sample of participants was not atypical; previously obtained relations of music training with the MBEA and with the Raven's APM were replicated. The scoring of music training according to a music factor score was not responsible for the lack of association; raw years of music training also failed to correlate with scores on the emotional prosody test. The use of a difference score, rather than percent correct, was not responsible. We may only conclude that our efforts to strengthen the fragile relation reported by Thompson et al. (2004) did not strengthen but rather weakened the evidence.

A lack of association between music training and overall scores of emotional intelligence is shored up by two previous studies that also used the MSCEIT as a measure of emotional intelligence—Resnicow et al. (2004) and Schellenberg (2006). Also supportive of a lack of association is the finding that, relative to a control group, children given music lessons did not improve in social skills (Schellenberg, 2004). To the extent that social skills involve emotional intelligence—the ability to "read" emotional situations—it may be concluded that it is unlikely that music training either directly or indirectly impacts emotional skills.

The previous statement may be qualified by considering the nature of the MSCEIT. The MSCEIT addresses the construct of ability emotional intelligence, or cognitive-emotional ability, which may be distinguished from trait emotional intelligence, or self-perceptions of dispositions and emotionality (Petrides, Niven, & Mouskounti, 2006; see also the Bar-On Emotional Quotient Inventory [Bar-On, 1999]). Reports of the reliability and validity of the MSCEIT are highly favorable (Mayer-Salovey-Caruso Emotional Intelligence Test, n.d.). Salovey and Grewal (2005) review multiple studies reporting significant correlations of MSCEIT scores with satisfaction with social relationships, deviant behavior (a negative relationship), and favorable ratings from coworkers. Addressing the different construct of trait emotional intelligence, Petrides et al. (2006) reported a significant relation, for 37 music students, between a newly constructed questionnaire for trait emotional intelligence and length of music training (music students only; no controls were tested). The authors noted that trait emotional intelligence facets such as self-reported assertiveness and determination were likely to correspond to a willingness to stick with music training. Thus it is possible that the results of the current study are specific to music training and cognitive-emotional ability and do not apply to all measures of emotional intelligence.

Moreover, it should be noted that the emotions tested (anger, fear, joy, sadness), while "primary" in human emotional life, hardly cover the range of emotions and attitudes that the voice can express. Future research could confirm whether or not music training is implicated in sensitivity to more subtle emotional inflections (e.g., sarcasm, disgust, tenderness, irritation, contentment).

Emotional Speech Prosody and Emotional Intelligence

A much more encouraging finding was the significant association between sensitivity to emotional prosody and the experiential area of emotional intelligence, a finding in accord with Resnicow and colleagues' (2004) findings with emotional expression in music. Because of the quasi-experimental nature of the present study, the finding implies association, not cause and effect. However, we may note that general intelligence (as measured by the Raven's APM) is not implicated in the relationship.

Reflection on the nature of the tasks leads to a possible account of the relationship. The relevant MSCEIT test items survey participants' ability to recognize emotion in pictures of human faces and visual scenes and, using verbal scenarios, the degree to which the respondent can use his or her emotions to improve thinking. Mayer et al. (2002b) describe emotion perception as involving ". . . paying attention to and accurately decoding emotional signals in facial expressions, tone of voice, and artistic expressions" (p. 19).


These connections suggest cross-sensory processes of emotion recognition: those who are sensitive to visual displays of emotion are similarly adept at decoding emotion in nonverbal communication (i.e., speech and music). Sensitivity to emotional speech prosody appears to be related to the operation of a high-level, perceptual emotional processor. This argument is consistent with recent findings that emotion recognition accuracy within a perceptual domain is correlated with general emotion processing abilities rather than with perceptual abilities (e.g., Borod, Pick, Hall, Sliwinski, Madigan, Obler, et al., 2000). New preliminary research, such as Scherer and Ellgring (2007), is focused on moving beyond the traditional unimodal model of emotion communication and toward objective microcoding of simultaneous multimodal emotion configuration patterns.

Moreover, it is possible that recognition of emotion is a skill dependent on the ability to empathize with another person's state of mind. Theory-of-mind research deals with the ability to interpret what another person is thinking or feeling (for a review see Wellman, Cross, & Watson, 2001) and may be particularly relevant to the topic of cross-domain emotional perception. Perhaps congenital amusics (known for deficits in perceptual abilities related to music) are still able to fully recognize emotion in speech because their processing disadvantage does not extend to empathizing with encoders of emotion. A research paradigm testing relations between emotional intelligence, theory-of-mind reasoning, and nonverbal emotion communication would help appraise the likelihood of this proposal. Whether or not directly manipulated training in one domain of emotional intelligence will transfer to another remains to be seen.

Emotional Speech Utterances and Tone Analogues

Although the correlation and regression analyses did not differ for the two stimulus types in identifying emotional intelligence as a predictor, recognition was significantly lower for gliding tone analogues than for speech utterances. This finding replicates the Thompson et al. (2004) finding that discrete tone sequences derived from speech utterances were much more difficult to decode than the speech utterances themselves. (It was also found that performance for the gliding tone analogues was slightly but significantly better than performance for the discrete tone analogues reported by Thompson et al. (2004); before attempting interpretation, however, such a finding should be replicated by testing the same listeners under identical conditions.) Since the gliding tone analogues retained the pitch and timing contours of the speech utterances, contours commonly regarded as important auditory cues for emotion recognition (e.g., Schröder, 2001), it is intriguing to consider why recognition was so very much lower for the gliding tone analogues than for the speech utterances. In other words, why, when upper frequencies and semantic content are missing, is there a sizable reduction in emotion recognition accuracy?

Thompson et al. (2004) further reported that, even when the language was unfamiliar, decoding emotional expression was more difficult for tone syllable sequences derived from the speech utterances than for the utterances themselves. Since the verbal content of the unfamiliar spoken utterances was unintelligible to the listeners, the critical difference between the two conditions was the removal of high-frequency spectral components. Thus the degradation in recognition may be due to that removal.


More recently, Bänziger and Scherer (2005) have also suggested that information in the high-frequency components of the speech signal may override the fundamental frequency contour in importance for decoding emotional intent. These high-frequency components contribute to an utterance's perceived timbre, a feature long recognized as an important cue to vocal affect (e.g., Ladd et al., 1985). Thus, the lack of timbral cues in the gliding tone analogues may be one major factor responsible for the low identification accuracy. Further work could compare emotion identification for analogues in which the presence of such cues was systematically varied.

Another possibility that could be explored in future work is manipulation of the familiarity of the analogue timbre. Unlike the speech material, the gliding tone analogues were artificial. It would be interesting to see whether performance would be similarly low with a musical instrument—for example, a violin—mimicking speech prosody. Juslin and Laukka (2003) have speculated that "many musical instruments are processed by brain modules as superexpressive voices" (p. 803).

The present data provide a picture different from that provided by Ayotte et al. (2002) and Patel et al. (2005). In their studies, most individuals had no difficulty discriminating the analogues; only a sample of tone-deaf individuals had difficulty. At least two differences exist between the focus-shift research and the present study. First, the task in the focus-shift research was a same–different discrimination, not recognition. Second, the focus-shift research involved linguistic, not emotional, prosody.

Conclusion

In sum, the current study suggests that indices of music training and music perceptual skills are not related to recognition of emotion in speech prosody, despite similar patterns of emotional acoustic cues between music and speech. Instead, indices of emotional intelligence predicted sensitivity to emotional speech prosody. The results of this study do not dispute the growing evidence that there are linguistic benefits associated with musical ability (Patel, 2008; Patel & Iversen, 2007). Rather, they question whether emotional speech prosody properly belongs to the category of linguistic abilities. The data point to a new direction: emotion recognition in both music and speech may have less to do with acoustical sensitivity than with the operation of a cross-modal emotional processing system.

References

Ayotte, J., Peretz, I., & Hyde, K. (2002). Congenital amusia. Brain, 125, 238–251.
Bänziger, T., & Scherer, K. R. (2005). The role of intonation in emotional expressions. Speech Communication, 46, 252–267.
Bar-On, R. (1999). Bar-On Emotional Quotient Inventory (EQ-i). Toronto, Canada: Multi-Health Systems.
Bigand, E., & Poulin-Charronnat, B. (2006). Are we "experienced listeners"? A review of the musical capacities that do not depend on formal musical training. Cognition, 100, 100–130.
Bigand, E., Viellard, S., Madurell, F., Marozeau, J., & Dacquet, A. (2005). Multidimensional scaling of emotional responses to music: The effect of musical expertise and of the duration of the excerpts. Cognition and Emotion, 19, 1113–1139.
Boersma, P., & Weenink, D. (2005). Praat: Doing phonetics by computer (Version 4.3.21) [Computer program]. Retrieved from http://www.praat.org
Borod, J. C., Pick, L. H., Hall, S., Sliwinski, M., Madigan, N., Obler, L. K., et al. (2000). Relationships among facial, prosodic, and lexical channels of emotional perceptual processing. Cognition & Emotion, 14, 193–211.
Bors, D. A., & Stokes, T. L. (1998). Raven's Advanced Progressive Matrices: Norms for first-year university students and the development of a short form. Educational and Psychological Measurement, 58, 382–398.
Bowers, D., Blonder, L. X., & Heilman, K. M. (1991). Florida Affect Battery (FAB). Gainesville, FL: University of Florida.
Cohen, J. D., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: A new graphic interactive environment for designing psychology experiments. Behavioral Research Methods, Instruments, and Computers, 25, 257–271.
Črnčec, R., Wilson, S. J., & Prior, M. (2006). The cognitive and academic benefits of music to children: Facts and fiction. Educational Psychology, 98, 1–16.
Cuddy, L. L., Balkwill, L.-L., Peretz, I., & Holden, R. R. (2005). Musical difficulties are rare: A study of "tone deafness" among university students. Annals of the New York Academy of Sciences, 1060, 311.
Fitch, W. T. (2006). The biology and evolution of music: A comparative perspective. Cognition, 100, 173–215.
Gordon, E. E. (1995). Musical aptitude profile. Chicago: GIA Publications.
Jarvis, W. B. G. (2006). MediaLab, 2000 [Computer software]. New York: Empirisoft.
Juslin, P. N., & Laukka, P. (2003). Communication of emotions in vocal expression and music performance: Different channels, same code? Psychological Bulletin, 129, 770–814.
Juslin, P. N., & Sloboda, J. A. (Eds.). (2001). Music and emotion: Theory and research. Oxford, England: Oxford University Press.
Ladd, S., Silverman, K., Bergmann, G., & Scherer, K. (1985). Evidence for independent function of intonation contour type, voice quality, and F0 in signaling speaker affect. Journal of the Acoustical Society of America, 78, 435–444.
Lantz, M. E., Kilgour, A., Nicholson, K. G., & Cuddy, L. L. (2003). Judgments of musical emotion following right-hemisphere damage. Brain and Cognition, 51, 160–248.
Marques, C., Moreno, S., Castro, S. L., & Besson, M. (2007). Musicians detect pitch violation in a foreign language better than nonmusicians: Behavioral and electrophysiological evidence. Journal of Cognitive Neuroscience, 19, 1453–1463.
Mayer, J. D., Salovey, P., & Caruso, D. (2002a). Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT), Version 2.0. Toronto, Canada: Multi-Health Systems.
Mayer, J. D., Salovey, P., & Caruso, D. (2002b). Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT) user's manual. Toronto, Canada: Multi-Health Systems.
Mayer, J. D., Salovey, P., Caruso, D. R., & Sitarenios, G. (2003). Measuring emotional intelligence with the MSCEIT V2.0. Emotion, 3, 97–105.
Mayer-Salovey-Caruso Emotional Intelligence Test. (n.d.). In The sixteenth mental measurements yearbook. Retrieved December 2, 2007, from OVID WebSPIRS Mental Measurements Yearbook database.
McLaurin, W., Jenkins, J., Farrar, W., & Rumore, M. (1973). Correlations of I.Q. on verbal and non-verbal tests of intelligence. Psychological Reports, 22, 821–822.
Neubacker, P., Gehle, C., & Sourgens, H. (2004). Melodyne (Version 2.0) [Computer software]. München, Germany: Celemony Software GmbH.
Nicholson, K. G., Baum, S., Cuddy, L. L., & Munhall, K. G. (2002). A case of impaired auditory and visual speech prosody perception after right hemisphere damage. Neurocase, 8, 314–322.
Nicholson, K. G., Baum, S., Kilgour, A., Koh, C. K., Munhall, K. G., & Cuddy, L. L. (2003). Impaired processing of prosodic and musical patterns after right hemisphere damage. Brain and Cognition, 52, 382–389.
Nilsonne, A., & Sundberg, J. (1985). Differences in ability of musicians and nonmusicians to judge emotional state from the fundamental frequency of voice samples. Music Perception, 2, 507–516.
Patel, A. D. (2003). Language, music, syntax and the brain. Nature Neuroscience, 6, 674–681.
Patel, A. D. (2008). Music, language, and the brain. New York: Oxford University Press.
Patel, A. D., Foxton, J. M., & Griffiths, T. D. (2005). Musically tone-deaf individuals have difficulty discriminating intonation contours extracted from speech. Brain and Cognition, 59, 310–313.
Patel, A. D., & Iversen, J. R. (2007). The linguistic benefits of musical abilities. Trends in Cognitive Sciences, 11, 369–372.
Patel, A. D., Peretz, I., Tramo, M., & Labreque, R. (1998). Processing prosodic and musical patterns: A neuropsychological investigation. Brain and Language, 61, 123–144.
Patel, A. D., Wong, M., Foxton, J., Lochy, A., & Peretz, I. (2008). Speech intonation perception deficits in musical tone deafness (congenital amusia). Music Perception, 25, 357–368.
Peretz, I., Champod, A. S., & Hyde, K. (2003). Varieties of musical disorders. Annals of the New York Academy of Sciences, 999, 58–75.
Peretz, I., & Coltheart, M. (2003). Modularity of music processing. Nature Neuroscience, 6, 688–691.
Peretz, I., & Gagnon, L. (1999). Dissociation between recognition and emotional judgment for melodies. Neurocase, 5, 21–30.
Peretz, I., Gagnon, L., & Bouchard, B. (1998). Music and emotion: Perceptual determinants, immediacy and isolation after brain damage. Cognition, 68, 111–141.
Peretz, I., Gosselin, N., Tillmann, B., Cuddy, L. L., Gagnon, B., Trimmer, C. G., et al. (2008). On-line identification of congenital amusia. Music Perception, 25, 331–343.
Petrides, K. V., Niven, T., & Mouskounti, T. (2006). The trait emotional intelligence of ballet dancers and musicians. Psicothema, 18, 101–107.
Raven, J. C. (1986). Raven Progressive Matrices and Vocabulary Scales. London: H. K. Lewis & Co. Ltd.
Resnicow, J. E., Salovey, P., & Repp, B. (2004). Is recognition of emotion in music performance an aspect of emotional intelligence? Music Perception, 22, 145–158.
Salovey, P., & Grewal, D. (2005). The science of emotional intelligence. Current Directions in Psychological Science, 14, 281–285.
Schellenberg, E. G. (2004). Music lessons enhance IQ. Psychological Science, 15, 511–514.
Schellenberg, E. G. (2005). Music and cognitive abilities. Current Directions in Psychological Science, 14, 317–320.
Schellenberg, E. G. (2006, August). Music lessons and emotional intelligence. Paper presented at the 9th International Conference on Music Perception and Cognition, Bologna, Italy.
Scherer, K. R. (1986). Vocal affect expression: A review and a model for future research. Psychological Bulletin, 99, 143–165.
Scherer, K. R., & Ellgring, H. (2007). Multimodal expression of emotion: Affect programs or computational appraisal patterns? Emotion, 7, 158–171.
Schön, D. (2007, May). Do language and music exist in the brain? Paper presented at the annual conference of Language and Music as Cognitive Systems, Cambridge, England.
Schön, D., Magne, C., & Besson, M. (2004). The music of speech: Music training facilitates pitch processing in both music and language. Psychophysiology, 41, 341–349.
Schröder, M. (2001). Emotional speech synthesis: A review. Proceedings of Eurospeech, 7, 561–564.
Thompson, W. F., Schellenberg, E. G., & Husain, G. (2004). Decoding speech prosody: Do music lessons help? Emotion, 4, 46–64.
Wellman, H. M., Cross, D., & Watson, J. (2001). Meta-analysis of theory-of-mind development: The truth about false belief. Child Development, 72, 655–684.


Appendix A
Complete List of Neutral Content Sentences

Set 1
"The boy went to the store"
"The chairs are made of wood"
"The lamp is on the table"
"The shoes are in the closet"

Set 2
"The bottle is on the table"
"The leaves are changing color"
"Tomorrow, I'll go shopping"
"I rarely see my neighbor"

Set 3
"The box is in the kitchen"
"The book is on the shelf"
"The path goes through the forest"
"A car is in the driveway"

Appendix B
Summary of Acoustic Data for Speech Utterances Across Emotions

Acoustic cue              Anger     Fear      Joy       Sad
F0 median (Hz)            248.3     241.2     280.9     216.5
F0 mean (Hz)              264.2     253.4     307.5     230.6
F0 SD (Hz)                 85.5      71.4     104.4      88.8
F0 range (Hz)             368.5     395.9     426.1     433.8
F0 range (cents)         2450.6    2514.2    2456.4    2914.4
Jitter (local %)            2.77      2.94      2.62      3.20
F1 mean (Hz)              663.5     625.6     650.7     638.4
F1 bandwidth (Hz)         535.1     240.9     137.8     289.9
Duration (s)                1.63      1.54      1.54      1.92
Number of pauses            4.42      4.08      4.42      6.33
Amplitude SD (dB)          12.2      11.8      18.8      12.5
Amplitude range (dB)       54.2     115.9     143.2      42.7
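Note that the F0 range appears twice, in absolute and in relative units. Assuming the standard definition of the cent (the appendix does not state the formula), the value in cents depends on the ratio of the highest to the lowest tracked F0, whereas the value in Hz is their difference:

\[
\text{F0 range (cents)} = 1200 \,\log_{2}\!\left(\frac{F0_{\max}}{F0_{\min}}\right),
\qquad
\text{F0 range (Hz)} = F0_{\max} - F0_{\min}.
\]

Under this assumption, the 2450.6-cent range for anger corresponds to a frequency ratio of roughly \(2^{2450.6/1200} \approx 4.1\), which together with the 368.5 Hz span would place \(F0_{\min}\) near 118 Hz and \(F0_{\max}\) near 487 Hz, consistent with the reported median of 248.3 Hz.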

Received December 10, 2007
Revision received September 9, 2008
Accepted September 9, 2008