Music, Film, and Emotion - CiteSeerX

Quantitative and Visual Analysis of the Impact of Music on Perceived Emotion of Film
ROB PARKE, ELAINE CHEW, AND CHRIS KYRIAKAKIS
University of Southern California Viterbi School of Engineering

This paper presents quantitative and visual methods for the analysis of the effect of music on emotion perceived in film. We discover strong, visible, and quantifiable trends in the effect of music on perceived emotion of film. We perform studies using both selected classical and composed music segments annotated with diverse emotions, paired with ambiguous film clips. We collect and analyze viewers’ ratings of stress, activity, and dominance for the silent film and for the film with various music soundtracks. The results are mapped onto a three-dimensional emotion space for visual and quantitative analysis. Aggregate scatter plots and center of mass results are presented using this emotion space. We find that the center of mass of the perceived emotion of film and music combined is consistently situated on a path between the center of mass of the music-alone response and that of the film-alone response. Regression analysis based on this trajectory observation results in coefficients with high R² values (R² = 0.675 for stress, R² = 0.817 for activity, and R² = 0.813 for dominance) for the first study with classical music selections, and lower R² values (R² = 0.199 for stress, R² = 0.405 for activity, and R² = 0.660 for dominance) for the second study with composed music. We conclude that continuous treatment of the two-dimensional emotion space provides a metric for comparing and assessing emotion ratings, and that spatial interpolation in this space provides a viable method for predicting the effect of music on perceived emotion of film.
Categories and Subject Descriptors: H.1.2 [Models and Principles]: User/Machine Systems---Human factors; H.5.5 [Information Interfaces and Presentation]: Sound and Music Computing; J.5 [Arts and Humanities]: Performing Arts; J.5 [Arts and Humanities]: Fine Arts
General Terms: Design, Experimentation, Human Factors, Measurement
Additional Key Words and Phrases: film, video, audio, music, emotion rating, emotion space
ACM Reference Format: Parke, R., Chew, E., and Kyriakakis, C. 2007. Quantitative and Visual Analysis of the Impact of Music on Perceived Emotion of Film. ACM Comput. Entertain., X, X, Article X (XXX), 21 pages. DOI = XXXXXXXXXXXX http://doi.acm.org/XXXXXXXXXXXX

INTRODUCTION
We investigate the effect of music on viewers’ emotion response to film. The music accompanying a scene, for example, in a film, ballet, or opera, has a profound effect on its meaning, significance, and perceived emotion. Together, the visual events and the music and soundtrack create a rich, cohesive experience. Often, the music is composed or selected to elicit a specific emotion response. In the era of silent films, the director first specified the emotion of a scene, and then the organist chose musical examples labeled with that emotion to provide the background “mood music” during its screening [Cohen 2001].
__________
Authors’ address: University of Southern California Viterbi School of Engineering; email: {parke, echew, ckyriak}@usc.edu
This work was supported in part by the University of Southern California Annenberg Center, and the Integrated Media Systems Center, an NSF Engineering Research Center, under Cooperative Agreement No. EEC-9529152, and NSF Grants Nos. 0347988 and 0321377. It made use of Integrated Media Systems Center Shared Facilities supported by the NSF Cooperative Agreement No. EEC-9529152. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors, and do not necessarily reflect those of the Annenberg Center or the NSF.
Permission to make digital/hard copy of part of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date of appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.
Permission may be requested from the Publications Dept., ACM, Inc., 2 Penn Plaza, New York, NY 11201-0701, USA, fax: +1 (212) 869-0481, [email protected] ©2007 ACM 1544-3574/07/0100-ART6 $5.00 DOI 10.1145/1219124.1219130 http://doi.acm.org/10.1145/1219124.1219130

ACM Computers in Entertainment, Vol. X, No. X, Article X, Publication date: XXXX.


The perceived emotion impact of a scene is also due in large part to the greater context of the film, as well as the more complex issue of induced-emotional effect based on the viewers’ own life experiences. In the present paper we focus on quantifying and studying the effect of music (and the strength of this effect) on context-free film scenes. We compare emotion ratings for film without music, music alone, and film with music. Following numerous seminal papers in research on music and emotion, we represent emotion ratings as a three-dimensional vector of stress, activity, and dominance. Films having ambiguous mood and content were selected so as to allow for greater flexibility in interpretation. The film clips were paired with selections from two music datasets: a classical set selected for its span of Huron’s annotation of Thayer’s emotion space [Thayer 1989; Huron 2000], and a second set composed to evoke certain emotions for each film clip. We analyze the viewers’ and listeners’ ratings in the emotion space, and use regression models to determine the strength of influence of music and film on the perceived emotion. Our results suggest that the three-dimensional vector space of stress, activity, and dominance is a reasonable and continuous representation of human emotion, and that emotion ratings of film and music combined can be predicted from the separate emotion ratings.

The remainder of the paper is organized as follows: we begin with a review of related work with respect to representation of emotion in space, and other studies on film and music; next, we describe the experimental setup and methods, followed by a description of our analysis methods and results, and ending with the conclusions. An appendix provides the detailed plots generated from all the experiments.

1. LITERATURE REVIEW
In this section, we review research studies in emotion spaces and emotion perception from video and music.
Although the studies on emotion perception of film and music are most closely related to our present work, the discussion of such work cannot take place without first presenting the various research that has been done on the representation of emotion. Thus, we begin with a review of emotion spaces in Section 1.1, followed by a review of related studies on emotion perception in film with music in Section 1.2.

1.1. Emotion Space
Many constructs and variations thereof have been proposed and used to represent and quantify human emotion and the perception of emotion. A common way to measure emotion in the music perception literature uses semantic differential scaling, for example, in Marshall and Cohen’s work on emotion perception of animation with music [1988]. In their work, subjects are asked to rate the stimulus based on a set of bipolar adjectives (for example, good/bad, weak/strong, fast/slow), usually separated by a Likert-type scale (such as good = 1 and bad = 7). The use of semantic differential scaling has been attributed to Osgood et al. [1956], who grouped bipolar adjective pairs into three different factors, represented by orthogonal axes. These factors (evaluation, activity, and potency) were then used to define a three-dimensional emotion space. The specific names of the dimensions may vary by discipline, but the underlying emotion factors that are being measured are similar. For example, the web-based questionnaire interface used in our study, originally designed to measure emotional reaction to speech audio, uses valence, activation, and dominance [Grimm and Kroschel 2005], which correspond respectively to Osgood’s evaluation, activity, and potency. Another related model is Thayer's two-dimensional arousal model [1989]. Thayer divides a two-dimensional emotion space into four quadrants, separated by orthogonal axes of stress and energy.
Furthermore, he makes this space more accessible by ascribing to each quadrant representative emotions such as content, anxious, depressed, and exuberant. Huron has adapted this model and shown it to be effective at mapping the human-perceived mood labels of different musical pieces [2000]. This model can be viewed as a two-dimensional projection of Osgood’s three-dimensional space, where


stress and energy relate to valence and activity. In this paper we will refer to the two dimensions as stress and activity, and the third dimension as dominance.

[Figure 1: Thayer’s Two-Dimensional Emotion Space Model. Axes: Stress (Valence) and Activity. Quadrants: Exuberant/Excited (upper left), Anxious/Frantic (upper right), Content/Satisfied (lower left), Depressed/Sad (lower right).]

1.2. Emotion Perception from Video and Music
There has been a great deal of literature discussing emotion and mood induction as it relates to music, but not much that specifically connects music with video or film. One noteworthy exception is Marshall and Cohen's work [1988]. In the second of two studies, subjects used 12 pairs of bipolar adjectives (which were then condensed into evaluative, potency, and activity dimensions) to rate animations with and without music. The animation featured two minutes of shapes in motion coupled with "strong" classical music (minor key, accelerating tempo) and "weak" classical music (major key). They found that there was indeed a relationship between the effect of music and film on emotion, and the strongest connection between the two could be seen in the potency and activity dimensions. However, this relationship was shown not to be simply additive, especially for the evaluative dimension. Unfortunately, generalizing these findings to all types of visuals (or film specifically) is questionable because the visual shapes are vague. Sirius and Clarke [1994] extended this work by using a more involved animation and varied genres of music, again asking subjects to rate the emotion on 12 scales (4 each for evaluation, potency, and activity). Some of their results seemed to contradict the previous research, suggesting that the abstract visual animations were not interesting enough to maintain subject interest. Bolivar, Cohen, and Fentress studied semantic congruence of mood between music and video clips [1994]. Combining short clips of animals interacting (in a friendly or an aggressive manner) with music (happy or aggressive), the study found that subjects were more likely to identify correct matches between moods (e.g., friendly animal clip and happy music).
In a separate study, subjects were asked to rate the mood of the film, and the results showed that the average rating increased when the video and audio had congruent moods. In a study using actual film clips, Lipscomb and Kendall investigated how well subjects could determine the “correct” music for the film Star Trek IV: The Voyage Home [1994]. Presented with a scene (containing no audio) and five music clips from the actual movie soundtrack, subjects were asked to identify the correct (original, composer-intended) music for the given scene. Subjects were able to pick the correct music with a high degree of accuracy, though the scenes without humans posed the greatest challenge, probably because the subjects had no frame of reference without actor expressions and gestures. Bullerjahn and Güldenring investigated the impact of film music on the subjects’ prediction of the outcome of a scene [1994]. Subjects watched a short (10 min), intentionally ambiguous film with soundtracks from various genres and were asked to provide qualitative answers to open-ended questions about their interpretation of the events of the film. The study showed that the genre of music had a profound impact on the subjects’ analysis of the film; for example, the “thriller” soundtrack engendered responses


involving a brutal resolution, adventure, and intrigue while the “melodrama” music elicited responses about family relationships and a positive resolution to the film. This study was particularly interesting in its use of both film and actual film scores. Not only do the results suggest that music affects the mood of a scene, but they also suggest that it affects future perceptions of events in the film.

2. EXPERIMENT DESIGN
We describe here the experiments we have designed to gather participants’ emotion ratings of a set of film clips with music-alone, with film-alone, and with film-and-music-combined. This section presents the criteria and process for selecting the film and the music clips used in the study. It also details the systematic setup of the experiment, as well as the study interface created for presenting the clips to the participants.

2.1. Media
This section describes the media selection and preparation for our experiments.

2.1.1. Film: Scene Selection
Our criterion for selecting the film clips was that they be ambiguous in order to allow the music to have the greatest emotional impact on the indeterminate visual events. First, 24 clips (each 15-30 seconds) were excerpted from five movies for evaluation (5 clips from Amélie, 1 from Little Man Tate, 11 from Maria Full of Grace, 4 from Memento, and 3 from Three Kings). We specifically selected lesser known films in order to limit subjects’ prior memories of, or emotional associations with, the films. We extracted the scenes from DVD versions of the movies, removed the original audio (music, dialogue, etc.), and encoded the video as high quality MPEG-format files.
Next, we created a list of thirty-three possible descriptors for each scene and grouped the descriptors into twelve classes (the descriptors and mood classes are shown in Table 1). Each of the authors independently assigned descriptors to each film clip. We then measured the ambiguity of each film clip based on the number of adjective classes spanned by the descriptors: a film clip whose assigned adjectives spanned a larger number of classes was considered more ambiguous than one having a narrower span.

Table 1. Mood Categories For Film Selection

Aggregate Mood         Original Mood
positive               happy, pleasant, calm, humorous
negative               painful, unpleasant, sorrowful, anguished, sad, morose, depressing
resigned               resigned
exciting (negative)    anxious / on-edge, frightening, terrifying, dangerous
exciting (positive)    exciting, thrilling
grim                   grim, disgusting
compassionate          compassionate
determined             determined, matter of fact
romantic               romantic
uplifting              uplifting, hopeful, inspiring, triumphant
angry                  angry
puzzling               puzzling, confusing
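The ambiguity measure described above (the number of aggregate mood classes spanned by a clip's assigned descriptors) can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the mapping is one reading of Table 1, and the function name `ambiguity_score` and the two example annotations are hypothetical.

```python
# One reading of Table 1: each original descriptor maps to an aggregate class.
DESCRIPTOR_CLASS = {
    "happy": "positive", "pleasant": "positive",
    "calm": "positive", "humorous": "positive",
    "painful": "negative", "unpleasant": "negative", "sorrowful": "negative",
    "anguished": "negative", "sad": "negative", "morose": "negative",
    "depressing": "negative",
    "resigned": "resigned",
    "anxious / on-edge": "exciting (negative)",
    "frightening": "exciting (negative)",
    "terrifying": "exciting (negative)",
    "dangerous": "exciting (negative)",
    "exciting": "exciting (positive)", "thrilling": "exciting (positive)",
    "grim": "grim", "disgusting": "grim",
    "compassionate": "compassionate",
    "determined": "determined", "matter of fact": "determined",
    "romantic": "romantic",
    "uplifting": "uplifting", "hopeful": "uplifting",
    "inspiring": "uplifting", "triumphant": "uplifting",
    "angry": "angry",
    "puzzling": "puzzling", "confusing": "puzzling",
}


def ambiguity_score(assigned_descriptors):
    """Count the distinct aggregate mood classes spanned by the descriptors."""
    return len({DESCRIPTOR_CLASS[d] for d in assigned_descriptors})


# Hypothetical annotations for two clips (not data from the study).
narrow_clip = ["happy", "pleasant", "calm"]
broad_clip = ["happy", "sad", "puzzling", "thrilling"]

print(ambiguity_score(narrow_clip))
print(ambiguity_score(broad_clip))
```

Under this measure the second clip, whose descriptors fall into four different classes, would be treated as more ambiguous than the first, whose descriptors all fall into one class.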

To ensure that the clips would ultimately be the most open to interpretation, and ambiguous in content and emotion, we eliminated candidate clips that were deemed pivotal or distinctly memorable in the original film, or that presented actions or elements that could be perceived as emotionally polarizing, such as a character dying or a handgun being shown. We selected from the remaining seventeen clips according to the following


criteria: 1) scenes should be emotionally ambiguous (i.e., having high ratings in contrasting categories); and 2) exactly one clip should be chosen from each film so as to allow for greater variety in the candidate clips. For the contrasting category criterion, we selected clips that received relatively high ratings in both positive and negative mood categories, as well as exciting (positive) and exciting (negative) mood categories, or those in the puzzling category. Table 2 shows the relevant emotional ratings for the final clips selected for our experiment. In order to ensure that the subjects were not influenced by viewing multiple scenes from the same movie, exactly one scene was chosen from each film. In cases where there were two equally ambiguous possible selections, as was the case for Amélie and Maria Full of Grace, the composer of music set two (to be described in the coming section) made the final selection between the two candidate clips. In Table 2, the underlined clip represents the composer’s choice, and Table 3 describes the visual content of each of the final selections.

Table 2. Most Ambiguous Candidate Clips
Clips

Amélie

Memento

Tate

Three Kings

10

4

1

3

12

2

8

6 14

Maria

Key Categories

1

5

8

positive

2

6

6

negative

2

2

10

5

11

13

exciting (pos)

8

10

10

10

2

2

4

exciting (neg)

18

8

6

4

10

8

12

puzzling

4

4

2

2

2

4

2

Table 3. Final Clip Descriptions

Film           Description
Amélie         A woman finds a box hidden in the wall and opens it.
Memento        A truck pulls up to an old dilapidated shack; a man gets out, and walks to the shack (black & white).
Maria          A teenage girl is watching a sonogram monitor.
Tate           A young boy embraces a woman and crosses the street to meet another woman.
Three Kings    A group of people and soldiers walk across the desert.

2.1.2. Music: Selection and Composition
Next, we selected music to be paired with the video clips. We chose two separate groups of music for the experiment described in this paper: the first set consisted of classical instrumental music, and the second comprised film score-style music specifically composed for this study. For music set one, we selected four musical excerpts from the classical genre for each clip (twenty pieces in total). All selections (shown in Table 4) were solo instrument recordings with piano and/or strings, and unlike the film selections, many of the pieces could be considered familiar to most listeners. We categorized each piece under one of four disparate emotion descriptors (following similar emotion space classifications, for example [Juslin 2000]): content, depressed, exuberant, and anxious, and one piece from each category was paired with each film. To accommodate the short duration of the video clips, and to allow the music to have the greatest impact on the ambiguous video, we chose musical excerpts that are potent examples of the corresponding emotional descriptor; that is, we selected segments of music in which the mood was immediately apparent rather than segments where the mood developed over time (similar to the method employed in [Bullerjahn and Güldenring 1994]). We extracted the audio examples from CDs and edited them to fit the length of the video clips. For music set two, a professional composer, James Post, wrote four instrumental pieces in a popular style specifically for each of the five video clips (twenty pieces in all,


as in set one). The only instruction we gave to the composer was that each of the four music tracks for a given film should be distinct in its emotional content. This is in contrast to the selection criteria for music set one, where we used four disparate mood descriptors to choose musical segments. The intent here, in set two, was to allow for more ambiguity in the emotional intent of the music, as well as to simulate the effect of having a film composer create music for a specific emotional effect. Table 5 below provides short descriptions from the composer, which offer some insight into his intent for the music. All but one of the clips were instrumental. The composer recorded the clips and provided them as high-quality WAV audio files.

2.1.3. Combined Media
After removing the original audio from the video excerpts, we added the music for each study. Ultimately, there are five versions of each video clip in each study: one without audio, and four other versions with different audio tracks, resulting in a total of fifty video/audio clips (twenty-five for each study).

Table 4. Classical Music and Mood

Composer       Work                                                        Mood        Film
Mahler         Allegro Assai Und Sehr Trotzig                              Anxiety     Amélie
Debussy        Clair de Lune                                               Content     Amélie
Verdi          La Traviata: Act I: Prelude                                 Depressed   Amélie
Grieg          Holberg Suite, Op.40: Prelude                               Exuberant   Amélie
Radiohead      (nice dream)                                                Anxiety     Memento
Chopin         Prelude in Db, Op.28, No.15, “Raindrop”                     Content     Memento
Chopin         Prelude in e, Op.28, No.4                                   Depressed   Memento
Beethoven      Rage Over a Lost Penny                                      Exuberant   Memento
Radiohead      Airbag                                                      Anxiety     Maria
Schumann       Träumerei, Op.15, No.7                                      Content     Maria
Chopin         Piano Sonata No.2 in bb, Op.35: Funeral March               Depressed   Maria
Kreisler       Liebesfreud                                                 Exuberant   Maria
Rachmaninov    Piano Concerto No.2: Allegro scherzando                     Anxiety     Tate
Massenet       Thaïs: Meditation                                           Content     Tate
Liszt          Grandes Etudes de Paganini: No.3 in g#: La Campanella       Depressed   Tate
Mozart         Piano Concerto No.1 in D: Allegro                           Exuberant   Tate
Chopin         Etude in c, Op.10, No.12, “Revolution”                      Anxiety     Three Kings
Brahms         Intermezzo in Eb, Op.117, No.1                              Content     Three Kings
Rachmaninov    Prelude in c#                                               Depressed   Three Kings
Mozart         Violin Concerto No.4 D: Allegro                             Exuberant   Three Kings
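The clip count given in Section 2.1.3 (five versions of each of five films in each of two studies) can be checked with a short enumeration. This is only an illustrative sketch; the labels in `STUDIES` and `VERSIONS` are hypothetical names, not identifiers used in the study.

```python
from itertools import product

FILMS = ["Amélie", "Memento", "Maria", "Tate", "Three Kings"]
STUDIES = ["classical", "composed"]  # hypothetical labels for the two music sets
VERSIONS = ["silent", "music 1", "music 2", "music 3", "music 4"]

# Every (study, film, version) combination is one video/audio clip.
clips = list(product(STUDIES, FILMS, VERSIONS))
per_study = {s: sum(1 for c in clips if c[0] == s) for s in STUDIES}

print(len(clips), per_study)
```

Note that the silent version of each film appears once per study, which is why each film-alone clip ends up being rated twice.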

2.2. Web Interface
We presented the video/audio clips to viewers through a web-based interface. As the framework for the web interface, we started with the one originally developed by Michael Grimm to record the emotion ratings of audio speech samples [Grimm and Kroschel 2005]. In the interface, participants first provide some background information (age, sex, etc.), as well as their present mood. Participants are allowed to take the two studies separately, but each one has to be completed in its entirety in one sitting, although short breaks are possible. As an extension to Grimm’s framework, we developed support for video files. More importantly, we implemented the ability to randomize the order in which the files are played. When a participant logs into the interface, two random number sequences are generated: one to determine the order of films to be played and a second to determine the order of clips of that film. The music clip order is random, but the "silent" (Film-Alone) video clip is always played first in order to yield a control response. By randomizing the order of the remaining video clips, we control for potential biases resulting from film order. Based on the randomly generated sequence, each clip is played and the participant is asked to rate its dominance, valence, and activation, as well as the overall mood of the clip. To rate each of the three emotion space dimensions, participants are presented with a visual, five-point Likert-type scale featuring Self Assessment Mannequins (SAMs). SAMs (see Figure 2) provide graphic descriptions for each point on the Likert scale, and


have been shown to be effective tools for recording emotion from subjects [Grimm 2005]. To rate the overall mood, the participant chooses from among the four emotion space quadrant descriptors: content, depressed, exuberant, and anxious. There is no time limit for the clips, and participants can replay a clip as necessary. To avoid the more complex analysis of mood induction, participants are asked to rate the perceived emotion and mood of the clips, and not the mood engendered in them after viewing the clip.

Table 5. Composed Music and Descriptions

Description                                                   Film
Mysterious, foreboding, darkness                              Amélie
Curious, playful, sneaky, "I wonder what is in the box"       Amélie
Sad, mourning, longing, "this is my last hope"                Amélie
Anxious, happy, anticipating                                  Amélie
Sneaky, investigative, film noir                              Memento
Thought-provoking, passing time, building                     Memento
Chaos, angry, insanity                                        Memento
Moving on, mellow, "everything will be alright in time"       Memento
Sweetness, caring, motherly                                   Maria
Evil, omen, "now I have them!"                                Maria
Sadness, "a let down"                                         Maria
Craziness, out-of-control, nonsense                           Maria
Sadness, loss, confusion                                      Tate
Moving on for the better, inevitable, reserved emotions       Tate
Cheesy reunion, warm feelings, 80s nostalgia                  Tate
Anxiety, predator lurking near, scary                         Tate
Courage, with hope, "saving the people"                       Three Kings
Dreamy, ethereal, a nice walk                                 Three Kings
Sneaky, risky journey, "danger may lie ahead"                 Three Kings
Scary, agitated                                               Three Kings
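The presentation order described in Section 2.2 (random film order, silent clip first for each film, then the music versions in random order) can be sketched as below. This is an assumption-laden illustration, not the interface's actual code: the function name `presentation_order` is hypothetical, and the sketch draws a fresh random music order for each film, which the text does not state explicitly.

```python
import random

FILMS = ["Amélie", "Memento", "Maria", "Tate", "Three Kings"]


def presentation_order(films, n_music=4, rng=None):
    """Return a list of (film, version) pairs in presentation order."""
    rng = rng or random.Random()
    film_order = list(films)
    rng.shuffle(film_order)  # first random sequence: order of films
    order = []
    for film in film_order:
        music_order = list(range(1, n_music + 1))
        rng.shuffle(music_order)  # second random sequence: order of music clips
        order.append((film, "silent"))  # film-alone control is always first
        order.extend((film, f"music {m}") for m in music_order)
    return order


# A seeded generator makes the order reproducible for one participant.
order = presentation_order(FILMS, rng=random.Random(0))
for film, version in order[:5]:
    print(film, version)
```

Passing an explicit `random.Random` instance keeps each participant's sequence independent of any other use of the global random state.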

2.3. Control Group Baselines
As described above, the silent film clip was always presented before the clips featuring music to allow for a control rating of film alone. In addition, we conducted a separate study using the same interface to obtain a control group evaluation of the music clips alone.

Figure 2. Screenshot of Interface

2.4. Subjects
Forty-seven participants completed both studies via the web interface (partially completed studies were excluded from analysis and discarded). There were 19 females and 28 males, having an average age of 32.6 years. Twenty of the participants were


graduate students, and the subjects represented diverse professional and ethnic backgrounds. For the separate baseline evaluation of the music alone, seven participants contributed: 4 females and 3 males, with an average age of 28.9 years.

3. RESULTS
In this section, we present the results of the two studies: (1) the film scenes paired with classical music selections; (2) the same film scenes paired with composed music.

3.1. Scatter Plots
We aggregate all responses to each video/audio clip and present them in Appendix A (with two of the film results appearing in Tables 6-7). The results are organized by film. For each film, we have responses for five versions of the clip: one without sound, and four with different music segments. In the leftmost column, we present the results for the emotion ratings of the film without sound. In the center column, we present the aggregate ratings for each of the four different clips; and, in the rightmost column, we show the ratings for the music alone for each of the four music soundtracks. In the charts, we weight each point in the emotion space by the appropriate number of responses; the weighting is reflected in the radius and color of the disc representing the data point. The color bar next to each chart shows the color code for the percentage response. A note about the data: originally, the subjects' responses ranged from 1 to 5 along the valence, activity, and dominance dimensions. For visual presentation, these scores were normalized to range from -2 to 2, making the origin the "desired" ambiguous rating. Also, the valence axis was inverted and labeled stress. This was done to make the emotion space more compatible with Thayer’s emotion space described earlier [Thayer 1989; Huron 2000].

3.1.1. Film-Alone Baseline
The film-alone baseline charts (see, for example, the leftmost column in Table 6) represent the scatter plot for the responses for film alone (using the activity and stress axes).
In general, we see that the average response for each film-alone clip (see Tables 6a and 7a, and Appendix A) is more centrally distributed around the origin than the ratings for music-alone, which matches our intention that the clips be "ambiguous." For instance, the average rating of Amélie (Table 6a) is near the origin, and all four mood quadrants are well represented in the film-alone results, thus supporting the ambiguity of the film-alone content. On the other hand, Maria Full of Grace (see Table 7a) is much more skewed toward low stress, low activity, and contentment. This likely occurred because the clip’s visual scene (which featured a teenage girl seeing a sonogram) conveyed a positive and calming emotion. However, note that the film-alone scatter plots are more centrally distributed than the music-alone selections. Finally, the ratings for the dominance dimension (not featured in the graphs) also match the "ambiguity" criterion of central distribution near zero, and can be found in Appendix C (Table C-1). During each of the two studies, the one with classical selections and the one with composed music, participants were presented the silent version of each film, so each film-alone clip was rated twice. We found that the two ratings are very consistent with one another, with the difference being 3.14% on average for each of the three dimensions, which indicates a considerable level of intra-person reliability. These results are shown in Table C-1 in Appendix C.

3.1.2. Music-Alone Baseline
Scatter plots for the music-alone analysis are presented in Tables 6-7 and in Appendix A (again showing stress and activity). In contrast to the film-alone results, the music-alone aggregate data was significantly polarized (far from the center), which was our original intent. These trends can be found in both the classical selections and the composed music. Two other trends can be seen in the data.
First, in the classical study, our original determination of mood for the music clips is consistent with the subject responses. In 100% of the classical music selections, the majority of the subjects selected the same


mood as the researchers. Secondly, for both the classical and composed music-alone ratings, the weighted center of mass of the responses almost always occurs in the same quadrant as the majority mood quadrant. For example, in Appendix A, Table A-2a, six of seven respondents chose the mood of clip 11 (Chopin’s "Funeral March") to be "depressed," which is in the lower right quadrant, and the average stress and activity rating is in the lower right quadrant as well. In only two clips (clip 15, Liszt’s "La Campanella," and composed clip 25, "Sneaky") does the average stress and activity point occur on the quadrant boundary axis, and not in the quadrant itself (see Tables A-4a and A-5b). Other than these two exceptions, the mood ratings were consistent with the center of mass of the aggregate ratings.

3.2. Center of Mass Plots
To summarize the data for analysis, we present the next sequence of center of mass data plots. On these center of mass charts, each point represents an average of one of the previous scatter plot graphs: film-alone responses are represented by a red circle, the music-alone responses by a square, and the combined music and film responses by a star. There are five plots, one generated for each film. Each related pair of music-alone (square) and combined film-music (star) data points is presented in the same color. Finally, the influence of the music on the film is shown as the trajectory from the film-alone (red circle) through the combined film-music (star) to the music-alone (square) data points.

3.2.1. Study 1: Classical
Figures 9a-e display the center of mass plots for each film and the accompanying music, plotting stress ratings against activity ratings. As evidenced by the data, the music clearly has a profound impact on the emotion rating of the film clip. Memento, for instance, is a prototypical example of the hypothesized effect of music influencing the emotion rating of film (Figure 9c).
The somewhat ambiguous film has a center of mass near the origin, with polarized music clips residing in each of the four quadrants. For each music segment, note that the center of mass for the combined film-music clip is situated on a path between the film-alone center of mass and the respective music-alone center of mass. For the other films, this relationship persists but with slightly more variability, as can be seen in Figures 9a-e.

Figures 11a-e show the center of mass plots for the dominance versus activity ratings. As in the stress versus activity plots (Figures 9a-e), with few exceptions, the film-and-music-combined ratings tend to lie on a smooth and monotonically increasing or decreasing trajectory from the film-alone rating to the music-alone rating. Compared to the other center of mass plots (in Figures 9a-e), each trajectory from the film-alone through the music-and-film-combined to the music-alone rating is closest to being linear.

3.2.2. Study 2: Composed

The results for the composed music study using the stress and activity ratings (Figures 10a-e) are similar to the results for the classical music discussed above. In general, the music and film average occurs on a trajectory from the film-alone to the music-alone center of mass. For this second study, however, fewer parts of the emotion space are represented by the music segment ratings. Again using Memento as an example (Figure 10b), the music-alone centers of mass are congregated near the origin and occur only in the upper right quadrant (anxious) and lower left quadrant (content). This is because the composer was given freedom in choosing the moods of the clips.
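The center of mass points and the position of a combined rating along the film-to-music path can be sketched in a few lines. This is an illustration only, not the procedure used in the study: the function names and sample ratings are hypothetical, an unweighted mean stands in for the weighted center of mass (the weighting is not specified in this section), ratings are assumed to be centered on zero, and the "exuberant" label for the upper left quadrant is inferred from the three quadrants named in the text.

```python
def center_of_mass(ratings):
    """Unweighted mean of (stress, activity) rating pairs (an assumption)."""
    n = len(ratings)
    return (sum(r[0] for r in ratings) / n, sum(r[1] for r in ratings) / n)


def quadrant(point):
    """Map a (stress, activity) point to a mood quadrant.

    Assumes stress on the horizontal axis, activity on the vertical axis,
    and the origin at the neutral rating.
    """
    stress, activity = point
    if stress == 0 or activity == 0:
        return "boundary"
    if stress > 0:
        return "anxious" if activity > 0 else "depressed"
    return "exuberant" if activity > 0 else "content"


def position_along_path(film, combined, music):
    """Project the combined point onto the film-to-music segment.

    Returns t, where 0 means at the film-alone point and 1 means at the
    music-alone point.
    """
    dx, dy = music[0] - film[0], music[1] - film[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        raise ValueError("film and music centers of mass coincide")
    return ((combined[0] - film[0]) * dx + (combined[1] - film[1]) * dy) / length_sq


# Hypothetical ratings from three viewers for one film and one music segment.
film_ratings = [(0.0, 1.0), (1.0, -1.0), (-1.0, 0.0)]
music_ratings = [(2.0, 2.0), (1.0, 2.0), (3.0, 2.0)]
combined_ratings = [(1.0, 1.0), (1.0, 2.0), (1.0, 0.0)]

film_com = center_of_mass(film_ratings)
music_com = center_of_mass(music_ratings)
combined_com = center_of_mass(combined_ratings)
t = position_along_path(film_com, combined_com, music_com)
print(film_com, quadrant(film_com))
print(music_com, quadrant(music_com))
print(combined_com, "fraction of the way from film to music:", t)
```

A value of t between 0 and 1 corresponds to the observation that the combined rating sits between the film-alone and music-alone points.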


Table 6a. Classical Study: Scatter Plots – Amélie

[Scatter plots not reproduced. Columns: Film Alone; Film and Music Combined; Music Alone. Rows (Piece / Emotion):]
Mahler – Rondo (anxious)
Debussy – Clair de Lune (content)
Verdi – La Traviata: Prelude Act I (depressed)
Grieg – Prelude from Holberg Suite op. 40 (exuberant)


Table 6b. Composed Study: Scatter Plots – Amélie

[Scatter plots not reproduced. Columns: Film Alone; Film and Music Combined; Music Alone. Rows (Composer’s Notes):]
mysterious, foreboding, darkness
curious, playful, sneaky, "i wonder what is in the box"
sad, mourning, longing, "this is my last hope"
anxious, happy, anticipating


Table 7a. Classical Study: Scatter Plots – Maria

[Scatter plots not reproduced. Columns: Film Alone; Film and Music Combined; Music Alone. Rows (Piece / Emotion):]
Radiohead – Airbag (anxious)
Schumann – Traumerei op 16, no 7 (content)
Chopin – Piano Sonata #12, Funeral March (depressed)
Kreisler – Liebesfreud (exuberant)


Table 7b. Composed Study: Scatter Plots – Maria

[Scatter plots not reproduced. Columns: Film Alone; Film and Music Combined; Music Alone. Rows (Composer’s Notes):]
sweetness, caring, motherly
evil, omen, "now I have them!"
sadness, "a let down"
craziness, out-of-control, nonsense


Figure 9. Classical Study: Center of Mass Plots – Stress vs. Activity

[Plots not reproduced. Panels: (a) Amélie, (b) Maria, (c) Memento, (d) Tate, (e) Three Kings. Markers: Film Alone, Film & Music, Music Alone. Music segments per panel:]
Amélie: Rondo (A), Clair (C), Traviata (D), Grieg (E)
Maria: Airbag (A), Traumerei (C), Funeral (D), Liebesfreud (E)
Memento: (nice dream) (A), Raindrop (C), Prelude (D), Rage (E)
Tate: Concerto 3 (A), Thais (C), Campanella (D), Concerto 1 (E)
Three Kings: Etude (A), Intermezzo (C), Prelude (D), Concerto 4 (E)


Figure 10. Composed Study: Center of Mass Plots – Stress vs. Activity

[Plots not reproduced. Panels: (a) Amélie, (b) Maria, (c) Memento, (d) Tate, (e) Three Kings. Markers: Film Alone, Film & Music, Music Alone. Music segments per panel:]
Amélie: Mysterious, Curious, Sad, Anxious
Maria: Sweetness, Evil, A Let Down, Craziness
Memento: Sneaky, Thought-Provoking, Chaos, Moving On
Tate: Confusion, Inevitable, Nostalgia, Predator
Three Kings: Courage, Dreamy, Risky Journey, Scary


Figure 11. Classical Study: Center of Mass Plots – Dominance vs. Activity

[Plots not reproduced. Panels: (a) Amélie, (b) Maria, (c) Memento, (d) Tate, (e) Three Kings. Markers: Film Alone, Film & Music, Music Alone. Music segments per panel:]
Amélie: Rondo (A), Clair (C), Traviata (D), Grieg (E)
Maria: Airbag (A), Traumerei (C), Funeral (D), Liebesfreud (E)
Memento: (nice dream) (A), Raindrop (C), Prelude (D), Rage (E)
Tate: Concerto 3 (A), Thais (C), Campanella (D), Concerto 1 (E)
Three Kings: Etude (A), Intermezzo (C), Prelude (D), Concerto 4 (E)


Figure 12. Composed Study: Center of Mass Plots – Dominance vs. Activity

[Plots not reproduced. Panels: (a) Amélie, (b) Maria, (c) Memento, (d) Tate, (e) Three Kings. Markers: Film Alone, Film & Music, Music Alone. Music segments per panel:]
Amélie: Mysterious, Curious, Sad, Anxious
Maria: Sweetness, Evil, A Let Down, Craziness
Memento: Sneaky, Thought-Provoking, Chaos, Moving On
Tate: Confusion, Inevitable, Nostalgia, Predator
Three Kings: Courage, Dreamy, Risky Journey, Scary


Furthermore, the composed music tends to exhibit a stronger polarization effect on the film clips than the classical music does; that is, the film-and-music-combined centers of mass occur very close to the music-alone centers of mass. For example, in Figure 10e, showing the results for Three Kings, the film-and-music-combined and music-alone centers of mass for segments “Dreamy” (orange) and “Risky Journey” (dark blue) are nearly superimposed. One possible explanation is that the emotion of the music was more difficult for subjects to characterize, and thus it drew their attention from the film itself. Another explanation could be that the deliberately composed music employed established cues for emotional effect, thereby eliciting a more potent effect on film-and-music-combined clip ratings.

The center of mass plots using the dominance and activity ratings are shown in Figures 12a-e. As with the previous plots, the film-and-music-combined ratings fall along a path between the film-alone and music-alone ratings.

3.3. Regression Analysis

Following up on our observation that the film-and-music-combined rating is situated on a path between the film-alone and music-alone ratings, we performed multiple linear regression analysis on the data to investigate the relationship between the centers of mass. Considering each of the three emotion space dimensions separately, we hypothesized a linear relationship in which the film-alone average and music-alone average combine to yield the average of the film-and-music-combined clip. The predicted relationship between the centers of mass is as follows:

$$FM_{i,j} = \alpha_{i,j} \cdot F_{i,j} + (1 - \alpha_{i,j}) \cdot M_{i,j} + \varepsilon, \tag{1}$$

where i ∈ {1,2} is the experiment set, and j ∈ {stress, activity, dominance} is the emotion dimension. FM is the film-and-music-combined center of mass, F is the film-alone center of mass, M is the music-alone center of mass, α is the proportional weighting of film to music, and ε represents noise.

The results of the regression analysis for study 1, with classical music selections, can be summarized as follows for the three dimensions of stress, activity, and dominance:

emotion dimension j    α_{1,j}    R²        F-value    p
stress                 0.3825     0.6749    37.37      0.000
activity               0.4260     0.8168    80.24      0.000
dominance              0.4555     0.8134    78.46      0.000
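To make the form of equation (1) concrete, the sketch below estimates α for one emotion dimension by rearranging the model as (FM − M) = α (F − M) + ε and solving the resulting one-parameter least-squares problem. This is an assumption about one way such a fit could be set up, not a description of the regression procedure actually used (the detailed statistics are in Appendix D); the function names and the data are hypothetical.

```python
def fit_alpha(film, music, combined):
    """Least-squares estimate of alpha in FM = alpha*F + (1 - alpha)*M + eps.

    Rearranged as (FM - M) = alpha * (F - M) + eps, i.e. a one-parameter
    regression through the origin.
    """
    x = [f - m for f, m in zip(film, music)]
    y = [fm - m for fm, m in zip(combined, music)]
    sxx = sum(v * v for v in x)
    if sxx == 0:
        raise ValueError("film and music ratings are identical; alpha is undefined")
    return sum(a * b for a, b in zip(x, y)) / sxx


def predict(film, music, alpha):
    """Predicted combined ratings under the interpolation model."""
    return [alpha * f + (1.0 - alpha) * m for f, m in zip(film, music)]


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical centers of mass for one dimension (e.g. stress), four pairings.
# The combined values were generated with alpha = 0.25 and no noise.
film = [0.0, 0.0, 1.0, -1.0]
music = [2.0, -2.0, -1.0, 3.0]
combined = [1.5, -1.5, -0.5, 2.0]

alpha = fit_alpha(film, music, combined)
r2 = r_squared(combined, predict(film, music, alpha))
print("alpha =", alpha, "R^2 =", r2)
```

Under this reading, an α below 0.5 places the combined rating closer to the music-alone rating than to the film-alone rating.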

The results for study 2 were not as strong as those for study 1. The resulting coefficients for study 2, with composed music, are as follows:

emotion dimension j    α_{2,j}    R²        F-value    p
stress                 0.1741     0.1999    4.496      4.810×10^-2
activity               0.4114     0.4046    12.23      2.600×10^-2
dominance              0.8267     0.6599    34.93      0.000

The detailed statistics for the composed study can be found in Table D-2 in Appendix D. As in the previous case, from the R², F-values, and p-values, we can see that the stress data had more variability than the activity and dominance data. Overall, the R² statistics were much lower for this composed music study than for the classical music study.

Suppose we combine the results for experiments 1 and 2, and assume the following equation for relating the results:

$$FM_{j} = \alpha_{j} \cdot F_{j} + (1 - \alpha_{j}) \cdot M_{j} + \varepsilon, \tag{2}$$


where j ∈ {stress, activity, dominance} is the emotion dimension. The results of the regression analysis for both studies combined are as follows:

emotion dimension j    α_{j}      R²        F-value    p
stress                 0.3143     0.4999    37.98      0.000
activity               0.4520     0.6735    78.38      0.000
dominance              0.5801     0.6897    84.48      0.000
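As a reading aid, the reported F-values can be cross-checked against the reported R² values with the standard identity F = (R² / (1 − R²)) · (n − k − 1) / k for a regression with k predictors and an intercept. The sketch below assumes k = 1 and n = 20 film-music pairings per study (five films with four music segments each, so n = 40 for the combined analysis); these degrees of freedom are an assumption, not stated in this section, and the function names are hypothetical. The R² and F figures are copied from the three regression tables.

```python
def f_from_r2(r2, n_obs, n_predictors=1):
    """F statistic implied by R^2 for a regression with an intercept."""
    df_resid = n_obs - n_predictors - 1
    return (r2 / (1.0 - r2)) * df_resid / n_predictors


# (assumed n, [(reported R^2, reported F) for stress, activity, dominance])
REPORTED = {
    "study 1 (classical)": (20, [(0.6749, 37.37), (0.8168, 80.24), (0.8134, 78.46)]),
    "study 2 (composed)": (20, [(0.1999, 4.496), (0.4046, 12.23), (0.6599, 34.93)]),
    "combined": (40, [(0.4999, 37.98), (0.6735, 78.38), (0.6897, 84.48)]),
}


def max_abs_gap(reported=REPORTED):
    """Largest absolute difference between implied and reported F-values."""
    gaps = []
    for n_obs, rows in reported.values():
        for r2, f_reported in rows:
            gaps.append(abs(f_from_r2(r2, n_obs) - f_reported))
    return max(gaps)


for name, (n_obs, rows) in REPORTED.items():
    for r2, f_reported in rows:
        print(name, r2, round(f_from_r2(r2, n_obs), 2), f_reported)
print("largest gap:", round(max_abs_gap(), 3))
```

Under these assumed degrees of freedom, the implied and reported F-values differ by no more than what rounding R² to four digits would explain.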

As expected, the statistical parameters for the analysis of both studies combined reflect the less linear relationship in the composed study; the detailed statistics for this combined analysis can be found in Table D-3 in Appendix D. The activity and dominance dimensions were again more accurately modeled than the stress dimension. The coefficient data and statistical results, including error means and variances, can be found in Tables D-1, D-2, and D-3 in Appendix D.

4. CONCLUSION

This section provides a discussion of the results and their broader implications, as well as avenues for future research.

4.1. Discussion

We began our experiments with the supposition that music has a significant impact on the perceived emotional rating of film clips. To study the impact of music on film, we paired ambiguous or emotionally neutral visual scenes with music segments with strong emotion ratings. Our results support the original premise of a strong effect of music on film. Moreover, we have proposed and demonstrated a way to assess, quantitatively and graphically, the degree of the emotional impact. Using a model similar to Osgood’s three-dimensional space [1956], consisting of axes representing stress, activity, and dominance, we have shown that the resulting emotion rating of the film with music tends to reside between that of the film alone and that of the music alone. Our analyses show that emotion ratings of music alone and film alone are good predictors of the ratings of the music and film combined. We found that a linear combination of the emotion ratings of film alone and music alone fits the emotion rating of film with music with high R² (0.675, 0.817, and 0.813 for stress, activity, and dominance, respectively) for the classical music study and lower R² (0.199, 0.405, and 0.660 for stress, activity, and dominance, respectively), with corresponding p-values, for the composed music study.
The results suggest that the three-dimensional vector space first described in Osgood et al. [1956] is a reasonable and continuous representation of human emotion, and that emotion ratings of film-and-music-combined can be predicted from the separate emotion ratings for music-alone and film-alone. We show visually, first in Thayer’s and Huron’s two-dimensional space [Thayer 1989; Huron 2000], that spatial interpolation is a promising method for predicting the effect of music on perceived emotion of film. We also tested and verified similar behavior in other two-dimensional projections of the original Osgood space [1956]. We conclude that continuous treatment of the emotion space (three- or two-dimensional) provides a metric for comparing and assessing emotion ratings, and that spatial interpolation may be used to predict perceived emotion of film with music from that of music alone and film alone.

Though the two-dimensional models of Thayer and Huron use the dimensions of stress and activity, our data from the two experiment sets seem to indicate that dominance, not stress, might be a more reliable predictor of emotion rating. Dominance has a relatively high R² value in the experiment sets; the data show its significance to be higher than that for stress, and nearly equal to that for activity. These results can also be seen in the dominance versus activity plots for the classical study (Figures 11a-e). In these plots, the film-and-music-combined ratings tend to occur most nearly on a linear trajectory between the film-alone and music-alone ratings.
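The spatial interpolation described above can be illustrated with the study 1 (classical) coefficients reported in Section 3.3. In the sketch below only the α values come from the paper; the function name and the film and music ratings are hypothetical, and ratings are assumed to be centered on zero.

```python
# Film weights alpha_{1,j} reported for study 1 (classical music).
ALPHA_STUDY1 = {"stress": 0.3825, "activity": 0.4260, "dominance": 0.4555}


def predict_combined(film, music, alpha=ALPHA_STUDY1):
    """Interpolate between film-alone and music-alone ratings per dimension.

    Implements FM_j = alpha_j * F_j + (1 - alpha_j) * M_j without the noise term.
    """
    return {
        dim: alpha[dim] * film[dim] + (1.0 - alpha[dim]) * music[dim]
        for dim in alpha
    }


# Hypothetical centers of mass: an emotionally neutral film clip paired with
# a music segment rated as stressful, low in activity, and mildly dominant.
film = {"stress": 0.0, "activity": 0.0, "dominance": 0.0}
music = {"stress": 1.0, "activity": -1.0, "dominance": 0.5}

predicted = predict_combined(film, music)
for dim, value in predicted.items():
    print(dim, round(value, 4))
```

Because each study 1 coefficient is below 0.5, the predicted combined rating falls closer to the music-alone rating than to the film-alone rating in every dimension.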


4.2. Future Work

This study represents an investigation into understanding and quantifying the effect of music segments with strong emotional ratings on the emotion perception of ambiguous film clips. Our experiments used 40 different music segments annotated with emotion labels: 20 selected from existing classical pieces and 20 composed specially for the experiment. Further studies using larger data sets should be conducted to validate the linear relation observed between the ratings of music alone, film without music, and film with music. It would be worthwhile to explore, in the future, the perceived emotion of visual clips with high emotional ratings paired with emotionally ambiguous music, or the perceived emotion of film scenes with strong emotional content paired with music having equally strong, but opposite, emotional ratings. Such a study could investigate whether the linear model still holds in these cases.

Further extensions of this work could take the form of automating the emotion annotation process. For example, the qualitative emotion responses from subjects could be combined with audio signal analysis to correlate audio features with subject responses. The regression could then be performed between the emotion ratings of film-and-music-combined and those of film-alone together with statistically predicted emotion ratings of music-alone. In the same manner as Sloboda, who identified works having musical attributes causing particular physiological reactions in listeners [1991], the emotion ratings of music alone could be computed from statistical models trained on audio features and emotion annotations. This type of modeling has been used in speech [Yildirim et al. 2004] as well as music analysis [Lu et al. 2006].

ACKNOWLEDGEMENTS

We would like to thank James Post for composing the music for our second experiment set and Merrick Mosst for contributing to the classical music mood annotation and for his references on emotion space.
In addition, we are very grateful to the Speech Analysis and Interpretation Laboratory and Michael Grimm for providing us the initial framework for the web interface, and to all the participants who generously gave their time and feedback to evaluate all the video and audio clips.

REFERENCES

BOLIVAR, V. J., COHEN, A. J., AND FENTRESS, J. C. 1994. Semantic and Formal Congruency in Music and Motion Pictures: Effect on the Interpretation of Visual Action. Psychomusicology, 13, 28-59.
BULLERJAHN, C. AND GÜLDENRING, M. 1994. An Empirical Investigation of Effects of Film Music Using Qualitative Content Analysis. Psychomusicology, 13, 99-118.
COHEN, A. J. 2001. Music as a Source of Emotion in Film. In Patrik N. Juslin and John A. Sloboda (Eds.), Music and Emotion. Oxford: Oxford University Press, 249-272.
GRIMM, M. 2005. Emotionen in der Sprache: Datenbankkonzepte und Intonationstranskription. Universität Karlsruhe: Institut für Nachrichtentechnik - Institut für Automation und Robotik.
GRIMM, M. AND KROSCHEL, K. 2005. Evaluation of Natural Emotions Using Self Assessment Manikins. In Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop.
HURON, D. 2000. Perceptual and Cognitive Applications in Music Information Retrieval. ISMIR.
JUSLIN, P. N. 2000. Cue Utilization in Communication of Emotion in Music Performance: Relating Performance to Perception. Journal of Experimental Psychology, 26, 1797-1813.
KEHREIN, R. 2002. The Prosody of Authentic Emotions. In Proceedings of Speech Prosody Conference, Aix-en-Provence, France, 423-426.
LIPSCOMB, S. D. AND KENDALL, R. A. 1994. Perceptual Judgment of the Relationship between Musical and Visual Components in Film. Psychomusicology, 13, 60-98.
LU, L., LIU, D., AND ZHANG, H. 2006. Automatic Detection and Tracking of Music Audio Signals. IEEE Trans. Audio, Speech, and Lang. Proc., 14, 1, 5-18.
MARSHALL, S. K. AND COHEN, A. J. 1988. Effects of Musical Soundtrack on Attitudes toward Animated Geometric Figures. Music Perception, 6, 95-112.
OSGOOD, C. E., SUCI, G. J., AND TANNENBAUM, P. H. 1956. The Measurement of Meaning. Urbana: University of Illinois Press.
THAYER, R. 1989. The Biopsychology of Mood and Arousal. New York: Oxford University Press.
SIRIUS, G. AND CLARKE, E. F. 1994. The Perception of Audiovisual Relationships: A Preliminary Study. Psychomusicology, 13, 119-132.
SLOBODA, J. 1991. Music Structure and Emotional Response: Some Empirical Findings. Psychology of Music, 19, 110-120.
SMITH, J. 1999. Movie Music as Moving Music: Emotion, Cognition, and the Film Score. In Carl Plantinga and Greg M. Smith (Eds.), Passionate Views: Film, Cognition, and Emotion. Baltimore: The Johns Hopkins University Press, 149-167.
YILDIRIM, S., BULUT, M., LEE, C. M., KAZEMZADEH, A., BUSSO, C., DENG, Z., LEE, S., AND NARAYANAN, S. 2004. An Acoustic Study of Emotions Expressed in Speech. In Proceedings of ICSLP.
