Swiss Journal of Psychology, 70 (4), 2011, 233–240. DOI 10.1024/1421-0185/a000061


Original Communication

Recognition of Emotion in Moving and Static Composite Faces

Sarah Dagmar Chiller-Glaus¹,², Adrian Schwaninger³, Franziska Hofer¹, Mario Kleiner⁴, and Barbara Knappmeyer⁴,⁵

¹ Department of Psychology, University of Zurich, Switzerland; ² Swiss University of Distance Education, Brig, Switzerland; ³ School of Applied Psychology, Institute Humans in Complex Systems, University of Applied Sciences Northwestern Switzerland, Olten, Switzerland, and Department of Informatics, University of Zurich, Switzerland; ⁴ Department of Human Perception, Cognition and Action, Max Planck Institute for Biological Cybernetics, Tübingen, Germany; ⁵ Center for Neural Science, New York University, USA

Abstract. This paper investigates whether the greater accuracy of emotion identification for dynamic versus static expressions, as noted in previous research, can be explained through heightened levels of either component or configural processing. Using a paradigm by Young, Hellawell, and Hay (1987), we tested recognition performance of aligned and misaligned composite faces with six basic emotions (happiness, fear, disgust, surprise, anger, sadness). Stimuli were created using 3D computer graphics and were shown as static peak expressions (static condition) and 7 s video sequences (dynamic condition). The results revealed that, overall, moving stimuli were better recognized than static faces, although no interaction between motion and other factors was found. For happiness, sadness, and surprise, misaligned composites were better recognized than aligned composites, suggesting that aligned composites fuse to form a single expression, while the two halves of misaligned composites are perceived as two separate emotions. For anger, disgust, and fear, this was not the case. These results indicate that emotions are perceived on the basis of both configural and component-based information, with specific activation patterns for separate emotions, and that motion has a quality of its own and does not increase configural or component-based recognition separately.

Keywords: emotion recognition, composite faces, dynamic information, facial motion, configural versus part-based processing

Introduction

Recognizing other people's identity and emotional state is a basic and important skill in social interaction that we are able to perform with great accuracy and consistency. A common classification of the information contained in faces is the distinction between component information, referring to separable local elements such as the eyes, mouth, or nose (Carey & Diamond, 1977; Sergent, 1984), and configural information, referring to the spatial relations among these elements (Bruce, 1988; for reviews, see Schwaninger, Carbon, & Leder, 2003; Schwaninger, Wallraven, Cunningham, & Chiller-Glaus, 2006). Several hypotheses have been proposed to explain the mechanisms used in adult face processing, each postulating different roles for this local and global information. According to the holistic hypothesis, upright faces are stored as unparsed perceptual wholes in which components are not explicitly represented (Farah, Tanaka, & Drain, 1995; Farah, Wilson, Drain, & Tanaka, 1998; Tanaka & Farah, 1993). Numerous empirical findings can be interpreted to support this view.

Tanaka and Farah (1993), for example, found that facial components were much more difficult to recognize when presented in isolation than when presented in the context of a whole face. No such advantage of context was found, however, for inverted faces, scrambled faces, or houses. The authors thus concluded that face recognition relies mainly on holistic representations, in contrast to the recognition of objects. On the other hand, the component-configural hypothesis assumes that face recognition relies on explicit representations of both component and configural information (e.g., Leder & Bruce, 2000; Searcy & Bartlett, 1996; Sergent, 1984; Williams, Moss, & Bradshaw, 2004; for an overview, see Schwaninger et al., 2006). Schwaninger, Lobmaier, and Collishaw (2002) reduced configural and component information separately using a blurring and scrambling technique. They showed that previously learned intact faces could be recognized even when they were scrambled into their constituent parts, which challenges the assumption of purely holistic processing as proposed by Farah et al. (1995) and suggests that facial components are encoded and stored explicitly. Participants were also able to reliably recognize blurred whole faces, which did not contain any component information.


This result suggests that separate representations exist for component and configural information. An interesting effect in this context is the composite effect (Young, Hellawell, & Hay, 1987): When the top half of one face is combined with the bottom half of another face, recognition of the identity of each half is significantly impaired when the two halves are aligned compared to when they are misaligned. This research paradigm, originally designed for facial identity, also proved to be applicable to expression recognition (Calder, Young, Keane, & Dean, 2000). Calder et al. (2000) combined the top half of a face depicting one expression with the bottom half depicting another expression. Consistent with Young et al. (1987), they found recognition performance to be much more accurate when the two halves were shifted sideways (misaligned) than when they were aligned. However, when the two halves of a face depicted the same emotion rather than two different emotions, recognition performance was more accurate in the aligned condition than in the misaligned condition. Here, too, the assumption that the two halves fuse to form a single impression accounts for the results: Fusion facilitates recognition when the two halves depict the same emotion and impedes it when they do not. Overall, the study by Calder et al. shows that, as for the processing of facial identity, holistic or configural information is generally pivotal for the recognition of facial expression (with or without explicit representation of configural and component information). It is interesting to note that some authors take the composite face effect as evidence for holistic processing (see White, 2000), while others see it as support for separate configural and component processing. However, Young et al. themselves explicitly state that "configurational and featural information are (. . .) both likely to contribute to normal face recognition" (p. 758). This might, of course, still include holistic processes if one assumes that "holistic" refers to the integration of all information present in a face to form one global impression. Yet it seems unlikely that a single template lacking any explicit featural, or component-based, information, as originally stated in the holistic hypothesis, could suffice.

While most studies on identity or emotion recognition use static images, in reality faces are not static objects, but rather are constantly in motion. Compared to the wealth of research on emotion and identity recognition, only a few studies have addressed the role of dynamic information, for instance, whether faces are better recognized when seen in motion rather than statically (e.g., Ambadar, Schooler, & Cohn, 2005; Bassili, 1978). For the recognition of identity as well as expression, research has not yet established a clear answer as to whether motion facilitates recognition or not (O'Toole, Roark, & Abdi, 2002). The majority of researchers find that dynamic information has a beneficial effect on the recognition of faces (e.g., Knappmeyer, Thornton, & Bülthoff, 2003; Lander & Bruce, 2000, 2004). In a classic study using point-light stimuli (white dots on a black-masked face), Bassili (1978) showed that facial expressions were perceived correctly when in motion, but not when static. Dynamic information also seems to help people with impaired face recognition abilities, for instance, expression recognition in people with schizophrenia (Tomlinson, Jones, Johnston, Meaden, & Wink, 2006) or identity recognition in people with prosopagnosia (Steede, Tree, & Hole, 2007).

Dynamic information has also proved helpful under suboptimal viewing conditions such as poor illumination or long viewing distances (Lander, Christie, & Bruce, 1999). Even haptic recognition of live faces improved under dynamic conditions (Lederman et al., 2007). There are, however, contrary findings: Regarding the recognition of facial identity, Bruce et al. (1999) demonstrated that participants had difficulty matching unfamiliar target faces on video against arrays of photographs; accuracy proved to be poor in the static condition even when viewpoint and facial expression were standardized, and it did not improve when the target face was shown in motion. Christie and Bruce (1998) confirmed this lack of improvement. Regarding facial expression, Bould and Morris (2008) showed that, while subtle facial expressions were recognized better in motion, the effect was reduced for expressions of higher intensity. Also, Katsyri and Sams (2008) showed that dynamic information did not improve the identification of facial expressions that were already highly distinctive in the static condition. Kamachi et al. (2001) manipulated the velocity with which a neutral face turned into an emotional one. They found that happiness and surprise were better recognized from fast sequences and sadness from slow sequences, concluding that, depending on the velocity of change, motion assisted emotion recognition because the representations of emotions encode information about both dynamic and static properties. Overall performance was, nevertheless, slightly poorer for dynamic images than for static images, which was explained by the fact that, in the static condition, 100% of the target emotion was present for the full display duration, while in the moving condition an average of only 50% of the target emotion was presented over the course of the sequence. Hoffmann, Traue, Bachmayr, and Kessler (2006) stressed the importance of the correct velocity for each emotion. Taken together, previous research shows a tendency toward better recognition of dynamic stimuli, but the effect is far from clear-cut. It is, therefore, the aim of this study to investigate the role of motion in emotion recognition more thoroughly.

Another focus of this study is the interplay of component and configural processing with motion: Even if we assume that motion facilitates recognition, the exact mechanism of this process is unclear. There are several explanations of how motion might influence the recognition of information contained in faces, such as by providing additional information through an increased number of available views, by supporting a better representation of the face (e.g., a three-dimensional representation), by enhancing the perception of change (Ambadar et al., 2005), or by a quality of its own that is inherent to dynamic information (see, e.g., Lander & Bruce, 2000, for the recognition of facial identity, and O'Toole et al., 2002, for face recognition in general). While the role of component and configural information has been well explored in static face recognition, as described above, the contribution of these two types of processing to dynamic face recognition is not yet clear.

Apart from the question of how motion influences face processing in general, this study aims to examine the exact processes of motion or, more specifically, whether the possibly greater accuracy for dynamic expressions can be explained through heightened levels of either component or configural processing. Ambadar et al. (2005) found evidence that motion does not specifically enhance configural information. They used an inversion paradigm that allowed rather clear conclusions about the processing of configural information. However, to study the specific role of individual facial parts, another research paradigm seems more suitable, namely the composite face paradigm (Calder et al., 2000; Young et al., 1987). This design allows us to make predictions about separate regions of a face (e.g., whether a particular facial part might be more important for the recognition of certain emotions). It has been shown for static stimuli that misalignment of the two halves of the face disrupts configural information, thereby preventing the fusion of the two halves into one impression, which results in better recognition of the separate halves than when the halves are aligned. If dynamic information enhanced configural but not component-based processing, we would expect an interaction between alignment of the halves and motion: The difference between performance on aligned and misaligned halves would be greater for dynamic than for static stimuli. On the other hand, if dynamic information enhanced component-based but not configural processing, we would expect an interaction in the other direction, namely, the difference between performance on aligned and misaligned halves would be smaller for dynamic than for static stimuli. If we found no interaction between alignment and motion, we would interpret this as indicating that dynamic information does not enhance configural or component-based information, but has a quality of its own, be it a reduction of change blindness, as suggested by Ambadar et al. (2005), or something else.

Figure 1. Example of stimuli. (a) The six basic emotions; (b) aligned and misaligned composites. The top half shows sadness, the bottom half happiness.

Method

Participants

A total of 48 students (29 female) from the University of Zurich, aged 20 to 39 years (M = 25), participated voluntarily in this experiment. All had normal or corrected-to-normal vision.

Materials

For the static condition, color freeze-frame images of peak expressions of four actors (all female) expressing the six basic human emotions (anger, fear, surprise, sadness, disgust, happiness; Ekman & Friesen, 1976) were used as stimuli (see Figure 1). The 24 photographs were divided into an upper and a
lower segment by cutting along a horizontal line through the nose. These halves were recombined in such a way that every top emotion segment was joined with every bottom emotion segment of the same actor, resulting in a total of 36 stimuli per actor. Two versions of each of these stimuli were created according to the design used by Young et al. (1987): In the aligned condition (AL), the two segments were combined to form a single face, while in the misaligned condition (MAL) the segments were shifted horizontally so that the nose of the upper half was aligned with the edge of the face of the lower half. For 50% of the misaligned stimuli, the upper half was shifted to the right, and for 50% it was shifted to the left (counterbalanced across emotion type and composition). In total, a set of 288 stimuli were created (6 × 6 expressions × 4 actors × 2 conditions, viz., AL vs. MAL). For the dynamic condition, 7 s video sequences of the same six emotions ranging from a neutral expression to maximum emotion and back to a neutral expression were recorded of the same four female actors. Using computer simulation, the video sequences were edited so that they showed colored faces on a black background. As in the static condition, aligned and misaligned versions of the video sequences were created, which resulted in a total of 288 stimuli. The stimulus presentation duration was always 7 s.
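To make the construction of the composite stimuli concrete, the following sketch shows one way such aligned and misaligned composites could be assembled from two expression photographs of the same actor. This is an illustration only, not the authors' stimulus-generation code: the file names, the pixel height of the cut through the nose, and the size of the horizontal shift are assumptions.

```python
# Illustrative sketch (not the original stimulus code): combining the upper half
# of one expression with the lower half of another, aligned or misaligned.
# File names, cut height, and shift size are hypothetical values.
from PIL import Image

def make_composite(top_path, bottom_path, cut_y, misaligned=False, shift=None):
    """Paste the upper half of top_path onto the lower half of bottom_path.

    cut_y: vertical pixel position of the horizontal cut through the nose.
    shift: horizontal offset of the upper half in the misaligned condition;
           by default roughly half a face width, so that the nose of the upper
           half ends up near the edge of the lower face.
    """
    top_img = Image.open(top_path).convert("RGB")
    bottom_img = Image.open(bottom_path).convert("RGB")
    width, height = bottom_img.size
    if shift is None:
        shift = width // 2  # assumption, not a value reported in the paper

    offset = shift if misaligned else 0
    canvas = Image.new("RGB", (width + offset, height), "black")

    # The lower segment stays in place; the upper segment is shifted sideways
    # only in the misaligned (MAL) condition.
    canvas.paste(bottom_img.crop((0, cut_y, width, height)), (0, cut_y))
    canvas.paste(top_img.crop((0, 0, width, cut_y)), (offset, 0))
    return canvas

# Example: top half sadness, bottom half happiness (cf. Figure 1).
aligned = make_composite("actor1_sad.png", "actor1_happy.png", cut_y=260)
misaligned = make_composite("actor1_sad.png", "actor1_happy.png", cut_y=260,
                            misaligned=True)
aligned.save("actor1_sad_happy_AL.png")
misaligned.save("actor1_sad_happy_MAL.png")
```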

Design and Procedure

Four within-subjects factors were investigated: motion (static vs. dynamic), alignment (AL vs. MAL), composition (same emotion vs. different emotion), and emotion (happy, sad, surprised, angry, disgusted, fearful). The dependent variable was the number of correct answers (recognition of facial expression). The participants were tested in two sessions, one dynamic and one static; the order of the two conditions was randomized. The stimuli were presented in the center of a 15-inch screen with a resolution of 600 × 800 pixels. The viewing distance of 60 cm was maintained by a head rest; the stimuli covered a vertical visual angle of 6°. The experiment began with a warm-up session of 12 trials, followed by a main session of 288 trials. Emotion, composition, and alignment were counterbalanced across four blocks; the order of trials within each block was randomized. The participants' task was to identify the emotion depicted in each stimulus. After each response, the participants rated the perceived intensity of the emotion. A fixation cross (presentation duration: 1000 ms) preceded each stimulus. Static stimuli were presented for a maximum of 7 s (to match the presentation duration in the dynamic condition); in both the static and the dynamic condition, participants were free to stop the presentation before the maximum of 7 s had elapsed. The stimuli were followed by six answer buttons labeled with the six emotions, to be selected with the left mouse button. There was no limit on response time. The participants were free to take a short break after the first, second, and third blocks of the experiment. Half of the participants were instructed to assess the facial expression depicted in the upper segment, the other half to assess the expression in the lower segment. Each session lasted approximately 50 min.
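The factorial structure of one session (6 top emotions × 6 bottom emotions × 4 actors × 2 alignment conditions = 288 trials) can be illustrated with the short sketch below. The paper does not spell out the exact counterbalancing scheme, so the block assignment shown here is only a plausible simplification, and all names are assumptions.

```python
# Illustrative sketch of one session's 288 trials. The exact counterbalancing
# rule used in the study is not specified; here the trials are simply shuffled
# and split into four equally sized blocks.
import itertools
import random

EMOTIONS = ["happiness", "sadness", "surprise", "anger", "disgust", "fear"]
ACTORS = ["actor1", "actor2", "actor3", "actor4"]
ALIGNMENTS = ["AL", "MAL"]

trials = [
    {
        "top": top,
        "bottom": bottom,
        "actor": actor,
        "alignment": alignment,
        "composition": "same" if top == bottom else "different",
    }
    for top, bottom, actor, alignment in itertools.product(
        EMOTIONS, EMOTIONS, ACTORS, ALIGNMENTS
    )
]
assert len(trials) == 288  # 6 x 6 expressions x 4 actors x 2 alignment conditions

random.shuffle(trials)
blocks = [trials[i * 72:(i + 1) * 72] for i in range(4)]  # four blocks of 72 trials
```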

Results

The principal data analysis involved correct responses (hits). Note that the chance level for correct responses was .166 (one of six response alternatives). We compared viewing times for the static and moving conditions to ensure that any possible effect of motion was not the result of different viewing durations. There was no significant difference between the average duration of time participants viewed the static stimuli (1.41 s) and the dynamic stimuli (1.39 s), t(94) = 0.16, p = .88. One possible caveat here might be that, in the static condition, peak expressions were visible from stimulus onset, while in the moving condition they evolved over time, so that, given the short viewing times, the maximum might never have been reached. However, there was no significant difference between the participants' ratings of the intensity of the emotions in the two conditions, with M = 54.8, SD = 10.9 points for static stimuli versus M = 54.7, SD = 11.2 points for moving stimuli on a 90-point scale. One possible explanation for this might be that peak expressions in the dynamic stimuli were reached fairly soon after stimulus onset. A two-way ANOVA revealed neither a main effect of motion, F(1, 46) = 0.02, p = .88, η² = .00, nor a significant interaction between motion and emotion, F(5, 230) = 2.18, p = .08, η² = .05, indicating that the dynamic stimuli were no less intense or expressive than the static stimuli.

We analyzed the results for the top and bottom segments of the faces separately for two reasons: First, there seems to be evidence that not all emotions are equally well processed by their top and bottom parts. Calder et al. (2000), for example, showed that anger, fear, and sadness were better recognized from the top half, happiness and disgust from the bottom half. In our study, we did not find a main effect of top/bottom [five-way ANOVA with the factors top versus bottom, emotion, motion, composition, and alignment, F(1, 46) = 0.59, p = .45, η² = .01], but the factor did significantly interact with the critical factor emotion, F(5, 230) = 22.55, p < .001, η² = .33¹, supporting the findings of Calder et al. concerning an emotion-specific dominance of facial segments. Second, in our study, the distinction between top and bottom served only as a methodological control variable rather than being our subject of interest. Therefore, we did not include this factor in the main analysis, but rather interpreted the top and the bottom segments of the faces separately. The results are shown in Figure 2.

Figure 2. Effect of composition and alignment on recognition performance (averaged across all emotions) for static and moving expressions, plotted separately for the upper and lower halves of the face (hit rate on the vertical axis). Different = different emotion, Same = same emotion, AL = aligned, MAL = misaligned. Error bars represent standard deviations. Gray bars show the static condition, shaded bars the dynamic condition.

¹ The factor top/bottom also interacted with the following factors. Two-way interaction: top/bottom × alignment, F(1, 46) = 14.47, p < .001, η² = .30. Three-way interactions: top/bottom × emotion × motion, F(5, 230) = 6.71, p < .001, η² = .13; top/bottom × emotion × composition, F(5, 230) = 15.42, p < .001, η² = .25; top/bottom × emotion × alignment, F(5, 230) = 17.11, p < .001, η² = .27; and top/bottom × composition × alignment, F(1, 46) = 22.04, p < .001, η² = .32. Four-way interaction: top/bottom × emotion × composition × alignment, F(5, 230) = 9.73, p < .001, η² = .18.

Main Effects

A four-way ANOVA with the factors motion, emotion, composition, and alignment revealed a main effect for all four factors: motion, F(1, 23) = 40.77, p < .001, η² = .64 (bottom), and F(1, 23) = 15.88, p < .001, η² = .41 (top); emotion, F(5, 115) = 60.45, p < .001, η² = .72 (bottom), and F(5, 115) = 44.64, p < .001, η² = .66 (top); composition, F(1, 23) = 73.43, p < .001, η² = .76 (bottom), and F(1, 23) = 96.30, p < .001, η² = .81 (top); and alignment, F(1, 23) = 5.38, p < .05, η² = .19 (bottom), and F(1, 23) = 58.09, p < .001, η² = .72 (top). Also, the following interactions were significant for both the top and bottom halves of the face: emotion × motion, F(3.54, 81.36) = 4.36, p < .01, η² = .16 (bottom), and F(3.83, 88.05) = 5.05, p < .01, η² = .18 (top); emotion × composition, F(5, 115) = 13.25, p < .001, η² = .37 (bottom), and F(5, 115) = 10.26, p < .001, η² = .31 (top); emotion × alignment, F(5, 115) = 3.84, p < .01, η² = .14 (bottom), and F(3.66, 84.11) = 27.05, p < .001, η² = .54 (top); and alignment × composition, F(1, 23) = 12.97, p < .01, η² = .36 (bottom), and F(1, 23) = 65.30, p < .001, η² = .74 (top). (Note that the noninteger dfs here and in the remainder of the paper result from Greenhouse-Geisser corrections for nonsphericity.) In other words, the factor emotion interacted significantly with all other factors, whereas motion did not interact with any factor other than emotion. For the upper half, but not for the lower half, the three-way interaction between emotion, composition, and alignment was also significant, F(5, 115) = 15.57, p < .001, η² = .40. No other interactions were significant.

The four-way ANOVA thus revealed a strong main effect of emotion as well as significant interactions between emotion and all the other factors, indicating that not all emotions are processed the same way.
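For readers who wish to run a comparable analysis on their own data, a minimal sketch of such a four-way repeated-measures ANOVA (computed separately for each face half) is given below. The column names and the input file are hypothetical, and statsmodels' AnovaRM reports uncorrected degrees of freedom, so the Greenhouse-Geisser correction mentioned above would have to be applied separately.

```python
# Minimal sketch (not the authors' analysis code): four-way repeated-measures
# ANOVA on hit rates for one face half. Expects a long-format table with one
# row per participant and design cell; column names are hypothetical.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

df = pd.read_csv("hits_bottom_half.csv")
# Expected columns: participant, motion, emotion, composition, alignment, hit_rate

model = AnovaRM(
    data=df,
    depvar="hit_rate",
    subject="participant",
    within=["motion", "emotion", "composition", "alignment"],
    aggregate_func="mean",  # averages multiple trials per cell, if present
)
print(model.fit())  # F values with uncorrected dfs; apply GG correction separately
```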

Table 1a. Mean accuracy (hit rates) of emotion judgments, with standard deviations in parentheses, for the static and moving conditions. S = same emotion, D = different emotion, AL = aligned, MAL = misaligned.

             Static                                          Moving
Emotion      S-AL        S-MAL       D-AL        D-MAL       S-AL        S-MAL       D-AL        D-MAL

Bottom
Anger        .46 (.23)   .47 (.25)   .35 (.17)   .32 (.18)   .57 (.23)   .51 (.24)   .51 (.17)   .52 (.14)
Disgust      .72 (.22)   .68 (.24)   .65 (.19)   .69 (.19)   .68 (.25)   .72 (.24)   .69 (.20)   .69 (.21)
Fear         .49 (.25)   .55 (.22)   .34 (.13)   .40 (.14)   .64 (.27)   .70 (.16)   .48 (.15)   .55 (.18)
Happiness    1.00 (.00)  .99 (.05)   .96 (.07)   .97 (.05)   1.00 (.00)  1.00 (.00)  .93 (.09)   .97 (.05)
Sadness      .73 (.18)   .70 (.23)   .48 (.21)   .58 (.18)   .82 (.19)   .84 (.18)   .56 (.15)   .65 (.13)
Surprise     .80 (.19)   .74 (.20)   .58 (.17)   .63 (.14)   .88 (.18)   .81 (.18)   .60 (.16)   .64 (.16)

Top
Anger        .47 (.25)   .54 (.22)   .44 (.12)   .54 (.14)   .58 (.24)   .57 (.24)   .47 (.13)   .48 (.18)
Disgust      .51 (.21)   .33 (.22)   .28 (.15)   .24 (.15)   .61 (.19)   .57 (.29)   .46 (.16)   .44 (.17)
Fear         .67 (.27)   .63 (.27)   .46 (.19)   .63 (.17)   .71 (.26)   .73 (.28)   .51 (.13)   .64 (.17)
Happiness    .94 (.13)   .97 (.08)   .41 (.21)   .90 (.13)   .97 (.08)   .99 (.05)   .45 (.30)   .90 (.16)
Sadness      .84 (.18)   .86 (.19)   .74 (.20)   .84 (.17)   .90 (.16)   .90 (.18)   .70 (.19)   .79 (.20)
Surprise     .86 (.22)   .74 (.25)   .60 (.18)   .67 (.21)   .88 (.15)   .86 (.15)   .67 (.17)   .71 (.15)

Table 1b. Significance values of the three-way ANOVAs (factors: motion, composition, alignment) computed separately for each emotion and face half. Mot-comp = Motion × Composition, mot-align = Motion × Alignment, comp-align = Composition × Alignment; n.s. = not significant.

             Main effects                            Two-way interactions
Emotion      Motion     Composition  Alignment       Mot-comp   Mot-align  Comp-align

Bottom
Anger        p < .01    p < .01      n.s.            p < .05    n.s.       n.s.
Disgust      n.s.       n.s.         n.s.            n.s.       n.s.       n.s.
Fear         p < .001   p < .001     p < .01         n.s.       n.s.       n.s.
Happiness    n.s.       p < .001     n.s.            n.s.       n.s.       p < .01
Sadness      p < .05    p < .001     p < .01         n.s.       n.s.       p < .01
Surprise     n.s.       p < .001     n.s.            n.s.       n.s.       p < .01

Top
Anger        n.s.       p < .05      n.s.            n.s.       n.s.       n.s.
Disgust      p < .001   p < .001     p < .01         n.s.       p < .05    n.s.
Fear         n.s.       p < .001     p < .01         n.s.       n.s.       p < .001
Happiness    n.s.       p < .001     p < .001        n.s.       n.s.       p < .001
Sadness      n.s.       p < .001     p < .01         p < .05    n.s.       p < .05
Surprise     n.s.       p < .001     n.s.            n.s.       n.s.       p < .05

Other authors (e.g., Ambadar et al., 2005; Calder et al., 2000) have also found evidence for this: Happiness seems to be easier to recognize than other emotions, and different emotions seem to be more easily recognized from different facial areas, which, of course, would influence recognition patterns differently in a composite face paradigm. We therefore looked at the effect of motion on the six emotions separately. The results are displayed in Table 1. As indicated in Table 1a, there is a general tendency for dynamic stimuli to lead to better recognition. However, the main effect of motion is significant in only 4 of the 12 cases (Table 1b). In three cases, motion interacted with one of the other factors; otherwise, no effects of motion were observed. As for composition, we found a main effect in all cases (except for the lower half of disgust), indicating that it was more difficult to judge an emotion when it was displayed together with another emotion. A significant main effect of alignment was found in half of the cases (upper half of disgust, fear, happiness, and sadness; lower half of fear and sadness), and an interaction between composition and alignment for happiness, sadness, and surprise (both halves), as well as for the upper half of fear. As can be seen in Table 1a, performance for happiness was near ceiling in most conditions. In order to better capture the influence of composition and alignment on this emotion, we analyzed reaction times in addition to hit rates. On average, reaction times were shorter for dynamic than for static stimuli (top: 1185 ms vs. 1337 ms; bottom: 867 ms vs. 923 ms). However, a three-way ANOVA with the factors motion, composition, and alignment did not yield a main effect of motion, F(1, 23) = 1.50, p = .23, η² = .06 (top), and F(1, 23) = 0.18, p = .67, η² = .01 (bottom).

Overall, the analysis of reaction times revealed a pattern identical to that obtained with hit rates: a main effect of composition for both halves, F(1, 23) = 22.01, p < .001, η² = .49 (top), and F(1, 23) = 14.15, p < .01, η² = .38 (bottom); a main effect of alignment for the top half, F(1, 23) = 16.84, p < .001, η² = .42; and an interaction between composition and alignment for both the top and the bottom half, F(1, 23) = 21.42, p < .001, η² = .48 (top), and F(1, 23) = 9.52, p < .01, η² = .29 (bottom). The latter result likely stems from longer reaction times in the different-aligned condition, which is mirrored by the drop in hit rates in that condition (see Table 1a).

Discussion

We used the composite paradigm by Young et al. (1987) to investigate the effect of motion on facial expression recognition. First of all, our results suggest that emotions are easier to recognize in moving faces than in static ones, as can be seen in Figure 2. This finding is in line with previous research (e.g., Bassili, 1978). However, as the significant interaction between motion and emotion implies, this effect is not the same across all emotions. In fact, the large main effect of emotion shows that there are major differences in our ability to recognize each of the six emotions. Happiness seems to be exceptionally easy to recognize, a finding that is supported by a large number of other studies (e.g., Ambadar et al., 2005; Calvo & Lundqvist, 2008; Montagne, Kessels, De Haan, & Perrett, 2007). When the emotions are examined separately, the benefit of motion is smaller and reaches significance for only four of the emotions (and for each of them in only one of the two segments), while there is only a tendency toward


better performance with dynamic stimuli for the other emotions. These results might reflect the inconclusive findings of other researchers on the nature of dynamic information. When the cases are analyzed separately, however, the pattern seems plausible: Anger (bottom) and disgust (top), which are generally not recognized as accurately as other expressions, as well as fear (bottom) and sadness (bottom), benefit from motion. These are segments for which performance is relatively low compared to the rest of the emotions. On the other hand, happiness, which is recognized with great ease, is not influenced very much by motion. This finding might be interpreted along the lines of previous studies showing that motion has the strongest effect on expressions that are not already distinctive in the static condition (Katsyri & Sams, 2008), or that motion has a positive influence on recognition performance under difficult viewing conditions (Lander et al., 1999) or when information is ambiguous (Knappmeyer et al., 2003). Thus, it seems reasonable to argue that people make the most of additional dynamic information for expressions that are difficult to recognize under static conditions. Furthermore, even though the effect of motion on the separate emotions was not very large, we nevertheless found that motion at least did not reduce recognition performance. Thus, our results tend to point in the same direction as the study by Ambadar et al. (2005), who found that recognition of all emotions except happiness benefits from motion. In their experiment, they used subtle facial expressions rather than peak expressions. Ambadar et al. argued that the absence of a significant influence of motion might arise from the stimuli used: Peak expressions, as commonly used in experiments on emotion recognition, might be so intense that they mask the more subtle effects of dynamic information. This explanation might well account for our results. For dynamic stimuli, Nusseck, Cunningham, Wallraven, and Bülthoff (2008) showed that each emotion seems to have its own activation pattern, some relying only on single features, others relying on the interplay of several areas for consistent recognition.

This leads us to our second research question, namely, what role configural and component-based information plays in recognizing emotions. In our design, the crucial test for this question was the interaction between composition and alignment: A significant interaction indicates that alignment affects same-emotion faces and different-emotion faces in unequal ways, that is, that different-emotion faces are more difficult to recognize when aligned than when misaligned, while the opposite is the case for same-emotion faces. Such an interaction can be interpreted as evidence for the fusion of the two separate halves of the face into a single impression, which facilitates recognition in the case of two identical emotions and reduces it in the case of a distracting second emotion. Such a fusion would stress the importance of configurations in faces. We found a robust main effect of composition for all emotions for both static and moving stimuli (sole exception: the lower half of disgust, for which neither main effects nor interactions were significant). This effect indicates that it is easier to recognize an emotion when there is no other emotion present as a distraction. However, we did


find the critical interaction between composition and alignment for three emotions (happiness, sadness, and surprise; and for fear in the upper half as well), as can be seen in Table 1b from the reported significance values, as well as in Table 1a in terms of the numerical values (at least in most cases). These numbers indicate that same-emotion faces were easier to recognize when aligned, whereas different-emotion faces were easier to recognize when misaligned. This pattern represents the typical composite effect found by Young et al. (1987) and Calder et al. (2000). The significant interaction between composition and alignment suggests that the processing of these facial expressions is based on global rather than local information, which reflects the general consensus in research that configural information plays a pivotal role. The fact that our analysis of correct responses matched the reaction-time findings of Calder et al. implies that the composite effect is highly robust. However, since this interaction was found for only half of the emotions, we cannot exclude a role of local components. Anger, disgust, and fear (lower half) did not produce the critical interaction between composition and alignment. Not even the main effect of composition was significant for disgust (lower half), indicating that its recognition may rely solely on local facial features, leaving more global information unattended. Taken together, our data suggest that both configural and component information are assessed when people process facial expressions, whereby the extent to which each type of information is involved depends on the specific emotion.

Dynamic information had an enhancing effect on emotion recognition overall, but does it specifically influence component and configural information in separate ways? The composite design allows us to make predictions about both types of information relatively independently of one another. Using the inversion paradigm, Ambadar et al. (2005) already suggested that configural processing is not specifically supported by motion. We complemented this result by finding that component information is not specifically supported by motion either: The lack of any significant interactions between motion and the other factors (alignment and composition) suggests that motion influences global and local processing alike and does not enhance either one separately. Therefore, it seems likely that motion either has a beneficial effect on recognition performance because of the dynamic information per se or that it influences other processes in face recognition, such as change blindness, as suggested by Ambadar et al. Further research is necessary to address this issue.

Acknowledgments

This study was financially supported by the "Stiftung für wissenschaftliche Forschung an der Universität Zürich," University of Zurich, Switzerland. The authors would like to thank Gisela Schoch for data collection.


References

Ambadar, Z., Schooler, J. W., & Cohn, J. F. (2005). Deciphering the enigmatic face: The importance of facial dynamics in interpreting subtle facial expressions. Psychological Science, 16, 403–410.
Bassili, J. N. (1978). Facial motion in the perception of faces and of emotional expression. Journal of Experimental Psychology: Human Perception and Performance, 4, 373–379.
Bould, E., & Morris, N. (2008). The role of motion signals in recognizing subtle facial expressions of emotion. British Journal of Psychology, 99, 167–189.
Bruce, V. (1988). Recognizing faces. Hillsdale, NJ: Erlbaum.
Bruce, V., Henderson, Z., Greenwood, K., Hancock, P. J. B., Burton, A. M., & Miller, P. (1999). Verification of face identities from images captured on video. Journal of Experimental Psychology: Applied, 5, 339–360.
Calder, A. J., Young, A. W., Keane, J., & Dean, M. (2000). Configural information in facial expression perception. Journal of Experimental Psychology: Human Perception and Performance, 26, 527–551.
Calvo, M. G., & Lundqvist, D. (2008). Facial expressions of emotion (KDEF): Identification under different display-duration conditions. Behavior Research Methods, 40(1), 109–115.
Carey, S., & Diamond, R. (1977). From piecemeal to configural representation of faces. Science, 195, 312–314.
Christie, F., & Bruce, V. (1998). The role of dynamic information in the recognition of unfamiliar faces. Memory and Cognition, 26, 780–790.
Ekman, P., & Friesen, W. V. (1976). Pictures of facial affect. Palo Alto, CA: Consulting Psychologists Press.
Farah, M. J., Tanaka, J. W., & Drain, H. M. (1995). What causes the face inversion effect? Journal of Experimental Psychology: Human Perception and Performance, 21, 628–634.
Farah, M. J., Wilson, K. D., Drain, M., & Tanaka, J. N. (1998). What is "special" about face perception? Psychological Review, 105, 482–498.
Hoffmann, H., Traue, H. C., Bachmayr, F., & Kessler, H. (2006). Perception of dynamic facial expressions of emotion. In E. André, L. Dybkjær, W. Minker, H. Neumann, & M. Weber (Eds.), Perception and interactive technologies (pp. 175–178). Berlin: Springer-Verlag.
Kamachi, M., Bruce, V., Mukaida, S., Gyoba, J., Yoshikawa, S., & Akamatsu, S. (2001). Dynamic properties influence the perception of facial expressions. Perception, 30, 875–887.
Katsyri, J., & Sams, M. (2008). The effect of dynamics on identifying basic emotions from synthetic and natural faces. International Journal of Human-Computer Studies, 66, 233–242.
Knappmeyer, B., Thornton, I. M., & Bülthoff, H. H. (2003). The use of facial motion and facial form during the processing of identity. Vision Research, 43, 1921–1936.
Lander, K., & Bruce, V. (2000). Recognizing famous faces: Exploring the benefits of facial motion. Ecological Psychology, 12, 259–272.
Lander, K., & Bruce, V. (2004). Repetition priming from moving faces. Memory and Cognition, 32, 640–647.
Lander, K., Christie, F., & Bruce, V. (1999). The role of movement in the recognition of famous faces. Memory and Cognition, 27, 974–985.
Leder, H., & Bruce, V. (2000). When inverted faces are recognized: The role of configural information in face recognition. Quarterly Journal of Experimental Psychology, 53A, 513–536.


Lederman, S. J., Klatzky, R. L., Abramowicz, A., Salsman, K., Kitada, R., & Hamilton, C. (2007). Haptic recognition of static and dynamic expressions of emotion in the live face. Psychological Science, 18, 158–164.
Montagne, B., Kessels, R. P., De Haan, E. H., & Perrett, D. I. (2007). The emotion recognition task: A paradigm to measure the perception of facial emotional expressions at different intensities. Perceptual and Motor Skills, 104, 589–598.
Nusseck, M., Cunningham, D. W., Wallraven, C., & Bülthoff, H. H. (2008). The contribution of different facial regions to the recognition of conversational expressions. Journal of Vision, 8(8), 1–23.
O'Toole, A. J., Roark, D. A., & Abdi, H. (2002). Recognizing moving faces: A psychological and neural synthesis. Trends in Cognitive Sciences, 6, 261–266.
Schwaninger, A., Carbon, C. C., & Leder, H. (2003). Expert face processing: Specialization and constraints. In G. Schwarzer & H. Leder (Eds.), Development of face processing (pp. 81–97). Göttingen: Hogrefe.
Schwaninger, A., Lobmaier, J., & Collishaw, S. M. (2002). Component and configural information in face recognition. Second International Workshop on Biologically Motivated Computer Vision, Tübingen, Germany, 2002. Lecture Notes in Computer Science, 2525, 643–650.
Schwaninger, A., Wallraven, C., Cunningham, D. W., & Chiller-Glaus, S. D. (2006). Processing of identity and emotion in faces: A psychophysical, psychological and computational perspective. Progress in Brain Research, 156, 321–343.
Searcy, J. H., & Bartlett, J. C. (1996). Inversion and processing of component and spatial-relational information of faces. Journal of Experimental Psychology: Human Perception and Performance, 22, 904–915.
Sergent, J. (1984). An investigation into component and configural processes underlying face recognition. British Journal of Psychology, 75, 221–242.
Steede, L. L., Tree, J. J., & Hole, G. J. (2007). I can't recognize your face but I can recognize its movement. Cognitive Neuropsychology, 24, 451–466.
Tanaka, J. W., & Farah, M. J. (1993). Parts and wholes in face recognition. The Quarterly Journal of Experimental Psychology, 46A, 225–245.
Tomlinson, E. K., Jones, C. A., Johnston, R. A., Meaden, A., & Wink, B. (2006). Facial emotion recognition from moving and static point-light images in schizophrenia. Schizophrenia Research, 85(1–3), 96–105.
White, M. (2000). Parts and wholes in expression recognition. Cognition and Emotion, 14, 39–60.
Williams, M. A., Moss, S. A., & Bradshaw, J. L. (2004). A unique look at face processing: The impact of masked faces on the processing of facial features. Cognition, 91, 155–172.
Young, A. W., Hellawell, D., & Hay, D. C. (1987). Configural information in face perception. Perception, 16, 747–759.

Sarah Chiller-Glaus
Department of Psychology
University of Zurich
Binzmühlestrasse 14/22
CH-8050 Zurich
Switzerland
[email protected]