Perceptual Similarity of Shapes Generated From Fourier Descriptors

Copyright 1996 by the American Psychological Association, Inc. 0096-1523/96/$3.00

Journal of Experimental Psychology: Human Perception and Performance 1996, Vol. 22, No. 1, 133-143



James M. Cortese and Brian P. Dyre
University of Illinois at Urbana-Champaign

A metric representation of shape is preserved by a Fourier analysis of the cumulative angular bend of a shape's contour. Three experiments examined the relationship between variation in Fourier descriptors and judgments of perceptual shape similarity. Multidimensional scaling of similarity judgments resulted in highly ordered solutions for matrices of shapes generated by a Fourier synthesis of a few frequencies. Multiple regression analyses indicated that particular Fourier components best accounted for the recovered dimensions. In addition, variations in the amplitude and the phase of a given frequency, as well as the amplitudes of 2 different frequencies, produced independent effects on perceptual similarity. These results suggest that a Fourier representation is consistent with the perceptual similarity of shapes, at least for the relatively low-dimensional Fourier shapes considered.

James M. Cortese and Brian P. Dyre, Department of Psychology, University of Illinois at Urbana-Champaign. James M. Cortese is now at Lockheed Martin Engineering and Sciences, Houston, Texas. A portion of this research was presented at the June 1991 meeting of the American Psychological Society, Washington, DC. This work was partially supported by a National Science Foundation Graduate Fellowship. Correspondence concerning this article should be addressed to Brian P. Dyre, who is now at the Department of Psychology, University of Idaho, Moscow, Idaho 83844-3043. Electronic mail may be sent via Internet to [email protected].

Object recognition is a fundamental process in biological information processing, yet its mechanisms remain largely unknown. Part of the difficulty lies in the multiplicity of information sources potentially useful for recognition of an object: characteristic surface textures or hues, patterns of motion, contextual cues, and so forth. Contour shape, however, has been shown to be a predominant source of information for object recognition (Biederman & Ju, 1988). The visual system's facility in recognition of objects in contour drawings underscores this claim. Thus, an important component of a theory of object recognition is the representation of contour shape information.

One prominent approach for shape representation involves parsing an object into a restricted set of parts and storing these parts in a structural description of propositional relationships (Biederman, 1987; Hoffman & Richards, 1984; Marr & Nishihara, 1978; Reed, 1974; Sutherland, 1968; Winston, 1975). The problem of object recognition is made tractable by reducing shapes to clusters of two-dimensional (2-D) or three-dimensional (3-D) prototypes, the "alphabet" of visual recognition (Biederman, 1987, offered an analogy to phonemes). These prototypical shapes may then be hierarchically organized as parts or subparts of an object.

In a review, Pinker (1985) suggested that structural descriptions "factor apart the information in a scene without necessarily losing information in it" (p. 11). However, no structural description model proposed to date can live up to this claim. The difficulty lies in representing the infinite variety of shapes with a small set of primitives. Typically, the parts are distinguished only by qualitative differences in shape. For example, Biederman's model relies on volumetric primitives called geons, which are characterized by binary or ternary attributes of edge properties (e.g., straight vs. curved). W. Richards and Hoffman (1985) described curves by using codons, which are defined to be particular sequences of curvature extrema. Neither theory accounts for metric variations in shape within a part.

Naturally, the theories of W. Richards and Hoffman and of Biederman were not intended to be complete descriptions of the object recognition process, and as suggested by Biederman, dichotomous contrasts on a few attributes of the edges of an object may be sufficient for an initial categorization. Indeed, the nonmetric character of the representation may even be necessary for categorization, as particular exemplars within a category will generally vary metrically. It seems clear, however, that finer distinctions will require the processing of metric shape variation.

Given a sufficiently large and flexible library of primitives, it may be possible, in principle, for structural description models to specify shapes to an arbitrary degree of precision. For example, Marr and Nishihara (1978) proposed that objects can be recognized by detecting the axes of parts modeled by generalized cones. A generalized cone is the surface produced as a given cross-sectional shape is swept along an axis. The size of the cross-section and the curvature of the axis are allowed to vary smoothly. Thus, to completely specify a generalized cone, one needs to store many parameters, characterizing (a) the shape of the cross-section, (b) the variation in size of the cross-section as it is swept along the axis, and (c) the variation in curvature of the axis (Pinker, 1985). Even given all these parameters, some metric shape information will be lost: Some shapes would be more accurately captured if the shape of the cross-section were also allowed to vary as it is swept along the axis. Thus, for the structural description approach, a trade-off exists between the simplicity of the representation and the amount of metric shape information captured. This type of representation, therefore, cannot easily account for the recognition of highly irregular forms (e.g., an outline map of the United States).

An alternative approach to these analytic systems of shape representation is a holistic model that does not rely on parsing an object into parts. One such representation, developed in the computer vision literature, is the method of Fourier descriptors (FDs; Kuhl & Giardina, 1982; Persoon & Fu, 1974; Zahn & Roskies, 1972). In this method, given an arbitrary starting point on a closed contour, the function relating cumulative arc length to local contour orientation is expanded in a Fourier series (see Figure 1; the arc length is first normalized to 2π). Each term in the expansion represents a particular frequency, amplitude, and phase angle. The resulting FDs completely describe the shape of the closed contour and are invariant over changes in the position or the size of the contour. In addition, shifts in the starting point on the curve or rotation of the curve produce identical functions, differing by only a lag or an additive constant. Another advantage of this representation is that the global shape characteristics can be specified by just the first few low-frequency terms. Adding in higher frequencies then fills in local detail.

Figure 1. The cumulative angular bend function of an H is compared to that of a circle. The function is rotated and normalized before expansion in a Fourier series. From "Fourier Descriptors for Plane Closed Curves," by C. T. Zahn and R. Z. Roskies, 1972, IEEE Transactions on Computers, C-21, p. 270. Copyright ©1972 by IEEE Computer Society. Adapted with permission.

FDs have been shown to be effective for computer pattern recognition (Brill, 1968, as cited in Persoon & Fu, 1974; Granlund, 1972). Psychophysical experiments by Alter and Schwartz (1988) have shown that FDs may be useful for characterizing biological visual systems as well. Using shapes with nonzero power at only a single frequency, Alter and Schwartz found that adaptation to a particular FD harmonic frequency raised the amplitude-discrimination threshold for that frequency. Uhlarik (1989), using an alternative definition of an FD suggested by Kuhl and Giardina (1982), also demonstrated a frequency-specific adaptation effect.

FDs have found support in the neurophysiological literature as well. Schwartz, Desimone, Albright, and Gross (1983) found that approximately half of the visually responsive neurons in the inferior temporal cortex were selectively tuned to the frequency of FD stimuli (with a mean bandwidth of 2.0 octaves). Using frequencies from 2 to 64 cycles per perimeter (in an octave series), Schwartz et al. found that all frequencies were about equally represented, except for a reduced incidence of the frequency 64 cycles per perimeter. The shape of the tuning curve typically remained invariant with changes in the size of the stimulus, while the absolute level of the response varied. Also, approximately half of the FD-selective neurons showed similarly shaped tuning curves across changes in amplitude. However, Albright, Charles, and Gross (1985) subsequently demonstrated that the coding of contour shape by inferior temporal neurons is not a simple linear function of FD frequency. Neurons selective for a single-frequency FD stimulus did not exhibit the same selectivity when other frequencies were added to the stimulus, thus suggesting that inferior temporal neurons are not simple linear filters for frequency.

Despite this apparent invalidation of a simple linear model, FDs may still be useful characterizations of contour shape. The decomposition of a shape by discrete Fourier analysis provides a series of amplitude and phase parameters. To the extent that the visual system represents global shape by using FDs, perceived shape similarity should be at least monotonically related to the amplitudes and the phases of FD frequencies. In the present experiments, we tested this prediction by obtaining ratings of perceived shape similarity and subjecting them to multidimensional scaling.
Multidimensional scaling has proven useful in a number of studies of shape perception (e.g., Brown & Andrews, 1968; Garbin, 1990; L. G. Richards, 1971). It has also been used in conjunction with Fourier analysis of spatial frequency in the luminance domain. Harvey and Gervais (1978, 1981) obtained similarity ratings of visual texture patterns and submitted them to multidimensional-scaling analyses. They found that the amplitudes of the spatial frequency components of texture patterns accounted for much of the variance in the perceptual similarity space. In the present studies, if, on the one hand, perceived shape similarity is related to variation in the amplitude and phase parameters of the contour, then vectors representing these Fourier components should account for the dimensions of the recovered similarity space. If, on the other hand, qualitative stimulus attributes are used to represent shape (e.g., smoothness, number of parts, or orientation), then vectors representing these qualities should account for the majority of variability in similarity judgments. For this reason, we also obtained ratings on a number of unidimensional scales representing qualitative aspects of the stimuli.
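The Fourier synthesis described in the introduction can be illustrated with a minimal sketch. It assumes the tangent angle of the contour is a circle term plus a sum of cosine terms, one per FD frequency, and that the contour is obtained by numerically integrating the unit tangent along a perimeter normalized to 2π. The function name `fd_contour`, the clockwise default, and the midpoint integration are illustrative assumptions, not the authors' stimulus-generation code.

```python
import math

def fd_contour(amplitudes, phases_deg=None, n_steps=2000, clockwise=True):
    """Trace a contour whose tangent angle follows a Fourier series.

    amplitudes: {frequency in cycles per perimeter: amplitude in radians}
    phases_deg: {frequency: phase in degrees}; missing entries default to 0
    Returns n_steps + 1 (x, y) points for a perimeter of length 2*pi.
    """
    phases_deg = phases_deg or {}
    sign = -1.0 if clockwise else 1.0   # assumed traversal convention
    dt = 2.0 * math.pi / n_steps
    x = y = 0.0
    points = [(x, y)]
    for i in range(n_steps):
        t = (i + 0.5) * dt              # midpoint of the current arc step
        # Deviation of the tangent angle from that of a circle.
        bend = sum(a * math.cos(k * t - math.radians(phases_deg.get(k, 0.0)))
                   for k, a in amplitudes.items())
        theta = sign * t + bend         # circle term plus Fourier deviation
        x += math.cos(theta) * dt
        y += math.sin(theta) * dt
        points.append((x, y))
    return points

# With no Fourier terms the contour is a circle of radius 1.
circle = fd_contour({})
# Parameter values of one Experiment 1 stimulus (frequency 6 at 0.75 radian, 45 degrees).
shape = fd_contour({2: 0.50, 4: 0.50, 6: 0.75}, {6: 45.0})
```

When only even frequencies are present, as here, the tangent direction reverses after half the perimeter, so the traced contour closes on itself. For arbitrary coefficients closure is not guaranteed by this sketch.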



Experiment 1

In Experiment 1, we varied the amplitude and the phase of a single FD frequency. A Fourier representation of shape would predict that the perceptual similarity space should reflect variation of these two parameters. Also, because of the independence of amplitude and phase in a Fourier representation, we made an additional prediction: The amplitude and the phase of a given FD frequency should show independent effects on perceived similarity. That is, because the amplitude and phase values of a particular FD frequency are independent parameters of one term in the Fourier synthesis, the vectors representing these components in the multidimensional scaling solution should be orthogonal.

Method

Participants. Ten graduate students from the University of Illinois at Urbana-Champaign were paid for their participation. They had normal or corrected-to-normal vision and were naive regarding the purpose of the study.

Stimuli and apparatus. The stimuli consisted of nine simple closed contours whose shape was constructed from FDs (Zahn & Roskies, 1972) of frequencies 2, 4, and 6 cycles per perimeter.¹ The amplitude of frequency 6 was 0.50, 0.75, or 1.00 radian,² and the phase of frequency 6 was 0°, 45°, or 90°. The amplitude and phase values of frequencies 2 and 4 were held constant at 0.50 radian and 0°, respectively. The resulting nine stimuli are shown in Figure 2. The stimuli and the response scale were presented on an Apple Macintosh Plus computer. Participants used a mouse to make their responses. At a viewing distance of approximately 40 cm, the shapes subtended roughly 8° of visual angle.

Procedure. The experiment was conducted in one self-paced 30-60-min session that consisted of three phases. First, participants were presented with each shape individually, in a random order, so that they could become familiar with all of the shapes.


Participants were instructed to merely observe and familiarize themselves with each shape and then to press the mouse button to see the next shape. Following the familiarization phase, participants twice viewed, in a random order, each of the 45 pairs of shapes, once in each of the left-right orders.³ Participants were instructed to rate the similarity of each pair of shapes on a 9-point scale, with 1 as low similarity, 5 as some similarity, and 9 as high similarity. After the participants had rated the similarity of all of the pairs, they rated each shape individually on seven unidimensional 9-point scales ranging from 1 to 9: (a) width (from narrow to wide), (b) straightness (from crooked to straight), (c) smoothness (from bumpy to smooth), (d) number of parts (from 1 to 9), (e) complexity (from simple to complex), (f) symmetry (from nonsymmetrical to very symmetrical), and (g) orientation (from vertical to slanted). We chose these scales because informal observation suggested that they might be used to characterize the differences in the shapes.

Results and Discussion

We performed a monotonic multidimensional scaling analysis (Kruskal & Wish, 1978), using a Euclidean distance function, on the similarity judgments. We should note that Kruskal's algorithm estimates stimulus coordinates in a metric space (in this case, Euclidean) such that the rank order of the distances agrees, as closely as possible, with the rank order of the judged dissimilarities (Davison, 1983). The stress values were obtained for one- through four-dimensional solutions. The plot of stress versus the number of dimensions (see Figure 3) shows a clear elbow at the 2-D solution, suggesting that two underlying stimulus dimensions accounted for the majority of the variability in similarity judgments (Kruskal & Wish, 1978). The position of each shape within the 2-D space is shown in Figure 4. The solution is highly ordered, and the interpoint distances are approximately constant. The 3 × 3 matrix of shapes is essentially reproduced in the similarity space, which suggests that perceived dissimilarity is monotonically related to distance in a 2-D Euclidean space with, in this case, amplitude and phase as the two dimensions. Indeed, the relationship between distances in this space and perceived dissimilarities may be linear: A linear multidimensional-scaling analysis produced a 2-D solution with virtually the same pattern as that for the monotonic analysis.⁴
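The rank-order criterion behind monotonic scaling can be made concrete with a small sketch of a stress computation. It assumes Kruskal's stress-1 formula and uses the pool-adjacent-violators algorithm as the monotone regression step. The function names, the dictionary layout, and the simple handling of tied dissimilarities are illustrative assumptions; the scaling software used in the study is not described in the text.

```python
import math

def pava(values):
    """Pool-adjacent-violators: least-squares nondecreasing fit to a sequence."""
    blocks = []                          # each block holds [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the block means violate monotonicity.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def kruskal_stress(coords, dissim):
    """Stress-1 of a configuration.

    coords: {stimulus: coordinate tuple}
    dissim: {(i, j): judged dissimilarity} for each unordered pair
    """
    pairs = sorted(dissim, key=lambda p: dissim[p])   # rank order of judgments
    dists = [math.dist(coords[i], coords[j]) for i, j in pairs]
    disparities = pava(dists)                         # best monotone fit
    num = sum((d - h) ** 2 for d, h in zip(dists, disparities))
    den = sum(d * d for d in dists)
    return math.sqrt(num / den)

# Hypothetical three-stimulus configuration on a line.
coords = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (3.0, 0.0)}
stress_consistent = kruskal_stress(coords, {(0, 1): 1, (1, 2): 2, (0, 2): 3})
stress_reversed = kruskal_stress(coords, {(0, 1): 3, (1, 2): 2, (0, 2): 1})
```

Stress is zero when the distances are already in the same rank order as the judged dissimilarities and grows as the orders disagree. A full scaling routine would also adjust the coordinates to minimize this quantity, which the sketch omits.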


Figure 2. The stimuli used in Experiment 1. Moving across the matrix represents increasing amplitude on frequency 6; moving down the matrix represents changes in phase of frequency 6.

To aid in the interpretation of the two dimensions, we regressed vectors representing the unidimensional scales and the amplitude and the phase of Frequency 6 on the 2-D solution. The results of this analysis are shown in Table 1. The beta weights for each dimension represent the relative vector magnitude of each unidimensional scale along that dimension. The multiple correlation (R) represents the strength of the correlation between the resultant vector and the 2-D solution. We used a Bonferroni test to maintain familywise error at α = .05. As a result, correlations are considered statistically significant only at p < .005.

Two interesting results emerged from this analysis. First, the highest multiple correlations were obtained for the Fourier components, phase (R = .998) and amplitude (R = .984), as well as for the stimulus feature smoothness (R = -.984). This result indicates that the phase and the amplitude of Frequency 6 accounted for more variability in the judgments of similarity than did any of the unidimensional scales, with the exception of smoothness. Thus, it appears that participants' judgments of similarity were more likely based on abstract quantitative properties of the stimuli (the Fourier components) rather than on most of the qualitative stimulus features we examined. It is important to note that the unidimensional scales used in this experiment do not exhaust the possibilities for qualitative stimulus attributes. However, the high correlations found between Fourier components and perceptual similarity in this and the following experiments are certainly suggestive.

A second interesting result concerns the direction of the vectors fitted for amplitude and phase of Frequency 6. These vectors were approximately orthogonal (angular difference = 99.2°), which suggests that the effects on perceived similarity of variations in amplitude and phase were largely independent. This result is consistent with the independence of phase and amplitude in the Fourier domain and provides further evidence that Fourier components capture perceptually relevant aspects of the shapes.

Correlations between each unidimensional scale and the Fourier components of phase angle and amplitude also showed an interesting pattern (see Table 2). In particular, smoothness and complexity were highly correlated with amplitude (rs = -.961 and .965, respectively), and orientation was highly correlated with phase angle (r = .955), as one might expect on the basis of the definition of FDs. Generally, increasing the amplitude increases the size of bumps or lobes on the contour; small changes in phase shift the position or the orientation of the lobes.

Figure 3. The stress plots for Experiments (Exp) 1-3 for solutions of one through four dimensions.

¹ For brevity, hereinafter the notation "frequency x" stands for "frequency = x cycles per perimeter."

² Units of amplitude represent the orientation, in radians, of a vector tangent to the contour of a shape (cumulative angular bend). An amplitude of 0 radians is equivalent to a horizontal tangent vector pointing to the right. Positive amplitude values represent tangent vector directions rotating counterclockwise from the zero position, whereas negative amplitude values represent tangent vector directions rotating clockwise from the zero position (see Figure 1).

³ Participants saw the full 9 × 9 matrix of pairings, plus an additional presentation of the pairs along the main diagonal (each shape paired with itself), in order to collect two responses per unique pair.

⁴ For each experiment in this article, we examined both monotonic and linear multidimensional-scaling solutions. For all three experiments, both analyses produced virtually the same similarity space. Monotonic-scaling solutions are reported here because they do not make any overly restrictive assumptions (such as linearity) about the relationship between the Fourier components and similarity judgments.
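The vector-fitting step in the Results of Experiment 1, in which a stimulus property is regressed on the two dimensions of the scaling solution, can be sketched as an ordinary two-predictor regression on standardized variables. The function name `fit_vector` and the idealized 3 × 3 coordinates below are hypothetical; they are not the recovered coordinates from the study.

```python
import math

def fit_vector(dim1, dim2, prop):
    """Regress a stimulus property on two scaling dimensions.

    Returns (beta1, beta2, R): standardized regression weights and the
    multiple correlation between the property and the two dimensions.
    """
    def zscores(v):
        m = sum(v) / len(v)
        s = math.sqrt(sum((x - m) ** 2 for x in v) / len(v))
        return [(x - m) / s for x in v]

    def corr(a, b):
        return sum(p * q for p, q in zip(a, b)) / len(a)

    x1, x2, y = zscores(dim1), zscores(dim2), zscores(prop)
    r12, r1y, r2y = corr(x1, x2), corr(x1, y), corr(x2, y)
    det = 1.0 - r12 ** 2                 # assumes the dimensions are not collinear
    beta1 = (r1y - r12 * r2y) / det
    beta2 = (r2y - r12 * r1y) / det
    r_squared = beta1 * r1y + beta2 * r2y
    return beta1, beta2, math.sqrt(max(0.0, r_squared))

# Hypothetical, perfectly ordered 3 x 3 configuration.
dim1 = [-1, 0, 1, -1, 0, 1, -1, 0, 1]
dim2 = [1, 1, 1, 0, 0, 0, -1, -1, -1]
amplitude = [0.50, 0.75, 1.00] * 3       # varies only along dimension 1
b1, b2, R = fit_vector(dim1, dim2, amplitude)
```

In this idealized case the property aligns with the first dimension, so its weight on that dimension and the multiple correlation are both 1, and its weight on the second dimension is 0.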

Figure 4. The two-dimensional monotonic-scaling solution from Experiment 1. Points are labeled by amplitude (in radians) and phase (in degrees). Amp. = amplitude.

Experiment 2

Experiment 1 found that the Fourier components, phase and amplitude, accounted for the greatest amount of variability in similarity judgments. In addition, the results suggest that amplitude and phase have a nearly independent influence on judgments of similarity, an effect consistent with a Fourier representation of shape. Fourier theory also predicts another pattern of effects on similarity judgments: the independence of amplitude values at different frequencies. The purpose of Experiment 2 was to test this prediction by examining similarity judgments of shapes created by variations in relative amplitude and in mean amplitude of two FD frequencies. If contour shape is represented by Fourier components, then we expected that variation in these two parameters would also produce independent changes in perceived similarity.

Method

Participants. Fifteen undergraduate students from the University of Illinois at Urbana-Champaign received course credit for their participation. They had normal or corrected-to-normal vision and were naive regarding the purpose of the study.

Stimuli and apparatus. The stimuli and the apparatus were identical to those used for Experiment 1, with the following exceptions: The stimuli were constructed with frequencies of 2 and 4 cycles per perimeter. Mean amplitudes across these frequencies were 0.40, 0.60, or 0.80 radian. The amplitude difference between frequencies 2 and 4 was -0.40, 0.00, or 0.40 radian. The resulting set of nine stimuli is shown in Figure 5. The manipulation of relative and mean amplitude is equivalent to independent variation in the amplitudes of frequencies 2 and 4 cycles per perimeter. In Figure 5, the amplitude of frequency 2 increases as one proceeds from the upper left corner along the diagonal of the matrix to the lower right corner. The amplitude of frequency 4 increases as one proceeds from the upper right corner along the diagonal to the lower left corner. Thus, the amplitudes, in radians, at frequency 2 are 0.20 for Shape A; 0.40 for Shapes B and D; 0.60 for Shapes C, E, and G; 0.80 for Shapes F and H; and 1.00 for Shape I. The amplitudes, in radians, at frequency 4 are 0.20 for Shape C; 0.40 for Shapes B and F; 0.60 for Shapes A, E, and I; 0.80 for Shapes D and H; and 1.00 for Shape G. The phase values for both frequencies 2 and 4 were held constant at 0° for all stimuli.

Procedure. The procedure was identical to that used for Experiment 1.
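The equivalence between the mean-and-difference manipulation and the per-frequency amplitudes listed in the Stimuli section can be checked with a short worked example. The helper name `component_amplitudes` and the way letters are assigned to combinations are assumptions, chosen so that the output reproduces the amplitudes stated for Shapes A through I.

```python
def component_amplitudes(mean_amp, diff_amp):
    """Recover A(f=2) and A(f=4) from their mean and the difference A(f=2) - A(f=4)."""
    amp_f2 = round(mean_amp + diff_amp / 2, 2)
    amp_f4 = round(mean_amp - diff_amp / 2, 2)
    return amp_f2, amp_f4

means = [0.40, 0.60, 0.80]     # mean amplitude across frequencies 2 and 4 (radians)
diffs = [-0.40, 0.00, 0.40]    # amplitude difference (radians)

# Assumed lettering: one letter per (mean, difference) combination, in order.
labels = iter("ABCDEFGHI")
stimuli = {}
for m in means:
    for d in diffs:
        stimuli[next(labels)] = component_amplitudes(m, d)

for name, (amp_f2, amp_f4) in stimuli.items():
    print(name, amp_f2, amp_f4)
```

Each frequency takes five amplitude levels (0.20 to 1.00 radian) across the nine shapes, which is why the text describes the design as independent variation along the two diagonals of the stimulus matrix.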

Table 1
Beta Weights and Multiple Correlations of Vectors Fit to Two-Dimensional Scaling Solution for Experiment 1

Scale            Dimension 1   Dimension 2      R
Amplitude           0.955*        0.236       .984*
Phase               0.239        -0.970*      .998*
Width              -0.968*        0.168       .982*
Straightness       -0.904*       -0.241      -.936*
Smoothness         -0.978*       -0.111      -.984*
No. of parts        0.888*        0.311       .941*
Complexity          0.943*        0.202       .965*
Symmetry           -0.745        -0.441       .867
Orientation         0.274        -0.912*      .952*

* Significant at familywise α = .05 level.

Table 2
Correlation Matrix of Vectors for Fourier Components and Unidimensional Scales for Experiment 1

Vector              1      2      3      4      5      6      7      8      9
1. Amplitude       --
2. Phase angle    .000    --
3. Width         -.887  -.384    --
4. Straightness  -.896   .035   .869    --
5. Smoothness    -.961  -.123   .948   .926    --
6. No. of parts   .922  -.081  -.779  -.827  -.907    --
7. Complexity     .965  -.025  -.905  -.936  -.984   .903    --
8. Symmetry      -.790   .256   .608   .833   .712  -.758  -.731    --
9. Orientation    .064   .955  -.393  -.007  -.164   .032   .100   .185    --

Figure 5. The stimuli used in Experiment 2. Moving across the matrix represents increasing mean amplitude (A); moving down the matrix represents changes in relative amplitude. Rad. = radians; f = frequency.

Results and Discussion

As in Experiment 1, we performed a monotonic multidimensional scaling analysis (Kruskal & Wish, 1978), using a Euclidean distance function, on the similarity judgments, and the stress values were obtained for one- through four-dimensional solutions. The plot of stress versus the number of dimensions (see Figure 3) shows a clear elbow at the 2-D solution. The position of each shape within this 2-D space is illustrated in Figure 6. The solution is highly ordered, and the interpoint distances are approximately constant, with the exception of Shape C, the stimulus with the lowest amplitude at frequency 4. Thus, it appears that the perceived dissimilarities of these shapes are monotonically related to distance in a Fourier space, with amplitude of frequency 2 and amplitude of frequency 4 as the two dimensions (albeit with some nonuniformity along the dimension corresponding to frequency 4 at relatively low values).

The unidimensional scales and the relative and mean amplitudes were regressed on the two dimensions of the multidimensional-scaling solution. The results of this analysis are shown in Table 3. The highest multiple correlation was obtained for relative amplitude (R = .996). In addition, multiple correlations for mean amplitude (R = .973), amplitude at frequency 2 (R = .988), amplitude at frequency 4 (R = .982), width (R = .946), straightness (R = .982), smoothness (R = .972), number of parts (R = .927), and complexity (R = .954) were all significant at α = .05 (using the Bonferroni correction). These results complement those of Experiment 1 in that the Fourier components accounted for the greatest proportion of variability in similarity judgments.

As in Experiment 1, vectors for all of the scales were fit to the 2-D similarity space. Fitted vectors for the amplitudes of the two frequency components were found to be orthogonal (angular difference = 88.8°), suggesting that there were independent perceptual effects of variations in amplitude on two different frequencies. This observation, along with the observed independence of amplitude and phase in Experiment 1, is consistent with a representation of shape based on FDs.

The correlation matrix for the unidimensional scales and the amplitude components is shown in Table 4. In particular, variation in the amplitude of frequency 2 was highly correlated with width (r = -.905), whereas variation in the amplitude of frequency 4 was highly correlated with judgments of smoothness (r = -.959), number of parts (r = .928), and complexity (r = .961). These results suggest that low-frequency Fourier components capture coarse aspects of the perceived shape, such as width or aspect ratio, whereas high-frequency Fourier components capture detailed features, such as regularity of the contour, as one would expect on the basis of the nature of the FD.
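The orthogonality check on fitted vectors can be sketched as the angle between two direction vectors in the 2-D solution. The inputs below are the beta weights listed in Table 3 for the amplitudes of frequencies 2 and 4. Treating those weights directly as vector components is an assumption, as is the function name `angle_between`; the text does not state exactly how the reported angular difference was computed.

```python
import math

def angle_between(u, v):
    """Angle in degrees between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    cosine = dot / (math.hypot(*u) * math.hypot(*v))
    cosine = max(-1.0, min(1.0, cosine))   # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cosine))

# (Dimension 1, Dimension 2) beta weights from Table 3.
amp_f2 = (0.498, 0.854)
amp_f4 = (0.836, -0.512)
angle_f2_f4 = angle_between(amp_f2, amp_f4)
print(round(angle_f2_f4, 1))
```

An angle near 90° indicates that the two fitted directions are close to perpendicular, which is the sense in which the text describes the amplitude effects as independent.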

Figure 6. The two-dimensional monotonic-scaling solution from Experiment 2. Points are labeled by amplitude (in radians) at f = 2 and f = 4. Amp. = amplitude; f = frequency.

Table 3
Beta Weights and Multiple Correlations of Vectors Fit to Two-Dimensional Scaling Solution for Experiment 2

Scale                        Dimension 1   Dimension 2      R
Amplitude difference            0.240*       -0.966*      .996*
Mean amplitude                  0.943*        0.242       .973*
Amplitude frequency = 2         0.498*        0.854*      .988*
Amplitude frequency = 4         0.836*       -0.512*      .982*
Width                          -0.757*       -0.570*      .946*
Straightness                   -0.976*        0.105       .982*
Smoothness                     -0.910*        0.340       .972*
No. of parts                    0.793*       -0.478       .927*
Complexity                      0.896*       -0.324       .954*
Symmetry                       -0.641         0.594       .875
Orientation                    -0.624         0.653       .904

* Significant at familywise α = .05 level.

Experiment 3

Experiments 1 and 2 examined the effects of variation in the amplitude and the phase of a single frequency and the effects of variation in the amplitudes of two different frequencies, respectively. Experiment 3 tested the effects of variation in the phases of two different frequencies on judgments of similarity. As in Experiments 1 and 2, this was an investigation of the perceptual effects of variation in two parameters of the Fourier expansion. However, unlike the previous experiments, the parameters manipulated in this experiment did not exhibit independent effects on the shape of the contour, because the relative phases, and not the absolute phases, determined the shape.

Consider the case of a figure consisting of a nonzero amplitude at only a single frequency, for example, frequency 4. This figure is rotationally symmetric with four lobes. The size of the lobes is dependent on amplitude, whereas the orientation of the lobes is dependent on phase. If phase is varied, the overall effect on a figure is equivalent to a rotation equal in magnitude to the phase value divided by its corresponding frequency. Thus, if the phase of frequency 4 is increased from 0° to 90°, the figure is effectively rotated by 22.5°.

The situation is more complex if the example figure consists of nonzero amplitudes at two or more frequencies. Equal phase shifts at different frequencies do not result in a simple figural rotation; rather, these phase values must be scaled to their corresponding frequency. For example, consider a figure with nonzero amplitudes at frequency 6 and frequency 8 and a phase equal to zero at both frequencies. If the phase at frequency 6 is increased to 30°, this increase results in a scaled phase shift of 5° (30/6). If, in addition, the phase at frequency 8 is increased to 40°, also resulting in a scaled phase shift of 5° (40/8), then the resulting figure has exactly the same shape as the original, except that it has been rotated 5°. However, if the scaled phase shift is unequal across different frequencies, then the shape of the figure is changed.

Given that the same shape is produced by adding a scaled constant to all of the phase values, it is clear that (scaled) relative phase, not absolute phase, determines the shape of an object. Because relative quantities are involved, only n - 1 independent parameters (n - 1 phase differences) can be obtained from a set of n phase values. Recall that in Experiment 1, the stimuli were constructed with three frequencies (2, 4, and 6) of nonzero amplitude. The phases of frequency 2 and frequency 4 were held constant, whereas the phase of frequency 6 was varied. This variation of frequency 6 could be equivalently described as varying the relative phase between frequencies 2 and 6 or between frequencies 4 and 6. Fixing either, together with the fixed relative phase between 2 and 4, fixes the shape. In Experiment 3, three nonzero frequencies were again used, but two phase components were varied. Thus, all three phase differences (which were not pairwise independent) varied. If perceptual similarity is based on a Fourier representation, then relative-phase values, or phase differences, should account for the most variability in similarity judgments.
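The scaled-phase arithmetic in the preceding paragraphs can be written out as a small worked example. The function names `figural_rotation` and `is_pure_rotation` are hypothetical; the numbers are the ones used in the text.

```python
def figural_rotation(phase_deg, frequency):
    """Scaled phase shift: the rotation (degrees) implied by a phase shift at one frequency."""
    return phase_deg / frequency

def is_pure_rotation(phase_shifts_deg, tol=1e-9):
    """True if all frequencies share the same scaled phase shift.

    phase_shifts_deg: {frequency in cycles per perimeter: phase shift in degrees}
    """
    scaled = [figural_rotation(p, f) for f, p in phase_shifts_deg.items()]
    return max(scaled) - min(scaled) <= tol

print(figural_rotation(90, 4))              # single-frequency example: 22.5
print(is_pure_rotation({6: 30, 8: 40}))     # both scaled shifts are 5 degrees: True
print(is_pure_rotation({6: 30, 8: 30}))     # equal raw shifts, unequal scaled shifts: False
```

The last case illustrates the point made in the text: equal phase shifts at different frequencies change the shape rather than rotate it, because the scaled shifts (5° and 3.75°) differ.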

Table 4
Correlation matrix for the unidimensional scales and the amplitude components (Experiment 2)

Vector                        1      2      3      4      5      6      7      8      9     10     11
1. Mean amplitude            --
2. Amplitude difference     .000    --
3. Amplitude frequency = 2  .707  -.707    --
4. Amplitude frequency = 4  .707   .707   .000    --
5. Width                   -.902   .378  -.905  -.371    --
6. Straightness            -.884  -.332  -.390  -.860   .693    --
7. Smoothness              -.804  -.553  [illegible]  -.959   .546   .947    --
8. No. of parts             .647   .666  -.014   .928  -.358  -.875  -.956    --
9. Complexity               .819   .540   .197   .961  -.560  -.922  -.992   .939    --
10. Symmetry               -.400  -.709   .219  -.784   .153   .740   .798  -.840  -.735    --
11. Orientation            -.430  -.786   .251  -.860   .098   .703   .797  -.880  -.760   .876    --

Method

Results and Discussion

Participants. Eight undergraduate students from the University of Illinois at Urbana-Champaign were paid for their participation. They had normal or corrected-to-normal vision and were naive regarding the purpose of the study. Stimuli and apparatus. The stimuli and the apparatus were identical to those used for Experiments 1 and 2, with the following exceptions: The stimuli were constructed from frequencies of 4, 6, and 8 cycles per perimeter. The amplitudes for these frequencies were held constant at 0.40, 0.75, and 0.75 radian, respectively. The phase of frequency 4 was held constant at 0°, whereas the phases of frequencies 6 and 8 were varied independently. The phase of frequency 6 was 0°, —30°, or —60°. The phase of frequency 8 was 0°, 40°, or 80°. These phase values actually represented figural rotations of 0°, -5°, or -10° for frequency 6 and 0°, 5°, or 10° for frequency 8. The resulting set of nine stimuli is shown in Figure 7. Procedure. The procedure was identical to those used for Experiments 1 and 2.

As in Experiments 1 and 2, we performed a monotonic multidimensional scaling analysis (Kruskal & Wish, 1978), using a Euclidean distance function, on the similarity judgments, and the stress values were obtained for one- through four-dimensional solutions. The plot of stress versus the number of dimensions is shown in Figure 3. Unlike the previous two stress plots, this one lacks a clear elbow. However, when scaled in two dimensions, the stimuli produce a distinctive U-shaped pattern (see the discussion of dimensionality in Davison, 1983), which, along with the comparatively low stress values, suggests a one-dimensional solution. Figure 8 shows the position of each shape along this single dimension. That dimension appears to represent the scaled-phase difference between frequencies 6 and 8. This difference is represented in Figure 7 by the upper left to lower right diagonal. Shape A has the smallest difference in scaled phase (0°) between the two frequencies, followed by B and D (5°); C, E, and G (10°); F and H (15°); and finally I (20°). The stimuli have this same order on the single dimension of the multidimensional-scaling solution.

This result is quite interesting: Even though the manipulated phase components cannot produce independent perceptual effects, at least some relative-phase differences are salient dimensions of perceptual similarity. The dimension that emerged in this experiment was the relative-phase difference between the two frequencies with greater amplitudes.

The unidimensional scales, the phases of frequencies 6 and 8, and the difference in phase between frequencies 6 and 8 (scaled and unscaled) were regressed on the one-dimensional solution.⁵ The results of this analysis are shown in Table 5. Significant familywise (α = .05) correlations were obtained for number of parts (r = .973), scaled phase difference between frequencies 6 and 8 (r = .972), and phase difference between frequencies 6 and 8 (r = .959). No other correlations were significant.
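The ordering of the nine stimuli by scaled-phase difference can be reproduced from the design values alone. The following sketch assumes that the labels A through I run row by row through the Figure 7 matrix (columns indexing the phase of frequency 6, rows indexing the phase of frequency 8); that labeling is inferred from the grouping reported in the text, and the variable names are illustrative.

```python
from itertools import product

PHASE_F6 = [0, -30, -60]   # degrees, varies across the columns of the matrix
PHASE_F8 = [0, 40, 80]     # degrees, varies down the rows of the matrix
LABELS = "ABCDEFGHI"       # assumed row-major labeling of the 3 x 3 matrix

# Each stimulus is stored as (phase of frequency 6, phase of frequency 8).
stimuli = {
    label: (p6, p8)
    for label, (p8, p6) in zip(LABELS, product(PHASE_F8, PHASE_F6))
}

def scaled_phase_difference(p6, p8):
    """Difference in figural rotation: phase / frequency for 8 minus that for 6."""
    return p8 / 8 - p6 / 6

# Group the stimulus labels by their scaled-phase difference (degrees).
groups = {}
for label, (p6, p8) in stimuli.items():
    key = round(scaled_phase_difference(p6, p8), 6)
    groups.setdefault(key, []).append(label)

for diff in sorted(groups):
    print(f"{diff:5.1f} deg: {', '.join(groups[diff])}")
```

The groups come out in the same order as the positions reported on the single recovered dimension, which is what one would expect if that dimension tracks scaled-phase difference.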
The correlation matrix for the scales is shown in Table 6. Note the high correlation between scaled phase difference and number of parts (r = .901). This relationship, taken together with the results of Experiments 1 and 2, which found a significant relationship between number of parts and amplitude of frequencies 4 and 6, suggests that object parsing may be related to the amplitude and the relative phase of frequencies in this range (4 to 8 cycles per perimeter). An interesting topic for future research would be to explore this issue systematically.

To assess the orthogonality of the two manipulated phase components, we also examined the 2-D scaling solution. As expected, the vectors representing phase of frequency 6 and phase of frequency 8 were not orthogonal (angular difference = 117.5°). This result suggests that the perceptual effects of variation in two phase values are dependent.

Figure 7. The stimuli used in Experiment 3. Moving across the matrix represents changes in the phase of frequency 6; moving down the matrix represents changes in the phase of frequency 8. Deg. = degrees. (Figure not reproduced; the diagonal arrow in the figure is labeled "Increasing phase difference.")

Figure 8. The one-dimensional (1-D) monotonic-scaling solution from Experiment 3. Deg. = degrees; f = frequency. (Figure not reproduced; the horizontal axis is Dimension 1, running from -2.0 to 2.0, and each shape is labeled with its phases in degrees for f = 6 and f = 8.)

⁵ The absolute phase values of frequencies 6 and 8 may be equivalently described as relative-phase differences between frequencies 6 and 4 and between frequencies 8 and 4, respectively, because the phase of frequency 4 was held constant at zero.

Table 5
Correlations of Vectors Fit to One-Dimensional Scaling Solution for Experiment 3

Scale                          r
Phase difference (8 - 6)    -.959*
Scaled-phase difference     -.972*
Phase frequency = 6          .703
Phase frequency = 8         -.672
Width                       -.823
Straightness                -.230
Smoothness                   .319
No. of parts                -.973*
Complexity                  -.369
Symmetry                    -.765
Orientation                 -.672
* Significant at familywise α = .05 level.

General Discussion

The results of these experiments are consistent with a theory of shape recognition in which boundary curvature is expanded in a Fourier series. Across the three experiments, manipulations in amplitude and phase components of FDs best accounted for the dimensions of judged similarities. In addition, there was a striking tendency for judgments of shape similarities to be very closely related to changes in amplitude and phase. The multidimensional-scaling solutions presented in Figures 4 and 6 are highly ordered, with approximately constant interpoint distances between pairs of shapes with equivalent amplitude or phase variations.

Of particular importance was the evidence found for independent perceptual effects for variations of amplitude and phase on a single frequency and for variations of amplitudes on two different frequencies. Both of these results are predicted by a Fourier theory. In addition, the perceptual effects of variations in two phase values were found to be dependent, as indeed they should be, given that the shape of the contour is determined by relative-phase differences.

One interesting aspect of these results is the relationship of FDs to a number of descriptive labels of shape. Across the experiments, changes in amplitude on a lower frequency were related to judgments of width. Changes in amplitudes on higher frequencies were reflected by changes in perceived smoothness or complexity. Changes in phase, at least for a single phase change, were related to changes in perceived orientation. Thus, variation in specific Fourier components can account for variation in descriptive labels of shape.

It should be pointed out that these results do not necessarily rule out other schemes for shape measurement, of which there are many. Although the structural description approach reviewed in the introduction is prevalent in the psychological literature, other fields, notably computer vision and image processing, have relied more on geometric rather than syntactic models for pattern recognition (one useful recent anthology is that by O, Toet, Foster, Heijmans, & Meer, 1994). Some of these schemes are closely related to the FD method used in the present work. For example, Kuhl and Giardina (1982) developed an alternative method of expanding a shape contour in a Fourier series. Other approaches, such as polygonal harmonics (Maeder, Davison, & Clark, 1994), resemble FDs in that they quantitatively

represent the extent to which a shape approximates an n-fold rotationally symmetric figure at varying levels of n. Finally, many of these schemes share with FDs the ability to extract contour information at a variety of spatial scales (e.g., Dudek, 1994; Subirana-Vilanova, 1994). To the extent that they can postulate specific components that may correspond to a given dimension of the multidimensional space, these schemes may be able to account for the present results.

As part of a theory of object recognition, one possible limitation of FDs is that they preserve only the 2-D contour of the object and do not explicitly code 3-D information as do the structural description approaches of Biederman (1987) and Marr and Nishihara (1978). The projected image contours of 3-D objects may change drastically as a function of the orientation of the object relative to the viewer; this realization provided the impetus for the development of theories of 3-D object recognition. Recent research, however, has cast doubt on the idea of a 3-D model for object recognition (Rock & DiVita, 1987; Rock, Wheeler, & Tudor, 1989). Rather than a single 3-D model, recognition may be accomplished by a set of canonical views of an object, provided that the representation for each canonical view is stable over small orientation changes. A Fourier representation may provide this stability because the terms in the Fourier expansion will often undergo negligible changes due to shifts in viewpoint. Larger changes in terms of the expansion will correspond to different canonical views of the object.

Another possible limitation of this research is that only simple closed contours were studied. However, this was not essential, and open or intersecting contours can be coded in precisely the same way. With intersecting contours, however, there is an ambiguity regarding the direction of edge tracing for the Fourier expansion. A good-continuation or smoothness rule may eliminate that problem. In these experiments, we used only simple closed contours to minimize noise in the similarity judgments. A more difficult challenge for the model, however, will be the coding of compound patterns with multiple isolated contours. Naturally, one may separately code each contour by a Fourier expansion, but then one is left with the necessity of invoking a structural description, or some other method, for coding the relative positions of the contours.

Table 6

Scale                           1      2      3      4      5      6      7      8      9     10     11
 1. Phase difference (8 - 6)   --
 2. Scaled-phase difference  .990     --
 3. Phase frequency = 6     -.600  -.707     --
 4. Phase frequency = 8      .800   .707   .000     --
 5. Width                    .834   .819  -.468   .691     --
 6. Straightness             .345   .347  -.236   .255   .188     --
 7. Smoothness              -.312  -.224  -.293  -.610  -.440   .259     --
 8. No. of parts             .882   .901  -.685   .589   .810   .078  -.407     --
 9. Complexity               .187   .183  -.097   .162   .320  -.345  -.264   .497     --
10. Symmetry                 .793   .841  -.793   .396   .518   .576   .187   .631  -.142     --
11. Orientation              .678   .603  -.019   .834   .573   .032  -.737   .671   .443   .260     --
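The correlations among the four phase-based scales in the Experiment 3 correlation matrix follow directly from the stimulus design, so they can be recomputed as a consistency check. The sketch below is an illustration under the assumption that each scale is simply the design value for each of the nine stimuli; the helper `pearson` and the variable names are not from the original analysis.

```python
import math
from itertools import product

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Nine stimuli: every pairing of the three phases of frequency 6 and frequency 8.
pairs = list(product([0, -30, -60], [0, 40, 80]))
phase6 = [p6 for p6, _ in pairs]
phase8 = [p8 for _, p8 in pairs]
phase_diff = [p8 - p6 for p6, p8 in pairs]           # phase difference (8 - 6)
scaled_diff = [p8 / 8 - p6 / 6 for p6, p8 in pairs]  # scaled-phase difference

r_diff_scaled = pearson(phase_diff, scaled_diff)
r_diff_p6 = pearson(phase_diff, phase6)
r_diff_p8 = pearson(phase_diff, phase8)
r_scaled_p6 = pearson(scaled_diff, phase6)
r_scaled_p8 = pearson(scaled_diff, phase8)
r_p6_p8 = pearson(phase6, phase8)

for name, r in [
    ("phase difference vs scaled-phase difference", r_diff_scaled),
    ("phase difference vs phase of frequency 6", r_diff_p6),
    ("phase difference vs phase of frequency 8", r_diff_p8),
    ("scaled-phase difference vs phase of frequency 6", r_scaled_p6),
    ("scaled-phase difference vs phase of frequency 8", r_scaled_p8),
    ("phase of frequency 6 vs phase of frequency 8", r_p6_p8),
]:
    print(f"{name}: {r:.3f}")
```

These recomputed values agree with the tabled entries for the design variables (.990, -.600, .800, -.707, .707, and .000), which indicates that the two phase manipulations were crossed orthogonally even though their perceptual effects were not independent.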

Finally, an important point is that all of the shapes used in this study were created by Fourier synthesis of only a few frequencies. Thus, all of these shapes are unusually simple in the frequency domain. One may wonder how well the results will transfer to more complex shapes with nonzero power at many frequencies. Cortese (1992) considered this issue in detail, using a set of arbitrary shapes (not created by Fourier synthesis), performing a Fourier transform, and then testing for evidence of Fourier components underlying the judged similarities of these more complex shapes. One fundamental difficulty is the perceptual dependence of the relative-phase values, which appears to place a limit on the predictability of shape similarity for arbitrary shapes. As shown by Cortese, simple metrics like the Euclidean distance metric cannot provide unambiguous measures of phase similarity between two shapes, because one can measure relative phases in any number of ways, each producing a different measure of similarity. That is, after performing a discrete Fourier transform, the phase of each frequency may be measured relative to the phase of frequency 1, for example, or the phase of frequency k may be measured relative to the phase of frequency k - 1, or any number of other patterns. Using Euclidean or other distance metrics, each of these will produce a different measure of similarity of phase for any two given shapes. Even an arbitrary choice of reference point, say the phase of frequency 1, will not, in general, solve the problem: If the amplitude on frequency 1 is zero, then the phase is undefined. In sum, relative phase (together with amplitude) determines shape; relative phase may be measured in many ways, and each of these ways will produce a different index of shape similarity between any two given shapes. 
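The ambiguity described above can be made concrete with a small numerical sketch. It is only an illustration: the three sets of scaled phases (in degrees) are hypothetical, circular wraparound of phase is ignored, and the two reference schemes (relative to frequency 1, and relative to the preceding frequency) are the two examples mentioned in the text rather than a reconstruction of the analysis in Cortese (1992).

```python
import math

def relative_to_first(scaled):
    """Scaled phase of each frequency measured relative to the lowest frequency."""
    ks = sorted(scaled)
    return [scaled[k] - scaled[ks[0]] for k in ks[1:]]

def relative_to_previous(scaled):
    """Scaled phase of frequency k measured relative to frequency k - 1."""
    ks = sorted(scaled)
    return [scaled[b] - scaled[a] for a, b in zip(ks, ks[1:])]

def distance(u, v):
    """Euclidean distance between two relative-phase vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical scaled phases (degrees) for frequencies 1 to 4 of three shapes.
X = {1: 0.0, 2: 10.0, 3: 0.0, 4: 20.0}
Y = {1: 0.0, 2: 0.0, 3: 15.0, 4: 5.0}
Z = {1: 0.0, 2: -10.0, 3: -20.0, 4: 0.0}

d_first_xy = distance(relative_to_first(X), relative_to_first(Y))
d_first_xz = distance(relative_to_first(X), relative_to_first(Z))
d_prev_xy = distance(relative_to_previous(X), relative_to_previous(Y))
d_prev_xz = distance(relative_to_previous(X), relative_to_previous(Z))

print(f"relative to frequency 1: d(X, Y) = {d_first_xy:.2f}, d(X, Z) = {d_first_xz:.2f}")
print(f"relative to frequency k - 1: d(X, Y) = {d_prev_xy:.2f}, d(X, Z) = {d_prev_xz:.2f}")
```

With these made-up values, X is nearer to Y than to Z under the first scheme but nearer to Z than to Y under the second, so the choice of reference changes not only the distances but also the similarity ordering.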
Interestingly, this is a problem not merely for Fourier theories but indeed for any theory that attempts to measure metric spatial relations, for the problem has nothing to do with phases per se but rather with relative quantities. If we wish to measure the similarity in the spatial relations in, for example, two dot patterns, we will have the same problem of dependence, unless we make an arbitrary choice of an external reference point. Thus, there are inherent ambiguities that impede the construction of a distance measure of similarity for metric spatial relations.

Further research is needed to determine whether this problem of the dependence of the relative phases can be overcome. For instance, in Experiment 3, we found that the most salient dimension was the relative phase between the frequencies of greater amplitude, and informal experimentation suggests that phase manipulations of frequencies at low amplitude will have relatively minor effects on perceived similarity. Furthermore, after having viewed a large number of FD shapes, we have noticed a trend for phase variations to interfere more when the frequencies are close in value. Whether this trend will be useful in overcoming the problem of phase dependency requires further investigation.

In any case, it is clear from the results of the present experiments that amplitude and relative phase are perceptually salient dimensions of metric shape. Shape representations in the human visual system contain much more than dichotomous contrasts on edge properties. In order to account for the richness of shape perception, metric information, such as that provided by FDs, must be included.

References

Albright, T. D., Charles, R. A., & Gross, C. G. (1985). Inferior temporal neurons do not seem to code shape by the method of Fourier descriptors. Society for Neuroscience Abstracts, 11, 1013.
Alter, I., & Schwartz, E. L. (1988). Psychophysical studies of shape with Fourier descriptor stimuli. Perception, 17, 191-202.
Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94, 115-147.
Biederman, I., & Ju, G. (1988). Surface versus edge-based determinants of visual recognition. Cognitive Psychology, 20, 38-64.
Brown, D. R., & Andrews, M. H. (1968). Visual form discrimination: Multidimensional analyses. Perception & Psychophysics, 3, 401-406.
Cortese, J. M. (1992). Perceptual similarity of closed contours. Unpublished doctoral dissertation, University of Illinois at Urbana-Champaign.
Davison, M. L. (1983). Multidimensional scaling. New York: Wiley.
Dudek, G. (1994). Shape description and classification using the interrelationship of structures at multiple scales. In Y. O, A. Toet, D. Foster, H. J. A. M. Heijmans, & P. Meer (Eds.), Shape in picture: Mathematical description of shape in grey-level images (pp. 473-482). Berlin, Germany: Springer-Verlag.
Garbin, C. P. (1990). Visual-touch perceptual equivalence for shape information in children and adults. Perception & Psychophysics, 48, 271-279.
Granlund, G. H. (1972). Fourier preprocessing for hand print character recognition. IEEE Transactions on Computers, C-21, 195-201.
Harvey, L. O., & Gervais, M. J. (1978). Visual texture perception and Fourier analysis. Perception & Psychophysics, 24, 534-542.
Harvey, L. O., & Gervais, M. J. (1981). Internal representation of visual texture as the basis for the judgment of similarity. Journal of Experimental Psychology: Human Perception and Performance, 7, 741-753.
Hoffman, D. D., & Richards, W. (1984). Parts of recognition. Cognition, 18, 65-96.
Kruskal, J. B., & Wish, M. (1978). Multidimensional scaling. Beverly Hills, CA: Sage.
Kuhl, F. P., & Giardina, C. R. (1982). Elliptic Fourier features of a closed contour. Computer Graphics & Image Processing, 19, 230-258.
Maeder, A. J., Davison, A. J., & Clark, N. N. (1994). Polygonal harmonic shape characterization. In Y. O, A. Toet, D. Foster, H. J. A. M. Heijmans, & P. Meer (Eds.), Shape in picture: Mathematical description of shape in grey-level images (pp. 463-472). Berlin, Germany: Springer-Verlag.
Marr, D., & Nishihara, H. K. (1978). Representation and recognition of the spatial organization of three-dimensional shapes. Proceedings of the Royal Society of London (Series B), 200, 269-294.
O, Y., Toet, A., Foster, D., Heijmans, H. J. A. M., & Meer, P. (Eds.). (1994). Shape in picture: Mathematical description of shape in grey-level images. Berlin, Germany: Springer-Verlag.
Persoon, E., & Fu, K. (1974). Shape discrimination using Fourier descriptors. Proceedings of the Second International Joint Conference on Pattern Recognition, 126-130.
Pinker, S. (1985). Visual cognition: An introduction. In S. Pinker (Ed.), Visual cognition (pp. 1-63). Cambridge, MA: MIT Press.
Reed, S. K. (1974). Structural descriptions and the limitations of visual images. Memory & Cognition, 2, 329-336.
Richards, L. G. (1971). Toward a psychophysics of form: A multidimensional scaling analysis of form perception. Unpublished doctoral dissertation, University of Illinois at Urbana-Champaign.
Richards, W., & Hoffman, D. D. (1985). Codon constraints on closed 2D shapes. In A. Rosenfeld (Ed.), Human and machine vision II (pp. 207-223). Boston: Academic Press.
Rock, I., & DiVita, J. (1987). A case of viewer-centered object perception. Cognitive Psychology, 19, 280-293.
Rock, I., Wheeler, D., & Tudor, L. (1989). Can we imagine how objects look from other viewpoints? Cognitive Psychology, 21, 185-210.
Schwartz, E. L., Desimone, R., Albright, T. D., & Gross, C. G. (1983). Shape recognition and inferior temporal neurons. Proceedings of the National Academy of Sciences, 80, 5776-5778.
Subirana-Vilanova, J. B. (1994). Contour texture and frame curves for the recognition of non-rigid objects. In Y. O, A. Toet, D. Foster, H. J. A. M. Heijmans, & P. Meer (Eds.), Shape in picture: Mathematical description of shape in grey-level images (pp. 393-402). Berlin, Germany: Springer-Verlag.
Sutherland, N. S. (1968). Outlines of a theory of visual pattern recognition in animals and man. Proceedings of the Royal Society of London (Series B), 171, 297-317.
Uhlarik, J. (1989). Fourier descriptors for shape: Effects of adaptation on discrimination thresholds. Bulletin of the Psychonomic Society, 27, 323-326.
Winston, P. H. (1975). Learning structural descriptions from examples. In P. H. Winston (Ed.), The psychology of computer vision (pp. 157-209). New York: Academic Press.
Zahn, C. T., & Roskies, R. Z. (1972). Fourier descriptors for plane closed curves. IEEE Transactions on Computers, C-21, 269-281.

Received April 12, 1994
Revision received November 3, 1994
Accepted December 19, 1994