Harmony Perception by Periodicity Detection - Semantic Scholar

3 downloads 0 Views 417KB Size Report
can thus simply be identified by a set of tones or pitch classes .... 8. ¯. 9. 4¯. 10. 4¯. 11. 4¯. 12. ¯. 13. 6¯. 14. 6¯. 15. 4¯. 16. ¯. Figure 2: Harmonic overtones with ...
Harmony Perception by Periodicity Detection Frieder Stolzenburg Harz University of Applied Sciences Automation & Computer Sciences Department Friedrichstr. 57-59 · 38855 Wernigerode · GERMANY Tel.: +49 3943 659-333 · Fax: +49 3943 659-399 E-mail: [email protected]

arXiv:1306.6458v3 [cs.SD] 3 May 2014

Abstract

Early mathematical models expressed musical intervals, i.e. the distance between two pitches, by simple fractions. This The perception of consonance/dissonance of musical har- helps to understand that human subjects rate harmonies, e.g. monies is strongly correlated to their periodicity. This is major and minor triads, differently with respect to their consoshown in this article by consistently applying recent results nance. But since most common triads (cf. Figure 1) throughfrom psychophysics and neuroacoustics, namely that the just out are built from thirds, thirds do not provide a direct explanoticeable difference of human pitch perception is about 1% nation of the preference ordering. Newer explanations base for the musically important low frequency range and that on the notion of dissonance, roughness, instability or tension periodicities of complex chords can be detected in the hu- (Helmholtz, 1862; Hutchinson and Knopoff, 1978, 1979; Terman brain. The presented results correlate significantly to em- hardt et al., 1982; Parncutt, 1989; Sethares, 2005; Cook and pirical investigations on the perception of chords. Even for Fujisawa, 2006; Cook, 2009, 2012). They correlate better to scales, plausible results are obtained. For example, all classi- empirical results on harmony perception, but still can be imcal church modes appear in the front ranks of all theoretically proved. possible seven-tone scales.

1.1

Key words: consonance/dissonance; harmony perception; periodicity; dyads, triads, chords, and scales.

Aims

A theory of harmony perception should be as simple and explanatory as possible. This means on the one hand, that it does not make too many assumptions, e.g. on implicit or ex1 Introduction plicit knowledge about diatonic major scales from Western music theory, or use many mathematical parameters deterMusic perception and composition seem to be influenced not mining dissonance or roughness curves. On the other hand, only by convention or culture, manifested by musical styles or it should be applicable to musical harmonies in a broad sense: composers, but also by the neuroacoustics and psychophysics Tones of a harmony may sound simultaneously as in chords of tone perception (Langner, 1997; Roederer, 2008). While or consecutively and hence in context as in scales. A harmony studying the process of musical creativity including harmony can thus simply be identified by a set of tones or pitch classes perception, several questions may arise, such as: What are forming the respective interval, chord, or scale. This article the underlying principles of music perception? How can the aims at and presents a fully computational (and hence falsiperceived consonance/dissonance of chords and scales be ex- fiable) model for musical harmoniousness with high explanaplained? tory power for harmonies in this general, abstract sense. Numerous approaches tackle these questions, studying Furthermore, a theory of harmony perception should also the consonance/dissonance of dyads and triads, consisting consider and incorporate the results from neuroacoustics and of two or three tones, respectively (Euler, 1739; Kameoka psychophysics on auditory processing of musical tone sensaand Kuriyagawa, 1969; Hutchinson and Knopoff, 1978, 1979; tions in the ear and the brain. The frequency analysis in the Cook and Fujisawa, 2006; Johnson-Laird et al., 2012). For inner ear can be compared to that of a filter bank with many instance, the major triad (Figure 1a) is often associated with parallel channels. Therefore, each frequency and hence each emotional terms like pleasant, strong, or bright, and, in con- pitch of the original signal is represented, provided it is in the trast to this, the minor triad (Figure 1b) with terms like sad, audible range and not masked. In addition, periodicity is enweak, or dark. Empirical studies reveal a clear preference or- coded in the auditory system. Periodicities of complex chords dering on the perceived consonance/dissonance of common can be detected in the human brain, and information concerntriads in Western music (see Figure 1), e.g. major ≺ minor ing stimulus periodicities is still present in short-term memory (Roberts, 1986; Johnson-Laird et al., 2012). So the question (Langner, 1997; Lee et al., 2009). remains, what may be the reason for this. Finally, the correlation between the consonance/dis1

¯ ¯ 4¯¯ G 4¯¯ 6¯¯ ¯ (a) major

¯¯¯ 4¯¯¯ 6¯¯¯ 62¯¯¯ 4¯¯¯ ¯¯ ¯ (b) minor

(c) diminished

44¯¯¯ (d) augmented

6¯¯ ¯ ¯¯ ¯ ¯¯¯ (e) suspended

Figure 1: Common triads and their inversions. Figure 1a-d shows all triads that can be build from major and minor thirds, i.e., the distance between successive pairs of tones are 3 or 4 semitones, including their inversions which are obtained by transposing the currently lowest tone by an octave, always with A4 as lowest tone here. Figure 1e shows the suspended chord, built from perfect fourths (5 semitones apart). Its last inversion reveals this. sonance predicted by the theory and the perceived consonance/dissonance from empirical studies should be the highest possible. In order to make the theory definitely noncircular, we therefore eventually want to establish the relation between musical harmonies, i.e. complex vibrations, and their perceived consonance/dissonance. Several empirical experiments on harmony perception have been conducted, where participants are asked to listen to and rate musical harmonies on an ordinal scale with respect to their consonance/dissonance. The results of these experiments, which take intervals, chords, or scales as stimuli (Malmberg, 1918; Roberts, 1986; Krumhansl, 1990; Johnson-Laird et al., 2012; Temperley and Tan, 2013), can be used to evaluate theories on harmony perception.

1.2

one hand and diatonic scales, i.e. the classical church modes, on the other hand all show highest correlation with the empirical results (Schwartz et al., 2003; Johnson-Laird et al., 2012; Temperley and Tan, 2013), not only with respect to the ranks, but also with the ordinal values of the empirical ratings of musical consonance. For the latter, we consider logarithmic periodicity, i.e. log2 (h). As we will see, this logarithmic measure can be plausibly motivated by the concrete topological organization of the periodicity coding in the brain (cf. Section 2.6).

1.3

Overview of the Rest of the Article

The organization of the article is straightforward. After this introductory section (Section 1), we briefly discuss existing theories on harmony perception (Section 2) which often make use of the notions consonance and dissonance. In particular, we highlight the psychophysical basis of harmony perception by reviewing recent results from neuroacoustics on periodicity detection in the brain (e.g. Langner, 1997). Then, we state the periodicity-based approach with the harmoniousness measure h in detail (Section 3). Applying this measure to common musical chords and also scales, shows very high correlation to empirical results (Section 4). We compare these results with those of other models of harmony perception in our evaluation. Finally, we draw some conclusions (Section 5).

Main Contribution

This article applies recent results from psychophysics and neuroacoustics consistently, obtaining a computational theory of consonance/dissonance perception. We focus on periodicity detection, which can be identified as a fundamental mechanism to music perception, and exploit in addition the fact that the just noticeable difference of human pitch perception is about 1% for the musically important low frequency range (Zwicker et al., 1957; Roederer, 2008). This percentage is the only parameter of the present approach. The concept of periodicity pitch, which is related to the missing fundamental chord frequency, has been studied for intervals many times in the literature (cf. Roederer, 2008, and references therein). The idea in this article is to transfer this concept to chords and also scales by considering relative periodicity, i.e. the approximated ratio of the period length of the chord (its periodicity pitch) relative to the period length of its lowest tone component (without necessarily applying octave equivalence), called harmonicity h in Stolzenburg (2009). The hypothesis in this article is, that the perceived consonance of a musical harmony decreases with the relative periodicity h. The corresponding periodicity-based analysis presented here confirms that it does not matter much whether tones are presented consecutively (and hence in context) as in scales or simultaneously as in chords. Periodicity detection seems to be an important mechanism for the perception of all kinds of musical harmony: chords, scales, and probably also chord progressions. Listeners always prefer simpler, i.e. shorter periodic patterns. The predictions of the periodicitybased approach obtained for dyads and common triads on the

2

Theories on Harmony Perception

Since ancient times, the problem of explaining musical harmony perception attracted a lot of interest. In the sequel, we discuss some of them briefly. But since there are numerous approaches addressing the problem, the following list is by no means complete. We mainly concentrate on theories with a psychophysical or neuroacoustical background and on those that provide a computational measure on consonance/dissonance, which we can use in our evaluation of the different approaches (Section 4).

2.1

Overtones

The tones produced by real instruments like strings, tubes, or the human voice have harmonic or other overtones. The frequencies of harmonic overtones are integer multiples of a fundamental frequency f . For the frequency of the n-th overtone (n ≥ 1), also called partial in this context, it holds fn = n · f , 2

part of definition for composite numbers). The rationale behind this measure is to count the maximal number of times that the whole periodic structure of the given chord can be decomposed in time intervals of equal length. Brefeld (2005) proposes (a1 · . . . · ak · b1 · . . . · bk )1/2k as harmoniousness measure, i.e. the geometric average determined from the numerators and denominators of the frequency ratios of all involved intervals. Both these measures yield reasonable harmoniousFigure 2: Harmonic overtones with respect to the fundamen- ness values in many cases. tal tone E2. Honingh and Bod (2005, 2011) investigate convexity of scales, by visualizing them on the Euler lattice, in which each point represents an integer power of prime factors, where also i.e. f1 = f . The amplitudes of the overtones define the spec- negative exponents are allowed. Because of octave equivatrum of a tone or sound and account for its loudness and spe- lence (which is adopted in this approach), the prime 2 is omitcific timbre. ted. For example, the frequency ratio 5/3 (major sixth) can Overtones (cf. Figure 2) may explain the origin of the ma- be written as 3−1 · 51 and thus be visualized as point (−1, 1). jor triad and hence its high perceived consonance. The major Honingh and Bod (2005, 2011) consider periodicity blocks triad appears early in the sequence, namely overtones 4, 5, 6 according to Fokker (1969) and observe that almost all tradi(root position) and—even earlier—3, 4, 5 (second inversion). tional scales form (star-)convex subsets in this space. HowBut this does not hold for the minor chord. The overtones 6, ever, the star-convexity property does not allow to rank har7, 9 form a minor chord, which is out of tune however, since monies, as is done in this article, at least not in a direct manthe frequency ratio 7/6 differs from the ratio 6/5 that is usu- ner, because it is a multi-dimensional measure. ally assumed for the minor third. Besides that, in contrast to There are other more or less purely mathematical explanathe major triad, the overtones forming the minor triad are not tions for the origin of chords and scales, e.g. by group theory adjacent, and its ground tone B3 is not octave equivalent with (Balzano, 1980; Carey and Clampitt, 1989), ignoring however the fundamental tone E2 (i.e., they do not have the same basic the sensory psychophysical basis. Therefore, we will not conname). sider these approaches further here. Furthermore, the diminished triad is given by the overtones 5, 6, 7. Therefore, it appears previous to the minor triad, which is inconsistent with the empirical results. The so ob- 2.3 Dissonance, Roughness, and Instability tained diminished triad, built from two minor thirds, is also Helmholtz (1862) explains the dimension of consonance in out of tune, because the intervals correspond to different fre- terms of coincidence and proximity of overtones and differquency ratios, namely 6/5 and 7/6. ence tones. For instance, for the minor second (frequency ratio

6¯ 4¯ ¯ 6¯ ¯ 4¯ 4¯ ¯ 4¯ 6¯ ¯ ¯ 4¯ ¯ 9 10 11 12 13 14 15 16 ¯

1 2 3 4

G

5 6

7 8

¯

2.2

16/15), only very high, low-energy overtones coincide, so it is weakly consonant. For the perfect fifth (frequency ratio 3/2), all its most powerful overtones coincide, and only very weak ones are close enough to beat. The fifth is therefore strongly consonant and only weakly dissonant. This theory has survived with some modifications (Plomp and Levelt, 1965) until today. Newer explanations base upon the notion of dissonance or roughness (Kameoka and Kuriyagawa, 1969; Hutchinson and Knopoff, 1978, 1979; Parncutt, 1989; Sethares, 2005). In general, dissonance is understood here as the opposite to consonance, meaning how well tones sound together. If two sine waves sound together, typical perceptions include pleasant beating (when the frequency difference is small), so-called roughness (when the difference grows larger), and separation into two tones (when the frequency difference increases even further) (Sethares, 2005, Fig. 3.7). Based on these observations, several mathematical functions for dissonance or roughness curves are proposed in the literature. Figure 3 shows one possibility according to Hutchinson and Knopoff (1978, 1979) with two parameters a and b, where y denotes the sensory dissonance and x is the relative deviation between the two frequencies. In order to find out the dissonance among several tones, usually their overtone spectra are also taken into ac-

Frequency Ratios

Already Pythagoras studied harmony perception, relating music and mathematics with each other. He created a tuning system that minimized roughness for intervals of harmonic complexes and expressed musical intervals as simple fractions, i.e. with small numerators and denominators. Also Euler (1739) (see also Bailhache, 1997) spent some time with the problem and proposed the so-called gradus suavitatis (degree of softness) as harmoniousness measure, considering the least common multiple (lcm) of denominators and numerators of the frequency ratios {a1 /b1 , . . . , ak /bk } with respect to the lowest ml 1 tone in the given harmony. If pm 1 · . . . · pl is the prime factorization of lcm(a1 , . . . , ak ) · lcm(b1 , . . . , bk ), then the degree of P softness is defined by 1 + li=1 mi · (pi − 1). Following somewhat the lines of Euler (1739), Stolzenburg (2012) considers, besides periodicity, the complexity of the product lcm(a1 , . . . , ak ) · lcm(b1 , . . . , bk ) with respect to prime factorization, more precisely, its number of not necessarily distinct prime factors Ω (Hardy and Wright, 1979, p. 354), defined as follows: Ω(1) = 0, Ω(n) = 1, if n is a prime number, and Ω(n) = Ω(p) + Ω(q) if n = p · q. The function shares properties with the logarithm function (cf. last

3

Purves, 2009, p. 2).

2.5

The approaches discussed so far essentially take the frequency spectrum of a sound as their starting point. From a spectral point of view, sounds are combinations of a fundamental frequency and certain overtones. The ear works as a spectral analyzer. Spectral analysis is performed in the cochlea. This function of the ear is used in many explanations of consonance/dissonance. Clearly, analyzing the frequency spectrum is closely related to analyzing the time domain (periodicity). Fourier transformation allows us to translate between both mathematically. However, subjective pitch detection, i.e. the capability of the auditory system to identify the repetition rate (periodicity) of a complex tone sensation, only works for the lower but musically important frequency range up to about 1500 Hz (Plomp, 1967). A missing fundamental tone can be assigned to each interval. Note that the tone with the respective frequency, called periodicity pitch of the interval, is not present as an original tone component. Schouten (1938, 1940) introduces the notion of residue for periodicity pitch. Based on these observations, several periodicity-based theories have been proposed (Licklider, 1951, 1962; Boomsliter and Creel, 1961; Terhardt et al., 1982; Beament, 2001). Licklider (1962) discusses possible neuronal mechanisms for periodicity pitch, including autocorrelation and comb-filtering, which has been proved correct by recent results from neuroacoustics (see Section 2.6). Boomsliter and Creel (1961) explain musical phenomena as relationships between long patterns of waves, using so-called frequency discs. Hofmann-Engl (2004, 2008) provides a more computational model, which calculates the periodicity pitch and possible bass note candidates in the so-called sonance factor. The harmoniousness measure h presented in this article (in Section 3) is also based on periodicity detection. It is a fully computational model of consonance/dissonance, applicable to harmonies in the broad sense, which shows high correlation with empirical ratings.

Figure 3: Dissonance or roughness curve with y =  x/a · exp (1 − x/a) b and x = | f1 / f0 − 1| (absolute value of relative frequency deviation) following the lines of Hutchinson and Knopoff (1978, 1979). a is the interval for maximum roughness, and b is an index set to 2 to yield the standard curve. count, summing up the single dissonance values. Although these approaches correlate better to the empirical results on harmony perception, they do not explain the low perceived consonance of the diminished or the augmented triad, which are built from two minor or major thirds, respectively. Therefore, Cook and Fujisawa (2006) emphasize that harmony is more than the summation of interval dissonance among tones and their upper partials, adopting the argument from psychology that neighboring intervals of equal size are unstable and produce a sense of tonal instability or tension, that is resolved by pitch changes leading to unequal intervals (see also Cook, 2012). Since lowering any tone in an augmented triad by one semitone leads to a major triad and raising to a minor triad, Cook and Fujisawa (2006) assume sound symbolism, where the major triad is associated with social strength and the minor triad with social weakness. Since overtone spectra vary largely among different instruments, the number and amplitudes of overtones cannot be exactly determined. This uncertainty makes the calculation of dissonance as sketched above a bit vague. Maybe because of this, works based on dissonance and and related notions focus on the consonance of dyads and triads, whereas the analysis of complex chords or even scales is less often studied.

2.4

Periodicity-Based Approaches

Human Speech and Musical Categories

2.6

Gill and Purves (2009) consider a more biological rationale, investigating the relationship between musical modes and human speech. They examine the hypothesis that major and minor scales elicit different affective reactions because their spectra are similar to the spectra of voiced segments in excited and subdued speech. The results reveal that spectra of intervals in major scales are more similar to the ones found in excited speech, whereas spectra of particular intervals in minor scales are more similar to the ones of subdued speech (Bowling et al., 2010). The observation, that the statistical structure of human speech correlates with common musical categories, can also be applied to consonance rankings of dyads, yielding plausible results (Schwartz et al., 2003). As a measure for comparing harmonies, the mean percentage similarity of the respective spectra is considered. For an interval with frequency ratio a/b, it is defined as (a + b − 1)/(a · b) (Gill and

Neuronal Models

Since musical harmony, understood in the broad sense, seems to be a phenomenon present in almost all human cultures, harmony perception must be somehow closely connected with the auditory processing of musical tone sensations in the ear and in the brain. Hearing of sounds depends on sensory organs that transduce physical vibrations into nerve impulses. The sensory transducers are hair cells on the basilar membrane in the cochlea of the inner ear, which differ in the frequencies to which they respond maximally. That periodicity can be detected in the brain, has been well-known for years. For example, two pure tones forming a mistuned octave cause so-called second-order beats, although no exact octave is present (Plomp, 1967; Roederer, 2008). But only recently, neuroacoustics found the mechanism for being able to perceive periodicity. In the following brief presentation

4

of these results, which lays the basis for the periodicity measure h introduced in this article, we mainly follow the lines of Langner (1997), who conducted experiments on periodicity sound detection with animals and humans. The resulting neuronal model for the analysis of periodic signals, which is sketched in Figure 4, contains several stages, that we will explain next. First, so-called trigger neurons in the ventral cochlear nuinner ear cleus, well known as octopus cells in anatomical terms, transcochlea fer signals without significant delay. The period of the original signal and the neuronal activity correspond, but the neuronal activity typically has the form of spike trains, i.e. a series of discrete action potentials (Langner and Schreiner, 1988; Cariani, 1999; Tramo et al., 2001). The maximal amplitude of these spikes is limited. This means, that some information on the waveform of the original signal is lost, concerning the trigger overtone spectrum and also the amplitudes, which contribute neuron to the timbre and loudness of the sound. Second, there are oscillator neurons, which are chopper neurons or stellate cells, with intrinsic oscillation showing regularly timed discharges in response to stimuli, not corresponding to the temporal structure of the external signal. The oscillation intervals can be characterized as integer multiples n · T of a base period of T = 0.4 ms with n ≥ 2 for endotheroscillator integrator mic, i.e. warm-blooded animals (Langner, 1983; Schulze and neurons neurons Langner, 1997; Bahmer and Langner, 2006). The external signal is synchronized with that of the oscillator neurons, which limits signal resolution. Hence, frequencies and also intervals can only be distinguished up to a certain precision. Third, in the dorsal cochlear nucleus, periodic signals are transferred with (different) delays. Onset latencies of socalled integrator neurons (fusiform cells, i.e. type IV cells, coincidence and giant cells in anatomical terms) up to 120 ms have been neuron observed (Langner and Schreiner, 1988). While oscillator neurons respond with almost no delay, integrator neurons respond with a certain amount of delay. Both groups of neurons are triggered and synchronized by trigger neurons (on-cells). When the delay corresponds to the signal period, the delayed orthogonal logarithmic response and the non-delayed response to the next modulatonotopic and periodicity maps tion wave coincide. All neurons respond also to the harmonic overtones of their characteristic frequency (Langner, 1997). Therefore, fundamental bass and hence periodicity pitch can be detected in the brain. In the auditory midbrain (inferior colliculus), coincidence neurons respond best whenever the delay is compensated by the signal period. cognitive The latter provides the basis for an autocorrelation process mechanism by comb-filtering, which includes phase locking (Langner, 1983; Meddis and Hewitt, 1991; Lee et al., 2009). This means, phase differences among different signals can be neglected. Furthermore, runtime differences between different spike trains in the auditory system are nullified, thus faconsonance/dissonance perception cilitating the highest possible coincidence rate between two correlated spike trains. The neurons in the midbrain inferior Figure 4: Sketch of the neuronal model for the analysis of colliculus are capable of phase locking to stimulus periodic- periodic signals following (Langner, 1997, Figure 3). The arities up to 1000 Hz, as Langner and Schreiner (1988) found rows roughly denote the workflow of the auditory processing. out by experiments with cats. There seems to be evidence that periodicity detection is possible for significantly higher frequencies up to 4000 Hz in some cases (Langner, 1997). As 5

a result of a combined frequency-time analysis, that is some kind of autocorrelation by comb-filtering, pitch and timbre (i.e. frequency and periodicity), are mapped temporally and also spatially and orthogonally to each other in the auditory midbrain and auditory cortex. The neuronal periodicity maps in the inner cortex (cat, gerbil) and cortex (gerbil, human) are organized along logarithmic axes in accordance with the frequency tuning of the respective neurons as well as the orthogonal tonotopic maps which represent pitch (Langner and Schreiner, 1988; Schulze and Langner, 1997; Langner et al., 1997). In total, about 8 octaves can be represented in each dimension. Lee et al. (2009) discovered that the phase-locking activity to the temporal envelope is more accurate (i.e. sharper) in musicians than non-musicians, reporting experiments with two intervals: The consonant interval (major sixth, tones with 99 Hz and 166 Hz, approximate frequency ratio 5/3) shows the highest response in the brainstem at about 30.25 ms =ˆ 3/99 Hz, and the dissonant interval (minor seventh, 93 Hz and 166 Hz, frequency ratio 9/5) at about 54.1 ms =ˆ 5/93 Hz. All this fits very well with the periodicitybased approach of harmony perception: The denominator of the approximate frequency ratio (underlined above) is the relative period length with respect to the frequency of the lowest tone, as we will see later. In summary, the periodicity of a complex tone sensation can be detected by the auditory system in the brain. In addition, it seems possible that the auditory system alters the spectrum of intervals making them sound more consonant, i.e. obtaining simpler frequency ratios. Bidelman and Krishnan (2009) investigate correlates of consonance and dissonance by means of dyads and detect that brainstem responses to consonant intervals were more robust and yielded stronger pitch salience than those to dissonant intervals. All these results, which are also consistent with results from fMRI experiments on melody processing in the auditory pathway (Patterson et al., 2002), indicate that periodicity detection in the brain may be an important mechanism during music perception. Therefore, several approaches on harmony perception (Ebeling, 2007, 2008; Foltyn, 2012) directly refer to Langner (1997). Ebeling (2007, 2008) studies autocorrelation functions of intervals for different pulse forms, computing so-called generalized coincidence functions. It turns out that this mathematical model defines a measure that is in line with the degree of tonal fusion as described by Stumpf (1883). Cariani (1999) reviews neurophysiological evidence for interspike interval-based representations for pitch and timbre in the auditory nerve and cochlear nucleus (see also Tramo et al., 2001). Timings of discharges in auditory nerve fibers reflect the time structure of acoustic waveforms, such that the interspike intervals (i.e. the period lengths) that are produced convey information concerning stimulus periodicities, that are still present in short-term memory. Common to all these approaches is, that they consider essentially the energy of the autocorrelation function within one period of the given harmony, by counting the interspike intervals between both successive or non-successive spikes in a discharge pattern. This leads to

histograms, called autocorrelograms, which show high peaks for periods corresponding to the pitch. This procedure works well for dyads, but for triads and more complex chords the correlation with empirical ratings is relatively low, which is already stated by Ebeling (2007, Section 2.5.3).

2.7

Cognitive Theories

Cognitive science is the interdisciplinary scientific study of the mind and its processes. It includes research on intelligence and behavior, especially focusing on how information is represented, processed, and transformed. There exists also a series of approaches on harmony perception following this paradigm, in particular for more complex chords and scales. For instance, Johnson-Laird et al. (2012) propose a cognitive dual-process theory, employing three basic principles for ranking triads, namely (in order of decreasing priority) the use of diatonic scales, the central role of the major triad, and the construction of chords out of thirds. These three principles of Western tonal music predict a trend in increasing dissonance. Within each level of dissonance, roughness according to Hutchinson and Knopoff (1978, 1979) can predict a detailed rank order then. Temperley and Tan (2013) investigate the emotional connotations of diatonic modes. In the respective experiments, participants hear pairs of melodies, presented in different diatonic modes, and have to judge which of the two melodies sounds happier. The resulting overall preference order Ionian ≺ Mixolydian ≺ Lydian ≺ Dorian ≺ Aeolian ≺ Phrygian is then explained by several principles: familiarity, e.g., the major mode (Ionian) is the most common mode in classical and popular music, and sharpness, i.e. the position of the scale relative to the tonic on the circle of fifths. Both these cognitive theories show high correlation with the empirical results with respect to their application domain, i.e. chords in Western music and diatonic scales, respectively. McLachlan et al. (2013) conclude from experiments with common dyads and triads that the perception of consonance/dissonance involves cognitive processes. According to the cognitive incongruence model of dissonance, proposed by McLachlan et al. (2013), consonance/dissonance is learned to a certain extent and is based on enculturation, i.e. the natural process of learning a particular culture. Nevertheless, also in this model, periodicity plays a prominent role (McLachlan et al., 2013, Figure 10). So, a general tendency of musically trained persons toward higher dissonance ratings is observed, as musicians become better able to use periodicity-based pitch mechanisms. This is consistent with the results by Lee et al. (2009).

3 3.1

The Periodicity-Based Method Rationales of the Theory

We now present the periodicity-based method and its rationales in more detail. As harmoniousness measure h, we consider the ratio of the period length of the waveform of the 6

given harmony relative to the period length of its lowest tone component, eventually its base-2 logarithm log2 (h), called logarithmic periodicity henceforth. The rationale behind this is given by the recent results from neuroacoustics on periodicity detection in the brain, which we reviewed in Section 2.6. In particular, the rationale for taking the logarithm of the periodicity measure is the logarithmic organization of the neuronal periodicity map in the brain. Since one octave corresponds to a frequency ratio of 2, we adopt the base-2 logarithm. Clearly, different periodicity pitches must be mapped on different places on the periodicity map. However, a general observation is that, at least in a wide range, harmony perception is independent from transposition, i.e. pitch shifts. For instance, the perceived consonance/dissonance of an A major triad should be more or less the same as that of one in B. This seems to be in accordance with the logarithmic organization of the neuronal periodicity map. In line with McLachlan et al. (2013), one may conclude that harmony perception may be comprised somewhere else in the brain by some cognitive process or property filter (Licklider, 1962), conveying information concerning stimulus periodicities in (short-term) memory. The latter is also required to explain harmony perception, when tones are occurring consecutively. Because of this, we do not consider absolute period length here, but relative periodicity, employing the respective lowest tone component as reference point for the sake of simplicity. As we will see (in Section 4), this measure shows very high correlation to empirical ratings of harmonies, especially higher than that of roughness (Hutchinson and Knopoff, 1978, 1979) which only regards the auditory processing in the ear and not the brain. A second feature of the periodicity-based approach is the use of integer frequency ratios, which may differ from the real frequency ratios up to about 1%. The underlying rationale for approximating the actual frequency ratios is, that human subjects can distinguish frequency differences for pure tone components only up to a certain resolution, namely about 0.5% under optimal conditions. For the musically important low frequency range, especially the tones in (accompanying) chords, this just noticeable difference is worse, namely only about 1% (Zwicker et al., 1957; Roederer, 2008). Approximating the irrational frequency ratios of equal temperament finally yields the integer frequency ratios of just intonation. Just intervals, rather than tempered, are considered best in tune, but everything within a band of ±1% or even more for more dissonant and ambiguous intervals, e.g. the tritone, are acceptable to listeners (Hall and Hess, 1984). This also holds, in principle, if the stimuli are pure sinusoids (Vos, 1986, Experiment 2). There is also a mathematical reason for the approximation procedure: Computing relative periodicity requires that each frequency ratio is a fraction, i.e. not an irrational number, because otherwise no finite period length can be found at all. The evaluation of this approach shows that the results remain stable, even if different ratios are used, taking e.g. 16/9 instead of 9/5 as frequency ratio for the minor seventh, provided that the deviation from the real ratio is about 1%. However, if we adopt the frequency ratios from Pythagorean tuning, which is

not psychophysically motivated in this way, the correlation to empirical ratings of harmonies decreases. An additional rationale for approximating frequency ratios is given by the oscillator neurons with an intrinsic oscillation interval of at least 2T = 0.8 ms (cf. Section 2.6). Because of this, the resolution of the original signal is limited. Hence, frequencies and also intervals can only be distinguished up to a certain precision. Incidentally, the time constant 2T = 0.8 ms corresponds to a frequency of 1250 Hz, which roughly coincides with the capability of the auditory system to identify the repetition rate (periodicity) of a complex tone sensation, namely up to about f = 1500 Hz (Plomp, 1967). Furthermore, the lowest frequency, audible by humans, is about 20 Hz. Its ratio with the border frequency f is only slightly more than 1%. This percentage, corresponding to the above-mentioned just noticeable difference, is the only parameter of the presented approach on harmony perception. A third feature of the periodicity-based approach is, that the actual amplitudes of the tone components in the given harmony are ignored. Harmonic overtone spectra are irrelevant for determining relative periodicities, because the period length of a waveform of a complex tone with harmonic overtones is identical with that of its fundamental tone. We always obtain h = 1, since the frequencies of harmonic overtones are integer multiples of the fundamental frequency. All frequency ratios {1/1, 2/1, 3/1, . . .} have 1 as denominator in this case. Therefore, the harmoniousness measure h is independent from concrete amplitudes and also phase shifts of the pure tone components, i.e. tones with a plain sinusoidal waveform. Incidentally, information on the waveform of the original signal is lost in the auditory processing, because neuronal activity usually has the form of spike trains (Langner and Schreiner, 1988; Cariani, 1999; Tramo et al., 2001). Information on phase is also lost because of the phase-locking mechanism in the brain (Langner and Schreiner, 1988; Meddis and Hewitt, 1991; Lee et al., 2009). However, the information on periodicity remains, which is consistent with the periodicity-based approach proposed here. Also from a more practical point of view, it seems plausible, that harmony perception depends more on periodicity than on loudness and timbre of the sound. It should not matter much, whether a chord is played e.g. on guitar, piano, or pipe organ. Of course, this argument only holds for tones with harmonic overtone spectra. If we have inharmonic overtones in a complex tone such as in Indian, Thai, or Indonesian gong orchestra, i.e. Gamelan music, or stretched or compressed timbres as considered by Sethares (2005), then it holds h > 1 for the relative periodicity value of a single tone, i.e., we have an inherently increased harmonic complexity (Parncutt, 1989). Average listeners seem to prefer low or middle harmonic complexity, e.g. baroque or classical style on the one hand and impressionism and jazz on the other hand, which of course have to be analyzed as complex cultural phenomena. In this context, Parncutt (1989, pp. 57-58) speaks of optimal dissonance, that gradually increased during the history in Western music.

7

3.2

Computing Relative Periodicity

Now we explain the concrete computational details of the approach. As example, we consider the A major triad (Figure 1a) in just intonation. Its root position consists of three tones with absolute frequencies f1 = 440 Hz, f2 = 550 Hz, and f3 = 660 Hz. The respective frequency ratios with respect to the lowest tone (A4) are F1 = 1/1, F2 = 5/4 (major third), and F3 = 3/2 (perfect fifth). For the sake of simplicity, we ignore possible overtones and consider just the three pure tone components here. Figure 5a-c shows the sinusoids for the three pure tone components and Figure 5d their superposition, i.e. the graph of the function s(t) = sin(ω1 t) + sin(ω2 t) + sin(ω3 t), where ωi = 2π fi is the respective angular frequency. How can the periodicity of the signal s(t) (Figure 5d) be determined? One possibility is to apply continuous autocorrelation, i.e. the cross-correlation of a signal with itself. For the superposition of periodic functions s(t) over the reals, it is defined as continuous cross-correlation integral of s(t) with R +T itself, at lag τ, as follows: ρ(τ) = limT →∞ 1/2 T −T s(t) · P s(t − τ) dt = 1/2 3i=1 cos(ωi τ). The autocorrelation function reaches its peak at the origin. Other maxima indicate possible period lengths of the original signal. Furthermore, possibly existing phase shifts are nullified, because we always obtain a sum of pure cosines with the same frequencies as in the original signal as above. The respective graph of ρ(τ) for the major triad example is shown in Figure 5e. As one can see, it has a peak after four times the period length of the lowest tone (Figure 5a). This corresponds to the periodicity of the envelope frequency, to which respective neurons in the inner cortex respond. In general, overtones have to be taken into account in the computation of the autocorrelation function, because real tones have them. In many approaches, complex tones are made up from sinusoidal partials in this context. Their amplitudes vary e.g. as 1/k, where k is the number of the partial, ranging from 1 to 10 or similar (Hutchinson and Knopoff, 1978, 1979; Sethares, 2005; Ebeling, 2007, 2008), although the number of partials can be higher in reality, if one looks at the spectra of musical instruments. In contrast to this procedure, we calculate relative periodicity, abstracting from concrete overtone spectra, by simply considering the frequency ratios of the involved tones. Figure 5f illustrates this by showing the periodic patterns of the three tone components of the major triad as solid boxes one upon the other. As one can see, the period length of the chord is (only) four times the period length of the lowest tone for this example. This ratio is the relative periodicity h. It depends on the frequency ratios {a1 /b1 , . . . , ak /bk } of the given harmony. For this we assume that each frequency ratio Fi is a fraction ai /bi (always in its lowest terms). All frequencies are relativized to the lowest frequency f1 . Thus it holds Fi = fi / f1 for 1 ≤ i ≤ k and F1 = 1. The fundamental frequency of the overall harmony pattern then is f1 /h. The value of h can be computed as lcm(b1 , . . . , bk ), i.e., it is the least common multiple (lcm) of the denominators of the frequency ratios. This can be seen as follows: Since

(a)

(b)

(c)

(d)

(e)

(f)

Figure 5: Sinusoids of the major triad in root position (ac), their superposition (d), and corresponding autocorrelation function ρ(τ) (e). Figure 5f shows the general periodicity structure of the chord, abstracting from concrete overtone spectra and amplitudes. The solid boxes show the periodic patterns of the three tone components one upon the other. The dashed lines indicate the greatest common period of all tone components, called granularity by (Stolzenburg, 2012). Its frequency corresponds to their least common overtone.

8

the relative period length of the lowest tone T 1 = 1/F1 is 1, we have to find the smallest integer number that is an integer multiple of all relative period lengths T i = 1/Fi = bi /ai for 1 < i ≤ k. Since after ai periods of the i-th tone, we arrive at the integer bi , h can be computed as the least common multiple of all bi . For example, for T 2 = 1/F2 = 2/3 we obtain a2 · T 2 = 3 · 2/3 = 2 = b2 . Together with b3 = 4, ignoring b1 = 1, which is always irrelevant, we get h = lcm(1, 2, 4) = 4 as expected. We set up the following hypothesis on harmony perception: The perceived consonance of a harmony decreases with the relative periodicity h. For the major triad in root position, we have h = 4 (see above), which is quite low. Thus, the predicted consonance is high. This correlates very well to empirical results. Relative periodicity gives us a powerful approach to the analysis of musical harmony perception. We evaluate this hypothesis later at length (Section 4). But beforehand, we have to answer the question, which frequency ratios should be used in the computation of h. Since this is done in a special way here, we address this question in more detail now.

3.3

frequency ratio 81/64, deviation 0.45%) is tuned more accurately than usually assumed in just intonation (frequency ratio 5/4, deviation -0.79%), which we used earlier (Figure 5) for analyzing the major triad in root position. Hence, Pythagorean tuning is not in line with the results of psychophysics, in particular, that the just noticeable difference of pitch perception is about 1% (Zwicker et al., 1957; Roederer, 2008). Let us now look for tunings, where the relative deviation d with respect to equal temperament is approximately 1%. In the literature, historical and modern tunings are listed, e.g. Kirnberger III (see Table 1). However, they are also only partially useful in this context, because they do not take into account the fact on just noticeable differences explicitly. In principle, this also holds for the adaptive tunings introduced by Sethares (2005), where simple integer ratios are used and scales are allowed to vary. An adaptive tuning can be viewed as a generalized dynamic just intonation, which fits well with musical practice, because the frequencies for one and the same pitch category may vary significantly during the performance of a piece of music, dependent on the musical harmonic context. Trained musicians try to intonate e.g. a perfect fifth with the frequency ratio 3/2, and listeners are hardly able to distinguish this frequency ratio from others that are close to the value in equal temperament. In consequence, also the rational tuning, which we introduce now, primarily should not be considered as a tuning, but more as the basis for intonation and perception of intervals. The rational tuning takes the fractions with smallest possible denominators, such that the relative deviation with respect to equal temperament is just below a given percentage d. They can be computed by means of Farey sequences, i.e. ordered sequences of completely reduced fractions which have denominators less than or equal to some (small) n. Table 1 shows the rational tuning for d = 1% (cf. Stolzenburg, 2009, 2010). Its frequency ratios deviate only slightly from just tuning, namely only for the tritone and the minor seventh. Just tuning, as given in Table 1, can be understood as rational tuning with only slightly greater maximal relative deviation d = 1.1%. Therefore and since just intonation is widely used, we adopt it as well as the rational tuning as underlying tuning in the following analyses of our periodicity-based approach with harmoniousness measure h. This does not necessarily imply that the human auditory system is somehow computing a rational approximation. We simply assume here, consistent with the results from psychophysics and neuroacoustics, that the resolution of periodicity pitch in the brain is limited. This is in line with the results of Kopiez (2003) that professional musicians are able to adapt to equal temperament but cannot really discriminate in their performance between two tuning systems such as equal temperament and just intonation.

Tuning and Frequency Ratios

In Western music, an octave is usually divided into 12 semitones, corresponding to a frequency ratio of (approximately) √ 12 2. The frequencies for the k-th semitone in equal temperament √ with twelve tones per octave can be computed as 12 fk = 2 k · f0 , where f0 is the frequency of the lowest tone. The respective frequency ratios are shown in Table 1. The frequency values grow exponentially and not linearly, following the Weber-Fechner law in psychophysics, which says that, if the physical magnitude of stimuli grows exponentially, then the perceived intensity grows only linearly. In equal temperament and approximations thereof, e.g. Bach’s well temperament or just intonation, all keys sound more or less equal. This is essential for playing in different keys on one instrument and for modulation, i.e. changing from one key to another within one piece of music. Since this seems to be the practice that is currently in use, at least in Western music, we adopt the equal temperament as reference system for tunings here. The frequency ratios in equal temperament are irrational numbers (except for the unison and its octaves), but for computing periodicity they must be fractions, as mentioned above. Let us thus consider tunings with rational frequency ratios. The oldest tuning with this property is probably the Pythagorean tuning, as shown in Table 1. Here, frequency relationships of all intervals have the form 2m · 3n for some (possibly negative) integers m and n, i.e., they are based on fifths, strictly speaking, a stack of perfect fifths (frequency ratio 3/2), applying octave equivalence. However, although large numbers appear in the numerators and denominators of the fractions in Pythagorean tuning, the relative deviations with respect to equal temperament (shown in brackets in Table 1) may be relatively high. For example, the tritone (semitone 6, frequency ratio 729/512) is still slightly mistuned (deviation 0.68%). On the other hand, the major third (semitone 4,

3.4

The Approximation Procedure

How can frequency ratios Fi be approximated by fractions algorithmically? For this, the so-called Stern-Brocot tree can be employed (Graham et al., 1994; Foriˇsek, 2007). It induces an effective procedure for approximating numbers x by frac-

9

Table 1: Table of frequency ratios for different tunings. Here, k denotes the number of the semitone corresponding to the given interval. In parentheses, the relative deviation of the respective frequency ratio from equal temperament is shown. The maximal deviation d for the rational tuning listed here is 1%. interval unison minor second major second minor third major third perfect fourth tritone perfect fifth minor sixth major sixth minor seventh major seventh octave

k equal temper. Pythagorean 0 1.000 1/1 (0.00%) 1 1.059 256/243 (-0.56%) 2 1.122 9/8 (0.23%) 3 1.189 32/27 (-0.34%) 4 1.260 81/64 (0.45%) 5 1.335 4/3 (-0.11%) 6 1.414 729/512 (0.68%) 7 1.498 3/2 (0.11%) 8 1.587 128/81 (-0.45%) 9 1.682 27/16 (0.34%) 10 1.782 16/9 (-0.23%) 11 1.888 243/128 (0.57%) 12 2.000 2/1 (0.00%)

xmin = (1 − p)x; xmax = (1 + p)x; al /bl = bxc/1; ar /br = (bxc + 1)/1; loop am /bm = (al + ar )/(bl + br ); if xmin ≤ am /bm ≤ xmax then return am /bm ; else if xmax < am /bm then k = b(ar − xmax br )/(xmax bl − al )c; am /bm = (ar + k al )/(br + k bl ); ar /br = am /bm else if am /bm < xmin then k = b(xmin bl − al )/(ar − xmin br )c; am /bm = (al + k ar )/(bl + k br ); al /bl = am /bm end if end loop

Kirnberger III 1/1 (0.00%) 25/24 (-1.68%) 9/8 (0.23%) 6/5 (0.91%) 5/4 (-0.79%) 4/3 (-0.11%) 45/32 (-0.56%) 3/2 (0.11%) 25/16 (-1.57%) 5/3 (-0.90%) 16/9 (-0.23%) 15/8 (-0.68%) 2/1 (0.00%)

rational tuning 1/1 (0.00%) 16/15 (0.68%) 9/8 (0.23%) 6/5 (0.91%) 5/4 (-0.79%) 4/3 (-0.11%) 17/12 (0.17%) 3/2 (0.11%) 8/5 (0.79%) 5/3 (-0.90%) 16/9 (-0.23%) 15/8 (-0.68%) 2/1 (0.00%)

just tuning 1/1 (0.00%) 16/15 (0.68%) 9/8 (0.23%) 6/5 (0.91%) 5/4 (-0.79%) 4/3 (-0.11%) 7/5 (-1.01%) 3/2 (0.11%) 8/5 (0.79%) 5/3 (-0.90%) 9/5 (1.02%) 15/8 (-0.68%) 2/1 (0.00%)

nation (frequency ratio 3/2) is approximated as good as possible. Thus, we develop a fraction a/b with 2a/b ≈ 3/2, where a is the number of the semitone representing the fifth. Hence, we have to approximate x = log2 (3/2) ≈ 0.585. The corresponding sequence of mediants (between 0/1 and 1/1) is 1/2, 2/3, 3/5, 4/7, 7/12, 17/29, 24/41, . . . Thus, among others, it shows a/b = 7/12 as desired, because semitone a = 7 gives the perfect fifth in the chromatic scale with b = 12 tones per octave with high precision. Incidentally, only from this mediant on, the deviation of a/b with respect to x (which is logarithmic with respect to the relative frequency) is below 1%, namely 0.27% more precisely.

3.5

Examples

We conclude this section by more detailed examples of computing relative and logarithmic periodicity. First, let us consider the first inversion of the diminished triad consisting of the tones A4, C5, and F]5 (see Figure 1c). The corresponding numbers of semitones are {0, 3, 9}, where the lowest tone alFigure 6: Approximating rational numbers x by fractions a/b ways gets the number 0 and is associated with the frequency with precision p. We use the floor function bxc here, which ratio 1/1 (unison), since it is taken as reference tone. We use yields the largest integer less than or equal to x. the corresponding frequency ratios according to just tuning, i.e. {1/1, 6/5, 5/3} for the example chord. Hence, its relative tions y = a/b with some precision, i.e. maximal relative error periodicity is h0 = lcm(1, 5, 3) = 15. This is taken as the raw |y/x − 1|. The main idea is to perform a binary search be- value for further computation. tween two bounds: al /bl (left) and ar /br (right). We start with In order to smooth and hence stabilize the calculated pethe two integer numbers, that are nearest to x, e.g. 1/1 and riodicity values, we take the inversions of the given harmony 2/1, and repeat computing the so-called mediant am /bm = into account: We adopt each tone as reference tone, not only (al + ar )/(bl + br ), until x is approximated with the desired the lowest tone. Thus, we consider also the chords with the precision. Figure 6 shows an improved, efficient version of semitones {−3, 0, 6} and {−9, −6, 0}. For semitones associated this procedure, following the lines of Foriˇsek (2007). with a negative number −n, we take the frequency ratio of The approximation procedure for frequency ratios may semitone 12 − n and halve it, i.e., we do not assume ocalso help us explain the origin of the chromatic twelve-tone tave equivalence here. Therefore, we get the frequency ratios scale. For this, we look for a tuning in equal temperament {5/6, 1/1, 7/5} and {3/5, 7/10, 1/1} with relative periodicity with n tones per octave, such that the perfect fifth in just into- values h1 = lcm(6, 1, 5) = 30 and h2 = lcm(5, 10, 1) = 10, 10

respectively. Since relative periodicity of chords is computed relative to the lowest tone, we multiply the values of h by the lowest frequency ratio in the chord, obtaining h00 = 15, h01 = 5/6 · 30 = 25, and h02 = 3/5 · 10 = 6. The arithmetic average of these values is h = (15 + 25 + 6)/3 ≈ 15.3. For logarithmic periodicity, we compute the average log2 (h) = (log2 (15) + log2 (25) + log2 (6))/3 ≈ 3.7. Note that, as a side effect of this procedure, for logarithmic periodicity, we implicitly obtain the geometric average over all h values instead of the arithmetic average. Because of the Weber-Fechner law and the logarithmic mapping of frequencies on both the basilar membrane in the inner ear and the periodicity map in the brain, this seems to be entirely appropriate. As second example, we consider once again the major triad, but this time spread over more than one octave, consisting of the tones C3 (lowest tone), E4 (major tenth), and G4 (perfect twelfth) with corresponding numbers of semitones {0, 16, 19}. In contrast to other approaches, we do not project the tones into one octave, but apply a factor 2 for each octave. We assign the frequency ratio of the semitone with number n mod 12, multiplied with 2n div 12 , to the n-th semitone. For instance, the frequency ratio for the major tenth with corresponding semitone number n = 16 is 2 · 5/4 = 5/2 in lowest terms. The frequency ratios of the whole chord are {1/1, 5/2, 3/1}. Hence, h = lcm(1, 2, 1) = 2 and log2 (h) = 1. Averaging over inversions does not change the results here. Perceived consonance/dissonance certainly depends on overtones to a certain extent. Since concrete amplitudes are neglected in the periodicity-based approach, there are ambiguous cases, where harmonic overtones cannot be distinguished from extra tones coinciding with these overtones. This holds for the actual example, because it contains the tones C3 and G4, which are more than an octave apart, namely n = 19 semitones (perfect twelfth), with frequency ratio 3/2 · 2 = 3/1. It cannot be distinguished from a complex tone with C3 as fundamental tone and harmonic overtones, because its third partial corresponds to G4. Nonetheless, the waveforms normally differ here: In the former case the amplitudes of C3 and G4 may be almost equal, whereas in the latter case the amplitude a of the overtone G4 may be significantly lower than that of C3. The superposition of both sinusoids can be stated roughly as sin(ωt) + a sin(3ωt), where ω = 2π f with f ≈ 131 Hz (the frequency of C3), and t is the time. For −1/3 ≤ a ≤ 1/9, the higher tone component G4 does not induce any additional local extrema in the waveform, which would correspond to additional spikes in the neuronal activity. It seems that people tend to underestimate the number of pitches in such chords (Dewitt and Crowder, 1987; McLachlan et al., 2013, p. 5). Nevertheless, even in such ambiguous cases, the periodicity-based approach yields meaningful results and high correlations to empirical ratings of harmony perception for realistic chords (see Section 4.2). As third and last example, let us calculate the periodicity of the complete chromatic scale, i.e. the twelve tones constituting Western music. For this, we compute the least common multiple of the denominators of all frequency ratios according to just tuning within one octave. We obtain 11

h = lcm(1, 15, 8, 5, 4, 3, 5, 2, 5, 3, 5, 8) = 120. Averaging over the inversions yields h ≈ 168.2 and log2 (h) ≈ 7.4. Thus interestingly, the logarithmic periodicity for the chromatic scale is within the biological bound of 8 octaves, which can be represented in the neuronal periodicity map (cf. Section 2.6). In the sequel, we always make use of averaged periodicity values and therefore omit overbars for the ease of notation.

4

Results and Evaluation

Let us now apply the periodicity-based approach and other theories on harmony perception to common musical harmonies and correlate the obtained results with empirical results (Malmberg, 1918; Roberts, 1986; Schwartz et al., 2003; Johnson-Laird et al., 2012; Temperley and Tan, 2013). The corresponding experiments are mostly conducted by (cognitive) psychologists, where the harmonies of interest are presented singly or in context. The listeners are required to judge the consonance/dissonance, using an ordinal scale. All empirical and theoretical consonance values are either taken directly from the cited references or calculated by computer programs according to the respective model on harmony perception. In particular, the periodicity values h and log2 (h) are computed by a program, implemented by the author of this article, written in the declarative programming language ECLiPSe Prolog (Apt and Wallace, 2007; Clocksin and Mellish, 2010). A table listing the computed harmoniousness values for all 2048 possible harmonies within one octave consisting of up to 12 semitones among other things is available under http: //ai-linux.hs-harz.de/fstolzenburg/harmony/. In our analyses, we correlate the empirical and the theoretical ratings of harmonies. Since in most cases only data on the ranking of harmonies is available, we mainly correlate rankings. Nevertheless, correlating concrete numerical values yields additional interesting insights (see Section 4.2). For the sake of simplicity and consistency, we always compute Pearson’s correlation coefficient r, which coincides with Spearman’s rank correlation coefficient on rankings, provided that there are not too many bindings, i.e. duplicate values. For the significance of the results, we apply √ √ determining r n − 2/ 1 − r2 ∼ tn−2 , i.e., we have n − 2 degrees of freedom in Student’s t distribution, where n is the number of corresponding ranks or values. We always perform a one-sided test whether r ≯ 0.

4.1

Dyads

Table 2 shows the perceived and computed relative consonance of dyads (intervals). The empirical rank is the average ranking according to the summary given by Schwartz et al. (2003, Figure 6), which includes the data by Malmberg (1918). Table 3 provides a more extensive list of approaches on harmony perception, indicating the correlation of the rankings together with its significance. As one can see, the correlations of the empirical rating with the sonance factor and

Table 2: Consonance rankings of dyads. The respective numbers of semitones with respect to the Western twelve-tone system are given in braces, raw values of the respective measures in parentheses. The empirical rank is the average rank according to the summary given by Schwartz et al. (2003, Figure 6). The roughness values are taken from Hutchinson and Knopoff (1978, Appendix). For computing the sonance factor (Hofmann-Engl, 2004, 2008), the Harmony Analyzer 3.2 applet software has been used, available at http://www.chameleongroup.org.uk/software/piano.html. For these models, always C4 (middle C) is taken as lowest tone. interval unison {0, 0} octave {0, 12} perfect fifth {0, 7} perfect fourth {0, 5} major third {0, 4} major sixth {0, 9} minor sixth {0, 8} minor third {0, 3} tritone {0, 6} minor seventh {0, 10} major second {0, 2} major seventh {0, 11} minor second {0, 1} correlation r

emp. rank 1 2 3 4 5 6 7 8 9 10 11 12 13

roughness 2 (0.0019) 1 (0.0014) 3 (0.0221) 4 (0.0451) 6 (0.0551) 5 (0.0477) 7 (0.0843) 10 (0.1109) 8 (0.0930) 9 (0.0998) 12 (0.2690) 11 (0.2312) 13 (0.4886) .967

sonance factor 1-2 (1.000) 1-2 (1.000) 3 (0.737) 4 (0.701) 5 (0.570) 6 (0.526) 7 (0.520) 8 (0.495) 11 (0.327) 9 (0.449) 10 (0.393) 12 (0.242) 13 (0.183) .982

similarity 1-2 (100.00%) 1-2 (100.00%) 3 (66.67%) 4 (50.00%) 6 (40.00%) 5 (46.67%) 9 (30.00%) 7 (33.33%) 8 (31.43%) 10 (28.89%) 11 (22.22%) 12 (18.33%) 13 (12.50%) .977

rel. periodicity 1-2 (1.0) 1-2 (1.0) 3 (2.0) 4-5 (3.0) 6 (4.0) 4-5 (3.0) 7-8 (5.0) 7-8 (5.0) 9 (6.0) 10 (7.0) 12 (8.5) 11 (8.0) 13 (15.0) .982

Table 3: Correlations of several consonance rankings with empirical ranking for dyads. For percentage similarity (Gill and Purves, 2009, Table 1), gradus suavitatis (Euler, 1739), consonance value (Brefeld, 2005), and the Ω measure (Stolzenburg, 2012), the frequency ratios from just tuning (see Table 1) are used. approach sonance factor (Hofmann-Engl, 2004, 2008) relative and logarithmic periodicity (just tuning) consonance raw value (Foltyn, 2012, Figure 5) percentage similarity (Gill and Purves, 2009, Table 1) roughness (Hutchinson and Knopoff, 1978, Appendix) gradus suavitatis (Euler, 1739) consonance value (Brefeld, 2005) pure tonalness (Parncutt, 1989, p. 140) relative and logarithmic periodicity (rational tuning) dissonance curve (Sethares, 2005, Figure 6.1) Ω measure (Stolzenburg, 2012) generalized coincidence function (Ebeling, 2008, Figure 3B) relative periodicity (Pythagorean tuning) relative periodicity (Kirnberger III) complex tonalness (Parncutt, 1989, p. 140)

12

correlation r .982 .982 .978 .977 .967 .941 .940 .938 .936 .905 .886 .841 .817 .796 .738

significance p .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0002 .0003 .0006 .0020

with relative or logarithmic periodicity show the highest correlation (r = .982). However, for dyads, almost all correlations of the different approaches are highly statistically significant. Exceptions are complex tonalness (Parncutt, 1989, p. 140) and relative periodicity, if Pythagorean tuning or Kirnberger III are employed, but both these tunings are not psychophysically motivated by the just noticeable difference of pitch perception of about 1% (cf. Section 3.3). In contrast to this, relative and logarithmic periodicity employing just intonation or the rational tuning with maximal deviation d = 1% (see Table 1) shows high correlation. Last but not least, it should be noted that the so-called major profile, investigated by Krumhansl (1990, Figure 3.1), also corresponds quite well with the empirical consonance ranking. Here, an ascending or descending major scale is presented to test persons. A probe tone comes next. The listener’s task is to rate it as a completion of the scale context. The correlation in this case is r = .846, which is still significant with p = .0001.

notion is motivated more by neuroacoustical results, namely that the spatial structure of the periodicity-pitch representation in the brain is organized as a logarithmic periodicity map. Johnson-Laird et al. (2012, Experiment 3) also provide data on n = 48 four-tone chords. A summary analysis is shown in Table 7, indicating once again high correlation for several harmoniousness measures, including logarithmic periodicity.

4.3

From Chords to Scales

In contrast to many other approaches, the periodicity-based approach can easily be applied to scales and yields meaningful results (Stolzenburg, 2009). For instance, Figure 7a shows the pentachord Emaj7/9 (with E4 as lowest tone), classically built from a stack of thirds, standard in jazz music. It is the highest ranked harmony with 5 out of 12 tones (log2 (h) = 3.751 with respect to just tuning, log2 (h) = 4.234 with respect to rational tuning). Although in most cases this pentachord is most likely not heard in a bitonal manner, it may be alternatively understood as superposition of the major triads E and B, which are in a tonic-dominant relationship ac4.2 Triads and More cording to classical harmony theory. As just said, it appears in Table 4 shows the perceived and computed relative conso- the front rank of all harmonies consisting of 5 out of 12 tones, nance of common triads (cf. Figure 1). There are several em- which does not hold for a superposition of a random chord pirical studies on the perception of common triads (Roberts, sequence. Thus, also for chord progressions the periodicity1986; Cook, 2009; Johnson-Laird et al., 2012). But since the based approach yields meaningful results. Nevertheless, this experiments conducted by Johnson-Laird et al. (2012, Exper- point has to be investigated further. All harmonies shown in iment 1), are the most comprehensive, because they exam- Figure 7 have low, i.e. good periodicity values, ranking among ined all 55 possible three-note chords, we adopt this study the top 5% in their tone multiplicity category with respect as reference for the empirical ranking here. Nonetheless, all to rational tuning. This holds for the pentatonics (5 tones, cited studies are consistent with the following preference or- log2 (h) = 5.302) the diatonic scale (7 tones, log2 (h) = 6.453) dering on triads: major ≺ minor ≺ suspended ≺ diminished ≺ as well as the blues scale (8 tones, log2 (h) = 7.600). augmented, at least for chords in root position. However, the Temperley and Tan (2013) investigate the perceived conordinal ratings of minor and suspended chords do not differ sonance of diatonic scales. Table 8 lists all classical church very much. Again, the summary table (Table 5) reveals high- modes, i.e. the diatonic scale and its inversions. The cogniest correlations for relative and logarithmic periodicity, if the tive model on the perception of diatonic scales introduced by underlying tuning is psychophysically motivated. The relative Temperley and Tan (2013) results in a 100% correlation with prevalence of chord types in Western classical music (Eber- the empirical data. Although the correlation for logarithmic lein, 1994, p. 421) also correlates rather well with the empir- periodicity obviously is not that good, it shows still high corical consonance rankings (r = .481, p = .0482). However, relation. Nevertheless, the major scale (Ionian, cf. Figure 7c) roughness (Hutchinson and Knopoff, 1978, 1979) and the so- appears in the front rank of 462 possible scales with 7 out of nance factor (Hofmann-Engl, 2004, 2008) yield relatively bad 12 tones with respect to relative and logarithmic periodicity. predictions on the perceived consonance of common triads. In addition, in contrast to more cognitive theories on harmony This is in line with the results of Cousineau et al. (2012), that perception, the periodicity-based approach introduced in this the quality of roughness (induced by beating) constitutes an article does not presuppose any principles of tonal music, e.g. aesthetic dimension that is distinct from that of dissonance. the existence of diatonic scales or the common use of the maThe data sets in Johnson-Laird et al. (2012) suggest fur- jor triad. They may be derived from underlying, more primither investigations. So, Table 6 shows the analysis of all pos- tive mechanisms, namely periodicity detection in the human sible three-tone chords in root position (Johnson-Laird et al., (as well as animal) brain. 2012, Figure 2). As one can see, the correlation between the Interestingly, with rational tuning as basis, the results for empirical rating with the predictions of the dual-process the- scales (Table 8) are better for logarithmic periodicity than ory is very high (r = .916). This also holds for logarithmic with just tuning (r = .964 vs. r = .786). In the former periodicity but not that much for relative periodicity, in partic- case, the seven classical church modes even appear in the very ular, if the correlation between the ordinal rating and the con- front ranks of their tone multiplicity category. The correlation crete periodicity values are taken, which are shown in brackets of percentage similarity is low. But despite this, the diatonic in Table 6 (r = .810 vs. r = .548). This justifies our prefer- scales and its inversions are among the 50 heptatonic scales ence of logarithmic to relative periodicity, because the former whose intervals conform most closely to a harmonic series

13

Table 4: Consonance rankings of common triads. The empirical rank is adopted from Johnson-Laird et al. (2012, Experiment 1), where the tones are reduced to one octave in the theoretical analysis here. The roughness values are taken from Hutchinson and Knopoff (1979, Table 1), where again C4 (middle C) is taken as the lowest tone. For relative periodicity and percentage similarity (Gill and Purves, 2009), the frequency ratios from just tuning are used. The dual-process theory (Johnson-Laird et al., 2012, Figure 2) as a cognitive theory only provides ranks, no numerical raw values. chord class major {0, 4, 7} {0, 3, 8} {0, 5, 9} minor {0, 3, 7} {0, 4, 9} {0, 5, 8} susp. {0, 5, 7} {0, 2, 7} {0, 5, 10} dim. {0, 3, 6} {0, 3, 9} {0, 6, 9} augm. {0, 4, 8} correlation r

emp. rank 1 (1.667) 5 (2.889) 3 (2.741) 2 (2.407) 10 (3.593) 8 (3.481) 7 (3.148) 6 (3.111) 4 (2.852) 12 (3.889) 9 (3.519) 11 (3.667) 13 (5.259)

roughness 3 (0.1390) 9 (0.1873) 1 (0.1190) 4 (0.1479) 2 (0.1254) 7 (0.1712) 11 (0.2280) 13 (0.2490) 6 (0.1549) 12 (0.2303) 10 (0.2024) 8 (0.1834) 5 (0.1490) .352

instability 1 (0.624) 5 (0.814) 4 (0.780) 2 (0.744) 3 (0.756) 6 (0.838) 8 (1.175) 11 (1.219) 9 (1.190) 12 (1.431) 7 (1.114) 10 (1.196) 13 (1.998) .698

similarity 1-2 (46.67%) 8-9 (37.78%) 5-6 (45.56%) 1-2 (46.67%) 5-6 (45.56%) 8-9 (37.78%) 3-4 (46.30%) 3-4 (46.30%) 7 (42.96%) 13 (32.70%) 10-11 (37.14%) 10-11 (37.14%) 12 (36.67%) .802

rel. periodicity 2 (4.0) 3 (5.0) 1 (3.0) 4 (10.0) 7 (12.0) 10 (15.0) 5 (10.7) 9 (14.3) 6 (11.0) 12 (17.0) 11 (15.3) 8 (13.3) 13 (20.3) .846

dual proc. 2 1 3 4 5 6 7 9 8 12 10 11 13 .791

Table 5: Correlations of several consonance rankings with empirical ranking for triads. Since only n = 10 values were available for pure tonalness (Parncutt, 1989, p. 140) and the consonance degree according to Foltyn (2012, Figure 6), because suspended or diminished chords, respectively, are missing, we have only 8 degrees of freedom in the calculation of the respective significance values. For some approaches (Helmholtz, 1862; Plomp and Levelt, 1965; Kameoka and Kuriyagawa, 1969; Sethares, 2005), the rankings are taken from Cook (2009, Table 1). approach relative periodicity (just tuning) logarithmic periodicity (just tuning) logarithmic periodicity (rational tuning) relative periodicity (rational tuning) percentage similarity (Gill and Purves, 2009) dual process (Johnson-Laird et al., 2012, Figure 2) consonance value (Brefeld, 2005) consonance degree (Foltyn, 2012, Figure 6) dissonance curve (Sethares, 2005) instability (Cook and Fujisawa, 2006, Table A2) gradus suavitatis (Euler, 1739) sensory dissonance (Kameoka and Kuriyagawa, 1969) tension (Cook and Fujisawa, 2006, Table A2) pure tonalness (Parncutt, 1989, p. 140) critical bandwidth (Plomp and Levelt, 1965) temporal dissonance (Helmholtz, 1862) sonance factor (Hofmann-Engl, 2004, 2008) roughness (Hutchinson and Knopoff, 1979, Table 1)

14

correlation r .846 .831 .813 .808 .802 .791 .755 .826 .723 .698 .690 .607 .599 .675 .570 .503 .434 .352

significance p .0001 .0002 .0004 .0004 .0005 .0006 .0014 .0016 .0026 .0040 .0045 .0139 .0153 .0162 .0210 .0399 .0692 .1193

Table 6: Consonance correlation for complete list of triads in root position (Johnson-Laird et al., 2012, Experiment 2, n = 19). The ordinal rating and the numerical values of the considered measures are given in parentheses. The correlation and significance values written in parentheses refer to the ordinal rating, while the ones outside parentheses just compare the respective rankings. The roughness values are taken from Johnson-Laird et al. (2012, Figure 2), who employ the implementation available at http://www.uni-graz.at/richard.parncutt/computerprograms.html, based on the research reported in Bigand et al. (1996). In all other cases, just tuning is used as underlying tuning for the respective frequency ratios. chord # semitones emp. rank roughness similarity rel. periodicity log. periodicity dual proc. 1a (major) {0, 16, 19} 1 (1.667) 2 (0.0727) 1 (64.44%) 1 (2.0) 1 (1.000) 1 2a {0, 19, 22} 3 (2.481) 1 (0.0400) 5 (52.59%) 8-9 (12.3) 6-7 (3.133) 2 3a {0, 16, 22} 4-5 (2.630) 3 (0.0760) 9 (38.62%) 15-16 (19.0) 8 (3.271) 3 4a {0, 15, 22} 6 (2.926) 4 (0.0894) 8 (39.26%) 8-9 (12.3) 6-7 (3.133) 4 5a (minor) {0, 15, 19} 2 (2.407) 5 (0.0972) 3-4 (55.56%) 4 (5.0) 4 (2.322) 5 6a (susp.) {0, 7, 14} 7 (3.148) 6 (0.0983) 3-4 (55.56%) 6-7 (11.7) 5 (2.918) 6 7a {0, 19, 23} 8 (3.370) 7 (0.1060) 2 (56.67%) 2-3 (4.0) 2-3 (2.000) 7 8a {0, 16, 23} 4-5 (2.630) 8 (0.1097) 6 (52.22%) 2-3 (4.0) 2-3 (2.000) 8 9a (dim.) {0, 15, 18} 11 (3.889) 16 (0.2214) 17 (28.57%) 13 (17.0) 14 (3.786) 9 10a {0, 11, 14} 13 (3.963) 9 (0.1390) 18 (28.33%) 10-11 (14.3) 12 (3.585) 10 11a {0, 18, 22} 15 (5.148) 12 (0.1746) 14 (30.05%) 15-16 (19.0) 11 (3.540) 11 12a {0, 17, 23} 12 (3.926) 13 (0.1867) 12 (34.37%) 6-7 (11.7) 10 (3.497) 12 13a {0, 14, 17} 9 (3.481) 14 (0.1902) 11 (36.11%) 5 (10.0) 9 (3.308) 13 14a {0, 15, 26} 10 (3.630) 17 (0.2485) 13 (33.52%) 12 (15.7) 15 (3.800) 14 15a {0, 11, 18} 14 (4.815) 18 (0.2639) 10 (36.90%) 17-18 (25.7) 17 (4.571) 15 16 (augm.) {0, 16, 20} 16 (5.259) 10 (0.1607) 7 (41.67%) 10-11 (14.3) 13 (3.655) 16 17a {0, 15, 23} 17-18 (5.593) 11 (0.1727) 16 (28.89%) 17-18 (25.7) 18 (4.655) 17 18a {0, 20, 23} 19 (5.630) 15 (0.2164) 15 (29.44%) 14 (17.7) 16 (3.989) 18 19a {0, 14, 25} 17-18 (5.593) 19 (0.3042) 19 (19.93%) 19 (101.0) 19 (5.964) 19 correlation r .761 (.746) .760 (.765) .713 (.548) .867 (.810) .916 significance p .0001 (.0001) .0001 (.0001) .0003 (.0075) .0000 (.0000) .0000

Table 7: Correlations of several consonance rankings with ordinal values of empirical rating for n = 48 selected four-note chords, spread over more than one octave (Johnson-Laird et al., 2012, Figure 3). For the Ω measure (Stolzenburg, 2012), percentage similarity (Gill and Purves, 2009), and gradus suavitatis (Euler, 1739), once again the frequency ratios from just tuning are used. approach dual process (Johnson-Laird et al., 2012, Figure 3) Ω measure (Stolzenburg, 2012) gradus suavitatis (Euler, 1739) logarithmic periodicity (just tuning) logarithmic periodicity (rational tuning) percentage similarity (Gill and Purves, 2009) relative periodicity (just tuning) relative periodicity (rational tuning) roughness (Hutchinson and Knopoff, 1978, 1979)

15

correlation r .895 .824 .785 .758 .754 .734 .567 .531 .402

significance p .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0023

4444 G

¯ ¯ ¯ ¯ ¯ 6¯ ¯ ¯ ¯ ¯ ¯ ¯ ¯ ¯ ¯ 6¯ 4¯ ¯ ¯ ¯ ¯ ¯ ¯ ¯ ¯ (a) pentachord

(b) pentatonics

(c) diatonic scale

(d) blues scale

Figure 7: Harmonies (scales) with more than three tones. out of 4 · 107 examined possibilities Gill and Purves (2009, music with Makam melody types or tone scales of recordings Table 3), which is of course a meaningful result. of traditional Central African music (see http://music. africamuseum.be/ and Moelants et al., 2009), do not seem to be based on equal temperament tunings. Nevertheless, they 5 Discussion and Conclusions can be analyzed by the periodicity-based approach, predicting also relatively good, i.e. low values of consonance for 5.1 Summary these scales (Stolzenburg, 2010). The approaches by Gill and We have seen in this article, that harmony perception can be Purves (2009) and Honingh and Bod (2011) are also applicaexplained well by considering the periodic structure of har- ble in this context, too. monic sounds, that can be computed from the frequency ratios of the intervals in the given harmony. The presented results show highest correlation with empirical results and thus contribute to the discussion about consonance/dissonance of musical chords and scales. We conclude that there is a strong neuroacoustical and psychophysical basis for harmony perception including chords and scales.

5.2

Limitations of the Approach

The periodicity-based method, as presented in this article, clearly has some limitations. First, the available empirical studies on harmony perception used in this article (Schwartz et al., 2003; Johnson-Laird et al., 2012; Temperley and Tan, 2013) in general take the average over all participants’ ratings. Thus, individual differences in perception are neglected, e.g. the influence of culture, familiarity, or musical training. However, some studies report that especially the number of years of musical training has a significant effect on harmony perception, although periodicity detection remains an important factor, that is used to a different extent by musicians and nonmusicians (Hall and Hess, 1984; Lee et al., 2009; McLachlan et al., 2013). Second, we do not consider in detail the context of a harmony in a musical piece during the periodicity-based analyses in this article. Therefore, Parncutt and Hair (2011) attempt to explain the perception of musical harmonies as a holistic phenomenon, covering a broad spectrum, including the conception of consonance/dissonance as pleasant/unpleasant and the history in Western music and music theory, emphasizing that consonance/dissonance should be discussed along several dimensions such as tense/relaxed, familiar/unfamiliar, and tonal/atonal. Nevertheless, periodicity may be used as one constituent in explaining harmony perception, even with respect to the historical development of (Western) music, by assuming different levels of harmonic complexity changing over time (Parncutt, 1989, pp. 57-58). Third, we adopt the equal temperament as reference system for tunings, in this article, which is the basis of Western music. However, non-Western scales, e.g. in Turkish classical

5.3

Future Work

Future work should concentrate on even more exhaustive empirical experiments on harmony perception, in order to improve the significance and confidence in the statistical analyses, including more detailed investigations of chord progressions, possibly employing different tunings and timbres, and taking into account the historical development of Western music and beyond. Last but not least, the working of the brain with respect to auditory processing still must be better understood (see e.g. Patterson et al., 2002). In consequence, models of the brain that take temporal properties into account should be investigated, claimed also by Roederer (2008). Artificial neural networks that have this property (Cariani, 2001; Bahmer and Langner, 2006; Haykin, 2008; Stolzenburg and Ruh, 2009; Voutsas et al., 2004) could be considered further in this context.

Acknowledgments This article is a completely revised and extended version of previous work (Stolzenburg, 2009, 2010, 2012, 2014). I would like to thank Wolfgang Bibel, Peter A. Cariani, Norman D. Cook, Martin Ebeling, Adrian Foltyn, Sergey Gaponenko, Ludger J. Hofmann-Engl, Phil N. Johnson-Laird, Gerald Langner, Florian Ruh, Tilla Schade, and some anonymous referees for helpful discussions, hints, and comments on this article or earlier versions thereof.

References Apt, K. R. and Wallace, M. (2007). Constraint Logic Programming Using ECLiPSe. Cambridge University Press, Cambridge, UK. Bahmer, A. and Langner, G. (2006). Oscillating neurons in the cochlear nucleus: I. Simulation results, II. Experimental basis of a simulation paradigm. Biological Cybernetics, 95:371–392.

16

Table 8: Rankings of common heptatonic scales (church modes), i.e. with 7 out of 12 tones. As empirical rating, the overall preference for the classical church modes is adopted (Temperley and Tan, 2013, Figure 10). For the sonance factor (HofmannEngl, 2004, 2008), again C4 (middle C) is taken as lowest tone. For percentage similarity, the values are taken directly from (Gill and Purves, 2009, Table 3). mode

semitones

Ionian {0, 2, 4, 5, 7, 9, 11} Mixolydian {0, 2, 4, 5, 7, 9, 10} Lydian {0, 2, 4, 6, 7, 9, 11} Dorian {0, 2, 3, 5, 7, 9, 10} Aeolian {0, 2, 3, 5, 7, 8, 10} Phrygian {0, 1, 3, 5, 7, 8, 10} Locrian {0, 1, 3, 5, 6, 8, 10} correlation r significance p

emp. rank

sonance factor

similarity

1 2 3 4 5 6 7

4 (0.147) 1.5 (0.162) 1.5 (0.162) 3 (0.152) 6 (0.138) 7 (0.126) 5 (0.142) .667 .0510

3 (39.61%) 6 (38.59%) 5 (38.95%) 2 (39.99%) 4 (39.34%) 1 (40.39%) 7 (37.68%) .036 .4697

(0.83) (0.64) (0.58) (0.40) (0.34) (0.21)

Bailhache, P. (1997). Music translated into mathematics: Leonhard Euler. In Problems of translation in the 18th century, Nantes, France. Conference of the Center Franc¸ois Vi`ete. Original in French. English translation by Joe Monzo. Balzano, G. J. (1980). The group-theoretic description of twelvefold and microtonal pitch systems. Computer Music Journal, 4(4):66–84.

log. periodicity (just tuning) 1 (5.701) 4 (5.998) 2 (5.830) 3 (5.863) 7 (6.158) 5 (6.023) 6 (6.033) .786 .0181

log. periodicity (rational tuning) 1 (6.453) 3 (6.607) 2 (6.584) 4 (6.615) 5 (6.767) 6 (6.778) 7 (6.790) .964 .0002

8th Triennial Conference of the European Society for the Cognitive Sciences of Music, Thessaloniki, Greece. Carey, N. and Clampitt, D. (1989). Aspects of well-formed scales. Music Theory Spectrum, 11(2):187–206. Cariani, P. A. (1999). Temporal coding of periodicity pitch in the auditory system: An overview. Neural Plasticity, 6(4):147–172.

Cariani, P. A. (2001). Neural timing nets. Neural Networks, Beament, J. (2001). How we hear music: The relationship 14(6-7):737–753. between music and the hearing mechanism. The Boydell Press, Woodbridge, UK. Clocksin, W. F. and Mellish, C. S. (2010). Programming in Prolog: Using the ISO Standard. Springer, Berlin, HeidelBidelman, G. M. and Krishnan, A. (2009). Neural correlates berg, New York, 5th edition. of consonance, dissonance, and the hierarchy of musical pitch in the human brainstem. The Journal of Neuroscience, Cook, N. D. (2009). Harmony perception: Harmoniousness is 29(42):13165–13171. more than the sum of interval consonance. Music Perception, 27(1):25–41. Bigand, E., Parncutt, R., and Lerdahl, F. (1996). Perception of musical tension in short chord sequences: The influence Cook, N. D. (2012). Harmony, Perspective and Triadic Cogof harmonic function, sensory dissonance, horizontal monition. Cambridge University Press, Cambridge, New York, tion, and musical training. Perception and Psychophysics, Melbourne, Madrid, Cape Town. 58(1):125–141. Cook, N. D. and Fujisawa, T. X. (2006). The psychophysics of Boomsliter, P. C. and Creel, W. (1961). The long pattern hyharmony perception: Harmony is a three-tone phenomenon. pothesis in harmony and hearing. Journal of Music Theory, Empirical Musicology Review, 1(2):1–21. 5(1):2–31. Cousineau, M., McDermott, J. H., and Peretz, I. (2012). The Bowling, D. L., Gill, K. Z., Choi, J. D., Prinz, J., and Purves, basis of musical consonance as revealed by congenital amuD. (2010). Major and minor music compared to excited sia. Proceedings of the National Academy of Sciences, and subdued speech. Journal of the Acoustical Society of 109(48):19858–19863. America, 127(1):491–503. Dewitt, L. A. and Crowder, R. G. (1987). Tonal fusion of conBrefeld, W. (2005). Zweiklang, Konsonanz, Dissonanz, Oksonant musical intervals: The oomph in Stumpf. Perception tave, Quinte, Quarte und Terz. Online Publication. In Ger& Psychophysics, 41(1):73–84. man. Ebeling, M. (2007). Verschmelzung und neuronale AutokorreCambouropolos, E., Tsougras, C., Mavromatis, P., and Paslation als Grundlage einer Konsonanztheorie. Peter Lang, tiadis, K., editors (2012). Proceedings of 12th InternaFrankfurt am Main, Berlin, Bern, Bruxelles, New York, Oxtional Conference on Music Perception and Cognition and ford, Wien. In German. 17

Ebeling, M. (2008). Neuronal periodicity detection as a basis Honingh, A. and Bod, R. (2005). Convexity and the wellfor the perception of consonance: A mathematical model of formedness of musical objects. Journal of New Music Retonal fusion. Journal of the Acoustical Society of America, search, 34(3):293–303. 124(4):2320–2329. Honingh, A. and Bod, R. (2011). In search of universal propEberlein, R. (1994). Die Entstehung der tonalen Klangsyntax. erties of musical scales. Journal of New Music Research, Peter Lang, Frankfurt. In German. 40(1):81–89. Euler, L. (1739). Tentamen novae theoria musicae ex certissimis harmoniae principiis dilucide expositae. In Latin. English translation by Charles S. Smith in June 1960.

Hutchinson, W. and Knopoff, L. (1978). The acoustic component of western consonance. Interface/Journal of New Music Research, 7(1):1–29.

Fokker, A. D. (1969). Unison vectors and periodicity blocks Hutchinson, W. and Knopoff, L. (1979). The significance in the three-dimensional (3-5-7-)harmonic lattice of notes. of the acoustic component of consonance in Western triad. In Proceedings of Koninklijke Nederlandsche Akademie Journal of Musicological Research, 3(1-2):5–22. van Wetenschappen, pages 153–168, Amsterdam, NetherJohnson-Laird, P. N., Kang, O. E., and Leong, Y. C. (2012). lands. On musical dissonance. Music Perception, 30(1):19–35. Foltyn, A. (2012). Neuroscientific measure of consonance. In Kameoka, A. and Kuriyagawa, M. (1969). Consonance theCambouropolos et al. (2012), pages 310–316. ory: I. Consonance of dyads, II. Consonance of complex Foriˇsek, M. (2007). Approximating rational numbers by fractones and its calculation method. Journal of the Acoustical tions. In Crescenzi, P., Prencipe, G., and Pucci, G., editors, Society of America, 45(6):1451–1469. Fun with Algorithms – Proceedings of 4th International Conference, LNCS 4475, pages 156–165, Castiglioncello, Kopiez, R. (2003). Intonation of harmonic intervals: Adaptability of expert musicians to equal temperament and just Italy. Springer, Berlin, Heidelberg, New York. intonation. Music Perception, 20(4):383–410. Gill, K. Z. and Purves, D. (2009). A biological rationale for Krumhansl, C. L. (1990). Cognitive Foundations of Musimusical scales. PLoS ONE, 4(12):e8144. cal Pitch, volume 17 of Oxford Psychology Series. Oxford Graham, R. L., Knuth, D. E., and Patashnik, O. (1994). ConUniversity Press, New York. crete Mathematics. Addison-Wesley, Reading, MA, 2nd Langner, G. (1983). Evidence for neuronal periodicity detecedition. tion in the auditory system of the Guinea fowl: Implications Hall, D. E. and Hess, J. T. (1984). Perception of musical infor pitch analysis in the time domain. Experimental Brain terval tuning. Music Perception, 2(2):166–195. Research, 52(3):333–355. Hardy, G. H. and Wright, E. M. (1979). An Introduction to the Langner, G. (1997). Temporal processing of pitch in the audiTheory of Numbers. Clarendon Press, Oxford, England, 5th tory system. Journal of New Music Research, 26:116–132. edition. Langner, G., Sams, M., Heil, P., and Schulze, H. (1997). Haykin, S. (2008). Neural Networks and Learning Machines. Frequency and periodicity are represented in orthogonal Prentice Hall, 3rd edition. maps in the human auditory cortex: evidence from magnetoencephalography. Journal of Comparative Physiology Helmholtz, H. L. F. v. (1862). Die Lehre von den TonempfindA, 181:665–676. ungen. Vieweg, Braunschweig. In German. Langner, G. and Schreiner, C. E. (1988). Periodicity coding in Hirata, K., Tzanetakis, G., and Yoshii, K., editors (2009). Prothe inferior colliculus of the cat: I. Neuronal mechanisms, ceedings of 10th International Society for Music InformaII. Topographical organization. Journal of Neurophysioltion Retrieval Conference, Kobe, Japan. ogy, 60(6):1799–1840. Hofmann-Engl, L. J. (2004). Virtual pitch and its application to contemporary harmonic analysis. Chameleon Group Online Publication.

Lee, K. M., Skoe, E., Kraus, N., and Ashley, R. (2009). Selective subcortical enhancement of musical intervals in musicians. The Journal of Neuroscience, 29(18):5832–5840.

Hofmann-Engl, L. J. (2008). Virtual pitch and the classifi- Licklider, J. C. R. (1951). A duplex theory of pitch perception. cation of chords in minor and major keys. In Miyazaki, Experientia, 7(4):127–134. K., Hiraga, Y., Adachi, M., Nakajima, Y., and Tsuzaki, M., editors, Proceedings of 10th International Conference on Licklider, J. C. R. (1962). Periodicity pitch and related audiMusic Perception and Cognition, Sapporo, Japan. tory process models. International Journal of Audiology, 1(1):11–36.

18

Malmberg, C. F. (1918). The perception of consonance and dissonance. Psychological Monographs, 25(2):93–133.

Sethares, W. A. (2005). Tuning, Timbre, Spectrum, Scale. Springer, London, 2nd edition.

McLachlan, N., Marco, D., Light, M., and Wilson, S. (2013). Consonance and pitch. Journal of Experimental Psychology: General.

Stolzenburg, F. (2009). A periodicity-based theory for harmony perception and scales. In Hirata et al. (2009), pages 87–92.

Meddis, R. and Hewitt, M. J. (1991). Virtual pitch and phase sensivity of a computer model of the auditory periphery: I. Pitch identification, II. Phase sensivity. Journal of the Acoustical Society of America, 89(6):2866–2894.

Stolzenburg, F. (2010). A periodicity-based approach on harmony perception including non-western scales. In Demorest, S. M., Morrison, S. J., and Campbell, P. S., editors, Proceedings of 11th International Conference on Music Perception and Cognition, pages 683–687, Seattle, Washington, USA.

Moelants, D., Cornelis, O., and Leman, M. (2009). Exploring African tone scales. In Hirata et al. (2009), pages 489–494.

Parncutt, R. (1989). Harmony: A Psychoacoustical Approach. Stolzenburg, F. (2012). Harmony perception by periodicity and granularity detection. In Cambouropolos et al. (2012), Springer, Berlin, Heidelberg, New York. pages 958–959. Parncutt, R. and Hair, G. (2011). Consonance and dissonance in music theory and psychology: Disentangling dissonant Stolzenburg, F. (2014). Harmony perception by periodicity detection. In Proceedings of 13th International Conference dichotomies. Journal of Interdisciplinary Music Studies, on Music Perception and Cognition and 5th Conference of 5(2):119–166. the Asian-Pacific Society for the Cognitive Sciences of MuPatterson, R. D., Uppenkamp, S., Johnsrude, I. S., and Grifsic, Seoul, South Korea. Accepted for publication. fiths, T. D. (2002). The processing of temporal pitch and melody information in auditory cortex. Neuron, 36(4):767– Stolzenburg, F. and Ruh, F. (2009). Neural networks and continuous time. In Schmid, U., Ragni, M., and Knauff, M., 776. editors, Proceedings of KI 2009 Workshop Complex CogniPlomp, R. (1967). Beats of mistuned consonances. Journal of tion, pages 25–36, Paderborn. the Acoustical Society of America, 42(2):462–474. Stumpf, C. (1883). Tonpsychologie, volume 1-2. S. Hirzel, Plomp, R. and Levelt, W. J. M. (1965). Tonal consonance Leipzig. In German. and critical bandwidth. Journal of the Acoustical Society of Temperley, D. and Tan, D. (2013). Emotional connotations of America, 38(4):548–560. diatonic modes. Music Perception, 30(3):237–57. Roberts, L. A. (1986). Consonant judgments of musical chords by musicians and untrained listeners. Acustica, Terhardt, E., Stoll, G., and Seewann, M. (1982). Algorithm for extraction of pitch and pitch salience from complex 62:163–171. tonal signals. Journal of the Acoustical Society of AmerRoederer, J. G. (2008). The Physics and Psychophysics of ica, 71(3):679–688. Music: An Introduction. Springer, Berlin, Heidelberg, New York, 4th edition. Tramo, M. J., Cariani, P. A., Delgutte, B., and Braida, L. D. (2001). Neurobiological foundations for the theory of harSchouten, J. F. (1938). The perception of subjective tones. mony in western tonal music. Annals of the New York In Proceedings of Koninklijke Nederlandsche Akademie Academy of Sciences, 930:92–116. van Wetenschappen 41/10, page 1086–1093, Amsterdam, Netherlands. Vos, J. (1986). Purity ratings of tempered fifths and major thirds. Music Perception, 3(3):221–258. Schouten, J. F. (1940). The residue, a new component in subjective sound analysis. In Proceedings of Koninklijke Voutsas, K., Langner, G., Adamy, J., and Ochsen, M. (2004). Nederlandsche Akademie van Wetenschappen 43/3, page A brain-like neural network for periodicity analysis. IEEE 356–365, Amsterdam, Netherlands. Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 35(1):12–22. Schulze, H. and Langner, G. (1997). Periodicity coding in the primary auditory cortex of the mongolian gerbil (merione- Zwicker, E., Flottorp, G., and Stevens, S. S. (1957). Critical sunguiculatus): two different coding strategies for pitch and band width in loudness summation. Journal of the Acousrhythm? Journal of Comparative Physiology A, 181:651– tical Society of America, 29(5):548–557. 663. Schwartz, D. A., Howe, C. Q., and Purves, D. (2003). The statistical structure of human speech sounds predicts musical universals. The Journal of Neuroscience, 23(18):7160– 7168. 19