Background Music Generation Using Music Texture Synthesis

Min-Joon Yoo1, In-Kwon Lee1, and Jung-Ju Choi2

1 Department of Computer Science, Yonsei University, Seoul 120-749, South Korea
[email protected], [email protected]
2 Division of Media, Ajou University, Suwon 443-749, South Korea
[email protected]

Abstract. This paper suggests a method to synthesize a long background music sequence from a given short music clip in real time. Copies of the input clip are placed with overlapped regions, whose lengths are computed by random selection or by a clip matching method. Based on pitch, rhythm, and chord cut criteria, a cutting point between the two clips is computed within each overlapped region, and the two clips are concatenated at that point. Generating variations of the given music clip, such as mirroring, retrograding, and transposing, makes the synthesized result considerably more dynamic. The suggested method is especially useful for interactive and real-time applications such as games and web contents.

1 Introduction

Background music, along with various sound effects, helps audiences become immersed in multimedia applications such as animations, games, web contents, and digital movies [4, 9, 11, 14]. Recently, background music is often created with a loop-based method [16]: users can easily create background music by arranging the template music loops offered by loop sequencing software. Because of limitations such as memory space and transmission cost, background music is usually kept very short and played repeatedly in many applications. However, such simple repetition of the same music clip becomes tedious even if the clip is a masterpiece.

In this paper, we suggest a method to synthesize a long (possibly infinite) music sequence from a given short music clip. Audiences can recognize that the resulting synthesized music is similar to the original music clip, yet it is not a simple repetition of the original. Copies of the original music clip are placed, randomly or using clip matching, with some overlaps. Each overlapped region between two clips is analyzed to determine a cutting point, where the two clips are cut and concatenated (see Fig. 1(a)). In this sense, our method, called "music texture synthesis", is similar to the patch-based texture synthesis techniques [2, 6, 8, 17] used for generating a large texture image from a small input.

Fig. 1. (a) The output music is synthesized by placing copies of an input clip and concatenating them at cutting points; (b) placement is always aligned with measures (M: current output music, C: a copy of the input music clip, d: the number of measures in the overlapped region).

Recently, there have been several attempts to synthesize a new audio signal from given audio sources [1, 5, 12]. However, most of that work concentrated on synthesizing sound effects rather than background music. For music synthesis, we need to consider more carefully the musical relationships among the musical elements in the input music clip. We also present some extensions of the music synthesis in this paper: synthesizing output music from several input clip variations obtained by mirroring, retrograding, and transposing the source music clip.

2 Placement

The copies of an input music clip can be placed at various granularities in terms of musical elements such as beats and measures. In this paper, we consider only the measure-aligned placement scheme, which keeps the time signature of the output music consistent. Let C be a given input music clip with n measures, and assume that we want to place C near the end of the currently synthesized output music M with an overlapped region of d measures, where 1 ≤ d ≤ n (see Fig. 1(b)).

We use one of two methods to determine the value of d: random selection and clip matching. In the random selection method, d is simply drawn at random from its possible range. In the clip matching method, the acceptance of a randomly selected value d is determined by computing the matching cost c over the overlapped region:

$$c = \frac{1}{r}\sum_{t \in d} \bigl(M(t) - C(t)\bigr)^2, \qquad (1)$$

where r denotes the number of musical events in the overlapped region d. In the above formula, M(t) and C(t) are two musical events that occur at the same time t (or at least form a matching pair for comparison; see Section 3) in the region d. The difference between the two musical events, M(t) − C(t), can be measured by various criteria, for example, the difference of the pitches or durations of two notes.

We present the detailed difference computation schemes in Section 3. If the cost is less than a given limit, the randomly selected value d is accepted as the next translation for the synthesis. To control the randomness in the selection of d, we can use a probabilistic acceptance scheme: once the matching cost c for a candidate d has been computed, we accept d only when the probability

$$P(c) = e^{-c/k} \qquad (2)$$

is larger than a given limit probability (we used 0.5 as the limit in most of our experiments). The predefined constant k is determined by considering the average matching cost, and it controls the randomness of the placement: the larger the value of k, the larger the randomness in the selection of d.

3 Cutting Point

After determining the overlapped region d between the current output music M and the input music clip C, we need to decide the exact cutting point in d, where M and C are cut and concatenated. For determining the cutting point, we can exploit various music analysis and synthesis techniques such as melody generation, rhythm analysis, and chord analysis [15]. First, for each note M(t) in the current output music, a matching note in the input clip C is found. If there is a note C(t) in C played at the same time t, (M(t), C(t)) forms a matching pair. If not, we pick the nearest earlier note C(t*), with t* < t, as the matching note of M(t) (see Fig. 2). Similarly, the matching note for each note C(t) can be found in M.

Pitch Cut: In the melody of typical music, the pitch difference between two adjacent notes is usually not large [13]. Using this fact, we can select the cutting point in terms of the pitch variations in the overlapped region: we choose the position of the matching note pair with the minimum pitch difference as the cutting point. If two or more tied candidates are found, one of them is selected at random (see Fig. 2).

Rhythm Cut: For determining the cutting measure in terms of rhythm variation, we find the pair of corresponding measures (two measures from M and C overlapped in d) with the most similar rhythm pattern. For each pair of measures, we count the number m of matching note pairs having different note durations; the pair of measures with the smallest m is selected as the cutting measure. Fig. 2 shows an example of finding a cutting measure.
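As an illustration, the following is a minimal Python sketch of the matching-pair construction and the pitch cut. The (onset, pitch, duration) tuple representation with MIDI pitch numbers is an assumption of this sketch, and it presumes at least one matching pair exists in the overlapped region.

```python
import random

def matching_pairs(m_notes, c_notes):
    """Pair each note M(t) with the note of C sounding at the same
    time t, or else with the nearest earlier note C(t*), t* < t.
    Notes are (onset, pitch, duration) tuples."""
    pairs = []
    for m in m_notes:
        earlier = [c for c in c_notes if c[0] <= m[0]]
        if earlier:
            pairs.append((m, max(earlier, key=lambda c: c[0])))
    return pairs

def pitch_cut(m_notes, c_notes):
    """Choose the matching pair with the minimum pitch difference
    as the cutting point; ties are broken at random."""
    pairs = matching_pairs(m_notes, c_notes)
    best = min(abs(m[1] - c[1]) for m, c in pairs)
    ties = [(m, c) for m, c in pairs if abs(m[1] - c[1]) == best]
    return random.choice(ties)
```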

Fig. 2. Determining the cutting point: the vertical line segments between the notes of M and C represent the matching pairs. Each chord name is annotated with its class: T (Tonic), S (Subdominant), or D (Dominant). Using the pitch cut, the two pairs a and b are candidate cutting points, having the smallest pitch difference (a diminished 2nd interval). With the rhythm cut method, the first of the four measure pairs is selected as the cutting measure pair, having the most similar rhythm patterns. With the chord cut method, we can select the third or fourth measure pair, each of which consists of two measures sharing the same chord.

Chord Cut: If the chord sequence of the input music clip is given or can be analyzed [15], there are many possibilities for finding cutting points in terms of the various chord progression schemes used in popular or jazz music [10]. For convenience in chord analysis, we classify the given chords into three basic categories (the chord names assume the key of C major):

– Tonic family: C, Em, and Am
– Subdominant family: F and Dm
– Dominant family: G and Bm−5

In the above classification we enumerated only the diatonic triads; however, the more complex chords used in jazz or popular music can be classified into the same three classes [10]. We also considered various chord substitution rules used in reharmonizing jazz music [10] for this classification. The cutting pair of measures is determined as the pair whose two measures have the most similar chord progression (see Fig. 2 and the sketch below).

Hybrid Method: After deciding the cutting measure by the rhythm or chord cut method, we can additionally apply the pitch cut method to the melody within the cutting measure. Using this hybrid method, the exact cutting note is determined inside the cutting measure, so the melody in the output music is connected smoothly, without any abrupt change.
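The chord cut can be sketched in Python as follows, using the family classification above. The chord-name spellings and the one-list-per-measure representation are assumptions of this sketch; richer jazz chords would first be reduced to a diatonic triad via the substitution rules mentioned above.

```python
# Diatonic triads of C major mapped to their functional families.
CHORD_FAMILY = {
    "C": "T", "Em": "T", "Am": "T",   # Tonic family
    "F": "S", "Dm": "S",              # Subdominant family
    "G": "D", "Bm-5": "D",            # Dominant family
}

def chord_cut(m_measures, c_measures):
    """Select the index of the overlapped measure pair with the most
    similar chord progression, scored by how many chord positions
    fall into the same functional family. Each measure is a list of
    chord names; unclassified chords match only identical spellings."""
    def similarity(m, c):
        return sum(CHORD_FAMILY.get(a, a) == CHORD_FAMILY.get(b, b)
                   for a, b in zip(m, c))
    return max(range(len(m_measures)),
               key=lambda i: similarity(m_measures[i], c_measures[i]))
```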

4 Variations and Results

We can apply the following variations to the original music clip to generate slightly varied input sources (a sketch of these operations follows the list):

– Mirroring: The pitch of each note in the original clip is mirrored with respect to the average pitch. For example, in Fig. 3(b), the notes of the original clip are mirrored about the center pitch 'F'.
– Retrograding: The order of notes in the original melody is reversed from end to start (see Fig. 3(c)).
– Transposing: The original melody is transposed to another key (see Fig. 3(d)).
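A minimal Python sketch of the three variation operators, assuming the same (onset, pitch, duration) note tuples as before, with pitches as MIDI numbers and intervals counted in semitones:

```python
def mirror(notes):
    """Reflect every pitch about the average pitch of the clip."""
    center = sum(p for _, p, _ in notes) / len(notes)
    return [(t, round(2 * center - p), d) for t, p, d in notes]

def retrograde(notes, clip_len):
    """Reverse the melody in time: a note that started at t with
    duration d now ends where it used to begin."""
    return sorted((clip_len - t - d, p, d) for t, p, d in notes)

def transpose(notes, interval):
    """Shift every pitch by a fixed interval in semitones
    (e.g. interval = 2 for a major 2nd, as in Fig. 3(d))."""
    return [(t, p + interval, d) for t, p, d in notes]
```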

Fig. 3. Melody synthesis example: (a) 4-measure input music clip, (b) mirrored input (center = F), (c) retrograde input, (d) transposed input (by a major 2nd interval), and (e) synthesized output from (a)–(d).

Fig. 4. An example including chords: (a) original input; (b) synthesized output from the original ⟨O⟩, retrograde ⟨R⟩, mirrored ⟨M⟩, and transposed ⟨T⟩ inputs. We used the hybrid method combining chord cut and pitch cut to generate the output.

From the given input music clip, varied input clips can be generated automatically using the above operations and used for creating synthesized music (see Fig. 3(e)), which is usually more dynamic than music synthesized from a single input clip. We used the pitch cut for determining the cutting points and clip matching for the placement of source clips in the example of Fig. 3(e). Fig. 4 shows a practical example including a chord progression. In this example, we applied the transposing, mirroring, and retrograding variations to generate various input sources from the given original clip of 8 measures.

When we generate the variations of an input music clip (with chords), the generated melody may not be compatible with the chords. In this case, the pitches of some notes in the varied melody are slightly modified by considering the available notes of the various modes in jazz theory [3]. The modification process can be automated by systematic rules defining the available and avoided notes in the seven modes (Ionian, Lydian, Aeolian, etc.); each avoided note is moved to the nearest available note. This modification makes the resulting melody fit the chords better. Readers can access more results of this research, including examples with melody and accompaniment, on the internet [18].
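The snapping of avoided notes can be sketched as follows in Python. The available pitch-class set would come from the mode of the current chord; rather than reproducing a full mode table, this sketch takes the set directly (e.g. {0, 2, 4, 7, 9, 11} for C Ionian with the avoid note F removed).

```python
def fix_avoided_notes(notes, available_pcs):
    """Move each avoided note to the nearest available pitch by
    searching outward one semitone at a time (lower neighbour first).
    Notes are (onset, pitch, duration) tuples with MIDI pitch numbers;
    available_pcs is a non-empty set of pitch classes (0-11)."""
    fixed = []
    for t, p, d in notes:
        q, step = p, 0
        while q % 12 not in available_pcs:
            step += 1
            if (p - step) % 12 in available_pcs:
                q = p - step
            elif (p + step) % 12 in available_pcs:
                q = p + step
        fixed.append((t, q, d))
    return fixed

# Example: snap a varied melody to C Ionian minus the avoid note F.
# fixed = fix_avoided_notes(melody, {0, 2, 4, 7, 9, 11})
```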

5 Conclusion

In this paper, we suggested a method to synthesize a long music sequence from a given short input music clip. Our method is inherently greedy, so it can be tuned for interactive and real-time applications: instead of packaging a long music sequence within a single piece of multimedia content, the background music can be synthesized from a short input music clip in real time. We are investigating better methods that exploit more musical criteria for cutting and placing the music clip. For example, a more intelligent chord cut method could consider non-diatonic chords such as secondary dominants [10]. For the placement, we could exploit a more sophisticated algorithm using Markov Random Fields (MRFs), which are often used in texture synthesis [6, 17]. Although we tested our methods on music in MIDI format, we believe they are also applicable to audio signals with the aid of music perception techniques [7].

References

1. Cardle, M., Brooks, S., Bar-Joseph, Z., and Robinson, P.: Sound-by-Numbers: Motion-Driven Sound Synthesis. Proceedings of ACM SIGGRAPH/Eurographics Symposium on Computer Animation (2003)
2. Cohen, M., Shade, J., Hiller, S., and Deussen, O.: Wang Tiles for Image and Texture Generation. Proceedings of ACM SIGGRAPH 2003 (2003) 287–294
3. Coker, J.: Improvising Jazz. Fireside Publisher (1986)
4. Cook, P.: Real Sound Synthesis for Interactive Applications. AK Peters (2002)
5. Dubnov, S., Bar-Joseph, Z., El-Yaniv, R., Lischinski, D., and Werman, M.: Synthesis of Audio Sound Textures by Learning and Resampling of Wavelet Trees. IEEE Computer Graphics and Applications, 22(4) (2002) 38–48
6. Efros, A., and Freeman, W.: Image Quilting for Texture Synthesis and Transfer. Proceedings of ACM SIGGRAPH 2001 (2001) 341–346
7. Gold, B., and Morgan, N.: Speech and Audio Signal Processing: Processing and Perception of Speech and Music. Wiley Text Books (1999)
8. Kwatra, V., Schödl, A., Essa, I., Turk, G., and Bobick, A.: Graphcut Textures: Image and Video Texture Synthesis Using Graph Cuts. Proceedings of ACM SIGGRAPH (2003) 277–286
9. Laybourne, K.: The Animation Book. Three Rivers Press (1998)
10. Levine, M.: The Jazz Theory Book. Sher Music Co. (1996)
11. Marks, A.: The Complete Guide to Game Audio. CMP Books (2001)
12. Parker, J., and Chan, S.: Sound Synthesis for the Web, Games, and Virtual Reality. Paper Sketch at SIGGRAPH 2003 (2003)
13. Perricone, J.: Melody in Songwriting. Hal Leonard Publisher (2000)
14. Rose, J.: Producing Great Sound for Digital Video. CMP Books (1999)
15. Rowe, R.: Machine Musicianship. MIT Press (2004)
16. Souvignier, T.: Loops and Grooves: The Musician's Guide to Groove Machines and Loop Sequencers. Hal Leonard Publisher (2002)
17. Wei, L., and Levoy, M.: Fast Texture Synthesis Using Tree-Structured Vector Quantization. Proceedings of ACM SIGGRAPH (2000) 479–488
18. Music Texture Synthesis Research Page: http://visualcomputing.yonsei.ac.kr/research/musictexture.html