NetNeg: A Hybrid System Architecture for Composing Polyphonic Music

Claudia V. Goldman, Dan Gang, and Jeffrey S. Rosenschein
Computer Science Department, Hebrew University, Givat Ram, Jerusalem, Israel
ph: 011-972-2-658-5353, fax: 011-972-2-658-5439
email: [email protected], [email protected], jeff@cs.huji.ac.il

Abstract

There are musical activities in which we are faced with both symbolic and sub-symbolic processes. This research focuses on whether integrating a neural network with a distributed artificial intelligence approach offers advantages in the music domain. In this work, we present a new approach for composing and analyzing polyphonic music. As a case study, we began experimenting with first-species two-part counterpoint melodies. Our system design is inspired by the cognitive process of a human musician. We have developed a hybrid system composed of a connectionist module and an agent-based module, combining the symbolic and sub-symbolic levels to achieve this task. The network produces aesthetic melodies based on the training examples it was given. The agents choose the next notes in the two-part melody by negotiating over the possible combinations of notes suggested by the network.
Keywords: Music, Distributed Artificial Intelligence, Neural Networks, Learning, Negotiation.

1 Introduction

The interaction between music and Artificial Intelligence (AI) can be seen in many areas, such as musical signal processing, MIDI software, and real-time interaction between a performer and a machine. In AI and Neural Networks (NN), researchers explore musical tasks such as composing, listening, analyzing, and performing. Research in AI and NN can itself benefit from results in music research; in music, for example, notions like time and hierarchical structure are inherent to the domain. Therefore, music is an appropriate experimental domain for the formal specification of AI theories and their verification. Whenever we are involved in any musical activity (e.g., composing or listening), we are faced with symbolic and sub-symbolic processes. While listening, our aesthetic judgment is not necessarily achieved by following explicit

rules. Applying a learning mechanism has been shown to be a convenient (and appropriate) way to avoid the explicit formulation of rules. Nevertheless, some of these aesthetic processes might have a symbolic representation and might be specified by a rule-based system. In many musical styles, the composer needs to create different sequences of notes (i.e., melody lines) that will be played simultaneously. Each sequence should follow some aesthetic criterion, and in addition the sequences should sound appropriate when combined. This overall composition is the result of many interactions among its components. The musician achieves his overall result by compromising between the perfection of a single component and the combination of sequences as a whole. Thus, in this activity there is a constant tradeoff between the quality of a single sequence and the quality of the combined group of sequences.

In this work, we are interested in aesthetic melodies produced by neural networks, while at the same time investigating how Distributed Artificial Intelligence (DAI) methods can be applied to resolve the aforementioned tradeoff. The case study we have chosen to experiment with is the polyphonic vocal style of the sixteenth century; more specifically, we start with an investigation of first-species two-part counterpoint. We are currently implementing a neural network that learns how to produce each part of these melodies. We then regard the different components of the melodies as agents (in the DAI sense). These agents negotiate with one another to maximize the global utility of the whole system, which for our purposes should be interpreted as the global quality of the composition.

The main purpose of this work is to build a hybrid system architecture for composing polyphonic music. Composing such music is a human activity requiring intelligence, and thus an appropriate problem for Artificial Intelligence research. This system is also useful as an application for composition and analysis. We have decided to begin experimenting with two-part counterpoint melodies for two reasons. First, this is one way that humans acquire the skills to begin writing polyphonic music. Second, this is a relatively simple problem, and thus it is easier to train the network and verify its results. Nevertheless, our current system can serve as a basis for further investigation of more complex musical problems.

2 The System Architecture

The focus of this work is on developing a system that is able to produce new two-part counterpoint melodies. We combine the symbolic and sub-symbolic levels to achieve this task. The overall design of our system reflects the cognitive process that a musician undergoes in producing such compositions.

2.1 The Human Musician's Approach

Usually, one part of the melody (the cantus firmus) is given, and an upper or lower part is added to it. Both parts consist of whole notes. The issues to be considered concern the beauty of the melody itself and the beauty of the combination of both parts. The second part should have the same aesthetic characteristics as the cantus firmus. For example, the highest note should appear only once, every skip should be compensated by a movement in the opposite direction, and the melody should be flexible and non-redundant. Both parts should follow strict rules about dissonant and consonant combinations. In order to achieve this, the musician is involved in a cognitive process similar to a negotiation process. He has to compromise between the melodies' notes by choosing, from among the permitted notes, those that are preferable.

2.2 The Computational Approach

In our system, we create both counterpoint parts dynamically. Our program, NetNeg, is composed of two main sub-systems. One sub-system is implemented by a modified version of Jordan's sequential neural network [Jordan, 1986]. The second is a two-agent model based on Distributed Artificial Intelligence. See Figure 1.

Figure 1: The NetNeg Architecture (the plan units and state units feed the network; its output units 0-18 are mapped to 13 values, which are passed to Agent1 and Agent2, who agree on pairs of notes such as (Do,Re) or (La,Fa))

The Connectionist Subsystem

Each part of the melody is produced independently by a neural network. Todd [Todd, 1989] has already suggested a sequential neural network that can learn a sequence of melody notes. Our neural network is currently based on the same idea. The state units in the input layer and the units in the output layer represent the pitch and the contour. The state units represent the context of the melody, which is composed of the notes produced so far. The output unit activation vector represents the distribution of the

predictions for the next note in the melody for the given current context. The role of the plan units is to label different sequences of notes. In the generalization phase, we can interpolate and extrapolate the values of the plan units so as to yield new melodies. The actual input to the network is given in the plan units. At each step, the net is fed with the variable values in the state units together with the original plan unit values. These values cause the next element in the sequence to appear in the output layer, and that output is propagated as feedback into the state units (these connections do not appear in Figure 1). The current values of the state units are the previous values multiplied by a decay parameter plus the current output values.
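To make this update concrete, here is a minimal sketch in Python (with NumPy) of the state-unit feedback just described. The function names and the decay value are illustrative assumptions; the actual network was built with PlaNet, so this is only a sketch of the mechanism, not the implementation.

```python
import numpy as np

def update_state(state, output, decay=0.7):
    # Jordan-style context: the decayed previous state plus the current
    # output activations fed back into the state units.
    return decay * state + output

def step(net_forward, state, plan):
    # One sequential step: the net sees the plan units together with the
    # current state units and produces the next-note distribution, which
    # is then folded back into the state.
    output = net_forward(np.concatenate([plan, state]))
    return output, update_state(state, output)
```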

The DAI-Based Subsystem

We consider each part (i.e., voice) of the melody as an agent. Each agent's goal is to compose his melody by choosing the right notes. On the one hand, he has to act according to the aesthetic criteria that exist for his voice; at the same time, he has to compose the voice in a manner compatible with the other voice-agent, such that both together result in a two-part counterpoint. At every time unit in our simulations, each agent receives from the network a vector of activations for all the notes among which he can choose. Were the agent alone in the system, he would choose the note that got the highest activation from the neural network, meaning that this note is the one most expected to be next in the melody. But in order to compose first-species counterpoint, both agents have to negotiate [Rosenschein and Zlotkin, 1994] over all the other possible combinations to obtain a globally superior result. In principle, each agent can suggest any of the n possible notes received from the network, leading to n^2 possible different combinations. Not all of these combinations are legal according to the rules of the species. In addition, there are specific combinations that are preferred over others in the current context. This idea is expressed in this module by computing a utility function for each pair of notes. In this sense, the goal of the agents is to agree on a pair of notes that is legal and achieves the maximal utility value among all the options.
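As a rough sketch of this negotiation, assuming hypothetical `legal` and `utility` predicates over note pairs (the concrete versions are described in Section 3.2):

```python
def negotiate(notes1, notes2, legal, utility):
    # Each agent proposes every note it received from the network; the
    # agents keep the legal pair with the highest system utility.
    # Returns None when no legal combination exists.
    best_pair, best_value = None, float("-inf")
    for t1 in notes1:
        for t2 in notes2:
            if legal(t1, t2):
                value = utility(t1, t2)
                if value > best_value:
                    best_pair, best_value = (t1, t2), value
    return best_pair
```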

3 Implementation

3.1 The Connectionist Module

We have implemented the connectionist module in PlaNet [Miyata, 1991]. We built a three-layer sequential net that learns series of notes. Each series is a one-part melody (i.e., a cantus firmus). Each sequence of notes is labeled by a vector of plan units. The net is a version of a feedforward backpropagation net with feedback loops from the output layer to the state units (in the input layer). In the generalization phase, the net is able to produce new cantus firmi by extrapolating and interpolating the values of the plan units. The state units and the output layer can represent the notes in different ways. In this implementation, we choose to represent the notes as a vector of 19 units.

The first eight units encode the pitch. The next nine units represent the intervals between the notes. The last two units describe whether the movement of the melody is ascending or descending (we will refer to these units as the movement units). The interval and movement units describe the contour of the melody. Together with the pitch units, they complement each other and result in a significant representation. For example, if the current tone is DO(C), the net predicts both RE(D) and FA(F) as the next best tones, and the ascending movement unit is on, then the interval can help us decide which tone to choose (i.e., one tone or two and a half tones). If, after the net has chosen the tone SOL(G), it predicts LA(A) or FA(F) and the interval is of one tone, then we can choose whether to descend or ascend based on the activations of the movement units.

In the current state of our work, we have focused on two aspects. On the one hand, since we have trained the net with one-part melodies, the range of the notes is very limited; therefore, we needed to formulate rules in order to locate the second part relative to the first one. On the other hand, we are interested in two-part melodies, in which the interval between any pair of tones might be up to a tenth. In order to extract more information out of the output unit activations, we want to combine the pitch activations with the interval activations. We have mapped the activations of the output units into a vector of thirteen activations corresponding to the notes in more than an octave and a half. This mapping is based on the last tone of the new melody built so far. If the activation of the ascending movement unit is higher than the activation of the descending movement unit, then we change each component in the 13-length vector as follows: first, we copy each pitch activation to the corresponding pitch component in the 13-length vector. Then, we find the interval between the previously chosen note and the note we are updating. Finally, we add the activation of this interval unit to the corresponding pitch component in the 13-length vector. We proceed similarly when the descending movement activation is higher than the ascending one.

Each agent receives the 13-length vector and influences the context with the agreed-upon note (see Section 3.2). The network then predicts another output vector given this new context and the initial values of the plan units. This process continues sequentially until the melodies are completed. It will be interesting in the future to investigate a modified model, in which the agent-based component of the system influences the training phase. We could then take into consideration other constraints that the agents could suggest, based on their rules, to enrich the learning abilities of the network.
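The following sketch illustrates the mapping just described, under the assumption that the 19-unit output is laid out as 8 pitch units, 9 interval units (measured in scale steps), and 2 movement units, and that tones are indexed as scale degrees within the 13-value range. The exact index bookkeeping is our own assumption; the text only states that pitch and interval activations are combined per candidate tone.

```python
import numpy as np

def map_to_13(output, last_degree):
    # output: 19 activations -> 8 pitch, 9 interval, 2 movement units.
    # last_degree: index (0-12) of the last tone chosen for this part.
    pitch, interval, movement = output[:8], output[8:17], output[17:]
    ascending = movement[0] > movement[1]   # which movement unit won
    mapped = np.zeros(13)
    for degree in range(13):
        # copy the pitch activation of the corresponding scale degree
        # (degrees above the octave reuse the same pitch unit; an assumption)
        mapped[degree] = pitch[degree % 8]
        # interval, in scale steps, from the previously chosen tone
        steps = degree - last_degree if ascending else last_degree - degree
        if 0 <= steps < len(interval):
            mapped[degree] += interval[steps]
    return mapped
```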

3.2 The Agent-based Module

We have implemented the agent module using the MICE testbed [Montgomery et al., 1992]. Both agents are involved in a dynamic process in order to create their corresponding melodies. At each time unit, each agent gets a vector of 13 activation values corresponding to the activations of the tones in more than an octave and a half.

This vector is the result of a mapping applied to the output vector of the network, as explained in Section 3.1. The agents then start a negotiation process, at the end of which a new note is added to each of the current melodies. Each agent sends the other all of its notes, one at a time, and saves the pair, consisting of its own note and the other agent's note, that is legal according to the first-species style rules and that has yielded the maximal utility so far. At the end of this process, the pair that has achieved the maximal utility is chosen. Both agents feed the network with this result as the current context so that the network can predict the next output. Each agent then receives a new input based on this output, and the negotiation step is repeated until the melody is completed.

We define the system utility function to express the rules given by Jeppesen in [Jeppesen, 1992]. A pair of notes is considered legal according to the following rules:

1. The intervals between pairs of notes in the two-part melodies should not be dissonant.
2. There should be perfect consonance in the first and last place of the melody.
3. Unison is permitted only in the first or last place of the melody.
4. Hidden and parallel fifths and octaves are not permitted.
5. The difference between the previous and the current interval (when it is a fifth or an octave) should be two.
6. The interval between both tones cannot be greater than a tenth.
7. At most four thirds or sixths are allowed.
8. If both parts skip in the same direction, neither of them may skip more than a fourth.
9. In each part, the new tone is different from the previous one.
10. We prefer not to have more than two perfect consonances in the two-part counterpoint, not including the first and last notes.

The utility function values are determined according to whether the pairs of notes are legal or illegal, and whether they are more or less preferred. More formally, we define the utility function as follows:

SysUtility(T_1^t, T_2^t) = [act(T_1^t) \cdot act(T_2^t) + cm(T_1^t, T_2^t)] \cdot match(T_1^t, T_2^t)

where T_i^t, i \in {1, 2}, is the tone proposed by agent i at time t, and act(T_i^t) is the activation of tone T_i^t. The term cm characterizes the motion between the previous pair of tones and the current pair of tones. Denote by cmCond the following condition, which is true when there is contrary motion:

cmCond(T_1^t, T_2^t) = \neg( ((T_1^t < T_1^{t-1}) \wedge (T_2^t < T_2^{t-1})) \vee ((T_1^t > T_1^{t-1}) \wedge (T_2^t > T_2^{t-1})) )

Then we define

cm(T_1^t, T_2^t) = 1 / |interval(T_1^{t-1}, T_2^{t-1}) - interval(T_1^t, T_2^t)| if cmCond(T_1^t, T_2^t) holds, and 0 otherwise;

match(T_1^t, T_2^t) = 1 if (T_1^t, T_2^t) is legal with respect to the above rules, and 0 otherwise.

Notice that the cm values are larger whenever the movement steps are smaller. We have considered contrary motion in this function because this type of motion produces the most natural and appropriate effect for this kind of music (as noted by Jeppesen [Jeppesen, 1992]). The utility function is heuristic. An interesting point to notice is that taking the contrary-motion term into account enables the system to produce contrary motion even though the network might have suggested movement in the same direction.
4 Experiments

4.1 Running the Net

In the training phase, the network learned to reproduce four melodies taken from [Jeppesen, 1992]. See Figure 2.

Figure 2: The learning examples

We have tested the performance of the network with different numbers of hidden units. The results we present here are for a net with 15 units in the hidden layer. In the generalization phase, given a specific vector of plan units, the network predicts a vector of activations in the output layer. This vector can be interpreted as the distribution of expectations for the next note in the melody, taking its contour into consideration. We have tested the net with different plan-unit vectors to create many new melodies (cantus firmi). For example, when the plan-unit values were (0.8 0 0.8 0), the net produced melodies that are, in a certain way, closer to the first and third training examples than to the others.

A snapshot of the generalization phase appears in Figure 3. This snapshot was taken at the third step of the generalization phase for the plan vector (0.8 0 0.8 0), shown in the bottom right window. The previous two notes were RE(D) and LA(A), which can be seen in the state units in the bottom left window. The upper left window shows the pitch activations: the notes most expected to be next in the melody are SOL(G) and SI(B). The upper right window shows the interval activations, where the one-tone interval clearly received the highest value. The movement activations are shown in the middle right window, where descending movement is preferred. Since in both cases (i.e., LA-SOL and LA-SI) there is a one-tone interval, we can use the movement unit activations to decide that SOL(G) is the winning note.

Figure 3: A simulation running snapshot

4.2 Running NetNeg

We have chosen two different plan vectors for the net, which outputs the notes for each agent. We run the net, each time with the corresponding plan vector, and map its outputs to two different thirteen-value activation vectors. Then we run the DAI-based module with these inputs. The agents negotiate over the different pairs of possible combinations, computing the system utility for each. Finally, the agents agree upon a legal pair of notes that has yielded the maximal utility, or they may decide that no combination is legal given the previous notes in the melodies. In our current case, the nets are fed with the agents' agreement and the system continues to run. This process is executed until the two-part melodies are completed. Currently the length of the melodies is fixed. A melody that resulted from an experiment we performed with two different plan vectors ((0.8 0 0.8 0) and (0 1 0 1)) is given in Figure 4.

Figure 4: A new two-part melody (parts A1 and A2, annotated at points (1)-(4))

From Figure 4, we can observe that the system gives aesthetic results, quite appropriate for the species style we have experimented with. Both melodies are consistent with the interval constraints. Nevertheless, there is a contour problem, as pointed out at (1) and (2) in A1's melody in Figure 4. According to Jeppesen [Jeppesen, 1992], it is preferred to descend by a step and then perform a descending skip. After a descending skip, we expect a compensating ascending movement. At (3), we would prefer to approach the last note by a step. A2's melody is perfect with regard to contour; there is a single climax, as shown at (4).
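To tie the modules together, here is a minimal sketch of the overall generate-map-negotiate-feedback loop described above, reusing the hypothetical helpers from the earlier sketches (`update_state`, `map_to_13`, `negotiate`, `sys_utility`). The `encode` argument is an assumed helper that turns a chosen scale degree back into the 19-unit note representation; in the real system the two modules communicate through files, and the melody length is fixed.

```python
import numpy as np

def compose(net1, net2, plan1, plan2, encode, is_legal, length=11):
    # Generate a two-part melody of fixed length.
    state1, state2 = np.zeros(19), np.zeros(19)
    melody1, melody2 = [], []
    last1 = last2 = 6                       # starting degree (arbitrary assumption)
    for _ in range(length):
        out1 = net1(np.concatenate([plan1, state1]))   # agent 1's 19-unit output
        out2 = net2(np.concatenate([plan2, state2]))   # agent 2's 19-unit output
        cand1, cand2 = map_to_13(out1, last1), map_to_13(out2, last2)
        pair = negotiate(
            range(13), range(13), is_legal,
            lambda a, b: sys_utility(a, b, cand1[a], cand2[b], last1, last2, is_legal))
        if pair is None:                    # no legal combination for this context
            break
        last1, last2 = pair
        melody1.append(last1)
        melody2.append(last2)
        # the agreed tones (not the raw distributions) become the new context
        state1 = update_state(state1, encode(last1))
        state2 = update_state(state2, encode(last2))
    return melody1, melody2
```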

5 Summary and Future Work

We have presented a novel computational approach for composing two-part counterpoint melodies. Our system design has been inspired by the cognitive process of a human musician, and the hybrid system is intended to broadly reflect that process. We start by building a system composed of a connectionist module and an agent-based module. The agents decide upon the notes in both melodies. The neural network predicts the expectation for every note to be the next one in the melody. The agents then negotiate over the possible combinations of notes until they reach an agreement on notes that, added to the melodies, will most greatly "benefit" the system.

Our system is still under development. The current state of the system consists of a sequential neural network that is able to produce Dorian melodies. The agents read their input, based on the network's output, from a file and write their result (the notes that have been agreed upon) to another file that comprises the input data for the network. We are developing more natural ways to combine both modules. In their negotiation, the agents follow the rules taken from Jeppesen [Jeppesen, 1992]. The system utility function considers the activations of the current notes and whether the pair is legal, and it prefers contrary motion. We presented preliminary results of the system, namely the two-part counterpoint melodies produced by the program.

Other issues to be further investigated include the other species (second, third, and fourth species). We are also interested in dealing with counterpoint in the style of Bach, and with polyphonic music in a more flexible and more abstract style. In the future we will examine how other representations and negotiation protocols can influence the performance of the system.

References

[Jeppesen, 1992] K. Jeppesen. Counterpoint: The Polyphonic Vocal Style of the Sixteenth Century. Dover, New York, 1992.

[Jordan, 1986] M. I. Jordan. Attractor dynamics and parallelism in a connectionist sequential machine. In Proceedings of the Eighth Annual Conference of the Cognitive Science Society, Hillsdale, N.J., 1986.

[Miyata, 1991] Yoshiro Miyata. A User's Guide to PlaNet Version 5.6. Computer Science Department, University of Colorado, Boulder, 1991.

[Montgomery et al., 1992] T. A. Montgomery, J. Lee, D. J. Musliner, E. H. Durfee, D. Damouth, and Y. So. MICE Users Guide. Artificial Intelligence Laboratory, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan, February 1992.

[Rosenschein and Zlotkin, 1994] Jeffrey S. Rosenschein and Gilad Zlotkin. Rules of Encounter: Designing Conventions for Automated Negotiation Among Computers. MIT Press, Cambridge, Massachusetts, 1994.

[Todd, 1989] P. M. Todd. A connectionist approach to algorithmic composition. Computer Music Journal, 13:27-43, 1989.