Multiple Functional Neural Fuzzy Networks Fusion Using Fuzzy Integral

2009 17th Conference on Fuzzy Theory and Its Applications, National University of Kaohsiung, Kaohsiung, Taiwan

Siao-Yin Wu 1, Chia-Hu Hsu 2, Cheng-Jian Lin 1, De-Yu Wang 2

1 Department of Computer Science and Information Engineering, National Chin-Yi University of Technology, Taichung 411, Taiwan, R.O.C.
E-mail: [email protected], [email protected]

2 Department of Computer Science and Information Engineering, Chaoyang University of Technology, Taichung 413, Taiwan, R.O.C.
E-mail: [email protected], [email protected]

Chinese Abstract: This paper proposes a method for fusing multiple functional neural fuzzy networks with the fuzzy integral. Combining multiple classifiers lets the classifiers complement one another and overcome the limitations of a single classifier. The proposed method, combined with the fuzzy integral, achieves higher accuracy than existing traditional methods. The advantage of this method is that it not only combines the classification results but also takes the relationships among the networks into account. Simulation results demonstrate that the method classifies more accurately than existing traditional methods.
Keywords: functional neural fuzzy network, fuzzy integral, fuzzy measures, on-line learning

Abstract: This paper presents the fusion of multiple functional neural fuzzy networks (FNFNs) using the fuzzy integral. The fusion of multiple classifiers can overcome the limitations of a single classifier, since the classifiers complement each other. A combination of multiple FNFN classifiers with the fuzzy integral (FI) is proposed to achieve data classification with higher accuracy than existing traditional methods. The advantage of the proposed method is that not only are the classification results combined but the relative importance of the different networks is also considered. Simulation results show that the fusion of multiple FNFNs using the fuzzy integral can perform better than existing traditional methods.
Keywords: functional neural fuzzy network, classification, fuzzy integral, fuzzy measures, on-line learning

1. Introduction

In the proposed model, we use the functional link neural network (FLNN) [7] as the consequent part of the fuzzy rules; the resulting model is called a functional neural fuzzy network (FNFN). The FNFN model has been demonstrated to be effective in the literature [1] and combines the semantic transparency of rule-based fuzzy systems with the learning capability of neural networks.

There are several classifier combination algorithms, such as majority vote [10], product, maximum, and fuzzy integral, and many combination methods based on fuzzy sets have been developed [11]. The fuzzy integral is a nonlinear function that is defined with respect to a fuzzy measure, especially the $g_\lambda$-fuzzy measure introduced by Sugeno [12], [14]. By combining the outputs of a team of classifiers, we aim at a more accurate decision than one made by the single best member of the team. The ability of the fuzzy integral to combine information from multiple sources has been established in several previous works [2], [9]. However, treating combined classifiers as a branch of statistical pattern recognition sometimes brings about an unwelcome attitude toward using fuzzy combiners. The fuzzy integral considers in its decision-making process both the objective evidence supplied by each information source and the expected worth of each subset of information sources.

The purpose of this paper is to present a fuzzy integral fusion method that takes into account the differing performance of each functional neural fuzzy network in the combined ensemble, which a simple average or simple vote clearly cannot do. We also demonstrate the superior performance of the proposed method by comparing it with conventional averaging and voting methods through experiments.

2. The Functional Neural Fuzzy Network


This section describes the FNFN model, which uses a nonlinear combination of input variables (FLNN). Each fuzzy rule corresponds to a sub-FLNN comprising a functional link. Fig. 1 presents the structure of the proposed FNFN model. The FNFN model realizes a fuzzy IF-THEN rule in the following form:

Rule $j$: IF $x_1$ is $A_{1j}$ and $x_2$ is $A_{2j}$ ... and $x_i$ is $A_{ij}$ ... and $x_N$ is $A_{Nj}$
$$\text{THEN } \hat{y}_j = \sum_{k=1}^{M} w_{kj}\phi_k = w_{1j}\phi_1 + w_{2j}\phi_2 + \dots + w_{Mj}\phi_M \qquad (1)$$
where $x_i$ and $\hat{y}_j$ are the input and local output variables, respectively; $A_{ij}$ is the linguistic term of the precondition part with a Gaussian membership function; $N$ is the number of input variables; $w_{kj}$ is the link weight of the local output; $\phi_k$ is the basis trigonometric function of the input variables; $M$ is the number of basis functions; and rule $j$ is the $j$th fuzzy rule.

Fig. 1. Structure of an FNFN.

The operation functions of the nodes in each layer of the FNFN model are now described. In the following description, $u_i^{(l)}$ denotes the output of a node in the $l$th layer.

No computation is performed in layer 1. Each node in this layer only transmits input values to the next layer directly:
$$u_i^{(1)} = x_i \qquad (2)$$

Each fuzzy set $A_{ij}$ is described here by a Gaussian membership function. Therefore, the calculated membership value in layer 2 is
$$u_{ij}^{(2)} = \exp\left(-\frac{\left[u_i^{(1)} - m_{ij}\right]^2}{\sigma_{ij}^2}\right) \qquad (3)$$
where $m_{ij}$ and $\sigma_{ij}$ are the mean and variance of the Gaussian membership function, respectively, of the $j$th term of the $i$th input variable $x_i$.

Nodes in layer 3 receive one-dimensional membership degrees of the associated rule from the nodes of a set in layer 2. Here, the product operator described earlier is adopted to perform the precondition part of the fuzzy rules. As a result, the output function of each inference node is
$$u_j^{(3)} = \prod_i u_{ij}^{(2)} \qquad (4)$$
where $\prod_i u_{ij}^{(2)}$ of a rule node represents the firing strength of its corresponding rule.

Nodes in layer 4 are called consequent nodes. The input to a node in layer 4 is the output from layer 3, and the other inputs are calculated from the FLNN. For such a node,
$$u_j^{(4)} = u_j^{(3)} \cdot \sum_{k=1}^{M} w_{kj}\phi_k \qquad (5)$$
where $w_{kj}$ is the corresponding link weight of the FLNN and $\phi_k$ is the functional expansion of the input variables. The functional expansion uses a trigonometric polynomial basis function, given by $[x_1, \sin(\pi x_1), \cos(\pi x_1), x_2, \sin(\pi x_2), \cos(\pi x_2)]$ for two-dimensional input variables. Therefore, $M$ is the number of basis functions, $M = 3 \times N$, where $N$ is the number of input variables. Moreover, the output nodes of the FLNN depend on the number of fuzzy rules of the FNFN model.

The output node in layer 5 integrates all of the actions recommended by layers 3 and 4 and acts as a defuzzifier with
$$y = u^{(5)} = \frac{\sum_{j=1}^{R} u_j^{(4)}}{\sum_{j=1}^{R} u_j^{(3)}} = \frac{\sum_{j=1}^{R} u_j^{(3)} \left(\sum_{k=1}^{M} w_{kj}\phi_k\right)}{\sum_{j=1}^{R} u_j^{(3)}} = \frac{\sum_{j=1}^{R} u_j^{(3)} \, \hat{y}_j}{\sum_{j=1}^{R} u_j^{(3)}} \qquad (6)$$
where $R$ is the number of fuzzy rules and $y$ is the output of the FNFN model.
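To make the layer equations concrete, the following Python sketch implements one forward pass through Eqs. (2)-(6). It is a minimal illustration under our own naming and array shapes, not the authors' implementation; the toy dimensions at the end are arbitrary.

```python
import numpy as np

def functional_expansion(x):
    """Trigonometric basis of the FLNN: [x_i, sin(pi*x_i), cos(pi*x_i)] per
    input, giving M = 3 * N basis functions phi_k."""
    return np.concatenate([[xi, np.sin(np.pi * xi), np.cos(np.pi * xi)] for xi in x])

def fnfn_forward(x, m, sigma, w):
    """One forward pass of the FNFN.

    x        : (N,)   input vector (layer 1, Eq. (2))
    m, sigma : (N, R) Gaussian means / spreads (layer 2, Eq. (3))
    w        : (M, R) FLNN link weights, M = 3 * N (layer 4, Eq. (5))
    """
    mu = np.exp(-((x[:, None] - m) ** 2) / sigma ** 2)  # layer 2, Eq. (3)
    firing = mu.prod(axis=0)                            # layer 3, Eq. (4): rule firing strengths
    phi = functional_expansion(x)                       # basis functions phi_k
    local = phi @ w                                     # local outputs: sum_k w_kj * phi_k, Eq. (1)
    return (firing * local).sum() / firing.sum()        # layer 5 defuzzification, Eq. (6)

# Toy example: N = 2 inputs, R = 3 rules, M = 6 basis functions.
rng = np.random.default_rng(0)
y = fnfn_forward(np.array([0.3, 0.7]), rng.random((2, 3)),
                 np.full((2, 3), 0.5), rng.standard_normal((6, 3)))
```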

3. Learning Algorithms of the FNFN Model

This section presents an online learning algorithm for constructing the FNFN model. The proposed learning algorithm comprises a structure learning phase and a parameter learning phase. Fig. 2 presents a flow diagram of the learning scheme for the FNFN model. Structure learning is based on an entropy measure used to determine whether a new rule should be added to satisfy the fuzzy partitioning of the input variables. Parameter learning is based on supervised learning algorithms: the backpropagation algorithm minimizes a given cost function by adjusting the link weights in the consequent part and the parameters of the membership functions. Initially, there are no nodes in the network except the input-output nodes. Nodes are created automatically as learning proceeds, upon the reception of online incoming training data in the structure and parameter learning processes. The rest of this section details the structure learning phase and the parameter learning phase.

Fig. 2. Flow diagram of the structure/parameter learning for the FNFN (the rule-generation test in the diagram checks whether $EM_{\max} < EM$).

3.1. Structure Learning Phase

The first step in structure learning is to determine whether a new rule should be extracted from the training data and to determine the number of fuzzy sets in the universe of discourse of each input variable, since one cluster in the input space corresponds to one potential fuzzy logic rule, in which $m_{ij}$ and $\sigma_{ij}$ represent the mean and variance of that cluster, respectively. For each incoming pattern $x_i$, the rule firing strength can be regarded as the degree to which the incoming pattern belongs to the corresponding cluster. The entropy measure between each data point and each membership function is calculated based on a similarity measure: a data point close to a membership function's mean has lower entropy. Therefore, the entropy values between the incoming data point and the current membership functions are calculated to determine whether or not to add a new rule.
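The section names the criterion but not its formula, so the sketch below substitutes the closely related maximum-firing-strength test (open a new rule when no existing rule covers the incoming pattern well); the paper's entropy measure plays the same gating role, and the threshold value here is only a placeholder echoing the range used in Section 5.

```python
import numpy as np

def rule_firing(x, m, sigma):
    """Firing strength of each existing rule for pattern x (Eqs. (3)-(4))."""
    return np.exp(-((x[:, None] - m) ** 2) / sigma ** 2).prod(axis=0)

def maybe_add_rule(x, m, sigma, threshold=0.26, sigma_init=0.5):
    """Structure-learning step (a stand-in for the paper's entropy test):
    if the pattern is poorly covered by every existing cluster/rule,
    create a new rule whose Gaussian mean is the pattern itself."""
    if m.shape[1] == 0 or rule_firing(x, m, sigma).max() < threshold:
        m = np.column_stack([m, x])                          # new mean = data point
        sigma = np.column_stack([sigma, np.full(x.shape, sigma_init)])
    return m, sigma

# Starting from an empty rule base, rules are created online as data arrive.
m, sigma = np.empty((2, 0)), np.empty((2, 0))
for pattern in np.random.default_rng(1).random((10, 2)):
    m, sigma = maybe_add_rule(pattern, m, sigma)
```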

3.2. Parameter Learning Phase

After the network structure has been adjusted according to the current training data, the network enters the parameter learning phase to adjust the parameters of the network optimally based on the same training data. The learning process involves determining the minimum of a given cost function: the gradient of the cost function is computed, and the parameters are adjusted along the negative gradient. The backpropagation algorithm is adopted for this supervised learning method. When the single-output case is considered for clarity, the goal is to minimize the cost function
$$E(t) = \frac{1}{2}\left[y(t) - y^d(t)\right]^2 = \frac{1}{2}e^2(t) \qquad (12)$$
where $y^d(t)$ is the desired output and $y(t)$ is the model output for each discrete time $t$. In each training cycle, starting at the input variables, a forward pass is adopted to calculate the activity of the model output $y(t)$.
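As a sketch of this phase, the step below descends the gradient of Eq. (12); it reuses fnfn_forward from the Section 2 sketch and takes the gradients numerically for brevity, whereas the paper derives them analytically by backpropagation. The learning rate eta mirrors the η of Section 5; everything else is illustrative.

```python
import numpy as np

def cost(y, y_d):
    """Eq. (12): E = (1/2) * (y - y_d)^2."""
    return 0.5 * (y - y_d) ** 2

def train_step(x, y_d, m, sigma, w, eta=0.01, eps=1e-6):
    """Adjust means, spreads and FLNN weights along the negative gradient of E.
    Central finite differences stand in for the analytic backward pass."""
    grads = []
    for p in (m, sigma, w):
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            old = p[idx]
            p[idx] = old + eps
            e_plus = cost(fnfn_forward(x, m, sigma, w), y_d)
            p[idx] = old - eps
            e_minus = cost(fnfn_forward(x, m, sigma, w), y_d)
            p[idx] = old
            g[idx] = (e_plus - e_minus) / (2 * eps)
        grads.append(g)
    for p, g in zip((m, sigma, w), grads):
        p -= eta * g                      # negative-gradient update
    return cost(fnfn_forward(x, m, sigma, w), y_d)
```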


4. Fusion of Multiple FNFNs Using FI

In this section, we introduce some definitions related to the fuzzy integral and present an effective method for combining the outputs of multiple FNFNs with regard to the subjectively defined importance of the individual FNFNs.

4.1. Fuzzy Integral

The fuzzy integral (FI) has been applied to classifier combination in a number of contexts [3], [5], [15]. For the sake of completeness, some basic definitions and properties of the FI are discussed below; for a full description of the FI and related fuzzy measures, the reader is referred to [6]. The fuzzy measure $g$ is evaluated from the parameter $\lambda$ and the so-called fuzzy density function $g^i$, which, for the collective decision problem, can be interpreted as the degree of importance of the source/classifier $x_i$, $x_i \in A$, toward the final evaluation. The relationship between the fuzzy measure $g$ and the density function $g^i$ is given by
$$g(A) = \frac{\prod_{x_i \in A}\left(1 + \lambda g^i\right) - 1}{\lambda} \qquad (13)$$
The value of $\lambda$ can be evaluated by solving the equation
$$\lambda + 1 = \prod_{i=1}^{n}\left(1 + \lambda g^i\right) \qquad (14)$$
where $\lambda \in (-1, +\infty)$ and $\lambda \neq 0$. Eq. (14) can be solved by expanding the $(n-1)$st-degree polynomial and finding the unique root greater than $-1$.

The fuzzy integral is a nonlinear function that is defined with respect to the fuzzy measure $g$ described above and can be defined as follows. Let $X$ be a finite set and let $h: X \to [0, 1]$ be a function on $X$. Then the FI over $X$ of the function $h$ with respect to the fuzzy measure $g$ is
$$h(x) \circ g(\cdot) = \max_{E \subseteq X}\left[\min\left(\min_{x \in E} h(x),\; g(E)\right)\right] \qquad (15)$$
Suppose that a class is evaluated with a set of classifiers $X$. Let $h(x) \in [0, 1]$ denote the decision for the class when $x \in X$ is considered, and let $g(x)$ denote the degree of importance of this source. Consider that the class is evaluated with sources from $A \subseteq X$. Then $\min_{x \in A} h(x)$ can be considered the most conservative (secure) decision the class provides, and $g(A)$ denotes the grade of importance of the subset of sources. The min operator in Eq. (15) is interpreted as the grade of agreement between the real possibilities $h(x)$ and the expectations $g$. Hence, the fuzzy integral is basically a search for the maximal grade of agreement between the objective evidence and the expectation.

If $\{A_i \mid i = 1, \dots, n\}$ is a partition of the set $X$, then the FI reduces to
$$h(x) \circ g(x) \geq \max(e_1, e_2, \dots, e_n) \qquad (16)$$
where $e_i$ is the FI of $h$ with respect to $g$ over $A_i$. The calculation of the FI can be described as follows: let $X = \{x_1, x_2, \dots, x_n\}$ and let $h: X \to [0, 1]$ be a function. The fuzzy integral $e$ can be computed by
$$e = \max_{i=1}^{n}\left[\min\left(h(x_i),\; g(A_i)\right)\right] \qquad (17)$$
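Eqs. (13), (14) and (17) translate directly into code. Below is a minimal sketch under our own naming: λ is found by bisection (any root finder works, since Eq. (14) has a single root above -1 besides the trivial one at 0), and the integral is evaluated with the usual sorted-evidence recursion for the gλ-measure. Two or more sources with densities in (0, 1) are assumed.

```python
import numpy as np

def solve_lambda(densities, tol=1e-10):
    """Root lambda > -1, lambda != 0, of Eq. (14).  If the densities sum to 1
    the measure is additive and lambda = 0; if they sum to more than 1 the
    root lies in (-1, 0), otherwise in (0, inf).  Assumes >= 2 densities."""
    g = np.asarray(densities, dtype=float)
    if abs(g.sum() - 1.0) < tol:
        return 0.0
    f = lambda lam: np.prod(1.0 + lam * g) - lam - 1.0
    if g.sum() > 1.0:
        lo, hi = -1.0 + tol, -tol        # f(lo) > 0, f(hi) < 0
    else:
        lo, hi = tol, 1.0                # f(lo) < 0; grow hi until f(hi) >= 0
        while f(hi) < 0.0:
            hi *= 2.0
    while hi - lo > tol:                 # bisection on the bracketed root
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def sugeno_integral(h, densities):
    """Eq. (17): sort the sources so that h(x_1) >= h(x_2) >= ..., build the
    measures g(A_i) of the nested subsets with the g-lambda recursion implied
    by Eq. (13), and return max_i min(h(x_i), g(A_i))."""
    lam = solve_lambda(densities)
    order = np.argsort(h)[::-1]
    h_sorted = np.asarray(h, dtype=float)[order]
    g_sorted = np.asarray(densities, dtype=float)[order]
    g_prev = g_sorted[0]                             # g(A_1) = g^1
    best = min(h_sorted[0], g_prev)
    for h_i, g_i in zip(h_sorted[1:], g_sorted[1:]):
        g_prev = g_prev + g_i + lam * g_prev * g_i   # g(A_i) from g(A_{i-1})
        best = max(best, min(h_i, g_prev))
    return best
```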

In the case of multi-network fusion, let $\Omega = \{c_1, c_2, \dots, c_C\}$ be the set of classes of interest, let $T = \{t_1, t_2, \dots, t_n\}$ be the set of networks, and let $A$ be the pattern under testing. Let $h_k: T \to [0, 1]$ represent the partial evaluation of the pattern $A$ for class $c_k$; that is, $h_k(t_i)$ indicates how certain we are that $A$ is classified in class $c_k$ using the network $t_i$, where 1 indicates absolute certainty that $A$ belongs to $c_k$ and 0 implies that $A$ is not in $c_k$. In this study, we took the normalized difference between the two most activated output nodes of the networks as $h$. When each $t_i$ is considered, the degree of importance $g^i$ must also be evaluated. These values can be evaluated easily from the training data; in our study, we took the recognition accuracy of each class with each network as $g^i$. From these, the fuzzy measure $g$ can be evaluated using Eqs. (13) and (14), and the FI can then be computed from Eq. (17).

4.2. Multiple FNFNs Fusion Using FI

The structure of our proposed method is shown in Fig. 3. The operation of this model can be thought of as a nonlinear decision-making process. In this paper, we propose a novel fuzzy integral fusion method for multiple FNFNs. Given an unknown input $X = (x_1, x_2, \dots, x_T)$ and the class set $\Omega = \{c_1, c_2, \dots, c_C\}$, each FNFN estimates the probability $P(c_i \mid X)$ of $X$ belonging to each class by itself using
$$P(c_i \mid X) \approx f\left\{\sum_k w_{i,k} \cdot \left(\prod_j G_{k,j}\right)\right\} \qquad (18)$$
where $G$ represents a Gaussian basis function, such as $G(x) = \exp\left(-(x - m)^2 / \sigma^2\right)$, and $w_{i,k}$ represents the constant consequence of the fuzzy rule. The Gaussian basis functions are combined by the product operation, multiplied by the weights, and summed; the result of $\sum_k w_{i,k} \cdot (\prod_j G_{k,j})$ is denoted $s_i$. We then define $S_{\min} = \min_i(s_i)$ and $S_{\max} = \max_i(s_i)$ and use the normalization function $f(s_i) = (s_i - S_{\min}) / (S_{\max} - S_{\min})$ to obtain the results.

Fig. 3. The multiple FNFNs fusion architecture.

On the other hand, we assign the fuzzy densities $g^i$, the degrees of importance of each FNFN, based on how well these FNFNs performed on the validation data. These densities can be subjectively assigned by an expert, or they can be generated from the training data. We computed these values as
$$g^i = \frac{p_i}{\sum_i p_i}\, d_{sum} \qquad (19)$$
where $p_i$ is the performance of FNFN$_i$ on the validation data and $d_{sum}$ is the sum of the fuzzy densities. Now consider the use of the fuzzy integral. First, we calculate the value of $\lambda$ from the densities $g^i$ as the unique root of Eq. (14) greater than $-1$, which produces the fuzzy measure on the power set of the network set. The consensus is then formed by computing $H(E) = \min(h(y_i), g(A_i))$ for each subset of networks, and the maximum value is selected as the decision output, $\max[H(E)]$.
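Putting Section 4.2 together, the sketch below reuses solve_lambda and sugeno_integral from the previous block. The validation accuracies and the d_sum value are placeholders, and in the paper the per-class evidence h comes from the normalized network outputs of Eq. (18).

```python
import numpy as np

# Fuzzy densities from validation performance, Eq. (19).
val_accuracy = np.array([0.97, 0.973, 0.968])          # placeholder p_i for three FNFNs
d_sum = 0.9                                            # assumed sum of fuzzy densities
densities = val_accuracy / val_accuracy.sum() * d_sum  # g^i

def fuse(h_per_class, densities):
    """h_per_class[c][i] is the evidence h_c(t_i) that the pattern belongs to
    class c according to network t_i; the class with the largest fuzzy
    integral wins (max over classes of Eq. (17))."""
    scores = [sugeno_integral(h_c, densities) for h_c in h_per_class]
    return int(np.argmax(scores)), scores

# Three networks scoring a pattern against three classes (toy evidence values).
h = np.array([[0.1, 0.2, 0.1],   # class 1
              [0.8, 0.4, 0.7],   # class 2
              [0.3, 0.5, 0.2]])  # class 3
predicted, scores = fuse(h, densities)   # -> class index 1 here
```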

5. Experimental Results

Example 1: Iris Data

The Iris data set [13] contains 150 patterns that are distributed equally over three output species: Iris Setosa, Iris Versicolour, and Iris Virginica. Each pattern consists of four input features: sepal length, sepal width, petal length, and petal width. We used these patterns to produce both the training data and the testing data. In the experiment, 25 instances from each species were randomly selected as the training set (i.e., a total of 75 training patterns were used) and the remaining instances were used as the testing set. The data set was normalized to the range [0, 1], and the output $y$ of the FNFN model was interpreted with the following classification rules:
$$\text{Iris} = \begin{cases} \text{Setosa}, & \text{if } (y_1, y_2, y_3) = (0, 0, 1) \\ \text{Versicolour}, & \text{if } (y_1, y_2, y_3) = (0, 1, 0) \\ \text{Virginica}, & \text{if } (y_1, y_2, y_3) = (1, 0, 0) \end{cases} \qquad (20)$$
The initial parameters $\eta_m = \eta_\sigma = \eta_w = 0.01$ and the entropy threshold range $[0.26, 0.3]$ were chosen. We repeated the experiment on 5 different training and testing data sets that were obtained via a random process from the original Iris data. Table 1 tabulates the results of the 5 independent runs for the three FNFN models. The average testing accuracy rates of the three FNFN models were 97.06%, 97.33%, and 96.8%. Table 2 shows the experimental results of the fuzzy integral fusion on the Iris data; the average testing accuracy rate of multiple FNFNs using FI was 97.6%. From the simulation results, we see that combining three FNFNs with the fuzzy integral can avoid classification mistakes made by a single FNFN. Because of this characteristic, the fuzzy integral yields higher recognition accuracy than a single model does.
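As a small illustration of Eq. (20), the decoder below maps a three-output FNFN response to a species by taking the most active output, which reduces to the listed rules when the outputs are crisp 0/1 values; the function and dictionary names are ours.

```python
import numpy as np

# Index of the '1' in (y1, y2, y3) names the species per Eq. (20).
SPECIES = {2: "Setosa", 1: "Versicolour", 0: "Virginica"}

def decode_iris(y):
    """Map an FNFN output vector to an Iris species (soft argmax reading of Eq. (20))."""
    return SPECIES[int(np.argmax(np.asarray(y)))]

print(decode_iris([0.1, 0.05, 0.9]))   # -> Setosa, since (y1, y2, y3) ~ (0, 0, 1)
```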


Table 1: Testing errors and average accuracy of the three FNFN models over 5 independent runs (Iris data).

Experiment             | 1 | 2 | 3 | 4 | 5 | Accuracy (%)
Testing errors (FNFN1) | 1 | 2 | 3 | 3 | 2 | 97.06
Testing errors (FNFN2) | 2 | 2 | 3 | 1 | 2 | 97.33
Testing errors (FNFN3) | 1 | 4 | 3 | 1 | 3 | 96.8

Table 2: Experimental results of the FI fusion on the Iris data.

Experiment     | 1    | 2    | 3  | 4    | 5    | Avg.
Testing errors | 1    | 2    | 3  | 1    | 2    | 1~3
Accuracy (%)   | 98.7 | 97.3 | 96 | 98.7 | 97.3 | 97.6

We picked out the 67th test datum in our simulation. The test results showed that the 67th test datum was assigned to the Virginica class by the FNFN1 model (see Fig. 4(a)) but to the Versicolour class by the FNFN2 and FNFN3 models (see Fig. 4(b) and (c)).

Fig. 4. Distribution of the testing data for all networks: (a) FNFN1, (b) FNFN2, (c) FNFN3, (d) multiple FNFNs fused by FI.

Table 3: Experimental results on the Iris data.

Model                   | Avg. training accuracy (%) | Avg. testing accuracy (%)
MLP                     | 98.5                       | 94.66
MLP using FI [12]       | 96                         | 95.44
NFN [4]                 | 98                         | 97.04
SANFIS [8]              | 98.6                       | 97.3
FNFN [1]                | 98.7                       | 96.06~97.33
Multiple FNFNs using FI | 98.9                       | 97.6

Example 2: Wisconsin Breast Cancer Diagnostic Data

In our experiments, missing values in these data sets were replaced by the average of the corresponding column (feature), regardless of the class labels. The data set contains 699 patterns distributed over two output classes: Benign (458 patterns) and Malignant (241 patterns). Each pattern consists of nine input features: clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli, and mitoses. Since 16 patterns contained missing values, only 683 patterns were used. In the experiment, half of the 683 patterns were used as the training data set and the remaining patterns were used as the test data set. We demarcated the output $y$ of the FNFN model using the following classification rules:
$$\text{class} = \begin{cases} \text{Benign}, & \text{if } (y_1, y_2) = (0, 1) \\ \text{Malignant}, & \text{if } (y_1, y_2) = (1, 0) \end{cases} \qquad (21)$$
We set the parameters $\eta_m = \eta_\sigma = \eta_w = 0.01$ and the entropy threshold range $[0.26, 0.3]$ as initial values. The experiments were performed on 5 different training/test data sets that were obtained via a random process from the original data. Table 4 tabulates the results of the 5 independent runs for the three FNFN models. The average testing accuracy rates of the three FNFN models were 98.07%, 97.95%, and 97.78%. Table 5 shows the experimental results of the fuzzy integral fusion on the Wisconsin Breast Cancer Diagnostic data; the average testing accuracy rate of multiple FNFNs using FI was 98.2%.

Table 4: Testing errors and average accuracy of the three FNFN models over 5 independent runs (Wisconsin Breast Cancer Diagnostic data).

Experiment             | 1  | 2 | 3 | 4 | 5 | Accuracy (%)
Testing errors (FNFN1) | 8  | 7 | 4 | 7 | 7 | 98.07
Testing errors (FNFN2) | 7  | 7 | 7 | 7 | 7 | 97.95
Testing errors (FNFN3) | 12 | 7 | 4 | 9 | 6 | 97.78

Table 5: Experimental results of the FI fusion on the Wisconsin Breast Cancer Diagnostic data.

Experiment     | 1  | 2  | 3    | 4  | 5    | Avg.
Testing errors | 7  | 7  | 4    | 7  | 6    | 4~7
Accuracy (%)   | 98 | 98 | 98.8 | 98 | 98.3 | 98.2

Fig. 5. Distribution of the testing data for all networks: (a) FNFN1, (b) FNFN2, (c) FNFN3, (d) multiple FNFNs fused by FI.

We printed out the 28th test datum in our simulation. The test results showed that the 28th test datum was assigned to the Benign class by the FNFN3 model (see Fig. 5(c)) but to the Malignant class by the FNFN1 and FNFN2 models (see Fig. 5(a) and (b)). If the majority voting method were chosen, the 28th test datum would be assigned to the Malignant class. But this is wrong.

Using the fuzzy integral to fuse the ensemble of networks, we found the result to be more accurate: when the multiple FNFNs were combined with the fuzzy integral, the classification was correct. Therefore, combining the FNFNs with the fuzzy integral avoids classification mistakes made either by a single FNFN or by the general voting methods. The results show that the proposed model has a higher average recognition rate than the other methods; the comparison results are tabulated in Table 6.

Table 6: Experimental results on the Wisconsin Breast Cancer Diagnostic data.

Model                   | Avg. recognition rate (%)
MLP                     | 95.72
MLP using FI [15]       | 96.42
SANFIS [8]              | 96.07
NFN [16]                | 97.95
FNFN [3]                | 97.78~98.07
Multiple FNFNs using FI | 98.2

6. Conclusion

In this paper, we proposed the fusion of multiple functional neural fuzzy networks (FNFNs) using the fuzzy integral. The FNFN model uses the functional link neural network (FLNN) as the consequent part of its fuzzy rules. Research on combining multiple classifiers has provided new insights into pattern recognition. Previously, effort was focused mainly on designing one good classifier that achieves the best classification rate. Now, we can change our focus: we can build a number of different classifiers. Each classifier by itself may not achieve the desired performance, but an appropriate combination of these individual classifiers may produce highly reliable performance. We used two well-known benchmark data sets in our classification experiments to show that the proposed model works well on pattern classification problems.

7. References

[1] C. H. Chen, C. J. Lin, and C. T. Lin, "A functional-link-based neurofuzzy network for nonlinear system control," IEEE Trans. Fuzzy Syst., vol. 16, no. 5, pp. 1362–1378, Oct. 2008.
[2] C. F. Juang, J. Y. Lin, and C. T. Lin, "Genetic reinforcement learning through symbiotic evolution for fuzzy controller design," IEEE Trans. Syst., Man, Cybern. B, vol. 30, no. 2, pp. 290–302, Apr. 2000.
[3] D. Wang, J. M. Keller, C. A. Carson, K. K. McAdoo-Edwards, and C. W. Bailey, "Use of fuzzy-logic-inspired features to improve bacterial recognition through classifier fusion," IEEE Trans. Syst., Man, Cybern. B, vol. 28, pp. 583–591, 1998.
[4] F. J. Lin, C. H. Lin, and P. H. Shen, "Self-constructing fuzzy neural network speed controller for permanent-magnet synchronous motor drive," IEEE Trans. Fuzzy Syst., vol. 9, pp. 751–759, 2001.
[5] M. Grabisch, "On equivalence classes of fuzzy connectives: the case of fuzzy integrals," IEEE Trans. Fuzzy Syst., vol. 3, pp. 96–109, 1995.
[6] H. Tahani and J. Keller, "Information fusion in computer vision using the fuzzy integral," IEEE Trans. Syst., Man, Cybern., vol. 20, pp. 733–741, 1990.
[7] J. C. Patra, R. N. Pal, B. N. Chatterji, and G. Panda, "Identification of nonlinear dynamic systems using functional link artificial neural networks," IEEE Trans. Syst., Man, Cybern. B, vol. 29, no. 2, pp. 254–262, Apr. 1999.
[8] J. S. Wang and C. S. G. Lee, "Self-adaptive neuro-fuzzy inference systems for classification applications," IEEE Trans. Fuzzy Syst., vol. 10, pp. 790–801, 2002.
[9] K. S. Narendra and K. Parthasarathy, "Identification and control of dynamical systems using neural networks," IEEE Trans. Neural Netw., vol. 1, no. 1, pp. 4–27, Mar. 1990.
[10] L. I. Kuncheva, "'Fuzzy' versus 'non-fuzzy' in combining classifiers designed by boosting," IEEE Trans. Fuzzy Syst., vol. 11, pp. 729–741, 2003.
[11] L. I. Kuncheva, "Combining classifiers: soft computing solutions," in Pattern Recognition: From Classical to Modern Approaches, S. K. Pal and A. Pal, Eds. Singapore: World Scientific, 2001, ch. 15, pp. 427–452.
[12] M. Grabisch and M. Sugeno, "Multi-attribute classification using fuzzy integral," in Proc. IEEE Int. Conf. Fuzzy Systems, 1992, pp. 47–54.
[13] R. Fisher, "The use of multiple measurements in taxonomic problems," Annals of Eugenics, vol. 7, pp. 179–188, 1936.
[14] S. B. Cho and J. H. Kim, "Multiple network fusion using fuzzy logic," IEEE Trans. Neural Netw., vol. 6, pp. 497–501, 1995.
[15] S. B. Cho and J. H. Kim, "Combining multiple neural networks by fuzzy integral for robust classification," IEEE Trans. Syst., Man, Cybern., vol. 25, pp. 380–384, 1995.
[16] S. Paul and S. Kumar, "Subsethood-product fuzzy neural inference system (SuPFuNIS)," IEEE Trans. Neural Netw., vol. 13, pp. 578–599, 2002.
[17] Y. H. Pao, S. M. Phillips, and D. J. Sobajic, "Neural-net computing and intelligent control systems," Int. J. Control, vol. 56, no. 2, pp. 263–289, 1992.