
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 28, NO. 4, AUGUST 1998

Correspondence

Pattern Fusion in Feature Recognition Neural Networks for Handwritten Character Recognition

Shie-Jue Lee and Hsien-Leing Tsai

Manuscript received June 15, 1996; revised November 8, 1997. This work was supported in part by the National Science Council under Grants NSC83-0408-E-110-004 and NSC-85-2213-E-110-035. The authors are with the Department of Electrical Engineering, National Sun Yat-Sen University, Kaohsiung, Taiwan 80424, R.O.C. (e-mail: [email protected]).

Abstract—Hussain and Kabuka [8] proposed a feature recognition neural network to reduce the network size of Neocognitron [6]. However, a distinct subnet is created for every training pattern, so a large network results when the number of training patterns is large. Furthermore, the recognition rate can suffer because features from similar training patterns are not combined. We propose an improvement that incorporates the idea of fuzzy ARTMAP [1], [2] into the feature recognition neural network. Training patterns are allowed to be merged, based on the measure of similarity among features, so that a subnet can be shared by similar patterns. Because of this fusion of training patterns, the network size is reduced and the recognition rate is increased.

Index Terms—Fuzzy ARTMAP, link weight, matching degree, node weight, recognition-by-parts, vigilance.

I. INTRODUCTION

Handwritten character recognition has been a major topic in the pattern recognition community [4], [12]. Very good results have been reported using neural network techniques [6], [8]–[11]. Neocognitron [5], [6] is a famous architecture which applies a recognition-by-parts algorithm and is tolerant of deformation, noise, and shifts in position. However, the size of Neocognitron is huge. Hussain and Kabuka [8] proposed a feature recognition neural network to overcome this disadvantage. However, a distinct subnet is created for every training pattern, resulting in a big network when the number of training patterns is large. To keep the resulting network reasonably small, training patterns had to be preselected by human experts. Such filtering of training patterns, i.e., deciding which ones are good and which ones are bad, by human beings is hard if not impossible. Furthermore, the recognition rate can suffer because features from similar training patterns are not combined.

We develop a neural network architecture which incorporates the idea of fuzzy ARTMAP [1], [2] in feature recognition neural networks. Training patterns are allowed to be merged, based on the measure of similarity among features, and a subnet may be shared by similar patterns. A minimal number of subnets is learned automatically to meet accuracy criteria. Therefore, the network size can be reduced and training patterns need not be preselected by human experts. Also, due to the fusion of training patterns, the recognition rate of our network is higher than that of the feature recognition neural network.

The next section of this paper presents the architecture of our network. Section III describes the creation and training of a network from a set of training patterns. Section IV describes how a trained network performs recognition on input patterns. An example is given for illustration in Section V. Section VI presents simulation results, together with a comparison with the feature recognition network. For convenience, in the rest of the paper we refer to Hussain and Kabuka's feature recognition network as FRNN and to our network as the modified FRNN (MFRNN).

Fig. 1. (a) An example pattern and (b) its 16 nominal subpatterns.

II. NETWORK ARCHITECTURE

As in FRNN, an input pattern is an array of 16 × 16 pixels, numbered 1, 2, ..., 256 from left to right and top to bottom. Any 4 × 4 block of pixels is called a subpattern, and the subpatterns are numbered 1, 2, ..., 16 similarly. A subpattern j is called a nominal subpattern if it contains the pixels

\lfloor (j-1)/4 \rfloor \times 64 + (k-1) \times 16 + ((j-1)\,\%\,4) \times 4 + i, \qquad 1 \le k \le 4, \; 1 \le i \le 4

where \lfloor \cdot \rfloor is the floor function, "%" is the modulo operator, and k and i denote the row index and column index, respectively, of the pixels of subpattern j. For example, the sixth nominal subpattern contains pixels 69, 70, 71, 72, 85, 86, 87, 88, 101, 102, 103, 104, 117, 118, 119, and 120. A subpattern whose center pixel has coordinate (c_{1x}, c_{1y}) is said to have a distance h from another subpattern whose center pixel has coordinate (c_{2x}, c_{2y}) if

\max(|c_{1x} - c_{2x}|, |c_{1y} - c_{2y}|) = h.

We also say that the former subpattern is (c_{1x} - c_{2x}, c_{1y} - c_{2y}) away from the latter subpattern.
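To make this indexing concrete, here is a brief Python sketch (ours, not part of the original paper; the helper names are hypothetical). It lists the pixel numbers of a nominal subpattern directly from the formula above and extracts a subpattern shifted by (s_x, s_y), returning None when the shifted window leaves the 16 × 16 grid.

import numpy as np

def nominal_pixels(j):
    """Pixel numbers (1-based) of nominal subpattern j, 1 <= j <= 16, from
    floor((j-1)/4)*64 + (k-1)*16 + ((j-1)%4)*4 + i with 1 <= k, i <= 4."""
    return [((j - 1) // 4) * 64 + (k - 1) * 16 + ((j - 1) % 4) * 4 + i
            for k in range(1, 5) for i in range(1, 5)]

def shifted_subpattern(pattern, j, sx=0, sy=0):
    """The 16 pixels of the subpattern that is (sx, sy) away from nominal
    subpattern j (positive sx = right, positive sy = down), or None if the
    shifted 4 x 4 window falls outside the 16 x 16 grid."""
    img = np.asarray(pattern).reshape(16, 16)       # row-major, as in the paper
    r0 = ((j - 1) // 4) * 4 + sy                    # top row of the window
    c0 = ((j - 1) % 4) * 4 + sx                     # leftmost column of the window
    if not (0 <= r0 <= 12 and 0 <= c0 <= 12):
        return None
    return img[r0:r0 + 4, c0:c0 + 4].reshape(-1)

# Sanity check against the worked example: subpattern 6 starts at pixel 69.
assert nominal_pixels(6)[:4] == [69, 70, 71, 72]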

Fig. 1 shows an example pattern with its 16 nominal subpatterns.

An MFRNN consists of four layers, excluding the input layer, as shown in Fig. 2. Subpatterns of an input pattern are presented to the shift-subpattern layer. Shift-subpattern nodes take care of the deformation, i.e., shift, noise, or size, of the input pattern. Each subpattern node summarizes the measure of similarity between the corresponding input subpattern and the stored subpattern. Similarly, a pattern node in the pattern layer reflects the similarity measure between the input pattern and the stored pattern. A pattern node is connected to one category node in the category layer, indicating the class of the input pattern.

Fig. 2. The architecture of MFRNN.

A subpattern node is responsible for the match between an input nominal subpattern and the stored subpattern. However, to tolerate possible deformation of the input pattern, we have to consider the neighboring subpatterns of an input nominal subpattern. Suppose we allow a deformation of up to ±d pixels (d is a positive integer) in either the X or the Y direction. We have to consider all the neighboring subpatterns within distance d in order to detect a possible deformation. Each neighboring subpattern is taken care of by a shift-subpattern node. Therefore, a subpattern node may receive the outputs of up to (2d + 1)^2 shift-subpattern nodes. For example, the first subpattern node has (d + 1)^2 shift-subpattern nodes, and the sixth subpattern node has (2d + 1)^2 shift-subpattern nodes. Each subpattern node stores a node weight W which is shared by all its shift-subpattern nodes. A shift-subpattern node computes, based on the input pattern and its node weight, a value and outputs that value to the associated subpattern node. The value computed by a shift-subpattern node measures the similarity between the node weight stored in the subpattern node and the input subpattern located at distance (s_x, s_y), -d ≤ s_x ≤ d, -d ≤ s_y ≤ d, from the underlying input nominal subpattern. A subpattern node investigates the output values of all its shift-subpattern nodes and takes the maximum of them as its output.

The third layer contains pattern nodes. A pattern node is linked to 16 subpattern nodes, with a link weight ω associated with each link. Also, a vigilance parameter ρ, 0 ≤ ρ ≤ 1, is associated with each pattern node. The values of the vigilance parameters are adjusted in the training phase of the network, as will be described later. Similar to fuzzy ARTMAP [1], vigilance parameters control the accuracy of classifying input training patterns. Each pattern node receives values from all its subpattern nodes and computes a number from these values. The numbers from all pattern nodes are involved in triggering one of the class nodes, indicating that the input pattern has been appropriately classified.

For convenience, we use the following notation in referring to nodes. Let N_i be a pattern node. Then N_{i,j} denotes the jth subpattern node of N_i, and N_{i,j,(s_x,s_y)} denotes the shift-subpattern node of N_{i,j} that takes care of the input subpattern which is (s_x, s_y) away from the nominal subpattern. As usual, X is the horizontal direction and Y is the vertical direction. A positive (negative) s_x denotes a right (left) shift, and a positive (negative) s_y denotes a down (up) shift of the subpattern. We may call N_{i,j,(s_x,s_y)} the (s_x, s_y)th shift-subpattern node of N_{i,j}. We define a subnet of a pattern node N_k to be the subnetwork consisting of the pattern node N_k, its subpattern nodes and shift-subpattern nodes, together with all the associated links.
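To fix the terminology in code, a subnet can be thought of as a small record holding the sixteen node weights W_{k,j}, the sixteen link weights ω_{k,j}, the vigilance ρ_k, and the class label. The sketch below is our own illustrative data structure, not code from the paper; the initialization mirrors (8) and (9) of Section III, and the default vigilance of 0.9 follows the example of Section V.

from dataclasses import dataclass
import numpy as np

@dataclass
class Subnet:
    """One pattern node N_k together with its 16 subpattern nodes."""
    node_weights: np.ndarray    # shape (16, 16): row j-1 holds W_{k,j,1..16}
    link_weights: np.ndarray    # shape (16,): omega_{k,j}, j = 1..16
    vigilance: float            # rho_k
    label: str                  # class C_k

def new_subnet(subpatterns, label, initial_vigilance=0.9):
    """Create a subnet from the 16 nominal subpatterns of a training pattern;
    node weights copy the +/-1 subpatterns and link weights start at 1,
    as (8) and (9) in Section III will specify."""
    return Subnet(node_weights=np.asarray(subpatterns, dtype=float).copy(),
                  link_weights=np.ones(16),
                  vigilance=initial_vigilance,
                  label=label)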

III. NETWORK CREATION AND TRAINING

Suppose we are given a set of training patterns. Each pattern is represented by a row matrix A of 256 pixels, and each subpattern by a row matrix I_j of 16 pixels, 1 ≤ j ≤ 16, namely

I_j = [I_{j1}, I_{j2}, \ldots, I_{j16}], \quad 1 \le j \le 16
A = [I_1, I_2, \ldots, I_{16}]

where I_{jk} is the normalized gray level of the corresponding pixel, i.e.,

I_{jk} \in \{-1, 1\}, \quad 1 \le k \le 16, \; 1 \le j \le 16

with 1 representing black and -1 representing white. For convenience, we represent the input to a shift-subpattern node N_{i,j,(s_x,s_y)} by I_j(s_x, s_y); I_j(0, 0) may be abbreviated as I_j. Since we are doing supervised classification, it is assumed that the class of each input training pattern is known.

As mentioned earlier, each subpattern node stores a node weight shared by all its shift-subpattern nodes. For a subpattern node N_{i,j}, its node weight W_{i,j} is defined to be

W_{i,j} = [W_{i,j,1}, W_{i,j,2}, \ldots, W_{i,j,16}]    (1)

where each W_{i,j,k}, 1 ≤ k ≤ 16, is an integer.
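As a small companion to the indexing sketch of Section II, the following lines (ours; the 0/1 bitmap input format is an assumption) build the ±1 row matrix A of a pattern and split it into the sixteen nominal subpatterns I_1, ..., I_16.

import numpy as np

def to_row_matrix(bitmap01):
    """Map a 16 x 16 bitmap with entries in {0, 1} (1 = black) to the row
    matrix A of 256 entries in {-1, +1}: 1 for black, -1 for white."""
    img = np.asarray(bitmap01, dtype=float).reshape(16, 16)
    return (2.0 * img - 1.0).reshape(-1)

def nominal_subpatterns(A):
    """Split A into the 16 nominal subpatterns I_1, ..., I_16 (one per row)."""
    img = np.asarray(A).reshape(16, 16)
    return np.stack([img[r:r + 4, c:c + 4].reshape(-1)
                     for r in range(0, 16, 4) for c in range(0, 16, 4)])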

Suppose an input training pattern A with class C is presented to the network. Each shift-subpattern node N_{i,j,(s_x,s_y)} computes its output O_{i,j,(s_x,s_y)} by

O_{i,j,(s_x,s_y)} = \frac{W_{i,j} \, I_j(s_x, s_y)^T}{|W_{i,j}|}    (2)

where

|W_{i,j}| = \sum_{k=1}^{16} |W_{i,j,k}|

and the superscript T stands for matrix transposition. Since each element of I_j(s_x, s_y)^T is either 1 or -1, the following relationship holds:

-|W_{i,j}| \le W_{i,j} \, I_j(s_x, s_y)^T \le |W_{i,j}|.

Therefore, we have -1 \le O_{i,j,(s_x,s_y)} \le 1. Apparently, O_{i,j,(s_x,s_y)} measures the similarity between I_j(s_x, s_y) and the node weight W_{i,j} stored in N_{i,j}: the more similar I_j(s_x, s_y) is to the stored weight W_{i,j}, the closer O_{i,j,(s_x,s_y)} is to 1. On the contrary, the more I_j(s_x, s_y) differs from W_{i,j}, the closer O_{i,j,(s_x,s_y)} is to -1. All the outputs of the shift-subpattern nodes are sent to their respective subpattern nodes. Each subpattern node N_{i,j} takes the maximum value of all its inputs, i.e.,

O_{i,j} = \max\bigl(O_{i,j,(-d,-d)}, \ldots, O_{i,j,(0,0)}, \ldots, O_{i,j,(d,d)}\bigr)

and sends this value to its pattern node N_i. The way O_{i,j} is computed reflects the spirit of recognition by parts. It also accounts for the tolerance of the MFRNN to deformation, noise, and shifts in position. Obviously, -1 \le O_{i,j} \le 1 for every possible i and j.
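A minimal sketch of these two computations, under the indexing of Section II (our helper names; shifts whose 4 × 4 windows leave the 16 × 16 grid are simply skipped, which also yields the (d + 1)^2 shift nodes of a corner subpattern): each shift-subpattern node returns the normalized inner product of (2), and the subpattern node keeps the maximum over all admissible shifts.

import numpy as np

def shift_output(W_ij, I_shift):
    """Eq. (2): O_{i,j,(sx,sy)} = W_{i,j} . I_j(sx,sy)^T / |W_{i,j}|,
    with |W_{i,j}| the sum of absolute values of the weight entries."""
    return float(np.dot(W_ij, I_shift)) / float(np.sum(np.abs(W_ij)))

def subpattern_output(W_ij, pattern, j, d=2):
    """O_{i,j}: the maximum shift-subpattern output over all shifts (sx, sy),
    -d <= sx, sy <= d, whose 4 x 4 window stays inside the 16 x 16 grid."""
    img = np.asarray(pattern, dtype=float).reshape(16, 16)
    r0, c0 = ((j - 1) // 4) * 4, ((j - 1) % 4) * 4   # corner of the nominal window
    best = -1.0                                      # outputs always lie in [-1, 1]
    for sy in range(-d, d + 1):
        for sx in range(-d, d + 1):
            r, c = r0 + sy, c0 + sx
            if 0 <= r <= 12 and 0 <= c <= 12:
                best = max(best, shift_output(W_ij, img[r:r + 4, c:c + 4].reshape(-1)))
    return best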

Let the priority index P_i for a pattern node N_i be defined by

P_i = \sum_{j=1}^{16} \left( 3 O_{i,j} - 2 \right)^{1/3}    (3)


with j ranging over the 16 subpattern nodes of N_i. Using priority indexes makes the training procedure more efficient, as in fuzzy ARTMAP [1]. The priority indexes of all pattern nodes are sorted in decreasing order and placed in the priority list. Suppose the largest priority index in the priority list is P_k. Let the pattern node corresponding to P_k be N_k, the class for N_k be C_k, and N_k's vigilance be ρ_k. We compute the following matching degree M_k for N_k:

M_k = \frac{\sum_{j=1}^{16} (\omega_{k,j} \wedge O_{k,j} + 1)}{\sum_{j=1}^{16} (\omega_{k,j} + 1)}    (4)

where ω_{k,j} is the link weight between N_k and N_{k,j}. The operator \wedge is defined as in [15]:

\omega_{k,j} \wedge O_{k,j} = \min(\omega_{k,j}, O_{k,j}).

Since ω_{k,j} ∧ O_{k,j} ≤ ω_{k,j} and -1 ≤ ω_{k,j} (to be made clear later), we have 0 ≤ M_k ≤ 1. Note that M_k reflects the similarity between the input pattern A and the pattern stored in the subnet of N_k in a global sense: the more similar they are, the larger M_k we have.
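A short sketch of the two quantities that drive training and recognition (ours; in particular, the priority index below follows our reading of (3) as a sum of real cube roots of 3O_{i,j} − 2 and should be taken as an assumption rather than the authors' exact definition):

import numpy as np

def priority_index(O_i):
    """Eq. (3) as read here: P_i = sum_j (3*O_{i,j} - 2)^(1/3), taking the real
    cube root since the argument can be negative."""
    x = 3.0 * np.asarray(O_i) - 2.0
    return float(np.sum(np.sign(x) * np.abs(x) ** (1.0 / 3.0)))

def matching_degree(omega_k, O_k):
    """Eq. (4): M_k = sum_j (omega_{k,j} ^ O_{k,j} + 1) / sum_j (omega_{k,j} + 1),
    where ^ is the fuzzy minimum of the link weight and the subpattern output."""
    omega_k, O_k = np.asarray(omega_k), np.asarray(O_k)
    return float(np.sum(np.minimum(omega_k, O_k) + 1.0) / np.sum(omega_k + 1.0))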

Then we have the following cases.

1) If M_k ≥ ρ_k and C = C_k, then we modify the pattern stored in the subnet of N_k by changing the associated node weights and link weights as follows:

W_{k,j} \leftarrow W_{k,j} + I_j(s_x^j, s_y^j), \quad 1 \le j \le 16
\omega_{k,j} \leftarrow \omega_{k,j} \wedge O_{k,j}, \quad 1 \le j \le 16    (5)

where I_j(s_x^j, s_y^j) is the input to the shift-subpattern node N_{k,j,(s_x^j,s_y^j)} whose output value to N_{k,j} is the largest among the shift-subpattern nodes of N_{k,j}. Then we are done with the input training pattern A. Equation (5) intends to increase the output value of N_{k,j,(s_x^j,s_y^j)} more than the output values of the other shift-subpattern nodes of N_{k,j} when an identical input pattern is presented to the network next time.

2) If M_k ≥ ρ_k, C_k ≠ C, and M_k < 1, then we increase ρ_k as follows:

\rho_k \leftarrow M_k + \epsilon    (6)

where ε is a very small positive real number. With this increase in ρ_k, the next time an identical input pattern is presented to the network, M_k will no longer be greater than or equal to ρ_k.

3) If M_k ≥ ρ_k, C_k ≠ C, and M_k = 1, then the modification becomes

\rho_k \leftarrow 1
\omega_{k,j} \leftarrow O_{k,j} + \epsilon, \quad 1 \le j \le 16    (7)

where ε is a very small positive real number. In this case, M_k will be slightly less than ρ_k the next time an identical input pattern is presented to the network, since the numerator of (4) takes the smaller of ω_{k,j} and O_{k,j}.

4) If M_k is smaller than the vigilance ρ_k of N_k, then we do not modify anything in the subnet of N_k.

If any of the last three cases occurs, we select the next highest priority index in the priority list and continue the above process iteratively until either the first case occurs or every member of the priority list has been considered. If the first case never occurs, it means that the training pattern should not be combined into any existing pattern subnet. In this case, we create a new pattern subnet for storing this

training pattern. Let N_n be the pattern node of this new subnet. The node weight W_{n,j} of the jth subpattern node of N_n is initialized by

W_{n,j} \leftarrow I_j, \quad 1 \le j \le 16    (8)

and the jth link weight ω_{n,j} of N_n is initialized to 1, namely

\omega_{n,j} \leftarrow 1, \quad 1 \le j \le 16    (9)

and the vigilance ρ_n associated with N_n is set to an initial value, which depends on how much fuzziness N_n is allowed in including other input patterns in its subnet. If the network already contains a class node for C, then we connect N_n to this class node; otherwise we create a class node for C and connect N_n to it.

Notice that priority indexes help training in the following way. For a training pattern A with class C, if no class node of C exists in the network, then we do not need to go through the above procedure at all: we create a new subnet by applying (8) and (9). The next time an identical pattern is presented, this subnet is sure to be activated, since it will be the first element in the priority list. This causes no problem in the recognition phase, since priority indexes are applied in the same way as described in the next section.

Two or more pattern nodes may connect to an identical class node, indicating that the patterns stored in these subnets are in the same class. This case occurs if the training patterns of a class are clustered in groups, as shown in Fig. 3. The patterns in one cluster are not similar enough, as measured by matching degrees, to the patterns in another cluster. As a result, each cluster results in a different subnet.

Fig. 3. A class node having two or more subnets.

The above procedure is iterated over the training pattern set until the network is stable, i.e., none of the vigilances in the network changes.
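The whole training step can be summarized in one sketch. This is our own compact rendition rather than the authors' code: subnets are plain dictionaries, priority_index and matching_degree repeat the earlier sketch so that the block stands alone, and the default initial vigilance of 0.95 follows Section VI. The routine presents one labeled pattern and applies cases 1)-4) together with the creation rule (8) and (9).

import numpy as np

def nominal_subpatterns(pattern):
    """Rows of the result are the 16 nominal subpatterns I_1, ..., I_16."""
    img = np.asarray(pattern, dtype=float).reshape(16, 16)
    return np.stack([img[r:r + 4, c:c + 4].reshape(-1)
                     for r in range(0, 16, 4) for c in range(0, 16, 4)])

def outputs_and_best_inputs(W, pattern, d=2):
    """For one subnet: the outputs O_{k,j}, j = 1..16, and for each j the shifted
    input I_j(sx*, sy*) that attains the maximum (needed by update (5))."""
    img = np.asarray(pattern, dtype=float).reshape(16, 16)
    O, best = np.empty(16), np.empty((16, 16))
    for j in range(1, 17):
        r0, c0 = ((j - 1) // 4) * 4, ((j - 1) % 4) * 4
        cands = [img[r0 + sy:r0 + sy + 4, c0 + sx:c0 + sx + 4].reshape(-1)
                 for sy in range(-d, d + 1) for sx in range(-d, d + 1)
                 if 0 <= r0 + sy <= 12 and 0 <= c0 + sx <= 12]
        vals = [float(np.dot(W[j - 1], c)) / float(np.sum(np.abs(W[j - 1]))) for c in cands]
        i = int(np.argmax(vals))
        O[j - 1], best[j - 1] = vals[i], cands[i]
    return O, best

def priority_index(O):                    # eq. (3), as read in the earlier sketch
    x = 3.0 * np.asarray(O) - 2.0
    return float(np.sum(np.sign(x) * np.abs(x) ** (1.0 / 3.0)))

def matching_degree(omega, O):            # eq. (4)
    return float(np.sum(np.minimum(omega, O) + 1.0) / np.sum(np.asarray(omega) + 1.0))

def present_pattern(pattern, label, subnets, rho0=0.95, eps=1e-6, d=2):
    """Present one labeled +/-1 pattern to the network; `subnets` is a list of
    dicts with keys "W" (16 x 16), "omega" (16,), "rho", and "label"."""
    fresh = {"W": nominal_subpatterns(pattern), "omega": np.ones(16),
             "rho": rho0, "label": label}
    if not any(s["label"] == label for s in subnets):
        subnets.append(fresh)             # shortcut: no class node for this class yet
        return
    cache = {i: outputs_and_best_inputs(s["W"], pattern, d) for i, s in enumerate(subnets)}
    order = sorted(cache, key=lambda i: priority_index(cache[i][0]), reverse=True)
    for i in order:                       # visit pattern nodes by decreasing priority
        s, (O, best) = subnets[i], cache[i]
        M = matching_degree(s["omega"], O)
        if M >= s["rho"] and s["label"] == label:      # case 1): fuse, eq. (5)
            s["W"] = s["W"] + best
            s["omega"] = np.minimum(s["omega"], O)
            return
        if M >= s["rho"] and s["label"] != label:      # wrong class
            if M < 1.0:
                s["rho"] = M + eps                     # case 2), eq. (6)
            else:
                s["rho"], s["omega"] = 1.0, O + eps    # case 3), eq. (7)
        # case 4): M < rho, leave this subnet unchanged and try the next one
    subnets.append(fresh)                 # no case 1) occurred: new subnet, (8) and (9)

Training then amounts to calling present_pattern on every training pattern in rounds until no vigilance in the network changes, which, as reported in Section VI, typically takes only two or three rounds.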

IV. RECOGNITION BY TRAINED NETWORK

After the training phase, the network is ready for recognizing unknown patterns. Suppose A is a normalized input pattern presented to the trained network. First, the priority indexes of all pattern nodes are computed. These indexes are sorted, as before, in decreasing order in the priority list. Suppose the largest priority index in the priority list is P_k. Let the pattern node corresponding to P_k be N_k, the class for N_k be C_k, and N_k's vigilance be ρ_k. Then we compute the matching degree M_k for N_k. If M_k is greater than or equal to ρ_k, then the input pattern is classified to C_k and we are done. If M_k is less than ρ_k, then we select the next highest priority index in the priority list and proceed with the above process iteratively. If we cannot find a pattern node whose matching degree is greater than or equal to its vigilance, then we classify the input pattern to the class represented by the class node connected to the pattern node with the highest priority index.
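The recognition procedure admits an equally small sketch (ours; it assumes the priority indexes, matching degrees, vigilances, and class labels of all pattern nodes have already been computed, e.g., with the helpers of the training sketch above).

def classify(priorities, matching_degrees, vigilances, labels):
    """Visit pattern nodes in decreasing priority order and return the class of
    the first node whose matching degree reaches its vigilance; otherwise fall
    back to the class of the node with the highest priority index."""
    order = sorted(range(len(labels)), key=lambda i: priorities[i], reverse=True)
    for i in order:
        if matching_degrees[i] >= vigilances[i]:
            return labels[i]
    return labels[order[0]]

# The worked values of the fourth test pattern in Section V: no matching degree
# reaches its vigilance, so the class of the highest-priority node N_1 is returned.
print(classify([12.578, 7.955, 10.352], [0.876, 0.805, 0.852],
               [0.9, 0.9, 0.972], ["E", "E", "F"]))        # -> E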


V. AN EXAMPLE

Let us illustrate our idea with an example. For simplicity, the patterns used in this section are more like printed characters than handwritten ones. Suppose we have the five training patterns shown in Fig. 4, with the third and the fourth patterns being "F" and the other patterns being "E." Initially, no subnet exists. When the first training pattern, Fig. 4(a), comes in, a subnet is created and the pattern node is labeled N_1, with W_{1,j} ← I_j^1, ω_{1,j} ← 1, 1 ≤ j ≤ 16, and ρ_1 ← 0.9, according to (8) and (9). Note that I_j^i is used to represent subpattern I_j of pattern i, and the initial vigilance is assumed to be 0.9. Then we create a class node C_1 for "E" and connect N_1 to C_1. Next, we consider the second training pattern. The matching degree of pattern node N_1 is M_1 = 0.891, computed by (4). Since M_1 < ρ_1, a new subnet is created and its pattern node is labeled N_2, with W_{2,j} ← I_j^2, ω_{2,j} ← 1, 1 ≤ j ≤ 16, and ρ_2 ← 0.9. Then we connect N_2 to the class node C_1. Next we consider the third training pattern. Since there exists no class node for class "F," a subnet is created and its pattern node is called N_3, with W_{3,j} ← I_j^3, ω_{3,j} ← 1, 1 ≤ j ≤ 16, and ρ_3 ← 0.9. We create a class node C_2 for "F" and connect N_3 to C_2. Next, we consider the fourth training pattern. The priority indexes for N_1, N_2, and N_3 are computed by (3), and they are


Fig. 4. Five training patterns.

P_3 = 15.115, P_1 = 12.63, P_2 = 8.202

in decreasing order. Since M_3 = 0.961 > ρ_3 and the class of the training pattern matches the class of the subnet for N_3, we modify the node and link weights of the subnet according to (5). Notice that we do not add any new subnet this time. Finally, we consider the fifth training pattern. The priority indexes for N_1, N_2, and N_3 are

P_3 = 13.814, P_1 = 12.485, P_2 = 9.246

Fig. 5. Node weights for (a) subnet 1, (b) subnet 2, and (c) subnet 3.

in decreasing order. P_3 is the largest and M_3 = 0.972 > ρ_3. The class for N_3 does not match the class of the training pattern, so we increase the vigilance of N_3 to

\rho_3 = 0.972 + 0.000001 = 0.972001

according to (6) with ε = 0.000001. We then select P_1. Since M_1 = 0.930 > ρ_1 and the classes match, we modify the node and link weights of the subnet for N_1 according to (5). This ends the first round of training. We then perform a second round with the same five training patterns. The network stabilizes after this round, and the link weights of the three subnets for pattern nodes N_1, N_2, and N_3 are

[1, 1, 1, 1, 1, 0.75, 0.875, 1, 1, 0.625, 0.625, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 1, 0.875, 1, 1, 0.75, 1, 1, 1, 0.75, 1, 1]

respectively. The node weights for the three subnets are shown in Fig. 5, in which values are represented by gray levels for convenience. The vigilance values for the three pattern nodes are ρ_1 = 0.9, ρ_2 = 0.9, and ρ_3 = 0.972, respectively. Note that in this case our network contains three subnets, in contrast to the five subnets created in an FRNN by Hussain and Kabuka's method. Now, we use the trained network to recognize unknown input patterns. Suppose we have the four test patterns shown in Fig. 6. For the first test pattern, the priority indexes for N_1, N_2, and N_3 are

P_2 = 13.939, P_1 = 12.979, P_3 = 11.355

in decreasing order. Since M_2 = 0.906 > ρ_2, the first test pattern is classified to "E." Similarly, the second and third test patterns are classified to "E" and "F," respectively. For the fourth test pattern, the priority indexes for N_1, N_2, and N_3 are

P_1 = 12.578, P_3 = 10.352, P_2 = 7.955

Fig. 6. Four test patterns.

in decreasing order. The matching degrees are

M_1 = 0.876 < ρ_1, M_3 = 0.852 < ρ_3, M_2 = 0.805 < ρ_2.

Therefore, this test pattern is classified to "E," which is the class of N_1.

VI. EXPERIMENTAL RESULTS

We will present the results in two stages. First, we will show that MFRNN's are as powerful as FRNN's on the three preselected training pattern sets provided in [8]. Then we will show that MFRNN's perform better than FRNN's on training pattern sets that are not filtered by human beings. Note that all the patterns used in the following experiments were obtained from handwritten characters through a thinning preprocessor. Also, the shift parameter d for both FRNN's and MFRNN's was set to 2, and the initial vigilance used in MFRNN's was set to 0.95.

A. Simulations with Preselection

We show that our method is as powerful as Hussain and Kabuka's method on the three experiments presented in [8]. Each experiment used a set of preselected training patterns that were so well selected that they could hardly be combined.

In the first experiment, a standard printed set of 26 alphabetic characters was used as training patterns. Hussain and Kabuka's method created 26 subnets.

Fig. 7. Patterns for numeral recognition with preselection: (a) training patterns and (b) test patterns.

Fig. 8. Patterns for rotation recognition with preselection: (a) training patterns, (b) test patterns, and (c) patterns incorrectly classified.

Fig. 9. Patterns for numeral recognition without preselection: (a) training patterns, (b) test patterns, (c) patterns incorrectly classified by FRNN, and (d) patterns incorrectly classified by MFRNN.

A set of 37 test patterns was correctly classified. For simplicity, we do not show the training and test patterns. Our method also created 26 subnets, and all 37 test patterns were correctly classified in this case.

The second experiment used a set of 18 numerals, shown in Fig. 7(a), as training patterns. The 64 patterns shown in Fig. 7(b) were used as test patterns. Using Hussain and Kabuka's method, 18 subnets were created and six test patterns (the last six) were incorrectly classified. Our method generated only 17 subnets, with the two 1's in Fig. 7(a) being combined in one subnet. The last six test patterns could not be correctly recognized either.

The third experiment dealt with recognition of rotated patterns. A set of 43 numerals, shown in Fig. 8(a), was used as training patterns. The 43 patterns shown in Fig. 8(b) were used as test patterns. Using Hussain and Kabuka's method, 43 subnets were created and the three test patterns shown in Fig. 8(c) were incorrectly classified. Our method generated 42 subnets, since the third and the fifth 8's were combined in one subnet. The three test patterns of Fig. 8(c) could not be correctly recognized either.

B. Simulations without Preselection

In this section, we present some experimental results based on training patterns that were not preselected by human beings. The first experiment used 50 numerals, shown in Fig. 9(a), as training patterns, and another 83 patterns, shown in Fig. 9(b), as test patterns. Using Hussain and Kabuka's method, 50 subnets were created and 20 test patterns, shown in Fig. 9(c), were incorrectly classified (recognition rate = 76%). Our method generated only 42 subnets, and 15 test patterns, shown in Fig. 9(d), were incorrectly classified (recognition rate = 82%).

Fig. 10. Patterns for alphabet recognition without preselection: (a) training patterns, (b) test patterns, (c) patterns incorrectly classified by FRNN, and (d) patterns incorrectly classified by MFRNN.

The second experiment used a set of 100 alphabetic characters, shown in Fig. 10(a), as training patterns, and a set of 78 patterns, shown in Fig. 10(b), as test patterns. Using Hussain and Kabuka's method, 100 subnets were created and 33 test patterns, shown in Fig. 10(c), were incorrectly classified (recognition rate = 58%). Our method generated only 70 subnets, and 18 test patterns, shown in Fig. 10(d), were incorrectly classified (recognition rate = 77%).

The third experiment concerned rotation recognition. A set of 300 numerals, shown in Fig. 11(a), was used as training patterns. The 110 patterns shown in Fig. 11(b) were used as test patterns. Using Hussain and Kabuka's method, 300 subnets were created and 18 test patterns, shown in Fig. 11(c), were incorrectly classified (recognition rate = 84%). Our method generated only 141 subnets, and 12 test patterns, shown in Fig. 11(d), were incorrectly classified (recognition rate = 89%).

The results of the above experiments are summarized in Table I. Clearly, our method can combine training patterns in a subnet automatically, so preselection of training patterns by human beings is not necessary. Also, due to the fusion of training patterns, the recognition rate of MFRNN's is higher than that of FRNN's. Hussain and Kabuka's method requires that training patterns be presented only once in the training stage, since one subnet is created for each pattern. Iterative presentation of training patterns is needed in our method, since parameters in subnets are modified until they are stable. However, experiments have shown that an MFRNN network needs only two or three iterations before it stabilizes.


Fig. 11. Patterns for rotation recognition without preselection: (a) training patterns, (b) test patterns, (c) patterns incorrectly classified by FRNN, and (d) patterns incorrectly classified by MFRNN.

TABLE I
COMPARISON BETWEEN FRNN AND MFRNN

Both FRNN's and MFRNN's are affected by the value of d. A large d may, on one hand, hurt the recognition rate because different patterns are grouped into the same class, but may, on the other hand, help the recognition rate because of a higher tolerance to shifts in position. The size and performance of MFRNN's are also affected by the value of the initial vigilance. A large initial vigilance allows only a small number of patterns to be combined in a subnet, resulting in a large network and a low tolerance for variations. On the contrary, a small initial vigilance encourages many patterns to be combined in a subnet, resulting in a small network and a high tolerance to noise.

VII. CONCLUSION

We have proposed an improvement to Hussain and Kabuka's method. The use of vigilance parameters and matching degrees permits similar training patterns to be combined automatically in the same subnet. Consequently, networks obtained by our method can be much smaller than those obtained by Hussain and Kabuka's method, and it is not required that training patterns be preselected by human experts. Moreover, with the incorporation of fuzzy techniques, the recognition rate of our method can be better than that of Hussain and Kabuka's method.

Hidden Markov models (HMM's) have recently been studied for handwritten character recognition to deal with uncertain and incomplete information due to confusing shapes, writer variability, geometric distortion, noise or minor deformation, and contextual effects [3], [7], [13], [14]. However, little effort has been devoted to feature development and to the incorporation of subcharacter HMM's into neural network schemes.

ACKNOWLEDGMENT

The authors appreciate the comments from the anonymous referees.

REFERENCES

[1] G. A. Carpenter, S. Grossberg, N. Markuzon, J. H. Reynolds, and D. B. Rosen, "Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps," IEEE Trans. Neural Networks, vol. 3, pp. 698–713, Sept. 1992.
[2] G. A. Carpenter, S. Grossberg, and J. H. Reynolds, "ARTMAP: Supervised real-time learning and classification of nonstationary data by a self-organizing neural network," Neural Networks, vol. 4, pp. 565–588, 1991.
[3] Z. Chi, M. Suters, and H. Yan, "Handwritten digit recognition using combined ID3-derived fuzzy rule and Markov chains," Pattern Recognit., vol. 29, pp. 1821–1834, Nov. 1996.
[4] P. W. Frey and D. J. Slate, "Letter recognition using Holland-type adaptive classifiers," Mach. Learn., vol. 6, pp. 161–182, 1991.
[5] K. Fukushima and S. Miyake, "Neocognitron: A neural network model for a mechanism of visual pattern recognition," IEEE Trans. Syst., Man, Cybern., vol. SMC-13, pp. 826–834, Sept. 1983.
[6] K. Fukushima and N. Wake, "Handwritten alphanumeric character recognition by the neocognitron," IEEE Trans. Neural Networks, vol. 2, May 1991.
[7] J. Hu, M. K. Brown, and W. Turin, "HMM based on-line handwriting recognition," IEEE Trans. Pattern Anal. Machine Intell., vol. 18, pp. 1039–1044, Oct. 1996.
[8] B. Hussain and M. R. Kabuka, "A novel feature recognition neural network and its application to character recognition," IEEE Trans. Pattern Anal. Machine Intell., vol. 16, pp. 98–106, Jan. 1994.
[9] J. J. Hopfield and D. W. Tank, "Neural computation of decision in optimization problems," Biol. Cybern., vol. 52, pp. 141–152, 1985.
[10] S. Lee and J. C.-J. Pan, "Unconstrained handwritten numeral recognition based on radial basis competitive and cooperative networks with spatio-temporal feature representation," IEEE Trans. Neural Networks, vol. 7, pp. 455–474, Mar. 1996.
[11] S. W. Lee, "Off-line recognition of totally unconstrained handwritten numerals using multilayer cluster neural network," IEEE Trans. Pattern Anal. Machine Intell., vol. 18, pp. 648–652, June 1996.
[12] Y. Li, D. Lopresti, G. Nagy, and A. Tomkins, "Validation of image defect models for optical character recognition," IEEE Trans. Pattern Anal. Machine Intell., vol. 18, pp. 99–108, June 1996.
[13] M. Mohamed and P. Gader, "Handwritten word recognition using segmentation-free hidden Markov modeling and segmentation-based dynamic programming techniques," IEEE Trans. Pattern Anal. Machine Intell., vol. 18, pp. 548–553, May 1996.
[14] H.-S. Park and S.-W. Lee, "Off-line recognition of large-set handwritten characters with multiple hidden Markov models," Pattern Recognit., vol. 29, pp. 231–244, Feb. 1996.
[15] H. J. Zimmermann, Fuzzy Set Theory and Its Applications, 2nd ed. Norwell, MA: Kluwer, 1991.