
Combining Statistical Pattern Recognition Approach with Neural Networks for Recognition of Large-Set Categories

Yoshimasa KIMURA, Toru WAKAHARA and Kazumi ODAKA
NTT Human Interface Laboratories, NIPPON TELEGRAPH AND TELEPHONE CORPORATION
1-1 Hikarinooka, Yokosuka-Shi, Kanagawa, 239 JAPAN
E-mail address k;[email protected]

Abstract

We present a two-stage hierarchical system consisting of a statistical pattern recognition (SPR) module and an artificial neural network (ANN) module to recognize a large number of categories including similar category sets. In the first stage, the SPR module performs classification. If the first candidate does not belong to a pre-determined similar category set, the first candidate is accepted as the final result; otherwise, the first candidate is sent to the ANN module. In the second stage, the ANN performs classification for similar categories to select a correct candidate from the pre-determined candidate set designated by the first candidate. The new scheme offers improved system performance by sharing tasks between SPR and ANN according to the degree of classification difficulty and by forming specialized ANNs for each similar category. The system achieves higher performance for the recognition of 3,201 handprinted characters than a traditional system constructed with just the SPR module.

1. Introduction

Multi-layer perceptron (MLP) trained with backpropagation (BP) [1] was originally applied to recognize small sets composed of similar categories whose feature distributions are close or whose decision boundaries are very complex [2],[3], because it is superior in separating close or complex distributions. The need to recognize large-set categories has spurred research into large-scale neural network systems based on multi-stage tree structures with artificial neural networks (ANN) on each node of each stage [4]-[7]. This research led to two combining schemes for ANNs [4]: the selection scheme [5]-[7] and the integration scheme [4]. However, both schemes use far too many ANNs. Increasing the number of ANNs makes it hard to adjust network size and the categories processable by the ANNs so as to minimize both the ANN recognition error and the combinatorial error caused by misselection or misintegration; these errors are mutually related and do not decrease simultaneously, which makes the problem more complex. Only empirical system designs have been examined, and none known to the authors produces the desired results.

Statistical pattern recognition (SPR) has been investigated for a long time and its effectiveness is well known. BP and SPR have been compared [8], but BP-SPR cooperation appears to be the more effective approach [9] because real-world recognition problems are too difficult to solve with a single method. Indeed, the SPR approach seems more suitable for the large-set category recognition problem, while the ANN approach is powerful for small-set categories; combining them is the most promising approach. This paper proposes a new scheme which combines SPR and ANN in a cascade structure and demonstrates its effectiveness through its application to handprinted character recognition.

2. Basic concept of combining scheme

0-7803-4122-8/97 $10.00 © 1997 IEEE


As the SPR approach, we choose the subspace method (SM) [10], which performs classification in a low-dimensional subspace generated from a high-dimensional feature space by linear transformation using principal component analysis (PCA). PCA is optimal in the sense that it approximates feature distributions so as to minimize the mean-square approximation error of inter-category. Moreover, since a subspace is formed for each individual category, it is easy to increase the number of categories. SM is therefore useful for the classification of large-set non-similar categories. However, SM is weak against similar categories because it does not use intra-category information. On the other hand, MLP possesses converse classification characteristics because it uses non-linear decision boundaries configured by training on feature distributions among intra-categories. Its categories are fixed, so it is difficult to add categories after training. This consideration suggests that combining SM and MLP is very promising because of their complementary characteristics.

Fig. 1 Combined SPR-ANN architecture. (Block diagram: input pattern → feature extractor → subspace method, which together form the SPR module → selector. The selector outputs the first candidate $C^{(1)}$ as the final result if $C^{(1)} \notin R$, or passes it to the ANN module if $C^{(1)} \in R$. The ANN module holds one MLP for each of the $K$ similar categories, each covering $n(k)$ candidate categories.)
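The classification rule of the subspace method can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a CLAFIC-style variant in which each category is represented by a single basis vector (the dominant eigenvector of that category's autocorrelation matrix, found by power iteration), and the function names `dominant_direction`, `train_subspaces` and `classify` are hypothetical. A practical system would use several basis vectors per category.

```python
import math
import random


def dominant_direction(samples, iters=50):
    """Unit vector spanning a 1-D subspace for one category.

    Power iteration on the autocorrelation matrix sum_x x x^T,
    which is applied implicitly and never formed.
    """
    dim = len(samples[0])
    rng = random.Random(0)
    v = [rng.random() + 0.1 for _ in range(dim)]
    for _ in range(iters):
        w = [0.0] * dim
        for x in samples:
            dot = sum(a * b for a, b in zip(x, v))
            for j in range(dim):
                w[j] += dot * x[j]  # accumulate (x . v) x
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    return v


def train_subspaces(data):
    """data maps a category label to its list of feature vectors.

    Each category is handled independently, so adding a category
    only requires one more call to dominant_direction.
    """
    return {label: dominant_direction(xs) for label, xs in data.items()}


def classify(subspaces, x):
    """Rank categories by squared projection length of x, best first."""
    scores = {
        label: sum(a * b for a, b in zip(x, u)) ** 2
        for label, u in subspaces.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

Because each subspace is computed from one category's samples only, the sketch also shows why the number of categories is easy to increase and why categories with nearly identical subspaces are hard to separate.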

3. SPR-ANN hierarchical architecture

A block diagram of the proposed hierarchical recognition system is shown in Fig. 1. In the first stage, the SPR module (SM type) takes features as its input, performs classification, then outputs the first candidate $C^{(1)}$ from the $N$ categories to be recognized. The selector determines the destination of $C^{(1)}$. If $C^{(1)}$ does not belong to the pre-determined similar category set $R=\{C_1, C_2, \ldots, C_K\}$, $C^{(1)}$ is passed through and accepted as the final result. If $C^{(1)}$ belongs to $R$, $C^{(1)}$ is sent to the ANN module. The ANN module consists of $K$ modules MLP$_k$ ($k=1,2,\ldots,K$), which are trained beforehand so as to accurately classify the pre-determined candidate set $S_k$ containing the $n(k)$ candidate categories. $C^{(1)}$ designates the MLP$_k$ that satisfies the condition $C^{(1)}=C_k$ among the $K$ MLPs. In the second stage, MLP$_k$ takes the features as its input, derives the recognition result from among the $n(k)$ candidates, then outputs it as the final result. Although the SPR module sends $C^{(1)}$ (i.e. $C_k$), the ANN module performs the role of verification. Namely, MLP$_k$ is required to output $C_k$ if $C_k$ is correct, or else to find the correct category hidden among the $n(k)$ candidates.

Key to the new combining scheme is the sharing of tasks between the SPR and ANN modules. The SPR module acts as both a classifier for non-similar categories and a designator which provides a candidate category set for similar categories. The ANN module acts as a classifier for similar categories.
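The two-stage decision described above can be sketched in a few lines of Python. The function and parameter names (`recognize`, `spr_rank`, `similar_mlps`) are assumptions made for illustration; the classifiers are passed in as plain callables so that the sketch shows only the routing logic, not any particular SM or MLP implementation.

```python
def recognize(features, spr_rank, similar_mlps):
    """Two-stage cascade decision.

    spr_rank: callable returning categories ranked best first (stage 1).
    similar_mlps: dict mapping each similar category C_k in R to a
        callable MLP_k that returns one category from its set S_k.
    """
    first = spr_rank(features)[0]       # first candidate C^(1)
    mlp_k = similar_mlps.get(first)     # selector: is C^(1) in R?
    if mlp_k is None:
        return first                    # not in R: accept as final result
    return mlp_k(features)              # in R: MLP_k verifies or corrects
```

Only one specialized MLP is evaluated per input, and only when the first candidate belongs to $R$, which keeps the second stage cheap.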

4. Design of SPR-ANN hierarchical architecture

It is necessary to establish a design procedure which determines the elements of $R$ and $S_k$, and the parameters $K$ and $n(k)$, to realize the system shown in Fig. 1. We assume that a category whose correct recognition rate is lower than a threshold is regarded as a similar category. The design procedure uses training samples as follows:

Step 1 Generate the SM module for each category.

Step 2 Generate $R$ by selecting the categories $C_k$ whose correct recognition rate $r(C_k)$% with SM is lower than the threshold $\beta$%, i.e. $R=\{C_k \mid r(C_k) < \beta\}$. $K$ is decided automatically by this process.

Step 3 Obtain the category series $C_k^i$ ($i=1,2,\ldots,N$) arranged in descending order according to the probability $p(C_k^i)$ that category $C_k^i$ is the correct category when $C_k$ appears as the first candidate of SM. In this case, the properties $p(C_k^i) \ge p(C_k^{i+1})$ and $\sum_{i=1}^{N} p(C_k^i) = 1$ hold. Generate $S_k$, which corresponds to $C_k$, by selecting $C_k^i$ in order until the cumulative probability of $p(C_k^i)$ for the top $m$ categories reaches, for the first time, the threshold $\rho$, i.e.

$$S_k = \left\{ C_k^1, C_k^2, \ldots, C_k^m \;\middle|\; \sum_{i=1}^{m-1} p(C_k^i) < \rho \le \sum_{i=1}^{m} p(C_k^i) \right\}.$$

The number of categories, $n(k)$, which MLP$_k$ must learn, is given as $n(k)=m$. $\rho$ is the inclusion rate with which the correct category exists among the top $n(k)$ categories.

Step 4 Train MLP$_k$ with BP using the samples which belong to $S_k$ until the recognition rate of $S_k$ exceeds the threshold $\gamma$%. If the recognition rate of $S_k$ cannot exceed $\gamma$% within the pre-determined number of training cycles $T$, the weights of MLP$_k$ at the point where the recognition rate of $S_k$ reached its maximum value are selected.

The proposed scheme is significant in that it separates the total recognition error into three types (SM error, selection error in MLP, and MLP error) and controls each error by manipulating the parameters $\beta$, $\rho$ and $\gamma$, respectively. This makes system design easy; that is, it provides one solution to the complex problem underlying the design of large-scale neural network systems.
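Steps 2 and 3 above can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code: it assumes the SM first candidate has already been computed for every training sample, takes $\beta$ and $\rho$ as fractions rather than percentages, and uses the hypothetical name `design_sets`.

```python
from collections import Counter, defaultdict


def design_sets(pairs, beta, rho):
    """Build the similar-category set R and the candidate sets S_k.

    pairs: iterable of (true_category, sm_first_candidate), one per
        training sample.
    beta:  recognition-rate threshold as a fraction (Step 2).
    rho:   inclusion-rate threshold as a fraction (Step 3).
    Returns a dict mapping each C_k in R to S_k, ordered by
    descending p(C_k^i). K is len(result); n(k) is len(result[C_k]).
    """
    per_true = Counter()            # samples per true category
    correct = Counter()             # correctly recognized samples
    by_first = defaultdict(Counter) # by_first[C_k][true category]
    for true, first in pairs:
        per_true[true] += 1
        correct[true] += (true == first)
        by_first[first][true] += 1

    sets = {}
    for c_k, total in per_true.items():
        if correct[c_k] / total >= beta:
            continue                # r(C_k) >= beta: not a similar category
        counts = by_first[c_k]
        n_first = sum(counts.values())
        if n_first == 0:
            continue                # C_k never appears as first candidate
        cumulative, s_k = 0, []
        for category, count in counts.most_common():
            s_k.append(category)
            cumulative += count
            if cumulative >= rho * n_first:
                break               # cumulative probability reached rho
        sets[c_k] = s_k
    return sets
```

Raising `rho` enlarges each $S_k$ (fewer selection errors but a harder task for each MLP), while raising `beta` enlarges $R$ and therefore the number of MLPs to train.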

5. Application to handprinted Kanji character recognition

Handprinted character recognition experiments were conducted to test the performance of the proposed system. They are a representative example of the large-set category recognition problem. 3,201 categories were examined, consisting of 2,965 Kanji characters, 166 Kana (Japanese syllabary) characters, 34 symbols and alphanumerics. The training data (1,350 samples of each category) and the test data (100 samples of each category) were written in the square style within a writing box of size 10 × 10 mm, and so contained relatively high-quality samples. However, they included similar-shaped characters, as shown in Fig. 2(a)-(b) and Fig. 2(c)-(d), which are well known as a difficult recognition problem. This task is reasonable for our purpose from the viewpoints of both category number and category similarity. Peripheral direction contributivity features [11] with a dimension of 256 were extracted from each datum and then used as inputs for classification.

The system used $N=3201$, $\beta=96\%$, $\rho=0.99$, $T=900{,}000$ and $\gamma=99.5\%$. This yielded $K=139$, an average value of $n(k)$ of 3.5, and a resulting inclusion rate of 0.992. We note that the schemes of Ref. [4] and Ref. [5] are limited to $K$ values in the range of 300-400 and average $n(k)$ values of 9-13. Since $\rho$, which is equivalent to the correct selection rate of MLP, is high, the combinatorial error is considered to have been solved. These desirable results are due to the introduction of SPR. Some examples of $S_k$ are illustrated in Table 1. They show that all $C_k^i$ are similar-shaped characters. This confirms that it is reasonable to consider categories whose correct recognition rates fall below $\beta$% as similar categories.

Table 1 Examples of $C_k$ and $S_k$

Fig. 2 Similar-shaped characters, panels (a) to (d)

We evaluate performance from two viewpoints.

(1) Individual performance of each module: The correct recognition rates of the samples which pass through the selector in Fig. 1 are 99.13% and 98.79% for the training data and the test data, respectively. Thus SM is sufficiently powerful for non-similar categories. The correct rates of MLP are 98.92% and 95.41% for the 139 categories in the training data and the test data, respectively, while the rates of SM are lower at 93.51% and 92.28% for the same data. MLP thus reduced the error rate by 83.3% and 40.5%, where the error reduction rate is the relative decrease in error rate; for example, (7.72 - 4.59)/7.72 ≈ 40.5% for the test data. More surprisingly, Table 2 shows that MLP achieved error reduction rates of 81.9% and 53.8% on the test data for two quite similar-shaped characters, whereas the average rate is 40.5%. These results confirm that each module performs well in its appointed task and that the specialized MLP for individual similar categories is remarkably effective.

Table 2 Recognition rates and error reduction rates for two similar-shaped input characters

input character   data       recognition rate (SM)   recognition rate (MLP)   error reduction rate
first             training   86.2%                   99.4%                    95.7%
first             test       76.3%                   95.7%                    81.9%
second            training   87.0%                   99.2%                    93.8%
second            test       84.0%                   92.6%                    53.8%

(2) Overall system performance: The correct recognition rates of the proposed system are 99.09% and 98.59% for the training data and the test data, respectively. The rates of the traditional system consisting only of SM (constructed by eliminating the ANN module from the system shown in Fig. 1) are 98.85% and 98.45%, respectively. This means that the proposed system reduces the error rate by 20.6% and 8.8% from that of the traditional system. These results show that the new scheme increases system performance.

6. Conclusion

We have presented a new scheme which hierarchically combines an SPR module and an ANN module. The idea is that SPR classifies non-similar categories and provides a candidate category set for similar categories, and ANN classifies similar categories. The idea also establishes a design method that can control the three components of the total recognition error. We confirmed through experiments on large-set character recognition that the proposed system performs as well as expected. The important conclusion of this research is that sharing tasks between SPR and ANN according to the degree of classification difficulty and creating specialized MLPs for similar categories are useful in solving the large-set category recognition problem. The proposed scheme is very viable and should lead to various practical applications because of its simplicity and generality.

References

[1] D. E. Rumelhart, J. L. McClelland and the PDP Research Group, Parallel Distributed Processing, The MIT Press, 1986.
[2] Y. Mori and K. Yokozawa, Neural networks that learn to discriminate similar Kanji characters, Advances in NIPS, Vol. 1, pp. 332-339, Morgan Kaufmann Publishers Inc., 1989.
[3] Y. Kimura, Distorted handwritten Kanji character pattern recognition by a learning algorithm minimizing output variation, Proc. of IJCNN'91-Seattle, Vol. I, pp. 103-106, 1991.
[4] Y. Mori and K. Joe, A large-scale neural network which recognizes handwritten Kanji characters, Advances in NIPS, Vol. 2, pp. 415-422, Morgan Kaufmann Publishers Inc., 1990.
[5] A. Iwata, Y. Suwa, Y. Ino and N. Suzumura, Hand-written alpha-numeric recognition by a self-growing neural network "CombNET-II", Proc. of IJCNN'92-Baltimore, Vol. IV, pp. 228-234, 1992.
[6] H.-H. Song and S.-W. Lee, A self organizing neural tree for large-set pattern classification, Proc. of the Int. Conf. on Doc. Anal. and Recog., Montreal, pp. 1111-1114, 1995.
[7] T. Ohhira, N. Recharanin, A. Taguchi, N. Iijima, Y. Akima and M. Sone, Chinese character recognition by the auto recognition system, Proc. of ICNN'95, Perth, Vol. 5, pp. 2222-2225, 1995.
[8] P. Gallinari, S. Thiria and F. Fogelman Soulie, Multi-layer perceptrons and data analysis, Proc. of ICNN'88, Vol. I, pp. 391-399, 1988.
[9] A. K. Jain, Neural networks and pattern recognition, IEEE World Congress on Computational Intelligence, Orlando, Plenary talks, June 30, 1994.
[10] E. Oja, Subspace method and pattern recognition, Research Studies Press Ltd., 1983.
[11] T. Akiyama and N. Hagita, Automated entry system for printed documents, Pattern Recognition, Vol. 23, No. 11, pp. 1141-1154, 1990.