Unconstrained Handwritten Character Recognition Using Different Classification Strategies

Alessandro L. Koerich
Department of Computer Science (PPGIA)
Pontifical Catholic University of Paraná (PUCPR)
Curitiba, PR, Brazil
E-mail: [email protected]

Abstract

In this paper we tackle the problem of unconstrained handwritten character recognition using different classification strategies. To this end, four multilayer perceptron (MLP) classifiers are built and used in three different classification strategies: the combination of two 26–class classifiers, a 26–metaclass classifier, and a 52–class classifier. Experimental results on the NIST SD19 database show that the best recognition performance is achieved by the metaclass classifier, in which the uppercase and lowercase representations of each character are merged into a single class.

1 Introduction

The recognition of handwritten characters is very challenging and has been the subject of much attention in the field of handwriting recognition. Several proposals to solve this problem have been presented in recent years [2, 4, 6, 8, 9, 10]. However, most research efforts have focused on the recognition of digits. In fact, digit recognition is just a subproblem, for which the solutions are much simpler and more robust. The recognition of alphabetic characters is more complicated. The most obvious difference is the number of classes, which can be up to 52, depending on whether uppercase (A–Z) and lowercase (a–z) characters are distinguished from each other. Consequently, alphabetic characters present many more ambiguities than numerals. Character recognition is further complicated by other factors such as multiple patterns representing a single character, cursive representations of letters, and the number of disconnected and multi–stroke characters. Most of the approaches that have been proposed for the recognition of letters focus on specific writing styles: uppercase or lowercase characters [2, 8, 10]. The reported accuracy is between 90% and 98% for uppercase characters and between 80% and 90% for lowercase characters, depending on the testing conditions and database used.

A more difficult problem is the recognition of unconstrained characters, because the writing style is not known a priori. This is a very relevant problem because most practical applications do not give any hint about the writing style, so the recognition system itself has to detect and manage different writing styles. The difficulties in recognizing unconstrained handwritten characters arise from the fact that there are characters which belong to different classes but have very similar shapes (e.g. “O” and “D”, “q” and “g”, “I” and “l”). The accuracy reported for the recognition of unconstrained handwritten characters has been between 59% and 83% [1, 2, 6, 7, 12].

Neural network (NN) classifiers have been used extensively in character recognition [1, 3, 4, 5, 6, 8], and many recognition systems are based on multilayer perceptrons (MLPs) [1, 3, 4, 5, 8]. Gader et al. [4] describe an algorithm for handprinted word recognition that uses four 27–output, four–layer backpropagation networks to account for uppercase and lowercase characters. Recognition rates of 94% and 82% were achieved for uppercase and lowercase characters respectively. A similar approach with a different feature set was proposed in [5], achieving recognition rates of 86.24% for uppercase characters and 83.45% for lowercase characters. Instead of using separate networks for uppercase and lowercase characters, Blumenstein and Verma [1] used a single 52–output neural network representing 26 uppercase and 26 lowercase characters. Case–sensitive and case–insensitive experiments were conducted, and the recognition accuracy achieved was almost 60%. Dong et al. [3] presented a local learning framework, consisting of a quantization layer and an ensemble layer, for the recognition of lowercase handwritten characters. Such an approach achieved a recognition rate of 92.34% on a cleaned subset of the NIST database. Pedrazzi and Colla [8] presented a simple feature set for handprinted character representation, built by combining average pixel density with measures of local alignment along some directions. Classification is carried out by an MLP, and recognition rates of 96.08% and 87.40% are achieved for uppercase and lowercase characters respectively.

Is unconstrained character recognition a 26–class or a 52–class problem? In fact, for most practical applications it is not important to know the writing style, but only which letter was written. In this paper we investigate the recognition of unconstrained handwritten characters using different numbers of classes and different classification strategies. Section 2 presents the feature set used to represent characters. Section 3 presents some strategies to build and combine classifiers according to the number of classes at the output. The experimental results are presented in Section 4. Some conclusions are drawn in Section 5.

2 Feature Extraction

Several different types of features have been proposed to represent handwritten characters [2, 4, 6, 8, 9, 10, 11]. We have developed several types of features such as surface, extrema, orientation, eccentricity, H/W ratio, and different zonings. We carried out some exploratory experiments to determine which combination of features achieves the best recognition rates on the NIST SD19 database. The recognition rate was not the only criterion; we also took into account the dimension of the resulting feature vector, where smaller is better. This empirical evaluation led us to build a 108–dimensional feature vector by combining 3 different types of features: profiles, projection histograms, and contour–directional histograms.

Profiles. The profile counts the number of pixels (distance) between the bounding box of the character image and the edge of the character. The profiles describe the external shapes of characters well and allow us to distinguish between a great number of letters, such as “p” and “q”. Since the profiles depend on the image dimensions, the features are made scale independent by normalizing the profiles to ten bins at each axis, so that all characters have an equal number of elements.¹ The profiles are taken at 4 positions: top, bottom, left and right hand sides, as illustrated in Figure 1.

Projection Histograms. Projection histograms count the number of pixels in each column and row of a character image [11]. Projection histograms can separate characters such as “m” and “n” (3 and 2 peaks in the vertical projection respectively) or “E” and “F” (3 and 2 peaks in the horizontal projection respectively). The projection histograms are taken along the vertical and horizontal axes and are made scale independent by normalizing to ten bins at each axis, so that all characters have an equal number of elements. Figure 2 shows the vertical and horizontal projection histograms for the letter “a”.

¹The number of bins was determined empirically by exploratory experiments on the validation dataset, where the character recognition rate and the dimensionality were used as evaluation criteria.

Figure 1. The four projection profiles for the letter “a”: top, bottom, left and right hand side profiles.

Figure 2. The vertical and horizontal projection histograms for the letter “a”.
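To make the geometry of these two feature groups concrete, the following is a minimal NumPy sketch of the profile and projection features, assuming a binary character image with ink pixels equal to 1. The function names, the bin-resampling scheme, and the cropping details are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def _resample(v, bins=10):
    # Average consecutive values into a fixed number of bins (scale independence).
    # Assumes len(v) >= bins, i.e. character images larger than 10x10.
    idx = np.linspace(0, len(v), bins + 1).astype(int)
    return np.array([v[a:b].mean() for a, b in zip(idx[:-1], idx[1:])])

def profile_features(img, bins=10):
    """Top/bottom/left/right profiles of a binary image (1 = ink): 4 x bins values.
    Assumes every row/column inside the bounding box contains some ink."""
    ys, xs = np.nonzero(img)
    box = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]  # crop to bounding box
    # Distance from each side of the bounding box to the first ink pixel.
    left, right = np.argmax(box, axis=1), np.argmax(box[:, ::-1], axis=1)
    top, bottom = np.argmax(box, axis=0), np.argmax(box[::-1, :], axis=0)
    return np.concatenate([_resample(p, bins) for p in (top, bottom, left, right)])

def projection_features(img, bins=10):
    """Vertical and horizontal projection histograms: 2 x bins values."""
    ys, xs = np.nonzero(img)
    box = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    vertical = box.sum(axis=0)    # ink pixels per column
    horizontal = box.sum(axis=1)  # ink pixels per row
    return np.concatenate([_resample(vertical, bins), _resample(horizontal, bins)])
```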

Contour–Directional Histogram. The contour of the character image is given by the outer and inner boundary pixels, which can easily be found by examining each pixel within a 3×3 window. Figure 3 shows the contour for the letter “a”. The resulting contour is divided into 3×2 zones, as shown in Figure 3 [11]. For each of these zones the contour is followed and a directional histogram is obtained by analyzing the adjacent pixels in the 3×3 neighborhood of each contour pixel. The goal of the zoning is to obtain local characteristics instead of global characteristics.

Figure 3. The contour extracted from the letter “a” and the contour split in 6 parts corresponding to the 6 zones (3×2).
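The paper does not break down the 108 dimensions per feature type, but the numbers are consistent with 4×10 profile bins, 2×10 projection bins, and 6 zones × 8 contour directions, i.e. 40 + 20 + 48 = 108. Below is a hedged sketch of the zoned directional histogram under that assumption; the 8-direction (chain-code style) encoding and the zone indexing are illustrative choices, not the paper's stated implementation.

```python
import numpy as np

# 8-connected neighbor offsets, one per direction (Freeman chain-code style).
DIRS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

def contour_mask(img):
    """Contour pixels: ink pixels with at least one background pixel in their
    3x3 neighborhood (captures both outer and inner boundaries)."""
    h, w = img.shape
    pad = np.pad(img, 1)
    has_bg_neighbor = np.zeros((h, w), dtype=bool)
    for dy, dx in DIRS:
        has_bg_neighbor |= pad[1 + dy:1 + dy + h, 1 + dx:1 + dx + w] == 0
    return img.astype(bool) & has_bg_neighbor

def contour_directional_features(img, zone_rows=3, zone_cols=2):
    """Per-zone histogram of directions between adjacent contour pixels:
    zone_rows x zone_cols x 8 = 48 values for the 3x2 zoning."""
    c = contour_mask(img)
    h, w = img.shape
    hist = np.zeros((zone_rows * zone_cols, 8))
    for y, x in zip(*np.nonzero(c)):
        zone = (y * zone_rows // h) * zone_cols + (x * zone_cols // w)
        for d, (dy, dx) in enumerate(DIRS):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and c[ny, nx]:
                hist[zone, d] += 1  # adjacent contour pixel in direction d
    return hist.ravel()
```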

3 Design of the Character Classifier

We have designed a simple unconstrained character recognizer based on a multilayer perceptron (MLP) with one hidden layer. The choice of this classifier for the character recognition task is motivated by some constraints, namely the estimation of a posteriori probabilities at the output and recognition speed. To build an MLP classifier we basically have to determine the number of layers and the number of neurons in each layer. The number of hidden neurons was determined by a rule of thumb and some exploratory experiments, where the error rates on the training and validation sets were used as criteria. The network outputs estimate a posteriori probabilities, and the value of each output necessarily remains between zero and one because of the sigmoidal function used. Many different classification strategies could be used to recognize unconstrained handwritten characters; in the scope of this paper, we have considered the following:

• A 52–class classification problem: uppercase and lowercase representations of a single character are considered different classes (e.g. “A” and “a” are two distinct classes). A network with 52 outputs was designed (A–Z, a–z);

• A 26+26–class classification problem: uppercase and lowercase representations of a single character are considered different classes. Two networks with 26 outputs each, one for uppercase and one for lowercase characters, were designed, and their outputs are combined by several rules;

• A 26–class classification problem: uppercase and lowercase representations of a single character are merged into a unique class called a metaclass (e.g. “A” and “a” form the metaclass “Aa”). One network with 26 outputs was designed.

A question that may arise is whether it is really necessary to consider a 52–class problem. In many practical applications it is not important to recognize whether a character is uppercase or lowercase. Furthermore, some character shapes are very similar, like an uppercase “O” and a lowercase “o”, an uppercase “V” and a lowercase “v”, an uppercase “C” and a lowercase “c”, an uppercase “I” and a lowercase “i”, etc. So, it seems wasteful to use up network resources to account for cases where the separation between classes does not make sense in practice. On the other hand, some character shapes are not at all similar, such as an uppercase “A” and a lowercase “a”, an uppercase “E” and a lowercase “e”, an uppercase “G” and a lowercase “g”, etc. So, the strategy of merging all uppercase and lowercase classes may also introduce some confusions into the recognition process.
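The three strategies amount to three network configurations. The sketch below uses scikit-learn's MLPClassifier as a stand-in for the paper's backpropagation-trained MLPs, with the hidden-layer sizes reported in Section 4; the library choice and hyperparameters such as max_iter are assumptions, not the original setup.

```python
from sklearn.neural_network import MLPClassifier

def make_mlp(n_hidden):
    # One hidden layer with sigmoid units, so each output stays in (0, 1).
    return MLPClassifier(hidden_layer_sizes=(n_hidden,), activation="logistic",
                         solver="sgd", max_iter=500, early_stopping=True)

# Strategy 1: one 52-class network, labels 'A'-'Z' and 'a'-'z' kept distinct.
nn52_upper_lower = make_mlp(150)

# Strategy 2: two specialized 26-class networks, combined at decision time.
nn26_upper = make_mlp(100)  # trained on uppercase samples only
nn26_lower = make_mlp(100)  # trained on lowercase samples only

# Strategy 3: one 26-metaclass network, labels case-folded ('A', 'a' -> 'Aa').
nn26_upper_lower = make_mlp(100)
```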

Table 1. Recognition rates for the uppercase classifier and the lowercase classifier on the respective datasets.

Dataset                   NN26Lower (%)   NN26Upper (%)
Lowercase   Training          95.82             —
Lowercase   Validation        90.06             —
Lowercase   Test              86.73             —
Uppercase   Training            —             97.87
Uppercase   Validation          —             93.60
Uppercase   Test                —             92.49

Our goal here is to determine which classification strategy is the most appropriate for unconstrained handwritten character recognition. To this end, several experiments and analyses are carried out in the next section.

4 Experimental Results and Analysis

The recognition strategies proposed in the previous section were implemented and tested on the NIST SD19 database, which contains 814,255 binary alphanumeric characters. All experiments were conducted on a PC AMD Athlon 1.1GHz with 512MB of RAM, and the average throughput is 4,770 characters per second. From the hsf0, hsf1, hsf2, and hsf3 sets of the NIST database, 1,440 samples per character class (A–Z, a–z) were taken randomly for training the classifiers using the backpropagation algorithm. Three different feature vector sets were generated: one composed of 37,440 uppercase characters (A–Z), one composed of 37,440 lowercase characters (a–z), and one composed of 74,880 characters resulting from merging both previous datasets. From the hsf7 set, three feature vector sets were generated to be used as validation sets during the training procedure, to monitor generalization and to stop the training at the minimum of the error. The first validation set is composed of 12,092 uppercase characters, the second of 11,578 lowercase characters, and the last, the combination of both previous datasets, of 23,670 characters. From the hsf4 set, three further feature vector sets were generated to test the performance of the classifiers. The first testing set has 11,941 uppercase characters, the second is composed of 12,000 lowercase characters, and the last, the combination of both previous datasets, has 23,941 characters.

First, two 26–class classifiers were built, one for uppercase characters (NN26Upper) and one for lowercase characters (NN26Lower). Both classifiers have 100 units in the hidden layer and 26 outputs. These classifiers were trained with the 37,440 uppercase characters (A–Z) and the 37,440 lowercase characters (a–z) respectively. Table 1 shows the recognition rates achieved by both classifiers on the respective training, validation and test datasets. In this experiment we have recognized uppercase and lowercase characters separately just to check the performance of the feature set on both datasets.
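As a rough illustration of the dataset assembly described above, here is a sketch that draws 1,440 random samples per class and builds the uppercase, lowercase, and merged training sets. load_nist is a hypothetical loader, not from the paper or any library, assumed to return a feature matrix and a label vector.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_per_class(X, y, n=1440):
    """Draw n random samples from each class, without replacement."""
    idx = np.concatenate([rng.choice(np.flatnonzero(y == c), size=n, replace=False)
                          for c in np.unique(y)])
    return X[idx], y[idx]

# load_nist(...) is a hypothetical helper returning (features, labels).
X_up, y_up = sample_per_class(*load_nist("hsf0-3", classes="A-Z"))  # 26 x 1,440 = 37,440
X_lo, y_lo = sample_per_class(*load_nist("hsf0-3", classes="a-z"))  # 37,440
X_all, y_all = np.vstack([X_up, X_lo]), np.concatenate([y_up, y_lo])  # 74,880 merged
```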

The recognition rates achieved are satisfactory. However, our goal is to recognize unconstrained characters, that is, the recognition system does not know a priori the writing style of the characters. So, the outputs of the NN26Upper and NN26Lower classifiers were combined by three combination rules: average, maximum and product. The recognition rates achieved by the combination of both classifiers are shown in Table 2. The results are case insensitive and were obtained on the merged datasets (uppercase+lowercase).

Next, a metaclass classifier (NN26UpperLower) was designed, with 100 units in the hidden layer and 26 outputs. For this classifier, we have considered uppercase and lowercase representations of characters as belonging to the same classes (metaclasses). This classifier was trained and validated with the merged datasets composed of 74,880 and 23,670 characters respectively. The recognition rates achieved by the NN26UpperLower classifier are shown in Table 2.

Finally, a 52–class classifier (NN52UpperLower) was designed, with 150 units in the hidden layer and 52 outputs. This classifier was also trained and tested with the merged datasets. The output of this classifier was evaluated in two ways: case sensitive and case insensitive. In the first condition, we have considered whether both the character class and the case were recognized correctly (e.g. “A” is recognized as “A”). In the second condition, we have only considered whether the character class was recognized correctly (e.g. “A” is recognized as “A” or “a”). The recognition rates achieved by this classifier are also shown in Table 2.

From the results shown in Table 2 it is clear that the metaclass classifier (NN26UpperLower) provides better recognition rates than the other classification strategies. The results achieved by the 52–class classifier are also very good, even though more classes are involved in the network training. On the other hand, the combination of two specialized classifiers did not produce good results. It would be rash to conclude that combination is not a good strategy; other combination rules might produce better results.
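The three fusion rules are straightforward to express over the per-class posteriors of the two specialized networks. A minimal sketch, assuming the two 26-dimensional output vectors are aligned so that column i of each refers to the same letter of the alphabet:

```python
import numpy as np

def combine(p_upper, p_lower, rule="average"):
    """Case-insensitive fusion of two aligned 26-class posterior arrays of
    shape (n_samples, 26); returns the index of the winning letter."""
    if rule == "average":
        p = (p_upper + p_lower) / 2.0
    elif rule == "maximum":
        p = np.maximum(p_upper, p_lower)
    elif rule == "product":
        p = p_upper * p_lower
    else:
        raise ValueError(f"unknown rule: {rule}")
    return p.argmax(axis=1)

# e.g. with the scikit-learn stand-ins sketched in Section 3:
# letters = combine(nn26_upper.predict_proba(X_test),
#                   nn26_lower.predict_proba(X_test), rule="product")
```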

4.1 Analysis of Confusions

It is interesting to investigate why the recognition scheme based on metaclasses performed better than the others. To this end, we look at the effects of the metaclasses on the individual classes. Table 3 shows the recognition rates achieved by the metaclass classifier for each metaclass. The metaclass that merges uppercase and lowercase “a’s” is denoted as Â. Table 3 also shows the average recognition rate achieved by the 52–class classifier, where we take the average of the recognition rates for the uppercase and the lowercase characters; the average recognition rate of the uppercase and lowercase “a’s” is denoted as “A-a”. The last column of the table shows, for each class, the difference in recognition rate between the 26–metaclass classifier and the average of the uppercase and lowercase outputs of the 52–class classifier.

Table 3. Recognition rates for each class on the training dataset for the metaclass classifier and for the 52–class classifier.

Metaclass   Rate (%)   52-Class   Avg. Rate (%)   Difference (%)
Â            95.38      A-a          91.08             4.30
B̂            97.60      B-b          96.66             0.94
Ĉ            97.15      C-c          67.53            29.62
D̂            96.77      D-d          95.00             1.77
Ê            97.29      E-e          96.10             1.19
F̂            97.15      F-f          69.27            27.88
Ĝ            89.65      G-g          82.46             7.19
Ĥ            95.35      H-h          94.76             0.59
Î            87.01      I-i          70.28            16.73
Ĵ            95.14      J-j          85.14            10.00
K̂            97.78      K-k          74.16            23.62
L̂            84.97      L-l          78.48             6.49
M̂            98.51      M-m          69.13            29.38
N̂            97.08      N-n          94.34             2.74
Ô            96.32      O-o          62.98            33.34
P̂            98.16      P-p          77.22            20.94
Q̂            92.95      Q-q          90.13             2.82
R̂            97.08      R-r          95.31             1.77
Ŝ            97.57      S-s          66.14            31.43
T̂            98.23      T-t          94.44             3.79
Û            95.00      U-u          73.12            21.88
V̂            96.04      V-v          67.25            28.79
Ŵ            98.19      W-w          81.21            16.98
X̂            97.15      X-x          76.84            20.31
Ŷ            96.01      Y-y          75.69            20.32
Ẑ            98.96      Z-z          79.59            19.37

We can observe that for some classes the difference is very significant (e.g. over 10%), while for others it is relatively small. Why such differences? They can be viewed as a hint showing which classes are more suitable for merging. For example, there is no advantage in merging uppercase and lowercase “a’s”; on the other hand, merging uppercase and lowercase “c’s” seems very interesting, because many confusions can be eliminated at the classification level. This conclusion agrees with the differences in shape: uppercase and lowercase “c’s” have similar shapes, and considering them as different classes causes many confusions, while uppercase and lowercase “b’s” have quite different shapes, so merging both representations into a single metaclass will not bring any advantage. Such observations suggest that neither a 26–metaclass nor a 52–class problem is the most appropriate approach to deal with unconstrained handwritten characters. Possibly, a strategy in between the two, which merges only the uppercase and lowercase classes that have similar shapes, would achieve better recognition results.
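One way to operationalize this suggestion is to merge only the classes where the metaclass gain in Table 3 is large. A small sketch over the difference column, taking the "over 10%" remark above as an illustrative cutoff (the threshold itself is an assumption, not a result from the paper):

```python
# Per-class gain (%) of the metaclass over the 52-class average, from Table 3.
gain = {"A": 4.30, "B": 0.94, "C": 29.62, "D": 1.77, "E": 1.19, "F": 27.88,
        "G": 7.19, "H": 0.59, "I": 16.73, "J": 10.00, "K": 23.62, "L": 6.49,
        "M": 29.38, "N": 2.74, "O": 33.34, "P": 20.94, "Q": 2.82, "R": 1.77,
        "S": 31.43, "T": 3.79, "U": 21.88, "V": 28.79, "W": 16.98, "X": 20.31,
        "Y": 20.32, "Z": 19.37}

# Merge upper/lower case only where the metaclass clearly helps.
merged = sorted(c for c, g in gain.items() if g > 10.0)
kept_separate = sorted(set(gain) - set(merged))
print(merged)         # ['C', 'F', 'I', 'K', 'M', 'O', 'P', 'S', 'U', 'V', 'W', 'X', 'Y', 'Z']
print(kept_separate)  # ['A', 'B', 'D', 'E', 'G', 'H', 'J', 'L', 'N', 'Q', 'R', 'T']
```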

Table 2. Recognition rates achieved by the combination of the specialized classifiers, by the metaclass classifier (NN26UpperLower) and by the 52–class classifier, on the merged (uppercase + lowercase) datasets.

              Combination NN26Upper & NN26Lower    NN26UpperLower*   NN52UpperLower
Dataset       Average*   Maximum*   Product*                         Case Sens.   Case Insens.
Training       89.90      90.32      77.64             95.71            80.94        93.83
Validation     85.95      86.09      75.26             91.24            73.60        89.55
Test           82.60      82.92      72.86             87.79            69.09        86.34

*The results are case insensitive.

Table 4. Summary of results in isolated handwritten character recognition.

Reference         Database   # Samples   Case Sens. (%)   Case Insens. (%)
Blumenstein [1]   CEDAR        1,212         56.11             58.50
Camastra [2]      CEDAR       16,213         NA                81.72
Kimura [6]        USPS         9,539         76.31             83.87
Kimura [7]        CEDAR        NA            61.30             70.03
Yamada [12]       CEDAR        1,219         67.80             75.70

5 Conclusion

We have evaluated three classification strategies to recognize unconstrained handwritten characters. The experiments carried out on the NIST database to assess the performance of each classification strategy have shown that it is preferable to deal with a 26–class classification problem, where uppercase and lowercase characters are merged into metaclasses, even though for some character classes the shapes of the uppercase and lowercase characters vary significantly. This may be due to the amount of data used to train each metaclass, which is twice the amount of data used to train each class in the other classifiers. Table 4 presents the performance of some other character recognition systems. The recognition rates achieved in this paper compare favorably with the results reported in other works, which vary from 59% to 83%. Notice, however, that a direct comparison is not fair, since the results were obtained under different test conditions and on different datasets. Besides the good performance in terms of recognition rate, the recognition strategy based on the metaclass classifier is simple and requires short training and recognition times. It is important to notice that the features used in this work were not adapted to metaclasses. Further studies are necessary to find a feature set that better represents the intra–class variation (upper–lower) of the shapes within the metaclasses. We expect that such a feature set will further improve the recognition rates. Finally, we believe that the metaclass strategy can be further improved by merging not all uppercase and lowercase representations, but only those that have similar shapes and those that introduce many errors when considered separately.

References

[1] M. Blumenstein and B. Verma. Neural–based solutions for the segmentation and recognition of difficult handwritten words from a benchmark database. In Proc. 5th International Conference on Document Analysis and Recognition, pages 281–284, Bangalore, India, 1999.

[2] F. Camastra and A. Vinciarelli. Cursive character recognition by learning vector quantization. Pattern Recognition Letters, 22:625–629, 2001.

[3] J. Dong, A. Krzyzak, and C. Y. Suen. Local learning framework for recognition of lowercase handwritten characters. In Proc. International Workshop on Machine Learning and Data Mining in Pattern Recognition, to appear, Leipzig, Germany, 2001.

[4] P. Gader, M. Whalen, M. Ganzberger, and D. Hepp. Handprinted word recognition on a NIST data set. Machine Vision and Applications, 8:31–41, 1995.

[5] P. D. Gader, M. A. Mohamed, and J. H. Chiang. Handwritten word recognition with character and inter–character neural networks. IEEE Transactions on Systems, Man and Cybernetics – Part B, 27:158–164, 1997.

[6] F. Kimura, N. Kayahara, Y. Miyake, and M. Shridhar. Machine and human recognition of segmented characters from handwritten words. In Proc. 4th International Conference on Document Analysis and Recognition, pages 866–869, Ulm, Germany, 1997.

[7] F. Kimura, M. Shridhar, and Z. Chen. Improvements of a lexicon directed algorithm for recognition of unconstrained handwritten words. In Proc. International Conference on Document Analysis and Recognition, pages 18–22, Tsukuba, Japan, 1993.

[8] P. Pedrazzi and A. M. Colla. Simple feature extraction for handwritten character recognition. In Proc. International Conference on Image Processing, pages 320–323, Washington, USA, 1995.

[9] S. N. Srihari. Recognition of handwritten and machine–printed text for postal address interpretation. Pattern Recognition Letters, 14:291–302, 1993.

[10] H. Takahashi and T. D. Griffin. Recognition enhancement by linear tournament verification. In Proc. International Conference on Document Analysis and Recognition, pages 585–588, Tsukuba, Japan, 1993.

[11] O. D. Trier, A. K. Jain, and T. Taxt. Feature extraction methods for character recognition. Pattern Recognition, 29(4):641–662, 1996.

[12] H. Yamada and Y. Nakano. Cursive handwritten word recognition using multiple segmentation determined by contour analysis. IEICE Transactions on Information and Systems, E79-D(5):464–470, 1996.