
Convolutional Neural Network Training incorporating Rotation based Generated Patterns and Handwritten Numeral Recognition of Major Indian Scripts

M. A. H. Akhand1*, Mahtab Ahmed1, M. M. Hafizur Rahman2, Md. Monirul Islam3

1 Dept. of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna, Bangladesh
2 Dept. of Computer Science, KICT, International Islamic University, Malaysia, Selangor, Malaysia
3 Dept. of Computer Science and Engineering, Bangladesh University of Engineering & Technology, Dhaka, Bangladesh

* Corresponding: [email protected] & [email protected]

Abstract: Recognition of handwritten numerals has gained much interest in recent years because of its many potential applications. Bangla and Hindi are the two major languages of the Indian subcontinent, and a large population across this vast region uses the Bangla and Devnagari numeral scripts of these two languages. Accurate recognition of handwritten Bangla and Devnagari numerals is challenging because both scripts contain similarly shaped numerals; in some cases a numeral differs from a similar one by only a slight variation, even in printed form. In this study, two convolutional neural network (CNN) based methods have been investigated for better recognition of handwritten Bangla and Devnagari numerals. Both methods use rotation-based generated patterns along with ordinary patterns to train CNNs, but in two different modes. In the multiple CNN case, three training sets are prepared (one with ordinary patterns and two with clockwise and anticlockwise rotation-based generated patterns); three CNNs are then trained individually, one on each training set, and their decisions are combined into the final system decision. In the single CNN case, the combination of the above three training sets is used to train one CNN. A moderate pre-processing step is also employed while generating patterns from the scanned images. The proposed methods have been tested on a large benchmark handwritten numeral dataset, and their outcomes were compared with existing prominent methods. The proposed methods achieved high recognition accuracy, outperforming the existing methods in both training and test set accuracy. Moreover, the presented experimental results clearly show the effectiveness of incorporating rotation-based generated patterns to improve CNN performance.

Keywords: Convolutional Neural Network, Handwritten Numeral Recognition, Image Pre-processing.

1. Introduction

Recognition of handwritten numerals has gained much interest in recent years because of its potential applications in postal system automation, passport and document analysis, automatic bank cheque processing and even number plate identification [1]. Research on recognition of unconstrained handwritten numerals has made impressive progress for Roman, Chinese and Arabic scripts [1, 2, 3]. In contrast, recognition of handwritten Indian numerals has been largely neglected. Bangla and Hindi are the two important languages of the Indian subcontinent, and a large population across this vast region uses the Bangla and Devnagari numeral scripts of these two languages. Bangla and Devnagari numerals are composed mainly of curves and holes. Both scripts contain similarly shaped numerals; in some cases, even in printed form, it is very difficult to distinguish two numerals that differ only by a slight variation. Because of this ambiguity, recognition of Bangla and Devnagari numerals is a challenging task. The objective of the present work is to develop a recognition scheme for handwritten Bangla and Devnagari numerals that provides high recognition accuracy. Towards this goal, a Convolutional Neural Network (CNN) is trained for recognition; CNNs have been found to be very efficient for image classification because of their distinct features, including automatically providing some degree of translation invariance [4, 5, 6]. To achieve better recognition accuracy, training incorporates rotation-based generated patterns in two different modes, identified as the single CNN mode and the multiple CNN mode. In the multiple CNN mode, three CNNs (having the same architecture) are trained with three different training sets: one CNN is trained with the conventional training set, whereas the other two are trained with different rotation-based patterns. In the single CNN mode, the combination of the three training sets used in the multiple CNN mode is used to train a single CNN. The benchmark handwritten numeral image dataset maintained by the CVPR Unit, ISI, Kolkata [7, 8, 9] is considered in this study. The method showed satisfactory recognition accuracy on the benchmark dataset and outperformed other prominent existing methods. The rest of the paper is organized as follows.
Section 2 briefly reviews prominent works on Bangla and Devnagari numeral recognition. Section 3 describes the Bangla and Devnagari benchmark handwritten numeral image datasets and the pre-processing used to prepare patterns. Section 4 explains the proposed CNN based Bangla and Devnagari numeral recognition system incorporating rotation-based patterns; the section also reviews the CNN structure used for classification. Section 5 presents experimental results of the proposed method and a performance comparison with other related works. Finally, a brief conclusion of the work is given in Section 6.

2. Literature Review

The Indian subcontinent is rich in languages. India alone has several official languages, and Hindi, written in the Devnagari script, is a major one among them. Bangla is the first language of Bangladesh and the second most widely spoken language in India. Bangla numerals are also used in the Assamese, Bishnupriya, Manipuri and Meithei languages. Although a large population across this vast region uses the Bangla and Devnagari numeral scripts, studies on these scripts are few compared with those on other major languages such as English. The following subsections briefly describe several prominent works on handwritten Bangla and Devnagari numeral recognition.

2.1. Brief Survey of Bangla Handwritten Numeral Recognition

A few notable works are available on Bangla handwritten numeral recognition. Bashar et al. [10] investigated a recognition system based on windowing and histogram techniques: the windowing technique extracts uniform features from the scanned image files, a histogram is produced from those features, and the numeral is finally recognized on the basis of this histogram. Khan et al. [11] employed an evolutionary approach to train an artificial neural network (NN) for recognition of Bangla handwritten numerals. They used boundary extraction to isolate the numeral in a single window by horizontal-vertical scanning and applied scaling to convert the image into a fixed-size matrix; a Multi-Layer Perceptron (MLP) was then evolved for recognition. Basu et al. [3] used the Dempster-Shafer (DS) technique to combine the classification decisions of two MLP based classifiers that use two different feature sets, called the shadow feature and the centroid feature. Pal et al. [2] introduced a feature extraction technique based on the concept of water overflow from a reservoir and then employed a binary tree classifier for recognition. Wen et al. [12] proposed a handwritten Bangla numeral recognition system for an automatic letter sorting machine. They used a Support Vector Machine (SVM) classifier combined with an extensive feature extractor based on Principal Component Analysis (PCA) and kernel PCA (KPCA). Das et al. [13] also used SVM for classification but used different techniques for feature selection: in their scheme, a genetic algorithm (GA) based region sampling strategy is employed to select an optimal subset of local regions containing highly discriminating information about the pattern shapes. Wen and He [14] proposed a kernel and Bayesian discriminant based method to recognize handwritten Bangla numerals.
Recently, Nasir and Uddin [15] introduced a hybrid recognition system for automated postal systems, which performs feature extraction using k-means clustering, Bayes' theorem and Maximum a Posteriori estimation, and then performs recognition using SVM. Bhattacharya and Chaudhuri [9] presented a multistage cascaded recognition scheme using wavelet-based multi-resolution representations and MLP classifiers. The scheme first computes features from wavelet-filtered images at different resolutions. It has two recognition stages, the first of which involves a cascade of three MLP classifiers. Most recently, Akhand et al. [16] investigated a Convolutional Neural Network (CNN) based Bangla handwritten numeral recognition system that, unlike other existing works, does not use any feature extraction method. In their study, handwritten numeral images are first normalized and then a CNN is employed to classify each numeral. Since CNN is efficient for image classification owing to its distinct features, their CNN based method showed satisfactory classification accuracy.

2.2. Brief Survey of Devnagari Handwritten Numeral Recognition

A number of studies are available on Devnagari numeral recognition. Bajaj et al. [17] investigated a Devnagari numeral recognition system using an ensemble of NN based classifiers trained with feature sets generated in different ways. The architecture consists of three stages: the first categorizes the characters into distinct styles, the second distinguishes the correct character from others by training NNs, and the third resolves ambiguities in the final classification result by integrating the outputs of more than one classifier. In their work, two Kohonen self-organising nets are trained with density features and moment features, and a multi-layer perceptron (MLP) is trained with segment features. Finally, a Bayesian classifier based method is used to combine the decisions of the three classifiers for numeral recognition. Shrivastava and Gharde [18] investigated an SVM based technique trained with 18 different features of numerals. They used the Automated Numeral Extraction and Segmentation Program (ANESP) to segment raw images collected from 20 different people and applied moment invariants (MIs) and affine moment invariants (AMIs) for feature extraction. Finally, binary classification techniques with a linear kernel function in SVM are used for classification. Singh and Tyagi [19] proposed a Radial Basis Function (RBF) based technique for handwritten numeral recognition of Devnagari script. They first applied pre-processing techniques such as normalization and noise removal to compensate for poor paper and scanner quality. Then PCA is used for feature extraction, and the centres of the basis functions along with the hidden-output weights are calculated. Finally, the RBF network is trained for the classification task. Kumar and Ravulakollu [20] presented a study on the performance of transformed domain features in Devnagari numeral recognition.
To generate features from the pixel values of numeral images, they applied the Fourier transform, discrete cosine transform, Gaussian pyramid, Laplacian pyramid, wavelet transform and curvelet transform. These feature vectors were then used in different classification schemes such as feed forward NN, cascade NN and K-nearest neighbor (KNN) classifiers. Finally, they applied a majority voting scheme over KNN classifiers based on the above-mentioned features, with some rejection criteria. Kumar and Ravulakollu [21] also investigated an ensemble of several classifiers trained with different feature sets extracted from the pixel values of the images. Profile based features and gradient features extracted using Kirsch and wavelet transforms are considered for feature set generation. Finally, a set of classifiers such as feed forward NN, cascade NN and statistical KNN classifier were trained with the feature sets, and their results were combined by majority voting. Singh et al. [22] proposed a feature selection based classifier combination algorithm for handwritten Devnagari numeral recognition. They investigated a gradient feature decomposition technique to generate features from numeral images and then applied the information theoretic maximum relevance minimum redundancy method to create different subsets of features. Neural networks with different architectures, as well as an ensemble of NNs, are trained with different subsets to identify better recognition ability. Goyal and Garg [23] investigated a method of Devnagari numeral recognition that uses no pre-processing technique (such as thinning or slant removal) other than binarization. Structural and statistical features (such as open from left side, open from right side, open from above, open from below, vertical crossing, horizontal crossing, triple horizontal crossing and distance) are used for feature generation from the numeral images. Finally, the method uses decision based classification through nested if-else statements implemented in Matlab for recognition.

3. Handwritten Numeral Image Data and Pre-processing

The benchmark handwritten numeral image dataset maintained by the CVPR Unit, ISI, Kolkata [9] is considered in this study. Several recent studies have used this dataset or prepared datasets using its images as the source [7, 8]. The samples are scanned images of numerals taken from pin codes written on postal mail pieces, written by people of different ages, sexes and levels of education. The scanned images are divided into training and test sets for both Bangla and Devnagari. For Bangla, the test set contains 4000 images (400 samples for each of the 10 digits) and the training set contains 19392 images (around 1900 images per digit). In this study, all 4000 test images and 18000 training images (1800 images per digit) are prepared and pre-processed. Similarly, for Devnagari, 18000 training images (1800 per digit) out of the 18793 available, together with all 3763 available test images, are considered. Fig. 1 shows a few sample images of each numeral for both Bangla and Devnagari, which illustrate the level of ambiguity that makes the recognition task challenging.

Fig. 1: Samples of handwritten Bangla and Devnagari numerals from the dataset. [Figure panels: numeral correspondence (English 0 1 2 3 4 5 6 7 8 9; Bangla ০ ১ ২ ৩ ৪ ৫ ৬ ৭ ৮ ৯; Devnagari ० १ २ ३ ४ ५ ६ ७ ८ ९), "Sample Bangla Handwritten Numeral Images" and "Sample Devnagari Handwritten Numeral Images".]

Pre-processing is performed to bring the images into a common form before feeding them to the classifiers, since the original images differ in size, resolution and shape. Matlab R2015a is used to convert the images to the same dimension and format. Fig. 2 shows the basic steps of pattern formation from a scanned handwritten digit. First, an image is converted into binary form using Matlab's automatic thresholding. This step removes the background and strengthens the black writing. Since black ink is used for writing on white paper (background), the binary image contains more white pixels (value 1) than black pixels (value 0). To reduce computational overhead, the image is inverted so that the foreground numeral becomes white and the background black. After this foreground-background interchange it is easy to see that the written numeral may occupy only a portion of the scanned image, so the image is cropped to the actual writing portion by removing black lines from all four sides (i.e., left, right, top and bottom). Finally, the images are resized to 28×28 so that all numerals give equal-sized inputs. To capture the pattern values of the resized images while retaining the best possible quality, a double-type matrix is used (instead of the binary matrix of the previous stages).

Fig. 2: Steps of pattern formation from a scanned handwritten numeral: (1) transform into a binary image; (2) foreground-background interchange; (3) crop to the actual writing portion; (4) resize into 28×28 dimension, yielding the pattern for classification.

For better understanding of pre-processing, the outcomes of the stepwise transformation of six selected Bangla numeral images (0 to 5) are presented in Fig. 3. The operations on Devnagari images are the same as those on Bangla images. Finally, the resized images (i.e., their values) are used as patterns for the classifier.
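The four pre-processing steps can be sketched in Python with NumPy as below. This is a minimal illustration, not the authors' Matlab code: Otsu's method is assumed as the automatic threshold, bilinear interpolation is assumed for resizing, the input is assumed to be a grayscale image scaled to [0, 1], and all function names are hypothetical.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's threshold for a grayscale image in [0, 1] (assumed stand-in for
    Matlab's automatic thresholding)."""
    hist, edges = np.histogram(gray, bins=256, range=(0.0, 1.0))
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)                      # weight of the "dark" class for each cut
    mu = np.cumsum(p * centers)            # cumulative mean up to each cut
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu[-1] * w0 - mu) ** 2 / (w0 * (1.0 - w0))   # between-class variance
    sigma_b = np.nan_to_num(sigma_b, nan=0.0, posinf=0.0)
    return edges[np.argmax(sigma_b) + 1]   # upper edge of the last "dark" bin

def resize_bilinear(img, out_h, out_w):
    """Bilinear resize (assumed stand-in for Matlab's image resizing)."""
    in_h, in_w = img.shape
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, in_h - 1), np.minimum(x0 + 1, in_w - 1)
    wy, wx = (ys - y0)[:, None], (xs - x0)[None, :]
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bottom = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bottom * wy

def preprocess(gray, size=28):
    """Binarize -> interchange foreground/background -> crop -> resize to size x size."""
    binary = gray >= otsu_threshold(gray)      # white paper -> True (1), dark ink -> False (0)
    numeral = ~binary                          # numeral becomes white, background black
    rows = np.flatnonzero(numeral.any(axis=1))
    cols = np.flatnonzero(numeral.any(axis=0))
    cropped = numeral[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]   # drop all-black border lines
    return resize_bilinear(cropped.astype(float), size, size)       # double-valued pattern
```

Calling `preprocess` on a scanned page region returns a 28×28 floating-point array with the numeral in white on black, which is the form the CNN consumes.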

Fig. 3: Stepwise outcomes of pre-processing for sample Bangla numerals 0, 1, 2, 3, 4 and 5 (০ ১ ২ ৩ ৪ ৫): Step 1, original image in TIF format; Step 2, binary image with automatic thresholding; Step 3, foreground and background interchanged; Step 4, cropped to the original writing portion; Step 5, resized into 28×28 dimension.

4. Convolutional Neural Network Training incorporating Rotation based Patterns

This section explains in detail the proposed CNN based handwritten numeral recognition scheme incorporating rotation-based patterns. First, the CNN structure used as the classifier in this study is explained. Then rotation-based pattern generation and the two modes of CNN training incorporating the generated patterns are explained.

4.1. Review of CNN for Classification

CNNs are a family of multi-layer neural networks designed particularly for two-dimensional data, such as images. Fig. 4 shows the CNN structure considered in this study, which has been used in several studies [16, 26, 27]. The CNN has two convolutional layers (C1 and C2) with a kernel size of 5×5 and two sub-sampling layers (S1 and S2) with a 2×2 local averaging area. A kernel acts as a small filter that is simply a set of weights and a bias. In the input layer (I), the 28×28 pixels are treated as 784 linear nodes on which the convolution operation is performed. In the first convolution operation, the input image I is convolved with six kernels to produce six 24×24 convolved feature maps (CFMs) in C1. In the first sub-sampling operation, the six CFMs of C1 are subjected to 2×2 local averaging, producing six 12×12 sub-sampled feature maps (SFMs) in S1. The second convolution layer (C2) contains 12 feature maps. Since the input of C2 consists of the six SFMs of S1, each 8×8 CFM of C2 is produced by applying six different kernels, one to each SFM, and composing the results. Therefore, a total of 72 (= 12×6) kernels are used to produce the 12 CFMs. The second sub-sampling operation is similar to the first and produces twelve 4×4 SFMs. The values of these 12 SFMs (12×4×4 = 192) are placed linearly as the hidden layer (H) with 192 nodes. Finally, the hidden nodes are fully connected to the 10 output nodes, one for each numeral.
Each output node represents a particular digit; for a given input pattern, the desired value of the node for the pattern's class is defined as 1 and the desired values of the other nine output nodes as 0.
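As a quick check on the sizes quoted above, the following sketch traces the feature-map dimensions layer by layer and counts trainable parameters. It assumes "valid" convolution, non-overlapping pooling, one bias per convolved map and one bias per output node; the output biases are an assumption, since the text counts only the 1920 hidden-output weights. The function name is hypothetical.

```python
def trace_cnn(input_size=28, kernel=5, pool=2, maps=(6, 12), n_classes=10):
    """Print map sizes layer by layer and count trainable parameters."""
    size, in_maps, params = input_size, 1, {}
    for i, out_maps in enumerate(maps, start=1):
        size = size - kernel + 1                        # 'valid' convolution: 28 -> 24, 12 -> 8
        # one kernel per (input map, output map) pair plus one bias per output map
        params[f"C{i}"] = (in_maps * kernel * kernel + 1) * out_maps
        print(f"C{i}: {out_maps} maps of {size}x{size}")
        size //= pool                                   # non-overlapping 2x2 averaging
        print(f"S{i}: {out_maps} maps of {size}x{size}")
        in_maps = out_maps
    hidden = in_maps * size * size                      # maps placed linearly as hidden nodes
    params["W"] = hidden * n_classes                    # hidden-output weights
    params["b_out"] = n_classes                         # one bias per output node (assumed)
    print(f"H: {hidden} nodes, fully connected to {n_classes} outputs")
    return hidden, params

hidden, params = trace_cnn()
print(params, sum(params.values()))
```

With the default arguments this reports 24×24, 12×12, 8×8 and 4×4 maps, a 192-node hidden layer, and 156, 1812 and 1920 parameters for C1, C2 and the hidden-output weights, matching the counts given later in this section.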

Fig. 4: Structure of a CNN considered in this study.


Fig. 5: Operational flowchart of a CNN.

Fig. 5 shows the operational flowchart of a CNN for better understanding. In the figure, C1CFM and S1SFM represent the CFMs and SFMs of the C1 and S1 layers, respectively. Similarly, C2CFM and S2SFM denote the CFMs and SFMs of the C2 and S2 layers, respectively, and K1 and K2 are the kernels of the C1 and C2 layers, respectively. The upper portion of the figure is the forward pass that generates the CNN response, and the lower portion depicts the backward pass that updates the CNN during training. The trainable parameters of the CNN are the hidden-output layer weights (W), the two sets of kernels (K1 and K2) and their associated biases; ΔW, ΔK1 and ΔK2 are the gradient (i.e., correction) terms calculated in the backward pass. C1CFM′, S1SFM′, C2CFM′ and S2SFM′ are intermediate terms generated to calculate these gradient terms. In the convolution process, a CFM is generated by applying a kernel to an input feature map (IFM), which is a feature map of the previous layer [21]. A small portion of the IFM is termed a local receptive field (LRF), and a particular LRF combined with the kernel gives a particular point in the CFM. All the LRFs of an IFM with the same kernel give a complete CFM. A common form of the convolution operation that obtains a CFM from an IFM through a kernel (K) is shown in Eq. 1.

$$CFM_{x,y} = f\left(b + \sum_{r=1}^{K_H}\sum_{c=1}^{K_W} K_{r,c}\, IFM_{x+r,\,y+c}\right) \qquad (1)$$

Here, f(·) is the activation function, b is the bias of the kernel, and K_H and K_W denote the size of the kernel as a K_H×K_W matrix, which is 5×5 in this study. The same kernel is applied everywhere in the image. This makes sense because if the weights and bias allow a hidden neuron to pick out, say, a vertical edge in one local receptive field, that detector is also likely to be useful at other places in the image; this is why CNNs are well adapted to the translation invariance of images. Distinct kernels produce distinct CFMs from the same IFM, and the operations of multiple kernels are composed to produce a CFM from multiple IFMs. Note that the original input image is the IFM of the first convolution operation, which produces the first set of CFMs. In a CNN, each convolutional layer is followed by a sub-sampling layer, which simplifies the information of a CFM by condensing its important feature points. The sub-sampling operation produces an SFM from a CFM; its general form is shown in Eq. 2.

$$SFM_{x,y} = \mathrm{down}\left(\sum_{r=0}^{R-1}\sum_{c=0}^{C-1} CFM_{xR-1+r,\; yC-1+c}\right) \qquad (2)$$

where R and C denote the size of the pooling area as an R×C matrix of the CFM (here 2×2), and down(·) represents a sub-sampling function on a pooling area. The SFM is therefore 2 times smaller than the CFM in both spatial dimensions. Since sub-sampling is performed by local averaging in this study, the 4 pixels in each 2×2 area of the CFM are averaged, and the average becomes a single point in the SFM. The hidden layer nodes (between the second sub-sampling layer and the output layer) are a linear arrangement of the SFM values of the second sub-sampling layer. Since the hidden and output layers are fully connected, the final output of a particular output node is the weighted sum of the hidden layer values plus a bias term, passed through an activation function, according to Eq. 3. The class decision is given by the output node with the highest value.

$$y_o = f\left(\sum_{h=1}^{N} w_{oh}\, H_h + b\right) \qquad (3)$$

where N (= 192) is the number of hidden nodes, H_h is the value of hidden node h, w_{oh} is the weight from hidden node h to output node o, and b is the bias of output node o.
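Eqs. 1 to 3 can be illustrated with the minimal NumPy sketch below. It assumes a logistic sigmoid for f, uses 0-based indices, sums the contributions of several input maps into one CFM (as C2 does with the six S1 maps), and takes down(·) as the mean of each non-overlapping area; the function names are hypothetical and the flattening order of the hidden layer is arbitrary here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convolve(ifms, kernels, bias, f=sigmoid):
    """Eq. 1 summed over input maps: f(b + sum_i sum_r sum_c K_i[r, c] * IFM_i[x + r, y + c]).

    ifms: (M, H, W) input feature maps; kernels: (M, KH, KW), one kernel per input map.
    """
    _, h, w = ifms.shape
    _, kh, kw = kernels.shape
    z = np.full((h - kh + 1, w - kw + 1), float(bias))
    for x in range(z.shape[0]):
        for y in range(z.shape[1]):
            lrf = ifms[:, x:x + kh, y:y + kw]          # local receptive field in every input map
            z[x, y] += np.sum(kernels * lrf)
    return f(z)

def subsample_avg(cfm, R=2, C=2):
    """Eq. 2 with down(.) taken as the mean of each non-overlapping R x C area."""
    h, w = cfm.shape
    return cfm[: h - h % R, : w - w % C].reshape(h // R, R, w // C, C).mean(axis=(1, 3))

def output_layer(sfms, W, b, f=sigmoid):
    """Eq. 3 for all output nodes at once; the class is the node with the largest output."""
    hidden = np.concatenate([m.ravel() for m in sfms])   # e.g. 12 maps x 4 x 4 = 192 nodes
    y = f(W @ hidden + b)
    return y, int(np.argmax(y))
```

For example, `convolve` on a 1×28×28 input with a 1×5×5 kernel yields a 24×24 map, and `subsample_avg` turns it into 12×12, following the sizes given in Section 4.1.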

In the output layer, the error is measured by comparing the desired output with the actual output:

$$E = \frac{1}{2}\,\frac{1}{PO}\sum_{p=1}^{P}\sum_{o=1}^{O}\bigl(d_o(p) - y_o(p)\bigr)^2 \qquad (4)$$

where P is the total number of patterns, O is the total number of output nodes, and d_o(p) and y_o(p) are the desired and actual outputs of node o for pattern p, respectively. The kernel values and biases of the convolution layers and the weights of the hidden-output layer are updated during training to minimize the error E. The hidden-output weights are updated using simple back-propagation (BP), whereas computing the kernel gradients requires a more complex procedure; a modified version of BP is used to train a CNN, and a description is available in [20, 22]. Starting from the output layer, the hidden-output weights are updated first. Then the error propagated back to the hidden layer is reshaped into 12 maps (S2SFM′) matching the original second-layer SFMs (S2SFM). Upsampling S2SFM′ and multiplying it element-wise by the first derivative of the activation evaluated at C2CFM produces the new second-layer CFMs (C2CFM′), as shown in Eq. 5.

$$C2CFM' = f'(C2CFM) \circ \mathrm{up}(S2SFM') \qquad (5)$$

where up(·) denotes the upsampling operation and ∘ denotes element-wise multiplication. The new CFMs are used in an inverse convolution operation with the second-layer kernels (K2) to produce the six new first-layer SFMs (S1SFM′), as shown in Eq. 6. At the same time, the first-layer sub-sampled feature maps (S1SFM) are used in an inverse convolution with the new second-layer CFMs (C2CFM′) to produce the kernel gradient (ΔK2) of the second convolutional layer, as shown in Eq. 7. The bias gradient of the kernel (Δb_{K2}) is computed by simply summing over C2CFM′, as in Eq. 8.

$$S1SFM' = \mathrm{conv}^{-1}(C2CFM',\, K2) \qquad (6)$$

$$\Delta K2 = \mathrm{conv}^{-1}(C2CFM',\, S1SFM) \qquad (7)$$

$$\Delta b_{K2} = \sum C2CFM' \qquad (8)$$
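The backward step of Eqs. 5 to 8 can be sketched for the simplest case of one S1 map feeding one C2 map. This is an illustration under assumptions, not the toolbox implementation: the activation is taken to be a logistic sigmoid (so f′ = f(1 − f)), the forward convolution follows the correlation form of Eq. 1, up(·) spreads each gradient evenly over its 2×2 area because sub-sampling is by averaging, and conv⁻¹ is written out as explicit loops. In the full network, each C2 map sums six such kernel contributions and each S1SFM′ accumulates contributions from all 12 C2 maps. Function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_c2(s1, K, b, R=2):
    """One C2 map from one S1 map: Eq. 1 (valid correlation + sigmoid), then Eq. 2 (R x R mean)."""
    kh, kw = K.shape
    oh, ow = s1.shape[0] - kh + 1, s1.shape[1] - kw + 1
    z = np.array([[np.sum(K * s1[x:x + kh, y:y + kw]) for y in range(ow)]
                  for x in range(oh)]) + b
    cfm = sigmoid(z)
    sfm = cfm.reshape(oh // R, R, ow // R, R).mean(axis=(1, 3))
    return cfm, sfm

def backward_c2(s1, K, cfm, s2sfm_grad, R=2):
    """Eqs. 5-8: given S2SFM' (dE/dSFM), return S1SFM', dK2 and db_K2 for this map."""
    # Eq. 5: up() spreads each SFM gradient evenly over its R x R area (average pooling),
    # then multiply element-wise by f'(C2CFM) = C2CFM * (1 - C2CFM) for the sigmoid.
    up = np.kron(s2sfm_grad, np.ones((R, R))) / (R * R)
    c2cfm_grad = cfm * (1.0 - cfm) * up
    kh, kw = K.shape
    s1_grad = np.zeros_like(s1)
    dK = np.zeros_like(K)
    for x in range(c2cfm_grad.shape[0]):
        for y in range(c2cfm_grad.shape[1]):
            s1_grad[x:x + kh, y:y + kw] += c2cfm_grad[x, y] * K   # Eq. 6: "full" convolution with K2
            dK += c2cfm_grad[x, y] * s1[x:x + kh, y:y + kw]      # Eq. 7: valid correlation with S1SFM
    db = c2cfm_grad.sum()                                       # Eq. 8
    return s1_grad, dK, db
```

The returned gradients can be checked against central finite differences of any scalar loss built from the returned SFM, which is a convenient way to validate such hand-written backward passes.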

Similarly, the first-layer CFMs (C1CFM′), kernel gradient (ΔK1) and kernel bias gradient (Δb_{K1}) are computed using Eqs. 9, 10 and 11, which parallel Eqs. 5, 7 and 8, respectively.

$$C1CFM' = f'(C1CFM) \circ \mathrm{up}(S1SFM') \qquad (9)$$

$$\Delta K1 = \mathrm{conv}^{-1}(C1CFM',\, IFM) \qquad (10)$$

$$\Delta b_{K1} = \sum C1CFM' \qquad (11)$$
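The output-layer part of training (Eqs. 3 and 4 and the simple BP update of the hidden-output weights) can be sketched for one batch as follows. The logistic sigmoid output, the one-hot desired outputs, and the returned hidden-layer error (which would be reshaped into the 12 maps of S2SFM′) are assumptions for illustration; the function name is hypothetical, and the default eta = 1.0 matches the learning rate reported in Section 5.1.

```python
import numpy as np

def output_step(H, W, b, D, eta=1.0):
    """One batch of output-layer computations.

    H: (P, N) hidden-layer vectors, W: (O, N) weights, b: (O,) biases,
    D: (P, O) desired outputs (1 for the true class, 0 elsewhere).
    Returns E (Eq. 4), the updated W and b, their gradients, and the error
    propagated back to the hidden layer (to be reshaped into S2SFM').
    """
    P, O = D.shape
    Y = 1.0 / (1.0 + np.exp(-(H @ W.T + b)))      # Eq. 3 for every pattern and node
    E = 0.5 * np.sum((D - Y) ** 2) / (P * O)       # Eq. 4
    delta = (Y - D) * Y * (1.0 - Y) / (P * O)      # dE/d(weighted sum) at each output node
    dW = delta.T @ H                               # (O, N) gradient of E w.r.t. W
    db = delta.sum(axis=0)                         # (O,) gradient of E w.r.t. b
    dH = delta @ W                                 # (P, N) dE/dH, computed before the update
    return E, W - eta * dW, b - eta * db, dW, db, dH
```

Because E is averaged over patterns and nodes, the gradient magnitudes do not grow with the batch size, which keeps a fixed learning rate usable across the batch sizes explored in Section 5.1.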

Finally, both sets of kernels (K1 and K2), their biases and the hidden-output layer weights are updated by applying the calculated gradients. The first convolution layer contains a total of 156 (= (5×5 + 1)×6) parameters for its six kernels. Similarly, the second convolutional layer has 1812 (= (6×5×5 + 1)×12) trainable parameters for its 12 CFMs. Finally, the 1920 (= 192×10) weights of the fully connected output layer are also updated during training. The training procedure is repeated for a defined number of epochs or until the error is reduced to a certain level.

4.2. Rotation based Pattern Generation and CNN Training incorporating the Patterns

The objective of the present work is to develop a high-performance handwritten Bangla and Devnagari numeral recognition system. Toward this goal, the use of rotation-based generated patterns in CNN training to improve recognition performance is the most important step of this study. Artificially generated patterns are commonly used in ensemble construction to promote diversity among individual classifiers (e.g., feed forward NNs or decision trees) and hence to improve recognition performance [24, 25]. In [25], different pattern generation techniques were investigated, and such ensemble construction was found to be most effective for small problems (i.e., problems with a relatively small number of patterns). For handwritten numeral recognition, Bhattacharya and Chaudhuri [9] used random rotation and blurring to generate patterns. They trained MLPs with these generated patterns plus the original patterns, and the resulting training set was 10 times larger than the original training set.

A simple rotation-based technique is followed in this study to generate patterns. The rotation is performed between the foreground-background interchange (Step 2) and the crop operation (Step 3) of Fig. 2. Rotation before cropping was observed to be most effective because it helps to remove some boundary line noise in the scanned image. For a fixed rotation angle (ɵ), two different patterns are generated from an image by clockwise and anticlockwise rotation. Since a CNN can capture a certain amount of rotation, patterns rotated by a small angle may enhance its recognition capability. Two approaches are investigated to incorporate the generated patterns in CNN training, as explained in the following subsections.

4.2.1. Multiple CNN incorporating Rotation based Patterns (mCNNRP)

In mCNNRP, three CNNs (having the same architecture, described in Section 4.1) are trained with three different training sets: one is trained with the ordinary training set and the other two with different rotation-based patterns. Fig. 6 shows the steps of pattern generation for mCNNRP in this study. The steps are similar to those in Fig. 2, except that in Step 3 two additional images are generated by rotating the foreground-background interchanged image clockwise and anticlockwise by a user-defined angle ɵ, and these are added to the ordinary one. Cropping to the actual writing portion (Step 4) and resizing into 28×28 (Step 5) are then applied to the three images separately. The outcome is three sets of patterns: one ordinary pattern set and two rotation-based generated pattern sets. Fig. 7 shows the structure of mCNNRP for a sample image. In the figure, CNN2 is trained with the ordinary training set, whereas CNN1 and CNN3 are trained with the pattern sets generated by rotating the original training images anticlockwise and clockwise, respectively.
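A minimal sketch of rotation-based pattern generation, in the order described above (rotate the foreground-background interchanged image first, then crop and resize each result separately), is given below. Nearest-neighbour rotation on an enlarged canvas, nearest-neighbour resizing, the sign convention (positive = anticlockwise as displayed) and the default angle of 20 degrees (one value within the 10 to 40 degree range explored in Section 5.1) are assumptions; the function names are hypothetical.

```python
import numpy as np

def rotate_nearest(img, theta_deg):
    """Rotate img about its centre by theta_deg (positive = anticlockwise as displayed),
    enlarging the canvas so nothing is cut off; new pixels are filled with 0 (black)."""
    t = np.deg2rad(theta_deg)
    h, w = img.shape
    c, s = np.cos(t), np.sin(t)
    H = int(np.ceil(h * abs(c) + w * abs(s) - 1e-9))
    W = int(np.ceil(h * abs(s) + w * abs(c) - 1e-9))
    yy, xx = np.mgrid[0:H, 0:W]
    yc, xc = yy - (H - 1) / 2, xx - (W - 1) / 2               # output coords about the centre (y down)
    xs = np.rint(c * xc - s * yc + (w - 1) / 2).astype(int)   # inverse mapping into the input
    ys = np.rint(s * xc + c * yc + (h - 1) / 2).astype(int)
    inside = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
    out = np.zeros((H, W), dtype=img.dtype)
    out[inside] = img[ys[inside], xs[inside]]                 # nearest-neighbour sampling
    return out

def crop_to_writing(img):
    """Remove all-black rows and columns from the four sides."""
    rows, cols = np.flatnonzero(img.any(axis=1)), np.flatnonzero(img.any(axis=0))
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

def resize_nearest(img, size=28):
    """Nearest-neighbour resize to size x size, returned as floating point."""
    r = (np.arange(size) * img.shape[0] / size).astype(int)
    k = (np.arange(size) * img.shape[1] / size).astype(int)
    return img[np.ix_(r, k)].astype(float)

def rotation_patterns(interchanged, theta=20.0):
    """Ordinary, -theta and +theta patterns from one foreground-background interchanged
    image: rotate first, then crop and resize each image separately."""
    return tuple(resize_nearest(crop_to_writing(rotate_nearest(interchanged, a)))
                 for a in (0.0, -theta, theta))
```

Rotating before cropping means each rotated image gets its own tight bounding box, so all three patterns fill the 28×28 frame in the same way.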
The mCNNRP output is obtained by combining the outputs of these three CNNs. Since the CNNs are trained independently with individual training sets, they can be trained simultaneously on three separate machines; in that case, the training time equals that of a single CNN trained with ordinary patterns. On the other hand, mCNNRP incurs an additional, albeit minimal, cost to obtain the system output by combining the individual CNNs' outputs. To obtain the mCNNRP output from the three individual CNN outputs, techniques for combining the decisions of individual classifiers in an ensemble may be used; commonly used techniques are simple average, winner-takes-all and voting. In the simple average case, the node-wise responses of the three CNNs are averaged, and the node with the maximum average value is taken as the predicted numeral class (Eq. 12). In the winner-takes-all case, the CNN response with the maximum output node value is taken as the final output (Eq. 13): each CNN gives a class decision, namely the node with the maximum value among its 10 output nodes, different CNNs may respond with different numeral classes, and among the three CNNs the class response with the highest value is taken as the mCNNRP response.

$$\mathrm{mCNNRP}_{\mathrm{Avg}} = \arg\max_{i = 1,\dots,10}\ \frac{1}{3}\sum_{j=1}^{3} CNN_j^{\,i} \qquad (12)$$

$$\mathrm{mCNNRP}_{\mathrm{Win.\,takes\,all}} = \mathrm{Max}\left(\max_{i=1,\dots,10} CNN_1^{\,i},\ \max_{i=1,\dots,10} CNN_2^{\,i},\ \max_{i=1,\dots,10} CNN_3^{\,i}\right) \qquad (13)$$

where CNN_j^i is the value of output node i of CNN j; in Eq. 13, the selected class is that of the node attaining the overall maximum.

Fig. 6: Steps of pattern generation for mCNNRP: (1) transform the scanned handwritten numeral into a binary image; (2) foreground-background interchange; (3) keep the image without rotation and generate two more by rotating it by −ɵ and +ɵ; (4) crop each image to the actual writing portion; (5) resize each into 28×28, giving the ordinary pattern and the generated patterns for −ɵ and +ɵ rotation.

Fig. 7: Structure of mCNNRP for handwritten numeral classification.
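The three combination rules can be written compactly as below, for a (3, 10) array holding the output-node values of the three CNNs. The voting rule, described in the next paragraph, falls back to winner-takes-all when all three CNNs disagree. Function names are hypothetical, and ties in the averaged or maximal values are broken by the lowest index, which is an assumption.

```python
import numpy as np

def combine_average(outputs):
    """Eq. 12: average the three CNNs' outputs node by node and pick the largest node."""
    return int(np.argmax(outputs.mean(axis=0)))

def combine_winner_takes_all(outputs):
    """Eq. 13: the class of the single largest output value over all CNNs and nodes."""
    _, node = np.unravel_index(np.argmax(outputs), outputs.shape)
    return int(node)

def combine_voting(outputs):
    """Each CNN votes for its top class; two or three votes decide, otherwise winner-takes-all."""
    votes = np.argmax(outputs, axis=1)
    classes, counts = np.unique(votes, return_counts=True)
    if counts.max() >= 2:
        return int(classes[np.argmax(counts)])
    return combine_winner_takes_all(outputs)
```

The rules can disagree: if one CNN is very confident in class 1 while the other two mildly prefer class 2, averaging and winner-takes-all can both pick class 1 while voting picks class 2.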

In the voting case, the decision of each CNN is considered a vote for a particular numeral class, and the mCNNRP response is the class receiving two or three votes. If the three CNNs respond with three different classes, the final mCNNRP response follows the winner-takes-all mechanism (i.e., the CNN with the highest response value).

4.2.2. Single CNN incorporating Rotation based Patterns (sCNNRP)

In sCNNRP, a single CNN is trained with a pattern set consisting of ordinary patterns plus rotation-based artificially generated patterns. Fig. 8 shows the steps of pattern preparation for sCNNRP, which produces three patterns from a single image. In Step 3, two images are generated by rotating the foreground-background interchanged image clockwise and anticlockwise by a user-defined angle ɵ and are added to the original. Cropping to the actual writing portion (Step 4) and resizing into 28×28 (Step 5) are then applied to these three images. The outcome is a single set of patterns in which three patterns are available for each handwritten image: one ordinary pattern and two rotation-based generated patterns. In other words, the training set of sCNNRP is the combination of the individual training sets of the three CNNs in mCNNRP. Fig. 9 shows the structure of sCNNRP for a sample image. The CNN structure described in Section 4.1 is used in sCNNRP, but it is trained with a larger training set that incorporates rotation-based generated patterns. The generated patterns encourage the CNN to recognize a large set of diverse patterns, including patterns with small distortions. A CNN is to some extent tolerant of translation and rotation, but how much rotation it can handle without distortion is not well defined. The sCNNRP mode gives insight into the maximum rotation level up to which a CNN can maintain this invariance. Note that the larger training set increases training time, but testing incurs no additional cost because only the ordinary pattern of a test image is used to obtain the CNN response.

[Fig. 8 diagram: (1) scanned handwritten numeral image; (2) transform into binary image; foreground-background interchange; (3) image rotated by -ɵ, image without rotation, image rotated by +ɵ; (4) crop all 3 images to the actual writing portion; (5) resize all 3 images to 28×28, giving 3 patterns (one ordinary pattern and two generated patterns).]

Fig. 8: Steps of pattern generation for sCNNRP.
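The five steps of Fig. 8 can be sketched with NumPy alone. This is an illustrative approximation, not the authors' Matlab code: the fixed binarization threshold of 128, nearest-neighbour rotation and resizing, padding the canvas so that no ink is clipped, stretching the cropped image to 28×28 without preserving aspect ratio, and the sign convention for +ɵ are all assumptions, and every function name is hypothetical.

```python
import numpy as np


def binarize_and_invert(gray, threshold=128):
    """Steps (2)-(3): dark ink on light paper becomes ink = 1, background = 0."""
    return (np.asarray(gray) < threshold).astype(np.uint8)


def rotate_nearest(img, degrees):
    """Step (3): nearest-neighbour rotation about the centre of a padded canvas."""
    h, w = img.shape
    d = int(np.ceil(np.hypot(h, w)))  # canvas large enough for any angle
    canvas = np.zeros((d, d), dtype=img.dtype)
    top, left = (d - h) // 2, (d - w) // 2
    canvas[top:top + h, left:left + w] = img
    t = np.deg2rad(degrees)
    c = (d - 1) / 2.0
    ys, xs = np.mgrid[0:d, 0:d]
    # Inverse mapping: for every output pixel, look up its source pixel.
    sx = np.rint(np.cos(t) * (xs - c) + np.sin(t) * (ys - c) + c).astype(int)
    sy = np.rint(-np.sin(t) * (xs - c) + np.cos(t) * (ys - c) + c).astype(int)
    ok = (sx >= 0) & (sx < d) & (sy >= 0) & (sy < d)
    out = np.zeros_like(canvas)
    out[ok] = canvas[sy[ok], sx[ok]]
    return out


def crop_to_ink(img):
    """Step (4): crop to the bounding box of the writing (assumes some ink)."""
    rows = np.flatnonzero(img.any(axis=1))
    cols = np.flatnonzero(img.any(axis=0))
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


def resize_nearest(img, size=28):
    """Step (5): nearest-neighbour resize to size x size."""
    h, w = img.shape
    return img[np.ix_(np.arange(size) * h // size, np.arange(size) * w // size)]


def make_patterns(gray, theta=10):
    """Return [ordinary, rotated +theta, rotated -theta] 28x28 patterns."""
    ink = binarize_and_invert(gray)
    return [resize_nearest(crop_to_ink(rotate_nearest(ink, a)))
            for a in (0, theta, -theta)]


# A synthetic 40x30 "scan": white paper with a dark vertical stroke.
scan = np.full((40, 30), 255, dtype=np.uint8)
scan[5:35, 12:18] = 0
patterns = make_patterns(scan, theta=10)
print([p.shape for p in patterns])
```

On the synthetic stroke, the ordinary pattern is completely filled after cropping, whereas the rotated patterns contain background pixels in the corners of their bounding boxes, which illustrates how rotation changes the cropped, resized pattern that the CNN sees.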

Fig. 9: Structure of sCNNRP for handwritten numeral classification.


5. Experimental Studies

This section investigates the effectiveness of the proposed recognition schemes for both Bangla and Devnagari handwritten numerals. Classification accuracy on both training and test sets is presented, and performance comparison with existing methods is based on test set accuracy. The experimental setup is explained first; results and analysis are then presented for Bangla and Devnagari in two separate subsections.

5.1. Experimental Setup

The proposed mCNNRP and sCNNRP are implemented in Matlab R2015a using the well-known Matlab-based CNN toolbox [26]. The experiments were conducted on an HP Pro desktop machine (CPU: Intel Core i7 @ 3.60 GHz; RAM: 8.00 GB) running Windows 7 (64-bit). The rotational angle (ɵ) used to generate training patterns is a user-defined parameter and was varied from 10 to 40 degrees. Because the training set is large, batch-wise training was performed, with batch sizes from 10 to 150. The CNN weights are updated once per batch of image patterns, and the batch size (BS), i.e., the number of patterns in a batch, is a user-defined parameter chosen so that the total of 18000 training patterns is exactly divisible by the BS value. The learning rate (eta) was set to 1.0 for all experiments. In mCNNRP, three techniques for combining the individual CNNs' responses (simple average, winner takes all, and majority voting) were tested, and no significant difference was observed among them. Therefore, results are presented only for the simple average case using Eq. 12. The benchmark datasets used in this study provide separate training and test sets; the training patterns are used to develop the proposed systems (i.e., sCNNRP or mCNNRP), and the test set was held out and not used at any stage of training.
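The divisibility constraint on BS can be made concrete with a short Python sketch. It only restates arithmetic from this section: with 18000 training patterns and one weight update per batch, BS determines how many updates make up one full pass over the training set. The helper names are hypothetical, and the 54000-pattern figure for sCNNRP is an inference from its training set being the union of three 18000-pattern sets, not a number stated in the paper.

```python
N_TRAIN = 18000  # training patterns per training set (Section 5.1)


def valid_batch_sizes(n, low=10, high=150):
    """Batch sizes in [low, high] that divide n exactly, as Section 5.1 requires."""
    return [bs for bs in range(low, high + 1) if n % bs == 0]


def updates_per_pass(n, bs):
    """Weight updates in one full pass when weights change once per batch."""
    if n % bs:
        raise ValueError(f"BS={bs} does not divide {n} training patterns")
    return n // bs


# Assumed sCNNRP set size: 3 x 18000 (ordinary, +theta and -theta patterns).
for bs in (10, 25, 50, 75, 100, 150):
    print(bs, updates_per_pass(N_TRAIN, bs), updates_per_pass(3 * N_TRAIN, bs))
```

The loop prints the number of weight updates in one pass over 18000 patterns and, under the stated assumption, over the 54000-pattern sCNNRP training set; at the same BS, one sCNNRP pass needs three times as many updates, which is consistent with the longer training time noted for sCNNRP.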
Since the aim of any recognition system is to respond correctly to unseen data, test set accuracy is used to measure the performance of the developed systems, as in existing studies. Test set accuracy of the proposed systems is checked at different fixed numbers of iterations and reported accordingly.

5.2. Recognition Results on Bangla Numeral

Fig. 10 shows the test set accuracy of the proposed mCNNRP for two BS values (50 and 100) with the rotational angle (ɵ) varied from 10 to 40 degrees. The standard CNN, i.e., a single CNN trained in the conventional way with a single pattern per image and no generated patterns, is included for comparison and marked 'sCNN' in the figure. The figure shows that mCNNRP with any rotational angle performs better than sCNN; this is especially evident in Fig. 10(b) for BS = 100. Fig. 11 shows the test set accuracy of the proposed sCNNRP for the same two BS values with the rotational angle varied from 10 to 40 degrees. Here, accuracy improves with rotational angle only up to a certain level, beyond which it becomes worse than that of sCNN. For BS = 100, the test set accuracy for ɵ = 10 is much better than that of sCNN, but the accuracy for ɵ = 30 is worse. The greater sensitivity of sCNNRP to increasing rotational angle is expected: at larger angles, the generated patterns differ more from the original ones, and many of them come to resemble patterns of other numerals. It should be noted that in mCNNRP, one CNN (i.e., CNN2) is trained in the conventional way, as in sCNN, which may explain why mCNNRP is less sensitive to large rotational angles.

[Fig. 10 plots: test set accuracy (%) versus iteration (0 to 300) for sCNN and for mCNNRP with ɵ = 10, 20, 30 and 40. (a) Test set accuracy of mCNNRP for BS = 50. (b) Test set accuracy of mCNNRP for BS = 100.]

Fig. 10: Test set accuracy of proposed mCNNRP for BS = 50 and 100, varying the rotational angle (ɵ) from 10 to 40 degrees, on handwritten Bangla numerals.

[Fig. 11 plots: test set accuracy (%) versus iteration (0 to 300) for sCNN and for sCNNRP with ɵ = 10, 20, 30 and 40. (a) Test set accuracy of sCNNRP for BS = 50. (b) Test set accuracy of sCNNRP for BS = 100.]

Fig. 11: Test set accuracy of proposed sCNNRP for BS = 50 and 100, varying the rotational angle (ɵ) from 10 to 40 degrees, on handwritten Bangla numerals.


Table 1 presents the test set accuracies of both mCNNRP and sCNNRP after 300 iterations for different batch sizes and rotational angles (ɵ) used in pattern generation. The table shows that mCNNRP matched or outperformed sCNN for every rotational angle and BS value, and that smaller BS values with relatively large ɵ tend to give better performance. In contrast, sCNNRP with a small BS value and a large ɵ is inferior to sCNN, and its best performance is found for a small ɵ combined with a relatively large BS value. We also recorded the recognition accuracy of the system at various fixed numbers of iterations for different BS and ɵ values. The best test set accuracy of mCNNRP, 98.88%, was achieved with BS = 10 and ɵ = 40 at 220 iterations. The best test set accuracy of sCNNRP was 98.98%, with BS = 100 and ɵ = 10 at 280 iterations. At that point, sCNNRP misclassified 41 of the 4000 test patterns; Table 2 shows the corresponding confusion matrix. The table shows that the method correctly classified all test examples of “২” and “৪” and misclassified only one sample each of “৩” and “৭”. On the other hand, sCNNRP performed worst on the numerals “১” and “৯”, correctly classifying 386 and 390 cases, respectively, out of the 400 test cases for each. Among the Bangla numerals, these two look the most similar even in printed form: “১” was classified as “৯” in five cases, and “৯” as “১” in eight cases. Likewise, “৫” and “৬” look similar in Bangla handwriting, and the system confused them with each other in four cases. Table 3 shows some of the 41 handwritten numeral images misclassified by sCNNRP. Because of the large variation in writing styles, such images are difficult to recognize correctly even for a human.
All other misclassified images were also found to be ambiguous; therefore, their misclassification by the system is acceptable.

Table 1: Performance evaluation of mCNNRP and sCNNRP for different Batch Sizes (BS) and Rotational angles (ɵ) after 300 iterations for handwritten Bangla numerals (test set accuracy in %).

Batch Size | sCNN | mCNNRP ɵ = 10 | mCNNRP ɵ = 20 | mCNNRP ɵ = 30 | mCNNRP ɵ = 40 | sCNNRP ɵ = 10 | sCNNRP ɵ = 20 | sCNNRP ɵ = 30 | sCNNRP ɵ = 40
10 | 98.25 | 98.55 | 98.58 | 98.80 | 98.78 | 98.13 | 98.18 | 97.70 | 96.35
25 | 98.30 | 98.60 | 98.63 | 98.63 | 98.63 | 98.45 | 98.03 | 97.15 | 97.00
50 | 98.40 | 98.40 | 98.55 | 98.68 | 98.78 | 98.58 | 98.48 | 98.05 | 96.70
100 | 97.95 | 98.35 | 98.50 | 98.45 | 98.58 | 98.95 | 98.33 | 97.85 | 96.80
150 | 97.48 | 98.23 | 98.43 | 98.43 | 98.33 | 98.45 | 98.53 | 97.68 | 97.40
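The claims drawn from Table 1 can be checked mechanically. The sketch below transcribes the table (test set accuracy in % after 300 iterations) into Python dictionaries keyed by ɵ, finds the best cell for each method, and counts how often each method beats the sCNN baseline at the same BS. The variable and function names are hypothetical.

```python
BATCH_SIZES = [10, 25, 50, 100, 150]
SCNN = [98.25, 98.30, 98.40, 97.95, 97.48]
# Keys are rotational angles; each list follows BATCH_SIZES.
MCNNRP = {10: [98.55, 98.60, 98.40, 98.35, 98.23],
          20: [98.58, 98.63, 98.55, 98.50, 98.43],
          30: [98.80, 98.63, 98.68, 98.45, 98.43],
          40: [98.78, 98.63, 98.78, 98.58, 98.33]}
SCNNRP = {10: [98.13, 98.45, 98.58, 98.95, 98.45],
          20: [98.18, 98.03, 98.48, 98.33, 98.53],
          30: [97.70, 97.15, 98.05, 97.85, 97.68],
          40: [96.35, 97.00, 96.70, 96.80, 97.40]}


def best_setting(table):
    """(accuracy, theta, batch size) of the highest-accuracy cell."""
    return max((acc, theta, bs)
               for theta, row in table.items()
               for bs, acc in zip(BATCH_SIZES, row))


def versus_baseline(table):
    """Count cells above, equal to and below sCNN at the same batch size."""
    above = equal = below = 0
    for row in table.values():
        for acc, base in zip(row, SCNN):
            if acc > base:
                above += 1
            elif acc == base:
                equal += 1
            else:
                below += 1
    return above, equal, below


print(best_setting(MCNNRP), versus_baseline(MCNNRP))
print(best_setting(SCNNRP), versus_baseline(SCNNRP))
```

For mCNNRP this gives 19 cells above sCNN, one tie (ɵ = 10, BS = 50) and none below; for sCNNRP, 12 of the 20 cells fall below sCNN, concentrated at ɵ = 30 and 40. The best cells at 300 iterations (98.80% for mCNNRP, 98.95% for sCNNRP) differ slightly from the best accuracies quoted in the text because those were taken at other iteration counts.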


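Table 2: Confusion matrix produced for test samples of Bangla handwritten numerals in case of best performed sCNNRP. Rows give the true numeral; columns give the number of samples classified as each numeral (Bangla numeral with its English equivalent).

True numeral | ০ (0) | ১ (1) | ২ (2) | ৩ (3) | ৪ (4) | ৫ (5) | ৬ (6) | ৭ (7) | ৮ (8) | ৯ (9)
০ (0) | 398 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0
১ (1) | 0 | 386 | 1 | 1 | 2 | 0 | 3 | 1 | 1 | 5
২ (2) | 0 | 0 | 400 | 0 | 0 | 0 | 0 | 0 | 0 | 0
৩ (3) | 0 | 0 | 0 | 399 | 0 | 0 | 0 | 0 | 0 | 1
৪ (4) | 0 | 0 | 0 | 0 | 400 | 0 | 0 | 0 | 0 | 0
৫ (5) | 1 | 0 | 0 | 0 | 2 | 392 | 2 | 3 | 0 | 0
৬ (6) | 0 | 0 | 0 | 1 | 0 | 2 | 397 | 0 | 0 | 0
৭ (7) | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 399 | 0 | 0
৮ (8) | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 398 | 0
৯ (9) | 0 | 8 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 390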
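The figures quoted for Table 2 (41 errors out of 4000, 386 of 400 for “১”, the “১”/“৯” confusions) follow directly from the matrix. The sketch below transcribes Table 2 into a NumPy array (rows: true numeral, columns: predicted numeral, both in the order ০ to ৯) and derives overall accuracy, per-class recall and the largest off-diagonal confusions; the function names are hypothetical.

```python
import numpy as np

BANGLA = "০১২৩৪৫৬৭৮৯"
CM = np.array([
    [398, 1, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 386, 1, 1, 2, 0, 3, 1, 1, 5],
    [0, 0, 400, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 399, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 400, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 2, 392, 2, 3, 0, 0],
    [0, 0, 0, 1, 0, 2, 397, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 399, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0, 398, 0],
    [0, 8, 0, 0, 0, 0, 0, 2, 0, 390],
])


def overall_accuracy(cm):
    """Fraction of samples on the diagonal."""
    return np.trace(cm) / cm.sum()


def per_class_recall(cm):
    """Correctly classified samples of each true class over that class's total."""
    return np.diag(cm) / cm.sum(axis=1)


def most_confused(cm, k=3):
    """Top-k (true, predicted, count) off-diagonal cells."""
    off = cm.copy()
    np.fill_diagonal(off, 0)
    flat = np.argsort(off, axis=None)[::-1][:k]
    return [(int(i) // cm.shape[1], int(i) % cm.shape[1], int(off.flat[i]))
            for i in flat]


print(f"accuracy = {overall_accuracy(CM):.5f}, errors = {CM.sum() - np.trace(CM)}")
for t, p, n in most_confused(CM, k=2):
    print(f"{BANGLA[t]} -> {BANGLA[p]}: {n}")
```

The overall accuracy of 3959/4000 = 98.975% rounds to the 98.98% reported for the best sCNNRP, and the two largest confusions are exactly the “৯” to “১” and “১” to “৯” cases discussed in the text.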

Table 3: Sample Bangla handwritten numerals misclassified by sCNNRP. Handwritten Numeral Image

Image Classified as

Image in Category

Handwritten Numeral Image

Image Classified as

Image in Category

৯

১

৩

৬

২

১

৪

১

৭

৪

০

৫

১

৯

৬

৫

১

৯

৫

৬

Table 4 compares the outcomes of the proposed mCNNRP and sCNNRP methods with other prominent works on Bangla handwritten numeral recognition. The table also summarizes the distinctive features of each method (feature selection, classification technique and dataset) to give a brief overview of the systems. Notably, the proposed methods do not employ any feature selection technique, whereas most existing methods use one or two stages of feature selection. The dataset used in this study contains a sufficient number of training and test patterns. Without feature selection, each of the proposed methods outperforms the existing ones: the proposed mCNNRP and sCNNRP achieved test set recognition accuracies of 98.88% and 98.98%, respectively, whereas the test set accuracies reported on the same dataset by [3] and [9] are 95.10% and 98.20%, respectively. Note that the authors of [9] prepared this benchmark dataset themselves, and the result of [9] is the best among the existing methods. Moreover, the training set actually used in [9] is much larger (i.e., 10 times larger) than the original one, and several times larger than the training set used in the proposed methods. The proposed methods also outperformed [9] in terms of training set accuracy: [9] correctly recognized 99.14% of its training samples, whereas mCNNRP and sCNNRP correctly recognized 99.86% and 99.29%, respectively. Besides recognition performance, the proposed methods are simpler than the existing methods because they require no feature selection. Moreover, the substantial improvement of the proposed methods over the recent work [16], which used a single CNN, demonstrates the effectiveness of incorporating rotation-based generated patterns in CNN training. Overall, the proposed methods prove to be effective Bangla handwritten numeral recognition systems.

Table 4: Performance (i.e., test set recognition accuracy) comparison of proposed mCNNRP and sCNNRP with prominent methods on handwritten Bangla numeral recognition.

Work Reference and Year | Feature Selection | Classification | Dataset; Training and Test Samples | Test Set Recognition Accuracy
Pal et al. [2], 2006 | Water overflow from the reservoir based feature selection | Binary decision tree | Self-prepared; 12000 | 92.80%
Basu et al. [3], 2005 | Shadow feature and centroid feature | MLPs with Dempster-Shafer technique | Samples from CVPR, ISI, India [13]; 4000 and 2000 | 95.10%
Wen et al. [12], 2007 | Principal component analysis (PCA) and kernel PCA | SVM | Dhaka automatic letter sorting machine; 6000 and 10000 | 95.05%
Bhattacharya and Chaudhuri [9], 2009 | Wavelet filter at different resolutions | Four MLPs in two stages (three + one) | CVPR, ISI [13]; 19392 and 4000 | 98.20%
Wen and He [14], 2012 | Eigenvalues and eigenvectors | Kernel and Bayesian discriminant (KBD) | Dhaka automatic letter sorting machine; 30000 and 15000 | 96.91%
Das et al. [13], 2012 | GA to select an optimal local feature set from five different local feature sets | SVM | CMATERdb 3.1.1 [14]; 4000 and 2000 | 97.70%
Nasir and Uddin [15], 2013 | K-means clustering, Bayes' theorem and maximum a posteriori | SVM | Self-prepared; 300 | 96.80%
Akhand et al. [16], 2015 | No | CNN | From CVPR, ISI [13]; 14000 and 3000 | 97.93%
Proposed mCNNRP | No | CNN | CVPR, ISI [13]; 18000 and 4000 | 98.88%
Proposed sCNNRP | No | CNN | CVPR, ISI [13]; 18000 and 4000 | 98.98%


5.3. Recognition Results on Devnagari Numeral

The experiments for Devnagari handwritten numeral recognition with the proposed mCNNRP and sCNNRP follow the same procedure as for Bangla. Fig. 12 and Fig. 13 show the test set accuracy of the proposed mCNNRP and sCNNRP, respectively, for two BS values (50 and 100) with the rotational angle (ɵ) varied from 10 to 40 degrees. As for Bangla (Fig. 10), mCNNRP with any rotational angle performs better than sCNN on Devnagari, but the improvement from the generated patterns is considerably larger on Devnagari than on Bangla. The observation for sCNNRP on Devnagari is also similar to that on Bangla (Fig. 11): accuracy improves with rotational angle only up to a certain level, and with larger rotational angles it becomes even worse than that of sCNN. For BS = 100, the test set accuracy with ɵ = 10 is much better than that of sCNN, but the accuracy with ɵ = 40 is much worse. Moreover, as for Bangla, sCNNRP is more sensitive than mCNNRP to increasing rotational angle, since larger angles generate some confusing patterns.

[Fig. 12 plots: test set accuracy (%) versus iteration (0 to 300) for sCNN and for mCNNRP with ɵ = 10, 20, 30 and 40. (a) Test set accuracy of mCNNRP for BS = 50. (b) Test set accuracy of mCNNRP for BS = 100.]

Fig. 12: Test set accuracy of proposed mCNNRP for BS = 50 and 100, varying the rotational angle (ɵ) from 10 to 40 degrees, on handwritten Devnagari numerals.


[Fig. 13 plots: test set accuracy (%) versus iteration (0 to 300) for sCNN and for sCNNRP with ɵ = 10, 20, 30 and 40. (a) Test set accuracy of sCNNRP for BS = 50. (b) Test set accuracy of sCNNRP for BS = 100.]

Fig. 13: Test set accuracy of proposed sCNNRP for BS = 50 and 100, varying the rotational angle (ɵ) from 10 to 40 degrees, on handwritten Devnagari numerals.

Table 5 presents the test set accuracies of both mCNNRP and sCNNRP after 300 iterations for different batch sizes and rotational angles (ɵ) used for pattern generation. As for Bangla, the table shows that mCNNRP outperformed sCNN for every rotational angle and BS value, with smaller BS values and relatively large ɵ giving much better performance. In contrast, sCNNRP with a small BS value and a large ɵ value is inferior to sCNN, and its best performance is obtained with a small ɵ value and a relatively large BS value. We also recorded the recognition accuracy of the system at various fixed numbers of iterations for different BS and ɵ values. The best test set accuracy of mCNNRP, 99.31%, was achieved with BS = 75 and ɵ = 40 at 750 iterations, misclassifying 26 of the 3763 test patterns. The best test set accuracy of sCNNRP was 98.96%, with BS = 100 and ɵ = 5 at 1450 iterations.

Table 5: Performance evaluation of mCNNRP and sCNNRP for different Batch Sizes (BS) and Rotational angles (ɵ) after 300 iterations for handwritten Devnagari numerals (test set accuracy in %).

Batch Size | sCNN | mCNNRP ɵ = 10 | mCNNRP ɵ = 20 | mCNNRP ɵ = 30 | mCNNRP ɵ = 40 | sCNNRP ɵ = 10 | sCNNRP ɵ = 20 | sCNNRP ɵ = 30 | sCNNRP ɵ = 40
10 | 97.74 | 98.94 | 99.04 | 99.07 | 99.07 | 98.49 | 98.17 | 97.56 | 96.49
25 | 97.95 | 98.94 | 98.88 | 99.02 | 98.88 | 98.49 | 98.25 | 97.66 | 97.29
50 | 98.03 | 98.67 | 98.75 | 98.86 | 98.75 | 98.64 | 98.17 | 97.56 | 97.16
100 | 98.06 | 98.64 | 98.86 | 98.86 | 98.88 | 98.49 | 98.38 | 97.77 | 96.73
150 | 97.69 | 98.35 | 98.51 | 98.70 | 98.57 | 98.46 | 98.22 | 97.95 | 96.63


Table 6 shows the confusion matrix of the test samples for the best-performing mCNNRP. The table shows that the method correctly classified all test examples of “२” and “८” and misclassified only one sample of “०”. On the other hand, mCNNRP performed worst on the numeral “७”, correctly classifying 370 of 378 cases; in seven cases it was misclassified as “६”. Table 7 shows some of the 26 handwritten numeral images misclassified by mCNNRP. Large variation in writing styles, possibly combined with the rotation-based generated training patterns, may explain the misclassification of such images.

Table 6: Confusion matrix produced for test samples of Devnagari handwritten numerals in case of best performed mCNNRP. Rows give the true numeral and its total number of test samples; the remaining columns give the number of samples classified as each numeral (Devnagari numeral with its English equivalent).

True numeral | Total samples | ० (0) | १ (1) | २ (2) | ३ (3) | ४ (4) | ५ (5) | ६ (6) | ७ (7) | ८ (8) | ९ (9)
० (0) | 369 | 368 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0
१ (1) | 378 | 0 | 376 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0
२ (2) | 378 | 0 | 0 | 378 | 0 | 0 | 0 | 0 | 0 | 0 | 0
३ (3) | 377 | 0 | 0 | 2 | 373 | 0 | 0 | 0 | 2 | 0 | 0
४ (4) | 376 | 0 | 1 | 0 | 0 | 374 | 1 | 0 | 0 | 0 | 0
५ (5) | 378 | 0 | 0 | 0 | 0 | 4 | 374 | 0 | 0 | 0 | 0
६ (6) | 374 | 0 | 0 | 0 | 0 | 0 | 0 | 372 | 0 | 1 | 1
७ (7) | 378 | 0 | 0 | 0 | 0 | 1 | 0 | 7 | 370 | 0 | 0
८ (8) | 377 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 377 | 0
९ (9) | 378 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 375
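Because Table 6 includes a "Total samples" column, it can be cross-checked against the figures given in the text (3763 test patterns, 26 errors, 99.31%). The sketch below transcribes the table into NumPy (rows: true numeral; columns: predicted numeral, ० to ९), verifies that every row sums to its total, and locates the class with the lowest recall; the function name is hypothetical.

```python
import numpy as np

DEVNAGARI = "०१२३४५६७८९"
TOTALS = np.array([369, 378, 378, 377, 376, 378, 374, 378, 377, 378])
CM = np.array([
    [368, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 376, 0, 0, 0, 0, 0, 2, 0, 0],
    [0, 0, 378, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 2, 373, 0, 0, 0, 2, 0, 0],
    [0, 1, 0, 0, 374, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 4, 374, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 372, 0, 1, 1],
    [0, 0, 0, 0, 1, 0, 7, 370, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 377, 0],
    [0, 0, 3, 0, 0, 0, 0, 0, 0, 375],
])


def summarise(cm, totals):
    """Check row sums against the totals; return (n, errors, accuracy, worst class)."""
    if not np.array_equal(cm.sum(axis=1), totals):
        raise ValueError("a row does not sum to its 'Total samples' entry")
    n = int(totals.sum())
    correct = int(np.trace(cm))
    worst = int(np.argmin(np.diag(cm) / totals))
    return n, n - correct, correct / n, worst


n, errors, acc, worst = summarise(CM, TOTALS)
print(f"{n} samples, {errors} errors, accuracy {100 * acc:.2f}%, "
      f"lowest recall: {DEVNAGARI[worst]}")
```

Every row matches its total, the matrix reproduces the 26 errors and 99.31% accuracy reported for the best mCNNRP, and “७” has the lowest recall, in line with the discussion above.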

Table 7: Sample Devnagari handwritten numerals misclassified by mCNNRP. Handwritten Numeral Image

Image Classified as

Image in Category

२

Handwritten Numeral Image

Image Classified as

Image in Category

०

४

७

१

४

५ ५

२

३

६

७

२

३

६

७

५

४

२

९

Table 8 compares the outcomes of the proposed mCNNRP and sCNNRP methods with other prominent works on Devnagari handwritten numeral recognition. It also summarizes the distinctive features of each method; unlike the existing methods, the proposed methods do not employ any feature selection technique. Among the existing methods, the best test set accuracy, 99.03%, is achieved by [9]; on the same test set, the proposed mCNNRP achieved 99.31%. As for Bangla, the proposed methods also outperformed [9] in training set accuracy: [9], whose training set is 10 times the original one, correctly recognized 99.27% of its training samples, whereas mCNNRP and sCNNRP correctly recognized 99.96% and 99.86% of the training samples, respectively. Overall, after training with rotation-based generated patterns, the proposed methods prove to be effective recognition systems for Devnagari handwritten numerals.

Table 8: Performance (i.e., test set recognition accuracy) comparison of proposed mCNNRP and sCNNRP with prominent methods on handwritten Devnagari numeral recognition.

Work Reference and Year | Feature Selection | Classification | Dataset; Training and Test Samples | Test Set Recognition Accuracy
Bajaj et al. [17], 2002 | Density feature, moment feature and segment feature | Kohonen self-organising map, multi-layer perceptron and Bayesian classifier | Self-prepared; 340 and 2460 | 89.68%
Bhattacharya and Chaudhuri [9], 2009 | Wavelet filter at different resolutions | Four MLPs in two stages (three + one) | CVPR, ISI [13]; 18794 and 3763 | 99.04%
Kumar and Ravulakollu [20], 2014 | Gaussian pyramid and other techniques | K-nearest neighbor (KNN) and others | CAPR-2012 database; 24000 and 11000 | 96.93%
Kumar and Ravulakollu [21], 2014 | Profile based, Kirsch and wavelet transforms | Ensemble of feed-forward network (NN), cascade neural network and K-nearest neighbor | CAPR-2012 database; 24000 and 11000 | 97.87%
Singh et al. [22], 2014 | Gradient feature decomposition for feature generation and an information theoretic based method for feature subset creation | NNs and ensemble of NNs | CVPR (ISI); 1400 and 600 | 98.53%
Goyal and Garg [23], 2014 | Structural and statistical features | Decision based classification | Self-prepared; total 500 | 96.80%
Proposed mCNNRP | No | CNN | CVPR, ISI [13]; 18000 and 3763 | 99.31%
Proposed sCNNRP | No | CNN | CVPR, ISI [13]; 18000 and 3763 | 98.96%


6. Conclusions

Handwritten numeral recognition is a high-dimensional, complex task, and it is more challenging for the Bangla and Devnagari scripts, which contain similarly shaped numerals. CNNs have recently proven very effective for image classification owing to their distinctive features. In this study, two CNN-based recognition systems incorporating rotation-based generated patterns (mCNNRP and sCNNRP) have been investigated for Bangla and Devnagari handwritten numerals. In mCNNRP, three training sets are prepared (one with ordinary patterns and two generated by rotating the images in the clockwise and anticlockwise directions), and three CNNs are trained individually on them; the system output is then obtained by combining the decisions of the three CNNs. In sCNNRP, the union of these three training sets is used to train a single CNN. The proposed methods differ from traditional handwritten numeral recognition methods in several significant ways. The key feature of this study is the use of rotation-based generated patterns in CNN training. Existing CNN-based studies rely on the CNN's tolerance to rotation and translation and train only with ordinary patterns, whereas in the present study, rotation-based generated patterns used together with ordinary patterns are found to enhance the recognition performance of the CNN. The proposed methods are simpler than existing ones and do not require any feature selection scheme. Moreover, a moderate pre-processing technique is employed when generating training patterns from the scanned handwritten images. The proposed methods have been tested on large benchmark datasets of handwritten Bangla and Devnagari numerals. Their outcomes have been compared with those of existing prominent methods, and the proposed methods are shown to outperform them in both training and test set accuracy.
Moreover, the experimental results clearly demonstrate the effectiveness of incorporating rotation-based generated patterns in CNN training to improve recognition performance. The present study also suggests several directions for future work. Here, two patterns are generated from each image by rotating it in the clockwise and anticlockwise directions, and the rotational angle is a user-defined parameter; additional patterns generated with other angles might give better performance and remain a topic for future study. Similarly, only three CNNs are trained in the multiple-CNN case, and training more CNNs might also improve performance.

Acknowledgment

The authors are grateful to Dr. Ujjwal Bhattacharya (Computer & Communication Sciences Division, Indian Statistical Institute, Kolkata, India) for providing the benchmark dataset used in this study.


References

[1] R. Plamondon and S. N. Srihari, “On-line and off-line handwriting recognition: A comprehensive survey,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, pp. 62-84, 2000.

[2] U. Pal, C. B. B. Chaudhuri and A. Belaid, “A System for Bangla Handwritten Numeral Recognition,” IETE Journal of Research, Institution of Electronics and Telecommunication Engineers, vol. 52, no. 1, pp. 27-34, 2006.

[3] S. Basu, R. Sarkar, N. Das, M. Kundu, M. Nasipuri and D. K. Basu, “Handwritten Bangla Digit Recognition Using Classifier Combination Through DS Technique,” LNCS, vol. 3776, pp. 236-241, 2005.

[4] Y. LeCun and Y. Bengio, “Pattern Recognition and Neural Networks,” in M. A. Arbib (Ed.), The Handbook of Brain Theory and Neural Networks, MIT Press, 1995.

[5] Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, November 1998.

[6] Y. LeCun, Y. Bengio and G. Hinton, “Deep Learning,” Nature, vol. 521, pp. 436-444, 2015.

[7] Off-Line Handwritten Bangla Numeral Database, http://www.isical.ac.in/~ujjwal/download/database.html

[8] CMATERdb 3.1.1: Handwritten Bangla Numeral Database, http://code.google.com/p/cmaterdb/ (accessed July 12, 2015).

[9] U. Bhattacharya and B. B. Chaudhuri, “Handwritten numeral databases of Indian scripts and multistage recognition of mixed numerals,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 31, no. 3, pp. 444-457, 2009.

[10] M. R. Bashar, M. A. F. M. R. Hasan, M. A. Hossain and D. Das, “Handwritten Bangla Numerical Digit Recognition using Histogram Technique,” Asian Journal of Information Technology, vol. 3, pp. 611-615, 2004.

[11] M. M. R. Khan, S. M. A. Rahman and M. M. Alam, “Bangla Handwritten Digits Recognition using Evolutionary Artificial Neural Networks,” in Proc. of the 7th International Conference on Computer and Information Technology (ICCIT 2004), Dhaka, Bangladesh, 26-28 December 2004.

[12] Y. Wen, Y. Lu and P. Shi, “Handwritten Bangla numeral recognition system and its application to postal automation,” Pattern Recognition, vol. 40, pp. 99-107, 2007.

[13] N. Das, R. Sarkar, S. Basu, M. Kundu, M. Nasipuri and D. K. Basu, “A genetic algorithm based region sampling for selection of local features in handwritten digit recognition application,” Applied Soft Computing, vol. 12, pp. 1592-1606, 2012.

[14] Y. Wen and L. He, “A classifier for Bangla handwritten numeral recognition,” Expert Systems with Applications, vol. 39, pp. 948-953, 2012.

[15] M. K. Nasir and M. S. Uddin, “Hand Written Bangla Numerals Recognition for Automated Postal System,” IOSR Journal of Computer Engineering (IOSR-JCE), vol. 8, no. 6, pp. 43-48, 2013.

[16] M. A. H. Akhand, Md. Mahbubar Rahman, P. C. Shill, Shahidul Islam and M. M. Hafizur Rahman, “Bangla Handwritten Numeral Recognition using Convolutional Neural Network,” in Proc. of International Conference on Electrical Engineering and Information & Communication Technology (iCEEiCT 2015), Dhaka, Bangladesh, pp. 1-5, May 21-23, 2015.

[17] R. Bajaj, L. Dey and S. Chaudhary, “Devnagari numeral recognition by combining decision of multiple connectionist classifiers,” Sadhana, vol. 27, part 1, pp. 59-72, Feb. 2002.

[18] S. K. Shrivastava and S. S. Gharde, “Support Vector Machine for Handwritten Devnagari Numeral Recognition,” International Journal of Computer Applications, vol. 7, no. 11, pp. 9-14, 2010.

[19] P. Singh and N. Tyagi, “Radial Basis Function for Handwritten Devnagari Numeral Recognition,” International Journal of Advanced Computer Science and Applications (IJACSA), vol. 2, no. 5, pp. 126-129, 2011.

[20] R. Kumar and K. K. Ravulakollu, “Offline Handwritten Devnagari Digit Recognition,” ARPN Journal of Engineering and Applied Sciences, vol. 9, no. 2, pp. 109-115, Feb. 2014.

[21] R. Kumar and K. K. Ravulakollu, “Handwritten Devnagari Digit Recognition: Benchmarking on New Dataset,” Journal of Theoretical and Applied Information Technology, vol. 60, no. 3, pp. 543-555, Feb. 2014.

[22] P. Singh, A. Verma and N. S. Chaudhari, “Devanagri Handwritten Numeral Recognition using Feature Selection Approach,” I.J. Intelligent Systems and Applications, MECS Press, vol. 6, no. 12, pp. 40-47, Nov. 2014.

[23] M. Goyal and N. K. Garg, “Handwritten Devnagari Numeral Recognition using Structural and Statistical Features,” International Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS), vol. 9, pp. 324-329, 2014.

[24] P. Melville and R. J. Mooney, “Creating diversity in ensembles using artificial data,” Information Fusion, vol. 6, pp. 99-111, 2005.

[25] M. A. H. Akhand and K. Murase, “Ensembles of Neural Networks based on the Alteration of Input Feature Values,” International Journal of Neural Systems, vol. 22, issue 1, pp. 77-87, 2012.

[26] Convolutional Neural Network in Deep Learning Toolbox. Available at https://github.com/rasmusbergpalm/DeepLearnToolbox/tree/master/CNN

[27] Md. Mahbubar Rahman, M. A. H. Akhand, Shahidul Islam, Pintu Chandra Shill and M. M. Hafizur Rahman, “Bangla Handwritten Character Recognition using Convolutional Neural Network,” I.J. Image, Graphics and Signal Processing (IJIGSP), vol. 7, no. 3, pp. 42-49, 2015. doi: 10.5815/ijigsp.2015.08.05.
