2017 3rd International Conference on Electrical Information and Communication Technology (EICT), 7-9 December 2017, Khulna, Bangladesh
Bangla Handwritten Digit Classification and Recognition Using SVM Algorithm with HOG Features
Hasin Rehana
Department of Computer Science & Engineering, Rajshahi University of Engineering & Technology, Bangladesh
Abstract— This paper presents a basic approach to multiclass classification for handwritten digit recognition using a Support Vector Machine (SVM). It also gives a comparative accuracy analysis of three well-known kernel functions (linear, RBF and polynomial) and of feature vectors corresponding to different cell sizes. Digit recognition involves several basic steps: preprocessing, feature extraction and classification. Among them, feature extraction is fundamental, because accurate and distinctive features play an important role in enhancing the performance of a classifier. The Histogram of Oriented Gradients (HOG) feature extraction technique is used here. Across the various cell sizes, the experimental results show around 98-100% accuracy on the training data and 91-97% accuracy on the test data, depending on the kernel function. The aim of this paper is to select the kernel function best suited to a particular image resolution.
Keywords— Supervised Learning, HOG, Feature Extraction, Classification, Support Vector Machine, Kernel Function.
Digit recognition is a subset of Optical Character Recognition (OCR). It can be divided into two types: handwritten and fixed-font printed digit recognition. The main challenge of handwritten digit recognition is dealing with a wide variety of writing styles. Its objective is to read digits from scanned images of envelopes, license plates, bank cheques, etc., and digitize them for storage and further use. Handwritten digit recognition is more difficult than printed digit recognition because the size, shape and style of digits vary from writer to writer. The accuracy of classification and recognition largely depends on the feature extraction method; in this paper, the Histogram of Oriented Gradients technique is used to obtain features. The rest of the paper is organized as follows: section II describes the necessary steps of digit recognition, namely the preprocessing, feature extraction and classification methodology. The main focus of this paper is section III, which presents a comparative performance analysis of three major SVM kernels (linear, RBF and polynomial). Experimental results are reported in section IV to compare the accuracy of digit recognition for different kernel functions and cell sizes. Finally, section V concludes the experimental analysis.
978-1-5386-2307-7/17/$31.00 ©2017 IEEE
The digit recognition process is mainly divided into three major parts: preprocessing, feature extraction and classification. The steps are shown in figure 1.
A. Preprocessing
The steps performed before feature extraction are called preprocessing. The aim of preprocessing is to improve the image data by suppressing unwanted distortions or enhancing image features that are important for further processing. Preprocessing includes image acquisition, binarization, noise removal, skew detection, segmentation and scaling.
1) Image Acquisition: Images can be captured with any device that has a camera or scanner. Images from PDF files can also be fed into the system. An image may contain a single digit or a set of digits collected from a number plate, bank cheque, postal code, etc.
2) Binarization of image: The RGB image is converted to gray scale before it is binarized. Binarization uses a single fixed (global) threshold computed with Otsu's method.
3) Noise Removal: Noise removal reduces the probability of misclassification caused by low image quality. A median filter, a widely used smoothing technique, is applied here.
4) Skew Detection: Skew usually occurs because of the angular placement of the image during acquisition. It is removed by rotating the image by the opposite of the estimated skew angle.
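Steps 2) and 3) can be illustrated with a minimal pure-Python sketch. It assumes an 8-bit gray image stored as a list of rows, uses the common ITU-R BT.601 luma weights for the RGB-to-gray conversion, and applies a 3×3 median window with replicated borders to the binary image (following the order of steps 2 and 3). The function names, the window size and the choice to mark ink pixels as 1 (white in the binary image) are assumptions for illustration, not details stated in the paper.

```python
def rgb_to_gray(rgb):
    """Convert rows of (R, G, B) tuples to 8-bit gray levels (ITU-R BT.601 luma weights)."""
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row] for row in rgb]


def otsu_threshold(gray, levels=256):
    """Return the gray level t that maximises the between-class variance (Otsu's method)."""
    hist = [0] * levels
    for row in gray:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    w0 = sum0 = 0                  # pixel count and intensity sum of the class with level <= t
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue               # one class is empty: no valid split at this t
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2   # proportional to the between-class variance
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(gray, t):
    """Dark ink (level <= t) becomes 1 (white in the binary image); background becomes 0."""
    return [[1 if p <= t else 0 for p in row] for row in gray]


def median_filter3(img):
    """3x3 median filter; pixels outside the image are replaced by the nearest border pixel."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = sorted(
                img[min(max(r + dr, 0), h - 1)][min(max(c + dc, 0), w - 1)]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            )
            out[r][c] = window[4]  # middle of the 9 sorted values
    return out


if __name__ == "__main__":
    # Synthetic 6x8 gray image: light background (200), a dark two-pixel-wide
    # vertical stroke (40) and one isolated dark noise pixel (0).
    gray = [[40 if c in (2, 3) else 200 for c in range(8)] for _ in range(6)]
    gray[4][6] = 0
    t = otsu_threshold(gray)
    binary = binarize(gray, t)
    cleaned = median_filter3(binary)
    print(t, sum(map(sum, binary)), sum(map(sum, cleaned)))  # 40 13 12
```

In this toy image the threshold separates the stroke from the background, and the median filter removes the isolated noise pixel while leaving the two-pixel-wide stroke intact.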
Figure 1. Fundamental steps of digit classification.
5) Segmentation: Digit segmentation decomposes an image into sub-images of individual digits. It includes line segmentation and character segmentation. Line segmentation is performed by scanning the image horizontally and checking whether the number of white pixels in each row reaches a particular value. Digit segmentation is then done by scanning each line vertically to find the gaps between digits, and the resulting sub-images are stored.
6) Scaling: To compare feature vectors, each digit has to be scaled to a fixed size. With the increased size of
the image, more features can be extracted, leading to higher accuracy, but the memory requirement and time complexity increase as well. Conversely, smaller images provide fewer features, leading to lower accuracy. To balance feature size and processing time, all images are scaled to 32x32 matrices.
B. Feature Extraction
Feature extraction plays an important role in image processing and machine learning. Feature extraction techniques are applied to obtain a feature vector that is effective for classifying and recognizing an image or a class. A feature vector contains information that describes an object's important characteristics. A well-defined feature vector contains discriminating information, which distinguishes one class from another. In image processing, features can take many forms: the raw intensity value of each pixel, color components, length, area, gradient magnitude, gray-level intensity value, etc. Feature extraction generates a new, more compact set of features from a large set of original features. There are various feature extraction techniques, such as zoning, the Fourier transform and Gabor filters. In this paper, Histogram of Oriented Gradients (HOG) features are used. HOG feature extraction mainly includes gradient calculation, histogram generation and block normalization. First, the image is divided into predefined, equal-sized small regions called cells. Each cell has a one-dimensional histogram of gradient orientations. The gradient orientation is quantized into 9 bins spanning 0° to 180°. The cells are grouped into larger units called blocks. These blocks overlap, so cells are shared among neighboring blocks, as shown in figure 2. This overlap may seem redundant, but it improves performance. Each block contains n × n × 9 elements for a block size of [n n]. Block normalization is then applied as a local contrast normalization, which makes the features robust to changes in illumination and contrast.
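The scaling and HOG steps above can be sketched in pure Python. This is a simplified illustration under stated assumptions, not the extractor used in the paper: rescaling uses nearest-neighbour sampling, gradients use the centered [-1, 0, 1] mask, each pixel votes its gradient magnitude into one of 9 unsigned orientation bins (no interpolation between bins or cells), and blocks of 2×2 cells moved one cell at a time are L2-normalized. The 2×2 block size is an assumption, since the text only gives the general n × n × 9 formula; with a 32×32 image and 8×8 cells it yields 3 × 3 overlapping blocks of 36 values, i.e. 324 features, which agrees with the feature length reported for cell size [8 8].

```python
import math


def resize_nearest(img, size=32):
    """Nearest-neighbour rescaling of a 2-D list to size x size."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)] for r in range(size)]


def gradients(img):
    """Centered [-1, 0, 1] differences; border pixels reuse the nearest edge pixel."""
    h, w = len(img), len(img[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            gx[r][c] = img[r][min(c + 1, w - 1)] - img[r][max(c - 1, 0)]
            gy[r][c] = img[min(r + 1, h - 1)][c] - img[max(r - 1, 0)][c]
    return gx, gy


def cell_histograms(img, cell=8, bins=9):
    """One histogram per cell: gradient magnitude accumulated over unsigned angles 0-180 deg."""
    gx, gy = gradients(img)
    n_rows, n_cols = len(img) // cell, len(img[0]) // cell
    bin_width = 180.0 / bins
    hists = [[[0.0] * bins for _ in range(n_cols)] for _ in range(n_rows)]
    for r in range(n_rows * cell):
        for c in range(n_cols * cell):
            magnitude = math.hypot(gx[r][c], gy[r][c])
            angle = math.degrees(math.atan2(gy[r][c], gx[r][c])) % 180.0
            b = min(int(angle // bin_width), bins - 1)  # hard vote, no interpolation
            hists[r // cell][c // cell][b] += magnitude
    return hists


def hog(img, cell=8, block=2, bins=9, eps=1e-6):
    """Concatenate L2-normalised blocks of block x block cells, moved one cell at a time."""
    hists = cell_histograms(img, cell, bins)
    n_rows, n_cols = len(hists), len(hists[0])
    features = []
    for br in range(n_rows - block + 1):
        for bc in range(n_cols - block + 1):
            v = [x for r in range(br, br + block)
                 for c in range(bc, bc + block)
                 for x in hists[r][c]]
            norm = math.sqrt(sum(x * x for x in v) + eps * eps)
            features.extend(x / norm for x in v)
    return features


if __name__ == "__main__":
    # Synthetic 48x40 "digit": a bright vertical stroke on a dark background.
    raw = [[255 if 18 <= c <= 21 else 0 for c in range(40)] for _ in range(48)]
    img = resize_nearest(raw, 32)
    print(len(hog(img, cell=8)), len(hog(img, cell=4)))  # 324 1764
```

Halving the cell size to 4×4 gives 7 × 7 blocks and 1764 features, which shows how quickly the feature length grows as the cell size shrinks.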
Figure 4. Extracted HOG features of "0" for different cell sizes.
A simplified version of the HOG feature extraction steps is shown in figure 3. The HOG features of a sample digit are shown in figure 4 for three different cell sizes, to illustrate the effect of cell size, i.e. resolution, on the HOG features. The cell size must be small enough to capture the structure of the digits but large enough for the analysis to be efficient.
C. Classification
Classification is the task of identifying the category of test data on the basis of a model learned from the features extracted from the training data set. If a training set of correctly identified observations is available for training a classification algorithm, the setting is called supervised learning. Well-known classification algorithms include support vector machines, decision trees, neural networks and the naïve Bayes classifier. There are two types of classification: binary classification involves exactly two classes, whereas multiclass classification is the problem of classifying instances into more than two classes.
• Multiclass SVM classification
Support vector machines are supervised learning models based on the idea of maximizing the margin, i.e. maximizing the minimum distance from the separating hyperplane to the nearest example. A supervised learning algorithm analyzes the training data and produces an inferred function that can be used to map new examples. The basic SVM supports only binary classification, but extensions have been proposed to handle multiclass problems. In these extensions, additional parameters and constraints are added to the optimization problem to handle the separation of the different classes. Here, K(K−1)/2 binary SVM models are trained using a one-vs.-one coding design; for the K = 10 digit classes this gives 45 binary models.
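The one-vs.-one coding design can be sketched as follows: one binary model is trained for every unordered pair of classes, and at prediction time each model votes for one of its two classes. To keep the sketch self-contained, the binary learner `train_binary` is a hypothetical stand-in (a nearest-class-mean linear rule), not an SVM; in the setting described above each pairwise model would be a binary SVM with the chosen kernel. The tie-breaking rule (smallest label wins) is also an assumption.

```python
from collections import Counter
from itertools import combinations


def train_binary(pos, neg):
    """HYPOTHETICAL stand-in for a binary SVM: nearest-class-mean linear rule.
    Returns (w, b); w . x + b > 0 means "positive class"."""
    dim = len(pos[0])
    m_pos = [sum(x[k] for x in pos) / len(pos) for k in range(dim)]
    m_neg = [sum(x[k] for x in neg) / len(neg) for k in range(dim)]
    w = [p - n for p, n in zip(m_pos, m_neg)]
    midpoint = [(p + n) / 2 for p, n in zip(m_pos, m_neg)]
    b = -sum(wk * mk for wk, mk in zip(w, midpoint))
    return w, b


def train_one_vs_one(X, y):
    """Train one binary model per unordered class pair: K*(K-1)/2 models in total."""
    models = {}
    for i, j in combinations(sorted(set(y)), 2):
        pos = [x for x, label in zip(X, y) if label == i]
        neg = [x for x, label in zip(X, y) if label == j]
        models[(i, j)] = train_binary(pos, neg)
    return models


def predict_one_vs_one(models, x):
    """Every pairwise model casts one vote; the class with the most votes wins."""
    votes = Counter()
    for (i, j), (w, b) in models.items():
        score = sum(wk * xk for wk, xk in zip(w, x)) + b
        votes[i if score > 0 else j] += 1
    return max(sorted(votes), key=lambda c: votes[c])  # ties go to the smallest label


if __name__ == "__main__":
    # Toy 2-D data: 10 classes (digits 0-9), three points near (10k, 0) for class k.
    X = [[10 * k + dx, dy] for k in range(10) for dx, dy in ((-1, 0), (1, 0), (0, 1))]
    y = [k for k in range(10) for _ in range(3)]
    models = train_one_vs_one(X, y)
    print(len(models), predict_one_vs_one(models, [30.5, 0]))  # 45 3
```

With 10 classes the dictionary holds 45 pairwise models, matching K(K−1)/2; a test point of class k wins all 9 pairs that involve k.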
Figure 2. Overlapping of cells in neighboring blocks.
Figure 3. HOG feature extraction steps.
Kernel methods are a class of pattern-analysis algorithms whose best-known member is the support vector machine. They rely on the kernel trick: a kernel function computes inner products in a high-dimensional feature space without constructing that space explicitly. Kernel functions can be defined on sequences, text, images, graphs and vectors. A kernel takes two inputs and returns a measure of how similar they are. For SVMs, there are a few well-known kernel functions, such as the linear kernel, the RBF kernel and the polynomial kernel. The linear kernel applies no mapping (it is the plain inner product of the inputs) and is the simplest method. The Gaussian, i.e. radial basis function (RBF), kernel is a popular kernel function used in various kernelized learning algorithms; in particular, it is commonly used in SVM classification. The polynomial kernel is also commonly used with SVMs and other kernelized models; it represents the similarity of vectors in a feature space over polynomials of the original variables, allowing non-linear models to be learned. These three kernels are listed in Table I. C and gamma are the parameters of a non-linear SVM with an RBF kernel. A hard-margin SVM tries to find a margin that separates all positive and negative examples, which results in a poorly fitting model if any of the examples are mislabeled. C is the penalty parameter of the soft-margin cost function: it trades off a wide margin against training errors, which mitigates this overfitting problem.
TABLE I.
Kernel | Kernel function $K(X_1, X_2)$
Linear | $X_1 \cdot X_2$
RBF (Gaussian) | $\exp\left(-\dfrac{\lVert X_1 - X_2 \rVert^2}{2\sigma^2}\right)$
Polynomial | $(1 + X_1 \cdot X_2)^p$
The parameter $\gamma = 1/(2\sigma^2)$ plays an important role in the Gaussian kernel. If $\gamma$ is large, the variance $\sigma^2$ is small, so each support vector has only a narrow, local influence. In terms of the bias-variance trade-off, a large $\gamma$ therefore leads to low-bias, high-variance models that tend to overfit, while a small $\gamma$ leads to smoother, higher-bias, lower-variance models. In practice, the scale is selected heuristically based on the variance/covariance structure of the data. The value of $\sigma$ used here is 1 (i.e. $\gamma = 0.5$), so the resulting Gaussian kernel is
$$K(X_1, X_2) = \exp\left(-0.5\,\lVert X_1 - X_2 \rVert^2\right) \tag{2}$$
On the other hand, the order of the polynomial kernel used here is $p = 3$, so the polynomial kernel function becomes
$$K(X_1, X_2) = (1 + X_1 \cdot X_2)^3 \tag{3}$$
Comparing equations (2) and (3), one might expect the polynomial kernel to give higher accuracy than the Gaussian kernel because it works in a higher-dimensional space. Strictly, however, the Gaussian kernel corresponds to an infinite-dimensional feature space, whereas the degree-3 polynomial kernel on $d$ input features corresponds to a finite one with $\binom{d+3}{3}$ dimensions, so any accuracy advantage of the polynomial kernel has to be established experimentally.
COMPARATIVE ACCURACY ANALYSIS AMONG DIFFERENT KERNELS
6000 handwritten Bangla digit images from Mendeley Data were divided into a training set (80%) and a test set (20%):
Number of training elements = 4800
Number of testing elements = 1200
Sample data are shown in figure 5.
Figure 5. Sample data set for training the system.
Training a linear SVM is very fast, whereas the RBF and polynomial kernels are comparatively slow. Despite this disadvantage, the RBF and polynomial kernels are very useful when only a small number of features is available. The efficiency of an SVM depends greatly on the size of the feature vector used: as the feature vector grows, both time complexity and space complexity increase. The relation among kernel functions, feature size and resulting accuracy is shown in figures 6 and 7. When the feature vector is much larger than the number of observations (for example, 4 times larger), the RBF kernel becomes very sensitive to outliers. Because of the high variance of the classifier margin, overfitting occurs and the accuracy of the SVM degrades through misclassification. In this case, the linear kernel is the better choice.
Figure 6. Cell size vs. accuracy (%) for three different SVM kernels (linear, RBF, polynomial).
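To make Table I and equations (2) and (3) concrete, the three kernels can be written as plain functions. This is a sketch only: vectors are Python lists, $\sigma = 1$ and $p = 3$ are the values stated for equations (2) and (3), and the helper `poly_feature_dim` (a hypothetical name) counts the monomials of degree at most $p$ in $d$ variables, i.e. the size of the explicit feature space whose inner product the polynomial kernel computes implicitly.

```python
import math


def linear_kernel(x1, x2):
    """Linear kernel of Table I: the plain inner product X1 . X2."""
    return sum(a * b for a, b in zip(x1, x2))


def rbf_kernel(x1, x2, sigma=1.0):
    """Gaussian (RBF) kernel exp(-||X1 - X2||^2 / (2 sigma^2)); sigma = 1 gives equation (2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def poly_kernel(x1, x2, p=3):
    """Polynomial kernel (1 + X1 . X2)^p; p = 3 gives equation (3)."""
    return (1.0 + linear_kernel(x1, x2)) ** p


def poly_feature_dim(d, p):
    """Number of monomials of degree <= p in d variables, i.e. the size of the explicit
    feature space whose inner product (1 + X1 . X2)^p computes implicitly."""
    return math.comb(d + p, p)


if __name__ == "__main__":
    a, b = [1.0, 2.0], [3.0, 4.0]
    print(linear_kernel(a, b), poly_kernel(a, b), round(rbf_kernel(a, b), 4))  # 11.0 1728.0 0.0183
    print(poly_feature_dim(324, 3))  # 5774275
```

For a 324-dimensional HOG vector (cell size [8 8]), `poly_feature_dim(324, 3)` is 5,774,275, which is why the polynomial kernel is evaluated directly on pairs of vectors rather than by building that feature space.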
Figure 7. Kernels vs. accuracy (%) with respect to 7 different cell sizes.
On the other hand, when a large data set is available but the number of features is limited, the RBF and polynomial kernels fit the data properly, resulting in better accuracy; the linear kernel does not work well in this case. However, if the extracted feature vector is very much smaller than the number of observations, using the RBF or polynomial kernel results in underfitting, and performance degrades due to misclassification. In such a situation, the linear kernel performed better. Table II describes the performance of the polynomial SVM classifier for 324 features.
TABLE II.
ACCURACY RATE FOR POLYNOMIAL KERNEL USING 324 FEATURES
No. of test data | Correctly identified data
A scatter plot of training labels vs. predicted labels for the polynomial kernel with a feature length of 324 is shown in fig. 8. The points lie along a straight line (predicted label equal to true label), indicating very high training accuracy. In contrast, the scatter plot of test labels vs. predicted labels in fig. 9 is more scattered, indicating some misclassified data.
Figure 8. Scatterplot of training labels vs. predicted labels for polynomial kernel with cell size [8 8].
Figure 9. Scatterplot of test labels vs. predicted labels for polynomial kernel with cell size [8 8].
The results summarized in Table III show that the linear SVM works very efficiently when the number of features is very large. When the feature vector is smaller than the number of observations, however, the RBF or polynomial kernel is preferred because it fits this kind of data set properly, resulting in higher accuracy than the linear SVM. Figures 6 and 7 present Table III graphically, and the reasons behind this accuracy behavior are briefly described in section III.
TABLE III.
COMPARATIVE ACCURACY FOR DIFFERENT KERNEL FUNCTIONS
Training set case no. | Feature length | Training accuracy (%) | Test accuracy (%) | Training accuracy (%) | Test accuracy (%) | Training accuracy (%) | Test accuracy (%)
In this paper, the comparative performance of three well-known kernels of the SVM classification algorithm has been investigated to find the most appropriate kernel function for the sample dataset of Bangla handwritten digits used here. Experimental results show that, using HOG features, handwritten digit recognition reaches at most 97.08% accuracy, with the polynomial kernel function. This performance depends mostly on the preprocessing and feature extraction techniques. The recognition rate could be improved further by combining more than one feature extraction technique.
ACKNOWLEDGMENT
Thanks to my project supervisor, Dr. Md. Ali Hossain (Assistant Professor, RUET), for his guidance, patience, suggestions and encouragement. Thanks to the anonymous reviewers of EICT-2017 for their insightful suggestions and comments, which allowed me to improve this paper. Any errors are my own and should not tarnish the reputation of these esteemed persons.
REFERENCES
J. Mantas, "An overview of character recognition methodologies," Pattern Recognition, vol. 19, no. 6, pp. 425-430, 1986.
B. B. Chaudhuri and U. Pal, "A complete printed Bangla OCR system," Pattern Recognition, vol. 31, no. 5, pp. 531-549, 1998.
Md. Mahbub Alam and M. Abul Kashem, "A complete Bangla OCR system for printed characters," JCIT, vol. 1, no. 01, pp. 30-35, 2010.
Md. Mojahidul Islam, Md. Imran Hossain and Md. Kislu Noman, "Bangla character recognition system is developed by using automatic feature extraction and XOR operation," Global Journal of Computer Science and Technology, 2013.
N. Otsu, "A threshold selection method from gray-level histograms," IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62-66, 1979.
Sumana Barman, Amit Kumar Samanta and Tai-hoon Kim, "Design of a view based approach for Bengali character recognition," Int. J. Advanced Science and Technology, 2010.
Nallapareddy Priyanka, Srikanta Pal and Ranju Mandal, "Line and word segmentation approach for printed documents," IJCA Special Issue on Recent Trends in Image Processing and Pattern Recognition (RTIPPR), pp. 30-36, 2010.
Gaurav Kumar and Pradeep Kumar Bhatia, "A detailed review of feature extraction in image processing systems," in Proc. Fourth International Conference on Advanced Computing & Communication Technologies (ACCT), IEEE, 2014.
Yuqian Li and Guangda Su, "Simplified histograms of oriented gradient features extraction algorithm for the hardware implementation," in Proc. International Conference on Computers, Communications, and Systems (ICCCS), IEEE, 2015.
Navneet Dalal and Bill Triggs, "Histograms of oriented gradients for human detection," in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. 1, IEEE, 2005.
Nibaran Das et al., "Handwritten Bangla basic and compound character recognition using MLP and SVM classifier," arXiv preprint arXiv:1002.4040, 2010.
Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
Nabeel Mohammed, Sifat Momen, Anowarul Abedin, Mithun Biswas, Rafiqul Islam, Gautam Shom and Md Shopon, "BanglaLekha-Isolated," Mendeley Data, v2, 2017.