Republic of Iraq
Ministry of Higher Education and Scientific Research
University of Technology
Department of Computer Science

Arabic Handwritten Text Recognition and Writer Identification

A Thesis Submitted to the Department of Computer Science of the University of Technology in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Computer Science

By
Mustafa Salam Kadhm AL-Shammari

Supervisor Asst. Prof. Dr. Alia K. Abdul Hassan

2016

In the name of Allah, the Most Gracious, the Most Merciful

"Allah will raise those who have believed among you and those who were given knowledge, by degrees."

Almighty Allah has spoken the truth.

(Surat Al-Mujadila: Verse 11)

This thesis is lovingly dedicated to my aunt “Montaha Jasim” (may Allah rest her soul in peace). Her support, encouragement and constant love have sustained me throughout my life.

Mustafa,

Linguistic Certification
This is to certify that this thesis, entitled "Hand Written Text Recognition for Security Application", prepared by "Mustafa Salam Kadhm" at the University of Technology / Department of Computer Science, has been reviewed linguistically. Its language was amended to meet the style of the English language.

Signature:
Name: Assist. Prof. Dr. Mutaz S. Abdul-Wahab
Date: 2 / 11 / 2016

Acknowledgements
First of all, all my thanks are addressed to Almighty Allah, who has guided my steps towards the path of knowledge; without His help and blessing, this thesis would not have progressed or seen the light. My sincere appreciation is expressed to my supervisor Dr. Alia K. Abdul Hassan for providing me with support, ideas and inspiration. I am extremely grateful to all members of the Computer Science Department of the University of Technology for their general support. Finally, I would never have been able to finish my thesis without the help of my friends and the support of my family and wife.

Thank you all!

Mustafa,

Abstract

Most governments and organizations have a huge number of handwritten documents generated by their daily processes. It is imperative to use computers to read the generated handwritten texts and make them editable and searchable. Therefore, handwritten recognition has lately become a very popular research topic, and the number of its possible applications is very large. It is capable of resolving complex problems and simplifying human activities by converting handwritten documents into digital form. However, Arabic handwritten text recognition is a complex process compared with that of other languages because Arabic handwritten text is cursive in nature. Therefore, this thesis proposes an Arabic handwritten text recognition and writer identification system based on segmenting the input handwritten text into handwritten sub-words. The system has two main modules, used for the recognition of the handwritten text and for the identification of the text's writer. The first module (module1) has six stages that work together to recognize the Arabic handwritten text and convert it into editable text. These stages are: image acquisition, segmentation, preprocessing, features base construction, classification and post-processing. The second module (module2) identifies the desired text's writer through several stages similar to those of module1. The system proposes an efficient and accurate segmentation algorithm that segments the input handwritten text into a number of sub-images, each containing an Arabic handwritten sub-word. Besides that, an image thresholding algorithm is proposed to convert the sub-images into binary form based on the fuzzy c-means clustering method. Furthermore, the binary sub-images go through a proposed noise removal algorithm in order to remove undesired pixels. After that, two groups of features are extracted from the handwritten sub-images. The first features group, used for module1, includes structural, statistical, Discrete Cosine Transform (DCT) and proposed Modified Histogram of Oriented Gradient (MHOG1) features. The second features group, used for module2, includes proposed MHOG2 and shape features. In addition, the best classification results are obtained by using the Support Vector Machine (SVM) classifier. An Arabic lexicon is proposed for the first module to convert the classified classes into editable Arabic text, and a writers' lexicon is proposed to assign the classified labels to the desired writers. In order to test the system performance, three Arabic handwritten databases are used: the AHDB database, the IESK-ArDB database and a proposed Arabic handwritten database. The results obtained by the first module were 96.317% for AHDB, 82% for IESK-ArDB and 98% for the proposed database using the SVM polynomial kernel. On the other hand, the results of the second module using the proposed handwritten database were 85% for the handwritten sub-word level and 100% for the handwritten text level approaches. Finally, the processing time of the proposed system is 6.2 seconds.

List of Contents

Abstract
List of Contents
List of Tables
List of Figures
List of Abbreviations
List of Algorithms

Chapter One: General Introduction
1.1 Introduction
1.2 Handwritten Recognition
1.2.1 Security Applications of Handwritten Recognition
1.2.2 Handwritten Text Recognition and Writer Identification
1.2.3 Handwritten Text Dependent and Text Independent
1.2.4 Arabic Language and Handwritten Recognition System
1.2.5 Handwritten Recognition Approaches
1.3 Literature Survey
1.4 Aim of Thesis
1.5 Thesis Contributions
1.6 Organization of Thesis

Chapter Two: Theoretical Background
2.1 Introduction
2.2 Handwritten Recognition
2.3 Classification of Text Recognition
2.3.1 Online Text Recognition
2.3.2 Offline Text Recognition
2.4 General Structure of Handwritten Text Recognition System
2.4.1 Image Acquisition
2.4.2 Preprocessing
2.4.3 Segmentation
2.4.4 Features Extraction
2.4.5 Classification
2.5 Arabic Handwritten Text Recognition
2.5.1 Features of Arabic Language
2.5.2 Arabic Handwritten Text Recognition Databases
2.6 Handwritten Recognition Applications
2.6.1 Offline Handwritten Recognition
2.6.2 Online Handwritten Recognition
2.7 Evaluation Measures of Handwritten Recognition System

Chapter Three: Proposed Arabic Handwritten Text Recognition System
3.1 Introduction
3.2 Architecture of the Proposed System
3.3 Arabic Handwritten Text Recognition (Module1)
3.3.1 Image Acquisition Stage
3.3.2 Segmentation Stage
3.3.3 Preprocessing Stage
3.3.4 Features Base Construction Stage
3.3.5 Classification Stage
3.3.6 Post-processing Stage
3.4 Arabic Handwritten Text Writer Identification (Module2)
3.4.1 Features Base Construction (Module2)
3.4.2 Classification Stage (Module2)
3.4.3 Post-processing Stage (Module2)
3.5 Proposed Handwritten Database

Chapter Four: Experiments and Results Discussion
4.1 Introduction
4.2 Evaluation of the HTRSA System (Module1)
4.2.1 Arabic Handwritten Database
4.2.2 Handwritten Text Segmentation
4.2.3 Handwritten Text Image Preprocessing
4.2.4 Features Extraction
4.2.5 Features Normalization (FN)
4.2.6 Classification
4.2.7 Number of Images Set
4.3 Evaluation of the HTRSA System (Module2)
4.3.1 Features Extraction
4.3.2 Classification
4.4 Discussion

Chapter Five: Conclusions and Suggestions
5.1 Conclusions
5.2 Suggestions for Future Work

References

List of Tables

2.1 SVM kernels
2.2 Arabic characters and their forms
2.3 Arabic letters with diacritical points
2.4 AHTR databases
3.1 The energy contained in different numbers of DCT coefficients
3.2 Sample of the proposed Arabic lexicon
3.3 Sample of the proposed writers' lexicon
4.1 Arabic handwritten images from different databases
4.2 Segmentation results
4.3 Evaluation of the proposed thresholding algorithm on the IESK-ArDB database
4.4 Recognition accuracy after adding noise
4.5 Experimental results of applying the proposed noise removal algorithm
4.6 Experimental results of applying the BSE algorithm
4.7 Experimental results with various image sizes
4.8 Comparison of results for different edge detection filters
4.9 Comparison of results for different MHOG1 values
4.10 Experimental results for the overlapping approach
4.11 Experimental results for different dividing approaches
4.12 Ordering techniques of selecting coefficients
4.13 Features extraction times of the DCT and FCT methods
4.14 Comparison of results for different features extraction methods
4.15 Comparison results of applying the FN algorithm
4.16 The recognition accuracy of different Arabic databases and SVM kernels
4.17 Experimental results for different numbers of training and testing images
4.18 Experimental results for different MHOG2 values
4.19 The identification accuracy of the extracted features
4.20 The identification accuracy of different SVM kernels

List of Figures

2.1 Handwritten recognition system
2.2 Classifications of text recognition
2.3 Acquisition of the offline and online handwritten text
2.4 General process flow of the HTR system
2.5 Sobel convolution kernels
2.6 Image thinning
2.7 The four templates of the Stentiford thinning algorithm
2.8 Image gradient
2.9 Cell division
2.10 Histogram of oriented gradient for all image cells
2.11 Image transformation using DCT
2.12 The hyperplane H that separates the two sets of points
2.13 Support vectors
2.14 Example of a maximum margin (optimal hyperplane)
2.15 Linear SVM with soft margin
2.16 Changing the data space
2.17 Cursiveness of the Arabic language
2.18 Example of semi-words constituting an Arabic sub-word
2.19 Example of the AHDB database
2.20 Example of the IESK-ArDB database
3.1 Proposed HTRSA system architecture
3.2 Architecture of module1
3.3 Arabic handwritten text image
3.4 Distances feature of the Arabic handwritten text
3.5 Applying the proposed text segmentation algorithm
3.6 The proposed preprocessing stage of module1
3.7 Image thresholding
3.8 Noise removal
3.9 Black space elimination
3.10 Image thinning
3.11 Edge image using the Sobel detector
3.12 Image scaling
3.13 Proposed features base construction of module1
3.14 Different Arabic descriptors
3.15 Image blocking
3.16 Edge detection
3.17 Image gradient
3.18 Cells division
3.19 Histogram of oriented gradients
3.20 Interpolation votes of gradient orientation
3.21 Histograms concatenation
3.22 Feature vector
3.23 Architecture of the proposed SVM one-against-all approach
3.24 Classification process of module1
3.25 Architecture of writer identification (module2)
3.26 Preprocessing stage of writer identification (module2)
3.27 Proposed features base construction of module2
3.28 Blocks dividing
3.29 Histogram of gradient orientation of MHOG2
3.30 Width, height and centroid of the handwritten sub-image
3.31 Classification approach of module2
3.32 Handwritten character example of the proposed database
3.33 Handwritten sub-word example of the proposed database
3.34 Sample of Arabic text written by the same writer
3.35 Handwritten text image example of the proposed database
4.1 Arabic handwritten text documents
4.2 Arabic handwritten images of the text (العام)
4.3 Image segmentation
4.4 Error segmentation
4.5 Mean and standard deviation of handwritten images of the same Arabic text
4.6 Image thresholding by the proposed thresholding method
4.7 The recognition accuracy of different SVM kernels

List of Abbreviations

1D: One Dimension
AHDB: Arabic Handwritten Database
AHTR: Arabic Handwritten Text Recognition
ANN: Artificial Neural Network
AR: Aspect Ratio
ASCII: American Standard Code for Information Interchange
BSE: Black Space Elimination
CNN: Convolutional Neural Network
CRFs: Conditional Random Fields
DBN: Dynamic Bayesian Network
DCT: Discrete Cosine Transform
DFT: Discrete Fourier Transform
DWT: Discrete Wavelet Transform
ED: Euclidean Distance
ENIT: École Nationale d'Ingénieurs de Tunis
FCM: Fuzzy C-Means
FCT: Fast Cosine Transform
FFT: Fast Fourier Transform
FN: Features Normalization
GW: Gabor Wavelet
HCRFs: Hidden Conditional Random Fields
HMM: Hidden Markov Model
HOG: Histogram of Oriented Gradient
HR: Handwritten Recognition
HT: Hough Transform
HTK: Hidden Markov Model Toolkit
HTR: Handwritten Text Recognition
HTRSA: Hand Written Text Recognition System for Security Application
HWT: Haar Wavelet Transform
ICZ: Image Centroid Zone
IESK-ArDB: Institute for Electronics, Signal Processing and Communication - Arabic Database
IFN: Institut für Nachrichtentechnik
KNN: K-Nearest Neighbor
LBP: Local Binary Pattern
LMR: Local Minima Regression
MD: Mahalanobis Distance
ME: Misclassification Error
MHOG: Modified Histogram of Oriented Gradient
MLP: Multilayer Perceptron
MS: Matching Score
MSE: Mean Square Error
MV: Majority Voting
OCR: Optical Character Recognition
PDA: Personal Digital Assistant
PSNR: Peak Signal to Noise Ratio
RBF: Radial Basis Function
SD: Standard Deviation
SVM: Support Vector Machine
ZCZ: Zone Centroid Zone

List of Algorithms

2.1 Stentiford
2.2 Fast Cosine Transform (FCT)
3.1 Text Segmentation
3.2 Arabic Handwritten Image Thresholding
3.3 Noise Removal
3.4 Black Space Elimination
3.5 Statistical Features
3.6 DCT Features Extraction
3.7 MHOG1 Features
3.8 Features Normalization
3.9 SVM Training
3.10 SVM Testing
3.11 MHOG2 Features
3.12 Shape Features


Chapter One
General Introduction

1.1 Introduction

Pattern recognition currently offers a very wide range of methods supporting the development of numerous applications in many different areas of activity. Its methods and techniques generally lie at the heart of simulating "intelligent" tasks, which have certainly infiltrated our daily lives. Information processing technology is experiencing very active development and has growing importance in the field of human-machine interaction. Humans want to communicate with the computer easily, to facilitate and accelerate interaction and information exchange; they seek to make these machines accessible by voice and able to read, see, manipulate and quickly analyze the received information [1]. However, writing as a means of communication has always been a primary human concern. Writing was and will remain one of the great foundations of civilization and a mode of excellence in the conservation and transmission of knowledge. Indeed, many objects around us carry writing on paper: signs, product notices, newspapers, books, forms, etc. Enabling the machine to read will make it possible to capture more information easily and to process documents faster. With the advent of new information technologies, electronic computers and the further increase in the power of machines, the automated processing (editing, searching and archiving) of such documents appears unavoidable. Therefore, a system that enables the machine to understand human writing is needed [2].

1.2 Handwritten Recognition

Handwritten recognition is the most crucial part of converting handwritten documents or characters into computer-editable text. Handwritten recognition also covers a broad range of practical applications.


It focuses on large repetitive applications with rather large databases, namely: automatic processing of administrative files, automatic mail sorting, reading amounts on bank checks, address processing, forms processing, keyboard-less interfaces, analysis of the written gesture, reading heritage documents, and indexing and searching library archives of information in a database. The automation of any of these examples is an extremely difficult problem in view of the large variability associated with writers' habits and the styles and forms of writing (handwritten and cursive). Indeed, the reading activity that is simple for a human is not an easy task for a computer. Thus, accomplishing this task requires that the machine acquires a prior knowledge base of the domain and uses a powerful mathematical formalism [2].

1.2.1 Security Applications of Handwritten Recognition

Most security systems and applications use various techniques to achieve higher security against any type of threat. In the pattern recognition field, several common methods have been considered to identify the required user, such as fingerprint, iris and face [3]. In addition, handwritten signature identification has recently been used to identify a user based on his/her signature. Each person has a distinct signature, and every signature has its own physiological or behavioral characteristics. Handwritten signature identification is commonly used for bank checks [4]. On the other hand, handwritten text is considered a good characteristic for identifying the text's writer. Like a signature, each person's handwritten text is distinct and has its own physiological or behavioral characteristics. Since each person can have only one or two signatures, handwritten text gives more features about the writer than a handwritten signature does [4].


Therefore, writer identification from handwritten text can be considered a very satisfying method for security systems and applications. One of the main applications of handwritten text is its use in the forensic sciences of e-government systems. Identification of a person based on an arbitrary handwritten text sample is a useful application. Handwritten text writer identification allows determining suspects from a characteristic inherent to a crime (as in the case of threat letters). This is different from other biometric techniques, where the relation between the evidence material and the details of an offense can be quite remote [5]. In addition to forensic applications of handwritten text, other applications exist, including forgery detection [6] and writer identification on handwritten musical scores [7].

1.2.2 Handwritten Text Recognition and Writer Identification

Although both handwritten text recognition and handwritten text writer identification are considered parts of the pattern recognition field, they differ in that they seek to maximize opposite characteristics. The objective of handwritten text writer identification is to find the variations in the handwritten text and recognize the uniqueness of each writer, with little regard to the text content. By contrast, the objective of handwritten text recognition is to seek the features that are similar for the same text and to identify the text content [8]. Nevertheless, the two fields use quite similar techniques in the features extraction and classification stages, as will be described in chapter three.

1.2.3 Handwritten Text Dependent and Text Independent

Handwritten text recognition and identification can be divided into two categories: text dependent and text independent. Text-dependent recognition and identification systems require a certain known handwritten text to be written, while text-independent recognition and identification systems can work on any given handwritten text [8]. This thesis deals with text-dependent recognition and identification of handwritten text.

1.2.4 Arabic Language and Handwritten Recognition System

Arabic is written and spoken by more than 250 million people. Arabic text is cursive by nature, which leads to lower recognition accuracy than for other languages. Because it is the language of the Muslims' holy book (Al-Quran), all Muslims can read the Arabic language. Besides that, the Arabic language is also important to other languages in the Middle East; its script is the basis for languages such as Persian, Urdu, and Kurdish. Thus, the ability to automate the interpretation of written Arabic would have widespread benefits. Moreover, Arabic is the official language in all the institutes of Arab countries, which makes it very important for different applications. Besides, most historical books and documents are written in the Arabic language. Consequently, Arabic handwritten recognition is a much-needed system in many countries, especially the countries that want to convert their handwritten works into digital form as a part of applying the e-government system [9].

1.2.5 Handwritten Recognition Approaches

Handwritten recognition has two main approaches for recognizing handwritten text. These approaches differ in the way they deal with the handwritten text and how they recognize it. The two approaches are:

· The holistic approach
· The analytic approach


The holistic approach generally utilizes shape features extracted from the handwritten text image in an attempt to recognize the entire handwritten text. On the other hand, the analytic approach segments the handwritten text image into primitive components (typically characters) [9]. This thesis deals with handwritten text, not only handwritten words or characters. The holistic approach is adopted here: the handwritten text document is segmented into sub-words, each handwritten sub-word is recognized as a whole without segmenting it into characters, and the output editable text is then built from these recognized sub-words.

1.3 Literature Survey

In this section, the various methods and approaches used for developing Arabic handwritten recognition systems are reviewed: The first Arabic handwritten recognition systems started with recognizing Arabic digits. Since there are only 10 classes for the Arabic digits (0-9), researchers have achieved high accuracy in recent years. Besides, handwritten digit recognition systems are commonly used for recognizing the numbers on bank checks [10]: In 2012 Mohd A. developed an Arabic handwritten digit recognition system based on zoning features and Majority Voting (MV) as a classifier. The system achieved 82.7% recognition accuracy [11]. In 2013 Gita S., et al. proposed a digit recognition system using an SVM classifier. The authors used Image Centroid Zone (ICZ) and Zone Centroid Zone (ZCZ) features and obtained 97.7% recognition accuracy [12]. In 2014 Mohamed H. Ghaleb et al. proposed a recognition system using horizontal and vertical moment features and a minimum distance classifier to recognize printed and handwritten Arabic digits using 4500 image samples. The accuracy of the system was 74.9% [13].


In 2014 Mohsen B., et al. proposed a handwritten recognition system for Arabic digits using the Local Binary Pattern (LBP) as the base feature extraction method and a Multi-Layer Perceptron (MLP) for classification. The obtained accuracy was 99.59% [14]. In addition, Pawan K., et al. in 2016 proposed an accurate digit recognition system based on moment features and deep learning with a Multi-Layer Perceptron (MLP) classifier. The recognition accuracy of the proposed system was 99.3% [15].

After digit recognition systems, Arabic handwritten character recognition systems followed and have been considered in many studies. Since Arabic has 28 characters, such recognition systems use 28 classes for isolated characters and more for the different character positions. The handwritten recognition system depends on the accuracy of segmentation. As with handwritten digit recognition, researchers have achieved high recognition accuracy for isolated Arabic handwritten characters [16]: In 2011 Lawgali A., et al. proposed an Arabic handwritten character recognition system using the Discrete Cosine Transform (DCT) and Discrete Wavelet Transform (DWT) for features extraction and an Artificial Neural Network (ANN) for classification. The system used 5600 Arabic handwritten character images and achieved 79.8% and 40.7% recognition accuracy for DCT and DWT respectively [17]. In 2012 Manal A., et al. developed a recognition system that recognizes Arabic characters by segmenting a set of handwritten words into characters and then recognizing the segmented characters. The authors recognize each character by matching the segmented characters against the characters in the used database, and they obtained an 81% recognition rate [18]. In 2014 Lawgali A., et al. proposed a framework for Arabic handwritten recognition based on character segmentation. DCT was used for features extraction and ANN for classification. The proposed framework achieved 90.7% recognition accuracy [19]. In 2015 Farah M. used the Haar Wavelet Transform (HWT), zoning features and a Mahalanobis Distance (MD) classifier to develop a handwritten recognition system for Arabic characters; 73% is the recognition rate that was achieved by the proposed system [20]. Mohamed E., et al. in 2016 achieved 98.3% recognition accuracy using a Convolutional Neural Network (CNN) based Support Vector Machine (SVM) model in developing an offline Arabic character recognition system [21].

In recent years researchers started working with the holistic approach, recognizing the whole Arabic handwritten word directly without segmenting it into characters. This type of system allows researchers to avoid the character segmentation problems, which remain a big challenge and an open research question. Besides, Arabic handwritten words give more features than characters, which leads to better features extraction results. However, other researchers segment the handwritten word into characters, then recognize the characters and recover the original words [22]: In 2014 Moftah E., et al. presented offline handwritten recognition based on Gabor Wavelets (GW) and explicit segmentation. About 600 handwritten word images taken from the IESK-arDB database and about 200 handwritten word images from the IFN/ENIT (Institut für Nachrichtentechnik (IFN) / École Nationale d'Ingénieurs de Tunis (ENIT)) database were used to evaluate the system. In preprocessing, the Hough Transform (HT) and Local Minima Regression (LMR) are used to correct the skew of the handwritten word images. Besides, each handwritten word image was segmented into a number of segments. GW is used to extract the features of the segmented images, and a Support Vector Machine (SVM) is applied to classify each handwritten word segment into its desired class. The recognition rates achieved by the proposed system for the IFN/ENIT and IESK-arDB databases were 71% and 55% respectively [23]. In 2015 Moftah E., et al. presented an offline handwritten recognition system based on three classifiers. Their system investigates the application of probabilistic discriminative Conditional Random Fields (CRFs) and their extension, hidden-state CRFs (HCRFs), to the problem of offline Arabic handwritten recognition. First, the word images are segmented into characters, and a shape descriptor is used to extract the features of each handwritten character image. After that, HMMs, CRFs and HCRFs were used for classification. The system used 800 handwritten word images of the IESK-arDB database, and the overall recognition rates obtained were 72%, 73%, and 75% when testing HMMs, CRFs, and HCRFs on the testing set respectively [24]. In 2016 Khaoula J., et al. proposed an Arabic handwritten recognition system based on Dynamic Bayesian Networks (DBN). The authors selected several handwritten images from the IFN/ENIT database. The system preprocessed and normalized the handwritten word images and segmented them into characters; besides, the Zernike moment invariants and HU moment features were extracted for each character. The used feature descriptors generate continuous features; therefore, the k-means method was used to quantize these feature vectors. Finally, the DBN was used for classification, and the recognition rate achieved was 63-78.5% [25].

In addition, Arabic handwritten text recognition is the most recent issue in the handwritten recognition field. Arabic handwritten text recognition is the most difficult recognition task, since it deals not only with handwritten words and handwritten characters but with handwritten text that contains characters, words, and sub-words. The newest research that recognizes Arabic handwritten text is presented in the following:


In 2016 Hicham E., et al. proposed an Arabic handwritten recognition system based on the Hidden Markov Model Toolkit (HTK). The proposed system is applied to an "Arabic-Numbers" data corpus which contains 47 handwritten words from the AHDB database and 1905 sentences. These sentences were written by five different people. The input to the system is a handwritten text image that has three lines, and each line has three or four handwritten words. The proposed system segments the input handwritten text into separate line images. The technique used is based on the horizontal projection profile of the input image, then extracting the features from each line image using a sliding window technique. The system achieved a rate of 80.33%. The authors mentioned that their system can be used for text recognition of bank checks or in other domains [26]. The results published in the literature show that the obtained recognition rates are restricted to limited accuracies or limited writing classes, with constraints representing particular aspects of the used handwritten databases. Besides, previous works focused on handwritten word or character recognition, either recognizing the handwritten word directly, segmenting the handwritten word into characters, or segmenting the handwritten text into lines. In addition, all the used handwritten databases have only gray handwritten character or word images. Moreover, the existing works are done for only one or at most two handwritten databases with a limited number of classes. On the other hand, no work has been done on handwritten text recognition by segmenting the text into sub-words and then recognizing the segmented sub-words. Also, there is no system that both recognizes Arabic handwritten text and identifies the text's writer. Therefore, Arabic handwritten text recognition and writer identification are still subjects of active research in various areas.


1.4 Aim of Thesis

The aim of this thesis is to develop an accurate handwritten text recognition system based on multi-scale features extraction methods that also identifies the writer of the input handwritten text, such that the overall system may be considered an e-services unit and a step in developing e-government. A further aim is to develop an Arabic handwritten database with colored and gray handwritten images that serves character, word and text recognition systems and can be used for security applications.

1.5 Thesis Contributions

The main contributions of this thesis can be summarized as follows:
1. An Arabic handwritten database has been proposed. The proposed database has characters, words and texts from several writers of different ages and educational backgrounds.
2. A simple and practical segmentation algorithm for segmenting the Arabic handwritten text into a set of sub-words is proposed.
3. An efficient preprocessing stage has been proposed for the Arabic handwritten text and successfully applied to the proposed database. The proposed phases of the preprocessing include:
• A thresholding algorithm for converting the gray handwritten image into binary, based on a proposed binarization algorithm that combines the intensity of the image pixels with the Fuzzy C-Means (FCM) clustering method.
• A noise removal algorithm for removing unwanted pixels from the binary handwritten image, based on two thresholds derived from the characteristics of the Arabic language, without losing any desired information about the handwritten text shape.


• A black space elimination algorithm for removing the undesired pixels in the handwritten image background which do not contribute any feature of the handwritten text.
4. Using several methods for extracting the most appropriate features to recognize the Arabic handwritten text and identify its writer. Several features extraction algorithms are proposed:
• Statistical and structural features extraction algorithms that extract suitable features from the handwritten sub-images, based on the structure of the Arabic text and on the intensity distribution of the pixels in the images.
• Features extraction algorithms based on modifying the Histogram of Oriented Gradient (HOG) method, proposed for text recognition and writer identification. Besides, edge detection filters for the diagonal and anti-diagonal directions are proposed to detect the handwritten image edges.
5. A features scaling algorithm has been proposed for reducing the system processing time by mapping all the extracted features into the range [-1, 1], which makes the computation process simple.
6. Employing the Support Vector Machine (SVM) for the multi-class classification process to classify the Arabic handwritten texts and writers using several kernels.
7. Arabic and writers' lexicons are proposed. The Arabic lexicon is used for assigning the classified handwritten texts to their corresponding editable Arabic texts, while the writers' lexicon is used for assigning the classified labels to the desired writers of the handwritten text.


1.6 Organization of Thesis

The thesis is structured in five chapters; a brief description of their contents is given here: Chapter two describes the handwritten recognition system and its types, the concept of handwritten recognition with its applications, and an overview of Arabic handwritten recognition and the characteristics of the Arabic language. Chapter three presents the proposed recognition and identification algorithms that are used to design the proposed system and the implementation of each one. Chapter four discusses the experimental results obtained from the implementation of the proposed system. Chapter five highlights the conclusions and lists a number of suggestions for future work.



Chapter Two
Theoretical Background

2.1 Introduction

Handwritten Recognition (HR) is an active research area in artificial intelligence, pattern recognition and computer vision. The field of text recognition has achieved great success in real-world target applications, especially in e-government systems, security applications and other fields [27]. In this chapter the classifications of handwritten recognition and their general process flow are explained, with a brief overview of the techniques and methods used at each step.

2.2 Handwritten Recognition

Handwritten recognition is the process of converting a handwritten text image into a text file that is understandable by the computer and usable for many purposes. Advances in handwritten recognition have aided the automation of many demanding tasks in our daily life. Many applications depend on handwriting, such as postal address reading for mail sorting, cheque recognition, and word spotting on a handwritten text page. Naturally, Arabic handwritten text is cursive and more difficult to recognize than printed text due to several factors: the writer's style, the quality of the paper, and geometric factors controlled by the writing conditions, such as being very unsteady in the shape and quality of the tracing. Moreover, there are several types of recognition [27]:
• Numeral (digits) Recognition.
• Character Recognition.
• Word Recognition.
• Text Recognition.


This thesis is concerned with handwritten text recognition. Handwritten text recognition is the most difficult type of recognition system: it deals with text documents or pages that contain several handwritten words and characters, which makes the recognition process more difficult and challenging. An example of a handwritten recognition system is shown in Figure 2.1.

Figure 2.1: Handwritten recognition system.

2.3 Classification of Text Recognition

Text recognition systems are mainly classified into offline and online text recognition. Offline text recognition is further classified into two subcategories: handwritten and typed text recognition. Figure 2.2 shows the general text recognition classification [28].

Figure 2.2: Classifications of text recognition.


2.3.1 Online Text Recognition

In online text recognition, the handwritten text is collected and recognized in real time. A special digitizer tablet and pen are used to generate this type of input. A digitizer is an electromagnetic tablet which transfers the coordinates of the pen position to the computer at a constant rate. Personal Digital Assistants (PDAs) and tablets are clear examples of devices generating text for online text recognition. Online text recognition is less difficult than offline text recognition, since dynamic information is usually available for online recognition, such as the number of strokes, the order of the strokes, the direction of each stroke and the speed of writing within each stroke. This valuable information assists in the recognition of documents and frequently leads to better performing systems compared with offline text recognition [29].

2.3.2 Offline Text Recognition

In offline text recognition, the text is produced using an ordinary pen and paper. Thus, offline recognition methods use scanned images of the handwriting. The images are normally first enhanced, and then features are extracted from them by means of digital image processing techniques. However, offline text recognition is more difficult than online text recognition due to the noise created by the input devices and the great variability found in human writing. Personal writing characteristics have an important influence, leading to very different visual appearances of the same handwritten character. Moreover, handwritten text is more difficult than printed text: printed text has a stable writing style, while handwritten text is cursive, which makes the recognition process tougher [29]. Figure 2.3 illustrates the common ways to acquire offline and online handwritten text.


Figure 2.3: Acquisition of the offline and online handwritten text.

2.4 General Structure of Handwritten Text Recognition System

In this part, the general structure of a Handwritten Text Recognition (HTR) system is described. The input to the system is a handwritten text image, and the output is class labels that represent the desired handwritten text. A typical text recognition system has several stages: image acquisition, preprocessing, segmentation, features extraction and classification. However, some researchers omit or merge some of these stages [30]. Figure 2.4 shows the general process flow of the HTR system.

Figure 2.4: General process flow of the HTR system.


2.4.1 Image Acquisition

The first step in an HTR system is to convert the handwritten text document into a form suitable for the digital processing system with the minimum possible degradation. In offline mode, depending on the acquisition device used (scanner or camera), a color or gray-level image is obtained.

2.4.2 Preprocessing

Preprocessing is an essential stage of any recognition system, performed after the acquisition process. Generally, the preprocessing is not specific to the recognition of handwritten text but is the conventional preprocessing of the image processing field. The preprocessing is designed to prepare the image for the next stage of analysis. Its essential purpose is to reduce the noise superimposed on the data while keeping as much significant information as possible. The noise may be due to the acquisition device, the acquisition conditions (lighting, incorrect document formatting) or the quality of the original document. The preprocessing operations generally used include: thresholding, noise removal, edge detection, image thinning, and image normalization [31].

A. Binarization

Binarization is the process of converting a gray image into a binary one composed of the two values 0 and 1, which makes the image easier to process. In general, an appropriate binarization threshold reflects the limits of high and low contrast in the image. For low-contrast or variable-contrast images, it is difficult to set the threshold to a specific value. A classical method determines a binarization threshold by calculating the grayscale histogram of the image; the threshold value is set to the gray level lying in the valley between the two peaks of the histogram. The pixels having a gray level above this threshold belong to the background, and those with a lower value belong to the object (foreground) [32].
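To make the classical valley method concrete, the following minimal numpy sketch smooths the gray-level histogram, finds the two dominant peaks and thresholds at the valley between them. The 5-bin smoothing window and the 30-level minimum peak separation are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def valley_threshold(gray):
    """Bimodal-histogram thresholding: smooth the gray-level histogram
    and take the deepest valley between its two main peaks."""
    hist, _ = np.histogram(gray.ravel(), bins=256, range=(0, 256))
    # Smooth with a small moving average so spurious local minima vanish.
    smooth = np.convolve(hist, np.ones(5) / 5.0, mode="same")
    peaks = np.argsort(smooth)[::-1]           # bins ordered by height
    p1 = int(peaks[0])
    # Second peak: the highest bin reasonably far from the first one.
    p2 = int(next(p for p in peaks if abs(int(p) - p1) > 30))
    lo, hi = sorted((p1, p2))
    t = lo + int(np.argmin(smooth[lo:hi + 1]))  # valley between the peaks
    # Above the threshold -> background (1), below -> foreground (0),
    # matching the dark-text-on-light-paper convention described above.
    return (gray > t).astype(np.uint8), t

# Example: a synthetic "page" with dark strokes on a light background.
rng = np.random.default_rng(0)
page = rng.normal(200, 10, (64, 64))
page[20:40, 10:50] = rng.normal(60, 10, (20, 40))
binary, t = valley_threshold(np.clip(page, 0, 255).astype(np.uint8))
print("threshold:", t)
```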

• Fuzzy C-Means Clustering (FCM)

Fuzzy c-means (FCM) is a clustering method which allows one piece of data to belong to two or more clusters. This method (developed by Dunn in 1973 and improved by Bezdek in 1981) is frequently used in pattern recognition. There are two main processes in fuzzy c-means clustering: the calculation of the cluster centers and the assignment of points to these centers using the Euclidean distance. The process is repeated until the cluster centers are stable. For each item of the data, FCM assigns a membership value to every cluster within the range 0 to 1; it thus incorporates the fuzzy-set concept of partial membership and forms overlapping clusters. A fuzzification parameter m in the range [1, n] is needed, indicating the degree of fuzziness of the clusters. FCM depends on minimizing the objective function described in Equation 2.1 [33]:

$$J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \left\| x_i - c_j \right\|^2 \qquad (2.1)$$

where m is any real number greater than 1, C is the number of clusters, N is the number of data points, $x_i$ is the i-th d-dimensional measured datum, $c_j$ is the d-dimensional center of cluster j, $u_{ij}$ is the degree of membership of $x_i$ in cluster j, and $\|\cdot\|$ is any norm expressing the similarity between measured data and a center. Through an iterative optimization of the function in Equation 2.1, fuzzy partitioning is carried out by updating the memberships $u_{ij}$ and the cluster centers $c_j$ according to Equation 2.2:

$$u_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \frac{\left\| x_i - c_j \right\|}{\left\| x_i - c_k \right\|} \right)^{\frac{2}{m-1}}} \qquad (2.2)$$

where $\|x_i - c_j\|$ is the distance between point i and the current cluster center j, and $\|x_i - c_k\|$ is the distance between point i and the other cluster centers k. Equation 2.3 is used to find the d-dimensional cluster center $c_j$ for the memberships $u_{ij}$:

$$c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m}\, x_i}{\sum_{i=1}^{N} u_{ij}^{m}} \qquad (2.3)$$

The fuzzy partitioning process stops when $\max_{ij} \left\{ \left| u_{ij}^{(k+1)} - u_{ij}^{(k)} \right| \right\} < \varepsilon$, where ε lies between 0 and 1 and k is the iteration index [34].
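The alternation of Equations 2.2 and 2.3 fits in a few lines. The sketch below, an illustration rather than the thesis's exact algorithm, clusters the gray levels of an image into two fuzzy clusters and binarizes by the larger membership; m = 2 and eps = 1e-4 are common defaults assumed here.

```python
import numpy as np

def fcm_1d(data, C=2, m=2.0, eps=1e-4, max_iter=100, seed=0):
    """Fuzzy c-means on scalar data (gray levels), following Eqs. 2.1-2.3:
    alternate center updates (Eq. 2.3) and membership updates (Eq. 2.2)
    until the largest membership change falls below eps."""
    rng = np.random.default_rng(seed)
    u = rng.random((data.shape[0], C))
    u /= u.sum(axis=1, keepdims=True)       # memberships sum to 1 per point
    for _ in range(max_iter):
        um = u ** m
        centers = (um * data[:, None]).sum(0) / um.sum(0)          # Eq. 2.3
        # Scalar data, so |x_i - c_j| plays the role of the norm.
        dist = np.abs(data[:, None] - centers[None, :]) + 1e-12
        p = 2.0 / (m - 1.0)
        u_new = 1.0 / (dist ** p * (1.0 / dist ** p).sum(1, keepdims=True))
        if np.max(np.abs(u_new - u)) < eps:                        # stop rule
            u = u_new
            break
        u = u_new
    return centers, u

# Binarize a tiny gray image: assign each pixel to its strongest cluster.
gray = np.array([[30, 35, 200], [40, 210, 220], [25, 205, 215]], float)
centers, u = fcm_1d(gray.ravel(), C=2)
labels = u.argmax(axis=1).reshape(gray.shape)
binary = (labels == centers.argmax()).astype(int)   # 1 = brighter cluster
print(centers, "\n", binary)
```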

B. Noise Removal

The image of the handwritten text may be subject to noise introduced during acquisition and transmission. This noise consists of undesired pixels that may appear in the binary image after thresholding. Noise removal examines the neighborhood of a pixel and eliminates isolated pixels (cleaning) [35]. A filter frequently used for noise removal is the median filter, which is applied by traversing the image pixels with a window of size 3 × 3 and changing the value of each pixel based on the values of its 8 neighbors. The median filter sorts the neighborhood pixels in ascending order and then takes the middle pixel as the median value [36].
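Below is a minimal sketch of the 3 × 3 median filter just described (scipy.ndimage.median_filter provides an equivalent, optimized version); the edge-replication padding is an illustrative choice.

```python
import numpy as np

def median_filter3x3(img):
    """3x3 median filter: replace each pixel by the median of its
    neighborhood, which removes isolated (salt-and-pepper) noise pixels."""
    padded = np.pad(img, 1, mode="edge")
    out = np.empty_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            window = padded[y:y + 3, x:x + 3]
            out[y, x] = np.median(window)   # middle value of the sorted 9
    return out

noisy = np.zeros((5, 5), np.uint8)
noisy[2, 2] = 255                  # an isolated noise pixel
print(median_filter3x3(noisy))     # the speck is removed
```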


C. Edge Detection

Edge detection is a research field that belongs to image processing and computer vision. It identifies areas of a digital image corresponding to rapid changes in light intensity. These changes include discontinuities in depth, in the orientation of a surface, in the properties of a material and in scene illumination. Detecting the edges of an image significantly reduces the amount of data and eliminates the less relevant information while preserving the important structural properties. Several edge detection operators are used to detect image edges; the most common are the Canny, Sobel, Roberts and Prewitt edge detectors [37]. Sobel is used in the proposed work of this thesis.

• Sobel Edge Detector

The Sobel operator is used in image processing for edge detection. Basically, the operator calculates the gradient of the intensity at each pixel. This indicates the direction of the largest change from light to dark and the rate of change in that direction, revealing the points of sudden change in brightness, which possibly correspond to edges, as well as the orientation of those edges [38]. The operator uses the convolution of matrices: each matrix undergoes a convolution with the image to calculate the horizontal and vertical derivatives, as shown in Figure 2.5.

Figure 2.5: Sobel convolution kernels. (a) Vertical direction, (b) Horizontal direction.


At each point, the approximations of the horizontal and vertical gradients can be combined as in Equations 2.4 and 2.5 to get an approximation of the gradient norm and direction [38]:

$$g = \sqrt{g_x^2 + g_y^2} \qquad (2.4)$$

where g is the gradient magnitude, and

$$\theta = \tan^{-1}\left(\frac{g_x}{g_y}\right) \qquad (2.5)$$

where θ is the gradient direction.
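The following numpy sketch applies the Sobel kernels of Figure 2.5 to a synthetic step edge and combines the two responses as in Equation 2.4. It is an illustration only; np.arctan2 replaces the plain inverse tangent of Equation 2.5 so that the direction is well defined in every quadrant.

```python
import numpy as np

# Sobel kernels (horizontal- and vertical-derivative forms).
KX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
KY = KX.T

def convolve2d(img, k):
    """Plain 'valid' 2-D convolution, enough for this illustration."""
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(img[y:y + kh, x:x + kw] * k[::-1, ::-1])
    return out

img = np.zeros((8, 8))
img[:, 4:] = 255.0                   # a vertical step edge
gx = convolve2d(img, KX)
gy = convolve2d(img, KY)
g = np.hypot(gx, gy)                 # gradient magnitude, Eq. 2.4
theta = np.arctan2(gy, gx)           # gradient direction (cf. Eq. 2.5)
print(g.max(), np.degrees(theta[g > 0]).round()[:3])
```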

D. Image Thinning

Thinning is a morphological operation that is used to remove selected foreground pixels from a binary image. It preserves the topology (extent and connectivity) of the original region while throwing away most of the original foreground pixels [39]. Figure 2.6 shows the result of a thinning operation on a simple binary image. In image thinning, template-based mark-and-delete thinning algorithms are very popular because of their reliability and effectiveness. This type of thinning process uses templates, where a match of the template in the image deletes the centre pixel. These are iterative algorithms which erode the outer layers of pixels until no more layers can be removed. Almost all iterative thinning algorithms use mark-and-delete templates, including the Stentiford, Zhang-Suen and Guo-Hall algorithms [40].



Figure 2.6: Image thinning. (a) Original image, (b) Thinned image.

In this thesis, the Stentiford thinning algorithm, which uses connectivity numbers to mark and delete pixels, is used. The connectivity number is a measure of how many objects are connected to a particular pixel. Equation 2.6 is used to calculate the connectivity number:

$$C_n = \sum_{k \in S} N_k - (N_k \cdot N_{k+1} \cdot N_{k+2}), \qquad S = \{1, 3, 5, 7\} \qquad (2.6)$$

where $N_k$ is the color of one of the eight neighbours of the analysed pixel, $N_0$ is the centre pixel, $N_1$ is the color value of the pixel to the right of the central pixel, and the rest are numbered in counter-clockwise order around the centre. The Stentiford algorithm uses a set of four 3 × 3 templates to scan the image. Figure 2.7 shows the templates of the Stentiford thinning algorithm.


Figure 2.7: The four templates of the Stentiford thinning algorithm.

The white circle represents a white pixel with a value of 255, and the black circle represents a black pixel with a value of zero. These templates traverse the image in the following order:
• T1 - from left to right and top to bottom.
• T2 - from bottom to top and from left to right.
• T3 - from right to left and from bottom to top.
• T4 - from top to bottom and from right to left.
Endpoint pixels are pixels connected to only one other pixel, that is, black pixels with only one black neighbour out of the eight possible neighbours. Algorithm 2.1 shows the main processing steps of the Stentiford thinning algorithm [40].

Algorithm 2.1: Stentiford
Input: Binary image
Output: Thinned image
Step1: Read the input image.
Step2: Scan the image until a pixel that fits template T1 is found.
Step3: If the pixel is not an endpoint and has connectivity number = 1, mark this pixel for deletion.
Step4: Repeat steps 2 and 3 for all pixel locations matching T1.
Step5: Repeat steps 2 to 4 for the rest of the templates: T2, T3, and T4.
Step6: Set to white the pixels marked for deletion.
Step7: If any pixel was deleted in step 6, repeat all steps from step 2.
Step8: Return (thinned image).
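Equation 2.6 and the deletion rule of Algorithm 2.1 are easy to check numerically. The following small Python function is an illustrative transcription of the connectivity number, assuming object pixels are coded as 1 and neighbors N1..N8 are listed counter-clockwise starting from the pixel to the right; it is a sketch for experimentation, not the thesis's implementation.

```python
def connectivity_number(nbrs):
    """Connectivity number Cn of Eq. 2.6 for one pixel.

    nbrs: the 8 neighbor values (1 = object pixel, 0 = background),
    ordered counter-clockwise starting at the right neighbor (N1..N8).
    """
    n = list(nbrs) + [nbrs[0]]          # wrap around so N9 == N1
    # S = {1, 3, 5, 7}: the 4-connected neighbors (0-based: 0, 2, 4, 6).
    return sum(n[k] - n[k] * n[k + 1] * n[k + 2] for k in (0, 2, 4, 6))

# A pixel bridging two strokes has Cn = 2, so Algorithm 2.1 keeps it;
# a boundary pixel of a single stroke has Cn = 1 and may be deleted.
print(connectivity_number([1, 0, 0, 0, 1, 0, 0, 0]))  # -> 2
print(connectivity_number([1, 1, 0, 0, 0, 0, 0, 0]))  # -> 1
```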

E. Image Scaling

The handwritten images in the databases have various sizes and resolutions. Recognition systems are sensitive to small variations in size and position, as is the case in template matching and correlation methods. Scaling the images seeks to reduce the variations between images due to the size of the handwritten text, in order to improve the performance of the recognizer. Therefore, adjusting the handwritten text images to a standard size such as 128×128, 64×64 or 32×32 is needed. The common image scaling method is scaling with a preserved aspect ratio [41].
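As an illustration of size normalization with a preserved aspect ratio, the sketch below scales the longer side to the target and pads the rest with background; nearest-neighbor indexing is used so the example stays dependency-free (a production system would use a proper resize such as cv2.resize). The centering and the background value are illustrative choices.

```python
import numpy as np

def scale_preserve_aspect(img, size=64, bg=255):
    """Resize a gray/binary image to size x size while preserving its
    aspect ratio (width/height): one scale factor for both axes, then
    pad the shorter side with background pixels."""
    h, w = img.shape
    s = size / max(h, w)                      # single scale factor
    nh, nw = max(1, round(h * s)), max(1, round(w * s))
    ys = (np.arange(nh) / s).astype(int).clip(0, h - 1)   # nearest rows
    xs = (np.arange(nw) / s).astype(int).clip(0, w - 1)   # nearest cols
    resized = img[np.ix_(ys, xs)]
    out = np.full((size, size), bg, dtype=img.dtype)
    y0, x0 = (size - nh) // 2, (size - nw) // 2           # center content
    out[y0:y0 + nh, x0:x0 + nw] = resized
    return out

word = np.zeros((30, 90), np.uint8)            # a wide sub-word-like blob
print(scale_preserve_aspect(word, 32).shape)   # (32, 32)
```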

2.4.3 Segmentation

Segmentation is a critical and decisive stage in several recognition systems. It is defined as the operation that seeks to break down the handwritten text image; the result of this operation is a form isolated from the image, which could be handwritten words, sub-words, characters or sub-characters. However, this separation is not always possible. Generally, the performance of the segmentation directly affects the reliability of the overall handwritten text recognition system. Two techniques are used in handwritten text recognition systems [42]:


A. Implicit segmentation: segments the handwritten text into lower-level parts called graphemes (character and word components) and finds characters and words by combining those graphemes.
B. Explicit segmentation: segments the handwritten text exactly into sub-words or characters using the general writing properties of sub-words and characters.
Consequently, explicit segmentation is used in this thesis.

2.4.4 Features Extraction

For decision making, the handwritten recognition system only needs the information relevant for differentiating one object from another. For this purpose, a features extraction step is performed. This is a critical step in the construction of a handwritten text recognition system. However, features extraction in handwritten text recognition faces the major problem of intra-class variability: a character can take different forms depending on its position in the text [43]. On the other hand, the largest variations are introduced by the writer. Writing is personal to each individual, and the traces resulting from the writing of the same text by two persons can be quite different; moreover, even for the same writer, a number of factors affect the realization of his/her writing. The best recognition depends on successful features extraction methods, and many features extraction methods have been proposed for handwritten recognition [44]. The features are generally classified into three main categories: structural features, statistical features and global transformations [45].


A. Structural Features

Structural features describe the geometrical and topological features of a pattern by describing its global and local properties. The structural features depend on the kind of pattern to be classified. For Arabic handwritten text, the features include [46]:
• The number of strokes, their sizes, directions and slopes.
• The extreme points (end points).
• The number of loops.
• The number of dots.
• The number of intersections.
• The number of connected components.
In general, structural features are challenging to extract from an Arabic handwritten text image, and many errors occur because of the small differences between Arabic characters [47].

B. Statistical Features

Statistical features are extracted from the statistical distribution of pixels and describe the characteristic measurements of the input image pattern. Statistical features provide low complexity and high speed. The major statistical features can be summarized as: histograms of chain code directions, pixel densities, moments, Fourier descriptors and the histogram of oriented gradients [48].

1. Histogram of Oriented Gradient [49]

The Histogram of Oriented Gradient (HOG) was first introduced by Dalal and Triggs for human body detection, but it is now one of the most successful and popular descriptors in computer vision and pattern recognition. It describes the overall texture of the image by dividing the image into a grid of regions and then concatenating the histograms of oriented gradients of these regions into a single vector. Given an image intensity I, computing the HOG involves five main steps: 1) calculating the image gradient; 2) dividing the image into cells; 3) calculating a HOG for all cells; 4) normalizing the HOG of each cell; 5) concatenating the HOGs of all cells into one vector. The gradient of an image is a vector representing the variation in intensity relative to movement in the horizontal and vertical directions. Two filters are used to calculate the gradient of the image in the horizontal and vertical directions:

Horizontal filter = (−1 0 1)    (2.7)

Vertical filter = (−1 0 1)^T    (2.8)

By applying these two filters, G_H(x, y) and G_V(x, y) of the image I are given by Equations 2.9 and 2.10 respectively:

G_V(x, y) = I(x + 1, y) − I(x − 1, y)    (2.9)

G_H(x, y) = I(x, y + 1) − I(x, y − 1)    (2.10)



Figure 2.8: Image gradient. (a) Original image, (b) Horizontal gradients, (c) Vertical gradients, (d) Magnitude of gradient.

As with any vector, the gradient norm N_G(x, y) and its orientation θ_G(x, y) are found by Equations 2.4 and 2.5 from G_H(x, y) and G_V(x, y). Figure 2.8 (d) shows the calculated image gradient. After calculating the image gradient, the image is divided into cells that cover the entire image. All the cells have equal dimensions, such as 4×4 or 8×8 sets of pixels. An example of the cell division process is illustrated in Figure 2.9.


Figure 2.9: Cell division.

For each cell, a HOG is calculated using the gradients of all its pixels. Each pixel position (x, y) is involved in computing the HOG of n components by Equation 2.11:

$$HOG(a) = HOG(a) + N_G(x, y) \qquad (2.11)$$

where $a \in [1, n]$ satisfies the condition $\theta_G(x, y) \in [\Delta_a, \Delta_{a+1}]$, with

$$\Delta_a = \frac{a - 1}{n} \qquad (2.12)$$

expressed as a fraction of the full orientation range. The number of HOG components is configurable; it sets the orientation precision of the gradients. Figure 2.10 shows the configuration used for the HOG: the gradient image is divided into a 3×3 grid of cells, and a 9-bin HOG is extracted for each cell [49].

Figure 2.10: Histogram of oriented gradient for all image cells.


The next step of the HOG descriptor is normalization. This step normalizes the HOG of each region independently of the others. Dalal and Triggs mention two normalizations, based on the L1 and L2 norms:

$\text{Norm } L_1:\quad \mathrm{HOG} = \frac{\mathrm{HOG}}{\lVert \mathrm{HOG}\rVert_1 + \varepsilon}$   (2.13)

$\text{Norm } L_2:\quad \mathrm{HOG} = \frac{\mathrm{HOG}}{\sqrt{\lVert \mathrm{HOG}\rVert_2^2 + \varepsilon^2}}$   (2.14)

where HOG denotes the standard histogram of oriented gradient and $\varepsilon$ is a small regularization constant. After calculating the normalized histograms $\{\mathrm{HOG}_i,\ 1 \le i \le M\}$ of the M cells of the image, a vector $\mathrm{HOG}_v$ that concatenates all the HOGs is built as in Equation 2.15:

$\mathrm{HOG}_v = [\mathrm{HOG}_1, \mathrm{HOG}_2, \ldots, \mathrm{HOG}_M]$   (2.15)
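To make the five steps above concrete, the following is a minimal Python/NumPy sketch of the descriptor (an illustrative addition, not the thesis implementation); it assumes unsigned orientations in [0°, 180°), non-overlapping square cells, and an L2-style per-cell normalization in the spirit of Equation 2.14:

import numpy as np

def hog_descriptor(img, cell=8, n_bins=9, eps=1e-6):
    """Minimal HOG sketch: gradient, cells, per-cell histograms, normalization, concatenation."""
    img = img.astype(float)
    # Step 1: gradients with the (-1 0 1) filters of Equations 2.7-2.8
    gx = np.zeros_like(img); gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.hypot(gx, gy)                              # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0        # unsigned orientation in [0, 180)
    h, w = img.shape
    feats = []
    # Steps 2-4: per-cell orientation histograms, L2-normalized
    for i in range(0, h - cell + 1, cell):
        for j in range(0, w - cell + 1, cell):
            a = ang[i:i+cell, j:j+cell].ravel()
            m = mag[i:i+cell, j:j+cell].ravel()
            hist, _ = np.histogram(a, bins=n_bins, range=(0, 180), weights=m)
            feats.append(hist / (np.linalg.norm(hist) + eps))
    # Step 5: concatenate into one vector (Equation 2.15)
    return np.concatenate(feats)

For a 128x128 image with 8x8 cells and 9 bins, this sketch yields a 16 x 16 x 9 = 2304-dimensional vector.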

2. Aspect Ratio (AR)
The AR of an image describes the proportional relationship between its width and its height. The AR is commonly used in image normalization, feature extraction and evaluation. It is defined by Equation 2.16 [50]:

$\text{Aspect Ratio} = \frac{\text{Width}}{\text{Height}}$   (2.16)

C. Global Transformation
Transformation schemes convert the pixel representation of the pattern into a more compact form, which reduces the dimensionality of the features [50]. In this thesis the Discrete Cosine Transform (DCT) is used for extracting the handwritten text features.


• Discrete Cosine Transform (DCT) Features
The DCT converts the pixel values of an image in the spatial domain into its elementary frequency components in the frequency domain [51]. Given an image f(x, y), its 2D DCT transform is defined in Equation 2.17:

$F(u,v) = \frac{2}{N}\, C(u)C(v) \sum_{x=0}^{N-1}\sum_{y=0}^{N-1} f(x,y)\,\cos\!\left[\frac{(2x+1)u\pi}{2N}\right]\cos\!\left[\frac{(2y+1)v\pi}{2N}\right]$   (2.17)

The inverse transform is defined by Equation 2.18:

$f(x,y) = \frac{2}{N}\sum_{u=0}^{N-1}\sum_{v=0}^{N-1} C(u)C(v)\, F(u,v)\,\cos\!\left[\frac{(2x+1)u\pi}{2N}\right]\cos\!\left[\frac{(2y+1)v\pi}{2N}\right]$   (2.18)

where

$C(u), C(v) = \begin{cases} \dfrac{1}{\sqrt{2}} & \text{for } u, v = 0 \\ 1 & \text{otherwise} \end{cases}$

Figure 2.11 illustrates the DCT transformation bands, where the image is decomposed into blocks of dimension 8x8 to make the computation fast, resulting in DCT blocks of dimension 8x8. In each DCT block, the Low Frequency (LF) band represents the lowest frequency coefficients, the High Frequency (HF) band represents the higher frequency coefficients of the block, and the Middle Frequency (MF) band represents the middle frequency coefficients [51].


Figure 2.11: Image transformation using DCT. (a) Transformation of the spatial domain to the frequency domain, (b) DCT bands.

Due to its strong energy compaction capability, the DCT is a useful tool for pattern recognition applications. The DCT can contribute to a successful recognition system together with classification techniques such as the Support Vector Machine (SVM) and Artificial Neural Network (ANN) [52]. Furthermore, the main advantage of using the DCT is the removal of the redundancy among neighboring pixels, which provides uncorrelated transform coefficients that can be used independently [53]. J. Makhoul [54] proposed a fast computation of the DCT via the Fast Fourier Transform (FFT), called the Fast Cosine Transform (FCT): the DCT of an N-point real signal is derived from the Discrete Fourier Transform (DFT) of a 2N-point even extension of the signal, using only an N-point DFT of a reordered version of the original signal. The implementation of the FCT is illustrated in Algorithm 2.2.


Algorithm 2.2: Fast Cosine Transform (FCT)
Input: Data sequence
Output: DCT coefficients
Step1: Load the N-point real data sequence x(n), 0 ≤ n ≤ N−1.
Step2: Generate y(n), a 2N-point even extension of x(n), defined by:
  y(n) = x(n) for 0 ≤ n ≤ N−1, and y(n) = x(2N−n−1) for N ≤ n ≤ 2N−1.
Step3: Split y(n) into two N-point sequences: v(n) = y(2n) (the even samples) and w(n) = y(2n+1) (the odd samples), 0 ≤ n ≤ N−1.
Step4: Compute V(k), the N-point DFT of v(n):
  V(k) = Σ_{n=0}^{N−1} v(n) W_N^{nk}, 0 ≤ k ≤ N−1.
Step5: Obtain the DCT C(k) from V(k) by:
  C(k) = 2 Re[ W_{4N}^{k} V(k) ] = 2 Re[ e^{−jπk/(2N)} V(k) ], 0 ≤ k ≤ N−1.
Step6: Return C(k)
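As an illustrative sketch (not from the thesis), Algorithm 2.2 can be written in a few lines of Python/NumPy; it produces the unnormalized DCT-II, and its output can be checked against scipy.fft.dct(x, type=2):

import numpy as np

def fct(x):
    """Fast Cosine Transform: DCT-II of a real N-point signal via one N-point FFT."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    # Steps 2-3: reorder into even-indexed samples ascending, odd-indexed descending,
    # which equals taking the even samples of the 2N-point even extension
    v = np.concatenate([x[::2], x[1::2][::-1]])
    V = np.fft.fft(v)                                          # Step 4: N-point DFT
    k = np.arange(N)
    return 2.0 * np.real(np.exp(-1j * np.pi * k / (2 * N)) * V)  # Step 5: twiddle + real part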

2.4.5 Classification
In the entire process of a pattern recognition system, classification plays an important role by deciding which class a form belongs to. The main idea of classification is to assign an example (a form) of unknown class to one of a set of predefined classes, based on its description in the form of parameters. Several classifiers have been applied to text recognition systems, such as: K-Nearest Neighbors (KNN), ANN, Hidden Markov Models (HMM) and SVM [55].

A. K-Nearest Neighbors [56]
The KNN algorithm is one of the simplest machine learning algorithms. In a classification context, the basic idea is simply to find the nearest neighbors of a new observation X; the class of X is then determined by the class majority among its K nearest neighbors. To find the K nearest neighbors, one can use the Euclidean Distance (ED). For two data points represented by two vectors $x_i$ and $x_j$, the distance between them is given by Equation 2.19:

$d(x_i, x_j) = \sqrt{\sum_{k=1}^{d}(x_{ik} - x_{jk})^2}$   (2.19)

where d is the dimension of the vectors, $x_i$ is the first vector and $x_j$ is the second vector. The main advantage of this algorithm is its simplicity and the fact that it does not require a learning phase: the model consists only of a distance function and a function that chooses the class in terms of the classes of the nearest neighbors. KNN therefore falls into the category of non-parametric models. The introduction of new data can improve the performance of the algorithm without requiring the reconstruction of a model. These are major differences from algorithms such as the Artificial Neural Network (ANN).
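A minimal sketch of the classifier described above, in Python/NumPy (illustrative; X_train and y_train are assumed to be NumPy arrays of feature vectors and labels):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=5):
    """Plain k-NN using the Euclidean distance of Equation 2.19."""
    d = np.sqrt(((X_train - x) ** 2).sum(axis=1))   # distances to all training vectors
    nearest = np.argsort(d)[:k]                      # indices of the k nearest neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]  # majority class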

B. Support Vector Machine (SVM)
SVM is a binary classification method for supervised learning that was introduced by Vapnik in 1995, and is therefore a relatively recent alternative for classification. It is based on the existence of a linear classifier in a suitable space. Since it addresses a classification problem with two classes, this method uses a training set to learn the model parameters. It relies on so-called kernel functions, which allow an optimal separation of the data. SVM is particularly effective in that it can deal with problems involving large numbers of descriptors, provides a unique solution (no local minimum problems as in neural networks) and gives good results on real problems [57]. The algorithm in its original form looks for a linear decision boundary between two classes, but this model can be greatly enriched by projecting the data into another space to increase their separability; the same algorithm can then be applied in the new space, which results in a non-linear decision boundary in the initial space. The main advantages of the SVM are that it produces very accurate classifiers, with less overfitting, and that it is robust to noise [58].

• General Principles [59]
The simplest case is where the training data come from only two different classes (+1 or −1); this is called binary classification. The idea of SVMs is to find a hyperplane which separates these two classes. If such a hyperplane exists, that is to say, if the data are linearly separable, the classifier is called a hard-margin SVM, as in Figure 2.12.

Figure 2.12: The hyperplane H that separates the two sets of points.


The nearest points, which alone are used for the determination of the hyperplane, are called support vectors, as shown in Figure 2.13.

Figure 2.13: Support vectors.

The separating hyperplane is represented by Equation 2.20:

$H(x) = w^T x + b$   (2.20)

where w is the normal vector (the direction perpendicular to the hyperplane) and b is a constant. The decision function for an example x can be expressed as follows:

Class = +1 if H(x) > 0;  Class = −1 if H(x) < 0.

Since the two classes are linearly separable, no example lies on the hyperplane satisfying H(x) = 0. It is then appropriate to use the following decisions:

Class = +1 if H(x) > 1;  Class = −1 if H(x) < −1.


The values +1 and −1 on the right of the inequalities may be any constants +a and −a; by dividing both sides by a, the previous inequalities are recovered. They are equivalent to Equation 2.21:

$y_i(w^T x_i + b) \ge 1, \quad i = 1 \ldots n$   (2.21)

The hyperplane $w^T x_i + b = 0$ separates the two classes, and the distance between the hyperplane and the closest example is called the margin. The region that lies between the two hyperplanes $w^T x_i + b = -1$ and $w^T x_i + b = 1$ is called the generalization region of the learning machine: over this region the generalization capability of the machine is greater, and maximizing it is the objective of the training phase. The SVM method therefore seeks the hyperplane that maximizes the margin; such a hyperplane is called the "optimal hyperplane", as shown in Figure 2.14. Assuming that the training data contain no noisy (poorly-labeled) samples and that the test data follow the same probability distribution as the training data, the maximum margin hyperplane will certainly maximize the generalization ability of the learning machine.

Figure 2.14: Example of maximum margin (optimal hyperplane) [59].


In the case where the data are not linearly separable or contain noise (outliers: mislabeled data), the constraints cannot be satisfied and need to be relaxed. This can be done by admitting a certain classification error on the data, which is called the soft-margin SVM, as illustrated in Figure 2.15.

Figure 2.15: Linear SVM with soft margin [59].

Slack variables $\varepsilon_i$ are then introduced to relax the constraints, as in Equation 2.22:

$y_i(w^T x_i + b) \ge 1 - \varepsilon_i, \quad i = 1 \ldots n$   (2.22)

During testing, the input sample is associated with the class whose output is positive, following the rule: x ∈ $C_k$ if $w_k \cdot x + b_k > 0$. However, it is possible that several outputs are positive for a given test sample; this is particularly true of ambiguous data located near the borders between classes. In this case, a majority vote assigns the instance x to the class $C_k$ according to the decision rule in Equation 2.23:

$c = \arg\max_i \,(w_i \cdot x + b_i)$   (2.23)

On the other hand, when the training data are not linearly separable, the SVM uses a kernel function K (see Table 2.1) to map the data into a higher dimensional space (the feature space), where the data can be linearly separated, as shown in Figure 2.16.

Figure 2.16: Changing the data space. (a) Input space, (b) Feature Space.

Several SVM kernel functions have been used to map the input space into the feature space, giving good classification accuracy when classifying a new example. The most common kernel functions are shown in Table 2.1, and a small illustration in code follows the table.

Table 2.1: SVM kernels.

Kernel                        | Formula
Linear                        | K(x, y) = x·y
Sigmoid                       | K(x, y) = tanh(a x·y + b)
Radial Basis Function (RBF)   | K(x, y) = exp(−‖x − y‖² / σ²)
Polynomial                    | K(x, y) = (a x·y + b)^d
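For illustration, the kernels of Table 2.1 can be written directly in Python/NumPy (the parameter names a, b, d and sigma follow the table; default values are arbitrary choices):

import numpy as np

def linear(x, y):
    return x @ y                                      # K(x,y) = x·y

def sigmoid(x, y, a=1.0, b=0.0):
    return np.tanh(a * (x @ y) + b)                   # K(x,y) = tanh(a x·y + b)

def rbf(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / sigma**2)   # K(x,y) = exp(-||x-y||^2 / sigma^2)

def polynomial(x, y, a=1.0, b=1.0, d=3):
    return (a * (x @ y) + b) ** d                     # K(x,y) = (a x·y + b)^d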


2.5 Arabic Handwritten Text Recognition
Arabic Handwritten Text Recognition (AHTR) has been a very challenging field in recent years. Unlike Chinese and Latin, Arabic is considered a very difficult language for the recognition process due to its complex features [60]. In the next sections, a brief description of the Arabic language features and of the handwritten databases is presented.

2.5.1 Features of Arabic Language [61]
The main features of Arabic writing can be summarized as follows:
A. The Arabic alphabet has 28 basic letters. Unlike the Latin alphabet, each Arabic letter comes in several forms depending on its place in the word: initial (In), middle (M), final (F) and isolated (I), as shown in Table 2.2, which lists the 28 characters of the alphabet with their four writing forms.

Table 2.2: Arabic characters and their forms.


B. There is no difference between the shapes of handwritten letters and printed letters. The notions of capital and lowercase letters do not exist.
C. Most Arabic handwritten letters are linked, even in printed Arabic, which gives Arabic writing its cursive characteristic. Figure 2.17 illustrates the cursiveness of the Arabic language.

Figure 2.17: Cursiveness of Arabic language.

D. An Arabic character can contain a vertical line (TAA (ط)), an oblique stroke (KAF (كـ)) or a zigzag (HAMZA (ء)).
E. Arabic characters do not have a fixed size (height and width); the height varies from one character to another, and from one form to another within the same character.
F. In the Arabic language, a single word can be interspersed with one or more spaces, giving several semi-words (also called connected components) within a sub-word, as is the case for the word represented in Figure 2.18. In handwriting, the spacing between the different semi-words of the same sub-word is smaller than the spacing between two different sub-words.


Figure 2.18: Example of semi-words constituting an Arabic sub-word. (a) 4 semi-words, (b) 1 sub-word.

G. In the Arabic alphabet, 15 of the 28 letters have one or more points. These points are located either above or below the shape with which they are associated, but never both at once. The maximum number of points a letter may have is three points above the character or two points below it. Table 2.3 presents the letters with points, their numbers and positions.

Table 2.3: Arabic letters with diacritical points.

Number of Points | Above            | Below
One point        | خ ذ ز ض ظ غ ف ن | ب ج
Two points       | ت ق             | ي
Three points     | ث ش             | none


2.5.2 Arabic Handwritten Text Recognition Databases
In order to evaluate a handwritten recognition system, its accuracy and speed should be measured and compared with those of an average human reader. Although some works have been conducted on Arabic handwriting, they generally use small databases of their own or present results on databases unavailable to the public. There are handwritten databases that can be used for recognition purposes, but unfortunately most of them are not accessible because they were developed for a well-defined research work. The first common handwritten database is the Al-Isra handwritten database, proposed by Nawwaf K., et al. [62]. The data of the Al-Isra database were collected at the University of Amman. This database includes gray-level images of Arabic handwritten words (37000), numbers (10000 Arabic and Indian), signatures (2500) and texts (500 paragraphs). The second handwritten database is CENPARMI, published by Al-Ohali Y., et al. [63] in 2000. It has 7000 gray images of handwritten Saudi checks, divided into several parts: the first part has 1547 handwritten word images, the second part has 1547 printed word images, the third part has 23325 semi-word images, and the last part has 9865 images of Indian numbers. Alma'adeed S., et al. [64] presented the Arabic Handwriting Database (AHDB), for which a hundred writers were invited to write words from the vocabulary of digital amounts. In addition, the AHDB contains the most popular words in Arabic handwriting and numbers, comprising 4700 handwritten gray images. An example of the AHDB database is illustrated in Figure 2.19.

Figure 2.19: Example of the AHDB database [64].

A database for off-line Arabic handwriting, IESK-ArDB, was presented by Moftah E., et al. [65]. The database contains 280 pages of 14th-century historical manuscripts, more than 4000 gray images of handwritten words and 6000 segmented character images. The word vocabulary covers most Arabic parts of speech: nouns, verbs, country/city names, security terms and words used for writing bank amounts. An example of the IESK-ArDB database is illustrated in Figure 2.20.

Figure 2.20: Example of the IESK-ArDB database [65].

Furthermore, the last database presented is the IFN/ENIT database (Institute of Communications Technology (IFN) / National School of Engineers of Tunis (ENIT)). This database was developed by the IFN in cooperation with the ENIT in 2002. It consists of 5 subsets, containing in total 32492 images of Tunisian city/village names, collected from more than 1,000 writers of different ages and professions [66]. In this thesis the AHDB and IESK-ArDB databases are used for training and testing the system; these databases have been used in several works in the literature, which provide accuracy results to compare with. The Arabic handwritten databases are summarized in Table 2.4.


Table 2.4: Arabic handwritten databases.

# | Database Name | Type                                                   | Data                                                                              | Availability
1 | Al-Isra       | Arabic handwritten words, digits, signatures and text  | 37000 words, 10000 numbers, 2500 signatures, 500 texts (paragraphs)               | Confidential
2 | CENPARMI      | Arabic handwritten words                               | 1547 word amounts, 1547 digital amounts, 23325 semi-words, 9865 Indian numbers    | Confidential
3 | AHDB          | Arabic handwritten words                               | 3150 words                                                                        | Public
4 | IESK-ArDB     | Arabic handwritten words and characters                | 4000 words, 6000 characters                                                       | Confidential
5 | IFN/ENIT      | Arabic handwritten words                               | 32492 words                                                                       | Confidential

Most of the existing databases are unavailable online, and the others are not free to use. Besides, all the databases were created only for recognition systems and cannot be employed for security applications. In addition, the handwritten databases contain different text images written by the same writer, but no similar text images written by the same writer, which restricts the identification process to being text-independent only. Furthermore, the available handwritten databases contain many handwritten images whose writers are unknown, which makes the identification task impossible to implement. Therefore, to overcome the problems of the existing handwritten databases, a new database is needed.


In this thesis, an Arabic handwritten database is proposed to satisfy the recognition system’s requirements.

2.6 Handwritten Recognition Applications [67]
Through the particularization of problems, handwritten recognition has enabled the development of specific and effective applications, both offline and online.

2.6.1 Offline Handwritten Recognition
Offline handwritten recognition has experienced significant growth in areas associated with economic interests and the development of e-government services.

A. Reading Postal Addresses
Reading handwritten postal codes, together with reading the names of cities, has driven the development of automatic mail (letter) sorting machines.

B. Banking
Recognition of handwritten literal (worded) amounts, associated with the recognition of digital amounts, is used to validate checks; check recognition may also be associated with the corresponding coupons. Machines capable of reading several thousand checks per hour are already in use. The recognition of digital amounts has also allowed the creation of ATMs that accept bank checks: the client is identified by his/her bank card and types the check amount on the keyboard, and if the typed amount coincides with the recognized digital amount, the deposit of the check is immediately validated.


C. Forms and Schedules
These are mainly OCR applications for reading survey forms, order forms and insurance declarations.

D. User Authentication: Handwritten Verification and Identification
Handwriting-based user authentication can be achieved in two alternative classification modes: verification and identification. In user verification, the authentication system decides whether a set of handwritten features is similar enough to a set of reference templates of a person of claimed identity to confirm the claim or not; the result is always a binary yes/no decision, confirming an identity or not. Identification, on the other hand, describes the process of determining the identity of a writer based on handwritten features. Here, the classification assigns the handwritten features to one of all the classes of persons registered with a particular system. One possibility to implement identification is exhaustive verification, where the actual handwritten features are compared with all registered references and the identity is determined by the origin of the references of greatest similarity. In this view, identification can be accomplished by a systematic verification of an actual handwritten sample against the references of all known (registered) users in the authentication system; the result of this process then yields the identity linked to the references showing the greatest similarity. Consequently, the process of identification can be modeled as a sequence of one-to-all verifications [68].

2.6.2 Online Handwritten Recognition
Recognition of online writing is experiencing significant growth in areas related to natural, friendly and ergonomic human-machine communication, as well as portability.


• Personal Digital Assistant (PDA)
Recognition of online writing aims to replace the computer keyboard and mouse with a pen. This change is intended to make computers more user-friendly, allowing their use in very diverse situations (taking notes, writing orders, accident reports, teaching, etc.). This is, above all, what enables mobile computing.

2.7 Evaluation Measures of a Handwritten Recognition System
In order to evaluate the handwritten recognition system, several measures are used, each in one of the system's stages. First of all, in the segmentation stage, the pixel-based Matching Score (MS) criterion [69] is used to evaluate the segmentation results. The MS criterion is defined by Equation 2.24:

$MS = \frac{|T \cap T_G|}{|T \cup T_G|}$   (2.24)

where T is the segmented image produced by the segmentation algorithm and $T_G$ is the ground-truth image of the handwritten database. MS is a real number between 0 and 1 that represents the matching score between the resultant image and the original one; a higher MS indicates a better segmentation. In the preprocessing stage, the Misclassification Error (ME) evaluates the binary images according to the similarity between the output images and the ground-truth images. The ME counts the background pixels wrongly assigned to the foreground and, conversely, the foreground pixels wrongly assigned to the background [70]. Equation 2.25 is used to compute the ME:

$ME = 1 - \frac{|B_O \cap B_T| + |F_O \cap F_T|}{|B_O| + |F_O|}$   (2.25)


where $B_O$ and $F_O$ denote, respectively, the background and foreground of the original ground-truth image, and $B_T$ and $F_T$ denote the background and foreground pixels of the test image. The ME varies from 0 for a perfectly thresholded image to 1 for a totally wrongly thresholded image. The performance of the noise removal is measured by the Mean Square Error (MSE) and the Peak Signal to Noise Ratio (PSNR) [71]: a smaller MSE and a higher PSNR both indicate better noise removal performance. The MSE is computed using Equation 2.26:

$MSE = \frac{1}{M \cdot N}\sum_{i=1}^{M}\sum_{j=1}^{N}\left[I(i,j) - I'(i,j)\right]^2$   (2.26)

where I(i, j) is the original image, I′(i, j) is the reconstructed image after removing the noise, and M, N are the dimensions of the image. The PSNR is found by Equation 2.27:

$PSNR = 10 \log_{10}\!\left(\frac{255^2}{MSE}\right)$   (2.27)
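As an illustrative sketch (not part of the thesis), Equations 2.24-2.27 can be computed with a few lines of Python/NumPy; binary masks are assumed to use True for foreground pixels:

import numpy as np

def matching_score(T, TG):
    """Equation 2.24: intersection over union of two binary masks."""
    T, TG = T.astype(bool), TG.astype(bool)
    return (T & TG).sum() / (T | TG).sum()

def misclassification_error(gt, test):
    """Equation 2.25 on ground-truth/test binary images."""
    gt, test = gt.astype(bool), test.astype(bool)
    b = (~gt & ~test).sum()          # background kept as background
    f = (gt & test).sum()            # foreground kept as foreground
    return 1.0 - (b + f) / gt.size   # |B_O| + |F_O| equals the total pixel count

def mse_psnr(orig, denoised):
    """Equations 2.26-2.27 for 8-bit images."""
    mse = np.mean((orig.astype(float) - denoised.astype(float)) ** 2)
    psnr = 10 * np.log10(255.0 ** 2 / mse) if mse > 0 else float('inf')
    return mse, psnr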

On the other hand, to evaluate the accuracy of any handwritten recognition system, performance measures are used. A True Positive (TP) corresponds to a correct recognition of the test handwritten image as a certain member of the testing set, while a True Negative (TN) corresponds to a correct rejection. A False Positive (FP) means an incorrect recognition of the test handwritten image as a certain member of the testing set while it is not, and a False Negative (FN) corresponds to the error of failing to recognize the test handwritten image as a certain member of the testing set while it actually is [72].


Based on the above terminology, the True Positive Rate (TPR), also known as recall or sensitivity, is defined as the ratio between the number of TP and the total number of TP and FN, as in Equation 2.28:

$TPR = \frac{TP}{TP + FN}$   (2.28)

while the False Positive Rate (FPR) is obtained by Equation 2.29:

$FPR = \frac{FP}{FP + TN}$   (2.29)

These two measures are employed for class discrimination, as focused on in this thesis. The recognition rate (accuracy) can be defined as in Equation 2.30:

$\text{Recognition rate} = \frac{TP}{\text{Total number of test images}} \times 100\%$   (2.30)

However, the error rate can be defined as in Equation 2.31:

$\text{Error rate} = \frac{FP}{\text{Total number of test images}} \times 100\%$   (2.31)

Finally, the F1-score is used for computing the test accuracy based on precision and recall, where recall is the TPR of Equation 2.28 and precision is defined as TP/(TP + FP). The F1-score can be found using Equation 2.32:

$F1_{score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$   (2.32)
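A small Python sketch of Equations 2.28-2.32 (illustrative; n_test is the total number of test images, and precision is computed as TP/(TP + FP) as noted above):

def recognition_measures(tp, fp, tn, fn, n_test):
    tpr = tp / (tp + fn)                              # Equation 2.28 (recall)
    fpr = fp / (fp + tn)                              # Equation 2.29
    precision = tp / (tp + fp)
    recognition_rate = 100.0 * tp / n_test            # Equation 2.30
    error_rate = 100.0 * fp / n_test                  # Equation 2.31
    f1 = 2 * precision * tpr / (precision + tpr)      # Equation 2.32
    return tpr, fpr, recognition_rate, error_rate, f1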


Chapter Three
Proposed Arabic Handwritten Text Recognition System

3.1 Introduction
This chapter presents the proposed Arabic Handwritten Text Recognition System (AHTRS). The chapter describes the system architecture in detail, together with the proposed algorithms of the various stages of the system. Moreover, a proposed database of Arabic handwritten text is presented and discussed.

3.2 Architecture of the Proposed System
The main tasks of the proposed system are to recognize the input Arabic handwritten text by converting it into editable text, and to identify the writer of the document. The AHTR system has two modules: one for the recognition of Arabic handwritten text, and the other for identifying the writer of that text. The input to the system is an Arabic handwritten text image, which passes into both modules; the output of the first module is editable Arabic text, and the output of the second is the writer of the input Arabic handwritten text document. Figure 3.1 shows the proposed system architecture. Module1 works when a handwritten text image is entered into the system; a lexicon of Arabic text is used to map the output class labels to the desired editable Arabic text. Module2 works on the preprocessed handwritten sub-images obtained from module1; a list of known authorship is used to map the output class labels to the desired writers. The output of the system depends on the manager's query: the editable Arabic text or the handwritten text's writer.


Figure 3.1: Proposed AHTR system architecture.

3.3 Arabic Handwritten Text Recognition (Module1)
The aim of module1 of the AHTR system is to recognize the Arabic handwritten text. The input to module1 is a handwritten text image, and the output is editable Arabic text. Module1 has several main stages, each with several phases that work together to achieve the system goals. The main stages of module1 are: image acquisition, segmentation, preprocessing, features base construction, classification and post-processing, as shown in Figure 3.2.


Figure 3.2: Architecture of module1.

3.3.1 Image Acquisition Stage
In this stage the input handwritten text image is acquired via a camera or a scanner. The image may come in one of several formats, such as JPEG, BMP or PNG. The captured input may be in gray or color form; color images are converted into grayscale in order to reduce the image size, then passed to the next stage of the system. An example of an Arabic handwritten text image is illustrated in Figure 3.3.


Figure 3.3: Arabic handwritten text image. (a), (b) Color images, (c) Gray image.

3.3.2 Segmentation Stage
The input text image is segmented into several Arabic text segments (sub-words), and each segment is passed to the next stage of the system. In order to perform the segmentation, two features of Arabic handwritten text are considered. The first is that the longest Arabic word has eleven characters and the shortest Arabic word has only two characters. The second is that the distances between the handwritten sub-words in the Arabic text are greater than the distances between the semi-words belonging to the same Arabic handwritten sub-word, as shown in Figure 3.4.


Figure 3.4: Distance features of Arabic handwritten text.

According to these features, a segmentation algorithm is proposed: a rectangle is drawn around each Arabic handwritten segment in the input handwritten text image, and each segment is cropped and saved as a single handwritten sub-image. Algorithm 3.1 explains the steps of the proposed segmentation stage.

Algorithm 3.1: Text Segmentation
Input: Arabic handwritten text image (gray image)
Output: Set of sub-images // each sub-image contains a handwritten sub-word
Step1: Read the input image (I).
Step2: Apply the Sobel filter on the input image to detect the edges: I′ = edge(I).
Step3: Dilate the edge image I′. // by a 3x3 mask
Step4: Fill the holes, if found, in the edge image I′.
Step5: Scan the edge image I′ and label all the objects.
Step6: Find the center of mass C (centroid) of each labeled object.
Step7: Draw a rectangle around each object based on the centroid C.
Step8: Apply the obtained rectangles to the objects in the original input image (I).
Step9: Crop the objects inside the rectangles as sub-images I′1 . . . I′n. // n is the number of sub-images
Step10: Return (set of sub-images)

By applying Algorithm 3.1, several handwritten segments (sub-words) are obtained; each segment is given a name and then passed to the preprocessing stage. The proposed segmentation algorithm is simple and does not need to analyze the histograms of the input images in order to segment them into sub-images. Figure 3.5 presents an example of applying the main stages of the proposed segmentation algorithm (Algorithm 3.1), and a sketch of an equivalent implementation follows the figure.

Figure 3.5: Applying the proposed text segmentation algorithm. (a) Input image, (b) Applying the Sobel filter, (c) Applying the dilation and filling methods, (d) Drawing rectangles around the labeled objects, (e) Drawing the obtained rectangles on the original image, (f) Handwritten sub-images.
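The following Python sketch mirrors Algorithm 3.1 using scikit-image and SciPy equivalents (illustrative: the edge-threshold value 0.05 and the use of regionprops bounding boxes are assumptions, not the thesis code):

import numpy as np
from scipy import ndimage
from skimage import filters, morphology, measure

def segment_subwords(gray):
    """Sketch of Algorithm 3.1: Sobel edges, dilation, hole filling, labeling, cropping."""
    edges = filters.sobel(gray) > 0.05                   # Step 2 (threshold is an assumption)
    edges = morphology.dilation(edges, np.ones((3, 3)))  # Step 3: 3x3 dilation
    filled = ndimage.binary_fill_holes(edges)            # Step 4
    labels = measure.label(filled)                       # Step 5: label connected objects
    subs = []
    for region in measure.regionprops(labels):           # Steps 6-9
        r0, c0, r1, c1 = region.bbox                     # rectangle around the object
        subs.append(gray[r0:r1, c0:c1])                  # crop the sub-word image
    return subs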


3.3.3 Preprocessing Stage
The role of the preprocessing is to prepare the sub-images of the handwritten text for the next recognition stages, basically by reducing the noise superimposed on the data and keeping only the desired information. The proposed preprocessing stage has six steps:

A. Image thresholding,
B. Noise removal,
C. Black space elimination,
D. Image thinning,
E. Edge detection, and
F. Image scaling.

Each step has a different effect on the AHTR system, and working together they increase the recognition accuracy of the system. Figure 3.6 shows the steps of the proposed preprocessing stage.

Figure 3.6: The proposed preprocessing stage of module1.


The proposed stage produces three output handwritten images: a thinned handwritten image, a binary handwritten image and a handwritten edge image. Each of these images is used by specific feature extraction methods in order to build the feature vector used for training and testing the proposed system.

A. Image Thresholding
The first step of the proposed preprocessing stage is image thresholding, which converts the grayscale image into binary. Approaches based on the intensity histogram generally rely on two peaks for finding the threshold value, but many images do not have two such peaks in their histogram. Moreover, the common algorithms, such as global thresholding, give unsatisfactory results on handwritten text documents. Global thresholding converts the grayscale image into binary based on a single global threshold value; finding such a value is computationally complex, and the technique has side effects on the handwritten text, excluding some of its parts or including some noise, and vice versa. The proposed thresholding algorithm overcomes this problem by classifying pixels into foreground and background correctly, with fewer misclassified pixels. It starts by calculating the extreme pixel intensities (IMax and IMin) of the gray handwritten sub-image, then finds the mean and the difference between the maximum and minimum intensities. The result is fed to the Fuzzy C-Means clustering (FCM), presented in section 2.4.2(A), which attracts the nearest similar pixels through the clustering operation and produces the best threshold level to convert the image into binary. Algorithm 3.2 describes the main steps of the proposed image thresholding.


Algorithm 3.2: Arabic Handwritten Image Thresholding
Input: Gray image (handwritten sub-image)
Output: Binary image
Step1: Read the gray image.
Step2: Calculate the maximum and minimum pixel intensities:
  IMax = max(grayimage(:)). // maximum pixel intensity
  IMin = min(grayimage(:)). // minimum pixel intensity
Step3: Find the mean intensity:
  IMea = mean(grayimage(:)). // mean pixel intensity
Step4: Find the difference between the IMax and IMin intensities: Idiffr = IMax − IMin.
Step5: IF IMea < Idiffr THEN // compare mean with the intensity range
    Io = 110 // intensity offset for large intensity variations
  ELSE
    Io = 20 // intensity offset for lower intensity variations
  END IF
Step6: Calculate the threshold value: T = IMin + Io. // T is the threshold value
Step7: Feed the FCM with the T value and the gray image: // FCM: Fuzzy C-Means clustering
  Lev = FCM(T, gray image). // Lev is the threshold luminance level
Step8: Convert the gray image I into binary based on the Lev value: Ib = binary(I, Lev).
Step9: Return (binary image (Ib))

The pixel intensity values determine the black and white pixels in the image; the FCM then uses these black/white pixels as input and selects several of them as centroids for the clustering operation. After testing several values between 0 and 200 on 50 images, the offsets 20 and 110 were chosen experimentally as the best offset variation ranges for the handwritten images. If the difference between the foreground and background intensities is larger than the mean (Idiffr > IMea), the possible variation range of the text content is large and the offset (Io) is set to 110; for a small difference, the variation range of the text content is also small, so the offset (Io) is set to the small value 20. Experimental results on 50 Arabic handwritten images show that offsets of 20 for lower intensity variations and 110 for large variations cater for general pen pressure and color variations, and hence give very satisfactory results. The threshold value (T) is used to speed up the FCM processing by reducing the selected clustering points and the number of iterations. The FCM gives the best results for overlapping data sets and is comparatively better than the k-means algorithm: unlike k-means, where each data point must belong exclusively to one cluster center, here each data point is assigned a membership to every cluster center, so a data point may belong to more than one cluster. Figure 3.7 shows an Arabic handwritten sub-image after applying the proposed thresholding algorithm, and a sketch of the procedure follows the figure.

Figure 3.7: Image thresholding. (a) Original image, (b) Image after applying the proposed algorithm.
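An illustrative Python/NumPy sketch of Algorithm 3.2, using a simple 1-D two-cluster fuzzy c-means over pixel intensities (the centroid seeding around T and taking the final level midway between the two cluster centers are assumptions of this sketch):

import numpy as np

def fcm_threshold(gray, n_iter=20, m=2.0):
    """Sketch of Algorithm 3.2: contrast-based offset, then a 1-D two-cluster FCM."""
    g = gray.astype(float).ravel()
    i_max, i_min, i_mean = g.max(), g.min(), g.mean()
    io = 110 if i_mean < (i_max - i_min) else 20     # offsets from the thesis
    t = i_min + io                                   # initial threshold T
    c = np.array([max(i_min, t - 1.0), min(i_max, t + 1.0)])  # two seeded centroids
    for _ in range(n_iter):
        d = np.abs(g[:, None] - c[None, :]) + 1e-9            # distances to centroids
        u = 1.0 / (d ** (2 / (m - 1)))                        # fuzzy memberships
        u /= u.sum(axis=1, keepdims=True)
        c = (u ** m * g[:, None]).sum(axis=0) / (u ** m).sum(axis=0)
    lev = c.mean()                                   # luminance level between the clusters
    return gray > lev   # binary image; invert if the text is darker than the background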

B. Noise Removal
During the acquisition and thresholding steps, some false pixels are added to the handwritten text image and its sub-images. These pixels represent noise that degrades the image quality by creating irregularities on the outline of the handwritten text. In order to remove this noise, a simple and practical noise removal algorithm is applied.


This algorithm eliminates the noise in the handwritten binary image by removing isolated pixel groups. Given a binary image of Arabic handwritten text and two thresholds, the algorithm removes the connected components whose pixel counts fall below or above the thresholds, by assigning the value zero (0) to their pixels. Algorithm 3.3 describes the proposed noise removal.

Algorithm 3.3: Noise Removal
Input: Binary image (output image from the thresholding process)
Output: Binary image without noise
Step1: Read the binary image (I).
Step2: Initialize the first and second thresholds: T1 = 3, T2 = 300.
Step3: Calculate the label matrix (L), which contains labels for the 8-connected objects.
Step4: Find the number of connected objects (Num) in (I).
Step5: FOR j = 1 to Num // Num is the number of objects in the image
  5.1: Find the positions of the object in the binary image (rows and columns):
       [obj_idx] = find(L == j). // obj_idx holds the position indexes of object j in L
  5.2: IF length(obj_idx) < T1 THEN // length(obj_idx) is the number of the object's pixels
         Output image (j) = 0.
       ELSE
         Output image (j) = I(j).
       END IF
  5.3: IF length(obj_idx) > T2 THEN
         Output image (j) = 0.
       ELSE
         Output image (j) = I(j).
       END IF
END FOR
Step6: Return (Output image)


The proposed noise removal algorithm uses two thresholds, T1 = 3 and T2 = 300, which are respectively smaller and larger than the pixel counts of the smallest and biggest handwritten Arabic components. The threshold values were obtained after many tests on several handwritten images from the databases used. The algorithm labels each component in the image and then eliminates all the unwanted pixel groups around these components, depending on the chosen thresholds. This makes the proposed algorithm simple and efficient: it removes only the unwanted pixels, without affecting the original shape of the handwritten text in the input image, unlike existing methods which depend on sets of filters and remove part of the handwritten text shape. Figure 3.8 shows an example of applying the proposed noise removal algorithm, and a sketch of the procedure follows the figure.

Figure 3.8: Noise removal. (a) Input image before applying the proposed noise removal algorithm, (b) Output image after applying it.
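A minimal Python sketch of Algorithm 3.3 using scikit-image connected-component labeling (illustrative; True marks foreground pixels):

import numpy as np
from skimage import measure

def remove_noise(binary, t1=3, t2=300):
    """Sketch of Algorithm 3.3: drop 8-connected components with pixel counts
    below t1 or above t2."""
    labels = measure.label(binary, connectivity=2)   # 8-connectivity in 2-D
    counts = np.bincount(labels.ravel())             # pixel count per component
    keep = (counts >= t1) & (counts <= t2)
    keep[0] = False                                  # label 0 is the background
    return keep[labels]                              # binary image without noise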

C. Black Space Elimination
The third step of the proposed preprocessing stage is removing the black space around the handwritten text in the background. The black space, represented by the value (0), does not carry any useful information about the handwritten text and does not help the recognition system; it may affect the feature extraction results and make them inefficient. The proposed approach for removing the black space is based on counting the number of black pixels from all image directions. From each side of the binary image, the closest foreground pixel of the written text is found. This produces four points which form the boundaries of the bounding box; the black area around this box can then be eliminated using these four values. The main steps of the black space elimination are described in Algorithm 3.4.

Algorithm 3.4: Black Space Elimination
Input: Binary image (output from noise removal Algorithm 3.3)
Output: Binary image without black space around the text
Step1: Read the binary image.
Step2: Scan each row and column of the image.
Step3: Calculate the distances between all the image borders and the closest foreground pixels by counting the number of black (background) pixels.
Step4: From all the obtained distances find:
  4.1: Itop // the closest pixel from the top side of the image
  4.2: Ibottom // the closest pixel from the bottom side of the image
  4.3: Ileft // the closest pixel from the left side of the image
  4.4: Iright // the closest pixel from the right side of the image
Step5: Draw a rectangle (bounding box) around the text according to the points (Itop, Ibottom, Ileft, Iright).
Step6: Crop only the bounding box area of the image.
Step7: Return (binary image without black space around the text)

By applying the Black Space Elimination (Algorithm 3.4), many unwanted pixels in the background are removed without any effect on the shape of the handwritten text, as shown in Figure 3.9; a sketch of the procedure follows the figure.


Figure 3.9: Black space elimination. (a) Input image, (b) Output image of the Black Space Elimination algorithm.
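An illustrative Python/NumPy sketch of Algorithm 3.4 (the text pixels are assumed to be True/1 and the background 0):

import numpy as np

def crop_foreground(binary):
    """Sketch of Algorithm 3.4: crop to the bounding box of the text pixels."""
    rows = np.any(binary, axis=1)                       # rows containing foreground
    cols = np.any(binary, axis=0)                       # columns containing foreground
    top, bottom = np.argmax(rows), len(rows) - np.argmax(rows[::-1])
    left, right = np.argmax(cols), len(cols) - np.argmax(cols[::-1])
    return binary[top:bottom, left:right]               # cropped bounding box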

D. Image Thinning
In order to extract the image skeleton, the Stentiford thinning algorithm (Algorithm 2.1 in section 2.4.2(D)) is used. This step is important for the structural feature extraction: the skeleton image is clean and uses very few pixels to represent the shape of the handwritten text. An example of applying the Stentiford thinning algorithm is illustrated in Figure 3.10.

Figure 3.10: Image thinning. (a) Input image, (b) Thinned image.


E. Edge Detection
The fifth step of the proposed preprocessing stage is edge detection. To extract the image edges, the Sobel edge detector described in section 2.4.2(C) is used, as it gives the best results for Arabic handwritten images. The binary image obtained from the black space elimination algorithm is converted to gray and its edges are then detected: first the Sobel vertical and horizontal convolution kernels are applied to the image, then the magnitude is calculated using Equation 2.4. Figure 3.11 shows an edge image obtained by applying the Sobel detector.

Figure 3.11: Edge image using the Sobel detector. (a) Input image, (b) Output image.

F. Image Scaling
Each writer writes text in a different style and size. It is therefore important to bring all the images to the same size, which makes the feature extraction process faster; moreover, with images of equal size, the extracted features are more effective for the classification stage. Scaling is often used to bring text images to a fixed size. In the proposed system, all the images are normalized to the same size (128x128), as shown in Figure 3.12, which is the final output of the preprocessing stage. The scaling size was chosen after many tests on 500 images of various sizes.


Figure 3.12: Image scaling. (a, b, c) Images with various sizes (214x376) and (117x278), (d, e, f) Images after scaling to a single size (128x128).

3.3.4 Features Base Construction Stage
The objective of the features base construction stage is to capture the most relevant and discriminative characteristics of the handwritten text image and store them in a database. The selection of good features strongly affects the classification performance and reduces the computational time. In other words, it is possible to choose a set of features that capture the significant differences from one class to another; such features consequently make the classification task easier. The features used must be suitable for the application and for the applied classifier. In the proposed system two groups of features bases are constructed: features base1 for module1 and features base2 for module2.


The features base1 group is used in module1 for recognition, by finding the most similar features for the same handwritten text, while features base2 is used in module2 for identification, by finding the most similar features in the handwritten text of the same writer. Moreover, all the features extracted in module1 are saved in features base1, which contains a set of vectors representing the handwritten text features, as shown in Figure 3.13.

Figure 3.13: Proposed features base construction of module1.

There are four main steps for features base construction of module1 which are: features extraction, feature vector, features normalization and features base1 construction.

A. Features Extraction
The first group of features, used for recognition in module1, is extracted by several methods, as described below:


1. Structural Features
The first set of features is the structural features. The primitive visual elements of Arabic writing form structural descriptors connected to the shape of the writing. Several descriptors specific to Arabic text can be extracted from the writing: the number of points, the number of loops, the number of end points and the number of junctions, as shown in Figure 3.14.


Figure 3.14: Different Arabic descriptors.

The first feature extracted in the proposed system is the number of points. Points are very important features, due to the variations in Arabic text; they are the smallest items in the image and contain only a few pixels. The second extracted feature is the number of loops: nine of the Arabic letters have one loop and only one character has two loops. The loops feature has a stronger effect on the system when it is extracted from an Arabic handwritten sub-word than from a single character. The third extracted feature is the number of end points; each Arabic handwritten sub-word has a unique number of end points that distinguishes it from the other sub-words. The last extracted feature is the number of junctions.


2. Statistical Features
The second set of features is extracted by a proposed statistical feature extraction method. The proposed method depends on dividing the edge image obtained from the preprocessing stage into various numbers of blocks, then extracting the required features from each block individually. The handwritten text image is divided into four square blocks, eight blocks and eight vertical blocks, as illustrated in Figure 3.15(b, c and d).

Figure 3.15: Image blocking. (a) Original image, (b) Four-block division, (c) Eight-block division, (d) Eight vertical blocks, (e) Diagonal pixels of the original image, (f) Diagonal pixels of the four divided blocks.


The first features are extracted by dividing the handwritten text image into four blocks, as in Figure 3.15(b), and applying a specific mathematical operation, such as the summation of the diagonal pixels only, as in Figure 3.15(e and f), to get the features of each block. The second features are obtained by dividing the image into eight equal blocks, as in Figure 3.15(c). The last features are obtained by dividing the image into eight vertical blocks and taking the summation of the white pixels of each block, as in Figure 3.15(d). All the extraction steps of the proposed statistical features are illustrated in Algorithm 3.5, followed by a short sketch.

Algorithm 3.5: Statistical Features
Input: Edge image
Output: Statistical features
Step1: Read the input image (I).
Step2: Calculate the summation of the diagonal pixels of the input image (I).
Step3: Divide the input image (I) into four blocks b1, b2, b3, b4.
Step4: Calculate the summation of the diagonal white pixels of each block:
  d1 = ds(b1), d2 = ds(b2), d3 = ds(b3), d4 = ds(b4). // ds: diagonal summation
Step5: Divide the input image (I) into eight blocks bk1, ..., bk8.
Step6: Calculate the summation of the white pixels of each block:
  s1 = ps(bk1), ..., s8 = ps(bk8). // ps: pixels summation
Step7: Divide the input image (I) into eight vertical blocks vb1, ..., vb8.
Step8: Calculate the summation of the white pixels of each block:
  vs1 = vps(vb1), ..., vs8 = vps(vb8). // vps: vertical pixels summation
Step9: statistical features = [d1 ... d4, s1 ... s8, vs1 ... vs8].
Step10: Return (statistical features vector)
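A Python/NumPy sketch of Algorithm 3.5 (illustrative: the eight equal blocks are assumed to form a 2x4 grid, and white pixels are assumed to have value 1):

import numpy as np

def statistical_features(edge):
    """Sketch of Algorithm 3.5: diagonal sums and block pixel sums."""
    h, w = edge.shape
    h2, w2 = h // 2, w // 2
    quads = [edge[:h2, :w2], edge[:h2, w2:], edge[h2:, :w2], edge[h2:, w2:]]
    d = [np.trace(q) for q in quads]                        # diagonal sums of 4 blocks
    s = [b.sum() for row in np.array_split(edge, 2, axis=0)
                 for b in np.array_split(row, 4, axis=1)]   # 8 equal blocks (assumed 2x4)
    vs = [b.sum() for b in np.array_split(edge, 8, axis=1)] # 8 vertical strips
    return np.array(d + s + vs, dtype=float)                # 4 + 8 + 8 = 20 features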


3. Discrete Cosine Transform (DCT) Features
In the proposed system the DCT is applied to the whole handwritten sub-image produced by the preprocessing stage. The output of the DCT is an array of DCT coefficients. The features are extracted as a vector sequence by arranging the DCT coefficients in zigzag order; as a result, most of the DCT coefficients away from the beginning are small or zero. In order to choose an appropriate number of coefficients to represent the features, 500 handwritten images were selected for testing: the DCT was applied to all the images, and the energy retained was measured by reconstructing the original images from the selected coefficients. Examples of the energy contained in different numbers of DCT coefficients are shown in Table 3.1.

Table 3.1: The energy contained in different numbers of DCT coefficients.

Number of DCT Coefficients | Image_1 Energy % | Image_2 Energy % | Image_3 Energy % | Image_4 Energy %
10                         | 99.15            | 98.91            | 98.30            | 98.50
20                         | 99.16            | 98.97            | 98.8             | 99
30                         | 99.19            | 98.99            | 99.1             | 99.1
50                         | 99.77            | 99.18            | 99.22            | 99.23
All                        | 100              | 100              | 100              | 100

The first 50 coefficients give the best energy for reconstructing the original images. By testing the coefficients, it was found that the best number of DCT coefficients to represent the handwritten sub-word as a feature vector with a minimum number of features is the first 10 coefficients, which already retain good energy. To compute the DCT features, the Fast Cosine Transform (FCT) of section 2.4.4(C) is used. The main steps to get the DCT features are presented in Algorithm 3.6, followed by a short sketch.


Algorithm 3.6: DCT Features Extraction
Input: Edge image
Output: DCT features
Step1: Read the input image.
Step2: Compute the DCT of the input image using the FCT (Algorithm 2.2).
Step3: Convert the DCT image into a 1D array in zigzag order.
Step4: Select the first 10 DCT coefficients as features.
Step5: Return (DCT features vector)
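An illustrative Python sketch of Algorithm 3.6 using SciPy's DCT (the zigzag scan is implemented by sorting indices along anti-diagonals, JPEG-style, which is an assumption about the exact scan order used):

import numpy as np
from scipy.fft import dctn

def dct_features(img, n_coef=10):
    """Sketch of Algorithm 3.6: 2-D DCT, zigzag scan, first n_coef coefficients."""
    d = dctn(img.astype(float), norm='ortho')        # 2-D DCT-II
    h, w = d.shape
    # Zigzag order: group indices by anti-diagonal, alternating direction
    order = sorted(((i, j) for i in range(h) for j in range(w)),
                   key=lambda p: (p[0] + p[1],
                                  p[0] if (p[0] + p[1]) % 2 else p[1]))
    zz = np.array([d[i, j] for i, j in order])       # coefficients in zigzag order
    return zz[:n_coef]                                # keep the first n_coef features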

4. Modified Histogram of Oriented Gradient (MHOG1) Features
The most important features extracted in the proposed system are the MHOG features, obtained through several proposed steps. First, the binary image obtained from the previous stages is converted to gray form (multiplied by 255), and to find the gradient of the image two edge detector filters are proposed, as in Equations 3.1 and 3.2; the results of applying the proposed filters are shown in Figure 3.16:

$\text{Anti-diagonal direction filter} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$   (3.1)

$\text{Diagonal direction filter} = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}$   (3.2)

Figure 3.16: Edge detection. (a) Original image, (b) X-axis, (c) Y-axis.


The next step is computing the image magnitude and orientation using equations 2.4 and 2.5 respectively. The gradient magnitude and direction are shown in Figure 3.17.

Figure 3.17: Image gradient. (a) Image magnitude, (b) Image direction.

After that, the gradient image is divided into 6x6 cells, and the image is scanned from left to right with 2x2 overlapping blocks, as in Figure 3.18.

Figure 3.18: Cells division.

For each block, the histogram of the orientation gradient is obtained based on a weighted vote into orientation bins over the spatial cells. Since the raw gradient orientation can take many directions, in the proposed system it is quantized into 9 bins, with bin centers spanning from −140° up to π (180°) and a spacing of 2π/9 (40°) between successive directions, as shown in Figure 3.19.

Figure 3.19: Histogram of oriented gradients. (a) Gradient orientations, (b) Histogram of oriented gradients for one cell.

If the gradient orientation of a pixel does not match any of the quantized orientations, its vote is interpolated linearly between the neighboring bin centers. For example, if theta = 75°, which falls between the bin centers 60° and 100°, the distances to bin 60 and bin 100 are 15° and 25° respectively, and the spacing between 60 and 100 is 40°; hence the vote ratios are 1 − 15/40 = 0.625 for bin 60 and 1 − 25/40 = 0.375 for bin 100, as illustrated in Figure 3.20. A small sketch of this interpolation follows the figure.

Figure 3.20: Interpolation votes of gradient orientation.
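A small Python sketch of the vote interpolation described above (illustrative; interior bin centers 40° apart as in the example, with boundary bins omitted for brevity):

def interpolate_vote(theta, mag, centers=(20, 60, 100, 140)):
    """Split one gradient vote linearly between the two neighboring bin centers."""
    votes = {}
    for lo, hi in zip(centers[:-1], centers[1:]):
        if lo <= theta <= hi:
            span = hi - lo                                 # 40 degrees between centers
            votes[lo] = mag * (1 - (theta - lo) / span)    # e.g. 1 - 15/40 = 0.625
            votes[hi] = mag * (1 - (hi - theta) / span)    # e.g. 1 - 25/40 = 0.375
    return votes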


After that, all the output histograms are concatenated into a 1D MHOG1 feature vector that represents the image features, as shown in Figure 3.21. All the extraction steps are illustrated in Algorithm 3.7.

Figure 3.21: Histograms concatenation.

Algorithm 3.7: MHOG1 Features
Input: Binary sub-image (word)
Output: MHOG1 features
Step1: Read the input image (I). // binary sub-image
Step2: Convert the image into gray form Ig. // multiply by 255
Step3: Apply the proposed edge filters (Equations 3.1 and 3.2) on the image Ig.
Step4: Compute the gradient magnitude Igm and orientation Igo (Equations 2.4 and 2.5).
Step5: Construct the gradient image Igr = Igm + Igo.
Step6: Divide the gradient image Igr into a number of 6x6 cells.
Step7: Scan the image with 2x2 overlapping blocks and compute the histogram for each block.
Step8: Accumulate interpolated votes into the orientation bins (9) over the spatial cells.
Step9: Concatenate the output histograms into a 1D vector (MHOG1).
Step10: Return (MHOG1 vector)


B. Feature Vector
Each feature extraction method extracts a number of features based on its process, so four types of features are extracted for each input handwritten image. In the feature vector step, all the features extracted by Algorithms 3.5, 3.6 and 3.7 are combined and saved in a one-dimensional array (1x120) called the feature vector. The created feature vector represents the input Arabic handwritten sub-image by a set of features that will be used for classification. Figure 3.22 shows the building of the feature vector.

Figure 3.22: Feature vector.

C. Features Normalization
An important step to make the mathematical computation simple and fast is normalizing the numbers: large values complicate the calculations and lengthen the processing time. Therefore, in the proposed work feature normalization (scaling) is used to bring the features to the same scale. Since the feature vectors produced by the proposed extraction algorithms contain both signed and unsigned features, all the features are scaled into the range [−1, 1] by applying Algorithm 3.8.


Algorithm 3.8: Features Normalization
Input: Feature vector
Output: Normalized feature vector
Step1: Load the feature vector F.
Step2: For each column:
  2.1: mx = Max(F) // mx is the biggest number in F
  2.2: mi = Min(F) // mi is the smallest number in F
  2.3: F′ = (F − mi) / (mx − mi) // F′ is the normalized vector
  2.4: F′ = (F′ * 2) − 1 // bring the F′ values into the range [−1, 1]
End For
Step3: Return (normalized feature vector)

Several scaling ranges were tried in order to choose the most appropriate one; the best scaling range was [−1, 1], which reduced the training and classification time of the proposed system. A compact sketch of the normalization is given below.
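Algorithm 3.8 reduces to a few lines of Python/NumPy (illustrative; a small guard against constant columns is added, which is an assumption of this sketch):

import numpy as np

def normalize_features(F):
    """Column-wise min-max scaling of a feature base into [-1, 1]."""
    F = np.asarray(F, dtype=float)
    mn, mx = F.min(axis=0), F.max(axis=0)
    return 2.0 * (F - mn) / np.maximum(mx - mn, 1e-12) - 1.0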

D. Features Base1 Construction
The final step in the features base construction stage is to create features base1. Each feature vector is first given a label: an integer number that represents the desired class of the feature vector. All the feature vectors that belong to the same class (same sub-word) receive the same label, to make the classification process more efficient. All the feature vectors are then stored in a two-dimensional array (rows represent the classes and columns the features) and saved as a base in a file.

3.3.5 Classification Stage
In the proposed system, SVM is employed as a one-vs-all multi-class classifier. Since SVM is a binary classifier, each Arabic handwritten sub-image (class) has an SVM model that represents the handwritten text in the sub-image. The first SVM model separates


the class "عبد" from the remaining classes (sub-images): the "عبد" handwritten sub-image class is considered the positive class and all remaining handwritten sub-images form the negative class. Figure 3.23 shows an example of applying the one-against-all approach in the proposed system.

Figure 3.23: Architecture of the proposed SVM one-against-all approach.

The proposed system uses three different Arabic handwritten databases. In the AHDB database, 2730 handwritten images were used for training and 1365 for testing; in the IESK-ArDB database, 420 handwritten images were used for training and 180 for testing. In the proposed database, 780 handwritten images were used for training and 520 for testing. During the training process, the handwritten image goes through image acquisition (section 3.3.1), the preprocessing stage (section 3.3.3) and features base construction (section 3.3.4), which saves all the extracted feature vectors.


During the testing process, images go through the same stages as in training, plus the segmentation stage (section 3.3.2). All the features extracted during training and testing are used to train and test the classifier in order to obtain the best accuracy in recognizing the desired class labels. The overall SVM training and testing processes are shown in Figure 3.24.

Figure 3.24: Classification process of module1.

During training, all the examples of the class under consideration are labeled positive (+1) and all examples not belonging to this class are labeled negative (-1). During testing, on the other hand, the input Arabic handwritten sub-image is associated with the class whose output is positive. The classification process based on the SVM classifier is illustrated in Algorithm 3.9 for training and Algorithm 3.10 for testing.


Algorithm 3.9: SVM Training
Input: Features base
Output: Multi SVM Model
Step1: Load the features base (features base1)
Step2: Assign a unique class label to the group of feature vectors of the handwritten sub-images that have the same Arabic text
Step3: For all class labels
    3.1: Gather the feature vectors with the same class label as a group
    3.2: Assign a positive label (+1) to the first group, and a negative label (-1) to the other groups
    3.3: Train the SVM with the groups and their labels
    3.4: Build the SVM model for the first group
    3.5: Repeat steps 3.1-3.4 for the remaining groups
End For
Step4: Assign the unique label from Step2 back to each output SVM model
Step5: Save all the SVM models and their labels as one general model (Multi SVM Model)
Step6: Return (Multi SVM model)

Algorithm 3.10: SVM Testing
Input: Feature vectors (testing set)
Output: Predicted labels
Step1: Load the feature vectors (testing set)
Step2: Load the Multi SVM model
Step3: Assign a unique class label to the group of feature vectors of the handwritten sub-images that have the same Arabic text
Step4: For all class labels
    4.1: Gather the feature vectors with the same class label as a group
    4.2: Test each group against the Multi SVM model (Algorithm 3.9)    // predict the desired labels
    4.3: Save the testing outputs as a 1D vector of predicted class labels
End For
Step5: Return (predicted class labels)
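A compact sketch of the one-vs-all scheme of Algorithms 3.9 and 3.10, written here with scikit-learn's binary SVC. The thesis uses its own SVM implementation, and resolving the winning class by the largest decision value is a common convention assumed here (the thesis states the class whose output is positive is chosen):

```python
import numpy as np
from sklearn.svm import SVC

def train_one_vs_all(features, labels):
    """Algorithm 3.9 sketch: one binary SVM per sub-word class."""
    models = {}
    for c in np.unique(labels):
        y = np.where(labels == c, 1, -1)          # positive class vs. all others
        models[c] = SVC(kernel='poly', degree=3).fit(features, y)
    return models

def predict_one_vs_all(models, features):
    """Algorithm 3.10 sketch: pick the class with the largest SVM output."""
    classes = list(models)
    scores = np.column_stack([models[c].decision_function(features)
                              for c in classes])
    return np.array(classes)[scores.argmax(axis=1)]
```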


3.3.6 Post-processing Stage
The final stage of module1 is post-processing. The output of the classification stage is a class label for each input Arabic handwritten sub-image. In this stage these labels are used to build the output text and display it in the system interface. Most computer applications do not support the Arabic language directly; therefore, in the post-processing stage an Arabic lexicon is built. The lexicon contains all the required texts and the Unicode code points of each text, which simplifies dealing with that text. A sample of the proposed Arabic lexicon is presented in Table 3.2.

Table 3.2: Sample of the proposed Arabic lexicon.
Arabic Text   Unicode
الله          [1575, 1604, 1604, 1607]
عبد           [1593, 1576, 1583]
العام         [1575, 1604, 1593, 1575, 1605]

The Arabic lexicon gives the proposed recognition system the ability to produce editable recognized Arabic text. In the literature reviewed in chapter one (section 1.6), previous systems report the recognition result as an accuracy figure only, while some systems output ASCII codes for the desired handwritten text. In both cases the end users cannot see any text as system output; they see only the accuracy, the class labels or the ASCII codes. In addition, ASCII codes for Arabic yield only single characters, without joining them into complete sub-words or sentences. The proposed Arabic lexicon overcomes all these problems and provides the end users with the required editable Arabic text.
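A minimal sketch of this lookup, built from the Table 3.2 entries. The integer class labels are hypothetical, and Python's chr() turns each stored Unicode code point into a character:

```python
# Class label -> Unicode code points, as stored in the proposed lexicon (Table 3.2).
lexicon = {
    1: [1575, 1604, 1604, 1607],         # "الله"
    2: [1593, 1576, 1583],               # "عبد"
    3: [1575, 1604, 1593, 1575, 1605],   # "العام"
}

def labels_to_text(labels):
    """Convert predicted class labels into an editable Arabic string."""
    words = [''.join(chr(cp) for cp in lexicon[label]) for label in labels]
    return ' '.join(words)

print(labels_to_text([2, 1]))            # -> "عبد الله"
```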


3.4 Arabic Handwritten Text Writer Identification (Module2)
Module2 is used to identify the writer of the input Arabic handwritten text images in the AHTR system. The main stages of module2 are: image acquisition, segmentation, preprocessing, features extraction, classification and post-processing, as illustrated in Figure 3.25.

Figure 3.25: Architecture of writer identification (module2).

Most of the stages in module2 are similar to the module1 stages, with small differences. Image acquisition (section 3.3.1) prepares the handwritten text image for the following processes by converting the input image into gray. Algorithm 3.1 segments the input handwritten image into sub-words. Algorithm 3.2 converts each handwritten sub-image into binary in the preprocessing stage, then the noise is removed by Algorithm 3.3. Moreover, Algorithm 3.4 removes the black space around the handwritten image, as shown in Figure 3.26.


Figure 3.26: Preprocessing stage of writer identification (module2).

3.4.1 Features Base Construction (module2)
The features used in module2 identify the writers of the Arabic handwritten text images by capturing the differences between the writers' styles and characteristics. These features are extracted from each segmented image, and all the extracted features are saved in features base2, which contains a set of vectors that represent the writers' features, as shown in Figure 3.27.

Figure 3.27: Proposed features base construction of module2.


Two types of features are extracted for module2. The first type is extracted by the Modified HOG descriptor, and the second consists of shape features extracted by several methods. In both cases, the features are extracted from the handwritten sub-images output by the preprocessing stage.

A. Modified Histogram of Oriented Gradient (MHOG2) Features
To extract the first type of features, the binary image is first converted back into gray and then normalized to size 128x128. After that, the proposed filters in Equations 3.1 and 3.2 are applied to the image to obtain the edges in the X and Y directions, as in Figure 3.16. The proposed MHOG for identification uses only the gradient directions of the image, illustrated in Figure 3.17(b), for extracting the required features. Furthermore, the obtained image directions are divided into four blocks, as shown in Figure 3.28.

Figure 3.28: Block division.

For each block, the histogram of the gradient orientations is obtained based on weighted votes into orientation bins. The gradient orientation is quantized into 10 bins within the range [-π, π], with a spacing of (2π/bins) between directions. The weight


of each bin in the histogram depends on the number of directions appearing in each block, as in Figure 3.29.

Figure 3.29: Histogram of gradient orientation of MHOG2.

However, if the gradient orientation of a pixel does not match any of the quantized directions, its vote is interpolated linearly between the neighboring bin centers. Finally, all the output histograms are concatenated to form a 1D vector that represents the MHOG2 features. All the extraction steps of MHOG2 are illustrated in Algorithm 3.11.

Algorithm 3.11: MHOG2 Features
Input: Binary handwritten segment image
Output: MHOG2 features vector
Step1: Read the input image (I)
Step2: Convert the image into gray form Ig1    \\ multiply by 255
Step3: Normalize the image Ig1 into 128x128
Step4: Apply the proposed filters (Equations 3.1 and 3.2) on the gray image Ig1
Step5: Compute the gradient orientation Igo1 (Equation 2.5)
Step6: Divide the gradient orientation image Igo1 into 4 blocks
Step7: Compute the histogram for each block
Step8: Accumulate interpolated votes into the 10 orientation bins
Step9: Concatenate the output histograms into a 1D vector (MHOG2)
Step10: Return (MHOG2 features vector)

B. Shape Features
The second type of extracted features is the shape features: several features that depend on the shape of the handwritten text are extracted from the handwritten sub-images. The first is the aspect ratio (Equation 2.16) of the handwritten text in the sub-image, found from the width and height of the text as in Figure 3.30. The width is found by scanning each column of the binary image for the first and last foreground pixels and storing their column numbers; the width is the column number of the last pixel minus that of the first. Similarly, the height is found by scanning each row of the binary image for the first and last foreground pixels and storing their row numbers; the height is the row number of the last pixel minus that of the first. The aspect ratios of all the sub-images are stored as features. The second extracted feature is the centroid, which represents the center of mass of the handwritten sub-image: it is taken as the point at the center of the width and height, as shown in Figure 3.30.


Figure 3.30: Width, height and centroid of the handwritten sub-image.

The third feature is the area of the handwritten text in the sub-image, calculated as the number of foreground pixels. The fourth extracted feature is the perimeter: the number of "1" pixels that have "0" pixels as neighbors is counted, and the result represents the perimeter. If the sub-image contains more than one handwritten segment, the perimeter is the sum of the results for each segment. Algorithm 3.12 shows the steps of extracting the proposed shape features.

Algorithm 3.12: Shape Features
Input: Binary handwritten text (sub-image)
Output: Set of shape features
Step1: Read the input sub-image (I)
Step2: Find the width Iw and height Ih of the handwritten text in the sub-image (I)
Step3: Calculate the Aspect Ratio (AR) (Equation 2.21) of the handwritten text in sub-image (I)
Step4: Find the Centroid (Cn) of the handwritten text in sub-image (I)
Step5: Find the Area (Are) of the handwritten text in sub-image (I)
Step6: Find the Perimeter (Pr) of the handwritten text in sub-image (I)
Step7: Combine all the extracted features and save them in a one-dimensional array [AR, Cn, Are, Pr]    // vector
Step8: Return (set of shape features)
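The following sketch computes the four shape features as described above. The centroid is taken as the centre of the text's bounding box (matching the description in terms of the width and height centers), and the perimeter counts foreground pixels with at least one 4-connected background neighbour; the exact neighbourhood used by the thesis is an assumption here.

```python
import numpy as np

def shape_features(binary_img):
    """Sketch of Algorithm 3.12 for a 0/1 binary sub-image."""
    rows, cols = np.nonzero(binary_img)
    height = rows.max() - rows.min() + 1             # first to last foreground row
    width = cols.max() - cols.min() + 1              # first to last foreground column
    aspect_ratio = width / height                    # assumed width/height ratio
    centroid = ((rows.min() + rows.max()) / 2.0,     # centre of the bounding box
                (cols.min() + cols.max()) / 2.0)
    area = rows.size                                 # number of foreground pixels
    # Perimeter: "1" pixels with at least one 4-connected "0" neighbour
    padded = np.pad(binary_img, 1)
    all_fg_neighbours = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                         padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = int(((binary_img == 1) & (all_fg_neighbours == 0)).sum())
    return np.array([aspect_ratio, *centroid, area, perimeter])
```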


In addition, the features normalization of section 3.3.4(C) is used to bring the features into the range [-1, 1] by applying Algorithm 3.8. The final step is to create the features base. Each feature vector is first given a label, so all the feature vectors that belong to the same class have the same label, which simplifies the classification process. All the feature vectors are then stored in a two-dimensional array and saved in features base2.

3.4.2 Classification Stage (module2)
SVM is used to make the decision by assigning each input handwritten text image to its writer. The one-vs-all approach separates each writer class from the other classes, and Algorithms 3.9 and 3.10 are used to train and test the classifier. The SVM classifies the whole handwritten text image into its writer: each handwritten sub-image is classified into its class, then a voting process over all the classes chooses the most frequent class. The voting process depends on a threshold in order to obtain the best identification accuracy: if the appearance percentage of the most frequent class is greater than the selected threshold (85%, obtained by the sub-word level identification approach), it is considered the class of the correct writer. The classification approach of the writer identification process is illustrated in Figure 3.31.

Figure 3.31: Classification approach of module2.
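A minimal sketch of this voting rule, assuming the sub-word classifier outputs one writer label per segmented sub-word. The 85% threshold is the value reported above; the function name is illustrative:

```python
import numpy as np

def identify_writer(sub_word_labels, threshold=0.85):
    """Majority vote over per-sub-word writer labels with an acceptance threshold."""
    labels, counts = np.unique(sub_word_labels, return_counts=True)
    share = counts.max() / counts.sum()       # appearance percentage of top class
    if share > threshold:
        return labels[counts.argmax()]        # accepted writer class
    return None                               # identification rejected

print(identify_writer([3, 3, 3, 3, 3, 3, 1]))  # 6/7 ~ 0.86 > 0.85 -> writer 3
```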


3.4.3 Post-processing Stage (module2)
The output of the identification process is the class label of the desired writer. Another part of the proposed writers' lexicon is created for saving the writers' information. Each class label of the input text is assigned to its writer, so that the system output displays the writer's name. A sample of the proposed writers' lexicon is presented in Table 3.3.

Table 3.3: Sample of the proposed writers' lexicon.
Class Label   Writer Name
[1]           Writer1
[2]           Writer2
...           ...
[n]           Writer(n)

3.5 Proposed Handwritten Database
An Arabic database for character, sub-word and text recognition and for writer identification has been built in this work. The first part of the database contains all the Arabic characters, written in different sizes and styles. Each character has 20 JPG images, and the overall database holds 560 images. Samples of the proposed handwritten character database are illustrated in Figure 3.32. The second part of the database has 1300 handwritten sub-word images written by several writers of different ages and educational backgrounds; some writers used a black pen while others used a blue pen.



Figure 3.32: Handwritten character examples from the proposed database.

The database can be used both for handwriting recognition systems and for security systems and applications such as writer identification and verification, whereas the other standard databases can be used for recognition only. Figure 3.33 shows examples from the proposed database.


Figure 3.33: Handwritten sub-word examples from the proposed database.


Furthermore, the proposed database contains similar and different texts written by the same writer, which allows the writer identification process to be evaluated in both text-dependent and text-independent modes, as shown in Figure 3.34.

Figure 3.34: Sample of Arabic text written by the same writer.

Finally, the last part of the database consists of handwritten text documents. The database has several Arabic handwritten text documents written by several writers; these documents are used to test the performance of the proposed work. An example of the Arabic handwritten text documents in the proposed system is illustrated in Figure 3.35. All these features are missing from the standard databases reviewed in section 2.5.2 of chapter two.


Figure 3.35: Handwritten text image example of the proposed database.


Chapter Four
Experiments and Results Discussion

4.1 Introduction
This chapter presents the results obtained from applying the proposed algorithms and the effect of each algorithm on the system. In the following sections, the test setup and the experimental results for the segmentation, preprocessing, multiscale features and classifier are discussed. The proposed system is implemented in Matlab 2015a and Visual Studio 2013. The experiments were performed on an Intel Core i5 machine with a 2.50 GHz processor, 6 GB RAM and a 64-bit operating system.

4.2 Evaluation of the AHTRS System (module1)
In order to evaluate the proposed system, a number of metrics are considered. The experimental results of the proposed system are given in detail in the next sections.

4.2.1 Arabic Handwritten Database
The most important material needed to evaluate any system is the database. Unlike the existing handwritten systems in the literature (section 1.3), the proposed system uses three different handwritten databases and more Arabic handwritten images. The first database is AHDB, used for the recognition purpose: 4095 images of Arabic handwritten numbers and the most common Arabic words of the AHDB database are used, of which a randomly selected 70% (2730 handwritten images) are used for training and 30% (1365 handwritten images) for testing. The second database is IESK-ArDB, also used for handwritten recognition; the training set uses 420 images and the testing set 180 images. The last database is the proposed dataset, used for both recognition and writer identification. The proposed system


used 1300 handwritten word images, with 60% for training and 40% for testing. Examples from the used handwritten databases are shown in Table 4.1.

Table 4.1: Arabic handwritten images from different databases.
(The first column of the printed table shows the handwritten images themselves.)
Text      Database
ألف       AHDB
ريال      AHDB
يمكن      AHDB
تغتنم     Proposed
الخلق     Proposed
بيلوجي    IESK-ArDB
موظف      IESK-ArDB


As seen in Table 4.1, each handwritten database has images of different colors and types, with different shapes and sizes. These differences make recognition a very difficult task, since the system must overcome them to achieve high accuracy. For testing the system, Arabic handwritten text documents with different types, sizes and colors are used. Examples of the Arabic handwritten document images used as input for the proposed system are illustrated in Figure 4.1.

Figure 4.1: Arabic handwritten text documents.


In addition, each Arabic text appears with several orientations and shapes across the handwritten images of the used databases; even a single writer shows some variation in his/her writing. As an example, the Arabic text "العام" takes several orientations and shapes in the handwritten text images of the AHDB database, as shown in Figure 4.2.

Figure 4.2: Arabic handwritten images of the text (‫)العام‬.

4.2.2 Handwritten Text Segmentation
The first process of the proposed system is to segment the handwritten text into sub-words, which are then saved for further processing in the next stages. The results of applying the proposed segmentation are shown in Figure 4.3. The proposed segmentation algorithm is applied to several images from the AHDB database and the proposed database; the segmentation rates obtained are presented in Table 4.2.


Figure 4.3: Image segmentation. (a) AHDB database, (b) Proposed database.


Table 4.2: Segmentation results.
Database   Correct Segm.   Under Segm.   Over Segm.   Misplaced Segm.
AHDB       89%             3%            6%           2%
Proposed   92%             4%            2%           2%

The correct segmentation rates for the AHDB and proposed databases are 89% and 92% respectively, obtained using Equation 2.30. Equation 2.31 was used to calculate the segmentation errors, which are 11% for the AHDB database and 8% for the proposed database; these errors are caused by variation in handwriting, such as unstable spaces between the sub-words and the semi-words. Different types of error occurred during the segmentation process, as shown in Figure 4.4 and Table 4.2. These errors are:
 Over segmentation: the number of segments is larger than the actual number.
 Under segmentation: the number of segments is less than the actual number.
 Misplaced segmentation: the number of segments is correct but the limits are wrong.

The output sub-images from the segmentation algorithm go through an evaluation to check whether they are segmented correctly. First, the mean and Standard Deviation (SD) are calculated for the handwritten text images of each Arabic text in the used databases, as shown in Figure 4.5.



Figure 4.4: Segmentation errors. (a) Over segmentation, (b) Under segmentation, (c) Misplaced segmentation.

In addition, the height, width and area are calculated as well. The same calculations are also performed for the output handwritten sub-images of the segmentation algorithm. After that, Equation 2.24 is applied to both sets of results to indicate whether each sub-image is segmented correctly, as sketched below.
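Since Equation 2.24 itself is defined in chapter two, the following Python sketch only illustrates the kind of test described here, under the assumption that a sub-image is accepted when its measurements lie within k standard deviations of the per-class statistics; k and the function name are illustrative:

```python
import numpy as np

def segment_looks_correct(seg_stats, class_mean, class_sd, k=2.0):
    """Accept a segmented sub-image when its measurements (e.g. height, width,
    area) fall within k standard deviations of the class statistics."""
    seg_stats = np.asarray(seg_stats, dtype=np.float64)
    return bool(np.all(np.abs(seg_stats - class_mean) <= k * class_sd))
```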


Figure 4.5: Mean and SD of the handwritten images for the same Arabic text.

4.2.3 Handwritten Text Image Preprocessing
In this stage, several processes are performed on the handwritten sub-images in order to make them ready for the next stages. These processes use the algorithms explained in chapter three (section 3.3); their evaluation is given in the following sections.

A. Image Thresholding
To evaluate the proposed image thresholding algorithm, several handwritten text images from various handwritten databases are used. The output of the thresholding algorithm is a binary image containing two types of pixels: black (0) for the background and white (1) for the foreground. Applying the proposed thresholding algorithm to the AHDB, IESK-ArDB and proposed databases produces output images that are clear, readable and noiseless, as shown in Figure 4.6.


Figure 4.6: Image thresholding by the proposed thresholding method. (a) Input images, (b) Output images.


Another evaluation of the image thresholding algorithm can be performed using the ground-truth images provided by a handwritten database such as IESK-ArDB. The IESK-ArDB ground-truth images were produced with Adobe Photoshop by painting black the pixels expected to be classified as background by the thresholding algorithm, and white those expected to be classified as foreground. More precisely, the pixels of the resulting binary image can be classified as correctly assigned to the foreground, correctly assigned to the background, incorrectly assigned to the foreground, and incorrectly assigned to the background. The Misclassification Error (ME) of chapter two (section 2.7) is used to quantify the disagreement between the binary image and the ground-truth image, as in Table 4.3. The optimal ME value is zero, so results close to zero indicate the best thresholding and vice versa. Table 4.3 shows the strength of the proposed thresholding algorithm through the small number of misclassified pixels between the binary images and the ground-truth images. A low number of misclassified pixels makes the feature extraction methods work better and also increases the recognition accuracy. Handwritten text images with few or no misclassified pixels (ME = 0) keep each handwritten text clear and unique and preserve the original shape of the handwriting, which lets the feature extraction methods extract features appropriately, without conflicts between the handwritten texts.
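ME is commonly defined as the fraction of pixels whose binary assignment disagrees with the ground truth; a short sketch under that assumption:

```python
import numpy as np

def misclassification_error(binary_img, ground_truth):
    """Fraction of pixels whose foreground/background assignment disagrees
    with the ground-truth image; 0 means a perfect match."""
    disagreements = np.count_nonzero(binary_img != ground_truth)
    return disagreements / binary_img.size
```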


Table 4.3: Evaluation of the proposed thresholding algorithm on the IESK-ArDB database.
(The printed table shows each binary image next to its ground-truth image.)
ME
0.002
0.031
0.02

On the other hand, more misclassified pixels reduce the recognition accuracy and make the feature extraction methods perform poorly. A high number of misclassified pixels forces the recognition system to use several feature extraction methods in order to obtain unique, conflict-free features for each handwritten text, which leads to more processing time and a very long feature vector.


B. Noise Removal
To assess the robustness of the system and the proposed noise removal method, noise models were added to the handwritten sub-images. These models generate sets of sub-images reflecting the different levels of degradation found in real-life scanned handwritten document images, which allows the performance of the proposed system to be studied carefully. Salt-and-pepper noise and Gaussian noise were added to the handwritten images. Table 4.4 shows the added noise and the system accuracy with this noise before applying the proposed noise removal algorithm; the recognition accuracy is computed using Equation 2.30 of chapter two (section 2.7).

Table 4.4: Recognition accuracy after adding noise.
(The printed table also shows the original and noisy images.)
Noise             PSNR    MSE     Accuracy
Gaussian          40.30   6.055   95.6%
Salt and Pepper   40.40   7.917


The PSNR and MSE results in Table 4.4 show the effect of adding noise to the handwritten images: a small PSNR and a large MSE indicate a noisy, poor-quality image. Although noise is added to the handwritten images, the proposed system still achieves good accuracy, as shown in Table 4.4. For the performance evaluation of the proposed noise removal algorithm, MSE and PSNR are computed on the images before and after applying the method. Furthermore, Table 4.5 shows the recognition accuracy of the proposed system after applying the noise removal algorithm to the noisy handwritten images; the recognition accuracy is again computed using Equation 2.30 of chapter two (section 2.7).
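Both measures follow their standard definitions; a short sketch, assuming 8-bit images (peak value 255):

```python
import numpy as np

def mse_psnr(before, after, peak=255.0):
    """MSE between two images and the corresponding PSNR in decibels."""
    mse = np.mean((before.astype(np.float64) - after.astype(np.float64)) ** 2)
    psnr = 10.0 * np.log10(peak ** 2 / mse) if mse > 0 else float('inf')
    return mse, psnr
```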

Table 4.5: Experimental results of applying the proposed noise removal algorithm.
(The printed table also shows the input and output images.)
PSNR      MSE      Accuracy
74.7065   0.0022   96.2%
76.3286   0.0025


The results in Table 4.5 show the benefit of the proposed noise removal algorithm. MSE is computed from the differences between the image before and after applying the noise removal method; the small MSE values in the table indicate good noise removal. The PSNR results are high, and a higher PSNR also means better noise removal. The system achieved a high recognition accuracy after applying the proposed noise removal algorithm.

C. Black Space Elimination
The proposed Black Space Elimination (BSE) algorithm has a strong impact on the recognition accuracy of the proposed system. Removing the black space around the handwritten text avoids redundant background pixels (represented by 0). This makes the feature extraction methods work well and keeps undesired pixels out of the extraction. Table 4.6 shows the accuracy of the recognition system before and after applying the proposed BSE algorithm (Algorithm 3.4).

Table 4.6: Experimental results of applying the BSE algorithm.
System                               Accuracy
AHTRS system without BSE algorithm   93%
AHTRS system + BSE algorithm         96.317%


As seen in Table 4.6, the recognition accuracy increases by 3.317% when the BSE algorithm is used. Removing the unwanted background pixels lets the feature extraction methods extract features directly from the shape of the handwritten text. This also makes feature extraction fast, because fewer image pixels are processed, and it avoids the uninformative pixels that lead to redundant features and reduced recognition accuracy.

D. Image Scaling
The proposed work uses various image sizes. Increasing the image size increases the recognition accuracy but slows down the recognition process, while reducing the image size lowers the accuracy and loses some image information but makes recognition faster. Several sizes were tested in order to find the best unified image size; the best accuracies were obtained with the 128x128, 128x64 and 64x128 image sizes for the recognition process in module1. The experimental results with different image sizes are illustrated in Table 4.7.

Table 4.7: Experimental results with various image sizes.
Image Size   Accuracy
32 x 32      94%
64 x 64      94.8%
64 x 128     95.22%
128 x 64     95%
128 x 128    96.317%


4.2.4 Features Extraction
Various feature extraction methods are used in the proposed system, each with strengths in different aspects. All the methods are applied to the images to obtain the best features that represent the desired class (handwritten sub-word) and distinguish it from the other classes in module1. The methods extract different numbers of features: a large number of features slows the recognition and identification process but increases accuracy, while a small number speeds it up but reduces accuracy. For the MHOG1 features (section 3.2.1 D), more than one edge detection filter was tried in order to get the best accuracy. The HOG descriptor has its own default edge detection filter, reviewed in chapter two (section 2.4.4 B (I)); in addition, the Sobel, Canny and Roberts filters and the proposed filters (Equations 3.1 and 3.2) were tested. The best recognition accuracy was obtained with the proposed filters. Table 4.8 compares the different edge detection filters and the proposed one on the AHTRS system.

Table 4.8: Comparison of results for different edge detection filters.
Edge Detection Filter   Accuracy
HOG filter              89.2%
Sobel                   89%
Canny                   87%
Roberts                 90.1%
Proposed                92.70%


In addition, several numbers of bins were tested to find the optimal one, and signed versus unsigned orientation (theta) was also tested for better performance. The cell and block numbers and sizes were chosen after hundreds of tests on the used handwritten databases. The results of testing all the mentioned values are shown in Table 4.9.

Table 4.9: Comparison of results for different MHOG1 values.
Cells   Blocks   Bins   Accuracy
6x6     2x2      8      91%
6x6     2x2      9      92.70%
6x6     2x2      10     90.88%
6x6     3x3      8      89.317%
6x6     3x3      9      90.9%
6x6     3x3      10     89%
6x6     4x4      8      88.4%
6x6     4x4      9      89.5%
6x6     4x4      10     89%
5x5     2x2      8      84%
5x5     2x2      9      86.8%
5x5     2x2      10     84%
5x5     3x3      8      83%
5x5     3x3      9      84%
5x5     3x3      10     80%
5x5     4x4      8      80%
5x5     4x4      9      81.2%
5x5     4x4      10     80%


The proposed MHOG1 features use overlapped blocks, as explained in chapter three (Figure 3.18), to compute the histograms. Overlapping increases the recognition accuracy compared with non-overlapped blocks, because each histogram then depends on the previous and next ones, which gives very useful details about the appearance of the handwritten text. The effect of the overlapping in MHOG1 is illustrated in Table 4.10.

Table 4.10: Experimental results for the overlapping approach.
Approach                Accuracy
Non-overlapped blocks   88.5%
Overlapped blocks       92.70%

For the DCT features (Algorithm 3.6), different approaches can be used to extract the features: the DCT coefficients can be computed directly from the handwritten sub-image as one block, or the sub-image can first be divided into blocks and the DCT coefficients computed for each block. Four approaches were attempted in the proposed work: computing the DCT coefficients from the sub-word directly, and dividing the sub-image into 4x4, 6x6 and 8x8 blocks. The experimental results of these approaches are shown in Table 4.11.


Table 4.11: Experimental results for different dividing approaches.
Blocks       Accuracy
1 block      67.92%
4x4 blocks   60%
6x6 blocks   61.22%
8x8 blocks   64.7%

Applying the DCT on the image directly gives the best recognition accuracy, after testing 500 Arabic handwritten sub-images. The output of applying the DCT is an array of coefficients of the same size as the handwritten sub-image, so the final step is choosing the appropriate coefficients as features. Two ordering techniques were used to select these coefficients: taking the coefficients sequentially, or taking them in zig-zag order. The results of both ordering techniques are shown in Table 4.12.

Table 4.12: Ordering techniques for selecting coefficients.
Ordering Technique   Accuracy
Sequential           66.7%
Zig-zag              67.92%

In the proposed system the DCT features are extracted both with the common DCT algorithm and with the FCT algorithm. The experiments show that extracting the features using the FCT algorithm is faster than using the DCT algorithm, with the same accuracy. Table 4.13 shows the feature extraction times of the DCT and FCT algorithms.


Table 4.13: Feature extraction times of the DCT and FCT methods.
Method   Extraction Time (seconds)
DCT      1.6
FCT      0.8

The results for the individual extracted features of the proposed system, and for their combination, are displayed in Table 4.14.

Table 4.14: Comparison of results for different feature extraction methods.
Features                   Accuracy
DCT                        67.92%
MHOG1                      92.70%
Statistical + Structural   70.88%
All features               96.317%

Each of these features outperforms the others on some set of handwritten classes. By combining them, the weakness of each single feature type is compensated by the others, and the whole proposed system is improved. The results verify that an image can be modeled by its unique features, taking advantage of each corresponding sort of feature to represent it; the recognition and identification effectiveness based on combined features is higher than that of the individual features.


4.2.5 Features Normalization (FN)
In the proposed work, the features are normalized into the range [-1, 1] using the proposed features normalization algorithm (Algorithm 3.8). The algorithm reduces the training and testing time of the system by simplifying the computations. The impact of the features normalization algorithm on the system is illustrated in Table 4.15.

Table 4.15: Comparison of results when applying the FN algorithm.
Features     Classification Time (seconds)
Without FN   4.5
With FN      0.9

4.2.6 Classification
Different SVM kernels are used in the proposed system; SVM is commonly used with the linear, polynomial and RBF kernels. A multiclass SVM has been used in the proposed system, and it achieved very high recognition accuracy using the polynomial kernel in both modules. The recognition accuracies achieved in module1 with the linear, polynomial and RBF kernels are shown in Figure 4.7.

Figure 4.7: The recognition accuracy of different SVM kernels.


The proposed system uses three Arabic handwritten databases for evaluation, each with a different number and type of Arabic handwritten images. Due to the writing styles and the number of images in each database, different recognition accuracies are obtained. Table 4.16 compares the results for the different databases and SVM kernels.

Table 4.16: The recognition accuracy for different Arabic databases and SVM kernels.
Database    Linear   Polynomial   RBF
AHDB        92%      96.317%      93.1%
IESK-ArDB   76%      82%          78.66%
Proposed    96.2%    98%          97%

4.2.7 Number of Images in the Sets
The accuracy of any pattern recognition system is directly affected by the numbers of images used for training and testing. When the machine is trained on more data samples, it can predict the result more accurately: increasing the training images increases both the accuracy and the training time, while decreasing them decreases the training time. The experimental results on the AHDB database are shown in Table 4.17.


Table 4.17: Experimental results for different numbers of training and testing images.
Number of Training   Number of Testing   Accuracy   Classification Time (seconds)
2137                 2137                94%        1.3
2565                 1710                95.82%     0.99
2993                 1282                96.31%     0.9
3420                 855                 98.5%      0.82
3848                 427                 99.4%      0.78

4.3 Evaluation of the AHTRS System (module2)
To evaluate module2, the focus is on the feature extraction and classification stages. This section discusses the results of applying the proposed feature extraction algorithms for writer identification, and shows the effects of applying different classification kernels and approaches.

4.3.1 Features Extraction
As mentioned in chapter three (section 3.4.1), two kinds of features are extracted from the handwritten sub-images. MHOG2 is the first, giving unique orientation details of the handwritten text for each writer; the shape features are the second, giving a distinctive shape representation of the same handwritten text for different writers.


Table 4.18: Experimental results for different MHOG2 values.
Scaling Size   Divided Blocks   Bins   Accuracy
32x32          2x2              8      55%
32x32          2x2              9      56.5%
32x32          2x2              10     60.8%
32x32          3x3              8      56.2%
32x32          3x3              9      58.9%
32x32          3x3              10     62%
32x32          4x4              8      60.3%
32x32          4x4              9      62.5%
32x32          4x4              10     65%
64x64          2x2              8      64%
64x64          2x2              9      68.6%
64x64          2x2              10     70%
64x64          3x3              8      72%
64x64          3x3              9      73.3%
64x64          3x3              10     75%
64x64          4x4              8      73%
64x64          4x4              9      74%
64x64          4x4              10     76%
128x128        2x2              8      70.7%
128x128        2x2              9      72%
128x128        2x2              10     75%
128x128        3x3              8      78.5%
128x128        3x3              9      79.1%
128x128        3x3              10     80%
128x128        4x4              8      80%
128x128        4x4              9      81.2%
128x128        4x4              10     82.9%


In MHOG2, the handwritten sub-images are normalized to several sizes in order to choose the best one; the sub-images are also divided into different numbers of blocks, and several numbers of bins are tested to find the optimal values. Signed versus unsigned orientation (theta) is tested as well. Table 4.18 shows the results of testing all the mentioned values. The best MHOG2 accuracy is obtained with images normalized to 128x128, divided into 4x4 blocks, with 10 bins, as shown in Table 4.18. The results of applying MHOG2 (Algorithm 3.11) and the shape feature extraction (Algorithm 3.12) are illustrated in Table 4.19.

Table 4.19: The identification accuracy of the extracted features.
Features        Accuracy
MHOG2           95.9%
Shape           93%
MHOG2 + Shape   100%

4.3.2 Classification
In module2, two approaches are used to recognize the writer of the handwritten text: the first operates at the handwritten sub-word level and the second at the handwritten text level. The writer recognition results for both approaches with various SVM kernels are illustrated in Table 4.20.


Table 4.20: The identification accuracy of different SVM kernels.
Approach         Kernel       Accuracy
Sub-word level   linear       80%
Sub-word level   polynomial   85%
Sub-word level   RBF          81.9%
Text level       linear       98%
Text level       polynomial   100%
Text level       RBF          98.6%

4.4 Discussion
The evaluation of the proposed system was carried out to check the effectiveness of the proposed algorithms, which cover image preprocessing, text segmentation, feature extraction, classification and post-processing. The proposed system has two modules: the first is tested on three Arabic handwritten databases, and the second on the proposed Arabic handwritten database. The system used 4095 handwritten images from the AHDB database, 600 from the IESK-ArDB database and 1300 from the proposed database. The proposed segmentation method segments the handwritten text images into a suitable form that lets the system work properly. The proposed preprocessing methods efficiently convert the handwritten images into binary and normalize them to the most appropriate size, 128x128. The feature extraction methods extract the best features representing each Arabic text according to its characteristics. Moreover, SVM, a classifier well suited to large data samples, gives the best accuracy results. Finally, the overall system processing time was 6.2 seconds.


The proposed Arabic handwritten database contains character, word and text images. The proposed system achieved 99.8% for handwritten character recognition, 99.7% for handwritten word recognition and 98% for handwritten text recognition. Additionally, the system was evaluated with three classifiers, SVM, KNN and ANN, obtaining recognition accuracies of 98%, 93% and 94% respectively for handwritten text recognition. For writer identification, the accuracy was 85% with SVM, 81% with KNN and 82.3% with ANN for the sub-word level approach, and 100% with SVM, 95% with KNN and 98% with ANN for the text level approach.


Chapter Five
Conclusions and Suggestions for Future Work

5.1 Conclusions
This chapter summarizes the evaluation of the thesis results and shows the main contributions of the proposed work. Based on the implementation of the proposed work, the following conclusions are drawn:
1. The proposed system depends on a handwritten sub-word segmentation approach, a simple, practical and efficient segmentation algorithm that achieves a high segmentation rate, as shown in Table 4.2, and thus accurate recognition.
2. Several methods and algorithms in the proposed preprocessing stage have an ideal effect on the proposed system:
 The proposed thresholding algorithm, based on the intensity values of the gray-scale image and Fuzzy C-Means (FCM) clustering, selects essential threshold points that assign the background and foreground pixels correctly with few misclassified pixels, as illustrated in Table 4.3.
 Unwanted pixels are removed by a proposed algorithm that depends on two thresholds derived from the characteristics of the Arabic language, without removing any important pixels from the binary image. The algorithm maintains good MSE and PSNR results after removing the unwanted pixels, as shown in Table 4.5.
 One of the important preprocessing algorithms that increased the recognition accuracy is the Black Space Elimination (BSE) algorithm, shown in Table 4.6. It increases the system accuracy by removing the unwanted pixels of the image background, which makes the feature extraction methods work quickly and well by extracting features only from the important part of the image.


 The image normalization method brings all the handwritten images to the same size, 128x128, in order to create a similarity in the shape size of the text across different images. The appropriate choice of image size in the proposed system is shown in Table 4.7.
3. The main key to the success of the proposed system is the feature extraction stage. The proposed feature extraction methods obtain the most useful features representing the handwritten text in the image, which makes the recognition and identification results efficient. Table 4.14 shows the results obtained with these methods and the accuracy of each. The employment of MHOG1 in the proposed system is the main successful part of this thesis, and the results in Table 4.8 show the strength of the proposed edge detection filter for MHOG1 over the other filters.
4. The results in Table 4.19 show the strength of the proposed MHOG2 and shape feature extraction algorithms, which give unique features for each text's writer.
5. The training and classification times are reduced by the feature scaling (FS) algorithm, subsequently reducing the overall system processing time.
6. The use of the one-vs-all approach with the polynomial kernel of the Support Vector Machine (SVM) classifier, shown in Figure 4.7, yields more robust recognition and identification performance than other approaches, kernels and classifiers.
7. The proposed system achieved better accuracy on three different handwritten databases, presented in Table 4.16, than all the previous works discussed in section 1.3 of chapter one.
8. The proposed handwritten text database gives better accuracy results than the other handwritten databases in Table 4.16; it also supports the identification process and gives ideal accuracy, as shown in Table 4.20, for both the sub-word level and text level identification approaches. Besides, the database can be used for character and word recognition, as discussed in section 3.5 of chapter three.


9. The training and testing set sizes in Table 4.17 show that the accuracy and classification time of the proposed system are not affected much by increasing the testing set and decreasing the training set; on the other hand, increasing the training set leads to optimal results.
10. All the proposed system stages execute in only 6.2 seconds.

5.2 Suggestions for Future Work
The proposed system can be improved in various directions:
1. Improving the proposed system to work in real time by applying it to online recognition and identification on mobile and tablet devices.
2. Employing the proposed system as a retrieval system that, based on a user keyword query, returns the required handwritten documents.
3. Modifying the proposed system to work with printed Arabic document recognition.


References

[1] Jorge S., Nicolás P., and Pedro P., "Pattern Recognition and Image Analysis (Part II)", Springer Berlin Heidelberg, 2005.
[2] Zhi-Qiang L., Jinhai C., and Richard B., "Handwriting Recognition", Berlin, Germany: Springer-Verlag, 2003.
[3] Rodrigo L., Carlos A., Othman A., and Juan R., "Biometric Identification Systems", Signal Processing, Elsevier, vol. 83, no. 12, pp. 2539-2557, 2003.
[4] Madhuri R., and Shubhangi M., "A Survey on Offline Signature Recognition and Verification Schemes", Industrial Instrumentation and Control Conference, IEEE, pp. 165-169, 2015.
[5] Niels R., Vuurpijl L., and Schomaker L., "Automatic Allograph Matching in Forensic Writer Identification", International Journal of Pattern Recognition and Artificial Intelligence (IJPRAI), vol. 21, no. 1, pp. 61-81, 2007.
[6] Leedham G., and Chachra S., "Writer Identification Using Innovative Binarised Features of Handwritten Numerals", Seventh International Conference on Document Analysis and Recognition, pp. 413-416, 2003.
[7] Fornes A., Llados J., Sanchez G., and Bunke H., "Writer Identification in Old Handwritten Music Scores", Eighth IAPR International Workshop on Document Analysis Systems, pp. 347-353, 2008.
[8] Srihari S., Cha S., Arora H., and Lee S., "Individuality of Handwriting", Journal of Forensic Sciences (JOFS), vol. 47, no. 4, pp. 1-17, 2002.
[9] Khorsheed M., "Off-Line Arabic Character Recognition - A Review", Pattern Analysis & Applications, Springer, vol. 5, no. 1, pp. 31-45, 2002.
[10] Saeeda N., Saad B., Riaz A., and Muhammad I., "Arabic Script Based Digit Recognition Systems", International Conference on Recent Advances in Computer Systems, pp. 67-73, 2016.


[11] Mohd A., Mohammad N., Khairuddin O., Che A., and Khadijah G., "Exploiting Features From Triangle Geometry For Digit Recognition", Control, Decision and Information Technologies (CoDIT) International Conference, pp. 876-880, 2012.
[12] Gita S., and Jitendra K., "Arabic Numeral Recognition Using SVM Classifier", International Journal of Emerging Research in Management and Technology (IJERMT), vol. 9359, no. 5, pp. 62-67, 2013.
[13] Mohamed H., Loay E., and Faisel G., "Printed and Handwritten Hindi/Arabic Numeral Recognition Using Centralized Moments", International Journal of Scientific and Engineering Research, vol. 5, no. 3, pp. 140-144, 2014.
[14] Mohsen B., Faezeh M., and Jalil G., "Persian/Arabic Handwritten Digit Recognition Using Local Binary Pattern", International Journal of Digital Information and Wireless Communications (IJDIWC), vol. 4, no. 4, pp. 486-492, 2014.
[15] Pawan K., Ram S., and Mita N., "A Study of Moment Based Features on Handwritten Digit Recognition", Applied Computational Intelligence and Soft Computing, vol. 16, pp. 1-17, 2016.
[16] Omar B., and Adnan S., "Isolated Arabic Handwritten Character Recognition: A Survey", International Journal of Advanced Research in Computer Science and Software Engineering, vol. 4, no. 10, pp. 175-185, 2014.
[17] Lawgali A., Bouridane A., Angelova M., and Ghassemlooy Z., "Handwritten Arabic Character Recognition: Which Feature Extraction Method?", International Journal of Advanced Science and Technology, vol. 34, no. 9, pp. 1-8, 2011.
[18] Manal A., Lulwah M., and Hanadi H., "Off-Line Arabic Handwriting Character Recognition Using Word Segmentation", Journal of Computing, vol. 4, no. 3, pp. 40-44, 2012.
[19] Lawgali A., Angelova M., and Bouridane A., "A Framework for Arabic Handwritten Recognition Based on Segmentation", International Journal of Hybrid Information Technology, vol. 7, no. 5, pp. 413-428, 2014.


[20] Farah M., "A Haar Wavelet-Based Zoning for Offline Arabic Handwritten Character Recognition", Journal of Babylon University/Pure and Applied Sciences, vol. 23, no. 2, pp. 575-585, 2015.
[21] Mohamed E., Rania M., and Monji K., "A New Design Based-SVM of the CNN Classifier Architecture with Dropout for Offline Arabic Handwritten Recognition", International Conference on Computational Science, Elsevier, pp. 1712-1723, 2016.
[22] Mohammad T., and Sabri A., "Offline Arabic Handwritten Text Recognition: A Survey", ACM Computing Surveys (CSUR), vol. 45, no. 23, pp. 1-35, 2013.
[23] Moftah E., Ayoub A., Zaher A., Laslo D., Sherif E., and Anwar S., "Gabor Wavelet Recognition Approach for Off-Line Handwritten Arabic Using Explicit Segmentation", Image Processing and Communications Challenges 5, Springer International Publishing Switzerland, vol. 233, no. 1, pp. 245-254, 2014.
[24] Moftah E., Ayoub A., Laslo D., and Sherif E., "CRFs and HCRFs Based Recognition for Off-Line Arabic Handwriting", Springer International Publishing Switzerland, vol. 9475, no. 2, pp. 337-346, 2015.
[25] Khaoula J., Mohamed M., and Najoua B., "Arabic Handwritten Word Recognition Based on Dynamic Bayesian Network", International Arab Journal of Information Technology (IAJIT), vol. 13, no. 3, pp. 276-283, 2016.
[26] Hicham E., Khalid S., and Akram H., "Recognition of Off-line Arabic Handwriting Words Using HMM Toolkit (HTK)", 13th International Conference on Computer Graphics, Imaging and Visualization, IEEE, pp. 167-171, 2016.
[27] Gernot A., "Markov Models for Pattern Recognition (Part III)", Springer, 2014.
[28] Pervez A., and Yousef A., "Arabic Character Recognition: Progress and Challenges", Journal of King Saud University, Elsevier, vol. 12, no. 4, pp. 85-116, 2000.
[29] Plamondon R., and Srihari S., "On-Line and Off-Line Handwriting Recognition: A Comprehensive Survey", Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 63-84, 2000.


[30] Veena B., and Sinha R., "Integrating Knowledge Sources in Devanagari Text Recognition System", IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, vol. 30, no. 4, pp. 500-505, 2000.
[31] Poovizhi P., "A Study on Preprocessing Techniques for the Character Recognition", International Journal of Open Information Technologies, vol. 2, no. 12, pp. 21-24, 2014.
[32] Kesidis A., Galiotou E., Gatos B., and Pratikakis I., "A Word Spotting Framework for Historical Machine-Printed Documents", International Journal on Document Analysis and Recognition (IJDAR), Springer, vol. 14, no. 2, pp. 131-144, 2011.
[33] Suganya R., and Shanthi R., "Fuzzy C-Means Algorithm - A Review", International Journal of Scientific and Research Publications, vol. 2, no. 11, pp. 1-3, 2012.
[34] James C., "Pattern Recognition with Fuzzy Objective Function Algorithms", Advanced Applications in Pattern Recognition, US: Springer-Verlag, 1981.
[35] Lorigo M., and Venu G., "Off-line Arabic Handwriting Recognition: A Survey", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 5, pp. 712-724, 2006.
[36] Atena F., Abdolhossein S., and Jamshid S., "Document Image Noises and Removal Methods", International MultiConference of Engineers and Computer Scientists, pp. 1-5, 2013.
[37] Canny J., "A Computational Approach to Edge Detection", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, no. 1, pp. 679-698, 1986.
[38] Rafael C., and Richard E., "Digital Image Processing (3rd Edition)", Prentice Hall, 2008.
[39] Alberto M., and Sabri T., "Image Processing Techniques for Machine Vision", Florida International University, pp. 1-9, 2000.
[40] Davies E., "Machine Vision: Theory, Algorithms and Practicalities (4th Edition)", Academic Press, 2012.


[41] Liu C., Nakashima K., Sako H., and Fujisawa H., "Handwritten Digit Recognition: Investigation of Normalization and Feature Extraction Techniques", Pattern Recognition, vol. 37, no. 2, pp. 265-279, 2004.
[42] Lorigo L., and Govindaraju V., "Segmentation and Pre-Recognition of Arabic Handwriting", Eighth International Conference on Document Analysis and Recognition, pp. 605-609, 2005.
[43] Haikal E., and Volker M., "Arabic Text Recognition Systems - State of the Art and Future Trends", International Conference on Innovations in Information Technology (IIT), pp. 692-696, 2008.
[44] Isabelle G., Masoud N., Steve G., and Lotfi A., "Feature Extraction: Foundations and Applications", Berlin Heidelberg New York: Springer-Verlag, 2006.
[45] Faye I., Samir B., and Eltoukhy M., "Digital Mammograms Classification Using a Wavelet Based Feature Extraction Method", Second International Conference on Computer and Electrical Engineering, pp. 318-322, 2009.
[46] Naviz A., and Fatos T., "An Overview of Character Recognition Focused on Off-Line Handwriting", IEEE Transactions on Systems, Man and Cybernetics - Part C (Applications and Reviews), vol. 31, no. 2, pp. 216-233, 2001.
[47] Naviz A., and Fatos T., "Optical Character Recognition for Cursive Handwriting", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 6, pp. 801-813, 2002.
[48] Saunders C., Grobelnik M., Gunn S., and Shawe J., "Subspace, Latent Structure and Feature Selection", Springer-Verlag Berlin Heidelberg, 2006.
[49] Navneet D., and Bill T., "Histograms of Oriented Gradients for Human Detection", IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1-8, 2005.
[50] Mark N., and Alberto S., "Feature Extraction and Image Processing for Computer Vision (3rd Edition)", Academic Press: Elsevier, 2012.


[51] Nasir A., Natarajan T., and Kamisetty R., "Discrete Cosine Transform", IEEE Transactions on Computers, vol. C-23, no. 1, pp. 90-93, 1974.
[52] Jiang J., Weng Y., and Li P., "Dominant Colour Extraction in DCT Domain", Image and Vision Computing, Elsevier, vol. 24, no. 12, pp. 1269-1277, 2006.
[53] Sarhan A., "Iris Recognition Using Discrete Cosine Transform and Artificial Neural Networks", Journal of Computer Science, vol. 5, no. 5, pp. 369-373, 2009.
[54] John M., "A Fast Cosine Transform in One and Two Dimensions", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 28, no. 1, pp. 27-34, 1980.
[55] Abdul R., Christian V., and Marzuki K., "Lexicon-Based Word Recognition Using Support Vector Machine and Hidden Markov Model", 10th International Conference on Document Analysis and Recognition (ICDAR), pp. 161-165, 2009.
[56] Padraig C., and Sarah J., "K-Nearest Neighbor Classifiers", Multiple Classifier Systems, UCD School of Computer Science & Informatics, University College Dublin, Belfield, Ireland, pp. 1-17, 2007.
[57] Cortes C., and Vapnik V., "Support-Vector Networks", Machine Learning, Springer, vol. 20, no. 3, pp. 273-297, 1995.
[58] Vapnik V., "The Nature of Statistical Learning Theory", Springer-Verlag New York, 2000.
[59] Sammut C., and Webb G., "Encyclopedia of Machine Learning", US: Springer, 2010.
[60] Faouzi B., Rachid H., and Mouldi B., "Handwritten Arabic Character Recognition Based on SVM Classifier", 3rd International Conference on Information and Communication Technologies: From Theory to Applications (ICTTA), pp. 1-4, 2008.
[61] Amin A., "Recognition of Printed Arabic Text Based on Global Features and Decision Tree Learning Techniques", Pattern Recognition, Elsevier, vol. 33, no. 8, pp. 1309-1323, 2000.


[62] Kharma N., Ahmed M., and Ward R., "A New Comprehensive Database of Handwritten Arabic Words, Numbers, and Signatures Used for OCR Testing", Canadian Conference on Electrical and Computer Engineering, IEEE, pp. 766-768, 1999.
[63] Al-Ohali Y., and Cheriet M., "Databases for Recognition of Handwritten Arabic Cheques", 7th International Workshop on Frontiers in Handwriting Recognition (IWFHR), pp. 601-606, 2000.
[64] Al-Ma'adeed S., Elliman D., and Higgins C., "A Data Base for Arabic Handwritten Text Recognition Research", 8th International Workshop on Frontiers in Handwriting Recognition (IWFHR), pp. 485-489, 2002.
[65] Moftah E., Ayoub A., Zaher A., and Laslo D., "IESK-ArDB: A Database for Handwritten Arabic and an Optimized Topological Segmentation Approach", International Journal on Document Analysis and Recognition (IJDAR), vol. 16, no. 3, pp. 295-308, 2013.
[66] Pechwitz M., Maddouri S., Margner V., Ellouze N., and Amiri H., "IFN/ENIT - Database of Handwritten Arabic Words", International Francophone Symposium on Writing and Document (CIFED), pp. 127-136, 2002.
[67] Melhi M., "Off-Line Arabic Cursive Handwriting Recognition Using Artificial Neural Networks", PhD dissertation, Department of Cybernetics, University of Bradford, 2001.
[68] Claus V., "Biometric User Authentication for IT Security: From Fundamentals to Handwriting", Springer-Verlag US, 2006.
[69] Dhaval S., Jun Z., Jarrell W., and Song W., "Handwritten Text Segmentation Using Average Longest Path Algorithm", IEEE Workshop on Applications of Computer Vision (WACV), pp. 15-17, 2013.
[70] Jouni K., Chris S., and Juni P., "Encyclopedia of Biostatistics", John Wiley and Sons, 2005.


[71] Padmaja V., Giri P., and Chandrasekhar B., "Image Compression Effects on Face Recognition for Images with Reduction in Size", International Journal of Computer Applications, vol. 61, no. 22, pp. 38-42, 2013.
[72] Sebastiano I., "Fundamentals in Handwriting Recognition", Springer Berlin Heidelberg, 2012.


المستخلص

معظم الحكومات والمنظمات لديها عدد كبير من الوثائق المكتوبة بخط اليد الناتجة عن عملياتها اليومية. لا بد من استخدام أجهزة الكمبيوتر لقراءة النصوص المكتوبة بخط اليد، وجعلها قابلة للتعديل والبحث. لذلك أصبح التعرف على الكتابة اليدوية في الآونة الأخيرة موضوع بحث شائعاً جداً وعدد تطبيقاته المحتملة كبير جداً، حيث لديه القدرة على حل المشاكل المعقدة وتبسيط الأنشطة البشرية من خلال تحويل الوثائق المكتوبة بخط اليد إلى شكل رقمي. ومع ذلك، فإن التعرف على النص العربي المكتوب بخط اليد عملية معقدة مقارنة بأنظمة الكتابة اليدوية للغات الأخرى بسبب طبيعة المزج لكتابة اليد في اللغة العربية.

لهذه الأسباب، تم اقتراح نظام للتعرف على النص العربي المكتوب بخط اليد وتحديد هوية كاتب النص بالاعتماد على تجزئة نصوص الوثائق المكتوبة بخط اليد المدخلة إلى كلمات فرعية مكتوبة بخط اليد. يحوي النظام جزأين (modules) أساسيين يستخدمان للتعرف على النص المكتوب بخط اليد وتحديد كاتب النص. الجزء الأول (module1) له ست مراحل تعمل معاً للتعرف على النص العربي المكتوب بخط اليد وتحويله إلى نص قابل للتعديل، وهذه المراحل هي: اكتساب الصور، التجزئة، المعالجة الأولية، بناء قاعدة الميزات، التصنيف، ومرحلة ما بعد المعالجة. في حين يقوم الجزء الثاني (module2) بتحديد الكاتب المطلوب للنص من خلال عدة مراحل مشابهة لمراحل الجزء الأول. اقترح النظام خوارزمية تجزئة فعالة ودقيقة تجزئ النص المكتوب بخط اليد المدخل إلى عدد من الصور الفرعية المكتوبة بخط اليد، وكل صورة فرعية تحوي كلمة فرعية من اللغة العربية. بالإضافة إلى ذلك، تم اقتراح خوارزمية عتبة الصورة لتحويل الصور الفرعية إلى صور ثنائية باستخدام طريقة (Fuzzy C-Means Clustering). كذلك تمر الصور الفرعية الثنائية عبر خوارزمية مقترحة لإزالة الضوضاء من أجل إزالة المعلومات غير المرغوب فيها. بعد ذلك، يتم استخراج مجموعتين من الميزات من الصور الفرعية المكتوبة بخط اليد؛ المجموعة الأولى من الميزات التي تستخدم للجزء الأول (module1) تضم الميزات الهيكلية والإحصائية وميزات Discrete Cosine Transform (DCT) وميزات Modified Histogram of Oriented Gradient (MHOG1) المقترحة. من جهة أخرى، فإن مجموعة الميزات الثانية التي تستخدم للجزء الثاني (module2) تشمل ميزات Modified Histogram of Oriented Gradient (MHOG2) المقترحة وميزات الشكل. وبالإضافة إلى ذلك، تم الحصول على أفضل نتائج التصنيف باستخدام المصنف Support Vector Machine (SVM). وتم اقتراح معجم عربي للجزء الأول (module1) لتحويل المسميات المصنفة إلى نص عربي قابل للتعديل، كما اقترح معجم للكتّاب لغرض تعيين المسمى المصنف إلى الكاتب المنشود.

من أجل اختبار أداء النظام، تم استخدام ثلاث قواعد بيانات للغة العربية المكتوبة بخط اليد وهي: قاعدة بيانات AHDB، وقاعدة بيانات IESK-arDB، وقاعدة بيانات مقترحة للغة العربية المكتوبة بخط اليد. وكانت النتائج التي تم الحصول عليها من الجزء الأول (module1) هي 96.317% لقاعدة بيانات AHDB، و82% لقاعدة بيانات IESK-arDB، و98% لقاعدة البيانات المقترحة باستخدام المصنف SVM بنواة متعددة الحدود. من جهة أخرى، كانت نتائج الجزء الثاني (module2) باستخدام قاعدة البيانات المقترحة 85% لطريقة مستوى الكلمات الفرعية المكتوبة بخط اليد و100% لطريقة مستوى النص المكتوب بخط اليد.

‫وزارة التعليم العالي و البحث العلمي‬ ‫الجامعة التكنولوجية‬ ‫قسم علوم الحاسوب‬

‫التعرف على النص العربي المكتوب بخط اليد‬ ‫وتحديد هوية الكاتب‬

أطروحة مقدمة إلى قسم علوم الحاسوب
في الجامعة التكنولوجية كجزء من متطلبات نيل درجة
دكتوراه الفلسفة في علوم الحاسوب

أعدت من قبل

‫مصطفى سالم كاظم الشمري‬

بإشراف

‫أ‪.‬م‪.‬د‪ .‬علياء كريم عبد الحسن‬

‫‪ ١٤٣٨‬هـ‬

‫‪ ٢٠١٦‬م‬