Symbol Recognition Combining Vectorial and Statistical Features

Hervé Locteau, Sébastien Adam, Éric Trupin, Jacques Labiche and Pierre Héroux
Laboratoire PSI, Université de Rouen, Saint Étienne de Rouvray, France
E-mail: [email protected]

Abstract

In this paper, we investigate symbol representation and introduce a new hybrid approach. Using a combination of statistical and structural descriptors, we overcome the deficiencies of each method taken alone. A Region Adjacency Graph of loops is associated with a graph of vectorial primitives, so that a loop is represented both in terms of its boundaries and of its content. Preliminary results are reported using the evaluation protocol established for the GREC 2003 workshop. Experiments have shown that the existing system makes few errors but needs to be more tolerant.

Keywords: Graph Matching, Symbol Representation, Symbol Recognition, Vectorisation, Moment Invariants.

1 Introduction

Managing huge amounts of digital document images requires an effective knowledge extraction process and a suitable representation. Some systems have been designed to take advantage of well-defined rules of the domains involved, but this knowledge (parameters, scenarios) may end up scattered in the code. As a result, a new case study may require developing a new system from scratch. That is why new systems have to be designed so that the knowledge relating to image processing and to the application domain is kept as external to the code as possible. As each domain introduces its own graphic notation, which in some cases is not really standardized (e.g. architecture), the automatic interpretation of such documents with a generic approach relies on the ability to discover and learn the corresponding notation.

Designing an automatic interpretation system is ambitious, and research has mainly been devoted to the automatic conversion of graphic documents to a format that CAD systems can understand, that is to say, a set of vectorial primitives. However, although raster-to-vector conversion is a generic step, systems have to provide information closer to the domain: they have to enable the interpretation of the drawing and give it meaning at a semantic level. Accordingly, the very first studies aiming at a high-level representation of a drawing were based on structural models built from vectorial primitives. As stated in [8], typical applications in the field of graphics recognition, for example backward conversion from raster images to CAD, hand-drawn user interfaces for design systems, or retrieval by content in graphical document databases, involve symbol recognition processes. In this study we investigate the symbol recognition step. Symbol recognition consists in locating and identifying the symbols in a graphical document.
Many types of documents contain symbols that appear connected to other graphic components, making symbol extraction a difficult task [4]. Among the application domains we focus on, and their specificities for symbol recognition, we can distinguish three main classes of documents: technical drawings; maps; musical scores and mathematical formulas. In technical drawings, symbols are mainly embedded in a net of lines, without any predefined orientation or scale. A large amount of information can be drawn in a restricted region of the document, so the difficulty of symbol recognition depends on the crowded context in which the symbols are located. Moreover, images can be degraded because the documents themselves are degraded or because the acquisition process is unreliable. From this point of view, document image analysis is accomplished by building a hierarchical perception of the document, from raster to high-level objects belonging to the domain. Symbol recognition must be designed with regard to obvious constraints, among which are invariance to affine transforms, segmentation, degradation and scalability. The GREC 2003 symbol recognition contest proposed a performance evaluation framework to compare various works on this topic [16].

Existing models are classically divided into structural approaches and statistical approaches. As recalled above, many structural models involve vectorial primitives embedded in a structure whose relations are geometric constraints such as 'interconnection', 'tangency', 'intersection', 'parallelism' and so on [17, 12, 13]. Other geometric primitives have nevertheless been used (loops or contours, for example). Matching consists in finding a subgraph isomorphism between the input graph and the prototype graph. Error-tolerant subgraph isomorphism has thus become an important field of research, with the aim of decreasing the computational complexity [10, 9, 3]. Structural approaches can be very sensitive to noise or deformation, but rotation and scale invariance can be easily obtained.

In statistical pattern recognition, each entity is assigned a feature vector extracted from a portion of the image. Classification is then achieved by partitioning the feature space. Among the features, we can cite studies on geometric features, moment invariants or image transformations [6, 1, 11, 14, 15, 2, 18]. Since statistical approaches work at the pixel level, they are domain-independent and do not rely on thinning and vectorisation. However, some signatures depend on a scale and rotation estimator to achieve invariance. Lastly, statistical approaches have to be computed on a well-defined region of interest, which is not easy to obtain when symbols are embedded in a net of lines. The remainder of the paper is organized as follows.
We present in Section 2 an exhaustive graph representation for symbols based on regions, described in terms of their boundary and content. In Section 3, we propose a system that illustrates the advantage of such a model, while Section 4 deals with the use of statistical features. Experiments are discussed in Section 5 and, finally, Section 6 is devoted to conclusions and future work.

2 Proposed symbol representation

As low-level stages and the image itself may introduce errors and noise, the encoding of an instance of an object as an attributed graph may differ from the ideal model of this object. The resulting graph may appear distorted, have more or fewer nodes and edges, and carry different labels. Therefore, the recognition step has to take errors into account so that an object can be identified as a distorted instance of the model involved. We propose an exhaustive graph representation combining vectorial primitives and statistical features. A Region Adjacency Graph (RAG) is built using two formalisms for the nodes. A node is associated with a loop and is mapped to a structure consisting of both:
• region boundaries, in terms of vectorial primitives,
• region content, in terms of moment invariants (Zernike invariants).
To get a rotation- and scale-invariant representation, the extremities of each vectorial primitive are encoded by their relative location, expressed in polar coordinates with respect to the centroid, the orientation of the loop and the size of the whole object. Edges hold the shared vectorial primitives and relative properties such as the location of the centroid, the area and the orientation. Note that a RAG is not suitable for encoding some patterns (e.g. those in Figure 1): vectorial primitives that do not appear in any boundary definition are not translated in this formalism. Therefore all the vectorial primitives, those involved in the region boundaries and the others, are embedded in a graph using 'interconnection' links. Such a representation makes it possible to define precisely the regions of interest in an unknown digital document where entities are connected in a network. For further developments, we plan to add geometric constraints, as in the works recalled above [17, 12, 13].
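The polar encoding of primitive extremities described above can be illustrated by a minimal Python sketch. The function name `encode_extremity`, the use of degrees and the normalisation of the distance by a single `object_size` value are assumptions made for illustration; the paper does not specify these details.

```python
import math


def encode_extremity(point, centroid, loop_orientation_deg, object_size):
    """Return (angle in degrees, normalised distance) of a primitive extremity.

    Assumption: the angle is measured from the loop orientation and the
    distance is divided by a scalar size of the whole object.
    """
    dx = point[0] - centroid[0]
    dy = point[1] - centroid[1]
    # Angle relative to the loop orientation gives rotation invariance.
    angle = (math.degrees(math.atan2(dy, dx)) - loop_orientation_deg) % 360.0
    # Distance relative to the object size gives scale invariance.
    distance = math.hypot(dx, dy) / object_size
    return angle, distance


# The same extremity before and after a 90 degree rotation and a scaling by 2:
# both calls yield the same code, up to rounding.
code_a = encode_extremity((13.0, 4.0), (10.0, 0.0), 0.0, 100.0)
code_b = encode_extremity((-8.0, 26.0), (0.0, 20.0), 90.0, 200.0)
```

Storing the pair (angle, distance) rather than pixel coordinates is what allows a model node to be compared with a loop found at another position, orientation or scale.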


Figure 1: Some symbols that cannot be well defined using a RAG with loops as regions: (a) two isolated components without any region; (b) two isolated regions; (c) a first symbol represented by a disc; (d) a second symbol represented by a disc.

2.1 Example

We report below a glimpse of the structure stored in the save file associated with the symbol of Figure 2. One may read, for example, that the segment P0 is interconnected to the arc P7 at pixel (438,193); that one of the vectorial paths in the graph, identified as PP6, is made of segments P2, P5, P4, P3; that L3 has an internal boundary (PP6) since L3 includes L0; ...

Figure 2: A symbol and its representation model. The drawing labels the loops L0 and L3, the primitives P0, P1, P2, P3, P4, P5, P7 and P9, and the interconnection point (438,193).

Attribute values of the save file excerpt (the enclosing markup is not preserved):

...
x="438" y="193"
...
"P0" "P1" "P7" "P9"
"P2,P5,P4,P3"
...
"0.586,1.518,0.257,...,0.0192"
"a=289.733 d=0.0147,a=70.267 d=0.0147"
"a=250.641 d=0.0147,a=289.733 d=0.0147"
"a=109.359 d=0.0147,a=250.641 d=0.0147"
"a=70.267 d=0.0147,a=109.359 d=0.0147"
...
"0.399,0.969,1.790,...,0.000"
"a=223.356 d=0.0104297,a=134.167 d=0.0103775,..., a=134.167 d=0.0103775,a=43.4713 d=0.0101896"
"a=69.4653 d=0.006,a=109.339 d=0.006"
"a=286.723 d=0.006,a=69.465 d=0.006"
"a=248.415 d=0.006,a=286.723 d=0.006"
"a=109.339 d=0.006,a=248.415 d=0.006"
...

2.2 Relative properties

The structure mapped to an edge encodes the location of the centroid, the area and the orientation of the target node with respect to the structure mapped to the source node. We detail below how the parameters of the affine transform are estimated.

2.2.1 Orientation

The orientation $\phi$ of a shape can be evaluated (up to $\pm\pi$) from the Hu moments as:

$$\phi = \frac{1}{2} \arctan\left( \frac{2\, m_{1,1}}{m_{2,0} - m_{0,2}} \right)$$

where

$$m_{p,q} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\, x^p y^q \, dx \, dy$$

and $(p, q) \in \mathbb{N}_+^2$. Nevertheless, to get a directed orientation estimator, we must consider Zernike moments. The Zernike moments have complex kernel functions based on Zernike polynomials, and are often defined with respect to a polar coordinate representation of the image intensity function $f(\rho, \theta)$ as:

$$A_{n,l} = \frac{n+1}{\pi} \int_{0}^{2\pi} \int_{0}^{1} V_{n,l}^{*}(\rho, \theta)\, f(\rho, \theta)\, \rho \, d\rho \, d\theta$$

where $\rho \le 1$, the function $V_{n,l}(\rho, \theta)$ denotes a Zernike polynomial of order $n$ and repetition $l$, and $*$ denotes the complex conjugate. In the above equation, $n$ is a non-negative integer and $l$ is an integer such that $n - |l|$ is even and $|l| \le n$. The Zernike polynomials are defined as:

$$V_{n,l}(\rho, \theta) = R_{n,l}(\rho) \exp(i l \theta)$$

where $i^2 = -1$ and $R_{n,l}(\cdot)$ is the real-valued Zernike radial polynomial:

$$R_{n,l}(\rho) = \sum_{s=0}^{(n-|l|)/2} \frac{(-1)^s \, (n-s)!}{s! \left( \frac{n+|l|}{2} - s \right)! \left( \frac{n-|l|}{2} - s \right)!} \, \rho^{n-2s}$$
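As an illustration of the definitions above, the following Python sketch evaluates the radial polynomial and a discrete approximation of $A_{n,l}$. It is not the authors' implementation: the function names, the representation of the image as a square list of rows, and the mapping of pixel centres onto the unit disc are assumptions made for the example.

```python
import cmath
import math


def radial_poly(n, l, rho):
    """Zernike radial polynomial R_{n,l}(rho), for n - |l| even and |l| <= n."""
    l = abs(l)
    total = 0.0
    for s in range((n - l) // 2 + 1):
        coeff = ((-1) ** s * math.factorial(n - s)
                 / (math.factorial(s)
                    * math.factorial((n + l) // 2 - s)
                    * math.factorial((n - l) // 2 - s)))
        total += coeff * rho ** (n - 2 * s)
    return total


def zernike_moment(image, n, l):
    """Approximate A_{n,l} by a sum over the pixels inside the unit disc.

    Assumption: `image` is a square list of rows whose pixel centres are
    mapped linearly onto [-1, 1] x [-1, 1].
    """
    size = len(image)
    pixel_area = (2.0 / size) ** 2
    total = 0j
    for row in range(size):
        y = (2 * row + 1 - size) / size
        for col in range(size):
            x = (2 * col + 1 - size) / size
            rho = math.hypot(x, y)
            if rho > 1.0:
                continue  # the moments are only defined for rho <= 1
            theta = math.atan2(y, x)
            # Conjugate kernel V*_{n,l} = R_{n,l}(rho) exp(-i l theta).
            kernel = radial_poly(n, l, rho) * cmath.exp(-1j * l * theta)
            total += image[row][col] * kernel
    return (n + 1) / math.pi * total * pixel_area
```

Since a rotation of the image only multiplies $V_{n,l}$ by a unit complex factor, the modulus of the value returned by `zernike_moment` is unchanged when the input is rotated by a quarter turn on this grid, which is the invariance exploited by the representation.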

As Teague has demonstrated, given an image and the same image rotated by an angle $\phi$, the Zernike moments of the second image can be expressed using those of the first:

$$A^{\phi}_{n,l} = A^{0}_{n,l} \exp(-i l \phi)$$

Thus, by choosing a suitable $(n, l)$, for instance with $l$ odd so that $\exp(-i l \pi) = -1$, we can remove the uncertainty of the orientation estimator when trying to map a node of the model to a node of an unknown pattern, by testing whether $A^{pattern}_{n,l} = A^{model}_{n,l} \exp(-i l \phi)$ or $A^{pattern}_{n,l} = A^{model}_{n,l} \exp(-i l (\phi + \pi))$. We define the relative location of the target node's centroid with respect to the source node's centroid using:

$$gc_{target} = gc_{source} + d_{source,target} \exp\left( i (\alpha_{source,target} + \alpha_{source}) \right)$$

where $\alpha_{source,target}$ denotes the relative orientation of the segment $[gc_{source}, gc_{target}]$ with respect to the orientation $\alpha_{source}$ of the source node, and $d_{source,target}$ the distance between $gc_{source}$ and $gc_{target}$.

2.2.2 Scale

Relative area and distance properties depend strongly on scale. The width and height of the fitting ellipse are evaluated using the singular values of the covariance matrix:

$$width = \sqrt{\frac{\mu_{2,0} + \mu_{0,2} + \sqrt{(\mu_{2,0} - \mu_{0,2})^2 + 4\mu_{1,1}^2}}{\mu_{0,0}/2}}$$

$$height = \sqrt{\frac{\mu_{2,0} + \mu_{0,2} - \sqrt{(\mu_{2,0} - \mu_{0,2})^2 + 4\mu_{1,1}^2}}{\mu_{0,0}/2}}$$

At first sight, from a given reference, the scale of a pattern can be approximated using the ratio:

$$scale_{pattern} = \frac{width_{pattern} \times height_{pattern}}{width_{reference} \times height_{reference}}$$

Nevertheless, although a variation of scale for a shape can be observed through a variation of its area, this variation depends on the compactness of the shape. Moreover, scale has to be evaluated taking into account variations of the line thickness, since noise can make lines thicker. Now that the model has been defined, the next section presents the system that takes advantage of it.
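The moment-based estimates of Section 2.2 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the orientation is computed from central moments with `atan2` (a quadrant-safe form of the arctangent formula, still ambiguous by $\pi$), and `resolve_direction` only makes sense for odd $l$.

```python
import cmath
import math


def central_moments(image):
    """Return (mu00, mu20, mu02, mu11) of an image given as rows of numbers."""
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            m00 += value
            m10 += x * value
            m01 += y * value
    cx, cy = m10 / m00, m01 / m00
    mu20 = mu02 = mu11 = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            mu20 += (x - cx) ** 2 * value
            mu02 += (y - cy) ** 2 * value
            mu11 += (x - cx) * (y - cy) * value
    return m00, mu20, mu02, mu11


def orientation(mu20, mu02, mu11):
    """Undirected orientation: half the angle of (mu20 - mu02, 2 mu11)."""
    return 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)


def ellipse_axes(mu00, mu20, mu02, mu11):
    """Width and height of the fitting ellipse, as in Section 2.2.2."""
    root = math.sqrt((mu20 - mu02) ** 2 + 4.0 * mu11 ** 2)
    width = math.sqrt((mu20 + mu02 + root) / (mu00 / 2.0))
    # max() guards against a tiny negative value caused by rounding.
    height = math.sqrt(max(mu20 + mu02 - root, 0.0) / (mu00 / 2.0))
    return width, height


def relative_scale(pattern_axes, reference_axes):
    """Ratio (width x height) of the pattern over that of the reference."""
    return (pattern_axes[0] * pattern_axes[1]) / (reference_axes[0] * reference_axes[1])


def resolve_direction(a_model, a_pattern, l, phi):
    """Choose phi or phi + pi, whichever better fits A_pattern = A_model exp(-i l angle).

    Assumption: l is odd, otherwise both candidates give the same prediction.
    """
    candidates = (phi, phi + math.pi)
    return min(candidates,
               key=lambda angle: abs(a_pattern - a_model * cmath.exp(-1j * l * angle)))
```

With these definitions, doubling both dimensions of a filled rectangle gives a `relative_scale` of roughly 4, which reflects the remark above that the ratio follows the area of the shape rather than its linear size.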

3 Knowledge operationalization

In the literature, symbols can be recognized using either a bottom-up or a top-down strategy. The first family of approaches tries to make a symbol emerge from an ascending hierarchy of representations; the recognition is thus not guided by any previously acquired knowledge. On the other hand, the second family of approaches is based on a query: systems try to fit a symbol's model into the data. The matching is then performed by verifying the presence of the different components defined in the model. Contextual knowledge (the image to be analysed) and constraints (the candidate model to be found) are used to formulate and then to verify interpretation hypotheses. In practice, recognition systems alternate bottom-up and top-down strategies. The detection of an object enables the system to generate a set of search actions concerning other objects in its neighborhood, which are then evaluated. From this overview, a structural representation of the entity's components seems to be a relevant choice when, for a given domain, we can define a set of symbols to be matched in an unknown document where symbols are embedded in a network.

Once the database of models $G = \{G_k\}_{k \in [1,K]}$ has been defined, subgraph isomorphism is evaluated from $G_i$ to $G_j$ for all $(i, j) \in [1, K]^2$, to avoid an early recognition of $G_i$ instead of $G_j$. A recognition hypothesis is triggered by identifying a white connected component ($loop_i$) within an image as a region of a symbol's RAG (see Algorithm 1¹). The interpretation hypothesis is translated into an affine transform $\lambda()$, and a mapping is created between the seed loop and the node of the candidate RAG model. The next step consists in evaluating whether, in the neighborhood of the loops involved, the image contains the adjacent regions specified within the model. A hierarchy of the regions to be found makes it possible to stop the current process as soon as possible, by giving higher matching priority to the biggest regions. This strategy is illustrated by Algorithm 2¹.

Algorithm 1¹ Trigger symbol recognition
Input: Image to be analysed
for all loop_i ∈ Image do
    for all label_j ∈ L(loop_i) do
        CG(j) ← {G_m ∈ G | ∃ node_m^n ∈ Node(G_m), label_j ∈ L(node_m^n)}
        for all G_k ∈ CG(j) do
            for all node_k^l ∈ Node(G_k), label_j ∈ L(node_k^l) do
                RAG Matching(G_k, node_k^l, loop_i)
            end for
        end for
    end for
end for
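The control flow of the triggering step and of the region-growing match it calls can be sketched in Python as below. This is a simplified illustration, not the system described in the paper: the classes `Node`, `Model` and `Loop` and both function names are hypothetical, model edges are treated as undirected, and the geometric test against the affine transform $\lambda()$ is replaced by a plain adjacency check between loops.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    labels: set
    area: float


@dataclass
class Model:
    name: str
    nodes: dict   # node name -> Node
    edges: set    # pairs (node name, node name), treated as undirected


@dataclass
class Loop:
    ident: int
    labels: set
    neighbours: set = field(default_factory=set)  # idents of adjacent loops


def rag_matching(model, seed_node, seed_loop, loops):
    """Greedily grow a node-to-loop mapping from a seed; None on failure."""
    matching = {seed_node: seed_loop}
    while len(matching) < len(model.nodes):
        # Unmatched model nodes adjacent to an already matched node.
        frontier = {}
        for a, b in model.edges:
            for src, dst in ((a, b), (b, a)):
                if src in matching and dst not in matching:
                    frontier.setdefault(dst, src)
        if not frontier:
            return None
        # Biggest region first, so that a wrong hypothesis fails early.
        target = max(frontier, key=lambda name: model.nodes[name].area)
        anchor = loops[matching[frontier[target]]]
        used = set(matching.values())
        # Stand-in for the lambda() test: a free adjacent loop sharing a label.
        found = next((i for i in sorted(anchor.neighbours)
                      if i not in used
                      and loops[i].labels & model.nodes[target].labels), None)
        if found is None:
            return None
        matching[target] = found
    return matching


def trigger_recognition(loops, models):
    """Try every (loop, label, model node) seed and collect successful matches."""
    hits = []
    for loop in loops.values():
        for label in sorted(loop.labels):
            for model in models:
                for node in model.nodes.values():
                    if label in node.labels:
                        result = rag_matching(model, node.name, loop.ident, loops)
                        if result is not None:
                            hits.append((model.name, result))
    return hits
```

As in the pseudocode, the match is greedy and does not backtrack: the first loop that satisfies the test is kept, and a single missing region rejects the whole hypothesis.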

4 Learning and classification

Algorithm 2¹ RAG Matching
Inputs: G_k, the candidate model to be found
        seedNode, a node of G_k
        seedLoop, a loop such that L(seedLoop) ∩ L(seedNode) ≠ ∅
Matching ← {seedNode}
evaluate Λ from (seedNode, seedLoop)
Node(G) ← {seedLoop}
repeat
    AN(Matching) ← {node_k^l ∈ Node(G_k) | ∃ edge_k^e ∈ Edge(G_k), edge_k^e(source) ∈ Matching, edge_k^e(target) ∈ Node(G_k) \ Matching}
    node_k^A ← arg max_{n ∈ AN(Matching)} {Area(n)}
    if find(loop_i ∈ Image | L(loop_i) ∩ L(node_k^A) ≠ ∅, loop_i ⇔ λ(node_k^A)) then
        Matching ← Matching ∪ {node_k^A}
        Node(G) ← Node(G) ∪ {loop_i}
    else
        return false
    end if
    update Λ
until G ⇔ G_k
return true

¹ We recall here the notations involved:
• G = {G_k}_{k∈[1,K]} is the database of the K symbol models,
• Node(G_k) = {node_k^n} and Edge(G_k) = {edge_k^e} are respectively the sets of nodes and edges of G_k,
• L() is the label function for a region,
• Λ is the set of parameters of the affine transform function λ(),
• X ⇔ Y denotes a valid bijective mapping between X and Y.

One of the main problems of the classification phase in pattern recognition is the lack of samples for the learning step. Using the method proposed in [5], we can generate noisy images such as those that would be obtained by printing, photocopying or scanning. For a given model, and from a set of images obtained with this approach, we apply a mask for each loop of the model in order to extract, from each noisy image, one vector of Zernike moments corresponding to the loop(s) involved. As we do not want to introduce any specific knowledge, each loop of each model leads to a distinct label in the alphabet of nodes. This means that we end up with as many labels as there are loops in the database. Once a database of symbols has been defined, a Principal Component Analysis (PCA) is performed in order to reduce the dimension of the feature space with an unsupervised linear feature extraction method. Each cloud is then approximated by a multivariate Gaussian distribution. Finally, the empirical Bayes decision rule is used for the classification phase. For our experiments, we use the Zernike moments up to the 7th order on images generated using the first 6 degradation models of the GREC 2003 symbol recognition contest.
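The learning and classification chain of this section (PCA, one Gaussian per label, Bayes decision) can be sketched with NumPy as follows. The function names, the small regularisation term added to each covariance matrix and the use of class frequencies as priors are assumptions of this sketch; in the paper's setting each row of `features` would be a vector of Zernike moments extracted from a noisy image of a loop.

```python
import numpy as np


def fit_pca(features, n_components):
    """Return the mean and the first principal axes of the training vectors."""
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(features, mean, axes):
    """Project feature vectors onto the retained principal axes."""
    return (features - mean) @ axes.T


def fit_gaussians(projected, labels):
    """Approximate each labelled cloud by a multivariate Gaussian."""
    models = {}
    for label in np.unique(labels):
        cloud = projected[labels == label]
        cov = np.atleast_2d(np.cov(cloud, rowvar=False))
        # Assumption: a small ridge keeps the covariance invertible.
        cov = cov + 1e-6 * np.eye(cov.shape[0])
        prior = len(cloud) / len(projected)
        models[label] = (cloud.mean(axis=0), cov, prior)
    return models


def classify(sample, models):
    """Bayes decision: label with the highest log prior plus log likelihood."""
    def log_posterior(params):
        mean, cov, prior = params
        diff = sample - mean
        _, logdet = np.linalg.slogdet(cov)
        return np.log(prior) - 0.5 * (logdet + diff @ np.linalg.solve(cov, diff))
    return max(models, key=lambda label: log_posterior(models[label]))
```

A typical use is to call `fit_pca` and `fit_gaussians` once on the vectors extracted from the degraded training images, then `project` and `classify` for each loop found in an unknown image.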

Figure 3: The 50 symbols used during the GREC 2003 Symbol Recognition Contest

5 Preliminary results

Experiments have been carried out using the datasets (Figure 3) of the GREC 2003 symbol recognition contest, to evaluate the scalability of our approach with respect to the number of symbol models. Evaluation is performed on images with rotation, scaling and binary degradation. As our system only takes advantage of the RAG to trigger symbol recognition, we do not report results for Levels 7-9, which mainly lead to lines thinner than those of the model and change how regions are split from one another. We have specified both the recognition rate and the rejection rate for this study. Some observations, regarding the goal of this study, can be drawn:
• Obviously, the system is too demanding and further experiments are required, even for clean images (Table 1).
• Scale evaluation suffers from a lack of local analysis to get a suitable evaluation of the distance between regions (Tables 2 and 3).
• Errors appear on the third dataset, as it contains symbols that are isomorphic if we only take into account the RAG part of our representation.
• As with other approaches, the performance of our approach decreases as the number of symbols increases (Table 4).

Models  Symbols  Images  Accuracy  Reject
5       5        5       100       0
20      20       20      35        65
50      50       50      42        38

Table 1: Accuracy and Reject in Recognizing Ideal Images (%)

                          Rotation            Scaling             Both
Models  Symbols  Images   Accuracy  Reject    Accuracy  Reject    Accuracy  Reject
5       5        25       0         100       32        68        0         100
20      20       100      12        88        26        74        6         92
50      50       250      7.6       72.4      36.8      43.2      5.2       74.4

Table 2: Accuracy and Reject in Recognizing Symbols with Rotation and Scaling (%)

Each cell gives Accuracy / Reject.

Models  Symbols  Images   Level 1      Level 2      Level 3      Level 4      Level 5      Level 6
5       5        25       56 / 44      56 / 44      52 / 48      48 / 52      20 / 80      60 / 40
20      20       100      66 / 34      61 / 39      58 / 42      54 / 46      37 / 62      58 / 42
50      50       250      55.2 / 27.2  50.8 / 35.2  44.4 / 43.6  42.8 / 45.2  29.2 / 58.8  41.2 / 45.2

Table 3: Accuracy and Reject in Recognizing Symbols with Degradation (%)

Models  Accuracy  Reject  Error
5       37.39     62.61   0
20      41.85     57.83   0.33
50      34.96     49.22   15.83

Table 4: Scalability (%)

6 Conclusion and future work

Designing a symbol recognition system mainly relies on the choice of (i) preprocessing, (ii) data representation and (iii) decision making. In this study, we have focused our attention on the second phase and have proposed a hybrid representation of symbols combining vectorial primitives and statistical features. A Region Adjacency Graph based on loops is first built and associated with a graph of vectorial primitives. This representation therefore highlights the definition of regions from two points of view: (i) region boundaries and (ii) region content. For further development, we intend to trigger recognition using either a string edit distance, as in [7], or a statistical recognition based on moment invariants. Such a combination will make it possible to formulate and validate hypotheses with a higher confidence rate while processing an unsegmented document. Moreover, it gives higher discriminative capabilities than a RAG alone. As we select only one model per symbol, we should introduce rules to be applied on the depth map in order to merge regions from the skeleton graph (signal-to-noise ratio). Nevertheless, since preprocessing methods still leave some deviations from the ideal image, we should select several prototypes per class and include more tolerance according to the training set. The presented results must be taken as a preliminary study towards building an exhaustive representation for symbols. Experiments on the GREC 2003 contest have shown that few errors occur even though we do not yet take advantage of the vectorial primitives.

References

[1] S. Adam, J.M. Ogier, C. Cariou, and J. Gardes. A scale and rotation parameters estimator application to technical document interpretation. In D. Blostein and Y.-B. Kwon, editors, Graphics Recognition. Algorithms and Applications: 4th International Workshop, GREC 2001, pages 266–272, Kingston, Ontario, 2001. Springer-Verlag Heidelberg.
[2] Chee-Way Chong, Paramesran Raveendran, and Ramakrishnan Mukundan. Translation and scale invariants of Legendre moments. Pattern Recognition, 37:119–129, 2004.
[3] Luigi P. Cordella, Pasquale Foggia, Carlo Sansone, and Mario Vento. A (sub)graph isomorphism algorithm for matching large graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(10), Oct. 2004.
[4] Luigi P. Cordella and Mario Vento. Symbol recognition in documents: a collection of techniques? International Journal on Document Analysis and Recognition, 3(2):73–88, 2000.
[5] Tapas Kanungo, Robert Martin Haralick, Henry S. Baird, Werner Stuezle, and David Madigan. Document degradation models: Parameter estimation and model validation. IAPR Workshop on Machine Vision Applications, pages 552–557, 1994.
[6] A. Khotanzad and Y.H. Hong. Invariant image recognition by Zernike moments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(5):489–497, May 1990.
[7] J. Lladós, E. Marti, and J.J. Villanueva. Symbol recognition by error-tolerant subgraph matching between region adjacency graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(10):1137–1143, Oct. 2001.
[8] Josep Lladós, Ernest Valveny, and Gemma Sánchez. A case study of pattern recognition: Symbol recognition in graphic documents. In Jean-Marc Ogier and Éric Trupin, editors, Pattern Recognition in Information Systems, Proceedings of the 3rd International Workshop on Pattern Recognition in Information Systems, pages 1–13. ICEIS Press, 2003.
[9] Daniel P. Lopresti and Gordon T. Wilfong. Evaluating document analysis results via graph probing. In 6th International Conference on Document Analysis and Recognition, pages 116–120, 2001.
[10] Bruno T. Messmer and H. Bunke. A decision tree approach to graph and subgraph isomorphism detection. Pattern Recognition, 32:1979–1998, 1999.
[11] R. Mukundan, S.H. Ong, and P.A. Lee. Image analysis by Tchebichef moments. IEEE Transactions on Image Processing, 10(9):1357–1364, 2001.
[12] Binbin Peng, Yin Liu, Wenyin Liu, and Guanglin Huang. Sketch recognition based on topological spatial relationship. In Structural, Syntactic, and Statistical Pattern Recognition, pages 434–443, 2004.
[13] Jean-Yves Ramel and Nicole Vincent. Strategy for line drawing understanding. In Fifth IAPR International Workshop on Graphics Recognition, pages 1–12, Barcelona, Spain, 2003.
[14] Oriol Ramos and Ernest Valveny. Radon transform for lineal symbol representation. In Proceedings of the Seventh International Conference on Document Analysis and Recognition, pages 195–199, Edinburgh, Scotland, 2003.
[15] S. Tabbone, L. Wendling, and K. Tombre. Matching of graphical symbols in line-drawing images using angular signature information. International Journal on Document Analysis and Recognition, 6(2):115–125, 2003.
[16] Ernest Valveny and Philippe Dosch. Symbol recognition contest: A synthesis. In J. Lladós and Y. B. Kwon, editors, Selected Papers of the 5th IAPR International Workshop on Graphics Recognition, volume 3088 of Lecture Notes in Computer Science, pages 368–385. Springer-Verlag, 2004.
[17] Luo Yan, Guanglin Huang, Liu Yin, and Liu Wenyin. A novel constraint-based approach to online graphics recognition. In Structural, Syntactic, and Statistical Pattern Recognition, volume 3138 of Lecture Notes in Computer Science, pages 104–113, 2004.
[18] Su Yang. Symbol recognition via statistical integration of pixel-level constraint histograms: A new descriptor. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(2):278–281, Feb. 2005.
