New Methodologies Towards an Automatic Optical Recognition of ...

New methodologies towards an automatic optical recognition of handwritten musical scores

Ana Maria Rebelo
Faculty of Engineering, University of Porto
Doctoral Program in Electrical and Computer Engineering

Prof. Dr. Jaime dos Santos Cardoso
Prof. Dr. André Marçal

June 2009

PDEEC (INESC Porto)

OMR


1. Introduction
2. Related Works
3. Stable Path Approach
   Results
   Staff Line Removal
4. Segmentation and Classification Process
5. Conclusion and Future Work


Introduction – Staff, StaffSpaceHeight and StaffLineHeight
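
The two reference lengths named on this slide are often estimated from vertical run lengths: the most frequent black run approximates stafflineheight and the most frequent white run approximates staffspaceheight. The sketch below assumes that convention, which the slides do not spell out, and a binary image stored as a list of rows with 1 for black. The function name is illustrative.

```python
from collections import Counter


def estimate_reference_lengths(image):
    """Return (stafflineheight, staffspaceheight) for a binary image.

    Assumption: both values are taken as the most frequent vertical run
    length of black pixels and of white pixels respectively, and the image
    contains at least one black and one white pixel.
    """
    black_runs, white_runs = Counter(), Counter()
    rows, cols = len(image), len(image[0])
    for c in range(cols):
        run_value, run_length = image[0][c], 0
        for r in range(rows):
            if image[r][c] == run_value:
                run_length += 1
            else:
                # The run just ended: record it under its colour.
                (black_runs if run_value else white_runs)[run_length] += 1
                run_value, run_length = image[r][c], 1
        # Record the run that reaches the bottom of the column.
        (black_runs if run_value else white_runs)[run_length] += 1
    return black_runs.most_common(1)[0][0], white_runs.most_common(1)[0][0]


# Synthetic staff: five lines of thickness 2 separated by spaces of height 6.
column = [0] * 3 + ([1] * 2 + [0] * 6) * 4 + [1] * 2 + [0] * 3
staff_image = [[value] * 10 for value in column]
line_height, space_height = estimate_reference_lengths(staff_image)
```

Using the mode rather than the mean keeps the estimate insensitive to the long black runs produced by stems and the long white runs between staves.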


Introduction – Symbols

Symbols (description):

Treble, Alto and Bass clefs
Sharp, Flat and Natural
Beams
Accent and Staccatissimo
Crotchet, Quaver and Minim
Quarter, Eighth, Sixteenth and Thirty-second rests
Ties and Slurs


Introduction – Final Objective


Introduction – Principal Steps

1. Recognition of musical symbols from a music sheet.
   a) Staff line detection and removal.
   b) Symbol primitives segmentation.
   c) Symbol recognition.
2. Reconstruction of the musical information to build a logical description of musical notation.
3. Construction of a musical notation model for its representation as a symbolic description of the music sheet.


Related Works – Staff line detection and removal

Research in the OMR field began with Pruslin [1] and Prerau [2].
The work has expanded since the 1980s: e.g. Carter [3], Kia [4], Bainbridge [5].
Finding local maxima in the horizontal projection of the black pixels of the image: e.g. Fujinaga [6], Toyama [7].
Combination of projection techniques: e.g. Rossant [8].
Grouping of vertical columns based on their spacing, thickness and vertical position in the image: Reed [9].
Rule-based classification of thin horizontal line segments: Mahoney [10].

[1] Automatic recognition of sheet music, 1966.
[2] Computer pattern recognition of standard engraved music notation, 1970.
[3] Automatic Recognition of Printed Music in the Context of Electronic Publishing, 1989.
[4] Automatic Recognition of Printed Music in the Context of Electronic Publishing, 1995.
[5] Extensible Optical Music Recognition, 1997.
[6] Staff detection and removal, 1994.
[7] Symbol recognition of printed piano scores with touching symbols, 2006.
[8] Optical music recognition based on a fuzzy modeling of symbol classes and music writing rules, 2005.
[9] Automatic computer recognition of printed music, 1996.
[10] Automatic analysis of music score images, 1982.
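
The projection-based family of methods listed above can be illustrated with a minimal sketch. It is not taken from any of the cited systems: it assumes a binary image stored as a list of rows with 1 for black, and the `min_fraction` parameter is a hypothetical filter that ignores rows with little ink.

```python
def horizontal_projection(image):
    """Number of black pixels in each row of a binary image (1 = black)."""
    return [sum(row) for row in image]


def projection_peaks(image, min_fraction=0.5):
    """Rows where the projection has a local maximum.

    Assumption: a row counts as a staff line candidate only if at least
    min_fraction of its pixels are black. For a plateau of equal rows
    (a thick line) only the first row of the plateau is reported.
    """
    profile = horizontal_projection(image)
    width = len(image[0])
    peaks = []
    for r, count in enumerate(profile):
        above = profile[r - 1] if r > 0 else -1
        below = profile[r + 1] if r + 1 < len(profile) else -1
        if count >= min_fraction * width and count > above and count >= below:
            peaks.append(r)
    return peaks


# Two perfectly horizontal lines plus one stray symbol pixel in row 2.
sample = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
peaks = projection_peaks(sample)
```

The projection only peaks sharply when the lines are straight and horizontal, which is why skewed or curved staves are hard for this family of methods.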


Related Works – Limitations

Lines with some curvature or discontinuities are inadequately resolved.
Staff lines are built from local information, without properly incorporating global information in the detection process.
No method tries to define a reasonable process from the intrinsic properties of staff lines, namely the fact that they are the only extensive black objects on the music score.


Related Works – Musical Symbol Extraction and Classification

Extraction of elementary graphic symbols that can be composed to build music notation: e.g. Rossant [11], Toyama [12].
Line Adjacency Graph to extract symbols: Carter [13].
A collection of processing modules that communicate through a common working memory: Kato [14].
Segmentation using Hidden Markov Models: Pugin [15].

[11] Optical music recognition based on a fuzzy modeling of symbol classes and music writing rules, 2005.
[12] Symbol recognition of printed piano scores with touching symbols, 2006.
[13] Automatic Recognition of Printed Music in the Context of Electronic Publishing, 1989.
[14] A recognition system for printed piano music using musical knowledge and constraints, 1990.
[15] Optical music recognition of early typographic prints using hidden Markov models, 2006.


Stable Path Approach – Algorithm Outline

(a) Skewed staff lines with music symbols.


(b) The first 11 shortest paths between left and right margins.


Computing the shortest paths sequentially, one after another, may be prohibitively expensive for some applications. It would be useful to compute several “shortest paths” simultaneously.


Stable Path Approach – Graphs I


Definition: A path $P_{s,t}$ is a stable path between regions $\Omega_1$ and $\Omega_2$ if $P_{s,t}$ is the shortest path between $s \in \Omega_1$ and the whole region $\Omega_2$, and $P_{s,t}$ is the shortest path between $t \in \Omega_2$ and the whole region $\Omega_1$.


Stable Path Approach – Graphs II

(e) Shortest paths between each pixel in the left column and the whole right column.

(f) Shortest paths between each pixel in the right column and the whole left column.

Two approaches: shortest path and stable path.
With the concept of stable paths, computing all the staff lines has only roughly twice the complexity of a single shortest path computation.
The stable path concept provides a means to find all the shortest paths simultaneously:
   compute all the shortest paths between the left margin and the whole right margin;
   in a second step, repeat this process in the reverse direction;
   at the end, if the two endpoints of a direct path and a reverse path coincide, we have a stable path.
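
The two sweeps described above can be sketched with dynamic programming over the image columns. This is a simplified illustration rather than the author's implementation: it assumes a cost per pixel instead of a weight per edge, allows a path to move one row up or down per column, and returns only the endpoint pairs. All names are illustrative.

```python
def stable_paths(cost):
    """Return (s, t) row pairs joined by a stable path.

    cost[r][c] is the price of stepping onto pixel (r, c); it should be
    low for pixels that look like staff lines. s is a row in the left
    margin and t a row in the right margin.
    """
    rows, cols = len(cost), len(cost[0])

    def sweep(columns):
        # total[r]: cheapest cost of reaching row r in the current column.
        # origin[r]: row in the first column where that cheapest path began.
        total = [cost[r][columns[0]] for r in range(rows)]
        origin = list(range(rows))
        for c in columns[1:]:
            new_total, new_origin = [], []
            for r in range(rows):
                prev = min(
                    (p for p in (r - 1, r, r + 1) if 0 <= p < rows),
                    key=lambda p: total[p],
                )
                new_total.append(total[prev] + cost[r][c])
                new_origin.append(origin[prev])
            total, origin = new_total, new_origin
        return origin

    # Direct sweep: for each right-margin row t, the best left-margin row.
    start_of = sweep(list(range(cols)))
    # Reverse sweep: for each left-margin row s, the best right-margin row.
    end_of = sweep(list(range(cols - 1, -1, -1)))
    # A path is stable when both sweeps agree on its two endpoints.
    return [(s, t) for t, s in enumerate(start_of) if end_of[s] == t]


# Two horizontal lines (rows 1 and 3) of cheap pixels in an expensive background.
flat = [[1 if r in (1, 3) else 10 for _ in range(6)] for r in range(5)]
# One skewed line running from the top-left to the bottom-right corner.
skewed = [[1 if r == c else 10 for c in range(3)] for r in range(3)]
```

Both sweeps visit every pixel once, which matches the claim that the whole set of lines costs roughly twice one shortest path computation.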


Stable Path Approach – Pseudo Code

BEGIN PreProcessing
    compute staffspaceheight and stafflineheight
    compute weights of the graph
END PreProcessing
CYCLE
    compute stable paths
    validate paths with blackness and shape
    remove valid paths from image
    add valid paths to list of stafflines
END OF CYCLE if no valid path is found
BEGIN PostProcessing
    uncross stafflines
    organize stafflines in staves
    smooth and trim stafflines
END PostProcessing


Stable Path Approach – Algorithm

1. Preprocessing: estimate the values of staffspaceheight and stafflineheight; estimate the edge weights.


2. Successively find the stable paths between the left and right margins, add the paths found to a list and erase them from the image. Stopping criteria:
   A path is discarded if its percentage of black pixels is not above a threshold (80% of the median blackness of all lines found in the first iteration).
   A path is discarded if its shape differs too much from the shape of the line with median blackness.


3. Postprocessing: remove intersections, cluster lines into staves, remove spurious staves, smooth and trim the lines.
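
The blackness criterion from step 2 can be sketched as follows. This is a minimal illustration under stated assumptions: the image is binary (1 for black), a path is a list giving the row visited in each column, and the shape criterion is left out. Function names are hypothetical.

```python
from statistics import median


def blackness(image, path):
    """Fraction of black pixels along a path; path[c] is the row in column c."""
    return sum(1 for c, r in enumerate(path) if image[r][c]) / len(path)


def reference_blackness(image, first_iteration_paths):
    """Median blackness of the lines found in the first iteration."""
    return median(blackness(image, p) for p in first_iteration_paths)


def keep_valid_paths(image, paths, reference, ratio=0.8):
    """Keep the paths whose blackness is above ratio * reference."""
    return [p for p in paths if blackness(image, p) > ratio * reference]


# Row 0 is fully black, row 1 is half black, row 2 is white.
page = [
    [1] * 10,
    [1, 0] * 5,
    [0] * 10,
]
candidates = [[0] * 10, [1] * 10, [2] * 10]
kept = keep_valid_paths(page, candidates, reference=1.0)
```

Once an iteration yields no path that passes validation, the cycle in the pseudo code stops.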


Stable Path Approach – Weight Function

Assign low weights to edges between adjacent black pixels; otherwise assign the edge a high cost.
Discriminate black pixels in the staff lines from black pixels in the musical symbols, penalising the latter and favouring the former:
   If a black pixel is part of a short vertical run of black pixels, it is more likely to be part of a staff line than of a symbol: the weight function includes a term that benefits those edges.
   If the nearest vertical run of black pixels in the same column is excessively far from the vertical run containing the current black pixel, this pixel is more likely to belong to a symbol: a penalising term is incorporated in the weight function.
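
A rough sketch of how the two terms could be combined is given below. The slides do not give numeric values, so every constant here is hypothetical, as is the choice of "more than twice staffspaceheight" as the meaning of "excessively far". For brevity the cost is attached to a pixel rather than to an edge.

```python
WHITE_COST = 8       # hypothetical cost of a white pixel
BLACK_COST = 4       # hypothetical base cost of a black pixel
SHORT_RUN_BONUS = 2  # hypothetical reward for a short (staff-like) run
FAR_RUN_PENALTY = 3  # hypothetical penalty for an isolated (symbol-like) run


def vertical_run(image, r, c):
    """Top and bottom rows of the vertical black run containing (r, c)."""
    top = bottom = r
    while top > 0 and image[top - 1][c]:
        top -= 1
    while bottom + 1 < len(image) and image[bottom + 1][c]:
        bottom += 1
    return top, bottom


def gap_to_nearest_run(image, top, bottom, c):
    """White pixels between this run and the closest other run, or None."""
    gaps = []
    for k in range(top - 1, -1, -1):
        if image[k][c]:
            gaps.append(top - k - 1)
            break
    for k in range(bottom + 1, len(image)):
        if image[k][c]:
            gaps.append(k - bottom - 1)
            break
    return min(gaps) if gaps else None


def pixel_cost(image, r, c, staffline_height, staffspace_height):
    if not image[r][c]:
        return WHITE_COST
    cost = BLACK_COST
    top, bottom = vertical_run(image, r, c)
    if bottom - top + 1 <= staffline_height:
        cost -= SHORT_RUN_BONUS  # short run: probably a staff line
    gap = gap_to_nearest_run(image, top, bottom, c)
    if gap is None or gap > 2 * staffspace_height:
        cost += FAR_RUN_PENALTY  # no neighbouring line nearby: probably a symbol
    return cost


# One column: two thin lines three pixels apart, then a thick blob.
column = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0]
strip = [[v] for v in column]
# One column with two thin runs that are eight pixels apart.
sparse = [[1]] + [[0]] * 8 + [[1]]
```

With these illustrative constants a thin, well-spaced run is cheaper than a thick run, which in turn is cheaper than a white pixel.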


Real Database – 40 musical scores.


Synthetic Database – 32 musical scores.


Deformations Applied to the Synthetic Scores.

Figure with nine panels, (g) to (o): original; curvature; Kanungo; rotation; staff line thickness variation; staff line y-variation; typeset emulation; white speckles; staff line interruptions.

Metrics.

Two error metrics: the percentage of false positive staff lines and the percentage of staff lines that were not detected.
Compute the average Euclidean distance between each reference staff line and each staff line actually detected.
Solve the matching problem on the resulting bipartite graph by minimizing the assignment cost.
Only pairs with an average error distance below stafflineheight were considered correctly matched.
The remaining pairs were assumed to originate from a false positive staff line being matched to an undetected true staff line and were therefore unmatched.
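
The matching procedure can be sketched as below. For clarity the assignment is solved by brute force over permutations, which is only practical for a handful of lines; a real evaluation would use a polynomial-time assignment solver. Lines are assumed to be lists of row positions sampled at the same columns, so the per-column distance reduces to an absolute difference. All names are illustrative.

```python
from itertools import permutations


def average_distance(line_a, line_b):
    """Mean distance between two lines sampled at the same columns."""
    return sum(abs(a - b) for a, b in zip(line_a, line_b)) / len(line_a)


def match_staff_lines(reference, detected, staffline_height):
    """Return (matched pairs, number of false positives, number of misses)."""
    dist = [[average_distance(ref, det) for det in detected] for ref in reference]
    # Pad with None so every reference line receives a slot.
    slots = list(range(len(detected)))
    slots += [None] * max(0, len(reference) - len(detected))

    def total(assignment):
        return sum(dist[i][j] for i, j in enumerate(assignment) if j is not None)

    best = min(permutations(slots, len(reference)), key=total)
    # Pairs that are too far apart are split into a miss and a false positive.
    matched = [
        (i, j)
        for i, j in enumerate(best)
        if j is not None and dist[i][j] < staffline_height
    ]
    missed = len(reference) - len(matched)
    false_positives = len(detected) - len(matched)
    return matched, false_positives, missed


reference_lines = [[10] * 4, [20] * 4, [30] * 4]
detected_lines = [[10, 10, 11, 10], [30] * 4, [24] * 4]
result = match_staff_lines(reference_lines, detected_lines, staffline_height=2)
```

In this example the detected line at row 24 is assigned to the reference line at row 20, but their distance exceeds the threshold, so the pair is counted as one miss and one false positive.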


Results I – average (standard deviation) of the false detection rate and miss detection rate (in %). Each cell gives the two rates in that order.

Rotation (angle)
Method | -2.5 | 2.5 | 5 | Runtime
Stable path | 0.7 (3.5); 0.7 (3.5) | 0.7 (3.5); 0.7 (3.5) | 1.2 (4.0); 1.2 (4.0) | 858 s
Shortest path | 0.7 (3.5); 0.7 (3.5) | 0.7 (3.5); 0.7 (3.5) | 1.2 (4.0); 1.2 (4.0) | 6006 s
Dalitz | 8.6 (14.0); 15.5 (28.7) | 4.2 (19.6); 9.8 (29.0) | 5.5 (9.3); 37.5 (41.9) | 612 s

White speckle (ratio of white pixels)
Method | 0.07 | 0.09 | 0.11 | Runtime
Stable path | 0.9 (3.7); 0.9 (3.7) | 1.2 (3.8); 1.2 (3.8) | 2.1 (4.6); 2.3 (4.8) | 809 s
Shortest path | 0.9 (3.7); 0.9 (3.7) | 1.7 (4.0); 1.9 (4.3) | 5.3 (7.4); 7.0 (9.6) | 5122 s
Dalitz | 26.7 (25.3); 29.9 (27.2) | 89.3 (54.6); 86.9 (25.6) | 54.5 (55.9); 95.2 (17.0) | 872 s

Line y-variation (maximum deviation, n)
Method | 4 | 5 | 6 | Runtime
Stable path | 0.7 (3.5); 0.7 (3.5) | 0.8 (3.6); 0.8 (3.6) | 1.1 (3.8); 1.1 (3.8) | 767 s
Shortest path | 0.7 (3.5); 0.7 (3.5) | 0.8 (3.6); 0.8 (3.6) | 1.1 (3.8); 1.1 (3.8) | 5122 s
Dalitz | 15.7 (27.2); 33.7 (45.0) | 13.0 (20.1); 33.7 (45.0) | 12.8 (18.6); 34.2 (44.7) | 768 s

Typeset emulation I (gap width, ngap)
Method | 7 | 10 | 13 | Runtime
Stable path | 0.6 (3.5); 0.6 (3.5) | 0.6 (3.5); 0.6 (3.5) | 0.6 (3.5); 0.6 (3.5) | 739 s
Shortest path | 0.6 (3.5); 0.6 (3.5) | 0.6 (3.5); 0.6 (3.5) | 0.6 (3.5); 0.6 (3.5) | 5085 s
Dalitz | 22.3 (30.0); 17.4 (19.0) | 24.2 (38.9); 16.7 (22.0) | 31.4 (42.3); 19.2 (20.3) | 703 s

Results II – average (standard deviation) in %.

Method | False detection rate | Miss detection rate | Runtime
Dalitz | 5.2% (10.4) | 5.9% (11.3) | 112 s
Shortest path | 1.4% (3.5) | 2.5% (7.3) | 612 s
Stable path | 1.3% (5.7) | 1.4% (6.4) | 115 s

Staff Line Removal – Algorithm

The algorithm tracks each staff line and checks whether the vertical black run at each position is longer than a threshold (experimentally set to stafflineheight).
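
A minimal sketch of this tracking step follows. It assumes that runs no longer than the threshold are whitened as staff line pixels, while longer runs are kept because they are taken to belong to a symbol crossing the line. The image is binary (1 for black) and `path[c]` gives the row of the detected staff line in column c; the names are illustrative.

```python
def remove_staff_line(image, path, threshold):
    """Return a copy of image with the staff line along path removed.

    A vertical black run crossed by the path is erased only when its
    length does not exceed threshold.
    """
    result = [row[:] for row in image]
    rows = len(image)
    for c, r in enumerate(path):
        if not image[r][c]:
            continue  # gap in the staff line: nothing to remove here
        top = bottom = r
        while top > 0 and image[top - 1][c]:
            top -= 1
        while bottom + 1 < rows and image[bottom + 1][c]:
            bottom += 1
        if bottom - top + 1 <= threshold:
            for k in range(top, bottom + 1):
                result[k][c] = 0
    return result


# A one-pixel staff line in row 2 crossed by a vertical stem in column 1.
score = [
    [0, 1, 0],
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
    [0, 1, 0],
]
cleaned = remove_staff_line(score, path=[2, 2, 2], threshold=1)
```

The stem survives intact because its run through the staff line is five pixels long, well above the threshold.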


Staff Line Removal – Metrics

Staff line removal is treated as a two-class classification problem at the pixel level, that is, each pixel either belongs to a staff line or does not:

$$\text{Pixel error rate} = \frac{x + y}{z}$$

x = number of misclassified staff pixels
y = number of misclassified non-staff pixels
z = number of all black pixels

This error indicates how badly the symbols are distorted when compared to the ideal staff-less images.
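
The metric can be computed from three binary images, as in the sketch below. The interpretation is an assumption: a staff pixel is one that is black in the original but white in the ideal staff-less image, a misclassified staff pixel is one that the algorithm left in place, and a misclassified non-staff pixel is a symbol pixel that the algorithm erased.

```python
def pixel_error_rate(original, ideal, result):
    """(x + y) / z for binary images given as lists of rows (1 = black)."""
    x = y = z = 0
    for orig_row, ideal_row, result_row in zip(original, ideal, result):
        for o, i, res in zip(orig_row, ideal_row, result_row):
            if not o:
                continue
            z += 1  # every black pixel of the original counts
            if not i and res:
                x += 1  # staff pixel that was not removed
            elif i and not res:
                y += 1  # symbol pixel that was wrongly erased
    return (x + y) / z


original = [[0, 1, 0, 0], [1, 1, 1, 1], [0, 1, 0, 0]]
ideal = [[0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]]
output = [[0, 1, 0, 0], [0, 0, 0, 1], [0, 1, 0, 0]]
rate = pixel_error_rate(original, ideal, output)
```

Here one staff pixel remains and one stem pixel was erased, so the rate is 2 out of 6 black pixels.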


Staff Line Removal – Results II

Method | Pixel error rate
Stable path + LTH | 2.8 (1.2)
Dalitz + LTH | 3.8 (2.6)
Skeleton | 6.5 (8.2)

Table: Removal performance on real music scores (in percentage): average (standard deviation).


Staff Line Removal – Results III

ROTATION
Angle | -5 | -2.5 | 0 | 2.5 | 5
Stable path + LTH | 1.7 (0.7) | 1.5 (0.7) | 1.4 (0.7) | 1.4 (0.7) | 1.6 (0.7)
Dalitz + LTH | 19.4 (18.4) | 5.2 (8.7) | 1.4 (0.8) | 4.4 (8.8) | 17.5 (18.9)
Skeleton | 1.9 (0.9) | 1.7 (0.8) | 1.5 (0.7) | 1.6 (0.7) | 1.7 (0.8)

CURVATURE
Amplitude/staffwidth | 0.02 | 0.04 | 0.06 | 0.08 | 0.10
Stable path + LTH | 1.4 (0.7) | 1.4 (0.7) | 1.4 (0.7) | 1.5 (0.7) | 1.6 (0.7)
Dalitz + LTH | 3.8 (5.8) | 14.0 (12.2) | 22.8 (13.7) | 31.1 (11.0) | 35.0 (10.6)
Skeleton | 2.6 (2.4) | 5.2 (5.1) | 8.1 (7.2) | 11.9 (8.6) | 15.4 (10.4)

WHITE SPECKLE
Rate whitened pixels | 0.03 | 0.05 | 0.07 | 0.09 | 0.11
Stable path + LTH | 11.9 (3.1) | 17.2 (4.9) | 21.1 (5.9) | 24.0 (6.7) | 26.1 (7.2)
Dalitz + LTH | 11.5 (3.2) | 16.8 (4.9) | 26.7 (8.0) | 53.3 (14.9) | 73.3 (14.6)
Skeleton | 14.6 (3.2) | 21.5 (4.6) | 27.1 (5.6) | 35.2 (12.8) | 46.9 (18.7)

LINE Y-VARIATION
Max deviation, n | 2 | 3 | 4 | 5 | 6
Stable path + LTH | 1.2 (0.7) | 1.3 (0.7) | 1.3 (0.6) | 1.4 (0.6) | 1.4 (0.6)
Dalitz + LTH | 9.0 (13.2) | 10.4 (14.1) | 10.9 (14.5) | 10.9 (14.5) | 11.0 (14.6)
Skeleton | 1.5 (0.8) | 1.7 (0.8) | 2.2 (0.9) | 3.7 (1.7) | 5.2 (2.2)

TYPESET EMULATION I
Max gap width, ngap | 1 | 4 | 7 | 10 | 13
Stable path + LTH | 1.4 (0.7) | 1.4 (0.7) | 1.4 (0.7) | 1.4 (0.7) | 1.4 (0.7)
Dalitz + LTH | 2.6 (1.8) | 2.9 (2.0) | 3.2 (1.7) | 2.9 (1.7) | 3.0 (1.8)
Skeleton | 26.4 (9.8) | 27.3 (10.1) | 27.2 (11.3) | 25.5 (9.8) | 26.4 (10.3)

TYPESET EMULATION II
Max vert. shift, nshift | 1 | 4 | 7 | 10 | 13
Stable path + LTH | 1.4 (0.7) | 1.4 (0.7) | 1.4 (0.7) | 1.5 (0.7) | 1.6 (0.7)
Dalitz + LTH | 1.5 (0.8) | 2.8 (1.6) | 3.3 (2.5) | 3.8 (2.4) | 4.7 (3.7)
Skeleton | 7.9 (8.9) | 24.1 (9.1) | 26.7 (11.0) | 26.1 (9.6) | 29.1 (10.7)

Table: Effect of different deformations on the overall staff removal error rates in percentage: average (standard deviation).

Musical Symbols

1. The symbols that are characterised by a vertical segment with height greater than a threshold: notes, notes with flags and open notes.
2. The symbols that link the notes: beams.
3. The remaining symbols connected to staff lines: clefs, rests, accidentals and time signatures.
4. The symbols above and under staff lines: notes, relations and accents (e.g. >).


Segmentation Process

1. Based on a hierarchical decomposition of the music image.
2. The music sheet was split into staves.
3. The connected components were identified.
4. Objects were selected based on the features of the musical symbols.
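
Steps 3 and 4 can be illustrated with a small connected-component labelling routine followed by a height filter. The sketch assumes 8-connectivity and a binary image with 1 for black; the height filter stands in for just one of the feature rules, and its threshold is a hypothetical parameter.

```python
from collections import deque


def connected_components(image):
    """Bounding boxes (top, left, bottom, right) of 8-connected black regions."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited black pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def tall_components(boxes, min_height):
    """Keep the boxes at least min_height pixels tall (stem-like objects)."""
    return [b for b in boxes if b[2] - b[0] + 1 >= min_height]


staffless = [
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
boxes = connected_components(staffless)
```

The components are reported in raster-scan order of their first pixel, which makes the output deterministic for a given image.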


Classification Process

1. Several sets of symbols were extracted from different musical scores to train the classifiers.
2. The symbols were grouped according to their shape into 14 classes.
3. In total there are 3222 handwritten music symbols and 2521 printed music symbols.
4. Classifiers: SVM, NN, KNN; each symbol image was first resized to 20 × 20 pixels and then converted to a vector of 400 binary values.
5. Classifier: HMM; each image was resized to 150 × 30 pixels and features were extracted.
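
The preparation described in item 4 can be sketched as follows. The slides do not say which interpolation was used for resizing, so nearest-neighbour sampling is assumed here; the function name is illustrative.

```python
def to_feature_vector(image, size=20):
    """Resize a binary symbol image to size x size and flatten it.

    Assumption: nearest-neighbour sampling. The result has size * size
    entries, each 0 or 1, in row-major order.
    """
    rows, cols = len(image), len(image[0])
    return [
        1 if image[(i * rows) // size][(j * cols) // size] else 0
        for i in range(size)
        for j in range(size)
    ]


# A tiny 2 x 2 "symbol" with black pixels on the main diagonal.
tiny = [[1, 0], [0, 1]]
vector = to_feature_vector(tiny)
```

The resulting 400-element vector is the kind of fixed-length input that the SVM, NN and KNN classifiers listed above require.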


Classification Process – Elastic Deformation

$$D(x, y) = \sum_{m=1}^{M} \sum_{n=1}^{N} \frac{\xi^{x}_{mn}\, e^{x}_{mn} + \xi^{y}_{mn}\, e^{y}_{mn}}{\lambda_{mn}}$$

Classification Process – Results

The available dataset was randomly split into training and test sets. The split was repeated ten times in order to obtain more stable accuracy estimates by averaging and to assess the variability of this measure.

A confidence interval was computed for the mean of these values,

$$\bar{X} - t^{*}\frac{S}{\sqrt{N}} \le \mu \le \bar{X} + t^{*}\frac{S}{\sqrt{N}}$$

and for the standard deviation values,

$$\frac{(n-1)S^{2}}{\chi^{2}_{[1-(\alpha/2)]}} \le \sigma^{2} \le \frac{(n-1)S^{2}}{\chi^{2}_{(\alpha/2)}}$$

We conclude that applying elastic deformation to the handwritten and printed music symbols does not improve on the performance of classifiers trained without elastically deformed symbols.
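
The interval for the mean can be evaluated as in the sketch below. The accuracy values are invented for illustration and are not results from this work. A 95% confidence level is assumed, for which the two-sided Student t critical value with 9 degrees of freedom (ten repetitions) is about 2.262.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% critical value of Student's t with 9 degrees of freedom.
T_STAR_DF9 = 2.262


def mean_confidence_interval(values, t_star=T_STAR_DF9):
    """Interval X - t* S / sqrt(N) <= mu <= X + t* S / sqrt(N)."""
    n = len(values)
    centre = mean(values)
    half_width = t_star * stdev(values) / sqrt(n)  # stdev is the sample S
    return centre - half_width, centre + half_width


# Hypothetical accuracies (in %) from ten random train/test splits.
accuracies = [90.0, 92.0] * 5
low, high = mean_confidence_interval(accuracies)
```

If a different number of repetitions or confidence level is used, `t_star` must be replaced by the matching critical value.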


Conclusions

1. A robust algorithm for the automatic detection of staff lines in music scores was presented.
2. The proposed method uses a very simple but fundamental principle to assist detection and avoid the difficulties typically posed by symbols superimposed on staff lines.
3. Existing staff line removal algorithms were enhanced by using the stable path method as their first processing step.
4. Several tests showed better results in the staff line detection phase.
5. The segmentation method was based on a hierarchical decomposition of the music image.
6. A comparative study of several classifiers was performed.
7. A new methodology based on elastic deformation was applied to our dataset in conjunction with the classifiers.


Contributions and Related Publications

The introduction of a staff line detection algorithm based on the Stable Paths approach.
The creation of a database of real scores with its references: detection and removal references.
New algorithms for the automatic detection and classification of musical symbols.

Articles:
   “Optical Recognition of Music Symbols: a comparative study”, Ana Rebelo, Artur Capela and Jaime S. Cardoso, in International Journal of Document Analysis and Recognition (IJDAR 2009).
   “Staff Detection with Stable Paths”, Jaime S. Cardoso, Artur Capela, Ana Rebelo, Carlos Guedes, Joaquim Pinto da Costa, in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI 2009).
   “A Connected Path Approach for Staff Detection on a Music Score”, Jaime S. Cardoso, Artur Capela, Ana Rebelo, Carlos Guedes, in IEEE International Conference on Image Processing (ICIP 2008).
   “Staff Line Detection and Removal with Stable Paths”, Artur Capela, Ana Rebelo, Jaime S. Cardoso, Carlos Guedes, in International Conference on Signal Processing and Multimedia Applications (SIGMAP 2008).
   “Integrated Recognition System for Music Scores”, Artur Capela, Jaime S. Cardoso, Ana Rebelo, Carlos Guedes, in International Computer Music Conference (ICMC 2008).
   “A Shortest Path Approach for Staff Line Detection”, Ana Rebelo, Artur Capela, Joaquim F. Pinto da Costa, Carlos Guedes, Eurico Carrapatoso, Jaime S. Cardoso, in International Conference on Automated Production of Cross Media Content for Multi-channel Distribution (AxMedis 2007).


Future Work

1. Elastic deformation is a useful defect model for handwritten symbols, but it is not a useful model for the types of defects that occur in typeset symbols: use different degradation models to expand the training data for printed symbols.
2. Investigation of new methods for the segmentation phase: integration of syntactic music rules.
3. Investigation and application of inductive logic programming techniques.
4. Incorporate prior knowledge of the musical rules in the recognition of symbols.
5. Make the methodology naturally adaptable to manuscript images and to different musical notations.
6. Conversion of the musical scores into a musical description format: MusicXML.
7. Integration of the algorithms developed into an OMR system with remote access via the internet to a wide corpus of unpublished handwritten music encoded in an adequate format.


Thank You
