
Journal of Electronic Imaging 17(3), 031106 (Jul–Sep 2008)

Multivariate statistical projection methods to perform robust feature extraction and classification in surface grading

José Manuel Prats-Montalbán
Technical University of Valencia, Department of Applied Statistics Operation Research and Quality, Camino de vera s/n, 46022 Valencia, Spain
E-mail: [email protected]

Fernando López
José M. Valiente
Technical University of Valencia, Department of Computer Engineering, Camino de vera s/n, 46022 Valencia, Spain

Alberto Ferrer
Technical University of Valencia, Department of Applied Statistics Operation Research and Quality, Camino de vera s/n, 46022 Valencia, Spain

Abstract. We present an innovative way to simultaneously perform feature extraction and classification for the quality-control issue of surface grading by applying two multivariate statistical projection methods: SIMCA and PLS-DA. These tools have been applied to compress the color texture data that describe the visual appearance of surfaces (soft color texture descriptors) and to directly perform classification using statistics and predictions from the projection models. Experiments have been carried out using an extensive ceramic images database (VxC TSG) comprised of 14 different models, 42 surface classes, and 960 pieces. A factorial experimental design evaluated all the combinations of several factors affecting the accuracy rate. These factors include the tile model, color representation scheme (CIE Lab, CIE Luv, and RGB), and compression/classification approach (SIMCA and PLS-DA). Moreover, a logistic regression model is fitted from the experiments to compute accuracy estimates and study the effect of the factors on the accuracy rate. Results show that PLS-DA performs better than SIMCA, achieving a mean accuracy rate of 98.95%. These results outperform those obtained in a previous work where the soft color texture descriptors in combination with the CIE Lab color space and the k-NN classifier achieved an accuracy rate of 97.36%. © 2008 SPIE and IS&T. [DOI: 10.1117/1.2957886]

Paper 07159SSR received Aug. 3, 2007; revised manuscript received Oct. 24, 2007; accepted for publication Oct. 26, 2007; published online Jul. 18, 2008. This paper is a revision of a paper presented at the SPIE conference on Quality Control by Artificial Vision, May 2007, Le Creusot, France. The paper presented there appears (unrefereed) in SPIE Proceedings Vol. 6356.

1 Introduction

At present, there are many industries manufacturing flat surface materials that need to split their production into homogeneous series grouped by the global appearance of the final product. These kinds of products are used as wall and floor coverings. Some of them are natural products such as marble, granite, or wooden boards, and others are artificial, such as ceramic tiles. At the moment, these industries rely on human operators to carry out the task of surface grading. However, human grading is subjective and thus is often inconsistent between different graders (reproducibility) and even within the same grader (repeatability),2 thereby necessitating automatic and reliable systems. Also, real-time compliance to inspect the overall production at online rates is an important issue.

In a recent work,1 we successfully approached the issue of surface grading using a set of soft color texture descriptors that were able to achieve a global success ratio of 97.36% in the VxC TSG image database. The work was performed using two well-established statistical tools, the experimental design3 and logistic regression,4 in order to test the k-nearest neighbors (k-NN) classifier, several color spaces, classifier validation schemes, and all possible combinations of color texture features. In all color spaces, the best results were achieved using all color texture descriptors. Thus, no feature selection5 was performed on the set of color texture features. As reducing computational costs is an important issue in

1017-9909/2008/17(3)/031106/10/$25.00 © 2008 SPIE and IS&T.


Table 1 Summary of the surface-grading literature.

              Ground truth     Features        Time study   Accuracy
Boukouvalas   Ceramic tiles    Color           No           —
Baldrich      Polished tiles   Color/texture   No           92.0%
Lumbreras                                      No           93.3%
Peñaranda                                      Yes          —
Kauppinen     Wood             Color           Yes          80.0%
Kyllönen      Wood             Color/texture   No           —
Lebrun        Marble           Color/texture   No           98.0%
Kukkonen      Ceramic tiles    Color           No           80.0%

inspection applications, we have now focused our attention on statistical methods able to compress the feature space efficiently, that is, able to perform efficient feature extraction.5 Thus, here we present a study based on multivariate statistical projection models that enable this compression. Moreover, we apply the SIMCA and PLS-DA methods not only to carry out the feature extraction on color texture descriptors, but also to directly perform classification using statistics and predictions computed from the extracted projection models.

The paper is organized as follows. Section 2 presents an overview of the literature related to surface grading and a description of the soft color texture descriptors method. Section 3 introduces the multivariate statistical projection approaches (SIMCA and PLS-DA). Section 4 presents the experimental design and logistic regression. Section 5 introduces the VxC TSG database, used as ground truth in our experiments. Section 6 deals with the experiments and results. Finally, Section 7 concludes the paper.

2 Background

2.1 Surface Grading

Surface grading is related to the automatic classification of flat pieces presenting random surface patterns. The aim of surface grading is to split the production into different classes sorted by their global appearance, which depends on color and texture properties.

In recent years, many approaches to surface grading have been reported (see Table 1). Boukouvalas et al.6 proposed color histograms and dissimilarity measures of these distributions to grade ceramic tiles. Other works consider a specific type of ceramic tile, polished porcelain tiles, which imitate granite. These works include texture features. Baldrich et al.7 proposed a perceptual approximation based on the use of discriminant features defined at the factory by human classifiers. These features mainly concern grain distribution and size. The method includes grain segmentation and feature measurement. Lumbreras et al.8 joined color and texture through multiresolution decompositions on several color spaces. They tested combinations of multiresolution decomposition schemes (Mallat's, à trous, and wavelet packets), decomposition levels, and color spaces (gray, RGB, Ohta, and Karhunen–Loève transform). Peñaranda et al.9 used the first and second histogram moments of each RGB space channel.

Kauppinen2 developed a method for grading wood based on the percentile (or centile) features of histograms calculated for RGB channels. Kyllönen and Pietikäinen10 used color and texture features. They chose centiles for color, and LBP (local binary pattern) occurrence histograms for texture description. Lebrun and Macaire11 described the surfaces of the Portuguese "Rosa Aurora" marble using the mean color of the background and the mean color, absolute density, and contrast of marble veins. They achieved good results, but their approach was very dependent on the visual properties of this marble. Finally, Kukkonen et al.12 presented a system for grading ceramic tiles using spectral images. However, spectral images have the drawback of producing great amounts of data.

In the literature review, we found that many of these approaches were specialized for a specific type of surface, others were not accurate enough or simply did not provide accuracy information (most reported only brief experimental work), and others did not take into account the time restrictions of a real inspection process at the factory. Thus, we thought surface grading was still an open field where more contributions were possible. In this sense, we attempted to fill these gaps in the literature by presenting a generic approach that is suitable for a wide range of random surfaces. We also carried out extensive experiments, achieving accurate results with a representative data set of ceramic tiles.1

2.2 Soft Color Texture Descriptors Method

The soft color texture descriptors method1 is simple: A set of statistical features describing color and texture properties is collected.13 The features are computed in a perceptually uniform color space (CIE Lab or CIE Luv). These statistics form a feature vector used in the classification stage, where the well-known k-NN is chosen as the classifier.5

CIE Lab and CIE Luv were designed to be perceptually uniform. The term "perceptual" refers to the way humans perceive colors, and "uniform" implies that the perceptual difference between two coordinates (two colors) will be related to a measure of distance, which commonly is the Euclidean distance. Thus, color differences can be measured in a way close to the human perception of colors. These spaces were chosen to provide accuracy and a perceptual approach to color difference computation. As the data-set images were originally acquired in RGB, conversion to CIE Lab or CIE Luv coordinates was necessary. This conversion is performed using the standard RGB to CIE Lab and RGB to CIE Luv transformations.14 Following the ITU-R Recommendation BT.709, we used the illuminant D65 in the formulas.

We proposed several statistical features for describing the surface appearance. For each color channel, we chose the mean and standard deviation. Also, by computing the histogram of each color channel, we were able to calculate histogram moments. The nth moment of z about the mean is defined as

$$\mu_n(z) = \sum_{i=1}^{L} (z_i - m)^n \, p(z_i),$$

where z is the random variable, p(z_i), i = 1, 2, ..., L, the histogram, L the number of different variable values, and m the mean value of z. We chose the 2nd to the 5th histogram moments, which are related to texture information.13 We called these color texture features the soft color texture descriptors, in contrast to other classical approaches to texture description in the literature. These features are simple and fast to compute, and are appropriate for real-time compliance.

3 Multivariate Statistical Projection Approaches

3.1 SIMCA

The SIMCA approach, or Soft Independent Modeling of Class Analogy,15 consists of building one principal components analysis (PCA)16 model for each of the studied surface grades in order to determine whether or not new observations belong to one of them. This is performed by calculating the distance to each of the built models, computed as the residual sum of squares (RSS), and assigning the new image (ceramic tile) to the model showing the shortest distance. In this case, when creating each of the PCA models, each surface grade has been mean-centered and scaled by its own mean and standard deviation.

In order to understand the mechanism of this approach, let us introduce the principal components analysis model. PCA is a well-known projection method that transforms the original variables into new ones, called latent variables, that express the inner relationship among the original ones by maximizing the variance in the data structure. A PCA model can be expressed in matrix notation as

$$X = AB^{T} + R,$$

where X (I × J) is the training matrix corresponding to I images and J features, A (I × F) is the score matrix, B (J × F) is the loading matrix, R (I × J) is the residual matrix, and F is the number of latent variables, called principal components. These new latent variables are extracted from the data according to their eigenvalue, extracting and compressing the most relevant information in a few orthogonal vectors.
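The SIMCA rule described above (one autoscaled PCA model per surface grade, with a new sample assigned to the grade whose model leaves the smallest residual sum of squares) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the function names, the use of NumPy's SVD to obtain the loadings, the number of components, and the synthetic "grades" are all assumptions made for the example.

```python
import numpy as np

def fit_class_pca(X, n_components):
    """Fit one PCA model to the training features X (I x J) of a single grade."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                      # guard against constant features
    Z = (X - mean) / std                     # scale the grade by its own mean and std
    # Rows of Vt are the principal directions; B (J x F) holds the loadings.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return {"mean": mean, "std": std, "B": Vt[:n_components].T}

def rss(model, x_new):
    """Residual sum of squares of x_new (length J) with respect to one grade model."""
    z = (x_new - model["mean"]) / model["std"]
    a = z @ model["B"]                       # scores of the new sample
    e = z - a @ model["B"].T                 # part of the sample the model cannot explain
    return float(e @ e)

def simca_classify(models, x_new):
    """Assign x_new to the grade whose PCA model leaves the smallest RSS."""
    return min(models, key=lambda grade: rss(models[grade], x_new))

# Synthetic stand-in for two surface grades (hypothetical data, 6 features each).
rng = np.random.default_rng(0)
load_a = rng.normal(size=(2, 6))
load_b = rng.normal(size=(2, 6))

def sample(loadings, offset, n):
    latent = rng.normal(size=(n, 2))
    return latent @ loadings + offset + 0.05 * rng.normal(size=(n, 6))

models = {
    "grade A": fit_class_pca(sample(load_a, 0.0, 30), n_components=2),
    "grade B": fit_class_pca(sample(load_b, 10.0, 30), n_components=2),
}
x_a = sample(load_a, 0.0, 1)[0]
x_b = sample(load_b, 10.0, 1)[0]
print(simca_classify(models, x_a), simca_classify(models, x_b))
```

Because each grade is scaled with its own statistics, the RSS values are compared across models that live on different scales; the sketch keeps that choice only because the text states it.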
Because it compresses the most relevant information into a few orthogonal components, PCA has traditionally been used in pattern recognition as a feature extraction technique.5 By combining the weights of the original variables in the new ones, two objectives are achieved: (1) a reduction in the data dimensionality; (2) a fulfillment of the independence and normality assumptions required by different classifiers used a posteriori to perform the classification.17

In SIMCA, a training procedure is performed by building a PCA model for each surface grade of a given tile model. Once the loading matrix B has been obtained, the score for the feature vector of any new testing image x_new (1 × J) is calculated by projecting this vector onto B:

$$a_{\mathrm{new}} = x_{\mathrm{new}} B.$$

When the score vector of the new image a_new (1 × F) is obtained, the residual vector can be computed as

$$e_{\mathrm{new}} = x_{\mathrm{new}} - a_{\mathrm{new}} B^{T}.$$

From this residual vector e_new (1 × J) associated with each image, the residual sum of squares (RSS) statistic is computed,18 which represents the Euclidean distance of the new testing image to each built model. This way, the lower the RSS value of a new image with respect to a PCA model, the closer it is to that model. Thus, the SIMCA approach works by assigning the image to the surface grade of the closest PCA model. Here, PCA models are directly used as a classifier as well as a feature extraction tool.

3.2 PLS-DA

Another multivariate projection approach is the discriminant version of the partial least-squares (PLS) model,19 called partial least-squares discriminant analysis (PLS-DA).20 PLS is a multivariate predictive model, similar to PCA in the sense that it seeks to reduce the size of the original data structure (feature extraction), but it generates new latent variables (compressed feature vectors) that explain the inner relations between an input matrix X (I × J) and an output response matrix Y (I × C), where C is the number of output variables, by maximizing the covariance between X and Y instead of maximizing the variance in X (as in PCA). In our case, X relates to the feature matrix, and Y to the classes matrix obtained from the surface grades.

In the PLS model, the weight matrix W (J × F) maximizes the covariance between X and Y. This matrix lets us obtain the feature extraction vectors or scores T (I × F) as

$$T = XW(P^{T}W)^{-1},$$

where P (J × F) is the loading matrix for X, which lets us test whether or not a new image belongs to the built model:

$$X = TP^{T} + R,$$

where R (I × J) is a residual matrix. On the other hand, Y can also be expressed in the latent space as

$$Y = UQ^{T} + S,$$

where U (I × F) and Q (C × F) are the score and loading matrices for Y (I × C), respectively, and S (I × C) is a residual matrix. Finally, using the inner relationship between the compressed vectors T and U gathered by B (F × F),

$$U_{\mathrm{pred}} = TB,$$

it is possible to predict the response matrix Y:

$$Y_{\mathrm{pred}} = TBQ^{T}.$$
Using the PLS-DA approach, PLS models can be used for classification purposes. In this case, it is necessary to build a Y matrix associated with each of the classes (surface grades) present in the training set, and formed by as many dummy variables as classes. These dummy variables are actually column vectors with value 1 for those images (feature vectors) linked to the corresponding class (surface grade), and 0 for the rest. This way, the PLS-DA model tries to find those directions that maximize the separation between the different classes in the training set. Finally, by projecting any new testing image onto the PLS-DA model, its prediction with respect to each class is obtained.
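The dummy-variable construction and the class prediction described above can be illustrated with a small NumPy sketch. It uses a textbook NIPALS variant of PLS in which the inner relation is absorbed into the Y loadings, so its matrices are not normalized exactly as in the equations of this section; the function names, the synthetic data, the number of components, and the rule of assigning the class with the largest predicted dummy value are assumptions made for the example rather than details taken from the paper.

```python
import numpy as np

def pls_fit(X, Y, n_components, n_iter=500, tol=1e-10):
    """NIPALS PLS regression of the dummy matrix Y (I x C) on the features X (I x J)."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean                   # centered working copies
    W, P, Q = [], [], []
    for _ in range(n_components):
        u = F[:, [int(np.argmax(F.var(axis=0)))]]   # start from the most variable Y column
        for _step in range(n_iter):
            w = E.T @ u
            w /= np.linalg.norm(w)                  # X weights (direction of max covariance)
            t = E @ w                               # X scores
            q = F.T @ t / (t.T @ t)                 # Y loadings
            u_new = F @ q / (q.T @ q)               # Y scores
            converged = np.linalg.norm(u_new - u) < tol
            u = u_new
            if converged:
                break
        p = E.T @ t / (t.T @ t)                     # X loadings
        E = E - t @ p.T                             # deflate X
        F = F - t @ q.T                             # deflate Y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.hstack(W), np.hstack(P), np.hstack(Q)
    coef = W @ np.linalg.inv(P.T @ W) @ Q.T         # maps centered X to centered Y
    return {"x_mean": x_mean, "y_mean": y_mean, "coef": coef}

def plsda_predict(model, X_new):
    """Predict every dummy variable and return the index of the largest one."""
    Y_hat = (X_new - model["x_mean"]) @ model["coef"] + model["y_mean"]
    return np.argmax(Y_hat, axis=1)

# Synthetic stand-in for three surface grades described by five features.
rng = np.random.default_rng(1)
labels = np.repeat(np.arange(3), 20)
centers = 4.0 * np.eye(3, 5)                        # hypothetical class centers
X_train = centers[labels] + 0.3 * rng.normal(size=(60, 5))
Y_train = np.eye(3)[labels]                         # one dummy column per grade

model = pls_fit(X_train, Y_train, n_components=2)
print(plsda_predict(model, centers))
```

Two latent variables are enough here because three class centers span a two-dimensional subspace; with real descriptors the number of components would have to be chosen, for example by cross-validation.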


One traditionally applied classifier in pattern recognition is Fisher linear discriminant analysis (LDA),4 which is used after a feature extraction process in order to fulfill the LDA requirements linked to the well conditioning of the data matrix sets: normality assumptions, homogeneity between classes, and more samples than variables. When using LDA, the dimensionality reduction model used is PCA, not PLS. Thus, some latent variables providing discriminating information may not be taken into account when extracting the feature vectors,21 unless we extract many latent variables (this is not our aim, since we are also interested in dimensionality reduction). This happens because PCA does not look for discriminating directions (latent variables) between classes, but for latent variables maximizing the variance for each class. However, this hidden information may be important for the discrimination between classes.

PLS-DA is different from this more traditional LDA,4 and has advantages since it does not suffer from the previously mentioned restrictions. Since maximizing the covariance between X and Y is the aim in PLS-DA, this information will be directly gathered by the first latent variables of the PLS-DA model. Thus, by using PLS-DA, we achieve two goals at the same time: extracting the features that help in segregating the classes (surface grades) in the model, and minimizing the dimension of the feature extraction data structure. A comparison of different discriminant analysis techniques can be found in Ref. 22.

4 Design of Experiments and Logistic Regression

The goal of any classification procedure is to efficiently link a new sample (in our case, a new image) to its corresponding class. This must be accomplished with the maximum accuracy and in the shortest time.

When using multivariate statistical projection methods for image classification, different factors affect the accuracy, such as the three factors involved in this study: the ceramic tile model, the color data representation scheme (color space), and the applied multivariate statistical classification approach. In order to build efficient models for automated classification purposes, a methodology is needed for determining, according to the problem analyzed and the classification method applied, the best combination of factors to maximize the accuracy. We propose using a combination of design of experiments3 and logistic regression4 to fulfill this goal.

In this work, we have 14 different ceramic tile models to classify (see Table 2), three possible color spaces to use (CIE Lab, CIE Luv, and RGB), and two compression/classification approaches to apply (SIMCA and PLS-DA). Thus, we opted to use a complete factorial experimental design, which includes all the possible combinations of the factor levels in the model. This way, each studied effect related to each factor and its interactions with the other factors is orthogonal to the rest, enabling us to analyze the statistical significance of every effect separately and to find the combination that maximizes the accuracy of the surface grading in each ceramic tile model. Thus, a $14^1 \times 3^1 \times 2^1$ complete factorial design has been built, leading to an 84-run design of experiments. If the number of experiments is a limiting factor, more economic

Table 2 Ground truth of ceramic tiles.

Model          Classes       Tiles/Class   Size (cm)   Aspect
Agata          13, 37, 38    16            33 × 33     Marble
Antique        4, 5, 8       14            23 × 33     Stone
Berlin         2, 3, 11      24            16 × 16     Granite
Campinya       8, 9, 25      30            20 × 20     Stone
Firenze        9, 14, 16     20            20 × 25     Stone
Lima           1, 4, 17      24            16 × 16     Granite
Marfil         27, 32, 33    14            23 × 33     Marble
Mediterranea   1, 2, 7       30            20 × 20     Stone
Oslo           2, 3, 7       24            16 × 16     Granite
Petra          7, 9, 10      28            16 × 16     Stone
Santiago       22, 24, 25    28            19 × 19     Stone
Somport        34, 35, 38    28            19 × 19     Stone
Vega           30, 31, 37    20            20 × 25     Marble
Venice         12, 17, 18    20            20 × 25     Marble


designs in terms of experiments can be generated using fractional factorial designs or optimal designs.3 This methodology has also been applied in Ref. 23, since several factors possibly affecting the accuracy appeared, such as the compression level applied on the images, the number of images used when building the models, the preprocessing of the data, etc. That work uses an optimal design of experiments in order to minimize the number of runs.

Once the accuracy of the classification has been registered for each run (corresponding to each ceramic tile model, color space, and classification approach), an adequate tool to analyze these results is needed. Commonly, a linear regression model is used to predict the response variable. But when dealing with the analysis of percentages, as is the case with the accuracy in classification, the proper tool to employ is the logistic regression model, where the original percentage variable p is transformed by the logit function into the variable log[p / (1 − p)]. This transformation ensures that the predicted value of p stays within the range [0, 1], hence avoiding estimation and interpretation problems.4 This is illustrated in Fig. 1 for the case of one quantitative variable X. A multiple logistic regression model, where several explicative variables X_i are involved (it does not matter if they correspond to simple factors or interactions), can be expressed as

$$\log\left[\frac{p}{1-p}\right] = \beta_0 + \sum_i \beta_i X_i .$$

Expressing the model in terms of p gives

$$p = \frac{e^{\beta_0 + \sum_i \beta_i X_i}}{1 + e^{\beta_0 + \sum_i \beta_i X_i}} .$$

The interpretation of this model depends on the nature of the variables involved. When X_i corresponds to a quantitative variable, and assuming constant values for the rest of the variables, a positive β_i value means that as the value of X_i increases, so does the probability of accurate classification. If β_i is negative, the higher the value of X_i, the lower the probability. However, when dealing with qualitative variables, as is our case, the interpretation is different, since we need to create, for each factor with K categories, K − 1 dummy (binary) variables that evaluate the difference in accuracy with respect to the remaining category, which is used as a reference. This way, for one specific categorical variable, a positive (or negative) β_i value related to one of the categories of the variable means that the probability of accurate classification in that category is higher (or lower) than that achieved by the category used as a reference.

The parameters β of the logistic regression approach can be estimated using maximum likelihood estimation (MLE) or weighted least squares (WLS).4 In addition to the parameter estimation method, there are several approaches for computing the logistic model depending on the number of factors taken into account. The simplest way to determine the logistic model is to consider all the factors and their interactions. Then we are forced to estimate all the β parameters for all factors and interactions, even if they are not statistically significant in relation to the response variable. However, two approaches, known as the stepwise methods, do not consider all factors and interactions:

1. Forward selection starts from an initial model and consists of incorporating at each iteration (or step) the most statistically significant term while its significance level (p-value) is lower than some predefined type I error, usually between 5 and 10%. New terms are added, one at each step. The term providing the most information is incorporated into the model. The process is repeated until no new term presents a significance level lower than the type I risk, i.e., until no new term is incorporated.

2. Backward elimination consists of creating a complete model with all possible terms (simple factors and interactions) and eliminating the nonsignificant ones. This is carried out in a step-by-step process, where each step removes the term that shows the lowest level of statistical significance for some predefined type I risk. At each step, all the terms included in the model are evaluated and the least influential one is removed if its significance level is higher than the given type I risk. The process is repeated until the significance level of all the terms remaining in the model is lower than the given type I risk, i.e., until no term is removed from it.

Many software packages implementing statistical methods include options for computing the most common regression methods. We used the statistical package Statgraphics v5.1 to fit the logistic regression models. Here, in order to decide which terms should be part of the final model and to remove the nonstatistically significant terms, we have applied the forward selection stepwise method.

Fig. 1 Logistic vs. linear regression. (Vertical axis: probability; curves: logistic regression and linear regression.)
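As a small illustration of this section, the following Python sketch enumerates the complete factorial design (14 tile models, 3 color spaces, 2 approaches) and shows how a logistic model maps a linear predictor back to a probability. The factor levels are taken from the text; the coefficient values `beta_0` and `beta_plsda` are hypothetical numbers chosen only to show the mechanics, and the dummy coding with SIMCA as the reference category is an assumption of the example.

```python
import itertools
import math

tile_models = ["Agata", "Antique", "Berlin", "Campinya", "Firenze", "Lima", "Marfil",
               "Mediterranea", "Oslo", "Petra", "Santiago", "Somport", "Vega", "Venice"]
color_spaces = ["CIE Lab", "CIE Luv", "RGB"]
approaches = ["SIMCA", "PLS-DA"]

# Complete factorial design: every combination of factor levels is one run.
runs = list(itertools.product(tile_models, color_spaces, approaches))
print(len(runs))  # 14 * 3 * 2 runs

def logit(p):
    """Logit transform of a proportion p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inverse_logit(eta):
    """Map a linear predictor eta back to a probability in (0, 1)."""
    return math.exp(eta) / (1.0 + math.exp(eta))

# Hypothetical coefficients: an intercept and one dummy variable that is 1 for
# PLS-DA and 0 for SIMCA (the reference category in this example).
beta_0 = 2.0
beta_plsda = 1.5

p_simca = inverse_logit(beta_0)                 # dummy variable = 0
p_plsda = inverse_logit(beta_0 + beta_plsda)    # dummy variable = 1
print(round(p_simca, 4), round(p_plsda, 4))
```

Whatever the value of the linear predictor, `inverse_logit` returns a number strictly between 0 and 1, which is the property that makes the logit scale suitable for modeling accuracy rates; a positive dummy coefficient raises the predicted accuracy relative to the reference category.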

5 VxC TSG Image Database

All the experiments were carried out using the VxC TSG* image database (VxC Tiles for Surface Grading). Based on samples taken from the ceramic tile industry, this image database comprises 14 ceramic tile models, 42 surface grades, and 960 pieces. It was built in the VxC laboratory (located at the Computer Engineering Department of the Technical University of Valencia) in collaboration with Keraben S.A. (a large Spanish ceramic tile company) and is an extensive image database of ceramic tiles representing the wide range of surface classes in the ceramic tile industry. VxC TSG is also intended to be a tool for the scientific community working on surface grading or texture recognition.

* Available at http://miron.disca.upv.es/vision/uxctsg/.

Each model in the database has three different surface classes, with a variable number of pieces per class (see Table 2 and Fig. 2). The pieces were labeled by specialized human graders in the factory, using a numbering scheme where close numbers mean close classes. For each model, the selected samples had two close classes and a third one distant from them. Thus, the database includes difficult cases within every model. Models were chosen to represent the extensive variety of surfaces that factories can produce. A catalog of 700 models is common in these industries. However, in spite of this large number of models, almost all of them imitate one of the following mineral textures: marble, granite, or stone.

Digital images of tiles were acquired using a spatially and temporally uniform acquisition system. This system comprised high-quality components: a scan-line color camera (Dalsa Trillium TR-31-02k25) and an illumination system (Mercrom FXC2372-2) formed by two special high-frequency and spatially uniform fluorescent lamps. To overcome variations over time, the power supply for the lamps was automatically regulated by a photoresistor located near them. Spatial and temporal uniformity is very important in the surface-grading application6,7,9 because slight variations in

Fig. 2 Samples from the ground truth; Petra (a) and Marfil (b) models. Each sample corresponds to one of the three classes within each model.

[Plot for Fig. 3, titled FLUORESCENTS. Vertical axis: Euclidean distance (0 to 5); horizontal axis: time in hours (0 to 50). Legend: noticeable difference, granito, mediterranea, somport, vega, blue venice, green venice.]

acquisition conditions, i.e., illumination, can produce different shades for the same surface and hence misclassifications.

To demonstrate the reliability of the acquisition system, we carried out the following experiment. Six tiles, each corresponding to a different model, were captured repeatedly. The tiles were chosen in an attempt to cover a wide range of surface types and colors. The complete set of tiles was acquired at random moments, 23 times, over 54 hours. We extended the experiment over 54 hours because this is the mean period during which factories produce a specific model, and we wanted to study the spatial and temporal uniformity for a complete surface-grading session. Environmental conditions were kept constant using an air conditioning system for the temperature and a closed cabin for illumination.

In order to study the temporal response, we measured the mean CIE Lab color of each piece. Also, to study the spatial response, we randomly oriented the pieces in each capture. CIE Lab is a perceptually uniform color space, and we can measure the perceptual difference between two colors using the Euclidean distance in this space.14 Thus, color differences can be measured in a way very similar to the human perception of colors. Mahy and Oosterlink24 established that in CIE Lab a noticeable difference in color (for humans) begins at Euclidean distances of 2.3 or greater. From this assertion, we consider the system sufficiently stable if, for a given tile, none of the Euclidean distances between the first sample and the rest of them is above 2.3. In the results (see Fig. 3), there was no distance above 2.3 for any tile; moreover, all distances remained well below this limit. Thus, we determined that the acquisition system was sufficiently spatially and temporally uniform.
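The stability criterion above (no Euclidean distance in CIE Lab above 2.3 between the first capture of a tile and any later capture) is simple enough to sketch in a few lines of Python. The function names and the mean Lab readings below are hypothetical; only the 2.3 threshold and the comparison against the first sample come from the text.

```python
import math

NOTICEABLE_DIFFERENCE = 2.3  # CIE Lab threshold quoted in the text

def lab_distance(lab_a, lab_b):
    """Euclidean distance between two (L, a, b) triplets."""
    return math.dist(lab_a, lab_b)

def is_stable(mean_lab_series, threshold=NOTICEABLE_DIFFERENCE):
    """True if no capture differs from the first one by more than the threshold."""
    first = mean_lab_series[0]
    return all(lab_distance(first, lab) <= threshold for lab in mean_lab_series[1:])

# Hypothetical mean CIE Lab colors of one tile captured at different times.
captures = [(62.0, 4.1, 15.3), (62.3, 4.0, 15.1), (61.8, 4.3, 15.6)]
drifted = captures + [(65.0, 4.1, 15.3)]   # a capture 3.0 units away in L

print(is_stable(captures), is_stable(drifted))
```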
Fig. 3 Acquisition system response over 54 hours using fluorescents.

6 Experiments and Results

We carried out the design of experiments described in Section 4 and computed the corresponding logistic regression model. Accuracies were compiled using two validation methods during the classification stage: hold-out and leave-one-out.

When we use a classifier, we need to validate it, and commonly this task is performed by estimating the error rate (or accuracy) in classification. To carry out this validation, we need to split our universe of samples into two sets: the training set to tune or build the classifier and the test set to validate it by computing the error rate. There are several approaches to designing the training and test sets in order to estimate the error rate: hold-out, leave-one-out, k-fold cross-validation, and bootstrapping. A description of these methods and an in-depth comparison of the latter two can be found in Ref. 25. We chose the hold-out and leave-one-out validation methods because the universe of samples for each ceramic tile model was not very large (see Table 2). The number of samples per model varies from 42 to 90 (a mean of 68 per model). Thus, the k-fold cross-validation and bootstrap methods were not adequate since they need larger universes of samples.

In the hold-out validation method, the universe of samples is split into two separate sets. We split the samples in each tile model into sets of the same size: half of the samples for the training set, and the remaining half for the test set. In this approach, the error-rate estimation is pessimistic because not all of the information is used for training. On the other hand, the leave-one-out method is an attempt to use the data as efficiently as possible. One sample is extracted from the universe of samples, so that the remaining set is used as the training set and the extracted sample as the test set. This is done for all samples present in the original data set. The final error-rate estimation is the percentage of failed classifications. This method provides an almost unbiased error-rate estimate and is recommended for small data sets, where the hold-out approach achieves poor estimates.

We used the Statgraphics v5.1 software and computed two logistic regression models using the data collected in the design of experiments.
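The hold-out and leave-one-out schemes described above can be sketched in Python as follows. This is an illustration only: the text does not say how the two halves of the hold-out split were drawn, so the alternating split used here is an assumption, and the nearest-class-mean classifier and the one-dimensional toy data are hypothetical stand-ins for the SIMCA and PLS-DA models.

```python
import math

def hold_out_accuracy(samples, labels, fit, predict):
    """Train on every other sample and test on the remaining half."""
    train = range(0, len(samples), 2)          # assumed way of forming the two halves
    test = range(1, len(samples), 2)
    model = fit([samples[i] for i in train], [labels[i] for i in train])
    hits = sum(predict(model, samples[i]) == labels[i] for i in test)
    return hits / len(test)

def leave_one_out_accuracy(samples, labels, fit, predict):
    """Test each sample on a model trained with all the other samples."""
    hits = 0
    for i in range(len(samples)):
        model = fit(samples[:i] + samples[i + 1:], labels[:i] + labels[i + 1:])
        hits += predict(model, samples[i]) == labels[i]
    return hits / len(samples)

def fit_nearest_mean(xs, ys):
    """Toy classifier (stand-in): store the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict_nearest_mean(model, x):
    """Return the class whose stored mean is closest to x."""
    return min(model, key=lambda y: math.dist(model[y], x))

# Hypothetical, well-separated one-feature data for three surface grades.
samples = [(center + 0.1 * k,) for center in (0.0, 5.0, 10.0) for k in range(6)]
labels = [grade for grade in ("g1", "g2", "g3") for _ in range(6)]

print(hold_out_accuracy(samples, labels, fit_nearest_mean, predict_nearest_mean))
print(leave_one_out_accuracy(samples, labels, fit_nearest_mean, predict_nearest_mean))
```

Passing `fit` and `predict` as arguments keeps the validation logic independent of the classifier, so the same two functions could wrap any of the approaches compared in the experiments.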
The first regression model is obtained using the leave-one-out validation method in classification, and the second using hold-out. The computed models are shown in Tables 3 and 4. The adjusted percentage of deviance explained by the models (55.91% and 52.25%) is an indication of the goodness of fit of the logistic models. In the first column, we collect the statistically significant (p-value < 0.05) factors and interactions among them. The second column gives us the weights (parameter

031106-6

Jul–Sep 2008/Vol. 17(3)

Prats-Montalban et al.: Multivariate statistical projection methods… Table 3 Logistic regression model using the leave-one-out validation method.

Table 4 Logistic regression model using the hold-out validation method.

Estimated regression model 共maximum likelihood兲

Estimated regression model 共maximum likelihood兲

Parameter

Estimate

Standard error

Estimated odds ratio

Parameter

Estimate

Dependent variable: accuracy

Dependent variable: accuracy

Factors: Model, Space, Approach

Factors: Model, Space, Approach

CONSTANT

2.3199

0.4688

14.4076

374.0390

1.81E6

Model= BERLIN

1.3831

0.8532

Model= CAMPINYA

2.0218

Estimated odds ratio

2.1835

0.5210

Model= ANTIQUE

14.7061

406.1430

2.45E6

3.9873

Model= BERLIN

−0.4525

0.5694

0.6360

0.9878

7.5524

Model= CAMPINYA

1.9196

0.9123

6.8187

−0.0144

0.6235

0.9857

Model= FIRENZE

0.1863

0.6462

1.2048

1.0882

0.7766

2.9699

Model= LIMA

1.2017

0.7818

3.3260

Model= MARFIL

−0.4089

0.6287

0.6644

Model= MARFIL

−1.0030

0.5949

0.3667

Model= MEDITERRANEA

−0.3314

0.5527

0.7179

Model= MEDITERRANEA

0.2548

0.5985

1.2902

Model= OSLO

14.9027

365.9240

2.97E6

Model= OSLO

3.3242

1.798

27.7785

Model= PETRA

−0.9213

0.5279

0.3980

Model= PETRA

−0.3636

0.5606

0.6951

Model= SANTIAGO

1.7258

0.9091

5.6171

Model= SANTIAGO

15.3091

388.2370

4.45E6

Model= SOMPORT

−0.1458

0.5719

0.8643

Model= SOMPORT

−1.1988

0.5299

0.3015

Model= VEGA

3.0099

1.7972

20.2850

Model= VEGA

15.0095

395.4730

3.30E6

Model= VENICE

0.6680

0.7283

1.9503

Model= VENICE

0.5735

0.7004

1.7745

Approach= PLS− DA

1.8861

0.3431

6.5935

Space= Lab

−0.6624

0.3180

0.5156

Space= Luv

−0.7270

0.3158

0.4833

0.9779

0.2564

2.6591

Model= ANTIQUE

Model= FIRENZE Model= LIMA

CONSTANT

Standard error

Percentage of deviance explained by model= 76.1984. Adjusted percentage= 55.9135.

Approach= PLS− DA

estimates ␤*兲 to be used for each factor or interaction in the linear combinations used to compute the predicted accuracy p*. Uncertainty in the estimation of the parameter estimates 共weights兲 is shown in third column. Finally, the fourth column provides the so-called odds ratios. The odds ratio, also called the relative risk, is a complementary measure to ␤* parameter estimates. When the predictor variable Xi is dichotomous 共binary or dummy兲, assuming that the rest of the predictors do not change, the relative risk is the ratio of the probability of success in classification vs. failure when Xi = 1 regarding the same ratio 共success/failure兲 when Xi = 0: odds − ratio =

p/1 − p兩Xi = 1 = exp共␤i兲. p/1 − p兩Xi = 0

A positive value of the ␤* parameter implies an increment in the response variable 共accuracy兲 using the corresponding factor level with regards to the reference level used in the model. For example, in Table 3, the AGATA tile model is the reference for the ceramic tile model factor 共does not appear兲; thus, models with positive ␤* parameter Journal of Electronic Imaging

Percentage of deviance explained by model= 72.6364. Adjusted percentage= 52.2467.

estimates will achieve better accuracy than the reference model, and those with negative ␤* parameter estimates will achieve worse accuracy than the reference. This is translated into odds ratios with a value greater than 1 corresponding to positive ␤* parameter estimates, and a value lower than 1 corresponding to negative ␤* parameter estimates. For the leave-one-out validation method 共see Table 3兲, no statistically significant difference is appreciated between the color spaces, maybe linked to the fact that both PCA and PLS models are able to combine the effect of the different color channels in a proper way when creating the latent variables. This is an important result since we could use the fastest 共to compute兲 color space without affecting the classification performance. RGB is the fastest color space; for CIE Lab and CIE Luv, we need to apply transformation formulas to convert the original RGB data into these color spaces.
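As a worked illustration of how the parameter estimates are used, the sketch below plugs a few of the Table 3 (leave-one-out) coefficients into the inverse logit to obtain a predicted accuracy, and exponentiates a coefficient to obtain its odds ratio. The coefficient values are those reported in Table 3; the function names and the subset of tile models are chosen here only for illustration, and the standard logistic link is assumed.

```python
import math

# Coefficients as reported in Table 3 (leave-one-out).
# AGATA and SIMCA are the reference levels, so their contribution is zero.
CONSTANT = 2.3199
MODEL_BETA = {"AGATA": 0.0, "CAMPINYA": 2.0218, "PETRA": -0.9213}
APPROACH_BETA = {"SIMCA": 0.0, "PLS-DA": 1.8861}

def predicted_accuracy(model, approach):
    """Inverse logit of the linear combination of the active parameter estimates."""
    eta = CONSTANT + MODEL_BETA[model] + APPROACH_BETA[approach]
    return 1.0 / (1.0 + math.exp(-eta))

def odds_ratio(beta):
    """Odds ratio of a dummy predictor: exp(beta)."""
    return math.exp(beta)

for tile_model in MODEL_BETA:
    print(f"{tile_model}: {predicted_accuracy(tile_model, 'PLS-DA'):.2%}")
print(f"odds ratio, PLS-DA vs. SIMCA: {odds_ratio(APPROACH_BETA['PLS-DA']):.4f}")
```

For AGATA with PLS-DA the linear predictor is 2.3199 + 1.8861 = 4.2060, which gives a predicted accuracy of about 98.53%, in agreement with the value listed for Agata in Table 5.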


In terms of the classification approaches, the PLS-DA model performs better in classification than the SIMCA approach in all cases, as indicated by the positive $\beta^*$ value of 1.8861 (SIMCA is the reference for this factor). Correspondingly, the estimated odds ratio is higher than 1 (6.5935). Thus, PLS-DA provides higher classification accuracies than SIMCA and should be used in preference to it. This suggests an important conclusion: there is some discriminating information not taken into account by the PCA models, probably because its variance in the X data structure of each surface grade is small.

All ceramic tile models show statistically significant differences in accuracy with respect to the AGATA model, which was used as the reference. This means that, although the accuracies are quite close, the differences between models are consistent from a statistical point of view. Positive $\beta^*$ estimates indicate that the corresponding ceramic tile model will show a higher accuracy than AGATA. For example, the CAMPINYA and LIMA models will present higher classification accuracies, since the corresponding $\beta^*$ values are 2.0218 and 1.0882, respectively. On the other hand, the FIRENZE and SOMPORT ceramic tile models will show statistically significant lower accuracies than AGATA, since the corresponding $\beta^*$ values are −0.0144 and −0.1458, respectively. This is also reflected in the corresponding estimated odds ratios: 7.5524 and 2.9699 for the CAMPINYA and LIMA ceramic tile models (greater than 1), and 0.9857 and 0.8643 for the FIRENZE and SOMPORT ceramic tile models (lower than 1). Similar results can be stated for the hold-out validation method (see Table 4).
Here, however, a statistically significant difference among the three color spaces shows up: the RGB color space provides the best results (RGB is the reference for this factor, and Lab and Luv achieve negative $\beta^*$ parameter estimates with regard to it). Again, PLS-DA shows better classification performance than SIMCA. Tables 5 and 6 present the predicted and observed accuracies (averaged for the three color spaces) achieved by PLS-DA for the different ceramic tile models, with a global predicted accuracy of 98.95% for the leave-one-out validation method (optimistic) and 96.45% for the hold-out validation method (pessimistic). This means that, using RGB images in a straightforward way, one can generally obtain quite good results (between 96.45% and 98.95%).

Table 5 Predicted and observed accuracies using PLS-DA and the leave-one-out validation method.

| Tile model | Predicted accuracy | Observed accuracy |
|---|---|---|
| Agata | 98.53% | 100.00% |
| Antique | 100.00% | 100.00% |
| Berlin | 99.63% | 100.00% |
| Campinya | 99.80% | 100.00% |
| Firenze | 98.51% | 98.89% |
| Lima | 99.50% | 99.54% |
| Marfil | 97.81% | 96.83% |
| Mediterranea | 97.97% | 98.15% |
| Oslo | 100.00% | 100.00% |
| Petra | 96.39% | 95.63% |
| Santiago | 99.74% | 99.60% |
| Somport | 98.30% | 97.62% |
| Vega | 99.93% | 100.00% |
| Venice | 99.24% | 99.44% |
| Global mean | 98.95% | 98.98% |

With regard to the computational cost, once the training process is performed, SIMCA and PLS-DA are fast, since the classification only implies the projection onto a known matrix (see Section 3). The theoretical cost of k-NN is $\theta(n \cdot m + C)$, where $n$ is the number of features (18), $m$ the number of prototypes, and $C$ a constant related to sorting the k nearest neighbors. Nevertheless, $C$ is in fact zero, since the best k value in Ref. 1 was 1. The theoretical cost of PLS-DA is the same as that of k-NN, since for every new sample we only have to compute $y = x_{new} B_{PLS}$, where $n$ and $n \times m$ are the sizes of the vector $x_{new}$ and the matrix $B_{PLS}$, respectively. $B_{PLS}$ is computed in the training stage as $W(P^{T}W)^{-1}BQ^{T}$. The cost of the SIMCA approach is $\theta[4(F \cdot n) + C]$, where $n$ is the number of features and $F$ is the number of chosen principal components (between 3 and 12, with a mean of 7.5 in the experiments). Thus, if $4F < m$, this approach will be cheaper than k-NN (feature compression). In our case, $m$ is the number of prototypes (a mean of 34 per model), so with the mean $F = 7.5$ we have $4F = 30 < 34$. Thus, SIMCA is, in fact, faster than k-NN.

7 Conclusions

This work has analyzed the benefits of using multivariate statistical projection models for surface grading when using soft color texture descriptors as features. In particular, SIMCA and PLS-DA, which are based on principal components analysis and partial least squares models, respectively, have been used to automatically classify three different grades on each of the 14 ceramic tile models used as ground truth. These PCA and PLS models let us perform the classification by directly projecting the new observations (images) onto them and computing several statistics and predictions. The hold-out and leave-one-out validation methods have been used to tune the models (classifiers) and estimate classification accuracies. In order to study the possible differences among the accuracies achieved for each ceramic tile model, color space (CIE Lab, CIE Luv, or RGB), and approach (SIMCA or PLS-DA), a complete factorial design of experiments has been performed, leading to an 84-run experiment. Since the response variable, the accuracy, is a percentage, the correct way of analyzing the design of experiments is via a logistic regression model.
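The 84 runs follow from crossing the 14 tile models, 3 color spaces, and 2 approaches. The sketch below enumerates this full factorial design and shows one possible dummy coding of a run for a logistic regression; the reference levels (Agata, RGB, SIMCA) are the ones named in the text, while the function and variable names are hypothetical and the actual coding used by the statistical software may differ.

```python
from itertools import product

# Factor levels; the first level of each factor is treated as the reference level.
TILE_MODELS = ["Agata", "Antique", "Berlin", "Campinya", "Firenze", "Lima", "Marfil",
               "Mediterranea", "Oslo", "Petra", "Santiago", "Somport", "Vega", "Venice"]
COLOR_SPACES = ["RGB", "CIE Lab", "CIE Luv"]
APPROACHES = ["SIMCA", "PLS-DA"]

# Full factorial design: every combination of the three factors is one run.
runs = list(product(TILE_MODELS, COLOR_SPACES, APPROACHES))

def dummy_code(run):
    """Encode one run as 0/1 indicators, omitting the reference level of each factor."""
    model, space, approach = run
    row = {f"Model={m}": int(m == model) for m in TILE_MODELS[1:]}
    row.update({f"Space={s}": int(s == space) for s in COLOR_SPACES[1:]})
    row.update({f"Approach={a}": int(a == approach) for a in APPROACHES[1:]})
    return row

print(len(runs))                 # 14 * 3 * 2 runs
print(len(dummy_code(runs[0])))  # 13 + 2 + 1 indicator columns
```

Together with the constant term, the 16 indicator columns correspond to the 17 parameters listed for the hold-out model in Table 4.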

Table 6 Predicted and observed accuracies using PLS-DA and the hold-out validation method.

| Tile model | Predicted accuracy | Observed accuracy |
|---|---|---|
| Agata | 95.94% | 91.67% |
| Antique | 100.00% | 100.00% |
| Berlin | 93.75% | 94.44% |
| Campinya | 99.38% | 97.78% |
| Firenze | 96.60% | 100.00% |
| Lima | 98.74% | 100.00% |
| Marfil | 89.65% | 95.24% |
| Mediterranea | 96.82% | 97.78% |
| Oslo | 99.85% | 100.00% |
| Petra | 94.26% | 59.52% |
| Santiago | 100.00% | 100.00% |
| Somport | 87.68% | 90.48% |
| Vega | 100.00% | 100.00% |
| Venice | 97.67% | 100.00% |
| Global mean | 96.45% | 94.78% |

The results showed statistically significant differences among the tile models. Thus, although a very good accuracy rate is achieved for all models, the tile models differ with regard to classification. However, the most important result is that PLS-DA performs better than SIMCA, which supports the idea that inferential (covariance-based) models provide better classification results than compression (variance-based) models. One possible reason for PLS-DA performing better than SIMCA is that surface grades show slight differences that do not affect the general internal data structure of the images and thus may not be captured by the PCA models. But if these differences influence the separation of the surface grades, they will be captured by the PLS-DA model. This could be the reason why inferential models should work better than compression models for classes showing some common structure.

The logistic regression model predictions are very close to the averaged observed accuracies, which gives confidence in the results: predicted values range between 96.39% and 100%, with an averaged predicted value of 98.95% (using leave-one-out validation). These results also indicate that the soft color texture descriptors and the multivariate statistical projection models are simple techniques that are easy to implement in industrial environments for monitoring purposes.

Finally, it must be pointed out that PLS-DA has outperformed our earlier results (Ref. 1), where we used the k-NN classifier instead. In that work, the global accuracy achieved was 97.36%. Moreover, both SIMCA and PLS-DA present additional advantages. On the one hand, they compress the feature space, performing efficient feature extraction and reducing computational costs. On the other hand, they are also able to use a reject criterion (Ref. 16), so they can reject any new sample that does not belong to any of the classes used to build the models, for some predefined type I risk. However, this reject criterion has not been applied in this work, in order to compare the results with those we achieved earlier (Ref. 1).

Acknowledgments

This work was partially supported by European funds and the Spanish Department of Education and Science, FEDER-CICYT (DPI2007-66596-C02-01) and CICYT CTM2005-06919-C03-03/TECNO.

References

1. F. López, J. M. Valiente, J. M. Prats, and A. Ferrer, “Performance evaluation of soft color texture descriptors for surface grading using experimental design and logistic regression,” Pattern Recogn. 41, 1161–1172 (2008).
2. H. Kauppinen, “Development of a color machine vision method for wood surface inspection,” Ph.D. thesis, Oulu University (1999).
3. D. C. Montgomery, Design and Analysis of Experiments, 4th ed., John Wiley & Sons, New York (1997).
4. J. D. Jobson, Applied Multivariate Data Analysis: Categorical and Multivariate Methods, Springer-Verlag, Berlin (1992).
5. A. Webb, Statistical Pattern Recognition, John Wiley & Sons, Chichester, UK (2002).
6. C. Boukouvalas, J. Kittler, R. Marik, and M. Petrou, “Color grading of randomly textured ceramic tiles using color histograms,” IEEE Trans. Ind. Electron. 46(1), 219–226 (1999).
7. R. Baldrich, M. Vanrell, and J. J. Villanueva, “Texture-color features for tile classification,” in Proc. EUROPTO/SPIE Conf. on Color and Polarisation Techniques in Industrial Inspection (1999).
8. F. Lumbreras, J. Serrat, R. Baldrich, M. Vanrell, and J. J. Villanueva, “Color texture recognition through multiresolution features,” in Proc. 5th Int. Conf. on Quality Control by Artificial Vision (2001).
9. J. A. Peñaranda, L. Briones, and J. Florez, “Color machine vision system for process control in ceramics industry,” Proc. SPIE 3101, 182–192 (1997).
10. J. Kyllönen and M. Pietikäinen, “Visual inspection of parquet slabs by combining color and texture,” in Proc. IAPR Workshop on Machine Vision Applications (2000).
11. V. Lebrun and L. Macaire, “Aspect inspection of marble tiles by color line-scan camera,” in Proc. 5th Int. Conf. on Quality Control by Artificial Vision (2001).
12. S. Kukkonen, H. Kälviäinen, and J. Parkkinen, “Color features for quality control in ceramic tile industry,” Opt. Eng. 40(2), 170–177 (2001).
13. R. C. Gonzalez and P. Wintz, Digital Image Processing, 2nd ed., Addison-Wesley, Reading, MA (1987).
14. G. Wyszecki and W. S. Stiles, Color Science: Concepts and Methods, Quantitative Data and Formulae, 2nd ed., Wiley, New York (1982).
15. S. Wold, C. Albano, W. J. Dunn, U. Edlund, K. Esbensen, P. Geladi, S. Hellberg, E. Johansson, W. Lindberg, and M. Sjöström, “Multivariate data analysis in chemistry,” in B. R. Kowalski (Ed.), Chemometrics: Mathematics and Statistics in Chemistry, D. Reidel Publishing Company, Dordrecht, Holland (1984).
16. J. E. Jackson, A User’s Guide to Principal Components, Wiley, New York (2003).
17. A. K. Jain, R. P. W. Duin, and J. Mao, “Statistical pattern recognition: A review,” IEEE Trans. Pattern Anal. Mach. Intell. 22(1), 4–37 (2000).
18. P. Nomikos and J. F. MacGregor, “Multivariate SPC charts for monitoring batch processes,” Technometrics 37(1), 41–59 (1995).
19. H. Wold, “PLS regression,” in Encyclopedia of Statistical Sciences, N. L. Johnson and S. Kotz (Eds.), vol. 6, pp. 581–591, Wiley, New York (1984).
20. M. Sjöström, S. Wold, and B. Söderström, “PLS discriminant plots,” in Proc. PARC in Practice, Elsevier (1986).
21. L. Nørgaard, R. Bro, F. Westad, and S. B. Engelsen, “A modification of canonical variates analysis to handle highly collinear multivariate data,” J. Chemom. 20, 425–435 (2006).


22. J. M. Prats-Montalbán, A. Ferrer, J. Gorbeña, and J. L. Malo, “A comparison of different discriminant analysis techniques in a steel industry welding process,” Chemom. Intell. Lab. Syst. 80, 109–119 (2006).
23. J. M. Prats-Montalbán and A. Ferrer, “Integration of colour and textural information in multivariate image analysis: Defect detection and classification issues,” J. Chemom. 21, 10–23 (2007).
24. M. L. Mahy and E. A. Oosterlink, “Evaluation of uniform color spaces developed after the adoption of CIELAB and CIELUV,” Color Res. Appl. 19(2), 105–121 (1997).
25. R. Kohavi, “A study of cross-validation and bootstrap for accuracy estimation and model selection,” in Proc. Int. Joint Conf. Artif. Intell. (IJCAI), pp. 1137–1145 (1995).

José Manuel Prats-Montalbán received his MS in industrial engineering in 1998 from the Technical University of Valencia, Spain. He worked for Quality Management Consulting from 1999 to 2000, and then he joined the Applied Statistics, Operations Research and Quality Department at the Technical University of Valencia, where he received his PhD in statistics in 2005. He is currently a lecturer in this department, joining Dr. Ferrer’s Multivariate Statistical Engineering Group.

Fernando López is currently a lecturer in the Computer Engineering Department (DISCA) of the Technical University of Valencia, Spain. He received his MS and PhD in computer science from the Technical University of Valencia in 1996 and 2005, respectively. From 1998 to 2000, he worked for RADICOM, a medical imaging software company. Dr. López is a member of the Spanish Association of Pattern Recognition and Image Analysis (AERFAI). His research interests include pattern recognition and computational models for human-like vision.

José M. Valiente received his MS and PhD in industrial engineering from the Technical University of Valencia, Spain, in 1984 and 1996, respectively. Presently, he is a professor in the Computer Engineering Department (DISCA) and a main researcher for the Computer Vision Research Group at the Technical University of Valencia. His research interests include image description and modeling, color image segmentation, and defect detection and classification in automated visual inspection tasks. He is particularly interested in the application of these techniques in industrial sectors like ceramics or textile. Dr. Valiente is a member of the IEEE Computer Society.

Alberto Ferrer received his MS in agricultural engineering and his PhD in statistics from the Technical University of Valencia, Spain, in 1987 and 1991, respectively. Presently, he is a professor in the Applied Statistics, Operation Research and Quality Department and a main researcher for the Multivariate Statistical Engineering Division at the Technical University of Valencia. His research interests include statistical techniques for quality and productivity improvement. Dr. Ferrer is a member of the editorial board of Quality Engineering, a member of the International Society for Business and Industrial Statistics (ISBIS) Council, and a member of the European Network for Business and Industrial Statistics (ENBIS).
