Image Texture Classification Using Textons

Yousra Javed, Muhammad Murtaza Khan
National University of Sciences and Technology (NUST), School of Electrical Engineering and Computer Science (SEECS), Islamabad, Pakistan
[email protected], [email protected]

Abstract—In this paper, we explore the use of textons for image texture classification in the context of population density estimation. For this purpose, we have taken high resolution Google Earth images and classified them into four classes, i.e. high population density, medium population density, low population density and unpopulated (land/vegetation) areas. A texton dictionary is first built by clustering the responses obtained after convolving the images with a set of filters, i.e. “filter banks”. Using this dictionary, texton histograms are calculated for each class’s texture. These histograms are used as training models. Classification of a test image proceeds by mapping this image to a texton histogram and comparing this histogram to the learnt models. To obtain a quantitative assessment of the efficiency of the proposed method, we compare the results of the proposed method with those obtained through supervised classification based on texture extracted by the Gray Level Co-occurrence Matrix (GLCM). The results demonstrate that texton based classification achieves better results.

Keywords: textons, classification, population density

I. INTRODUCTION

Multispectral or hyperspectral satellite images are generally classified into urban and rural areas using their spectral information [1]. Such satellite images are not readily available in underdeveloped countries, hence it is of interest to develop methods based on information that is more readily available. In this regard, we have focused on using Google Earth images. Although Google Earth images have a high spatial resolution, their spectral fidelity is not assured, so the spectral information provided by these images cannot be used confidently for classification. Focus can therefore be shifted to the texture information present in these images for classification purposes. An image texture is a collection of “elements” or “patterns”, where the elements themselves may or may not have well-defined structures [2]. Texture can be analyzed with the help of structured and statistical approaches. According to the structured approach, an image texture is a set of primitive units in some regular or repeated pattern, e.g. artificially created textures. A statistical approach sees an image texture as a quantitative measure of the arrangement of intensities in a region, e.g. natural textures such as wood and rocks [2][3]. Since texture information is related to variation in intensities, it is readily available in all images, which makes it a suitable candidate for classification. Recently, in [4], the authors have used Google Earth images to


find villages, i.e. to classify urban and rural areas, using phase gradients of the regions as texture features. Image texture has been used for image classification, including population density estimation in satellite images. In [5], Liu modeled population density in urban areas using GLCM as the image texture. The results show that GLCM can differentiate between residential and non-residential land use satisfactorily. In [6], regression analysis was conducted to infer the relationship between block population densities and image texture using semivariance. Their results show that their method has higher accuracy than the conventional land-use-based dasymetric mapping method. Thus, there exist different methods of extracting texture from an image, and the classification results obtained using different methods vary. This highlights the need to try other texture extraction methods and observe their effectiveness for the task of classification. Recently, textons have been proposed for image texture representation [9][3]. We propose to use textons for image texture classification because they provide considerable speed efficiency and data reduction as compared to other approaches. The paper is organized as follows. Section II explains the concept of textons and how they are calculated. Section III describes our proposed methodology. Section IV details our experiments and the respective results, and Section V presents a brief discussion of the results and future perspectives.

II. TEXTON BASED CLASSIFICATION

In this section we present an introduction to textons: how they are calculated and how they can be used for texture classification. Textons refer to the fundamental structures in natural images and thus constitute the basic elements in visual perception. By analogy to physics, if image bases are like protons, neutrons and electrons, then textons are like atoms [7][8]. These textons can represent many different pixel relationships in a region, which is essential for image texture analysis. Textons may also be defined as the representative responses occurring after convolving an image with a set of filters, a “filter bank”. The procedure for texton calculation comprises the following steps:

1. Obtaining training data
2. Selection of filters
3. Clustering the filter responses

1) Obtaining training data: The training data should contain all the information that needs to be classified. In our case, the training data covering the four classification types was gathered with the help of experts. For each texture class, the experts were asked to extract ten sample image blocks from Google Earth images. Figure 1 shows samples for each of the four texture classes.

Fig. 1. Sample texture image blocks. From left to right: high, medium, low population density and unpopulated areas

2) Choice of filter bank: A filter in a filter bank is an n×n matrix of coefficients that, when convolved with an image, responds to a particular type of local structure at each pixel. Typically, n is chosen to be 3, 5, 7, 25 or 49. The following three filter banks can be used for texton based classification:

a) The Leung-Malik (LM) bank: The LM bank consists of 48 filters at multiple scales and orientations: first and second derivatives of Gaussians at 6 orientations and 3 scales (36 filters), 8 Laplacian of Gaussian (LOG) filters, and 4 Gaussians [8].

b) The Schmid (S) bank: The Schmid bank consists of 13 rotationally invariant filters of the form

F(r, σ, τ) = F₀(σ, τ) + cos(πτr/σ) e^(−r²/2σ²)    (1)

where r represents the x and y coordinates of a pixel, σ is the scale and τ is the frequency, i.e. the number of cycles of the harmonic function within the Gaussian envelope of the filter. F₀(σ, τ) is added to obtain a zero DC component, with the (σ, τ) pair taking 13 different values [8].

c) The Maximum Response (MR) bank: The MR8 filter bank consists of 38 filters at multiple orientations and scales, comprising oriented edge and bar filters together with isotropic Gaussian and Laplacian of Gaussian filters. Taking the maximum response across all orientations reduces the number of responses from 38 (6 orientations at 3 scales for 2 oriented filters, plus 2 isotropic) to 8 (3 scales for 2 filters, plus 2 isotropic) [8]. The MR4 filter bank is a subset of the MR8 filter bank in which the oriented edge and bar filters occur at a single fixed scale (σx = 4, σy = 12).

We tested all three filter banks on sample image blocks and observed that the best results were obtained with the LM filter bank. Therefore, we selected the LM bank for our experiments.
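As a rough illustration of how per-pixel filter responses can be computed, the sketch below builds a small Gaussian-derivative style bank and convolves a grayscale image with it. It is a minimal stand-in for the 48-filter LM bank, assuming SciPy and NumPy; the filter support, scales and function names are our own, not the paper's.

```python
import numpy as np
from scipy import ndimage

def make_mini_filter_bank(scales=(1, 2, 4)):
    """Small illustrative bank: Gaussian, Laplacian of Gaussian, and approximate
    x/y derivatives of the Gaussian at each scale (not the full LM bank)."""
    filters = []
    size = 25                                  # filter support, an assumption
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    for s in scales:
        g = np.exp(-(xx ** 2 + yy ** 2) / (2 * s ** 2))
        g /= g.sum()
        filters.append(g)                      # Gaussian
        filters.append(ndimage.laplace(g))     # Laplacian applied to the Gaussian kernel
        filters.append(ndimage.sobel(g, axis=1))  # approximate x-derivative of the Gaussian
        filters.append(ndimage.sobel(g, axis=0))  # approximate y-derivative of the Gaussian
    return filters

def filter_responses(image, filters):
    """Convolve a grayscale image with every filter and stack the results so that
    each pixel carries a response vector of length len(filters)."""
    responses = [ndimage.convolve(image.astype(float), f, mode="reflect") for f in filters]
    return np.stack(responses, axis=-1)        # shape: (H, W, n_filters)
```

Each pixel of a training block is thereby mapped to a response vector whose length equals the number of filters, which is the input required by the clustering step described next.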

3) Clustering the filter responses: From each class, k representative vectors are selected; these are referred to as textons. For this purpose, the image blocks of the class are passed through the LM filter bank. At each pixel, a response vector of size equal to the number of filters in the filter bank is formed by storing the response of each filter. This response vector is represented in the following form:

Vpi = [FR1 FR2 FR3 … FRn]    (2)

where pi represents the pixel index and FR is the filter response at that pixel. The resulting filter response vectors are divided into k clusters using the k-means clustering algorithm. The response vectors corresponding to the k cluster centers are regarded as the textons of a particular texture class. Thus, the number of textons depends on the value of k chosen. The same procedure is repeated for the other texture classes, resulting in their respective textons.
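A compact way to realize this clustering step is sketched below, reusing the filter_responses helper above and scikit-learn's k-means. The function and parameter names are illustrative assumptions rather than the authors' code.

```python
import numpy as np
from sklearn.cluster import KMeans

def compute_class_textons(image_blocks, filters, k=10, seed=0):
    """Cluster the per-pixel filter response vectors of one texture class and
    return the k cluster centers as that class's textons."""
    vectors = []
    for block in image_blocks:                    # e.g. the ten sample blocks of a class
        resp = filter_responses(block, filters)   # (H, W, n_filters)
        vectors.append(resp.reshape(-1, resp.shape[-1]))
    vectors = np.vstack(vectors)                  # all pixels of the class
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(vectors)
    return km.cluster_centers_                    # shape: (k, n_filters), the textons
```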

III. METHODOLOGY

In this section, we describe how the texton based texture features were used for population density estimation. A flow diagram of the proposed methodology is shown in Figure 3.

A. Texton Dictionary

A texton dictionary is built by merging the textons of all the texture classes. We experimented with k = 5, 10 and 15 and chose k = 10 for clustering the responses. Therefore, merging 10 textons per class resulted in a texton dictionary of 40 textons, as shown in Figure 2. Varying the value of k between 5 and 15 did not affect the results significantly, and for k = 10 the size of the dictionary was neither too small nor too large, making k = 10 an appropriate choice for our algorithm.
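Under the same assumptions, merging the per-class textons into the 40-entry dictionary amounts to stacking the cluster centers, as in this short sketch (helpers as defined above):

```python
import numpy as np

def build_texton_dictionary(blocks_per_class, filters, k=10):
    """Stack the k textons of every class into one dictionary.

    blocks_per_class maps a class name (e.g. "high", "medium", "low",
    "unpopulated") to its list of sample image blocks."""
    return np.vstack([
        compute_class_textons(blocks, filters, k=k)
        for blocks in blocks_per_class.values()
    ])  # shape: (n_classes * k, n_filters); 40 textons for four classes with k = 10
```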


Fig. 2. Procedure for building texton dictionary

B. Building models

Using the texton dictionary, a histogram recording how many pixels are assigned to each texton is calculated for each of the ten images of a particular texture class. This is done by first assigning each pixel to a texton based on the minimum norm between their filter response vectors, and then counting the number of pixels assigned to each texton. The histograms of the ten images per texture class are then averaged. This procedure is repeated for each texture class. The histograms of the four classes are then used as models for classification [8].

C. Classification

For classification, the test image is split into blocks and a texton histogram is calculated for each block. The norm of each block's histogram is computed with each of the four texture models' histograms. The model histogram giving the minimum norm determines the texture class to which the image block belongs. The images were classified into the same four classes as in the supervised classification.
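A minimal sketch of these two steps, assuming the texton dictionary and helpers above, is given below. The Euclidean norm is used as the histogram distance to mirror the minimum-norm rule; the function names and the normalization of the histograms are our own choices.

```python
import numpy as np

def texton_histogram(image, filters, texton_dictionary):
    """Assign every pixel to its nearest texton and return the normalized
    histogram of texton counts for the image (or image block)."""
    resp = filter_responses(image, filters).reshape(-1, texton_dictionary.shape[1])
    # Distance of every pixel's response vector to every texton.
    dists = np.linalg.norm(resp[:, None, :] - texton_dictionary[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    hist = np.bincount(labels, minlength=len(texton_dictionary)).astype(float)
    return hist / hist.sum()

def build_class_models(blocks_per_class, filters, texton_dictionary):
    """Average the texton histograms of the sample blocks of each class."""
    return {name: np.mean([texton_histogram(b, filters, texton_dictionary)
                           for b in blocks], axis=0)
            for name, blocks in blocks_per_class.items()}

def classify_block(block, filters, texton_dictionary, models):
    """Label a test block with the class whose model histogram is closest."""
    h = texton_histogram(block, filters, texton_dictionary)
    return min(models, key=lambda name: np.linalg.norm(h - models[name]))
```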

Fig. 3. Texton based classification algorithm (flow diagram). Build texton dictionary: convolve image blocks with the filter bank, construct filter response vectors for each pixel, and apply k-means clustering to the responses to obtain a dictionary for each class. Construct training models for the texture classes: build texton histograms for the four classes by counting the pixels assigned to each texton based on the minimum norm between filter response vectors and textons. Classify a test image: split the test image into blocks, construct a model for each block, and assign the texture class based on the minimum norm with the training models.
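Putting the stages of Figure 3 together, an end-to-end run could look like the following sketch. All helpers are the illustrative ones defined earlier, and the non-overlapping 64x64 block traversal over a grayscale test image is an assumption.

```python
def classify_test_image(test_image, blocks_per_class, block_size=64):
    """End-to-end sketch of the pipeline in Fig. 3, using the helpers above."""
    filters = make_mini_filter_bank()
    dictionary = build_texton_dictionary(blocks_per_class, filters, k=10)
    models = build_class_models(blocks_per_class, filters, dictionary)

    h, w = test_image.shape
    label_map = {}
    for r in range(0, h - block_size + 1, block_size):
        for c in range(0, w - block_size + 1, block_size):
            block = test_image[r:r + block_size, c:c + block_size]
            label_map[(r, c)] = classify_block(block, filters, dictionary, models)
    return label_map   # one class name per non-overlapping block
```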

IV. EXPERIMENTS AND RESULTS

In this section, we discuss our experimental setup and the results. Using the kappa coefficient as a quantitative measure, our proposed method is compared with the GLCM based population density estimation method.

A. Dataset

Our image dataset comprises 10 premium resolution 4800 x 2850 pixel Google Earth images from Islamabad, Rawalpindi and nearby villages. The sample texture blocks for each class were extracted from these images with the help of experts, who marked the ground truth using our marking tool. The marking tool allows an expert to mark/fit a polygon on a region in the Google Earth image and classify it into one of the four population density categories.

B. Class Labels

Experts were asked to mark ground truth images into the following four classes:
• High population density (Red)


• Medium population density (Green)
• Low population density (Blue)
• Unpopulated, i.e. land/vegetation (White)

In the ground truth marked images, land/vegetation areas have been left unmarked in some places. These pixels are counted as the fourth class, i.e. white color. Sample ground truth marked images are shown in Figures 4 and 5b.

C. Experimental Setup

In our experiments, we built texton histograms for image blocks of sizes 32x32 and 64x64 pixels. For classification, we therefore split the image into blocks of the same sizes and performed classification on each block separately. We compared our experimental results with those of supervised classification based on texture, where the Gray Level Co-occurrence Matrix was used as the image texture and support vector machines as the classifier [5]. We calculated the GLCM for each pixel on a 2h x 2h window at 0, 45, 90 and 135 degrees, varying h from 1 to 9. The final GLCM was obtained by averaging the four GLCMs. Six descriptors based on this final matrix were used as the texture feature vector [5] (a sketch of this baseline is given below). The classifier was trained on ground truth images marked using our marking tool. First, the texture of the training images was extracted. A training model was built based on the textures and labels of the marked images. Optimum parameter values for the model were selected by training the classifier on a sample dataset for varying values of cost and gamma and choosing the values that gave the highest prediction accuracy. The training model corresponding to these values was selected. For classification, the texture of the test image was calculated and the training model was used to predict the labels for the test image based on its texture. The prediction accuracy achieved by the SVM classifier in our experiments was 37%. In order to have a close comparison of the texton based classification and the GLCM based supervised classification, we performed pixel wise classification in addition to the block wise classification. For this purpose, a 32x32 pixel window was slid across each pixel and the pixel was classified based on the texton histogram of this window. Figures 4 and 5 show the classification results on city and village images.

D. Quantitative comparison of the classification methods

To estimate the accuracy of the texton based classification scheme, we compared our method to the GLCM based classification approach using the kappa coefficient. Table 1 displays the kappa coefficient values for the two classification methods. It shows that the kappa coefficient values for texton based classification with block size 64x64 pixels are at least two times higher than those for the GLCM based approach.
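The sketch below illustrates the GLCM baseline described in the experimental setup. It assumes scikit-image and scikit-learn, an 8-bit grayscale window, and the six commonly used Haralick-style descriptors, since the paper does not enumerate its descriptors; the quantization level and the SVM parameters are placeholders that would in practice be chosen by grid search over cost and gamma.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.svm import SVC

# Assumed descriptor set; the paper only states that six descriptors are used.
DESCRIPTORS = ["contrast", "dissimilarity", "homogeneity", "energy", "correlation", "ASM"]

def glcm_features(window, levels=32):
    """Average the GLCMs computed at 0, 45, 90 and 135 degrees over one window
    and return the six descriptor values as a feature vector."""
    q = (window // (256 // levels)).astype(np.uint8)   # quantize 8-bit input to `levels` gray levels
    glcm = graycomatrix(q, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=levels, symmetric=True, normed=True)
    glcm = glcm.mean(axis=3, keepdims=True)            # average over the four angles
    return np.array([graycoprops(glcm, p)[0, 0] for p in DESCRIPTORS])

def train_svm(feature_vectors, labels, cost=10.0, gamma=0.01):
    """Fit an RBF SVM; cost and gamma are placeholder values."""
    return SVC(C=cost, gamma=gamma).fit(feature_vectors, labels)
```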

Fig. 4. Classification results on a city area: a) test image, b) ground truth, c) texton based classification on block size 64x64 pixels, d) texton based classification on block size 32x32 pixels, e) texton based classification (pixel wise), f) GLCM based supervised classification

Fig. 5. Classification results on a village area: a) test image, b) ground truth, c) texton based classification on block size 64x64 pixels, d) texton based classification on block size 32x32 pixels, e) texton based classification (pixel wise), f) GLCM based supervised classification

TABLE I
CLASSIFICATION COMPARISON BASED ON KAPPA COEFFICIENT

Image   | Texton based classification                        | GLCM based supervised classification
        | Block size 64x64 | Block size 32x32 | Pixel wise   | Pixel wise
City    | 0.59             | 0.40             | 0.35         | 0.20
Village | 0.53             | 0.33             | 0.47         | 0.30
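The kappa values in Table I compare a predicted label map with the expert-marked ground truth; one way to compute such a score, assuming scikit-learn as the implementation, is:

```python
from sklearn.metrics import cohen_kappa_score

def classification_kappa(predicted_labels, ground_truth_labels):
    """Cohen's kappa between two label maps of equal shape
    (e.g. per-block or per-pixel class indices 0..3)."""
    return cohen_kappa_score(ground_truth_labels.ravel(), predicted_labels.ravel())
```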

It can also be seen from Figures 4c and 5c that most of the areas were correctly classified in the block wise texton based classification approach as compared to the GLCM based approach.

V. CONCLUSION

In this paper, we have proposed a population density estimation scheme based on texture classification. We have demonstrated, both quantitatively using the kappa coefficient and qualitatively, that texton based texture features provide better results than GLCM. Classification accuracy is higher for block level classification than for pixel level classification, since the ground truth was marked by human experts using polygons and hence generally encompasses large areas. This is not necessarily a disadvantage, as we generally prefer to obtain regions related to population rather than individual pixels. Currently, we have tested this method on a small dataset, and we intend to test it on larger and more diverse datasets.

REFERENCES
[1] M. Fauvel, J. Chanussot, J. A. Benediktsson and J. R. Sveinsson, "Spectral and spatial classification of hyperspectral data using SVMs and morphological profiles", IEEE International Geoscience and Remote Sensing Symposium (IGARSS), vol. 46, p. 3804, 2008.
[2] L. S. Davis, "Foundations of Image Understanding", Kluwer Academic Publishers, Norwell, MA, USA, 2001.
[3] T. Leung and J. Malik, "Representing and Recognizing the Visual Appearance of Materials using Three-dimensional Textons", in Proc. of the International Conference on Computer Vision (ICCV), 1999.
[4] K. Murtaza, S. Khan and N. Rajpoot, "VillageFinder: Segmentation of Nucleated Villages in Satellite Imagery", British Machine Vision Conference (BMVC), 2009.
[5] X. H. Liu, "Dasymetric mapping with image texture", Proceedings of the American Society for Photogrammetry and Remote Sensing (ASPRS), USA, 2004.
[6] S. Wu, X. Qiu, and L. Wang, "Using semi-variance image texture statistics to model population densities", Cartography and Geographic Information Science, Apr. 2006.
[7] S. C. Zhu, C. E. Guo, Y. Wu, and Y. Wang, "What are textons", in Proc. of the European Conference on Computer Vision (ECCV), 2002.


[8] M. Varma and A. Zisserman, "A statistical approach to texture classification from single images", International Journal of Computer Vision, 62(1–2):61–81, Apr. 2005.
[9] B. Julesz, "Textons, the elements of texture perception, and their interaction", Nature, 1981.
[10] R. D. Nawarathna, J. Oh, X. Yuan, J. Lee and S. J. Tang, "Abnormal Image Detection Using Texton Method in Wireless Capsule Endoscopy Videos", pp. 153-162, ICMB, 2010.
[11] http://chengenguo.com/uclatexton.htm