
10th WSEAS Int. Conf. on MATHEMATICAL METHODS AND COMPUTATIONAL TECHNIQUES IN ELECTRICAL ENGINEERING (MMACTEE'08), Sofia, Bulgaria, May 2-4, 2008

Cognitive Image Representation Based on Spectrum Pyramid Decomposition

ROUMEN KOUNTCHEV, Department of Radio Communications, Technical University of Sofia, BULGARIA

STUART RUBIN, Space and Naval Warfare Systems Center San Diego (SSCSD), CA, USA

VLADIMIR TODOROV, T&K Engineering, Sofia, BULGARIA

MARIOFANNA MILANOVA, Department of Computer Science, UALR, USA

ROUMIANA KOUNTCHEVA, T&K Engineering, Sofia, BULGARIA

ISBN: 978-960-6766-60-2
ISSN: 1790-5117

Abstract: - Contemporary image representation is based on various techniques, using matrices, vectors, multi-resolution pyramids, R-trees, orthogonal transforms, anisotropic perceptual representations, etc. This paper offers a new approach for cognitive image representation based on adaptive spectrum pyramid decomposition controlled by neural networks. The approach corresponds to the hypothesis that humans recognize images through consecutive approximations with increasing resolution for the selected regions of interest. Such an image representation is suitable for creating learning models of the objects, which should be extracted from image databases in accordance with predefined decision rules. A significant element of the new representation is the use of feedback, which provides iterative change of the cognitive models' parameters in accordance with the data mining results obtained.

Key words: - Cognitive image representation, Image decomposition.

1 Introduction
The contemporary means for still image representation depend on their application, for example: medicine, digital libraries, electronic galleries, geographic information systems, document archiving, digital communication systems, etc. The primary form of digital image presentation is the matrix [1]. In this case the image is not compressed; as a result, its storage requires significant information resources, and its processing requires high computational power. The secondary forms of image representation are obtained from the primary one and could be multi-dimensional vectors, pyramids, orthogonal transforms, tree structures, algebraic models, models for visual information perception, etc. The vector representation is used for image compression with vector quantization [2] and for image analysis and recognition based on 3-dimensional colour features, texture features, K-dimensional colour histograms, multi-dimensional shape features [3, 4], RST-invariant features [5], R-trees [2], etc. The pyramidal representation [1-4] describes the image with progressively increasing resolution, which corresponds to the layers of the Gaussian-Laplacian Pyramid. The derivatives of this representation are the Reduced Sum/Difference Pyramid, the S-transform Pyramid, the Hierarchy-Embedded Differential Pyramid, the Least Square Pyramid, the Morphological Pyramid, etc. This group of pyramids is called overcomplete [6] because the memory needed is larger than that for the non-compressed image. The orthogonal pyramids are not overcomplete. They are usually based on wavelet [1] or contourlet [7] functions and have higher efficiency and computational complexity than the pyramids from the first group.

The spectral image representation [1-4] is based on orthogonal transforms of different kinds: statistical (Karhunen-Loeve Transform, Principal Component Analysis, Independent Component Analysis, Singular Value Decomposition) and deterministic (Discrete Fourier Transform, Discrete Cosine Transform, Walsh-Hadamard Transform (WHT), Hartley Transform, Lapped Orthogonal Transform, etc.). This group could also include the new algebraic image transform [8], based on 2D angular windowing functions, which is suitable for evaluating the local phase and orientation of synthetic shapes. The transforms from the first group have higher computational complexity than those from the second one. Another approach for image representation is the perceptual one [9], based on anisotropic filtration controlled by the visual attention model of the Human Visual System (HVS). The knowledge-based models for image representation are used mostly in systems for Visual Information Retrieval [10-13]. In these cases the main approach used for image representation is a pyramid model of 4 layers, which contain, correspondingly: the primary matrix, the feature vectors, the description of the relations between the features, and the semantic image structure. Other methods for image representation in Content-Based Image Retrieval (CBIR) systems are based on the similarity of the colour or spectrum histograms of the queried image and the images in the database [11, 12], or on the introduction of relevance feedback for image retrieval [13]. The methods for RST (Rotation, Scaling and Translation) invariant image representation are based on iterative models obtained with affine transforms [14], the Fourier-Mellin transform [15], Matching Pursuit techniques [16], and affine moment invariants [17]. Another method, for pyramidal multi-scale image representation based on Gabor functions [18], offers an efficient implementation in the spatial domain, which is faster than the conventional Fast Fourier Transform. A common feature of the image representation methods described above is that they make relatively poor use of image content knowledge obtained through learning, or else they rely on overly complicated cognitive structures.

The state-of-the-art analysis shows that there are still unexplored possibilities for a wider cognitive approach in the creation of object representation models. The requirements on these models are contradictory: a minimum number of features and invariance, together with an exact description, low computational complexity, etc. The methods described above solve these problems to some degree, but cannot achieve the best balance because they are not flexible enough (they do not involve learning and feedback procedures). This paper offers a new approach for cognitive image representation based on adaptive spectrum pyramid decomposition with neural network control. The approach is based on the hypothesis that humans recognize images through consecutive approximations with increasing resolution for the selected regions of interest. Such an image representation is suitable for creating learning models of the queried objects, which should be extracted from image databases in accordance with predefined decision rules. A significant element of the new representation is the use of feedback, which provides iterative change of the cognitive models' parameters in accordance with the image data mining results obtained.

2 Basic Principles of the Cognitive Image Representation
The main idea of the new approach is based on the method for image representation with Inverse Spectrum Pyramid (ISP) decomposition [19, 20]. The decomposition represents the image with consecutive approximations based on any kind of 2D orthogonal transform (DCT, WHT, etc.), retaining the resolution and increasing the approximation quality. The calculated transform coefficients build the consecutive layers of the spectrum pyramid. The essence of the ISP decomposition for 8-bit greyscale images is presented briefly as follows. First, the digital image is processed with a two-dimensional (2D) direct Orthogonal Transform (OT), using a limited number of coefficients only. The values of the calculated coefficients constitute the lowest pyramid level. Then, using these values, the image is restored with the Inverse Orthogonal Transform (IOT). The result is the first (coarse) approximation of the original image, which is then subtracted pixel by pixel from the original one. The difference image, which has the same size as the original, is divided into 4 sub-images, and each of them is processed with the 2D OT again. The values of the chosen coefficients constitute the second pyramid level. The processing continues in a similar way with the next pyramid layers. The set of orthogonal transform coefficients chosen for every pyramid layer can be different, and it defines the restored image quality (more coefficients naturally ensure higher image quality). The decomposition is stopped when the required quality of the approximating image is obtained, usually earlier than the last possible pyramid layer. The coefficient values obtained from all pyramid layers are then quantized, sorted in accordance with their spatial frequency, scanned sequentially, and losslessly compressed.
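The layer-by-layer procedure above can be sketched for a single square block as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the OT is taken to be an orthonormal 2D DCT, every layer keeps the same "keep x keep" set of low-frequency coefficients, and quantization and lossless coding are omitted. The names `dct_matrix`, `isp_decompose` and `psnr` are illustrative; `psnr` uses the usual definition for 8-bit images.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix (n x n); row k is the k-th basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c


def isp_decompose(block: np.ndarray, layers: int, keep: int):
    """Inverse-spectrum-pyramid sketch for one square block of size 2**m.

    Layer p splits the current difference image into 4**p sub-blocks,
    applies the 2D DCT (direct OT) to each, retains only the keep x keep
    low-frequency coefficients, restores them with the inverse DCT (IOT)
    and subtracts the restored part from the difference image.
    """
    size = block.shape[0]
    residual = block.astype(float)           # difference image (a copy)
    approx = np.zeros_like(residual)         # accumulated approximation
    pyramid, approximations = [], []
    for p in range(layers):
        sub = size >> p                      # sub-block size on layer p
        c = dct_matrix(sub)
        layer = []
        for r in range(0, size, sub):
            for s in range(0, size, sub):
                spec = c @ residual[r:r + sub, s:s + sub] @ c.T  # direct OT
                spec[keep:, :] = 0.0                            # keep only the
                spec[:, keep:] = 0.0                            # chosen coefficients
                layer.append(spec[:keep, :keep].copy())
                part = c.T @ spec @ c                           # inverse OT
                approx[r:r + sub, s:s + sub] += part
                residual[r:r + sub, s:s + sub] -= part
        pyramid.append(layer)
        approximations.append(approx.copy())
    return pyramid, approximations


def psnr(original: np.ndarray, restored: np.ndarray) -> float:
    """Peak signal-to-noise ratio in dB for 8-bit images."""
    mse = np.mean((original.astype(float) - restored) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(16, 16))
    pyramid, approximations = isp_decompose(image, layers=3, keep=2)
    print([len(layer) for layer in pyramid])                   # [1, 4, 16]
    print([round(psnr(image, a), 2) for a in approximations])  # non-decreasing
```

With a 16 x 16 block and three layers the sub-block sizes are 16 x 16, 8 x 8 and 4 x 4, the same sizes listed for Fig. 1. Because each layer subtracts an orthogonal projection of the difference image, the approximation error can only decrease (or stay equal) from layer to layer.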
For practical applications the decomposition is usually "truncated", i.e. it does not start from the lowest possible layer but from one of the higher ones. For this, the discrete original image is divided into blocks of corresponding size, 2^n x 2^n, and each block is represented by an individual pyramid whose elements are defined with recursive calculations. The top of the pyramid corresponds to its lowest layer, which comprises the smallest number of coefficients. A pyramid decomposition of 3 layers is shown in Fig. 1. The approximation images for each layer (test image "Michaela", 512 x 512 pixels) are shown together with their PSNR values and bitrates. The sub-blocks of the 3 pyramid layers are correspondingly 16 x 16 for the lowest, 8 x 8 for the middle, and 4 x 4 for the highest one.

[Fig. 1: spectrum pyramid in spectral space (layers p = 0, 1, 2, linked by OT - IOT) and the corresponding image layers in pixel space]
Layer p = 0: PSNR = 29.4 dB, bitrate 0.067 bpp, sub-block size 16 x 16
Layer p = 1: PSNR = 32.34 dB, bitrate 0.22 bpp, sub-block size 8 x 8
Layer p = 2: PSNR = 35.95 dB, bitrate 0.8 bpp, sub-block size 4 x 4
Fig. 1. The spectrum pyramid decomposition: example with a 3-layer pyramid. The "truncated" pyramid has lowest layer p = 0 and highest layer p = 2.

The described decomposition offers the ability for adaptive and cognitive image representation and flexible matching pursuit. The relations existing between coefficients in neighbouring sub-blocks of consecutive decomposition layers permit image representation with a reduced number of coefficients; as a result, the number of coefficients representing the image without quality loss is equal to that of the original image (i.e. the reduced pyramidal decomposition is not overcomplete) [21]. In addition, knowing the main coefficients, which represent the queried image with the required quality, and using the existing relations between corresponding coefficients of the consecutive decomposition layers, it is possible to define the object representation model for any decomposition layer, thus solving the problems concerning image scaling. The object image representation based on this decomposition also offers a solution to the problems concerning image rotation and translation: for RST-invariant transforms, the values of the decomposition coefficients are invariant as well. The object representation is usually built from a single original image, or from more than one image (different view, lighting, colour, scaling, etc.). In the second case the initial object representation (in the lower decomposition layers) is fuzzy, and the more exact representation is defined in the higher decomposition layers. The offered method permits the development of initial models for some basic image kinds, for example: texts/graphics, cartoon images, medical images, natural greyscale or colour images, etc., which are later refined in accordance with the object's peculiarities. This approach, which is based on preliminary knowledge, facilitates the object representation and search.

The selection of the participating transform coefficients for every decomposition layer depends on the approximation quality obtained and is defined by a neural network (NN). The processed image is represented by a reduced spectrum pyramid of n layers, whose coefficients are obtained using the Fourier-Mellin transform. The spectrum coefficients of every sub-image in the consecutive pyramid layers build the object's feature vectors, which are then used for the model evaluation. For this, the coefficients obtained are processed with the inverse Fourier-Mellin transform, after which the quality of the restored image (i.e. the model error) is estimated. If this error is still too big, the back-propagation neural network that controls the feature selection for the next decomposition layer is tuned correspondingly. The final description of the object model is obtained at the last decomposition layer.

One of the main advantages of the new method for image representation is the ability to obtain query results in large databases faster. The new approach is presented in Fig. 3. The presumption is that the queried image is smaller than the database image. The ISP model of the queried image is used to create a corresponding pyramid decomposition for a sub-image (window) of the same size in every image from the database. The initial position of the window is the lower left pixel of the database image. For this position, the distance between the vector of the ISP coefficients of layer 0 of the queried image and the corresponding vector from the image database is evaluated. After a translation of one step in the selected direction, the distance between the compared vectors is evaluated again, and so on. The evaluation results are stored for the following procedures. When the scanning of the current database image is finished for decomposition layer 0, the search continues in a similar way with the next database image, for the same decomposition layer, until all images are processed. When the analysis for layer 0 is finished, the database images that are close enough to the queried image model for this layer are separated into a special group. In particular, if no images meet the requirements, this group could be empty. The described operations are performed in a similar way for decomposition layer 1, for the separated images only. In the consecutive ISP layers the number of images that are close enough to the queried one becomes smaller. For empty groups an additional search should be performed, for which the next model (different view angle, lighting, etc., if such exists) is introduced through feedback, and the described operations are performed again. The parameters of the query image ISP model are defined by the tuning NN. The block diagram shows two possible outputs: a detected closest image from the database (Out 1) and a missing similar image (Out 2).

3 Conclusions
The method for image representation described above opens new abilities for the development of more efficient and accurate algorithms for query-by-content in image databases, i.e. it permits the development of interactive systems that allow the user to define various queries of the kind: "find the N most similar images which best suit the chosen set of image properties". A significant influence is expected in application areas concerning image coding, image archiving, image transmission systems, distance learning, remote medical diagnostics and patient monitoring, etc. The basic advantages of the new approach for adaptive cognitive image representation are:
- The ability to obtain a minimum description, retaining the restored image quality;
- The enhanced image data mining in large databases, using the consecutive layers of the spectrum pyramid;
- The relatively low computational complexity;
- The ability to obtain an RST-invariant model representation for the queried group of visual objects (images);
- The automatic tuning of the models' parameters for the queried objects depending on their belonging to one or another image class, etc.;
- The model creation, which involves learning with neural networks, resulting in high flexibility and permitting easy tuning.

Acknowledgement
This paper was supported by the National Fund for Scientific Research of the Bulgarian Ministry of Education and Science (Contract VU-I 305/2007).

References
1. R. Gonzalez, R. Woods, Digital Image Processing, Prentice Hall, 2001.
2. A. Gersho, R. Gray, Vector Quantization and Signal Compression, Kluwer AP, 1992.
3. S. Bow, Pattern Recognition and Image Preprocessing, Marcel Dekker, NY, 2002.
4. S. Mitra, T. Acharya, Data Mining: Multimedia, Soft Computing and Bioinformatics, John Wiley, 2003.
5. Z. Lu, D. Li, H. Burkhardt, Image retrieval based on RST-invariant features, IJCSNS, Vol. 6, No. 2A, Feb. 2006, pp. 169-174.
6. V. Goyal, M. Vetterli, N. Thao, Quantization of overcomplete expansions, Proc. IEEE Data Compression Conf., Utah, USA, 1995, pp. 13-22.
7. M. Do, M. Vetterli, Contourlets, in Beyond Wavelets, J. Stoeckler, G. Welland (eds.), Academic Press, New York, 2002.
8. D. Zang, G. Sommer, Algebraically Extended 2D Image Representation, 17th Intern. Conf. on the Application of Computer Science and Mathematics in Architecture and Civil Engineering, Germany, July 2006, pp. 1-10.
9. M. Mancas, B. Gosselin, B. Macq, Perceptual Image Representation, EURASIP Journal on Image and Video Processing, 2007, pp. 1-9.
10. W. Grosky, R. Jain, R. Mehrotra (eds.), The Handbook of Multimedia Information Management, Prentice Hall, 1997.
11. S. Siggelkow, Improvement of histogram-based image retrieval and classification, ICPR, Quebec, Canada, 2002, pp. 367-370.
12. X. Liu, D. Wang, Texture classification using spectral histograms, IEEE Trans. on Image Processing, Vol. 12, No. 6, 2003, pp. 661-670.
13. X. Zhou, T. Huang, Relevance feedback in image retrieval: A comprehensive review, ACM Multimedia Systems Journal, Special Issue on CBIR, 8(6), 2003, pp. 536-544.
14. A. Jacquin, Fractal image coding: a review, Proceedings of the IEEE, Vol. 81, Oct. 1993, pp. 1451-1466.
15. D. Zheng, J. Zhao, A. Saddik, RST Invariant Digital Image Watermarking Based on Log-Polar Mapping and Phase Correlation, IEEE Trans. on CSVT, Sept. 2003, pp. 1-14.
16. S. Mallat, Z. Zhang, Matching pursuits with time-frequency dictionaries, IEEE Trans. Signal Processing, Vol. 41, Dec. 1993, pp. 3397-3415.
17. J. Flusser, T. Suk, Pattern recognition by affine moment invariants, Pattern Recognition, Vol. 26 (1), 1993, pp. 167-174.
18. O. Nestares, R. Navarro, J. Portilla, A. Tabernero, Efficient spatial-domain implementation of a multiscale image representation based on Gabor functions, J. Electronic Imaging, Vol. 7, 1998, pp. 166-173.
19. R. Kountchev, V. Haese-Coat, J. Ronsin, Inverse Pyramidal Decomposition with multiple DCT, Signal Processing: Image Communication, Vol. 17, Issue 2, Feb. 2002, pp. 201-218.
20. R. Kountchev, S. Rubin, Image Compression with Adaptive Inverse Difference Pyramid, Proc. of the 6th World Multiconference SCI'02, Vol. 16, Computer Science III, Orlando, USA, July 14-18, 2002, pp. 412-416.
21. R. Kountchev, R. Kountcheva, Image Representation with Reduced Spectrum Pyramid, KES-IIMSS-08, Athens, Greece, July 9-11, 2008 (in press).

[Fig. 2 flowchart: Input; for layers 1, 2, …, n-1: queried image(s) decomposition, features extraction, model evaluation, "Good enough?" (N: NN tuning; Y: next layer); layer n: queried image(s) decomposition, features extraction, model evaluation, final model description; Out]
Fig. 2. Creation of the cognitive image model.
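The layer-by-layer loop of Fig. 2 (decomposition, features extraction, model evaluation, "Good enough?", tuning) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the text obtains the coefficients with the Fourier-Mellin transform and lets a back-propagation NN select them, whereas this sketch uses an orthonormal 2D DCT and replaces the NN with a hypothetical rule that enlarges the retained "k x k" coefficient set until a per-layer error target is met. The names `approximate_layer`, `build_model` and `targets` are illustrative.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix (n x n)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c


def approximate_layer(residual: np.ndarray, sub: int, keep: int):
    """Approximate every sub x sub tile with its keep x keep low-frequency DCT
    coefficients; return the approximation and the layer's feature vector."""
    c = dct_matrix(sub)
    approx = np.zeros_like(residual)
    features = []
    size = residual.shape[0]
    for r in range(0, size, sub):
        for s in range(0, size, sub):
            spec = c @ residual[r:r + sub, s:s + sub] @ c.T
            spec[keep:, :] = 0.0
            spec[:, keep:] = 0.0
            features.append(spec[:keep, :keep].ravel())
            approx[r:r + sub, s:s + sub] = c.T @ spec @ c
    return approx, np.concatenate(features)


def build_model(image: np.ndarray, targets, keep: int = 1):
    """Layer-by-layer model creation with a tuning loop (cf. Fig. 2).

    targets[p] is the largest acceptable model error (MSE) after layer p.
    While the error is too big, the hypothetical tuning step enlarges the
    retained coefficient set; it stands in for the back-propagation NN.
    """
    size = image.shape[0]
    residual = image.astype(float)
    model, kept = [], []
    error = float(np.mean(residual ** 2))
    for p, target in enumerate(targets):
        sub = size >> p                                   # sub-block size
        k = keep
        while True:
            approx, features = approximate_layer(residual, sub, k)
            error = float(np.mean((residual - approx) ** 2))  # model evaluation
            if error <= target or k >= sub:                   # "Good enough?"
                break
            k += 1                                            # tuning stand-in
        model.append(features)                                # features extraction
        kept.append(k)
        residual = residual - approx                          # input to next layer
    return model, kept, error
```

For example, `build_model(block, [400.0, 150.0, 50.0])` on a 16 x 16 block returns the feature vectors of three layers, the coefficient-set size each layer needed and the final model error. A learned selector could replace the `k += 1` step without changing the rest of the loop.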



[Fig. 3 block diagram: image database (image 1, image 2, …, image N) with ISP 1 … ISP N (levels 0 … n); query image ISP model; similarity estimation for level 0, level 1, …, level n, each followed by "good result?"; "more models?"; NN tuning; outputs Out 1 and Out 2]
Fig. 3. Block diagram of the cognitive image retrieval.
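The coarse-to-fine retrieval of Fig. 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system: query windows are square with a power-of-two side, the per-layer ISP vectors use an orthonormal 2D DCT instead of an RST-invariant transform, the distance is Euclidean, and each layer has one hand-set threshold. The feedback to further query models ("more models?") and the NN tuning are left out, and `isp_features` and `cascade_search` are hypothetical names.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix (n x n)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c


def isp_features(block: np.ndarray, layers: int, keep: int):
    """Per-layer ISP feature vectors of a square block: the keep x keep DCT
    coefficients of every sub-block of the difference image."""
    size = block.shape[0]
    residual = block.astype(float)
    vectors = []
    for p in range(layers):
        sub = size >> p
        c = dct_matrix(sub)
        layer = []
        for r in range(0, size, sub):
            for s in range(0, size, sub):
                spec = c @ residual[r:r + sub, s:s + sub] @ c.T
                spec[keep:, :] = 0.0
                spec[:, keep:] = 0.0
                layer.append(spec[:keep, :keep].ravel())
                residual[r:r + sub, s:s + sub] -= c.T @ spec @ c
        vectors.append(np.concatenate(layer))
    return vectors


def cascade_search(query, database, thresholds, keep=2, step=1):
    """Layer p is evaluated only for the images that passed layer p - 1.
    Returns the index of the closest image ("Out 1") or None as soon as a
    group becomes empty ("Out 2")."""
    q_vectors = isp_features(query, len(thresholds), keep)
    h, w = query.shape
    candidates = list(range(len(database)))
    best = {}
    for p, threshold in enumerate(thresholds):
        group = []
        for idx in candidates:
            image = database[idx]
            # slide the window over the whole database image
            distances = [
                np.linalg.norm(isp_features(image[r:r + h, s:s + w], p + 1, keep)[p]
                               - q_vectors[p])
                for r in range(0, image.shape[0] - h + 1, step)
                for s in range(0, image.shape[1] - w + 1, step)
            ]
            best[idx] = min(distances)          # best window position
            if best[idx] <= threshold:
                group.append(idx)
        candidates = group
        if not candidates:
            return None
    return min(candidates, key=best.get)
```

The window scan here starts at array index (0, 0), the upper left corner in NumPy's row order, while the text starts from the lower left pixel; since only the minimum distance per image is kept, the scan order does not change the result. Restricting each later layer to the surviving group is what makes the search cheaper than a full comparison at the finest layer.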
