Similarity-Measure-Based VLSI Searching System

Similarity-Measure-Based VLSI Searching System for MPEG-7

Huaiyu Xu
Department of Electronics Engineering, The University of Tokyo

Yoshio Mita
Department of Electronic Engineering, The University of Tokyo

Tadashi Shibata
Department of Frontier Informatics, The University of Tokyo

ABSTRACT

In this paper, a general-purpose similarity-measure searching hardware system for MPEG-7 is proposed using VLSI Associative Processor (AP) chips. The system downloads MPEG-7 data from large multimedia databases on the Internet into the on-chip cache memory of an AP and then performs a similarity-based search in an extremely short time. Although similarity-based search is computationally very expensive in software, latency-free search becomes possible owing to the highly parallel maximum-likelihood search architecture of the AP chip. In this study, hardware-accelerated searching on the client’s computer is introduced. In this solution, an AP chip is embedded in a personal computer, so that a user can enjoy a comfortable interactive preference search with a response time of less than 10^-3 s. As an example, a house design e-commerce application, in which data is transferred in the MPEG-7 vector format, has been developed. Using a prototype AP chip implemented in a field-programmable gate array (FPGA), the effectiveness of such application systems has been demonstrated.

KEYWORDS: similarity-based search, MPEG-7, Associative Processor, E-commerce, house design, FPGA

INTRODUCTION

With growing access to large amounts of multimedia content, how easily the desired content can be found, retrieved, accessed and filtered becomes very important.

In order to enable both humans and machines to generate and understand multimedia descriptions, the MPEG-7 standard has been developed to provide a rich set of standardized tools for describing multimedia content [1]. Thanks to the MPEG-7 standard, it becomes possible to develop a universal search engine, based on a similarity measure, that is applicable to any sort of multimedia content. Such a search engine will play an essential role in searching MPEG-7 multimedia data. However, in order to find the results that best match a query, the similarities between the query and all template vectors, which represent multimedia contents in the MPEG-7 format, must be calculated. This computation is very time-consuming when executed in software. How quickly and efficiently one can find the necessary information is a key issue, so an efficient similarity search is essential. Several algorithms that take soft similarity values into account have been developed to reduce the computation time; two are known to be successful: k-d trees [2] and case retrieval nets [3]. However, both are inefficient if the database is updated often [4], whereas frequent database updates are a key feature of on-line applications. The authors claim that introducing a similarity-computing integrated circuit (Associative Processor: AP) can solve the problem; namely, a very fast similarity-measure-based search can be achieved even if the database is updated very frequently. In our research group, various VLSI similarity-search APs have been developed in both digital [5] and analog-digital-merged [6,7] technologies. The optimization of the AP architecture for multimedia applications was studied in our previous work [8]. Two solutions for similarity-based search engines on the WWW were then proposed [9]. In the present work, we focus on how to apply these processors to MPEG-7.
The aim of this paper is to develop a general-purpose similarity-measure-based VLSI searching system for MPEG-7 employing the AP chip as a key hardware component. Taking one of the example applications anticipated in the MPEG-7 white paper [1], a house design system using MPEG-7 standardized vectors was developed and demonstrated. The AP prototype implemented on an FPGA (field-programmable gate array) running at 30 MHz reduced the similarity computation time for 40,000 vectors by a factor of 10^3 compared with the software calculation running on a 600 MHz Pentium III PC.

MPEG-7 DATA SEARCHING SCHEME

Let us imagine that in the near future all multimedia information is described with the MPEG-7 standard. There will be a large demand for efficient search engines for a variety of contents such as music, real estate, and house design [1]. In our previous work [9], we proposed two types of architecture for such search engines: the server-based solution and the client-based solution. In this paper, we argue that the client-based solution is the most suitable for this application. In the client-based solution, AP chips are embedded in client PCs as shown in Fig. 1. While the user is specifying what he or she wants to search for, the client PC visits many databases on the Internet and downloads template vectors in the standard format. Since the human response time for specifying a query is on the order of seconds, a sufficient amount of data is expected to be stored in the cache memory of the AP. The user then performs similarity-measure searches as many times as needed until he or she is satisfied. Responses are returned with no latency because of the hardware-parallel search, allowing the user to search comfortably with many preference requirements without the stress of waiting.

[Figure 1: block diagram. Labels: Query, Application, AP chips, WWW, Server 1 ... Server n, Database 1 ... Database n, Top 1 ... Top M.]

Fig. 1 An MPEG-7 client-based solution.

In Fig. 1, the multimedia information is supplied by the servers in the MPEG-7 vector format [10]. A vector should be declared in eXtensible Markup Language (XML). If the application is house design, a house can be described as follows:

198 20 25 16 13 10 18
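As a rough illustration of how a client program might read such a vector, the sketch below parses a single XML element into a list of integers. The markup shown (a `HouseDesign` element carrying a `dim` attribute) and the function name `parse_vector` are hypothetical stand-ins chosen for this example; they are not the MPEG-7 schema syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the "dim" attribute and the element layout are
# illustrative assumptions, not taken from the MPEG-7 schema.
SAMPLE = '<HouseDesign dim="7">198 20 25 16 13 10 18</HouseDesign>'


def parse_vector(xml_text):
    """Parse one vector element into (element name, list of int values)."""
    elem = ET.fromstring(xml_text)
    # Element values are whitespace-separated integers in the element text.
    values = [int(token) for token in (elem.text or "").split()]
    # If a dimension is declared, check it against the number of values read.
    declared = int(elem.get("dim", len(values)))
    if declared != len(values):
        raise ValueError(
            f"declared dimension {declared} but found {len(values)} values"
        )
    return elem.tag, values


name, vector = parse_vector(SAMPLE)
```

Checking the declared dimension at parse time is one simple way to reject malformed vectors before they are loaded into the search cache.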

There are 7 elements in this ‘HouseDesign’ vector. In this example, the values refer to the total area, dining room, main room, second room, third room, kitchen and family room, respectively. Several working groups are now trying to standardize the contents and the order of elements for their applications. The important point is that the information from every server has the same format. A client program can therefore visit many servers, download the standard-format data and perform searching, which means that a universal AP-based search engine can be realized.

Figure 1 shows a typical similarity-measure-based VLSI searching system architecture. During the negotiation between the customer and the PC, the vectors relevant to the query are downloaded into the AP from the different database servers. If the number of candidates exceeds the maximum that the AP can hold, the system asks for more detailed conditions to reduce the number. For example, if a user wants to search for a three-bedroom house design, only the information relevant to three-bedroom designs is downloaded. The AP then calculates the similarities and returns the Top-M most similar results. The similarity is measured by the Manhattan distance; other similarity measures are also applicable. Owing to the highly parallel maximum-likelihood search architecture of the AP chip, latency-free search is possible. Although downloading the information from the servers may take time, it is worthwhile because the customer needs to repeat the search many times while freely changing the search policy.

SIMILARITY-MEASURE-BASED VLSI SEARCHING SYSTEM

[Figure 2: block diagram. Labels: WWW, MPEG-7 vector parser, Query, Data Processing & Driver, TM 1 ... 100, MD, WTA, Buffer memory for distance & comparator, Top 1~M, Top M, VLSI similarity search engine, Result analyzer, Analyzed results, Application, User.]

Fig. 2 The scheme of the similarity-measure-based VLSI searching system.

The scheme of the similarity-measure-based VLSI searching system is shown in Fig. 2. When MPEG-7 multimedia information is downloaded from the Internet, the MPEG-7 vector parser analyzes the data and sends the preprocessed data to the cache memory of the AP. The AP chip stores all the information as template vectors. The authors have already studied several AP architectures [9]; the one best suited to the MPEG-7 client-based solution is shown in Fig. 2. In the AP, one Manhattan distance (MD) calculation module and one winner-take-all (WTA) module are provided for 100 template memories (TM). A buffer memory that stores distance values and a comparator are added for sequential winner search. One MD module calculates the Manhattan distances between a query vector and 100 template vectors. The Top-M maximum-likelihood results are analyzed and returned to the user. In the designed chip [8,9], the dimension of a vector is scalable to any size up to 128. One AP chip can hold 4640 vectors when a vector has 128 elements. If the number of elements in a vector is smaller, the number of storable vectors increases: over 20,000 16-dimensional vectors can be stored in one chip. The AP needs only 4×10^-4 second to return the top 100 results to the customer, since the computation is carried out in parallel in hardware. This speed is fast enough for a single user.

AN EXAMPLE OF SIMILARITY-MEASURE-BASED VLSI SEARCHING SYSTEM FOR MPEG-7

Fig. 3 House design example of the similarity-measure-based VLSI searching system for MPEG-7.

An AP-based versatile house design searching application that takes an individual’s preferences into account has been developed. The system downloads house design information from the Internet in the MPEG-7 vector format. The dimension of the house design vector is 7, and each element is an 8-bit integer. Downloading is performed during the negotiation between the customer and the PC. After the negotiation, the system searches the candidates with the AP and returns to the customer, in real time, the top-M analyzed house design ideas that best match the query. Figure 3 shows the house design example of the similarity-measure-based VLSI searching system for MPEG-7.

The house design search was demonstrated on a hardware system (left-hand side of Fig. 4). The demonstration system consists of an interface board and an AP implemented on a 400k-gate FPGA (ALTERA EP20K400EFC672-1). The AP was designed in Verilog-HDL and mapped to the FPGA using the Quartus design software. The same search function was also implemented purely in C++ for comparison, and the computation times of the FPGA and the software were measured. The clock frequency was 30 MHz for the AP and 600 MHz for the Pentium III processor running the software.
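The software side of this comparison amounts to an exhaustive scan: compute the Manhattan distance from the query to every template vector and keep the M smallest. The sketch below illustrates that baseline in Python rather than in the authors' C++ (which is not reproduced in this paper). The function names, the query, and the second and third template vectors are made up for illustration; only the first template is the house design vector quoted earlier.

```python
import heapq


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length integer vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def top_m(query, templates, m):
    """Return the m closest templates as (distance, index) pairs, nearest first."""
    scored = ((manhattan(query, t), i) for i, t in enumerate(templates))
    return heapq.nsmallest(m, scored)


# Illustrative 7-element house design vectors (8-bit integer values).
templates = [
    [198, 20, 25, 16, 13, 10, 18],
    [150, 18, 20, 14, 12, 9, 15],
    [210, 22, 26, 16, 14, 10, 20],
]
query = [200, 20, 24, 16, 12, 10, 18]

best = top_m(query, templates, 2)
# best == [(4, 0), (18, 2)]: template 0 is nearest, then template 2
```

Every query touches every element of every template, so the software cost grows with the number of templates times the vector dimension, which is the work the AP distributes across its parallel MD and WTA modules.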

Fig. 4 A prototype board and layout of the AP.

To show the feasibility of the AP in real VLSI, a layout has been designed in a 0.6-µm triple-metal CMOS technology. The layout is shown on the right-hand side of Fig. 4. In the layout, one MD module and one WTA module are provided for 10 template memories (TM). The chip holds 256 vectors of 128 dimensions. In order to increase the number of template vectors on one chip, the 32-KByte on-chip SRAM was designed manually at the layout level to fit the maximum amount of memory into the limited area. The MD and WTA modules were designed in Verilog-HDL; Synopsys Design Analyzer synthesized the design, and Apollo then generated the layout. The chip size is 9 mm × 9 mm.

If the top 100 results are searched among 40,000 template vectors, a number that would be necessary to cover a sufficient range of design ideas, the search takes 4.2 seconds in software. This is intolerable for a user who searches many times with different policies. In contrast, the AP for the MPEG-7 client-based solution can return the top 100 results in less than 10^-3 second. The AP is over 10,000 times faster than the software at a clock frequency of only 30 MHz. Therefore, a similarity-measure-based VLSI searching system for MPEG-7 becomes feasible by employing AP chips.

CONCLUSIONS

A real-time similarity-measure-based searching system for MPEG-7 becomes possible by using the Associative Processor. This is important for the growing number of MPEG-7 applications on the Internet. Since the Associative Processor has a very flexible architecture in terms of the number of elements and weighted similarity calculation, it can be used in every kind of MPEG-7 search application. The MPEG-7 house design searching application has been developed and successfully demonstrated. A performance comparison using the FPGA prototype shows that searching more than 10^4 times faster than the software calculation is possible for over 40,000 template vectors using the AP at a clock frequency of 30 MHz, thus enabling high-performance and low-power system solutions.

REFERENCES

[1] J. Martinez, “Overview of the MPEG-7 Standard (version 5.0),” ISO/IEC JTC1/SC29/WG11 N4031, March, Singapore (2001).
[2] S. Wess, K.-D. Althoff and G. Derwand, “Using k-d trees to improve the retrieval step in case-based reasoning,” Topics in Case-Based Reasoning. Springer Press (1994).
[3] M. Lenz and H.-D. Burkhard, “Case Retrieval Nets: Basic Ideas and Extensions,” In G. Gorz, S. Holldobler (Eds.). KI-96: Advances in Artificial Intelligence. Springer Press (1996).
[4] W. Wilke, “CBR and electronic commerce on the WWW,” Invited talk at the International Conference on Case-Based Reasoning (ICCBR-97), Providence, Rhode Island (1997).
[5] A. Nakata, T. Shibata, M. Konda, T. Morimoto and T. Ohmi, “A fully-parallel vector quantization processor for real-time motion picture compression,” IEEE Journal of Solid-State Circuits, Vol. 34, No. 6, pp. 822-830 (1999).
[6] M. Ogawa and T. Shibata, “Nmos-based Gaussian-Elements-Matching Analog Associative Memory,” European Solid-State Circuit Conference (ESSCIRC ’01), Sept. 18-20, Villach, Austria, pp. 272-275 (2001).
[7] T. Yamasaki, K. Yamamoto and T. Shibata, “Analog Pattern Classifier with Flexible Matching Circuitry Based on Principal Axis Projection Vector Representation,” ESSCIRC ’01, pp. 212-215 (2001).
[8] H. Xu, Y. Mita and T. Shibata, “Optimizing Associative Processor Architecture for Intelligent Internet Search Applications,” 2001 International Conference on Solid State Devices and Materials (SSDM-2001), September 26-28, Tokyo (2001).
[9] H. Xu, Y. Mita and T. Shibata, “Intelligent Internet Search Applications Based on VLSI Associative Processors,” The 2002 International Symposium on Applications and the Internet (SAINT-2002), January 28-February 1, Nara, Japan (2002).
[10] P. v. Beek, A. B. Benitez, J. Heuer, J. Martinez, P. Salembier, J. Smith and T. Walker, “MPEG-7 Multimedia Description Schemes XM (Version 3.1),” ISO/IEC JTC 29/WG 11/M6155, July, Beijing (2000).