FPGA IMPLEMENTATION OF ADABOOST ALGORITHM FOR DETECTION OF FACE BIOMETRICS

Yu Wei, Xiong Bing, and Charayaphan Chareonsak
School of Electrical and Electronic Engineering, Nanyang Technological University, Nanyang Avenue, Singapore 639798.
E-mail: {[email protected],[email protected]}

ABSTRACT
At present, most of the high-performance arbitrary-shape object detection algorithms are based on statistical methods such as SVM (Support Vector Machine) and AdaBoost. The AdaBoost method for frontal-view face detection was proposed by Viola [1] in 2001. This method offers impressive accuracy at high speed. Thus, many medical applications requiring biometric and general object detection are good candidates for the AdaBoost algorithm. However, for high-resolution images and real-time video, the algorithm still poses a high computation load that cannot be met by average personal computers. In this paper, an FPGA (Field Programmable Gate Array) hardware design for real-time face detection based on the AdaBoost algorithm is described. Being fully programmable, hardware design using FPGA offers a much shorter development time and enables quick verification of DSP algorithms.

1. INTRODUCTION
Face detection is one of the most active research topics in the field of bioinformatics due to its potential applications. Some of the early research works used low-level features such as color, motion, and texture. However, the complexity of real-world scenes limits their success. Soft computing methods such as neural networks were also studied [2]. Currently, the best reported face detection techniques are mostly based on statistical methods such as SVM and AdaBoost [1]. The hardware implementation of the kernel function in the SVM method poses many challenges, and a linear SVM without a kernel function does not offer the high performance found in non-linear SVM. A software implementation of the AdaBoost algorithm is efficient, and near real-time operation can be achieved using a personal computer. Frontal face images at 384×288 resolution were reported to be successfully detected at 15 frames per second using a 700 MHz PIII. There are openly available resources, such as OpenCV [5], offering libraries that can be used for research and development. However, when only software is used, almost all of the CPU computing power is needed. Besides, for high image resolutions or real-time video applications, it is not practical to apply the AdaBoost algorithm in software.

__________________________________________ 0-7803-8665-5/04/$20.00 ©2004 IEEE

Thus, a hardware implementation of the AdaBoost algorithm will be very useful. Dedicated embedded hardware for face biometric detection shows great potential for various applications [9]. Techniques using neural networks [6] and skin color detection [8] have been applied in embedded hardware for face detection. However, very little work has been done on hardware design for the AdaBoost algorithm. It will be shown in this paper that an efficient hardware architecture implementing the AdaBoost algorithm can be realized using FPGA. A system-level FPGA design is described. The rest of the paper is organized as follows. Section 2 describes the face detection technique using AdaBoost and analyzes its computational complexity. Section 3 describes the FPGA architecture for the algorithm. The experimental results are given in Section 4, followed by the conclusion in Section 5.

Fig. 1. General block diagram of a face detection system based on statistical learning method.

2. THEORY OF ADABOOST

2.1. Detection Based on AdaBoost Technique
A general block diagram of a face detection system based on a statistical learning method is shown in Fig. 1. Two procedures, training and testing, are involved. During the testing procedure (the detection), the input image is first scanned exhaustively in a cascade manner at different scales. The generated sequence of scanned windows, typically sized between 16×16 and 48×48, is then passed through the preprocessing unit, which performs basic enhancement operations such as histogram equalization and brightness and contrast adjustment. Then, the feature vectors are extracted and used as the input of the classifier, which judges whether each window contains a face. The classifier is obtained from the training procedure, which may use different learning methods. In this paper, we adopt the AdaBoost method as introduced in [1].

2.4. Cascade classification in image
In object detection based on classification methods, a pyramid framework is commonly used. The calculation of the pyramid poses a heavy computation load. For example, consider an input image with a resolution of x-by-y, and let the size of the normalized image for classification be x0-by-y0. Let the scaling factor be r, and let Sx and Sy be the steppings in the x- and y-directions respectively. The total number of sub-images, T, generated from the input image is given by Equation (3).

2.2. Wavelet presentation
The set of Haar wavelets used as the input features to the cascaded classifiers is shown in Fig. 2 [3]. In our work, Haar wavelet features are computed using the integral image [1] to improve computation efficiency. The integral image value at a pixel is defined as the sum of all pixel values above and to the left of that pixel, including the pixel itself. Based on the computed integral image, the Haar wavelet features can be calculated efficiently.

Fig. 2. The set of Haar wavelets for AdaBoost features.

2.3. Classifier for sub-image
Using Haar wavelet features, a sub-image can be classified by computing a weak classifier $h_j$, which identifies the sub-image as a face or non-face pattern according to the feature value $f_j$:

$$h_j(x) = \begin{cases} 1 & \text{if } p_j f_j < p_j \theta_j \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

where $x$ is a sub-image, $p_j$ is the parity indicating the direction of the inequality sign, $f_j$ is the $j$th feature value, and $\theta_j$ is the trained threshold. The AdaBoost method combines many weak classifiers (each based on a single Haar wavelet feature) into a strong classification function [1]. The final strong classifier obtained from the training procedure is:

$$h(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

where $x$ is a sub-image, $T$ is the preset number of weak classifiers, $h_t(x)$ is the classifier with the lowest error in the $t$th iteration of the AdaBoost training process, and $\alpha_t$ is the corresponding weight for $h_t(x)$. Although AdaBoost offers a very efficient feature selection technique, the number of features selected from the over-complete set of Haar wavelet features is often too large for a practical real-time application (see Section 2.4). However, the algorithm is suitable for parallel implementation in hardware, and this is the motivation for our work.
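The two decision rules above can be illustrated with a minimal Python sketch. The function and parameter names are hypothetical, and the strong classifier follows the form given in [1], where a sub-image is accepted when the weighted vote reaches at least half of the total weight. The feature values are assumed to have been computed already.

```python
def weak_classify(feature_value, parity, threshold):
    """Weak classifier of Equation 1: returns 1 (face) or 0 (non-face).

    parity is +1 or -1 and selects the direction of the inequality.
    """
    return 1 if parity * feature_value < parity * threshold else 0


def strong_classify(feature_values, parities, thresholds, alphas):
    """Strong classifier of Equation 2 in the form used in [1].

    Each weak classifier votes with weight alpha_t; the sub-image is
    accepted when the weighted vote is at least half the total weight.
    """
    vote = sum(
        alpha * weak_classify(f, p, th)
        for f, p, th, alpha in zip(feature_values, parities, thresholds, alphas)
    )
    return 1 if vote >= 0.5 * sum(alphas) else 0


if __name__ == "__main__":
    # Illustrative numbers only; they are not trained values.
    features = [3.0, 9.0, -2.0]
    parities = [1, -1, 1]
    thresholds = [5.0, 4.0, -1.0]
    alphas = [0.8, 0.5, 0.3]
    print(strong_classify(features, parities, thresholds, alphas))
```

Because every weak classifier depends only on its own feature value, the terms of the weighted sum can be evaluated independently, which is what makes the rule attractive for parallel hardware.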



$$T = \frac{xy}{S_x S_y} \cdot \frac{r^2 - \min\left(\dfrac{x_0^2}{x^2}, \dfrac{y_0^2}{y^2}\right)}{r^2 - 1} \tag{3}$$
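As a rough numerical check of the sub-image count, the sketch below evaluates one reading of Equation (3), namely T = xy/(Sx·Sy) · (r² − min(x0²/x², y0²/y²)) / (r² − 1), and compares it with an explicit sum over pyramid levels. Both the closed form as written here and the level-by-level estimate, which ignores border effects, are assumptions made for illustration; the function names are hypothetical.

```python
def subimage_count_closed_form(x, y, x0, y0, r, sx, sy):
    """Closed-form estimate of the number of sub-images (assumed reading of Eq. 3)."""
    ratio = min((x0 / x) ** 2, (y0 / y) ** 2)
    return (x * y) / (sx * sy) * (r ** 2 - ratio) / (r ** 2 - 1)


def subimage_count_by_level(x, y, x0, y0, r, sx, sy):
    """Sum the approximate window count at each pyramid level.

    Level k shrinks the image by r**k; scanning stops once the scaled
    image is smaller than the x0-by-y0 classification window.
    Border effects are ignored, so this is only an estimate.
    """
    total = 0.0
    k = 0
    while x / r ** k >= x0 and y / r ** k >= y0:
        total += (x / r ** k) * (y / r ** k) / (sx * sy)
        k += 1
    return total


if __name__ == "__main__":
    args = dict(x=256, y=256, x0=20, y0=20, r=1.2, sx=1, sy=1)
    print(round(subimage_count_closed_form(**args)))
    print(round(subimage_count_by_level(**args)))
```

With a stepping of one pixel, both estimates exceed 100,000 sub-images for a 256×256 input, and increasing Sx and Sy reduces the count in proportion to their product.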

Typical values of x0 and y0 are 20, and the values of Sx and Sy are in the range of 1 to 5 pixels. When r = 1.2 and the input image size is 256×256, (3) gives a T of more than 100,000 sub-images. Frequently, the parameters are adjusted to compromise between detection accuracy and processing speed. However, based on [1], the accuracy of face detection might be reduced by as much as 2% when r is increased. A cascade technique has to be applied to handle the huge number of sub-images. Because most of the sub-images in typical images are non-face patterns, the cascade technique attempts to discard as many of these windows as possible. This is achieved by using classifiers with a small number of features in the first stages. These first few stages, running at high speed, produce a much smaller number of possible face regions for further processing by the more complicated classification stages.

3. PROPOSED FPGA IMPLEMENTATION FOR FACE DETECTION BASED ON ADABOOST
In our work, Xilinx System Generator™ [4], Xilinx ISE™, and MATLAB were used for FPGA design. The general block diagram describing our hardware architecture is shown in Fig. 3. The synchronization and memory controller, which will not be discussed here, performs the general timing control of the whole circuitry. VRAM refers to video RAM. As in a typical hardware realization of a DSP algorithm, a tradeoff between hardware resources and processing speed is needed. In our FPGA design, we take into account the parallel nature of the feature classifiers and the hardware saving obtained through cascading.

3.1. Scan controller subsystem
A more detailed block diagram of the scan controller, which performs the pyramid scaling, is shown in Fig. 4. Starting from the original image size of 120×120, the scan controller generates the pyramid of images at 96×96, 72×72, 48×48 and 24×24 resolutions. The image re-sampling is implemented using a 5×5 low-pass convolution filter kernel. An inverse mapping technique is used to compensate for the effect of pixel address decimation. Using the original image size of 120×120 and five scales, the total number of sub-images to be scanned and classified in one 120×120 image is 17,281. This relatively large number of sub-images emphasizes the need for hardware acceleration.

Fig. 3. General block diagram of the FPGA hardware for face detection based on AdaBoost algorithm.

3.3. Face verification classifier
Here, we first describe the calculation of Haar features and then present our realization of the Haar feature computation using an efficient hardware architecture. The calculation of a Haar feature requires the sum of the pixel values inside a rectangle, which is obtained from the integral image. The sum for every rectangle is calculated using Equation 5 (see Fig. 5 (a)):

$$\mathrm{Sum} = a + d - b - c \tag{5}$$

where a, b, c, and d are the integral image values at the corresponding locations shown in Fig. 5 (a). Fig. 5 (b) shows the computation of the Haar feature for the leftmost Haar wavelet in Fig. 2, for which a difference needs to be computed:

$$\mathrm{Difference} = \mathit{coeff} \times \mathrm{Sum}_2 - \mathrm{Sum}_1 \tag{6}$$

where

$$\mathrm{Sum}_1 = a + d - b - c \tag{7}$$

and

$$\mathrm{Sum}_2 = e + h - g - f \tag{8}$$

The constant coeff is in the range 2 to 3.
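The rectangle sum and the weighted difference can be sketched in a few lines of Python. The sketch assumes an integral image padded with a leading row and column of zeros, so that the four corner reads need no special cases at the image border, and it assumes for illustration that the second rectangle is the right half of the first with coeff = 2. Neither the geometry of Fig. 5 (b) nor the helper names are taken from the paper.

```python
def padded_integral(img):
    """Integral image with an extra zero row and column at the top and left."""
    height, width = len(img), len(img[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum = a + d - b - c, with a, b, c, d the four corner values."""
    a = ii[top][left]                    # top-left
    b = ii[top][left + width]            # top-right
    c = ii[top + height][left]           # bottom-left
    d = ii[top + height][left + width]   # bottom-right
    return a + d - b - c


def haar_difference(ii, rect1, rect2, coeff):
    """Difference = coeff * Sum2 - Sum1 for two rectangles (top, left, h, w)."""
    return coeff * rect_sum(ii, *rect2) - rect_sum(ii, *rect1)


if __name__ == "__main__":
    # 4x4 test image: dark left half, bright right half.
    img = [[0, 0, 10, 10] for _ in range(4)]
    ii = padded_integral(img)
    whole = (0, 0, 4, 4)
    right_half = (0, 2, 4, 2)
    print(haar_difference(ii, whole, right_half, coeff=2))
```

With coeff = 2 and the second rectangle covering half of the first, the difference equals the sum over that half minus the sum over the other half, which is the response of a two-rectangle edge feature.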

Fig. 4. Block diagram of scan controller shown in Fig. 3.

3.2. Integral image generator
The integral image is defined as the sum of all pixel values above and to the left of a pixel, including the pixel itself. Equation 4 represents our hardware implementation of the integral image generator, where i(x, y) is the image intensity at pixel location (x, y), s(x, y) is the running sum of intensities accumulated along y, and ii(x, y) is the integral image at location (x, y):

$$\begin{aligned} s(x, y) &= s(x, y-1) + i(x, y) \\ ii(x, y) &= ii(x-1, y) + s(x, y) \end{aligned} \tag{4}$$
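A direct software rendering of the two recurrences in Equation 4 is sketched below. It assumes the boundary values s(x, −1) = 0 and ii(−1, y) = 0 and stores the image as a list of rows indexed as img[y][x]; these conventions and the function name are choices made for the illustration, not details of the hardware design.

```python
def integral_image(img):
    """Compute the integral image with the recurrences of Equation 4.

    s[y][x]  accumulates pixel values along y for a fixed x.
    ii[y][x] accumulates s along x, giving the sum of all pixels
    above and to the left of (x, y), inclusive.
    """
    height, width = len(img), len(img[0])
    s = [[0] * width for _ in range(height)]
    ii = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            s_prev = s[y - 1][x] if y > 0 else 0      # s(x, y-1)
            ii_prev = ii[y][x - 1] if x > 0 else 0    # ii(x-1, y)
            s[y][x] = s_prev + img[y][x]
            ii[y][x] = ii_prev + s[y][x]
    return ii


if __name__ == "__main__":
    print(integral_image([[1, 2], [3, 4]]))
```

Each output value needs only one addition per recurrence and the previously computed neighbors, so the whole integral image is produced in a single pass over the pixels.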

In a typical software implementation of the AdaBoost algorithm, the integral image of the whole original image is commonly used, taking advantage of the large amount of RAM available on a PC. In a hardware implementation, however, it is not practical to reserve precious hardware resources for such a large amount of memory. Thus, in our design, the integral image is computed for each and every sub-image (24×24) instead of the original image (120×120). Performing the integral image calculation on the sub-image, instead of the whole image, offers two advantages. The first is a saving in the number of bits in the summation. For example, for a 24×24 sub-image of 8-bit pixels, the maximum value of the integration is 24×24×255 = 146,880, which fits in 18 bits. For the integration of the 120×120 image, the maximum is 120×120×255 = 3,672,000, which requires 22 bits. The other advantage is that the values of the coefficients used in the next stage, the face verification classifier, do not depend on the original image size.
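The maximum integral values quoted above translate into accumulator word lengths as shown in this short check. It assumes unsigned 8-bit pixels with a maximum value of 255; the helper name is hypothetical.

```python
def integral_word_length(width, height, max_pixel=255):
    """Return the largest possible integral value and the unsigned bits it needs."""
    max_sum = width * height * max_pixel
    return max_sum, max_sum.bit_length()


if __name__ == "__main__":
    for size in (24, 120):
        max_sum, bits = integral_word_length(size, size)
        print(f"{size}x{size}: max = {max_sum}, bits = {bits}")
```

The 24×24 sub-image needs an 18-bit unsigned accumulator and the 120×120 image needs 22 bits, so working per sub-image saves four bits in every adder and register that holds an integral value.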

Fig. 5. (a) Computation of a summation, with corner locations a, b, c, d. (b) Computation of the difference of two summations, with corner locations a, b, c, d and e, g, f, h.

Table 1. Coefficients of Haar feature.

Location   a    d    b    c    e       h       g        f
Weight     -1   -1   1    1    coeff   coeff   -coeff   -coeff

By expanding the terms in Equations 7 and 8, we can tabulate a set of weights as shown in Table 1. It is very efficient to implement the calculation of Equations 6 to 8 in hardware using the multiply-accumulate (MAC) technique, which is commonly used in hardware to save resources as well as to increase operating speed. The direct circuit implementation of Equations 6 to 8 needs six adders and two multipliers. Using the MAC technique, only one adder and one multiplier are needed. Our FPGA circuit for the computation of a Haar feature using the MAC technique is shown in Fig. 6 (in System Generator). As can be seen from the figure, some bits from Address1 are used to address Mux1, which selects the appropriate weight according to Table 1. In a typical implementation of face verification using the AdaBoost algorithm, the classifiers are implemented in stages, with as many as 25 stages. The later stages usually contain more Haar wavelets in order to increase the accuracy. We design our FPGA using a 24×24 sub-image and implement three classifier stages. The first, the second, and the last stages contain 9, 16, and 200 Haar wavelets respectively. It can be shown that, using the MAC architecture explained earlier, every Haar feature can be calculated in 8 clock cycles, and the total number of clocks needed for completion of all the stages is 72.
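The MAC formulation can be modeled in software as a single accumulator that consumes one (weight, value) pair per clock cycle, with the weights taken from Table 1. The sketch below is a behavioral model only, not the System Generator circuit. The dictionary stands in for the weight multiplexer, and the function name and the cycle counter are illustrative assumptions.

```python
def haar_feature_mac(corners, coeff):
    """Accumulate the Haar feature one corner per cycle using Table 1 weights.

    corners maps the location names a..h to integral image values.
    Returns the feature value and the number of MAC cycles used.
    """
    weights = {
        "a": -1, "d": -1, "b": 1, "c": 1,
        "e": coeff, "h": coeff, "g": -coeff, "f": -coeff,
    }
    accumulator = 0
    cycles = 0
    for location in ("a", "d", "b", "c", "e", "h", "g", "f"):
        # One multiplier and one adder are reused on every cycle.
        accumulator += weights[location] * corners[location]
        cycles += 1
    return accumulator, cycles


if __name__ == "__main__":
    corners = {"a": 5, "b": 20, "c": 30, "d": 100,
               "e": 20, "g": 40, "f": 100, "h": 180}
    value, cycles = haar_feature_mac(corners, coeff=2)
    sum1 = corners["a"] + corners["d"] - corners["b"] - corners["c"]
    sum2 = corners["e"] + corners["h"] - corners["g"] - corners["f"]
    print(value, cycles, 2 * sum2 - sum1)
```

The accumulated result equals coeff × Sum2 − Sum1 from Equation 6, and the eight weighted terms account for the eight clock cycles per feature.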

In hardware, fixed-point arithmetic is commonly used instead of floating-point arithmetic. The main disadvantages of using floating-point in hardware are its high resource requirements and the high clock frequency it demands. We designed our FPGA using a 16-bit fixed-point numeric format: one bit is used for the sign, 3 bits for the integer part, and 12 bits for the fractional part.
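The 16-bit format described above (1 sign bit, 3 integer bits, 12 fractional bits) can be modeled in Python as follows. Two's-complement storage, round-to-nearest on conversion, truncation after multiplication, and saturation on overflow are assumptions made for this sketch; the paper does not state which conventions the hardware uses.

```python
FRAC_BITS = 12                       # fractional bits
WORD_BITS = 16                       # total word length including sign
SCALE = 1 << FRAC_BITS               # 4096 steps per unit
RAW_MIN = -(1 << (WORD_BITS - 1))    # -32768, represents -8.0
RAW_MAX = (1 << (WORD_BITS - 1)) - 1 # 32767, represents 8 - 2**-12


def saturate(raw):
    """Clamp a raw integer to the 16-bit signed range (assumed overflow policy)."""
    return max(RAW_MIN, min(RAW_MAX, raw))


def to_fixed(value):
    """Convert a real number to the raw 16-bit representation."""
    return saturate(int(round(value * SCALE)))


def to_float(raw):
    """Convert a raw 16-bit value back to a real number."""
    return raw / SCALE


def fixed_mul(a, b):
    """Multiply two raw values and rescale the product to 12 fractional bits."""
    return saturate((a * b) >> FRAC_BITS)


if __name__ == "__main__":
    coeff = to_fixed(2.5)
    print(coeff, to_float(coeff))
    print(to_float(fixed_mul(coeff, to_fixed(1.5))))
```

The format covers the range from −8 to just under 8 with a resolution of 2⁻¹², which accommodates a coeff value between 2 and 3 with room to spare.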

Fig. 6. Improved circuit for computation of Haar feature using MAC technique.

4. SIMULATION AND SYNTHESIS OF THE FPGA FOR ADABOOST ALGORITHM
We carried out FPGA functional simulation using System Generator and MATLAB on the CMU test dataset downloaded from the web site in [7]. The CMU test set includes 472 face patterns; in the first two stages, 390 faces are correctly classified and 92 faces are falsely rejected. Generally, in the AdaBoost algorithm, every stage should have a high true accept ratio. Because the CMU face set is very strict, the highest reported accuracy is only a little over 80%, so our true face accept ratio of over 80% is already considered high. The CMU test set also includes 23,573 non-face patterns, of which 13,240 are filtered out and 10,333 pass the first two stages. This means that over one-half of the non-face sub-images are filtered out. In fact, in typical images where the background is not as complex as in the CMU database, the first two stages can filter out over three-quarters of the sub-images. The last stage contains 200 Haar wavelets. Using the CMU database, half of the face samples are correctly classified and over 85% of the non-face patterns are correctly rejected. Currently, we are working on increasing the size of the image database and on implementing pre-processing to enhance the input face images and improve the classification performance.

After the successful simulation, VHDL code was generated from the System Generator design and then synthesized using Xilinx ISE. The target device was a two-million-gate Virtex-2 FPGA. The hardware resources used in the three stages are shown in Table 2. The figures in parentheses in the first column are the total amounts of resources available in the FPGA. Note that if the size of the sub-image is reduced slightly to 22×22 (instead of 24×24), the Block RAM (BRAM) needed will be reduced by one-half. It is also straightforward to implement some form of circuit sharing in order to save more FPGA resources. Based on the synthesis result, the FPGA can operate at 91 MHz, which is equivalent to 15 video frames (120×120) per second. Thus, the processing speed in terms of sub-images per second is 15×17,281 = 259,215, and our FPGA design offers the computation performance of a high-end PC at a very low cost.

Table 2. Resource requirements for the first two stages and the last stage of the FPGA design.

Resource (available)        First 2 stages   Last stage
Slices (10,752)             945 (8%)         6,721 (62%)
Slice Flip Flops (21,504)   540 (2%)         4,033 (18%)
4 input LUTs (21,504)       1,669 (7%)       12,184 (56%)
bonded IOBs (624)           31 (4%)          29 (4%)
BRAMs (56)                  6 (10%)          50 (89%)
MULT18X18s (56)             3 (5%)           25 (44%)
GCLKs (16)                  1 (6%)           1 (6%)

5. CONCLUSION
In this paper, we present a hardware architecture for a real-time face biometric detection algorithm using FPGA. The face detection algorithm is based on Haar-like features using AdaBoost. In our hardware architecture, all the features obtained from AdaBoost training are computed in parallel, thus accelerating the computation. We present a very efficient implementation of the face classification stage. As a result of using hardware acceleration, the bottleneck hindering real-time realization in software is resolved. The proposed architecture offers a very good speed/cost performance and is thus suitable for real-time face detection. The hardware can be applied in many medical and bioinformatics applications.

6. REFERENCES
[1] P. Viola and M. J. Jones, "Robust real-time object detection", Proc. of IEEE Workshop on Statistical and Computational Theories of Vision, 2001.
[2] H. A. Rowley, S. Baluja, and T. Kanade, "Neural network-based face detection", IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, pp. 23-38, January 1998.
[3] C. Papageorgiou, M. Oren, and T. Poggio, "A general framework for object detection", International Conference on Computer Vision, 1998.
[4] Xilinx Inc., Xilinx System Generator v3.1 for MathWorks Simulink: Quick Start Guide, 2003.
[5] http://prdownloads.sourceforge.net/opencv/library/OpenCVReferenceManual.pdf
[6] Fan Yang and Michel Paindavoine, "Implementation of an RBF neural network on embedded systems: real-time face tracking and identity verification", IEEE Transactions on Neural Networks, vol. 14, no. 5, September 2003.
[7] http://vasc.ri.cmu.edu/idb/html/face/
[8] Stavros Paschalakis and Miroslaw Bober, "Real-time face detection and tracking for mobile videoconferencing", Real-Time Imaging, vol. 10, pp. 81-94, 2004.
[9] Xiong Bing, Yu Wei, and Charayaphan Charoensak, "Face contour tracking in video", in Proc. IEEE Int. Conf. Image Processing, Oct. 24-27, 2004. To appear.