Trainable Classifier-Fusion Schemes: an Application to Pedestrian Detection

Oswaldo Ludwig Junior, David Delgado, Valter Gonçalves, Urbano Nunes
ISR-Institute of Systems and Robotics, Department of Electrical and Computer Engineering, University of Coimbra Polo II, 3030-290, Coimbra, Portugal
Email: [email protected], [email protected]

Abstract—This work proposes a novel classifier-fusion scheme using learning algorithms, i.e. syntactic models, instead of the usual Bayesian or heuristic rules. Moreover, this paper complements the previous comparative studies on DaimlerChrysler Automotive Dataset, offering a set of complementary experiments using feature extractor and classifier combinations. The experimental results provide evidence of the effectiveness of our methods regarding false positive rate, AUC, and accuracy, which reached 96.67%.

I. INTRODUCTION

Based on a substantial amount of theoretical and empirical evidence, we can state that an ensemble of classifiers is generally better than a single classifier. The two most usual ensemble strategies are boosting [1] and bagging [2]. Numerous theoretical studies explain the success of boosting by proving bounds and margins on its error. The boosting strategy is to produce a series of classifiers, each trained on a subsequent training dataset chosen so that the new classifiers are better able to predict the examples for which the current ensemble's performance is poor. The bagging strategy is to compose an ensemble of classifiers in which each one is trained with a subset based on a random redistribution of the training dataset; that is, each individual classifier in the ensemble is generated with a different random sampling of the training set.

This work adopts an ensemble of different classifiers, all of them trained with the same training dataset. The motivation is to compose classifier ensembles with diversity of behavior in order to increase accuracy. This kind of classifier fusion has been applied by our research group to pedestrian detection, often with better results than single classifiers. Therefore, this work proposes a novel fusion scheme using adjustable algorithms, i.e. syntactic classifiers such as Neural Networks (NN), instead of the usual Bayesian or heuristic rules. Moreover, this paper complements the comparative study [3] on the DaimlerChrysler Automotive Dataset, offering results of a set of experiments using feature extractor and classifier combinations.

The paper is organized as follows. In Sections II and III, we briefly review some feature extractors and classifiers whose results are compared in this paper. The fusion algorithm is described in Section IV, the DaimlerChrysler Automotive Dataset is presented in Section V, and the experiments are explained in Section VI. Finally, Section VII presents conclusions.

II. FEATURE EXTRACTORS

This section briefly describes the feature extractors used in the present work for pedestrian detection studies.

A. HOG features

Histogram of Oriented Gradients (HOG) [7] is inspired by the Scale-Invariant Feature Transform (SIFT) descriptors proposed in [5]. To compose HOG, each pixel within a cell casts a weighted vote, according to the gradient L2-norm, for an orientation-based histogram channel. In this work the histogram channels are calculated over rectangular cells (i.e. R-HOG) using the unsigned gradient. The cells overlap half of their area, meaning that each cell contributes more than once to the final feature vector. In order to account for changes in illumination and contrast, the gradient strengths were locally normalized, i.e. normalized over each cell. The HOG parameters were adopted after a set of experiments performed over the training dataset using a Neural Network (NN) as classifier. The highest Area Under the ROC Curve (AUC), computed over the validation dataset, was achieved with 9 rectangular cells and a 9-bin histogram per cell. The nine histograms with nine bins each were then concatenated into an 81-dimensional feature vector.

B. COV features

The use of covariance-matrix descriptors in classification problems follows [8] and [10]. Let $I$ be the input image matrix and $z_p$ the corresponding d-dimensional feature vector calculated for each pixel p,

$$ z_p = \left[ x,\ y,\ |I_x|,\ |I_y|,\ \sqrt{I_x^2 + I_y^2},\ |I_{xx}|,\ |I_{yy}|,\ \arctan\frac{|I_y|}{|I_x|} \right] \tag{1} $$

where x and y are the coordinates of pixel p, $I_x$ and $I_y$ are the first-order intensity derivatives with respect to x and y respectively, $I_{xx}$ and $I_{yy}$ are the second-order derivatives, and the last term in (1) is the edge orientation. In this work, four sub-regions are computed within a region R, which represents the area of a cropped image. The sub-regions overlap half of their area, meaning that each sub-region contributes more than once to the final feature vector. For the i-th rectangular sub-region $R_i$, the covariance matrix $C_{R_i}$ is expressed by

$$ C_{R_i} = \frac{1}{N_i - 1} \sum_{p=1}^{N_i} (z_p - \mu_i)(z_p - \mu_i)^T \tag{2} $$

where $\mu_i$ is the statistical mean of $z_p$ over the i-th sub-region $R_i$ and $N_i$ is the number of pixels in $R_i$. Notice that, due to the symmetry of $C_{R_i}$, only the upper triangular part needs to be stored; hence the covariance descriptor of a sub-region, an 8 × 8 matrix, contributes 36 features. The features of the whole region R are also calculated, so a feature vector with 180 features is generated, i.e., 4 sub-regions $R_i$ totaling 144 features, plus 36 features of the whole region R.

III. CLASSIFIERS

This section briefly describes the classifiers used in the preceding work [3] as well as the ones used in the present paper.

A. Neural Network Trained by GDX

Gradient descent with momentum term and adaptive learning rate (GDX) is an algorithm applied to NN training. In this algorithm, backpropagation is used to calculate the derivatives of the MSE with respect to the weights and biases. Each variable is adjusted according to gradient descent with a momentum term. For each epoch, if the MSE decreases toward the goal, then the learning rate $\alpha_1$ is increased by a given factor $\eta$. If the MSE increases by more than a given threshold $\gamma$, the learning rate is decreased by a given factor $\phi$, and the change that increased the MSE is not made.

This work adopts a new approach. We define $AUC_{use}$ as the area under the ROC curve in the interval [0, 0.05] of false positive rate, because the optimal operating point of ROC curves often occurs in that interval. More specifically, during GDX training, the value of $AUC_{use}$ is calculated every 75 epochs over the validation dataset. If $AUC_{use}$ increases, then a register of network parameters is updated. At the end of training, the registered network parameters, which correspond to the largest $AUC_{use}$, are adopted. In this work GDX is applied to train an MLP with only one sigmoidal hidden layer, whose model is:

$$ y_{hf} = \varphi(W_1 \cdot x + b_1) \tag{3} $$

$$ \hat{y} = W_2 \cdot y_{hf} + b_2 \tag{4} $$
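As an illustration of the $AUC_{use}$ criterion, the following Python sketch computes the area under an empirical ROC curve restricted to false positive rates in [0, 0.05] and keeps the checkpoint with the largest value. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`roc_points`, `partial_auc`, `select_best_checkpoint`), the trapezoidal integration, and the normalization by the interval width (suggested by the $AUC_{use}$ values above 0.05 reported in Table IV, but not stated in the text) are all assumptions.

```python
from typing import Any, List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Empirical ROC points (FPR, TPR), ordered by increasing FPR.

    Assumes labels are 1 (pedestrian) or 0 (non-pedestrian) and that
    both classes are present.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Examples sharing the same score cross the threshold together.
        while i < len(order) and scores[order[i]] == threshold:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def partial_auc(points: Sequence[Tuple[float, float]],
                max_fpr: float = 0.05,
                normalize: bool = True) -> float:
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr].

    normalize=True divides by max_fpr so a perfect classifier scores 1.0
    (an assumption, see the lead-in).
    """
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Interpolate the TPR at the right edge of the interval.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area / max_fpr if normalize else area


def select_best_checkpoint(checkpoints: Sequence[Tuple[Any, Sequence[float]]],
                           labels: Sequence[int],
                           max_fpr: float = 0.05) -> Tuple[Any, float]:
    """Keep the parameters whose validation scores give the largest partial AUC.

    Each checkpoint is a hypothetical (params, validation_scores) pair, e.g.
    one recorded every fixed number of epochs during training.
    """
    best_params, best_auc = None, -1.0
    for params, scores in checkpoints:
        auc = partial_auc(roc_points(scores, labels), max_fpr)
        if auc > best_auc:
            best_params, best_auc = params, auc
    return best_params, best_auc
```

In a training loop, `select_best_checkpoint` would play the role of the register of network parameters: the stored parameters change only when the restricted AUC on the validation set improves.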

B. Neural Network Trained by MCI

Minimization of InterClass Interference (MCI) [13] is a maximum-margin based training algorithm for Neural Networks (NN). MCI aims to create a NN hidden layer output (i.e. feature space) where the patterns have a desirable statistical distribution. Regarding the neural architecture, the linear output layer is replaced by the Mahalanobis kernel in order to improve generalization. MCI is applicable to a neural network model with two sigmoidal hidden layers and one nonlinear output layer:

$$ y_{hf} = \varphi(W_1 \cdot x + b_1) \tag{5} $$

$$ y_{hs} = \varphi(W_2 \cdot y_{hf} + b_2) \tag{6} $$

$$ \hat{y} = \frac{d_2 - d_1}{d_2 + d_1} \tag{7} $$

where $y_{hf}$ is the output vector of the first hidden layer, $y_{hs}$ is the output vector of the second hidden layer, $W_k$ (k = 1, 2) is the synaptic weights matrix of layer k, $b_k$ is the bias vector of layer k, x is the input vector, $\varphi(\cdot)$ is the sigmoid function, $d_m = (y_{hs} - \mu_m)^T \Sigma^{-1} (y_{hs} - \mu_m)$ is the Mahalanobis distance between $y_{hs}$ and $\mu_m$, $\Sigma$ is the covariance matrix over all the output vectors $y_{hs}$ presented by the second hidden layer in response to the training dataset, $\mu_m = \frac{1}{N_m} \sum_{i=1}^{N_m} y_{hs_m}(i)$ is the prototype of Class m, $N_m$ is the number of training patterns that belong to Class m, and $y_{hs_m}(i)$ is the second hidden layer output for an input that belongs to Class m. Analyzing (7), we can observe that $\hat{y}$ varies continuously from −1, for $y_{hs} = \mu_2$, to 1, for $y_{hs} = \mu_1$. This continuous output enables the calculation of ROC curves. MCI creates a hidden space where the Euclidean distance between the prototypes of each class is increased and the dispersion of the patterns of each class is decreased. The goal is to maximize the objective function

$$ J = (\mu_1 - \mu_2)^T (\mu_1 - \mu_2) - \delta_1^2 - \delta_2^2 \tag{8} $$

where $\delta_m^2 = \sum_{i=1}^{N_m} (y_{h_m}(i) - \mu_m)^T (y_{h_m}(i) - \mu_m)$ is the deviation of Class m patterns in the hidden space. The weights and biases are updated based on the gradient ascent algorithm.

C. FLDA

Let us consider w a vector of adjustable gains and $\{x_c\}$ the set of feature vectors that belong to Class c (c = 1, 2), with mean $\mu_c$ and covariance $\Sigma_c$. The linear combination $w \cdot x_c$ has mean $w \cdot \mu_c$ and variance $w^T \Sigma_c w$. The ratio J(w) of the variance between the classes, $\sigma_{bet}^2$, to the variance within the classes, $\sigma_{wit}^2$, is a suitable measure of separation between these two classes:

$$ J(w) = \frac{\sigma_{bet}^2}{\sigma_{wit}^2} = \frac{(w \cdot (\mu_2 - \mu_1))^2}{w^T (\Sigma_1 + \Sigma_2) w} \tag{9} $$

To obtain the maximum separation between classes, one has to find the vector w that solves the optimization problem

$$ \max_w J(w) \tag{10} $$

whose solution is

$$ w = (\Sigma_1 + \Sigma_2)^{-1} (\mu_2 - \mu_1) \tag{11} $$

To find the plane that best separates the data, $w^T \mu_1 + b = -(w^T \mu_2 + b)$ has to be solved for the bias b.
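The closed-form solution (11) and the bias condition can be illustrated with a small Python sketch. It is restricted to two-dimensional feature vectors so that the matrix inverse can be written out by hand; this restriction, the function names (`flda_fit`, `flda_score`) and the use of the unbiased sample covariance are assumptions made for illustration, not details taken from the paper. Solving $w^T \mu_1 + b = -(w^T \mu_2 + b)$ gives $b = -\frac{1}{2} w^T (\mu_1 + \mu_2)$, which is what the code uses.

```python
from typing import List, Sequence, Tuple

Vec2 = Tuple[float, float]


def _mean(xs: Sequence[Vec2]) -> Vec2:
    n = len(xs)
    return (sum(x[0] for x in xs) / n, sum(x[1] for x in xs) / n)


def _covariance(xs: Sequence[Vec2], mu: Vec2) -> Tuple[float, float, float]:
    """Unbiased 2x2 sample covariance, returned as (s_xx, s_xy, s_yy)."""
    n = len(xs)
    s_xx = sum((x[0] - mu[0]) ** 2 for x in xs) / (n - 1)
    s_xy = sum((x[0] - mu[0]) * (x[1] - mu[1]) for x in xs) / (n - 1)
    s_yy = sum((x[1] - mu[1]) ** 2 for x in xs) / (n - 1)
    return s_xx, s_xy, s_yy


def flda_fit(class1: Sequence[Vec2], class2: Sequence[Vec2]) -> Tuple[Vec2, float]:
    """Return (w, b) with w = (S1 + S2)^-1 (mu2 - mu1), as in (11)."""
    mu1, mu2 = _mean(class1), _mean(class2)
    a1, b1, d1 = _covariance(class1, mu1)
    a2, b2, d2 = _covariance(class2, mu2)
    a, off, d = a1 + a2, b1 + b2, d1 + d2          # entries of S1 + S2
    det = a * d - off * off
    if det == 0.0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = mu2[0] - mu1[0], mu2[1] - mu1[1]
    # Explicit inverse of the symmetric 2x2 matrix [[a, off], [off, d]].
    w = ((d * dx - off * dy) / det, (-off * dx + a * dy) / det)
    # Bias from w.mu1 + b = -(w.mu2 + b).
    bias = -0.5 * (w[0] * (mu1[0] + mu2[0]) + w[1] * (mu1[1] + mu2[1]))
    return w, bias


def flda_score(w: Vec2, bias: float, x: Vec2) -> float:
    """Signed projection: negative toward Class 1, positive toward Class 2."""
    return w[0] * x[0] + w[1] * x[1] + bias


if __name__ == "__main__":
    c1: List[Vec2] = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    c2: List[Vec2] = [(3.0, 3.0), (4.0, 3.0), (3.0, 4.0), (4.0, 4.0)]
    w, b = flda_fit(c1, c2)
    print([flda_score(w, b, x) < 0 for x in c1], [flda_score(w, b, x) > 0 for x in c2])
```

With this choice of bias the two class means receive scores of equal magnitude and opposite sign, so a threshold at zero sits halfway between the projected means.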

D. SVM

Support Vector Machines (SVM) are based on the statistical theory of learning developed by Vapnik [12]. This theory provides a set of principles to be followed in order to obtain classifiers with good generalization, defined as the ability to predict correctly the class of new data in the same domain where the learning occurred. Table I presents three usual SVM kernels, where $n_d$ is a natural number denoting the polynomial degree.

TABLE I
USUAL SVM KERNELS

Kernel name    Kernel function $H(x, x')$
Linear         $H(x, x') = x^T x'$
Polynomial     $H(x, x') = (x^T x' + 1)^{n_d}$
RBF            $H(x, x') = \exp(-\gamma \|x - x'\|^2)$

Fig. 1. Fusion Scheme 1. [Block diagram: cropped image → feature extractors (HOG, COV) → classifiers (SVM_RBF, FLDA, FLDA) → likelihoods → trainable fusion classifier.]
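For concreteness, the three kernels of Table I can be written directly as functions of two feature vectors. This is a minimal Python sketch; the function names and the plain-list representation of vectors are choices made here for illustration and do not come from the paper.

```python
import math
from typing import Sequence


def linear_kernel(x: Sequence[float], x2: Sequence[float]) -> float:
    """H(x, x') = x^T x'."""
    return sum(a * b for a, b in zip(x, x2))


def polynomial_kernel(x: Sequence[float], x2: Sequence[float], degree: int) -> float:
    """H(x, x') = (x^T x' + 1)^degree, with degree playing the role of n_d."""
    return (linear_kernel(x, x2) + 1.0) ** degree


def rbf_kernel(x: Sequence[float], x2: Sequence[float], gamma: float) -> float:
    """H(x, x') = exp(-gamma * ||x - x'||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, x2))
    return math.exp(-gamma * squared_distance)


if __name__ == "__main__":
    u, v = [1.0, 2.0], [3.0, 4.0]
    print(linear_kernel(u, v), polynomial_kernel(u, v, 2), rbf_kernel(u, v, 0.1))
```

The RBF kernel always evaluates to 1 for identical inputs and decays toward 0 as the vectors move apart, at a rate set by $\gamma$.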

SVM is very sensitive to the margin parameter C¹; therefore, it is not appropriate to adjust this parameter based on the SVM performance on the test dataset, otherwise we would bring information from the test dataset into the SVM. The usual approach is to apply K-fold cross validation over the training dataset. Standard SVM training algorithms have $O(m^3)$ time and $O(m^2)$ space complexities, where m is the number of training examples. Therefore, in the case of large training datasets, the computational cost is prohibitive. Because this paper deals with a large dataset, we were forced to choose the Radial Basis Function (RBF) kernel, due to its suitable computational cost. If an SVM with a linear or polynomial kernel is trained with large datasets, the margin parameter C cannot be adequately adjusted, due to the computational cost.

¹ Margin parameter that determines the trade-off between maximization of the margin and minimization of the classification error [?]

Fig. 2. Fusion Scheme 2. [Block diagram: cropped image → feature extractors (HOG, COV) → classifiers (SVM_RBF, GDX-MCI, FLDA) → likelihoods → trainable fusion classifier.]

IV. FUSION ALGORITHM

The fusion algorithm is itself a classifier: it receives the likelihoods from the single classifiers and decides the class. In this work, three fusion schemes with a hierarchical structure were tested, as illustrated in Figures 1, 2, and 3. Both the single classifiers and the fusion algorithm are trained with the same training dataset. However, the single classifiers are trained before the fusion algorithm in order to create a 'likelihood training dataset', which is used together with the training labels in the fusion algorithm training process. Algorithm 1 explains the training details.

Fig. 3. Fusion Scheme 3. [Block diagram: cropped image → feature extractors (HOG, COV) → classifiers (SVM_RBF, FLDA, GDX-MCI) → likelihoods → trainable fusion classifier.]

V. DATASET

This work applies the DaimlerChrysler Pedestrian benchmark dataset, which is composed of a collection of 29400 images, split into 19600 examples for training and 9800 test examples. The training subset is divided in two parts: 10000 images of non-pedestrians and 9600 images of pedestrians. The test subset has 5000 examples of non-pedestrians and 4800 examples of pedestrians. Samples are scaled to a size of 18 × 36 pixels in gray scale. In this work, 2000 examples were randomly selected from the training subset to compose a validation dataset, which was used to set the parameters of the classifiers and training methods.

VI. EXPERIMENTS

HOG and COV features were extracted from the DaimlerChrysler dataset. These features were applied to four different classifiers, specifically FLDA, NN trained with GDX, NN trained with GDX-MCI, and SVM with RBF kernel. The trainable fusion algorithm was finally applied to improve the accuracy. The classifier parameters are presented in Tables II and III, where $n_c$ is the number of iterations between AUC calculations and $s_c$ is the stop criterion, i.e., the number of iterations without $AUC_{use}$ improvement.

Algorithm 1 Training of fusion algorithm
Input:
  {x_train}, {y_train}: training dataset containing N_t pairs of training examples
  {x_valid}, {y_valid}: validation dataset containing N_v pairs of validation examples (for stop criterion)
  {φ}: set containing N_c classifiers previously trained by means of the training dataset {x_train}, {y_train}
Output: trained parameters θ of the fusion algorithm

U_train ← empty matrix;
U_valid ← empty matrix;
for k = 1 : N_c do
    v_train ← empty vector;
    for n = 1 : N_t do
        process the n-th training example x_train^n through the k-th previously trained classifier φ_k, in order to obtain the n-th likelihood L_train,n^k;
        v_train ← [v_train | L_train,n^k]: concatenate the likelihood of each training example in order to compose a likelihood vector;
    end for
    v_valid ← empty vector;
    for n = 1 : N_v do
        process the n-th validation example x_valid^n through the k-th previously trained classifier φ_k, in order to obtain the n-th likelihood L_valid,n^k;
        v_valid ← [v_valid | L_valid,n^k]: concatenate the likelihood of each validation example in order to compose a likelihood vector;
    end for
    U_train ← [U_train ; v_train^k]: concatenate the likelihood vectors of each classifier in order to create a matrix of likelihood vectors;
    U_valid ← [U_valid ; v_valid^k];
end for
apply the matrices U_train, U_valid and the target vectors y_train, y_valid to train the fusion algorithm, obtaining the adjusted parameters θ

TABLE II
PARAMETERS OF NN

Classifier   α1    ϕ     η      nc    nn    α2    sc
GDX          0.1   0.7   1.05   300   15    -     4500
GDX-MCI      0.1   0.7   1.05   1     100   0.5   30
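The structure of Algorithm 1 can be sketched in Python as follows. This is a hedged illustration only: the paper's fusion classifiers are FLDA, GDX, GDX-MCI and SVM, whereas the sketch swaps in a plain logistic-regression fusion model trained by batch gradient descent, simply to keep the example short and dependency-free. The names `likelihood_matrix`, `fuse` and `train_fusion`, the learning rate, the epoch count, and the rule of keeping the parameters with the lowest validation loss are all assumptions.

```python
import math
from typing import Callable, List, Sequence

# A previously trained single classifier maps a feature vector to a likelihood.
Classifier = Callable[[Sequence[float]], float]


def likelihood_matrix(classifiers: Sequence[Classifier],
                      examples: Sequence[Sequence[float]]) -> List[List[float]]:
    """Build U as in Algorithm 1: one likelihood vector (row) per classifier."""
    return [[phi(x) for x in examples] for phi in classifiers]


def fuse(theta: Sequence[float], likelihoods: Sequence[float]) -> float:
    """Fusion output in (0, 1); theta holds one weight per classifier plus a bias."""
    z = theta[-1] + sum(w * l for w, l in zip(theta, likelihoods))
    z = max(-60.0, min(60.0, z))  # guard against overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def _log_loss(theta: Sequence[float], u: List[List[float]], y: Sequence[int]) -> float:
    eps = 1e-12
    total = 0.0
    for n, target in enumerate(y):
        p = min(1.0 - eps, max(eps, fuse(theta, [row[n] for row in u])))
        total -= target * math.log(p) + (1 - target) * math.log(1.0 - p)
    return total / len(y)


def train_fusion(classifiers: Sequence[Classifier],
                 x_train: Sequence[Sequence[float]], y_train: Sequence[int],
                 x_valid: Sequence[Sequence[float]], y_valid: Sequence[int],
                 epochs: int = 200, lr: float = 0.5) -> List[float]:
    """Train the stand-in fusion model on the 'likelihood training dataset'."""
    u_train = likelihood_matrix(classifiers, x_train)
    u_valid = likelihood_matrix(classifiers, x_valid)
    n_c = len(classifiers)
    theta = [0.0] * (n_c + 1)
    best_theta, best_loss = list(theta), _log_loss(theta, u_valid, y_valid)
    for _ in range(epochs):
        grad = [0.0] * (n_c + 1)
        for n, target in enumerate(y_train):
            feats = [row[n] for row in u_train]
            err = fuse(theta, feats) - target
            for k in range(n_c):
                grad[k] += err * feats[k]
            grad[n_c] += err
        theta = [t - lr * g / len(y_train) for t, g in zip(theta, grad)]
        # The validation set is only used to decide which parameters to keep.
        loss = _log_loss(theta, u_valid, y_valid)
        if loss < best_loss:
            best_theta, best_loss = list(theta), loss
    return best_theta
```

At prediction time, a new example is passed through every single classifier and the resulting likelihoods are given to `fuse` together with the trained parameters.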

TABLE III
PARAMETERS OF SVM

Classifier   γ       C
SVM^1_RBF    0.1     2000
SVM^2_RBF    30000   2000

A. Experiments with Single Classifiers

The experiments with single classifiers give evidence of the advantage of FLDA in the case of COV features, as reported in Tables IV and V. The result of the combination COV+FLDA (95.05%) is similar to the best state-of-the-art methods, with small computational cost. The FLDA accuracy increases slightly when the 'bag of features' HOG-COV is applied.

TABLE IV
AUC OF SINGLE CLASSIFIERS

Descriptor   Classifier   AUC      AUC_use
HOG          FLDA         0.8726   0.3881
HOG          GDX          0.9301   0.5871
HOG          GDX-MCI      0.9174   0.3776
HOG          SVM^1_RBF    0.9543   0.6815
COV          FLDA         0.9903   0.8475
COV          GDX          0.9716   0.6282
COV          GDX-MCI      0.9763   0.7337
COV          SVM^2_RBF    0.9399   0.5455
HOG-COV      FLDA         0.9929   0.8938
HOG-COV      GDX          0.9711   0.5737
HOG-COV      GDX-MCI      0.9731   0.6434
HOG-COV      SVM^2_RBF    0.9431   0.5778

TABLE V
PERFORMANCE OF SINGLE CLASSIFIERS

Descriptor   Classifier   accuracy   TP       FP
HOG          FLDA         0.7806     0.8325   0.2692
HOG          GDX          0.8584     0.8646   0.1476
HOG          GDX-MCI      0.8589     0.8640   0.1460
HOG          SVM^1_RBF    0.8828     0.9019   0.1356
COV          FLDA         0.9505     0.9958   0.0930
COV          GDX          0.9064     0.9796   0.1638
COV          GDX-MCI      0.9146     0.9742   0.1426
COV          SVM^2_RBF    0.8595     0.8988   0.1782
HOG-COV      FLDA         0.9566     0.9862   0.0718
HOG-COV      GDX          0.9247     0.9788   0.1272
HOG-COV      GDX-MCI      0.9265     0.9652   0.1106
HOG-COV      SVM^2_RBF    0.8601     0.9325   0.2095

B. Fusion Experiments

The first experiment applies the fusion scheme of Figure 1 using the best single classifier for each feature extractor, according to Table V. The results are reported in Table VI, which gives evidence of the advantage of GDX as fusion algorithm in Scheme 1. Taking into account the redundancy of Scheme 1, which applies the same classifier (FLDA) to the COV features and to the 'bag of features' HOG-COV, we tried the scheme of Figure 2, which uses the second best single classifier for COV features (i.e. GDX-MCI). As reported in Table VII, Scheme 2 slightly increases the accuracy. The best performance is obtained by means of the fusion Scheme 3 shown in Figure 3, which applies the 'bag of features'

HOG-COV for all the single classifiers. Scheme 3 reaches an accuracy of 96.67%, as reported in Table VIII. In the last two experiments the NN trained by GDX-MCI was the best fusion algorithm. Figure 4 illustrates a comparison between the ROC curves of the best fusion ensembles. That figure also includes the result of the best single classifier, i.e., FLDA with HOG-COV features.

TABLE VI
PERFORMANCE OF CLASSIFIER FUSION WITH SCHEME 1
Single classifiers (descriptor/classifier): HOG/SVM^1_RBF, COV/FLDA, HOG-COV/FLDA

Fusion       accuracy   TP       FP
FLDA         0.9579     0.9765   0.0600
GDX          0.9650     0.9785   0.0480
GDX-MCI      0.9630     0.9760   0.0496
SVM^1_RBF    0.9616     0.9752   0.0514

TABLE VII
PERFORMANCE OF CLASSIFIER FUSION WITH SCHEME 2
Single classifiers (descriptor/classifier): HOG/SVM^1_RBF, COV/GDX-MCI, HOG-COV/FLDA

Fusion       accuracy   TP       FP
FLDA         0.9652     0.9750   0.0442
GDX          0.9655     0.9750   0.0436
GDX-MCI      0.9663     0.9779   0.0448
SVM^1_RBF    0.9601     0.9700   0.0494

TABLE VIII
PERFORMANCE OF CLASSIFIER FUSION WITH SCHEME 3
Single classifiers (descriptor/classifier): HOG-COV/SVM^2_RBF, HOG-COV/GDX-MCI, HOG-COV/FLDA

Fusion       accuracy   TP       FP
FLDA         0.9565     0.9910   0.0766
GDX          0.9661     0.9858   0.0528
GDX-MCI      0.9667     0.9865   0.0522
SVM^1_RBF    0.9665     0.9871   0.0532

Fig. 4. Comparison between the ROC curves of the best fusion schemes (including the ROC of the single classifier FLDA with HOG/COV features). [ROC plot; curves: Scheme3/GDX-MCI, Scheme2/GDX-MCI, Scheme1/GDX, HOG-COV/FLDA; horizontal axis ticks from 0 to 0.25, vertical axis ticks from 0.75 to 1.]

C. Other Results

From some previous works on the DaimlerChrysler dataset we collected performance indexes from ROC curves by means of (12), which takes into account the best relation between True Positive rate (TP) and False Positive rate (FP) on the ROC curves, as well as the number of positive examples ($n_{pe}$) and the number of negative examples ($n_{ne}$) of the DaimlerChrysler dataset (see Section V).

$$ accuracy = \frac{TP \cdot n_{pe} + (1 - FP) \cdot n_{ne}}{n_{pe} + n_{ne}} \tag{12} $$

Notice that this procedure results in an optimistic performance evaluation, because we are considering the best threshold on the test dataset, which was used to calculate those ROC curves. The results of the previous works [3], [14], [15], [16], [17] and [18] are in Tables IX to XIV, respectively.

VII. CONCLUSION

The proposed trainable fusion method improves the accuracy of the best single classifier (FLDA with HOG-COV features) by 1.01 percentage points, from 95.66% to 96.67%. Although this improvement is slight, Fusion Scheme 3 spends only 8.340e-4 seconds per image in the Matlab environment. Regarding computer vision, the computational effort of the usual feature extractors is more critical than the computational effort of classifiers (excepting the LRF). This fact justifies the fusion approach.
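The accuracy figures quoted above can be cross-checked against (12) using the test-set sizes given in Section V (4800 pedestrian and 5000 non-pedestrian examples). The short Python sketch below does this for the best single classifier (Table V) and for Scheme 3 with GDX-MCI fusion (Table VIII); the function name `accuracy_from_rates` is an illustrative choice, and small differences in the last digit are to be expected because the tabulated TP and FP rates are themselves rounded.

```python
def accuracy_from_rates(tp_rate: float, fp_rate: float, n_pos: int, n_neg: int) -> float:
    """Equation (12): accuracy from true/false positive rates and class sizes."""
    return (tp_rate * n_pos + (1.0 - fp_rate) * n_neg) / (n_pos + n_neg)


# Test-set composition reported in Section V.
N_PEDESTRIAN = 4800
N_NON_PEDESTRIAN = 5000

# TP and FP rates copied from Tables V and VIII.
single = accuracy_from_rates(0.9862, 0.0718, N_PEDESTRIAN, N_NON_PEDESTRIAN)
fused = accuracy_from_rates(0.9865, 0.0522, N_PEDESTRIAN, N_NON_PEDESTRIAN)

if __name__ == "__main__":
    print(f"FLDA with HOG-COV: {single:.4f}")
    print(f"Scheme 3, GDX-MCI: {fused:.4f}")
    print(f"gain in percentage points: {100.0 * (fused - single):.2f}")
```

Because the negative class is slightly larger than the positive class, a given reduction in FP moves the accuracy a little more than the same increase in TP.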

The simple composition of COV features with the FLDA classifier presents a surprising result (95.05%), surpassing all the other previous works (see Tables IX to XIV), with small computational cost. This paper also provides evidence that classifier performance is strongly dependent on the feature space (see Table V).

TABLE IX
AN EXPERIMENTAL STUDY ON PEDESTRIAN CLASSIFICATION [3]

Descriptor   Classifier       accuracy   TP       FP
LRF          SVM_Polynomial   0.9470     0.9460   0.0520
LRF          K-NN             0.8255     0.8260   0.1750
LRF          NN               0.8908     0.8500   0.0700
LRF          SVM_RBF          0.9023     0.9100   0.1050
HW           SVM_Polynomial   0.9134     0.8580   0.0334
HW           K-NN             0.8300     0.8300   0.1700
HW           NN               0.8160     0.7750   0.1450
HW           SVM_RBF          0.8590     0.7850   0.0700

TABLE X
ENSEMBLE OF MULTIPLE PEDESTRIAN REPRESENTATIONS [14]

Descriptor   Classifier         Fusion     accuracy   TP       FP
Lem, Lbp     SVM_RBF, SVM_RBF   sum rule   0.9421     0.9600   0.0750
LRF          SVM_RBF            -          0.9128     0.9000   0.0750

TABLE XI
PEDESTRIAN DETECTION USING KPCA AND FLD ALGORITHMS [15]

Descriptor   Classifier   accuracy   TP       FP
KPCA         FLDA         0.8598     0.8700   0.1500
HW           SVM_POLY     0.8657     0.8300   0.1000

TABLE XII
AN EXPERIMENTAL EVALUATION OF LOCAL FEATURES FOR PEDESTRIAN CLASSIFICATION [16]

Descriptor   Classifier   accuracy   TP       FP
HOG          SVM_POLY     0.9430     0.9200   0.0350
HOG          SVM_Linear   0.9200     0.9200   0.0800
COV          SVM_RBF      0.9347     0.9500   0.0800
COV          SVM_Linear   0.8900     0.8600   0.0800
LRF          SVM_POLY     0.9000     0.8700   0.0700

TABLE XIII
APPROXIMATE RBF KERNEL SVM AND ITS APPLICATIONS IN PEDESTRIAN CLASSIFICATION [17]

Descriptor   Classifier    accuracy   TP       FP
HOG          SVM_RBF       0.9249     0.9300   0.0800
HOG          SVM_AT2-RBF   0.9102     0.9000   0.0800
HOG          SVM_AU2-RBF   0.9151     0.9100   0.0800
HOG          SVM_AP2-RBF   0.9151     0.9100   0.0800
HOG          SVM_Linear    0.7976     0.6700   0.0800

TABLE XIV
FEATURE MINING FOR IMAGE CLASSIFICATION [18]

Descriptor   Classifier   accuracy   TP       FP
HOG          SVM_RBF      0.9151     0.9100   0.0800

ACKNOWLEDGMENT

This work was supported by Institute of Systems and Robotics (ISR-UC), and FCT under grant PTDC/EEAACR/72226/2006.

REFERENCES

[1] Freund, Y. and Schapire, R., "Experiments with a new boosting algorithm", Proceedings of the Thirteenth International Conference on Machine Learning, pp. 148-156, Bari, 1996.
[2] Breiman, L., "Bagging predictors", Machine Learning, 24(2), pp. 123-140, 1996.
[3] Munder, S. and Gavrila, D.M., "An Experimental Study on Pedestrian Classification", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 28, No. 11, November 2006.
[4] Lin, T. and Zha, H., "Riemannian Manifold Learning", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 30 (5), May 2008.
[5] Lowe, D., "Distinctive image features from scale-invariant keypoints", International Journal of Computer Vision, vol. 60, pp. 91-110, 2004.
[6] Zhu, Q., Avidan, S., Yeh, M.-C., and Cheng, K.-T., "Fast human detection using a cascade of histograms of oriented gradients", in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New York, June 2006.
[7] Dalal, N. and Triggs, B., "Histograms of oriented gradients for human detection", in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005, pp. 886-893.
[8] Tuzel, O., Porikli, F., and Meer, P., "A fast descriptor for detection and classification", Proc. 9th European Conf. on Computer Vision, 2006.
[9] Porikli, F., Tuzel, O., and Meer, P., "Covariance Tracking using Model Update Based on Lie Algebra", IEEE Conf. on Computer Vision and Pattern Recognition, 2006.
[10] Tuzel, O., Porikli, F., and Meer, P., "Human Detection via Classification on Riemannian Manifolds", Proc. CVPR, pp. 1-8, 2007.
[11] Papageorgiou, C., Evgeniou, T., and Poggio, T., "A trainable pedestrian detection system", in Proceedings of Intelligent Vehicles Symposium, October 1998, pp. 241-246.
[12] Vapnik, V.N., Statistical Learning Theory, John Wiley and Sons, 1998.
[13] Ludwig, O. and Nunes, U., "Improving the Generalization Properties of Neural Networks: an Application to Vehicle Detection", in 11th International IEEE Conference on Intelligent Transportation Systems ITSC 2008, October 12-15, Beijing, China.
[14] Nanni, L. and Lumini, A., "Ensemble of Multiple Pedestrian Representations", IEEE Transactions on Intelligent Transportation Systems, Vol. 9, June 2008, pp. 365-369.
[15] Liang, Y-h., Wang, Z-h., Guo, S., Xu, X-w. and Cao, X-y., "Pedestrian Detection using KPCA and FLD Algorithms", IEEE International Conference on Automation and Logistics, Aug. 2007, pp. 1572-1575.
[16] Paisitkriangkrai, S., Shen, C., and Zhang, J., "An experimental evaluation of local features for pedestrian classification", in Proc. Int. Conf. Digital Image Computing - Techniques and Applications, Adelaide, Australia, 2007.
[17] Cao, H., Naito, T. and Ninomiya, Y., "Approximate RBF Kernel SVM and Its Applications in Pedestrian Classification", MLVMA'08, Sep. 2008.
[18] Dollár, P., Tu, Z., Tao, H., and Belongie, S., "Feature mining for image classification", in IEEE Conf. Comp. Vis. Pattern Recognit., CVPR'2007, Minneapolis, MN, Jun. 2007.