Pattern Recognition

Prof. Christian Bauckhage

outline

additional material for lecture 22

support vector machines are neural networks

recap

in lecture 19, we discussed support vector machines for binary classification; the basic idea was to consider the maximum margin between two classes to determine a separating hyperplane, that is, a projection vector w and an offset w0, and thus to obtain a classifier

$$y(x) = \begin{cases} +1 & \text{if } w_0 + w^T x > 0 \\ -1 & \text{otherwise} \end{cases}$$

[figure: the maximum margin hyperplane, its normal vector w, and offset w0/‖w‖]

recap

assuming labeled training data

$$\bigl\{ (x_i, y_i) \bigr\}_{i=1}^{n}$$

where $x_i \in \mathbb{R}^m$ and $y_i \in \{-1, +1\}$, we saw that the main problem is to identify those data vectors $x_s$ that determine the maximum margin and hence support the separating hyperplane; these vectors are called the support vectors

recap

we also saw that this problem can be cast as a constrained quadratic optimization problem whose dual is a problem of estimating optimal Lagrange parameters $\mu_i$; for instance, for the case of an L2 SVM, the dual problem is

$$\operatorname*{argmax}_{\mu} \; -\mu^T \Bigl( G + y y^T + \tfrac{1}{C} I \Bigr) \mu \quad \text{s.t.} \quad \mathbf{1}^T \mu = 1, \;\; \mu \geq 0$$

where the elements of matrix $G$ are given by $G_{ij} = y_i \, x_i^T x_j \, y_j$
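to make the dual concrete, it can be set up in a few lines of NumPy; the toy data, the value of C, and all variable names below are made-up for illustration and are not from the lecture:

```python
import numpy as np

# toy, linearly separable 2D data (made-up example, not from the lecture)
X = np.array([[1.0, 2.0], [2.0, 3.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([+1.0, +1.0, -1.0, -1.0])
C = 10.0

# G_ij = y_i x_i^T x_j y_j
Yx = y[:, None] * X
G = Yx @ Yx.T

# matrix of the quadratic form in the L2 SVM dual: G + y y^T + (1/C) I
Q = G + np.outer(y, y) + np.eye(len(y)) / C

def dual_objective(mu):
    # the function to be maximized over the simplex {mu : 1^T mu = 1, mu >= 0}
    return -mu @ Q @ mu
```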

recap

we furthermore saw that this seemingly difficult problem can easily be solved using the Frank-Wolfe algorithm

we also recall that the support vectors we are after are those vectors $x_s$ in our training data for which $\mu_s > 0$ (the Lagrange multipliers of non-support vectors, on the other hand, equal 0)

having determined (the indices of) the support vectors, $w$ and $w_0$ can be computed as follows

$$w = \sum_{\mu_i > 0} \mu_i y_i x_i = \sum_s \mu_s y_s x_s$$

$$w_0 = \sum_{\mu_i > 0} \mu_i y_i = \sum_s \mu_s y_s$$
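a minimal Frank-Wolfe sketch for this dual is shown below; since the constraints describe the standard simplex, each linearized subproblem is solved by a simplex vertex. The toy data, step count, and support threshold are illustrative assumptions, not the lecture's implementation:

```python
import numpy as np

def frank_wolfe(Q, steps=500):
    """Maximize -mu^T Q mu over the simplex {mu : 1^T mu = 1, mu >= 0}."""
    n = Q.shape[0]
    mu = np.full(n, 1.0 / n)       # start at the simplex barycenter
    for t in range(steps):
        grad = -2.0 * Q @ mu       # gradient of -mu^T Q mu
        k = np.argmax(grad)        # best simplex vertex e_k for the linearized problem
        gamma = 2.0 / (t + 2.0)    # standard diminishing step size
        mu = (1.0 - gamma) * mu
        mu[k] += gamma             # mu <- mu + gamma * (e_k - mu)
    return mu

# made-up toy data, as before
X = np.array([[1.0, 2.0], [2.0, 3.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([+1.0, +1.0, -1.0, -1.0])
C = 10.0
Yx = y[:, None] * X
Q = Yx @ Yx.T + np.outer(y, y) + np.eye(len(y)) / C

mu = frank_wolfe(Q)
support = mu > 1e-6                            # indices s with mu_s > 0
w = (mu[support] * y[support]) @ X[support]    # w  = sum_s mu_s y_s x_s
w0 = np.sum(mu[support] * y[support])          # w0 = sum_s mu_s y_s
```

on this toy data, the resulting w and w0 separate the two classes correctly.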

recap

a support vector classifier therefore is a function

$$y(x) = \operatorname{sign}\bigl( w_0 + w^T x \bigr) = \operatorname{sign}\Bigl( w_0 + \sum_s \mu_s y_s \, x_s^T x \Bigr) = \operatorname{sign}\Bigl( w_0 + \sum_s w_s^T x \Bigr)$$

where we simply defined $w_s = \mu_s y_s x_s$
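in code, such a classifier might look as follows; the support vectors, multipliers, and labels below are made-up illustrative values rather than results from the lecture:

```python
import numpy as np

# made-up support vectors x_s, multipliers mu_s, and labels y_s
x_s  = np.array([[1.0, 2.0], [-1.0, -1.0]])
mu_s = np.array([0.21, 0.79])
y_s  = np.array([+1.0, -1.0])
w0   = np.sum(mu_s * y_s)        # offset, as in the recap

def classify(x):
    # y(x) = sign( w0 + sum_s mu_s y_s x_s^T x )
    return np.sign(w0 + np.sum(mu_s * y_s * (x_s @ x)))
```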

note

a support vector classifier

$$y(x) = \operatorname{sign}\Bigl( w_0 + \sum_s w_s^T x \Bigr)$$

is a neural network with a single hidden layer where the activation functions of the hidden neurons are the identity

$$f\bigl( w_s^T x \bigr) = \operatorname{id}\bigl( w_s^T x \bigr)$$

[figure: a single hidden layer network with inputs x1, x2, ..., xm, hidden neurons with activation f, connections of weight 1 from the hidden neurons to the output neuron, bias w0, and output y]
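this correspondence can be made explicit in code: the rows of the hidden-layer weight matrix are the vectors $w_s = \mu_s y_s x_s$, the hidden activations are identities, and the output neuron applies sign to the all-ones weighted sum of the hidden activations plus the bias $w_0$. All numeric values below are made-up for illustration:

```python
import numpy as np

# made-up support vectors, multipliers, and labels, as before
x_s  = np.array([[1.0, 2.0], [-1.0, -1.0]])
mu_s = np.array([0.21, 0.79])
y_s  = np.array([+1.0, -1.0])

W  = mu_s[:, None] * y_s[:, None] * x_s   # hidden-layer weights, row s = w_s
w0 = np.sum(mu_s * y_s)                   # bias of the output neuron

def network(x):
    hidden = W @ x                         # identity activations f(w_s^T x)
    return np.sign(w0 + np.sum(hidden))    # output weights are all 1

def svm(x):
    # the direct support vector classifier, for comparison
    return np.sign(w0 + np.sum(mu_s * y_s * (x_s @ x)))
```

evaluating both functions on arbitrary inputs shows they agree, as the derivation above suggests.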