Support Vector Machine (SVM)

By LI Haoyang 2020.10.25

This is a short note summarized from Section 2.3, Chapter 2 of the book below.

Bennamoun M, Shah SAA, Khan S, Rahmani H. A Guide to Convolutional Neural Networks for Computer Vision. Morgan & Claypool Publishers; 2018.

Support Vector Machine (SVM) is designed to find an optimal linear hyperplane that separates the training dataset into two classes. The hyperplane is chosen so that the margin, i.e. the gap between the hyperplane and the nearest data points of either class, is as large as possible.

Consider a training dataset $\mathcal{D}$ of two classes, i.e.

$$
\mathcal{D}=\{(\bold{x}_1,y_1),\dots,(\bold{x}_n,y_n)\},\quad\bold{x}_i\in\R^d,\ y_i\in\{1,-1\}
$$

The hyperplane separating the data can be written as

$$
\bold{w}^\top \bold{x}+b=0
$$

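A data point $\bold{x}_i$ with $\bold{w}^\top\bold{x}_i+b>0$ is assigned to class $1$, and one with $\bold{w}^\top\bold{x}_i+b<0$ is assigned to class $-1$. If $\bold{w}$ and $b$ are scaled so that the nearest points of the two classes lie on the hyperplanes $\bold{w}^\top\bold{x}_i+b=1$ and $\bold{w}^\top\bold{x}_i+b=-1$, the margin between these two hyperplanes is $\frac{2}{\sqrt{\bold{w}^\top\bold{w}}}$.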
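As a small illustration, the sign of $\bold{w}^\top\bold{x}+b$ gives the class and the margin follows from $\frac{2}{\sqrt{\bold{w}^\top\bold{w}}}$. The hyperplane $\bold{w}=(1,0)^\top$, $b=0$, the two test points, and the helper names are made-up values for this sketch, not taken from the book.

```python
import math

def decision_value(w, b, x):
    # w^T x + b for plain Python lists
    return sum(wk * xk for wk, xk in zip(w, x)) + b

def predict(w, b, x):
    # class 1 on the positive side, class -1 otherwise (a tie at 0 goes to -1 here)
    return 1 if decision_value(w, b, x) > 0 else -1

def margin_width(w):
    # distance between the hyperplanes w^T x + b = 1 and w^T x + b = -1
    return 2.0 / math.sqrt(sum(wk * wk for wk in w))

w, b = [1.0, 0.0], 0.0
print(predict(w, b, [1.0, 0.0]), predict(w, b, [-1.0, 0.0]))  # 1 -1
print(margin_width(w))  # 2.0
```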

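Maximizing the margin $\frac{2}{\sqrt{\bold{w}^\top\bold{w}}}$ is equivalent to minimizing $\frac{\bold{w}^\top\bold{w}}{2}$, so SVM is learned by solving the following problem

$$
\min_{\bold{w},b}\frac{\bold{w}^\top\bold{w}}{2}\quad\text{s.t.}\ \forall\bold{x}_i\in\mathcal{D}:\ y_i(\bold{w}^\top\bold{x}_i+b)\ge 1
$$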
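The constraint $y_i(\bold{w}^\top\bold{x}_i+b)\ge 1$ is what fixes the scale of $\bold{w}$: the hyperplanes below are all the same line $x_1=0$, but only large enough $\bold{w}$ satisfy the constraints, and among those the smallest $\frac{\bold{w}^\top\bold{w}}{2}$ wins. The two-point dataset and the function names are hypothetical, chosen only for this sketch.

```python
def primal_objective(w):
    # the hard-margin objective w^T w / 2
    return 0.5 * sum(wk * wk for wk in w)

def satisfies_hard_margin(w, b, X, y):
    # checks y_i (w^T x_i + b) >= 1 for every training point
    return all(
        yi * (sum(wk * xk for wk, xk in zip(w, xi)) + b) >= 1
        for xi, yi in zip(X, y)
    )

X = [[1.0, 0.0], [-1.0, 0.0]]
y = [1, -1]
# all three candidates describe the same hyperplane x_1 = 0; only the scale differs
for w in ([0.5, 0.0], [1.0, 0.0], [2.0, 0.0]):
    print(w, satisfies_hard_margin(w, 0.0, X, y), primal_objective(w))
# [0.5, 0.0] False 0.125
# [1.0, 0.0] True 0.5
# [2.0, 0.0] True 2.0
```

Here $\bold{w}=(1,0)^\top$ is the feasible candidate with the smallest objective, and it is also the true optimum for these two points, since the margin $\frac{2}{\sqrt{\bold{w}^\top\bold{w}}}$ cannot exceed their distance of 2.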

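Introducing slack variables $\xi_i$ allows some data points to fall inside the margin or even on the wrong side of the hyperplane, with the constant $C>0$ weighting the penalty on the slacks (soft-margin extension)

$$
\min_{\bold{w},b,\xi}\frac{\bold{w}^\top\bold{w}}{2}+C\sum_i\xi_i\quad\text{s.t.}\ \forall\bold{x}_i\in\mathcal{D}:\ y_i(\bold{w}^\top\bold{x}_i+b)\ge 1-\xi_i,\ \xi_i\ge 0
$$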
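For a fixed $(\bold{w},b)$, the smallest slack that satisfies both constraints is $\xi_i=\max(0,\,1-y_i(\bold{w}^\top\bold{x}_i+b))$ (the hinge loss). The sketch below evaluates the soft-margin objective this way; the data, $C=1$, and the function names are illustrative assumptions.

```python
def slacks(w, b, X, y):
    # smallest feasible xi_i for fixed (w, b): max(0, 1 - y_i (w^T x_i + b))
    return [
        max(0.0, 1.0 - yi * (sum(wk * xk for wk, xk in zip(w, xi)) + b))
        for xi, yi in zip(X, y)
    ]

def soft_margin_objective(w, b, X, y, C):
    # w^T w / 2 + C * sum_i xi_i
    return 0.5 * sum(wk * wk for wk in w) + C * sum(slacks(w, b, X, y))

X = [[2.0, 0.0], [0.5, 0.0], [-0.5, 0.0], [-2.0, 0.0]]
y = [1, 1, 1, -1]
w, b = [1.0, 0.0], 0.0
# 0 < xi < 1: inside the margin but correctly classified; xi > 1: wrong side
print(slacks(w, b, X, y))  # [0.0, 0.5, 1.5, 0.0]
print(soft_margin_objective(w, b, X, y, C=1.0))  # 2.5
```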

A feature map $\phi:\R^d\to\R^D$ can be applied to project the data points into a higher-dimensional space where they become linearly separable, which gives a nonlinear decision boundary in the original space (nonlinear decision boundary)

$$
\min_{\bold{w},b,\xi}\frac{\bold{w}^\top\bold{w}}{2}+C\sum_i\xi_i\quad\text{s.t.}\ \forall\bold{x}_i\in\mathcal{D}:\ y_i(\bold{w}^\top\phi(\bold{x}_i)+b)\ge 1-\xi_i,\ \xi_i\ge 0
$$
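A classic case is XOR-style data, which no line in $\R^2$ separates. Under the assumption of a degree-2 map $\phi(\bold{x})=(x_1^2,\sqrt{2}x_1x_2,x_2^2)$ (a hypothetical choice for this sketch), the hyperplane $\bold{w}=(0,1,0)^\top$, $b=0$ in $\R^3$ satisfies every constraint with $\xi_i=0$.

```python
import math

def phi(x):
    # hypothetical degree-2 feature map R^2 -> R^3: (x1^2, sqrt(2) x1 x2, x2^2)
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]

# XOR-style data: no line in R^2 puts the two classes on separate sides
X = [[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]]
y = [1, 1, -1, -1]

# in R^3 the hyperplane w = (0, 1, 0), b = 0 separates them: w^T phi(x) = sqrt(2) x1 x2
w, b = [0.0, 1.0, 0.0], 0.0
values = [yi * (sum(wk * fk for wk, fk in zip(w, phi(xi))) + b) for xi, yi in zip(X, y)]
print(all(v >= 1 for v in values))  # True: the constraints hold with xi_i = 0
```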

When $D\gg d$, it becomes expensive to compute $\bold{w}\in\R^D$ directly; to avoid this, the Lagrange dual form is used (dual form of SVM)

$$
\max_{\alpha}\sum_i\alpha_i-\frac{1}{2}\sum_{i,j}\alpha_i\alpha_jy_iy_j\phi(\bold{x}_i)^\top\phi(\bold{x}_j)\quad\text{s.t.}\ \sum_i\alpha_iy_i=0,\ 0\le\alpha_i\le C
$$
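To see the dual at work on the smallest possible case, the sketch below evaluates the dual objective for two points with $\phi$ taken as the identity (an assumption made so that everything is checkable by hand) and any $C\ge 0.5$. With $\alpha=(a,a)$, which satisfies $\sum_i\alpha_iy_i=0$, the objective reduces to $2a-2a^2$ and peaks at $a=0.5$. Recovering $\bold{w}=\sum_i\alpha_iy_i\phi(\bold{x}_i)$, the relation derived below, gives $\bold{w}=(1,0)^\top$, whose primal value $\frac{\bold{w}^\top\bold{w}}{2}=0.5$ matches the dual value. Function names are hypothetical.

```python
def inner(x, z):
    # phi(x)^T phi(z) with phi taken as the identity for this toy example
    return sum(a * c for a, c in zip(x, z))

def dual_objective(alpha, X, y):
    # sum_i alpha_i - 1/2 sum_{i,j} alpha_i alpha_j y_i y_j phi(x_i)^T phi(x_j)
    n = len(X)
    quad = sum(
        alpha[i] * alpha[j] * y[i] * y[j] * inner(X[i], X[j])
        for i in range(n) for j in range(n)
    )
    return sum(alpha) - 0.5 * quad

def recover_w(alpha, X, y):
    # w = sum_i alpha_i y_i phi(x_i)
    d = len(X[0])
    return [sum(a * yi * xi[k] for a, yi, xi in zip(alpha, y, X)) for k in range(d)]

X = [[1.0, 0.0], [-1.0, 0.0]]
y = [1, -1]
for a in (0.25, 0.5, 0.75):
    print(a, dual_objective([a, a], X, y))
# 0.25 0.375
# 0.5 0.5
# 0.75 0.375
print(recover_w([0.5, 0.5], X, y))  # [1.0, 0.0]
```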

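The derivation of the dual form of SVM
For the hard-margin problem with the feature map $\phi$ (the slack variables $\xi_i$ are omitted for brevity; keeping them only adds the upper bound $\alpha_i\le C$), the Lagrange function with multipliers $\alpha_i\ge 0$ is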
$L(\bold{w},b,\alpha)=\frac{\bold{w}^\top\bold{w}}{2}-\sum_{i=1}^n\alpha_i[y_i(\bold{w}^\top\phi(\bold{x}_i)+b)-1]$
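Because $\max_{\alpha\ge 0}L(\bold{w},b,\alpha)$ equals $\frac{\bold{w}^\top\bold{w}}{2}$ when all constraints hold and $+\infty$ otherwise, the original problem is the min-max problem below; since it is convex with linear constraints (and assumed feasible), strong duality holds and the min and max can be swapped
$\min_{\bold{w},b}\max_{\alpha\ge 0}L(\bold{w},b,\alpha)=\max_{\alpha\ge 0}\min_{\bold{w},b}L(\bold{w},b,\alpha)$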
To obtain the dual form, the inner minimization is solved first
$\min_{\bold{w},b}L(\bold{w},b,\alpha)$
The gradients of $L(\bold{w},b,\alpha)$ with respect to $\bold{w}$ and $b$ are
$\nabla_{\bold{w}}L(\bold{w},b,\alpha)=\bold{w}-\sum_{i=1}^n\alpha_iy_i\phi(\bold{x}_i),\quad \nabla_bL(\bold{w},b,\alpha)=-\sum_{i=1}^n\alpha_iy_i$
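Setting both gradients to zero gives the conditions for the inner minimization ($L$ is linear in $b$, so it is unbounded below unless $\sum_i\alpha_iy_i=0$)
$\bold{w}=\sum_{i=1}^n\alpha_iy_i\phi(\bold{x}_i),\quad \sum_{i=1}^n\alpha_iy_i=0$
Substituting these conditions back into $L(\bold{w},b,\alpha)$ eliminates $\bold{w}$ and $b$ and gives the dual form of SVM introduced above.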
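Written out, the substitution expands $L(\bold{w},b,\alpha)$ and then applies the two conditions $\bold{w}=\sum_j\alpha_jy_j\phi(\bold{x}_j)$ and $\sum_i\alpha_iy_i=0$; the middle line uses the fact that $\sum_i\alpha_iy_i\bold{w}^\top\phi(\bold{x}_i)$ equals the same double sum as $\bold{w}^\top\bold{w}$:

$$
\begin{aligned}
L(\bold{w},b,\alpha) &= \frac{\bold{w}^\top\bold{w}}{2}-\sum_{i=1}^n\alpha_iy_i\bold{w}^\top\phi(\bold{x}_i)-b\sum_{i=1}^n\alpha_iy_i+\sum_{i=1}^n\alpha_i\\
&= \frac{1}{2}\sum_{i,j}\alpha_i\alpha_jy_iy_j\phi(\bold{x}_i)^\top\phi(\bold{x}_j)-\sum_{i,j}\alpha_i\alpha_jy_iy_j\phi(\bold{x}_i)^\top\phi(\bold{x}_j)-0+\sum_{i=1}^n\alpha_i\\
&= \sum_{i=1}^n\alpha_i-\frac{1}{2}\sum_{i,j}\alpha_i\alpha_jy_iy_j\phi(\bold{x}_i)^\top\phi(\bold{x}_j)
\end{aligned}
$$

With the slack variables kept, the Lagrangian gains $C\sum_i\xi_i-\sum_i\alpha_i\xi_i-\sum_i\mu_i\xi_i$, where $\mu_i\ge 0$ are the multipliers for $\xi_i\ge 0$. The extra stationarity condition $C-\alpha_i-\mu_i=0$ cancels these terms and, together with $\mu_i\ge 0$, gives the box constraint $0\le\alpha_i\le C$.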

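Since the data enter the dual only through the inner products $\phi(\bold{x}_i)^\top\phi(\bold{x}_j)$, a kernel function $K(\bold{x}_i,\bold{x}_j)$ that computes $\phi(\bold{x}_i)^\top\phi(\bold{x}_j)$ directly from $\bold{x}_i$ and $\bold{x}_j$ can be used instead, with which the problem becomes (kernel trick)

$$
\max_{\alpha}\sum_i\alpha_i-\frac{1}{2}\sum_{i,j}\alpha_i\alpha_jy_iy_jK(\bold{x}_i,\bold{x}_j)\quad\text{s.t.}\ \sum_i\alpha_iy_i=0,\ 0\le\alpha_i\le C
$$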
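A quick numerical check of the kernel trick, assuming the degree-2 polynomial kernel $K(\bold{x},\bold{z})=(\bold{x}^\top\bold{z})^2$ on $\R^2$ as the example: it equals $\phi(\bold{x})^\top\phi(\bold{z})$ for $\phi(\bold{x})=(x_1^2,\sqrt{2}x_1x_2,x_2^2)$, yet it never forms the 3-dimensional features. The kernel choice, the test points, and the function names are illustrative.

```python
import math

def phi(x):
    # explicit degree-2 feature map R^2 -> R^3 (illustrative choice)
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]

def poly2_kernel(x, z):
    # K(x, z) = (x^T z)^2, computed in R^2 without forming phi
    return sum(a * c for a, c in zip(x, z)) ** 2

x, z = [1.0, 2.0], [3.0, -1.0]
explicit = sum(a * c for a, c in zip(phi(x), phi(z)))
print(poly2_kernel(x, z), explicit)  # equal up to floating-point rounding
```

For a general polynomial or RBF kernel the explicit feature space grows quickly (or is infinite-dimensional for RBF), which is exactly why the dual form, which needs only $K$, is preferred.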

This kernelized dual is the final form of SVM.
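A minimal end-to-end sketch of the final form: it solves the kernelized dual with a simplified, deterministic variant of SMO (sequential minimal optimization, a standard dual solver that this note does not cover), then classifies with $f(\bold{x})=\sum_i\alpha_iy_iK(\bold{x}_i,\bold{x})+b$, which follows from $\bold{w}=\sum_i\alpha_iy_i\phi(\bold{x}_i)$. The RBF kernel, `gamma=0.5`, `C=10.0`, the tolerances, and all function names are illustrative assumptions.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    # K(x, z) = exp(-gamma * ||x - z||^2); gamma = 0.5 is an arbitrary illustrative choice
    return math.exp(-gamma * sum((a - c) ** 2 for a, c in zip(x, z)))

def linear_kernel(x, z):
    # K(x, z) = x^T z, i.e. phi is the identity
    return sum(a * c for a, c in zip(x, z))

def smo_train(X, y, kernel, C=10.0, tol=1e-7, max_passes=200):
    """Simplified, deterministic SMO for the kernelized dual (a sketch, not production code)."""
    n = len(X)
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]  # Gram matrix
    alpha = [0.0] * n  # feasible start: sum_i alpha_i y_i = 0
    b = 0.0

    def f(k):
        # current decision value at training point k
        return sum(alpha[m] * y[m] * K[m][k] for m in range(n)) + b

    for _ in range(max_passes):
        changed = 0
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                E_i, E_j = f(i) - y[i], f(j) - y[j]
                # alpha_j must stay in [L, H] so that 0 <= alpha <= C
                # while alpha_i y_i + alpha_j y_j stays constant
                if y[i] != y[j]:
                    L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
                else:
                    L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
                eta = K[i][i] + K[j][j] - 2.0 * K[i][j]  # curvature along the pair direction
                if H - L < 1e-12 or eta <= 1e-12:
                    continue  # no room to move, or non-positive curvature (skipped here)
                # exact maximizer of the dual along this pair, clipped to the box
                a_j = min(H, max(L, alpha[j] + y[j] * (E_i - E_j) / eta))
                if abs(a_j - alpha[j]) < tol:
                    continue
                a_i = alpha[i] + y[i] * y[j] * (alpha[j] - a_j)
                d_i, d_j = a_i - alpha[i], a_j - alpha[j]
                # bias that makes f(x_i) = y_i (resp. f(x_j) = y_j) for a free multiplier
                b1 = b - E_i - y[i] * d_i * K[i][i] - y[j] * d_j * K[i][j]
                b2 = b - E_j - y[i] * d_i * K[i][j] - y[j] * d_j * K[j][j]
                alpha[i], alpha[j] = a_i, a_j
                if 0.0 < a_i < C:
                    b = b1
                elif 0.0 < a_j < C:
                    b = b2
                else:
                    b = 0.5 * (b1 + b2)
                changed += 1
        if changed == 0:
            break
    return alpha, b

def decision(x, X, y, alpha, b, kernel):
    # f(x) = sum_i alpha_i y_i K(x_i, x) + b
    return sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, X)) + b

# XOR-style data that no line separates in R^2
X = [[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]]
y = [1, 1, -1, -1]
alpha, b = smo_train(X, y, rbf_kernel)
preds = [1 if decision(x, X, y, alpha, b, rbf_kernel) > 0 else -1 for x in X]
print(preds)  # expected to reproduce y
```

The pair loop visits every $(i,j)$ in a fixed order instead of using Platt's working-set heuristics, which keeps the sketch short and deterministic but would be far too slow for real datasets; with `linear_kernel` on the two points $(1,0)$ and $(-1,0)$ it returns $\alpha=(0.5,0.5)$ and $b=0$.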