Overview

$$
\begin{aligned}
wx_i + b &\ge 1 \text{ when } y_i = 1 \\
wx_i + b &\le -1 \text{ when } y_i = -1 \\
\rightarrow y_i(wx_i + b) &\ge 1
\end{aligned}
$$

$$
\begin{aligned}
wx^+ + b &= 1 \\
wx^- + b &= -1 \\
\rightarrow w(x^+ - x^-) &= 2 \\
M &= \frac{(x^+ - x^-) \cdot w}{\|w\|} = \frac{2}{\|w\|}
\end{aligned}
$$

  • Maximizing $$M$$ is equivalent to minimizing $$\frac{1}{2}w^Tw$$, subject to $$y_i(wx_i + b) \ge 1$$ for all $$i$$
    • Quadratic optimization problem; solve for $$w$$ and $$b$$ (see the sketch below)
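
A minimal sketch of this primal problem on a toy 2-D dataset, using scipy's general-purpose SLSQP solver (a dedicated QP solver would be the usual choice in practice; the data and all names here are illustrative assumptions, not part of the notes):

```python
import numpy as np
from scipy.optimize import minimize

# Toy separable 2-D data (hypothetical, for illustration only)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def objective(wb):
    w = wb[:2]
    return 0.5 * w @ w   # (1/2) w^T w; b enters only through the constraints

# One constraint y_i (w x_i + b) - 1 >= 0 per training point
constraints = [
    {"type": "ineq", "fun": lambda wb, i=i: y[i] * (wb[:2] @ X[i] + wb[2]) - 1.0}
    for i in range(len(X))
]

# Start from a feasible (w, b) to keep the generic solver happy
res = minimize(objective, x0=np.array([1.0, 1.0, 0.0]), constraints=constraints)
w, b = res.x[:2], res.x[2]
print("w =", w, "b =", b)
print("margin M = 2/||w|| =", 2 / np.linalg.norm(w))
```

The last line prints the margin $$M = 2/\|w\|$$ from the derivation above.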

Solving the Quadratic Optimization Problem

Dual Problem

Find Lagrange multipliers $$\alpha_i$$ s.t.

  • $$Q(\alpha) = \sum_i \alpha_i - \frac{1}{2}\sum_i\sum_j \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$ is maximized
  • $$\sum_i \alpha_i y_i = 0$$
  • $$\alpha_i \ge 0$$
Solution
  • $$w = \sum \alpha_i y_i x_i$$
  • $$b = y_k - w^T x_k$$ for any $$x_k$$ s.t. $$\alpha_k \ne 0$$ (i.e. $$x_k$$ is a support vector)
Classifying Function
  • $$f(x) = wx + b = \sum_i \alpha_i y_i K(x_i, x) + b$$ (training involves computing inner products between all pairs of training points; classification only needs inner products with the support vectors, since $$\alpha_i = 0$$ for the rest; see the sketch below)
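
As a sanity check, the whole pipeline can be sketched by solving the dual directly with a generic optimizer (numpy/scipy assumed; the toy data matches the primal sketch above and is made up for illustration):

```python
import numpy as np
from scipy.optimize import minimize

# Same toy data as the primal sketch (hypothetical)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
K = X @ X.T                          # linear kernel: K(x_i, x_j) = x_i^T x_j

def neg_Q(a):
    # minimize -Q(alpha) = -(sum_i a_i - 1/2 sum_ij a_i a_j y_i y_j K_ij)
    return -(a.sum() - 0.5 * a @ (np.outer(y, y) * K) @ a)

res = minimize(neg_Q, x0=np.ones(len(X)),
               constraints=[{"type": "eq", "fun": lambda a: a @ y}],
               bounds=[(0, None)] * len(X))
alpha = res.x
w = (alpha * y) @ X                  # w = sum_i alpha_i y_i x_i
k = int(np.argmax(alpha))            # pick a support vector (alpha_k > 0)
b = y[k] - w @ X[k]                  # b = y_k - w^T x_k
print("alpha =", alpha.round(3))
print("w =", w, "b =", b)
print("f(x) on training points:", (X @ w + b).round(3))
```

The non-zero entries of `alpha` pick out the support vectors, and $$f(x_k) = \pm 1$$ for those points.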

Soft Margin Classification

Introduce slack variables $$\epsilon_i$$ to tolerate noise (points inside the margin or misclassified).

  • Minimize $$\frac{1}{2}w^Tw + C \sum_{i=1}^R \epsilon_i$$, where $$y_i(wx_i + b) \ge 1 - \epsilon_i$$, $$\epsilon_i \ge 0$$ for all $$i$$
  • $$C$$ controls the trade-off between margin width and slack, i.e. overfitting (see the sketch below)
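
A small sketch of the effect of $$C$$, assuming scikit-learn (the synthetic blobs and the specific $$C$$ values are arbitrary): small $$C$$ buys a wider margin at the cost of slack, large $$C$$ penalizes slack heavily.

```python
import numpy as np
from sklearn.svm import SVC

# Two overlapping Gaussian blobs (synthetic, for illustration)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1.2, (50, 2)), rng.normal(1, 1.2, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = (clf.dual_coef_ @ clf.support_vectors_).ravel()   # w = sum alpha_i y_i x_i
    print(f"C={C:>6}: margin = {2 / np.linalg.norm(w):.3f}, "
          f"support vectors = {len(clf.support_)}")
```

As $$C$$ grows, the margin shrinks and fewer points end up as support vectors.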

Non-Linear SVM

Kernel Function
  • Linear classifier: $$K(x_i, x_j) = x_i^Tx_j$$
  • With mapping $$\Phi: x \rightarrow \phi(x)$$, $$K(x_i, x_j) = \phi(x_i)^T\phi(x_j)$$

    • Polynomial kernel of power $$p$$: $$K(x_i, x_j) = (1 + x_i^Tx_j)^p$$

    • Gaussian kernel: $$K(x_i, x_j) = e^{-\frac{\|x_i - x_j\|^2}{2\sigma^2}}$$

    • Sigmoid kernel: $$K(x_i, x_j) = \tanh(\beta_0 x_i^Tx_j + \beta_1)$$
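
The three kernels above can be sketched as plain functions; scikit-learn's SVC accepts such a callable and expects it to return the Gram matrix for two sets of points (the XOR-style toy data and the parameter values are assumptions for illustration):

```python
import numpy as np
from sklearn.svm import SVC

def polynomial(X1, X2, p=2):
    # (1 + x_i^T x_j)^p
    return (1 + X1 @ X2.T) ** p

def gaussian(X1, X2, sigma=1.0):
    # exp(-||x_i - x_j||^2 / (2 sigma^2))
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def sigmoid(X1, X2, beta0=0.5, beta1=-1.0):
    # tanh(beta0 x_i^T x_j + beta1)
    return np.tanh(beta0 * X1 @ X2.T + beta1)

# XOR-style toy data: not linearly separable
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel=gaussian, C=10.0).fit(X, y)   # any of the callables works here
print(clf.predict(X))
```

A linear kernel cannot separate this data, but the Gaussian (or polynomial) kernel classifies it via the implicit mapping $$\phi$$.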

Properties
  • Flexibility in choosing kernel function
  • Sparseness of solution
    • Only support vectors used
  • Able to handle large feature spaces
  • Overfitting controlled by the soft margin approach
  • Always converges to a single global solution (the problem is convex)
  • Feature selection
