Optimization Objective
Cost Function
$$\min_{\theta}\ C \sum_{i=1}^{m}\left[ y^{(i)} \text{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)})\, \text{cost}_0(\theta^T x^{(i)}) \right] + \frac{1}{2} \sum_{j=1}^{n} \theta_j^2$$
- $$C$$ functions similarly to $$\frac{1}{\lambda}$$
- The $$\frac{1}{m}$$ factor is dropped; scaling the objective by a positive constant does not change the minimizing $$\theta$$
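A minimal NumPy sketch of this objective, assuming the hinge-style surrogates $$\text{cost}_1$$ and $$\text{cost}_0$$ from the lectures (flat beyond margins of $$\pm 1$$, linear otherwise); the function names and array shapes are assumptions.

```python
import numpy as np

def cost1(z):
    # Surrogate for y = 1: zero once z >= 1, grows linearly as z decreases.
    return np.maximum(0.0, 1.0 - z)

def cost0(z):
    # Surrogate for y = 0: zero once z <= -1, grows linearly as z increases.
    return np.maximum(0.0, 1.0 + z)

def svm_objective(theta, X, y, C):
    # X: (m, n) design matrix, y: (m,) labels in {0, 1}, theta: (n,) parameters.
    z = X @ theta                               # theta^T x^(i) for every example
    data_term = C * np.sum(y * cost1(z) + (1 - y) * cost0(z))
    reg_term = 0.5 * np.sum(theta ** 2)         # (1/2) * sum_j theta_j^2
    return data_term + reg_term
```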
Hypothesis
$$h_{\theta}(x) = \begin{cases} 1 & \text{if } \theta^T x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
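The corresponding prediction, continuing the sketch above:

```python
def predict(theta, X):
    # h_theta(x) = 1 if theta^T x >= 0, else 0, applied to every row of X.
    return (X @ theta >= 0).astype(int)
```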
Large Margin Classifier
Kernels
Given $$x$$, compute new features depending on its proximity to landmarks $$l^{(1)}, l^{(2)}, \dots$$
$$f_i = \text{similarity}(x, l^{(i)}) = e^{-\frac{\|x - l^{(i)}\|^2}{2\sigma^2}}$$
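A minimal NumPy sketch of this similarity (the function name is mine):

```python
import numpy as np

def gaussian_similarity(x, landmark, sigma2):
    # f = exp(-||x - l||^2 / (2 sigma^2)): close to 1 when x is near the landmark, near 0 when far.
    return np.exp(-np.sum((x - landmark) ** 2) / (2 * sigma2))
```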
Choosing the Landmarks
Choose $$l^{(i)} = x^{(i)}$$ for $$i = 1, \dots, m$$, i.e., place one landmark at each training example.
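A sketch of the resulting feature mapping, with one landmark per training example (the function name and array shapes are assumptions):

```python
import numpy as np

def kernel_features(X, landmarks, sigma2):
    # Pairwise squared distances between each example (rows of X) and each landmark,
    # then f = exp(-d^2 / (2 sigma^2)); result has shape (m, number of landmarks).
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma2))

# Landmarks are the training examples themselves: l^(i) = x^(i)
# F = kernel_features(X_train, X_train, sigma2)
```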
SVM Parameters
- $$C$$
- Large: lower bias, higher variance
- Small: higher bias, lower variance
- $$\sigma^2$$
- Large: features vary more smoothly, higher bias, lower variance
- Small: features vary less smoothly, lower bias, higher variance
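A quick numeric illustration of the $$\sigma^2$$ effect (the distance value below is arbitrary): for the same squared distance, a larger $$\sigma^2$$ keeps the similarity closer to 1, i.e., the feature falls off more smoothly as $$x$$ moves away from the landmark.

```python
import numpy as np

d2 = 4.0                                  # an arbitrary squared distance ||x - l||^2
for sigma2 in (0.5, 1.0, 10.0):
    f = np.exp(-d2 / (2 * sigma2))
    print(f"sigma^2 = {sigma2:>4}: f = {f:.4f}")   # larger sigma^2 -> larger f, smoother variation
```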
Using an SVM
- Choose parameter $$C$$
- Choose kernel (similarity function)
- Linear kernel (no kernel)
- Predict $$y = 1$$ if $$\theta^T x \ge 0$$
- Gaussian kernel
- $$f_i = e^{-\frac{\|x-l^{(i)}\|^2}{2\sigma^2}}, \text{ where } l^{(i)} = x^{(i)}$$
- Choose $$\sigma^2$$
- Feature scaling
- Others
- Need to satisfy Mercer's Theorem
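In practice the optimization is delegated to a library. Below is a sketch with scikit-learn (the library choice, the toy data, and the parameter values are assumptions, not part of the notes). scikit-learn writes the RBF kernel as $$e^{-\gamma \|x - l\|^2}$$, so $$\gamma = \frac{1}{2\sigma^2}$$, and feature scaling is applied before fitting; swapping `kernel="rbf"` for `kernel="linear"` gives the no-kernel case.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy data: the label depends nonlinearly on the two features, so a linear kernel would underfit.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)

sigma2 = 0.5                                    # choose C and sigma^2 with a cross-validation set
clf = make_pipeline(
    StandardScaler(),                           # feature scaling before the Gaussian kernel
    SVC(C=1.0, kernel="rbf", gamma=1.0 / (2 * sigma2)),
)
clf.fit(X, y)
print(clf.score(X, y))                          # training accuracy on the toy data
```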
Multi-Class Classification - One-vs-All Method
Train $$K$$ SVMs, one distinguishing each class $$y = i$$ from the rest, obtaining parameters $$\theta^{(1)}, \dots, \theta^{(K)}$$. Predict the class $$i$$ with the largest $$(\theta^{(i)})^T x$$.
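A sketch of the prediction step, assuming the $$K$$ learned parameter vectors are stacked as rows of a $$K \times n$$ matrix `Theta` (the name and layout are assumptions):

```python
import numpy as np

def one_vs_all_predict(Theta, X):
    # scores[i, k] = (theta^(k))^T x^(i); predict the class with the largest score.
    scores = X @ Theta.T                # shape (m, K)
    return np.argmax(scores, axis=1)
```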
Logistic Regression vs. SVM
Let $$n$$ = # of features, $$m$$ = # of training examples.

| Setting | Recommendation |
| --- | --- |
| $$n$$ large (relative to $$m$$) | Logistic regression, or SVM with linear kernel |
| $$n$$ small (1-1,000), $$m$$ intermediate (10-10,000) | SVM with Gaussian kernel |
| $$n$$ small (1-1,000), $$m$$ large (50,000+) | Create more features, then logistic regression or SVM with linear kernel |
A neural network is likely to work well in all of these settings, but may be slower to train.