Troubleshooting
- Getting more training examples
  - Fixes high variance
- Trying smaller sets of features
  - Fixes high variance
- Trying additional features
  - Fixes high bias
- Trying polynomial features
  - Fixes high bias
- Increasing λ
  - Fixes high variance
- Decreasing λ
  - Fixes high bias
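For quick reference, the same diagnosis-to-remedy mapping as a minimal Python sketch; the strings are just labels taken from the list above, not from any library:

```python
# Each remedy and the problem it addresses (high bias vs. high variance).
REMEDIES = {
    "get more training examples": "high variance",
    "try smaller sets of features": "high variance",
    "try additional features": "high bias",
    "try polynomial features": "high bias",
    "increase lambda": "high variance",
    "decrease lambda": "high bias",
}

def suggest_fixes(diagnosis):
    """Remedies worth trying for a given diagnosis."""
    return [r for r, fixes in REMEDIES.items() if fixes == diagnosis]
```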
Evaluating a Hypothesis
- Split dataset into training (70%) & test (30%) sets
- Learn $$\Theta$$ and minimize $$J_{train}(\Theta)$$
- Compute the test set error
- Linear regression
$$ J_{test}(\Theta) = \frac{1}{2m_{test}} \sum_{i=1}^{m_{test}} (h_\Theta(x_{test}^{(i)}) - y_{test}^{(i)})^2 $$
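A minimal numpy sketch of this computation, assuming a linear hypothesis $$h_\Theta(x) = \Theta^T x$$ and a precomputed design matrix (names are illustrative):

```python
import numpy as np

def j_test(theta, X_test, y_test):
    """Squared-error test cost J_test for linear regression.

    X_test: (m_test, n) design matrix; y_test: (m_test,) targets.
    """
    m_test = len(y_test)
    predictions = X_test @ theta  # h_theta(x) for each test example
    return np.sum((predictions - y_test) ** 2) / (2 * m_test)
```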
- Classification
$$ \begin{aligned} err(h_\Theta(x), y) &= 1 && \text{if } h_\Theta(x) \ge 0.5 \text{ and } y = 0, \text{ or } h_\Theta(x) < 0.5 \text{ and } y = 1 \\ &= 0 && \text{otherwise} \\ \text{Test error} &= \frac{1}{m_{test}} \sum_{i=1}^{m_{test}} err(h_\Theta(x_{test}^{(i)}), y_{test}^{(i)}) \end{aligned} $$
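A matching sketch for the classification case, assuming $$h_\Theta(x)$$ outputs probabilities in $$[0, 1]$$:

```python
import numpy as np

def classification_test_error(probs, y_test, threshold=0.5):
    """Fraction of misclassified test examples (the 0/1 error above).

    probs: h_theta(x) in [0, 1] for each test example.
    """
    predictions = (np.asarray(probs) >= threshold).astype(int)
    return np.mean(predictions != np.asarray(y_test))  # mean of err(h(x), y)
```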
Model Selection
- Split dataset into training (60%), cross validation (20%) & test set (20%)
- Learn $$\Theta$$ and minimize $$J_{train}(\Theta)$$ on training set for each polynomial degree
- Find the polynomial degree $$d$$ with the least error using the cross validation set
- Estimate the generalization error using the test set with $$J_{test}(\Theta^{(d)})$$
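A sketch of this selection loop for one-dimensional data, using numpy's polynomial fitting as the training step (np.polyfit minimizes the squared error; the 60/20/20 split is assumed to be done already):

```python
import numpy as np

def half_mse(theta, x, y):
    """Squared-error cost for a fitted polynomial."""
    return np.sum((np.polyval(theta, x) - y) ** 2) / (2 * len(y))

def select_degree(x_train, y_train, x_cv, y_cv, max_degree=10):
    """Fit one polynomial per degree on the training set,
    return the degree (and theta) with the lowest CV error."""
    best = None
    for d in range(1, max_degree + 1):
        theta = np.polyfit(x_train, y_train, d)  # minimize J_train for degree d
        err = half_mse(theta, x_cv, y_cv)        # J_cv for this degree
        if best is None or err < best[2]:
            best = (d, theta, err)
    return best[0], best[1]

# Generalization estimate afterwards: half_mse(theta, x_test, y_test)
```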
Bias & Variance
High Bias (Underfitting)
- Both $$J_{train}(\Theta)$$ & $$J_{CV}(\Theta)$$ high
- $$J_{train}(\Theta) \approx J_{CV}(\Theta)$$
High Variance (Overfitting)
- $$J_{train}(\Theta)$$ low
- $$J_{CV}(\Theta) \gg J_{train}(\Theta)$$
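These two patterns can be folded into a rough diagnostic helper; the target error and gap ratio below are arbitrary placeholders, since in practice the gap is judged from learning curves:

```python
def diagnose(j_train, j_cv, target_error, gap_ratio=2.0):
    """Crude bias/variance call from training and CV error.

    target_error: the error level you would consider acceptable
    (e.g. a baseline or human-level figure) -- an assumed input.
    """
    if j_train > target_error and j_cv > target_error:
        return "high bias (underfitting): both errors high, J_train ~ J_cv"
    if j_cv > gap_ratio * j_train:
        return "high variance (overfitting): J_cv >> J_train"
    return "no strong bias/variance signal"
```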
Regularization
- Create a list of $$\lambda$$ values (e.g. $$\lambda \in \{ 0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24 \}$$, roughly doubling each step)
- Create a set of models with different degrees or any other variants
- Iterate through the $$\lambda$$ values, and for each one, go through all the models to learn some $$\Theta$$
- Compute the cross validation error $$J_{CV}(\Theta)$$ using the $$\Theta$$ learned with regularization, but evaluate the error without the regularization term (i.e. with $$\lambda = 0$$)
- Select the best combo that produces the lowest error on the cross validation set
- Using the best combination of $$\Theta$$ and $$\lambda$$, evaluate $$J_{test}(\Theta)$$ to see whether the model generalizes well (sketched below)
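A sketch of this procedure for polynomial ridge regression; the closed-form regularized fit below is one concrete way to "learn some $$\Theta$$", and the design matrices are assumed to be built with poly_features:

```python
import numpy as np

def poly_features(x, d):
    """Design matrix [1, x, x^2, ..., x^d] for 1-D inputs."""
    return np.vander(x, d + 1, increasing=True)

def fit_ridge(X, y, lam):
    """Regularized normal equation; the intercept column is not penalized."""
    L = lam * np.eye(X.shape[1])
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + L, X.T @ y)

def cv_error(theta, X, y):
    """J_cv evaluated WITHOUT the regularization term (lambda = 0)."""
    return np.sum((X @ theta - y) ** 2) / (2 * len(y))

lambdas = [0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24]

def select_lambda(X_train, y_train, X_cv, y_cv):
    """Return the (lambda, theta) pair with the lowest unregularized CV error."""
    fits = [(lam, fit_ridge(X_train, y_train, lam)) for lam in lambdas]
    return min(fits, key=lambda pair: cv_error(pair[1], X_cv, y_cv))
```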
Learning Curves
- As the training set gets larger, the training error for a given model (e.g. a quadratic function) increases
- The error value will plateau after a certain training set size $$m$$
High Bias
- Low training set size
- Causes $$J_{train}(\Theta)$$ to be low and $$J_{CV}(\Theta)$$ to be high
- Large training set size
- Causes both $$J_{train}(\Theta)$$ and $$J_{CV}(\Theta)$$ to be high
Getting more data will not help.
High Variance
- Low training set size
- Causes $$J_{train}(\Theta)$$ to be low and $$J_{CV}(\Theta)$$ to be high
- Large training set size
- $$J_{train}(\Theta)$$ increases with training set size
- $$J_{CV}(\Theta)$$ decreases without leveling off
- $$J_{train}(\Theta) < J_{CV}(\Theta)$$, but the difference remains significant
Getting more data will likely help.
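Both curve shapes can be generated with a loop like the following, assuming the same linear-regression setup as the earlier sketches (plotting omitted):

```python
import numpy as np

def squared_error(theta, X, y):
    return np.sum((X @ theta - y) ** 2) / (2 * len(y))

def learning_curve(X_train, y_train, X_cv, y_cv):
    """J_train and J_cv as functions of the training set size m."""
    j_train, j_cv = [], []
    for m in range(1, len(y_train) + 1):
        Xm, ym = X_train[:m], y_train[:m]
        theta, *_ = np.linalg.lstsq(Xm, ym, rcond=None)  # fit on the first m examples
        j_train.append(squared_error(theta, Xm, ym))     # error on the m examples it saw
        j_cv.append(squared_error(theta, X_cv, y_cv))    # error on the full CV set
    return j_train, j_cv
```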
Diagnosing Neural Networks
- Fewer parameters / lower-order polynomials
  - Prone to underfitting
  - Computationally cheaper
- More parameters / higher-order polynomials
  - Prone to overfitting
  - Computationally more expensive
  - Use regularization (increase λ) to address the overfitting
Error Analysis
- Approach to solving ML problems
- Start with a simple algorithm, implement it quickly, test it early on your cross validation data
- Plot learning curves to decide if more data, more features, etc. are likely to help
- Manually examine the errors on examples in the cross validation set, and try to spot systematic trends in where most of the errors were made (sketched below)
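The manual examination step might be organized like this; the per-example category tags are hand-assigned and purely hypothetical (e.g. spam-classifier error types):

```python
from collections import Counter

def error_breakdown(labels, predictions, categories):
    """Tally hand-assigned categories over misclassified CV examples.

    categories: one hand-made tag per example (e.g. "misspellings",
    "pharma spam" for a spam classifier) -- hypothetical labels.
    """
    missed = [cat for y, pred, cat in zip(labels, predictions, categories)
              if pred != y]
    return Counter(missed).most_common()  # largest error buckets first
```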
Skewed Data
| | Actual class 1 | Actual class 0 |
|---|---|---|
| Predicted class 1 | True positive | False positive |
| Predicted class 0 | False negative | True negative |
- Error metrics
  - Accuracy = (True positive + True negative) / Total examples
  - Precision = True positive / (True positive + False positive)
  - Recall = True positive / (True positive + False negative)
  - F1-score = 2 * Precision * Recall / (Precision + Recall)
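A small sketch computing all four metrics from the confusion-matrix cells (assumes the relevant denominators are nonzero):

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```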