Problem Formulation

Algorithm

Choose features $$x_i$$ you think might be indicative of anomalous examples
Fit parameters $$\mu_1, ..., \mu_n, \sigma^2_1, ..., \sigma^2_n$$
1. $$\muj = \frac{1}{m}\sum^m{i=1}x_j^{(i)}$$
2. $$\sigmaj = \frac{1}{m} \sum{i=1}^m (x_j^{(i)} - \mu_j)^2$$
Given new example $$x$$, compute $$p(x)$$
1. $$p(x) = \prod^n{j=1}p(x_j; \mu_j, \sigma_j^2) = \prod^n{j=1} \frac{1}{\sqrt{2\pi}\sigma_j} e^{-\frac{(x_j-\mu_j)^2}{2\sigma_j^2}}$$
Anomaly if $$p(x) < \epsilon$$

Fit model $$p(x)$$ on training set
On cross-validation/test example $$x$$, predict:
1. $$y = 1$$ if $$p(x) < \epsilon$$ (anomaly)
2. $$y = 0$$ if $$p(x) \ge \epsilon$$
Evaluation metrics
1. True positive, true negative, false positive, false negative
2. Precision & recall
3. F1-score

Anomaly Detection	Supervised Learning
Very small number of positive examples (1-20); large number of negative examples	Large number of positive & negative examples
Many different types of anomalies (because future anomalies may look totally different from what have been seen)	Enough positive examples for the algorithm to learn what positive examples should look like

Choose features that are uncommonly large or small under anomalous events, e.g. $$x = \frac{\text{CPU load}}{\text{network traffic}}$$.

Parameters: $$\mu \in \mathbb{R}^n$$, $$\Sigma \in \mathbb{R}^{n \times n}$$(covariance matrix)

$$\muj = \frac{1}{m}\sum^m{i=1}x_j^{(i)}$$
$$\Sigma = \frac{1}{m}\sum^m_{i=1}(x^{(i)} - \mu)(x^{(i)} - \mu)^T$$
$$\Sigma = \begin{bmatrix} \sigma_1^2 & 0 & \dots & 0 \ 0 & \sigma_2^2 & \dots & 0 \ \vdots & \vdots & \ddots & \vdots \ 0 & \dots & \dots & \sigma_n^2\end{bmatrix}$$
$$p(x) = \prod^n_{j=1}p(x_j; \mu_j, \Sigma) = \frac{1}{\sqrt{2\pi}^n\sqrt{|\Sigma|}} e^{-\frac{(x - \mu)(x - \mu)^T}{2\Sigma}}$$

Original Model	Multivariate Gaussian Model
Manually create features to capture anomalies	Automatically capture correlations between features
Computationally cheaper	Computationally more expensive
If m small	If Σ invertible, i.e. if m > n