Problem Formulation
Algorithm
- Choose features $$x_i$$ you think might be indicative of anomalous examples
- Fit parameters $$\mu_1, ..., \mu_n, \sigma^2_1, ..., \sigma^2_n$$
- $$\muj = \frac{1}{m}\sum^m{i=1}x_j^{(i)}$$
- $$\sigmaj = \frac{1}{m} \sum{i=1}^m (x_j^{(i)} - \mu_j)^2$$
- Given new example $$x$$, compute $$p(x)$$
- $$p(x) = \prod^n{j=1}p(x_j; \mu_j, \sigma_j^2) = \prod^n{j=1} \frac{1}{\sqrt{2\pi}\sigma_j} e^{-\frac{(x_j-\mu_j)^2}{2\sigma_j^2}}$$
- Anomaly if $$p(x) < \epsilon$$
Evaluation
- Fit model $$p(x)$$ on training set
- On cross-validation/test example $$x$$, predict:
- $$y = 1$$ if $$p(x) < \epsilon$$ (anomaly)
- $$y = 0$$ if $$p(x) \ge \epsilon$$
- Evaluation metrics
- True positive, true negative, false positive, false negative
- Precision & recall
- F1-score
Anomaly Detection vs Supervised Learning
Anomaly Detection | Supervised Learning |
---|---|
Very small number of positive examples (1-20); large number of negative examples | Large number of positive & negative examples |
Many different types of anomalies (because future anomalies may look totally different from what have been seen) | Enough positive examples for the algorithm to learn what positive examples should look like |
Features
Non-Gaussian Features
$$log(x)$$
$$log(x + n)$$
$$\sqrt{x}$$
$$x^(\frac{1}{n})$$
Choice of Features
Choose features that are uncommonly large or small under anomalous events, e.g. $$x = \frac{\text{CPU load}}{\text{network traffic}}$$.
Multivariate Gaussian Distribution
Parameters: $$\mu \in \mathbb{R}^n$$, $$\Sigma \in \mathbb{R}^{n \times n}$$(covariance matrix)
- $$\muj = \frac{1}{m}\sum^m{i=1}x_j^{(i)}$$
- $$\Sigma = \frac{1}{m}\sum^m_{i=1}(x^{(i)} - \mu)(x^{(i)} - \mu)^T$$
$$\Sigma = \begin{bmatrix} \sigma_1^2 & 0 & \dots & 0 \ 0 & \sigma_2^2 & \dots & 0 \ \vdots & \vdots & \ddots & \vdots \ 0 & \dots & \dots & \sigma_n^2\end{bmatrix}$$
$$p(x) = \prod^n_{j=1}p(x_j; \mu_j, \Sigma) = \frac{1}{\sqrt{2\pi}^n\sqrt{|\Sigma|}} e^{-\frac{(x - \mu)(x - \mu)^T}{2\Sigma}}$$
Original Model | Multivariate Gaussian Model |
---|---|
Manually create features to capture anomalies | Automatically capture correlations between features |
Computationally cheaper | Computationally more expensive |
If m small | If Σ invertible, i.e. if m > n |