Problem Formulation

Algorithm

  1. Choose features $$x_i$$ you think might be indicative of anomalous examples
  2. Fit parameters $$\mu_1, ..., \mu_n, \sigma^2_1, ..., \sigma^2_n$$
    1. $$\muj = \frac{1}{m}\sum^m{i=1}x_j^{(i)}$$
    2. $$\sigmaj = \frac{1}{m} \sum{i=1}^m (x_j^{(i)} - \mu_j)^2$$
  3. Given new example $$x$$, compute $$p(x)$$
    1. $$p(x) = \prod^n{j=1}p(x_j; \mu_j, \sigma_j^2) = \prod^n{j=1} \frac{1}{\sqrt{2\pi}\sigma_j} e^{-\frac{(x_j-\mu_j)^2}{2\sigma_j^2}}$$
  4. Anomaly if $$p(x) < \epsilon$$

Evaluation

  1. Fit model $$p(x)$$ on training set
  2. On cross-validation/test example $$x$$, predict:
    1. $$y = 1$$ if $$p(x) < \epsilon$$ (anomaly)
    2. $$y = 0$$ if $$p(x) \ge \epsilon$$
  3. Evaluation metrics
    1. True positive, true negative, false positive, false negative
    2. Precision & recall
    3. F1-score

Anomaly Detection vs Supervised Learning

Anomaly Detection Supervised Learning
Very small number of positive examples (1-20); large number of negative examples Large number of positive & negative examples
Many different types of anomalies (because future anomalies may look totally different from what have been seen) Enough positive examples for the algorithm to learn what positive examples should look like

Features

Non-Gaussian Features

$$log(x)$$
$$log(x + n)$$
$$\sqrt{x}$$
$$x^(\frac{1}{n})$$

Choice of Features

Choose features that are uncommonly large or small under anomalous events, e.g. $$x = \frac{\text{CPU load}}{\text{network traffic}}$$.

Multivariate Gaussian Distribution

Parameters: $$\mu \in \mathbb{R}^n$$, $$\Sigma \in \mathbb{R}^{n \times n}$$(covariance matrix)

  • $$\muj = \frac{1}{m}\sum^m{i=1}x_j^{(i)}$$
  • $$\Sigma = \frac{1}{m}\sum^m_{i=1}(x^{(i)} - \mu)(x^{(i)} - \mu)^T$$
  • $$\Sigma = \begin{bmatrix} \sigma_1^2 & 0 & \dots & 0 \ 0 & \sigma_2^2 & \dots & 0 \ \vdots & \vdots & \ddots & \vdots \ 0 & \dots & \dots & \sigma_n^2\end{bmatrix}$$

  • $$p(x) = \prod^n_{j=1}p(x_j; \mu_j, \Sigma) = \frac{1}{\sqrt{2\pi}^n\sqrt{|\Sigma|}} e^{-\frac{(x - \mu)(x - \mu)^T}{2\Sigma}}$$

Original Model Multivariate Gaussian Model
Manually create features to capture anomalies Automatically capture correlations between features
Computationally cheaper Computationally more expensive
If m small If Σ invertible, i.e. if m > n

results matching ""

    No results matching ""