Curse of Dimensionality

  • High dimensions
    • Average distance between two points grows large -> data becomes sparse (see the sketch after this list)
    • Predictions become much less reliable -> more data is needed
  • Reduce dimensionality
    • Speed up training
    • Helps visualization
  • Approaches
    • Feature selection
    • Feature extraction
      • Transforms dataset into a new feature space with lower dimensionality
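
A small sketch of the sparsity effect, assuming uniformly random points in the unit hypercube (the sample size and dimensions are arbitrary illustration choices): the average pairwise distance grows with the dimension, so a fixed-size sample covers the space ever more sparsely.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

# Average pairwise distance between random points in the unit hypercube
# grows with dimension d, so a fixed-size sample covers the space more sparsely.
for d in (2, 10, 100, 1000):
    X = rng.uniform(size=(200, d))   # 200 points in [0, 1]^d
    mean_dist = pdist(X).mean()      # mean Euclidean distance over all pairs
    print(f"d = {d:4d}  mean pairwise distance ~ {mean_dist:.2f}")
```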

Principal Component Analysis

Feature extraction, unsupervised. Finds the directions of maximum variance (equivalently, of minimum information loss) in the data and projects it onto a new subspace with equal or fewer dimensions.

  • Principal components
    • The orthogonal axes of the new subspace
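
A minimal usage sketch with scikit-learn's PCA (assuming scikit-learn is available; the Iris data and `n_components=2` are arbitrary illustration choices):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris().data                        # 150 samples, 4 features
X_std = StandardScaler().fit_transform(X)   # standardize before PCA

pca = PCA(n_components=2)                   # keep the top 2 principal components
Z = pca.fit_transform(X_std)                # project onto the new 2-D subspace

print(Z.shape)                              # (150, 2)
print(pca.explained_variance_ratio_)        # variance captured by each component
```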

Algorithm

Find a $$d \times k$$ matrix $$W$$ that maps a vector $$x$$ of dimension $$d$$ into a new $$k$$-dimensional feature space.

$$z = xW, \quad x \in \mathbb{R}^d,\; z \in \mathbb{R}^k,\; W \in \mathbb{R}^{d \times k}$$

  1. Standardize $$d$$-dimensional dataset
  2. Construct covariance matrix
    1. $$\Sigma = \begin{bmatrix}\sigma_1^2 & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_2^2 & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & \sigma_3^2 \end{bmatrix}$$
  3. Decompose covariance matrix into eigenvectors & eigenvalues
    1. $$\Sigma v = \lambda v$$
    2. Eigenvector: principal components
    3. Eigenvalue: variance captured by the corresponding principal component (its magnitude)
  4. Sort eigenvalues by decreasing order to rank the corresponding eigenvectors
  5. Select $$k$$ eigenvectors corresponding to the $$k$$ largest eigenvalues
  6. Construct projection matrix $$W$$ from the top $$k$$ eigenvectors
  7. Transform $$d$$-dimensional dataset into a new $$k$$-dimensional one
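
The seven steps, sketched from scratch with NumPy (the function name `pca_project` and the random data are hypothetical, for illustration only):

```python
import numpy as np

def pca_project(X, k):
    """Project an n x d data matrix X onto its top k principal components."""
    # 1. Standardize the d-dimensional dataset
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)

    # 2. Construct the covariance matrix (d x d)
    cov = np.cov(X_std, rowvar=False)

    # 3. Decompose it into eigenvalues and eigenvectors
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: symmetric matrix, ascending eigenvalues

    # 4.-5. Sort eigenvalues in decreasing order and keep the k largest
    order = np.argsort(eigvals)[::-1]
    top_k = order[:k]

    # 6. Build the d x k projection matrix W from the top k eigenvectors
    W = eigvecs[:, top_k]

    # 7. Transform the data: z = xW
    Z = X_std @ W
    return Z, W

# Hypothetical usage on random data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Z, W = pca_project(X, k=2)
print(Z.shape, W.shape)   # (100, 2) (5, 2)
```
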
Variance

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x^{(i)} - \mu\right)^2$$

Covariance

$$\sigma_{jk} = \frac{1}{n}\sum_{i=1}^{n}\left(x_j^{(i)} - \mu_j\right)\left(x_k^{(i)} - \mu_k\right)$$
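
A quick NumPy check of both formulas (the random data is illustrative; `np.cov(..., bias=True)` uses the same $$1/n$$ convention):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))   # n samples, 3 features
mu = X.mean(axis=0)

# Variance of feature j: (1/n) * sum_i (x_j^(i) - mu_j)^2
var_manual = ((X - mu) ** 2).mean(axis=0)

# Covariance of features j, k: (1/n) * sum_i (x_j^(i) - mu_j)(x_k^(i) - mu_k)
cov_manual = (X - mu).T @ (X - mu) / len(X)

# Variances are the diagonal of the covariance matrix; both match NumPy's built-in
assert np.allclose(var_manual, np.diag(cov_manual))
assert np.allclose(cov_manual, np.cov(X, rowvar=False, bias=True))
```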

PCA for Compression

Lossy compression: only keep the information that has the most effect on how the image looks.

Exact reconstruction would require inverting $$W$$, but non-square matrices have no inverse. Because the columns of $$W$$ are orthonormal, $$W^\top$$ acts as a pseudo-inverse, so $$\hat{x} = zW^\top$$ recovers an approximation; the discarded components cannot be recovered, which is what makes the compression lossy.
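
A minimal compression/reconstruction sketch with scikit-learn (the digits dataset and `n_components=16` are illustrative choices); `inverse_transform` maps the compressed data back, roughly $$\hat{x} = zW^\top$$ plus the mean:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data                  # 1797 images, 64 pixels each
pca = PCA(n_components=16)
Z = pca.fit_transform(X)                # compressed representation: (1797, 16)
X_rec = pca.inverse_transform(Z)        # approximate reconstruction of the pixels

mse = np.mean((X - X_rec) ** 2)
print(f"kept {Z.shape[1]}/{X.shape[1]} dimensions, reconstruction MSE ~ {mse:.2f}")
```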
