K-Means Algorithm

Init K cluster centroids mu[1]..mu[K] in R^n

Repeat:
    # cluster assignment
    for i = 1..m:
        c[i] := index of cluster centroid closest to x[i]

    # move centroid
    for k = 1..K:
        mu[k] := average of points assigned to cluster k

  • $$c^{(i)}$$: index of the cluster to which example $$x^{(i)}$$ is assigned
  • $$\mu_k$$: cluster centroid $$k$$
  • $$\mu_{c^{(i)}}$$: cluster centroid of cluster to which example $$x^{(i)}$$ has been assigned
  • Optimization objective:

$$ J(c^{(1)}, ..., c^{(m)}, \mu_1, ..., \mu_K) = \frac{1}{m} \sum_{i=1}^{m} \| x^{(i)} - \mu_{c^{(i)}} \|^2 $$

$$ \min_{c^{(1)}, ..., c^{(m)}, \mu_1, ..., \mu_K} J(c^{(1)}, ..., c^{(m)}, \mu_1, ..., \mu_K) $$
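
A minimal Octave sketch of one iteration of the two steps plus the cost $$J$$, assuming X (m x n, examples as rows) and centroids (K x n) already exist; the variable names are illustrative:

    # Cluster assignment step: c(i) = index of centroid closest to X(i, :)
    m = size(X, 1);
    K = size(centroids, 1);
    c = zeros(m, 1);
    for i = 1:m
        dists = sum((centroids - X(i, :)) .^ 2, 2);  # squared distance to each centroid
        [~, c(i)] = min(dists);
    end

    # Move centroid step (assumes every cluster keeps at least one point)
    for k = 1:K
        centroids(k, :) = mean(X(c == k, :), 1);
    end

    # Distortion cost J under the current assignment
    J = mean(sum((X - centroids(c, :)) .^ 2, 2));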

Random Initialization

Randomly pick $$K$$ distinct training examples and set $$\mu_1, ..., \mu_K$$ to these examples.

for i = 1..100:
    Randomly init mu[1]..mu[K]

    Run K-means, get c & mu

    Compute cost function J(c, mu)

Pick the (c, mu) that gave the lowest cost J
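
An Octave sketch of this loop, assuming a hypothetical helper runKMeans(X, initial_centroids, max_iters) that alternates the two steps above and returns c, mu, and the final cost J:

    best_J = Inf;
    for i = 1:100
        # random initialization: K distinct training examples as starting centroids
        idx = randperm(size(X, 1));
        [c, mu, J] = runKMeans(X, X(idx(1:K), :), 50);

        if J < best_J   # keep the run with the lowest cost
            best_J = J; best_c = c; best_mu = mu;
        end
    end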

Choosing K

  • Elbow method (see the sketch below)
  • Consider downstream tasks
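
For the elbow method, plot the cost against the number of clusters and look for the $$K$$ where the curve bends. A sketch reusing the hypothetical runKMeans helper from above (in practice, take the best of several random initializations per $$K$$):

    costs = zeros(10, 1);
    for K = 1:10
        idx = randperm(size(X, 1));
        [~, ~, costs(K)] = runKMeans(X, X(idx(1:K), :), 50);
    end
    plot(1:10, costs, '-o');   # look for the "elbow" where J stops dropping quickly
    xlabel('K'); ylabel('cost J');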

Dimensionality Reduction

Principal Component Analysis

Reduce from n dimensions to k dimensions: find $$k$$ vectors $$u^{(1)}, ..., u^{(k)}$$ onto which to project the data so as to minimize the projection error.

Algorithm
Data preprocessing
  1. Feature scaling/mean normalization:

$$ \mu_j = \frac{1}{m} \sum_{i=1}^m x_j^{(i)} $$

Replace $$x_j^{(i)}$$ with $$x_j^{(i)} - \mu_j$$.
(Optional) If features are on different scales, replace $$x_j^{(i)}$$ with $$\frac{x_j^{(i)} - \mu_j}{s_j}$$, where $$s_j$$ is the scale (e.g. standard deviation or range) of feature $$j$$.
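
In Octave, with examples as the rows of an m x n matrix X, both steps vectorize into a few lines (a sketch; X_norm is an illustrative name):

    mu = mean(X);             # 1 x n vector of per-feature means
    s = std(X);               # 1 x n vector of per-feature scales
    X_norm = (X - mu) ./ s;   # mean-normalize, then scale each feature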

Reduce Dimensionality
  1. Compute covariance matrix:

$$ \Sigma = \frac{1}{m} \sum_{i=1}^{m} (x^{(i)})(x^{(i)})^T $$

    # if examples are the columns of X (n x m):
    Sigma = (1/m) * X * X';
    # if examples are the rows of X (m x n), use: Sigma = (1/m) * X' * X;
  2. Compute eigenvectors:
    [U, S, V] = svd(Sigma); # Singular Value Decomposition
    # U in R^(nxn): columns are the principal components
    # S in R^(nxn): diagonal matrix of singular values
    
    
  3. Take $$u^{(1)}, ..., u^{(k)}$$ (the first $$k$$ columns of $$U$$) as $$U_{reduce}$$
    Ureduce = U(:, 1:k);
    
  4. $$z = U_{reduce}^T x$$
    z = Ureduce' * x;
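
The same steps for a whole data set at once, assuming X_norm is m x n with normalized examples as rows (hence the transposed covariance formula):

    m = size(X_norm, 1);
    Sigma = (1/m) * (X_norm' * X_norm);   # n x n covariance matrix
    [U, S, V] = svd(Sigma);
    Ureduce = U(:, 1:k);                  # first k principal components
    Z = X_norm * Ureduce;                 # m x k matrix of projections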
    
Reconstruction from Compressed Representation

$$ x_{approx} = U_{reduce} z $$
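
In Octave, using the same names as above:

    x_approx = Ureduce * z;    # n x 1 reconstruction of a single example
    X_approx = Z * Ureduce';   # m x n reconstruction of a whole data set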

Choosing k (number of principal components)
  • Average squared projection error: $$\frac{1}{m} \sum_{i=1}^m \|x^{(i)} - x_{approx}^{(i)}\|^2$$
  • Total variation in data: $$\frac{1}{m} \sum_{i=1}^m \|x^{(i)}\|^2$$

Choose $$k$$ to be the smallest value s.t. $$\frac{\frac{1}{m} \sum_{i=1}^m \|x^{(i)} - x_{approx}^{(i)}\|^2}{\frac{1}{m} \sum_{i=1}^m \|x^{(i)}\|^2} = 1 - \frac{\sum_{i=1}^k S_{ii}}{\sum_{i=1}^n S_{ii}} \le 0.01$$ (99% of variance retained).

Algorithm
k = 1

Repeat:
    Try PCA with k

    Compute Ureduce, z, x_approx, S

    If (sum S[1,1]..S[k,k]) / (sum S[1,1]..S[n,n]) >= 0.99: stop
    Else: k := k + 1
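
Since S does not depend on k, the svd only needs to run once; a sketch that then scans k via cumulative sums of the diagonal of S:

    [U, S, V] = svd(Sigma);
    s = diag(S);                      # singular values S(1,1)..S(n,n)
    retained = cumsum(s) / sum(s);    # fraction of variance retained for each k
    k = find(retained >= 0.99, 1);    # smallest k retaining 99% of variance
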
Implementation
  • Run PCA (mapping $$x^{(i)}$$ to $$z^{(i)}$$) only on the training data; the same mapping can then be applied to $$x_{cv}$$ and $$x_{test}$$ (see the sketch after this list)
  • Use case
    • Compression
      • Reduce storage/memory
      • Speed up algorithm
    • Visualization
  • Invalid use case
    • Prevent overfitting (should use regularization)
    • Design of ML system (should run with raw data first)
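
A sketch of that train-only fit, with X_train and X_test as illustrative names; the mu and Ureduce learned from the training set are reused unchanged:

    mu = mean(X_train);                        # fit normalization on training data only
    X_norm = X_train - mu;
    Sigma = (1 / size(X_norm, 1)) * (X_norm' * X_norm);
    [U, S, V] = svd(Sigma);
    Ureduce = U(:, 1:k);
    Z_train = X_norm * Ureduce;

    Z_test = (X_test - mu) * Ureduce;          # apply the same mapping to cv/test data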
