K-Means Algorithm
Init K cluster centroids mu[1]..mu[K] in R^n
Repeat:
    # Cluster assignment step
    for i = 1..m:
        c[i] := index (1..K) of cluster centroid closest to x[i]
    # Move centroid step
    for k = 1..K:
        mu[k] := average (mean) of points assigned to cluster k
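A minimal runnable sketch of this loop in Octave/MATLAB. The function name run_kmeans, the m x n data matrix X, and the max_iters budget are assumptions for illustration, not part of the original notes:

```matlab
function [c, mu] = run_kmeans(X, K, max_iters)
  m = size(X, 1);
  mu = X(randperm(m, K), :);               % init centroids to K random examples
  c = zeros(1, m);
  for iter = 1:max_iters
    % Cluster assignment step: index of the closest centroid for each example
    for i = 1:m
      [~, c(i)] = min(sum((mu - X(i, :)).^2, 2));
    end
    % Move centroid step: mean of the points assigned to each cluster
    for k = 1:K
      if any(c == k)                       % guard against empty clusters
        mu(k, :) = mean(X(c == k, :), 1);
      end
    end
  end
end
```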
- $$c^{(i)}$$: index of the cluster to which $$x^{(i)}$$ is assigned
- $$\mu_k$$: cluster centroid $$k$$
- $$\mu_{c^{(i)}}$$: cluster centroid of cluster to which example $$x^{(i)}$$ has been assigned
- Optimization objective:
$$
J(c^{(1)}, ..., c^{(m)}, \mu_1, ..., \mu_K) = \frac{1}{m} \sum_{i=1}^{m} \|x^{(i)} - \mu_{c^{(i)}}\|^2
$$
$$
\min_{c^{(1)}, ..., c^{(m)}, \mu_1, ..., \mu_K} J(c^{(1)}, ..., c^{(m)}, \mu_1, ..., \mu_K)
$$
The cluster assignment step minimizes $$J$$ with respect to $$c^{(1)}, ..., c^{(m)}$$ (holding the $$\mu_k$$ fixed), and the move centroid step minimizes $$J$$ with respect to $$\mu_1, ..., \mu_K$$.
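A hedged sketch of evaluating this distortion in Octave/MATLAB, assuming the assignments c (1 x m) and centroids mu (K x n) produced by the sketch above:

```matlab
% Average squared distance of each x(i) to its assigned centroid mu(c(i))
J = mean(sum((X - mu(c, :)).^2, 2));
```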
Random Initialization
Randomly pick $$K$$ training examples (requires $$K < m$$) and set $$\mu_1, ..., \mu_K$$ equal to these examples.
for i = 1..100:
    Randomly initialize mu[1]..mu[K]
    Run K-means to get c[1]..c[m] and mu[1]..mu[K]
    Compute the cost function (distortion) J(c, mu)
Pick the clustering with the lowest cost J
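A sketch of this loop, reusing the hypothetical run_kmeans from above; the 10-iteration budget per run is an arbitrary assumption:

```matlab
best_J = Inf;
for t = 1:100
  [c, mu] = run_kmeans(X, K, 10);        % random init happens inside
  J = mean(sum((X - mu(c, :)).^2, 2));   % distortion J(c, mu) as above
  if J < best_J
    best_J = J;  best_c = c;  best_mu = mu;
  end
end
```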
Choosing K
- Elbow method: plot the cost $$J$$ as a function of $$K$$ and pick the $$K$$ at the "elbow", where the decrease in $$J$$ flattens out (see the sketch below)
- Consider the downstream task: pick the $$K$$ that best serves the later purpose (e.g., 3 vs. 5 T-shirt sizes)
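A hedged sketch of the elbow method; the range K = 1..8 is an arbitrary assumption, and each run should ideally keep the best of several random initializations:

```matlab
Ks = 1:8;
Js = zeros(size(Ks));
for K = Ks
  [c, mu] = run_kmeans(X, K, 10);           % hypothetical helper from above
  Js(K) = mean(sum((X - mu(c, :)).^2, 2));  % distortion for this K
end
plot(Ks, Js, '-o'); xlabel('K'); ylabel('J');  % look for the "elbow"
```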
Dimensionality Reduction
Principal Component Analysis
Reduce the data from $$n$$ dimensions to $$k$$ dimensions: find $$k$$ vectors $$u^{(1)}, ..., u^{(k)}$$ onto which to project the data so as to minimize the projection error.
Algorithm
Data preprocessing
- Feature scaling/mean normalization:
$$
\mu_j = \frac{1}{m} \sum_{i=1}^m x_j^{(i)}
$$
Replace $$x_j^{(i)}$$ with $$x_j^{(i)} - \mu_j$$.
(Optional) If features are on different scales, replace $$x_j^{(i)}$$ with $$\frac{x_j^{(i)} - \mu_j}{s_j}$$, where $$s_j$$ is the standard deviation (or range) of feature $$j$$.
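A minimal sketch of this preprocessing in Octave/MATLAB, assuming X is an m x n data matrix with one example per row (mu_feat avoids clashing with the centroid variable above):

```matlab
mu_feat = mean(X);               % 1 x n vector of feature means mu_j
s = std(X);                      % 1 x n vector of scales s_j (std. deviations)
X_norm = (X - mu_feat) ./ s;     % mean-normalized, scaled features
```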
Reduce Dimensionality
- Compute covariance matrix:
$$
\Sigma = \frac{1}{m} \sum_{i=1}^m (x^{(i)})(x^{(i)})^T
$$
Sigma = (1/m) * X' * X;  % X is m x n (one example per row), so Sigma is n x n
- Compute eigenvectors:
[U, S, V] = svd(Sigma);  % Singular Value Decomposition; U in R^(n x n) has the eigenvectors u^(1)..u^(n) as columns, S in R^(n x n) is diagonal
- Take $$u^{(1)}, ..., u^{(k)}$$ (the first $$k$$ columns of $$U$$) as $$U_{reduce}$$, an $$n \times k$$ matrix
Ureduce = U(:, 1:k);
- Project each example: $$z = U_{reduce}^T x$$, with $$z \in R^k$$ (vectorized sketch below)
z = Ureduce' * x;
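Tying the steps above together, a hedged sketch that vectorizes the projection over all m examples at once, assuming X_norm from the preprocessing step:

```matlab
Sigma = (1/m) * (X_norm' * X_norm);  % n x n covariance matrix
[U, S, V] = svd(Sigma);              % columns of U are the eigenvectors
Ureduce = U(:, 1:k);                 % n x k: top k principal components
Z = X_norm * Ureduce;                % m x k: row i is z^(i)' = x^(i)' * Ureduce
```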
Reconstruction from Compressed Representation
$$
x_{approx} = U_{reduce} z
$$
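A sketch of the reconstruction, continuing the variable names above:

```matlab
x_approx = Ureduce * z;    % n x 1 approximation of a single example x
X_approx = Z * Ureduce';   % m x n: reconstruct all examples at once
```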
Choosing k (number of principal components)
- Average squared projection error: $$\frac{1}{m} \sum_{i=1}^m \|x^{(i)} - x_{approx}^{(i)}\|^2$$
- Total variation in the data: $$\frac{1}{m} \sum_{i=1}^m \|x^{(i)}\|^2$$
Choose $$k$$ to be the smallest value s.t. $$\frac{\frac{1}{m} \sum_{i=1}^m \|x^{(i)} - x_{approx}^{(i)}\|^2}{\frac{1}{m} \sum_{i=1}^m \|x^{(i)}\|^2} = 1 - \frac{\sum_{i=1}^k S_{ii}}{\sum_{i=1}^n S_{ii}} \le 0.01$$ (99% of variance retained).
Algorithm
[U, S, V] = svd(Sigma)  # run SVD once; S does not depend on k
k = 1
Repeat:
    If (S[1,1] + ... + S[k,k]) / (S[1,1] + ... + S[n,n]) >= 0.99: stop  # 99% of variance retained
    k := k + 1
Return this k (the smallest k retaining >= 99% of the variance); then compute Ureduce, z, and x_approx with it
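A hedged sketch of this check in Octave/MATLAB: svd(Sigma) runs once, and the diagonal of S gives the retained variance for every k at no extra cost:

```matlab
sv = diag(S);                      % singular values S(1,1), ..., S(n,n)
retained = cumsum(sv) / sum(sv);   % variance retained for k = 1..n
k = find(retained >= 0.99, 1);     % smallest k retaining >= 99% of the variance
```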
Implementation
- Run PCA (the mapping from $$x^{(i)}$$ to $$z^{(i)}$$, i.e. $$\mu_j$$, $$s_j$$, and $$U_{reduce}$$) only on the training set; the same mapping can then be applied to $$x_{cv}$$ and $$x_{test}$$ (see the sketch after this list)
- Use cases
    - Compression
        - Reduce the memory/disk needed to store the data
        - Speed up the learning algorithm
    - Visualization (reduce to $$k = 2$$ or $$3$$ so the data can be plotted)
- Invalid use cases
    - Preventing overfitting (use regularization instead)
    - Making PCA a default part of an ML system's design (run with the raw data first; add PCA only if needed)
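As noted in the first bullet above, the mapping learned on the training set is reused unchanged on new data; a minimal sketch with the variable names from the earlier sketches (X_test is hypothetical):

```matlab
Z_test = ((X_test - mu_feat) ./ s) * Ureduce;  % apply the training-set mapping
```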