Problems with PCA
- PCA can make different classes more discriminable, but because it uses no label information, they are still not guaranteed to be linearly separable
Linear Discriminant Analysis
- PCA
- Finds the orthogonal component axes of maximum variance in the dataset
- No supervision
- LDA
- Finds the feature subspace that optimizes class separability
- With supervision
Both are linear transformation techniques.
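To make the supervision difference concrete, here is a minimal scikit-learn sketch (the Iris dataset and 2 components are illustrative choices, not part of these notes): PCA's `fit_transform` sees only the features `X`, while LDA's also consumes the class labels `y`.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# Unsupervised: directions of maximum variance, labels ignored
X_pca = PCA(n_components=2).fit_transform(X)

# Supervised: directions that maximize class separability
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)  # (150, 2) (150, 2)
```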
Algorithm
Goal: maximize between-class variance & minimize within-class variance (a NumPy sketch of the full procedure follows the steps below).
- Standardize $$d$$-dimensional dataset
- For each class, compute mean vector
- Construct between-class scatter matrix $$S_B$$ & within-class scatter matrix $$S_W$$
- $$S_W = \sum_{i=1}^{c} S_i, \quad S_i = \sum_{x \in D_i}(x - m_i)(x - m_i)^T$$
- $$S_B = \sum_{i=1}^{c} n_i (m_i - m)(m_i - m)^T$$
- ($$m_i$$: mean vector of class $$i$$, $$D_i$$: samples of class $$i$$, $$n_i$$: number of samples in class $$i$$, $$m$$: overall mean, $$c$$: number of classes)
- Decompose $$S_W^{-1}S_B$$ into eigenvectors & eigenvalues
- Sort eigenvalues by decreasing order to rank the corresponding eigenvectors
- Select $$k$$ eigenvectors corresponding to the $$k$$ largest eigenvalues
- Construct projection matrix $$W$$ from the top $$k$$ eigenvectors
- Transform $$d$$-dimensional dataset into a new $$k$$-dimensional one
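The whole procedure fits in a short from-scratch NumPy sketch. This is illustrative, not a production implementation: the name `lda_fit`, the argument names, and the use of `np.linalg.inv` are my own assumptions, and a real implementation (e.g., scikit-learn's `LinearDiscriminantAnalysis`) handles a singular $$S_W$$ more robustly.

```python
import numpy as np

def lda_fit(X, y, k):
    """Project the data X (n x d) with labels y onto k discriminant axes."""
    # Step 1: standardize the d-dimensional dataset
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    d = X.shape[1]
    m = X.mean(axis=0)                        # overall mean vector

    # Steps 2-3: class means and the scatter matrices S_W, S_B
    S_W = np.zeros((d, d))                    # within-class scatter
    S_B = np.zeros((d, d))                    # between-class scatter
    for cls in np.unique(y):
        X_c = X[y == cls]
        m_c = X_c.mean(axis=0)                # class mean vector m_i
        S_W += (X_c - m_c).T @ (X_c - m_c)    # sum of (x - m_i)(x - m_i)^T
        diff = (m_c - m).reshape(-1, 1)
        S_B += len(X_c) * (diff @ diff.T)     # n_i (m_i - m)(m_i - m)^T

    # Step 4: eigendecomposition of S_W^{-1} S_B
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)

    # Steps 5-7: sort eigenvalues in decreasing order, keep the top-k
    # eigenvectors as the d x k projection matrix W
    order = np.argsort(eigvals.real)[::-1]
    W = eigvecs[:, order[:k]].real

    # Step 8: transform the d-dimensional dataset into a k-dimensional one
    return X @ W

# Example usage: project labeled data onto k = 2 discriminant axes
# X_lda = lda_fit(X, y, k=2)
```

The `.real` casts are there because $$S_W^{-1}S_B$$ is not symmetric, so `np.linalg.eig` may return eigenvalues with negligible imaginary parts; if $$S_W$$ is singular (e.g., more features than samples), substituting `np.linalg.pinv` is a common workaround.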