Overview
- Motivation
- No single classifier is best for all circumstances
- Necessary & sufficient condition
- Individual classifiers (base learners) are accurate and diverse
Diversity vs. Accuracy
The correct predictions of accurate classifiers are positively correlated with one another, so diversity trades off against accuracy: intentionally non-optimal base learners can yield better ensemble performance. Common ways to inject diversity (a small sketch follows this list):
- Different models/algorithms
- Different hyper-parameters
- $$k$$ in KNN
- Threshold in decision tree
- Kernel function in SVM
- Initial weights in neural networks
- Different input representations of the same event
- Sensor fusion, e.g. combining sound and mouth-shape features for speech recognition
- Random subset of features
- Different training sets
- Random subset of samples
- Sequential training
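
A minimal sketch of building such diversity, assuming scikit-learn and the Iris dataset purely for illustration (neither is mentioned in the notes): each KNN base learner gets a different $$k$$ and a random feature subset, and the individual accuracies plus average pairwise disagreement are reported as a rough diversity check.

```python
# Diverse base learners via different hyper-parameters (k in KNN) and random
# feature subsets; dataset, learner, and the disagreement measure are illustrative.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = load_iris(return_X_y=True)

preds = []
for k in (1, 3, 5, 7, 9):
    feats = rng.choice(X.shape[1], size=2, replace=False)   # random feature subset
    clf = KNeighborsClassifier(n_neighbors=k).fit(X[:, feats], y)
    preds.append(clf.predict(X[:, feats]))
preds = np.stack(preds)

# Individual accuracy and average pairwise disagreement (a crude diversity proxy).
acc = (preds == y).mean(axis=1)
disagreement = np.mean([(preds[i] != preds[j]).mean()
                        for i in range(len(preds)) for j in range(i + 1, len(preds))])
print("individual accuracies:", np.round(acc, 3))
print("average pairwise disagreement:", round(float(disagreement), 3))
```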
Majority Voting
- Error rate (a numeric sketch follows this list)
- $$\epsilon$$: error rate of an individual classifier
- Assuming independent errors, a majority-vote ensemble of $$n$$ classifiers errs when at least $$\lceil n/2 \rceil$$ members err: $$\epsilon_{ensemble} = P(y \ge \lceil n/2 \rceil) = \sum_{k=\lceil n/2 \rceil}^{n} {n \choose k} \epsilon^k (1-\epsilon)^{n-k}$$
- Why can very good ensembles often be constructed?
- Statistical
- When training data are limited, combining hypotheses reduces the risk of choosing the wrong one
- Computational
- Combining searches started from many local optima gives a better approximation to the true optimum
- Representational
- Combinations of hypotheses can represent functions that no single base learner can
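
A quick numeric check of the error-rate formula above; the values of $$n$$ and $$\epsilon$$ are illustrative, and independence between base classifiers is assumed as in the derivation.

```python
# Ensemble error of majority voting over n independent classifiers with error eps.
from math import ceil, comb

def ensemble_error(n, eps):
    k_min = ceil(n / 2)   # majority voting fails when at least ceil(n/2) members err
    return sum(comb(n, k) * eps**k * (1 - eps)**(n - k) for k in range(k_min, n + 1))

print(ensemble_error(n=11, eps=0.25))   # ~0.034, far below the individual error of 0.25
print(ensemble_error(n=11, eps=0.55))   # ~0.63, worse than individuals once eps > 0.5
```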
Implementation
- Weighting schemes: confidence, accuracy, Bayesian prior, etc. (a small sketch follows this list)
- Equal weighting
- Unequal weighting
- Weighted combination of predicted class probabilities
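
A small sketch of unequal weighting with hard votes; the weights (which could come from validation accuracy or confidence) and the toy predictions are illustrative.

```python
# Weighted plurality vote: each classifier's vote counts with its weight.
import numpy as np

def weighted_vote(predictions, weights, n_classes):
    """predictions: (n_classifiers, n_samples) integer labels; weights: (n_classifiers,)."""
    scores = np.zeros((n_classes, predictions.shape[1]))
    for pred, w in zip(predictions, weights):
        scores[pred, np.arange(predictions.shape[1])] += w   # add the weight to the voted class
    return scores.argmax(axis=0)

preds = np.array([[0, 1, 1],            # classifier 1
                  [0, 0, 1],            # classifier 2
                  [1, 0, 1]])           # classifier 3
print(weighted_vote(preds, weights=[0.5, 0.3, 0.4], n_classes=2))   # -> [0 0 1]
```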
Plurality Voting
Extension of majority voting to the multi-class setting: the class receiving the most votes wins. Individual classifiers can be of the same or different types; a minimal sketch follows the example below.
E.g. random forest: multiple decision trees.
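
A minimal sketch of the vote itself; the labels are illustrative and ties are broken arbitrarily by `Counter`.

```python
# Plurality voting: the class receiving the most votes wins.
from collections import Counter

def plurality_vote(votes):
    return Counter(votes).most_common(1)[0][0]

print(plurality_vote(["cat", "dog", "cat", "bird", "cat"]))   # -> cat
```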
Parallel/Sequential Combining
- Parallel: multi-expert combination
- Global
- All base learners generate outputs
- Local
- Select a few base learners based on the input (gating); a small sketch follows this list
- Sequential: multi-stage combination
- Start with simpler models and increase model complexity only for inputs that earlier models do not handle well
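
A minimal sketch of the local (gated) parallel variant, assuming scikit-learn, the Iris dataset, and a one-feature threshold gate purely for illustration: only the expert selected by the gate produces an output for each input.

```python
# Gated parallel combination: a simple gate routes each input to one expert.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
threshold = np.median(X[:, 0])          # toy gate: split on the first feature
left = X[:, 0] < threshold

experts = {
    True:  LogisticRegression(max_iter=1000).fit(X[left], y[left]),
    False: LogisticRegression(max_iter=1000).fit(X[~left], y[~left]),
}

# At prediction time, only the gated expert is consulted for each sample.
pred = np.array([experts[bool(x[0] < threshold)].predict(x[None, :])[0] for x in X])
print("gated ensemble training accuracy:", (pred == y).mean())
```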
Bagging
Random subsets of samples for each base learner (a minimal sketch follows this list).
- Parallel
- Random sampling with replacement (bootstrap)
- Reduces variance but cannot reduce bias; bias stays roughly the same as the base classifiers'
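
A minimal bagging sketch, assuming scikit-learn decision trees and the Iris dataset for illustration: each tree sees a bootstrap sample (drawn with replacement) and predictions are combined by plurality vote.

```python
# Bagging: bootstrap samples + plurality voting over the resulting trees.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = load_iris(return_X_y=True)
n = len(X)

trees = []
for _ in range(25):
    idx = rng.integers(0, n, size=n)    # bootstrap: n indices drawn with replacement
    trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

votes = np.stack([t.predict(X) for t in trees])                       # (n_trees, n_samples)
bagged = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("bagged training accuracy:", (bagged == y).mean())
```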
Adaptive Boosting (AdaBoost)
Combines weak base learners, i.e. favors diversity over individual accuracy. Later classifiers focus on the examples that earlier classifiers handled poorly.
- Sequential
- Random sample without replacement
- Can reduce bias
Algorithm (a minimal code sketch follows the steps)
- For each boosting round
- Train a learner
- Predict class labels
- Compute weighted error rate
- Compute re-weighting coefficients
- Update weights
- Normalize weights
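
A minimal sketch of the boosting loop above for binary labels in $$\{-1, +1\}$$, using decision stumps as the weak learners; the synthetic dataset and the number of rounds are illustrative choices.

```python
# AdaBoost: each round trains a weak learner on weighted data, then re-weights samples.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
y = np.where(y == 0, -1, 1)             # AdaBoost convention: labels in {-1, +1}
n = len(X)

w = np.full(n, 1.0 / n)                 # initial sample weights: uniform
stumps, alphas = [], []
for _ in range(20):                     # boosting rounds
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)   # train a learner
    pred = stump.predict(X)                                                  # predict class labels
    err = w[pred != y].sum()                                                 # weighted error rate
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))                        # re-weighting coefficient
    w = w * np.exp(-alpha * y * pred)                                        # update weights
    w = w / w.sum()                                                          # normalize weights
    stumps.append(stump)
    alphas.append(alpha)

# Final prediction: sign of the alpha-weighted sum of the weak learners' outputs.
F = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
print("AdaBoost training accuracy:", (np.sign(F) == y).mean())
```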