Overview

  • Motivation
    • No single classifier is best for all circumstances
  • Necessary & sufficient condition
    • Individual classifiers (base learners) are accurate and diverse
Diversity vs. Accuracy

Highly accurate classifiers tend to make correlated predictions, so their errors overlap and combining them adds little; intentionally non-optimal but diverse base learners can yield better ensemble performance. Common sources of diversity (the sketch after the list below combines several of them):

  • Different models/algorithms
  • Different hyper-parameters
    • k in KNN
    • Threshold in decision tree
    • Kernel function in SVM
    • Initial weights in neural networks
  • Different input representations of the same event
    • Sensor fusion, e.g. combining sound and mouth shape for speech recognition
    • Random subset of features
  • Different training sets
    • Random subset of samples
    • Sequential training
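
For illustration, a minimal sketch (assuming scikit-learn; the particular models, hyper-parameter values, and dataset are arbitrary) that pools base learners made diverse by mixing algorithms and hyper-parameters:

```python
# Illustrative only: a pool of base learners made diverse by varying the
# algorithm and its hyper-parameters.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

base_learners = [
    ("knn_3", KNeighborsClassifier(n_neighbors=3)),        # different k in KNN
    ("knn_15", KNeighborsClassifier(n_neighbors=15)),
    ("tree_shallow", DecisionTreeClassifier(max_depth=3)),  # different tree depth
    ("svm_rbf", SVC(kernel="rbf")),                         # different kernel function
]

# Combine the diverse learners by (hard) majority voting.
ensemble = VotingClassifier(estimators=base_learners, voting="hard")
ensemble.fit(X, y)
print(ensemble.score(X, y))
```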

Majority Voting

  • Error rate
    • $$\epsilon$$: error of an individual classifier
    • $$\epsilon_{ensemble} = P(y \ge k) = \sum^{n}_{k=\lceil n/2 \rceil} {n \choose k} \epsilon^k (1-\epsilon)^{n-k}$$: with $$n$$ independent base classifiers, the ensemble errs only when at least $$k = \lceil n/2 \rceil$$ of them err (a numerical check follows this list)
  • Why can very good ensembles often be constructed?
    • Statistical
      • When training data are small, many hypotheses fit equally well; combining them reduces the risk of choosing the wrong one
    • Computational
      • Combining learners that converge to different local optima can give a better approximation to the true optimum than any single one
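
As a numerical check of the error-rate formula, assuming $$n = 11$$ independent base classifiers each with error $$\epsilon = 0.25$$ (values chosen purely for illustration):

```python
from math import ceil, comb

n, eps = 11, 0.25      # illustrative: 11 independent classifiers, 25% error each
k_min = ceil(n / 2)    # majority voting fails when at least ceil(n/2) classifiers err

ensemble_error = sum(
    comb(n, k) * eps**k * (1 - eps) ** (n - k) for k in range(k_min, n + 1)
)
print(f"{ensemble_error:.4f}")  # ~0.0343
```

The ensemble error (about 0.034) is far below the individual error of 0.25, but only because the errors were assumed to be independent.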
Implementation
  • Weighting schemes: weights may come from confidence, accuracy, a Bayesian prior, etc. (a toy comparison follows this list)
    • Equal weighting
    • Unequal weighting
    • Weighted probability
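
A toy comparison of the three schemes on a single sample with three base classifiers (NumPy assumed; the labels, weights, and probabilities are made up for illustration):

```python
import numpy as np

# Predicted class labels from three base classifiers for one sample.
labels = np.array([0, 0, 1])
# Accuracy-based (unequal) weights; equal weighting would use [1, 1, 1].
weights = np.array([0.2, 0.2, 0.6])

# Equal weighting: plain majority vote.
print(np.bincount(labels).argmax())                   # -> 0

# Unequal weighting: each vote counted with its weight.
print(np.bincount(labels, weights=weights).argmax())  # -> 1

# Weighted probability: average the predicted class probabilities.
probas = np.array([[0.9, 0.1],   # classifier 1: P(class 0), P(class 1)
                   [0.8, 0.2],
                   [0.3, 0.7]])
print(np.average(probas, axis=0, weights=weights).argmax())  # -> 0
```

Note that the schemes can disagree: here equal weighting and weighted probabilities pick class 0, while accuracy-weighted hard voting picks class 1.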

Plurality Voting

An extension of majority voting to the multi-class setting: the class receiving the most votes wins, even without an absolute majority. The individual classifiers can be of the same or of different types.

E.g. random forest: multiple decision trees.
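
A minimal random-forest sketch (scikit-learn assumed; the dataset and settings are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 100 decision trees; each split considers a random subset of features,
# and the trees' predictions are aggregated into the final answer.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=1)
print(cross_val_score(forest, X, y, cv=5).mean())
```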

Parallel/Sequential Combining

  • Parallel: multi-expert combination
    • Global
      • All base learners generate outputs
    • Local
      • Select a few base learners based on the input (gating; see the sketch after this list)
  • Sequential: multi-stage combination
    • Start with simpler models and increase model complexity in later stages to handle the cases not handled well by earlier stages
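
A minimal sketch of local (gated) combination, where a hand-written rule on the input decides which expert answers; the gating rule and base learners here are purely illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=5, random_state=0)

# Two base learners ("experts") trained on the full data.
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
svm = SVC(kernel="rbf").fit(X, y)

def gated_predict(X_new):
    """Route each sample to one expert based on a toy gating rule."""
    preds_knn = knn.predict(X_new)
    preds_svm = svm.predict(X_new)
    # Illustrative gate: use the SVM where the first feature is positive.
    gate = X_new[:, 0] > 0
    return np.where(gate, preds_svm, preds_knn)

print(gated_predict(X[:10]))
```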

Bagging

Each base learner is trained on a random subset of the samples (a scikit-learn sketch follows the list below).

  • Parallel
  • Random sample with replacement
  • Reduces variance but cannot reduce bias; the ensemble's bias is the same as the base classifiers'
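
A minimal bagging sketch (scikit-learn assumed; the dataset and sizes are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# 50 base learners (decision trees by default), each trained in parallel on a
# bootstrap sample drawn with replacement from the training set.
bagging = BaggingClassifier(n_estimators=50, bootstrap=True, random_state=1)
print(cross_val_score(bagging, X, y, cv=5).mean())
```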

Adaptive Boosting (AdaBoost)

Combines weak base learners, i.e. favors diversity over individual accuracy. Each later classifier focuses on the samples that earlier classifiers handled poorly (a usage sketch follows the list below).

  • Sequential
  • Random sample without replacement
  • Can reduce bias
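
A usage sketch with scikit-learn's AdaBoostClassifier (the dataset and settings are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# The default base learner is a depth-1 decision tree (decision stump), i.e. a
# deliberately weak classifier; 100 of them are trained sequentially.
ada = AdaBoostClassifier(n_estimators=100, learning_rate=0.1, random_state=1)
print(cross_val_score(ada, X, y, cv=5).mean())
```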
Algorithm
  1. For each boosting round (implemented in the sketch after these steps):
    1. Train a learner
    2. Predict class labels
    3. Compute weighted error rate
    4. Compute re-weighting coefficients
    5. Update weights
    6. Normalize weights
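
A compact, didactic sketch of these steps for binary labels $$y \in \{-1, +1\}$$, using decision stumps from scikit-learn as the weak learners (the function names and defaults are assumptions, not a reference implementation):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    """AdaBoost sketch for labels y in {-1, +1}; returns (learners, alphas)."""
    n = len(y)
    w = np.full(n, 1.0 / n)          # initial sample weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        # 1. Train a weak learner on the weighted samples.
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        # 2. Predict class labels.
        pred = stump.predict(X)
        # 3. Compute the weighted error rate.
        err = np.sum(w * (pred != y)) / np.sum(w)
        if err >= 0.5:               # no better than chance: stop early
            break
        # 4. Compute the re-weighting coefficient (learner weight).
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-10))
        # 5. Update weights: misclassified samples get larger weights.
        w *= np.exp(-alpha * y * pred)
        # 6. Normalize weights so they sum to one.
        w /= np.sum(w)
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(X, learners, alphas):
    """Final prediction: sign of the alpha-weighted vote of the weak learners."""
    scores = sum(a * h.predict(X) for a, h in zip(alphas, learners))
    return np.sign(scores)
```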
