Decision Tree

  • Training: building a tree
    • Slow (roughly $$O(n \times m \log m)$$ for $$m$$ instances and $$n$$ features)
  • Prediction: following a single root-to-leaf path
    • Fast (roughly $$O(\log_2 m)$$ for a reasonably balanced tree)
Pros & Cons
  • Pros
    • Simple to understand & interpret
    • Handles numerical & categorical data
    • Little data preparation needed, e.g. feature scaling/normalization is not required
    • White box model
    • Performs well with large datasets
    • Mirrors human decision making closely
  • Cons
    • Sensitive to small changes in data
    • NP-completeness of training
    • Prone to overfitting
    • Tree can become prohibitively large when decision boundaries are not axis-aligned, since every split is orthogonal to a feature axis

Training Algorithm

Classification And Regression Tree (CART)

Assume binary splits and features with continuous values.

  1. Split the training set into 2 subsets using a single feature $$k$$ and threshold $$t_k$$ (see the code sketch after this list)
    1. Minimize $$J(k, t_k) = \frac{m_{left}}{m}G_{left} + \frac{m_{right}}{m}G_{right}$$
    2. $$G_{left/right}$$: impurity of the left/right subset
    3. $$m_{left/right}$$: number of instances in the left/right subset; $$m$$: total number of instances in the node
  2. Recursively do so for each subset
  3. Stop when:

    1. Maximum depth reached
    2. Cannot find split that reduces impurity
  4. Properties

    • Greedy: each split is locally optimal, but the resulting tree is not guaranteed to be globally optimal (e.g. minimum average depth)
    • NP-complete to find an optimal tree
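
A minimal sketch of the greedy split search in step 1, assuming Gini impurity and continuous features; the function names (`gini`, `best_split`) and the toy data are illustrative, not from any particular library.

```python
import numpy as np

def gini(y):
    """Gini impurity of a label array: 1 - sum_k p_k^2."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Try every (feature k, threshold t_k) pair and return the one minimizing
    J(k, t_k) = (m_left / m) * G_left + (m_right / m) * G_right."""
    m, n_features = X.shape
    best = (None, None, float("inf"))   # (k, t_k, cost J)
    for k in range(n_features):
        for t_k in np.unique(X[:, k]):
            left = X[:, k] <= t_k
            right = ~left
            if left.sum() == 0 or right.sum() == 0:
                continue                # skip splits that leave one side empty
            cost = (left.sum() / m) * gini(y[left]) + (right.sum() / m) * gini(y[right])
            if cost < best[2]:
                best = (k, t_k, cost)
    return best

# Toy example: feature 0 separates the two classes perfectly
X = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 1.0]])
y = np.array([0, 0, 1, 1])
print(best_split(X, y))   # (0, 2.0, 0.0): split on feature 0 at threshold 2.0
```

The full CART training loop would apply `best_split` recursively to each resulting subset until one of the stopping conditions above is met.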
Values

  • Discrete (categorical) features: split on category membership
  • Continuous (numerical) features: split on a threshold $$t_k$$

Impurity Measure
  • Gini: computes faster
    • $$G_i = 1 - \sum_{k=1}^{n} p_{i,k}^2$$
    • $$p_{i,k}$$: ratio of class $$k$$ instances among the training instances in the i-th node
  • Entropy: produces more balanced trees
    • $$H_i = -\sum_{k=1}^{n} p_{i,k}\log(p_{i,k})$$ (summing only over classes with $$p_{i,k} \neq 0$$)
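
A small sketch, assuming NumPy, that evaluates both measures on one node's labels so the formulas above are concrete; base-2 logarithm is used for the entropy here (natural log is also common).

```python
import numpy as np

def gini(y):
    """G_i = 1 - sum_k p_{i,k}^2."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(y):
    """H_i = -sum_k p_{i,k} * log(p_{i,k}), summing only over classes present."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

node_labels = np.array([0, 0, 0, 1, 1])   # 3 instances of class 0, 2 of class 1
print(gini(node_labels))     # 1 - (0.6^2 + 0.4^2) = 0.48
print(entropy(node_labels))  # -(0.6*log2(0.6) + 0.4*log2(0.4)) ≈ 0.971
```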

Regularization

  • Maximum depth
  • Minimum samples in a node before it can split
  • Minimum samples in a leaf node
  • Maximum leaf nodes
  • Maximum features evaluated for splitting
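
These knobs map directly onto scikit-learn's `DecisionTreeClassifier` hyperparameters (assuming scikit-learn is the library in use; the values below are illustrative, not recommended defaults).

```python
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=200, noise=0.25, random_state=42)

clf = DecisionTreeClassifier(
    max_depth=4,            # maximum depth
    min_samples_split=10,   # minimum samples in a node before it can split
    min_samples_leaf=5,     # minimum samples required in a leaf node
    max_leaf_nodes=16,      # maximum number of leaf nodes
    max_features=None,      # maximum features evaluated per split (None = all)
    random_state=42,
)
clf.fit(X, y)
print(clf.get_depth(), clf.get_n_leaves())
```

Increasing the min_* hyperparameters or decreasing the max_* ones regularizes the tree and reduces overfitting.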

Regression

Instead of predicting a class, each leaf predicts a continuous value: the mean target of the training instances in that leaf.

Also needs regularization.

Impurity Measure
  • Mean Squared Error (MSE)
    • $$MSE_{node} = \sum_{i \in node}(\hat{y}_{node} - y^{(i)})^2$$
    • $$\hat{y}_{node} = \frac{1}{m_{node}} \sum_{i \in node} y^{(i)}$$
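
A minimal sketch of the node statistics defined above, assuming NumPy: the node's prediction is the mean target of its instances, and its cost is the squared error around that mean (the weighted form of this cost replaces $$G$$ in $$J(k, t_k)$$ when choosing regression splits).

```python
import numpy as np

def node_stats(y_node):
    """Return (prediction, cost) for one node's target values."""
    y_hat = y_node.mean()                  # \hat{y}_node: mean target in the node
    cost = np.sum((y_hat - y_node) ** 2)   # MSE_node as written above (a sum over the node)
    return y_hat, cost

y_node = np.array([1.0, 2.0, 3.0, 10.0])
print(node_stats(y_node))   # (4.0, 50.0)
```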
