K-Nearest Neighbors

The Simplest Classifier

  • Assign a new sample the class of the most similar labelled samples
  • Instance-based: the model simply stores the training instances
    • Non-parametric learning: no fixed set of parameters is fitted

Algorithm

  • A training dataset of samples already classified
  • For each new sample, find the k nearest records
    • Assign it the majority class among those k nearest neighbors
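The steps above can be sketched in a few lines of plain Python. The function name `knn_predict`, the toy dataset, and the use of Euclidean distance are illustrative choices, not prescribed by the notes:

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Euclidean distance from the query to every training record
    dists = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    dists.sort(key=lambda d: d[0])
    # Majority class among the k closest neighbors
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy dataset: two well-separated clusters
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (1.1, 1.0), k=3))  # → a
```

Note there is no training phase: all work happens at prediction time, which is what "instance-based" means in practice.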

Choosing k

  • Bias-variance tradeoff
    • High variance (k too small): overfitting
    • High bias (k too large): underfitting
  • Common practice
    • $$k = \sqrt{\text{size of training sample}}$$
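As a quick sketch of the rule of thumb above (the sample size 150 is arbitrary; rounding to an odd k to reduce voting ties is a common convention, not part of the rule itself):

```python
import math

n = 150                   # hypothetical training-set size
k = round(math.sqrt(n))   # rule of thumb: k ≈ √n
if k % 2 == 0:
    k += 1                # prefer odd k to avoid tied votes in binary problems
print(k)                  # √150 ≈ 12.25 → 12 → 13
```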

Data Normalization

$$ X' = \frac{X - \min(X)}{\max(X) - \min(X)} $$

  • Problem: min/max values might change in the future
  • Fix: z-score standardization

$$ X' = \frac{X - \mu}{\sigma} $$
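Both normalization formulas can be sketched directly from their definitions. The sample data is made up for illustration; `z_score` uses the population standard deviation here, though the sample version (dividing by n − 1) is also common:

```python
def min_max(xs):
    """Rescale to [0, 1] using the observed min and max."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def z_score(xs):
    """Center on the mean and scale by the (population) standard deviation."""
    mu = sum(xs) / len(xs)
    sigma = (sum((x - mu) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - mu) / sigma for x in xs]

data = [10.0, 20.0, 30.0, 40.0, 50.0]
print(min_max(data))  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(z_score(data))  # mean 30, σ ≈ 14.14
```

Unlike min-max scaling, z-scores stay well defined when future values fall outside the range seen in training.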

Nominal/Categorical Features

  • Dummy coding
    • Same as one-hot, but leave one out as base variable
  • One-hot coding
  • Ordinal coding
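The three encodings can be sketched without any library. The category lists and sample values are illustrative; in practice a library such as pandas or scikit-learn would handle this:

```python
def one_hot(values, categories):
    """One binary column per category; exactly one 1 per row."""
    return [[1 if v == c else 0 for c in categories] for v in values]

def dummy(values, categories):
    """Same as one-hot, but drop the first category as the base level."""
    return [row[1:] for row in one_hot(values, categories)]

def ordinal(values, order):
    """Map ordered categories to integer ranks."""
    rank = {c: i for i, c in enumerate(order)}
    return [rank[v] for v in values]

colors = ["red", "green", "blue"]
print(one_hot(["green", "red"], colors))  # [[0, 1, 0], [1, 0, 0]]
print(dummy(["green", "red"], colors))    # [[1, 0], [0, 0]]
print(ordinal(["low", "high", "mid"], ["low", "mid", "high"]))  # [0, 2, 1]
```

Ordinal coding only makes sense when the categories have a natural order; for nominal features like color, one-hot or dummy coding avoids implying a false ranking.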
