Logistic Regression
Despite the name, not quite regression (continuous output) but classification (discrete output).
- Odds: ratio of the probability $p$ of an event to the probability of its complement
- $$\frac{p}{1-p}$$
- Logit function: maps a probability to a real number
- $$\text{logit}(p) = \log\left(\frac{p}{1-p}\right)$$
- $$\text{logit}(p(y=1 \mid x)) = w^T x$$
- Sigmoid: inverse of the logit function, maps a real number back to a probability
- $$\phi(z) = \frac{1}{1+e^{-z}}$$
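The logit/sigmoid pair above can be sketched in a few lines of NumPy; this is a minimal illustration of the definitions, not a library implementation:

```python
import numpy as np

def logit(p):
    # map a probability in (0, 1) to a real number: log(p / (1 - p))
    return np.log(p / (1 - p))

def sigmoid(z):
    # inverse of the logit: map a real number back to (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# round trip: sigmoid(logit(p)) recovers p (up to float rounding)
print(sigmoid(logit(0.8)))  # approximately 0.8
```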
Learning Algorithm
Maximize likelihood:
$$ L(w) = P(y \mid x; w) = \prod_{i=1}^{n} P(y^{(i)} \mid x^{(i)}; w) = \prod_{i=1}^{n} \left(\phi(z^{(i)})\right)^{y^{(i)}} \left(1-\phi(z^{(i)})\right)^{1-y^{(i)}} $$
or equivalently, minimize the negative log-likelihood:
$$ J(w) = \sum_{i=1}^{n} \left( -y^{(i)} \log(\phi(z^{(i)})) - (1-y^{(i)}) \log(1-\phi(z^{(i)})) \right) $$
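As a sketch, $J(w)$ can be evaluated directly from the definition (assuming a design matrix `X` whose rows are the $x^{(i)}$ and a 0/1 label vector `y`; the names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(w, X, y):
    # J(w): negative log-likelihood summed over the n training samples
    phi = sigmoid(X @ w)  # phi(z^(i)) for every sample at once
    return -np.sum(y * np.log(phi) + (1 - y) * np.log(1 - phi))
```

At $w = 0$ every $\phi(z^{(i)}) = 0.5$, so each sample contributes $\log 2$ regardless of its label, which makes a handy sanity check.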
The weight update has the same form as in Adaline; only the activation $\phi$ changes from the identity to the sigmoid:
$$ \Delta w_j = -\eta \frac{\partial J}{\partial w_j} = \eta \sum_{i=1}^{n} (y^{(i)} - \phi(z^{(i)})) x_j^{(i)} $$
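The update rule above translates into a short batch gradient-descent loop. This is a minimal sketch (the hyperparameters `eta` and `epochs` and the bias-column convention are assumptions, not from the notes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit(X, y, eta=0.1, epochs=2000):
    # batch gradient descent on J(w); X is assumed to include a bias column of 1s
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        errors = y - sigmoid(X @ w)  # y^(i) - phi(z^(i))
        w += eta * X.T @ errors      # Delta w_j = eta * sum_i (y - phi) x_j
    return w
```

Compared with Adaline, the code differs only in passing $z$ through the sigmoid before computing the errors.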