Logistic Regression
Despite the name, not quite regression (continuous output) but classification (discrete output).
- Odds: ratio of the probability $p$ of an event to the probability of its complement
- $$\frac{p}{1-p}$$
- Logit function: maps a probability to a real number
- $$\text{logit}(p) = \log\left(\frac{p}{1-p}\right)$$
- $$\text{logit}(p(y=1 \mid x)) = w^T x$$
- Sigmoid: inverse of the logit function, maps a real number back to a probability
- $$\phi(z) = \frac{1}{1+e^{-z}}$$
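The logit/sigmoid pair above can be sketched in a few lines of NumPy; this is a minimal illustration of the definitions, not a library implementation:

```python
import numpy as np

def logit(p):
    # map a probability in (0, 1) to a real number: log(p / (1 - p))
    return np.log(p / (1 - p))

def sigmoid(z):
    # inverse of the logit: map a real number back to (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# round trip: sigmoid(logit(p)) recovers p (up to float rounding)
print(sigmoid(logit(0.8)))  # approximately 0.8
```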
Learning Algorithm
Maximize likelihood:
$$ L(w) = P(y \mid x; w) = \prod_{i=1}^{n} P(y^{(i)} \mid x^{(i)}; w) = \prod_{i=1}^{n} \left(\phi(z^{(i)})\right)^{y^{(i)}} \left(1-\phi(z^{(i)})\right)^{1-y^{(i)}} $$
or equivalently, minimize the negative log-likelihood:
$$ J(w) = \sum_{i=1}^{n} \left( -y^{(i)} \log(\phi(z^{(i)})) - (1-y^{(i)}) \log(1-\phi(z^{(i)})) \right) $$
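As a sketch, $J(w)$ can be evaluated directly from the definition (assuming a design matrix `X` whose rows are the $x^{(i)}$ and a 0/1 label vector `y`; the names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(w, X, y):
    # J(w): negative log-likelihood summed over the n training samples
    phi = sigmoid(X @ w)  # phi(z^(i)) for every sample at once
    return -np.sum(y * np.log(phi) + (1 - y) * np.log(1 - phi))
```

At $w = 0$ every $\phi(z^{(i)}) = 0.5$, so each sample contributes $\log 2$ regardless of its label, which makes a handy sanity check.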
The weight update has the same form as in Adaline; only the activation $\phi$ changes from the identity to the sigmoid:
$$ \Delta w_j = -\eta \frac{\partial J}{\partial w_j} = \eta \sum_{i=1}^{n} (y^{(i)} - \phi(z^{(i)})) x_j^{(i)} $$
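The update rule above translates into a short batch gradient-descent loop. This is a minimal sketch (the hyperparameters `eta` and `epochs` and the bias-column convention are assumptions, not from the notes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit(X, y, eta=0.1, epochs=2000):
    # batch gradient descent on J(w); X is assumed to include a bias column of 1s
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        errors = y - sigmoid(X @ w)  # y^(i) - phi(z^(i))
        w += eta * X.T @ errors      # Delta w_j = eta * sum_i (y - phi) x_j
    return w
```

Compared with Adaline, the code differs only in passing $z$ through the sigmoid before computing the errors.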