Setup
Training data
$$ \{x_{i}, y_{i}\}^{N}_{i=1} $$
Objective function: predict for each $x$
$$ \begin{align} p(y|x) &= \frac{e^{W_{y} \cdot x}}{\sum_{c=1}^{C} e^{W_{c} \cdot x}}~\text{where}~W \in \mathbb{R}^{C \times d} \\ &= \mathrm{softmax}(f)_{y} \\ f_{y} &= W_{y} \cdot x = \sum_{i=1}^{d} W_{yi} x_{i} \end{align} $$
$$ \rightarrow \text{Minimize} \ -\log p(y|x) = -\log\left(\frac{e^{f_{y}}}{\sum_{c=1}^{C} e^{f_{c}}}\right) $$
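The per-example loss can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the toy values of `W` and `x` are illustrative assumptions, not part of the notes.

```python
import math

def class_scores(W, x):
    """Compute f_c = W_c . x for every row W_c of W (C rows, d columns)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    """Map class scores f to probabilities e^{f_c} / sum_c e^{f_c}."""
    m = max(scores)  # subtracting the max avoids overflow and leaves the result unchanged
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def nll(W, x, y):
    """Negative log-likelihood -log p(y|x) for one example with true class index y."""
    return -math.log(softmax(class_scores(W, x))[y])

# Hypothetical toy problem: C = 3 classes, d = 2 features.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
x = [2.0, 1.0]
probs = softmax(class_scores(W, x))  # scores are [2.0, 1.0, -3.0]
loss = nll(W, x, 0)                  # loss if the true class is 0
```

The loss shrinks toward zero as the probability assigned to the true class approaches one, which is why minimizing it pushes the correct class score up relative to the others.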
Cross-entropy error with ground-truth probability $p$ and computed probability $q$
$$ \begin{align} H(p,q) &= -\sum_{c=1}^{C} p(c) \log q(c) \\ &= H(p) + D_{KL}(p||q)~(\text{Kullback-Leibler divergence}) \\ D_{KL}(p||q) &= \sum_{c=1}^{C} p(c) \log \frac{p(c)}{q(c)} \end{align} $$
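The decomposition of cross-entropy into entropy plus KL divergence can be verified on small distributions. This sketch assumes the usual convention $0 \log 0 = 0$; the example distributions are made up for illustration. It also shows that when $p$ is one-hot on the true class, $H(p) = 0$ and the cross-entropy reduces to $-\log q(y)$, the negative log-likelihood of the true class.

```python
import math

def entropy(p):
    """H(p) = -sum_c p(c) log p(c), skipping zero-probability terms."""
    return -sum(pc * math.log(pc) for pc in p if pc > 0)

def cross_entropy(p, q):
    """H(p, q) = -sum_c p(c) log q(c), skipping terms where p(c) = 0."""
    return -sum(pc * math.log(qc) for pc, qc in zip(p, q) if pc > 0)

def kl_divergence(p, q):
    """D_KL(p || q) = sum_c p(c) log(p(c) / q(c)), skipping terms where p(c) = 0."""
    return sum(pc * math.log(pc / qc) for pc, qc in zip(p, q) if pc > 0)

# Hypothetical distributions over C = 3 classes.
p_soft = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]
# Difference between the two sides of H(p, q) = H(p) + D_KL(p || q).
gap = cross_entropy(p_soft, q) - (entropy(p_soft) + kl_divergence(p_soft, q))

# One-hot ground truth on class index 1: only the true-class term survives.
p_onehot = [0.0, 1.0, 0.0]
ce_onehot = cross_entropy(p_onehot, q)
```

Because $H(p)$ does not depend on the model, minimizing the cross-entropy over $q$ is equivalent to minimizing $D_{KL}(p||q)$.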
$$ J(\theta) = \frac{1}{N}\sum_{i=1}^{N} -\log\left(\frac{e^{f_{y_{i}}}}{\sum_{c=1}^{C} e^{f_{c}}}\right) $$
$$ \rightarrow \text{with regularization} \\ J(\theta) = \frac{1}{N}\sum_{i=1}^{N} -\log\left(\frac{e^{f_{y_{i}}}}{\sum_{c=1}^{C} e^{f_{c}}}\right) + \lambda \sum_{k} \theta_{k}^{2} $$
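The full objective averages the per-example loss over the training set and adds the squared-parameter penalty. The sketch below assumes that the parameters $\theta$ are exactly the entries of $W$; the function names, toy data, and the value of `lam` are hypothetical.

```python
import math

def log_sum_exp(scores):
    """Stable log(sum_c e^{f_c}): factor out the largest score first."""
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))

def objective(W, X, Y, lam=0.0):
    """Mean softmax cross-entropy over (X, Y) plus lam * sum of squared weights."""
    total = 0.0
    for x, y in zip(X, Y):
        f = [sum(w * xi for w, xi in zip(row, x)) for row in W]
        # -log(e^{f_y} / sum_c e^{f_c}) rewritten as log-sum-exp minus the true-class score
        total += log_sum_exp(f) - f[y]
    data_loss = total / len(X)
    penalty = lam * sum(w * w for row in W for w in row)
    return data_loss + penalty

# Hypothetical toy data: N = 2 examples, C = 3 classes, d = 2 features.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
X = [[2.0, 1.0], [0.0, 3.0]]
Y = [0, 1]
J_plain = objective(W, X, Y)         # data term only
J_reg = objective(W, X, Y, lam=0.1)  # data term plus L2 penalty
```

With all weights at zero, every class gets probability $1/C$ and the data term equals $\log C$, which is a handy sanity check for an implementation.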