Find the optimal parameters by solving for the optimum in closed form, instead of iterating with gradient descent.

Normal Equation

Take the derivatives of the cost function with respect to each $$\theta_j$$ and set them to zero.
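
Concretely, for the linear-regression cost $$J(\theta) = \frac{1}{2m}(X\theta - y)^T(X\theta - y)$$, the gradient is $$\nabla_\theta J(\theta) = \frac{1}{m}X^T(X\theta - y)$$. Setting it to zero gives $$X^TX\theta = X^Ty$$, and solving for $$\theta$$ (assuming $$X^TX$$ is invertible) yields: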

$$
\theta = (X^TX)^{-1}X^Ty
$$
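
A minimal sketch of the computation in NumPy (the toy data and variable names here are illustrative, not from the original notes):

```python
import numpy as np

# Toy training set: m = 4 examples, one raw feature.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])

# Design matrix with a leading column of ones for the intercept term.
X = np.column_stack([np.ones_like(x), x])

# Normal equation: theta = (X^T X)^{-1} X^T y.
# Solving the linear system is preferred over forming the inverse explicitly.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # [intercept, slope], fit in one shot -- no iterations
```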

There is no need for feature scaling.

Comparison with Gradient Descent

| | Gradient Descent | Normal Equation |
| --- | --- | --- |
| Choosing $$\alpha$$ | Yes | No |
| Iteration | Yes | No |
| Time | $$O(kn^2)$$ | $$O(n^3)$$; needs to compute $$(X^TX)^{-1}$$ |
| When to use | $$n$$ large | $$n$$ small |
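
To make the trade-off concrete, here is a sketch that fits the same toy data both ways; the learning rate $$\alpha$$ and the iteration count are arbitrary choices for illustration:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])
X = np.column_stack([np.ones_like(x), x])
m = len(y)

# Gradient descent: k iterations of cheap matrix-vector work,
# and alpha has to be tuned for convergence.
alpha, k = 0.1, 2000
theta_gd = np.zeros(X.shape[1])
for _ in range(k):
    theta_gd -= (alpha / m) * (X.T @ (X @ theta_gd - y))

# Normal equation: a single O(n^3) solve, no alpha, no iterations.
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

print(theta_gd, theta_ne)  # the two results should agree closely
```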

Noninvertibility

`pinv` computes the pseudoinverse, so it returns a usable result even when the matrix is not invertible.
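
For example, NumPy's `pinv` behaves the same way on a singular matrix:

```python
import numpy as np

# A singular matrix: the second row is twice the first,
# so a true inverse does not exist.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

# np.linalg.inv(A) would raise LinAlgError here;
# pinv returns the Moore-Penrose pseudoinverse instead.
print(np.linalg.pinv(A))
```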

  • Causes
    • Redundant features causing linear dependence (e.g., one feature is a linear multiple of another)
      • Remove the redundant features
    • Too many features, e.g. $$m \le n$$ (more features than training examples)
      • Delete some features
      • Use regularization (see the sketch below)
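
A sketch of the regularization remedy, assuming L2 regularization with strength `lam` and a design matrix `X` whose first column is the all-ones intercept column; the penalty matrix `L` is the identity with its top-left entry zeroed so the intercept is not penalized:

```python
import numpy as np

def regularized_normal_equation(X, y, lam):
    # theta = (X^T X + lam * L)^{-1} X^T y.
    # For lam > 0 the matrix X^T X + lam * L is invertible
    # even when m <= n or features are linearly dependent.
    n = X.shape[1]
    L = np.eye(n)
    L[0, 0] = 0.0  # do not penalize the intercept term
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```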
