Find the optimal parameters analytically, by solving for the optimum in closed form instead of iterating with gradient descent.
Normal Equation
Take the derivative with respect to each $$\theta_j$$ and set it to zero.
$$ \theta = (X^TX)^{-1}X^Ty
$$
No need for feature scaling.
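A minimal sketch of the closed-form solve with NumPy; the data values below are made up for illustration (one feature plus an intercept column of ones):

```python
import numpy as np

# Design matrix X: m = 4 examples, intercept column plus one feature.
X = np.array([[1.0, 2104.0],
              [1.0, 1416.0],
              [1.0, 1534.0],
              [1.0,  852.0]])
y = np.array([460.0, 232.0, 315.0, 178.0])

# Normal equation: theta = (X^T X)^{-1} X^T y.
# Solving the linear system is numerically preferable to forming the inverse.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # optimal parameters in one step: no alpha, no iterations
```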
Comparison with Gradient Descent
| | Gradient Descent | Normal Equation |
|---|---|---|
| Choosing α | Yes | No |
| Iterate | Yes | No |
| Time | $$O(kn^2)$$ | $$O(n^3)$$, needs to calculate $$(X^TX)^{-1}$$ |
| When to use | n large | n small |
Noninvertibility
`pinv` computes the pseudoinverse: it returns a value even if the matrix is not invertible.
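As a sketch of the same behavior in NumPy (the data is made up, with a deliberately redundant column so $$X^TX$$ is singular):

```python
import numpy as np

# Third column = 2 * second column, so the columns are linearly dependent
# and X^T X is singular.
X = np.array([[1.0, 2.0,  4.0],
              [1.0, 3.0,  6.0],
              [1.0, 5.0, 10.0]])
y = np.array([1.0, 2.0, 3.0])

A = X.T @ X
# np.linalg.inv(A) would raise LinAlgError here, because A is singular.
theta = np.linalg.pinv(A) @ X.T @ y  # the pseudoinverse still returns a value
print(theta)
```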
- Causes
    - Redundant features causing linear dependency
        - Fix: remove the redundant features
    - Too many features, e.g. $$m \le n$$
        - Fix: delete some features
        - Fix: use regularization (see the sketch below)
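A sketch of the regularized normal equation, $$\theta = (X^TX + \lambda L)^{-1}X^Ty$$, where $$L$$ is the identity with its top-left entry zeroed so the intercept is not penalized; the function name and example data are my own:

```python
import numpy as np

def regularized_normal_equation(X, y, lam):
    # theta = (X^T X + lambda * L)^{-1} X^T y, where L is the identity
    # with L[0, 0] = 0 so the intercept term is not regularized.
    # For lam > 0 the matrix becomes invertible even when m <= n,
    # assuming the usual all-ones intercept column in X.
    n = X.shape[1]
    L = np.eye(n)
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

# Made-up example with m = 2 examples and n = 2 features plus an intercept,
# so X^T X alone would be singular.
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 4.0, 5.0]])
y = np.array([1.0, 2.0])
print(regularized_normal_equation(X, y, lam=1.0))
```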