Problem Formulation

Content-Based Recommender System

  • $$r(i, j)$$: 1 if user $$j$$ has rated movie $$i$$, 0 otherwise
  • $$y^{(i, j)}$$: rating of user $$j$$ on movie $$i$$ (if defined)
  • $$\theta^{(j)}$$: parameter vector for user $$j$$
  • $$x^{(i)}$$: feature vector for movie $$i$$ (prepend 1 as intercept term)
  • $$(\theta^{(j)})^Tx^{(i)}$$: predicted rating of user $$j$$ on movie $$i$$
  • $$m^{(j)}$$: # of movies rated by user $$j$$

To learn $$\theta^{(j)}$$:

$$ \min_{\theta^{(j)}} \frac{1}{2m^{(j)}} \sum_{i:r(i, j) = 1} ((\theta^{(j)})^Tx^{(i)} - y^{(i, j)})^2 + \frac{\lambda}{2m^{(j)}}\sum_{k=1}^n (\theta_k^{(j)})^2
$$
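The per-user cost above can be sketched in NumPy; the function name and argument layout are illustrative, not from the course (the intercept weight $$\theta_0^{(j)}$$ is left unregularized, matching the sum starting at $$k=1$$):

```python
import numpy as np

def user_cost(theta_j, X, y_col, r_col, lam):
    """Regularized squared-error cost for one user's parameter vector.

    theta_j : (n+1,) parameter vector (theta_j[0] multiplies the intercept)
    X       : (n_m, n+1) movie features, each row prepended with 1
    y_col   : (n_m,) this user's ratings (entries with r_col == 0 are ignored)
    r_col   : (n_m,) 1 where the user rated the movie, 0 otherwise
    """
    rated = r_col == 1
    err = X[rated] @ theta_j - y_col[rated]      # prediction errors on rated movies
    m_j = max(rated.sum(), 1)                    # number of movies rated by user j
    # regularize every component except the intercept theta_0
    return (0.5 / m_j) * np.sum(err ** 2) + (lam / (2 * m_j)) * np.sum(theta_j[1:] ** 2)
```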

To learn $$\theta^{(1)}, ..., \theta^{(n_u)}$$:

$$ \min_{\theta^{(1)}, ..., \theta^{(n_u)}} \sum_{j=1}^{n_u} \frac{1}{2m^{(j)}} \sum_{i:r(i, j) = 1} ((\theta^{(j)})^Tx^{(i)} - y^{(i, j)})^2 + \sum_{j=1}^{n_u} \frac{\lambda}{2m^{(j)}} \sum_{k=1}^n (\theta_k^{(j)})^2
$$

Gradient update:

$$ \begin{aligned} \theta_0^{(j)} &= \theta_0^{(j)} - \alpha \sum_{i:r(i, j) = 1} ((\theta^{(j)})^Tx^{(i)} - y^{(i, j)})x_0^{(i)} \\ \theta_k^{(j)} &= \theta_k^{(j)} - \alpha \left(\sum_{i:r(i, j) = 1} ((\theta^{(j)})^Tx^{(i)} - y^{(i, j)})x_k^{(i)} + \lambda \theta_k^{(j)}\right) \quad (k \neq 0) \end{aligned}
$$

$$m^{(j)}$$ can actually be eliminated.
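With $$m^{(j)}$$ dropped, one simultaneous gradient step for a single user's $$\theta^{(j)}$$ might look like the following sketch (names are illustrative):

```python
import numpy as np

def user_gradient_step(theta_j, X, y_col, r_col, alpha, lam):
    """One gradient descent update for one user's theta vector.

    The intercept theta_0 is not regularized, matching the update above.
    """
    rated = r_col == 1
    err = X[rated] @ theta_j - y_col[rated]   # (theta^(j))^T x^(i) - y^(i,j)
    grad = X[rated].T @ err                   # sum over rated movies i
    grad[1:] += lam * theta_j[1:]             # regularize k >= 1 only
    return theta_j - alpha * grad
```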

Collaborative Filtering

Given $$X$$, estimate $$\Theta$$:

$$ \min_{\theta^{(1)}, ..., \theta^{(n_u)}} \frac{1}{2} \sum_{j=1}^{n_u} \sum_{i:r(i, j) = 1} ((\theta^{(j)})^Tx^{(i)} - y^{(i, j)})^2 + \frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^n (\theta_k^{(j)})^2
$$

Given $$\Theta$$, can estimate $$X$$:

$$ \min_{x^{(1)}, ..., x^{(n_m)}} \frac{1}{2} \sum_{i=1}^{n_m} \sum_{j:r(i, j) = 1} ((\theta^{(j)})^Tx^{(i)} - y^{(i, j)})^2 + \frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^n (x_k^{(i)})^2
$$

Minimizing $$X$$ and $$\Theta$$ simultaneously:

$$ J(x^{(1)}, ..., x^{(n_m)}, \theta^{(1)}, ..., \theta^{(n_u)}) = \frac{1}{2} \sum_{(i, j):r(i, j) = 1} ((\theta^{(j)})^Tx^{(i)} - y^{(i, j)})^2 + \frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^n (x_k^{(i)})^2 + \frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^n (\theta_k^{(j)})^2
$$
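The joint cost $$J$$ has a compact vectorized form, sketched below; the function name and the $$R$$-mask convention are illustrative ($$R_{ij} = r(i, j)$$, and `(X @ Theta.T - Y) * R` zeros out the unrated entries):

```python
import numpy as np

def cofi_cost(X, Theta, Y, R, lam):
    """Joint collaborative-filtering cost J over X and Theta.

    X     : (n_m, n) movie feature matrix (no intercept column here)
    Theta : (n_u, n) user parameter matrix
    Y     : (n_m, n_u) ratings; R masks which entries are observed
    """
    err = (X @ Theta.T - Y) * R   # errors only where r(i, j) = 1
    return 0.5 * np.sum(err ** 2) + (lam / 2) * (np.sum(X ** 2) + np.sum(Theta ** 2))
```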

  1. Initialize $$\Theta$$, $$X$$ to small random values (without intercept term)
  2. Minimize $$J(\Theta, X)$$ using gradient descent or other optimization algorithm
    1. $$x_k^{(i)} = x_k^{(i)} - \alpha\left(\sum_{j:r(i,j)=1}((\theta^{(j)})^Tx^{(i)} - y^{(i, j)})\theta_k^{(j)} + \lambda x_k^{(i)}\right)$$
    2. $$\theta_k^{(j)} = \theta_k^{(j)} - \alpha \left(\sum_{i:r(i,j)=1}((\theta^{(j)})^Tx^{(i)} - y^{(i, j)})x_k^{(i)} + \lambda \theta_k^{(j)}\right)$$
  3. Predict rating with $$\theta^Tx$$
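The three steps above can be sketched as one vectorized training loop; hyperparameter values, the function name, and the seed handling are illustrative assumptions:

```python
import numpy as np

def train_cofi(Y, R, n_features, alpha=0.01, lam=0.1, iters=500, seed=0):
    """Collaborative filtering by gradient descent on J(X, Theta).

    Y : (n_m, n_u) ratings; R : (n_m, n_u) indicator of observed ratings.
    Returns learned X (n_m, n_features) and Theta (n_u, n_features).
    """
    rng = np.random.default_rng(seed)
    n_m, n_u = Y.shape
    # step 1: small random initialization, no intercept term
    X = rng.normal(scale=0.1, size=(n_m, n_features))
    Theta = rng.normal(scale=0.1, size=(n_u, n_features))
    for _ in range(iters):
        err = (X @ Theta.T - Y) * R            # zero where r(i, j) = 0
        X_grad = err @ Theta + lam * X         # vectorized form of update 2.1
        Theta_grad = err.T @ X + lam * Theta   # vectorized form of update 2.2
        X -= alpha * X_grad                    # simultaneous updates
        Theta -= alpha * Theta_grad
    return X, Theta

# step 3: predict all ratings at once with X @ Theta.T
```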

Low Rank Matrix Factorization

Predicted ratings:

$$ \begin{bmatrix} (\theta^{(1)})^Tx^{(1)} & (\theta^{(1)})^Tx^{(2)} & \dots & (\theta^{(1)})^Tx^{(n_m)} \\ (\theta^{(2)})^Tx^{(1)} & (\theta^{(2)})^Tx^{(2)} & \dots & (\theta^{(2)})^Tx^{(n_m)} \\ \vdots & \vdots & \ddots & \vdots \\ (\theta^{(n_u)})^Tx^{(1)} & (\theta^{(n_u)})^Tx^{(2)} & \dots & (\theta^{(n_u)})^Tx^{(n_m)} \end{bmatrix}
$$

$$ X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(n_m)})^T \end{bmatrix}
$$

$$ \Theta = \begin{bmatrix} (\theta^{(1)})^T \\ (\theta^{(2)})^T \\ \vdots \\ (\theta^{(n_u)})^T \end{bmatrix}
$$

With these definitions the full prediction matrix above is $$\Theta X^T$$, a low rank factorization.

Related movies:

The smaller $$\|x^{(i)} - x^{(j)}\|$$ is, the more similar movies $$i$$ and $$j$$ are.
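A minimal sketch of finding related movies by this distance (the function name is illustrative):

```python
import numpy as np

def most_similar(X, i, k=5):
    """Indices of the k movies whose feature vectors are closest to movie i."""
    dists = np.linalg.norm(X - X[i], axis=1)   # ||x^(i) - x^(j)|| for every j
    dists[i] = np.inf                          # exclude the movie itself
    return np.argsort(dists)[:k]
```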

Mean Normalization

Subtract each movie's mean observed rating $$\mu_i$$ from $$Y$$ before learning, then predict with $$(\theta^{(j)})^Tx^{(i)} + \mu_i$$; a user with no ratings is then predicted the movie's mean rather than 0.
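A sketch of the normalization step, assuming $$Y$$ is $$n_m \times n_u$$ and $$R$$ is the indicator matrix (the function name is illustrative):

```python
import numpy as np

def mean_normalize(Y, R):
    """Subtract each movie's mean observed rating; unrated entries stay 0."""
    # per-movie mean over rated entries only (guard against movies with no ratings)
    mu = np.sum(Y * R, axis=1) / np.maximum(np.sum(R, axis=1), 1)
    Y_norm = (Y - mu[:, None]) * R
    return Y_norm, mu
```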
