Matrix factorization (recommender systems)

=== Funk MF ===
The original algorithm proposed by Simon Funk in his blog post<ref name="Funkblog" /> factorized the user-item rating matrix as the product of two lower-dimensional matrices: the first has a row for each user, while the second has a column for each item. The row or column associated with a specific user or item is referred to as its ''latent factors''.<ref name="Agarwal09">{{cite book |last1=Agarwal |first1=Deepak |title=Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '09 |last2=Chen |first2=Bee-Chung |date=28 June 2009 |pages=19–28 |doi=10.1145/1557019.1557029 |publisher=ACM|chapter=Regression-based latent factor models |isbn=9781605584959 |s2cid=17484284 }}</ref> Note that in Funk MF no [[singular value decomposition]] is applied; it is an SVD-like machine learning model.<ref name="Funkblog" />
The predicted ratings can be computed as <math>\tilde{R}=H W</math>, where <math>\tilde{R} \in \mathbb{R}^{\text{users} \times \text{items}}</math> is the predicted user-item rating matrix, <math>H \in \mathbb{R}^{\text{users} \times \text{latent factors}}</math> contains the users' latent factors and <math>W \in \mathbb{R}^{\text{latent factors} \times \text{items}}</math> the items' latent factors.
 
Specifically, the predicted rating that user ''u'' will give to item ''i'' is computed as:
 
<math>\tilde{r}_{ui} = \sum_{f=0}^{\text{n factors} } H_{u,f}W_{f,i}</math>
 
It is possible to tune the expressive power of the model by changing the number of latent factors. It has been demonstrated<ref name="Jannach13">{{cite book |last1=Jannach |first1=Dietmar |last2=Lerche |first2=Lukas |last3=Gedikli |first3=Fatih |last4=Bonnin |first4=Geoffray |title=User Modeling, Adaptation, and Personalization |chapter=What Recommenders Recommend – an Analysis of Accuracy, Popularity, and Sales Diversity Effects |volume=7899 |date=2013 |pages=25–37 |doi=10.1007/978-3-642-38844-6_3 |publisher=Springer Berlin Heidelberg |language=en|series=Lecture Notes in Computer Science |isbn=978-3-642-38843-9 |citeseerx=10.1.1.465.96 }}</ref> that a matrix factorization with one latent factor is equivalent to a ''most popular'' or ''top popular'' recommender (i.e., one that recommends the items with the most interactions, without any personalization). Increasing the number of latent factors improves personalization, and therefore recommendation quality, until the number of factors becomes too high, at which point the model starts to [[overfitting|overfit]] and recommendation quality decreases.
A common strategy to avoid overfitting is to add [[regularization (mathematics)|regularization]] terms to the objective function.<ref name="bi2017">{{cite journal|last1=Bi|first1=Xuan|last2=Qu|first2=Annie|last3=Wang|first3=Junhui|last4=Shen|first4=Xiaotong|year=2017|title=A group-specific recommender system.|url=https://amstat.tandfonline.com/doi/abs/10.1080/01621459.2016.1219261|journal=Journal of the American Statistical Association|volume=112|issue=519|pages=1344–1353|doi=10.1080/01621459.2016.1219261|s2cid=125187672}}</ref><ref>{{cite journal|last1=Zhu|first1=Yunzhang|last2=Shen|first2=Xiaotong|last3=Ye|first3=Changqing|year=2016|title=Personalized prediction and sparsity pursuit in latent factor models.|url=https://amstat.tandfonline.com/doi/abs/10.1080/01621459.2016.1219261|journal=Journal of the American Statistical Association|volume=111|issue=513|pages=241–252|doi=10.1080/01621459.2016.1219261|s2cid=125187672}}</ref>
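The Funk MF model above can be illustrated with a minimal NumPy sketch that trains the two factor matrices by stochastic gradient descent on the observed ratings, with the regularization term just described. All names and hyperparameter values (<code>n_factors</code>, <code>lr</code>, <code>reg</code>, the toy ratings) are illustrative, not taken from Funk's original post.

```python
import numpy as np

# Illustrative sketch of Funk-style matrix factorization with SGD.
# Hyperparameters and data are toy values, not canonical choices.
rng = np.random.default_rng(0)
n_users, n_items, n_factors = 4, 5, 2

# Observed ratings as (user, item, rating) triples; the matrix is sparse,
# so training iterates over observed entries only.
ratings = [(0, 1, 4.0), (0, 3, 2.0), (1, 0, 5.0), (2, 2, 3.0), (3, 4, 1.0)]

H = 0.1 * rng.standard_normal((n_users, n_factors))   # user latent factors
W = 0.1 * rng.standard_normal((n_factors, n_items))   # item latent factors

lr, reg = 0.02, 0.05   # learning rate and regularization strength
for epoch in range(500):
    for u, i, r in ratings:
        err = r - H[u] @ W[:, i]
        # Gradient step on the regularized squared error
        H[u] += lr * (err * W[:, i] - reg * H[u])
        W[:, i] += lr * (err * H[u] - reg * W[:, i])

R_tilde = H @ W   # predicted ratings for every user-item pair
```

Note that predictions are produced even for user-item pairs never observed in training; that generalization is the point of the low-rank factorization.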
The predicted rating that user ''u'' will give to item ''i'' is computed as:
 
<math>\tilde{r}_{ui} = \mu + b_i + b_u + \sum_{f=0}^{\text{n factors}} H_{u,f}W_{f,i}</math>
 
where <math>\mu</math> refers to the overall average rating over all items, and <math>b_i</math> and <math>b_u</math> refer to the observed deviations of item <math>i</math> and user <math>u</math>, respectively, from the average.<ref>{{Cite journal |last1=Koren |first1=Yehuda |last2=Bell |first2=Robert |last3=Volinsky |first3=Chris |date=August 2009 |title=Matrix factorization techniques for recommender systems |url=https://datajobs.com/data-science-repo/Recommender-Systems-[Netflix].pdf |journal=Computer |pages=45}}</ref> SVD++ has, however, some disadvantages, the main drawback being that this method is not ''model-based''. This means that if a new user is added, the algorithm is incapable of modeling them unless the whole model is retrained. Even though the system might have gathered some interactions for that new user, their latent factors are not available and therefore no recommendations can be computed. This is an example of a [[Cold start (recommender systems)|cold-start]] problem: the recommender cannot deal efficiently with new users or items, and specific strategies should be put in place to handle this disadvantage.<ref name="Kluver14">{{cite book|last1=Kluver |first1=Daniel |title=Proceedings of the 8th ACM Conference on Recommender systems - RecSys '14 |last2=Konstan |first2=Joseph A. |date=6 October 2014 |pages=121–128 |doi=10.1145/2645710.2645742 |publisher=ACM|chapter=Evaluating recommender behavior for new users |isbn=9781450326681 |s2cid=18509558 }}</ref>
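The biased prediction formula above can be sketched directly in NumPy. The parameter values below are made up for illustration; in practice <math>\mu</math>, the biases and the factor matrices would all be learned from data.

```python
import numpy as np

# Sketch of the biased prediction: mu + b_i + b_u + H[u] . W[:, i].
# All parameter values are illustrative, not learned.
rng = np.random.default_rng(1)
n_users, n_items, n_factors = 3, 4, 2

mu = 3.5                                   # global average rating
b_user = np.array([0.2, -0.1, 0.0])        # per-user deviations from mu
b_item = np.array([0.3, 0.0, -0.2, 0.1])   # per-item deviations from mu
H = 0.1 * rng.standard_normal((n_users, n_factors))
W = 0.1 * rng.standard_normal((n_factors, n_items))

def predict(u, i):
    """Biased prediction for user u and item i."""
    return mu + b_item[i] + b_user[u] + H[u] @ W[:, i]

# The full predicted matrix, using broadcasting for the bias terms:
R_tilde = mu + b_user[:, None] + b_item[None, :] + H @ W
```

The broadcast form shows that the biases simply shift every row and column of the low-rank term by a constant.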
Note that this does not entirely solve the [[Cold start (recommender systems)|cold-start]] problem, since the recommender still requires some reliable interactions for new users, but at least there is no need to recompute the whole model every time. It has been demonstrated that this formulation is almost equivalent to a SLIM model,<ref name="Zheng14">{{cite book |last1=Zheng |first1=Yong |last2=Mobasher |first2=Bamshad |last3=Burke |first3=Robin |title=CSLIM: contextual SLIM recommendation algorithms |date=6 October 2014 |pages=301–304 |doi=10.1145/2645710.2645756 |publisher=ACM|chapter=CSLIM |isbn=9781450326681 |s2cid=15931532 }}</ref> which is an [[item-item recommender|item-item]] model-based recommender.
 
<math>\tilde{r}_{ui} = \mu + b_i + b_u + \sum_{f=0}^{\text{n factors}} \biggl( \sum_{j=0}^{\text{n items}} r_{uj} W_{j,f} \biggr) W_{f,i}</math>
 
With this formulation, the equivalent [[item-item recommender]] would be <math>\tilde{R} = R S = R W^{\rm T} W</math>. Therefore, the similarity matrix is symmetric.
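The item-item equivalence can be verified numerically: with random item factors, the similarity matrix <math>S = W^{\rm T} W</math> is symmetric by construction. A small sketch, with bias terms omitted for brevity and all shapes illustrative:

```python
import numpy as np

# Sketch: predictions equal an item-item recommender R @ S with
# S = W.T @ W, which is symmetric by construction. Biases omitted.
rng = np.random.default_rng(2)
n_users, n_items, n_factors = 3, 5, 2

R = rng.integers(0, 6, size=(n_users, n_items)).astype(float)  # ratings
W = rng.standard_normal((n_factors, n_items))                  # item factors

S = W.T @ W        # item-item similarity matrix
R_tilde = R @ S    # identical to R @ W.T @ W
```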
The predicted rating that user ''u'' will give to item ''i'' is computed as:
 
<math>\tilde{r}_{ui} = \mu + b_i + b_u + \sum_{f=0}^{\text{n factors}} \sum_{j=0}^{\text{n items}} r_{uj} Q_{j,f}W_{f,i}</math>
 
With this formulation, the equivalent [[item-item recommender]] would be <math>\tilde{R} = R S = R Q^{\rm T} W</math>. Since the matrices <math>Q</math> and <math>W</math> are different, the similarity matrix is asymmetric, hence the name of the model.
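The contrast with the symmetric case can be sketched the same way: here <math>S = Q^{\rm T} W</math> with two distinct factor matrices, so <math>S</math> is generally not symmetric. Shapes and values below are illustrative, and bias terms are again omitted.

```python
import numpy as np

# Sketch of the asymmetric item-item view: S = Q.T @ W, with Q != W,
# so S is (almost surely) asymmetric. Biases omitted for brevity.
rng = np.random.default_rng(3)
n_users, n_items, n_factors = 3, 5, 2

R = rng.integers(0, 6, size=(n_users, n_items)).astype(float)
Q = rng.standard_normal((n_factors, n_items))   # factors for rated items
W = rng.standard_normal((n_factors, n_items))   # factors for target items

S = Q.T @ W        # asymmetric item-item similarity matrix
R_tilde = R @ S
```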
The predicted rating that user ''u'' will give to item ''i'' is computed as:
 
<math>\tilde{r}_{ui} = \sum_{f=0}^{\text{n factors}} (H_{u,f}+S_{v_u,f})(W_{f,i}+T_{f,j_i})</math>
 
Here <math>v_u</math> and <math>j_i</math> represent the group labels of user ''u'' and item ''i'', respectively, which are identical across members of the same group. The matrices <math>S</math> and <math>T</math> contain the group effects. For example, for a new user <math>u_{new}</math> whose latent factor <math>H_{u_{new}}</math> is not available, we can at least identify their group label <math>v_{u_{new}}</math> and predict their ratings as:
 
<math>\tilde{r}_{u_{new}i} = \sum_{f=0}^{\text{n factors}} S_{v_{u_{new}},f}(W_{f,i}+T_{f,j_i})</math>
 
This provides a good approximation to the unobserved ratings.
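The cold-start prediction above can be sketched as follows: the new user's own latent factor is unavailable, so only the group effect row <code>S[v_u]</code> enters the prediction. Group labels, sizes, and all values here are illustrative.

```python
import numpy as np

# Sketch of the group-specific prediction for a new user whose own
# latent factor H[u] is unknown. All values are illustrative.
rng = np.random.default_rng(4)
n_items, n_factors = 4, 2
n_user_groups, n_item_groups = 2, 2

S = rng.standard_normal((n_user_groups, n_factors))  # user-group effects
T = rng.standard_normal((n_factors, n_item_groups))  # item-group effects
W = rng.standard_normal((n_factors, n_items))        # item latent factors
item_group = np.array([0, 1, 0, 1])                  # group label j_i per item

def predict_new_user(v_u, i):
    """Prediction for a new user known only through its group label v_u."""
    return S[v_u] @ (W[:, i] + T[:, item_group[i]])

pred = predict_new_user(0, 2)
```

Only the group label of the new user is needed, which is what makes this formulation usable before any of the user's interactions have been observed.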