{{Recommender systems}}
'''Matrix factorization''' is a class of [[collaborative filtering]] algorithms used in [[recommender system]]s. Matrix factorization algorithms work by decomposing the user-item interaction [[Matrix (mathematics)|matrix]] into the product of two lower-dimensionality rectangular matrices.<ref name="Koren09">{{cite journal |last1=Koren |first1=Yehuda |last2=Bell |first2=Robert |last3=Volinsky |first3=Chris |title=Matrix Factorization Techniques for Recommender Systems |journal=Computer |date=August 2009 |volume=42 |issue=8 |pages=30–37 |doi=10.1109/MC.2009.263 |citeseerx=10.1.1.147.8295 |s2cid=58370896 }}</ref> This family of methods became widely known during the [[Netflix prize]] challenge due to its effectiveness as reported by Simon Funk in his 2006 blog post,<ref name="Funkblog">{{cite web |last1=Funk |first1=Simon |title=Netflix Update: Try This at Home |url=http://sifter.org/~simon/journal/20061211.html}}</ref> where he shared his findings with the research community. The prediction results can be improved by assigning different regularization weights to the latent factors based on items' popularity and users' activeness.<ref>{{Cite journal |last1=Chen |first1=Hung-Hsuan |last2=Chen |first2=Pu |date=2019-01-09 |title=Differentiating Regularization Weights}}</ref>
== Techniques ==
=== Funk MF ===
The original algorithm proposed by Simon Funk in his blog post factorized the user-item rating matrix as the product of two lower-dimensional matrices, the first one having a row for each user, while the second has a column for each item.
The predicted ratings can be computed as <math>\tilde{R}=H W</math>, where <math>\tilde{R} \in \mathbb{R}^{\text{users} \times \text{items}}</math> is the user-item rating matrix, <math>H \in \mathbb{R}^{\text{users} \times \text{latent factors}}</math> contains the users' latent factors and <math>W \in \mathbb{R}^{\text{latent factors} \times \text{items}}</math> the items' latent factors.
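As a minimal illustration of this decomposition, the following NumPy sketch builds toy factor matrices and multiplies them into a predicted rating matrix (sizes, names, and random initial values are purely illustrative, not from the cited sources):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, n_factors = 4, 5, 2  # toy sizes

# H: one row of latent factors per user; W: one column per item
H = rng.normal(size=(n_users, n_factors))
W = rng.normal(size=(n_factors, n_items))

# Predicted rating matrix: R~ = H W, one entry per (user, item) pair
R_pred = H @ W
```

Each entry of <code>R_pred</code> is the dot product of one user's factor row with one item's factor column, matching the element-wise formula below.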
<math>\tilde{r}_{ui} = \sum_{f=0}^{\text{n factors} } H_{u,f}W_{f,i}</math>
It is possible to tune the expressive power of the model by changing the number of latent factors. It has been demonstrated that a matrix factorization with one latent factor is equivalent to a ''most popular'' recommender, which suggests the items with the most interactions without any personalization; increasing the number of latent factors improves personalization, until the number of factors becomes too high and the model starts to overfit.
Funk MF was developed as a ''rating prediction'' problem, and therefore uses explicit numerical ratings as user-item interactions.
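Funk MF is commonly trained with stochastic gradient descent over the observed ratings only, with L2 regularization, as in Funk's original post. A minimal sketch under that assumption (toy data and illustrative hyperparameters, not a reference implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy observed (user, item, rating) triples; values are illustrative
ratings = [(0, 0, 5.0), (0, 2, 3.0), (1, 1, 4.0), (2, 0, 1.0), (2, 2, 2.0)]
n_users, n_items, n_factors = 3, 3, 2

H = 0.1 * rng.normal(size=(n_users, n_factors))
W = 0.1 * rng.normal(size=(n_factors, n_items))
lr, reg = 0.05, 0.02  # learning rate and L2 regularization weight

# Minimize (r_ui - H_u . W_i)^2 + reg * (||H_u||^2 + ||W_i||^2)
# over the observed entries only
for epoch in range(500):
    for u, i, r in ratings:
        err = r - H[u] @ W[:, i]
        h_u = H[u].copy()  # keep pre-update user factors for the item gradient
        H[u] += lr * (err * W[:, i] - reg * H[u])
        W[:, i] += lr * (err * h_u - reg * W[:, i])
```

After training, <code>H @ W</code> approximates the observed ratings closely while also producing predictions for the unobserved entries.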
<math>\tilde{r}_{ui} = \mu + b_i + b_u + \sum_{f=0}^{\text{n factors}} H_{u,f}W_{f,i}</math>
where <math>\mu</math> refers to the overall average rating over all items, and <math>b_i</math> and <math>b_u</math> refer to the observed deviations of item <math>i</math> and user <math>u</math>, respectively, from the average.<ref>{{Cite journal |last1=Koren |first1=Yehuda |last2=Bell |first2=Robert |last3=Volinsky |first3=Chris |date=August 2009 |title=Matrix factorization techniques for recommender systems |url=https://datajobs.com/data-science-repo/Recommender-Systems-[Netflix].pdf |journal=Computer |pages=45}}</ref> SVD++ has, however, some disadvantages, the main drawback being that this method is not ''model-based''. This means that if a new user is added, the algorithm is incapable of modeling them unless the whole model is retrained. Even though the system might have gathered some interactions for that new user, its latent factors are not available and therefore no recommendations can be computed. This is an example of a [[Cold start (recommender systems)|cold-start]] problem: the recommender cannot deal efficiently with new users or items, and specific strategies should be put in place to handle this disadvantage.<ref name="Kluver14">{{cite book|last1=Kluver |first1=Daniel |title=Proceedings of the 8th ACM Conference on Recommender systems}}</ref>
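The biased prediction rule above can be sketched directly (NumPy; the bias values and factors below are illustrative placeholders, whereas in practice all of them are learned from data):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, n_factors = 3, 4, 2

H = rng.normal(size=(n_users, n_factors))
W = rng.normal(size=(n_factors, n_items))
mu = 3.5                                      # global average rating
b_user = rng.normal(scale=0.1, size=n_users)  # per-user deviation from mu
b_item = rng.normal(scale=0.1, size=n_items)  # per-item deviation from mu

def predict(u, i):
    """r~_ui = mu + b_i + b_u + H_u . W_i"""
    return mu + b_item[i] + b_user[u] + H[u] @ W[:, i]
```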
A possible way to address this cold-start problem is to modify SVD++ so that it becomes a ''model-based'' algorithm, which allows it to easily handle new items and new users.
=== Asymmetric SVD ===
Asymmetric SVD aims at combining the advantages of SVD++ while being a model-based algorithm, and is therefore able to consider new users with a few ratings without needing to retrain the whole model. As opposed to the model-based SVD, here the user latent factor matrix H is replaced by Q, which learns the user's preferences as a function of their ratings.<ref name="Pu13">{{cite book |last1=Pu |first1=Li |title=Proceedings of the 7th ACM conference on Recommender systems}}</ref>
The predicted rating user ''u'' will give to item ''i'' is computed as:
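A sketch of this model-based idea, under a simplifying assumption (the user profile is built as a rating-weighted combination of learned item factors <code>Q</code>, scaled by <math>|R(u)|^{-1/2}</math>; all names and values are illustrative, and the bias terms of the full formulation are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_factors = 5, 2

Q = rng.normal(size=(n_items, n_factors))  # item factors building user profiles
W = rng.normal(size=(n_factors, n_items))  # item factors used for scoring

def predict_all(user_ratings):
    """Score every item for a user described only by their observed ratings.

    No per-user parameters are involved, so a new user can be scored
    from a few ratings without retraining the model.
    """
    rated = np.array(list(user_ratings.keys()))
    r = np.array(list(user_ratings.values()))
    profile = (r @ Q[rated]) / np.sqrt(len(rated))  # user profile in factor space
    return profile @ W

scores = predict_all({0: 5.0, 3: 1.0})  # new user who rated items 0 and 3
```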
=== Hybrid MF ===
In recent years many other matrix factorization models have been developed to exploit the ever-increasing amount and variety of available interaction data and use cases. Hybrid matrix factorization algorithms are capable of merging explicit and implicit interactions, or both content and collaborative data.
=== Deep-learning MF ===
In recent years a number of neural and deep-learning techniques have been proposed, some of which generalize traditional matrix factorization algorithms via a non-linear neural architecture.<ref>{{cite book |last1=He |first1=Xiangnan |last2=Liao |first2=Lizi |last3=Zhang |first3=Hanwang |last4=Nie |first4=Liqiang |last5=Hu |first5=Xia |last6=Chua |first6=Tat-Seng |title=Proceedings of the 26th International Conference on World Wide Web |chapter=Neural Collaborative Filtering |date=2017 |pages=173–182 |doi=10.1145/3038912.3052569 |chapter-url=https://dl.acm.org/citation.cfm?id=3052569 |accessdate=16 October 2019 |publisher=International World Wide Web Conferences Steering Committee |isbn=9781450349130 |arxiv=1708.05031 |s2cid=13907106 }}</ref>
While deep learning has been applied to many different scenarios (context-aware, sequence-aware, social tagging, etc.), its real effectiveness when used in a simple [[Collaborative filtering]] scenario has been put into question. Systematic analysis of publications applying deep learning or neural methods to the top-k recommendation problem, published in top conferences (SIGIR, KDD, WWW, RecSys, IJCAI), has shown that on average less than 40% of articles are reproducible, with as little as 14% in some conferences. The studies identify 26 articles in total; only 12 of them could be reproduced, and 11 of those could be outperformed by much older and simpler, properly tuned baselines. The studies also highlight a number of potential problems in today's research scholarship and call for improved scientific practices in that area.<ref>{{cite book |last1=Rendle |first1=Steffen |last2=Krichene |first2=Walid |last3=Zhang |first3=Li |last4=Anderson |first4=John |title=Fourteenth ACM Conference on Recommender Systems |chapter=Neural Collaborative Filtering vs. Matrix Factorization Revisited |date=22 September 2020 |pages=240–248 |doi=10.1145/3383313.3412488 |arxiv=2005.09683 |isbn=9781450375832 |doi-access=free }}</ref><ref>{{cite journal |last1=Ferrari Dacrema |first1=Maurizio |title=A Troubling Analysis of Reproducibility and Progress in Recommender Systems Research |journal=ACM Transactions on Information Systems |date=2021 |volume=39 |issue=2 |doi=10.1145/3434185 |arxiv=1911.07698 |s2cid=208138060 }}</ref> Similar issues have been spotted also in sequence-aware recommender systems.<ref>{{cite book |last1=Ludewig |first1=Malte |last2=Mauro |first2=Noemi |last3=Latifi |first3=Sara |last4=Jannach |first4=Dietmar |title=Proceedings of the 13th ACM Conference on Recommender Systems |chapter=Performance comparison of neural and non-neural approaches to session-based recommendation |date=2019 |pages=462–466 |doi=10.1145/3298689.3347041 |publisher=ACM |isbn=9781450362436 |doi-access=free }}</ref>