Talk:Document-term matrix: Difference between revisions

{{WikiProject banner shell|class=Stub|
{{WikiProject Linguistics|importance=Low|applied=Yes|applied-importance=|auto=Yes}}
}}
==Comments==
I need some help here:
 
 
We definitely need more applications.
[[User:Kh251|Kh251]]
 
I don't agree with the last changes. Performing eigenvalue decomposition reduces the size of the matrix, thus improving speed but decreasing accuracy. I know I might be wrong, but I'd like to understand...
[[User:Kh251|KH251]] 09:32, 21 July 2005 (UTC)
 
: Not necessarily: what you say is one valid interpretation of the reduction, but the reduction can also be interpreted as creating a "better" matrix, since the operation tends to "soften" the representation and reduce possible noise.
: Also, it's not always true that this makes it easier on the computational side; for instance, LSA is rather ''heavier'' than just leaving the thing alone (I have a reference for that somewhere, I am just rather busy at the moment...). Hope it helps! Cheers! [[User:Rama|Rama]] 12:14, 21 July 2005 (UTC)
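: The rank reduction being debated above can be sketched with a truncated SVD (the decomposition underlying LSA) on a toy document-term matrix. The corpus below is a made-up 4-document, 5-term example, and the use of numpy is my own choice, not something stated in the discussion:

```python
import numpy as np

# Toy document-term matrix: rows = documents, columns = terms.
# Entries are raw term counts for a hypothetical 4-document, 5-term corpus.
X = np.array([
    [2, 1, 0, 0, 0],
    [1, 2, 1, 0, 0],
    [0, 0, 0, 3, 1],
    [0, 0, 1, 1, 2],
], dtype=float)

# Truncated SVD: keep only the k largest singular values.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# X_k is the best rank-k approximation of X in the least-squares sense;
# this is the "softening" interpretation: small singular values, which
# tend to carry noise, are discarded. Documents can now be compared in
# the k-dimensional latent space instead of the full 5-term space.
doc_vectors = U[:, :k] * s[:k]
print(doc_vectors.shape)  # (4, 2)
```

: The accuracy/size trade-off KH251 raises is the choice of `k`: smaller `k` gives smaller document vectors but a coarser approximation of the original matrix.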
 
:: Yes, but LSA is computed once; the important part is having real-time answers to ''queries''. Once the matrix is smaller, this will be faster, won't it? [[User:Kh251|KH251]] 12:37, 21 July 2005 (UTC)
 
:::LSA produces a very serious computation burden on a search engine. Right now, if you type a word at a search engine, it looks the word up in a [[trie]] and finds documents that contain that word in O(1) time (independent of the number of documents in the collection). If you had a search engine that looked up documents in the LSA latent space, it would have to perform high-dimensional nearest neighbor search. LSA is typically used with 100+ dimensions, so none of the [[computational geometry]] speed-ups for nearest neighbor search apply. Therefore, the search would be O(N), where N is the ''number of documents in the collection''. For Google, that would be 8,000,000,000. As you can see, this is disastrous for searching the web. -- [[User:Hike395|hike395]] 06:14, July 22, 2005 (UTC)
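::: The cost difference hike395 describes can be illustrated side by side. The inverted index below stands in for the trie-backed lookup (one lookup per query term, independent of collection size), while the LSA-style path must scan all N document vectors. The corpus, vocabulary, and sizes are invented for illustration:

```python
import numpy as np

# Keyword search: an inverted index maps each term to its posting list.
# Answering a one-word query is a single hash/trie lookup; the cost does
# not grow with the number of documents in the collection.
index = {"matrix": [0, 2], "latent": [1], "search": [2, 3]}
hits = index.get("matrix", [])

# LSA-style retrieval: every document is a dense k-dimensional vector in
# the latent space, and answering a query means a nearest-neighbor scan
# over ALL N documents -- O(N * k) work per query, since at 100+
# dimensions the usual geometric speed-ups no longer help.
rng = np.random.default_rng(0)
N, k = 10_000, 100                      # N documents, 100 latent dimensions
docs = rng.standard_normal((N, k))      # stand-in latent document vectors
query = rng.standard_normal(k)          # stand-in projected query vector

# Cosine similarity against every single document.
sims = docs @ query / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))
best = int(np.argmax(sims))
```

::: With N = 10,000 the linear scan is harmless; the point of the thread above is that at web scale (N in the billions) it is not.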
 
:::: Oh ! That's how ! Thank you very much for the explanation. You made my day. [[User:Kh251|KH251]] 09:02, 22 July 2005 (UTC)
 
Since several of us seem to have a taste for the subject, would anyone fancy creating an "NLP project" on Wikipedia? [[User:Rama|Rama]] 12:18, 22 July 2005 (UTC)
 
== Intro Improvement Request ==
 
I encountered this term for the first time just a few minutes ago. I read the intro, but I still don't have a clear idea of what a document-term matrix is, other than it is a mathematical matrix and that it is related to a body of text. [[User:Danielx|Danielx]] ([[User talk:Danielx|talk]]) 01:42, 2 November 2009 (UTC)