Talk:Document-term matrix: Difference between revisions

 
{{WikiProject banner shell|class=Stub|
{{WikiProject Linguistics|importance=Low|applied=Yes|applied-importance=|auto=Yes}}
}}
==Comments==
I need some help here:
 
 
:: Yes, but LSA is computed only once; the important part is getting real-time answers to ''queries''. Once the matrix is smaller, this will be faster, won't it? [[User:Kh251|KH251]] 12:37, 21 July 2005 (UTC)
 
:::LSA places a very serious computational burden on a search engine. Right now, if you type a word into a search engine, it looks the word up in a [[trie]] and finds documents that contain that word in O(1) time (independent of the number of documents in the collection). If you had a search engine that looked up documents in the LSA latent space, it would have to perform a high-dimensional nearest neighbor search. LSA is typically used with 100+ dimensions, so none of the [[computational geometry]] speed-ups for nearest neighbor search apply. Therefore, the search would be O(N), where N is the ''number of documents in the collection''. For Google, that would be 8,000,000,000. As you can see, this is disastrous for searching the web. -- [[User:Hike395|hike395]] 06:14, July 22, 2005 (UTC)
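
:::The contrast described above can be sketched roughly as follows. This is only an illustration, not how any real search engine is built: a Python dictionary stands in for the trie, the three documents are toy data, and the 3-dimensional "latent" vectors are made-up numbers rather than the output of an actual LSA computation. All function names are hypothetical.

```python
from collections import defaultdict
import math

# Toy collection (illustrative data only).
docs = {
    0: "the cat sat on the mat",
    1: "the dog chased the cat",
    2: "stock markets fell sharply",
}

def build_inverted_index(docs):
    """Map each word to the set of document ids containing it (built once)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)
    return index

def keyword_search(index, word):
    """One dictionary lookup; the lookup itself does not scan the collection."""
    return index.get(word, set())

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def brute_force_nearest(doc_vectors, query_vector):
    """Compare the query against every document vector: O(N) comparisons."""
    comparisons = 0
    best_id, best_score = None, -1.0
    for doc_id, vec in doc_vectors.items():
        comparisons += 1
        score = cosine(query_vector, vec)
        if score > best_score:
            best_id, best_score = doc_id, score
    return best_id, comparisons

index = build_inverted_index(docs)

# Hypothetical latent-space coordinates (assumed values, not real LSA output).
doc_vectors = {
    0: [0.9, 0.1, 0.0],
    1: [0.8, 0.3, 0.1],
    2: [0.0, 0.1, 0.9],
}
query_vector = [1.0, 0.2, 0.0]

hits = keyword_search(index, "cat")
nearest, comparisons = brute_force_nearest(doc_vectors, query_vector)
```

:::Here <code>comparisons</code> always equals the number of documents, which is the O(N) cost mentioned above, while <code>keyword_search</code> performs a single lookup regardless of collection size.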
 
:::: Oh! So that's how it works! Thank you very much for the explanation. You made my day. [[User:Kh251|KH251]] 09:02, 22 July 2005 (UTC)
 
Since several of us seem to have a taste for this topic, would anyone fancy creating an "NLP project" on Wikipedia? [[User:Rama|Rama]] 12:18, 22 July 2005 (UTC)
 
==Intro Improvement Request==
 
I encountered this term for the first time just a few minutes ago. I read the intro, but I still don't have a clear idea of what a document-term matrix is, other than it is a mathematical matrix and that it is related to a body of text. [[User:Danielx|Danielx]] ([[User talk:Danielx|talk]]) 01:42, 2 November 2009 (UTC)
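
:For anyone else landing here with the same question, a tiny worked example may help. The sketch below assumes the simplest possible reading of the term: one row per document, one column per distinct term, and each cell holding the number of times that term occurs in that document. The two sentences and the function name are illustrative choices, and the whitespace tokenization is a simplification.

```python
# Two toy documents (illustrative data only).
docs = ["I like databases", "I dislike databases"]

def document_term_matrix(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [d.lower().split() for d in docs]
    # Columns: every distinct term, in alphabetical order.
    vocab = sorted({word for tokens in tokenized for word in tokens})
    # Rows: one per document, holding raw term counts.
    matrix = [[tokens.count(term) for term in vocab] for tokens in tokenized]
    return vocab, matrix

vocab, matrix = document_term_matrix(docs)
```

:With these inputs the columns are <code>databases, dislike, i, like</code>, the first document's row is <code>[1, 0, 1, 1]</code> and the second's is <code>[1, 1, 1, 0]</code>.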