Vector space model: Difference between revisions

Revision as of 03:40, 22 June 2025 by 240b:c020:623:dd25:cdc3:8408:8aad:a1e (edit summary: ce)
Latest revision as of 16:58, 17 August 2025 by Macrakis (edit summary: use standard variant; Tag: Visual edit)
(One intermediate revision by one other user not shown)
Line 1:
{{Short description|Model for representing text documents}}
'''Vector space model''' or '''term vector model''' is an algebraic model for representing text documents (or more generally, items) as [[vector space|vectors]] such that the distance between vectors represents the relevance between the documents. It is used in [[information filtering]], [[information retrieval]], [[index (search engine)|index]]ing and ~~relevancy~~relevance rankings. Its first use was in the [[SMART Information Retrieval System]].<ref>{{cite journal | last1 = Berry | first1 = Michael W. | last2 = Drmac | first2 = Zlatko

Line 88:
===Free open source software===
* [[Apache Lucene]]. Apache Lucene is a high-performance, open source, full-featured text search engine library written entirely in Java.
* [[OpenSearch (software)]], [[Elasticsearch]] and [[Apache Solr|Solr]]: the ~~two~~three most well-known search engine programs ~~(many smaller exist)~~ based on Lucene. Others are also available.
* [[Gensim]] is a Python+[[NumPy]] framework for Vector Space modelling. It contains incremental (memory-efficient) algorithms for [[tf–idf|term frequency-inverse document frequency]], [[Latent Semantic Indexing|latent semantic indexing]], [[Locality sensitive hashing#Random projection|random projections]] and [[Latent Dirichlet Allocation|latent Dirichlet allocation]].
* [[Weka (machine learning)|Weka]]. Weka is a popular data mining package for Java including WordVectors and [[Bag-of-words model|Bag Of Words models]].
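The lead sentence in the diff describes documents as vectors whose distance reflects relevance. As a rough illustration only, the sketch below builds raw term-count vectors and ranks documents against a query. It assumes cosine similarity as the closeness measure and whitespace tokenization, neither of which is stated in the revision text, and the function names (`term_vector`, `cosine_similarity`, `rank`) are hypothetical. It uses only the Python standard library and does not reproduce the behaviour of any package listed above.

```python
import math
from collections import Counter


def term_vector(text):
    """Map a document to a sparse term-count vector (term -> raw count).

    Assumption: lowercase whitespace tokenization, no stemming or weighting.
    """
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors.

    Returns 0.0 if either vector is empty, so the ratio is always defined.
    """
    # Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (score, document) pairs sorted from most to least similar."""
    q = term_vector(query)
    scored = [(cosine_similarity(q, term_vector(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

For example, `rank("search engine", ["text search engine library", "data mining package"])` places the first document ahead of the second, because only it shares terms with the query. Raw counts are used here for brevity; a weighting scheme such as tf–idf, mentioned in the Gensim entry, could be substituted in `term_vector` without changing the ranking code.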