{{Short description|Model for representing text documents}}
'''Vector space model''' or '''term vector model''' is an algebraic model for representing text documents (or more generally, items) as [[vector space|vectors]] such that the distance between vectors represents the relevance between the documents. It is used in [[information filtering]], [[information retrieval]], [[index (search engine)|index]]ing and
| last1 = Berry | first1 = Michael W.
| last2 = Drmac | first2 = Zlatko
| last3 = Jessup | first3 = Elizabeth R.
| date = January 1999
| doi = 10.1137/s0036144598347035
| issue = 2
| journal = SIAM Review
| pages = 335–362
| title = Matrices, Vector Spaces, and Information Retrieval
| volume = 41}}</ref>
==Definitions==
As all vectors under consideration by this model are element-wise nonnegative, a cosine value of zero means that the query and document vectors are [[orthogonal]] and have no match (i.e. no query term occurs in the document being considered). See [[cosine similarity]] for further information.<ref name=":0" />
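
The orthogonality case can be illustrated with a minimal sketch in Python. The vocabulary, the whitespace tokenisation, the raw term counts used as weights, and the helper names <code>term_vector</code> and <code>cosine_similarity</code> are assumptions chosen for illustration only; they are not part of the model's definition, and a practical system would normally use a weighting scheme such as tf–idf.

```python
import math
from collections import Counter


def term_vector(text, vocabulary):
    """Return raw term counts for `text`, ordered by `vocabulary` (illustrative weighting)."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length nonnegative vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # An all-zero vector shares no terms with anything; treat as no match.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical four-term vocabulary; each dimension corresponds to one term.
vocabulary = ["vector", "space", "model", "search"]

query = term_vector("vector model", vocabulary)
doc_match = term_vector("the vector space model", vocabulary)
doc_no_match = term_vector("search search", vocabulary)

print(cosine_similarity(query, doc_match))     # positive: shared terms
print(cosine_similarity(query, doc_no_match))  # 0.0: orthogonal vectors
```

In this sketch the second document contains none of the query terms, so the dot product is zero and the vectors are orthogonal, whereas the first document shares two terms with the query and receives a positive score.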
== Term
In the classic vector space model proposed by [[Gerard Salton|Salton]], Wong and Yang
:<math>
==Software that implements the vector space model==
{{further information|Vector database}}
The following software packages may be of interest to those wishing to experiment with vector models and implement search services based upon them.
===Free open source software===
* [[Apache Lucene]]. Apache Lucene is a high-performance, open source, full-featured text search engine library written entirely in Java.
* [[OpenSearch (software)|OpenSearch]], [[Elasticsearch]] and [[Apache Solr|Solr]]
* [[Gensim]] is a Python+[[NumPy]] framework for Vector Space modelling. It contains incremental (memory-efficient) algorithms for [[tf–idf|term frequency-inverse document frequency]], [[Latent Semantic Indexing|latent semantic indexing]], [[Locality sensitive hashing#Random projection|
* [[Weka (machine learning)|Weka]]. Weka is a popular data mining package for Java including WordVectors and [[Bag-of-words model|Bag Of Words models]].
* [[Word2vec]]. Word2vec uses vector spaces for word embeddings.