Boolean model of information retrieval: Difference between revisions

Content deleted Content added
lx
m manual clean up, typo(s) fixed: worst case → worst-case using AWB (10310)
Line 3:
==Definitions==
 
The BIR is based on [[Boolean logic]] and classical [[set theory]] in that both the documents to be searched and the user's query are conceived as sets of terms. Retrieval is based on whether or not the documents contain the query terms. Given a finite set
 
: T = {t1, t2, ..., tj, ..., tm}
 
of elements called index terms (e.g. words or expressions - which may be [[stemming|stemmed]] - describing or characterising documents such as keywords given for a journal article), a finite set
 
: D = {D1, ..., Di, ..., Dn}, where Di is an element of the powerset of T
Line 14:
 
:Q = (Wi OR Wk OR ...) AND ... AND (Wj OR Ws OR ...),
:with Wi=ti, Wk=tk, Wj=tj, Ws=ts, or Wi=NON ti, Wk=NON tk, Wj=NON tj, Ws=NON ts
 
where ti means that the term ti is present in document Di, whereas NON ti means that it is not.
 
Equivalently, Q can be given in a [[disjunctive normal form]], too. An operation called retrieval, consisting of two steps, is defined as follows:
 
: 1. The sets Sj of documents are obtained that contain or not term tj (depending on whether Wj=tj or Wj=NON tj) :
Line 96:
=== Hash Sets ===
 
Another possibility is to use [[hash set]]s. Each document is represented by a hash table which contains every single term of that document. Since Hash-table size increases and decreases in real time with the addition and removal of terms, each document will occupy much less space in memory. However, it will have a slowdown in performance because the operations are more complex than with [[bit vector]]s. On the worst -case performance can degrade from O(n) to O(n<sup>2</sup>). On the average case, the performance slowdown will not be that much worse than bit vectors and the space usage is much more efficient.
 
== References ==
Line 102:
* {{ citation | last= Lashkari | first=A.H. | coauthors=Mahdavi, F.; Ghomi, V. | doi= 10.1109/ICIME.2009.101 | title= A Boolean Model in Information Retrieval for Search Engines | year=2009 }}
 
{{DEFAULTSORT:Standard Boolean Model}}
[[Category:Operations research]]
[[Category:Mathematical modeling]]