Content deleted Content added
m Wbm1058 moved page Standard Boolean model to Boolean model of information retrieval |
No edit summary |
||
Line 1:
{{technical|date=June 2018}}<!-- Needs a plain-English definition without mathematical symbols -->
The (standard) '''Boolean model of information retrieval
==Definitions==
Line 12:
We seek to find the set of documents that satisfy <math display="inline">Q</math>. This operation is called ''retrieval'' and consists of the following two steps:
: 1. For each <math display="inline">W_j</math> in <math display="inline">Q</math>, find the set <math display="inline">S_j</math> of documents that satisfy <math display="inline">W_j</math>:<math display="block">S_j = \{D_i\
==Example==
Line 18:
Let the set of original (real) documents be, for example
: <math>O = \{O_1,\ O_2,\ O_3\}</math>
where
Line 24:
<math display="inline">O_1</math> = "Bayes' Principle: The principle that, in estimating a parameter, one should initially assume that each possible value has equal probability (a uniform prior distribution)."
<math display="inline">O_2</math> = "[[Bayes' theorem|Bayesian
<math display="inline">O_3</math> = "Bayesian [[
Let the set <math display="inline">T</math> of terms be:
<math display="block">T = \{t_1=\text{Bayes' Principle}, t_2=\text{probability}, t_3=\text{decision-making}, t_4=\text{Bayesian
Then, the set <math display="inline">D</math> of documents is as follows:
: <math display="block">D = \{D_1,\ D_2,\ D_3\}</math>
where <math display="block">\begin{align}
D_1 &= \{\text{probability},\ \text{Bayes'
D_2 &= \{\text{probability},\ \text{decision-making}\} \\
D_3 &= \{\text{probability},\ \text{Bayesian
\end{align}</math>
Line 73:
=== Hash sets ===
Another possibility is to use [[hash set]]s. Each document is represented by a hash table which contains every single term of that document. Since hash table size increases and decreases in real time with the addition and removal of terms, each document will occupy much less space in memory. However, it will have a slowdown in performance because the operations are more complex than with [[bit vector]]s. On the worst-case performance can degrade from O(''n'') to O(''n''<sup>2</sup>). On the average case, the performance slowdown will not be that much worse than bit vectors and the space usage is much more efficient.
== References ==
|