Boolean model of information retrieval

==Definitions==
 
An ''index term'' is a word or expression, which may be [[stemming|stemmed]], describing or characterizing a document, such as a keyword given for a journal article. Let<math display="block">T = \{t_1, t_2,\ \ldots,\ t_m\}</math>be the set of all such index terms.
 
A ''document'' is any subset of <math>T</math>. Let<math display="block">D = \{D_1,\ \ldots,\ D_n\}</math>be the set of all documents.
 
A ''query'' is a Boolean expression <math display="inline">Q</math> in [[conjunctive normal form]]:<math display="block">Q = (W_1\ \or\ W_2\ \or\ \cdots) \and\ \cdots\ \and\ (W_i\ \or\ W_{i+1}\ \or\ \cdots)</math>where <math display="inline">W_i</math> is true for <math>D_j</math> when <math>t_i \in D_j</math>. (Equivalently, <math display="inline">Q</math> could be expressed in [[disjunctive normal form]].)
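
The definitions above can be illustrated with a minimal Python sketch. It assumes each document is stored as a set of index terms and each query as a list of clauses, where a clause is the set of terms joined by OR and the clauses are joined by AND. The names <code>satisfies</code> and <code>retrieve</code>, the document labels, and the sample terms are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set

# Hypothetical toy collection: each document is the set of index terms it contains.
documents: Dict[str, Set[str]] = {
    "D1": {"bayes", "probability"},
    "D2": {"probability", "decision"},
    "D3": {"bayes", "decision", "information"},
}

# Query in conjunctive normal form: (probability OR information) AND (bayes).
# Each inner set is one OR-clause; the outer list is the AND of all clauses.
query: List[Set[str]] = [{"probability", "information"}, {"bayes"}]


def satisfies(doc_terms: Set[str], query_cnf: List[Set[str]]) -> bool:
    """Return True if every clause has at least one term present in the document."""
    return all(any(term in doc_terms for term in clause) for clause in query_cnf)


def retrieve(docs: Dict[str, Set[str]], query_cnf: List[Set[str]]) -> Set[str]:
    """Return the labels of all documents that satisfy the query."""
    return {label for label, terms in docs.items() if satisfies(terms, query_cnf)}


result = retrieve(documents, query)
print(sorted(result))  # ['D1', 'D3']
```

Here <code>D2</code> is rejected because it lacks the term required by the second clause, while <code>D1</code> and <code>D3</code> each contain one term from every clause. This brute-force scan checks every document against the query and is meant only to mirror the definitions.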
 
The goal is to find the set of documents that satisfy <math display="inline">Q</math>. This operation is called ''retrieval'' and consists of the following two steps: