Boolean model of information retrieval: Difference between revisions

In the definitions section, rephrase some of the English to be clearer and more concise, and re-write some of the subscripting notation to be clearer.
Fix incomplete sentence I had introduced in previous edit. Fix single hyphens inappropriately used.
==Definitions==
 
The BIR is based on [[Boolean logic]] and classical [[set theory]] in that both the documents to be searched and the user's query are conceived as sets of terms. Retrieval is based on whether or not the documents contain the query terms. Given a finite set<math display="block">T = \{t_1, t_2,\ ...,\ t_m\}</math>of elements called ''index terms'' (e.g. words or expressions, which may be [[stemming|stemmed]], describing or characterizing documents, such as keywords given for a journal article), and a finite set<math display="block">D = \{D_1,\ ...\ ,D_n\} \text{ where } D_i \in Powerset(T)</math>of elements called ''documents'', as well as a Boolean expression <math display="inline">Q</math> in a normal form:<math display="block">Q = (W_1\ \or\ W_2\ \or\ \ldots) \and\ \ldots\ \and\ (W_i\ \or\ W_{i+1}\ \or\ \ldots)</math>called a ''query'', where <math display="inline">W_i</math> is true for <math display="inline">D_j</math> when <math display="inline">t_i \in D_j</math> (equivalently, <math display="inline">Q</math> could be expressed in a [[disjunctive normal form]]), we seek to find the set of documents that satisfy <math display="inline">Q</math>. This operation is called ''retrieval'' and consists of the following two steps:
 
: 1. For each <math display="inline">W_j</math>, find the set <math display="inline">S_j</math> of documents that satisfy <math display="inline">W_j</math>:<math display="block">S_j = \{D_i\ |\ W_j\}</math>
: 2. Then the set of documents that satisfy <math display="inline">Q</math> is given by:<math display="block">(S_1 \cup S_2 \cup \ \ldots) \cap\ \ldots\ \cap\ (S_i \cup S_{i+1} \cup\ \ldots)</math>
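
The two retrieval steps can be illustrated with a minimal Python sketch. It is not a reference implementation: the function name <code>retrieve</code>, the document labels and the sample terms are all hypothetical. It assumes that each document is stored as a set of index terms and that the query is given in conjunctive form as a list of clauses, each clause being a list of terms joined by OR.

```python
def retrieve(documents, query):
    """Return the labels of the documents that satisfy a conjunctive-form query.

    documents: dict mapping a label to a set of index terms (assumed layout).
    query: list of clauses; each clause is a list of terms joined by OR,
           and the clauses themselves are joined by AND.
    """
    # Step 1: for every term W_j used in the query, build S_j,
    # the set of documents that contain that term.
    terms = {term for clause in query for term in clause}
    s = {
        term: {label for label, doc in documents.items() if term in doc}
        for term in terms
    }

    # Step 2: take the union of the S_j inside each clause,
    # then intersect the results across clauses.
    clause_sets = [set().union(*(s[term] for term in clause)) for clause in query]
    if not clause_sets:
        # An empty conjunction is satisfied by every document.
        return set(documents)
    return set.intersection(*clause_sets)


# Hypothetical sample data, for illustration only.
documents = {
    "D1": {"probability", "bayes", "principle"},
    "D2": {"probability", "decision"},
    "D3": {"bayes", "principle", "decision"},
}
# (probability OR bayes) AND decision
query = [["probability", "bayes"], ["decision"]]

print(sorted(retrieve(documents, query)))  # ['D2', 'D3']
```

In this example the first clause is satisfied by all three documents and the second only by D2 and D3, so the intersection is {D2, D3}. Practical systems typically precompute the sets <math display="inline">S_j</math> rather than scanning every document per query.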