Undid revision 591406915 by 213.55.79.192 (talk) Reverted meaningless edit
Line 3:
==Definitions==
The BIR is based on [[Boolean
: T = {t1, t2, ..., tj, ..., tm}
of elements called index terms (e.g. words or expressions, possibly [[stemming|stemmed]], that describe or characterise documents, such as the keywords given for a journal article), and a finite set
: D = {D1, ..., Di, ..., Dn}, where Di is an element of the powerset of T
Line 18:
where ti means that the term ti is present in document Di, whereas NON ti means that it is not.
Equivalently, Q can be given in [[disjunctive normal form]]. An operation called retrieval, consisting of two steps, is defined as follows:
: 1. For each j, the set Sj of documents is obtained that contain or do not contain the term tj (depending on whether Wj = tj or Wj = NON tj):
Line 26:
: 2. The documents retrieved in response to Q are those that result from the corresponding set operations, i.e. the answer to Q is as follows:
:: UNION ( [[Intersection|INTERSECTION]] Sj)
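The two retrieval steps can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation. It assumes that Q is given in disjunctive normal form, so that the answer is a union of intersections of the sets Sj. The names <code>literal_set</code>, <code>retrieve</code>, <code>docs</code> and <code>query</code>, as well as the three toy documents, are hypothetical and do not come from the article.

```python
def literal_set(docs, term, present):
    """Step 1: the set Sj of documents that contain the term (present=True)
    or do not contain it (present=False, i.e. the literal is NON term)."""
    return {name for name, terms in docs.items() if (term in terms) == present}


def retrieve(docs, dnf_query):
    """Step 2: the union, over all conjunctions of the query, of the
    intersection of the sets Sj belonging to that conjunction."""
    answer = set()
    for conjunction in dnf_query:
        matched = set(docs)  # start from all documents, then narrow down
        for term, present in conjunction:
            matched &= literal_set(docs, term, present)
        answer |= matched
    return answer


# Hypothetical toy collection: each document is a set of index terms.
docs = {
    "D1": {"a", "b"},
    "D2": {"b", "c"},
    "D3": {"a", "c"},
}

# Q = (a AND NON c) OR (b AND c), written as a list of conjunctions,
# each conjunction being a list of (term, present) literals.
query = [
    [("a", True), ("c", False)],
    [("b", True), ("c", True)],
]

print(sorted(retrieve(docs, query)))  # ['D1', 'D2']
```

Here D1 satisfies the first conjunction (it contains a but not c) and D2 satisfies the second (it contains both b and c), while D3 satisfies neither and is therefore not retrieved.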
==Example==
Line 38:
O1 = Bayes' Principle: The principle that, in estimating a parameter, one should initially assume that each possible value has equal probability (a uniform prior distribution).
O2 = [[Bayes' theorem|Bayesian Decision Theory]]: A mathematical theory of decision-making which presumes utility and probability functions, and according to which the act to be chosen is the Bayes act, i.e. the one with highest
O3 = Bayesian [[Epistemology]]: A philosophical theory which holds that the epistemic status of a proposition (i.e. how well proven or well established it is) is best measured by a probability and that the proper way to revise this probability is given by Bayesian conditionalisation or similar procedures. A Bayesian epistemologist would use probability to define, and explore the relationship between, concepts such as epistemic status, support or explanatory power.
Line 44:
Let the set T of terms be:
T =
Bayesian Epistemology}
Line 78:
== Advantages ==
* Clean formalism
* Easy to implement
* Intuitive concept
Line 84:
== Disadvantages ==
* [[String search algorithm|Exact matching]] may retrieve too few or too many documents
* Output is difficult to rank, although some documents are more important than others
* Hard to translate a query into a Boolean expression
* All terms are equally weighted
* More like ''[[data retrieval]]'' than ''information retrieval''
== Data structures and algorithms ==
From a purely formal mathematical point of view, the BIR is straightforward. From a practical point of view, however, several further problems relating to algorithms and data structures must be solved, for example the choice of terms (manual or automatic selection, or both), [[stemming]], [[
=== Hash Sets ===
Another possibility is to use
== References ==