Content deleted Content added
Sietse Snel (talk | contribs) m →References: fix deprecated reference tag |
→Hash Sets: Fixed grammar Tags: canned edit summary Mobile edit Mobile app edit |
||
Line 93:
From a pure formal mathematical point of view, the BIR is straightforward. From a practical point of view, however, several further problems should be solved that relate to algorithms and data structures, such as, for example, the choice of terms (manual or automatic selection or both), [[stemming]], [[hash table]]s, [[Inverted index|inverted file]] structure, and so on.<ref name="wartik">{{cite book | last = Wartik | first = Steven | title = Information Retrieval Data Structures & Algorithms | chapter=Boolean operations | publisher = Prentice-Hall, Inc. | year = 1992 | isbn = 0-13-463837-9 | url = http://www.scribd.com/doc/13742235/Information-Retrieval-Data-Structures-Algorithms-William-B-Frakes }}</ref>
=== Hash
Another possibility is to use [[hash set]]s. Each document is represented by a hash table which contains every single term of that document. Since Hash-table size increases and decreases in real time with the addition and removal of terms, each document will occupy much less space in memory. However, it will have a slowdown in performance because the operations are more complex than with [[bit vector]]s. On the worst-case performance can degrade from O(n) to O(n<sup>2</sup>). On the average case, the performance slowdown will not be that much worse than bit vectors and the space usage is much more efficient.
|