Search engine indexing: Difference between revisions

Content deleted Content added
Reverted 1 edit by 174.214.49.165 (talk): Way too long
Tags: Twinkle Undo Mobile edit Mobile web edit
Line 61:
|}
 
This index can only determine whether a word exists within a particular document, since it stores no information regarding the frequency and position of the word; it is therefore considered to be a [[booleanBoolean datatypedata type|booleanBoolean]] index. Such an index determines which documents match a query but does not rank matched documents. In some designs the index includes additional information such as the frequency of each word in each document or the positions of a word in each document.<ref>Grossman, Frieder, Goharian. [http://www.cs.clemson.edu/~juan/CPSC862/Concept-50/IR-Basics-of-Inverted-Index.pdf IR Basics of Inverted Index]. 2002. Verified Aug 2011.</ref> Position information enables the search algorithm to identify word proximity to support searching for phrases; frequency can be used to help in ranking the relevance of documents to the query. Such topics are the central research focus of [[information retrieval]].
 
The inverted index is a [[sparse matrix]], since not all words are present in each document. To reduce [[Computer Storage|computer storage]] memory requirements, it is stored differently from a two dimensional [[Array data structure|array]]. The index is similar to the [[document-term matrix|term document matrices]] employed by [[latent semantic analysis]]. The inverted index can be considered a form of a hash table. In some cases the index is a form of a [[binary tree]], which requires additional storage but may reduce the lookup time. In larger indices the architecture is typically a [[distributed hash table]].<ref>Tang, Hunqiang. [[Sandhya Dwarkadas|Dwarkadas, Sandhya]]. "Hybrid Global Local Indexing for Efficient