Content deleted Content added
→Inverted indices: phrase search |
m Fixed CS1 errors: extra text: edition and general fixes |
||
Line 1:
{{Short description|Method for data management}}
'''Search engine indexing''' is the collecting, [[parsing]], and storing of data to facilitate fast and accurate [[information retrieval]]. Index design incorporates interdisciplinary concepts from [[linguistics]], [[cognitive psychology]], mathematics, [[informatics]], and [[computer science]]. An alternate name for the process, in the context of [[search engine]]s designed to find [[
Popular search engines focus on the [[Full-text search|full-text]] indexing of online, [[Natural language processing|natural language]] documents.<ref>Clarke, C., Cormack, G.: Dynamic Inverted Indexes for a Distributed Full-Text Retrieval System. TechRep MT-95-01, University of Waterloo, February 1995.</ref> [[Media type]]s such as pictures, video,<ref>{{cite journal |last=Sikos |first=L. F. |date=August 2016 |title=RDF-powered semantic video annotation tools with concept mapping to Linked Data for next-generation video indexing |journal=Multimedia Tools and Applications |doi=10.1007/s11042-016-3705-7 |s2cid=254832794 |url=https://ap01.alma.exlibrisgroup.com/view/delivery/61USOUTHAUS_INST/12165436490001831 }}{{Dead link|date=August 2023 |bot=InternetArchiveBot |fix-attempted=yes }}</ref> audio,<ref>http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf {{Bare URL PDF|date=March 2022}}</ref> and graphics<ref>Charles E. Jacobs, Adam Finkelstein, David H. Salesin. [http://grail.cs.washington.edu/projects/query/mrquery.pdf Fast Multiresolution Image Querying]. Department of Computer Science and Engineering, University of Washington. 1995. Verified Dec 2006</ref> are also searchable.
Line 74:
# If not, continue to the next occurrence of "first".
The postings lists can be navigated using a binary search in order to minimize the time complexity of this procedure.<ref>{{Cite book |last=Büttcher |first=Stefan |title=Information retrieval: implementing and evaluating search engines |last2=Clarke |first2=Charles L. A. |last3=Cormack |first3=Gordon V. |date=2016 |publisher=The MIT Press |isbn=978-0-262-52887-0 |edition=First MIT Press paperback
===Index merging===
Line 173:
===HTML priority system===
{{Original research section|date=November 2013}}
Indexing often has to recognize the [[HTML]] tags to organize priority. Indexing low priority to high margin to labels like ''strong'' and ''link'' to optimize the order of priority if those labels are at the beginning of the text could not prove to be relevant. Some indexers like [[Google]] and [[Bing (search engine)|Bing]] ensure that the [[search engine]] does not take the large texts as relevant source due to
===Meta tag indexing===
|