Search engine indexing: Difference between revisions

Content deleted Content added
Mstary (talk | contribs)
No edit summary
Meta tag indexing: Citations needed.
 
(5 intermediate revisions by 4 users not shown)
Line 3:
'''Search engine indexing''' is the collecting, [[parsing]], and storing of data to facilitate fast and accurate [[information retrieval]]. Index design incorporates interdisciplinary concepts from [[linguistics]], [[cognitive psychology]], mathematics, [[informatics]], and [[computer science]]. An alternate name for the process, in the context of [[search engine]]s designed to find [[web page]]s on the Internet, is ''[[web indexing]]''.
 
Popular search engines focus on the [[Full-text search|full-text]] indexing of online, [[Natural language processing|natural language]] documents.<ref>Clarke, C., Cormack, G.: Dynamic Inverted Indexes for a Distributed Full-Text Retrieval System. TechRep MT-95-01, University of Waterloo, February 1995.</ref> [[Media type]]s such as pictures, video,<ref>{{cite web|url=https://superstarseo.com/how-search-engines-actually-find-and-index-video-content/|title= How Search Engines Actually Find and Index Video Content| website= superstarseo.com }}</ref> audio,<ref>{{Cite web| title=An Industrial-Strength Audio Search Algorithm | url=http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf | archive-url=https://web.archive.org/web/20060512074748/http://www.ee.columbia.edu:80/~dpwe/papers/Wang03-shazam.pdf | archive-date=2006-05-12}}</ref> and graphics<ref>Charles E. Jacobs, Adam Finkelstein, David H. Salesin. [http://grail.cs.washington.edu/projects/query/mrquery.pdf Fast Multiresolution Image Querying]. Department of Computer Science and Engineering, University of Washington. 1995. Verified Dec 2006</ref> are also searchable.
 
[[Metasearch engine|Meta search engines]] reuse the indices of other services and do not store a local index whereas cache-based search engines permanently store the index along with the [[text corpus|corpus]]. Unlike full-text indices, partial-text services restrict the depth indexed to reduce index size. Larger services typically perform indexing at a predetermined time interval due to the required time and processing costs, while [[Intelligent agent|agent]]-based search engines index in [[Real time business intelligence|real time]].
Line 179:
Meta tag indexing plays an important role in organizing and categorizing web content. Specific documents often contain embedded meta information such as author, keywords, description, and language. For HTML pages, the [[meta tag]] contains keywords which are also included in the index. Earlier Internet [[search engine technology]] would only index the keywords in the meta tags for the forward index; the full document would not be parsed. At that time full-text indexing was not as well established, nor was [[computer hardware]] able to support such technology. The design of the HTML markup language initially included support for meta tags for the very purpose of being properly and easily indexed, without requiring tokenization.<ref>Berners-Lee, T., "Hypertext Markup Language - 2.0", RFC 1866, Network Working Group, November 1995.</ref>
 
As the Internet grew through the 1990s, many [[brick and mortar business|brick-and-mortar corporations]] went 'online' and established corporate websites. The keywords used to describe webpages (many of which were corporate-oriented webpages similar to product brochures) changed from descriptive to marketing-oriented keywords designed to drive sales by placing the webpage high in the search results for specific search queries. The fact that these keywords were subjectively specified was leading to [[spamdexing]], which drove many search engines to adopt full-text indexing technologies in the 1990s. Search engine designers and companies could only place so many 'marketing keywords' into the content of a webpage before draining it of all interesting and useful information. Given that conflict of interest with the business goal of designing user-oriented websites which were 'sticky', the [[customer lifetime value]] equation was changed to incorporate more useful content into the website in hopes of retaining the visitor. In this sense, full-text indexing was more objective and increased the quality of search engine results, as it was one more step away from subjective control of search engine result placement, which in turn furthered research of full-text indexing technologies.{{cn|date=August 2025}}
 
In [[desktop search]], many solutions incorporate meta tags to provide a way for authors to further customize how the search engine will index content from various files that is not evident from the file content. Desktop search is more under the control of the user, while Internet search engines must focus more on the full text index.{{cn|date=August 2025}
 
==See also==