Vasylchenko (talk | contribs) m fixed typo |
GreenC bot (talk | contribs) Rescued 1 archive link. Wayback Medic 2.5 per WP:URLREQ#freeuk.com |
Line 101:
Generating or maintaining a large-scale search engine index represents a significant storage and processing challenge. Many search engines use a form of [[data compression|compression]] to reduce the size of the indices on [[computer storage|disk]].<ref>H.S. Heaps. Storage analysis of a compression coding for a document database. INFOR, 10(1):47-61, February 1972.</ref> Consider the following scenario for a full-text Internet search engine.
* It takes 8 bits (or 1 [[byte]]) to store a single character. Some [[character encoding|encodings]] use 2 bytes per character.<ref>[https://www.unicode.org/faq/basic_q.html#15 The Unicode Standard - Frequently Asked Questions]. Verified Dec 2006.</ref><ref>[https://web.archive.org/web/20010209140313/http://www.uplink.freeuk.com/data.html Storage estimates]. Verified Dec 2006.</ref>
* The average number of characters in any given word on a page may be estimated at 5 ([[Wikipedia:Size comparisons]])
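
The two figures above combine into a rough per-word storage estimate: about 5 bytes per word at 1 byte per character, or about 10 bytes at 2 bytes per character. The following is a minimal sketch of that arithmetic, not a description of any particular search engine. The function name <code>estimate_text_bytes</code> and the 1,000-word page are illustrative assumptions, and the estimate ignores spaces, punctuation, and compression.

```python
def estimate_text_bytes(word_count, chars_per_word=5, bytes_per_char=1):
    """Rough uncompressed size, in bytes, of `word_count` words.

    Defaults follow the scenario: 5 characters per word and 1 byte per
    character. Separators and markup are ignored (a simplifying assumption).
    """
    return word_count * chars_per_word * bytes_per_char


# Hypothetical page of 1,000 words; the page length is an assumption.
words_on_page = 1000
single_byte = estimate_text_bytes(words_on_page)                    # 1 byte per character
double_byte = estimate_text_bytes(words_on_page, bytes_per_char=2)  # 2 bytes per character

print(single_byte, double_byte)
```

Doubling the bytes per character doubles the estimate, which is one reason the choice of encoding matters before any compression is applied.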