Apparition11 (talk | contribs) m Restored revision 1189431309 by HeyElliott (talk): Rv link to spreadsheet |
rmv non-WP:RS : content marketing blog |
Line 104:
* The average number of characters in any given word on a page may be estimated at 5 ([[Wikipedia:Size comparisons]])
Given this scenario, an uncompressed index (assuming a simple, non-[[conflation|conflated]] index) for 2 billion web pages would need to store 500 billion word entries. At 1 byte per character, or 5 bytes per word, the word entries alone would require 2,500 gigabytes of storage space.
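The storage estimate above can be checked with a short calculation. The following is a minimal sketch, not part of any search engine implementation: the constant and function names are hypothetical, a gigabyte is assumed to mean 10<sup>9</sup> bytes, and the figure of 250 words per page is not stated in the text but follows from dividing 500 billion word entries by 2 billion pages.

```python
# Rough size estimate for an uncompressed, non-conflated index.
# All names here are illustrative; the figures come from the scenario above.

BYTES_PER_CHAR = 1                 # 1 byte per character
CHARS_PER_WORD = 5                 # estimated average word length
PAGES = 2_000_000_000              # 2 billion web pages
WORD_ENTRIES = 500_000_000_000     # 500 billion word entries


def uncompressed_index_bytes(word_entries: int,
                             chars_per_word: int = CHARS_PER_WORD,
                             bytes_per_char: int = BYTES_PER_CHAR) -> int:
    """Return the bytes needed to store every word entry as raw text."""
    return word_entries * chars_per_word * bytes_per_char


total_bytes = uncompressed_index_bytes(WORD_ENTRIES)
total_gigabytes = total_bytes / 10**9      # assumes 1 GB = 10^9 bytes
words_per_page = WORD_ENTRIES // PAGES     # implied by the scenario, not stated

print(f"{total_gigabytes:.0f} GB for {words_per_page} words per page")
```

Changing <code>CHARS_PER_WORD</code> or <code>WORD_ENTRIES</code> shows how directly the storage requirement scales with the size of the corpus, which is why compression matters at this scale.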
Notably, large-scale search engine designs account for the cost of storage as well as the cost of electricity to power that storage. Thus compression is a cost-saving measure.{{cn}}
==Document parsing==