rmv non-WP:RS : content marketing blog |
m Dating maintenance tags: {{Cn}} |
Line 104:
* The average number of characters in any given word on a page may be estimated at 5 ([[Wikipedia:Size comparisons]])
Given this scenario, an uncompressed index (assuming a simple, non-[[conflation|conflated]] index) for 2 billion web pages would need to store 500 billion word entries. At 1 byte per character, or 5 bytes per word, this would require 2500 gigabytes of storage space for the word entries alone.{{cn|date=December 2023}} This space requirement may be even larger for a fault-tolerant distributed storage architecture. Depending on the compression technique chosen, the index can be reduced to a fraction of this size. The tradeoff is the time and processing power required to perform compression and decompression.{{cn|date=December 2023}}
Notably, large-scale search engine designs incorporate the cost of storage as well as the cost of electricity to power the storage. Thus compression is a cost-saving measure.{{cn|date=December 2023}}
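The storage estimate above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not a model of any real search engine: the function name and constants are illustrative, and the figure of 250 words per page is an assumption inferred from the stated totals (500 billion entries across 2 billion pages). Gigabytes are taken as decimal (10<sup>9</sup> bytes).

```python
def uncompressed_index_size(pages, words_per_page, chars_per_word, bytes_per_char=1):
    """Return (word_entries, size_in_bytes) for a simple, uncompressed index."""
    word_entries = pages * words_per_page
    size_in_bytes = word_entries * chars_per_word * bytes_per_char
    return word_entries, size_in_bytes


PAGES = 2_000_000_000   # 2 billion web pages, as in the scenario
WORDS_PER_PAGE = 250    # assumption: inferred from 500 billion / 2 billion
CHARS_PER_WORD = 5      # estimated average word length

entries, size_bytes = uncompressed_index_size(PAGES, WORDS_PER_PAGE, CHARS_PER_WORD)
size_gigabytes = size_bytes // 10**9  # decimal gigabytes

print(f"{entries:,} word entries")
print(f"{size_gigabytes:,} GB uncompressed")
```

Changing any one input scales the result linearly, which is why even a modest compression ratio translates into a large absolute saving at this scale.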
==Document parsing==