In a typical virtual memory implementation, paging happens on a [[least recently used]] basis, potentially causing the compression algorithm to use up CPU cycles dealing with the lowest-priority data. Furthermore, program code is usually read-only and is therefore never paged out. Instead, code is simply discarded and reloaded from the program's auxiliary storage file if needed. In this case, the bar for compression is higher, since the I/O cycle it is attempting to eliminate is much shorter, particularly on flash memory devices.
==Compression using quantization==
{{Copypaste|section|url=https://www.cs.virginia.edu/~ml2au/papers/IISWCFinalVersion.pdf|7 July 2024}}
{{Unrelated|1=Doesn't seem to be related to the subject of this article}}
Accelerator designers exploit quantization to reduce the bitwidth of values and thereby the cost of data movement. However, any value that does not fit in the reduced bitwidth overflows; such values are called outliers. Accelerators therefore use quantization only for applications that are tolerant of overflows. In most applications the rate of outliers is low and values often fall within a narrow range,<ref name="Quant"/> providing an opportunity to exploit quantization in general-purpose processors as well. However, a software implementation of quantization in general-purpose processors has three problems. First, the programmer has to implement the conversions manually, and the additional instructions that quantize and dequantize values impose both programming effort and performance overhead. Second, to cover outliers, the bitwidth of the quantized values often becomes greater than or equal to that of the original values. Third, the programmer is restricted to standard bitwidths; extracting values of non-standard bitwidth (i.e., 1–7, 9–15, and 17–31 bits) to represent narrow integers exacerbates the overhead of software-based quantization.

Hardware support for quantization in the memory hierarchy of a general-purpose processor can address these problems. It allows values to be represented with a small, flexible number of bits while storing outliers in their original format in a separate space, preventing any overflow. It minimizes metadata and the overhead of locating quantized values through a software-hardware interaction that transfers quantization parameters and data layout to the hardware. As a result, transparent hardware-based quantization has three advantages over cache compression techniques: (i) less metadata, (ii) a higher compression ratio for floating-point values and for cache blocks containing multiple data types, and (iii) lower overhead for locating compressed blocks.<ref name="Quant"/>
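The following C sketch illustrates the software-based approach described above. It is a minimal illustration, not a method from the cited paper: the 4-bit width, structure layout, and function names are assumptions chosen for clarity, and elements are stored one per byte rather than bit-packed. Values that fit in the reduced bitwidth are stored narrow, while outliers are kept in their original 32-bit format in a separate table so that no overflow occurs; the per-access lookup in <code>dequantize</code> is the kind of software overhead that hardware support aims to eliminate.

<syntaxhighlight lang="c">
#include <stdint.h>
#include <stdio.h>

#define QBITS 4                  /* reduced bitwidth (illustrative choice) */
#define QMAX  ((1 << QBITS) - 1) /* largest value representable in QBITS  */
#define BLOCK 64                 /* elements per block (illustrative)     */

/* A quantized block: narrow values plus a separate outlier table,
 * so values outside the narrow range never overflow. */
struct qblock {
    uint8_t  q[BLOCK];          /* quantized (narrow) values            */
    uint32_t outlier[BLOCK];    /* originals of values that exceed QMAX */
    uint8_t  is_outlier[BLOCK]; /* per-element flag selecting the table */
    int      n;                 /* number of valid elements             */
};

/* Quantize: store each value narrow if it fits in QBITS; otherwise
 * keep it in its original 32-bit format in the outlier table. */
void quantize(const uint32_t *v, int n, struct qblock *b)
{
    b->n = n;
    for (int i = 0; i < n; i++) {
        if (v[i] <= QMAX) {
            b->q[i] = (uint8_t)v[i];
            b->is_outlier[i] = 0;
        } else {
            b->outlier[i] = v[i];
            b->is_outlier[i] = 1;
        }
    }
}

/* Dequantize one element: the extra branch and table lookup are the
 * per-access software overhead discussed in the text. */
uint32_t dequantize(const struct qblock *b, int i)
{
    return b->is_outlier[i] ? b->outlier[i] : (uint32_t)b->q[i];
}

int main(void)
{
    uint32_t v[5] = {3, 7, 1, 900, 12};  /* 900 does not fit in 4 bits */
    struct qblock b;
    quantize(v, 5, &b);
    for (int i = 0; i < 5; i++)
        printf("%u ", dequantize(&b, i));
    printf("\n");                        /* prints: 3 7 1 900 12 */
    return 0;
}
</syntaxhighlight>

In this sketch the programmer must write the conversion code, pay the cost of the extra instructions on every access, and round the narrow width up to a whole byte, which is exactly the combination of effort, performance, and bitwidth problems that motivates hardware support.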