{{Short description|Process of mapping a continuous set to a countable set}}
{{Use American English|date=April 2019}}
[[Image:Quantized.signal.png|right|thumb|Quantized signal]]
[[Image:Digital.signal.png|right|thumb|Digital signal]]
'''Quantization''', in mathematics and [[digital signal processing]], is the process of mapping input values from a large set (often a continuous set) to output values in a (countable) smaller set, often with a finite [[number of elements]]. [[Rounding]] and [[truncation]] are typical examples of quantization processes. Quantization is involved to some degree in nearly all digital signal processing, as the process of representing a signal in digital form ordinarily involves rounding. Quantization also forms the core of essentially all [[lossy compression]] algorithms.

The difference between an input value and its quantized value (such as [[round-off error]]) is referred to as '''quantization error''', '''noise''' or '''distortion'''. A device or [[algorithm function|algorithmic function]] that performs quantization is called a '''quantizer'''. An [[analog-to-digital converter]] is an example of a quantizer.
 
[[File:Quantization error.png|thumb|upright=2|The simplest way to quantize a signal is to choose the digital amplitude value closest to the original analog amplitude. This example shows the original analog signal (green), the quantized signal (black dots), the [[Signal reconstruction|signal reconstructed]] from the quantized signal (yellow) and the difference between the original signal and the reconstructed signal (red). The difference between the original signal and the reconstructed signal is the quantization error and, in this simple quantization scheme, is a deterministic function of the input signal.]]
A common use of quantization is in the conversion of a [[discrete signal]] (a [[sample (signal)|sampled]] [[continuous signal]]) into a [[digital signal]]. Both steps (sampling and quantizing) are performed in [[analog-to-digital converter]]s, with the quantization level specified in [[bit]]s. A specific example is [[compact disc]] (CD) audio, which is sampled at 44,100&nbsp;[[Hz]] and quantized with 16 bits (2 [[byte]]s), so that each sample takes one of 65,536 (<math>2^{16}</math>) possible values. More generally, a [[signal (information theory)|signal]] can be multi-dimensional, and quantization need not be applied to all dimensions. A discrete-time signal also need not be quantized; sampling and quantization are distinct operations (see ''[[ideal sampler]]'').
 
== Mathematical description ==
The simplest and best-known form of quantization is referred to as [[scalar]] quantization, since it operates on scalar (as opposed to multi-dimensional [[vector]]) input data. In general, a scalar quantization operator can be represented as

:<math>Q(x) = g(\lfloor f(x) \rfloor)</math>

where
* <math>x</math> is a real number,
* <math>\lfloor x \rfloor</math> is the [[floor function]], yielding the integer <math>i = \lfloor f(x) \rfloor</math>,
* <math>f(x)</math> and <math>g(i)</math> are arbitrary real-valued functions.

The integer value <math>i</math> is the representation that is typically stored or transmitted, and the final interpretation is constructed using <math>g(i)</math> when the data is later interpreted. The integer value <math>i</math> is sometimes referred to as the ''quantization index''.

For example, if <math>x</math> is a real-valued number between &minus;1 and 1, a ''mid-rise'' uniform quantization operator that uses <math>M</math> bits of precision to represent each quantization index can be expressed as

:<math>Q(x) = \frac{\left\lfloor 2^{M-1}x \right\rfloor+0.5}{2^{M-1}}</math>.

In this case the <math>f(x)</math> and <math>g(i)</math> operators are simply mutually inverse scaling operations (one multiplier being the inverse of the other), with an offset in the ''g''(''i'') function that places each representation value in the middle of the input region for its quantization index; the value <math>2^{-(M-1)}</math> is the ''quantization step size''. This mid-rise form, and the closely related ''mid-tread'' variant in which the offset of 0.5 is added within the floor function instead of outside of it, are discussed further [[#Mid-riser and mid-tread uniform quantizers|below]]. (Omitting the offset of 0.5 altogether reduces the signal-to-noise ratio by approximately 6.02&nbsp;dB, but is sometimes done for simplicity when the step size is small.)
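A minimal Python sketch of this general form, instantiated with the mid-rise example above (illustrative only; the function names are not from any standard library):

<syntaxhighlight lang="python">
import math

def quantize(x, f, g):
    """General scalar quantizer Q(x) = g(floor(f(x)))."""
    i = math.floor(f(x))  # integer quantization index
    return i, g(i)        # index and its reconstruction value

# The mid-rise example above, with M bits of precision on [-1, 1):
M = 3
def f(x): return 2 ** (M - 1) * x            # scaling into index space
def g(i): return (i + 0.5) / 2 ** (M - 1)    # inverse scaling, mid-region offset
print(quantize(0.6, f, g))                   # (2, 0.625)
</syntaxhighlight>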
 
==Example==
For example, [[Rounding#Round half up|rounding]] a [[real number]] <math>x</math> to the nearest integer value forms a very basic type of quantizer – a ''uniform'' one. A typical (''mid-tread'') uniform quantizer with a quantization ''step size'' equal to some value <math>\Delta</math> can be expressed as
 
:<math>Q(x) = \Delta \cdot \left\lfloor \frac{x}{\Delta} + \frac{1}{2} \right\rfloor</math>,
 
where the notation <math> \lfloor \ \rfloor </math> denotes the [[floor function]].
 
Alternatively, the same quantizer may be expressed in terms of the [[ceiling function]], as
:<math>Q(x) = \Delta \cdot \left\lceil \frac{x}{\Delta} - \frac{1}{2} \right\rceil</math>.
 
(The notation <math> \lceil \ \rceil </math> denotes the ceiling function).
 
The essential property of a quantizer is having a countable set of possible output values smaller than the set of possible input values. The members of the set of output values may have integer, rational, or real values. For simple rounding to the nearest integer, the step size <math>\Delta</math> is equal to 1. With <math>\Delta = 1</math> or with <math>\Delta</math> equal to any other integer value, this quantizer has real-valued inputs and integer-valued outputs.
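As a concrete illustration, the mid-tread quantizer above can be sketched in Python (the two closed forms agree except possibly in tie-breaking exactly halfway between levels):

<syntaxhighlight lang="python">
import math

def mid_tread_floor(x, step):
    """Mid-tread uniform quantizer, floor form: step*floor(x/step + 1/2)."""
    return step * math.floor(x / step + 0.5)

def mid_tread_ceil(x, step):
    """The same quantizer in ceiling form: step*ceil(x/step - 1/2)."""
    return step * math.ceil(x / step - 0.5)

for x in (-1.3, -0.05, 0.0, 0.26, 2.6):
    print(x, mid_tread_floor(x, 0.5), mid_tread_ceil(x, 0.5))
</syntaxhighlight>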
 
When the quantization step size (Δ) is small relative to the variation in the signal being quantized, it is relatively simple to show that the [[mean squared error]] produced by such a rounding operation will be approximately <math>\Delta^2/ 12</math>.<ref name=Sheppard>{{cite journal | last=Sheppard | first=W. F. |author-link=William Fleetwood Sheppard| title=On the Calculation of the most Probable Values of Frequency-Constants, for Data arranged according to Equidistant Division of a Scale | journal=Proceedings of the London Mathematical Society | publisher=Wiley | volume=s1-29 | issue=1 | year=1897 | issn=0024-6115 | doi=10.1112/plms/s1-29.1.353 | pages=353–380| url=https://zenodo.org/record/1447738 }}</ref><ref name=Bennett>W. R. Bennett, "[http://www.alcatel-lucent.com/bstj/vol27-1948/articles/bstj27-3-446.pdf Spectra of Quantized Signals]", ''[[Bell System Technical Journal]]'', Vol. 27, pp. 446–472, July 1948.</ref><ref name=OliverPierceShannon>{{cite journal | last1=Oliver | first1=B.M. | last2=Pierce | first2=J.R. | last3=Shannon | first3=C.E. |author-link3=Claude Shannon| title=The Philosophy of PCM | journal=Proceedings of the IRE | volume=36 | issue=11 | year=1948 | issn=0096-8390 | doi=10.1109/jrproc.1948.231941 | pages=1324–1331| s2cid=51663786 }}</ref><ref name=Stein>Seymour Stein and J. Jay Jones, ''[https://books.google.com/books/about/Modern_communication_principles.html?id=jBc3AQAAIAAJ Modern Communication Principles]'', [[McGraw–Hill]], {{ISBN|978-0-07-061003-3}}, 1967 (p. 196).</ref><ref name=GishPierce>{{cite journal | last1=Gish | first1=H. | last2=Pierce | first2=J. | title=Asymptotically efficient quantizing | journal=IEEE Transactions on Information Theory | volume=14 | issue=5 | year=1968 | issn=0018-9448 | doi=10.1109/tit.1968.1054193 | pages=676–683}}</ref><ref name=GrayNeuhoff>{{cite journal | last1=Gray | first1=R.M. |author-link=Robert M. Gray| last2=Neuhoff | first2=D.L. | title=Quantization | journal=IEEE Transactions on Information Theory | volume=44 | issue=6 | year=1998 | issn=0018-9448 | doi=10.1109/18.720541 | pages=2325–2383| s2cid=212653679 }}</ref> Mean squared error is also called the quantization ''noise power''. Adding one bit to the quantizer halves the value of Δ, which reduces the noise power by the factor {{sfrac|1|4}}. In terms of [[decibel]]s, the noise power change is <math>\scriptstyle 10\cdot \log_{10}(1/4)\ \approx\ -6\ \mathrm{dB}.</math>
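This <math>\Delta^2/12</math> approximation is easy to check numerically; the following Monte Carlo sketch (illustrative only) estimates the mean squared rounding error:

<syntaxhighlight lang="python">
import random

delta = 0.25
n = 1_000_000
total = 0.0
for _ in range(n):
    x = random.uniform(-10.0, 10.0)   # input varies over many quantization steps
    q = delta * round(x / delta)      # round to the nearest multiple of delta
    total += (x - q) ** 2
print(total / n, delta ** 2 / 12)     # both approximately 0.0052
</syntaxhighlight>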
 
Because the set of possible output values of a quantizer is countable, any quantizer can be decomposed into two distinct stages, which can be referred to as the ''classification'' stage (or ''forward quantization'' stage) and the ''reconstruction'' stage (or ''inverse quantization'' stage), where the classification stage maps the input value to an integer ''quantization index'' <math>k</math> and the reconstruction stage maps the index <math>k</math> to the ''reconstruction value'' <math>y_k</math> that is the output approximation of the input value. For the example uniform quantizer described above, the forward quantization stage can be expressed as
:<math>k = \left\lfloor \frac{x}{\Delta} + \frac{1}{2}\right\rfloor</math>,
and the reconstruction stage for this example quantizer is simply
:<math>y_k = k \cdot \Delta</math>.
 
This decomposition is useful for the design and analysis of quantization behavior, and it illustrates how the quantized data can be communicated over a [[communication channel]] – a ''source encoder'' can perform the forward quantization stage and send the index information through a communication channel, and a ''decoder'' can perform the reconstruction stage to produce the output approximation of the original input data. In general, the forward quantization stage may use any function that maps the input data to the integer space of the quantization index data, and the inverse quantization stage can conceptually (or literally) be a table look-up operation to map each quantization index to a corresponding reconstruction value. This two-stage decomposition applies equally well to [[vector quantization|vector]] as well as scalar quantizers.
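For the uniform example above, the two stages might be sketched as follows (illustrative names, not a standard API):

<syntaxhighlight lang="python">
import math

def forward_quantize(x, step):
    """Classification stage: map the input x to an integer index k."""
    return math.floor(x / step + 0.5)

def reconstruct(k, step):
    """Reconstruction stage: map the index k to its reconstruction value."""
    return k * step

step = 0.1
x = 0.7347
k = forward_quantize(x, step)   # only this index need be transmitted
y = reconstruct(k, step)        # decoder output approximating x
print(k, y)                     # 7 0.7000000000000001
</syntaxhighlight>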
 
==Mathematical properties==
Because quantization is a many-to-few mapping, it is an inherently [[non-linear]] and irreversible process (i.e., because the same output value is shared by multiple input values, it is impossible, in general, to recover the exact input value when given only the output value).
 
The set of possible input values may be infinitely large, and may possibly be continuous and therefore [[uncountable]] (such as the set of all real numbers, or all real numbers within some limited range). The set of possible output values may be [[finite set|finite]] or [[countably infinite]].<ref name=GrayNeuhoff/> The input and output sets involved in quantization can be defined in a rather general way. For example, vector quantization is the application of quantization to multi-dimensional (vector-valued) input data.<ref>{{cite book |author1=Allen Gersho |author-link=Allen Gersho |author2=Robert M. Gray |author-link2=Robert M. Gray |url=https://books.google.com/books?id=DwcDm6xgItUC |title=Vector Quantization and Signal Compression |publisher=[[Springer Science+Business Media|Springer]] |isbn=978-0-7923-9181-4 |date=1991}}</ref>
 
==Types==
[[File:2-bit resolution analog comparison.png|thumbnail|2-bit resolution with four levels of quantization compared to analog<ref>Hodgson, Jay (2010). ''Understanding Records'', p.56. {{ISBN|978-1-4411-5607-5}}. Adapted from Franz, David (2004). ''Recording and Producing in the Home Studio'', p.38-9. Berklee Press.</ref>]]
[[File:3-bit resolution analog comparison.png|thumbnail|3-bit resolution with eight levels]]
 
===Analog-to-digital converter===
An [[analog-to-digital converter]] (ADC) can be modeled as two processes: [[Sampling (signal processing)|sampling]] and quantization. Sampling converts a time-varying voltage signal into a [[discrete-time signal]], a sequence of real numbers. Quantization replaces each real number with an approximation from a finite set of discrete values. Most commonly, these discrete values are represented as fixed-point words. Though any number of quantization levels is possible, common word lengths are [[audio bit depth|8-bit]] (256 levels), 16-bit (65,536 levels) and 24-bit (16.8&nbsp;million levels). Quantizing a sequence of numbers produces a sequence of quantization errors which is sometimes modeled as an additive random signal called '''quantization noise''' because of its [[stochastic]] behavior. The more levels a quantizer uses, the lower is its quantization noise power.
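The quantization stage of an idealized ADC might be modeled as in the following sketch (a hypothetical mid-riser design with clipping; parameter names are illustrative):

<syntaxhighlight lang="python">
import math, random

def adc_quantize(x, bits, vmax=1.0):
    """Idealized ADC: map x to one of 2**bits levels spanning [-vmax, vmax)."""
    levels = 2 ** bits
    step = 2 * vmax / levels
    k = math.floor((x + vmax) / step)   # classification index
    k = max(0, min(levels - 1, k))      # clip indices outside the supported range
    return (k + 0.5) * step - vmax      # mid-riser reconstruction value

# More bits give lower quantization noise power (roughly step**2 / 12):
for bits in (8, 12, 16):
    err = sum((x - adc_quantize(x, bits)) ** 2
              for x in (random.uniform(-1, 1) for _ in range(100_000)))
    print(bits, "bits:", err / 100_000)
</syntaxhighlight>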
 
===Rate–distortion optimization===
''[[Rate–distortion theory|Rate–distortion optimized]]'' quantization is encountered in [[source coding]] for lossy data compression algorithms, where the purpose is to manage distortion within the limits of the [[bit rate]] supported by a communication channel or storage medium. The analysis of quantization in this context involves studying the amount of data (typically measured in digits or bits or bit ''rate'') that is used to represent the output of the quantizer and studying the loss of precision that is introduced by the quantization process (which is referred to as the ''distortion'').
 
===Mid-riser and mid-tread uniform quantizers===
Most uniform quantizers for signed input data can be classified as being of one of two types: ''mid-riser'' and ''mid-tread''. The terminology is based on what happens in the region around the value 0, and uses the analogy of viewing the input-output function of the quantizer as a [[stairway]]. Mid-tread quantizers have a zero-valued reconstruction level (corresponding to a ''tread'' of a stairway), while mid-riser quantizers have a zero-valued classification threshold (corresponding to a ''[[Stair riser|riser]]'' of a stairway).<ref name=Gersho77>{{cite journal | last=Gersho | first=A. |author-link=Allen Gersho | title=Quantization | journal=IEEE Communications Society Magazine | volume=15 | issue=5 | year=1977 | issn=0148-9615 | doi=10.1109/mcom.1977.1089500 | pages=16–28| s2cid=260498692 }}</ref>
 
Mid-tread quantization involves rounding; the formula for a mid-tread uniform quantizer, as given in the previous section, is

:<math>Q(x) = \Delta \cdot \left\lfloor \frac{x}{\Delta} + \frac{1}{2} \right\rfloor</math>.
 
Mid-riser quantization involves truncation. The input-output formula for a mid-riser uniform quantizer is given by:
:<math>Q(x) = \Delta\cdot\left(\left\lfloor \frac{x}{\Delta}\right\rfloor + \frac1{2}\right)</math>,
where the classification rule is given by
:<math>k = \left\lfloor \frac{x}{\Delta} \right\rfloor</math>
and the reconstruction rule is
:<math>y_k = \Delta\cdot\left(k+\tfrac1{2}\right)</math>.
 
Note that mid-riser uniform quantizers do not have a zero output value – their minimum output magnitude is half the step size. In contrast, mid-tread quantizers do have a zero output level. For some applications, having a zero output signal representation may be a necessity.
 
In general, a mid-riser or mid-tread quantizer may not actually be a ''uniform'' quantizer – i.e., the sizes of the quantizer's classification [[interval (mathematics)|intervals]] may not all be the same, or the spacing between its possible output values may not all be the same. The distinguishing characteristic of a mid-riser quantizer is that it has a classification threshold value that is exactly zero, and the distinguishing characteristic of a mid-tread quantizer is that it has a reconstruction value that is exactly zero.<ref name=Gersho77/>
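A short sketch makes the behavior near zero visible (uniform versions of both types):

<syntaxhighlight lang="python">
import math

def mid_tread(x, step):
    """Mid-tread: has a reconstruction level exactly at zero."""
    return step * math.floor(x / step + 0.5)

def mid_riser(x, step):
    """Mid-riser: has a classification threshold exactly at zero,
    so its smallest output magnitude is step/2."""
    return step * (math.floor(x / step) + 0.5)

for x in (-0.6, -0.01, 0.0, 0.01, 0.6):
    print(x, mid_tread(x, 1.0), mid_riser(x, 1.0))
# mid_tread(0.01) == 0.0, but mid_riser(0.01) == 0.5: no zero output level
</syntaxhighlight>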
 
===Dead-zone quantizers===
A '''dead-zone quantizer''' is a type of mid-tread quantizer with symmetric behavior around 0. The region around the zero output value of such a quantizer is referred to as the ''dead zone'' or ''[[deadband]]''. The dead zone can sometimes serve the same purpose as a [[noise gate]] or [[squelch]] function. Especially for compression applications, the dead-zone may be given a different width than that for the other steps. For an otherwise-uniform quantizer, the dead-zone width can be set to any value <math>w</math> by using the forward quantization rule<ref>{{cite book| first1=Majid |last1=Rabbani |first2=Rajan L. |last2=Joshi |first3=Paul W. |last3=Jones |editor1-first=Peter |editor1-last=Schelkens |editor2-first=Athanassios |editor2-last=Skodras |editor3-first=Touradj |editor3-last=Ebrahimi |title=The JPEG 2000 Suite | url=https://archive.org/details/jpegsuitethewile00sche | url-access=limited |publisher=[[John Wiley & Sons]] |date=2009 |isbn=978-0-470-72147-6 |chapter=Section 1.2.3: Quantization, in Chapter 1: JPEG 2000 Core Coding System (Part 1) |pages=[https://archive.org/details/jpegsuitethewile00sche/page/n73 22]–24}}</ref><ref>{{cite book| first1=David S. |last1=Taubman |first2=Michael W. |last2=Marcellin |title=JPEG2000: Image Compression Fundamentals, Standards and Practice | url=https://archive.org/details/jpegimagecompres00taub | url-access=limited |publisher=[[Kluwer Academic Publishers]] |date=2002 |isbn=0-7923-7519-X |chapter=Chapter 3: Quantization |page=[https://archive.org/details/jpegimagecompres00taub/page/n126 107]}}</ref><ref name=SullivanIT/>
:<math>k = \sgn(x) \cdot \max\left(0, \left\lfloor \frac{\left| x \right|-w/2}{\Delta}+1\right\rfloor\right)</math>,
where the function {{no break|<math>\sgn</math>(&nbsp;)}} is the [[sign function]] (also known as the ''signum'' function). The general reconstruction rule for such a dead-zone quantizer is given by
:<math>y_k = \sgn(k) \cdot\left(\frac{w}{2}+\Delta\cdot (|k|-1+r_k)\right)</math>,
where <math>r_k</math> is a reconstruction offset value in the range of 0 to 1 as a fraction of the step size. Ordinarily, <math>0 \le r_k \le \tfrac1{2}</math> when quantizing input data with a typical [[probability density function]] (PDF) that is symmetric around zero and reaches its peak value at zero (such as a [[Gaussian distribution|Gaussian]], [[Laplacian distribution|Laplacian]], or [[generalized Gaussian distribution|generalized Gaussian]] PDF). Although <math>r_k</math> may depend on <math>k</math> in general and can be chosen to fulfill the optimality condition described below, it is often simply set to a constant, such as <math>\tfrac1{2}</math>. (Note that in this definition, <math>y_0 = 0</math> due to the definition of the {{no break|<math>\sgn</math>(&nbsp;)}} function, so <math>r_0</math> has no effect.)
 
A very commonly used special case (e.g., the scheme typically used in financial accounting and elementary mathematics) is to set <math>w=\Delta</math> and <math>r_k=\tfrac1{2}</math> for all <math>k</math>. In this case, the dead-zone quantizer is also a uniform quantizer, since the central dead-zone of this quantizer has the same width as all of its other steps, and all of its reconstruction values are equally spaced as well.
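The forward and reconstruction rules above translate directly into code; the following sketch fixes <math>r_k = \tfrac1{2}</math> (illustrative only):

<syntaxhighlight lang="python">
import math

def dead_zone_index(x, step, w):
    """Forward rule: k = sgn(x) * max(0, floor((|x| - w/2)/step + 1))."""
    sgn = (x > 0) - (x < 0)
    return sgn * max(0, math.floor((abs(x) - w / 2) / step + 1))

def dead_zone_reconstruct(k, step, w, r=0.5):
    """Reconstruction rule: y_k = sgn(k) * (w/2 + step*(|k| - 1 + r))."""
    sgn = (k > 0) - (k < 0)
    return sgn * (w / 2 + step * (abs(k) - 1 + r))

step, w = 1.0, 2.0   # a dead zone twice as wide as the other steps
for x in (-2.7, -0.8, 0.3, 1.4, 3.9):
    k = dead_zone_index(x, step, w)
    print(x, k, dead_zone_reconstruct(k, step, w))  # 0.3 and -0.8 map to 0
</syntaxhighlight>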
 
==Noise and error characteristics{{anchor|Noise|Error}}==
 
===Additive noise model===
A common assumption for the analysis of quantization error is that it affects a signal processing system in a similar manner to that of additive [[white noise]] – having negligible correlation with the signal and an approximately flat [[power spectral density]].<ref name=Bennett/><ref name=GrayNeuhoff/><ref name=Widrow1>{{cite journal | last=Widrow | first=B. |author-link=Bernard Widrow| title=A Study of Rough Amplitude Quantization by Means of Nyquist Sampling Theory | journal=IRE Transactions on Circuit Theory | volume=3 | issue=4 | year=1956 | issn=0096-2007 | doi=10.1109/tct.1956.1086334 | pages=266–276| hdl=1721.1/12139 | s2cid=16777461 | hdl-access=free }}</ref><ref name=Widrow2>{{cite journal |last1=Widrow |first1=B. |author-link=Bernard Widrow |title=Statistical analysis of amplitude-quantized sampled-data systems |journal=Transactions of the American Institute of Electrical Engineers, Part II: Applications and Industry |date=1961 |volume=79 |issue=6 |pages=555–568 |url=http://www-isl.stanford.edu/~widrow/papers/j1961statisticalanalysis.pdf |doi=10.1109/TAI.1961.6371702 |archive-date=2011-04-01 |access-date=2012-08-17 |archive-url=https://web.archive.org/web/20110401025420/http://www-isl.stanford.edu/~widrow/papers/j1961statisticalanalysis.pdf |url-status=dead }}</ref> The additive noise model is commonly used for the analysis of quantization error effects in digital filtering systems, and it can be very useful in such analysis. It has been shown to be a valid model in cases of high-resolution quantization (small <math>\Delta</math> relative to the signal strength) with smooth PDFs.<ref name=Bennett/><ref name=MarcoNeuhoff>{{cite journal | last1=Marco | first1=D. | last2=Neuhoff | first2=D.L. | title=The Validity of the Additive Noise Model for Uniform Scalar Quantizers | journal=IEEE Transactions on Information Theory | volume=51 | issue=5 | year=2005 | issn=0018-9448 | doi=10.1109/tit.2005.846397 | pages=1739–1755| s2cid=14819261 }}</ref>
 
Additive noise behavior is not always a valid assumption. Quantization error (for quantizers defined as described here) is deterministically related to the signal and not entirely independent of it. Thus, periodic signals can create periodic quantization noise. And in some cases, it can even cause [[limit cycle]]s to appear in digital signal processing systems. One way to ensure effective independence of the quantization error from the source signal is to perform ''[[dither]]ed quantization'' (sometimes with ''[[noise shaping]]''), which involves adding random (or [[pseudo-random]]) noise to the signal prior to quantization.<ref name=GrayNeuhoff/><ref name=Widrow2/>
 
===Quantization error models===
In the typical case, the original signal is much larger than one [[least significant bit]] (LSB). When this is the case, the quantization error is not significantly correlated with the signal and has an approximately [[uniform distribution (continuous)|uniform distribution]]. When rounding is used to quantize, the quantization error has a [[mean]] of zero and the [[root mean square]] (RMS) value is the [[standard deviation]] of this distribution, given by <math>\scriptstyle {\frac{1}{\sqrt{12}}}\mathrm{LSB}\ \approx\ 0.289\,\mathrm{LSB}</math>. When truncation is used, the error has a non-zero mean of <math>\scriptstyle {\frac{1}{2}}\mathrm{LSB}</math> and the RMS value is <math>\scriptstyle {\frac{1}{\sqrt{3}}}\mathrm{LSB}</math>. Although rounding yields less RMS error than truncation, the difference is only due to the static (DC) term of <math>\scriptstyle {\frac{1}{2}}\mathrm{LSB}</math>. The RMS values of the AC error are exactly the same in both cases, so there is no special advantage of rounding over truncation in situations where the DC term of the error can be ignored (such as in AC-coupled systems). In either case, the standard deviation, as a percentage of the full signal range, changes by a factor of 2 for each 1-bit change in the number of quantization bits. The potential signal-to-quantization-noise power ratio therefore changes by 4, or <math>\scriptstyle 10\cdot \log_{10}(4)</math>, approximately 6&nbsp;dB per bit.
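These statistics can be reproduced with a quick numerical sketch (errors measured in units of one LSB; illustrative only):

<syntaxhighlight lang="python">
import random

n = 200_000
r_errs, t_errs = [], []
for _ in range(n):
    x = random.uniform(0.0, 1000.0)   # signal much larger than 1 LSB
    r_errs.append(x - round(x))       # rounding error
    t_errs.append(x - int(x))         # truncation error (x > 0 here)

def rms(errs):
    return (sum(e * e for e in errs) / n) ** 0.5

print(sum(r_errs) / n, rms(r_errs))   # mean ~ 0,   RMS ~ 0.289 LSB
print(sum(t_errs) / n, rms(t_errs))   # mean ~ 0.5, RMS ~ 0.577 LSB
</syntaxhighlight>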
 
At lower amplitudes the quantization error becomes dependent on the input signal, resulting in distortion. This distortion is created after the anti-aliasing filter; if the distortion components lie above half the sample rate, they alias back into the band of interest. To make the quantization error independent of the input signal, the signal can be dithered by adding noise to it before quantization. This slightly reduces the signal-to-noise ratio, but can completely eliminate the distortion.
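The effect of dither can be seen in a toy sketch: without it, a small constant input always maps to the same level, while with dither the average output tracks the input (non-subtractive dither, uniform over one step; illustrative only):

<syntaxhighlight lang="python">
import math, random

def quantize(x, step):
    return step * math.floor(x / step + 0.5)

step, x = 0.5, 0.1
plain = quantize(x, step)   # always 0.0 - a deterministic distortion
dithered = [quantize(x + random.uniform(-step / 2, step / 2), step)
            for _ in range(10_000)]
print(plain, sum(dithered) / len(dithered))   # 0.0 versus roughly 0.1
</syntaxhighlight>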
 
===Quantization noise model===
[[File:Frequency spectrum of a sinusoid and its quantization noise floor.gif|thumb|300px|Comparison of quantizing a sinusoid to 64 levels (6 bits) and 256 levels (8 bits). The additive noise created by 6-bit quantization is 12 dB greater than the noise created by 8-bit quantization. When the spectral distribution is flat, as in this example, the 12 dB difference manifests as a measurable difference in the noise floors.]]
 
Quantization noise is a [[Model (abstract)|model]] of quantization error introduced by quantization in the ADC. It is a rounding error between the analog input voltage to the ADC and the output digitized value. The noise is non-linear and signal-dependent. It can be modeled in several different ways.
 
In an ideal ADC, where the quantization error is uniformly distributed between −1/2 LSB and +1/2 LSB, and the signal has a uniform distribution covering all quantization levels, the [[signal-to-quantization-noise ratio]] (SQNR) can be calculated from
 
:<math>\mathrm{SQNR} = 20 \log_{10}(2^Q) \approx 6.02 \cdot Q\ \mathrm{dB} \,\!</math>
 
where Q is the number of quantization bits.
 
The most common test signals that fulfill this are full-amplitude [[triangle wave]]s and [[sawtooth wave]]s.
 
For example, a [[16-bit]] ADC has a maximum signal-to-quantization-noise ratio of 6.02 × 16 = 96.3&nbsp;dB.
 
When the input signal is a full-amplitude [[sine wave]] the distribution of the signal is no longer uniform, and the corresponding equation is instead
 
:<math> \mathrm{SQNR} \approx 1.761 + 6.02 \cdot Q \ \mathrm{dB} \,\!</math>
 
Here, the quantization noise is once again ''assumed'' to be uniformly distributed. When the input signal has a high amplitude and a wide frequency spectrum this is the case.<ref>{{cite book
| last = Pohlmann
| first =Ken C.
| title = Principles of Digital Audio 2nd Edition
| publisher = SAMS
| date = 1989
| page = 60
| isbn =9780071441568
| url = https://books.google.com/books?id=VZw6z9a03ikC&pg=PA37}}</ref> In this case a 16-bit ADC has a maximum signal-to-noise ratio of 98.09&nbsp;dB. The 1.761 difference in signal-to-noise only occurs due to the signal being a full-scale sine wave instead of a triangle or sawtooth.
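The sine-wave case can be checked empirically with a short sketch that quantizes a full-scale sine with a mid-riser quantizer and compares the measured ratio with 6.02·Q + 1.761 (illustrative only):

<syntaxhighlight lang="python">
import math

def sqnr_sine(bits, n=100_000):
    step = 2.0 / 2 ** bits                        # full scale is [-1, 1)
    sig = err = 0.0
    for i in range(n):
        x = math.sin(2 * math.pi * 12.3 * i / n)  # non-integer number of cycles
        q = step * (math.floor(x / step) + 0.5)   # mid-riser quantization
        sig += x * x
        err += (x - q) ** 2
    return 10 * math.log10(sig / err)

for bits in (8, 16):
    print(bits, sqnr_sine(bits), 6.02 * bits + 1.761)
</syntaxhighlight>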
 
For complex signals in high-resolution ADCs this is an accurate model. For low-resolution ADCs, low-level signals in high-resolution ADCs, and for simple waveforms the quantization noise is not uniformly distributed, making this model inaccurate.<ref>{{cite book
| last = Watkinson
| first = John
| title = The Art of Digital Audio 3rd Edition
| publisher = [[Focal Press]]
| date = 2001
| isbn = 0-240-51587-0}}</ref> In these cases the quantization noise distribution is strongly affected by the exact amplitude of the signal.
 
The calculations above are relative to full-scale input. For smaller signals, the relative quantization distortion can be very large. To circumvent this issue, analog [[companding]] can be used, but this can introduce distortion. In digital [[telephone|telephony]], two popular companding schemes are '[[A-law algorithm|A-law]]' (dominant in [[Europe]]) and '[[Mu-law algorithm|&mu;-law]]' (dominant in [[North America]] and [[Japan]]). These schemes map discrete analog values to an 8-bit scale that is nearly linear for small values and then increases logarithmically as amplitude grows. Because the human ear's perception of [[loudness]] is roughly logarithmic, this provides a higher signal-to-noise ratio over the range of audible sound intensities for a given number of bits.
 
==Design==
===Granular distortion and overload distortion===
Often the design of a quantizer involves supporting only a limited range of possible output values and performing clipping to limit the output to this range whenever the input exceeds the supported range. The error introduced by this clipping is referred to as ''overload'' distortion. Within the extreme limits of the supported range, the amount of spacing between the selectable output values of a quantizer is referred to as its ''granularity'', and the error introduced by this spacing is referred to as ''granular'' distortion. It is common for the design of a quantizer to involve determining the proper balance between granular distortion and overload distortion. For a given supported number of possible output values, reducing the average granular distortion may involve increasing the average overload distortion, and vice versa. A technique for controlling the amplitude of the signal (or, equivalently, the quantization step size <math>\Delta</math>) to achieve the appropriate balance is the use of ''[[automatic gain control]]'' (AGC). However, in some quantizer designs, the concepts of granular error and overload error may not apply (e.g., for a quantizer with a limited range of input data or with a countably infinite set of selectable output values).<ref name=GrayNeuhoff/>
 
===Rate–distortion quantizer design===
A scalar quantizer, which performs a quantization operation, can ordinarily be decomposed into two stages:
;Classification
:A process that classifies the input signal range into <math>M</math> non-overlapping ''[[interval (mathematics)|intervals]]'' <math>\{I_k\}_{k=1}^{M}</math>, by defining <math>M-1</math> ''decision boundary'' values <math> \{b_k\}_{k=1}^{M-1} </math>, such that <math> I_k = [b_{k-1}~,~b_k)</math> for <math>k = 1,2,\ldots,M</math>, with the extreme limits defined by <math> b_0 = -\infty</math> and <math> b_M = \infty</math>. All the inputs <math>x</math> that fall in a given interval range <math>I_k</math> are associated with the same quantization index <math>k</math>.
;Reconstruction
:Each interval <math> I_k </math> is represented by a ''reconstruction value'' <math> y_k </math> which implements the mapping <math> x \in I_k \Rightarrow y = y_k </math>.
 
These two stages together comprise the mathematical operation of <math>y = Q(x)</math>.
 
[[Entropy coding]] techniques can be applied to communicate the quantization indices from a source encoder that performs the classification stage to a decoder that performs the reconstruction stage. One way to do this is to associate each quantization index <math>k</math> with a binary codeword <math>c_k</math>. An important consideration is the number of bits used for each codeword, denoted here by <math>\mathrm{length}(c_k)</math>. As a result, the design of an <math>M</math>-level quantizer and an associated set of codewords for communicating its index values requires finding the values of <math> \{b_k\}_{k=1}^{M-1} </math>, <math>\{c_k\}_{k=1}^{M} </math> and <math> \{y_k\}_{k=1}^{M} </math> which optimally satisfy a selected set of design constraints such as the ''bit rate'' <math>R</math> and ''distortion'' <math>D</math>.
 
Assuming that an information source <math>S</math> produces random variables <math>X</math> with an associated PDF <math>f(x)</math>, the probability <math>p_k</math> that the random variable falls within a particular quantization interval <math>I_k</math> is given by:
:<math> p_k = P[x \in I_k] = \int_{b_{k-1}}^{b_k} f(x)dx </math>.
 
The resulting bit rate <math>R</math>, in units of average bits per quantized value, for this quantizer can be derived as follows:
:<math> R = \sum_{k=1}^{M} p_k \cdot \mathrm{length}(c_{k}) = \sum_{k=1}^{M} \mathrm{length}(c_k) \int_{b_{k-1}}^{b_k} f(x)dx </math>.
 
If it is assumed that distortion is measured by mean squared error,{{efn|Other distortion measures can also be considered, although mean squared error is a popular one.}} the distortion <math>D</math> is given by:
:<math> D = E[(x-Q(x))^2] = \int_{-\infty}^{\infty} (x-Q(x))^2f(x)dx = \sum_{k=1}^{M} \int_{b_{k-1}}^{b_k} (x-y_k)^2 f(x)dx </math>.
 
A key observation is that rate <math>R</math> depends on the decision boundaries <math>\{b_k\}_{k=1}^{M-1}</math> and the codeword lengths <math>\{\mathrm{length}(c_k)\}_{k=1}^{M}</math>, whereas the distortion <math>D</math> depends on the decision boundaries <math>\{b_k\}_{k=1}^{M-1}</math> and the reconstruction levels <math>\{y_k\}_{k=1}^{M}</math>.
 
After defining these two performance metrics for the quantizer, a typical rate–distortion formulation for a quantizer design problem can be expressed in one of two ways:
# Given a maximum distortion constraint <math>D \le D_\max</math>, minimize the bit rate <math>R</math>
# Given a maximum bit rate constraint <math>R \le R_\max</math>, minimize the distortion <math>D</math>
 
Often the solution to these problems can be equivalently (or approximately) expressed and solved by converting the formulation to the unconstrained problem <math>\min\left\{ D + \lambda \cdot R \right\}</math> where the [[Lagrange multiplier]] <math>\lambda</math> is a non-negative constant that establishes the appropriate balance between rate and distortion. Solving the unconstrained problem is equivalent to finding a point on the [[convex hull]] of the family of solutions to an equivalent constrained formulation of the problem. However, finding a solution – especially a [[Closed-form expression|closed-form]] solution – to any of these three problem formulations can be difficult. Solutions that do not require multi-dimensional iterative optimization techniques have been published for only three PDFs: the uniform,<ref>{{cite journal | last1=Farvardin | first1=N. |author-link=Nariman Farvardin| last2=Modestino | first2=J. | title=Optimum quantizer performance for a class of non-Gaussian memoryless sources | journal=IEEE Transactions on Information Theory | volume=30 | issue=3 | year=1984 | issn=0018-9448 | doi=10.1109/tit.1984.1056920 | pages=485–497}}(Section VI.C and Appendix B)</ref> [[Exponential distribution|exponential]],<ref name=SullivanIT>{{cite journal | last=Sullivan | first=G.J. |author-link=Gary Sullivan (engineer)| title=Efficient scalar quantization of exponential and Laplacian random variables | journal=IEEE Transactions on Information Theory | volume=42 | issue=5 | year=1996 | issn=0018-9448 | doi=10.1109/18.532878 | pages=1365–1374}}</ref> and [[Laplace distribution|Laplacian]]<ref name=SullivanIT/> distributions. Iterative optimization approaches can be used to find solutions in other cases.<ref name=GrayNeuhoff/><ref name=Berger72>{{cite journal | last=Berger | first=T. |author-link=Toby Berger| title=Optimum quantizers and permutation codes | journal=IEEE Transactions on Information Theory | volume=18 | issue=6 | year=1972 | issn=0018-9448 | doi=10.1109/tit.1972.1054906 | pages=759–765}}</ref><ref name=Berger82>{{cite journal | last=Berger | first=T. |author-link=Toby Berger| title=Minimum entropy quantizers and permutation codes | journal=IEEE Transactions on Information Theory | volume=28 | issue=2 | year=1982 | issn=0018-9448 | doi=10.1109/tit.1982.1056456 | pages=149–157}}</ref>
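As a concrete sketch, the rate and distortion of a small hand-chosen quantizer can be estimated by Monte Carlo simulation; all the boundary, reconstruction, and codeword-length values below are hypothetical, chosen for a unit-variance Laplacian source:

<syntaxhighlight lang="python">
import math, random

boundaries = [-1.0, 0.0, 1.0]        # decision boundaries b_1 .. b_{M-1}
recon      = [-1.7, -0.4, 0.4, 1.7]  # reconstruction values y_1 .. y_M
lengths    = [3, 1, 2, 3]            # codeword lengths; the Kraft sum is exactly 1

def classify(x):
    """Return the index k of the interval I_k containing x."""
    k = 0
    while k < len(boundaries) and x >= boundaries[k]:
        k += 1
    return k

n, rate, dist = 200_000, 0.0, 0.0
for _ in range(n):
    # unit-variance Laplacian sample: random sign times Exp(sqrt(2))
    x = random.choice((-1.0, 1.0)) * random.expovariate(math.sqrt(2))
    k = classify(x)
    rate += lengths[k]
    dist += (x - recon[k]) ** 2
print("R =", rate / n, "bits/sample;  D =", dist / n)
</syntaxhighlight>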
 
Note that the reconstruction values <math>\{y_k\}_{k=1}^{M}</math> affect only the distortion – they do not affect the bit rate – and that each individual <math>y_k</math> makes a separate contribution <math> d_k </math> to the total distortion as shown below:
:<math> D = \sum_{k=1}^{M} d_k </math>
where
:<math> d_k = \int_{b_{k-1}}^{b_k} (x-y_k)^2 f(x)dx </math>
This observation can be used to ease the analysis – given the set of <math>\{b_k\}_{k=1}^{M-1}</math> values, the value of each <math>y_k</math> can be optimized separately to minimize its contribution to the distortion <math>D</math>.
 
For the mean-square error distortion criterion, it can be easily shown that the optimal set of reconstruction values <math>\{y^*_k\}_{k=1}^{M}</math> is given by setting the reconstruction value <math>y_k</math> within each interval <math>I_k</math> to the [[conditional expected value]] (also referred to as the ''[[centroid]]'') within the interval, as given by:
:<math>y^*_k = \frac1{p_k} \int_{b_{k-1}}^{b_k} x f(x)dx</math>.
 
The use of sufficiently well-designed entropy coding techniques can result in the use of a bit rate that is close to the true information content of the indices <math>\{k\}_{k=1}^{M}</math>, such that effectively
:<math> \mathrm{length}(c_k) \approx -\log_2\left(p_k\right)</math>
and therefore
:<math> R = \sum_{k=1}^{M} -p_k \cdot \log_2\left(p_k\right) </math>.
 
The use of this approximation can allow the entropy coding design problem to be separated from the design of the quantizer itself. Modern entropy coding techniques such as [[arithmetic coding]] can achieve bit rates that are very close to the true entropy of a source, given a set of known (or adaptively estimated) probabilities <math>\{p_k\}_{k=1}^{M}</math>.
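As a toy illustration of this approximation, if the index probabilities happen to be exact powers of ½ (hypothetical values below), the ideal codeword lengths are integers and the entropy can be met exactly:

<syntaxhighlight lang="python">
import math

p = [0.5, 0.25, 0.125, 0.125]            # hypothetical index probabilities
ideal = [-math.log2(pk) for pk in p]     # ideal lengths: 1, 2, 3, 3 bits
entropy = -sum(pk * math.log2(pk) for pk in p)
print(ideal, entropy)                    # entropy = 1.75 bits/sample
</syntaxhighlight>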
 
In some designs, rather than optimizing for a particular number of classification regions <math>M</math>, the quantizer design problem may include optimization of the value of <math>M</math> as well. For some probabilistic source models, the best performance may be achieved when <math>M</math> approaches infinity.
 
===Neglecting the entropy constraint: Lloyd–Max quantization===
In the above formulation, if the bit rate constraint is neglected by setting <math>\lambda</math> equal to 0, or equivalently if it is assumed that a fixed-length code (FLC) will be used to represent the quantized data instead of a [[variable-length code]] (or some other entropy coding technology such as arithmetic coding that is better than an FLC in the rate–distortion sense), the optimization problem reduces to minimization of distortion <math>D</math> alone.
 
The indices produced by an <math>M</math>-level quantizer can be coded with a fixed-length code (FLC) of <math> R = \lceil \log_2 M \rceil </math> bits/symbol. For example, when <math>M=256</math> levels, the FLC bit rate <math>R</math> is 8 bits/symbol. For this reason, such a quantizer has sometimes been called an 8-bit quantizer. However, using an FLC eliminates the compression improvement that can be obtained by use of better entropy coding.
 
Assuming an FLC with <math>M</math> levels, the rate–distortion minimization problem can be reduced to distortion minimization alone. The reduced problem can be stated as follows: given a source <math>X</math> with PDF <math>f(x)</math> and the constraint that the quantizer must use only <math>M</math> classification regions, find the decision boundaries <math>\{b_k\}_{k=1}^{M-1} </math> and reconstruction levels <math>\{y_k\}_{k=1}^M</math> to minimize the resulting distortion
:<math> D=E[(x-Q(x))^2] = \int_{-\infty}^{\infty} (x-Q(x))^2f(x)dx = \sum_{k=1}^{M} \int_{b_{k-1}}^{b_k} (x-y_k)^2 f(x)dx =\sum_{k=1}^{M} d_k </math>.
 
Finding an optimal solution to the above problem results in a quantizer sometimes called a MMSQE (minimum mean-square quantization error) solution, and the resulting PDF-optimized (non-uniform) quantizer is referred to as a ''Lloyd–Max'' quantizer, named after two people who independently developed iterative methods<ref name=GrayNeuhoff/><ref>{{cite journal | last=Lloyd | first=S. | title=Least squares quantization in PCM | journal=IEEE Transactions on Information Theory | volume=28 | issue=2 | year=1982 | issn=0018-9448 | doi=10.1109/tit.1982.1056489 | pages=129–137| s2cid=10833328 | citeseerx=10.1.1.131.1338 }} (work documented in a manuscript circulated for comments at [[Bell Laboratories]] with a department log date of 31 July 1957 and also presented at the 1957 meeting of the [[Institute of Mathematical Statistics]], although not formally published until 1982).</ref><ref>{{cite journal | last=Max | first=J. | title=Quantizing for minimum distortion | journal=IEEE Transactions on Information Theory | volume=6 | issue=1 | year=1960 | issn=0018-9448 | doi=10.1109/tit.1960.1057548 | pages=7–12| bibcode=1960ITIT....6....7M }}</ref> to solve the two sets of simultaneous equations resulting from <math> {\partial D / \partial b_k} = 0 </math> and <math>{\partial D/ \partial y_k} = 0 </math>, as follows:
:<math> {\partial D \over\partial b_k} = 0 \Rightarrow b_k = {y_k + y_{k+1} \over 2} </math>,
which places each threshold at the midpoint between each pair of reconstruction values, and
:<math> {\partial D \over\partial y_k} = 0 \Rightarrow y_k = { \int_{b_{k-1}}^{b_k} x f(x) dx \over \int_{b_{k-1}}^{b_k} f(x)dx } = \frac1{p_k} \int_{b_{k-1}}^{b_k} x f(x) dx </math>
which places each reconstruction value at the centroid (conditional expected value) of its associated classification interval.
 
[[Lloyd's algorithm|Lloyd's Method I algorithm]], originally described in 1957, can be generalized in a straightforward way for application to vector data. This generalization results in the [[Linde–Buzo–Gray algorithm|Linde–Buzo–Gray (LBG)]] or [[k-means]] classifier optimization methods. Moreover, the technique can be further generalized in a straightforward way to also include an entropy constraint for vector data.<ref name=ChouLookabaughGray>{{cite journal | last1=Chou | first1=P.A. | last2=Lookabaugh | first2=T. | last3=Gray | first3=R.M. |author-link3=Robert M. Gray| title=Entropy-constrained vector quantization | journal=IEEE Transactions on Acoustics, Speech, and Signal Processing | volume=37 | issue=1 | year=1989 | issn=0096-3518 | doi=10.1109/29.17498 | pages=31–42}}</ref>
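A minimal sample-based sketch of the iteration, essentially one-dimensional [[k-means]] (a Monte Carlo form of the midpoint and centroid conditions; illustrative only):

<syntaxhighlight lang="python">
import random

samples = [random.random() for _ in range(20_000)]   # source uniform on [0, 1)
recon = [0.1, 0.2, 0.3, 0.9]                         # arbitrary starting levels

for _ in range(50):
    # thresholds at midpoints between neighboring reconstruction values
    b = [(a + c) / 2 for a, c in zip(recon, recon[1:])]
    cells = [[] for _ in recon]
    for x in samples:
        cells[sum(x >= t for t in b)].append(x)      # classification
    # move each level to the centroid (sample mean) of its cell
    recon = [sum(c) / len(c) if c else r for c, r in zip(cells, recon)]

print(recon)   # approaches [0.125, 0.375, 0.625, 0.875], a uniform quantizer
</syntaxhighlight>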
 
===Uniform quantization and the 6&nbsp;dB/bit approximation===
The Lloyd–Max quantizer is actually a uniform quantizer when the input PDF is uniformly distributed over the range <math>[y_1-\Delta/2,~y_M+\Delta/2)</math>. However, for a source that does not have a uniform distribution, the minimum-distortion quantizer may not be a uniform quantizer. The analysis of a uniform quantizer applied to a uniformly distributed source can be summarized in what follows:
 
A symmetric source X can be modeled with <math> f(x)= \tfrac1{2X_{\max}}</math>, for <math>x \in [-X_{\max} , X_{\max}]</math> and 0 elsewhere.
The step size is <math>\Delta = \tfrac {2X_{\max}} {M} </math>, and the ''signal-to-quantization-noise ratio'' (SQNR) of the quantizer is
:<math>{\rm SQNR}= 10\log_{10}{\frac {\sigma_x^2}{\sigma_q^2}} = 10\log_{10}{\frac {(M\Delta)^2/12}{\Delta^2/12}}= 10\log_{10}M^2= 20\log_{10}M</math>.
 
For a fixed-length code using <math>N</math> bits, <math>M=2^N</math>, resulting in
:<math>{\rm SQNR}= 20\log_{10}{2^N} = N\cdot(20\log_{10}2) = N\cdot 6.0206\,\rm{dB}</math>,
 
or approximately 6&nbsp;dB per bit. For example, for <math>N</math>=8 bits, <math>M</math>=256 levels and SQNR = 8×6 = 48&nbsp;dB; and for <math>N</math>=16 bits, <math>M</math>=65536 and SQNR = 16×6 = 96&nbsp;dB. The property of 6&nbsp;dB improvement in SQNR for each extra bit used in quantization is a well-known figure of merit. However, it must be used with care: this derivation is only for a uniform quantizer applied to a uniform source. For other source PDFs and other quantizer designs, the SQNR may be somewhat different from that predicted by 6&nbsp;dB/bit, depending on the type of PDF, the type of source, the type of quantizer, and the bit rate range of operation.
 
However, it is common to assume that for many sources, the slope of a quantizer SQNR function can be approximated as 6&nbsp;dB/bit when operating at a sufficiently high bit rate. At asymptotically high bit rates, cutting the step size in half increases the bit rate by approximately 1 bit per sample (because 1 bit is needed to indicate whether the value is in the left or right half of the prior double-sized interval) and reduces the mean squared error by a factor of 4 (i.e., 6&nbsp;dB) based on the <math>\Delta^2/12</math> approximation.
 
At asymptotically high bit rates, the 6&nbsp;dB/bit approximation is supported for many source PDFs by rigorous theoretical analysis.<ref name=Bennett/><ref name=OliverPierceShannon/><ref name=GishPierce/><ref name=GrayNeuhoff/> Moreover, the structure of the optimal scalar quantizer (in the rate–distortion sense) approaches that of a uniform quantizer under these conditions.<ref name=GishPierce/><ref name=GrayNeuhoff/>
<!-- I don't think that was proved by anyone else before it was done by Gish & Pearce in '68. For example, was it done by Koshelev in '63? (I don't think so) Zador in '66? (I don't know - probably not) Goblick & Holsinger in '67? (I don't see it in that paper.) -->
 
== Quantization and data compression ==
Quantization plays a major part in [[lossy data compression]]. In many cases, quantization can be viewed as the fundamental element that distinguishes lossy data compression from [[lossless data compression]], and the use of quantization is nearly always motivated by the need to reduce the amount of data needed to represent a signal. In some compression schemes, such as [[MP3]] or [[Vorbis]], compression is also achieved by selectively discarding some data, an action that can be analyzed as a quantization process (e.g., a vector quantization process) or can be considered a different kind of lossy process.

One example of a lossy compression scheme that uses quantization is [[JPEG]] image compression. During JPEG encoding, the data representing an image (typically 8 bits for each of three color components per pixel) is processed using a [[discrete cosine transform]] and is then quantized and [[entropy encoding|entropy coded]]. By reducing the precision of the transformed values using quantization, the number of bits needed to represent the image can be reduced substantially. For example, images can often be represented with acceptable quality using JPEG at less than 3 bits per pixel (as opposed to the typical 24 bits per pixel needed prior to JPEG compression). Even the original representation using 24 bits per pixel requires quantization for its [[pulse-code modulation|PCM]] sampling structure. In modern compression technology, the [[information entropy|entropy]] of the output of a quantizer matters more than the number of its possible output values.

==In other fields==
{{See also|Quantum noise|Quantum limit}}
At the most fundamental level, all [[physical quantity|physical quantities]] are quantized; this is a result of [[quantum mechanics]] (see [[Quantization (physics)]]). Examples of fields where this limitation applies include [[electronics]] (due to [[electron]]s), [[optics]] (due to [[photon]]s), [[biology]] (due to [[DNA]]), [[physics]] (due to [[Planck limits]]) and [[chemistry]] (due to [[molecule]]s). In practical signal processing applications this inherent quantization is irrelevant: it is overshadowed by [[signal noise]], the intrusion of extraneous phenomena present in the system upon the signal of interest, and, in measurement applications, by the inaccuracy of instruments. Signals may therefore be treated as continuous for mathematical simplicity.
 
==See also==
* [[Beta encoder]]
* [[Color quantization]]
* [[Data binning]]
* [[Discretization]]
* [[Discretization error]]
* [[Least count]]
* [[Posterization]]
* [[Pulse-code modulation]]
* [[Quantile]]
* [[Quantization (image processing)]]
* [[Regression dilution]] – a bias in parameter estimates caused by errors such as quantization in the explanatory or independent variable
 
==Notes==
{{Notelist}}
 
==References==
{{Reflist}}
{{refbegin}}
*{{Citation |last=Sayood |first= Khalid|year= 2005 |title= Introduction to Data Compression, Third Edition |publisher= Morgan Kaufmann |isbn= 978-0-12-620862-7}}
*{{Citation |last1=Jayant |first1= Nikil S.|last2=Noll|first2=Peter|year= 1984 |title= Digital Coding of Waveforms: Principles and Applications to Speech and Video |publisher= Prentice–Hall |isbn=978-0-13-211913-9}}
*{{Citation |last=Gregg|first= W. David |year= 1977 |title= Analog & Digital Communication |publisher= John Wiley |isbn=978-0-471-32661-8
}}
*{{Citation |last1=Stein |first1= Seymour|last2= Jones|first2= J. Jay |year= 1967 |title= Modern Communication Principles |publisher= [[McGraw–Hill]] |isbn=978-0-07-061003-3}}
{{refend}}
 
==Further reading==
* {{cite book |url=http://www.mit.bme.hu/books/quantization/ |title=Quantization Noise in Digital Computation, Signal Processing, and Control |author1=Bernard Widrow |author2=István Kollár |date=2007 |publisher=Cambridge University Press |isbn=9780521886710}}

==External links==
* [http://www.math.ucdavis.edu/~saito/courses/ACHA/44it06-gray.pdf Paper on mathematical theory and analysis of quantization]
* [http://www.dsprelated.com/comp.dsp/keyword/Quantization.php Quantization threads in Comp.DSP]
 
{{DSP}}
{{Compression Methods}}
{{Noise}}
 
{{DEFAULTSORT:Quantization (Signal Processing)}}
[[Category:Digital signal processing]]
[[Category:Computer graphic artifacts]]
[[Category:Digital audio]]
[[Category:Noise (electronics)]]
[[Category:Signal processing]]
[[Category:Telecommunication theory]]
[[Category:Data compression]]