Quantile-parameterized distribution: Difference between revisions

Definition: Fixed typo
Riskanal (talk | contribs)
Applications: changed some links
Line 6:
For example, the [[beta distribution]] is a flexible Pearson distribution that is frequently used to model percentages of a population. However, if the characteristics of this population are such that the desired [[cumulative distribution function]] (CDF) should run through certain specific CDF points, there may be no beta distribution that meets this need. Because the beta distribution has only two shape parameters, it cannot, in general, match even three specified CDF points. Moreover, the beta parameters that best fit such data can be found only by nonlinear iterative methods.
 
Practitioners of [[decision analysis]], needing distributions easily parameterized by three or more CDF points (e.g., because such points were specified as the result of an [[Expert elicitation|expert-elicitation process]]), originally invented quantile-parameterized distributions for this purpose. Keelin and Powley (2011)<ref name="KeelinPowley">[[doi:10.1287/deca.1110.0213|Keelin, T.W. and Powley, B.W. (2011). “Quantile-parameterized distributions.” Decision Analysis. 8 (3): 206–219.]]</ref> provided the original definition. Subsequently, Keelin (2016)<ref name="Keelin2016">[[doi:10.1287/deca.2016.0338|Keelin, T.W. (2016). “The Metalog Distributions.” Decision Analysis. 13 (4): 243–277.]]</ref> developed the [[metalog distribution]]s, a family of quantile-parameterized distributions that has virtually unlimited shape flexibility, simple equations, and closed-form moments.
 
== Definition ==
Line 63:
 
=== Shape flexibility ===
A QPD with <math>n</math> terms, where <math>n\ge 2</math>, has <math>n-2</math> shape parameters. Thus, QPDs can be far more flexible than the [[Pearson distribution]]s, which have at most two shape parameters. For example, ten-term [[metalog distribution]]s parameterized by 105 CDF points from 30 traditional source distributions (including normal, Student's t, lognormal, gamma, beta, and extreme value) have been shown to approximate each such source distribution within a [[Kolmogorov–Smirnov test|K–S]] distance of 0.001 or less.<ref>[[doi:10.1287/deca.2016.0338|Keelin, T.W. (2016), Table 8]]</ref>
 
=== Transformations ===
QPD transformations are governed by a general property of quantile functions: for any [[quantile function]] <math>x=Q(y)</math> and any increasing function <math>t(x)</math>, <math>x=t^{-1} (Q(y))</math> is also a [[quantile function]].<ref>Gilchrist, W., 2000. Statistical modelling with quantile functions. CRC Press.</ref> For example, the [[quantile function]] of the [[normal distribution]], <math>x=\mu+\sigma \Phi^{-1} (y)</math>, is a QPD by the Keelin and Powley definition. The natural logarithm, <math>t(x)=\ln(x-b_l)</math>, is an increasing function, so <math>x=b_l+e^{\mu+\sigma \Phi^{-1} (y)}</math> is the [[quantile function]] of the [[Log-normal distribution|lognormal distribution]] with lower bound <math>b_l</math>. Importantly, this transformation converts an unbounded QPD into a semi-bounded QPD. Similarly, applying this log transformation to the [https://en.wikipedia.org/wiki/Metalog_distribution#Unbounded,_semibounded,_and_bounded_metalog_distributions unbounded metalog distribution]<ref name="UnboundedMetalog">[[doi:10.1287/deca.2016.0338|Keelin, T.W. (2016), Section 3, pp. 249–257.]]</ref> yields the [https://en.wikipedia.org/wiki/Metalog_distribution#Unbounded,_semibounded,_and_bounded_metalog_distributions semi-bounded (log) metalog distribution];<ref name="KeelinSec4">[[doi:10.1287/deca.2016.0338|Keelin, T.W. (2016), Section 4.]]</ref> likewise, applying the logit transformation, <math>t(x)=\ln((x-b_l)/(b_u-x))</math>, yields the [https://en.wikipedia.org/wiki/Metalog_distribution#Unbounded,_semibounded,_and_bounded_metalog_distributions bounded (logit) metalog distribution]<ref name="KeelinSec4" /> with lower and upper bounds <math>b_l</math> and <math>b_u</math>, respectively. Moreover, by considering <math>t(x)</math> to be <math>F^{-1} (y)</math> distributed, where <math>F^{-1} (y)</math> is any QPD that meets Keelin and Powley’s definition, the transformed variable maintains the above properties of feasibility, convexity, and fitting to data.
Such transformed QPDs have greater shape flexibility than the underlying <math>F^{-1} (y)</math>, which has <math>n-2</math> shape parameters; the log transformation has <math>n-1</math> shape parameters, and the logit transformation has <math>n</math> shape parameters. Moreover, such transformed QPDs share the same set of feasible coefficients as the underlying untransformed QPD.<ref>[http://metalogdistributions.com/images/Powley_Dissertation_2013-augmented.pdf Powley, B.W. (2013). “Quantile Function Methods For Decision Analysis”. Corollary 12, p 30. PhD Dissertation, Stanford University]</ref>
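
The log and logit transformations described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not code from the cited sources: the function names (<code>normal_quantile</code>, <code>log_transformed</code>, <code>logit_transformed</code>) are hypothetical, and the normal quantile function stands in for any underlying QPD <math>Q(y)</math>. Each wrapper simply evaluates <math>t^{-1}(Q(y))</math>.

```python
import math
from statistics import NormalDist


def normal_quantile(y, mu=0.0, sigma=1.0):
    """Quantile function of the normal distribution: mu + sigma * Phi^{-1}(y)."""
    return mu + sigma * NormalDist().inv_cdf(y)


def log_transformed(q, b_l):
    """Semi-bounded quantile function for t(x) = ln(x - b_l).

    Inverting t gives x = b_l + exp(Q(y)), which is bounded below by b_l.
    """
    return lambda y: b_l + math.exp(q(y))


def logit_transformed(q, b_l, b_u):
    """Bounded quantile function for t(x) = ln((x - b_l) / (b_u - x)).

    Inverting t gives x = (b_l + b_u * e^z) / (1 + e^z) with z = Q(y),
    written here in the equivalent form b_l + (b_u - b_l) / (1 + e^{-z}).
    """
    def quantile(y):
        z = q(y)
        return b_l + (b_u - b_l) / (1.0 + math.exp(-z))
    return quantile


# Illustrative use: a lognormal with lower bound 2, and a variable bounded on (0, 10).
lognormal_q = log_transformed(normal_quantile, b_l=2.0)
bounded_q = logit_transformed(normal_quantile, b_l=0.0, b_u=10.0)
median_lognormal = lognormal_q(0.5)
median_bounded = bounded_q(0.5)
```

Because both wrappers accept any callable <code>q</code>, the same sketch would apply to other underlying quantile functions, provided their coefficients are feasible.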
 
 
Line 96:
 
== Applications ==
The original applications of QPDs were by decision analysts wishing to conveniently convert expert-assessed quantiles (e.g., 10th, 50th, and 90th quantiles) into smooth continuous probability distributions. QPDs have also been used to fit output data from simulations in order to represent those outputs (both CDFs and PDFs) as closed-form continuous distributions.<ref>[[doi:10.1287/deca.2016.0338|Keelin, T.W. (2016), Section 6.2.2, pp. 271–274.]]</ref> Used in this way, they are typically more stable and smoother than histograms. Similarly, since QPDs can impose fewer shape constraints than traditional distributions, they have been used to fit a wide range of empirical data in order to represent those data sets as continuous distributions (e.g., reflecting bimodality that may exist in the data in a straightforward manner<ref>[[doi:10.1287/deca.2016.0338|Keelin, T.W. (2016), Section 6.1.1, Figure 10, pp. 266–267.]]</ref>). Quantile parameterization enables a closed-form QPD representation of known distributions whose CDFs otherwise have no closed-form expression. Keelin et al. (2019)<ref>[https://dl.acm.org/doi/abs/10.5555/3400397.3400643 Keelin, T.W., Chrisman, L. and Savage, S.L. (2019). “The metalog distributions and extremely accurate sums of lognormals in closed form.” WSC '19: Proceedings of the Winter Simulation Conference. 3074–3085.]</ref> apply this to the sum of independent, identically distributed lognormal random variables, where quantiles of the sum can be determined by a large number of simulations. Nine such quantiles are used to parameterize a semi-bounded metalog distribution that runs through each of these nine quantiles exactly. QPDs have also been applied to assess the risks of asteroid impact,<ref>[[doi:10.1111/risa.12453|Reinhardt, J.D., Chen, X., Liu, W., Manchev, P. and Pate-Cornell, M.E. (2016). “Asteroid risk assessment: A probabilistic approach.” Risk Analysis. 36 (2): 244–261]]</ref> cybersecurity,<ref name="Faber" /><ref>[https://www.sciencedirect.com/science/article/pii/S0167404819300604 Wang, J., Neil, M. and Fenton, N. (2020). “A Bayesian network approach for cybersecurity risk assessment implementing and extending the FAIR model.” Computers & Security. 89: 101659.]</ref> biases in projections of oil-field production when compared to observed production after the fact,<ref>[https://www.onepetro.org/journal-paper/SPE-195914-PA Bratvold, R.B., Mohus, E., Petutschnig, D. and Bickel, E. (2020). “Production forecasting: Optimistic and overconfident—Over and over again.” Society of Petroleum Engineers. doi:10.2118/195914-PA.]</ref> and future Canadian population projections based on combining the probabilistic views of multiple experts.<ref>[https://library.oapen.org/bitstream/handle/20.500.12657/42565/2020_Book_DevelopmentsInDemographicForec.pdf?sequence=1#page=51 Dion, P., Galbraith, N., Sirag, E. (2020). “Using expert elicitation to build long-term projection assumptions.” In Developments in Demographic Forecasting, Chapter 3, pp. 43–62. Springer]</ref> See [https://en.wikipedia.org/wiki/Metalog_distribution#Applications metalog distributions] and Keelin (2016)<ref name="Keelin2016" /> for additional applications of the metalog distribution.
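
The first application above, converting three expert-assessed quantiles into a continuous distribution, can be sketched in Python. The sketch assumes the three-term unbounded metalog quantile function of Keelin (2016), <math>x = a_1 + a_2 \ln\tfrac{y}{1-y} + a_3 (y-0.5) \ln\tfrac{y}{1-y}</math>. Because this is linear in the coefficients, three quantiles determine them through a 3×3 linear system. The function names, the small elimination routine, and the example quantile values are illustrative assumptions, not part of any published implementation. The grid check is only a rough feasibility test (that the fitted quantile function is increasing), not a proof.

```python
import math


def metalog3_basis(y):
    """Basis functions of the three-term metalog at cumulative probability y."""
    logit = math.log(y / (1.0 - y))
    return [1.0, logit, (y - 0.5) * logit]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_metalog3(points):
    """Coefficients (a1, a2, a3) passing through three (y, x) CDF points."""
    rows = [metalog3_basis(y) for y, _ in points]
    targets = [x for _, x in points]
    return solve_linear(rows, targets)


def metalog3_quantile(coeffs, y):
    """Evaluate the fitted quantile function at cumulative probability y."""
    return sum(a * g for a, g in zip(coeffs, metalog3_basis(y)))


def is_increasing_on_grid(coeffs, steps=999):
    """Rough feasibility check: quantile function increases on a grid in (0, 1)."""
    xs = [metalog3_quantile(coeffs, i / (steps + 1)) for i in range(1, steps + 1)]
    return all(lo < hi for lo, hi in zip(xs, xs[1:]))


# Hypothetical expert assessment: 10th, 50th, and 90th percentiles.
assessed = [(0.1, 20.0), (0.5, 50.0), (0.9, 120.0)]
coeffs = fit_metalog3(assessed)
feasible = is_increasing_on_grid(coeffs)
```

With more than three terms, or more data points than terms, the same linear structure would call for a larger system or a least-squares fit, which this sketch does not cover.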
 
 
== External links ==