Quantile-parameterized distribution: Difference between revisions

Citation bot (talk | contribs)
Added publisher. | Use this bot. Report bugs. | Suggested by LeapTorchGear | #UCB_webform 82/639
OAbot (talk | contribs)
m Open access bot: url-access updated in citation with #oabot.
Line 2:
 
== History ==
The development of quantile-parameterized distributions was inspired by the practical need for flexible continuous probability distributions that are easy to fit to data. Historically, the [[Pearson distribution|Pearson]]<ref>Johnson NL, Kotz S, Balakrishnan N. Continuous univariate distributions, Vol 1, Second Edition, John Wiley & Sons, Ltd, 1994, pp. 15–25.</ref> and [[Norman Lloyd Johnson|Johnson]]<ref>{{cite journal | url=https://www.jstor.org/stable/2332539 | jstor=2332539 | title=Systems of Frequency Curves Generated by Methods of Translation | last1=Johnson | first1=N. L. | journal=Biometrika | year=1949 | volume=36 | issue=1/2 | pages=149–176 | doi=10.2307/2332539 | pmid=18132090 | url-access=subscription }}</ref><ref>{{cite journal | url=https://www.jstor.org/stable/2335422 | jstor=2335422 | title=Systems of Frequency Curves Generated by Transformations of Logistic Variables | last1=Tadikamalla | first1=Pandu R. | last2=Johnson | first2=Norman L. | journal=Biometrika | year=1982 | volume=69 | issue=2 | pages=461–465 | doi=10.1093/biomet/69.2.461 | url-access=subscription }}</ref> families of distributions have been used when shape flexibility is needed. That is because both families can match the first four moments (mean, variance, skewness, and kurtosis) of any data set. In many cases, however, these distributions are either difficult to fit to data or not flexible enough to fit the data appropriately.
 
For example, the [[beta distribution]] is a flexible Pearson distribution that is frequently used to model percentages of a population. However, if the characteristics of this population are such that the desired [[cumulative distribution function]] (CDF) should run through certain specific CDF points, there may be no beta distribution that meets this need. Because the beta distribution has only two shape parameters, it cannot, in general, match even three specified CDF points. Moreover, the beta parameters that best fit such data can be found only by nonlinear iterative methods.
Line 93:
* Quantile functions expressed as [[polynomial]] functions of cumulative probability <math>y</math>, including [[Chebyshev polynomial]] functions.
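
As an illustrative sketch only (not drawn from the cited sources), a polynomial quantile function can be passed through assessed quantile points by Lagrange interpolation, which is linear in the quantile values. The helper names <code>lagrange_quantile</code> and <code>is_increasing</code> and all numeric inputs are hypothetical. The sketch also checks a practical caveat: an interpolating polynomial is a valid quantile function only if it is increasing on the unit interval, which some input sets fail.

```python
def lagrange_quantile(probs, quantiles):
    """Return Q(y), the polynomial of degree len(probs) - 1 through the points.

    Q is a polynomial in the cumulative probability y and is linear in the
    supplied quantile values.
    """
    def Q(y):
        total = 0.0
        for i, (p_i, q_i) in enumerate(zip(probs, quantiles)):
            weight = 1.0
            for j, p_j in enumerate(probs):
                if j != i:
                    weight *= (y - p_j) / (p_i - p_j)
            total += q_i * weight
        return total
    return Q


def is_increasing(Q, steps=1000):
    """Grid check that Q is strictly increasing on [0, 1]."""
    values = [Q(k / steps) for k in range(steps + 1)]
    return all(b > a for a, b in zip(values, values[1:]))


probs = [0.1, 0.5, 0.9]

# Hypothetical inputs: the first set yields an increasing quadratic,
# the second yields a quadratic that decreases near y = 0.
Q_ok = lagrange_quantile(probs, [20.0, 35.0, 60.0])
Q_bad = lagrange_quantile(probs, [20.0, 22.0, 60.0])

print(is_increasing(Q_ok))
print(is_increasing(Q_bad))
```

The grid check is a numerical approximation of the monotonicity requirement, not a proof. A low-degree polynomial also has bounded support, so this form suits only quantities with finite lower and upper limits.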
 
Like the SPT metalog distributions, the Johnson Quantile-Parameterized Distributions<ref>{{cite journal | url=https://pubsonline.informs.org/doi/abs/10.1287/deca.2016.0343 | doi=10.1287/deca.2016.0343 | title=Johnson Quantile-Parameterized Distributions | year=2017 | last1=Hadlock | first1=Christopher C. | last2=Bickel | first2=J. Eric | journal=Decision Analysis | volume=14 | pages=35–64 | url-access=subscription }}</ref><ref>{{cite journal | url=https://pubsonline.informs.org/doi/abs/10.1287/deca.2018.0376 | doi=10.1287/deca.2018.0376 | title=The Generalized Johnson Quantile-Parameterized Distribution System | year=2019 | last1=Hadlock | first1=Christopher C. | last2=Bickel | first2=J. Eric | journal=Decision Analysis | volume=16 | pages=67–85 | s2cid=159339224 | url-access=subscription }}</ref> (JQPDs) are parameterized by three quantiles. JQPDs do not meet Keelin and Powley’s QPD definition, but rather have their own properties. JQPDs are feasible for all SPT parameter sets that are consistent with the [[Probability theory|rules of probability]].
 
== Applications ==
QPDs were first applied by decision analysts who wished to convert expert-assessed quantiles (e.g., 10th, 50th, and 90th quantiles) into smooth continuous probability distributions. QPDs have also been used to fit output data from simulations in order to represent those outputs (both CDFs and PDFs) as closed-form continuous distributions.<ref>[[doi:10.1287/deca.2016.0338|Keelin, T.W. (2016), Section 6.2.2, pp. 271–274.]]</ref> Used in this way, they are typically more stable and smoother than histograms. Similarly, since QPDs can impose fewer shape constraints than traditional distributions, they have been used to fit a wide range of empirical data in order to represent those data sets as continuous distributions (e.g., reflecting bimodality that may exist in the data in a straightforward manner<ref>[[doi:10.1287/deca.2016.0338|Keelin, T.W. (2016), Section 6.1.1, Figure 10, pp. 266–267.]]</ref>). Quantile parameterization enables a closed-form QPD representation of known distributions whose CDFs otherwise have no closed-form expression. Keelin et al. (2019)<ref>{{cite book | url=https://dl.acm.org/doi/abs/10.5555/3400397.3400643 | isbn=9781728132839 | title=The metalog distributions and extremely accurate sums of lognormals in closed form | date=18 May 2020 | pages=3074–3085 | last1=Mustafee | first1=N. | publisher=Institute of Electrical and Electronics Engineers (IEEE) }}</ref> apply this to the sum of independent, identically distributed lognormal random variables, where quantiles of the sum can be determined by a large number of simulations. Nine such quantiles are used to parameterize a semi-bounded metalog distribution that runs through each of these nine quantiles exactly. QPDs have also been applied to assess the risks of asteroid impact,<ref>{{cite journal | url=https://doi.org/10.1111/risa.12453 | doi=10.1111/risa.12453 | title=Asteroid Risk Assessment: A Probabilistic Approach | year=2016 | last1=Reinhardt | first1=Jason C. 
| last2=Chen | first2=Xi | last3=Liu | first3=Wenhao | last4=Manchev | first4=Petar | last5=Paté-Cornell | first5=M. Elisabeth | journal=Risk Analysis | volume=36 | issue=2 | pages=244–261 | pmid=26215051 | bibcode=2016RiskA..36..244R | s2cid=23308354 | url-access=subscription }}</ref> cybersecurity,<ref name="Faber" /><ref>{{cite journal | url=https://www.sciencedirect.com/science/article/pii/S0167404819300604 | doi=10.1016/j.cose.2019.101659 | title=A Bayesian network approach for cybersecurity risk assessment implementing and extending the FAIR model | year=2020 | last1=Wang | first1=Jiali | last2=Neil | first2=Martin | last3=Fenton | first3=Norman | journal=Computers & Security | volume=89 | page=101659 | s2cid=209099797 }}</ref> biases in projections of oil-field production when compared to observed production after the fact,<ref>{{Cite journal |url=https://www.onepetro.org/journal-paper/SPE-195914-PA |doi=10.2118/195914-PA |title=Production Forecasting: Optimistic and Overconfident—Over and over Again |year=2020 |last1=Bratvold |first1=Reidar B. |last2=Mohus |first2=Erlend |last3=Petutschnig |first3=David |last4=Bickel |first4=Eric |journal=SPE Reservoir Evaluation & Engineering |volume=23 |issue=3 |pages=0799–0810 |s2cid=219661316 |url-access=subscription }}</ref> and future Canadian population projections based on combining the probabilistic views of multiple experts.<ref>{{Cite book |url=https://library.oapen.org/bitstream/handle/20.500.12657/42565/2020_Book_DevelopmentsInDemographicForec.pdf?sequence=1#page=51 |title=Developments in Demographic Forecasting |year=2020 |isbn=978-3-030-42471-8 |series=The Springer Series on Demographic Methods and Population Analysis |volume=49 |pages=43–62 |doi=10.1007/978-3-030-42472-5 |hdl=20.500.12657/42565 |s2cid=226615299}}</ref> See [[Metalog distribution#Applications|metalog distributions]] and Keelin (2016)<ref name="Keelin2016" /> for additional applications of the metalog distribution.
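
The conversion of three assessed quantiles into a smooth distribution can be sketched as follows. This is a minimal illustration, assuming the three-term unbounded metalog quantile function <math>M(y) = a_1 + a_2 \ln\tfrac{y}{1-y} + a_3 (y - 0.5) \ln\tfrac{y}{1-y}</math>; the function names, the small linear solver, and the quantile values are hypothetical and are not taken from the cited sources. Because the quantile function is linear in its coefficients, fitting reduces to solving a 3×3 linear system rather than running a nonlinear search.

```python
import math


def logit(y):
    return math.log(y / (1.0 - y))


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_metalog3(probs, quantiles):
    """Coefficients (a1, a2, a3) of a three-term metalog through three points."""
    A = [[1.0, logit(y), (y - 0.5) * logit(y)] for y in probs]
    return solve_linear(A, quantiles)


def metalog3_quantile(a, y):
    """Evaluate the fitted quantile function at cumulative probability y."""
    L = logit(y)
    return a[0] + a[1] * L + a[2] * (y - 0.5) * L


# Hypothetical expert-assessed 10th, 50th, and 90th percentiles.
probs = [0.1, 0.5, 0.9]
quantiles = [20.0, 35.0, 60.0]

a = fit_metalog3(probs, quantiles)
print([round(c, 4) for c in a])
print([round(metalog3_quantile(a, p), 6) for p in probs])
```

The fitted curve reproduces the three input quantiles. A fit of this kind should still be checked for feasibility (an increasing quantile function) before it is used as a distribution.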