Probability distribution fitting: Difference between revisions

Stopped an image intruding on the references section.
 
(9 intermediate revisions by 9 users not shown)
Line 1:
{{Short description|Mathematical concept}}
{{Use dmy dates|date=October 2020}}
'''Probability distribution fitting''' or simply '''distribution fitting''' is the fitting of a [[probability distribution]] to a series of data concerning the repeated measurement of a variable phenomenon.
Line 27 ⟶ 28:
 
The following techniques of distribution fitting exist:<ref>''Frequency and Regression Analysis''. Chapter 6 in: H.P. Ritzema (ed., 1994), ''Drainage Principles and Applications'', Publ. 16, pp. 175–224, International Institute for Land Reclamation and Improvement (ILRI), Wageningen, The Netherlands. {{ISBN|9070754339}}. Free download from the webpage [http://www.waterlog.info/articles.htm] under nr. 12, or directly as PDF: [http://www.waterlog.info/pdf/freqtxt.pdf]</ref>
*''Parametric methods'', by which the [[parameter]]s of the distribution are calculated from the data series.<ref>H. Cramér, "Mathematical methods of statistics", Princeton Univ. Press (1946)</ref> The parametric methods are:
**[[Method of moments (statistics)|Method of moments]]
**[[Maximum spacing estimation]]
**Method of [[L-moment]]s<ref>{{cite journal | last=Hosking | first=J.R.M. | year=1990 | title=L-moments: analysis and estimation of distributions using linear combinations of order statistics | journal=Journal of the Royal Statistical Society, Series B | volume=52 | issue=1 | pages=105–124 | jstor=2345653}}</ref>
**[[Maximum likelihood]] method<ref>{{cite journal | last = Aldrich | first = John | title = R. A. Fisher and the making of maximum likelihood 1912–1922 | year = 1997 | journal = Statistical Science | volume = 12 | issue = 3 | pages = 162–176 | doi = 10.1214/ss/1030037906 | mr = 1617519 | ref = citeref Aldrich1997| doi-access = free }}</ref>
::{| class="wikitable"
Line 68 ⟶ 69:
== Shifting of distributions ==
 
Some probability distributions, like the [[exponential distribution|exponential]], do not support negative data values (''X''). Yet, when negative data are present, such distributions can still be used by replacing ''X'' with ''Y''=''X''-''Xm'', where ''Xm'' is the minimum value of ''X''. This replacement represents a shift of the probability distribution in the positive direction, i.e. to the right, because ''Xm'' is negative. After completing the distribution fitting of ''Y'', the corresponding ''X''-values are found from ''X''=''Y''+''Xm'', which represents a back-shift of the distribution in the negative direction, i.e. to the left.<br>
The technique of distribution shifting increases the chance of finding a properly fitting probability distribution.
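
The shift and back-shift can be illustrated with a minimal Python sketch. It assumes an exponential distribution whose rate is estimated as the reciprocal of the mean of the shifted data (the maximum likelihood estimate); the function name <code>fit_shifted_exponential</code> and the sample data are hypothetical and serve only as an illustration.

```python
import math

def fit_shifted_exponential(xs):
    """Fit an exponential distribution to data that may contain negative values.

    The data are shifted by their minimum Xm (Y = X - Xm) so that all Y >= 0.
    The rate is then estimated as 1 / mean(Y), and the fitted functions are
    expressed back in terms of X.
    """
    xm = min(xs)                      # Xm: minimum value of X
    ys = [x - xm for x in xs]         # shift to the right: Y = X - Xm
    mean_y = sum(ys) / len(ys)
    if mean_y <= 0:
        raise ValueError("all values are identical; cannot fit")
    rate = 1.0 / mean_y               # ML estimate of the exponential rate

    def cdf(x):
        # Cumulative probability of X, evaluated through Y = X - Xm
        y = x - xm
        return 0.0 if y < 0 else 1.0 - math.exp(-rate * y)

    def quantile(p):
        # Quantile of Y, followed by the back-shift X = Y + Xm
        return -math.log(1.0 - p) / rate + xm

    return xm, rate, cdf, quantile

# Hypothetical data series containing negative values
data = [-3.0, -1.0, 0.0, 2.0, 7.0]
xm, rate, cdf, quantile = fit_shifted_exponential(data)
median_x = quantile(0.5)              # median expressed in the original X scale
```

In this sketch the smallest observation maps to ''Y''=0, which the exponential distribution supports, so no further offset is needed.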
 
Line 77 ⟶ 78:
 
== Uncertainty of prediction ==
[[File:BinomialConfBelts.jpg|thumb|<small>Uncertainty analysis with confidence belts using the binomial distribution </small><ref>Frequency predictions and their binomial confidence limits. In: International Commission on Irrigation and Drainage, Special Technical Session: Economic Aspects of Flood Control and non-Structural Measures, Dubrovnik, Yugoslavia, 1988. [http://www.waterlog.info/pdf/binomial.pdf On line]</ref>]]
Predictions of occurrence based on fitted probability distributions are subject to [[uncertainty]], which arises from the following conditions:
 
Line 90 ⟶ 91:
With the binomial distribution one can obtain a [[prediction interval]]. Such an interval also estimates the risk of failure, i.e. the chance that the predicted event still remains outside the confidence interval. The confidence or risk analysis may include the [[return period]] ''T=1/Pe'', where ''Pe'' is the probability of exceedance, as is done in [[hydrology]].
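
As an illustration, the following Python sketch computes binomial confidence limits for an exceedance probability ''Pe'' and converts them to limits of the return period ''T=1/Pe''. It assumes the Clopper-Pearson ("exact") construction, which is one common choice and not necessarily the method of the cited source. The function names and the counts (5 exceedances in 50 observations) are hypothetical.

```python
from math import comb, inf

def binom_cdf(k, n, p):
    """P(K <= k) for K ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve_decreasing(f, target, lo=0.0, hi=1.0, iters=100):
    """Bisection for f(p) = target, where f is decreasing in p on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid          # f still too large: the root lies at larger p
        else:
            hi = mid
    return (lo + hi) / 2

def exceedance_interval(k, n, conf=0.90):
    """Point estimate and two-sided confidence limits for Pe,
    given k exceedances in n observations (Clopper-Pearson, an assumption)."""
    alpha = 1 - conf
    pe = k / n
    # Lower limit: P(K >= k | p) = alpha/2, i.e. cdf(k-1; p) = 1 - alpha/2
    lower = 0.0 if k == 0 else solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper limit: P(K <= k | p) = alpha/2
    upper = 1.0 if k == n else solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return pe, lower, upper

# Hypothetical record: 5 exceedances of a threshold in 50 years
pe, lower, upper = exceedance_interval(5, 50, conf=0.90)
T = 1 / pe                                   # return period estimate
T_low = 1 / upper                            # shortest plausible return period
T_high = inf if lower == 0 else 1 / lower    # longest plausible return period
```

The interval for ''T'' is asymmetric around the point estimate, because the limits of ''Pe'' are inverted.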
 
=== [[Variance]] of [[Bayesian inference|Bayesian]] fitted probability functions ===
[[File:CumList.png|thumb|left|List of probability distributions ranked by goodness of fit.<ref>[https://www.waterlog.info/cumfreq.htm Software for probability distribution fitting]</ref>]]
A Bayesian approach can be used for fitting a model <math>P(x|\theta)</math> having a prior distribution <math>P(\theta)</math> for the parameter <math>\theta</math>. When one has samples <math>X</math> that are independently drawn from the underlying distribution, one can derive the so-called posterior distribution <math>P(\theta|X)</math>. This posterior can be used to update the probability mass function for a new sample <math>x</math> given the observations <math>X</math>; one obtains
 
<math display="block">P_\theta (x | X) := \int d\theta\ P(x|\theta)\ P(\theta|X) .</math>
 
The variance of the newly obtained probability mass function can also be determined. The variance for a Bayesian probability mass function can be defined as
 
<math display="block">\sigma_{P_\theta(x|X)}^2 := \int d\theta\ \left[ P(x|\theta) - P_\theta(x|X) \right]^2\ P(\theta|X).</math>
 
This expression for the variance can be substantially simplified (assuming independently drawn samples). Defining the "self probability mass function" as
 
<math display="block">P_\theta(x|\left\{X,x\right\}) = \int d\theta\ P(x|\theta)\ P(\theta|\left\{X, x\right\}),</math>
 
one obtains for the variance<ref>{{Cite journal |last1=Pijlman |last2=Linnartz |date=2023 |title=Variance of Likelihood of data |url=https://sitb2023.ulb.be/proceedings/ |journal=SITB 2023 Proceedings |pages=34}}</ref>
 
<math display="block">\sigma_{P_\theta(x|X)}^2 = P_\theta(x|X) \left[ P_\theta(x|\left\{X,x\right\}) - P_\theta(x|X) \right].</math>
 
The expression for variance involves an additional fit that includes the sample <math>x</math> of interest.
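
The simplified expression can be checked numerically. The Python sketch below assumes a Bernoulli model <math>P(x{=}1|\theta)=\theta</math> with a uniform prior, evaluated on a grid of <math>\theta</math> values; these choices, the function names and the data are hypothetical and only illustrate the formulas above. The agreement of the two variance computations follows from Bayes' rule for independent samples, <math>P(\theta|\{X,x\}) = P(x|\theta)\,P(\theta|X)/P_\theta(x|X)</math>.

```python
def model(x, theta):
    """Bernoulli model P(x|theta) for x in {0, 1} (an illustrative assumption)."""
    return theta if x == 1 else 1.0 - theta

def posterior_weights(data, grid):
    """Normalised posterior P(theta|X) on the grid, using a uniform prior."""
    weights = []
    for theta in grid:
        likelihood = 1.0
        for x in data:                # independent samples: product of P(x|theta)
            likelihood *= model(x, theta)
        weights.append(likelihood)    # uniform prior contributes a constant factor
    total = sum(weights)
    return [w / total for w in weights]

def predictive(x, data, grid):
    """P_theta(x|X): the model averaged over the posterior."""
    w = posterior_weights(data, grid)
    return sum(model(x, t) * wi for t, wi in zip(grid, w))

def variance_direct(x, data, grid):
    """Variance from its definition as a posterior average of squared deviations."""
    w = posterior_weights(data, grid)
    p = predictive(x, data, grid)
    return sum((model(x, t) - p) ** 2 * wi for t, wi in zip(grid, w))

def variance_simplified(x, data, grid):
    """Variance from the simplified expression, using one additional fit
    that includes the sample x itself."""
    p = predictive(x, data, grid)
    p_self = predictive(x, data + [x], grid)
    return p * (p_self - p)

N = 2000
grid = [(i + 0.5) / N for i in range(N)]   # midpoints of N equal cells on (0, 1)
X = [1, 0, 1, 1, 0, 1, 0, 1]               # hypothetical observations
x_new = 1

p_new = predictive(x_new, X, grid)
v_direct = variance_direct(x_new, X, grid)
v_simple = variance_simplified(x_new, X, grid)
```

For this conjugate example the posterior is a Beta(6, 4) distribution, so both results can also be compared with its closed-form variance 24/1100.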
 
[[File:GEVdistrHistogr+Density.png|thumb|220px|Histogram and probability density of a data set fitting the [[GEV distribution]] ]]
Line 107 ⟶ 125:
* [[Mixture distribution]]
* [[Product distribution]]
{{clear}}
 
== References ==