m Removed erroneous space and general fixes (task 1) |
Stopped an image intruding on the references section. |
(14 intermediate revisions by 13 users not shown) | |||
Line 1:
{{Short description|Mathematical concept}}
{{Use dmy dates|date=October 2020}}
'''Probability distribution fitting''' or simply '''distribution fitting''' is the fitting of a [[probability distribution]] to a series of data concerning the repeated measurement of a variable phenomenon.
The aim of distribution fitting is to [[prediction|predict]] the [[probability]] or to [[forecasting|forecast]] the [[Frequency (statistics)|frequency]] of occurrence of the magnitude of the phenomenon in a certain interval.
There are many probability distributions (see [[list of probability distributions]]), of which some can be fitted more closely to the observed frequency of the data than others, depending on the characteristics of the phenomenon and of the distribution. A distribution that fits the data closely is expected to lead to good predictions.
In distribution fitting, therefore, one needs to select a distribution that suits the data well.
Line 29 ⟶ 28:
The following techniques of distribution fitting exist:<ref>''Frequency and Regression Analysis''. Chapter 6 in: H.P.Ritzema (ed., 1994), ''Drainage Principles and Applications'', Publ. 16, pp. 175–224, International Institute for Land Reclamation and Improvement (ILRI), Wageningen, The Netherlands. {{ISBN|9070754339}}. Free download from the webpage [http://www.waterlog.info/articles.htm] under nr. 12, or directly as PDF : [http://www.waterlog.info/pdf/freqtxt.pdf]</ref>
*''Parametric methods'', by which the [[parameter]]s of the distribution are calculated from the data series.<ref>H. Cramér, "Mathematical methods of statistics"
**[[
**[[
**
**[[Maximum likelihood]] method<ref>{{cite journal | last = Aldrich | first = John | title = R. A. Fisher and the making of maximum likelihood 1912–1922 | year = 1997 | journal = Statistical Science | volume = 12 | issue = 3 | pages = 162–176 | doi = 10.1214/ss/1030037906 | mr = 1617519 | ref = citeref Aldrich1997| doi-access = free }}</ref>
::{| class="wikitable"
| bgcolor="white" | ''For example, the parameter <math>\mu</math> (the [[expected value|expectation]]) can be estimated by the [[Arithmetic mean|mean]] of the data, and the parameter <math>\sigma^2</math> (the [[variance]]) can be estimated by the square of the [[standard deviation]] of the data. The mean is found as <math display="inline">m=\sum{X}/n</math>, where <math>X</math> is a data value and <math>n</math> is the number of data values, while the standard deviation is calculated as <math display="inline">s = \sqrt{\frac{1}{n-1} \sum{(X-m)^2}}</math>. With these parameters many distributions, e.g. the normal distribution, are completely defined.''
|}
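The calculation in the example above can be sketched in a few lines of Python. This is only an illustration: the sample values and the interval from 5 to 6 are hypothetical, and the normal distribution is used because the example names it as a distribution that is completely defined by the mean and the standard deviation.

```python
import math
import statistics

# Hypothetical sample of repeated measurements (illustrative values only).
data = [4.1, 5.3, 6.0, 4.8, 5.6, 5.1, 4.4, 5.9]

n = len(data)
m = sum(data) / n                                          # m = sum(X) / n
s = math.sqrt(sum((x - m) ** 2 for x in data) / (n - 1))   # sample standard deviation

# The standard library returns the same two estimates.
assert math.isclose(m, statistics.mean(data))
assert math.isclose(s, statistics.stdev(data))

# With mu estimated by m and sigma by s, the normal distribution is fully
# defined, so the probability of an interval follows from its CDF.
fitted = statistics.NormalDist(mu=m, sigma=s)
p_interval = fitted.cdf(6.0) - fitted.cdf(5.0)
print(f"m = {m:.3f}, s = {s:.3f}, P(5 < X <= 6) = {p_interval:.3f}")
```

The same two estimates would serve as starting values for other two-parameter distributions, although the relation between the distribution parameters and the moments then differs per distribution.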
[[File:FitGumbelDistr.tif|thumb|220px|Cumulative Gumbel distribution fitted to maximum one-day October rainfalls in [[
*
::{| class="wikitable"
Line 70 ⟶ 69:
== Shifting of distributions ==
Some probability distributions, like the [[exponential distribution|exponential]], do not support negative data values (''X'')
The technique of distribution shifting increases the chance of finding a well-fitting probability distribution.
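A minimal sketch of such a shift, under the assumption that the data are shifted by their minimum value so that the shifted series starts at zero, could look as follows. The data values, the choice of the shift, and the function name <code>shifted_exponential_cdf</code> are illustrative assumptions, not a prescribed procedure.

```python
import math

# Hypothetical data containing negative values (illustrative only).
x = [-2.0, -1.5, -0.5, 0.0, 1.0, 2.5, 4.0]

# Assumed shift: subtract the minimum so that the shifted data are non-negative.
x_min = min(x)
y = [xi - x_min for xi in x]

# Maximum-likelihood estimate of the exponential rate for the shifted data.
rate = len(y) / sum(y)

def shifted_exponential_cdf(value):
    """P(X <= value) under the exponential distribution fitted to X - x_min."""
    shifted = value - x_min
    return 0.0 if shifted < 0 else 1.0 - math.exp(-rate * shifted)

print(f"shift = {x_min}, rate = {rate:.3f}, P(X <= 1) = {shifted_exponential_cdf(1.0):.3f}")
```

Predictions are made on the original scale by applying the same shift to the value of interest before evaluating the fitted distribution.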
Line 79 ⟶ 78:
== Uncertainty of prediction ==
[[File:BinomialConfBelts.jpg|thumb|<small>Uncertainty analysis with confidence belts using the binomial distribution
Predictions of occurrence based on fitted probability distributions are subject to [[uncertainty]], which arises from the following conditions:
Line 92 ⟶ 91:
With the binomial distribution one can obtain a [[prediction interval]]. Such an interval also estimates the risk of failure, i.e. the chance that the predicted event still remains outside the confidence interval. The confidence or risk analysis may include the [[return period]] ''T=1/Pe'' as is done in [[hydrology]].
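As an illustration, the following sketch computes binomial confidence limits for an exceedance probability ''Pe'' and converts them to limits on the return period ''T=1/Pe''. It uses the exact (Clopper–Pearson) construction, which is one common choice and is assumed here rather than taken from the cited sources. The record length, the number of exceedances and the 90% confidence level are hypothetical.

```python
import math

def binom_cdf(k, n, p):
    """P(K <= k) for K ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def solve_decreasing(f, target, lo=0.0, hi=1.0, iters=60):
    """Bisection for f(p) = target, where f decreases with p on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid      # f is still too large, so p must increase
        else:
            hi = mid
    return (lo + hi) / 2

def exceedance_limits(k, n, confidence=0.90):
    """Exact two-sided binomial limits for Pe given k exceedances in n trials."""
    alpha = 1 - confidence
    lower = 0.0 if k == 0 else solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper

# Hypothetical record: 5 exceedances of a threshold in 50 years of data.
n_years, k_exceed = 50, 5
pe = k_exceed / n_years
T = 1 / pe
lower, upper = exceedance_limits(k_exceed, n_years)
print(f"Pe = {pe:.2f} (limits {lower:.3f} to {upper:.3f})")
print(f"T = {T:.1f} years (limits {1 / upper:.1f} to {1 / lower:.1f})")
```

Because ''T'' is the reciprocal of ''Pe'', the upper limit of ''Pe'' gives the lower limit of ''T'' and vice versa.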
=== [[Variance]] of [[Bayesian inference|Bayesian]] fitted probability functions ===
[[File:CumList.png|thumb|left|List of probability distributions ranked by goodness of fit.<ref>[https://www.waterlog.info/cumfreq.htm Software for probability distribution fitting]</ref>]]
A Bayesian approach can be used for fitting a model <math>P(x|\theta)</math> having a prior distribution <math>P(\theta)</math> for the parameter <math>\theta</math>. When one has samples <math>X</math> that are independently drawn from the underlying distribution, one can derive the so-called posterior distribution <math>P(\theta|X)</math>. This posterior can be used to update the probability mass function for a new sample <math>x</math> given the observations <math>X</math>; one obtains
<math display="block">P_\theta (x | X) := \int d\theta\ P(x|\theta)\ P(\theta|X) .</math>
The variance of the newly obtained probability mass function can also be determined. The variance for a Bayesian probability mass function can be defined as
<math display="block">\sigma_{P_\theta(x|X)}^2 := \int d\theta\ \left[ P(x|\theta) - P_\theta(x|X) \right]^2\ P(\theta|X).</math>
This expression for the variance can be substantially simplified (assuming independently drawn samples). Defining the "self probability mass function" as
<math display="block">P_\theta(x|\left\{X,x\right\}) = \int d\theta\ P(x|\theta)\ P(\theta|\left\{X, x\right\}),</math>
one obtains for the variance<ref>{{Cite journal |last1=Pijlman |last2=Linnartz |date=2023 |title=Variance of Likelihood of data |url=https://sitb2023.ulb.be/proceedings/ |journal=SITB 2023 Proceedings |pages=34}}</ref>
<math display="block">\sigma_{P_\theta(x|X)}^2 = P_\theta(x|X) \left[ P_\theta(x|\left\{X,x\right\}) - P_\theta(x|X) \right].</math>
The expression for variance involves an additional fit that includes the sample <math>x</math> of interest.
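The simplified expression can be checked on a small conjugate example. The sketch below assumes a Bernoulli model <math>P(x=1|\theta)=\theta</math> with a Beta prior, for which the posterior and the predictive probability have closed forms; the prior parameters, the observations and the function names are illustrative choices. Exact rational arithmetic is used so that both sides of the identity can be compared for equality.

```python
from fractions import Fraction

def posterior(a, b, observations):
    """Beta(a, b) prior plus Bernoulli observations -> Beta posterior parameters."""
    ones = sum(observations)
    return a + ones, b + len(observations) - ones

def predictive(a, b, x):
    """P(x | data): posterior mean of P(x | theta) for a Beta(a, b) posterior."""
    return Fraction(a, a + b) if x == 1 else Fraction(b, a + b)

def direct_variance(a, b):
    """Variance of P(x | theta) under Beta(a, b); the same for x = 0 and x = 1."""
    return Fraction(a * b, (a + b) ** 2 * (a + b + 1))

# Hypothetical observations X and a uniform Beta(1, 1) prior.
X = [1, 0, 1, 1, 0, 1]
a_post, b_post = posterior(1, 1, X)

results = {}
for x in (0, 1):
    p = predictive(a_post, b_post, x)                 # P(x | X)
    a_self, b_self = posterior(a_post, b_post, [x])   # additional fit including x
    p_self = predictive(a_self, b_self, x)            # P(x | {X, x})
    simplified = p * (p_self - p)                     # simplified variance expression
    results[x] = (simplified, direct_variance(a_post, b_post))
    print(x, simplified, direct_variance(a_post, b_post))
```

In this example the simplified expression and the directly integrated variance agree exactly for both outcomes, which illustrates that only one additional fit, including the sample of interest, is needed to obtain the variance.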
[[File:GEVdistrHistogr+Density.png|thumb|220px|Histogram and probability density of a data set fitting the [[GEV distribution]] ]]
Line 109 ⟶ 125:
* [[Mixture distribution]]
* [[Product distribution]]
{{clear}}
== References ==