Probability distribution fitting: Difference between revisions

Revision as of 22:27, 24 November 2022 by BD2412 (Autopatrolled, Administrators; 2,528,047 edits): m →Techniques of fitting: Fixing links to disambiguation pages, replaced: Surinam → Suriname. Tag: AWB
Latest revision as of 07:45, 17 April 2025 by Helper201 (Extended confirmed users; 93,372 edits): Stopped an image intruding on the references section.
(9 intermediate revisions by 9 users not shown)
Line 1:
{{Short description|Mathematical concept}}
{{Use dmy dates|date=October 2020}}
'''Probability distribution fitting''' or simply '''distribution fitting''' is the fitting of a [[probability distribution]] to a series of data concerning the repeated measurement of a variable phenomenon.

Line 27 ⟶ 28:
The following techniques of distribution fitting exist:<ref>''Frequency and Regression Analysis''. Chapter 6 in: H.P.Ritzema (ed., 1994), ''Drainage Principles and Applications'', Publ. 16, pp. 175–224, International Institute for Land Reclamation and Improvement (ILRI), Wageningen, The Netherlands. {{ISBN|9070754339}}. Free download from the webpage [http://www.waterlog.info/articles.htm] under nr. 12, or directly as PDF : [http://www.waterlog.info/pdf/freqtxt.pdf]</ref>
''Parametric methods'', by which the [[parameter]]s of the distribution are calculated from the data series.<ref>H. Cramér, "Mathematical methods of statistics" , Princeton Univ. Press (1946)</ref> The parametric methods are:
* [[Method of moments (statistics)|Method of moments]]
* [[Maximum spacing estimation]]
* Method of [[L-moment]]s<ref>{{cite journal | last=Hosking | first=J.R.M. | year=1990 | title=L-moments: analysis and estimation of distributions using linear combinations of order statistics | journal=Journal of the Royal Statistical Society, Series B | volume=52 | issue=1 | pages=105–124 | jstor=2345653}}</ref>
* [[Maximum likelihood]] method<ref>{{cite journal | last = Aldrich | first = John | title = R. A. Fisher and the making of maximum likelihood 1912–1922 | year = 1997 | journal = Statistical Science | volume = 12 | issue = 3 | pages = 162–176 | doi = 10.1214/ss/1030037906 | mr = 1617519 | ref = citeref Aldrich1997| doi-access = free }}</ref>
::{| class="wikitable"

Line 68 ⟶ 69:
== Shifting of distributions ==
Some probability distributions, like the [[exponential distribution|exponential]], do not support negative data values (''X'') ~~equal to or less than zero~~. Yet, when negative data are present, such distributions can still be used by replacing ''X'' with ''Y''=''X''-''Xm'', where ''Xm'' is the minimum value of ''X''. This replacement represents a shift of the probability distribution in the positive direction, i.e. to the right, because ''Xm'' is negative. After completing the distribution fitting of ''Y'', the corresponding ''X''-values are found from ''X''=''Y''+''Xm'', which represents a back-shift of the distribution in the negative direction, i.e. to the left.<br>
The technique of distribution shifting increases the chance of finding a well-fitting probability distribution.

Line 77 ⟶ 78:
== Uncertainty of prediction ==
[[File:BinomialConfBelts.jpg|thumb|<small>Uncertainty analysis with confidence belts using the binomial distribution </small><ref>Frequency predictions and their binomial confidence limits. In: International Commission on Irrigation and Drainage, Special Technical Session: Economic Aspects of Flood Control and non-Structural Measures, Dubrovnik, Yugoslavia, 1988. [http://www.waterlog.info/pdf/binomial.pdf On line]</ref>]]
Predictions of occurrence based on fitted probability distributions are subject to [[uncertainty]], which arises from the following conditions:

Line 90 ⟶ 91:
With the binomial distribution one can obtain a [[prediction interval]]. Such an interval also estimates the risk of failure, i.e. the chance that the predicted event still remains outside the confidence interval.
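As an illustration of the binomial interval mentioned above, the following is a minimal sketch only. It assumes the exact (Clopper-Pearson) construction, which the text does not specify, and the function names <code>binom_cdf</code> and <code>clopper_pearson</code> as well as the example counts are hypothetical. It computes a two-sided interval for an exceedance probability estimated from ''k'' exceedances in ''n'' observations, using only the Python standard library.

```python
from math import comb


def binom_cdf(k, n, p):
    """Return P(K <= k) for K ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target, iters=100):
    """Bisection on [0, 1] for a function f that decreases in p, solving f(p) = target."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies at a larger p
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, conf=0.90):
    """Exact two-sided binomial interval for a proportion (assumed method, see lead-in)."""
    alpha = 1 - conf
    # Lower limit: the p at which observing k or more events has probability alpha/2.
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper limit: the p at which observing k or fewer events has probability alpha/2.
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# Illustrative data (not from the article): 5 exceedances in 50 observations.
low, high = clopper_pearson(5, 50, conf=0.90)
print(f"point estimate 0.100, 90% interval [{low:.3f}, {high:.3f}]")
```

The interval is asymmetric around the point estimate ''k''/''n'', which is typical for binomial limits when the estimated probability is far from 0.5.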
The confidence or risk analysis may include the [[return period]] ''T=1/Pe'' as is done in [[hydrology]].

=== [[Variance]] of [[Bayesian inference|Bayesian]] fitted probability functions ===
A Bayesian approach can be used for fitting a model <math>P(x|\theta)</math> having a prior distribution <math>P(\theta)</math> for the parameter <math>\theta</math>. When one has samples <math>X</math> that are independently drawn from the underlying distribution, one can derive the so-called posterior distribution <math>P(\theta|X)</math>. This posterior can be used to update the probability mass function for a new sample <math>x</math> given the observations <math>X</math>; one obtains
<math display="block">P_\theta (x | X) := \int d\theta\ P(x|\theta)\ P(\theta|X) .</math>
The variance of the newly obtained probability mass function can also be determined. The variance for a Bayesian probability mass function can be defined as
<math display="block">\sigma_{P_\theta(x|X)}^2 := \int d\theta\ \left[ P(x|\theta) - P_\theta(x|X) \right]^2\ P(\theta|X).</math>
This expression for the variance can be substantially simplified (assuming independently drawn samples). Defining the "self probability mass function" as
<math display="block">P_\theta(x|\left\{X,x\right\}) = \int d\theta\ P(x|\theta)\ P(\theta|\left\{X, x\right\}),</math>
one obtains for the variance<ref>{{Cite journal |last1=Pijlman |last2=Linnartz |date=2023 |title=Variance of Likelihood of data |url=https://sitb2023.ulb.be/proceedings/ |journal=SITB 2023 Proceedings |pages=34}}</ref>
<math display="block">\sigma_{P_\theta(x|X)}^2 = P_\theta(x|X) \left[ P_\theta(x|\left\{X,x\right\}) - P_\theta(x|X) \right].</math>
The expression for variance involves an additional fit that includes the sample <math>x</math> of interest.
[[File:CumList.png|thumb|left|List of probability distributions ranked by goodness of fit.<ref>[https://www.waterlog.info/cumfreq.htm Software for probability distribution fitting]</ref>]]
[[File:GEVdistrHistogr+Density.png|thumb|220px|Histogram and probability density of a data set fitting the [[GEV distribution]] ]]

Line 107 ⟶ 125:
* [[Mixture distribution]]
* [[Product distribution]]
{{clear}}
== References ==