Probability plot correlation coefficient plot: Difference between revisions

Definition: Repairing links to disambiguation pages - You can help! using AWB
m Disambiguating links to Probability plot (link removed; intentional link to DAB) using DisamAssist.
 
(6 intermediate revisions by 6 users not shown)
Line 1:
The '''probability plot correlation coefficient (PPCC) plot''' is a [[graphical technique]] for identifying the shape parameter for a distributional family that best describes the data set. This technique is appropriate for families, such as the [[Weibull distribution|Weibull]], that are defined by a single shape parameter and [[location parameter|location]] and [[scale parameter]]s, and it is not appropriate or even possible for distributions, such as the [[normal distribution|normal]], that are defined only by location and scale parameters.
 
Many [[statistical analysis|statistical analyses]] are based on distributional assumptions about the [[population (statistics)|population]] from which the data have been obtained. However, distributional families can have radically different shapes depending on the value of the [[shape parameter]]. Therefore, finding a reasonable choice for the shape parameter is a necessary step in the analysis. In many analyses, finding a good distributional model for the data is the primary focus of the analysis.
 
The technique is simply "plot the [[probability plot correlation coefficient]]s for different values of the shape parameter, and choose whichever value yields the best fit".
Line 8:
The PPCC plot is formed by:
*Vertical axis: [[Probability plot correlation coefficient]];
*Horizontal axis: Value of shape parameter.
That is, for each of a series of values of the shape parameter, the [[Pearson product-moment correlation coefficient|correlation coefficient]] is computed for the [[probability plot]] associated with that value. These correlation coefficients are plotted against their corresponding shape parameters. The maximum correlation coefficient corresponds to the optimal value of the shape parameter. For better precision, two iterations of the PPCC plot can be generated: the first finds the right neighborhood and the second fine-tunes the estimate.
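
The two-pass search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the Weibull family, plotting positions from Filliben's approximation to the uniform order statistic medians, and hypothetical helper names (<code>filliben_medians</code>, <code>weibull_ppcc</code>, <code>ppcc_curve</code>) and grid spacings chosen for the example rather than taken from any particular library.

```python
import math
import random


def filliben_medians(n):
    """Approximate uniform order statistic medians (assumed plotting positions)."""
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    return m


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def weibull_ppcc(data, shape):
    """Correlation of the Weibull probability plot for one shape value."""
    ordered = sorted(data)
    # Standard Weibull quantile function (location 0, scale 1).
    quantiles = [(-math.log(1.0 - p)) ** (1.0 / shape)
                 for p in filliben_medians(len(ordered))]
    return pearson_r(quantiles, ordered)


def ppcc_curve(data, shapes):
    """(shape, PPCC) pairs: the points that would be drawn on the PPCC plot."""
    return [(k, weibull_ppcc(data, k)) for k in shapes]


# Illustrative data only: a seeded Weibull sample with scale 1 and shape 2.
rng = random.Random(0)
sample = [rng.weibullvariate(1.0, 2.0) for _ in range(200)]

# First pass: coarse grid (0.5, 1.0, ..., 5.0) to find the right neighborhood.
coarse = ppcc_curve(sample, [0.5 + 0.5 * i for i in range(10)])
best_coarse = max(coarse, key=lambda pair: pair[1])[0]

# Second pass: finer grid around the coarse maximum, keeping shapes positive.
fine_shapes = [best_coarse - 0.5 + 0.05 * i for i in range(21)]
fine = ppcc_curve(sample, [k for k in fine_shapes if k > 0])
best_shape, best_r = max(fine, key=lambda pair: pair[1])

print(best_shape, best_r)
```

Plotting the pairs in <code>fine</code> with the shape value on the horizontal axis and the correlation on the vertical axis gives the PPCC plot itself; the sketch only locates its maximum.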
 
The PPCC plot is used first to find a good value of the shape parameter. The probability plot is then generated to find estimates of the location and scale parameters and to provide a graphical assessment of the adequacy of the distributional fit.
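
As a rough illustration of this second step, the sketch below fits a straight line to a Weibull probability plot for a fixed shape value: the intercept serves as the location estimate and the slope as the scale estimate. The function name <code>weibull_probability_plot_fit</code>, the choice of plotting positions (Filliben's approximation to the uniform order statistic medians) and the simulated data are assumptions made for the example.

```python
import math
import random


def weibull_probability_plot_fit(data, shape):
    """Least-squares line through the Weibull probability plot.

    Returns (location, scale, r): intercept, slope and correlation coefficient
    of ordered data against standard Weibull quantiles for the given shape.
    """
    ordered = sorted(data)
    n = len(ordered)
    # Assumed plotting positions: approximate uniform order statistic medians.
    medians = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    medians[-1] = 0.5 ** (1.0 / n)
    medians[0] = 1.0 - medians[-1]
    q = [(-math.log(1.0 - p)) ** (1.0 / shape) for p in medians]

    mq, my = sum(q) / n, sum(ordered) / n
    sqq = sum((a - mq) ** 2 for a in q)
    syy = sum((b - my) ** 2 for b in ordered)
    sqy = sum((a - mq) * (b - my) for a, b in zip(q, ordered))

    slope = sqy / sqq                 # scale estimate
    intercept = my - slope * mq       # location estimate
    r = sqy / math.sqrt(sqq * syy)    # adequacy of the fit
    return intercept, slope, r


# Illustrative data only: location 10, scale 3, shape 2.
rng = random.Random(1)
sample = [10.0 + rng.weibullvariate(3.0, 2.0) for _ in range(500)]

location, scale, r = weibull_probability_plot_fit(sample, shape=2.0)
print(location, scale, r)
```

In practice the shape passed to the function would be the value found from the PPCC plot rather than a known constant.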
Line 17:
#Does the best-fit member provide a good fit (in terms of generating a probability plot with a high correlation coefficient)?
#Does this distributional family provide a good fit compared to other distributions?
#How sensitive is the choice of the shape parameter?
 
==Comparing distributions==
Line 25:
 
==Tukey-lambda PPCC plot for symmetric distributions==
{{see also|Tukey lambda distribution}}
The Tukey lambda PPCC plot, with shape parameter λ, is particularly useful for symmetric distributions. It indicates whether a distribution is short- or long-tailed, and it can further indicate several common distributions. Specifically,
#λ = −1: distribution is approximately [[Cauchy distribution|Cauchy]]
#λ = 0: distribution is exactly [[logistic distribution|logistic]]
#λ = 0.14: distribution is approximately normal
#λ = 0.5: distribution is U-shaped
#λ = 1: distribution is exactly [[continuous uniform distribution|uniform]](−1, 1)
If the Tukey lambda PPCC plot gives a maximum value near 0.14, one can reasonably conclude that the normal distribution is a good model for the data. If the maximum value is less than 0.14, a [[long-tailed distribution]] such as the [[Laplace distribution|double exponential]] or logistic would be a better choice. If the maximum value is near −1, this implies the selection of a very long-tailed distribution, such as the Cauchy. If the maximum value is greater than 0.14, this implies a [[short-tailed distribution]] such as the [[Beta distribution|Beta]] or uniform.
 
The Tukey-lambda PPCC plot is used to suggest an appropriate distribution. One should follow up with PPCC and probability plots of the appropriate alternatives.
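
A minimal sketch of the Tukey-lambda PPCC computation follows, using the closed-form Tukey lambda quantile function Q(p) = (p<sup>λ</sup> − (1 − p)<sup>λ</sup>)/λ, with the logistic quantile ln(p/(1 − p)) at λ = 0. The helper names, the grid from −1 to 1 in steps of 0.02, the plotting positions and the simulated normal sample are all assumptions made for the example.

```python
import math
import random


def tukey_lambda_quantile(p, lam):
    """Quantile function of the standard Tukey lambda distribution."""
    if abs(lam) < 1e-12:
        return math.log(p / (1.0 - p))      # limiting (logistic) case
    return (p ** lam - (1.0 - p) ** lam) / lam


def tukey_lambda_ppcc(data, lam):
    """Correlation of the Tukey lambda probability plot for one lambda."""
    ordered = sorted(data)
    n = len(ordered)
    # Assumed plotting positions: approximate uniform order statistic medians.
    medians = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    medians[-1] = 0.5 ** (1.0 / n)
    medians[0] = 1.0 - medians[-1]
    q = [tukey_lambda_quantile(p, lam) for p in medians]

    mq, my = sum(q) / n, sum(ordered) / n
    sqq = sum((a - mq) ** 2 for a in q)
    syy = sum((b - my) ** 2 for b in ordered)
    sqy = sum((a - mq) * (b - my) for a, b in zip(q, ordered))
    return sqy / math.sqrt(sqq * syy)


# Illustrative data only: a seeded standard normal sample.
rng = random.Random(2)
sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]

# Grid of lambda values from -1 to 1; integer arithmetic keeps 0 exact.
lambdas = [(i - 50) / 50 for i in range(101)]
curve = [(lam, tukey_lambda_ppcc(sample, lam)) for lam in lambdas]
best_lambda, best_r = max(curve, key=lambda pair: pair[1])

print(best_lambda, best_r)
```

For normal data the maximum would be expected to fall in the vicinity of 0.14, although sampling variability can move it noticeably in either direction.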
 
==See also==
*[[Probability plot (disambiguation)|Probability plot]]
 
==External links==
Line 46:
|last=Filliben
|first=J. J.
|date=February 1975
|title = The Probability Plot Correlation Coefficient Test for Normality
|journal = Technometrics
Line 54 ⟶ 53:
|volume = 17
|issue = 1
|url=http://jstor.org/stable/1268008
}}
 
{{NIST-PD}}
 
 
[[Category:Statistical charts and diagrams]]