Probability bounds analysis
| ___location = Amsterdam
| isbn = 0-444-11037-2 }}
</ref> Also dating from the latter half of the 19th century, the [[Chebyshev inequality|inequality]] attributed to [[Chebyshev]] described bounds on a distribution when only the mean and
variance of the variable are known, and the related [[Markov inequality|inequality]] attributed to [[Andrey Markov|Markov]] found bounds on a
positive variable when only the mean is known.
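Both classical inequalities translate directly into computable bounds. A minimal illustrative sketch (the function names are not from the cited sources):

```python
# Illustrative bounds from limited moment information.
# Markov: for a nonnegative X with mean mu, P(X >= t) <= mu / t.
# Chebyshev: for X with variance v, P(|X - mean| >= k) <= v / k**2.

def markov_bound(mu, t):
    """Upper bound on P(X >= t) for nonnegative X with mean mu."""
    return min(1.0, mu / t)

def chebyshev_bound(v, k):
    """Upper bound on P(|X - mean| >= k) for X with variance v."""
    return min(1.0, v / k**2)

print(markov_bound(2.0, 10.0))    # 0.2
print(chebyshev_bound(4.0, 6.0))  # ≈ 0.111
```

Both bounds are vacuous (equal to 1) when the moment information is too weak, which the `min` clamp makes explicit.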
[[Henry E. Kyburg, Jr.|Kyburg]]<ref name="kyburg99">Kyburg, H.E., Jr. (1999). [http://www.sipta.org/documentation/interval_prob/kyburg.pdf Interval valued probabilities]. SIPTA Documention on Imprecise Probability.</ref> reviewed the history
 
The methods of probability bounds analysis that could be routinely used in
risk assessments were developed in the 1980s. Hailperin<ref name=Hailperin86 /> described a computational scheme for bounding logical calculations extending the ideas of Boole. Yager<ref name=Yager>Yager, R.R. (1986). Arithmetic and other operations on Dempster–Shafer structures. ''International Journal of Man-machine Studies'' '''25''': 357–366.</ref> described the elementary procedures by which bounds on [[convolution of probability distributions|convolutions]] can be computed under an assumption of independence. At about the same time, Makarov<ref name=Makarov>Makarov, G.D. (1981). Estimates for the distribution function of a sum of two random variables when the marginal distributions are fixed. ''Theory of Probability and Its Applications'' '''26''': 803–806.</ref> and, independently, Rüschendorf<ref>Rüschendorf, L. (1982). Random variables with maximum sums. ''Advances in Applied Probability'' '''14''': 623–632.</ref> solved the problem, originally posed by [[Kolmogorov]], of how to find the upper and lower bounds for the probability distribution of a sum of random variables whose marginal distributions, but not their joint distribution, are known. Frank et al.<ref name=Franketal87>Frank, M.J., R.B. Nelsen and B. Schweizer (1987). Best-possible bounds for the distribution of a sum—a problem of Kolmogorov. ''Probability Theory and Related Fields'' '''74''': 199–211.</ref> generalized the result of Makarov and expressed it in terms of [[Copula (probability theory)|copulas]]. Since that time, formulas and algorithms for sums have been generalized and extended to differences, products, quotients and other binary and unary functions under various dependence assumptions.<ref name=WilliamsonDowns>Williamson, R.C., and T. Downs (1990). Probabilistic arithmetic I: Numerical methods for calculating convolutions and dependency bounds. ''International Journal of Approximate Reasoning'' '''4''': 89–158.</ref><ref name=Fersonetal03>Ferson, S., V. Kreinovich, L. 
Ginzburg, D.S. Myers, and K. Sentz. (2003). [http://www.ramas.com/unabridged.zip ''Constructing Probability Boxes and Dempster–Shafer Structures'']. SAND2002-4015. Sandia National Laboratories, Albuquerque, NM.</ref><ref>Berleant, D. (1993). Automatically verified reasoning with both intervals and probability density functions. ''Interval Computations'' '''1993 (2) ''': 48–70.</ref><ref>Berleant, D., G. Anderson, and C. Goodman-Strauss (2008). Arithmetic on bounded families of distributions: a DEnv algorithm tutorial. Pages 183–210 in ''Knowledge Processing with Interval and Soft Computing'', edited by C. Hu, R.B. Kearfott, A. de Korvin and V. Kreinovich, Springer (ISBN 978-1-84800-325-5).</ref><ref name=BerleantGoodmanStrauss>Berleant, D., and C. Goodman-Strauss (1998). Bounding the results of arithmetic operations on random variables of unknown dependency using intervals. ''Reliable Computing'' '''4''': 147–165.</ref><ref name=Fersonetal04>Ferson, S., R. Nelsen, J. Hajagos, D. Berleant, J. Zhang, W.T. Tucker, L. Ginzburg and W.L. Oberkampf (2004). [http://www.ramas.com/depend.pdf ''Dependence in Probabilistic Modeling, Dempster–Shafer Theory, and Probability Bounds Analysis'']. Sandia National Laboratories, SAND2004-3072, Albuquerque, NM.</ref> <!--
 
It is possible to mix very different kinds of knowledge together in a bounding analysis. For instance,
 
===Mathematical details===
Let {{Unicode|&#x1D53B;}} denote the space of distribution functions on the [[real number]]s {{Unicode|ℝ}}, i.e., {{Unicode|&#x1D53B;}} = {''D'' | ''D'' : {{Unicode|ℝ}} → [0,1], ''D''(''x'') ≤ ''D''(''y'') whenever ''x'' < ''y'', for all ''x'', ''y'' [[Naive set theory#Sets, membership and equality|&isin;]] {{Unicode|ℝ}}}, and let {{Unicode|&#x1D540;}} denote the set of real [[Interval (mathematics)|intervals]], i.e., {{Unicode|&#x1D540;}} = {''i'' | ''i'' = [''i''<sub>1</sub>, ''i''<sub>2</sub>], ''i''<sub>1</sub> ≤ ''i''<sub>2</sub>, ''i''<sub>1</sub>, ''i''<sub>2</sub> ∈ {{Unicode|ℝ}}}. Then a p-box is a quintuple {''{{overbar|F}}'', <u>''F''</u>, ''m'', ''v'', '''F'''}, where ''{{overbar|F}}'', <u>''F''</u> ∈ {{Unicode|&#x1D53B;}}, while ''m'', ''v'' ∈ {{Unicode|&#x1D540;}}, and '''F''' ⊆ {{Unicode|&#x1D53B;}}. This quintuple denotes the set of distribution functions ''F'' ∈ '''F''' ⊆ {{Unicode|&#x1D53B;}} such that <u>''F''</u>(''x'') ≤ ''F''(''x'') ≤ ''{{overbar|F}}''(''x'') for all ''x'' ∈ {{Unicode|ℝ}}, and the mean and variance of ''F'' are in the intervals ''m'' and ''v'' respectively.
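The pointwise condition in this definition can be checked numerically on a grid. A minimal illustrative sketch, assuming the bounding distribution functions are given as ordinary Python functions (all names are illustrative, not from the article):

```python
# Illustrative check that a distribution function F lies inside a p-box
# described by lower and upper CDF bounds, evaluated on a finite grid.

def inside_pbox(F, lower, upper, xs):
    """True if lower(x) <= F(x) <= upper(x) at every grid point."""
    return all(lower(x) <= F(x) <= upper(x) for x in xs)

# Example: the uniform-on-[0,1] CDF between two shifted copies of itself.
F = lambda x: min(1.0, max(0.0, x))
low = lambda x: min(1.0, max(0.0, x - 0.1))
up = lambda x: min(1.0, max(0.0, x + 0.1))
xs = [i / 100 - 0.5 for i in range(201)]  # grid over [-0.5, 1.5]
print(inside_pbox(F, low, up, xs))  # True
```

A grid check of this kind is only an approximation of the condition "for all ''x'' ∈ ℝ"; rigorous implementations work with step-function bounds instead.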
 
If ''F'' is a [[distribution function]] and ''B'' is a [[p-box]], the notation ''F'' ∈ ''B'' means that ''F'' is an
[''v''<sub>1</sub>,''v''<sub>2</sub>], '''B'''}, that is,
''B''<sub>2</sub>(''x'') &le; ''F''(''x'') &le; ''B''<sub>1</sub>(''x''), for all ''x'' ∈ {{Unicode|ℝ}},
[[Expected value|E]](''F'') &isin; [''m''<sub>1</sub>,''m''<sub>2</sub>],
[[Variance|V]](''F'') &isin; [''v''<sub>1</sub>,''v''<sub>2</sub>], and
''F'' &isin; '''B'''. We sometimes say ''F'' is ''inside'' ''B''.
In some cases, there may be no information about the moments or distribution family other than what is
encoded in the two distribution functions that constitute the edges of the p-box. Then the quintuple
 
Finding bounds on the distribution of sums ''Z'' = ''X'' + ''Y''
''without making any assumption about the dependence'' between ''X''
and ''Y'' is actually easier than the problem assuming independence.
Makarov<ref name=Makarov/><ref name=Franketal87/><ref name=WilliamsonDowns/> showed that
:''Z'' ~ <big>[ sup</big><sub>x+y=z</sub> max(''F''(''x'') + ''G''(''y'') − 1, 0), <big>inf</big><sub>x+y=z</sub> min(''F''(''x'') + ''G''(''y''), 1) <big>]</big>.
 
These bounds are implied by the [[copula (probability theory)#Fréchet–Hoeffding copula bounds|Fréchet–Hoeffding]] [[copula (probability theory)|copula]] bounds. The problem can also be solved using the methods of [[mathematical programming]].<ref name=BerleantGoodmanStrauss />
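The displayed bounds can be evaluated pointwise by searching over the split point ''x''. A minimal illustrative sketch on a discrete grid (names and grid choice are illustrative; a finite grid only approximates the sup and inf):

```python
# Illustrative pointwise evaluation of the dependence-free (Fréchet) bounds
# on the CDF of Z = X + Y; a finite grid only approximates the sup and inf.

def frechet_sum_bounds(F, G, z, grid):
    """Approximate lower/upper bounds on P(X + Y <= z)."""
    lower = max(max(F(x) + G(z - x) - 1.0, 0.0) for x in grid)
    upper = min(min(F(x) + G(z - x), 1.0) for x in grid)
    return lower, upper

# Example: X and Y both uniform on [0, 1]; bound P(X + Y <= 0.5).
U = lambda t: min(1.0, max(0.0, t))
grid = [i / 1000 for i in range(1001)]
lo, hi = frechet_sum_bounds(U, U, 0.5, grid)
print(lo, hi)  # ≈ 0.0 0.5 (independence would give 0.125)
```

The width of the interval [0, 0.5] reflects how much the unknown dependence between ''X'' and ''Y'' can move this probability.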
 
The convolution under the intermediate assumption that ''X'' and ''Y'' have [[positive quadrant dependence|positive dependence]] is likewise easy to compute, as is the convolution under the extreme assumptions of [[Comonotonicity|perfect positive]] or [[countermonotonicity|perfect negative]] dependency between ''X'' and ''Y''.<ref name=Fersonetal04 />
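The extreme-dependence convolutions can be sketched through quantile functions: comonotone variables add like quantiles, countermonotone variables pair opposite quantiles. An illustrative sketch (the names are not from the cited sources):

```python
# Illustrative sums under extreme dependence, via quantile functions:
# comonotone (perfect positive) pairs like quantiles; countermonotone
# (perfect negative) pairs opposite quantiles.

def comonotone_sum(Finv, Ginv, ps):
    return sorted(Finv(p) + Ginv(p) for p in ps)

def countermonotone_sum(Finv, Ginv, ps):
    return sorted(Finv(p) + Ginv(1.0 - p) for p in ps)

# Uniform(0,1) has the identity as its quantile function.
ps = [i / 10 for i in range(11)]
print(comonotone_sum(lambda p: p, lambda p: p, ps))       # 0.0, 0.2, ..., 2.0
print(countermonotone_sum(lambda p: p, lambda p: p, ps))  # all 1.0
```

The countermonotone case illustrates how strongly dependence matters: the sum of two perfectly negatively dependent uniforms is the constant 1, not a spread-out distribution.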
 
==Logical expressions==
Logical or [[Boolean function|Boolean expressions]] involving [[logical conjunction|conjunctions]] ([[AND gate|AND]] operations), [[logical disjunction|disjunctions]] ([[OR gate|OR]] operations), exclusive disjunctions, equivalences, conditionals, etc. arise in the analysis of fault trees and event trees common in risk assessments. If the probabilities of events are characterized by intervals, as suggested by [[George Boole|Boole]]<ref name="BOOLE1854" /> and [[John Maynard Keynes|Keynes]]<ref name="kyburg99" /> among others, these binary operations are straightforward to evaluate. For example, if the probability of an event A is in the interval P(A) = ''a'' = [0.2, 0.25], and the probability of the event B is in P(B) = ''b'' = [0.1, 0.3], then the probability of the [[logical conjunction|conjunction]] is surely in the interval
: &nbsp;&nbsp;P(A & B) = ''a'' &times; ''b''
:::: = [0.2, 0.25] &times; [0.1, 0.3]
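Because probabilities lie in [0, 1], both endpoints are nonnegative and the endpoint products give the interval for the conjunction under independence. An illustrative check of the interval product above:

```python
# Illustrative interval arithmetic for the conjunction of independent
# events: for probabilities in [0, 1], endpoint products give the result.

def interval_and(a, b):
    (a1, a2), (b1, b2) = a, b
    return (a1 * b1, a2 * b2)

print(interval_and((0.2, 0.25), (0.1, 0.3)))  # ≈ (0.02, 0.075)
```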
 
Prob(A or B) = Prob(A) + Prob(B) − Prob(A) × Prob(B)
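This inclusion–exclusion formula has the algebraically equivalent form 1 − (1 − p)(1 − q), which mentions each probability only once; that property matters when the formula is later evaluated with interval inputs, where repeated variables inflate the result. An illustrative check:

```python
# Illustrative check that the inclusion-exclusion form of disjunction for
# independent events equals the De Morgan form 1 - (1-p)(1-q), which
# mentions each probability once.

def prob_or(p, q):
    return p + q - p * q

def prob_or_demorgan(p, q):
    return 1.0 - (1.0 - p) * (1.0 - q)

print(prob_or(0.2, 0.1))           # ≈ 0.28
print(prob_or_demorgan(0.2, 0.1))  # ≈ 0.28
```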
 
 
Operation Formula
 
==Sampling-based computation==
Some analysts<ref>Alvarez, D. A., 2006. On the calculation of the bounds of probability of events using infinite random sets. ''International Journal of Approximate Reasoning'' '''43''': 241–267.</ref><ref>Baraldi, P., Popescu, I. C., Zio, E., 2008. Predicting the time to failure of a randomly degrading component by a hybrid Monte Carlo and possibilistic method. ''IEEE Proc. International Conference on Prognostics and Health Management''.</ref><ref>Batarseh, O. G., Wang, Y., 2008. Reliable simulation with input uncertainties using an interval-based approach. ''IEEE Proc. Winter Simulation Conference''.</ref><ref>Roy, Christopher J., and Michael S. Balch (2012). A holistic approach to uncertainty quantification with application to supersonic nozzle thrust. ''International Journal for Uncertainty Quantification'' [in press].</ref><ref>Zhang, H., Mullen, R. L., Muhanna, R. L. (2010). Interval Monte Carlo methods for structural reliability. ''Structural Safety'' '''32''': 183–190.</ref><ref>Zhang, H., Dai, H., Beer, M., Wang, W. (2012). Structural reliability analysis on the basis of small samples: an interval quasi-Monte Carlo method. ''Mechanical Systems and Signal Processing'' [in press].</ref> use sampling-based approaches to computing probability bounds, including [[Monte Carlo simulation]], [[Latin hypercube]] methods or [[importance sampling]]. These approaches cannot assure mathematical rigor in the result because such simulation methods are approximations, although their performance can generally be improved simply by increasing the number of replications in the simulation. Thus, unlike the analytical theorems or methods based on mathematical programming, sampling-based calculations usually cannot produce [[verified computing|verified computations]]. However, sampling-based methods can be very useful in addressing a variety of problems which are computationally [[NP-hard|difficult]] to solve analytically or even to rigorously bound. 
One important example is the use of Cauchy-deviate sampling to avoid the [[curse of dimensionality]] in propagating [[Interval (mathematics)|interval]] uncertainty through high-dimensional problems.<ref>Trejo, R., Kreinovich, V. (2001). [http://www.cs.utep.edu/vladik/2000/tr00-17.pdf Error estimations for indirect measurements: randomized vs. deterministic algorithms for ‘black-box’ programs]. ''Handbook on Randomized Computing'', S. Rajasekaran, P. Pardalos, J. Reif, and J. Rolim (eds.), Kluwer, 673–729.</ref>
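The idea behind Cauchy-deviate sampling can be sketched as follows: for an approximately linear function, perturbing each input by a Cauchy deviate scaled to its interval half-width makes the output deviation Cauchy-distributed with scale equal to the half-width of the output range, which is then estimated from the samples. An illustrative sketch under that linearity assumption (the function names and the median-based estimator are illustrative choices, not necessarily those of the cited source):

```python
import math
import random

# Illustrative Cauchy-deviate sampling for an approximately linear f:
# perturb each input by a Cauchy deviate scaled to its interval half-width;
# the output deviation is then Cauchy with scale equal to the half-width of
# f's range, estimated here by the median of absolute deviations.

def cauchy_halfwidth(f, mids, halfwidths, n=10000, seed=1):
    random.seed(seed)
    f0 = f(mids)
    devs = sorted(
        abs(f([m + h * math.tan(math.pi * (random.random() - 0.5))
               for m, h in zip(mids, halfwidths)]) - f0)
        for _ in range(n))
    return devs[n // 2]  # median |d| of a Cauchy(0, s) sample estimates s

# Linear test: f = 2*x1 + 3*x2 with half-widths 0.1 and 0.2
# -> true range half-width 2*0.1 + 3*0.2 = 0.8.
f = lambda x: 2 * x[0] + 3 * x[1]
print(cauchy_halfwidth(f, [1.0, 1.0], [0.1, 0.2]))  # ≈ 0.8
```

The number of samples needed depends only on the desired accuracy, not on the number of inputs, which is what lets the method sidestep the curse of dimensionality.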
 
==Relationship to other uncertainty propagation approaches==
PBA belongs to a class of methods that use [[imprecise probability|imprecise probabilities]] to simultaneously represent [[Uncertainty quantification|aleatoric and epistemic uncertainties]]. PBA is a generalization of both [[interval analysis]] and probabilistic [[convolution of probability distributions|convolution]] such as is commonly implemented with [[Monte Carlo simulation]]. PBA is also closely related to [[robust Bayes analysis]], which is sometimes called [[Bayesian sensitivity analysis]]. PBA is an alternative to [[second-order Monte Carlo simulation]].
 
==Applications==