{{Use dmy dates|date=April 2013}}
'''Probability bounds analysis (PBA)''' is a collection of methods of uncertainty propagation for making qualitative and quantitative calculations in the face of uncertainties of various kinds.
This [[upper and lower bounds|bounding]] approach permits analysts to make calculations without requiring overly precise assumptions about parameter values, dependence among variables, or even distribution shape.
The diverse methods comprising probability bounds analysis provide algorithms to evaluate mathematical expressions when there is uncertainty about the input values, their dependencies, or even the form of mathematical expression itself.
In the best case, the computed bounds are pointwise optimal, that is, the bounds could be no tighter without excluding some of the possible distributions.
P-boxes are usually, however, merely bounds on possible distributions.
==History of bounding probability==
| ___location = Amsterdam
| isbn = 0-444-11037-2 }}
</ref>
The [[Chebyshev's inequality|inequality attributed to Chebyshev]] described bounds on a distribution when only the mean and variance of the variable are known, and the related [[Markov_inequality|inequality]] attributed to [[Andrey Markov|Markov]] found bounds on a positive variable when only the mean is known.
[[Henry E. Kyburg, Jr.|Kyburg]]<ref name="kyburg99">Kyburg, H.E., Jr. (1999). [http://www.sipta.org/documentation/interval_prob/kyburg.pdf Interval valued probabilities]. SIPTA Documentation on Imprecise Probability.</ref> reviewed the history
of interval probabilities and traced the development of the critical ideas through the twentieth century.
Of particular note is [[Maurice René Fréchet|Fréchet]]'s derivation in the 1930s of bounds on calculations involving total probabilities without
dependence assumptions. Bounding probabilities has continued to the
present day (e.g., Walley's theory of [[imprecise probability]]<ref name="WALLEY1991">{{cite book
| last = Walley
| first = Peter
| title = Statistical Reasoning with Imprecise Probabilities
| publisher = Chapman and Hall
| year = 1991
| ___location = London
| isbn = 0-412-28660-2 }}</ref>).
The methods of probability bounds analysis that could be routinely used in
risk assessments were developed in the 1980s.
It is possible to mix very different kinds of knowledge together in a bounding analysis.
In some cases, we may not know whether a quantity varies or is a fixed constant.
In some cases, the shape or family of the distribution of a quantity may be known from mechanistic or physics-based arguments, but its parameters may be in doubt.
Further suppose that sparse data were used to form the 95% confidence limits for the distribution of ''C''.
Probability bounds analysis includes the important special case of [[dependency bounds analysis]]<ref name=WilliamsonDowns/> to compute bounds on the cumulative distribution of a function of random variables when only the marginal distributions of the variables are known, a problem originally posed by [[Andrey Kolmogorov|Kolmogorov]].
==Arithmetic expressions==
Arithmetic expressions involving operations such as additions, subtractions, multiplications, divisions, minima, maxima, powers, exponentials, logarithms, square roots, absolute values, etc., are commonly used in [[Probabilistic risk assessment|risk analyses]] and uncertainty modeling.
===Mathematical details===
Let {{Unicode|𝔻}} denote the space of distribution functions on the [[real number]]s {{Unicode|ℝ}}, i.e., {{Unicode|𝔻}} = {''D'' | ''D'' : {{Unicode|ℝ}} → [0,1], ''D''(''x'') ≤ ''D''(''y'') whenever ''x'' < ''y'', for all ''x'', ''y'' [[Naive_set_theory#Sets.2C_membership_and_equality|∈]] {{Unicode|ℝ}}}, and let {{Unicode|𝕀}} denote the set of real [[Interval (mathematics)|intervals]], i.e., {{Unicode|𝕀}} = {''i'' | ''i'' = [''i''<sub>1</sub>, ''i''<sub>2</sub>], ''i''<sub>1</sub> ≤ ''i''<sub>2</sub>, ''i''<sub>1</sub>, ''i''<sub>2</sub> ∈ {{Unicode|ℝ}}}.
If ''F'' is a [[distribution function]] and ''B'' is a [[p-box]], the notation ''F'' ∈ ''B'' means that ''F'' is an element of the p-box, i.e., a distribution function lying entirely within the bounds of ''B''.
A p-box may also encode constraints on moments and shape:
[[Expected_value|E]](''F'') ∈ [''m''<sub>1</sub>,''m''<sub>2</sub>],
[[Variance|V]](''F'') ∈ [''v''<sub>1</sub>,''v''<sub>2</sub>], and
''F'' ∈ '''B''', a specified family of distributions.
In some cases, there may be no information about the moments or distribution family other than what is
encoded in the two distribution functions that constitute the edges of the p-box.
In this case, the quintuple representing the p-box, {''B''<sub>1</sub>, ''B''<sub>2</sub>, [−∞, ∞], [0, ∞], {{Unicode|𝔻}}},
can be denoted more compactly as [''B''<sub>1</sub>, ''B''<sub>2</sub>].
This notation harkens to that of intervals on the real line, except that the endpoints are distributions rather than points.
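The idea of a p-box as an interval of distribution functions can be made concrete in code. The following Python sketch is illustrative only (it is not taken from any PBA library): it represents a p-box by its two bounding CDFs evaluated on a grid, and the `contains` helper, an assumed name, checks the condition ''F'' ∈ [''B''<sub>1</sub>, ''B''<sub>2</sub>] pointwise.

```python
import numpy as np
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

xs = np.linspace(-4.0, 4.0, 401)               # evaluation grid
Phi_xs = np.array([Phi(x) for x in xs])

# A p-box [B1, B2]: B1 is the left (upper) bound, B2 the right (lower) bound.
B1 = np.minimum(1.0, Phi_xs + 0.1)
B2 = np.maximum(0.0, Phi_xs - 0.1)

def contains(F_xs, B1, B2, tol=1e-12):
    """F is in the p-box when B2(x) <= F(x) <= B1(x) at every grid point."""
    return bool(np.all((B2 <= F_xs + tol) & (F_xs <= B1 + tol)))

inside  = contains(Phi_xs, B1, B2)                                # True
shifted = contains(np.array([Phi(x - 1.0) for x in xs]), B1, B2)  # False
```

A standard normal CDF lies inside this p-box by construction, while a CDF shifted by one unit escapes the bounds.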
The notation ''X'' ~ ''F'' denotes the fact that ''X'' is a random variable governed by the
distribution function ''F'', that is, ''F'' = ''F''(''x''): {{Unicode|ℝ}} → [0,1]: ''x'' ↦ Pr(''X'' ≤ ''x'').
Let us generalize the tilde notation for use with p-boxes. We can write
''X'' ~ ''B''
to mean that ''X'' is a random variable whose distribution function is unknown except that it is inside ''B''.
If ''X'' and ''Y'' are independent random variables with distributions ''F'' and ''G''
respectively, then ''X'' + ''Y'' = ''Z'' ~ ''H'' given by
:''H''(''z'') = <big>∫ </big><sub>''x''+''y''≤''z''</sub> d''F''(''x'') d''G''(''y'') = <big>∫ </big>{{su|p=∞|b=−∞}} ''F''(''z'' − ''y'') d''G''(''y'').
This operation is called a [[convolution]] on ''F'' and ''G''.
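For two particular distributions, the convolution can be approximated numerically. The Python sketch below (an illustration with arbitrarily chosen standard normal inputs, not a prescribed method from the article) estimates ''H'' for the sum of two independent normal variables by simulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(0.0, 1.0, n)   # X ~ F = N(0, 1)
y = rng.normal(0.0, 1.0, n)   # Y ~ G = N(0, 1), independent of X
z = x + y                     # Z = X + Y ~ H = N(0, 2)

def H(t):
    """Empirical estimate of H(t) = Pr(Z <= t)."""
    return float(np.mean(z <= t))
```

Here ''H'' is the normal distribution with mean 0 and variance 2, so the empirical estimate at zero should be close to one half.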
The generalization of this convolution to p-boxes is straightforward for sums.
Finding bounds on the distribution of a sum without making any assumption about the dependence between ''X'' and ''Y'' is actually easier than the problem assuming independence.
Makarov<ref name=Makarov/><ref name=Franketal87/><ref name=WilliamsonDowns/> showed that
:''Z'' ~ <big>[ sup</big><sub>''x''+''y''=''z''</sub> max(''F''(''x'') + ''G''(''y'') − 1, 0), <big>inf</big><sub>''x''+''y''=''z''</sub> min(''F''(''x'') + ''G''(''y''), 1) <big>]</big>
These bounds are implied by the [[copula_(probability_theory)#Fr.C3.A9chet.E2.80.93Hoeffding_copula_bounds|Fréchet–Hoeffding]] [[copula (probability theory)|copula]] bounds.
The convolution under the intermediate assumption that ''X'' and ''Y'' have [[positive quadrant dependence|positive dependence]] is likewise easy to compute, as is the convolution under the extreme assumptions of [[Comonotonicity|perfect positive]] or [[countermonotonicity|perfect negative]] dependency between ''X'' and ''Y''.<ref name=Fersonetal04 />
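The Makarov bounds above can be evaluated on a finite grid. The Python sketch below (the grid range, resolution, and the standard normal marginals are arbitrary illustrative choices) computes the bounds on Pr(''X'' + ''Y'' ≤ ''z'') with no assumption about dependence.

```python
import numpy as np
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def sum_bounds(F, G, z, xs):
    """Makarov bounds on Pr(X + Y <= z) with no dependence assumption:
    lower = sup_x max(F(x) + G(z - x) - 1, 0)
    upper = inf_x min(F(x) + G(z - x), 1)."""
    vals = np.array([F(x) + G(z - x) for x in xs])
    lower = max(vals.max() - 1.0, 0.0)
    upper = min(vals.min(), 1.0)
    return lower, upper

xs = np.linspace(-10.0, 10.0, 2001)
lo0, hi0 = sum_bounds(Phi, Phi, 0.0, xs)   # vacuous at the centre: [0, 1]
lo8, hi8 = sum_bounds(Phi, Phi, 8.0, xs)   # nearly certain in the far tail
```

At the centre of the sum's range the bounds are vacuous, reflecting how much the unknown dependence can matter; far in the tail both bounds approach one.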
Generalized convolutions for other operations such as subtraction, multiplication, division, etc., can be derived using transformations.
For instance, the negation of a p-box ''B'' = [''B''<sub>1</sub>, ''B''<sub>2</sub>] is −''B'' = [1 − ''B''<sub>2</sub>(−''x''), 1 − ''B''<sub>1</sub>(−''x'')], so a difference ''A'' − ''B'' can be computed as the sum ''A'' + (−''B'').
<!--
cumulative distribution function is convex on (−∞, 0.1] and concave on [0.1, ∞)
That is, ''A'' denotes all the probability distribution functions of normally distributed random variables whose mean is between 0.5 and 0.6, and whose variance is between 0.001 and 0.01.
''a'' = normal([0.5,0.6], sqrt([0.001,.01]))
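The bounds of such a distributional p-box can be computed pointwise. Because Φ((''x''−''μ'')/''σ'') is monotone in ''μ'' and, for each fixed sign of ''x''−''μ'', monotone in ''σ'', the envelope over the parameter rectangle is attained at its corners. The following Python sketch is an illustration of that observation (the grid is an arbitrary choice); note that the resulting envelope encloses every normal distribution in the class, though it also admits distributions outside it.

```python
import numpy as np
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

mus    = (0.5, 0.6)                   # interval for the mean
sigmas = (sqrt(0.001), sqrt(0.01))    # interval for the standard deviation

xs = np.linspace(0.0, 1.2, 601)
# Evaluate the normal CDF at all four corners of the parameter rectangle.
corners = np.array([[Phi((x - m) / s) for x in xs]
                    for m in mus for s in sigmas])
left  = corners.max(axis=0)   # left (upper) bounding CDF, B1
right = corners.min(axis=0)   # right (lower) bounding CDF, B2
```

The arrays `left` and `right` trace out the p-box for the class of normal distributions with mean in [0.5, 0.6] and variance in [0.001, 0.01].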
What can be inferred about the sum of these uncertain numbers depends on the assumptions about the stochastic dependence between the quantities.
If, for instance, they can be assumed to be independent, or related according to some other specific [[copula (probability theory)|copula]] or dependence function, the bounds on the sum will be tighter than they would be if their dependence were imprecisely specified (e.g., that they are positively related, or that their interaction can be characterized by a particular correlation coefficient).
can be computed in different ways as a result of assumptions made about the dependence among the quantities.
In this example, we compute bounds on the sum from only partial information about each of the respective random variables.
Shown below are the bounds on each of the four inputs and bounds on the sum, both with an assumption of independence and without any assumption about the dependence among the variables.
When the quantities are independent and their p-boxes are degenerate so they define particular distribution functions, the result of the probability bounds analysis is the same as would be obtained in a traditional probabilistic convolution such as is commonly implemented with Monte Carlo simulation.
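An independent sum of uncertain numbers can be sketched with a simplified version of the Williamson–Downs discretization: each quantity is represented by ''n'' quantile intervals, all ''n''² pairwise sums receive probability 1/''n''², and a condensation step rounds outward back to ''n'' quantiles. This Python fragment is illustrative only (the helper names and the block-wise condensation rule are assumptions of this sketch, not the published algorithm verbatim).

```python
import numpy as np

def interval_number(lo, hi, n=50):
    """Degenerate p-box for the interval [lo, hi]: n identical quantile intervals."""
    return np.full(n, float(lo)), np.full(n, float(hi))

def indep_sum(a, b):
    """Sum of two independent uncertain numbers given as (left, right) quantiles.
    All n*n pairwise sums get probability 1/n^2; condensation rounds outward:
    the left bound keeps the minimum of each block of n sorted sums, the
    right bound keeps the maximum of each block."""
    aL, aR = a
    bL, bR = b
    n = len(aL)
    L = np.sort((aL[:, None] + bL[None, :]).ravel())
    R = np.sort((aR[:, None] + bR[None, :]).ravel())
    left  = L[::n]          # block minima (sorted, so first of each block)
    right = R[n - 1::n]     # block maxima (last of each block)
    return left, right

# Interval arithmetic falls out as the degenerate case: [1,2] + [3,4] = [4,6].
sL, sR = indep_sum(interval_number(1, 2), interval_number(3, 4))
```

For degenerate interval inputs the result reduces to ordinary interval addition, which gives a quick sanity check on the condensation step.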
Figure 7. Bounds on the sum ''A''+''B''+''C''+''D'' under an assumption of independence and without any assumption about dependence.
a = lognormal1([.5,.6],
b = minmaxmode(0, 1, .3)
c = hist(0, 1, .2, .5, .6, .7, .75, .8)
d = uniform(0, 1)
e = a |+| b
The table below lists the summary statistical measures yielded by three analyses of this hypothetical calculation.
Summary Monte Carlo Independence General
variance 0.135 [ 0.086, 0.31] [ 0, 0.90]
Notice that, while the Monte Carlo simulation produces point estimates, the bounding analyses yield intervals for the various measures.
====Condensation====
<dt>Use of dependence operators
<dd>
<dt>Rearranging to reduce repeated uncertainties<dd>The dependency problem can be eliminated by replacing an expression to be evaluated by an algebraically equivalent expression in which no variable appears more than once.
For instance, when the expression ''a''/''x'' + ''b''/''x'' + ''c''/''x'' + ''d''/''x'' is replaced by the algebraically equivalent (''a''+''b''+''c''+''d'')/''x'', the dependence problem disappears because the rearranged expression has no repeated variables.
Likewise, the expression
''x''<sup>2</sup> − ''x'', in which ''x'' appears twice,
can be replaced
by the algebraically equivalent expression
(''x'' − 1/2)<sup>2</sup> − 1/4, in which it appears only once.
<dt>Subinterval reconstitution
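The effect of rearranging to remove repeated variables can be seen directly with interval arithmetic. In this Python sketch (the helper functions are hypothetical, written for this illustration), evaluating ''x''² − ''x'' naively treats the two occurrences of ''x'' as if they were independent, while the equivalent single-use form (''x'' − 1/2)² − 1/4 yields the exact range.

```python
def isub(x, y):
    """Interval subtraction; occurrences of x and y are treated as unrelated."""
    return (x[0] - y[1], x[1] - y[0])

def isqr(x):
    """Interval square, handling intervals that contain zero."""
    lo, hi = x
    if lo <= 0.0 <= hi:
        return (0.0, max(lo * lo, hi * hi))
    return (min(lo * lo, hi * hi), max(lo * lo, hi * hi))

x = (0.0, 1.0)

# Naive evaluation of x^2 - x: the repeated x inflates the interval.
naive = isub(isqr(x), x)                                          # (-1.0, 1.0)

# Single-use rearrangement (x - 1/2)^2 - 1/4: x appears once, so the
# result is the exact range of x^2 - x over [0, 1].
single_use = isub(isqr((x[0] - 0.5, x[1] - 0.5)), (0.25, 0.25))   # (-0.25, 0.0)
```

The naive interval [−1, 1] is four times wider than the true range [−0.25, 0], illustrating why removing repeated uncertainties tightens the bounds.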
==Logical expressions==
Logical or [[Boolean_function|Boolean expressions]] involving [[logical_conjunction|conjunctions]] ([[AND_gate|AND]] operations), [[logical_disjunction|disjunctions]] ([[OR_gate|OR]] operations), exclusive disjunctions, equivalences, conditionals, etc. arise in the analysis of fault trees and event trees common in risk assessments.
: P(A & B) = ''a'' × ''b''
:::: = [0.2, 0.25] × [0.1, 0.3]
:::: = [0.2 × 0.1, 0.25 × 0.3]
:::: = [0.02, 0.075]
so long as A and B can be assumed to be independent events.
If no assumption is made about the dependence between A and B, the Fréchet bounds give
: P(A & B) = env(max(0, ''a''+''b''−1), min(''a'', ''b''))
:::: = env(max(0, [0.2, 0.25]+[0.1, 0.3]−1), min([0.2, 0.25], [0.1, 0.3]))
:::: = env([max(0, 0.2+0.1−1), max(0, 0.25+0.3−1)], [min(0.2, 0.1), min(0.25, 0.3)])
:::: = env([0,0], [0.1, 0.25])
:::: = [0, 0.25]
where env([''x''<sub>1</sub>,''x''<sub>2</sub>], [''y''<sub>1</sub>,''y''<sub>2</sub>]) is [min(''x''<sub>1</sub>,''y''<sub>1</sub>), max(''x''<sub>2</sub>,''y''<sub>2</sub>)].
: P(A v B) = ''a'' + ''b'' − ''a'' × ''b''
:::: = 1 − (1 − ''a'') × (1 − ''b'')
:::: = 1 − (1 − [0.2, 0.25]) × (1 − [0.1, 0.3])
:::: = 1 − [0.525, 0.72]
:::: = [0.28, 0.475]
if A and B are independent events.
: P(A v B) = env(max(''a'', ''b''), min(1, ''a'' + ''b''))
:::: = env(max([0.2, 0.25], [0.1, 0.3]), min(1, [0.2, 0.25] + [0.1, 0.3]))
:::: = env([0.2, 0.3], [0.3, 0.55])
:::: = [0.2, 0.55].
It is also possible to compute interval bounds on the conjunction or disjunction under other assumptions about the dependence between A and B.
:Prob(A & B) = Prob(A) × Prob(B).
{| class="wikitable"
! Operation !! Formula
|-
| conjunction || [ max(0, ''a''+''b''−1), min(''a'', ''b'') ]
|-
| disjunction || [ max(''a'', ''b''), min(1, ''a''+''b'') ]
|}
a = [0.2, 0.25]
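The calculations in this section can be checked with a few lines of interval arithmetic. The Python helpers below are illustrative (not from any library); the product rule is valid here because probabilities are nonnegative intervals.

```python
def iadd(x, y): return (x[0] + y[0], x[1] + y[1])
def imul(x, y): return (x[0] * y[0], x[1] * y[1])   # valid for nonnegative intervals
def icompl(x):  return (1.0 - x[1], 1.0 - x[0])     # the complement 1 - x
def imax(x, y): return (max(x[0], y[0]), max(x[1], y[1]))
def imin(x, y): return (min(x[0], y[0]), min(x[1], y[1]))
def env(x, y):  return (min(x[0], y[0]), max(x[1], y[1]))

a = (0.2, 0.25)
b = (0.1, 0.3)

and_indep   = imul(a, b)                             # [0.02, 0.075]
and_frechet = env(imax((0.0, 0.0),
                       iadd(iadd(a, b), (-1.0, -1.0))),
                  imin(a, b))                        # [0, 0.25]
or_indep    = icompl(imul(icompl(a), icompl(b)))     # [0.28, 0.475]
or_frechet  = env(imax(a, b),
                  imin((1.0, 1.0), iadd(a, b)))      # [0.2, 0.55]
```

The four results reproduce the conjunction and disjunction intervals computed above, under independence and under the no-assumption Fréchet bounds respectively.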
==Magnitude comparisons==
The probability that an uncertain number represented by a p-box ''D'' is less than zero is the interval Pr(''D'' < 0) = [<u>''F''</u>(0), ''F̅''(0)], where ''F̅''(0) is the left bound of the probability box ''D'' and <u>''F''</u>(0) is its right bound, both evaluated at zero.
:''A'' < ''B'' = Pr(''A'' − ''B'' < 0)
:''A'' > ''B'' = Pr(''B'' − ''A'' < 0)
:''A'' ≤ ''B'' = Pr(''A'' − ''B'' ≤ 0)
:''A'' ≥ ''B'' = Pr(''B'' − ''A'' ≤ 0)
Thus the probability that ''A'' is less than ''B'' is the same as the probability that their difference is less than zero, and this probability can be said to be the value of the expression ''A'' < ''B''.
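For precisely specified distributions this probability can be estimated by simulation. The Python sketch below uses arbitrary example distributions (normal with means 1 and 2, both with unit variance, assumed independent) to evaluate ''A'' < ''B'' as Pr(''A'' − ''B'' < 0).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400_000
A = rng.normal(1.0, 1.0, n)   # A ~ N(1, 1)
B = rng.normal(2.0, 1.0, n)   # B ~ N(2, 1), independent of A

# A - B ~ N(-1, 2), so Pr(A < B) = Phi(1/sqrt(2)) ~ 0.760
p_less = float(np.mean(A - B < 0.0))
```

With p-box inputs, the same comparison would instead yield an interval, obtained by evaluating the bounds of the difference's p-box at zero.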
==Sampling-based computation==
Some analysts<ref>Alvarez, D. A., 2006. On the calculation of the bounds of probability of events using infinite random sets. ''International Journal of Approximate Reasoning'' '''43''': 241–267.</ref><ref>Baraldi, P., Popescu, I. C., Zio, E., 2008. Predicting the time to failure of a randomly degrading component by a hybrid Monte Carlo and possibilistic method. ''IEEE Proc. International Conference on Prognostics and Health Management''.</ref><ref>Batarseh, O. G., Wang, Y., 2008. Reliable simulation with input uncertainties using an interval-based approach. ''IEEE Proc. Winter Simulation Conference''.</ref><ref>Roy, Christopher J., and Michael S. Balch (2012). A holistic approach to uncertainty quantification with application to supersonic nozzle thrust. ''International Journal for Uncertainty Quantification'' [in press].</ref><ref>Zhang, H., Mullen, R. L., Muhanna, R. L. (2010). Interval Monte Carlo methods for structural reliability. ''Structural Safety'' '''32''': 183–190.</ref><ref>Zhang, H., Dai, H., Beer, M., Wang, W. (2012). Structural reliability analysis on the basis of small samples: an interval quasi-Monte Carlo method. ''Mechanical Systems and Signal Processing'' [in press].</ref> use sampling-based approaches to computing probability bounds, including [[Monte Carlo simulation]], [[Latin hypercube]] methods or [[importance sampling]].
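One such scheme, sometimes called interval Monte Carlo, propagates an epistemic interval through an aleatory simulation by inverting the CDF at both ends of the parameter interval for each random draw. The Python sketch below assumes an exponential model with an arbitrarily chosen rate interval; it is an illustration of the idea, not any particular published implementation.

```python
import numpy as np

rng = np.random.default_rng(4)
u = rng.random(200_000)            # shared uniform draws (aleatory uncertainty)

lam_lo, lam_hi = 1.0, 2.0          # epistemic interval for the exponential rate

# Inverse CDF of Exponential(lam): x = -ln(1 - u) / lam, decreasing in lam,
# so each uniform draw yields an interval of possible realizations.
x_left  = -np.log(1.0 - u) / lam_hi    # smaller realizations (rate at upper end)
x_right = -np.log(1.0 - u) / lam_lo    # larger realizations (rate at lower end)

# Statistics of the endpoints bracket the statistic's true range: the mean
# of Exponential(lam) is 1/lam, so these bounds approach [1/2, 1].
mean_bounds = (float(x_left.mean()), float(x_right.mean()))
```

Every summary statistic computed from the endpoint samples comes out as an interval, mirroring the interval results of the bounding analyses above.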
==Relationship to other uncertainty propagation approaches==
PBA belongs to a class of methods that use [[imprecise probability|imprecise probabilities]] to simultaneously represent [[Uncertainty_quantification|aleatoric and epistemic uncertainties]].
==Applications==
Value of information; dilation
===Bayesian inference of p-boxes===
Vicky
===Analysis of data consisting of intervals===
===Validation===
Doubt about the function that combines inputs
==Limitations and drawbacks==
Loses modal information; Cedric
==Generalizations==
-->