Probability bounds analysis

This [[upper and lower bounds|bounding]] approach permits analysts to make calculations without requiring overly precise assumptions about parameter values, dependence among variables, or even distribution shape. Probability bounds analysis is essentially a combination of the methods of standard [[interval analysis]] and classical [[probability theory]]. It gives the same answer as interval analysis when only range information is available, and the same answer as [[Monte Carlo simulation]] when information is abundant enough to precisely specify input distributions and their dependencies. Thus, it is a generalization of both interval analysis and probability theory.
 
The diverse methods comprising probability bounds analysis provide algorithms to evaluate mathematical expressions when there is uncertainty about the input values, their dependencies, or even the form of the mathematical expression itself. The calculations yield results that are guaranteed to enclose all possible distributions of the output variable if the input [[probability box|p-boxes]] were also sure to enclose their respective distributions. In some cases, a calculated p-box will also be best-possible in the sense that
the bounds could be no tighter without excluding some of the possible
distributions.
 
===Mathematical details===
Let {{Unicode|&#x1D53B;}} denote the space of distribution functions on the [[real number]]s {{Unicode|ℝ}}, i.e., {{Unicode|&#x1D53B;}} = {''D'' | ''D'' : {{Unicode|ℝ}} → [0,1], ''D''(''x'') ≤ ''D''(''y'') whenever ''x'' < ''y'', for all ''x'', ''y'' [[Naive set theory#Sets.2C membership and equality|&isin;]] {{Unicode|ℝ}}}, and let {{Unicode|&#x1D540;}} denote the set of real [[Interval (mathematics)|intervals]], i.e., {{Unicode|&#x1D540;}} = {''i'' | ''i'' = [''i''<sub>1</sub>, ''i''<sub>2</sub>], ''i''<sub>1</sub> ≤ ''i''<sub>2</sub>, ''i''<sub>1</sub>, ''i''<sub>2</sub> ∈ {{Unicode|ℝ}}}. Then a p-box is a quintuple {''{{overbar|F}}'', <u>''F''</u>, ''m'', ''v'', '''F'''}, where ''{{overbar|F}}'', <u>''F''</u> ∈ {{Unicode|&#x1D53B;}}, while ''m'', ''v'' ∈ {{Unicode|&#x1D540;}}, and '''F''' ⊆ {{Unicode|&#x1D53B;}}. This quintuple denotes the set of distribution functions ''F'' ∈ '''F''' ⊆ {{Unicode|&#x1D53B;}} such that <u>''F''</u>(''x'') ≤ ''F''(''x'') ≤ ''{{overbar|F}}''(''x'') for all ''x'' ∈ {{Unicode|ℝ}}, and the mean and variance of ''F'' are in the intervals ''m'' and ''v'' respectively.
 
If ''F'' is a [[distribution function]] and ''B'' is a [[p-box]], the notation ''F'' ∈ ''B'' means that ''F'' is an
element of ''B'' = {''B''<sub>1</sub>, ''B''<sub>2</sub>, [''m''<sub>1</sub>,''m''<sub>2</sub>],
[''v''<sub>1</sub>,''v''<sub>2</sub>], '''B'''}, that is,
''B''<sub>2</sub>(''x'') &le; ''F''(''x'') &le; ''B''<sub>1</sub>(''x''), for all ''x'' ∈ {{Unicode|ℝ}},
[[Expected value|E]](''F'') &isin; [''m''<sub>1</sub>,''m''<sub>2</sub>],
[[Variance|V]](''F'') &isin; [''v''<sub>1</sub>,''v''<sub>2</sub>], and
''F'' &isin; '''B'''. We sometimes say ''F'' is ''inside'' ''B''.
In some cases, there may be no information about the moments or distribution family other than what is
encoded in the two distribution functions that constitute the edges of the p-box. Then the quintuple
representing the p-box {''B''<sub>1</sub>, ''B''<sub>2</sub>, [−∞,∞], [0,∞], {{Unicode|&#x1D53B;}}}
can be denoted more compactly as [''B''<sub>1</sub>, ''B''<sub>2</sub>]. This notation is reminiscent of
that of intervals on the real line, except that the endpoints are distributions rather than points.
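In software, a p-box of this compact form can be represented by discretizing its two bounding distribution functions on a common grid. The following Python sketch is purely illustrative (the PBox class and its methods are hypothetical, not from any established library):

```python
import bisect

class PBox:
    """A p-box stored as two non-decreasing CDF bounds sampled on a grid.

    lower[i] <= upper[i] give the bounds on F(x[i]); the degenerate case
    lower == upper is an ordinary (discretized) distribution function.
    """
    def __init__(self, xs, lower_cdf, upper_cdf):
        assert all(l <= u for l, u in zip(lower_cdf, upper_cdf))
        self.xs = list(xs)            # grid of x values (sorted)
        self.lower = list(lower_cdf)  # pointwise lower bound on F(x)
        self.upper = list(upper_cdf)  # pointwise upper bound on F(x)

    def bounds_at(self, x):
        """Return the interval [lower, upper] enclosing F(x).

        The grid is assumed to cover the support, so F(x) = 0
        to the left of it.
        """
        i = bisect.bisect_right(self.xs, x) - 1
        if i < 0:
            return (0.0, 0.0)
        return (self.lower[i], self.upper[i])
```

A distribution F is "inside" this p-box when its CDF passes through every vertical interval returned by bounds_at.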
 
The notation ''X'' ~ ''F'' denotes the fact that ''X'' ∈ {{Unicode|ℝ}} is a random variable governed by the
distribution function ''F'', that is, ''F'' : {{Unicode|ℝ}} → [0,1] : ''x'' → Pr(''X'' &le; ''x'').
Let us generalize the tilde notation for use with p-boxes. We will write
''X'' ~ ''B''
to mean that ''X'' is a random variable whose distribution function is unknown except that it is inside ''B''.
Thus,
''X'' ~ ''F'' &isin; ''B''
can be contracted to ''X'' ~ ''B'' without mentioning the distribution function explicitly.
 
If ''X'' and ''Y'' are independent random variables with distributions ''F'' and ''G''
respectively, then ''X'' + ''Y'' = ''Z'' ~ ''H'' given by
:''H''(''z'') = <big>&int; </big><sub>''x''+''y''&le;''z''</sub> d''F''(''x'') d''G''(''y'') = <big>&int; </big>{{su|''p''=&infin;|''b''=−&infin;}} ''F''(''z'' − ''y'') d''G''(''y'') = ''F'' * ''G''.
This operation is called a [[convolution]] on ''F'' and ''G''. The analogous operation on
p-boxes is straightforward for sums.
Suppose
:''X'' ~ ''A'' = [''A''<sub>1</sub>, ''A''<sub>2</sub>] and
:''Y'' ~ ''B'' = [''B''<sub>1</sub>, ''B''<sub>2</sub>].
If ''X'' and ''Y'' are stochastically independent, then the distribution of ''Z''=''X''+''Y'' is
inside the p-box
:[''A''<sub>1</sub> * ''B''<sub>1</sub>, ''A''<sub>2</sub> * ''B''<sub>2</sub>]. <!-- when there are subscripts, the convolution asterisk looks better not italicized -->
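Numerically, each bound can be discretized into ''n'' equiprobable quantiles, and the convolution under independence approximated by forming all ''n''&sup2; pairwise sums, each with probability 1/''n''&sup2;. This Python sketch illustrates the idea (function names are hypothetical; practical implementations such as Williamson and Downs's also condense the result back to ''n'' quantiles, which is omitted here):

```python
def convolve_quantiles(qa, qb):
    """Approximate convolution of two distributions under independence.

    qa, qb: sorted lists of n equiprobable quantiles of each distribution.
    Returns sorted quantiles of the sum: all n*n pairwise sums, each
    carrying probability 1/n^2 -- a discrete approximation of F * G.
    """
    return sorted(x + y for x in qa for y in qb)

def sum_pbox(a1, a2, b1, b2):
    """Sum of independent p-boxes [A1, A2] and [B1, B2] in quantile form.

    Convolve corresponding bounds: A1 with B1, and A2 with B2.
    """
    return convolve_quantiles(a1, b1), convolve_quantiles(a2, b2)
```

The returned pair of quantile lists discretizes the p-box [''A''<sub>1</sub> * ''B''<sub>1</sub>, ''A''<sub>2</sub> * ''B''<sub>2</sub>] enclosing the distribution of the sum.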
 
Finding bounds on the distribution of sums ''Z'' = ''X'' + ''Y''
''without making any assumption about the dependence'' between ''X''
and ''Y'' is actually easier than the problem assuming independence.
Makarov<ref name=Makarov/><ref name=Franketal87/><ref name=WilliamsonDowns/> showed that
:''Z'' ~ <big>[ sup</big><sub>''x''+''y''=''z''</sub> max(''F''(''x'') + ''G''(''y'') − 1, 0), <big>inf</big><sub>''x''+''y''=''z''</sub> min(''F''(''x'') + ''G''(''y''), 1) <big>]</big>.
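The sup and inf in these bounds can be approximated by searching over a finite grid of ''x'' values with ''y'' = ''z'' − ''x''. A minimal Python sketch (the function name and grid are illustrative assumptions):

```python
def makarov_bounds(F, G, z, grid):
    """Pointwise bounds on H(z) = Pr(X + Y <= z), no dependence assumed.

    F, G: CDFs as callables; grid: finite set of x values approximating
    the sup/inf over x + y = z, i.e. y = z - x.
    Lower bound:  sup_x max(F(x) + G(z - x) - 1, 0)
    Upper bound:  inf_x min(F(x) + G(z - x), 1)
    """
    lo = max(max(F(x) + G(z - x) - 1.0, 0.0) for x in grid)
    hi = min(min(F(x) + G(z - x), 1.0) for x in grid)
    return lo, hi
```

For example, with ''X'' and ''Y'' both uniform on [0,1], the bounds at ''z'' = 1 are the vacuous [0, 1]: without a dependence assumption, Pr(''X''+''Y'' ≤ 1) really can be anywhere from 0 to 1.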
 
==Logical expressions==
Logical or [[Boolean function|Boolean expressions]] involving [[logical conjunction|conjunctions]] ([[AND gate|AND]] operations), [[logical disjunction|disjunctions]] ([[OR gate|OR]] operations), exclusive disjunctions, equivalences, conditionals, etc. arise in the analysis of fault trees and event trees common in risk assessments. If the probabilities of events are characterized by intervals, as suggested by [[George Boole|Boole]]<ref name="BOOLE1854" /> and [[John Maynard Keynes|Keynes]]<ref name="kyburg99" /> among others, these binary operations are straightforward to evaluate. For example, if the probability of an event A is in the interval P(A) = ''a'' = [0.2, 0.25], and the probability of the event B is in P(B) = ''b'' = [0.1, 0.3], then the probability of the [[logical conjunction|conjunction]] is surely in the interval
: &nbsp;&nbsp;P(A & B) = ''a'' × ''b''
:::: = [0.2, 0.25] × [0.1, 0.3]
:::: = [0.2 × 0.1, 0.25 × 0.3]
:::: = [0.02, 0.075]
so long as A and B can be assumed to be independent events. If they are not independent, we can still bound the conjunction using the classical [[Fréchet inequalities|Fréchet inequality]]. In this case, we can infer at least that the probability of the joint event A & B is surely within the interval
: &nbsp;&nbsp;P(A & B) = env(max(0, ''a''+''b''−1), min(''a'', ''b''))
:::: = env(max(0, [0.2, 0.25]+[0.1, 0.3]−1), min([0.2, 0.25], [0.1, 0.3]))
:::: = env(max(0, [−0.7, −0.45]), min([0.2, 0.25], [0.1, 0.3]))
:::: = env([0, 0], [0.1, 0.25])
:::: = [0, 0.25]
where env([''x''<sub>1</sub>,''x''<sub>2</sub>], [''y''<sub>1</sub>,''y''<sub>2</sub>]) is [min(''x''<sub>1</sub>,''y''<sub>1</sub>), max(''x''<sub>2</sub>,''y''<sub>2</sub>)]. Likewise, the probability of the [[logical disjunction|disjunction]] is surely in the interval
: &nbsp;&nbsp;P(A v B) = ''a'' + ''b'' − ''a'' × ''b'' = 1 − (1 − ''a'') × (1 − ''b'')
:::: = 1 − (1 − [0.2, 0.25]) × (1 − [0.1, 0.3])
:::: = 1 − [0.75, 0.8] × [0.7, 0.9]
:::: = 1 − [0.525, 0.72]
:::: = [0.28, 0.475]
if A and B are independent events. If they are not independent, the Fréchet inequality bounds the disjunction
: &nbsp;&nbsp;P(A v B) = env(max(''a'', ''b''), min(1, ''a'' + ''b''))
:::: = env(max([0.2, 0.25], [0.1, 0.3]), min(1, [0.2, 0.25] + [0.1, 0.3]))
:::: = env([0.2, 0.3], [0.3, 0.55])
:::: = [0.2, 0.55].
 
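Because the conjunction and disjunction are monotone in the event probabilities, the interval calculations above reduce to simple endpoint formulas. A Python sketch of the four operations used in the examples (function names are illustrative; probability intervals are (lo, hi) pairs):

```python
def and_indep(a, b):
    """P(A & B) = a * b, assuming A and B are independent."""
    return (a[0] * b[0], a[1] * b[1])

def and_frechet(a, b):
    """Frechet bounds on P(A & B): no dependence assumption at all."""
    return (max(0.0, a[0] + b[0] - 1.0), min(a[1], b[1]))

def or_indep(a, b):
    """P(A v B) = 1 - (1 - a)(1 - b), assuming independence."""
    return (1.0 - (1.0 - a[0]) * (1.0 - b[0]),
            1.0 - (1.0 - a[1]) * (1.0 - b[1]))

def or_frechet(a, b):
    """Frechet bounds on P(A v B): no dependence assumption at all."""
    return (max(a[0], b[0]), min(1.0, a[1] + b[1]))
```

With ''a'' = [0.2, 0.25] and ''b'' = [0.1, 0.3] these reproduce the intervals worked out above: [0.02, 0.075] and [0, 0.25] for the conjunction, [0.28, 0.475] and [0.2, 0.55] for the disjunction.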
It is also possible to compute interval bounds on the conjunction or disjunction under other assumptions about the dependence between A and B. For instance, one might assume they are positively dependent, in which case the resulting interval is not as tight as the answer assuming independence but tighter than the answer given by the Fréchet inequality. Comparable calculations are used for other logical functions such as negation, exclusive disjunction, etc. When the Boolean expression to be evaluated becomes complex, it may be necessary to evaluate it using the methods of mathematical programming<ref name=Hailperin86 /> to get best-possible bounds on the expression. If the probabilities of the events are characterized by probability distributions or p-boxes rather than intervals, then analogous calculations can be done to obtain distributional or p-box results characterizing the probability of the top event.
 
:''A'' < ''B'' = Pr(''A'' − ''B'' < 0),
:''A'' > ''B'' = Pr(''B'' − ''A'' < 0),
:''A'' &le; ''B'' = Pr(''A'' − ''B'' &le; 0), and
:''A'' &ge; ''B'' = Pr(''B'' − ''A'' &le; 0).
Thus the probability that ''A'' is less than ''B'' is the same as the probability that their difference is less than zero, and this probability can be said to be the value of the expression ''A'' < ''B''.
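For precisely specified distributions, such comparison probabilities can be estimated by simulation; with p-box inputs the analogous computation yields an interval for the probability. A small Monte Carlo sketch for the precise case (the two uniform distributions are arbitrary examples chosen for illustration):

```python
import random

def prob_less(sample_a, sample_b, n=100_000, seed=1):
    """Estimate A < B = Pr(A - B < 0) for precisely known distributions.

    sample_a, sample_b: callables drawing one sample given a Random instance.
    """
    rng = random.Random(seed)
    hits = sum(sample_a(rng) - sample_b(rng) < 0 for _ in range(n))
    return hits / n

# Illustrative inputs: A ~ Uniform(0, 1), B ~ Uniform(0.5, 1.5).
# Analytically Pr(A - B < 0) = 0.875 for these choices.
p = prob_less(lambda r: r.uniform(0.0, 1.0),
              lambda r: r.uniform(0.5, 1.5))
```

When ''A'' and ''B'' are p-boxes rather than distributions, the same comparison is evaluated by bounding the distribution of ''A'' − ''B'' and reading off the interval it assigns to zero.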
 
 
==References==
{{Reflist|30em}}
 
==Further references==
* {{cite book | last1 = Bernardini | first1 = Alberto | last2 = Tonon | first2 = Fulvio | title = Bounding Uncertainty in Civil Engineering: Theoretical Background | publisher = Springer | ___location = Berlin | year = 2010 | isbn = 3-642-11189-0 }}
 
* {{cite book | last = Ferson | first = Scott | title = RAMAS Risk Calc 4.0 Software : Risk Assessment with Uncertain Numbers | publisher = Lewis Publishers | ___location = Boca Raton, Florida | year = 2002 | isbn = 1-56670-576-2 }}
 
* {{cite book | last1 = Oberkampf | first1 = William L. | last2 = Roy | first2 = Christopher J. | title = Verification and Validation in Scientific Computing | publisher = Cambridge University Press | ___location = New York | year = 2010 | isbn = 0-521-11360-1 }}<!-- In an email dated 28 March 2011, William Oberkampf stated "PBA is the only UQ method we discuss and apply in our examples in the book." -->