: <math>\text{IQR} = Q_3 - Q_1 = q_n(0.75) - q_n(0.25)</math>
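The interquartile range can be computed directly from the sample quartiles. The sketch below is illustrative only. It assumes that the quantile function ''q''<sub>''n''</sub> is the linear-interpolation ("inclusive") rule implemented by Python's <code>statistics.quantiles</code>. Other common quantile definitions can give slightly different quartiles on small samples. The function name <code>iqr</code> is hypothetical.

```python
from statistics import quantiles


def iqr(data):
    """Interquartile range Q3 - Q1 of a sample.

    Assumption: quartiles come from statistics.quantiles with
    method="inclusive" (linear interpolation between order statistics);
    other quantile conventions may differ slightly on small samples.
    """
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1


print(iqr([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # prints 4.0 (Q1 = 3.0, Q3 = 7.0)
```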
A box-plot usually includes two parts, a box and a set of whiskers, as shown in Figure 2.
===Box===
The box is drawn from ''Q''<sub>1</sub> to ''Q''<sub>3</sub> with a horizontal line drawn inside it to denote the median. Some box plots include an additional symbol to represent the mean of the data.<ref name="frigge hoaglin iglewicz2">{{Cite journal|last1=Frigge|first1=Michael|last2=Hoaglin|first2=David C.|last3=Iglewicz|first3=Boris|date=February 1989|title=Some Implementations of the Boxplot|journal=[[The American Statistician]]|volume=43|issue=1|pages=50–54|doi=10.2307/2685173|jstor=2685173}}</ref><ref>{{cite journal|last1=Marmolejo-Ramos|first1=F.|last2=Tian|first2=S.|date=2010|title=The shifting boxplot. A boxplot based on essential summary statistics around the mean|journal=International Journal of Psychological Research|volume=3|issue=1|pages=37–46|doi=10.21500/20112084.823|doi-access=free}}</ref>
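The values that position the box can be collected as in the following minimal sketch. It returns the bottom and top of the box (''Q''<sub>1</sub> and ''Q''<sub>3</sub>) and the median line. It adds the mean only when an extra mean marker is wanted. The function name <code>box_stats</code>, its dictionary keys, and the "inclusive" quartile rule are assumptions made for illustration.

```python
from statistics import mean, median, quantiles


def box_stats(data, show_mean=False):
    """Positions of the box elements of a box plot (hypothetical helper).

    The box spans Q1..Q3 and a line inside it marks the median; the mean
    is added only for plots that draw a separate mean symbol.
    Assumption: quartiles use statistics.quantiles(method="inclusive").
    """
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    stats = {"box_bottom": q1, "median": median(data), "box_top": q3}
    if show_mean:
        stats["mean"] = mean(data)
    return stats


print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9], show_mean=True))
```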
===Whiskers===
The whiskers must end at an observed data point, but they can be defined in various ways. In the most straightforward method, the lower whisker ends at the minimum value of the data set and the upper whisker ends at the maximum value of the data set. Because the convention varies, the one used for the whiskers and outliers should be described in the caption of the box-plot.
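Under the minimum/maximum convention, both whisker ends are simply the extreme observations, so no point is drawn as an outlier. The sketch below is illustrative. The function name <code>minmax_whiskers</code> and its return layout (lower end, upper end, list of outliers) are hypothetical choices.

```python
def minmax_whiskers(data):
    """Whisker ends under the minimum/maximum convention (hypothetical helper).

    Both ends are observed values, and this convention never produces
    outliers, so the returned outlier list is always empty.
    """
    ordered = sorted(data)
    return ordered[0], ordered[-1], []  # lower end, upper end, outliers


print(minmax_whiskers([3, 1, 4, 1, 5, 9, 2, 6]))  # prints (1, 9, [])
```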
Another popular choice for the boundaries of the whiskers is based on 1.5 times the IQR. Above the upper quartile ('''''Q''<sub>3</sub>'''), a distance of 1.5 times the IQR is measured out, and a whisker is drawn ''up to'' the largest observed data point that falls within this distance. Similarly, a distance of 1.5 times the IQR is measured out below the lower quartile ('''''Q''<sub>1</sub>'''), and a whisker is drawn ''down to'' the smallest observed data point that falls within this distance. Because the whiskers must end at observed data points, the two whiskers can look unequal in length even though 1.5 IQR is the same on both sides. All observed data points beyond the ends of the whiskers are plotted as '''outliers'''.<ref>{{Cite book |title=A Modern Introduction to Probability and Statistics |url=https://archive.org/details/modernintroducti00dekk_722 |url-access=limited |last=Dekking |first=F.M. |publisher=Springer |year=2005 |isbn=1-85233-896-2 |pages=[https://archive.org/details/modernintroducti00dekk_722/page/n240 234]–238 }}</ref> Outliers can be plotted on the box-plot as a dot, a small circle, a star, ''etc.'' (see example below).
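The 1.5 IQR rule can be sketched as follows. The fences are placed 1.5 IQR beyond each quartile. Each whisker then stops at the most extreme observation inside its fence, and everything beyond a fence is reported as an outlier. The function name <code>tukey_whiskers</code>, the multiplier parameter <code>k</code>, and the "inclusive" quartile rule are assumptions for illustration.

```python
from statistics import quantiles


def tukey_whiskers(data, k=1.5):
    """Whisker ends and outliers under the k*IQR rule (hypothetical helper).

    Fences lie k*IQR below Q1 and above Q3. Each whisker ends at the most
    extreme *observed* value inside its fence; points beyond are outliers.
    Assumption: quartiles use statistics.quantiles(method="inclusive").
    """
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    spread = k * (q3 - q1)
    low_fence, high_fence = q1 - spread, q3 + spread
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = sorted(x for x in data if x < low_fence or x > high_fence)
    return min(inside), max(inside), outliers


# Q1 = 3.25 and Q3 = 7.75, so the fences are -3.5 and 14.5; 30 is an outlier.
print(tukey_whiskers([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))  # prints (1, 9, [30])
```

In this example the upper whisker stops at the observation 9 rather than at the fence 14.5. This is why the two whiskers can differ in length even though the same 1.5 IQR distance is used on both sides.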
In other representations, the whiskers can stand for other quantities, such as:
* One [[standard deviation]] above and below the mean of the data set
* The 9th percentile and the 91st percentile of the data set
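The two alternatives listed above can be sketched as follows. Unlike the 1.5 IQR rule, these whisker ends are computed values and need not coincide with any observation. The sketch assumes the sample standard deviation (<code>statistics.stdev</code>) and the "inclusive" percentile rule. The function names are hypothetical.

```python
from statistics import mean, quantiles, stdev


def sd_whiskers(data):
    """Whisker ends one standard deviation below and above the mean.

    Assumption: the sample standard deviation (n - 1 denominator).
    """
    m, s = mean(data), stdev(data)
    return m - s, m + s


def percentile_whiskers(data, low=9, high=91):
    """Whisker ends at the given percentiles (9th and 91st by default).

    quantiles(..., n=100) returns the 1st..99th percentiles, so the
    p-th percentile sits at index p - 1.
    """
    cuts = quantiles(data, n=100, method="inclusive")
    return cuts[low - 1], cuts[high - 1]


print(sd_whiskers([1, 2, 3]))                   # prints (1.0, 3.0)
print(percentile_whiskers(list(range(1, 102))))  # prints (10.0, 92.0)
```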
Rarely, a box-plot can be plotted without whiskers. This can be appropriate for sensitive information, because whiskers (and outliers) can disclose actual observed values.<ref name="DGRW">{{Cite journal|last1=Derrick|first1=Ben|last2=Green|first2=Elizabeth|last3=Ritchie|first3=Felix|last4=White|first4=Paul|date=September 2022|title=The Risk of Disclosure When Reporting Commonly Used Univariate Statistics|journal=Privacy in Statistical Databases|volume=13463|pages=119–129|doi=10.1007/978-3-031-13945-1_9}}</ref>
The unusual percentiles 2%, 9%, 91% and 98% are sometimes used for whisker cross-hatches and whisker ends to depict the [[seven-number summary]]. If the data are [[Normal distribution|normally distributed]], the seven marks on the box plot will be approximately equally spaced. On some box plots, a cross-hatch is placed before the end of each whisker.
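The following sketch computes the seven-number summary from a sample. It also checks the spacing claim against the standard normal distribution using <code>statistics.NormalDist</code>. The gaps between consecutive marks come out at roughly 0.67 to 0.71 standard deviations, so the spacing is close to uniform but not exactly uniform. The function name and the "inclusive" percentile rule are assumptions.

```python
from statistics import NormalDist, quantiles

SEVEN_PCTS = (2, 9, 25, 50, 75, 91, 98)


def seven_number_summary(data):
    """2nd, 9th, 25th, 50th, 75th, 91st and 98th percentiles of a sample.

    Assumption: percentiles use statistics.quantiles(method="inclusive").
    """
    cuts = quantiles(data, n=100, method="inclusive")  # 1st..99th percentiles
    return [cuts[p - 1] for p in SEVEN_PCTS]


# Positions of the seven marks for a standard normal distribution (z-scores)
z = [NormalDist().inv_cdf(p / 100) for p in SEVEN_PCTS]
# Gaps between consecutive marks: all roughly 0.67 to 0.71
gaps = [b - a for a, b in zip(z, z[1:])]
print([round(g, 3) for g in gaps])
```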
==Variations==