Box plot: Difference between revisions

Added my own drawing of a box plot and changed a few things.
m Changed minor words.
Line 28:
Besides the minimum and maximum values, another important element used to construct a box plot is the interquartile range (IQR), defined below:
 
* '''[[Interquartile range]] (IQR)''' : the distance between the upper and lower quartiles
 
:: <math>\text{IQR} = Q_3 - Q_1 = q_n(0.75) - q_n(0.25)</math>
Line 41:
 
Another popular choice for the boundaries of the whiskers is based on the 1.5 IQR value. From above the upper quartile ('''''Q''<sub>3</sub>'''), a distance of 1.5 times the IQR is measured out and a whisker is drawn ''up to'' the largest observed data point from the dataset that falls within this distance. Similarly, a distance of 1.5 times the IQR is measured out below the lower quartile ('''''Q''<sub>1</sub>''') and a whisker is drawn ''down to'' the lowest observed data point from the dataset that falls within this distance. Because the whiskers must end at an observed data point, the whisker lengths can look unequal, even though 1.5 IQR is the same for both sides. All other observed data points outside the boundary of the whiskers are plotted as '''outliers'''.<ref>{{Cite book |title=A Modern Introduction to Probability and Statistics |url=https://archive.org/details/modernintroducti00dekk_722 |url-access=limited |last=Dekking |first=F.M. |publisher=Springer |year=2005 |isbn=1-85233-896-2 |pages=[https://archive.org/details/modernintroducti00dekk_722/page/n240 234]–238 }}</ref> The outliers can be plotted on the box-plot as a dot, a small circle, a star, ''etc.'' (see example below).
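
The 1.5 IQR rule described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name <code>tukey_whiskers</code>, the parameter <code>k</code>, and the sample data are hypothetical, and the quartiles are computed with the "inclusive" method of the standard library's <code>statistics.quantiles</code>, which is an assumption (other quantile conventions can give slightly different values of ''Q''<sub>1</sub> and ''Q''<sub>3</sub>).

```python
from statistics import quantiles


def tukey_whiskers(data, k=1.5):
    """Return (lower_whisker, upper_whisker, outliers) under the k * IQR rule."""
    values = sorted(data)
    # Assumption: "inclusive" quartiles; other conventions may shift Q1 and Q3 slightly.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Whiskers end at observed data points, not at the fences themselves.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return inside[0], inside[-1], outliers


# Hypothetical sample with one unusually large value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
low, high, out = tukey_whiskers(sample)
print(low, high, out)
```

For this sample the fences lie at −3.5 and 14.5, so the whiskers end at the observed values 1 and 9, and 30 is reported as an outlier. The lower whisker is shorter than the 1.5 IQR distance because no data point lies farther down.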
[[File:Box Plot Picture.png|thumb|389x389px|This is a picture of a box plot representing data.]]
There are other representations in which the whiskers can stand for several other things, such as:
 
Line 48:
* The 2nd percentile and the 98th percentile of the data set
 
Rarely, a box plot is drawn without whiskers. This can be appropriate for sensitive information, since it prevents the whiskers (and outliers) from disclosing actual observed values.<ref name="DGRW">{{Cite book|last1=Derrick|first1=Ben|last2=Green|first2=Elizabeth|last3=Ritchie|first3=Felix|last4=White|first4=Paul|date=September 2022|chapter=The Risk of Disclosure When Reporting Commonly Used Univariate Statistics|title=Privacy in Statistical Databases|series=Lecture Notes in Computer Science |volume=13463|pages=119–129|doi=10.1007/978-3-031-13945-1_9|isbn=978-3-031-13944-4 }}</ref>
 
The unusual percentiles 2%, 9%, 91%, 98% are sometimes used for whisker cross-hatches and whisker ends to depict the [[seven-number summary]]. If the data are [[Normal distribution|normally distributed]], the locations of the seven marks on the box plot will be approximately equally spaced. On some box plots, a cross-hatch is placed before the end of each whisker.
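
The spacing of the seven marks under a normal distribution can be checked numerically. The sketch below uses the standard library's <code>statistics.NormalDist</code> for a standard normal distribution; the variable names are illustrative only, and the 25%, 50% and 75% levels are assumed to be the usual quartiles and median that complete the seven-number summary.

```python
from statistics import NormalDist

# Percentile levels of the seven-number summary (quartiles and median assumed).
levels = [0.02, 0.09, 0.25, 0.50, 0.75, 0.91, 0.98]

# Positions of the seven marks for a standard normal distribution.
z = [NormalDist().inv_cdf(p) for p in levels]

# Distances between neighbouring marks.
gaps = [b - a for a, b in zip(z, z[1:])]

for p, pos in zip(levels, z):
    print(f"{p:.0%}: {pos:+.3f}")
print([round(g, 3) for g in gaps])
```

The six gaps all come out close to two thirds of a standard deviation (roughly 0.67 to 0.71), so the marks are nearly, though not exactly, evenly spaced.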