specific for the article |
Added my own drawing of a box plot and changed a few things. |
Line 2:
[[File:Michelsonmorley-boxplot.svg|thumb|upright=1.5|Figure 1. Box plot of data from the [[Michelson–Morley experiment#Michelson experiment (1881)|Michelson experiment]]]]
In [[descriptive statistics]], a '''box plot''' or '''boxplot''' is a method for graphically demonstrating the locality, spread and skewness of groups of numerical data through their [[quartile]]s.<ref>{{Cite book|last=C.|first=Dutoit, S. H.|url=http://worldcat.org/oclc/1019645745|title=Graphical exploratory data analysis.|date=2012|publisher=Springer|isbn=978-1-4612-9371-2|oclc=1019645745}}</ref>
In addition to the box on a box plot, there can be lines (which are called ''whiskers'') extending from the box indicating variability outside the upper and lower quartiles; for this reason, the plot is also called the '''box-and-whisker plot''' or the '''box-and-whisker diagram'''. [[Outlier]]s that differ significantly from the rest of the dataset<ref>{{Cite journal|last=Grubbs|first=Frank E.|date=February 1969|title=Procedures for Detecting Outlying Observations in Samples|url=http://dx.doi.org/10.1080/00401706.1969.10490657|journal=Technometrics|volume=11|issue=1|pages=1–21|doi=10.1080/00401706.1969.10490657|issn=0040-1706}}</ref> may be plotted as individual points beyond the whiskers on the box plot. Box plots are [[non-parametric]]: they display variation in samples of a [[statistical population]] without making any assumptions about the underlying [[probability distribution|statistical distribution]]<ref>{{Cite book|last=Richard.|first=Boddy|url=http://worldcat.org/oclc/940679163|title=Statistical Methods in Practice : for Scientists and Technologists.|date=2009|publisher=John Wiley & Sons|isbn=978-0-470-74664-6|oclc=940679163}}</ref> (though Tukey's boxplot assumes symmetry for the whiskers and normality for their length).

== History ==
Line 38 ⟶ 41:
Another popular choice for the boundaries of the whiskers is based on the 1.5 IQR value. Above the upper quartile ('''''Q''<sub>3</sub>'''), a distance of 1.5 times the IQR is measured out and a whisker is drawn ''up to'' the largest observed data point from the dataset that falls within this distance. Similarly, a distance of 1.5 times the IQR is measured out below the lower quartile ('''''Q''<sub>1</sub>''') and a whisker is drawn ''down to'' the smallest observed data point from the dataset that falls within this distance. Because the whiskers must end at an observed data point, the whisker lengths can look unequal, even though 1.5 IQR is the same for both sides. All other observed data points outside the boundary of the whiskers are plotted as '''outliers'''.<ref>{{Cite book |title=A Modern Introduction to Probability and Statistics |url=https://archive.org/details/modernintroducti00dekk_722 |url-access=limited |last=Dekking |first=F.M. |publisher=Springer |year=2005 |isbn=1-85233-896-2 |pages=[https://archive.org/details/modernintroducti00dekk_722/page/n240 234]–238 }}</ref> The outliers can be plotted on the box plot as a dot, a small circle, a star, ''etc.'' (see example below).
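The 1.5 IQR rule can also be stated compactly. As a sketch, writing <math>x_1, \dots, x_n</math> for the observed data points and <math>\mathrm{IQR} = Q_3 - Q_1</math> (the symbols <math>x_i</math>, <math>W_\text{upper}</math> and <math>W_\text{lower}</math> are notation assumed here for illustration), the whisker ends are

<math display="block">W_\text{upper} = \max\{\, x_i : x_i \le Q_3 + 1.5\,\mathrm{IQR} \,\}, \qquad W_\text{lower} = \min\{\, x_i : x_i \ge Q_1 - 1.5\,\mathrm{IQR} \,\},</math>

and a data point <math>x_i</math> is plotted as an outlier when <math>x_i < Q_1 - 1.5\,\mathrm{IQR}</math> or <math>x_i > Q_3 + 1.5\,\mathrm{IQR}</math>.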
[[File:Box Plot Picture.png|thumb|377x377px|This is a picture of a box plot representing data.]]
There are other representations in which the whiskers can stand for several other things, such as:
Line 118 ⟶ 121:
=== Example with outliers ===
[[File:Boxplot with outlier.png|thumb|Figure 6. The generated boxplot of the example on the left with outliers]]
Above is an example without outliers. Here is a
The ordered set for the recorded temperatures is (°F): 52, 57, 57, 58, 63, 66, 66, 67, 67, 68, 69, 70, 70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 89.
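As an illustration, the 1.5 IQR rule can be applied to this ordered set with a short script. This is a minimal sketch, not a definitive implementation: the function name <code>box_plot_stats</code> is hypothetical, and the quartiles are assumed to be the medians of the lower and upper halves of the data, which is only one of several quartile conventions in use.

```python
from statistics import median


def box_plot_stats(values, k=1.5):
    """Box plot statistics under the k * IQR whisker rule (k = 1.5 by default).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data (the middle value is left out when n is odd).
    Other quartile conventions can give slightly different results.
    """
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    q1 = median(xs[:half])       # median of the lower half
    q2 = median(xs)              # overall median
    q3 = median(xs[n - half:])   # median of the upper half
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Whiskers end at the most extreme observed points inside the fences.
    inside = [x for x in xs if lower_fence <= x <= upper_fence]
    outliers = [x for x in xs if x < lower_fence or x > upper_fence]
    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        "lower_whisker": min(inside), "upper_whisker": max(inside),
        "outliers": outliers,
    }


# Recorded temperatures (°F) from the example.
temperatures = [52, 57, 57, 58, 63, 66, 66, 67, 67, 68, 69, 70,
                70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 89]
stats = box_plot_stats(temperatures)
print(stats)
```

Under this quartile convention, the script gives ''Q''<sub>1</sub> = 66 °F, a median of 70 °F, ''Q''<sub>3</sub> = 75 °F and an IQR of 9 °F, so the 1.5 IQR boundaries lie at 52.5 °F and 88.5 °F. The whiskers therefore end at 57 °F and 79 °F, and the values 52 °F and 89 °F fall outside the boundaries and are flagged as outliers.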