Exploratory data analysis: Difference between revisions

Tukey defined data analysis in 1961 as: "Procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data."<ref>[http://projecteuclid.org/download/pdf_1/euclid.aoms/1177704711 John Tukey-The Future of Data Analysis-July 1961]</ref>
 
Exploratory data analysis (EDA) is a technique for investigating a data set and summarizing its main characteristics. A main advantage of EDA is that it yields visualizations of the data once the analysis has been conducted.
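
As an illustration of what summarizing the main characteristics of a data set can look like in practice, the following minimal sketch computes a five-number summary and flags values outside the conventional 1.5 × IQR fences. The function names, the choice of the "inclusive" quantile method, and the sample values are assumptions made for this example; they are not taken from Tukey's text.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for at least two numbers."""
    data = sorted(values)
    # "inclusive" interpolates between observed points, so the quartiles
    # never fall outside the range of the data (an assumption of this sketch).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


def fence_outliers(values, k=1.5):
    """Return the values lying more than k * IQR beyond the quartiles."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sample data, chosen only to show one obvious outlier.
sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]

print(five_number_summary(sample))  # (2, 4.0, 6.0, 8.0, 30)
print(fence_outliers(sample))       # [30]
```

A summary of this kind is usually a starting point: the flagged value is a candidate for closer inspection, for example in a box plot, rather than something to be discarded automatically.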
 
Tukey's championing of EDA encouraged the development of [[Computational statistics|statistical computing]] packages, especially [[S (programming language)|S]] at [[Bell Labs]].<ref>{{Citation |last=Becker |first=Richard A. |title=A Brief History of S |publisher=AT&T Bell Laboratories |place=Murray Hill, New Jersey |access-date=2015-07-23 |url=http://www2.research.att.com/areas/stat/doc/94.11.ps |format=PS |archive-url=https://web.archive.org/web/20150723044213/http://www2.research.att.com/areas/stat/doc/94.11.ps |archive-date=2015-07-23 |quotation="... we wanted to be able to interact with our data, using Exploratory Data Analysis (Tukey, 1971) techniques."}}</ref> The S programming language inspired the systems [[S-PLUS]] and [[R (programming language)|R]]. This family of statistical-computing environments featured vastly improved dynamic visualization capabilities, which allowed statisticians to identify [[outlier]]s, [[trend estimation|trends]] and [[pattern recognition|patterns]] in data that merited further study.