Exploratory data analysis

This is an old revision of this page, as edited by 83.56.206.140 (talk) at 20:28, 18 October 2006. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Exploratory data analysis (EDA) is the part of statistical practice concerned with reviewing, communicating and using data when little is known about the system that generated it. The term was coined by John Tukey. Many EDA techniques have been adopted into data mining, and they are also taught to young students as an introduction to statistical thinking.

Tukey held that statistics placed too much emphasis on evaluating and testing given hypotheses (confirmatory data analysis) and that the balance needed redressing in favour of using data to suggest hypotheses to test. In particular, confusing the two types of analysis and applying both to the same set of data can lead to bias, because a hypothesis suggested by a data set will tend to look well supported when it is tested on that same data set.
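The bias from testing a hypothesis on the data that suggested it can be illustrated with a small simulation. The following is a minimal sketch, not a method taken from Tukey: every variable is pure noise, the "hypothesis" is simply whichever predictor correlates most strongly with the response on one half of the data, and the same predictor is then re-measured on the untouched half. The function names (`pearson`, `one_trial`, `simulate`), the sample size and the number of predictors are all assumptions chosen for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def one_trial(rng, n=100, n_predictors=20):
    """Pick the best-looking predictor on one half of the data,
    then re-measure that same predictor on the other half."""
    # Response and predictors are independent noise: no real effect exists.
    y = [rng.gauss(0, 1) for _ in range(n)]
    predictors = [[rng.gauss(0, 1) for _ in range(n)]
                  for _ in range(n_predictors)]
    half = n // 2

    # Exploratory step: let the first half of the data suggest a hypothesis.
    explore = [abs(pearson(p[:half], y[:half])) for p in predictors]
    best = max(range(n_predictors), key=explore.__getitem__)

    # Confirmatory step: check the suggested predictor on held-out data.
    confirm = abs(pearson(predictors[best][half:], y[half:]))
    return explore[best], confirm


def simulate(trials=200, seed=0):
    """Average |r| for the chosen predictor on exploratory vs held-out data."""
    rng = random.Random(seed)
    results = [one_trial(rng) for _ in range(trials)]
    mean_explore = sum(e for e, _ in results) / trials
    mean_confirm = sum(c for _, c in results) / trials
    return mean_explore, mean_confirm


if __name__ == "__main__":
    e, c = simulate()
    print(f"mean |r| on the data that suggested the hypothesis: {e:.3f}")
    print(f"mean |r| on held-out data:                          {c:.3f}")
```

Under these assumptions the correlation measured on the exploratory half is inflated by the selection step, while the held-out half shows only the small correlations expected from noise. Keeping the two halves separate is one simple way to avoid mixing exploratory and confirmatory analysis on the same data.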

The objectives of EDA are to:

The principal graphical tools used in EDA are:

The principal quantitative tools are:

Software

  • XLisp-Stat (free, Lisp-based EDA development framework for Mac, PC and X-Windows)
  • ViSta (free interactive software based on XLisp-Stat for EDA)
  • DataDesk (free-to-try commercial EDA software for Mac and PC)
  • Orange (free component-based software for interactive EDA and machine learning)
  • GGobi (free interactive multivariate visualization software linked to R)
  • MANET (free Mac-only interactive EDA software)
  • Mondrian (free interactive software for EDA)
  • Fathom (for high-school and intro college courses)
  • TinkerPlots (for upper elementary and middle school students)

See also

Bibliography

  • Hoaglin, D C; Mosteller, F & Tukey, John Wilder (Eds) (1985). Exploring Data Tables, Trends and Shapes. ISBN 0-471-09776-4.
  • Hoaglin, D C; Mosteller, F & Tukey, John Wilder (Eds) (1983). Understanding Robust and Exploratory Data Analysis. ISBN 0-471-09777-2.
  • Tukey, John Wilder (1977). Exploratory Data Analysis. ISBN 0-201-07616-0.
  • Velleman, P F & Hoaglin, D C (1981). Applications, Basics and Computing of Exploratory Data Analysis. ISBN 0-87150-409-X.