Duckmather (talk | contribs) adding {{copyvio-revdel}} |
Yash Thale (talk | contribs) m →top: Added some info about EDA/plotting/Visualizatio libraries used currently woth Python For EDA. Tags: Mobile edit Mobile app edit Android app edit App full source |
(One intermediate revision by one other user not shown) | |||
Line 1:
{{short description|Approach of analyzing data sets in statistics}}
In [[statistics]], '''exploratory data analysis''' (EDA) is an approach of [[data analysis|analyzing]] [[data set]]s to summarize their main characteristics, often using [[statistical graphics]] and other [[data visualization]] methods. A [[statistical model]] can be used or not, but primarily EDA is for seeing what the data can tell beyond the formal modeling and thereby contrasts with traditional hypothesis testing, in which a model is supposed to be selected before the data is seen. Exploratory data analysis has been promoted by [[John Tukey]] since 1970 to encourage statisticians to explore the data, and possibly formulate hypotheses that could lead to new data collection and experiments. EDA is different from [[Data analysis#Initial data analysis|initial data analysis (IDA)]],<ref>{{cite book |last=Chatfield |first=C. |year=1995 |title=Problem Solving: A Statistician's Guide |publisher=Chapman and Hall |isbn=978-0412606304 |edition=2nd }}</ref><ref>{{cite journal |doi=10.1371/journal.pcbi.1009819|title=Ten simple rules for initial data analysis|year=2022|last1=Baillie|first1=Mark|last2=Le Cessie|first2=Saskia|last3=Schmidt|first3=Carsten Oliver|last4=Lusa|first4=Lara|last5=Huebner|first5=Marianne|author6=Topic Group "Initial Data Analysis" of the STRATOS Initiative|journal=PLOS Computational Biology|volume=18|issue=2|pages=e1009819|pmid=35202399|pmc=8870512|bibcode=2022PLSCB..18E9819B |doi-access=free }}</ref> which focuses more narrowly on checking assumptions required for model fitting and hypothesis testing, and handling missing values and making transformations of variables as needed. EDA encompasses IDA.
Line 101:
* [[Orange (software)|Orange]], an [[open-source software|open-source]] [[data mining]] and [[machine learning]] software suite.
* [[Python (programming language)|Python]], an open-source programming language widely used in data mining and machine learning.
* [[Matplotlib]] and Seaborn, Python libraries widely used for plotting and data visualization in EDA.
* [[R (programming language)|R]], an open-source programming language for statistical computing and graphics. Together with Python, it is one of the most popular languages for data science.
* [[TinkerPlots]], EDA software for upper elementary and middle school students.
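
As a minimal illustration of summarizing a data set's main characteristics with one of the tools listed above, the following Python sketch computes a five-number summary (minimum, lower quartile, median, upper quartile, maximum) using only the standard library. The function name <code>five_number_summary</code> and the sample values are hypothetical, chosen for illustration only, and the "inclusive" quantile method is one of several reasonable conventions.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum).

    Hypothetical helper for illustration; not part of any library.
    """
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two data points")
    # method="inclusive" treats the data as the full population, so the
    # quartiles are interpolated between the observed minimum and maximum.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]


# Hypothetical sample data, not taken from any real study.
sample = [2, 4, 4, 5, 7, 9, 10, 12, 15]
summary = five_number_summary(sample)
print(summary)
```

A summary of this kind is typically a starting point; in practice it would usually be paired with a plot such as a box plot or histogram drawn with a visualization library.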