'''Data preprocessing''' can refer to manipulation or dropping of data before it is used in order to ensure or enhance performance,<ref>{{Cite web|title=Guide To Data Cleaning: Definition, Benefits, Components, And How To Clean Your Data|url=https://www.tableau.com/learn/articles/what-is-data-cleaning|access-date=2021-10-17|website=Tableau|language=en-US}}</ref> and is an important step in the [[data mining]] process. The phrase [[GIGO|"garbage in, garbage out"]] is particularly applicable to [[data mining]] and [[machine learning]] projects. [[Data collection]] methods are often loosely controlled, resulting in out-of-range values, impossible data combinations, and [[missing values]], amongst other issues.
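As an illustration, the kinds of problems described above (missing values, out-of-range values and impossible combinations) can often be caught with simple rule-based checks before analysis. The Python sketch below is hypothetical. The field names (<code>age</code>, <code>income</code>, <code>sex</code>, <code>pregnant</code>), the valid ranges and the single consistency rule are assumptions chosen for the example, not a standard schema. Records that fail any check are then dropped, which is one of the preprocessing options mentioned above.

```python
from __future__ import annotations

from typing import Any

# Hypothetical schema for illustration only: field names and ranges are assumptions.
VALID_RANGES = {"age": (0, 120), "income": (0, float("inf"))}
REQUIRED = ("age", "income", "sex", "pregnant")


def screen(record: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems found in one record."""
    problems = []
    # Missing values: the field is absent or explicitly None.
    for field in REQUIRED:
        if record.get(field) is None:
            problems.append(f"missing: {field}")
    # Out-of-range values (missing values are reported above, so skip them here).
    for field, (lo, hi) in VALID_RANGES.items():
        value = record.get(field)
        if value is not None and not (lo <= value <= hi):
            problems.append(f"out of range: {field}={value}")
    # Impossible combination of otherwise valid values (an example rule).
    if record.get("sex") == "male" and record.get("pregnant") is True:
        problems.append("impossible combination: sex=male, pregnant=True")
    return problems


records = [
    {"age": 34, "income": 52000, "sex": "female", "pregnant": True},
    {"age": 41, "income": -100, "sex": "male", "pregnant": False},
    {"age": 29, "income": 38000, "sex": "male", "pregnant": True},
    {"age": None, "income": 61000, "sex": "female", "pregnant": False},
]

for i, r in enumerate(records):
    print(i, screen(r) or "ok")

# Dropping: keep only records that pass every check.
clean = [r for r in records if not screen(r)]
```

Dropping is only one choice. In practice, flagged records might instead be corrected or imputed, depending on how the results will be used.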
Analyzing data that has not been carefully screened for such problems can produce misleading results.
Often, data preprocessing is the most important phase of a [[machine learning]] project, especially in [[computational biology]].<ref>{{cite journal
| vauthors = Chicco D
| pmid = 29234465
| doi = 10.1186/s13040-017-0155-3
| pmc= 5721660}}</ref>
Data preprocessing may affect how the outcomes of the final data processing are interpreted.<ref>{{Cite journal|last1=Oliveri|first1=Paolo|last2=Malegori|first2=Cristina|last3=Simonetti|first3=Remo|last4=Casale|first4=Monica|date=2019|title=The impact of signal preprocessing on the final interpretation of analytical outcomes – A tutorial|journal=Analytica Chimica Acta|language=en|volume=1058|pages=9–17|doi=10.1016/j.aca.2018.10.055|pmid=30851858|s2cid=73727614}}</ref> This should be considered carefully whenever the interpretation of results is central, as in the multivariate processing of chemical data ([[chemometrics]]).
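One example of how preprocessing shapes interpretation is the standard normal variate (SNV) transform. SNV is a common preprocessing step for spectra in chemometrics. It centers each spectrum on its own mean and divides it by its own standard deviation. The Python sketch below uses made-up spectral values and the sample standard deviation. It shows that two spectra differing only by a multiplicative factor and an additive offset become indistinguishable after SNV, so any information carried by overall intensity is no longer available for interpretation. The choice of SNV and the data values are assumptions for illustration, not a method prescribed by the cited source.

```python
from statistics import fmean, stdev


def snv(spectrum):
    """Standard normal variate: center a spectrum on its own mean and
    scale it by its own sample standard deviation."""
    mu = fmean(spectrum)
    sd = stdev(spectrum)
    return [(x - mu) / sd for x in spectrum]


# Hypothetical absorbance values at five wavelengths.
base = [0.10, 0.25, 0.60, 0.40, 0.15]
# Same shape with a multiplicative factor and an additive offset, as might
# arise from, e.g., a different path length or baseline (assumed values).
shifted = [2.0 * x + 0.3 for x in base]

raw_gap = max(abs(a - b) for a, b in zip(base, shifted))
snv_gap = max(abs(a - b) for a, b in zip(snv(base), snv(shifted)))
print(raw_gap)  # the raw spectra clearly differ
print(snv_gap)  # only floating-point rounding remains after SNV
```

Whether removing such scale and offset differences is desirable depends on the question being asked. If overall intensity is itself informative, SNV discards exactly the information needed.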
==Data mining==