'''Data pre-processing''' is an important step in the [[data mining]] process. The phrase [[GIGO|"garbage in, garbage out"]] applies particularly well to data mining and [[machine learning]] projects. Data-gathering methods are often loosely controlled, which produces [[range error|out-of-range]] values (e.g., Income: −100), impossible data combinations (e.g., Sex: Male, Pregnant: Yes), [[missing values]], and similar problems. Analyzing data that has not been carefully screened for such problems can produce misleading results. The representation and [[data quality|quality of the data]] must therefore be addressed first, before any analysis is run.<ref>Pyle, D., 1999. ''Data Preparation for Data Mining.'' Morgan Kaufmann Publishers, [[Los Altos, California]].</ref>
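Screening for the three kinds of problem named above can be sketched in a few lines. This is a minimal illustration that assumes the records are held as Python dictionaries. The field names, the rule that income must be non-negative, and the helper names <code>screen</code> and <code>flag_records</code> are hypothetical choices made for the example. They are not a standard procedure.

```python
import math

# Hypothetical records illustrating out-of-range values, an impossible
# combination, and a missing value.
records = [
    {"id": 1, "income": 52000, "sex": "Female", "pregnant": "Yes"},
    {"id": 2, "income": -100, "sex": "Female", "pregnant": "No"},
    {"id": 3, "income": 61000, "sex": "Male", "pregnant": "Yes"},
    {"id": 4, "income": None, "sex": "Male", "pregnant": "No"},
]


def screen(record):
    """Return a list of problems found in one record (assumed rules)."""
    problems = []
    income = record.get("income")
    # Missing value: absent, None, or a float NaN.
    if income is None or (isinstance(income, float) and math.isnan(income)):
        problems.append("missing income")
    # Out-of-range value: assume income can never be negative.
    elif income < 0:
        problems.append("out-of-range income")
    # Impossible combination of otherwise valid values.
    if record.get("sex") == "Male" and record.get("pregnant") == "Yes":
        problems.append("impossible combination: male and pregnant")
    return problems


def flag_records(rows):
    """Map each record id to its problems, keeping only flagged records."""
    flagged = {}
    for row in rows:
        issues = screen(row)
        if issues:
            flagged[row["id"]] = issues
    return flagged


if __name__ == "__main__":
    for record_id, issues in flag_records(records).items():
        print(record_id, issues)
```

In a real project the flagged records would then be corrected, imputed, or dropped. Which of these is appropriate depends on the data and the analysis.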
Often, data pre-processing is the most important phase of a [[machine learning]] project, especially in [[computational biology]].<ref>{{cite journal | vauthors = Chicco D
| title = Ten quick tips for machine learning in computational biology
|