{{Context|date=October 2009}}
'''Data pre-processing''' is an often neglected but important step in the data mining process. The phrase [[GIGO|"Garbage In, Garbage Out"]] is particularly applicable to [[data mining]] and [[machine learning]] projects. Data-gathering methods are often loosely controlled, resulting in out-of-range values (e.g., Income: -100), impossible data combinations (e.g., Gender: Male, Pregnant: Yes), [[missing values]], and so on. Analyzing data that has not been carefully screened for such problems can produce misleading results. Thus, the representation and quality of the data must be checked before any analysis is run.
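The screening described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method taken from the cited sources: each record is assumed to be a dictionary with the hypothetical keys <code>income</code>, <code>gender</code> and <code>pregnant</code>, and the three rules simply mirror the examples in the preceding paragraph (missing values, an out-of-range income, and an impossible combination). It requires Python 3.9 or later.

```python
from typing import Any

# Hypothetical record layout (an assumption for this sketch, not a standard schema).
REQUIRED_FIELDS = ("income", "gender", "pregnant")


def screen_record(record: dict[str, Any]) -> list[str]:
    """Return the problems found in one record; an empty list means it passed."""
    problems = []
    # Missing values: a required field is absent or explicitly None.
    for field in REQUIRED_FIELDS:
        if record.get(field) is None:
            problems.append(f"missing {field}")
    # Out-of-range value: an income cannot be negative.
    income = record.get("income")
    if income is not None and income < 0:
        problems.append(f"out-of-range income: {income}")
    # Impossible combination: a male respondent recorded as pregnant.
    if record.get("gender") == "Male" and record.get("pregnant") == "Yes":
        problems.append("impossible combination: gender=Male, pregnant=Yes")
    return problems


def split_clean(records: list[dict[str, Any]]):
    """Separate records that pass screening from those that need attention."""
    clean, flagged = [], []
    for record in records:
        issues = screen_record(record)
        if issues:
            flagged.append((record, issues))
        else:
            clean.append(record)
    return clean, flagged


records = [
    {"income": 52000, "gender": "Female", "pregnant": "Yes"},
    {"income": -100, "gender": "Female", "pregnant": "No"},
    {"income": 40000, "gender": "Male", "pregnant": "Yes"},
    {"income": None, "gender": "Male", "pregnant": "No"},
]
clean, flagged = split_clean(records)
for record, issues in flagged:
    print(record, issues)
```

Flagging records rather than silently deleting them keeps the decision (correct, impute or discard) as a separate, reviewable step.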
If the data contain a great deal of irrelevant or redundant information, or noisy and unreliable values, then [[knowledge discovery]] during the training phase becomes more difficult. Data preparation and filtering steps can take a considerable amount of processing time. Data pre-processing includes [[Data cleaning|cleaning]], normalization, transformation, [[feature extraction]] and selection, etc. The product of data pre-processing is the final [[training set]]. Kotsiantis et al. (2006) present a well
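Several of the steps named above (filling in missing values, discarding uninformative features, and normalization) can be chained into a small pipeline that produces a training set. The sketch below is only one possible arrangement: it assumes numeric features stored column-wise in a dictionary, and it uses mean imputation and min-max scaling to [0, 1] as two common choices, since the article does not prescribe particular techniques. The function names are hypothetical. It requires Python 3.9 or later.

```python
from statistics import mean
from typing import Optional


def impute_mean(column: list[Optional[float]]) -> list[float]:
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)  # raises statistics.StatisticsError if every entry is missing
    return [fill if v is None else v for v in column]


def min_max_normalize(column: list[float]) -> list[float]:
    """Rescale values linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def preprocess(columns: dict[str, list[Optional[float]]]) -> dict[str, list[float]]:
    """Impute missing values, drop constant features, then normalize the rest."""
    result = {}
    for name, values in columns.items():
        filled = impute_mean(values)
        if min(filled) == max(filled):
            continue  # a constant feature cannot help distinguish examples
        result[name] = min_max_normalize(filled)
    return result


data = {
    "age": [25.0, None, 45.0],
    "income": [30000.0, 50000.0, 70000.0],
    "country_code": [1.0, 1.0, 1.0],
}
print(preprocess(data))
```

Dropping a feature only when it is constant is the simplest possible relevance filter; practical feature selection usually relies on stronger criteria, for example a feature's correlation with the target.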
==References==