Data preprocessing

'''Data pre-processing''' is an often neglected but important step in the data mining process. The phrase [[GIGO|"garbage in, garbage out"]] is particularly applicable to [[data mining]] and [[machine learning]] projects. Data-gathering methods are often loosely controlled, resulting in out-of-range values (e.g., Income: −100), impossible data combinations (e.g., Gender: Male, Pregnant: Yes), [[missing values]], etc. Analyzing data that has not been carefully screened for such problems can produce misleading results. Thus, the representation and quality of the data must be addressed first, before any analysis is run.<ref>Pyle, D., 1999. ''Data Preparation for Data Mining.'' Morgan Kaufmann Publishers, [[Los Altos, California]].</ref>
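As an illustration only, the following Python sketch screens records for the three kinds of problem named above: missing values, out-of-range values and impossible combinations. The field names (<code>income</code>, <code>gender</code>, <code>pregnant</code>), the rule that income must be non-negative, and the helper names <code>screen_record</code> and <code>split_clean</code> are hypothetical, chosen to mirror the examples in the text. Real screening rules depend on the dataset and its domain.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical record schema for illustration: "income", "gender", "pregnant".
Record = Dict[str, Optional[Any]]

REQUIRED_FIELDS = ("income", "gender", "pregnant")


def screen_record(record: Record) -> List[str]:
    """Return a list of data-quality problems found in one record."""
    problems: List[str] = []

    # Missing values: a required field is absent or explicitly None.
    for field in REQUIRED_FIELDS:
        if record.get(field) is None:
            problems.append(f"missing value: {field}")

    # Out-of-range values: income is assumed (hypothetically) to be non-negative.
    income = record.get("income")
    if income is not None and income < 0:
        problems.append(f"out-of-range value: income={income}")

    # Impossible combinations: a record marked male and pregnant.
    if record.get("gender") == "male" and record.get("pregnant") is True:
        problems.append("impossible combination: gender=male, pregnant=True")

    return problems


def split_clean(
    records: List[Record],
) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Partition records into clean ones and flagged ones (each paired with its problems)."""
    clean: List[Record] = []
    flagged: List[Tuple[Record, List[str]]] = []
    for record in records:
        issues = screen_record(record)
        if issues:
            flagged.append((record, issues))
        else:
            clean.append(record)
    return clean, flagged


if __name__ == "__main__":
    data = [
        {"income": 52000, "gender": "female", "pregnant": True},
        {"income": -100, "gender": "female", "pregnant": False},
        {"income": 40000, "gender": "male", "pregnant": True},
        {"income": None, "gender": "male", "pregnant": False},
    ]
    clean, flagged = split_clean(data)
    print(len(clean), "clean;", len(flagged), "flagged")
    for record, issues in flagged:
        print(record, issues)
```

The sketch flags problem records rather than silently dropping them, so that an analyst can decide case by case whether to correct, impute or discard each one.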