Revision as of 02:53, 23 February 2017 by Joelcarbonera; revision as of 04:39, 29 March 2017 by Deepakmahapatra91.
If much irrelevant and redundant information is present, or the data are noisy and unreliable, then [[knowledge discovery]] during the training phase is more difficult. Data preparation and filtering steps can take a considerable amount of processing time. Data pre-processing includes [[Data cleaning|cleaning]], [[Instance selection]], [[data normalization|normalization]], [[data transformation|transformation]], [[feature extraction]] and [[Feature selection|selection]], etc. The product of data pre-processing is the final [[training set]]. Kotsiantis et al. (2006) present a well-known algorithm for each step of data pre-processing.<ref>S. Kotsiantis, D. Kanellopoulos, P. Pintelas, "Data Preprocessing for Supervised Learning", ''International Journal of Computer Science'', 2006, Vol. 1, No. 2, pp. 111–117.</ref> Many programming languages have libraries that implement common data pre-processing steps for machine learning applications; one such library for the Python programming language is pandas.

==See also==
*[[Data cleansing]]
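The pre-processing steps named above can be chained so that raw records go in and a training set comes out. The following is a minimal plain-Python sketch, not the method of Kotsiantis et al. and not the pandas API. The sample records, the function names (`clean`, `deduplicate`, `min_max_normalize`, `preprocess`) and the choice of min-max scaling are hypothetical illustrations. The sketch drops records with missing values (cleaning), removes exact duplicates (a very simple form of instance selection) and rescales numeric columns to the range [0, 1] (normalization).

```python
# Hypothetical raw records: (height_cm, weight_kg, label); None marks a missing value.
RAW = [
    (170.0, 65.0, "a"),
    (180.0, None, "b"),
    (160.0, 55.0, "a"),
    (170.0, 65.0, "a"),  # exact duplicate of the first record
    (150.0, 45.0, "b"),
]


def clean(rows):
    """Cleaning: drop every record that contains a missing value."""
    return [r for r in rows if all(v is not None for v in r)]


def deduplicate(rows):
    """Simple instance selection: keep only the first copy of each record."""
    seen = set()
    kept = []
    for r in rows:
        if r not in seen:
            seen.add(r)
            kept.append(r)
    return kept


def min_max_normalize(rows, columns):
    """Normalization: rescale the given numeric columns to [0, 1]."""
    result = [list(r) for r in rows]
    for c in columns:
        values = [r[c] for r in rows]
        lo, hi = min(values), max(values)
        span = hi - lo
        for r in result:
            # A constant column is mapped to 0.0 to avoid dividing by zero.
            r[c] = (r[c] - lo) / span if span else 0.0
    return [tuple(r) for r in result]


def preprocess(rows):
    """Chain the steps; the output plays the role of the training set."""
    return min_max_normalize(deduplicate(clean(rows)), columns=[0, 1])


if __name__ == "__main__":
    for row in preprocess(RAW):
        print(row)
```

Run on the sample records, the missing-value record and the duplicate are removed, leaving three normalized rows. In pandas, the first two steps roughly correspond to `DataFrame.dropna()` and `DataFrame.drop_duplicates()`, and min-max scaling can be written with column arithmetic such as `(df - df.min()) / (df.max() - df.min())`.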

Data preprocessing: Difference between revisions