Data preprocessing: Difference between revisions

Line 3:
If the data contain much irrelevant and redundant information, or are noisy and unreliable, then [[knowledge discovery]] during the training phase is more difficult. Data preparation and filtering steps can take a considerable amount of processing time. Data pre-processing includes [[Data cleaning|cleaning]], [[instance selection]], [[data normalization|normalization]], [[data transformation|transformation]], [[feature extraction]] and [[Feature selection|selection]]. The product of data pre-processing is the final [[training set]]. Kotsiantis et al. (2006) present a well-known algorithm for each step of data pre-processing.<ref>S. Kotsiantis, D. Kanellopoulos, P. Pintelas, "Data Preprocessing for Supervised Learning", ''International Journal of Computer Science'', 2006, Vol. 1, No. 2, pp. 111–117.</ref>
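
Two of the steps listed above, cleaning and normalization, can be illustrated with a minimal sketch in plain Python. The record layout, the function names <code>clean</code> and <code>min_max_normalize</code>, and the sample values are assumptions made for illustration; they are not taken from Kotsiantis et al. Min-max scaling is used here as one common form of normalization among several.

```python
def clean(records):
    """Cleaning: drop records that contain a missing (None) value."""
    return [r for r in records if all(v is not None for v in r.values())]


def min_max_normalize(records, key):
    """Normalization: rescale one feature linearly to the range [0, 1]."""
    values = [r[key] for r in records]
    lo, hi = min(values), max(values)
    span = hi - lo
    # A constant feature carries no information; map it to 0.0.
    return [{**r, key: (r[key] - lo) / span if span else 0.0} for r in records]


# Hypothetical raw data; the second record has a missing "age".
raw = [
    {"age": 20.0, "income": 1000.0},
    {"age": None, "income": 2500.0},
    {"age": 40.0, "income": 3000.0},
    {"age": 30.0, "income": 2000.0},
]

# The result of the pre-processing steps is the final training set.
training_set = min_max_normalize(clean(raw), "age")
```

In this sketch the incomplete record is discarded rather than repaired; filling in missing values (imputation) is an alternative cleaning strategy.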
 
Many programming languages have libraries that implement common strategies for pre-processing data for machine learning applications. One such library for the Python programming language is pandas.
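
As a hedged illustration of how pandas might be used for such tasks, the sketch below removes duplicate rows, fills a missing value, rescales a numeric column and one-hot encodes a categorical column. The column names and sample values are hypothetical, and the choice of median imputation and min-max scaling is an assumption for the example, not a recommended strategy.

```python
import pandas as pd

# Hypothetical raw data: one missing age and one exact duplicate row.
raw = pd.DataFrame({
    "age": [20.0, None, 40.0, 40.0],
    "city": ["Paris", "Rome", "Rome", "Rome"],
    "label": [0, 1, 1, 1],
})

# Cleaning: drop exact duplicate rows, then fill missing ages with the median.
df = raw.drop_duplicates().reset_index(drop=True)
df["age"] = df["age"].fillna(df["age"].median())

# Normalization: min-max scale the numeric feature to the range [0, 1].
age_min, age_max = df["age"].min(), df["age"].max()
df["age"] = (df["age"] - age_min) / (age_max - age_min)

# Transformation: one-hot encode the categorical feature.
df = pd.get_dummies(df, columns=["city"])
```

After these steps <code>df</code> holds three rows, with the <code>city</code> column replaced by the indicator columns <code>city_Paris</code> and <code>city_Rome</code>.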
==See also==
*[[Data cleansing]]