Content deleted Content added
Tag: Reverted |
Tag: Reverted |
||
Line 21:
===Data mining===
{{Cleanup section|date=August 2023|reason=This section requires grammar and capitalisation fixes}}
Data preprocessing uses data cleaning to remove unwanted data, leaving the user with a dataset that contains more valuable information for manipulation later in the data mining process. Editing such a dataset to correct data corruption or human error is a crucial step.
Users transform existing files into new ones for a variety of reasons. Aspects of data preprocessing may include imputing missing values, aggregating numerical quantities, and transforming continuous data into categories ([[data binning]]).<ref>{{Cite book |last1=Hastie |first1=Trevor |url=https://books.google.com/books?id=eBSgoAEACAAJ |title=The Elements of Statistical Learning: Data Mining, Inference, and Prediction |last2=Tibshirani |first2=Robert |last3=Friedman |first3=Jerome H. |date=2009 |publisher=Springer |isbn=978-0-387-84884-6 |language=en}}</ref> More advanced techniques, such as principal component analysis and [[feature selection]], rely on statistical formulas and are applied to complex datasets, such as those recorded by GPS trackers and motion capture devices.
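
The three basic operations named above (imputation, aggregation, and binning) can be illustrated with a minimal Python sketch. It is only one possible reading of those operations: mean imputation is assumed as the fill strategy, and the function names, sample values, bin edges, and category labels are all hypothetical and do not come from the cited source.

```python
from collections import defaultdict
from statistics import mean


def impute_missing(values):
    """Replace None entries with the mean of the observed values.

    Mean imputation is an assumption made for this sketch; other
    strategies (median, model-based) are equally possible.
    Raises statistics.StatisticsError if every value is missing.
    """
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def aggregate_by_key(records):
    """Sum the numerical quantities of (key, amount) pairs per key."""
    totals = defaultdict(float)
    for key, amount in records:
        totals[key] += amount
    return dict(totals)


def bin_value(value, edges, labels):
    """Map a continuous value to a category label.

    `edges` holds ascending upper bounds; `labels` has one more entry
    than `edges`, and the last label covers everything at or above
    the final edge.
    """
    for upper, label in zip(edges, labels):
        if value < upper:
            return label
    return labels[-1]


# Hypothetical sample data, invented for illustration only.
ages = [23, None, 41, 67, None, 35]
filled = impute_missing(ages)

categories = [bin_value(a, [30, 60], ["young", "middle", "senior"]) for a in filled]

sales = [("north", 10.0), ("south", 4.5), ("north", 2.5)]
totals = aggregate_by_key(sales)
```

In this sketch the two missing ages are replaced by the mean of the four observed ones before binning, so the order of the steps matters: binning first would leave the missing entries without a category.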
|