Data preprocessing: Difference between revisions

m Data mining: Fixed grammar
Tags: Mobile edit Mobile app edit Android app edit
top: Add subsections
Line 15:
| pmc= 5721660}}</ref> If the data contain a high proportion of irrelevant, redundant, noisy or unreliable information, [[knowledge discovery]] during the training phase may be more difficult. [[Data preparation]] and filtering steps can take a considerable amount of processing time. Examples of methods used in data preprocessing include [[Data cleaning|cleaning]], [[instance selection]], [[data normalization|normalization]], [[One-hot|one-hot encoding]], [[data transformation]], [[feature extraction]] and [[feature selection]].
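
Two of the methods listed above, normalization and one-hot encoding, can be illustrated with a minimal sketch. It assumes min-max scaling as the normalization variant (one of several in use), and the helper names `min_max_normalize` and `one_hot_encode` as well as the sample values are hypothetical, chosen only for illustration.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_encode(labels):
    """Map each categorical label to a 0/1 indicator vector."""
    categories = sorted(set(labels))          # fixed, reproducible column order
    index = {c: i for i, c in enumerate(categories)}
    encoded = []
    for label in labels:
        row = [0] * len(categories)
        row[index[label]] = 1                 # exactly one position is set per row
        encoded.append(row)
    return categories, encoded


# Illustrative sample data (not taken from any real dataset).
heights_cm = [150.0, 160.0, 170.0, 200.0]
colours = ["red", "green", "red", "blue"]

normalized = min_max_normalize(heights_cm)
categories, encoded = one_hot_encode(colours)
```

In practice these steps are usually delegated to a library, but the arithmetic is the same: each numeric value is shifted and scaled by the observed range, and each category becomes its own indicator column.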
 
==Applications==
===Data mining===
{{Cleanup section|date=August 2023|reason=This section requires grammar and capitalisation fixes}}
The origins of data preprocessing lie in [[data mining]].{{cn|date=March 2021}} The idea is to aggregate existing information and search its content. It was later recognized that machine learning and neural networks also need a data preprocessing step, so it has become a universal technique used throughout computing.
Line 23 ⟶ 24:
Users transform existing files into new ones for many reasons. Data preprocessing aims to fill in missing values, aggregate information, label data with categories ([[data binning]]) and smooth a trajectory.{{cn|date=March 2021}} More advanced techniques such as principal component analysis and [[feature selection]] rely on statistical formulas and are applied to complex datasets, such as those recorded by GPS trackers and motion capture devices.
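
The basic objectives named above (filling in missing values, binning and smoothing) can be sketched as follows. This is only an illustration under stated assumptions: mean imputation and a centred moving average are one possible choice each for imputation and smoothing, and the function names, bin edges and speed readings are hypothetical.

```python
from statistics import mean


def fill_missing(values):
    """Replace None entries with the mean of the observed values (mean imputation)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def bin_label(value, edges, labels):
    """Return the label of the first bin whose upper edge exceeds the value.

    `edges` holds the upper bounds of all bins except the last one,
    so it has one element fewer than `labels`.
    """
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]


def moving_average(values, window=3):
    """Smooth a sequence with a centred moving average, shrinking the window at the ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(mean(values[lo:hi]))
    return smoothed


# Illustrative speed readings with one missing sample.
speeds = [4.0, None, 6.0, 20.0, 8.0]

filled = fill_missing(speeds)
binned = [bin_label(v, [5.0, 10.0], ["slow", "medium", "fast"]) for v in filled]
smoothed = moving_average(filled)
```

Note that the order of the steps matters: smoothing or binning before imputation would have to handle the missing entry explicitly.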
 
===Semantic data preprocessing===
Semantic data mining is a subset of data mining that specifically seeks to incorporate [[domain knowledge]], such as formal semantics, into the data mining process. Domain knowledge is the knowledge of the environment the data was processed in. Domain knowledge can have a positive influence on many aspects of data mining, such as filtering out redundant or inconsistent data during the preprocessing phase.<ref>{{cite web |title=Semantic Data Mining: A Survey of Ontology-based Approaches |author=Dou, Deijing and Wang, Hao and Liu, Haishan |publisher=University of Oregon |url=http://ix.cs.uoregon.edu/~dou/research/papers/icsc15_invited.pdf |language=en-US}}</ref> Domain knowledge also works as a constraint: it serves as a set of prior knowledge that reduces the space to be searched and acts as a guide to the data. Simply put, semantic preprocessing uses knowledge of the data's original environment to filter the data more correctly and efficiently.
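
As a heavily simplified sketch of domain knowledge acting as a constraint, the example below filters out redundant and inconsistent records using plausible value ranges. The ranges stand in for real domain knowledge; ontology-based approaches are considerably richer, and the names `DOMAIN_RANGES`, `is_consistent` and `semantic_filter` as well as the sample records are hypothetical.

```python
# Hypothetical domain knowledge: plausible value ranges per attribute.
DOMAIN_RANGES = {
    "age_years": (0, 120),
    "body_temp_c": (30.0, 45.0),
}


def is_consistent(record, ranges=DOMAIN_RANGES):
    """Check that every constrained attribute is present and within its plausible range."""
    for field, (lo, hi) in ranges.items():
        value = record.get(field)
        if value is None or not (lo <= value <= hi):
            return False
    return True


def semantic_filter(records, ranges=DOMAIN_RANGES):
    """Drop exact duplicates and records that violate the domain constraints."""
    seen = set()
    kept = []
    for record in records:
        key = tuple(sorted(record.items()))   # hashable fingerprint of the record
        if key in seen or not is_consistent(record, ranges):
            continue
        seen.add(key)
        kept.append(record)
    return kept


# Illustrative records (invented for this example).
records = [
    {"age_years": 34, "body_temp_c": 36.8},
    {"age_years": 34, "body_temp_c": 36.8},   # redundant: exact duplicate
    {"age_years": 210, "body_temp_c": 37.1},  # inconsistent: implausible age
    {"age_years": 58, "body_temp_c": 98.6},   # inconsistent: outside the Celsius range
]

cleaned = semantic_filter(records)
```

Here the constraints shrink the set of records that later mining steps have to consider, which is the sense in which prior knowledge reduces the search space.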