Revision as of 11:32, 7 August 2023, EditorOnOccasion (minor edit), →Data mining: Fixed grammar
Revision as of 11:34, 7 August 2023, EditorOnOccasion, →top: Add subsections
Line 15:
| pmc= 5721660}}</ref> If the data contain a high proportion of irrelevant, redundant, noisy or unreliable information, then [[knowledge discovery]] during the training phase may be more difficult. [[Data preparation]] and filtering steps can take a considerable amount of processing time. Examples of methods used in data preprocessing include [[Data cleaning|cleaning]], [[instance selection]], [[data normalization|normalization]], [[One-hot|one-hot encoding]], [[data transformation]], [[feature extraction]] and [[feature selection]].

==~~Data mining~~Applications==
===Data mining===
{{Cleanup section|date=August 2023|reason=This section requires grammar and capitalisation fixes}}
The origins of data preprocessing lie in [[data mining]].{{cn|date=March 2021}} The idea is to aggregate existing information and search its content. Later it was recognized that machine learning and neural networks also need a data preprocessing step, so it has become a universal technique used in computing in general.

Line 23 ⟶ 24:
Users transform existing files into new ones for many reasons. Data preprocessing aims to fill in missing values, aggregate information, label data with categories ([[data binning]]) and smooth a trajectory.{{cn|date=March 2021}} More advanced techniques such as principal component analysis and [[feature selection]] rely on statistical formulas and are applied to complex datasets recorded by GPS trackers and motion capture devices.

===Semantic data preprocessing===
Semantic data mining is a subset of data mining that specifically seeks to incorporate [[domain knowledge]], such as formal semantics, into the data mining process. Domain knowledge is the knowledge of the environment the data was processed in.
Domain knowledge can have a positive influence on many aspects of data mining, such as filtering out redundant or inconsistent data during the preprocessing phase.<ref>{{cite web |title=Semantic Data Mining: A Survey of Ontology-based Approaches |author=Dou, Deijing and Wang, Hao and Liu, Haishan |publisher=University of Oregon |url=http://ix.cs.uoregon.edu/~dou/research/papers/icsc15_invited.pdf |language=en-US}}</ref> Domain knowledge also works as a constraint: it serves as a set of prior knowledge that reduces the space that has to be searched and acts as a guide to the data. Simply put, semantic preprocessing seeks to filter data more correctly and efficiently by using the original environment of that data.
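
The filtering role of domain knowledge described above can be illustrated with a minimal Python sketch. It is not taken from the cited survey: the `Reading` record, the `MAX_SPEED_KMH` bound and the `semantic_filter` function are hypothetical names, and the rules (valid coordinate ranges, a plausible speed limit, one reading per device and timestamp) are assumed examples of domain knowledge for GPS-tracker data.

```python
from dataclasses import dataclass

# Assumed domain rule (illustrative only): no tracked ground vehicle
# travels faster than this.
MAX_SPEED_KMH = 300.0


@dataclass(frozen=True)
class Reading:
    """Hypothetical GPS-tracker record used only for this sketch."""
    device_id: str
    timestamp: int      # seconds since epoch
    latitude: float     # degrees
    longitude: float    # degrees
    speed_kmh: float


def is_consistent(r: Reading) -> bool:
    """Domain constraint: coordinates and speed must be physically plausible."""
    return (
        -90.0 <= r.latitude <= 90.0
        and -180.0 <= r.longitude <= 180.0
        and 0.0 <= r.speed_kmh <= MAX_SPEED_KMH
    )


def semantic_filter(readings):
    """Drop inconsistent readings, then drop redundant (duplicate) ones."""
    seen = set()
    kept = []
    for r in readings:
        if not is_consistent(r):
            continue  # contradicts the assumed domain rules
        key = (r.device_id, r.timestamp)
        if key in seen:
            continue  # redundant: same device already reported this instant
        seen.add(key)
        kept.append(r)
    return kept
```

Encoding the rules as a predicate keeps the prior knowledge separate from the filtering loop, so a different domain could supply its own `is_consistent` without changing `semantic_filter`.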

Data preprocessing: Difference between revisions