*[[Data reduction]]
*[[Data wrangling]]
==Example==
In this example, the dataset contains five adults, each recorded with a Sex (Male or Female) and whether or not they are pregnant. Adults 3 and 5 can be identified as impossible data combinations, because both are recorded as Male and pregnant.
{|
|-
|
{| class="wikitable" style="border:none; float:left; margin-top:0; text-align:center;"
!style="background:white; border:none;" colspan="2" rowspan="2"|
!colspan="2" style="background:none;"|
|-
!Sex
!Pregnant
|-
!rowspan="5" style="height:6em;background:none;"|<div>Adult </div>
!1
|Male
|No
|-
!2
|Female
|Yes
|-
!<span style="color:red">3</span>
|'''Male'''
|'''Yes'''
|-
!4
|Female
|No
|-
!<span style="color:red">5</span>
|'''Male'''
|'''Yes'''
|-
|}
|
|}
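The detection step can be sketched in Python as follows. This is a minimal illustration, not a prescribed method: the record layout, the field names (<code>id</code>, <code>sex</code>, <code>pregnant</code>) and the helper <code>is_impossible</code> are assumptions made for this example, and the rule it encodes is simply the one stated above (Male combined with pregnant).

```python
# Hypothetical records mirroring the table above; field names are illustrative.
adults = [
    {"id": 1, "sex": "Male", "pregnant": "No"},
    {"id": 2, "sex": "Female", "pregnant": "Yes"},
    {"id": 3, "sex": "Male", "pregnant": "Yes"},
    {"id": 4, "sex": "Female", "pregnant": "No"},
    {"id": 5, "sex": "Male", "pregnant": "Yes"},
]


def is_impossible(record):
    """Return True when a record is marked both Male and pregnant."""
    return record["sex"] == "Male" and record["pregnant"] == "Yes"


# Collect the identifiers of records that violate the rule.
impossible_ids = [r["id"] for r in adults if is_impossible(r)]
print(impossible_ids)  # [3, 5]
```

In practice such a rule would be one of many consistency checks, each expressed as a predicate over a single record.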
We can perform [[data cleansing]] and delete such records from the table. The records are removed because their presence in the dataset can be attributed to user entry errors or data corruption. One reason to delete them is that impossible data will distort calculations and data manipulation in later steps of the data mining process.
{|
|-
|
{| class="wikitable" style="border:none; float:left; margin-top:0; text-align:center;"
!style="background:white; border:none;" colspan="2" rowspan="2"|
!colspan="2" style="background:none;"|
|-
!Sex
!Pregnant
|-
!rowspan="3" style="height:6em;background:none;"|<div>Adult </div>
!1
|Male
|No
|-
!2
|Female
|Yes
|-
!4
|Female
|No
|-
|}
|
|}
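Deleting the impossible records might look like the following Python sketch. The field names and the rule (Male combined with pregnant) are assumptions chosen to match the tables in this example; the original list is left untouched and a filtered copy is produced.

```python
# Hypothetical records matching the original five-adult table.
adults = [
    {"id": 1, "sex": "Male", "pregnant": "No"},
    {"id": 2, "sex": "Female", "pregnant": "Yes"},
    {"id": 3, "sex": "Male", "pregnant": "Yes"},
    {"id": 4, "sex": "Female", "pregnant": "No"},
    {"id": 5, "sex": "Male", "pregnant": "Yes"},
]

# Keep only records that do not combine Male with pregnant.
cleaned = [
    r for r in adults
    if not (r["sex"] == "Male" and r["pregnant"] == "Yes")
]

remaining_ids = [r["id"] for r in cleaned]
print(remaining_ids)  # [1, 2, 4]
```

Building a new list rather than deleting in place keeps the raw data available, which is useful if the cleansing rule later turns out to be too aggressive.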
Alternatively, we can perform [[data editing]] and change the Sex of the affected adults. Knowing that an adult is pregnant, we can assume that the adult is Female and correct the record accordingly. Editing the dataset in this way gives a clearer analysis when the data is manipulated in later steps of the data mining process.
{|
|-
|
{| class="wikitable" style="border:none; float:left; margin-top:0; text-align:center;"
!style="background:white; border:none;" colspan="2" rowspan="2"|
!colspan="2" style="background:none;"|
|-
!Sex
!Pregnant
|-
!rowspan="5" style="height:6em;background:none;"|<div>Adult </div>
!1
|Male
|No
|-
!2
|Female
|Yes
|-
!<span style="color:blue">3</span>
|'''Female'''
|'''Yes'''
|-
!4
|Female
|No
|-
!<span style="color:blue">5</span>
|'''Female'''
|'''Yes'''
|-
|}
|
|}
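The editing step can be sketched in Python as below. It assumes, as the text does, that the Pregnant field is trustworthy and the Sex field is the one in error; that assumption, the field names and the variable names are illustrative only.

```python
# Hypothetical records matching the original five-adult table.
adults = [
    {"id": 1, "sex": "Male", "pregnant": "No"},
    {"id": 2, "sex": "Female", "pregnant": "Yes"},
    {"id": 3, "sex": "Male", "pregnant": "Yes"},
    {"id": 4, "sex": "Female", "pregnant": "No"},
    {"id": 5, "sex": "Male", "pregnant": "Yes"},
]

edited = []
edited_ids = []
for r in adults:
    fixed = dict(r)  # copy so the original record is not modified
    if fixed["sex"] == "Male" and fixed["pregnant"] == "Yes":
        # Assumption: pregnancy is recorded correctly, so Sex is corrected.
        fixed["sex"] = "Female"
        edited_ids.append(fixed["id"])
    edited.append(fixed)

print(edited_ids)  # [3, 5]
print([r["sex"] for r in edited])
```

Recording which identifiers were changed (here in <code>edited_ids</code>) makes the correction auditable, since an edited value is an inference rather than an observation.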
We can use a form of [[data reduction]] and sort the data by Sex. Doing so simplifies the dataset and lets us choose which Sex to focus on.
{|
|-
|
{| class="wikitable" style="border:none; float:left; margin-top:0; text-align:center;"
!style="background:white; border:none;" colspan="2" rowspan="2"|
!colspan="2" style="background:none;"|
|-
!Sex
!Pregnant
|-
!rowspan="5" style="height:6em;background:none;"|<div>Adult </div>
!2
|Female
|Yes
|-
!4
|Female
|No
|-
!1
|Male
|No
|-
!3
|Male
|Yes
|-
!5
|Male
|Yes
|-
|}
|
|}
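Sorting by Sex and then narrowing the data to one group could be sketched in Python as follows. The field names are assumptions for this example; the sort relies on Python's <code>sorted</code> being stable, so adults with the same Sex keep their original relative order, and the final filter is one possible way to "focus" on a single Sex.

```python
# Hypothetical records matching the original five-adult table.
adults = [
    {"id": 1, "sex": "Male", "pregnant": "No"},
    {"id": 2, "sex": "Female", "pregnant": "Yes"},
    {"id": 3, "sex": "Male", "pregnant": "Yes"},
    {"id": 4, "sex": "Female", "pregnant": "No"},
    {"id": 5, "sex": "Male", "pregnant": "Yes"},
]

# Sort alphabetically by Sex; "Female" sorts before "Male".
by_sex = sorted(adults, key=lambda r: r["sex"])
sorted_ids = [r["id"] for r in by_sex]
print(sorted_ids)  # [2, 4, 1, 3, 5]

# Reduce the dataset to the group of interest.
female_ids = [r["id"] for r in by_sex if r["sex"] == "Female"]
print(female_ids)  # [2, 4]
```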
==Data mining==
The origins of data preprocessing are located in [[data mining]].{{cn|date=March 2021}} The idea is to aggregate existing information and search within the content. Later, it was recognized that machine learning and neural networks also require a data preprocessing step. It has therefore become a universal technique used in computing in general.
|