Micahtobon (talk | contribs) m Grammar fixes |
RandFreeman (talk | contribs) Adding local short description: "Review and adjustment of survey data", overriding Wikidata description "data cleanup process" |
(8 intermediate revisions by 8 users not shown) | |||
Line 1:
{{Short description|Review and adjustment of survey data}}
'''Data editing''' is the process of reviewing and adjusting collected [[survey data]].<ref>{{cite web |last1=Ferguson |first1=Dania P. |title=AN INTRODUCTION TO THE DATA EDITING PROCESS |url=https://unece.org/DAM/stats/publications/editing/SDE1.pdf |website=unece.org/}}</ref> Data editing helps define guidelines that reduce potential bias and ensure consistent estimates, leading to a clear analysis of the data set by correcting inconsistent data using the methods described later in this article.
==Editing methods==
Editing methods are the procedures and processes used to detect and handle errors in data. The goal of data editing is to improve the quality of the statistical data produced. By detecting and correcting errors, these modifications can greatly improve the quality of the resulting analytics. Examples of different techniques to data editing such as micro-editing, macro-editing, selective editing, or the different tools used to achieve data
===Interactive editing===
Line 12:
*Compare the respondent's data to data from similar respondents
*Use the subject matter knowledge of the human editor
Interactive editing is a standard way to edit data. It can be used to edit both [[categorical data|categorical]] and [[continuity (mathematics)|continuous]] data.<ref>Waal, Ton de et al. "Handbook of Statistical Data Editing and Imputation". Wiley publication, 2011, p.15.</ref> Interactive editing reduces the time frame needed to complete the cyclical process of review and adjustment.<ref name="auto1">{{Cite web|url=http://www.unece.org/info/ece-homepage.html|title=UNECE Homepage|website=www.unece.org}}</ref> Interactive editing also requires an understanding of the data set and the possible results that would come from an analysis of the data.
===Selective editing===
Line 19:
*The critical stream
*The non-critical stream
The critical stream consists of records that are more likely to contain influential errors. These critical records are edited in a traditional interactive manner. The records in the non-critical stream, which are unlikely to contain influential errors, are not edited in a computer-assisted manner.<ref name="auto">Waal, Ton de et al. "Handbook of Statistical Data Editing and Imputation". Wiley publication, 2011, p.16.</ref>
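The split into two streams can be sketched with a score function. The following is a minimal illustration, not the procedure from the cited handbook: the score (a design weight multiplied by the absolute deviation from an anticipated value), the threshold, and all field names are assumptions made for the example.

```python
from typing import NamedTuple


class Record(NamedTuple):
    record_id: str
    reported: float      # value reported by the respondent
    anticipated: float   # hypothetical reference value, e.g. last period's figure
    weight: float        # hypothetical survey design weight


def score(record: Record) -> float:
    """Illustrative score: weighted absolute deviation from the anticipated value."""
    return record.weight * abs(record.reported - record.anticipated)


def split_streams(records, threshold):
    """Return (critical, non_critical) lists based on an assumed score threshold."""
    critical = [r for r in records if score(r) > threshold]
    non_critical = [r for r in records if score(r) <= threshold]
    return critical, non_critical


records = [
    Record("A", reported=120.0, anticipated=100.0, weight=1.0),   # score 20
    Record("B", reported=900.0, anticipated=100.0, weight=2.0),   # score 1600
    Record("C", reported=101.0, anticipated=100.0, weight=50.0),  # score 50
]
critical, non_critical = split_streams(records, threshold=100.0)
```

With these made-up values, only record B would be routed to the critical stream for interactive review.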
=== Data
Data editing can be accomplished in many ways, and the choice of technique depends primarily on the data set being explored.
==== Validity and
The validity of a data set depends on the completeness of the responses provided by the respondents. One method of data editing is to ensure that all responses are complete in fields that require a numerical or non-numerical answer. See the example below.
[[File:Completeness Table for Data Editing.png|frame|none
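A completeness check of this kind could be sketched as follows. The required field names and the sample responses are hypothetical, and treating <code>None</code> or a blank string as "missing" is an assumption of this example.

```python
REQUIRED_FIELDS = ("age", "region", "income")  # hypothetical required fields


def missing_fields(response, required=REQUIRED_FIELDS):
    """Return the required fields that are absent, None, or blank in a response."""
    missing = []
    for field in required:
        value = response.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(field)
    return missing


responses = [
    {"id": 1, "age": 34, "region": "North", "income": 52000},
    {"id": 2, "age": None, "region": "South", "income": 41000},
    {"id": 3, "age": 51, "region": "  ", "income": None},
]

# Map each incomplete response id to the fields that still need an answer.
incomplete = {r["id"]: missing_fields(r) for r in responses if missing_fields(r)}
```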
==== Duplicate data entry ====
Verifying that each record is unique is an important aspect of data editing, as it ensures that all data provided was entered only once. This reduces the possibility of repeated data skewing [[analytics]] reporting. See the example below.
[[File:Duplicate Data Entries in Data Editing.png|frame|none
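One way to detect repeated entries is to compare records on a set of identifying fields. This is a sketch under assumptions: the key fields (<code>respondent_id</code>, <code>question</code>) and the sample entries are invented, and which fields identify a unique record depends on the survey.

```python
def find_duplicates(records, key_fields):
    """Return records whose key fields repeat an earlier record, in input order."""
    seen = set()
    duplicates = []
    for record in records:
        key = tuple(record[f] for f in key_fields)
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)
    return duplicates


entries = [
    {"respondent_id": 101, "question": "Q1", "answer": "yes"},
    {"respondent_id": 102, "question": "Q1", "answer": "no"},
    {"respondent_id": 101, "question": "Q1", "answer": "yes"},  # entered twice
]
duplicates = find_duplicates(entries, key_fields=("respondent_id", "question"))
```

The first occurrence of each key is kept and only later repeats are reported, so a human editor can decide whether to drop them.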
==== Outliers ====
It is common to find outliers in data sets, which, as described before, are values that do not fit a model of the data well. These extreme values can be identified from the distribution of data points in previous data series or in parallel data series for the same data set. Such values may be erroneous and require further analysis to determine the validity of the response. See the example below.
[[File:Outliers in Data Editing.png|frame|none
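A simple outlier screen can be sketched with the interquartile range (IQR). The IQR rule and the multiplier of 1.5 are a common convention chosen here for illustration, not a method prescribed by the article's sources; the sample values are invented.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


hours_worked = [38, 40, 41, 39, 40, 42, 37, 40, 400]  # 400 is likely a keying error
flagged = iqr_outliers(hours_worked)
```

Flagged values are candidates for review rather than automatic deletion, since an extreme value can still be a valid response.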
==== Logical
Logical consistency is the presence of logical relationships and interdependence between the variables. This editing requires an understanding of the data set and the ability to identify errors in data based on previous reports or information. This type of data editing is used to account for the differences between data fields or variables. See the example below.
[[File:Logical Consistency in Data Editing.png|frame|none
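Cross-field checks of this kind can be expressed as explicit rules. In the sketch below, the two rules and every field name are hypothetical examples of relationships that should hold between variables.

```python
def consistency_errors(record):
    """Return descriptions of the hypothetical cross-field rules a record violates."""
    errors = []
    # A respondent cannot have been employed for longer than they have been alive.
    if record["years_employed"] > record["age"]:
        errors.append("years_employed exceeds age")
    # The number of children cannot exceed the total household size.
    if record["children"] > record["household_size"]:
        errors.append("children exceeds household_size")
    return errors


records = [
    {"id": 1, "age": 45, "years_employed": 20, "household_size": 4, "children": 2},
    {"id": 2, "age": 19, "years_employed": 25, "household_size": 1, "children": 3},
]
report = {r["id"]: consistency_errors(r) for r in records}
```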
===Macro editing===
Line 51 ⟶ 53:
====Distribution method====
The available data are used to characterize the [[Coefficient of variation|distribution of the variables]]. All individual values are then compared with the distribution. Records containing values that could be considered uncommon (given the distribution) are candidates for further inspection and possibly for editing.<ref>Bethlehem, J. "Applied Survey Methods A Statistical Perspective ". Wiley publication, 2009, p.205.</ref>
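The comparison against a distribution can be sketched as follows. Summarizing the distribution by its mean and standard deviation and using a cut-off of three standard deviations are assumptions of this example, as are the variable names and figures.

```python
import statistics


def unusual_values(reference, candidates, max_z=3.0):
    """Flag candidates more than max_z standard deviations from the reference mean."""
    mean = statistics.mean(reference)
    stdev = statistics.stdev(reference)
    return [v for v in candidates if abs(v - mean) > max_z * stdev]


reference_turnover = [98.0, 102.0, 100.0, 101.0, 99.0]  # hypothetical available data
new_turnover = [100.5, 97.0, 150.0, 103.0]

# Values that are uncommon given the reference distribution.
to_inspect = unusual_values(reference_turnover, new_turnover)
```

Here the distribution is estimated from a separate reference series so that an extreme candidate value does not inflate the spread it is compared against.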
===Automatic editing===
Line 57 ⟶ 59:
In automatic editing, records are edited by a computer without human intervention.<ref>Waal, Ton de et al. "Handbook of Statistical Data Editing and Imputation". Wiley publication</ref> Prior knowledge about the values of a single variable or a combination of variables can be formulated as a set of edit rules which specify or constrain the admissible values
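Edit rules of this kind can be represented as named predicates that a valid record must satisfy. The rules, thresholds, and field names below are hypothetical. The sketch only detects violations; deciding which field to change and what value to substitute is a separate step that is not shown.

```python
# Each edit rule pairs a name with a predicate that must hold for a valid record.
EDIT_RULES = [
    ("age in 0..120", lambda r: 0 <= r["age"] <= 120),
    ("income non-negative", lambda r: r["income"] >= 0),
    ("minors have no income", lambda r: r["age"] >= 16 or r["income"] == 0),
]


def failed_edits(record, rules=EDIT_RULES):
    """Return the names of the edit rules that the record violates."""
    return [name for name, is_valid in rules if not is_valid(record)]


record = {"age": 12, "income": 30000}
violations = failed_edits(record)
```

The first two rules constrain a single variable, while the third constrains a combination of variables.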
=== Determinants of
Data editing is limited by the capacity and resources of any given study. These determinants can have a positive or negative impact on the post-analysis of the data set. Below are several determinants of data editing.
'''Available resources:''' <ref name=":0" />