Data editing: Difference between revisions

Changes to grammar and structure of sentences and includes some examples of data editing.
Line 1:
'''Data editing''' is the process of reviewing and adjusting collected [[survey data]]. Data editing helps define guidelines that reduce potential bias and ensure consistent estimates, leading to a clear analysis of the data set by correcting inconsistent data using the methods described later in this article.<ref>{{Cite web|title=National Center for Education Statistics (NCES) Home Page, part of the U.S. Department of Education|url=https://nces.ed.gov/|access-date=2020-12-06|website=nces.ed.gov|language=EN}}</ref> The purpose is to control the quality of the collected data.<ref>{{Cite web|url=http://www.unece.org/stats/editing.html|title=UNECE}}</ref> Data editing can be performed manually, with the assistance of a computer, or through a combination of both.<ref>{{Cite web|url=https://www150.statcan.gc.ca/n1/edu/power-pouvoir/ch3/editing-edition/5214781-eng.htm|title=Statistics: Power from Data! Data editing|website=www150.statcan.gc.ca}}</ref>
 
==Editing methods==
Line 10:
*Compare the respondent's data to data from similar respondents
*Use the subject matter knowledge of the human editor
Interactive editing is a standard way to edit data. It can be used to edit both [[categorical data|categorical]] and [[continuity (mathematics)|continuous]] data.<ref>Waal, Ton de et al. "Handbook of Statistical Data Editing and Imputation". Wiley publication, 2011,p.15.</ref> Interactive editing reduces the time frame needed to complete the cyclical process of review and adjustment.<ref name="auto1">{{Cite web|url=http://www.unece.org/info/ece-homepage.html|title=UNECE Homepage|website=www.unece.org}}</ref> Interactive editing also requires an understanding of the data set and the possible results that would come from an analysis of the data.
 
===Selective editing===
Line 18:
*The non-critical stream
The critical stream consists of records that are more likely to contain influential errors. These critical records are edited in a traditional interactive manner. The records in the non-critical stream, which are unlikely to contain influential errors, are not edited in a computer-assisted manner.<ref name="auto">Waal, Ton de et al. "Handbook of Statistical Data Editing and Imputation". Wiley publication, 2011,p.16.</ref>
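
The split into the two streams can be illustrated with a minimal sketch. It assumes each record carries a survey weight, a reported value and an expected value, and that a record's potential influence is scored as the weight times the absolute deviation from the expected value. The field names, the score formula and the threshold are hypothetical choices made for this illustration; the cited source is not quoted here as prescribing them.

```python
def split_streams(records, threshold):
    """Split records into (critical, non_critical) lists by a simple score.

    The score used here, weight * |reported - expected|, is an assumed
    illustration of an influence measure, not a prescribed formula.
    """
    critical, non_critical = [], []
    for record in records:
        score = record["weight"] * abs(record["reported"] - record["expected"])
        if score >= threshold:
            critical.append(record)      # candidate for interactive editing
        else:
            non_critical.append(record)  # unlikely to hold an influential error
    return critical, non_critical


# Hypothetical example records
records = [
    {"id": "A", "weight": 10.0, "reported": 500.0, "expected": 100.0},
    {"id": "B", "weight": 10.0, "reported": 105.0, "expected": 100.0},
    {"id": "C", "weight": 1.0, "reported": 130.0, "expected": 100.0},
]
critical, non_critical = split_streams(records, threshold=1000.0)
```

In this example only record A reaches the threshold, so it alone would be routed to interactive editing.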
 
=== Data editing techniques ===
Data editing can be accomplished in many ways, and the choice of technique depends primarily on the data set being explored.
 
==== Validity and completeness of data ====
The validity of a data set depends on the completeness of the responses provided by the respondents. One method of data editing is to verify that every field requiring a numerical or non-numerical answer contains a complete response.
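
A completeness check of this kind might look like the following minimal sketch. It assumes responses are stored as dictionaries and treats a field as incomplete when it is absent, `None`, or blank; the function name, field names and sample data are hypothetical.

```python
def find_incomplete(records, required_fields):
    """Return {record index: [missing field names]} for records with gaps."""
    incomplete = {}
    for index, record in enumerate(records):
        missing = [
            field
            for field in required_fields
            # Absent, None, or whitespace-only values count as incomplete.
            if record.get(field) is None or str(record.get(field)).strip() == ""
        ]
        if missing:
            incomplete[index] = missing
    return incomplete


# Hypothetical survey responses
responses = [
    {"age": 34, "region": "North"},
    {"age": None, "region": "South"},
    {"age": 51, "region": "  "},
]
flagged = find_incomplete(responses, ["age", "region"])
```

Here the second response is flagged for a missing `age` and the third for a blank `region`; the first passes the check.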
 
==== Duplicate data entry ====
Verifying that the data is unique is an important aspect of data editing, because it ensures that each response was entered only once. This reduces the possibility of repeated data that could skew [[analytics]] reporting.
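
As a minimal sketch, duplicate entries can be separated from unique ones by comparing an identifying key. The example assumes that a hypothetical `respondent_id` field identifies a response; in practice the appropriate key depends on the survey.

```python
def split_duplicates(records, key_fields):
    """Return (unique, duplicates), keeping the first record seen per key."""
    seen = set()
    unique, duplicates = [], []
    for record in records:
        key = tuple(record.get(field) for field in key_fields)
        if key in seen:
            duplicates.append(record)  # repeated entry, set aside for review
        else:
            seen.add(key)
            unique.append(record)
    return unique, duplicates


# Hypothetical entries; respondent 1 was entered twice
entries = [
    {"respondent_id": 1, "income": 42000},
    {"respondent_id": 2, "income": 38000},
    {"respondent_id": 1, "income": 42000},
]
unique, duplicates = split_duplicates(entries, ["respondent_id"])
```

The repeated entry is returned separately instead of being silently dropped, so that an editor can confirm it is a true duplicate.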
 
==== Outliers ====
It is common to find outliers in data sets, which, as described before, are values that do not fit a model of the data well. These extreme values can be identified from the distribution of data points in previous data series or in parallel data series for the same data set. Such values may be erroneous and require further analysis to determine the validity of the response.
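
One possible way to flag such values is sketched below. It assumes a reference series (for example, a previous data series) is available and applies the common 1.5 × interquartile-range rule; this rule is only one of several possible choices and is not stated in the sources cited here. The function name and sample values are hypothetical.

```python
import statistics


def flag_outliers(current, reference, k=1.5):
    """Return (index, value) pairs from `current` outside the reference fences."""
    # Quartiles of the reference series (e.g. a previous data series)
    q1, _, q3 = statistics.quantiles(reference, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [
        (index, value)
        for index, value in enumerate(current)
        if value < lower or value > upper
    ]


# Hypothetical previous series and current responses
previous_series = [10, 12, 11, 13, 12, 11, 14, 12]
current_series = [12, 13, 40, 11, 2]
suspects = flag_outliers(current_series, previous_series)
```

Flagged values are candidates for review, not automatically errors: the values 40 and 2 in this example would still need to be checked against the respondent's other answers.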
 
===Macro editing===