Data editing: Difference between revisions

'''Data editing''' is the process of reviewing and adjusting collected survey data. Its purpose is to control the quality of the collected data.<ref>[http://www.unece.org/stats/editing.html UNECE]</ref> Data editing can be performed manually, with the assistance of a computer, or with a combination of both.<ref>http://www.statcan.gc.ca/edu/power-pouvoir/ch3/editing-edition/5214781-eng.htm</ref>
 
== Editing methods ==
 
*Compare the respondent's data to data from similar respondents
*Use the subject matter knowledge of the human editor
Interactive editing is a standard way to edit data. It can be used to edit both [[categorical]] and [[continuous]] data.<ref>Waal, Ton de et al. "Handbook of Statistical Data Editing and Imputation". Wiley publication, 2011, p. 15.</ref> Interactive editing reduces the time needed to complete the cyclical process of review and adjustment.<ref>http://www.unece.org/fileadmin/DAM/stats/publications/editing/SDE1chA.pdf</ref>
 
=== Selective editing ===
 
Selective editing is an umbrella term for several methods used to identify influential errors<ref group=note>errors that have a substantial impact on the publication figures</ref> and outliers.<ref group=note>values that do not fit a model of the data well</ref> Selective editing techniques aim to apply interactive editing to a well-chosen subset of the records, so that the limited time and resources available for interactive editing are allocated to those records where it has the most effect on the quality of the final estimates of publication figures. In selective editing, data is split into two streams:
*The critical stream
*The noncritical stream
The critical stream consists of records that are more likely to contain influential errors. These critical records are edited in a traditional interactive manner. The records in the noncritical stream, which are unlikely to contain influential errors, are not edited in a computer-assisted manner.<ref>Waal, Ton de et al. "Handbook of Statistical Data Editing and Imputation". Wiley publication, 2011, p. 16.</ref>
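
The split into two streams can be illustrated with a minimal sketch. The score used here (survey weight times the absolute deviation of the reported value from an anticipated value), the field names and the threshold are assumptions made for illustration only; the sources cited above do not prescribe this particular score.

```python
def split_streams(records, threshold):
    """Split records into a critical and a noncritical stream.

    The score is a hypothetical one: weight * |reported - expected|.
    Records scoring above the threshold go to the critical stream.
    """
    critical, noncritical = [], []
    for rec in records:
        score = rec["weight"] * abs(rec["reported"] - rec["expected"])
        if score > threshold:
            critical.append(rec)
        else:
            noncritical.append(rec)
    return critical, noncritical


# Illustrative records; all values are invented for the example.
records = [
    {"id": 1, "weight": 10.0, "reported": 105.0, "expected": 100.0},
    {"id": 2, "weight": 10.0, "reported": 1000.0, "expected": 100.0},
    {"id": 3, "weight": 1.0, "reported": 130.0, "expected": 100.0},
]
critical, noncritical = split_streams(records, threshold=100.0)
```

In this sketch only the records in <code>critical</code> would be passed on to interactive editing.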
 
=== Macro editing ===
 
There are two forms of macro editing: the aggregation method and the distribution method.<ref>Waal, Ton de et al. "Handbook of Statistical Data Editing and Imputation". Wiley publication, 2011, p. 16.</ref>
 
==== Aggregation method ====
 
This method is followed in almost every statistical agency before publication: it verifies whether the figures to be published seem plausible. This is accomplished by comparing quantities in publication tables with the same quantities in previous publications. If an unusual value is observed, a micro-editing procedure is applied to the individual records and fields contributing to the suspicious quantity.<ref>http://www.unece.org/fileadmin/DAM/stats/publications/editing/SDE1chB.pdf</ref>
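
As a rough illustration of this comparison, the sketch below flags published totals whose relative change from the previous publication exceeds a tolerance. The function name, the 20% tolerance and the figures are assumptions chosen for the example, not values taken from the cited sources.

```python
def suspicious_aggregates(current, previous, tolerance=0.2):
    """Return the keys whose relative change from the previous
    publication exceeds the (assumed) tolerance."""
    flagged = []
    for key, value in current.items():
        prev = previous.get(key)
        if prev is None or prev == 0:
            continue  # no basis for a relative comparison
        if abs(value - prev) / abs(prev) > tolerance:
            flagged.append(key)
    return flagged


# Invented publication totals per industry.
previous = {"retail": 1200.0, "transport": 800.0}
current = {"retail": 1260.0, "transport": 1400.0}
flagged = suspicious_aggregates(current, previous)
```

The records contributing to each flagged aggregate would then be examined individually.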
 
==== Distribution method ====
 
The available data are used to characterize the distribution of the variables. All individual values are then compared with this distribution. Records containing values that could be considered uncommon (given the distribution) are candidates for further inspection and possibly for editing.<ref>Bethlehem, J. "Applied Survey Methods: A Statistical Perspective". Wiley publication, 2009, p. 205.</ref>
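
One possible way to sketch this idea is to summarize the distribution by its median and median absolute deviation (MAD) and to flag values that lie far from the median. The choice of these statistics and the cutoff of 3 are assumptions made for illustration; the cited source does not prescribe a specific rule.

```python
from statistics import median


def uncommon_values(values, cutoff=3.0):
    """Return indices of values lying more than `cutoff` MADs from the median."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to compare against
    return [i for i, v in enumerate(values) if abs(v - med) / mad > cutoff]


# Invented turnover figures; the last one looks like a unit error.
turnover = [98, 102, 100, 97, 103, 101, 990]
candidates = uncommon_values(turnover)
```

The returned indices identify records that are candidates for further inspection, not values that are necessarily wrong.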
 
=== Automatic editing ===
 
In automatic editing, records are edited by a computer without human intervention.<ref>Waal, Ton de et al. "Handbook of Statistical Data Editing and Imputation". Wiley publication, 2011, p. 16.</ref><ref>http://www.unece.org/fileadmin/DAM/stats/publications/editing/SDE1chC.pdf</ref> Prior knowledge about the values of a single variable or a combination of variables can be formulated as a set of edit rules that specify or constrain the admissible values.<ref>http://www.cbs.nl/NR/rdonlyres/E1FF7D78-E697-42E7-A36D-94AE74EDB83A/0/201309x10pub.pdf</ref>
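
Edit rules of this kind can be written as simple predicates on a record. The sketch below only checks which rules a record violates; it does not decide which fields to change, which a full automatic editing system would also have to do. The rule names, field names and the age limit are hypothetical.

```python
# Hypothetical edit rules: each maps a record to True when the rule is satisfied.
EDIT_RULES = {
    "age_nonnegative": lambda r: r["age"] >= 0,
    "married_implies_adult": lambda r: r["marital_status"] != "married" or r["age"] >= 16,
    "profit_balance": lambda r: r["turnover"] - r["costs"] == r["profit"],
}


def violated_rules(record, rules=EDIT_RULES):
    """Return the names of the edit rules that the record fails."""
    return [name for name, rule in rules.items() if not rule(record)]


# Invented record that breaks one rule involving a combination of variables.
record = {"age": 12, "marital_status": "married",
          "turnover": 500, "costs": 300, "profit": 200}
failed = violated_rules(record)
```

A record with an empty list of violated rules is consistent with the edit rules and needs no change.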
 
== Notes ==