Micahtobon (talk | contribs) Changes to grammar and structure of sentences and includes some examples of data editing. |
Micahtobon (talk | contribs) Added visuals and restructured document for readability. |
Line 1:
<references />
'''Data editing''' is the process of reviewing and adjusting collected [[survey data]]. Data editing helps define guidelines that reduce potential bias and ensure consistent estimates, leading to a clear analysis of the data set by correcting inconsistent data using the methods described later in this article.<ref>{{Cite web|title=National Center for Education Statistics (NCES) Home Page, part of the U.S. Department of Education|url=https://nces.ed.gov/|access-date=2020-12-06|website=nces.ed.gov|language=EN}}</ref> The purpose is to control the quality of the collected data.<ref>{{Cite web|url=http://www.unece.org/stats/editing.html|title=UNECE}}</ref> Data editing can be performed manually, with the assistance of a computer, or through a combination of both.<ref>{{Cite web|url=https://www150.statcan.gc.ca/n1/edu/power-pouvoir/ch3/editing-edition/5214781-eng.htm|title=Statistics: Power from Data! Data editing|website=www150.statcan.gc.ca}}</ref>
Line 6 ⟶ 7:
The term interactive editing is commonly used for modern computer-assisted manual editing. Most interactive data editing tools applied at National Statistical Institutes (NSIs) allow one to check the specified edits during or after data entry, and if necessary to correct erroneous data immediately. Several approaches can be followed to correct erroneous data:
*
*Compare the respondent's data to their data from the previous year
*Compare the respondent's data to data from similar respondents
*Use the subject matter knowledge of the human editor
Line 14 ⟶ 15:
===Selective editing===
Selective editing is an umbrella term for several methods used to identify influential errors<ref group=note>the errors that have a substantial impact on the publication figures</ref> and [[outliers]].<ref group=note>values that do not fit a model of data well</ref> Selective editing techniques aim to apply interactive editing to a well-chosen subset of the records, such that the limited time and resources available for interactive editing are allocated to those records where it has the most effect on the quality of the final estimates of
*The critical stream
*The non-critical stream
The critical stream consists of records that are more likely to contain influential errors. These critical records are edited in a traditional interactive manner. The records in the non-critical stream, which are unlikely to contain influential errors, are not edited in a computer
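The split into two streams can be illustrated with a simple score function. The following Python sketch is only one possible formulation and is not taken from the cited sources: the field names (<code>weight</code>, <code>reported</code>, <code>expected</code>), the score formula, and the threshold are all assumptions chosen for illustration.

```python
def split_streams(records, threshold):
    """Assign record ids to the critical or non-critical stream.

    Assumed score: survey weight times the absolute difference between the
    reported value and an expected value (e.g. last year's figure).
    """
    critical, non_critical = [], []
    for record in records:
        score = record["weight"] * abs(record["reported"] - record["expected"])
        if score >= threshold:
            critical.append(record["id"])      # candidate for interactive editing
        else:
            non_critical.append(record["id"])  # unlikely to hold an influential error
    return critical, non_critical


# Hypothetical records; values are invented for the example.
records = [
    {"id": "A", "weight": 10, "reported": 105, "expected": 100},
    {"id": "B", "weight": 2, "reported": 9000, "expected": 900},
    {"id": "C", "weight": 50, "reported": 40, "expected": 41},
]

critical, non_critical = split_streams(records, threshold=1000)
print(critical, non_critical)
```

In this sketch only record B exceeds the threshold, so it alone would be routed to interactive editing.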
=== Data editing techniques ===
Data editing can be accomplished in many ways, and the choice of technique depends primarily on the data set being explored.<ref>{{Cite web|last=SCAD|title=SCAD|url=https://www.scad.gov.ae/|access-date=2020-12-07|website=SCAD|language=en}}</ref>
==== Validity and completeness of data ====
The validity of a data set depends on the completeness of the responses provided by the respondents. One method of data editing is to ensure that every field requiring a numerical or non-numerical answer contains a complete response. See the example below.
[[File:Completeness Table for Data Editing.png|frame|none|1500px]]
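A completeness check of this kind can also be automated. The following Python sketch flags records with missing or blank required fields; the field names and sample records are hypothetical and would differ for any real survey.

```python
from typing import Any

# Hypothetical required fields; a real survey would define its own.
REQUIRED_FIELDS = ("respondent_id", "age", "region")


def missing_fields(record: dict[str, Any], required=REQUIRED_FIELDS) -> list[str]:
    """Return the required fields that are absent, None, or blank in a record."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(field)
    return missing


# Invented sample responses: one complete, two incomplete.
records = [
    {"respondent_id": 1, "age": 34, "region": "North"},
    {"respondent_id": 2, "age": None, "region": "South"},
    {"respondent_id": 3, "age": 51, "region": "  "},
]

# Map each incomplete respondent to the fields that still need an answer.
incomplete = {
    r["respondent_id"]: missing_fields(r) for r in records if missing_fields(r)
}
print(incomplete)
```

Whitespace-only strings are treated as missing here, which is a design choice of this sketch rather than a general rule.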
==== Duplicate data entry ====
Verifying that each record is unique is an important aspect of data editing, as it ensures that all data provided was entered only once. This reduces the
[[File:Duplicate Data Entries in Data Editing.png|frame|none|1500px]]
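As an illustration, duplicate entries can be detected by comparing records on a set of identifying fields. In the Python sketch below, the choice of key fields (<code>name</code>, <code>birth_year</code>) and the normalisation (trimming whitespace, ignoring case) are assumptions; the appropriate key depends on the survey.

```python
def find_duplicates(records, key_fields):
    """Return (first_index, duplicate_index) pairs for records sharing a key."""
    first_seen = {}
    duplicates = []
    for index, record in enumerate(records):
        # Normalise values so trivial differences do not hide a duplicate.
        key = tuple(str(record[f]).strip().lower() for f in key_fields)
        if key in first_seen:
            duplicates.append((first_seen[key], index))
        else:
            first_seen[key] = index
    return duplicates


# Invented sample data: the third record repeats the first.
records = [
    {"name": "Ana Lopez", "birth_year": 1990, "income": 42000},
    {"name": "Ben Smith", "birth_year": 1985, "income": 51000},
    {"name": "ana lopez ", "birth_year": 1990, "income": 42000},
]

duplicate_pairs = find_duplicates(records, key_fields=("name", "birth_year"))
print(duplicate_pairs)
```

Flagged pairs would normally be reviewed before deletion, since two distinct respondents can share the same key values.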
==== Outliers ====
It is common to find outliers in data sets, which, as described above, are values that do not fit a model of the data well. These extreme values can be found based
[[File:Outliers in Data Editing.png|frame|none|1500px]]
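One widely used way to flag extreme values is the interquartile range (IQR) rule. The Python sketch below applies that rule as an example; it is not necessarily the approach intended by this article, and the multiplier <code>k=1.5</code> and the sample values are assumptions.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    spread = q3 - q1
    lower, upper = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < lower or v > upper]


# Invented sample: one value is far larger than the rest.
hours_worked = [12, 14, 15, 15, 16, 17, 18, 95]

outliers = iqr_outliers(hours_worked)
print(outliers)
```

A flagged value is not necessarily an error; it may be a genuine but unusual response that needs review by an editor.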
===Macro editing===
Line 37 ⟶ 41:
====Aggregation method====
This method is followed in almost every statistical agency before publication: verifying whether figures to be published
====Distribution method====