m JorisvS moved page Native Language Identification to Native-language identification |
m ce |
Line 1:
'''Native-language identification''' (NLI) is the task of determining an author's [[first language|native language]] (L1) based only on their writings in a second language ([[Second language|L2]]).<ref>Wong, Sze-Meng Jojo, and Mark Dras. [http://anthology.aclweb.org/D/D11/D11-1148.pdf "Exploiting parse structures for native language identification"]. Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2011.</ref>
NLI works by identifying language-usage patterns that are common to specific L1 groups and then applying this knowledge to predict the native language of previously unseen texts.
This is motivated in part by applications in [[Second Language Acquisition|second-language acquisition]], language teaching and [[Forensic Linguistics|forensic linguistics]], among others.
== Overview ==
NLI works under the assumption that an author's L1 will dispose them towards particular language production patterns in their L2, as influenced by their
Using large-scale English data, NLI methods achieve over 80% accuracy in predicting the
==Applications==
===Pedagogy and
This identification of L1-specific features has been used to study [[language transfer]] effects in
===Forensic
NLI methods can also be applied in [[
This is particularly useful when a text, such as an anonymous letter, is the key piece of evidence in an investigation, and clues about the writer's native language can help investigators identify the source.
This application has already attracted interest and funding from intelligence agencies.<ref>Ria Perkins. 2014. "Linguistic identifiers of L1 Persian speakers writing in English: NLID for authorship analysis". Ph.D. thesis, Aston University.</ref>
Line 21 ⟶ 20:
== Methodology ==
[[Natural
A range of ensemble-based systems has also been applied to the task and shown to improve performance over single-classifier systems.<ref>Malmasi, Shervin, Sze-Meng Jojo Wong, and Mark Dras. [http://anthology.aclweb.org/W/W13/W13-1716.pdf "NLI Shared Task 2013: MQ submission"]. Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications. 2013.</ref>
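As an illustration of the ensemble idea, the following minimal Python sketch combines the L1 predictions of several base classifiers for a single text by plurality vote. Plurality voting is only one simple combination scheme, chosen here as an assumption; the cited systems may combine classifiers differently. The function name <code>majority_vote</code> and the example labels are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier L1 predictions for one text by plurality vote.

    Ties are broken in favour of the label that appears first in the input.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    # Scan in input order so that tie-breaking is deterministic.
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical outputs of three base classifiers for the same essay.
votes = ["Korean", "Japanese", "Korean"]
ensemble_label = majority_vote(votes)
```

An ensemble of this kind can only help when the base classifiers make different errors, which is why such systems are typically built from classifiers trained on different feature types.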
Line 27 ⟶ 26:
Surface-level lexical features such as character, word and lemma [[n-gram]]s have also been found to be useful for this task.
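A minimal Python sketch of how character and word n-gram counts might be extracted from a learner text is shown below. It assumes naive whitespace tokenisation and lowercasing for word n-grams; lemma n-grams are omitted because they require a lemmatiser. The function names and the sample sentence are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return counts of all overlapping character n-grams in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n):
    """Return counts of word n-grams, using naive whitespace tokenisation."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Hypothetical learner sentence used only for illustration.
sample = "I am agree with this opinion"
char_trigrams = char_ngrams(sample, 3)
word_bigrams = word_ngrams(sample, 2)
```

The resulting counts can be used as a sparse feature vector for a classifier; in practice they are usually normalised or weighted before training.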
== 2013
The Building Educational Applications (BEA) workshop at [[NAACL]] 2013 hosted the inaugural NLI shared task.<ref>Tetreault et al., [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.365.5931&rep=rep1&type=pdf "A report on the first native language identification shared task"], 2013</ref> The competition attracted 29 entries from teams across the globe, 24 of which also published a paper describing their systems and approaches.