Native-language identification

{{Short description|Determining someone's first language based on how they write or speak a different language}}
 
'''Native-language identification''' ('''NLI''') is the task of determining an author's [[first language|native language]] (L1) based only on their writings in a [[second language]] (L2).<ref>Wong, Sze-Meng Jojo, and Mark Dras. [http://anthology.aclweb.org/D/D11/D11-1148.pdf "Exploiting parse structures for native language identification"]. Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2011.</ref> NLI works by identifying language-usage patterns that are common to specific L1 groups and then applying this knowledge to predict the native language of previously unseen texts. The task is motivated in part by applications in [[second-language acquisition]], language teaching and [[forensic linguistics]], amongst others.
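
NLI is usually framed as supervised multi-class text classification: a model is trained on L2 texts labelled with their authors' L1 and then applied to unseen texts. The following is a minimal sketch of that workflow, assuming a multinomial naive Bayes classifier over word unigrams. This classifier is a simple stand-in chosen for illustration, not the method of any system cited in this article; the function names, toy sentences and the placeholder labels <code>L1_A</code> and <code>L1_B</code> are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Lower-cased whitespace tokenization; a real system would use a proper tokenizer.
    return text.lower().split()

def train(examples):
    """examples: (l2_text, l1_label) pairs. Returns per-L1 word counts, text counts, vocabulary."""
    word_counts = defaultdict(Counter)  # L1 label -> word frequencies in its L2 texts
    doc_counts = Counter()              # L1 label -> number of training texts
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab

def predict(model, text):
    """Return the L1 label with the highest naive Bayes log-probability for text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                # add-one (Laplace) smoothed log likelihood
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data; "L1_A" and "L1_B" are hypothetical placeholder labels.
model = train([
    ("I am agree with this idea", "L1_A"),
    ("we are agree about it", "L1_A"),
    ("she explained me the rule", "L1_B"),
    ("they explained me the answer", "L1_B"),
])
print(predict(model, "you are agree"))    # L1_A
print(predict(model, "he explained me"))  # L1_B
```

Published NLI systems use far richer features and stronger learners; the sketch only shows how usage patterns learned per L1 group are turned into a prediction for a new text.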
 
NLI works under the assumption that an author's L1 will predispose them towards particular language production patterns in their L2. This relates to cross-linguistic influence (CLI), a key topic in the field of second-language acquisition (SLA) that analyzes transfer effects from the L1 on later-learned languages.
 
Using large-scale English data, NLI methods achieve over 80% accuracy in predicting the native language of texts written by authors from 11 different L1 backgrounds.<ref>Shervin Malmasi, Keelan Evanini, Aoife Cahill, Joel Tetreault, Robert Pugh, Christopher Hamill, Diane Napolitano, and Yao Qian. 2017. [https://aclanthology.org/W17-5007/ "A Report on the 2017 Native Language Identification Shared Task"]. In Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, pages 62–75, Copenhagen, Denmark. Association for Computational Linguistics.</ref> By comparison, guessing at random among the 11 languages would be correct about 9% of the time (1/11).
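
The two figures compared above are ordinary classification accuracy and the expected accuracy of a uniform random guess, which for ''k'' classes is 1/''k''. A small sketch of both quantities, with hypothetical function names:

```python
def accuracy(predicted, gold):
    """Fraction of texts whose predicted L1 matches the gold-standard L1."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

def random_baseline(num_classes):
    """Expected accuracy of guessing uniformly at random among num_classes L1s."""
    return 1 / num_classes

print(f"{random_baseline(11):.1%}")  # 9.1%
```

The uniform-guess baseline assumes each guess is drawn independently and uniformly over the candidate L1s; it coincides with the majority-class baseline only when all L1s are equally represented in the test data.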
 
==Applications==
 
Various linguistic feature types have been applied to this task. These include syntactic features such as constituent parses, grammatical dependencies and part-of-speech tags.
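
Syntactic features require a tagger or parser, which is not shown here. Assuming part-of-speech tags have already been produced (the Penn Treebank tags below are hand-written for illustration, and <code>pos_ngrams</code> is a hypothetical helper), one common way to turn them into classifier features is to count contiguous tag n-grams:

```python
from collections import Counter

def pos_ngrams(tags, n):
    """Count contiguous n-grams over a sequence of part-of-speech tags."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))

# Hand-written tags for "He explained me the rule ." (stands in for tagger output)
tags = ["PRP", "VBD", "PRP", "DT", "NN", "."]
features = pos_ngrams(tags, 2)
print(features[("VBD", "PRP")])  # 1
```
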
Surface-level lexical features such as character, word and lemma [[n-gram]]s have also been found to be quite useful for this task. Of these, character n-grams<ref>Radu Tudor Ionescu, Marius Popescu and Aoife Cahill. [http://www.mitpressjournals.org/doi/abs/10.1162/COLI_a_00256 "String Kernels for Native Language Identification: Insights from Behind the Curtains"], Computational Linguistics, 2016</ref><ref>Radu Tudor Ionescu and Marius Popescu. [https://arxiv.org/abs/1707.08349 "Can string kernels pass the test of time in Native Language Identification?"], In Proceedings of BEA12, 2017.</ref> have been reported to be the single best feature type for the task.
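
Character n-gram features can be compared directly through string kernels, which the Ionescu et al. papers cited above build on. The following is a minimal sketch, assuming raw text (spaces and punctuation included) and a single fixed n-gram length. It computes the p-spectrum kernel (the dot product of two n-gram count vectors) and the intersection kernel (the sum of the smaller count for each shared n-gram), plus a cosine-style normalization. The function names are hypothetical, and the sketch stops at similarity values; a kernel-based classifier would be trained on top of them.

```python
import math
from collections import Counter

def char_ngrams(text, n):
    """Count all overlapping character n-grams of length n (the 'spectrum')."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def spectrum_kernel(a, b, n):
    """p-spectrum kernel: dot product of the two n-gram count vectors."""
    ca, cb = char_ngrams(a, n), char_ngrams(b, n)
    return sum(count * cb[g] for g, count in ca.items())

def intersection_kernel(a, b, n):
    """Intersection kernel: sum over n-grams of the smaller of the two counts."""
    ca, cb = char_ngrams(a, n), char_ngrams(b, n)
    return sum(min(count, cb[g]) for g, count in ca.items())

def normalized(kernel, a, b, n):
    """Divide by sqrt(k(a,a) * k(b,b)) so texts of different lengths are comparable."""
    denom = math.sqrt(kernel(a, a, n) * kernel(b, b, n))
    return kernel(a, b, n) / denom if denom else 0.0

print(spectrum_kernel("abab", "ab", 2))      # 2
print(intersection_kernel("abab", "ab", 2))  # 1
```

Because the features are raw character sequences, this approach needs no tokenizer, tagger or parser, which makes it largely language-independent.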
 
== 2013 shared task ==
==See also==
{{div col|colwidth=22em}}
*{{annotated link|Crosslinguistic influence}}
*{{annotated link|Foreign language writing aid}}
*{{annotated link|Computer-assisted language learning}}
*{{annotated link|Language education}}
*{{annotated link|Natural language processing}}
*{{annotated link|Language transfer}}
{{div col end}}