Revision as of 11:18, 12 April 2015 by JorisvS (m): JorisvS moved page Native Language Identification to Native-language identification
Revision as of 11:22, 12 April 2015 by JorisvS (m): ce
Line 1:
'''Native-language identification''' (NLI) is the task of determining an author's [[first language|native language]] (L1) based only on their writings in a [[second language]] (L2).<ref>Wong, Sze-Meng Jojo, and Mark Dras. [http://anthology.aclweb.org/D/D11/D11-1148.pdf "Exploiting parse structures for native language identification"]. Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2011.</ref>
NLI works by identifying language-usage patterns that are common to specific L1 groups and then applying this knowledge to predict the native language of previously unseen texts.
This is motivated in part by applications in [[second-language acquisition]], language teaching and [[forensic linguistics]], amongst others.

== Overview ==
NLI works under the assumption that an author's L1 will dispose them towards particular language-production patterns in their L2.
This relates to cross-linguistic influence (CLI), a key topic in the field of second-language acquisition (SLA) that analyzes transfer effects from the L1 on later-learned languages. Using large-scale English data, NLI methods achieve over 80% accuracy in predicting the native language of texts written by authors from 11 different L1 backgrounds. By comparison, choosing one of the 11 languages at random gives a baseline of about 9%.

==Applications==
===Pedagogy and language transfer===
This identification of L1-specific features has been used to study [[language transfer]] effects in second-language acquisition.<ref>Malmasi, Shervin, and Mark Dras. [http://www.aclweb.org/anthology/D/D14/D14-1144.pdf "Language Transfer Hypotheses with Linear SVM Weights."] Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014.</ref> This is useful for developing pedagogical material, teaching methods and L1-specific instructions, and for generating feedback that is tailored to a learner's native language.

===Forensic linguistics===
NLI methods can also be applied in [[forensic linguistics]] as a method of performing authorship profiling in order to infer the attributes of an author, including their linguistic background. This is particularly useful in situations where a text, e.g. an anonymous letter, is the key piece of evidence in an investigation and clues about the native language of a writer can help investigators in identifying the source. This has already attracted interest and funding from intelligence agencies.<ref>Ria Perkins. 2014. "Linguistic identifiers of L1 Persian speakers writing in English: NLID for authorship analysis". Ph.D.
thesis, Aston University.</ref>

Line 21 ⟶ 20:
== Methodology ==
[[Natural language processing]] methods are used to extract and identify language-usage patterns common to speakers of an L1 group. This is done using language-learner data, usually from a [[learner corpus]]. Next, [[machine learning]] is applied to train classifiers, such as [[support vector machine]]s, for predicting the L1 of unseen texts.<ref>Tetreault et al., [http://anthology.aclweb.org/C/C12/C12-1158.pdf "Native Tongues, Lost and Found: Resources and Empirical Evaluations in Native Language Identification"], in Proc. International Conf. on Computational Linguistics (COLING), 2012.</ref> A range of ensemble-based systems have also been applied to the task and shown to improve performance over single-classifier systems.<ref>Malmasi, Shervin, Sze-Meng Jojo Wong, and Mark Dras. [http://anthology.aclweb.org/W/W13/W13-1716.pdf "NLI Shared Task 2013: MQ submission"]. Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications. 2013.</ref>

Line 27 ⟶ 26:
Surface-level lexical features such as character, word and lemma [[n-gram|n-grams]] have also been found to be quite useful for this task.

== 2013 shared task ==
The Building Educational Applications (BEA) workshop at [[NAACL]] 2013 hosted the inaugural NLI shared task.<ref>Tetreault et al., [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.365.5931&rep=rep1&type=pdf "A report on the first native language identification shared task"], 2013.</ref> The competition resulted in 29 entries from teams across the globe, 24 of which also published a paper describing their systems and approaches.
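
As a rough illustration of the pipeline described under Methodology (extract character n-gram features from labelled learner texts, then classify unseen text), the following minimal sketch may help. It is not taken from any of the cited systems: it swaps the support vector machine for a much simpler nearest-profile classifier based on cosine similarity, and the function names (<code>char_ngrams</code>, <code>cosine</code>, <code>train</code>, <code>predict</code>), the labels <code>L1-A</code> and <code>L1-B</code>, and the toy sentences are all assumptions invented for this example.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a lower-cased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(labelled_texts):
    """Sum n-gram counts per L1 label, giving one profile per L1 group."""
    profiles = {}
    for label, text in labelled_texts:
        profiles.setdefault(label, Counter()).update(char_ngrams(text))
    return profiles


def predict(profiles, text):
    """Return the L1 label whose profile is most similar to the text."""
    grams = char_ngrams(text)
    return max(profiles, key=lambda label: cosine(grams, profiles[label]))


# Hypothetical toy data, invented purely for illustration.
# A real system would train on a learner corpus with many texts per L1.
TOY_TRAINING = [
    ("L1-A", "i am agree with this opinion"),
    ("L1-A", "she is agree that the informations are useful"),
    ("L1-B", "yesterday i go to school and meet friend"),
    ("L1-B", "he go to market and buy book"),
]

if __name__ == "__main__":
    profiles = train(TOY_TRAINING)
    print(predict(profiles, "they is agree with me"))
```

A nearest-profile classifier is used here only because it needs nothing beyond the standard library; the systems cited above train classifiers such as support vector machines, often combined in ensembles, on far larger feature sets.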

Native-language identification: Difference between revisions