History of natural language processing

The history of machine translation dates back to the seventeenth century, when philosophers such as [[Gottfried Wilhelm Leibniz|Leibniz]] and [[Descartes]] put forward proposals for codes which would relate words between languages. All of these proposals remained theoretical, and none resulted in the development of an actual machine.
 
The first patents for "translating machines" were applied for in the mid-1930s. One proposal, by [[Georges Artsrouni]], was simply an automatic bilingual dictionary using [[paper tape]]. The other proposal, by [[Peter Troyanskii]], a [[Russians|Russian]], was more detailed. It included both a bilingual dictionary and a method for dealing with grammatical roles between languages, based on [[Esperanto]].
 
In 1950, [[Alan Turing]] published his famous article "[[Computing Machinery and Intelligence]]" which proposed what is now called the [[Turing test]] as a criterion of intelligence. This criterion depends on the ability of a computer program to impersonate a human in a real-time written conversation with a human judge, sufficiently well that the judge is unable to distinguish reliably — on the basis of the conversational content alone — between the program and a real human.
 
In 1957, [[Noam Chomsky]]’s ''[[Syntactic Structures]]'' revolutionized linguistics with '[[universal grammar]]', a rule-based system of syntactic structures.<ref>{{cite web
| url = http://www.cs.bham.ac.uk/~pjh/sem1a5/pt1/pt1_history.html
| title = SEM1A5 - Part 1 - A brief history of NLP
}}</ref>
 
The [[Georgetown–IBM experiment|Georgetown experiment]] in 1954 involved fully automatic translation of more than sixty Russian sentences into English. The authors claimed that within three to five years, machine translation would be a solved problem.<ref>Hutchins, J. (2005)</ref> However, real progress was much slower, and after the [[ALPAC|ALPAC report]] in 1966, which found that ten years of research had failed to fulfill expectations, funding for machine translation was dramatically reduced. Little further research in machine translation was conducted until the late 1980s, when the first [[statistical machine translation]] systems were developed.
 
One notably successful NLP system developed in the 1960s was [[SHRDLU]], a natural language system working in restricted "[[blocks world]]s" with limited vocabularies.

In 1969, [[Roger Schank]] introduced the [[conceptual dependency theory]] for natural language understanding.<ref>[[Roger Schank]], 1969, ''A conceptual dependency parser for natural language'', Proceedings of the 1969 conference on Computational linguistics, Sång-Säby, Sweden, pages 1–3</ref> This model, partially influenced by the work of [[Sydney Lamb]], was extensively used by Schank's students at [[Yale University]], such as Robert Wilensky, Wendy Lehnert, and [[Janet Kolodner]].
 
In 1970, William A. Woods introduced the [[augmented transition network]] (ATN) to represent natural language input.<ref>Woods, William A. (1970). "Transition Network Grammars for Natural Language Analysis". Communications of the ACM 13 (10): 591–606 [http://www.eric.ed.gov/ERICWebPortal/custom/portlets/recordDetails/detailmini.jsp?_nfpb=true&_&ERICExtSearch_SearchValue_0=ED037733&ERICExtSearch_SearchType_0=no&accno=ED037733]</ref> Instead of ''[[phrase structure rules]]'', ATNs used an equivalent set of [[finite-state automata]] that were called recursively. ATNs and their more general format, called "generalized ATNs", continued to be used for a number of years. During the 1970s, many programmers began to write 'conceptual ontologies', which structured real-world information into computer-understandable data. Examples are MARGIE (Schank, 1975), SAM (Cullingford, 1978), PAM (Wilensky, 1978), TaleSpin (Meehan, 1976), QUALM (Lehnert, 1977), Politics (Carbonell, 1979), and Plot Units (Lehnert, 1981). During this time, many [[chatterbots]] were written, including [[PARRY]], [[Racter]], and [[Jabberwacky]].
 
{{anchor|Machine learning}}
Up to the 1980s, most NLP systems were based on complex sets of hand-written rules. Starting in the late 1980s, however, there was a revolution in NLP with the introduction of [[machine learning]] algorithms for language processing. This was due both to the steady increase in computational power resulting from [[Moore's law]] and the gradual lessening of the dominance of [[Noam Chomsky|Chomskyan]] theories of linguistics (e.g. [[transformational grammar]]), whose theoretical underpinnings discouraged the sort of [[corpus linguistics]] that underlies the machine-learning approach to language processing.<ref>Chomskyan linguistics encourages the investigation of "[[corner case]]s" that stress the limits of its theoretical models (comparable to [[pathological (mathematics)|pathological]] phenomena in mathematics), typically created using [[thought experiment]]s, rather than the systematic investigation of typical phenomena that occur in real-world data, as is the case in [[corpus linguistics]]. The creation and use of such [[text corpus|corpora]] of real-world data is a fundamental part of machine-learning algorithms for NLP. In addition, theoretical underpinnings of Chomskyan linguistics such as the so-called "[[poverty of the stimulus]]" argument entail that general learning algorithms, as are typically used in machine learning, cannot be successful in language processing. As a result, the Chomskyan paradigm discouraged the application of such models to language processing.</ref> Some of the earliest-used machine learning algorithms, such as [[decision tree]]s, produced systems of hard if-then rules similar to existing hand-written rules. Increasingly, however, research has focused on [[statistical natural language processing|statistical models]], which make soft, [[probabilistic]] decisions based on attaching [[real-valued]] weights to the features making up the input data. The [[cache language model]]s upon which many [[speech recognition]] systems now rely are examples of such statistical models. Such models are generally more robust when given unfamiliar input, especially input that contains errors (as is very common for real-world data), and produce more reliable results when integrated into a larger system comprising multiple subtasks.
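A minimal sketch of this shift, assuming an invented question-detection task whose feature names and weights are made up for illustration and do not come from any system discussed above, contrasts a hard hand-written rule with a soft, probabilistic decision over real-valued feature weights:

<syntaxhighlight lang="python">
# Illustrative only: a hard if-then rule versus a soft, weighted decision
# in the style of statistical NLP. Task, features, and weights are invented.
import math

def rule_based_is_question(sentence: str) -> bool:
    # Hard rule: a single surface cue decides the label outright.
    return sentence.strip().endswith("?")

# Hypothetical real-valued weights attached to simple surface features.
WEIGHTS = {"ends_with_qmark": 2.5, "starts_with_wh_word": 1.2, "bias": -1.0}

def statistical_is_question(sentence: str) -> float:
    """Return P(question) from a weighted sum of features (logistic model)."""
    tokens = sentence.strip().lower().split()
    features = {
        "ends_with_qmark": 1.0 if sentence.strip().endswith("?") else 0.0,
        "starts_with_wh_word": 1.0 if tokens and tokens[0] in {
            "who", "what", "when", "where", "why", "how"} else 0.0,
        "bias": 1.0,
    }
    score = sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))  # soft, probabilistic decision

print(rule_based_is_question("Where is the station"))   # False: the hard rule misses it
print(statistical_is_question("Where is the station"))  # ~0.55: still leans toward "question"
</syntaxhighlight>

Because the statistical decision combines several weighted cues rather than relying on one rule, it degrades gracefully on unfamiliar or noisy input, which is the robustness property noted above.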
 
Many of the notable early successes occurred in the field of [[machine translation]], due especially to work at IBM Research, where successively more complicated statistical models were developed. These systems were able to take advantage of existing multilingual [[text corpus|textual corpora]] that had been produced by the [[Parliament of Canada]] and the [[European Union]] as a result of laws calling for the translation of all governmental proceedings into all official languages of the corresponding systems of government. However, most other systems depended on corpora specifically developed for the tasks implemented by these systems, which was (and often continues to be) a major limitation in the success of these systems. As a result, a great deal of research has gone into methods of more effectively learning from limited amounts of data.
! style="background-color:#EEF6D6;" | Reference
|-
|'''[[Georgetown–IBM experiment|Georgetown experiment]]'''
|1954
|[[Georgetown University]] and [[IBM]]