{{Short description|1=Overview of and topical guide to natural-language processing}}
<!--... Attention: THIS IS AN OUTLINE
part of the set of
[[
Wikipedia outlines are
content navigation systems
[[Wikipedia:WikiProject
Further improvements
to this outline are on the way
...-->
The following [[Outline (list)|outline]] is provided as an overview of and topical guide to natural-language processing:
'''[[Natural language processing|natural-language processing]]''' – computer activity in which computers are entailed to [[natural
{{TOC limit|limit=2}}
== Natural
Natural
* A field of [[science]] – systematic enterprise that builds and organizes knowledge in the form of testable explanations and predictions about the universe.<ref>"... modern science is a discovery as well as an invention. It was a discovery that nature generally acts regularly enough to be described by laws and even by mathematics; and required invention to devise the techniques, abstractions, apparatus, and organization for exhibiting the regularities and securing their law-like descriptions." —p.vii, [[J. L. Heilbron]], (2003, editor-in-chief) ''The Oxford Companion to the History of Modern Science'' New York: Oxford University Press {{ISBN|0-19-511229-6}}
*{{cite
<!--{{sfn|Popper|2002|p=3}}--></ref>
** An [[applied science]] – field that applies human knowledge to build or design useful things.
*** A field of [[computer science]] – scientific and practical approach to computation and its applications.
**** A branch of [[artificial intelligence]] –
**** A subfield of [[computational linguistics]] –
** An application of [[engineering]] – science, skill, and profession of acquiring and applying scientific, economic, social, and practical knowledge, in order to design and also build structures, machines, devices, systems, materials and processes.
*** An application of [[software engineering]] – application of a systematic, disciplined, quantifiable approach to the design, development, operation, and maintenance of software, and the study of these approaches; that is, the application of engineering to software.<ref name="BoDu04">[[Software Engineering Body of Knowledge|SWEBOK]] {{Cite book|
| last = ACM
| year = 2006
| title = Computing Degrees & Careers
| publisher = ACM
|
| archive-date = 2011-06-17
| archive-url = https://web.archive.org/web/20110617053818/http://computingcareers.acm.org/?page_id=12
| url-status = dead
}}</ref><ref>
{{cite book | last = Laplante | first = Phillip | title = What Every Engineer Should Know about Software Engineering | publisher = CRC | location = Boca Raton
| year = 2007 | isbn = 978-0-8493-7228-5 | url = https://books.google.com/books?id=pFHYk0KWAEgC&
</ref>
**** A subfield of [[computer programming]] – process of designing, writing, testing, debugging, and maintaining the source code of computer programs. This source code is written in one or more programming languages (such as Java, C++, C#, Python, etc.). The purpose of programming is to create a set of instructions that computers use to perform specific operations or to exhibit desired behaviors.
***** A subfield of [[artificial intelligence]] programming –
* A type of [[system]] – set of interacting or interdependent components forming an integrated whole or a set of elements (often called 'components') and relationships which are different from relationships of the set or its elements to other elements or sets.
** A system that includes [[software]] –
* A type of [[technology]] – making, modification, usage, and knowledge of tools, machines, techniques, crafts, systems, methods of organization, in order to solve a problem, improve a preexisting solution to a problem, achieve a goal, handle an applied input/output relation or perform a specific function. It can also refer to the collection of such tools, machinery, modifications, arrangements and procedures. Technologies significantly affect human as well as other animal species' ability to control and adapt to their natural environments.
** A form of [[computer technology]] – computers and their application. NLP makes use of computers, image scanners, microphones, and many types of software programs.
*** [[Language technology]] –
== Prerequisite technologies ==
The following technologies make natural
* [[Communication]] – the activity of a source sending a message to a [[Receiver (information theory)|receiver]]
**** [[Image scanner]]s –
== Subfields of natural-language processing ==
* [[Information extraction]] (IE) – field concerned in general with the extraction of semantic information from text. This covers tasks such as [[named-entity recognition]], [[Coreference|coreference resolution]], [[relationship extraction]], etc.
* [[Ontology engineering]] – field that studies the methods and methodologies for building ontologies, which are formal representations of a set of concepts within a domain and the relationships between those concepts.
* [[Speech processing]] – field that covers [[speech recognition]], [[text-to-speech]] and related tasks.
* [[Statistical natural
** [[Statistical semantics]] – a subfield of [[computational semantics]] that establishes semantic relations between words to examine their contexts.
*** [[Distributional semantics]] – a subfield of [[statistical semantics]] that examines the semantic relationships of words across corpora or in large samples of data.
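
The distributional idea in the last entry can be illustrated with a minimal sketch. It assumes whitespace tokenization, a symmetric context window, and raw co-occurrence counts compared by cosine similarity; the function names <code>cooccurrence_vectors</code> and <code>cosine</code> are hypothetical and not taken from any particular toolkit.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()  # assumption: whitespace tokenization
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


corpus = ["the cat drinks milk", "the dog drinks water", "the car needs fuel"]
vecs = cooccurrence_vectors(corpus)
# "cat" and "dog" share more context words than "cat" and "car".
print(cosine(vecs["cat"], vecs["dog"]) > cosine(vecs["cat"], vecs["car"]))
```

Words that occur in similar contexts end up with similar vectors. Practical systems typically reweight the raw counts and work with far larger corpora.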
== Related fields ==
Natural
* [[Automated reasoning]] – area of computer science and mathematical logic dedicated to understanding various aspects of reasoning, and producing software that allows computers to reason completely, or nearly completely, automatically. A sub-field of artificial intelligence, automated reasoning is also grounded in theoretical computer science and philosophy of mind.
* [[Linguistics]] – scientific study of human language. Natural
** [[Applied linguistics]] – interdisciplinary field of study that identifies, investigates, and offers solutions to language-related real-life problems. Some of the academic fields related to applied linguistics are education, linguistics, psychology, computer science, anthropology, and sociology. Some of the subfields of applied linguistics relevant to natural
*** [[Multilingualism|Bilingualism / Multilingualism]] –
*** [[Computer-mediated communication]] (CMC) – any communicative transaction that occurs through the use of two or more networked computers.<ref>McQuail, Denis. (2005). ''Mcquail's Mass Communication Theory''. 5th ed. London: SAGE Publications.</ref> Research on CMC focuses largely on the social effects of different computer-supported communication technologies. Many recent studies involve Internet-based [[social networking]] supported by [[social software]].
*** [[Interlinguistics]] – study of improving communications between people of different first languages with the use of ethnic and auxiliary languages (lingua franca), for instance by use of intentional international auxiliary languages, such as Esperanto or Interlingua, or spontaneous interlanguages known as pidgin languages.
*** [[Language assessment]] – assessment of first, second or other language in the school, college, or university context; assessment of language use in the workplace; and assessment of language in the immigration, citizenship, and asylum contexts. The assessment may include analyses of listening, speaking, reading, writing or cultural understanding, with respect to understanding how the language works theoretically and the ability to use the language practically.
*** [[Language pedagogy]] – science and art of language education, including approaches and methods of language teaching and study. Natural
*** [[Language planning]] –
*** [[Language policy]] –
*** [[literacy|Literacies]] –
*** [[Pragmatics]] –
*** [[Second
*** [[stylistics (literature)|Stylistics]] –
*** [[Translation]] –
** [[Computational linguistics]] –
*** [[Computational semantics]] –
*** [[Corpus linguistics]] – study of language as expressed in samples ''(corpora)'' of "real world" text. ''Corpora'' is the plural of ''corpus'', and a corpus is a specifically selected collection of texts (or speech segments) composed of natural language. After it is constructed (gathered or composed), a corpus is analyzed with the methods of computational linguistics to infer the meaning and context of its components (words, phrases, and sentences), and the relationships between them. Optionally, a corpus can be annotated ("tagged") with data (manually or automatically) to make the corpus easier to understand (e.g., [[part-of-speech tagging]]). This data is then applied to make sense of user input, for example, to make better (automated) guesses of what people are talking about or saying, perhaps to achieve more narrowly focused web searches, or for speech recognition.
** [[Metalinguistics]] –
** [[Sign language#Linguistics
* [[Human–computer interaction]] – field at the intersection of computer science and behavioral sciences that involves the study, planning, and design of the interaction between people (users) and computers. Attention to human–machine interaction is important, because poorly designed human–machine interfaces can lead to many unexpected problems. A classic example is the [[Three Mile Island accident]], where investigations concluded that the design of the human–machine interface was at least partially responsible for the disaster.
* [[Information retrieval]] (IR) – field concerned with storing, searching and retrieving information. It is a separate field within computer science (closer to databases), but IR relies on some NLP methods (for example, stemming). Some current research and applications seek to bridge the gap between IR and NLP.
** [[Statistical classification]] –
== Structures used in natural-language processing ==
* [[Anaphora (linguistics)|Anaphora]] – type of expression whose reference depends upon another referential element. E.g., in the sentence 'Sally preferred the company of herself', 'herself' is an anaphoric expression in that it is coreferential with 'Sally', the sentence's subject.
* [[Context-free language]] –
*** [[Taxonomy for search engines]] – typically called a "taxonomy of entities". It is a [[tree structure|tree]] in which nodes are labelled with entities which are expected to occur in a web search query. These trees are used to match keywords from a search query with the keywords from relevant answers (or snippets).
* [[Textual entailment]] – directional relation between text fragments. The relation holds whenever the truth of one text fragment follows from another text. In the TE framework, the entailing and entailed texts are termed text (t) and hypothesis (h), respectively. The relation is directional because even if "t entails h", the reverse "h entails t" is much less certain.
* [[Triphone]] – sequence of three phonemes. Triphones are useful in models of natural
== Processes of NLP ==
=== Applications ===
* [[Automated essay scoring]] (AES) – the use of specialized computer programs to assign grades to essays written in an educational setting. It is a method of educational assessment and an application of natural
* [[Automatic image annotation]] – process by which a computer system automatically assigns textual metadata in the form of captioning or keywords to a digital image. The annotations are used in image retrieval systems to organize and locate images of interest from a database.
* [[Automatic summarization]] – process of reducing a text document with a computer program in order to create a summary that retains the most important points of the original document. Often used to provide summaries of text of a known type, such as articles in the financial section of a newspaper.
* [[Dialog system]] –
* [[Foreign-language reading aid]] – computer program that assists a non-native language user in reading properly in their target language. Proper reading here means correct pronunciation, with stress placed correctly on the different parts of each word.
* [[Foreign
* [[Grammar checker|Grammar checking]] – the act of verifying the grammatical correctness of written text, especially if this act is performed by a [[computer program]].
* [[Information retrieval]] –
** [[Example-based machine translation]] –
** [[Rule-based machine translation]] –
* [[Natural
* [[Natural
* [[Optical character recognition]] (OCR) – given an image representing printed text, determine the corresponding text.
* [[Question answering]] – given a human-language question, determine its answer. Typical questions have a specific right answer (such as "What is the capital of Canada?"), but sometimes open-ended questions are also considered (such as "What is the meaning of life?").
* [[Naive Bayes spam filtering|Spam filtering]] –
* [[Sentiment analysis]] – extracts subjective information, usually from a set of documents, often using online reviews to determine "polarity" about specific objects. It is especially useful for identifying trends of public opinion in social media, for the purpose of marketing.
* [[Speech recognition]] – given a sound clip of a person or people speaking, determine the textual representation of the speech. This is the opposite of [[text to speech]] and is one of the extremely difficult problems colloquially termed "[[AI-complete]]" (see above). In [[natural speech]] there are hardly any pauses between successive words, and thus [[speech segmentation]] is a necessary subtask of speech recognition (see below).
* [[Speech synthesis]] (Text-to-speech) –
* [[Text-proofing]] –
* [[Text simplification]] – automated editing of a document to include fewer words, or use easier words, while retaining its underlying meaning and information.
=== Component processes ===
* [[Natural
* [[Natural language generation|Natural-language generation]] – task of converting information from computer databases into readable human language.
==== Component processes of natural-language understanding ====
* [[Automatic document classification]] (text categorization) –
** [[Automatic language identification]] –
** [[Text simplification]] –
* [[Deep linguistic processing]] –
* [[Discourse analysis]] – includes a number of related tasks. One task is identifying the [[discourse]] structure of connected text, i.e. the nature of the discourse relationships between sentences (e.g. elaboration, explanation, contrast). Another possible task is recognizing and classifying the [[speech act]]s in a chunk of text (e.g.
* [[Information extraction]] –
** [[Text mining]] – process of deriving high-quality information from text. High-quality information is typically derived through the devising of patterns and trends through means such as statistical pattern learning.
*** [[Biomedical text mining]] –
*** [[Decision tree learning]] –
*** [[Sentence extraction]] –
* [[Latent semantic indexing]] –
* [[Lemmatisation]] – groups together all like terms that share the same lemma such that they are classified as a single item.
* [[Morphology (linguistics)|Morphological segmentation]] – separates words into individual [[morphemes]] and identifies the class of the morphemes. The difficulty of this task depends greatly on the complexity of the [[morphology (linguistics)|morphology]] (i.e. the structure of words) of the language being considered.
* [[Named
* [[Ontology learning]] –
* [[Parsing]] – determines the [[parse tree]] (grammatical analysis) of a given sentence. The [[grammar]] for [[natural language]]s is [[ambiguous]] and typical sentences have multiple possible analyses. In fact, perhaps surprisingly, for a typical sentence there may be thousands of potential parses (most of which will seem completely nonsensical to a human).
** [[Shallow parsing]] –
* [[Part-of-speech tagging]] – given a sentence, determines the [[part of speech]] for each word. Many words, especially common ones, can serve as multiple [[parts of speech]]. For example, "book" can be a [[noun]] ("the book on the table") or [[verb]] ("to book a flight"); "set" can be a [[noun]], [[verb]] or [[adjective]]; and "out" can be any of at least five different parts of speech.
* [[Query expansion]] –
* [[Relationship extraction]] – given a chunk of text, identifies the relationships among named entities (e.g. who is the wife of whom).
* [[Topic segmentation]] and recognition – given a chunk of text, separates it into segments each of which is devoted to a topic, and identifies the topic of the segment.
* [[Truecasing]] –
* [[Word segmentation]] – separates a chunk of continuous text into separate words. For a language like
* [[Word
** [[Word-sense induction]] – open problem of natural
** [[Automatic acquisition of sense-tagged corpora]] –
* [[W-shingling]] – set of unique "shingles"—contiguous subsequences of tokens in a document—that can be used to gauge the similarity of two documents. The w denotes the number of tokens in each shingle in the set.
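
The w-shingling entry above can be made concrete with a short sketch. It assumes whitespace tokenization and uses the Jaccard coefficient of the two shingle sets as the similarity measure; the function names <code>shingles</code> and <code>resemblance</code> are hypothetical, chosen for illustration only.

```python
def shingles(tokens, w):
    """Return the set of unique w-shingles (contiguous runs of w tokens)."""
    if w <= 0:
        raise ValueError("w must be positive")
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def resemblance(doc_a, doc_b, w=2):
    """Jaccard similarity of the two documents' w-shingle sets."""
    a = shingles(doc_a.lower().split(), w)  # assumption: whitespace tokenization
    b = shingles(doc_b.lower().split(), w)
    if not a and not b:
        return 1.0  # assumption: two documents with no shingles count as identical
    return len(a & b) / len(a | b)


# Eight tokens yield five 4-shingles, of which only three are unique.
print(len(shingles("a rose is a rose is a rose".split(), 4)))
print(resemblance("the cat sat", "the cat ran", w=2))
```

A larger w makes the comparison stricter, because longer runs of tokens must match exactly before two documents share a shingle.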
==== Component processes of natural-language generation ====
[[Natural language generation|Natural-language generation]] – task of converting information from computer databases into readable human language.
* [[Automatic taxonomy induction]] (ATI) – automated building of [[tree structure]]s from a corpus. While ATI is used to construct the core of ontologies (and doing so makes it a component process of natural
* [[Document structuring]] –
== History of natural-language processing ==
[[History of natural language processing|History of natural-language processing]]
* [[History of machine translation]]
* [[Automated essay scoring#History|History of automated essay scoring]]
* [[Natural
* [[Natural
* [[Optical character recognition#History|History of optical character recognition]]
* [[Question answering#History|History of question answering]]
* [[Speech synthesis#History|History of speech synthesis]]
* [[Turing test]] – test of a machine's ability to exhibit intelligent behavior, equivalent to or indistinguishable from, that of an actual human. In the original illustrative example, a human judge engages in a natural
* [[Universal grammar]] – theory in [[linguistics]], usually credited to [[Noam Chomsky]], proposing that the ability to learn grammar is hard-wired into the brain.<ref>
* [[ALPAC]] – committee of seven scientists led by John R. Pierce, established in 1964 by the U.S. Government to evaluate progress in computational linguistics in general and machine translation in particular. Its report, issued in 1966, gained notoriety for being very skeptical of the machine translation research done up to that point and for emphasizing the need for basic research in computational linguistics; this eventually caused the U.S. Government to reduce its funding of the topic dramatically.
* [[Conceptual dependency theory]] – a model of natural
* [[Augmented transition network]] – type of graph theoretic structure used in the operational definition of formal languages, used especially in parsing relatively complex natural languages, and having wide application in artificial intelligence. Introduced by William A. Woods in 1970.
* [[Distributed Language Translation]] (project) –
=== Timeline of NLP software ===
{|
! style="background-color:#ECE9EF;" | Software
! style="background-color:#EEF6D6;" | Reference
|-
|[[
|1954
|[[Georgetown University]] and [[IBM]]
|1970
|[[Terry Winograd]]
|a natural
|
|-
|1978
|Hendrix
|
|
|-
|
|-
|
|1982
|[[Rollo Carpenter]]
|[[IBM]]
|A question answering system that won the [[Jeopardy!]] contest, defeating the best human players in February 2011.
|-
|MeTA
|}
== General natural
* [[Sukhotin's algorithm]] – statistical classification algorithm for classifying characters in a text as vowels or consonants. It was initially created by Boris V. Sukhotin.
* [[T9 (predictive text)]] – short for "Text on 9 keys"; a US-patented predictive text technology for mobile phones (specifically those with a 3×4 numeric keypad), originally developed by Tegic Communications, now part of Nuance Communications.
* [[Tatoeba]] – free collaborative online database of example sentences geared towards foreign
* [[Teragram Corporation]] – fully owned subsidiary of SAS Institute, a major producer of statistical analysis software, headquartered in Cary, North Carolina, USA. Teragram is based in Cambridge, Massachusetts and specializes in the application of computational linguistics to multilingual natural
* [[TipTop Technologies]] – company that developed TipTop Search, a real-time web, social search engine with a unique platform for semantic analysis of natural language. TipTop Search provides results capturing individual and group sentiment, opinions, and experiences from content of various sorts including real-time messages from Twitter or consumer product reviews on Amazon.com.
* [[Transderivational search]] – search conducted for a fuzzy match across a broad field. In computing, the equivalent function can be performed using content-addressable memory.
* [[Brill tagger]] –
* [[Cache language model]] –
* [[ChaSen]], [[MeCab]] – provide morphological analysis and word splitting for
* [[Classic monolingual WSD]] –
* [[ClearForest]] –
* [[Concept mining]] –
* [[Content determination]] –
*
* [[DBpedia Spotlight]] –
* [[Deep linguistic processing]] –
* [[Grammatik]] –
* [[Hashing-Trick]] –
* [[Hidden
* [[Human language technology]] –
* [[Information extraction]] –
* [[Language Computer Corporation]] –
* [[Language model]] –
* [[
* [[Latent semantic mapping]] –
* [[Legal information retrieval]] –
* [[Naive semantics]] –
* [[Natural language]] –
* [[Natural-language user interface|Natural-language interface]] –
* [[Natural
* [[News analytics]] –
* [[Nondeterministic polynomial]] –
* [[String kernel]] –
== Natural
* [[Google Ngram Viewer]] – graphs ''n''-gram usage from a corpus of more than 5.2 million books
=== Corpora ===
* [[Text corpus]] (see [[List of text corpora|list]]) – large and structured set of texts (nowadays usually electronically stored and processed). They are used to do statistical analysis and hypothesis testing, checking occurrences or validating linguistic rules within a specific language territory.
** [[Bank of English]]
** [[Oxford English Corpus]]
=== Natural
The following '''natural
{|class="wikitable sortable"
!Name!!Language!!License!!Creators
|-
|[[Apertium]] || [[C++]], [[Java (programming language)|Java]] || [[GPL]] || (various)
|-
|[[ChatScript]] || [[C++]] || [[GPL]] || [[Bruce Wilcox]]
|-
|[[Deeplearning4j]]||[[Java (programming language)|Java]], [[Scala (programming language)|Scala]]||[[Apache License|Apache 2.0]]|| Adam Gibson, Skymind
|-
|[[DELPH-IN]]||[[LISP]], [[C++]]||[[LGPL]], [[MIT License|MIT]], ... || Deep Linguistic Processing with [[HPSG]] Initiative
|-
|[[Distinguo]]||[[C++]]||Commercial ||Ultralingua Inc.
|-
|[[DKPro]] Core||[[Java (programming language)|Java]]||[[Apache License|Apache 2.0]] / Varying for individual modules ||[[Technische Universität Darmstadt]] / Online community
|-
|[[
|-
|[[
|-
|[[
|-
|[[
|-
|[[
|-
|[[
|-
|[[
|-
|Apache [[OpenNLP]]||[[Java (programming language)|Java]]||[[Apache Software Foundation|Apache License 2.0]]||Online community
|-
|[[spaCy]]
|[[MIT License|MIT]]
|Matthew Honnibal, Explosion AI
|-
|[[UIMA]]||[[Java (programming language)|Java]] / [[C++]]
|-
|}
=== Named
* ABNER (A Biomedical Named
*
=== Translation software ===
** [[DeepL]]
** [[Linguee]] – web service that provides an online dictionary for a number of language pairs. Unlike similar services, such as LEO, Linguee incorporates a search engine that provides access to large amounts of bilingual, translated sentence pairs, which come from the World Wide Web. As a translation aid, Linguee therefore differs from machine translation services like Babelfish and is more similar in function to a translation memory.
** [[Universal Networking Language|UNL]] Universal Networking Language
** [[Yahoo! Babel Fish]]
=== Other software ===
* [[CTAKES]] – open-source natural-language processing system for information extraction from electronic medical record clinical free-text. It processes clinical notes, identifying types of clinical named entities — drugs, diseases/disorders, signs/symptoms, anatomical sites and procedures. Each named entity has attributes for the text span, the ontology mapping code, context (family history of, current, unrelated to patient), and negated/not negated. Also known as Apache cTAKES.
* [[Digital Media Access Protocol|DMAP]] –
* [[ETAP-3]] – proprietary linguistic processing system focusing on English and Russian.<ref>{{cite web|url=http://www.iitp.ru/ru/science/works/452.htm |title=МНОГОЦЕЛЕВОЙ ЛИНГВИСТИЧЕСКИЙ ПРОЦЕССОР ЭТАП-3 |publisher=Iitp.ru |access-date
* [[JAPE (linguistics)|JAPE]] – the Java Annotation Patterns Engine, a component of the open-source General Architecture for Text Engineering (GATE) platform. JAPE is a finite state transducer that operates over annotations based on regular expressions.
* [[LOLITA]] – "Large-scale, Object-based, Linguistic Interactor, Translator and Analyzer". LOLITA was developed by Roberto Garigliano and colleagues between 1986 and 2000. It was designed as a general-purpose tool for processing unrestricted text that could be the basis of a wide variety of applications. At its core was a semantic network containing some 90,000 interlinked concepts.
* [[Maluuba]] –
* [[METAL MT]] –
* [[Never-Ending Language Learning]] – semantic machine learning system developed by a research team at Carnegie Mellon University, and supported by grants from DARPA, Google, and the NSF, with portions of the system running on a supercomputing cluster provided by Yahoo!.<ref name=NYT2010>{{cite news
* [[NLTK]] –
* [[Online-translator.com]] –
* [[Weka (machine learning)|Weka's]] classification tools –
* [[word2vec]] – group of models, developed by a team of researchers led by Tomas Mikolov at Google, that generate word embeddings. The models are shallow, two-layer neural networks trained to reconstruct the linguistic contexts of words.
* [[Festival Speech Synthesis System]] –
* [[CMU Sphinx]] speech recognition system –
* [[Language Grid]] – open-source platform for language web services, on which customized language services can be built by combining existing ones.
=== Chatterbots ===
{{Main|List of chatbots}}
{{For|online chatterbots with [[avatar (computing)|avatars]]|Automated online assistant}}
[[Chatterbot]] – a text-based conversation [[Software agent|agent]] that can interact with human users through some medium, such as an [[instant message]] service. Some chatterbots are designed for specific purposes, while others converse with human users on a wide range of topics.<!--
Please add new entries alphabetically to the appropriate section according to the guidelines on the TalkPage regarding encyclopaedic relevance. Provide references and short descriptions as appropriate.-->
==== Classic chatterbots ====
* [[Dr. Sbaitso]]
* [[
* [[
* [[Racter]] (or Claude Chatterbot)
* [[Mark V Shaney]]
==== General chatterbots ====
* [[Albert One]] – 1998 and 1999 [[Loebner Prize|Loebner]] winner, by [[Robby Garner]].
* [[Artificial Linguistic Internet Computer Entity|A.L.I.C.E.]]
* [[Charlix]]
* [[Cleverbot]] (winner of the 2010 Mechanical Intelligence Competition)
* [[Elbot]]
* [[Eugene Goostman]]
* [[Fred (chatterbot)|Fred]]
* [[Jabberwacky]]
* [[Jeeney AI]]
* [[MegaHAL]]
* [[Mitsuku]], 2013 and 2016 [[Loebner Prize]] winner<ref>{{cite web|url=http://www.paulmckevitt.com/loebner2013/|title=Loebner Prize Contest 2013 |publisher=People.exeter.ac.uk |date=2013-09-14 |
* Rose - ... 2015 - 3x [[Loebner Prize]] winner, by [[Bruce Wilcox]].
* [[SimSimi]]
* [[Starship Titanic#Gameplay|Spookitalk]]
* [[Ultra Hal Assistant|Ultra Hal]]
* [[Verbot]]
==== Instant messenger chatterbots ====
* [[GooglyMinotaur]], specializing in [[Radiohead]], the first bot released by [[ActiveBuddy]] (June 2001-March 2002)<ref>{{Cite news
| last = Gibes
| first = Al
| title = Circle of buddies grows ever wider
| work = Las Vegas Review-Journal (Nevada) <!--|
| date = 2002-03-25
}}</ref>
* [[SmarterChild]], developed by [[ActiveBuddy]] and released in June 2001<ref>{{Cite news
|url=http://www.thefreelibrary.com/ActiveBuddy+Introduces+Software+to+Create+and+Deploy+Interactive...-a088988298
| title = ActiveBuddy Introduces Software to Create and Deploy Interactive Agents for Text Messaging; ActiveBuddy Developer Site Now Open: www.BuddyScript.com
| work = Business Wire
|
| date = 2002-07-15
}}</ref>
| number = 2
| date = Summer 1998
|
| url = http://www.foo.be/docs/tpj/issues/vol3_2/tpj0302-0002.html
}}</ref>
* [[Negobot]], a bot designed to catch online pedophiles by posing as a young girl and attempting to elicit personal details from people it speaks to.<ref>{{cite
==
* [[AFNLP]] (Asian Federation of Natural Language Processing Associations) – the organization for coordinating the natural-language processing related activities and events in the Asia-Pacific region.
* [[Australasian Language Technology Association]] –
* [[Association for Computational Linguistics]] – international scientific and professional society for people working on problems involving natural
=== Natural
* [[Annual Meeting of the Association for Computational Linguistics]] (ACL)
* [[International Conference on Intelligent Text Processing and Computational Linguistics]] (CICLing)
* [[International Conference on Language Resources and Evaluation]] – biennial conference organised by the European Language Resources Association with the support of institutions and organisations involved in
* [[Annual Conference of the North American Chapter of the Association for Computational Linguistics]] (NAACL)
* [[Text, Speech and Dialogue]] (TSD) – annual conference
* [[Text Retrieval Conference]] (TREC) – on-going series of workshops focusing on various information retrieval (IR) research areas, or tracks
=== Companies involved in natural-language processing ===
* [[AlchemyAPI]] – service provider of a natural
* [[Google, Inc.]] – the Google search engine is an example of automatic summarization, utilizing keyphrase extraction.
* [[Calais (Reuters product)]] – provider of a natural
* [[
== Natural
=== Books ===
* ''[https://www.amazon.com/Connectionist-Statistical-Symbolic-Approaches-Processing/dp/3540609253 Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing ]'' – Wermter, S., Riloff E. and Scheler, G. (editors).<ref>{{cite book |last1=Wermter |first1=Stephan |title=Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing |year=1996|publisher=Springer |author2=Ellen Riloff |author3=Gabriele Scheler }}</ref> First book that addressed statistical and neural network learning of language.
* ''[http://www.cs.colorado.edu/~martin/slp.html Speech and Language Processing: An Introduction to Natural Language Processing, Speech Recognition, and Computational Linguistics]'' – by [[Daniel Jurafsky]] and [[James H. Martin]].<ref>{{cite book |last1=Jurafsky |first1=Dan |title=Speech and Language Processing. An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition |year=2008|publisher=Prentice Hall |location=Upper Saddle River (N.J.) |page=2 |url=http://www.cs.colorado.edu/~martin/slp.html |author2=James H. Martin |edition=2nd}}</ref> Introductory book on language technology.
* ''[[Computational Linguistics (journal)|Computational Linguistics]]'' – peer-reviewed academic journal in the field of computational linguistics. It is published quarterly by MIT Press for the Association for Computational Linguistics (ACL)
== People influential in natural-language processing ==
* [[Daniel Bobrow]] –
* [[Rollo Carpenter]] – creator of Jabberwacky and Cleverbot.
| url = http://www.cs.bham.ac.uk/~pjh/sem1a5/pt1/pt1_history.html
| title = SEM1A5 - Part 1 - A brief history of NLP
|
}}</ref>
* [[Kenneth Colby]] –
* [[Lyn Frazier]] –
* [[Daniel Jurafsky]] – Professor of Linguistics and Computer Science at Stanford University. With [[James H. Martin]], he wrote the textbook ''Speech and Language Processing: An Introduction to Natural Language Processing, Speech Recognition, and Computational Linguistics''
* [[Roger Schank]] – introduced the [[conceptual dependency theory]] for natural
* [[Jean E. Fox Tree]] –
* [[Alan Turing]] – originator of the [[Turing Test]].
* [[Terry Winograd]] – professor of computer science at Stanford University, and co-director of the Stanford Human-Computer Interaction Group. He is known within the philosophy of mind and artificial intelligence fields for his work on natural language using the SHRDLU program.
* [[William Aaron Woods]] –
* [[Maurice Gross]] – author of the concept of local grammar,<ref name="AHI">[http://hdl.handle.net/2042/14456 Ibrahim, Amr Helmy. 2002. "Maurice Gross (1934-2001). À la mémoire de Maurice Gross". ''Hermès'' 34.]</ref> taking finite automata as the competence model of language.<ref name="RD">[http://www.nyu.edu/pages/linguistics/kaliedoscope/mauricegross13.pdf Dougherty, Ray. 2001. ''Maurice Gross Memorial Letter''.]</ref>
* [[Stephen Wolfram]] – CEO and founder of [[Wolfram Research]], creator of the programming language (natural
* [[Victor Yngve]] –
* [[Watson (computer)]]
* [[Biomedical text mining]]
* [[Compound
* [[Computer-assisted reviewing]]
* [[Controlled natural language]]
* [[Deep linguistic processing]]
* [[Foreign
* [[Foreign
* [[Language technology]]
* [[Latent Dirichlet allocation|Latent Dirichlet allocation (LDA)]]
* [[Latent semantic indexing]]
* [[List of natural language processing projects|List of natural-language processing projects]]
* [[LRE Map]]
* [[Natural
* [[Reification (linguistics)]]
* [[Semantic folding]]
* [[Word2vec]]
}}
== References ==
{{Reflist|30em}}
== Bibliography ==
* {{Crevier 1993}}
* {{Citation | last=McCorduck | first=Pamela | title=Machines Who Think | year=2004 | edition=2nd | location=Natick, MA | publisher=A. K. Peters, Ltd. | isbn=978-1-56881-205-2 | oclc=52197627}}.
* {{Russell Norvig 2003}}.
== External links ==
{{Sister project links|Natural language processing}}
{{Outline footer}}
[[Category:Natural language processing|*]]
[[Category:
[[Category:Outlines|Natural language processing]]