write those articles first
reduce verbosity per Wikipedia:Manual_of_Style/Words_to_watch#Editorializing and MOS:NOTE
Line 187:
* [[Naive Bayes spam filtering|Spam filtering]] –
* [[Sentiment analysis]] – extracts subjective information, usually from a set of documents, often using online reviews to determine "polarity" about specific objects. It is especially useful for identifying trends of public opinion on social media for marketing purposes.
* [[Speech recognition]] – given a sound clip of a person or people speaking, determines the textual representation of the speech. This is the opposite of [[text to speech]] and is one of the extremely difficult problems colloquially termed "[[AI-complete]]" (see above). In [[natural speech]] there are hardly any pauses between successive words, and thus [[speech segmentation]] is a necessary subtask of speech recognition (see below).
* [[Speech synthesis]] (Text-to-speech) –
* [[Text-proofing]] –
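
The "polarity" idea in the sentiment analysis entry above can be illustrated with a minimal sketch. It assumes a toy word-list approach: the `POSITIVE` and `NEGATIVE` sets and the `polarity` function are hypothetical names invented for this example, not a real sentiment lexicon or library API, and practical systems generally use far richer models.

```python
# Toy lexicon-based polarity scorer (illustrative only).
# POSITIVE and NEGATIVE are hypothetical word lists, not a real sentiment lexicon.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}


def polarity(review: str) -> str:
    """Label a review by counting positive and negative lexicon hits."""
    # Split on whitespace, strip surrounding punctuation, and lowercase each word.
    words = [w.strip('.,!?;:"').lower() for w in review.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


reviews = [
    "Great phone, I love the screen!",
    "Terrible battery and poor support.",
    "It arrived on Tuesday.",
]
labels = [polarity(r) for r in reviews]
```

A counting approach like this ignores negation and context ("not good" would score as positive), which is one reason the task is usually treated as a learning problem rather than a lookup.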
Line 216:
* [[Lemmatisation]] – groups together all terms that share the same lemma so that they are classified as a single item.
* [[Morphology (linguistics)|Morphological segmentation]] – separates words into individual [[morphemes]] and identifies the class of the morphemes. The difficulty of this task depends greatly on the complexity of the [[morphology (linguistics)|morphology]] (i.e. the structure of words) of the language being considered. [[English language|English]] has fairly simple morphology, especially [[inflectional morphology]], and thus it is often possible to ignore this task entirely and simply model all possible forms of a word (e.g. "open, opens, opened, opening") as separate words. In languages such as [[Turkish language|Turkish]], however, such an approach is not possible, as each dictionary entry has thousands of possible word forms.
* [[Named entity recognition]] (NER) – given a stream of text, determines which items in the text map to proper names, such as people or places, and what the type of each such name is (e.g. person, location, organization).
* [[Ontology learning]] – automatic or semi-automatic creation of [[Ontology (information science)|ontologies]], including extracting the corresponding domain's terms and the relationships between those concepts from a corpus of natural language text, and encoding them with an [[ontology language]] for easy retrieval. Also called "ontology extraction", "ontology generation", and "ontology acquisition".
* [[Parsing]] – determines the [[parse tree]] (grammatical analysis) of a given sentence. The [[grammar]] for [[natural language]]s is [[ambiguous]], and typical sentences have multiple possible analyses. A typical sentence may have thousands of potential parses, most of which will seem completely nonsensical to a human.
** [[Shallow parsing]] –
* [[Part-of-speech tagging]] – given a sentence, determines the [[part of speech]] for each word. Many words, especially common ones, can serve as multiple [[parts of speech]]. For example, "book" can be a [[noun]] ("the book on the table") or [[verb]] ("to book a flight"); "set" can be a [[noun]], [[verb]] or [[adjective]]; and "out" can be any of at least five different parts of speech.
* [[Query expansion]] –
* [[Relationship extraction]] – given a chunk of text, identifies the relationships among named entities (e.g. who is the wife of whom).
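
The grouping described in the lemmatisation entry above can be sketched as follows, reusing the "open, opens, opened, opening" forms from the morphological segmentation entry. The `LEMMAS` table and the `group_by_lemma` function are hypothetical and exist only for this illustration; an actual lemmatiser would rely on a full dictionary or morphological analysis rather than a hand-written lookup.

```python
from collections import defaultdict

# Hypothetical lookup table mapping inflected forms to their lemma.
LEMMAS = {
    "open": "open",
    "opens": "open",
    "opened": "open",
    "opening": "open",
    "books": "book",
    "booked": "book",
}


def group_by_lemma(tokens):
    """Group tokens under their lemma; unknown words act as their own lemma."""
    groups = defaultdict(list)
    for token in tokens:
        lemma = LEMMAS.get(token.lower(), token.lower())
        groups[lemma].append(token)
    return dict(groups)


grouped = group_by_lemma(["opens", "opened", "books", "opening", "table"])
```

Here the three forms of "open" end up under a single key, so later processing can treat them as one item.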