Supervised text summarization is very much like supervised keyphrase extraction. Given a collection of documents and human-generated summaries for them, one can learn features of sentences that make them good candidates for inclusion in the summary. Features might include the position in the document (e.g., the first few sentences are probably important), the number of words in the sentence, and so on. The main difficulty in supervised extractive summarization is that the known summaries must be created manually by extracting sentences, so that the sentences in an original training document can be labeled as "in summary" or "not in summary". This is not typically how people create summaries, so simply using journal abstracts or existing summaries is usually not sufficient: the sentences in these summaries do not necessarily match up with sentences in the original text, which makes it difficult to assign labels to training examples. Note, however, that these natural summaries can still be used for evaluation, since ROUGE-1 only considers unigram overlap and does not require sentence-level alignment.
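The setup described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation: it uses a naive regular-expression sentence splitter, only the two example features named above (relative position and word count), labels a sentence positive only when it appears verbatim in a human-made extract, and computes ROUGE-1 recall as clipped unigram overlap over lowercased word tokens. All function names are hypothetical.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def sentence_features(sentences):
    """One feature vector per sentence: [relative position, word count]."""
    n = len(sentences)
    features = []
    for i, sentence in enumerate(sentences):
        position = i / (n - 1) if n > 1 else 0.0  # 0.0 = first sentence, 1.0 = last
        features.append([position, float(len(sentence.split()))])
    return features


def label_sentences(sentences, extract):
    """1 if the sentence was copied verbatim into the human extract, else 0."""
    chosen = set(extract)
    return [1 if sentence in chosen else 0 for sentence in sentences]


def tokens(text):
    # Lowercased word tokens; punctuation is dropped.
    return re.findall(r"\w+", text.lower())


def rouge1_recall(candidate, reference):
    """Clipped unigram overlap divided by the number of reference unigrams."""
    cand = Counter(tokens(candidate))
    ref = Counter(tokens(reference))
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0


if __name__ == "__main__":
    doc = ("Storms hit the coast. Power was cut to thousands. "
           "Officials urged caution. Schools will reopen Monday.")
    sentences = split_sentences(doc)
    extract = ["Storms hit the coast.", "Power was cut to thousands."]
    print(sentence_features(sentences))
    print(label_sentences(sentences, extract))  # [1, 1, 0, 0]
    print(rouge1_recall("Storms hit the coast.",
                        "Storms hit the coast and power was cut"))  # 0.5
```

Because <code>label_sentences</code> depends on exact sentence matches, a paraphrased abstract would produce no positive labels, whereas <code>rouge1_recall</code> can still score a candidate summary against it.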
====Maximum entropy-based summarization====
During the DUC 2001 and 2002 evaluation workshops, [[Netherlands Organisation for Applied Scientific Research|TNO]] developed a sentence extraction system for multi-document summarization in the news domain. The system was a hybrid that used a [[naive Bayes]] classifier and statistical language models for modeling salience. Although the system exhibited good results, the researchers wanted to explore the effectiveness of a [[maximum entropy classifier|maximum entropy]] (ME) classifier for the meeting summarization task, as ME is known to be robust against feature dependencies. Maximum entropy has also been applied successfully to summarization in the broadcast news domain.
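For a binary "in summary" / "not in summary" label, a conditional maximum entropy model over real-valued features reduces to logistic regression. The following Python sketch fits such a model with plain batch gradient ascent on the conditional log-likelihood. It does not reproduce the TNO system, whose features are not described here; the toy feature set, learning rate, epoch count, and function names are illustrative assumptions.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def prob_in_summary(weights, bias, x):
    """P(label = 1 | x) under a binary maximum entropy (logistic) model."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_maxent(features, labels, epochs=1000, lr=0.5):
    """Fit weights by batch gradient ascent on the mean conditional log-likelihood.

    For each weight the gradient is the observed feature total minus the
    total the current model expects, the usual maximum entropy condition.
    """
    n, d = len(features), len(features[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            err = y - prob_in_summary(weights, bias, x)  # observed minus expected
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w + lr * g / n for w, g in zip(weights, grad_w)]
        bias += lr * grad_b / n
    return weights, bias


if __name__ == "__main__":
    # Hypothetical toy features: [relative position, contains a cue phrase (0/1)].
    X = [[0.0, 1.0], [0.1, 1.0], [0.9, 0.0], [1.0, 0.0]]
    y = [1, 1, 0, 0]
    w, b = train_maxent(X, y)
    for x in X:
        print(x, round(prob_in_summary(w, b, x), 3))
```

One informal reading of the robustness claim: naive Bayes multiplies per-feature likelihoods as if features were conditionally independent, so two strongly correlated features contribute their evidence twice, whereas the ME weights are fitted jointly and share that evidence between them.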
==== Adaptive summarization ====