Automatic summarization is the creation of a shortened version of a text by a computer program. The resulting summary still contains the most important points of the original text.
Access to coherent and well-formed text summaries can be of great use, especially in an age of information overload in which the amount of electronically available information grows every day. Search engines such as Google are a good example of where summarization technology can be applied.
Technologies that can produce a coherent summary of any kind of text need to take into account variables such as length, writing style and syntax in order to make a useful summary.
Extraction and Abstraction
Broadly, one distinguishes two approaches: extraction and abstraction.
Extraction techniques merely copy the information deemed most important by the system to the summary, while abstraction involves paraphrasing sections of the source document. In general, abstraction can condense a text more strongly than extraction, but the programs that can do this are harder to develop.
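As a rough illustration of the extractive approach, many systems score each sentence and copy the highest-scoring ones into the summary. The word-frequency heuristic below is only an assumed, simplified scoring method, not the technique of any particular system:

```python
import re
from collections import Counter

def extractive_summary(text, num_sentences=2):
    """Score each sentence by the document-wide frequency of its words
    and return the top-scoring sentences in their original order
    (a simple frequency-based extraction heuristic)."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    freq = Counter(re.findall(r'\w+', text.lower()))
    scores = {
        s: sum(freq[w] for w in re.findall(r'\w+', s.lower()))
        for s in sentences
    }
    top = sorted(sentences, key=scores.get, reverse=True)[:num_sentences]
    return ' '.join(s for s in sentences if s in top)
```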
Types of Summaries
There are different types of summaries depending on what the summarization program focuses on, for example generic summaries or query-relevant summaries (sometimes called query-biased summaries).
Summarization systems are now able to create both query-relevant text summaries and generic machine-generated summaries, depending on what the user needs. Summarization of multimedia documents, e.g. pictures or movies, is also possible.
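To illustrate how a query-relevant summary differs from a generic one, a system might weight sentences by their overlap with the user's query terms. This is only an assumed toy scoring scheme for the sake of illustration:

```python
import re

def query_biased_summary(text, query, num_sentences=2):
    """Pick the sentences that share the most terms with the query,
    so the summary is biased toward the user's information need."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    query_terms = set(re.findall(r'\w+', query.lower()))
    scores = {
        s: len(query_terms & set(re.findall(r'\w+', s.lower())))
        for s in sentences
    }
    top = sorted(sentences, key=scores.get, reverse=True)[:num_sentences]
    return ' '.join(s for s in sentences if s in top)
```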
Aided summarization
Machine learning techniques from closely related fields such as information retrieval and text mining have been successfully adapted to help automatic summarization.
Apart from Fully Automated Summarizers (FAS), there are systems that aid users with the task of summarization (MAHS = Machine Aided Human Summarization), for example by highlighting candidate passages to be included in the summary, and there are systems that depend on post-processing by a human (HAMS = Human Aided Machine Summarization).
Issues
One of the major issues in current automatic summarization research is how to evaluate a summarization system, or, in other words, how to tell automatically and systematically that summary A is better than summary B. And what does "better" mean? Should we prefer a perfectly grammatical summary that contains very little information, or a not-so-well-written summary that contains a lot of it?
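One widely used family of automatic measures (e.g. ROUGE) compares the word or n-gram overlap between a machine summary and one or more human-written reference summaries. The unigram-recall sketch below is only a minimal illustration of that idea, not a full evaluation protocol:

```python
import re
from collections import Counter

def unigram_recall(candidate, reference):
    """Fraction of the reference summary's words that also appear in
    the candidate summary (a ROUGE-1-recall-style overlap measure)."""
    cand = Counter(re.findall(r'\w+', candidate.lower()))
    ref = Counter(re.findall(r'\w+', reference.lower()))
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / max(sum(ref.values()), 1)
```

Such overlap scores correlate only loosely with human judgments of readability, which is why content coverage and linguistic quality are often assessed separately.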
Further Reading
- Endres-Niggemeyer, Brigitte (1998): Summarizing Information (ISBN 3540637354)
- Marcu, Daniel (2000): The Theory and Practice of Discourse Parsing and Summarization (ISBN 0262133725)
- Mani, Inderjeet (2001): Automatic Summarization (ISBN 1588110605)
External links
- Presentation about statistical summarization methods as well as an on-line summarizer
- Text Summarization
- ACM Special Interest Group on Information Retrieval
- Pertinence Summarizer, a commercial webpage summarization demo (launch a query, click "summary", login: "google", password: "google")