An example of a summarization problem is document summarization, which attempts to automatically produce an abstract from a given document. Sometimes one might be interested in generating a summary from a single source document, while at other times multiple source documents are used (for example, a [[cluster analysis|cluster]] of articles on the same topic). This problem is called [[multi-document summarization]]. A related application is summarizing news articles. Imagine a system which automatically pulls together news articles on a given topic (from the web) and concisely represents the latest news as a summary.
Image collection summarization is another application example of automatic summarization. It consists of selecting a representative set of images from a larger set of images.<ref>Jorge E. Camargo and Fabio A. González. A Multi-class Kernel Alignment Method for Image Collection Summarization. In Proceedings of the 14th Iberoamerican Conference on Pattern Recognition: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications (CIARP '09), Eduardo Bayro-Corrochano and Jan-Olof Eklundh (Eds.). Springer-Verlag, Berlin, Heidelberg, 545–552.</ref>
At a very high level, summarization algorithms try to find subsets of objects (such as a set of sentences or a set of images) that cover the information of the entire set. This is also called the ''core-set''. These algorithms model notions like diversity, coverage, information and representativeness of the summary. Query-based summarization techniques additionally model the relevance of the summary to the query. Some techniques and algorithms which naturally model summarization problems are TextRank and PageRank, [[Submodular set function|submodular set functions]], [[Determinantal point process|determinantal point processes]], and maximal marginal relevance (MMR).
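As an illustration of one of the techniques named above, the following is a minimal sketch of greedy maximal marginal relevance (MMR) selection. It assumes a simple bag-of-words cosine similarity as the similarity measure, and the function and parameter names (`mmr_summarize`, `lam`) are illustrative, not from any particular library. Each step picks the sentence that best trades off relevance to the query against redundancy with sentences already selected.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def mmr_summarize(sentences, query, k=2, lam=0.7):
    """Greedily pick k sentences by maximal marginal relevance:
    score(s) = lam * sim(s, query) - (1 - lam) * max sim(s, already picked)."""
    bows = [Counter(s.lower().split()) for s in sentences]
    qbow = Counter(query.lower().split())
    selected, candidates = [], list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(bows[i], qbow)
            redundancy = max((cosine(bows[i], bows[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)  # highest marginal relevance
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]
```

With `lam` close to 1 the selection favors pure query relevance; lowering it pushes the summary toward diversity, which is the coverage/redundancy trade-off described above.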
==History==
The first publication in the area dates back to 1957<ref>Luhn, Hans Peter (1957). "A Statistical Approach to Mechanized Encoding and Searching of Literary Information" (PDF). IBM Journal of Research and Development. 1 (4): 309–317. doi:10.1147/rd.14.0309.</ref> ([[Hans Peter Luhn]]), starting with a statistical technique. Research increased significantly in 2015. [[Term frequency–inverse document frequency]] had been used by 2016. Pattern-based summarization was the most powerful option for multi-document summarization found by 2016. In the following year it was surpassed by [[latent semantic analysis]] (LSA) combined with [[non-negative matrix factorization]] (NMF). Although they did not replace other approaches and are often combined with them, by 2019 machine learning methods dominated the extractive summarization of single documents, which was considered to be nearing maturity. By 2020, the field was still very active, and research was shifting towards abstractive summarization and real-time summarization.<ref>{{Cite journal|date=2020-05-20|title=Review of automatic text summarization techniques & methods}}</ref>
===Recent approaches===