An example of a summarization problem is document summarization, which attempts to automatically produce an abstract from a given document. Sometimes one might be interested in generating a summary from a single source document, while at other times multiple source documents are used (for example, a [[cluster analysis|cluster]] of articles on the same topic). This problem is called [[multi-document summarization]]. A related application is summarizing news articles. Imagine a system which automatically pulls together news articles on a given topic (from the web) and concisely represents the latest news as a summary.
Image collection summarization is another application example of automatic summarization. It consists of selecting a representative set of images from a larger set of images.<ref>Jorge E. Camargo and Fabio A. González. A Multi-class Kernel Alignment Method for Image Collection Summarization. In Proceedings of the 14th Iberoamerican Conference on Pattern Recognition: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications (CIARP '09), Eduardo Bayro-Corrochano and Jan-Olof Eklundh (Eds.). Springer-Verlag, Berlin, Heidelberg, 545–552.</ref>
At a very high level, summarization algorithms try to find subsets of objects (such as a set of sentences or a set of images) that cover the information of the entire set. This is also called the ''core-set''. These algorithms model notions like diversity, coverage, information and representativeness of the summary. Query-based summarization techniques additionally model the relevance of the summary to the query. Some techniques and algorithms which naturally model summarization problems are TextRank and PageRank, [[Submodular set function|submodular set functions]], [[Determinantal point process|determinantal point processes]], and maximal marginal relevance (MMR).
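As an illustration of one of the techniques named above, the following is a minimal sketch of greedy maximal marginal relevance (MMR) selection. It assumes a simple bag-of-words cosine similarity as the similarity measure, and the function and parameter names (`mmr_summarize`, `lam`) are illustrative, not from any particular library. Each step picks the sentence that best trades off relevance to the query against redundancy with sentences already selected.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def mmr_summarize(sentences, query, k=2, lam=0.7):
    """Greedily pick k sentences by maximal marginal relevance:
    score(s) = lam * sim(s, query) - (1 - lam) * max sim(s, already picked)."""
    bows = [Counter(s.lower().split()) for s in sentences]
    qbow = Counter(query.lower().split())
    selected, candidates = [], list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(bows[i], qbow)
            redundancy = max((cosine(bows[i], bows[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)  # highest marginal relevance
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]
```

With `lam` close to 1 the selection favors pure query relevance; lowering it pushes the summary toward diversity, which is the coverage/redundancy trade-off described above.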
==History==
The first publication in the area dates back to 1957<ref>Luhn, Hans Peter (1957). "A Statistical Approach to Mechanized Encoding and Searching of Literary Information" (PDF). IBM Journal of Research and Development. 1 (4): 309–317. doi:10.1147/rd.14.0309.</ref> ([[Hans Peter Luhn]]), starting with a statistical technique. Research increased significantly in 2015. [[Term frequency–inverse document frequency]] had been used by 2016. Pattern-based summarization was the most powerful option for multi-document summarization found by 2016. In the following year it was surpassed by [[latent semantic analysis]] (LSA) combined with [[non-negative matrix factorization]] (NMF). Although they did not replace other approaches and are often combined with them, by 2019 machine learning methods dominated the extractive summarization of single documents, which was considered to be nearing maturity. By 2020, the field was still very active, and research was shifting towards abstractive summarization and real-time summarization.<ref>{{Cite journal|date=2020-05-20|title=Review of automatic text summarization techniques & methods}}</ref>
===Recent approaches===