Automatic summarization
{{Short description|Computer-based method for summarizing a text}}
{{More citations needed|date=April 2022}}
'''Automatic summarization''' is the process of shortening a set of data computationally, to create a subset (a [[Abstract (summary)|summary]]) that represents the most important or relevant information within the original content. [[Artificial intelligence]] [[algorithm]]s are commonly developed and employed to achieve this, specialized for different types of data.
 
[[Plain text|Text]] summarization is usually implemented by [[natural language processing]] methods, designed to locate the most informative sentences in a given document.<ref name="Torres2014">{{cite book|author1=Torres-Moreno, Juan-Manuel|title=Automatic Text Summarization|url=https://www.wiley.com/en-gb/Automatic+Text+Summarization-p-9781848216686|date=1 October 2014|publisher=Wiley|isbn=978-1-848-21668-6|pages=320–}}</ref> On the other hand, visual content can be summarized using [[computer vision]] algorithms. [[Image]] summarization is the subject of ongoing research; existing approaches typically attempt to display the most representative images from a given image collection, or generate a video that only includes the most important content from the entire collection.<ref>{{Cite journal|last1=Pan|first1=Xingjia|last2=Tang|first2=Fan|last3=Dong|first3=Weiming|last4=Ma|first4=Chongyang|last5=Meng|first5=Yiping|last6=Huang|first6=Feiyue|last7=Lee|first7=Tong-Yee|last8=Xu|first8=Changsheng|date=2021-04-01|title=Content-Based Visual Summarization for Image Collection|journal=IEEE Transactions on Visualization and Computer Graphics|volume=27|issue=4|pages=2298–2312|doi=10.1109/tvcg.2019.2948611|pmid=31647438|s2cid=204865221|issn=1077-2626}}</ref><ref>{{Cite news|date=January 10, 2018|title=WIPO PUBLISHES PATENT OF KT FOR "IMAGE SUMMARIZATION SYSTEM AND METHOD" (SOUTH KOREAN INVENTORS)|work=US Fed News Service|url=https://www.proquest.com/docview/1986931333|access-date=January 22, 2021|id={{ProQuest|1986931333}}}}</ref><ref>{{Cite journal|last1=Li Tan|last2=Yangqiu Song|last3=Shixia Liu|author3-link=Shixia Liu|last4=Lexing Xie|date=February 2012|title=ImageHive: Interactive Content-Aware Image Summarization|journal=IEEE Computer Graphics and Applications|volume=32|issue=1|pages=46–55|doi=10.1109/mcg.2011.89|pmid=24808292|s2cid=7668289|issn=0272-1716}}</ref> Video summarization algorithms identify and extract from the original video content the most important frames (''key-frames''), and/or the most important video segments (''key-shots''), normally in a temporally ordered fashion.<ref name="PalPetrosino2012">{{cite book|author1=Sankar K. Pal|author2=Alfredo Petrosino|author3=Lucia Maddalena|title=Handbook on Soft Computing for Video Surveillance|url=https://books.google.com/books?id=O0fNBQAAQBAJ&q=video+surveillance+summarization&pg=PA81|date=25 January 2012|publisher=CRC Press|isbn=978-1-4398-5685-7|pages=81–}}</ref><ref name="Elhamifar2012">{{cite book |last1=Elhamifar |first1=Ehsan |last2=Sapiro |first2=Guillermo |last3=Vidal |first3=Rene |title=2012 IEEE Conference on Computer Vision and Pattern Recognition |chapter=See all by looking at a few: Sparse modeling for finding representative objects |year=2012 |pages=1600–1607 |publisher=IEEE |doi=10.1109/CVPR.2012.6247852 |isbn=978-1-4673-1228-8 |s2cid=5909301 }}</ref><ref name="Mademlis2016">{{cite journal |last1=Mademlis |first1=Ioannis |last2=Tefas |first2=Anastasios |last3=Nikolaidis |first3=Nikos |last4=Pitas |first4=Ioannis |title=Multimodal stereoscopic movie summarization conforming to narrative characteristics |url=https://research-information.bris.ac.uk/files/111433536/Ioannis_Pitas_Multimodal_Stereoscopic_Movie_Summarization_Conforming_to_Narrative_Characteristics.pdf |journal=IEEE Transactions on Image Processing |year=2016 |volume=25 |issue=12 |pages=5828–5840 |publisher=IEEE |doi=10.1109/TIP.2016.2615289 |pmid=28113502 |bibcode=2016ITIP...25.5828M |hdl=1983/2bcdd7a5-825f-4ac9-90ec-f2f538bfcb72 |s2cid=18566122 |access-date=4 December 2022}}</ref><ref name="Mademlis2018">{{cite journal |last1=Mademlis |first1=Ioannis |last2=Tefas |first2=Anastasios |last3=Pitas |first3=Ioannis |title=A salient dictionary learning framework for activity video summarization via key-frame extraction |url=https://www.sciencedirect.com/science/article/abs/pii/S0020025517311398 |journal=Information Sciences |year=2018 |volume=432 |pages=319–331 |publisher=Elsevier |doi=10.1016/j.ins.2017.12.020 |access-date=4 December 2022|url-access=subscription }}</ref> Video summaries simply retain a carefully selected subset of the original video frames and, therefore, are not identical to the output of [[video synopsis]] algorithms, where ''new'' video frames are being synthesized based on the original video content.
 
== Commercial products ==
In 2022 [[Google Docs]] released an automatic summarization feature.<ref>{{Cite web |title=Auto-generated Summaries in Google Docs |url=http://ai.googleblog.com/2022/03/auto-generated-summaries-in-google-docs.html |access-date=2022-04-03 |website=Google AI Blog |date=23 March 2022 |language=en}}</ref>
 
==Approaches==
===Extraction-based summarization===
Here, content is extracted from the original data, but the extracted content is not modified in any way. Examples of extracted content include key-phrases that can be used to "tag" or index a text document, key sentences (including headings) that collectively comprise an abstract, and representative images or video segments, as stated above. For text, extraction is analogous to the process of skimming, where the summary (if available), headings and subheadings, figures, the first and last paragraphs of a section, and optionally the first and last sentences in a paragraph are read before one chooses to read the entire document in detail.<ref>Richard Sutz, Peter Weverka. How to skim text. https://www.dummies.com/education/language-arts/speed-reading/how-to-skim-text/ Accessed Dec 2019.</ref> Other examples of extraction include key sequences of text in terms of clinical relevance (including patient/problem, intervention, and outcome).<ref name="Afzal_et_al"/>
 
===Abstractive-based summarization===
 
Abstractive summarization methods generate new text that did not exist in the original text.<ref>{{Cite book |last=Zhai |first=ChengXiang |title=Text data management and analysis : a practical introduction to information retrieval and text mining |date=2016 |others=Sean Massung |isbn=978-1-970001-19-8 |page=321 |___location=[New York, NY] |oclc=957355971}}</ref> This has been applied mainly for text. Abstractive methods build an internal semantic representation of the original content (often called a language model), and then use this representation to create a summary that is closer to what a human might express. Abstraction may transform the extracted content by [[automated paraphrasing|paraphrasing]] sections of the source document, to condense a text more strongly than extraction. Such transformation, however, is computationally much more challenging than extraction, involving both [[natural language processing]] and often a deep understanding of the ___domain of the original text in cases where the original document relates to a special field of knowledge. "Paraphrasing" is even more difficult to apply to images and videos, which is why most summarization systems are extractive.
 
===Aided summarization===
 
====Supervised learning approaches====
Beginning with the work of Turney,<ref>{{Cite journal |arxiv = cs/0212020|last1 = Turney|first1 = Peter D|title = Learning Algorithms for Keyphrase Extraction|journal = Information Retrieval|volume = 2|issue = 4|pages = 303–336|year = 2002|doi = 10.1023/A:1009976227802|bibcode = 2002cs.......12020T|s2cid = 7007323}}</ref> many researchers have approached keyphrase extraction as a [[supervised machine learning]] problem.
Given a document, we construct an example for each [[unigram]], [[bigram]], and trigram found in the text (though other text units are also possible, as discussed below). We then compute various features describing each example (e.g., does the phrase begin with an upper-case letter?). We assume there are known keyphrases available for a set of training documents. Using the known keyphrases, we can assign positive or negative labels to the examples. Then we learn a classifier that can discriminate between positive and negative examples as a function of the features. Some classifiers make a [[binary classification]] for a test example, while others assign a probability of being a keyphrase. For instance, in the above text, we might learn a rule that says phrases with initial capital letters are likely to be keyphrases.
After training a learner, we can select keyphrases for test documents in the following manner. We apply the same example-generation strategy to the test documents, then run each example through the learner. We can determine the keyphrases by looking at binary classification decisions or probabilities returned from our learned model. If probabilities are given, a threshold is used to select the keyphrases.
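The example-generation and feature-computation steps described above can be sketched as follows. This is a minimal illustration, not Turney's exact feature set; the feature names and the sample document are made up for the example:

```python
import re

def generate_examples(text, max_n=3):
    """Generate unigram, bigram, and trigram candidate phrases from a text."""
    tokens = re.findall(r"[A-Za-z][A-Za-z-]*", text)
    examples = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            examples.append(" ".join(tokens[i:i + n]))
    return examples

def phrase_features(phrase, text):
    """Compute a few illustrative features for one candidate phrase."""
    lower = text.lower()
    return {
        "term_frequency": lower.count(phrase.lower()),   # occurrences in this text
        "length_in_words": len(phrase.split()),
        "starts_upper": phrase[0].isupper(),             # Boolean syntactic feature
        "first_occurrence": lower.find(phrase.lower()) / max(1, len(text)),
    }

doc = ("Automatic summarization shortens documents. "
       "Automatic summarization is studied in natural language processing.")
cands = generate_examples(doc)
feats = phrase_features("Automatic summarization", doc)
```

Given known keyphrases for a set of training documents, each candidate would then be labeled positive or negative and the feature vectors fed to any binary classifier.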
Keyphrase extractors are generally evaluated using [[precision and recall]]. Precision measures how many of the proposed keyphrases are actually correct. Recall measures how many of the true keyphrases your system proposed. The two measures can be combined in an F-score, which is the harmonic mean of the two.
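These evaluation measures can be computed directly; a small sketch with hypothetical keyphrase lists:

```python
def precision_recall_f1(proposed, true_keyphrases):
    """Score a set of proposed keyphrases against the known (gold) ones."""
    proposed, true_keyphrases = set(proposed), set(true_keyphrases)
    correct = len(proposed & true_keyphrases)
    precision = correct / len(proposed) if proposed else 0.0
    recall = correct / len(true_keyphrases) if true_keyphrases else 0.0
    # F-score: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

p, r, f = precision_recall_f1(
    ["text mining", "nlp", "summarization"],
    ["summarization", "keyphrase extraction", "nlp"])
# 2 of 3 proposed are correct, 2 of 3 true were found: p = r = f = 2/3
```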
Designing a supervised keyphrase extraction system involves deciding on several choices (some of these apply to unsupervised methods, too). The first choice is exactly how to generate examples. Turney and others have used all possible unigrams, bigrams, and trigrams without intervening punctuation and after removing stopwords. Hulth showed that you can get some improvement by selecting examples to be sequences of tokens that match certain patterns of part-of-speech tags. Ideally, the mechanism for generating examples produces all the known labeled keyphrases as candidates, though this is often not the case. For example, if we use only unigrams, bigrams, and trigrams, then we will never be able to extract a known keyphrase containing four words. Thus, recall may suffer. However, generating too many examples can also lead to low precision.
 
We also need to create features that describe the examples and are informative enough to allow a learning algorithm to discriminate keyphrases from non-keyphrases. Typically features involve various term frequencies (how many times a phrase appears in the current text or in a larger corpus), the length of the example, the relative position of its first occurrence, various Boolean syntactic features (e.g., contains all caps), etc. The Turney paper used about 12 such features. Hulth uses a reduced set of features, which were found most successful in the KEA (Keyphrase Extraction Algorithm) work derived from Turney's seminal paper.
 
In the end, the system will need to return a list of keyphrases for a test document, so we need to have a way to limit the number. Ensemble methods (i.e., using votes from several classifiers) have been used to produce numeric scores that can be thresholded to yield a user-specified number of keyphrases. This is the technique used by Turney with C4.5 decision trees. Hulth used a single binary classifier so the learning algorithm implicitly determines the appropriate number.
===Document summarization===
Like keyphrase extraction, document summarization aims to identify the essence of a text. The only real difference is that now we are dealing with larger text units—whole sentences instead of words and phrases.
 
Before getting into the details of some summarization methods, we will mention how summarization systems are typically evaluated. The most common way is using the so-called [[ROUGE (metric)|ROUGE]] (Recall-Oriented Understudy for Gisting Evaluation) measure. This is a recall-based measure that determines how well a system-generated summary covers the content present in one or more human-generated model summaries known as references. It is recall-based to encourage systems to include all the important topics in the text. Recall can be computed with respect to unigram, bigram, trigram, or 4-gram matching. For example, ROUGE-1 is computed as the number of unigrams in the reference summary that also appear in the system summary, divided by the total number of unigrams in the reference summary.
 
If there are multiple references, the ROUGE-1 scores are averaged. Because ROUGE is based only on content overlap, it can determine if the same general concepts are discussed between an automatic summary and a reference summary, but it cannot determine if the result is coherent or the sentences flow together in a sensible manner. High-order n-gram ROUGE measures try to judge fluency to some degree.
Note that ROUGE is similar to the BLEU measure for machine translation, but BLEU is precision-based, because translation systems favor accuracy.
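The ROUGE-1 recall computation described above can be sketched as follows (a minimal version using clipped unigram counts; the example sentences are made up, and real implementations add stemming and other preprocessing):

```python
from collections import Counter

def rouge_1_recall(system, reference):
    """Fraction of reference unigrams that also appear in the system summary,
    with per-unigram counts clipped to the system's counts."""
    sys_counts = Counter(system.lower().split())
    ref_counts = Counter(reference.lower().split())
    overlap = sum(min(count, sys_counts[word])
                  for word, count in ref_counts.items())
    return overlap / sum(ref_counts.values())

score = rouge_1_recall("the cat sat on the mat",   # system summary
                       "the cat was on the mat")   # reference summary
# reference has 6 tokens; 5 are matched ("the" x2, "cat", "on", "mat") -> 5/6
```

With multiple references, the per-reference scores would simply be averaged, as described above.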
 
 
====Supervised learning approaches====
Supervised text summarization is very much like supervised keyphrase extraction. Basically, if you have a collection of documents and human-generated summaries for them, you can learn features of sentences that make them good candidates for inclusion in the summary. Features might include the position in the document (i.e., the first few sentences are probably important), the number of words in the sentence, etc. The main difficulty in supervised extractive summarization is that the known summaries must be manually created by extracting sentences so the sentences in an original training document can be labeled as "in summary" or "not in summary". This is not typically how people create summaries, so simply using journal abstracts or existing summaries is usually not sufficient. The sentences in these summaries do not necessarily match up with sentences in the original text, so it would be difficult to assign labels to examples for training. Note, however, that these natural summaries can still be used for evaluation purposes, since ROUGE-1 evaluation only considers unigrams.
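The sentence features mentioned above might be computed as in the sketch below. The particular features (relative position, word count, lead-sentence flag) are illustrative choices, not a canonical set:

```python
def sentence_features(sentences):
    """Position- and length-based features for extractive summarization."""
    n = len(sentences)
    feats = []
    for i, s in enumerate(sentences):
        feats.append({
            "position": i / max(1, n - 1),   # 0.0 = first sentence, 1.0 = last
            "num_words": len(s.split()),
            "is_lead": i < 3,                # in the first few sentences?
        })
    return feats

sents = ["First sentence here.",
         "A much longer middle sentence follows.",
         "End."]
f = sentence_features(sents)
```

A classifier trained on "in summary" / "not in summary" labels would consume these feature dictionaries.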
 
====Maximum entropy-based summarization====
During the DUC 2001 and 2002 evaluation workshops, [[Netherlands Organisation for Applied Scientific Research|TNO]] developed a sentence extraction system for multi-document summarization in the news ___domain. The system was based on a hybrid system using a [[naive Bayes classifier]] and statistical language models for modeling salience. Although the system exhibited good results, the researchers wanted to explore the effectiveness of a [[maximum entropy classifier|maximum entropy]] (ME) classifier for the meeting summarization task, as ME is known to be robust against feature dependencies. Maximum entropy has also been applied successfully for summarization in the broadcast news ___domain.
 
==== Adaptive summarization ====
A promising approach is adaptive document/text summarization.<ref>{{Cite journal |last1=Yatsko |first1=V. A. |last2=Starikov |first2=M. S. |last3=Butakov |first3=A. V. |year=2010 |title=Automatic genre recognition and adaptive text summarization |journal=Automatic Documentation and Mathematical Linguistics |volume=44 |issue=3 |pages=111–120 |doi=10.3103/S0005105510030027 |s2cid=1586931}}</ref> It involves first recognizing the document/text genre and then applying summarization algorithms optimized for this genre. Such software has been created.<ref>[http://yatsko.zohosites.com/universal-summarizer-unis.html UNIS (Universal Summarizer)]</ref>
 
====TextRank and LexRank====
It is worth noting that TextRank was applied to summarization exactly as described here, while LexRank was used as part of a larger summarization system ([[MEAD]]) that combines the LexRank score (stationary probability) with other features like sentence position and length using a [[linear combination]] with either user-specified or automatically tuned weights. In this case, some training documents might be needed, though the TextRank results show the additional features are not absolutely necessary.
 
Unlike TextRank, LexRank has been applied to multi-document summarization.
 
These methods work based on the idea that sentences "recommend" other similar sentences to the reader. Thus, if one sentence is very similar to many others, it will likely be a sentence of great importance. The importance of this sentence also stems from the importance of the sentences "recommending" it. Thus, to get ranked highly and placed in a summary, a sentence must be similar to many sentences that are in turn also similar to many other sentences. This makes intuitive sense and allows the algorithms to be applied to any arbitrary new text. The methods are ___domain-independent and easily portable. One could imagine the features indicating important sentences in the news ___domain might vary considerably from the biomedical ___domain. However, the unsupervised "recommendation"-based approach applies to any ___domain.
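The "recommendation" idea can be sketched as a PageRank-style iteration over a sentence-similarity graph. The word-overlap similarity used here is a simplification for illustration; TextRank and LexRank each define their own similarity measures:

```python
def rank_sentences(sentences, damping=0.85, iterations=50):
    """Rank sentences by running PageRank on a word-overlap similarity graph."""
    words = [set(s.lower().split()) for s in sentences]
    n = len(sentences)
    # Symmetric similarity: normalized word overlap between sentence pairs.
    sim = [[len(words[i] & words[j]) / (1 + len(words[i] | words[j]))
            if i != j else 0.0 for j in range(n)] for i in range(n)]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new = []
        for i in range(n):
            # Each sentence j "recommends" i in proportion to sim(j, i).
            rank = sum(scores[j] * sim[j][i] / max(1e-9, sum(sim[j]))
                       for j in range(n))
            new.append((1 - damping) / n + damping * rank)
        scores = new
    return scores

sents = ["the cat sat on the mat",
         "the cat is on the mat",
         "quantum computing is unrelated"]
scores = rank_sentences(sents)
# The two similar "cat" sentences reinforce each other and outrank the third.
```

The top-scoring sentences would then be selected, in order, to form the extractive summary.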
 
====Multi-document summarization====
'''Multi-document summarization''' is an automatic procedure aimed at extraction of information from multiple texts written about the same topic. The resulting summary report allows individual users, such as professional information consumers, to quickly familiarize themselves with information contained in a large cluster of documents. In this way, multi-document summarization systems complement [[news aggregators]], performing the next step down the road of coping with [[information overload]]. Multi-document summarization may also be done in response to a question.<ref>"[https://www.academia.edu/2475776/Versatile_question_answering_systems_seeing_in_synthesis Versatile question answering systems: seeing in synthesis]", International Journal of Intelligent Information Database Systems, 5(2), 119-142, 2011.</ref><ref name="Afzal_et_al">Afzal M, Alam F, Malik KM, Malik GM, [https://www.jmir.org/2020/10/e19810/ Clinical Context-Aware Biomedical Text Summarization Using Deep Neural Network: Model Development and Validation], J Med Internet Res 2020;22(10):e19810, DOI: 10.2196/19810, PMID 33095174</ref>
 
Multi-document summarization creates information reports that are both concise and comprehensive. With different opinions being put together and outlined, every topic is described from multiple perspectives within a single document. While the goal of a brief summary is to simplify the information search and cut the time by pointing to the most relevant source documents, a comprehensive multi-document summary should itself contain the required information, hence limiting the need to access original files to cases when refinement is required. Automatic summaries present information extracted from multiple sources algorithmically, without any editorial touch or subjective human intervention, thus making them completely unbiased. {{dubious|date=June 2018}}
 
=====Diversity=====
Multi-document extractive summarization faces a problem of potential redundancy. Ideally, we want to extract sentences that are both "central" (i.e., contain the main ideas) and "diverse" (i.e., they differ from one another). For example, in a set of news articles about some event, each article is likely to have many similar sentences. To address this issue, LexRank applies a heuristic post-processing step that adds sentences in rank order, but discards sentences that are too similar to ones already in the summary. This method is called Cross-Sentence Information Subsumption (CSIS).
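The post-processing step just described can be sketched as a simple greedy filter. The overlap measure and the 0.5 threshold are illustrative choices, not the exact CSIS formulation:

```python
def build_summary(ranked_sentences, max_sentences=3, threshold=0.5):
    """Add sentences in rank order, skipping near-duplicates of chosen ones."""
    def overlap(a, b):
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / max(1, min(len(wa), len(wb)))

    summary = []
    for sent in ranked_sentences:          # assumed already sorted by rank
        if all(overlap(sent, s) < threshold for s in summary):
            summary.append(sent)
        if len(summary) == max_sentences:
            break
    return summary

ranked = ["the cat sat on the mat",
          "the cat is sitting on the mat",   # near-duplicate, discarded
          "dogs are loyal animals"]
result = build_summary(ranked, max_sentences=2)
# result: ["the cat sat on the mat", "dogs are loyal animals"]
```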
 
A related method is Maximal Marginal Relevance (MMR),<ref>Carbonell, Jaime, and Jade Goldstein. "[https://www.cs.cmu.edu/afs/.cs.cmu.edu/Web/People/jgc/publication/MMR_DiversityBased_Reranking_SIGIR_1998.pdf The use of MMR, diversity-based reranking for reordering documents and producing summaries]." Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 1998.</ref> which likewise tries to eliminate redundancy in the results. There is also a general-purpose graph-based ranking algorithm, in the spirit of Page/Lex/TextRank, that handles both "centrality" and "diversity" in a unified mathematical framework based on [[absorbing Markov chain]] random walks (a random walk where certain states end the walk). This algorithm is called GRASSHOPPER.<ref>Zhu, Xiaojin, et al. "[http://www.aclweb.org/anthology/N07-1013 Improving Diversity in Ranking using Absorbing Random Walks]." HLT-NAACL. 2007.</ref> In addition to explicitly promoting diversity during the ranking process, GRASSHOPPER incorporates a prior ranking (based on sentence position in the case of summarization).
 
State-of-the-art results for multi-document summarization, however, are obtained using mixtures of submodular functions. These methods have achieved state-of-the-art results on the Document Summarization Corpora DUC-04 through DUC-07.<ref>Hui Lin, Jeff Bilmes. "[https://arxiv.org/abs/1210.4871 Learning mixtures of submodular shells with application to document summarization]"</ref> Similar results were also achieved with the use of determinantal point processes (which are a special case of submodular functions) for DUC-04.<ref>Alex Kulesza and Ben Taskar, [http://www.nowpublishers.com/article/DownloadSummary/MAL-044 Determinantal point processes for machine learning]. Foundations and Trends in Machine Learning, December 2012.</ref>
 
A new method for multi-lingual multi-document summarization that avoids redundancy generates ideograms to represent the meaning of each sentence in each document, then evaluates similarity "qualitatively" by comparing ideogram shape and position. It does not use word frequency, and does not need training or preprocessing. It uses two user-supplied parameters: equivalence (when are two sentences to be considered equivalent?) and relevance (how long is the desired summary?).
 
===Submodular functions as generic tools for summarization===
The idea of a [[submodular set function]] has recently emerged as a powerful modeling tool for various summarization problems. Submodular functions naturally model notions of ''coverage'', ''information'', ''representation'' and ''diversity''. Moreover, several important [[combinatorial optimization]] problems occur as special instances of submodular optimization. For example, the [[set cover problem]] is a special case of submodular optimization, since the set cover function is submodular. The set cover function attempts to find a subset of objects which ''cover'' a given set of concepts. For example, in document summarization, one would like the summary to cover all important and relevant concepts in the document. This is an instance of set cover. Similarly, the [[Optimal facility ___location|facility ___location problem]] is a special case of submodular optimization; the facility ___location function also naturally models coverage and diversity. Another example of a submodular optimization problem is using a [[determinantal point process]] to model diversity. Similarly, the Maximal-Marginal-Relevance procedure can also be seen as an instance of submodular optimization. All these important models encouraging coverage, diversity and information are submodular. Moreover, submodular functions can be efficiently combined, and the resulting function is still submodular. Hence, one could combine one submodular function which models diversity with another which models coverage, and use human supervision to learn the right model of a submodular function for the problem.
 
While submodular functions are fitting models for summarization problems, they also admit very efficient algorithms for optimization. For example, a simple [[greedy algorithm]] admits a constant-factor guarantee.<ref>Nemhauser, George L., Laurence A. Wolsey, and Marshall L. Fisher. "An analysis of approximations for maximizing submodular set functions—I." Mathematical Programming 14.1 (1978): 265-294.</ref> Moreover, the greedy algorithm is extremely simple to implement and can scale to large datasets, which is very important for summarization problems.
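The greedy algorithm for a submodular coverage function can be sketched as follows: repeatedly pick the sentence with the largest marginal gain in word coverage, subject to a budget. The coverage function |union of words| is monotone submodular, so this greedy procedure enjoys the constant-factor (1 − 1/e) guarantee cited above; the toy "sentences" are made up for illustration:

```python
def greedy_coverage_summary(sentences, budget=2):
    """Greedily maximize the submodular word-coverage function |union of words|."""
    sent_words = [set(s.lower().split()) for s in sentences]
    covered, chosen = set(), []
    remaining = list(range(len(sentences)))
    for _ in range(budget):
        # Pick the sentence with the largest marginal coverage gain.
        best = max(remaining, key=lambda i: len(sent_words[i] - covered))
        if not sent_words[best] - covered:   # no marginal gain left
            break
        chosen.append(best)
        covered |= sent_words[best]
        remaining.remove(best)
    return chosen, covered

sents = ["a b c d", "a b", "c d e f", "e f"]
chosen, covered = greedy_coverage_summary(sents, budget=2)
# chosen: [0, 2] — together they cover all six "words"
```

Note how the second pick is made for its ''marginal'' gain: sentence 1 adds nothing once sentence 0 is chosen, so the greedy step skips it in favor of sentence 2.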
Submodular functions have achieved state-of-the-art performance on almost all summarization problems. For example, work by Lin and Bilmes, 2012,<ref>Hui Lin, Jeff Bilmes. "[https://arxiv.org/abs/1210.4871 Learning mixtures of submodular shells with application to document summarization]", UAI, 2012</ref> shows that submodular functions achieve the best results to date on DUC-04, DUC-05, DUC-06 and DUC-07 for document summarization. Similarly, work by Lin and Bilmes, 2011,<ref>Hui Lin, Jeff Bilmes. "[http://www.aclweb.org/anthology/P11-1052 A Class of Submodular Functions for Document Summarization]", The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT), 2011</ref> shows that many existing systems for automatic summarization are instances of submodular functions. This was a breakthrough result establishing submodular functions as the right models for summarization problems.{{citation needed|date=June 2018}}
 
Submodular functions have also been used for other summarization tasks. Tschiatschek et al., 2014 show<ref>Sebastian Tschiatschek, Rishabh Iyer, Hoachen Wei and Jeff Bilmes, [http://papers.nips.cc/paper/5415-learning-mixtures-of-submodular-functions-for-image-collection-summarization.pdf Learning Mixtures of Submodular Functions for Image Collection Summarization], In Advances of Neural Information Processing Systems (NIPS), Montreal, Canada, December - 2014.</ref> that mixtures of submodular functions achieve state-of-the-art results for image collection summarization. Similarly, Bairi et al., 2015<ref>Ramakrishna Bairi, Rishabh Iyer, Ganesh Ramakrishnan and Jeff Bilmes, [http://www.aclweb.org/anthology/P15-1054 Summarizing Multi-Document Topic Hierarchies using Submodular Mixtures], To Appear In the Annual Meeting of the Association for Computational Linguistics (ACL), Beijing, China, July - 2015</ref> show the utility of submodular functions for summarizing multi-document topic hierarchies. Submodular functions have also been successfully used for summarizing machine learning datasets.<ref>Kai Wei, Rishabh Iyer, and Jeff Bilmes, [http://www.jmlr.org/proceedings/papers/v37/wei15.pdf Submodularity in Data Subset Selection and Active Learning] {{Webarchive|url=https://web.archive.org/web/20170313220928/http://jmlr.org/proceedings/papers/v37/wei15.pdf |date=2017-03-13 }}, To Appear In Proc. International Conference on Machine Learning (ICML), Lille, France, June - 2015</ref>
 
===Applications===
Specific applications of automatic summarization include:
* The [[Reddit]] [[Internet bot|bot]] "autotldr",<ref>{{cite web|title=overview for autotldr|url=https://www.reddit.com/user/autotldr|website=reddit|access-date=9 February 2017|language=en}}</ref> created in 2011, summarizes news articles in the comment section of Reddit posts. It was found to be very useful by the Reddit community, which upvoted its summaries hundreds of thousands of times.<ref>{{cite book|last1=Squire|first1=Megan|author-link = Megan Squire|title=Mastering Data Mining with Python – Find patterns hidden in your data|publisher=Packt Publishing Ltd|isbn=9781785885914|url=https://books.google.com/books?id=_qXWDQAAQBAJ&pg=PA185|access-date=9 February 2017|language=en|date=2016-08-29}}</ref> The name is a reference to [[TL;DR]] − [[Internet slang]] for "too long; didn't read".<ref>{{cite web|title=What Is 'TLDR'?|url=https://www.lifewire.com/what-is-tldr-2483633|website=Lifewire|access-date=9 February 2017}}</ref><ref>{{cite web|title=What Does TL;DR Mean? AMA? TIL? Glossary Of Reddit Terms And Abbreviations|url=http://www.ibtimes.com/what-does-tldr-mean-ama-til-glossary-reddit-terms-abbreviations-431704|work=International Business Times|access-date=9 February 2017|date=29 March 2012}}</ref>
* [[Adversarial stylometry]] may make use of summaries, if the detail lost is not major and the summary is sufficiently stylistically different from the input.{{sfn|Potthast|Hagen|Stein|2016|p=11-12}}
 
==Evaluation techniques==
<!-- IMPORTANT: This section needs to be tied in to the above article so it fits in. Currently, it is not clear what the relation of evaluation is to any of the above topics. The following questions need to be answered: First, in the context of automatic summarization, what is evaluation? Second, what is the significance of evaluation? That is, what is evaluation used for?
-->
The most common way to evaluate the informativeness of automatic summaries is to compare them with human-made model summaries.
 
Evaluation can be intrinsic or extrinsic,<ref>[http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings2/sum-mani.pdf Mani, I. Summarization evaluation: an overview]</ref> and inter-textual or intra-textual.<ref>{{Cite journal | doi=10.3103/S0005105507030041|title = A method for evaluating modern systems of automatic text summarization| journal=Automatic Documentation and Mathematical Linguistics| volume=41| issue=3| pages=93–103|year = 2007|last1 = Yatsko|first1 = V. A.| last2=Vishnyakov| first2=T. N.|s2cid = 7853204}}</ref>
 
=== Intrinsic versus extrinsic evaluation ===
Intrinsic evaluation assesses the summaries directly, while extrinsic evaluation evaluates how the summarization system affects the completion of some other task. Intrinsic evaluations have assessed mainly the coherence and informativeness of summaries. Extrinsic evaluations, on the other hand, have tested the impact of summarization on tasks like relevance assessment, reading comprehension, etc.
 
=== Inter-textual versus intra-textual ===
Intra-textual evaluation assesses the output of a specific summarization system, while inter-textual evaluation focuses on contrastive analysis of the outputs of several summarization systems.
 
Human judgement varies greatly in what it considers a "good" summary, so creating an automatic evaluation process is particularly difficult. Manual evaluation can be used, but it is both time- and labor-intensive, as it requires humans to read not only the summaries but also the source documents. Other issues concern [[coherence (linguistics)|coherence]] and coverage.
 
The most common way to evaluate summaries is [[ROUGE (metric)|ROUGE]] (Recall-Oriented Understudy for Gisting Evaluation). It is very common for summarization and translation systems in [[NIST]]'s Document Understanding Conferences.[https://web.archive.org/web/20060408135021/http://haydn.isi.edu/ROUGE/] ROUGE is a recall-based measure of how well a summary covers the content of human-generated summaries known as references. It calculates [[n-gram]] overlaps between automatically generated summaries and previously written human summaries. It is recall-based to encourage inclusion of all important topics in the summaries. Recall can be computed with respect to unigram, bigram, trigram, or 4-gram matching. For example, ROUGE-1 is the fraction of unigrams that appear in both the reference summary and the automatic summary, out of all unigrams in the reference summary. If there are multiple reference summaries, their scores are averaged. A high level of overlap should indicate a high degree of shared concepts between the two summaries.
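The ROUGE-1 computation can be sketched in a few lines. This is a simplified illustration, not the official ROUGE implementation: it tokenizes by whitespace and omits stemming and other preprocessing that real evaluation toolkits apply.

```python
from collections import Counter

def rouge_1_recall(candidate, reference):
    """ROUGE-1 recall of one candidate summary against one reference summary."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: a reference unigram is matched at most as many
    # times as it occurs in the candidate.
    overlap = sum(min(n, cand_counts[tok]) for tok, n in ref_counts.items())
    total = sum(ref_counts.values())
    return overlap / total if total else 0.0

def rouge_1(candidate, references):
    """Average ROUGE-1 recall over multiple reference summaries."""
    return sum(rouge_1_recall(candidate, r) for r in references) / len(references)
```

For instance, `rouge_1_recall("the cat sat on the mat", "the cat lay on the mat")` matches 5 of the 6 reference unigrams, giving 5/6 ≈ 0.83.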
 
ROUGE cannot determine if the result is coherent, that is, if the sentences flow together sensibly. Higher-order n-gram ROUGE measures help to some degree.
 
Another unsolved problem is [[anaphora (linguistics)|anaphor resolution]]. Similarly, for image summarization, Tschiatschek et al. developed a Visual-ROUGE score which judges the performance of algorithms for image summarization.<ref>Sebastian Tschiatschek, Rishabh Iyer, Hoachen Wei and Jeff Bilmes, [http://papers.nips.cc/paper/5415-learning-mixtures-of-submodular-functions-for-image-collection-summarization.pdf Learning Mixtures of Submodular Functions for Image Collection Summarization], In Advances of Neural Information Processing Systems (NIPS), Montreal, Canada, December - 2014. (PDF)</ref>
===Domain-specific versus ___domain-independent summarization techniques===
Domain-independent summarization techniques generally apply sets of general features which can be used to identify information-rich text segments. Recent research focuses on ___domain-specific summarization techniques using knowledge specific to the text's ___domain, such as medical knowledge and ontologies for summarizing medical texts.<ref>{{Cite book|last1=Sarker|first1=Abeed|last2=Molla|first2=Diego|last3=Paris|first3=Cecile|title=Artificial Intelligence in Medicine |chapter=An Approach for Query-Focused Text Summarisation for Evidence Based Medicine |date=2013|volume=7885|pages=295–304|doi=10.1007/978-3-642-38326-7_41|series=Lecture Notes in Computer Science|isbn=978-3-642-38325-0}}</ref>
 
===Qualitative===
The main drawback of the evaluation systems so far is that we need a reference summary (for some methods, more than one), to compare automatic summaries with models. This is a hard and expensive task. Much effort has to be made to create corpora of texts and their corresponding summaries. Furthermore, some methods require manual annotation of the summaries (e.g. SCU in the Pyramid Method). In any case, what the evaluation methods need as an input is a set of summaries to serve as gold standards and a set of automatic summaries. Moreover, they all perform a quantitative evaluation with regard to different similarity metrics.
 
==History==
The first publication in the area dates back to 1957<ref>Luhn, Hans Peter (1957). "A Statistical Approach to Mechanized Encoding and Searching of Literary Information" (PDF). IBM Journal of Research and Development. 1 (4): 309–317. doi:10.1147/rd.14.0309.</ref> ([[Hans Peter Luhn]]), starting with a statistical technique. Research increased significantly in 2015. [[Term frequency–inverse document frequency]] had been used by 2016. Pattern-based summarization was the most powerful option for multi-document summarization found by 2016. In the following year it was surpassed by [[latent semantic analysis]] (LSA) combined with [[non-negative matrix factorization]] (NMF). Although they did not replace other approaches and are often combined with them, by 2019 machine learning methods dominated the extractive summarization of single documents, which was considered to be nearing maturity. By 2020, the field was still very active, and research was shifting towards abstractive summarization and real-time summarization.<ref>{{Cite journal|date=2020-05-20|title=Review of automatic text summarization techniques & methods|journal=Journal of King Saud University - Computer and Information Sciences|language=en|doi=10.1016/j.jksuci.2020.05.006|issn=1319-1578|last1=Widyassari|first1=Adhika Pramita|last2=Rustad|first2=Supriadi|last3=Shidik|first3=Guruh Fajar|last4=Noersasongko|first4=Edi|last5=Syukur|first5=Abdul|last6=Affandy|first6=Affandy|last7=Setiadi|first7=De Rosal Ignatius Moses|volume=34 |issue=4 |pages=1029–1046 |doi-access=free}}</ref>
 
===Recent approaches===
The recent rise of [[Transformer (machine learning model)|transformer models]], which replace more traditional [[Rnn (software)|RNN]] ([[LSTM]]) architectures, has provided flexibility in mapping text sequences to text sequences of a different type, which is well suited to automatic summarization. This includes models such as T5<ref>{{Cite web |title=Exploring Transfer Learning with T5: the Text-To-Text Transfer Transformer |url=http://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html |access-date=2022-04-03 |website=Google AI Blog |date=24 February 2020 |language=en}}</ref> and Pegasus.<ref>Zhang, J., Zhao, Y., Saleh, M., & Liu, P. (2020, November). Pegasus: Pre-training with extracted gap-sentences for abstractive summarization. In International Conference on Machine Learning (pp. 11328-11339). PMLR.</ref>
 
==See also==
==References==
{{Reflist|2}}
 
== Works cited ==
* {{ cite conference | url = https://ceur-ws.org/Vol-1609/16090716.pdf | title = Author Obfuscation: Attacking the State of the Art in Authorship Verification | last1 = Potthast | first1 = Martin | last2 = Hagen | first2 = Matthias | last3 = Stein | first3 = Benno | conference = Conference and Labs of the Evaluation Forum | year = 2016 }}
 
== Further reading ==
*{{cite book |last=Hercules |first=Dalianis |year=2003 |title=Porting and evaluation of automatic summarization|url=https://www.researchgate.net/publication/277288103}}
*{{cite book |last=Roxana |first=Angheluta |year=2002 |title=The Use of Topic Segmentation for Automatic Summarization|url=https://www.researchgate.net/publication/2553088}}
*{{cite book |last=Anne |first=Buist |year=2004 |title=Automatic Summarization of Meeting Data: A Feasibility Study |url=https://www.cs.ru.nl/~kraaijw/pubs/Biblio/papers/meeting_sum_tno.pdf |access-date=2020-07-19 |archive-date=2021-01-23 |archive-url=https://web.archive.org/web/20210123014007/http://www.cs.ru.nl/~kraaijw/pubs/Biblio/papers/meeting_sum_tno.pdf |url-status=dead }}
*{{cite book |last=Annie |first=Louis |year=2009 |title=Performance Confidence Estimation for Automatic Summarization|url=https://repository.upenn.edu/cgi/viewcontent.cgi?article=1762&context=cis_papers}}
*{{cite book |last=Elena |first=Lloret and Manuel, Palomar |year=2009 |title=Challenging Issues of Automatic Summarization: Relevance Detection and Quality-based Evaluation |url=http://www.informatica.si/ojs-2.4.3/index.php/informatica/article/download/273/269 |access-date=2018-10-03 |archive-date=2018-10-03 |archive-url=https://web.archive.org/web/20181003061926/http://www.informatica.si/ojs-2.4.3/index.php/informatica/article/download/273/269 |url-status=dead }}
*{{cite book |last=Andrew |first=Goldberg |year=2007 |title=Automatic Summarization}}
*{{cite book |last=Alrehamy |first=Hassan |title=Advances in Computational Intelligence Systems |volume=650 |pages=222–235 |doi=10.1007/978-3-319-66939-7_19 |chapter=SemCluster: Unsupervised Automatic Keyphrase Extraction Using Affinity Propagation |series=Advances in Intelligent Systems and Computing |date=2018 |isbn=978-3-319-66938-0 }}
*{{cite book |last=Endres-Niggemeyer |first=Brigitte |year=1998 |title=Summarizing Information |publisher=Springer |url=https://archive.org/details/springer_10.1007-978-3-642-72025-3 |isbn=978-3-540-63735-6}}
*{{cite book |last=Marcu |first=Daniel |year=2000 |title=The Theory and Practice of Discourse Parsing and Summarization |publisher=MIT Press |isbn=978-0-262-13372-2}}
*{{cite book |last=Mani |first=Inderjeet |year=2001 |title=Automatic Summarization |isbn=978-1-58811-060-2}}
*{{cite book |last=Huff |first=Jason |year=2010 |title=AutoSummarize |url=http://www.jason-huff.com/projects/autosummarize/}}, Conceptual artwork using automatic summarization software in Microsoft Word 2008.
*{{cite book |last=Miranda-Jiménez |first=Sabino, Gelbukh, Alexander, and Sidorov, Grigori |year=2013 |doi=10.1007/978-3-642-35786-2_18 |title=Conceptual Structures for STEM Research and Education |volume=7735 |pages=245–253 |series=Lecture Notes in Computer Science |isbn=978-3-642-35785-5 |chapter=Summarizing Conceptual Graphs for Automatic Summarization Task }}, Conceptual Structures for STEM Research and Education.
 
{{Natural Language Processing}}
 
[[Category:Computational linguistics]]