m Disambiguating links to Facility location problem (link changed to Optimal facility location) using DisamAssist.
Citation bot: Removed URL that duplicated identifier. Removed access-date with no URL. | #UCB_CommandLine
(12 intermediate revisions by 10 users not shown)
Line 1:
{{Short description|Computer-based method for summarizing a text}}
{{More citations needed|date=April 2022}}
'''Automatic summarization''' is the process of shortening a set of data computationally, to create a subset (a [[Abstract (summary)|summary]]) that represents the most important or relevant information within the original content. [[Artificial intelligence]] [[algorithm
[[Plain text|Text]] summarization is usually implemented by [[natural language processing]] methods, designed to locate the most informative sentences in a given document.<ref name="Torres2014">{{cite book|author1=Torres-Moreno, Juan-Manuel|title=Automatic Text Summarization|url=https://www.wiley.com/en-gb/Automatic+Text+Summarization-p-9781848216686|date=1 October 2014|publisher=Wiley|isbn=978-1-848-21668-6|pages=320–}}</ref> On the other hand, visual content can be summarized using [[computer vision]] algorithms. [[Image]] summarization is the subject of ongoing research; existing approaches typically attempt to display the most representative images from a given image collection, or generate a video that only includes the most important content from the entire collection.<ref>{{Cite journal|last1=Pan|first1=Xingjia|last2=Tang|first2=Fan|last3=Dong|first3=Weiming|last4=Ma|first4=Chongyang|last5=Meng|first5=Yiping|last6=Huang|first6=Feiyue|last7=Lee|first7=Tong-Yee|last8=Xu|first8=Changsheng|date=2021-04-01|title=Content-Based Visual Summarization for Image Collection|journal=IEEE Transactions on Visualization and Computer Graphics|volume=27|issue=4|pages=2298–2312|doi=10.1109/tvcg.2019.2948611|pmid=31647438|s2cid=204865221|issn=1077-2626}}</ref><ref>{{Cite news|date=January 10, 2018|title=WIPO PUBLISHES PATENT OF KT FOR "IMAGE SUMMARIZATION SYSTEM AND METHOD" (SOUTH KOREAN INVENTORS)|work=US Fed News Service|url=https://www.proquest.com/docview/1986931333|access-date=January 22, 2021|id={{ProQuest|1986931333}}}}</ref><ref>{{Cite journal|last1=Li Tan|last2=Yangqiu Song|last3=Shixia Liu|author3-link=Shixia Liu|last4=Lexing Xie|date=February 2012|title=ImageHive: Interactive Content-Aware Image Summarization|journal=IEEE Computer Graphics and Applications|volume=32|issue=1|pages=46–55|doi=10.1109/mcg.2011.89|pmid=24808292|s2cid=7668289|issn=0272-1716}}</ref> Video summarization algorithms identify and extract from the original video content the most important frames 
(''key-frames''), and/or the most important video segments (''key-shots''), normally in a temporally ordered fashion.<ref name="PalPetrosino2012">{{cite book|author1=Sankar K. Pal|author2=Alfredo Petrosino|author3=Lucia Maddalena|title=Handbook on Soft Computing for Video Surveillance|url=https://books.google.com/books?id=O0fNBQAAQBAJ&q=video+surveillance+summarization&pg=PA81|date=25 January 2012|publisher=CRC Press|isbn=978-1-4398-5685-7|pages=81–}}</ref><ref name="Elhamifar2012">{{cite book |last1=Elhamifar |first1=Ehsan |last2=Sapiro |first2=Guillermo |last3=Vidal |first3=Rene |title=2012 IEEE Conference on Computer Vision and Pattern Recognition |chapter=See all by looking at a few: Sparse modeling for finding representative objects
== Commercial products ==
Line 18:
===Abstractive-based summarization===
Abstractive summarization methods generate new text that did not exist in the original text.<ref>{{Cite book |last=Zhai |first=ChengXiang
===Aided summarization===
Line 57:
Designing a supervised keyphrase extraction system involves several choices (some of which also apply to unsupervised systems). The first choice is how to generate examples. Turney and others have used all possible unigrams, bigrams, and trigrams without intervening punctuation and after removing stopwords. Hulth showed that some improvement can be obtained by selecting examples to be sequences of tokens that match certain patterns of part-of-speech tags. Ideally, the mechanism for generating examples produces all the known labeled keyphrases as candidates, though this is often not the case. For example, if only unigrams, bigrams, and trigrams are used, a known keyphrase containing four words can never be extracted, so recall may suffer. However, generating too many examples can also lead to low precision.
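The candidate-generation step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Turney's or Hulth's actual implementation: the stopword list is a tiny placeholder, stopwords are simply dropped from each punctuation-delimited segment before n-grams are formed (one possible reading of "after removing stopwords"), and the function names <code>candidate_phrases</code> and <code>candidate_recall</code> are hypothetical.

```python
import re

# Tiny illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "to", "is", "are", "for", "with"}


def candidate_phrases(text, max_n=3, stopwords=STOPWORDS):
    """Return the set of 1..max_n-gram candidates that do not cross punctuation."""
    candidates = set()
    # Split on punctuation so that no n-gram spans a punctuation mark.
    for segment in re.split(r"[^\w\s-]+", text.lower()):
        # Drop stopwords, then form n-grams over the remaining tokens.
        tokens = [t for t in segment.split() if t not in stopwords]
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                candidates.add(" ".join(tokens[i:i + n]))
    return candidates


def candidate_recall(candidates, gold_keyphrases):
    """Fraction of known (labeled) keyphrases that appear among the candidates."""
    gold = {k.lower() for k in gold_keyphrases}
    return len(gold & candidates) / len(gold) if gold else 0.0
```

With <code>max_n=3</code>, a labeled four-word keyphrase can never appear among the candidates, so <code>candidate_recall</code> is capped below 1.0 for any document that has one; raising <code>max_n</code> recovers it at the cost of many more candidates.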
We also need to create features that describe the examples and are informative enough to allow a learning algorithm to discriminate keyphrases from non-keyphrases. Typically, features involve various term frequencies (how many times a phrase appears in the current text or in a larger corpus), the length of the example, relative position of the first occurrence, various
In the end, the system will need to return a list of keyphrases for a test document, so there must be a way to limit the number. Ensemble methods (i.e., using votes from several classifiers) have been used to produce numeric scores that can be thresholded to return a user-specified number of keyphrases. This is the technique used by Turney with C4.5 decision trees. Hulth used a single binary classifier, so the learning algorithm implicitly determines the appropriate number.
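The idea of turning classifier votes into a score and then cutting the list at a user-specified length can be sketched as follows. This is only an illustration of vote-based scoring: it does not reproduce Turney's C4.5 setup, the "classifiers" are assumed to be arbitrary callables returning <code>True</code> or <code>False</code>, and the function names are hypothetical.

```python
def vote_scores(candidates, classifiers):
    """Score each candidate by the fraction of classifiers voting 'keyphrase'."""
    return {
        c: sum(1 for clf in classifiers if clf(c)) / len(classifiers)
        for c in candidates
    }


def top_k_keyphrases(scores, k):
    """Return the k highest-scoring candidates (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [phrase for phrase, _ in ranked[:k]]


# Hypothetical stand-ins for trained classifiers, used only to make the sketch runnable.
toy_classifiers = [
    lambda phrase: len(phrase.split()) >= 2,   # prefers multi-word phrases
    lambda phrase: "summarization" in phrase,  # prefers an on-topic term
    lambda phrase: len(phrase) > 10,           # prefers longer phrases
]
```

A single binary classifier, by contrast, would return every candidate it labels positive, so the length of the list is decided by the model rather than by the user.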
Line 164:
==History==
The first publication in the area dates back to 1957<ref>
===Recent approaches===
Line 183:
*{{cite book |last=Dalianis |first=Hercules |year=2003 |title=Porting and evaluation of automatic summarization|url=https://www.researchgate.net/publication/277288103}}
*{{cite book |last=Angheluta |first=Roxana |year=2002 |title=The Use of Topic Segmentation for Automatic Summarization|url=https://www.researchgate.net/publication/2553088}}
*{{cite book |last=Buist |first=Anne |year=2004 |title=Automatic Summarization of Meeting Data: A Feasibility Study |url=https://www.cs.ru.nl/~kraaijw/pubs/Biblio/papers/meeting_sum_tno.pdf |access-date=2020-07-19 |archive-date=2021-01-23 |archive-url=https://web.archive.org/web/20210123014007/http://www.cs.ru.nl/~kraaijw/pubs/Biblio/papers/meeting_sum_tno.pdf |url-status=dead }}
*{{cite book |last=Louis |first=Annie |year=2009 |title=Performance Confidence Estimation for Automatic Summarization|url=https://repository.upenn.edu/cgi/viewcontent.cgi?article=1762&context=cis_papers}}
*{{cite book |last1=Lloret |first1=Elena |last2=Palomar |first2=Manuel |year=2009 |title=Challenging Issues of Automatic Summarization: Relevance Detection and Quality-based Evaluation |url=http://www.informatica.si/ojs-2.4.3/index.php/informatica/article/download/273/269 |access-date=2018-10-03 |archive-date=2018-10-03 |archive-url=https://web.archive.org/web/20181003061926/http://www.informatica.si/ojs-2.4.3/index.php/informatica/article/download/273/269 |url-status=dead }}