Automatic summarization: Difference between revisions

I added more real-world uses to the Applications section like research, journalism, legal, healthcare, and customer support. Also included DeepSeek Text Summarizer with the link. Kept the Reddit bot and stylometry part, just cleaned up and expanded. Whole thing looks more useful and professional now.
Tags: Manual revert Reverted references removed
Citation bot (talk | contribs)
Removed URL that duplicated identifier. Removed access-date with no URL. | Use this bot. Report bugs. | #UCB_CommandLine
 
(2 intermediate revisions by 2 users not shown)
Line 3:
'''Automatic summarization''' is the process of shortening a set of data computationally, to create a subset (a [[Abstract (summary)|summary]]) that represents the most important or relevant information within the original content. [[Artificial intelligence]] [[algorithm]]s are commonly developed and employed to achieve this, specialized for different types of data.
 
[[Plain text|Text]] summarization is usually implemented by [[natural language processing]] methods, designed to locate the most informative sentences in a given document.<ref name="Torres2014">{{cite book|author1=Torres-Moreno, Juan-Manuel|title=Automatic Text Summarization|url=https://www.wiley.com/en-gb/Automatic+Text+Summarization-p-9781848216686|date=1 October 2014|publisher=Wiley|isbn=978-1-848-21668-6|pages=320–}}</ref> On the other hand, visual content can be summarized using [[computer vision]] algorithms. [[Image]] summarization is the subject of ongoing research; existing approaches typically attempt to display the most representative images from a given image collection, or generate a video that only includes the most important content from the entire collection.<ref>{{Cite journal|last1=Pan|first1=Xingjia|last2=Tang|first2=Fan|last3=Dong|first3=Weiming|last4=Ma|first4=Chongyang|last5=Meng|first5=Yiping|last6=Huang|first6=Feiyue|last7=Lee|first7=Tong-Yee|last8=Xu|first8=Changsheng|date=2021-04-01|title=Content-Based Visual Summarization for Image Collection|journal=IEEE Transactions on Visualization and Computer Graphics|volume=27|issue=4|pages=2298–2312|doi=10.1109/tvcg.2019.2948611|pmid=31647438|s2cid=204865221|issn=1077-2626}}</ref><ref>{{Cite news|date=January 10, 2018|title=WIPO PUBLISHES PATENT OF KT FOR "IMAGE SUMMARIZATION SYSTEM AND METHOD" (SOUTH KOREAN INVENTORS)|work=US Fed News Service|url=https://www.proquest.com/docview/1986931333|access-date=January 22, 2021|id={{ProQuest|1986931333}}}}</ref><ref>{{Cite journal|last1=Li Tan|last2=Yangqiu Song|last3=Shixia Liu|author3-link=Shixia Liu|last4=Lexing Xie|date=February 2012|title=ImageHive: Interactive Content-Aware Image Summarization|journal=IEEE Computer Graphics and Applications|volume=32|issue=1|pages=46–55|doi=10.1109/mcg.2011.89|pmid=24808292|s2cid=7668289|issn=0272-1716}}</ref> Video summarization algorithms identify and extract from the original video content the most important frames 
(''key-frames''), and/or the most important video segments (''key-shots''), normally in a temporally ordered fashion.<ref name="PalPetrosino2012">{{cite book|author1=Sankar K. Pal|author2=Alfredo Petrosino|author3=Lucia Maddalena|title=Handbook on Soft Computing for Video Surveillance|url=https://books.google.com/books?id=O0fNBQAAQBAJ&q=video+surveillance+summarization&pg=PA81|date=25 January 2012|publisher=CRC Press|isbn=978-1-4398-5685-7|pages=81–}}</ref><ref name="Elhamifar2012">{{cite book |last1=Elhamifar |first1=Ehsan |last2=Sapiro |first2=Guillermo |last3=Vidal |first3=Rene |title=2012 IEEE Conference on Computer Vision and Pattern Recognition |chapter=See all by looking at a few: Sparse modeling for finding representative objects |url=https://ieeexplore.ieee.org/document/6247852 |year=2012 |pages=1600–1607 |publisher=IEEE |doi=10.1109/CVPR.2012.6247852 |isbn=978-1-4673-1228-8 |s2cid=5909301 |access-date=4 December 2022}}</ref><ref name="Mademlis2016">{{cite journal |last1=Mademlis |first1=Ioannis |last2=Tefas |first2=Anastasios |last3=Nikolaidis |first3=Nikos |last4=Pitas |first4=Ioannis |title=Multimodal stereoscopic movie summarization conforming to narrative characteristics |url=https://research-information.bris.ac.uk/files/111433536/Ioannis_Pitas_Multimodal_Stereoscopic_Movie_Summarization_Conforming_to_Narrative_Characteristics.pdf |journal=IEEE Transactions on Image Processing |year=2016 |volume=25 |issue=12 |pages=5828–5840 |publisher=IEEE |doi=10.1109/TIP.2016.2615289 |pmid=28113502 |bibcode=2016ITIP...25.5828M |hdl=1983/2bcdd7a5-825f-4ac9-90ec-f2f538bfcb72 |s2cid=18566122 |access-date=4 December 2022}}</ref><ref name="Mademlis2018">{{cite journal |last1=Mademlis |first1=Ioannis |last2=Tefas |first2=Anastasios |last3=Pitas |first3=Ioannis |title=A salient dictionary learning framework for activity video summarization via key-frame extraction |url=https://www.sciencedirect.com/science/article/abs/pii/S0020025517311398 |journal=Information Sciences 
|year=2018 |volume=432 |pages=319–331 |publisher=Elsevier |doi=10.1016/j.ins.2017.12.020 |access-date=4 December 2022|url-access=subscription }}</ref> Video summaries simply retain a carefully selected subset of the original video frames and, therefore, are not identical to the output of [[video synopsis]] algorithms, where ''new'' video frames are being synthesized based on the original video content.
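As a rough illustration of key-frame selection, the following Python sketch keeps a frame whenever it differs enough from the most recently kept frame, and returns the kept indices in temporal order. It is a simple thresholding heuristic, not the method of any work cited above; the names `mean_abs_diff` and `select_key_frames`, the flat-list frame representation, and the default threshold are all assumptions made for this example.

```python
def mean_abs_diff(a, b):
    """Mean absolute per-pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_key_frames(frames, threshold=10.0):
    """Return indices of key-frames, in temporal order.

    frames: list of frames, each a flat list of pixel intensities
            (hypothetical representation chosen for this sketch).
    A frame becomes a key-frame when it differs from the previously
    selected key-frame by more than `threshold` on average.
    """
    if not frames:
        return []
    key_indices = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        last_key = frames[key_indices[-1]]
        if mean_abs_diff(frames[i], last_key) > threshold:
            key_indices.append(i)
    return key_indices


if __name__ == "__main__":
    toy_video = [
        [0, 0, 0, 0],
        [1, 0, 1, 0],          # nearly identical to frame 0
        [50, 50, 50, 50],      # scene change
        [51, 50, 49, 50],      # nearly identical to frame 2
        [200, 200, 200, 200],  # scene change
    ]
    print(select_key_frames(toy_video))
```

Because the output is only a list of indices into the original frames, the result is a subset of the source video, which is the property that distinguishes video summarization from video synopsis.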
 
== Commercial products ==
Line 18:
===Abstractive-based summarization===
 
Abstractive summarization methods generate new text that did not exist in the original text.<ref>{{Cite book |last=Zhai |first=ChengXiang |url=https://www.worldcat.org/oclc/957355971 |title=Text data management and analysis : a practical introduction to information retrieval and text mining |date=2016 |others=Sean Massung |isbn=978-1-970001-19-8 |page=321 |location=[New York, NY] |oclc=957355971}}</ref> This has been applied mainly to text. Abstractive methods build an internal semantic representation of the original content (often called a language model), and then use this representation to create a summary that is closer to what a human might express. Abstraction may transform the extracted content by [[automated paraphrasing|paraphrasing]] sections of the source document, to condense a text more strongly than extraction. Such transformation, however, is computationally much more challenging than extraction, involving both [[natural language processing]] and often a deep understanding of the domain of the original text in cases where the original document relates to a special field of knowledge. "Paraphrasing" is even more difficult to apply to images and videos, which is why most summarization systems are extractive.
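For contrast with abstraction, a minimal extractive baseline can be sketched in a few lines of Python. The sketch scores each sentence by the average document-wide frequency of its words and returns the top-scoring sentences unchanged and in their original order, so no new text is generated. The function names, the tiny stop-word list, and the naive sentence splitter are assumptions made for illustration; this is not the algorithm of any particular system.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real systems use larger lists).
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it",
              "that", "this", "for", "on", "was", "are"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase words with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOP_WORDS]


def extractive_summary(text, k=2):
    """Return the k highest-scoring sentences, in original document order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average word frequency, so long sentences are not favoured by length alone.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    doc = "Cats sleep a lot. Cats hunt mice at night. The weather was mild."
    print(extractive_summary(doc, k=2))
```

Every sentence in the output is copied verbatim from the input. An abstractive system would instead have to generate new wording, which is why it needs a model of the content rather than only a scoring function.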
 
===Aided summarization===
Line 131:
 
===Applications===
{{Expand section|date=February 2017}}
Specific applications of automatic summarization include:
* The [[Reddit]] [[Internet bot|bot]] "autotldr",<ref>{{cite web|title=overview for autotldr|url=https://www.reddit.com/user/autotldr|website=reddit|access-date=9 February 2017|language=en}}</ref> created in 2011, summarizes news articles in the comment section of Reddit posts. The Reddit community found it very useful, upvoting its summaries hundreds of thousands of times.<ref>{{cite book|last1=Squire|first1=Megan|author-link = Megan Squire|title=Mastering Data Mining with Python – Find patterns hidden in your data|publisher=Packt Publishing Ltd|isbn=9781785885914|url=https://books.google.com/books?id=_qXWDQAAQBAJ&pg=PA185|access-date=9 February 2017|language=en|date=2016-08-29}}</ref> The name is a reference to [[TL;DR]] − [[Internet slang]] for "too long; didn't read".<ref>{{cite web|title=What Is 'TLDR'?|url=https://www.lifewire.com/what-is-tldr-2483633|website=Lifewire|access-date=9 February 2017}}</ref><ref>{{cite web|title=What Does TL;DR Mean? AMA? TIL? Glossary Of Reddit Terms And Abbreviations|url=http://www.ibtimes.com/what-does-tldr-mean-ama-til-glossary-reddit-terms-abbreviations-431704|work=International Business Times|access-date=9 February 2017|date=29 March 2012}}</ref>
 
* [[Adversarial stylometry]] may make use of summaries, if the detail lost is not major and the summary is sufficiently stylistically different from the input.{{sfn|Potthast|Hagen|Stein|2016|pp=11–12}}
Reddit Bot – autotldr:
The Reddit bot "autotldr", created in 2011, summarizes news articles in the comment sections of Reddit posts. It gained widespread popularity in the Reddit community, with users upvoting its summaries hundreds of thousands of times. The bot's name is a nod to TL;DR, Internet slang for "too long; didn't read". This application shows how automated summarization tools can provide value by distilling lengthy content into digestible summaries for faster consumption.[31][32][33][34]
 
Adversarial Stylometry:
In the field of adversarial stylometry, automatic summarization may be used to obscure or alter an author's writing style while retaining core content. This technique is particularly useful in privacy-preserving text transformations, where reduced detail and stylistic changes can help anonymize authorship without significant information loss.[35]
 
Academic and Research Assistance:
Researchers and students often use summarization tools to quickly digest large volumes of academic literature, abstracts, and reports. Tools like DeepSeek’s [https://deepseekstools.com/deepseek-text-summarizer/ Text Summarizer] help streamline this process by generating concise summaries from complex documents, improving productivity and comprehension.
 
News Aggregation and Journalism:
News organizations and aggregation platforms use summarization to generate quick digests of breaking news, allowing readers to stay informed without reading full articles. This is especially useful in mobile apps, push notifications, and briefing formats.
 
Customer Service and Email Triage:
Businesses use automatic summarization to process and summarize large volumes of customer support emails or chat logs. This helps support teams prioritize and respond more efficiently.
 
Legal and Compliance Work:
Law firms and compliance departments use summarization to extract key information from lengthy legal documents, contracts, and case studies, saving time in document review processes.
 
Healthcare and Medical Records:
In healthcare, summarization tools are being applied to patient records, clinical trial data, and medical literature to aid doctors and researchers in identifying relevant information quickly.
 
==Evaluation==