Automatic summarization

Here, content is extracted from the original data, but the extracted content is not modified in any way. Examples of extracted content include key phrases that can be used to "tag" or index a text document, key sentences (including headings) that collectively comprise an abstract, and representative images or video segments, as stated above. For text, extraction is analogous to the process of skimming, where the summary (if available), headings and subheadings, figures, the first and last paragraphs of a section, and optionally the first and last sentences in a paragraph are read before one chooses to read the entire document in detail.<ref>Richard Sutz, Peter Weverka. How to skim text. https://www.dummies.com/education/language-arts/speed-reading/how-to-skim-text/ Accessed Dec 2019.</ref> Other examples of extraction include key sequences of text selected for their clinical relevance (including patient/problem, intervention, and outcome).<ref name="Afzal_et_al"/>
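
The selection step of extraction can be illustrated with a minimal Python sketch. It assumes a naive regular-expression sentence splitter, a small hypothetical stopword list, and a simple frequency-based score (the average frequency of a sentence's content words across the document); none of these choices come from the sources cited here, and practical extractive systems use richer features. Whatever the scoring, the chosen sentences are returned verbatim and in their original order, which is what distinguishes extraction from abstraction.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are",
             "that", "this", "it", "for", "on", "with", "as", "be", "by"}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def extractive_summary(text: str, k: int = 2) -> list[str]:
    """Return the k highest-scoring sentences, unchanged and in original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence: str) -> float:
        words = tokenize(sentence)
        # Average (not total) frequency, so long sentences are not favored by length alone.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank by descending score; ties go to the earlier sentence.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))
    chosen = sorted(ranked[:k])
    # Sentences are copied as-is: extraction selects text, it never rewrites it.
    return [sentences[i] for i in chosen]


doc = ("Summarization shortens text. Extractive summarization copies key "
       "sentences from the text. The weather was pleasant. Abstractive "
       "summarization writes new sentences about the text.")
for sentence in extractive_summary(doc, k=2):
    print(sentence)
# → Summarization shortens text.
# → Extractive summarization copies key sentences from the text.
```

Averaging rather than summing word frequencies is a design choice; summing would tend to pick the longest sentences regardless of how representative they are.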
 
===Abstractive-based summarization===
 
Abstractive summarization methods generate new text that did not exist in the original text.<ref>{{Cite book |last=Zhai |first=ChengXiang |url=https://www.worldcat.org/oclc/957355971 |title=Text data management and analysis: a practical introduction to information retrieval and text mining |date=2016 |others=Sean Massung |isbn=978-1-970001-19-8 |page=321 |location=New York, NY |oclc=957355971}}</ref> This approach has been applied mainly to text. Abstractive methods build an internal semantic representation of the original content (often called a language model), and then use this representation to create a summary that is closer to what a human might express. Abstraction may transform the extracted content by [[automated paraphrasing|paraphrasing]] sections of the source document, condensing a text more strongly than extraction can. Such transformation, however, is computationally much more challenging than extraction, involving both [[natural language processing]] and, when the original document relates to a specialized field of knowledge, often a deep understanding of that field.
"Paraphrasing" is even more difficult to apply to images and video, which is why most summarization systems are extractive.
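
Because abstractive output contains wording absent from the source, one rough way to tell the two approaches apart is to measure how many of a summary's n-grams (contiguous word sequences) never appear in the source document. The Python sketch below assumes lowercased word tokens, bigrams by default, and counts distinct n-grams; the function names are hypothetical. It only measures novelty and does not itself generate an abstractive summary, which in practice requires a trained text-generation model.

```python
import re


def word_tokens(text: str) -> list[str]:
    """Lowercased word tokens (letters, digits, underscore)."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """Distinct contiguous n-grams of a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def novel_ngram_ratio(source: str, summary: str, n: int = 2) -> float:
    """Share of the summary's distinct n-grams that never occur in the source.

    0.0 means every n-gram was copied from the source (extractive wording);
    values near 1.0 mean the wording is almost entirely new (abstractive).
    """
    source_grams = ngrams(word_tokens(source), n)
    summary_grams = ngrams(word_tokens(summary), n)
    if not summary_grams:
        return 0.0  # summary shorter than n tokens: nothing to compare
    return len(summary_grams - source_grams) / len(summary_grams)


source = ("The committee met on Monday. The committee approved the new "
          "budget after a long debate.")
extractive = "The committee approved the new budget after a long debate."
abstractive = "After lengthy discussion, the budget was passed."

print(novel_ngram_ratio(source, extractive))   # → 0.0
print(novel_ngram_ratio(source, abstractive))  # → 1.0
```

The extractive summary scores 0.0 because each of its bigrams was copied from the source, while the paraphrased summary scores 1.0 because none of its bigrams occur there.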