Revision as of 09:57, 19 May 2023 (Justice2022) compared with revision as of 02:18, 5 July 2023 (Headbomb)
====Supervised learning approaches====
Beginning with the work of Turney,<ref>{{Cite journal |arxiv = cs/0212020|last1 = Turney|first1 = Peter D|title = Learning Algorithms for Keyphrase Extraction|journal = Information Retrieval|volume = 2|issue = 4|pages = 303–336|year = 2002|doi = 10.1023/A:1009976227802|bibcode = 2002cs.......12020T|s2cid = 7007323}}</ref> many researchers have approached keyphrase extraction as a [[supervised machine learning]] problem. Given a document, we construct an example for each [[unigram]], [[bigram]], and trigram found in the text (though other text units are also possible, as discussed below). We then compute various features describing each example (e.g., does the phrase begin with an upper-case letter?). We assume that known keyphrases are available for a set of training documents. Using them, we label each example positive if it matches one of the document's known keyphrases and negative otherwise. Next, we learn a classifier that discriminates between positive and negative examples as a function of the features. Some classifiers make a [[binary classification]] for a test example, while others assign it a probability of being a keyphrase. For instance, in the above text, we might learn a rule that says phrases with initial capital letters are likely to be keyphrases.

After training a learner, we can select keyphrases for test documents as follows. We apply the same example-generation strategy to the test documents, then run each example through the learner. The keyphrases are determined from the binary classification decisions or the probabilities returned by the learned model; if probabilities are given, a threshold is used to select the keyphrases.
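The following is a minimal Python sketch of the training and extraction procedure described above. It assumes a naive regular-expression tokenizer and uses logistic regression as the probability-producing classifier. The article does not name a particular learner, and this is not Turney's system. Apart from the initial-capital test, the features (relative position of first occurrence, log frequency, phrase length), all function names (`tokenize`, `candidates`, `features`, `make_examples`, `train`, `extract`), the default threshold of 0.5, and the toy documents are hypothetical choices made only for illustration.

```python
import math
import re

def tokenize(text):
    """Naive word tokenizer (an assumption); n-grams may span sentence boundaries."""
    return re.findall(r"[A-Za-z][A-Za-z'-]*", text)

def candidates(tokens, max_n=3):
    """Map each distinct lowercased n-gram (n = 1..max_n) to
    (surface form at first occurrence, index of first occurrence, count)."""
    found = {}
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            surface = " ".join(tokens[i:i + n])
            key = surface.lower()
            first_surface, first_index, count = found.get(key, (surface, i, 0))
            found[key] = (first_surface, first_index, count + 1)
    return found

def features(surface, first_index, count, num_tokens):
    """Hypothetical feature set; only the capitalization test comes from the article."""
    return [
        1.0,                                   # bias term
        1.0 if surface[0].isupper() else 0.0,  # begins with an upper-case letter?
        first_index / num_tokens,              # relative position of first occurrence
        math.log(1 + count),                   # log of in-document frequency
        len(surface.split()) / 3.0,            # phrase length in words, scaled
    ]

def make_examples(text, keyphrases=(), max_n=3):
    """One (phrase, features, label) triple per candidate; label 1 means the
    phrase matches a known keyphrase (case-insensitively), 0 otherwise."""
    tokens = tokenize(text)
    gold = {k.lower() for k in keyphrases}
    return [
        (key, features(surface, first, count, len(tokens)), int(key in gold))
        for key, (surface, first, count) in candidates(tokens, max_n).items()
    ]

def score(w, x):
    """Linear score w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def sigmoid(z):
    """Numerically stable logistic function: probability of being a keyphrase."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def softplus(z):
    """log(1 + exp(z)) computed without overflow."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def log_loss(w, examples):
    """Average logistic (cross-entropy) loss over labeled examples."""
    return sum(softplus(-score(w, x)) if y else softplus(score(w, x))
               for _, x, y in examples) / len(examples)

def train(examples, epochs=500, lr=0.5):
    """Fit logistic-regression weights by batch gradient descent."""
    dim = len(examples[0][1])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for _, x, y in examples:
            err = sigmoid(score(w, x)) - y  # d(loss)/d(score) for this example
            for j in range(dim):
                grad[j] += err * x[j]
        w = [wj - lr * gj / len(examples) for wj, gj in zip(w, grad)]
    return w

def extract(text, w, threshold=0.5, max_n=3):
    """Candidates whose predicted probability reaches the threshold, best first."""
    scored = [(sigmoid(score(w, x)), key) for key, x, _ in make_examples(text, max_n=max_n)]
    return [key for p, key in sorted(scored, reverse=True) if p >= threshold]

if __name__ == "__main__":
    # Toy training documents with gold keyphrases, invented for illustration.
    training = [
        ("Support Vector Machines find a margin. the margin helps Support Vector Machines.",
         ["Support Vector Machines"]),
        ("Neural Networks learn features. the features help Neural Networks.",
         ["Neural Networks"]),
    ]
    examples = [e for text, gold in training for e in make_examples(text, gold)]
    weights = train(examples)
    doc = "Decision Trees split data. the splits make Decision Trees readable."
    print(extract(doc, weights, threshold=0.0)[:5])  # five top-ranked candidates
```

Running the file trains on two invented documents and prints the five highest-ranked candidates for a third. Raising `threshold` in `extract` selects fewer keyphrases, which is the thresholding step described above. Because every n-gram becomes an example, positive examples are rare, so weights learned from this little data should not be taken as meaningful.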

Automatic summarization: Difference between revisions