Revision as of 18:54, 13 January 2023 by Polyphemus Goode (edit summary: application in adversarial stylometry). Revision as of 09:40, 26 January 2023 by Kku (edit summary: m link precision and recall using Find link).
Given a document, we construct an example for each [[unigram]], [[bigram]], and trigram found in the text (though other text units are also possible, as discussed below). We then compute various features describing each example (e.g., does the phrase begin with an upper-case letter?). We assume there are known keyphrases available for a set of training documents. Using the known keyphrases, we can assign positive or negative labels to the examples. Then we learn a classifier that can discriminate between positive and negative examples as a function of the features. Some classifiers make a [[binary classification]] for a test example, while others assign a probability of being a keyphrase. For instance, in the above text, we might learn a rule that says phrases with initial capital letters are likely to be keyphrases. After training a learner, we can select keyphrases for test documents in the following manner. We apply the same example-generation strategy to the test documents, then run each example through the learner. We can determine the keyphrases from the binary classification decisions or the probabilities returned by the learned model. If probabilities are given, a threshold is used to select the keyphrases. Keyphrase extractors are generally evaluated using [[precision and recall]]. Precision measures how many of the proposed keyphrases are actually correct. Recall measures how many of the true keyphrases the system proposed. The two measures can be combined in an F-score, which is the
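
The pipeline described above can be illustrated with a minimal Python sketch. It is not taken from any particular keyphrase extractor: the function names, the whitespace tokenisation, the two toy features, and the 0.5 default threshold are all assumptions made for illustration, and the classifier itself is left out (its output is represented by a plain dictionary of probabilities). The F-score shown is the common balanced variant, F1, computed as the harmonic mean of precision and recall.

```python
from typing import Dict, List, Set, Tuple


def candidate_ngrams(text: str, max_n: int = 3) -> List[str]:
    """Return every unigram, bigram, and trigram in the text as a string."""
    tokens = text.split()  # naive whitespace tokenisation (an assumption)
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def features(phrase: str) -> Dict[str, float]:
    """Toy features for one candidate; the feature set is illustrative only."""
    return {
        "starts_upper": float(phrase[0].isupper()),
        "num_tokens": float(len(phrase.split())),
    }


def label_examples(candidates: List[str], known: Set[str]) -> List[Tuple[str, bool]]:
    """Label a candidate positive if it is one of the known keyphrases."""
    return [(c, c in known) for c in candidates]


def select_keyphrases(probabilities: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Keep candidates whose predicted probability reaches the threshold."""
    return {p for p, prob in probabilities.items() if prob >= threshold}


def precision_recall_f1(proposed: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall, and balanced F-score (F1) of proposed vs. true keyphrases."""
    correct = len(proposed & gold)
    precision = correct / len(proposed) if proposed else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

In use, the labelled examples and their feature dictionaries would be passed to whichever learner is chosen; at test time the learner's probabilities go through `select_keyphrases`, and the result is compared with the true keyphrases using `precision_recall_f1`.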

Automatic summarization: Difference between revisions