Human judges often vary greatly in what they consider a "good" summary, which makes creating an automatic evaluation process particularly difficult. Manual evaluation can be used, but it is both time- and labor-intensive, as it requires humans to read not only the summaries but also the source documents. Other issues concern [[coherence (linguistics)|coherence]] and coverage.
The most common way to evaluate summaries is [[ROUGE (metric)|ROUGE]] (Recall-Oriented Understudy for Gisting Evaluation). It is widely used to evaluate summarization and translation systems in [[NIST]]'s Document Understanding Conferences.[https://web.archive.org/web/20060408135021/http://haydn.isi.edu/ROUGE/]
ROUGE cannot determine whether the result is coherent, that is, whether the sentences flow together sensibly. Higher-order n-gram ROUGE measures help to some degree.
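At its core, ROUGE-N is the recall of n-grams: the number of n-grams shared between the candidate summary and the reference summaries, divided by the total number of n-grams in the references. The following is a minimal sketch of that computation, assuming a single reference summary and simple whitespace tokenization; published ROUGE implementations additionally handle multiple references, stemming, and stopword removal.

<syntaxhighlight lang="python">
from collections import Counter

def ngrams(tokens, n):
    """Return the multiset of n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=1):
    """Recall-oriented n-gram overlap of a candidate summary
    against a single reference summary (the core of ROUGE-N)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    if not ref:
        return 0.0
    # Each reference n-gram is matched at most as many times
    # as it appears in the candidate (clipped counting).
    matched = sum(min(count, cand[gram]) for gram, count in ref.items())
    return matched / sum(ref.values())

# ROUGE-1 and ROUGE-2 for a toy candidate/reference pair.
print(rouge_n("the cat sat on the mat", "the cat was on the mat", n=1))  # 5/6
print(rouge_n("the cat sat on the mat", "the cat was on the mat", n=2))  # 3/5
</syntaxhighlight>

Because the denominator counts reference n-grams, a longer candidate can only raise the recall score, which is one reason evaluations such as the Document Understanding Conferences impose summary length limits.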