Language model: Difference between revisions

mNo edit summary
Line 52:
* Stanford Sentiment [[Treebank]]<ref>{{Cite web|url=https://nlp.stanford.edu/sentiment/treebank.html|title=Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank|website=nlp.stanford.edu|access-date=2019-02-25|archive-date=27 October 2020|archive-url=https://web.archive.org/web/20201027125825/https://nlp.stanford.edu/sentiment/treebank.html|url-status=live}}</ref>
* Winograd NLI
* BoolQ, PIQA, SIQA, HellaSwag, WinoGrande, ARC, OpenBookQA, NaturalQuestions, TriviaQA, RACE, [[MMLU|MMLU (Massive Multitask Language Understanding)]], BIG-bench Hard, GSM8K, RealToxicityPrompts, WinoGender, CrowS-Pairs.<ref>{{Citation| last = Hendrycks| first = Dan| title = Measuring Massive Multitask Language Understanding| access-date = 2023-03-15| date = 2023-03-14| url = https://github.com/hendrycks/test| archive-date = 15 March 2023| archive-url = https://web.archive.org/web/20230315011614/https://github.com/hendrycks/test| url-status = live}}</ref> ([https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md LLaMA Benchmark])
 
== See also ==