Talk:Training, validation, and test data sets

Text and/or other creative content from Training_set was copied or moved into Test_set with this edit. The former page's history now serves to provide attribution for that content in the latter page, and it must not be deleted as long as the latter page exists.

Statistics Start‑class Mid‑importance

	This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Start	This article has been rated as Start-class on Wikipedia's content assessment scale.
Mid	This article has been rated as Mid-importance on the importance scale.

Robotics Start‑class Low‑importance

	This article is within the scope of WikiProject Robotics, a collaborative effort to improve the coverage of Robotics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Start	This article has been rated as Start-class on Wikipedia's content assessment scale.
Low	This article has been rated as Low-importance on the project's importance scale.

Merge

There is absolutely no value in having two separate articles, Training set and Test set, when neither can be discussed alone. The concept is Training and test sets, with references to information science, statistics, data mining, biostatistics, etc. Currently the two articles are near duplicates (or could be, based on the available information). Can we imagine any information for either that is not relevant to the other? Sda030 (talk) 22:53, 27 February 2014 (UTC)

I agree they should be merged. Both articles say as much in their introductions. Prax54 (talk) 04:03, 10 January 2015 (UTC)

Merger done; some rewrites needed. Prax54 (talk) 15:55, 20 June 2015 (UTC)

Totally agree with the suggestion: training set, testing set, and validation set are all parts of one whole and should be presented in one topic. (MM-Professor of QM & MIS, WWU-USA)

synonym "discovery set"

A training set is also called a discovery set, right? (See for example <DOI: 10.1056/NEJMoa1406498>.) Perhaps a redirect should be created so that looking up "discovery set" leads here. At the moment, searching for "discovery set" just returns a bunch of mostly irrelevant results. 73.53.61.168 (talk) 11:17, 13 December 2015 (UTC)

"Gold standard"

I have seen the term "gold standard" being used in a few places in connection with articles about machine learning. The page Gold standard (disambiguation) says that in statistics and machine learning, a gold standard is "a manually annotated training set or test set". What does it mean for the test set to be manually annotated? And is "gold standard" perhaps a term important enough to be mentioned in this article? —Kri (talk) 16:00, 19 January 2016 (UTC)

Remove GNG template

Lots of mentions in ML literature. Wqwt (talk) 20:52, 22 March 2018 (UTC)

Claim that the meaning of test and validation is flipped in practice

It's not clear in _whose_ practice these terms are flipped. In lots of posts by recognized practitioners (e.g. [1]) they're not flipped. — Preceding unsigned comment added by FabianMontescu (talk • contribs) 19:35, 21 September 2018 (UTC)

[1] https://www.datarobot.com/wiki/training-validation-holdout/
