{{
{{short description|Collection of data}}
[[File:Iris dataset scatterplot.svg|thumb|right|Various plots of the multivariate data set [[Iris flower data set|''Iris'' flower data set]] introduced by [[Ronald Fisher]] (1936).<ref name="fisher36"/>]]
A '''data set''' (or '''dataset''') is a collection of [[data]]. In the case of [[tabular data]], a data set corresponds to one or more [[table (database)|database tables]], where every [[column (database)|column]] of a table represents a particular [[Variable (computer science)|variable]], and each [[row (database)|row]] corresponds to a given [[Record (computer science)|record]] of the data set in question. The data set lists values for each of the variables, such as the height and weight of an object, for each member of the data set. Data sets can also consist of a collection of documents or files.<ref name="Editorial">{{cite journal | last1 = Snijders | first1 = C. | last2 = Matzat | first2 = U. | last3 = Reips | first3 = U.-D. | year = 2012 | title = 'Big Data': Big gaps of knowledge in the field of Internet | url = http://www.ijis.net/ijis7_1/ijis7_1_editorial.html | journal = International Journal of Internet Science | volume = 7 | pages = 1–5 | access-date = 2017-02-10 | archive-date = 2019-11-23 | archive-url = https://web.archive.org/web/20191123051001/http://www.ijis.net/ijis7_1/ijis7_1_editorial.html | url-status = dead }}</ref>
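The row-and-column structure described above can be sketched in [[Python (programming language)|Python]] using only built-in types. The variable names (<code>height_cm</code>, <code>weight_kg</code>), the helper <code>column</code>, and all values are hypothetical and chosen purely for illustration; this is one possible representation, not a standard one.
<syntaxhighlight lang="python">
# Illustrative values only; they are not taken from any real data set.
records = [
    {"height_cm": 170.0, "weight_kg": 65.0},
    {"height_cm": 182.0, "weight_kg": 80.0},
    {"height_cm": 158.0, "weight_kg": 52.0},
]

# Each dictionary is one row (record); each key is one column (variable).
variables = list(records[0].keys())


def column(name):
    """Return all values of one variable, in row order."""
    return [row[name] for row in records]


# A per-variable summary is computed over a column, not over a row.
heights = column("height_cm")
mean_height = sum(heights) / len(heights)
</syntaxhighlight>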
In the [[open data]] discipline,
==Properties==
==Example==
Loading a dataset in [[Python (programming language)|Python]] with the Hugging Face <code>datasets</code> library:
<syntaxhighlight lang="bash">
$ pip install datasets
</syntaxhighlight>
<syntaxhighlight lang="python">
from datasets import load_dataset
# Replace the placeholder string with the name of the data set to load.
dataset = load_dataset("NAME_OF_DATASET")
</syntaxhighlight>
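As a complementary sketch, a small tabular data set stored as [[comma-separated values]] can be read with Python's standard library alone. The CSV text, its column names, and its values below are hypothetical stand-ins for a real file.
<syntaxhighlight lang="python">
import csv
import io

# Hypothetical CSV content standing in for a file on disk.
csv_text = "height_cm,weight_kg\n170,65\n182,80\n"

# DictReader takes the header row as the column (variable) names
# and yields one dictionary per data row (record).
reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

print(reader.fieldnames)
print(len(rows))
</syntaxhighlight>
Values read this way are strings, so numeric columns need an explicit conversion (for example with <code>float</code>) before any arithmetic.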
* [https://www.data.gov/ Data.gov] – the U.S. Government's open data portal
* [https://data.humdata.org/ Humanitarian Data Exchange (HDX)] – an open humanitarian [[data sharing]] platform managed by the [[United Nations Office for the Coordination of Humanitarian Affairs]].
* [https://opendata.cityofnewyork.us/ NYC Open Data] – free public data published by New York City agencies and other partners.
* [https://relational.
* [https://web.archive.org/web/20190214051201/http://www.researchpipeline.com/mediawiki/index.php?title=Main_Page Research Pipeline] – a wiki/website with links to data sets on many different topics
* [http://lib.stat.cmu.edu/jasadata/ StatLib–JASA Data Archive]