Scikit-learn: Difference between revisions

Add "who?" template
Add a "Features" section
Line 46:
|url=http://jmlr.org/papers/v12/pedregosa11a.html
}}</ref>
It features various [[statistical classification|classification]], [[regression analysis|regression]] and [[Cluster analysis|clustering]] [[Algorithm|algorithms]] including [[support vector machine|support-vector machine]]s, [[random forests]], [[gradient boosting]], [[k-means clustering|''k''-means]] and [[DBSCAN]], and is designed to interoperate with the [[Python (programming language)|Python]] numerical and scientific libraries [[NumPy]] and [[SciPy]]. Scikit-learn is a [[NumFOCUS]] fiscally sponsored project.<ref>{{cite web|title=NumFOCUS Sponsored Projects|url=https://numfocus.org/sponsored-projects|publisher=NumFOCUS|access-date=2021-10-25}}</ref>
 
==Overview==
Line 62:
|page=43
}}</ref> In 2019, [[GitHub]] reported that scikit-learn was one of the most popular machine learning libraries on its platform.<ref>{{Cite web|url=https://github.blog/2019-01-24-the-state-of-the-octoverse-machine-learning/|title=The State of the Octoverse: machine learning|date=2019-01-24|website=The GitHub Blog|publisher=[[GitHub]]|language=en-US|access-date=2019-10-17}}</ref>
 
== Features ==
 
* Large catalogue of well-established machine learning algorithms and data pre-processing methods (for tasks such as [[feature engineering]])
* Utility functions for common data-science tasks, such as splitting data into [[Training, validation, and test data sets|train and test sets]], [[Cross-validation (statistics)|cross-validation]] and [[grid search]]
* Consistent interface for fitting models and making predictions ({{code|estimator.fit()|python}} and {{code|estimator.predict()|python}}), which other libraries can implement
* Declarative way of structuring a data science process (the {{code|Pipeline|python}}), including data pre-processing and model fitting
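
The features listed above can be combined roughly as in the following sketch. It assumes a recent scikit-learn release; the dataset, the step names ({{code|"scale"|python}} and {{code|"model"|python}}) and the parameter grid are illustrative choices and are not taken from the cited sources.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Load a small built-in dataset (chosen only for illustration).
X, y = load_iris(return_X_y=True)

# Utility function: split the data into train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Declare pre-processing and model fitting as a single Pipeline.
# The step names "scale" and "model" are arbitrary labels.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", SVC()),
])

# The pipeline exposes the same fit() and predict() methods as any estimator.
pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)

# 5-fold cross-validation on the training set.
cv_scores = cross_val_score(pipeline, X_train, y_train, cv=5)

# Grid search over the C hyperparameter of the "model" step.
search = GridSearchCV(pipeline, param_grid={"model__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

print("Test accuracy:", pipeline.score(X_test, y_test))
print("Mean cross-validation accuracy:", cv_scores.mean())
print("Best parameters:", search.best_params_)
```

Because the scaler and the classifier are wrapped in one {{code|Pipeline|python}}, cross-validation and grid search refit the pre-processing step on each training fold, so no statistics from held-out data are used when scaling.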
 
==Implementation==