Programming with Big Data in R: Difference between revisions

m top: Typo fixing, replaced: servers as → serves as
Rescuing 4 sources and tagging 1 as dead. #IABot (v2.0)
Line 19:
| website = {{URL|www.r-pbd.org}}
}}
'''Programming with Big Data in R''' (pbdR)<ref>{{cite web|author=Ostrouchov, G., Chen, W.-C., Schmidt, D., Patel, P.|title=Programming with Big Data in R|year=2012|url=http://r-pbd.org}}</ref> is a series of [[R (programming language)|R]] packages and an environment for [[statistical computing]] with [[big data]] using high-performance statistical computation.<ref>{{cite web|author1=Chen, W.-C. |author2=Ostrouchov, G. |lastauthoramp=yes |url=http://thirteen-01.stat.iastate.edu/snoweye/hpsc/|year=2011|title=HPSC -- High Performance Statistical Computing for Data Intensive Research|access-date=2013-06-25|archive-url=https://web.archive.org/web/20130719020318/http://thirteen-01.stat.iastate.edu/snoweye/hpsc/|archive-date=2013-07-19|url-status=dead}}</ref> pbdR uses the same programming language as R, with [[S (programming language)|S3/S4]] classes and methods, which is used among [[statistician]]s and [[Data mining|data miners]] for developing [[statistical software]]. The significant difference between pbdR and R code is that pbdR mainly focuses on [[distributed memory]] systems, where data are distributed across several processors and analyzed in [[Batch processing|batch mode]], while communication between processors is based on [[Message Passing Interface|MPI]], which is easily used on large [[High-performance computing|high-performance computing (HPC)]] systems. The R system mainly focuses{{Citation needed|date=July 2013}} on single [[Multi-core processor|multi-core]] machines for data analysis via an interactive mode such as a [[Graphical user interface|GUI]].
 
Two main implementations in [[R (programming language)|R]] using [[Message Passing Interface|MPI]] are Rmpi<ref name=rmpi>{{cite journal|author=Yu, H.|title=Rmpi: Parallel Statistical Computing in R|year=2002|url=https://cran.r-project.org/package=Rmpi|journal=R News}}</ref> and pbdMPI of pbdR.
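
The batch-mode, distributed-memory style described above can be illustrated with a minimal pbdMPI sketch. It is not taken from the pbdR documentation: the file name <code>sum_spmd.r</code>, the workload (summing the integers 1 to <code>n</code>) and the helper <code>block_bounds</code> are hypothetical and chosen only for illustration. It assumes the pbdMPI package and an MPI runtime are installed. Every process runs the same script, works on its own block of the data, and the partial results are combined with <code>allreduce</code>.

```r
# Hypothetical file name: sum_spmd.r. Launch in batch mode, for example:
#   mpiexec -np 4 Rscript sum_spmd.r
suppressMessages(library(pbdMPI))

init()                       # start the MPI communicator

rank <- comm.rank()          # id of this process (0-based)
size <- comm.size()          # total number of processes

# Illustrative workload: sum the integers 1..n, with each process
# owning one contiguous block of the index range.
n <- 1000L

# Hypothetical helper: first and last index of the block owned by `rank`
# when 1..n is split into `size` nearly equal contiguous blocks.
block_bounds <- function(n, rank, size) {
  cuts <- floor(seq(0, n, length.out = size + 1L))
  c(first = cuts[rank + 1L] + 1, last = cuts[rank + 2L])
}

b <- block_bounds(n, rank, size)
local_x <- seq.int(b[["first"]], length.out = b[["last"]] - b[["first"]] + 1)
local_sum <- sum(local_x)    # computed on local data only

# Combine the partial sums; every process receives the global total.
total <- allreduce(local_sum, op = "sum")

comm.cat("rank", rank, "local sum", local_sum, "\n", all.rank = TRUE)
comm.print(total)            # printed by rank 0 only, by default

finalize()                   # shut down MPI cleanly
```

Regardless of the number of processes, <code>total</code> should equal <code>n * (n + 1) / 2</code>, since the blocks together cover 1 to <code>n</code> exactly once. No process ever holds the full vector, which is the point of the distributed-memory approach.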
Line 142:
 
== Further reading ==
* {{cite techreport|author=Raim, A.M.|year=2013|title=Introduction to distributed computing with pbdR at the UMBC High Performance Computing Facility|institution=UMBC High Performance Computing Facility, University of Maryland, Baltimore County|number=HPCF-2013-2|url=http://userpages.umbc.edu/~gobbert/papers/pbdRtara2013.pdf|accessdate=2013-06-26|archiveurl=https://web.archive.org/web/20140204051402/http://userpages.umbc.edu/~gobbert/papers/pbdRtara2013.pdf|archivedate=2014-02-04|url-status=dead}}
* {{cite techreport|author=Bachmann, M.G., Dyas, A.D., Kilmer, S.C. and Sass, J.|year=2013|title=Block Cyclic Distribution of Data in pbdR and its Effects on Computational Efficiency|institution=UMBC High Performance Computing Facility, University of Maryland, Baltimore County|number=HPCF-2013-11|url=http://userpages.umbc.edu/~gobbert/papers/REU2013Team1.pdf|accessdate=2014-02-01|archiveurl=https://web.archive.org/web/20140204051351/http://userpages.umbc.edu/~gobbert/papers/REU2013Team1.pdf|archivedate=2014-02-04|url-status=dead}}
* {{cite techreport|author=Bailey, W.J., Chambless, C.A., Cho, B.M. and Smith, J.D.|year=2013|title=Identifying Nonlinear Correlations in High Dimensional Data with Application to Protein Molecular Dynamics Simulations|institution=UMBC High Performance Computing Facility, University of Maryland, Baltimore County|number=HPCF-2013-12|url=http://userpages.umbc.edu/~gobbert/papers/REU2013Team2.pdf|accessdate=2014-02-01|archiveurl=https://web.archive.org/web/20140204055902/http://userpages.umbc.edu/~gobbert/papers/REU2013Team2.pdf|archivedate=2014-02-04|url-status=dead}}
* {{cite web|title=High-Performance and Parallel Computing with R|author=Dirk Eddelbuettel|url=https://cran.r-project.org/web/views/HighPerformanceComputing.html|author-link=Dirk Eddelbuettel}}
* {{cite news|title=R at 12,000 Cores|url=http://www.r-bloggers.com/r-at-12000-cores/}}<br />This article, posted on October 16, 2012, was read 22,584 times in 2012 and ranked number 3 among the most-read R-bloggers posts of that year.<ref>{{cite news|url=http://www.r-bloggers.com/100-most-read-r-posts-for-2012-stats-from-r-bloggers-big-data-visualization-data-manipulation-and-other-languages/|title=100 most read R posts in 2012 (stats from R-bloggers) – big data, visualization, data manipulation, and other languages}}</ref>
* {{cite web|url=http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2013:mpiprofiler|title=Profiling Tools for Parallel Computing with R|author=Google Summer of Code - R 2013}}{{Dead link|date=May 2020 |bot=InternetArchiveBot |fix-attempted=yes }}
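* {{cite web|url=http://rpubs.com/wush978/pbdMPI-linux-pilot|title=在雲端運算環境使用R和MPI|trans-title=Using R and MPI in a cloud computing environment|language=zh|author=Wush Wu|year=2014}}
* {{cite web|url=https://www.youtube.com/watch?v=m1vtPESsFqM|title=快速在AWS建立R和pbdMPI的使用環境|trans-title=Quickly setting up an R and pbdMPI environment on AWS|language=zh|author=Wush Wu|year=2013}}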