{{multiple issues|
<!-- Please do not remove or change this AfD message until the issue is settled -->
{{Article for deletion/dated|page=Programming with Big Data in R|timestamp=20130627212758|year=2013|month=June|day=27|substed=yes|help=off}}
<!-- For administrator use only: {{Old AfD multi|page=Programming with Big Data in R|date=27 June 2013|result='''keep'''}} -->
<!-- End of AfD message, feel free to edit beyond this point -->
{{notability|date=June 2013}}
{{COI|date=June 2013}}
}}
 
{{Infobox programming language
| name = pbdR
| logo = [[File:Pbdr.png|200px]]
| paradigm = [[SPMD]] and [[MPMD]]
| released = {{Start date and age|2012|09}}
| designer = [http://thirteen-01.stat.iastate.edu/snoweye/mypage/ Wei-Chen Chen], [http://www.csm.ornl.gov/~ost George Ostrouchov], Pragneshkumar Patel, and [http://wrathematics.github.io/ Drew Schmidt]
| developer = pbdR Core Team
| latest_test_version = Through [[GitHub]] at [https://github.com/RBigData/ RBigData]
| typing = [[dynamic typing|Dynamic]]
| influenced_by = [[R (programming language)|R]], [[C (programming language)|C]], [[Fortran (programming language)|Fortran]], [[Message Passing Interface|MPI]], and [[ZeroMQ|ØMQ]]
| operating_system = [[Cross-platform]]
| license = [[General Public License]] and [[Mozilla Public License]]
| website = {{URL|www.r-pbd.org}}
}}
'''Programming with Big Data in R''' (pbdR)<ref>{{cite web|author=Ostrouchov, G., Chen, W.-C., Schmidt, D., Patel, P.|title=Programming with Big Data in R|year=2012|url=http://r-pbd.org/}}</ref><ref>{{cite web|title=XSEDE|url=https://portal.xsede.org/knowledge-base/-/kb/document/bcrw}}</ref><ref name=pbdDEMO/> is a series of [[R (programming language)|R]] packages and an environment for [[statistical computing]] with [[big data]] using high-performance statistical computation.<ref>{{cite web|author1=Chen, W.-C.|author2=Ostrouchov, G.|name-list-style=amp|url=http://thirteen-01.stat.iastate.edu/snoweye/hpsc/|year=2011|title=HPSC -- High Performance Statistical Computing for Data Intensive Research|access-date=2013-06-25|archive-url=https://web.archive.org/web/20130719020318/http://thirteen-01.stat.iastate.edu/snoweye/hpsc/|archive-date=2013-07-19|url-status=dead}}</ref> pbdR uses the same programming language as R<ref name=R>{{cite book|author=R Core Team|title=R: A Language and Environment for Statistical Computing|year=2012|isbn=3-900051-07-0|url=http://www.r-project.org/}}</ref> with [[S (programming language)|S3/S4]] classes and methods, which is used among [[statistician]]s and [[Data mining|data miners]] for developing [[statistical software]]. The significant difference between pbdR and R code is that pbdR mainly focuses on [[distributed memory]] systems, where data are distributed across several processors and analyzed in [[Batch processing|batch mode]], while communication between processors is based on [[Message Passing Interface|MPI]], which is easily used in large [[High-performance computing|high-performance computing (HPC)]] systems.
The [[R (programming language)|R]] system, by contrast, mainly focuses{{Citation needed|date=July 2013}} on single [[Multi-core processor|multi-core]] machines for data analysis via an interactive mode such as a [[Graphical user interface|GUI]].<ref>Martinez, W. L. (2011), Graphical user interfaces. WIREs Comp Stat, 3: 119–133. doi: 10.1002/wics.150</ref>
 
Two main implementations in [[R (programming language)|R]] using [[Message Passing Interface|MPI]] are [http://cran.r-project.org/package=Rmpi Rmpi]<ref name=rmpi/> and [http://cran.r-project.org/package=pbdMPI pbdMPI] of pbdR.
* The pbdR built on [http://cran.r-project.org/package=pbdMPI pbdMPI] uses [[SPMD|SPMD parallelism]], where every processor is considered a worker and owns part of the data. [[SPMD|SPMD parallelism]],<ref name=spmd/><ref name=spmd_ostrouchov/> introduced in the mid-1980s, is particularly efficient in homogeneous computing environments for large data, for example, performing [[Singular value decomposition|singular value decomposition]] on a large matrix, or performing [[Mixture model|clustering analysis]] on high-dimensional large data. On the other hand, there is no restriction against using [[Master/slave (technology)|manager/workers parallelism]] in an SPMD environment.
* [http://cran.r-project.org/package=Rmpi Rmpi]<ref name=rmpi/> uses [[Master/slave (technology)|manager/workers parallelism]], where one main processor (manager) serves as the controller of all other processors (workers). Manager/workers parallelism,<ref>[http://userpages.uni-koblenz.de/~laemmel/MapReduce/paper.pdf "Google's MapReduce Programming Model -- Revisited"] — paper by Ralf Lämmel; from [[Microsoft]]</ref> introduced around the early 2000s, is particularly efficient for large tasks in small [[Computer cluster|clusters]], for example, the [[Bootstrapping (statistics)|bootstrap method]] and [[Monte Carlo method|Monte Carlo simulation]] in applied statistics, since the [[Independent and identically distributed random variables|i.i.d.]] assumption is commonly used in most [[Statistics|statistical analyses]]. In particular, [http://math.acadiau.ca/ACMMaC/Rmpi/structure.html task pull] parallelism performs better for Rmpi in heterogeneous computing environments.
The idea of [[SPMD|SPMD parallelism]] is to let every processor do the same amount of work, but on different parts of a large data set. For example, a modern [[Graphics processing unit|GPU]] is a large collection of slower co-processors that simply apply the same computation on different parts of relatively smaller data, yet this ends up being an efficient way to obtain final solutions (i.e., the time to solution is shorter).<ref>{{cite web | url = http://graphics.stanford.edu/~mhouston/ | title = Folding@Home - GPGPU | author = Mike Houston | access-date = 2007-10-04 }}</ref>

Clearly, pbdR is not only suitable for small [[Computer cluster|clusters]], but is also more stable for analyzing [[big data]] and more scalable for [[Supercomputer|supercomputers]].<ref>{{cite journal|author=Schmidt, D., Ostrouchov, G., Chen, W.-C., and Patel, P.|title=Tight Coupling of R and Distributed Linear Algebra for High-Level Programming with Big Data|year=2012|pages=811–815|journal=High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion|url=http://dl.acm.org/citation.cfm?id=2477156}}</ref>
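The SPMD style can be sketched with pbdMPI as follows. This is an illustrative sketch only (the script name and data size are hypothetical): every MPI rank runs the same script on its own slice of the data, and partial results are combined with a collective operation.

```r
### SPMD sketch (illustrative): every MPI rank runs this same script,
### launched e.g. via "mpiexec -np 4 Rscript spmd_mean.r"
library(pbdMPI, quiet = TRUE)
init()

### Each rank generates/owns its own chunk of a larger data set
n.local <- 1000
x.local <- rnorm(n.local)

### Combine the local sums into a global mean with one collective call
total <- allreduce(sum(x.local), op = "sum")
n.all <- allreduce(n.local, op = "sum")
comm.print(total / n.all)   # printed once, by rank 0

finalize()
```

Note that no rank is singled out as a manager: the collective `allreduce` call plays the role that explicit send/receive pairs would play under manager/workers parallelism.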
 
== Package design ==
{| class="wikitable"
|-
! General !! I/O !! Computation !! Application !! Profiling !! Client/Server
|-
| pbdDEMO || pbdNCDF4 || pbdDMAT || pmclust || pbdPROF || pbdZMQ
|-
| pbdMPI || pbdADIOS || pbdBASE || pbdML || pbdPAPI || remoter
|-
| || || pbdSLAP || || hpcvis || pbdCS
|-
| || || kazaam || || || pbdRPC
|}
[[File:Pbd overview.png|thumb|The image describes how the various pbdR packages relate to each other.]]
Among these packages, pbdMPI provides wrapper functions to the [[Message Passing Interface|MPI]] library, and it also produces a [[Library (computing)|shared library]] and a configuration file for [[MPI]] environments. All other packages rely on this configuration for installation and library loading, which avoids the difficulty of library linking and compiling, and lets them use the [[MPI]] functions directly.
 
* [http://cran.r-project.org/web/packages/pbdMPI/vignettes/pbdMPI-guide.pdf pbdMPI] --- an efficient interface to [[MPI]], either [[Open MPI|OpenMPI]]<ref>{{cite web|url=http://www.open-mpi.org/papers/sc-2008/jsquyres-cisco-booth-talk-1up.pdf|author=Jeff Squyres|publisher=Open MPI Project|title=Open MPI: 10^15 Flops Can't Be Wrong|accessdate=2011-09-27}}</ref> or [[MPICH2]],<ref>[http://www.mcs.anl.gov/research/projects/mpich2/downloads/license.txt MPICH License]</ref> with a focus on the Single Program/Multiple Data ([[SPMD]]) parallel programming style<ref name=spmd>{{cite journal|author=Darema, F.|title=The SPMD Model: Past, Present and Future|url=http://dx.doi.org/10.1007/3-540-45417-9_1|year=2001}}</ref><ref>{{cite journal|author=Ortega, J.M., Voight, G.G., and Romine, C.H.|year=1989|title=Bibliography on Parallel and Vector Numerical Algorithms|url=http://liinwww.ira.uka.de/bibliography/Parallel/ovr.html}}</ref><ref name=spmd_ostrouchov>{{cite journal|author=Ostrouchov, G.|year=1987|title=Parallel Computing on a Hypercube: An Overview of the Architecture and Some Applications|journal=Proc. 19th Symp. on the Interface of Computer Science and Statistics|pages=27–32}}</ref>
* [http://cran.r-project.org/web/packages/pbdSLAP/vignettes/pbdSLAP-guide.pdf pbdSLAP] --- bundles scalable dense linear algebra libraries in double precision for R, based on [[ScaLAPACK]] version 2.0.2,<ref>{{cite book|title=ScaLAPACK Users' Guide|author=Blackford, L.S., et al.|year=1997|url=http://netlib.org/scalapack/slug/scalapack_slug.html/}}</ref> which includes several scalable linear algebra packages (namely [[BLACS]], [[PBLAS]], and [[ScaLAPACK]])<ref>{{cite web|title=PBLAS|url=http://www.netlib.org/utk/papers/scalapack/node9.html|work=Netlib|first=Antoine|last=Petitet|year=1995|accessdate=13 July 2012}}</ref><ref name=pbblas>{{cite journal|title=PB-BLAS: a set of Parallel Block Basic Linear Algebra Subprograms|journal=Scalable High-Performance Computing Conference|year=1994|pages=534–541|url=http://www.netlib.org/utk/people/JackDongarra/journals/079_1996_pb-blas-a-set-of-parallel-block-basic-linear-algebra-subroutines.pdf|doi=10.1109/SHPCC.1994.296688|isbn=0-8186-5680-8|last1=Choi|first1=Jaeyoung|last2=Dongarra|first2=J.J.|last3=Walker|first3=D.W.}}</ref>
* [http://cran.r-project.org/web/packages/pbdNCDF4/vignettes/pbdNCDF4-guide.pdf pbdNCDF4] --- interface to Parallel Unidata [[NetCDF|NetCDF4]] format data files<ref>{{cite web|title=Network Common Data Form|author=NetCDF Group|url=http://www.unidata.ucar.edu/software/netcdf/|year=2008}}</ref>
* [http://cran.r-project.org/web/packages/pbdBASE/vignettes/pbdBASE-guide.pdf pbdBASE] --- low-level [[ScaLAPACK]] codes and wrappers
* [http://cran.r-project.org/web/packages/pbdDMAT/vignettes/pbdDMAT-guide.pdf pbdDMAT] --- distributed matrix classes and computational methods, with a focus on linear algebra and statistics<ref>{{cite journal|author=J. Dongarra and D. Walker|title=The Design of Linear Algebra Libraries for High Performance Computers|url=http://acts.nersc.gov/scalapack/hands-on/datadist.html}}</ref><ref>{{cite journal|author=J. Demmel, M. Heath, and H. van der Vorst|title=Parallel Numerical Linear Algebra|url=http://acts.nersc.gov/scalapack/hands-on/datadist.html}}</ref><ref>{{cite web|title=2d block-cyclic data layout|url=http://acts.nersc.gov/scalapack/hands-on/datadist.html}}</ref>
* [http://cran.r-project.org/web/packages/pbdDEMO/vignettes/pbdDEMO-guide.pdf pbdDEMO] --- set of package demonstrations and examples, and a unifying vignette<ref name=pbdDEMO>{{cite journal|author=Schmidt, D., Chen, W.-C., Patel, P., Ostrouchov, G.|year=2013|title=Speaking Serial R with a Parallel Accent|url=http://github.com/wrathematics/pbdDEMO/blob/master/inst/doc/pbdDEMO-guide.pdf?raw=true}}</ref>
* [http://cran.r-project.org/web/packages/pmclust/vignettes/pmclust-guide.pdf pmclust] --- parallel [[model-based clustering]] using pbdR
* pbdPROF --- profiling package for MPI codes and visualization of parsed stats
* pbdZMQ --- interface to [[ZeroMQ|ØMQ]]
* remoter --- R client with remote R servers
* pbdCS --- pbdR client with remote pbdR servers
* pbdRPC --- remote procedure call
* kazaam --- very tall and skinny distributed matrices
* pbdML --- machine learning toolbox
 
Among these packages, the pbdDEMO package is a collection of 20+ package demos which offer example uses of the various pbdR packages, and it contains a vignette that offers detailed explanations for the demos and provides some mathematical or statistical insight.
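For instance, a bundled demo can be launched directly from the shell (an illustrative invocation, assuming pbdDEMO and an MPI runtime are installed; the demo name is one example from the package):

```shell
# Run the pbdDEMO "monte_carlo" demonstration on 2 MPI ranks
mpiexec -np 2 Rscript -e 'demo(monte_carlo, package = "pbdDEMO", ask = FALSE, echo = FALSE)'
```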
 
== Examples ==
 
=== Example 1 ===
Hello World! Save the following code in a file called "demo.r"
<syntaxhighlight lang="r">
### Initial MPI
library(pbdMPI, quiet = TRUE)
init()

### A minimal example body: print a message from each MPI rank
msg <- paste("Hello world from rank", comm.rank(), "of", comm.size())
comm.print(msg, all.rank = TRUE)
### Finish
finalize()
</syntaxhighlight>
and use the command
<syntaxhighlight lang="bash">
mpiexec -np 2 Rscript demo.r
</syntaxhighlight>
to execute the code, where [[R (programming language)|Rscript]] is one of R's command-line executables.
 
=== Example 2 ===
The following example, modified from pbdMPI, illustrates the basic [[programming language syntax|syntax of the language]] of pbdR.
Since pbdR is designed for [[SPMD]], all the R scripts are stored in files and executed from the command line via [[MPI|mpiexec]], [[MPI|mpirun]], etc. Save the following code in a file called "demo.r"
<syntaxhighlight lang="r">
### Initial MPI
library(pbdMPI, quiet = TRUE)
init()

### A minimal example body: each rank contributes its own value,
### and the values are summed across all ranks with a collective call
x <- allreduce(comm.rank() + 1, op = "sum")
comm.print(x)
### Finish
finalize()
</syntaxhighlight>
and use the command
<syntaxhighlight lang="bash">
mpiexec -np 4 Rscript demo.r
</syntaxhighlight>
to execute the code, where [[R (programming language)|Rscript]] is one of R's command-line executables.
 
=== Example 3 ===
The following example, modified from pbdDEMO, illustrates the basic ddmatrix computation of pbdR, performing [[Singular value decomposition|singular value decomposition]] on a given matrix.
Save the following code in a file called "demo.r"
<syntaxhighlight lang="r">
# Initialize process grid
library(pbdDMAT, quiet=T)
init.grid()

### A minimal example body: generate a random distributed matrix
### (dimensions are arbitrary) and compute its SVD
comm.set.seed(1234, diff = TRUE)
dx  <- ddmatrix("rnorm", nrow = 500, ncol = 500)
out <- svd(dx)
comm.print(out$d[1:5])
# Finish
finalize()
</syntaxhighlight>
and use the command
<syntaxhighlight lang="bash">
mpiexec -np 2 Rscript demo.r
</syntaxhighlight>
to execute the code, where [[R (programming language)|Rscript]] is one of R's command-line executables.
 
== Further reading ==
* {{cite tech report|author=Raim, A.M.|year=2013|title=Introduction to distributed computing with pbdR at the UMBC High Performance Computing Facility|institution=UMBC High Performance Computing Facility, University of Maryland, Baltimore County|number=HPCF-2013-2|url=http://userpages.umbc.edu/~gobbert/papers/pbdRtara2013.pdf|accessdate=2013-06-26|archiveurl=https://web.archive.org/web/20140204051402/http://userpages.umbc.edu/~gobbert/papers/pbdRtara2013.pdf|archivedate=2014-02-04|url-status=dead}}
* {{cite tech report|author=Bachmann, M.G., Dyas, A.D., Kilmer, S.C. and Sass, J.|year=2013|title=Block Cyclic Distribution of Data in pbdR and its Effects on Computational Efficiency|institution=UMBC High Performance Computing Facility, University of Maryland, Baltimore County|number=HPCF-2013-11|url=http://userpages.umbc.edu/~gobbert/papers/REU2013Team1.pdf|accessdate=2014-02-01|archiveurl=https://web.archive.org/web/20140204051351/http://userpages.umbc.edu/~gobbert/papers/REU2013Team1.pdf|archivedate=2014-02-04|url-status=dead}}
* {{cite tech report|author=Bailey, W.J., Chambless, C.A., Cho, B.M. and Smith, J.D.|year=2013|title=Identifying Nonlinear Correlations in High Dimensional Data with Application to Protein Molecular Dynamics Simulations|institution=UMBC High Performance Computing Facility, University of Maryland, Baltimore County|number=HPCF-2013-12|url=http://userpages.umbc.edu/~gobbert/papers/REU2013Team2.pdf|accessdate=2014-02-01|archiveurl=https://web.archive.org/web/20140204055902/http://userpages.umbc.edu/~gobbert/papers/REU2013Team2.pdf|archivedate=2014-02-04|url-status=dead}}
* [https://cran.r-project.org/ CRAN] Task View: [https://cran.r-project.org/web/views/HighPerformanceComputing.html High-Performance and Parallel Computing with R].<ref>{{cite web|title=High-Performance and Parallel Computing with R|author=Dirk Eddelbuettel|date=13 November 2022|url=https://cran.r-project.org/web/views/HighPerformanceComputing.html|author-link=Dirk Eddelbuettel}}</ref>
* [http://www.r-bloggers.com/r-at-12000-cores/ R at 12,000 Cores].<ref>{{cite news|title=R at 12,000 Cores|url=http://www.r-bloggers.com/r-at-12000-cores/}}</ref> This article was read 22,584 times in 2012 after being posted on October 16, 2012, and ranked number 3 in the [http://www.r-bloggers.com/100-most-read-r-posts-for-2012-stats-from-r-bloggers-big-data-visualization-data-manipulation-and-other-languages/ Top 100 R posts of 2012].<ref>{{cite news|url=http://www.r-bloggers.com/100-most-read-r-posts-for-2012-stats-from-r-bloggers-big-data-visualization-data-manipulation-and-other-languages/|title=100 most read R posts in 2012 (stats from R-bloggers) – big data, visualization, data manipulation, and other languages}}</ref>
* Profiling Tools for Parallel Computing with R, a Google Summer of Code project (2013) mentored by the R Project for Statistical Computing.<ref>{{cite web|url=http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2013:mpiprofiler|archive-url=https://archive.today/20130629095333/http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2013:mpiprofiler|url-status=dead|archive-date=2013-06-29|title=Profiling Tools for Parallel Computing with R|author=Google Summer of Code - R 2013}}</ref>
== Milestones ==
2013
* Version 1.0-2:&nbsp; Add pmclust.
* Version 1.0-1:&nbsp; Add pbdNCDF4.
* Version 1.0-0:&nbsp; Add pbdDEMO.
2012
* Version 0.1-2:&nbsp; Add pbdBASE and pbdDMAT.
* Version 0.1-1:&nbsp; Add pbdSLAP.
* Version 0.1-0:&nbsp; Migrate from [https://cran.r-project.org/package=Rmpi Rmpi]<ref name=rmpi>{{cite journal|author=Yu, H.|title=Rmpi: Parallel Statistical Computing in R|year=2002|url=https://cran.r-project.org/package=Rmpi|journal=R News}}</ref> to pbdMPI.

== References ==
{{Reflist|30em}}

== External links ==
* {{Official website|www.r-pbd.org}} of the pbdR project
* {{cite web|url=http://rpubs.com/wush978/pbdMPI-linux-pilot|title=在雲端運算環境使用R和MPI|trans-title=Using R and MPI in a cloud computing environment|language=zh|author=Wush Wu|year=2014}}
* {{cite web|url=https://www.youtube.com/watch?v=m1vtPESsFqM|title=快速在AWS建立R和pbdMPI的使用環境|trans-title=Quickly building an environment for R and pbdMPI on AWS|language=zh|author=Wush Wu|year=2013|website=[[YouTube]]}}
 
{{DEFAULTSORT:PbdR}}
[[Category:Programming languages]]
[[Category:Cross-platform free software]]
[[Category:Data mining and machine learning software]]
[[Category:Data-centric programming languages]]
[[Category:Statistical software]]
[[Category:Free statistical software]]
[[Category:Functional languages]]
[[Category:Numerical analysis software for Linux]]
[[Category:Numerical analysis software for macOS]]
[[Category:Numerical analysis software for Windows]]
[[Category:Parallel computing]]