m more independent references are added. |
}}
'''Programming with Big Data in R''' (pbdR)<ref>{{cite web|author=Ostrouchov, G., Chen, W.-C., Schmidt, D., Patel, P.|title=Programming with Big Data in R|year=2012|url=http://r-pbd.org/}}</ref><ref>{{cite web|title=XSEDE|url=https://portal.xsede.org/knowledge-base/-/kb/document/bcrw}}</ref><ref name=pbdDEMO/> is a series of [[R (programming language)|R]] packages and an environment for [[statistical computing]] with [[Big Data]] using high-performance statistical computation.<ref>{{cite web|author=Chen, W.-C. and Ostrouchov, G.|url=http://thirteen-01.stat.iastate.edu/snoweye/hpsc/|year=2011|title=HPSC -- High Performance Statistical Computing for Data Intensive Research}}</ref> pbdR uses the same programming language as [[R (programming language)|R]],<ref name=R>{{cite book|author=R Core Team|title=R: A Language and Environment for Statistical Computing|year=2012|isbn=3-900051-07-0|url=http://www.r-project.org/}}</ref> with [[S (programming language)|S3/S4]] classes and methods, which is used among [[statistician]]s and [[Data mining|data miners]] for developing [[statistical software]]. The significant difference between pbdR and [[R (programming language)|R]]<ref name=R/> code is that pbdR mainly focuses on [[distributed memory]] systems, where data are distributed across several processors and communication between processors is based on [[Message Passing Interface|MPI]], which is readily used on large [[High-performance computing|high-performance computing (HPC)]] systems. The [[R (programming language)|R]] system<ref name=R/> itself mainly focuses on interactive data analysis on a single [[Multi-core processor|multi-core]] machine. The two main implementations of [[Message Passing Interface|MPI]] in [[R (programming language)|R]] are [http://cran.r-project.org/package=Rmpi Rmpi]<ref name=rmpi/> and [http://cran.r-project.org/package=pbdMPI pbdMPI] of pbdR.
* The pbdR built on [http://cran.r-project.org/package=pbdMPI pbdMPI] uses [[SPMD|SPMD
* [http://cran.r-project.org/package=Rmpi Rmpi]<ref name=rmpi/> uses [[Master/slave (technology)|manager/workers parallelism]], where one main processor (the manager) serves as the controller of all other processors (the workers).
pbdR is suitable for small [[Computer cluster|clusters]], but it is also more stable for analyzing larger data and more scalable on [[Supercomputer|supercomputers]].<ref>{{cite journal|author=Schmidt, D., Ostrouchov, G., Chen, W.-C., and Patel, P.|title=Tight Coupling of R and Distributed Linear Algebra for High-Level Programming with Big Data|year=2012|pages=811-815|journal=High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion:|url=http://dl.acm.org/citation.cfm?id=2477156}}</ref>
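
The distributed-memory style described above can be illustrated with a minimal sketch, assuming that pbdMPI and an MPI library are installed. The data split and the variable names (<code>local.x</code>, <code>total</code>) are hypothetical and chosen only for illustration. Every processor runs the same script on its own portion of the data, and a collective call combines the partial results.

<syntaxhighlight lang="r">
# Illustrative sketch only: every MPI rank executes this same script.
library(pbdMPI, quietly = TRUE)
init()

rank <- comm.rank()   # id of this processor, 0 .. size - 1
size <- comm.size()   # total number of processors

# Hypothetical data split: each rank keeps every size-th value of 1..100.
# Assumes the job is launched with at most 100 ranks.
local.x <- seq(rank + 1, 100, by = size)

# Collective sum of the partial results; every rank receives the total.
total <- allreduce(sum(local.x), op = "sum")

# comm.print prints from rank 0 only by default.
comm.print(total)

finalize()
</syntaxhighlight>

Such a script would typically be launched in batch through the MPI launcher, for example <code>mpiexec -np 4 Rscript spmd_sketch.r</code>, where the file name is a placeholder. No processor acts as a manager; each one holds only its own <code>local.x</code>.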