| website = [http://www.r-pbd.org r-pbd.org]
}}
'''Programming with Big Data in R''' (pbdR)<ref>{{cite web|author=Ostrouchov, G., Chen, W.-C., Schmidt, D., Patel, P.|title=Programming with Big Data in R|year=2012|url=http://r-pbd.org/}}</ref> is a [[free software]] [[programming language]] and
Two main implementations in [[R (programming language)|R]] using [[Message Passing Interface|MPI]] are [[Rmpi]]<ref name=rmpi/> and the [[pbdMPI]] package of pbdR.
* [[Rmpi]]<ref name=rmpi/> uses [[Master/slave (technology)|manager/workers parallelism]], in which one main processor (the manager) controls all other processors (the workers). This style is particularly efficient for large tasks on small [[Computer cluster|clusters]], such as the [[Bootstrapping (statistics)|bootstrap method]] and [[Monte Carlo method|Monte Carlo simulation]] in applied statistics, because the [[Independent and identically distributed random variables|i.i.d.]] assumption commonly used in [[Statistics|statistical analysis]] allows replicates to be computed independently of one another.
* pbdR, which is built on [[pbdMPI]], uses [[SPMD|SPMD parallelism]], in which every processor is treated as a worker and owns part of the data. This style is particularly suited to large data, for example computing the [[Singular value decomposition|singular value decomposition]] of a large matrix or performing [[Mixture model|clustering analysis]] on large, high-dimensional data. Nevertheless, nothing prevents [[Master/slave (technology)|manager/workers parallelism]] from being used within an [[SPMD|SPMD parallelism]] environment.
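The contrast between the two styles can be sketched as a toy, single-process Python simulation. This is an illustration under stated assumptions, not Rmpi or pbdMPI code: no MPI library is used, the manager/workers "workers" are threads, the SPMD "ranks" are simulated with a loop, and every function name (<code>manager_bootstrap</code>, <code>bootstrap_task</code>, <code>spmd_mean</code>, <code>block_of</code>, <code>allreduce_sum</code>) is hypothetical. In the first half, a manager hands out bootstrap replicates and gathers the results; in the second, every rank runs the same code on its own block of the data, and a simulated allreduce gives every rank the same global answer.

```python
"""Toy single-process simulation of two MPI parallelism styles.

Assumption: no MPI library is used. Manager/workers "workers" are threads,
SPMD "ranks" are simulated with a loop, and every function name here is
hypothetical (none belongs to Rmpi or pbdMPI).
"""
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


# Manager/workers style (the pattern Rmpi uses).
def bootstrap_task(data, n_reps, seed):
    """Worker: compute n_reps bootstrap means with its own seeded RNG."""
    rng = random.Random(seed)
    n = len(data)
    return [mean(rng.choices(data, k=n)) for _ in range(n_reps)]


def manager_bootstrap(data, total_reps, n_workers):
    """Manager: split replicates into tasks, dispatch them, gather results."""
    base, extra = divmod(total_reps, n_workers)
    reps = [base + (1 if w < extra else 0) for w in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(bootstrap_task, data, r, seed)
                   for seed, r in enumerate(reps)]
        # Only the manager collects and combines the workers' results.
        return [m for f in futures for m in f.result()]


# SPMD style (the pattern pbdR uses via pbdMPI).
def block_of(data, rank, size):
    """Contiguous block of rows owned by `rank` out of `size` ranks."""
    n = len(data)
    return data[rank * n // size:(rank + 1) * n // size]


def allreduce_sum(per_rank_values):
    """Simulated allreduce: every rank receives the sum over all ranks."""
    total = sum(per_rank_values)
    return [total] * len(per_rank_values)


def spmd_mean(data, size):
    """Each rank runs the same code on its own block; no rank is a manager."""
    blocks = [block_of(data, rank, size) for rank in range(size)]
    sums = allreduce_sum([sum(b) for b in blocks])
    counts = allreduce_sum([len(b) for b in blocks])
    # After the allreduce, every rank holds the same global mean.
    return [s / c for s, c in zip(sums, counts)]


if __name__ == "__main__":
    data = [float(x) for x in range(1, 101)]
    print(len(manager_bootstrap(data, total_reps=200, n_workers=4)))  # 200
    print(spmd_mean(data, size=4))  # [50.5, 50.5, 50.5, 50.5]
```

In a real SPMD program each rank would hold only its own block rather than the whole data set, which is what lets the approach scale to data too large for a single node; the simulation keeps everything in one process only so that it runs without an MPI installation.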
Like Rmpi, pbdR is suitable for small [[Computer cluster|clusters]], but it is more stable when analyzing larger data and scales better on [[Supercomputer|supercomputers]].