Programming with Big Data in R
* pbdR, built on [[pbdMPI]], uses [[SPMD|SPMD parallelism]], in which every processor is treated as a worker and owns a portion of the data. This style of parallelism is particularly suited to large data, for example performing [[Singular value decomposition|singular value decomposition]] on a large matrix or [[Mixture model|clustering analysis]] on high-dimensional large data. Nothing, however, prevents [[Master/slave (technology)|manager/workers parallelism]] from being used within an [[SPMD|SPMD]] environment. A minimal sketch of the SPMD style is given after this list.
* [[Rmpi]]<ref name=rmpi/> uses [[Master/slave (technology)|manager/workers parallelism]], in which one main processor (the manager) serves as the controller of all other processors (the workers). This style of parallelism is particularly efficient for large tasks on small [[Computer cluster|clusters]], for example the [[Bootstrapping (statistics)|bootstrap method]] and [[Monte Carlo method|Monte Carlo simulation]] in applied statistics, since the [[Independent and identically distributed random variables|i.i.d.]] assumption is common in most [[Statistics|statistical analyses]]. A manager/workers sketch follows the SPMD sketch below.
Not only is pbdR suitable for small [[Computer cluster|clusters]], but it is also stabler for analyzing large data and more scalable to [[Supercomputer|supercomputers]].<ref>{{cite journal|author=Schmidt, D., Ostrouchov, G., Chen, W.-C., and Patel, P.|title=Tight Coupling of R and Distributed Linear Algebra for High-Level Programming with Big Data|year=2012|pages=811–815|journal=High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion}}</ref>
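The sketch below illustrates the SPMD style with pbdMPI. The calls used (init, comm.rank, allreduce, comm.cat, finalize) follow the pbdMPI documentation, while the data and the computed statistic are only illustrative assumptions, not part of the package.

<syntaxhighlight lang="r">
## SPMD sketch with pbdMPI: every process runs this same script
## and holds only its own portion of the data.
library(pbdMPI)
init()

## Each process generates (in practice, reads) its own local chunk.
n.local <- 1000
x.local <- rnorm(n.local, mean = comm.rank())

## Collective reductions combine the local summaries into global ones.
sum.global <- allreduce(sum(x.local), op = "sum")
n.global   <- allreduce(n.local, op = "sum")

## Print from rank 0 only, to avoid duplicated output from every process.
comm.cat("Global mean:", sum.global / n.global, "\n", rank.print = 0)

finalize()
</syntaxhighlight>

Such a script is run in batch mode, for example with <code>mpiexec -np 4 Rscript spmd_script.r</code>, so that four identical copies execute in parallel, each owning part of the data; no process acts as a central controller.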
 
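For contrast, a manager/workers sketch with Rmpi might look as follows. The Rmpi calls are as documented in that package; the bootstrap task, its size, and the number of workers are illustrative assumptions.

<syntaxhighlight lang="r">
## Manager/workers sketch with Rmpi: the manager spawns workers and
## farms out independent bootstrap replicates (an i.i.d. task).
library(Rmpi)
mpi.spawn.Rslaves(nslaves = 4)

## Ship the data and the task function to every worker.
x <- rnorm(200)                                           # illustrative data
boot.mean <- function(x) mean(sample(x, replace = TRUE))
mpi.bcast.Robj2slave(x)
mpi.bcast.Robj2slave(boot.mean)

## Each worker computes a batch of replicates; the manager collects them.
res <- mpi.remote.exec(replicate(250, boot.mean(x)))

mpi.close.Rslaves()
</syntaxhighlight>

Here one process coordinates the work and gathers the results, whereas in the SPMD sketch above all processes play the same role except when printing.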
== Package design ==