* The pbdR built on [http://cran.r-project.org/package=pbdMPI pbdMPI] uses [[SPMD|SPMD parallelism]], in which every processor is considered a worker and owns a part of the data. [[SPMD|SPMD parallelism]]<ref name=spmd/><ref name=spmd_ostrouchov/> was introduced in the mid-1980s and is particularly efficient in homogeneous computing environments for large data, for example, performing [[Singular value decomposition|singular value decomposition]]<ref>{{Cite book | last1=Golub | first1=Gene H. | author1-link=Gene H. Golub | last2=Van Loan | first2=Charles F. | author2-link=Charles F. Van Loan | title=Matrix Computations | publisher=Johns Hopkins | edition=3rd | isbn=978-0-8018-5414-9 | year=1996 }}</ref> on a large matrix, or performing [[Mixture model|clustering analysis]] on high-dimensional large data. On the other hand, there is no restriction against using [[Master/slave (technology)|manager/workers parallelism]] within an SPMD environment (see the pbdMPI sketch below).
* [http://cran.r-project.org/package=Rmpi Rmpi]<ref name=rmpi/> uses [[Master/slave (technology)|manager/workers parallelism]], in which one main processor (the manager) serves as the controller of all other processors (the workers). [[Master/slave (technology)|Manager/workers parallelism]]<ref>[http://userpages.uni-koblenz.de/~laemmel/MapReduce/paper.pdf "Google's MapReduce Programming Model -- Revisited"] — paper by Ralf Lämmel; from [[Microsoft]]</ref> was introduced in the early 1980s and is particularly efficient for task parallelism, especially in heterogeneous computing environments (a minimal Rmpi sketch follows this list).
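The following is a minimal, illustrative sketch of the manager/workers style with Rmpi; the particular task (each worker computing the mean of a random sample) and the worker count are placeholders chosen only for demonstration, not part of Rmpi or pbdR.

<syntaxhighlight lang="r">
## Manager/workers sketch with Rmpi: the manager (this R session) spawns
## worker processes and farms the same expression out to all of them.
library(Rmpi)
mpi.spawn.Rslaves(nslaves = 4)    # manager spawns 4 worker processes

## Each worker evaluates the expression and returns its value to the
## manager, which collects all results into one object.
results <- mpi.remote.exec(mean(rnorm(100)))
print(results)

mpi.close.Rslaves()               # manager shuts the workers down
mpi.quit()
</syntaxhighlight>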
The idea of [[SPMD|SPMD parallelism]] is to let every processor do the same work, but on a different part of a large data set. For example, a modern [[Graphics processing unit|GPU]] is a large collection of slower co-processors that simply apply the same computation on different parts of (relatively smaller) data, which turns out to be an efficient way to obtain the final solution.<ref>{{cite web | url = http://www.engadget.com/2006/09/29/stanford-university-tailors-folding-home-to-gpus/ | title = Stanford University tailors Folding@home to GPUs | author = Darren Murph | accessdate = 2007-10-04 }}</ref><ref>{{cite web | url = http://graphics.stanford.edu/~mhouston/ | title = Folding@Home - GPGPU | author = Mike Houston | accessdate = 2007-10-04 }}</ref> As a result, pbdR is not only suitable for small [[Computer cluster|clusters]], but is also stabler for analyzing [[big data]] and more scalable for [[Supercomputer|supercomputers]].<ref>{{cite journal|author=Schmidt, D., Ostrouchov, G., Chen, W.-C., and Patel, P.|title=Tight Coupling of R and Distributed Linear Algebra for High-Level Programming with Big Data|year=2012|pages=811–815|journal=High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion:|url=http://dl.acm.org/citation.cfm?id=2477156}}</ref>
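A minimal SPMD sketch with pbdMPI is shown below; the locally generated random data and the block size are placeholders for illustration. Every rank runs the same script on its own portion of the data, and the partial results are combined with <code>allreduce()</code>; such a script would be launched with, e.g., <code>mpiexec -np 4 Rscript demo.r</code>.

<syntaxhighlight lang="r">
## SPMD sketch with pbdMPI: all ranks execute this same script,
## each on its own locally owned block of the data.
library(pbdMPI)
init()

## Each rank owns a different part of the data (here, generated locally).
my.rank <- comm.rank()            # 0, 1, ..., comm.size() - 1
set.seed(1234 + my.rank)
x.local <- rnorm(100)             # this rank's portion of the data

## All ranks do the same computation on their own portion, then combine
## the partial results; allreduce() sums values across all ranks.
total.sum <- allreduce(sum(x.local), op = "sum")
total.n   <- allreduce(length(x.local), op = "sum")
comm.print(total.sum / total.n)   # global mean, printed by rank 0

finalize()
</syntaxhighlight>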