Programming with Big Data in R
* The pbdR built on [http://cran.r-project.org/package=pbdMPI pbdMPI] uses [[SPMD|SPMD parallelism]], where every processor is considered a worker and owns a part of the data. SPMD parallelism, introduced in the mid-1980s, is particularly efficient in homogeneous computing environments for large data, for example, performing [[Singular value decomposition|singular value decomposition]] on a large matrix, or performing [[Mixture model|clustering analysis]] on high-dimensional large data. On the other hand, there is no restriction against using [[Master/slave (technology)|manager/workers parallelism]] in an [[SPMD|SPMD parallelism]] environment.
* The [http://cran.r-project.org/package=Rmpi Rmpi] package<ref name=rmpi/> uses [[Master/slave (technology)|manager/workers parallelism]], where one main processor (the manager) serves as the controller of all other processors (the workers). Manager/workers parallelism,<ref>[http://userpages.uni-koblenz.de/~laemmel/MapReduce/paper.pdf "Google's MapReduce Programming Model -- Revisited"], paper by Ralf Lämmel, [[Microsoft]]</ref> introduced around the early 2000s, is particularly efficient for large tasks on small [[Computer cluster|clusters]], for example, the [[Bootstrapping (statistics)|bootstrap method]] and [[Monte Carlo method|Monte Carlo simulation]] in applied statistics, since the [[Independent and identically distributed random variables|i.i.d.]] assumption is commonly used in most [[Statistics|statistical analyses]]. In particular, [http://math.acadiau.ca/ACMMaC/Rmpi/structure.html task pull] parallelism gives Rmpi better performance in heterogeneous computing environments.
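As a minimal sketch of the SPMD style (illustrative only, not taken from the pbdR documentation), every rank runs the same script on its own portion of the data and combines local results with a collective operation; the data sizes and values here are made up for the example:

<syntaxhighlight lang="r">
## SPMD sketch with pbdMPI: every rank executes this same script.
## Run with, e.g., "mpiexec -np 4 Rscript spmd_mean.r".
library(pbdMPI)
init()

## Each rank owns its own chunk of the data; here it is simulated,
## but in practice each rank would read its own slice from storage.
n.local <- 1000
x.local <- rnorm(n.local)

## Reduce the local sums across all ranks to obtain the global mean.
sum.global <- allreduce(sum(x.local), op = "sum")
n.global   <- allreduce(n.local, op = "sum")
comm.print(sum.global / n.global)

finalize()
</syntaxhighlight>

Note that every rank issues the same allreduce() call, which is what distinguishes this style from the manager/workers style of Rmpi, where only the manager coordinates the computation.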
The idea of [[SPMD|SPMD parallelism]] is to let every processor do the same work, but on different parts of the large data. For example, a modern [[Graphics processing unit|GPU]] is a large collection of slower co-processors that simply apply the same computation on different parts of relatively smaller data, yet [[SPMD|SPMD parallelism]] ends up being an efficient way to obtain the final solution, i.e. the time to solution is shorter.<ref>{{cite web | url = http://www.engadget.com/2006/09/29/stanford-university-tailors-folding-home-to-gpus/ | title = Stanford University tailors Folding@home to GPUs | author = Darren Murph | accessdate = 2007-10-04 }}</ref><ref>{{cite web | url = http://graphics.stanford.edu/~mhouston/ | title = Folding@Home - GPGPU | author = Mike Houston | accessdate = 2007-10-04 }}</ref> In short, pbdR
* is '''not''' like the Rmpi, snow, snowfall, do-like, '''nor''' parallel packages in R,
* does '''not''' focus on interactive computing '''nor''' on the manager/workers model,
* but is able to use '''both''' SPMD and task parallelism (see the task-pull sketch below).
 
It is clear that pbdR is not only suitable for small [[Computer cluster|clusters]], but is also more stable for analyzing [[Big data|big data]] and more scalable for [[Supercomputer|supercomputers]].<ref>{{cite journal|author=Schmidt, D., Ostrouchov, G., Chen, W.-C., and Patel, P.|title=Tight Coupling of R and Distributed Linear Algebra for High-Level Programming with Big Data|year=2012|pages=811–815|journal=High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion|url=http://dl.acm.org/citation.cfm?id=2477156}}</ref>
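The task parallelism mentioned above can be sketched with the task.pull() function shipped with pbdMPI, in which rank 0 acts as the manager handing out job IDs and the remaining ranks act as workers; the job function and job count below are made-up placeholders:

<syntaxhighlight lang="r">
## Task-pull sketch with pbdMPI: rank 0 hands job ids to worker ranks
## as they become free, then collects the results.
## Run with, e.g., "mpiexec -np 4 Rscript task_pull.r".
library(pbdMPI)
init()

## Placeholder job: in practice this could fit one bootstrap replicate
## or run one Monte Carlo simulation.
FUN <- function(jid) {
  set.seed(jid)
  mean(rnorm(1e6))
}

ret <- task.pull(1:10, FUN)   # manager/workers scheduling over 10 jobs
comm.print(unlist(ret))       # results gathered on rank 0

finalize()
</syntaxhighlight>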
 
== Package design ==