Programming with Big Data in R

 
{{Infobox programming language
| name = pbdR
| logo =
| paradigm = [[SPMD]] and [[MPMD]]
}}
* pbdR, built on pbdMPI, uses [[SPMD|SPMD parallelism]], in which every processor is treated as a worker and owns a part of the data. SPMD parallelism, introduced in the mid-1980s, is particularly efficient for large data in homogeneous computing environments, for example, when performing [[singular value decomposition]] on a large matrix or [[Mixture model|clustering analysis]] of high-dimensional large data. Nothing, however, prevents the use of [[Master/slave (technology)|manager/workers parallelism]] within an SPMD environment.
* Rmpi<ref name=rmpi/> uses [[Master/slave (technology)|manager/workers parallelism]], in which one main processor (the manager) controls all the other processors (the workers). This paradigm, introduced around the early 2000s, is particularly efficient for large tasks on small [[Computer cluster|clusters]], for example, for the [[Bootstrapping (statistics)|bootstrap method]] and [[Monte Carlo method|Monte Carlo simulation]] in applied statistics, since the [[Independent and identically distributed random variables|i.i.d.]] assumption is common in most [[Statistics|statistical analyses]]. In heterogeneous computing environments in particular, task-pull parallelism performs better for Rmpi; a minimal sketch of this style follows the list.
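The following is a minimal, illustrative sketch of the manager/workers style in Rmpi. The worker count and the Monte Carlo task are assumptions chosen for illustration, and a working MPI installation that allows spawning is required:
<syntaxhighlight lang="r">
# Manager/workers sketch with Rmpi: the manager spawns workers and
# pushes the same independent task to each of them.
library(Rmpi)

mpi.spawn.Rslaves(nslaves = 4)   # manager starts 4 workers (illustrative count)

# Each worker runs one i.i.d. Monte Carlo replicate and returns the result.
results <- mpi.remote.exec(mean(rnorm(1e6)))
print(unlist(results))           # manager collects one value per worker

mpi.close.Rslaves()              # shut the workers down
mpi.quit()
</syntaxhighlight>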
The idea of [[SPMD|SPMD parallelism]] is to let every processor do the same amount of work, each on a different part of a large data set. For example, a modern [[Graphics processing unit|GPU]] is a large collection of slower co-processors that apply the same computation to different parts of relatively small data, yet SPMD parallelism still yields an efficient way to obtain final solutions, i.e. the time to solution is shorter.<ref>{{cite web | url = http://graphics.stanford.edu/~mhouston/ | title = Folding@Home - GPGPU | author = Mike Houston | access-date = 2007-10-04 }}</ref> A corresponding sketch with pbdMPI follows.
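Below is a minimal sketch of the SPMD style with pbdMPI. The even split of the indices across ranks is an illustrative choice; the same script runs on every rank and would typically be launched with something like <code>mpiexec -np 4 Rscript script.R</code>:
<syntaxhighlight lang="r">
# SPMD sketch with pbdMPI: every rank runs this same script and
# works only on its own slice of the data.
library(pbdMPI)
init()                                   # start the MPI communicator

my.rank <- comm.rank()                   # this rank's id (0, 1, ...)
n.ranks <- comm.size()                   # total number of ranks

# Illustrative split: each rank sums its own slice of 1:1000.
my.slice <- seq(my.rank + 1, 1000, by = n.ranks)
my.partial <- sum(my.slice)

# Combine the partial sums; every rank receives the total, 500500.
total <- allreduce(my.partial, op = "sum")
comm.print(total)                        # printed once, from rank 0

finalize()                               # shut MPI down
</syntaxhighlight>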
 
== Package design ==