Programming with Big Data in R: Difference between revisions

Revision as of 00:51, 1 July 2013 edit Wccsnow (talk \| contribs) 109 edits mNo edit summary ← Previous edit		Latest revision as of 16:31, 28 February 2024 edit undo Stat3472 (talk \| contribs) 47 edits m Link to model-based clustering article updated to link to main article
(89 intermediate revisions by 46 users not shown)
Line 1: {{multiple issues\| ~~<!-- Please do not remove or change this AfD message until the issue is settled -->~~ ~~{{Article for deletion/dated\|page=Programming with Big Data in R\|timestamp=20130627212758\|year=2013\|month=June\|day=27\|substed=yes\|help=off}}~~ ~~<!-- For administrator use only: {{Old AfD multi\|page=Programming with Big Data in R\|date=27 June 2013\|result='''keep'''}} -->~~ ~~<!-- End of AfD message, feel free to edit beyond this point -->~~ {{notability\|date=June 2013}} {{~~Expert-subject\|Statistics,Computer science~~COI\|date=June 2013}} }} {{Infobox programming language \| name = ~~pbdR~~bdrp \| logo = ~~[[File:Pbdr.png\|200px]]~~ \| paradigm = [[SPMD]] and [[MPMD]] \| ~~year~~released = {{Start date and ~~= Sep.~~ age\|2012\|09}} \| designer = ~~[http://thirteen-01.stat.iastate.edu/snoweye/mypage/~~ Wei-Chen Chen], ~~[http://www.csm.ornl.gov/~ost~~ George Ostrouchov], Pragneshkumar Patel, and ~~[http://wrathematics.github.io/~~ Drew Schmidt] \| developer = pbdR Core Team \| latest_test_version = Through [[GitHub]] at [~~http~~https://github.com/RBigData/ RBigData] \| typing = [[dynamic typing\|Dynamic]] \| influenced_by = [[R (programming language)\|R]], [[C (programming language)\|C]], [[~~Fortran (programming language)\|~~Fortran]], ~~and~~ [[Message Passing Interface\|MPI]], and [[ZeroMQ\|ØMQ]] \| operating_system = [[Cross-platform]] \| license = [[General Public License]] and [[Mozilla Public License]] \| website = ~~[http://~~{{URL\|www.r-pbd.org ~~r-pbd.org]~~}} }} '''Programming with Big Data in R''' (pbdR)<ref>{{cite web\|author=Ostrouchov, G., Chen, W.-C., Schmidt, D., Patel, P.\|title=Programming with Big Data in R\|year=2012\|url=http://r-pbd.org/}}</ref~~><ref>{{cite web\|title=XSEDE\|url=https://portal.xsede.org/knowledge-base/-/kb/document/bcrw}}</ref><ref name=pbdDEMO/~~> is a series of [[R (programming language)\|R]] packages and an environment for [[statistical computing]] with [[~~Big~~big ~~Data~~data]] by 
~~utilizing~~using high-performance statistical computation.<ref>{{cite web\|~~author~~author1=Chen, W.-C. ~~and~~ \|author2=Ostrouchov, G.\|name-list-style=amp\|url=http://thirteen-01.stat.iastate.edu/snoweye/hpsc/\|year=2011\|title=HPSC -- High Performance Statistical Computing for Data Intensive Research\|access-date=2013-06-25\|archive-url=https://web.archive.org/web/20130719020318/http://thirteen-01.stat.iastate.edu/snoweye/hpsc/\|archive-date=2013-07-19\|url-status=dead}}</ref> ~~The pbdR uses the same programming language as [[R (programming language)\|R]]~~<ref ~~name=R~~>{{cite ~~book~~web\|~~author~~url=~~R Core Team~~https://learnshareit.com/tutorials-for-r/\|title=R:Basic ATutorials ~~Language~~for ~~and~~R ~~Environment~~to ~~for~~Start ~~Statistical~~Analyzing ~~Computing~~Data\|~~year=2012\|isbn~~date=3~~-900051-07-0\|url=http://www.r-project.org/~~ November 2022 }}</ref> The pbdR uses the same programming language as R with [[S (programming language)\|S3/S4]] classes and methods which is used among [[statistician]]s and [[Data mining\|data miners]] for developing [[statistical software]]. The significant difference between pbdR and [[R ~~(programming~~code ~~language)\|R]]<ref~~is ~~name=R/> codes is~~that pbdR mainly focuses on [[distributed memory]] ~~system~~systems, where data are distributed across several processors and analyzed in a [[Batch processing\|batch mode]], while communications between processors are based on [[Message Passing Interface\|MPI]] ~~which~~that is easily ~~utilized~~used in large [[High-performance computing\|high-performance computing (HPC)]] systems. [[R ~~(programming~~system ~~language)\|R]]~~mainly ~~system<ref~~focuses{{Citation ~~name~~needed\|date=~~R/>~~July ~~mainly focuses~~2013}} on single [[Multi-core processor\| multi-core]] machines for data analysis via an interactive mode such as [[Graphical user interface\|GUI interface]].~~<ref>Martinez, W. L. (2011), Graphical user interfaces. 
WIREs Comp Stat, 3: 119–133. doi: 10.1002/wics.150</ref>~~ Two main implementations in [[R (programming language)\|R]] using [[Message Passing Interface\|MPI]] are [http://cran.r-project.org/package=Rmpi Rmpi]<ref name=rmpi/> and [http://cran.r-project.org/package=pbdMPI pbdMPI] of pbdR.▼ * The pbdR built on [http://cran.r-project.org/package=pbdMPI pbdMPI] uses [[SPMD\|SPMD parallelism]] where every processors are considered as workers and own parts of data. The [[SPMD\|SPMD parallelism]]<ref name=spmd/><ref name=spmd_ostrouchov/> introduced in mid 1980 is particularly efficient in homogeneous computing environments for large data, for example, performing [[Singular value decomposition\|singular value decomposition]]<ref>{{Cite book \| last1=Golub \| first1=Gene H. \| author1-link=Gene H. Golub \| last2=Van Loan \| first2=Charles F. \| author2-link=Charles F. Van Loan \| title=Matrix Computations \| publisher=Johns Hopkins \| edition=3rd \| isbn=978-0-8018-5414-9 \| year=1996 }}▼ </ref> on a large matrix, or performing [[Mixture model\|clustering analysis]] on high-dimensional large data. On the other hand, there is no restriction to use [[Master/slave (technology)\|manager/workers parallelism]] in [[SPMD\|SPMD parallelism]] environment. * The [http://cran.r-project.org/package=Rmpi Rmpi]<ref name=rmpi/> uses [[Master/slave (technology)\|manager/workers parallelism]] where one main processor (manager) servers as the control of all other processors (workers). 
The [[Master/slave (technology)\|manager/workers parallelism]]<ref>[http://userpages.uni-koblenz.de/~laemmel/MapReduce/paper.pdf "Google's MapReduce Programming Model -- Revisited"] — paper by Ralf Lämmel; from [[Microsoft]]</ref> introduced around early 2000 is particularly efficient for large tasks in small [[Computer cluster\|clusters]], for example, [[Bootstrapping (statistics)\|bootstrap method]] and [[Monte Carlo method\|Monte Carlo simulation]] in applied statistics since [[Independent and identically distributed random variables\|i.i.d.]] assumption is commonly used in most [[Statistics\|statistical analysis]]. In particular, [http://math.acadiau.ca/ACMMaC/Rmpi/structure.html\| task pull] parallelism has better performance for Rmpi in heterogeneous computing environments.▼ The idea of [[SPMD\|SPMD parallelism]] is to let every processors do the same works but on different parts of large data. For example, modern [[Graphics processing unit\|GPU]] is a large collection of slower co-processors which can simply apply the same computation on different parts of relatively smaller data, but it ends up an efficient way to obtain final solutions.<ref>{{cite web \| url = http://www.engadget.com/2006/09/29/stanford-university-tailors-folding-home-to-gpus/ \| title = Stanford University tailors Folding@home to GPUs \| author = Darren Murph \| accessdate = 2007-10-04 }}</ref><ref>{{cite web \| url = http://graphics.stanford.edu/~mhouston/ \| title = Folding@Home - GPGPU \| author = Mike Houston \| accessdate = 2007-10-04 }}</ref>▼ ▲Two main implementations in [[R (programming language)\|R]] using [[Message Passing Interface\|MPI]] are ~~[http://cran.r-project.org/package=Rmpi~~ Rmpi]<ref name=rmpi/>{{cite ~~and~~journal\|author=Yu, ~~[http~~H.\|title=Rmpi: Parallel Statistical Computing in R\|year=2002\|url=https://cran.r-project.org/package=~~pbdMPI~~Rmpi\|journal=R News}}</ref> and pbdMPI] of pbdR. 
It is clearly that pbdR is not only suitable for small [[Computer cluster\|clusters]], but also is stabler for analyzing [[Big data]] and is more scalable for [[Supercomputer\|supercomputers]].<ref>{{cite journal\|author=Schmidt, D., Ostrouchov, G., Chen, W.-C., and Patel, P.\|title=Tight Coupling of R and Distributed Linear Algebra for High-Level Programming with Big Data\|year=2012\|pages=811-815\|journal=High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion:\|url=http://dl.acm.org/citation.cfm?id=2477156}}</ref> ▲* The pbdR built on ~~[http://cran.r-project.org/package=~~pbdMPI ~~pbdMPI]~~ uses [[SPMD\|SPMD parallelism]] where every ~~processors~~processor ~~are~~is considered as ~~workers~~worker and ~~own~~owns parts of data. The [[SPMD\|SPMD parallelism]]~~<ref name=spmd/><ref name=spmd_ostrouchov/>~~ introduced in mid 1980 is particularly efficient in homogeneous computing environments for large data, for example, performing [[~~Singular value decomposition\|~~singular value decomposition]]~~<ref>{{Cite~~ ~~book~~on \|a ~~last1=Golub~~large \|matrix, ~~first1=Gene~~or H.performing ~~\| author1-link=Gene H. Golub~~[[Mixture model\|clustering ~~last2=Van~~analysis]] ~~Loan~~on \|high-dimensional ~~first2=Charles~~large Fdata. \|On ~~author2-link=Charles~~the F.other ~~Van~~hand, ~~Loan~~there \|is ~~title=Matrix~~no ~~Computations~~restriction \|to ~~publisher=Johns~~use ~~Hopkins~~[[Master/slave (technology)\|manager/workers ~~edition=3rd~~parallelism]] ~~\| isbn=978-0-8018-5414-9~~in [[SPMD\|SPMD ~~year=1996~~parallelism]] }}environment. ▲* The ~~[http://cran.r-project.org/package=~~Rmpi ~~Rmpi]~~<ref name=rmpi/> uses [[Master/slave (technology)\|manager/workers parallelism]] where one main processor (manager) ~~servers~~serves as the control of all other processors (workers). 
The [[Master/slave (technology)\|manager/workers parallelism]]~~<ref>[http://userpages.uni-koblenz.de/~laemmel/MapReduce/paper.pdf "Google's MapReduce Programming Model -- Revisited"] — paper by Ralf Lämmel; from [[Microsoft]]</ref>~~ introduced around early 2000 is particularly efficient for large tasks in small [[Computer cluster\|clusters]], for example, [[Bootstrapping (statistics)\|bootstrap method]] and [[Monte Carlo method\|Monte Carlo simulation]] in applied statistics since [[Independent and identically distributed random variables\|i.i.d.]] assumption is commonly used in most [[Statistics\|statistical analysis]]. In particular, ~~[http://math.acadiau.ca/ACMMaC/Rmpi/structure.html\|~~ task pull] parallelism has better performance for Rmpi in heterogeneous computing environments. ▲The idea of [[SPMD\|SPMD parallelism]] is to let every ~~processors~~processor do the same ~~works~~amount of work, but on different parts of a large data set. For example, a modern [[Graphics processing unit\|GPU]] is a large collection of slower co-processors ~~which~~that can simply apply the same computation on different parts of relatively smaller data, but itthe SPMD parallelism ends up with an efficient way to obtain final solutions~~.<ref>{{cite~~ ~~web \| url = http://www~~(i.~~engadget~~e.~~com/2006/09/29/stanford-university-tailors-folding-home-to-gpus/~~ ~~\| title = Stanford University tailors Folding@home~~time to ~~GPUs~~solution \|is ~~author = Darren Murph \| accessdate = 2007-10-04 }}</ref>~~shorter).<ref>{{cite web \| url = http://graphics.stanford.edu/~mhouston/ \| title = Folding@Home - GPGPU \| author = Mike Houston \| ~~accessdate~~access-date = 2007-10-04 }}</ref> == Package design == Line 33 ⟶ 29: {\| class="wikitable" \|- ! General !! I/O !! Computation !! Application !! Profiling !! 
Client/Server \|- \| pbdDEMO \|\| pbdNCDF4 \|\| pbdDMAT \|\| pmclust \|\| pbdPROF \|\| pbdZMQ \| [http://cran.r-project.org/package=pbdDEMO pbdDEMO] \|\| [http://cran.r-project.org/package=pbdNCDF4 pbdNCDF4] \|\| [http://cran.r-project.org/package=pbdDMAT pbdDMAT] \|\| [http://cran.r-project.org/package=pmclust pmclust] \|- \| pbdMPI \|\| pbdADIOS \|\| pbdBASE \|\| pbdML \|\| pbdPAPI \|\| remoter ~~\| [http://cran.r-project.org/package=pbdMPI pbdMPI] \|\| \|\| [http://cran.r-project.org/package=pbdBASE pbdBASE] \|\|~~ \|- \| \|\| \|\| ~~[http://cran.r-project.org/package=~~pbdSLAP ~~pbdSLAP]~~\|\| \|\| hpcvis \|\| pbdCS \|- \| \|\| \|\| kazaam \|\| \|\| \|\| pbdRPC \|} [[File:Pbd overview.png\|thumb\|The images describes how various pbdr packages are correlated.]] Among these packages, pbdMPI provides wrapper functions to [[Message Passing Interface\|MPI]] library, and it also produces a [[Library (computing)\|shared library]] and a configuration file for [[MPI]] environments. All other packages rely on this configuration for installation and library loading that ~~avoid~~avoids difficulty of library linking and compiling. All other packages can directly ~~utilize~~use [[MPI]] functions easily. * ~~[http://cran.r-project.org/web/packages/~~pbdMPI~~/vignettes/pbdMPI-guide.pdf pbdMPI]~~ --- an efficient interface to [[MPI]] either [[Open MPI\|OpenMPI]]~~<ref>~~ or [[MPICH2]] with a focus on Single Program/Multiple Data ([[SPMD]]) parallel programming style * pbdSLAP --- bundles scalable dense linear algebra libraries in double precision for R, based on [[ScaLAPACK]] version 2.0.2 which includes several scalable linear algebra packages (namely [[BLACS]], [[PBLAS]], and [[ScaLAPACK]]). 
~~{{cite web~~ * pbdNCDF4 --- interface to Parallel Unidata [[NetCDF]]4 format data files ~~\| url=http://www.open-mpi.org/papers/sc-2008/jsquyres-cisco-booth-talk-1up.pdf~~ * ~~[http://cran.r-project.org/web/packages/~~pbdBASE~~/vignettes/pbdBASE-guide.pdf pbdBASE]~~ --- low-level [[ScaLAPACK]] codes and wrappers▼ ~~\|author=Jeff Squyres~~ * pbdDMAT --- distributed matrix classes and computational methods, with a focus on linear algebra and statistics ~~\| publisher=Open MPI Project~~ * pbdDEMO --- set of package demonstrations and examples, and this unifying vignette ~~\| title=Open MPI: 10^15 Flops Can't Be Wrong~~ * pmclust --- parallel [[model-based clustering]] using pbdR \| accessdate=2011-09-27}}</ref> or [[MPICH2]]<ref>[http://www.mcs.anl.gov/research/projects/mpich2/downloads/license.txt MPICH License]</ref> with a focus on Single Program/Multiple Data ([[SPMD]]) parallel programming style<ref name=spmd>{{cite journal\|author=Darema, F.\|title=The SPMD Model: Past, Present and Future\|url=http://dx.doi.org/10.1007/3-540-45417-9_1\|year=2001}}</ref><ref>{{cite journal\|author=Ortega, J.M., Voight, G.G., and Romine, C.H.\|year=1989\|title=Bibliography on Parallel and Vector Numerical Algorithms\|url=http://liinwww.ira.uka.de/bibliography/Parallel/ovr.html}}</ref><ref name=spmd_ostrouchov>{{cite journal\|author=Ostrouchov, G.\|year=1987\|title=Parallel Computing on a Hypercube: An Overview of the Architecture and Some Applications\|journal=Proc. 19th Symp. 
on the Interface of Computer Science and Statistics\|page=27-32}}</ref> * pbdPROF --- profiling package for MPI codes and visualization of parsed stats * [http://cran.r-project.org/web/packages/pbdSLAP/vignettes/pbdSLAP-guide.pdf pbdSLAP] --- bundles scalable dense linear algebra libraries in double precision for R, based on [[ScaLAPACK]] version 2.0.2<ref>{{cite book\|title=ScaLAPACK Users' Guide\|author=Blackford, L.S., et.al.\|year=1997\|url=http://netlib.org/scalapack/slug/scalapack_slug.html/}}</ref> which includes several scalable linear algebra packages (namely [[BLACS]], [[PBLAS]], and [[ScaLAPACK]]).<ref>{{cite web\|title=PBLAS\|url=http://www.netlib.org/utk/papers/scalapack/node9.html\|work=Netlib\|first=Antoine \|last=Petitet * pbdZMQ --- interface to [[ZeroMQ\|ØMQ]] \|year=1995\|accessdate= 13 July 2012}}</ref><ref name=pbblas>{{cite journal\|title=PB-BLAS: a set of Parallel Block Basic Linear Algebra Subprograms\|journal=Scalable High-Performance Computing Conference\|year=1994\|month=May\|pages=534–541\|url=http://www.netlib.org/utk/people/JackDongarra/journals/079_1996_pb-blas-a-set-of-parallel-block-basic-linear-algebra-subroutines.pdf\|doi=10.1109/SHPCC.1994.296688\|isbn=0-8186-5680-8\|last1=Jaeyoung Choi\|last2=Dongarra\|first2=J.J.\|last3=Walker\|first3=D.W.}}</ref> * remoter --- R client with remote R servers * [http://cran.r-project.org/web/packages/pbdNCDF4/vignettes/pbdNCDF4-guide.pdf pbdNCDF4] --- Interface to Parallel Unidata [[NetCDF\|NetCDF4]] format data files<ref>{{cite web\|title=Network Common Data Form\|author=NetCDF Group\|url=http://www.unidata.ucar.edu/software/netcdf/\|year=2008}}</ref> * pbdCS --- pbdR client with remote pbdR servers ▲* [http://cran.r-project.org/web/packages/pbdBASE/vignettes/pbdBASE-guide.pdf pbdBASE] --- low-level [[ScaLAPACK]] codes and wrappers * pbdRPC --- remote procedure call * [http://cran.r-project.org/web/packages/pbdDMAT/vignettes/pbdDMAT-guide.pdf pbdDMAT] --- distributed matrix classes and 
computational methods, with a focus on linear algebra and statistics<ref>{{cite journal\|author=J. Dongarra and D. Walker\|title=The Design of Linear Algebra Libraries for High Performance Computers\|url=http://acts.nersc.gov/scalapack/hands-on/datadist.html}}</ref><ref>{{cite journal\|author=J. Demmel, M. Heath, and H. van der Vorst\|title=Parallel Numerical Linear Algebra\|url=http://acts.nersc.gov/scalapack/hands-on/datadist.html}}</ref><ref>{{cite web\|title=2d block-cyclic data layout\|url=http://acts.nersc.gov/scalapack/hands-on/datadist.html}}</ref>
* kazaam --- very tall and skinny distributed matrices
* [http://cran.r-project.org/web/packages/pbdDEMO/vignettes/pbdDEMO-guide.pdf pbdDEMO] --- set of package demonstrations and examples, and this unifying vignette<ref name=pbdDEMO>{{cite journal\|author=Schmidt, D., Chen, W.-C., Patel, P., Ostrouchov, G.\|year=2013\|title=Speaking Serial R with a Parallel Accent\|url=http://github.com/wrathematics/pbdDEMO/blob/master/inst/doc/pbdDEMO-guide.pdf?raw=true}}</ref>
* pbdML --- machine learning toolbox
* [http://cran.r-project.org/web/packages/pmclust/vignettes/pmclust-guide.pdf pmclust] --- parallel [[Mixture model\|model-based clustering]] using pbdR

~~Amount~~Among those packages, the pbdDEMO package is a collection of 20+ package demos that offer example uses of the various pbdR packages, and contains a vignette ~~which~~that offers detailed explanations for the demos and provides some mathematical or statistical insight.

== Examples ==
=== Example 1 ===
Hello World! Save the following code in a file called "demo.r"
<~~source~~syntaxhighlight lang="~~rsplus~~r">
### Initial MPI
library(pbdMPI, quiet = TRUE)
Line 73 ⟶ 72:
### Finish
finalize()
</syntaxhighlight> ~~</source>~~
and use the command
<~~source~~syntaxhighlight lang="bash">
mpiexec -np 2 Rscript demo.r
</syntaxhighlight> ~~</source>~~
to execute the code, where [[R (programming language)\|Rscript]] is a command-line executable program.
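The SPMD pattern these examples rely on (every rank runs the same script on its own part of the data, then the ranks combine their results) can be sketched as follows. This is a minimal illustration and not the script shown in the article, whose body is elided in this diff. It assumes pbdMPI and an MPI runtime are installed; the problem size `N` and the file name used in the note after the code are hypothetical choices.

```r
### Initialize MPI; every rank executes this same script
library(pbdMPI, quietly = TRUE)
init()

### Illustrative problem: add up the integers 1..N (N is an arbitrary choice)
N <- 100

### get.jid() returns the subset of 1..N assigned to this rank
my.ids <- get.jid(N)
local.sum <- sum(my.ids)

### Combine the partial sums; every rank receives the total
total <- allreduce(local.sum, op = "sum")

### comm.print() prints from rank 0 only by default
comm.print(total)

### Finish
finalize()
```

Saved under a name such as "spmd_sum.r" and launched with `mpiexec -np 2 Rscript spmd_sum.r`, each rank sums only its own indices. The combined total is the sum of 1..100, which is 5050, whatever number of ranks is used.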
=== Example 2 ===
The following example, modified from pbdMPI, illustrates the basic [[programming language syntax\|syntax of the language]] of pbdR. Since pbdR is designed around [[SPMD]], all R scripts are stored in files and executed from the command line via ~~[[MPI\|~~mpiexec]], ~~[[MPI\|~~mpirun]], etc. Save the following code in a file called "demo.r"
<~~source~~syntaxhighlight lang="~~rsplus~~r">
### Initial MPI
library(pbdMPI, quiet = TRUE)
Line 102 ⟶ 101:
### Finish
finalize()
</syntaxhighlight> ~~</source>~~
and use the command
<~~source~~syntaxhighlight lang="bash">
mpiexec -np 4 Rscript demo.r
</syntaxhighlight> ~~</source>~~
to execute the code, where [[R (programming language)\|Rscript]] is a command-line executable program.

=== Example 3 ===
The following example, modified from pbdDEMO, illustrates the basic ddmatrix computation of pbdR, which performs [[~~Singular value decomposition\|~~singular value decomposition]] on a given matrix. Save the following code in a file called "demo.r"
<~~source~~syntaxhighlight lang="~~rsplus~~r">
# Initialize process grid
library(pbdDMAT, quiet=T)
Line 131 ⟶ 130:
# Finish
finalize()
</syntaxhighlight> ~~</source>~~
and use the command
<~~source~~syntaxhighlight lang="bash">
mpiexec -np 2 Rscript demo.r
</syntaxhighlight> ~~</source>~~
to execute the code, where [[R (programming language)\|Rscript]] is a command-line executable program.

== Further reading ==
* ~~[http://userpages.umbc.edu/~gobbert/papers/pbdRtara2013.pdf UMBC HPCF Technique Report by Raim, A.M.
(2013)].<ref>~~{{cite ~~journal~~tech report\|author=Raim, A.M.\|year=2013\|title= Introduction to distributed computing with pbdR at the UMBC High Performance Computing Facility\|~~journal~~institution=UMBC ~~Technical~~High ~~Report~~Performance Computing Facility, University of Maryland, Baltimore County\|number=HPCF-2013-2\|url=http://userpages.umbc.edu/~gobbert/papers/pbdRtara2013.pdf\|accessdate=2013-06-26\|archiveurl=https://web.archive.org/web/20140204051402/http://userpages.umbc.edu/~gobbert/papers/pbdRtara2013.pdf\|archivedate=2014-02-04\|url-status=dead}}~~</ref>~~ * {{cite tech report\|author=Bachmann, M.G., Dyas, A.D., Kilmer, S.C. and Sass, J.\|year=2013\|title=Block Cyclic Distribution of Data in pbdR and its Effects on Computational Efficiency\|institution=UMBC High Performance Computing Facility, University of Maryland, Baltimore County\|number=HPCF-2013-11\|url=http://userpages.umbc.edu/~gobbert/papers/REU2013Team1.pdf\|accessdate=2014-02-01\|archiveurl=https://web.archive.org/web/20140204051351/http://userpages.umbc.edu/~gobbert/papers/REU2013Team1.pdf\|archivedate=2014-02-04\|url-status=dead}} * [http://cran.r-project.org/ CRAN] Task View: [http://cran.r-project.org/web/views/HighPerformanceComputing.html High-Performance and Parallel Computing with R].<ref>{{cite web\|title=High-Performance and Parallel Computing with R\|author=Dirk Eddelbuettel\|url=http://cran.r-project.org/web/views/HighPerformanceComputing.html}}</ref>▼ * {{cite tech report\|author=Bailey, W.J., Chambless, C.A., Cho, B.M. 
and Smith, J.D.\|year=2013\|title=Identifying Nonlinear Correlations in High Dimensional Data with Application to Protein Molecular Dynamics Simulations\|institution=UMBC High Performance Computing Facility, University of Maryland, Baltimore County\|number=HPCF-2013-12\|url=http://userpages.umbc.edu/~gobbert/papers/REU2013Team2.pdf\|accessdate=2014-02-01\|archiveurl=https://web.archive.org/web/20140204055902/http://userpages.umbc.edu/~gobbert/papers/REU2013Team2.pdf\|archivedate=2014-02-04\|url-status=dead}} * [http://www.r-bloggers.com/r-at-12000-cores/ R at 12,000 Cores].<ref>{{cite news\|title=R at 12,000 Cores\|url=http://www.r-bloggers.com/r-at-12000-cores/}}</ref> This article was read 22,584 times in 2012 since it posted on October 16, 2012 and ranked number 3 according to [http://www.r-bloggers.com/100-most-read-r-posts-for-2012-stats-from-r-bloggers-big-data-visualization-data-manipulation-and-other-languages/\|Top 100 R posts of 2012]<ref>{{cite news\|url=http://www.r-bloggers.com/100-most-read-r-posts-for-2012-stats-from-r-bloggers-big-data-visualization-data-manipulation-and-other-languages/\|title=100 most read R posts in 2012 (stats from R-bloggers) – big data, visualization, data manipulation, and other languages}}</ref>▼ ▲* ~~[http://cran.r-project.org/ CRAN] Task View: [http://cran.r-project.org/web/views/HighPerformanceComputing.html High-Performance and Parallel Computing with R].<ref>~~{{cite web\|title=High-Performance and Parallel Computing with R\|author=Dirk Eddelbuettel\|date=13 November 2022 \|url=~~http~~https://cran.r-project.org/web/views/HighPerformanceComputing.html\|author-link=Dirk Eddelbuettel}}~~</ref>~~ * [http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2013:mpiprofiler\|MPI Profiler for pbdR] mentored by the [http://rwiki.sciviews.org/doku.php\| Organization of R Project for Statistical Computing] for Google summer of code 2013.<ref>{{cite 
web\|url=http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2013:mpiprofiler\|title=Profiling Tools for Parallel Computing with R\|author=GSOC-R 2013}}</ref>▼ ▲* ~~[http://www.r-bloggers.com/r-at-12000-cores/ R at 12,000 Cores].<ref>~~{{cite news\|title=R at 12,000 Cores\|url=http://www.r-bloggers.com/r-at-12000-cores/}}<br /~~ref~~> This article was read 22,584 times in 2012 since it posted on October 16, 2012, and ranked number 3 ~~according to [http://www.r-bloggers.com/100-most-read-r-posts-for-2012-stats-from-r-bloggers-big-data-visualization-data-manipulation-and-other-languages/\|Top 100 R posts of 2012]~~<ref>{{cite news\|url=http://www.r-bloggers.com/100-most-read-r-posts-for-2012-stats-from-r-bloggers-big-data-visualization-data-manipulation-and-other-languages/\|title=100 most read R posts in 2012 (stats from R-bloggers) – big data, visualization, data manipulation, and other languages}}</ref> ▲* [{{cite web\|url=http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2013:mpiprofiler\|~~MPI Profiler for pbdR] mentored by the [http~~archive-url=https://~~rwiki~~archive.~~sciviews.org~~today/20130629095333/~~doku.php\| Organization of R Project for Statistical Computing] for Google summer of code 2013.<ref>{{cite web\|url=~~http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2013:mpiprofiler\|url-status=dead\|archive-date=2013-06-29\|title=Profiling Tools for Parallel Computing with R\|author=~~GSOC~~Google Summer of Code - R 2013}}~~</ref>~~ == External links ==▼ * {{cite web\|url=http://rpubs.com/wush978/pbdMPI-linux-pilot\|title=在雲端運算環境使用R和MPI\|author=Wush Wu (2014)}} * {{Official website\|r-pbd.org}} of the pbdR project▼ * {{cite web\|url=https://www.youtube.com/watch?v=m1vtPESsFqM\|title=快速在AWS建立R和pbdMPI的使用環境\|author=Wush Wu (2013)\|website=[[YouTube]] }} * [http://thirteen-01.stat.iastate.edu/snoweye/pbdr/ Technical website] of the pbdR packages * [http://code.r-pbd.org Source Code] of developing version of the pbdR 
packages
* [http://group.r-pbd.org Discussion Group] for any pbdR-related topics

~~== Milestones ==~~
~~2013~~
* Version 1.0-2: Add pmclust.
* Version 1.0-1: Add pbdNCDF4.
* Version 1.0-0: Add pbdDEMO.
~~2012~~
* Version 0.1-2: Add pbdBASE and pbdDMAT.
* Version 0.1-1: Add pbdSLAP.
* Version 0.1-0: Migrate from [http://cran.r-project.org/package=Rmpi Rmpi]<ref name=rmpi>{{cite journal\|author=Yu, H.\|title=Rmpi: Parallel Statistical Computing in R\|year=2002\|url=http://cran.r-project.org/package=Rmpi\|journal=R News}}</ref> to pbdMPI.

== References ==
{{Reflist\|30em}}

▲== External links ==
▲* {{Official website\|www.r-pbd.org}} ~~of the pbdR project~~

{{DEFAULTSORT:PbdR}}
[[Category:Parallel computing]]▼
~~[[Category:Programming languages]]~~
[[Category:Cross-platform free software]]
[[Category:~~Functional~~Data ~~languages~~mining and machine learning software]]
[[Category:Data-centric programming languages]]
[[Category:Statistical software]]▼
[[Category:Free statistical software]]
[[Category:~~Linux~~Functional ~~numerical analysis software~~languages]]
[[Category:~~Data~~Numerical ~~mining~~analysis ~~and~~software ~~machine~~for ~~learning software~~Linux]]
▲[[Category:~~Statistical~~Numerical analysis software for macOS]]
[[Category:Numerical analysis software for Windows]]
▲[[Category:Parallel computing]]