Luke Tierney
[Figure 1: four panels, one per thread count (1, 2, 4, and 8 threads), each plotting time in milliseconds (0.00 to 0.30) against vector length n (0 to 1000).]

Fig. 1. Timings of vectorized function evaluations for qnorm and pgamma as a function of vector length for two 8-processor systems. Plots for 10 replications are shown.
[Figure 2: chart of selected cutoff levels for the functions sqrt, sin, cos, exp, dnorm, pnorm, qnorm, dgamma, pgamma, qgamma, pbeta, qbeta, ptukey, and qtukey.]

Fig. 2. Selected cutoff levels for switching to parallel evaluation of vectorized functions.
to gcc 4.1 in recent Fedora and Red Hat Linux releases. The current MinGW Windows compiler suite also includes OpenMP support.
OpenMP uses compiler directives (#pragma statements in C; FORTRAN uses structured comments) to request parallel implementation of a loop. For example, Figure 3 shows the loop used for vectorizing a function of a single argument along with the OpenMP parallelization directive.

Fig. 3. Vectorization loop for a function of one argument with OpenMP parallelization directive.

Functions of more than one argument need additional logic for recycling shorter arguments. A compiler that does not support OpenMP will ignore the omp directive and compile this as a standard sequential loop. If the compiler supports OpenMP and is asked to use it, then the loop will be compiled to use the number of threads specified by the variable P.
Use of OpenMP eliminates the need to manually manage threads, but some effort is still needed. Only loops with simple control structure can be parallelized by OpenMP, which requires rewriting some of the loops used in
the standard R code. Also, it is essential that the functions being called are
safe to call from multiple threads. For this to be true these functions cannot
use read/write global variables, call R’s memory manager, signal warnings
or errors, or check for user interrupts. Even creating internationalized error
messages can be problematic as the subroutines that do this are not guaranteed to be thread-safe. Almost all functions in the basic R math library
are either thread-safe or easily modified to be thread-safe. Exceptions are the
Bessel functions and the Wilcoxon and signed rank functions.
A preliminary implementation of the approach outlined here is available
as a package pnmath. Loading this package replaces the standard vectorized
functions in R by parallelized ones. For Linux and Mac OS X predetermined
intercept calibrations are used; for other platforms a calibration test is run at
package load time. The package requires a version of gcc that supports OpenMP and allows dlopen to be used with the support library libgomp.
## Sequential version:
> R <- 1000
> system.time(nuke.boot <-
+ boot(nuke.data, nuke.fun, R=R, m=1,
+ fit.pred=new.fit, x.pred=new.data))
user system elapsed
12.703 0.001 12.706
## Parallel version, using 10 processes:
> clusterEvalQ(cl,library(boot))
> clusterSetupRNG(cl)
> system.time(cl.nuke.boot <-
+ clusterCall(cl,boot,nuke.data, nuke.fun,
+ R=R/length(cl), m=1,
+ fit.pred=new.fit, x.pred=new.data))
user system elapsed
0.009 0.004 1.246
Work on implicit parallelization within R is still in its early stages. The parallel vectorized math library package described in Section 2 above is a first step. Over the next few months this work will be folded into the base R distribution. Extensions to snow to support the BSP model are currently being explored.
More extensive rewriting of the R implementation might enable the integration of more advanced parallel libraries, such as ScaLAPACK (Blackford
et al. (1997)), and more advanced parallel programming approaches. This is
the subject of future research.
5 Acknowledgements
This work was supported in part by National Science Foundation grant DMS
06-04593. Some of the computations for this paper were performed on equip-
ment funded by National Science Foundation grant DMS 06-18883.
References
ASANOVIC, K., BODIK, R., CATANZARO, B.C., GEBIS, J.J., HUSBANDS,
P., KEUTZER, K., PATTERSON, D.A., PLISHKER, L.W., SHALF, J.,
WILLIAMS, S.W., YELICK, K.A. (2006): The landscape of parallel computing
research: a view from Berkeley, EECS Department, University of California,
Berkeley, Technical Report No. UCB/EECS-2006-183.
BISSELING, R.H. (2004): Parallel Scientific Computation: A Structured Approach
Using BSP and MPI, Oxford University Press, Oxford.
BLACKFORD, L.S., CHOI, J., CLEARY, A., D’AZEVEDO, E., DEMMEL, J.,
DHILLON, I., DONGARRA, J., HAMMARLING, S., HENRY, G., PETITET,
A., STANLEY, K., WALKER, D., WHALEY, R.C. (1997): ScaLAPACK
Users’ Guide. Society for Industrial and Applied Mathematics, Philadelphia.
CHANDRA, R., MENON, R., DAGUM, L., KOHR, D. (2000): Parallel Program-
ming in OpenMP. Morgan Kaufmann, San Francisco.
GEIST, A., BEGUELIN, A., DONGARRA, J., JIANG, W. (1994): PVM: Parallel
Virtual Machine, MIT Press, Cambridge.
L’ECUYER, P., SIMARD, R., CHEN, E.J., KELTON, W.D. (2002): An object-
oriented random-number package with many long streams and substreams,
Operations Research, 50 (6), 1073–1075.
PACHECO, P. (1997): Parallel Programming with MPI, Morgan Kaufmann, San
Francisco.
ROSSINI, A.J., TIERNEY, L., LI, N. (2007): Simple Parallel Statistical Computing
in R, Journal of Computational and Graphical Statistics, 16 (1), 399–420.
WHALEY, R.C., PETITET, A. (2005): Minimizing development and maintenance
costs in supporting persistently optimized BLAS, Software: Practice and Ex-
perience, 35 (2), 101–121.