
foreach + iterators

Bryan Lewis Steve Weston

Revolution Computing
New Haven, CT USA

Rmetrics 2009
Outline

iterators

foreach

Experimenting with existing packages


iterators

An S3 class with tools for iterating over various R data structures:


- Conceptually like while loops
- Defined by a nextElem function
- Like iterators in Java and other languages
Simple Examples

it <- iter (1:3)

it <- icount (3)
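To make the nextElem protocol concrete, here is a minimal sketch of stepping through an iterator by hand. It assumes only the iterators package; the variable names are illustrative.

```r
library(iterators)

it <- iter(1:3)

# Each call to nextElem() returns the next value and advances the iterator.
first  <- nextElem(it)
second <- nextElem(it)
third  <- nextElem(it)

# Once the values are used up, nextElem() signals an error whose
# message is "StopIteration"; callers catch it to end the loop.
exhausted <- tryCatch({
  nextElem(it)
  FALSE
}, error = function(e) identical(conditionMessage(e), "StopIteration"))
```

foreach performs this catch internally, which is why a loop simply ends when its iterator runs out.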


Another example

# Requires the DBI and iterators packages
iquery <- function (con, statement, ..., n=1) {
  rs <- dbSendQuery (con, statement, ...)
  nextElem <- function() {
    d <- fetch (rs, n)
    if (nrow (d) == 0) {
      # No rows left: release the result set and signal the end
      dbClearResult (rs)
      stop ('StopIteration')
    }
    d
  }
  structure (list (nextElem=nextElem),
             class=c ('iquery', 'iter'))
}
nextElem.iquery <- function(obj, ...) obj$nextElem()
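The same closure pattern can be tried without a database. The sketch below is an illustration only: irows is a hypothetical helper (not part of any package) that hands out a data frame in chunks of n rows, and the driver loop assumes the "StopIteration" convention shown above.

```r
library(iterators)

# Hypothetical helper: iterate over a data frame n rows at a time.
irows <- function(df, n = 1) {
  start <- 1
  nextElem <- function() {
    if (start > nrow(df)) stop("StopIteration", call. = FALSE)
    end <- min(start + n - 1, nrow(df))
    chunk <- df[start:end, , drop = FALSE]
    start <<- end + 1   # remember the position for the next call
    chunk
  }
  structure(list(nextElem = nextElem), class = c("irows", "iter"))
}
nextElem.irows <- function(obj, ...) obj$nextElem()

# Drive the iterator by hand and record the size of each chunk.
it <- irows(mtcars, n = 10)
chunk_sizes <- integer(0)
repeat {
  chunk <- tryCatch(nextElem(it), error = function(e) {
    if (identical(conditionMessage(e), "StopIteration")) NULL else stop(e)
  })
  if (is.null(chunk)) break
  chunk_sizes <- c(chunk_sizes, nrow(chunk))
}
```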
foreach

- New looping methods for R
- An abstract interface to parallel computing
- Python/Haskell-like list comprehensions
Foreach Syntax

foreach(iterator,...) %dopar% {
statements
}
Example

> foreach (j=1:4) %dopar% { j }


[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] 4
Examples

> foreach (j=1:4,.combine=c) %dopar% { j }


[1] 1 2 3 4

> foreach (j=icount(4),.combine='+') %dopar% { j }


[1] 10

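Note the difference from sum (1:4): '+' is applied pairwise to the results as they are combined, whereas sum receives the whole vector at once.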
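A few more .combine variants, as a minimal sketch. It uses %do% so that it runs sequentially with no back-end registered; the variable names are illustrative.

```r
library(foreach)

# '+' reduces the results pairwise to a single number.
total <- foreach(j = 1:4, .combine = "+") %do% j

# c concatenates the results into a vector.
squares <- foreach(j = 1:4, .combine = c) %do% j^2

# rbind stacks one row per iteration into a matrix.
tab <- foreach(j = 1:3, .combine = rbind) %do% c(j = j, sq = j^2)
```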


Another Example

> library (randomForest)
> x <- matrix (runif (500), 100)
> y <- gl (2, 50)
> rf <- foreach (ntree=rep (250, 4),
+                .combine=combine) %dopar%
+   randomForest (x, y, ntree=ntree)
The %dopar% operator

%dopar% dispatches the loop to whichever parallel back-end has been registered:

- doSEQ (the default backend)
- doMC (multicore package)
- doNWS
- doSNOW
- doRHIPE?
- doRMPI?
- ...
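Registration can be sketched with the sequential back-end, which ships with foreach and needs no cluster. Swapping in another back-end would only change the registration line; the loop itself stays the same.

```r
library(foreach)

# Register the sequential back-end explicitly.
registerDoSEQ()

# Ask foreach how many workers the registered back-end provides.
workers <- getDoParWorkers()

# The loop body is unchanged regardless of which back-end is registered.
res <- foreach(j = 1:4, .combine = c) %dopar% sqrt(j)
```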
Foreach tries to parse R syntax reasonably

> z <- 2
> f <- function (x) { sqrt (x + z) }
> foreach (j=1:4, .combine=c) %dopar% { f (j) }
[1] 1.732051 2.000000 2.236068 2.449490
List comprehension

> foreach (j=-2:2,.combine=c) %:% when (j>=0) %dopar%
+   sqrt (j)
[1] 0.000000 1.000000 1.414214
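Another comprehension in the same style, as a small sketch: the squares of the even numbers from 1 to 10. It uses %do% so it runs without a registered back-end.

```r
library(foreach)

# when() filters the iterations; only even j reach the loop body.
even_squares <- foreach(j = 1:10, .combine = c) %:%
  when(j %% 2 == 0) %do%
  j^2
```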


Nesting

Foreach loops can be nested. Nesting admits at least two interesting cases:

- Easy loop unrolling
- Easy multi-paradigm parallelism
Loop unrolling

Compare (100 iterations of 5 parallel tasks):

x <- foreach (j=1:100,.combine=sum) %do% {
  foreach (k=1:5,.combine=c) %dopar% {j*k}
}

with an unrolled version (500 parallel tasks):

y <- foreach (j=1:100,.combine=sum) %:%
  foreach (k=1:5,.combine=c) %dopar% {j*k}

The unrolled approach hands all 500 tasks to the back-end at once, so it is better load-balanced on a cluster.
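A quick check that the two forms compute the same value, as a sketch run sequentially with %do% in place of %dopar%. The expected total is the sum of j*k over j in 1..100 and k in 1..5, that is 5050 * 15.

```r
library(foreach)

# Nested form: the inner loop runs once per outer iteration.
x <- foreach(j = 1:100, .combine = sum) %do% {
  foreach(k = 1:5, .combine = c) %do% { j * k }
}

# Unrolled form: %:% merges both loops into one stream of tasks.
y <- foreach(j = 1:100, .combine = sum) %:%
  foreach(k = 1:5, .combine = c) %do% { j * k }

expected <- sum(1:100) * sum(1:5)
```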


Multi-paradigm parallelism

> require (doSNOW)
> cl <- makeCluster (c (n1, n2, n3, n4))
> registerDoSNOW (cl)
> foreach (j=<iterator>, .packages='doMC') %dopar% {
+   registerDoMC ()   # register the multicore back-end on each node
+   foreach (k=<iterator>) %dopar% {
+     ...
+   }
+ }
Example: Very simple backtesting

simpleRule <- function (z, fast=12, slow=26,
                        signal=9, instr, benchmark)
{
  x <- MACD (z, nFast=fast, nSlow=slow,
             nSig=signal, maType="EMA")
  position <- sign (x[,1]-x[,2])
  s <- xts (position,order.by=index(z))
  return (instr*(s>0) + benchmark*(s<=0))
}
Brute-force parameter optimization

# Define a return series Ra for the instrument
# (below we use the closing price of MSFT), and
# benchmark series Rb

M <- 100
S <- matrix(0,M,M)
for (j in 1:(M-1)) {
  for (k in min ((j+2),M):M) {
    R <- simpleRule (Cl (MSFT),j,k,9, Ra, Rb)
    Dt <- na.omit (R - Rb)
    S[j,k] <- mean (Dt)/sd(Dt)
  }
}
Now in parallel, by rows...

M <- 100
S <- foreach (j=1:(M-1), .combine=rbind,
              .packages=c ('xts','TTR')) %dopar% {
  x <- rep (0,M)
  for (k in min ((j+2),M):M) {
    R <- simpleRule (Cl (MSFT),j,k,9,Ra,Rb)
    Dt <- na.omit (R - Rb)
    x[k] <- mean (Dt)/sd( Dt)
  }
  x
}
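The row-wise rewrite can be checked on a toy problem. In the sketch below, score is a hypothetical stand-in for the backtest statistic, and %do% replaces %dopar% so that no back-end is needed. Note that, as written above, the foreach version returns M-1 rows while the sequential matrix has M; the sketch appends a final row of zeros so the shapes match.

```r
library(foreach)

M <- 6
score <- function(j, k) k - j   # hypothetical stand-in for mean(Dt)/sd(Dt)

# Sequential version: fill the matrix in place.
S_seq <- matrix(0, M, M)
for (j in 1:(M - 1)) {
  for (k in min(j + 2, M):M) S_seq[j, k] <- score(j, k)
}

# Row-wise version: each iteration returns one row, rbind stacks them.
S_par <- foreach(j = 1:(M - 1), .combine = rbind) %do% {
  x <- rep(0, M)
  for (k in min(j + 2, M):M) x[k] <- score(j, k)
  x
}

# Append the untouched last row and drop row names added by rbind.
S_par <- rbind(S_par, rep(0, M))
dimnames(S_par) <- NULL
```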
Parallelizing parts of an existing package

Basic idea

- Profile code with Rprof (profr is a nice wrapper that visualizes the results)
- Examine bottlenecks for apply-like statements and for loops with independent code blocks
- Rewrite for loops without side-effects as required (may require a custom combine function)
- Unlock the namespace, provisionally replace target function(s) and experiment (a nice trick)
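The rewriting step can be sketched as follows. The loop, the statistics and merge_stats are hypothetical illustrations, not code from any package: a for loop that fills two vectors by side effect becomes a foreach loop whose iterations each return a list, merged by a custom combine function.

```r
library(foreach)

# Original style: the loop body modifies two outside vectors.
means <- numeric(0)
sds <- numeric(0)
for (j in 1:3) {
  v <- (1:10) * j
  means[j] <- mean(v)
  sds[j] <- sd(v)
}

# Side-effect-free style: each iteration returns its own result and a
# custom combine function (hypothetical name) merges results pairwise.
merge_stats <- function(a, b) {
  list(mean = c(a$mean, b$mean), sd = c(a$sd, b$sd))
}
res <- foreach(j = 1:3, .combine = merge_stats) %do% {
  v <- (1:10) * j
  list(mean = mean(v), sd = sd(v))
}
```

Because the iterations no longer share state, %do% can be replaced by %dopar% once a back-end is registered.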
Example: ipred

(Work through the ipred replacement functions in the file cvx.R.)


Appendix: Fun map/reduce examples
Succinct map/reduce, from the mapReduce package by Christopher Brown:

mapReduce <- function (map, ..., data=NULL,
                       applyfun=sapply)
{
  innerFun <- function(my.data, expr)
    eval(expr, my.data)
  outerFun <- function (expr, split.data)
    sapply (split.data, innerFun, expr)
  attach (data)
  map <- eval (substitute (map, data))
  detach (data)
  expr = substitute (c ( ... ))[-1]
  split.data <- split( data, map )
  applyfun (expr, outerFun, split.data)
}
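For orientation, the split-then-apply idea behind the function can be sketched in base R alone. This is an illustration under the assumption that the data are grouped by cyl and summarized by the mean of mpg and hp, as in the examples in this appendix; the layout of the result may differ from what mapReduce itself returns.

```r
# Map step: split the data frame into one piece per value of cyl.
groups <- split(mtcars, mtcars$cyl)

# Reduce step: evaluate each summary within each piece.
res <- t(sapply(groups, function(d) c(mpg = mean(d$mpg), hp = mean(d$hp))))
```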
mapReduce sequential and parallel examples

# An example
mapReduce (cyl, mean(mpg), mean(hp),
data=mtcars, applyfun=sapply)

# With multicore:
require (multicore)
mapReduce (cyl, mean(mpg), mean(hp),
           data=mtcars, applyfun=mclapply)

# With SNOW parSapply:
require (snow)
cl <- makeSOCKcluster(c("localhost","localhost"))
ssapply <- function (A,B,C) {parSapply(cl, A, B, C)}
mapReduce (cyl, mean(mpg), mean(hp),
           data=mtcars, applyfun=ssapply)
mapReduce parallel examples

# With Rmpi mpi.parSapply:
require (Rmpi)
x <- mapReduce (cyl, mean(mpg), mean(hp),
                data=mtcars, applyfun=mpi.parSapply)

# With foreach:
require (foreach)
fapply <- function (A,B,C) {
  foreach (j=A, .combine=cbind) %dopar% B(j, C) }
mapReduce (cyl, mean(mpg), mean(hp),
           data=mtcars, applyfun=fapply)
