Brandon Merkl

Strassen Implementation
This algorithm was implemented as follows:
// C(i,j) = alpha*C(i,j) + A(i,k) * B(k,j)
// alpha is a flag in {0,1}: 0 sets C, 1 accumulates into C
inline void strassen(int n,
                     double *c, int ldc, int rc, int cc,
                     double *a, int lda, int ra, int ca,
                     double *b, int ldb, int rb, int cb,
                     int alpha, int min_block_size);

Because the algorithm is recursive, a flag named alpha indicates whether the results
are accumulated into C (alpha = 1) or simply assigned to C (alpha = 0). A recursion
cutoff is set by a variable called min_block_size: if the recursion reaches an n such
that n < min_block_size, a normal matrix multiply is used instead.
The results are shown below for min_block_size = 32 and min_block_size = 64. Also
included are IJK_Blocking_20 (block size = 20) and IJK_Blocking_21 (block size = 21).
A naïve implementation and the default DGEMM are included for comparison.
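
The recursion with a cutoff and an accumulate flag described above can be sketched as follows. This is a minimal illustration, not the author's code: it uses a simplified, hypothetical signature (strassen_sketch) without the row/column offset parameters, and the helpers base_mm, add, and acc are invented names. It assumes row-major storage, that C does not alias A or B, and it falls back to the plain multiply whenever n is odd.

```c
#include <stdlib.h>
#include <string.h>

/* Base case: C = alpha*C + A*B on n-by-n row-major blocks (alpha is 0 or 1). */
static void base_mm(int n, double *c, int ldc, const double *a, int lda,
                    const double *b, int ldb, int alpha) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double s = alpha ? c[i * ldc + j] : 0.0;
            for (int k = 0; k < n; k++)
                s += a[i * lda + k] * b[k * ldb + j];
            c[i * ldc + j] = s;
        }
}

/* t = x + sign*y for h-by-h blocks; t is stored contiguously (ld = h). */
static void add(int h, double *t, const double *x, int ldx,
                const double *y, int ldy, double sign) {
    for (int i = 0; i < h; i++)
        for (int j = 0; j < h; j++)
            t[i * h + j] = x[i * ldx + j] + sign * y[i * ldy + j];
}

/* c += sign*m, where m is a contiguous h-by-h block. */
static void acc(int h, double *c, int ldc, const double *m, double sign) {
    for (int i = 0; i < h; i++)
        for (int j = 0; j < h; j++)
            c[i * ldc + j] += sign * m[i * h + j];
}

/* Hypothetical simplified interface: C = alpha*C + A*B. */
void strassen_sketch(int n, double *c, int ldc, const double *a, int lda,
                     const double *b, int ldb, int alpha, int min_block_size) {
    int h = n / 2, mbs = min_block_size;
    double *buf = NULL;
    /* Recurse only at or above the cutoff and for even n > 1. */
    if (n >= mbs && n > 1 && n % 2 == 0)
        buf = malloc(3 * (size_t)h * h * sizeof *buf);
    if (buf == NULL) {  /* below cutoff, odd n, or no scratch space */
        base_mm(n, c, ldc, a, lda, b, ldb, alpha);
        return;
    }
    double *s = buf, *t = buf + h * h, *m = buf + 2 * h * h;
    const double *a11 = a, *a12 = a + h, *a21 = a + h * lda, *a22 = a21 + h;
    const double *b11 = b, *b12 = b + h, *b21 = b + h * ldb, *b22 = b21 + h;
    double *c11 = c, *c12 = c + h, *c21 = c + h * ldc, *c22 = c21 + h;

    if (!alpha)  /* alpha == 0: clear C so the products below can accumulate */
        for (int i = 0; i < n; i++)
            memset(c + i * ldc, 0, n * sizeof *c);

    /* M1 = (A11 + A22)(B11 + B22) -> C11, C22 */
    add(h, s, a11, lda, a22, lda, 1.0);
    add(h, t, b11, ldb, b22, ldb, 1.0);
    strassen_sketch(h, m, h, s, h, t, h, 0, mbs);
    acc(h, c11, ldc, m, 1.0);  acc(h, c22, ldc, m, 1.0);
    /* M2 = (A21 + A22) B11 -> C21, -C22 */
    add(h, s, a21, lda, a22, lda, 1.0);
    strassen_sketch(h, m, h, s, h, b11, ldb, 0, mbs);
    acc(h, c21, ldc, m, 1.0);  acc(h, c22, ldc, m, -1.0);
    /* M3 = A11 (B12 - B22) -> C12, C22 */
    add(h, t, b12, ldb, b22, ldb, -1.0);
    strassen_sketch(h, m, h, a11, lda, t, h, 0, mbs);
    acc(h, c12, ldc, m, 1.0);  acc(h, c22, ldc, m, 1.0);
    /* M4 = A22 (B21 - B11) -> C11, C21 */
    add(h, t, b21, ldb, b11, ldb, -1.0);
    strassen_sketch(h, m, h, a22, lda, t, h, 0, mbs);
    acc(h, c11, ldc, m, 1.0);  acc(h, c21, ldc, m, 1.0);
    /* M5 = (A11 + A12) B22 -> -C11, C12 */
    add(h, s, a11, lda, a12, lda, 1.0);
    strassen_sketch(h, m, h, s, h, b22, ldb, 0, mbs);
    acc(h, c11, ldc, m, -1.0); acc(h, c12, ldc, m, 1.0);
    /* M6 = (A21 - A11)(B11 + B12) -> C22 */
    add(h, s, a21, lda, a11, lda, -1.0);
    add(h, t, b11, ldb, b12, ldb, 1.0);
    strassen_sketch(h, m, h, s, h, t, h, 0, mbs);
    acc(h, c22, ldc, m, 1.0);
    /* M7 = (A12 - A22)(B21 + B22) -> C11 */
    add(h, s, a12, lda, a22, lda, -1.0);
    add(h, t, b21, ldb, b22, ldb, 1.0);
    strassen_sketch(h, m, h, s, h, t, h, 0, mbs);
    acc(h, c11, ldc, m, 1.0);

    free(buf);
}
```

In this sketch the seven sub-products are always computed with alpha = 0 into scratch space and then added into the quadrants of C, so the caller's alpha only decides whether C is cleared first. A tuned implementation would likely avoid the per-call allocation.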

Figure: Performance comparison of Selected Algorithms. Performance (MFLOP/s, axis 0 to 450) versus N (axis 0 to 700). Series: strassen32, strassen64, DGEMM, IJK_BLOCKING_20, IJK_BLOCKING_21, IJK_NAIVE.

APPENDIX A – Strassen Results
F = Frobenius norm

  N    B   DGEMM   STRASSEN      F   IJK_blocking      F   IJK_Naive   F

  4   20   16.53       21.1   0.18           24.8   0.18        28.9   0
  8   20   40.68       78.6   0.49           91.0   0.49        75.0   0
 16   20   60.85      166.3   0.61          174.7   0.61       116.9   0
 32   20   68.25      198.1   0.97          251.7   0.76       153.9   0
 64   20   72.87      229.8    1.5          334.0      1       177.7   0
128   20   74.53      213.2    2.4          326.8    1.4       146.8   0
256   20   33.92      186.3    4.1          194.9    1.9        44.2   0
512   20   33.91      175.8    6.8          240.4    2.7        44.5   0

  4   21   19.60       21.9   0.36           27.7   0.36        30.8   0
  8   21   41.19       76.6   0.47           87.3   0.47        66.4   0
 16   21   58.02      165.9   0.58          173.7   0.58       117.8   0
 32   21   68.32      184.3   0.94          253.2   0.77       153.1   0
 64   21   72.91      231.2    1.6          342.1   0.99       177.9   0
128   21   73.00      210.7    2.6          346.2    1.4       147.0   0
256   21   33.91      185.7    4.2          277.9    1.9        43.9   0
512   21   34.00      176.0    6.8          262.4    2.6        44.3   0

  4   64   20.67       22.7   0.25           29.0   0.25        25.5   0
  8   64   42.27       84.9   0.46           88.0   0.46        68.7   0
 16   64   58.18      168.0   0.59          172.2   0.59       118.1   0
 32   64   68.22      260.2   0.76          260.4   0.76       158.4   0
 64   64   72.94      318.8      1          348.6   0.94       177.8   0
128   64   72.30      296.1    1.5          378.2    1.4       147.4   0
256   64   34.98      247.6    2.2          282.2    1.9        43.9   0
512   64   34.05      225.9    3.3          124.8    2.6        44.2   0
