
# Liviu Octavian Mafteiu-Scai / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 1, January-February 2013, pp. 1898-1907

## Average Bandwidth Relevance in Parallel Solving Systems of Linear Equations

Liviu Octavian Mafteiu-Scai
Computer Science Department, West University of Timisoara, Timisoara, Romania

ABSTRACT:
This paper presents some experimental results obtained on an IBM Blue Gene/P parallel computer that show the relevance of average bandwidth reduction [11] in the serial and parallel cases of Gaussian elimination and conjugate gradient. New measures for the effectiveness of parallelization have been introduced in order to measure the effects of average bandwidth reduction. The main conclusion is that average bandwidth reduction in sparse systems of linear equations improves the performance of these methods, a fact that recommends using this indicator in preconditioning processes, especially when the solving is done using a parallel computer.

KEYWORDS: average bandwidth, linear system of equations, parallel methods, Gaussian elimination, conjugate gradient, sparse matrices

## 1. THEORETICAL CONSIDERATIONS

Systems of linear equations appear in almost every branch of science and engineering. The engineering areas in which large, sparse systems of linear equations arise frequently include chemical engineering processes, design and computer analysis of circuits, power system networks and many others. The search for efficient solutions is driven by the need to solve huge systems, with millions of unknowns, on parallel computers. The interest in the parallel solution of systems of equations, especially very large and sparse ones, has been very high; there are hundreds of papers that deal with this subject. The solving methods are divided into direct methods and iterative methods, and Gaussian elimination and conjugate gradient are two popular examples. In Gaussian elimination the solution is exact and is obtained in finitely many operations. The conjugate gradient method generates sequences of approximations that converge in the limit to the solution. Many variants of each method have been developed, and the literature is very rich in describing them, especially in the case of serial implementations.

Below, some particularities of the parallel implementations of these two methods are listed succinctly, for the case where the system matrix is partitioned by rows. An $n \times n$ system is considered, in the case when the size $n$ of the system is divisible by the number of partitions $p$ and the partitions are equal in number of rows, i.e. $k = n/p$ rows in each partition, so that a partition $p_i$ of size $k$ includes the consecutive rows/equations between $(i-1)k + 1$ and $ik$.

Parallel Gaussian elimination (GE). The main operations performed are: local pivot determination for each partition, global pivot determination, pivot row exchange, pivot row distribution, computing the elimination factors, and computing the matrix elements. Because the values of the unknowns depend on each other and are computed one after another, the computation of the solutions in the backward substitution is inherently serial. Gaussian elimination also has a load-balancing issue, because some processors become idle once all their work is done.

Parallel conjugate gradient (CG). The CG method is very good for large and sparse linear systems because it has the property of uniform convergence, but only if the associated matrix is symmetric and positive definite. The parallelism in the CG algorithm derives from the parallel matrix-vector product. Other operations can be performed in parallel as long as there is no dependency between them, for example updating the residual vector and the solution vector. But these latter operations cannot be performed before the matrix-vector product, and the matrix-vector product of a new iteration cannot be performed until the residual vector is updated. So there are two moments at which the processors must synchronize before they can continue their work. Ideally, the processors have no periods of inactivity between these two synchronization points. In practice, the efficiency of the computation depends on minimizing this waiting/idle time for synchronization.

It has been observed that a particular preparation of the system before applying a numerical solving method leads to an improvement of the process and of the solution. This preparation is called preconditioning. Over time, many preconditioning methods have been proposed, designed to improve the process of solving a system of equations. There are many studies on
the influence of preconditioning on the parallel solution of systems of linear equations [1, 2, 3]. Reducing the bandwidth of the associated matrix is one of these preconditioning methods, and many methods exist for it, the most popular being presented in works such as [4, 5, 6, 7, 8]. Paper [9] presents a study of the parallel iterative methods Newton, conjugate gradient and Chebyshev, including the influence of bandwidth reduction on the convergence of these methods. In paper [10] a new measure for sparse matrices, called average bandwidth (mbw), was proposed. In [11], algorithms and comparative studies related to this new indicator were presented. Paper [12] proposes methods that allow, for a pair of matrix rows/columns and without performing the interchange, a qualitative and quantitative measurement of the opportunity for interchange in terms of bandwidth and average bandwidth reduction.

According to [11], the average bandwidth is defined by the relation:

$$mbw = \frac{1}{m} \sum_{\substack{i=1,\ j=1 \\ a_{ij} \neq 0}}^{n} |i - j| \qquad (1)$$

where m is the number of non-zero elements and n is the size of the matrix A.

The reasons, specified in [11], for using the average bandwidth (mbw) instead of the bandwidth (bw) in preconditioning before the parallel solution of systems of equations are:

- mbw reduction leads to a more uniform distribution of non-zero elements around and along the main diagonal;
- mbw is more sensitive than bw to the presence around the main diagonal of so-called "holes", which are compact regions of zero values;
- mbw is less sensitive to the presence of some isolated non-zero elements far from the main diagonal.

A matrix which minimizes mbw will have most non-zero elements very close to the main diagonal and very few non-zero elements away from it. This is an advantage according to paper [13], as can be seen in Figure 1c). For the same matrix 1a), two algorithms were used, Cuthill-McKee for 1b) and the one proposed in [10] for 1c), the first to reduce the bandwidth bw and the second to reduce the average bandwidth mbw.

Paper [15] describes a set of indicators to measure the effectiveness of parallel processes. From that work, two simple but relevant indicators were chosen: Relative Speedup (Sp) and Relative Efficiency (Ep), described by relations (2) and (3):

$$S_p = \frac{T_1}{T_p} \qquad (2)$$

$$E_p = \frac{S_p}{p} = \frac{T_1}{p \, T_p} \qquad (3)$$

where p is the number of processors, T1 is the execution time achieved by a sequential algorithm and Tp is the execution time obtained with a parallel algorithm on p processors. When Sp = p we have linear speedup. Sometimes in practice there is an interesting situation, known as superlinear speedup, when Sp > p. One possible cause is the cache effect, resulting from the memory hierarchy of parallel computers [16]. Such situations were encountered in our experiments; some of them appear in Tables 1 and 3.

Note: In our experiments, because we were especially interested in the effects of mbw reduction, in relations (2) and (3) we take T1 to be the execution time before mbw reduction, serial case.
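As an illustration of relation (1), the following minimal sketch computes bw and mbw for a small dense-stored matrix. It assumes the usual definition of the bandwidth as the largest distance |i - j| over the non-zero elements (the paper does not restate it), and the function name `bandwidth_measures` and the sample matrix are hypothetical.

```python
def bandwidth_measures(a):
    """Return (bw, mbw) for a square matrix stored as a list of rows.

    bw  : largest |i - j| over non-zero entries (assumed usual definition)
    mbw : mean of |i - j| over the m non-zero entries, as in relation (1)
    """
    n = len(a)
    distances = [abs(i - j)
                 for i in range(n)
                 for j in range(n)
                 if a[i][j] != 0]
    m = len(distances)
    if m == 0:                      # all-zero matrix: nothing to measure
        return 0, 0.0
    return max(distances), sum(distances) / m


# Hypothetical 4x4 symmetric matrix with one isolated pair far from the diagonal.
A = [
    [4, 1, 0, 2],
    [1, 4, 0, 0],
    [0, 0, 4, 1],
    [2, 0, 1, 4],
]
bw, mbw = bandwidth_measures(A)
print(bw, mbw)  # 3 1.0
```

The isolated pair at distance 3 fixes bw at 3, while mbw stays at 1.0 because most non-zero elements lie on or next to the main diagonal, which is the lower sensitivity to isolated elements mentioned in the list above.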

Figure 1. A matrix after bw reduction and after mbw reduction

The Gain Time (GT) measure is introduced, which is, in percent, the difference between the execution time before mbw reduction (Tb) and the execution time after mbw reduction (Ta), relative to the execution time before mbw reduction (Tb), for the same partitioning:

$$GT = \frac{T_b - T_a}{T_b} \cdot 100 \qquad (4)$$

with positive values showing a better efficiency in terms of execution time after mbw reduction.

The measure Increase of Efficiency (IE) is also introduced, which is, in percent, the difference between the relative efficiency after mbw reduction (Epa) and the relative efficiency before mbw reduction (Epb), relative to the relative efficiency before mbw reduction (Epb), for the same partitioning:

$$IE = \frac{E_{pa} - E_{pb}}{E_{pb}} \cdot 100 \qquad (5)$$
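The four indicators in relations (2) to (5) can be checked against the published numbers. The sketch below uses hypothetical function names and takes its inputs from the 50-processor row of Table 1 (Example 1), with T1 taken as the serial execution time before mbw reduction, as stated in the Note above.

```python
def relative_speedup(t1, tp):
    """Relation (2): Sp = T1 / Tp."""
    return t1 / tp


def relative_efficiency(t1, tp, p):
    """Relation (3): Ep = Sp / p."""
    return relative_speedup(t1, tp) / p


def gain_time(tb, ta):
    """Relation (4): percent of execution time gained after mbw reduction."""
    return (tb - ta) / tb * 100.0


def increase_of_efficiency(ep_before, ep_after):
    """Relation (5): percent change of relative efficiency after mbw reduction."""
    return (ep_after - ep_before) / ep_before * 100.0


# Values from Table 1, Example 1: serial time before mbw reduction and
# the runtimes measured with p = 50 processors before/after mbw reduction.
T1 = 3.609060
p = 50
tb, ta = 0.264829, 0.231279

sp_before = relative_speedup(T1, tb)
sp_after = relative_speedup(T1, ta)
ep_before = relative_efficiency(T1, tb, p)
ep_after = relative_efficiency(T1, ta, p)

print(f"Sp: {sp_before:.2f} -> {sp_after:.2f}")
print(f"GT: {gain_time(tb, ta):.2f} %")
print(f"IE: {increase_of_efficiency(ep_before, ep_after):.2f} %")
```

Up to rounding of the printed runtimes, these agree with the Table 1 entries for 50 processors (Sp 13.627888 and 15.604789, GT 12.668552, IE 14.506289).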

Let A denote the average number of iterations required for convergence and Ap the average number of iterations required for convergence per processor. In the experiments performed, A was the average value obtained for 50 systems with the same size but with different non-zero element distributions and different sparsity. So, Ap = A/p, where p is the number of processors/partitions. It will be seen from the experiments that there is a directly proportional relation between Ap and the execution time, so that Ap can be a measure of the parallelization efficiency. Using only the average number of iterations per processor Ap allows an estimation of the efficiency that neglects the hardware factor (communication time between processors, cache effect, etc.), whose influence is difficult to calculate. Based on these considerations, a new measure of efficiency based on the average number of iterations per processor is proposed, called Efficiency in Convergence (EC). For two situations Sa (after) and Sb (before), it shows by how much, in percent, the average number of iterations per processor has decreased/increased between situation Sb and situation Sa. In this study (the relevance of average bandwidth reduction in the parallel case) the situation Sa is represented by $A_{pa}$ and situation Sb by $A_{pb}$, where the indices a and b refer to after mbw reduction and before mbw reduction. So, the computing relation is:

$$EC = \frac{A_{pb} - A_{pa}}{A_{pb}} \cdot 100 \qquad (6)$$

The next section will show that mbw reduction leads to an efficient parallel computing process.

## 2. THE EXPERIMENTS

The experiments were performed on an IBM Blue Gene/P supercomputer. Gaussian elimination without pivoting and the preconditioned conjugate gradient with block Jacobi preconditioning were implemented. To implement these serial and parallel numerical methods, the IBM XL C compiler version 9.0 under Linux was used. For the parallel implementation of the GE and CG methods, MPI (Message Passing Interface) was used, a standard in parallel computing which enables overlapping communications and computations among processors and avoids unnecessary messages in point-to-point synchronization.

For parallel partitioning, the system of equations was divided equally between processors using the divisors of the system size. The matrices/systems chosen for the experiments were generated randomly, positive definite, symmetric and sparse, with a sparsity degree between 5 and 20%. The size of these matrices was between 10 and 1000, the predominant size being 200. In the experiments, matrices with uniform and nonuniform distributions of non-zero elements were generated and used. In terms of reducing the average bandwidth mbw, the values obtained were between 10 and 70%. For each instance/partitioning, 50 systems with the same size but different sparsity and distribution of non-zero elements were used. Sizes were varied with ratio 10, from 10 to 1000. In the case of CG, for the arithmetic precision required by convergence, the epsilon values chosen were $10^{-10}$, $10^{-30}$, $10^{-100}$ and $10^{-200}$, and the initial vector used was X0 = {0, 0, ..., 0}.

### 2.1 Serial case

Gaussian elimination. It has been experimentally observed that, in general, in GE the average bandwidth reduction and/or bandwidth reduction did not significantly affect the efficiency of the computation. Only in the cases where mbw << n or bw << n was there an increase in process efficiency, because the complexity decreases from $O(n^3)$ to $O(nb^2)$, as mentioned in [14] regarding bw.

Conjugate gradient. The experiments represented in Figure 2 show a general sensitivity to the mbw value, especially at sizes larger than 100 and at greater computational accuracy.
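To make the iteration count A used in the CG experiments concrete, here is a minimal serial sketch of the unpreconditioned conjugate gradient method that starts from the zero vector and counts iterations. It is not the implementation used in the paper: the block Jacobi preconditioner is omitted, the stopping test on the residual norm is an assumption, and the function name and the 2x2 test system are hypothetical. It runs in ordinary double precision, so only the largest tolerance ($10^{-10}$) is meaningful here; the smaller tolerances listed above would presumably need extended-precision arithmetic.

```python
def conjugate_gradient(a, b, eps=1e-10, max_iter=10_000):
    """Solve a x = b for a symmetric positive definite matrix `a`.

    Returns (x, iterations). Starts from X0 = {0, ..., 0} and stops when
    the residual 2-norm drops below eps (assumed stopping criterion).
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)                      # residual b - a*x0 equals b for x0 = 0
    d = list(r)                      # first search direction
    rs_old = sum(v * v for v in r)
    iterations = 0
    while rs_old ** 0.5 > eps and iterations < max_iter:
        # Matrix-vector product: the step that is distributed in the parallel case.
        ad = [sum(a[i][j] * d[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(d[i] * ad[i] for i in range(n))
        x = [x[i] + alpha * d[i] for i in range(n)]    # update solution
        r = [r[i] - alpha * ad[i] for i in range(n)]   # update residual
        rs_new = sum(v * v for v in r)
        d = [r[i] + (rs_new / rs_old) * d[i] for i in range(n)]
        rs_old = rs_new
        iterations += 1
    return x, iterations


# Hypothetical symmetric positive definite system; exact solution (1/11, 7/11).
x, iters = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(x, iters)
```

Averaging `iters` over many systems of the same size would give a value playing the role of A in the text.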


Figure 2. Conjugate gradient, serial case

### 2.2 Parallel case

2.2.1 Gaussian elimination. According to our experiments, in approximately 60% of cases the mbw reduction led to an increase in efficiency, but without major differences in terms of execution time (order of magnitude). It was observed that increasing the number of processors involved in the computation first leads to a decrease in execution time, which reaches a minimum value, followed by an increase in execution time as the number of processors grows further, as can be seen in Figure 3. An example is shown below.

Example 1: size of system: 500x500, 1486 non-zero values, uniform distribution; before mbw reduction: mbw0=110.33, bw0=499; after mbw reduction: mbw=12.44, bw=245.

Figure 3. Gain Time: Example 1

Figure 4. Relative speedup: Example 1

Figure 6. Increase of Efficiency: Example 1

In the experiments there were situations (some partitionings) when the Gaussian elimination failed. Possible causes include rounding errors, numerical instability, or a main diagonal of the associated matrix that contains zeros or very small values.
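The last failure cause can be reproduced with a small serial sketch of Gaussian elimination without pivoting that stops when a diagonal pivot is zero or very small. This is an illustration only, not the parallel code used in the experiments; the function name, the threshold `tiny` and both test systems are hypothetical.

```python
def gauss_no_pivot(a, b, tiny=1e-12):
    """Solve a x = b by Gaussian elimination without pivoting.

    Raises ValueError when a diagonal pivot is (nearly) zero, which is
    one of the failure causes listed in the text. `tiny` is an assumed
    threshold, not a value taken from the paper.
    """
    n = len(b)
    a = [row[:] for row in a]        # work on copies
    b = list(b)
    for k in range(n):
        if abs(a[k][k]) < tiny:
            raise ValueError(f"pivot {k} is zero or too small: {a[k][k]!r}")
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]           # elimination factor
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]
    # Backward substitution: inherently serial, each unknown needs the later ones.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


print(gauss_no_pivot([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))

try:
    gauss_no_pivot([[0.0, 1.0], [1.0, 0.0]], [1.0, 2.0])
except ValueError as err:
    print("elimination failed:", err)
```

The second system is perfectly solvable with a row exchange, so the failure comes only from the absence of pivoting.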

| Processors | Avg. runtime before mbw (s) | Avg. runtime after mbw (s) | GT (%) | Sp before | Sp after | Ep before | Ep after | IE (%) |
|---|---|---|---|---|---|---|---|---|
| 1 | 3.609060 | 3.609085 | -0.000693 | 1.000000 | 0.999993 | 1.000000 | 0.999993 | -0.000693 |
| 2 | 3.547747 | 3.547770 | -0.000648 | 1.017282 | 1.017276 | 0.508641 | 0.508638 | -0.000648 |
| 10 | 0.842680 | 0.842976 | -0.035126 | 4.282836 | 4.281332 | 0.428284 | 0.428133 | -0.035114 |
| 20 | 0.453526 | 0.454285 | -0.167355 | 7.957780 | 7.944484 | 0.397889 | 0.397224 | -0.167076 |
| 25 | 0.376788 | 0.376625 | 0.043260 | 9.578490 | 9.582635 | 0.383140 | 0.383305 | 0.043279 |
| 50 | 0.264829 | 0.231279 | 12.668552 | 13.627888 | 15.604789 | 0.272558 | 0.312096 | 14.506289 |
| 100 | 0.213262 | 0.179467 | 15.846705 | 16.923127 | 20.109881 | 0.169231 | 0.201099 | 18.830760 |
| 125 | 0.273594 | 0.177232 | 35.220911 | 13.191276 | 20.363478 | 0.105530 | 0.162908 | 54.370803 |
| 250 | 1.598770 | 0.873000 | 45.395523 | 2.257398 | 4.134089 | 0.009030 | 0.016536 | 83.135166 |
| 500 | 6.175630 | 5.036316 | 18.448547 | 0.584404 | 0.716607 | 0.001169 | 0.001433 | 22.621972 |
| Average | 1.735589 | 1.143524 | 12.741968 | 7.042048 | 8.475456 | 0.327547 | 0.341137 | 19.330474 |

Table 1. Example 1: experimental results

2.2.2 Conjugate gradient. Figure 7 shows the global effect of reducing mbw in the case of parallel conjugate gradient, especially with increasing system size and increasing computing accuracy. Note that Figure 7 represents the average values obtained for the different systems of equations (sparsity, distribution, etc.) before and after mbw reduction. Below, some examples (2, 3, 4, 5 and 6) of different situations are presented, which show the effects of mbw reduction in the parallel solution of systems of linear equations using the conjugate gradient method.

Example 2: size of system: 1000x1000, 36846 non-zero values, uniform distribution; before mbw reduction: mbw0=413, bw0=999; after mbw reduction: mbw=211, bw=911.

Table 2 and Figures 8 and 9 show the correlation between the execution time and the average number of iterations per processor Ap, which justifies the use of the latter as a performance indicator.
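One simple way to quantify such a relation between Ap and runtime is a Pearson correlation coefficient. The sketch below is only an illustration of that idea: the paper does not say which statistic, if any, was computed, and the iteration counts and runtimes used here are hypothetical placeholders, not values from Table 2.

```python
from math import sqrt


def iterations_per_processor(a_avg, p):
    """Ap = A / p, with A the average number of iterations and p the processors."""
    return a_avg / p


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical measurements (NOT taken from the paper's tables).
processors = [1, 2, 4, 5, 10]
avg_iterations = [1000, 1040, 980, 1010, 990]      # A for each partitioning
runtimes = [50.0, 27.0, 13.0, 11.0, 6.0]           # seconds

ap = [iterations_per_processor(a, p) for a, p in zip(avg_iterations, processors)]
r = pearson(ap, runtimes)
print(ap)
print(f"correlation between Ap and runtime: {r:.4f}")
```

A coefficient close to 1 on measured data would support using Ap as a hardware-independent proxy for execution time.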

Figure 8. The correlation between Ap and Runtime, from Example 2 (e-10)


Figure 9. The correlation between Ap and Runtime, from Example 2 (e-200)

Accuracy e-10 Before mbw reduction Runtime (s) A Ap After mbw reduction Runtime (s) A Ap e-200 Before mbw reduction Runtime (s) A Ap After mbw reduction A Ap Runtime (s) 14 696 357 5 190 5 163 3 839 9 720 477 269 261 145 90 88 50 46 121 9 Ep 1,30 1,27 1,19 1,11 1,35 1,26 0,95 1,35 0,87 1,25 1,01 0,83 0,91 0,78 1,10 GT (%)

Processor s

GT (% )

## 1 2 4 5 8 10 20 25 40 50 100 125 200 250 Av. Processors

2 3 864 86 103 51 6 20 823 46 63 82 866 59 6 17 146 18 4 33 867 86 39 30 785 39 3 7 841 33 9 3 145 36 7 7 874 17 72 4 794 79 1 5 119 95 5 103 52 05 127 51 16 101 15 33 22 62 e-10

## 4 482 289 116 98 103 49 23 19 22 10 5 6 4 4 88

5 6 7 8 9 10 11 1141 114 636 -32 1626 1626 906 1064 532 297 -3 2156 1078 600 2 12 81 81 5 1030 257 145 -25 1234 3086 173 6 3 59 30 9 1260 252 142 -45 1187 2375 133 0 5 43 1 8 8637 108 61 1895 2368 133 41 2 0 71 4 3 1376 137 78 -59 1379 1379 776 0 00 8 6 1357 679 39 -73 1128 5645 324 1 6 18 2 1254 502 29 -49 1214 4857 279 8 91 1533 383 23 -4 1213 3035 178 8 22 1042 209 12 -19 1130 2261 134 7 92 1320 132 9 -64 1522 1523 96 7 60 1615 129 8 -35 1136 909 59 1 64 9216 46 3 3277 1639 117 10 9 25 8773 35 3 1196 479 36 30 24 1190 188 106 -21 1521 2735 153 35 0 Table 2 Example 2: experimental results 4 6 42 4 e-200 Aftermbwreducti on Sp Beforembwreducti on

12 1249 1282 93 1351 83 1460 38 1190 61 1279 41 1664 43 1173 20 1778 69 1220 55 1422 97 1704 57 1397 78 1554 55 1409 45 38

13 1249 6414 93 3378 2 2921 5 1488 2 1279 0 8321 4 4695 4446 2442 1423 1364 699 622 2170 1

## 15 23 41 -9 -23 37 7 -47 3 -46 -8 7 -50 57 -30 21

Beforembwreducti on Sp 1,00 1,66 4,15 4,94 4,66 9,84 21,37 24,79 22,15 46,00 93,24 76,82 126,42 121,96 39,93 Ep 1,00 0,83 1,04 0,99 0,58 0,98 1,07 0,99 0,55 0,92 0,93 0,61 0,63 0,49 0,83

Aftermbwreducti on IE Sp 1,30 2,54 4,76 5,53 10,80 12,59 19,00 33,64 34,74 62,53 100,71 103,16 182,14 195,61 54,93 30,15 68,11 -8,65 18,68 59,19 7,80 3,45 32,16 -7,36 31,66 7,16 134,2 33,30 0 22,94 11,09

## 1 2 4 5 8 10 20 25 40 50 100 125 200 250 Averag e

IE Ep Sp Ep 0,76 0,76 1,00 1,00 1,62 0,81 24,23 -2,69 1,51 0,75 3,32 0,83 5,22 1,30 3,40 0,68 20,03 6,80 1,36 7,89 0,99 31,26 6,78 0,85 69,46 6,21 0,62 11,68 1,17 12,38 0,62 36,95 28,00 1,40 16,65 0,67 42,04 32,51 1,30 21,31 0,53 32,84 -3,80 50,83 1,27 38,66 0,77 67,50 1,35 56,68 0,57 15,95 93,98 0,94 56,91 0,46 39,21 154,67 1,24 140,79 0,70 25,92 11,37 77,77 0,39 173,85 0,70 42,55 253,83 1,02 38,60 0,69 56,58 1,10 Table 3 Example 2: experimental results 10,82


Figure 10. IE from Example 2, experimental results

Example 3: size of system: 100x100, 1182 non-zero values, sparsity=11.82%, uniform distribution; before mbw reduction: mbw0=33.61, bw0=99; after mbw reduction: mbw=9.97, bw=60.

Figure 11. Matrix form from Example 3 before and after mbw reduction

In Figure 12, according to equation (6), we see that reducing the average bandwidth mbw has the best effect in the case of 10 partitions, in terms of the number of iterations per processor necessary for convergence.

Figure 12. Efficiency in convergence, Example 3

Example 4: size of system: 100x100, 924 non-zero values, sparsity=9.24%, nonuniform distribution; before mbw reduction: mbw0=38.73, bw0=99; after mbw reduction: mbw=20.05, bw=80.

Figure 13. Matrix form from Example 4 before and after mbw reduction ($\varepsilon = 10^{-30}$)

This is an example in which all partitionings are favorable to average bandwidth reduction, as can be seen in Figure 14.

Figure 14. Efficiency in convergence, Example 4

Example 5: size of system: 100x100, 1898 non-zero values, sparsity=18.98%, nonuniform distribution; before mbw reduction: mbw0=36.38, bw0=97; after mbw reduction: mbw=21.63, bw=97.

Figure 15. Matrix form from Example 5 before and after mbw reduction ($\varepsilon = 10^{-30}$)

In some cases, some partitionings lead to a drastic decrease in efficiency after mbw reduction, but in general there are other favorable situations in terms of convergence, as can be seen in Figures 16 and 12.
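The Efficiency in Convergence values plotted in these figures come from relation (6). A minimal sketch of that computation follows, under the assumption that relation (6) has the same relative-difference form as relation (4), i.e. the change in Ap relative to its value before mbw reduction; the function names and the iteration counts are hypothetical.

```python
def iterations_per_processor(a_avg, p):
    """Ap = A / p for p processors/partitions."""
    return a_avg / p


def efficiency_in_convergence(ap_before, ap_after):
    """Assumed form of relation (6): percent decrease of Ap after mbw reduction.

    Positive values mean fewer iterations per processor after the reduction.
    """
    return (ap_before - ap_after) / ap_before * 100.0


# Hypothetical average iteration counts for one partitioning with p = 10.
p = 10
ap_b = iterations_per_processor(1200, p)   # before mbw reduction
ap_a = iterations_per_processor(900, p)    # after mbw reduction
print(efficiency_in_convergence(ap_b, ap_a))  # 25.0
```

With this sign convention a partitioning that becomes worse after the reduction yields a negative EC, which would correspond to the unfavorable cases mentioned above.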

Figure 16. Efficiency in convergence, Example 5

Example 6: size of system: 200x200, 4814 non-zero values, sparsity=12.03%, uniform distribution; before mbw reduction: mbw0=66.96, bw0=196; after mbw reduction: mbw=28.34, bw=196.

Figure 17. Matrix form from Example 6 before and after mbw reduction

Figure 18. Efficiency in convergence, Example 6

### 2.3 Comparing the mbw reduction effects on GE and CG

In order to see which method is most strongly influenced by the reduction of mbw, a series of comparative experiments was performed. Examples 7, 8 and 9 present experimental results for Gaussian elimination and conjugate gradient in solving the same linear systems of equations. The computational accuracy imposed for convergence in the case of conjugate gradient was $10^{-10}$ and $10^{-200}$. The results presented in the tables are obtained after the mbw reduction.

Example 7: size of system: 100x100, 2380 non-zero values, uniform distribution; before mbw reduction: mbw0=36.97, bw0=99; after mbw reduction: mbw=21.84, bw=91.

| Processors | Gaussian elimination runtime (s) | Conjugate gradient runtime (s), e-10 | Conjugate gradient runtime (s), e-200 |
|---|---|---|---|
| 1 | 0.055312 | 0.085690 | 0.714086 |
| 2 | 0.055265 | 0.048305 | 0.359964 |
| 4 | 0.042756 | 0.028096 | 0.229910 |
| 5 | 0.040240 | 0.021716 | 0.170759 |
| 10 | 0.034234 | 0.012924 | 0.096110 |
| 20 | 0.032118 | 0.009132 | 0.063969 |
| 25 | 0.032066 | 0.008467 | 0.060690 |
| 50 | 0.035373 | 0.007484 | 0.052093 |
| 100 | 0.051450 | 0.007341 | 0.048487 |
| Average | 0.042090 | 0.025462 | 0.199563 |

Table 4. Comparative experimental results

Figure 19

Example 8: size of system: 100x100, 508 non-zero values, uniform distribution; before mbw reduction: mbw0=26.81, bw0=95; after mbw reduction: mbw=5.18, bw=55.

| Processors | Gaussian elimination runtime (s) | Conjugate gradient runtime (s), e-10 | Conjugate gradient runtime (s), e-200 |
|---|---|---|---|
| 1 | 0.053286 | 0.068999 | 0.523518 |
| 2 | 0.053243 | 0.039434 | 0.287619 |
| 4 | 0.040771 | 0.023055 | 0.166883 |
| 5 | 0.038213 | 0.017165 | 0.125466 |
| 10 | 0.032215 | 0.010845 | 0.080219 |
| 20 | 0.030087 | 0.007684 | 0.056498 |
| 25 | 0.030005 | 0.007016 | 0.045646 |
| 50 | 0.061563 | 0.006370 | 0.039796 |
| 100 | 0.111814 | 0.006297 | 0.040089 |
| Average | 0.050133 | 0.020763 | 0.151748 |

Table 5. Comparative experimental results
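The processor counts in Tables 4 and 5 are exactly the divisors of the system size 100, which matches the equal row partitioning described in Section 2. The sketch below enumerates the admissible partition counts and the 1-based row range of each block, then reads off the processor count with the smallest Gaussian elimination runtime in Table 4. The function names are hypothetical, and the row-range convention (consecutive blocks of k = n/p rows) is the one assumed from Section 1.

```python
def divisors(n):
    """All partition counts p that split n rows into equal blocks."""
    return [p for p in range(1, n + 1) if n % p == 0]


def row_blocks(n, p):
    """1-based inclusive (first_row, last_row) of each of the p blocks."""
    if n % p != 0:
        raise ValueError("p must divide n for equal partitions")
    k = n // p                                   # rows per partition
    return [(i * k + 1, (i + 1) * k) for i in range(p)]


print(divisors(100))       # [1, 2, 4, 5, 10, 20, 25, 50, 100]
print(row_blocks(100, 4))  # [(1, 25), (26, 50), (51, 75), (76, 100)]

# Gaussian elimination runtimes (s) from Table 4, keyed by processor count.
ge_runtime = {
    1: 0.055312, 2: 0.055265, 4: 0.042756, 5: 0.040240, 10: 0.034234,
    20: 0.032118, 25: 0.032066, 50: 0.035373, 100: 0.051450,
}
best_p = min(ge_runtime, key=ge_runtime.get)
print(best_p)              # 25
```

For this example the GE runtime decreases up to 25 processors and grows again afterwards, the same pattern reported for Example 1 in Section 2.2.1.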


Figure 20. Experimental results: CG vs. GE

Example 9 (same system as in Example 6)

| Processors | Gaussian elimination runtime (s) | Conjugate gradient runtime (s), e-10 | Conjugate gradient runtime (s), e-200 |
|---|---|---|---|
| 1 | 3.609085 | 2.640867 | 19.467388 |
| 2 | 3.547770 | 1.336089 | 10.058246 |
| 10 | 0.842976 | 0.285521 | 2.115549 |
| 20 | 0.454285 | 0.158282 | 1.132339 |
| 25 | 0.376625 | 0.131254 | 0.885119 |
| 50 | 0.231279 | 0.084474 | 0.51753 |
| 100 | 0.179467 | 0.068117 | 0.299843 |
| 125 | 0.177232 | 0.061521 | 0.256805 |
| 250 | 0.873535 | 0.057515 | 0.187639 |
| 500 | 5.036316 | 0.048113 | 0.139639 |
| Average | 1.532857 | 0.487175 | 3.506010 |

Table 6. Comparative experimental results: CG vs. GE

Figure 21. Experimental results: CG vs. GE

The execution time is significantly lower in the case of conjugate gradient, especially when more processors are used. Increasing the number of processors in the case of conjugate gradient leads to the possibility of performing calculations with high accuracy (e-200) and low execution time, comparable to those obtained by the Gaussian elimination method or that obtained

## 3. FINAL CONCLUSIONS AND FUTURE WORK

The indicator proposed in [11], average bandwidth mbw, was validated by the experimental results presented in this paper, and its use is recommended in the preconditioning of large systems of linear equations, especially when the solving is done using a parallel computer. Using the proposed indicator average number of iterations per processor, Ap, is a good choice because it allows an estimation of the efficiency that neglects the hardware factor, in the case of parallel iterative methods. Also, the proposed indicator Efficiency in Convergence (EC), based on Ap, shows in an intuitive way the progress/regression obtained in comparative studies of parallel iterative methods. The proposed indicators Gain Time (GT) and Increase of Efficiency (IE) show, clearly and intuitively, the progress/regression obtained in comparative studies. The study needs to be extended to the case of nonlinear systems of equations; encouraging results are also expected there since, as shown in [9], under certain conditions the nonlinear parallel case can be reduced to a linear one. The influence of mbw reduction in the case of unequal and overlapping partitions (equal and overlapping, unequal and non-overlapping, unequal and overlapping) will be the subject of a future study.

REFERENCES

[1] Michele Benzi, Preconditioning Techniques for Large Linear Systems: A Survey, Journal of Computational Physics, Vol. 182, Issue 2, 1 November 2002, Pages 418-477, Elsevier 2002.

[2] Yousef Saad, Iterative Methods for Sparse Linear Systems, second edition, ISBN-13: 978-0-898715-34-7, SIAM 2003.

[3] O. Axelsson, A survey of preconditioned iterative methods for linear systems of algebraic equations, Springer, BIT Numerical Mathematics 1985, Volume 25, Issue 1, pp 165-187, 1985.

[4] E. Cuthill and J. McKee, Reducing the bandwidth of sparse symmetric matrices. In Proc. of ACM, pages 157-172, New York, NY, USA, 1969. ACM.

[5] N.E. Gibbs, W.G. Poole, and P.K. Stockmeyer. An algorithm for reducing the bandwidth and profile of a sparse matrix. SIAM Journal on Numerical Analysis, 13:236-250, 1976.

[6] R. Marti, M. Laguna, F. Glover, and V. Campos. Reducing the bandwidth of a sparse matrix with tabu search. European Journal of Operational Research, 135:450-459, 2001.

[7] R. Marti, V. Campos, and E. Pinana, A branch and bound algorithm for the matrix bandwidth minimization, European Journal of Operational Research, 186:513-528, 2008.

[8] A. Caprara and J. Salazar-Gonzalez, Laying out sparse graphs with provably minimum bandwidth, INFORMS J. on Computing, 17:356-373, July 2005.

[9] S. Maruster, V. Negru, L.O. Mafteiu-Scai, Experimental study on parallel methods for solving systems of equations, Proc. of 14th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing 2011 (will appear in IEEE-CPS Feb. 2012).

[10] L.O. Mafteiu-Scai, Bandwidth reduction on sparse matrix, In West University of Timisoara Annals, Mathematics and Computer Science series, volume XLVIII fasc. 3, Timisoara, Romania, pp 163-173, ISSN: 1841-3293, 2010.

[11] L. O. Mafteiu-Scai, V. Negru, D. Zaharie, O. Aritoni, Average Bandwidth Reduction in Sparse Matrices Using Hybrid Heuristics, in Proc. KEPT 2011, selected papers (extended version), ed. M. Frentiu et al., Cluj-Napoca, 2011, pp 379-389, Presa Universitara Clujeana, ISSN 2067-1180, 2011.

[12] L. O. Mafteiu-Scai, Interchange Opportunity In Average Bandwidth Reduction In Sparse Matrix, In West University of Timisoara Annals, Mathematics and Computer Science series, volume L fasc. 2, Timisoara, Romania, pp 01-14, ISSN: 1841-3293, Versita Publishing, DOI: 10.2478/v10324-012-0016-1, 2012.

[13] P. Arbenz, A. Cleary, J. Dongarra, and M. Hegland, Parallel numerical linear algebra, chapter A comparison of parallel solvers for diagonally dominant and general narrow banded linear systems, pages 35-56. Nova Science Publishers, Inc., Commack, NY, USA, 2001.

[14] S. S. Skiena, The algorithm design manual, second ed., Springer, ISBN 978-1-84800-069-8, 2012.

[15] S. Sahni, V. Thanvantri, Parallel computing: performance metrics and models, Computer & Information Sciences Dep., Univ. of Florida, Rep. USA Army Research Office, Grant DAA H04-95-1-0111, 1995.

[16] D. P. Helmbold, C. E. McDowell, Modeling speedup(n) greater than n, IEEE Transactions on Parallel and Distributed Systems, vol. 1, pp. 250-256, 1990.