ABSTRACT
Software debugging, maintenance and reuse have to deal with many problems from Fortran scientific codes that do not fully respect the standard specification. Our study of the Linpack, PerfectClub and SPEC95 benchmarks and of several industrial applications reveals a large number of imprecise variable declarations that prevent program analysis, verification and parallelization. Furthermore, they decrease the readability of programs and make reverse-engineering more difficult.

This paper presents two different methods to compute the exact size of arrays in Fortran codes that have pointer-type REAL A(1) or assumed-size REAL A(*) declarations. The first method uses the relationship between actual and formal arguments given by the parameter-passing rules: new array declarations in the called procedure are computed with respect to the declarations in the calling procedures. The second approach is based on an array region analysis that gives information about the set of array elements accessed during the execution of code. This approach to array resizing could be applied to other languages without array declarations, such as MATLAB and APL, in order to reduce the execution overhead of dynamic tests and resizing. Our two approaches are combined to yield very good results for the Linpack, PerfectClub and SPEC95 benchmarks.

Keywords
program analysis, program comprehension, debug, reuse, reverse-engineering, array declaration, array resizing, array region

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
PASTE'01, June 18-19, 2001, Snowbird, Utah, USA.
Copyright 2001 ACM 1-58113-411-8/01/0006 ...$5.00.

1. INTRODUCTION
In many programming languages, array declarations are very important for program analyses such as array aliasing, array bound checking and array initialization checking. It is the responsibility of the programmer to allocate large enough storage space for arrays, since any attempt to read from or write into a location outside the declared ranges is an error and may result in unexpected results, security holes or failures.

In many languages, some array dimension declarators of formal arguments are implicit. In C and Java, the programmer can replace the first array dimension declarator by []. In Fortran, it is the last array dimension that can be declared with an assumed-size declarator (*), because arrays are allocated in column-major order. Array argument declarators with 1 instead of * (which do not respect the Fortran 77 standard) are also encountered.

Our study of scientific applications from the Linpack, PerfectClub [?] and SPEC CFP95 [?] benchmarks shows a large number of unnormalized array argument declarations. Many of them have 1 as the upper bound for the last dimension of the array declaration, although array references in the corresponding procedure are outside the defined extent of the array. This causes premature aborts due to bound violations when the programs are compiled with an array range checking option. In [?], the declarations had to be fixed by hand, which is not an easy process. Like the assumed-size array declarator, this unnormalized feature of Fortran prevents array bound checking and alias analysis, which are critical for code safety and debugging. Furthermore, when programmers use these kinds of array declarations, code understanding, one of the core software engineering activities, becomes more difficult. Program comprehension is essential to maintain, reuse, re-engineer and enhance software systems.

The numbers of unnormalized and assumed-size arrays in Table 1 (fifth column: Total) show how often this kind of declaration occurs: in 18 out of 23 programs (the benchmark adm from PerfectClub is not used because it is exactly the same as the benchmark apsi from SPEC CFP95). The average percentage of these declarations among all array argument declarations in the 18 programs is 59.66%. To ease code maintenance and debugging, an automatic phase is essential if we do not want to infer thousands of precise declarations manually.

Array declarations can raise problems not only in Fortran but also in other classes of programming languages that do not have variable declarations. A large execution time overhead for dynamic array allocation and resizing in MATLAB [?] is due to repeated reallocation and copying. As matrix and vector sizes are not declared by the programmer, the MATLAB interpreter allocates storage for these variables on demand, during program execution. An attempt to write into a matrix element outside the bounds of the matrix causes the system to reallocate storage for the entire matrix, copying over all elements from the old storage to the newly allocated space. This becomes very expensive if these operations are done within loops. In Java, the dynamic allocation of the Vector class exhibits the same problem. In this case, an appropriate implicit preallocation, when possible, could improve performance. A judicious use of declarations can significantly improve the quality of the code generated by an APL compiler, as shown in [?]. Precise array declarations are also important for data distribution in Fortran compilers for message-passing machines, as in [?]. Furthermore, our technical report [?] shows that the normalization of array declarations is unavoidable for array range checking.

As a result of these experiments, we developed two new methods to automatically find the proper upper bound for unnormalized and assumed-size array declarations, a process we call array resizing. Both methods are implemented in PIPS [?, ?, ?], a free, open and extensible workbench for automatically analyzing and transforming scientific applications. The goals of PIPS are program compilation, reverse-engineering, program verification, source-to-source program optimization and parallelization. Its interprocedural analyses help with program understanding and with checking the legality and the impact of automatic program transformations.

The first approach to array resizing is a top-down analysis based on the association rules of arguments in the Fortran standard [?]. This method, which requires whole-program compilation, tries to find the exact array argument declarations in the called procedure with respect to the declarations in the calling procedures.

The second approach is a bottom-up analysis using array region analysis [?, ?]. New array sizes are computed from the sets of array elements actually accessed during the execution of the program. The main program is not always necessary, so this approach can also be used for library maintenance.

The paper is organized as follows. Section 2 contains a running example to describe the problems and the two approaches. Section 3 and Section 4 present in detail, respectively, the top-down and the bottom-up approaches. The experimental results and the comparison between the two algorithms are discussed in Section 5. Conclusions are given in the last section.

2. RUNNING EXAMPLE
The example in Figure 1 is extracted from the Linpack benchmark. It contains 7 unnormalized and assumed-size array declarations that should be resized, such as IPVT(1) and A(LDA,1) in DGESL, and DX(*) and DY(*) in DAXPY.

By propagating information down the call tree, the procedure call CALL DGESL(A,LDA,N,IPVT,B) in the main program shows that the actual array MAIN:A(201,200) is associated with the formal array DGESL:A(LDA,1). As we have LDA = 201, the second dimension of the array DGESL:A can be replaced by 1:200. In the same way, we obtain the new array declarations DGESL:B(200), DGESL:IPVT(200), DGEFA:B(200) and DGEFA:IPVT(200).

On the other hand, the intraprocedural array region analysis in subroutine DAXPY shows that the array region DX(1:M) must be read, and that DY(1:M) must be read and written, assuming M >= 1, which can be proven by using interprocedural information. So the new declarations REAL DX(M) and REAL DY(M) can be induced by finding the actual array accesses in this subroutine. More details about the two approaches are given in the following sections.

      PROGRAM MAIN
      REAL A(201,200),B(200)
      INTEGER IPVT(200)
      LDA = 201
      N = 100
C  P(LDA,N) {LDA==201, N==100}
      CALL DGESL(A,LDA,N,IPVT,B)
      CALL DGEFA(N,IPVT,B)
      END

      SUBROUTINE DGESL(A,LDA,N,IPVT,B)
      INTEGER LDA,N,IPVT(1)
      REAL A(LDA,1),B(1)
      DO 10 K = 1, N-1
         L = IPVT(K)
C  P(LDA,N,K) {LDA==201, N==100, 1<=K, K<=99}
         CALL DAXPY(N-K,L,A(K+1,K),B(K+1))
 10   CONTINUE
      END

      SUBROUTINE DGEFA(N,IPVT,B)
      INTEGER N,IPVT(1)
      REAL B(1)
      DO 20 K = 1, N-1
         L = IPVT(K)
C  P(N,K) {N==100, 1<=K, K<=99}
         CALL DAXPY(N-K,L,B,B(K+1))
 20   CONTINUE
      END

      SUBROUTINE DAXPY(M,DA,DX,DY)
      REAL DX(*),DY(*),DA
C  <DX(PHI1)-EXACT-{1<=PHI1, PHI1<=M}>
C  <DY(PHI1)-EXACT-{1<=PHI1, PHI1<=M}>
      DO 30 I = 1,M
         DY(I) = DY(I) + DA*DX(I)
 30   CONTINUE
      END

Figure 1: Running example, excerpt from Linpack

3. TOP-DOWN APPROACH
This approach is based on the association rules of formal and actual arguments¹ in Section 15.9.3.3 of the Fortran 77 standard [?]. In a program, the number and the size of dimensions in an actual argument array declaration may be different from those in the associated formal argument array declaration (array reshaping). A formal array can be associated with an actual array or with an actual array element.

¹ The actual Fortran terminology is "dummy" and "actual" arguments.

In the first case, the size of the formal argument array must not exceed the size of the actual argument array. The size of an array is equal to the number of elements in the array, $\prod_{i=1}^{n} d_i$, where $n$ is the number of dimensions of the array and $d_i = u_i - l_i + 1$ is the size of the $i$-th dimension, $l_i$ and $u_i$ being respectively its lower and upper bounds.
In the second case, the size of the formal argument array must not exceed the size of the actual argument array plus one minus the subscript value of the array element. As Fortran allocates arrays in column-major order, the subscript value of an array reference $A(s_1, s_2, \ldots, s_n)$ is

$$1 + \sum_{i=1}^{n} (s_i - l_i) \prod_{j=1}^{i-1} d_j$$

Note that $\prod_{j=1}^{0} d_j = 1$.

procedure Top_Down_Array_Resizing(p)
/* p: current procedure */
begin
  for each a1 ∈ set of 1 and * arrays in p
    V := ∅   /* set of new bound values */
    for each q ∈ set of callers of p
      for each c ∈ set of calls to p in q
        a2  := corresponding_argument(a1, c)
        k   := number_equal_dimensions(a2, a1, c, p, q)
        s2  := size_of_actual_array(a2, k, c, q)
        s2' := translate_to_callee_frame(p, s2, c, q)
        if (s2' is defined) then   /* s2 was translated into p's frame */
          s1 := size_of_formal_array(a1, k, p)
          new_value := s2' / s1
        else
          new_value := *
        endif
        V := V ∪ {new_value}
      endfor
    endfor
    if |V| = 1 then
      last_upper_bound(a1) := element(V)
    else
      last_upper_bound(a1) := *
    endif
  endfor
end

Figure 2: The top-down array resizing algorithm

Our algorithm, given in Figure 2, is a top-down traversal of the call graph of the program. We always begin with the main program, and the new sizes of array arguments in the called procedure are computed with respect to the actual declarations in the calling procedures.

Function number_equal_dimensions(a2, a1, c, p, q) computes k, the number of leading dimensions that are equal in the actual array a2 and the formal array a1. When the actual argument is an array element, k is also the number of leading subscripts that are equal to their corresponding lower bounds. This step simplifies the computation of array sizes and of subscript value expressions. The size of the actual array in procedure q is calculated using all dimensions but the first k. If the actual argument is an array element, its subscript value expression is evaluated only from subscript k + 1 on. The size of the formal array is computed in the same way, omitting the last dimension, whose unnormalized or assumed-size bound is to be replaced.

Function translate_to_callee_frame(p, s2, c, q) tries to translate all variables in the size expression of the actual argument from the caller's name space into the callee's name space, using the relations between formal and actual arguments and between the declarations of global variables in both routines. If the new values contain variables that cannot be translated into the scope of the callee, or if these values differ between call sites, we have to keep an assumed-size declarator for the array. Otherwise, all call sites yield the same value, expressed in the callee's frame, and the array can be resized with this new upper bound value.

Preconditions [?], an auxiliary analysis in PIPS, are also taken into account. They are affine predicates over scalar integer variable values that hold true before the execution of the corresponding statement. Preconditions are propagated from the module entry point down to the abstract syntax tree leaves. Unstructured programs [?] are also handled in PIPS. The precondition of the current call site gives more information for the simplification and translation steps.

      SUBROUTINE DGESL(A,LDA,N,IPVT,B)
      INTEGER LDA,N,IPVT(200)
      REAL A(LDA,200),B(200)
      END

      SUBROUTINE DGEFA(N,IPVT,B)
      INTEGER N,IPVT(200)
      REAL B(200)
      END

      SUBROUTINE DAXPY(M,DA,DX,DY)
      REAL DX(*),DY(M+100),DA
      END

Figure 3: New declarations with top-down approach

For the running example, array element passing in DGESL: CALL DAXPY(N-K,L,A(K+1,K),B(K+1)) and in DGEFA: CALL DAXPY(N-K,L,B,B(K+1)) makes it more difficult to find the right sizes for the arrays DAXPY:DX and DAXPY:DY. For DX, the corresponding actual array size is $LDA \times 200 + 1 - (1 + (K + 1 - 1) + (K - 1) \times 201)$ in the caller DGESL and 200 in the caller DGEFA. Using the preconditions DGESL:N = 100 and DGESL:LDA = 201 shown in Figure 1, the first size simplifies to $40401 - 202 \times K$; with the binding information DGESL:N - DGESL:K = DAXPY:M, that is $K = 100 - M$, the translation into the scope of DAXPY gives two different values, $20201 + 202 \times M$ and 200. So array DX keeps an assumed-size array declaration. However, the actual array size of DY is $200 + 1 - (K + 1) = 200 - K$ in both callers, which translates to 100 + DAXPY:M, and the new declaration is DY(100 + M). Six out of seven arrays have been resized with appropriate bounds (Figure 3).

4. BOTTOM-UP APPROACH
The bottom-up approach has been designed to normalize programs when initial declarations are not explicit in the main program or when the main program is missing, e.g., for numerical libraries. A typical example is a program where arrays are declared as pointers and dynamically allocated by malloc-like functions. In its subroutines, these arrays are passed as arguments, but their actual sizes are not always in the argument list. One solution is to analyze the program behavior and to try to extract information from the arguments of the dynamic allocation function in order to find the actual array size, and then use the top-down approach. However, the allocation space
computation often depends on the type of the elements and on operators, so this solution seems unrealistic. Furthermore, when the main program itself is missing, or for programming languages with dynamic resizing, we need another method to find actual array accesses.

The bottom-up approach computes adequate declarations without knowledge of the initial array declarations. It is based on a convex array access analysis whose purpose is to identify the set of array elements accessed during the execution of a given section of code. A convex array region, as defined in [?, ?], is a set of array elements described by a set of affine equalities and inequalities. These constraints link the region parameters that represent the array dimensions to the values of the program's integer scalar variables.

A region has the approximation MUST if every element in the region is certainly accessed, and the approximation MAY if its elements are only potentially accessed. MUST and MAY regions are, respectively, under- and over-approximations of EXACT regions, an EXACT region representing exactly the set of accessed array elements. A major source of inexactness is the lack of structure of programs when gotos are used instead of structured tests and loops. The aggressive restructuring phase implemented in PIPS is necessary to gather more information about array references. Another limitation is due to nonlinear expressions or to some convex region operators. For instance, the region

5. Compute the maximum value of the last dimension variable $\phi_n$ from the remaining constraints. If the system has several upper bounds and their relations are known at compile time, the preconditions are taken into account to find this maximum value.

As in the first approach, the implementation returns * when the maximum operator fails. By applying the bottom-up algorithm, array regions make it possible to resize 6 out of 7 arrays of the running example (Figure 4). However, in subroutine DGESL, as the reference B(L) is not affine (L = IPVT(K)), information about array regions is lost and we cannot find a new value for the upper bound of B.

      SUBROUTINE DGESL(A,LDA,N,IPVT,B)
      INTEGER LDA,N,IPVT(99)
      REAL A(LDA,*),B(100)
      END

      SUBROUTINE DGEFA(N,IPVT,B)
      INTEGER N,IPVT(99)
      REAL B(100)
      END

      SUBROUTINE DAXPY(M,DA,DX,DY)
      REAL DX(M),DY(M),DA
      END
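Step 5 of the bottom-up approach can be illustrated with a minimal Python sketch. It assumes a simplified, hypothetical representation that is not the PIPS one. Each upper bound on the last region variable is an affine expression stored as a dict from variable name to coefficient, with the key "" holding the constant. The preconditions are reduced to variables with known constant values. A single bound is returned as is, several bounds that become constants are compared, and any other case yields *, as in the fallback described above.

```python
from typing import Dict, List, Union

# Hypothetical affine expression: variable -> coefficient, "" -> constant term.
# {"": 0, "M": 1} stands for M.
Affine = Dict[str, int]


def substitute(expr: Affine, known: Dict[str, int]) -> Affine:
    """Replace variables whose value is fixed by the (simplified) preconditions."""
    out: Affine = {"": expr.get("", 0)}
    for var, coef in expr.items():
        if var == "":
            continue
        if var in known:
            out[""] += coef * known[var]
        elif coef != 0:
            out[var] = coef
    return out


def is_constant(expr: Affine) -> bool:
    """True when only the constant term is left."""
    return all(var == "" for var in expr)


def max_upper_bound(bounds: List[Affine], known: Dict[str, int]) -> Union[Affine, str]:
    """Maximum of the upper bounds on the last region variable, or '*'."""
    distinct: List[Affine] = []
    for raw in bounds:
        b = substitute(raw, known)
        if b not in distinct:
            distinct.append(b)
    if len(distinct) == 1:
        return distinct[0]  # a single, possibly symbolic, upper bound
    if distinct and all(is_constant(b) for b in distinct):
        return {"": max(b[""] for b in distinct)}  # comparable constants
    return "*"  # the maximum operator fails: keep an assumed-size bound


# DAXPY: region <DX(PHI1)-EXACT-{1<=PHI1, PHI1<=M}> gives the upper bound M
print(max_upper_bound([{"": 0, "M": 1}], {}))  # {'': 0, 'M': 1}
```

Only constant comparisons are attempted here. Comparing symbolic bounds under general affine preconditions would require a linear constraint solver, which this sketch does not include.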