
ARRAYS & LOOPS

A SAS array is nothing more than a collection of variables (of the same type), in which each variable can be identified by referring to the array and, by means of an index, to the location of the variable within the array. SAS arrays are defined using the ARRAY statement, and are only valid within the data step in which they are defined. The syntax for the ARRAY statement is:

    ARRAY array-name {subscript} <$> <length> <array-elements> <(initial-values)>;

Why use SAS arrays?

- Repeat an action or set of actions on each of a group of variables
- Write shorter programs
- Restructure a SAS data set to change the unit of observation
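As an illustration of the last point, here is a minimal sketch of restructuring a data set with an array. The data set names (wide, long) and the variables ID, Wt1-Wt3, Visit and Weight are hypothetical and chosen only for this example; the idea is that one observation per person becomes one observation per person and visit.

```sas
/* Hypothetical input: one observation per person, three weights each */
data wide;
    input ID Wt1-Wt3;
datalines;
1 150 152 155
2 180 178 175
;
run;

/* Restructure: one observation per person and visit */
data long;
    set wide;
    array wt{3} Wt1-Wt3;
    do Visit = 1 to 3;
        Weight = wt{Visit};   /* pick the weight for this visit */
        output;               /* write one observation per visit */
    end;
    keep ID Visit Weight;
run;

proc print data=long;
run;
```

With this input, the LONG data set should contain six observations, three for each ID.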

The parts of the ARRAY statement:

- {subscript} is the dimension (possibly multiple) of the array. It can be omitted or specified as {*}, in which case SAS infers the dimension from the number of array elements.
- <array-elements> is the list of array elements (variables). It can be omitted if the dimension is given, in which case SAS creates variables called array-name1 to array-nameN, where N is the dimension of the array.

For example,

    array wt {50};

will cause the variables wt1-wt50 to be created.

Another Quick Example

    array pop(1:5) ga sc nc va wv;

- the array name is "pop"
- the sequence of variable names ga sc nc va wv is the "array list"
- the variables in the array list are the "array elements"
- each array element in this example has a position number in the array list from 1 to 5

Convenience of arrays: there are two ways to refer to variables which are array elements:

- by variable name
- by array name with a subscript pointing to the position of the variable in the array list

For example, pop(3) refers to the variable in position 3 of the array pop (nc in the example above).

Subscripts can be

- constants
- variables
- expressions

Array elements

- must all be of the same type (numeric or character)
- can be variables which exist or which will be created in the data step
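The three kinds of subscript can be sketched as follows. The initial values (10 to 50) and the result variable names are made up for this illustration; the array is the pop array used earlier.

```sas
data _null_;
    /* pop array with illustrative initial values */
    array pop(1:5) ga sc nc va wv (10 20 30 40 50);
    i = 2;
    by_constant   = pop(3);       /* constant subscript: refers to nc   */
    by_variable   = pop(i);       /* variable subscript: refers to sc   */
    by_expression = pop(i + 2);   /* expression subscript: refers to va */
    put by_constant= by_variable= by_expression=;
run;
```

The PUT statement should write by_constant=30 by_variable=20 by_expression=40 to the log.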

Array declarations and what they mean:

    array x(1:6) a b c d e f;              x(5) same as e
    array x(0:5) a b c d e f;              x(4) same as e

    array quiz(20) q1-q20;                 equivalent array declarations:
    array quiz(1:20) q1-q20;               subscript lower bound=1, upper bound=20;
    array quiz(*) q1-q20;                  quiz(4) same as q4

    array quiz(20);                        SAS creates quiz1-quiz20 as array elements
    array color(1:3) $ 1 red blue green;   character array, elements have length=1 character; color(2) same as blue
    array pop(1:5) yr95-yr99;              pop(2) same as yr96

More array declarations:

    ARRAY LOG_INC{3} INCOME1 INCOME2 INCOME3 (0 0 0);   will initialize the values
    array pop(95:99) yr95-yr99;                         pop(96) same as yr96
    array x(*) _numeric_;                               all numeric variables on the observation
    array y(*) _character_;                             all character variables on the observation
    array z(*) _all_;                                   all variables on the observation (they must all be of the same type)

Two-dimensional array declarations

    array quiz(1:4,1:5) q1-q20;

Picture 4 rows, 5 columns:

    q1  q2  q3  q4  q5
    q6  q7  q8  q9  q10
    q11 q12 q13 q14 q15
    q16 q17 q18 q19 q20

quiz(2,4) same as q9

    array pop(1:3,98:99) nc98 nc99 va98 va99 sc98 sc99;

Picture 3 rows, 2 columns:

    nc98 nc99
    va98 va99
    sc98 sc99

pop(3,99) same as sc99
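The row-by-row layout can be checked with a small sketch. The second array (flat) and the variables n, row, col and value are helper names introduced only for this illustration; both arrays refer to the same variables q1-q20.

```sas
data _null_;
    array quiz(1:4,1:5) q1-q20;   /* two-dimensional view */
    array flat(20) q1-q20;        /* one-dimensional view of the same variables */

    /* Store each variable's own sequence number: q1=1, ..., q20=20 */
    do n = 1 to 20;
        flat(n) = n;
    end;

    /* Walk the two-dimensional view row by row */
    do row = 1 to 4;
        do col = 1 to 5;
            value = quiz(row, col);
            put row= col= value=;
        end;
    end;
run;
```

In the log, the line for row=2 col=4 should show value=9, which matches "quiz(2,4) same as q9".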

1. Recode the set of variables A B C D E F G in the same way: if the variable has a value of 99, recode it to SAS missing.

    DATA Radhika;
       INPUT A B C D E F G;
       ARRAY V(7) A B C D E F G;
       DO K=1 TO 7;
          IF V(K)=99 THEN V(K)=.;
       END;
    CARDS;
    45 78 89 99 34 45 90
    99 78 54 34 45 66 99
    ;
    RUN;
    PROC PRINT; RUN;

Another example that shows arrays and DO loops without input data:

    DATA IN(DROP=I);
       ARRAY X (*) X1-X10;
       DO I=1 TO 10;
          X(I)=I;
       END;
    RUN;
    TITLE 'THE DATA AS ONE OBSERVATION';
    PROC PRINT; RUN;

    DATA IN(DROP=I);
       ARRAY X (*) X1-X10;
       ARRAY Y (*) Y1-Y10;
       DO I=1 TO 10;
          Y(I) = I * 2;
          X(I)=I;
       END;
    RUN;
    TITLE 'THE DATA AS ONE OBSERVATION';
    PROC PRINT; RUN;

If a subscript goes past the end of an array, SAS stops the step with an error. Here Y has only 9 elements, but the loop runs to 10 (SAS log):

    12   DATA IN(DROP=I);
    13      ARRAY X (*) X1-X10;
    14      ARRAY Y (*) Y1-Y9;
    15      DO I=1 TO 10;
    16         Y(I) = I * 2;
    17         X(I)=I;
    18      END;
    19   RUN;

    ERROR: Array subscript out of range at line 16 column 6.
    X1=1 X2=2 X3=3 X4=4 X5=5 X6=6 X7=7 X8=8 X9=9 X10=. Y1=2 Y2=4 Y3=6 Y4=8 Y5=10 Y6=12 Y7=14 Y8=16 Y9=18 I=10 _ERROR_=1 _N_=1
    NOTE: The SAS System stopped processing this step because of errors.
    WARNING: The data set WORK.IN may be incomplete. When this step was stopped there were 0 observations and 19 variables.
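One possible way to avoid the out-of-range error is to test the subscript against the size of the shorter array. This is only a sketch, not the fix used in the original slides; it relies on the DIM function, which returns the number of elements in an array.

```sas
data in(drop=i);
    array x(*) x1-x10;
    array y(*) y1-y9;
    do i = 1 to dim(x);
        x(i) = i;
        /* y has fewer elements than x, so guard the subscript */
        if i <= dim(y) then y(i) = i * 2;
    end;
run;

proc print data=in;
run;
```

Because Y(10) is never referenced, the step should complete and write one observation with 19 variables.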

2. Each observation of your data set has five variables SEX1 SEX2 SEX3 SEX4 SEX5 which give the sex (1=male, 2=female) of up to 5 persons. You want to count the number of males (MALES) and the number of females (FEMALES) on each observation. (How the index variable I can be dropped is shown in a later example.)

    Data Another_Array_Example;
       Input Sex1-Sex5;
       array sex(1:5) sex1-sex5;
       males=0;
       females=0;
       do i=1 to 5;
          if sex(i)=1 then males=males+1;
          else if sex(i)=2 then females=females+1;
       end;
    Cards;
    1 2 1 1 2
    2 1 1 2 1
    ;
    Run;

3. Recode all numeric variables in your data set as follows: if a variable has a value of 98 or 99, recode it to SAS missing. This example uses *, _NUMERIC_ and DIM. (The data lines are the same as in example 1; INPUT reads only the first five values on each line.)

    DATA SHOW_USING_DIM;
       INPUT SEX1-SEX5;
       ARRAY MYARRAY(*) _NUMERIC_;
       DO I=1 TO DIM(MYARRAY);
          IF MYARRAY(I)=98 OR MYARRAY(I)=99 THEN MYARRAY(I)=.;
       END;
    CARDS;
    45 78 89 99 34 45 90
    99 78 54 34 45 66 99
    ;
    RUN;

    DATA SHOW_TWO_DIMENSIONAL_ARRAY;
       array temprg{2,5} c1t1-c1t5 c2t1-c2t5;
       input c1t1-c1t5 / c2t1-c2t5;
       do i=1 to 2;
          do j=1 to 5;
             temprg{i,j}=temprg{i,j};   /* temprg{i,j}=temprg{i,j} * 2 */
          end;
       end;
    datalines;
    89.5 65.4 75.3 77.7 89.3
    73.7 87.3 89.9 98.2 35.6
    75.8 82.1 98.2 93.5 67.7
    101.3 86.5 59.2 35.6 75.7
    ;
    Proc print; run;

Three Dimensional Arrays

    DATA THREE_DIMENSIONAL;
       INPUT X1-X8;
       ARRAY THREE_D(2,2,2) X1-X8;   /* X1-X8 OR _NUMERIC_ */
       DO I=1 TO 2;
          DO J=1 TO 2;
             DO K=1 TO 2;
                THREE_D(I,J,K) = THREE_D(I,J,K) * 2;
             END;
          END;
       END;
    CARDS;
    10 20 30 40 50 60 70 80
    11 22 33 44 55 66 77 88
    19 29 39 49 59 69 79 89
    ;
    RUN;
    PROC PRINT; RUN;

Converting values of 99 to a SAS missing value without using an array

    data new;
       set learn.SPSS;
       if Height = 99 then Height = .;
       if Weight = 99 then Weight = .;
       if Age = 99 then Age = .;
    run;
    Proc print; run;

Converting values of 99 to a SAS missing value using an array

    data new;
       set learn.SPSS;   /* lib and dataset */
       array myvars{3} Height Weight Age;
       do i = 1 to 3;
          if myvars{i} = 99 then myvars{i} = .;
       end;
       drop i;
    run;
    Proc print; run;

The first thing you may notice is that the program with arrays is longer than the one without arrays! However, if you had 50 or 100 variables to process, the program using arrays would not be any longer.
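To make the point about many variables concrete, here is a sketch with 50 variables. The data set names (survey, survey_clean) and the variables Q1-Q50 are hypothetical; the first step only manufactures some test data so that the example runs on its own.

```sas
/* Hypothetical test data: 50 items, where 99 stands for "not answered" */
data survey;
    array q{50} Q1-Q50;
    do item = 1 to 50;
        if mod(item, 10) = 0 then q{item} = 99;   /* every tenth item is coded 99 */
        else q{item} = mod(item, 5) + 1;          /* other items get a value 1-5  */
    end;
    drop item;
run;

/* The recoding loop is the same size as for 3 variables */
data survey_clean;
    set survey;
    array q{50} Q1-Q50;
    do i = 1 to dim(q);
        if q{i} = 99 then q{i} = .;
    end;
    drop i;
run;
```

Only the ARRAY statement changes when the number of variables grows; the DO loop stays the same.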

Brackets to be used in arrays

You may also use square brackets [ ] or parentheses ( ) following the array name. SAS documentation usually uses curly brackets { }. It is recommended that you always use the same type of brackets (either square or curly) when you use arrays. The reason: you can always tell when a program is referencing an array if you reserve a particular type of bracket for use only when you are writing an array element.

By placing the array in a DO loop, you can process each variable in the array. Also, because you do not need or want the DO loop counter included in the SAS data set, you use a DROP statement.
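A small sketch showing that the three bracket styles are interchangeable. The data set and array names are made up for the illustration; in real programs you would pick one style and keep to it.

```sas
data brackets;
    array a{3} a1-a3;   /* curly brackets  */
    array b[3] b1-b3;   /* square brackets */
    array c(3) c1-c3;   /* parentheses     */
    do i = 1 to 3;
        a{i} = i;
        b[i] = i * 10;
        c(i) = i * 100;
    end;
    drop i;
run;

proc print data=brackets;
run;
```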

CALL MISSING routine

You could use the CALL MISSING routine to assign a missing value to a variable:

    data new;
       set learn.SPSS;
       array myvars{3} Height Weight Age;
       do i = 1 to 3;
          if myvars{i} = 999 then call missing(myvars{i});
       end;
       drop i;
    run;
    Proc print; run;

Replacing NA and ? with missing values

    data missing;
       set learn.chars;
       array char_vars{*} $ _character_;
       do loop = 1 to dim(char_vars);
          if char_vars{loop} in ('NA' '?') then call missing(char_vars{loop});
       end;
       drop loop;
    run;
    Proc Print; Run;

_CHARACTER_ includes all the character variables in the Chars data set.

Converting all character values in a SAS data set to lowercase

    data lower;
       set learn.careless;
       array all_chars{*} _character_;
       do i = 1 to dim(all_chars);
          all_chars{i} = lowcase(all_chars{i});
       end;
       drop i;
    run;

This program uses the keyword _CHARACTER_ to reference all the character variables in data set Careless, and then uses a DO loop to convert all the values to lowercase.

Using an array to create new variables:

    data temp;
       input Fahren1-Fahren24 @@;
       array Fahren{24};
       array Celsius{24} Celsius1-Celsius24;
       do Hour = 1 to 24;
          Celsius{Hour} = (Fahren{Hour} - 32)/1.8;
       end;
       drop Hour;
    datalines;
    35 37 40 42 44 48 55 59 62 62 64 66
    68 70 72 75 75 72 66 55 53 52 50 45
    ;
    Proc Print; run;

Changing the array bounds:

By default, SAS numbers the elements of an array starting from 1. There are times when it is useful to specify the beginning and ending values of the array subscripts. For example, if you have variables Income1999 to Income2006, it would be nice to have the array elements start with 1999 and end with 2006.

    data account;
       input ID Income1999-Income2006;
       array income{1999:2006} Income1999-Income2006;
       array taxes{1999:2006} Taxes1999-Taxes2006;
       do Year = 1999 to 2006;
          Taxes{Year} = .25*Income{Year};
       end;
       drop Year;
    datalines;
    001 45000 47000 47500 48000 48000 52000 53000 55000
    002 67130 68000 72000 70000 65000 52000 49000 40100
    ;
    Proc Print; Run;
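When the bounds do not start at 1, the loop limits can also be taken from the array itself. The sketch below is a variation on the program above, not part of the original slides; it assumes the standard SAS functions LBOUND and HBOUND, which return the lower and upper bound of an array.

```sas
data account;
    input ID Income1999-Income2006;
    array income{1999:2006} Income1999-Income2006;
    array taxes{1999:2006} Taxes1999-Taxes2006;
    /* lbound(income) is 1999 and hbound(income) is 2006 for this array */
    do Year = lbound(income) to hbound(income);
        taxes{Year} = .25 * income{Year};
    end;
    drop Year;
datalines;
001 45000 47000 47500 48000 48000 52000 53000 55000
002 67130 68000 72000 70000 65000 52000 49000 40100
;
run;
```

Note that a loop written as 1 to DIM(income) would not work here, because DIM returns the number of elements (8), while the valid subscripts are 1999 to 2006.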

Temporary array

    data score;
       array ans{10} $ 1;
       array key{10} $ 1 _temporary_ ('A','B','C','D','E','E','D','C','B','A');
       input ID (Ans1-Ans10)($1.);
       RawScore = 0;
       do Ques = 1 to 10;
          RawScore + (key{Ques} eq Ans{Ques});
       end;
       Percent = 100*RawScore/10;
       keep ID RawScore Percent;
    datalines;
    123 ABCDEDDDCA
    126 ABCDEEDCBA
    129 DBCBCEDDEB
    ;

This program uses a temporary array (key) to hold the answers to the 10 quiz questions. The keyword _TEMPORARY_ tells SAS that this is a temporary array and the 10 values in parentheses are the initial values for each of the elements of this array. It is important to remember that there are no corresponding variables (Key1, Key2, and so on) in this DATA step. Also, because elements of a temporary array are retained, the 10 answer key values are available throughout the DATA step for scoring each of the student tests. The scoring is done in a DO loop. A trick is used to do the scoring: a logical comparison is performed between the student answer and the corresponding answer key. If they match, the logical comparison returns a 1 and this is added to RawScore. If not, the result is a 0 and RawScore is not incremented.
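The scoring trick can be seen in isolation in the following sketch. The variable names (match, nomatch, total) are made up for the illustration.

```sas
data _null_;
    match   = ('A' eq 'A');   /* a true comparison evaluates to 1  */
    nomatch = ('A' eq 'B');   /* a false comparison evaluates to 0 */
    total = 0;
    total + match;            /* sum statement: total becomes 1    */
    total + nomatch;          /* adds 0, so total stays 1          */
    put match= nomatch= total=;
run;
```

The log should show match=1 nomatch=0 total=1.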

DO WHILE Statement

You can execute the statements in a DO group repetitively while a condition holds using the DO WHILE statement. The form of the DO WHILE statement is

    DO WHILE (expression);
       SAS statements;
    END;

where expression is any SAS expression. The expression is evaluated at the top of the loop, before the statements in the DO group are executed.

These statements repeat the loop as long as N is less than 5. There are 5 iterations (N = 0, 1, 2, 3, 4).

    DATA DOWHILE_USING;
       N=0;
       DO WHILE (N LT 5);   /* N < 5 */
          OUTPUT;
          N+1;
       END;
    RUN;
    PROC PRINT; RUN;
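Because DO WHILE tests its condition at the top of the loop, the body may not run at all. The following sketch (data set name chosen for illustration) starts with N=6, so the condition is already false on the first test.

```sas
data dowhile_zero;
    n = 6;
    do while (n lt 5);   /* false on the first test, so the body is skipped */
        output;
        n + 1;
    end;
run;
```

Since the only OUTPUT statement sits inside the skipped loop, the resulting data set is expected to have 0 observations.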

    DATA TEST_DO_LOOP;
       INPUT SCORE1-SCORE5;
       ARRAY S{5} SCORE1-SCORE5;
       N=1;
       DO WHILE (N LE 5);
          S{N}=S{N}*10;
          N+1;
       END;
    CARDS;
    10 20 30 40 50
    11 22 32 42 52
    ;
    Run;
    PROC PRINT; RUN;

Question for students: without running the program, what will its output be? Think!

    DATA TEST;
       INPUT SCORE1-SCORE5;
       ARRAY S{5} SCORE1-SCORE5;
       N=1;
       DO WHILE (N LT 5);
          S{N}=S{N}*10;
          N+1;
       END;
    CARDS;
    10 20 30 40 50
    11 22 32 42 52
    ;
    RUN;
    PROC PRINT; RUN;

DO UNTIL Statement

The DO UNTIL statement, like the DO WHILE statement, executes the statements in a DO loop conditionally. DO UNTIL evaluates the condition at the bottom of the loop rather than at the top (as DO WHILE does). Thus the statements between DO and END are always executed at least one time. The form of the DO UNTIL statement is

    DO UNTIL (expression);

where expression is any SAS expression.

    DATA DO_UNTIL;
       N=0;
       DO UNTIL (N>=5);
          OUTPUT;
          N+1;
       END;
    RUN;
    PROC PRINT; RUN;

Another example, where N=6 before the loop starts. Because DO UNTIL checks its condition at the bottom of the loop, the body still runs once.

    DATA Another_Example_Do_Until;
       N=6;
       DO UNTIL (N>=5);
          OUTPUT;
          N+1;
       END;
    RUN;
    PROC PRINT; RUN;

    DATA TEST;
       INPUT SCORE1-SCORE5;
       ARRAY S{5} SCORE1-SCORE5;
       N=1;
       DO UNTIL (N GT 5);
          S{N}=S{N}*10;
          N+1;
       END;
    CARDS;
    10 20 30 40 50
    11 22 32 42 52
    ;
    RUN;
    PROC PRINT; RUN;

What happens to the output if you use N GE 5?

    DATA TEST;
       INPUT SCORE1-SCORE5;
       ARRAY S{5} SCORE1-SCORE5;
       N=1;
       DO UNTIL (N GE 5);
          S{N}=S{N}*10;
          N+1;
       END;
    CARDS;
    10 20 30 40 50
    11 22 32 42 52
    ;
    RUN;
    PROC PRINT; RUN;

ARRAYS USING {} INSTEAD OF USING ()

    DATA CHECKS;
       INPUT X1-X5 Y;
       ARRAY T{5} X1-X5;
       I = 1;
       DO WHILE (T{I} < Y);
          PUT T{I}= Y=;
          I = I + 1;
       END;
    CARDS;
    1 2 3 4 5 3
    0 2 4 6 8 6
    ;
    RUN;
    PROC PRINT; RUN;

In the SAS log, you will see the values of T{I} and Y because of the PUT statement. (When you use PUT with =, it shows the variable name together with its value.)

IMPLICIT DO LOOPS WITH ARRAYS

Replace 99 with 100 wherever it appears.

    DATA CHECKS;
       INPUT D1-D7;
       ARRAY DAYS(INDX) D1-D7;
       DO INDX = 1 TO 7;
          IF DAYS = 99 THEN DAYS=100;
       END;
    CARDS;
    99 100 200 99 100 99 34
    34 100 99 45 500 99 23
    ;
    RUN;
    PROC PRINT; RUN;
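For comparison, the same recoding can be sketched with an explicitly subscripted array, the style used in most of the other examples. This is a rewrite for illustration, not the version from the original slides.

```sas
data checks;
    input d1-d7;
    array days{7} d1-d7;
    do indx = 1 to dim(days);
        /* the subscript is written out each time the array is referenced */
        if days{indx} = 99 then days{indx} = 100;
    end;
datalines;
99 100 200 99 100 99 34
34 100 99 45 500 99 23
;
run;

proc print data=checks;
run;
```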

LOOPS USING DO OVER

    DATA TEST;
       INPUT SCORE1-SCORE5;
       ARRAY S SCORE1-SCORE5;
       DO OVER S;
          S=S*10;
       END;
    CARDS;
    10 20 30 40 50
    11 22 32 42 52
    ;
    RUN;
    PROC PRINT; RUN;


    DATA CHECKS;
       DO I=1 TO 10;
          J=I+1;
          OUTPUT;
       END;
    RUN;
    PROC PRINT; RUN;

Here the loop runs 4 times, once for each of the values 2, 3, 5 and 7:

    DATA CHECKS;
       DO I=2,3,5,7;
          J=I+1;
          OUTPUT;
       END;
    RUN;
    PROC PRINT; RUN;

    DATA CHECKS;
       DO I=1 TO 10 BY 2;
          J=I+1;
          OUTPUT;
       END;
    RUN;
    PROC PRINT; RUN;

    DATA CHECKS;
       DO I='JAN','FEB','MAR';
          J=I || 'MONTH';
          OUTPUT;
       END;
    RUN;
    PROC PRINT; RUN;

By using _TEMPORARY_ you can supply the data values in the ARRAY statement itself.

    DATA TEMPS;
       ARRAY TEMPRG{3} _TEMPORARY_ (20,30,40);
       DO I=1 TO 3;
          K=TEMPRG{I} * 2;
          OUTPUT;
       END;
    RUN;
    PROC PRINT; RUN;

Negative increment:

    DATA CHECKS;
       DO I=10 TO 1 BY -1;
          J=I+1;
          OUTPUT;
       END;
    RUN;
    PROC PRINT; RUN;

DO and WHILE in the same statement:

    DATA CHECKS;
       MONTH='JAN';
       DO I=10 TO 1 BY -1 WHILE(MONTH='JAN');
          J=I+1;
          IF J=6 THEN MONTH='FEB';
          OUTPUT;
       END;
    RUN;
    PROC PRINT; RUN;

Several specifications in one DO statement. This loop runs for I = 2, 4, 6, 8, 11, 13, 14, 15, 16:

    DATA CHECKS;
       DO I=2 TO 8 BY 2, 11, 13 TO 16;
          J=I+1;
          OUTPUT;
       END;
    RUN;
    PROC PRINT; RUN;

    DATA CHECKS;
       DO I=.1 TO .9 BY .1;
          J=I+1;
          OUTPUT;
       END;
    RUN;
    PROC PRINT; RUN;

    DATA CHECKS;
       J=0;
       DO YEAR=1946 TO 1949;
          J=J+YEAR;
          OUTPUT;
       END;
    RUN;
    PROC PRINT; RUN;

The following are valid DO statements that you may want to use in a program:

    DO I = 1 TO N;
    DO I = N TO 1 BY -1;
    DO I = K+1 TO N-1;
    DO I = 1 TO K-1, K+1 TO N;
    DO I = 'Saturday', 'Sunday', 'Monday';
    DO I = '01jan85'd, '06Apr98'd;
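Two of the less common forms, a list of character values and a list of date constants, can be tried with the sketch below. The data set and variable names (day_loop, date_loop, Day, D) are made up for the illustration; note that character values and date constants must be quoted.

```sas
/* Loop over a list of character values */
data day_loop;
    length Day $ 9;   /* long enough for 'Saturday' */
    do Day = 'Saturday', 'Sunday', 'Monday';
        output;
    end;
run;

/* Loop over a list of date constants */
data date_loop;
    do D = '01jan85'd, '06apr98'd;
        output;
    end;
    format D date9.;   /* show the numeric date values as dates */
run;

proc print data=day_loop;  run;
proc print data=date_loop; run;
```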

This very important example shows that any changes made to the upper bound or increment within the DO group do not affect the number of iterations. In this example the upper bound K is 5, but inside the loop it is set to 20. In spite of that, the loop still runs 5 times: SAS evaluates the upper bound before the first iteration and ignores later changes to it. However, if you change the index variable (I in this case), the loop is affected, as the next example shows.

    DATA CHECKS;
       K=5;
       DO I=1 TO K;
          K=20;
          OUTPUT;
       END;
    RUN;
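The same rule applies to the increment. In the sketch below (the data set name and the variable STEP are made up for the illustration), the increment variable is changed inside the loop, but the loop keeps using the value it had when the loop started.

```sas
data checks_by;
    step = 1;
    do i = 1 to 5 by step;
        step = 3;   /* changed inside the loop; the loop still uses the original increment of 1 */
        output;
    end;
run;

proc print data=checks_by;
run;
```

By the rule above, this step is expected to write five observations, with I running from 1 to 5.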

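11. Changing the index value inside the loop does affect the loop. Here I is set to 6 in the first iteration, so after OUTPUT the index is incremented past the upper bound and the loop ends after a single iteration.

    DATA CHECKS;
       K=5;
       DO I=1 TO K;
          I=6;
          OUTPUT;
       END;
    RUN;
    PROC PRINT; RUN;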

DIM FUNCTION: The DIM function is really useful when you declare an array with an unknown number of elements.

    DATA IN(DROP=I);
       ARRAY X (*) X1-X10;
       DO I=1 TO DIM(X);
          X(I)=I;
       END;
    RUN;
    TITLE 'THE DATA HAS ONE OBSERVATION';

You can also write it this way. Curly braces can be used when declaring the array and when referencing its elements. However, you cannot use curly braces in the call to the DIM function itself: write DIM(X).

    DATA IN(DROP=I);
       ARRAY X {*} X1-X10;
       DO I=1 TO DIM(X);
          X{I}=I;
       END;
    RUN;
    TITLE 'THE DATA HAS ONE OBSERVATION';
    PROC PRINT; RUN;

Deleting SAS Datasets

We now know that SAS data sets created during a SAS session are automatically deleted at the end of the session, unless they are saved to a permanent data set. You can also delete SAS data sets at will during a session. This may be necessary if you have to create a large number of data sets and your computer system runs into space restrictions. PROC DATASETS is a useful SAS procedure that allows deleting SAS data sets from any library. For example, the following statements will delete DEMOG from the WORK library:

    Proc datasets library=work;
       Delete demog;
    run;
    quit;

You may want to delete DEMOG from WORK after you have stored it permanently in another library such as PROJ. PROC DATASETS has a lot of other capabilities.
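As a self-contained sketch of the DELETE statement, the following creates two small data sets in WORK and then removes both in one call. The second data set name (scratch) is made up for the illustration; NOLIST only suppresses the directory listing in the log.

```sas
/* Create two throwaway data sets in the WORK library */
data demog scratch;
    x = 1;
run;

/* Delete both of them in a single PROC DATASETS call */
proc datasets library=work nolist;
    delete demog scratch;
run;
quit;
```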

PROC DATASETS

    proc datasets library=health nolist;
       modify group (label='Test Subjects' read=green sortedby=lname);
          index create vital=(birth salary) / nomiss unique;
          informat birth date7.;
          format birth date7.;
          label salary='current salary excluding bonus';
       modify oxygen;
          rename oxygen=intake;
          label intake='Intake Measurement';
    quit;

PROC DATASETS

One way to copy a data set is with a DATA step:

    data Dataset1;
       set Dataset2;
    run;

The DATA step will accomplish the task, but it may be inefficient, especially if you are working with a large number of observations. Another method is to use a COPY statement in PROC DATASETS.

    LIBNAME perm 'c:\BCBS';

    Data DoLoop_Test;
       do I = 1 to 1500;
          output;
       end;
    run;

    PROC DATASETS lib=work;
       copy out=perm;
       select DoLoop_Test;
    run;
    quit;

Perm vs Temp Datasets