Introduction
The SAS environment is designed to be easy to use, with windows for all the basic SAS tasks. After you become familiar with the starting points for your SAS tasks, you are ready to use the full range of SAS software. This lesson shows you how to use SAS windows to manage your SAS session, work with files, and process SAS programs.
Objectives
In this lesson, you learn to
1. Use the Explorer, Program Editor, Log, Output, and Results windows
2. Use Enhanced Editor windows
3. Manage your SAS windows
4. Use window features including menus, pop-up menus, and toolbars
5. Create SAS libraries
6. Explore and manage SAS files
7. Enter and submit SAS programs
8. Create and use file shortcuts.
Using the Main SAS Windows
When you start SAS software, by default the five main SAS windows open:
I. Explorer
II. Program Editor
III. Log
IV. Output
V. Results
Note: In the Windows operating environment, an Enhanced Editor window (titled Editor - Untitled1) opens instead of the Program Editor window.
Features of SAS Windows
Maximize, minimize, and restore windows
Use menus, pop-up menus, and toolbars
I. The Explorer Window
In the Explorer window, you can
1. create new libraries and SAS files
2. open any SAS file
3. perform most file management tasks, such as moving, copying, and deleting files
4. create shortcuts to non-SAS files.
II. (A) The Program Editor Window
In the Program Editor window, you enter, edit, and submit SAS programs. You can also open existing SAS programs. You can display the Program Editor window by selecting View > Program Editor.
Note: In the Windows operating environment, the Program Editor window is not displayed by default.
To open any SAS program in operating environments that support drag and drop, you can drag and drop it onto the Program Editor window. Alternatively, you can select File > Open and choose a program. In addition, you can
specify the number of lines to submit at a time
recall submitted statements
save contents automatically
clear contents
turn line numbers on and off
use the command line or menus
save your program.
(B) Enhanced Editor Windows (Windows Operating Environment)
In the Windows operating environment, an Enhanced Editor window opens by default. As with the Program Editor window, you can use the Enhanced Editor window to enter, edit, and submit SAS programs. You can open multiple Enhanced Editor windows. To display an additional Enhanced Editor window, select View > Enhanced Editor.
III. Log Window
The Log window displays messages about your SAS session and any SAS programs you submit. You can display the Log window by selecting View > Log.
IV. Output Window
You can create two basic types of SAS output:
1. listing
2. HTML
Sample output:
Gender   Height Mean   Height Std   Height Range   Weight Mean   Weight Std   Weight Range
F        64.82         2.44         8.00           141.73        16.91        54.00
M        72.00         2.16         7.00           172.80        16.11        46.00
V. Results Window
The Results window helps you navigate and manage output from SAS programs that you submit. You can view, save, and print individual items of output. You can display the Results window by selecting View > Results.
Default Libraries
Defining Libraries
To define a library, you assign it a library name and specify a path, such as a directory. (It's a good idea to create the directory or other storage location before defining the library.) You also specify an engine, which is a set of internal instructions SAS software uses for writing to and reading from files in a library.
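As a minimal sketch, the same three pieces (library name, path, and engine) can also be written as a LIBNAME statement. The libref clinic and the directory path are hypothetical, and BASE is assumed to name the default engine for SAS data sets in your release.

```sas
/* Hypothetical libref and path, for illustration only.       */
/* BASE names the default engine for ordinary SAS data sets;  */
/* if the engine is omitted, SAS normally selects it for you. */
libname clinic base 'c:\users\clinic\data';
```

After the statement runs, files in that directory can be referenced with two-level names such as clinic.insure.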
Using SAS Solutions and Tools
Along with windows for working with your SAS files and SAS programs, SAS software provides a set of ready-to-use solutions, applications, and tools. You can access many of these tools by using the Solutions and Tools menus.
Lesson 2
Basic Concepts
Introduction
To program effectively using SAS software, you need to understand basic concepts about SAS programs and the SAS files that they process in various ways. In particular, you need to be familiar with SAS data sets, which are data that is logically arranged in a form accessible to SAS software. In this lesson, you'll examine a simple SAS program and see how it works. You'll learn details about the SAS data sets that it processes and find out about other types of SAS files. Finally, you'll see how SAS files are stored temporarily or permanently in SAS libraries.
Objectives
In this lesson, you learn about
the structure and components of SAS programs
the steps involved in processing SAS programs
the structure and components of SAS data sets
the two types of SAS data sets
SAS libraries and the types of SAS files that they contain
temporary and permanent SAS libraries.
SAS Programs
Components of SAS Programs
Our sample SAS program contains two steps: a DATA step and a PROC step. These two types of steps, alone or combined, form all SAS programs.
DATA steps typically
put your data into a SAS data set
compute the values for new variables
check for and correct errors in your data
produce new SAS data sets by subsetting, merging, and updating existing data sets.
PROC (procedure) steps typically
print a report
produce descriptive statistics
create a tabular report
produce plots and charts.
SAS statements are free-format, which means that
A) they can begin and end anywhere on a line
B) one statement can continue over several lines
C) several statements can be on a line
D) blanks or special characters separate "words" in a SAS statement.
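The free-format rules above can be illustrated with two layouts of the same DATA step. This is only a sketch: the data set name work.demo and the variables x and y are hypothetical.

```sas
/* Layout 1: one statement per line */
data work.demo;
   x = 1;
   y = x + 1;
run;

/* Layout 2: same statements, different line breaks.        */
/* Two statements share the first line, and the assignment  */
/* to y continues over two lines.                            */
data work.demo; x = 1;
   y = x
       + 1; run;
```

Both versions create the same data set, because SAS uses semicolons, not line breaks, to find the end of each statement.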
Processing SAS Programs When you submit a SAS program, SAS software reads the statements and checks them for errors. When it encounters a DATA, PROC, or RUN statement, SAS software stops reading statements and executes the current step in the program. In our sample program, each step ends with a RUN statement.
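The step boundaries described above can be sketched with a small two-step program. The data set work.scores and its values are hypothetical and stand in for the sample program, which is not reproduced here.

```sas
/* DATA statement: SAS starts compiling a DATA step */
data work.scores;
   input Name $ Score;
   datalines;
Ann 90
Bob 85
;
run;   /* RUN: SAS stops reading statements and executes the DATA step */

/* PROC statement: a new step begins */
proc print data=work.scores;
run;   /* RUN: SAS executes the PROC step */
```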
Variables (Columns)
Descriptor Portion
The descriptor portion of a SAS data set contains information about the data set, including
the name of the data set
the date and time the data set was created
the number of observations
the number of variables.
Let's look at a different SAS data set. The table below lists part of the descriptor portion of the data set Clinic.Insure, which contains insurance information for patients admitted to a wellness clinic. (Likewise, your data set names should be descriptive of the contents of the data set.)
Version 8 of SAS software supports mixed case and long names for data sets and other members of SAS libraries, and for variable names. The lengths of character variable values and labels have also increased in Version 8.
Item                              Max. Length in V6   Max. Length in V8
data set and member names         8 bytes             32 bytes
variable names                    8 bytes             32 bytes
character variable values         200 bytes           32K
variable and member labels        40 bytes            256 bytes
Variable Attributes: Formats and Informats Formats and informats (input formats) are variable attributes that affect the way data values are written and read, respectively. SAS software offers a variety of formats and informats for numeric and character data, including date and time values.
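A minimal sketch of the difference: the informat controls how a raw value is read, and the format controls how the stored value is displayed. The data set work.policy and the sample value are hypothetical, and DOLLAR10.2 is used here (an assumption, not a width from the text) so that the displayed value has room for the dollar sign and comma.

```sas
data work.policy;
   input Total comma10.;      /* informat: strips the comma while reading */
   format Total dollar10.2;   /* format: controls how Total is displayed  */
   datalines;
1,234.50
;
run;

proc print data=work.policy;
run;
```

The value is stored as the number 1234.5; only its displayed form carries the dollar sign and comma.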
Name     Type   Length
Policy   Num    8
Total    Num    8
Name     Char   20
Format: DOLLAR8.2
Informat: COMMA10.
MDDB
View files
Lesson 3
In this lesson, you learn to
edit SAS programs
clear SAS programming windows
interpret error messages in the SAS log
correct errors
resolve common problems.
Including a Stored SAS Program
So far, you've copied and pasted SAS programs into the Program Editor window and submitted them for execution. Now you'll learn how to include (copy) a stored program into the Program Editor window. You can include a program using
File Shortcuts
My Favorite Folders
the Open dialog box.
Using File Shortcuts
File Shortcuts are located in the Explorer window. To include a program using File Shortcuts,
1. open File Shortcuts
2. double-click the file or select Open from the pop-up menu for the file.
Using My Favorite Folders
To include a file that is stored in My Favorite Folders,
1. select View > My Favorite Folders
2. double-click the file or select Open from the pop-up menu for the file.
In the Windows environment, you can submit your file directly from the Open dialog box by clicking the Submit check box before clicking Open.
SAS Program Structure Remember that SAS programs are made up of SAS statements.
Program Editor Features The Program Editor window allows you to enter and edit programs just as you would with a word processing program. You can also use and to insert, delete, move, and copy lines within the Program Editor window. Tools Options Editor
Error Types
So far, the programs you've submitted in this lesson have been error-free, but programming errors do occur. SAS can detect five types of errors: syntax, semantic, execution-time, data, and macro-related.
Syntax errors are detected during compile time and occur when program statements do not conform to the rules of the SAS language.
Semantic errors are detected during compile time and occur when the form of the elements in a SAS statement is correct, but the elements are not valid for that usage.
Execution-time errors occur when the SAS System executes the program on data values.
Data errors are detected during execution time and occur when some data values are not appropriate for the SAS statements you have specified in the program.
Macro-related errors occur when there are errors in using the macro facility itself or when there are errors in the SAS code produced by the macro facility.
Syntax Errors
Syntax errors generally cause SAS software to stop processing the step where the error is encountered. Common syntax errors include
spelling mistakes
forgetting semicolons
leaving quotation marks unbalanced
specifying invalid options.
When SAS software encounters a syntax error, the Log window
displays the word ERROR
identifies the possible location of the error
prints an explanation of the error.
Message
Resolving the Problem To correct the error, submit a RUN statement to complete the PROC step. run;
Resolving the Problem
To correct the error, do the following:
1. Recall the program to the Program Editor window.
2. Find the missing semicolon and add it. You can usually locate the statement where the semicolon belongs by looking at the underscored keywords in the error message and working backwards.
3. Resubmit the corrected program.
4. Check the Log window again to make sure that there are no other errors.
Resolving the Problem Unbalanced Quotes Some syntax errors, such as the missing quotation mark after HIGH in the program below, cause SAS to misinterpret the statements in your program.
Resolving the Problem
To resolve the error, submit a quote followed by a semicolon and a RUN statement.
'; run;
Resolving the Problem
To correct the error, do the following:
1. Recall the program to the Program Editor window.
2. Remove or replace the invalid option, and check your statement syntax as needed.
3. Resubmit the corrected program.
4. Check the Log window again to make sure there are no other errors.
Lesson 4
Objectives
In this lesson, you learn to
reference a SAS data library
reference a raw data file
name a SAS data set to be created
specify a raw data file to be read
read standard character and numeric values in fixed fields
submit and verify a DATA step program
subset data.
To do this...                      Use this SAS statement
1. Reference SAS data library      LIBNAME statement
2. Reference external file         FILENAME statement
3. Name SAS data set               DATA statement
4. Identify external file          INFILE statement
5. Describe data                   INPUT statement
6. Execute DATA step               RUN statement
7. List the data                   PROC PRINT statement
8. Execute final program step      RUN statement
Defining Libraries
You learned to assign library names using SAS windows. You can also assign library names using programming statements.
libname taxes 'c:\users\acct\qtr1\report';
How Long Librefs Remain in Effect The LIBNAME statement is global, which means that its settings remain in effect until you modify them, cancel them, or end your SAS session.
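Because the LIBNAME statement is global, the libref stays assigned across steps until it is changed or cleared. A short sketch, reusing the taxes libref and path from the text above:

```sas
/* Assign the libref; it stays in effect for later steps */
libname taxes 'c:\users\acct\qtr1\report';

/* Write the attributes of the library to the log */
libname taxes list;

/* Cancel the libref before the session ends */
libname taxes clear;
```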
Ex : data budget_1999 ;
INPUT
General form, INPUT statement using column input:
INPUT variable <$> startcol-endcol . . . ;
where
variable is the SAS name you assign to the field
the dollar sign ($) identifies the variable type as character (nothing appears here if the variable is numeric)
startcol represents the starting column location in the data line for this variable
endcol represents the ending column location in the data line for this variable.
>----+----10---+----20
2810 61 MOD  F
2804 38 HIGH F
2807 42 LOW  M
2816 26 HIGH M
2833 32 MOD  F
2823 29 HIGH M
input ID $ 1-4 Age 6-7 ActLevel $ 9-12 Sex $ 14;
data venu.sample;
   infile 'C:\v8\Desktop\sample.txt';
   input ID 1-4 Age 6-7 ActLevel $ 9-12 Sex $ 14;
run;
proc print data=venu.sample;
run;
List output:
Obs  ID    Age  ActLevel  Sex
1    2810  61   MOD       F
2    2804  38   HIGH      F
3    2807  42   LOW       M
4    2816  26   HIGH      M
5    2833  32   MOD       F
6    2823  29   HIGH      M
Describing the Data The INPUT statement describes the fields of raw data to be read and placed into the SAS data set.
To do this... Reference SAS data library Reference external file Name SAS data set Identify external file Describe data Execute the DATA step
data q.sample;
   input ID $ 1-4 Age 6-7 ActLevel $ 9-12 Sex $ 14;
   cards;
2810 61 MOD  F
;
run;
Regular option
DATE|NODATE NUMBER|NONUMBER PAGENO= PAGESIZE= LINESIZE=
Additional Features
BYLINE|NOBYLINE DETAILS|NODETAILS FIRSTOBS= FORMCHAR= FORMDLIM= LABEL|NOLABEL OBS= REPLACE|NOREPLACE SOURCE|NOSOURCE
YEARCUTOFF
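How the YEARCUTOFF= Option Works
When a two-digit year value is read, SAS software uses the YEARCUTOFF= system option to decide which century the year belongs to. The option specifies the first year of a 100-year span. (For Version 8 of SAS software, the default value of YEARCUTOFF= is 1920, so two-digit years 20-99 are read as 1920-1999 and two-digit years 00-19 are read as 2000-2019.)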
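The effect of YEARCUTOFF= on two-digit years can be checked with a short DATA step. This is a sketch only: the variable names d1 and d2 and the two date strings are hypothetical.

```sas
options yearcutoff=1920;   /* 100-year span: 1920 through 2019 */

data _null_;
   d1 = input('12/07/41', mmddyy8.);   /* 41 falls in the span as 1941 */
   d2 = input('12/07/15', mmddyy8.);   /* 15 falls in the span as 2015 */
   put d1= date9. d2= date9.;          /* write both dates to the log  */
run;
```

With this setting the log should show d1=07DEC1941 and d2=07DEC2015.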
Example:
options nonumber nodate;
proc print data=sales.qtr;
   var salesrep type unitsold region;
   where unitsold>=30;
run;
options date;
proc freq data=sales.qtr4;
   where unitsold>=30;
   tables salesrep;
run;
Lesson 5
Objectives
1. Identify the two phases that occur when a DATA step is processed
2. Identify the processing phase in which an error occurs
3. Debug SAS DATA steps
4. Describe the compilation and execution phases
Writing Basic DATA Steps If you completed the prerequisites for this module, you learned how to write a DATA step to create a permanent SAS data set from raw data in an external file.
data clinic.stress;
   infile tests;
   input ID 1-4 Name $ 6-25 RestHR 27-29 MaxHR 31-33
         RecHR 35-37 TimeMin 39-40 TimeSec 42-43
         Tolerance $ 45;
run;
How SAS Software Processes Programs
When you submit a DATA step, SAS software processes the DATA step and creates a new SAS data set. Let's see exactly how that happens. A SAS DATA step is processed in two distinct phases: the compilation phase and the execution phase.
During the compilation phase, each statement is scanned for syntax errors. Most syntax errors prevent further processing of the DATA step. If the DATA step compiles successfully, then the execution phase begins. A DATA step executes once for each observation in the input data set, unless otherwise directed.
Compilation Phase
At the beginning of the compilation phase, the input buffer, an area of memory, is created to hold a record from the external file. The input buffer is created only when raw data is read, not when a SAS data set is read. The term input buffer refers to a logical concept and does not necessarily reflect the physical storage of data.
Compilation Phase
The program data vector contains two automatic variables that can be used for processing but are not written to the data set as part of an observation.
1. _N_ counts the number of times that the DATA step has begun to execute.
2. _ERROR_ signals the occurrence of an error caused by the data during execution. The default value is 0, which means there is no error. When an error occurs, whether one error or a number of errors, the value is set to 1.
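The two automatic variables can be observed by writing them to the log on each iteration. A minimal sketch, with a hypothetical numeric variable x and a deliberately invalid second record:

```sas
data _null_;
   input x;                  /* x is numeric, so 'abc' is invalid data */
   put _N_= _ERROR_= x=;     /* write the automatic variables to the log */
   datalines;
10
abc
30
;
run;
```

The log should show _N_ counting 1, 2, 3. _ERROR_ is expected to be 1 only for the second record, where x is set to missing, and 0 again on the third iteration.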
Compilation Phase
During the compilation phase, SAS software also scans each statement in the DATA step, looking for syntax errors. Syntax errors include
missing or misspelled keywords
invalid variable names
missing or invalid punctuation
invalid options.
Compilation Phase
As the INPUT statement is compiled, a slot is added to the program data vector for each variable in the input data set. Generally, variable attributes such as length and type are determined the first time that a variable is encountered.
data inder.update;
   infile invent;
   input Item $ 1-13 IDnum $ 15-19
         Instock 21-22 Backord 24-25;
   Total=instock+backord;
run;
Compilation Phase
Any variables created in the DATA step are also added to the program data vector. For example, the assignment statement Total=instock+backord; creates the variable Total. As the statement is compiled, the variable is added to the program data vector. The attributes of Total are determined by the expression in the statement. Because the expression produces a numeric value, Total is defined as a numeric variable and assigned a default length of 8.
At the bottom of the DATA step (in this example, when the RUN statement is encountered), the compilation phase is complete and the descriptor portion of the new SAS data set is created. The descriptor portion of the data set includes
the name of the data set
the number of observations and variables
the names and attributes of the variables.
Execution Phase
After the DATA step is compiled, it is ready for execution. During the execution phase, the data portion of the data set is created. The data portion contains the data values.
During execution, each observation in the input data set is processed, stored in the program data vector, and then written to the new data set as an observation, unless otherwise directed. The DATA step executes once for each observation in the input data set, unless otherwise directed. For example, this DATA step reads values from the file Invent and executes nine times because there are nine records in the file.
Execution Phase
At the beginning of the execution phase, the value of _N_ is 1. Because there are no data errors, the value of _ERROR_ is 0.
data perm.update;
   infile invent;
   input Item $ 1-13 IDnum $ 15-19
         Instock 21-22 BackOrd 24-25;
   Total=instock+backord;
run;
The remaining variables are initialized to missing. Missing numeric values are represented by a period and missing character values are represented by a blank.
Execution Phase Next, the INFILE statement identifies the location of the raw data.
Execution Phase
When an INPUT statement begins to read data values from a record, it uses an input pointer to keep track of its position. The input pointer starts at column 1 of the first record, unless otherwise directed. As the INPUT statement executes, the raw data in columns 1-13 are read and assigned to Item in the program data vector.
data perm.update;
   infile invent;
   input Item $ 1-13 IDnum $ 15-19
         Instock 21-22 BackOrd 24-25;
   Total=instock+backord;
run;
V---+----1----+----2----+
Bird Feeder   LG088  3 20
6 Glass Mugs  SB082  6 12
Glass Tray    BQ049 12  6
Padded Hangrs MN256 15 20
Jewelry Box   AJ498 23  0
Red Apron     AQ072  9 12
Crystal Vase  AQ672 27  0
Picnic Basket LS930 21  0
Brass Clock   AN910  2 10
Execution Phase Notice that the input pointer now rests on column 14. With column input, the pointer moves as far as the INPUT statement instructs it and stops in the column immediately following the last one read.
data inder.update; infile invent; input Item $ 1-13 IDnum $ 15-19 Instock 21-22 BackOrd 24-25; Total=instock+backord; run;
Next, the data in columns 15-19 are read and assigned to IDnum in the program data vector.
At the end of the DATA step, three default actions occur. First, the values in the program data vector are written to the data set as the first observation.
data perm.update;
   infile invent;
   input Item $ 1-13 IDnum $ 15-19
         Instock 21-22 Backord 24-25;
   Total=instock+backord;
run;
Execution Phase
IDnum  InStock  BackOrd  Total
LG088  3        20       23
data perm.update; infile invent; input Item $ 1-13 IDnum $ 15-19 InStock 21-22 BackOrd 24-25; Total=instock+backord; run;
07   daat perm.update;
     ----
     14
WARNING 14-169: Assuming the symbol DATA was misspelled as daat.
08      infile invent;
09      input Item $ 1-13 IDnum $ 15-19
10            InStock 21-22 BackOrd 24-25;
11      Total=instock+backord;
12   run;
>----+----1----+----2----+
Bird Feeder   LG088  3 20
Glass Mugs    SB082  6 12
Glass Tray    BQ049 12  6
Padded Hangrs MN256 15 20
Jewelry Box   AJ498 23  0
Red Apron     AQ072  9 12
Crystal Vase  AQ672 27  0
Debugging a DATA Step In this case, the DATA step completes the execution phase and the observations are written to the data set. However, several notes appear in the log. SAS Log
NOTE: Invalid data for IDnum in line 7 15-19.
RULE:  ----+----1----+----2----+----3----+----4
07     Crystal Vase  AQ672 27  0
Item=Crystal Vase IDnum=. InStock=27 BackOrd=0 Total=27 _ERROR_=1 _N_=7
NOTE: Invalid data for IDnum in line 8 15-19.
08     Picnic Basket LS930 21  0
Item=Picnic Basket IDnum=. InStock=21 BackOrd=0 Total=21 _ERROR_=1 _N_=8
NOTE: Invalid data for IDnum in line 9 15-19.
09     Brass Clock   AN910  2 10
Item=Brass Clock IDnum=. InStock=2 BackOrd=
The PRINT procedure displays the data set with the missing values for IDnum. In this example, the periods indicate that IDnum is a numeric variable, although it should be defined as a character variable.
Obs  Item           IDnum  InStock  BackOrd  Total
1    Bird Feeder    .      3        20       23
2    Glass Mugs     .      6        12       18
3    Glass Tray     .      12       6        18
4    Padded Hangrs  .      15       20       35
5    Jewelry Box    .      23       0        23
6    Red Apron      .      9        12       21
7    Crystal Vase   .      27       0        27
8    Picnic Basket  .      21       0        21
9    Brass Clock    .      2        10       12
Lesson 6
Reading Raw Data in Fixed Fields
How your data is organized determines which input style you should use to read the data. SAS software provides three primary input styles: column, formatted, and list input. This lesson teaches you how to use column or formatted input to read data that is arranged in fixed fields.
Objectives
1. Distinguish between standard and nonstandard numeric data
2. Read standard fixed-field data
3. Read nonstandard fixed-field data.
Input Styles:
1. Column input
2. Formatted input
3. List input
4. Mixed input
5. Named input
1. Column Input
Notice that the INPUT statement lists the variables with their corresponding column locations in order from left to right. However, one of the features of column input is the capability to read fields in any order. For example, you could have read the values for InStock and BackOrd before the values for Item and IDnum.
>----+----10---+----20---+
JOSEPH PAUL THACKERY JR.
No placeholder is required for missing data. A blank field is read as missing and does not cause other fields to be read incorrectly.
input Item $ 1-13 IDnum $ 15-19 Supplier $ 15-16 InStock 21-22 BackOrd 24-25;
Identifying Nonstandard Numeric Data
Standard Numeric Data
Standard numeric data values can contain only
numbers
decimal points
numbers in scientific, or E, notation (23E4)
minus signs.
Choosing an Input Style Nonstandard data values require an input style with more flexibility than column input. You can use formatted input, which combines the features of column input with the ability to read nonstandard, as well as standard data.
Standard or Nonstandard Fixed-Field Data
Whenever you encounter raw data that is organized into fixed fields, you can use
column input to read standard data only
formatted input to read both standard and nonstandard data.
Using Formatted Input Formatted input is a very powerful method for reading both standard and nonstandard data in fixed fields.
General form, INPUT statement using formatted input:
INPUT pointer-control variable informat.;
where
pointer-control positions the input pointer on a specified column
variable is the name of the variable being created
informat is the special instruction that specifies how SAS software reads raw data.
Using Formatted Input In this lesson, you'll be working with two column pointer controls. The @n moves the input pointer to a specific column number. The +n moves the input pointer forward, to a column number relative to the current position.
>V---+----10---+----20---+--
EVANS   DONNY 112 29,996.63
HELMS   LISA  105 18,567.23
HIGGINS JOHN  111 25,309.00
LARSON  AMY   113 32,696.78
MOORE   MARY  112 28,945.89
POWELL  JASON 103 35,099.50
RILEY   JUDY  111 25,309.00
Column pointer controls are very useful. For instance, you can use the @n to move a pointer forward or backward when reading a record. In this INPUT statement, the value for FirstName is read first, starting in column 9.
>----+---V10---+----20---+--
EVANS   DONNY 112 29,996.63
HELMS   LISA  105 18,567.23
HIGGINS JOHN  111 25,309.00
Now let's read the values for LastName that begin in the first column. Here, you must use the @n pointer control to move the pointer back to column 1. input @9 FirstName $5. @1 LastName $7.
>V---+----10---+----20---+--
EVANS   DONNY 112 29,996.63
HELMS   LISA  105 18,567.23
HIGGINS JOHN  111 25,309.00
The rest of the INPUT statement indicates the column locations of the raw data values for JobTitle and Salary.
input @9 FirstName $5. @1 LastName $7. @15 JobTitle 3. @19 Salary comma9.;
>----+----10---+----20---+--
EVANS   DONNY 112 29,996.63
HELMS   LISA  105 18,567.23
HIGGINS JOHN  111 25,309.00
With formatted input, the input pointer moves to the first column after the field just read. In this example, after LastName is read, the pointer moves to column 8. To start reading FirstName, beginning in column 9, you move the input pointer ahead 1 column with +1.
>----+--V-10---+----20---+--
EVANS   DONNY 112 29,996.63
HELMS   LISA  105 18,567.23
HIGGINS JOHN  111 25,309.00
The last field to be read contains the values for JobTitle. You can use the @n column pointer control to return to column 15.
input LastName $7. +1 FirstName $5. +5 Salary comma9. @15 JobTitle 3.;
Lesson 7
However, the following external file contains data that is free-format, meaning data that is not arranged in columns. Notice that the values for a particular field do not begin and end in the same columns.
>----+----10---+----20---+----30-
ABRAMS L.MARKETING $18,209.03
BARCLAY M.MARKETING $18,435.71
COURTNEY W.MARKETING $20,006.16
FARLEY J.PUBLICATIONS $21,305.89
HEINS W.PUBLICATIONS $20,539.23
Objectives
In this lesson, you learn to read
free-format data, or data that is not organized in fixed fields
free-format data separated by nonblank delimiters, such as commas
free-format data that contains missing values
character values that exceed eight characters
nonstandard free-format data
character values that contain embedded blanks.
Free-Format Data
Suppose you have raw data that is free-format; that is, it is not arranged in fixed fields. The fields may be separated by blanks or some other delimiter, as shown below. Column and formatted input that you may have used before to read standard and nonstandard data in fixed fields won't work in this case.
List Input
Characteristics of List Input Style
Fields must be separated by at least one blank.
Each field must be specified in order.
Missing values must be represented by a period.
Character values cannot contain embedded blanks.
The default length of character variables is 8 bytes. A longer value is truncated when it is written to the program data vector.
Data must be standard character or standard numeric data.
data a;
   input name $ age sal;
   cards;
venu 24 456.09
inder 25 467.17
reddy 21 766.36
hanu 26 765.89
;
run;
proc print data=a;
run;
Obs  name   age  sal
1    venu   24   456.09
2    inder  25   467.17
3    reddy  21   766.36
4    hanu   26   765.89
dlm=',';
input DIST $ mcode june july aug sept;
run;
proc print data=a;
run;
proc print data=RAIN;
run;
Obs  DIST  mcode  june  july  aug  sept
1          27     1     8     0    0
2          29     3     14    5    10
3          34     2     10    3    3
4          35     2     12    4    8
5          45     2     34    6    90
6          56     23    24    5    6
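MISSOVER
Simply specify the MISSOVER option in the INFILE statement. The MISSOVER option prevents SAS software from reading the next record when it reaches the end of the current record without finding values for all variables. The variables that have no values are set to missing. With list input, a missing value in the middle of a record must still be represented by a period (.).
data BANK;
   infile CREDIT missover;
   input BANK $ june july aug sept;
run;
proc print;
run;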
Obs  BANK  june  july  aug  sept
1          23    .     45   44
2          34    45    .    56
3          45    56    .    57
4          .     34    54   56
5          23    .     87   98
july  45 56 34 .
aug   44 . 54 87
sept  . 57 56 98
Changing the Length of Character Values Remember that when you use list input to read raw data, character values are assigned a default length of 8. Let's take a look at what happens when list input is used to read character values that have a value longer than 8.
Note: The LENGTH statement must appear before the INPUT statement.
Raw data file Credit:
Statebankindia, 23, ., 45, 44,
Abdhrabankmysore, 34, 45, ., 56,
HSBCbankusa, 45, 56, ., 57,
AMBRObankcanada, ., 34, 54, 56,
ICICIbankindia, 23, ., 87, 98,
data BANK;
   infile credits dlm=',';
   length bank $ 16;
   input bank $ june july aug sept;
run;
proc print;
run;
Obs  bank              june  july  aug  sept
1    Statebankindia    23    .     45   44
2    Abdhrabankmysore  34    45    .    56
3    HSBCbankusa       45    56    .    57
4    AMBRObankcanada   .     34    54   56
5    ICICIbankindia    23    .     87   98
Modifying List Input You can make list input more versatile by using modified list input. There are two modifiers that can be used with list input.
The ampersand (&) modifier is used to read character values that contain embedded blanks. The colon (:) modifier is used to read nonstandard data values and character values longer than eight characters, but without embedded blanks.
data BANK;
   infile cards missover;
   length bank $ 18;
   input bank $ & june july aug sept;
   cards;
State bank india  23 . 45 44
Abdhra bank mysore  34 45 . 56
HSBC bank usa  45 56 . 57
AMBRO bank canada  . 34 54 56
ICICI bank india  23 . 87 98
;
run;
proc print;
run;
Obs  bank                june  july  aug  sept
1    State bank india    23    .     45   44
2    Abdhra bank mysore  34    45    .    56
3    HSBC bank usa       45    56    .    57
4    AMBRO bank canada   .     34    54   56
5    ICICI bank india    23    .     87   98
The colon (:) format modifier enables you to use list input and also to specify an informat after a variable name, whether character or numeric. SAS software reads until it encounters a blank column.
NEW YORK  7,262,700,898  AMBRO BAN
LOS ANGELES  3,259,340,889  HSBC BANK
CHICAGO USA  3,009,530,909  SCOOTIA BANK
HOUSTON CITY  1,728,910,890  AMERICAN EXPRESS
PHILAD ELPHIA  1,642,900,878  CITY BANK
Obs  City           AMOUNT      BANKS
1    NEW YORK       7262700898  AMBRO BAN
2    LOS ANGELES    3259340889  HSBC BANK
3    CHICAGO USA    3009530909  SCOOTIA BANK
4    HOUSTON CITY   1728910890  AMERICAN EXPRESS
5    PHILAD ELPHIA  1642900878  CITY BANK
AMOUNT :comma13. BANKS & $16.;
format AMOUNT comma13.;
run;
proc print;
run;
Obs  City           AMOUNT         BANKS
1    NEW YORK       7,262,700,898  AMBRO BAN
2    LOS ANGELES    3,259,340,889  HSBC BANK
3    CHICAGO USA    3,009,530,909  SCOOTIA BANK
4    HOUSTON CITY   1,728,910,890  AMERICAN EXPRESS
5    PHILAD ELPHIA  1,642,900,878  CITY BANK
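A self-contained sketch that combines both modifiers is shown here. The data set name work.banks and the informat width $12. for City are assumptions; the three records reuse values from the listing above, with at least two blanks after each value that contains an embedded blank.

```sas
data work.banks;
   /* &  reads a character value with embedded blanks, up to two blanks */
   /* :  applies an informat but still stops at the next blank          */
   input City & $12. Amount :comma13. Bank & $16.;
   format Amount comma13.;
   datalines;
NEW YORK  7,262,700,898  AMBRO BAN
CHICAGO USA  3,009,530,909  SCOOTIA BANK
HOUSTON CITY  1,728,910,890  AMERICAN EXPRESS
;
run;

proc print data=work.banks;
run;
```

The FORMAT statement affects only how Amount is displayed; the stored values are plain numbers with the commas removed.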
input @15 BANKS & $16. @1 City $10. AMOUNT 1-19;
*FORMAT AMOUNT COMMA13.;
CARDS;
NEWYARK 700 AMBRO BAN
LOSENJELS 259 HSBC BANK
CHICAGO 129 SCOOTIA BANK
HOUUGES 728 AMERICAN EXPRESS
PHILIPHS 642 CITY BANK
;
run;
proc print;
run;
rate ;
List INPUT
AMOUNT
700 259 129 728 642
Obs  NAME  IOD1  IOD2  IOD3
1          12    37    23
2          67    109   76
3          22    50    90
4          34    59    120
Lesson 8
SAS software provides many SAS informats for reading raw data values in various forms. If you took the lesson Reading Raw Data in Fixed Fields, you worked with informats to read standard and nonstandard data. In this lesson, you learn how to use a special category of SAS informats called date and time informats. These informats enable you to read a variety of common date and time expressions. After you read date and time values, you can also perform calculations with them.
options yearcutoff=1920;
data perm.aprbills;
   infile aprdata;
   input LastName $8. @10 DateIn mmddyy8. +1 DateOut mmddyy8.
         +1 RoomRate 6. @35 EquipCost 6.;
   Days=dateout-datein+1;
   RoomCharge=days*roomrate;
   Total=roomcharge+equipcost;
run;
Objectives
In this lesson, you learn
how SAS software stores date and time values
how to read common date and time expressions using SAS informats
how to handle two-digit year values
how to calculate time intervals by subtracting two dates
how to multiply a time interval by a rate.
How SAS Software Stores Date Values Before you read date or time values into a SAS data set or use those values in calculations, you should understand how SAS software stores date and time values. When you read a date using a SAS informat, SAS software converts it to a numeric date value. A SAS date value is the number of days from January 1, 1960, to the given date.
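The numeric nature of SAS date values can be checked with date literals. A minimal sketch, using hypothetical variable names d0 and d1; the second date matches the 04/05/99 value in this lesson's sample program.

```sas
data _null_;
   d0 = '01JAN1960'd;   /* the reference date, stored as 0  */
   d1 = '05APR1999'd;   /* days elapsed since 01JAN1960     */
   put d0= d1=;         /* unformatted: the stored numbers  */
   put d1= date9.;      /* formatted: readable as a date    */
run;
```

The first PUT statement should write d0=0 d1=14339, and the second should write d1=05APR1999.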
Date informats
Date value Date value
SAS informat
Time informats
SAS software stores time values similar to the way it stores date values. A SAS time value is stored as the number of seconds since midnight.
A SAS datetime is a special value that combines both date and time information. A SAS datetime value is stored as the number of seconds between midnight on January 1, 1960, and a given date and time.
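The same idea can be sketched for time and datetime values. The literals and variable names below are assumptions chosen for illustration; DATEPART is the standard SAS function that extracts the date value from a datetime value.

```sas
data _null_;
   t  = '00:01:00't;              /* one minute after midnight */
   dt = '01JAN1960:00:01:00'dt;   /* one minute after the datetime origin */
   d  = datepart(dt);             /* date portion of the datetime value */
   put t= dt= d=;
run;
```

Both t and dt are stored as 60 (seconds), and d is 0 (days), which shows that time and datetime values count seconds while date values count days.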
INPUT variable informat.; where variable is the name of the variable being read, and informat. is any valid SAS informat. Note that the informat name includes a final period.
MMDDYYw. Informat
In the MMDDYYw. informat, the month, day, and year fields can be separated by blanks or special characters. If delimiters are used, they must be placed between all fields in the values. Remember to specify a field width that includes not only the month, day, and year values, but any delimiters as well. Here are some date expressions you can read using the MMDDYYw. informat:

Date Expression   SAS Date Informat
10/15/99          MMDDYY8.
10 15 99          MMDDYY8.
10-15-1999        MMDDYY10.
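A minimal sketch of the width rule, using the INPUT function instead of an INPUT statement so that it runs without a raw data file. The variable names d1 to d3 are illustrative, and YEARCUTOFF=1920 is taken from the program shown at the start of this lesson.

```sas
options yearcutoff=1920;                  /* two-digit years map to 1920-2019 */
data _null_;
   d1 = input('10/15/99',   mmddyy8.);    /* slash delimiters, width 8 */
   d2 = input('10 15 99',   mmddyy8.);    /* blank delimiters, width 8 */
   d3 = input('10-15-1999', mmddyy10.);   /* four-digit year needs width 10 */
   put d1= date9. d2= date9. d3= date9.;  /* display as readable dates */
run;
```

All three variables should hold the same date value, displayed as 15OCT1999.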
DATEw. Informat

Date Expression   SAS Date Informat
30May00           DATE7.
30May2000         DATE9.
30-May-2000       DATE11.

TIMEw. Informat

Time Expression   SAS Time Informat
17:00:01.34       TIME11.
17:00             TIME5.
2:34              TIME5.

Note: Five is the minimum acceptable field width for the TIMEw. informat.
Sample Program

data dates;
   input LastName $ 1-7 +1 DateIn mmddyy8. +1 Dateout mmddyy8.;
   cards;
Akron   04/05/99 04/05/99
Brown   04/12/99 04/05/99
Carnes  04/27/99 04/05/99
Denison 04/11/99 04/05/99
Fields  04/15/99 04/05/99
Jamison 04/16/99 04/05/99
;
run;
proc print data=dates;
run;

Obs  LastName  DateIn  Dateout
1    Akron     14339   14339
2    Brown     14346   14339
3    Carnes    14361   14339
4    Denison   14345   14339
5    Fields    14349   14339
6    Jamison   14350   14339
Lesson: 9
input Lname $ 1-8 Fname $ 10-15;
input Department $ 1-12 JobCode $ 15-19;
input Salary comma10.;
input #1 Lname $ 1-8 Fname $ 10-15 #2 Department $ 1-12 JobCode $ 15-19 #3 Salary comma10.;
But you can also position the input pointer on a specific record by using a line pointer control in the INPUT statement.
input #2 Name $ 1-12 Age 15-16 Gender $18;

1---+----10---+----20
S. Thompson   37 M
L. Rochester  31 F
M. Sabatello  43 M
The Forward Slash (/) Line Pointer Control The #n Line Pointer Control
Example
data a;
   input Lname $ 1-8 Fname $ 10-15 /
         Department $ 1-12 JobCode $ 15-19 /
         Salary comma10.;
   cards;
ABRAMS   THOMAS
MARKETING     SR01
25,209.03
BARCLAY  ROBERT
EDUCATION     IN01
24,435.71
COURTNEY MARK
PUBLICATIONS  TW01
24,006.16
;
run;
proc print data=a;
run;

Obs  Lname     Fname   Department    JobCode  Salary
1    ABRAMS    THOMAS  MARKETING     SR01     25209.03
2    BARCLAY   ROBERT  EDUCATION     IN01     24435.71
3    COURTNEY  MARK    PUBLICATIONS  TW01     24006.16
The #n specifies the absolute number of the line to which you want to move the input pointer. The #n pointer control can read records in any order; therefore, it must be specified before the instructions for reading values in a specific record.
input #2 Department $ 1-12 JobCode $ 15-19
      #1 Lname $ 1-8 Fname $ 10-15
      #3 Salary comma10.;

1---+----10---+----20
ABRAMS   THOMAS
MARKETING     SR01
$25,209.03

data aa;
   input #2 Department $ 1-12 JobCode $ 15-19
         #1 Lname $ 1-8 Fname $ 10-15
         #3 Salary comma10.;
   cards;
ABRAMS   THOMAS
MARKETING     SR01
$25,209.03
BARCLAY  ROBERT
EDUCATION     IN01
$24,435.71
COURTNEY MARK
PUBLICATIONS  TW01
$24,006.16
;
run;
proc print data=aa;
run;

Obs  Department    JobCode  Lname     Fname   Salary
1    MARKETING     SR01     ABRAMS    THOMAS  25209.03
2    EDUCATION     IN01     BARCLAY   ROBERT  24435.71
3    PUBLICATIONS  TW01     COURTNEY  MARK    24006.16
Lesson: 10
Objectives
Create multiple observations from a single record that contains repeating blocks of data
Create multiple observations from a single record that contains one ID field followed by the same number of repeating fields
Create multiple observations from a single record that contains one ID field followed by a varying number of repeating fields.
line-hold specifiers.
The trailing @ enables the next INPUT statement to read from the current record in the same iteration of the DATA step. The double trailing at sign (@@) enables the next INPUT statement to read from the current record across further iterations of the DATA step.
@@
data temp;
   input id $ temp @@;
   cards;
pp 68 pp 67 pp 70
ss 68 ss 67 ss 70
kk 68 kk 67 kk 70
Kk 68 tt 67 tt 70
;
run;
proc print data=temp;
run;

Obs  id  temp
1    pp  68
2    pp  67
3    pp  70
4    ss  68
5    ss  67
6    ss  70
7    kk  68
8    kk  67
9    kk  70
10   Kk  68
11   tt  67
12   tt  70
@
data rain;
   input name $ @;
   do Quarter=1 to 4;
      input rain @;
      output;
   end;
   cards;
hyd 56 67 89 34 23
sec 34 23 45 65 65
gt 34 22 34 54 35
ED 45 65 77 86 56
;
run;
proc print data=rain;
run;

Obs  name  Quarter  rain
1    hyd   1        56
2    hyd   2        67
3    hyd   3        89
4    hyd   4        34
5    sec   1        34
6    sec   2        23
7    sec   3        45
8    sec   4        65
9    gt    1        34
10   gt    2        22
11   gt    3        34
12   gt    4        54
13   ED    1        45
14   ED    2        65
15   ED    3        77
16   ED    4        86
Lesson: 11
Creating Variables
When working with data sets, it's useful to create completely new variables or new variables that are based on the values of existing variables. These new variables can contain the results of SAS functions, conditionally assigned values, or running totals of other variable values.
This lesson provides a range of techniques for creating and controlling new variables. It also shows how these variables can help you to analyze your data.
Objectives
Create a variable by using a simple assignment statement
Create a variable that accumulates values down observations
Assign values to a variable conditionally
Specify the lengths of new variables
Execute multiple statements conditionally.
Assignment Statements
variable=expression; where variable names a new or existing variable expression is any valid SAS expression.
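A minimal sketch of both cases named above: one assignment creates a new variable, and another replaces the value of an existing one. The data set work.sizes, its variables, and the data values are made up for illustration.

```sas
data work.sizes;                    /* hypothetical data set */
   input Name $ Height Weight;      /* inches and pounds, made-up values */
   Ratio  = Weight / Height;        /* new variable from an expression */
   Height = Height * 2.54;          /* existing variable reassigned (now in cm) */
   datalines;
Ann 61 123
Bob 72 168
;
run;

proc print data=work.sizes;
run;
```

Because Ratio is computed before Height is reassigned, it uses the original height in inches; the order of assignment statements matters.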
Operators
* / + -
SAS Functions
Function  Description
MIN       returns the smallest value of the arguments
MAX       returns the largest value of the arguments
ROUND     rounds a value to the nearest roundoff unit
MEAN      computes the arithmetic mean (average)
SUM       calculates the sum of the arguments
UPCASE    converts character values to uppercase letters
data aq;
   maxvalue=max(2, 3, 4, 5, 6, 7, 80);
   round=round(2.99);
   min=min(2, 3, 4, 5, 6, 7, 80);
   mean=mean(2, 3, 4, 5, 6, 7, 80);
   sum=sum(2, 3, 4, 5, 6, 7, 80);
run;
proc print data=aq;
run;

Obs  maxvalue  round  min  mean     sum
1    80        3      2    15.2857  107
data finance.newloan; set finance.records(drop=amount rate); TotalLoan+payment; if code='1' then Type='Fixed'; run;
Comparison and Logical Operators When writing IF-THEN statements, you can use any of the following comparison operators
Operator   Comparison Operation
= or EQ    equal to
^= or NE   not equal to
> or GT    greater than
< or LT    less than
>= or GE   greater than or equal to
<= or LE   less than or equal to
IN         equal to one of a list
if test<85 and time<=20 then status='RETEST';
if region in ('NE','NW','SW') then rate=fee-25;
if target gt 300 or sales ge 50000 then bonus=salary*.05;
If then else
How do you assign a new variable based on the values in an existing data set or a new data set? For example:
data inder;
   set clinic.admit;
   if age>30 then type='middle';
   else type='low';
run;
proc print data=inder;
run;
Type is the newly assigned variable.
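The objectives for this lesson mention specifying the lengths of new variables. As a hedged sketch: the length of a character variable is fixed the first time it appears in the DATA step, so a LENGTH statement placed before the first assignment avoids truncated values. The data set work.groups and its values are made up for illustration.

```sas
data work.groups;                  /* hypothetical data set */
   length Type $ 6;                /* declare the length before any assignment */
   input Age;
   if Age > 30 then Type = 'middle';
   else Type = 'low';
   datalines;
27
34
51
;
run;

proc print data=work.groups;
run;
```

Without the LENGTH statement, swapping the two assignments so that 'low' came first would give Type a length of 3 and truncate 'middle' to 'mid'.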
statements
KEEP DROP
DROP KEEP
data venu (drop=name id weight height); set clinic.admit; run; proc print data= venu; run;
data venu (keep=name id weight height); set clinic.admit; run; proc print data= venu; run;
Lesson: 12
Reading and Concatenating SAS Data Sets
In this lesson, you learn to create a new data set from an existing SAS data set. To create the new data set, you can read a single data set, or you can concatenate two or more data sets. Concatenating appends the observations from one data set to another data set.
Objectives
Create a new data set from one or more existing data sets
Select observations based on a condition
Select variables to include or exclude.
Syntax
General form, basic DATA step for reading a single data set:
DATA SAS-data-set;
   SET SAS-data-set;
RUN;
where
SAS-data-set in the DATA statement is the name (libref.filename) of the SAS data set to be created
SAS-data-set in the SET statement is the name (libref.filename) of the SAS data set to be read.
Example:

Mixa              Mixb
num  var_A        num  var_B
1    a1           1    b1
2    a2           2    b2
3    a3           3    b3
4    a4           4    b4
5    a5           5    b5
data mixab; set mixa mixb; run; proc print data=mixab; run;
var_B
b1 b2 b3 b4 b5
Selecting Observations
To select only those observations that meet a specified condition, you can use a subsetting IF statement in any DATA step.
IF expression;
Use expressions in SAS programming statements to
transform variables
create new variables
conditionally process
calculate new values
assign new values.
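A minimal sketch of a subsetting IF statement. The data set work.admit is a small made-up stand-in for the lesson's clinic.admit, and the variable names are assumptions.

```sas
data work.admit;                /* made-up stand-in for clinic.admit */
   input ID $ Age Fee;
   datalines;
2458 27 85.20
2462 34 124.80
2501 31 149.75
;
run;

data work.over30;
   set work.admit;
   if Age > 30;                 /* subsetting IF: only rows where this is true are written */
run;

proc print data=work.over30;
run;
```

When the expression is false, SAS stops processing the current observation and does not write it to the output data set, so work.over30 contains only the two patients older than 30.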
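BY statement
Most merges are combined with a BY statement to produce a match-merge of two or more data sets. When a BY statement is used, observations are match-merged according to the values of the BY variable(s). When a BY statement follows a SET statement that names several data sets, as in the example here, the observations are interleaved in order of the BY variable rather than merged.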
Mixa              Mixb
num  var_A        num  var_B
1    a1           1    b1
2    a2           2    b2
3    a3           3    b3
4    a4           4    b4
5    a5           5    b5

data mixab;
   set mixa mixb;
   by num;
run;
proc print data=mixab;
run;

Obs  num  var_A  var_B
1    1    a1
2    1           b1
3    2    a2
4    2           b2
5    3    a3
6    3           b3
7    4    a4
8    4           b4
9    5    a5
10   5           b5
Mixa
1 2 3 4 5 a1 a2 a3 a4 a5
Mixb
6 b1 7 b2 8 b3 9 9 b4 b5
Lesson: 13
Objectives
Identify default output from match-merging
Prepare data for match-merging by sorting, if necessary
Specify data sets to merge and the data set to be created
Specify the common variable to use in matching observations
Rename any like-named variables to avoid overwriting values
Select only matched observations, if desired
Select variables
Predict the results of match-merging.
Preparing Data for Match-Merging In DATA step match-merging, all data sets to be merged must be sorted or indexed by the values of a common variable (also known as a BY variable). The common variable must have the same type and length in all data sets to be merged.
Sorting Data
PROC SORT <DATA=SAS-data-set> <OUT=SAS-data-set>;
   BY variable(s);
RUN;
where
the DATA= option names the data set to be read
the OUT= option creates an output data set containing the data in sorted order
variable(s) in the required BY statement specifies the variable(s) whose values are used to order the data.
data address;
   infile address;
   input name $ sal :dollar10.;
   format sal dollar10.;
run;
proc print data=address;
run;
proc sort data=address out=venkat;
   by name;
run;

Raw data file:
venu $78900
goutam $90000
hanu $67000
inder $95000
hari $80000

PROC PRINT output for the unsorted data set:
Obs  name    sal
1    venu    $78,900
2    goutam  $90,000
3    hanu    $67,000
4    inder   $95,000
5    hari    $80,000
Descending
The DESCENDING keyword precedes the variable that it applies to:
proc sort data=address out=venkat;
   by descending name;
run;
proc print data=venkat;
run;

Obs  name    sal
1    venu    $78,900
2    inder   $95,000
3    hari    $80,000
4    hanu    $67,000
5    goutam  $90,000
data clinic.combined; merge clinic.demog clinic.visit; by id; run; proc print data=clinic.combined; run;
data mixed;
   merge sales_M sales_T;
   by num;
run;
proc print data=mixed;
run;

Obs  num  monday  tuesday
1    1    200     200
2    2    300     400
3    3    430     500
4    4    490     900
5    5    509     500
Renaming Variables
Sometimes you may have variables with the same name in more than one input data set. In this case, DATA step match-merging overwrites values of the like-named variable in the first data set in which it appears with values of the like-named variable in subsequent data sets.
RENAME= data set option: (RENAME=(old-variable-name=new-variable-name)) the RENAME= option, in parentheses, follows the name of each data set that contains one or more variables to be renamed old-variable-name names the variable to be renamed new-variable-name specifies the new name for the variable.
Example
data rename;
   merge sales_m (rename=(sales=sl_1))
         sales_T (rename=(sales=sl_2));
   by num;
run;
proc print data=rename;
run;

Obs  num  sl_1  sl_2
1    1    200   800
2    2    600   400
3    3    100   200
4    4    640   400
5    5    440   390
Lesson: 14
Objectives
1. specify SAS data sets to print
2. select variables and observations to print
3. specify column totals for numeric variables
4. sort data by values of one or more variables
5. assign descriptive labels to variables
6. double space SAS listing output.
Creating a Basic Report General form, basic PROC PRINT step: PROC PRINT <DATA=SAS-data-set>; RUN; where SAS-data-set is the name of the SAS data set to be printed.
Selecting Variables By default, a PROC PRINT step lists all the variables in a data set. You can select variables and control the order in which they appear by using a VAR statement in your PROC PRINT step.
General form, VAR statement: VAR variable(s); where variable(s) is one or more variable names, separated by blanks.
proc print data=clinic.admit; var age height weight fee; run;
In addition to selecting variables, you can control the default Obs column that PROC PRINT displays to list observation numbers. You can specify text to replace the Obs heading in your PROC PRINT output, or you can choose not to display observation numbers.
Obs  Age  Height  Weight  Fee
1    27   72      168     85.20
2    34   66      152     124.80
3    31   61      123     149.75
4    43   63      137     149.75
5    51   71      158     124.80
Specifying the Obs Column Header
proc print data=work.example obs='Patient';
   var age height weight fee;
run;

Removing the Obs Column
To remove the Obs column, you can specify the NOOBS option in the PROC PRINT statement.
proc print data=work.example noobs;
   var age height weight fee;
run;

Age  Height  Weight  Fee
27   72      168     85.20
34   66      152     124.80
31   61      123     149.75
43   63      137     149.75
51   71      158     124.80
WHERE statement
General form, WHERE statement: WHERE where-expression; where where-expression specifies a condition for selecting observations. The where-expression can be any valid SAS expression.
proc print data=clinic.admit; var age height weight fee; where age>30; run;
Obs  Age  Height  Weight  Fee
2    34   66      152     124.80
3    31   61      123     149.75
4    43   63      137     149.75
5    51   71      158     124.80
7    32   67      151     149.75
8    35   70      173     149.75
9    34   73      154     124.80
10   49   64      172     124.80
11   44   66      140     149.75
14   40   69      163     124.80
15   47   72      173     124.80
16   60   71      191     149.75
17   43   65      123     124.80
20   41   67      141     149.75
21   54   71      183     149.75
A variable specified in the WHERE statement can be any variable in the SAS data set, not necessarily one of the variables specified in the VAR statement. The WHERE statement works for both character and numeric variables. To specify a condition based on the value of a character variable, you must
1. enclose the value in quotes
2. write the value with lower and uppercase letters exactly as it appears in the data set.
1. You can use compound expressions like these in your WHERE statements:
where age<=55 and pulse>75;
where area='A' or region='S';
where ID>1050 and state='NC';
2. When you test for multiple values of the same variable, you specify the variable name in each expression:
where actlevel='LOW' or actlevel='MOD';
where fee=124.80 or fee=178.20;
3. You can use the IN operator as a convenient alternative:
where actlevel in ('LOW','MOD');
where fee in (124.80,178.20);
General form, SUM statement: SUM variable(s); where variable(s) is one or more variable names, separated by blanks. You do not need to name the variables in a VAR statement if you specify them in the SUM statement.
proc print data=clinic.admit; var age height weight fee; where age>30; sum fee; run;
Obs  Age  Height  Weight  Fee
2    34   66      152     124.80
3    31   61      123     149.75
4    43   63      137     149.75
5    51   71      158     124.80
7    32   67      151     149.75
8    35   70      173     149.75
9    34   73      154     124.80
10   49   64      172     124.80
11   44   66      140     149.75
14   40   69      163     124.80
15   47   72      173     124.80
16   60   71      191     149.75
17   43   65      123     124.80
20   41   67      141     149.75
21   54   71      183     149.75
                          =======
                          2071.60
Obs  Age  Sex  Weight  Height
1    27   M    168     72
2    34   F    152     66
6    29   M    193     76
11   44   F    140     66
14   40   F    163     69
18   25   M    188     75
20   41   F    141     67
General form, LABEL statement: LABEL variable-1='label-1' . . . <variable-n='label-n'>; Labels can be up to 256 characters long and must be enclosed in quotes.
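A minimal sketch of a LABEL statement in a PROC PRINT step. The data values are copied from the listing shown in this lesson, but the data set name work.therapy and the variable name WalkJogRun are assumptions. Note that PROC PRINT displays labels only when the LABEL option is specified in the PROC PRINT statement.

```sas
data work.therapy;                       /* hypothetical data set name */
   input Date $ AerClass WalkJogRun Swim;
   datalines;
JAN1999 56 78 14
FEB1999 32 109 19
MAR1999 35 106 22
APR1999 47 115 24
;
run;

proc print data=work.therapy label;      /* LABEL option: show labels as column headings */
   label WalkJogRun = 'Walk/Jog/Run';    /* label applies to this step only */
run;
```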
Obs  Date     AerClass  Walk/Jog/Run  Swim
1    JAN1999  56        78            14
2    FEB1999  32        109           19
3    MAR1999  35        106           22
4    APR1999  47        115           24
Lesson: 15
Enhancing Reports
When you create reports, you may want to make your data easy to interpret by adding titles and footnotes, replacing variable names with descriptive labels, or formatting variable values. This lesson shows you how to specify these enhancements. Although this lesson focuses on reports and uses PROC PRINT output in examples, you can apply these enhancements to any SAS procedure output.
Objectives
add titles and footnotes
replace variable names with descriptive labels
format data values.
TITLE statement:
TITLE statements are global statements. That is, after you define a title, it remains in effect until you modify it, cancel it, or end your SAS session.
General form, TITLE statement:
TITLE<n> 'title-text';
where n is a number from 1 to 10 that specifies the title line, and 'title-text' is the actual title to be displayed. The maximum title length depends on your operating environment and the value of the LINESIZE= option.
title1 'Heart Rates for Patients with'; title3 'Increased Stress Tolerance Levels'; proc print data=clinic.stress; var resthr maxhr rechr; where tolerance='I'; run;
Obs  RestHR  MaxHR  RecHR
2    68      171    133
3    78      177    139
8    70      167    122
11   65      181    141
14   74      152    113
15   75      158    108
20   78      189    138
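Because TITLE statements are global, a title stays in effect for later steps until it is changed or canceled. The sketch below illustrates this with a tiny made-up data set; work.demo and its values are assumptions.

```sas
data work.demo;                    /* tiny made-up data set */
   input RestHR MaxHR;
   datalines;
68 171
78 177
;
run;

title1 'Heart Rates';
title2 'First Listing';
proc print data=work.demo;         /* shows both title lines */
run;

title2 'Second Listing';           /* replaces title2; title1 is still in effect */
proc print data=work.demo;
run;

title;                             /* a null TITLE statement cancels all titles */
proc print data=work.demo;         /* printed without titles */
run;
```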
Using the TITLES Window You can also specify titles in the TITLES window. These titles are not stored with your program, and they remain in effect only during your SAS session. To open the TITLES window, issue the TITLES command. To specify a title, type in the text you want. To cancel a title, erase the existing text. Notice that you do not enclose title text in quotes in this window
Specifying FOOTNOTES General form, FOOTNOTE statement: FOOTNOTE<n> 'footnote-text'; where n is a number from 1 to 10 that specifies the footnote line, and footnote-text is the actual footnote to be displayed. The maximum footnote length depends on your operating environment .
footnote1 'Data from Treadmill Tests'; footnote3 '1st Quarter Admissions'; proc print data=clinic.stress; var resthr maxhr rechr; where tolerance
Obs  RestHR  MaxHR  RecHR
2    68      171    133
3    78      177    139
8    70      167    122
11   65      181    141
14   74      152    113
15   75      158    108
20   78      189    138
Using the FOOTNOTES Window You can also specify footnotes in the FOOTNOTES window. These footnotes are not stored with your program, and they remain in effect only during your SAS session. To open the FOOTNOTES window, issue the FOOTNOTES command. To specify a footnote, type in the text you want. To cancel a footnote, erase the existing text. Notice that you do not enclose footnote text in quotation marks in this window.
System Options
General form, OPTIONS statement: OPTIONS options; where options specifies one or more system options to be changed. The system options available depend on your host system.
options nonumber nodate; proc print data=sales.qtr4; var salesrep type unitsold region; where unitsold>=30; run; options date; proc freq data=sales.qtr4; where unitsold>=30; tables salesrep; run;
Option             Specifies
DATE | NODATE      whether the date and time appear in your output
NUMBER | NONUMBER  whether page numbers appear in your output
PAGENO=            the beginning page number for your output
PAGESIZE=          the number of lines printed on each page of output
LINESIZE=          the print line width for your log and procedure output
Formatting Data Values General form, FORMAT statement: FORMAT variable(s) format-name; where variable(s) is the name of one or more variables whose values are to be written according to a particular pattern format-name specifies a SAS or user-defined format that is used to write out the values.
Formats affect only the way that the data values appear in output, not the actual data values as they are stored in the SAS data set.
proc print data=clinic.admit;
   var actlevel fee;
   where actlevel='HIGH';
   label fee='Admission Fee';
   format fee dollar4.;
run;

Obs  ActLevel  Fee
1    HIGH      $85
2    HIGH      $125
6    HIGH      $125
11   HIGH      $150
14   HIGH      $125
18   HIGH      $85
20   HIGH      $150
You can permanently assign a format to a variable in a SAS data set, or you can temporarily specify a format in a PROC step to determine the way that the data values appear in output.
Field Widths All SAS formats specify the total field width (w) used for displaying the values in the output. For example, suppose that the longest value for the variable Net is a four digit number, such as 5400. To specify the COMMAw.d format for Net, you specify a field width of 5 or more. You must count the comma, because it occupies a position in the output.
Character:  5  ,  4  0  0
Position:   1  2  3  4  5
Decimal Places
For numeric variables you can also specify the number of decimal places (d), if any, to be displayed in the output. Numbers are rounded to the specified number of decimal places. In the example above, no decimal places are displayed. Writing the whole number 2030 as 2,030.00 requires eight print positions, including two decimal places and the decimal point.

Character:  2  ,  0  3  0  .  0  0
Position:   1  2  3  4  5  6  7  8
Formatting 15374 with a dollar sign, commas, and two decimal places requires ten print positions.

Character:  $  1  5  ,  3  7  4  .  0  0
Position:   1  2  3  4  5  6  7  8  9  10
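The three width calculations above can be checked with PUT statements that write to the SAS log. This is a sketch only; the variable names are illustrative.

```sas
data _null_;
   net   = 5400;
   whole = 2030;
   big   = 15374;
   put net   comma5.;      /* 5,400       uses 5 print positions  */
   put whole comma8.2;     /* 2,030.00    uses 8 print positions  */
   put big   dollar10.2;   /* $15,374.00  uses 10 print positions */
run;
```

Each format width counts every character written, including the dollar sign, the commas, and the decimal point.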
Lesson: 16
Objectives
Permanently associate a format with a variable
Create your own formats to display variable values
Permanently store the formats that you create.
SAS Formats and the FORMAT Statement COMMAw.d displays numeric values with commas DOLLARw.d displays numeric values with a leading dollar sign ($) and commas.
dollar9. ^
$5,349.41 123456789
Numeric SAS formats, such as the DOLLARw.d format, can also specify a d value, which is the number of decimal places to be displayed. Note that a period separates the w from the d value.
dollar9.2 ^ $5,349.41 123456789
Remember that multiple formats and variables can be associated in a single FORMAT statement .
Finally, you can add FORMAT statements to the PRINT procedure to enhance the data values in a report.
proc print data=perm.employee; format salary dollar10.2; run;
Now that you have the FORMAT statement written, how do you permanently associate the formats with their respective variables? You place the FORMAT statement in the DATA step.
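A minimal sketch of a permanent format association. The data set work.employee is a hypothetical stand-in for perm.employee, with two rows borrowed from the listing in this lesson.

```sas
data work.employee;                 /* hypothetical stand-in for perm.employee */
   input LastName $ Salary;
   format Salary dollar10.2;        /* stored with the data set, not just this step */
   datalines;
Evans 29996.63
Helms 18567.23
;
run;

proc print data=work.employee;      /* no FORMAT statement needed here */
run;

proc contents data=work.employee;   /* lists the format attached to Salary */
run;
```

Because the FORMAT statement is in the DATA step, every later procedure that reads the data set displays Salary with the DOLLAR10.2 format unless a step overrides it.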
General form, PROC FORMAT statement: PROC FORMAT <options>; where options include LIBRARY=libref specifies the libref for a SAS data library that contains a permanent catalog in which user-defined formats are stored FMTLIB displays the contents of a format catalog.
Defining a Unique Format
General form, VALUE statement:
VALUE name range-1='label-1' <...range-n='label-n'>;
where the format's name
must begin with a dollar sign ($) if it applies to a character variable
cannot be longer than eight characters
cannot be the name of a SAS format
cannot end with a number
does not end in a period when specified in a VALUE statement.
proc format lib=library; value JobFmt 103='manager' 105='text processor' 111='assoc. technical writer' 112='technical writer' 113='senior technical writer';run;
FirstName  LastName  JobTitle  Salary
Donny      Evans     112       29996.63
Lisa       Helms     105       18567.23
John       Higgins   111       25309.00
Amy        Larson    113       32696.78
Mary       Moore     112       28945.89
Jason      Powell    103       35099.50

Obs  FirstName  LastName  JobTitle                 Salary
1    Donny      Evans     technical writer         29996.63
2    Lisa       Helms     text processor           18567.23
3    John       Higgins   assoc. technical writer  25309.00
4    Amy        Larson    senior technical writer  32696.78
5    Mary       Moore     technical writer         28945.89
6    Jason      Powell    manager                  35099.50
7    Judy       Riley     assoc. technical writer  25309.00
8    Neal       Ryan      technical writer         28180.00
9    Henry      Wilson    senior technical writer  31875.46
10   Chip       Woods     text processor           17098.71
Character values
proc format lib=library; value $Answer 'Y'='Yes' 'N'='No' 'U'='Undecided' 'NOP'='No opinion'; run;
Numeric values proc format lib=library; value JobFmt 103='manager' 105='text processor' 111='assoc. technical writer' 112='technical writer' 113='senior technical writer'; run;
You can specify a non-inclusive range of numeric values by using the less than symbol (<) to avoid any overlapping. In this example, the range of values from 0 to less than 12 are labeled as child. The next range begins at 12, so the value 12.3 would be assigned the label teenager.
proc format lib=library;
   value AgeFmt 0-<12='child'
                12-<20='teenager'
                20-<65='adult'
                65-<100='senior citizen';
run;
You can also use the keywords LOW and HIGH to specify the lower and upper limits of a variable's value range. The keyword LOW does not include missing values. The keyword OTHER can be used to label missing values and any value that is not specifically addressed in a range.
proc format lib=library;
   value AgeFmt low-<12='child'
                12-<20='teenager'
                20-<65='adult'
                65-high='senior citizen'
                other='unknown';
run;
You can define several formats by using multiple VALUE statements in a single PROC FORMAT step. In this example, each VALUE statement defines a different format.
proc format lib=library;
   value JobFmt 103='manager'
                105='text processor'
                111='assoc. technical writer'
                112='technical writer'
                113='senior technical writer';
   value $Respnse 'Y'='Yes'
                  'N'='No'
                  'U'='Undecided'
                  'NOP'='No opinion';
run;
Numeric values
format jobcode JobFmt.;

Character values
format jobcode $JobFmt.;
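To show a user-defined format being defined and then applied, here is a self-contained sketch. It omits LIB=, so the format is stored in the temporary WORK.FORMATS catalog; work.staff and its values are made up for illustration.

```sas
proc format;                         /* no LIB=, so the format goes to WORK.FORMATS */
   value JobFmt 103='manager'
                105='text processor'
                112='technical writer';
run;

data work.staff;                     /* hypothetical data set */
   input LastName $ JobTitle;
   datalines;
Evans 112
Helms 105
Powell 103
;
run;

proc print data=work.staff;
   format JobTitle JobFmt.;          /* the period is required when the format is used */
run;
```

The stored values of JobTitle remain numeric codes; only the displayed values change.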
Lesson: 17
Objectives
In this lesson, you learn to specify
the variables to appear in your table
the statistic to be computed for each variable
the arrangement of statistics and variables in the table
additional features such as formats for values in the table, column and row totals, and labels for statistics and a summary variable.
One-Dimensional Table : Column expression Proc tabulate data=clinic.diabstat; class type; var premium; table type premium; run;
Two-Dimensional Table: Row Expression
proc tabulate data=clinic.admit;
   class sex;
   var height weight;
   table sex,height*min weight*min;
run;
The comma separates the row expression (sex) from the column expression.

       Height  Weight
       Min     Min
Sex
F      61.00   118.00
M      69.00   147.00
proc tabulate data=clinic.admit;
   class sex actlevel;
   var height weight;
   table actlevel,sex,height*min weight*min;
run;

ActLevel HIGH
       Height  Weight
       Min     Min
Sex
F      66.00   140.00
M      72.00   168.00

ActLevel LOW
       Height  Weight
       Min     Min
Sex
F      63.00   123.00
M      69.00   147.00

ActLevel MOD
       Height  Weight
       Min     Min
Sex
F      61.00   118.00
M      71.00   154.00
Setting Up a Table
Unlike PROC PRINT, PROC TABULATE doesn't create default reports. So, before you begin writing a PROC TABULATE step, it's a good idea to sketch the table you want.
After you sketch the table, you can write the basic code to compute the statistics you want to display. When you are satisfied with the basic report, you can add options and statements to modify its appearance. In fact, once you define the basic structure of your table, enhancing it is easy.
To set up a table with PROC TABULATE, you need to identify the data you are analyzing, and then determine Which variables, if any, you need to classify your data Which variables, if any, you need to analyze your data The type of table you need to represent your data.
General form, PROC TABULATE statement: PROC TABULATE options; where options includes the DATA= option to specify the data set to use. For example:
Specifying Variables After you invoke PROC TABULATE and identify your data set, you need to specify variables to create your tables. As you saw earlier, you need to distinguish between variables that classify your data (into groups, or categories, or classes) and variables used for arithmetic analysis. These are called class variables and analysis variables, respectively. You list
Class Variables
Can be character or numeric. Classify data into groups or categories. Have only a few distinct values, in most cases. (PROC TABULATE prints each value of a class variable.)
PctInsured 50 N Company Parnassus Reliable Ruritan USA Inc.
1.00 1.00 1.00 2.00 2.00 2.00 1.00 1.00 1.00 1.00 2.00 1.00 2.00 1.00 1.00 1.00
60 N
80 N
100 N
Analysis Variables Must be numeric Are used for arithmetic analysis Often contain continuous values.
Total Sum
$13,079.32
BalanceDue Mean
$108.52
Describing the Table After you specify your variables in CLASS or VAR statements, you need to describe the table you want PROC TABULATE to produce.
You use the TABLE statement to specify the number of dimensions in the table (page, row, column) the variables in the table (Sex, Height) the statistics to be calculated (MAX)
Describing the Table
General form, TABLE statement:
TABLE page-expression, row-expression, column-expression / <options>;
where each expression specifies the elements (variables and statistics) in that dimension of the table. These expressions are known collectively as dimension expressions.
Dimension expressions can also contain operators that you use when combining elements to produce the table you want.
Remember this
If a TABLE statement doesn't contain a comma, it requests a one-dimensional table, no matter how many variables or statistics it specifies. A TABLE statement with one comma specifies a two-dimensional table. Two commas indicate a three-dimensional table.
Specifying Statistics
Your final task before writing your own PROC TABULATE step is to specify the statistics needed. To request a statistic, you use an operator, the asterisk (*), to attach the statistic to the variable. In the TABLE statements below, the statistic MEAN is specified for the variable Fee. Notice that you don't specify statistics in the CLASS or VAR
statements. Notice also how changing the
Fee proc tabulate data=clinic.admit; var fee; table fee*mean; run; Mean
127.95
Mean Fee
127.95
Rules for Specifying Statistics 1. If you specify only class variables in your TABLE statement, The default statistic is N (frequency) The only statistics you can request are N and PCTN (percent of total frequency).
proc tabulate data=clinic.admit; class sex actlevel; table sex*pctn actlevel*n; run;
The TABLE statement in the PROC TABULATE step above specifies only class variables, so it can request only N and PCTN.
2. If you specify any analysis variables in your TABLE statement, A) The default statistic is SUM B) You can request any statistic to be computed on the analysis variables.
proc tabulate data=clinic.admit; class actlevel; var height weight; table height*mean weight*max,actlevel; Run;
3.In a TABLE statement, you can specify statistics in any dimension, but they must all be in the same dimension.
proc tabulate data=clinic.admit; class sex actlevel; var height weight; table height*mean weight*max,actlevel; table sex*pctn actlevel*n; run;
Lesson: 18
Creating Plots
SAS/GRAPH software enables you to display your data graphically. To create a variety of plots, you can use the GPLOT procedure within SAS/GRAPH software.
Objectives
Invoke the GPLOT procedure and name the data set to be used
Request a plot and specify variables to be plotted
Scale axes
Select observations
Overlay plots
Define plotting symbols and the method of interpolation
View graphs using the GRAPH, Results, and Explorer windows
Specify an output catalog and store graphs temporarily or permanently.
Creating a Basic Plot Let's start by using the GPLOT procedure to plot one variable against another within a set of coordinate axes. You specify a PROC GPLOT statement to invoke the procedure and identify the data set to be used a PLOT statement to specify the variables to be plotted.
The graph below is the output from the PROC GPLOT step above. The entire graph appears in one default color, and the default plotting symbols (plus signs) are not connected.
Syntax
PROC GPLOT DATA=SAS-data-set; PLOT vertical-variable*horizontal-variable; RUN; where SAS-data-set identifies the data set to be used vertical-variable is the variable plotted on the vertical axis horizontal-variable is the variable plotted on the horizontal axis.
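A minimal sketch of the general form, assuming SAS/GRAPH is licensed and available. The data set work.classes and its attendance figures are made up for illustration.

```sas
data work.classes;                  /* made-up monthly attendance figures */
   input Month AerClass;
   datalines;
1 56
2 32
3 35
4 47
;
run;

proc gplot data=work.classes;
   plot AerClass*Month;             /* vertical-variable*horizontal-variable */
run;
quit;                               /* GPLOT supports run-group processing; QUIT ends it */
```

AerClass is plotted on the vertical axis and Month on the horizontal axis, with default symbols and no connecting line.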
Scaling Axes To scale the axes in your plot, you can specify the VAXIS= and HAXIS= options in the PLOT statement. The VAXIS= option specifies tick marks along the vertical axis, and the HAXIS= option specifies tick marks along the horizontal axis. Notice that a slash (/) precedes options in the PLOT statement VAXIS= and HAXIS= options: PLOT vertical-variable*horizontal-variable / VAXIS=<value-list | range> HAXIS=<value-list | range>; where value-list or range determines the placement of tick marks along the axis.
vaxis=10 to 100 by 10
Value Lists proc gplot data=clinic.therapy1999; plot aerclass*month / haxis='01' '06' '12'; Run;
Range of Values You can also scale axes by specifying a range of values. Notice the default scaling of the vertical axis in the plot below.
proc gplot data=clinic.therapy1999; plot aerclass*month / haxis='01' '06' '12' vaxis=0 to 100 by 50; run;
When you specify a range of values, be sure to scale the axis in workable increments to accommodate the smallest and largest data values to be plotted.
Enhancing Plots
Now that you know how to create single and overlaid plots, you can use the SYMBOL statement to enhance your plots by specifying plotting symbols, plot lines, color, and interpolation. (Interpolation is a technique for estimating values between plot points and drawing lines to connect the points.)
Option       Specifies
VALUE=       plotting symbol
HEIGHT=      height of the plotting symbol
INTERPOL=    interpolation technique
WIDTH=       thickness of the line in pixels
COLOR=       color of plotting symbols or lines
symbol1 color=red value=star interpol=spline height=1 cm width=4; symbol2 color=green value=plus interpol=spline height=1 cm width=4;
Setting Plotting Symbols The VALUE= (or V=) option specifies the plotting symbol that represents each data point. Possible values for the VALUE= option include
the letters A through W
the numbers 0 through 9
a number of special symbols, including PLUS, STAR, SQUARE, DIAMOND, TRIANGLE, and many others
NONE, which produces a plot with no symbols for data points.
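Setting Plotting Symbol Height The HEIGHT= (or H=) option specifies the height of the plotting symbol. You can give the height in any of these units: percentage of the display area (PCT), inches (IN), centimeters (CM), points (PT), or character cells (CELL), which is the default unit.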
symbol1 value=triangle height=1 cm color=black; proc gplot data=clinic.totals2000; plot newadmit*month; run;
Specifying Connecting Lines The INTERPOL= (or I=) option specifies how the plot points are connected. Possible values include NONE, JOIN, NEEDLE, SPLINE, HILO, STD, and more.
interpol=join;
Specifying Color
proc gplot data=clinic.therapy1999 gout=newcat; plot swim*month aerclass*month /; where swim>35 and aerclass>35; run;
After you create graphs, you can view and manage the entries in the catalog where they are stored (whether or not you specified an output catalog using the GOUT= option).
Additional Features
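The HMINOR= and VMINOR= options specify the number of minor tick marks drawn between major tick marks on the horizontal and vertical axes.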
symbol1 color=red interpol=spline value=none; symbol2 color=blue interpol=spline value=none; proc gplot data=air.airqual; plot avgtsp*month=state / vminor=3 hminor=0; where state in ("AL" , "GA"); run;
The CAXIS= option specifies the color of the axes, and the CTEXT= option specifies the color of the axis text.
proc gplot data=air.airqual; plot avgtsp*month=state / vminor=3 hminor=0 ctext=brown caxis=red; where state in ("AL" , "NY"); run;
Lesson 19
Bar Chart
Pie Chart
Objectives
Invoke the GCHART procedure and specify the data set to be used
Specify the type of chart to be created
Specify the statistic to be displayed
Summarize a variable within categories
Specify variables to be displayed
Select observations
Control the pattern and color of bars and slices
Use RUN-group processing
View and store charts.
Example:
proc gchart data=clinic.admit; hbar sex; vbar age; pie actlevel; run;
vbar
Numeric Chart Variables
pie
The type of chart determines the statistics displayed. By default, PROC GCHART displays the FREQ (frequency) of the chart variable. The following program creates a pie chart that displays the frequency of the variable ActLevel
hbar
For horizontal bar charts, PROC GCHART also displays the statistics CFREQ (cumulative frequency), PERCENT (percentage), and CPERCENT (cumulative percentage). The following program creates a horizontal bar chart of the variable Company. Notice the four default statistics for this type of chart.
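A horizontal bar chart of Company could be requested with a step like the sketch below. It assumes that the libref CLINIC is assigned and that Clinic.Insure contains the variable Company, as in the other examples in this lesson.

```sas
/* Sketch only: assumes the libref CLINIC is assigned */
proc gchart data=clinic.insure;
   hbar company;   /* FREQ, CFREQ, PERCENT, and CPERCENT are shown by default for HBAR */
run;
quit;
```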
Specifying Statistics
To specify a statistic other than the default statistic FREQ, you use the TYPE= option in the statement that specifies the chart. Let's look at the general form of the GCHART procedure again, this time focusing on the options. The TYPE= option is just one of the possible options. Notice that a slash precedes options.
PROC GCHART <DATA=SAS-data-set>; chart-form chart-variable / TYPE=statistic; RUN; where statistic indicates the statistic of the chart variable to be displayed. Statistics include CFREQ (cumulative frequency), PERCENT (percent), and CPERCENT (cumulative percent).
proc gchart data=clinic.insure; vbar company / type=cfreq; run;
Summarizing a Variable within Categories In addition to specifying a particular statistic for your chart, you may want to summarize one variable within categories defined by a second variable. You can use the SUMVAR= option to summarize a variable within categories. PROC GCHART <DATA=SAS-data-set>; chart-form chart-variable / SUMVAR=summary-variable; RUN; where the values of summary-variable are summarized for each unique value of chart-variable.
When you specify SUMVAR=, the default statistic is SUM, so the chart displays the total of the values of the summary variable for each unique value of the chart variable.
When you use SUMVAR=, you can also use TYPE=. However, the value of TYPE= can be only SUM or MEAN.
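As a hedged illustration of SUMVAR= on its own, the sketch below charts the total of BalanceDue for each value of Company. It assumes the libref CLINIC is assigned; because TYPE= is omitted, the default statistic SUM applies.

```sas
/* Sketch only: assumes the libref CLINIC is assigned */
proc gchart data=clinic.insure;
   vbar company / sumvar=balancedue;   /* no TYPE=, so the bars show SUM of BalanceDue */
run;
quit;
```

Adding type=mean after sumvar=balancedue would chart the average balance instead of the total.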
Enhancing Charts
The default fill for horizontal and vertical bar charts specifies that all bars will be the same solid color. To change the default values for bar charts, you use the PATTERNID= option in the statement that specifies the chart. This topic focuses on PATTERNID=MIDPOINT, which specifies a different color/pattern combination for each bar.
General form, PATTERNID= option: PATTERNID=<BY | MIDPOINT | GROUP | SUBGROUP> where BY, MIDPOINT, GROUP, or SUBGROUP specifies that bar colors and/or patterns vary according to the option specified.
This program shows the effect of the PATTERNID=MIDPOINT option on bar color and fill patterns.
proc gchart data=clinic.insure; vbar company / sumvar=balancedue type=mean patternid=midpoint; run;
Enhancing Charts To change the fill pattern to either all solid or all hatch, you can use the FILL= option in the PIE or PIE3D statement. FILL=<X|S> where X specifies hatch patterns and S specifies solid fills.
The program that specifies FILL=X shows how the option changes the slice patterns. Only one hatch pattern is used, and it rotates through the color list. Note that one color (green) is used twice because there were not as many colors in the color list as there are slices in the chart (some colors were assigned to companies with no balance due or companies included in OTHER).
Example This program shows the default color and fill pattern for pie charts. Note that both solid and hatch patterns are used.
proc gchart data=clinic.insure; pie company / sumvar=balancedue type=mean; run;
proc gchart data=clinic.insure; pie company / sumvar=balancedue type=mean fill=x; run;
pattern1 color=lib; pattern2 color=lig; proc gchart data=clinic.admit; hbar age / sumvar=weight type=mean subgroup=sex patternid=subgroup mean; run;
Additional Features For pie charts, you can specify text color by using the CTEXT= option, control where labels appear by using the SLICE= option, and explode one or more pie slices for effect by using the EXPLODE= option. For example, this program explodes the slice for the value HIGH of ActLevel in the Clinic.Admit data set.
proc gchart data=clinic.admit; pie3d actlevel / sumvar=fee type=mean ctext=blue slice=arrow explode="HIGH"; run;
pattern1 color=lib; pattern2 color=lig; proc gchart data=clinic.admit; vbar3d age / sumvar=weight type=mean subgroup=sex patternid=subgroup mean; run;
Lesson 20
Procedure Syntax The MEANS procedure can include many statements and options for specifying needed statistics. For the sake of simplicity, let's look at a few key statements and consider the procedure in its basic form. General form, basic MEANS procedure: PROC MEANS <DATA=SAS-data-set> <statistic-keyword(s)> <option(s)>; RUN; where SAS-data-set identifies the data set to process statistic-keyword(s) specifies the statistics to compute option(s) control the content, analysis, and appearance of output.
Objectives
Determine the n-count, mean, standard deviation, minimum, and maximum of numeric variables
Generate a wide range of descriptive, quantile, and hypothesis-testing statistics
Control the number of decimal places used in PROC MEANS output
Use the VAR statement to analyze specific variables
Use CLASS and BY statements to categorize data.
In its simplest form, PROC MEANS prints the n-count (number of non-missing values), mean, standard deviation, and minimum and maximum values of every numeric variable in a data set.
proc means data=perm.survey; run;
Variable   N   Mean        Std Dev     Minimum     Maximum
Item1      4   3.7500000   1.2583057   2.0000000   5.0000000
Item2      4   3.0000000   1.6329932   1.0000000   5.0000000
Item3      4   4.2500000   0.5000000   4.0000000   5.0000000
Item4      4   3.5000000   1.2909944   2.0000000   5.0000000
Item5      4   3.0000000   1.6329932   1.0000000   5.0000000
Item6      4   3.7500000   1.2583057   2.0000000   5.0000000
Item7      4   3.0000000   1.8257419   1.0000000   5.0000000
Item8      4   2.7500000   1.5000000   1.0000000   4.0000000
Item9      4   3.0000000   1.4142136   2.0000000   5.0000000
Item10     4   3.2500000   1.2583057   2.0000000   5.0000000
Item11     4   3.0000000   1.8257419   1.0000000   5.0000000
Item12     4   2.7500000   0.5000000   2.0000000   3.0000000
Item13     4   2.7500000   1.5000000   1.0000000   4.0000000
Item14     4   3.0000000   1.4142136   2.0000000   5.0000000
Item15     4   3.0000000   1.6329932   1.0000000   5.0000000
Item16     4   2.5000000   1.9148542   1.0000000   5.0000000
Item17     4   3.0000000   1.1547005   2.0000000   4.0000000
Item18     4   3.2500000   1.2583057   2.0000000   5.0000000
Specifying Statistics The default statistics produced by the MEANS procedure (n-count, mean, standard deviation, minimum, and maximum) are not always the ones that you need. You might prefer to limit output to the mean of the values. Or you might need to compute a different statistic, such as the median or range of the values. To specify statistics, include statistic keywords as PROC MEANS options. When you specify a statistic in the PROC MEANS statement, default statistics are not produced. For example, to see the median and range of Perm.Survey numeric values, add the MEDIAN and RANGE keywords as options.
Statistic keywords

Descriptive Statistics
Keyword        Description
CLM            Two-sided confidence limit for the mean
CSS            Corrected sum of squares
CV             Coefficient of variation
KURTOSIS       Kurtosis
LCLM           One-sided confidence limit below the mean
MAX            Maximum value
MEAN           Average
MIN            Minimum value
N              Number of observations with nonmissing values
NMISS          Number of observations with missing values
RANGE          Range
SKEWNESS       Skewness
STDDEV / STD   Standard deviation
STDERR         Standard error of the mean
SUM            Sum
SUMWGT         Sum of the Weight variable values
UCLM           One-sided confidence limit above the mean
USS            Uncorrected sum of squares
VAR            Variance

Quantile Statistics
Keyword        Description
MEDIAN / P50   Median or 50th percentile
P1             1st percentile
P5             5th percentile
P10            10th percentile
Q1 / P25       Lower quartile or 25th percentile
Q3 / P75       Upper quartile or 75th percentile
P90            90th percentile
P95            95th percentile
P99            99th percentile
QRANGE         Difference between upper and lower quartiles: Q3-Q1
proc means data=perm.survey median range; run;
Variable   Median      Range
Item1      4.0000000   3.0000000
Item2      3.0000000   4.0000000
Item3      4.0000000   1.0000000
Item4      3.5000000   3.0000000
Item5      3.0000000   4.0000000
Item6      4.0000000   3.0000000
Item7      3.0000000   4.0000000
Item8      3.0000000   3.0000000
Item9      2.5000000   3.0000000
Item10     3.0000000   3.0000000
Item11     3.0000000   4.0000000
Item12     3.0000000   1.0000000
Item13     3.0000000   3.0000000
Item14     2.5000000   3.0000000
Item15     3.0000000   4.0000000
Item16     2.0000000   4.0000000
Item17     3.0000000   2.0000000
Item18     3.0000000   3.0000000
Limiting Decimal Places By default, PROC MEANS output uses the BEST. format. This can result in unnecessary decimal places, making your output hard to read. To limit decimal places, use the MAXDEC= option in the PROC MEANS statement and set it equal to the length you prefer.
General form, PROC MEANS statement with MAXDEC= option: PROC MEANS <DATA=SAS-data-set> <statistic-keyword(s)> MAXDEC=n; where n specifies the maximum number of decimal places.
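The general form above might be used as in the following sketch, which requests only the minimum and maximum and suppresses decimal places. It assumes the libref CLINIC is assigned; with no VAR statement, every numeric variable in the data set is analyzed.

```sas
/* Sketch only: assumes the libref CLINIC is assigned */
proc means data=clinic.diabetes min max maxdec=0;
run;
```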
Variable   Minimum   Maximum
Age        15        63
Height     61        75
Weight     102       240
Pulse      65        100
FastGluc   152       568
PostGluc   206       625
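Specifying Variables By default, PROC MEANS analyzes every numeric variable in the data set. To analyze specific variables only, list them in a VAR statement.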
proc means data=clinic.diabetes min max maxdec=0; var age height weight; run;
Variable   Minimum   Maximum
Age        15        63
Height     61        75
Weight     102       240
CLASS Group Processing You will often want statistics for grouped observations, instead of for observations as a whole. For example, census numbers are more useful when grouped by region than when viewed as a national total. To produce separate analyses of grouped observations, add a CLASS statement to the MEANS procedure: proc means data=clinic.heart maxdec=1; var arterial heart cardiac urinary; class survive sex; run;
Survive   Sex   N Obs   Variable   N   Mean    Std Dev   Minimum   Maximum
DIED      1     4       Arterial   4   92.5    10.5      83.0      103.0
                        Heart      4   111.0   53.4      54.0      183.0
                        Cardiac    4   176.8   75.2      95.0      260.0
                        Urinary    4   98.0    186.1     0.0       377.0

                        Arterial   6   94.2    27.3      72.0      145.0
                        Heart      6   103.7   16.7      81.0      130.0
                        Cardiac    6   318.3   102.6     156.0     424.0
                        Urinary    6   100.3   155.7     0.0       405.0

SURV                    Arterial   5   77.2    12.2      61.0      88.0
                        Heart      5   109.0   32.0      77.0      149.0
                        Cardiac    5   298.0   139.8     66.0      410.0
                        Urinary    5   100.8   60.2      44.0      200.0

                        Arterial   5   78.8    6.8       72.0      87.0
                        Heart      5   100.0   13.4      84.0      111.0
                        Cardiac    5   330.2   87.0      256.0     471.0
                        Urinary    5   111.2   152.4     12.0      377.0
BY Group Processing
Like the CLASS statement, the BY statement specifies variables to use for categorizing observations.
General form, BY statement: BY variable(s); where BY variable(s) specifies category variables for group processing.
Unlike CLASS processing, BY processing requires that your data already be sorted in the order of the BY variables. Unless data set observations are already sorted, you will need to run the SORT procedure before using PROC MEANS with any BY group.
proc sort data=clinic.heart out=work.hartsort; by survive sex; run; proc means data=work.hartsort maxdec=1; var arterial heart cardiac urinary; by survive sex; run;
Variable   N   Mean    Std Dev   Minimum   Maximum
Arterial   4   92.5    10.5      83.0      103.0
Heart      4   111.0   53.4      54.0      183.0
Cardiac    4   176.8   75.2      95.0      260.0
Urinary    4   98.0    186.1     0.0       377.0

Variable   N   Mean    Std Dev   Minimum   Maximum
Arterial   6   94.2    27.3      72.0      145.0
Heart      6   103.7   16.7      81.0      130.0
Cardiac    6   318.3   102.6     156.0     424.0
Urinary    6   100.3   155.7     0.0       405.0

Variable   N   Mean    Std Dev   Minimum   Maximum
Arterial   5   77.2    12.2      61.0      88.0
Heart      5   109.0   32.0      77.0      149.0
Cardiac    5   298.0   139.8     66.0      410.0
Urinary    5   100.8   60.2      44.0      200.0

Variable   N   Mean    Std Dev   Minimum   Maximum
Arterial   5   78.8    6.8       72.0      87.0
Heart      5   100.0   13.4      84.0      111.0
Cardiac    5   330.2   87.0      256.0     471.0
Urinary    5   111.2   152.4     12.0      377.0
Lesson 21
Procedure Syntax The FREQ procedure can include many statements and options for controlling frequency output. For the sake of simplicity, we'll consider the procedure in its basic form.
General form, basic FREQ procedure: PROC FREQ <DATA=SAS-data-set>; RUN; where SAS-data-set names the data set to be used.
By default, PROC FREQ creates a one-way table with the frequency, percent, cumulative frequency, and cumulative percent of every value of all variables in a data set.
Frequency   Percent   Cumulative Frequency   Cumulative Percent
2930        34.52     2930                   34.52
3106        36.60     6036                   71.12
2451        28.88     8487                   100.00
Selecting Variables
General form, TABLES statement: TABLES variable(s); where variable(s) lists the variables to include.
Frequency   Percent   Cumulative Frequency   Cumulative Percent
2848        33.56     2848                   33.56
1355        15.97     4203                   49.53
1706        20.10     5909                   69.63
2578        30.38     8487                   100.00
Specifying Frequency Order By default, PROC FREQ displays frequency distributions in the order of each variable's unformatted values. This is known as internal order. proc freq data=clinic.diabetes; tables height; run;
Height   Frequency   Percent
61       2           10.00
62       1           5.00
63       1           5.00
64       3           15.00
65       2           10.00
66       2           10.00
68       2           10.00
70       2           10.00
71       2           10.00
72       1           5.00
73       1           5.00
75       1           5.00
You might prefer to view the values in a different order. To control the way that PROC FREQ displays distributions, add the ORDER= option to the PROC FREQ statement and specify the method you prefer. General form, ORDER= option: ORDER=DATA|FORMATTED|FREQ|INTERNAL where DATA orders values by appearance in the data set FORMATTED orders by formatted value FREQ orders values by descending frequency count INTERNAL orders by unformatted value (default).
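For instance, a step along the following lines would list the Height values from most to least frequent. This is a sketch that assumes the libref CLINIC is assigned; replacing FREQ with DATA or FORMATTED selects one of the other orders described above.

```sas
/* Sketch only: assumes the libref CLINIC is assigned */
proc freq data=clinic.diabetes order=freq;
   tables height;   /* values appear in descending frequency order */
run;
```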
Frequency   Percent   Cumulative Frequency   Cumulative Percent
2           10.00     2                      10.00
2           10.00     4                      20.00
2           10.00     6                      30.00
3           15.00     9                      45.00
1           5.00      10                     50.00
1           5.00      11                     55.00
1           5.00      12                     60.00
1           5.00      13                     65.00
2           10.00     15                     75.00
2           10.00     17                     85.00
1           5.00      18                     90.00
2           10.00     20                     100.00

Frequency
3 2 2 2 2 2 2 1 1 1 1 1
Two-Way Tables So far, you have used the FREQ procedure to create one-way tables of frequency. The table results show total frequency counts for the values within the data set. However, it is often helpful to crosstabulate frequencies with the values of other variables. Census data, for example, is typically crosstabulated with a variable representing geographical regions. The simplest crosstabulation is a two-way table. To create a two-way table, join two variables with an asterisk (*) in the TABLES statement of a PROC FREQ step.
General form, TABLES statement for crosstabulation: TABLES variable-1*variable-2 <* ... variable-n>; where (for two-way tables) variable-1 specifies table rows variable-2 specifies table columns.
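A two-way table could be requested as in the sketch below, which crosstabulates Survive (rows) with Sex (columns). It assumes the libref CLINIC is assigned and reuses the Clinic.Heart variables from the PROC MEANS lesson.

```sas
/* Sketch only: assumes the libref CLINIC is assigned */
proc freq data=clinic.heart;
   tables survive*sex;   /* variable-1 gives the rows, variable-2 the columns */
run;
```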
You can execute SAS statements repeatedly by placing them in a DO loop. Unlike simple DO statements, which execute as a group when an IF condition is met, DO loops execute any number of times in a single iteration of the DATA step. Using DO loops lets you write concise DATA steps that are easier to alter and debug.
For example, the DO loop in this program eliminates the need for 12 separate programming statements to calculate annual earnings:
data finance.earnings; set finance.master; Earned=0; do count=1 to 12; earned+(amount+earned)*(rate/12); end; run;
You can also use DO loops to generate data, conditionally execute statements, and read data.
Objectives After completing this lesson, you will be able to
construct a DO loop to perform repetitive calculations
control the execution of a DO loop
generate multiple observations in one iteration of the DATA step
construct nested DO loops.
Constructing DO Loops
DO loops process a group of statements repeatedly rather than once. This can greatly reduce the number of statements required for a repetitive calculation. For example, these twelve sum statements compute a company's annual earnings from investments. Notice that all twelve statements are identical.
data finance.earnings; set finance.master; Earned=0; earned+(amount+earned)*(rate/12); earned+(amount+earned)*(rate/12); earned+(amount+earned)*(rate/12); earned+(amount+earned)*(rate/12); earned+(amount+earned)*(rate/12); earned+(amount+earned)*(rate/12); earned+(amount+earned)*(rate/12); earned+(amount+earned)*(rate/12); earned+(amount+earned)*(rate/12); earned+(amount+earned)*(rate/12); earned+(amount+earned)*(rate/12); earned+(amount+earned)*(rate/12); run;
A DO loop enables you to achieve the same results with fewer statements. In this case, the sum statement executes twelve times within the DO loop during each iteration of the DATA step
data finance.earnings; set finance.master; Earned=0; do count=1 to 12; earned+(amount+earned)*(rate/12); end; run;
When creating a DO loop with the iterative DO statement, you must specify an index variable. The index variable stores the value of the current iteration of the DO loop. You can use any valid SAS name. The optional BY clause sets the increment; if you omit it, the index variable increases by 1. In the specifications below, the first increments quiz by 1, and the second increments the index variable by 2, resulting in rows values of 2, 4, 6, 8, 10, 12:
do quiz=1 to 5; do rows=2 to 12 by 2;
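To see how the index variable behaves, a small DATA _NULL_ step such as the sketch below can write its value to the log on each pass. The final PUT statement is an addition for illustration; it shows that the index variable holds the first value beyond the stop value (14 here) once the loop ends.

```sas
data _null_;
   do rows=2 to 12 by 2;
      put rows=;                    /* writes rows=2, rows=4, ... rows=12 to the log */
   end;
   put 'After the loop: ' rows=;    /* the index is 14 when the loop stops */
run;
```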
DO Loop Execution Using the form of the DO loop just presented, let's see how the DO loop executes in the DATA step. This example calculates the interest earned each month for a one-year investment.
This DATA step does not read data from another source. When submitted, it compiles and then executes only once to generate data. During compilation, the program data vector is created for the Finance.Earnings data set.
data finance.earnings; Amount=1000; Rate=.075/12; do month=1 to 12; Earned+(amount+earned)*(rate); end; run;
When the DATA step executes, the values of Amount and Rate are assigned.
Next, the DO loop executes. During each execution of the DO loop, the value of Earned is calculated and added to its previous value. On the twelfth execution of the DO loop, the program data vector looks like this:
You can also include an OUTPUT statement within the DO loop to write an observation to the data set for each iteration of the DO loop. The OUTPUT statement writes the current values to the data set immediately. By including the OUTPUT statement, you override the automatic output at the end of the DATA step. The loop stops when the index variable reaches 21, so no observation is written for that value.
data work.earn; Value=2000; do year=1 to 20; Interest=value*.075; value+interest; output; end; run;
Decrementing DO Loops You can decrement a DO loop by specifying a negative value for the BY clause. For example, the specification DO index-variable=5 TO 1 BY -1; decreases the index variable by 1, resulting in values of 5, 4, 3, 2, 1.
You can also specify how many times a DO loop executes by listing items in a series. For example, with all numeric values:
DO index-variable=2,5,9,13,27; more SAS statements END;
Variable names must represent either all numeric or all character values. Do not enclose variable names in quotation marks.
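A series can also consist of character values. The sketch below uses a hypothetical index variable named Quarter and writes each item to the log; the loop executes once per item in the list.

```sas
data _null_;
   /* Quarter is a hypothetical index variable; items are separated by commas */
   do quarter='Q1','Q2','Q3','Q4';
      put quarter=;
   end;
run;
```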
Nesting DO Loops Iterative DO statements can be executed within a DO loop. Putting a DO loop within a DO loop is called nesting.
data work.earn; do year=1 to 20; Capital+2000; do month=1 to 12; Interest=capital*(.075/12); capital+interest; end; end; run;
Lesson 23
Improving Program Efficiency with Macro Variables
Introduction SAS macro variables enable you to substitute text in your SAS programs. Macro variables can supply a variety of information, from operating system information to SAS session information to any text string you define. By substituting text into programs, SAS macro variables make your programs easy to update, as this program shows:
%let year=1999; title "Temporary Employees for &year"; data hrd.newtemp; set hrd.temp; if year(enddate)=&year; run; proc print data=hrd.newtemp; run;
Objectives
Create macro variables with the %LET statement
Reference automatic and user-defined macro variables
Identify when macro variable references are resolved
Display log messages stating how macro variable references resolve when using the SYMBOLGEN option
Combine macro variable references with prefixes and suffixes.
When writing your SAS programs, you may find that you need to reference the same variable, data set, or text string several times in the same program.
title "Temporary Employees for 1999"; data hrd.temp1999; set hrd.temp; if year(enddate)=1999; run; proc print data=hrd.temp1999; run;
SAS macro variables are part of the macro facility, which is a tool for extending and customizing SAS software and for reducing the amount of text you must enter to complete tasks. A macro variable is independent of a SAS data set and contains one value that remains constant until you change it. The value of a macro variable is a text string that becomes part of your program whenever the macro variable is referenced.
Types of Macro Variables There are two types of macro variables: automatic macro variables and user-defined macro variables.
Automatic Macro Variables Whenever you invoke the SAS System, automatic macro variables are created that provide such information as
the date or time a SAS job or session began executing
the release of SAS software you are running
the name of the most recently created SAS data set
the abbreviation for your host operating system.

Name       Information Supplied
SYSDATE9   date the job or session began executing
SYSDATE    date the job or session began executing
SYSDAY     weekday the job or session began executing
SYSTIME    time the job or session began executing
SYSSCP     operating system abbreviation
SYSVER     SAS software version and/or release number
SYSLAST    name of most recently created data set
Notice that all automatic macro variables begin with the letters SYS. SAS software reserves the right to use the SYS prefix for automatic macro variables. It is recommended that you not begin the name of any user-defined macro variable with the letters SYS. For a complete list of automatic macro variables, refer to SAS Macro Language: Reference.
title 'Temporary Employees Hired in November'; footnote "Report Run on &sysday, &sysdate"; data hrd.tempnov; set hrd.temp; if month(begindate)=11; run; proc print data=hrd.tempnov; Run;
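One quick way to inspect automatic macro variables is to write them to the log with %PUT, as in this sketch. The values depend on your session, so no output is shown.

```sas
%put Session started on &sysday, &sysdate9 at &systime;
%put Running SAS &sysver on &sysscp;
%put _automatic_;   /* lists every automatic macro variable and its value */
```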
For example, this program uses the SYSDAY macro variable to execute a section of the program only on a specified day of the week.
data hrd.temppay(drop=day); set hrd.newtemp (keep=name payrate hours1 hours2); total=hours1+hours2; Day="&sysday"; if day='Friday' then do; gross=input(payrate,2.)*total; tempfee=gross*.05; end; run; proc print data=hrd.temppay; run;
Create a macro variable using the %LET statement
Display log messages stating how macro variable references resolve when using the SYMBOLGEN option
Combine macro variable references with prefixes and suffixes
Create a macro variable during DATA step execution by using the CALL SYMPUT routine.
User-Defined Macro Variables General form, %LET statement: %LET name=value; where: name specifies the name of your macro variable value is the value of the macro variable. Do not begin the macro variable name with the letters SYS. The use of this prefix is reserved to SAS software for automatic macro variable names. The name that you define must follow the rules for SAS names. Everything appearing between the equal sign and semicolon is considered part of the macro variable value.
%let year=1999; title "Temporary Employees for &year"; data hrd.newtemp; set hrd.temp; if year(enddate)=&year; run; proc print data=hrd.newtemp; run;
SYMBOLGEN Option
General form, OPTIONS statement with SYMBOLGEN option: OPTIONS NOSYMBOLGEN | SYMBOLGEN; where NOSYMBOLGEN specifies that log messages about macro variable references will not be displayed, and SYMBOLGEN specifies that they will be displayed. The default is NOSYMBOLGEN.
options symbolgen; %let year=1999; title "Temporary Employees for &year"; data hrd.newtemp; set hrd.temp; if year(enddate)=&year; run; proc print data=hrd.newtemp; run;
SAS Log
36   options symbolgen;
37   %let year=1999;
38   data hrd.newtemp;
39   set hrd.temp;
40   if year(enddate)=&year;
SYMBOLGEN: Macro variable YEAR resolves to 1999
NOTE: The data set HRD.NEWTEMP has 8 observations and 18 variables.
NOTE: The DATA statement used 0.71 seconds.
41   proc print data=hrd.newtemp;
SYMBOLGEN: Macro variable YEAR resolves to 1999
42   title "Temporary Employees for &year";
43   run;
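When you use two periods, the macro variable references in your program resolve correctly: the first period marks the end of the macro variable name, and the second is treated as text. For example, in the DATA statement data &libref..temp&yr; the reference &libref. resolves to the libref, and the second period separates it from the data set name.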
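The effect of the delimiter can be checked with %PUT before using the reference in a DATA statement. In this sketch the values hrd and 1999 are hypothetical choices for the two macro variables.

```sas
%let libref=hrd;    /* hypothetical value */
%let yr=1999;       /* hypothetical value */
%put One period: &libref.temp&yr;     /* resolves to hrdtemp1999 */
%put Two periods: &libref..temp&yr;   /* resolves to hrd.temp1999 */
```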
General form, CALL statement for CALL SYMPUT routine: CALL SYMPUT( name,value); where CALL is the keyword. SYMPUT invokes the SYMPUT routine. name is the name of the macro variable to be defined. The variable can be a character string enclosed in quotes, a character variable, or a character expression. value is the value to be assigned to the macro variable. The value can be a text string enclosed in quotes, a data set variable, or a DATA step expression.
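As a hedged sketch of the routine in use, the step below stores the number of observations read from Hrd.Temp in a macro variable. The macro variable name numtemps is hypothetical, and the step assumes the libref HRD is assigned. The PUT, LEFT, and TRIM functions convert the count to text without leading or trailing blanks.

```sas
/* Sketch only: assumes the libref HRD is assigned */
data _null_;
   set hrd.temp end=last;
   /* on the last observation, _N_ equals the number of observations read */
   if last then call symput('numtemps', trim(left(put(_n_, 8.))));
run;

/* The macro variable is available after the DATA step has executed */
%put Number of observations read: &numtemps;
```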
Lesson 24
MACRO LANGUAGE
The macro facility is a tool for extending and customizing the SAS System. It allows you to abbreviate a large amount of text conveniently and to make text substitutions easily. It contains a programming language to enable you to execute parts of a SAS program (even entire steps) conditionally; data-entry features that accept input from a user; and a means of communicating information between steps of a SAS job.
%SYSPROD %SYSRC SYSRC %SYSRPUT SYSSCPL SYSSCP and SYSSCPL SYSSITE SYSTIME %TRIM and %QTRIM %UNQUOTE %UPCASE and %QUPCASE SYSVER SYSVLONG and SYSVLONG4 %VERIFY %WINDOW
%IF-%THEN/%ELSE IMPLMAC %INDEX %INPUT INTO %label %LEFT and %QLEFT %LENGTH %LET %LOCAL %LOWCASE and %QLOWCASE %MACRO MACRO MAUTOSOURCE %MEND MERROR MFILE
MLOGIC MPRINT MRECALL MSTORED MSYMTABMAX= MVARSIZE= %NRBQUOTE %NRQUOTE %NRSTR %PUT %QCMPRES %QLEFT %QLOWCASE %QSCAN %QSUBSTR %QSYSFUNC %QTRIM
%QUOTE and %NRQUOTE %QUPCASE RESOLVE SASAUTOS= SYMBOLGEN SYMGET SYMGETN SYMPUT SYMPUTN SYSBUFFR %SYSCALL SYSCC
SYSERR %SYSEVALF %SYSEXEC SYSFILRC %SYSFUNC and %QSYSFUNC SYSMENV SYSMSG SYSPARM= SYSPARM SYSPBUFF
%macro isname(name);
   %let name=%upcase(&name);
   %if %length(&name)>8 %then
      %put &name: The fileref must be 8 characters or less.;
   %else %do;
      %let first=ABCDEFGHIJKLMNOPQRSTUVWXYZ_;
      %let all=&first.1234567890;
      %let chk_1st=%verify(%substr(&name,1,1),&first);
      %let chk_rest=%verify(&name,&all);
      %if &chk_rest>0 %then
         %put &name: The fileref cannot contain "%substr(&name,&chk_rest,1)".;
      %if &chk_1st>0 %then
         %put &name: The first character cannot be "%substr(&name,1,1)".;
      %if (&chk_1st or &chk_rest)=0 %then
         %put &name is a valid fileref.;
   %end;
%mend isname;
%IF-%THEN/%ELSE
%macro settax(taxrate);
   %let taxrate = %upcase(&taxrate);
   %if &taxrate = CHANGE %then %do;
      data thisyear;
         set lastyear;
         if sale > 100 then tax = .05;
         else tax = .08;
      run;
   %end;
   %else %if &taxrate = SAME %then %do;
      data thisyear;
         set lastyear;
         tax = .03;
      run;
   %end;
%mend settax;
%LENGTH
%let a=Happy; %let b=Birthday; %put The length of &a is %length(&a).; %put The length of &b is %length(&b).; %put The length of &a &b To You is %length(&a &b to you).;
Executing these statements writes to the SAS log: The length of Happy is 5. The length of Birthday is 8. The length of Happy Birthday To You is 21.
%macro vars(first=1,last=); %global gfirst glast; %let gfirst=&first; %let glast=&last; var test&first-test&last; %mend vars;
When you submit the following program, the macro VARS generates the VAR statement and the values for the macro variables used in the title statement.
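A program of that kind might look like the sketch below. The data set Work.Scores and its five test variables are hypothetical and are created here only so that the step can run on its own; the macro definition is repeated for the same reason.

```sas
/* Hypothetical data set with one observation and five test scores */
data work.scores;
   array test{5} test1-test5 (90 85 77 92 68);
run;

%macro vars(first=1,last=);
   %global gfirst glast;
   %let gfirst=&first;
   %let glast=&last;
   var test&first-test&last;
%mend vars;

proc print data=work.scores;
   %vars(last=5)                                /* generates: var test1-test5; */
   title "Analysis of Tests &gfirst-&glast";    /* resolves to: Analysis of Tests 1-5 */
run;
```

Because GFIRST and GLAST are declared with %GLOBAL, they remain available to the TITLE statement after the macro has finished executing.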
Macro
Non-macro
%label
%macro info(type); %if %upcase(&type)=SHORT %then %goto quick; /* No % here */ proc contents; run; proc freq; tables _numeric_; run; %quick: proc print data=_last_(obs=10); /* Use % here */ run; %mend info; %info(short)
%let county=Clark; %macro concat; data _null_; longname="&county"||" County"; put longname; run; %mend concat; %concat Calling the macro CONCAT produces the following statements: data _null_; longname="Clark"||" County"; put longname; run;
Lesson 25
SAS/STAT
INTRODUCTION Sometimes you need quick answers to questions about your data. You may want to query your data to examine relationships between data values, view a subset of your data, or compute values quickly.
PROC ANOVA< options > ; CLASS variables ; MODEL dependents=effects < / options > ; ABSORB variables ; BY variables ; FREQ variable ; MANOVA < test-options >< / detail-options > ; MEANS effects < / options > ; REPEATED factor-specification < / options > ; TEST < H=effects > E=effect ;
PROC GLM
To use PROC GLM, the PROC GLM and MODEL statements are required. You can specify only one MODEL statement (in contrast to the REG procedure, for example, which allows several MODEL statements in the same PROC REG run). If your model contains classification effects, the classification variables must be listed in a CLASS statement, and the CLASS statement must appear before the MODEL statement. In addition, if you use a CONTRAST statement in combination with a MANOVA, RANDOM, REPEATED, or TEST statement, the CONTRAST statement must be entered first in order for the contrast to be included in the MANOVA, RANDOM, REPEATED, or TEST analysis.
PROC GLM < options > ; CLASS variables ; MODEL dependents=independents < / options > ; ABSORB variables ; BY variables ; FREQ variable ; ID variables ; WEIGHT variable ; CONTRAST 'label' effect values < ... effect values > < / options > ; ESTIMATE 'label' effect values < ... effect values > < / options > ; LSMEANS effects < / options > ; MANOVA < test-options >< / detail-options > ; MEANS effects < / options > ; OUTPUT < OUT=SAS-data-set > keyword=names < ... keyword=names > < / option > ; RANDOM effects < / options > ; REPEATED factor-specification < / options > ; TEST < H=effects > E=effect < / options > ;
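As a minimal sketch of the required statements, the step below fits a one-way model with a single classification effect. It assumes SAS/STAT is available, the libref CLINIC is assigned, and Clinic.Heart contains the numeric variable Arterial and the grouping variable Sex, as in the PROC MEANS lesson; the choice of model is for illustration only.

```sas
/* Sketch only: illustrative model, not an analysis from the course */
proc glm data=clinic.heart;
   class sex;              /* CLASS must appear before MODEL */
   model arterial=sex;     /* only one MODEL statement is allowed */
   means sex;              /* group means for the classification effect */
run;
quit;
```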
PROC REG
In the preceding list, brackets denote optional specifications, and vertical bars denote a choice of one of the specifications separated by the vertical bars. In all cases, label is optional. The PROC REG statement is required. To fit a model to the data, you must specify the MODEL statement. If you want to use only the options available in the PROC REG statement, you do not need a MODEL statement, but you must use a VAR statement. Several MODEL statements can be used. In addition, several MTEST, OUTPUT, PAINT, PLOT, PRINT, RESTRICT, and TEST statements can follow each MODEL statement. The BY, FREQ, ID, VAR, and WEIGHT statements are optionally specified once for the entire PROC step, and they must appear before the first RUN statement.
PROC REG < options > ; < label: > MODEL dependents=<regressors> < / options > ; BY variables ; FREQ variable ; ID variables ; VAR variables ; WEIGHT variable ; ADD variables ; DELETE variables ; < label: > MTEST <equation, ... ,equation> < / options > ; OUTPUT < OUT=SAS-data-set > keyword=names < ... keyword=names > ; PAINT <condition | ALLOBS> < / options > | < STATUS | UNDO> ; PLOT <yvariable*xvariable> <=symbol> < ...yvariable*xvariable> <=symbol> < / options > ; PRINT < options > < ANOVA > < MODELDATA > ; REFIT; RESTRICT equation, ... ,equation ; REWEIGHT <condition | ALLOBS> < / options > | < STATUS | UNDO> ; < label: > TEST equation,<, ...,equation> < / option > ;
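A simple regression could be sketched as follows. It assumes SAS/STAT is available, the libref CLINIC is assigned, and Clinic.Diabetes contains the numeric variables Weight and Height; the model is an illustrative choice.

```sas
/* Sketch only: illustrative model, not an analysis from the course */
proc reg data=clinic.diabetes;
   model weight=height;    /* dependent=regressor */
run;
quit;
```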
Objectives
   convert character data to numeric data
   convert numeric data to character data
   create SAS date values
   extract the month and year from a SAS date value
   extract, edit, and search character variable values.
Using SAS functions, you can
   calculate sample statistics
   create SAS date values
   round values
   generate random numbers
   extract a portion of a character value
   convert data from one data type to another.
Automatic Character-to-Numeric Conversion
SAS Log
   data hrd.newtemp;
      set hrd.temp;
      Salary=payrate*hours;
   run;
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
When Automatic Conversion Occurs
Automatic character-to-numeric conversion occurs when a character value is
   assigned to a previously defined numeric variable, such as the numeric variable Rate
      Rate=payrate;
   used in an arithmetic operation
      Salary=payrate*hours;
   compared to a numeric value with a comparison operator
      if payrate>=rate;
   specified in a function that requires numeric arguments.
      NewRate=sum(payrate,raise);
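The objectives above mention converting character data to numeric data explicitly. One common way to do that, instead of relying on automatic conversion, is the INPUT function. The sketch below is an assumption-laden illustration: the WORK.TEMP data set, its values, and the 8. informat are made up for the example.

```sas
/* Hypothetical input data: PayRate is read as a character variable. */
data work.temp;
   input payrate $ hours;
   datalines;
10 30
12.5 40
;
run;

/* Explicit conversion: INPUT reads the character value with a numeric informat, */
/* so no automatic-conversion note is written to the log for this expression.    */
data work.newtemp;
   set work.temp;
   Salary = input(payrate, 8.) * hours;
run;
```

For the opposite direction (numeric to character), the PUT function with a numeric format plays the corresponding role.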
MDY Function
General form, MDY function:
   MDY(month,day,year)
where
   month can be a variable that represents the month or a number from 1-12
   day can be a variable that represents the day or a number from 1-31
   year can be a variable that represents the year or a number with 2 or 4 digits.
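A small sketch of the general form above, using constants rather than variables. The variable name d is arbitrary; the step writes to the log only and creates no data set.

```sas
/* Build a SAS date value from month, day, and year. */
data _null_;
   d = mdy(1, 15, 2000);
   put d=;            /* d=14624 (days since 01JAN1960) */
   put d= date9.;     /* d=15JAN2000                    */
run;
```

A SAS date value is stored as a number of days, so a format such as DATE9. is needed to display it as a calendar date.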
TODAY Function
General form, TODAY function: TODAY()
data hrd.newtemp;
   set hrd.temp;
   EditDate=today();
run;
proc print data=hrd.newtemp;
   format editdate date9.;
run;
EndDate    EditDate
  14621    15JAN2000
  14565    15JAN2000
  14608    15JAN2000
Character Functions
Function   Purpose
SCAN       returns a specified word from a character value.
SUBSTR     extracts a substring or replaces character values.
TRIM       trims trailing blanks from character values.
INDEX      searches a character value for a specific string.
UPCASE     converts all letters in a value to uppercase.
LOWCASE    converts all letters in a value to lowercase.
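The functions in the table can be tried out in a short DATA step. This is a sketch on a made-up value; the variable names and the sample name are hypothetical.

```sas
/* Sketch only: the value 'Smith, Joan' and all variable names are assumptions. */
data _null_;
   length name $ 20;
   name  = 'Smith, Joan';
   last  = scan(name, 1, ',');     /* first word, using a comma as the delimiter   */
   first = substr(name, 8);        /* substring starting at position 8             */
   pos   = index(name, ',');       /* position of the first comma                  */
   upper = upcase(name);           /* all letters uppercase                        */
   lower = lowcase(name);          /* all letters lowercase                        */
   both  = trim(last) || '/' || first;   /* TRIM removes trailing blanks from last */
   put last= first= pos= upper= lower= both=;
run;
```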
Objectives
   invoke the SQL procedure
   select columns
   define new columns
   specify the table(s) to be read
   specify subsetting criteria
   order rows by values of one or more columns
   end the SQL procedure.
proc sql;
   select id,lastname,netpay,grosspay,
          grosspay*.06 as bonus
      from emplib.payroll
      where netpay>25000
      order by lastname;
ID     LastName      NetPay       GrossPay     bonus
1002   BOWMAN        $29,048.50   $42,120.33   2527.22
1007   BROWN         $37,049.40   $53,927.72   3235.663
1049   FERNANDEZ     $25,169.63   $35,956.61   2157.397
1006   GARRETT       $34,013.88   $47,241.50   2834.49
1077   GIBSON        $41,553.94   $61,108.73   3666.524
1008   HERNAND       $54,189.70   $78,575.07   4714.504
1009   JONES         $44,128.90   $63,986.91   3839.215
1005   KNAPP         $33,122.70   $48,027.99   2881.679
1012   QUINTERO      $51,888.53   $79,828.51   4789.711
1015   SCHOLL        $27,640.80   $40,079.23   2404.754
1010   SMITH         $37,331.48   $54,899.24   3293.954
1011   VAN HOTTEN    $29,053.05   $43,688.80   2621.328
1017   WAGGONNER     $26,484.02   $38,550.25   2313.015
1001   WATERHOUSE    $32,140.60   $46,603.94   2796.236
proc sql;
   select id,lastname,netpay,grosspay,
          grosspay*.06 as bonus
      from emplib.payroll
      where netpay>25000
      order by lastname;     /* ordering rows */
Sex   Age   CentHgt   KgWgt
F     22    160.02    63.18182
F     28    157.48    53.63636
F     31    154.94    55.90909
F     49    162.56    78.18182
M     34    185.42    70
M     51    180.34    71.81818
M     60    180.34    86.81818
Clauses
proc sql;
   select custname as name, count(*)
      from sql.customer
      group by name
      having count(*)=1;

   GROUP BY   groups observations
   HAVING     conditions that each group must satisfy
   ORDER BY   order of the rows in the result
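To see GROUP BY and HAVING at work on data you can inspect, here is a self-contained sketch. The WORK.CUSTOMER table and its values are made up for illustration and are not the sql.customer table used above.

```sas
/* Hypothetical table with one repeated customer name. */
data work.customer;
   input custname $;
   datalines;
Adams
Baker
Adams
Clark
;
run;

proc sql;
   select custname as name, count(*) as n
      from work.customer
      group by custname          /* one group per customer name          */
      having count(*) = 1        /* keep only names that appear once     */
      order by name;             /* sort the remaining rows by name      */
quit;                            /* QUIT ends the SQL procedure          */
```

With this input, only the names that occur exactly once (Baker and Clark) remain in the result; Adams is filtered out by the HAVING clause.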
Objectives
   group variables into one- and two-dimensional arrays
   perform an action on array elements
   create new variables using an ARRAY statement
   assign initial values to array elements
   create temporary array elements using an ARRAY statement.
Defining an array
ARRAY statement:
   ARRAY array-name{dimension} elements;
where
   array-name specifies the name of the array
   dimension describes the number and arrangement of array elements
   elements lists the variables to include in the array.
Description of Finance.Sales91 :
Variable   Type   Length
SalesRep   char   8
Qtr1       num    8
Qtr2       num    8
Qtr3       num    8
Qtr4       num    8

To group the variables in the array, first give the array a name. In this example, make the array name sales.
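Equivalent ways to define the array:
   array sales{4} qtr1 qtr2 qtr3 qtr4;
   array sales{*} qtr1 qtr2 qtr3 qtr4;     /* {*}: SAS counts the elements */
   array sales{4} qtr1-qtr4;               /* numbered range list          */
The dimension can be enclosed in braces {4}, parentheses (4), or brackets [4].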
data finance.report(drop=i);     /* DROP= data set option used to drop i */
   set finance.qsales;
   array sale{4} sales1-sales4;
   array Goal{4} (9000 9300 9600 9900);
   array Achieved{4};
   do i=1 to 4;
      achieved(i)=100*sale(i)/goal(i);
   end;
run;
New variables created: Goal1-Goal4 and Achieved1-Achieved4.
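data hrd.convert;
   set hrd.fitclass;
   array wt(6) weight1-weight6;
   do i=1 to 6;
      wt(i)=wt(i)*2.2046;
   end;
run;

The same step with the DIM function, which returns the number of elements in the array, so the loop bound does not have to be hard-coded:

data hrd.convert;
   set hrd.fitclass;
   array wt(6) weight1-weight6;
   do i=1 to dim(wt);
      wt(i)=wt(i)*2.2046;
   end;
run;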
Creating Variables with the ARRAY Statement
For the above example:
data hrd.diff;
   set hrd.convert;
   array wt(6) weight1-weight6;
   array WgtDiff(5);     /* creates new variables */
When creating variables with an ARRAY statement, you do not need to specify array elements. Because you are not referencing existing variables, SAS software automatically creates the variables for you.
array WgtDiff(5);
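One plausible way to complete the step is to fill the new WgtDiff variables with the differences between consecutive weights. The sketch below is an assumption about the intended calculation; the WORK data sets and the sample values are made up so that the step can run on its own.

```sas
/* Hypothetical input: six successive weights for one person. */
data work.convert;
   input weight1-weight6;
   datalines;
180 178 175 174 172 170
;
run;

data work.diff(drop=i);
   set work.convert;
   array wt(6) weight1-weight6;
   array WgtDiff(5);                   /* SAS creates WgtDiff1-WgtDiff5 */
   do i = 1 to dim(WgtDiff);
      WgtDiff(i) = wt(i+1) - wt(i);    /* change from one weighing to the next */
   end;
run;
```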
* Temporary arrays are useful when the elements are needed only to perform a calculation. They save processing time and do not appear in the output data set.
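A temporary array is requested with the _TEMPORARY_ keyword in the ARRAY statement. The sketch below reworks the earlier goal calculation under that assumption; the WORK.REPORT data set and the sales figures are hypothetical.

```sas
/* Sketch only: data set name and sales values are assumptions. */
data work.report(drop=i);
   input sales1-sales4;
   array sale{4} sales1-sales4;
   array goal{4} _temporary_ (9000 9300 9600 9900);   /* no Goal1-Goal4 variables are created */
   array achieved{4};                                 /* creates Achieved1-Achieved4          */
   do i = 1 to 4;
      achieved{i} = 100 * sale{i} / goal{i};          /* percent of goal achieved */
   end;
   datalines;
9450 9300 9000 10890
;
run;
```

The output data set contains Sales1-Sales4 and Achieved1-Achieved4 only; the goal values exist just for the duration of the DATA step.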