
Lesson 1: Working in the SAS Environment

Introduction
The SAS environment is designed to be easy to use, with windows for all the basic SAS tasks. After you become familiar with the starting points for your SAS tasks, you are ready to use the full range of SAS software. This lesson shows you how to use SAS windows to manage your SAS session, work with files, and process SAS programs.

Objectives
In this lesson, you learn to
1. Use the Explorer, Program Editor, Log, Output, and Results windows
2. Use Enhanced Editor windows
3. Manage your SAS windows
4. Use window features, including menus, pop-up menus, and toolbars
5. Create SAS libraries
6. Explore and manage SAS files
7. Enter and submit SAS programs
8. Create and use file shortcuts

Using the Main SAS Windows
When you start SAS software, the five main SAS windows open by default:
I. Explorer
II. Program Editor
III. Log
IV. Output
V. Results
Note: In the Windows operating environment, an Enhanced Editor window (see Editor - Untitled1 in the illustration below) opens instead of the Program Editor window.


Complete SAS Windows

Features of SAS Windows
1. Maximize, minimize, and restore windows
2. Use menus, pop-up menus, and toolbars

I. The Explorer Window
In the Explorer window, you can
1. create new libraries and SAS files
2. open any SAS file
3. perform most file management tasks, such as moving, copying, and deleting files
4. create shortcuts to non-SAS files

II (A). The Program Editor Window
In the Program Editor window, you enter, edit, and submit SAS programs. You can also open existing SAS programs. To display the Program Editor window, select View > Program Editor.

Note: In the Windows operating environment, the Program Editor window is not displayed by default.

Features of the Program Editor Window

In operating environments that support drag and drop, you can open a SAS program by dragging it onto the Program Editor window. Alternatively, you can select File > Open and choose a program. In addition, you can
1. specify the number of lines to submit at a time
2. recall submitted statements
3. save contents automatically
4. clear contents
5. turn line numbers on and off
6. use the command line or menus
7. save your program

(B) Enhanced Editor Windows (Windows Operating Environment)
In the Windows operating environment, an Enhanced Editor window opens by default. You can use it to enter, edit, and submit SAS programs, just as you would use the Program Editor window. You can open multiple Enhanced Editor windows. To display an additional Enhanced Editor window, select View > Enhanced Editor.

III. Log Window
The Log window displays messages about your SAS session and about any SAS programs you submit. To display the Log window, select View > Log.

IV. Output Window
You can create two basic types of SAS output:
1. listing
2. HTML

Example report:

                  Height                    Weight
Gender     Mean     Std   Range     Mean     Std   Range
F         64.82    2.44    8.00   141.73   16.91   54.00
M         72.00    2.16    7.00   172.80   16.11   46.00
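The following is a minimal sketch that sends one report to both destinations at once. It assumes that the SASHELP.CLASS sample data set that ships with SAS software (variables Sex, Height, and Weight) is available. The file name report.html is hypothetical. The statistics it prints will not match the example report above.

```sas
/* Listing output goes to the Output window; ODS HTML also writes an HTML file. */
ods listing;
ods html body='report.html';   /* hypothetical output file name */

proc means data=sashelp.class mean std range maxdec=2;
   class Sex;                  /* one row per value of Sex */
   var Height Weight;
run;

ods html close;                /* finish and close the HTML file */
```

The HTML file stays incomplete until the ODS HTML CLOSE statement runs, so close the destination before you open the file.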

V. Results Window
The Results window helps you navigate and manage output from SAS programs that you submit. You can view, save, and print individual items of output. To display the Results window, select View > Results.

Creating SAS Libraries

Default Libraries

Defining Libraries
To define a library, you assign it a library name and specify a path, such as a directory. (It's a good idea to create the directory or other storage location before defining the library.) You also specify an engine, which is a set of internal instructions SAS software uses for writing to and reading from files in a library.
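The same three pieces (library name, engine, and path) can also be given in code. Lesson 4 covers this in more detail. In this sketch the libref CLINIC and the directory path are hypothetical. The V8 engine is named explicitly only to show where the engine goes; if you omit it, SAS software chooses a default engine.

```sas
/* Define the library: libref, engine, and path.
   The path is hypothetical; create the directory first. */
libname clinic v8 'c:\users\may\data';

/* Write the libref's engine and physical path to the Log window. */
libname clinic list;
```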

Using SAS Solutions and Tools
Along with windows for working with your SAS files and SAS programs, SAS software provides a set of ready-to-use solutions, applications, and tools. You can access many of these tools by using the Solutions and Tools menus.

Lesson 2

Basic Concepts
Introduction
To program effectively using SAS software, you need to understand basic concepts about SAS programs and the SAS files that they process in various ways. In particular, you need to be familiar with SAS data sets, which contain data that is logically arranged in a form accessible to SAS software. In this lesson, you'll examine a simple SAS program and see how it works. You'll learn details about the SAS data sets that it processes and find out about other types of SAS files. Finally, you'll see how SAS files are stored temporarily or permanently in SAS libraries.

Objectives

In this lesson, you learn about
1. the structure and components of SAS programs
2. the steps involved in processing SAS programs
3. the structure and components of SAS data sets
4. the two types of SAS data sets
5. SAS libraries and the types of SAS files that they contain
6. temporary and permanent SAS libraries

SAS Programs

Components of SAS Programs
Our sample SAS program contains two steps: a DATA step and a PROC step. These two types of steps, alone or combined, form all SAS programs.

DATA steps typically
1. put your data into a SAS data set
2. compute the values for new variables
3. check for and correct errors in your data
4. produce new SAS data sets by subsetting, merging, and updating existing data sets

PROC (procedure) steps typically
1. print a report
2. produce descriptive statistics
3. create a tabular report
4. produce plots and charts
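Here is a small program with one step of each kind. It is a sketch that assumes SASHELP.CLASS is available. The data set name WORK.TEENS is made up for illustration.

```sas
/* DATA step: read an existing data set and keep only students older than 13 */
data work.teens;
   set sashelp.class;
   where Age > 13;
run;

/* PROC step: print a report of the new data set */
proc print data=work.teens;
run;
```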

Characteristics of SAS Programs
A SAS program is made up of SAS statements. Each SAS statement
A) usually begins with a SAS keyword
B) always ends with a semicolon


Layout for SAS Programs

SAS statements are free-format, which means that
A) they can begin and end anywhere on a line
B) one statement can continue over several lines
C) several statements can be on one line
D) blanks or special characters separate "words" in a SAS statement
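The following sketch shows the same DATA step written twice, first in a conventional layout and then in free format. It assumes SASHELP.CLASS is available. The data set names are made up.

```sas
/* Conventional layout: one statement per line, with indentation */
data work.heights;
   set sashelp.class;
   keep Name Height;
run;

/* The same step in free format: two statements share the first line,
   and the KEEP statement continues over two lines */
data work.heights2; set sashelp.class;
keep Name
     Height; run;
```

Both steps produce the same data set. The conventional layout is easier to read and debug.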

Processing SAS Programs
When you submit a SAS program, SAS software reads the statements and checks them for errors. When it encounters a step boundary (a RUN statement, or a DATA or PROC statement that begins a new step), SAS software stops reading statements and executes the step that precedes the boundary. In our sample program, each step ends with a RUN statement.
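As a sketch of step boundaries, the DATA step below has no RUN statement of its own. The PROC statement that follows ends it. The example assumes SASHELP.CLASS is available, and WORK.GIRLS is a made-up name.

```sas
data work.girls;
   set sashelp.class;
   where Sex = 'F';
proc print data=work.girls;   /* step boundary: the DATA step above executes now */
run;                          /* step boundary: PROC PRINT executes now */
```

Ending every step with an explicit RUN statement, as the sample program does, makes the boundaries easier to see.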

Referencing SAS Files



SAS Data Sets


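Two-Level Names
To reference a SAS file in your SAS programs, you use a two-level name of the form libref.filename. For example, clinic.insure refers to the data set Insure in the library whose libref is Clinic.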
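The following sketch shows both naming forms, assuming SASHELP.CLASS is available. By default, a one-level name (no libref) refers to the temporary WORK library, whose files are deleted at the end of the SAS session. CLASSCOPY is a made-up name.

```sas
/* Two-level name: libref SASHELP, filename CLASS */
proc print data=sashelp.class;
run;

/* One-level name: SAS software uses the temporary WORK library,
   so this creates WORK.CLASSCOPY */
data classcopy;
   set sashelp.class;
run;

/* The same file referenced with its full two-level name */
proc print data=work.classcopy;
run;
```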

SAS Data Sets


Name      Sex   Age   Weight
Jones      M     48    128.6
Laverne    M     58    158.3
Jaffe      F      .    115.5
Wilson     M     28    170.1

The data portion consists of observations (rows) and variables (columns).

Descriptor Portion
The descriptor portion of a SAS data set contains information about the data set, including
1. the name of the data set
2. the date and time the data set was created
3. the number of observations
4. the number of variables

Let's look at a different SAS data set. The table below lists part of the descriptor portion of the data set Clinic.Insure, which contains insurance information for patients admitted to a wellness clinic. (Likewise, your data set names should describe the contents of the data set.)
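PROC CONTENTS displays the descriptor portion of a data set without its data values. This sketch uses SASHELP.CLASS so that it runs anywhere. For the data set discussed here you would write data=clinic.insure, which assumes that the CLINIC libref has already been assigned. The OUT= option also saves the variable information as a data set; WORK.CLASS_VARS is a made-up name.

```sas
/* Print the descriptor portion: data set name, creation date and time,
   number of observations and variables, and variable attributes */
proc contents data=sashelp.class;
run;

/* Optionally save one observation per variable, without printing */
proc contents data=sashelp.class out=work.class_vars noprint;
run;
```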



Rules for SAS Names
SAS names must
A) be 1 to 32 characters in length
B) begin with a letter (A-Z, including mixed-case characters) or an underscore (_)
C) continue with any combination of numbers, letters, or underscores
Examples: Payroll, LABDATA1995_1997, _EstimatedTaxPayments3



1. Variable Attributes: Name
Variable names follow the same rules. They must
A) be 1 to 32 characters in length
B) begin with a letter (A-Z, including mixed-case characters) or an underscore (_)
C) continue with any combination of numbers, letters, or underscores
Examples: Height, GLUCOSE_TOLERANCE_READING, AmountBudgeted_1999
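The sketch below uses example names from these rules, once as a data set name and several times as variable names. The assigned values are arbitrary placeholders.

```sas
/* Legal data set name: letters, digits, and an underscore (16 characters) */
data work.LABDATA1995_1997;
   Height = 64.8;                       /* arbitrary placeholder values */
   GLUCOSE_TOLERANCE_READING = 105;
   AmountBudgeted_1999 = 2500;
   _EstimatedTaxPayments3 = 1200;       /* a name may begin with an underscore */
run;

/* Names such as 1999Budget (begins with a digit) or Tax-Amt (contains a
   hyphen) are not valid SAS names. */
```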



2. Variable Attributes: Type
A variable is either character ($) or numeric.

3. Variable Type and Missing Values
For character variables, a blank represents a missing value. For numeric variables, a period represents a missing value.

4. Variable Attributes: Length
Character variables can be up to 32K long (the default length is 8). All numeric variables have a default length of 8. Numeric values, no matter how many digits they contain, are stored as floating-point numbers in 8 bytes of storage unless you specify a different length.
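The following is a minimal sketch of both kinds of missing value and of a non-default character length. The data set name and values are made up for illustration.

```sas
data work.missing_demo;
   length Name $ 20;   /* character variable stored in 20 bytes instead of 8 */
   Name = 'Jaffe';
   Age  = .;           /* a period is a missing numeric value */
   Sex  = ' ';         /* a blank is a missing character value */
run;

proc print data=work.missing_demo;
run;
```

In the PROC PRINT report, Age appears as a period and Sex appears as a blank.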

Version 8 of SAS software supports mixed-case and long names for data sets and other members of SAS libraries, and for variable names. The lengths of character variable values and labels have also increased in Version 8.

Element                       Max. Length in V6   Max. Length in V8
member names                  8 bytes             32 bytes
variable names                8 bytes             32 bytes
character variable values     200 bytes           32K
variable and member labels    40 bytes            256 bytes

Variable Attributes: Informats


Informats determine how data values are read into a SAS data set. You must use informats to read numeric values that contain letters or other special characters. For example, the numeric value $1,234 contains two special characters, a dollar sign ($) and a comma (,). You can use an informat (here, COMMA.) to read the value while removing the dollar sign and comma, and then store the resulting value as a standard numeric value.
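The following sketch reads the value $1,234 with the COMMA. informat. The width 6 matches the six characters of the field. The data set name is made up.

```sas
data work.amounts;
   input Amount comma6.;   /* COMMA6. reads 6 columns and removes the $ and the comma */
   datalines;
$1,234
;
run;

proc print data=work.amounts;
run;
```

Amount is stored as the standard numeric value 1234.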

Variable Attributes: Formats and Informats
Formats and informats (input formats) are variable attributes that affect the way data values are written and read, respectively. SAS software offers a variety of formats and informats for numeric and character data, including date and time values.

Part of the descriptor portion of Clinic.Insure:

Name     Type   Length   Format      Informat   Label
Policy   Num    8                               Policy Number
Total    Num    8        DOLLAR8.2   COMMA10.   Total Balance
Name     Char   20                              Patient Name
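The attributes in this table could be assigned in a DATA step with the ATTRIB statement, as sketched below. WORK.INSURE stands in for Clinic.Insure, and the data line is invented for illustration. The values are read with list input, which applies the COMMA10. informat declared in the ATTRIB statement.

```sas
data work.insure;
   attrib Policy length=8   label='Policy Number'
          Total  length=8   label='Total Balance'
                 informat=comma10. format=dollar8.2
          Name   length=$20 label='Patient Name';
   input Policy Total Name $;
   datalines;
12345 1,234.50 Jones
;
run;

/* The descriptor portion shows the types, lengths, formats, informats, and labels */
proc contents data=work.insure;
run;
```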

SAS Libraries
A SAS library can contain several types of SAS files:
1. SAS data sets
2. catalog files
3. MDDB files
4. view files

Lesson 3

Editing and Debugging SAS Programs


Introduction
In the previous lesson, you learned about SAS programs. Now that you're familiar with the basics, you can learn how to use SAS programming windows to edit and debug SAS programs effectively.

Objectives
In this lesson, you learn to
1. edit SAS programs
2. clear SAS programming windows
3. interpret error messages in the SAS log
4. correct errors
5. resolve common problems

Including a Stored SAS Program
So far, you've copied and pasted SAS programs into the Program Editor window and submitted them for execution. Now you'll learn how to include (copy) a stored program into the Program Editor window. You can include a program by using
1. File Shortcuts
2. My Favorite Folders
3. the Open dialog box

Using File Shortcuts
File Shortcuts are located in the Explorer window. To include a program by using File Shortcuts,
1. open File Shortcuts
2. double-click the file, or select Open from the pop-up menu for the file

Using My Favorite Folders
To include a file that is stored in My Favorite Folders,
1. select View > My Favorite Folders
2. double-click the file, or select Open from the pop-up menu for the file

Using the Open Dialog Box


1. Select the Program Editor window.
2. Select File > Open to display the Open dialog box.
3. Click Open or OK.

In the Windows environment, you can submit your file directly from the Open dialog box by selecting the Submit check box before you click Open.

SAS Program Structure
Remember that SAS programs are made up of SAS statements.

Program Editor Features
The Program Editor window allows you to enter and edit programs just as you would with a word processing program. You can also use and to insert, delete, move, and copy lines within the Program Editor window. Editor settings are available under Tools > Options > Editor.

Interpreting Error Messages

Error Types
So far, the programs you've submitted in this lesson have been error-free, but programming errors do occur. SAS can detect five types of errors:
1. Syntax errors are detected during compile time. They occur when program statements do not conform to the rules of the SAS language.
2. Semantic errors are detected during compile time. They occur when the form of the elements in a SAS statement is correct, but the elements are not valid for that usage.
3. Execution-time errors occur when the SAS System executes the program on data values.
4. Data errors are detected during execution time. They occur when some data values are not appropriate for the SAS statements you have specified in the program.
5. Macro-related errors occur when there are errors in using the macro facility itself or in the SAS code produced by the macro facility.
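Here is a small sketch of a data error. The program itself is valid, but one data value cannot be read as a number. The data set name and records are made up.

```sas
/* 'abc' is not valid numeric data for Age. SAS writes a NOTE about invalid
   data to the log, sets Age to missing and _ERROR_ to 1 for that
   observation, and keeps processing the remaining records. */
data work.data_error;
   input Name $ Age;
   datalines;
Jones 48
Jaffe abc
;
run;
```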

Syntax Errors
Syntax errors generally cause SAS software to stop processing the step where the error is encountered. Common syntax errors include
1. spelling mistakes
2. forgetting semicolons
3. leaving quotation marks unbalanced
4. specifying invalid options

When SAS software detects a syntax error, the Log window
1. displays the word ERROR
2. identifies the possible location of the error
3. prints an explanation of the error

Example
proc print data=clinic.admitee;
   var name fee;
run;

Note: This step fails if the libref Clinic has not been assigned.

Resolving the Common Problems


Missing RUN Statement

Message

Resolving the Problem
To correct the error, submit a RUN statement to complete the PROC step:
run;



Missing Semicolon
One of the most common errors is omitting a semicolon at the end of a statement. The program below is missing a semicolon at the end of the PROC PRINT statement.

Resolving the Problem
To correct the error, do the following:
1. Recall the program to the Program Editor window.
2. Find the missing semicolon and add it. You can usually locate the statement where the semicolon belongs by looking at the underscored keywords in the error message and working backward.
3. Resubmit the corrected program.
4. Check the Log window again to make sure that there are no other errors.
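The sketch below shows a missing semicolon and its correction, assuming SASHELP.CLASS is available. The faulty version is kept inside a comment so that only the corrected program runs.

```sas
/* Missing semicolon (do not submit). Without the semicolon after
   DATA=SASHELP.CLASS, SAS reads VAR NAME AGE as more PROC PRINT options.

   proc print data=sashelp.class
      var Name Age;
   run;
*/

/* Corrected program */
proc print data=sashelp.class;
   var Name Age;
run;
```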

Unbalanced Quotes
Some syntax errors, such as the missing quotation mark after HIGH in the program below, cause SAS to misinterpret the statements in your program.

Resolving the Problem
To resolve the error, submit a quotation mark followed by a semicolon and a RUN statement:

'; run;
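This sketch shows an unbalanced quotation mark and the corrected statement. It assumes SASHELP.CLASS is available. The faulty version is kept inside a block comment, where an unbalanced quotation mark does no harm.

```sas
/* Unbalanced quotation mark (do not submit). The quote before F is never
   closed, so SAS treats everything after it as part of a string.

   proc print data=sashelp.class;
      where Sex='F;
   run;
*/

/* Corrected program */
proc print data=sashelp.class;
   where Sex='F';
run;
```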

Invalid Option

Resolving the Problem
To correct the error, do the following:
1. Recall the program to the Program Editor window.
2. Remove or replace the invalid option, and check your statement syntax as needed.
3. Resubmit the corrected program.
4. Check the Log window again to make sure that there are no other errors.
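The following sketch shows an invalid option and its correction, assuming SASHELP.CLASS is available. KEYLABEL is used only because it is not a PROC PRINT option. The faulty version is kept inside a comment.

```sas
/* Invalid option (do not submit): KEYLABEL is not a PROC PRINT option.

   proc print data=sashelp.class keylabel;
   run;
*/

/* Corrected: LABEL is a valid PROC PRINT option
   (it uses variable labels as column headings) */
proc print data=sashelp.class label;
run;
```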

Lesson 4

Reading Raw Data

Objectives
In this lesson, you learn to

1. reference a SAS data library
2. reference a raw data file
3. name a SAS data set to be created
4. specify a raw data file to be read
5. read standard character and numeric values in fixed fields
6. submit and verify a DATA step program
7. subset data

Steps to Create a SAS Data Set

Step                             SAS statement
1. Reference SAS data library    LIBNAME statement
2. Reference external file       FILENAME statement
3. Name SAS data set             DATA statement
4. Identify external file        INFILE statement
5. Describe data                 INPUT statement
6. Execute DATA step             RUN statement
7. List the data                 PROC PRINT statement
8. Execute final program step    RUN statement

1. Using a LIBNAME Statement

Defining Libraries
You learned to assign library names by using SAS windows. You can also assign library names by using programming statements:
libname taxes 'c:\users\acct\qtr1\report';

How Long Librefs Remain in Effect
The LIBNAME statement is global, which means that its settings remain in effect until you modify them, cancel them, or end your SAS session.
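This sketch shows how to cancel a libref before the session ends. It reuses the path from the example above.

```sas
/* Assign the libref; it stays in effect for the rest of the session
   unless you change or cancel it */
libname taxes 'c:\users\acct\qtr1\report';

/* Cancel (deassign) the libref */
libname taxes clear;
```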

2. Using a FILENAME Statement

filename taxes 'c:\users\acct\qtr1\report';


3. General form, basic DATA statement:
DATA SAS-data-set;
where SAS-data-set is the name of the data set to be created.

Example: data budget_1999;

4. General form, INFILE statement:
INFILE file-specification <options>;
where file-specification can take the form
A) fileref, to name a previously defined file reference, or
B) 'filename', to point to the actual name and location of the file.

Example: infile 'c:\irs\personal\refund.dat';
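The following self-contained sketch shows the fileref form of the INFILE statement. A temporary file (FILENAME ... TEMP) is written first, only so that the example runs without an existing raw data file. REFUND is a hypothetical fileref.

```sas
/* Create a small temporary raw data file as a stand-in for a real file */
filename refund temp;
data _null_;
   file refund;
   put '2810 61 MOD  F';
   put '2804 38 HIGH F';
run;

/* Form 1: INFILE names a previously defined fileref */
data work.fromref;
   infile refund;
   input ID $ 1-4 Age 6-7 ActLevel $ 9-12 Sex $ 14;
run;

/* Form 2 (not run here): INFILE names the file directly in quotation marks,
   for example: infile 'c:\irs\personal\refund.dat'; */
```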

INPUT
General form, INPUT statement using column input:
INPUT variable <$> startcol-endcol . . . ;
where
variable is the SAS name you assign to the field
the dollar sign ($) identifies the variable type as character (nothing appears here if the variable is numeric)
startcol represents the starting column location in the data line for this variable
endcol represents the ending column location in the data line for this variable

Raw Data File Exercise

----+----10---+----20
2810 61 MOD  F
2804 38 HIGH F
2807 42 LOW  M
2816 26 HIGH M
2833 32 MOD  F
2823 29 HIGH M

input ID $ 1-4 Age 6-7 ActLevel $ 9-12 Sex $ 14;

data venu.sample;
   infile 'C:\v8\Desktop\sample.txt';
   input ID $ 1-4 Age 6-7 ActLevel $ 9-12 Sex $ 14;
run;

proc print data=venu.sample;
run;

List output
Obs   ID     Age   ActLevel   Sex
1     2810   61    MOD        F
2     2804   38    HIGH       F
3     2807   42    LOW        M
4     2816   26    HIGH       M
5     2833   32    MOD        F
6     2823   29    HIGH       M

Describing the Data
The INPUT statement describes the fields of raw data to be read and placed into the SAS data set.

To do this...                 Use this SAS statement...
Reference SAS data library     libname clinic 'c:\users\may\data';
Reference external file        filename tests 'c:\users\tmill.dat';
Name SAS data set              data clinic.stress;
Identify external file         infile tests;
Describe data                  INPUT statement
Execute the DATA step          run;

You can also place the raw data in the program itself, after a CARDS (or DATALINES) statement:

data q.sample;
   input ID $ 1-4 Age 6-7 ActLevel $ 9-12 Sex $ 14;
   cards;
2810 61 MOD  F
2804 38 HIGH F
2807 42 LOW  M
2816 26 HIGH M
2833 32 MOD  F
2823 29 HIGH M
;
run;

proc print data=q.sample;
run;

Setting System Options


To modify system options, you submit an OPTIONS statement. You can place an OPTIONS statement anywhere in a SAS program to change the current settings.

Regular options
DATE|NODATE, NUMBER|NONUMBER, PAGENO=, PAGESIZE=, LINESIZE=

Additional features
BYLINE|NOBYLINE, DETAILS|NODETAILS, FIRSTOBS=, FORMCHAR=, FORMDLIM=, LABEL|NOLABEL, OBS=, REPLACE|NOREPLACE, SOURCE|NOSOURCE, YEARCUTOFF=



If you use two-digit year values in your data lines, you should consider another important system option, the YEARCUTOFF= option. This option specifies which 100-year span is used to interpret two-digit year values.

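How the YEARCUTOFF= Option Works
When a two-digit year value is read, SAS software interprets it as a year within the 100-year span that begins with the YEARCUTOFF= value. For Version 8 of SAS software, the default value of YEARCUTOFF= is 1920, so two-digit years are interpreted as 1920 through 2019: values 20-99 become 1920-1999, and values 00-19 become 2000-2019.

With the default setting:

Expression   Interpreted
12/07/41     12/07/1941
18Dec15      18Dec2015
04/15/30     04/15/1930
15Apr95      15Apr1995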
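The following sketch reads the same two-digit years under two cutoffs. YEARCUTOFF= is applied when the data values are read, so the OPTIONS statement must come before the DATA step. The data set names are made up. Later SAS releases may use a different default, so the last statement restores the Version 8 value explicitly.

```sas
/* Span 1920-2019: 41 -> 1941, 30 -> 1930 */
options yearcutoff=1920;
data work.cut1920;
   input D mmddyy8.;
   format D mmddyy10.;
   datalines;
12/07/41
04/15/30
;
run;

/* Span 1950-2049: the same two-digit years become 2041 and 2030 */
options yearcutoff=1950;
data work.cut1950;
   input D mmddyy8.;
   format D mmddyy10.;
   datalines;
12/07/41
04/15/30
;
run;

/* Write the current setting to the log, then restore the V8 default */
proc options option=yearcutoff;
run;
options yearcutoff=1920;
```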

Example:

options nonumber nodate;
proc print data=sales.qtr;
   var salesrep type unitsold region;
   where unitsold>=30;
run;

options date;
proc freq data=sales.qtr4;
   where unitsold>=30;
   tables salesrep;
run;

Lesson 5

Understanding DATA Step Processing


Introduction
In previous lessons you learned how to read data, perform basic modifications, and create a new SAS data set. This lesson teaches you what happens "behind the scenes" when DATA steps are processed. We'll do that by examining the program data vector, a logical framework that the SAS System uses when creating SAS data sets.

Objectives
1. Identify the two phases that occur when a DATA step is processed
2. Identify the processing phase in which an error occurs
3. Debug SAS DATA steps
4. Describe the compilation and execution phases

Writing Basic DATA Steps
If you completed the prerequisites for this module, you learned how to write a DATA step to create a permanent SAS data set from raw data in an external file.

data clinic.stress;
   infile tests;
   input ID 1-4 Name $ 6-25 RestHR 27-29 MaxHR 31-33
         RecHR 35-37 TimeMin 39-40 TimeSec 42-43
         Tolerance $ 45;
run;

Raw Data File Tests

----+----1----+----2----+----3----+----4----+
2458 Murray, W             72 185 128 12 38 D
2462 Almers, C             68 171 133 10  5 I
2501 Bonaventure, T        78 177 139 11 13 I
2523 Johnson, R            69 162 114  9 42 S
2539 LaMance, K            75 168 141 11 46 D
2544 Jones, M              79 187 136 12 26 N
2552 Reberson, P           69 158 139 15 41 D
2555 King, E               70 167 122 13 13 I

data clinic.stress;
   infile tests;
   input ID 1-4 Name $ 6-25 RestHR 27-29 MaxHR 31-33
         RecHR 35-37 TimeMin 39-40 TimeSec 42-43
         Tolerance $ 45;
run;

proc print data=clinic.stress;
run;

How SAS Software Processes Programs
When you submit a DATA step, SAS software processes the DATA step and creates a new SAS data set. Let's see exactly how that happens. A SAS DATA step is processed in two distinct phases:
1. the compilation phase
2. the execution phase

During the compilation phase, each statement is scanned for syntax errors. Most syntax errors prevent further processing of the DATA step. If the DATA step compiles successfully, the execution phase begins. A DATA step executes once for each observation in the input data set, unless otherwise directed.

Compilation Phase

At the beginning of the compilation phase, the input buffer (an area of memory) is created to hold a record from the external file. The input buffer is created only when raw data is read, not when a SAS data set is read. The term input buffer refers to a logical concept and does not necessarily reflect the physical storage of data.

The program data vector contains two automatic variables that can be used for processing but are not written to the data set as part of an observation:
1. _N_ counts the number of times that the DATA step has begun to execute.
2. _ERROR_ signals the occurrence of an error caused by the data during execution. The default value is 0, which means there is no error. When an error occurs, whether one error or a number of errors, the value is set to 1.
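Because _N_ and _ERROR_ are not written to the data set, you must copy an automatic variable into an ordinary variable if you want to keep its value. The following sketch assumes SASHELP.CLASS is available. RowNum and WORK.NUMBERED are made-up names.

```sas
data work.numbered;
   set sashelp.class;
   RowNum = _N_;    /* copy the automatic variable into an ordinary variable */
run;

/* WORK.NUMBERED contains RowNum, but no _N_ or _ERROR_ variable */
proc contents data=work.numbered;
run;
```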


During the compilation phase, SAS software also scans each statement in the DATA step, looking for syntax errors. Syntax errors include
1. missing or misspelled keywords
2. invalid variable names
3. missing or invalid punctuation
4. invalid options

As the INPUT statement is compiled, a slot is added to the program data vector for each variable in the input data set. Generally, variable attributes such as length and type are determined the first time that a variable is encountered.

data perm.update;
   infile invent;
   input Item $ 1-13 IDnum $ 15-19
         InStock 21-22 BackOrd 24-25;
   Total=instock+backord;
run;

Any variables created in the DATA step are also added to the program data vector. For example, the assignment statement Total=instock+backord; creates the variable Total. As the statement is compiled, the variable is added to the program data vector. The attributes of Total are determined by the expression in the statement. Because the expression produces a numeric value, Total is defined as a numeric variable and assigned a default length of 8.

At the bottom of the DATA step (in this example, when the RUN statement is encountered), the compilation phase is complete and the descriptor portion of the new SAS data set is created. The descriptor portion of the data set includes
1. the name of the data set
2. the number of observations and variables
3. the names and attributes of the variables

Execution Phase

After the DATA step is compiled, it is ready for execution. During the execution phase, the data portion of the data set is created. The data portion contains the data values.

During execution, each observation in the input data set is processed, stored in the program data vector, and then written to the new data set as an observation, unless otherwise directed. The DATA step executes once for each observation in the input data set, unless otherwise directed. For example, this DATA step reads values from the file Invent and executes nine times because there are nine records in the file.
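The following self-contained sketch runs the inventory DATA step. The nine records of Invent, as shown in this lesson, are placed after a DATALINES statement instead of being read with INFILE INVENT. This is an assumption made so that the sketch runs without the external file. WORK.UPDATE stands in for Perm.Update. The PUT _ALL_ statement writes the program data vector to the log on each iteration.

```sas
data work.update;
   input Item $ 1-13 IDnum $ 15-19 InStock 21-22 BackOrd 24-25;
   Total = InStock + BackOrd;
   put _all_;   /* PDV contents, including _N_ and _ERROR_; the log shows _N_=1 through _N_=9 */
   datalines;
Bird Feeder   LG088  3 20
Glass Mugs    SB082  6 12
Glass Tray    BQ049 12  6
Padded Hangrs MN256 15 20
Jewelry Box   AJ498 23  0
Red Apron     AQ072  9 12
Crystal Vase  AQ672 27  0
Picnic Basket LS930 21  0
Brass Clock   AN910  2 10
;
run;
```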


At the beginning of the execution phase, the value of _N_ is 1. Because there are no data errors, the value of _ERROR_ is 0.

The remaining variables are initialized to missing. Missing numeric values are represented by a period and missing character values are represented by a blank.

Next, the INFILE statement identifies the location of the raw data.


When an INPUT statement begins to read data values from a record, it uses an input pointer to keep track of its position. The input pointer starts at column 1 of the first record, unless otherwise directed. As the INPUT statement executes, the raw data in columns 1-13 are read and assigned to Item in the program data vector.

Raw Data File Invent

V---+----1----+----2----+
Bird Feeder   LG088  3 20
Glass Mugs    SB082  6 12
Glass Tray    BQ049 12  6
Padded Hangrs MN256 15 20
Jewelry Box   AJ498 23  0
Red Apron     AQ072  9 12
Crystal Vase  AQ672 27  0
Picnic Basket LS930 21  0
Brass Clock   AN910  2 10

Notice that the input pointer now rests on column 14. With column input, the pointer moves as far as the INPUT statement instructs it and stops in the column immediately following the last one read.

Next, the data in columns 15-19 are read and assigned to IDnum in the program data vector.

At the end of the DATA step, three default actions occur. First, the values in the program data vector are written to the data set as the first observation. Next, control returns to the top of the DATA step, and the value of _N_ is incremented to 2. Finally, the variables that are read by the INPUT statement or created by assignment statements are reset to missing in the program data vector.

SAS Data Set Perm.Update

Item          IDnum   InStock   BackOrd   Total
Bird Feeder   LG088         3        20      23

Debugging a DATA Step


Diagnosing Syntax Errors
Now that you know how a DATA step is processed, let's use that knowledge to debug some errors. For example, suppose your DATA step contains syntax errors. In the example below, the keyword DATA is misspelled. When the DATA step is submitted for processing, SAS software interprets the misspelled word as DATA, and the DATA step compiles successfully. SAS software underlines the detected error, identifies it with a number, and writes a corresponding warning message in the log.

daat perm.update;
   infile invent;
   input Item $ 1-13 IDnum $ 15-19
         InStock 21-22 BackOrd 24-25;
   Total=instock+backord;
run;



SAS Log

07   daat perm.update;
     ----
     14
WARNING 14-169: Assuming the symbol DATA was misspelled as daat.
08   infile invent;
09   input Item $ 1-13 IDnum $ 15-19
10         InStock 21-22 BackOrd 24-25;
11   Total=instock+backord;
12   run;



Incorrectly identifying a variable's type is another common execution-time error. As you know, the values for IDnum are character values. Suppose you forget to place the dollar sign ($) after the variable's name in your INPUT statement. This is not a compile-time error, because SAS software cannot verify IDnum's type until the data values for IDnum are read.

----+----1----+----2----+
Bird Feeder   LG088  3 20
Glass Mugs    SB082  6 12
Glass Tray    BQ049 12  6
Padded Hangrs MN256 15 20
Jewelry Box   AJ498 23  0
Red Apron     AQ072  9 12
Crystal Vase  AQ672 27  0

In this case, the DATA step completes the execution phase and the observations are written to the data set. However, several notes appear in the log.

SAS Log
NOTE: Invalid data for IDnum in line 7 15-19.
RULE:      ----+----1----+----2----+----3----+----4
07         Crystal Vase  AQ672 27  0
Item=Crystal Vase IDnum=. InStock=27 BackOrd=0 Total=27 _ERROR_=1 _N_=7
NOTE: Invalid data for IDnum in line 8 15-19.
08         Picnic Basket LS930 21  0
Item=Picnic Basket IDnum=. InStock=21 BackOrd=0 Total=21 _ERROR_=1 _N_=8
NOTE: Invalid data for IDnum in line 9 15-19.
09         Brass Clock   AN910  2 10
Item=Brass Clock IDnum=. InStock=2 BackOrd=


The PRINT procedure displays the data set with missing values for IDnum. In this example, the periods indicate that IDnum is a numeric variable, although it should be defined as a character variable.

Obs   Item            IDnum   InStock   BackOrd   Total
  1   Bird Feeder       .           3        20      23
  2   Glass Mugs        .           6        12      18
  3   Glass Tray        .          12         6      18
  4   Padded Hangrs     .          15        20      35
  5   Jewelry Box       .          23         0      23
  6   Red Apron         .           9        12      21
  7   Crystal Vase      .          27         0      27
  8   Picnic Basket     .          21         0      21
  9   Brass Clock       .           2        10      12
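The following sketch reproduces the error on two of the records and then shows the fix. The data set names are made up, and the records come from the raw data file Invent shown above.

```sas
/* Without the $, IDnum is numeric: LG088 and SB082 are invalid numeric data,
   so IDnum is set to missing and notes appear in the log */
data work.update_bad;
   input Item $ 1-13 IDnum 15-19 InStock 21-22 BackOrd 24-25;
   datalines;
Bird Feeder   LG088  3 20
Glass Mugs    SB082  6 12
;
run;

/* With the $, IDnum is character and the values are read correctly */
data work.update_ok;
   input Item $ 1-13 IDnum $ 15-19 InStock 21-22 BackOrd 24-25;
   datalines;
Bird Feeder   LG088  3 20
Glass Mugs    SB082  6 12
;
run;
```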

Lesson 6

Reading Raw Data in Fixed Fields
How your data is organized determines which input style you should use to read the data. SAS software provides three primary input styles: column, formatted, and list input. This lesson teaches you how to use column or formatted input to read data that is arranged in fixed fields.

Objectives
1. Distinguish between standard and nonstandard numeric data
2. Read standard fixed-field data
3. Read nonstandard fixed-field data

Input Styles
1. Column input
2. Formatted input
3. List input
4. Mixed input
5. Named input

1. Column Input

In the INPUT statement below, the variables are listed with their corresponding column locations in order from left to right. However, one of the features of column input is the capability to read fields in any order. For example, you could have read the values for InStock and BackOrd before the values for Item and IDnum.

input Item $ 1-13 IDnum $ 15-19 InStock 21-22 BackOrd 24-25;
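The sketch below reads the fields in a different order, using two records from the inventory file. The data set name is made up. Variables are added to the program data vector in the order in which they are listed.

```sas
data work.anyorder;
   /* Read columns 21-25 first, then go back to columns 1-19 */
   input InStock 21-22 BackOrd 24-25 Item $ 1-13 IDnum $ 15-19;
   datalines;
Bird Feeder   LG088  3 20
Glass Mugs    SB082  6 12
;
run;
```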

Column Input Features


Character variable values can be up to 32K long and can contain embedded blanks.
input Name $ 1-25;

>----+----10---+----20---+
JOSEPH PAUL THACKERY JR.

No placeholder is required for missing data. A blank field is read as missing and does not cause other fields to be read incorrectly.

input Item $ 1-13 IDnum $ 15-19 InStock 21-22 BackOrd 24-25;

>----+----10---+----20---+BIRD FEEDER LG088 3 20 6 GLASS MUGS SB082 12 GLASS TRAY BQ049 12 6

Fields or parts of fields can be reread.

input Item $ 1-13 IDnum $ 15-19 Supplier $ 15-16 InStock 21-22 BackOrd 24-25;

----+----10---+----20---+
PADDED HANGRS MN256 15 20
JEWELRY BOX   AJ498 23  0
RED APRON     AQ072  9 12
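The following sketch rereads columns 15-16 as Supplier. It also shows that a blank field is simply read as missing. The last record, with InStock left blank, is invented for illustration, and the data set name is made up.

```sas
data work.reread;
   /* Supplier rereads the first two columns of the IDnum field */
   input Item $ 1-13 IDnum $ 15-19 Supplier $ 15-16 InStock 21-22 BackOrd 24-25;
   datalines;
PADDED HANGRS MN256 15 20
JEWELRY BOX   AJ498 23  0
RED APRON     AQ072  9 12
GLASS MUGS    SB082    12
;
run;
```

In the last observation, InStock is missing and the other fields are still read correctly.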

Fields do not have to be separated by blanks or other delimiters.

input Item $ 1-13 IDnum $ 14-18 InStock 19-20 BackOrd 21-22;


----+----10---+----20---+
PADDED HANGRSMN2561520
JEWELRY BOX  AJ49823 0
RED APRON    AQ072 912
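The following sketch applies this INPUT statement to the records above. The data set name is made up. Because every field is located by its columns, no delimiters are needed.

```sas
data work.nodelim;
   input Item $ 1-13 IDnum $ 14-18 InStock 19-20 BackOrd 21-22;
   datalines;
PADDED HANGRSMN2561520
JEWELRY BOX  AJ49823 0
RED APRON    AQ072 912
;
run;
```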

Identifying Nonstandard Numeric Data

Standard Numeric Data
Standard numeric data values can contain only
1. numbers
2. decimal points
3. numbers in scientific, or E, notation (23E4)
4. minus signs

Nonstandard Numeric Data


Nonstandard numeric data include
1. values that contain special characters, such as percent signs (%), dollar signs ($), and commas (,)
2. date and time values
3. data in fraction, integer binary and real binary, and hexadecimal forms

Choosing an Input Style
Nonstandard data values require an input style with more flexibility than column input. You can use formatted input, which combines the features of column input with the ability to read both standard and nonstandard data.

Standard or Nonstandard Fixed-Field Data
Whenever you encounter raw data that is organized into fixed fields, you can use
1. column input to read standard data only
2. formatted input to read both standard and nonstandard data
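The following sketch reads one salary value in both ways. The record comes from the employee data used later in this lesson, and the data set names are made up.

```sas
/* Column input: "29,996.63" is nonstandard (it contains a comma), so
   Salary is set to missing and an invalid-data note is written to the log */
data work.col_try;
   input LastName $ 1-7 Salary 19-27;
   datalines;
EVANS   DONNY 112 29,996.63
;
run;

/* Formatted input: the COMMA9. informat reads the same field correctly */
data work.fmt_try;
   input LastName $7. @19 Salary comma9.;
   datalines;
EVANS   DONNY 112 29,996.63
;
run;
```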

Using Formatted Input
Formatted input is a very powerful method for reading both standard and nonstandard data in fixed fields.

General form, INPUT statement using formatted input:
INPUT pointer-control variable informat.;
where
pointer-control positions the input pointer on a specified column
variable is the name of the variable being created
informat is the special instruction that specifies how SAS software reads raw data.

In this lesson, you'll be working with two column pointer controls:
1. @n moves the input pointer to a specific column number.
2. +n moves the input pointer forward, to a column number relative to the current position.

V---+----10---+----20---+--
EVANS   DONNY 112 29,996.63
HELMS   LISA  105 18,567.23
HIGGINS JOHN  111 25,309.00
LARSON  AMY   113 32,696.78
MOORE   MARY  112 28,945.89
POWELL  JASON 103 35,099.50
RILEY   JUDY  111 25,309.00

Using the @n Column Pointer Control

The @n is an absolute pointer control that moves the input pointer to a specific column number. The @ moves the pointer to column n, which is the beginning column of the field being read.
input @n variable informat.;

For example:
input @1 LastName $7. @9 FirstName $5.

The default column pointer location is column 1, so a column pointer control is not required to read the first field.

Reading Columns in Any Order

Column pointer controls are very useful. For instance, you can use @n to move the pointer forward or backward when reading a record. In this INPUT statement, the value for FirstName is read first, starting in column 9.

input @9 FirstName $5.

        V
----+----10---+----20---+--
EVANS   DONNY 112 29,996.63
HELMS   LISA  105 18,567.23
HIGGINS JOHN  111 25,309.00

Now let's read the values for LastName, which begin in the first column. Here, you must use the @n pointer control to move the pointer back to column 1.

input @9 FirstName $5. @1 LastName $7.

V
----+----10---+----20---+--
EVANS   DONNY 112 29,996.63
HELMS   LISA  105 18,567.23
HIGGINS JOHN  111 25,309.00

The rest of the INPUT statement indicates the column locations of the raw data values for JobTitle and Salary.

input @9 FirstName $5. @1 LastName $7. @15 JobTitle 3. @19 Salary comma9.;

              V   V
----+----10---+----20---+--
EVANS   DONNY 112 29,996.63
HELMS   LISA  105 18,567.23
HIGGINS JOHN  111 25,309.00

The +n Pointer Control


The +n is a relative pointer control that moves the input pointer forward to a column number relative to the current position. The + advances the pointer n columns.

General form: input +n variable informat.;

In order to count correctly, it is important to understand just where the column pointer is located after each data value is read. Let's look at an example.

With formatted input, the column pointer moves to the first column after the field just read. In this example, after LastName is read, the pointer moves to column 8.

To start reading FirstName, beginning in column 9, you move the column pointer control ahead 1 column with +1.

input LastName $7. +1 FirstName $5.

After reading FirstName, the column pointer moves to column 14. Now you want to skip over the values for JobTitle and read the values for Salary, which begin in column 19. So you move the column pointer ahead 5 columns from column 14.

input LastName $7. +1 FirstName $5. +5 Salary comma9.

       V
----+----10---+----20---+--
EVANS   DONNY 112 29,996.63
HELMS   LISA  105 18,567.23
HIGGINS JOHN  111 25,309.00

The last field to be read contains the values for JobTitle. You can use the @n column pointer control to return to column 15.

input LastName $7. +1 FirstName $5. +5 Salary comma9. @15 JobTitle 3.;
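The two INPUT statements above can be checked against each other. The following is a minimal sketch, not the lesson's own program: it assumes the employee records are supplied as in-stream data with DATALINES rather than read from an external file, and the data set names work.absolute and work.relative are hypothetical. Both steps should produce the same values.

```sas
/* Layout: LastName 1-7, FirstName 9-13, JobTitle 15-17,  */
/* Salary 19-27 (nonstandard: it contains a comma).       */
data work.absolute;
   /* @n: absolute column positions for every field */
   input @1 LastName $7. @9 FirstName $5.
         @15 JobTitle 3. @19 Salary comma9.;
   datalines;
EVANS   DONNY 112 29,996.63
HELMS   LISA  105 18,567.23
HIGGINS JOHN  111 25,309.00
;
run;

data work.relative;
   /* +1 skips column 8; after FirstName the pointer is at  */
   /* column 14, so +5 lands on column 19 for Salary; @15   */
   /* then moves the pointer back to read JobTitle.         */
   input LastName $7. +1 FirstName $5. +5 Salary comma9.
         @15 JobTitle 3.;
   datalines;
EVANS   DONNY 112 29,996.63
HELMS   LISA  105 18,567.23
HIGGINS JOHN  111 25,309.00
;
run;

proc print data=work.relative;
run;
```

Because JobTitle is read last in the second step, it appears after Salary in the printed report; the values themselves are the same in both data sets.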

Lesson : 7

Reading Free-Format Data


This external file contains data that is arranged in columns, or fixed fields. You can specify a beginning and ending column for each field.

----+----10---+----20---+----30-
BIRD FEEDER   LG088  3 20
GLASS MUGS    SB082  6 12
GLASS TRAY    BQ049 12  6
PADDED HANGRS MN256 15 20
JEWELRY BOX   AJ498 23  0
RED APRON     AQ072  9 12
CRYSTAL VASE  AQ672 27  0
PICNIC BASKET LS930 21  0

However, the following external file contains data that is free-format, meaning data that is not arranged in columns. Notice that the values for a particular field do not begin and end in the same columns.

----+----10---+----20---+----30-
ABRAMS L. MARKETING $18,209.03
BARCLAY M. MARKETING $18,435.71
COURTNEY W. MARKETING $20,006.16
FARLEY J. PUBLICATIONS $21,305.89
HEINS W. PUBLICATIONS $20,539.23

Objectives

In this lesson, you learn to read
free-format data, or data that is not organized in fixed fields
free-format data separated by nonblank delimiters, such as commas
free-format data that contains missing values
character values that exceed eight characters
nonstandard free-format data
character values that contain embedded blanks.

Free-Format Data
Suppose you have raw data that is free-format; that is, it is not arranged in fixed fields. The fields may be separated by blanks or some other delimiter, as shown below. Column and formatted input that you may have used before to read standard and nonstandard data in fixed fields won't work in this case.

----+----10---+----20---+---
ABRAMS*L.*MARKETING*$8,209
BARCLAY*M.*MARKETING*$8,435
COURTNEY*W.*MARKETING*$9,006
FARLEY*J.*PUBLICATIONS*$8,305
HEINS*W.*PUBLICATIONS*$9,539

List Input
Characteristics of the list input style:
1. Fields must be separated by at least one blank (or another delimiter that you specify).
2. Each field must be specified in order.
3. Missing values must be represented by a period (.).
4. Character values can't contain embedded blanks.
5. The default length of character variables is 8 bytes; a longer value is truncated when it is written to the program data vector.
6. Data must be standard character or standard numeric values.
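A minimal sketch of two of these rules, assuming in-stream data and a hypothetical data set name (work.bank8): a character value longer than eight characters is truncated, and a period marks a missing numeric value.

```sas
data work.bank8;
   /* List input: fields separated by blanks, read in order.  */
   /* BANK gets the default length of 8 bytes, so the value   */
   /* Statebankindia is stored as Stateban.                   */
   input bank $ june;
   datalines;
Statebankindia 23
HSBC .
;
run;

proc print data=work.bank8;
run;
```

The period in the second record is read as a missing value for june. If the field were simply left empty, SAS would look for the value on the next record instead.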

input Gender $ Age Bankcard Deptcard FreqDept;

data a;
   input name $ age sal;
   cards;
venu 24 456.09
inder 25 467.17
reddy 21 766.36
hanu 26 765.89
;
run;
proc print data=a; run;

Obs    name     age       sal
 1     venu      24    456.09
 2     inder     25    467.17
 3     reddy     21    766.36
 4     hanu      26    765.89

Working With Delimiters


Most free-format data fields are clearly separated by blanks and are easy to imagine as variables and observations. But fields can also be separated by nonblank delimiters, such as commas, as shown below.
Raw data (RAINFALL):
VIJ,27,1,8,0,0
GUNTUR,29,3,14,5,10
ONGOLE,34,2,10,3,3
NELLORE,35,2,12,4,8
E.G,45,2,34,6,90
W.G,56,23,24,5,6

Specify the delimiter with the DLM= option in the INFILE statement:

data rain;
   infile rf dlm=',';
   input DIST $ mcode june july aug sept;
run;
proc print data=rain; run;

Obs    DIST       mcode    june    july    aug    sept
 1     VIJ          27       1       8      0       0
 2     GUNTUR       29       3      14      5      10
 3     ONGOLE       34       2      10      3       3
 4     NELLORE      35       2      12      4       8
 5     E.G          45       2      34      6      90
 6     W.G          56      23      24      5       6

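MISSOVER
By default, when a record runs out of values before every variable in the INPUT statement has been read, SAS continues reading on the next record. To prevent this, specify the MISSOVER option in the INFILE statement: SAS then sets the remaining variables to missing instead of moving to the next record. Note that with list input, a value that is missing in the middle of a record must still be represented by a period (.).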

infile credit missover;

Raw data (CREDIT):
SBI 23 . 45 44
AB 34 45 . 56
HSBC 45 56 . 57
AMBRO . 34 54 56
ICICI 23 . 87 98

data BANK;
   infile credit missover;
   input BANK $ june july aug sept;
run;
proc print; run;

Output using the MISSOVER option:

Obs    BANK     june    july    aug    sept
 1     SBI        23      .      45      44
 2     AB         34     45       .      56
 3     HSBC       45     56       .      57
 4     AMBRO       .     34      54      56
 5     ICICI      23      .      87      98

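Output without the MISSOVER option:

Obs    BANK     june    july    aug    sept
 1     SBI        23     45      44       .
 2     HSBC       45     56       .      57
 3     AMBRO       .     34      54      56
 4     ICICI      23      .      87      98

This output comes from a version of the file in which the SBI record is short (only three values, such as SBI 23 45 44). Without MISSOVER, SAS moves to the next record, AB, to look for a value for sept; AB is not a valid number, so sept is set to missing, and the rest of the AB record is skipped. As a result, the AB observation is lost.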

Changing the Length of Character Values
Remember that when you use list input to read raw data, character values are assigned a default length of 8. Let's take a look at what happens when list input is used to read character values that are longer than 8 characters.

length City $ 12;

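Note: Place the LENGTH statement before the INPUT statement. The length of a variable is set the first time the variable appears in the DATA step, so a LENGTH statement that follows the INPUT statement cannot change it.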

Raw data (Credit):
Statebankindia, 23, ., 45, 44,
Abdhrabankmysore, 34, 45, ., 56,
HSBCbankusa, 45, 56, ., 57,
AMBRObankcanada, ., 34, 54, 56,
ICICIbankindia, 23, ., 87, 98,

data BANK;
   infile credits dlm=', ';
   length bank $ 16;
   input bank $ june july aug sept;
run;
proc print; run;

Obs    bank                june    july    aug    sept
 1     Statebankindia        23      .      45      44
 2     Abdhrabankmysore      34     45       .      56
 3     HSBCbankusa           45     56       .      57
 4     AMBRObankcanada        .     34      54      56
 5     ICICIbankindia        23      .      87      98

Modifying List Input
You can make list input more versatile by using modified list input. There are two modifiers that can be used with list input.

The ampersand (&) modifier is used to read character values that contain embedded blanks. The colon (:) modifier is used to read nonstandard data values and character values longer than eight characters, but without embedded blanks.

Ampersand (&) modifier

The & indicates that a character value may have one or more single embedded blanks. SAS reads the value until it encounters two or more consecutive blanks, so the first occurrence of at least two consecutive blanks indicates the end of the value.

In the data below, each bank name is followed by two blanks:

data BANK;
   infile cards missover;
   input bank & $19. june july aug sept;
   cards;
State bank india  23 . 45 44
Abdhra bank mysore  34 45 . 56
HSBC bank usa  45 56 . 57
AMBRO bank canada  . 34 54 56
ICICI bank india  23 . 87 98
;
run;
proc print; run;

Obs    bank                  june    july    aug    sept
 1     State bank india        23      .      45      44
 2     Abdhra bank mysore      34     45       .      56
 3     HSBC bank usa           45     56       .      57
 4     AMBRO bank canada        .     34      54      56
 5     ICICI bank india        23      .      87      98

Reading Nonstandard Values


The colon (:) modifier enables you to read nonstandard data values and character values longer than eight characters, but without embedded blanks. The : indicates that values are read until a blank (or other delimiter) is encountered, and then an informat is applied. If an informat for reading character values is specified, the w value determines the variable's length, overriding the default length.

The colon (:) format modifier enables you to use list input but also to specify an informat after a variable name, whether character or numeric. SAS reads the value until it encounters a blank column (or other delimiter).

Raw data (Credit):
NEW YORK  7,262,700,898 AMBRO BAN
LOS ANGELES  3,259,340,889 HSBC BANK
CHICAGO USA  3,009,530,909 SCOOTIA BANK
HOUSTON CITY  1,728,910,890 AMERICAN EXPRESS
PHILADELPHIA  1,642,900,878 CITY BANK

data cityrank;
   infile credit;
   input City & $13. AMOUNT :comma13. BANKS & $16.;
   format AMOUNT comma13.;
run;
proc print; run;

Output without the FORMAT statement:

Obs    City            AMOUNT        BANKS
 1     NEW YORK        7262700898    AMBRO BAN
 2     LOS ANGELES     3259340889    HSBC BANK
 3     CHICAGO USA     3009530909    SCOOTIA BANK
 4     HOUSTON CITY    1728910890    AMERICAN EXPRESS
 5     PHILADELPHIA    1642900878    CITY BANK

Output with FORMAT AMOUNT COMMA13.:

Obs    City                   AMOUNT    BANKS
 1     NEW YORK        7,262,700,898    AMBRO BAN
 2     LOS ANGELES     3,259,340,889    HSBC BANK
 3     CHICAGO USA     3,009,530,909    SCOOTIA BANK
 4     HOUSTON CITY    1,728,910,890    AMERICAN EXPRESS
 5     PHILADELPHIA    1,642,900,878    CITY BANK
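A smaller sketch of the colon modifier on its own, assuming in-stream data and a hypothetical data set name (work.salaries): :$12. lets a character value be longer than eight characters, and :comma10. reads a nonstandard number, while both fields are still read with list input, up to the next blank.

```sas
data work.salaries;
   /* Without the colon, $12. would be formatted input and */
   /* would read exactly 12 columns, running into Salary.  */
   input Name :$12. Salary :comma10.;
   datalines;
Abdhrabank 18,209.03
HSBC 1,200
;
run;

proc print data=work.salaries;
run;
```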

Mixed input Style


data cityrank;
   input @15 BANKS & $16.   /* list input with the & modifier */
         @1 City $10.        /* formatted input */
         AMOUNT 11-13;       /* column input */
   cards;
NEWYARK   700 AMBRO BAN
LOSENJELS 259 HSBC BANK
CHICAGO   129 SCOOTIA BANK
HOUUGES   728 AMERICAN EXPRESS
PHILIPHS  642 CITY BANK
;
run;
proc print; run;

Output of the mixed input style program:

Obs    BANKS               City         AMOUNT
 1     AMBRO BAN           NEWYARK        700
 2     HSBC BANK           LOSENJELS      259
 3     SCOOTIA BANK        CHICAGO        129
 4     AMERICAN EXPRESS    HOUUGES        728
 5     CITY BANK           PHILIPHS       642

Named input style


With named input, each value in the data lines is preceded by the variable name and an equal sign.

data games;
   length NAME $ 10;
   input NAME= $ IOD1= IOD2= IOD3=;
   datalines;
NAME=GANGULY IOD1=12 IOD2=37 IOD3=23
NAME=SHEVAHG IOD1=67 IOD2=109 IOD3=76
NAME=DRAVID IOD1=22 IOD2=50 IOD3=90
NAME=TENDULKHAR IOD1=34 IOD2=59 IOD3=120
;
run;
proc print; run;

Obs    NAME          IOD1    IOD2    IOD3
 1     GANGULY        12      37      23
 2     SHEVAHG        67     109      76
 3     DRAVID         22      50      90
 4     TENDULKHAR     34      59     120

Lesson :8

Reading Date and Time Values


Introduction

SAS software provides many SAS informats for reading raw data values in various forms. If you took the lesson Reading Raw Data in Fixed Fields, you worked with informats to read standard and nonstandard data. In this lesson, you learn how to use a special category of SAS informats called date and time informats. These informats enable you to read a variety of common date and time expressions. After you read date and time values, you can also perform calculations with them.

options yearcutoff=1920;
data perm.aprbills;
   infile aprdata;
   input LastName $8. @10 DateIn mmddyy8. +1 DateOut mmddyy8.
         +1 RoomRate 6. @35 EquipCost 6.;
   Days=dateout-datein+1;
   RoomCharge=days*roomrate;
   Total=roomcharge+equipcost;
run;

Objectives

In this lesson, you learn how SAS software stores date and time values, and how to
read common date and time expressions using SAS informats
handle two-digit year values
calculate time intervals by subtracting two dates
multiply a time interval by a rate.

How SAS Software Stores Date Values
Before you read date or time values into a SAS data set or use those values in calculations, you should understand how SAS software stores date and time values. When you read a date using a SAS informat, SAS software converts it to a numeric date value. A SAS date value is the number of days from January 1, 1960, to the given date.

Date informats

Date Expression    SAS Date Informat    SAS Date Value
02Jan00            DATEw.               14611
01-02-2000         MMDDYYw.             14611
02/01/00           DDMMYYw.             14611
2000/01/02         YYMMDDw.             14611

Each of these expressions represents January 2, 2000, which is 14,611 days after January 1, 1960.
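To see that the four informats in the table produce the same value, here is a minimal sketch that reads all four expressions from one in-stream record. It uses list input with colon modifiers (an assumption made so the fields can differ in width), and the data set name work.samedate is hypothetical.

```sas
data work.samedate;
   /* Each informat interprets its own layout of January 2, 2000. */
   input d1 :date7. d2 :mmddyy10. d3 :ddmmyy8. d4 :yymmdd10.;
   put d1= d2= d3= d4=;   /* writes the raw SAS date values to the log */
   datalines;
02Jan00 01-02-2000 02/01/00 2000/01/02
;
run;
```

The two-digit year 00 is read as 2000 under the default YEARCUTOFF= setting.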

Time informats
SAS software stores time values similar to the way it stores date values. A SAS time value is stored as the number of seconds since midnight.

A SAS datetime is a special value that combines both date and time information. A SAS datetime value is stored as the number of seconds between midnight on January 1, 1960, and a given date and time.
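The arithmetic behind these definitions can be checked with SAS date, time, and datetime constants. This sketch is illustrative only, and the data set name work.stamps is hypothetical. January 2, 2000 is 14,611 days after January 1, 1960; 17:00:01 is 17 × 3,600 + 1 = 61,201 seconds after midnight; so the combined datetime is 14,611 × 86,400 + 61,201 = 1,262,451,601 seconds.

```sas
data work.stamps;
   d   = '02JAN2000'd;             /* date: days since 01JAN1960              */
   t   = '17:00:01't;              /* time: seconds since midnight            */
   dt  = '02JAN2000:17:00:01'dt;   /* datetime: seconds since 01JAN1960:00:00 */
   dt2 = dhms(d, 17, 0, 1);        /* the same datetime built from its parts  */
   put d= t= dt= dt2=;
run;
```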

General form, INPUT statement with an informat:
INPUT <pointer-control> variable informat.;
where
variable is the name of the variable being read
informat. is any valid SAS informat. Note that the informat includes a final period.

Date Expression    SAS Date Informat
10/15/99           MMDDYYw.
15Oct99            DATEw.
10-15-99           MMDDYYw.
99/10/15           YYMMDDw.

MMDDYYw. Informat
In the MMDDYYw. informat, the month, day, and year fields can be separated by blanks or special characters. If delimiters are used, they must be placed between all fields in the values. Remember to specify a field width that includes not only the month, day, and year values, but any delimiters as well. Here are some date expressions you can read using the MMDDYYw. informat:

Date Expression    SAS Date Informat
10/15/99           MMDDYY8.
10 15 99           MMDDYY8.
10-15-1999         MMDDYY10.

DATEw. Informat
Date Expression    SAS Date Informat
30May00            DATE7.
30May2000          DATE9.
30-May-2000        DATE11.

TIMEw. Informat
Time Expression    SAS Time Informat
17:00:01.34        TIME11.
17:00              TIME5.
2:34               TIME5.

Note: Five is the minimum acceptable field width for the TIMEw. informat.

Sample Program
data dates;
   input LastName $ 1-7 +1 DateIn mmddyy8. +1 Dateout mmddyy8.;
   cards;
Akron   04/05/99 04/05/99
Brown   04/12/99 04/05/99
Carnes  04/27/99 04/05/99
Denison 04/11/99 04/05/99
Fields  04/15/99 04/05/99
Jamison 04/16/99 04/05/99
;
run;
proc print data=dates; run;

Obs    LastName    DateIn    Dateout
 1     Akron       14339      14339
 2     Brown       14346      14339
 3     Carnes      14361      14339
 4     Denison     14345      14339
 5     Fields      14349      14339
 6     Jamison     14350      14339
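The objectives also mention calculating a time interval and multiplying it by a rate, as the aprbills program at the start of this lesson does. The following sketch runs the same calculations on two hypothetical in-stream records (the names, dates, and rates are made up for illustration), using the same column layout as that program's INPUT statement up to RoomRate. The data set name work.bills is also hypothetical.

```sas
data work.bills;
   input LastName $8. @10 DateIn mmddyy8. +1 DateOut mmddyy8. +1 RoomRate 6.;
   Days = DateOut - DateIn + 1;     /* subtracting SAS dates gives days; +1 counts both ends */
   RoomCharge = Days * RoomRate;    /* time interval multiplied by a rate */
   format DateIn DateOut date9.;
   datalines;
Akron    04/05/99 04/09/99 175.00
Brown    04/12/99 04/12/99 125.00
;
run;

proc print data=work.bills;
run;
```

Akron stays 5 days (April 5 through April 9) for a RoomCharge of 875; Brown checks in and out on the same day, so Days is 1.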

LESSON :9

Creating a Single Observation from Multiple Records


Information for one observation can be spread out over several records. You can write multiple INPUT statements to read each record that makes up a single observation, or you can use line pointer controls in a single INPUT statement, as shown below.
Objectives
In this lesson, you learn to read multiple records in sequential order to create a single observation, and to read multiple records in any order to create a single observation.

input Lname $ 1-8 Fname $ 10-15; input Department $ 1-12 JobCode $ 15-19; input Salary comma10.;

input #1 Lname $ 1-8 Fname $ 10-15 #2 Department $ 1-12 JobCode $ 15-19 #3 Salary comma10.;

Line Pointer Controls


So far, you have positioned the input pointer within a single record by using column specifications or column pointer controls:

Column specifications: input Name $ 1-12 Age 15-16 Gender $ 18;
Column pointer controls: input Name $12. @15 Age 2. @18 Gender $1.;

But you can also position the input pointer on a specific record by using a line pointer control in the INPUT statement:

input #2 Name $ 1-12 Age 15-16 Gender $ 18;
----+----10---+---
S. Thompson   37 M
L. Rochester  31 F
M. Sabatello  43 M

There are two basic types of line pointer controls: the forward slash (/) line pointer control and the #n line pointer control.

The Forward Slash (/) Line Pointer Control


You use the forward slash (/) line pointer control to read multiple records in sequential order.

input Lname $ 1-8 Fname $ 10-15 /
      Department $ 1-12 JobCode $ 15-19 /
      Salary comma10.;

----+----10---+---
ABRAMS   THOMAS
MARKETING     SR01
$25,209.03
BARCLAY  ROBERT
EDUCATION     IN01
$24,435.71
COURTNEY MARK
PUBLICATIONS  TW01
$24,006.16

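Example
data a;
   input Lname $ 1-8 Fname $ 10-15 /
         Department $ 1-12 JobCode $ 15-19 /
         Salary comma10.;
   cards;
ABRAMS THOMAS
MARKETING     SR01
25,209.03
BARCLAY ROBERT
EDUCATION     IN01
24,435.71
COURTNEY MARK
PUBLICATIONS  TW01
24,006.16
;
run;
proc print data=a; run;

Obs    Lname       Fname    Department      JobCode      Salary
 1     ABRAMS T    OMAS     MARKETING        SR01      25209.03
 2     BARCLAY     OBERT    EDUCATION        IN01      24435.71
 3     COURTNEY    MARK     PUBLICATIONS     TW01      24006.16

Note: In the first two name records, the first name starts in column 8 or 9 instead of column 10, so column input splits the values (ABRAMS T and OMAS, OBERT). Pad the first names so that they start in column 10 to read them correctly.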
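For comparison, here is a sketch of the same DATA step with the first names padded so that they start in column 10, which is what the column specification Fname $ 10-15 expects. The data set name work.emp and the in-stream data are assumptions.

```sas
data work.emp;
   /* Each observation spans three records; / moves to the next one. */
   input Lname $ 1-8 Fname $ 10-15 /
         Department $ 1-12 JobCode $ 15-19 /
         Salary comma10.;
   datalines;
ABRAMS   THOMAS
MARKETING     SR01
$25,209.03
BARCLAY  ROBERT
EDUCATION     IN01
$24,435.71
COURTNEY MARK
PUBLICATIONS  TW01
$24,006.16
;
run;

proc print data=work.emp;
run;
```

With the names aligned, Lname and Fname come out as ABRAMS/THOMAS and BARCLAY/ROBERT.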

The #n Line Pointer Control

The #n specifies the absolute number of the line to which you want to move the input pointer. The #n pointer control can read records in any order; therefore, it must be specified before the instructions for reading values in a specific record.
input #2 Department $ 1-12 JobCode $ 15-19
      #1 Lname $ 1-8 Fname $ 10-15
      #3 Salary comma10.;

----+----10---+---
ABRAMS   THOMAS
MARKETING     SR01
$25,209.03
BARCLAY  ROBERT
EDUCATION     IN01
$24,435.71
COURTNEY MARK
PUBLICATIONS  TW01
$24,006.16

data aa;
   input #2 Department $ 1-12 JobCode $ 15-19
         #1 Lname $ 1-8 Fname $ 10-15
         #3 Salary comma10.;
   cards;
ABRAMS THOMAS
MARKETING     SR01
$25,209.03
BARCLAY  ROBERT
EDUCATION     IN01
$24,435.71
COURTNEY MARK
PUBLICATIONS  TW01
$24,006.16
;
run;
proc print data=aa; run;

Obs    Department      JobCode    Lname       Fname       Salary
 1     MARKETING        SR01      ABRAMS T    OMAS      25209.03
 2     EDUCATION        IN01      BARCLAY     ROBERT    24435.71
 3     PUBLICATIONS     TW01      COURTNEY    MARK      24006.16

Note: In this data, THOMAS starts in column 8 rather than column 10, so it is split into ABRAMS T and OMAS.

Lesson :10

Creating Multiple Observations from a Single Record


Sometimes you may have raw data files that contain data for several observations in one record. Data is stored in this manner to reduce the size of the entire data file.

Objectives
In this lesson, you learn to create multiple observations from a single record that contains
repeating blocks of data
one ID field followed by the same number of repeating fields
one ID field followed by a varying number of repeating fields.

Line-Hold Specifiers
The trailing @ enables the next INPUT statement to read from the current record in the same iteration of the DATA step. The double trailing at sign (@@) enables the next INPUT statement to read from the current record across further iterations of the DATA step.

input name $20. @; input name $20. @@;

The double trailing @@
data temp;
   input id $ temp @@;
   cards;
pp 68 pp 67 pp 70 ss 68 ss 67 ss 70 kk 68 kk 67 kk 70 Kk 68 tt 67 tt 70
;
run;
proc print data=temp; run;

Obs    id    temp
  1    pp     68
  2    pp     67
  3    pp     70
  4    ss     68
  5    ss     67
  6    ss     70
  7    kk     68
  8    kk     67
  9    kk     70
 10    Kk     68
 11    tt     67
 12    tt     70

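The single trailing @
data rain;
   input name $ @;
   do Quarter=1 to 4;
      input rain @;
      output;
   end;
   cards;
hyd 56 67 89 34 23
sec 34 23 45 65 65
gt 34 22 34 54 35
ED 45 65 77 86 56
;
run;
proc print data=rain; run;

Obs    name    Quarter    rain
  1    hyd        1        56
  2    hyd        2        67
  3    hyd        3        89
  4    hyd        4        34
  5    sec        1        34
  6    sec        2        23
  7    sec        3        45
  8    sec        4        65
  9    gt         1        34
 10    gt         2        22
 11    gt         3        34
 12    gt         4        54
 13    ED         1        45
 14    ED         2        65
 15    ED         3        77
 16    ED         4        86

The loop reads only four values from each record. The fifth value is never read, because the record held by the single trailing @ is released at the end of each iteration of the DATA step.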
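The third objective, a varying number of repeating fields, needs a loop that stops at the end of each record instead of after a fixed count. A common approach combines MISSOVER with a trailing @ and a DO UNTIL loop. This sketch assumes in-stream data, and the data set name work.rain2 is hypothetical. It also assumes that no value inside a record is itself missing, because the loop stops at the first missing value.

```sas
data work.rain2;
   infile datalines missover;    /* past the last value, rain becomes missing */
   input name $ @;               /* read the ID and hold the record           */
   Quarter = 0;
   do until (missing(rain));
      input rain @;              /* next value on the held record             */
      if not missing(rain) then do;
         Quarter = Quarter + 1;
         output;                 /* one observation per value                 */
      end;
   end;
   datalines;
hyd 56 67 89
sec 34 23
gt 34 22 34 54
;
run;

proc print data=work.rain2;
run;
```

This produces nine observations: three for hyd, two for sec, and four for gt. Because the step contains an explicit OUTPUT statement, no extra observation is written at the end of each iteration.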

Lesson:11

Creating Variables
When working with data sets, it's useful to create completely new variables or new variables that are based on the values of existing variables. These new variables can contain the results of SAS functions, conditionally assigned values, or running totals of other variable values. This lesson provides a range of techniques for creating and controlling new variables. It also shows how these variables can help you to analyze your data.

Objectives
In this lesson, you learn to
create a variable by using a simple assignment statement
create a variable that accumulates values down observations
assign values to a variable conditionally
specify the lengths of new variables
execute multiple statements conditionally.

Assignment Statements
General form, assignment statement:
variable=expression;
where
variable names a new or existing variable
expression is any valid SAS expression.

data lab23.drug1h;
   set research.cltrials;
   if placebo='YES';
   CholChange=chol2-chol1;
   glucose=glucose+glucose*.10;
run;

Operators

Operator    Action
*           multiplication
/           division
+           addition
-           subtraction

SAS Functions
Function    What it does
MIN         returns the smallest value of the arguments
MAX         returns the largest value of the arguments
ROUND       rounds a value to the nearest roundoff unit
MEAN        computes the arithmetic mean (average)
SUM         calculates the sum of the arguments
UPCASE      converts character values to uppercase letters

data aq;
   maxvalue=max(2, 3, 4, 5, 6, 7, 80);
   round=round(2.99);
   min=min(2, 3, 4, 5, 6, 7, 80);
   mean=mean(2, 3, 4, 5, 6, 7, 80);
   sum=sum(2, 3, 4, 5, 6, 7, 80);
run;
proc print data=aq; run;

Obs    maxvalue    round    min       mean    sum
 1        80         3       2     15.2857    107
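One practical difference between the arithmetic operators and functions such as SUM and MEAN is how they treat missing values. A minimal sketch (the data set name work.missdemo is hypothetical):

```sas
data work.missdemo;
   x = 10;
   y = .;               /* a missing numeric value */
   plus  = x + y;       /* missing: an operator with a missing operand returns missing */
   total = sum(x, y);   /* 10: SUM ignores missing arguments */
   avg   = mean(x, y);  /* 10: MEAN averages only the nonmissing arguments */
run;

proc print data=work.missdemo;
run;
```

SAS also writes a note to the log that missing values were generated as a result of the arithmetic in the plus assignment.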

Assigning Values Conditionally


General form, IF-THEN statement: IF expression THEN statement; where expression is any valid SAS expression. statement is any executable SAS statement.

data finance.newloan; set finance.records(drop=amount rate); TotalLoan+payment; if code='1' then Type='Fixed'; run;
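The statement TotalLoan+payment; in this example is a sum statement, which creates a variable that accumulates values down observations. A minimal sketch with hypothetical in-stream data (the data set name work.loans and the values are made up) shows its three properties: the variable is retained across iterations, starts at 0, and ignores missing values.

```sas
data work.loans;
   input code $ payment;
   TotalLoan + payment;               /* sum statement: retained, initialized to 0 */
   if code = '1' then Type = 'Fixed'; /* conditional assignment */
   datalines;
1 100
2 250
1 .
;
run;

proc print data=work.loans;
run;
```

TotalLoan is 100, 350, and 350; the third payment is missing, so the running total does not change. Type is Fixed for the first and third observations and blank for the second, because Type is not retained.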

Comparison and Logical Operators
When writing IF-THEN statements, you can use any of the following comparison operators:

Operator     Comparison Operation
= or eq      equal to
^= or ne     not equal to
> or gt      greater than
< or lt      less than
>= or ge     greater than or equal to
<= or le     less than or equal to
in           equal to one of a list

You can combine comparisons with the logical operators AND and OR:

if test<85 and time<=20 then status='RETEST';
if region in ('NE','NW','SW') then rate=fee-25;
if target gt 300 or sales ge 50000 then bonus=salary*.05;

IF-THEN/ELSE
To assign values to a new variable based on the values of an existing variable, use an IF-THEN statement followed by an ELSE statement:

data inder;
   set clinic.admit;
   if age>30 then type='middle';
   else type='low';
run;
proc print data=inder; run;
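In the IF-THEN/ELSE example, the length of type comes from its first appearance in the DATA step. If a shorter value appears first, longer values are truncated. This sketch, with hypothetical data set names and in-stream ages, shows the problem and the fix with a LENGTH statement.

```sas
data work.type_short;
   input age;
   if age <= 30 then Type = 'Low';   /* first appearance: length 3     */
   else Type = 'Middle';             /* truncated and stored as 'Mid'  */
   datalines;
25
45
;
run;

data work.type_ok;
   length Type $ 6;                  /* set the length before first use */
   input age;
   if age <= 30 then Type = 'Low';
   else Type = 'Middle';
   datalines;
25
45
;
run;
```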

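SELECT Groups (SELECT, WHEN, OTHERWISE, END)
A SELECT group executes the statement for the first WHEN condition that is true; the OTHERWISE statement handles any other value. An empty OTHERWISE statement prevents an error when no WHEN condition is true.

data a;
   set clinic.admit;
   length quarter $ 12;
   select;
      when (ActLevel in ('HIGH')) quarter='halfyear';
      when (ActLevel in ('LOW')) quarter='quarterly';
      when (ActLevel in ('MOD')) quarter='fullyearwise';
      otherwise;
   end;
run;
proc print data=a; run;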

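DROP= and KEEP= Data Set Options
The DROP= data set option excludes the listed variables from the new data set; the KEEP= option writes only the listed variables. The DROP and KEEP statements work the same way when placed inside the DATA step.

data venu (drop=name id weight height);
   set clinic.admit;
run;
proc print data=venu; run;

data venu (keep=name id weight height);
   set clinic.admit;
run;
proc print data=venu; run;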

Lesson :12
Reading and Concatenating SAS Data Sets
You can create a new data set from an existing SAS data set. To create the new data set, you can read a single data set, or you can concatenate two or more data sets. Concatenating appends the observations from one data set to another data set.

Objectives
In this lesson, you learn to
create a new data set from one or more existing data sets
select observations based on a condition
select variables to include or exclude.

Syntax
General form, basic DATA step for reading a single data set:
DATA SAS-data-set;
   SET SAS-data-set;
RUN;
where
SAS-data-set in the DATA statement is the name (libref.filename) of the SAS data set to be created
SAS-data-set in the SET statement is the name (libref.filename) of the SAS data set to be read.

Example:

Mixa                 Mixb
num    var_A         num    var_B
 1      a1            1      b1
 2      a2            2      b2
 3      a3            3      b3
 4      a4            4      b4
 5      a5            5      b5

data mixab;
   set mixa mixb;
run;
proc print data=mixab; run;

Obs    num    var_A    var_B
  1     1      a1
  2     2      a2
  3     3      a3
  4     4      a4
  5     5      a5
  6     1               b1
  7     2               b2
  8     3               b3
  9     4               b4
 10     5               b5

Selecting Observations
To select only those observations that meet a specified condition, you can use a subsetting IF statement in any DATA step:
IF expression;
You can also use expressions in SAS programming statements to transform variables, create new variables, process data conditionally, calculate new values, and assign new values.

Merging vs. Concatenating


In concatenating, data sets in the SET statement are read sequentially, in the order in which they are listed, until all observations have been processed. The new data set contains all the variables from all the input data sets and the total number of records from all input data sets.

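BY Statement
Most merges are combined with a BY statement to produce a match-merge of two or more data sets; when a BY statement is used, observations are match-merged according to the values of the BY variable(s). When you use a BY statement with a SET statement instead, the observations are interleaved: they are read in order of the BY variable values, so each input data set must be sorted by the BY variable(s).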

data abc; set a b; by num; run;

data mixab;
   set mixa mixb;
   by num;
run;
proc print data=mixab; run;

Obs    num    var_A    var_B
  1     1      a1
  2     1               b1
  3     2      a2
  4     2               b2
  5     3      a3
  6     3               b3
  7     4      a4
  8     4               b4
  9     5      a5
 10     5               b5

Same Variable Name

Mixa               Mixb
num    var_A       num    var_A
 1      a1          6      b1
 2      a2          7      b2
 3      a3          8      b3
 4      a4          9      b4
 5      a5          9      b5

data mixab;
   set mixa mixb;
   by num;
run;
proc print data=mixab; run;

Note: Both data sets contain the common variable var_A, so the values from both data sets are stored in the same variable.

Obs    num    var_A
  1     1      a1
  2     2      a2
  3     3      a3
  4     4      a4
  5     5      a5
  6     6      b1
  7     7      b2
  8     8      b3
  9     9      b4
 10     9      b5

Lesson: 13

Merging SAS Data Sets


Introduction

In SAS programming, a common task is to combine observations from two or more data sets into a single observation in a new data set according to the values of a common variable. In DATA step terminology, this is called match-merging.

Objectives
In this lesson, you learn to
identify default output from match-merging
prepare data for match-merging by sorting, if necessary
specify the data sets to merge and the data set to be created
specify the common variable to use in matching observations
rename any like-named variables to avoid overwriting values
select only matched observations, if desired
select variables
predict the results of match-merging.

Preparing Data for Match-Merging
In DATA step match-merging, all data sets to be merged must be sorted or indexed by the values of a common variable (also known as a BY variable). The common variable must have the same type and length in all data sets to be merged.

Sorting Data
General form, PROC SORT step:
PROC SORT <DATA=SAS-data-set> <OUT=SAS-data-set>;
   BY variable(s);
RUN;
where
the DATA= option names the data set to be read
the OUT= option creates an output data set containing the data in sorted order
variable(s) in the required BY statement specifies the variable(s) whose values are used to order the data.

proc sort data=clinic.demog; By ID; Run;

data address;
   infile address;
   input name $ sal :dollar10.;
   format sal dollar10.;
run;
proc print data=address; run;
proc sort data=address out=venkat;
   by name;
run;

Raw data (address):
venu $78900
goutam $90000
hanu $67000
inder $95000
hari $80000

PROC PRINT output for ADDRESS:

Obs    name      sal
 1     venu      $78,900
 2     goutam    $90,000
 3     hanu      $67,000
 4     inder     $95,000
 5     hari      $80,000

Descending
To sort in descending order, place the DESCENDING option before the variable that it applies to:

proc sort data=address out=venkat;
   by descending name;
run;
proc print data=venkat; run;

Obs    name      sal
 1     venu      $78,900
 2     inder     $95,000
 3     hari      $80,000
 4     hanu      $67,000
 5     goutam    $90,000

Performing a Basic Match-Merge


After all input data sets are sorted or indexed by the value of a BY variable, you can merge the data sets using a DATA step that contains MERGE and BY statements.

data clinic.combined; merge clinic.demog clinic.visit; by id; run; proc print data=clinic.combined; run;

Sales_M            Sales_T
num    monday      num    tuesday
 1      200         1      200
 2      300         2      400
 3      430         3      500
 4      490         4      900
 5      509         5      500

data mixed;
   merge sales_M sales_t;
   by num;
run;
proc print data=mixed; run;

Obs    num    monday    tuesday
 1      1      200        200
 2      2      300        400
 3      3      430        500
 4      4      490        900
 5      5      509        500
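The objectives also mention selecting only matched observations. One way to do this is with the IN= data set option and a subsetting IF statement. In this sketch the data sets are small hypothetical versions of Sales_M and Sales_T in which num 1 appears only in Sales_M and num 4 only in Sales_T; the flag names inM and inT are arbitrary.

```sas
data work.sales_m;
   input num monday;
   datalines;
1 200
2 300
3 430
;
run;

data work.sales_t;
   input num tuesday;
   datalines;
2 400
3 500
4 900
;
run;

/* Both inputs are already in NUM order; otherwise sort them first. */
data work.matched;
   merge work.sales_m(in=inM) work.sales_t(in=inT);
   by num;
   if inM and inT;   /* keep only num values found in both data sets */
run;

proc print data=work.matched;
run;
```

work.matched contains num 2 and 3 only. Without the subsetting IF, the merge would also write num 1 (with tuesday missing) and num 4 (with monday missing).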

Renaming Variables
Sometimes you may have variables with the same name in more than one input data set. In this case, DATA step match-merging overwrites values of the like-named variable in the first data set in which it appears with values of the like-named variable in subsequent data sets.

General form, RENAME= data set option:
(RENAME=(old-variable-name=new-variable-name))
where
the RENAME= option, in parentheses, follows the name of each data set that contains one or more variables to be renamed
old-variable-name names the variable to be renamed
new-variable-name specifies the new name for the variable.

Syntax

data clinic.combined;
   merge clinic.admit(rename=(date=Visitdate1))
         clinic.admitjune(rename=(date=VisitDate2));
   by id;
run;

Example

data rename;
   merge sales_m (rename=(sales=sl_1))
         sales_T (rename=(sales=sl_2));
   by num;
run;
proc print data=rename; run;

Obs    num    sl_1    sl_2
 1      1      200     800
 2      2      600     400
 3      3      100     200
 4      4      640     400
 5      5      440     390

Lesson : 14

Creating List Reports


Introduction
To list the information in a data set, you can create a report using a PROC PRINT step. Then you can enhance the report with additional statements and options to create reports like those shown below.

Objectives
In this lesson, you learn to
1. specify SAS data sets to print
2. select variables and observations to print
3. specify column totals for numeric variables
4. sort data by values of one or more variables
5. assign descriptive labels to variables
6. double space SAS listing output.

Creating a Basic Report
General form, basic PROC PRINT step:
PROC PRINT <DATA=SAS-data-set>;
RUN;
where SAS-data-set is the name of the SAS data set to be printed.

Notice the layout of the resulting report. By default,
all observations and variables in the data set are printed
a column for observation numbers appears on the far left
variables appear in the order in which they occur in the data set.

Selecting Variables
By default, a PROC PRINT step lists all the variables in a data set. You can select variables and control the order in which they appear by using a VAR statement in your PROC PRINT step.

General form, VAR statement: VAR variable(s); where variable(s) is one or more variable names, separated by blanks.
proc print data=clinic.admit; var age height weight fee; run;

In addition to selecting variables, you can control the default Obs column that PROC PRINT displays to list observation numbers. You can specify text to replace the Obs heading in your PROC PRINT output, or you can choose not to display observation numbers.
Obs    Age    Height    Weight       Fee
 1      27      72        168       85.20
 2      34      66        152      124.80
 3      31      61        123      149.75
 4      43      63        137      149.75
 5      51      71        158      124.80

Specifying the Obs Column Header
proc print data=work.example obs='Patient'; var age height weight fee; run;

Removing the Obs Column
To remove the Obs column, you can specify the NOOBS option in the PROC PRINT statement.
proc print data=work.example noobs; var age height weight fee; run;

Age    Height    Weight       Fee
 27      72        168       85.20
 34      66        152      124.80
 31      61        123      149.75
 43      63        137      149.75
 51      71        158      124.80

WHERE statement

General form, WHERE statement: WHERE where-expression; where where-expression specifies a condition for selecting observations. The where-expression can be any valid SAS expression.

proc print data=clinic.admit; var age height weight fee; where age>30; run;

Obs    Age    Height    Weight       Fee
  2     34      66        152      124.80
  3     31      61        123      149.75
  4     43      63        137      149.75
  5     51      71        158      124.80
  7     32      67        151      149.75
  8     35      70        173      149.75
  9     34      73        154      124.80
 10     49      64        172      124.80
 11     44      66        140      149.75
 14     40      69        163      124.80
 15     47      72        173      124.80
 16     60      71        191      149.75
 17     43      65        123      124.80
 20     41      67        141      149.75
 21     54      71        183      149.75

Specifying WHERE Expressions

A variable specified in the WHERE statement can be any variable in the SAS data set, not necessarily one of the variables specified in the VAR statement. The WHERE statement works for both character and numeric variables. To specify a condition based on the value of a character variable, you must
1. enclose the value in quotation marks
2. write the value with lowercase and uppercase letters exactly as it appears in the data set.

Examples of WHERE Statements

1. You can use compound expressions like these in your WHERE statements:
where age<=55 and pulse>75;
where area='A' or region='S';
where ID>1050 and state='NC';
2. When you test for multiple values of the same variable, you specify the variable name in each expression:
where actlevel='LOW' or actlevel='MOD';
where fee=124.80 or fee=178.20;
3. You can use the IN operator as a convenient alternative:
where actlevel in ('LOW','MOD');
where fee in (124.80,178.20);

Generating Column Totals

General form, SUM statement: SUM variable(s); where variable(s) is one or more variable names, separated by blanks. You do not need to name the variables in a VAR statement if you specify them in the SUM statement.
proc print data=clinic.admit; var age height weight fee; where age>30; sum fee; run;

Obs    Age    Height    Weight       Fee
  2     34      66        152      124.80
  3     31      61        123      149.75
  4     43      63        137      149.75
  5     51      71        158      124.80
  7     32      67        151      149.75
  8     35      70        173      149.75
  9     34      73        154      124.80
 10     49      64        172      124.80
 11     44      66        140      149.75
 14     40      69        163      124.80
 15     47      72        173      124.80
 16     60      71        191      149.75
 17     43      65        123      124.80
 20     41      67        141      149.75
 21     54      71        183      149.75
                                  =======
                                  2071.60

Double Spacing Output


proc print data=clinic.admit double; var age sex weight height; where actlevel='HIGH'; run;

Obs    Age    Sex    Weight    Height

  1     27     M       168        72

  2     34     F       152        66

  6     29     M       193        76

 11     44     F       140        66

 14     40     F       163        69

 18     25     M       188        75

 20     41     F       141        67

Assigning Descriptive Labels

General form, LABEL statement: LABEL variable-1='label-1' . . . <variable-n='label-n'>; Labels can be up to 256 characters long and must be enclosed in quotes.

proc print data=clinic.therapy label; label walkjogrun='Walk/Jog/Run'; run;

Obs    Date       AerClass    Walk/Jog/Run    Swim
 1     JAN1999       56            78          14
 2     FEB1999       32           109          19
 3     MAR1999       35           106          22
 4     APR1999       47           115          24

Lesson : 15

Enhancing Reports
When you create reports, you may want to make your data easy to interpret by adding titles and footnotes, replacing variable names with descriptive labels, or formatting variable values. This lesson shows you how to specify these enhancements. Although this lesson focuses on reports and uses PROC PRINT output in examples, you can apply these enhancements to any SAS procedure output.

Objectives
In this lesson, you learn to
1. add titles and footnotes
2. replace variable names with descriptive labels
3. format data values.

TITLE Statement
TITLE statements are global statements. That is, after you define a title, it remains in effect until you modify it, cancel it, or end your SAS session.

General form, TITLE statement: TITLE<n> 'title-text'; where n is a number from 1 to 10 that specifies the title line, and 'title-text' is the actual title to be displayed. The maximum title length depends on your operating environment and the value of the LINESIZE= option.

title1 'Heart Rates for Patients with'; title3 'Increased Stress Tolerance Levels'; proc print data=clinic.stress; var resthr maxhr rechr; where tolerance='I'; run;

              Heart Rates for Patients with

            Increased Stress Tolerance Levels

Obs    RestHR    MaxHR    RecHR
  2        68      171      133
  3        78      177      139
  8        70      167      122
 11        65      181      141
 14        74      152      113
 15        75      158      108
 20        78      189      138
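Because TITLE statements are global, the titles above stay in effect for every later step until you change them. The sketch below reuses the Clinic.Stress step from this section; the replacement title text 'All Patients' is a hypothetical example.

```sas
/* TITLE statements are global: these titles stay in effect for later steps */
title1 'Heart Rates for Patients with';
title3 'Increased Stress Tolerance Levels';
proc print data=clinic.stress;
   var resthr maxhr rechr;
   where tolerance='I';
run;

/* Redefining TITLE1 replaces it and cancels TITLE2 through TITLE10, */
/* so this report has the single (hypothetical) title 'All Patients' */
title1 'All Patients';
proc print data=clinic.stress;
   var resthr maxhr rechr;
run;

/* A TITLE statement with no text cancels all titles */
title;
```

FOOTNOTE statements follow the same rules: a new FOOTNOTE1 replaces the first footnote and cancels the higher-numbered ones, and a FOOTNOTE statement with no text cancels all footnotes.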

Using the TITLES Window
You can also specify titles in the TITLES window. These titles are not stored with your program, and they remain in effect only during your SAS session. To open the TITLES window, issue the TITLES command. To specify a title, type in the text you want. To cancel a title, erase the existing text. Notice that you do not enclose title text in quotes in this window.

Specifying FOOTNOTES
General form, FOOTNOTE statement: FOOTNOTE<n> 'footnote-text'; where n is a number from 1 to 10 that specifies the footnote line, and footnote-text is the actual footnote to be displayed. The maximum footnote length depends on your operating environment.
footnote1 'Data from Treadmill Tests'; footnote3 '1st Quarter Admissions'; proc print data=clinic.stress; var resthr maxhr rechr; where tolerance='I'; run;

Obs    RestHR    MaxHR    RecHR
  2        68      171      133
  3        78      177      139
  8        70      167      122
 11        65      181      141
 14        74      152      113
 15        75      158      108
 20        78      189      138

Data from Treadmill Tests

1st Quarter Admissions

Using the FOOTNOTES Window
You can also specify footnotes in the FOOTNOTES window. These footnotes are not stored with your program, and they remain in effect only during your SAS session. To open the FOOTNOTES window, issue the FOOTNOTES command. To specify a footnote, type in the text you want. To cancel a footnote, erase the existing text. Notice that you do not enclose footnote text in quotation marks in this window.

System Options

General form, OPTIONS statement: OPTIONS options; where options specifies one or more system options to be changed. The system options available depend on your host system.
options nonumber nodate; proc print data=sales.qtr4; var salesrep type unitsold region; where unitsold>=30; run; options date; proc freq data=sales.qtr4; where unitsold>=30; tables salesrep; run;

Option               Specifies
DATE | NODATE        whether the date and time appear in your output
NUMBER | NONUMBER    whether page numbers appear in your output
PAGENO=              the beginning page number for your output
PAGESIZE=            the number of lines printed on each page of output
LINESIZE=            the print line width for your log and procedure output
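A sketch that combines several of these options with the Sales.Qtr4 steps shown above. The PAGESIZE=60 and LINESIZE=80 values are illustrative choices, not requirements.

```sas
/* Suppress the date and page numbers; set page size and line width */
/* (60 and 80 are illustrative values)                              */
options nodate nonumber pagesize=60 linesize=80;

proc print data=sales.qtr4;
   var salesrep type unitsold region;
   where unitsold>=30;
run;

/* Options stay in effect until changed: restore the date and page */
/* numbers, and restart page numbering at 1 for the next output    */
options date number pageno=1;

proc freq data=sales.qtr4;
   where unitsold>=30;
   tables salesrep;
run;
```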

Formatting Data Values
General form, FORMAT statement: FORMAT variable(s) format-name; where
variable(s) is the name of one or more variables whose values are to be written according to a particular pattern
format-name specifies a SAS or user-defined format that is used to write out the values.
Formats affect only the way that the data values appear in output, not the actual data values as they are stored in the SAS data set.

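proc print data=clinic.admit; var actlevel fee; where actlevel='HIGH'; label fee='Admission Fee'; format fee dollar4.; run;

Obs    ActLevel    Fee
  1    HIGH        $85
  2    HIGH       $125
  6    HIGH       $125
 11    HIGH       $150
 14    HIGH       $125
 18    HIGH        $85
 20    HIGH       $150

Because the LABEL option is not specified in the PROC PRINT statement, the column heading is the variable name Fee rather than the label Admission Fee. The DOLLAR4. format has no decimal places, so values such as 124.80 are rounded and displayed as $125.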

You can permanently assign a format to a variable in a SAS data set, or you can temporarily specify a format in a PROC step to determine the way that the data values appear in output.
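The sketch below shows that a format specified in a PROC step changes only the display. The one-observation data set Work.Fee_Demo is hypothetical.

```sas
/* Hypothetical one-observation data set */
data work.fee_demo;
   fee=124.80;
run;

/* Temporary format: applies only to this PROC PRINT step */
proc print data=work.fee_demo;
   format fee dollar4.;
run;

/* The stored value is unchanged; this PUT writes it to the log */
/* with the default numeric format, as fee=124.8                */
data _null_;
   set work.fee_demo;
   put fee=;
run;
```

The report shows $125, while the log shows the stored value, fee=124.8.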

Field Widths
All SAS formats specify the total field width (w) used for displaying the values in the output. For example, suppose that the longest value for the variable Net is a four-digit number, such as 5400. To specify the COMMAw.d format for Net, you specify a field width of 5 or more. You must count the comma, because it occupies a position in the output.

format net comma5.0;

Value:      5 , 4 0 0
Position:   1 2 3 4 5

Decimal Places
For numeric variables you can also specify the number of decimal places (d), if any, to be displayed in the output. Numbers are rounded to the specified number of decimal places. In the example above, no decimal places are displayed. Writing the whole number 2030 as 2,030.00 requires eight print positions, including two decimal places and the decimal point.

format qtr3tax comma8.2;

Value:      2 , 0 3 0 . 0 0
Position:   1 2 3 4 5 6 7 8

Formatting 15374 with a dollar sign, commas, and two decimal places requires ten print positions.

format totalsales dollar10.2;

Value:      $  1  5  ,  3  7  4  .  0  0
Position:   1  2  3  4  5  6  7  8  9  10
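To check field widths without building a report, you can write each value to the log with a PUT statement in a DATA _NULL_ step. The values are the ones used in this section; the comments show the text each PUT statement writes.

```sas
/* Write each value with the formats discussed above; output goes to the log */
data _null_;
   net=5400;
   qtr3tax=2030;
   totalsales=15374;
   put net comma5.0;          /* 5,400      : 5 positions  */
   put qtr3tax comma8.2;      /* 2,030.00   : 8 positions  */
   put totalsales dollar10.2; /* $15,374.00 : 10 positions */
run;
```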

Lesson : 16

Formatting Variable Values


Introduction Normally, the format associated with a variable stays in effect only for that programming step. But SAS software also gives you the option of permanently assigning a format to a variable. In addition, you can define your own formats and store them for future use.

Objectives
In this lesson, you learn to
1. permanently associate a format with a variable
2. create your own formats to display variable values
3. permanently store the formats that you create.

System-defined formats

User-defined formats

SAS Formats and the FORMAT Statement
COMMAw.d displays numeric values with commas.
DOLLARw.d displays numeric values with a leading dollar sign ($) and commas.

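dollar9.

In dollar9., the 9 is the field width (w): the formatted value can occupy up to nine print positions. Because no d value is given, no decimal places are displayed; the value 5349.41 is written as $5,349.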

Numeric SAS formats, such as the DOLLARw.d format, can also specify a d value, which is the number of decimal places to be displayed. Note that a period separates the w from the d value.
dollar9.2

Value:      $ 5 , 3 4 9 . 4 1
Position:   1 2 3 4 5 6 7 8 9

Assigning Formats to Variables

format fee dollar9.2;

Remember that multiple formats and variables can be associated in a single FORMAT statement.

format supplies dollar6.2 fee dollar9.2;

Finally, you can add FORMAT statements to the PRINT procedure to enhance the data values in a report.
proc print data=perm.employee; format salary dollar10.2; run;

Now that you have the FORMAT statement written, how do you permanently associate the formats with their respective variables? You place the FORMAT statement in the DATA step.
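A minimal sketch, assuming the hypothetical data sets Work.Employee and Work.Employee_Fmt. To store the format with a data set in a permanent library, you would use a two-level name with your own libref (such as Perm.Employee) instead of WORK.

```sas
/* Hypothetical input data, standing in for an existing data set */
data work.employee;
   input lastname $ salary;
   datalines;
Evans 29996.63
Helms 18567.23
;
run;

/* FORMAT in a DATA step stores DOLLAR10.2 with Salary in the new data set */
data work.employee_fmt;
   set work.employee;
   format salary dollar10.2;
run;

/* No FORMAT statement needed: Salary prints as $29,996.63 and $18,567.23 */
proc print data=work.employee_fmt;
run;

/* The variable listing shows DOLLAR10.2 as the format for Salary */
proc contents data=work.employee_fmt;
run;
```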

Creating User-Defined Formats


Sometimes variable values are stored according to a code. For example, when the PRINT procedure displays the data set Perm.Empinfo, notice that the values for JobTitle are coded, and they are not easily interpreted.

General form, PROC FORMAT statement: PROC FORMAT <options>; where options include
LIBRARY=libref, which specifies the libref for a SAS data library that contains a permanent catalog in which user-defined formats are stored
FMTLIB, which displays the contents of a format catalog.

Permanently Storing Your Formats


You can store your formats in a permanent format catalog named Formats when you specify the LIBRARY= option in the PROC FORMAT statement.

PROC FORMAT LIBRARY=libref;


But first, you need a LIBNAME statement that associates the libref with the permanent SAS data library where the format catalog is to be stored. It is recommended, but not required, that you use the word Library as the libref when creating your own permanent formats.

libname library 'c:\sas\formats\lib';

Defining a Unique Format
General form, VALUE statement: VALUE name range-1='label-1' <...range-n='label-n'>; where the format's name
1. must begin with a dollar sign ($) if it applies to a character variable
2. cannot be longer than eight characters (a limit of SAS 8 and earlier; SAS 9 allows names of up to 32 characters)
3. cannot be the name of a SAS format
4. cannot end with a number
5. does not end in a period when specified in a VALUE statement.

proc format lib=library; value JobFmt 103='manager' 105='text processor' 111='assoc. technical writer' 112='technical writer' 113='senior technical writer';run;

Coded values:

FirstName    LastName    JobTitle      Salary
Donny        Evans          112      29996.63
Lisa         Helms          105      18567.23
John         Higgins        111      25309.00
Amy          Larson         113      32696.78
Mary         Moore          112      28945.89
Jason        Powell         103      35099.50

Values displayed with the JobFmt. format:

Obs    FirstName    LastName    JobTitle                    Salary
  1    Donny        Evans       technical writer          29996.63
  2    Lisa         Helms       text processor            18567.23
  3    John         Higgins     assoc. technical writer   25309.00
  4    Amy          Larson      senior technical writer   32696.78
  5    Mary         Moore       technical writer          28945.89
  6    Jason        Powell      manager                   35099.50
  7    Judy         Riley       assoc. technical writer   25309.00
  8    Neal         Ryan        technical writer          28180.00
  9    Henry        Wilson      senior technical writer   31875.46
 10    Chip         Woods       text processor            17098.71
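The step that applies JobFmt. to the report is not shown above. A minimal sketch, assuming a hypothetical Work.Empinfo data set with a few observations typed in from the listing, and storing the format in WORK (no LIB= option) so that no LIBNAME statement is needed:

```sas
/* JobFmt as defined above, but stored in WORK.FORMATS (no LIB= option) */
proc format;
   value jobfmt 103='manager'
                105='text processor'
                111='assoc. technical writer'
                112='technical writer'
                113='senior technical writer';
run;

/* A few observations from the listing above, typed in by hand */
data work.empinfo;
   input firstname $ lastname $ jobtitle salary;
   datalines;
Donny Evans 112 29996.63
Lisa Helms 105 18567.23
Jason Powell 103 35099.50
;
run;

/* The FORMAT statement displays the job labels instead of the codes */
proc print data=work.empinfo;
   format jobtitle jobfmt.;
run;
```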

The VALUE range specifies a
1. single value, such as 24 or 'S'
2. range of values, such as 0-1500
3. list of unique values separated by commas, such as 90,180,270,360.

When the specified range contains character values, they must be enclosed in quotes and match the case of the variable's values. The format's name must also start with a dollar sign ($). For example, the VALUE statement below defines the $Answer format, which substitutes one character value for another.

Character values:
proc format lib=library; value $Answer 'Y'='Yes' 'N'='No' 'U'='Undecided' 'NOP'='No opinion'; run;

Numeric values:
proc format lib=library; value JobFmt 103='manager' 105='text processor' 111='assoc. technical writer' 112='technical writer' 113='senior technical writer'; run;

Specifying Value Ranges

You can specify a non-inclusive range of numeric values by using the less-than symbol (<) to avoid any overlapping. In this example, the range of values from 0 to less than 12 is labeled child. The next range begins at 12, so the value 12.3 is assigned the label teenager.
proc format lib=library; value AgeFmt 0-<12='child' 12-<20='teenager' 20-<65='adult' 65-<100='senior citizen'; run;

You can also use the keywords LOW and HIGH to specify the lower and upper limits of a variable's value range. The keyword LOW does not include missing values. The keyword OTHER can be used to label missing values and any value that is not specifically addressed in a range.
proc format lib=library; value AgeFmt low-<12='child' 12-<20='teenager' 20-<65='adult' 65-high='senior citizen' other='unknown'; run;
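A sketch that applies AgeFmt to a few boundary values. The ages are hypothetical, and the format is stored in WORK (no LIB= option) so that the program runs without a LIBNAME statement.

```sas
/* AgeFmt with LOW, HIGH, and OTHER; stored in WORK.FORMATS (no LIB=) */
proc format;
   value agefmt low-<12='child'
                12-<20='teenager'
                20-<65='adult'
                65-high='senior citizen'
                other='unknown';
run;

/* Hypothetical ages, including boundary values and a missing value (.) */
data work.ages;
   input age;
   datalines;
8
12
12.3
64.9
70
.
;
run;

proc print data=work.ages;
   format age agefmt.;
run;
```

In the report, 8 prints as child, 12 and 12.3 as teenager, 64.9 as adult, 70 as senior citizen, and the missing value as unknown, because LOW does not include missing values and OTHER catches them.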

You can define several formats by using multiple VALUE statements in a single PROC FORMAT step. In this example, each VALUE statement defines a different format.
proc format lib=library; value JobFmt 103='manager' 105='text processor' 111='assoc. technical writer' 112='technical writer' 113='senior technical writer'; value $Respnse 'Y'='Yes' 'N'='No' 'U'='Undecided' 'NOP'='No opinion'; run;

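How to Call Formats
To apply a format, name it in a FORMAT statement and follow the format name with a period. Character format names keep their leading dollar sign ($).

Numeric values:   format jobcode JobFmt.;

Character values: format jobcode $JobFmt.;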
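A sketch that calls one numeric and one character format in the same FORMAT statement. JobFmt is abbreviated to two values here, and the data set Work.Calls with its variables Response and JobCode is hypothetical.

```sas
/* One numeric and one character format (JobFmt abbreviated here) */
proc format;
   value jobfmt 103='manager'
                112='technical writer';
   value $answer 'Y'='Yes'
                 'N'='No'
                 'U'='Undecided'
                 'NOP'='No opinion';
run;

/* Hypothetical data: RESPONSE is character, JOBCODE is numeric */
data work.calls;
   input response $ jobcode;
   datalines;
Y 103
NOP 112
;
run;

/* Each format name ends with a period; the character format keeps its $ */
proc print data=work.calls;
   format response $answer. jobcode jobfmt.;
run;
```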

Lesson : 17

Creating Tabular Reports


The TABULATE procedure gives you power and flexibility in summarizing your data in table form. PROC TABULATE creates customized one-, two-, and three-dimensional tables that display any of a large number of descriptive statistics. You can
1. modify virtually every feature of a table
2. calculate percentages
3. produce subreports without sorting data
4. summarize data and produce a report in one step
5. generate multiple tables in one step.

Objectives
In this lesson, you learn to specify
1. the variables to appear in your table
2. the statistic to be computed for each variable
3. the arrangement of statistics and variables in the table
4. additional features such as formats for values in the table, column and row totals, and labels for statistics and a summary variable.

One-Dimensional Table: Column Expression
proc tabulate data=clinic.diabstat; class type; var premium; table type premium; run;

Two-Dimensional Table: Row Expression
proc tabulate data=clinic.admit; class sex; var height weight; table sex,height*min weight*min; run;

       Height    Weight
          Min       Min
Sex
F       61.00    118.00
M       69.00    147.00

Three-Dimensional Table: Page Expression Including Rows and Columns

proc tabulate data=clinic.admit; class sex actlevel; var height weight; table actlevel,sex,height*min weight*min; run;

ActLevel HIGH
       Height    Weight
          Min       Min
Sex
F       66.00    140.00
M       72.00    168.00

ActLevel LOW
       Height    Weight
          Min       Min
Sex
F       63.00    123.00
M       69.00    147.00

ActLevel MOD
       Height    Weight
          Min       Min
Sex
F       61.00    118.00
M       71.00    154.00

Setting Up a Table

Unlike PROC PRINT, PROC TABULATE doesn't create default reports. So, before you begin writing a PROC TABULATE step, it's a good idea to sketch the table you want.

After you sketch the table, you can write the basic code to compute the statistics you want to display. When you are satisfied with the basic report, you can add options and statements to modify its appearance. In fact, once you define the basic structure of your table, enhancing it is easy.

To set up a table with PROC TABULATE, you need to identify the data you are analyzing, and then determine
1. which variables, if any, you need to classify your data
2. which variables, if any, you need to analyze your data
3. the type of table you need to represent your data.

Beginning Your PROC TABULATE Step

General form, PROC TABULATE statement: PROC TABULATE <options>; where options include the DATA= option to specify the data set to use. For example: proc tabulate data=clinic.admit;

Specifying Variables
After you invoke PROC TABULATE and identify your data set, you need to specify variables to create your tables. As you saw earlier, you need to distinguish between variables that classify your data (into groups, or categories, or classes) and variables used for arithmetic analysis. These are called class variables and analysis variables, respectively. You list
1. class variables in a CLASS statement
2. analysis variables in a VAR statement.

Class Variables
Can be character or numeric. Classify data into groups or categories. Have only a few distinct values, in most cases. (PROC TABULATE prints each value of a class variable.)

[Example table: frequency (N) for each Company (Parnassus, Reliable, Ruritan, USA Inc.) in the rows, by the class variable PctInsured (50, 60, 80, 100) in the columns.]

Analysis Variables
1. Must be numeric.
2. Are used for arithmetic analysis.
3. Often contain continuous values.

     Total        BalanceDue
      Sum            Mean
$13,079.32         $108.52

Describing the Table
After you specify your variables in CLASS or VAR statements, you need to describe the table you want PROC TABULATE to produce.

You use the TABLE statement to specify
1. the number of dimensions in the table (page, row, column)
2. the variables in the table (such as Sex and Height)
3. the statistics to be calculated (such as MAX).

General form, TABLE statement: TABLE <<page-expression,> row-expression,> column-expression </ options>; where each expression specifies the elements (variables and statistics) in that dimension of the table. These expressions are known collectively as dimension expressions.

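Dimension Expressions
Dimension expressions contain elements, that is, the variables and statistics that appear in that dimension of the table.

Dimension expressions can also contain operators that you use when combining elements to produce the table you want. For example, the asterisk (*) crosses elements, and a blank between elements concatenates them.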

Remember this
If a TABLE statement doesn't contain a comma, it requests a one-dimensional table, no matter how many variables or statistics it specifies. A TABLE statement with one comma specifies a two-dimensional table. Two commas indicate a three-dimensional table.
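A sketch that requests all three table shapes in one PROC TABULATE step, so you can compare how the number of commas changes the layout. The data set Work.Tab_Demo and its values are hypothetical.

```sas
/* Hypothetical sample data standing in for Clinic.Admit */
data work.tab_demo;
   input sex $ actlevel $ height;
   datalines;
F HIGH 66
M HIGH 72
F LOW 63
M LOW 69
F MOD 61
M MOD 71
;
run;

proc tabulate data=work.tab_demo;
   class sex actlevel;
   var height;
   /* No comma: one-dimensional (column) table */
   table sex*height*min;
   /* One comma: two-dimensional (row, column) table */
   table sex, height*min;
   /* Two commas: three-dimensional (page, row, column) table */
   table actlevel, sex, height*min;
run;
```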

Specifying Statistics
Your final task before writing your own PROC TABULATE step is to specify the statistics needed. To request a statistic, you use an operator, the asterisk (*), to attach the statistic to the variable. In the TABLE statements below, the statistic MEAN is specified for the variable Fee. Notice that you don't specify statistics in the CLASS or VAR statements. Notice also how changing the order of the elements changes the order of the column headings.

proc tabulate data=clinic.admit; var fee; table fee*mean; run;

   Fee
  Mean
127.95

proc tabulate data=clinic.admit; var fee; table mean*fee; run;

  Mean
   Fee
127.95

Rules for Specifying Statistics
1. If you specify only class variables in your TABLE statement,
   A) the default statistic is N (frequency)
   B) the only statistics you can request are N and PCTN (percent of total frequency).
proc tabulate data=clinic.admit; class sex actlevel; table sex*pctn actlevel*n; run;

The TABLE statement in the PROC TABULATE step above specifies only class variables, so it can request only N and PCTN.

2. If you specify any analysis variables in your TABLE statement,
   A) the default statistic is SUM
   B) you can request any statistic to be computed on the analysis variables.

proc tabulate data=clinic.admit; class actlevel; var height weight; table height*mean weight*max,actlevel; Run;

3. In a TABLE statement, you can specify statistics in any dimension, but they must all be in the same dimension.

proc tabulate data=clinic.admit; class sex actlevel; var height weight; table height*mean weight*max,actlevel; table sex*pctn actlevel*n; run;

Lesson : 18

Creating Plots
SAS/GRAPH software enables you to display your data graphically. To create a variety of plots, you can use the GPLOT procedure within SAS/GRAPH software.

Objectives

In this lesson, you learn to
1. invoke the GPLOT procedure and name the data set to be used
2. request a plot and specify variables to be plotted
3. scale axes
4. select observations
5. overlay plots
6. define plotting symbols and the method of interpolation
7. view graphs using the GRAPH, Results, and Explorer windows
8. specify an output catalog and store graphs temporarily or permanently.

Creating a Basic Plot
Let's start by using the GPLOT procedure to plot one variable against another within a set of coordinate axes. You specify
1. a PROC GPLOT statement to invoke the procedure and identify the data set to be used
2. a PLOT statement to specify the variables to be plotted.

proc gplot data=clinic.totals2000; plot newadmit*month; run;

The graph below is the output from the PROC GPLOT step above. The entire graph appears in one default color, and the default plotting symbols (plus signs) are not connected.

Syntax
PROC GPLOT DATA=SAS-data-set; PLOT vertical-variable*horizontal-variable; RUN; where
SAS-data-set identifies the data set to be used
vertical-variable is the variable plotted on the vertical axis
horizontal-variable is the variable plotted on the horizontal axis.

Scaling Axes
To scale the axes in your plot, you can specify the VAXIS= and HAXIS= options in the PLOT statement. The VAXIS= option specifies tick marks along the vertical axis, and the HAXIS= option specifies tick marks along the horizontal axis. Notice that a slash (/) precedes options in the PLOT statement.
General form, VAXIS= and HAXIS= options: PLOT vertical-variable*horizontal-variable / VAXIS=<value-list | range> HAXIS=<value-list | range>; where value-list or range determines the placement of tick marks along the axis.

Specifying Values
A value list:

haxis='MON' 'TUES' 'WED' 'THUR' 'FRI'


A range of values

vaxis=10 to 100 by 10

Value Lists
proc gplot data=clinic.therapy1999; plot aerclass*month / haxis='01' '06' '12'; run;

Range of Values
You can also scale axes by specifying a range of values. Notice the default scaling of the vertical axis in the plot below.

proc gplot data=clinic.therapy1999; plot aerclass*month / haxis='01' '06' '12' vaxis=0 to 100 by 50; run;

When you specify a range of values, be sure to scale the axis in workable increments to accommodate the smallest and largest data values to be plotted.

Enhancing Plots
Now that you know how to create single and overlaid plots, you can use the SYMBOL statement to enhance your plots by specifying plotting symbols, plot lines, color, and interpolation. (Interpolation is a technique for estimating values between plot points and drawing lines to connect the points.)

Option       Specifies
VALUE=       plotting symbol
HEIGHT=      height of the plotting symbol
INTERPOL=    interpolation technique
WIDTH=       thickness of the line in pixels
COLOR=       color of plotting symbols or lines

symbol1 color=red value=star interpol=spline height=1 cm width=4;

proc gplot data=clinic.totals2000; plot therapy*month; run;

symbol1 color=red value=star interpol=spline height=1 cm width=4; symbol2 color=green value=plus interpol=spline height=1 cm width=4;

proc gplot data=clinic.totals2000; plot newadmit*month therapy*month / overlay; run;

Setting Plotting Symbols
The VALUE= (or V=) option specifies the plotting symbol that represents each data point. Possible values for the VALUE= option include

1. the letters A through W
2. the numbers 0 through 9
3. a number of special symbols, including PLUS, STAR, SQUARE, DIAMOND, TRIANGLE, and many others
4. NONE, which produces a plot with no symbols for data points.

symbol1 value=square color=black; proc gplot data=clinic.totals2000; plot newadmit*month; run;

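Setting Plotting Symbol Height
The HEIGHT= (or H=) option specifies the height of the plotting symbol. You can specify the height in
1. percentage of the display area (PCT)
2. inches (IN)
3. centimeters (CM)
4. points (PT)
5. character cells (CELL), which is the default unit.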

symbol1 value=triangle height=1 cm color=black; proc gplot data=clinic.totals2000; plot newadmit*month; run;

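Specifying Connecting Lines
The INTERPOL= (or I=) option specifies the interpolation technique, that is, whether and how the plot points are connected. Possible values include NONE, JOIN, NEEDLE, SPLINE, HILO, STD, and more.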

symbol1 value=triangle interpol=none;

symbol1 value=triangle interpol=join;

symbol1 value=triangle interpol=needle;

Specifying Color

symbol1 value=square interpol=spline height=1 cm width=2.5 color=red;

Viewing and Storing Plots


By default, the plots or other graphs that you create using SAS/GRAPH software are stored as entries in a temporary SAS catalog named Work.Gseg. When you create a graph, it is appended to the catalog. Each stored graph is known as a catalog entry and has the type GRSEG. You can access a stored graph by its four-level name, libref.catalog.entry-name.entry-type, for example Work.Gseg.Gplot.Grseg.

Viewing the Contents of the Output Catalog

proc gplot data=clinic.therapy1999 gout=newcat; plot swim*month aerclass*month; where swim>35 and aerclass>35; run;
The GOUT= option stores the graphs in the temporary catalog Work.Newcat instead of the default catalog Work.Gseg. After you create graphs, you can view and manage the entries in the catalog where they are stored (whether or not you specified an output catalog using the GOUT= option).
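One way to see which graphs were stored is to list the catalog's entries with PROC CATALOG. A sketch, assuming the GOUT=newcat step above has already run in the same SAS session so that Work.Newcat exists; to list graphs stored in the default catalog, name work.gseg instead.

```sas
/* List the entries (type GRSEG) in the catalog named by GOUT=newcat */
proc catalog catalog=work.newcat;
   contents;
run;
quit;
```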

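Additional Features
The HMINOR= and VMINOR= options specify the number of minor tick marks between the major tick marks on the horizontal and vertical axes. In the example below, VMINOR=3 adds three minor tick marks between the major ones on the vertical axis, and HMINOR=0 removes them from the horizontal axis.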

symbol1 color=red interpol=spline value=none; symbol2 color=blue interpol=spline value=none; proc gplot data=air.airqual; plot avgtsp*month=state / vminor=3 hminor=0; where state in ("AL" , "GA"); run;

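The CAXIS= option specifies the color of the axis lines and tick marks, and the CTEXT= option specifies the color of the text.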

proc gplot data=air.airqual; plot avgtsp*month=state / vminor=3 hminor=0 ctext=brown caxis=red; where state in ("AL" , "NY"); run;

Lesson : 19

Creating Bar and Pie Charts


To create bar charts and pie charts that display relationships within your data graphically, you can use the GCHART procedure within SAS/GRAPH software.

[Figures: Bar Chart, Pie Chart]

Objectives
In this lesson, you learn to
1. invoke the GCHART procedure and specify the data set to be used
2. specify the type of chart to be created
3. specify the statistic to be displayed
4. summarize a variable within categories
5. specify variables to be displayed
6. select observations
7. control the pattern and color of bars and slices
8. use RUN-group processing
9. view and store charts.

Invoking the GCHART Procedure


PROC GCHART <DATA=SAS-data-set>; chart-form chart-variable </ options>; RUN; where
SAS-data-set is the name of the SAS data set to be used
chart-form is HBAR, HBAR3D, VBAR, VBAR3D, PIE, or PIE3D, which specify a 2D or 3D horizontal bar chart, vertical bar chart, or pie chart, respectively
chart-variable is the variable that determines the number of bars or pie slices
/ (slash) indicates that options follow
options are any valid options for the chart form specified.

Note: The default statistic for GCHART is FREQ (frequency).

Example:

proc gchart data=clinic.admit; hbar sex; vbar age; pie actlevel; run;

Specifying the Chart Type and Variables


After you invoke the GCHART procedure and specify the data set to use, you specify
1. the type of chart you want to create (horizontal bar chart, vertical bar chart, or pie chart)
2. the chart variable that determines the number of bars or pie slices to create.

Character Chart Variables

proc gchart data= clinic.admit; hbar sex; run;

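Numeric Chart Variables
For a numeric chart variable, PROC GCHART by default divides the values into ranges and draws one bar for the midpoint of each range. To get one bar for each distinct value instead, you can specify the DISCRETE option.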

proc gchart data=clinic.admit; vbar age; run;

Default Statistics for Charts

The type of chart determines the statistics displayed. By default, PROC GCHART displays the FREQ (frequency) of the chart variable. The following program creates a pie chart that displays the frequency of the variable ActLevel.

proc gchart data=clinic.admit; pie actlevel; run;

For horizontal bar charts, PROC GCHART also displays the statistics CFREQ (cumulative frequency), PERCENT (percentage), and CPERCENT (cumulative percentage). The following program creates a horizontal bar chart of the variable Company. Notice the four default statistics for this type of chart.

proc gchart data=clinic.insure; hbar company; run;

Specifying Statistics
To specify a statistic other than the default statistic FREQ, you use the TYPE= option in the statement that specifies the chart. Let's look at the general form of the GCHART procedure again, this time focusing on the options. The TYPE= option is just one of the possible options. Notice that a slash precedes options.

PROC GCHART <DATA=SAS-data-set>; chart-form chart-variable / TYPE=statistic; RUN; where statistic indicates the statistic of the chart variable to be displayed. Statistics include CFREQ (cumulative frequency), PERCENT (percent), and CPERCENT (cumulative percent).

proc gchart data=clinic.insure; vbar company / type=cfreq; run;

Summarizing a Variable within Categories
In addition to specifying a particular statistic for your chart, you may want to summarize one variable within categories defined by a second variable. You can use the SUMVAR= option to summarize a variable within categories.
PROC GCHART <DATA=SAS-data-set>; chart-form chart-variable / SUMVAR=summary-variable; RUN; where the values of summary-variable are summarized for each unique value of chart-variable.
When you specify SUMVAR=, the default statistic is SUM, so the chart displays the total of the values of the summary variable for each unique value of the chart variable.

When you use SUMVAR=, you can also use TYPE=. However, the value of TYPE= can be only SUM or MEAN.

proc gchart data=clinic.insure; vbar company / sumvar=balancedue; run;

Enhancing Charts
The default fill for horizontal and vertical bar charts specifies that all bars will be the same solid color. To change the default values for bar charts, you use the PATTERNID= option in the statement that specifies the chart. This topic focuses on PATTERNID=MIDPOINT, which specifies a different color/pattern combination for each bar.

General form, PATTERNID= option: PATTERNID=<BY | MIDPOINT | GROUP | SUBGROUP> where BY, MIDPOINT, GROUP, or SUBGROUP specify that bar colors and/or patterns vary according to the option specified.

This program shows the effect of the PATTERNID=MIDPOINT option on bar color and fill patterns.
proc gchart data=clinic.insure; vbar company / sumvar=balancedue type=mean patternid=midpoint; run;

To change the fill pattern to either all solid or all hatch, you can use the FILL= option in the PIE or PIE3D statement: FILL=<X|S>, where X specifies hatch patterns and S specifies solid fill.

The second program below shows how the FILL=X option changes the slice patterns. Only one hatch pattern is used, and it rotates through the color list. Note that one color (green) is used twice because there were not as many colors in the color list as there are slices in the chart (some colors were assigned to companies with no balance due or companies included in OTHER).

Example
This program shows the default color and fill pattern for pie charts. Note that both solid and hatch patterns are used.
proc gchart data=clinic.insure; pie company / sumvar=balancedue type=mean; run;
proc gchart data=clinic.insure; pie company / sumvar=balancedue type=mean fill=x; run;

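You can also use PATTERN statements to choose bar colors, and the SUBGROUP= option to divide each bar into segments for the values of another variable. With PATTERNID=SUBGROUP, each value of the subgroup variable (here, Sex) gets its own pattern.
pattern1 color=lib; pattern2 color=lig; proc gchart data=clinic.admit; hbar age / sumvar=weight type=mean subgroup=sex patternid=subgroup mean; run;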

Additional Features
For pie charts, you can specify text color by using the CTEXT= option, control where labels appear by using the SLICE= option, and explode one or more pie slices for effect by using the EXPLODE= option. In the example below, SLICE=ARROW connects each label to its slice with an arrow, and the slice for the ActLevel value HIGH is exploded.
proc gchart data=clinic.admit; pie3d actlevel / sumvar=fee type=mean ctext=blue slice=arrow explode="HIGH"; run;

pattern1 color=lib; pattern2 color=lig; proc gchart data=clinic.admit; vbar3d age / sumvar=weight type=mean subgroup=sex patternid=subgroup mean; run;

Lesson :20

Computing Statistics for Numeric Variables


Descriptive statistics such as mean, sum, minimum, and maximum can answer basic questions about numeric data. The MEANS procedure provides these and other data summarization tools, as well as helpful options for controlling your output. This lesson will show you how to use PROC MEANS to analyze your data in several ways.

Procedure Syntax
The MEANS procedure can include many statements and options for specifying needed statistics. For the sake of simplicity, let's look at a few key statements and consider the procedure in its basic form.
General form, basic MEANS procedure: PROC MEANS <DATA=SAS-data-set> <statistic-keyword(s)> <option(s)>; RUN; where
SAS-data-set identifies the data set to process
statistic-keyword(s) specifies the statistics to compute
option(s) control the content, analysis, and appearance of output.

Objectives
In this lesson, you learn to
1. determine the n-count, mean, standard deviation, minimum, and maximum of numeric variables
2. generate a wide range of descriptive, quantile, and hypothesis-testing statistics
3. control the number of decimal places used in PROC MEANS output
4. use the VAR statement to analyze specific variables
5. use CLASS and BY statements to categorize data.

In its simplest form, PROC MEANS prints the n-count (number of non-missing values), mean, standard deviation, and minimum and maximum values of every numeric variable in a data set.
proc means data=perm.survey; run;
Variable  N         Mean      Std Dev      Minimum      Maximum
Item1     4    3.7500000    1.2583057    2.0000000    5.0000000
Item2     4    3.0000000    1.6329932    1.0000000    5.0000000
Item3     4    4.2500000    0.5000000    4.0000000    5.0000000
Item4     4    3.5000000    1.2909944    2.0000000    5.0000000
Item5     4    3.0000000    1.6329932    1.0000000    5.0000000
Item6     4    3.7500000    1.2583057    2.0000000    5.0000000
Item7     4    3.0000000    1.8257419    1.0000000    5.0000000
Item8     4    2.7500000    1.5000000    1.0000000    4.0000000
Item9     4    3.0000000    1.4142136    2.0000000    5.0000000
Item10    4    3.2500000    1.2583057    2.0000000    5.0000000
Item11    4    3.0000000    1.8257419    1.0000000    5.0000000
Item12    4    2.7500000    0.5000000    2.0000000    3.0000000
Item13    4    2.7500000    1.5000000    1.0000000    4.0000000
Item14    4    3.0000000    1.4142136    2.0000000    5.0000000
Item15    4    3.0000000    1.6329932    1.0000000    5.0000000
Item16    4    2.5000000    1.9148542    1.0000000    5.0000000
Item17    4    3.0000000    1.1547005    2.0000000    4.0000000
Item18    4    3.2500000    1.2583057    2.0000000    5.0000000

Specifying Statistics
The default statistics produced by the MEANS procedure (n-count, mean, standard deviation, minimum, and maximum) are not always the ones that you need. You might prefer to limit output to the mean of the values. Or you might need to compute a different statistic, such as the median or range of the values. To specify statistics, include statistic keywords as PROC MEANS options. When you specify a statistic in the PROC MEANS statement, default statistics are not produced. For example, to see the median and range of Perm.Survey numeric values, add the MEDIAN and RANGE keywords as options.

Statistic Keywords

Descriptive Statistics
CLM            Two-sided confidence limit for the mean
CSS            Corrected sum of squares
CV             Coefficient of variation
KURTOSIS       Kurtosis
LCLM           One-sided confidence limit below the mean
MAX            Maximum value
MEAN           Average
MIN            Minimum value
N              Number of observations with nonmissing values
NMISS          Number of observations with missing values
RANGE          Range
SKEWNESS       Skewness
STDDEV / STD   Standard deviation
STDERR         Standard error of the mean
SUM            Sum
SUMWGT         Sum of the Weight variable values
UCLM           One-sided confidence limit above the mean
USS            Uncorrected sum of squares
VAR            Variance

Quantile Statistics
MEDIAN / P50   Median or 50th percentile
P1             1st percentile
P5             5th percentile
P10            10th percentile
Q1 / P25       Lower quartile or 25th percentile
Q3 / P75       Upper quartile or 75th percentile
P90            90th percentile
P95            95th percentile
P99            99th percentile
QRANGE         Difference between upper and lower quartiles: Q3-Q1

proc means data=perm.survey median range;
run;

Variable   Median      Range
Item1      4.0000000   3.0000000
Item2      3.0000000   4.0000000
Item3      4.0000000   1.0000000
Item4      3.5000000   3.0000000
Item5      3.0000000   4.0000000
Item6      4.0000000   3.0000000
Item7      3.0000000   4.0000000
Item8      3.0000000   3.0000000
Item9      2.5000000   3.0000000
Item10     3.0000000   3.0000000
Item11     3.0000000   4.0000000
Item12     3.0000000   1.0000000
Item13     3.0000000   3.0000000
Item14     2.5000000   3.0000000
Item15     3.0000000   4.0000000
Item16     2.0000000   4.0000000
Item17     3.0000000   2.0000000
Item18     3.0000000   3.0000000
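For readers who want to check a few of these statistics by hand, the following Python sketch (an illustration, not SAS code) computes several MEANS keywords for one variable. It assumes the SAS default VARDEF=DF, under which the standard deviation uses an n-1 divisor, as `statistics.stdev` does. `None` stands in for a SAS missing value, and the `item4` values are hypothetical rather than taken from Perm.Survey. Quartile keywords are left out because SAS's default percentile definition need not match Python's.

```python
import statistics

def describe(values):
    """Mimic a few PROC MEANS statistics for one numeric variable.

    None stands in for a SAS missing value: it is excluded from every
    statistic and counted separately, like N and NMISS.
    """
    present = [v for v in values if v is not None]
    return {
        "N": len(present),
        "NMISS": len(values) - len(present),
        "MEAN": statistics.mean(present),
        "STD": statistics.stdev(present),  # n-1 divisor, like VARDEF=DF
        "MIN": min(present),
        "MAX": max(present),
        "MEDIAN": statistics.median(present),
        "RANGE": max(present) - min(present),
    }

# Hypothetical responses for one survey item (not the actual Perm.Survey data).
item4 = [2, 5, 3, 4, None, 3, 4]
for name, value in describe(item4).items():
    print(f"{name:<7}{value:10.4f}")
```

As in the SAS output, RANGE is simply MAX minus MIN of the nonmissing values.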

Limiting Decimal Places

By default, PROC MEANS output uses the BEST. format. This can result in unnecessary decimal places, making your output hard to read. To limit decimal places, use the MAXDEC= option in the PROC MEANS statement and set it equal to the number of decimal places that you want.

General form, PROC MEANS statement with MAXDEC= option:
PROC MEANS <DATA=SAS-data-set> <statistic-keyword(s)> MAXDEC=n;
where n specifies the maximum number of decimal places.

proc means data=clinic.diabetes min max maxdec=0;
run;

Variable   Minimum   Maximum
Age        15        63
Height     61        75
Weight     102       240
Pulse      65        100
FastGluc   152       568
PostGluc   206       625

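Selecting Variables

By default, PROC MEANS analyzes every numeric variable in the data set. To limit the analysis to specific variables, list them in a VAR statement.

proc means data=clinic.diabetes min max maxdec=0;
   var age height weight;
run;

Variable   Minimum   Maximum
Age        15        63
Height     61        75
Weight     102       240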

CLASS Group Processing

You will often want statistics for grouped observations, instead of for observations as a whole. For example, census numbers are more useful when grouped by region than when viewed as a national total. To produce separate analyses of grouped observations, add a CLASS statement to the MEANS procedure.

proc means data=clinic.heart maxdec=1;
   var arterial heart cardiac urinary;
   class survive sex;
run;

Survive  Sex  N Obs  Variable   N    Mean   Std Dev   Minimum   Maximum
DIED     1        4  Arterial   4    92.5      10.5      83.0     103.0
                     Heart      4   111.0      53.4      54.0     183.0
                     Cardiac    4   176.8      75.2      95.0     260.0
                     Urinary    4    98.0     186.1       0.0     377.0
         2        6  Arterial   6    94.2      27.3      72.0     145.0
                     Heart      6   103.7      16.7      81.0     130.0
                     Cardiac    6   318.3     102.6     156.0     424.0
                     Urinary    6   100.3     155.7       0.0     405.0
SURV     1        5  Arterial   5    77.2      12.2      61.0      88.0
                     Heart      5   109.0      32.0      77.0     149.0
                     Cardiac    5   298.0     139.8      66.0     410.0
                     Urinary    5   100.8      60.2      44.0     200.0
         2        5  Arterial   5    78.8       6.8      72.0      87.0
                     Heart      5   100.0      13.4      84.0     111.0
                     Cardiac    5   330.2      87.0     256.0     471.0
                     Urinary    5   111.2     152.4      12.0     377.0

BY Group Processing

Like the CLASS statement, the BY statement specifies variables to use for categorizing observations.

General form, BY statement: BY variable(s); where variable(s) specifies the category variables for group processing.

Unlike CLASS processing, BY processing requires that your data already be sorted in the order of the BY variables. Unless the data set observations are already sorted, you will need to run the SORT procedure before using PROC MEANS with a BY statement.

proc sort data=clinic.heart out=work.hartsort;
   by survive sex;
run;
proc means data=work.hartsort maxdec=1;
   var arterial heart cardiac urinary;
   by survive sex;
run;

Survive=DIED Sex=1

Variable   N    Mean   Std Dev   Minimum   Maximum
Arterial   4    92.5      10.5      83.0     103.0
Heart      4   111.0      53.4      54.0     183.0
Cardiac    4   176.8      75.2      95.0     260.0
Urinary    4    98.0     186.1       0.0     377.0

Survive=DIED Sex=2

Variable   N    Mean   Std Dev   Minimum   Maximum
Arterial   6    94.2      27.3      72.0     145.0
Heart      6   103.7      16.7      81.0     130.0
Cardiac    6   318.3     102.6     156.0     424.0
Urinary    6   100.3     155.7       0.0     405.0

Survive=SURV Sex=1

Variable   N    Mean   Std Dev   Minimum   Maximum
Arterial   5    77.2      12.2      61.0      88.0
Heart      5   109.0      32.0      77.0     149.0
Cardiac    5   298.0     139.8      66.0     410.0
Urinary    5   100.8      60.2      44.0     200.0

Survive=SURV Sex=2

Variable   N    Mean   Std Dev   Minimum   Maximum
Arterial   5    78.8       6.8      72.0      87.0
Heart      5   100.0      13.4      84.0     111.0
Cardiac    5   330.2      87.0     256.0     471.0
Urinary    5   111.2     152.4      12.0     377.0
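The practical difference between CLASS and BY processing can be sketched in Python. This is an analogy, not how SAS is implemented: a dictionary collects every row for a key wherever it appears, which resembles CLASS, while `itertools.groupby` merges only adjacent rows with equal keys, which resembles BY and is why the data must be sorted first. The rows are hypothetical. With unsorted data, SAS typically stops BY processing with an error (unless the NOTSORTED option is used), whereas `groupby` silently splits a group, as the last line shows.

```python
from itertools import groupby
from statistics import mean

# Hypothetical (survive, sex, arterial) rows; not the actual Clinic.Heart data.
rows = [
    ("SURV", 1, 88), ("DIED", 1, 83), ("SURV", 1, 61),
    ("DIED", 2, 145), ("DIED", 1, 103), ("SURV", 2, 72),
]

def class_means(rows):
    """CLASS-style: collect every row for a key regardless of input order."""
    groups = {}
    for survive, sex, value in rows:
        groups.setdefault((survive, sex), []).append(value)
    return {key: mean(vals) for key, vals in sorted(groups.items())}

def by_means(rows):
    """BY-style: groupby merges only *adjacent* rows with equal keys."""
    key = lambda r: (r[0], r[1])
    return [(k, mean(r[2] for r in grp)) for k, grp in groupby(rows, key=key)]

print(class_means(rows))
print(by_means(sorted(rows)))   # sorted first, like PROC SORT before BY
print(by_means(rows))           # unsorted: the same key shows up more than once
```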

Lesson :21

Computing Frequency Distributions


Frequency tables show the distribution of variable values, both as counts and as percentages of the total. SAS software's FREQ procedure creates one-way frequency tables as well as two-way and n-way crosstabulation tables. It can also compute measures of association and of agreement, and it can organize output by stratification variables. This lesson shows you how to use PROC FREQ to perform basic data analysis.

Procedure Syntax

The FREQ procedure can include many statements and options for controlling frequency output. For the sake of simplicity, we'll consider the procedure in its basic form.

General form, basic FREQ procedure: PROC FREQ <DATA=SAS-data-set>; RUN; where SAS-data-set names the data set to be used.

By default, PROC FREQ creates a one-way table with the frequency, percent, cumulative frequency, and cumulative percent of every value of all variables in a data set.

proc freq data=parts.widgets; run;

ItemName  Frequency  Percent  Cumulative Frequency  Cumulative Percent
Bolt      2930       34.52    2930                  34.52
Locknut   3106       36.60    6036                  71.12
Washer    2451       28.88    8487                  100.00
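The four columns of a one-way table follow from simple counting. The Python sketch below is illustrative only (not SAS): it rebuilds a column with the same counts as the ItemName table above and derives the percent, cumulative frequency, and cumulative percent, listing values in ascending order like the default internal order. Percentages are rounded to two decimals to match the display.

```python
from collections import Counter

def one_way(values):
    """Frequency, percent, cumulative frequency and cumulative percent
    for each distinct value, in ascending (internal) order."""
    counts = Counter(values)
    total = sum(counts.values())
    table, cum = [], 0
    for value in sorted(counts):
        cum += counts[value]
        table.append((value, counts[value],
                      round(100 * counts[value] / total, 2),
                      cum, round(100 * cum / total, 2)))
    return table

# Rebuild a column with the same counts as the ItemName table above.
items = ["Bolt"] * 2930 + ["Locknut"] * 3106 + ["Washer"] * 2451
for row in one_way(items):
    print(row)
```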

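Selecting Variables

By default, PROC FREQ creates a frequency table for every variable in the data set. To limit the output to specific variables, add a TABLES statement.

General form, TABLES statement: TABLES variable(s); where variable(s) lists the variables to include.

For example, listing the variable Region in the TABLES statement produces this table:

Region  Frequency  Percent  Cumulative Frequency  Cumulative Percent
East    2848       33.56    2848                  33.56
North   1355       15.97    4203                  49.53
South   1706       20.10    5909                  69.63
West    2578       30.38    8487                  100.00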

Specifying Frequency Order

By default, PROC FREQ displays frequency distributions in the order of each variable's unformatted values. This is known as internal order.

proc freq data=clinic.diabetes;
   tables height;
run;

Height  Frequency  Percent  Cumulative Frequency  Cumulative Percent
61      2          10.00     2                    10.00
62      1           5.00     3                    15.00
63      1           5.00     4                    20.00
64      3          15.00     7                    35.00
65      2          10.00     9                    45.00
66      2          10.00    11                    55.00
68      2          10.00    13                    65.00
70      2          10.00    15                    75.00
71      2          10.00    17                    85.00
72      1           5.00    18                    90.00
73      1           5.00    19                    95.00
75      1           5.00    20                    100.00

You might prefer to view the values in a different order. To control the way that PROC FREQ displays distributions, add the ORDER= option to the PROC FREQ statement and specify the method that you prefer.

General form, ORDER= option: ORDER=DATA|FORMATTED|FREQ|INTERNAL
where
DATA orders values by their order of appearance in the data set
FORMATTED orders values by their formatted values
FREQ orders values by descending frequency count
INTERNAL orders values by their unformatted values (the default).

Example: ORDER=DATA

proc freq data=clinic.diabetes order=data;
   tables height;
run;

Height  Frequency  Percent  Cumulative Frequency  Cumulative Percent
61      2          10.00     2                    10.00
71      2          10.00     4                    20.00
66      2          10.00     6                    30.00
64      3          15.00     9                    45.00
63      1           5.00    10                    50.00
72      1           5.00    11                    55.00
62      1           5.00    12                    60.00
73      1           5.00    13                    65.00
65      2          10.00    15                    75.00
70      2          10.00    17                    85.00
75      1           5.00    18                    90.00
68      2          10.00    20                    100.00

Example: ORDER=FREQ

proc freq data=clinic.diabetes order=freq;
   tables height;
run;

Height  Frequency  Percent  Cumulative Frequency  Cumulative Percent
64      3          15.00     3                    15.00
61      2          10.00     5                    25.00
65      2          10.00     7                    35.00
66      2          10.00     9                    45.00
68      2          10.00    11                    55.00
70      2          10.00    13                    65.00
71      2          10.00    15                    75.00
62      1           5.00    16                    80.00
63      1           5.00    17                    85.00
72      1           5.00    18                    90.00
73      1           5.00    19                    95.00
75      1           5.00    20                    100.00
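The three orderings above can be sketched in Python to show what each option sorts by. The `heights` list is hypothetical: it is built only so that its counts and first-appearance order match the three tables, because the actual row order of Clinic.Diabetes is not shown. In the ORDER=FREQ table, tied counts appear in ascending value order, so the sketch breaks ties the same way. FORMATTED order is omitted because it depends on a user-defined format.

```python
from collections import Counter

# Hypothetical Height column (not the real Clinic.Diabetes rows): built so that
# its counts and first-appearance order match the three tables above.
heights = [61, 71, 66, 64, 63, 72, 62, 73, 65, 70, 75, 68,
           61, 71, 66, 64, 64, 65, 70, 68]

counts = Counter(heights)

internal = sorted(counts)                             # ORDER=INTERNAL (default)
data = list(dict.fromkeys(heights))                   # ORDER=DATA: first appearance
freq = sorted(counts, key=lambda h: (-counts[h], h))  # ORDER=FREQ: descending count,
                                                      # ties in ascending value order

print(internal)
print(data)
print(freq)
```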

Two-Way Tables

So far, you have used the FREQ procedure to create one-way frequency tables. The table results show total frequency counts for the values within the data set. However, it is often helpful to crosstabulate frequencies with the values of other variables. Census data, for example, is typically crosstabulated with a variable representing geographical regions. The simplest crosstabulation is a two-way table. To create a two-way table, join two variables with an asterisk (*) in the TABLES statement of a PROC FREQ step.

General form, TABLES statement for crosstabulation: TABLES variable-1*variable-2 <* ... variable-n>; where, for two-way tables, variable-1 specifies the table rows and variable-2 specifies the table columns.
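As a rough illustration of what a two-way table counts, this Python sketch tallies each (row value, column value) pair, with the first variable supplying rows and the second supplying columns, as in TABLES variable-1*variable-2. The `sex` and `actlevel` values are hypothetical, and the sketch shows only cell frequencies; PROC FREQ also reports overall, row, and column percentages.

```python
from collections import Counter

def crosstab(rows, cols):
    """Cell counts for a two-way table: rows come from the first variable,
    columns from the second, as in TABLES var1*var2."""
    cells = Counter(zip(rows, cols))
    row_levels, col_levels = sorted(set(rows)), sorted(set(cols))
    return row_levels, col_levels, [[cells[(r, c)] for c in col_levels] for r in row_levels]

# Hypothetical values for two variables, one observation per position.
sex = ["F", "M", "F", "F", "M", "M", "F"]
actlevel = ["LOW", "HIGH", "MOD", "LOW", "LOW", "HIGH", "MOD"]

r, c, table = crosstab(sex, actlevel)
print("      " + "  ".join(f"{x:>4}" for x in c))
for level, counts in zip(r, table):
    print(f"{level:<6}" + "  ".join(f"{n:>4}" for n in counts))
```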

Lesson: 23 Generating Data with DO Loops

You can execute SAS statements repeatedly by placing them in a DO loop. Unlike simple DO statements, which execute as a group when an IF condition is met, DO loops execute any number of times in a single iteration of the DATA step. Using DO loops lets you write concise DATA steps that are easier to alter and debug.

For example, the DO loop in this program eliminates the need for 12 separate programming statements to calculate annual earnings:

data finance.earnings;
   set finance.master;
   Earned=0;
   do count=1 to 12;
      earned+(amount+earned)*(rate/12);
   end;
run;

You can also use DO loops to generate data, conditionally execute statements, and read data.

Objectives

After completing this lesson, you will be able to construct a DO loop to perform repetitive calculations, control the execution of a DO loop, generate multiple observations in one iteration of the DATA step, and construct nested DO loops.

Constructing DO Loops

DO loops process a group of statements repeatedly rather than once. This can greatly reduce the number of statements required for a repetitive calculation. For example, these twelve sum statements compute a company's annual earnings from investments. Notice that all twelve statements are identical.

data finance.earnings;
   set finance.master;
   Earned=0;
   earned+(amount+earned)*(rate/12);
   earned+(amount+earned)*(rate/12);
   earned+(amount+earned)*(rate/12);
   earned+(amount+earned)*(rate/12);
   earned+(amount+earned)*(rate/12);
   earned+(amount+earned)*(rate/12);
   earned+(amount+earned)*(rate/12);
   earned+(amount+earned)*(rate/12);
   earned+(amount+earned)*(rate/12);
   earned+(amount+earned)*(rate/12);
   earned+(amount+earned)*(rate/12);
   earned+(amount+earned)*(rate/12);
run;

A DO loop enables you to achieve the same results with fewer statements. In this case, the sum statement executes twelve times within the DO loop during each iteration of the DATA step.

data finance.earnings;
   set finance.master;
   Earned=0;
   do count=1 to 12;
      earned+(amount+earned)*(rate/12);
   end;
run;


General form, DO loop:

DO index-variable=specification-1<,...specification-n>;
   more SAS statements
END;

where index-variable controls the execution of the DO group; specification includes the start, stop, and increment values; SAS statements create, modify, or process variables; and END terminates the loop.

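When creating a DO loop with the iterative DO statement, you must specify an index variable. The index variable stores the value of the current iteration of the DO loop. Use any valid SAS name. Next, specify the start and stop values. By default, the index variable is incremented by 1, so this specification produces Quiz values of 1, 2, 3, 4, 5:

do quiz=1 to 5;

To change the increment, add a BY clause. For example, this specification increments the index variable by 2, resulting in Rows values of 2, 4, 6, 8, 10, 12:

do rows=2 to 12 by 2;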

DO Loop Execution

Using the form of the DO loop just presented, let's see how the DO loop executes in the DATA step. This example calculates the interest earned each month for a one-year investment.

data finance.earnings;
   Amount=1000;
   Rate=.075/12;
   do month=1 to 12;
      Earned+(amount+earned)*(rate);
   end;
run;

This DATA step does not read data from another source. When submitted, it compiles and then executes only once to generate data. During compilation, the program data vector is created for the Finance.Earnings data set.

When the DATA step executes, the values of Amount and Rate are assigned.


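Next, the DO loop executes. During each execution of the DO loop, the value of Earned is calculated and added to its previous value. On the twelfth execution of the DO loop, Month is 12 and Earned has accumulated to about 77.63. The index variable is then incremented to 13, which exceeds the stop value, so the loop ends and Month is 13 when the observation is written.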
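The loop mechanics described above can be mirrored in Python. This is a sketch of the arithmetic, not SAS: a `while` loop is used on purpose because, like the iterative DO statement, it tests the index against the stop value before each pass and increments it at the bottom, so the index ends one step past the stop value. A Python `for month in range(1, 13)` loop would instead leave `month` at 12.

```python
def do_loop_earnings(amount=1000.0, annual_rate=0.075, months=12):
    """Mirror the DATA step: Earned starts at 0 (as a sum statement does)
    and each pass adds one month of interest on Amount plus Earned."""
    rate = annual_rate / 12
    earned = 0.0
    month = 1                      # index variable set to the start value
    while month <= months:         # tested against the stop value before each pass
        earned += (amount + earned) * rate
        month += 1                 # incremented at the bottom of the loop
    return month, earned

month, earned = do_loop_earnings()
print(month, round(earned, 2))     # prints: 13 77.63
```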


You can also include an OUTPUT statement within the DO loop to write an observation to the data set for each iteration of the DO loop. The OUTPUT statement writes the current values to the data set immediately. By including an OUTPUT statement, you override the automatic output at the end of the DATA step. In the following example, when the index variable Year reaches 21 the loop ends without executing OUTPUT, so no observation is written for that value and Work.Earn contains 20 observations.

data work.earn;
   Value=2000;
   do year=1 to 20;
      Interest=value*.075;
      value+interest;
      output;
   end;
run;
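To see why this step produces one observation per year, here is a Python sketch (not SAS) in which each pass of the loop appends a record, the way each OUTPUT statement writes one. The record layout as a dictionary is an illustrative assumption. Note the `years + 1`: Python's `range` excludes its stop value, while the DO statement's stop value is inclusive.

```python
def earn_observations(value=2000.0, rate=0.075, years=20):
    """One dict per OUTPUT statement: each pass of the loop appends the
    current Year, Value and Interest, mirroring an explicit OUTPUT."""
    observations = []
    for year in range(1, years + 1):   # Python's stop is exclusive, hence + 1
        interest = value * rate
        value += interest
        observations.append({"Year": year, "Value": value, "Interest": interest})
    return observations

obs = earn_observations()
print(len(obs))                        # one observation per year
print(obs[0])
print(round(obs[-1]["Value"], 2))
```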

Decrementing DO Loops

You can decrement a DO loop by specifying a negative value for the BY clause. For example, the specification in this iterative DO statement decreases the index variable by 1, resulting in values of 5, 4, 3, 2, 1.

DO index-variable=5 to 1 by -1; more SAS statements END;
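When porting DO loops to another language, the usual trap is the stop value. The hypothetical helper `sas_do` below is a Python sketch of the index values an iterative DO statement produces: the stop value is inclusive and a negative BY counts down. Python's built-in `range` excludes its stop value, as the last line shows. Raising an error for a zero BY value is a choice made for this sketch.

```python
def sas_do(start, stop, by=1):
    """Yield the values an iterative DO statement assigns to its index.

    The stop value is inclusive, and a negative BY counts down, so
    sas_do(5, 1, -1) gives 5, 4, 3, 2, 1.
    """
    if by == 0:
        raise ValueError("BY value must be nonzero")
    value = start
    while (value <= stop) if by > 0 else (value >= stop):
        yield value
        value += by

print(list(sas_do(5, 1, -1)))
print(list(sas_do(2, 12, 2)))
print(list(range(5, 1, -1)))   # range() stops before 1
```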

Specifying a Series of Items

You can also specify how many times a DO loop executes by listing items in a series. The items can be:

All numeric values
DO index-variable=2,5,9,13,27; more SAS statements END;

All character values, with each value enclosed in quotation marks


DO index-variable='MON','TUE','WED','THR','FRI'; more SAS statements END;

All variable names

DO index-variable=win,place,show; more SAS statements END;

Variable names must represent either all numeric or all character values. Do not enclose variable names in quotation marks.

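Nesting DO Loops

Iterative DO statements can be executed within a DO loop. Putting a DO loop within a DO loop is called nesting. In this example, 2000 is added to Capital at the start of each year, and the inner loop compounds the interest monthly at the annual rate of 7.5 percent divided by 12.

data work.earn;
   do year=1 to 20;
      Capital+2000;
      do month=1 to 12;
         Interest=capital*(.075/12);
         capital+interest;
      end;
   end;
run;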
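The nested loops can be checked against a closed-form result. This Python sketch (not SAS) assumes the 7.5 percent is an annual rate applied monthly as .075/12. Each yearly deposit then grows by the factor g = (1 + .075/12)^12 per year, so after 20 years Capital should equal 2000 times g(g^20 - 1)/(g - 1), which is what the test compares against.

```python
def nested_earnings(deposit=2000.0, annual_rate=0.075, years=20):
    """Outer loop: add the yearly deposit (Capital+2000).
    Inner loop: twelve monthly compounding steps at annual_rate/12."""
    capital = 0.0
    for year in range(1, years + 1):
        capital += deposit
        for month in range(1, 13):
            interest = capital * (annual_rate / 12)
            capital += interest
    return capital

print(round(nested_earnings(years=1), 2))
print(round(nested_earnings(), 2))
```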

Lesson 23
Improving Program Efficiency with Macro Variables

Introduction

SAS macro variables enable you to substitute text in your SAS programs. Macro variables can supply a variety of information, from operating system information to SAS session information to any text string you define. By substituting text into programs, SAS macro variables make your programs easy to update, as this program shows:

%let year=1999;
title "Temporary Employees for &year";
data hrd.newtemp;
   set hrd.temp;
   if year(enddate)=&year;
run;
proc print data=hrd.newtemp;
run;

Objectives

In this lesson, you learn to create macro variables with the %LET statement, reference automatic and user-defined macro variables, identify when macro variable references are resolved, display log messages stating how macro variable references resolve by using the SYMBOLGEN option, and combine macro variable references with prefixes and suffixes.

Understanding Macro Variables


This section introduces you to the function of SAS macro variables. After completing this section, you will be able to identify when to include a macro variable in a SAS program and identify the different types of macro variables.

When writing your SAS programs, you may find that you need to reference the same variable, data set, or text string several times in the same program.

title "Temporary Employees for 1999";
data hrd.temp1999;
   set hrd.temp;
   if year(enddate)=1999;
run;
proc print data=hrd.temp1999;
run;

SAS Macro Variables

SAS macro variables are part of the macro facility, which is a tool for extending and customizing SAS software and for reducing the amount of text you must enter to complete tasks. A macro variable is independent of a SAS data set and contains one value that remains constant until you change it. The value of a macro variable is a text string that becomes part of your program whenever the macro variable is referenced.

Types of Macro Variables

There are two types of macro variables: automatic macro variables and user-defined macro variables.

Automatic Macro Variables

Whenever you invoke the SAS System, automatic macro variables are created that provide such information as the date or time a SAS job or session began executing, the release of SAS software you are running, the name of the most recently created SAS data set, and the abbreviation for your host operating system.

Name       Information Supplied                          Example
SYSDATE9   date the job or session began executing       21APR2000
SYSDATE    date the job or session began executing       16FEB98
SYSDAY     weekday the job or session began executing    Tuesday
SYSTIME    time the job or session began executing       15:32
SYSSCP     operating system abbreviation                 CMS
SYSVER     SAS software version and/or release number    7.0
SYSLAST    name of most recently created data set        HRD.TEMP99

Notice that all automatic macro variables begin with the letters SYS. SAS software reserves the right to use the SYS prefix for automatic macro variables. It is recommended that you not begin the name of any user-defined macro variable with the letters SYS. For a complete list of automatic macro variables, refer to SAS Macro Language: Reference.

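The FOOTNOTE statement in this program references the automatic macro variables SYSDAY and SYSDATE. Macro variable references within quoted text are resolved only when the text is enclosed in double quotation marks, which is why the FOOTNOTE uses double quotes:

title 'Temporary Employees Hired in November';
footnote "Report Run on &sysday, &sysdate";
data hrd.tempnov;
   set hrd.temp;
   if month(begindate)=11;
run;
proc print data=hrd.tempnov;
run;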

For example, this program uses the SYSDAY macro variable to execute a section of the program only on a specified day of the week.

data hrd.temppay(drop=day);
   set hrd.newtemp(keep=name payrate hours1 hours2);
   total=hours1+hours2;
   Day="&sysday";
   if day='Friday' then do;
      gross=input(payrate,2.)*total;
      tempfee=gross*.05;
   end;
run;
proc print data=hrd.temppay;
run;

Creating Your Own Macro Variables

In this section, you learn to create a macro variable by using the %LET statement, display log messages stating how macro variable references resolve by using the SYMBOLGEN option, combine macro variable references with prefixes and suffixes, and create a macro variable during DATA step execution by using the CALL SYMPUT routine.

User-Defined Macro Variables

General form, %LET statement: %LET name=value; where name specifies the name of your macro variable and value is the value of the macro variable.

Do not begin the macro variable name with the letters SYS. The use of this prefix is reserved to SAS software for automatic macro variable names. The name that you define must follow the rules for SAS names. Everything appearing between the equal sign and the semicolon is considered part of the macro variable value, except that leading and trailing blanks are removed.


%let region=northwest; %let level=768;


Because a macro variable value is always text, this %LET statement assigns the text 700+700*.05 (not the number 735) to the macro variable rate:

%let rate=700+700*.05;

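In this program, the %LET statement creates the macro variable year, and each &year reference resolves to 1999:

%let year=1999;
title "Temporary Employees for &year";
data hrd.newtemp;
   set hrd.temp;
   if year(enddate)=&year;
run;
proc print data=hrd.newtemp;
run;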

SYMBOLGEN Option

General form, OPTIONS statement with SYMBOLGEN option: OPTIONS NOSYMBOLGEN | SYMBOLGEN; where NOSYMBOLGEN specifies that log messages will not be displayed, and SYMBOLGEN specifies that log messages will be displayed. The default is NOSYMBOLGEN.

options symbolgen;
%let year=1999;
title "Temporary Employees for &year";
data hrd.newtemp;
   set hrd.temp;
   if year(enddate)=&year;
run;
proc print data=hrd.newtemp;
run;

SAS Log

36   options symbolgen;
37   %let year=1999;
38   data hrd.newtemp;
39      set hrd.temp;
40      if year(enddate)=&year;
SYMBOLGEN:  Macro variable YEAR resolves to 1999
NOTE: The data set HRD.NEWTEMP has 8 observations and 18 variables.
NOTE: The DATA statement used 0.71 seconds.
41   proc print data=hrd.newtemp;
SYMBOLGEN:  Macro variable YEAR resolves to 1999
42   title "Temporary Employees for &year";
43   run;

Using Suffixes That Begin with a Period


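If your macro variable reference must appear before a suffix that begins with a period, you must specify two periods between the macro variable reference and the suffix. The first period ends the macro variable reference. The second period is the period that separates the first and second level names of your data set. A single period after a macro variable reference is only a delimiter and does not appear in the resolved text; for example, when period has the value end, &period.date resolves to enddate.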

%let yr=1999;
%let period=end;
%let libref=hrd;
title "Temporary Employees for &yr";
data &libref..temp&yr;
   set &libref..temp;
   if year(&period.date)=&yr;
run;
proc print data=&libref..temp&yr;
run;

When using two periods, the macro variable references in your program resolve correctly. For example, the DATA statement data &libref..temp&yr;

is processed by SAS software as data hrd.temp1999;
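The period rule can be modeled with a short Python sketch. This is a deliberately simplified model of macro resolution, not the macro processor: it handles one ampersand followed by a SAS-style name and an optional period, which ends the reference and is dropped. It ignores &&, nested macro calls, and quoting functions. Unknown names are left as written; SAS would also write a warning to the log in that case.

```python
import re

# One ampersand, a SAS-style name, then an optional period that ends the
# reference and is consumed. Simplified: no &&, no macro calls, no quoting.
REF = re.compile(r"&([A-Za-z_][A-Za-z0-9_]*)\.?")

def resolve(text, symbols):
    def lookup(match):
        name = match.group(1).lower()          # macro variable names are case-insensitive
        # Unknown names are left unchanged in this sketch.
        return symbols.get(name, match.group(0))
    return REF.sub(lookup, text)

symbols = {"yr": "1999", "period": "end", "libref": "hrd"}
print(resolve("data &libref..temp&yr;", symbols))
print(resolve("if year(&period.date)=&yr;", symbols))
print(resolve("data &libref.temp&yr;", symbols))   # only one period
```

The last call shows why two periods are needed: with a single period, the only period is consumed as the delimiter and the libref and member name run together.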

The CALL SYMPUT Routine


You can use the CALL SYMPUT routine to create a macro variable whose value is assigned during the execution of the DATA step. To learn how to use this routine, let's look more closely at a DATA step that creates the new data set Hrd.Overtime. It uses the subsetting IF statement to include only the names of employees who worked overtime hours, and it also calculates the total number of overtime hours.

General form, CALL statement for CALL SYMPUT routine: CALL SYMPUT(name,value); where CALL is the keyword; SYMPUT invokes the SYMPUT routine; name is the name of the macro variable to be defined, which can be a character string enclosed in quotation marks, a character variable, or a character expression; and value is the value to be assigned to the macro variable, which can be a text string enclosed in quotation marks, a data set variable, or a DATA step expression.

Lesson 24
MACRO LANGUAGE

The macro facility is a tool for extending and customizing the SAS System. It allows you to abbreviate a large amount of text conveniently and to make text substitutions easily. It contains a programming language to enable you to execute parts of a SAS program (even entire steps) conditionally; data-entry features that accept input from a user; and a means of communicating information between steps of a SAS job.

%SYSPROD %SYSRC SYSRC %SYSRPUT SYSSCPL SYSSCP and SYSSCPL SYSSITE SYSTIME %TRIM and %QTRIM %UNQUOTE %UPCASE and %QUPCASE SYSVER SYSVLONG and SYSVLONG4 %VERIFY %WINDOW

%IF-%THEN/%ELSE IMPLMAC %INDEX %INPUT INTO %label %LEFT and %QLEFT %LENGTH %LET %LOCAL %LOWCASE and %QLOWCASE %MACRO MACRO MAUTOSOURCE %MEND MERROR MFILE

MLOGIC MPRINT MRECALL MSTORED MSYMTABMAX= MVARSIZE= %NRBQUOTE %NRQUOTE %NRSTR %PUT %QCMPRES %QLEFT %QLOWCASE %QSCAN %QSUBSTR %QSYSFUNC %QTRIM

%QUOTE and %NRQUOTE %QUPCASE RESOLVE SASAUTOS= SYMBOLGEN SYMGET SYMGETN SYMPUT SYMPUTN SYSBUFFR %SYSCALL SYSCC

SYSERR %SYSEVALF %SYSEXEC SYSFILRC %SYSFUNC and %QSYSFUNC SYSMENV SYSMSG SYSPARM= SYSPARM SYSPBUFF

SYSCHARWIDTH SYSCMD SYSDATE SYSDATE9 SYSDAY SYSDEVIC SYSDMG SYSDSN SYSENV

Example macro program

%macro isname(name);
   %let name=%upcase(&name);
   %if %length(&name)>8 %then
      %put &name: The fileref must be 8 characters or less.;
   %else %do;
      %let first=ABCDEFGHIJKLMNOPQRSTUVWXYZ_;
      %let all=&first.1234567890;
      %let chk_1st=%verify(%substr(&name,1,1),&first);
      %let chk_rest=%verify(&name,&all);
      %if &chk_rest>0 %then
         %put &name: The fileref cannot contain "%substr(&name,&chk_rest,1)".;
      %if &chk_1st>0 %then
         %put &name: The first character cannot be "%substr(&name,1,1)".;
      %if (&chk_1st or &chk_rest)=0 %then
         %put &name is a valid fileref.;
   %end;
%mend isname;

%isname(file1)
%isname(1file)
%isname(filename1)
%isname(file$)


Executing this program writes these lines to the SAS log:

FILE1 is a valid fileref.
1FILE: The first character cannot be "1".
FILENAME1: The fileref must be 8 characters or less.
FILE$: The fileref cannot contain "$".
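To make the logic of the ISNAME macro easier to follow, here is a Python port of its checks. It is a sketch, not SAS: the helper `verify` imitates %VERIFY by returning the 1-based position of the first character that is not in the allowed set, or 0 when every character is allowed. `isname` returns the messages the macro would write with %PUT, so the four calls reproduce the log lines shown above.

```python
import string

FIRST = string.ascii_uppercase + "_"
ALL = FIRST + "1234567890"

def verify(source, allowed):
    """Like %VERIFY: 1-based position of the first character of source
    that is not in allowed, or 0 if every character is allowed."""
    for position, ch in enumerate(source, start=1):
        if ch not in allowed:
            return position
    return 0

def isname(name):
    """Return the messages the ISNAME macro would %PUT for one name."""
    name = name.upper()
    if len(name) > 8:
        return [f"{name}: The fileref must be 8 characters or less."]
    messages = []
    chk_1st = verify(name[:1], FIRST)
    chk_rest = verify(name, ALL)
    if chk_rest > 0:
        messages.append(f'{name}: The fileref cannot contain "{name[chk_rest - 1]}".')
    if chk_1st > 0:
        messages.append(f'{name}: The first character cannot be "{name[0]}".')
    if chk_1st == 0 and chk_rest == 0:
        messages.append(f"{name} is a valid fileref.")
    return messages

for candidate in ["file1", "1file", "filename1", "file$"]:
    print(*isname(candidate), sep="\n")
```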

%IF-%THEN/%ELSE

%macro settax(taxrate);
   %let taxrate = %upcase(&taxrate);
   %if &taxrate = CHANGE %then %do;
      data thisyear;
         set lastyear;
         if sale > 100 then tax = .05;
         else tax = .08;
      run;
   %end;
   %else %if &taxrate = SAME %then %do;
      data thisyear;
         set lastyear;
         tax = .03;
      run;
   %end;
%mend settax;

%LENGTH

%let a=Happy;
%let b=Birthday;
%put The length of &a is %length(&a).;
%put The length of &b is %length(&b).;
%put The length of &a &b To You is %length(&a &b To You).;

Executing these statements writes these lines to the SAS log:

The length of Happy is 5.
The length of Birthday is 8.
The length of Happy Birthday To You is 21.

Creating Global Variables in a Macro Definition

%macro vars(first=1,last=);
   %global gfirst glast;
   %let gfirst=&first;
   %let glast=&last;
   var test&first-test&last;
%mend vars;

When you submit the following program, the macro VARS generates the VAR statement and the values for the macro variables used in the TITLE statement.

proc print;
   %vars(last=50)
   title "Analysis of Tests &gfirst-&glast";
run;


The SAS System sees the following:
PROC PRINT; VAR TEST1-TEST50; TITLE "Analysis of Tests 1-50"; RUN;


%label

%macro info(type);
   %if %upcase(&type)=SHORT %then %goto quick; /* No % here */
   proc contents;
   run;
   proc freq;
      tables _numeric_;
   run;
   %quick: proc print data=_last_(obs=10); /* Use % here */
   run;
%mend info;

%info(short)

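Global Macro Variables

Because county is created with %LET outside any macro definition, it is a global macro variable, and the macro CONCAT can reference it:

%let county=Clark;
%macro concat;
   data _null_;
      longname="&county"||" County";
      put longname;
   run;
%mend concat;

%concat

Calling the macro CONCAT produces the following statements:

data _null_;
   longname="Clark"||" County";
   put longname;
run;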

Lesson :25

SAS/STAT
INTRODUCTION

Sometimes you need quick answers to questions about your data. You may want to query your data to examine relationships between data values, view a subset of your data, or compute values quickly.

PROC ANOVA

PROC ANOVA < options > ;
   CLASS variables ;
   MODEL dependents=effects < / options > ;
   ABSORB variables ;
   BY variables ;
   FREQ variable ;
   MANOVA < test-options >< / detail-options > ;
   MEANS effects < / options > ;
   REPEATED factor-specification < / options > ;
   TEST < H=effects > E=effect ;

PROC GLM
To use PROC GLM, the PROC GLM and MODEL statements are required. You can specify only one MODEL statement (in contrast to the REG procedure, for example, which allows several MODEL statements in the same PROC REG run). If your model contains classification effects, the classification variables must be listed in a CLASS statement, and the CLASS statement must appear before the MODEL statement. In addition, if you use a CONTRAST statement in combination with a MANOVA, RANDOM, REPEATED, or TEST statement, the CONTRAST statement must be entered first in order for the contrast to be included in the MANOVA, RANDOM, REPEATED, or TEST analysis.

PROC GLM < options > ;
   CLASS variables ;
   MODEL dependents=independents < / options > ;
   ABSORB variables ;
   BY variables ;
   FREQ variable ;
   ID variables ;
   WEIGHT variable ;
   CONTRAST 'label' effect values < ... effect values > < / options > ;
   ESTIMATE 'label' effect values < ... effect values > < / options > ;
   LSMEANS effects < / options > ;
   MANOVA < test-options >< / detail-options > ;
   MEANS effects < / options > ;
   OUTPUT < OUT=SAS-data-set > keyword=names < ... keyword=names > < / option > ;
   RANDOM effects < / options > ;
   REPEATED factor-specification < / options > ;
   TEST < H=effects > E=effect < / options > ;

PROC REG
In the syntax list that follows, brackets denote optional specifications, and vertical bars denote a choice of one of the specifications separated by the vertical bars. In all cases, label is optional. The PROC REG statement is required. To fit a model to the data, you must specify the MODEL statement. If you want to use only the options available in the PROC REG statement, you do not need a MODEL statement, but you must use a VAR statement. Several MODEL statements can be used. In addition, several MTEST, OUTPUT, PAINT, PLOT, PRINT, RESTRICT, and TEST statements can follow each MODEL statement. The BY, FREQ, ID, VAR, and WEIGHT statements are optionally specified once for the entire PROC step, and they must appear before the first RUN statement.

PROC REG < options > ;
   < label: > MODEL dependents=<regressors> < / options > ;
   BY variables ;
   FREQ variable ;
   ID variables ;
   VAR variables ;
   WEIGHT variable ;
   ADD variables ;
   DELETE variables ;
   < label: > MTEST <equation, ... ,equation> < / options > ;
   OUTPUT < OUT=SAS-data-set > keyword=names < ... keyword=names > ;
   PAINT <condition | ALLOBS> < / options > | < STATUS | UNDO> ;
   PLOT <yvariable*xvariable> <=symbol> < ...yvariable*xvariable> <=symbol> < / options > ;
   PRINT < options > < ANOVA > < MODELDATA > ;
   REFIT ;
   RESTRICT equation, ... ,equation ;
   REWEIGHT <condition | ALLOBS> < / options > | < STATUS | UNDO> ;
   < label: > TEST equation <, ... ,equation> < / option > ;

Transforming Data with SAS Functions

Objectives

In this lesson, you learn to convert character data to numeric data, convert numeric data to character data, create SAS date values, extract the month and year from a SAS date value, and extract, edit, and search character variable values.

Uses of SAS Functions

You can use SAS functions to calculate sample statistics, create SAS date values, round values, generate random numbers, extract a portion of a character value, and convert data from one data type to another.

General form, SAS function


function-name(argument-1<,argument-n>);
where argument can be
variables: mean(x,y,z)
constants: mean(456,502,612,498)
expressions: mean(37*2,192/5,mean(22,34,56))

Even if the function does not require arguments, the function name must still be followed by parentheses, for example: function-name().

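Automatic Character-to-Numeric Conversion

When a character variable, such as PayRate in this step, is used where SAS expects a numeric value, SAS converts the value automatically and writes a note to the log:

data hrd.newtemp;
   set hrd.temp;
   Salary=payrate*hours;
run;

SAS Log
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).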

When Automatic Conversion Occurs

Automatic character-to-numeric conversion occurs when a character value is:
assigned to a previously defined numeric variable, such as the numeric variable Rate: Rate=payrate;
used in an arithmetic operation: Salary=payrate*hours;
compared to a numeric value with a comparison operator: if payrate>=rate;
specified in a function that requires numeric arguments: NewRate=sum(payrate,raise);

Explicit Character-to-Numeric Conversion


General form, INPUT function: INPUT(source,informat) where source indicates the character variable, constant, or expression to be converted to a numeric value, and informat is the numeric informat that must also be specified, as in this example: input(payrate,2.)

Explicit Numeric-to-Character Conversion


General form, PUT function: PUT(source,format) where source indicates the numeric variable, constant, or expression to be converted to a character value, and format is a format matching the data type of the source, which must also be specified, as in this example: put(site,2.)

MDY Function
General form MDY function: MDY(month,day,year) where month can be a variable that represents the month or a number from 1-12 day can be a variable that represents the day or a number from 1-31 year can be a variable that represents the year or a number with 2 or 4 digits.

TODAY Function
General form, TODAY function: TODAY()
data hrd.newtemp;
   set hrd.temp;
   EditDate=today();
run;
proc print data=hrd.newtemp;
   format editdate date9.;
run;

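EndDate   EditDate
14621     15JAN2000
14565     15JAN2000
14608     15JAN2000

Because no format is assigned to EndDate, its values are displayed as SAS date values, that is, as the number of days since January 1, 1960.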
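SAS date values are day counts from January 1, 1960, which this Python sketch makes concrete. The helpers `mdy` and `date9` are hypothetical stand-ins for the MDY function and the DATE9. format, and they accept only four-digit years; two-digit years are not handled here. The month abbreviations are spelled out instead of using `strftime` so that the output does not depend on the locale.

```python
from datetime import date, timedelta

SAS_EPOCH = date(1960, 1, 1)      # SAS date value 0
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")

def mdy(month, day, year):
    """Like MDY(): days between January 1, 1960 and the given date."""
    return (date(year, month, day) - SAS_EPOCH).days

def date9(sas_date):
    """Render a SAS date value the way the DATE9. format does, e.g. 15JAN2000."""
    d = SAS_EPOCH + timedelta(days=sas_date)
    return f"{d.day:02d}{MONTHS[d.month - 1]}{d.year}"

print(mdy(1, 15, 2000))                            # the EditDate shown above
print([date9(v) for v in (14621, 14565, 14608)])   # the EndDate values above
```

Dates before 1960 simply have negative values; for example, December 31, 1959 is -1.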

Character Functions

Function   Purpose
SCAN       returns a specified word from a character value
SUBSTR     extracts a substring or replaces character values
TRIM       trims trailing blanks from character values
INDEX      searches a character value for a specific string
UPCASE     converts all letters in a value to uppercase
LOWCASE    converts all letters in a value to lowercase
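The behavior of SCAN and INDEX is easy to misremember, so here is a Python sketch of both. These are illustrations, not the SAS functions. The sketch `scan` takes the n-th word from the left for positive n and from the right for negative n, treats a run of delimiters as a single break, and returns an empty string when n is out of range. Its delimiter set must be passed explicitly because SAS's default delimiter list is not reproduced here. The sketch `index` returns a 1-based position, or 0 when the string is not found.

```python
import re

def scan(text, n, delimiters=" "):
    """Sketch of SCAN(text, n, delimiters): the n-th word, counting from the
    left for n > 0 and from the right for n < 0. Runs of delimiters count
    as one break, and an out-of-range n gives an empty string."""
    if not delimiters:
        raise ValueError("delimiters must not be empty")
    words = [w for w in re.split("[" + re.escape(delimiters) + "]", text) if w]
    if 0 < n <= len(words):
        return words[n - 1]
    if 0 < -n <= len(words):
        return words[n]
    return ""

def index(text, excerpt):
    """Sketch of INDEX(text, excerpt): 1-based position of the first match, 0 if none."""
    return text.find(excerpt) + 1

name = "Smith,  John Allen"
print(scan(name, 2, ", "), scan(name, -1, ", "), index(name, "John"))
```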

Performing Queries Using SQL

Objectives
In this lesson, you learn to invoke the SQL procedure, select columns, define new columns, specify the table(s) to be read, specify subsetting criteria, order rows by values of one or more columns, and end the SQL procedure.

You can use PROC SQL to retrieve and manipulate SAS tables; add or modify data values in a table; add, modify, or drop columns in a table; create tables; and generate reports.

Writing a PROC SQL Step

PROC SQL;
SELECT column-1<,...column-n>
FROM table-1 | view-1<,...table-n | view-n>
<WHERE expression>
<ORDER BY column-1<,...column-n>>;

where PROC SQL invokes the SQL procedure, SELECT specifies the columns to be selected, FROM specifies the table to be queried, WHERE subsets the data based on a condition, and ORDER BY sorts rows by the values of specific columns. Unlike the order of statements in other procedures, the order of clauses within a SELECT statement does matter. To end the SQL procedure, submit a QUIT statement.

proc sql;
   select id,lastname,netpay,grosspay,
          grosspay*.06 as bonus
      from emplib.payroll
      where netpay>25000
      order by lastname;
quit;

ID     LastName     NetPay       GrossPay     bonus
1002   BOWMAN       $29,048.50   $42,120.33   2527.22
1007   BROWN        $37,049.40   $53,927.72   3235.663
1049   FERNANDEZ    $25,169.63   $35,956.61   2157.397
1006   GARRETT      $34,013.88   $47,241.50   2834.49
1077   GIBSON       $41,553.94   $61,108.73   3666.524
1008   HERNAND      $54,189.70   $78,575.07   4714.504
1009   JONES        $44,128.90   $63,986.91   3839.215
1005   KNAPP        $33,122.70   $48,027.99   2881.679
1012   QUINTERO     $51,888.53   $79,828.51   4789.711
1015   SCHOLL       $27,640.80   $40,079.23   2404.754
1010   SMITH        $37,331.48   $54,899.24   3293.954
1011   VAN HOTTEN   $29,053.05   $43,688.80   2621.328
1017   WAGGONNER    $26,484.02   $38,550.25   2313.015
1001   WATERHOUSE   $32,140.60   $46,603.94   2796.236

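Selecting columns and defining new columns
In the SELECT clause, list the columns to display, separated by commas. To define a new column, write an expression followed by the keyword AS and a name for the column. In the query above, grosspay*.06 as bonus creates the column bonus:
select id,lastname,netpay,grosspay, grosspay*.06 as bonus

Specifying the Table
The FROM clause names the table to be queried, here emplib.payroll:
from emplib.payroll

Ordering Rows
The ORDER BY clause sorts the rows of the result, here by the values of lastname. Rows are sorted in ascending order unless you add the keyword DESC after a column name:
order by lastname;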

Ordering by multiple columns


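To sort by more than one column, list the columns in the ORDER BY clause, separated by commas. Rows are sorted by the first column and then, within each value of the first column, by the next one.
proc sql;
   select actlevel, sex, age,
          height*2.54 as CentHgt,
          weight/2.2 as KgWgt
      from clinic.admit
      where actlevel='LOW'
      order by sex, age;
quit;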
ActLevel  Sex  Age  CentHgt  KgWgt
LOW       F    22   160.02   63.18182
LOW       F    28   157.48   53.63636
LOW       F    31   154.94   55.90909
LOW       F    49   162.56   78.18182
LOW       M    34   185.42   70
LOW       M    51   180.34   71.81818
LOW       M    60   180.34   86.81818
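The sketch below orders by the same two columns but sorts age in descending order within each value of sex. The stand-in table work.admit is hypothetical: its Height (inches) and Weight (pounds) values are back-calculated from the CentHgt and KgWgt values in the listing above, and the other activity levels in clinic.admit are not reproduced.

```sas
/* Hypothetical stand-in for clinic.admit; Height and Weight back-calculated from the listing */
data work.admit;
   input ActLevel $ Sex $ Age Height Weight;
   datalines;
LOW F 22 63 139
LOW F 28 62 118
LOW F 31 61 123
LOW F 49 64 172
LOW M 34 73 154
LOW M 51 71 158
LOW M 60 71 191
;
run;

proc sql;
   create table work.bysexage as
      select actlevel, sex, age,
             height*2.54 as CentHgt,
             weight/2.2 as KgWgt
         from work.admit
         where actlevel='LOW'
         order by sex, age desc;   /* DESC applies only to age */
quit;
```

Because DESC follows only age, sex is still sorted in ascending order: the four F rows come first (ages 49, 31, 28, 22), followed by the M rows (60, 51, 34).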

Querying multiple tables


proc sql;
   select bldginfo.id, lastname, building, room, extension
      from emplib.payroll, emplib.bldginfo
      where bldginfo.id=payroll.id
      order by lastname;
quit;

Specifying Columns That Appear in Multiple Tables


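If a column such as ID appears in more than one of the tables, qualify it with the table name (for example, bldginfo.id) wherever you refer to it; otherwise PROC SQL reports an ambiguous column reference.

Specifying Multiple Table Names
List the tables in the FROM clause, separated by commas (from emplib.payroll, emplib.bldginfo). The WHERE clause bldginfo.id=payroll.id then keeps only the rows in which the ID values match, which joins the two tables.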
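A self-contained sketch of the same join. The WORK tables are hypothetical stand-ins for emplib.payroll and emplib.bldginfo; the IDs and last names come from the payroll listing in this section, but the Building, Room, and Extension values are made up.

```sas
/* Hypothetical stand-ins; building, room, and extension values are made up */
data work.payroll;
   length ID $ 4 LastName $ 12;
   input ID $ LastName $;
   datalines;
1002 BOWMAN
1007 BROWN
1049 FERNANDEZ
;
run;

data work.bldginfo;
   length ID $ 4 Building $ 8;
   input ID $ Building $ Room Extension;
   datalines;
1007 NORTH 210 4411
1002 SOUTH 105 4012
;
run;

proc sql;
   create table work.phonelist as
      select bldginfo.id, lastname, building, room, extension
         from work.payroll, work.bldginfo
         where bldginfo.id=payroll.id   /* ID is in both tables, so it is qualified */
         order by lastname;
quit;
```

Only IDs present in both tables appear in work.phonelist: BOWMAN and BROWN. FERNANDEZ (ID 1049) has no matching row in work.bldginfo, so the WHERE condition excludes it.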

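Clauses
proc sql;
   select custname as name, count(*)
      from sql.customer
      group by name
      having count(*)=1;
quit;
- GROUP BY groups the observations (rows) that have the same value of the listed column(s).
- HAVING specifies conditions that each group must meet.
- ORDER BY orders the rows of the result.
When they are used, the clauses must appear in this order: SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY.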
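The sketch below runs the same GROUP BY and HAVING query against a hypothetical work.customer table (the customer names are made up) and stores the result so that the groups can be inspected. count(*) is given the alias NumRows here only to name the column.

```sas
/* Hypothetical stand-in for sql.customer; the names are made up */
data work.customer;
   length CustName $ 20;
   input CustName $;
   datalines;
Acme
Acme
Brightway
Corvid
Corvid
;
run;

proc sql;
   create table work.onetime as
      select custname as name, count(*) as NumRows
         from work.customer
         group by name          /* one group per distinct name */
         having count(*)=1      /* keep only groups with exactly one row */
         order by name;
quit;
```

Acme and Corvid each appear twice, so only Brightway, with exactly one row, satisfies HAVING count(*)=1.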

Processing variables with Arrays


Using an array and a DO loop, the program below eliminates the need for 365 separate programming statements to convert the daily temperatures from Fahrenheit to Celsius for the year.
data work.report;
   set master.temps;
   array daytemp(365) day1-day365;
   do i=1 to 365;
      daytemp(i)=5*(daytemp(i)-32)/9;
   end;
run;
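To check the conversion formula on a small scale, the sketch below uses a hypothetical three-day data set in place of master.temps (the readings are made up). The freezing point, 32 °F, should become 0 °C and the boiling point, 212 °F, should become 100 °C.

```sas
/* Hypothetical three-day stand-in for master.temps */
data work.temps;
   input day1-day3;
   datalines;
32 212 50
;
run;

data work.celsius(drop=i);
   set work.temps;
   array daytemp(3) day1-day3;
   do i=1 to 3;
      daytemp(i)=5*(daytemp(i)-32)/9;   /* Fahrenheit to Celsius */
   end;
run;
```

The third reading, 50 °F, converts to 10 °C.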

Objectives
In this section, you learn to
1. group variables into one- and two-dimensional arrays
2. perform an action on array elements
3. create new variables using an ARRAY statement
4. assign initial values to array elements
5. create temporary array elements using an ARRAY statement.

Understanding SAS arrays


A SAS array is a temporary grouping of SAS variables under a single name. An array exists only for the duration of the DATA step.

Defining an array
ARRAY statement:
ARRAY array-name{dimension} <elements>;
where
   array-name   specifies the name of the array
   dimension    describes the number and arrangement of array elements
   elements     lists the variables to include in the array.

* Do not give an array the same name as a variable in the same DATA step. Also, avoid using the name of a SAS function: the array will be correct, but you won't be able to use that function in the same DATA step, and a warning message will be written to the SAS log.

Description of Finance.Sales91:
Variable   Type   Length
SalesRep   char   8
Qtr1       num    8
Qtr2       num    8
Qtr3       num    8
Qtr4       num    8
To group the variables in the array, first give the array a name. In this example, name the array sales.

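Each of the following statements defines the same array:
array sales{4} qtr1 qtr2 qtr3 qtr4;
array sales{*} qtr1 qtr2 qtr3 qtr4;
array sales(4) qtr1 qtr2 qtr3 qtr4;
array sales[4] qtr1 qtr2 qtr3 qtr4;
array sales{4} qtr1-qtr4;
The dimension can be enclosed in braces, parentheses, or brackets. An asterisk ({*}) tells SAS to determine the dimension by counting the elements, and a numbered range such as qtr1-qtr4 lists consecutive variables.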
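Once the array is defined, sales{i} refers to Qtr1 when i is 1, Qtr2 when i is 2, and so on. The sketch below totals the four quarters that way; the one-row data set work.sales91, the rep name, and the figures are hypothetical stand-ins for finance.sales91.

```sas
/* Hypothetical one-row stand-in for finance.sales91; values are made up */
data work.sales91;
   input SalesRep $ Qtr1-Qtr4;
   datalines;
Hollis 8400 8800 9200 9800
;
run;

data work.yeartotal(drop=i);
   set work.sales91;
   array sales{4} qtr1-qtr4;   /* sales{1} is Qtr1, ..., sales{4} is Qtr4 */
   Total=0;
   do i=1 to 4;
      Total=Total+sales{i};
   end;
run;
```

Total is 36200 for the sample row.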

Expanding use of an array

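data finance.report(drop=i);
   set finance.qsales;
   array sale{4} sales1-sales4;
   array Goal{4} (9000 9300 9600 9900);
   array Achieved{4};
   do i=1 to 4;
      achieved(i)=100*sale(i)/goal(i);
   end;
run;
The DROP= data set option keeps the index variable i out of the output data set. The second and third ARRAY statements create new variables: Goal1-Goal4, which receive the initial values listed in parentheses, and Achieved1-Achieved4.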
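A runnable version of this step, with a hypothetical one-row work.qsales in place of finance.qsales (the rep name and sales figures are made up). Each Achieved value is the percentage of the corresponding goal that was reached.

```sas
/* Hypothetical one-row stand-in for finance.qsales */
data work.qsales;
   input SalesRep $ Sales1-Sales4;
   datalines;
Hollis 9000 4650 9600 19800
;
run;

data work.report(drop=i);
   set work.qsales;
   array sale{4} sales1-sales4;
   array Goal{4} (9000 9300 9600 9900);   /* creates Goal1-Goal4 with these initial values */
   array Achieved{4};                     /* creates Achieved1-Achieved4 */
   do i=1 to 4;
      Achieved(i)=100*sale(i)/Goal(i);
   end;
run;
```

For the sample row, Achieved1-Achieved4 are 100, 50, 100, and 200, and the output data set also contains Goal1-Goal4.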

Referencing Elements of an Array


ARRAY reference:
array-name(index value)
where index value
   - is enclosed in parentheses
   - specifies a variable, SAS expression, or integer
   - is within the lower and upper bounds of the dimension of the array.

array qtr(4) jan apr jul oct;
do i=1 to 4;
   YearGoal=qtr(i)*1.2;
end;


Arrays are typically used with DO loops to process multiple variables and perform repetitive calculations.
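In the fragment above, YearGoal is overwritten on each pass through the loop, so only the value computed from oct remains when the loop ends. If one goal per quarter is wanted, a second array can hold the results. The sketch below assumes hypothetical quarterly figures for jan, apr, jul, and oct; the data set name work.quarterly is also hypothetical.

```sas
data work.quarterly(drop=i);
   input Jan Apr Jul Oct;            /* hypothetical quarterly figures */
   array qtr(4) jan apr jul oct;
   array YearGoal(4);                /* YearGoal1-YearGoal4 keep one result per quarter */
   do i=1 to 4;
      YearGoal(i)=qtr(i)*1.2;
   end;
   datalines;
100 200 300 400
;
run;
```

YearGoal1-YearGoal4 hold approximately 120, 240, 360, and 480.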

Creating a One-Dimensional Array


SAS Data Set Hrd.Fitclass
Name     Weight1  Weight2  Weight3  Weight4  Weight5  Weight6
Alicia   69.6     68.9     68.8     67.4     66.0     66.2
Betsy    52.6     52.6     51.7     50.4     49.8     49.1
Brenda   68.6     67.6     67.0     66.4     65.8     65.2

data hrd.convert;
   set hrd.fitclass;
   array wt(6) weight1-weight6;
   do i=1 to 6;
      wt(i)=wt(i)*2.2046;
   end;
run;

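Using the DIM Function in an Iterative DO Statement
DIM function:
DIM(array-name)
where array-name specifies the array. For the wt array in the example above, dim(wt) returns a value of 6, so it can serve as the upper bound of the DO loop in place of a typed-in number of elements.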

data hrd.convert;
   set hrd.fitclass;
   array wt(6) weight1-weight6;
   do i=1 to dim(wt);
      wt(i)=wt(i)*2.2046;
   end;
run;
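A self-contained version of the DIM example, with the Hrd.Fitclass values from this section typed into a hypothetical work.fitclass. The factor 2.2046 converts kilograms to pounds.

```sas
/* Hypothetical stand-in for hrd.fitclass, using the values listed in this section */
data work.fitclass;
   input Name $ Weight1-Weight6;
   datalines;
Alicia 69.6 68.9 68.8 67.4 66.0 66.2
Betsy 52.6 52.6 51.7 50.4 49.8 49.1
Brenda 68.6 67.6 67.0 66.4 65.8 65.2
;
run;

data work.convert(drop=i);
   set work.fitclass;
   array wt(6) weight1-weight6;
   do i=1 to dim(wt);          /* DIM(wt) returns 6 */
      wt(i)=wt(i)*2.2046;      /* kilograms to pounds */
   end;
run;
```

Alicia's Weight1, for example, becomes 69.6*2.2046 = 153.44016.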

Creating Variables with the ARRAY Statement
When creating variables with an ARRAY statement, you do not need to specify array elements. Because you are not referencing existing variables, SAS software automatically creates the variables for you. Continuing the example above, the second ARRAY statement below creates five new variables:
data hrd.diff;
   set hrd.convert;
   array wt(6) weight1-weight6;
   array WgtDiff(5);
run;

Default variable names
When no elements are listed, the statement
array WgtDiff(5);
creates the variables
WgtDiff1 WgtDiff2 WgtDiff3 WgtDiff4 WgtDiff5
The default variable names are created by concatenating the array name and the numbers 1, 2, 3, and so on, up to the array dimension.
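The lesson's fragment creates WgtDiff1-WgtDiff5 but stops before filling them. One natural use, assumed here rather than taken from the lesson, is to store the change between consecutive weigh-ins: six weights give five differences. The data set work.convert below is a hypothetical stand-in holding the unconverted Hrd.Fitclass values.

```sas
/* Hypothetical stand-in for hrd.convert (unconverted Hrd.Fitclass weights) */
data work.convert;
   input Name $ Weight1-Weight6;
   datalines;
Alicia 69.6 68.9 68.8 67.4 66.0 66.2
Betsy 52.6 52.6 51.7 50.4 49.8 49.1
Brenda 68.6 67.6 67.0 66.4 65.8 65.2
;
run;

data work.diff(drop=i);
   set work.convert;
   array wt(6) weight1-weight6;
   array WgtDiff(5);               /* creates WgtDiff1-WgtDiff5 */
   do i=1 to 5;
      WgtDiff(i)=wt(i+1)-wt(i);    /* change from one weigh-in to the next */
   end;
run;
```

Betsy's WgtDiff1 is 0 because her first two weights are both 52.6; Alicia's is about -0.7.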

Arrays of Character Variables
To create an array of character variables, add a dollar sign ($) after the array dimension:
array firstname(5) $;
The new variables firstname1-firstname5 have the default length of 8. To specify a different length, put it after the dollar sign:
array firstname(5) $ 24;
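The sketch below creates one character array with an explicit length and one with the default, then uses the VLENGTH function to report the declared length of a variable from each. The data set name work.names and the values are hypothetical.

```sas
data work.names(drop=i);
   array firstname(3) $ 24;      /* firstname1-firstname3, each $24 */
   array initials(3) $;          /* initials1-initials3, default length 8 */
   firstname(1)='Alicia';
   firstname(2)='Betsy';
   firstname(3)='Brenda';
   do i=1 to 3;
      initials(i)=substr(firstname(i),1,1);   /* first letter of each name */
   end;
   L24=vlength(firstname1);      /* declared length: 24 */
   L8=vlength(initials1);        /* declared length: 8 */
run;
```

L24 is 24 and L8 is 8; initials1-initials3 hold A, B, and B.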

Assigning Initial Values to Arrays
array goal{4} g1 g2 g3 g4 (9000 9300 9600 9900);
When you assign initial values:
- place the values after the array elements
- specify one value for each corresponding array element
- separate the values with commas or blanks
- enclose the values in parentheses
- enclose each character value in quotation marks, as in
array colors{3} color1-color3 ('red','white','blue');
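A short sketch that assigns both kinds of initial values and uses them in a calculation. Here the $ is written explicitly after the dimension of colors to mark the new variables as character; the data set name work.goals is hypothetical. The OF array-name{*} form passes every element of the array to the SUM function.

```sas
data work.goals;
   array goal{4} g1 g2 g3 g4 (9000 9300 9600 9900);
   array colors{3} $ color1-color3 ('red','white','blue');   /* $ makes the type explicit */
   Total=sum(of goal{*});   /* 9000+9300+9600+9900 */
run;
```

Total is 37800, and color2 holds white.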

Creating Temporary Array Elements


data finance.report;
   set finance.qsales;
   array sale{4} sales1-sales4;
   array Goal{4} _temporary_ (9000 9300 9600 9900);
   array Achieved{4};
   do i=1 to 4;
      achieved(i)=100*sale(i)/goal(i);
   end;
run;

* Temporary array elements are useful when values are needed only to perform a calculation. They do not appear in the output data set, and they can save processing time.
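To see the difference that _temporary_ makes, the sketch below runs the calculation on one hypothetical row (the sales figures and the data set name work.report are made up) and then lists the variables with PROC CONTENTS. Goal1-Goal4 are absent from the listing because temporary elements are not written to the output data set.

```sas
data work.report(drop=i);
   Sales1=9000; Sales2=4650; Sales3=9600; Sales4=19800;   /* hypothetical values */
   array sale{4} sales1-sales4;
   array Goal{4} _temporary_ (9000 9300 9600 9900);        /* no Goal1-Goal4 variables */
   array Achieved{4};
   do i=1 to 4;
      Achieved(i)=100*sale(i)/Goal(i);
   end;
run;

/* Lists Sales1-Sales4 and Achieved1-Achieved4 only */
proc contents data=work.report varnum;
run;
```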
