
Orangetree Business Solutions Private Limited, 2012

No part of this book should be referenced or copied without the prior permission of the
company.



CONTENTS

1. Elementary Data Step Techniques

2. Output Delivery System and Customized Output

3. Import and Export in SAS and Associations between Two Variables

4. Descriptive Statistics in SAS

5. SAS Functions

6. Proc Report and Combining Data Sets

7. Infile Statement and Array

8. Introduction to SQL in SAS

9. Introduction to SAS Macros

Appendix: Suggested Books and Links



Chapter 1
Elementary Data Step Techniques

SAS is an integrated system of software solutions that enables you to perform the following tasks:

data entry, retrieval, and management
report writing and graphics design
statistical and mathematical analysis
business forecasting and decision support
operations research and project management
applications development

How you use SAS depends on what you want to accomplish. Some people use many of the capabilities of the SAS System, and others use only a few. At the core of the SAS System is Base SAS software, which is the software product that you will learn to use in this documentation. This section presents an overview of Base SAS. It introduces the capabilities of Base SAS, addresses methods of running SAS, and outlines various types of output.
SAS is today one of the most widely used business intelligence (BI) software packages. Other products on the market include Excel, Business Objects (BO), Oracle and Cognos, but there are a number of reasons why SAS is often preferred. (In addition to BI software, there are data stores such as SAP and SQL Server where data can be held in raw format.)
SAS is an ETL (Extraction, Transformation and Loading) tool. The raw data is extracted from its storage location and then transformed. Transformation can include the treatment of missing values (e.g. putting 0 or NA where an observation is missing) as well as all sorts of data manipulation. The cleaned data can finally be loaded into the data warehouse. Although Informatica is one of the most popular ETL tools, SAS is often preferred for the following reasons:
a) It is an ETL tool and can also be used for reporting.
b) It can fetch data from remote locations and store the data after transformation (as opposed to BO, Cognos etc., which can transform data but not store it, and which require back-end data warehouses such as SAP or Oracle).
c) SAS can be used as a forecasting tool (unlike BO or Informatica).
d) SAS can combine data sets using a common characteristic, say Customer ID; it can merge or append data sets. For example, if one data set holds Customer ID, Age, Gender and Educational Qualifications of customers (stored in one location) and another data set holds Customer ID, Units sold etc., SAS can merge the two by the common variable, Customer ID.

Moreover, the SAS language is user-friendly and can be used on other platforms, say, Mainframe and Statistica (the latest version is 10). So we can write Base SAS code at the back end while the interface or front end is Statistica or Mainframe output. SPSS/Statistica code is cumbersome by comparison. Oracle or Excel could also be used in analytics, but the advantage of SAS is that a small keyword can generate a large amount of output, so it offers more flexibility than Oracle. SQL can also be written directly in SAS through the SQL procedure (PROC SQL), which is part of Base SAS.

There are two SAS products to consider here, viz., i) Base SAS 9.1 and ii) Enterprise Guide 4.1. Companies mostly use 9.1, although the latest on the market is version 9.2, which is quite costly. 9.1 and 4.1 have the same code and output. We will be working with Enterprise Guide 4.1. (Enterprise Guide 4.1 and Learning Edition 4.1 have the same look and feel.) The Learning Edition can handle no more than 1500 rows.

The SAS Programming Environment Contains 6 Main Windows:
1. Project Designer: shows the process flow of a project as a flow chart
2. Project Explorer: shows the process flow of a project as a drop-down menu
3. Code Editor: used to write and edit code
4. Server List: shows the physical storage locations of data
5. Log Window: gives information about the execution of a program and lists any errors that occurred during execution
6. Output Window: displays the output of a program

[Figure: Code Editor window]

[Figure: Log window]

Types of Libraries available in SAS

There are two types of libraries in SAS:

Temporary library
Permanent library

Depending on the library name that is used when we create a file, we can store SAS files temporarily or permanently.

Temporary Library
This is the temporary storage location of a SAS data file. Files stored here last only for the current SAS session. Work is the temporary library in SAS. When the session ends, the data files stored in the temporary library are automatically deleted.

A file is stored in Work when:

no library name is specified while creating the file, or
the library name is specified as Work.



Example:

Data employee;
Set local.emp;
Run;

In the above code, the data set employee is stored in the temporary library Work, because no library name is given for it.

Permanent Library:
This is the permanent storage location of data files. Data sets stored in a permanent SAS library are available for use in subsequent SAS sessions. A data set stored in a permanent library remains there until we physically delete it. To store files permanently in a SAS data library, specify a library name other than the default library name Work.
Three permanent libraries provided by SAS are:
Local
SASuser
SAShelp

Creating a Permanent Library

To create a permanent library use the LIBNAME statement. It creates a reference (a libref) to the path where the SAS files are stored. The LIBNAME statement is global, which means that librefs remain in effect until you modify them, cancel them, or end your SAS session. The LIBNAME statement therefore assigns the libref for the current SAS session only: assign a libref to each permanent SAS data library each time a SAS session starts. Once the libref is deleted or the SAS session ends, SAS no longer has access to the files in the library, but the contents of the permanent library still exist in the path specified.
Syntax for creating a user-defined library: libname <libref> '<path>';
where,
libref is the name of the library to be created. The following conventions must be followed when naming a user-defined library:
A libref (user-defined or otherwise) cannot have more than 8 characters.
A libref can contain both letters (A-Z) and digits (0-9).
It can start with a letter but not with a digit.
It cannot contain any special character except the underscore (_); it can begin with _ and can continue with any combination of letters, digits and _.
path is the location on disk where the SAS files are stored.

Example of a user-defined library:

Libname Day1 'C:\Documents and Settings\admin\Desktop\Orange Tree';

Here, Day1 is the library reference name. The keyword libname assigns the libref Day1 to the folder called Orange Tree in the specified path:
C:\Documents and Settings\admin\Desktop\Orange Tree

The path must be enclosed in either single quotes (' ') or double quotes (" ").



SAS Data Sets

A SAS data set is a SAS file which holds data.

Data must be in the form of a SAS data set to be processed.
Many of the data processing tasks access data in the form of a SAS data set and analyze, manage, or present the data.
A SAS data set can also point to one or more indexes, which enable SAS to locate records in the data set more efficiently.

Rules for SAS Data Set Names

SAS data set names:

can be 1 to 32 characters long
must begin with a letter (A-Z, either uppercase or lowercase) or an underscore (_)
can continue with any combination of numbers, letters, or underscores.
These are examples of valid data set names:
_sales1
Datatelecom

Columns in SAS

Columns are generally known as headings or fields, but in SAS columns are called variables. A variable is a collection of values that describe a particular characteristic. In this table ID, Department, Satisfaction, Years and Status are the names of the variables in the data set.

Rows in SAS

Rows are sometimes called cases or records, but in SAS they are called observations. An observation is a collection of data values that usually relate to a single object in a SAS data set.
Example: Accounting and Chemistry are values recorded in different observations of the variable Department.

Missing Values in SAS

If a data value is unknown for a particular observation, a missing value is recorded.

. (a period) indicates a missing value of a numeric variable. Salary, which is a numeric variable, has 3 missing values.
A blank indicates a missing value of a character variable. In the table above, Department is a character variable and has 1 missing value.
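
As a small illustration of how missing values are entered and stored, the sketch below builds a hypothetical data set, work.staff. The data set name, variable names and values are assumptions made for this example and are not taken from the table referred to above. With list input, a single period in the data lines stands for a missing value of either type: it is stored as a period for the numeric variable salary and as a blank for the character variable department.

```sas
/* Hypothetical example data: names and values are illustrative only */
data work.staff;
   length department $12;
   input id $ department $ salary;
   datalines;
E1 Accounting 5000
E2 Chemistry .
E3 . 4200
;
run;

/* Numeric missing prints as a period, character missing as a blank */
proc print data=work.staff;
run;
```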



Referencing Permanent SAS Files: Two-Level Names

Two-level names are used to reference a permanent SAS file in SAS programs.
There are two parts in a two-level name:
1. Libref name
2. Filename

Libref is the name of the SAS data library that contains the file.
Filename is the name of the file itself.
A period separates the libref and the filename. Example: Clinic.Admit is the two-level name for the SAS data set Admit, which is stored in the library named Clinic. In the same way, the two-level name Day1.Emp references the SAS data set named Emp that is stored in the permanent library Day1.

Referencing Temporary SAS Files

To reference a temporary SAS file, specify the default libref Work, a period, and the filename. Example: the two-level name Work.Test references the SAS data set named Test that is stored in the temporary SAS library Work.

One-Level Name
A one-level name (the filename only) can be used to reference a file in the temporary SAS library.

When a one-level name is used, the default libref Work is assumed. Example: the one-level name Test also references the SAS data set named Test that is stored in the temporary SAS library Work.
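
The naming rules above can be tried out with a short sketch. The folder C:\sasdata and the variable x are hypothetical and only serve the illustration; any folder that exists on your machine will do. The sketch assigns a libref, writes one temporary and one permanent data set, and then cancels the libref with the CLEAR option of the LIBNAME statement.

```sas
/* Hypothetical folder: replace with a folder that exists on your machine */
libname day1 'C:\sasdata';

/* One-level name: stored in the temporary library as Work.Test */
data test;
   x = 1;
run;

/* Two-level name: stored permanently in the folder behind Day1 */
data day1.emp;
   set work.test;   /* Work.Test and Test are the same data set */
run;

/* Cancel the libref; the file emp stays in the folder */
libname day1 clear;
```

After the libref is cleared, Day1.Emp can no longer be referenced until the LIBNAME statement is submitted again, but the file itself is untouched.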

A SAS program contains the following steps:

Data Step
Proc Step
Combination of DATA and PROC steps

Data Step

DATA steps typically create or modify SAS data sets. They can also be used to produce custom-designed reports.

DATA steps are used to:

Put data into a SAS data set

Compute values

Check for and correct errors in data

Produce new SAS data sets by subsetting, merging, and updating existing data sets


Proc Step

PROC steps are pre-written routines that enable us to analyze and process the data in a SAS data set and to present the data in the form of a report. PROC steps sometimes create new SAS data sets that contain the results of the procedure. PROC steps can list, sort, and summarize data.

PROC steps are used to:


Create a report that lists the data
Produce descriptive statistics
Create a summary report
Produce plots and charts

Reading In-stream Data using Cards and Datalines

Data can be entered into a SAS data set directly through the SAS program. Reading in-stream data is useful when you want to create data and test programming statements on a few observations.
To read in-stream data use:
a DATALINES statement as the last statement in the DATA step (except for the RUN statement), immediately preceding the data lines
a null statement (a single semicolon) to indicate the end of the input data
Only one DATALINES statement can be used in a DATA step.
Use separate DATA steps to enter multiple sets of data.
If the data contains semicolons, use the DATALINES4 statement plus a null statement that consists of four semicolons (;;;;) to indicate the end of the input data.

DATA <dataset name>;
INPUT <variablename1> [$] <variablename2> [$] ... ;
DATALINES;
<data values>
;
RUN;

Specify the data values after the DATALINES statement.

After typing in the values, give a semicolon on a line of its own to indicate the end of the data values.

The CARDS statement can be used instead of DATALINES.
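
The DATALINES4 rule for data that contains semicolons can be sketched as follows. The data set work.notes, its variables and its values are hypothetical. Column input is used here so that the comment text, including its semicolon and blanks, is read as a single value.

```sas
/* Hypothetical data: the comment text itself contains a semicolon */
data work.notes;
   infile datalines truncover;      /* short lines do not spill into the next line */
   input id $ 1-4 comment $ 6-40;   /* column input: fixed positions on each line */
   datalines4;
A001 checked; approved
A002 pending
;;;;
run;

proc print data=work.notes;
run;
```

With a plain DATALINES statement the first data line would end the data block at its semicolon, which is why the four-semicolon terminator is needed in this case.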



Code to create a data set

Data Day1.employee;
   Length city $10;                        /* default character length is 8 */
   Informat sal dollar10. doj ddmmyy10.;   /* declared before Input so it is in effect when values are read */
   Input City $ Id $ Sal Doj;
   Format sal dollar10. doj ddmmyy10.;     /* how the values are displayed */
   datalines;
Bangalore T101 $20,000 19/09/1979
Delhi T101 $23,000 13/01/1983
Kolkata Y109 $24,000 12/09/2001
Chennai I111 $29,000 10/10/2010
;
run;

In the above code we create a new data set, employee. The Length statement is used to increase the length of the variable (column) city, as the default length is 8. If the length is not increased, the letter e of the value Bangalore will be cut off in the employee data set.
Input is the keyword that declares the variables, i.e. city, id, sal and doj. The dollar sign ($) is used with the variables city and id because they are character variables.
Informat is the keyword for reading values that contain special characters such as $, / or commas. Format is the keyword for writing them in the proper format, for example salary as $23,000.
Datalines is the keyword that introduces the observations for the variables.
Now suppose we want to add a label (descriptive text) to the variable sal for better understanding. We can also change the name of a variable permanently with the rename keyword.
Below is the code for label and rename.

Data Day1.employee;
   Length city $10;
   Informat sal dollar10. doj ddmmyy10.;   /* declared before Input so it is in effect when values are read */
   Input City $ Id $ Sal Doj;
   Format sal dollar10. doj ddmmyy10.;
   label sal="salary of the employees";    /* descriptive text for sal */
   Rename doj = date_of_joining;           /* new name in the output data set */
   datalines;
Bangalore T101 $20,000 19/09/1979
Delhi T101 $23,000 13/01/1983
Kolkata Y109 $24,000 12/09/2001
;
run;

Some Basic SAS Procedures

Proc Contents
Proc contents lists the structure of the specified SAS data set. The information includes
the names and types (numeric or character) of the variables in the data set. The most
common form of usage is

Proc contents data=second;
run;
This lists the information for the data set second.

proc contents data=_all_;
run;
_All_ - prints the contents of every data set in the library when you specify _ALL_ in the DATA= option. Because no libref is given here, the default library (normally Work) is used.

proc contents data=_all_ nods;
run;
Nods - suppresses the printing of the individual files, so that only the library directory is listed. It is used together with _ALL_.

Proc contents data=sashelp.class varnum;
run;
Varnum - prints a list of the variables by their position in the data set. By default, the CONTENTS procedure lists the variables alphabetically.

Proc contents data=sashelp.class position short;
run;
Position Short - prints only the names of the variables in the data set, both in alphabetical and in creation order.

Proc Print

The PRINT procedure prints the observations in a SAS data set, using all or some of the
variables. You can create a variety of reports ranging from a simple listing to a highly
customized report that groups the data and calculates totals and subtotals for numer-
ic variables.

Selecting Variables for printing

Proc print data=day1.candy;
   Var brand name;
Run;

This example selects two variables for the report. Var is the keyword that selects variables.

Creating a Customized Layout with Var and ID Variables

Proc print data=day1.candy_sales_summary n noobs;
   Id prodid;
   Var category subcategory;
   Sum sale_amount;
Run;

In this customized report:

Id identifies each observation by the value of prodid, which is printed in the first column in place of the observation number

Var selects the variables to include in the report and the order in which they appear

Sum prints the total of sale_amount at the end of the report

n displays the total number of observations in the data set

noobs suppresses the column of observation numbers

A Format statement, which is not part of the code above, would be needed to display numeric data with commas and dollar signs.
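
To display numeric data with commas and dollar signs, a FORMAT statement can be added to the PROC PRINT step. The sketch below assumes that sale_amount is a numeric variable in day1.candy_sales_summary; the width and number of decimals in dollar12.2 are an arbitrary choice for illustration.

```sas
/* Assumes sale_amount is numeric; dollar12.2 is an illustrative choice */
proc print data=day1.candy_sales_summary n;
   id prodid;                       /* printed first, replaces the Obs column */
   var category subcategory sale_amount;
   sum sale_amount;                 /* grand total at the end of the report */
   format sale_amount dollar12.2;   /* affects the printed report only */
run;
```

The format applies only to this report; the values stored in the data set are not changed.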

Points to remember

Syntax rules

All commands end in a semi-colon

SAS statements are not case sensitive. You may use upper or lower case.

Variable names can be upper or lower case.

When you refer to an "external" file name, it is case-sensitive (interaction with the UNIX operating system)

Commands can extend over several lines as long as words are not split

You may have more than one command per line

SAS name rules (for datasets and variables): up to 32 characters long; must start
with a letter or an underscore (_). Avoid special characters.



Chapter 2
Output Delivery System (ODS)
and Customized Output
ODS stands for the Output Delivery System. ODS allows output from the Data Step and SAS procedures to be presented in a more useful way. ODS also allows some of the output of SAS procedures to be stored in SAS data sets. Although this is an improvement over the regular SAS output, it still has its limitations.

Uses of ODS

ODS can arrange output in a more attractive way.

It can also create output in a variety of formats, such as HTML, PDF, RTF, etc.

As stated earlier, it can also create output data sets, which can generally also be created within most of the SAS procedures.

Example:

To send the output of a procedure to ODS and create an .html, .rtf or .pdf file (shown here for HTML):

ods html file='path of the folder/filename.html';
proc print data=library.dataset_name;
run;
ods html close;

RTF Files

This creates formatted output that can be read by MS Word and other word processing programs. The general syntax is:

ods rtf file='path of the folder/filename.rtf';

PDF Files

This creates formatted output that can be read by Adobe Acrobat Reader. One caveat is that you may need Adobe Acrobat Distiller. The general syntax is:

ods pdf file='path of the folder/filename.pdf';
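
A complete ODS run has three parts: open the destination, run the procedure, close the destination. The sketch below uses the sashelp.class data set that ships with SAS; the file name class_report.pdf is hypothetical, and because no folder is given the file is assumed to be written to the current working directory of the SAS session.

```sas
/* Hypothetical file name; written to the current working directory */
ods pdf file='class_report.pdf';

proc print data=sashelp.class;
run;

/* Closing the destination finishes writing the file */
ods pdf close;
```

The same pattern applies to the HTML and RTF destinations, with ods html or ods rtf in place of ods pdf.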

Restrictions on observations

You can control how many records you process or print when testing your programs. If you just want to print a few observations in a data set or test a new program, it is a good idea to limit the observations (records) processed in your program. There is no reason to test your program on all 100,000 records if you just want to see whether your calculation works or whether you can read a raw data file. Limiting observations by using the OBS and FIRSTOBS options can significantly reduce processing time and paper usage.

Limiting observations when printing


Proc print data=day1.class(obs=50);
Run;

The (obs=50) option limits printing to the first 50 observations in the data set. OBS specifies the last observation of the SAS data set that will be printed. It does NOT specify how many observations should be printed. If you want to begin printing at the 20th observation and end at the 50th observation, you can use the firstobs option:

Proc print data=day1.class (firstobs = 20 obs = 50);
Run;

Limiting observations in a data set


Data day1.class1;
   Set day1.class(obs=50);
Run;

This limits processing to the first 50 observations of day1.class, which are written to day1.class1. You can also use the firstobs option to begin processing at a place other than the first observation. If you combine firstobs with the obs option, remember that obs tells SAS the LAST observation to process, not how many observations to process, so be sure that obs is greater than firstobs (firstobs = 1000 obs = 1500).
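
Because OBS names the last observation rather than a count, the number of observations processed is obs - firstobs + 1. A minimal sketch with a generated data set (work.numbers and work.slice are hypothetical names) makes this visible:

```sas
/* Build a hypothetical data set with 100 observations, i = 1 to 100 */
data work.numbers;
   do i = 1 to 100;
      output;
   end;
run;

/* Reads observations 20 through 50: 50 - 20 + 1 = 31 observations */
data work.slice;
   set work.numbers (firstobs = 20 obs = 50);
run;

proc print data=work.slice;
run;
```

The first value of i in work.slice is 20 and the last is 50.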

Limiting observations in a procedure (proc) step


proc freq data=day1.class (obs=30);
run;

Points To Remember (OBS= data set option)
Valid in: DATA step and PROC steps
Category: Observation Control
Default: MAX
Restriction: Use with input data sets only
Restriction: Cannot use with PROC SQL views

Restrictions On Variables

Introduction: The DROP/KEEP Concept

Within the SAS System, the DROP/KEEP concept is used to affect the availability of variables within a SAS step, or to control which variables are written to SAS data files. Both words ("KEEP" and "DROP") are part of the same concept, and wherever the syntax of the SAS System allows the use of one of them, the use of the other is also allowed.

Difference Between DROP and KEEP

The KEEP concept specifies which variables to make available or select. When the KEEP concept is employed, only variables explicitly mentioned in the list following the word "KEEP" are available/selected.

The DROP concept, on the other hand, specifies which variables not to have available/selected. When the DROP concept is employed, the variables not explicitly mentioned in the list following the word "DROP" are the only ones available/selected.

KEEP and DROP are often used to control the number of variables (fields) read into and written out to data sets. During data processing we create several variables but need to save only selected ones in the final data set.

If you want to restrict the number of columns in the output data set, use the following method. This ensures that the output data set is created with the required variables only (var1, var2 and var3 stand for your own variable names).

Data day1.class1 (keep = var1 var2 var3);
   Set day1.class;
Run;

Alternatively, you can use a KEEP statement inside the step:

Data day1.class1;
   Set day1.class;
   keep var1 var2 var3;
Run;

If you are reading a big data set into SAS and require only a few variables from it, use the following statements in the program.

Data day1.class1;
   Set day1.class (keep = var1 var2 var3);
Run;

In the first two cases, SAS reads the entire data set class, even though you only intend to use three variables. In the last case, SAS reads from disk only the three variables you intend to keep. Such efficient methods of restricting the data read into the system help to optimize system resources such as SASWORK and shared drives. DROP can be specified in the same way, depending on the data requirement of the user.
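
The DROP counterpart can be sketched with sashelp.class, a sample data set shipped with SAS that contains the variables Name, Sex, Age, Height and Weight. The output data set names are hypothetical.

```sas
/* DROP= on the input data set: height and weight are never read in */
data work.class_small;
   set sashelp.class (drop = height weight);
run;

/* DROP statement: all variables are read, but age is not written out */
data work.class_noage;
   set sashelp.class;
   drop age;
run;
```

As with KEEP, the data set option on the SET statement limits what is read, while the statement form only limits what is written to the output data set.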

Conditional statements

Conditional statements are used to restrict the output based on certain conditions.

They are as follows:

1) Where

2) If else

3) When

Using the WHERE Statement to Subset Data

Use a WHERE statement to select observations that meet a particular condition from a SAS data set. The WHERE statement subsets the input data by specifying certain conditions that each observation must meet before it is available for processing. The condition that you define in a WHERE statement is an arithmetic or logical expression that generally consists of a sequence of operands and operators. To compare character values, you must enclose them in single or double quotation marks, and the values must match exactly, including capitalization. Using the WHERE statement might improve the efficiency of your SAS programs because SAS is not required to read all the observations in the input data set. Note: You can use only one WHERE statement in a Data or a Proc step.

Comparison Operators

Comparison operators set up a comparison, operation, or calculation with two variables, constants, or expressions. If the comparison is true, the result is 1. If the comparison is false, the result is 0.
They are as follows:
1. = (eq)
2. > (gt)
3. >= (ge)
4. < (lt)
5. <= (le)
6. ^= (ne)

Logical Operators

Logical operators, also called Boolean operators, are usually used in expressions to link sequences of comparisons. The logical operators are AND, OR and NOT.

Contains Operator: The CONTAINS or question mark (?) operator selects observations that include the string specified in the WHERE expression. This operator is available for character variables only. The position of the string in the variable does not matter; however, the operator distinguishes between uppercase and lowercase characters when making comparisons. The following examples select observations in which the variable PRODUCT contains the string Choco, but they do not select an observation in which it appears only as choco or CHOCO:

Where product contains 'Choco';

Where product ? 'Choco';

Between-And Operator: The BETWEEN-AND operator selects observations in which the values of the variables fall within a range of values. You can specify the limits of the range as constants or expressions. Any range you specify with the BETWEEN-AND operator is an inclusive range, so that a value equal to one of the limits of the range is within the range. The BETWEEN-AND operator has the following form:

WHERE variable BETWEEN value AND value;

Examples:

where salary between 500 and 1000;
where taxes between salary*0.30 and salary*0.50;

You can combine the NOT operator with the BETWEEN-AND operator to select values that fall outside the range.

Where salary not between 500 and 1000;

Like Operator: The LIKE operator selects observations by comparing the values of a
character variable to a specified pattern, which is referred to as pattern matching.
The LIKE operator is case sensitive. There are two special characters available for spec-
ifying a pattern:

Percent sign (%): specifies that any number of characters can occupy that position. The following WHERE expression selects all employees with a name that starts with the letter N. The names can be of any length.

where lastname like 'N%';

Underscore (_): matches just one character in the value for each underscore character.

IN Operator: The IN operator, which is a comparison operator, searches for character and numeric values that are equal to one from a list of values. The list of values must be in parentheses, with each character value in quotation marks and separated by either a comma or blank. For example, suppose you want all sites that are in North Carolina or Texas. You could specify:

where state = 'NC' or state = 'TX';

However, the easier way is to use the IN operator, which says you want any state in the list:

where state in ('NC','TX');

In addition, you can use the NOT logical operator to exclude a list. For example:

where state not in ('CA', 'TN', 'MA');

AND Operator: If both conditions linked by AND are true, then the expression is true. If either condition linked by AND is false, then the entire expression is false.

Example: where brand='Nestle' and calories > 200;

The above code extracts only observations of the brand Nestle with calories greater than 200.

OR Operator: If either condition linked by OR is true, then the entire expression is true. If both conditions linked by OR are false, then the entire expression is false.

Example: where brand='Nestle' or calories > 200;
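
Several of the operators described above can be combined in one WHERE expression. The sketch below uses sashelp.class (variables Name, Sex, Age, Height, Weight), which ships with SAS; the particular conditions are chosen only for illustration.

```sas
/* AND, BETWEEN-AND and LIKE: girls aged 12 to 14 whose name starts with J */
proc print data=sashelp.class;
   where sex = 'F'
     and age between 12 and 14
     and name like 'J%';
run;

/* OR, IN and CONTAINS: pupils aged 11 or 12, or with 'ar' anywhere in the name */
proc print data=sashelp.class;
   where age in (11, 12)
      or name contains 'ar';
run;
```

Character constants are quoted and compared case-sensitively, so 'J%' would not match a name stored as 'james'.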

IF .. THEN .. ELSE statements

SAS evaluates the expression in an IF-THEN statement to produce a result that is either
nonzero, zero, or missing. A nonzero and non missing result causes the expression to be
true; a result of zero or missing causes the expression to be false.

If the conditions that are specified in the IF clause are met, the IF-THEN statement exe-
cutes a SAS statement for observations that are read from a SAS data set, for records
in an external file, or for computed values. An optional ELSE statement gives an alter-
native action if the THEN clause is not executed. The ELSE statement, if used, must im-
mediately follow the IF-THEN statement.

Using IF-THEN statements without the ELSE statement causes SAS to evaluate all IF-THEN
statements. Using IF-THEN statements with the ELSE statement causes SAS to execute IF-
THEN statements until it encounters the first true statement. Subsequent IF-THEN state-
ments are not evaluated.

Also known as conditional statements, these are very important when subsetting data
or processing observations conditionally. The ELSE part is not required as we have seen
earlier with statements like
if x < 0 then delete;
if name='Smith';
In the form IF <condition> THEN <statement1>; ELSE <statement2>; SAS evaluates the logical condition for each observation. If the condition is true, it executes statement1; if it is false, it executes statement2. Note that only a single statement follows each of the THEN and ELSE clauses. Example:

Data day1.cars_new;
   Set day1.cars;
   If country='USA' then status='New Car';
   Else if country='Japan' then status='old Car';
   Else status='others';
Run;

In the above code we read each observation, create a new variable status and assign a value to it conditionally, depending on the country.

What are the differences between the WHERE and IF statements?

The major differences can be summarized as:

IF can only be used in a DATA step.
Many IF statements can be used in one DATA step.
IF (with THEN) can create a new variable.

Whereas
WHERE can be used in a DATA step as well as in a PROC step.
WHERE cannot create a new variable.
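
The contrast between the two statements can be sketched with sashelp.class. The data set name work.tall and the variable height_cm are hypothetical, and the conversion assumes that Height in sashelp.class is recorded in inches.

```sas
/* WHERE in a PROC step: no DATA step is needed */
proc print data=sashelp.class;
   where age > 13;
run;

/* Subsetting IF in a DATA step: it can test a variable computed in the same step */
data work.tall;
   set sashelp.class;
   height_cm = height * 2.54;   /* assumes height is in inches */
   if height_cm > 160;          /* keeps only observations where this is true */
run;
```

A WHERE statement could not be used for the second condition, because WHERE is applied to the input data set, where height_cm does not yet exist.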

When: The WHEN conditional statement is similar to the IF statement. A select-group provides a multiple-path conditional branch. A select-group contains a SELECT statement, one or more WHEN statements, optionally an OTHERWISE statement, and an END statement.

SELECT (exp1);

The SELECT statement and its corresponding END statement delimit a group of statements collectively called a select-group. The expression in the SELECT statement is evaluated and its value is saved.

WHEN (exp2, exp3, ...) statement;

Specifies one or more expressions that are evaluated and compared with the saved value from the SELECT statement.

If an expression is found that is equal to the saved value, the evaluation of expressions in WHEN statements stops, and the statement associated with that WHEN is executed. If no such expression is found, the statement of the OTHERWISE statement is executed.

OTHERWISE statement;

Specifies the statement to be executed when every test of the preceding WHEN statements fails. If the OTHERWISE statement is omitted and no WHEN condition is met, SAS issues an error.

END;

Closes the select-group.
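
A select-group in a SAS DATA step can be sketched as follows. The example uses sashelp.class; the output data set work.class_groups, the variable age_group and the group labels are hypothetical.

```sas
data work.class_groups;
   set sashelp.class;
   length age_group $10;          /* declare the length before the first assignment */
   select (age);                  /* the value of age is evaluated once and saved */
      when (11, 12) age_group = 'Younger';
      when (13, 14) age_group = 'Middle';
      otherwise     age_group = 'Older';   /* used when no WHEN value matches */
   end;
run;

proc print data=work.class_groups;
   var name age age_group;
run;
```

Each WHEN lists the values that are compared with the saved value of age; the first match wins and the remaining WHEN statements are skipped.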

Splitting of Data Set

The objective is to divide a dataset into subgroups by the values of a variable.


Here is our first sample dataset.

Data day1.choco day1.nuts;
   Set day1.candy_sales_summary;
   If category='Candy' then output day1.choco;
   Else output day1.nuts;
Run;

In the above code, the data set candy_sales_summary is split into two, i.e. day1.choco and day1.nuts. All observations with the category Candy are written to day1.choco and all other categories go to day1.nuts. Basically, the IF statement is used for subsetting the candy_sales_summary data set.



Sorting of a Data Set

Proc sort arranges the observations of a data set. It:

can create a new SAS data set containing the rearranged observations
sorts in ascending (default) or descending order
does not provide printed output (that requires a proc print step)
treats missing data as the smallest possible value
Example:
Proc sort data=day1.candy
          Out=day1.candy_sorted;
   By calories;
Run;

In the above code we arrange the candy data set in ascending order of the variable calories; the sorted data is stored in the candy_sorted data set because we used the out keyword.

To arrange the data set in descending order, use the following code.

Proc sort data=day1.candy Out=day1.candy_sorted;
   By descending calories;
Run;

DEFINING NODUP AND NODUPKEY OPTIONS

The NODUP option checks for and eliminates duplicate observations. If you specify this
option, PROC SORT compares all variable values for each observation to those for the
previous observation that was written to the output data set. If an exact match is
found, the observation is not written to the output data set. The NODUPKEY option
checks for and eliminates observations with duplicate BY variable values. If you specify
this option, PROC SORT compares all BY variable values for each observation to those
for the previous observation written to the output data set. If an exact match using the
BY variable values is found, the observation is not written to the output data set.

Notice that with the NODUPKEY option, PROC SORT compares only the BY variable values,
while the NODUP option compares all the variables in the data set that is being
sorted. An easy way to remember the difference between these options is to keep in
mind the word "key" in NODUPKEY. It evaluates the "key" or BY variable values that
you specify. One thing to beware of is that both options compare each observation only
with the previous observation written to the output data set. With NODUPKEY this is
harmless, because sorting places equal BY values next to each other. With NODUP,
however, identical observations that are not adjacent in the data set after the sort
will not be eliminated.

Proc sort data=day1.candy
Out=day1.candy_sorted nodup;
By calories;
Run;



So in this code, we wanted to order the data by the variable Calories and eliminate
any observations that have the exact same information for all variables.

Proc sort data=day1.candy
Out=day1.candy_sorted nodupkey;
By calories;
Run;
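
The difference between the two options can be seen on a tiny data set. The sketch below uses an invented data set, WORK.SNACKS, so the names and values are assumptions made only for illustration.

```sas
/* Hypothetical data: two identical rows plus a third row with the same BY value */
data work.snacks;
   input product $ calories;
   datalines;
Bar 200
Bar 200
Gum 200
Nut 150
;
run;

/* NODUP: drops an observation only if ALL variables match the previous one kept */
proc sort data=work.snacks out=work.snacks_nodup nodup;
   by calories;
run;

/* NODUPKEY: keeps only the first observation for each value of CALORIES */
proc sort data=work.snacks out=work.snacks_nodupkey nodupkey;
   by calories;
run;
```

With these values, WORK.SNACKS_NODUP should keep three observations (only the repeated Bar row is dropped), while WORK.SNACKS_NODUPKEY should keep two, one for each value of calories.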



Chapter 3
Import & Export in SAS and
Association between Two Variables
With the Import and Export Wizards you can transfer data between external data
sources and SAS data sets. Each wizard presents a series of windows with simple
choices to guide you through a process, making a complex or infrequent task simpler
to perform.

Proc Import

The IMPORT procedure reads data from an external data source and writes it to a SAS
data set. External data sources can include:
Excel
CSV
TXT
Delimited files contain columns of data values that are separated by a delimiter, such
as a blank or a comma.

IMPORT Procedure

The syntax for the IMPORT procedure is shown here briefly but is described in detail
in the SAS Procedures Guide.

PROC IMPORT DATAFILE="filename" | TABLE="tablename"
OUT=<libref.>SAS-data-set
<DBMS=identifier> <REPLACE>;
<data-source-statements;>

Importing a delimited file

Proc import datafile="path/filename.extension"
Out=libref.dataset_name
Dbms=dlm replace;
Delimiter=";";
Run;

This code works for a file with any delimiter, such as a semicolon, space or underscore;
the delimiter character is given in quotes in the DELIMITER= statement.

Import a tab delimited file

Proc import datafile="path/filename.extension"
Out=libref.dataset_name
Dbms=tab replace;
Run;



Importing an Excel file

Proc import datafile="path/filename.extension"
Out=libref.dataset_name
Dbms=xls replace;
Sheet="name of the worksheet where the data is";
Run;

EXPORT Procedure

The syntax for the EXPORT procedure is shown here briefly but is described in detail
in the SAS Procedures Guide.

PROC EXPORT DATA=<libref.>SAS-data-set
OUTFILE="filename" | OUTTABLE="tablename"
<DBMS=identifier> <REPLACE>;
RUN;

The EXPORT procedure reads data from a SAS data set and exports it to an external
data source. PROC EXPORT also controls the results with options and statements that
are specific to the output data source.

Exporting a Delimited File

The following example exports a SAS data set named MYFILES.CLASS and creates a
delimited external file called CLASS. Notice that the DELIMITER= statement specifies the
ampersand (&) as the delimiter that separates the columns in the new file.

proc export data=myfiles.class
outfile="/myfiles/class"
dbms=dlm;
delimiter='&';
run;

Proc Freq

The FREQ procedure produces one-way to n-way frequency and contingency (cross
tabulation) tables. For two-way tables, PROC FREQ computes tests and measures of
association. For n-way tables, PROC FREQ provides stratified analysis by computing sta-
tistics across, as well as within, strata.

For one-way frequency tables, PROC FREQ computes goodness-of-fit tests for equal
proportions or specified null proportions. For one-way tables, PROC FREQ also provides
confidence limits and tests for binomial proportions, including tests for noninferiority
and equivalence. For contingency tables, PROC FREQ can compute various statistics
to examine the relationships between two classification variables. For some pairs of
variables, you might want to examine the existence or strength of any association
between the variables. To determine if an association exists, chi-square tests are
computed. To estimate the strength of an association, PROC FREQ computes measures of
association that tend to be close to zero when there is no association and close to the
maximum (or minimum) value when there is perfect association.

Example:
Proc freq data =day1.candy;
Tables brand;
Run;

In the FREQ procedure, the TABLES statement specifies the frequency and cross tabulation
tables to produce. A request is composed of one variable name or several variable names
that are separated by asterisks. To request a one-way frequency table, use a single
variable. To request a two-way cross tabulation table, use an asterisk between two
variables. To request a multi-way table (an n-way table, where n>2), separate the
desired variables with asterisks. The unique values of these variables form the rows,
columns, and strata of the table.

Example:
Proc freq data =day1.candy_sales_summary;
Tables category * subcategory;
Run;

The TABLES statement requests one-way to n-way frequency and cross tabulation tables
and statistics for those tables. If you omit the TABLES statement, PROC FREQ generates
one-way frequency tables for all data set variables that are not listed in the other
statements. The above code generates a two-way frequency distribution for the variables
category and subcategory. In the output table we get the frequency, the percent, the
row percent and the column percent for each cell. The upper left-hand corner of the
table contains a legend of what the numbers inside each cell represent.



Proc freq data =day1.candy_sales_summary;
Tables category * subcategory/norow nocol nopercent ;
Run;

Nocol - suppresses printing of column percentages of a crosstab.
Norow - suppresses printing of row percentages of a crosstab.
Nopercent - suppresses printing of cell percentages of a crosstab.

Proc freq data =day1.candy_sales_summary;
Tables category * subcategory/list ;
Run;

List - prints two-way to n-way tables in a list format rather than as cross tabulation ta-
bles.
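
The chi-square tests described at the start of this section can be requested with the CHISQ option of the TABLES statement. The sketch below is only an illustration: the data set WORK.SURVEY and its counts are invented, and the WEIGHT statement is used because each row holds an already-counted cell rather than a single observation.

```sas
/* Hypothetical cell counts for a 2 x 2 table */
data work.survey;
   input region $ flavour $ count;
   datalines;
North Mint 20
North Cocoa 30
South Mint 25
South Cocoa 15
;
run;

proc freq data=work.survey;
   weight count;                                   /* each row stands for COUNT respondents */
   tables region*flavour / chisq nocol nopercent;  /* CHISQ adds the tests of association */
run;
```

The output should contain the cross tabulation followed by a table of chi-square statistics for the association between region and flavour.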

Proc Format

Reports contain values of different types. The values must be formatted so that they
are easy to understand and properly presented. Formats are applied to the values after
the data is read from the dataset and before it is sent to the print destination. The
FORMAT procedure lets you define formats for numeric and character values by using
statements such as VALUE after the proc format statement in a proc step.

Example:
Proc format;
Value $gender "M"="Male"
"F"="Female";
Run;

In the above code we created a user-defined format, $gender, which displays M as Male
and F as Female.

Proc freq data=day1.class;
Tables sex ;
Run;

This is the output we get if we do not invoke the user-defined format $gender. It
shows the frequency distribution for the variable sex with its raw values.

Proc freq data=day1.class;
Tables sex ;
Format sex $gender. ;
Run;

In the above code we called the user-defined format $gender. That is why the values
of the variable sex are displayed as Male and Female.
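
The same procedure can format numeric values by range. The sketch below is an assumption-laden illustration: the format name AGEGRP, the data set WORK.PEOPLE and the age boundaries are all invented for this example.

```sas
/* Hypothetical numeric format: the boundaries are chosen only for illustration */
proc format;
   value agegrp low -< 13  = "Child"
                13  -< 20  = "Teen"
                20 - high  = "Adult";
run;

data work.people;
   input name $ age;
   datalines;
Anu 11
Bala 15
Chitra 34
;
run;

/* The stored ages are unchanged; only their displayed form changes */
proc print data=work.people;
   format age agegrp.;
run;
```

Note that a numeric format name has no leading dollar sign, unlike the character format $gender above.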



Chapter 4
Descriptive Statistics in SAS

The following procedures generate descriptive statistics for numeric variables:
Proc Means
Proc Summary
Proc univariate
Proc Tabulate

Proc means

The MEANS procedure provides data summarization tools to compute descriptive
statistics for variables across all observations and within groups of observations. For
example, PROC MEANS

calculates descriptive statistics based on moments
estimates quantiles, which include the median
calculates confidence limits for the mean
identifies extreme values
performs a t test

By default, PROC MEANS displays output. You can also use the OUTPUT statement to
store the statistics in a SAS data set. By default the MEANS procedure generates five
descriptive statistics: N (the number of nonmissing values), mean, standard deviation,
minimum value and maximum value. The syntax of the PROC MEANS statement is:

PROC MEANS <options>; <statements>;

Example:
Proc means data=day1.candy_sales_summary;
Var Sale_amount;
Run;

The means procedure generates five descriptive statistics for the numeric variable
sale_amount, which is selected by the keyword VAR. The VAR or VARIABLES statement
can be used with many procedures to indicate which variables are to be analyzed. If
this statement is omitted, the default is to include all variables of the appropriate
type (character or numeric) for the given analysis.



Other commonly used options available in PROC MEANS include:

DATA= -- Specify data set to use
NOPRINT -- Do not print output
MAXDEC=n -- Use n decimal places to print output

Commonly used statements with PROC MEANS include:

BY variable list -- Statistics are reported for groups in separate tables
CLASS variable list -- Statistics are reported by groups in a single table
VAR variable list -- Specifies which numeric variables to use
OUTPUT OUT=dataset name -- Statistics will be output to a SAS data file

A few examples:
Proc means data = day1.candy_sales_summary;
Class category subcategory;
Var sale_amount;
Run;

Similarly, we can run the code with a BY statement. The data set must first be sorted
by the BY variable.


Proc sort data=day1.candy_sales_summary Out=day1.candy_sorted;
By category;
Run;



Proc means data = day1.candy_sorted;
Var sale_amount;
By category ;
Run;

In the above code we will get category-wise descriptive statistics due to the use of
the BY keyword.

Proc means data = day1.candy_sorted;
Var sale_amount;
Class category ;
Output out=day1.candy_sorted1;
Run;

In the above code we are creating a dataset of the statistics, day1.candy_sorted1,
using the OUTPUT statement with the OUT= option.

Proc Summary

The SUMMARY procedure allows the user to obtain statistical analyses on
data obtained from a permanent or working storage (temporary) SAS data set. The
purpose of the procedure may be summarized as follows:

To compute summary statistics for specified levels of subgrouped observations.
The resulting statistics are assigned to new variables, while the old variables are
dropped.

To produce a SAS data set for use with subsequent data steps or procedures.

Proc summary does not print anything by default; to display output, the PRINT option
is required.

Proc summary data=day1.candy_sales_summary print;
Var Sale_amount;
Class category;
Output out=day1.candy_summary;
Run;

The VAR statement lists the numeric variables for which summary statistics are
desired; if it is omitted, PROC SUMMARY only counts the observations. The OUTPUT
statement with the OUT= option creates a new data set, here candy_summary. Because
PROC SUMMARY prints nothing by default, either the OUTPUT statement or the PRINT
option is needed to obtain any results. Whenever we create an output data set from a
proc summary or proc means step we get two additional variables, _TYPE_ and _FREQ_.
The variable _TYPE_ identifies the level of subgroup that an observation summarizes.
In this example the observation with _TYPE_ = 0 holds the 'total' across all categories
and the observations with _TYPE_ = 1 hold the statistics for each category. The
variable _FREQ_ is created by the procedure and contains the number of observations
in the subgroup defined. All observations, including those with missing values of the
analysis variable, are included in this count.
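
The role of _TYPE_ and _FREQ_ is easiest to see on a very small data set. The sketch below uses an invented data set, WORK.SALES, with made-up values; the output variable name total_sales is also an assumption chosen for this illustration.

```sas
/* Hypothetical data: three sales in two categories */
data work.sales;
   input category $ sale_amount;
   datalines;
Candy 10
Candy 20
Nuts 5
;
run;

proc summary data=work.sales;
   class category;
   var sale_amount;
   output out=work.sales_summary sum=total_sales;  /* one SUM per subgroup */
run;

proc print data=work.sales_summary;
run;
```

The printed data set has three observations: one with _TYPE_ = 0 and _FREQ_ = 3 holding the overall total of 35, and two with _TYPE_ = 1 holding the totals for Candy (_FREQ_ = 2, total 30) and Nuts (_FREQ_ = 1, total 5).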

Proc Univariate

Proc univariate provides the following descriptive statistics for a single variable:

Number of observations in the population
Median value
Mode value
Percentiles as follows: 1, 5, 10, 25, 50, 75, 90, 95, 99
Top 5 values (largest) (with position in file)
Bottom 5 values (smallest) (with position in file)
Quartiles
Range
Mean
Variance
Standard Deviation
Minimum Value
Maximum value
Sum of Squares
Skewness
Kurtosis
Number of observations equal to zero, less than zero and greater than zero
Number of missing observations
Corrected Sum of Squares
Uncorrected Sum of Squares
Standard Error of the Mean



Example:
Proc univariate data=day1.class;
Var height;
Run;

The above code will generate all descriptive statistics for the numeric variable Height.
Proc univariate data=day1.class;
Var height;
Histogram Height;
Run;

The above code generates the descriptive statistics for the numeric variable height
and, in addition, a histogram of height. You can use the following statements to
request plots:

the HISTOGRAM statement for creating histograms

the QQPLOT statement for creating Q-Q plots

the CLASS statement together with any of these plot statements for creating com-
parative plots

Proc Tabulate

The TABULATE procedure in SAS provides a flexible platform to generate tabular re-
ports. The simplest possible table in TABULATE has to have three things: a PROC TABU-
LATE statement, a TABLE statement, and a CLASS or VAR statement. In this example,
we will use a VAR statement. Later examples will show the CLASS statement.
The PROC TABULATE statement looks like this:

PROC TABULATE DATA=day1.class;
Run;

The second part of the procedure is the TABLE statement. It describes which variables
to use and how to arrange the variables. When there is only one variable, you get a
one-dimensional table.

PROC TABULATE DATA=day1.class;
TABLE height;
RUN;

If you run this code as is, you will get an error message because TABULATE cannot
figure out whether the variable HEIGHT is intended as an analysis variable, which is
used to compute statistics, or a classification variable, which is used to define
categories in the table. In this case, we want to use height as the analysis variable:
we will be using it to compute a statistic. To tell TABULATE that HEIGHT is an analysis
variable, you use a VAR statement. The syntax of a VAR statement is simple: you just
list the variables that will be used for analysis. So now the syntax for our PROC
TABULATE is:

PROC TABULATE DATA=day1.class;
VAR Height;
TABLE Height;
RUN;

The above code will generate the output. It has a single column, with
the header HEIGHT to identify the variable, and the header SUM to identify the statistic.
There is just a single table cell, which contains the value for the sum of HEIGHT for all of
the observations in the dataset Class.
To specify the statistic for a PROC TABULATE table, you modify the TABLE statement.
You list the statistic right after the variable name. To tell TABULATE that the statistic
MEAN should be applied to the variable HEIGHT, you use an asterisk to link the variable
name to the statistic keyword. The asterisk is a TABULATE operator. Just as you use an
asterisk as an operator when you want to multiply 2 by 3 (2*3), you use an asterisk
when you want to apply a statistic to a variable.

PROC TABULATE DATA=day1.class;
CLASS SEX;
VAR Height;
TABLE HEIGHT*MEAN*SEX;
RUN;

In the output, the variable name HEIGHT at the top of the column heading remains
unchanged, but the statistic name shown in the second line of the heading now says
Mean instead of Sum. Because SEX is declared in a CLASS statement and crossed with
HEIGHT*MEAN by a second asterisk, there is one column under the statistic label for
each value of SEX, and each cell holds the mean height for that group rather than the
overall sum.

Proc tabulate data=day1.class;
Var age height weight;
Class sex;
Table (age height weight)* mean, Sex;
Run;

In this TABLE statement the comma separates the row dimension from the column
dimension. The rows are the analysis variables Age, Height and Weight, each paired
with the statistic name Mean. The columns are headed by the variable label SEX and
its categories, one column for each. The values shown in the table cells represent
subgroup means.



Chapter 5
SAS Functions

Date and Time Functions

The SAS System Software provides a wealth of tools for users who need to work with
data collected in the time domain. These tools include functions which

create a SAS date, time or datetime variable from either raw data or from variables
in an existing SAS data set
determine the interval between two periods
declare a SAS date or time variable as a constant
extract parts from a SAS date variable, such as the month, day, week or year.

A second set of tools, SAS date/time formats, modify the external representation of a
SAS date or time variable. As with other SAS System formats, a date, time or datetime
format displays the values of the variable according to a specified width and form.
Use of date, time or datetime formats is essential when creating applications or pro-
grams in the SAS System portraying the values of variables collected in time. SAS date/
time informats are able to convert raw data into a date, time or datetime variable.
They read fields (i.e., variables) in either raw data files or SAS data sets .

Extracting parts from a SAS Date Variable

Several SAS functions are available to obtain information about the values of a SAS
date variable. These include:

MONTH - Returns the month
DAY - Returns the day
YEAR - Returns the year
QTR - Returns the quarter
WEEKDAY - Returns the day of the week (1 = Sunday)

Data day1.candy_date;
Set day1.candy_sales_summary;
format Sysdate ddmmyy10.;
Sysdate=today();
Day=day(date);
Month=month(date);
Year=year(date);
Quarter=qtr(date);
Week_day=weekday(date);
Week_num=week(date);
Run;



DAY(date) - returns the day of the month from a SAS date value.
TODAY() - returns today's date as a SAS date value.
MONTH(date) - returns the numerical value for the month of the year from a SAS date
value.
YEAR(date) - returns the year from a SAS date value.
QTR(date) - returns the quarter of the year from a SAS date value, that is, a value of
1, 2, 3, or 4 indicating the quarter of the year in which the date falls.
WEEKDAY(date) - returns the day of the week from a SAS date value as an integer,
where 1=Sunday, 2=Monday, ..., 7=Saturday.
WEEK(date) - returns the week of the year from a SAS date value.

Calculating Time Intervals

A common application of SAS System date and time capabilities is to determine how
long a period has elapsed between two points in time. This can be accomplished by
one of two methods:

arithmetic operation (usually subtraction and/or division) between two SAS date,
time or datetime variables or between a SAS date, time, or datetime variable and
a constant term

use of the INTCK function

INTCK Function

A popular and powerful SAS function, INTCK, is available to determine the number of
time periods which have been crossed between two SAS date, time or datetime varia-
bles. The form of this function is: INTCK(interval,from,to)

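Where:
interval = character constant or variable name representing the time period of
interest; a constant is enclosed in quotes (for example 'YEAR')
from = SAS date, time or datetime value identifying the start of the time span
to = SAS date, time or datetime value identifying the end of the time span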

Data day1.candy_date1;
Set day1.candy_sales_summary;
Year=intck("Year", date, today());
Run;

The above code will calculate the interval between two date values (date and the
current date) in terms of years. The result is always an integer, because INTCK counts
the number of year boundaries crossed rather than the elapsed time.
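
Because INTCK counts boundaries, two dates that are only one day apart can still be one year apart by its measure. The sketch below illustrates this with two date constants chosen for the purpose; the variable names are assumptions.

```sas
data _null_;
   date1 = '31DEC2011'd;
   date2 = '01JAN2012'd;
   years_crossed = intck('year', date1, date2);  /* 1: one 1 January boundary is crossed */
   days_between  = date2 - date1;                /* 1: SAS dates are stored as day counts */
   put years_crossed= days_between=;
run;
```

The log shows years_crossed=1 and days_between=1. When elapsed time is what matters, plain subtraction or the YRDIF function described next is usually the better choice.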

DatDif and Yrdif Function


Yrdif - Returns the difference in years between two dates.
Syntax: YRDIF(sdate,edate,basis)



Arguments
Sdate - specifies a SAS date value that identifies the starting date.
Edate - specifies a SAS date value that identifies the ending date.
Basis - identifies a character constant or variable that describes how SAS calculates
the date difference. The following character strings are valid:
'30/360'
specifies a 30-day month and a 360-day year in calculating the number of years. Each
month is considered to have 30 days, and each year 360 days, regardless of the actu-
al number of days in each month or year.

'ACT/ACT'
uses the actual number of days between dates in calculating the number of years.
SAS calculates this value as the number of days that fall in 365-day years divided by
365 plus the number of days that fall in 366-day years divided by 366.

'ACT/360'
uses the actual number of days between dates in calculating the number of years.
SAS calculates this value as the number of days divided by 360, regardless of the actu-
al number of days in each year.

'ACT/365'
uses the actual number of days between dates in calculating the number of years.
SAS calculates this value as the number of days divided by 365, regardless of the actu-
al number of days in each year.

Datdif - The DATDIF function returns the number of days between two dates. The
arguments required are the start_date, end_date and basis. The end_date specifies the
date to subtract from (i.e. the more recent of the two dates) and the start_date
specifies the date to be subtracted (i.e. the less recent of the two dates). The basis
is a character constant or variable that specifies the number of days in a month and
year that SAS should assume to calculate the difference. 'ACT/ACT' specifies the actual
values and has the alias 'Actual'. '30/360' assumes a 30-day month and 360 days in a
year.

Data day1.candy_date2;
Set day1.candy_sales_summary;
format Sysdate ddmmyy10. ;
Sysdate=today();
Day_diff=datdif(date,sysdate,'ACT/ACT');
Year_Diff=yrdif(date,sysdate,'ACT/ACT');
Run;
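
The effect of the basis argument can be checked with two fixed dates. The sketch below is an illustration with date constants and variable names chosen for this example.

```sas
data _null_;
   sdate = '01JAN2011'd;
   edate = '01MAR2011'd;
   days_actual = datdif(sdate, edate, 'ACT/ACT');  /* 59: 31 days in January + 28 in February */
   days_30_360 = datdif(sdate, edate, '30/360');   /* 60: two months of 30 days each */
   put days_actual= days_30_360=;
run;
```

The two results differ by one day because the '30/360' basis treats every month as having 30 days.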

MDY Function

The MDY function takes three numeric constants, variables or expressions representing
the month, day and year and returns a SAS date value built from the month, day and
year supplied as arguments. The month argument has to lie in the range 1-12, and the
day argument in the range 1-31. The year argument can be a 2-digit or 4-digit integer;
in the first case, the century is chosen according to the YEARCUTOFF= option.



Data day1.candy_date3;
Set day1.candy_date;
Format New_date ddmmyy10.;
New_date=mdy(month,day,year);
Run;

In the above code the date value is held in three separate columns: the month
variable, the day variable and the year variable. We join them using MDY to get the
complete date value under one column, New_date.

Datepart and timepart

The DATEPART( ) and TIMEPART( ) functions are used to extract the date and the time
value, respectively, from a SAS datetime value. You provide the datetime stamp as the
argument to the function, and the function returns the desired part of the given
datetime stamp.

SYNTAX:

DATEPART(sasdate_time_value);

TIMEPART(sasdate_time_value);

data temp;
date_time = "19DEC2010:20:10:10"dt;
date_part = datepart(date_time);
time_part = timepart(date_time);
run;

SAS Text Functions

Functions That Change the Case of Characters

Upcase
Lowcase
Propcase

UPCASE Function
Purpose: To change all letters to uppercase.
Syntax: UPCASE(character-value)
character-value is any SAS character expression

LOWCASE Function
To change all letters to lowercase.
Syntax: LOWCASE(character-value)
character-value is any SAS character expression.
Note: The corresponding function UPCASE changes lowercase to uppercase.



PROPCASE Function
To capitalize the first letter of each word in a string.
Syntax: PROPCASE(character-value)
character-value is any SAS character expression.

Data day1.crime_new;
Set day1.crime;
State_up=upcase(Staten);
State_low=lowcase(staten);
State_prop=propcase(staten);
Run;

In the above code the value of the variable Staten is changed from proper case to
uppercase and stored under a new variable name, State_up. The other two functions
work in the same way, giving State_low and State_prop.

Find Function:

To locate a substring within a string, optionally ignoring case or trailing blanks.

Syntax: FIND(character-value, find-string <,'modifiers'> )

Data day1.candy_find;
Set day1.candy;
Position=find(Product,"Chocolate","i");
Run;

This code helps to find out whether the variable product contains chocolate and, if
so, at what position. The function returns a numeric value giving the position at
which chocolate starts, or 0 if it is not found. Here the search is not case sensitive
because of the modifier i.

Index Function

To locate the starting position of a substring in a string.

Syntax: INDEX(character-value, find-string)

character-value is any SAS character expression. find-string is a character variable or
string literal that contains the substring for which you want to search. The function
returns the first position in the character-value that contains the find-string. If the
find-string is not found, the function returns a 0.

Data day1.candy_index;
Set day1.candy_sales_summary;
Position=index(product,"Chocolate");
Run;



This code helps to find out whether the variable product contains Chocolate and, if
so, at what position. The function returns a numeric value giving the position at
which Chocolate starts. Unlike FIND with the i modifier, the INDEX function is always
case sensitive.
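
The difference in case sensitivity between the two functions can be seen side by side. The sketch below uses an invented product name, so the value and variable names are assumptions made for illustration.

```sas
data _null_;
   product = "Dark chocolate bar";
   pos_index = index(product, "Chocolate");      /* 0: the capital C does not match */
   pos_find  = find(product, "Chocolate", "i");  /* 6: case is ignored by the i modifier */
   put pos_index= pos_find=;
run;
```

The log shows pos_index=0 and pos_find=6, since "chocolate" starts at the sixth character of the string.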

Substr Function

To extract part of a string. When the SUBSTR function is used on the left side of the
equal sign, it can place specified characters into an existing string.

Syntax: SUBSTR (character-value, start <,length>)

character-value is any SAS character expression. start is the starting position within the
string. length if specified, is the number of characters to include in the substring. If this
argument is omitted, the SUBSTR function will return all the characters from the start po-
sition to the end of the string.

For these examples, let STRING = "ABC123XYZ"

SUBSTR(STRING,4,2) = "12", SUBSTR(STRING,4) = "123XYZ"

Data day1.candy_cust1;
Set day1.candy_customers;
New_region=substr(region,1,3);
Run;

In this code we extract the first three letters from the region variable and store them
under the new variable New_region.

SUBSTR (on the left-hand side of the equal sign)

As we mentioned in the description of the SUBSTR function, there is an interesting and
useful way it can be used: on the left-hand side of the equal sign.

Purpose: To place one or more characters into an existing string.

Syntax: SUBSTR(character-value, start <, length>) =

character-value is any SAS character expression. start is the starting position in a string
where you want to place the new characters. length is the number of characters to
be placed in that string. If length is omitted, all the characters on the right-hand side of
the equal sign replace the characters in character-value.

Data day1.candy_cust2;
Set day1.candy_customers;
Substr(region,1,3)="abc";
Run;



In the above code the first three letters under the variable region will be replaced by
abc. For example, if the values of region are Central, East and North, they will be
changed to abctral, abct and abcth in the new dataset candy_cust2.

Scan Function

Extracts a specified word from a character expression, where word is defined as the
characters separated by a set of specified delimiters.

Syntax: SCAN(character-value, n-word <,'delimiter-list'>)

character-value is any SAS character expression. n-word is the nth "word" in the string.
If n is greater than the number of words, the SCAN function returns a value that con-
tains no characters. If n is negative, the character value is scanned from right to left.
A value of zero is invalid.

Data day1.candy_cust2;
Set day1.candy_customers;
First=scan(Name,1," ");
Middle=scan(Name,2," ");
Last=scan(Name,3," ");
Run;

This code extracts the character value under the variable name part by part, using the
space as the delimiter. For example, suppose the string under the variable name is
bulls eye emporium. After applying the scan function, the first part (bulls) comes
under the variable first, eye comes under the variable middle and emporium comes under
the variable last. The string is now divided into three parts: first, middle and last.
To join them and put them under one common variable (column heading) we apply the
CATX function.

CATX Function

To concatenate (join) two or more character strings, stripping both leading and trailing
blanks and inserting one or more separator characters between the strings.

Syntax: CATX (separator, string-1, string-2 <,string-n>)

separator is one or more characters, placed in single or double quotation marks, to be
used as separators between the concatenated strings. string-1, string-2, string-n are
the character strings to be concatenated. These arguments can also be written as: CATX
(" ",OF C1-C5), where C1 to C5 are character variables.

Data day1.candy_cust3;
Set day1.candy_cust2;
New_name=catx(" ",first,middle,last);
Run;

This code will join the three substrings back into one complete string.
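
The round trip through SCAN and CATX can be tried in a single step. The sketch below reuses the sample string bulls eye emporium from the text; the variable names are assumptions, and the negative word number and the out-of-range word number are added to illustrate the rules given in the SCAN syntax description.

```sas
data _null_;
   length name $ 30;
   name = "bulls eye emporium";
   first   = scan(name, 1, " ");    /* bulls */
   last    = scan(name, -1, " ");   /* emporium: a negative n counts from the right */
   fourth  = scan(name, 4, " ");    /* blank: there is no fourth word */
   rebuilt = catx(" ", first, scan(name, 2, " "), last);
   put first= last= fourth= rebuilt=;
run;
```

The log shows first=bulls, last=emporium, an empty value for fourth and rebuilt=bulls eye emporium.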



Translate function

TRANSLATE can substitute one character for another in a string. TRANWRD is more
flexible: it can substitute a word or several words for one or more words.
The purpose of the TRANSLATE function is to exchange one character value for another.
For example, you might want to change the values 1 to 5 to the values A to E.
Syntax: TRANSLATE(character-value, to-1, from-1 <, to-n, from-n>)
character-value is any SAS character expression. to-n is a single character or a list
of character values. from-n is a single character or a list of characters. Each
character listed in from-n is changed to the corresponding value in to-n. If a
character value is not listed in from-n, it will be unaffected.

DATA MULTIPLE;
INPUT QUES : $1. @@;
QUES = TRANSLATE(QUES,'ABCDE','12345');
DATALINES;
1 4 3 2 5
5 3 4 2 1
;
run;

In this example, we want to convert the character values 1 to 5 to the letters A to E.

Tranwrd Function

To substitute one or more words in a string with a replacement word or words. It works
like the find and replace feature of most word processors.

Syntax: TRANWRD(character-value, from-string, to-string)

character-value is any SAS character expression. from-string is one or more characters
that you want to replace with the character or characters in the to-string. to-string is
one or more characters that replace the entire from-string.

Making the analogy to the find and replace feature of most word processors here,
from-string represents the string to find and to-string represents the string to
replace it with. Notice that the order of from- and to-string in this function is
opposite (and more logical to this author) from the order in the TRANSLATE function.

Data day1.candy_tranwrd;
Set day1.candy_sales_summary;
New=tranwrd(product,"Chocolate","Choco");
Run;

The substring chocolate under the variable product will be replaced by choco.

Trim Function

To remove trailing blanks from a character value. This is especially useful when you
want to concatenate several strings together and each string may contain trailing
blanks.

Syntax: TRIM (character-value)

character-value is any SAS character expression.

data trim;
set day1.candy;
oldName=name||brand;
NewName=trim(Name)||trim(Brand);
run;

In this example we first join the two variables name and brand under the new column
heading oldName, which keeps the trailing blanks of name in between. To remove the
trailing blanks and then join the values we use the TRIM function, which gives NewName.
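
The effect of the trailing blanks is visible in the log when the values are written out. The sketch below uses invented values for name and brand; CATX, described earlier, is included only for comparison.

```sas
data _null_;
   length name brand $ 10;          /* both values are padded with blanks to 10 characters */
   name  = "Choco";                 /* hypothetical values */
   brand = "Nestor";
   padded  = name || brand;         /* the 5 trailing blanks of NAME stay in the middle */
   trimmed = trim(name) || brand;   /* ChocoNestor */
   spaced  = catx(" ", name, brand);/* Choco Nestor */
   put padded= / trimmed= / spaced=;
run;
```

The first line of the log shows a gap of five blanks between Choco and Nestor, the second shows ChocoNestor and the third shows Choco Nestor.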

Numeric Functions

Int function

The INT function truncates the decimal portion of the value of the argument. The
integer portion of the value of the argument remains.

Syntax: Int(numeric variable);

Data day1.candy_int;
Set day1.candy_sales_summary;
Saleamount=int(sale_amount);
Run;

This example extracts only the integer value of sale_amount. If sale_amount is Rs
100.92, it extracts only the integer part (100).

Round Function

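The ROUND function rounds a value to the nearest multiple of the rounding unit given as
the second argument. If the rounding unit is omitted, the value is rounded to the
nearest integer.

Syntax: Round(numeric variable, rounding-unit);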

DATA _null_;
cost = 4.99;
units = 3;
ucost = Round(cost/units,.01);
PUT cost units ucost ;
RUN;



In this example we are not creating any data set, which is why we used _null_. We are
rounding off the value of cost/units (4.99/3) to two decimal places (that is why .01 is
specified). To display the result in the LOG we used the PUT statement. Similarly, the
code below rounds the numeric value to the nearest integer and to one decimal place,
respectively.

Data day1.candy_rnd;
Set day1.candy_sales_summary;
Sale=round(sale_amount);
Sale1=round(sale_amount,.1);
Run;
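The difference between INT and ROUND can be checked in the log with a short DATA _NULL_ step. This is only a sketch; the variable names and the sample amount are assumptions and do not come from the day1 data sets.

```sas
data _null_;
   amount  = 100.92;                /* hypothetical sample value */
   whole   = int(amount);           /* integer part: 100 */
   nearest = round(amount);         /* nearest integer: 101 */
   tenth   = round(amount, .1);     /* nearest multiple of 0.1: 100.9 */
   neg     = int(-100.92);          /* truncation is toward zero: -100 */
   put whole= nearest= tenth= neg=;
run;
```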

In the DATA step you can use a number of SAS functions, e.g., MEAN (computes arith-
metic mean), SUM (calculates sum of arguments), VAR (calculates the variance), ABS
(returns absolute value), SIN (calculates sine), LOG (produces the natural logarithm),
SQRT (calculates the square root). For instance, to create a new variable final which
will be the arithmetic mean (average) of the 3 scores (variables: test1, test2, and
test3), you would use the following command: final=MEAN(test1,test2,test3);

Sum, Mean, Max and Min Function

The SUM function sums the numeric arguments. The arguments are separated by com-
mas.

Syntax: Sum(variable1, variable2, ...)

We use this in the data step to get a row sum. The MEAN function averages the numeric
arguments. The arguments are separated by commas.

Data day1.crime_1;
Set day1.crime;
Total_crime=sum(var1,var2,var3..);
Avg_crime=mean(var1,var2,var3..);
Max_crime=max(var1,var2,var3..);
Min_crime=min(var1,var2,var3..);
Run;

These functions are used in the data step so that we can get the total, average,
maximum and minimum crime for each state. Functions used in the data step work across
the variables within each row (observation), not down a column.
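The row-wise behaviour can be seen in a small sketch such as the following. The data set work.scores and its variables are hypothetical. The OF keyword is used to pass a numbered range of variables to each function, and the second row contains a missing value to show that these functions skip missing arguments while the + operator does not.

```sas
data work.scores;
   input name $ test1 test2 test3;
   total_fn   = sum(of test1-test3);     /* ignores missing values */
   total_plus = test1 + test2 + test3;   /* missing if any score is missing */
   average    = mean(of test1-test3);    /* mean of the non-missing scores */
   highest    = max(of test1-test3);
   lowest     = min(of test1-test3);
   datalines;
Asha 70 80 90
Ravi 60 . 75
;
run;

proc print data=work.scores;
run;
```

Each output row carries its own total, average, maximum and minimum, which is what "row sum" means in the text above.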

Input Function

The INPUT function performs character-to-numeric conversion. It is also useful for
converting character values such as dates into true SAS numeric date values.

Obviously, if a variable contains non-numeric information (e.g. names) then it should
be saved as a SAS character variable. If the variable contains real numeric data
which will be used in numeric calculations, such as weight or height, then it should be
stored in a numeric variable. If a variable contains integer data which will not
necessarily be used in any calculations, such as ID number, it is preferable to save it
as a variable of type numeric rather than a variable of type character, even if you have
no intention of performing algebraic calculations using the variable. For a nominal
variable, such as gender, it is preferable to store this as a numeric variable with an
appropriate format rather than a character variable with, for example, values 'M' and 'F'.

Numeric data are sometimes imported into variables of type character and it may be
desirable to convert these to variables of type numeric. Note that it is not possible to
directly change the type of a variable. It is only possible to write the variable to a new
variable containing the same data, although with a different type.

The INPUT function enables you to convert the value of source by using a specified
informat. The informat determines whether the result is numeric or character. Use INPUT
to convert character values to numeric values or other character values. If the INPUT
function returns a character value to a variable that has not yet been assigned a length,
by default the variable length is determined by the width of the informat.

Syntax: INPUT (source,informat.)

Arguments: source specifies a character constant, variable, or expression to which you
want to apply a specific informat. informat. is the SAS informat that you want to apply
to the source. This argument must be the name of an informat followed by a period,
and cannot be a character constant, variable, or expression.

data day1.sales2;
set day1.sales1;
sales_amount1=input(sales_amount,comma12.);
format sales_amount1 comma12.;
run;
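The same conversion can be tried without the day1 library. The sketch below uses hypothetical character values and shows both cases mentioned above: a number stored with commas and a date stored as text. In each case the result is written to a new variable, because the type of an existing variable cannot be changed.

```sas
data _null_;
   amount_chr = '1,234,567';                   /* hypothetical character value */
   date_chr   = '15JAN2012';                   /* hypothetical character date */
   amount_num = input(amount_chr, comma12.);   /* numeric value without commas */
   date_num   = input(date_chr, date9.);       /* numeric SAS date value */
   put amount_num= date_num= date9.;           /* date shown with the DATE9. format */
run;
```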

Put Function

The Put function converts a numeric or character input to a character output, applying
a format as part of the conversion. The Put function can be applied to a character or
numeric input, but the output of the Put function is always character.

data day1.sales4;
set day1.sales1;
cust_id=put(customer_id,3.);
format cust_id $3.;/* since cust_id is character variable */
run;

This is an example of how to change a numeric variable, customer_id, to a character
variable, cust_id. The example uses the PUT function to convert numeric data to
character data. The PUT function writes values with a specified format. It takes two
arguments: the name of the variable and a SAS format or user-defined format for writing
the data.
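The effect of the format on the resulting character value can be sketched as follows. The value 42 and the variable names are assumptions for illustration; the w. and Zw. formats are standard SAS formats.

```sas
data _null_;
   customer_id = 42;                              /* hypothetical numeric ID */
   id_plain  = put(customer_id, 3.);              /* right-aligned in 3 columns */
   id_padded = put(customer_id, z5.);             /* zero-padded to 5 columns: 00042 */
   id_left   = left(put(customer_id, 3.));        /* same digits, left-aligned */
   put id_plain= id_padded= id_left=;
run;
```

All three new variables are character, whatever format is used.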



Chapter 6
Proc Report and Combining Data Sets

PROC REPORT combines many features of the PRINT and MEANS procedures. Its main strength
is that it lets you format each heading, summary line and detail line. PROC REPORT allows
you to format summary lines by color and font, add custom messages, include summary
values of variables, and so on. The BY statement is used in much the same way as in
PROC PRINT. In the windowing environment, the output of PROC REPORT is shown in a
separate REPORT window and not in the regular output window. The statements BREAK,
COLUMN, COMPUTE and DEFINE are the ones used most often with PROC REPORT.

Syntax Of PROC REPORT:

PROC REPORT DATA= datasetname <options>;
COLUMN variable list and column specifications;
DEFINE column / column usage and attributes;
COMPUTE column; compute block statements; ENDCOMP;
RUN;

The COLUMN statement is used to identify all variables used in the generation of the
table. This statement is followed by the DEFINE statement which specifies how the col-
umn is to be used and what its attributes are to be. One DEFINE statement is used for
each variable in the COLUMN statement. The COMPUTE statement is used to start the
definition of a compute block. The compute block is terminated with an ENDCOMP statement.
The compute block has a variety of uses including the creation of new columns and
performance of column specific operations.

The following PROC step shows the code to create a simple REPORT table.

proc report data=day1.clinics ;
columns region lname fname wt;
define region / display;
define lname / display;
define wt / display;
run;

You can see that this report resembles the output from a PROC PRINT; however, there
are several distinct differences. These include:
there is no OBS column
variable labels are used instead of column names
it is possible to calculate summary statistics and new columns with PROC REPORT



USING THE COLUMN STATEMENT

The COLUMN statement is used not only to identify the variables of interest, but to also
add headers and to group variables. The primary function of the COLUMN statement is
to provide a list of variables for REPORT to operate against, and these variables are
listed in the order (left to right) that they are to appear on the report. In addition to list-
ing the variables, you can do a number of other things on the COLUMN statement as
well. For instance you can use a comma to attach a statistic to a variable. The de-
fault statistic is the SUM.

proc report data=day1.clinics ;
column region lname fname wt,mean;
define region / display;
title1 'Using Proc REPORT';
title2 'Using the Comma';
run;

In this example the COLUMN statement selects the variables for the report. If all the
selected variables are numeric, the output is a summary report, because SUM is the
default statistic. If the list is a mix of character and numeric variables, the output
is a listing report. Even when all the variables are numeric, we can still generate a
listing report by using the DISPLAY keyword for any one of them, the way we used it for
region in the above code.

Usage of Variables in a Report

Much of a report's layout is determined by the usages that you specify for variables in
the DEFINE statements or DEFINITION windows. For data set variables, these usages are:
DISPLAY, ORDER, ACROSS, GROUP, ANALYSIS. A report can contain variables that are not
in the input data set. These variables must have a usage of COMPUTED.

Display Variables

A report that contains one or more display variables has a row for every observation in
the input data set. Display variables do not affect the order of the rows in the report. If
no order variables appear to the left of a display variable, then the order of the rows in
the report reflects the order of the observations in the data set. By default, PROC
REPORT treats all character variables as display variables.

Order Variables

A report that contains one or more order variables has a row for every observation in
the input data set. If no display variable appears to the left of an order variable, then
PROC REPORT orders the detail rows according to the ascending, formatted values of
the order variable. You can change the default order with ORDER= and DESCENDING
in the DEFINE statement.

Across Variables

PROC REPORT creates a column for each value of an across variable. PROC REPORT
orders the columns by the ascending, formatted values of the across variable.

Group Variables

If a report contains one or more group variables, then PROC REPORT tries to consoli-
date into one row all observations from the data set that have a unique combination
of formatted values for all group variables. When PROC REPORT creates groups, it or-
ders the detail rows by the ascending, formatted values of the group variable.

Analysis Variables

An analysis variable is a numeric variable that is used to calculate a statistic for all the
observations represented by a cell of the report. You associate a statistic with an
analysis variable in the variable's definition or in the COLUMN statement. By default, PROC
REPORT uses numeric variables as analysis variables that are used to calculate the Sum
statistic.

Computed Variables

Computed variables are variables that you define for the report. They are not in the
input data set, and PROC REPORT does not add them to the input data set. However,
computed variables are included in an output data set if you create one. In the win-
dowing environment, you add a computed variable to a report from the COMPUTED
VAR window.

In the nonwindowing environment, you add a computed variable by
including the computed variable in the COLUMN statement.
defining the variable's usage as COMPUTED in the DEFINE statement.
computing the value of the variable in a compute block associated with the variable.

Example:

Proc Report data=sashelp.class;
columns sex name age height weight ;
define sex / group 'Gender' ;
run;

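In the above report the values of the variable sex do not repeat on every row, because
of the GROUP keyword. Since name is a display variable, the rows cannot be collapsed;
PROC REPORT writes a note to the log that groups are not created, and sex behaves like
an order variable. We have also added a label, 'Gender', to the variable sex by placing
a quoted string in the DEFINE statement. This is the way to add a label to a variable
in PROC REPORT.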



Example:

Proc Report data=sashelp.class;
columns sex name age height weight ;
define Age/order ;
run;

In this example we have arranged the output in ascending order by age. This is
possible because of the ORDER keyword.

Example:

Proc Report data=sashelp.class ;
columns age height weight ;
define height / analysis mean ;
define weight / analysis mean ;
run;

In this example we have changed the default sum function to mean function by the
use of analysis keyword. Now in the report we will get average height and average
weight.

Example:

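Proc Report data=sashelp.class ;
columns name sex age height weight ratio;
define height / analysis mean ;
define weight / analysis mean ;
define ratio / computed format=6.2;
compute ratio;
ratio = height.mean / weight.mean;
endcomp;
run;

In this example we compute a new variable, ratio. That is why it is listed in the
COLUMN statement, to the right of the variables it uses, and identified as COMPUTED in
the DEFINE statement. Inside a compute block an analysis variable is referenced as
variable.statistic, so height and weight are defined with the MEAN statistic to make
height.mean and weight.mean available. Ratio is computed row wise by dividing average
height by average weight. The compute block is not a loop: PROC REPORT executes it once
for each report row, and ENDCOMP marks the end of the block.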
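The usages described in this chapter can be combined in one report. The following is a sketch only, using the sashelp.class data set that the examples above already rely on. The column labels and the NOWD option (which asks PROC REPORT not to open the REPORT window) are choices made for this illustration. Because the only non-analysis column is the group variable sex, the rows collapse to one line per gender and the means are true group averages.

```sas
proc report data=sashelp.class nowd;
   column sex height weight ratio;
   define sex    / group 'Gender';
   define height / analysis mean format=6.1 'Avg Height';
   define weight / analysis mean format=6.1 'Avg Weight';
   define ratio  / computed format=6.2 'Height/Weight';
   compute ratio;
      /* analysis variables are referenced as variable.statistic */
      ratio = height.mean / weight.mean;
   endcomp;
run;
```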

COMBINING DATA SETS

Append

The APPEND procedure adds the observations from one SAS data set to the end of an-
other SAS data set. PROC APPEND does not process the observations in the first data
set. It adds the observations in the second data set directly to the end of the original
data set.
Appending Data Sets Using Data Step
It is a concatenation of two data sets which already exist.
The observations in each data set are stacked together according to the order
specified to form a new data set
Appends the observations from one data set to another data set

Syntax:

DATA output-SAS-data-set;
SET SAS-data-set-1 SAS-data-set-2;
RUN;

Where,
output-SAS-data-set names the data set to be created
SAS-data-set-1 and SAS-data-set-2 specify the data sets to be read
The observations of SAS-data-set-2 are placed after those of SAS-data-set-1 and the
result is written to output-SAS-data-set

Example:

Data combined;
Set A C;
Run;

Appending Data Sets Using Proc Step

Adding observations using the APPEND procedure

The base file gets appended with observations from the data file.
No new data set is created
Works only if the base file has all the variables in the data file; otherwise use the
FORCE option

Syntax:

Proc Append base = <SAS-data-set-1> data = <SAS-data-set-2> [force]; Run;

Where,
SAS-data-set-1 is the base data set that receives the observations, and SAS-data-set-2
is the data set to be read
SAS-data-set-2 gets appended to SAS-data-set-1
FORCE is an optional keyword, used when the base file is missing some variables that
are in the data file, to force SAS to append; variables that are not in the base file
are dropped

Example:

Proc Append base = A data = C;
Run;
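A self-contained sketch of the FORCE option follows. The data sets work.base_sales and work.new_sales and their variables are hypothetical. The data file has an extra variable, region, that the base file does not have, so the append is forced and region is not carried into the base data set.

```sas
data work.base_sales;
   input id amount;
   datalines;
1 100
2 200
;
run;

data work.new_sales;
   input id amount region $;
   datalines;
3 300 East
;
run;

/* region exists only in the DATA= data set, so FORCE is needed */
proc append base=work.base_sales data=work.new_sales force;
run;

proc print data=work.base_sales;
run;
```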



Merging

A merge combines observations from two or more SAS data sets based on the values
of specified common variables (one or more). It creates a new data set (the merged
data set). Merging is done in a data step with the statements

MERGE : to name the input data sets

BY : to name the common variable (s) to be used for matching

Prerequisites for a match-merge

input data sets must have a common variable

input data sets must be sorted by the common variable (s)

Syntax:

DATA output-SAS-data-set;

MERGE SAS-data-set-1 SAS-data-set-2;

BY <DESCENDING> variable(s);

RUN;

Where,
output-SAS-data-set names the data set to be created
SAS-data-set-1 and SAS-data-set-2 specify the data sets to be read
variable(s) in the BY statement specifies one or more variables whose values are
used to match observations
DESCENDING indicates that the input data sets are sorted in descending order by
the variable that is specified
If there is more than one variable in the BY statement, DESCENDING applies only
to the variable that immediately follows it
Each input data set in the MERGE statement must be sorted in order of the values
of the BY variable(s)
Each BY variable must have the same type in all data sets to be merged

Sorting of Data Set


Procedure sort can be used to sort the data sets in either ascending or descending order

Syntax:
Proc Sort Data = Data-Set-1 [out = Data-Set-2];
By [Descending] Variable1 [Variable2 ];
Run;



Here,
Data-Set-1 will be sorted in either ascending or descending order
If the OUT= option is specified, then Data-Set-1 is copied to Data-Set-2 and sorted
there, while the original data set (Data-Set-1) remains unsorted.
The BY statement sorts the data set according to the variables specified
The DESCENDING option sorts the data set in descending order by the variable that
immediately follows it.

Example:

During match-merging SAS sequentially checks each observation of each data
set to see whether the BY values match, then writes the combined observation to
the new data set.

data merged;
merge a b;
by num;
run;

Example: Sample Data Sets:

Clinic.Demog

proc sort data=clinic.demog;
by id;
run;

proc print data=clinic.demog;
run;

Clinic.Visit

proc sort data=clinic.visit;
by id;
run;

proc print data=clinic.visit;
run;

Example: Merging

data clinic.merged;
merge clinic.demog clinic.visit;
by id;
run;



Excluding Unmatched Observations

By default, DATA step match-merging combines all observations in all input data sets.
To exclude unmatched observations from output data set, use the IN= data set option
and the subsetting IF statement in DATA step. In this case, use the IN= data set option
to create and name a variable that indicates whether the data set contributed data
to the current observation; the subsetting IF statement to check the IN= values and to
write to the merged data set only those observations that appear in the data sets for
which IN= is specified.

Syntax: (IN=variable)
Where,
the IN= option, in parentheses, follows the data set name
variable names the variable to be created
Within the DATA step, the value of the variable is 1 if the data set contributed data
to the current observation. Otherwise, its value is 0.

Example:

To Match-merge the data sets Clinic.Demog and Clinic.Visit and select only observa-
tions that appear in both data sets :
Use IN= to create two temporary variables, indemog and invisit
The first IN= creates the temporary variable indemog, which is set to 1 when an ob-
servation from Clinic.Demog contributes to the current observation; otherwise, it is
set to 0
Likewise, the value of invisit depends on whether Clinic.Visit contributes to an ob-
servation or not
IF statement is used to select only observations that appear in both Clinic.Demog
and Clinic.Visit
If the condition is met, the new observation is written to Clinic.Merged. Otherwise,
the observation is deleted

data clinic.merged;
merge clinic.demog (in=indemog) clinic.visit (in=invisit);
by id;
if indemog=1 and invisit=1;
run;
proc print data=clinic.merged;
run;



Different Types Of Merge

Join               Condition            Description
Full Join          No condition         Includes all the observations from both the datasets
Right Inner Join   If Y = 1             Includes all the observations from right dataset
Left Inner Join    If X = 1             Includes all the observations from left dataset
Exact Inner Join   If X = 1 and Y = 1   Includes all the matching observations from both datasets
Outer Join         If X = 0 or Y = 0    Includes all the non matching observations from both datasets
Right Outer Join   If X = 0 and Y = 1   Includes all the non matching observations from right dataset
Left Outer Join    If X = 1 and Y = 0   Includes all the non matching observations from left dataset
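The conditions in the table can be tried out with a small sketch. It assumes that X and Y are the IN= variables of the left and right data sets, which the table does not state explicitly. The data sets work.left_ds and work.right_ds and their contents are hypothetical; both are entered in ascending id order, so no PROC SORT step is needed here.

```sas
data work.left_ds;
   input id name $;
   datalines;
1 Anil
2 Bina
3 Chen
;
run;

data work.right_ds;
   input id visits;
   datalines;
2 5
3 1
4 7
;
run;

/* one pass over both inputs, routing each observation by its IN= flags */
data work.both work.left_only work.right_only;
   merge work.left_ds (in=x) work.right_ds (in=y);
   by id;
   if x = 1 and y = 1 then output work.both;   /* exact inner join */
   else if x = 1 then output work.left_only;   /* X = 1 and Y = 0 */
   else output work.right_only;                /* X = 0 and Y = 1 */
run;
```

Replacing the IF/ELSE logic with a single subsetting IF, for example if x = 1;, gives the corresponding single-output variant from the table.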



Chapter 7
Infile Statement and Array

In any programming language, a link is needed between programs and files.

INFILE and FILE are the statements that SAS uses for
linking to raw files (which normally contain only data and no data dictionary)
INFILE is used to point to input files, FILE points to output files

INFILE/FILE work with other SAS statements to provide extensive data input and output
in the DATA step, such as:

FILENAME
DATALINES
PUT
INPUT

Syntax: INFILE file-specification <options> ;

Where, file-specification can take the form fileref to name a previously defined file ref-
erence or 'filename' to point to the actual name and location of the file and options
describes the input file's characteristics and specifies how it is to be read with the INFILE
statement.

Example:
FILENAME test 'c:\irs\personal\refund.dat';
INFILE test obs=100;

Here,
The INFILE statement is used along with the FILENAME statement;
test is the file reference that points to the file containing the data;
The OBS= option reads only the first 100 records from the file;
The INFILE statement can also specify the complete path of a file instead of using the
FILENAME statement;

Example: INFILE 'c:\irs\personal\refund.dat';

Input Statement:

Describes the fields of raw data to be read and placed into the SAS data set.

Specify the variable names and data types



Syntax: INPUT variable <$> startcol - endcol . . . ;

Where,
variable is the SAS variable name assigned to the field
($) identifies the variable type as character (if the variable is numeric, then $ is not
specified)
startcol represents the starting column for this variable
endcol represents the ending column for this variable.

Example:
The following code reads data from the file below.
filename exer 'c:\users\exer.dat';

data exercise ;
infile exer ;
input ID $ 1-4 Age 6-7 ActLevel $ 9-12 Sex $ 14 ;
run ;

Reading Column input or fixed field raw data files

It is the most common input style
Column input specifies actual column locations for values
In such files the values for each variable are in the same location in all records
When you use column input, the data must be:
Standard character or numeric values
In fixed fields

Syntax:

The complete sequence of statements for importing a raw data file from an external file
into SAS is: LIBNAME statement; FILENAME statement; DATA statement; INFILE statement;
INPUT statement; RUN statement;

Example:
libname libref 'SAS-data-library';
filename exercise 'c:\users\exer.dat';
data exer ;
infile exercise ;
input ID $ 1-4 Age 6-7 ActLevel $ 9-12 Sex $ 14 ;
Run ;

Here, the LIBNAME statement creates a library reference, the FILENAME statement
references an external file, the DATA statement names the SAS data set to be created,
the INFILE statement identifies the external file, and the INPUT statement describes the
data in the external file.
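When no external file is at hand, the same INPUT statement can be tested with in-stream data. The sketch below uses DATALINES instead of INFILE; the three records are made up for illustration and are laid out in the columns that the INPUT statement expects. The third record leaves the Age field blank, which column input reads as a missing value.

```sas
data work.exercise;
   input ID $ 1-4 Age 6-7 ActLevel $ 9-12 Sex $ 14;
   datalines;
2810 61 MOD  F
2804 38 HIGH F
2807    LOW  M
;
run;

proc print data=work.exercise;
run;
```

The data lines must start in column 1, because column input counts positions from the start of each record.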



Features of Column Input

It can be used to read character variable values that contain embedded blanks.

input Name $ 1-25;

No placeholder is required for missing data. A blank field is read as missing and does
not cause other fields to be read incorrectly.

input Item $ 1-13 IDnum $ 15-19 Instock 21-22 Backord 24-25;

Fields or parts of fields can be re-read.

input Item $ 1-13 IDnum $ 15-19 Supplier $ 15-16 InStock 21-22 BackOrd 24-25;

Fields do not have to be separated by blanks or other delimiters.

input Item $ 1-13 IDnum $ 14-18 InStock 19-20 BackOrd 21-22;

Standard and Nonstandard Numeric Data

Standard numeric data values can contain only
numbers
decimal points
numbers in scientific or E-notation (2.3E4, for example)
plus or minus signs



Nonstandard numeric data includes
values that contain special characters, such as percent signs (%), dollar signs ($),
and commas (,)
date and time values
data in fraction, integer binary, real binary, and hexadecimal forms

The file below contains personnel information for a technical writing department of a
small computer manufacturer. The fields contain values for each employee's last
name, first name, job title, and annual salary. The values for Salary contain commas.
The values for Salary are considered to be nonstandard numeric values. Column input
cannot be used to read these values.

Choosing an Input Style

Nonstandard data values require an input style that is more flexible than column
input
Formatted input can be used, which combines the features of column input with
the ability to read both standard and nonstandard data.

When raw data that is organized into fixed fields is to be read, use:
Column input to read standard data only
Formatted input to read both standard and nonstandard data.

Reading Formatted Input

INPUT Statement

Syntax:

INPUT <column pointer-control> variable informat.;

Where,
Column pointer-control positions the input pointer on a specified column
variable is the name of the variable that is being created
informat is the special instruction that specifies how SAS reads raw data.



Column Pointer Controls

The two column pointer controls are:
@n :- Moves the input pointer to a specific column number
+n :- Moves the input pointer forward to a column number that is relative to the
current position

@n Column Pointer Control

It moves the input pointer to a specific column number. The @ moves the pointer to
column n, which is the first column of the field that is being read.

The Syntax for Input using @n column pointer control is:

INPUT @n variable informat.;

Where,
variable is the name of the variable that is being created
informat is the special instruction that specifies how SAS reads raw data

Example:

input @9 FirstName $5. @1 LastName $7. @15 JobTitle 3. @19 Salary comma9. ;

Here,
The value for FirstName is read first, starting in column 9.
The lastname is read by taking the @ pointer to the 1st column.
The jobtitle and salary are read from column 15 and column 19 respectively.

The +n Pointer Control

It moves the input pointer forward to a column number that is relative to the current
position . It moves the pointer forward n columns.

The Syntax for Input using +n column pointer control is:

INPUT +n variable informat.;

Where,
variable is the name of the variable that is being created
informat is the special instruction that specifies how SAS reads raw data
In order to count correctly, it is important to understand where the column pointer
control is located after each data value is read



Example:
input LastName $7. +1 FirstName $5. +5 Salary comma9. @15 JobTitle 3.;

Here,
Because the values for LastName begin in column 1, a column pointer control is
not needed
After LastName is read, the pointer moves to column 8
To start reading FirstName, which begins in column 9, move the column pointer
control ahead 1 column with +1
After reading FirstName, the column pointer moves to column 14
Moved column pointer ahead 5 columns from column 14 to read Salary
@n column pointer control is used to return to column 15 to read jobtitle
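Both pointer-control styles can be checked against the same records. The sketch below is an illustration only: the two employee records are made up and laid out so that LastName starts in column 1, FirstName in column 9, JobTitle in column 15 and Salary in column 19, as the examples above assume. The data set names are hypothetical. PROC COMPARE is used at the end to check that the two INPUT statements read identical values.

```sas
/* absolute positioning with @n */
data work.staff_at;
   input @9 FirstName $5. @1 LastName $7. @15 JobTitle 3. @19 Salary comma9.;
   datalines;
EVANS   DONNY 112 29,996.63
HELMS   LISA  105 18,567.23
;
run;

/* relative positioning with +n, then @15 to go back for JobTitle */
data work.staff_plus;
   input LastName $7. +1 FirstName $5. +5 Salary comma9. @15 JobTitle 3.;
   datalines;
EVANS   DONNY 112 29,996.63
HELMS   LISA  105 18,567.23
;
run;

proc compare base=work.staff_at compare=work.staff_plus;
run;
```

The COMMA9. informat removes the commas, so Salary is stored as an ordinary numeric value in both data sets.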

Array

SAS arrays are useful when we wish to perform a similar operation on a set of variables,
e.g. array weight wt1-wt50;
do i=1 to 50;
if weight{i}=999 then weight{i}=.;
end;

A SAS array is nothing more than a collection of variables (of the same type), in which
each variable can be identified by referring to the array and, by means of an index, to
the location of the variable within the array.

SAS arrays are defined using the ARRAY statement, and are only valid within the data
step in which they are defined. The syntax for the array statement is:

ARRAY array-name {subscript} <$> <length> <array-elements> <(initial-values)>;

array-name must follow the naming rules for SAS variables.
{subscript} is the dimension (possibly multiple) of the array, and can be omitted or
specified as {*}, in which case SAS infers the dimension from the number of array
elements.
<array-elements> is the list of array elements (variables), which can be omitted if
the dimension is given, in which case SAS creates variables called array-name1
to array-namen, where n is the dimension of the array.
For example: array wt {50};
will cause the variables wt1-wt50 to be created.



A Simple Example

Suppose we have a data set with 10 variables, named x1, x2, ..., x10. Whenever any of
these variables has a value of 9, we wish to replace it with a missing value (.).

data new;
set old;
array x x1-x10;
do i=1 to dim(x);
if x{i} = 9 then x{i} = .;
end;
run;

Using dim(x) instead of a constant (10) eliminates the need to know the size of the ar-
ray. Special variable lists (like first -- last or x:) can be very useful when setting up an ar-
ray.

In the following example the array name is zero. The variable list includes x1, x2, x3, x4 and x5. This
array changes all missing values to zeros in the variables named in the array state-
ment.

data original;
input x1-x5;
cards;
9 8 7 8 .
8 7 6 . 9
. 9 7 6 .
;
Run;

Now we modify the data. Here we are assigning some values to the missing data.

data modified;
set original;
array zero x1-x5;
do over zero;
if zero=. then zero=0;
end;
Run;

Renaming Variables Using Array

array var a1-a20;
array varnew na1-na20;
do over var;
varnew=var;
if varnew=. then varnew=0;
end;

Interpretation

Here we copy the variables in array var, a1-a20, to the new variables na1-na20 in array
varnew, and then assign the new variables a value of 0 if the value was missing. If we
do not drop the original variables, a1-a20, we have preserved the raw data values within
our data set as well as created new variables that we can use for analysis with no
missing data.

Creating New Variable Using Multiple Array

array atemp Ltemp1-Ltemp5;
array htemp htemp1-htemp5;
array arange rng1-rng5;
do over atemp;
arange=htemp-atemp;
end;

In the above code, the order of the variables in each array is the same, 1 to 5.
We need to specify only one array name in the DO OVER loop, because each array contains
the same number of elements and the arrays share the same index.
We can write the same code with indexed arrays and an iterative DO loop. It will give us
the same results.

array atemp {5} Ltemp1-Ltemp5;
array htemp {5} htemp1-htemp5;
array range {5} rng1-rng5;
do i=1 to 5;
range[i]=htemp[i]-atemp[i];
end;
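The array fragments above need a surrounding DATA step to run. The following sketch supplies one with made-up data; the data set work.temps, the city names and the temperature values are assumptions for illustration. It uses DIM so that the loop bound follows the array size, and it drops the index variable so that it does not appear in the output data set.

```sas
data work.temps;
   input city $ ltemp1-ltemp5 htemp1-htemp5;
   array low  {5} ltemp1-ltemp5;     /* low temperatures */
   array high {5} htemp1-htemp5;     /* high temperatures */
   array rng  {5} rng1-rng5;         /* created by the ARRAY statement */
   do i = 1 to dim(low);
      rng{i} = high{i} - low{i};     /* missing if either value is missing */
   end;
   drop i;
   datalines;
Pune 18 19 21 22 20 30 31 33 35 32
Agra 10 12 15 . 14 22 25 28 30 27
;
run;

proc print data=work.temps;
run;
```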



Chapter 8
Introduction to SQL in SAS

Structured Query Language (SQL) is a standardized , widely used language that re-
trieves and updates data in relational tables and databases. A relation is a mathemat-
ical concept that is similar to the mathematical concept of a set. Relations are repre-
sented physically as two dimensional tables that are arranged in rows and columns.
The structured query language is now in the public domain and is part of many
vendors' products.

What is the SQL Procedure?

The SQL procedure is SAS's implementation of structured query language. PROC SQL is a
part of Base SAS software, and you can use it with any SAS data set (table). Often
PROC SQL can be an alternative to other SAS procedures or the data step. You can use
SAS language elements such as global statements, data set options, functions, informats
and formats with PROC SQL just as you can with other SAS procedures.

PROC SQL can
Generate reports
Generate summary statistics
Retrieve data from tables or views
Combine data from tables or views
Create tables, views and indexes
Update the data values in PROC SQL tables
Update and retrieve data from database management system tables

List of Keywords , operators and Function used in the session

The Select Statement

The SELECT statement is the primary tool of PROC SQL . You use it to identify, retrieve
and manipulate columns of data from a table. You can also use several optional
clauses within the SELECT statement to place restrictions on a query.

Proc Sql;
Select * from day1.uscitycoords;
Quit;

The select statement must contain a select clause and a from clause , both of which
are required in a PROC SQL query.



Selecting some columns from the data set

Proc sql;
Select city, state from day1.uscitycoords;
Quit;

You can eliminate the duplicate rows from the results by using the DISTINCT keyword in
the select clause. The following query uses the DISTINCT keyword to produce a
single row of output for each continent that is in the unitedstates table.

Proc Sql;
Select distinct continent from day1.unitedstates;
Quit;

Calculating Values

You can perform calculations with values that you retrieve from numeric columns. The
following example converts temperatures in the worldtemps table from Fahrenheit to
Celsius. By specifying a column alias, you can assign a new name to any column within a
PROC SQL query. The new name must follow the rules for SAS names.

Proc sql;
Select city, (avglow - 32)*5/9 as lowc format 4.1
From day1.worldtemps;
Quit;

Referring to a Calculated column by Alias

When you use a column alias to refer to a calculated value, you must use the CALCULATED
keyword with the alias to inform PROC SQL that the value is calculated within
the query. The following example uses two calculated values, LOWC and HIGHC, to
calculate a third value, RANGE:

Proc sql;
Select city,
(avghigh - 32)*5/9 as highc format 4.1,
(avglow - 32)*5/9 as lowc format 4.1,
(calculated highc - calculated lowc) as range format 4.1
From day1.worldtemps;
Quit;
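
The CALCULATED keyword is also needed when a calculated column is used in a WHERE
clause of the same query. The following is a minimal sketch of that case; the cutoff of
25 degrees Celsius is an arbitrary value chosen only for illustration.

```sas
proc sql;
   /* highc is computed in the SELECT clause; CALCULATED lets the  */
   /* WHERE clause reuse it. The cutoff of 25 is illustrative only. */
   select city,
          (avghigh - 32)*5/9 as highc format 4.1
      from day1.worldtemps
      where calculated highc gt 25;
quit;
```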

Assigning Values Conditionally

You can use conditional logic within a query by using a CASE expression to conditionally
assign a value. You can use a CASE expression anywhere that you can use a column
name. In this example, a CASE expression determines the climate zone for each
city based on the value in the latitude column in the worldcitycoords table. The query
also assigns an alias of climatezone to the value. You must close the CASE logic with the
END keyword.

Proc sql;
Select city, country, latitude,
Case
When latitude gt 67 then 'North Frigid'
When 67 ge latitude ge 23 then 'North Temperate'
When 23 gt latitude gt -23 then 'Torrid'
When -23 ge latitude ge -67 then 'South Temperate'
Else 'South Frigid'
End
As climatezone From day1.worldcitycoords;
Quit;

You can also construct a Case expression by using the case operand form , as in the
following example. This example selects states and assigns them to a region based on
the value of the continent column.

Proc sql;
Select name, continent,
Case continent
When 'North America' then 'Continental U.S.'
When 'Oceania' then 'Pacific Islands'
Else 'None'
End
As region from day1.unitedstates;
Quit;

Replacing Missing Values

The COALESCE function enables you to replace missing values in a column with a new
value that you specify. For every row that the query processes, the COALESCE function
checks each of its arguments until it finds a nonmissing value, then returns that value.
All arguments must have the same data type. The following query replaces missing values
in the highpoint column of the continents table with the words Not Available (this
assumes that highpoint is a character column), missing values of area with 0, and missing
values of depth with the mean depth.

Proc sql;
Select name,
coalesce(highpoint, 'Not Available') as highc,
coalesce(area, 0) as newarea,
coalesce(depth, mean(depth)) as newdepth
from day1.continents;
Quit;

The following CASE expression shows another way to perform the same replacement
of missing values. However, the COALESCE function requires fewer lines of code to obtain
the same results.

Proc sql;
Select name,
Case
When highpoint is missing then 'Not Available'
Else highpoint
end
as highpoint from day1.continents;
Quit;

Sorting Data

You can sort query results with an ORDER BY clause by specifying any of the columns in
the table, including unselected or calculated columns.
The following example selects countries and their populations from the countries table
and orders the results by population.

Proc sql;
Select name, population format comma15. From day1.countries
Order by population;
Quit;

When you use an ORDER BY clause, you change the order of the output but not
the order of the rows that are stored in the table. The PROC SQL default sort order is
ascending.

Sorting by Multiple Columns

You can sort by more than one column by specifying the column names, separated
by commas, in the ORDER BY clause. The following example sorts the countries table by
two columns, continent and name:

Proc sql;
Select name, Continent
From day1.countries
order by continent, Name;
Quit;

When you specify multiple columns in the ORDER BY clause, the first column determines
the primary row order of the results. Subsequent columns determine the order of rows
that have the same value for the primary sort. The following example sorts the features
table by feature type in descending order (DESC) and then by name in ascending order.

Proc sql;
Select name, type from day1.features
Order by type desc, name;
Quit;

Sorting by Calculated Column

You can sort by a calculated column by specifying its alias in the order by clause. The
following example calculates population density and then performs a sort on the cal-
culated density column:

Proc Sql;
Select name,
Population format comma15.,
area format comma8.,
population/area as density format comma15.
From day1.countries
Order by density desc;
Quit;

Sorting By Column Position

You can also sort by the position of a column in the SELECT clause. The following report
is sorted in ascending order of name (A to Z), because name is column 1 in the SELECT clause:

Proc sql;
Select name, population format comma10. From day1.countries
Order by 1;
Quit;

Sorting by Unselected Columns

You can sort query results by columns that are not included in the query. For Example,
the following query returns all rows in the Countries table and sorts them by population,
even though the population column is not included in the query.

Proc sql;
Select name, continent from day1.countries
Order by population desc;
Quit;

Sorting Columns that contain missing values

PROC SQL sorts nulls, or missing values, before character or numeric data. Therefore,
when you specify ascending order, missing values appear first in the query results. The
following example sorts the rows in the continents table by the highpoint column.

Proc sql;
Select name, highpoint
from day1.continents
order by highpoint;
Quit;
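
If missing values should appear last instead, one possible approach is to sort first on
a flag that marks missing rows. The sketch below is only an illustration; the column name
nohigh is made up for this example and is not part of the continents table.

```sas
proc sql;
   select name, highpoint,
          /* nohigh is 1 when highpoint is missing and 0 otherwise */
          case when highpoint is missing then 1 else 0 end as nohigh
      from day1.continents
      /* rows with a value (flag 0) sort ahead of missing rows (flag 1) */
      order by nohigh, highpoint;
quit;
```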

Using a simple Where Clause

The following example uses a where clause to find all countries that are in the conti-
nent of Europe and their populations.

Proc sql;
Select name, population format comma15. From day1.countries
Where continent='Europe';
Quit;

The following example subsets the unitedstates table by including only states with
populations greater than 5,000,000 people.

Proc sql;
Select name, population format comma15.
From day1.unitedstates
Where population gt 5000000
Order by population desc;
Quit;

Retrieving Rows that Satisfy Multiple Conditions

You can use logical, or Boolean, operators to construct a WHERE clause that contains
two or more expressions. The logical operators that you can use are AND, OR, and NOT.
The following example uses two expressions to include only countries that are in Africa
and that have a population greater than 100000 people.

Proc sql;
Select name, population format comma15.
From day1.countries
Where continent='Africa' And population gt 100000
Order by population desc;
Quit;
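
The OR and NOT operators can be combined with AND in the same way. The following sketch
uses parentheses to make the order of evaluation explicit; the choice of continents and
the population threshold are arbitrary values used only for illustration.

```sas
proc sql;
   select name, continent, population format comma15.
      from day1.countries
      /* keep countries in Africa or Asia whose population is */
      /* not below one million                                */
      where (continent = 'Africa' or continent = 'Asia')
            and not (population lt 1000000)
      order by population desc;
quit;
```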

Using other Conditional Operators

You can use many different conditional operators in a where clause.

Using IN Operator

The IN operator enables you to select rows whose column value matches any value in a
list that you supply. The following example uses the IN operator to include only the
mountains and waterfalls in the features table.

Proc sql;
Select name, type, height format comma10.
From day1.features
Where type in ('Mountain','Waterfall')
Order by height;
Quit;

Using the is Missing Operator

The IS MISSING operator enables you to identify rows that contain columns with missing
values.

Proc sql;
Select name, Highpoint
From day1.continents
Where highpoint is missing;
Quit;

The IS NULL operator is the same as, and interchangeable with, the IS MISSING operator.

Using the BETWEEN-AND Operator

To select rows based on a range of values, you can use the BETWEEN-AND operator. The
range includes both end points. This example selects cities that have latitudes between
12 degrees south (-12) and 50 degrees north (50).

Proc sql;
Select city, country, latitude
From day1.worldcitycoords
Where latitude between -12 and 50;
Quit;

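Using the LIKE Operator

The LIKE operator enables you to select rows based on pattern matching. In a pattern, a
percent sign (%) matches any number of characters and an underscore (_) matches exactly
one character. For example, the first of the following queries returns all countries in the
countries table whose names begin with the letter A and are any number of characters
long; the second returns countries whose names end with the letter a and are
seven characters long.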

Proc Sql;
Select name
From day1.countries
Where name like 'A%';
Quit;

Proc Sql;
Select name, population format comma15.
From day1.countries
Where name like '______a';
Quit;

Similarly, the following queries return rows where the capital starts with B, where the
country name ends with n, and where the country name starts with A and ends with n.

Proc Sql;
Select capital, population format comma15.
From day1.countries
Where capital like 'B%';
Quit;

Proc Sql;
Select name, population format comma15.
From day1.countries
Where name like '%n';
Quit;

Proc Sql;
Select name, population format comma15.
From day1.countries
Where name like 'A%n';
Quit;

The following query searches the countries table for country names that contain
the string ba.

Proc Sql;
Select name, population format comma15.
From day1.countries
Where name like '%ba%';
Quit;

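The following queries show how to use a WHERE clause to select a range of values.
Because a missing numeric value compares as smaller than any number, the first query
also returns rows in which depth is missing; the second query excludes them.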

Proc sql;
Select name, depth
From day1.continents
Where depth lt 500
Order by depth;
quit;

Proc sql;
Select name, depth
From day1.continents
Where depth lt 500 and depth is not missing
Order by depth;
Quit;

The second query returns only the depth values that are less than 500 and are not
missing.

Multiple value Subqueries

A multiple-value subquery can return more than one value from one column. It is
used in a WHERE or HAVING expression that contains the IN operator or a comparison
operator that is modified by ANY or ALL. This example displays the populations of
oil-producing countries. The subquery first returns all countries that are found in the
OILPROD table.

Proc sql;
Select name, population format comma15.
From day1.countries
Where name in (select country from day1.oilprod);
Quit;

If you use the NOT IN operator in this query, then the query result contains all the
countries that are not contained in the OILPROD table.

Proc sql;
Select name, population format comma15.
From day1.countries
Where name not in (select country from day1.oilprod);
Quit;
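
A comparison operator modified by ALL can be used in a similar way. The following is a
minimal sketch, assuming the countries table has the continent and population columns
used elsewhere in this chapter: it lists countries whose population is greater than
that of every African country.

```sas
proc sql;
   select name, population format comma15.
      from day1.countries
      /* the comparison must hold for every value the subquery returns */
      where population gt all
            (select population
                from day1.countries
                where continent = 'Africa');
quit;
```

Replacing ALL with ANY would return countries whose population is greater than that of
at least one African country.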

Combining Queries with Set Operators

Proc sql can combine the results of two or more queries in various ways by using the
following set operators.
Union: Produces all unique rows from both queries.
Except: Produces rows that are part of the first query only.
Intersect: Produces rows that are common to both query results.

The Union Operator

The UNION operator combines two query results. It produces all the unique rows that
result from both queries, i.e. it returns a row if it occurs in the first table, the second, or
both. UNION does not return duplicate rows. If a row occurs more than once, then only
one occurrence is returned.

Proc sql;
Select * from day1.table1
Union
Select * from day1.table2;
Quit;
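
When duplicate rows should be kept, the ALL keyword can be added to the set operator.
The following sketch uses the same two tables as the example above and returns every row
from both queries, including rows that occur more than once.

```sas
proc sql;
   select * from day1.table1
   /* ALL keeps duplicate rows instead of collapsing them into one */
   union all
   select * from day1.table2;
quit;
```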

Producing rows that are in only the first query result (except)

The EXCEPT operator returns rows that result from the first query but not from the se-
cond query. In this example, the row that contains the values 3 and three exists in the
first query (table 1) only and is returned by except.

Proc sql;
Select * from day1.table1
Except
Select * from day1.table2;
Quit;

Producing rows that belong to both query results (Intersect)

The Intersect operator returns rows from the first query that also occur in the second.

Proc sql;
Select * from day1.table1
Intersect
Select * from day1.table2;
Quit;

Inserting rows into a table

With the SET clause, you assign values to columns by name. The columns can appear
in any order in the SET clause. The following example creates a copy of the countries
table and then uses an INSERT statement with a SET clause to add one row to the copy.

Proc sql;
create table day1.newcountries as
Select * from day1.countries;

Insert into day1.newcountries
Set name='India',
Capital='New Delhi',
Population=100000,
area=5600,
Continent='Asia';

Select * from day1.newcountries;
Quit;
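
An INSERT statement can also use VALUES clauses, one per row, with the target columns
listed in parentheses after the table name. The sketch below assumes that
day1.newcountries has already been created by the previous example; the names and
numbers are placeholders invented for illustration, not real data.

```sas
proc sql;
   /* each VALUES clause adds one row; values follow the column list order */
   insert into day1.newcountries (name, capital, population, area, continent)
      values ('Country A', 'Capital A', 100000, 5600, 'Asia')
      values ('Country B', 'Capital B', 200000, 7800, 'Europe');

   select * from day1.newcountries;
quit;
```

Columns that are not named in the list are set to missing in the new rows.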

Updating tables

The UPDATE statement modifies a column's values in existing rows of a table or view. The
following PROC SQL code creates a copy of the countries table and then doubles the
population of every country whose name begins with the letter B:

Proc sql;
create table day1.newcountries1 as
Select * from day1.countries;

Update day1.newcountries1
Set population = population * 2
Where name like 'B%';

Select * from day1.newcountries1;
Quit;
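
When different rows need different changes, a CASE expression can be used inside the SET
clause so that a single UPDATE statement handles all of them. The sketch below assumes
that day1.newcountries1 exists from the previous example; the growth factors are
arbitrary illustrative values.

```sas
proc sql;
   update day1.newcountries1
      /* the multiplier depends on the continent; other rows keep */
      /* their value because they are multiplied by 1             */
      set population = population *
          case when continent = 'Asia'   then 1.05
               when continent = 'Africa' then 1.03
               else 1
          end;
quit;
```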

In the above example the population column is updated, but only for the countries whose
names start with B.

Deleting Rows

The DELETE statement deletes one or more rows in a table. The following DELETE statement
deletes the rows for countries whose names begin with the letter A from a copy of the
countries table.

Proc sql;
create table day1.newcountries2 as
Select * from day1.countries;

Delete from day1.newcountries2
Where name like 'A%';

Select * from day1.newcountries2;
Quit;

The DROP clause of the ALTER TABLE statement deletes columns from tables. The following
DROP clause deletes the UNDATE column from the countries table.

Proc Sql;
Alter table day1.countries
Drop undate;
Quit;
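
The ALTER TABLE statement can also add a column with the ADD clause. The following sketch
assumes that the day1.newcountries table from the insert example exists; the column name
density and its label are chosen only for illustration. The new column is created with
missing values and is then filled in by an UPDATE statement.

```sas
proc sql;
   /* add a numeric column; every row starts with a missing value */
   alter table day1.newcountries
      add density num label='Population Density' format=comma10.2;

   /* fill the new column from two existing columns */
   update day1.newcountries
      set density = population / area;
quit;
```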

SQL JOIN

The JOIN keyword is used in an SQL statement to query data from two or more tables,
based on a relationship between certain columns in these tables. Tables in a data-
base are often related to each other with keys. A primary key is a column (or a combi-
nation of columns) with a unique value for each row. Each primary key value must be
unique within the table. The purpose is to bind data together, across tables, without
repeating all of the data in every table.

Different SQL JOINs

Before we continue with examples, we will list the types of JOIN you can use, and the
differences between them.

INNER JOIN: Returns only the rows that have a match in both tables
LEFT JOIN: Returns all rows from the left table, even if there are no matches in the
right table
RIGHT JOIN: Returns all rows from the right table, even if there are no matches in the
left table
FULL JOIN: Returns all rows from both tables, whether or not they have a match in the
other table

INNER JOIN

An inner join returns only the subset of rows from the first table that matches rows from
the second table. You specify the columns that you want to be compared for matching
values in an ON clause (or in a WHERE clause when the tables are simply listed in the
FROM clause, separated by commas).

A table alias is a temporary, alternate name for a table. You specify a table alias in the
FROM clause. Table aliases are used in joins to qualify column names and can make a
query easier to read by abbreviating table names. The following example compares
the oil production of countries to their oil reserves by joining the oilprod and oilrsvrs
tables on their country column. Because the country column is common to both tables,
it is qualified with the table aliases. You could also qualify the columns by
prefixing the column names with the table names.

Proc sql;
Select * from day1.oilprod as p
inner join
day1.oilrsvrs as r
On p.country= r.country;
Quit;
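
The same inner join can be written without the INNER JOIN and ON keywords by listing both
tables in the FROM clause and putting the matching condition in a WHERE clause. The
following sketch should return the same rows as the query above.

```sas
proc sql;
   select *
      /* tables are separated by a comma; aliases p and r qualify country */
      from day1.oilprod as p, day1.oilrsvrs as r
      where p.country = r.country;
quit;
```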

Left Join

A left join lists all the rows from the left-hand table (the first table listed in the FROM
clause), whether or not they have a match in the right-hand table. A left join is specified
with the keywords LEFT JOIN and ON.

For example, to list the coordinates of the world's capital cities, join the countries
table, which contains capitals, with the worldcitycoords table, which contains cities'
coordinates, by using a left join. The left join lists all capitals, regardless of whether
the cities exist in worldcitycoords. Using an inner join would list only capital cities for
which there is a matching city in worldcitycoords.

Proc sql;
Title 'Coordinates Of Capital Cities';
Select Capital, Name, Latitude, longitude
From day1.countries as a
left join
day1.worldcitycoords as b
On a.capital=b.city and A.name=b.country;
Quit;
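
A left join is also a convenient way to find the rows that have no match. The sketch below
reuses the join above and keeps only the capitals for which no row was found in
worldcitycoords: for those rows the columns that come from table b are missing.

```sas
proc sql;
   select a.capital, a.name
      from day1.countries as a
           left join
           day1.worldcitycoords as b
           on a.capital = b.city and a.name = b.country
      /* unmatched rows carry missing values in the columns from b */
      where b.city is missing;
quit;
```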

Right Join

A right join is the opposite of a left join. The result of a right outer join contains the
matched rows from both tables and also preserves all unmatched rows from the
right table. This example reverses the join of the last example: it uses a right join to
select all the cities from the worldcitycoords table and displays the population only if the
city is the capital of a country (that is, if the city exists in the countries table).

Proc sql;
Select city, country, population
From day1.countries as a
right join
day1.worldcitycoords as b
On a.capital=b.city and A.name=b.country;
Quit;

Full Join

A full outer join, specified with the keywords FULL JOIN and ON , selects all matching
and nonmatching rows. This example displays the matching and non matching rows
from the city and capital columns of worldcitycoords and countries.

Proc sql;
Select city, country, population
From day1.countries as a
full join
day1.worldcitycoords as b
On a.capital=b.city and a.name=b.country;
Quit;
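
In a full join, the key columns of either table can be missing for unmatched rows. A common
way to get one filled-in key column is to wrap the two key columns in COALESCE. The
following is a sketch only; the aliases cityname and countryname are made up for this
example.

```sas
proc sql;
   /* take the key from countries when present, else from worldcitycoords */
   select coalesce(a.capital, b.city) as cityname,
          coalesce(a.name, b.country) as countryname,
          a.population format comma15.,
          b.latitude, b.longitude
      from day1.countries as a
           full join
           day1.worldcitycoords as b
           on a.capital = b.city and a.name = b.country;
quit;
```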


Chapter 9
Introduction to SAS Macros

SAS Macro Language is a tool for extending and customizing the SAS system and for
reducing the amount of text one must enter to do common tasks. SAS Macro facility is
a tool for text substitution. We associate a macro reference with text. When the mac-
ro processor encounters that reference, it replaces the reference with the associated
text. This text can be as simple as text strings or as complex as SAS language state-
ments. SAS macro facility is a component of BASE SAS.

There are two main components of the SAS Macro facility: SAS macro variables and SAS
macro programs.

Advantages of using SAS Macro Facility

By using SAS Macro facility the program can become reusable, shorter and easier
to follow.
Repetitive works can be accomplished easily and quickly.

The macro language statements start with a percent sign (%) and macro variable ref-
erences start with an ampersand (&).

SAS Macro Variables

There are two types of SAS macro variables.


User defined Macro variables
Automatic Macro variables

User Defined Macro Variables

Macro variables are defined with the %LET statement. The %LET statement contains an
assignment where the macro variable is assigned a value. The macro variable is referred
to by placing the & symbol before its name. The reference must be outside quotation marks
or inside double quotation marks in order to be resolved.

%let city = Kolkata;
Title "Data for &city";

To display a list of User defined macro variables in the log:


%put _user_;
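
When text follows a macro variable reference directly, a period marks the end of the macro
variable name. The short sketch below illustrates this; the macro variable names lib and
tab are chosen only for this example.

```sas
%let lib = day1;
%let tab = countries;

/* The first period ends the name lib; the second period is the */
/* ordinary separator between libref and table name.            */
proc print data=&lib..&tab;
run;

/* Without the period SAS would look for a macro variable named */
/* cityreport. With it, this writes Kolkatareport to the log.   */
%let city = Kolkata;
%put &city.report;
```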

Automatic or System Defined Macro Variables

The SAS System defines many macro variables for its own operations. These are termed
system-defined (automatic) macro variables. They are created when the SAS session
starts. To display a list of system-defined macro variables in the log:

%put _automatic_;

Macro Variable Resolution and the Use of Single and Double Quotation Marks

A macro variable reference is resolved when it is enclosed in double quotation marks, but
not when it is enclosed in single quotation marks.

%let name = Ramesh;
%put "&name is a good boy.";

Output in the log - "Ramesh is a good boy."

%put '&name is a good boy.';

Output in the log - '&name is a good boy.'

Displaying Macro Variable Values

Two ways to display macro variable values are with the macro language statement %PUT
and with the SAS system option SYMBOLGEN. Both of these features write the values
of macro variables to the SAS log.

The %PUT statement instructs the macro processor to write information to the SAS log.
The %PUT statement can be submitted by itself from the windowing environment Editor
or from within a SAS program. Since %PUT is a macro language statement, it does not
need to be part of a DATA step or PROC step; the macro processor executes it wherever
it appears. A %PUT statement displays only text and information about macro variables.

With SYMBOLGEN enabled, SAS presents the results of the resolution of macro variables
in the SAS log. SYMBOLGEN displays the value of a macro variable in the SAS log near
the statement with the macro variable reference.

SYMBOLGEN shows the values of both automatic and user-defined macro variables.
The SYMBOLGEN option helps you debug your programs. If we are getting unexpected
results when using macro variables, enable this option and read the SAS log.

options symbolgen;

%let reptitle=Book Section;
%let repvar=section;

title "Frequencies by &reptitle as of &sysday";
proc freq data=books.ytdsales;
tables &repvar;
run;

title "Means by &reptitle as of &sysday";
proc means data=books.ytdsales;
class &repvar;
var saleprice;
run;

Scope of a Macro Variable

The scope of a macro variable is the region of a SAS program in which the variable
can be accessed. A SAS macro variable has one of two scopes.

Global macro variable
Local macro variable

A global macro variable exists for the duration of the SAS session. Global macro variables
can be used inside or outside macro programs. They can be created at any time during
the SAS session or job, and their values can be changed anywhere.

Syntax for creating a global macro variable (a %LET statement submitted outside any
macro program):

%let <variable_name> = <variable_value>;
%let city = Kolkata;

Local macro variables are created within a macro program, and they exist only while
that macro program is executing. They have no meaning outside that macro. They can be
accessed or modified only within the macro in which they were created.
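
The difference between the two scopes can be seen with a small experiment. The sketch
below is only an illustration; the macro name showscope and the macro variable region are
made up for this example. The %LOCAL statement explicitly declares a macro variable as
local to the macro program.

```sas
%let city = Kolkata;               /* created in open code: global */

%macro showscope;
   %local region;                  /* exists only while showscope runs */
   %let region = East;
   %put Inside the macro: city=&city region=&region;
%mend showscope;

%showscope

%put Outside the macro: city=&city;
%put _global_;                     /* lists city but not region */
```

A reference to &region after the macro has finished would not resolve, because the local
variable no longer exists at that point.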

Macro Program

Macro variables are only useful for simple text substitution. When a part of the SAS
code or the whole program needs to be repeated then we need to write them in a
macro program. Each macro program is assigned a name. When we reference a
macro program, the statements inside the macro program execute. The text that re-
sults from the execution is substituted into your SAS program at the location of the
macro program reference.

Macro programs use macro variables and macro language statements to generate
the text that builds your SAS programs. The SAS macro programming language has the
same type of statements as other programming languages. Many macro language
statements resemble their SAS language counterparts.

Several macro language statements can be used only inside macro programs. The
macro language statements that we have seen so far, %LET and %PUT, can be used
inside or outside macro programs.

Creating a Macro Program

%MACRO macro_name;
macro definition;
%MEND macro_name;

A macro program starts with a %MACRO statement and ends with a %MEND statement.
macro_name is the name assigned to the macro program. It must be a valid SAS
name (maximum 32 characters), and it must not be a reserved word in the macro facility.
The macro definition can include text strings, macro variables, functions, SAS programs
etc.
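
As a minimal sketch of this structure, the following macro program wraps a single PROC
PRINT step. The macro name listcountries is made up for this example, and the step assumes
the day1.countries table used in the previous chapter.

```sas
/* Definition: submitting this only compiles the macro, nothing is printed */
%macro listcountries;
   proc print data=day1.countries;
      var name population;
   run;
%mend listcountries;

/* Call: the macro processor now generates and submits the PROC PRINT step */
%listcountries
```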

To compile the macro program for later use in our SAS session, we have to submit the
macro program definition from the Editor or from within the SAS program that calls it.
The word scanner tokenizes the macro program and sends the tokens to the macro
processor for compilation.

When the macro processor compiles the macro language statements in the macro
program, it saves the results in a SAS catalog. By default, SAS stores macro programs in
a catalog in the WORK library called SASMACR. Macro programs can also be saved in
permanent catalogs and structures called autocall libraries.

A compiled macro program can be reused within the same SAS session. A macro pro-
gram has to be submitted only once in the SAS session. The compiled macro program
remains in the SASMACR catalog throughout the SAS session. When the SAS session
ends, SAS deletes the SASMACR catalog that contains the compiled macro program.

The entry in the catalog for the <macro program> is the compiled version of <macro
program>.

We can also list the entries in WORK.SASMACR catalog by submitting the following
PROC CATALOG step.

proc catalog c=work.sasmacr;
contents;
run;
quit;


Executing a Macro Program

A macro program is executed by submitting a reference to the macro program. To
execute a macro program, we have to submit the following statement from the Editor
or from within our SAS program.

%program

where program is the name assigned to the macro program.

A reference to a macro program that has been successfully compiled can be placed
anywhere in your SAS program except in data lines. This call to the macro program is
preceded by a percent sign (%). The percent sign tells the word scanner to direct pro-
cessing to the macro processor. The macro processor takes over and looks for the
compiled program in the WORK.SASMACR catalog of session compiled macro pro-
grams. If found, the macro processor directs execution of the compiled macro pro-
gram. If not found, an error message is written to the SAS log.

No semicolon follows the call to the macro program. The call to a macro program is
not a SAS statement. Indeed, using a semicolon to terminate the call to the macro
program might cause errors in the execution of your macro program.

Inserting Comments in Macros

/* This is one type of comment in SAS Macro */

%* This is another type of comment in SAS Macro;

Passing Values to a Macro Program through Macro Parameters

We can increase the reusability and flexibility of the macro program by using parameters.
With the help of parameters we don't need to change the macro program every
time.
Macro parameter names are specified on the %MACRO statement. The names as-
signed to the parameters must be the same as the names of the macro variables that
we want to reference inside the macro program. The initial values of the parameters
are specified on the call to the macro program. When the macro program starts, the
corresponding macro variables are initialized with the values of the parameters.

There are two types of macro program parameters: positional and keyword.

Specifying Positional Parameters in Macro Programs

%macro program(positional-1, positional-2, ..., positional-n);
macro program referencing the macro variables in the
positional parameter list
%mend <program>;

Positional parameters are enclosed in parentheses and are separated with commas.
There is no limit to the number of positional parameters that can be defined. However,
using too many positional parameters is not desirable, because the values must be
supplied in exactly the right order in every call.

When we call a macro program that uses positional parameters, we must specify the
same number of values in the macro program call as the number of parameters listed
on the %MACRO statement. Valid values include null values and text. If we want to as-
sign a positional parameter a null value and we want to assign values to subsequent
positional parameters, use a comma as a placeholder.

The general format of a call to a macro program that uses positional parameters is
%program(value-1, value-2, ..., value-n)

%macro print(var1,var2,var3,var4,var5);
proc print data=&var1;
id &var2;
var &var3 &var4;
sum &var5;
run;
%mend;
%print(day1.candy_sales_summary,prodid,subcategory,category,sale_amount);
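
The comma placeholder for a null value can be sketched with the same macro program. The
definition is repeated here so that the example stands on its own. In the call, the fourth
value is left empty, so &var4 resolves to nothing and the VAR statement lists only the
subcategory column.

```sas
%macro print(var1,var2,var3,var4,var5);
   proc print data=&var1;
      id &var2;
      var &var3 &var4;
      sum &var5;
   run;
%mend print;

/* two consecutive commas pass a null value for var4 */
%print(day1.candy_sales_summary,prodid,subcategory,,sale_amount)
```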

Defining a Macro Program with Keyword Parameters

A keyword parameter is specified with its name followed by an equal sign (=). Unlike
positional parameters, keyword parameters can be specified in any order in the
macro call.

%macro program(keyword-1=, keyword-2=, ..., keyword-n=);
macro program referencing the macro variables in the
keyword parameter list
%mend <program>;

%macro print(mydata=,var1=,var2=,var3=);
proc print data=&mydata;
id &var1;
var &var2 &var3;
sum &var3;
run;
%mend;
%print(var1=prodid,mydata=day1.candy_sales_summary,var3=sale_amount,var2=subcategory);
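
Keyword parameters can also be given default values on the %MACRO statement, so that a
call only has to name the parameters it wants to change. The sketch below is an
illustration under that assumption; the macro name printsum and the parameter names are
made up for this example.

```sas
/* values after the equal signs are defaults used when a call omits them */
%macro printsum(mydata=day1.candy_sales_summary, idvar=prodid,
                listvar=subcategory, sumvar=sale_amount);
   proc print data=&mydata;
      id &idvar;
      var &listvar &sumvar;
      sum &sumvar;
   run;
%mend printsum;

/* uses every default */
%printsum()

/* overrides one parameter and keeps the other defaults */
%printsum(listvar=category)
```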

Storing and Reusing Macro Programs

There are two ways to store macro programs in SAS:


the autocall facility and
the stored compiled macro facility

The autocall facility consists of external files or SOURCE entries in SAS catalogs that
contain the macro programs. When we specify certain SAS options, the macro processor
searches our autocall libraries when it is resolving a macro program reference.

The stored compiled macro facility consists of SAS catalogs that contain compiled
macro programs. When you specify certain SAS options, the macro processor searches
your catalogs of compiled macro programs when it is resolving a macro program ref-
erence.

Saving Macro Programs with the Autocall Facility

When we store a macro program in an autocall library, we do not have to submit the
macro program for compilation before referencing it. The macro processor does that
automatically if it finds the macro program in the autocall library. Several SAS
products ship with libraries of macro programs that we can reference, or that
are referenced by the SAS products themselves.

The main disadvantage of the autocall facility is that the macro program must be
compiled the first time it is used in a SAS session. This takes resources. Also, resources
are used to search the autocall libraries for the macro program reference.

After the macro processor finds the macro program in the autocall library, it submits the
macro program for compilation. If there are any macro language statements in open
code, these statements are executed immediately. The macro program is compiled
and stored in the session compiled macro program catalog, SASMACR. SASMACR is in
the WORK library.

The macro program can be reused within the SAS session. When it is, only the macro
program itself is executed. Any macro language statements in open code that might
have been stored with the macro program are not executed again. The compiled
macro program is deleted at the end of the session when the catalog
WORK.SASMACR is deleted. The code remains in the autocall library.

Creating an Autocall Library

The macro programs can be stored as external files or as SOURCE entries in SAS catalogs.

To store macro programs as external files in a directory-based system such as Windows
or UNIX, we first have to define the directory and add the macro programs to the directory.
Each macro program is stored in an individual file with a file type or extension of
SAS. The name given to the file must be the same as the macro program name.

When we store macro programs in a SAS catalog, each macro program must be stored in a
separate SOURCE entry. The name of the SOURCE entry must be the
same as the macro program name.

Making Autocall Libraries Available to the SAS Programs

When we want SAS to search for macro programs in autocall libraries, we must specify
the two SAS options, MAUTOSOURCE and SASAUTOS. These options can be specified in
three ways:
Add MAUTOSOURCE and SASAUTOS to the SAS command that starts the SAS ses-
sion.
Submit an OPTIONS statement with MAUTOSOURCE and SASAUTOS from within a
SAS program.
Submit an OPTIONS statement with MAUTOSOURCE and SASAUTOS from within an
interactive SAS session.

The MAUTOSOURCE option must be enabled to tell the macro processor to search au-
tocall libraries when resolving macro program references. By default, this option is ena-
bled. Specify NOMAUTOSOURCE to turn off this option. A reason someone might disa-
ble MAUTOSOURCE is to save computing resources when not using autocall libraries.

options mautosource;
options nomautosource;

The SASAUTOS= option identifies the location of the autocall libraries for the macro
processor. On the SASAUTOS= option, specify either the actual directory reference
enclosed in single quotation marks or the filerefs that point to the directories. A
FILENAME statement defines the fileref.

The syntax of the SASAUTOS= option follows. The first line shows how to specify one
library. The second line shows how to specify multiple libraries. The macro processor
searches the libraries in the order in which they are listed on the SASAUTOS= option.

options sasautos=library;
options sasautos=(library-1, library-2,..., library-n);

How to access inbuilt SAS Macro Programs

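The next statements define two filerefs under Windows XP with SAS 9 and assign them
to SASAUTOS=. The OPTIONS statement lists these two filerefs plus the SASAUTOS
fileref, which points to the autocall library supplied by SAS, so the inbuilt autocall
macro programs remain available.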

filename reports 'c:\mymacroprograms\repmacs';
filename graphs 'c:\mymacroprograms\graphmacs';
options sasautos=(reports graphs sasautos);

To specify the same libraries as above without using filerefs, submit the following
statement. Note the inclusion of the SASAUTOS fileref.

options sasautos=
('c:\mymacroprograms\repmacs' 'c:\mymacroprograms\graphmacs' sasautos);
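
The effect of keeping the SASAUTOS fileref in the list can be sketched as follows. The example assumes the two directories named above exist; %LOWCASE is one of the autocall macro programs that SAS supplies.

```sas
/* Assumes the two user directories exist on this machine */
options mautosource
        sasautos=('c:\mymacroprograms\repmacs' 'c:\mymacroprograms\graphmacs' sasautos);

/* %LOWCASE is a SAS-supplied autocall macro. It is found because */
/* the SASAUTOS fileref is still part of the search list.         */
%put %lowcase(BOOKSTORE REPORT);
```

If the SASAUTOS fileref were left out of the list, the macro processor would search only the two user directories and would not find %LOWCASE.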

An autocall library stored in a SAS catalog requires that you specify the CATALOG
access method on the FILENAME statement that identifies the autocall library. The
syntax of the FILENAME statement is

filename fileref catalog 'library.catalog';

The next statements reference a user-defined autocall library stored in a SAS catalog
under Windows XP with SAS 9. They also include the SASAUTOS fileref.

filename mymacs catalog 'books.repmacs';
options sasautos=(mymacs sasautos);

Saving Macro Programs with the Stored Compiled Macro Facility

Macro programs that you want to save and do not expect to modify can be compiled
and saved in SAS catalogs using the stored compiled macro facility. When a compiled
macro program is referenced in a SAS program, the macro processor skips the compiling
step, retrieves the compiled macro program, and executes the compiled code. The main
advantage of this facility is that it prevents repeated compiling of macro programs
that you use frequently.

A disadvantage of this facility is that the compiled versions of macro programs cannot
be moved to other operating systems. The macro source code must be saved and
recompiled under the new operating system. Further, if you are moving the compiled
macro programs to a different release of SAS under the same operating system, you
might also have to recompile the macro programs.

Macro source code is not stored by default with the compiled macro program. You
are responsible for maintaining a copy of the macro source code. A convenient place
to store the code is an autocall library. You can also save the source code as a
SOURCE entry in a catalog.

Another way of saving the macro program code for later retrieval is shown in a later
section, where the SOURCE option is added to the %MACRO statement when creating
a stored compiled macro program. This option stores the macro program code in the
same entry as the compiled code, and you can retrieve this code later with the
%COPY statement.

Setting SAS Options to Create Stored Compiled Macro Programs

We need to set two SAS options, MSTORED and SASMSTORE, before we can compile
and store the macro programs.
The MSTORED option tells SAS that we want to make stored compiled macro programs
available to our SAS session.
options mstored;
To turn off the MSTORED option, submit the following OPTIONS statement.
options nomstored;

The value that you assign to the SASMSTORE option is the libref that points to the
location of the SAS catalog containing the compiled macro programs. Here is an example
of SASMSTORE under Windows XP in SAS 9:

libname myapps 'c:\mymacroprograms';
options mstored sasmstore=myapps;

SAS stores compiled macro programs in a catalog called SASMACR. The SASMACR
catalog is stored in the directory specified by the SASMSTORE option. In this example,
that directory has the libref of MYAPPS. Do not rename the SASMACR catalog. Use the
CATALOG command or PROC CATALOG to view the list of macro programs stored in
this catalog.
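
A minimal sketch of the PROC CATALOG approach follows. It assumes the MYAPPS libref from the example above is already assigned and that at least one macro program has been stored, so that the SASMACR catalog exists.

```sas
/* Assumes libref MYAPPS is assigned and MYAPPS.SASMACR exists */
proc catalog catalog=myapps.sasmacr;
  contents;   /* lists each entry with its type and description */
run;
quit;
```

Stored compiled macro programs appear in the listing as entries of type MACRO.
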
You can also tell the macro processor to search SASMACR catalogs in multiple locations
for a stored compiled macro program by listing the multiple paths on the LIBNAME
statement. The following code tells the macro processor to look in the SASMACR catalog
in the three locations that are specified within the parentheses. The order in which
you list the paths is the order in which SAS searches for a stored compiled macro
program. If you have a macro program with the same name in two locations, the program
found in the first of the two paths is the one that executes.

libname myapps ('c:\mymacroprograms', 'z:\mymacroprograms', 'c:\legacy\macros');
options mstored sasmstore=myapps;

Creating Stored Compiled Macro Programs

Once the SAS options in the previous section are set, macro programs can be compiled
and stored in a catalog by adding options to the %MACRO statement. The syntax of
%MACRO when you want to compile and store a macro program follows:

%macro macro-name(parameters) / store <source secure des="description">;
macro-program-code
%mend macro-name;

The STORE keyword is required. The SOURCE, SECURE, and DES= options are not required.
The SOURCE option tells the macro processor to save a copy of the macro program's
source code along with the compiled macro program in the same SASMACR catalog. The
source code does not have a separate entry in the catalog and is instead stored in the
same MACRO entry as the compiled macro program.
Starting with SAS 9.2, you can use the SECURE option to encrypt the compiled macro
program and prevent someone from easily obtaining the source code. Without the
SECURE option, it is not easy, but it is possible to extract the code.

Use the DES= option to save up to 40 characters of text to describe your macro program.
SAS displays the descriptive text when you view the contents of the catalog that
holds the stored compiled macro programs.

Example: Creating a Stored Compiled Macro Program

An example of defining a macro program and storing it in a catalog under Windows
XP in SAS 9 follows:

libname myapps 'c:\mymacroprograms';
options mstored sasmstore=myapps;
%macro reptitle(repprog) / store des='Standard Report Titles';
title "Bookstore Report &repprog";
title2 "Processing Date: &sysdate SAS Version: &sysver";
%mend reptitle;
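
Once REPTITLE has been stored, a later SAS session can call it without resubmitting the macro definition. The following sketch assumes the catalog created above still exists in c:\mymacroprograms; the report name passed to the macro and the use of SASHELP.CLASS as sample data are illustrative choices only.

```sas
/* A later SAS session: point SAS at the same catalog, then call the macro */
libname myapps 'c:\mymacroprograms';
options mstored sasmstore=myapps;

%reptitle(Sales Summary)

proc print data=sashelp.class;   /* sample data, used only to show the titles */
run;
```

The macro processor finds REPTITLE in MYAPPS.SASMACR and executes the compiled code directly.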

Saving and Retrieving the Source Code of a Stored Compiled Macro Program

As mentioned earlier, the SOURCE option on the %MACRO statement in conjunction
with the STORE option saves a copy of the source code of the compiled macro program.
It is not saved as a separate entry that you can retrieve; it is embedded in the
same entry as the compiled code. To retrieve a copy of the code, use the %COPY
macro language statement. This statement can list the code in the SAS log or save the
code to a file. The syntax of the %COPY statement follows. The three options are
optional.

%COPY macro-program-name / <LIBRARY=libref> <OUTFILE=fileref | 'external-file'> <SOURCE>;

By default, if you do not specify a libref with the LIBRARY= option, the macro processor
will look in the library specified by the current setting of SASMSTORE.

Example: Saving the Source Code of a Stored Compiled Macro Program

The previous program is modified here to save the macro program code along with the
compiled macro program.

libname myapps 'c:\mymacroprograms';
options mstored sasmstore=myapps;
%macro reptitle(repprog) / store source
des='Standard Report Titles';
title "Bookstore Report &repprog";
title2 "Processing Date: &sysdate SAS Version: &sysver";
%mend reptitle;

If you want to view the code in the SAS log, submit the following statement:
%copy reptitle / library=myapps source;
If you want to save the code in a file called REPTITLE_SOURCE.SAS, submit the
following %COPY statement.
%copy reptitle / library=myapps source
outfile='c:\mymacroprograms\reptitle_source.sas';

SAS Macro Functions

The macro facility has its own set of functions.

Suppose you want to store the sum of 100 and 30 in a global macro variable X.

%Let X = 100 + 30;

To see the value of X, we run the following code:

%Put &X;

To your astonishment, you will find that the log shows 100 + 30, not 130. The macro
processor treats the value as text and does not perform the arithmetic.

Now let's redefine X.

%Let X = %EVAL(100+30);
%Put &X;

Now you get 130.

%EVAL evaluates an arithmetic expression using integer arithmetic, so the result is
always an integer. For a result with decimal places, we use %SYSEVALF instead of %EVAL.
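
The difference between the two functions is easiest to see with a division that does not come out even. This is a small sketch; the macro variable names A, B, and C are arbitrary.

```sas
%let a = %eval(7/2);             /* integer arithmetic: the fraction is dropped */
%let b = %sysevalf(7/2);         /* floating-point arithmetic                   */
%let c = %sysevalf(7/2, ceil);   /* optional conversion type rounds up          */
%put a=&a b=&b c=&c;
```

The log shows a=3 b=3.5 c=4.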

Creating Macro Variable in Data Step

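Data _null_;
Set Day1.Class End= Final;
If age > 13 then n+1;
/* The macro variable name is passed as a quoted string */
If final then CALL SYMPUT('number', strip(put(n, 8.)));
Run;
%PUT &number students have age greater than 13;

Here we created a macro variable number using the CALL routine SYMPUT. The value
that number stores is the final count of students whose age is greater than 13. The
count is converted to text with PUT and STRIP so that the macro variable holds no
leading blanks.

Simple syntax for CALL SYMPUT is:

CALL SYMPUT(macro variable name, value to be stored);

The macro variable name is given as a quoted string or as a character expression.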
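
A closely related routine, CALL SYMPUTX, removes leading and trailing blanks from the value automatically, which is convenient when the value is numeric. The sketch below is a variant of the step above; it reads SASHELP.CLASS, a sample data set shipped with SAS, rather than the DAY1 library, so that it can be run as is.

```sas
data _null_;
  set sashelp.class end=final;   /* sample data shipped with SAS */
  if age > 13 then n + 1;        /* sum statement: n starts at 0 and is retained */
  if final then call symputx('number', n);   /* name is quoted; value is trimmed */
run;
%put &number students have age greater than 13;
```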

Creating Macro Variable in Proc SQL

Proc SQL;
Select Name Into :X from SASHELP.CLASS
Having Height = Max(Height);
Quit;

%PUT The tallest student in the class is &X;
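
The INTO clause can also trim the stored value or collect several rows into a single macro variable. The following is a sketch under a few assumptions: the macro variable names and the 65-inch cutoff are hypothetical, and the TRIMMED keyword requires SAS 9.3 or later.

```sas
proc sql noprint;
  /* One value: TRIMMED removes leading and trailing blanks */
  select max(height) into :maxheight trimmed
    from sashelp.class;

  /* Several values collected into one macro variable */
  select name into :tallnames separated by ', '
    from sashelp.class
    where height > 65;
quit;

%put Maximum height: &maxheight;
%put Students taller than 65 inches: &tallnames;
```

The NOPRINT option suppresses the query output, which is usually what is wanted when the only purpose of the query is to create macro variables.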

Appendix
Suggested Books & Links

1. Delwiche, L.D., Slaughter, S.J., The Little SAS Book: A Primer, 4th Edition, SAS
Publishing

2. Bass, N.J., Lata, K.M., Base SAS Programming Black Book, Dreamtech Press

3. SAS Online Doc Version 8 PDF files, Worcester Polytechnic Institute,
http://www.math.wpi.edu/saspdf/common/mainpdf.htm

4. Burlew, M.M., SAS Macro Programming Made Easy, 2nd Edition, SAS Publishing

Contact Us:
Orangetree Business Solutions Private Limited
BD 36, Sector 1, SaltLake, Kolkata - 700 064

Call Us:
033 40041497
09051563222

Mail us:
info@orangetreeglobal.com

Visit us:
www.orangetreeglobal.com
