
Introduction to SAS

Outline
The SAS programming environment and language.
Introduction to the SAS Window Environment
The structure and components of SAS programs
SAS Data Sets
SAS libraries : temporary and permanent
Working with data sets in SAS

Getting Your Data into SAS

Modifying SAS Data Sets

Combining (Appending & Merging)

Selecting, Sorting

Printing
Using Basic Statistical Summary Procedures

Means Procedure

Freq Procedure

Plot Procedure

Univariate Procedure

SAS
(Statistical Analysis System)
SAS is a programming environment and language used to read, process,
and analyze data.
Introduction to the SAS Window Environment
The structure and components of SAS programs
SAS Data Sets
SAS libraries : Temporary and Permanent

The SAS Window Environment


[Figure: screenshot of the SAS window environment, with labels for the SAS Command Bar, Pull-down Menus, Toolbar, Explorer Window, Results, Log Window, Output, and Editor Window]

Window Environment
Editor Window: a text editor used to type in, edit, and submit (execute)
SAS programs, as well as to edit other text files such as raw data files.
Log Window: lists the program statements that are processed and gives notes,
warnings, and errors.
Output Window: shows the output from the program, if there are any
printable results.
Results Window: lists each part of your results in outline form.
Explorer Window: gives you easy access to your SAS files and libraries.

SAS Programs

SAS programs are used to access, manage, analyze, and present data.
A SAS program is a sequence of statements, written in the SAS language and
executed in order.
As with any language, there are a few rules to follow when writing SAS
statements.
File extension: .sas
A sample SAS program
DATA demo;
INPUT sale year;
DATALINES;
20 2001
34 2002
15 2003
21 2004
30 2005
;
PROC PRINT DATA= demo;
RUN;

SAS programs

Syntax Rules for SAS statements


Free format: SAS does not differentiate between upper and lower case
Statements usually begin with an identifying keyword
A statement can span multiple lines
Every statement ends with a semicolon
Multiple statements can be on the same line
A statement can start in any column
To add comments, use either style:
start with an asterisk (*) and end with a semicolon (;)
start with a slash asterisk (/*) and end with an asterisk slash (*/)
Possible Errors
Indicated in the Log window
Misspelled key words
Missing or invalid punctuation (missing semi-colon common)
Invalid options
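
The syntax rules above can be sketched in a short program. This is only an illustration: it reuses the demo data set from the sample program (shortened to two rows) and shows mixed case, two statements on one line, one statement spanning two lines, and both comment styles.

```sas
* Statement-style comment: it ends at the first semicolon;
/* Block-style comment:
   it can span several lines */
data demo; input sale year;   /* two statements on one line */
datalines;
20 2001
34 2002
;
PROC PRINT
   DATA=demo;                 /* one statement spanning two lines */
run;                          /* keywords may be upper or lower case */
```

If a semicolon is left out, the Log window is the place to look for the resulting error or warning.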

[Figure: flow of a SAS program. Raw Data -> Data Step (Read in Data; Process Data: create new variables; Output Data: create SAS data set) -> PROCs (Analyze Data Using Statistical Procedures)]

SAS programs
Two basic steps in SAS programs:
Data Steps
Read and modify data
Create new data
Begin with a DATA statement
Proc Steps
Perform a specific analysis or function
Produce results or a report
Begin with a PROC statement

Data bankacct;
infile records;
input Name $ 1-10 AccountType $ 12-20
Deposit 22-25;
run;
proc print data=bankacct;
run;
proc means data=bankacct;
var deposit;
run;

SAS Programs
The end of a DATA or PROC step is indicated by:
a RUN statement (most steps)
a QUIT statement (some steps)
the beginning of another step (a DATA or PROC statement)
Output generated from SAS program
SAS log
Information about the processing of the SAS program
Includes any warnings or error messages
Accumulated in the order the data and procedure steps are
submitted
SAS output
Reports generated by the SAS procedures
Accumulates output in the order it is generated

SAS Data Sets


Before you can run any analysis, write a report, or do anything else with your
data, you must read the data into SAS.
Before SAS can analyze your data, the data must be in the form of a SAS data set.
A SAS data set is laid out much like an Excel worksheet
Made up of rows and columns
Rows are called observations
Columns are called variables
An observation is all the information for one entity (employee, company,
country)
SAS processes data one observation at a time

SAS Data Sets


There are two types of variables
Character: includes letters, numbers, symbols, etc., e.g. Emp ID,
Color Code
Numeric: floating-point numbers, e.g. age, salary, temperature
Rules for SAS variable names
must be 32 characters or fewer in length.
must start with a letter or an underscore ( _ ).
can contain only letters, numerals, or underscores ( _ ).
can contain upper- and lowercase letters.
Each SAS data set is stored in a SAS data library.

SAS Data Libraries


A SAS library is simply a location where SAS data sets (as well as other
types of SAS files) are stored.

A library is identified by assigning it a library reference name (libref)

Depending on the library name that you use when you create a file, you
can store SAS files temporarily or permanently.
Temporary
Work library
SAS data files are deleted when session ends
Library reference name not necessary
Permanent
SASUSER library
SAS data sets are saved after session ends
You can create and access your own libraries
E.g.: LIBNAME test 'H:\lab class';

Referencing SAS Files


To reference a SAS file, you use a two-level name, libref.filename.
In the two-level name, libref is the name for the SAS library that contains
the file, and filename is the name of the file itself.
A period separates the libref and filename.
To reference temporary SAS files, you specify the default libref Work,
a period, and the filename, or simply use a one-level name (the
filename only).
Referencing a SAS file in any library other than Work indicates that
the SAS file is stored permanently.
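
A minimal sketch of one-level and two-level names, assuming the folder from the LIBNAME example exists and is writable. The data set name scores and its variables are hypothetical, chosen only for illustration.

```sas
/* Assumption: the folder below exists and is writable */
libname test 'H:\lab class';

/* Temporary data set: work.scores is the same as the one-level name scores */
data work.scores;
  input id score;
  datalines;
1 80
2 95
;
run;

/* Permanent data set: stored in the library that the libref test points to */
data test.scores;
  set scores;        /* one-level name, read from the Work library */
run;

proc print data=test.scores;
run;
```

When the session ends, work.scores is deleted, while test.scores stays in the folder assigned to test.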

Working with data sets in SAS


Getting data into SAS
Modifying SAS Data Sets
Selecting Variables /Observations
Sorting
Combining: Appending & Merging
Printing

Getting data into SAS


There are several methods for getting your data into SAS:
Importing existing data set
Using Import menu option
Using the PROC IMPORT
Entering raw data manually
Using Table Editor
Using the DATA steps

Getting data into SAS

Using the Import Data menu option (Import Wizard)

1. File > Import Data
2. Standard data source: select the file format
3. Specify the file location or Browse to select the file
4. Create a name for the new SAS data set and specify its location
(permanent or temporary)

Example: demo1.exl, demo1.txt

S. No.  Age  Gender  Income  Job Status
1       20   F       10000   L
2       55   M       25000   M
3       40   F       35000   H
4       .    M       18000   M
5       35   M       12000   L
6       24   M       16000   M
7       38   F       30000   H
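
The same import can also be done in code with PROC IMPORT instead of the wizard. The sketch below rests on assumptions that the slides do not state: that demo1.txt sits in the H:\lab class folder, is tab-delimited, and has the variable names in its first row. Adjust the path and DBMS= value to match the actual file.

```sas
/* Assumptions: hypothetical file location, tab-delimited text,
   variable names in the first row */
proc import datafile='H:\lab class\demo1.txt'
            out=work.demo1      /* name of the new (temporary) SAS data set */
            dbms=tab            /* file format: tab-delimited text */
            replace;            /* overwrite work.demo1 if it already exists */
  getnames=yes;                 /* read variable names from the first row */
run;

proc print data=work.demo1;
run;
```

Writing out=test.demo1 (with the libref test assigned) would store the imported data set permanently instead.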

Getting data into SAS


Manually entering raw data using the Table Editor

1. Tools > Table Editor
2. Enter data manually into the table
- Observations in each row
- Variables in each column
3. Right-click a column > Column Attributes
- Variable Name, Variable Label, Type (Character/Numeric),
Format, Informat
Note: Informats determine how raw data is read.
Formats determine how a variable is displayed.
4. Close the window > Save Changes > Yes
Specify the file name and directory

Getting data into SAS

Manually Entering Raw Data Files in SAS program using the DATA Step

A few examples
/*Reading raw data separated by spaces*/
data one;
input S_No Age Gender $ Income Job_Status $;
datalines;
1 20 F 10000 L
2 55 M 25000 M
3 40 F 35000 H
4 . M 18000 M
5 35 M 12000 L
6 24 M 16000 M
7 38 F 30000 H
;
proc print data = one;
title ' Problem 1';
Run;

/*Reading data arranged in columns */


data two;
input S_No 1 Age 2-3 Gender $ 4 Income 5-9 Job_Status $ 10 ;
datalines;
120F10000L
255M25000M
340F35000H
4. M18000M
535M12000L
624M16000M
738F30000H
;
proc print data = two;
title ' Problem 2';
run;

Getting data into SAS


/* Reading selected variables from your data */
data three;
input S_No 1 income 5-9;
datalines;
120F10000L
255M25000M
340F35000H
4. M18000M
535M12000L
624M16000M
738F30000H
;
proc print data = three;
title ' Problem 3';
run;

Getting data into SAS


/* Creating a permanent data set */

/* Assign the libref test before using it (skip if it is already assigned) */
LIBNAME test 'H:\lab class';


data test.one;
input S_No Age Gender $ Income Job_Status $;
datalines;
1 25 F 10000 L
2 55 M 25000 M
3 40 F 35000 H
4 . M 18000 M
5 35 M 12000 L
6 25 M 16000 M
7 40 F 30000 H
;
proc print data = test.one;
title ' Problem 4';
run;

Modifying SAS Data Sets


Create a new SAS data set using an existing SAS data set as input
Syntax:
DATA new_data;
SET old_data;
<additional SAS statements if any>;
RUN;
By default the SET statement reads all observations and variables from the
old data set into the output data set.
Example:
data new;
set test.one;
run;
proc print data=new;
Title 'original data';
run;

Modifying SAS Data Sets


Selecting Variables
Use DROP and KEEP to determine which variables are written to the new SAS data
set.
DROP and KEEP as statements
Syntax: DROP V1 V2;
KEEP V3 V4;
DROP= and KEEP= data set options in the SET statement
Syntax: SET old_data (KEEP=V1);
Syntax: SET old_data (DROP=V1);
Conditional Processing
Uses IF-THEN-ELSE logic
Syntax: IF <expression> THEN <statement>;
ELSE <statement>;

Modifying SAS Data Sets


/* keep variables s_no and age only in the new file*/
data new1; set test.one;
keep s_no age;
run;
proc print data = new1; run;
OR
data new1;
set test.one (keep= s_no age);
run;
/* Same file can be obtained by dropping the variables income, gender and
job_status*/
data new2; set test.one;
drop income gender job_status;
run;
proc print data = new2; run;

Modifying SAS Data Sets


/* Creating new data set where income is greater than 20,000*/
data new3;
set test.one;
if income>20000;
run;
proc print data = new3; run;
Creating new variables from existing variables
/* Recode age into a new variable */
data new;
set test.one;
/* Note: a missing age compares as less than 35, so it is coded 1 */
if age>35 then age_code=2;
else age_code=1;
drop age;
run;
proc print data = new;
title ' Problem 6';
run;

Modifying SAS Data Sets


Subsetting Rows (Observations)

Using IF statement
Only writes observations to the new data set in which an expression is true
General Form: IF <expression>;
Example: IF Job_Status = 'H';

Using WHERE option in SET statement
Only reads rows from the input data set in which the expression is true
General Form: SET input_data_set (where=(<expression>));
Example: SET test.one (where=(Job_Status='H'));

Comparison
The resulting output data set is the same
IF statement: all rows are read from the input data set
WHERE option: only rows where the expression is true are read from the input data set
The difference shows up in processing time when working with big data sets

Modifying SAS Data Sets


/* Subsetting the observations */
data new4;
set test.one;
if job_status='H';
run;
proc print data = new4; run;
OR
data new5;
set test.one (where=(Job_Status= 'H'));
run;
proc print data = new5; run;
/*Conditionally Deleting an Observation*/
data new6;
set test.one;
if job_status='H' then delete;
run;
proc print data=new6;
run;

Modifying SAS Data Sets


Sorting Data on particular variable(s)
General Form:
PROC SORT DATA=old_data <options>;
BY Variable1 Variable2;
RUN;
Sorts data according to Variable1 and then Variable2;
By default, SAS sorts data in ascending order
Use the DESCENDING keyword before a variable name in the BY statement for numbers high to low and letters Z to A
Example
data one; set test.one; run; /*To read data */
proc sort data=one out=two;
by income;
run;
proc print data=one; run;
proc print data=two; run;
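
A short sketch of a descending sort, using the same example data as the earlier slides. The output data set name sorted is hypothetical. Note that DESCENDING applies only to the variable that follows it.

```sas
data one;
  input S_No Age Gender $ Income Job_Status $;
  datalines;
1 20 F 10000 L
2 55 M 25000 M
3 40 F 35000 H
4 . M 18000 M
5 35 M 12000 L
6 24 M 16000 M
7 38 F 30000 H
;
run;

/* Gender ascending (F before M); within each gender, income high to low */
proc sort data=one out=sorted;
  by Gender descending Income;
run;

proc print data=sorted;
run;
```

Because OUT= is used, the original data set one keeps its order and the sorted copy is written to sorted.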

Modifying SAS Data Sets


Concatenating (or Appending)
Stacks each data set upon the other
If one data set does not have a variable that the other datasets do, the
variable in the new data set is set to missing for the observations from that
data set.
Form: DATA output_data_set;
SET data1 data2;
RUN;
PROC APPEND may also be used
Form: PROC APPEND BASE=old_data DATA=new_data;
RUN;

Concatenating (or Appending).


/*Reading second data set to be added into the old one*/
data two;
input S_No Age Gender $ Income Job_Status $;
datalines;
8 25 F 10000 L
9 50 M 35000 H
10 35 F 25000 M
;
data three;
set test.one two;
run;
proc print data = three;
run;
/*OR*/
/* using PROC APPEND: it adds the observations to the BASE= data set itself */
data one; set test.one; run; /*To read data */
proc append base =one data=two ;
run;
/* print the appended data set (one), not the unchanged test.one */
proc print data = one;
run;

Modifying SAS Data Sets

Merging Data Sets


Usually needs one common variable
A single record in a data set corresponds to a single record in all
other data sets
Form: DATA output_data_set;
MERGE input_data1 input_data2;
BY variable1;
RUN;
Each data set must be sorted by the BY variable before merging (PROC SORT)

Merging Data Sets

/* Reading the new data set containing the new variable to be added */
data two;
input S_No Name$;
datalines;
1 Leena
2 Ajay
3 Sunita
4 Gopal
5 Sachin
6 Tanay
7 Mamta
;
data three;
merge test.one two;
run;
proc print data = three;
run;

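/* Reading the new data set containing the new variable to be added;
here the observations are not in S_No order */
data two;
input S_No Name $;
datalines;
1 Leena
2 Ajay
4 Gopal
5 Sachin
7 Mamta
3 Sunita
6 Tanay
;
/* A match merge needs every input data set sorted by the BY variable */
proc sort data=two;
by S_No;
run;
data three;
merge test.one two;
by S_No ; /*match merge*/
run;
proc print data = three;
run;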

Using Basic Statistical Summary Procedures

Print Procedure
Means Procedure
Freq Procedure
Chart Procedure
Plot Procedure
Univariate Procedure

Proc Print

Print Procedure

PROC PRINT is used to print data to the output window


By default, prints all observations and variables in the SAS data set
Form:
PROC PRINT DATA=input_data <options>;
<SAS statements if any>;
RUN;
Some Options
input_data (obs=n): Specifies the number of observations to be printed
NOOBS: Suppresses printing of the observation number
LABEL: Prints variable labels instead of variable names

Optional SAS statements


SUM variable1 variable2 variable3;
Prints sum of listed variables at the bottom of the output
VAR variable1 variable2 variable3;
Prints only listed variables in the output

Proc Print
/* Printing selected variables
(use of var, where, sum, noobs, n, print selected observations, html output)*/
ods html file='try.html';
proc print data = test.one noobs n;
var S_No Age;
where age >30;
title ' Proc print usage';
sum income;
run;
ods html close;
proc print data = test.one(firstobs=2 obs=4) ;
var S_No Age income;
run;
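
The LABEL option listed earlier is not used in the example above, so here is a minimal sketch of it. The label texts are made up for illustration, and the data step repeats the example data so the block can run without the test library.

```sas
data one;
  input S_No Age Gender $ Income Job_Status $;
  datalines;
1 20 F 10000 L
2 55 M 25000 M
3 40 F 35000 H
4 . M 18000 M
5 35 M 12000 L
6 24 M 16000 M
7 38 F 30000 H
;
run;

/* LABEL prints the labels as column headings; NOOBS hides the Obs column */
proc print data=one label noobs;
  label S_No='Serial number'      /* hypothetical label texts */
        Job_Status='Job status';
  var S_No Age Job_Status;
run;
```

Variables without a label, such as Age here, are still printed under their variable names.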

Proc Means

Means Procedure

Proc Means is used to get the simple summary statistics of numeric variables
General Form:
PROC MEANS DATA=input_data_set options;
<SAS statements if any>;
RUN;

With no options or optional SAS statements, PROC MEANS prints the number of
non-missing values, mean, standard deviation, minimum, and maximum for all numeric variables.

Some of the options:


MAX, MIN, MEAN, MEDIAN, N (number of non-missing values), NMISS (number of
missing values), RANGE, STDDEV, SUM.

Example
proc means data=test.one;
class gender;
var income age;
run;
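
The statistics keywords listed above go on the PROC MEANS statement. A small sketch, again repeating the example data so it is self-contained; the choice of statistics is only an illustration.

```sas
data one;
  input S_No Age Gender $ Income Job_Status $;
  datalines;
1 20 F 10000 L
2 55 M 25000 M
3 40 F 35000 H
4 . M 18000 M
5 35 M 12000 L
6 24 M 16000 M
7 38 F 30000 H
;
run;

/* Request specific statistics instead of the default set */
proc means data=one n nmiss mean median min max;
  var income age;
run;
```

Since one age value is missing, N and NMISS for age will differ from those for income.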

Proc Freq

Freq Procedure

PROC FREQ is used to generate frequency tables (character variables)

The most common use is to create a table showing the distribution of categorical
variables
General Form:
PROC FREQ DATA=input_data_set;
TABLES variable-combinations;
RUN;
Use BY statement to get percentages within each category of a variable
Example
proc freq data =test.one;
tables gender gender *job_status;
run;
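
The BY statement mentioned above needs the data sorted by the BY variable first. A minimal sketch, with the example data repeated and a hypothetical data set name sorted:

```sas
data one;
  input S_No Age Gender $ Income Job_Status $;
  datalines;
1 20 F 10000 L
2 55 M 25000 M
3 40 F 35000 H
4 . M 18000 M
5 35 M 12000 L
6 24 M 16000 M
7 38 F 30000 H
;
run;

/* BY-group processing requires sorted input */
proc sort data=one out=sorted;
  by gender;
run;

/* One frequency table of job_status per gender,
   with percentages computed within each gender */
proc freq data=sorted;
  by gender;
  tables job_status;
run;
```

The cross-tabulation gender*job_status in the example above gives similar information in a single table, through its row percentages.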

Proc Chart
Used to create frequency bar chart
General form: PROC CHART DATA=input_data;
VBAR variable_list /options;
RUN;
/*Use of Proc Chart to create frequency bar chart for a character variable*/
proc chart data=test.one;
vbar job_status ;
run;

/*Use of Proc Chart to create frequency bar chart for numeric variable*/
ods graphics on;
ods html ;
proc chart data=test.one;
vbar income /discrete; /* other options could be midpoints=<values or range> or
levels=<number> */
run;
ods graphics off;
ods html close;

Proc Plot

Plot Procedure

Used to create basic scatter plots of the data


General Form: PROC PLOT DATA=input_data_set;
PLOT vertical_variable * horizontal_variable/<options>;
RUN;

By default, SAS uses letters to mark points on plots


A for a single observation, B for two observations at the same point, etc.

To specify a different character to represent a point


PLOT vertical_variable * horizontal_variable = '*';

To plot more than one variable on the vertical axis


PLOT vertical_variable1 * horizontal_variable = '1'
vertical_variable2 * horizontal_variable = '2' / OVERLAY;

Use PROC GPLOT or PROC SGPLOT for more sophisticated plots

Proc Plot
Example
* Create data for variables x and y;
data generate;
do x = 1 to 8;
y1 = x **2;
y2 = x**3;
output;
end;
proc print data = generate;
title 'generated data';
run;
/*A Simple Scatter Plot*/
proc plot data=generate;
plot y1*x;
run;
proc plot data=generate;
plot y1*x='1' y2*x='2'/overlay;
run;
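
For comparison, the same overlay can be sketched with PROC SGPLOT, which the slides mention for more sophisticated plots. This assumes a SAS installation where ODS Graphics procedures are available; the data step repeats the generated data so the block is self-contained.

```sas
/* Same generated data as in the PROC PLOT example */
data generate;
  do x = 1 to 8;
    y1 = x**2;
    y2 = x**3;
    output;
  end;
run;

/* Two SCATTER statements in one step overlay both series on the same axes */
proc sgplot data=generate;
  scatter x=x y=y1;
  scatter x=x y=y2;
run;
```

Unlike PROC PLOT, no OVERLAY option is needed here: each plot statement inside the same PROC SGPLOT step is drawn on the same graph.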

Proc Univariate

Univariate Procedure

PROC UNIVARIATE is used to examine the distribution of data


Produces summary statistics for a single variable
Includes mean, median, mode, standard deviation, skewness, kurtosis,
quantiles, etc.

General Form: PROC UNIVARIATE DATA=input_data_set <options>;
VAR variable1 variable2 variable3;
RUN;

If the VAR statement is not used, summary statistics are produced for all
numeric variables in the input data set.
Options include:
PLOT produces a stem-and-leaf plot, box plot, and normal probability plot

NORMAL produces tests of Normality

Example
proc univariate data = test.one;
var age;
run;
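
The PLOT and NORMAL options described above go on the PROC UNIVARIATE statement. A minimal sketch with the example data repeated; with only seven observations the normality tests are for illustration only.

```sas
data one;
  input S_No Age Gender $ Income Job_Status $;
  datalines;
1 20 F 10000 L
2 55 M 25000 M
3 40 F 35000 H
4 . M 18000 M
5 35 M 12000 L
6 24 M 16000 M
7 38 F 30000 H
;
run;

/* PLOT adds the descriptive plots; NORMAL adds tests of normality */
proc univariate data=one plot normal;
  var income;
run;
```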
