
How to start using SAS

The topics
An overview of the SAS system
Reading raw data/ create SAS data set
Combining SAS data sets & Match merging
SAS Data Sets
Formatting data
Introduction to some simple regression procedures
Summary report procedures

Basic Screen Navigation
Main:
Editor
contains the SAS program to be submitted.
Log
contains information about the processing of the SAS
program, including any warning and error messages
Output
contains reports generated by SAS procedures and
DATA steps
Side:
Explorer
navigate to other objects like libraries
Results
navigate your Output window
SAS programs
A SAS program is a sequence of steps that the user
submits for execution.

DATA steps are typically used to create SAS data sets.
PROC steps are typically used to process SAS data
sets (that is, generate reports and graphs, edit
data, sort data and analyze data).
SAS Data Libraries
A SAS data library is a collection of SAS files that are
recognized as a unit by SAS
A SAS data set is one type of SAS file stored in a data
library
The Work library is a temporary library: when SAS is closed, all
the data sets in the Work library are deleted. To create a
permanent SAS data set, store it in a library of your own.

SAS Data Libraries
Identify SAS data libraries by assigning each a library reference
name (libref) with LIBNAME statement

LIBNAME libref 'file-folder-location';
E.g.: LIBNAME readData 'C:\temp\sas class\readData';

Rules for naming a libref:
The name must be 8 characters or less
The name must begin with a letter or underscore
The remaining characters must be letters, numbers or
underscores.
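A minimal sketch of the difference between a temporary and a permanent data set, assuming the folder from the LIBNAME example above exists. The data set names scratch and saved are hypothetical.

```sas
/* Assumes this folder exists; it is the one used in the LIBNAME example. */
LIBNAME readData 'C:\temp\sas class\readData';

/* One-level name: stored in the temporary Work library (same as work.scratch). */
DATA scratch;
   x = 1;
RUN;

/* Two-level name: stored permanently in the readData library. */
DATA readData.saved;
   x = 1;
RUN;
```

After SAS is closed and reopened, only readData.saved should still be available, once the LIBNAME statement has been submitted again.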


Reading raw data set into SAS
system
In order to create a SAS data set from a raw
data file, you must
Start a DATA step and name the SAS data set
being created (DATA statement)
Identify the location of the raw data file to read
(INFILE statement)
Describe how to read the data fields from the raw
data file (INPUT statement)
Reading external raw data file into
SAS system
LIBNAME readData 'C:\temp\sas class\readData';
DATA readData.wa80;
INFILE 'k:\census\stf2_wa80.txt';
INPUT @10 SUMRYLVL $2. @40 COUNTY $3.
@253 TABA1 9.0 @271 TABA2 9.0;
RUN;
The LIBNAME statement assigns a libref readData to a data library.
The DATA statement creates a permanent SAS data set named wa80.
The INFILE statement points to a raw data file.
The INPUT statement
- names the SAS variables
- identifies the variables as character or numeric ($ indicates character data)
- specifies the locations of the fields in the raw data
- can be written as column, formatted, list, or named input
The RUN statement marks the end of a step.

Example 1
Reading raw data separated by spaces

/* Create a permanent SAS data set named HighLow1;
Read the data file temperature1.dat using list input */
DATA readData.HighLow1;
INFILE 'C:\sas class\readData\temperature1.dat';
INPUT City $ State $ NormalHigh NormalLow
RecordHigh RecordLow;
RUN;
/* The PROC PRINT step creates a listing report of the
readData.HighLow1 data set */
PROC PRINT DATA = readData.highlow1;
TITLE 'High and Low Temperatures for July';
RUN;
temperature1.dat:
Nome AK 55 44 88 29
Miami FL 90 75 97 65
Raleigh NC 88 68 105 50

Example 2
Reading multiple lines of raw data per observation

/* Read the data file using the line pointers slash (/) and pound-n (#n).
The slash (/) moves to the next line; #n moves to line n
of that observation. The slash (/) could be replaced by #2 here. */
DATA readData.highlow2;
INFILE 'C:\sas class\readData\temperature2.dat';
INPUT City $ State $
/ NormalHigh NormalLow
#3 RecordHigh RecordLow;
RUN;
PROC PRINT DATA = readData.highlow2;
TITLE 'High and Low Temperatures for July';
RUN;
temperature2.dat:
Nome AK
55 44
88 29
Miami FL
90 75
97 65
Raleigh NC
88 68
105 50

Example 3
Reading multiple observations per line of raw data


/* To read multiple observations per line of raw data, use double trailing
at signs (@@) at the end of the INPUT statement */
DATA readData.highlow3;
INFILE 'C:\sas class\readData\temperature3.dat';
INPUT City $ State $ NormalHigh NormalLow RecordHigh
RecordLow @@;
RUN;
PROC PRINT DATA = readData.highlow3;
TITLE 'High and Low Temperatures for July';
RUN;
temperature3.dat:
Nome AK 55 44 88 29 Miami FL 90 75 97 65 Raleigh NC 88
68 105 50

Reading external raw data file into
SAS system
Reading raw data arranged in columns (column input)
INPUT FILEID $ 1-5 RECTYP $ 6-9 SUMRYLVL $ 10-11
URBARURL $ 12-13 SMSACOM $ 14-15;
Reading raw data with mixed input styles
INPUT FILEID $ 1-5 @10 SUMRYLVL $2. @253 TABA1 9.0
@271 TABA2 9.0;
/* The @n is the column pointer, where n is the number of the column
SAS should move to. The $w. reads standard character data, and
w.d reads standard numeric data, where w is the total width and d
is the number of decimal places. */
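A small sketch of how the d in w.d behaves, using made-up in-stream data (the data set name work.widths and the variables Code and Amount are hypothetical). When the raw field contains no decimal point, SAS inserts one d places from the right; when the field already contains a decimal point, d is ignored. With d = 0, as in the 9.0 informats above, no scaling takes place.

```sas
/* Hypothetical sample: Code in columns 1-3, Amount in columns 5-10. */
DATA work.widths;
   INPUT @1 Code $3. @5 Amount 6.2;
   DATALINES;
ABC 123456
DEF 1234.5
GHI   1234
;
RUN;

/* Amount should come out as 1234.56, 1234.5 and 12.34. */
PROC PRINT DATA=work.widths;
RUN;
```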


Reading Delimited or PC Database
Files with the IMPORT Procedure
If your data file has the proper extension, use the simplest form of
the IMPORT procedure:
PROC IMPORT DATAFILE = 'filename' OUT = data-set;
Type of File                           Extension          DBMS Identifier
Comma-delimited                        .csv               CSV
Tab-delimited                          .txt               TAB
Excel                                  .xls               EXCEL
Lotus files                            .wk1, .wk3, .wk4   WK1, WK3, WK4
Delimiters other than commas or tabs                      DLM
Examples:
1. PROC IMPORT DATAFILE='c:\temp\sale.csv' OUT=readData.money; RUN;
2. PROC IMPORT DATAFILE='c:\temp\bands.xls' OUT=readData.music; RUN;
Reading Files with the IMPORT
Procedure
If your file does not have the proper extension, or it uses a
delimiter other than commas or tabs, then you must use the
DBMS= option and the DELIMITER= statement:
PROC IMPORT DATAFILE = 'filename' OUT = data-set
DBMS = identifier;
DELIMITER = 'delimiter-character';
RUN;
Example:
PROC IMPORT DATAFILE = 'C:\sas class\readData\import2.txt'
OUT = readData.sasfile DBMS = DLM;
DELIMITER = '&';
RUN;
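A self-contained sketch of the same idea, assuming PROC IMPORT is given a fileref instead of a quoted path. The fileref amp, the data set work.items and the file contents are made up for illustration; the REPLACE option and the GETNAMES statement are additions not shown on the slide.

```sas
/* Write a small ampersand-delimited file to a temporary location. */
FILENAME amp TEMP;

DATA _NULL_;
   FILE amp;
   PUT 'Name&Units&Price';
   PUT 'Widget&12&3.50';
   PUT 'Gadget&7&10.25';
RUN;

/* DBMS=DLM plus the DELIMITER statement tells SAS how the fields are split.
   REPLACE overwrites work.items if it already exists. */
PROC IMPORT DATAFILE=amp OUT=work.items DBMS=DLM REPLACE;
   DELIMITER='&';
   GETNAMES=YES;   /* take variable names from the first line */
RUN;

PROC PRINT DATA=work.items;
RUN;
```

The delimiter is written in single quotes so that the ampersand is not treated as a macro variable reference.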


Format in SAS data set
Standard Formats (selected):
Character: $w.
Date, Time and Datetime: DATEw., MMDDYYw., TIMEw.d
Numeric: COMMAw.d, DOLLARw.d
Use the FORMAT statement:
PROC PRINT DATA=sales;
VAR Name DateReturned CandyType Profit;
FORMAT DateReturned DATE9. Profit DOLLAR6.2;
RUN;

Format in SAS data set
Create your own custom formats in two steps:
Create the format using PROC FORMAT and a VALUE statement.
Assign the format to the variable using a FORMAT statement.
General form of a simple PROC FORMAT step:
PROC FORMAT;
VALUE name range-1='formatted-text-1'
range-2='formatted-text-2';
RUN;
The name in the VALUE statement is the name of the format you are
creating. It cannot be longer than eight characters (a limit from older
SAS releases; SAS 9 and later allow longer names) and must not start or
end with a number. If the format is for character data, the name must
start with a $.




Format in SAS data set
Example:
/* Step1: Create the format for certain variables */
PROC FORMAT;
VALUE genFmt 1 = 'Male'
2 = 'Female';
VALUE money
low-<25000='Less than 25,000'
25000-50000='25,000 to 50,000'
50000<-high='More than 50,000';
VALUE $codeFmt
'FLTA1'-'FLTA3'='Flight Attendant'
'PILOT1'-'PILOT3'='Pilot';
RUN;
/* Step2: Assign the formats to the variables */
DATA fmtData.crew1;
SET fmtData.crew;
FORMAT Gender genFmt. Salary money. JobCode $codeFmt.;
RUN;
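To check how range boundaries such as low-<25000 and 50000<-high are applied, a short sketch along these lines can help. It redefines the money format so that it runs on its own and writes the labels to the log with the PUT function; the test values and the variable name Band are chosen only for illustration.

```sas
PROC FORMAT;
   VALUE money
      low-<25000  = 'Less than 25,000'
      25000-50000 = '25,000 to 50,000'
      50000<-high = 'More than 50,000';
RUN;

/* -< excludes the upper endpoint, <- excludes the lower endpoint,
   so both 25000 and 50000 fall in the middle range. */
DATA _NULL_;
   DO Salary = 24999, 25000, 50000, 50001;
      Band = PUT(Salary, money.);
      PUT Salary= Band=;
   END;
RUN;
```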
Format in SAS data set
Permanently store formats in a SAS catalog by
Creating a format catalog with the LIB= option in the PROC
FORMAT statement
Setting the FMTSEARCH= system option so that SAS can find the catalog
Example:
LIBNAME fmtData 'C:\sas class\Format';
OPTIONS FMTSEARCH=(fmtData.fmtvalue);
PROC FORMAT LIB=fmtData.fmtvalue;
VALUE genFmt 1 = 'Male' 2 = 'Female';
RUN;
Combining SAS Data Sets:
Concatenating and Interleaving
Use the SET statement in a DATA step to
concatenate SAS data sets.
Use the SET and BY statements in a DATA
step to interleave SAS data sets.
Combining SAS Data Sets:
Concatenating and Interleaving
General form of a DATA step concatenation:
DATA SAS-data-set;
SET SAS-data-set1 SAS-data-set2 ;
RUN;
Example:
DATA stack.allEmp;
SET stack.emp1 stack.emp2 stack.emp3;
RUN;

Combining SAS Data Sets:
Concatenating and Interleaving
General form of a DATA step interleave:
DATA SAS-data-set;
SET SAS-data-set1 SAS-data-set2 ;
BY BY-variable;
RUN;
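Sort every input data set by the BY variable first, using PROC SORT.
Example:
/* emp2 is sorted here; emp1 and emp3 are assumed to be sorted by Salary already */
PROC SORT DATA=stack.emp2 OUT=stack.emp2_sorted; BY Salary;
RUN;
DATA stack.allEmp;
SET stack.emp1 stack.emp2_sorted stack.emp3;
BY Salary;
RUN;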
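A minimal, self-contained sketch of interleaving with made-up data in the Work library (the names and salaries are invented). With SET and BY, the output keeps the BY order across all inputs, whereas SET alone would simply stack emp1 on top of emp2.

```sas
/* work.emp1 is already in Salary order. */
DATA work.emp1;
   INPUT Name $ Salary;
   DATALINES;
Ann 30000
Bob 50000
;
RUN;

/* work.emp2 is not in Salary order, so it must be sorted first. */
DATA work.emp2;
   INPUT Name $ Salary;
   DATALINES;
Cy 60000
Dee 40000
;
RUN;

PROC SORT DATA=work.emp2;
   BY Salary;
RUN;

/* Interleave: observations come out as Ann, Dee, Bob, Cy. */
DATA work.allEmp;
   SET work.emp1 work.emp2;
   BY Salary;
RUN;

PROC PRINT DATA=work.allEmp;
RUN;
```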

Match-Merging SAS Data Sets
One-to-one match merge
One-to-many match merge
Many-to-many match merge
The SAS statements for all three types of match
merge are identical in the following form:
DATA new-data-set;
MERGE data-set-1 data-set-2 data-set-3 ;
BY by-variable(s); /* indicates the variable(s) that control
which observations to match */
RUN;



Merging SAS Data Sets: A More
Complex Example
Example: Merge two data sets to acquire the names of the group
members who are scheduled to fly next week.
combData.employee
EmpID   LastName
E00632  Strauss
E01483  Lee
E01996  Nick
E04064  Waschk
combData.groupsched
EmpID   FlightNum
E04064  5105
E00632  5250
E01996  5501
/* To match-merge the data sets by the common variable
EmpID, the data sets must be ordered by EmpID */
PROC SORT DATA=combData.groupsched;
BY EmpID;
RUN;
Merging SAS Data Sets: A More
Complex Example
/* simply merge two data sets */
DATA combData.nextweek;
MERGE combData.employee combData.groupsched;
BY EmpID;
RUN;


EmpID   LastName  FlightNum
E00632  Strauss   5250
E01483  Lee
E01996  Nick      5501
E04064  Waschk    5105
Merging SAS Data Sets: A More
Complex Example
Eliminating Nonmatches
Use the IN= data set option to determine which dataset(s)
contributed to the current observation.
General form of the IN= data set option:
SAS-data-set (IN=variable)
Variable is a temporary numeric variable that has two
possible values:
0 indicates that the data set did not contribute to the
current observation.
1 indicates that the data set did contribute to the
current observation.
Merging SAS Data Sets: A More
Complex Example
/* Keep only the employees who are scheduled to fly next week. */
LIBNAME combData 'K:\sas class\merge';
DATA combData.nextweek;
MERGE combData.employee
combData.groupsched (IN=InSched);
BY EmpID;
IF InSched=1; /* true only when groupsched contributed */
RUN;

EmpID   LastName  FlightNum
E00632  Strauss   5250
E01996  Nick      5501
E04064  Waschk    5105
Merging SAS Data Sets: A More
Complex Example
/* Find employees who are not in the flight schedule. */
LIBNAME combData 'K:\sas class\merge';
DATA combData.nextweek;
MERGE combData.employee (IN=InEmp)
combData.groupsched (IN=InSched);
BY EmpID;
IF InEmp=1; /* true when employee contributed */
IF InSched=0; /* true only when groupsched did not contribute */
RUN;

EmpID   LastName  FlightNum
E01483  Lee
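The IN= flags can also route matches and nonmatches to separate data sets in a single DATA step. The sketch below is one way to do this; it re-creates the two example tables in the Work library so that it runs on its own, and the output names work.scheduled and work.unscheduled are hypothetical.

```sas
DATA work.employee;
   INPUT EmpID $ LastName $;
   DATALINES;
E00632 Strauss
E01483 Lee
E01996 Nick
E04064 Waschk
;
RUN;

/* Already in EmpID order, so no PROC SORT is needed here. */
DATA work.groupsched;
   INPUT EmpID $ FlightNum $;
   DATALINES;
E00632 5250
E01996 5501
E04064 5105
;
RUN;

DATA work.scheduled work.unscheduled;
   MERGE work.employee   (IN=InEmp)
         work.groupsched (IN=InSched);
   BY EmpID;
   /* Both data sets contributed: the employee is scheduled. */
   IF InEmp AND InSched THEN OUTPUT work.scheduled;
   /* Only employee contributed: no flight next week. */
   ELSE IF InEmp AND NOT InSched THEN OUTPUT work.unscheduled;
RUN;
```

With this data, work.scheduled should hold Strauss, Nick and Waschk, and work.unscheduled should hold Lee.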
Different Types of Merges in SAS
One-to-Many Merging
DATA work.three;
MERGE work.one work.two;
BY X;
RUN;

Work.one
X Y
1 A
2 B
3 C
Work.two
X Z
1 A1
1 A2
2 B1
3 C1
3 C2
Work.three
X Y Z
1 A A1
1 A A2
2 B B1
3 C C1
3 C C2
Different Types of Merges in SAS
Many-to-Many Merging
DATA work.three;
MERGE work.one work.two;
BY X;
RUN;

Work.one
X Y
1 A1
1 A2
2 B1
2 B2
Work.two
X Z
1 AA1
1 AA2
1 AA3
2 BB1
2 BB2
Work.three
X Y Z
1 A1 AA1
1 A2 AA2
1 A2 AA3
2 B1 BB1
2 B2 BB2
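The many-to-many result can be reproduced with the sketch below, which builds the two input tables from in-stream data. Note that MERGE pairs observations by position within each BY group and carries the last value forward when one data set runs out; it does not build every combination, so the result has 5 observations, not 10. SAS also writes a note to the log when more than one data set has repeated BY values.

```sas
DATA work.one;
   INPUT X Y $;
   DATALINES;
1 A1
1 A2
2 B1
2 B2
;
RUN;

DATA work.two;
   INPUT X Z $;
   DATALINES;
1 AA1
1 AA2
1 AA3
2 BB1
2 BB2
;
RUN;

/* For X=1 the third row reuses Y=A2 because work.one has only two rows. */
DATA work.three;
   MERGE work.one work.two;
   BY X;
RUN;

PROC PRINT DATA=work.three;
RUN;
```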
Some simple regression analysis
procedure
The REG Procedure
The LOGISTIC Procedure


The REG procedure
The REG procedure is one of many regression
procedures in the SAS System.
The REG procedure allows several MODEL
statements and gives additional regression
diagnostics, especially for detection of collinearity. It
also creates plots of model summary statistics and
regression diagnostics.
PROC REG <options>;
MODEL dependents=independents </options>;
PLOT <yvariable*xvariable>;
RUN;

An example
PROC REG DATA=water;
MODEL Water = Temperature Days Persons / VIF;
MODEL Water = Temperature Production Days / VIF;
RUN;
PROC REG DATA=water;
MODEL Water = Temperature Production Days;
PLOT STUDENT.* PREDICTED.;
PLOT STUDENT.* NPP.;
PLOT NPP.*r.;
PLOT r.*NQQ.;
RUN;
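The water data set is not included here, so the following sketch uses sashelp.class, a sample data set shipped with SAS, as a stand-in. The choice of Weight as the response and Height and Age as predictors is only for illustration.

```sas
/* VIF requests variance inflation factors, a collinearity diagnostic. */
PROC REG DATA=sashelp.class;
   MODEL Weight = Height Age / VIF;
RUN;
QUIT;   /* PROC REG is interactive; QUIT ends the procedure */
```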
The LOGISTIC procedure
For binary or ordinal responses with continuous
independent variables:
PROC LOGISTIC < options > ;
MODEL dependent=independents < / options > ;
RUN;
For binary or ordinal responses with categorical
independent variables:
PROC LOGISTIC < options > ;
CLASS categorical-variables < / options > ;
MODEL dependent=independents < / options > ;
RUN;

Example
PROC LOGISTIC data=Neuralgia;
CLASS Treatment Sex;
MODEL Pain= Treatment Sex Treatment*Sex Age Duration;
RUN;
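The Neuralgia data set is not included here. As a stand-in, this sketch assumes the sample data set sashelp.heart is available in your installation; the model itself (Status on Sex, AgeAtStart and Cholesterol) is chosen only to show one categorical and two continuous predictors together.

```sas
PROC LOGISTIC DATA=sashelp.heart;
   /* Sex is categorical; PARAM=REF uses reference-cell coding. */
   CLASS Sex / PARAM=REF;
   /* EVENT= names the response level whose probability is modeled. */
   MODEL Status(EVENT='Dead') = Sex AgeAtStart Cholesterol;
RUN;
```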

Overview Summary Report
Procedures
PROC FREQ: produces frequency counts
PROC TABULATE: produces one- and two-dimensional tabular
reports
PROC REPORT: produces flexible detail and summary reports
The FREQ Procedure
The FREQ procedure displays frequency counts
of the data values in a SAS data set.

General form of a simple PROC FREQ step:

PROC FREQ DATA = SAS-data-set;
TABLE SAS-variables </options>;
RUN;
The FREQ Procedure
Example:
PROC FREQ DATA = class.crew ;
FORMAT JobCode $codefmt. Salary money.;
TABLE JobCode*Salary /NOCOL NOROW OUT =freqTable;
RUN;
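A runnable variant of the same pattern, using the sample data set sashelp.class in place of class.crew (an assumption made so that the step needs no custom formats or libraries). The output data set name work.freqTable is hypothetical.

```sas
/* Two-way table of Sex by Age; NOCOL and NOROW suppress the
   column and row percentages. OUT= saves the counts. */
PROC FREQ DATA=sashelp.class;
   TABLES Sex*Age / NOCOL NOROW OUT=work.freqTable;
RUN;

/* The OUT= data set holds one row per Sex/Age cell with COUNT and PERCENT. */
PROC PRINT DATA=work.freqTable;
RUN;
```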


The TABULATE Procedure
PROC TABULATE displays descriptive
statistics in tabular format.
General form of a simple PROC TABULATE
step:
PROC TABULATE DATA=SAS-data-set;
CLASS class-variables;
VAR analysis-variables;
TABLE row-expression,
column-expression</options>;
RUN;


The TABULATE Procedure
Example:
TITLE 'Average Salary for Cary and Frankfurt';
PROC TABULATE DATA= class.crew FORMAT=dollar12.;
WHERE Location IN ('Cary','Frankfurt');
CLASS Location JobCode;
VAR Salary;
TABLE JobCode, Location*Salary*mean;
RUN;
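The same row-by-column layout can be tried on the sample data set sashelp.class, used here as a stand-in for class.crew. The variables are chosen for illustration only.

```sas
/* Rows: Age. Columns: mean Height within each Sex. */
PROC TABULATE DATA=sashelp.class FORMAT=8.1;
   CLASS Sex Age;
   VAR Height;
   TABLE Age, Sex*Height*MEAN;
RUN;
```

The comma in the TABLE statement separates the row expression from the column expression, and the asterisk nests one element inside another.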

The REPORT procedure
REPORT procedure combines features of the
PRINT, MEANS, and TABULATE procedures.
It enables you to
create listing reports
create summary reports
enhance reports
request separate subtotals and grand totals

The REPORT procedure
Example
PROC REPORT DATA =class.crew nowd HEADLINE HEADSKIP;
COLUMN JobCode Location Salary;
DEFINE JobCode / GROUP WIDTH= 8 'Job Code';
DEFINE Location / GROUP 'Home Base';
DEFINE Salary / MEAN FORMAT=dollar10. 'Average Salary';
RBREAK AFTER / SUMMARIZE DOL;
RUN;
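A comparable sketch on the sample data set sashelp.class, used as a stand-in for class.crew. HEADLINE, HEADSKIP and DOL are left out because they only affect traditional listing output.

```sas
PROC REPORT DATA=sashelp.class NOWD;
   COLUMN Sex Age Height;
   /* GROUP collapses rows that share the same Sex and Age. */
   DEFINE Sex    / GROUP 'Sex';
   DEFINE Age    / GROUP 'Age';
   /* MEAN reports the average Height within each group. */
   DEFINE Height / MEAN FORMAT=8.1 'Average Height';
   /* Summary line at the end of the report (overall mean). */
   RBREAK AFTER / SUMMARIZE;
RUN;
```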
