SAS Overview
3
Data Portion
4
5
Observation flow via Data step
6
The program data vector (PDV) is a logical area in memory where SAS builds a data set, one
observation at a time. When a program executes, SAS reads data values from the input buffer (a
logical area in memory into which SAS reads each record of raw data when SAS executes an
INPUT statement) or creates them by executing SAS language statements. The data values are
assigned to the appropriate variables in the program data vector. From here, SAS writes the
values to a SAS data set as a single observation. When the DATA step reads a SAS data set,
SAS reads the data directly into the program data vector.
In addition to the data set variables, the PDV holds two kinds of variables that are not
written to the output data set:
1. The SAS automatic variables ( _N_ and _ERROR_ )
2. Temporary variables (FIRST., LAST., IN=, END=, etc.)
Compilation Phase - SAS checks the syntax of the statements and translates them into
machine code, along with creation of two things, viz. the INPUT BUFFER and the PDV.
Execution Phase - At the beginning of this phase all the variables are initialized
to missing: blanks (if character) and periods (if numeric) [use a "put _all_;" statement to
check the PDV status in the log]. When the INPUT statement is encountered for the first
time, the first record from the raw data file is moved into the INPUT BUFFER, and its
values are then moved to the PDV. When the end of the DATA step is reached (the RUN
statement), the implicit OUTPUT statement writes the observation in the PDV to the
output data set.
7
Sample program
Data Total_points (drop=TeamName);
input TeamName $ ParticipantName $ Event1 Event2 Event3;
TeamTotal + (Event1 + Event2 + Event3);
datalines;
Knights Sue 6 8 8
Kings Jane 9 7 8
Knights John 7 7 7
;
Run;
Values from the First Record are Read (Input buffer, PDV, Output)
http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000961108.htm
8
Parts of a Program
◦ Introduction comments
◦ Data import
◦ Data manipulation/cleaning
◦ Data analysis
◦ Data export
9
Semi-colon at the end of every statement
Comments
◦ /* Comment */
◦ * Comment ;
◦ */; - completes open comments from prior runs
10
OPTIONS statement
◦ Specifies SAS system options
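As a small illustration, an OPTIONS statement is usually placed at the top of a program. The particular options chosen below are an assumption for the sketch, not taken from the slides.

```sas
/* Illustrative system options; choose the ones your site needs */
options nodate        /* suppress the date in output titles */
        nonumber      /* suppress page numbers */
        linesize=100  /* width of log and listing lines */
        mprint;       /* show macro-generated code in the log */
```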
11
A Libname is a reference to a directory.
◦ libname test 'c:\temp\';
◦ Must be used for a permanent SAS data set
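A minimal sketch of how the libref is then used, assuming the directory c:\temp\ exists and is writable. The data set name class_copy is hypothetical; sashelp.class is a sample table shipped with SAS.

```sas
/* Assumption: c:\temp\ exists and is writable */
libname test 'c:\temp\';

/* A two-level name (libref.member) creates a permanent data set */
data test.class_copy;
    set sashelp.class;
run;
```

A one-level name such as class_copy would instead be written to the temporary Work library.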
12
%LET allows you to create a substitution (macro variable) for names
used throughout the program
◦ Use %LET when there is a high chance that the value will change
◦ Allows user to make changes in one place, rather
than throughout the program
◦ Often use %LET for path names and file names
13
Example of a %Let statement
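The original example did not survive extraction. The following is a sketch only: the path, file name, and macro variable names are hypothetical.

```sas
/* Hypothetical path and file name, defined once at the top */
%let inpath = c:\temp\;
%let infile = sales.csv;

/* Double quotes are required for macro variables to resolve */
filename rawin "&inpath.&infile.";

/* Write the resolved value to the log as a check */
%put Reading from &inpath.&infile.;
```

If the folder changes, only the %LET line needs editing.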
14
Importing Data into SAS
15
SAS can import the following types of files:
◦ Text (fixed-length, variable-length and delimited)
◦ Excel workbooks
◦ Access databases
◦ DBF files
◦ EBCDIC files (IBM Mainframe)
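One common way to import a delimited text file is PROC IMPORT. This is a sketch with a hypothetical file path and output name; importing Excel or Access files typically also depends on which SAS/ACCESS products are licensed.

```sas
/* Hypothetical delimited text file with a header row */
proc import datafile='c:\temp\sales.csv'
            out=work.sales
            dbms=csv
            replace;
    getnames=yes;   /* first row holds the variable names */
run;
```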
16
There are two types of data sets:
◦ Working
Temporary data set stored in the “Work”
directory
Once SAS is closed, the data sets are deleted
◦ Permanent
Data sets stored in folder on computer
Use libname reference as a “pipeline” to point to
permanent SAS datasets for program
If SAS is closed, data sets are saved on the
computer
17
SAS views dates as the number of days since
January 1, 1960.
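A short sketch that writes a few date values to the log, first as raw numbers and then formatted. The variable names are illustrative.

```sas
data _null_;
    day0   = '01JAN1960'd;     /* date literal: stored as 0 */
    day1   = mdy(1, 2, 1960);  /* one day later: stored as 1 */
    before = '31DEC1959'd;     /* dates before 1960 are negative: -1 */
    put day0= day1= before=;           /* raw numeric values */
    put day0= date9. day1= mmddyy10.;  /* same values, formatted */
run;
```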
18
When raw data contains invalid dates (e.g. '000000'),
dates can be read in as character and then converted
to a SAS date value to eliminate errors in the log. For
example,
◦ IF DATEVAR1 NE '000000' THEN DATEVAR = INPUT(DATEVAR1,YYMMDD6.);
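The same idea as a complete step, sketched with made-up records. The second record is the invalid one, so DATEVAR stays missing for it and no invalid-data note is written.

```sas
data dates;
    input datevar1 $6.;     /* read the raw date as character */
    if datevar1 ne '000000' then
        datevar = input(datevar1, yymmdd6.);  /* convert valid-looking values only */
    format datevar date9.;
    datalines;
050314
000000
991231
;
run;
```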
19
Informat DATA TEMP;
20
Data Analysis
and Manipulation
21
Once SAS imports the data, there are two types of
steps for manipulation and analysis:
◦ Data Step
If-Then-Else statements
Merges
Loops
◦ Proc Step
SAS-defined procedures for analysis (summaries,
statistics, etc.)
22
DATA steps in SAS are used to:
◦ Read in raw data files
Use INPUT statement
23
The DROP= option eliminates the specified variables from
the output data set. For example,
◦ DATA newdata1(DROP=field1 field2);
24
Example of a Data Step
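The original example is missing from the extracted text. A minimal sketch of a DATA step that reads raw records with INPUT, derives a new variable, and uses DROP= follows; the records and variable names are made up.

```sas
/* Read three raw records, derive a total, and drop the inputs */
data newdata1(drop=field1 field2);
    input id $ field1 field2;
    sumfield = field1 + field2;
    datalines;
A 60 50
B 10 20
C 70 40
;
run;
```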
25
If-Then-Else Statements
◦ Simple logic in order to perform data manipulation
data data1;
set libname1.perm_data1;
IF day_of_Week = "MON" THEN ID = 2; ELSE
IF day_of_Week = "TUE" THEN ID = 3; ELSE
IF day_of_Week = "WED" THEN ID = 4; ELSE
IF day_of_Week = "THU" THEN ID = 5; ELSE
IF day_of_Week = "FRI" THEN ID = 6; ELSE
IF day_of_Week = "SAT" THEN ID = 7; ELSE
IF day_of_Week = "SUN" THEN ID = 1; ELSE
IF day_of_Week = "HOL" THEN ID = 8;
run;
26
Use If-Then-Else statements to subset your data
◦ Use “If condition” to keep records with condition
◦ Break up data set into several data sets by condition
DATA excep(RENAME=(sumfield=excepamt))
gooddata(DROP=field1 field2);
SET origdata;
sumfield = field1 + field2;
IF sumfield GT 100 THEN OUTPUT excep;
ELSE OUTPUT gooddata;
RUN;
27
If there are several steps in an If-Then-Else statement, use a Then Do or
Else Do
Example of code:
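The code for this slide was lost in extraction. The sketch below uses the sample table sashelp.class; the variables group and senior are hypothetical.

```sas
data flagged;
    set sashelp.class;      /* sample table shipped with SAS */
    length group $5;
    if age >= 14 then do;   /* several statements for the true branch */
        group  = 'Older';
        senior = 1;
    end;
    else do;                /* several statements for the false branch */
        group  = 'Young';
        senior = 0;
    end;
run;
```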
28
Blanks and Null Values
◦ There are several ways to flag blank / null records:
charvar = ' ';
numvar = .;
29
A SAS merge combines records that have equal
values based on the field(s) in the BY statement.
The data sets must be sorted by the field(s) in the
BY statement prior to performing the merge.
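A subsetting IF statement (typically testing IN= variables) selects
the records to include in the resulting table.
Many-to-many merges will not have the same
result as a many-to-many SQL join.
Overlaying of common variables will occur: values read from the
later data set replace those from the earlier one, even when
missing (UPDATE will be helpful, since it keeps non-missing values)
◦ Use PROC CONTENTS and the RENAME= data set
option to find and resolve name clashes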
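The points above can be sketched as a match-merge that keeps only the codes present in both inputs. The data set and variable names (left, right, amt1, amt2) are made up for illustration.

```sas
data left;
    input code $ amt1;
    datalines;
A 1
B 2
C 3
;
data right;
    input code $ amt2;
    datalines;
B 20
C 30
D 40
;
/* Both inputs must be sorted by the BY variable */
proc sort data=left;  by code; run;
proc sort data=right; by code; run;

data both;
    merge left(in=in1) right(in=in2);
    by code;
    if in1 and in2;   /* keep only codes found in both data sets */
run;
```

Here only codes B and C appear in both inputs, so only those two observations are kept. Because amt1 and amt2 have different names, nothing is overlaid.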
30
[Figure: the four input data sets, SAS Dataset 1 through SAS Dataset 4]
31
Merge SAS Data Set1-2
Code  In2  In1
A     1    1
A     1    1
B     1    0
C     1    1
C     1    1
D     1    0
E     0    1
F     1    1
F     1    1
Merge SAS Data Set3-4
Code  Amt
A     9
B     1
C     .
D     5
E     2
F     .
Update SAS Data Set3-4
Code  Amt
A     9
B     1
C     2
D     5
E     2
F     3
Example SAS MERGE.SAS
32
Example of Merge in Data step:
http://support.sas.com/techsup/technote/ts705.pdf
33
Loops allow the users to iterate through a variable or data step in
a controlled manner
DATA newfile;
SET oldfile;
DO WHILE (condition);
task to perform during condition;
END;
RUN;
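A runnable sketch of the DO WHILE template above. The starting balance and the 10% growth rate are assumptions chosen only to make the loop stop.

```sas
data growth;
    balance = 1000;                /* hypothetical starting amount */
    year = 0;
    do while (balance < 2000);     /* condition is tested before each pass */
        year = year + 1;
        balance = balance * 1.10;  /* assumed 10% growth per year */
        output;                    /* one observation per iteration */
    end;
run;
```

DO UNTIL works the same way except that the condition is tested at the end of each pass, so the body always runs at least once.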
34
CAT, CATS, CATT, CATX
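The slide content for these functions was not preserved. As a hedged sketch, the four concatenation functions differ in how they treat blanks; the variable names below are illustrative.

```sas
data _null_;
    first = 'Sue  ';     /* value with trailing blanks */
    last  = '  Smith';   /* value with leading blanks */
    length a b c d $20;
    a = cat(first, last);         /* no blanks removed */
    b = catt(first, last);        /* trailing blanks removed */
    c = cats(first, last);        /* leading and trailing blanks removed */
    d = catx(', ', last, first);  /* blanks stripped, delimiter inserted */
    put a= / b= / c= / d=;
run;
```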
35
Count
36
Countc
37
Ifc
ifc (condition, value #1, value #2, value #3)
looks at a condition and returns user specified, character values if the condition is true, false, or
results in a missing value.
Parameters:
◦ condition (numeric expression)
◦ value #1: char expression if condition is true.
◦ value #2: char expression if condition is false.
◦ value #3: char expression if condition results in a missing value.
Ifn
ifn (condition, value #1, value #2, value #3)
looks at a condition and returns user specified, numeric values if the condition is true, false, or
results in a missing value.
Parameters:
◦ condition (numeric expression)
◦ value #1: num expression if condition is true.
◦ value #2: num expression if condition is false.
◦ value #3: num expression if condition results in a missing value.
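A short sketch that exercises all three outcomes for both functions. The variable names are made up.

```sas
data _null_;
    do flag = 1, 0, .;   /* true, false, missing */
        label = ifc(flag, 'True', 'False', 'Missing');
        score = ifn(flag, 100, 0, -1);
        put flag= label= score=;
    end;
run;
```

The log should show True/100 for flag=1, False/0 for flag=0, and Missing/-1 for the missing value. Note that a comparison such as x > 5 evaluates to 0 or 1, never to missing, so the fourth argument matters mainly when the condition is itself a numeric variable or arithmetic expression.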
subpad
parameters:
- string: string to take an excerpt (substring) from.
- position: location of first character in the substring.
- length: the number of characters in the substring.
subpad will return a value with the length specified, padding the
result with spaces.
subpad can return a string with a length of zero.
39
substrn
• parameters:
- string: string to take an excerpt (substring) from.
- position: location of first character in the substring.
- length: the number of characters in the substring.
same as the substr function except:
substrn truncates the result when length exceeds the length of the string.
there will be no error messages for invalid third arguments.
ANYALNUM (variable)
beats using: indexc(lowcase(variable),'qwertyuiopasdfghjklzxcvbnm1234567890')
ANYALPHA (variable)
beats using: indexc(lowcase(variable),'qwertyuiopasdfghjklzxcvbnm')
NOTALNUM (variable)
beats using: verify(lowcase(variable),'qwertyuiopasdfghjklzxcvbnm1234567890')
substr.txt
40
Proc statements are SAS procedures for data
analysis
◦ Proc Contents
◦ Proc Datasets
◦ Proc Means
◦ Proc Summary
◦ Proc Freq
◦ Proc Format
◦ Proc Univariate
◦ Proc Print
41
Proc Contents
Provides field information and data set information
◦ Field Info: Name, Type, Format, Label
◦ Data Set Info: Creation date, Number of observations
◦ Automatically prints to output
◦ Very useful to perform prior to a merge to ensure
variables from one data set will not write over variables
from the other data set.
Example of code:
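The original code is missing. A minimal sketch, using the sample table sashelp.class as a stand-in for your own data set:

```sas
/* List variable names, types, lengths, formats, and labels,
   plus data set attributes such as creation date and row count */
proc contents data=sashelp.class;
run;
```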
42
Proc Datasets
Provides information regarding a library, including datasets in the
library
◦ Can perform the following:
Append datasets together
Copy datasets to other libraries
Delete datasets
Modify the contents of a dataset
Rename a dataset
Example of code:
RUN;
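Most of the original example was lost. The following sketch shows three of the listed operations; all data set names (part1, part2, all_students) are hypothetical.

```sas
/* Build two small Work data sets to operate on */
data work.part1 work.part2;
    set sashelp.class;
    if sex = 'F' then output work.part1;
    else output work.part2;
run;

proc datasets library=work nolist;
    append base=part1 data=part2;  /* append part2 to part1 */
    change part1=all_students;     /* rename a data set */
    delete part2;                  /* delete a data set */
quit;
```

PROC DATASETS is an interactive procedure, so it ends with QUIT rather than only RUN.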
43
Proc Means
Computes descriptive statistics on numeric variables in a SAS
data set. Results can be output to a new SAS data set.
◦ By default prints a standard set of descriptive statistics (N, mean,
standard deviation, minimum, maximum)
◦ Output automatically prints
◦ If no numeric variables are specified in the VAR statement, then
all numeric variables in the data set are analyzed
Example of code:
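The original code is missing. A sketch using sashelp.class, where the output data set and statistic names are assumptions:

```sas
proc means data=sashelp.class n mean std min max;
    var height weight;   /* numeric variables to analyze */
    output out=work.class_stats
           mean=avg_height avg_weight;  /* also save the means */
run;
```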
44
Proc Summary
Computes descriptive statistics on numeric variables in a SAS
data set and outputs the results into a new SAS data set
◦ Descriptive statistics must be specified in the OUTPUT
statement
◦ Output does not automatically print
◦ If no numeric variables are specified in the VAR statement, then
a simple count of the observations is generated.
Example of Code:
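The original code is missing. A sketch using sashelp.class, with hypothetical output names:

```sas
proc summary data=sashelp.class nway;
    class sex;           /* one output row per value of SEX */
    var height weight;
    output out=work.by_sex
           mean=avg_height avg_weight
           n=n_height n_weight;
run;
```

NWAY keeps only the rows for the full CLASS combination and drops the overall total row.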
45
Proc Freq
Produces 1-way to n-way frequency and cross-tabulation
tables
Example of Code:
DATA datanew2;
SET datanew;
IF COUNT > 1 THEN OUTPUT datanew2;
RUN;
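The data step above assumes that datanew already holds a COUNT variable, which is what the OUT= data set of PROC FREQ provides. A sketch of a step that might precede it, with sashelp.class and the variable choices as assumptions:

```sas
proc freq data=sashelp.class;
    tables age / out=datanew;  /* OUT= data set holds AGE, COUNT, PERCENT */
    tables sex*age;            /* two-way cross-tabulation */
run;
```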
46
PROC FORMAT, because of its name, is most often used to
change the appearance of data for presentation.
Example of Code:
Proc format;
value tempfmt low -< 61 = ' 1'
61 -< 63 = ' 2'
63 - high = ' 3'
other = ' '
;
run;
data pme;
set pme;
tempcode=put(avgtemp, tempfmt.);
Run;
Proc format.txt
http://www2.sas.com/proceedings/sugi30/001-30.pdf
47
Proc Univariate
Provides statistics on numeric variables
◦ Automatically prints the results to output
From the report:
◦ Number of missing observations
◦ Number of records > 0, = 0, and <0
◦ Total, Mean, Median, Standard Deviation, Percentiles, etc.
◦ Top and bottom 5 records
◦ Sophisticated statistical calculations
Example of Code:
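The original code is missing. A minimal sketch on the sample table sashelp.class:

```sas
proc univariate data=sashelp.class;
    var height;   /* numeric variable to profile */
run;
```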
48
Proc Print
Print data set to screen
◦ Less frequently used if using PC SAS
◦ However, when using SAS on the Mainframe, Proc
Print would allow you to see your data
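A minimal sketch that prints the first ten rows of the sample table sashelp.class. The variable selection is illustrative.

```sas
proc print data=sashelp.class(obs=10) noobs;
    var name age height;   /* columns to show */
run;
```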
49
What is a macro?
◦ A macro is a reusable block of code that can be run
repeatedly, for example against different data sets
50
Example of macro code:
%MACRO Macro_name(DATASET);
DATA &DATASET.;
SET &DATASET.;
RUN;
%MEND Macro_name;
%Macro_name(TOTAL);
51
SAS Output
52
There are several ways you can export data:
◦ Proc Export
53
Proc Export is the most common way to export data
Very similar to the Proc Import step, but works in the reverse
Example: SAS Program appended at #17
SAS can export to the following files:
◦ Excel
◦ Access
◦ DBF
◦ Text
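A sketch of a text export, assuming the directory c:\temp\ exists. The output file name is hypothetical.

```sas
/* Write the sample table to a comma-delimited text file */
proc export data=sashelp.class
            outfile='c:\temp\class.csv'
            dbms=csv
            replace;
run;
```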
54
Exercise1 Exercise2
DOC File
DOC File
Inventory Data for SAS Basics.txt AR Inventory for SAS Training.txt AR Data SAS Training.txt
55
Debugging and Checking SAS
Programs
56
It is essential to check your SAS log for errors:
◦ Error (in red): Very Bad!
◦ Warning (in green): Maybe bad!
57
Let’s run Debug Program.sas to see different types of
errors.
58
proc SQL
Macros
Proc Tabulate/Report
Merge using Proc format
59