2

SAS Overview

3
 Data Portion

 Rules for SAS names (Variable/Data Set)

4
5
 Observation flow via Data step

6
 Program data vector (PDV) is a logical area in memory where SAS builds a data set, one
observation at a time. When a program executes, SAS reads data values from the input buffer (a
logical area in memory into which SAS reads each record of raw data when SAS executes an
INPUT statement) or creates them by executing SAS language statements. The data values are
assigned to the appropriate variables in the program data vector. From here, SAS writes the
values to a SAS data set as a single observation. When the DATA step reads a SAS data set,
SAS reads the data directly into the program data vector.
In addition to the data set variables, the PDV also holds:
1. The SAS automatic variables (_N_ and _ERROR_)
2. Temporary variables (FIRST., LAST., IN=, END=, etc.)

While reading a raw data file:

 Compilation Phase - SAS checks the syntax, converts the DATA step into machine code, and creates two areas in memory: the INPUT BUFFER and the PDV.
 Execution Phase - At the start of each iteration, variables read from raw data are set to missing (blank for character, period for numeric); use a "put _all_;" statement to check the PDV status in the log. When the INPUT statement executes, the next record from the raw data file is read into the INPUT BUFFER, and its values are then moved to the PDV. When the end of the DATA step is reached, the implicit OUTPUT statement writes the current PDV contents to the output data set as one observation, and control returns to the top of the step.

7
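 Sample program
/* TeamName is read but dropped from the output data set */
data Total_points (drop=TeamName);
   input TeamName $ ParticipantName $ Event1 Event2 Event3;
   /* Sum statement: TeamTotal starts at 0 and is retained across iterations */
   TeamTotal + (Event1 + Event2 + Event3);
   datalines;
Knights Sue 6 8 8
Kings Jane 9 7 8
Knights John 7 7 7
;
run;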
 Values from the First Record are Read (Input buffer, PDV, Output)

Computed Value of the Sum Statement

http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000961108.htm
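One way to watch the PDV change is to write it to the log at each stage. The sketch below reuses the sample program and only adds PUT statements; the label text in the PUT statements is illustrative.

```sas
/* Sketch: trace the PDV in the log while the sample step runs */
data total_points (drop=TeamName);
   input TeamName $ ParticipantName $ Event1 Event2 Event3;
   put 'PDV after INPUT, before the sum statement:';
   put _all_;
   TeamTotal + (Event1 + Event2 + Event3);
   put 'PDV after the sum statement:';
   put _all_;
   datalines;
Knights Sue 6 8 8
Kings Jane 9 7 8
Knights John 7 7 7
;
run;
```

In the log, _N_ counts the iterations (1, 2, 3) and TeamTotal accumulates to 22, 46, and 67, because a sum statement variable is retained rather than reset to missing.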
8
 Parts of a Program
◦ Introduction comments

◦ Data import

◦ Data manipulation/cleaning

◦ Data analysis

◦ Data export

9
 Semi-colon at the end of every statement

 Spacing for aesthetics, not syntax

 Comments
◦ /* Comment */
◦ * Comment ;
◦ */; - completes open comments from prior runs

 “Run;” at the end to execute the code

10
 OPTIONS statement
◦ Specifies SAS system options
◦ Often used at the beginning of a program to specify page formatting, limit the number of observations processed, etc.
◦ Unless re-specified, options stay the same throughout the entire SAS session for all DATA steps and PROCs that execute (even if the current program does not specify them).
OPTIONS MPRINT MLOGIC SYMBOLGEN PAGESIZE=60;
DM LOG 'CLEAR' LOG;

11
 A Libname is a reference to a directory.
◦ libname test 'c:\temp\';
◦ Must be used for a permanent SAS data set

 A Fileref is a reference to a specific file.
◦ filename flatfile 'c:\temp\test.txt';
◦ Provides an easy way to change the file used when testing
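A minimal sketch of how the libref and fileref are used together. The layout of test.txt (a numeric id followed by a one-word name, separated by blanks) and the data set name customers are assumptions made for illustration.

```sas
/* Assumed: c:\temp\ exists and test.txt holds blank-separated id and name fields */
libname test 'c:\temp\';
filename flatfile 'c:\temp\test.txt';

/* Read the raw file through the fileref and store the result permanently */
data test.customers;          /* two-level name: libref.dataset */
   infile flatfile;
   input id name $;
run;

/* Later steps refer to the permanent data set by the same two-level name */
proc print data=test.customers;
run;
```

Pointing the program at a different input file only requires changing the FILENAME statement.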

12
 %LET creates a macro variable whose value is substituted wherever the variable is referenced in the program
◦ Use %LET for values that are likely to change
◦ Allows the user to make changes in one place, rather than throughout the program
◦ Often used for path names and file names

13
 Example of a %LET statement

%LET OUTPUTPATH=C:\Documents and Settings\My Documents\SAS99 v3\Testing v3\Output;

Filename PROGFILE "&OUTPUTPATH.\PROGFILE.sas";
Filename EXCLUDE "&OUTPUTPATH.\EXCLUDE.sas";
Filename OUT_ENT "&OUTPUTPATH.\Entry_nonzero.xls";
Filename OUT_DQ "&OUTPUTPATH.\Mat_Exceptions.xls";

14
Importing Data into SAS

15
 SAS can import the following types of files:
◦ Text (fixed-length, variable-length and delimited)
◦ Excel workbooks
◦ Access databases
◦ DBF files
◦ EBCDIC files (IBM Mainframe)

SAS File Transfer.sas
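For delimited text, PROC IMPORT is one option. This is a sketch only: the file path, the tilde delimiter, the header row, and the output name WORK.GLDET are assumptions.

```sas
/* Sketch: import a tilde-delimited text file that has a header row */
proc import datafile="c:\temp\test.txt"
            out=work.gldet
            dbms=dlm
            replace;
   delimiter='~';
   getnames=yes;
run;
```

PROC IMPORT guesses variable types and lengths from the data, so a DATA step with INFILE and INPUT remains the better choice when the layout must be controlled exactly.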

16
 There are two types of data sets:
◦ Working
 Temporary data set stored in the “Work”
directory
 Once SAS is closed, the data sets are deleted

◦ Permanent
 Data sets stored in folder on computer
 Use libname reference as a “pipeline” to point to
permanent SAS datasets for program
 If SAS is closed, data sets are saved on the
computer

17
 SAS stores dates as the number of days since January 1, 1960.

 Since SAS considers dates to be numeric, calculations are easily performed.

 SAS reads and prints a wide variety of dates including mm/dd/yy, yyyymmdd, Julian, etc.

 YEARCUTOFF= system option sets the first year of the 100-year span used to interpret two-digit years
http://ftp.sas.com/service/techsup/tsnews-l/0040.html

18
 When raw data contains invalid dates (e.g. '000000'), the field can be read in as character and then converted to a SAS date value to avoid errors in the log. For example,
◦ IF DATEVAR1 NE '000000' THEN DATEVAR = INPUT(DATEVAR1,YYMMDD6.);

 Comparisons to SAS date constants
◦ IF date > 12312000; WRONG! (compares against the number 12,312,000, not a date)
◦ IF date > '31dec2000'd; RIGHT!

 SAS Date Functions
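A few commonly used date functions, shown as a sketch in a DATA _NULL_ step so that nothing is written except log output. The variable names are illustrative.

```sas
data _null_;
   start = '31dec2000'd;                 /* date constant */
   next_day = start + 1;                 /* dates are numbers, so arithmetic works */
   built = mdy(12, 31, 2000);            /* month, day, year -> SAS date value */
   yr = year(start);                     /* extract the year */
   months_between = intck('month', '01jan2000'd, start);  /* month boundaries crossed */
   run_date = today();                   /* current date */
   put start= date9. next_day= date9. built= mmddyy10.;
   put yr= months_between= run_date= yymmdd10.;
run;
```

Without a format such as DATE9., a date variable prints as its underlying day count.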

19
 Informat
◦ Tells SAS how to read data values into a variable
◦ Used in input statements

 Format
◦ Tells SAS how to write data values either to a file, a report, or a screen
◦ Used in output statements

DATA TEMP;
INFILE "G:\Mydocument\GLDET.TXT" DELIMITER = '~'
       MISSOVER DSD LRECL=32767 FIRSTOBS=2 ;
INFORMAT GL_ACCOUNT__ $33. ;
INFORMAT GL_ACCOUNT_DESCRIPTION $128. ;
INFORMAT ACCOUNT_TYPE_INDICATOR $1. ;
INFORMAT FDC_CODE $10. ;
FORMAT GL_ACCOUNT__ $33. ;
FORMAT GL_ACCOUNT_DESCRIPTION $128. ;
FORMAT ACCOUNT_TYPE_INDICATOR $1. ;
FORMAT FDC_CODE $10. ;
INPUT
   GL_ACCOUNT__ $
   GL_ACCOUNT_DESCRIPTION $
   ACCOUNT_TYPE_INDICATOR $
   FDC_CODE $
;
RUN;

20
Data Analysis and Manipulation

21
 Once SAS imports the data, there are two types of
statements for manipulation and analysis:
◦ Data Step
 If-Then-Else statements
 Merges
 Loops

◦ Proc Step
 SAS-defined procedures for analysis (summaries,
statistics, etc.)

22
 DATA steps in SAS are used to:
◦ Read in raw data files
 Use INPUT statement

◦ Read in and manipulate SAS data sets


 Use SET or MERGE statements
 Subset data
 Create additional fields
 Perform calculations
 Merge SAS data sets
 Append SAS data sets
 Loop through SAS data sets
 etc.

23
 The DROP= option eliminates the specified variables from the output data set. For example,
◦ DATA newdata1(DROP=field1 field2);

 The KEEP= option keeps only the specified variables in the output data set. For example,
◦ DATA newdata2(KEEP=field3 field4);

 The RENAME= option renames variables in the output data set. This is commonly used for merges. For example,
◦ DATA new(RENAME=(oldname=newname));
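The three options can be seen side by side in one step. This is a sketch with a hypothetical input data set olddata and a hypothetical new name first_field.

```sas
/* Hypothetical input with four numeric fields */
data olddata;
   input field1 field2 field3 field4;
   datalines;
1 2 3 4
5 6 7 8
;
run;

/* Each output data set gets its own options */
data newdata1(drop=field1 field2)
     newdata2(keep=field3 field4)
     new(rename=(field1=first_field));
   set olddata;
run;
```

Because these options sit on the DATA statement, they apply only when observations are written; statements inside the step still refer to field1 by its original name.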

24
 Example of a Data Step

data rev_chart (keep = acct_type rev_debit_amt rev_debit_ct rev_credit_amt rev_credit_ct
                       exp_debit_amt exp_debit_ct exp_credit_amt exp_credit_ct period);
   set input_data year_input_data;
   if Acct_Type = 'Revenue' then do;
      rev_debit_amt = debit_amt;
      rev_debit_ct = debit_cnt;
      rev_credit_amt = credit_amt;
      rev_credit_ct = credit_cnt;
   end;
   else if Acct_Type = 'Expense' then do;
      exp_debit_amt = debit_amt;
      exp_debit_ct = debit_cnt;
      exp_credit_amt = credit_amt;
      exp_credit_ct = credit_cnt;
   end;
   else delete;
run;

25
 If-Then-Else Statements
◦ Simple logic in order to perform data manipulation

data data1;
set libname1.perm_data1;
IF day_of_Week = "MON" THEN ID = 2; ELSE
IF day_of_Week = "TUE" THEN ID = 3; ELSE
IF day_of_Week = "WED" THEN ID = 4; ELSE
IF day_of_Week = "THU" THEN ID = 5; ELSE
IF day_of_Week = "FRI" THEN ID = 6; ELSE
IF day_of_Week = "SAT" THEN ID = 7; ELSE
IF day_of_Week = "SUN" THEN ID = 1; ELSE
IF day_of_Week = "HOL" THEN ID = 8;
run;

26
 Use If-Then-Else statements to subset your data
◦ Use “If condition” to keep records with condition
◦ Break up data set into several data sets by condition

 Example of SAS code to subset data:

DATA excep(RENAME=(sumfield=excepamt))
     gooddata(DROP=field1 field2);
SET origdata;
sumfield = field1 + field2;
IF sumfield GT 100 THEN OUTPUT excep;
ELSE OUTPUT gooddata;
RUN;

27
 If there are several steps in an If-Then-Else statement, use a Then Do or
Else Do

◦ Use “End” statement at end of Do statement

 Example of code:

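DATA DATAKEEP DATAEXCLUDE;
SET DATA1;
LENGTH DATASET $11; /* long enough for "DATAEXCLUDE"; otherwise the first assignment fixes the length at 8 */
IF VAR1="KEEP" THEN DO;
DATASET="DATAKEEP";
OUTPUT DATAKEEP;
END;
ELSE DO;
DATASET="DATAEXCLUDE";
OUTPUT DATAEXCLUDE;
END;
RUN;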

28
 Blanks and Null Values
◦ There are several ways to flag blank / null records:
 charvar = ' ';
 numvar = .;
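A short sketch of flagging blank and missing values. The data set name flagged, the status variable, and the sample records are hypothetical; in list input a lone period is read as a missing value for character variables as well.

```sas
data flagged;
   length status $12;
   input charvar $ numvar;
   if charvar = ' ' then status = 'blank char';
   else if numvar = . then status = 'missing num';
   else status = 'complete';
   /* MISSING() accepts both character and numeric arguments */
   any_missing = missing(charvar) or missing(numvar);
   datalines;
A 1
. 2
B .
;
run;
```

The MISSING function is often the safer test for numeric fields because it also catches special missing values such as .A.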

29
 A SAS merge combines records that have equal
values based on the field(s) in the BY statement.
 The data sets must be sorted by the field(s) in the
BY statement prior to performing the merge.
 The IF statement will select the records to include
in the resulting table.
 Many to many merges will not have the same
result as a many to many join.
 Overlaying of common variables will occur (Update
will be helpful)
◦ Use PROC CONTENTS and RENAME= data set
option

30
SAS Dataset 1    SAS Dataset 2    SAS Dataset 3    SAS Dataset 4

Code In2         Code In1         Code Amt         Code Amt
A    1           A    1           A    1           A    9
A    1           C    1           B    .           B    1
B    1           C    1           C    2           C    .
C    1           E    1           D    5           E    2
D    1           F    1           F    3           F    .
F    1           F    1

31
31
Merge SAS Data Set1-2

Code In2 In1
A    1   1
A    1   1
B    1   0
C    1   1
C    1   1
D    1   0
E    0   1
F    1   1
F    1   1

Merge SAS Data Set3-4

Code Amt
A    9
B    1
C    .
D    5
E    2
F    .

Update SAS Data Set3-4

Code Amt
A    9
B    1
C    2
D    5
E    2
F    3

Example SAS MERGE.SAS
32
 Example of Merge in Data step:

data data_merge notina notinb;
merge data1 (in=a) data2 (in=b);
by field1 field2 field3;
if a and b then output data_merge;
else if a and not b then output notinb;
else if not a and b then output notina;
run;

http://support.sas.com/techsup/technote/ts705.pdf
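The overlay behavior of MERGE and the way UPDATE differs can be reproduced with the Code/Amt values from the merge slides. This is a sketch; the data set names ds3, ds4, merged, and updated are made up for illustration.

```sas
data ds3;
   input Code $ Amt;
   datalines;
A 1
B .
C 2
D 5
F 3
;
run;

data ds4;
   input Code $ Amt;
   datalines;
A 9
B 1
C .
E 2
F .
;
run;

/* Both inputs must be sorted by the BY variable */
proc sort data=ds3; by Code; run;
proc sort data=ds4; by Code; run;

/* MERGE: Amt from ds4 overlays Amt from ds3, even when it is missing */
data merged;
   merge ds3 ds4;
   by Code;
run;

/* UPDATE: missing values in the transaction data set (ds4) do not overwrite */
data updated;
   update ds3 ds4;
   by Code;
run;
```

In merged, codes C and F end up with missing Amt, while updated keeps 2 and 3 from ds3. UPDATE expects the master data set (ds3 here) to have at most one observation per BY value.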

33
 Loops allow the users to iterate through a variable or data step in
a controlled manner

 Two types of conditional loops in SAS
◦ Do Until
◦ Do While
 Must use an END statement

 Example of Do Loop code:

DATA newfile;
SET oldfile;
DO WHILE (condition);
task to perform during condition;
END;
RUN;
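A runnable sketch of the loop forms, assuming nothing beyond base SAS. The variable names are illustrative, and the iterative DO is included for comparison.

```sas
data loops;
   /* DO WHILE tests the condition at the top, so the body may run zero times */
   n_while = 0;
   do while (n_while < 3);
      n_while = n_while + 1;
   end;

   /* DO UNTIL tests at the bottom, so the body always runs at least once */
   n_until = 10;
   do until (n_until >= 3);
      n_until = n_until + 1;
   end;

   /* Iterative DO with an index variable */
   total = 0;
   do i = 1 to 5;
      total = total + i;
   end;

   put n_while= n_until= total=;
run;
```

The log shows n_while=3 n_until=11 total=15: the DO UNTIL body executed once even though its condition was already true on entry.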

34
 CAT, CATS,CATT,CATX

35
 Count

36
 Countc
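A brief sketch of the concatenation and counting functions named on these slides. The variable names and sample strings are illustrative.

```sas
data _null_;
   length first $8 last $8;
   first = 'Sue';
   last  = 'Knight';
   c  = cat(first, last);        /* keeps the trailing blanks stored in first */
   cs = cats(first, last);       /* strips leading and trailing blanks */
   ct = catt(first, last);       /* trims trailing blanks only */
   cx = catx('-', last, first);  /* strips blanks and inserts a delimiter */
   n_sub  = count('banana', 'an');   /* occurrences of the substring 'an' */
   n_char = countc('banana', 'an');  /* characters that are 'a' or 'n' */
   put c=;
   put cs=;
   put ct=;
   put cx=;
   put n_sub= n_char=;
run;
```

Here cs and ct are both SueKnight, cx is Knight-Sue, n_sub is 2, and n_char is 5. COUNT looks for the whole substring, while COUNTC counts any character from the list.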

37
 Ifc
ifc (condition, value #1, value #2, value #3)
looks at a condition and returns user specified, character values if the condition is true, false, or
results in a missing value.
Parameters:
◦ condition (numeric expression)
◦ value #1: char expression if condition is true.
◦ value #2: char expression if condition is false.
◦ value #3: char expression if condition results in a missing value.

 Ifn
ifn (condition, value #1, value #2, value #3)
looks at a condition and returns user specified, numeric values if the condition is true, false, or
results in a missing value.
Parameters:
◦ condition (numeric expression)
◦ value #1: num expression if condition is true.
◦ value #2: num expression if condition is false.
◦ value #3: num expression if condition results in a missing value.

 ifn comparison to if/then/else

ifc.txt
◦ com = ifn(sales >= quota, sales*mqpct, sales*nmqpct);
◦ is the same as:
if sales >= quota then com = sales*mqpct;
else com = sales*nmqpct;
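The commission example can be run end to end as follows. This is a sketch: the rates 0.10 and 0.05, the status variable, and the two sample records are assumptions.

```sas
data commission;
   length status $10;
   input sales quota;
   mqpct  = 0.10;   /* assumed rate when quota is met */
   nmqpct = 0.05;   /* assumed rate when quota is not met */
   com    = ifn(sales >= quota, sales * mqpct, sales * nmqpct);
   status = ifc(sales >= quota, 'Met', 'Not met');
   datalines;
1000 800
500 800
;
run;
```

The first record gets com=100 and the second com=25. Declaring the length of status matters because IFC otherwise returns a 200-character result.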
38
 lengthc, lengthm, lengthn
• Differences:
lengthc: returns the length of a string including trailing blanks.
lengthm: returns the length of the variable allocated in memory.
lengthn: returns the length of a string excluding the trailing blanks.
1. returns 0 if a string is blank, compared to the length function
which returns 1.

 subpad
parameters:
-string - string to take an excerpt (substring) from.
-position - location of first character in the substring.
-length - the number of characters in the substring.
subpad WILL return a variable with the length specified, padding the
results with spaces.
subpad can return a string with a length of zero.

39
 substrn
• parameters:
-string - string to take an excerpt (substring) from.
-position - location of first character in the substring.
-length - the number of characters in the substring.
same as the substr function except:
substrn truncates the result when length exceeds the length of the string.
there will be no error messages for invalid third arguments.

 ANYALNUM (variable)
beats using: indexc(lowcase(variable),'qwertyuiopasdfghjklzxcvbnm1234567890')
 ANYALPHA (variable)
beats using: indexc(lowcase(variable),'qwertyuiopasdfghjklzxcvbnm')
 NOTALNUM (variable)
beats using: verify(lowcase(variable),'qwertyuiopasdfghjklzxcvbnm1234567890')

substr.txt
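A small sketch that exercises the length, substring, and character-class functions above. The variable names and the sample string are illustrative.

```sas
data _null_;
   length s $10 blank $5;
   s = 'abc-12';
   blank = ' ';
   len_c = lengthc(s);       /* 10: counts trailing blanks */
   len_n = lengthn(s);       /* 6: ignores trailing blanks */
   len_b = lengthn(blank);   /* 0 for a blank string */
   len_l = length(blank);    /* 1: LENGTH never returns 0 */
   pos_alpha = anyalpha(s);  /* 1: position of the first letter */
   pos_other = notalnum(s);  /* 4: position of the hyphen */
   sub = substrn(s, 5, 2);   /* the two characters starting at position 5 */
   put len_c= len_n= len_b= len_l= pos_alpha= pos_other= sub=;
run;
```

The ANY and NOT functions return a position, or 0 when no character qualifies, so they can be tested directly in an IF condition.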

40
 Proc statements are SAS procedures for data
analysis
◦ Proc Contents
◦ Proc Datasets
◦ Proc Means
◦ Proc Summary
◦ Proc Freq
◦ Proc Format
◦ Proc Univariate
◦ Proc Print

41
 PROC CONTENTS provides field information and data set information
◦ Field Info: Name, Type, Format, Label
◦ Data Set Info: Creation date, Number of observations
◦ Automatically prints to output
◦ Very useful to perform prior to a merge to ensure
variables from one data set will not write over variables
from the other data set.

 Example of code:

PROC CONTENTS DATA=dataset;


RUN;

42
 PROC DATASETS provides information regarding a library, including datasets in the
library
◦ Can perform the following:
 Append datasets together
 Copy datasets to other libraries
 Delete datasets
 Modify the contents of a dataset
 Rename a dataset

 Example of code:

PROC DATASETS LIBRARY=libname;
RUN;
QUIT;

Proc dataset.txt
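A sketch of some of the operations listed above, run against the WORK library. The data set names all_sales, new_sales, scratch, and sales_total are hypothetical.

```sas
/* Hypothetical data sets in WORK */
data all_sales;  region = 'East'; amount = 100; run;
data new_sales;  region = 'West'; amount = 250; run;
data scratch;    x = 1; run;

proc datasets library=work nolist;
   append base=all_sales data=new_sales;   /* append datasets together */
   change all_sales=sales_total;           /* rename a dataset */
   delete scratch;                         /* delete a dataset */
quit;
```

PROC DATASETS stays active after RUN, which is why the step ends with QUIT. NOLIST suppresses the directory listing in the log.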

43
 PROC MEANS computes descriptive statistics on numeric variables in a SAS
data set. Results can be output to a new SAS data set.
◦ By default prints N, mean, standard deviation, minimum, and maximum
◦ Output automatically prints
◦ If no VAR statement is specified, then all numeric variables in the
data set are analyzed

 Example of code:

PROC MEANS DATA=oldfile;


CLASS division dept;
VAR units value;
RUN;

44
 PROC SUMMARY computes descriptive statistics on numeric variables in a SAS
data set and outputs the results into a new SAS data set
◦ Descriptive statistics are requested in the OUTPUT statement
◦ Output does not automatically print
◦ If no VAR statement is specified, then a simple count of the
observations is generated.

 Example of Code:

PROC SUMMARY DATA=oldfile NWAY;


CLASS division dept;
VAR units value;
OUTPUT OUT=currfile SUM=;
RUN;
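To see what the step above produces, it can be run against a small hypothetical version of oldfile. The four sample records are made up.

```sas
/* Hypothetical input for the PROC SUMMARY example */
data oldfile;
   input division $ dept $ units value;
   datalines;
North A 10 100
North A 5 50
North B 2 20
South A 7 70
;
run;

proc summary data=oldfile nway;
   class division dept;
   var units value;
   output out=currfile sum=;
run;

/* PROC SUMMARY prints nothing, so list the output data set */
proc print data=currfile;
run;
```

With NWAY, currfile holds one row per division and dept combination (three rows here), plus the automatic _TYPE_ and _FREQ_ variables. The North A row has units=15 and value=150.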

45
 PROC FREQ produces 1-way to n-way frequency and cross-tabulation
tables

 Example of Code:

PROC FREQ DATA=dataset NOPRINT;


TABLES variable / LIST MISSING OUT=datanew;
RUN;

DATA datanew2;
SET datanew;
IF COUNT > 1 THEN OUTPUT datanew2;
RUN;

46
 PROC FORMAT, because of its name, is most often used to
change the appearance of data for presentation.
 Example of Code:

Proc format;
value tempfmt low -< 61 = ' 1'
              61 -< 63 = ' 2'
              63 - high = ' 3'
              other = ' '
;
run;
data pme;
set pme;
tempcode=put(avgtemp, tempfmt.);
Run;
Proc format.txt
http://www2.sas.com/proceedings/sugi30/001-30.pdf

47
47
 PROC UNIVARIATE provides statistics on numeric variables
◦ Automatically prints the results to output
 From the report:
◦ Number of missing observations
◦ Number of records > 0, = 0, and <0
◦ Total, Mean, Median, Standard Deviation, Percentiles, etc.
◦ Top and bottom 5 records
◦ Sophisticated statistical calculations

 Example of Code:

PROC UNIVARIATE DATA=oldfile;


VAR units;
RUN;

48
 PROC PRINT prints a data set to the screen
◦ Less frequently used if using PC SAS
◦ However, when using SAS on the Mainframe, Proc Print would allow you to see your data

 PROC PRINTTO redirects the log and procedure output to external files

*SET EXTERNAL LOG;
PROC PRINTTO
PRINT = "&PATH.\PROGRAMS\&CLIENT. &PERIOD. &PROGRAM..DOC"
LOG = "&PATH.\PROGRAMS\&CLIENT. &PERIOD. &PROGRAM..DOC"
NEW;
RUN;
http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a000146809.htm
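A minimal sketch of a PROC PRINT step, followed by the reset that sends the log and output back to their default destinations. It assumes the oldfile data set and variable names used in the PROC MEANS example.

```sas
/* Print the first 10 observations of selected variables */
proc print data=oldfile (obs=10) noobs;
   var division dept units value;
run;

/* PROC PRINTTO with no options restores the default log and output destinations */
proc printto;
run;
```

Limiting the listing with OBS= keeps the output manageable for large data sets.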

49
 What is a macro?
◦ A macro is a block of code meant for repetitive
runs through the data

50
 Example of macro code:

%MACRO Macro_name(DATASET);

DATA &DATASET.;
SET &DATASET.;

IF Var1=. THEN Var1=0;


RUN;

%MEND Macro_name;

%Macro_name(TOTAL);

51
SAS Output

52
 There are several ways you can export data:

◦ Proc Export

◦ ODS (Output Delivery System)

◦ DDE (Dynamic Data Exchange)

53
 Proc Export is the most common way to export data
 Very similar to the Proc Import step, but works in the reverse
 SAS can export to the following files:
◦ Excel
◦ Access
◦ DBF
◦ Text

Example: SAS Program appended at #17
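A minimal sketch of a Proc Export step that writes a comma-delimited text file. The input data set work.currfile and the output path are assumptions.

```sas
/* Sketch: export a SAS data set to a CSV file */
proc export data=work.currfile
            outfile="c:\temp\currfile.csv"
            dbms=csv
            replace;
run;
```

REPLACE overwrites an existing file of the same name; without it the step leaves the existing file in place.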

54
Exercise1 Exercise2

DOC File
DOC File

Input Data Input Data

Inventory Data for SAS Basics.txt AR Inventory for SAS Training.txt AR Data SAS Training.txt

SAS Basics Exercise1 Solution.sas SAS Basics Exercise2 Solution.sas

55
Debugging and Checking SAS Programs

56
 It is essential to check your SAS log for errors:
◦ Error (in red): Very Bad!
◦ Warning (in green): Maybe bad!

 SAS will not stop its processing if it finds errors!

57
 Let's run Debug Program.sas to see different types of errors.

 Things to look out for:
◦ Check the colors in the log, or search it for "error" and "warning"
◦ Dropped records or observations

 Visualize and understand how your results should look. Did the results turn out as planned?

Use inventory raw file appended on slide #50

SAS Basics Debug Program.sas

58
 proc SQL
 Macros
 Proc Tabulate/Report
 Merge using Proc format

59
