
SAS CHEAT SHEET

Version: 1.0.2

Dr. Ali Ajouz


Data analyst with a strong math background and experience in machine learning, computational number theory, and statistics. Passionate about explaining data science to non-technical business audiences.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

The following example code often works as is. Refer to the SAS documentation for all arguments.

April 17, 2017

Linkedin: https://www.linkedin.com/in/ajouz/ Email: aliajouz@gmail.com


From Raw Data to Final Analysis

1. You begin with raw data, that is, a collection of data that
has not yet been processed by SAS.
2. You use a set of statements known as a DATA step to get
your data into a SAS data set.
3. Then you can further process your data with additional
DATA step programming or with SAS procedures.
Reading raw data from instream data lines

DATA STEP

data names;
  infile datalines delimiter=',';
  length first last $25;
  input first $ last $;
  datalines;
Ali, Ajouz
Karam, Ghawi
;
run;

INFILE: identifies the file to read; INFILE DATALINES points to the instream data lines.
LENGTH: sets the number of bytes used to store each variable.
INPUT: specifies the variables in the new data set.
DATALINES: indicates that data lines follow.
RUN: executes the DATA step.

Some useful INFILE options are DELIMITER= (alias DLM=), MISSOVER, and FIRSTOBS=.
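Here is a minimal sketch of FIRSTOBS= and MISSOVER used together. It assumes instream data with a header line and one record that lacks its last value. The data set `people` and its `score` column are hypothetical:

```sas
/* Hypothetical data: a header line plus one record missing its last value */
data people;
  /* FIRSTOBS=2 skips the header line; MISSOVER sets values that are  */
  /* absent at the end of a line to missing instead of reading ahead  */
  infile datalines dlm=',' firstobs=2 missover;
  length first last $25;
  input first $ last $ score;
  datalines;
first,last,score
Ali,Ajouz,66
Karam,Ghawi
;
run;
```

Without MISSOVER, INPUT would move on to the next line to look for `score`. Because no line is left, SAS typically writes a LOST CARD note and does not output the last record.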
Reading raw data from external file

Using PROC IMPORT

proc import datafile="C:\file.csv"
    out=outdata
    dbms=csv
    replace;
  getnames=yes;
run;

DATAFILE: the location of the file you want to read.
DBMS: specifies the type of data to import.
OUT= names the output data set, REPLACE overwrites it if it already exists, and GETNAMES=YES takes variable names from the first row of the file.

Using the DATA STEP

data names;
  infile "file-path" delimiter=',';
  length first last $25;
  input first $ last $;
run;

The INFILE statement identifies an external file to read with an INPUT statement.
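To try the DATA step route without an existing file, the sketch below first writes a small hypothetical CSV to a temporary file, then reads it back with the DSD option. `mycsv` is an illustrative fileref. DSD handles quoted fields that contain commas:

```sas
/* Write a small hypothetical CSV to a temporary file so the example runs anywhere */
filename mycsv temp;

data _null_;
  file mycsv;
  put 'first,last,city';
  put 'Ali,Ajouz,"Berlin, DE"';
  put 'Karam,Ghawi,Hamburg';
run;

/* DSD: comma is the default delimiter, quotes are removed, and commas */
/* inside quotes are kept as data; FIRSTOBS=2 skips the header row     */
data names;
  infile mycsv dsd firstobs=2 truncover;
  length first last $25 city $30;
  input first $ last $ city $;
run;
```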
Reading raw data (3)

INFORMATS VS FORMATS
Informats tell SAS how to read the data; formats tell SAS how to display the data.

USING FORMAT AND INFORMAT

data names;
  infile datalines delimiter=',';
  informat first $15. last $15. birthday ddmmyy10.;
  format birthday date9.;
  input first $ last $ birthday;
  datalines;
Ali,Ajouz, 04/11/1980
;
run;

POPULAR INFORMATS
$w.        reads in character data of length w.
w.d        reads in numeric data of length w with d decimal places.
MMDDYYw.   reads in date data in the form 04-11-80.
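The sketch below shows two of the informats above with hypothetical values. It assumes the default YEARCUTOFF= setting, so the two-digit year 80 is read as 1980:

```sas
/* Hypothetical values that illustrate two popular informats */
data informat_demo;
  /* 6.2: read columns 1-6 as a number; if the field has no decimal */
  /* point, the last 2 digits become decimals (1234 -> 12.34), and  */
  /* an explicit decimal point in the data (12.5) overrides d       */
  /* @8 mmddyy8.: move to column 8 and read a date such as 04-11-80 */
  input price 6.2 @8 visit mmddyy8.;
  format visit date9.;
  datalines;
1234   04-11-80
12.5   04-11-80
;
run;
```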
Reading raw data (4)

Single observations from multiple records
We can use multiple INPUT statements to read a group of records as a single observation.

data multilines;
  infile datalines;
  length FullName Addresse $36;
  input FullName $36.;
  input Addresse $36.;
  datalines;
Ali Ajouz
Danziger str.7
Karam Ghawi
Berliner str.32
;
run;

Multiple observations from a single record
We can use the double trailing @@ to read a single record into a group of observations.

data multilines;
  infile datalines delimiter=",";
  length FullName Addresse $36;
  input FullName $ Addresse $ @@;
  datalines;
Ali Ajouz, Danziger str.7, Karam Ghawi,Berliner str.32
;
run;
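Instead of multiple INPUT statements, the slash (/) line pointer can read consecutive records inside one INPUT statement. Here is a minimal sketch with the same data. `multilines2` is a hypothetical name:

```sas
/* Same result as the two-INPUT version: the slash (/) line pointer */
/* moves to the next record within a single INPUT statement         */
data multilines2;
  infile datalines;
  length FullName Addresse $36;
  input FullName $36. / Addresse $36.;
  datalines;
Ali Ajouz
Danziger str.7
Karam Ghawi
Berliner str.32
;
run;
```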
Selecting columns

Selecting columns by name

data selected;
  set sashelp.cars;
  keep model type;
run;

Deleting columns by name

data selected;
  set sashelp.cars;
  drop model type;
run;

Changing column labels

data selected;
  set sashelp.cars;
  label model="car model"
        type="car type";
run;

Selecting columns by position

%let lib=sashelp;
%let mydata=class;
%let to_keep=(1,2,3); /* list of column positions */

proc sql noprint;
  select name into :vars separated by ' '
    from dictionary.columns
    where libname="%upcase(&lib.)"
      and memname="%upcase(&mydata.)"
      and varnum in &to_keep;
quit;

data want;
  set &lib..&mydata (keep=&vars.);
run;

Other positions work the same way, for example:
%let to_keep=(1,3,5); /* list */
A colon range such as (1:3) works in DATA step IN lists but not in PROC SQL. For a contiguous range, replace the IN condition with: and varnum between 1 and 3
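The KEEP= data set option also accepts name prefix lists. This minimal sketch uses SASHELP.CARS, where `MPG:` matches MPG_City and MPG_Highway. `selected2` is a hypothetical name:

```sas
/* KEEP= as a data set option, with a name prefix list:     */
/* MPG: matches every variable whose name starts with "MPG" */
data selected2;
  set sashelp.cars (keep=Make Model MPG:);
run;
```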
Selecting rows

Ignore the first N rows (FIRSTOBS=N+1; here reading starts at row 5, so the first 4 rows are skipped)

data want;
  set sashelp.buy (firstobs=5);
run;

Limit the number of rows to be read

data want;
  set sashelp.buy (obs=5);
run;

Select rows by using an IF statement

data want;
  set sashelp.buy;
  if amount >= -1000;
run;

Selecting rows by index

data want;
  do Slice=2,1,7,4;
    set sashelp.buy point=Slice;
    output;
  end;
  /* POINT= never reaches end of file, so STOP is required */
  stop;
run;

Random sample

proc surveyselect data=sashelp.cars
    method=srs n=10 seed=12345
    out=SampleSRS;
run;
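The WHERE= data set option is another way to select rows. Unlike the subsetting IF, it filters rows as they are read. Here is a sketch with the same condition. `want_where` is a hypothetical name:

```sas
/* WHERE= filters rows as they are read, before they enter the DATA */
/* step; a subsetting IF filters after each row has been read       */
data want_where;
  set sashelp.buy (where=(amount >= -1000));
run;
```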
Peek at the data set contents

Data attributes + columns list

proc datasets;
  contents data=sashelp.heart order=collate;
quit;

Data header (first 5 rows)

proc print data=sashelp.class (obs=5);
run;

Frequency tables

proc freq data=sashelp.class;
  tables sex / plots=freqplot;
run;

Number of rows

data _null_;
  /* IF 0 never reads a row; NOBS= is filled in at compile time */
  if 0 then set sashelp.class nobs=x;
  call symputx('nobs', x);
  stop;
run;
%put #rows = &nobs;

Data summary statistics

proc means data=sashelp.iris n mean std min max q1 median q3;
run;
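PROC MEANS can also summarize by group and save the results as a data set. This sketch uses SASHELP.IRIS with its Species and PetalLength columns. `iris_stats`, `n_pl` and `mean_pl` are hypothetical names:

```sas
/* One output row per species: NWAY keeps only the full CLASS */
/* combination, and NOPRINT suppresses the printed table      */
proc means data=sashelp.iris noprint nway;
  class species;
  var petallength;
  output out=iris_stats n=n_pl mean=mean_pl;
run;
```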
Sorting a data set

data Score;
  infile datalines dsd;
  input Name $ Score;
  datalines;
Ali,66
Karam,82
Roaa,57
Hala,57
;
run;

proc sort data=Score out=sorted;
  by descending score;
run;

proc print data=sorted;
  var Name Score;
  title 'Listed by descending Score';
run;

The SORT procedure orders SAS data set observations by the values of one or more character or numeric variables.
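The NODUPKEY option of PROC SORT removes rows with duplicate BY values. The sketch below is self-contained: it rebuilds the Score data first. `unique_scores` is a hypothetical name:

```sas
/* Rebuild the Score data so the example stands alone */
data Score;
  infile datalines dsd;
  input Name $ Score;
  datalines;
Ali,66
Karam,82
Roaa,57
Hala,57
;
run;

/* NODUPKEY keeps only the first row for each BY value, */
/* so one of the two rows with Score=57 is dropped      */
proc sort data=Score out=unique_scores nodupkey;
  by descending Score;
run;
```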
Some SAS operators

A SAS operator is a symbol that represents a comparison, arithmetic calculation, or logical operation; a SAS function; or grouping parentheses.

Arithmetic operators
+     addition
-     subtraction
*     multiplication
**    exponentiation
/     division

Comparison operators
=     EQ    equal to
^=    NE    not equal to
>     GT    greater than
<     LT    less than
>=    GE    greater than or equal to
<=    LE    less than or equal to
      IN    is one of

Logical operators
&     AND
|     OR
^     NOT
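Here is a short sketch of these operators in assignment statements, using SASHELP.CLASS. The new column names are hypothetical. Comparisons and logical expressions evaluate to 1 or 0, so their results can be stored directly:

```sas
data flagged;
  set sashelp.class;
  /* Comparisons and logical operators return 1 (true) or 0 (false) */
  teen_girl = (sex = 'F' and age >= 13);
  /* IN tests membership in a list of values */
  listed = name in ('Alice', 'Barbara');
  /* ** is exponentiation: BMI from pounds and inches */
  bmi = 703 * weight / height**2;
run;
```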
IF statement

data names;
  infile datalines;
  input Name $;
  datalines;
Ali
Karam
Roaa
Hala
;
run;

data titles;
  set names;
  if name="Ali" then
    do;
      title="Papa";
    end;
  else if name="Karam" then
    do;
      title="Mama";
    end;
  else
    title="Kid";
run;

SAS evaluates the expression in an IF-THEN/ELSE statement to produce a result that is either nonzero, zero, or missing. A nonzero and nonmissing result causes the expression to be true; a result of zero or missing causes the expression to be false.
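Here is a minimal sketch of that rule, using a numeric variable directly as the IF condition. The data set `truth` and its values are hypothetical:

```sas
/* Numeric values used directly as conditions */
data truth;
  length result $5;
  input x;
  if x then result = 'true';   /* nonzero and nonmissing */
  else result = 'false';       /* zero or missing        */
  datalines;
5
0
.
-2
;
run;
```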
SELECT statement

data names;
  infile datalines;
  input Name $;
  datalines;
Ali
Karam
Roaa
Hala
;
run;

data titles;
  set names;
  select (name);
    when ("Ali") title="Papa";
    when ("Karam") title="Mama";
    otherwise title="Kid";
  end;
run;

SELECT vs IF: when you have a long series of mutually exclusive conditions, using a SELECT group is more efficient than using a series of IF-THEN statements because CPU time is reduced.
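SELECT can also be used without a select-expression. In that form, each WHEN holds its own condition. This sketch uses SASHELP.CLASS, and the age cutoffs are hypothetical:

```sas
/* SELECT without a select-expression: each WHEN holds its own */
/* condition, and the first true WHEN wins                     */
data age_groups;
  set sashelp.class;
  length group $6;
  select;
    when (age <= 12) group = 'child';
    when (age <= 14) group = 'young';
    otherwise        group = 'older';
  end;
run;
```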
DO statement

The basic iterative DO statement in SAS has the syntax DO value = start TO stop. An END statement marks the end of the loop, as shown in the following examples.

data df;
  do x=1 to 3;
    y=x**2;
    output;
  end;
run;

data df;
  do x=1,2,3;
    y=x**2;
    output;
  end;
run;

data df;
  x=1;
  do while (x<4);
    y=x**2;
    output;
    x=x+1;
  end;
run;

data df;
  x=1;
  do until (x>3);
    y=x**2;
    output;
    x=x+1;
  end;
run;

DO WHILE tests its condition at the top of the loop; DO UNTIL tests it at the bottom, so its body always runs at least once.
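The iterative DO also accepts a BY increment, including a negative one. Here is a minimal sketch. `countdown` is a hypothetical name:

```sas
/* BY sets the step; a negative step counts down: x = 10, 7, 4, 1 */
data countdown;
  do x = 10 to 1 by -3;
    output;
  end;
run;
```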
Transposing data with the DATA step

Step 1: Sample data

data raw;
  infile datalines delimiter=",";
  input ID $ Name $ Try result;
  datalines;
1,Ali,1,160
1,Ali,2,140
2,Karam,1,141
2,Karam,3,161
;
run;

Step 2: Sorting by ID

proc sort data=raw;
  by id;
run;

Step 3: Transposing the data

data rawflat;
  set raw;
  by id;
  keep id Name Try1-Try3;
  retain Try1-Try3;
  array myvars {*} Try1-Try3;
  /* clear the retained values at the start of each ID */
  if first.id then
    do i = 1 to dim(myvars);
      myvars{i} = .;
    end;
  /* Try (1-3) selects which column receives the result */
  myvars{Try} = result;
  /* write one row per ID */
  if last.id;
run;
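PROC TRANSPOSE is a common procedure-based alternative to the DATA step approach above. This sketch uses the same sample data. The output name `rawflat2` is hypothetical, and PREFIX=Try makes the new columns Try1 to Try3:

```sas
/* Rebuild the sample data so the example stands alone */
data raw;
  infile datalines delimiter=",";
  input ID $ Name $ Try result;
  datalines;
1,Ali,1,160
1,Ali,2,140
2,Karam,1,141
2,Karam,3,161
;
run;

proc sort data=raw;
  by id name;
run;

/* ID Try names each new column PREFIX + value of Try (Try1, Try2, Try3); */
/* Name is constant within ID, so adding it to BY carries it along        */
proc transpose data=raw out=rawflat2 (drop=_name_) prefix=Try;
  by id name;
  id Try;
  var result;
run;
```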
Labels and formats

data Salary;
  infile datalines dsd;
  input Name $ Salary;
  format Salary dollar12.;
  label Name="First Name"
        Salary="Annual Salary";
  datalines;
Ali,45000
Karam,50000
;
run;

proc print data=Salary label;
run;

We can use the LABEL statement to assign descriptive labels to variables, and the FORMAT statement to associate formats with variables.
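Besides built-in formats such as DOLLAR12., PROC FORMAT defines custom formats. This sketch uses hypothetical salary bands; `salband` and `Salary2` are illustrative names. PUT returns the formatted text:

```sas
/* Hypothetical user-defined format: PROC FORMAT maps value ranges to labels */
proc format;
  value salband
    low-<50000 = 'Below 50K'
    50000-high = '50K and above';
run;

data Salary2;
  infile datalines dsd;
  input Name $ Salary;
  /* PUT applies the format and returns the label as text */
  band = put(Salary, salband.);
  datalines;
Ali,45000
Karam,50000
;
run;
```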
Ranking

data Score;
  infile datalines dsd;
  input Name $ Score;
  datalines;
Ali,66
Karam,82
Roaa,57
Hala,57
;
run;

proc rank data=Score out=order descending ties=low;
  var Score;
  ranks ScoreRank;
run;

proc print data=order;
  title "Rankings of Participants' Scores";
run;

The RANK procedure computes ranks for one or more numeric variables across the observations of a SAS data set and writes the ranks to a new SAS data set. With DESCENDING and TIES=LOW, Karam gets rank 1, Ali rank 2, and the tied Roaa and Hala both get rank 3.
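PROC RANK can also bin values into groups. This sketch uses SASHELP.CLASS and assigns height quartiles numbered 0 to 3. `height_quartiles` and `height_q` are hypothetical names:

```sas
/* GROUPS=4 assigns quartile numbers 0-3 instead of raw ranks */
proc rank data=sashelp.class out=height_quartiles groups=4;
  var height;
  ranks height_q;
run;
```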
Working with numeric values

General form: function-name(var1, var2, ..., varn)

Some useful functions
SUM     sum of the nonmissing arguments
MEAN    average of the nonmissing arguments
MIN     smallest value
MAX     largest value
CEIL    smallest integer greater than or equal to the argument
FLOOR   greatest integer less than or equal to the argument

Apply functions
To all variables x1 to xn:
  function-name(of x1-xn)
To all variables that begin with the same string:
  function-name(of x:)
To a special name list:
  function-name(of _ALL_)
  function-name(of _Character_)
  function-name(of _Numeric_)
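Here is a small sketch of these functions and variable lists. It uses one hypothetical row that contains a missing value, and it contrasts SUM with the + operator:

```sas
data stats;
  input x1-x4;
  total = sum(of x1-x4);     /* 1+2+4 = 7: missing values are ignored */
  plus  = x1 + x2 + x3 + x4; /* the + operator returns missing here   */
  avg   = mean(of x:);       /* x: matches x1-x4; 7/3                 */
  top   = max(of x1-x4);     /* 4                                     */
  up    = ceil(avg);         /* 3                                     */
  down  = floor(avg);        /* 2                                     */
  datalines;
1 2 . 4
;
run;
```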
Summarizing data by groups

proc sort data=sashelp.class out=Class;
  by sex;
run;

data Class (keep=sex count);
  set Class;
  by sex;
  if first.sex then
    do;
      count=0;
    end;
  count+1;
  if last.sex;
run;

In the DATA step, SAS identifies the beginning and end of each BY group by creating two temporary variables for each BY variable: FIRST.variable and LAST.variable.

Accumulating a variable using (+)

data total;
  set sashelp.buy;
  /* sum statement: TotalAmount is retained and starts at 0 */
  TotalAmount + Amount;
run;
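Here is a minimal sketch of how the sum statement treats missing values, using hypothetical amounts:

```sas
/* Hypothetical amounts, including a missing value */
data running;
  input amount;
  /* Sum statement: total is retained across rows, starts at 0, */
  /* and a missing amount leaves it unchanged                   */
  total + amount;
  datalines;
10
.
5
;
run;
```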
Working with missing values

A missing value is a valid value in SAS. A missing character value is displayed as a blank, and a missing numeric value is displayed as a period. The STDIZE procedure with the REPONLY option can be used to replace only the missing values. The METHOD= option enables you to choose different location measures such as the mean, median, and midrange. Only numeric input variables should be used in PROC STDIZE.

Step 1: Setup

/* data set */
%let mydata=sashelp.baseball;
/* vars to impute */
%let inputs=salary;
/* missing indicators */
%let MissingInd=MIsalary;

Step 2: Missing indicators

data MissingInd (drop=i);
  set &mydata;
  array mi{*} &MissingInd;
  array x{*} &inputs;
  do i=1 to dim(mi);
    mi{i}=(x{i}=.);
  end;
run;

Step 3: Impute missing values

proc stdize data=MissingInd
    reponly
    method=median
    out=imputed;
  var &inputs;
run;
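The NMISS and CMISS functions count missing arguments per row. This sketch uses hypothetical data; `miss_count`, `n_num` and `n_all` are illustrative names:

```sas
/* Hypothetical rows; in list input a single period is read as */
/* missing for both numeric and character variables            */
data miss_count;
  input a b c $;
  n_num = nmiss(a, b);     /* counts missing numeric arguments      */
  n_all = cmiss(a, b, c);  /* counts missing numeric or character   */
  datalines;
1 . x
. . .
;
run;
```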
Last observation carried forward (LOCF)

One method of handling missing data is simply to impute, or fill in, values based on existing data.

Step 1: Sample data

data raw;
  infile datalines delimiter=",";
  input ID $ LastTry;
  datalines;
1,160
1,.
1,140
2,141
2,.
;
run;

Step 2: Sorting by ID

proc sort data=raw;
  by id;
run;

Step 3: Compute LOCF

data rawflat;
  set raw;
  by id;
  retain LOCF;
  * set to missing to avoid being carried to the next ID;
  if first.id then
    do;
      LOCF=.;
    end;
  * update from the last nonmissing value;
  if LastTry not EQ . then
    LOCF=LastTry;
run;

For this sample, LOCF takes the values 160, 160, 140, 141, 141.
Working with character values

Some useful functions

SUBSTR(char, position, n)
  Extracts a substring from an argument.
  phone = "(01573) 3114800";
  area = substr(phone, 2, 5);          /* "01573" */

SCAN(char, n, 'delimiters')
  Returns the nth word from a string.
  Name = "Dr. Ali Ajouz";
  title = scan(Name, 1, ".");          /* "Dr" */

INDEX(source, excerpt)
  Searches a character expression for a string and returns its starting position.
  INDEX('Ali Ajouz', 'Ajo');           /* 5 */

COMPRESS(char, 'characters')
  Removes characters from a string.
  phone = "(01573)- 3114800";
  phone_c = compress(phone, '(-) ');   /* "015733114800" */
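This sketch collects the examples above in one DATA step, so the results can be checked with PROC PRINT. `char_demo` is a hypothetical name, and `given` adds a second SCAN call with two delimiters:

```sas
data char_demo;
  length area title given $10;
  phone   = "(01573) 3114800";
  area    = substr(phone, 2, 5);          /* "01573"                */
  Name    = "Dr. Ali Ajouz";
  title   = scan(Name, 1, ".");           /* "Dr"                   */
  given   = scan(Name, 2, ". ");          /* "Ali": '.' and ' ' both delimit */
  pos     = index('Ali Ajouz', 'Ajo');    /* 5                      */
  phone2  = "(01573)- 3114800";
  phone_c = compress(phone2, '(-) ');     /* "015733114800"         */
run;
```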
Working with character values (2)

Some useful functions

TRANWRD(source, target, replacement)
  Replaces or removes all occurrences of a given word within a string.
  old = "I eat Apple";
  new = TRANWRD(old, "Apple", "Pizza");   /* "I eat Pizza" */

CATX(sep, str1, str2, ...)
  Removes leading and trailing blanks, inserts delimiters, and returns a concatenated character string.
  First = "Ali";
  Last = "Ajouz";
  UserName = catx(".", First, Last);       /* "Ali.Ajouz" */

STRIP(char)
  Removes leading and trailing blanks.
  STRIP(" aliajouz@gmail.com ");

Also UPCASE for uppercase and LOWCASE for lowercase.
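This sketch collects these examples in one DATA step. `char_demo2` and the extra UPCASE/LOWCASE columns are hypothetical:

```sas
data char_demo2;
  old      = "I eat Apple";
  new      = tranwrd(old, "Apple", "Pizza");    /* "I eat Pizza"      */
  First    = "Ali";
  Last     = "Ajouz";
  UserName = catx(".", First, Last);            /* "Ali.Ajouz"        */
  email    = strip("  aliajouz@gmail.com  ");   /* no leading blanks  */
  upper    = upcase(UserName);                  /* "ALI.AJOUZ"        */
  lower    = lowcase(UserName);                 /* "ali.ajouz"        */
run;
```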


SAS dates and times

SAS DATE VALUE: a value that represents the number of days between January 1, 1960, and a specified date.

Date                 SAS date value
December 31, 1959    -1
January 1, 1960       0
January 2, 1960       1

Converting a text date to a SAS date using informats

INPUT           INFORMAT
04/11/1980      mmddyy10.
04/11/80        mmddyy8.
04nov1980       date9.
nov 04, 1980    no WORDDATE informat exists (WORDDATEw. is a display format); try ANYDTDTEw. and check the result

Refer to the SAS documentation for all available informats.

Raw date to SAS date

data Birthday;
  infile datalines;
  input date ddmmyy10.;
  datalines;
04/11/1980
;
run;
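This sketch checks the day-count definition above. It converts a text date with INPUT and an informat, then converts it back to text with PUT. `date_demo` is a hypothetical name:

```sas
data date_demo;
  d1 = '31DEC1959'd;                      /* -1                       */
  d2 = '01JAN1960'd;                      /*  0                       */
  bday = input('04/11/1980', ddmmyy10.);  /* 4 November 1980 = 7613   */
  bday_text = put(bday, date9.);          /* "04NOV1980"              */
run;
```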
Working with dates and times

Some useful functions
DATEPART(datetime)   extracts the date from a SAS datetime value.
TIMEPART(datetime)   extracts a time value from a SAS datetime value.
DAY(date)            returns the day of the month from a SAS date value.
MONTH(date)          returns the month from a SAS date value.
YEAR(date)           returns the year from a SAS date value.
DATE() / TODAY()     returns the current date.

Example

data Me (drop=DateTime);
  format Date ddmmyy10. Time time.;
  DateTime='04Nov80:12:15'dt;
  Date=datepart(DateTime);
  Time=timepart(DateTime);
  Day=day(Date);
  Month=month(Date);
  Year=year(Date);
run;
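INTCK and INTNX, which are not listed above, work with date intervals. This sketch computes an age in completed years at a fixed, hypothetical reference date. `age_demo` is an illustrative name:

```sas
data age_demo;
  birthday = mdy(11, 4, 1980);   /* 04NOV1980, built from month, day, year       */
  as_of    = '17APR2017'd;       /* fixed date; use today() for the current date */
  /* 'c' (continuous) counts only completed years */
  age = intck('year', birthday, as_of, 'c');        /* 36        */
  /* 'b': first day of the month after the birthday */
  next_month = intnx('month', birthday, 1, 'b');    /* 01DEC1980 */
  format birthday as_of next_month date9.;
run;
```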
Concatenating data sets (rows)

We can use the SET statement to combine data sets vertically.

data result;
  set data1 data2;
run;
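This sketch uses two hypothetical tables that do not share all their columns. SET stacks the rows and fills the absent column with missing values:

```sas
/* Hypothetical input tables */
data data1;
  input ID $ x;
  datalines;
A 1
B 2
;
run;

data data2;
  input ID $ y;
  datalines;
C 3
;
run;

/* Rows of data2 follow rows of data1; a column missing from */
/* one input (x or y) is set to missing in that input's rows */
data result;
  set data1 data2;
run;
```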
Concatenating data sets (columns)

We can use the MERGE statement to combine data sets horizontally (one-to-one). Without a BY statement, rows are paired by position.

data result;
  merge data1 data2;
run;
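Merging data sets on a common column

We can use the IN= option in the MERGE statement with an IF statement to control the output. Both data sets must be sorted (or indexed) by the BY variable.

data result;
  merge data1(in=InData1)
        data2(in=InData2);
  by ID;
  /* code here */
run;

Replace /* code here */ with one of:
if InData1=1 and InData2=0;   /* rows found only in data1 */
if InData1=0 and InData2=1;   /* rows found only in data2 */
if InData1=1 and InData2=1;   /* rows found in both */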
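This self-contained sketch uses two hypothetical tables and keeps only the IDs that appear in both (the equivalent of an inner join):

```sas
/* Hypothetical tables, already sorted by ID */
data data1;
  input ID $ x;
  datalines;
A 1
B 2
;
run;

data data2;
  input ID $ y;
  datalines;
B 20
C 30
;
run;

/* Keep only IDs present in both inputs (an inner join) */
data both;
  merge data1(in=InData1)
        data2(in=InData2);
  by ID;
  if InData1=1 and InData2=1;
run;
```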


To be continued...

I welcome comments,
suggestions and corrections.
aliajouz@gmail.com
