SAS Training
PROC FREQ
Produces one-way and n-way frequency tables, and concisely describes the data by reporting
the distribution of variable values
Creates crosstabulation tables that summarize data for two or more categorical variables by
showing the number of observations for each combination of variable values
Can include many statements and options for controlling frequency output
By default, PROC FREQ creates a one-way table with the frequency, percent, cumulative
frequency, and cumulative percent of every value of all variables in a data set
Syntax
Proc Freq Data = <SAS-data-set>;
Run;
Where,
Example:
Here,
In the above program FREQ procedure creates a frequency table for each variable in
the data set Parts.Widgets
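A minimal sketch of the step described above, assuming the libref Parts has already been assigned with a LIBNAME statement:

```sas
/* No TABLES statement: one-way tables for every variable in Parts.Widgets */
proc freq data=parts.widgets;
run;
```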
To specify the variables to be processed by the FREQ procedure, include a TABLES statement
Syntax:
Proc Freq Data = <SAS-data-set> ;
Tables variable(s);
Run;
Where,
Example:
In the above program FREQ procedure creates a frequency table for variables rate and
months in the data set finance.loans
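The step described above can be sketched as follows, assuming the libref Finance is already assigned:

```sas
/* One-way frequency tables for Rate and Months only */
proc freq data=finance.loans;
   tables rate months;
run;
```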
Output (the Frequency and Cumulative Frequency values are not available):

Rate      Percent   Cumulative Percent
9.50%     22.22     22.22
9.75%     11.11     33.33
10.00%    22.22     55.56
10.50%    44.44     100.00

Months    Percent   Cumulative Percent
12        11.11     11.11
24        11.11     22.22
36        11.11     33.33
48        11.11     44.44
60        22.22     66.67
360       33.33     100.00
To create a two-way table, join two variables with an asterisk (*) in the TABLES statement of a
PROC FREQ step
Syntax:
Proc Freq Data = <SAS-data-set>;
Tables variable-1 * variable-2 <* ... variable-n>;
Run;
Where,
Example:
The above program creates the two-way table for variables weight and height
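A sketch of such a two-way request. The data set clinic.diabetes is an assumption borrowed from a later example in this document; the original example's data set is not shown.

```sas
/* Weight values form the rows, Height values form the columns */
proc freq data=clinic.diabetes;
   tables weight*height;
run;
```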
When three or more variables are joined with asterisks, a series of two-way tables is produced, with a table for each level of the other variables
Example:
Here,
The above program will produce two crosstabulation tables, one for each value of Sex.
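A request of this kind might look like the sketch below. The data set name and the choice of Weight and Height as the last two variables are assumptions; only Sex is named in the text.

```sas
/* Sex is the stratifying variable: one Weight-by-Height table per value of Sex */
proc freq data=clinic.diabetes;
   tables sex*weight*height;
run;
```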
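To control the depth of crosstabulation results, add a slash (/) and any combination of the
following options to the TABLES statement:
NOFREQ suppresses cell frequencies
NOPERCENT suppresses cell percentages
NOROW suppresses row percentages
NOCOL suppresses column percentages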
Example:
proc freq data = clinic.diabetes;
tables sex*weight / nofreq norow nocol;
run;
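Here,
The NOFREQ, NOROW and NOCOL options suppress the cell frequencies, row percentages and
column percentages, so each cell shows only its percentage of the total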
PROC MEANS
Provides mean, minimum, maximum and other data summarization tools, as well as helpful
options for controlling the output
Syntax:
Proc Means <DATA=SAS-data-set> <statistic-keyword(s)> <option(s)>;
Run;
Where,
Example:
Here,
PROC MEANS prints the n-count (number of nonmissing values), the mean, the standard
deviation, and the minimum and maximum values of every numeric variable in the data set
perm.survey
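A minimal sketch of the default request described above, assuming the libref Perm is already assigned:

```sas
/* Default statistics (N, MEAN, STD, MIN, MAX) for all numeric variables */
proc means data=perm.survey;
run;
```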
Specifying Statistics:
To specify statistics, include statistic keywords as options in the PROC MEANS statement
When a statistic is specified in the PROC MEANS statement, default statistics are not produced
Example
Means procedure prints only median and range for all the numeric variables
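A sketch of such a request. The data set perm.survey is reused from the earlier example as an assumption, since the original example's data set is not shown.

```sas
/* Only the requested statistics are printed; the defaults are dropped */
proc means data=perm.survey median range;
run;
```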
The following keywords can be used with PROC MEANS to compute statistics:
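Descriptive Statistics
Keyword             Description
CLM                 Two-sided confidence limit for the mean
CSS                 Corrected sum of squares
CV                  Coefficient of variation
KURTOSIS / KURT     Kurtosis
LCLM                One-sided confidence limit below the mean
MAX                 Maximum value
MEAN                Average
MIN                 Minimum value
NMISS               Number of observations with missing values
RANGE               Range
SKEWNESS / SKEW     Skewness
STDDEV / STD        Standard deviation
STDERR / STDMEAN    Standard error of the mean
SUM                 Sum
SUMWGT              Sum of the WEIGHT variable values
UCLM                One-sided confidence limit above the mean
USS                 Uncorrected sum of squares
VAR                 Variance

Quantile Statistics
Keyword             Description
MEDIAN / P50        Median or 50th percentile
P1                  1st percentile
P5                  5th percentile
P10                 10th percentile
Q1 / P25            Lower quartile or 25th percentile
Q3 / P75            Upper quartile or 75th percentile
P90                 90th percentile
P95                 95th percentile
P99                 99th percentile
QRANGE              Difference between the upper and lower quartiles (Q3 - Q1)

Hypothesis Testing
Keyword             Description
PROBT               Probability of a greater absolute value for the t value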
By default, the MEANS procedure generates statistics for every numeric variable in a data set
To specify the variables that PROC MEANS analyzes, add a VAR statement and list the variable
names
Syntax:
Proc Means Data = <SAS-data-set> <statistic-keyword(s)> <option(s)>;
Var variable(s);
Run;
Where,
Example:
Here,
The means procedure will calculate the result for age, height and weight only.
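A sketch of the VAR statement in use. The data set clinic.diabetes is an assumption; the text names only the three analysis variables.

```sas
/* Statistics are computed for Age, Height and Weight only */
proc means data=clinic.diabetes;
   var age height weight;
run;
```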
To produce separate analyses of grouped observations, add a CLASS statement to the MEANS
procedure
PROC MEANS does not generate statistics for CLASS variables, because their values are used only to
categorize data
CLASS variables can be either character or numeric, but they should contain a limited number of
discrete values that represent meaningful groupings
Syntax:
Proc Means Data = <SAS-data-set> <statistic-keyword(s)> <option(s)>;
Class variable(s);
Run;
Where,
Example:
Here,
The output of the program shown above is categorized by values of the variables
Survive and Sex.
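A sketch of CLASS processing. The data set name clinic.heart is hypothetical; the text names only the CLASS variables Survive and Sex.

```sas
/* Statistics are reported for each combination of Survive and Sex;
   the input data does not need to be sorted */
proc means data=clinic.heart;
   class survive sex;
run;
```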
The BY statement can also be used to produce separate analyses of grouped observations
Syntax:
Proc Means Data = <SAS-data-set> <statistic-keyword(s)> <option(s)>;
By variable(s);
Run;
Where,
Example:
Here,
The output of the program shown above is categorized by values of the variables
Survive and Sex.
Unlike CLASS processing, BY processing requires that the data is already sorted or indexed in
the order of the BY variables
BY group results have a layout that is different from the layout of CLASS group results.
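Because BY processing needs sorted input, a PROC SORT step usually comes first. A sketch, with clinic.heart and work.heartsort as hypothetical data set names:

```sas
/* Sort a copy of the data in BY-variable order */
proc sort data=clinic.heart out=work.heartsort;
   by survive sex;
run;

/* One separate report section is produced per BY group */
proc means data=work.heartsort;
   by survive sex;
run;
```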
Create an output SAS data set that contains only the summarized variable
Syntax:
Proc Means Data = <SAS-data-set> <statistic-keyword(s)> <option(s)>;
Output Out = SAS-data-set <statistic-keyword = variable-name(s)>;
Run;
Where,
SAS-data-set in the OUTPUT statement specifies the name of the output data set
variable-name(s) specifies the names of the variables that will be created to contain
the values of the summary statistic. These variables correspond to the analysis
variables that are listed in the VAR statement.
Example:
The above program creates a typical PROC MEANS report and also creates a
summarized output data set that includes only the MEAN and MIN statistics
Output data set (it also contains the Obs, Sex and _TYPE_ columns, whose values are not available, as is the third _FREQ_ value):

_FREQ_   AvgAge    AvgHeight   AvgWeight   MinAge   MinHeight   MinWeight
20       46.7000   66.9500     174.650     15       61          102
11       48.9091   63.9091     150.455     16       61          102
         44.0000   70.6667     204.222     15       66          140
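An output data set with these variable names could be produced by a step like the sketch below. The input data set clinic.diabetes and the output name work.sum_gender are assumptions; the variable names come from the output shown above.

```sas
proc means data=clinic.diabetes;
   var age height weight;
   class sex;
   /* Each statistic keyword is followed by one new name per VAR variable */
   output out=work.sum_gender
      mean=AvgAge AvgHeight AvgWeight
      min=MinAge MinHeight MinWeight;
run;
```

With one CLASS variable, the output data set holds an overall row (_TYPE_=0) plus one row for each value of Sex.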
PROC SUMMARY
The difference between PROC MEANS and PROC SUMMARY is that PROC MEANS produces a report by default.
By contrast, to produce a report in PROC SUMMARY, you must include the PRINT option in the PROC
SUMMARY statement.
Syntax:
Proc Summary Data = <SAS-data-set> <statistic-keyword(s)> <option(s)>;
Run;
Where,
Example:
The above program creates an output data set but does not create a report
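A sketch of such a step, using hypothetical data set names:

```sas
/* No PRINT option: the output data set is created but no report is printed */
proc summary data=clinic.diabetes;
   var age height weight;
   class sex;
   output out=work.sum_gender mean=AvgAge AvgHeight AvgWeight;
run;
```

Adding PRINT to the PROC SUMMARY statement would also produce the report.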
Syntax:
ODS open-destination;
ODS close-destination CLOSE;
Where,
open-destination is a keyword and any required options for the type of output that is to be
created, such as
HTML FILE='html-file-pathname'
LISTING
Example:
Here,
The ods html statement creates an HTML output of the name mydata.html in the path
specified.
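A sketch of the open/close pattern. The original path is not shown, so a bare file name is used here, and work.mydata is a hypothetical data set.

```sas
ods html file='mydata.html';   /* prefix a directory path as needed */
proc print data=work.mydata;
run;
ods html close;                /* closing the destination completes the file */
```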
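ODS Destinations:
The table that follows lists the ODS destinations that are supported.
This destination    Produces
HTML                output that is formatted in Hypertext Markup Language
Listing             plain text output that is formatted like traditional SAS procedure output
ODS Document        a hierarchy of output objects that can be replayed without rerunning the procedures
Output              SAS data sets
Printer Family      output that is formatted for a high-resolution printer, such as PostScript, PDF, or PCL files
RTF                 Rich Text Format output for use with Microsoft Word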
The keyword _ALL_ is used in the ODS CLOSE statement to close all open destinations
concurrently
Syntax:
ODS open-destination1;
ODS open-destination2;
ODS _all_ CLOSE;
Where,
open-destination1 is a keyword and any required options for the first type of output
that is to be created
open-destination2 is a keyword and any required options for the second type of output
that is to be created
Example:
Here,
The ods html statement creates an HTML output of the name admit.html in the path
specified
The ods pdf statement creates a PDF output of the name admit.pdf in the path specified
FILE= can also be used to specify the file that contains the HTML output. FILE= is an
alias for BODY=.
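A sketch of two destinations closed with one statement. The file names come from the text; the directory is omitted because the original path is not shown, and clinic.admit is a hypothetical data set.

```sas
ods html file='admit.html';
ods pdf  file='admit.pdf';
proc print data=clinic.admit;
run;
ods _all_ close;   /* closes HTML, PDF and any other open destination */
```

After ODS _ALL_ CLOSE no destination is open, so one (for example LISTING) has to be reopened before later output can be viewed.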
The ODS HTML statement can also be used to direct the results from multiple procedures to the same
HTML file
Syntax:
ODS open-destination;
Procedure1
Procedure2
ODS close-destination CLOSE;
Where,
open-destination is a keyword and any required options for the type of output that is to
be created
Example:
Here,
The program above generates HTML output for the PRINT and TABULATE procedures
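A sketch of two procedures writing to one HTML file. The file name, data set and TABULATE layout are assumptions; only the procedure names come from the text.

```sas
ods html file='reports.html';
proc print data=clinic.admit;
run;
proc tabulate data=clinic.admit;
   class sex;
   var weight;
   table sex, weight*mean;   /* mean Weight for each value of Sex */
run;
ods html close;
```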
Users can create custom formats to apply to variables. For example, we can format
a product number so that it is displayed as descriptive text
The FORMAT procedure can be used to create user-defined formats for variables
Syntax:
PROC FORMAT <options>;
VALUE format-name range-1 = 'label-1' <... range-n = 'label-n'>;
RUN;
Where,
options include:
Library= libref, specifies the libref for a SAS data library that contains a
permanent catalog in which user-defined formats are stored
Fmtlib, prints the contents of a format catalog
format-name names the format that is being created
range specifies one or more variable values and a character string or an existing format
label is a text string enclosed in quotation marks
When PROC FORMAT is used to create a format, the format is stored in a format catalog
If the SAS data library does not already contain a format catalog, SAS automatically creates one
If LIBRARY= option is not specified, then the formats are stored in a default format catalog
named Work.Formats
Formats are stored in a permanent format catalog named Formats when we specify the
LIBRARY= option in the PROC FORMAT statement
PROC FORMAT LIBRARY=libref;
A LIBNAME statement is needed to associate the libref with the permanent SAS data library in
which the format catalog is to be stored
It is recommended, but not required, to use the word Library as the libref when creating our own
permanent formats
libname library 'c:\sas\formats\lib';
Example:
(Without Format)
FirstName   LastName   JobTitle   Salary
Donny       Evans      112        29996.63
Lisa        Helms      105        18567.23
John        Higgins    111        25309.00
Amy         Larson     113        32696.78
Mary        Moore      112        28945.89
Jason       Powell     103        35099.50
Here,
The values for JobTitle are coded, and they are not easily interpreted
Using proc format we can create a format for this variable which describes the values of
this variable
The user defined format JOBFMT is used for formatting a variable called jobtitle
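Once the format exists, it is applied with a FORMAT statement. A sketch, assuming the data shown above is stored in a hypothetical data set work.employees and JOBFMT has already been created:

```sas
proc print data=work.employees;
   /* A period follows the format name when the format is referenced */
   format jobtitle jobfmt.;
run;
```

SAS looks for user-defined formats in Work.Formats and then in Library.Formats, which is one reason the libref Library is convenient.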
Output:
(With Format)
FirstName   LastName   JobTitle                  Salary
Donny       Evans      technical writer          29996.63
Lisa        Helms      text processor            18567.23
John        Higgins    assoc. technical writer   25309.00
Amy         Larson     senior technical writer   32696.78
Mary        Moore      technical writer          28945.89
Jason       Powell     manager                   35099.50
Example:
Here,
Format is created for character variable ( $ sign before the format name)
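The original character-format example is not shown. A minimal sketch with a hypothetical format name ($grade) and hypothetical values:

```sas
proc format;
   /* The $ prefix marks a format for character values;
      the ranges are quoted character strings */
   value $grade
      'A'   = 'Excellent'
      'B'   = 'Good'
      other = 'Needs review';
run;
```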
proc format lib= library;
value jobfmt
103='manager'
105='text processor'
111='assoc. technical writer'
112='technical writer'
113='senior technical writer';
run;
Here,
Format is created for numeric variable ( no $ sign before the format name)
To define several formats, use multiple VALUE statements in a single PROC FORMAT step
START   END   LABEL
103     103   manager
105     105   text processor
111     111   assoc. technical writer
112     112   technical writer
113     113   senior technical writer
Proc Transpose
Syntax
PROC TRANSPOSE <DATA=input-data-set> <LABEL=label> <LET>
<NAME=name> <OUT=output-data-set> <PREFIX=prefix>;
BY <DESCENDING> variable-1 <...<DESCENDING> variable-n>;
COPY variable(s);
ID variable;
VAR variable(s);
Run;
where,
LABEL= assigns a name to the variable that contains the label of the variable being transposed
NAME= assigns a name to the variable that contains the name of the variable being
transposed
PREFIX= assigns the prefix for the transposed variables. The default is COL, which produces
COL1, COL2, COL3, etc
ID uses the values of the listed variable as the names of the transposed variables
Example:
proc transpose data=long1 out=wide1 prefix=faminc;
by famid;
id year;
var faminc;
run;

Original Dataset (Obs and famid values are not available; each block of three rows is one famid group)
year   faminc
96     40000
97     40500
98     41000

96     45000
97     45400
98     45800

96     75000
97     76000
98     77000

Result Dataset (Obs and famid values are not available)
_NAME_   faminc96   faminc97   faminc98
faminc   40000      40500      41000
faminc   45000      45400      45800
faminc   75000      76000      77000
Example:
Result Dataset (Obs and famid values are not available)
family   faminc96   faminc97   faminc98
faminc   40000      40500      41000
faminc   45000      45400      45800
faminc   75000      76000      77000
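A result in which the _NAME_ column is called family is consistent with adding the NAME= option. A sketch under that assumption, using the same long1 input (the output name wide2 is hypothetical):

```sas
proc transpose data=long1 out=wide2 prefix=faminc name=family;
   by famid;     /* long1 must be sorted by famid */
   id year;      /* year values become the suffixes 96, 97, 98 */
   var faminc;
run;
```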
Exporting Data
Export Using SAS GUI:
A SAS data set can be exported as an external file of various types, such as:
Excel (.xls)
SAS dataset (.sas7bdat)
Text (.txt)
CSV (.csv)
HTML (.html)
Microsoft Access Files (.mdb)
Export Using PROC EXPORT:
Data=SAS-data-set :- identifies the input SAS data set with either a one- or two-level
SAS name (library and member name)
Outfile="filename" :- specifies the complete path and filename of the output PC file,
spreadsheet, or delimited external file
Outtable="tablename" :- specifies the table name of the output DBMS table
DBMS=identifier :- specifies the type of data to export. For example, DBMS=DBF
specifies to export a dBASE file, DBMS=ACCESS exports a Microsoft Access table
REPLACE :- overwrites an existing file
Delimiter='character' :- if DBMS=DLM, then the delimiting character should
be specified
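A sketch of a CSV export built from the options listed above. The data set work.sales and the output file name are hypothetical.

```sas
proc export data=work.sales
   outfile='sales.csv'   /* use a complete path in practice */
   dbms=csv
   replace;              /* overwrite the file if it already exists */
run;
```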
To use a SAS function, specify the function name followed by the function arguments, which are
enclosed in parentheses
Even if the function does not require arguments, the function name must still be followed by
parentheses
Unless the length of the target variable has been previously defined, a default length is assigned
Syntax:
function-name (argument-1 <, ...argument-n>);
where,
arguments can be
variables, such as x, y, z
constants, such as 456, 502, 612, 498
expressions, such as 37*2, 192/5, mean(22,34,56)
Example:
std(x1,x2,x3) ;
Sum Function
Syntax:
sum( argument , argument,...)
where,
Example:
Data work.after;
Set work.before;
totalsal = sum (sal1,sal2,sal3);
Run;
Here,
The above program calculates the sum of the values in sal1, sal2 and sal3 variables.
MEAN Function
Syntax:
mean(argument, argument,...)
Example:
Data work.after;
Set work.before;
avg = mean (marks1,marks2,marks3);
Run;
Here,
The above program calculates the average of the values in marks1, marks2 and marks3
variables.
MIN Function
Syntax:
min(argument, argument,...)
Example:
Data work.after;
Set work.before;
minimum =min (marks1,marks2,marks3);
Run;
Here,
The above program finds the minimum of the values in marks1, marks2 and marks3
variables.
MAX Function
Syntax:
max(argument, argument,...)
where,
Example:
Data work.after;
Set work.before;
maximum =max (marks1,marks2,marks3);
Run;
Here,
The above program finds the maximum of the values in marks1, marks2 and marks3
variables.
VAR Function
Syntax:
var(argument, argument,...)
where,
Example:
Data work.after;
Set work.before;
variance = var (s1, s2, s3);
Run;
Here,
The above program calculates the variance of the values in the s1, s2 and s3 variables.
STD Function
Syntax:
std(argument, argument,...)
where,
Example
Data work.after;
Set work.before;
stdev =std (s1, s2, s3);
Run;
Here,
The above program calculates the standard deviation of the values in the s1, s2 and s3
variables.
INPUT Function
Syntax:
INPUT (source, informat);
Where,
informat is the numeric informat to be specified. When choosing the informat, be sure
to select a numeric informat that can read the form of the values.
Example
Data hrd.newtemp;
Set hrd.temp;
Test=input(saletest,comma9.);
Run;
Here,
The function uses the numeric informat COMMA9. to read the values of the character
variable SaleTest. Then the resulting numeric values are stored in the variable Test.
Character Value   Informat
2115233           7.
2,115,233         COMMA9.
PUT Function
Format specified in the PUT function must match the data type of the source
Syntax:
PUT(source,format) ;
Where,
Numeric formats right-align the result; character formats left-align the result.
If you use the PUT function to create a variable that has not been previously identified,
it creates a character variable whose length is equal to the format width.
Example
data hrd.newtemp;
set hrd.temp;
Assignment = put (site,2.) || '/' || dept;
run;
Here,
Put function converts the data type of site variable into character data type.
After that the value is concatenated and saved in the new variable assignment.
YEAR Function
Syntax:
YEAR (date);
Where,
date is a SAS date value that is specified either as a variable or as a SAS date constant
Example
Data hrd.temp98;
Set hrd.temp;
yr = year(startdate);
Run;
Here,
The YEAR function extracts the year portion from the date variable startdate and saves it
in the new variable yr.
QTR Function
Syntax:
QTR (date) ;
Where,
date is a SAS date value that is specified either as a variable or as a SAS date
constant.
Example
Data hrd.temp98;
Set hrd.temp;
quarter = qtr(startdate);
Run;
Here,
The QTR function extracts the quarter from the date variable startdate and saves
it in the new variable quarter.
MONTH Function
Syntax:
MONTH (date) ;
where,
date is a SAS date value that is specified either as a variable or as a SAS date
constant.
Example
data hrd.nov99;
set hrd.temp;
mn = month(startdate);
Run;
Here,
The MONTH function extracts the month from the startdate variable and saves it in the
new variable mn.
DAY Function
Syntax:
DAY (date);
Where,
date is a SAS date value that is specified either as a variable or as a SAS date constant
Example:
data hrd.nov99;
set hrd.temp;
days = day(date);
Run;
Here,
The DAY function extracts the day of the month from the date variable and saves it in the new
variable days.
WEEKDAY Function
Syntax:
WEEKDAY (date) ;
where,
date is a SAS date value that is specified either as a variable or as a SAS date constant
Example
data hrd.nov99;
set hrd.temp;
weekday = weekday(date);
Run;
Here,
The WEEKDAY function extracts the day of the week from the date variable and saves
it in the new variable weekday.
The WEEKDAY function returns a numeric value from 1 to 7. The values represent the days of the
week.
Value   equals
1       Sunday
2       Monday
3       Tuesday
4       Wednesday
5       Thursday
6       Friday
7       Saturday
MDY Function
Creates a SAS date value from numeric values that represent the month, day, and year
Syntax:
MDY (month, day, year)
Where,
month can be a variable that represents the month, or a number from 1-12
day can be a variable that represents the day, or a number from 1-31
year can be a variable that represents the year, or a number that has 2 or 4 digits.
Example:
Here,
A new variable date will be created by combining the values in the variables month,
day and year using the mdy function.
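A sketch of the MDY call described above. The data set names are hypothetical, and work.before is assumed to contain the numeric variables month, day and year.

```sas
data work.after;
   set work.before;
   date = mdy(month, day, year);   /* builds a SAS date value */
   format date date9.;             /* display the value as a readable date */
run;
```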
DATE and TODAY Functions
Return the current date from the system clock as a SAS date value
Syntax:
DATE()
TODAY()
These functions require no arguments, but they must still be followed by parentheses.
Example
data hrd.newtemp;
set hrd.temp;
EditDate = date();
run;
Here,
The DATE function returns the current system date and stores it in the new variable EditDate.
TIME Function
Syntax:
time ( );
Example:
data hrd.newtemp;
set hrd.temp;
starttime = time();
run;
Here,
The TIME function returns the current system time and stores it in the new variable starttime.
INTCK Function
Returns the number of time intervals that occur in a given time span
Counts intervals from fixed interval beginnings, not in multiples of an interval unit from the from
value
For example :
WEEK intervals are counted by Sundays rather than seven-day multiples from the from
argument
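Syntax:
INTCK ('interval', from, to)
Where,
'interval' specifies a character constant or variable. The value can be one of the following:
DAY         DTMONTH
WEEKDAY     DTWEEK
WEEK        HOUR
TENDAY      MINUTE
SEMIMONTH   SECOND
MONTH
QTR
SEMIYEAR
YEAR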
Example:
Data work.anniv20;
Set flights.mechanics (KEEP=id lastname firstname hired);
Years = INTCK ('year', hired, today());
If years=20 and Month (hired) = Month (TODAY());
Run;
Proc Print Data = work.anniv20;
Run;
Here,
The program identifies mechanics whose 20th year of employment occurs in the
current month
It uses the INTCK function to compare the value of the variable Hired to the date on
which the program is run.
INTNX Function:
Applies multiples of a given interval to a date, time, or datetime value and returns the resulting
value
Syntax:
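INTNX ('interval', start-from, increment <, 'alignment'>)
Where,
'interval' specifies a character constant or variable. The value can be one of the following:
DAY         DTMONTH
WEEKDAY     DTWEEK
WEEK        HOUR
TENDAY      MINUTE
SEMIMONTH   SECOND
MONTH
QTR
SEMIYEAR
YEAR
increment specifies a negative or positive integer that represents time intervals toward the past or
future
'alignment' (optional) controls the position of the returned date within the interval:
BEGINNING (B)
MIDDLE (M)
END (E)
SAMEDAY (S)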
Example:
Counting five months forward from a January date returns a value in June, but the returned value depends
on whether alignment specifies the beginning, middle, or end day of the resulting
month.
If alignment is not specified, the beginning day is returned by default.
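The effect of the alignment argument can be sketched as follows. The start date and variable names are illustrative.

```sas
data _null_;
   /* Five MONTH intervals after 01JAN1995 land in June 1995 */
   begin  = intnx('month', '01jan1995'd, 5, 'b');   /* first day of June */
   middle = intnx('month', '01jan1995'd, 5, 'm');   /* middle of June    */
   finish = intnx('month', '01jan1995'd, 5, 'e');   /* last day of June  */
   put begin= date9. middle= date9. finish= date9.;
run;
```

The three values are written to the log in DATE9. form.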
DATEPART Function
Syntax:
Datepart (variable);
where,
Example
data hrd.newtemp;
set hrd.temp;
Date = datepart(saledate);
run;
Here,
Datepart function extracts the date portion from saledate, which is in date and time
format, and save it in new variable date .
DATDIF Functions
Syntax:
DATDIF (start_date, end_date, basis)
where,
basis specifies a character constant or variable that describes how SAS calculates the
date difference.
Example
data hrd.newtemp;
set hrd.temp;
date= DATDIF(sdate, edate, 'ACT/ACT');
run;
Here,
DATDIF function gives the difference between two dates in number of days.
YRDIF Function
Accept start dates and end dates that are specified as SAS date values
Use a basis argument that describes how SAS calculates the date difference
Syntax
YRDIF ( start_date , end_date , basis )
where,
basis specifies a character constant or variable that describes how SAS calculates the
date difference.
Example:
data hrd.newtemp;
set hrd.temp;
date= YRDIF (sdate, edate, 'ACT/ACT');
run;
Here,
YRDIF function gives the difference between the two dates in number of years.
There are two character strings that are valid for basis in the DATDIF function and four character
strings that are valid for basis in the YRDIF function. These character strings and their meanings
are listed in the table below.
Character String   Meaning                                   Valid In DATDIF   Valid In YRDIF
'30/360'           30-day months and a 360-day year          yes               yes
'ACT/ACT'          actual number of days                     yes               yes
'ACT/360'          actual number of days and a 360-day year  no                yes
'ACT/365'          actual number of days and a 365-day year  no                yes
SCAN Function:
Enables you to separate a character value into words and to return a specified word
Uses delimiters, which are characters that are specified as word separators, to separate a
character string into words
Can specify as many delimiters as needed to correctly separate the character expression
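Syntax:
SCAN (argument, n, <delimiters>)
where,
argument specifies the character variable or expression to scan
n specifies which word to return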
delimiters are special characters that must be enclosed in single quotation marks (' ').
Example:
It creates three variables to store the employee's first name, middle name & last name
which is stored in a variable called name
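A sketch of the step described above. The data set names, the variable lengths and the assumption that name holds "first middle last" separated by blanks are all illustrative.

```sas
data work.names;
   set work.employees;
   length FirstName MiddleName LastName $ 20;   /* explicit lengths for the new variables */
   FirstName  = scan(name, 1, ' ');
   MiddleName = scan(name, 2, ' ');
   LastName   = scan(name, 3, ' ');
run;
```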
SUBSTR Function:
When the function is on the right side of an assignment statement, the function returns the
requested string
When the function is on the left side of an assignment statement, the function is used to modify
variable values
Syntax:
SUBSTR (argument, position, <n>)
Where,
Example:
It extract the first letter of the MiddleName value to create the new variable MiddleInitial.
SUBSTR function is best used when the exact position of the substring that is to be extracted from
the character value is known
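A sketch of the right-side use described above, with hypothetical data set names:

```sas
data work.initials;
   set work.names;
   /* Start at position 1 and take 1 character */
   MiddleInitial = substr(MiddleName, 1, 1);
run;
```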
TRIM Function:
Whenever the value of a character variable does not match the length of the variable, SAS pads
the value with trailing blanks
If the values of a variable are trimmed and then assigned to a new variable, the trimmed values
are padded with trailing blanks again if they are shorter than the length of the new variable
Syntax:
TRIM ( argument )
Where,
Examples:
A new variable called newaddress is created which contain the full address taken from
three different variables called address, city and zip
The trailing spaces of the variables address and city are trimmed using trim function .
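A sketch of the concatenation described above. The data set names, the length of newaddress and the comma separators are assumptions, and zip is assumed to be a character variable.

```sas
data work.mail;
   set work.customers;
   length newaddress $ 80;
   /* TRIM removes trailing blanks before each piece is joined */
   newaddress = trim(address) || ', ' || trim(city) || ', ' || zip;
run;
```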
CATX Function:
Enables you to concatenate character strings, remove leading and trailing blanks, and insert
separators
Results of the CATX function are usually equivalent to those that are produced by a combination
of the concatenation operator and the TRIM and LEFT functions
Syntax:
CATX ( separator , string-1 <,...string-n> )
Where,
Example:
Here,
The above program uses CATX function to concatenate the variables address, city &
zip into new variable newaddress and separates each values with comma.
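A sketch of the CATX call described above, with hypothetical data set names and an assumed length for newaddress:

```sas
data work.mail;
   set work.customers;
   length newaddress $ 80;
   /* The first argument is the separator inserted between the stripped values */
   newaddress = catx(', ', address, city, zip);
run;
```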
INDEX Function:
Searches values from left to right, looking for the first occurrence of the string
Is case sensitive
Syntax:
INDEX (source ,excerpt )
Where,
Example:
Data hrd.datapool;
Set hrd.temp;
If Index (job, 'word processing') > 0;
Run;
Here,
It is creating a new dataset with only those observations, in which the function locates
the string word processing and returns a value greater than 0.
FIND Function:
Syntax:
FIND (string , substring , <modifiers> , < startpos> )
Where,
string specifies a character constant, variable, or expression that will be searched for substrings
modifiers is a character constant, variable, or expression that specifies one or more modifiers
startpos is an integer that specifies the position at which the search should start and the direction
of the search
If startpos is not specified, FIND starts the search at the beginning of the string and searches
the string from left to right.
If startpos is positive, FIND searches from startpos to the right
If startpos is negative, FIND searches from startpos to the left
The modifiers argument specifies one or more modifiers for the function, as listed below.
The modifier i causes the FIND function to ignore character case during the search. If
this modifier is not specified, FIND searches for character substrings with the same
case as the characters in substring.
The modifier t trims trailing blanks from string and substring
Example:
Data hrd.datapool;
Set hrd.temp;
If Find (job, 'word processing', 't') > 0;
Run;
Here,
It Creates a new dataset with only those observations, in which the function locates the
string word processing and returns a value greater than 0.
UPCASE Function:
Syntax:
UPCASE (argument)
Where,
Example:
Data hrd.newtemp;
Set hrd.temp;
Job = UPCASE (job) ;
Run;
Here,
The above program converts the values of Job to uppercase and saves them in a new
dataset.
LOWCASE Function:
Syntax:
LOWCASE ( argument )
Where,
Example:
Data hrd.newtemp;
Set hrd.temp;
Contact = LOWCASE ( contact);
Run;
Here,
The above program converts the values of the variable contact to lowercase and stores them in a
new dataset.
PROPCASE Function:
Converts all words in an argument to proper case (the first letter in each word is capitalized)
First copies a character argument and converts all uppercase letters to lowercase letters
Then converts to uppercase the first character of a word that is preceded by a delimiter
Syntax:
PROPCASE (argument , <delimiter (s)> )
Where,
delimiter(s) specifies one or more delimiters that are enclosed in quotation marks. The
default delimiters are blank, forward slash, hyphen, open parenthesis, period, and tab.
Example:
Data hrd.newtemp;
Set hrd.temp;
Contact = PROPCASE(contact);
Run;
Here,
The program converts the values of the variable contact into proper case and saves them in a new
dataset.
TRANWRD Function
Syntax
TRANWRD (source, target, replacement)
where
Example:
Data work.after;
Set work.before;
name = TRANWRD (name, 'Miss', 'Ms.');
name = TRANWRD (name ,'Mrs. ','Ms.');
Run;
Here,
The above program changes all occurrences of Miss or Mrs. to Ms. in the variable name.
Translate Function
Syntax
TRANSLATE (source, to-1, from-1 <, ...to-n, from-n>)
where,
source specifies the source string or name of the variable whose value is to be translated
Example:
Data work.after;
Set work.before;
name = TRANSLATE (name, 'XYZ', 'ABC');
Run;
Here,
The above program replaces every A with X, every B with Y and every C with Z in the name
variable.
INT Function
Syntax:
INT (argument)
Where,
Example:
Data work.after;
Set work.before;
Intamt = INT(amount);
Run;
Here,
The INT function returns the integer portion of the variable amount, which is stored in the new variable Intamt.
ROUND Function
Syntax:
ROUND ( argument , round-off-unit );
Where,
Example:
Data work.after;
Set work.before;
amt = ROUND(amount,.2);
Run;
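Here,
The value of the variable amount is rounded to the nearest multiple of .2 and stored in the new variable amt.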
An OPTIONS statement can be placed anywhere in a SAS program to change the settings from that
point onwards
The OPTIONS statement is global, i.e. the settings remain in effect until you modify them or end the SAS
session
Syntax:
OPTIONS options;
Where,
options specifies one or more system options to be changed
The NONUMBER and NODATE options suppress the printing of both page numbers and the date and time in listing output
The NUMBER and DATE options print both page numbers and the date and time in listing output
Example:
Here,
Page numbers and the current date are not displayed in the PROC PRINT output
Page numbers are not displayed in the PROC FREQ output, either, but the date does
appear at the top of the page that contains the PROC FREQ report
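The behaviour described above could come from a program like the sketch below. The data set names and the TABLES variable are assumptions.

```sas
options nonumber nodate;   /* no page numbers, no date */
proc print data=clinic.admit;
run;

options date;              /* date back on; NONUMBER is still in effect */
proc freq data=clinic.diabetes;
   tables sex;
run;
```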
Output:
Obs   ID     Sex   Age   Height   Weight
2     2462   F     34    66       152
3     2501   F     31    61       123
4     2523   F     43    63       137
5     2539   M     51    71       158
7     2552   F     32    67       151
8     2555   M     35    70       173

Sex   Frequency   Percent   Cumulative Frequency   Cumulative Percent
F     2           25.0      2                      25.0
M     6           75.0      8                      100.0
PAGENO= option is used to specify the beginning page number for the report
If it is not specified, the output is numbered sequentially throughout the SAS session, starting with
page 1
The PAGESIZE= option specifies how many lines each page of output should contain
The LINESIZE= option specifies the width of the print line for the procedure output and log
Observations that do not fit within the line size continue on a different line
Syntax:
options pageno = n pagesize =n linesize = n;
Where,
n is any number
Example:
Here,
The output pages are numbered sequentially throughout the SAS session
The page of the output that the PRINT procedure produces contains 15 lines
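A sketch consistent with the description above. PAGENO= is left unset, so page numbering continues through the session; the data set name is hypothetical.

```sas
options pagesize=15;   /* 15 lines on each page of output */
proc print data=clinic.admit;
run;
```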
YEARCUTOFF= Option:
Specifies the first year of the 100-year span that SAS uses to interpret two-digit year values
Date Expression   Interpreted As
12/07/41          12/07/1941
18Dec15           18Dec2015
04/15/30          04/15/1930
15Apr95           15Apr1995
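Syntax:
OPTIONS YEARCUTOFF=yyyy;
Example:
With a later cutoff year, the same date expressions are interpreted as follows
Date Expression   Interpreted As
12/07/41          12/07/2041
18Dec15           18Dec2015
04/15/30          04/15/2030
15Apr95           15Apr1995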
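A sketch of the option in use. The cutoff value 1950 is an assumption that is consistent with the interpretations listed above; the original statement is not shown.

```sas
options yearcutoff=1950;   /* two-digit years map to 1950-2049 */
data _null_;
   d = input('12/07/41', mmddyy8.);
   put d= mmddyy10.;       /* the year 41 is read as 2041 under this cutoff */
run;
```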
Syntax:
OPTIONS FIRSTOBS=n;
OPTIONS OBS=n;
Where,
n is a positive integer
For FIRSTOBS=, n specifies the number of the first observation to process
For OBS=, n specifies the number of the last observation to process
By default, FIRSTOBS=1. The default value for OBS= is MAX
Example:
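With FIRSTOBS=10, SAS reads the 10th observation of the data set first and reads through the last observation
(for a total of 11 observations in a data set of 20 observations)
The program below resets FIRSTOBS= to 1 and sets OBS= to 10, so only the first 10 observations are processed
options firstobs=1 obs=10;
proc print data=sasuser.heart;
run;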
To reset the number of the last observation to process, you can specify OBS=MAX in the
OPTIONS statement.
options obs = max;
This instructs any subsequent SAS programs in the SAS session to process through the last
observation in the data set being read
The OBS= and FIRSTOBS= settings remain in effect for the duration of the current SAS session
OPTIONS procedure can be used to display the current setting of one or all SAS system options
The results are displayed in the log
Syntax:
PROC OPTIONS < option (s ) > ;
RUN;
Where, option(s) specifies how SAS system options are displayed
Example:
proc options;
Run;
This lists all SAS system options, their settings, and a description
To list the value of one particular system option, use the OPTION= option in the PROC OPTIONS
statement
If a SAS system option uses an equal sign, such as YEARCUTOFF=, you do not include the
equal sign when specifying the option to OPTION=.
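For example, the current YEARCUTOFF= setting can be written to the log like this:

```sas
/* Option name only, without the equal sign */
proc options option=yearcutoff;
run;
```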
A raw data file is an external text file whose records contain data values that are organized in fields
Raw data files are non-proprietary and can be read by a variety of software programs
1.
2.
Write a DATA step program to read the raw data file and create a SAS data set.
To read the raw data file, the DATA step must provide the following instructions to SAS:
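The table below outlines the basic statements that are used to read a raw data file
To Do This                      Use This SAS Statement
Reference a SAS data library    LIBNAME statement
Reference an external file      FILENAME statement
Name a SAS data set             DATA statement
Identify an external file       INFILE statement
Describe data                   INPUT statement
Execute the DATA step           RUN statement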
FILENAME statement:
Before reading raw data, SAS must be pointed to the location of the external file that contains the data
Syntax:
FILENAME fileref 'filename';
Example:
Here,
The FILENAME statement temporarily associates the fileref Tests with the external file that
contains the data
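A sketch of such a statement. The fileref Tests comes from the text; the path is hypothetical because the original is not shown.

```sas
filename tests 'c:\data\tests.dat';   /* hypothetical location of the raw data file */
```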
Syntax:
Where,
Example:
Here,
Infile Statement:
Syntax:
INFILE file-specification <options> ;
Where,
file-specification can take the form fileref to name a previously defined file reference or 'filename'
to point to the actual name and location of the file
options describes the input file's characteristics and specifies how it is to be read with the INFILE
statement.
Example:
INFILE statement can also specify the complete path of a file instead of using the FILENAME
statement:
Example:
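A sketch of an INFILE statement that names the file directly. The path, data set name and field layout are all hypothetical.

```sas
data work.stress;
   infile 'c:\data\tests.dat';   /* quoted path instead of a fileref */
   input ID $ 1-4 Age 6-7;       /* character ID in columns 1-4, numeric Age in 6-7 */
run;
```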
Input Statement:
Describes the fields of raw data to be read and placed into the SAS data set.
Syntax:
INPUT variable <$> startcol - endcol . . . ;
where
($) identifies the variable type as character (if the variable is numeric, then $ is not specified)
Example:
In such files the values for each variable are in the same location in all records
Syntax:
The complete sequence of statements for reading a raw data file into SAS is:
LIBNAME statement
FILENAME statement
DATA statement
INFILE statement
INPUT statement
RUN statement
Example:
Here,
The LIBNAME statement creates a library reference
The FILENAME statement references an external file
The DATA statement names the SAS data set to be created
The INFILE statement identifies the external file
The INPUT statement describes the data in the external file
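Put together, the sequence might look like the sketch below. The libref, both paths, the data set name, and the field layout are all hypothetical, since the original example was not preserved.

```sas
libname clinic 'C:\sasdata\clinic';       /* hypothetical library location */
filename tests 'C:\sasdata\tests.dat';    /* hypothetical raw data file */

data clinic.stress;                       /* name the data set to create */
   infile tests;                          /* identify the external file */
   input ID $ 1-4                         /* describe the fields: */
         Name $ 6-25                      /*   character fields take a $ */
         RestHR 27-29;                    /*   numeric fields do not */
run;                                      /* execute the DATA step */
```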
Column input can be used to read character variable values that contain embedded blanks.
input Name $ 1-25;
No placeholder is required for missing data. A blank field is read as missing and does not cause
other fields to be read incorrectly.
input Item $ 1-13 IDnum $ 15-19 Instock 21-22 Backord 24-25;
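Both features can be tried with in-stream data. In this sketch the data set name and the three records are made up for illustration; the first two Item values contain an embedded blank, and the third record leaves the Instock field (columns 21-22) blank.

```sas
data work.stock;   /* hypothetical data set name */
   input Item $ 1-13 IDnum $ 15-19 Instock 21-22 Backord 24-25;
   datalines;
Bird Feeder   LG088  3 20
Glass Mugs    SB082  6 12
Lawn Chair    LC005     2
;
run;

proc print data=work.stock;
run;
```

In the third observation Instock is read as missing, and Backord is still read correctly from columns 24-25.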
Standard numeric data values can contain only:
numbers
decimal points
numbers in scientific or E-notation (2.3E4, for example)
plus or minus signs
Nonstandard numeric data includes:
values that contain special characters, such as percent signs (%), dollar
signs ($), and commas (,)
date and time values
data in fraction, integer binary, real binary, and hexadecimal forms
The file below contains personnel information for a technical writing department of a small
computer manufacturer. The fields contain values for each employee's last name, first name, job
title, and annual salary.
The values for Salary contain commas, so they are considered to be nonstandard numeric
values.
Nonstandard data values require an input style that is more flexible than column input
Formatted input can be used, which combines the features of column input with the ability to
read both standard and nonstandard data.
When raw data that is organized into fixed fields is to be read, use:
column input to read standard data only
formatted input to read both standard and nonstandard data
Syntax:
INPUT <pointer-control> variable informat. ;
Where,
pointer-control positions the input pointer on a specified column
informat is the special instruction that specifies how SAS reads raw data
@n moves the input pointer to column n, which is the first column of the field that is being read
+n moves the input pointer forward to a column number that is relative to the current position
Example:
input @9 FirstName $5. @1 LastName $7. @15 JobTitle 3. @19 Salary comma9. ;
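Here,
Each @n pointer control moves the input pointer to the first column of a field, so the fields can be
read in any order: FirstName is read from column 9 before LastName is read from column 1
The +n pointer control moves the input pointer forward to a column number that is relative to the
current position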
In order to count correctly, it is important to understand where the column pointer control is
located after each data value is read
Example:
input LastName $7. +1 FirstName $5. +5 Salary comma9. @15 JobTitle 3.;
Here,
Because the values for LastName begin in column 1, a column pointer control is not
needed
After LastName is read, the pointer is at column 8
To start reading FirstName, which begins in column 9, move the column pointer
ahead 1 column with +1
After FirstName is read, the pointer is at column 14
To read Salary, which begins in column 19, move the pointer ahead 5 columns with +5
The @15 column pointer control then returns the pointer to column 15 to read JobTitle
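The pointer arithmetic above can be checked with a small in-stream sketch. The data set name is hypothetical, and the three records are laid out to match the columns described in the text (LastName in 1-7, FirstName in 9-13, JobTitle in 15-17, Salary in 19-27), with values adapted from the output table shown later in this section.

```sas
data work.empinfo;   /* hypothetical data set name */
   /* Relative (+n) and absolute (@n) pointer controls in one INPUT statement */
   input LastName $7. +1 FirstName $5. +5 Salary comma9. @15 JobTitle 3.;
   datalines;
EVANS   DONNY 112 29,996.63
HELMS   ALISA 105 18,567.23
HIGGINS JOHN  111 25,309.00
;
run;

proc print data=work.empinfo;
run;
```

Replacing the INPUT statement with the @n version (input @9 FirstName $5. @1 LastName $7. @15 JobTitle 3. @19 Salary comma9.;) reads the same values; only the order of the variables in the data set changes.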
INFORMAT
An informat reads data values in certain forms into standard SAS values
It determines how data values are read into a SAS data set
Informats are used to read numeric values that contain letters or other special characters
Informats can read both standard and nonstandard data, and they are required for nonstandard
numeric data (numeric data that contains letters or special characters such as commas).
The numeric value $1,234.00 contains two special characters, a dollar sign ($) and a comma (,).
An informat can read the value while removing the dollar sign and comma, and then store the
resulting value as a standard numeric value
Similarly, $1,000,000 is nonstandard numeric data because it contains a dollar sign ($) and
commas (,). To remove the dollar sign and commas and store the numeric value 1000000 in a
variable, read the value with the COMMA11. informat
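One quick way to see the effect of COMMA11. is the INPUT function, which applies an informat to a character string. This is only an illustrative sketch; the variable name Amount is made up.

```sas
data _null_;
   /* Read the nonstandard value with COMMA11.; the dollar sign */
   /* and commas are removed before the number is stored.       */
   Amount = input('$1,000,000', comma11.);
   put Amount=;   /* writes Amount=1000000 to the log */
run;
```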
INFORMAT statement:
It specifies the informat for reading the values of the variables that are listed in the INFORMAT
statement
A single INFORMAT statement can associate the same informat with several variables, or it can
associate different informats with different variables
If a variable appears in multiple INFORMAT statements, SAS uses the informat that is assigned
last.
Syntax:
variablename is the name of the variable for which the informat is specified
w specifies the informat width, which for most informats is the number of columns in the input
data
If the w and d values are omitted from the informat, SAS uses default values
Example:
Here,
A numeric informat is specified for the variables Birthdate and Interview
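The general form is INFORMAT variable(s) informat;. Since the original example was not preserved, the sketch below assumes the DATE9. informat (a numeric informat for dates such as 15JAN1980); the data set name and the data line are hypothetical.

```sas
data work.visits;   /* hypothetical data set name */
   /* One INFORMAT statement assigns the same informat to both variables */
   informat Birthdate Interview date9.;   /* DATE9. is an assumption */
   input Birthdate Interview;
   /* Display the stored SAS date values as dates rather than numbers */
   format Birthdate Interview date9.;
   datalines;
15JAN1980 03MAR2005
;
run;

proc print data=work.visits;
run;
```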
An informat is used in the INPUT statement to read data in a particular form from the raw data file
Example:
Because FirstName and LastName are character variables, $ is used; 5 and 7 are the widths of
FirstName and LastName, respectively
COMMA9. is used to read Salary, because its values are nonstandard numeric values
Output:
Obs  FirstName  LastName  JobTitle    Salary
1    DONNY      EVANS     112       29996.63
2    ALISA      HELMS     105       18567.23
3    JOHN       HIGGINS   111       25309.00
4    AMY        LARSON    113       32696.78
5    MARY       MOORE     112       28945.89
6    JASON      POWELL    103       35099.50
7    JUDY       RILEY     111       25309.00
FORMAT
SAS software offers a variety of character, numeric, and date and time formats
A format can be specified temporarily in a PROC step to determine the way the data values appear
in output
Syntax:
variablename specifies the name of the variable for which the format is used
w specifies the format width, which for most formats is the number of columns in the
output data.
If the w and d values are omitted from the format, SAS uses default values
The d value specified with format tells SAS to display that many decimal places, regardless of
how many decimal places are in the data
If the format width is too narrow to represent a value, SAS tries to squeeze the value into the
space available
When a FORMAT statement is used in a procedure step, the formats that are associated with the
variables remain in effect only for that particular step. That is, the format association is
temporary and not permanent
TIMEw.d writes time values as hours, minutes, and seconds in the form hh:mm:ss.ss
Example:
To display the value 1234 as $1234.00 in a report, use the DOLLAR8.2 format
The WORDS22. format, which converts numeric values to their equivalent in words, writes the
numeric value 692 as six hundred ninety-two
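The two formats above can be tried in a PROC step with a FORMAT statement, whose general form is FORMAT variable(s) format-name;. The data set and variable names in this sketch are hypothetical.

```sas
data work.demo;   /* hypothetical one-row data set */
   Amount = 1234;
   Count  = 692;
run;

proc print data=work.demo;
   /* Temporary association: it applies to this PROC PRINT step only */
   format Amount dollar8.2 Count words22.;
run;
```

The stored values are unchanged; running PROC PRINT again without the FORMAT statement shows the plain numbers 1234 and 692.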
Some raw data files have a variable-length record format: they have an end-of-record marker
after the last field in each record
Variable-length records often have values that are shorter than others or that are missing
This can cause problems when trying to read the raw data into a SAS data set
Example:
Here,
The asterisk symbolizes the end-of-record marker and is not part of the data
The INPUT statement specifies a field width of 8 columns for Receipts
In the third record, the input pointer encounters an end-of-record marker before the 8th
column
The input pointer moves down to the next record in an attempt to find a value for Receipts
However, GRILL is a character value, and Receipts is a numeric variable. Thus, an
invalid data error occurs, and Receipts is set to missing
When using column input or formatted input to read fixed-field data in variable-length records,
the PAD option of the INFILE statement can be used to avoid these problems
It pads each record with blanks so that all data lines have the same length
Example:
Here,
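The PAD option pads each record with blanks up to the record length, so a short or missing
Receipts value is read from its own record and the input pointer does not move to the next record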
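A minimal sketch of the PAD option, assuming a layout in which Dept occupies columns 1-11 and Receipts starts in column 13 with a width of 8; the path, data set name, and column positions are hypothetical, because the original example was not preserved.

```sas
data work.dept;                            /* hypothetical data set name */
   /* PAD fills each short record with blanks up to the record length */
   infile 'C:\sasdata\receipts.dat' pad;   /* hypothetical path */
   input Dept $ 1-11 @13 Receipts comma8.;
run;

proc print data=work.dept;
run;
```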