
SAS/STAT

SAS FOR STATISTICAL ANALYSIS

OVERVIEW
SAS/STAT Software
Component of the SAS System
Provides comprehensive statistical tools for a wide range of statistical
analyses, including analysis of variance, regression, categorical data
analysis, multivariate analysis, and survival analysis.
In addition to 54 procedures for statistical analysis, SAS/STAT
software also includes the Market Research Application (MRA), a
point-and-click interface to commonly used techniques in market
research.

BASIC STATISTICS
Mean
Median
Mode
Dispersion

Standard Deviation
Range
Percentiles
Quartiles

MEAN
An arithmetic average
Procedure for computing
Add up the numbers
Divide by the number of observations

Example
Arithmetic mean = (80 + 90 + 90 + 100 + 85 + 90) / 6
= 535 / 6
≈ 89.17

MEDIAN
Middle value of the sorted data
Procedure for computing
Sort the data
For an even number of observations
Median = average of the (n/2)th and ((n/2)+1)th observations
For an odd number of observations
Median = ((n+1)/2)th observation
Example: 80 85 90 90 90 100
n=6
Median = (3rd obs + 4th obs)/2
= (90+90)/2
= 90

MODE
Most frequently occurring observation
The value that is repeated most often in the data set.

Example:
data = 80 85 90 90 90 100
Since 90 occurs three times, more often than any other value,
Mode = 90

DISPERSION
Standard Deviation
Square root of the average of the squared distances of the
observations from the mean.

Range
Difference between highest and lowest observed value.

Percentiles
Divide the data into 100 equal parts.

Quartiles
Divide the data into 4 equal parts.
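The measures above can be obtained in one step with PROC MEANS. The following is a minimal sketch using the six values from the median and mode examples; the data set name scores and the variable name score are assumptions made for illustration. Note that PROC MEANS divides by n-1 by default when computing the standard deviation, so VARDEF=N is specified here to match the "average of the squared distances" definition given above.

```sas
/* Illustrative data: the six values used in the slides.
   The names "scores" and "score" are hypothetical. */
data scores;
   input score @@;
   datalines;
80 85 90 90 90 100
;
run;

/* VARDEF=N uses n as the variance divisor (default is n-1).
   Q1 and Q3 are the first and third quartiles. */
proc means data=scores vardef=n n mean median mode std range q1 q3 maxdec=2;
   var score;
run;
```

For these data the mean, median, and mode reported should agree with the hand calculations on the preceding slides (about 89.17, 90, and 90).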

PROBABILITY

Numerical measure of the likelihood of an event.

It is a number that we attach to an event.

Event:
One or more of the possible outcomes of doing something.
Example:
The event that we will get over an inch of rain tomorrow. Its probability
reflects the likelihood that we will get this much rain.

PROBABILITY
A probability is a number from 0 to 1.

If an event has probability 0, the event will never occur.
If an event has probability 1, the event will always occur.

What if we assign a probability of 0.5?

This means that the event is just as likely to occur as not to occur.

HYPOTHESIS TESTING
Procedure for making rational decisions about the reality of effects.
Setting up and testing hypotheses is an essential part of statistical
inference.
Example:
Claiming that a new drug is better than the current drug for treatment of the
same symptoms.
In each problem considered, the question of interest is simplified into two
competing claims (hypotheses):
Null Hypothesis (H0)
Alternative Hypothesis (H1)


NULL HYPOTHESIS
The hypothesis that there are no effects is called the NULL
HYPOTHESIS (H0).
Note: unlike a proof in geometry, a statistical test cannot prove that the
effects are real; rather, we may decide from the evidence that the effects are real.

Example:
In a clinical trial of a new drug, the null hypothesis might be that the new
drug is no better, on average, than the current drug.
H0: there is no difference between the two drugs on average.


ALTERNATIVE HYPOTHESIS
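The hypothesis that there is an effect, set against the null hypothesis, is
called the ALTERNATIVE HYPOTHESIS (H1).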

Example :
In a clinical trial of a new drug, the alternative hypothesis might be that
the new drug has a different effect, on average, compared to that of the
current drug.
H1: the two drugs have different effects, on average.
OR
H1: the new drug is better than the current drug, on average.


P-VALUE
Probability, computed assuming the null hypothesis is true, of obtaining a
result at least as extreme as the one observed.
The p-value is compared with the significance level;
if it is smaller, the result is significant
(for example, p-value < 0.05 at the 5% significance level).
Reporting the p-value indicates the strength of evidence for, say, rejecting the null
hypothesis H0,
rather than simply concluding 'reject H0' or 'do not reject H0'.


SIGNIFICANCE LEVEL
"Does a 5 percent significance level mean there is only a 5% chance that
my results are significant?"
No. The significance level is the alpha (α) of the test.
If the null hypothesis is true, there is a 5 percent chance of rejecting it
because of random variation (luck).


T TEST
T TEST is performed on three types of samples.
One sample
Two samples
Paired observations


T TEST
One-sample t-test
The one-sample t test compares the mean of the sample to a
given number.

Two-sample t-test
The two-sample t test compares the mean of the first sample
minus the mean of the second sample to a given number.

Paired t-test
The paired observations t test compares the mean of the
differences in the observations to a given number.


REGRESSION ANALYSIS
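Regression analysis is the analysis of the relationship
between one variable and another set of variables.
For a single regressor, the relationship is expressed as the equation
$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$
where
$y_i$ is the response variable
$x_i$ is a regressor variable
$\beta_0$ and $\beta_1$ are unknown parameters to be estimated
$\epsilon_i$ is an error term.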
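A simple linear regression of this form can be fitted with PROC REG, which is part of SAS/STAT. The sketch below is illustrative only: the data set fit, the variables x and y, and the five data points are all made up for the example.

```sas
/* Hypothetical data: five (x, y) pairs invented for illustration */
data fit;
   input x y @@;
   datalines;
1 2.1  2 3.9  3 6.2  4 7.8  5 10.1
;
run;

/* The Intercept row of the parameter table estimates beta0;
   the row for x estimates beta1 */
proc reg data=fit;
   model y = x;
run;
quit;
```

PROC REG is interactive, so the QUIT statement is used to end the procedure.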


ANALYSIS OF VARIANCE
Analysis of variance (ANOVA) is a technique for analyzing experimental
data in which one or more response (or dependent or simply Y)
variables are measured under various conditions identified by one or
more classification variables.
Example :
An experiment may measure weight change (the dependent
variable) for men and women who participated in three different
weight-loss programs. The six cells of the design are formed by
the six combinations of sex (men, women) and program (A, B, C).


SAS/STAT
There are 54 procedures for statistical analysis.
Analysis of variance
Generalized linear models
Categorical data analysis
Mixed models
Survival analysis
Multivariate techniques
Nonparametric analysis
Psychometric analysis


PROC TTEST
PROC TTEST < options > ;
CLASS variable ;
PAIRED variables ;
BY variables ;
VAR variables ;
FREQ variable ;
WEIGHT variable ;
No statement can be used more than once. There is no restriction on the order of
the statements after the PROC statement.
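The three kinds of t test described earlier map onto PROC TTEST as sketched below. All data set names, variable names, and values are hypothetical, and the H0=85 comparison value is an arbitrary choice for illustration.

```sas
/* One-sample test: is the mean score different from 85? */
data scores;
   input score @@;
   datalines;
80 85 90 90 90 100
;
run;

proc ttest data=scores h0=85;
   var score;
run;

/* Two-sample test: the CLASS variable must have exactly two levels */
data drugs;
   input drug $ response @@;
   datalines;
new 12 new 15 new 14 new 16 cur 10 cur 11 cur 13 cur 12
;
run;

proc ttest data=drugs;
   class drug;
   var response;
run;

/* Paired test: each observation holds both measurements on one subject */
data pairs;
   input before after @@;
   datalines;
120 115  130 128  125 119  140 133
;
run;

proc ttest data=pairs;
   paired before*after;
run;
```

The CLASS and VAR statements are used together for the two-sample case, while the PAIRED statement replaces both for paired observations.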


COMPARISON BETWEEN
PROC GLM AND PROC ANOVA
The GLM procedure can analyze both
balanced and unbalanced data.

The ANOVA procedure is designed to
handle balanced data (that is, data
with equal numbers of observations
for every combination of the
classification factors).

Because PROC ANOVA takes into account the special structure of a balanced design,
it is faster and uses less storage than PROC GLM for balanced data.


COMPARISON BETWEEN
PROC GLM AND PROC MIXED
With the RANDOM statement in PROC GLM,
effects are still treated as fixed, and the
statement computes expected mean squares.

With the RANDOM statement in PROC MIXED,
the procedure computes REML and ML estimates of
variance parameters.

The REPEATED statement in PROC
MIXED is used to specify covariance
structures for repeated measurements
on subjects.

The REPEATED statement in PROC
GLM is used to specify various
transformations with which to
conduct the traditional univariate or
multivariate tests.

In repeated measures situations, the mixed model approach used in
PROC MIXED is more flexible and more widely applicable than
either the univariate or multivariate approaches.


PROC ANOVA
PROC ANOVA < options > ;
CLASS variables ;
MODEL dependents=effects < / options > ;
ABSORB variables ;
BY variables ;
FREQ variable ;
MANOVA < test-options >< / detail-options > ;
MEANS effects < / options > ;
REPEATED factor-specification < / options > ;
TEST < H=effects > E=effect ;
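As a sketch of how these statements fit together, the following runs a balanced two-way ANOVA modeled on the weight-loss example described earlier (sex by program, two observations per cell). The data set name wtloss, the variable names, and all values are invented for illustration.

```sas
/* Hypothetical balanced data: 2 sexes x 3 programs x 2 observations */
data wtloss;
   input sex $ program $ change @@;
   datalines;
M A 5.1  M A 4.8  M B 6.3  M B 6.0  M C 3.9  M C 4.2
F A 4.5  F A 4.9  F B 5.8  F B 6.1  F C 3.5  F C 3.8
;
run;

proc anova data=wtloss;
   class sex program;
   model change = sex program sex*program;  /* main effects and interaction */
   means program / tukey;                   /* pairwise comparison of programs */
run;
quit;
```

Because every cell has the same number of observations, PROC ANOVA is appropriate here; with unequal cell counts PROC GLM would be the safer choice.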


PROC MIXED
The primary assumptions underlying the analyses performed by PROC
MIXED are as follows:
The data are normally distributed.
The means (expected values) of the data are linear in terms of a certain
set of parameters.
The variances and covariances of the data are in terms of a different
set of parameters, and they exhibit a structure matching one of those
available in PROC MIXED.


PROC MIXED
The following statements are available in PROC MIXED.
PROC MIXED < options > ;
BY variables ;
CLASS variables ;
ID variables ;
MODEL dependent = < fixed-effects > < / options > ;
RANDOM random-effects < / options > ;
REPEATED < repeated-effect > < / options > ;
PARMS (value-list) ... < / options > ;


PROC MIXED (Contd.)
PRIOR < distribution > < / options > ;
CONTRAST 'label' < fixed-effect values ... >
   < | random-effect values ... > , ... < / options > ;
ESTIMATE 'label' < fixed-effect values ... >
   < | random-effect values ... > < / options > ;
LSMEANS fixed-effects < / options > ;
MAKE 'table' OUT=SAS-data-set ;
WEIGHT variable ;
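A minimal sketch of a mixed model follows, using a randomized complete block layout in which treatment is a fixed effect and block is a random effect. The data set rcb, its variables, and its values are hypothetical and chosen only to show how the CLASS, MODEL, RANDOM, and LSMEANS statements combine.

```sas
/* Hypothetical data: 4 blocks, 3 treatments, one observation per cell */
data rcb;
   input block trt $ y @@;
   datalines;
1 A 12  1 B 15  1 C 14
2 A 10  2 B 13  2 C 11
3 A 14  3 B 18  3 C 15
4 A 11  4 B 14  4 C 13
;
run;

proc mixed data=rcb method=reml;
   class block trt;
   model y = trt;        /* fixed effects only appear in MODEL */
   random block;         /* block variance is estimated by REML */
   lsmeans trt / diff;   /* treatment means and pairwise differences */
run;
```

Unlike PROC GLM, random effects are listed only in the RANDOM statement and not in the MODEL statement.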

PROC GLM
The GLM procedure can be used for many different analyses, including
Simple regression
Multiple regression
Analysis of variance (ANOVA), especially for unbalanced data
Analysis of covariance
Response-surface models
Weighted regression
Polynomial regression
Partial correlation
Multivariate analysis of variance (MANOVA)
Repeated measures analysis of variance


PROC GLM
The following statements are available in PROC GLM.
PROC GLM < options > ;
CLASS variables ;
MODEL dependents=independents < / options > ;
ABSORB variables ;
BY variables ;
FREQ variable ;
ID variables ;
WEIGHT variable ;


PROC GLM(cont.)
CONTRAST 'label' effect values < ... effect values > < / options > ;
ESTIMATE 'label' effect values < ... effect values > < / options > ;
LSMEANS effects < / options > ;
MANOVA < test-options >< / detail-options > ;
MEANS effects < / options > ;
OUTPUT < OUT=SAS-data-set >
keyword=names < ... keyword=names > < / option > ;
RANDOM effects < / options > ;
REPEATED factor-specification < / options > ;
TEST < H=effects > E=effect < / options > ;

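EXAMPLE ON GLM
/* Unbalanced two-way layout: cell A2*B2 has only one observation */
data exp;
   input A $ B $ Y @@;
   datalines;
A1 B1 12 A1 B1 14
A1 B2 11 A1 B2 9
A2 B1 20 A2 B1 18
A2 B2 17
;
run;

/* Two-way ANOVA with interaction; PROC GLM handles the unequal cell sizes */
proc glm data=exp;
   class A B;
   model Y=A B A*B;
run;
quit;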


PROC FREQ
The FREQ procedure produces one-way to n-way frequency and
crosstabulation (contingency) tables.
The statistics for contingency tables include
Chi-square tests and measures
Measures of association
Risks (binomial proportions) and risk differences for 2×2 tables
Odds ratios and relative risks for 2×2 tables
Tests for trend
Tests and measures of agreement


PROC FREQ
PROC FREQ < options > ;
BY variables ;
EXACT statistic-options < / computation-options > ;
OUTPUT < OUT=SAS-data-set > options ;
TABLES requests < / options > ;
TEST options ;
WEIGHT variable ;
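The sketch below shows one way the TABLES and WEIGHT statements might be combined to analyze a 2×2 table entered as cell counts. The data set trial, its variables, and the counts are hypothetical.

```sas
/* Hypothetical 2x2 table entered as one row per cell with a count */
data trial;
   input treatment $ outcome $ count;
   datalines;
new     improved 30
new     same     10
current improved 18
current same     22
;
run;

proc freq data=trial order=data;
   tables treatment*outcome / chisq relrisk riskdiff;
   weight count;   /* each row represents "count" observations */
run;
```

CHISQ requests chi-square tests, RELRISK requests the odds ratio and relative risks, and RISKDIFF requests risks and risk differences. ORDER=DATA keeps the rows and columns in the order in which they appear in the data.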


PROC TABULATE
Simple but powerful methods to create tabular reports.
Flexibility in classifying the values of variables and establishing
hierarchical relationships between the variables.
Mechanisms for labeling and formatting variables and
procedure-generated statistics.


PROC TABULATE
PROC TABULATE <option(s)>;
BY <DESCENDING> variable-1
   <...<DESCENDING> variable-n>
   <NOTSORTED>;
CLASS variable(s) </ options>;
CLASSLEV variable(s) / STYLE=<style-element-name | <PARENT>>
   <[style-attribute-specification(s)]>;
FREQ variable;
KEYLABEL keyword-1='description-1'
   <...keyword-n='description-n'>;
KEYWORD keyword(s) / STYLE=<style-element-name | <PARENT>>
   <[style-attribute-specification(s)]>;
TABLE <<page-expression,> row-expression,> column-expression
   </ table-option(s)>;
VAR analysis-variable(s) </ options>;
WEIGHT variable;
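As an illustration of how CLASS, VAR, TABLE, and KEYLABEL work together, the following sketch builds a small report with sex as the row dimension and program as the column dimension. The data set wtloss and its contents are hypothetical.

```sas
/* Hypothetical data: weight change by sex and program */
data wtloss;
   input sex $ program $ change @@;
   datalines;
M A 5.1  M A 4.8  M B 6.3  M B 6.0  M C 3.9  M C 4.2
F A 4.5  F A 4.9  F B 5.8  F B 6.1  F C 3.5  F C 3.8
;
run;

proc tabulate data=wtloss;
   class sex program;                    /* classification variables */
   var change;                           /* analysis variable */
   table sex, program*change*(n mean);   /* rows, then columns */
   keylabel n='Count' mean='Average';    /* relabel the statistics */
run;
```

In the TABLE statement a comma separates dimensions and an asterisk nests one element within another, which is how the hierarchical relationships mentioned above are expressed.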


PROC UNIVARIATE
The UNIVARIATE procedure provides data summarization tools,
high-resolution graphics displays, and information on the distribution of
numeric variables.
calculates descriptive statistics based on moments
calculates the median, mode, range, and quantiles
calculates the robust estimates of location and scale
calculates confidence limits
generates frequency tables
performs goodness-of-fit tests for fitted parametric and nonparametric
distributions
creates quantile-quantile plots and probability plots for various
theoretical distributions


PROC UNIVARIATE
PROC UNIVARIATE <option(s)>;
BY <DESCENDING> variable-1
   <...<DESCENDING> variable-n>
   <NOTSORTED>;
CLASS variable-1<(variable-option(s))> <variable-2<(variable-option(s))>>
   </ KEYLEVEL='value1'|('value1' 'value2')>;
FREQ variable;
HISTOGRAM <variable(s)> </ option(s)>;
ID variable(s);
INSET <keyword(s) DATA=SAS-data-set> </ option(s)>;
OUTPUT <OUT=SAS-data-set> statistic-keyword-1=name(s)
   <... statistic-keyword-n=name(s)> <percentiles-specification>;
PROBPLOT <variable(s)> </ option(s)>;
QQPLOT <variable(s)> </ option(s)>;
VAR variable(s);
WEIGHT variable;
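A short sketch of typical usage follows. It reuses the six values from the basic statistics slides; the data set and variable names, and the names given to the output statistics, are assumptions. The HISTOGRAM and QQPLOT statements produce graphics, so they require a graphics-capable SAS session.

```sas
/* Hypothetical data: the six values from the earlier examples */
data scores;
   input score @@;
   datalines;
80 85 90 90 90 100
;
run;

proc univariate data=scores normal;   /* NORMAL requests normality tests */
   var score;
   histogram score / normal;          /* histogram with fitted normal curve */
   qqplot score / normal(mu=est sigma=est);
   /* Save selected statistics to a data set named stats */
   output out=stats mean=mean_score median=median_score
          q1=q1_score q3=q3_score;
run;
```

With only six observations the normality tests and plots carry little weight; the example is meant only to show where each statement goes.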


PROC NPAR1WAY
The NPAR1WAY procedure performs nonparametric tests for location
and scale differences across a one-way classification.
PROC NPAR1WAY also provides a standard analysis of variance on
the raw data and statistics based on the empirical distribution function.
PROC NPAR1WAY provides tests using the raw input data as scores.
When the data are classified into two samples, tests are based on
simple linear rank statistics.
When the data are classified into more than two samples, tests are
based on one-way ANOVA statistics.
Both asymptotic and exact p-values are available for these tests.


PROC NPAR1WAY
PROC NPAR1WAY < options > ;
BY variables ;
CLASS variable ;
EXACT statistic-options < / computation-options > ;
FREQ variable ;
OUTPUT < OUT=SAS-data-set > < options > ;
VAR variables ;
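The following sketch runs a Wilcoxon rank-sum test on two samples and also requests an exact p-value. The data set drugs, its variables, and its values are hypothetical.

```sas
/* Hypothetical two-sample data */
data drugs;
   input drug $ response @@;
   datalines;
new 12 new 15 new 14 new 16 cur 10 cur 11 cur 13 cur 12
;
run;

proc npar1way data=drugs wilcoxon;
   class drug;        /* defines the samples being compared */
   var response;
   exact wilcoxon;    /* exact p-value in addition to the asymptotic one */
run;
```

Exact computations can be slow for large samples, so the EXACT statement is usually reserved for small data sets such as this one.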


OUTPUT DELIVERY SYSTEM

With the Output Delivery System (ODS) you can:
Display your output in hypertext markup language (HTML)
Display your output in Rich Text Format (RTF)
Create SAS data sets directly from output tables
Select or exclude individual output tables
Customize the layout, format, and headers of your output
ODS combines raw data with one or more table definitions to produce
one or more output objects. These objects can be sent to any or all
ODS destinations.


OUTPUT DELIVERY SYSTEM


How ODS Works
In your ODS statement(s), you specify one or more
destinations for your output.

This destination . . .    Produces . . .
Output                    SAS data sets
Listing                   listing output
HTML                      HTML output


OUTPUT DELIVERY SYSTEM


ODS LISTING <action>;
ODS LISTING <DATAPANEL=number | DATA | PAGE>;

ODS HTML action;
ODS HTML HTML-file-specification(s) <option(s)>;

ODS OUTPUT action;
ODS OUTPUT data-set-definition(s);
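The sketch below shows how the HTML and OUTPUT destinations might be used around a procedure step. The file name report.html and the data set names are hypothetical; TTests is the name PROC TTEST gives to its table of t statistics.

```sas
/* Hypothetical data: the six values from the earlier examples */
data scores;
   input score @@;
   datalines;
80 85 90 90 90 100
;
run;

ods html file='report.html';               /* open the HTML destination */
ods output TTests=work.ttest_results;      /* capture one table as a data set */

proc ttest data=scores h0=85;
   var score;
run;

ods html close;                            /* finish writing the HTML file */

/* The captured table is now an ordinary SAS data set */
proc print data=work.ttest_results;
run;
```

The ODS OUTPUT request applies to the next procedure step only, whereas the HTML destination stays open until it is explicitly closed.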
