SAS FOR STATISTICAL ANALYSIS
OVERVIEW
SAS/STAT Software
Component of the SAS System
Provides comprehensive statistical tools for a wide range of statistical
analyses, including analysis of variance, regression, categorical data
analysis, multivariate analysis, and survival analysis.
In addition to 54 procedures for statistical analysis, SAS/STAT
software also includes the Market Research Application (MRA), a
point-and-click interface to commonly used techniques in market
research.
BASIC STATISTICS
Mean
Median
Mode
Dispersion
Standard Deviation
Range
Percentiles
Quartiles
MEAN
An arithmetic average
Procedure for computing
Add up the numbers
Divide by the number of observations
Example
+ 90)
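As a worked instance of the two steps above, here is the mean of the six observations used in the median and mode examples (80, 85, 90, 90, 90, 100):

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
        = \frac{80 + 85 + 90 + 90 + 90 + 100}{6}
        = \frac{535}{6} \approx 89.17
```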
MEDIAN
Mid value
Procedure for computing
Sort the data
For an even number of observations (n)
Median = average of the (n/2)th and the (n/2 + 1)th observations
For an odd number of observations (n)
Median = the ((n + 1)/2)th observation
Example: 80 85 90 90 90 100
n = 6
Median = (3rd obs + 4th obs)/2
= (90 + 90)/2
= 90
MODE
Most frequently occurring observation
The value that is repeated most often in the data set.
Example:
data = 80 85 90 90 90 100
90 occurs three times, more often than any other value
Mode = 90
DISPERSION
Standard Deviation
Square root of the average of the squared distances of the
observations from the mean.
Range
Difference between the highest and lowest observed values.
Percentiles
Values that divide the ordered data into 100 equal parts.
Quartiles
Three values that divide the ordered data into 4 equal parts.
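A minimal SAS sketch that computes the statistics above for the example data (80, 85, 90, 90, 90, 100). The data set name `scores`, the variable `score`, and the output variable names are illustrative. Note that PROC MEANS divides by n - 1 by default (VARDEF=DF), so STD is the sample standard deviation (about 6.65 here) rather than the n-divisor version in the definition above (about 6.07). Quartiles and percentiles use SAS's default quantile definition; other conventions can give slightly different values.

```sas
/* Example data from the median and mode slides (illustrative names). */
data scores;
  input score @@;
  datalines;
80 85 90 90 90 100
;
run;

/* Mean, median, mode, standard deviation, range, quartiles, 90th percentile.
   STD uses the divisor n - 1 because VARDEF=DF is the default. */
proc means data=scores n mean median mode std range q1 q3 p90;
  var score;
  output out=stats mean=avg median=med mode=mo std=sd range=rng
         q1=lowerq q3=upperq p90=p90;
run;

proc print data=stats noobs;
run;
```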
PROBABILITY
Event:
One or more of the possible outcomes of doing something.
Example:
Getting more than an inch of rain tomorrow is an event; its probability
reflects how likely it is that we will get this much rain.
PROBABILITY
A probability is a number from 0 (the event cannot happen) to 1 (the event is certain).
HYPOTHESIS TESTING
A procedure for making rational decisions about whether observed effects are real.
Setting up and testing hypotheses is an essential part of statistical
inference.
Example:
Claiming that a new drug is better than the current drug for treatment of the
same symptoms.
In each problem considered, the question of interest is simplified into two
competing claims (hypotheses):
Null Hypothesis (H0)
Alternative Hypothesis (H1)
NULL HYPOTHESIS
The hypothesis that there is no effect is called the NULL
HYPOTHESIS (H0).
Note: unlike in geometry, we cannot prove that the effects are real; we can
only decide, on the basis of the evidence, that they are real.
Example:
In a clinical trial of a new drug, the null hypothesis might be that the new
drug is no better, on average, than the current drug.
H0: there is no difference between the two drugs on average.
ALTERNATIVE HYPOTHESIS
The hypothesis that contradicts the null hypothesis; it is the claim accepted if H0 is rejected.
Example :
In a clinical trial of a new drug, the alternative hypothesis might be that
the new drug has a different effect, on average, compared to that of the
current drug.
H1: the two drugs have different effects, on average (two-sided).
OR
H1: the new drug is better than the current drug, on average (one-sided).
P-VALUE
The probability, assuming the null hypothesis is true, of obtaining a result at
least as extreme as the one actually observed.
The p-value is compared with the significance level;
if it is smaller, the result is significant.
e.g. if the p-value is < 0.05, the result is significant at the 5% level.
Rather than simply concluding 'reject H0' or 'do not reject H0',
the p-value indicates the strength of the evidence against the null
hypothesis H0: the smaller the p-value, the stronger the evidence.
SIGNIFICANCE LEVEL
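"Does a 5 percent significance level mean there is only a 5% chance that
my results are significant?"
No. The significance level is the alpha (α): the probability of rejecting the
null hypothesis when it is in fact true (a Type I error) that we are willing to accept.
A 5% significance level means α = 0.05.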
T TEST
The t test can be performed on three types of samples:
One sample
Two independent samples
Paired observations
T TEST
One-sample t test
The one-sample t test compares the mean of the sample to a given number.
Paired t test
The paired observations t test compares the mean of the
differences in the observations to a given number.
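For reference, the usual test statistics behind these t tests are shown below, where $\bar{x}$ is the sample mean, $s$ the sample standard deviation, $n$ the sample size, $\mu_0$ the given number, and $\bar{d}$, $s_d$ the mean and standard deviation of the paired differences; each statistic is compared with a t distribution on $n - 1$ degrees of freedom. The second line is an illustration with a hypothetical $\mu_0 = 85$ and the example data 80 85 90 90 90 100:

```latex
\begin{gathered}
t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \qquad
t_{\text{paired}} = \frac{\bar{d} - \mu_0}{s_d/\sqrt{n}}, \qquad
\text{df} = n - 1 \\
t = \frac{89.17 - 85}{6.646/\sqrt{6}} \approx \frac{4.17}{2.713} \approx 1.54, \qquad \text{df} = 5
\end{gathered}
```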
REGRESSION ANALYSIS
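Regression analysis is the analysis of the relationship
between one variable and another set of variables.
The simple linear regression model is
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
where
$y_i$ is the response variable
$x_i$ is a regressor variable
$\beta_0$ and $\beta_1$ are unknown parameters to be estimated
$\varepsilon_i$ is an error term.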
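In SAS/STAT, a model of this form can be fitted with PROC REG. The sketch below uses a small hypothetical data set (`fit`, with variables `x` and `y`); the OUTEST= data set `est` stores the estimate of $\beta_0$ in the variable `Intercept` and the estimate of $\beta_1$ in the variable named after the regressor, `x`.

```sas
/* Hypothetical data: x is the regressor, y the response. */
data fit;
  input x y @@;
  datalines;
1 3.1 2 4.9 3 7.2 4 8.8 5 11.1
;
run;

/* Least-squares fit of y = beta0 + beta1*x + error.
   OUTEST= saves the parameter estimates in a data set. */
proc reg data=fit outest=est;
  model y = x;
run;
quit;
```

For these values the least-squares estimates are 1.05 for the intercept and 1.99 for the slope.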
ANALYSIS OF VARIANCE
Analysis of variance (ANOVA) is a technique for analyzing experimental
data in which one or more response (or dependent or simply Y)
variables are measured under various conditions identified by one or
more classification variables.
Example :
An experiment may measure weight change (the dependent
variable) for men and women who participated in three different
weight-loss programs. The six cells of the design are formed by
the six combinations of sex (men, women) and program (A, B, C).
SAS/STAT
There are 54 procedures for statistical analysis.
Analysis of variance
Generalized linear models
Categorical data analysis
Mixed models
Survival analysis
Multivariate techniques
Nonparametric analysis
Psychometric analysis
PROC TTEST
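A sketch of the three kinds of t test in PROC TTEST. The data sets, variable names (`scores`, `trial`, `drug`, `response`, `bp`, `before`, `after`), their values, and the test value H0=85 are hypothetical, chosen only to illustrate the H0=, CLASS, and PAIRED syntax.

```sas
/* One-sample t test: is the mean score different from 85? */
data scores;
  input score @@;
  datalines;
80 85 90 90 90 100
;
run;

proc ttest data=scores h0=85;
  var score;
run;

/* Two independent samples: CLASS names the variable defining the groups. */
data trial;
  input drug $ response @@;
  datalines;
new 12 new 15 new 14 new 16 new 13
old 10 old 12 old 11 old 13 old 9
;
run;

proc ttest data=trial;
  class drug;
  var response;
run;

/* Paired observations: tests the mean of (before - after) against 0. */
data bp;
  input before after @@;
  datalines;
140 132 135 130 150 141 128 126 145 139
;
run;

proc ttest data=bp;
  paired before*after;
run;
```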
COMPARISON BETWEEN
PROC GLM AND PROC ANOVA
The GLM procedure can analyze both
balanced and unbalanced data.
Because PROC ANOVA takes into account the special structure of a balanced design,
it is faster and uses less storage than PROC GLM for balanced data; it is
intended for balanced data only.
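COMPARISON BETWEEN
PROC GLM AND PROC MIXED
In PROC GLM, effects listed in the RANDOM statement are still treated as
fixed when the model is fitted; the RANDOM statement is used to compute
expected mean squares for them (and, with the TEST option, F tests
based on those expected mean squares).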
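To illustrate the difference, the sketch below fits the same one-way model with a random `batch` effect in both procedures. The data set `yield` and its values are hypothetical. PROC GLM reports expected mean squares and, with the TEST option, F tests based on them; PROC MIXED estimates the batch variance component directly (by REML, its default method).

```sas
/* Hypothetical data: three batches, three measurements each. */
data yield;
  input batch $ y @@;
  datalines;
b1 10.2 b1 11.1 b1 10.7
b2 12.4 b2 12.9 b2 12.1
b3 9.8 b3 10.4 b3 10.0
;
run;

/* PROC GLM: batch is fitted as a fixed effect; RANDOM adds expected
   mean squares, and TEST requests F tests based on them. */
proc glm data=yield;
  class batch;
  model y = batch;
  random batch / test;
run;
quit;

/* PROC MIXED: batch is a genuine random effect whose variance is estimated.
   The MODEL statement has no fixed effects other than the intercept. */
proc mixed data=yield;
  class batch;
  model y = ;
  random batch;
run;
```

For these balanced data the batch variance estimate works out to (MS(batch) - MS(error))/3 = (4.68 - 0.153)/3, about 1.51.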
PROC ANOVA
PROC ANOVA < options > ;
CLASS variables ;
MODEL dependents=effects < / options > ;
ABSORB variables ;
BY variables ;
FREQ variable ;
MANOVA < test-options >< / detail-options > ;
MEANS effects < / options > ;
REPEATED factor-specification < / options > ;
TEST < H=effects > E=effect ;
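A minimal sketch of the syntax above, using a balanced version of the weight-loss example described earlier (two subjects per sex and program cell). The data values and the names `wtloss`, `sex`, `program`, and `change` are hypothetical. The MEANS statement with the TUKEY option requests Tukey's multiple-comparison test for the program means.

```sas
/* Hypothetical balanced data: 2 sexes x 3 programs x 2 subjects. */
data wtloss;
  input sex $ program $ change @@;
  datalines;
M A 5 M A 7 M B 4 M B 6 M C 8 M C 9
F A 4 F A 6 F B 3 F B 3 F C 7 F C 8
;
run;

/* Two-way ANOVA with interaction; appropriate here because every cell
   has the same number of observations. */
proc anova data=wtloss;
  class sex program;
  model change = sex program sex*program;
  means program / tukey;
run;
quit;
```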
PROC MIXED
The primary assumptions underlying the analyses performed by PROC
MIXED are as follows:
The data are normally distributed.
The means (expected values) of the data are linear in terms of a certain
set of parameters.
The variances and covariances of the data are in terms of a different
set of parameters, and they exhibit a structure matching one of those
available in PROC MIXED.
PROC MIXED
The following statements are available in PROC MIXED.
PROC MIXED < options > ;
BY variables ;
CLASS variables ;
ID variables ;
MODEL dependent = < fixed-effects > < / options > ;
RANDOM random-effects < / options > ;
REPEATED < repeated-effect > < / options > ;
PARMS (value-list) ... < / options > ;
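A sketch of the REPEATED statement for repeated measures: each subject (`id`) is measured at three times, and TYPE=CS assumes a compound-symmetry covariance structure within subject, one of the structures available in PROC MIXED. The data set `repmeas` and its values are hypothetical; SOLUTION prints the fixed-effect estimates.

```sas
/* Hypothetical repeated-measures data: 4 subjects x 3 time points. */
data repmeas;
  input id time $ y @@;
  datalines;
1 t1 120 1 t2 118 1 t3 115
2 t1 135 2 t2 130 2 t3 128
3 t1 128 3 t2 127 3 t3 121
4 t1 140 4 t2 136 4 t3 133
;
run;

/* Fixed effect of time; observations within a subject share a
   compound-symmetry covariance structure. */
proc mixed data=repmeas;
  class id time;
  model y = time / solution;
  repeated time / subject=id type=cs;
run;
```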
PROC GLM
The GLM procedure can be used for many different analyses, including
Simple regression
Multiple regression
Analysis of variance (ANOVA), especially for unbalanced data
Analysis of covariance
Response-surface models
Weighted regression
Polynomial regression
Partial correlation
Multivariate analysis of variance (MANOVA)
Repeated measures analysis of variance
PROC GLM
The following statements are available in PROC GLM.
PROC GLM < options > ;
CLASS variables ;
MODEL dependents=independents < / options > ;
ABSORB variables ;
BY variables ;
FREQ variable ;
ID variables ;
WEIGHT variable ;
PROC GLM(cont.)
CONTRAST 'label' effect values < ... effect values > < / options > ;
ESTIMATE 'label' effect values < ... effect values > < / options > ;
LSMEANS effects < / options > ;
MANOVA < test-options >< / detail-options > ;
MEANS effects < / options > ;
OUTPUT < OUT=SAS-data-set >
keyword=names < ... keyword=names > < / option > ;
RANDOM effects < / options > ;
REPEATED factor-specification < / options > ;
TEST < H=effects > E=effect < / options > ;
EXAMPLE ON GLM
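data exp;
  /* Two classification factors (A, B) and a response Y;
     @@ reads several observations from each data line. */
  input A $ B $ Y @@;
  datalines;
A1 B1 12 A1 B1 14
A1 B2 11 A1 B2 9
A2 B1 20 A2 B1 18
A2 B2 17
;
run;

/* Two-way ANOVA with interaction. Cell A2*B2 has only one observation,
   so the design is unbalanced and PROC GLM (not PROC ANOVA) is used. */
proc glm data=exp;
  class A B;
  model Y = A B A*B;
run;
quit;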
PROC FREQ
FREQ procedure produces one-way to n-way frequency and
crosstabulation (contingency) tables.
The statistics for contingency tables include
Chi-square tests and measures
Measures of association
Risks (binomial proportions) and risk differences for 2×2 tables
Odds ratios and relative risks for 2×2 tables
Tests for trend
Tests and measures of agreement
PROC FREQ
PROC FREQ < options > ;
BY variables ;
EXACT statistic-options < / computation-options > ;
OUTPUT < OUT=SAS-data-set > options ;
TABLES requests < / options > ;
TEST options ;
WEIGHT variable ;
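A sketch for a 2×2 table stored as cell counts. The data set `trial` and its counts are hypothetical. WEIGHT tells PROC FREQ that `count` holds the frequency of each row; CHISQ requests chi-square tests, and RELRISK requests the odds ratio and relative risks.

```sas
/* Hypothetical 2x2 table entered as one row per cell with its count. */
data trial;
  input drug $ outcome $ count;
  datalines;
new improved 30
new not 10
old improved 20
old not 20
;
run;

proc freq data=trial;
  weight count;
  tables drug*outcome / chisq relrisk;
run;
```

For these counts the Pearson chi-square statistic is 16/3, about 5.33, on 1 degree of freedom.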
PROC TABULATE
Simple but powerful methods to create tabular reports.
Flexibility in classifying the values of variables and establishing
hierarchical relationships between the variables.
Mechanisms for labeling and formatting variables and procedure-generated statistics.
PROC TABULATE
PROC TABULATE <option(s)>;
BY <DESCENDING> variable-1
<...<DESCENDING> variable-n>
<NOTSORTED>;
CLASS variable(s) </ options>;
CLASSLEV variable(s) / STYLE=<style-element-name | <PARENT>> <[style-attribute-specification(s)]>;
FREQ variable;
KEYLABEL keyword-1='description-1'
<...keyword-n='description-n'>;
KEYWORD keyword(s) / STYLE=<style-element-name | <PARENT>> <[style-attribute-specification(s)]>;
TABLE <<page-expression,> row-expression,> column-expression </ table-option(s)>;
VAR analysis-variable(s) </ options>;
WEIGHT variable;
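A sketch that summarizes hypothetical weight-loss data by program (rows) and sex (columns), showing the number of subjects and the mean weight change in each cell. The names `wtloss`, `sex`, `program`, `change`, and `tabout` are illustrative; OUT= also saves the table values in a data set.

```sas
/* Hypothetical data: 2 sexes x 3 programs x 2 subjects. */
data wtloss;
  input sex $ program $ change @@;
  datalines;
M A 5 M A 7 M B 4 M B 6 M C 8 M C 9
F A 4 F A 6 F B 3 F B 3 F C 7 F C 8
;
run;

/* Rows: program. Columns: sex, with N and MEAN of change in each cell. */
proc tabulate data=wtloss out=tabout;
  class sex program;
  var change;
  table program, sex*change*(n mean);
run;
```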
PROC UNIVARIATE
The UNIVARIATE procedure provides data summarization tools, high-resolution graphics displays, and information on the distribution of
numeric variables.
calculates descriptive statistics based on moments
calculates the median, mode, range, and quantiles
calculates robust estimates of location and scale
calculates confidence limits
generates frequency tables
performs goodness-of-fit tests for fitted parametric and nonparametric
distributions
creates quantile-quantile plots and probability plots for various
theoretical distributions
PROC UNIVARIATE
PROC UNIVARIATE <option(s)>;
BY <DESCENDING> variable-1
<...<DESCENDING> variable-n>
<NOTSORTED>;
CLASS variable-1<(variable-option(s))> <variable-2<(variable-option(s))>>
</ KEYLEVEL='value1' | ('value1' 'value2')>;
FREQ variable;
HISTOGRAM <variable(s)> </ option(s)>;
ID variable(s);
INSET <keyword(s) DATA=SAS-data-set> </ option(s)>;
OUTPUT <OUT=SAS-data-set> statistic-keyword-1=name(s)
<... statistic-keyword-n=name(s)> <percentiles-specification>;
PROBPLOT <variable(s)> </ option(s)>;
QQPLOT <variable(s)> </ option(s)>;
VAR variable(s);
WEIGHT variable;
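A sketch on the example data from the basic-statistics slides: the NORMAL option on the PROC statement adds tests for normality, HISTOGRAM with the NORMAL option overlays a fitted normal curve, and OUTPUT saves selected statistics. The names `scores`, `score`, and `uni` are illustrative; with only six observations the normality tests have little power.

```sas
/* Example data (illustrative names). */
data scores;
  input score @@;
  datalines;
80 85 90 90 90 100
;
run;

/* Moments, quantiles, extreme values, and tests for normality. */
proc univariate data=scores normal;
  var score;
  histogram score / normal;
  output out=uni mean=avg median=med mode=mo q1=lowerq q3=upperq;
run;
```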
PROC NPAR1WAY
The NPAR1WAY procedure performs nonparametric tests for location
and scale differences across a one-way classification.
PROC NPAR1WAY also provides a standard analysis of variance on
the raw data and statistics based on the empirical distribution function.
PROC NPAR1WAY provides tests using the raw input data as scores.
When the data are classified into two samples, tests are based on
simple linear rank statistics.
When the data are classified into more than two samples, tests are
based on one-way ANOVA statistics.
Both asymptotic and exact p-values are available for these tests.
PROC NPAR1WAY
PROC NPAR1WAY < options > ;
BY variables ;
CLASS variable ;
EXACT statistic-options < / computation-options > ;
FREQ variable ;
OUTPUT < OUT=SAS-data-set > < options > ;
VAR variables ;
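A sketch comparing two hypothetical groups (`a`, `b`) with the Wilcoxon rank-sum test. Because there are two samples, the test is based on a simple linear rank statistic; EXACT WILCOXON adds the exact p-value to the asymptotic one. The data set `groups` and its values are illustrative.

```sas
/* Hypothetical data: two groups of five observations. */
data groups;
  input group $ y @@;
  datalines;
a 12 a 15 a 11 a 14 a 13
b 18 b 16 b 19 b 17 b 20
;
run;

/* Wilcoxon rank-sum test with asymptotic and exact p-values. */
proc npar1way data=groups wilcoxon;
  class group;
  var y;
  exact wilcoxon;
run;
```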
Output     Produces
Listing    listing output
HTML       HTML output
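In SAS these output types correspond to ODS (Output Delivery System) destinations. A minimal sketch that sends one procedure's output to an HTML file; SASHELP.CLASS is a sample data set shipped with SAS, and the file name `class_means.html` is an arbitrary choice.

```sas
/* Open the HTML destination and route the following output to a file. */
ods html file='class_means.html';

proc means data=sashelp.class mean std;
  var height weight;
run;

/* Close the HTML destination so the file is written completely. */
ods html close;
```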