Uploaded by Rutwik

SAS FOR STATISTICAL ANALYSIS

OVERVIEW

SAS/STAT Software

Component of the SAS System

Provides comprehensive statistical tools for a wide range of statistical analyses, including analysis of variance, regression, categorical data analysis, multivariate analysis, and survival analysis.

In addition to 54 procedures for statistical analysis, SAS/STAT software also includes the Market Research Application (MRA), a point-and-click interface to commonly used techniques in market research.

BASIC STATISTICS

Mean

Median

Mode

Dispersion

Standard Deviation

Range

Percentiles

Quartiles

MEAN

An arithmetic average

Procedure for computing

Add up the numbers

Divide by the number of observations

Example: 80 85 90 90 90 100

Mean = (80 + 85 + 90 + 90 + 90 + 100) / 6

= 535 / 6

= 89.17

MEDIAN

The middle value of the sorted data

Procedure for computing

Sort the data

For an even number of observations

Median = average of the (n/2)th and ((n/2)+1)th observations

For an odd number of observations

Median = ((n+1)/2)th observation

Example: 80 85 90 90 90 100

n = 6

Median = (3rd obs + 4th obs) / 2

= (90 + 90) / 2

= 90

MODE

Most frequently occurring observation

The value that is repeated most often in the data set.

Example:

data = 80 85 90 90 90 100

90 occurs three times, more often than any other value, so

Mode = 90

DISPERSION

Standard Deviation

Square root of the average of the squared distances of the observations from the mean.

Range

Difference between highest and lowest observed value.

Percentiles

Divide the data into 100 equal parts.

Quartiles

Divide the data into 4 equal parts.
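
The measures defined above can be requested from SAS in a single step. The following is a minimal sketch rather than part of the original slides: it assumes the six scores from the examples above are stored in a data set named scores (a name chosen here for illustration) and asks PROC MEANS for the statistics by keyword.

```sas
/* The six scores used in the mean, median, and mode examples */
data scores;
  input score @@;          /* @@ reads several values from one line */
  datalines;
80 85 90 90 90 100
;
run;

/* Location (mean, median, mode) and dispersion (std, range,
   quartiles, one percentile) in a single summary table */
proc means data=scores n mean median mode std range q1 q3 p90 maxdec=2;
  var score;
run;
```

For these data the table should report a mean of 89.17, a median of 90, a mode of 90, and a range of 20. Note that by default PROC MEANS divides by n-1 when computing the standard deviation; the VARDEF=N option gives the plain average of squared distances described above.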

PROBABILITY

Event:

One or more of the possible outcomes of doing something.

Example:

Getting over an inch of rain tomorrow is an event; its probability reflects the likelihood that we will get this much rain.

PROBABILITY

A probability is a number from 0 to 1.

If an event has probability 0, the event will never occur.

If an event has probability 1, the event will always occur.

If an event has probability 0.5, it is just as likely for the event to occur as for the event not to occur.

HYPOTHESIS TESTING

A procedure for making rational decisions about the reality of effects.

Setting up and testing hypotheses is an essential part of statistical inference.

Example:

Claiming that a new drug is better than the current drug for treatment of the same symptoms.

In each problem considered, the question of interest is simplified into two competing claims (hypotheses):

Null hypothesis (H0)

Alternative hypothesis (H1)

NULL HYPOTHESIS

The hypothesis that there is no effect is called the null hypothesis (H0).

Note: unlike in geometry, we cannot prove that the effects are real; rather, we may decide that the effects are real.

Example:

In a clinical trial of a new drug, the null hypothesis might be that the new drug is no better, on average, than the current drug.

H0: there is no difference between the two drugs, on average.

ALTERNATIVE HYPOTHESIS

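The hypothesis that there is an effect is called the alternative hypothesis (H1); it is the claim set against the null hypothesis.

Example:

In a clinical trial of a new drug, the alternative hypothesis might be that the new drug has a different effect, on average, compared to that of the current drug.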

H1: the two drugs have different effects, on average.

OR

H1: the new drug is better than the current drug, on average.

P-VALUE

The probability, computed assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed.

The p-value is compared with the significance level; if it is smaller (for example, p-value < 0.05 at the 5% level), the result is significant.

The p-value indicates the strength of the evidence for, say, rejecting the null hypothesis H0, rather than simply concluding 'reject H0' or 'do not reject H0'.

SIGNIFICANCE LEVEL

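"Does a 5 percent significance level mean there is only a 5% chance that my results are significant?"

No. The significance level is the alpha: the probability of rejecting the null hypothesis when it is in fact true, that is, of declaring an effect that arose only because of random variation (luck).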

T TEST

The t test is performed on three types of samples:

One sample

Two samples

Paired observations

T TEST

One-sample t-test

The one-sample t test compares the mean of the sample to a given number.

Two-sample t-test

The two-sample t test compares the mean of the first sample minus the mean of the second sample to a given number.

Paired t-test

The paired observations t test compares the mean of the differences in the observations to a given number.

REGRESSION ANALYSIS

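Regression analysis is the analysis of the relationship between one variable and another set of variables.

For example, the simple linear regression model is

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$$

Where

$y_i$ is the response variable

$x_i$ is a regressor variable

$\beta_0$ and $\beta_1$ are unknown parameters to be estimated

$\epsilon_i$ is an error term.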
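
As an illustration of how such a model might be fitted, here is a minimal sketch using PROC REG, which is part of SAS/STAT. The data set name fit, the variable names x and y, and the five data points are invented for this example and do not come from the slides.

```sas
/* Hypothetical data: x is the regressor, y is the response */
data fit;
  input x y @@;
  datalines;
1 2.1  2 3.9  3 6.2  4 7.8  5 10.1
;
run;

/* Least-squares estimates of the intercept and the slope,
   with t tests and p-values for each parameter */
proc reg data=fit;
  model y = x;
run;
quit;
```

The "Parameter Estimates" table in the output lists the estimated intercept and slope, each with a t test of the null hypothesis that the parameter is zero.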

ANALYSIS OF VARIANCE

Analysis of variance (ANOVA) is a technique for analyzing experimental data in which one or more response (or dependent or simply Y) variables are measured under various conditions identified by one or more classification variables.

Example:

An experiment may measure weight change (the dependent variable) for men and women who participated in three different weight-loss programs. The six cells of the design are formed by the six combinations of sex (men, women) and program (A, B, C).

SAS/STAT

There are 54 procedures for statistical analysis.

Analysis of variance

Generalized linear models

Categorical data analysis

Mixed models

Survival analysis

Multivariate techniques

Nonparametric analysis

Psychometric analysis

PROC TTEST

PROC TTEST < options > ;

CLASS variable ;

PAIRED variables ;

BY variables ;

VAR variables ;

FREQ variable ;

WEIGHT variable ;

No statement can be used more than once. There is no restriction on the order of the statements after the PROC statement.
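
The three kinds of t test map onto these statements roughly as follows. This is a sketch under stated assumptions: the data set trial, its variables group, before, and after, and all values are hypothetical, and the comparison value 125 is arbitrary.

```sas
/* Hypothetical data: two groups with before/after measurements */
data trial;
  input group $ before after @@;
  datalines;
A 120 115  A 132 128  A 118 119  A 125 120
B 130 121  B 128 122  B 135 124  B 122 118
;
run;

/* One sample: compare the mean of BEFORE to the given number 125 */
proc ttest data=trial h0=125;
  var before;
run;

/* Two samples: compare the mean of AFTER between groups A and B */
proc ttest data=trial;
  class group;
  var after;
run;

/* Paired observations: test the mean of the differences before - after */
proc ttest data=trial;
  paired before*after;
run;
```

In each case the p-value in the t test table is compared with the chosen significance level to decide whether the result is significant.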

COMPARISON BETWEEN PROC GLM AND PROC ANOVA

PROC GLM can analyze both balanced and unbalanced data.

PROC ANOVA is designed to handle balanced data (that is, data with equal numbers of observations for every combination of the classification factors).

Because PROC ANOVA takes into account the special structure of a balanced design, it is faster and uses less storage than PROC GLM for balanced data.

COMPARISON BETWEEN PROC GLM AND PROC MIXED

RANDOM statement

PROC GLM treats the effects in its RANDOM statement as fixed and computes expected mean squares.

PROC MIXED computes REML and ML estimates of variance parameters.

REPEATED statement

In PROC MIXED, it is used to specify covariance structures for repeated measurements on subjects.

In PROC GLM, it is used to specify various transformations with which to conduct the traditional univariate or multivariate tests.

PROC MIXED is more flexible and more widely applicable than either the univariate or multivariate approaches.

PROC ANOVA

PROC ANOVA < options > ;

CLASS variables ;

MODEL dependents=effects < / options > ;

ABSORB variables ;

BY variables ;

FREQ variable ;

MANOVA < test-options >< / detail-options > ;

MEANS effects < / options > ;

REPEATED factor-specification < / options > ;

TEST < H=effects > E=effect ;
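
To show how these statements fit together, the following sketch analyzes a balanced version of the weight-loss design described earlier (two sexes by three programs, two subjects per cell). The data set name weightloss and every value in it are invented for illustration.

```sas
/* Hypothetical balanced data: 2 sexes x 3 programs x 2 subjects per cell */
data weightloss;
  input sex $ program $ change @@;
  datalines;
M A 4.1  M A 3.6  M B 5.2  M B 4.8  M C 2.9  M C 3.3
F A 3.8  F A 4.4  F B 5.9  F B 5.1  F C 3.0  F C 2.5
;
run;

proc anova data=weightloss;
  class sex program;                        /* classification variables */
  model change = sex program sex*program;   /* main effects and interaction */
  means program / tukey;                    /* pairwise comparison of programs */
run;
quit;
```

Because every cell holds the same number of observations, the design is balanced and PROC ANOVA is appropriate; with unequal cell sizes PROC GLM would be the safer choice.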

PROC MIXED

The primary assumptions underlying the analyses performed by PROC MIXED are as follows:

The data are normally distributed.

The means (expected values) of the data are linear in terms of a certain set of parameters.

The variances and covariances of the data are in terms of a different set of parameters, and they exhibit a structure matching one of those available in PROC MIXED.

PROC MIXED

The following statements are available in PROC MIXED.

PROC MIXED < options > ;

BY variables ;

CLASS variables ;

ID variables ;

MODEL dependent = < fixed-effects > < / options > ;

RANDOM random-effects < / options > ;

REPEATED < repeated-effect > < / options > ;

PARMS (value-list) ... < / options > ;

PRIOR < distribution > < / options > ;

CONTRAST 'label' < fixed-effect values ... > < | random-effect values ... > , ... < / options > ;

ESTIMATE 'label' < fixed-effect values ... > < | random-effect values ... > < / options > ;

LSMEANS fixed-effects < / options > ;

MAKE 'table' OUT=SAS-data-set ;

WEIGHT variable ;
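
A small random-intercept model gives a feel for how the MODEL and RANDOM statements combine. This is only a sketch: the data set growth, its variables, and its values are hypothetical, and a real analysis would need far more subjects.

```sas
/* Hypothetical repeated measurements: 4 subjects, 3 time points each */
data growth;
  input subject $ time y @@;
  datalines;
S1 1 10.2  S1 2 11.9  S1 3 14.1
S2 1  9.1  S2 2 10.8  S2 3 12.2
S3 1 11.5  S3 2 13.4  S3 3 15.6
S4 1 10.0  S4 2 11.1  S4 3 13.0
;
run;

proc mixed data=growth method=reml;
  class subject;
  model y = time / solution;             /* fixed effect: linear trend in time */
  random intercept / subject=subject;    /* each subject gets its own intercept */
run;
```

The "Covariance Parameter Estimates" table reports the REML estimates of the between-subject and residual variances, and the SOLUTION option prints the fixed-effect estimates.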

PROC GLM

The GLM procedure can be used for many different analyses, including

Simple regression

Multiple regression

Analysis of variance (ANOVA), especially for unbalanced data

Analysis of covariance

Response-surface models

Weighted regression

Polynomial regression

Partial correlation

Multivariate analysis of variance (MANOVA)

Repeated measures analysis of variance

PROC GLM

The following statements are available in PROC GLM.

PROC GLM < options > ;

CLASS variables ;

MODEL dependents=independents < / options > ;

ABSORB variables ;

BY variables ;

FREQ variable ;

ID variables ;

WEIGHT variable ;

PROC GLM(cont.)

CONTRAST 'label' effect values < ... effect values > < / options > ;

ESTIMATE 'label' effect values < ... effect values > < / options > ;

LSMEANS effects < / options > ;

MANOVA < test-options >< / detail-options > ;

MEANS effects < / options > ;

OUTPUT < OUT=SAS-data-set > keyword=names < ... keyword=names > < / option > ;

RANDOM effects < / options > ;

REPEATED factor-specification < / options > ;

TEST < H=effects > E=effect < / options > ;

EXAMPLE ON GLM

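/* Unbalanced two-way layout: cell A2 B2 has only one observation */
data exp;
  input A $ B $ Y @@;   /* @@ reads several observations per line */
  datalines;
A1 B1 12 A1 B1 14
A1 B2 11 A1 B2 9
A2 B1 20 A2 B1 18
A2 B2 17
;
run;

/* Main effects of A and B plus their interaction */
proc glm data=exp;
  class A B;
  model Y = A B A*B;
run;
quit;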

PROC FREQ

The FREQ procedure produces one-way to n-way frequency and crosstabulation (contingency) tables.

The statistics for contingency tables include

Chi-square tests and measures

Measures of association

Risks (binomial proportions) and risk differences for 2×2 tables

Odds ratios and relative risks for 2×2 tables

Tests for trend

Tests and measures of agreement

PROC FREQ

PROC FREQ < options > ;

BY variables ;

EXACT statistic-options < / computation-options > ;

OUTPUT < OUT=SAS-data-set > options ;

TABLES requests < / options > ;

TEST options ;

WEIGHT variable ;
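
As a sketch of how these statements are typically combined for a 2×2 table, the example below reads cell counts and requests chi-square tests, risk differences, and relative risks. The data set exposure and its counts are hypothetical.

```sas
/* Hypothetical 2x2 table entered as one row per cell with a count */
data exposure;
  input exposed $ disease $ count;
  datalines;
Yes Yes 20
Yes No  30
No  Yes 10
No  No  40
;
run;

proc freq data=exposure order=data;
  weight count;                                     /* each row stands for COUNT subjects */
  tables exposed*disease / chisq relrisk riskdiff;  /* tests, odds ratio, risks */
  exact fisher;                                     /* exact test for small counts */
run;
```

ORDER=DATA keeps the rows and columns in the order they appear in the data, so that "Yes" comes first in both dimensions.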

PROC TABULATE

Simple but powerful methods to create tabular reports.

Flexibility in classifying the values of variables and establishing hierarchical relationships between the variables.

Mechanisms for labeling and formatting variables and procedure-generated statistics.

PROC TABULATE

PROC TABULATE <option(s)>;

BY <DESCENDING> variable-1 <...<DESCENDING> variable-n> <NOTSORTED>;

CLASS variable(s) </ options>;

CLASSLEV variable(s) / style=<style-element-name | <PARENT>> <[style-attribute-specification(s)]>;

FREQ variable;

KEYLABEL keyword-1='description-1' <...keyword-n='description-n'>;

KEYWORD keyword(s) / style=<style-element-name | <PARENT>> <[style-attribute-specification(s)]>;

TABLE <<page-expression,> row-expression,> column-expression </ table-option(s)>;

VAR analysis-variable(s) </ options>;

WEIGHT variable;
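
A short report illustrates the CLASS, VAR, TABLE, and KEYLABEL statements. This sketch uses SASHELP.CLASS, a sample data set shipped with SAS; the choice of variables and labels is purely illustrative.

```sas
/* Rows: each sex plus a total row; columns: count and mean of two variables */
proc tabulate data=sashelp.class;
  class sex;                                       /* classification variable */
  var height weight;                               /* analysis variables */
  table sex all, (height weight)*(n mean*f=8.1);   /* row expression, column expression */
  keylabel n='Count' mean='Average' all='Total';   /* relabel the keywords */
run;
```

The comma in the TABLE statement separates the row expression from the column expression, and the asterisk nests the statistics under each analysis variable.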

PROC UNIVARIATE

The UNIVARIATE procedure provides data summarization tools, high-resolution graphics displays, and information on the distribution of numeric variables.

Calculates descriptive statistics based on moments

Calculates the median, mode, range, and quantiles

Calculates the robust estimates of location and scale

Calculates confidence limits

Generates frequency tables

Performs goodness-of-fit tests for fitted parametric and nonparametric distributions

Creates quantile-quantile plots and probability plots for various theoretical distributions

PROC UNIVARIATE

PROC UNIVARIATE <option(s)>;

BY <DESCENDING> variable-1 <...<DESCENDING> variable-n> <NOTSORTED>;

CLASS variable-1<(variable-option(s))> <variable-2<(variable-option(s))>> </ KEYLEVEL='value1'|('value1' 'value2')>;

FREQ variable;

HISTOGRAM <variable(s)> </ option(s)>;

ID variable(s);

INSET <keyword(s) DATA=SAS-data-set> </ option(s)>;

OUTPUT <OUT=SAS-data-set> statistic-keyword-1=name(s) <... statistic-keyword-n=name(s)> <percentiles-specification>;

PROBPLOT <variable(s)> </ option(s)>;

QQPLOT <variable(s)> </ option(s)>;

VAR variable(s);

WEIGHT variable;
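
The sketch below shows one plausible way to combine several of these statements to examine a single variable. It uses the SASHELP.CLASS sample data set shipped with SAS; the variable and the options chosen are illustrative only.

```sas
proc univariate data=sashelp.class normal;   /* NORMAL requests tests for normality */
  var height;
  histogram height / normal;                 /* histogram with a fitted normal curve */
  inset n mean std / position=ne;            /* summary box inside the histogram */
  qqplot height / normal(mu=est sigma=est);  /* Q-Q plot against the fitted normal */
run;
```

Besides the plots, the default output includes the moments, basic measures of location and variability, quantiles, and extreme observations.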

PROC NPAR1WAY

The NPAR1WAY procedure performs nonparametric tests for location and scale differences across a one-way classification.

PROC NPAR1WAY also provides a standard analysis of variance on the raw data and statistics based on the empirical distribution function.

PROC NPAR1WAY provides tests using the raw input data as scores.

When the data are classified into two samples, tests are based on simple linear rank statistics.

When the data are classified into more than two samples, tests are based on one-way ANOVA statistics.

Both asymptotic and exact p-values are available for these tests.

PROC NPAR1WAY

PROC NPAR1WAY < options > ;

BY variables ;

CLASS variable ;

EXACT statistic-options < / computation-options > ;

FREQ variable ;

OUTPUT < OUT=SAS-data-set > < options > ;

VAR variables ;
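
As a minimal sketch of a two-sample nonparametric comparison, the example below runs the Wilcoxon rank-sum test on the SASHELP.CLASS sample data set shipped with SAS. The choice of variables is illustrative only.

```sas
/* Compare the distribution of height between the two levels of sex */
proc npar1way data=sashelp.class wilcoxon;
  class sex;        /* one-way classification with two samples */
  var height;
  exact wilcoxon;   /* exact p-value in addition to the asymptotic one */
run;
```

With a CLASS variable that has more than two levels, the same WILCOXON option produces the Kruskal-Wallis test.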

OUTPUT DELIVERY SYSTEM (ODS)

With ODS you can:

Display your output in Rich Text Format (RTF)

Create SAS data sets directly from output tables

Select or exclude individual output tables

Customize the layout, format, and headers of your output

ODS combines raw data with one or more table definitions to produce one or more output objects. These objects can be sent to any or all ODS destinations.

How ODS Works

In your ODS statement(s), you specify one or more destinations for your output.

This destination . . .    Produces . . .
Output                    SAS data sets
Listing                   listing output
HTML                      HTML output

ODS LISTING <action>;

ODS LISTING <DATAPANEL=number | DATA | PAGE>;

ODS HTML HTML-file-specification(s) <option(s)>;

ODS OUTPUT data-set-definition(s);
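
The following sketch shows how these statements might be used around a procedure step. The file name ttest_report.html and the data set name work.ttest_results are hypothetical; Statistics and TTests are names of output tables produced by PROC TTEST, and SASHELP.CLASS is a sample data set shipped with SAS.

```sas
ods html file='ttest_report.html';      /* open the HTML destination (hypothetical file name) */
ods select Statistics TTests;           /* keep only these two output tables */
ods output TTests=work.ttest_results;   /* also save the t test table as a data set */

proc ttest data=sashelp.class;
  class sex;
  var height;
run;

ods html close;                         /* finish writing the HTML file */

/* The saved table can now be used like any other SAS data set */
proc print data=work.ttest_results;
run;
```

The ODS SELECT and ODS OUTPUT statements apply to the procedure step that follows them, so they are placed immediately before the PROC TTEST step.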
