SAS - Fundamentals - Statistics

The SAS System may be viewed as a library of prewritten statistical algorithms.
By submitting a brief SAS program, you can access a procedure from the library and use it to analyze a set of data.
Most statistical research involves gathering data and performing analyses to determine what the data mean.
_________________________________________________________________________________________________
The process of research often begins by developing a clear statement of the research question (or questions). The
research question is a statement of what you hope to have learned by the time the research has been completed. It is
good practice to revise and refine the research question several times to ensure that you are very clear about what it is
you really want to know. The reason for refining and revising the research question is to make it concise with respect to
the events and variables involved, so that the research is correctly focused. Example: "What variables have a causal effect
on an event?" Once the research question has been more clearly defined, you are in a better position to develop a good
hypothesis that provides a possible answer to the question.
A hypothesis is a statement about the predicted relationships among events or variables. In developing the hypothesis,
you might be influenced by any number of sources: an existing theory, some related research, or even personal
experience. For example, we can state the hypothesis "the amount of Y increases the amount of X". This relationship can be
illustrated with a figure in which the variable being affected (X) appears on the left side, the causal variable (Y) appears
on the right, and an arrow reflects the prediction that Y is the causal variable and X is the variable being affected.
A statistical null hypothesis is typically a prediction that there
is no difference between groups in the population, or that
there is no relationship between variables in the population.
You will analyze the data from your sample, and if the observed difference is large enough, you will reject this null
hypothesis of no difference. Rejecting this statistical null hypothesis means that you have obtained some support for your
original research hypothesis (the hypothesis that there is a difference between the groups).
Statistical null hypotheses are often represented symbolically.
A statistical alternative hypothesis is typically a prediction that there is a difference between groups in the population,
or that there is a relationship between variables in the population. The alternative hypothesis is the counterpart to the null
hypothesis; if you reject the null hypothesis, you tentatively accept the alternative hypothesis. There are different ways
that you can state alternative hypotheses. One way is simply to predict that there is a difference between the population
means, without predicting which population mean is higher.
A nondirectional hypothesis (also known as a two-sided or two-tailed alternative hypothesis) simply predicts that one
population mean differs from the other population mean; it does not predict which population mean will be higher. With
a nondirectional alternative hypothesis, you are predicting some type of difference, but you are not predicting the specific
nature, or direction, of the difference.
A directional hypothesis (also known as a one-sided or one-tailed alternative hypothesis) not only predicts that there will
be a difference, but also makes a specific prediction about which population will display the higher mean.
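The null and alternative hypotheses described above are often written symbolically. The following is a sketch for the two-group case, assuming the symbols \mu_1 and \mu_2 stand for the two population means (the notation is an assumption; the text itself names no symbols):

```latex
\begin{align*}
H_0 &: \mu_1 = \mu_2    && \text{(null hypothesis: no difference)} \\
H_1 &: \mu_1 \neq \mu_2 && \text{(nondirectional, two-sided alternative)} \\
H_1 &: \mu_1 > \mu_2    && \text{(directional, one-sided alternative)}
\end{align*}
```

A directional alternative could equally predict \mu_1 < \mu_2; only one direction is stated in advance.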
Choosing directional versus non-directional tests?

Most statistics textbooks recommend using a nondirectional, or two-sided, alternative hypothesis in most cases. The
problem with the directional hypothesis is that if your obtained sample means are in the direction opposite to the one
you predicted, you will fail to reject the null hypothesis even when there are very large differences between
the sample means.
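The risk of a directional test can be seen in a small numeric sketch. It is written in Python rather than SAS and assumes a z statistic that follows a standard normal distribution under the null hypothesis. The function name p_values and the value -3.0 are hypothetical, chosen only for illustration:

```python
from statistics import NormalDist


def p_values(z):
    """Return two-sided and one-sided p-values for a z statistic.

    "upper" is the p-value for the directional alternative
    "mean 1 > mean 2"; "lower" is for "mean 1 < mean 2".
    """
    std_normal = NormalDist()            # mean 0, standard deviation 1
    upper = 1.0 - std_normal.cdf(z)      # H1: difference > 0
    lower = std_normal.cdf(z)            # H1: difference < 0
    two_sided = 2.0 * min(upper, lower)  # H1: difference != 0
    return {"two_sided": two_sided, "upper": upper, "lower": lower}


# Hypothetical outcome: a large difference, but in the direction
# opposite to a predicted "mean 1 > mean 2".
result = p_values(-3.0)
print(result)
```

With this outcome the two-sided p-value is small, while the p-value for the predicted direction ("upper") is close to 1, so the directional test fails to reject the null hypothesis despite the large difference.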
____________________________________________________________________________
Data is defined as a collection of scores that are obtained when subject characteristics and/or performance are observed
and recorded. Different types of instruments can be used to obtain different types of data.
With the null hypothesis stated, you can now test it by conducting a study in which you gather and analyze relevant data.
If the results come out as predicted, they would lend some support to your research hypothesis; if not, the results would fail to provide support.
In either case, you would be able to draw conclusions regarding the tenability of your hypotheses, and would have made
some progress toward answering your research question. For example, if you obtained support for your hypothesis with
a correlational study, you might choose to follow it up with a study using a different research method, perhaps an
experimental study.
A variable refers to some specific characteristic of a subject that can assume one or more different values. A value refers
to either a particular subject's relative standing on a quantitative variable, or a subject's classification within a classification
variable. Quantitative variables represent the quantity, or amount, of the construct that is being assessed; numbers
typically serve as values. In classification (qualitative or categorical) variables, different values represent different groups
to which the subject might belong.
Classifying Variables According to Their Scales of Measurement:

There are four different scales of measurement: nominal, ordinal, interval, and ratio.
Before analyzing a data set, it is important to determine which scales of measurement were used because certain types
of statistical procedures require specific scales of measurement. For example, a one-way analysis of variance generally
requires that the dependent variable be an interval-level or ratio-level variable; the chi-square test of independence allows
you to analyze nominal-level variables; other statistics make other assumptions about the scale of measurement used
with the variables that are being studied.
A nominal scale is a classification system that places people, objects, or other entities into mutually exclusive categories.
A variable that is measured using a nominal scale is a classification variable: it simply indicates the name of the group to
which each subject belongs. Examples are sex and political party: they tell you which group a subject belongs to, but they
do not provide any quantitative information about the subjects. With the remaining three scales of measurement, however,
some quantitative information is provided.
Values on an ordinal scale represent the rank order of the subjects with respect to the variable that is being assessed.
However, an ordinal scale has a serious limitation in that equal differences in scale values do not necessarily have equal
quantitative meaning. Rankings tell us very little about the quantitative differences between the subjects with regard
to the underlying construct. An ordinal scale simply provides a rank order of who is better than whom.
With an interval scale, equal differences between scale values do have equal quantitative meaning. For this reason, you
can see that the interval scale provides more quantitative information than the ordinal scale. A good example of an interval
scale is the Fahrenheit scale used to measure temperature. With the Fahrenheit scale, the difference between 70 degrees
and 75 degrees is equal to the difference between 80 degrees and 85 degrees: the units of measurement are equal
throughout the full range of the scale. However, the interval scale also has an important limitation: it does not have a true
zero point. A true zero point means that a value of zero on the scale represents zero quantity of the construct being
assessed. The Fahrenheit scale does not have a true zero point: when the thermometer reads
zero degrees, that does not mean that there is absolutely no heat present in the environment; it is still possible for the
temperature to go lower (into the negative numbers). A true zero point can be found only with variables measured on a
ratio scale.
Ratio scales are similar to interval scales in that equal differences between scale values do have equal quantitative
meaning. However, ratio scales also have a true zero point, which gives them an additional property: with ratio scales, it
is possible to make meaningful statements about the ratios between scale values. For example, the system of inches used
with a common ruler is a ratio scale. There is a true zero point with this system, in that zero inches does
in fact indicate a complete absence of length. With this scale, it is possible to make meaningful statements about ratios:
an object 4 inches long is twice as long as an object 2 inches long.
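The difference between interval and ratio scales can be checked numerically. The sketch below is in Python, with hypothetical helper names, and uses the standard conversions Fahrenheit to Celsius and inches to centimeters. A ratio is meaningful only if it survives a change of units:

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit reading to Celsius (both are interval scales)."""
    return (f - 32.0) * 5.0 / 9.0


def inches_to_cm(inches):
    """Convert inches to centimeters (both are ratio scales)."""
    return inches * 2.54


# Interval scale: equal differences stay equal after conversion.
diff_low = fahrenheit_to_celsius(75) - fahrenheit_to_celsius(70)
diff_high = fahrenheit_to_celsius(85) - fahrenheit_to_celsius(80)

# Interval scale: ratios do NOT survive conversion, so "80 degrees is
# twice as hot as 40 degrees" is not a meaningful statement.
ratio_f = 80 / 40
ratio_c = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)

# Ratio scale: ratios are the same in either unit.
ratio_in = 4 / 2
ratio_cm = inches_to_cm(4) / inches_to_cm(2)

print(diff_low, diff_high)
print(ratio_f, ratio_c)
print(ratio_in, ratio_cm)
```

The temperature ratio changes from 2 in Fahrenheit to about 6 in Celsius, while the length ratio stays at 2 in both units.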
Classifying Variables According to the Number of Values They Display:
There are three types:
A dichotomous variable assumes just two values (sometimes called a binary variable), for example sex.
A limited-value variable assumes just two to six values in your sample.
A multi-value variable assumes more than six values in your sample.
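The three categories above can be expressed as a small classification rule. This is a sketch in Python with a hypothetical function name. Because the "two values" and "two to six values" definitions overlap, it assumes that exactly two distinct values counts as dichotomous, and it adds a "constant" label (not part of the text) for fewer than two values:

```python
def classify_by_number_of_values(values):
    """Classify a variable by how many distinct values it shows in a sample."""
    n_distinct = len(set(values))
    if n_distinct < 2:
        return "constant"        # assumption: label not used in the text
    if n_distinct == 2:
        return "dichotomous"     # assumption: two values is dichotomous
    if n_distinct <= 6:
        return "limited-value"   # three to six distinct values
    return "multi-value"         # more than six distinct values


# Hypothetical sample data for illustration only.
print(classify_by_number_of_values(["F", "M", "F", "F", "M"]))
print(classify_by_number_of_values([1, 2, 3, 4, 5, 3, 2]))
print(classify_by_number_of_values(range(100)))
```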
Basic Approaches to Research

Nonexperimental Research (correlational, nonmanipulative, or observational research): The researcher simply studies the
naturally occurring relationship between two or more naturally occurring variables (a naturally occurring variable is one
that is not manipulated or controlled by the researcher; it is simply measured as it normally exists). With nonexperimental designs, researchers often
refer to criterion variables and predictor variables. A criterion variable is an outcome variable that can be predicted from
one or more predictor variables. The criterion variable is often the main focus of the study in that it is the outcome variable
mentioned in the statement of the research problem. The predictor variable, on the other hand, is the variable that is
used to predict values on the criterion. In some studies, you might even believe that the predictor variable has a causal
effect on the criterion. It should be noted here that nonexperimental research that investigates the relationship between
just two variables generally provides very weak evidence concerning cause-and-effect relationships.
To obtain stronger evidence of cause and effect, researchers generally either analyze the relationships among a larger
number of variables using sophisticated statistical procedures such as structural equation modeling, or drop the
nonexperimental approach entirely and instead use experimental research methods.
Experimental Research characteristics:
subjects are randomly assigned to experimental conditions
the researcher manipulates an independent variable
subjects in different experimental conditions are treated similarly with regard to all variables except the
independent variable
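Random assignment, the first characteristic listed above, can be sketched as follows. This is an illustration in Python, not SAS. The function name, subject labels, condition names, and the fixed seed are all hypothetical, and the seed is set only so the example is reproducible:

```python
import random


def randomly_assign(subjects, conditions, seed=None):
    """Shuffle the subjects, then deal them out to the conditions in turn."""
    rng = random.Random(seed)    # private generator, optionally seeded
    shuffled = list(subjects)    # copy so the caller's list is unchanged
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, subject in enumerate(shuffled):
        condition = conditions[index % len(conditions)]
        groups[condition].append(subject)
    return groups


# Twenty hypothetical subjects, two conditions.
subjects = [f"S{i:02d}" for i in range(1, 21)]
groups = randomly_assign(subjects, ["experimental", "control"], seed=42)
for condition, members in groups.items():
    print(condition, members)
```

Dealing subjects out in turn after shuffling keeps the group sizes as equal as possible, while chance alone decides which subject lands in which condition.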
The independent variable is that variable whose values (or levels) are selected by the experimenter to determine what
effect the independent variable has on the dependent variable. The independent variable is the experimental counterpart
to a predictor variable. A dependent variable, on the other hand, is some aspect of the subject's behavior that is assessed
to determine whether it has been affected by the independent variable. The dependent variable is the experimental
counterpart to a criterion variable.
The terms predictor variable and criterion variable can be used with almost any type of research, experimental or
nonexperimental. However, the terms independent variable and dependent variable should be used only with
experimental research: research conducted under controlled conditions with a true manipulated independent variable.
Levels of the independent variable: experimental conditions or treatment conditions, corresponding to the different
groups to which a subject might be assigned. With respect to the independent variable, it is common to speak of the
experimental group versus the control group. Generally speaking, the experimental group is the group that receives the
experimental treatment of interest, while the control group is an equivalent group of subjects that does not receive this
treatment. The simplest type of experiment consists of one experimental group and one control group.
A type-of-variable figure graphically illustrates the number of values that are assumed by predictor and criterion variables.
Illustrating an experiment with a type-of-variable figure:
place the symbol for the dependent variable on the left side of the equals sign (=),
place the symbol for the independent variable on the right side of the equals sign.
The preceding type-of-variable figure could be used to illustrate any experiment in which the dependent variable was a
multi-value variable and the independent variable was a dichotomous variable.
Three Types of SAS Files: one file will contain the SAS program, one will contain the SAS log, and one will contain the SAS
output.
SAS Program: consists of a set of statements written by the user which provide the SAS System with the data to be analyzed,
tell SAS about the nature of the data, and indicate which statistical analyses should be performed on the data. These
statements are usually typed as data lines in a file in the computer's memory.
The DATA step versus the PROC step: there
is another, more fundamental way to divide a SAS program into its constituent components. It is possible to think of each
SAS program as consisting of a DATA step and a PROC step. In the DATA step, programming statements create and/or
modify a SAS data set. The PROC step includes statements that request specific statistical analyses of the data.
After submitting the SAS program: once a program has been submitted for analysis, SAS will create two types
of files reporting the results of the analysis: the SAS log file and the SAS output file.
The SAS log is generated by SAS after you submit your program. It is a summary of notes and messages generated by SAS as
your program executes. These notes and messages will help you verify that your SAS program ran correctly.
Specifically, the SAS log provides
a reprinting of the SAS program that was submitted (minus the data lines)
a listing of notes indicating how many variables and observations are contained in the data set
a listing of any notes, warnings, or error messages generated during the execution of the SAS program
The SAS output file contains the results of the statistical analyses requested in the SAS program. An output file is
sometimes called a listing file, because it contains a listing of the results of the analyses that were requested.
