Christopher P. Halter
JASP Guide
ISBN: 171702601X
ISBN-13: 978-1717026019
CONTENTS
SECTION I: AN INTRODUCTION
Chapter 1 The Guide
Notes about the statistics guide
The Philosophy Behind This Book and the Open Source Community
Notes about the data
First Edition
References
Index
THANK YOU.
Section I: An Introduction
Chapter 1
The Guide
The JASP Guide’s purpose is to assist the novice social science and education
researcher in interpreting statistical output data using the JASP Statistical Analysis
application. Through the examples and guidance, you will be able to select the
statistical test that is appropriate for your data, apply the inferential test to your data,
and interpret a statistical test’s results table.
The Guide covers the uses of some of the most common statistical tests and
discusses some of their limitations, e.g. Chi-square using Contingency
Tables, t-Test, ANOVA, ANCOVA, Correlation, and Regression (Linear and
Binomial). The ANOVA description includes procedures for conducting the One-
Way ANOVA, as well as the General Linear Model (GLM) for other types of
ANOVA analysis.
Exploratory Factor Analysis has been included in this guide as a valuable procedure
for data reduction. Reliability tests will be discussed as a way to verify the
consistency of coding between researchers.
The focus of this guide will be the typical statistical analysis tools that may be useful
for the novice or beginning researcher. Only a subset of the tools currently available
in JASP will be covered. The guide does not include all of the tools and features
available within the JASP application. The ones that were viewed as most common
and readily available to the novice researcher are included.
The JASP application supports both Frequentist and Bayesian procedures. These
procedures are each explored in the specific section for those analysis methods.
The sample window views and output tables shown in this guide were mainly created
from JASP 0.8.6.
So began my search for an alternative that would be useful in learning basic analysis
skills and capable of performing basic statistical analysis tests. This brought me to
JASP. Developed at the University of Amsterdam, this powerful software package is
effective and easy to use. Another key feature of the open source group is that the
software is distributed free of charge.
Data shown in this guide are for demonstration purposes and do not represent actual
research that has been previously conducted. Specific data types will be used to show
the steps and procedures for various analysis methods, such as the t-Test, ANOVA,
and Reliability. These sample data should not be used to make assumptions or claims
about the populations that they represent.
The data codebook and data tables can be found in the appendix. Below is a short
description of each data set that will be used in the examples.
The Sample Student Data (SSD) is based on data from the High School & Beyond
study commissioned by the Center on Education Policy (CEP) and conducted by
researcher Harold Wenglinsky.
The majority of the SSD data points were drawn from the larger HSB data, with 200
of the records being drawn to be used in this guide. The variables of gender, race,
socioeconomic status, school type attended, program type enrolled, as well as
assessment scores for reading, writing, mathematics, science, and social studies have
been drawn from the HSB data.
The SSD data points concerning first generation college attendance and attending
college after high school are fictitious data based on known information about
similar populations and based on the student demographics of the HSB data.
The Test Scores (TestScores) data contains fictitious assessment scores from an
advanced mathematics course. The data points include the student ID, a pre-test
assessment score, and a post-test assessment score.
The Reliability for Agreement (Reliable1) data is an exercise in how well a group of
assessors agree on applying a rubric assessment score. The data points include the
item number and a value for whether or not the 5 assessment scorers applied the
same rubric value.
The Reliability for Accuracy (Reliable2) data is an exercise in how well a group of
assessors applied a rubric score that contained 4 performance levels. The data points
include the item number and the rubric score applied by each of the 5 scorers.
The Favorite Class (FavClass) data file is a survey asking students to rate their classes
on a scale from 1-4, with 1 meaning they hate the class and 4 meaning they love the
class. The classes being rated are Biology, Geology, Chemistry, Algebra, Calculus,
and Statistics.
To download these sample datasets and run your own analysis, please visit
https://tinyurl.com/JASP-sampledata
The guide uses screenshots from the actual JASP application. These screenshots are
to demonstrate the pull-down menus and settings dialogue boxes used by JASP.
Unfortunately, these screenshots tend to be lower quality than other graphics.
The graphics representing the JASP application are presented in the best possible
quality for a computer screenshot.
Chapter 2
Overview of Frequentist Statistical
Analysis in Social Science
One of the assumptions made about quantitative research methods is that they
merely deal with numbers. And let’s face it, to many of us numbers are quite boring.
A well-constructed data table or beautifully drawn graph does not capture the
imagination of most readers. But appropriately used quantitative methods can
uncover subtle differences, point out inconsistencies, or bring up more questions
that beg to be answered.
In short, thoughtful quantitative methods can help guide and shape the qualitative
story. This union of rich narratives and statistical evidence is at the heart of any good
mixed methods study. The researcher uses the data to guide the narrative. Together
these methods can reveal a more complete and complex account of the topic.
Mark Twain (1835-1910) is often credited with describing the three types of lies as
“lies, damn lies, and statistics”. This phrase is still used in association with our view
of statistics. This may be due to the fact that one could manipulate statistical analysis
to give whatever outcome is being sought. Poor statistics have also been used to
support weak or inconsistent claims. This does not mean that statistics is at fault,
but rather the researcher who used statistical methods inappropriately. As researchers
we must take great care in employing proper methods with our data.
Continuous data can be thought of as “scaled data”. The numbers may have been
obtained from some sort of assessment or from some counting activity. A common
example of continuous data is test scores.
All of these examples can be thought of as rational numbers. For those of us who
have not been in an Algebra class for a number of years, rational numbers can be
represented as fractions, which in turn can be represented as decimals. Rational
numbers can still be represented as whole numbers as well.
A subset of this sort of data can be called discrete data. Discrete data is obtained
from counting things. They are represented by whole numbers. Some examples of
discrete data include:
Categorical data is another type of important statistical data, and one that is often
used in social science research. As the name implies, categorical data is comprised of
categories or the names of things that we are interested in studying. In working with
statistical methods we often transform the names of our categories into numbers.
Spanish 2
French 3
Cantonese 4
Primary Home Language Code Book
In the above example the numbers assigned to the categories do not signify any
differences or order in the languages. The numbers used here are “nominal” or used
to represent names.
Another example of categorical data could be the grade level of a high school
student. In this case we may be interested in high school freshmen, sophomores,
juniors, and seniors. Assigning a numerical label to these categories may make our
analysis simpler.
Sophomore 2
Junior 3
Senior 4
High School Grade Levels example
In this example the numbers are again just representing the names of the high
school levels; however, they do have an order. “Freshman” comes before
“Sophomore”, which comes before “Junior”, and then “Senior”. This sort of
categorical data can be described as ordinal, or representing an order.
It is important to recognize the sort of data that is being used in the research analysis
process. A researcher should ask:
Our data can also be classified as either parametric or non-parametric. These terms
refer to the distribution of the data points. Parametric data will have a “normal
distribution” that is generally shaped like the typical bell curve. Non-parametric data
does not have this normal distribution curve.
Depending on the distribution of your data, various statistical analysis techniques are
available to use. Some methods are designed for parametric data while other
methods are better suited for non-parametric data distributions.
In choosing a statistical method we must consider both the character of the data as
well as the distribution of our data. The character, or data type, can be described as
nominal, ordinal, ratio, or interval. The distribution can be described as parametric or
non-parametric. These data features will lead us to selecting the most appropriate
statistical method for our analysis of the data.
Throughout this guide the examples will come from our sample data set that
contains both categorical and continuous data.
Often we take sample data to represent some larger population. We can calculate the
sample mean with certainty but the true mean of the population being represented
cannot be known for certain through the sample data. This is when the confidence
intervals come into consideration.
The confidence interval is a calculated range of values for the true mean. We can
know with a certain amount of “confidence”, typically at the 95% confidence level,
that the true mean will fall within the specified confidence interval, or range.
For example, we may find that the mean of our sample is 52.65 for some measure.
The calculated confidence interval could be from 51.34 to 53.95 for the general
population. Therefore, given that the sample mean is 52.65, we can state with 95%
confidence that the true mean lies somewhere between 51.34 and 53.95 for the
population.
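The interval in the example above can be reproduced with the standard large-sample formula. The sketch below is only an illustration: the sample standard deviation of 9.4 and the sample size of 200 are hypothetical values chosen so the result roughly matches the text, and the normal critical value 1.96 is used in place of the t-distribution value that statistical software would typically apply for small samples.

```python
import math

def confidence_interval_95(sample_mean, sample_sd, n):
    """Approximate 95% confidence interval for the true population mean,
    using the large-sample normal critical value 1.96."""
    margin = 1.96 * sample_sd / math.sqrt(n)
    return (sample_mean - margin, sample_mean + margin)

# Hypothetical values for illustration: mean 52.65, SD 9.4, n = 200
low, high = confidence_interval_95(52.65, 9.4, 200)
print(f"95% CI: {low:.2f} to {high:.2f}")  # roughly 51.35 to 53.95
```

For smaller samples the t critical value would replace 1.96 and widen the interval slightly.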
P-Value
What is a P-value?
In statistical analysis the way we measure the significance of any given test is with the
P-value. This value indicates the probability of obtaining a result at least as extreme
as the one observed purely by chance, assuming the null hypothesis is true. Our
calculated p-values are compared against some predetermined
significance level. The most common significance levels are the 95% Significance
Level, represented by a p-value of 0.05, and the 99% Significance Level, represented
by a p-value of 0.01. A significance level of 95%, or 0.05, indicates that we are
accepting the risk of being wrong 1 out of every 20 times. A significance level of
99%, or 0.01, indicates that we risk being wrong only 1 out of every 100 times.
The most common significance level used in the Social Sciences is 95%, so we are
looking for p-values < 0.05 in our test results.
However, in statistical analysis we are not looking to prove our test hypothesis with
the p-value. We are often trying to reject the Null Hypothesis.
In statistical testing the results always compare two competing hypotheses. The
null hypothesis is often the dull, boring hypothesis stating that there is no association
or relationship between the test populations or conditions. The null hypothesis tells
us that whatever phenomenon we were observing had no or very little impact. On
the other hand, we have the alternative, or researcher’s hypothesis. This is the
hypothesis that we are rooting for, the one that we want to accept in many cases. It is
the result we often want to find since it indicates that there are associations and
relationships between populations or conditions. Then we can take that next step to
explain or examine them more closely.
When we perform a statistical test, the p-value helps determine the significance of
the test and the validity of the claim being made. The claim that is always “on trial”
here is the null hypothesis. When the p-value is found to be statistically significant
(p < 0.05), or that it is highly statistically significant (p < 0.01), then we can conclude
that the relationships or associations found in the observed data are very unlikely to
occur by chance if the null hypothesis is actually true. Therefore, the researcher can
“reject the null hypothesis”. If you reject the null hypothesis, then the alternative
hypothesis must be accepted. And this is often what we want as researchers.
The only question that the p-value addresses is whether or not the experiment or
data provides enough evidence to reasonably reject the null hypothesis. The p-value,
or calculated probability, is the estimated probability of obtaining results at least as
extreme as the observed results when the null hypothesis of the study question is
actually true. In other words, it measures how likely results like ours would be to
arise by chance alone if the null hypothesis held. And all of this is decided based on
our predetermined significance level, in most cases the 95% level or p < 0.05.
Let’s look at an example. Suppose your school purchases a SAT Prep curriculum in
the hopes that this will raise the SAT test scores of your students. Some students are
enrolled in the prep course while others are not enrolled in the prep course. At the
end of the course all your students take the SAT test and their resulting test scores
are compared.
In this example our null hypothesis would be that “the SAT prep curriculum had no
impact on student test scores”. This result would be bad news considering how
much time, effort, and money was invested in the test prep. The alternative
hypothesis is that the prep curriculum did have an impact on the test scores, and
hopefully the impact was to raise those scores. Our predetermined significance level
is 95%. After using a statistical test suppose that we find a p-value of 0.02, which is
indeed less than 0.05. We can reject the null hypothesis. Now that we have rejected
the null hypothesis the only other option is to accept the alternative hypothesis,
specifically that the scores are significantly different.
This result does NOT imply a "meaningful" or "important" difference in the data.
That conclusion is for you to decide when considering the real-world relevance of
your result. So again, statistical analysis is not the end point in research, but a
meaningful beginning point to help the researcher identify important and fruitful
directions suggested by the data.
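The decision rule in the SAT prep example can be sketched in code. This is an illustration only: the score lists are invented, and a normal approximation stands in for the t-test that a package like JASP would actually run.

```python
from statistics import NormalDist, mean, stdev

def two_sample_p_value(a, b):
    """Two-sided p-value for a difference in means, using a normal
    approximation (a real analysis would use the t-distribution)."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical SAT-style scores for the two groups
prep = [1180, 1230, 1150, 1290, 1210, 1260, 1175, 1240]
no_prep = [1100, 1160, 1120, 1190, 1140, 1080, 1170, 1130]

p = two_sample_p_value(prep, no_prep)
if p < 0.05:
    print(f"p = {p:.3f}: reject the null hypothesis")
else:
    print(f"p = {p:.3f}: fail to reject the null hypothesis")
```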
It has been suggested that the idea of “rejecting the null hypothesis” has very little
meaning for social science research. The null hypothesis always states that there are
“no differences” to be found within your data. Can we really find NO
DIFFERENCES in the data? Are the results that we find between two groups ever
going to be identical to one another?
The practical answer to these questions is “No”. There will always be differences
present in our data. What we are really asking is whether or not those differences
have any statistical significance. As we discussed previously, our statistical tests are
aimed at producing the p-value that indicates the likelihood of having the differences
occur purely by chance. And the significance level of p = 0.05 is just an agreed-upon
value among many social scientists as the acceptable level to consider as statistically
significant.
And to find that the differences within the data are statistically significant may just be
a factor of having a large enough sample size to make those differences meaningful.
A small p-value (typically ≤ 0.05) indicates strong evidence against the null
hypothesis, so you reject the null hypothesis. A large p-value (> 0.05) indicates weak
evidence against the null hypothesis, so you fail to reject the null hypothesis. P-values
very close to the cutoff (0.05) are considered to be marginal so you could go either
way. But keep in mind that the choice of significance levels is arbitrary. We have
selected a significance level of 95% because of the conventions used in most Social
Science research. I could have easily selected a significance level of 80%, but then no
one would take my results very seriously.
Relying on the p-value alone can give you a false sense of security. The p-value is
also very sensitive to sample size. If a given sample size yields a p-value that is close
to the significance level, increasing the sample size can often shift the p-value in a
favorable direction, i.e. make the resulting value smaller.
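This sensitivity is easy to demonstrate. In the sketch below (made-up numbers, normal approximation as a stand-in for a formal test), the same group difference that misses the 0.05 cutoff with 8 cases per group clears it comfortably when the identical data pattern is observed four times over.

```python
from statistics import NormalDist, mean, stdev

def approx_p(a, b):
    """Two-sided p-value via a normal approximation (illustration only)."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

group1 = [72, 75, 70, 78, 74, 71, 77, 73]
group2 = [70, 73, 69, 75, 72, 68, 74, 71]

p_small = approx_p(group1, group2)          # modest difference, n = 8
p_large = approx_p(group1 * 4, group2 * 4)  # same pattern repeated, n = 32
print(f"n=8: p = {p_small:.3f}; n=32: p = {p_large:.4f}")
```

The difference between the groups is unchanged; only the sample size moved the p-value across the threshold.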
So how can we use p-values and have a sense of the magnitude of the differences?
This is where Effect Size can help.
Effect Size
Whereas statistical tests of significance tell us the likelihood that experimental results
differ from chance expectations, effect-size measurements tell us the relative
magnitude of those differences found within the data. Effect sizes are especially
important because they allow us to compare the magnitude of results from one
population or sample to the next.
Effect size is not as sensitive to sample size since it relies on standard deviation in
the calculations. Effect size also allows us to move beyond the simple question of
“does this work or not?” or “is there a difference or not?”, but allows us to ask the
question “how well does some intervention work within the given context?”
Let’s take a look at an example that could happen, and has happened, to many of us when
conducting statistical analysis. When we compare two data sets, perhaps we are
looking at SAT assessment scores between a group of students who enrolled in a
SAT prep course and another group of students who did not enroll in the prep
course.
Suppose that the statistical test revealed a p-value of 0.043. We should be quite
pleased since this value would be below our significance level of 0.05 and we could
report a statistical difference exists between the group of test takers enrolled in the
prep course and those who were not enrolled in the course. But what if the
calculated p-value was 0.057? Does this mean that the prep course is any less
effective?
So here is the bottom line. The p-value calculation will help us decide if a difference
or association has some significance that should be explored further. The effect size
will give us a sense of the magnitude of any differences to help us decide if those
differences have any practical meaning and are worth exploring.
So both the p-value and the effect size can be used to assist the researcher in making
meaningful judgments about the differences found within our data.
Once we have calculated the effect size value we must determine if this value
represents a small, medium, or large effect. Jacob Cohen (1988) suggested various
effect size calculations and magnitudes in his text Statistical Power Analysis for the
Behavioral Sciences.
The values in the effect size magnitude chart can be thought of as a range of values
with the numbers in each column representing the midpoint of that particular range.
For example, the effect size chart for Phi suggests a small, medium, and large effect
size for the values of 0.1, 0.3, and 0.5 respectively. We could think of these as ranges
with the small effect for Phi ranging from 0.0 to approximately 0.2, the medium
effect size ranging from approximately 0.2 to 0.4, and the large effect size ranging
from approximately 0.4 and higher.
Measure   Test                                   Small   Medium   Large
r         Correlation                            0.1     0.3      0.5
r²        Correlation and t-Test (Independent)   0.01    0.09     0.25

Values from Cohen (1988) Statistical Power Analysis for the Behavioral Sciences
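For comparing two group means, another widely used measure from the same Cohen (1988) source is Cohen's d, with conventional small/medium/large values of 0.2, 0.5, and 0.8. The sketch below uses hypothetical scores; the cutoffs in the labeling function treat Cohen's values as range midpoints, as described above.

```python
from statistics import mean, stdev

def cohens_d(a, b):
    """Cohen's d for two independent samples (pooled standard deviation)."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * stdev(a) ** 2
                  + (n2 - 1) * stdev(b) ** 2) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5

def magnitude(d):
    """Label d using Cohen's 0.2 / 0.5 / 0.8 values as range midpoints."""
    d = abs(d)
    if d < 0.35:
        return "small"
    if d < 0.65:
        return "medium"
    return "large"

# Hypothetical test scores for two groups
a = [82, 85, 88, 90, 79, 84, 86, 91]
b = [78, 80, 83, 85, 76, 81, 79, 84]
d = cohens_d(a, b)
print(f"d = {d:.2f} ({magnitude(d)})")
```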
The importance of effect size can be best summed up by Gene Glass, as cited in
Kline’s Beyond Significance Testing: Reforming Data Analysis Methods in Behavioral
Research (Washington, DC: American Psychological Association, 2004, p. 95):

“Statistical significance is the least interesting thing about the results. You should
describe the results in terms of measures of magnitude: not just, does a treatment
affect people, but how much does it affect them.”
-Gene V. Glass
Chapter 3
Overview of Bayesian Statistical
Analysis in Social Science
After his death, the writings of Thomas Bayes were discovered by one of Bayes’
mathematician friends, Richard Price. Price published the work and theorems of
Bayes, sparking the birth of Bayesian Statistics.
P(A | B) = P(B | A) × P(A) / P(B)
• In this equation P(A) and P(B) are the probabilities of observing events A
and B, with P(B) ≠ 0.
So, in simpler terms, Bayes claims that if we knew the likelihood of one event
happening and had some speculation or evidence for the likelihood of the other
event happening, then we could predict the likelihood of the unknown event given
what we know about the known event.
Seems pretty clear, right? Let’s take a look at a more concrete example of this.
JASP will conduct calculations with our data and produce two main values of
interest: the posterior distribution and the Bayes Factor. The posterior distribution is
an updated estimate of the effect size distribution range based on the current data.
The Bayes Factor gives an estimate of the likelihood of one hypothesis compared to
the other hypothesis.
P(Package | Doorbell)
Now let’s say that you do not receive packages very often, maybe 2% of the time,
but your doorbell ringing is a little more common with solicitors or the local girl
scouts selling cookies, about 12% of the time. We can also say that the package
delivery service rings the doorbell 95% of the time when they deliver.
P(Package | Doorbell) = (0.95 × 0.02) / 0.12 ≈ 0.158
So, there is a 16% probability, or likelihood, that if your doorbell rings it will be from
a package delivery.
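As a quick check of this arithmetic, Bayes' theorem can be written directly in code:

```python
def bayes(p_b_given_a, p_a, p_b):
    """P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Doorbell example from the text: 95% ring rate on deliveries,
# 2% base rate of packages, 12% base rate of doorbell rings
p = bayes(p_b_given_a=0.95, p_a=0.02, p_b=0.12)
print(f"P(Package | Doorbell) = {p:.3f}")  # about 0.158, i.e. roughly 16%
```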
Let’s suppose that a group of researchers recently announced that they had devised a
test that could detect if a person would become allergic to chocolate cake later in life.
This new test could save countless birthday parties and celebrations.
This new test for the development of chocolate cake allergies has only been around a
short period of time, but it has an impressive 95% accuracy rate of correctly
identifying people who will develop this allergy. It will give a false positive 10% of
the time, identifying those who will not develop the allergy as being at risk.
From previous data released by the research group, we know that 1% of the
population could develop this rare allergy. So we want to know, what is the
likelihood that you will develop the allergy if you get a positive result from this new
test. Is it the case that you have a 95% chance of being at risk? Maybe not.
Fortunately, we can calculate the number of positive tests for anyone given the test
by adding up those who get a positive result and will get the allergy with those who
have a positive test and will not develop the allergy.
• 1% will have the allergy and 95% of them will get a positive test.
• 99% will not get the allergy and 10% of them will get a positive test.
This gives us the probability of a positive test result for anyone as:

P(PosTest) = (0.01 × 0.95) + (0.99 × 0.10) = 0.0095 + 0.099 = 0.1085
Now we can calculate the likelihood that we will develop the allergy given a positive
test result.
P(PosTest | Allergy) = 0.95
P(Allergy) = 0.01
P(PosTest) = 0.1085

P(Allergy | PosTest) = (0.95 × 0.01) / 0.1085 ≈ 0.088, or 8.8%
So in this example, if you test positive for developing the chocolate cake allergy, then
there is actually an 8.8% probability, or likelihood, that you will in fact develop this
rare allergy.
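The same arithmetic, with P(PosTest) built from the law of total probability, can be checked in a few lines:

```python
p_allergy = 0.01             # base rate of the allergy in the population
p_pos_given_allergy = 0.95   # true-positive rate of the test
p_pos_given_healthy = 0.10   # false-positive rate of the test

# Law of total probability: positives among the allergic
# plus positives among the healthy
p_pos = (p_allergy * p_pos_given_allergy
         + (1 - p_allergy) * p_pos_given_healthy)

# Bayes' theorem for the posterior probability
p_allergy_given_pos = p_pos_given_allergy * p_allergy / p_pos
print(f"P(PosTest) = {p_pos:.4f}")
print(f"P(Allergy | PosTest) = {p_allergy_given_pos:.3f}")
```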
If you have ever wondered how search engines can show advertisements related to
your previous searches or suggest websites based on your specific search terms, you
can thank Thomas Bayes. An even greater contribution of Reverend Thomas Bayes
to our everyday lives is providing the world with the algorithms and methods to
detect SPAM before it reaches your inbox!
Prior Distributions
Bayes statistical analysis allows us to use previous, or prior, knowledge about some
event in the calculation. As our knowledge is updated we can update the data in the
statistical model. This prior knowledge is called the “Prior Distribution”.
There are two basic types of priors in Bayesian statistics: non-informative priors and
informative priors. Non-informative priors are used when we do not have much
information about the data. Informative priors can be used as evidence is collected
and we have more expectations about the data.
There are several models that illustrate the possible Prior Distributions, most notably
the Cauchy, Student’s t, and Normal probability distributions.
[Figure: distribution curves. By Skbkekas, CC BY 3.0,
https://commons.wikimedia.org/w/index.php?curid=9649146]
The t-distribution is symmetric and bell-shaped, like the normal distribution, but has
heavier tails, meaning that it is more prone to producing values that fall far from its
mean. This makes it useful for understanding the statistical behavior of certain types
of ratios of random quantities, in which variation in the denominator is amplified
and may produce outlying values when the denominator of the ratio falls close to
zero.
[Figure: distribution curves. By Skbkekas, CC BY 3.0,
https://commons.wikimedia.org/w/index.php?curid=9546828]
We are expecting the effect size within our data to be outside the interval -0.707 to 0.707,
with the range centered at zero.
As more data is collected and additional information is gained, the researcher can
begin to move beyond the non-informative priors and select informative priors that
specify some known effect size ranges that may be centered around values other than
zero.
H0 ≠ H1
On the other hand, we can also test for directional differences with the null
hypothesis (H0) having greater differences or less differences than the Alternative
hypothesis (H1).
H0 > H1 or H0 < H1
When reporting the Bayes Factor, we can report either the likelihood of the Null
hypothesis when compared to the Alternative hypothesis (BF01) or the likelihood of
the Alternative hypothesis when compared to the Null hypothesis (BF10).
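The two reporting directions are simple reciprocals, so either can be recovered from the other; the value below is a hypothetical Bayes factor for illustration.

```python
# Hypothetical Bayes factor: the data are 4 times more likely under H1
bf10 = 4.0
bf01 = 1 / bf10  # equivalently, evidence for H0 relative to H1
print(f"BF10 = {bf10}, BF01 = {bf01}")
```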
Several authors (Jeffreys, 1961; Raftery, 1995) have published guidelines for
reporting Bayes Factors and the terms to use when describing the strength of
evidence for a hypothesis.
Another scale provided by Lee and Wagenmakers (2013), shown in the following
table, was adjusted from Jeffreys (1961).
We can now take p-values, effect size, and Bayes factors into account when
determining the strength of our evidence for either the Alternative Hypothesis (H1)
or the Null Hypothesis (H0).
P-value         Interpretation
< 0.001         Decisive evidence against H0
0.001 – 0.01    Substantive evidence against H0
0.01 – 0.05     Positive evidence against H0
> 0.05          No evidence against H0
Evidence Categories for p-values (adapted from Wasserman, 2004, p. 157), for Effect Sizes (as
proposed by Cohen, 1988), and for Bayes Factor BF10 (Jeffreys, 1961).
Statistical Assumptions
The mathematics behind both Bayesian and Frequentist statistical analysis are based
on certain assumptions about the data. These assumptions allow the results to
“work” across a wide array of scenarios, while still performing reliably and providing
results that accurately describe the statistical model.
Two of the most common assumptions are concerned with the Normality of the
data and its Homogeneity.
Real world data is often quite messy and rarely conforms to this perfectly shaped
distribution. The goal of normality is not to have data that mirrors the bell curve but
to have our data approximate this distribution curve as closely as possible.
Homogeneity assumes that the data of our sample populations have equal variation,
or equal enough that when the expected and observed variances are graphed on a
scatterplot they would form something that resembles a line.
We should keep in mind that both Normality and Homogeneity are assumptions
in the statistical models, but these are not requirements. In the end it is up to the
researcher to use good judgment when considering the statistical assumptions. A
novice researcher should adhere to these basic assumptions as closely as possible.
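As a rough illustration of checking the homogeneity assumption, one common rule of thumb compares the largest group variance to the smallest. The cutoff of about 4 is a heuristic assumed for this sketch, not a JASP procedure; JASP offers formal checks such as Levene's test for this purpose.

```python
from statistics import stdev

def variance_ratio(groups):
    """Ratio of the largest to the smallest group variance; a ratio
    under about 4 is often treated as 'equal enough' (heuristic only)."""
    variances = [stdev(g) ** 2 for g in groups]
    return max(variances) / min(variances)

# Hypothetical scores for two groups
g1 = [52, 55, 49, 60, 54, 51, 58, 53]
g2 = [48, 57, 50, 62, 45, 59, 52, 55]
ratio = variance_ratio([g1, g2])
print(f"variance ratio = {ratio:.2f}")  # well under 4, so 'equal enough'
```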
Chapter 4
Getting Started with JASP
We will focus on using continuous and categorical data sets with the JASP statistical
analysis application.
Continuous Data
• How large is your data set?
• Will all the data be manually entered into the spreadsheet?
• How many decimal places are required for your data?
• How will you “name” the data for easy reference?
• Are there any outliers in the data?
• How will you handle outliers?
Categorical Data
• What are the value names for each data item?
• How will you represent each value name with an integer value?
• Is your data nominal or ordinal? How will this guide the decision for
selecting values?
In the case of our “Sample Student Data” (SSD.sav) sample dataset the codebook is
represented in the table below.
Some of the data in our sample set are nominal in nature. There is no order to
the labels, and an order should not be implied from the values. For example, the
values for socioeconomic status have been listed “low, middle, and high” with values
of “1, 2, and 3” assigned respectively. This does not imply that “low” is first,
“middle” is second, and “high” is third. These labels could have been placed in any
order and assigned any value.
On the other hand if our sample data had involved grade level or degrees attained,
then we might be able to assign values based on an order, so this would represent
ordinal data.
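A codebook for such nominal values is, in effect, a lookup table. A minimal sketch follows; the SES labels mirror the example above, while the column of codes is invented for illustration.

```python
# Codebook mapping stored integer codes to their nominal labels
ses_codebook = {1: "low", 2: "middle", 3: "high"}

# Data is entered as codes; labels are recovered for reporting
ses_column = [2, 1, 3, 3, 2, 1]
labels = [ses_codebook[code] for code in ses_column]
print(labels)  # ['middle', 'low', 'high', 'high', 'middle', 'low']
```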
Sample CODEBOOK for Schooling Data:
Data in JASP
JASP is able to handle and distinguish between four variable types, or data types:
Nominal Text variables are typically used to identify aspects of the data that will not
be part of the statistical analysis. This data merely contains descriptions, key words,
or some other type of text information.
Nominal variables are categorical variables that are represented by numeric values.
For example, a variable “Gender” may have levels “0” and “1” representing males
and females respectively, as we find in the SSD example data used in this guide. Even
though these are numbers, they do not imply an order, and the distance between
them is not meaningful.
Ordinal variables are categorical variables with an inherent order. The ordinal
variables could represent some sort of order such as a grade level with 1 =
Freshman, 2 = Sophomore, 3 = Junior, and 4 = Senior. Note that the distance
between the numbers is not meaningful. JASP assumes that all ordinal variables have
been assigned numeric values.
Continuous variables are variables with values that allow a meaningful comparison of
distance. Examples include money, distance, and test scores. Often we make the
assumption that rubric scores are continuous variables since the rubric levels of 1, 2,
3, and 4 should represent some meaningful difference between them.
Variable types in JASP are often enforced. This means that JASP will not allow
performing a categorical variable analysis using continuous data or performing
continuous variable calculations with categorical (nominal and ordinal) data.
The JASP statistical application is able to natively use “.sav” file formats. This means
that files created within SPSS, or any other compatible application can be easily
opened with JASP without having to convert the file.
Click on the “File” icon from the top left corner of the screen, then use the dialogue
windows to select the .sav data file from your computer.
The file will open with the variable names and data.
JASP can also open files in other formats such as .csv (comma-separated values), .txt
(plain text), .sav (IBM’s SPSS), and .ods (OpenDocument Spreadsheet).
Spreadsheets in JASP
Setting Up the Spreadsheet
Getting the spreadsheet set up and the data entered is a simple process. There are a
few key points to keep in mind:
• In the spreadsheet, the columns contain each variable or data type and the
rows represent each case in the study. This is similar to the way JASP
displays the data.
• The first row of the spreadsheet should contain the variable names, with row 2
containing the first case’s data. Variable names should be short, but meaningful to
you.
• Categorical data must be entered as its numerical value and not the name.
The codebook you created will come in handy for this process.
• Enter all the data.
OpenOffice Spreadsheet with data labels shown as numerical values instead of names
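The label-to-code conversion described in the bullet points can be sketched in Python with pandas, using a hypothetical codebook entry (substitute the values from your own codebook):

```python
# Convert label text to the numeric codes from the codebook
# (hypothetical labels and codes shown here).
import pandas as pd

codebook = {"ses": {"low": 1, "middle": 2, "high": 3}}

df = pd.DataFrame({"ses": ["low", "high", "middle", "low"]})
df["ses"] = df["ses"].map(codebook["ses"])  # names -> numeric codes
print(df["ses"].tolist())  # [1, 3, 2, 1]
```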
To open the .ods file in JASP, click on the File tab at the top of the screen. Using the
dialogue box, navigate to the .ods file to select and open the file. JASP will use the
entered data to determine the type of data in each column, i.e. Nominal, Ordinal, or
Continuous. The data type will be represented by an icon next to the column header.
The ruler icon represents Continuous data, the bar graph icon represents Categorical
Ordinal data, and the Venn Diagram icon represents Categorical Nominal data.
If JASP assigned an incorrect data type to your column, this can be corrected by
clicking on the icon to activate a pull down menu. The data type can be changed by
selecting the desired type.
If the data type is Categorical, you can assign labels to the values by clicking on the
column header. This will show the Label window. When the .ods file is opened by
JASP, the values and the labels will initially be the same.
To change the label, click in the Label field and enter the desired text.
Chapter 5
Hypothesis Building
Hypothesis Setting
JASP allows us to set the hypothesis type. There are three types that can be selected:
simple differences, positive differences, and negative differences.
Recall that in Frequentist statistics we often use the 95% confidence level, giving us
a critical value of p = 0.05.
Two-Tailed Distribution
In the figure above, we see the tails represented by the shaded region under the
curve. Both these regions would be outside the confidence level. The analysis would
be interested in differences that fell within either shaded region under the curve.
With a positive-difference hypothesis, we are interested in only one of the critical
sections under the curve, namely the shaded region to the right of the median.
In the figure above, our one-tailed test is represented by the shaded region to the
right of the upper critical value. The analysis would only be interested in differences
that fell in this region.
With a negative-difference hypothesis, we are interested in only one of the sections
under the curve, namely the shaded region to the left of the median.
In the figure above, our one-tailed test is represented by the shaded region to the left
of the lower critical value. The analysis would only be interested in differences that
fell in this region.
The symbols in each dialogue box indicate the data types that may be entered for
analysis.
One way to think of the rows and columns is for the rows to represent an
independent variable, or factor, and for the columns to represent a dependent
variable, or factor. In this sense our hypothesis would be that the row factor has
some impact on differences in the column factor.
In the example above, the row contains data about a student’s SES grouping while
the column contains data about the school program enrolled in, such as vocational
education, general education, or college prep.
The alternative hypothesis for this example could be stated as our belief that a
student’s SES grouping has some impact on the school program choices of students.
This model would explore differences in program type enrollment based on the
student SES group.
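The row-by-column hypothesis above can be sketched outside JASP with a pandas contingency table and a chi-square test. The enrollment data below is made up purely for illustration; only the SES and program labels follow the example.

```python
# Contingency table like the SES x program setup, with a chi-square test
# (made-up enrollments for illustration).
import pandas as pd
from scipy.stats import chi2_contingency

df = pd.DataFrame({
    "ses":     ["low", "low", "middle", "middle", "high", "high"] * 10,
    "program": ["vocational", "general", "general", "college prep",
                "college prep", "college prep"] * 10,
})

table = pd.crosstab(df["ses"], df["program"])   # rows: IV (SES), columns: DV (program)
chi2, p, dof, expected = chi2_contingency(table)
print(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

A small p-value here would support the alternative hypothesis that program enrollment differs across SES groups.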
Relationship Hypothesis
Relationships, or differences within the data, are typically investigated with either a t-
Test or Analysis of Variance (ANOVA). With both of these methods we will build a
hypothesis about differences within some continuous data, such as an assessment
measure or test score, based on a grouping factor, such as school type or SES group.
The t-Test hypothesis window allows us to enter the dependent variable(s) and some
grouping factor. The dependent variable is the continuous measure, as indicated by
the ruler icon in the window. Any number of dependent variables can be entered
into this window, creating multiple analysis results. The grouping variable will be the
categorical data that we believe has an impact on the differences within the
dependent variable.
In this example the alternative hypothesis can be stated as our belief that a student’s
science test scores will have differences that are based on the school type they
attended, either public or private school.
The ANOVA analysis is similar to the t-Test, in that there is a continuous dependent
variable, but with differences measured across three or more groups of an
independent categorical factor, referred to as a Fixed Factor.
When multiple Fixed Factors are entered we can test for interaction effects between
these factors.
In this example the alternative hypothesis can be stated as our belief that student
reading scores will have differences that are based on their program type enrollment.
Association Hypothesis
Association testing is concerned with the predictive nature of one variable with
respect to another variable. Regression analysis is typically used for this sort of
question.
The regression hypothesis building window has Dependent Variables and Covariates.
The dependent variable is some measure that we believe can be predicted based on
the covariate. Both of these variables are continuous data.
In this example the alternative hypothesis can be stated as our belief that a student’s
reading test scores can be predicted from their math test scores.
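As a rough sketch of this association hypothesis, a simple linear regression can be run in Python with SciPy on made-up scores. The built-in slope is an assumption of the simulation, not a result from the example data.

```python
# Sketch of the regression hypothesis (reading predicted from math)
# using scipy.stats.linregress on simulated scores.
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(1)
math_scores = rng.normal(52, 9, size=200)
read_scores = 0.6 * math_scores + rng.normal(20, 5, size=200)  # built-in linear relationship

result = linregress(math_scores, read_scores)
print(f"slope = {result.slope:.2f}, r = {result.rvalue:.2f}, p = {result.pvalue:.4f}")
```

A significant, non-zero slope is what would support the alternative hypothesis that reading scores can be predicted from math scores.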
Chapter 6
Descriptive Statistics
Categorical data is best described by exploring the frequencies within the data. The
frequencies will display the percentages of each category within the data set. The
following examples will use the Sample Student Data (SSD) dataset; see Resources
for the Codebook and data values.
When considering Categorical data, it is often helpful to investigate that data from a
descriptive standpoint. Here we are interested in frequencies, counts, comparisons
between categories, etc.
The Descriptive Statistics’ dialogue window, on the left side of the JASP window,
allows us to select the measures and visualizations that are needed to get a clear sense
of our data.
As these measures are selected, the results will appear in the JASP window on the
right side.
Begin by moving a Categorical item into the Variables window. This can be
accomplished by either clicking and dragging the item or by selecting the item and
clicking the arrow between the two boxes.
Once the Categorical data item is moved into the Variables box, the Descriptives
results table will update with the new information. The next step will be considering
which measures make sense for the data you are using.
Checking “Frequencies Tables” below the Variables window will create a table in the
Results window that displays the frequencies of each item in the Category. In our
example we can see the counts and frequencies for low, middle, and high SES groups
contained within the data.
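The frequency table JASP produces can be sketched in Python with pandas. The SES codes below are hypothetical stand-ins (1 = low, 2 = middle, 3 = high), not the actual SSD values.

```python
# Frequencies for a categorical variable via pandas value_counts
# (made-up SES codes: 1=low, 2=middle, 3=high).
import pandas as pd

ses = pd.Series([1, 2, 2, 3, 2, 1, 3, 2, 2, 1])
counts = ses.value_counts().sort_index()
percent = (counts / len(ses) * 100).round(1)

freq_table = pd.DataFrame({"Frequency": counts, "Percent": percent})
print(freq_table)
```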
Now we can move on to the Plots settings to decide which graphical visualizations
make sense for our data. In this case we will only be selecting “Distribution Plots”
for our Categorical data, since neither Correlation nor Box plots would provide us
with useful information about the SES category.
Next we can select settings from the “Statistics” section. In the case of our
Categorical data, the only useful setting is “Mode”. As we deselect settings that are
not needed and select other settings to view, our Descriptive Statistics table in the
right side window will automatically update.
After all the Descriptive settings have been selected, click the “OK” button. If you
need to go back and make changes to any of the previously selected settings, simply
click in the output section of the Results window and the settings will be displayed in
the left side window.
Within the JASP Results window, we can include our own thoughts, notes, ideas, and
points of interest about the data. For each output section of the Results window you
will find a small pull-down arrow when the pointer hovers over the output.
Clicking the arrow will bring up the dialogue menu to “Add Note”.
The researcher’s note can then be entered into a new section above the output table.
One of the powerful features of JASP is that the output tables and graphics can
easily be imported into a word processing document. The JASP Results section also
displays the tables in APA format.
Using the same pull-down arrow, you can select “Copy” to place the table on your
clipboard, allowing it to be pasted into a different application or document.
When the pull-down for graphs is selected, we are presented with the “Copy” option
as well as a “Save Image As” option, allowing us to export the graph as a .png image.
JASP has the capability to create graphs that are “Split” by some other variable in
your data. For example, we could examine the SES groups and how they are split, or
represented, within the different school types (private or public school).
To create these split graphs, move the main variable into the Variable box and move
the organizing variable into the “Split” box. Be sure to select “Distribution Plots” as
well.
The JASP Results window will produce two separate graphs of this data.
By exploring the descriptive statistics for our data, questions may begin to emerge
and suggest directions for further investigation.
The Descriptive Statistics’ dialogue window, on the left side of the JASP window,
allows us to select the measures and visualizations that are needed to get a clear sense
of our data.
As these measures are selected, the output will appear in the JASP Results window
on the right side.
Begin by moving a Continuous item into the Variables window. This can be
accomplished by either clicking and dragging the item or by selecting the item and
clicking the arrow between the two boxes.
Once the Continuous data item is moved into the Variables box, the Descriptives
results table will update with the new information. The next step will be considering
which measures make sense for the data you are using.
We can look at the Plots settings to decide which graphical visualizations make sense
for our data. In this case we will be selecting “Distribution Plots” for our
Continuous data as well as the Box plots to provide us with useful information.
The Distribution plot will display a histogram of the data with a curve superimposed
over the graph. This curve will help us determine if the data has a “somewhat”
normal distribution. Box plots can also help us visualize how the data is spread out
across the quartiles. Each section of the Box plot contains 25%, or one-quarter, of
the data. In our example data set of 200 students, each section contains 50 student
scores.
In the case of our Writing assessment scores, we can notice from the Box plot that
the lowest quartile of scores is more spread out than the upper quartile of scores.
This could be something to consider as we begin analyzing the data.
Next we can select settings from the “Statistics” section. In the case of our
Continuous data, most of these settings will provide us with meaningful information
as we examine the measures and begin to make sense of the data. As we select
settings that are needed, the Descriptive Statistics table on the right side window will
automatically update.
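For comparison, the same descriptive measures can be sketched in Python with pandas. The simulated scores below merely stand in for the writing assessment; they are not the SSD data.

```python
# Descriptive statistics for a continuous measure (simulated scores).
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
write = pd.Series(rng.normal(53, 9, size=200))

print(write.describe())                  # N, mean, sd, quartiles, min/max
print(f"skewness = {write.skew():.3f}")  # sign shows the direction of the tail
```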
After all the Descriptive settings have been selected, click the “OK” button. If you
need to go back and make changes to any of the previously selected settings, simply
click in the output section of the Results window and the settings will be displayed in
the left side window.
A positive skewness value indicates data weighted more heavily to the right of the
mean, and a negative skewness value indicates data weighted to the left of the mean.
As we saw in the previous examples, JASP has the capability to create graphs that are
“Split” by some other variable in your data. For example, in this case we could
examine the Writing assessment scores and how they are split, or represented, within
gender to observe any differences between the scores of girls and boys.
To create these split graphs, move the main variable into the Variable box and move
the organizing variable into the “Split” box. Be sure to select “Correlation Plots” as
well as Box plots.
In the case of Continuous data that has been split by some selected category, here
the Writing scores split across gender, the Box plot may contain some useful clues
about the data.
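The split comparison can be sketched with a pandas groupby on simulated scores. The group means, spreads, and the “male”/“female” labels here are assumptions of the simulation, chosen only to illustrate the kind of pattern described below.

```python
# Comparing a continuous measure split by a category, via pandas groupby
# (simulated writing scores by gender; all values are made up).
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
df = pd.DataFrame({
    "gender": ["male"] * 100 + ["female"] * 100,
    "write":  np.concatenate([rng.normal(50, 10, 100),   # boys: wider spread
                              rng.normal(55, 8, 100)]),  # girls: higher median
})

summary = df.groupby("gender")["write"].describe()
print(summary[["count", "mean", "50%", "std"]])
```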
When exploring continuous data “split” by some categorical data, the correlation
plot may be a useful visualization. In the Plots section, check the Correlation Plot
box.
The Correlation Plot will produce a density graph for each of the categories in the
“Split” variable.
By exploring the descriptive statistics for our data, questions may begin to emerge
and suggest directions for further investigation. In the case of our writing scores we
may notice that the median score for girls is higher than the boys’ median score. We
can also see that the writing assessment for boys seems to have a greater spread and
variability.
Once the data’s descriptive statistics have been explored, patterns may have emerged
and questions may have arisen based on the data. We are now ready to begin
inferential statistical analysis.
Section III: Frequentist Approaches
Chapter 8
Relationship Analysis with t-Test
The following examples will use the Sample Student Data (SSD) dataset; see
Resources for the Codebook and data values.
The question we are asking of this data is whether or not there are differences in the
students’ science scores compared to a “hypothetical” National average science
score.
Using the t-Test tab, select One Sample t-Test. With One Sample t-Test we will
investigate the differences between one group’s performance on a measure
compared to some known average on the SAME measure.
When using the One Sample t-Test dialogue window, select the test variable. Be sure
to move the measure to be compared into the test variable window. Move the
variable by either highlighting the variable and clicking on the arrow next to the
variable window or by dragging the variable into the window.
In this example we are testing the science assessment compared to the “National
average” scores for the science assessment. This comparison is a hypothetical
example.
We will also enter the known average for this measure. The known average would
come from a source outside of your own data collection, such as a norm referenced
test or some national assessment measure that has a published average. In this
example we are using a score of “50” to represent the known average for this
assessment.
In the Tests settings, we have checked both Student’s t-Test and the Wilcoxon
signed-rank test. The Student’s t-Test should be used if the data meets our
assumption of “Normality”, while the Wilcoxon signed-rank test is designed to
be used with data that does not meet this “Normality” assumption.
For this example we are testing if the sample population science assessment score is
“different” than the known average. This can be found in the Hypothesis setting
with “≠ Test Value”. You could also select one of the directional hypotheses if there
was a reason to test for directional differences.
The Assumption Checks setting should be used to verify normality within our data.
As stated above, if the data meets the normality assumption, we can use the
Student’s t-Test. If the data does not meet the normality assumption, we can use the
Wilcoxon signed-rank test.
We will also select the effect size and descriptives tables for the Results output.
With many of the Frequentist analysis methods, there is an “assumption” that the
data exhibits a normal distribution. There are two ways to check for normality within
your data: mathematically or graphically.
JASP includes the mathematical normality check as an option in the analysis settings,
as seen in the figure above. If the data does not “pass” this mathematical normality
check, then the results table will contain a warning. However, if the sample size is
very small the check has little power to detect deviations, while with a very large
sample even trivial deviations can appear statistically significant.
The graphical normality check involves the researcher inspecting the distribution plot
of the data and determining if it appears to have normal distribution characteristics.
The Assumptions Checks in the JASP settings will give us a mathematical measure of
normality.
Assumption Checks
Test of Normality (Shapiro-Wilk)
W p
science 0.99 0.03
The results indicate that there is a statistically significant deviation from the normal
distribution within our data (p = 0.03). We should note that the actual p-value is
fairly close to the critical value of p = 0.05. This result is consistent with the
distribution graph of our data.
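The Shapiro-Wilk check JASP runs can be sketched with SciPy on simulated data: one sample drawn from a normal distribution and one deliberately skewed. The data here is made up; only the test itself matches what JASP reports.

```python
# Shapiro-Wilk normality check, like JASP's Assumption Check (simulated data).
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(0)
normal_scores = rng.normal(50, 10, size=200)   # should pass the check
skewed_scores = rng.exponential(10, size=200)  # should clearly fail it

w_norm, p_norm = shapiro(normal_scores)
w_skew, p_skew = shapiro(skewed_scores)
print(f"normal data: W = {w_norm:.3f}, p = {p_norm:.3f}")
print(f"skewed data: W = {w_skew:.3f}, p = {p_skew:.3f}")
```

W close to 1 with a non-significant p indicates approximate normality; a significant p flags a deviation, just as in the table above.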
The resulting output table will show the significance level (p-value), along with the
mean difference and the confidence interval for the mean difference. In this case we
find that the p-value for our measure when compared to the known mean is
statistically significant.
Note. For the Student t-test, effect size is given by Cohen's d ; for the Wilcoxon test, effect size is
given by the matched rank biserial correlation.
Descriptives
N Mean SD SE
science 200 51.85 9.90 0.70
In this analysis we will use the Wilcoxon test since the data does deviate from the
normality assumption. The One Sample T-Test results table indicates that the mean
score in our sample population is statistically different from the known average of
“50” (p = 0.01). We also find that the sample has a very small effect size (d = 0.02).
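The Student’s t-test result can be checked by hand from the Descriptives table, since the one-sample t statistic depends only on N, the mean, the SD, and the test value. This is a sketch from the rounded table values; JASP computes from the raw data, and note that for the Wilcoxon variant JASP reports the matched rank-biserial correlation rather than Cohen's d.

```python
# One-sample t statistic recomputed from the Descriptives table
# (N = 200, mean = 51.85, SD = 9.90, test value = 50).
import math
from scipy import stats

n, mean, sd, test_value = 200, 51.85, 9.90, 50.0
se = sd / math.sqrt(n)                    # standard error, ~0.70
t = (mean - test_value) / se              # t statistic
p = 2 * stats.t.sf(abs(t), df=n - 1)      # two-tailed p-value
d = (mean - test_value) / sd              # Cohen's d for the Student variant

print(f"t({n - 1}) = {t:.2f}, p = {p:.3f}, d = {d:.2f}")
```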
The question we are asking of this data is whether or not there are differences in
student science scores based on their gender.
Using the t-Test tab, select “Independent Samples T-Test” from the pull-down
menu. With Independent Samples we will investigate the differences between TWO
groups on the SAME measure.
When using the Independent Samples t-Test select the test variable for the
dependent variable window and the groups or factor for the grouping variable
window. Be sure that the grouping variable only contains two grouping factors.
In this example we will be looking for differences in the science assessment based on
a student’s gender (boys and girls).
If you select a grouping variable that contains more than two groups, the JASP
results window will give an error message.
We will also use the menu below the variable window to select the settings needed
for this analysis. In this example we are using the Student t-Test.
The hypothesis that we will use in this example is that the science assessment scores
for boys and girls are different. Our hypothesis does not indicate a direction for the
differences or assume which group might outperform the other.
We will also check the settings to calculate the effect size and to produce a
descriptives table for the science assessment.
The Student’s t-Test has two assumptions: 1) the data has a generally normal
distribution, and 2) the two groups have approximately equal variances. The
Assumption Checks section of the settings allows us to verify these assumptions.
Recall that the mathematical assumption check for normality can be influenced by
sample size, so a researcher should check the normality of the data both
mathematically and graphically.
But what if our data violates these assumption checks and either does not have a
normal distribution or exhibits too much variation within the data?
t-Test Selection
Luckily for us, the JASP Independent Samples t-Test includes two variations of the
test that relax these assumptions: the Welch Test, which does not assume equal
variances, and the Mann-Whitney Test, which does not assume a normal
distribution.
Reviewing the Results window should begin with the Assumption Checks. Recall
that the Student’s t-Test assumes that your data generally has a normal distribution
and that the variances of the two groups do not differ significantly.
Assumption Checks
Test of Normality (Shapiro-Wilk)
W p
science male 0.97 0.06
female 0.99 0.27
The Test of Normality is not statistically significant for either the boys’ science
scores (p = 0.06) or the girls’ science scores (p = 0.27). This result indicates that the
data does approximate the normal distribution.
The question we are asking of this data is whether there are differences in a
student’s math score based on whether or not the student would be a first-generation
college student, i.e. whether or not their parents attended college.
In this scenario we will be looking for differences in the math assessment scores
between students whose parents attended college and those whose parents did not
attend college. The same settings have been selected in the settings window as shown
in the previous example.
Again, we will begin with the assumption checks for the data.
The mathematical test of normality indicates that some of the data deviates from the
expected normal distribution. The p-value for the first-generation college student
group is less than 0.001, making the deviation statistically significant.
We also check Levene’s Test of Equality of Variances. The results show that the
differences in the variances are not statistically significant (p = 0.50), therefore we
can assume equal variances.
Given that the normality assumption needed for the Student’s t-Test was violated,
we will use one of the alternative tests; here we report Welch’s t-Test.
Using Welch’s t-test we can see that the differences in math assessment scores are
significantly different between the groups (p < 0.001) with a large effect size (d =
1.33).
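Welch's result can likewise be recomputed from the Group Descriptives table alone, using SciPy's summary-statistics helper. This is a sketch from the rounded table values, so the last digit may differ slightly from JASP's output.

```python
# Welch's t-test recomputed from the Group Descriptives
# (parents no college: N=94, M=47.14, SD=7.51;
#  parents have college: N=106, M=57.53, SD=8.07).
import math
from scipy.stats import ttest_ind_from_stats

t, p = ttest_ind_from_stats(
    mean1=47.14, std1=7.51, nobs1=94,
    mean2=57.53, std2=8.07, nobs2=106,
    equal_var=False,                # False selects Welch's t-test
)

# Cohen's d from the pooled standard deviation.
sd_pooled = math.sqrt(((94 - 1) * 7.51**2 + (106 - 1) * 8.07**2) / (94 + 106 - 2))
d = (57.53 - 47.14) / sd_pooled

print(f"Welch t = {t:.2f}, p = {p:.2e}, d = {d:.2f}")
```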
The descriptives for this sample can give us more information about the differences.
Group Descriptives
Group N Mean SD SE
math Parents no college 94 47.14 7.51 0.77
Parents have college 106 57.53 8.07 0.78
The mean math assessment score for the students whose parents went to college
was over 10 points greater than the mean score for the other group, whose parents
did not attend college.
The question that we are asking of this data is whether or not there are differences
in student achievement from their pre-test assessment to their post-test assessment.
Using the t-Test tab, select Paired Samples t-Test. With Paired (Dependent) Samples
we will investigate the differences within ONE group on TWO different measures,
such as two assessments or a pre-/post-assessment.
When using the Paired Samples t-Test, in the dialogue box select the TWO variables.
Use the arrow to move each variable into “Test Pair(s) window” or drag the desired
variables into the window.
The test settings and requirements are similar to the other t-Tests, such as the
Independent Samples t-Test and the One Sample t-Test.
Here we have selected both the Student’s t-Test, used for data that conforms to our
assumption rules of normality, as well as the Wilcoxon signed-rank test, used for
data that does not exhibit a normal distribution.
The results indicate that the Test of Normality is not significant (p = 0.72), therefore
we can assume that the data has generally normal distribution.
The Paired Samples t-Test table indicates that the differences in pre- and post-test
are statistically significant (p < 0.001) with a large effect size (d = -0.72).
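A paired comparison of this kind can be sketched with SciPy on simulated pre/post scores. The 2-point average gain built into the simulation is an assumption for illustration, not the book's data.

```python
# Paired (pre/post) comparison on simulated scores for 20 students.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(5)
pre = rng.normal(18.4, 3.0, size=20)
post = pre + rng.normal(2.0, 1.5, size=20)  # built-in average gain of 2 points

t, p = ttest_rel(pre, post)
print(f"paired t = {t:.2f}, p = {p:.4f}")
```

Because each student's pre and post scores are linked, the test works on the per-student differences rather than treating the two sets of scores as independent groups.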
Section IV: Bayesian Approaches
Chapter 18
Relationship Analysis with Bayesian t-Test
The following examples will use the Sample Student Data (SSD) dataset; see
Resources for the Codebook and data values.
The question we are asking of this data is whether or not there are differences in the
students’ science scores compared to a “hypothetical” National average science
score.
Using the t-Test tab, select Bayesian One Sample t-Test. With One Sample t-Test we
will investigate the differences between one group’s performance on a measure
compared to some known average on the SAME measure.
When using the One Sample t-Test dialogue window, select the test variable. Be sure
to move the measure to be compared into the test variable window. Move the
variable by either highlighting the variable and clicking on the arrow next to the
variable window or by dragging the variable into the window.
In this example we are testing the science assessment compared to the “National
average” scores for the science assessment. This comparison is a hypothetical
example.
We will also enter the known average for this measure. The known average would
come from a source outside of your own data collection, such as a norm referenced
test or some national assessment measure that has a published average for the entire
test group. In this example we are using a score of “50” to represent the known
average for this assessment.
For this example, we are testing if the sample population science assessment score is
“different” than the known average. This can be found in the Hypothesis setting
with “≠ Test Value”. You could also select one of the directional hypotheses if there
was a reason to test for directional differences.
The Bayes Factor BF10 is selected to express how likely the data are under the
Alternative Hypothesis (H1) compared to the Null Hypothesis (H0). We could have
also selected the inverse, BF01, expressing how likely the data are under the Null
Hypothesis (H0) compared to the Alternative Hypothesis (H1).
In some cases the Bayes Factor will be a small decimal value, which may be more
difficult to interpret. This can be remedied by selecting the other Bayes Factor
argument; the resulting value may be more meaningful or clearly interpreted.
In the Plots settings we are interested in the Prior and Posterior plots with additional
information as well as the Bayes factor robustness check with additional information.
Prior setting
JASP sets the default prior width to 0.707. Unless there is some other evidence to
suggest an alternative or updated prior width, the default should be used for the
analysis.
It should be noted that under the effect size conventions of Cohen (1988), 0.8 is
considered a large effect size for a t-Test.
Descriptives
N Mean SD SE
science 200 51.85 9.90 0.70
The Bayesian One Sample T-Test results table indicates that the mean score in our
sample population differs from the known average. The Alternative Hypothesis that
the scores will be different is 2.33 times more likely than the Null Hypothesis of no
difference in scores, as described by the Bayes Factor.
The Bayes Factor (BF10) = 2.33 can be interpreted as “anecdotal evidence” in favor
of the alternative hypothesis or as some authors state, “barely worth mentioning”.
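JASP's default Bayesian t-test is based on the JZS Bayes factor of Rouder et al. (2009): a Cauchy prior on effect size with scale 0.707. A sketch of the one-sample integral in Python follows, using the t statistic implied by the Descriptives table; JASP computes from the raw data, so its reported 2.33 may differ slightly from this approximation.

```python
# One-sample JZS Bayes factor, following Rouder et al. (2009).
import math
from scipy import integrate

def jzs_bf10(t, n, r=0.707):
    """BF10 for a one-sample t statistic t with sample size n and Cauchy scale r."""
    v = n - 1                                        # degrees of freedom
    # Marginal likelihood under H0 (effect size fixed at zero).
    h0 = (1 + t**2 / v) ** (-(v + 1) / 2)
    # Under H1, average the likelihood over the Cauchy(0, r) prior on effect
    # size, written as a normal mixture with inverse-gamma variance g.
    def integrand(g):
        return ((1 + n * g) ** -0.5
                * (1 + t**2 / ((1 + n * g) * v)) ** (-(v + 1) / 2)
                * r / math.sqrt(2 * math.pi) * g ** -1.5
                * math.exp(-r**2 / (2 * g)))
    h1, _ = integrate.quad(integrand, 0, math.inf)
    return h1 / h0

# t implied by the Descriptives: N = 200, mean = 51.85, SD = 9.90 vs. 50.
t = (51.85 - 50) / (9.90 / math.sqrt(200))           # ~2.64
print(f"BF10 = {jzs_bf10(t, 200):.2f}")              # anecdotal evidence for H1
```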
When we examine the plots that were selected, the Prior and Posterior plot as well as
the Bayes Factor robustness check, we can get a sense of the strength of the
evidence.
The Prior and Posterior plot gives a visual representation of how the posterior
changes with respect to the prior, given the evidence. Here we see the Bayes Factor
for both the Alternative hypothesis compared to the Null (BF10 = 2.33) and the Null
hypothesis compared to the Alternative (BF01 = 0.43).
We are also shown the median of the posterior effect size distribution, here given as
median = 0.184 (under the Null Hypothesis the effect size is zero). The 95% credible
interval for the posterior distribution is also given [0.014 to 0.323].
The circle on the posterior graph being lower than the circle on the prior graph
indicates that the evidence is in favor of the alternative hypothesis. Since the circles
are so close together on the graph this also suggests that the evidence is weak.
The Bayes robustness check gives the researcher a sense of how the Bayes factor
would change given different prior distributions. In the case of our example above,
the highest BF was achieved with a very small prior distribution of less than 0.25
Cauchy prior width. Our prior width of 0.707 had similar BFs to the wide and
ultrawide Cauchy prior widths, all falling within the “anecdotal” range.
The question we are asking of this data is whether or not there are differences in
student math scores based on whether or not the student attended college after high
school graduation.
Using the t-Test tab, select Bayesian Independent Samples t-Test from the pull-down
menu. With Independent Samples we will investigate the differences between TWO
groups on the SAME measure.
When using the Bayesian Independent Samples t-Test select the test variable for the
dependent variable window and the groups or factor for the grouping variable
window. Be sure that the grouping variable only contains two grouping factors.
In this example we will be looking for differences in the math assessment based on
after high school college attendance.
If you select a grouping variable that contains more than two groups, the JASP
results window will give an error message.
We will also use the menu below the variable window to select the settings needed
for this analysis.
The settings for the Bayesian Independent Samples t-Test are very similar to the
previous example. Our hypothesis states that the group means are not equal (Group
1 ≠ Group 2). We are interested in the Bayes Factor BF10 and the plots to determine
the strength of our outcomes.
Prior setting
Group Descriptives
Group N Mean SD SE
math No college after HS 67 43.07 3.93 0.48
College after HS 133 57.47 7.39 0.64
The Bayes Factor (BF10) indicates that it is much more likely that the math scores will
be different for those students who later attend college when compared to those
who did not attend college. This result would be classified as “extreme” evidence in
favor of the alternative hypothesis.
The Prior and Posterior plot gives a visual representation of how the posterior
changes with respect to the prior, given the evidence. Here we see the Bayes Factor
for both the Alternative Hypothesis compared to the Null (BF10 = 4.67 × 10³⁰) and
the Null Hypothesis compared to the Alternative (BF01 = 2.14 × 10⁻³¹).
We are also shown the median of the posterior effect size distribution, here given as
median = -2.203 (under the Null Hypothesis the effect size is zero). The 95%
credible interval for the posterior distribution is also given [-2.566 to -1.851].
The circle on the posterior graph being lower than the circle on the prior graph
indicates that the evidence is in favor of the alternative hypothesis. Since the circles
are farther apart on the graph this also suggests that the evidence is strong, or in this
case “extreme” evidence.
The Bayes robustness check gives the researcher a sense of how the Bayes factor
would change given different prior distributions. In the case of our example above,
all of the Cauchy prior widths produced BF results indicating “extreme”
evidence for the Alternative hypothesis. Our prior width of 0.707 produced BFs very
similar to the wide, ultrawide, and maximum Cauchy prior widths, all falling within the
“extreme” range.
The question we are asking of this data is whether there are differences in
student achievement from the pre-test assessment to the post-test assessment.
Using the t-Test tab, select Bayesian Paired Samples t-Test. With Paired (Dependent)
Samples we investigate the differences for ONE group on TWO different measures,
such as two assessments or a pre-/post-test.
When using the Bayesian Paired Samples t-Test, you will need to select the TWO
variables in the dialogue box. Use the arrow to move each variable into the “Test
Pair(s)” window, or drag the desired variables into the window.
In this example we will be looking at a Before_TI pre-test and the After_TI post-
test.
Here we are interested in possible changes to a student’s test score after there is
some testing intervention. The window above represents the Before Testing
Intervention score compared to the After Testing Intervention score.
The test settings and requirements are similar to the other t-Tests, such as the
Independent Samples t-Test and the One Sample t-Test.
Here we have the Hypothesis setting, along with settings for the Bayes Factor,
Descriptives, and the Plots.
Prior setting
Descriptives
N Mean SD SE
Before_TI 20 18.40 3.15 0.70
After_TI 20 20.45 4.06 0.91
The Bayes Factor (BF10) indicates that the Alternative Hypothesis is over 10 times
more likely (BF10 = 10.23) than the Null Hypothesis. This result would be classified
as “strong” evidence in favor of the alternative hypothesis that the test scores
changed after the testing intervention.
The Prior and Posterior plot gives a visual representation of how the posterior
distribution changes with respect to the prior given the evidence. Here we see the Bayes
Factor for the Alternative Hypothesis compared to the Null (BF10 = 10.23) and for the
Null Hypothesis compared to the Alternative (BF01 = 0.09).
We are also shown a revised median for the effect size, here given as median =
-0.878. The 95% credible interval for the posterior distribution is also given
[-1.566, -0.274].
The circle on the posterior graph being lower than the circle on the prior graph
indicates that the evidence is in favor of the alternative hypothesis. Since the circles
are farther apart on the graph this also suggests strong evidence in favor of the
Alternative hypothesis.
The Bayes robustness check gives the researcher a sense of how the Bayes factor
would change given different prior distributions. In the case of our example above,
the highest BF was achieved with a small prior distribution of about 0.5 Cauchy prior
width. Our prior width of 0.707 produced BFs similar to the wide and ultrawide Cauchy
prior widths, all falling near the border between “moderate” and “strong” evidence.
Chapter 23
Concluding Thoughts
Hopefully you have found this guide to be a useful tour through statistical analysis.
There are many more methods of analysis available to us through statistics. It is up
to the researcher to understand the appropriate use of these methods and to be able
to select the necessary tools to answer their questions.
With descriptive statistics we are able to get a sense of the make-up and
characteristics of our data. We gain information about the population demographics
as well as any variation within the sample. Graphic representations, such as pie
charts, histograms, and bar charts, allow us to visually inspect the data.
In our tests for differences within data we explored the uses of Chi Square analysis
when dealing with categorical data, such as population descriptors. The t-Test
uncovers differences within continuous data, such as test scores, between two
groups. If we need to explore the differences within continuous data between more
than two groups, the ANOVA analysis can provide this information.
We are also able to determine associations between two data sets. Correlations, along
with scatterplots, can point out both positive and negative associations. The strength
of this association can also be measured. Once we know that an association exists,
regression analysis allows us to model the relationship and make predictions.
The true power of statistical analysis comes when we use it to test models and uncover
subtle differences within our data. Thoughtful consideration of your statistical results
can lead to rich questions about the world in which we live.
Section V: Resources
Analysis Memos
The purpose of writing an analysis memo is to keep all of your analyses organized and
to have written documentation of every analysis you do. This way you will know
what paths you went down and which ones led to interesting places, and you will have
writing ready to include in your dissertation or paper when needed.
I. Question.
This is expressed in terms that can be answered with our data. (i.e., Are there gender
differences in responses to questions X, Y, Z?)
II. Method.
The statistical test conducted and the variables involved.
III. Results.
Include tables and/or charts with results that could be input into formal writing if
needed.
IV. Discussion.
Thoughts or reactions to results. Can be formal or informal writing.
II. Method: A chi-square test of group difference was conducted on SES, with three
categories, by School type, with two categories.
III. Results: The chi-square test of group differences was significant (χ2(2) = 6.33, p
= .04), indicating that there are statistically significant group differences in SES by
type of school.
IV. Discussion: Although the majority of students from all three SES groups attend
public school, fewer low income students attend private school than any other SES
groups, while more middle income students attend private school than any other
SES groups. This is an interesting finding as one might assume that students from
high SES backgrounds would be more likely to attend private schools. I wonder if
this could be due to higher SES families living in school districts with better public
schools, while middle income families may not have access to the best public
schools, but do have the financial means to send their children to private schools.
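A test like the one in this memo can also be reproduced outside of JASP, for example with Python's scipy library. The contingency counts below are hypothetical, chosen only to illustrate the procedure; the memo itself reports just the summary statistics.

```python
# Hedged sketch of the memo's chi-square test of SES (3 categories) by
# school type (2 categories); the observed counts are hypothetical.
from scipy.stats import chi2_contingency

# Rows: SES (low, middle, high); columns: school type (public, private)
observed = [[55, 5],
            [45, 15],
            [50, 10]]

chi2, p, df, expected = chi2_contingency(observed)
print(f"X2({df}) = {chi2:.2f}, p = {p:.3f}")  # X2(2) = 6.00, p = 0.050
```

With df = (rows − 1)(columns − 1) = 2, this mirrors the memo's reporting format, even though the invented counts give a slightly different statistic than the memo's χ2(2) = 6.33.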
Eta Squared (η2) for ANOVA:
η2 = SSbetween groups / SStotal
The Chi Square analysis has two main calculations for effect size, Phi (ϕ) or Cramer’s
Phi (ϕc). For crosstabs tables that are 2 X 2 we will use Phi. A crosstabs table that is
described as 2 X 2 will have exactly 2 rows and 2 columns.
With crosstabs tables that are greater than 2 X 2 we use Cramer’s Phi. When we have
a crosstabs table that is greater than 2 X 2 this means that output table has either 3
or more rows, 3 or more columns, or both the rows and columns have 3 or more
entries.
In the current version of JASP both Phi and Cramer’s Phi are produced by the
Crosstabs command when you select “Phi” in the statistics selection.
The Phi formula is ϕ = √(χ2 / N), and the Cramer’s Phi formula is
ϕc = √(χ2 / (N(k − 1))). In these formulas the χ2 is equal to the Chi Square value
produced by JASP, N is equal to the total number of observations or samples, and for
k we use the lesser value from the number of rows or the number of columns.
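As a sketch of these calculations in Python (the χ2, N, and table dimensions below are illustrative values, not output from a JASP analysis):

```python
# Sketch of the Phi and Cramer's Phi effect size calculations described
# above; the numbers passed in are illustrative.
from math import sqrt

def phi(chi2: float, n: int) -> float:
    """Phi for a 2 x 2 crosstabs table."""
    return sqrt(chi2 / n)

def cramers_phi(chi2: float, n: int, rows: int, cols: int) -> float:
    """Cramer's Phi for tables larger than 2 x 2; k is the lesser of rows/cols."""
    k = min(rows, cols)
    return sqrt(chi2 / (n * (k - 1)))

print(phi(6.0, 180))                # hypothetical 2 x 2 table
print(cramers_phi(6.0, 180, 3, 2))  # hypothetical 3 x 2 table: k = 2
```

Note that for a table where the lesser dimension is 2 (such as 3 × 2), Cramer’s Phi reduces to the same value as Phi, since k − 1 = 1.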
Cohen’s d (d):
d = mean difference / standard deviation (SD)
or
d = (Mean1 - Mean2) / standard deviation (SD)
The Independent Samples t-Test effect size can be calculated with either Cohen’s d
(in some cases referred to as Hedges’ g) or the r2 value. In this case, since we have
standard deviations for two separate (independent) data sets, we must use a
“pooled standard deviation” value in the equation.
Cohen’s d (d):
d = mean difference / SDpooled
or
d = (Mean1 - Mean2) / SDpooled
The equation to calculate the pooled standard deviation (SDpooled) uses the standard
deviation from each group in the equation shown below.
SDpooled = √(((SDgroup1)2 + (SDgroup2)2) / 2)
The means and standard deviation values are from the t-Test (Independent samples)
Output Table.
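Using the group descriptives from the independent-samples example earlier in this guide (math scores by college attendance), the pooled-SD Cohen’s d can be sketched in Python:

```python
# Sketch of the pooled-SD Cohen's d calculation, using the group
# descriptives from the independent-samples math score example.
from math import sqrt

def cohens_d(mean1: float, mean2: float, sd1: float, sd2: float) -> float:
    """Cohen's d with the pooled standard deviation from the formula above."""
    sd_pooled = sqrt((sd1**2 + sd2**2) / 2)
    return (mean1 - mean2) / sd_pooled

# College after HS: M = 57.47, SD = 7.39; No college after HS: M = 43.07, SD = 3.93
d = cohens_d(57.47, 43.07, 7.39, 3.93)
print(round(d, 2))  # 2.43, a very large effect
```

This large effect size is consistent with the “extreme” Bayes Factor evidence reported for the same comparison.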
Another option to calculate the effect size for a t-Test (Independent Samples) is to
use the r2 calculation.
r2 value:
r2 = t2 / (t2 + df)
In the r2 formula, t is the t-value from the JASP output table and df represents the
degrees of freedom from the JASP output table.
The effect size for a one-way ANOVA test can be calculated with eta squared (η2).
The Sum of Squares values are taken from the JASP ANOVA output table.
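Both effect size calculations can be sketched as small functions. The t, df, and Sum of Squares values below are illustrative; in practice they come from the JASP output tables.

```python
# Sketch of the r-squared (t-Test) and eta-squared (one-way ANOVA)
# effect size formulas; input values are illustrative.

def r_squared(t: float, df: int) -> float:
    """Effect size for an independent-samples t-Test: r2 = t2 / (t2 + df)."""
    return t**2 / (t**2 + df)

def eta_squared(ss_between: float, ss_total: float) -> float:
    """Effect size for a one-way ANOVA: eta2 = SSbetween / SStotal."""
    return ss_between / ss_total

print(r_squared(2.5, 38))         # hypothetical t = 2.5 with df = 38
print(eta_squared(120.0, 480.0))  # hypothetical Sums of Squares -> 0.25
```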
r(N - 2) = Pearson’s r-value, p < 0.001
In some ways reporting a Bayesian result is far simpler than reporting other statistics.
The Bayesian methods all use the Bayes Factor (BF) with the same interpretations of
the results.
• The BF01 was 11.5, suggesting that these data are 11.5 times more likely to be
observed under the null hypothesis, suggesting strong evidence for the
null hypothesis.
• The Bayes factor (BF01 = 12.2) suggested strong evidence for the null
hypothesis, the data being about 12 times more likely under the null than
under the alternative.
Or even
• Statistical analyses were conducted using the free software JASP using default
priors (JASP Team. JASP Version 0.8.6 [Computer software]). We reported
Bayes factors expressing the likelihood of the data given H1 relative to H0
assuming that H0 and H1 are equally likely. The resulting BF10 = 7.5 suggests
moderate evidence for the alternative hypothesis over the
null hypothesis.
There are several key components that must be included in the write-up of an
empirical paper implementing Bayesian estimation methods. These could include:
References
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale,
NJ: L. Erlbaum Associates.
Eagle, E., & Carroll, C. D. (1988). High school and beyond national longitudinal study:
postsecondary enrollment, persistence, and attainment for 1972, 1980, and 1982 high school
graduates. Washington, DC: National Center for Education Statistics, U.S. Dept. of
Education, Office of Educational Research and Improvement.
Howell, D. C. (1982). Statistical methods for psychology. Boston, MA: Duxbury Press.
Jeffreys, H. (1961). Theory of probability (3rd ed.). Oxford, UK: Oxford University
Press.
Kline, R. B. (2004). Beyond significance testing: reforming data analysis methods in behavioral
research (1st ed.). Washington, DC: American Psychological Association.
Lee, M. D., & Wagenmakers, E.-J. (2013). Bayesian modeling for cognitive science: A
practical course. Cambridge, UK: Cambridge University Press.
National Governors Association Center for Best Practices, Council of Chief State
School Officers. (2010). Common Core State Standards. Washington, DC: National
Governors Association Center for Best Practices, Council of Chief State School
Officers.
Wetzels, R., Matzke, D., Lee, M. D., Rouder, J. N., Iverson, G. J., & Wagenmakers,
E.-J. (2011). Statistical Evidence in Experimental Psychology: An Empirical
Comparison Using 855 t Tests. Perspectives on Psychological Science, 6(3), 291–298.
https://doi.org/10.1177/1745691611406923
Index
A
Analysis of Covariance (ANCOVA), 100
ANOVA, 81, 166
Assumption Checks, 94
B
Bayes Factor, 18, 23
Bayes’ Theorem, 17
binomial logistic regression, 117
Bivariate Correlation, 109, 186
C
categorical, 7
categorical control variable, 102
Categorical data, 7, 28
Cauchy distribution, 21
Center on Education Policy, 4
Chi Square, 60, 142
Codebook, 27
coefficients, 116
Contingency tables, 60, 142
Continuous, 30
Continuous data, 7, 28, 82, 101, 109, 167, 180, 186
correlation, 109, 186
covariant, 100
Cronbach’s Alpha, 126, 128
D
dependent, 114
dependent variable, 81, 82, 101, 166, 167, 180
Descriptive statistics, 46
discrete data, 7
E
eigenvalue, 132
Equality of Variance, 87
F
Factor Analysis, 131
frequencies, 46
H
Harold Wenglinsky, 4
Homogeneity, 26
hypergeometric, 145
I
independent factor, 81, 100, 166, 179
independent multinomial, 145
Independent samples t-Test, 70, 74, 152, 157
independent variable, 82, 167
inter-rater reliability, 130
J
joint multinomial, 145
K
Kendall’s Tau, 186
kurtosis, 55
L
leptokurtic, 55
Levene’s statistic, 87
M
Mark Twain, 7
mean, 46
median, 46
N
Negative correlation, 111, 188
nominal, 8, 29, 30
Normal distributions, 22
Normality, 26
numerical values, 28
O
One-Way ANOVA, 81, 100, 166
ordinal, 8, 29, 30
R
R Squared, 115
rational numbers, 7
regression analysis, 113
Reliability, 126
U
uniform distribution, 22
Christopher P. Halter, Ed.D., is a faculty member at the University of California San Diego’s
Department of Education Studies. He teaches courses in mathematics education, secondary
mathematics methods, research methodology, emerging technologies, and statistical analysis.
His research includes teacher development, new teacher assessment, digital storytelling, and
video analysis. He also teaches online courses in creating online collaborative communities,
middle school science strategies, and blended & synchronous learning design.