Mar 11, 2020

Binomial logistic regression predicts the probability that an observation falls into one of two categories of a

dichotomous dependent variable based on one or more independent variables that can

be either continuous or categorical. If, on the other hand, your dependent variable is a

count, see our Poisson regression guide. Alternatively, if you have more than two

categories of the dependent variable, see our multinomial logistic regression guide.

For example, you could use binomial logistic regression to understand whether exam

performance can be predicted based on revision time, test anxiety and lecture

attendance (i.e., where the dependent variable is "exam performance", measured on a

dichotomous scale – "passed" or "failed" – and you have three independent variables:

"revision time", "test anxiety" and "lecture attendance"). Alternatively, you could use

binomial logistic regression to understand whether drug use can be predicted based on

prior criminal convictions, drug use amongst friends, income, age and gender (i.e.,

where the dependent variable is "drug use", measured on a dichotomous scale – "yes"

or "no" – and you have five independent variables: "prior criminal convictions", "drug

use amongst friends", "income", "age" and "gender").

The steps for interpreting the SPSS output for a logistic regression

1. Scroll down to the Block 1: Method = Enter section of the output.

2. Look in the Omnibus Tests of Model Coefficients table, under

the Sig. column, in the Model row. This is the p-value that is interpreted.

If the p-value is LESS THAN .05, then researchers have a significant

model that should be further interpreted.

If the p-value is MORE THAN .05, then researchers do not have a significant

model and the results should be reported.

3. Look in the Hosmer and Lemeshow Test table, under the Sig. column. This is

the p-value you will interpret.

If the p-value is LESS THAN .05, then the model does not fit the data.

If the p-value is MORE THAN .05, then the model does fit the data and

should be further interpreted.

4. Look in the Classification Table, under the Percentage Correct in the Overall

Percentage row. This is the total accuracy of the model. Researchers want it to

ultimately be at least 80%.

5. Look in the Variables in the Equation table, under the Sig., Exp(B),

and Lower and Upper columns. The Sig. column is the p-value associated with

the adjusted odds ratios and 95% CIs for each predictor, clinical, demographic, or

confounding variable. The value in the Exp(B) column is the adjusted odds ratio.

The Lower and Upper values are the limits of the 95% CI associated with the

adjusted odds ratio.

6. Researchers will interpret the adjusted odds ratio in the Exp(B) column and the

confidence interval in the Lower and Upper columns for each variable.

If the confidence interval associated with the adjusted odds ratio crosses over 1.0,

then there is a non-significant association. The p-value associated with

these variables will also be HIGHER than .05.

If the adjusted odds ratio is ABOVE 1.0 and the confidence interval is

entirely above 1.0, then exposure to the predictor increases the odds of the

outcome.

If the adjusted odds ratio is BELOW 1.0 and the confidence interval is

entirely below 1.0, then exposure to the predictor decreases the odds of the

outcome.

If the variable is measured at the ordinal or continuous level, then the

adjusted odds ratio is interpreted as meaning that for every one unit increase in

the ordinal or continuous variable, the odds of the outcome are multiplied by

the value of the odds ratio.
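
The decision rules in steps 2, 3 and 6 can be sketched in Python as follows. This is only a minimal sketch: the function names `model_checks` and `interpret_odds_ratio` are hypothetical, the .05 threshold is the one used in the steps, and the numbers in the usage lines are made up for illustration rather than taken from any SPSS output.

```python
ALPHA = 0.05  # significance threshold used in the steps above


def model_checks(omnibus_p, hosmer_lemeshow_p, alpha=ALPHA):
    """Apply steps 2 and 3: the omnibus test and the Hosmer and Lemeshow test."""
    return {
        # Step 2: a small omnibus p-value means the model is significant.
        "model_significant": omnibus_p < alpha,
        # Step 3: a small Hosmer and Lemeshow p-value signals poor fit,
        # so here a p-value above alpha is the desirable result.
        "model_fits": hosmer_lemeshow_p > alpha,
    }


def interpret_odds_ratio(odds_ratio, ci_lower, ci_upper):
    """Apply step 6 to one row of the Variables in the Equation table."""
    if ci_lower <= 1.0 <= ci_upper:
        return "non-significant association (95% CI includes 1.0)"
    if odds_ratio > 1.0:  # the CI lies entirely above 1.0
        return "predictor increases the odds of the outcome"
    return "predictor decreases the odds of the outcome"


# Hypothetical values, not taken from any output in this document.
print(model_checks(omnibus_p=0.001, hosmer_lemeshow_p=0.40))
print(interpret_odds_ratio(2.10, 1.30, 3.40))
print(interpret_odds_ratio(0.80, 0.55, 1.20))
```

Note that the two model-level tests point in opposite directions: a low p-value is wanted for the omnibus test, but a high one for the Hosmer and Lemeshow test.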

Example 1

A health researcher wants to know whether the "incidence of heart

disease" can be predicted based on "age", "weight", "gender" and "VO2max" (i.e.,

where VO2max refers to maximal aerobic capacity, an indicator of fitness and health).

To this end, the researcher recruited 100 participants to perform a maximum VO2max

test as well as recording their age, weight and gender. The participants were also

evaluated for the presence of heart disease. A binomial logistic regression was then

run to determine whether the presence of heart disease could be predicted from their

VO2max, age, weight and gender.

Interpreting and Reporting the Output of a Binomial Logistic Regression Analysis

SPSS Statistics generates many tables of output when carrying out binomial logistic

regression. In this section, we show you only the three main tables required to

understand your results from the binomial logistic regression procedure, assuming that

no assumptions have been violated.

Variance explained

In order to understand how much variation in the dependent variable can be explained

by the model (the equivalent of R² in multiple regression), you can consult the table

below, "Model Summary":

This table contains the Cox & Snell R Square and Nagelkerke R Square values, which

are both methods of calculating the explained variation. These values are sometimes

referred to as pseudo R² values (and will have lower values than in multiple

regression). However, they are interpreted in the same manner, but with more caution.

Therefore, the explained variation in the dependent variable based on our model

ranges from 24.0% to 33.0%, depending on whether you reference the Cox & Snell R²

or Nagelkerke R² methods, respectively. Nagelkerke R² is a modification of Cox &

Snell R², the latter of which cannot achieve a value of 1. For this reason, it is

preferable to report the Nagelkerke R² value.
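
To make the relationship between the two pseudo R² values concrete, here is a minimal Python sketch of the standard formulas. The function names are hypothetical, and this is not a description of SPSS's internal code. The Cox & Snell value needs only the model chi-square and the sample size, both of which are reported for this example (χ²(4) = 27.402, 100 participants). The Nagelkerke value additionally needs the -2 log likelihood of the intercept-only model, called `null_deviance` here, which is not given in this guide, so that function is shown without a worked number.

```python
import math


def cox_snell_r2(model_chi_square, n):
    """Cox & Snell R^2 = 1 - exp(-chi2 / n), chi2 being the drop in -2LL."""
    return 1.0 - math.exp(-model_chi_square / n)


def nagelkerke_r2(model_chi_square, n, null_deviance):
    """Rescale Cox & Snell by its maximum value, 1 - exp(-null_deviance / n)."""
    max_cox_snell = 1.0 - math.exp(-null_deviance / n)
    return cox_snell_r2(model_chi_square, n) / max_cox_snell


# Model chi-square and sample size as reported for the heart disease example.
print(round(cox_snell_r2(27.402, 100), 3))
```

The printed value is approximately 0.240, in line with the 24.0% quoted for Cox & Snell. Because the maximum of Cox & Snell R² is below 1, dividing by it always yields a Nagelkerke value at least as large.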

Category prediction

Binomial logistic regression estimates the probability of an event (in this case, having

heart disease) occurring. If the estimated probability of the event occurring is greater

than or equal to 0.5 (better than even chance), SPSS Statistics classifies the event as

occurring (e.g., heart disease being present). If the probability is less than 0.5, SPSS

Statistics classifies the event as not occurring (e.g., no heart disease). It is very

common to use binomial logistic regression to predict whether cases can be correctly

classified (i.e., predicted) from the independent variables. Therefore, it becomes

necessary to have a method to assess the effectiveness of the predicted classification

against the actual classification. There are many methods to assess this with their

usefulness often depending on the nature of the study conducted. However, all

methods revolve around the observed and predicted classifications, which are

presented in the "Classification Table", as shown below:

Firstly, notice that the table has a footnote which states, "The cut value is .500". This

means that if the probability of a case being classified into the "yes" category is

greater than .500, then that particular case is classified into the "yes" category.

Otherwise, the case is classified as in the "no" category (as mentioned previously).

Whilst the classification table appears to be very simple, it actually provides a lot of

important information about your binomial logistic regression result, including:

The percentage accuracy in classification (PAC), which reflects the percentage

of cases that can be correctly classified as "no" heart disease with the

independent variables added (not just the overall model).

Sensitivity, which is the percentage of cases that had the observed

characteristic (e.g., "yes" for heart disease) which were correctly predicted by

the model (i.e., true positives).

Specificity, which is the percentage of cases that did not have the observed

characteristic (e.g., "no" for heart disease) and were also correctly predicted as

not having the observed characteristic (i.e., true negatives).

The positive predictive value, which is the percentage of correctly predicted

cases "with" the observed characteristic compared to the total number of cases

predicted as having the characteristic.

The negative predictive value, which is the percentage of correctly predicted

cases "without" the observed characteristic compared to the total number of

cases predicted as not having the characteristic.
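
The measures listed above can all be derived from the four cells of the classification table. The following Python sketch shows one way to do this. The function name `classification_summary` and the sample data are hypothetical, and the cut rule follows the "greater than or equal to 0.5" wording used earlier in this section.

```python
def classification_summary(observed, probabilities, cut_value=0.5):
    """Cross-tabulate observed outcomes (1 = "yes", 0 = "no") with predictions.

    Assumes the data contain at least one observed and one predicted case
    in each category, otherwise a division by zero occurs.
    """
    tp = fp = tn = fn = 0
    for actual, prob in zip(observed, probabilities):
        predicted = 1 if prob >= cut_value else 0
        if predicted == 1 and actual == 1:
            tp += 1  # true positive
        elif predicted == 1 and actual == 0:
            fp += 1  # false positive
        elif predicted == 0 and actual == 0:
            tn += 1  # true negative
        else:
            fn += 1  # false negative
    total = tp + fp + tn + fn
    return {
        "overall_percentage": 100.0 * (tp + tn) / total,
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "positive_predictive_value": 100.0 * tp / (tp + fp),
        "negative_predictive_value": 100.0 * tn / (tn + fn),
    }


# Made-up observed outcomes and predicted probabilities for nine cases.
observed = [1, 1, 1, 0, 0, 0, 0, 1, 0]
probabilities = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1, 0.3, 0.8, 0.05]
for name, value in classification_summary(observed, probabilities).items():
    print(f"{name}: {value:.1f}%")
```

Sensitivity and specificity are computed over the observed categories (table rows), whereas the predictive values are computed over the predicted categories (table columns).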

Variables in the equation

The "Variables in the Equation" table shows the contribution of each independent

variable to the model and its statistical significance. This table is shown below:

The Wald test ("Wald" column) is used to determine statistical significance for each of

the independent variables. The statistical significance of the test is found in the "Sig."

column. From these results you can see that age (p = .003), gender (p = .021) and

VO2max (p = .039) added significantly to the model/prediction, but weight (p = .799)

did not add significantly to the model. You can use the information in the "Variables

in the Equation" table to predict the probability of an event occurring based on a one

unit change in an independent variable when all other independent variables are kept

constant. For example, the table shows that the odds of having heart disease ("yes"

category) are 7.026 times greater for males as opposed to females. If you are unsure

how to use odds ratios to make predictions, learn about our enhanced guides here.
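
The columns of the "Variables in the Equation" table are related to one another in a simple way, which the following Python sketch illustrates. It assumes the usual textbook definitions for a predictor with a single coefficient: Wald = (B / S.E.)², referred to a chi-square distribution with 1 degree of freedom, and Exp(B) = e^B with a 95% confidence interval of e^(B ± 1.96 × S.E.). The function name and the B and S.E. values are hypothetical and are not taken from the table in this example.

```python
import math


def variables_in_equation_row(b, se, z_crit=1.959964):
    """Derive Wald, Sig., Exp(B) and its 95% CI from B and S.E."""
    wald = (b / se) ** 2
    # With 1 degree of freedom, the upper-tail chi-square probability
    # equals erfc(sqrt(wald / 2)).
    sig = math.erfc(math.sqrt(wald / 2.0))
    return {
        "Wald": wald,
        "Sig.": sig,
        "Exp(B)": math.exp(b),
        "Lower": math.exp(b - z_crit * se),
        "Upper": math.exp(b + z_crit * se),
    }


# Hypothetical coefficient and standard error for one predictor.
for column, value in variables_in_equation_row(b=0.7, se=0.25).items():
    print(f"{column}: {value:.3f}")
```

A consequence of these definitions is that the confidence interval excludes 1.0 exactly when the Wald p-value is below .05, which is why the two checks agree.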

Based on the results above, we could report the results of the study as follows:

A logistic regression was performed to ascertain the effects of age, weight, gender and

VO2max on the likelihood that participants have heart disease. The logistic regression

model was statistically significant, χ²(4) = 27.402, p < .0005. The model explained

33.0% (Nagelkerke R²) of the variance in heart disease and correctly classified 71.0%

of cases. The odds of exhibiting heart disease were 7.02 times greater for males than for females.

Increasing age was associated with an increased likelihood of exhibiting heart disease,

but increasing VO2max was associated with a reduction in the likelihood of exhibiting

heart disease.

In this example, I am trying to see whether or not a student said that he voted (coded 1="yes";

0="no") can be predicted by the following four variables:

Gender (Variable "q1," a dichotomous variable where male respondents are

coded "1" and women="0")

Political ideology (Variable "q16," which is coded in discrete, one-unit

intervals where 1="very conservative" and 7="very liberal").

Cumulative GPA (Variable "q9," coded as a continuous variable that can range

from 0.0-4.0).

Follow politics closely (Variable "q17L," which is coded in discrete, one-unit

intervals ranging from 1-5. Students' answers to the question, "I follow politics

closely" in this survey ranged from 1 ("very much disagree") to 5 ("very much

agree").

Interpretation:

The results in the SPSS output window will have many tables; we are interested only

in the following two tables (pay close attention to the tables that are listed towards the

very end of the output):

Although the logic and method of calculation used in logistic regression are different

from those used for regular regression, SPSS provides two "pseudo R-squared statistics"

(this is the term we use when we report this data), that can be interpreted in a way that

is similar to that in multiple regression. The main difference between the Cox and

Snell measurement and the Nagelkerke measure is that the former tends to produce

more conservative (that is, lower) pseudo R² values than the latter measure.

In political science, most researchers use the more conservative Cox and Snell pseudo

R² statistic. The Cox and Snell pseudo R² statistic reported in Figure 3 is generally

interpreted to mean:

"the four independent variables in the logistic model together account for 15.7 percent of

the explanation for why a student votes or not."

Generally speaking, the higher the pseudo R-squared statistic, the better the model fits

our data. In this case, we would probably say that the model we have built

"moderately" fits our data (in other words, although the model accounts for a

significant amount of the variation in whether or not a student votes, there are also

lots of other variables not in our model which influence this decision).

You should be aware of the fact that there is much debate among scholars over which

statistics should be reported when using logistic regression, and many articles and

books using this technique will employ other measures to assess how well a given

logistic regression model "fits"--that is, includes precisely the correct independent

variables and only the right variables. Nevertheless, the reason the Cox and Snell

pseudo R-squared statistic is automatically calculated by SPSS is because it is both

widely reported and fairly straightforward to understand and explain. It closely

resembles the much more universally accepted R-squared statistic that we use to

assess model fit when using OLS multiple regression.

dependent variable when controlling for other variables. Figure 4 above reports the

partial logistic regression coefficients for each independent variable in the model in

the column marked "B." PLEASE NOTE: THESE COEFFICIENTS DO NOT HAVE

THE SAME MEANING IN LOGISTIC REGRESSION THAT THEY HAVE IN

REGULAR BIVARIATE AND MULTIPLE REGRESSION!!! By themselves, these

coefficients do not have a meaning that is easily explained or understood except by

experts.

To assess the isolated impact of each independent variable, we instead want to look at

what are called the "odds ratios," which are listed in Figure 4 in the column titled

Exp(B).
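
To see why the B coefficients are hard to read directly while the Exp(B) values are not, it can help to write out the logistic model. The Python sketch below uses a hypothetical intercept and hypothetical B values (they are not the ones in Figure 4). It shows that raising one predictor by one unit multiplies the odds by exp(B) regardless of the other predictors, whereas the resulting change in probability depends on where a case starts.

```python
import math


def predicted_probability(intercept, coefficients, values):
    """Logistic model: probability = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    log_odds = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds(probability):
    """Convert a probability into odds."""
    return probability / (1.0 - probability)


# Hypothetical intercept and B values for two predictors (not from Figure 4).
b0, b = -1.0, [0.5, 0.2]
p_before = predicted_probability(b0, b, [0, 3])
p_after = predicted_probability(b0, b, [1, 3])  # first predictor raised by one unit

# The ratio of the two odds equals exp(B) for that predictor,
# whatever values the other predictors are held at.
print(odds(p_after) / odds(p_before), math.exp(b[0]))

# The change in probability, by contrast, depends on the starting point.
print(p_before, p_after)
```

This is also the reason an odds ratio should be read as a statement about odds and not as "times as likely" in the sense of probability.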

The easiest way to explain how to interpret an odds ratio is to use an example from the

table. Recall that Q1 is the variable name for the gender variable (male=1; female=0).

What the logistic regression results in Figure 4's Exp(B) column say is:

"Controlling for differences in political ideology, GPA, and the extent to which one agrees

with the statement 'I follow politics closely,' being a male student increases the

odds of voting by a factor of 1.46."

The political ideology variable is coded 1-7, where each one unit increase means that

a student self-identified as being increasingly "liberal." Interpreting the odds ratio for Q16

(the variable name for the ideology variable) indicates that:

"For every one unit increase in a student's liberalism (as measured by a 7-unit index),

the odds of voting decreased slightly (they were multiplied by .96), after controlling for the

other factors in the model."

The other two independent variables might be interpreted in the following manner:

"For each full grade increase in cumulative GPA (one full point on the four point

grading scale), the odds of a student voting were nearly four times as high, controlling for all

factors included in the model. In interpreting this figure, it is important to keep in

mind that the great majority of students in this sample have grade point averages that

range between 3.0 and 3.9."

"Students who report that they follow politics regularly were much more likely to

vote. Students were coded on a five point discrete scale ranging from strongly

disagree (1) to strongly agree (5) with the statement 'I follow politics closely.' In the

logistic regression model, every one-unit shift towards the 'strongly agree' category

corresponded with an increase in the odds of voting by a factor of 1.87."
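
The odds ratios quoted in these interpretations are multiplicative factors on the odds, so they can be restated as percentage changes and compounded over several units. The following Python sketch illustrates this. The two function names are hypothetical, and the odds ratios are the ones quoted above for gender, ideology and following politics.

```python
def percent_change_in_odds(odds_ratio):
    """Express an odds ratio as a percentage change in the odds."""
    return (odds_ratio - 1.0) * 100.0


def odds_ratio_for_change(odds_ratio, units):
    """Odds ratios multiply, so a change of k units gives odds_ratio ** k."""
    return odds_ratio ** units


# Odds ratios quoted in the interpretations above.
quoted = [("male (Q1)", 1.46), ("ideology (Q16)", 0.96), ("follow politics (Q17L)", 1.87)]
for name, ratio in quoted:
    print(name, round(percent_change_in_odds(ratio), 1))

# Moving across the full 1-to-7 ideology scale is a six-unit change.
print(round(odds_ratio_for_change(0.96, 6), 3))
```

An odds ratio just below 1, such as .96, looks negligible per unit but compounds to a noticeably lower figure across the whole scale, which is worth keeping in mind when comparing predictors measured on different scales.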

As with the pseudo R-square statistic, there is some debate over how logistic partial

regression statistics should be interpreted, which means that you may read logistic

regression tables where other measures are used. Unfortunately, not all social

scientists using logistic regression will report odds-ratios. SPSS reports this statistic

because it is a widely used and easily understood measure of how each

independent variable influences the value a dichotomous variable will take,

controlling for the other independent variables in the model.

Verifying that independent variables are statistically significant explanations for

variations in the dependent variable. Finally, we must look at Figure 4 one more time

to examine each independent variable's significance statistic. The odds ratio statistics

we have just analyzed are based on a fairly small sample (the first table in the SPSS

output for this regression model--not shown above--indicates that fewer than 200

students were included in the sample). We need to figure out whether or not these

statistics that show an increase or decrease in the likelihood of a given student voting

are reliable. In other words, we want to know if the odds-ratios in Figure 4 that show

our independent variables affecting the likelihood of voting are possibly due to

random chance because these statistics were generated from a sample rather than a

survey of the entire population.

Could the relationships we see in Figure 4's odds ratios (as listed in the Exp(B)

column) be due solely to chance? The significance statistics in Figure 4 show that the

answer to this question is clearly yes for both the gender (Q1) and political ideology

(Q16) variables. We cannot say with a high degree of confidence (better than 95

percent certainty) that the relationships we found between these variables and a

change in the likelihood of voting in our sample would hold true in the population as a

whole because there is more than a .05 chance that our observed relationship between

voting and these two variables is due to random survey error.

For GPA (Q9), the significance statistic indicates that the probability that a full grade

increase in GPA actually corresponds to no increase (or a decrease) in the likelihood

of a student voting is essentially nonexistent. Our odds-ratio is statistically significant

because there is only one chance in a thousand that we have observed a relationship in

our sample that would not be found in the larger population from which our sample

was drawn.
