
Additive and multiplicative models

(AS08)
EPM304 Advanced Statistical Methods in Epidemiology

Course: PG Diploma/ MSc Epidemiology

This document contains a copy of the study material located within the computer
assisted learning (CAL) session.
If you have any questions regarding this document or your course, please contact
DLsupport via DLsupport@lshtm.ac.uk.
Important note: this document does not replace the CAL material found on your
module CDROM. When studying this session, please ensure you work through the
CDROM material first. This document can then be used for revision purposes to
refer back to specific sessions.
These study materials have been prepared by the London School of Hygiene & Tropical Medicine as part of
the PG Diploma/MSc Epidemiology distance learning course. This material is not licensed either for resale
or further copying.
London School of Hygiene & Tropical Medicine September 2013 v2.0

Section 1: Additive and multiplicative models


Aim

To consider how the effects of 2 categorical variables combine, in terms of
their effect on the outcome variable
To learn how additive models may sometimes be used as an alternative to
multiplicative models with interaction terms

Objectives
By the end of this session you will be able to:

describe the difference between an additive model and a multiplicative model
obtain an additive model for modelling risks or rates, from cohort and
cross-sectional study designs
compare the fit to the data of an additive model with the fit to the data of a
multiplicative model
think about the statistical and biological considerations for choosing between
an additive and a multiplicative model

This session should take you between 1h 15m and 2 hours to complete.

Section 2: Planning your study


The models we have fitted in previous sessions have all been multiplicative. This
means that model parameters represent ratios (e.g. rate ratios, odds ratios) and the
effects of two factors are assumed to combine multiplicatively. This assumption
seems to work well for many risk factors in epidemiology; we can always fit
interaction terms when the assumption is not appropriate.
In this session, we see how additive models may sometimes be used as an
alternative to multiplicative models with interaction terms.
With additive models, we assess the effect of explanatory variables with risk or rate
differences, rather than with risk, rate or odds ratios.
To work through this session you should know about regression models, specifically
Logistic and Poisson regression. You should also know about the strategy for
building these models. If you need to review any materials before you continue refer
to the appropriate sessions below.
Framework for regression models: AS01
Logistic regression: SM07, SM08, SM09
Poisson regression: SM11, AS05

2.1: Planning your study


To illustrate methods in this session, we will use data from the Whitehall cohort
study.
Whitehall Study: A cohort study of risk factors for mortality in men employed
in Whitehall, London.

Interaction: Hyperlink: Whitehall Study: (card appears on right-hand side):


This cohort study was set up to examine risk factors for mortality in male
government employees (civil servants) working around Whitehall, London.
Employees were recruited between 1967 and 1970. Information on exposure to
selected risk factors was obtained by a self-administered questionnaire and a
screening examination during this period. All participants were followed at the
National Health Service Central Registry to identify mortality and emigration.
Information on death (date and cause) was provided for those who died.
The results used in this session are from a 10% random sample of the total dataset.

Section 3: Background: Multiplicative models


The models you have fitted previously all take the general form:
log(y) = α + Σ βi xi
       = α + β1x1 + β2x2 + ...

where:
y is a measure of disease occurrence
xi are the explanatory variables (or categories within them)
α and βi (i = 1, ... , I) are the regression parameters, which have to be
estimated from the data.
In each of the following models, what is the disease outcome measure, and in what
form is the disease outcome measure modelled:
a) a Poisson regression model?
b) a logistic regression model?
Interaction: Button: clouds picture (pop up box appears and text and an interaction
appear on bottom RHS):
For Poisson regression the outcome measure y is the rate λ, but remember it is
log(y) that is modelled.

For logistic regression the outcome measure y is the odds π/(1 - π), but remember
it is log(y) that is modelled.

What form can the explanatory (independent) variables xi take?


Interaction: Button: clouds picture (pop up box appears):
The explanatory variables can be categorical variables, quantitative variables or
interactions between the explanatory variables.

3.1: Background: Multiplicative models


Because the left-hand side of the equation is on the log scale, these models are all
multiplicative models. To illustrate this, consider the simple model with two
dichotomous variables x1 and x2 which represent two binary (0 or 1) exposures E1
and E2.
log(y) = α + β1x1 + β2x2

If we want the model equation on the original scale of y (so rates or odds), we need
to exponentiate both sides of the equation. Using the laws of logarithms, we have
y = exp(α) × exp(β1x1) × exp(β2x2)
Then the rates (or odds) for the four combinations of exposure are:

         E1-                  E1+
E2-      exp(α)               exp(α) exp(β1)
E2+      exp(α) exp(β2)       exp(α) exp(β1) exp(β2)

so that the effects of the two exposures multiply together.
Note that exp(β1) is a rate ratio for the effect of E1 if the outcome y is a rate, and it
is an odds ratio for the effect of E1 if the outcome y is an odds. Similarly for exp(β2)
and the effect of E2.
Pop-up for each cell of the table:
Interaction: Button: "exp(α)" (pop-up box appears):
In this case there is no exposure to E1 or E2, so x1=0 and x2=0 and the rate y is
given by
exp(α) × exp(β1 × 0) × exp(β2 × 0) = exp(α) × exp(0) × exp(0) = exp(α)

Interaction: Button: "exp(α) exp(β1)" (pop-up box appears):
In this case there is exposure to E1, but not to E2, so x1=1 and x2=0 and the rate y
is given by
exp(α) × exp(β1 × 1) × exp(β2 × 0) = exp(α) × exp(β1) × exp(0) = exp(α) exp(β1)

Interaction: Button: "exp(α) exp(β2)" (pop-up box appears):
In this case there is exposure to E2, but not to E1, so x1=0 and x2=1 and the rate y is
given by
exp(α) × exp(β1 × 0) × exp(β2 × 1) = exp(α) × exp(0) × exp(β2) = exp(α) exp(β2)

Interaction: Button: "exp(α) exp(β1) exp(β2)" (pop-up box appears):
In this case there is exposure to E1 and to E2, so x1=1 and x2=1, and the rate y is
given by
exp(α) × exp(β1 × 1) × exp(β2 × 1) = exp(α) exp(β1) exp(β2)

3.2: Background: Multiplicative models


So: The effects of the exposures on log (y) combine additively, that is, the model is
additive on the log scale
But: On the original scale of y, the effects of the exposures combine multiplicatively
so that the model is multiplicative on the original scale.
When the effects of explanatory variables combine multiplicatively on the
original scale then we say that it is a multiplicative model.
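
The multiplicative structure can be checked numerically. The following is a minimal sketch in Python; the values of alpha, beta1 and beta2 are hypothetical and chosen only for illustration (they are not estimates from any study). It shows that, under the model, the ratio for E1 is the same whether or not E2 is present.

```python
import math

# Hypothetical parameter values, chosen only for illustration.
alpha = math.log(2.0)   # log of the baseline rate
beta1 = math.log(1.5)   # log rate ratio for E1
beta2 = math.log(3.0)   # log rate ratio for E2


def multiplicative_rate(x1, x2):
    """Rate (or odds) implied by log(y) = alpha + beta1*x1 + beta2*x2."""
    return math.exp(alpha + beta1 * x1 + beta2 * x2)


# Rates for the four exposure combinations, keyed by (x1, x2).
rates = {(x1, x2): multiplicative_rate(x1, x2) for x1 in (0, 1) for x2 in (0, 1)}

# The ratio for E1 is the same in both strata of E2: it equals exp(beta1).
ratio_e1_without_e2 = rates[(1, 0)] / rates[(0, 0)]
ratio_e1_with_e2 = rates[(1, 1)] / rates[(0, 1)]

for cell in sorted(rates):
    print(cell, round(rates[cell], 3))
print("ratio for E1:", round(ratio_e1_without_e2, 3), round(ratio_e1_with_e2, 3))
```

With these illustrative values the jointly exposed rate is the baseline rate multiplied by both ratios (2.0 × 1.5 × 3.0).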

3.3: Background: Multiplicative models


Sometimes the effects of explanatory variables are not of this kind. We can model
the effect of categorical variables (on the outcome) that do not combine
multiplicatively, by incorporating interaction terms.
The interaction term serves to "correct" the model so that it fits the data better.
Fitting interaction terms is how you have dealt with non-multiplicative effects so far.
Why might this be a problem? Might it be preferable to avoid interaction terms?
Interaction: Button: clouds picture (pop up box appears and card appears on
RHS):
Including interaction terms in a model adds additional parameters, and for ease of
interpretation and presentation we would prefer to avoid interaction terms if
possible.
We now consider alternative models that have an additive structure. Such models
may sometimes provide a better fit to the data without needing interaction terms.

Section 4: Additive models

The simplest alternative to the multiplicative model is the additive model. The
general form of an additive model is:
y = α + Σ βi xi

Notice that this model is additive on the original scale of y, hence the term
additive model. The outcome y is modelled on its original scale.
This type of model is suitable for modelling rates and risks.
Interaction: Tabs: Rates:
In cohort studies, we have person-time data and y is the rate λ.

λ = α + Σ βi xi

This is the additive rate model.
The parameters βi represent rate differences, whereas in the multiplicative Poisson
model they represent a log(rate ratio).
Interaction: Tabs: Risks:
For binary data, we are modelling the proportion with a given response, and y is the
proportion π.

π = α + Σ βi xi

This is the additive risk model.
The parameters βi represent risk differences, whereas in logistic regression, where
we have a multiplicative model, they represent a log(odds ratio).
Note: This model involves neither logs nor odds, although it is often termed "logistic
regression with additive risks". A better term would be binomial regression. Note
also that this model is for risks (not odds), so it may not be used for case-control
studies.

4.1: Additive models


The additive model for two dichotomous variables x1 and x2, which represent two
binary (0 or 1) exposures E1 and E2, has the form:
y = α + β1x1 + β2x2

Under this model, the rates (or risks) for the four combinations of exposure are
shown opposite. You can click the button below the table to compare this with the
multiplicative format.

Notice how the effect due to E1 (+β1) and the effect due to E2 (+β2) add to give
the combined effect (+β1 + β2).
Notice also how the effect of E1 - as measured by β1 - is measured as a rate (or
risk) difference.
And the effect of E2 - as measured by β2 - is measured also as a rate (or risk)
difference.
In contrast, with a multiplicative model β1 and β2 were each (i) a log(odds ratio), if
the modelled outcome was the log(odds of outcome), or (ii) a log(rate ratio), if the
modelled outcome was the log(rate of outcome).
Additive model
         E1-          E1+
E2-      α            α + β1
E2+      α + β2       α + β1 + β2
Since the outcome is modelled on the original scale, there is no need to exponentiate
the coefficients.
Interaction: Button: α (pop up box appears):
In this case, there is no exposure to E1 nor to E2. So x1 = 0 and
x2 = 0, and the rate y is given by:
y = α + β1x1 + β2x2
  = α + (β1 × 0) + (β2 × 0)
  = α
Interaction: Button: α + β1 (pop up box appears):
In this case, there is exposure to E1 but not to E2. So x1 = 1 and
x2 = 0, and the rate y is given by:
y = α + β1x1 + β2x2
  = α + (β1 × 1) + (β2 × 0)
  = α + β1
Interaction: Button: α + β2 (pop up box appears):
In this case, there is exposure to E2 but not to E1. So x1 = 0 and
x2 = 1, and the rate y is given by:
y = α + β1x1 + β2x2
  = α + (β1 × 0) + (β2 × 1)
  = α + β2
Interaction: Button: α + β1 + β2 (pop up box appears):
In this case, there is exposure to both E1 and E2. So x1 = 1 and
x2 = 1, and the rate y is given by:
y = α + β1x1 + β2x2
  = α + (β1 × 1) + (β2 × 1)
  = α + β1 + β2
Interaction: Button: Compare (above table changes to the following):
Multiplicative model
         E1-                  E1+
E2-      exp(α)               exp(α) exp(β1)
E2+      exp(α) exp(β2)       exp(α) exp(β1) exp(β2)
Interaction: Button: exp(α) (pop up box appears):
In this case, there is no exposure to E1 nor to E2. So x1 = 0 and
x2 = 0, and the rate y is given by:
y = exp(α) × exp(β1x1) × exp(β2x2)
y = exp(α) × exp(0) × exp(0)
y = exp(α)
Interaction: Button: exp(α) exp(β1) (pop up box appears):
In this case, there is exposure to E1 but not to E2. So x1 = 1 and
x2 = 0, and the rate y is given by:
y = exp(α) × exp(β1x1) × exp(β2x2)
y = exp(α) × exp(β1) × exp(0)
y = exp(α) exp(β1)
Note that the effect of x1 is measured as:
rate if x1=1 / rate if x1=0
= exp(α) exp(β1) / exp(α)
= exp(β1)
So exp(β1) is a rate (or odds) ratio

Interaction: Button: exp(α) exp(β2) (pop up box appears):
In this case, there is exposure to E2 but not to E1. So x1 = 0 and
x2 = 1, and the rate y is given by:
y = exp(α) × exp(β1x1) × exp(β2x2)
y = exp(α) × exp(0) × exp(β2)
y = exp(α) exp(β2)
Interaction: Button: exp(α) exp(β1) exp(β2) (pop up box appears):
In this case, there is exposure to both E1 and E2. So x1 = 1 and
x2 = 1, and the rate y is given by:
y = exp(α) × exp(β1x1) × exp(β2x2)
y = exp(α) exp(β1) exp(β2)
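
As a numerical counterpart to the additive table, here is a short Python sketch. The parameter values are hypothetical (illustrative rates per 1000 person-years, not taken from the Whitehall data). Under an additive model the difference for E1 is constant across levels of E2, while the ratio for E1 is not.

```python
# Hypothetical parameter values (rates per 1000 person-years), for illustration only.
alpha = 2.0   # baseline rate
beta1 = 1.0   # rate difference for E1
beta2 = 4.0   # rate difference for E2


def additive_rate(x1, x2):
    """Rate (or risk) implied by y = alpha + beta1*x1 + beta2*x2."""
    return alpha + beta1 * x1 + beta2 * x2


# Rates for the four exposure combinations, keyed by (x1, x2).
rates = {(x1, x2): additive_rate(x1, x2) for x1 in (0, 1) for x2 in (0, 1)}

# Differences for E1 agree across strata of E2; ratios for E1 do not.
diff_e1_without_e2 = rates[(1, 0)] - rates[(0, 0)]
diff_e1_with_e2 = rates[(1, 1)] - rates[(0, 1)]
ratio_e1_without_e2 = rates[(1, 0)] / rates[(0, 0)]
ratio_e1_with_e2 = rates[(1, 1)] / rates[(0, 1)]

print("differences for E1:", diff_e1_without_e2, diff_e1_with_e2)
print("ratios for E1:", round(ratio_e1_without_e2, 3), round(ratio_e1_with_e2, 3))
```

This is why, when both exposures have an effect, a model with no interaction on one scale generally implies an interaction on the other.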

4.2: Additive models


Comparison between a multiplicative and an additive model
On the following cards you will compare a multiplicative model with an additive
model. You will examine the strength of evidence for the need for interaction terms
in each type of model, and compare the models in terms of whether they do, or do
not, need interaction terms in order to give an adequate fit to the data.
To illustrate this, consider the Whitehall study of risk factors for ischaemic heart
disease (IHD) mortality. Smoking was thought to increase mortality. Mortality also
increases with age.
The two exposures were categorised as follows:
Smoking: non/ex-smokers, current smokers
Age: 50 to 64 years, 65 to 74 years.

The outcome, y, is the ischaemic heart disease (IHD) mortality rate (per 1000 person-years).
In a multiplicative Poisson model, log(y) = log(rate) is modelled.
In an additive rate model, y = rate is modelled.

4.3: Additive models


The observed data

The observed IHD mortality rates (events/person-years) are given below.


There is a higher IHD mortality rate in smokers than in non-smokers (for
both age groups), and a higher rate in those aged 65-74 years old than in
those aged 50-64 years old (for smokers and non-smokers).
                    Age group (x2)
Smoking (x1)        50-64 years (=0)       65-74 years (=1)
No/ex (=0)          2.76 (240/86863)       8.78 (243/27692)
Yes (=1)            5.60 (376/67131)       12.32 (293/23786)

Rates are per 1000 person-years.

Use the button to swap to a table of rate ratios for the effect of age group,
separately for smokers and non-smokers, and rate differences for the effect of age,
separately for smokers and non-smokers. On the basis of these rate ratios and rate
differences, do you think the effects of smoking and age combine multiplicatively or
additively?
Interaction: Button: Swap (table on centre bottom changes to the following):
Rate ratios and differences for effect of age by smoking
                             Non/ex-smoker     Smoker
Rate ratio for age:          3.18              2.20
Rate difference for age:     6.02              6.72

Interaction: Button: The rate differences for the effect of age are similar (6.02
compared with 6.72) for non/ex-smokers and smokers. The rate ratios for age are
not so similar for non/ex-smokers and smokers (3.18 for non/ex-smokers compared
to 2.20 for smokers). This suggests that the effects of smoking and age might
combine additively, but perhaps not multiplicatively, in which case an interaction
term might be needed if we fit a multiplicative model.
We could form a similar table to summarise the effect of smoking, separately for age
group 50-64 years old, and age group 65-74 years old. If we did this, we would find
that the rate ratio for smoking in individuals aged 50-64 years old is 2.03 and the
rate ratio for smoking in individuals aged 65-74 years old is 1.40. And the rate
difference between smokers and non/ex-smokers is 2.84 for individuals aged 50-64
years old, compared to 3.54 for individuals aged 65-74 years old.
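
The stratum-specific ratios and differences quoted above can be reproduced directly from the tabulated rates. The Python sketch below works from the rates as printed (rounded to 2 decimal places); the dictionary layout and function names are illustrative choices only.

```python
# Observed IHD mortality rates per 1000 person-years, keyed by (smoking, age group),
# copied from the table of observed data (0 = non/ex-smoker or 50-64 years,
# 1 = current smoker or 65-74 years).
observed_rate = {(0, 0): 2.76, (0, 1): 8.78, (1, 0): 5.60, (1, 1): 12.32}


def age_effect(smoking):
    """Rate ratio and rate difference for age within one smoking group."""
    younger = observed_rate[(smoking, 0)]
    older = observed_rate[(smoking, 1)]
    return round(older / younger, 2), round(older - younger, 2)


def smoking_effect(age):
    """Rate ratio and rate difference for smoking within one age group."""
    non_smoker = observed_rate[(0, age)]
    smoker = observed_rate[(1, age)]
    return round(smoker / non_smoker, 2), round(smoker - non_smoker, 2)


print("age effect, non/ex-smokers (RR, RD):", age_effect(0))
print("age effect, smokers (RR, RD):", age_effect(1))
print("smoking effect, 50-64 years (RR, RD):", smoking_effect(0))
print("smoking effect, 65-74 years (RR, RD):", smoking_effect(1))
```

The differences are similar across strata while the ratios are not, which is the pattern expected if the effects combine additively.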

4.4: Additive models


Below are the two multiplicative models with and without interaction between age
and smoking. Use the button to swap between the tables.
Note: The output is given on the log scale, so the coefficients are log rates and log
RRs
Interaction: Button: Swap (table on centre bottom changes to the following):

Multiplicative Poisson model with smoking and age, and the interaction
between them

               Coefficient   Standard Error   z        P > |z|    95% Confidence Interval
Smoking         0.7066       0.0826            8.55    < 0.001     0.545     0.869
Age             1.1556       0.0910           12.70    < 0.001     0.977     1.334
Smoking.Age    -0.3674       0.1198           -3.07      0.002    -0.602    -0.133
Constant        1.0163       0.0645           15.74    < 0.001     0.890     1.143
Log likelihood = -1225.701

Multiplicative Poisson model with smoking and age, without interaction

               Coefficient   Standard Error   z        P > |z|    95% Confidence Interval
Smoking         0.5345       0.0597            8.95    < 0.001     0.417     0.652
Age             0.9426       0.0591           15.95    < 0.001     0.827     1.058
Constant        1.1178       0.0527           21.21    < 0.001     1.014     1.221
Log likelihood = -1230.411

In the model without interaction, what is the rate ratio for smoking, to 2 decimal
places?
RR (Smoking) =
In the model with interaction, what is the rate ratio for the additional joint effect of
smoking and age, to 2 decimal places?
RR (Smoking.Age) =
Interaction: Calculation: RR (Smoking) =____:
Correct Response 1.71 (pop up box appears):
Correct
That's right, the rate ratio is given as the exponential of the coefficient for smoking:
RR = exp(0.5345) = 1.71
Incorrect Response (pop up box appears):
Sorry, the rate ratio should be calculated as the exponential of the coefficient for
smoking (because the coefficient is the log rate ratio):
RR = exp(0.5345) = 1.71


Interaction: Calculation: RR (Smoking.Age) =____:


Correct Response 0.69 (pop up box appears):
Correct
Yes, the rate ratio is given as the exponential of the coefficient for the smoking.age
interaction term:
RR = exp(-0.3674) = 0.69
Incorrect Response (pop up box appears):
Sorry, the rate ratio should be calculated as the exponential of the coefficient for the
interaction term smoking.age (because the coefficient is the log rate ratio):
RR = exp(-0.3674) = 0.69
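
The two calculations above amount to exponentiating a coefficient reported on the log scale. A small Python sketch follows, using the coefficients and confidence limits reported in the model output; the function name is an illustrative choice. Exponentiating the confidence limits gives an interval for the rate ratio itself.

```python
import math


def rate_ratio(log_rr, lower, upper):
    """Back-transform a log rate ratio and its confidence limits to the ratio scale."""
    return math.exp(log_rr), math.exp(lower), math.exp(upper)


# Smoking, model without interaction: coefficient 0.5345, 95% CI 0.417 to 0.652.
rr_smoking, rr_smoking_low, rr_smoking_high = rate_ratio(0.5345, 0.417, 0.652)

# Smoking.Age interaction term: coefficient -0.3674 (negative on the log scale).
rr_interaction = math.exp(-0.3674)

print("RR (Smoking):", round(rr_smoking, 2),
      "95% CI:", round(rr_smoking_low, 2), "to", round(rr_smoking_high, 2))
print("RR (Smoking.Age):", round(rr_interaction, 2))
```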

4.5: Additive models


The likelihood ratio test for interaction gives P = 0.002. Is there evidence of
interaction when the data are analysed using a multiplicative model?
Interaction: Button: clouds picture (pop up box appears on right-hand side):
Yes, there is strong evidence of interaction, therefore you must include the
interaction in the multiplicative model in order to obtain a good fit of the model to
your data.
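
The likelihood ratio test quoted above can be reproduced from the two log likelihoods in the model output (-1230.411 without interaction, -1225.701 with interaction). The sketch below is a minimal Python version that assumes a single extra parameter, so the statistic is compared with a chi-squared distribution on 1 degree of freedom; the function name is illustrative.

```python
import math


def likelihood_ratio_test_1df(loglik_simpler, loglik_fuller):
    """LRT statistic and P-value for nested models differing by one parameter.

    For 1 degree of freedom the chi-squared tail probability can be written
    with the complementary error function: P = erfc(sqrt(statistic / 2)).
    """
    statistic = 2 * (loglik_fuller - loglik_simpler)
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


lrt_statistic, lrt_p = likelihood_ratio_test_1df(-1230.411, -1225.701)
print("LRT statistic:", round(lrt_statistic, 2))
print("P-value:", round(lrt_p, 3))
```

The erfc shortcut only applies to 1 degree of freedom; a test involving several interaction parameters would need the general chi-squared distribution.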
Interaction: Button: Swap (the table on centre bottom changes to the following):
Multiplicative Poisson model with smoking and age, and the interaction
between them

               Coefficient   Standard Error   z        P > |z|    95% Confidence Interval
Smoking         0.7066       0.0826            8.55    < 0.001     0.545     0.869
Age             1.1556       0.0910           12.70    < 0.001     0.977     1.334
Smoking.Age    -0.3674       0.1198           -3.07      0.002    -0.602    -0.133
Constant        1.0163       0.0645           15.74    < 0.001     0.890     1.143
Log likelihood = -1225.701

Multiplicative Poisson model with smoking and age, without interaction

               Coefficient   Standard Error   z        P > |z|    95% Confidence Interval
Smoking         0.5345       0.0597            8.95    < 0.001     0.417     0.652
Age             0.9426       0.0591           15.95    < 0.001     0.827     1.058
Constant        1.1178       0.0527           21.21    < 0.001     1.014     1.221
Log likelihood = -1230.411

4.6: Additive models


The two multiplicative models can be summarised as shown in the tabs below. Move
the cursor over each term of the equation to see what it means.
Interaction: Tabs: No interaction:
Multiplicative model with no interaction:
log(λ) = 1.12 + 0.53 x1 + 0.94 x2

λ = exp(1.12 + 0.53 x1 + 0.94 x2)
  = exp(1.12) × exp(0.53 x1) × exp(0.94 x2)
From this we can obtain the fitted mortality rates (i.e. those predicted under this
model), for each of the four combinations of smoking and age. The fitted mortality
rates are given in the table opposite.
Interaction: Scroll over log(λ):
Log rate
Interaction: Scroll over 1.12:
Baseline log rate (i.e. log rate in non/ex-smokers who are aged 50-64 years
old)
Interaction: Scroll over 0.53 x1:
Log RR for smoking
Interaction: Scroll over 0.94 x2:
Log RR for age
Interaction: Tabs: Interaction:
Multiplicative model with an interaction:

log(λ) = 1.02 + 0.71 x1 + 1.16 x2 - 0.37 x1x2

λ = exp(1.02 + 0.71 x1 + 1.16 x2 - 0.37 x1x2)
  = exp(1.02) × exp(0.71 x1) × exp(1.16 x2) × exp(-0.37 x1x2)
  = exp(1.02) × exp(0.71 x1) × exp(1.16 x2) / exp(0.37 x1x2)
From this we can obtain the fitted mortality rates as shown in the table opposite.
Interaction: Scroll over: 1.02:
Log rate in the baseline group, i.e. in non/ex-smokers aged 50-64 years old
Interaction: Scroll over: 0.71 x1:
Log RR for smoking in the baseline group of age, i.e. in individuals aged 50-64 years old
Interaction: Scroll over: 1.16 x2:
Log RR for age in the baseline group of smoking, i.e. in individuals who are
non/ex-smokers
Interaction: Scroll over: - 0.37x1x2:
Log RR interaction
Fitted mortality rates estimated from the multiplicative model with no
interaction:
                          Age group (x2)
Smoking (x1)              50-64 years (=0)     65-74 years (=1)
None / ex-smoker (=0)     3.06                 7.85
Current smoker (=1)       5.21                 13.33

Interaction: Button: 3.06 (pop up box appears):
exp(1.12) = 3.06
Interaction: Button: 7.85 (pop up box appears):
exp(1.12) × exp(0) × exp(0.94) = exp(2.06) = 7.85
Interaction: Button: 5.21 (pop up box appears):
exp(1.12) × exp(0.53) × exp(0) = exp(1.65) = 5.21
Interaction: Button: 13.33 (pop up box appears):
exp(1.12) × exp(0.53) × exp(0.94) = exp(2.59) = 13.33
Model:
log(λ) = 1.12 + 0.53 smoking + 0.94 age
λ = exp(1.12) × exp(0.53 smoking) × exp(0.94 age)


Interaction: Button: Explanation (pop up box appears):
Explanation
In this model we assume the effects of smoking and age combine multiplicatively (no
interaction means that effects combine multiplicatively).
The estimated mortality rate for someone in the baseline group of both variables
(a non/ex-smoker aged 50-64 years old) is exp(1.12) = 3.06.
The estimated mortality rate for someone exposed to both (a current smoker aged
65-74 years old) is obtained by multiplying together the baseline rate, the effect
of smoking and the effect of age: exp(1.12) × exp(0.53) × exp(0.94) = 13.33.

Fitted mortality rates estimated from the multiplicative model with
interaction:
                          Age group (x2)
Smoking (x1)              50-64 years (=0)     65-74 years (=1)
None / ex-smoker (=0)     2.77                 8.85
Current smoker (=1)       5.64                 12.43


Interaction: Button: 2.77 (pop up box appears):
exp(1.02) = 2.77
Interaction: Button: 8.85 (pop up box appears):
exp(1.02) × exp(1.16) = exp(2.18) = 8.85
Interaction: Button: 5.64:
exp(1.02) × exp(0.71) = exp(1.73) = 5.64
Interaction: Button: 12.43:
exp(1.02) × exp(0.71) × exp(1.16) × exp(-0.37) = exp(2.52) = 12.43
Model:
log(λ) = 1.02 + 0.71 smoking + 1.16 age - 0.37 smoking.age
λ = exp(1.02) × exp(0.71 x1) × exp(1.16 x2) × exp(-0.37 x1x2)



Interaction: Button: Explanation (pop up box appears):
Explanation
In this model the effects of smoking and age do not combine multiplicatively.
Including the interaction term has dealt with the problem of non-multiplicative
effects. The fitted mortality rate for a non/ex-smoker aged 50-64 years old is 2.77.
The IHD mortality rate for someone exposed to both smoking and age is obtained by
multiplying together the baseline rate, the effect of smoking, and the effect of age,
and then dividing by the interaction term (since the interaction term, on the log
scale, is negative):
exp(1.02) × exp(0.71) × exp(1.16) / exp(0.37) = 12.43.
An interaction term different to zero suggests departures from a multiplicative model.
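
The fitted rates can also be computed from the unrounded coefficients in the model output rather than the 2-decimal-place values used in the worked calculations. The Python sketch below does this for both multiplicative models; the dictionary keys and function name are illustrative. With full-precision coefficients the fitted values differ slightly from the worked values, and the model with interaction agrees with the observed rates to within rounding.

```python
import math

# Coefficients (log scale) as reported in the Poisson model output.
no_interaction = {"const": 1.1178, "smoking": 0.5345, "age": 0.9426, "smoking.age": 0.0}
with_interaction = {"const": 1.0163, "smoking": 0.7066, "age": 1.1556, "smoking.age": -0.3674}

# Observed rates per 1000 person-years, keyed by (smoking, age group).
observed = {(0, 0): 2.76, (0, 1): 8.78, (1, 0): 5.60, (1, 1): 12.32}


def fitted_rate(coef, x1, x2):
    """Fitted rate from a multiplicative (log-linear) model for smoking x1 and age x2."""
    log_rate = (coef["const"] + coef["smoking"] * x1 + coef["age"] * x2
                + coef["smoking.age"] * x1 * x2)
    return math.exp(log_rate)


fitted_no_int = {cell: fitted_rate(no_interaction, *cell) for cell in observed}
fitted_int = {cell: fitted_rate(with_interaction, *cell) for cell in observed}

for cell in sorted(observed):
    print(cell, "observed:", observed[cell],
          "no interaction:", round(fitted_no_int[cell], 2),
          "with interaction:", round(fitted_int[cell], 2))
```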


4.7: Additive models


The meaning of interaction on the multiplicative scale
Interaction: Tab 1:
The sign (positive or negative) and size of an interaction coefficient may be used to
assess whether multiplicative effects are consistent with the data. The following
interpretation is applicable provided that (i) the RR for x1 in the baseline group of
x2 and (ii) the RR for x2 in the baseline group of x1 are either both greater than one
or both less than one, i.e. that these RRs are in the same direction in terms of
whether they are above or below one.
Interaction: Tab 2:
In our example, the RR for age in the baseline group of smoking is >1 (it is
exp(1.1556) = 3.18), and the RR for smoking in the baseline group of age is also >1
(it is exp(0.7066) = 2.03), so we can interpret the interaction coefficient in the
following way.
If these conditions are true (as they are in our example), then:
An interaction coefficient close to zero suggests that the effects of age and smoking
combine multiplicatively.
A large negative interaction coefficient suggests less than multiplicative effects
(additive, perhaps).
A large positive interaction coefficient suggests greater than multiplicative effects.
In the multiplicative model for smoking and age that includes interaction, the
interaction term is negative (-0.3674) and there is strong evidence that it is not zero
(p=0.002), possibly suggestive of additive effects.

4.8: Additive models


Fitting an additive rate model
Now let's look at the additive rate models with and without interaction between age
and smoking. Use the button to swap between the tables.
Comparison of the models using a likelihood ratio test gives P = 0.47. Is there
evidence of interaction in an additive model?
Interaction: Button: clouds picture (pop up box appears):
A P-value of 0.47 shows data are consistent with no interaction in the additive
model. Note that the P-value for the smoking.age interaction term in the table (from
the Wald test) also gives P = 0.47.
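
The Wald P-value mentioned above comes from dividing the interaction coefficient by its standard error and referring the result to a standard normal distribution. The Python sketch below uses the coefficient (0.7053) and standard error (0.9747) reported for the smoking.age term in the additive model with interaction; the function name is illustrative.

```python
import math


def wald_test(coefficient, standard_error):
    """Wald z statistic and two-sided P-value from a standard normal distribution."""
    z = coefficient / standard_error
    # Two-sided tail area of the standard normal: P = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


wald_z, wald_p = wald_test(0.7053, 0.9747)
print("z:", round(wald_z, 2))
print("P-value:", round(wald_p, 3))
```

Wald and likelihood ratio tests need not agree exactly in general; here they give the same P-value to 2 decimal places.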
Interaction: Button: Swap (table on bottom centre changes to the following):

Additive rate model with interaction

               Coefficient   Standard Error   z        P > |z|    95% Confidence Interval
Smoking         2.8380       0.3395            8.36    < 0.001     2.173     3.503
Age             6.0120       0.5905           10.18    < 0.001     4.855     7.160
Smoking.Age     0.7053       0.9747            0.72      0.469    -1.205     2.616
Constant        2.7630       0.1783           15.49    < 0.001     2.413     3.113
Log likelihood = -1225.701

Additive rate model without interaction

               Coefficient   Standard Error   z        P > |z|    95% Confidence Interval
Smoking         2.9246       0.3189            9.17    < 0.001     2.300     3.550
Age             6.2789       0.4712           13.33    < 0.001     5.355     7.202
Constant        2.7394       0.1746           15.69    < 0.001     2.397     3.082
Log likelihood = -1225.964

In the model without interaction, what is the rate difference for smoking, to 2
decimal places?
RD (Smoking) =
In the model with interaction, what is the rate difference for the additional joint
effect of smoking and age, to 2 decimal places?
RD (Smoking.Age) =
Interaction: Calculation: RD (Smoking) =____:
Correct Response 2.92 (pop up box appears):
Correct
That's right, the rate difference is the coefficient for smoking in the table:
RD = 2.92
Incorrect Response (pop up box appears):
Sorry, the rate difference is the coefficient for smoking in the table since the
outcome is modelled on the original scale:


RD = 2.92
Interaction: Calculation: RD (Smoking.Age) =____:
Correct Response 0.71 (pop up box appears):
Correct
Yes, the rate difference is the coefficient for smoking.age in the table:
RD = 0.71
Incorrect Response (pop up box appears):
Sorry, the rate difference is the coefficient for smoking.age in the table since the
outcome is modelled on the original scale:
RD = 0.71

4.9: Additive models


The two additive rate models can be summarised as shown on the tabs below. Move
the cursor over each term of the equation to see what it means.
Interaction: Tabs: No interaction::
Additive rate model with no interaction:

λ = 2.74 + 2.92 x1 + 6.28 x2


The corresponding mortality rates are given in the table opposite.
Interaction: Scroll over: λ:
Rate
Interaction: Scroll over 2.74:
Baseline rate
Interaction: Scroll over: 2.92 x1:
Rate difference for smokers
Interaction: Scroll over: 6.28 x2:
Rate difference for age
Interaction: Tabs: Interaction:

Additive model with an interaction:

λ = 2.76 + 2.84 x1 + 6.01 x2 + 0.71 x1x2


The corresponding mortality rates are given in the table opposite.
Interaction: Scroll over: λ:
Rate
Interaction: Scroll over: 2.76:
Baseline rate (i.e. rate in non/ex-smokers aged 50-64 years old)
Interaction: Scroll over: 2.84 x1:
Rate difference for smokers in the baseline group of age (50-64 years old)
Interaction: Scroll over: 6.01 x2:
Rate difference for age in the baseline group of smoking (non/ex-smokers)
Interaction: Scroll over: 0.71x1x2:
Interaction - the "additional" effect of smoking in age group 65-74 years old
(compared to the effect of smoking in individuals aged 50-64 years old),
and the "additional" effect of being aged 65-74 years old in smokers
(compared to the effect of age in non/ex-smokers). Note that although the
word "additional" is used here, the interaction term can be either negative
or positive.
Fitted mortality rates estimated from the additive rate model with no
interaction:
                          Age group (x2)
Smoking (x1)              50-64 years (=0)        65-74 years (=1)
None / ex-smoker (=0)     2.74                    2.74 + 6.28 = 9.02
Current smoker (=1)       2.74 + 2.92 = 5.66      2.74 + 2.92 + 6.28 = 11.94

Model:
λ = 2.74 + 2.92 smoking + 6.28 age
Interaction: Button: Explanation (pop up box appears):
Explanation


In this example, no interaction term has been added to the additive model, so the
effects of smoking and age are assumed to combine additively.
The fitted mortality rate for an individual aged 50-64 years old who is a non/ex-smoker is 2.74.
The fitted mortality rate for an individual aged 65-74 years old who is a smoker is
obtained by adding together the baseline rate, the effect of smoking and the effect
of age: 2.74 + 2.92 + 6.28 = 11.94.
Fitted mortality rates estimated from the additive rate model with
interaction:
                          Age group (x2)
Smoking (x1)              50-64 years (=0)        65-74 years (=1)
None / ex-smoker (=0)     2.76                    2.76 + 6.01 = 8.77
Current smoker (=1)       2.76 + 2.84 = 5.60      2.76 + 2.84 + 6.01 + 0.71 = 12.32

Model:
λ = 2.76 + 2.84 smoking + 6.01 age + 0.71 smoking.age
Interaction: Button: Explanation (pop up box appears):
In this model the effects of smoking and age do not combine additively. The fitted
mortality rate for a non/ex-smoker aged 50-64 years old is 2.76. The estimated
mortality rate for an individual aged 65-74 years old who is a smoker is: 2.76 + 2.84
+ 6.01 + 0.71 = 12.32.
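
The same fitted rates can be obtained from the unrounded coefficients in the additive model output. The Python sketch below does this for both additive models and compares them with the observed rates; the dictionary keys and function name are illustrative.

```python
# Coefficients (rates per 1000 person-years) as reported in the additive model output.
no_interaction = {"const": 2.7394, "smoking": 2.9246, "age": 6.2789, "smoking.age": 0.0}
with_interaction = {"const": 2.7630, "smoking": 2.8380, "age": 6.0120, "smoking.age": 0.7053}

# Observed rates per 1000 person-years, keyed by (smoking, age group).
observed = {(0, 0): 2.76, (0, 1): 8.78, (1, 0): 5.60, (1, 1): 12.32}


def fitted_rate(coef, x1, x2):
    """Fitted rate from an additive rate model for smoking x1 and age group x2."""
    return (coef["const"] + coef["smoking"] * x1 + coef["age"] * x2
            + coef["smoking.age"] * x1 * x2)


fitted_no_int = {cell: fitted_rate(no_interaction, *cell) for cell in observed}
fitted_int = {cell: fitted_rate(with_interaction, *cell) for cell in observed}

for cell in sorted(observed):
    print(cell, "observed:", observed[cell],
          "no interaction:", round(fitted_no_int[cell], 2),
          "with interaction:", round(fitted_int[cell], 2))
```

No exponentiation is needed because the model is fitted on the original scale of the rate.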

4.10: Additive models


The meaning of interaction on the additive scale
The sign (positive or negative) and size of an interaction coefficient may be used to
assess whether additive effects are consistent with the data. The following
interpretation is applicable provided that (i) the rate difference for x1 in the
baseline group of x2 and (ii) the rate difference for x2 in the baseline group of x1
are either both greater than zero or both less than zero, i.e. that these rate
differences are in the same direction in terms of whether they are above or below
zero.


In our example, both rate differences are greater than zero so we can interpret the
interaction coefficient in the following way.
An interaction coefficient close to zero suggests additive effects.
A large negative interaction coefficient suggests less than additive effects.
A large positive interaction coefficient suggests greater than additive effects
(multiplicative, perhaps).
In the additive model with interaction opposite, the interaction term is small (0.71)
and there is no evidence it is different to zero (p=0.47), suggesting additive effects.

4.11: Additive models


Five further points on interaction in additive or multiplicative models should be
noted:

If the effects of the exposures combine multiplicatively then they cannot combine
additively, and vice versa. However, in reality it may not be possible to
distinguish between the models
No interaction on the multiplicative scale means there is an interaction on the
additive scale, although it might not be reflected by P-values from the hypothesis
tests. Hence, when reporting interaction results, it is important to specify the
scale e.g. there is (no) heterogeneity of rate ratios or there is (no)
heterogeneity of rate differences.
Interaction tests have low power and should be interpreted cautiously - large pvalues suggest data are compatible with no interaction but may mean not
enough power to detect an interaction.
Conducting lots of tests for interaction may lead to evidence for one or more
interactions just by chance.
The fitted values of the IHD mortality rates for the above additive model with an
interaction (2.76, 8.77, 5.60, 12.32) are exactly equal to the observed data. The
fitted values for the multiplicative model with an interaction (2.77, 8.85, 5.64,
12.43) are different to the observed data, but only because of rounding error
(because in the calculations we worked with all values to only 2 decimal places).
The fitted values from a model will always be exactly the same as the observed
data when there are as many parameters in the model as there are data points
(the models with interactions each have 4 parameters, which are estimated from
the 4 observed rates in the data).
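
The first two points can be illustrated with a small calculation. The sketch below (Python, with illustrative names) takes the baseline rate and the two single-exposure rates from the fitted values above and works out the joint rate predicted under no interaction on each scale. It assumes that 2.76 is the rate with neither exposure and that 8.77 and 5.60 are the rates in the two singly exposed groups.

```python
# Rates taken from the fitted values quoted above (rounded to 2 decimal places).
# Assumption: 2.76 is the unexposed rate; 8.77 and 5.60 are the singly exposed rates.
rate_neither = 2.76
rate_first_only = 8.77
rate_second_only = 5.60

# Rate ratio for each exposure on its own.
rr_first = rate_first_only / rate_neither
rr_second = rate_second_only / rate_neither

# Joint rate predicted if the rate ratios multiply (no multiplicative interaction).
joint_multiplicative = rate_neither * rr_first * rr_second

# Joint rate predicted if the rate differences add (no additive interaction).
joint_additive = rate_first_only + rate_second_only - rate_neither

# The two predictions differ by baseline * (RR1 - 1) * (RR2 - 1).
gap = joint_multiplicative - joint_additive
gap_from_identity = rate_neither * (rr_first - 1) * (rr_second - 1)

print(f"Predicted joint rate, multiplicative scale: {joint_multiplicative:.2f}")
print(f"Predicted joint rate, additive scale:       {joint_additive:.2f}")
print(f"Gap: {gap:.2f} (identity gives {gap_from_identity:.2f})")
```

The gap is zero only when at least one of the rate ratios equals 1. So when both exposures have an effect, no interaction on one scale implies an interaction on the other.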

Section 5: Choosing between multiplicative and additive models


We want the model to be simple (few parameters) but to provide a good fit to the
data (the fitted values to be similar to the observed data).
The more parameters that are included in the model, the better the fit in general.
When there are as many parameters as data points (combinations of explanatory
variables), the model has a perfect fit (e.g. the above multiplicative and additive
models with an interaction have a perfect fit).
However, remember that models that have as many parameters as data points are
not in general useful (the example here is an exception, as we have so few data
points). With more covariates (say, 5 or more), such models are
unnecessarily complicated. We would like to have a model that is "as simple as
possible, but no simpler".
In this example, we would either fit
a multiplicative model with an interaction (since there was evidence against the
null hypothesis of no interaction), or,
an additive model without an interaction (as the data were compatible with the
null hypothesis of no interaction).
The additive model is preferable, based on statistical considerations alone, because it
describes the data with fewer parameters (3 rather than 4).

5.1: Choosing between multiplicative and additive models


It is also useful to look at the expected number of deaths from the multiplicative and
additive models (with no interaction) and compare to the observed number of
deaths.
Expected deaths from multiplicative and additive models without interaction

Smoking   Age   D: observed   Expected:              Expected:
                              multiplicative model   additive model
0         0     240           265.63                 237.95
0         1     243           217.37                 249.74
1         0     376           350.37                 380.23
1         1     293           318.63                 284.07

The expected numbers of deaths are closer to the observed numbers of deaths for the
additive model than for the multiplicative model, suggesting that the additive
model with no interaction fits the data better than the multiplicative model with no
interaction.
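
One way to summarise this comparison in a single number per model is the Pearson statistic, the sum over the four groups of (observed - expected)^2 / expected. This statistic is not used in the session itself. It is offered here as an illustrative sketch in Python, using the observed and expected deaths given above; the function and variable names are hypothetical.

```python
# Observed and expected deaths for the four smoking/age groups, in the order
# (0,0), (0,1), (1,0), (1,1), as given in this session.
observed = [240, 243, 376, 293]
expected_multiplicative = [265.63, 217.37, 350.37, 318.63]
expected_additive = [237.95, 249.74, 380.23, 284.07]


def pearson_statistic(observed_counts, expected_counts):
    """Sum of (observed - expected)^2 / expected over all groups."""
    return sum(
        (obs - exp) ** 2 / exp
        for obs, exp in zip(observed_counts, expected_counts)
    )


pearson_multiplicative = pearson_statistic(observed, expected_multiplicative)
pearson_additive = pearson_statistic(observed, expected_additive)

print(f"Multiplicative model, no interaction: {pearson_multiplicative:.2f}")
print(f"Additive model, no interaction:       {pearson_additive:.2f}")
```

A smaller value means the expected deaths are closer to the observed deaths. Here the statistic is about 9.4 for the multiplicative model and about 0.5 for the additive model, in line with the comparison made by eye.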

5.2: Choosing between multiplicative and additive models


Our preference is for a model that is as simple as possible while providing an
adequate fit to the data. Hence, we might avoid models containing interaction terms,
since these generally complicate interpretation and presentation of the results.
This suggests a general strategy of trying both multiplicative and additive models
(without interaction terms), and choosing whichever provides the better fit, judged
by the test for interaction on each scale and by comparing the expected numbers of
deaths (from the models with no interaction) with the observed numbers of deaths.
There are several caveats, however:
Unless there are very large amounts of data, it is often difficult to distinguish
between the fit of the alternative models, because interaction tests have low
power.
Multiplicative models have a number of desirable mathematical properties which
make them easier to work with, and this is one of the reasons why they are much
more commonly used. Additive models tend to have convergence problems and
therefore they generally take longer to fit (sometimes they fail to converge).
Also, the Wald-based confidence intervals and P-values from additive models
can be misleading.

5.3: Choosing between multiplicative and additive models


It is often impossible to distinguish clearly on purely statistical criteria between the
fit of additive and multiplicative models. Despite this, the implications of these
differing models, for example, for the fitted effects of various combinations of the
risk factors, can be very different. It is important, therefore, to use any information
we have about the biological mode of action of the exposures to select an
appropriate model formulation.
Biological considerations
For two independent exposures for which the pathways of disease causation are
separate, the effects are more likely to be additive and an additive model may be
more appropriate. Where exposures have the same pathway, the effects are more
likely to multiply, in which case a multiplicative model is more appropriate.
Example 1
Transfusion of contaminated blood and sexual exposure to an infected partner are
two independent pathways to HIV infection. You might expect the effect of these two
exposures to combine additively.
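
A simple probability argument, sketched below in Python, shows why independent pathways lead to roughly additive risks. The risks used are hypothetical and chosen only for illustration. The calculation assumes that the two pathways act independently and that a person becomes infected if either pathway succeeds.

```python
# Hypothetical risks of infection through each pathway alone (not from the source).
risk_transfusion = 0.010
risk_sexual = 0.005

# With independent pathways, infection is avoided only if both pathways fail.
risk_both = 1 - (1 - risk_transfusion) * (1 - risk_sexual)

# Compare the joint risk with the simple sum of the two separate risks.
risk_sum = risk_transfusion + risk_sexual
shortfall = risk_sum - risk_both  # equals the product of the two risks

print(f"Joint risk:        {risk_both:.5f}")
print(f"Sum of risks:      {risk_sum:.5f}")
print(f"Difference (p1*p2): {shortfall:.5f}")
```

When both risks are small, the product term is negligible and the joint risk is close to the sum of the separate risks, which is the additive pattern described in this example.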
Example 2
Condom use and number of sexual partners relate to the same pathway of infection.
You could assume that the effect of not using a condom is to multiply the risk
associated with the level of sexual exposure (the multiplier will be greater than
one, since not using a condom is expected to increase the risk of infection).
Example 3
A more complex example is that of multistage models of carcinogenesis. These have
helped explain the way in which incidence rates of a cancer are related to age and to
the level and duration of various exposures. In its simplest form, the model assumes
that cells are initially "normal" (Stage 0), and may then undergo a series of
transitions through stages 1, 2, etc., with each transition occurring with low
probability for any individual cell. If and when it undergoes the kth transition (to
stage k), the cell undergoes malignant replication and the cancer occurs. Data for
many cancers appear consistent with this model (often k=5 or 6). Exposures to risk
factors are assumed to have their effect by increasing the rate of transition at one or
more stages of the process. Data on the effects of a particular exposure on risk are
used to make inferences about the stage or stages at which that exposure has its
effect. For the joint effect of two exposures, the general conclusion is that if they act
at the same stage their effects can be expected to combine additively, while if they
act at different stages their effects can be expected to combine multiplicatively.
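
This conclusion can be illustrated with a toy calculation in Python. It assumes that the cancer rate at a given age is proportional to the product of the k transition rates, and that each exposure adds a fixed proportional increase to the transition rate of the stage it acts on. The number of stages, the stages affected, and the sizes of the increases are all hypothetical.

```python
from math import prod


def rate_ratio(increase_by_stage, n_stages=5):
    """Rate ratio versus no exposure, assuming the cancer rate is proportional
    to the product of the transition rates. `increase_by_stage` maps a stage
    number to the total proportional increase in that stage's transition rate."""
    return prod(
        1 + increase_by_stage.get(stage, 0.0)
        for stage in range(1, n_stages + 1)
    )


increase_a = 2.0  # hypothetical: exposure A triples the transition rate it acts on
increase_b = 1.0  # hypothetical: exposure B doubles the transition rate it acts on

# Case 1: A acts on stage 1 and B acts on stage 4 (different stages).
rr_a = rate_ratio({1: increase_a})
rr_b_stage4 = rate_ratio({4: increase_b})
rr_both_different = rate_ratio({1: increase_a, 4: increase_b})

# Case 2: A and B both act on stage 1, so their increases add on that one rate.
rr_b_stage1 = rate_ratio({1: increase_b})
rr_both_same = rate_ratio({1: increase_a + increase_b})

print(f"Different stages: joint RR {rr_both_different}, product {rr_a * rr_b_stage4}")
print(f"Same stage: joint excess {rr_both_same - 1}, "
      f"sum of excesses {(rr_a - 1) + (rr_b_stage1 - 1)}")
```

Under these assumptions, exposures acting at different stages give a joint rate ratio equal to the product of the separate rate ratios, while exposures acting at the same stage give excess rate ratios that add.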

Section 6: Summary
Multiplicative Models:
Ratios are useful for studying aetiology
Many effects combine multiplicatively
Models usually converge
Models easily fitted in standard software
Additive Models:
Differences are useful public health measures
Some effects known to combine additively
Models sometimes do not converge
