Академический Документы
Профессиональный Документы
Культура Документы
article
info
Keywords:
Delphi method
Financial markets
Expert judgment
Belief perseverance
Overconfidence
Conditional forecasting
abstract
Experts were used as Delphi panellists and asked to present forecasts on financial market
variables in a controlled experiment. We found that the respondents with the least accurate
or least conventional views were particularly likely to modify their answers. Most of
these modifications were in the right direction but too small, probably because of beliefperseverance bias. This paper also presents two post-survey adjustment methods for
Delphi method based forecasts. First, we present a potential method to correct for the belief
perseverance bias. The results seem promising. Secondly, we test a conditional forecasting
process, which unexpectedly proves unsuccessful.
2013 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.
1. Introduction
The Delphi method was introduced at the RAND Corporation in the 1950s. It aims to maintain the advantages of
an interacting group without potentially counterproductive group dynamics, such as dominant individuals who
may not be the best experts.
In short, the traditional version of the method is based
on a multi-round survey. Respondents are asked to answer
a number of questions in writing. Answering is anonymous; other respondents do not know who answered
what. In most cases the answers are numeric estimates,
ratings on a scale, or yes/no. Often, the respondents also
have the opportunity to write comments on the issues
raised in the questionnaire. Statistics on answers and the
related comments are subsequently distributed to the respondents, but this information is anonymous and no
respondent can identify who answered what. Each respondent is allowed to modify his own answers, and possibly to
add more comments.
After a few rounds, some convergence in answers is normally observed due to a group opinion building process,
leading to less variance in the answers and more agreement within the panel. The number of rounds can be either
predetermined or dependent on the criteria of consensus
and stability. In the papers reviewed by Rowe and Wright
(1999), the number of respondents in Delphi panels varied
from 3 to 98. Ideally, the respondents will all be experts in
the same field, but with somewhat different backgrounds.
Linstone and Turoff (2011) emphasised the role of
communication in judgemental forecasting, and argued
that the Internet will have a major impact on the way
in which comparable methods will be used in the future,
since the number of potential participants will be much
larger than in traditional Delphi panels. In recent years,
some studies have used the real-time version of the
method (see Gordon & Pease, 2006), which is normally an
online application that allows respondents to modify their
answers at any time, up until the end of the answering
process (for the validation of the method, see Gnatzy,
Warth, von der Gracht, & Darkow, 2011).
The final answer of the group is defined as the mean
or median of the individual answers. In many cases, even
the questions to be answered are proposed and selected
by group members themselves before the first answering
0169-2070/$ see front matter 2013 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.ijforecast.2013.09.007
314
315
heavily in the final conclusions (Sherman, Zehner, Johnson, & Hirt, 1983). Testees have been observed to stick with
intentionally fed misinformation even when told that the
so-called research results were pure fiction (Anderson,
Lepper, & Ross, 1980). Lord, Ross, and Lepper (1979) found
that people may even interpret the same new information
differently and systematically in favour of their own previous beliefs. More psychological literature on this phenomenon was reviewed in detail by Nickerson (1998). The
potential elimination of this bias in meta-analysis has been
discussed briefly by Rabin and Schrag (1999). This psychological phenomenon may even have fundamental macroeconomic consequences by affecting inflation and output
dynamics (Adam, 2007).
We also tested two post-correction methods.
The first method is derived directly from Hypothesis 4.
If respondents typically modify their answers in the right
direction but by too little, one can obtain better forecasts
by scaling up modifications in individual forecasts ex post.
The second method can be used when new information
on causally relevant background variables becomes available. Because our panellists were qualified professionals in
finance and economics, we assumed that they would understand how the variables to be forecast depend on the
interest rate. We therefore expected that a substantial portion of the forecast errors would be due to unexpected
shocks to the market interest rate, so that eliminating this
error would lead to much more accurate forecasting.
3. The experimental design
3.1. Preparation of the experiment
We organised a regimented experiment with two similar expert panels. The middle management of the Bank
of Finland2 (BoF) and the Financial Supervisory Authority
of Finland (FSA) showed interest in experimental research
on forecasting, possibly because these organisations routinely produce different kinds of financial and macroeconomic forecasts. We were thus able to arrange the
experiment as an in-house project in which experts were
asked to forecast developments in financial markets. In order to get the full benefits, the Delphi method requires a
sufficiently large number of panellists. According to Rowe
and Wright (2001) and references therein, ten persons
should be sufficient; there is no evidence that increasing
the size of the panel beyond 710 persons improves the
accuracy. In order to have two similar panels of equal size,
we needed 20 participants, which we were able to obtain.
This was probably close to the maximum of the scarce human resources that we would have been able to recruit. The
respondents were asked to participate by their direct supervisors, but no one was forced to take part.
Most previous experiments (such as those of Bolger,
Stranieri, Wright, & Yearwood, 2011; Graefe & Armstrong,
2011; and Parent et al., 1984) have used non-experts as
respondents. There may be practical reasons for the use of
individuals such as students in experimental research instead of experienced professionals. However, in real life,
2 The Bank of Finland is the national central bank.
316
non-specialists are seldom relied on when significant policy and business decisions are to be made. The results
obtained from panels consisting of first-year students
answering questions related to headline news may not extend to expert groups (see for example Rowe et al., 1991).
In our panels, all of the respondents were experts in finance, working either for the FSA or for the BoFs financial markets department. Each respondent professionally
monitors one or more sectors of the domestic and international financial markets. With one exception, each participant has had several years of experience in the field. All of
them have at least an M.Sc. degree in a relevant field, such
as economics, finance, law or business administration, and
many have doctorates or licentiate degrees.
According to previous studies (Hussler, Muller, &
Rond, 2011, p. 1650; Rowe & Wright, 1996), experts seem
to behave differently from non-experts in Delphi surveys;
at least, they are less likely to modify their answers between rounds. One of the few forecasting experiments
which has been undertaken using experienced professionals in finance was carried out in Germany in the 1970s;
differences in the accuracies of Delphi panels and FTF
meetings were minor (Brockhoff et al., 1975). A recent contribution was presented by Song et al. (2013), who examined the way in which forecasts based on statistical models
could be improved by Delphi panels of experts, but did not
test different ways of deriving the collective subjective assessment of a group.
Our twenty experts were separated randomly into two
groups. One group answered questions on Finnish financial
markets in a Delphi setting, while the other group arranged
an FTF meeting to answer the same questions.
The expert groups, initially named Group One (later FTF
group) and Group Two (later Delphi group), were formed
randomly, except that both FSA employees and BoF employees were divided evenly between the two groups.
Moreover, there were specific gender quotas; both groups
originally included two women and eight men. However,
in Group One, which eventually ended up being the FTF
group, one male participant was replaced by a female. This
replacement took place before the start of the experiment.
Most of the respondents knew each other before the experiment, and had worked together on numerous occasions. All participants were informed of the outcome of the
grouping before the first forecasting round.
Even though answering was basically anonymous in
the Delphi group, we were able to follow the development of individual answers. Each respondent was asked
to choose a pseudonym that would not reveal her identity and not to disclose the pseudonym to anyone. These
pseudonyms ranged from humorous to poetic, and from
meaningless character strings to enigmatic descriptions of
the respondent. The respondents were asked to write the
pseudonym on each questionnaire to be returned during
the process. This method enabled us to follow the development of individuals assessments, while largely retaining
the anonymity requirement of the Delphi method. Needless to say, a respondent may be somehow emotionally involved with his pseudonym.
317
318
j,p,n =
|xj,p,n yj |
StDev(xj,1 )
(1)
where i,p,n is the standardized penalty points of respondent or group p, for question j in round n; xj,p,n is
the forecast; yj is the actual outcome; StDev(xj,1 ) is the
319
Table 1
Numbers of respondents who thought that the variable would be particularly sensitive to market rates and variables for conditional forecasting.
j
Question
Selected
1
2
3
4
5
6
7
8
9
10
11
12
13
14
Total
Group 2
Delphi
Group 1
FTF
Total
1
1
0
4
5
8
0
3
2
0
8
1
1
7
2
2
2
5
3
8
0
2
1
0
7
0
0
7
3
3
2
9
8
16
0
5
3
0
15
1
1
14
41
39
80
Table 2
Penalty points by question.
j
Question
Group 1; round 1;
FTF; change in
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0.98
1.08
21.09
0.69
3.27
0.14
0.99
2.35
0.86
6.63
0.38
4.56
0.80
0.85
2.05
0.80
0.34
22.14
0.09
3.59
0.11
0.47
2.22
0.80
6.84
0.41
4.36
0.67
0.62
3.06
0.17
0.74
3.10
0.01
Average
3.12
1.05
0.60
0.32
0.03
0.52
0.14
0.06
0.21
0.03
0.20
0.13
0.23
1.01
Question
Group 2; round 1;
Delphi; change in
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
1.14
1.40
21.649
0.87
3.30
0.16
1.21
2.14
0.41
7.34
0.36
4.12
0.45
0.38
1.99
1.01
0.99
21.648
0.65
3.14
0.13
0.87
2.18
0.38
7.22
0.41
4.12
0.41
0.22
1.69
0.13
0.40
0.001
0.22
0.16
0.03
0.34
3.13
3.00
0.12
Average
0.04
0.02
0.12
0.06
0.01
0.04
0.16
0.30
320
Table 3
Correlations between group forecast improvements and paying attention
to others answers.
Delphi group (N = 15)
Basic correlations
Attention paid to others
numeric answers
Attention paid to others
verbal comments
Change in penalty points
First-order partial
correlations
Attention paid to others
numeric answers
Attention paid to others
verbal comments
Change in penalty points
NumForec
VerbCom
1.000
0.647
0.674***
1.000
***
error
0.641**
0.527**
0.641**
0.527**
1.000
NumForec
VerbCom
error
1.000
0.723***
0.650**
0.723***
1.000
0.643**
0.643**
1.000
0.650**
Notes. We control for the rank order of the original forecast error, where
1 = worst error and 15 = least serious error.
10% significance.
**
5% significance.
***
1% significance.
0.680
j,p,1
OPPOS
j,p,1 OPPOS
2
t-value
R
Mean dep var
F-stat
0.200
0.230
Coeff
***
0.670
4.7***
3.4***
0.197
0.229
0.002
5.1
5.3
0.529
0.530
0.177
5.107
4.873
Constant
OPPOS
j,p,1 OPPOS
McFadden R2
LR-stat
***
10% significance.
5% significance.
***
1% significance.
OPPOSj,p
xj,p,1 1 9 xj,i,1
9 i=1
=
,
StDev(xj,1 )
Coeff
j,p,1
4.6***
3.4***
1.1
0.177
***
Table 5
Probability of changing the forecast as a function of the initial forecast
error and disagreement with others.
t-value
***
(2)
321
3.247
0.637
1.842
z-value
Coeff
z-value
***
3.112
2.8***
1.7*
3.0***
0.571
1.854
0.046
1.6
2.8***
1.0
2.7
0.330
61.01***
0.335
61.847***
5% significance.
***
1% significance.
Table 6
Change in penalty points in cases where the forecast changed.
Constant
j,p,1
OPPOS
j,p,1 OPPOS
R2
Mean dep var
F-stat
Coeff
z-value
Coeff
z-value
0.975
0.330
0.127
4.8***
5.8***
1.4
0.939
0.327
0.129
0.001
5.4***
1.4
0.5
0.684
0.33
4.23***
4.8***
0.684
0.330
3.98***
10% significance.
5% significance.
***
1% significance.
322
(3)
p9=1 xj,p
9
(4)
j,Group2 =
|xj,Group2 yj |
StDev(xj,1 )
(5)
323
Table 7
Penalty points of invariance-corrected Delphi forecasts relative to original penalty points.
j
Question
FTF
Delphi
=2
=3
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0.804
0.342
22.138
0.087
3.589
0.111
0.469
2.217
0.798
6.840
0.415
4.357
0.670
0.620
3.059
1.011
0.992
21.648
0.652
3.138
0.131
0.868
2.179
0.384
7.216
0.415
4.115
0.414
0.218
1.746
0.967
0.565
21.643
0.466
2.965
0.103
0.588
2.174
0.311
7.139
0.545
4.042
0.407
0.154
1.611
0.923
0.139
21.637
0.281
2.791
0.075
0.308
2.168
0.238
7.062
0.676
3.968
0.400
0.090
1.475
3.101
3.008
2.912
1.5
2.8**
2.815
2.3**
2.8**
Question
=4
=5
=6
=7
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0.879
0.287
21.632
0.095
2.618
0.047
0.028
2.163
0.166
6.985
0.806
3.894
0.392
0.025
1.340
0.835
0.713
21.627
0.090
2.444
0.018
0.252
2.158
0.093
6.908
0.937
3.821
0.385
0.039
1.205
0.791
1.139
21.621
0.276
2.271
0.010
0.532
2.152
0.020
6.831
1.068
3.747
0.377
0.104
1.069
0.747
1.566
21.616
0.462
2.098
0.038
0.812
2.147
0.052
6.754
1.198
3.673
0.370
0.168
0.934
2.757
2.6**
3.1***
2.768
2.1**
3.1***
2.801
1.6
2.3*
2.842
1.2
1.4
Even though the outcome is consistent with the previous psychological literature, more research is needed
to corroborate these findings. One possibility is that this
method works if, and only if, the panel consists of experts,
as they may be particularly unwilling to admit to themselves that their round 1 forecasts were wildly inaccurate, making them under-react to any influences. At least,
the findings of Hussler et al. (2011) indicate that a nonspecialist would be more inclined to adopt a new answer.
Note that it might be better not to tell the respondents
about this adapting technique until after they have returned the last questionnaires, because knowing this might
affect their answering behaviour.
6.2. Conditional forecasts
The forecast errors of the seven variables believed to
be interest rate-sensitive (see Section 3.5) comprise two
components.
(6)
(7)
324
Table 8
Penalty points of invariance-corrected Delphi method forecasts, for different values of .
j
Question
=1
= 1.1
Varying
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
1.001
0.992
21.648
0.652
3.138
0.131
0.868
2.179
0.384
7.216
0.415
4.115
0.4143
0.218
1.746
1.006
0.949
21.647
0.634
3.121
0.128
0.840
2.178
0.376
7.208
0.428
4.108
0.4135
0.212
1.733
0.875
0.509
21.630
0.077
2.600
0.036
0.115
2.161
0.158
6.977
0.873
3.887
0.389
0.019
1.287
3.008
2.999
2.773
Note: In the last column, each forecast is calculated using a question-specific value of which is optimised for data on the other 14 questions.
Table 9
Components of final forecast errors in the penalty points.
j
1
4
5
6
8
11
14
j
1
4
5
6
8
11
14
Group 2Delphi
Question
Big banks loan losses
M2 deposits
Interest in corporate loans
Net interest income of small banks
Households mutual fund shares
Mean deposit rate of term deposits
Stock of housing loans
Group 1FTF
Question
Big banks loan losses
M2 deposits
Interest in corporate loans
Net interest income of small banks
Households mutual fund shares
Mean deposit rate of term deposits
Stock of housing loans
| + |/StDev
| |/StDev
||/StDev
1.011
0.652
3.138
0.131
2.179
0.415
0.218
0.837
1.093
3.306
0.088
2.376
1.617
0.162
0.174
0.441
0.168
0.219
0.197
1.202
0.380
| + |/StDev
| |/StDev
||/StDev
0.804
0.087
3.589
0.111
2.217
0.415
0.620
0.768
0.728
3.946
0.135
2.724
1.997
0.495
0.036
0.642
0.357
0.246
0.507
1.582
0.125
(| | | + |)/StDev
0.174
0.441
0.168
0.043
0.197
1.202
0.057
(| | | + |)/StDev
0.036
0.642
0.357
0.024
0.507
1.582
0.125
StDev is the standard deviation of the first round answers, as in Eq. (1).
where xj |y15 is the forecast which the individual respondent or FTF meeting group would presumably have given
if the December market rates had been known beforehand,
and y15 is the interest rate outcome for December 2011.
The interest rate-corrected forecast of the Delphi group is
defined as the average of individual forecasts. The effect
of the forecast error caused by unexpected changes in the
market rate is
(8)
The error that cannot be explained by unexpected developments in interest rates is denoted j , implying
yj = xj |x15 + j + j .
(9)
325
326
Table 10
Answers to the questions.
j
Question
Actual realisation
First round
Group 2 (Delphi)
mean
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Question
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
mEUR
mEUR
mEUR
bEUR
%
mEUR
mEUR
mEUR
mEUR
bEUR
%
%
%
bEUR
% p.a.
mEUR
mEUR
mEUR
bEUR
%
mEUR
mEUR
mEUR
mEUR
bEUR
%
%
%
bEUR
% p.a.
224.7
137 898
2311
119.504
9
137.9
66 502
10 945
4815
100.6
2.17
13
80.71
80.055
1.947
347.25
163 545
51.0
115.460
51.3
140.4
62 643
13 586
4936
43.1
2.12
56.5
79.42
79.597
2.44
Actual realisation
224.7
137 898
2311
119.504
9
137.9
66 502
10 945
4815
100.6
2.17
13
80.71
80.055
1.947
Group 1 (FTF)
mean
329.7
157 744
109.3
116.300
50.9
140.2
63 340
13 850
5072
48.6
2.12
61.1
78.40
79.023
2.46
StDev of all 20
answers
107.5
18.37
104.4
4.633
12.82
16.423
3.202
1234
298.29
7.832
0.1447
10.56
2.867
1.218
0.2494
Group 2 Delphi
mean
Adapted, = 4
mean
Group 1 FTF
333.33
156 111
51.1
116.5
49.2
140.0
63 722
13 633
4929
44.1
2.11*
56.4
79.52
79.789
2.368
319.2
132 627*
52.8*
119.06
42.6*
138.6*
66 413*
13 613*
4864*
45.9
2.05
54.1*
79.59*
80.024*
2.281*
311.1*
144 174
0.0
119.10*
55
139.7
65 000
13 680
5053
47*
2.11*
59
78.79
79.300
2.710
Note:
*
Indicates the best final forecast.
References
Adam, K. (2007). Experimental evidence on the persistence of output and
inflation. The Economic Journal, 117(520), 603636.
Anderson, C. A., Lepper, M. R., & Ross, L. (1980). Perseverance of social
theories: the role of explanation in the persistence of discredited
information. Journal of Personality and Social Psychology, 39(6),
10371049.
Ang, S., & OConnor, M. (1991). The effect of group interaction processes
on performance in time series extrapolation. International Journal of
Forecasting, 7(2), 141149.
Armstrong, J. S. (2006). Findings from evidence-based forecasting: methods for reducing forecast error. International Journal of Forecasting,
22(3), 583598.
Armstrong, J. S. (2007). Significance tests harm progress in forecasting.
International Journal of Forecasting, 23(2), 321327.
Basu, S., & Schroeder, R. G. (1977). Incorporating judgments in sales
forecasts: application of the Delphi method at American Hoist &
Derrick. Interfaces, 7(3), 1827.
Bolger, F., Stranieri, A., Wright, G., & Yearwood, J. (2011). Does the
Delphi process lead to increased accuracy in group-based judgmental
forecasts or does it simply induce consensus amongst forecasters?
Technological Forecasting and Social Change, 78(9), 16711680.
Brockhoff, K., Kaerger, D., & Rehder, H. (1975). The performance of
forecasting groups in computer dialogue and face-to-face discussion.
In H. A. Linstone, & M. Turoff (Eds.), The Delphi method, techniques
and applications (pp. 291321). Massachusetts: Addison-Wesley
Publishing Company.
Dalkey, N.C. (1968). Experiments in group prediction. Working paper of the
RAND Corporation.
Ecken, P., Gnatzy, T., & von der Gracht, H. A. (2011). Desirability bias in
foresight: consequences for decision quality based on Delphi results.
Technological Forecasting and Social Change, 78(9), 16541670.
Fellner, G., & Krgel, S. (2012). Judgmental overconfidence: three
measures, one bias? Journal of Economic Psychology, 33(1), 142154.
Gnatzy, T., Warth, J., von der Gracht, H., & Darkow, I.-L. (2011). Validating
an innovative real-time Delphi approacha methodological comparison between real-time and conventional Delphi studies. Technological
Forecasting and Social Change, 78(9), 16811694.
Gordon, T., & Pease, A. (2006). RT Delphi: an efficient, round-less almost
real time Delphi method. Technological Forecasting and Social Change,
73(4), 321333.
Graefe, A., & Armstrong, J. S. (2011). Comparing face-to-face meetings,
nominal groups, Delphi and prediction markets on an estimation task.
International Journal of Forecasting, 27(1), 183195.
Helmer, O. (1964). Convergence of expert consensus through feedback.
RAND corporation.
Hussler, C., Muller, P., & Rond, P. (2011). Is diversity in Delphi panellist
groups useful? Evidence from a French forecasting exercise on the
future of nuclear energy. Technological Forecasting and Social Change,
78(9), 16421653.
Kerr, N. L., & Tindale, R. S. (2011). Group based forecasting? A social
psychological analysis. International Journal of Forecasting, 27(1),
1440.
Linstone, H. A., & Turoff, M. (Eds.) (1975). The Delphi method, techniques
and applications. Massachusetts: Addison-Wesley Publishing Company.
Linstone, H. A., & Turoff, M. (2011). Delphi: a brief look backward
and forward. Technological Forecasting and Social Change, 78(9),
17121719.
327
Song, H., Gao, B. Z., & Lin, V. S. (2013). Combining statistical and
judgemental forecasts via a web-based tourism demand forecasting
system. International Journal of Forecasting, 29(2), 295310.
Turoff, M. (1972). An alternative approach to cross impact analysis.
Technological Forecasting and Social Change, 3(3), 309339.
Tversky, A., & Koehler, D. J. (1994). Support theorya nonextensional
representation of subjective probability. Psychological Review, 101(4),
547567.
von der Gracht, H. A. (2012). Consensus measurement in Delphi studies;
Review and implications for future quality assurance. Technological
Forecasting and Social Change, 79(8), 15251536.
Waggoner, D. F., & Zha, T. (1999). Conditional forecasts in dynamic
multivariate models. The Review of Economics and Statistics, 81(4),
639651.
Woudenberg, F. (1991). An evaluation of Delphi. Technological Forecasting
and Social Change, 40(2), 131150.