
This document contains most of the SON-R 2½-7 Manual and Research Report.
Not included are chapter 12 (Directions per subtest), chapter 13 (The record
form, norm tables and computer program) and the appendices.
The reference for this text is:
Tellegen, P.J., Winkel, M., Wijnberg-Williams, B.J. & Laros, J.A. (1998).
Snijders-Oomen Nonverbal Intelligence Test. SON-R 2½-7 Manual and Research
Report. Lisse: Swets & Zeitlinger B.V.
This English manual is a translation of the Dutch manual, published in 1998 (SON-R
2½-7 Handleiding en Verantwoording). The German translation was also published
in 1998 (SON-R 2½-7 Manual). In 2007 a German manual was published with
German norms (SON-R 2½-7 Non-verbaler Intelligenztest. Testmanual mit deutscher
Normierung und Validierung).

Translation by Johanna Noordam


ISBN 90 265 1534 0
Since 2003, the SON-tests have been published by Hogrefe Verlag, Göttingen, Germany.
1998, 2009 Publisher: Hogrefe, Authors: Peter J. Tellegen & Jacob A. Laros
http://www.hogrefe.de
E-mail: verlag@hogrefe.de

Rohnsweg 25, 37085 Göttingen, Germany

CONTENTS

Foreword

PART I: THE CONSTRUCTION OF THE SON-R 2½-7

1. Introduction
1.1 Characteristics of the SON-R 2½-7
1.2 History of the SON-tests
1.3 Rationale for the revision of the Preschool SON
1.4 Phases of the research
1.5 Organization of the manual

2. Preparatory study and construction research
2.1 The preparatory study
2.2 The construction research

3. Description of the SON-R 2½-7
3.1 The subtests
3.2 Reasoning tests, spatial tests and performance tests
3.3 Characteristics of the administration

4. Standardization of the test scores
4.1 Design and realization of the research
4.2 Composition of the normgroup
4.3 The standardization model
4.4 The scaled scores

5. Psychometric characteristics
5.1 Distribution characteristics of the scores
5.2 Reliability and generalizability
5.3 Relationships between the subtest scores
5.4 Principal components analysis
5.5 Stability of the test scores

PART II: VALIDITY RESEARCH

6. Relationships with other variables
6.1 Duration of test administration
6.2 Time of test administration
6.3 Examiner influence
6.4 Regional and local differences
6.5 Differences between boys and girls
6.6 SES level of the parents
6.7 Parents' country of birth
6.8 Evaluation by the examiner
6.9 Evaluation by the teacher

7. Research on special groups
7.1 Composition of the groups
7.2 The test scores of the groups
7.3 Relationship with background variables
7.4 Diagnostic data
7.5 Evaluation by the examiner
7.6 Evaluation by institute or school staff
7.7 Examiner effects
7.8 Psychometric characteristics

8. Immigrant children
8.1 The test results of immigrant children
8.2 Relationship with the SES level
8.3 Differentiation according to country of birth
8.4 Comparison with other tests
8.5 The test performances of children participating in OPSTAP(JE)

9. Relationship with cognitive tests
9.1 Correlation with cognitive tests in the standardization research
9.2 Correlation with nonverbal tests in primary education
9.3 Correlation with cognitive tests at OVB-schools
9.4 Correlation with cognitive tests in special groups
9.5 Correlation with the WPPSI-R in Australia
9.6 Correlation with cognitive tests in West Virginia, USA
9.7 Correlation with the BAS in Great Britain
9.8 Overview of the correlations with the criterion tests
9.9 Difference in correlations between the Performance Scale and the Reasoning Scale
9.10 Difference in mean scores on the tests
9.11 Comparisons in relation to external criteria

PART III: THE USE OF THE TEST

10. Implications of the research for clinical situations
10.1 The objectives of the revision
10.2 The validity of the test
10.3 The target groups
10.4 The interpretation of the scores
10.5 Conclusions

11. General directions
11.1 Preparation
11.2 Directions and feedback
11.3 Scoring the items
11.4 The adaptive procedure
11.5 The subtest score
11.6 Adapting the directions

12. Directions per subtest
12.1 Mosaics
12.2 Categories
12.3 Puzzles
12.4 Analogies
12.5 Situations
12.6 Patterns

13. The record form, norm tables and computer program
13.1 The use of the record form
13.2 The use of the norm tables
13.3 The use of the computer program
13.4 Statistical comparisons

References

Appendix A Norm tables
Appendix B The record form
Appendix C The file SONR2.DAT
Appendix D Contents of the test kit

TABLES AND FIGURES IN THE TEXT

Introduction
Table 1.1 Overview of the versions of the SON-tests

Pilot study and construction research
Table 2.1 Relationship between the subtests of the Preschool SON and the SON-R 2½-7
Table 2.2 Origin of the items

Description of the SON-R 2½-7
Table 3.1 Tasks in the subtests of the SON-R 2½-7
Figure 3.1 Items from the subtest Mosaics
Figure 3.2 Items from the subtest Categories
Figure 3.3 Items from the subtest Puzzles
Figure 3.4 Items from the subtest Analogies
Figure 3.5 Items from the subtest Situations
Figure 3.6 Items from the subtest Patterns
Table 3.2 Classification of the subtests

Standardization of the test scores
Table 4.1 Composition of the norm group according to age, sex and phase of research
Table 4.2 Demographic characteristics of the norm group in comparison with the Dutch population
Table 4.3 Education and country of birth of the mother in the weighted and unweighted norm group

Psychometric characteristics
Table 5.1 P-value of the items
Figure 5.1 Plot of the discrimination and difficulty parameter of the items
Table 5.2 Mean and standard deviation of the raw scores
Table 5.3 Distribution characteristics of the standardized scores in the weighted norm group
Table 5.4 Floor and ceiling effects at different ages
Table 5.5 Reliability, standard error of measurement and generalizability of the test scores
Table 5.6 Reliability and generalizability of the IQ score of the Preschool SON, the SON-R 2½-7 and the SON-R 5½-17
Table 5.7 Correlations between the subtests
Table 5.8 Correlations of the subtests with the rest total score and the square of the multiple correlations
Table 5.9 Results of the Principal Components Analysis in the various age and research groups
Table 5.10 Test-retest results with the SON-R 2½-7
Table 5.11 Examples of test scores from repeated test administrations

Relationships with other variables
Table 6.1 Duration of the test administration
Table 6.2 Relationship of the IQ scores with the time of administration
Table 6.3 Examiner effects
Table 6.4 Regional and local differences
Table 6.5 Relationship of the test scores with sex
Table 6.6 Relationship of the IQ score with the occupational and educational level of the parents
Table 6.7 Relationship of the IQ score with the SES level
Table 6.8 Relationship between IQ and country of birth of the parents
Table 6.9 Relationship between evaluation by the examiner and the IQ
Table 6.10 Correlations of the total scores with the evaluation by the teacher
Table 6.11 Correlations of the subtest scores with the evaluation by the teacher

Research on special groups
Table 7.1 Subdivision of the research groups
Table 7.2 Composition of the research groups
Table 7.3 Test scores per group
Figure 7.1 Distribution of the 80% frequency interval of the IQ scores of the various groups
Table 7.4 Relationship of the IQ scores with background variables
Table 7.5 Reasons for referral of children at schools for Special Education and Medical Daycare Centers for preschoolers, with mean IQ scores
Table 7.6 Relationship between IQ and evaluation by the examiner
Table 7.7 Correlations between test scores and evaluation by institute or school staff member
Table 7.8 Correlations between the subtests and subtest-rest correlations

Immigrant children
Table 8.1 Test scores of native Dutch children, immigrant children and children of mixed parentage
Table 8.2 Relationship between group, SES level and IQ
Table 8.3 Differentiation of mean IQ scores according to country of birth
Table 8.4 Mean IQ scores of Surinam, Turkish and Moroccan children who had participated in the OPSTAP(JE) project

Relationship with cognitive tests
Table 9.1 Overview of the criterion tests used and the number of children to whom each test was administered
Table 9.2 Characteristics of the children to whom a criterion test was administered in the standardization research
Table 9.3 Correlations with other tests in the standardization research
Table 9.4 Correlations with nonverbal cognitive tests in the second year of kindergarten, 5 to 6 years of age

Table 9.5 Correlations with cognitive tests completed by children at low SES schools given educational priority
Table 9.6 Characteristics of the children in the special groups to whom a criterion test was administered
Table 9.7 Correlations with criterion tests in the special groups
Table 9.8 Correlations with the WPPSI-R in Australia
Table 9.9 Age and sex distribution of the children in the American validation research
Table 9.10 Correlations with criterion tests in the American research
Table 9.11 Correlations with the BAS in Great Britain
Table 9.12 Overview of the correlations with the criterion tests
Table 9.13 Difference in scores between SON-IQ and PIQ of the WPPSI-R
Table 9.14 Correlations of the Performance Scale and the Reasoning Scale with criterion tests, for cases in which the difference between correlations was greater than .10
Table 9.15 Comparison between the mean test scores of the SON-R 2½-7 and the criterion tests
Table 9.16 Comparisons between tests of the evaluation of the subject's testability
Table 9.17 Comparisons between tests in relation to socioeconomic and ethnic background
Table 9.18 Comparisons between tests in relation to evaluation of intelligence and language skills

Implications of the research for clinical situations
Table 10.1 Mean change in IQ score over a period of one month
Figure 10.1 The components of the variance of the SON-R 2½-7 IQ score
Table 10.2 Classification of IQ scores and intelligence levels
Table 10.3 Composition of the variance when several tests are administered
Table 10.4 Correction of mean IQ score based on administration of two or three tests
Table 10.5 Obsolescence of the norms of the SON-IQ

Record form, norm tables and computer program
Table 13.1 Examples of the calculation of the subject's age
Figure 13.1 Diagram of the working of the computer program
Table 13.2 Comparison between the possibilities using the computer program and using the norm tables
Table 13.3 Examples of probability and reliability intervals for various scores

FOREWORD

Nan Snijders-Oomen (1916-1992)

Jan Snijders (1910-1997)

The publication of the SON-R 2½-7 completes the third revision of the Snijders-Oomen Nonverbal Intelligence Tests. Over a period of fifty years Nan Snijders-Oomen and Jan Snijders
were responsible for the publication of the SON tests. We feel honored to be continuing their
work. They were interested in this revision and supported us with advice until their death.
The present authors played different roles in the production of this test and the manual. Peter
Tellegen, as project manager, was responsible for the revision of the test and supervised the
research. Marjolijn Winkel made a large contribution to all phases of the project in the context of
her PhD research. Her thesis on the revision of the test will be published at the end of 1998. Jaap
Laros, at present working at the University of Brasilia, participated in the construction of the
subtests, in particular Mosaics and Analogies. Barbara Wijnberg-Williams made a large contribution, based on her experience as a practicing psychologist at the University Hospital of
Groningen, to the manner in which the test can be administered nonverbally to children with
communicative handicaps.
The research was carried out at the department for Personality and Educational Psychology of
the University of Groningen. Wim Hofstee, head of the department, supervised the project.
Jannie van den Akker and Christine Boersma made an important contribution to the organization of the research.
The research was made financially possible by a subsidy from SVO, the Institute for Educational Research (project 0408), by a subsidy from the Foundation for Behavioral Sciences, a section
of the Netherlands Organization for Scientific Research (NWO-project 575-67-033), and by
contributions from the SON research fund. Wolters-Noordhoff, who previously published the
SON-tests, made an important contribution to the development of the testing materials. The
drawings for the subtests Categories, Puzzles and Situations were made by Anjo Mutsaars. The
figures for the subtest Patterns were executed by Govert Sips of the graphical design agency
Sips. Wouter Veeman from Studio van Stralen executed the subtests Mosaics and Analogies.
The construction of a test requires a large number of subjects, for the construction research as
well as for the standardization and the validation. In the last few years, more than three thousand
children were tested with the SON-R 2½-7 in the framework of the research. We are greatly
indebted to them, as well as to their parents and the staff members of the schools and institutes
where the research was carried out.
In the Netherlands, as well as in Australia, Great Britain and the United States of America, many
students, researchers, and practicing psychologists and orthopedagogic specialists contributed
to the research. Thanks to their enthusiasm and involvement, the research could be carried out
on such a large and international scale. Without claiming to be comprehensive, we would like to
mention the following people by name:
Margreet Altena, Rachida El Baroudi, Cornalieke van Beek, Wynie van den Berg,
M. van den Besselaar, Marleen Betten, Marjan Bleckman, Nico Bollen, Rene Bos,
Ellen Bouwer, Monique Braat, C. Braspenning, Marcel Broesterhuizen, Karen Brok,
Ankie Bronsveld, Aletha Brouwer, Anne Brouwer, Sonja Brouwer, Lucia Burnett,
Mary Chaney, Janet Cooper, Pernette le Coultre-Martin, Richard Cress, J. van Daal,
Shirley Dennehy, M. van Deventer, Dorrit Dickhout-Kuiper, Julie Dockrell,
Nynke Driesens, Petra van Driesum, Marcia van Eldik, Marielle Elsjan, Yvonne Eshuis,
Arnoud van Gaal, Judith Gould, Marian van Grinsven, Nicola Grove,
Renate Grovenstein, Marije Harsta, R.G. den Hartog, Leida van der Heide,
Roel van der Helm, Marlou Heppenstrijdt, Valerie Hero, Sini Holm,
Marjan Hoohenkerk, E.P.A. Hopster, Jacqueline ten Horn, Jeannet Houwing,
Hans Hster, Jo Jenkinson, Jacky de Jong, Myra de Jong, Anne Marie de Jonge,
Jos Kamminga, Jennifer Kampsnider, Claudine Kempa, Debby Kleymeer, Jeanet
Koekkoek, Marianne van de Kooi, Annette Koopman, Monique Koster, A.M. Kraal,
Marijke Kuiper, Koosje Kuperus, Marijke Künzli-van der Kolk, Judith Landman,
Nan Le Large, Del Lawhon, J. van Lith-Petry, Jan Litjens, Amy Louden,
Henk Lutje Spelberg, Mannie McClelland, Sanne Meeder, Anke van der Meijde,
Jacqueline Meijer, Sjoeke van der Meulen, Bieuwe van der Meulen, Jitty Miedema,
Margriet Modderman, Cristal Moore, Marsha Morgan, Renate Mulder,
Marian Nienhuis-Katz, F. Nietzen, Theo van Noort, Stephen O'Keefe,
Jamila Ouladali, Mary Garcia de Paredes, Inge Paro, Immelie Peeters, Jo Pelzer,
Simone Peper, Trudy Peters-ten Have, Dorothy Peterson, Mirea Raaijmakers,
Lieke Rasker, Inge Rekveld, Lucienne Remmers, E.J. van Rijn van Alkemade,
Susan Roberts, Christa de Rover, Peter van de Sande, A.J. van Santen,
Liesbeth Schlichting, Marijn Schoemaker, Ietske Siemann, Margreet Sjouw,
Emma Smid, L. Smits, Tom Snijders, Marieke Snippe, P. Steeksma, Han Starren,
Lilian van Straten, Penny Swan, Dorine Swartberg, Marjolein Thilleman,
Lous Thobokholt-van Esch, Jane Turner, Dick Ufkes, Baukje Veenstra,
Nettie van der Veen, Marja Veerman, Carla Vegter, Pytsje Veltman, Harriet Vermeer,
Mieke van Vleuten, Jeroen Wensink, Betty Wesdorp-Uytenbogaart, Jantien Wiersma,
Aranka Wijnands, G.J.M. van Woerden, Emine Yildiz and Anneke Zijp.
With the publication of this Manual and Research Report of the SON-R 2½-7, an important
phase of the revision of the test comes to an end. This does not mean that the test is finished.
The value of a test is determined, for a large part, by diagnostic experiences and by ongoing
research. We are, therefore, interested in the experiences of users, and we would appreciate
being informed of their research results when these become available as internal or external
publications. We intend to inform users and other interested parties about the developments and
further research with the SON tests via Internet. The address of the homepage will be:
www.ppsw.rug.nl/hi/tests/sonr.
In recent years the need to carry out diagnostic research on children at a young age has greatly
increased. Furthermore, the realization has grown that the more traditional intelligence tests are
less suitable for important groups of children because they do not take sufficient account of the
limitations of these children, or of their cultural background. In these situations the SON tests
are frequently used. We hope that this new version of the test will also contribute to reliable and
valid diagnostic research with young children.

Groningen, January 1998

Dr. Peter Tellegen

Heymans Institute
University of Groningen
Grote Kruisstraat 2/1
9712 TS Groningen
The Netherlands
tel. +31 50 363 6353
fax +31 50 363 6304
e-mail: p.j.tellegen@rug.nl
http://www.testresearch.nl

Reviewing of the SON-R 2½-7
The test has been reviewed by the COTAN, the test commission of the Netherlands
Institute for Psychologists. The categories used are insufficient, sufficient and good.
The rating is as follows:
Basics of the construction of the test: good
Execution of the materials: good
Execution of the manual: good
Norms: good
Reliability: good
Construct validity: good
Criterion validity: good

INTRODUCTION

The new version of the Snijders-Oomen Nonverbal Intelligence Test for children from two-and-a-half to seven years, the SON-R 2½-7, is an instrument that can be individually administered to
young children for diagnostic purposes. The test makes a broad assessment of mental functioning possible without being dependent upon language skills.

1.1 CHARACTERISTICS OF THE SON-R 2½-7
The SON-R 2½-7, like the previous version of the test, the SON 2½-7 (Snijders & Snijders-Oomen, 1976), provides a standardized assessment of intelligence. The child's scores on six
different subtests are combined to form an intelligence score that represents the child's ability
relative to his or her age group. Separate norm tables allow total scores to be calculated for the
performance tasks and for the tasks mainly requiring reasoning ability.
A distinctive feature of the SON-R 2½-7 is that feedback is given during administration of
the test. After the child has given an answer, the examiner tells the child whether it is correct or
incorrect. If the answer is incorrect, the examiner demonstrates the correct answer. When
possible, the correction is made together with the child. The detailed directions provided in the
manual also make the test suitable for the assessment of very young children. In general, the
examiner demonstrates the first items of each subtest in part or in full. Examples are included in
the test directions and items.
The items on the subtests of the SON-R 2½-7 are arranged in order of increasing difficulty.
This makes it possible to use a procedure for determining a starting point appropriate to the age and ability of each
individual child. By using the starting point and following the rules for discontinuing the test, the administration time is limited to fifty to sixty minutes.
The test can be administered nonverbally or with verbal directions. The spoken text does not
give extra information. The manner of administration can thus be adapted to the communication
ability of each individual child, allowing the test to proceed as naturally as possible.
Because the test can be administered without the use of written or spoken language, it is
especially suitable for use with children who are handicapped in the areas of communication
and language. For the same reason it is also suitable for immigrant children who have little or no
command of the language of the examiner.
The testing materials do not need to be translated, making the test suitable for international
and cross-cultural research. The SON-tests are used in various countries. The names of the
various subtests are shown on the test booklets in the following languages: English, German,
Dutch, French, and Spanish. The manual has been published in English and German as well as
in Dutch.
A similarity between the SON-R 2½-7 and other intelligence tests for (young) children, such
as the BAS (Elliott, Murray & Pearson, 1979-82), the K-ABC (Kaufman & Kaufman, 1983), the
RAKIT (Bleichrodt, Drenth, Zaal & Resing, 1984) and the WPPSI-R (Wechsler, 1989), is that
intelligence is assessed on the basis of performance on a number of quite diverse tasks. However, verbal test items are not included in the SON-R 2½-7. Such items are often dependent to a
great extent on knowledge and experience. The SON-R 2½-7 can therefore be expected to be
focused more on the measurement of fluid intelligence and less on the measurement of crystallized intelligence (Cattell, 1971) than are the other tests.


The subtests of the SON-R 2½-7 differ from the nonverbal subtests in other intelligence tests in
two important ways. First, the nonverbal part of other tests is generally limited to typical
performance tests. The SON-R 2½-7, however, includes reasoning tasks that take a verbal form
in the other tests. Second, while the testing material of the performance part of the other tests is
admittedly nonverbal, the directions are given verbally (Tellegen, 1993).
An important difference with regard to other nonverbal intelligence tests such as the CPM
(Raven, 1962) and the TONI-2 (Brown, Sherbenou & Johnsen, 1990) is that the latter tests
consist of only one item-set and are therefore greatly dependent on the specific ability that is
measured by that test. Nonverbal intelligence tests such as the CTONI (Hammill, Pearson &
Wiederholt, 1996) and the UNIT (Bracken & McCallum, 1998) consist of various subtests, like
the SON-R 2½-7. A fundamental difference, however, is that the directions for these tests are
given exclusively with gestures, whereas the directions with the SON-R 2½-7 are intended to
create as natural a test situation as possible.
An important way in which the SON-R 2½-7 differs from all the above-mentioned tests is
that the child receives assistance and feedback if he or she cannot do the task. In this respect the
SON-R 2½-7 resembles tests for learning potential that determine to what extent the child
profits from the assistance offered (Tellegen & Laros, 1993a). The LEM (Hessels, 1993) is an
example of this kind of test.
In sum, the SON-R 2½-7 differs from other tests for young children in its combination of a
friendly approach to children (in the manner of administration and the attractiveness of the
materials), a large variation in abilities measured, and the possibility of testing intelligence
regardless of the level of language skill.

1.2 HISTORY OF THE SON-TESTS


The publication of the SON-R 2½-7 completes the third revision of the test battery that Nan
Snijders-Oomen started more than fifty years ago. In table 1.1 the earlier versions are shown
schematically.
The first version of the SON-test was intended for the assessment of cognitive functioning in
deaf children from four to fourteen years of age (Snijders-Oomen, 1943). Drawing on existing
and newly developed tasks, Snijders-Oomen developed a test battery which included an assortment of nonverbal tasks related to spatial ability and abstract and concrete reasoning. The test
was intended to provide a clear indication of the child's learning ability and chances of succeeding at school. One requirement for the test battery was that upbringing and education should
influence the test results as little as possible. Further, a variety of intellectual functions had to be
examined with the subtests, and the tasks had to interest the child to prevent him or her becoming bored or disinclined to continue.
No specific concept of intelligence was assumed as a basis for the test battery. However,
form, concrete coherence, abstraction and short-term memory were seen as acceptable
representations of intellectual functioning typical of subjects suffering from early deafness
(Snijders-Oomen, 1943). The aim of the test battery was to break through the one-sidedness of
the nonverbal performance tests in use at the time, and to make functions like abstraction,
symbolism, understanding of behavioral situations, and memory more accessible for nonverbal
testing.
The first revision of the test, the SON-58, was published in 1958 (Snijders & Snijders-Oomen, 1958). In this revision the test battery was expanded and standardized for hearing as
well as deaf children from four to sixteen years of age.
Two separate test batteries were developed during the second revision. The most important
reason for this was that, in all the subtests of the original SON, a different type of test item had
seemed more appropriate for children above six years of age. The division into two parts that already
existed in practice was implemented systematically in this second revision: the SSON (Starren,
1975) was designed for children from seven to seventeen years of age; for children from three to
seven years of age the SON 2½-7, commonly known as the Preschool SON or P-SON, was developed (Snijders & Snijders-Oomen, 1976).


Table 1.1
Overview of the Versions of the SON-Tests

Test                          Year    Authors                                        Standardized for             Age range
SON                           1943    Snijders-Oomen                                 Deaf children                4-14 years
SON-58                        1958    Snijders & Snijders-Oomen                      Deaf and hearing children    4-16 years
SON 2½-7 (Preschool SON)      1975    Snijders & Snijders-Oomen                      Hearing and deaf children    3-7 years
SSON                          1975    Starren                                        Hearing and deaf children    7-17 years
SON-R 2½-7                    1998    Tellegen, Winkel, Wijnberg-Williams & Laros    General norms                2;6-8;0 years
SON-R 5½-17                   1988    Snijders, Tellegen & Laros                     General norms                5;6-17;0 years

Note: for each test the table lists the year of publication of the Dutch manual, the authors of the
manual, and the group and the age range for which the test was standardized.

The form and contents of the SSON strongly resembled the SON-58, except that the SSON
consisted entirely of multiple choice tests. After the publication of the SSON in 1975, the SON-58 remained in production because it was still in demand. In comparison to the SSON, the
SON-58 contained more stimulating tasks and provided more opportunity for observation of
behavior, because it consisted of tests in which children were asked to manipulate a large variety
of test materials. The subtests in the Preschool SON maintained this kind of performance test to
provide opportunities for the observation of behavior.
The third revision of the test for older children, the SON-R 5½-17, was published in 1988
(Snijders, Tellegen & Laros, 1989; Laros & Tellegen, 1991; Tellegen & Laros, 1993b). This test
replaces both the SON-58 and the SSON, and is meant for use with hearing and deaf children
from five-and-a-half to seventeen years of age. In constructing the SON-R 5½-17 an effort was
made to combine the advantages of the SSON and the SON-58. On the one hand, a range of
diverse testing materials was included. On the other hand, a high degree of standardization in
the administration and scoring procedures as well as a high degree of reliability of the test was
achieved.
The SON-R 5½-17 is composed of abstract and concrete reasoning tests, spatial ability tests
and a perceptual test. A few of these tests are newly developed. A memory test was excluded
because memory can be examined better by a specific and comprehensive test battery than by a
single subtest. In the SON-R 5½-17, the standardization for the deaf is restricted to conversion
of the IQ score to a percentile score for the deaf population. The test uses an adaptive procedure
in which the items are arranged in parallel series. This way, fewer items that are either too easy
or too difficult are administered. Feedback is given in all subtests; this consists of indicating
whether a solution is correct or incorrect. The standardized scores are calculated and printed by
a computer program.
The SON-R 5½-17 has been reviewed by COTAN, the commission of the Netherlands Institute
for Psychologists responsible for the evaluation of tests. All aspects of the test (Basics of the
construction of the test, Execution of the manual and test materials, Norms, Reliability and
Validity) were judged to be good (Evers, Van Vliet-Mulder & Ter Laak, 1992). This means the
SON-R 5½-17 is considered to be among the most highly accredited tests in the Netherlands
(Sijtsma, 1993).
After the completion of the SON-R 5½-17, a revision of the Preschool SON was started, resulting in
the publication of the SON-R 2½-7. The test was published in 1996, together with a manual
consisting of the directions and the norm tables (Tellegen, Winkel & Wijnberg-Williams, 1997).
In the present Manual and Research report, the results of research done with the test are also
presented: the method of revision, the standardization and the psychometric characteristics, as
well as the research concerning the validity of the test. Norm tables allowing the calculation of
separate standardized total scores for the performance tests and the reasoning tests have been
added. Also, the reference age for the total score can be determined. Norms for experimental
usage have been added for the ages of 2;0 to 2;6 years. All standardized scores can easily be
calculated and printed using the computer program.

1.3 RATIONALE FOR THE REVISION OF THE PRESCHOOL SON


The most important reasons for revising the Preschool SON were the need to update the norms,
to modernize the test materials, to improve the reliability and generalizability of the test, and to
provide a good match with the early items of the SON-R 5½-17.

Updating the norms


The Preschool SON was published in 1975. After a period of more than 20 years, revision of an
intelligence test is advisable. Test norms tend to grow obsolete in the course of time. Research
shows (Lynn & Hampson, 1986; Flynn, 1987) that performance on intelligence tests increases
by two or three IQ points over a period of 10 years. Experience in the Netherlands with the
revision of the SON-R 5½-17 and the WISC-R is consistent with this (Harinck & Schoorl,
1987). Comparisons in the United States of scores on the WPPSI and WPPSI-R, and scores on
the WISC-R and WISC-III showed an average increase in the total IQ scores of more than three
points every ten years. The increase in the performance IQ was more than four points every ten
years (Wechsler, 1989, 1991).
Changes in the socio-economic environment may explain the increase in the level of performance on intelligence tests (Lynn & Hampson, 1986). Examples of these changes are watching
television, increase in leisure time, smaller families, higher general level of education, changes in
upbringing and education. The composition of the general population has also changed; in the
Netherlands the population is ageing and the number of immigrants is increasing. The norms of
the Preschool SON from 1975 can be expected to provide scores that are too high, and that no
longer represent the child's performance in comparison to his or her present age group.
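
The size of this effect can be estimated with a rough calculation. The sketch below is only an illustration: it assumes that the gain of two to three IQ points per ten years cited above accumulates linearly, and it takes twenty years as the age of the norms. The function name is hypothetical and not part of the test's computer program.

```python
# Illustrative sketch: expected overestimate of IQ when outdated norms are used.
# Assumption: the gain per decade reported in the literature accumulates linearly.

def expected_norm_inflation(years_since_norming, gain_per_decade):
    """Return the expected overestimate in IQ points."""
    return years_since_norming / 10 * gain_per_decade

years = 20  # assumed age of the 1975 norms at the time of the revision
low = expected_norm_inflation(years, 2.0)   # lower bound: two points per ten years
high = expected_norm_inflation(years, 3.0)  # upper bound: three points per ten years
print(f"Expected overestimate after {years} years: {low:.0f} to {high:.0f} IQ points")
```

Under these assumptions the 1975 norms would overrate children by roughly four to six IQ points.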

The testing materials


The rather old-fashioned testing materials were the second reason for revising the test: some of
the drawings used were very dated, and the increasing number of immigrant children in the
Netherlands over the last twenty years makes it desirable to reflect the multi-cultural background of potential subjects in the materials (see Hofstee, 1990). The structure of the materials
and the storing methods of the test were also in need of improvement.

Improving the reliability and generalizability


A third motive for revision was to improve the reliability and generalizability of the Preschool
SON, especially for the lower and upper age ranges. Analysis of the data presented in the manual
of the Preschool SON showed that the subtests differentiated too little at these ages. The range of
possible raw scores had a mean of 12 points. In the youngest age group, 20% of the children
received the lowest score on the subtests and in the oldest age group, 43% received the highest
score (Hofstee & Tellegen, 1991). In other words, the Preschool SON was appropriate for children of four or five years old, but it was often too difficult for younger children and too easy for
older children. Further, there was no standardization at the subtest level, only at the level of the
total score; this meant that it was not possible to calculate the IQ properly if a subtest had not been
administered. Finally, the norms were presented per age group of half a year. This could lead to a
deviation of six IQ points if the age did not correspond to the middle of the interval.

Correspondence with the SON-R 5½-17

To be able to compare the results of the SON-R 2½-7 with those of the SON-R 5½-17, the new
test for young children should be highly similar to the test for older children. An overlap in the
age ranges of the tests was also considered desirable. This way, the choice of a test can be based
on the level of the child, or on other specific characteristics that make one test more suitable
than the other. Various new characteristics of the SON-R 5½-17, such as the adaptive test
procedure, the standardization model and the use of a computer program, were implemented as
far as possible in the construction of the SON-R 2½-7.

1.4 PHASES OF THE RESEARCH


On the basis of the above-mentioned arguments it was decided to revise the Preschool SON. The
revision was not restricted to the construction of new norms; the items, subtests and directions
were also subjected to a thorough revision. The revision proceeded in several phases. This
section presents a short review of the research phases.

Preparatory study
In the preparatory study the Preschool SON was evaluated. This started in 1990. The aim of the
preparatory study was to decide how the testing materials of the Preschool SON could best be
adapted and expanded. To this end, users of the Preschool SON were interviewed, the literature
was reviewed, other intelligence tests were analyzed and a secondary analysis of the data of the
standardization research of the Preschool SON was performed.

Construction research phase


The construction research for the SON-R 2½-7 took place in 1991/92. During this period, three
experimental versions of the test were administered to more than 1850 children between two
and seven years of age. The final version of the SON-R 2½-7 was compiled on the basis of the
data from this research, the experiences and observations of examiners, and the comments and
suggestions of psychologists and educators active in the field.

Standardization research phase


The standardization research, in which more than 1100 children in the age range two to seven
years participated, took place during the school year 1993/94. The results of this research
formed the basis for the standardization of the SON-R 2½-7, and the evaluation of its psychometric characteristics. During the standardization research, background data relevant for the
interpretation of the test scores were collected.
For the validation of the test, other language and intelligence tests were administered to a
large number of the children who participated in the standardization research. Administration of
these tests was also made possible by collaboration with the project group that was responsible
for the standardization of the Reynell Test for Language Skills (Van Eldik, Schlichting, Lutje
Spelberg, Sj. van der Meulen & B.D. van der Meulen, 1995) and the Schlichting Test for
Language Production (Schlichting, Van Eldik, Lutje Spelberg, Sj. van der Meulen & B.F. van
der Meulen, 1995).


Validation research phase


Separate validation research was done for the following groups: children in special educational
programs, children at medical preschool daycare centers, children with a language, speech and/
or hearing disorder, deaf children, autistic children and immigrant children. Validation research
was also carried out in Australia, the United States of America and the United Kingdom. The
results of these children on the SON-R 2½-7 have been compared with their performance on
many other cognitive tests.

1.5 ORGANIZATION OF THE MANUAL


This manual is made up of three parts. In the first part the construction phase of the test is
discussed. Chapter 2 deals with the preparatory study and the construction research during
which new testing materials and administration procedures were developed. In chapter 3 a
description is given of the subtests and the main characteristics of the administration of the test.
The standardization research and the standardization model used are described in chapter 4.
Information about psychometric characteristics such as reliability, factor structure and stability
can be found in chapter 5.
In the second part research concerning the validity of the test is described. Chapter 6 is based
on the results in the norm group and discusses the relations between test performance and other
variables, such as socio-economic level, sex and evaluations by the examiner and teachers. In
chapter 7 the test results in a number of special groups of children, with whom the SON-tests are
often used, are discussed. The special groups include children with a developmental delay,
autistic children, language, speech and/or hearing disabled children, and deaf children. Chapter
8 deals with the performance of immigrant children. In chapter 9 the correlations between the
SON-R 2½-7 and several other tests for intelligence, language skills, memory and perception
are discussed. The research on validity involved both children in regular education and handicapped children, and was partly carried out in other countries.
The third part of this book concerns the practical application of the test. Chapter 10 deals
with the implications of the research results in practice, and with problems that can arise with
the interpretation of the results. The general directions for the administration and scoring of the
test are described in chapter 11; the directions for the separate subtests can be found in chapter
12. Chapter 13 gives guidelines for using the record form, the norm tables and the computer
program.
In the appendices the norm tables for determining the reference age, and the standardized
subtest and total scores can be found, as well as an example of the record form and a description
of the contents of the test kit.
In general, ages in the text and tables are presented in years and months. This means that 4;6
years equals four years and six months. In a few tables the mean ages are presented with a
decimal; this means that 4.5 years is the same as 4;6 years. In the norm tables the age of 4;6
years indicates an interval from four years, six months, zero days to four years, six months,
thirty days inclusive.
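
The age notation can be illustrated with a small conversion routine. This is a minimal sketch of the convention described above; the function names are hypothetical and are not taken from the computer program that accompanies the test.

```python
# Illustrative sketch of the "years;months" age notation used in this manual.

def parse_age(age):
    """Parse an age such as '4;6' into a (years, months) pair."""
    years_text, months_text = age.split(";")
    years, months = int(years_text), int(months_text)
    if not 0 <= months <= 11:
        raise ValueError("months must lie between 0 and 11")
    return years, months

def to_decimal_years(age):
    """Convert 'years;months' into years with a decimal fraction."""
    years, months = parse_age(age)
    return years + months / 12

print(parse_age("4;6"))         # four years and six months
print(to_decimal_years("4;6"))  # the same age written with a decimal
```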
To improve legibility, statistical results have been rounded off. This can lead to seemingly
incorrect results. For instance a distribution of 38.5% and 61.5% becomes, when rounded off,
39% and 62%, and this does not add up to 100%. Similar small differences may occur in the
presentation of differences in means or between correlations.
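
The rounding example can be reproduced as follows. The sketch assumes conventional rounding in which halves are rounded upward; note that Python's built-in round() rounds halves to the nearest even number and would turn 38.5 into 38, so the decimal module is used here instead.

```python
# Illustrative sketch: rounded percentages need not add up to 100%.
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value):
    """Round a decimal string to a whole number, rounding halves upward."""
    return int(Decimal(value).quantize(Decimal("1"), rounding=ROUND_HALF_UP))

shares = ["38.5", "61.5"]  # the distribution used as an example in the text
rounded = [round_half_up(share) for share in shares]
print(rounded, "sum:", sum(rounded))
```

The two rounded values are 39 and 62, which sum to 101 although the unrounded values sum to exactly 100.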
Pearson product-moment correlations were used in the analyses. Unless stated otherwise,
the correlations were tested one-sidedly.


PREPARATORY STUDY AND CONSTRUCTION RESEARCH

In this chapter, the test construction phase is described. In this phase, the research necessary to
construct a provisional version of the test was carried out. Successive improvements resulted in
the final test battery.

2.1 THE PREPARATORY STUDY


The preparatory study was carried out to discover how best to adapt and possibly to expand the
materials of the Preschool SON. To this end ten users of the Preschool SON were interviewed
about their experience with the test (via questionnaires). Secondary analyses were also carried
out on the original material from the standardization research of the Preschool SON. A review
of the literature and an analysis of other intelligence tests were undertaken as a preparation for
the revision (Tellegen, Wijnberg, Laros & Winkel, 1992).

Composition of the Preschool SON


The Preschool SON was composed of fifty items distributed over five subtests: Sorting, Mosaics, Combination, Memory and Copying. In the subtest Sorting, geometrical forms and pictures
were sorted according to the category to which they belong. The subtest Mosaics was an action
test in which various mosaic patterns had to be copied using red and yellow squares. Combination consisted of matching halves of pictures and doing puzzles. In the subtest Memory, also
called the Cat House, the aim was to find either one or two cats that were hidden several times in
the house. Copying consisted of copying figures that were drawn by the examiner or shown in a
test booklet.

Evaluation by users
An inventory of the comments received from ten users of the Preschool SON was made.
These were psychologists employed by school advisory services, audiological centers, institutes for the deaf, medical preschool daycare centers, and in the care for the mentally
deficient.
On the whole, the Preschool SON was given a positive assessment as a test to which children
respond well and that affords plenty of opportunity to observe the child's behavior. The users
did, however, have the impression that the IQ score of the Preschool SON overrated the level of
the children. Clear information about administering and scoring the various subtests was lacking in the manual. The users followed the directions accurately but not literally. Furthermore,
they thought the subtests contained too few examples. They were inclined to provide extra help,
especially to young and to mentally deficient children. The discontinuation criterion, used in the
Preschool SON, was three consecutive mistakes per subtest. This discontinuation rule was
considered too strict, particularly for the youngest children, and, in practice, this rule was not
always applied.
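
The discontinuation rule of the Preschool SON can be sketched as follows. This is only an illustration of the rule as described above (three consecutive mistakes per subtest); the function name and the representation of answers as True (correct) or False (mistake) are assumptions.

```python
# Illustrative sketch of a discontinuation rule based on consecutive mistakes.

def items_administered(responses, max_consecutive_errors=3):
    """Return how many items are given before the subtest is discontinued."""
    streak = 0  # current run of consecutive mistakes
    for count, correct in enumerate(responses, start=1):
        streak = 0 if correct else streak + 1
        if streak == max_consecutive_errors:
            return count  # discontinue after this item
    return len(responses)  # rule never triggered: all items are given

answers = [True, False, False, True, False, False, False, True]
print(items_administered(answers))
```

In this example the subtest stops after the seventh item, so the eighth item is never presented, even though the child would have solved it.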
The subtest Memory was administered in different ways. Some users administered it as a
game, playing a kind of hide and seek, whereas others tried to avoid doing this. The users had
the impression that this subtest was given too much weight in the total score of the Preschool
SON. Also, some doubt existed about the relationship between this subtest and the other ones.


Comparative research on the Preschool SON, the Stanford-Binet and parts of the WPPSI was
conducted by Harris in the United States of America. In general, her assessment of the test was
positive. Her criticism focused on some of the materials and the global norm tables (Harris, 1982).

Secondary analyses of the standardization data


The original data from a sample of hearing children (N=503) involved in the standardization
research of the Preschool SON was used for the secondary analyses. A study was made of the
distribution of the test scores according to age, the correlation between the test scores and the
reliability. The results were as follows:
- The standard deviation of the raw subtest scores was usually highest in children from four to
  five years of age. For Mosaics and Copying, the range of scores for young children from 2;6
  to 4 years was very restricted. For most subtests the range decreased greatly in the oldest
  groups from 5;6 to 7 years.
- In the conversion of the scores into IQ scores, the distributions were not sufficiently normalized,
  so that they were negatively skewed for children from five years onwards. This could
  result in extremely low IQ scores.
- The reliability for combinations of age groups was recalculated. After this, a correction for
  age was carried out. The mean reliability of the subtests was .57 for children from 2;6 to four
  years of age, .66 for children from four to five years, and .61 for children from 5;6 to seven
  years. The reliability of the total score was .78 for children from 2;6 to four years, .86 for
  children from four to five years, and .82 for children from 5;6 to seven years. Generally, the
  reliability was low, especially for the youngest and oldest age groups where strong floor and
  ceiling effects were present. The reliability of the subtests and the total scores was much
  lower than the values mentioned in the manual of the Preschool SON. The cause of this
  discrepancy was that, in the manual, the reliability was calculated for combined age groups
  with no correction for age.
- The generalizability of the total score is important for the interpretation of the IQ scores. In
  this case, the subtests are seen as random samples from the domain of possible, relevant
  subtests. The generalizability coefficient of the Preschool SON was .61 for the age group
  from 2;6 to four years, .75 for the age group from four to five years and .65 for the age group
  from 5;6 to seven years.
- The reliability of the subtest Memory was low and the score on this subtest showed a low
  correlation with age and with the scores on the remaining subtests.

Review of the literature


In the revision of the Preschool SON we attempted to produce a version that was compatible with
the early items of the SON-R 5½-17. As the subtest Analogies in the SON-R 5½-17 is one of its
strongest components, the possibility of developing a similar analogy test for young children was
examined. Based on recent research results (Alexander et al., 1989; Goswami, 1991) it seemed
possible to construct an analogy test for children from about 4 years of age onwards. Since an
analogy test would most likely be too difficult for the youngest children, starting this test with
sorting seemed advisable; the level of abstraction required for sorting is lower than the level of
abstraction required for understanding analogies, and, in a certain sense, precedes it.

Implications for the revision


The results of the preparatory study confirmed the need for a new standardization and a
thorough revision of the Preschool SON. An important goal in the revision of the Preschool
SON was the improvement of the psychometric characteristics of the test. The reliability and
the generalizability of the test scores were lower than was desirable, especially in the
youngest and oldest of the age groups for which the test was designed. However, an increase
in reliability could not be gained simply by expanding the number of items and subtests
because an increase in the duration of the test could lead to fatigue, loss of motivation and
decrease in concentration. Any expansion of the test had therefore to be combined with an
effective adaptive procedure.


For the SON-R 5½-17 with an administration time of about one-and-a-half hours, the mean
reliability of the total score is .93 and the generalizability is .85. If the administration of the
SON-R 2½-7 was to be limited to one hour, a reliability of .90 and a generalizability of .80 seemed
to be realistic goals. The improvement of these characteristics could be achieved by adding very
easy and very difficult items to each subtest, and by increasing the number of subtests.
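
The relation between test length and reliability can be illustrated with the Spearman-Brown formula from classical test theory. The manual does not state that this formula was used in planning the revision; the sketch below is an assumption-laden illustration that treats added material as parallel to the existing material, and the function names are hypothetical.

```python
# Illustrative sketch: Spearman-Brown formula for a lengthened test.
# Assumption: the added parts are parallel to the existing ones.

def spearman_brown(reliability, length_factor):
    """Predicted reliability when the test is made length_factor times as long."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)

def length_factor_needed(reliability, target):
    """Lengthening factor required to raise reliability to the target value."""
    return target * (1 - reliability) / (reliability * (1 - target))

current = 0.78  # total-score reliability of the Preschool SON for ages 2;6 to four years
target = 0.90   # goal formulated for the revised test
factor = length_factor_needed(current, target)
print(f"Required lengthening factor: {factor:.2f}")
print(f"Predicted reliability: {spearman_brown(current, factor):.2f}")
```

Under these assumptions the test would have to become about two and a half times as long for the youngest children. This makes clear why simple lengthening was not an option within one hour, and why items matched to the level of the child and an adaptive procedure were needed.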
An important object during the revision of the Preschool SON was to obtain a good match
with the early items of the SON-R 5½-17. As the age ranges of the two tests overlapped, the idea
was to take the easy items of the SON-R 5½-17 as a starting point for the new, most difficult
items of the SON-R 2½-7.
These considerations led to a plan for the revision of the Preschool SON in which the subtest
Memory was dropped. The subtest Memory (the Cat House) had a low level of reliability and,
what is more, a low correlation with age and the remaining subtests. The interviews with users
of the Preschool SON showed that children enjoyed doing the Cat House subtest, but that the
directions for administration were often not followed correctly. Another consideration was that
assessment of memory can be carried out more effectively with a specific and comprehensive
test battery. The results from a single subtest are insufficient to draw valid conclusions about
memory. On the basis of similar considerations, no memory subtest had been included in the
SON-R 5½-17. The four remaining subtests of the Preschool SON were expanded to six subtests by dividing two existing subtests:
- The subtest Sorting was divided into two subtests: the section Sorting Disks was expanded
  with simple analogy items consisting of geometrical forms similar to the SON-R 5½-17; the
  section Sorting Pictures was expanded with easy items from the subtest Categories of the
  SON-R 5½-17.
- The section of the subtest Combination, in which two halves of a picture had to be combined,
  was expanded with items from the subtest Situations from the SON-R 5½-17; the section
  Puzzles was expanded and implemented as a separate subtest.
- The subtest Mosaics was expanded with simple items and with items from the SON-R 5½-17.
- The subtest Copying was adapted to increase its similarity to the subtest Patterns of the
  SON-R 5½-17.
The relationship between the subtests of the Preschool SON and the SON-R 2½-7 is presented
schematically in table 2.1.
Table 2.1
Relationship Between the Subtests of the Preschool SON and the SON-R 2½-7

Preschool SON                                  SON-R 2½-7
Subtest        Task                            Subtest       Task
Sorting        Sorting disks                   Analogies     Sorting disks; Analogies SON-R 5½-17
               Sorting figures                 Categories    Sorting figures; Categories SON-R 5½-17
Mosaics        Mosaics with/without a frame    Mosaics       Mosaics in a frame; Mosaics SON-R 5½-17
Combination    Two halves of a picture         Situations    Two halves of a picture; Situations SON-R 5½-17
               Puzzles                         Puzzles       Puzzles in a frame; separate puzzles
Copying        Copying drawn figures           Patterns      Copying patterns
Memory         Finding cats                    -             -


2.2 THE CONSTRUCTION RESEARCH


In 1991/92, extensive research was done with three experimental versions of the test. These
were administered to more than 1850 children between two and eight years of age. The research
was carried out in preschool play groups, day care centers and primary schools across the
Netherlands. The versions were also administered on a small scale to deaf children and children
with learning problems. The examiners participating in the construction research were mainly
trained psychologists with experience in testing. Psychologists and educators who normally
make diagnostic assessments of young children were contacted in an early phase to obtain
information about the usability of the construction versions for children with specific problems.
More than twenty people in the field, employed by school advisory services, audiological
centers and outpatient departments, administered sections of the three versions to a number of
children. They commented on and gave suggestions for the construction of the material, the
directions and the administration procedure.

Points of departure for the construction


The most important objectives in the construction and administration of the experimental
versions were:
- expanding the number of items and subtests to improve the reliability of the test and to make
  the test more suitable for the youngest and the oldest age groups,
- limiting the mean administration time to a maximum of one hour by using an effective
  adaptive procedure,
- making the testing materials both attractive for children and durable,
- developing clear directions for the administration of the test and the manner of giving feedback.

Testing materials
From the first experimental version on, the test consisted of the following subtests: Mosaics,
Categories, Puzzles, Analogies, Situations and Patterns. This sequence was maintained throughout the three versions. Tests that are spatially oriented are alternated with tests that require
reasoning abilities, and abstract testing materials are alternated with materials using concrete
(reasoning) pictures. Mosaics is a suitable test to begin with as it requires little direction, the
child works actively at a solution, and the task corresponds to activities that are familiar to the
child.
The items of the experimental versions consisted of (adapted) items from the Preschool SON
and the SON-R 5½-17 and of newly constructed items. Most of the new items were very simple
items that would make the test better suited to young children. Table 2.2 shows the origin of the
items in the final version of the test. Of a total of 96 items, five of which are example items, 45%
are new, 25% are adaptations of Preschool SON items, and 30% are adaptations from the SON-R
5½-17.
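The rounded percentages can be checked against the row totals of table 2.2 (43 new items, 24 adapted from the Preschool SON and 29 adapted from the SON-R 5½-17, out of 96 items):

```latex
\frac{43}{96} \approx 44.8\% \approx 45\%, \qquad
\frac{24}{96} = 25\%, \qquad
\frac{29}{96} \approx 30.2\% \approx 30\%, \qquad
43 + 24 + 29 = 96
```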
In the first experimental version the original items of the Preschool SON and the SON-R
5½-17 were used. In the following versions all items of the subtests were redrawn and
reformed to improve the uniformity of the material and to simplify the directions for the
tasks. In the pictures of people the emphasis was on pictures of children and care was taken
to have an even distribution of boys and girls. More children with a non-western appearance
were included.
An effort was made to make the material colorful and attractive, durable and easy to store. A
mat was used to prevent the material from sliding around, to facilitate picking up the pieces and
to increase the standardization of the test situation.

Adaptive procedure and duration of administration


To make the test suitable for the age range from two to seven years, a broad range of task
difficulty is required. An adaptive test procedure is desirable to limit the duration of the test, and
to prevent children having to do tasks far above or far below their level. Having to do items that
are much too difficult is very frustrating and demotivating for children. When older children are
given items that are much too easy, they very quickly consider these childish and may then be
inclined not to take the next, more difficult items seriously.

Table 2.2
Origin of the Items

                                            Subtests of the SON-R 2½-7
Origin                                      Mos  Cat  Puz  Ana  Sit  Pat  Total
Adapted from the Preschool SON                                             24
Adapted from the SON-R 5½-17                                               29
New items                                                                  43
Total number of items, including examples   16   16   15   18   15   16    96

(The per-subtest counts of the first three rows are not legible in this copy, apart from two entries of 10 in the row New items.)

In the Preschool SON a discontinuation rule of three consecutive mistakes was used.
Because the mistakes had to be consecutive, children sometimes had to make many mistakes
before the test could be stopped. In practice this meant that, especially with young children,
examiners often stopped too early. In the SON-R 5½-17 the items are arranged in two or three
parallel series and in each series the test is discontinued after a total of two mistakes. In the first
series the first item is taken as a starting point; in the following series the starting point depends
on the performance in the previous series. This method has great advantages: everyone starts the
test at the same point, but tasks that are too easy as well as tasks that are too difficult are skipped.
Further, returning to an easier level in the next series is pleasant for the child after he or she has
done a few tasks incorrectly.
Research was carried out with the first experimental version to see if the adaptive method of
the SON-R 5½-17 could also be applied with the SON-R 2½-7. The problem was, however, that
the subtests consist of two different parts. This makes a procedure with parallel series confusing
and complicated because switching repeatedly from one part of the test to the other may be
necessary. In the subsequent construction research, only one series of items of progressive
difficulty was used. However, the discontinuation criterion was varied and research was done on
the effect of using an entry procedure in which the item taken as a starting point depended on the
age of the child.
Finally, on the basis of the results of this research, a procedure was chosen in which the
first, third or fifth item is taken as a starting point and each subtest is discontinued after a
total of three mistakes. The performance subtests can also be discontinued when two consecutive mistakes are made in the second section of these tests. The items in these subtests have
a high level of discrimination, and the children require a fair amount of time to complete the
tasks. They become frustrated if they have to continue when the next item is clearly too
difficult for them.
As a result of the adaptive procedure, the number of items to be administered is strictly
limited, and the mean duration of the test is less than an hour, yet very little information is lost
by skipping a few items. Further, the children's motivation remains high during this procedure
because only a very few items above their level are administered.

Difficulty of items and ability to discriminate


After each phase of research the results were analyzed per subtest with the 2-parameter logistic
model from the item response theory (IRT; see Lord, 1980; Hambleton & Swaminathan, 1985).
The program BILOG (Mislevy & Bock, 1990) was used for this analysis. With this program the
parameters for difficulty and discrimination of items can be estimated for incomplete tests. The
IRT-model was used because the adaptive administration procedure makes it difficult to evaluate these characteristics on the basis of p-values and item-total correlations. The parameter for
difficulty indicates a level of ability at which 50% of the children solve the item correctly; the
parameter for discrimination indicates how, at this level, the probability that the item will be
answered correctly increases as ability increases.
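The two parameters described here correspond to the standard two-parameter logistic item response function. The formula below is a sketch in notation assumed for illustration and not taken from the manual: \theta is the ability of the child, b_i the difficulty of item i and a_i its discrimination. Some programs include an additional scaling constant in the exponent, which is omitted here.

```latex
P_i(\theta) = \frac{1}{1 + \exp\bigl(-a_i(\theta - b_i)\bigr)}, \qquad
P_i(b_i) = \tfrac{1}{2}, \qquad
\left.\frac{dP_i}{d\theta}\right|_{\theta = b_i} = \frac{a_i}{4}
```

At \theta = b_i the probability of a correct solution is 50%, and the slope of the curve at that point is proportional to a_i, so that a more discriminating item separates children just below and just above its difficulty level more sharply.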
Because of the use of an adaptive procedure, it was important that the items were
administered in the correct order of progressive difficulty; the examiner had to be reasonably
certain that items skipped at the beginning would have been solved correctly, and that items
skipped at the end would have been solved incorrectly. Also important was a balanced
distribution in the difficulty of the items, and sufficient numbers of easy items for young
children and difficult items for older ones. On the basis of the results of the IRT-analysis,
new items were constructed, some old items were adapted and others were removed from the
test. In some cases the order of administration was changed. A problem arising from this was
that items may become more difficult when administered early in the test. The help and
feedback given after an incorrect solution may benefit the child so that the next, more
difficult item becomes relatively easier.

Directions and feedback


An important feature of the SON-tests is that directions can be given verbally as well as
nonverbally. This makes the test situation more natural because the directions can correspond to
the communication skills of the child. When verbal directions are given, care must be taken not
to provide extra information that is not contained in the nonverbal directions. However, nonverbal directions have their limitations, so that explaining to the children exactly what is expected
of them is difficult, certainly with young children. Examples were therefore built into the first
items to give the child the opportunity to repeat what the examiner had done or to solve a similar
task. As the test proceeds, tasks are solved more and more independently. To make the items of
the SON-R 5½-17 suitable for this approach, they were also adapted, for example, by first
working with cards that have to be arranged correctly instead of pointing out the correct alternative.
Not only does the difficulty of the items increase in the subtests, the manner in which they
are administered changes as well. In the construction research this procedure was continuously
adapted, and the directions were improved in accordance with the experiences and comments of
the examiners and of practicing psychologists. The greatest problems in developing clear directions arose in the second section of the subtest Analogies. Here the child has to apply a similar
transformation to a figure as is shown in an example. This is difficult to demonstrate nonverbally because of the high level of abstraction, but it can be explained in a few words. The test
therefore provides first for extensive, repeated practice on one example, and then provides an
example with every following item.
The feedback and help given after an incorrect solution is important in giving the child a
clear understanding of the aim of the tasks. The manner in which feedback and help should be
given was worked out in greater detail during the research and is described in the directions.

Scoring Patterns
In the subtest Patterns lines and figures must be copied, with or without the help of preprinted
dots. Whether the child can draw neatly or accurately is not important when copying, but
whether he or she can see and reproduce the structure of the example is. This makes high
demands on the assessment and a certain measure of subjectivity cannot be excluded. During
the construction research, a great deal of attention was paid to elucidating the scoring rules, and
inter-assessor discrepancies were used to determine which drawings were difficult to evaluate.
On this basis, drawings that help to clarify the scoring rules were selected. These drawings are
included in the directions for the administration of Patterns.

DESCRIPTION OF THE SON-R 2½-7

The SON-R 2½-7 is a general intelligence test for young children. The test assesses a broad
spectrum of cognitive abilities without involving the use of language. This makes it especially
suitable for children who have problems or handicaps in language, speech or communication,
for instance, children with a language, speech or hearing disorder, deaf children, autistic children, children with problems in social development, and immigrant children with a different
native language.
A number of features make the test particularly suitable for less gifted children and children
who are difficult to test. The materials are attractive, the tasks diverse. The child is given the
chance to be active. Extensive examples are provided. Help is available on incorrect responses,
and the discontinuation rules restrict the administration of items that are too difficult for the
child.
The SON-R 2½-7 differs in various aspects from the more traditional intelligence tests, in
content as well as in manner of administration. Therefore, this test can well be administered as
a second test in cases where important decisions have to be taken on the basis of the outcome of
a test, or if the validity of the first test is in doubt.
Although the reasoning tests in the SON-R 2½-7 are an important addition to the typical
performance tests, the nonverbal character of the SON tests limits the range of cognitive
abilities that can be tested. Other tests will be required to gain an insight into verbal
development and abilities. However, for those groups of children for whom the SON-R
2½-7 has been specifically designed, a clear distinction must be made between intelligence
and verbal development.
This chapter first describes the composition of the subtests and then presents the most important
characteristics of the test administration.

3.1 THE SUBTESTS


The SON-R 2½-7 is composed of six subtests:
1. Mosaics,
2. Categories,
3. Puzzles,
4. Analogies,
5. Situations and
6. Patterns.
The subtests are administered in this sequence. The tests can be grouped into two types:
reasoning tests (Categories, Analogies and Situations) and more spatial, performance tests
(Mosaics, Puzzles and Patterns). The six subtests consist, on average, of 15 items of increasing difficulty. Each subtest consists of two parts that differ in materials and/or directions.
In the first part the examples are included in the items. The second part of each subtest,
except in the case of the Patterns subtest, is preceded by an example, and the subsequent
items are completed independently. In table 3.1 a short description is given of the tasks
in both sections of the subtests. In figures 3.1 to 3.6 a few examples of the items are
presented.

Table 3.1
Tasks in the Subtests of the SON-R 2½-7

Subtest     Task part I                        Task part II
Mosaics     Copying different simple mosaic    Copying mosaic patterns in a frame,
            patterns in a frame, using red     using red, yellow and red/yellow
            squares.                           squares.
Categories  Sorting cards into two groups      Three pictures of objects have
            according to the category to       something in common. From a series
            which they belong.                 of five pictures, two must be chosen
                                               that have the same thing in common.
Puzzles     Puzzle pieces must be laid in a    Putting three to six separate puzzle
            frame to resemble a given          pieces together to form a whole.
            example.
Analogies   Sorting disks into two             Solving an analogy problem by
            compartments on the basis of       applying the same principle of
            form and/or color and/or size.     change as in the example analogy.
Situations  Half of each of four pictures is   One or two pieces are missing in a
            printed. The missing halves must   drawing of a situation. The correct
            be placed with the correct         piece(s) must be chosen from a
            pictures.                          number of alternatives.
Patterns    Copying a simple pattern.          Copying a pattern in which five, nine
                                               or sixteen dots must be connected by
                                               a line.

Mosaics (Mos)
The subtest Mosaics consists of 15 items. In Mosaics, part I, the child is required to copy several
simple mosaic patterns in a frame using three to five red squares. The level of difficulty is
determined by the number of squares to be used and whether or not the examiner first demonstrates the item.
In Mosaics II, diverse mosaic patterns have to be copied in a frame using red, yellow and red/
yellow squares. In the easiest items of part II, only red and yellow squares are used, and the
pattern is printed in the actual size. In the most difficult items, all of the squares are used and the
pattern is scaled down.

Categories (Cat)
Categories consists of 15 items. In Categories I, four or six cards have to be sorted into two
groups according to the category to which they belong. In the first few items, the drawings on
the cards belonging to the same category strongly resemble each other. For example, a shoe or a
flower is shown in different positions. In the last items of part I, the child must him or herself
identify the concept underlying the category: for example, vehicles with or without an engine.
Categories II is a multiple choice test. In this part, the child is shown three pictures of objects
that have something in common. Two more pictures that have the same thing in common have
then to be chosen from another column of five pictures. The level of difficulty is determined by
the level of abstraction of the shared characteristic.

Puzzles (Puz)
The subtest Puzzles consists of 14 items. In part I, puzzle pieces must be laid in a frame to
resemble the given example. Each puzzle has three pieces. The first few puzzles are first
demonstrated by the examiner. The most difficult puzzles in part I have to be solved independently.
In Puzzles II, a whole must be formed from three to six separate puzzle pieces. No directions
are given as to what the puzzles should represent; no example or frame is used. The number of
puzzle pieces partially determines the level of difficulty.

Figure 3.1 Items from the Subtest Mosaics: Item 3 (Part I), Item 9 (Part II), Item 14 (Part II)

Figure 3.2 Items from the Subtest Categories: Item 4 (Part I), Item 11 (Part II)

Figure 3.3 Items from the Subtest Puzzles: Item 3 (Part I), Item 11 (Part II)

Analogies (Ana)
The subtest Analogies consists of 17 items. In Analogies I, the child is required to sort three,
four or five blocks into two compartments on the basis of either form, color or size. The child
must discover the sorting principle him or herself on the basis of an example. In the first few
items, the blocks to be sorted are the same as those pictured in the test booklet. In the last items
of part I, the child must discover the underlying principle independently: for example, large
versus small blocks.
Analogies II is a multiple choice test. Each item consists of an example-analogy in which a
geometric figure changes in one or more aspect(s) to form another geometric figure. The
examiner demonstrates a similar analogy, using the same principle of change. Together with the
child, the examiner chooses the correct alternative from several possibilities. Then, the child has
to apply the same principle of change to solve another analogy independently. The level of
difficulty of the items is related to the number and complexity of the transformations.

Situations (Sit)
The subtest Situations consists of 14 items. Situations I consists of items in which one half of
each of four pictures is shown in the test booklet. The child has to place the missing halves
beside the correct pictures. The first item is printed in color in order to make the principle clear.
The level of difficulty is determined by the degree of similarity between the different halves
belonging to an item.
Situations II is a multiple choice test. Each item consists of a drawing of a situation with one
or two pieces missing. The correct piece (or pieces) must be chosen from a number of alternatives to make the situation logically consistent. The number of missing pieces determines the
level of difficulty.

Patterns (Pat)
The subtest Patterns consists of 16 items. In this subtest the child is required to copy an
example. The first items are drawn freely, then pre-printed dots have to be connected to make
the pattern resemble the example. The items of Patterns I are first demonstrated by the examiner
and consist of no more than five dots.
The items in Patterns II consist of five, nine or sixteen dots and have to be copied by the child
without help. The level of difficulty is determined by the number of dots and whether or not the
dots are pictured in the example pattern.

Figure 3.4 Items from the Subtest Analogies: Item 8 (Part I), Item 9 (Part I), Item 16 (Part II)

Figure 3.5 Items from the Subtest Situations: Item 5 (Part I), Item 10 (Part II)

Figure 3.6 Items from the Subtest Patterns: Item 6 (Part I), Item 13 (Part II), Item 16 (Part II)

3.2 REASONING TESTS, SPATIAL TESTS AND PERFORMANCE TESTS

Reasoning tests
Reasoning abilities have traditionally been seen as the basis for intelligent functioning (Carroll,
1993). Reasoning tests form the core of most intelligence tests. They can be divided into
abstract and concrete reasoning tests. Abstract reasoning tests, such as Analogies and Categories, are based on relationships between concepts that are abstract, i.e., not bound by time or
place. In abstract reasoning tests, a principle of order must be derived from the test materials
presented, and applied to new materials. In concrete reasoning tests, like Situations, the object is
to bring about a realistic time-space connection between persons or objects (see Snijders,
Tellegen & Laros, 1989).

Spatial tests
Spatial tests correspond to concrete reasoning tests in that, in both cases, a relationship within a
spatial whole must be constructed. The difference lies in the fact that concrete reasoning tests
concern a meaningful relationship between parts of a picture, and spatial tests concern a form
relationship between pieces or parts of a figure (see Snijders, Tellegen & Laros, 1989; Carroll,
1993). Spatial tests have long been integral components of intelligence tests. The spatial subtests included in the SON-R 2½-7 are Mosaics and Patterns. The subtest Puzzles is more
difficult to classify, as the relationship between the parts concerns form as well as meaning. We
expected the performance on Puzzles and Situations to relate to concrete reasoning ability.
However, the correlations and factor analysis show that Puzzles is more closely associated with
Mosaics and Patterns (see section 5.3).

Performance tests
An important characteristic that Puzzles, Mosaics and Patterns have in common is that the item
is solved while manipulating the test stimuli. That is why these three subtests are called performance tests. In the three reasoning tests (Situations, Categories and Analogies), in contrast,
the correct solution has to be chosen from a number of alternatives. For the rest, the six subtests
are very similar in that perceptual and spatial aspects as well as reasoning ability play a role in
all of them.
The performance subtests of the SON-R 2½-7 can be found in a similar form in other
intelligence tests. However, only verbal directions are given in these tests. Reasoning tests can
also regularly be found in other intelligence tests, but then they often have a verbal form (such as
verbal analogies).
In table 3.2 the classification of the subtests is presented. The empirical classification, in which
a distinction is made between performance tests and reasoning tests, is based on the results of
principal components analysis of the test scores of several different groups of children (see
section 5.4). In table 3.2. the number of each subtest indicates the sequence of administration;
the sequence of the subtests in the table is based on similarities of content. This sequence is used
in the following chapters when presenting the results.

Table 3.2
Classification of the Subtests

No  Abbr  Subtest     Content             Empirical
6   Pat   Patterns    Spatial insight     Performance test
1   Mos   Mosaics     Spatial insight     Performance test
3   Puz   Puzzles     Concrete reasoning  Performance test
5   Sit   Situations  Concrete reasoning  Reasoning test
2   Cat   Categories  Abstract reasoning  Reasoning test
4   Ana   Analogies   Abstract reasoning  Reasoning test

3.3 CHARACTERISTICS OF THE ADMINISTRATION


In this section the most important characteristics of the SON-R 2½-7 are discussed.

Individual intelligence test


Most intelligence tests for children are administered individually. The SON-R 2½-7 follows this
tradition for the following reasons:
- the directions can be given nonverbally,
- feedback can be given in the correct manner,
- testing can be tailored to the level of each individual child,
- the examiner can encourage children who are not very motivated or cannot concentrate;
  personal contact between the child and the examiner is essential for effective testing, certainly
  for children up to the age of four to five years.

Nonverbal intelligence test


The SON-R 2½-7 is nonverbal. This means that the test can be administered without the use of
spoken or written language. The examiner and the child are not required to speak or write and
the testing materials have no language component. One is, however, allowed to speak during the
test administration, otherwise an unnatural situation would arise. The manner of administration
of the test depends on the communication abilities of the child. The directions can be given
verbally, nonverbally with gestures or using a combination of both. Care must be taken when
giving verbal directions that no extra information is given.
No knowledge of a specific language is required to solve the items being presented. However, level of language development, for example, being able to name objects, characteristics
and concepts, can influence the ability to solve the problems correctly. Therefore the SON-R
2½-7 should be considered a nonverbal test for intelligence rather than a test for nonverbal
intelligence.

Directions
An important part of the directions to the child is the demonstration of (part of) the solution to a
problem. An example item is included in the administration of the first item on each subtest, and
detailed directions are given for all first items. Once the child understands the nature of the task,
the examiner can shorten the directions for the following items. If the child does not understand
the directions, they can be repeated.
In the second part of each subtest an example is given in advance. Once the child understands
this example, he or she can do the following items independently.

Feedback
The examiner gives feedback after each item. In the SON-R 5½-17, feedback is limited to
telling the child whether his or her answer is correct or incorrect. In the SON-R 2½-7 the
examiner indicates whether the solution is correct or incorrect, and, if the answer is incorrect,
he/she also demonstrates the correct solution for the child. The examiner tries to involve the
child when correcting the answer, for instance, by letting him or her perform the last action.
However, the examiner does not explain why the answer was incorrect.
By giving feedback, a more normal interaction between the examiner and the child occurs,
and the child gains a clearer understanding of the task. The child is given the opportunity to
learn and to correct him or herself. In this respect a similarity exists between the SON-tests and
tests for learning potential (Tellegen & Laros, 1993a).

Entry procedure and discontinuation rule


Each subtest begins with an entry procedure. Based on age and, when possible, the estimated
cognitive level of the child, a start is made with the first, third or fifth item. This procedure was
chosen to prevent children from becoming demotivated by being required to solve too many
items that are below their level. The design of the entry procedure ensures that the first items the
child skips would have been solved correctly. Should the level chosen later appear to be too
difficult, the examiner can return to a lower level. However, because of the manner in which the
test has been constructed, this should occur infrequently.
Each subtest has rules for discontinuation. A subtest is discontinued when a total of three
items has been incorrectly solved. The mistakes do not have to be consecutive. The three
performance subtests are also discontinued when two consecutive mistakes are made in the
second part. Frequent failure often has a drastically demotivating effect on children and can
result in refusal to go on.
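The entry procedure and the discontinuation rules can be sketched in code. This is a minimal illustration under stated assumptions, not the procedure as specified in the directions: the function name, the parameter `part2_start` (the item number at which the second part of a subtest begins) and the answer callback are hypothetical, and the option of returning to a lower level after a start that proves too difficult is not modelled.

```python
from typing import Callable, List, Optional


def administer_subtest(
    n_items: int,
    answer_is_correct: Callable[[int], bool],
    entry_item: int = 1,
    part2_start: Optional[int] = None,
) -> List[int]:
    """Return the 1-based numbers of the items that are administered.

    part2_start is a hypothetical parameter: pass the first item of part II
    for a performance subtest, or None for a reasoning subtest.
    """
    if entry_item not in (1, 3, 5):
        raise ValueError("the entry item is the first, third or fifth item")

    administered: List[int] = []
    total_errors = 0
    consecutive_part2_errors = 0

    for item in range(entry_item, n_items + 1):
        administered.append(item)
        if answer_is_correct(item):
            consecutive_part2_errors = 0
            continue

        total_errors += 1
        if part2_start is not None and item >= part2_start:
            consecutive_part2_errors += 1

        # Rule for every subtest: three mistakes in total, not necessarily consecutive.
        if total_errors >= 3:
            break
        # Extra rule for performance subtests: two consecutive mistakes in part II.
        if consecutive_part2_errors >= 2:
            break

    return administered


# Hypothetical child who starts at item 3 and fails items 6, 9 and 10
# of a 15-item reasoning subtest.
failed = {6, 9, 10}
print(administer_subtest(15, lambda item: item not in failed, entry_item=3))
```

In this example the subtest stops after item 10, the third mistake, so items 11 to 15 are never presented.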

Time factor
The speed with which the problems are solved plays a very subordinate role in the SON-R
2½-7. A time limit for completing the items is used only in the second part of the performance
tests. The time limit is generous. Its goal is to allow the examiner to end the item. The construction research showed that children who go beyond the time limit are seldom able to find a
correct solution when given more time.

Duration of test administration


The administration of the SON-R 2½-7 takes about 50 minutes (excluding any short breaks
during administration). During the standardization research the administration took between
forty and sixty minutes in 60% of the cases. For children with a specific handicap, the administration takes about five minutes longer. For children two years of age, administration time is
shorter; nearly 50% of the two-year-olds complete the test in less than forty minutes.

Standardization
The SON-R 2½-7 is meant primarily for children in the age range from 2;6 to 7;0 years. The
norms were constructed using a mathematical model in which performance is described as a
continuous function of age. An estimate is made of the development of performance in the
population, on the basis of the results of the norm groups (see chapter 4). These norms run from
2;0 to 8;0 years. In the age group from 2;0 to 2;6 years, the test should only be used for
experimental purposes. In many cases the test is too difficult for children younger than 2;6
years. Often, they are not motivated or concentrated enough to do the test. However, in the age
group from 7;0 to 8;0 years, the test is eminently suitable for children with a cognitive delay or
who are difficult to test. The easy starting level and the help and feedback given can benefit
these children. For children of seven years old who are developing normally, the SON-R 5½-17
is generally more appropriate.
The scaled subtest scores are presented as standard scores with a mean of 10 and a standard
deviation of 3. The scores range from 1 to 19. The SON-IQ, based on the sum of the scaled
subtest scores, has a mean of 100 and a standard deviation of 15. The SON-IQ ranges from 50 to
150. Separate total scores can be calculated for the three performance tests (SON-PS) and the
three reasoning tests (SON-RS). These have the same distribution characteristics as the IQ
score. When using the computer program, the scaled scores are based on the exact age; in the
norm tables age groups of one month are presented. With the computer program, a scaled total
score can be calculated for any combination of subtests.
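The two score scales can be illustrated with a short sketch. It assumes that a child's position relative to children of the same age is already available as a normalized deviation score `z` (mean 0, standard deviation 1); how the norm model arrives at that value is described in chapter 4 and is not reproduced here. The function names and the rounding to whole numbers are assumptions made for the illustration.

```python
def to_standard_score(z: float, mean: int, sd: int, lowest: int, highest: int) -> int:
    """Rescale a normalized deviation score z and clip it to the reported range."""
    return max(lowest, min(highest, round(mean + sd * z)))


def scaled_subtest_score(z: float) -> int:
    # Subtest scale: mean 10, standard deviation 3, scores from 1 to 19.
    return to_standard_score(z, 10, 3, 1, 19)


def iq_score(z: float) -> int:
    # IQ scale (SON-IQ, SON-PS, SON-RS): mean 100, standard deviation 15, range 50 to 150.
    return to_standard_score(z, 100, 15, 50, 150)


print(scaled_subtest_score(1.0), iq_score(1.0))
```

The range of 1 to 19 corresponds to three standard deviations on either side of the mean of the subtest scale; the range of 50 to 150 corresponds to three and one third standard deviations on the IQ scale.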
In addition to the scaled scores, based on a comparison with the population of children of the
same age, a reference age can be determined for the subtest scores and the total scores. This
shows the age at which 50% of the children in the norm population perform better, and 50%
perform worse. The reference age ranges from 2;0 to 8;0 years. It provides a different framework for the interpretation of the test results, and can be useful when reporting to persons who
are not familiar with the characteristics of deviation scores. The reference age also makes it
possible to interpret the performance of older children or adults with a cognitive delay, for
whom administration of a test, standardized for their age, is practically impossible and not
meaningful.
As with the SON-R 5½-17, no separate norms for deaf children were developed for the
SON-R 2½-7. Our basic assumption is that separate norms for specific groups are only required
when a test discriminates against a special group of children because of its contents or the
manner in which it is administered. Research using the SON-R 2½-7 and the SON-R 5½-17
with deaf children (see chapter 7) shows that this is absolutely not the case for deaf children
with the SON tests.

STANDARDIZATION OF THE TEST SCORES

Properly standardized test norms are necessary for the interpretation of the results of a test. The
test norms make it possible to assess how well or how badly a child performed in comparison to
the norm population. The norm population of the SON-R 2½-7 includes all children residing in
the Netherlands in the relevant age group, except those with a severe physical and/or mental
handicap. The standardization process transforms the raw scores into normal distributions with
a fixed mean and standard deviation. This allows comparisons to be made between children,
including children of different ages. Intra-individual comparisons between performances on
different subtests are also possible. As test performances improve very strongly in the age range
from two to seven years, the norms should ideally be related to the exact age of the child and not
to an age range, as is the case for most intelligence tests for children.

4.1 DESIGN AND REALIZATION OF THE RESEARCH


Age groups
Eleven age groups, increasing in age by 6 months, from 2;3 years to 7;3 years formed the point
of departure for the standardization research. In each group one hundred children were to be
tested: fifty boys and fifty girls. When selecting the children, an effort was made to keep the age
within each group as homogeneous as possible. The age in the youngest group, for instance, was
supposed to deviate as little as possible from two years, three months and zero days.

Regions of research
To ensure a good regional distribution, the research was carried out in ten regions, five of which
are in the West, three in the North/East, and two in the South of the Netherlands. The regions
were chosen to reflect specific demographic characteristics of the Netherlands. In nine of the ten
regions, one examiner administered all the tests. In one region, two examiners shared the test
administration. Approximately the same number of children was tested in each region in five
separate two-week periods. In each region and each period, the test was administered to 22
children: one boy and one girl from each of the eleven age groups. The sample to be tested thus
consisted of 1100 children, i.e., 10 (regions) x 5 (periods) x 11 (age groups) x 2 (one boy and one girl).

Communities
The second phase of the standardization research concerned the selection of the communities in
the ten research regions where the test administrations were to take place. In total, 31 communities were selected. Depending on the size of the community, the research was carried out during
one, two or three periods. The selected communities were representative of the Netherlands
with regard to number of inhabitants and degree of urbanization.

Schools
Children four years and older were tested at primary schools. Research at schools was carried
out in the same communities as the research with younger children. One, two or three schools
were selected in each community, depending on the number of periods in which research was to
be done in that community. To select the schools, a sample was drawn from the schools in each
community. The chance of inclusion was proportional to the number of pupils at the school.


Fifty schools were approached, of which 25 were prepared to participate. Schools that were not prepared
to participate were replaced by other schools in the same community. The socio-economic
status of the parents was taken into account in the choice of replacement schools.

Selection of the children


The manner of selecting the children depended on their age. For children in the age groups up to
four years, samples were drawn from the local population register, which contains data on
name, date of birth, sex and address of the parents. For each age group, the boy or girl whose age
corresponded most closely to the required age was selected. The parents received a letter
explaining the aims of the research and asking them to participate. If no reaction to this letter
was received, they were approached again by letter or by telephone.
In about one quarter of the cases, the test could not be administered to the child that had
originally been selected. Some parents refused permission for their child to participate. Sometimes, the data from the population register were no longer correct, or practical problems made
it impossible for the parents to allow their child to participate in the research program. In these
cases, the children were replaced, as far as possible, by children from the same community.
For children four years and older, the experimenter selected, per school and per age group,
one boy and one girl whose age on the planned test date corresponded as closely as possible to
the required age. If the deviation from the required age was too large, either two boys or two
girls were selected from one age group, or one extra child was tested at another school. Parents
were sent a written request for permission, which was nearly always given.

Practical implementation
The department of Orthopedagogics of the University of Groningen, responsible for the standardization in the Netherlands of the Reynell Test for Language Understanding and the Schlichting Test for Language Production (Lutje Spelberg & Sj. van der Meulen, 1990), collaborated in
the design and execution of the standardization research. In three of the five research periods,
children who were tested with the SON-R 2½-7 had also participated in the standardization
research of the language tests six months earlier. To validate both the language tests and the
SON-R 2½-7, a third test was administered to some of the children in the intervening period.
Eleven examiners, eight women and three men, administered most of the tests. Most were
psychology graduates, with extensive experience in testing young children, some of which had
been gained in the previous research they had carried out with the language tests.
Children below four years old were tested in a local primary health care center, in the
presence of one of the parents. In a few cases the child was tested at home. Older children were
tested at school in a separate room. An effort was made to administer the whole test in one
session. However, a short break between the subtests was allowed. At the schools, breaking off
the test for longer periods, or even continuing a test the next day, was sometimes necessary
because of school hours and breaks.
In a few cases the test could not be administered correctly. If no more than four subtests
could be administered, the test was considered invalid and was not used in the analyses. This
situation occurred in the case of ten children, eight of whom were two years old.

Completing the norm group


The greater part of the standardization research took place in the period from September to
December 1993. As fewer children than had been planned were tested in the youngest age
groups, the norm group was supplemented with 31 children in the spring of 1994. Further,
immigrant children appeared to be under-represented in the youngest age groups. Eight immigrant children, who had been tested in a different research project, were therefore added to the
norm group. Finally, eight pupils, 4 years or older, from special schools were added. This was a
sample from a group of children who had been tested at schools for special education with a
preschool department.


4.2 COMPOSITION OF THE NORM GROUP


The norm group consisted of 1124 children. Table 4.1 shows the composition of the group
according to age and sex, and the distribution according to age of the children who were added
to the norm group for various reasons. The mean age per group is practically identical to the
planned age, and the distribution according to age within the age groups is very narrow. In all
the groups the number of boys is approximately equal to the number of girls.
The extent to which the distribution of the selected demographic characteristics of the norm
group conformed to that of the total Dutch population (Central Bureau for Demographics, CBS,
1993) is presented in table 4.2. Children from the large urban communities are slightly underrepresented, but these communities are also characterized by a relatively smaller number of
youngsters.

Weighting the norm group


As a result of sample fluctuations and the different sampling methods used for children above and
below four years of age, the backgrounds of the children differed from age group to age group.
For the standardization, the following factors were weighted within each age group: the percentage of children with a mother born abroad, the educational level of the mother, and the child's
sex. This allowed a better comparison between the different age groups. Finally, the observations
were weighted so that the number of children per age group was the same. After weighting, every
age group consisted of 51 boys and 51 girls, making the size of the total sample 1122.
An example may elucidate this weighting procedure. The percentage of children with a
foreign mother in the entire norm group was 11%. If the percentage in the age group 3;9 years,
for example, was 8%, the children with a foreign mother in this age group received a weight of
11/8, and the children with a Dutch mother received a weight of (100-11)/(100-8) = 89/92.
When using weights, critical limits of 2/3 and 3/2 were adhered to, in order to prevent some
children contributing either too much or too little to the composition of the weighted norm
group. After the various steps in the weighting procedure, 80% of the children had a weighting
factor between .80 and 1.25.
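The arithmetic of the example above can be sketched in a few lines of Python. This is only an illustration for a single weighting factor; the function name `clipped_weight` is hypothetical, and the actual procedure combined several factors in successive steps.

```python
def clipped_weight(target_pct, observed_pct, lower=2 / 3, upper=3 / 2):
    """Ratio of target share to observed share, kept within the critical limits."""
    raw = target_pct / observed_pct
    return min(max(raw, lower), upper)


# Figures from the example in the text: 11% foreign mothers in the whole
# norm group, 8% in the 3;9 age group.
w_foreign = clipped_weight(11, 8)            # 11/8
w_dutch = clipped_weight(100 - 11, 100 - 8)  # 89/92

# After weighting, the share of foreign mothers in this age group is 11%.
weighted_foreign_pct = 100 * (8 * w_foreign) / (8 * w_foreign + 92 * w_dutch)
```

A ratio outside the critical limits, for instance 11/5, would be cut back to 3/2, so that no child counts for more than one and a half observations.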
Table 4.1
Composition of the Norm Group According to Age, Sex and Phase of Research (N=1124)

Age Group         2;3   2;9   3;3   3;9   4;3   4;9   5;3   5;9   6;3   6;9   7;3

Total              98    99    99   100   102   101   105   105   102   107   106

Phase
  1993             94    89    86    90    99   101   104   104   101   105   103
  Addition:
    1994            3     9    11     7     2     -     -     -     -     -     -
    Immigrant       1     1     2     3     1     -     -     -     -     -     -
    Spec. Educ.     -     -     -     -     -     -     1     1     1     2     3

Sex
  Boys             47    50    48    52    52    50    52    53    49    53    55
  Girls            51    49    51    48    50    51    53    52    53    54    51

Age
  Mean (years)   2.24  2.76  3.25  3.75  4.25  4.74  5.24  5.74  6.25  6.74  7.24
  SD (days)        14    16    16    14    15    22    22    24    18    23    21


Table 4.2
Demographic Characteristics of the Norm Group in Comparison with the Dutch Population
(N=1124)

                                     Norm Group   Population
Region
  North/East-Netherlands                 31%          31%
  South-Netherlands                      19%          22%
  West-Netherlands                       50%          47%

Size of Community
  Less than 10,000 inhabitants           12%          11%
  10,000 to 20,000 inhabitants           22%          20%
  20,000 to 100,000 inhabitants          44%          42%
  More than 100,000 inhabitants          22%          27%

Degree of Urbanization
  (Urbanized) Rural Communities          37%          34%
  Commuter Communities                   16%          15%
  Urban Communities                      47%          51%

Table 4.3 presents the level of education and country of birth of the mother, before and after
weighting, for three age groups. As can be seen, the differences between the age groups were
much smaller after weighting. The level of education of the mothers corresponded well to the
level of education in the population of women between 25 and 45 years of age (CBS, 1994). The
percentages for low, middle and high levels of education in the population are respectively 27%,
54% and 19%. The percentage of children whose mother was born abroad also corresponded to
the national percentage of 10% immigrant children in the age range from zero to ten years
(Roelandt, Roijen & Veenman, 1992).
Table 4.3
Education and Country of Birth of the Mother in the Weighted and Unweighted Norm Group

                          Education Mother         Country of Birth Mother
                          Low   Middle   High      Netherlands   Abroad
Unweighted Norm Group
  2 and 3 years           26%    57%     17%           91%          9%
  4 and 5 years           32%    51%     17%           90%         10%
  6 and 7 years           40%    45%     15%           86%         14%
  Total                   32%    51%     17%           89%         11%

Weighted Norm Group
  2 and 3 years           28%    54%     18%           89%         11%
  4 and 5 years           32%    52%     16%           89%         11%
  6 and 7 years           33%    50%     17%           87%         13%
  Total                   31%    52%     17%           89%         11%


4.3 THE STANDARDIZATION MODEL


Subtest scores
The first step in standardization is transforming the raw subtest scores to normally distributed
scores with a fixed mean and standard deviation. Usually, these transformations are carried out
separately for each age group. The disadvantage of this method, however, is that the relatively
small number of subjects in each age group allows chance factors to play an important role in
the transformations. In the SON-R 2½-7, a different method, developed for the standardization
of the SON-R 5½-17, was applied (Snijders, Tellegen & Laros, 1989, p. 43-45; Laros &
Tellegen, 1991, p. 156-157).
In this method, the score distributions for all age groups are fitted simultaneously as a
continuous function of age. This is done for each subtest separately. The function gives an
estimate, dependent on age, of the distribution of the scores in the population. With the fitting
procedure an effort is made to minimize the difference between the observed distribution and
the estimated population distribution, while limiting the number of parameters of the function.
Within the age range of the model two pre-conditions must be met:
1. For each age, the standardized score must increase if the raw score increases.
2. For each raw score, the standardized score must decrease if the age increases.
A great advantage of this method is that the use of information on all age groups simultaneously
makes the standardization much more accurate. Further, the standardized scores can be calculated on the basis of the exact age. The model also allows for extrapolation outside the age range in
which the standardization research was carried out. In the SON-R 2½-7, the model had to
comply with the pre-conditions for the age range from 2;0 to 8;0 years.

The logistic regression model


The logistic regression model is used to estimate parameters of a function in order to describe
the chance of a certain occurrence as precisely as possible. The model has the following form:
Chance(occurrence) = exp[Z]/(1+exp[Z])
Z can be a composite function of independent variables, in our case, age and score. The dependent variable is defined by determining for each person and for each possible score (in the range
from 0 to the maximum score minus 1), whether that score or a lower score was received. If this
is the case, the dependent variable is given the value 1. If this is not the case, the dependent
variable is given the value 0.
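A minimal Python sketch of the two ingredients described above: the logistic function, and the way one child's raw score is expanded into a series of 0/1 values of the dependent variable. The function names are illustrative and do not come from the manual.

```python
import math


def logistic(z):
    """Chance(occurrence) = exp(z) / (1 + exp(z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cumulative_indicators(raw_score, max_score):
    """One value per possible score s = 0 .. max_score - 1:
    1 if the child obtained score s or lower, otherwise 0."""
    return [1 if raw_score <= s else 0 for s in range(max_score)]


# A child with raw score 2 on a subtest with maximum score 5.
example = cumulative_indicators(2, 5)
```

Each child therefore contributes as many observations to the regression as there are possible scores below the maximum, and the fitted chance for a given age and score estimates the cumulative proportion of the population at or below that score.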
Because of the narrow distribution of age in each subgroup, the analysis was based on the
mean age in the subgroup. However, our model has the special characteristic that standardization does not need to be based on homogeneous age groups.
The regression procedure was carried out in two phases. In the first phase, Z was defined as
follows:
Z = b_0 + b_1 X + b_2 X^2 + b_3 X^3 + b_4 X^4 + b_5 X^5 + b_6 Y + b_7 Y^2 + b_8 Y^3
Here b_0 through b_8 are the estimated parameters, X through X^5 are powers of the raw score, and
Y through Y^3 are powers of age. When fitting the model, the procedure for logistic regression in
SPSS was used (SPSS Inc, 1990).
Using the parameters found for the third degree function of age, age was transformed in
such a manner that the relation between the transformed age and the test scores in the above mentioned model
became linear. In the following phase, Y denotes this transformed age; it was used in the regression analysis and the interaction
between score and age was added to the model. The definition of Z in this second phase was:
Z = b_0 + b_1 X + b_2 X^2 + b_3 X^3 + b_4 X^4 + b_5 X^5 + b_6 Y + b_7 Y X
+ b_8 Y X^2 + b_9 Y X^3 + b_{10} Y X^4 + b_{11} Y X^5


After the stepwise fitting procedure, the number of selected parameters in the subtests varied
from six to ten. The cumulative proportion in the population, in the age range from two to eight,
could then be estimated for every possible combination of age and score. Normally distributed
z-values were then determined by calculating the mean z-value for the normal distribution
interval that corresponded to the upper limit and the lower limit of each raw score. The averaging procedure caused a slight loss of dispersion, for which we corrected.
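The step from cumulative proportions to z-values can be illustrated as follows. The sketch assumes that the mean z-value of an interval is the mean of a standard normal variable restricted to that interval, which equals the difference of the density values at the two limits divided by the proportion in the interval. The function name is hypothetical, and the subsequent correction for the loss of dispersion is not shown.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def mean_z_for_interval(p_lower, p_upper):
    """Mean z-value of the part of the standard normal distribution that lies
    between the cumulative proportions p_lower and p_upper."""
    if not 0.0 <= p_lower < p_upper <= 1.0:
        raise ValueError("need 0 <= p_lower < p_upper <= 1")
    # The density is taken as 0 at the open ends of the distribution.
    density_lower = (
        STANDARD_NORMAL.pdf(STANDARD_NORMAL.inv_cdf(p_lower)) if p_lower > 0.0 else 0.0
    )
    density_upper = (
        STANDARD_NORMAL.pdf(STANDARD_NORMAL.inv_cdf(p_upper)) if p_upper < 1.0 else 0.0
    )
    return (density_lower - density_upper) / (p_upper - p_lower)


# Hypothetical case: 30% of the children of a given age score below a raw
# score and 45% score at or below it.
z_value = mean_z_for_interval(0.30, 0.45)
```

Because every raw score is replaced by the mean of its interval, the resulting z-values vary slightly less than a continuous normal variable would, which is the loss of dispersion mentioned above.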
This model may seem to be complicated. However, for simple linear transformations per age
group, twenty-two parameters for each subtest would have to be estimated, and in the case of
nonlinear transformations based on the cumulative proportions, more than one hundred parameters would have to be estimated.

Reliability
For each subtest and age group the reliability was calculated with the formula for lambda2
(Guttman, 1945). This is, like lambda3 (coefficient alpha; Cronbach, 1951), a measure of internal consistency. However, lambda2 is preferable if the number of items is limited, and if the
covariance between the items is not constant (Ten Berge & Zegers, 1978).
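For readers who want to see the two coefficients side by side, the following Python sketch computes both from a small matrix of item scores, using the standard textbook definitions of Guttman's coefficients. The data are invented and the function names are illustrative.

```python
import math


def covariance_matrix(scores):
    """scores: one list of item scores per person."""
    n_persons, n_items = len(scores), len(scores[0])
    means = [sum(p[i] for p in scores) / n_persons for i in range(n_items)]
    return [
        [
            sum((p[i] - means[i]) * (p[j] - means[j]) for p in scores) / (n_persons - 1)
            for j in range(n_items)
        ]
        for i in range(n_items)
    ]


def guttman_lambdas(scores):
    """Return (lambda2, lambda3); lambda3 is coefficient alpha."""
    cov = covariance_matrix(scores)
    k = len(cov)
    total_variance = sum(sum(row) for row in cov)
    item_variance = sum(cov[i][i] for i in range(k))
    squared_covariances = sum(
        cov[i][j] ** 2 for i in range(k) for j in range(k) if i != j
    )
    lambda1 = 1.0 - item_variance / total_variance
    lambda2 = lambda1 + math.sqrt(k / (k - 1) * squared_covariances) / total_variance
    lambda3 = k / (k - 1) * lambda1
    return lambda2, lambda3


# Invented 0/1 item scores of six persons on four items.
data = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [1, 1, 0, 1]]
lambda2, lambda3 = guttman_lambdas(data)
```

Lambda2 is never smaller than lambda3, and the two coincide when all covariances between the items are equal.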
The reliability for each subtest was fitted as a third degree function of the transformed age
(Y), using the method of stepwise multiple regression. In a few cases, when extrapolating to the
ages of 2;0 and 8;0, extreme values occurred for the estimate of reliability. In these cases, the
lower limit for the estimated value was set at .30 and the upper limit at .85.

Correlations and total scores


In each age group correlations between the standardized subtest scores were first corrected for
unreliability, and then fitted as a third degree function of age. Using the estimated values of the
correlations in the population, the standard deviation of the total score could be calculated for
every age and every combination of subtests, and transformed into the required standardized
distribution.
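The calculation can be sketched in Python under simplifying assumptions: the classical correction for attenuation, a standard deviation of the sum derived from the standard deviations and correlations of the subtests, and a linear rescaling to a mean of 100 and a standard deviation of 15. The correlation of .40 between all subtests is invented for the example, and all function names are hypothetical.

```python
import math


def disattenuated(r_xy, reliability_x, reliability_y):
    """Classical correction of a correlation for unreliability."""
    return r_xy / math.sqrt(reliability_x * reliability_y)


def sd_of_sum(sds, corr):
    """Standard deviation of a sum of scores with the given SDs and correlations."""
    k = len(sds)
    variance = sum(sds[i] * sds[j] * corr[i][j] for i in range(k) for j in range(k))
    return math.sqrt(variance)


def sum_to_total_score(sum_scaled, n_subtests, sd_sum):
    """Rescale a sum of subtest scores (mean 10 each) to mean 100, SD 15,
    limited to the range 50 to 150."""
    z = (sum_scaled - 10.0 * n_subtests) / sd_sum
    return min(max(100.0 + 15.0 * z, 50.0), 150.0)


# Six subtests with SD 3 and an assumed correlation of .40 between each pair.
corr = [[1.0 if i == j else 0.40 for j in range(6)] for i in range(6)]
sd_total = sd_of_sum([3.0] * 6, corr)
iq_example = sum_to_total_score(72.0, 6, sd_total)
```

With other correlations, or with a subset of the subtests, only the correlation matrix and the number of subtests change.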

4.4 THE SCALED SCORES


The scaled scores are presented in two different ways, as standard scores and as reference ages.
The standard score (also called deviation score) shows how well or how badly the child performs in relation to the population of children of the same age. The reference age (also called
mental age or test age) shows at which age 50% of the children in the population perform worse
than the subject. Unless stated otherwise, standard scores are meant when scaled scores are
mentioned in this manual. In the following section, a short explanation is given of the scaled scores
of the SON-R 2½-7.

Standard scores
Scaled subtest scores are presented on a normally distributed scale with a mean of 10 and a
standard deviation of 3. These so-called Wechsler scores have a range of 1 to 19. As a result of
floor and ceiling effects, the most extreme scores will not occur in all age groups. The raw
scores of the subtests are less differentiated than the standard scores. As a result, only some of
the values in the range of 1 to 19 are used in each age group. However, the values show the
position in the normal distribution with more precision than would be possible with a less
differentiated scale.
The sum of the six scaled subtest scores is the basis of the IQ score. This SON-IQ has a mean
of 100 and a standard deviation of 15. The range extends from 50 to 150.
The sum of the scaled scores of Mosaics, Puzzles and Patterns is transformed to provide the
Performance Scale (SON-PS), and the sum of Categories, Situations and Analogies forms the
Reasoning Scale (SON-RS). Both scales, like the IQ-distribution, have a mean of 100 and a
standard deviation of 15. The range extends from 50 to 150.
In the Appendix, the norm tables for the subtests are shown for each month of age, for the age
range 2;0 to 8;0 years. The tables for calculating the standardized total scores are presented per
four month period. When the computer program is used, all the standardized scores are based on
the exact age.

Reference age
The reference age is derived from the raw score(s). The actual age of the child is not important.
For the age range of 2;0 to 8;0 years, the reference age is presented in years and months. The
reference age for the subtests can be found in the norm tables. The reference age for the total
score is the age at which a child with these raw scores would receive an IQ score of 100. This age
is determined iteratively, with the help of the computer program, for the Total Score on the test,
the Performance Scale and the Reasoning Scale. An approximation of the reference age for the
total score is presented in the norm tables in the appendix. This approximation is based on the
sum of the six raw subtest scores.
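The iterative search for the reference age can be pictured as a bisection over age. The sketch below is not the computer program of the test: `toy_iq` is an invented stand-in for the norm model, chosen only because its IQ score decreases as age increases, and all numbers in it are made up.

```python
def toy_iq(age_years, raw_sum):
    """Hypothetical norm function: the same raw sum gives a lower IQ at a higher age."""
    expected_sum = 12.0 * age_years - 18.0  # invented expected raw sum at this age
    return 100.0 + 15.0 * (raw_sum - expected_sum) / 8.0  # invented SD of 8


def reference_age(raw_sum, iq_for=toy_iq, low=2.0, high=8.0, steps=60):
    """Age in the range 2;0 to 8;0 at which raw_sum corresponds to an IQ of 100.
    Returns None if that age falls outside the range."""
    if iq_for(low, raw_sum) < 100.0 or iq_for(high, raw_sum) > 100.0:
        return None
    for _ in range(steps):
        middle = (low + high) / 2.0
        if iq_for(middle, raw_sum) > 100.0:
            low = middle   # the score is still above average here: look at older ages
        else:
            high = middle
    return (low + high) / 2.0


def as_years_months(age_years):
    """Format an age in years in the years;months notation used in the manual."""
    years, months = divmod(round(age_years * 12), 12)
    return f"{years};{months}"


example_age = reference_age(42.0)
```

The search relies on the second pre-condition of the standardization model: for a fixed raw score, the standardized score decreases as age increases, so the age at which the score equals 100 is unique.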
For use of the norm tables and the computer program, we refer to chapter 13 (The record form,
norm tables and computer program). Directions on the procedure to be used when the test has
not been fully administered can also be found in this chapter.


PSYCHOMETRIC CHARACTERISTICS

Important psychometric characteristics of the SON-R 2½-7 will be discussed in this chapter.
These are the distribution characteristics of the scores, the reliability and generalizability of the
test, the relationship between the test scores and the stability of the scores. In general, these
results are based on the weighted norm group (N=1122). In several analyses comparisons have
been made between the results in three age groups, namely:
two- and three-year-olds (the norm groups of 2;3, 2;9, 3;3 and 3;9 years),
four- and five-year-olds (the norm groups of 4;3, 4;9, 5;3 and 5;9 years),
six- and seven-year-olds (the norm groups of 6;3, 6;9 and 7;3 years).
The results in this chapter are relevant for the internal structure of the test. Research on validity,
carried out in the norm group, will be discussed in chapter 6 (Relationships with other variables)
and in chapter 9 (Relationship with cognitive tests).

5.1 DISTRIBUTION CHARACTERISTICS OF THE SCORES


Level of difficulty of the test items
As entry and discontinuation rules are used in the SON-R 2½-7, it is important that successive
items of the subtests increase in difficulty. Table 5.1 shows the p-values of the items, calculated
over the entire norm group. The p-value represents the proportion of children who completed
the item correctly. Items skipped at the beginning of the subtest are scored as correct; items that
are not administered after discontinuation of the test are scored as incorrect.
In general, the level of difficulty of the items increased as expected. Six of the 91 items were
more difficult than the following item, but in four cases the difference in p-value was only .02.
Table 5.1
P-value of the Items (N=1122)

            Pat     Mos     Puz     Sit     Cat     Ana
item 1      .90     .95     .97     .95     .91     .96
item 2      .88*    .81     .90     .91     .89     .93
item 3      .90     .77     .89     .87     .89     .84*
item 4      .88     .76     .79     .86     .82     .86
item 5      .86     .73     .76     .80     .75     .73
item 6      .79     .70     .72     .67     .69     .52*
item 7      .77     .64     .64     .56     .64     .58
item 8      .62     .58     .59     .54     .51     .57
item 9      .60     .46     .37*    .46     .49     .45
item 10     .43     .33     .44     .32     .33     .28
item 11     .33     .23     .25     .17     .30     .28
item 12     .30     .14     .19     .12     .17     .23
item 13     .21     .08*    .13     .07     .10     .15
item 14     .20     .10     .05     .06     .09     .13
item 15     .13     .06                     .05     .04*
item 16     .04                                     .06
item 17                                             .04

*: the p-value is lower than the p-value of the following item


For two items, item 9 of Puzzles and item 6 of Analogies, the difference was larger. The six
deviating items are marked with an asterisk in table 5.1.
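The check on the order of difficulty can be reproduced with a short Python sketch that uses the p-values as listed in table 5.1; the variable and function names are illustrative.

```python
P_VALUES = {
    "Patterns":   [.90, .88, .90, .88, .86, .79, .77, .62, .60, .43, .33, .30, .21, .20, .13, .04],
    "Mosaics":    [.95, .81, .77, .76, .73, .70, .64, .58, .46, .33, .23, .14, .08, .10, .06],
    "Puzzles":    [.97, .90, .89, .79, .76, .72, .64, .59, .37, .44, .25, .19, .13, .05],
    "Situations": [.95, .91, .87, .86, .80, .67, .56, .54, .46, .32, .17, .12, .07, .06],
    "Categories": [.91, .89, .89, .82, .75, .69, .64, .51, .49, .33, .30, .17, .10, .09, .05],
    "Analogies":  [.96, .93, .84, .86, .73, .52, .58, .57, .45, .28, .28, .23, .15, .13, .04, .06, .04],
}


def deviating_items(p_values):
    """Items whose p-value is lower than that of the following item,
    as (item number, difference in p-value) pairs."""
    return [
        (i + 1, round(p_values[i + 1] - p_values[i], 2))
        for i in range(len(p_values) - 1)
        if p_values[i] < p_values[i + 1]
    ]


deviations = {subtest: deviating_items(p) for subtest, p in P_VALUES.items()}
n_items = sum(len(p) for p in P_VALUES.values())
n_deviating = sum(len(d) for d in deviations.values())
```

Items with equal p-values, such as items 2 and 3 of Categories, are not counted as deviating.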

IRT model
As in the construction research, the item characteristics for the definitive test were estimated
with the 2-parameter model from item response theory. The computer program BIMAIN
(Zimowski et al., 1994) was used for these calculations. This program does not require all
subjects to have completed all the items. The two item parameters estimated for the items of
each subtest are the a-parameter and the b-parameter. The a-parameter shows how well the item
discriminates and the b-parameter shows how difficult the item is. To obtain a reliable estimate
of the item parameters, the analysis was carried out on the test results of 2498 children, almost
all the children who were tested during the standardization and the validation research. The
estimate is based on the items administered de facto.
In figure 5.1 the item characteristics are represented in a graph. The distribution of the b-parameters is similar to the results obtained on the basis of the p-values. Except for a few small
deviations, the items increase in difficulty. The difficulty of the items is also distributed evenly
over the range from -2 to +2.
The mean of the discrimination parameter is highest for Patterns (mean=4.8) and Mosaics
(mean=3.8). For Puzzles, Situations, Categories and Analogies, the means are 2.8, 2.4, 2.9 and
2.2 respectively. Within the subtests, however, the discrimination values of the items can diverge
strongly.
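To give an impression of what these discrimination values mean, the item response function of the 2-parameter model can be sketched in Python. The sketch assumes the plain logistic form without an additional scaling constant; whether the reported parameters are expressed in exactly this metric is not stated here, so the numbers are illustrative only.

```python
import math


def probability_correct(theta, a, b):
    """2-parameter logistic model: chance that a person with ability theta
    solves an item with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Two items of equal difficulty (b = 0), with discrimination values equal to
# the subtest means reported for Patterns (4.8) and Analogies (2.2).
abilities = [-1.0, -0.5, 0.0, 0.5, 1.0]
steep_item = [probability_correct(t, 4.8, 0.0) for t in abilities]
flat_item = [probability_correct(t, 2.2, 0.0) for t in abilities]
```

At an ability equal to the difficulty both items give a chance of .50, but the item with the higher a-parameter separates persons just below and just above that point much more sharply.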
Initially, we considered basing the scoring and standardization of the SON-R 2½-7 on the
estimated latent abilities as represented in the IRT model. A good method for doing this with
incomplete test data was described by Warm (1989). Such a method of scoring has important
advantages: items which clearly discriminate have more weight in the evaluation, no assumptions need to be made about scores on items that were not administered, and the precision of
statements about the ability of a person can be shown more clearly. However, the disadvantages
are that this scoring method can only be done with a computer, and that important differences
can occur between the standardized computer results and the results obtained with norm tables.
The main factor in the decision not to apply the IRT model when standardizing the test, however, was the fact that the data did not fit the model. This is not surprising. The IRT model
assumes that the item scores are obtained independently. However, the feedback and the help
given with the SON-R 2½-7 create an interdependence among the scores. This works out
positively for the test and its validity, but it limits the psychometric methods that can be applied
successfully. IRT models that take learning effects into account are being developed (see
Verhelst & Glas, 1995), but programs with which the item parameters can be estimated in
combination with an adaptive test administration, are not yet available.

Correlation of test performances with age


Table 5.2 presents, for each age group, the mean and the standard deviation of the raw subtest
scores and of the sum of the six subtest scores. The mean score increases with age for all
subtests. The sum of the raw subtest scores increases by about nine points per half year in the
youngest age groups, and by about five points per half year in the oldest age groups.
The strong correlation with age is also evident from the high correlations between the subtest
scores and age. The multiple correlation of age and the square of age with the subtest scores has
a mean of .87 and varies from .80 (Analogies) to .91 (Patterns). For the other subtests, the
correlations are .85 (Situations), .88 (Categories), .89 (Puzzles) and .90 (Mosaics). For the sum
score, the multiple correlation with age is .93. Because of the large increase in test performance
with age, the norm tables were constructed for each month of age.

Distribution of the standardized scores


The subtest scores, standardized and normalized for age, are presented on a scale of 1 to 19 with
a mean of 10 and a standard deviation of 3. The sum of the six standardized subtest scores is
presented on a scale with a mean of 100 and a standard deviation of 15. This score, the SON-IQ,


Figure 5.1
Plot of the Discrimination (a) and Difficulty (b) Parameter of the Items
[Figure: one panel per subtest (Patterns, Mosaics, Puzzles, Situations, Categories, Analogies). In each panel the item numbers are plotted with the difficulty parameter b on the horizontal axis (-3 to 2) and the discrimination parameter a on the vertical axis.]


Table 5.2
Mean and Standard Deviation of the Raw Scores

         Pat          Mos          Puz          Sit          Cat          Ana          Sum Subt.
Age      Mean (SD)    Mean (SD)    Mean (SD)    Mean (SD)    Mean (SD)    Mean (SD)    Mean (SD)

2;3       1.1 (1.5)    1.3 (0.9)    2.0 (1.0)    1.6 (1.7)    1.2 (1.6)    1.8 (1.4)    9.0  (5.3)
2;9       3.8 (2.3)    1.8 (1.1)    2.7 (1.3)    3.6 (2.2)    2.8 (2.0)    3.5 (1.9)   18.1  (6.8)
3;3       5.7 (1.8)    3.3 (2.1)    4.3 (2.0)    5.1 (1.8)    4.7 (1.9)    5.1 (2.0)   28.3  (8.4)
3;9       7.1 (1.3)    5.3 (2.2)    6.3 (1.9)    6.2 (1.6)    6.1 (1.9)    6.2 (2.1)   37.3  (7.6)
4;3       8.4 (1.3)    7.3 (1.9)    7.7 (1.7)    7.3 (1.7)    7.6 (1.9)    6.9 (2.1)   45.2  (7.2)
4;9       9.5 (1.5)    8.3 (1.8)    8.4 (2.0)    7.8 (1.7)    8.5 (2.0)    8.2 (2.1)   50.7  (7.9)
5;3      10.4 (1.6)    9.0 (1.5)    9.4 (1.7)    8.4 (1.7)    8.6 (1.7)    8.4 (2.4)   54.3  (6.7)
5;9      11.2 (1.9)    9.8 (1.8)   10.0 (2.0)    9.1 (1.7)    9.9 (1.9)    9.6 (2.6)   59.8  (8.6)
6;3      12.9 (2.0)   11.1 (2.0)   11.0 (1.9)   10.2 (1.8)   11.0 (1.7)   10.5 (3.3)   66.6  (9.1)
6;9      13.2 (1.8)   11.4 (1.9)   11.2 (1.5)   10.5 (1.7)   11.2 (1.9)   11.3 (3.1)   68.8  (8.3)
7;3      14.1 (1.7)   12.2 (2.0)   11.5 (1.5)   11.1 (1.6)   11.9 (1.6)   12.7 (3.0)   73.6  (8.3)

Total     8.9 (4.3)    7.4 (4.1)    7.7 (3.7)    7.4 (3.4)    7.6 (3.9)    7.7 (4.0)   46.5 (21.7)

ranges from 50 to 150. A distribution with a mean of 100 and a standard deviation of 15 is also
used for the Performance Scale (SON-PS), based on the sum of the scores of Mosaics, Puzzles
and Patterns, and for the Reasoning Scale (SON-RS), based on the sum of the scores of Categories, Analogies and Situations.
In table 5.3, the mean and the standard deviation of the standardized scores are presented for
the entire weighted norm group and for three age groups. Only very small deviations from the
planned distribution were found for the entire group. No significant deviations from the normal
distribution were found in tests for skewness and kurtosis. Deviations in mean and dispersion
sometimes differed slightly across the three separate age groups, but an analysis of variance
showed that the differences between the means were not significant. A test for the homogeneity
of the variances also failed to show any significant differences. The kurtosis was not significant
in the different groups. The distribution was positively skewed for Puzzles and for the
Reasoning Scale in the oldest group. However, the values for skewness were small, .4 and .3,
respectively. A variance analysis was also carried out over the eleven original age groups. No
significant differences in mean and variance between the groups were established for any of the
variables.
Table 5.3
Distribution Characteristics of the Standardized Scores in the Weighted Norm Group

               Total           2-3 years       4-5 years       6-7 years
               Mean (SD)       Mean (SD)       Mean (SD)       Mean (SD)

Patterns        10.0  (2.9)      9.9  (2.8)     10.0  (2.9)     10.1  (3.1)
Mosaics         10.1  (3.0)     10.0  (3.0)     10.0  (3.1)     10.2  (3.0)
Puzzles         10.0  (3.0)     10.0  (2.9)     10.0  (3.0)     10.1  (3.0)
Situations      10.0  (2.9)     10.0  (2.8)      9.9  (3.1)     10.0  (2.8)
Categories      10.0  (2.9)     10.0  (2.9)     10.0  (3.0)     10.1  (2.9)
Analogies       10.0  (2.9)     10.0  (2.7)     10.0  (3.0)      9.8  (3.1)

SON-PS         100.2 (15.1)    100.1 (15.2)     99.9 (15.0)    100.6 (15.2)
SON-RS          99.9 (15.0)    100.1 (14.5)    100.0 (15.6)    100.0 (14.9)

SON-IQ         100.1 (15.0)    100.1 (14.8)     99.9 (15.2)    100.5 (15.0)


Table 5.4
Floor and Ceiling Effects at Different Ages

Floor Effect (lowest possible standardized score)

Age     Pat   Mos   Puz   Sit   Cat   Ana     PS    RS    IQ
2;0       9     6     4     8     9     7     70    86    73
2;3       8     6     3     7     8     6     68    80    68
2;6       6     5     3     5     7     5     62    72    63
2;9       4     4     2     4     5     3     52    61    51
3;0       3     3     2     3     3     2     52    52    50
3;3       1     2     1     2     2     1     50    50    50
3;6       1     1     1     1     1     1     50    50    50

Ceiling Effect (highest possible standardized score)

Age     Pat   Mos   Puz   Sit   Cat   Ana     PS    RS    IQ
5;0      19    19    19    19    19    19    150   150   150
5;6      19    19    19    19    18    19    150   150   150
6;0      18    18    18    18    18    19    149   150   150
6;6      16    17    17    17    17    18    141   149   150
7;0      15    16    16    16    16    17    137   140   143
7;6      14    15    16    16    16    16    132   138   139
8;0      13    14    15    16    15    15    126   134   133

These results indicate that the standardization model is adequate and gives a good estimate of
the distribution of the scores in the population; the deviations in the samples can be seen as
chance deviations from the population values resulting from sample fluctuations.

Floor and ceiling effects


Although the standardization of the subtest scores was based on a distribution with a range from
1 to 19, these scores could not be obtained in all age groups. The youngest children had raw
scores of zero so often that the standardized scores were substantially higher than 1. This means
that, at this age, the test differentiates less for children with a low performance level. The first
part of table 5.4 presents, for a few age ranges, the standardized scores in the situation where the
child receives no positive scores. In the age range 2;0 to 2;6 years, considerable floor effects
can be seen. From 2;9 years onwards these effects are much smaller. The lowest possible
standard subtest scores are about two standard deviations below the mean of 10 and the lowest
IQ score that can occur is 51. From 3;6 years onwards, no floor effects occur.
In the second part of table 5.4 the standardized scores are presented for the situation in which
all the items are done correctly. From the age of about 6;0 onwards, small ceiling effects can be
observed. From 7;0 years onwards, these effects become more important and the maximum IQ
score of 150 can no longer be reached.

5.2 RELIABILITY AND GENERALIZABILITY


Reliability of the subtests
The reliability of the subtests is based on the internal consistency of the item scores. The
reliability was calculated using the formula for lambda2 (Guttman, 1945). However, this and
similar formulas for internal consistency assume that the item scores are obtained
independently. The sequence in which the items are administered should therefore have no
effect on the scores. In the case of the SON-R 2½-7, this condition is not fulfilled for
two reasons. First, the entry and discontinuation rules mean that scores on some items
determine whether other items are or are not administered. The latter items are, however,
scored as correct or incorrect. When item scores become interdependent in this way, reliability
is inflated. In the case of the SON-R 5½-17, where this was investigated, the mean overestimation
of the reliability of the subtests as a result of the adaptive procedure was .11 (Snijders, Tellegen
& Laros, 1989, pp. 46-51). The item scores are not independent for a second reason. After every
item that a child cannot solve independently, extensive help and feedback are given. This often
leads to the next, more difficult item being solved correctly. These inconsistencies, which have
a valid cause, lead to an underestimation of reliability.
The net effect of the underestimation of reliability (as a result of valid inconsistencies) on
the one hand, and the overestimation of reliability (as a result of artificial consistencies) on the
other hand, cannot be determined. Therefore, the reliability of the subtests of the SON-R
2½-7 was based on the formulas for internal consistency, and no correction for under- or
overestimation was applied. The uncertainty about the correctness of the estimate of reliability is a
reason to be cautious about the individual interpretation of results at the subtest level. It was also
the reason why the standardized subtest scores were not presented, as was done with the
SON-R 5½-17, in such a way that the reliability was taken into account in the score.
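
As an illustration of the internal-consistency estimate mentioned above, the following minimal sketch computes Guttman's lambda2 from a matrix of item scores. The function name and the small data set are hypothetical and serve only as an example; the sketch makes no correction for the entry and discontinuation rules or for the feedback effects discussed in the text.

```python
from statistics import pvariance


def guttman_lambda2(items):
    """Guttman's lambda2 for a list of per-child item-score lists.

    Each inner list holds the scores of one child on the same k items.
    """
    n = len(items)
    k = len(items[0])
    cols = [[row[j] for row in items] for j in range(k)]
    means = [sum(c) / n for c in cols]

    def cov(a, b):
        # Population covariance between items a and b.
        return sum((cols[a][i] - means[a]) * (cols[b][i] - means[b])
                   for i in range(n)) / n

    total_var = pvariance([sum(row) for row in items])
    item_var_sum = sum(cov(j, j) for j in range(k))
    # Sum of squared covariances over all ordered pairs of different items.
    off_diag_sq = sum(cov(a, b) ** 2
                      for a in range(k) for b in range(k) if a != b)
    lambda1 = 1 - item_var_sum / total_var
    return lambda1 + (k / (k - 1) * off_diag_sq) ** 0.5 / total_var


# Hypothetical item scores (1 = correct, 0 = incorrect) for five children.
scores = [[1, 1, 0], [1, 0, 0], [0, 0, 0], [1, 1, 1], [0, 1, 0]]
print(round(guttman_lambda2(scores), 2))
```

Lambda2 is never lower than coefficient alpha computed on the same data, which is one reason it is sometimes preferred as a lower bound to reliability.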
Table 5.5
Reliability, Standard Error of Measurement and Generalizability of the Test Scores

Reliability

Age    Pat  Mos  Puz  Sit  Cat  Ana   Mean    PS   RS   IQ

2;6    .79  .41  .45  .79  .81  .75    .67   .68  .89  .86
3;6    .73  .76  .75  .66  .73  .73    .73   .86  .84  .90
4;6    .72  .77  .75  .62  .70  .74    .72   .88  .81  .90
5;6    .74  .74  .70  .62  .68  .78    .71   .87  .81  .90
6;6    .76  .78  .69  .66  .68  .83    .73   .87  .84  .91
7;6    .79  .84  .69  .69  .69  .85    .76   .88  .86  .92

Mean   .75  .73  .69  .67  .71  .78    .72   .85  .84  .90

Standard Error of Measurement

Age    Pat  Mos  Puz  Sit  Cat  Ana           PS   RS   IQ

2;6    1.4  2.3  2.2  1.4  1.3  1.5          8.5  5.0  5.6
3;6    1.6  1.5  1.5  1.7  1.6  1.6          5.6  6.1  4.7
4;6    1.6  1.5  1.5  1.9  1.7  1.5          5.3  6.6  4.7
5;6    1.5  1.5  1.6  1.8  1.7  1.4          5.4  6.5  4.7
6;6    1.5  1.4  1.7  1.8  1.7  1.2          5.4  6.0  4.5
7;6    1.4  1.2  1.7  1.7  1.7  1.2          5.2  5.5  4.2

Generalizability               Standard Error of Estimation

Age     PS   RS   IQ           Age      PS    RS    IQ

2;6    .45  .74  .71           2;6    11.1   7.7   8.0
3;6    .67  .66  .77           3;6     8.7   8.8   7.1
4;6    .77  .57  .78           4;6     7.3   9.8   7.1
5;6    .78  .56  .78           5;6     7.0   9.9   7.0
6;6    .75  .63  .80           6;6     7.5   9.1   6.7
7;6    .71  .71  .82           7;6     8.1   8.1   6.4

Mean   .69  .64  .78


The calculated values of lambda2 have been fitted in the standardization model as a function of
age. The results for a number of ages are presented in table 5.5. The mean reliability of the
subtests is .72; it increases, though not regularly, with age. Very low reliabilities were found for
Mosaics and Puzzles at the age of 2;6 years. A learning effect may occur with these subtests at
a young age when help is offered, and this may result in an underestimation of reliability.
In the second part of table 5.5 the standard errors of measurement are presented. The standard
error of measurement is the standard deviation of the standardized scores that would be
obtained by an individual child if the subtest could be administered to him or her many times. It
indicates how strongly the test results of a child can fluctuate. Section 13.4 describes how to use
the standard error of measurement to test the differences between scores statistically.
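
The relation between reliability and the standard error of measurement can be sketched as follows. The sketch assumes the usual formula SD times the square root of (1 - reliability), with a standard deviation of 3 for the subtest scores and 15 for the IQ; these assumptions are consistent with the values in table 5.5. The function name is illustrative.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Standard error of measurement: sd * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


# Mosaics at 2;6 years: reliability .41, assumed SD of 3.
print(round(standard_error_of_measurement(3, 0.41), 1))
# SON-IQ at 2;6 years: reliability .86, SD of 15.
print(round(standard_error_of_measurement(15, 0.86), 1))
```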

Reliability of the total scores


The reliability of the Performance Scale, the Reasoning Scale and the SON-IQ was calculated
using the formula for stratified alpha. This is a formula for the reliability of linear combinations
(Cronbach, Schönemann & McKie, 1965; Nunnally, 1978, pp. 246-250). The reliability of the IQ
score had a mean of .90. Reliability increased with age, from .86 at 2;6 years to .92 at 7;6 years.
The standard error of measurement of the IQ decreased from 5.6 at 2;6 years to 4.2 at 7;6 years
(see table 5.5).
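
A minimal sketch of stratified alpha for a sum of subtest scores is given below. As a simplifying assumption it uses the mean subtest reliabilities from table 5.5, a standard deviation of 3 for every subtest, and one uniform correlation of .36 between all subtests (the mean correlation reported in section 5.3). The actual calculations were based on the observed covariances, so the outcome is only indicative.

```python
def stratified_alpha(reliabilities, cov):
    """Reliability of an unweighted sum of components.

    reliabilities: reliability of each component.
    cov: covariance matrix of the components (list of lists).
    """
    total_var = sum(sum(row) for row in cov)
    # Error variance of each component is its variance times (1 - reliability).
    error_var = sum(cov[i][i] * (1 - r) for i, r in enumerate(reliabilities))
    return 1 - error_var / total_var


# Mean subtest reliabilities (Pat, Mos, Puz, Sit, Cat, Ana) from table 5.5.
rels = [.75, .73, .69, .67, .71, .78]
# Assumed covariance matrix: SD 3 per subtest, uniform correlation .36.
cov = [[9.0 if i == j else 0.36 * 9.0 for j in range(6)] for i in range(6)]
print(round(stratified_alpha(rels, cov), 2))
```

Under these assumptions the result is close to the mean reliability of .90 reported for the IQ score.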
The mean reliability of the Performance Scale was .85 and the mean reliability of the
Reasoning Scale .84. In general, the reliability of the Performance Scale was higher. The
youngest children formed an exception. In this group, the reliability of the Reasoning Scale was
clearly higher than the reliability of the Performance Scale.
The scores on the Performance Scale and the Reasoning Scale were strongly correlated.
In the entire norm group the correlation was .56. In the age groups two and three, four and
five, and six and seven years, the correlations were .52, .55 and .61 respectively. The
correlation between the two scales lowers the reliability of the difference between the
Performance Scale and the Reasoning Scale. The mean reliability of the difference score was
.65. The minimum difference between the two scores required for significance at the 1% and
5% levels is shown in the norm tables.
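
The effect of the correlation between the two scales on the reliability of their difference can be illustrated with the standard formula for a difference score. The sketch assumes equal variances for both scales; the function name is illustrative.

```python
def difference_reliability(r_xx, r_yy, r_xy):
    """Reliability of the difference X - Y for two scales with equal variances.

    r_xx, r_yy: reliabilities of the two scales.
    r_xy: correlation between the two scales.
    """
    return ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)


# Mean reliabilities of the Performance Scale (.85) and the Reasoning
# Scale (.84), and their correlation in the entire norm group (.56).
print(round(difference_reliability(0.85, 0.84, 0.56), 2))
```

With the mean values from the text this reproduces the reported reliability of .65 for the difference score.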

Generalizability of the total scores


The generalizability of the IQ and the two scale scores was also determined. This shows how
well one can generalize, on the basis of the selected subtests, to the total domain of comparable
subtests. The generalizability was calculated using the formula for coefficient alpha with subtest
scores instead of item scores as the unit of analysis. For homogeneous (sub)tests, alpha, as a
measure of internal consistency, can also be used as an estimate of reliability. However,
coefficient alpha has a different meaning for a total score based on subtest scores, each of which has
its own specific reliable variance. In this case, it can be interpreted as a measure of
generalizability. The six subtests of the SON-R 2½-7 can be considered a sample from the domain of
similar nonverbal subtests. Alpha represents the expected correlation of the IQ score with the
total score on a different, equally sized combination of subtests from the domain. The square root
of alpha is the correlation of the IQ score with the hypothetical test score that would be expected
if a large number of similar nonverbal subtests had been administered. The same applies to the
Performance Scale and the Reasoning Scale. However, here the domain of subtests is limited to
similar performance or reasoning tests.
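
To illustrate how generalizability depends only on the number of subtests and their intercorrelations, the sketch below computes coefficient alpha from the mean correlation between standardized subtests. This is a simplification that assumes equal subtest variances; the mean correlations used are those reported for the three age groups in section 5.3.

```python
import math


def alpha_from_mean_correlation(k, mean_r):
    """Coefficient alpha for k equally weighted, standardized components."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Mean correlations between the six subtests in the youngest, middle
# and oldest age groups (section 5.3).
for mean_r in (0.33, 0.37, 0.40):
    alpha = alpha_from_mean_correlation(6, mean_r)
    # sqrt(alpha) estimates the correlation with the domain score.
    print(round(alpha, 2), round(math.sqrt(alpha), 2))
```

The outcomes are of the same order as the generalizability coefficients of the IQ in table 5.5, and they rise with age because the correlations between the subtests rise.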
The mean generalizability coefficient (α) of the SON-IQ was .78. It increased from .71 at 2;6
years to .82 at 7;6 years. The mean generalizability for the Performance Scale was .69
(relatively high for the middle age groups) and for the Reasoning Scale .64 (relatively high for the
extreme age groups).
In table 5.5 the standard errors of estimation, based on the generalizability coefficient, are
also presented. The standard error of estimation for the IQ represents the standard deviation of
the distribution of IQ scores of all subjects with the same SON-IQ that would be found if a large
number of subtests were administered. The greater the dispersion, the less accurate are the
statements about the level of intelligence based on these test results. The standard error of
estimation was used to construct the interval in which the domain score will, with a certain
probability, be found. This interval is not situated symmetrically around the given score. When
the point of departure is the distribution of the scores in the norm population, the middle of the
interval equals 100 + √α (IQ - 100). The standard error of estimation equals 15√(1 - α). In the
norm tables, this interval is presented for each IQ score with a latitude of 1.28 times the standard
error of estimation. This means that the probability that the domain score is in the interval is
80%. When using the computer program, these intervals are also presented for the Performance
Scale and the Reasoning Scale.
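
The construction of the 80% interval can be sketched as follows. The sketch assumes that the interval is centred on 100 + √α (IQ - 100) and that the standard error of estimation equals 15√(1 - α); the latter reproduces the standard errors of estimation in table 5.5. The function name and the example score are illustrative, and the norm tables remain the authoritative source for the intervals.

```python
import math


def domain_score_interval(iq, alpha, z=1.28, mean=100.0, sd=15.0):
    """Return (lower, centre, upper) of the interval for the domain score.

    alpha: generalizability coefficient for the child's age.
    z: 1.28 gives an interval with a probability of about 80%.
    """
    centre = mean + math.sqrt(alpha) * (iq - mean)
    see = sd * math.sqrt(1.0 - alpha)
    return centre - z * see, centre, centre + z * see


# Hypothetical example: SON-IQ of 120 with the mean generalizability of .78.
lower, centre, upper = domain_score_interval(120, 0.78)
print(round(lower, 1), round(centre, 1), round(upper, 1))
```

Because the centre is regressed towards 100, the interval for a score above the mean extends further below the observed score than above it.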
For individual assessments, the interval gives a good indication of the accuracy with which a
statement, based on the test results, can be made about the level of intelligence. The interval is
broader than the intervals that are based, as is customary, on the reliability of the test. When
interpreting the results of an intelligence test, one will, in general, not want to limit oneself to
the specific abilities included in the test. The interval, based on generalizability, takes into
account the facts that the number of items per subtest is necessarily limited, and that the choice
of the subtests also denotes a limitation.
Given the problems in correctly determining the reliability of the subtests of the SON-R
2½-7, it is fortunate that the calculation of the generalizability of the total scores depends
exclusively on the number of subtests and the strength of the correlations between the subtests,
and not on the reliability of the subtests.

Comparison with the Preschool SON and the SON-R 5½-17
The reliability and generalizability of the IQ score of the SON-R 2½-7 were compared with the
previous version of the test, the Preschool SON, and with the revision of the SON for older
children, the SON-R 5½-17. In the manual for the Preschool SON, reliabilities based on
calculations over combined age groups were presented. Combining age groups leads to a
considerable overestimation of reliability. Therefore, new calculations were carried out on the
original standardization material, and the reliability and the generalizability for homogeneous
age groups were determined (Tellegen et al., 1992).
The reliability and the generalizability of the SON-R 2½-7 were greatly improved with
respect to the Preschool SON. This is especially so for the more extreme age groups. However,
an improvement can also be seen for the four-year-olds, for whom the reliability and
generalizability of the old Preschool SON were highest (table 5.6).
Table 5.6
Reliability and Generalizability of the IQ Score of the Preschool SON, the SON-R 2½-7 and the
SON-R 5½-17

              Reliability                  Generalizability
Age           SON-R     SON-R              P-SON    SON-R     SON-R
              2½-7      5½-17                       2½-7      5½-17

2;6 years      .86                          .54      .71
3;6 years      .90                          .69      .77
4;6 years      .90                          .74      .78
5;6 years      .90       .90                .71      .78       .79
6;6 years      .91       .92                .62      .80       .81
7;6 years      .92       .93                .52      .82       .83

P-SON reliability: .78, .86, .82 (age assignment unclear)


In comparison with the SON-R 5½-17, the results of similar age groups for reliability and
generalizability are practically the same. However, for the total age range of the SON-R
5½-17, the mean reliability (.93) and the generalizability (.85) are higher than for the
SON-R 2½-7.

5.3 RELATIONSHIPS BETWEEN THE SUBTEST SCORES


The relationship between the test scores was examined using the correlations between the
subtests and the correlations of each subtest with the sum of the remaining subtests.

Correlations between the subtests


The correlations between the standardized subtest scores for the entire norm group and for three
age groups are presented in table 5.7. The mean correlation in the entire group was .36. The
strongest correlations were found between Patterns and Mosaics (.50) and between Puzzles and
Mosaics (.45); the weakest correlations were those of Categories and Analogies with Puzzles
(.30 and .28) and of Analogies with Situations (.31).
The mean correlations increased with age. In the youngest group the mean was .33, in the
middle group .37, and in the oldest group .40. If we compare the oldest and youngest age
groups, nearly all correlations appear to increase. The exception to the rule is Categories; the
correlation of Categories with Patterns increased, but the correlations with the other four subtests decreased.
The increase in the correlations with age corresponds to the findings with the SON-R 5½-17.
Here the mean correlation in the age range 6;6 to 14;6 years increased from .38 to .51. The mean
correlation with the SON-R 5½-17 for the six- and seven-year-olds was .39, almost equal to the
mean correlation of .40 in the same age group with the SON-R 2½-7.
With the SON-R 2½-7, as with the SON-R 5½-17, the correlation between the performances
on the different subtests increased with age. This also increased the reliability and
generalizability of the SON-IQ for the older age groups.
Table 5.7
Correlations Between the Subtests

Age: 2-7 years
       Pat   Mos   Puz   Sit   Cat   Ana
Pat
Mos    .50
Puz    .39   .45
Sit    .35   .36   .34
Cat    .35   .36   .30   .39
Ana    .34   .37   .28   .31   .39

Age: 2-3 years
       Pat   Mos   Puz   Sit   Cat   Ana
Pat
Mos    .36
Puz    .24   .39
Sit    .33   .30   .31
Cat    .32   .39   .31   .51
Ana    .28   .31   .22   .29   .45

Age: 4-5 years
       Pat   Mos   Puz   Sit   Cat   Ana
Pat
Mos    .60
Puz    .50   .49
Sit    .33   .34   .32
Cat    .36   .34   .32   .33
Ana    .32   .40   .28   .28   .33

Age: 6-7 years
       Pat   Mos   Puz   Sit   Cat   Ana
Pat
Mos    .56
Puz    .44   .47
Sit    .41   .48   .38
Cat    .37   .33   .26   .33
Ana    .43   .39   .35   .36   .41


Table 5.8
Correlations of the Subtests with the Rest Total Score and the Square of the Multiple
Correlations

Correlation with Rest Total

       2-7 years   2-3   4-5   6-7

Pat       .56      .44   .61   .63
Mos       .59      .52   .63   .63
Puz       .50      .42   .55   .53
Sit       .49      .51   .45   .54
Cat       .51      .59   .47   .46
Ana       .47      .45   .45   .54

Square of the Multiple Correlation

       2-7 years   2-3   4-5   6-7

Pat       .33      .20   .43   .41
Mos       .37      .28   .45   .43
Puz       .27      .20   .33   .30
Sit       .25      .31   .20   .30
Cat       .27      .40   .23   .24
Ana       .24      .24   .22   .30

Correlation with the total score


The correlation of the subtests with the total score was examined by calculating the correlation
with the unweighted sum of the five remaining subtests and the square of the multiple correlation of a subtest with the five remaining subtests (table 5.8). The latter indicates the proportion
of variance explained by the optimally weighted combination of the other subtests.
For the entire norm group, Patterns and Mosaics correlated most strongly with the remaining
total score. However, this was not the case in the youngest age group. For the two- to
three-year-olds, Categories had the strongest correlation with the remaining total score (.59), but
for the six- to seven-year-olds this correlation decreased to .46. In this age range, Categories had
the weakest correlation with the remaining subtests.
About 70% of the variance of each subtest could not be explained by the scores on the other
subtests. This is partially explained by the unreliability of the subtests. However, it also
indicates that a substantial part of the reliable variance of each subtest is specific. The importance of
the subtest-specific reliable variance decreased as the children grew older.
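
The correlation of a subtest with the unweighted sum of the remaining subtests can be derived from the correlation matrix alone when all subtests have the same standard deviation. The sketch below does this for the entire norm group, using the rounded correlations from table 5.7; it does not compute the squared multiple correlation, which would require inverting the matrix. All names are illustrative.

```python
import math

SUBTESTS = ["Pat", "Mos", "Puz", "Sit", "Cat", "Ana"]

# Rounded correlations for the entire norm group, read from table 5.7.
PAIRS = {
    ("Pat", "Mos"): .50, ("Pat", "Puz"): .39, ("Pat", "Sit"): .35,
    ("Pat", "Cat"): .35, ("Pat", "Ana"): .34, ("Mos", "Puz"): .45,
    ("Mos", "Sit"): .36, ("Mos", "Cat"): .36, ("Mos", "Ana"): .37,
    ("Puz", "Sit"): .34, ("Puz", "Cat"): .30, ("Puz", "Ana"): .28,
    ("Sit", "Cat"): .39, ("Sit", "Ana"): .31, ("Cat", "Ana"): .39,
}


def corr(a, b):
    """Look up the correlation between two subtests."""
    if a == b:
        return 1.0
    return PAIRS[(a, b)] if (a, b) in PAIRS else PAIRS[(b, a)]


def rest_total_correlation(subtest):
    """Correlation of one subtest with the sum of the five others."""
    rest = [s for s in SUBTESTS if s != subtest]
    covariance = sum(corr(subtest, s) for s in rest)
    rest_variance = sum(corr(a, b) for a in rest for b in rest)
    return covariance / math.sqrt(rest_variance)


for name in SUBTESTS:
    print(name, round(rest_total_correlation(name), 2))
```

The outcomes agree with the first column of table 5.8 to within rounding error, since the input correlations are themselves rounded to two decimals.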

5.4 PRINCIPAL COMPONENTS ANALYSIS


In order to determine how many dimensions can be distinguished meaningfully when
interpreting the test results, a Minimum Rank Factor Analysis was first carried out for the entire
norm group (Ten Berge & Kiers, 1991). This method was used to determine how many
factors were required to explain the common variance of the variables. One factor explained
87% of the common variance, two factors explained 97% and three factors explained 100%.
The third factor added little to the explained variance. After rotation, only one subtest
had a high loading on the third factor. As a result, further analyses were based on a solution
with two factors.
In the first part of table 5.9 the results of the Principal Components Analysis for the entire
norm group and for the three age groups are presented. In the entire norm group, 60% of the
total variance is explained by the first two components. The percentage increases slightly, to
64%, in the age groups. The total variance includes the subtest-specific reliable variance and the
error of measurement variance of the subtests. Therefore, the percentages of explained variance
are lower than for the minimum rank factor analysis that determines which part of the common
variance is explained.
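
In a Principal Components Analysis of standardized variables, the percentage of total variance explained by a component is its eigenvalue divided by the number of variables. The minimal sketch below applies this to the eigenvalues reported in table 5.9 for the entire norm group; the function name is illustrative.

```python
def explained_percentage(eigenvalue, n_variables=6):
    """Percentage of total variance explained by one principal component."""
    return 100 * eigenvalue / n_variables


# Eigenvalues of the first two components in the entire norm group.
first = explained_percentage(2.8)
second = explained_percentage(0.8)
print(round(first), round(second), round(first + second))
```

With six subtests, eigenvalues of 2.8 and .8 correspond to 47% and 13%, which together give the 60% mentioned in the text.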
In the entire norm group the loadings on the rotated components showed a clear distinction
between the performance subtests (Patterns, Mosaics and Puzzles) and the reasoning subtests
(Situations, Categories and Analogies). This distinction was also seen in the middle age group.
In the youngest group, however, Patterns had an equally high loading on both components,
whereas in the oldest group, Situations, like the performance tests, had its highest loading on the
first component.


To determine how important the differences in loadings between the three age groups were, a
Simultaneous Components Analysis was carried out on these data sets (Millsap & Meredith,
1988; Kiers & Ten Berge, 1989). This was done to examine whether a uniform solution of
component weights explained (substantially) less of the variance than the solutions that were
optimal for the separate age groups. The analysis with the SCA program (Kiers, 1990) showed
that this was not the case: the uniform solution over the three age groups explained 61.1% of the
variance and the separate optimal solutions explained 61.4% of the variance. Also important
was the fact that the simple weights, being 1 or 0 (depending on the scale to which the subtest
belongs), were almost as effective as the optimal uniform solution. Using simple weights, as is
done in the construction of the Performance Scale and the Reasoning Scale, the percentage of
explained variance was 60.8%.
Table 5.9
Results of the Principal Components Analysis in the Various Age and Research Groups

Eigenvalue and Percentage of the Explained Variance by the first two Main Components

        2-7 years      2-3 years      4-5 years      6-7 years

F1      2.8   47%      2.7   45%      2.9   48%      3.0   50%
F2       .8   13%       .8   14%       .8   14%       .8   14%

Loadings on the first two Varimax-Rotated Components

        2-7 years      2-3 years      4-5 years      6-7 years
        F1    F2       F1    F2       F1    F2       F1    F2

Pat     .72   .29      .44   .43      .82   .23      .69   .37
Mos     .75   .29      .72   .30      .78   .31      .79   .23
Puz     .80   .12      .85   .07      .78   .19      .79   .07

Sit     .35   .59      .30   .65      .22   .68      .65   .29
Cat     .17   .80      .25   .78      .20   .74      .13   .88
Ana     .18   .75      .05   .78      .21   .70      .35   .70

        Boys           Girls          low SES        high SES
        F1    F2       F1    F2       F1    F2       F1    F2

Pat     .84   .13      .71   .31      .81   .15      .62   .39
Mos     .79   .28      .72   .33      .76   .24      .70   .31
Puz     .59   .38      .84   .07      .76   .10      .84   .08

Sit     .24   .70      .39   .55      .36   .59      .23   .71
Cat     .16   .79      .16   .81      .00   .88      .12   .84
Ana     .25   .66      .17   .75      .46   .46      .33   .64

        Immigrant      Tested outside     Gen./Perv.       Speech/language/
                       the Netherlands    Dev. Disorder    Hearing Disorder
        F1    F2       F1    F2           F1    F2         F1    F2

Pat     .90   .03      .85   .28          .80   .37        .78   .28
Mos     .71   .39      .82   .36          .80   .33        .79   .26
Puz     .52   .34      .75   .39          .82   .24        .82   .18

Sit     .10   .87      .30   .78          .42   .66        .57   .32
Cat     .22   .72      .41   .71          .29   .80        .28   .79
Ana     .38   .60      .28   .80          .23   .78        .24   .83


In the second part of table 5.9, the loadings on the first two components are shown for different
samples of the norm group. These are the boys (N=561), the girls (N=563), and the children
whose parents had either a low (N=233) or a high SES level (N=202). The SES level and its
correlation with the test performances are described in section 6.6. In the four groups the
loadings of the subtests are consistent with a distinction between performance and reasoning
tests, with one exception: the loading of Analogies was the same for both components for the
children with a low SES level.
The last part of table 5.9 presents the component loadings for a number of groups who were
not, or only partially, tested in the context of the standardization research. The first group
consisted of immigrant children (N=118). These were children who lived in the Netherlands and
whose parents were both born abroad. About two thirds of this group was tested in the context of
the standardization research. The remaining one third was tested at primary schools in the
context of the validation research (see chapter 8). The second group consisted of children who
were tested in other countries (N=440). The research was conducted in Australia, the United
States of America and Great Britain, mainly with children without specific problems or handicaps, although some children with impaired hearing, bilingual children and children with a
learning handicap were included (see sections 9.5, 9.6 and 9.7). The third and fourth groups
consisted of children with specific problems and handicaps, who were examined in the Netherlands in the context of the validation of the test (see chapter 7). The third group consisted of
children with a general developmental delay and children with a pervasive developmental
disorder (N=328). The fourth group consisted of children with a language/speech disorder,
impaired hearing and/or deaf children (N=346). In these four groups, with one exception, the
loadings on the first two rotated components corresponded to the distinction between performance and reasoning tests. In the group of children with language/speech and/or impaired hearing
disorders, the subtest Situations had its highest loading on the first performance component.
The distinction made by the SON-R 2½-7 between the Performance Scale and the Reasoning
Scale is supported to a large extent by these results in very different groups. Though the
reliability of the difference between scores on the two scales is moderate, this distinction is the
most relevant one for the intra-individual interpretation of the test results. The empirical validity
of the distinction between the Performance Scale and the Reasoning Scale will be discussed in
section 9.9.

5.5 STABILITY OF THE TEST SCORES


Correlations and means
The SON-R 2½-7 was administered a second time to a sample of 141 children who had
participated in the standardization research. The mean interval between administrations was 3.5
months, with a standard deviation of 21 days. The age of the children varied from 2;3 to 7;4
years. The mean age at the first administration was 4;6 years with a standard deviation of 1;5
years. The number of boys and girls was almost equal.
The correlations between the scores at each administration, and the mean and standard
deviation of the scores, are presented in table 5.10. If the standard deviation of the scores of the
first administration was different from the standard deviation in the norm population, the correlations were corrected (see Guilford & Fruchter, 1978).
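
The text does not state which correction formula was used. As an assumption, the sketch below applies the common textbook correction of a correlation for a difference between the standard deviation in the sample and the standard deviation in the population; it may differ from the exact procedure followed by the authors, and the numerical values are hypothetical.

```python
import math


def correct_for_range(r, sd_sample, sd_population):
    """Correct a correlation for a difference in standard deviation.

    r: correlation observed in the sample.
    sd_sample: standard deviation of the first-administration scores.
    sd_population: standard deviation in the norm population.
    """
    ratio = sd_population / sd_sample
    return r * ratio / math.sqrt(1 - r * r + r * r * ratio * ratio)


# Hypothetical example: observed r of .50, sample SD 12, population SD 15.
print(round(correct_for_range(0.50, 12.0, 15.0), 2))
```

When the sample is more homogeneous than the norm population the corrected correlation is higher than the observed one, and when both standard deviations are equal the correlation is unchanged.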
The test-retest correlation for the IQ score was .79. For the Performance Scale and the
Reasoning Scale, it was .74 and .69 respectively, and for the subtests .57 on average. The
stability was relatively high for Mosaics and Categories (both .64) and relatively low for Situations and Analogies (respectively .48 and .49).
The test-retest correlations for all the test scores are clearly lower than the reliability based
on internal consistency. This indicates that changes in performance occur which cannot be
attributed to errors of measurement. In chapter 10 the significance of this will be discussed.
Performances on all subtests were, on average, better during the second administration. The
increase in standardized scores (both times based on the exact age) varied from .5 (Analogies) to
1.2 (Situations). The scores on the Performance Scale and the Reasoning Scale increased by
more than 5 points. The IQ score increased, on average, by 6 points. All differences in mean
scores were significant at the 1% level, except for the subtests Patterns and Analogies.

Table 5.10
Test-Retest Results with the SON-R 2½-7 (N=141)

               r      Admin. I          Admin. II         Difference
                      Mean   (SD)       Mean   (SD)

Patterns      .56     10.2   (3.0)      10.8   (2.6)          .6
Mosaics       .64     10.6   (2.8)      11.6   (3.1)         1.0
Puzzles       .60     10.2   (2.9)      11.4   (2.8)         1.1
Situations    .48     10.4   (2.5)      11.6   (3.1)         1.2
Categories    .64     10.5   (2.8)      11.2   (2.9)          .7
Analogies     .49     10.6   (2.9)      11.1   (3.0)          .5

SON-PS        .74    102.5  (14.3)     107.9  (14.3)         5.5
SON-RS        .69    103.5  (13.7)     108.7  (15.2)         5.2

SON-IQ        .79    103.4  (13.7)     109.4  (14.7)         6.0

Correlations have been corrected for variance in the first administration.

A distinction was made between the children who were younger than 4;6 years (mean age
3;4 years; N=67) at the first administration, and children who were older (mean age 5;7 years;
N=74). In the younger group the test-retest correlation for the IQ was .78, in the older group .81.
The correlation for the Reasoning Scale decreased slightly with age (from .71 to .69). For the
Performance Scale it increased clearly (from .65 to .80). The increase in the mean IQ in both
groups was practically equal.

Profile analysis
A profile analysis was carried out to determine the meaning of the intra-individual differences
between the subtest scores of a single subject. One of the characteristics of the profile is the
dispersion of the scores. This was calculated as the standard deviation of the six scores (the
square root of the mean square of the deviations of the six subtests from the individual mean). In
the entire norm group the mean of the dispersion was 2.0. For 24% of the children, the
intra-individual dispersion was 2.5 or higher, and for 9% the dispersion was 3.0 or higher.
The mean individual dispersion for the 141 children who were tested twice with the SON-R
2½-7 was 2.0 on both occasions. Remarkably, the correlation between the dispersion on the first
and second administration was weak (.17) and not significant.
Another important characteristic of the profile is the relative position of the subtest scores.
To determine whether this was stable, the six subtest scores from the first administration were
correlated, for each child, with the six scores of the second administration. The mean correlation
was .32. The strength of the correlation depends very much on the dispersion of the scores on
the first administration. Clearly, if the differences are small, they are determined largely by
errors of measurement and are therefore unstable. Where the dispersion on the first administration was less than 2.0 (N=69), the mean correlation was .22; where the dispersion was 2.0 to 3.0,
the mean correlation was .38, and for the twelve children who had a dispersion of 3.0 or more,
the mean correlation was .61. This indicates that the differences between the subtest scores must
be substantial before we can conclude that they will remain stable over a period of some months.
When using the computer program, the dispersion is calculated and printed.
The difference between the scores on the Performance Scale and the Reasoning Scale in the
first administration correlated .46 with the difference between the two scores in the second
administration. For the children younger than 4;6 years, the correlation was .43 and for the older
children it was .50.


Table 5.11
Examples of Test Scores from Repeated Test Administrations (I and II)

              Example A     Example B     Example C     Example D
               I     II      I     II      I     II      I     II

SON-IQ         97   108     109   116     106   110     121   120

SON-PS        100   105     108   110     100    94     122   123
SON-RS         93   113     107   118     113   126     116   113

Patterns       11    14      12    10      11     9      14    12
Mosaics         8    11      13    17       9     9       9    12
Puzzles        11     7       9     8      10     9      18    17
Situations      9    14      13    13       8    13      10    12
Categories      9    12      12    11      14    16      15    13
Analogies       9    10       8    14      14    13      12    11

Dispersion    1.1   2.4     2.0   2.9     2.3   2.7     3.1   2.0
Correlation     -.18           .32           .56           .78

As an example, the scores of a few children on the two administrations are presented in table
5.11. The dispersion and the correlation between the six scores are also shown. The examples
illustrate that important changes can take place in the intra-individual order of the subtest
scores.
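
The two profile characteristics can be computed directly from the six subtest scores. The minimal sketch below uses the scores of Examples A and D from table 5.11; the function names are illustrative.

```python
import math


def dispersion(scores):
    """Standard deviation of the subtest scores around the individual mean."""
    mean = sum(scores) / len(scores)
    return math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))


def profile_correlation(first, second):
    """Correlation between the subtest scores of two administrations."""
    m1 = sum(first) / len(first)
    m2 = sum(second) / len(second)
    cross = sum((a - m1) * (b - m2) for a, b in zip(first, second))
    ss1 = sum((a - m1) ** 2 for a in first)
    ss2 = sum((b - m2) ** 2 for b in second)
    return cross / math.sqrt(ss1 * ss2)


# Subtest scores in the order Pat, Mos, Puz, Sit, Cat, Ana (table 5.11).
example_a = ([11, 8, 11, 9, 9, 9], [14, 11, 7, 14, 12, 10])
example_d = ([14, 9, 18, 10, 15, 12], [12, 12, 17, 12, 13, 11])

for first, second in (example_a, example_d):
    print(round(dispersion(first), 1), round(dispersion(second), 1),
          round(profile_correlation(first, second), 2))
```

For Example A the computed correlation between the two profiles is negative, which illustrates how unstable the order of the subtest scores can be when the dispersion on the first administration is small.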

6 RELATIONSHIPS WITH OTHER VARIABLES

This chapter discusses the relationship between test performance and a number of variables
that are important for judging the validity of the test. The analyses are based on the results
of the standardization research. Other tests were also administered to a large number of the
children in order to validate the SON-R 2½-7. The results are described in chapter 9. A
comparison is made in section 9.11 between the SON-R 2½-7 and other tests, with respect to their
relationship with a number of variables that are discussed in this chapter, i.e. SES index,
parents' country of birth, evaluation by the examiner, and the school's evaluation of language
skills and intelligence.

6.1 DURATION OF TEST ADMINISTRATION


In general the test was administered in one session with short breaks if necessary. In the case of
9% of the children a break of longer than a quarter of an hour was taken, usually due to school
recess or the end of the school day. In these cases the second part of the test was administered
later in the day or on another day. The mean IQ score of the children to whom the test was
administered in two parts did not deviate from the mean of the children to whom the test was
administered in one session.
The duration of administration (including short breaks) had a mean of 52 minutes with a standard deviation of 12 minutes. For two-year-olds the duration of administration was shorter; in the
age group of 2;3 years the mean duration of administration was 38 minutes and in the age group
of 2;9 years this was 46 minutes. From three years onwards the mean was fairly constant at 54
minutes. In table 6.1 the frequency distribution of the duration of administration is presented both
for the total norm group and for the two-year-olds and the older children as separate groups.
There was a significant positive correlation between the duration of administration and the
IQ score. This relationship was strong (r=.52) especially for the two-year-olds. The correlation
for the older children was .34. The relation could be explained by the fact that children within
each group who performed well completed more items on average.
Table 6.1
Duration of the Test Administration

Duration of the complete test (N=1124)

                 2-7 years    2 yrs    3-7 yrs

≤ 40 min            16%        49%        9%
41 - 50 min         32%        30%       32%
51 - 60 min         32%        17%       36%
61 - 70 min         14%         3%       16%
> 70 min             6%         1%        7%

Mean duration in minutes (N=1014)

                 Mean    (SD)

Patterns          7.0    (3.0)
Mosaics          10.3    (3.9)
Puzzles           8.5    (3.1)
Situations        6.3    (2.3)
Categories        8.4    (3.3)
Analogies         8.8    (2.8)

Total            49.2   (10.7)


The duration of the administration of the separate subtests was known for 1014 children (table
6.1). Situations had the shortest duration of administration with a mean of 6.3 minutes and also
the narrowest dispersion in duration. Mosaics had the longest duration of administration with a
mean of 10.3 minutes, and the widest dispersion.
The duration of administration was also recorded for children who participated in other
validation research projects (see chapter 7). The mean duration (including short breaks) for these
children, who had varying problems and handicaps in cognitive development and communication,
was 57 minutes. This was 5 minutes longer than for the children in the standardization research.
The duration of administration was relatively short for children with a general developmental
delay (a mean of 53 minutes) and relatively long for deaf children (a mean of 66 minutes).

6.2 TIME OF TEST ADMINISTRATION


The influence of the time of administration on test results was examined in the standardization
research. The largest part of the norm group was tested during the first twelve weeks of the
school year 1993/94. For these 1065 children, the relationship was examined, using analysis of
variance, between the IQ scores and the period of research (four consecutive periods of three
weeks), the day of the week on which the test was administered, and the time of day at which the
administration was started.
In table 6.2 the mean IQ scores for each category of these three variables are presented as a
deviation from the total mean. Each variable was controlled for the effect of the other two
variables. The largest differences in mean IQ scores were found for the variable starting time,
but the effect was not significant (F[6,997]=1.26; p=.27).
Table 6.2
Relationship of the IQ Scores with the Time of Administration (N=1065)

Starting Time        N     dev.
  8- 9 a.m.         108    1.9
  9-10 a.m.         231     .3
  10-11 a.m.        240     .4
  11- 1 p.m.        139    1.9
  1- 2 p.m.         162     .6
  2- 3 p.m.         115    2.4
  After 3 p.m.       70    1.2

Day of Week          N     dev.
  Monday            198     .8
  Tuesday           287     .1
  Wednesday         162     .5
  Thursday          262    1.2
  Friday            156     .3

Period               N     dev.
  I                 305     .6
  II                302     .2
  III               178     .6
  IV                280     .8

6.3 EXAMINER INFLUENCE


Eleven examiners tested most of the children in the standardization research. The scores of the
different examiners were compared, while controlling for the sex of the children, the percentage
of immigrant children (children whose parents had both been born abroad) and the SES index.
In table 6.3 the deviations from the total mean are shown for the IQ score, after controlling for
the other variables.
The beta coefficient, which indicates how strong the association is after controlling for the
other variables, was .18 and clearly significant (F[10,1059]=4.09; p<.01). However, the
differences between the examiners were also partly due to sampling fluctuations. About one
quarter of the variance in the mean scores of the examiners could be ascribed to this. However,
even when this was taken into account, deviations of two to three IQ points, resulting from the
manner of administration by the examiner or from other characteristics of the examiner,
remained plausible.
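
The two figures in this paragraph can be roughly checked from the values printed in table 6.3. The sketch below is an illustration only, not the authors' procedure. It assumes an IQ standard deviation of 15 for the group, and it assumes that the share of between-examiner variance due to sampling can be approximated by 1/F (with no true examiner effect, the expected F ratio is about 1). Only the magnitudes of the deviations are used, since they enter the calculation squared.

```python
import math

# Group sizes and magnitudes of the deviations from the total mean for
# examiners A to K, as printed in table 6.3.
n = [104, 98, 115, 50, 92, 61, 97, 110, 123, 115, 108]
dev = [4.81, 3.14, 2.58, 2.45, 0.21, 0.20, 1.24, 1.76, 2.46, 2.78, 2.99]

ASSUMED_SD = 15.0  # assumption: standard deviation of the IQ scores

# Variance of the examiner means around the total mean, weighted by group size.
between_var = sum(ni * d * d for ni, d in zip(n, dev)) / sum(n)

# Strength of association: square root of the share of total variance.
association = math.sqrt(between_var / ASSUMED_SD ** 2)

# Assumption: about 1/F of the between-examiner mean square reflects sampling.
f_ratio = 4.09
sampling_share = 1.0 / f_ratio

print(f"association is about {association:.2f}")
print(f"sampling share is about {sampling_share:.2f}")
```

Under these assumptions the association comes out close to the reported beta of .18, and the sampling share close to one quarter.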


Table 6.3
Examiner Effects (N=1073)

Mean Scores as Deviation from the Total Mean

Examiner      N     dev.
  A          104    4.81
  B           98    3.14
  C          115    2.58
  D           50    2.45
  E           92    0.21
  F           61    0.20
  G           97    1.24
  H          110    1.76
  I          123    2.46
  J          115    2.78
  K          108    2.99

Strength of the Examiner Effect before (eta) and after (beta)
Controlling for other Variables

Score          eta    beta
  Patterns     .15    .15
  Mosaics      .22    .20
  Puzzles      .14    .15
  Situations   .13    .12
  Categories   .21    .19
  Analogies    .16    .17

  SON-PS       .17    .16
  SON-RS       .18    .18

  SON-IQ       .18    .18

The strength of the examiner influence showed no clear relation to age; the beta for the two- and
three-year-olds was .23, the beta for the four- and five-year-olds was .18 and the beta coefficient
for the six- and seven-year-olds was .28.
The mean IQ score of the children who were tested by the three male examiners was 2.2 points
lower than the mean IQ score of the children who were tested by the female examiners. The
p-value of the difference was .02. There was no interaction effect between the sex of the child and
the sex of the examiner. The number of male examiners was too small to assess whether the
difference between the male and the female examiners was based on their sex or whether this
was caused by personal characteristics unrelated to their sex.
With the exception of Situations, the examiner influence was significant in the various
subtests. The influence was greatest for Mosaics and Categories, the tests that are administered first. This may indicate that the differences between the examiners were related to the
manner in which the child was put at ease and motivated at the beginning of the test
administration.

6.4 REGIONAL AND LOCAL DIFFERENCES


The selection of the communities where the standardization research was carried out was
stratified according to region, community size and degree of urbanization. The IQ scores as a
deviation from the total mean are presented in table 6.4. In the first column the observed values
are given. In the second column the means after controlling for sex, the SES-index and the
percentage of immigrant children are given. The entire group consists of 1102 children. The
children from special schools and the immigrant children who were later added to the norm
group were not included. The children whose SES-index was not known were also not included.
Relatively few differences were found between regions or different sized communities. Both
variables had a p-value of .04 for the differences. After controlling for the background of the
children, the differences decreased and were no longer significant. The differences according to
degree of urbanization (rural communities, urbanized rural communities, commuter communities and urban communities) were small before and after controlling for the other variables and
were not significant.
The results for region, community size and degree of urbanization correspond to the findings
with the standardization research of the SON-R 5½-17 (Snijders, Tellegen & Laros, 1989). An
exception was that with the SON-R 5½-17, relatively high performances were found for
children in commuter communities, both before and after controlling for other variables.

Table 6.4
Regional and Local Differences (N=1102)
Deviations of the IQ Scores in relation to the Total Mean
I: without controlling for other variables
II: after controlling for other variables

Region                        N      I      II
  North/East                 342    1.0     .7
  South                      212    2.3    1.5
  West                       548     .2     .1

Community Size (x 1000)       N      I      II
  < 20                       375     .4     .4
  < 100                      489     .8     .3
  > 100                      238    2.1    1.4

Degree of Urbanization        N      I      II
  Rural Community            164     .4     .5
  Urbanized Rural Comm.      250     .6     .0
  Commuter Community         183     .7     .6
  Urban Community            505     .1     .1

6.5 DIFFERENCES BETWEEN BOYS AND GIRLS


In table 6.5 the mean test scores of the boys (N=561) and the girls (N=563) from the norm group
are presented. The differences that were significant at the 1% level using a t-test, are marked
with an asterisk. The girls performed significantly better than the boys on four subtests. The
biggest differences were encountered with the abstract reasoning tests Categories and Analogies. Patterns was the only performance subtest in which a clear difference was found. The
Performance Scale showed a sex difference of 1.7 points that was not significant. The difference
of 4.6 points on the Reasoning Scale, however, was significant. The difference between the
mean IQ scores was 3.5 points. This difference tended to decrease for older children. In the
group of two- and three-year-olds the difference in IQ scores was 5.7 points, for the four- and
five-year-olds this was 2.4 points and for the oldest children 2.1 points. However, the interaction
effect between sex and age group was not significant (F[2,1118]=1.73; p=.17). A regression
analysis showed that the interaction effect between the exact age and the sex on the IQ score,
was also not significant.
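
As an illustration of the t-tests referred to above, the test statistic for the SON-IQ can be approximated from the summary values in table 6.5. This is a minimal sketch that assumes an independent-samples t-test with pooled variance; the manual does not state which variant was used, and the function name is hypothetical.

```python
import math

def t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic (pooled variance) from summary data."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / standard_error

# SON-IQ: girls (N=563) versus boys (N=561), rounded values from table 6.5.
t_iq = t_from_summary(101.9, 14.9, 563, 98.4, 14.8, 561)
print(round(t_iq, 2))
```

The result is close to the value of 3.94 given in the table; the small remaining difference is to be expected because the means and standard deviations used here are rounded.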
Table 6.5
Relationship of the Test Scores with Sex (N=1124)

                  Boys              Girls
Score             Mean   (SD)       Mean   (SD)      Difference      t
  Patterns         9.7   (2.8)      10.3   (3.0)         0.5        3.06 *
  Mosaics          9.9   (3.0)      10.1   (3.0)         0.2        1.08
  Puzzles         10.0   (3.0)      10.0   (3.0)         0.0        -.04
  Situations       9.7   (2.9)      10.2   (2.9)         0.5        2.83 *
  Categories       9.7   (2.9)      10.4   (3.0)         0.7        3.82 *
  Analogies        9.5   (2.9)      10.4   (2.8)         0.9        5.07 *

  SON-PS          99.3  (14.9)     101.0  (15.2)         1.7        1.86
  SON-RS          97.6  (14.8)     102.2  (14.9)         4.6        5.19 *

  SON-IQ          98.4  (14.8)     101.9  (14.9)         3.5        3.94 *

*: p < .01 with two-tailed testing


On the basis of other research data, the decrease or disappearance of the difference between
boys and girls with age is plausible. No sex difference was found during the standardization
research of the SON-R 5½-17, during which 1350 children from 6;6 to 14;6 years were tested.
The mean IQ score of the boys was 100.1 and of the girls 100.0. During the American
standardization research of the K-ABC (Kaufman & Kaufman, 1983), a positive difference of
4.4 points on the total score was found for girls in the age group from 2½ to 5 years, whereas
this difference was only .2 in the age group from 5 to 12½ years. During the standardization of
the GOS 2½-4½, the Dutch version of the K-ABC for young children, the total score for the girls
proved to be 4.7 points higher than the total score for the boys (Neutel, Van der Meulen & Lutje
Spelberg, 1996). The results with regard to sex-related differences found in the SON-R tests and
the K-ABC are thus very similar. In the case of adolescents and adults, however, males appear to
perform better on intelligence tests (Lynn, 1994).

6.6 SES LEVEL OF THE PARENTS


Education and occupation
The socio-economic level of the parents was based on their occupational and educational level.
Information on the level of occupation was provided by the index of occupations of the Institute
for Applied Sociology in Nijmegen (Van Westerlaak, Kropman & Collaris, 1975). If a parent
was out of work, he or she was classified according to the last job held. The index of occupations
distinguishes 6 levels. The categories unskilled worker (e.g. grocery packer, construction
worker) and skilled worker (e.g. dockworker, animal keeper) were combined. The occupation
housewife also belongs to that category. The categories lower employee (e.g. bartender, bank
teller) and small independent businessman (e.g. druggist, gardener) were also combined. Two
more categories are distinguished: intermediate employee (e.g. teacher, librarian) and professional (e.g. psychologist, lawyer).
The level of education was based on the highest level that had been completed. These levels
range from the lowest category, i.e. primary school, via the general secondary education stream,
the higher general secondary education stream, the pre-university education stream, to the
highest category, higher vocational education and university.
The occupational and educational level of both parents was known in the case of 1071
children of the norm group. In table 6.6 the distribution according to occupational and
educational level of both parents, and the mean IQ for each category, is presented for these children.
The table shows that the IQ score clearly increased with the occupational and educational level
of the father and the mother. The correlation with the occupational level of the father was .28
and with the occupational level of the mother .27. The correlation with the educational level of
the father was .31 and with the educational level of the mother .32. All these correlations were
significant at the 1% level.

Table 6.6
Relationship of the IQ Score with the Occupational and Educational Level of the Parents
(N=1071)

                                          Father                  Mother
Occupational Level                        Pct    Mean   (SD)      Pct    Mean   (SD)
  0  (Un)Skilled Worker/Housewife         33%    96.1  (14.3)     39%    96.1  (14.8)
  1  Lower Empl/Sm. Ind. Business         32%    99.6  (14.2)     44%   101.5  (14.0)
  2  Intermediate Employee                19%   102.6  (14.8)     14%   106.3  (15.1)
  3  Professional                         16%   108.4  (14.8)      3%   110.4  (14.7)

                                          Father                  Mother
Educational Level                         Pct    Mean   (SD)      Pct    Mean   (SD)
  0  Primary Education                     7%    92.9  (12.5)      6%    92.4  (12.3)
  1  General Secondary Education          38%    96.9  (14.7)     43%    96.6  (14.3)
  2  Higher Gen.Secondary Education       29%   101.2  (13.4)     34%   102.7  (14.2)
  3  Tertiary: Non-University             19%   104.7  (14.9)     14%   107.7  (14.5)
  4  Tertiary: University                  7%   111.6  (15.0)      3%   111.5  (15.8)

SES index and SES level


The SES index was based on a combination of the educational and occupational levels of the
parents. The occupational level of the parent with the highest level was added to the educational
level of both parents. If the occupational level of one parent was not known, the level of the
other parent was used. If the educational level of one parent was not known, the educational
level of the other parent was counted twice.
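
The rule described above can be sketched as follows. The sketch assumes the numeric coding shown in table 6.6 (occupational level 0 to 3, educational level 0 to 4); the function name and the handling of cases in which neither parent's occupational or educational level is known are illustrative assumptions, not taken from the manual.

```python
from typing import Optional

def ses_index(occ_father: Optional[int], occ_mother: Optional[int],
              edu_father: Optional[int], edu_mother: Optional[int]) -> Optional[int]:
    """Highest occupational level plus the educational level of both parents."""
    occ = [v for v in (occ_father, occ_mother) if v is not None]
    edu = [v for v in (edu_father, edu_mother) if v is not None]
    if not occ or not edu:
        # Assumption: the index cannot be calculated without at least one
        # known occupational level and one known educational level.
        return None
    if len(edu) == 1:
        # The educational level of the other parent is counted twice.
        edu = edu * 2
    return max(occ) + sum(edu)

print(ses_index(3, 1, 4, 4))        # both parents fully known
print(ses_index(None, 2, 1, None))  # one occupation and one education missing
```

With this coding the index ranges from 0 (lowest level on all three components) to 11 (highest level on all three).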
In all, the SES index of 1118 children in the norm group could be calculated. The mean was
4.8, the standard deviation 2.6. The SES index was also categorised according to level. Four
levels were used: low, below average, above average and high. The categories are
referred to as SES levels.
The correlation of the SES index with the IQ score was .34. The Performance Scale had a
correlation of .29 with the SES index, the Reasoning Scale a correlation of .31. Among the
subtests small differences in strength of the correlation were found; Situations had the weakest
correlation (.22) and Analogies had the strongest (.25). The differences between boys and girls
were also slight; the correlations with the IQ score were .36 and .34 respectively.
The correlation between the SES index and the IQ score increased with age; for children of
two and three years of age the correlation was .23, for children of four and five years of age it
was .35 and for six- and seven-year-olds it was .46. In table 6.7 the mean IQ scores per SES level
are presented for the entire group and for the three age groups separately. In the entire group the
difference in mean IQ score between children from a low and from a high SES level was 15
points. In the youngest group this difference was 12 points and in the oldest group 19 points.
The fact that the performances, especially of children with a high SES level, increased with age
is remarkable. Analysis of variance, however, showed that the interaction effect between age
group and SES level was not significant (F[6,1106]=1.23; p=.28). Regression analysis also
showed that the interaction effect between the SES index and the age at which the test was taken
was not significant.
Table 6.7
Relationship of the IQ Score with the SES Level

                           Entire Group          2-3 years       4-5 years       6-7 years
                           (N=1118)              (N=396)         (N=409)         (N=313)
SES Level                  Pct    Mean   (SD)    Mean   (SD)     Mean   (SD)     Mean   (SD)
  1  Low                   21%    92.6  (13.7)    93.6  (15.2)    92.6  (13.2)    92.0  (13.2)
  2  Below Average         32%    98.4  (14.0)    98.9  (14.8)    98.0  (14.1)    98.4  (12.6)
  3  Above Average         29%   102.8  (14.0)   101.6  (14.5)   102.1  (13.3)   105.7  (13.8)
  4  High                  18%   107.9  (14.9)   105.5  (12.5)   108.1  (17.1)   111.1  (13.9)

6.7 PARENTS' COUNTRY OF BIRTH


In this section a short overview is given of the test performances of the immigrant children in the
norm group. In chapter 8 the results of immigrant children will be discussed in detail. In table
6.8 the mean IQ scores are presented for three groups of children: native Dutch children (both
parents born in the Netherlands), immigrant children (both parents born abroad), and a mixed
group (one parent born abroad). The eight immigrant children who were later added to the norm
group are not included in this analysis. With reference to the country of birth of the immigrants,
a distinction was made between the three most important groups, i.e. Surinam or the Antilles,
Turkey and Morocco. The remaining countries were subdivided into Western (Europe, North
America, Australia) and non-Western countries.

Table 6.8
Relationship Between IQ and Country of Birth of the Parents (N=1116)

                         Both Parents           One Parent             Both Parents
                         Native Dutch           Foreign                Foreign
Country of Birth          N    Mean   (SD)       N    Mean   (SD)       N    Mean   (SD)
  The Netherlands        969  100.7  (14.9)
  Surinam/Antilles                               11  103.6  (12.8)      27   91.8  (13.5)
  Turkey                                          2   79.5  (13.4)      18   94.1  (13.0)
  Morocco                                         1   82                21   88.8  (10.7)
  Other Western                                  27  103.0  (17.2)       3  107.3  (17.9)
  Other Non-Western                              25  101.0  (14.5)      12   98.9  (16.9)

  Total                  969  100.7  (14.9)      66  101.3  (15.7)      81   93.2  (13.8)
The mean IQ score of the immigrant children was 93.2, 7.5 points lower than the mean IQ of
native Dutch children. The difference was significant at the 1% level. The mean IQ of the mixed
group, with one foreign parent, was slightly higher than that of the native Dutch children.
However, this difference was not significant. In the mixed and immigrant groups, the
performances of the Turkish and Moroccan children were low; the performances of the Surinam and the
Antillean children were above average in the mixed group and low in the immigrant group. The
remaining Western children scored above average in both groups and the remaining
non-Western children had an average score in both groups.
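
The group means quoted above are weighted averages of the subgroup means in table 6.8, with the number of children per country of birth as weights. The following sketch, in which the function name is hypothetical, shows how the totals can be checked from the tabulated values.

```python
def weighted_mean(groups):
    """Mean over subgroups given as (number of children, subgroup mean) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n

# (N, mean IQ) per country of birth, taken from table 6.8.
both_foreign = [(27, 91.8), (18, 94.1), (21, 88.8), (3, 107.3), (12, 98.9)]
one_foreign = [(11, 103.6), (2, 79.5), (1, 82.0), (27, 103.0), (25, 101.0)]

immigrant_mean = weighted_mean(both_foreign)
mixed_mean = weighted_mean(one_foreign)

print(round(immigrant_mean, 1), round(mixed_mean, 1))
print(round(100.7 - immigrant_mean, 1))  # difference with native Dutch children
```

Because the subgroup means are rounded to one decimal, the recomputed totals can differ slightly from the printed ones.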

6.8 EVALUATION BY THE EXAMINER


After the test had been administered, the behavior of the child was evaluated by the examiner as
to
- motivation,
- concentration,
- cooperation with the examiner,
- comprehension of the directions.
The evaluation categories were 'poor', 'mediocre', 'varying' and 'good'. The evaluations
'mediocre' and 'varying' were combined in the presentation of the results. In table 6.9, the
frequency distribution of the evaluation is presented for three age groups, together with the
mean IQ scores.
Clear differences in evaluation existed between the different ages. The evaluation poor was
rarely given on any aspects for children from four years onwards. Children two and three years
of age received the rating poor between 3% (motivation and cooperation) and 7% (concentration and comprehension of the directions) of the time. The mean evaluation of the four aspects
was good for 69% of the children two and three years of age, for 89% of children four and five
years of age and for 96% of children six and seven years of age. In all age groups, problems with
concentration were mentioned most frequently. In the youngest age group, comprehension of
the directions was also evaluated as mediocre or varying fairly frequently.
In all three age groups the evaluation of concentration and of comprehension of directions
correlated significantly with the IQ score. The correlations were strongest in the youngest age


Table 6.9
Relationship Between the Evaluation by the Examiner and the IQ

Motivation
Poor
Mediocre/Varying
Good

2-3 years (N=396)

4-5 years (N=413)

6-7 years (N=315)

Pct

(SD)

Pct

(SD)

Pct

90.3 (14.9)
98.2 (13.3)
101.4 (15.0)

1%
9%
89%

85.0 ( 4.2)
96.5 (13.6)
100.4 (15.4)

2%
98%

3%
24%
73%

Mean

Correlation
Concentration
Poor
Mediocre/Varying
Good

.15*
Pct
7%
32%
61%

Mean

Pct

89.3 (17.8)
97.0 (12.7)
103.2 (14.5)

1%
17%
82%

Poor
Mediocre/Varying
Good

Mean

3%
20%
77%

Mean

Pct

89.3 (14.7)
97.8 (13.4)
101.4 (14.9)

0.2%
7%
93%

Correlation

Pct

77.0 (11.9)
95.4 (12.4)
101.2 (15.4)

9%
91%

Mean

Pct

Poor
Mediocre/Varying
Good

7%
27%
66%

Correlation

Mean

Pct

Pct

87
93.9 (12.4)
100.3 (15.4)

0.3%
4%
96%

87.3 (11.7)
95.8 (14.1)
103.5 (14.1)

9%
91%

Mean

Mean

(SD)

112
89.5 ( 9.2)
100.7 (14.9)
.11

(SD)

Pct

91.4 (13.2)
100.7 (15.2)

3%
97%

.17*

(SD)

91.8 (16.5)
101.1 (14.5)

.11

(SD)

.33*

Mean

.18*

(SD)

.16*

Comprehension
of directions

106.9 (15.4)
100.1 (14.9)

.21*

(SD)

(SD)

.07

(SD)

.28*
Pct

Mean

.12*

(SD)

Correlation
Cooperation

Mean

Mean

(SD)

88.9 (13.9)
100.6 (14.8)
.14*

*: p < .01 with one-tailed testing

group. In this group the correlations of motivation and cooperation with intelligence were also
significant.
The four evaluations were also combined. Zero was the lowest possible combined score (all
four evaluations poor) and eight the highest possible combined score (all four evaluations
good). The combined score, which gives an indication of how well the child responds to being
tested, increased greatly with age until the age of four years. In the age groups of 2;3, 2;9, 3;3
and 3;9 the means were respectively 5.3, 6.3, 7.2, 7.5. From four years onwards the mean
gradually increased to 7.9 at the age of 7;3 years.
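
The combined score can be illustrated with a short sketch. The range from zero (four times 'poor') to eight (four times 'good') implies two points per aspect; scoring 'mediocre' and 'varying' as one point each is an assumption made here, as are the names used in the code.

```python
# Assumption: 'mediocre' and 'varying' both count as the middle category.
POINTS = {"poor": 0, "mediocre": 1, "varying": 1, "good": 2}
ASPECTS = ("motivation", "concentration", "cooperation", "comprehension")

def combined_score(evaluations):
    """Sum the examiner's four evaluations into a score from 0 to 8."""
    return sum(POINTS[evaluations[aspect]] for aspect in ASPECTS)

example = {
    "motivation": "good",
    "concentration": "varying",
    "cooperation": "good",
    "comprehension": "mediocre",
}
print(combined_score(example))
```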

6.9 EVALUATION BY THE TEACHER


The teachers of the children tested at school were asked, at the end of the school year, to
evaluate them on a number of aspects. In general, a period of six to eight months existed
between the test administration and the evaluation. At that time, the schools had not been
informed about the test results. The evaluations were given by teachers of the classes one
through four at 48 different schools. (Classes one and two correspond broadly with kindergarten
in the American school system, and with preschool in the English school system. Class three
corresponds to first grade or form, and class four to second grade or form of primary schools.)
At all schools, an evaluation was requested of motivation, concentration and work tempo of the
child, and of intelligence, motor development and language development. In classes 3 and 4 an
evaluation of the level of reading, writing and arithmetic was also requested. The evaluation was
given on a 5-point scale, ranging from low via average to high.
Table 6.10 presents the correlations between the schools' evaluations of these characteristics
and the Performance Scale, the Reasoning Scale and the SON-IQ. Correlations are presented for
the entire group (N=616), and for the pupils of classes 1 and 2 (N=344, mean age 5;2 years) and
the pupils of classes 3 and 4 (N=272, mean age 6;9 years) separately. All correlations were
significant at the 1% level with one-tailed testing.
In classes 1 and 2, the evaluations of intelligence, concentration and language development
had strong relationships with the IQ score (the correlations are .47, .47 and .44 respectively). The
evaluations of motivation and work tempo also had a reasonably strong correlation with the IQ.
The weakest correlation was found for the evaluation of motor development (r=.28). After a
stepwise regression analysis, the multiple correlation of the evaluations of intelligence, concentration and language development with the IQ score was .53. The correlations of the evaluations
with the Performance Scale were higher than with the Reasoning Scale, except for the evaluation of motor development where little difference was found.
In classes 3 and 4, the correlations of the IQ score with the evaluations of intelligence and
language development were slightly weaker than in classes 1 and 2 (.44 and .42 respectively).
The correlations with motivation, concentration and work tempo decreased more, as did the
correlation with the evaluation of motor development. In classes 3 and 4 an evaluation was also
given of the level of reading, writing and arithmetic. Of these, arithmetic had the highest
correlation with the IQ score (r=.36). After stepwise regression analysis, the multiple correlation of the evaluations of intelligence, language development and writing skills with the IQ
score was .48. In classes 3 and 4 the evaluation of writing skills clearly had a stronger correlation with the Performance Scale than with the Reasoning Scale; this was less so for arithmetic
and work tempo. The other evaluations had stronger correlations with the Reasoning Scale.
For all classes combined, the correlation between the SON-IQ and the evaluation of intelligence was .46; the correlations with language development (r=.44) and with the evaluation of
concentration by the teacher (r=.40) were also high.
In table 6.11 the correlations of the subtests with the teachers' evaluations are presented.
With the exception of the correlation between writing and Categories, all correlations were
significant at the 1% level. Of all the subtests, Mosaics had the strongest correlation with the
evaluation of intelligence (r=.38) and Situations the weakest (r=.24). Situations also had a weak
correlation with the other evaluations. Patterns and Analogies had the strongest correlations
with the evaluation of motor development. The three performance subtests correlated most
highly with the evaluation of language development. Situations and Categories (multiple choice
tests) correlated less strongly with the evaluations of motivation, concentration and work tempo
than did the other subtests. Puzzles, Categories and Analogies had relatively strong correlations
with the evaluation of reading, Patterns and Puzzles with the evaluation of writing, and Patterns
and Analogies with the evaluation of arithmetic.

Table 6.10
Correlations of the Total Scores with the Evaluation by the Teacher

                           Classes 1 and 2      Classes 3 and 4      Classes 1-4
                           (N=344)              (N=272)              (N=616)
Evaluation                 PS    RS    IQ       PS    RS    IQ       PS    RS    IQ
  Motivation               .34   .24   .34      .23   .28   .28      .30   .27   .32
  Concentration            .45   .37   .47      .27   .31   .32      .37   .35   .40
  Tempo                    .33   .27   .34      .26   .23   .27      .30   .26   .31

  Intelligence             .44   .37   .47      .37   .42   .44      .42   .40   .46
  Motor Development        .25   .24   .28      .17   .18   .19      .22   .22   .24
  Language Development     .41   .34   .44      .36   .39   .42      .40   .37   .44

  Reading                                       .26   .29   .31
  Writing                                       .31   .23   .30
  Arithmetic                                    .34   .31   .36

Table 6.11
Correlations of the Subtest Scores with the Evaluation by the Teacher

Groups 1-4 (N=616)
Evaluation                 Pat   Mos   Puz   Sit   Cat   Ana
  Motivation               .25   .24   .25   .15   .21   .24
  Concentration            .31   .30   .29   .24   .24   .30
  Tempo                    .24   .27   .22   .14   .22   .22

  Intelligence             .33   .38   .31   .24   .33   .33
  Motor Development        .22   .14   .18   .12   .15   .21
  Language Development     .33   .33   .32   .26   .29   .28

Groups 3 and 4 (N=272)
Evaluation                 Pat   Mos   Puz   Sit   Cat   Ana
  Reading                  .21   .17   .24   .18   .24   .24
  Writing                  .28   .18   .28   .17   .13   .23
  Arithmetic               .27   .33   .23   .19   .19   .30
For the group as a whole, the evaluation of intelligence was low for six children, below
average for 63 children, average for 343 children, above average for 107 children, and high
for 33 children. The mean IQ scores were 74.2, 89.2, 97.7, 107.0 and 114.6 respectively. This
shows a difference of more than 40 IQ points between the children who were evaluated, more
than half a year after administration of the test, by the teacher as being either less or highly
intelligent.
When the fact is taken into account that the relationships examined here refer to subjective
evaluations by a large number of different teachers, and not to standardized measurements of
school achievement, the correlation between the evaluation of intelligence and the SON-IQ can
certainly be called good.

7 RESEARCH ON SPECIAL GROUPS

In practice, intelligence tests are administered mainly to children with a cognitive
developmental delay and to children with specific handicaps. Many of these children have a handicap in
communicative skills such as language or speech, and/or hearing problems. With these children,
the use of a nonverbal intelligence test that does not depend on the use of language is a
prerequisite for an independent evaluation of their cognitive skills. In this chapter the results are
discussed of the research carried out with the SON-R 2½-7 on a number of groups of special
children. In chapter 9 the correlations between the SON-IQ and the scores on other tests
administered to these children will be discussed.

7.1 COMPOSITION OF THE GROUPS


Research with the SON-R 2½-7 was carried out at a large number of schools and institutes for
children with problems and handicaps. This was done partially parallel to, and partially
following, the standardization research. An effort was made to examine all the children in the correct
age group. However, in a few cases the parents refused permission or the school considered it
inadvisable to test the children. The test was administered by staff members at the school/
institute, by examiners of the standardization research, and by trained students participating in
the research projects within the framework of their study (Brouwer, Koster & Veenstra, 1995;
Snippe, 1996).

Types of schools and institutes


Pupils at the following types of schools/institutes participated in the research:
- Schools for Special Education with a department for children at risk in their development
  The test was administered to 100 children at six schools for special education with a
  department for young children at risk in their development. Children are usually transferred from
  these schools to schools for children with specific learning and educational problems and to
  schools for learning disabled children.
- Medical daycare centers for preschoolers
  The test was administered to 162 children at three medical daycare centers for preschoolers.
  These establishments provide daytime treatment, from the age of about one and a half
  onwards, for children with a developmental disorder. Such disorders are usually the result of
  a combination of psychic, somatic and social factors.
- Schools for children with Language, Speech and Hearing disorders
  Three schools for children with speech or language problems and children with impaired
  hearing participated in the research. One hundred and eighty-three children were tested at
  these schools.
- The outpatients department for nose, throat and ear surgery
  Children with speech or language and hearing problems were also tested at the outpatients
  department for nose, throat and ear surgery of a University hospital, where they were
  undergoing psychological examination in connection with their problems. This group consisted of
  90 children.
- Institutes for the deaf
  Children who were being educated at, or receiving guidance from, one of the five institutes
  for the deaf in the Netherlands were tested. The research group was limited to native Dutch
  children and children who did not have multiple handicaps. The results of the pupils at one
  institute for the deaf were not taken into account in the presentation because of a strong
  examiner effect. At the four other institutes for the deaf 95 children were tested with the
  SON-R 2½-7.
- Autism teams
  Three different autism teams tested 44 children who were diagnosed as autistic or as having a
  developmental disorder related to autism. Autism teams are ambulatory institutions
  concerned with the diagnosis and guidance of children with these disorders. Autism and
  autism-related disorders belong to the category of pervasive developmental disorders (APA, 1987).

The research groups


Children with different problems can be placed at the same school or institute. For the analysis
of the results, the children were therefore grouped according to the nature of their problem
rather than the type of school or institute. The relationship between the type of school and the
research group is presented in table 7.1. The following research groups were formed:
- General developmental delay
  Children with a general developmental delay were pupils at the schools with a department for
  children at risk in their development and at the medical preschool daycare centers. Various
  cognitive, social and emotional factors play a role in the referral of these children. The test
  scores of both groups of children were very similar. The entire group consisted of 238
  children.
- Pervasive developmental disorders
  Half the group of 90 children with a pervasive developmental disorder were children tested
  by the autism teams. All the children from the other groups who were diagnosed as autistic or
  having an autism-related disorder were also included in this group. These were mainly pupils
  from schools for children with language, speech and hearing disorders, from schools with a
  department for children at risk in their development and children from medical preschool
  daycare centers.
- Language and/or speech disabilities
  This group consisted of pupils from the schools for children with language, speech and
  hearing disorders, and children tested in the outpatients department for nose, throat and ear
  surgery, who had a language and/or speech disorder. If they also had a hearing loss, it was less
  than 30 dB. The entire group consisted of 179 children.
Table 7.1
Subdivision of the Research Groups

                                               Research Group
                                        General     Pervasive   Language/   Hearing    Deaf
                                        developm.   developm.   speech      impaired
School/Institute               Total    delay       disorder    disorder
Special schools for children
  at risk in their development   100       89          11
Medical daycare centers
  for preschoolers               162      149          13
Schools for children with
  lang./sp./hear. disorders      183                   21         116          44        2
Outpatients department for
  nose/throat/ear surgery         90                               63          27
Institute for the deaf            95                    1                       2       92
Autism teams                      44                   44

Total                            674      238          90         179          73       94


- Hearing impaired
  This group consisted of 73 hearing-impaired children with a hearing loss of more than 30 dB
  and less than 90 dB. The children were mainly pupils from the schools for children with
  language, speech and hearing disorders, and the outpatients department for nose, throat and
  ear surgery. Two pupils who had been tested at the Institute for the deaf were also included in
  this group.
- Deaf
  The deaf children had a hearing loss of at least 90 dB. The group of 94 deaf children consisted
  mainly of children who had been tested at the Institutes for the deaf. Two pupils from the
  schools for children with language, speech and hearing disorders, with a hearing loss of more
  than 90 dB were also included in this group.

Background of the children


In table 7.2 the distribution of the five research groups is presented according to sex, age, and
socio-economic level. A distinction is also made between native Dutch and immigrant children.
Table 7.2
Composition of the Research Groups

                                       Research Group
                     General       Pervasive     Speech/
                     development   development   language    Hearing
                     delay         disorder      disorder    impaired    Deaf
Sex
  Boys                 72%           79%           70%         63%        60%
  Girls                28%           21%           30%         37%        40%
Age
  Mean                 5;2           5;6           5;1         5;3        5;3
  (SD)                (1;2)         (1;2)         (1;1)       (1;3)      (1;3)
  2 years               4%            1%            3%          1%         1%
  3 years              13%           11%           17%         18%        19%
  4 years              24%           21%           25%         25%        20%
  5 years              31%           27%           34%         21%        25%
  6 years              24%           30%           20%         29%        28%
  7 years               5%           10%            1%          7%         7%
SES Index
  Mean                 3.1           5.0           3.9         4.4        5.2
  (SD)                (2.1)         (2.7)         (2.3)       (2.2)      (2.8)
  Unknown               8%            4%           17%         23%         7%
Country of birth
  Native Dutch         86%           88%           96%         95%        94%
  Mixed                 8%            5%            3%          2%         6%
  Immigrant             6%            8%            1%          4%         0%
  Unknown               9%            1%           18%         23%        10%

SON-R 2½-7

Children with one parent who was born outside the Netherlands belong to the mixed category.
Boys were over-represented in all groups. This is the case, in particular, in the groups with a
general developmental delay, a pervasive developmental disorder and with speech or language
disorders. In these groups the percentage of boys varied between 70% and 79%. On a national
scale, the percentage of boys in the age range up to 8 years in special education is also twice as
high as the percentage of girls (CBS, 1993). In the groups of hearing-impaired and deaf children, the ratio of boys to girls was lower, with the percentage of boys approximately 60%.
The age distribution was very similar in the various groups. The mean age varied from 5;1 to
5;6 years. Most children were between 3 and 6 years old at the time of the test administration. A
small number of two-year-olds (most older than 2;6 years) and a small number of seven-year-olds (most younger than 7;6 years) were tested.
In the norm group the mean SES level, based on the educational and occupational level of the
parents, was 4.8 with a standard deviation of 2.6. The mean SES level of the children with a
pervasive developmental disorder and of the deaf children was slightly higher; for the hearing-impaired children it was slightly lower. The SES level of the speech or language disabled
children was clearly lower with a mean of 3.9, and the mean SES index of 3.1 of the children
with a general developmental delay was very low. In view of the relationship between the level
of intelligence and the SES level in the norm population, a developmental delay may be expected to occur more frequently in children with a low SES level.
The percentage of native Dutch children in the groups of children with a speech or language
disorder, of hearing-impaired children and of deaf children was relatively high. This was
because, in the group of deaf children, immigrant children did not meet the selection criteria,
and in the other two groups, the research was carried out in the North of the Netherlands where
relatively few immigrants live.

7.2 THE TEST SCORES OF THE GROUPS


In table 7.3, the means of the scores on the different subtests, and of the total scores are
presented for each group. In the second part of the table, the deviation of the mean of each
subtest from the mean of all subtests is shown for each group. In the last part of the table the
distribution of the IQ scores is presented for five intervals of 20 points. The results for each
group will be discussed, then the results of the different groups will be compared.
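
The deviation scores in the second part of table 7.3 can be retraced from the subtest means in the first part. The following is a minimal Python sketch using the means of the deaf group; the function name is illustrative and not taken from the manual. Because the published means are rounded to one decimal, deviations recomputed this way may differ from the printed values by about .1.

```python
# Subtest means of the deaf group, as listed in table 7.3.
DEAF_MEANS = {"Pat": 9.9, "Mos": 9.9, "Puz": 10.3, "Sit": 10.5, "Cat": 8.4, "Ana": 9.2}


def deviations_from_profile_mean(subtest_means):
    """Return each subtest mean minus the mean of all subtest means."""
    profile_mean = sum(subtest_means.values()) / len(subtest_means)
    return {name: value - profile_mean for name, value in subtest_means.items()}


deaf_deviations = deviations_from_profile_mean(DEAF_MEANS)
for name, deviation in deaf_deviations.items():
    # A negative value marks a subtest on which the group scored below its own average.
    print(f"{name}: {deviation:+.1f}")
```

By construction the deviations within a group sum to zero, so they describe the shape of the profile rather than its level.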

Children with a general developmental delay


The children with a general developmental delay had a mean IQ score of 80.3 with a relatively
high standard deviation of 17.6. Nearly 30% had a score lower than 70 (this was 2% in the norm
group). Three percent of the children had a score higher than 110 (26% in the norm group).
There was a small difference between the mean scores on the Performance Scale and on the
Reasoning Scale, but it was not statistically significant (t[237]=1.91, p=.06). This group scored
lowest on Patterns (mean=6.3) and highest on Puzzles (mean=8.0).
The mean IQ score of the children who were tested at the schools for special education with
a department for children at risk in their development (mean=79.5) did not deviate significantly
from the mean score of the children who were tested in the medical daycare centers for preschoolers (mean=80.8). However, the dispersion in scores in the latter group was greater
(sd=19.1) than in the former group (sd=14.9).

Children with a pervasive developmental disorder


Children with a pervasive developmental disorder had a mean IQ score of 78.3 with a
relatively high standard deviation of 18.7. In this group, 75% of the children had a score
lower than 90, and 6% had a score higher than 110. Hardly any difference existed between
the mean scores on the Performance Scale and on the Reasoning Scale. The lowest scores
were obtained on the subtest Patterns (mean=5.6) and the highest scores on the subtest
Analogies (mean=7.8).

Table 7.3
Test Scores per Group

Mean and Standard Deviation

        General        Pervasive      Speech/
        developm.      developm.      language       Hearing
        delay          disorder       disorder       impaired       Deaf
        (N=238)        (N=90)         (N=179)        (N=73)         (N=94)
        Mean  (SD)     Mean  (SD)     Mean  (SD)     Mean  (SD)     Mean  (SD)
Pat      6.3  (3.3)     5.6  (3.6)     7.7  (3.0)     8.2  (2.9)     9.9  (3.0)
Mos      6.6  (3.5)     7.1  (3.9)     8.3  (3.2)     8.6  (3.2)     9.9  (2.7)
Puz      8.0  (3.3)     7.5  (3.3)     8.6  (3.0)     9.3  (2.9)    10.3  (3.3)
Sit      7.8  (3.3)     7.0  (3.4)     8.6  (3.0)     9.5  (3.3)    10.5  (2.8)
Cat      7.2  (3.5)     6.5  (3.7)     7.8  (2.9)     8.9  (3.2)     8.4  (2.6)
Ana      7.6  (2.7)     7.8  (3.4)     8.6  (3.1)     9.1  (3.0)     9.2  (3.1)

PS      81.4 (17.8)    80.2 (19.1)    88.8 (15.8)    91.9 (15.5)   100.0 (15.3)
RS      83.2 (17.3)    80.9 (17.8)    88.8 (15.7)    94.4 (16.9)    95.9 (13.6)

IQ      80.3 (17.6)    78.3 (18.7)    87.5 (15.9)    92.2 (16.6)    97.9 (14.4)

Deviation from the Mean Subtest Score per Subtest

        General        Pervasive      Speech/
        developm.      developm.      language       Hearing
        delay          disorder       disorder       impaired       Deaf
Pat      -1.0           -1.3            -.6            -.7             .2
Mos       -.6             .2             .0            -.4             .2
Puz        .8             .6             .3             .3             .6
Sit        .5             .0             .4             .6             .8
Cat       -.1            -.5            -.5            -.1           -1.3
Ana        .3             .9             .4             .2            -.5

Frequency Distribution of the IQ Scores

                       General      Pervasive    Speech/
            Norm       developm.    developm.    language     Hearing
Interval    group      delay        disorder     disorder     impaired    Deaf
 50- 69       2%         28%          32%          12%           8%         1%
 70- 89      23%         40%          43%          46%          34%        32%
 90-110      49%         28%          19%          36%          48%        46%
111-130      24%          3%           6%           6%           8%        20%
131-150       2%          0%           0%           1%           1%         1%

The children with the diagnosis of autism had a lower IQ score (mean=73.3, N=38) than the
children with the diagnosis of autism related disorder (mean=82.0, N=52; t[88]=2.23, p=.03).
The largest difference between the autistic children and the children with an autism related
disorder was found in the subtests Categories and Situations. Apparently the autistic children
had difficulty completing reasoning tests that use concrete pictures and situations.
The mean IQ score of the children tested by the autism teams did not differ from the mean
scores of the children with a pervasive developmental disorder who were tested at other schools/
institutes.


Children with a speech or language disorder


The mean IQ score of the children with a speech or language disorder was 87.5 with a standard
deviation of 15.9. More than one third of the children had a score between 90 and 110. More
than half had a score lower than 90, and 7% a score higher than 110. The means of both scale
scores were the same. The mean subtest scores deviated less than in the two previous groups.
A slight loss of hearing (less than 30 dB) or varying conductive hearing losses occurred in
more than half the children. These children had a lower IQ score (mean=85.7, N=85) than
the children with good hearing (mean=89.6, N=70) but this difference was not significant
(t[153]=1.52, p=.13).
The difference in mean IQ scores between the children who were tested at the schools for
children with a speech, language and hearing disorder and children from the outpatients department was not significant.

Hearing-impaired children
The mean IQ score of hearing-impaired children was 92.2 with a standard deviation of 16.6. The
mean score on the Reasoning Scale (mean=94.4) was slightly higher than the score on the
Performance Scale (mean=91.9). However, the difference was not significant (t[72]=1.58,
p=.12). The differences between the mean scores on the subtests were also small.
No difference in IQ scores occurred between the children with a loss of hearing of 30-59 dB
(mean=92.5, N=36) and the children with a loss of hearing of 60-89 dB (mean=92.7, N=43).
Hardly any difference in mean IQ scores was found between the children who were tested at
the schools for children with speech, language and hearing disorders, and the children from the
outpatients department.

Deaf Children
The research with deaf children was restricted to native Dutch children who were not multiply
handicapped. A few children with one parent who was born outside the Netherlands were
included in the analysis. The mean IQ score of the deaf children was 97.9 with a standard
deviation of 14.4. As in the norm group, nearly half the children had an IQ score between 90 and
110. A clear difference was found between the scores on the Performance Scale (mean=100.0)
and on the Reasoning Scale (mean=95.9; t[93]=2.82, p=.01). Deaf children obtained the lowest
scores on the subtests Categories (mean=8.4) and Analogies (mean=9.2). The scores on the
other subtests deviated only slightly from the mean of 10 found in the norm group.
These results were very similar to those of the research carried out using the SON-R 5½-17
with the entire population of older deaf children (Snijders, Tellegen & Laros, 1989). The native
Dutch deaf children who were not multiply handicapped (three quarters of the deaf population)
had a mean score on the SON-R 5½-17 of 97.0 and, as on the SON-R 2½-7, the lowest scores were
on the subtests Categories and Analogies. In the research with the SON-R 5½-17, these abstract
reasoning tests also appeared to have the most substantial relationship with the STADO-R, a
written language test for deaf children (De Haan & Tellegen, 1986).

Comparisons between the groups


The differences in mean IQ scores among the five groups were highly significant
(F[4,669]=26.10, p<.001). Differences between pairs of groups were tested at the 5% level,
using the modified LSD procedure (test for the least-significant differences). The difference
between the children with a general developmental delay and the children with a pervasive
developmental disorder was not significant. Both groups scored significantly lower than the
three other groups. The children with a speech or language disorder differed significantly from
the deaf children, but not from the children with impaired hearing (hearing loss less than 90 dB).
The children with impaired hearing did not score significantly lower than the deaf children.
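
The overall F test can be approximated from the summary statistics in table 7.3. The sketch below is a minimal Python illustration of a one-way analysis of variance computed from group sizes, means and standard deviations; the function and variable names are illustrative and not part of the manual. Because the published means and standard deviations are rounded, the result comes close to the reported value but does not match it exactly.

```python
# Group size, mean IQ and standard deviation per research group, from table 7.3.
GROUPS = {
    "General developmental delay": (238, 80.3, 17.6),
    "Pervasive developmental disorder": (90, 78.3, 18.7),
    "Speech/language disorder": (179, 87.5, 15.9),
    "Hearing impaired": (73, 92.2, 16.6),
    "Deaf": (94, 97.9, 14.4),
}


def anova_from_summary(groups):
    """One-way ANOVA F ratio computed from per-group (n, mean, sd) triples."""
    total_n = sum(n for n, _, _ in groups.values())
    grand_mean = sum(n * mean for n, mean, _ in groups.values()) / total_n
    # Between-groups sum of squares: weighted squared distance of each group mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups.values())
    # Within-groups sum of squares, recovered from the group variances.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups.values())
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


f_ratio, df_between, df_within = anova_from_summary(GROUPS)
print(f"F[{df_between},{df_within}] = {f_ratio:.2f}")
```

The pairwise comparisons with the modified LSD procedure are not reproduced here, since they depend on details of the procedure that this section does not specify.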
In figure 7.1, 80% intervals of the distribution of the IQ scores are presented for the different
groups. In each group 10% of the children have a lower score, and 10% have a higher score. To
facilitate comparison, the interval for the children in primary education, four years and older,
from the standardization research is also shown. The intervals illustrate the substantial differences between the groups. The children with a general developmental delay and the children
with pervasive developmental disorders had low performance levels. Deaf children were very
similar to children in primary education. The children with impaired hearing and the children
with a speech or language disorder took an intermediate position.

Figure 7.1
Distribution of the 80% Frequency Interval of the IQ Scores of the Various Groups
[Figure: horizontal bars on an IQ axis running from 50 to 120, one bar per group, from top to bottom: Primary Education, Deaf, Hearing Impaired, Speech/language Disorder, Pervasive Developmental Disorder, General Developmental Delay]

Besides these differences, the figure also shows a large overlap in the distributions of the
groups. The mean scores of the children with a developmental disorder or delay were low, but
in both groups a good 10% of the children had a score higher than 100, which is the mean of
the norm population. In contrast, 10% of the children in these groups had a score of 50 or
thereabouts, which means that they performed at such a low level that the test did not differentiate further.
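
The 80% intervals are based on the observed score distributions. As a rough cross-check only, they can be approximated from the means and standard deviations in table 7.3 under the assumption that the scores in each group are normally distributed; this assumption is not made in the manual, and it ignores the lower limit of the IQ scale at 50. A minimal Python sketch with illustrative names:

```python
from statistics import NormalDist

# Mean and standard deviation of the IQ scores per group, from table 7.3.
IQ_STATS = {
    "Deaf": (97.9, 14.4),
    "Hearing impaired": (92.2, 16.6),
    "Speech/language disorder": (87.5, 15.9),
    "Pervasive developmental disorder": (78.3, 18.7),
    "General developmental delay": (80.3, 17.6),
}


def central_interval(mean, sd, coverage=0.80):
    """Bounds that cut off equal tails of a normal distribution (assumed shape)."""
    tail = (1.0 - coverage) / 2.0
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


for group, (mean, sd) in IQ_STATS.items():
    lower, upper = central_interval(mean, sd)
    print(f"{group}: {lower:.0f} to {upper:.0f}")
```

Under this approximation the upper bound for both developmental groups lies above 100, which agrees with the observation that a good 10% of these children scored above the norm mean.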
In all the groups, children performed relatively poorly on the subtests Categories and (with
exception of the deaf) Patterns. In all the groups, children performed relatively well on Puzzles,
Situations and (with exception of the deaf) on Analogies. The results on Mosaics varied (see
table 7.3).
When evaluating the differences between the groups, the manner in which the groups were
selected must be taken into account. Most of the children examined attended special schools and
institutes that had strict selection procedures for admittance. Children who had, for example, a
pervasive developmental disorder or with impaired hearing, but who were in regular education
were strongly under-represented. In their case, a cognitive delay is less likely to occur. On the
other hand, autistic children in daycare centers for the mentally disabled were not included in
the research. The results are only representative for the children at the kinds of schools and
institutes listed above, and then only to a limited extent due to the small number of schools and
institutes involved. No statement can be made on the basis of this research about the intelligence of autistic children, or the intelligence of children with impaired hearing. Only in the
case of the deaf children was an effort made to obtain a representative picture of the intelligence
level of (native Dutch) deaf children who are not multiply handicapped.

7.3 RELATIONSHIP WITH BACKGROUND VARIABLES


A variance analysis for a number of background variables such as sex, age, SES level and
immigrant status was carried out with the IQ score as dependent variable, controlling for the
research group. No significant interaction effect with the research groups was found for any of
the variables. In table 7.4 the mean values of the IQ scores are presented as the deviation from
the total mean after controlling for the research group.
Few differences were found between boys and girls (p=.64), or among the three age groups
of two and three years, four and five years, and six and seven years (p=.59). A relationship with
the SES level of the parents (p=.02) was found, but this was much weaker than in the norm
group. The difference between the native Dutch children, the immigrant children and the children with a mixed background was not significant (p=.17). However, different background
characteristics (like sex and SES level) played an indirect role in the referral to the special
schools, because of the relative frequency of developmental problems among boys and among
children with a low SES level.
Table 7.4
Relationship of the IQ Scores with Background Variables

                                     N     Dev
Sex                Boys             470     .2
                   Girls            204     .5
Age                2-3 years        121    1.3
                   4-5 years        354     .5
                   6-7 years        199     .2
SES Level          Low              172    2.7
                   Below aver.      233     .5
                   Above aver.      115    3.3
                   High              77    2.6
Country of birth   Native Dutch     538     .2
                   Mixed             32    5.0
                   Immigrant         23    2.8


7.4 DIAGNOSTIC DATA


Diagnostic data for a large number of pupils from the schools for special education with a
department for children at risk in their development and from the medical daycare centers had
been gathered during the admittance procedure to the school or the daycare center in question.
The data refer to the home situation, the existence of emotional problems, behavioral problems
and communicative handicaps, and also include an evaluation of motor, language and cognitive
development. Complete data sets were available for 238 children, 93 children from a department
for children at risk and 145 children from a medical daycare center. Twenty-four of these
children had a pervasive developmental disorder. The mean IQ score of the entire group of 238
children was 80.9 with a standard deviation of 17.1.
In table 7.5, the distribution of the diagnostic variables is presented together with the
mean IQ scores for each category. Various problems and delays appear to be present in all
the diagnostic variables. The most favorable evaluation was found in relation to communicative handicaps (60% none) and motor development (40% normal). Serious behavioral
problems and large delays in language development were mentioned most frequently. With
respect to the evaluation of cognitive development, nearly half the children had a small delay
and 20% had a large delay.
Table 7.5
Reasons for Referral of Children at Schools for Special Education and Medical Daycare Centers for Preschoolers (N=238), with Mean IQ Scores

                                             Fairly           Very
                            Normal           Unfavorable      Unfavorable
                            Pct   Mean       Pct   Mean       Pct   Mean
Home situation              29%   80.1       48%   80.6       23%   82.9

                            None             Light            Severe
                            Pct   Mean       Pct   Mean       Pct   Mean
Emotional problems          17%   79.9       59%   83.3       24%   75.9
Behavioral problems         14%   80.5       51%   81.5       35%   80.3
Communicative handicap      60%   83.7       30%   79.6       10%   67.7

                            Normal           Small Delay      Large Delay
                            Pct   Mean       Pct   Mean       Pct   Mean
Motor development           40%   91.8       48%   73.4       12%   74.1
Language development        24%   93.6       44%   80.2       32%   72.2
Cognitive development       32%   95.6       48%   78.1       20%   64.3

The correlations between the IQ scores and the evaluation of the home situation, and of
emotional and behavioral problems, were weak and not significant. The relationships with the
other diagnostic variables were significant at the 1% level. The correlation with communicative
handicaps was -.26. The correlation with both motor and language development was .46. The
SON-IQ correlated most strongly, .66, with the evaluation of cognitive development. The mean
IQ score of the children whose cognitive development had been evaluated as normal was 95.6,
whereas the mean IQ score of the children with a large delay was more than 30 points lower, i.e.,
64.3. With a stepwise multiple regression the correlation with the IQ increased slightly, from .66
to .67, when motor development was also taken into account.
The Performance Scale and the Reasoning Scale both correlated strongly with the evaluation
of cognitive development (.59 and .61). The Performance Scale had a stronger correlation with
the evaluation of motor development (r=.43) than with the evaluation of language development
(r=.40). The Reasoning Scale had a higher correlation with the evaluation of language development (r=.44) than with the evaluation of motor development (r=.39).

7.5 EVALUATION BY THE EXAMINER


As was done during the standardization research, all the children in the special groups were
rated by the examiner on motivation, concentration, cooperation, and comprehension of the directions, following the test administration. In table 7.6 the ratings and the mean IQ scores are presented for each
group. In a small number of cases (approximately 2%), motivation, cooperation, or comprehension of the directions was evaluated as poor. An exception was the group with a pervasive
developmental disorder, where cooperation was evaluated as poor in 8% of the children.
Concentration was evaluated as low in 5% of the children. In the case of the deaf children,
however, this was 1%. Concentration and, to a lesser degree, motivation were frequently rated as
mediocre or varying. Cooperation and comprehension of the directions were most frequently
rated as good.
The deaf children were evaluated most positively. On average, the children with a pervasive
developmental disorder had the lowest evaluation with relation to motivation, concentration, cooperation and comprehension of directions.

Table 7.6
Relationship between IQ and Evaluation by the Examiner

                      General        Pervasive      Speech/
                      developm.      developm.      language       Hearing
                      delay          disorder       disorder       impaired       Deaf
                      (N=238)        (N=90)         (N=179)        (N=73)         (N=94)
                      Pct   Mean     Pct   Mean     Pct   Mean     Pct   Mean     Pct   Mean
Motivation
  Poor                 2%   63.7      2%   54.5      2%   76.3      3%   71.5      -     -
  Mediocre/Varying    33%   73.4     29%   69.7     22%   78.1     30%   84.2     21%   92.6
  Good                65%   84.4     69%   82.8     75%   90.6     67%   96.7     79%   99.3
  Correlation         .33*           .37*           .33*           .41*           .19

Concentration
  Poor                 6%   62.9      4%   61.3      5%   72.3      6%   74.0      1%   68
  Mediocre/Varying    44%   78.1     43%   72.0     39%   80.7     38%   84.7     26%   92.2
  Good                50%   84.1     52%   85.1     56%   93.4     56%   99.1     73%  100.3
  Correlation         .28*           .39*           .44*           .49*           .31*

Cooperation
  Poor                 1%   61.7      8%   60.7      2%   66.3      1%   86        -     -
  Mediocre/Varying    24%   78.6     14%   72.2     12%   76.5     12%   78.7     13%   86.1
  Good                75%   81.1     78%   81.3     86%   89.4     86%   94.3     87%   99.6
  Correlation         .11            .32*           .32*           .28*           .32*

Comprehension
of directions
  Poor                 3%   55.9      2%   50.0      3%   79.2      3%   63.5      1%   83
  Mediocre/Varying    22%   75.0     28%   71.3     15%   76.6     20%   84.9      9%   90.5
  Good                75%   82.9     70%   82.0     82%   89.8     77%   95.2     90%   98.8
  Correlation         .31*           .34*           .28*           .38*           .19

*: p < .01 with one-tailed testing

Sixty-five percent of the entire group of 674 children were rated good on all four aspects, or on
three aspects, with the fourth rated as mediocre/varying. Eleven percent had a mean rating of
mediocre/varying, or lower.
In comparison to the standardization research, the evaluations of the children from these
special groups were most similar to the evaluations of children two and three years of age.
However, children from special groups received much higher ratings for comprehension of the
directions than did the two- and three-year-olds in the standardization research.
The ratings of motivation, cooperation, and comprehension of the directions correlated
significantly with the IQ score in most groups. The correlations were strongest in the group with
impaired hearing and for the evaluation of concentration. The correlations were substantially
stronger than in the norm group. The main cause for this is that a negative evaluation was more
frequently given in the special groups.


7.6 EVALUATION BY INSTITUTE OR SCHOOL STAFF


In the case of a large number of children who were tested at schools and institutes, a staff
member closely involved with the child evaluated the following four aspects: intelligence,
language development, fine motor skills and communicative orientation (the extent to which the
child seeks and maintains contact with others in his or her surroundings). The evaluation was
given on a five-point scale, running from low via intermediate to high in the case of intelligence and language development, and from low via reasonable to high in the case of motor
skills and communication.
In general the evaluation was carried out after the schools and institutes had received the
provisional results on the test. The possibility that the results on the SON-R 2½-7 influenced
the evaluation cannot be excluded. However, many other test and research data on the
children were available at the schools, so that the question remains whether the test results
on the SON-R 2½-7 contributed much to the evaluation. We also do not know whether the
person making the evaluation was acquainted with the results. Because a certain amount of
contamination may have occurred, the results presented in this section must be interpreted
with care.
Mean evaluations and their correlations with the test scores are presented in table 7.7 for two
broad groups. The first group consisted of children with a general developmental delay (N=222)
and children with a pervasive developmental disorder (N=46). The second group consisted of
children with a speech or language disorder (N=105), children with impaired hearing (N=42)
and deaf children (N=94).
In the group of children with a general developmental delay or with pervasive developmental
disorders, the subjective evaluation of intelligence and language development was generally
low. In the speech/language/hearing-impaired group, the children were given a relatively low
evaluation regarding their language development; on all other aspects, the mean evaluation was
higher than in the first group.
The mean IQ score in the first group was 80.9 with a standard deviation of 17.1; the correlation with the evaluation of intelligence was .68. In the second group, where the dispersion of
both IQ scores and the evaluation of intelligence was narrower, the correlation was also weaker,
i.e., .61.

Table 7.7
Correlations Between Test Scores and Evaluation by Institute or School Staff Member

                General developmental delay/           Speech/language disorder/
                Perv. developm. disorder (N=268)       Hearing impaired/deaf (N=241)
                Intell.  Language  Motor   Commun.     Intell.  Language  Motor   Commun.
Distribution
  Mean            2.4      2.3      2.9     3.0          2.9      2.1      3.2     3.5
  SD             (.9)     (.8)     (.9)    (.9)         (.7)    (1.0)    (1.1)   (1.0)

Correlation
  Patterns        .56      .42      .44     .24          .53      .35      .34     .21
  Mosaics         .60      .41      .36     .15          .51      .27      .21     .21
  Puzzles         .37      .19      .36     .19          .45      .22      .24     .18
  Situations      .50      .39      .25     .19          .37      .28      .20     .10
  Categories      .58      .45      .31     .28          .45      .12      .22     .20
  Analogies       .43      .27      .36     .15          .31      .10      .17     .06

  SON-PS          .59      .40      .45     .22          .59      .33      .32     .24
  SON-RS          .64      .47      .38     .26          .50      .22      .26     .15

  SON-IQ          .68      .48      .46     .27          .61      .31      .32     .23

correlations > .14 are significant at the 1% level

In the group of children with a developmental delay, the correlation of the Reasoning Scale
with the evaluation of intelligence was higher than that of the Performance Scale. The correlations with the subtests Puzzles and Analogies were relatively weak. In the group of children
with language/speech/hearing disorders, the Performance Scale had the highest correlations
with the evaluation of intelligence; Situations and Analogies had the lowest correlations.
Patterns and Mosaics had strong correlations with the evaluation of intelligence in both groups.
Reasonably strong correlations with the evaluation of language development and fine motor
development were also found in both groups. Patterns had the strongest correlation with the
evaluation of motor skills. The Performance Scale correlated more strongly than the Reasoning
Scale with motor skills. The correlations between the test scores and the evaluation of the
communicative orientation of the child were positive but weak.
Using a stepwise multiple regression analysis, the extent of the influence of the other evaluations on the correlation between the evaluation of intelligence and the SON-IQ was examined.
In both groups the correlation increased when the evaluation of motor skills was included; in the
first group from .68 to .74, and in the second group from .61 to .65.

7.7 EXAMINER EFFECTS


The evaluation of examiner effects was much more difficult in the special groups than in the
standardization research, because large differences existed between the groups and because
most or all of the children tested by an examiner belonged to one specific group. Furthermore,
the number of children tested by each examiner was much smaller than in the standardization
research. Using a variance analysis, the differences in IQ scores between the examiners were
tested. The school evaluation of intelligence and fine motor activity, and the SES index were all
controlled for. The comparison was limited to examiners who had tested at least 20 children.
The number of examiners was 11 and the number of children 446.
The main examiner effect, after controlling for the other variables, was significant
(F[10,426]=2.81, p<.01). The beta coefficient was .17 and the mean absolute deviation of the
examiners from the total mean was 2.4 IQ points. These results correspond to the size of the
examiner effect in the standardization research where the absolute deviation of the examiners
was 2.2 points.
The children from one of the Institutes for the deaf, who were all tested by the same examiner,
were not taken into account in the presentation of the results of the deaf children. The mean IQ
score of these children was 82.2 (sd=14.7). This deviated significantly (p<.001) from the mean
IQ score of 97.9 of the other 94 deaf children. The difference was especially large on the
Reasoning Scale, i.e., 20 points; on the Performance Scale the difference was 11 points.
The SON-R 2½-7 and the Performance Scale of the WPPSI-R were administered, a few
months apart, to 19 of the 22 children at this Institute. The mean PIQ score was 104.2, a
difference of more than 23 points with the SON-IQ. Though the scores on the WPPSI-R may be
overestimates, as the directions are adapted and the norms slightly dated, these results suggest a
strong examiner effect in the administration of the SON-R 2½-7. Three years later the SON-R
5½-17 was administered to 18 children at this institute. The mean score was 100.2, nearly 17
points higher than the IQ score on the SON-R 2½-7.
The fact that the correlations with both the WPPSI-R PIQ (r=.82) and the SON-R 5,-17 IQ
(r=.66) were reasonably strong, is noteworthy. This means that the examiner effect occurred
systematically. We suspect that the low performances of the children with this examiner were
related to the short time she had been working at the Institute for the deaf. The examiner was
probably less able to make the aim of the tasks clear to the deaf children. This would also
explain why the performance, especially on reasoning tasks, was so low; the nature of these
tasks is less obvious than that of the tasks in the performance subtests.


7.8 PSYCHOMETRIC CHARACTERISTICS


The correlations between the subtest scores, and the distinction between performance and
reasoning tests, were examined in two groups of children. The relative frequency of heterogeneous test profiles in these groups, in comparison to the standardization group, was also examined.
The first group consisted of children with a general developmental delay and children with a
pervasive developmental disorder. The second group consisted of children with a speech or
language disorder, children with impaired hearing and deaf children.

Correlations between subtests


The mean correlation between the subtests was .51 in the first group of children, and .44 in the
second. This was higher than the mean correlation of .37 for the four- and five-year-olds in the
norm group. The higher correlations can be (partially) explained by the wider dispersion of the
subtest scores; the mean variance of the subtest scores was 11.3 in the first group and 9.5 in the
second group. In the norm population the variances of the subtest scores equal 9.0.
The correlations between the subtests are presented in table 7.8. In both groups, Mosaics had
the strongest correlation with Patterns (.71 and .63). In both groups, the correlation of Analogies
with Puzzles and Situations was relatively weak. In the second group the correlation of Situations with Mosaics and Categories was also weak.
The three performance tests correlated most strongly, in both groups, with the sum of the
remaining subtests. In the first group, the correlation of Analogies with the total score was
weakest; in the second group the correlation of all three reasoning tests with the total score was
relatively weak.
Because the correlations between the subtests were stronger than in the norm group, the
generalizability coefficient of the IQ score in both groups was also higher. In the first group
alpha was .86, in the second group .82. In the norm group the generalizability in the comparable
age range was .78.
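
The link between the mean subtest correlation and the generalizability coefficient can be illustrated with the formula for coefficient alpha based on standardized scores. Treating the generalizability coefficient in this way is an assumption made for this sketch, and the function name is illustrative; the coefficient itself is defined elsewhere in the manual. With six subtests, the mean correlations reported above lead to values close to the reported coefficients.

```python
def standardized_alpha(mean_correlation, n_subtests=6):
    """Coefficient alpha for n_subtests standardized scores with a given mean intercorrelation."""
    return (n_subtests * mean_correlation) / (1.0 + (n_subtests - 1) * mean_correlation)


# Mean subtest intercorrelations reported in this section.
MEAN_CORRELATIONS = {
    "Developmental delay/pervasive disorder": 0.51,
    "Speech/language/hearing groups": 0.44,
    "Norm group, four- and five-year-olds": 0.37,
}

for label, mean_r in MEAN_CORRELATIONS.items():
    print(f"{label}: alpha = {standardized_alpha(mean_r):.3f}")
```

The formula shows why a higher mean correlation between the subtests raises the coefficient even when the number of subtests stays the same.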
Table 7.8
Correlations Between the Subtests and Subtest-Rest Correlations

General developmental delay and pervasive developmental disorder (N=328)

       Pat    Mos    Puz    Sit    Cat    Ana    Subt.-Rest
Pat     -                                           .75
Mos    .71     -                                    .71
Puz    .62    .58     -                             .66
Sit    .53    .49    .50     -                      .64
Cat    .53    .48    .45    .56     -               .63
Ana    .48    .48    .39    .42    .47     -        .57

Speech/language disorder and hearing impaired/deaf (N=346)

       Pat    Mos    Puz    Sit    Cat    Ana    Subt.-Rest
Pat     -                                           .66
Mos    .63     -                                    .66
Puz    .54    .55     -                             .63
Sit    .40    .37    .46     -                      .51
Cat    .43    .42    .40    .37     -               .55
Ana    .43    .42    .37    .34    .46     -        .54
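
As a rough check on the figures above, the mean inter-subtest correlation and the reported alpha values can be approximated from the off-diagonal entries of table 7.8. The sketch below assumes that alpha behaves like the standardized alpha (the Spearman-Brown form based on the mean correlation of six subtests); the manual does not state the exact computation, so the function names and the formula choice are illustrative assumptions only.

```python
# Off-diagonal correlations from table 7.8, one value per subtest pair
# (Pat-Mos, Pat-Puz, Pat-Sit, Pat-Cat, Pat-Ana, Mos-Puz, ... , Cat-Ana).
GROUP_1 = [.71, .62, .53, .53, .48, .58, .49, .48, .48, .50, .45, .39, .56, .42, .47]
GROUP_2 = [.63, .54, .40, .43, .43, .55, .37, .42, .42, .46, .40, .37, .37, .34, .46]


def mean_correlation(pairs):
    """Mean of the pairwise correlations between the subtests."""
    return sum(pairs) / len(pairs)


def standardized_alpha(mean_r, n_subtests=6):
    """Standardized alpha from the mean inter-subtest correlation (assumed formula)."""
    return n_subtests * mean_r / (1 + (n_subtests - 1) * mean_r)


for label, pairs in (("group 1", GROUP_1), ("group 2", GROUP_2)):
    r = mean_correlation(pairs)
    print(label, round(r, 2), round(standardized_alpha(r), 2))

# Norm group, using the reported mean correlation of .37
print("norm group", round(standardized_alpha(0.37), 2))
```

Under this assumption the sketch gives mean correlations of .51 and .44 and alpha values of about .86, .82 and .78, in line with the values reported in the text.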

Principal Components Analysis


A PCA was carried out in both groups. The results were discussed in section 5.4 and presented
in table 5.9. In the combined group of children with a general developmental delay and children
with a pervasive developmental disorder, the first two factors explained 71% of the variance.
The loadings of the subtests on the rotated factors were consistent with the distinction between
performance and reasoning tests.
In the group of children with a speech, language or hearing disorder, the first two factors
explained 66% of the variance. The loadings for five of the subtests corresponded to the
distinction between performance and reasoning tests. However, the subtest Situations had its
highest loading on the first (performance) factor.

Individual profile
The intra-individual differences among the subtest scores of the children in the special groups
were not exceptionally large. In the standardization research the mean dispersion of the six
scores was 2.0 with a standard deviation of .7. The mean for children from the special groups
was 2.1 with a standard deviation of .7. The means varied from 1.9 for children with impaired
hearing to 2.2 for children with a pervasive developmental disorder.


IMMIGRANT CHILDREN

This chapter examines the test performances of children with one or both parents
born outside the Netherlands. These children were tested in the standardization research
(N=147), or attended a preschool playgroup (N=8), or a primary school where complementary
research projects were carried out (N=54). Of these 209 children, 118 were immigrant children
(both parents were born outside the Netherlands) and the remaining 91 children belonged to the
mixed group (one parent born outside the Netherlands). In section 8.5 the results of the immigrant children will be compared with the results of 90 children participating in OPSTAP(JE), a
program to stimulate the development of immigrant children.

8.1 THE TEST RESULTS OF IMMIGRANT CHILDREN


The test scores of the mixed and immigrant groups were compared with the scores of the native
Dutch children from the standardization research. The mean ages were 4;9 years in the native
Dutch group, 5;3 years in the mixed group and 5;6 years in the immigrant group. The percentage
of boys in the mixed group was 47% and in the immigrant group 52%.
The mean scores are presented in table 8.1. The performances of the children with a mixed
background differed only slightly from the performances of native Dutch children. The differences for the total scores were negligible, and none of the differences in the subtest scores were
significant at the 5% level. However, the mean scores of the immigrant children were clearly
lower than those of the native Dutch children. The mean IQ score of the immigrant children was
nearly 8 points lower than that of the native Dutch children. With the exception of Analogies, all
differences were significant at the 1% level.
In the group of immigrant children the differences between the mean subtest scores were
slight. The biggest difference, between Mosaics (8.7) and Analogies (9.3), was not significant at
the 5% level. The fact that the biggest difference in subtest scores in the mixed group also
occurred between Mosaics (9.7) and Analogies (10.5) is noteworthy. The results show that the
lower performances of the immigrant children were not caused or worsened specifically by the
subtests Categories, Situations and Puzzles. These subtests use meaningful picture materials
and might therefore have a culture-specific meaning. The mean score on these three subtests was
equal to the mean score on Patterns, Mosaics and Analogies. These last three subtests use
nonmeaningful picture materials such as geometrical forms. No differences were found between the
mean scores on the Performance Scale and the Reasoning Scale in the immigrant or in the mixed
group.

Table 8.1
Test Scores of Native Dutch Children, Immigrant Children and Children of Mixed Parentage

              Native Dutch     Mixed            Immigrant
              (N=969)          (N=91)           (N=118)
Score         Mean (SD)        Mean (SD)        Mean (SD)

Patterns       10.1  (2.9)      10.0  (2.6)       8.9  (2.8)
Mosaics        10.1  (3.0)       9.7  (2.9)       8.7  (3.1)
Puzzles        10.1  (3.0)      10.3  (3.0)       9.2  (2.4)
Situations     10.1  (2.9)       9.8  (2.9)       9.1  (2.8)
Categories     10.2  (2.9)      10.0  (3.1)       8.8  (3.0)
Analogies      10.0  (2.9)      10.5  (3.2)       9.3  (3.2)

SON-PS        100.7 (15.2)     100.2 (14.3)      93.4 (13.6)
SON-RS        100.4 (14.8)     100.5 (15.2)      93.8 (15.6)

SON-IQ        100.7 (14.9)     100.6 (15.2)      92.8 (14.4)

8.2 RELATIONSHIP WITH THE SES LEVEL


Information about the level of education and occupation of the parents was available for most
children. The SES index, calculated on the basis of these data, had a mean of 5.1 (sd=2.8) in the
mixed group, a mean of 2.5 (sd=2.7) in the immigrant group, and a mean of 4.9 in the native
Dutch group (sd=2.5). The SES index of the immigrant children was significantly (p<.01) lower
than the SES index of the native Dutch children.
In table 8.2 the percentage of children at each SES level is presented for each group (the SES
index has been limited to four categories). The distribution curve of the mixed group was
slightly flatter than that of the native Dutch group; the distribution of the immigrant group was
very skewed. In comparison to the native Dutch group, more than three times as many children
of the immigrant group had a low SES level, whereas the number of children with a high SES
level in the native Dutch group was more than three times as high as that in the immigrant group.
The mean IQ scores for each SES level are also presented in table 8.2. Within each group a
clear and comparable relationship existed between SES level and IQ, and no significant interaction effect was found. When the SES level was controlled, the differences among the three
groups almost disappeared and were no longer significant (F[2,1158]=2.81; p>.05). The difference of nearly eight IQ points between the immigrant and the native Dutch group decreased,
after controlling for the SES level, to three points.
Table 8.2
Relationship Between Group, SES Level and IQ

                 Native Dutch          Mixed                 Immigrant
                 (N=963)               (N=90)                (N=117)
SES Level        Pct   Mean (SD)       Pct   Mean (SD)       Pct   Mean (SD)

Low              18%    92.9 (14.9)    21%    94.3 (11.9)    61%    90.3 (13.5)
Below average    33%    98.8 (13.9)    27%    98.3 (16.0)    22%    94.6 (12.3)
Above average    31%   102.9 (13.8)    28%   102.7 (15.9)    11%    99.1 (17.8)
High             19%   108.1 (14.9)    24%   106.5 (14.2)     6%   102.9 (16.6)
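
The reduction of the difference after controlling for SES level can be illustrated roughly with the values in table 8.2. The sketch below is not the analysis used in the research (an analysis of variance with SES level as a factor); it simply takes the difference between the native Dutch and the immigrant group within each SES level and averages these differences, weighted by the SES distribution of the immigrant group. The weighting scheme and all names are assumptions made for illustration.

```python
# Mean IQ per SES level from table 8.2 (low, below average, above average, high).
NATIVE_MEANS = [92.9, 98.8, 102.9, 108.1]
IMMIGRANT_MEANS = [90.3, 94.6, 99.1, 102.9]
# Proportion of immigrant children at each SES level (assumed weights).
IMMIGRANT_SHARE = [0.61, 0.22, 0.11, 0.06]


def weighted_mean(values, weights):
    """Weighted average of the values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Difference between the groups within each SES level.
within_level_gaps = [n - i for n, i in zip(NATIVE_MEANS, IMMIGRANT_MEANS)]

raw_gap = 100.7 - 92.8  # overall means from table 8.1
adjusted_gap = weighted_mean(within_level_gaps, IMMIGRANT_SHARE)

print(round(raw_gap, 1), round(adjusted_gap, 1))
```

With these weights the within-level difference averages about 3.2 points, compared with an unadjusted difference of 7.9 points, which is in the same range as the three points reported above.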

8.3 DIFFERENTIATION ACCORDING TO COUNTRY OF BIRTH


The largest immigrant groups in the Netherlands come from Surinam, the Antilles, Morocco and
Turkey. These groups are most strongly represented in this research group (table 8.3). Children
of parents born in Surinam, Morocco and Turkey had mean scores close to 90. The small group
of children from other African and South American countries had the same mean score. The
Antillean and Asian children had scores close to 100 and the small group of children from other
Western countries performed above average.
For children with one parent born outside the Netherlands, the differences in mean IQ scores
were slight. Only a small group with one Turkish or Moroccan parent scored clearly below
average.

Table 8.3
Differentiation of Mean IQ Scores According to Country of Birth

                    Country of birth of parents                        Country of birth
                    One or both       One parent       Both parents    of child
                    abroad            abroad           abroad
Country             N     Mean        N     Mean       N     Mean      N     Mean

Surinam             49     92.3       13     97.8      36     90.3      4     79.8
Antilles            22     99.3        7    102.9      15     97.6      9     93.1
Morocco             26     88.7        1     82        25     88.9      3     97.3
Turkey              26     91.0        4     86.3      22     91.8      2     93.0
Indonesia           15    102.6       12    103.6       3     98.7      -      -
Other Africa        14     96.1       10     99.3       4     88.0      4     99.5
Other Asia          11    101.4        4    106.0       7     98.7      6     88.7
Other S-America      8     96.8        7     97.6       1     91        4     88.3
Other Western       38    103.9       33    102.8       5    111.8      7    107.7

Total              209     96.2       91    100.6     118     92.8     39     94.2
A total of 39 children were not born in the Netherlands. In the case of seven of these
children, both parents were born in the Netherlands; these children were presumably adopted. The mean IQ score of these seven children was 4 points lower than of the native Dutch
children. Six of the children in the mixed group were born outside the Netherlands. Their
mean IQ score was nearly 2 points higher than the score of the children in the mixed group
who were born in the Netherlands. Of the 107 children whose parents were both born
outside the Netherlands, and whose country of birth was known, 26 were also born outside
the Netherlands. Their mean IQ score was more than 1 point lower than the IQ score of
the immigrant children who were born in the Netherlands. This indicates that whether the
child was born in the Netherlands or in another country had little effect on the test performance.

8.4 COMPARISON WITH OTHER TESTS


The mean IQ score of the 118 immigrant children in this research project with the SON-R 2½-7
was 92.8, nearly 2 points higher than the mean IQ score of 91.0 of the immigrant children
participating in the standardization research of the SON-R 5½-17 (N=61). In comparison to the
SON-R 5½-17, the scores of the Turkish children were higher, while the scores of the Surinam/
Antillean children were lower.
Research was done with the RAKIT in different immigrant groups by Resing, Bleichrodt and
Drenth (1986). The RAKIT is an intelligence test with verbal and performance tasks for children 4 years and older. In the age group of 5;8 years the mean RAKIT IQ in the Surinam/
Antillean group was 89.6; in the Turkish group this was 80.0 and in the Moroccan group 80.5.
Each group consisted of approximately 60 children. The mean IQ scores on the SON-R 2½-7 in
the three ethnic groups were 3, 9 and 11 points higher respectively.
Using the LEM (Learning test for Ethnic Minorities; Hessels, 1993), research was done with
Turkish and Moroccan children five and six years of age. The LEM was specially designed to
measure learning potential and to depend as little as possible on culture specific knowledge and
skills. The Turkish and Moroccan groups consisted of 120 children each. The mean standardized
total scores of the Turkish and the Moroccan children were 83.5 and 84.4 respectively. This
means that their mean score on the LEM was approximately 6 points lower than the mean IQ
score of Turkish and Moroccan children on the SON-R 2½-7.
The conclusion on the basis of these comparisons is that immigrant children get better results
on the SON-R 2½-7 than on the RAKIT and the LEM. Comparisons of the SON-R 2½-7 and
another test (see section 9.11), administered to the same children, indicate also that the SON-R
2½-7 is much less dependent on culture-specific knowledge and skills.

8.5 THE TEST PERFORMANCES OF CHILDREN PARTICIPATING IN OPSTAP(JE)

OPSTAP is a family intervention program for immigrant families (Eldering & Vedder, 1992)
and has been used in the Netherlands since 1987. It is the Dutch version of the program HIPPY
(Home Intervention Programme for Preschool Youngsters; Lombard, 1981) that was developed
in Israel. OPSTAP is aimed at helping mothers of immigrant children in the kindergarten age
range. OPSTAPJE has been operating for a few years now and is aimed at helping mothers with
children in the preschool age range. The goal of the programs is to enhance the mother's ability
to stimulate the child in his or her development. This is achieved by (group) discussions, by
providing materials and by supplying exercises for the child.
In 1994, research using the SON-R 2½-7 was carried out, in collaboration with Richard
Cress of the Averos Foundation, with a number of children who were participating in OPSTAP
or OPSTAPJE. In general, the test was administered at the end of the two-year intervention
period. Three of the four examiners (all of them female) had participated in OPSTAP(JE) as
coordinator or trainee. One examiner was of Moroccan descent and one of Surinam descent. A
total of 105 OPSTAP(JE) children were tested. We have limited the presentation of the results to
those Surinam, Moroccan and Turkish children, whose parents were both born outside the
Netherlands (N=90). A good comparison can be made between these groups and similar immigrant groups discussed previously that have not, as far as we know, participated in an intervention program.
The percentage of boys in both the OPSTAP(JE) group and the immigrant comparison group
was 53%. The age varied from two to seven years and had a mean of 5;0 years (sd=1;4 years).
The number of Surinam children was 33; the number of Moroccan children was 22 and the
number of Turkish children was 35.
In table 8.4 the mean scores are presented of the OPSTAP(JE) children, of the immigrant
children from the comparison group and of the native Dutch children from the standardization
research. The mean score of the 90 OPSTAP(JE) children was 102.8. This was two points higher
than the mean score of the native Dutch children. However, the difference was not significant.
The mean score of the OPSTAP(JE) children was 12.5 points higher than that of the immigrant
children from the comparison group. The difference according to country of birth was largest
for Moroccan children and least for Surinam children. An analysis of variance carried out with
country of birth and participation in OPSTAP(JE) as factors showed that neither the interaction
effect nor the main effect for country of birth was significant. However, the main effect for
participation in OPSTAP(JE) was highly significant (F[1,167]=33.77, p<.01).

Table 8.4
Mean IQ Scores of Surinam, Turkish and Moroccan Children Who Had Participated in the
OPSTAP(JE) Project

                     OPSTAP(JE)             Comparison Group
Country of birth     Immigrant              Immigrant              Native Dutch
of parents           N    Mean (SD)         N    Mean (SD)         N     Mean (SD)

Surinam              33    98.4 (17.6)      36    90.3 (15.0)
Morocco              22   106.5 (11.3)      25    88.9 (10.1)
Turkey               35   104.7 (11.1)      22    91.8 (14.0)
The Netherlands                                                    969   100.7 (14.9)

Total                90   102.8 (14.2)      83    90.3 (13.3)      969   100.7 (14.9)
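
The totals and differences in table 8.4 can be retraced from the group sizes and means per country. The short sketch below only redoes this arithmetic; the dictionary layout and function name are illustrative and are not part of the original research.

```python
# (N, mean IQ) per country of birth of the parents, taken from table 8.4.
OPSTAP = {"Surinam": (33, 98.4), "Morocco": (22, 106.5), "Turkey": (35, 104.7)}
COMPARISON = {"Surinam": (36, 90.3), "Morocco": (25, 88.9), "Turkey": (22, 91.8)}


def pooled_mean(groups):
    """Mean over all children, weighting each country by its group size."""
    total_n = sum(n for n, _ in groups.values())
    return sum(n * mean for n, mean in groups.values()) / total_n


# Difference between OPSTAP(JE) and comparison children per country.
gaps = {country: round(OPSTAP[country][1] - COMPARISON[country][1], 1)
        for country in OPSTAP}

print(round(pooled_mean(OPSTAP), 1), round(pooled_mean(COMPARISON), 1))
print(gaps)
```

The pooled means come out at 102.8 and 90.3, and the differences per country are 8.1 (Surinam), 17.6 (Morocco) and 12.9 (Turkey), consistent with the description above.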
The possibility exists that factors other than participation in OPSTAP(JE) contributed to
these differences, such as, for instance, the SES level of the parents. A selection effect may have
occurred in the decision for parents to participate in OPSTAP(JE), or when parents agreed to
participate in this research. Another difference is that the test was administered at home in the
OPSTAP(JE) research, and at school in most of the other research projects. The ethnic background of the examiners appeared to have had no influence. The scores of the children who were
tested by the two immigrant examiners were on average two points lower than the scores of the
children who were tested by the two Dutch examiners. Furthermore, the scores of the children
who were tested by an examiner from their own ethnic group were no higher than those of the
other children. What may, of course, have played a role is that all four examiners had a great
deal of experience with immigrant children and were therefore well able to motivate and
stimulate the children. An unambiguous evaluation of the effect of OPSTAP(JE) requires
research with a pretest, post-test, control group design, with the examiner as a variable to be
controlled for.


RELATIONSHIP WITH COGNITIVE TESTS

Within the framework of the validation of the test, the relationship between the IQ scores on the
SON-R 2½-7 and the performances on a large number of cognitive tests was examined. The
validation measures, here referred to as criterion tests, were mostly general development and
intelligence tests like the BOS 2-30, the Stutsman, the GOS 2½-4½, the LDT, the RAKIT,
various versions of the Wechsler tests, the BAS, the MSCA and the TONI-2, and tests for
language development and verbal intelligence like the Reynell Test and the Schlichting Test, the
TvK, the PPVT-R and the PLS-3. More specific tests were also administered, including a
memory test (TOMAL) and a test for visual perception (DTVP-2). In the text the tests will be
described and the acronyms explained.
The administration of the SON-R 2½-7 and the criterion tests was carried out within the
framework of a number of different research projects. In sections 9.1 through 9.7 the results of
each research project are described. These projects were:
1. The nationwide standardization research.
2. Research in the Netherlands on pupils at second year kindergarten level (5-6 years) at
primary schools.
3. Research in the Netherlands at OVB-schools. These are schools with a policy of educational
priority in certain areas designated as low SES areas.
4. The Dutch research at schools and institutes for children with special problems and handicaps.
5. Research in Australia on non-handicapped children and on children with impaired hearing or
a developmental delay.
6. Research in the United States of America on children in regular education.
7. Research in Great Britain on children without specific problems, children with learning
problems and children growing up bilingually.
Table 9.1 presents the tests that were used in the different research projects. In a number of cases
only some sections of the criterion test were administered. In order to be able to compare the
correlations of the research projects better, they have all been corrected for the dispersion of the
IQ scores of the SON-R 2½-7 (Guilford & Fruchter, 1978, p. 325). This correction is not
comparable to the correction for attenuation by which correlations are systematically strengthened. When correcting for dispersion, the correlations are strengthened if the standard deviation
of the SON-IQ in the research group is smaller than 15, and they are weakened if the standard
deviation is larger than 15. As an example we will give a few corrected correlations for an
observed correlation of .60. This becomes: .65 (sd=13); .63 (sd=14); .58 (sd=16) or .55 (sd=17).
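
The example values above can be reproduced with the standard formula for correcting a correlation for restriction or enlargement of range. The sketch below assumes that this is the formula referred to (Guilford & Fruchter, 1978); the function and parameter names are illustrative.

```python
import math


def correct_for_dispersion(r, sd_sample, sd_norm=15.0):
    """Correct an observed correlation for the dispersion of the SON-IQ.

    r         : correlation observed in the research group
    sd_sample : standard deviation of the SON-IQ in the research group
    sd_norm   : standard deviation in the norm population (15 for the SON-IQ)
    """
    k = sd_norm / sd_sample
    return r * k / math.sqrt(1 - r ** 2 + (r * k) ** 2)


# Observed correlation of .60 with different standard deviations of the SON-IQ.
for sd in (13, 14, 15, 16, 17):
    print(sd, round(correct_for_dispersion(0.60, sd), 2))
```

For an observed correlation of .60 this gives .65, .63, .60, .58 and .55 for standard deviations of 13 through 17, matching the examples in the text. Unlike the correction for attenuation, the correction leaves the correlation unchanged when the standard deviation in the research group equals 15.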
In section 9.8 a summary is presented of the correlations between the SON-IQ and the
criterion tests that are discussed in this chapter. A distinction is made between general intelligence tests, nonverbal cognitive tests, and language and verbal intelligence tests. Approximately half of the correlations with general intelligence tests ranged from .59 to .70. With nonverbal
cognitive tests they ranged from .59 to .75, and with verbal (intelligence) measures half of the
correlations ranged from .45 to .54.
Section 9.9 examines whether important differences were found between the correlations of
the Performance Scale and the Reasoning Scale with the criterion tests, and whether these
differences were systematic. When differences were found, the Performance Scale of the
SON-R 2½-7 had a relatively strong correlation with the performance part of other intelligence
tests and with visual perception, whereas the Reasoning Scale had a strong correlation with the
verbal part of other intelligence tests and with language comprehension.

Table 9.1
Overview of the Criterion Tests Used and the Number of Children to Whom Each Test Was
Administered
Netherlands: Standardiz. research, Primary school, OVB school, Special groups
Australia
USA
GB
Criterion Test
P-SON/SON-R 5½-17
BOS 2-30 (BSID)
GOS 2½-4½/K-ABC
RAKIT
WPPSI(-R)/WISC-R
LDT
Stutsman
MSCA (MOS)
BAS
TONI-2
DTVP-2
TOMAL
Reynell (TB)
Schlichting (ZO/WO)
TvK
PPVT-R (Peabody)
PLS-3

119 50 115 165
153
73 41 73
206 26
70 112 80 42
155
31
75
26
58
153 153
558 558 108
179
49
29 47

In section 9.10 the differences between the mean scores of the SON-IQ and mean total scores
of the criterion tests are presented. The problems that occur when making these comparisons are
also examined. Large differences between standardized scores may occur as a result of norms
becoming obsolete, or as a result of differences in populations used for standardization. If
obsolescence of the norms was not taken into account, the scores on the SON-R 2½-7 were
generally lower than on the other tests. If scores on the other tests were corrected for
obsolescence, the mean score of the SON-IQ corresponded with the corrected American and English
norms. The scores on the SON-R 2½-7 were relatively high in comparison to the corrected
Dutch test scores. However, the scores corresponded well with the most recently standardized
test in the Netherlands, the GOS.
Because of the amount of research described in this chapter, it may be easier for the reader to
read the summarizing sections 9.8 through 9.10 first, and then the separate descriptions of each
research project.
Finally, in section 9.11 the SON-R 2½-7 and the criterion tests are compared with respect to
their relationship with a number of external variables. These are the testability of the
child, the correlation with SES level and native country of the parents, and external assessments
of intelligence and language skills.
The results of the research described in this chapter can clarify the extent to which the scores
on the SON-R 2½-7 are comparable with the scores on other intelligence tests, and give insight
into the relationships between the nonverbal measure of intelligence provided by the
SON-R 2½-7 and other aspects of cognitive development such as language skill, memory and
perception. In chapter 10, the results of this correlational research are worked out in more detail
together with the results of the previous chapters. In chapter 10, attention is focussed especially
on the implications of the research results for the use of the test in practice.

9.1 CORRELATION WITH COGNITIVE TESTS IN THE STANDARDIZATION RESEARCH

The design and execution of the standardization research of the SON-R 2½-7 were carried out in
collaboration with the project group responsible for the standardization of the Reynell Test for
Language Comprehension and the Schlichting Test for Language Production. Approximately
half the children participating in the SON-R 2½-7 standardization research had completed the
Reynell and the Schlichting Test half a year earlier. For a small number of children the interval
was a year. Between the administration of the language tests and the SON-R 2½-7, a criterion
test was administered to many of the children as part of the process of validating both the
SON-R 2½-7 and the Reynell/Schlichting Test. As an extra criterion test, the BOS, the GOS, the
RAKIT or the TvK was used. In the case of many of the children who had not been tested
previously with another test, either the GOS or the RAKIT was administered approximately
three months after administration of the SON-R 2½-7, or the children were tested again with
either the SON-R 2½-7 or the SON-R 5½-17.
In table 9.2 the background of the children to whom a criterion test was administered is
presented. The age groups refer to the age at which the SON-R 2½-7 was administered. The
results are presented in table 9.3. The age in this table is based on the mean age at administration
of the SON-R 2½-7 and the criterion test. The interval (in months) is the period between the
administration of the tests. The results of the research are discussed for each test.

Table 9.2
Characteristics of the Children to Whom a Criterion Test Was Administered in the
Standardization Research

                  Retest   SON-R                               Reynell-Schlichting
                  SON-R    5½-17    BOS      GOS      RAKIT    LC/SD/WD Lexi     AM       TvK
                  2½-7

Total             141      119      50       115      165      558      56       241      108

Age group
  2;3 years       12       -        28       11       -        41       47       -        -
  2;9 years       13       -        22       19       -        51       9        -        -
  3;3 years       12       -        -        19       -        54       -        24       -
  3;9 years       9        -        -        25       -        57       -        53       -
  4;3 years       21       -        -        26       14       54       -        50       11
  4;9 years       30       -        -        14       23       55       -        55       16
  5;3 years       7        22       -        1        25       64       -        59       18
  5;9 years       12       23       -        -        27       66       -        -        17
  6;3 years       9        23       -        -        25       58       -        -        16
  6;9 years       8        27       -        -        27       58       -        -        15
  7;3 years       8        24       -        -        24       -        -        -        15

Sex
  Boys            72       63       24       59       80       269      23       117      50
  Girls           69       56       26       56       85       289      33       124      58

SES Index
  Mean            5.3      4.1      4.9      5.5      4.9      4.9      4.5      5.0      4.4
  (SD)            (2.7)    (2.2)    (2.7)    (2.7)    (3.0)    (2.5)    (2.3)    (2.5)    (2.3)

Country of birth
  Native Dutch    84%      83%      94%      90%      87%      92%      96%      93%      90%
  Mixed           6%       7%       2%       6%       4%       5%       4%       4%       5%
  Immigrant       9%       10%      4%       4%       9%       3%       -        3%       5%

the age group is the age at the time of administration of the SON-R 2½-7

Table 9.3
Correlations with Other Tests in the Standardization Research

                                            Scores
                                            Criterion       SON-R 2½-7      Age (years)   Interval (months)
Criterion Test                N      r      Mean (SD)       Mean (SD)       Mean (SD)     Mean (SD)

SON-R 2½-7
  IQ-score on retest          141    .79    109.4 (14.7)    103.4 (13.7)    4.7 (1.4)     3.5 (0.7)
SON-R 5½-17
  Standard IQ                 119    .76    103.6 (12.2)     98.2 (12.6)    6.4 (0.7)     3.6 (0.7)
BOS 2-30
  Mental Scale                50     .59    100.5 (17.7)    103.0 (15.9)    2.4 (0.3)     2.7 (0.7)
  Nonverbal Scale                    .53     98.6 (17.3)
GOS 2½-4½
  Cognitive DI                115    .65    104.4 (15.7)    102.9 (15.5)    3.6 (0.7)     3.2 (0.7)
  Simultaneous DI                    .63    102.8 (17.4)
  Sequential DI                      .49    105.1 (13.0)
RAKIT
  Shortened version IQ        165    .60    102.2 (15.6)    102.4 (14.6)    5.8 (1.0)     3.0 (0.7)
REYNELL/SCHLICHTING
  Mean LC, SD and WD          558    .48    100.8 (12.8)    101.4 (15.4)    4.4 (1.4)     6.3 (1.4)
  Lang.comprehension (LC)            .46    101.1 (15.0)
  Sentence Developm. (SD)            .35    100.3 (14.2)
  Word Development (WD)              .45    100.9 (14.8)
  Auditive Memory             241    .27    100.6 (14.3)    101.9 (15.8)    4.1 (0.7)     6.1 (1.0)
  Lexilist                    56     .54    102.4 (15.8)    101.7 (16.5)    2.0 (0.1)     6.9 (2.5)
TvK
  Mean of 5 subtests          108    .59      4.7 (1.6)     101.2 (15.4)    5.7 (1.0)     3.0 (0.8)

the correlations have been corrected for the variance of the SON-IQ
the age is the mean age at the time of administration of the SON-R 2½-7 and the criterion test

SON-R 2½-7
To determine the stability of the test results, the SON-R 2½-7 was administered a second time to
141 children after a delay of three to four months. The results were presented in section 5.5.
They will be discussed briefly here as they may serve as a basis for the assessment of the
correlations of the SON-R 2½-7 with other tests. The age at the first administration ranged from
two to seven years with a mean of 4;6 years. The correlation between the IQ scores was .79. This
correlation increased slightly with age. With children up to 4;6 years (N=67), the correlation
was .78 and with older children the correlation was .81 (N=74). On the basis of these retest
correlations, correlations with criterion tests were not expected to exceed .80 if the period
between administrations was a few months or more.

SON-R 5½-17
After a delay of at least three months, the SON-R 5½-17 was administered to 119 children 5
years and older (mean age was 6;3 years). The more difficult items of the subtests Mosaics,
Categories, Analogies and Situations of the SON-R 2½-7 are very similar in content to the
easier items of these subtests of the SON-R 5½-17. The subtest Puzzles does not have an
equivalent in the SON-R 5½-17. Two new subtests in this test are Stories and Hidden Pictures.
Both tests have a subtest Patterns; however, the subtests differ in content.
The correlation between the IQ scores of the two tests was .76. The correlation was as high
with children younger than 6;6 years (N=68; r=.75) as with the older children (N=51; r=.75). As
was the case with the retest of the SON-R 2½-7, there was a noticeable learning effect with the
administration of the SON-R 5½-17. The mean scores were more than 5 points higher with the
SON-R 5½-17.

BOS 2-30
The BOS 2-30 (Bayley Developmental Scales; Van der Meulen & Smrkovsky, 1983) is a test for
the mental and motor development of children in the age range from two to thirty months. This
test is the Dutch version of the Bayley Scales of Infant Development (BSID; Bayley, 1969). A
developmental index is calculated for the Mental Scale and the Motor Scale with a mean of 100
and a standard deviation of 16. A nonverbal score for the Mental Scale can be determined by
excluding the items with a verbal content in the scoring (Van der Meulen & Smrkovsky, 1987;
Le Coultre-Martin et al., 1988).
Fifty children (24 boys and 26 girls) were tested. In the case of 47 children, both parents
were born in the Netherlands. The SES level corresponded to that of the norm group. The mean
age at the time of administration of the BOS was 2;3 years. The SON-R 2½-7 was administered
two to four months later. The administration of the BOS was limited to the Mental Scale.
The correlation of the developmental index of the Mental Scale of the BOS with the SON-IQ
was .59. The correlation of the Nonverbal Scale of the BOS with the SON-IQ was slightly lower
(r=.53). On average, the children scored more than two points lower on the BOS than on the
SON-R 2½-7.

GOS 2½-4½
The GOS 2½-4½ (Groningen Developmental Scales; Neutel, Van der Meulen & Lutje Spelberg,
1996) is the Dutch version, for children from 2½ to 4½ years, of the Kaufman Assessment
Battery for Children (K-ABC; Kaufman & Kaufman, 1983). Two new subtests were added to
the GOS (Motor Skills and Copying Figures). In contrast to the K-ABC, the GOS does not make
a distinction between a Mental Scale and an Achievement Scale. The number of subtests administered is 9, 11 or 13, depending on age. The total of all subtests forms the Cognitive Scale.
Furthermore, the subtests are subdivided into a Simultaneous Scale and a Sequential Scale.
Three subtests from the Achievement Scale of the K-ABC have been added to the Simultaneous
Scale. The subtest Arithmetic and the two new subtests are part of the Sequential Scale. The
three developmental indexes have a mean of 100 and a standard deviation of 15.
The GOS was administered to 115 children (59 boys and 56 girls). The mean SES index was
5.5. In the case of 103 children, both parents were born in the Netherlands. The period between
the administration of the tests was on average three months. The GOS was administered first to
64 children, and the SON-R 2½-7 was administered first to 51 children. The mean age at the
time of administration of the tests was 3;7 years.
The correlation between the Cognitive Developmental Scale of the GOS and the SON-IQ
was .65. The mean and the standard deviation of both tests were very similar. The correlation for
three age groups, based on the age at the time of administration of the GOS, was .64 (younger
than 3;2 years; N=39), .62 (age between 3;2 and 4;2 years; N=51) and .77 (older than 4;2 years;
N=25).
In the entire group the correlation between the Simultaneous Scale and the SON-IQ was .63,
and between the Sequential Scale and the SON-IQ .49. The dispersion of the Simultaneous Scale
(sd=17.4) was significantly larger than that of the Sequential Scale (sd=13.0). When the correlations were corrected not for the standard deviation of the SON-IQ, but instead for the standard
deviation of the two subscales of the GOS, the correlation of the Simultaneous Scale with the
SON-IQ was .59 and the correlation of the Sequential Scale with the SON-IQ was .56.

RAKIT
The RAKIT (Revision of the Amsterdam Intelligence Test for Children; Bleichrodt, Drenth,
Zaal & Resing, 1984) is a general intelligence test, developed in the Netherlands, for children in
the age range four to eleven years. There are twelve subtests which tap spatial-perceptual as
well as verbal abilities. In the age range six to ten years, the RAKIT IQ has a correlation of .81
with the IQ score on the WISC-R (Bleichrodt, Resing, Drenth & Zaal, 1987). In our research
project the shortened version of the RAKIT, five or six subtests, depending on age, was administered. The IQ score of the shortened RAKIT has a mean of 100 and a standard deviation of 15.
Research was done with 165 children (80 boys and 85 girls). The mean SES index was 4.9.
Thirteen percent of the children had one or both parents born outside the Netherlands. The
RAKIT was administered first to 111 children and the SON-R 2,-7 was administered first to 54
children. The mean interval between the two administrations was three months. The age at the
time of administration was on average 5;10 years.
The correlation between the SON-IQ and the shortened version RAKIT IQ was .60. The
mean and the dispersion of both tests corresponded well. Three age groups were distinguished
on the basis of the combination of RAKIT subtests administered. In the first group (mean age
4;8 years at the time of administration of the RAKIT; N=53) the correlation was .50. In the
second group (mean age 5;8 years; N=48) the correlation was .62. In the oldest age group (mean
age 6;10 years; N=64) the correlation between the SON-IQ and the RAKIT IQ was .65.

Reynell Test and Schlichting Test


The Reynell Test for Language Comprehension (Van Eldik et al., 1995) is the recently completed Dutch revision of the language comprehension section of the Reynell Developmental
Language Scales (Reynell, 1985). The test provides one standardized score for receptive language development. The Schlichting Test for Language Production (Schlichting et al., 1995) is
a newly developed test for expressive language development. The test provides standardized
scores for Sentence Development and Word Development for children in the age range 1;8 to
6;3 years. For the age range 2;9 to 4;9 the test has an Auditive Memory section (repeating series
of words). For the age of 1;9 years standardized scores are calculated for the Lexilist, a list of
words and sentences from early language development completed by the parents.
The Reynell Test and the Schlichting Test were standardized on the same population. Both
tests were administered in one session. The standardized scores have a mean of 100 and a
standard deviation of 15.
Half a year, or in a few cases one year, after the Reynell/Schlichting Test, the SON-R 2,-7
was administered to 558 children (269 boys and 289 girls). The interval between the administration of the language tests and the SON-R 2,-7 was, on average, 6.3 months. The mean age at
the time of administration of the SON-R 2,-7 was 4;7 years. The SES index had a mean of 4.9.
In the case of 92% of the children both parents were born in the Netherlands.
The correlation of the SON-IQ with the Language Comprehension score on the Reynell Test
was .46; the correlations with Sentence Development and Word Development of the Schlichting
Test were .35 and .45 respectively. The correlation of the SON-IQ with the mean score on
Language Comprehension, Sentence Development and Word Development was .48. This correlation increased with age. Depending on the age at the time of administration of the Reynell/
Schlichting Test, the correlation for the one- and two-year-olds was .40 (N=153), for the
three-year-olds .36 (N=124), for the four-year-olds .51 (N=119), for the five-year-olds .52 (N=124)
and for the six-year-olds .72 (N=58).
The correlation between the SON-IQ and the score on the Lexilist was .54 (N=56). The age
at the time of administration of the Lexilist was 1;9 years. The mean age at the time of administration of the SON-R 2,-7 was 2;4 years.
The correlation of the SON-IQ with the Auditive Memory section of the Schlichting Test was
.27 (N=241). For children less than four years at the time of administration of the Schlichting
Test, the correlation was .25 (N=127) and for the older children the correlation was .28 (N=114).

TvK
The TvK (Language Tests for Children; Van Bon, 1982) is a test battery consisting of ten tests
for receptive and productive language development in children in the age range four to ten
years, developed in the Netherlands. The TvK is an adaptation of the Illinois Test of Psycholinguistic Abilities (ITPA). During the research two receptive tests (Choice of Sentence Structure
and the Choice of Vocabulary) and three productive tests (Word Form Production, Sentence
Structure Production 0 and Vocabulary Production 1) were administered. The scaled scores of
the tests have a mean of 5 and a standard deviation of 2.
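The tests in this chapter report scores on different scales: a mean of 5 and a standard deviation of 2 for the TvK, 15 and 5 for the RAKIT subtests, 0 and 1 for the RDLS, and 100 and 15 for IQ scores. The sketch below shows the linear transformation that puts a score from one such scale onto another. The function name is hypothetical, and the conversion assumes nothing beyond the stated means and standard deviations; it does not make norms from different populations equivalent.

```python
def rescale(score: float, from_mean: float, from_sd: float,
            to_mean: float = 100.0, to_sd: float = 15.0) -> float:
    """Convert a standard score to another scale via its z-score."""
    z = (score - from_mean) / from_sd  # distance from the mean in SD units
    return to_mean + to_sd * z


# A TvK scaled score of 7 (mean 5, SD 2) lies one SD above the mean,
# which corresponds to 115 on a scale with mean 100 and SD 15.
tvk_on_iq_scale = rescale(7, from_mean=5, from_sd=2)
print(tvk_on_iq_scale)
```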
The TvK was administered to 108 children (50 boys and 58 girls). In the case of 97 children,
both parents were born in the Netherlands. The SES index had a mean of 4.4. The age at the time
of administration of the TvK was on average 5;6 years. The SON-R 2,-7 was administered on
average three months later.
The correlations of the SON-IQ with the subtests of the TvK ranged from .39 (Choice of
Sentence Structure) to .52 (Choice of Vocabulary). The correlation of the SON-IQ with the
mean score on the five subtests of the TvK was .59. For the younger children (age at the time of
administration of the TvK less than 5;6 years; N=53) the correlation was .50; for the older
children the correlation was .68 (N=55).

9.2 CORRELATION WITH NONVERBAL TESTS IN PRIMARY EDUCATION

Within the framework of a research project carried out by psychology students, the SON-R
2½-7 was administered to pupils in the second year of kindergarten (approximately 5-6 years of
age) at six primary schools (Van den Berg et al., 1994; Driesens et al., 1994; Elsjan et al., 1994).
Three other nonverbal tests were also administered: the TONI-2, the nonverbal section of the
TOMAL and the DTVP-2.
The TONI-2 (Brown, Sherbenou & Johnsen, 1990) is the revision of the Test of Nonverbal
Intelligence (TONI; Brown, Sherbenou & Johnsen, 1982). The test has only one section and can
be administered in approximately 15 minutes. The test consists of multiple choice items in
which the relationship between abstract figures must be discovered. Two parallel versions of the
test are available. In this research Form A was used. The TONI-2 has been standardized in the
United States of America for the age range from 5 to 86 years.
The TOMAL (Test of Memory and Learning; Reynolds & Bigler, 1994) is a battery of
memory tests, developed and standardized in America. The standard battery consists of five
verbal and five nonverbal subtests. There are also four supplementary subtests. During this
research, the administration of the TOMAL was limited to the five nonverbal subtests which,
together, provide the score on the Nonverbal Memory Index. The test has been standardized for
the age range from five to nine years.
The DTVP-2 (Hammill, Pearson & Voress, 1993) is the recent American revision of the
Marianne Frostig Developmental Test of Visual Perception (Frostig, Lefever & Whittlesey,
1966). The original test was published in the Netherlands as the Test voor Visuele Waarneming
(Van den Akker & Van Boecop, 1976). The DTVP-2 consists of eight subtests. Besides the total
score for General Visual Perception, separate total scores can be calculated for Motor-Reduced
Visual Perception and Visual-Motor Integration, each of which is based on four subtests. The
test has been standardized for the age range from four to eleven years.


Table 9.4
Correlations with Nonverbal Cognitive Tests in the Second Year of Kindergarten, 5 to 6 Years of
Age (N=153)

                                                Correlation
Test         Score                              with SON-IQ    Mean (SD)
SON-R 2½-7   IQ                                                102.4 (15.8)
TONI-2       Form A                             .51            103.5 (14.1)
TOMAL        Nonverbal Memory Index             .45             97.5 (11.7)
DTVP-2       General Visual Perception          .73            109.2 (14.4)
             Motor-Reduced Visual Perception    .70            100.8 (13.8)
             Visual Motor Integration           .66            116.7 (15.7)

the correlations have been corrected for the variance of the SON-IQ

The testing materials did not have to be adapted for the research in the Netherlands. The
directions of the TOMAL and the DTVP-2 were translated. The directions of the TONI-2 are
given nonverbally. American norms were used in the research. The standardized total scores
have a mean of 100 and a standard deviation of 15. The norms for the TONI-2 and the TOMAL
are given for each year of age and are therefore very rough for the young age groups. The
standardized scores for the age in months were therefore calculated by interpolation and extrapolation.
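The manual does not describe the interpolation procedure in detail. The sketch below assumes the simplest version: each yearly norm value is placed at a reference age in months, values for intermediate ages are obtained by linear interpolation, and ages outside the tabled range are extrapolated along the nearest segment. The function name and the norm values are hypothetical and are not taken from the TONI-2 or TOMAL manuals.

```python
def interpolate_norm(age_months: float,
                     norm_points: list[tuple[float, float]]) -> float:
    """Linearly interpolate (or extrapolate) a norm value for an age in months.

    norm_points holds (age_in_months, norm_value) pairs, one per tabled age.
    """
    pts = sorted(norm_points)
    if len(pts) < 2:
        raise ValueError("need at least two norm points")
    if age_months <= pts[0][0]:
        # Below the youngest tabled age: extend the first segment.
        (x0, y0), (x1, y1) = pts[0], pts[1]
    elif age_months >= pts[-1][0]:
        # Above the oldest tabled age: extend the last segment.
        (x0, y0), (x1, y1) = pts[-2], pts[-1]
    else:
        # Find the segment that contains the requested age.
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= age_months <= x1:
                break
    return y0 + (y1 - y0) * (age_months - x0) / (x1 - x0)


# Hypothetical yearly norm means, placed at the midpoint of each age year.
yearly_means = [(66, 10.0), (78, 16.0), (90, 22.0)]
print(interpolate_norm(72, yearly_means))  # interpolated between 66 and 78 months
print(interpolate_norm(60, yearly_means))  # extrapolated below the youngest norm point
```

The same function could be applied to the tabled standard deviations, after which a raw score would be standardized against the interpolated mean and standard deviation for the child's exact age.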
The research was carried out on 153 children (64 boys and 89 girls). The mean age of the
children was 5;10 years with a standard deviation of 5 months. The SES index had a mean of 6.6
(sd=3.0) and was clearly higher than the mean of the norm group. The percentage of native
Dutch children was 86%.
All four tests were administered to the children in three sessions at school. The administration of the TONI-2 and the TOMAL was combined, the TONI-2 being administered first. The
sequence of administration of the SON-R 2,-7, the TONI/TOMAL and the DTVP-2 varied.
The mean interval between the administration of the SON-R 2,-7 and one of the other tests was
21 days with a standard deviation of 12 days.
The mean scores and the correlations between the SON-IQ and the total scores on the other
tests are presented in table 9.4. The correlation with the IQ score on the TONI-2 was .51; the
correlation with the Nonverbal Memory Index of the TOMAL was .45. The highest correlation,
.73, was found with the total score on the DTVP-2. The correlation with the tasks that do not
require motor skills was somewhat stronger (r=.70) than the correlation with the visual motor
tasks (r=.66).

9.3 CORRELATION WITH COGNITIVE TESTS AT OVB-SCHOOLS


In the framework of research on the effect of a policy for the stimulation of children from socioeconomic and cultural groups with educational delays (Bollen, 1991, 1996; Rekveld, 1994), a
group of children was tested in 1991 with a number of subtests of the LDT and the RAKIT. Two
years later, in 1993, the tests were administered for a second time, together with the SON-R
2,-7. After a further two years, the WISC-R and two tests for reading skills were administered
to a number of these children. The first test administration, in 1991, took place at the beginning
of the school year with children in the first year of kindergarten (approximately 4-5 years of
age) at four different OVB-schools. These are regular primary schools which have been given
educational priority because of the large number of native Dutch children with a low SES level
and/or the large number of immigrant children.


At the time of administration of the SON-R 2,-7 and the second administration of the LDT and
the RAKIT, most of the children were in first grade (approximately 6-7 years of age). The mean
age at the time of administration of the SON-R 2,-7 was 6;8 years with a standard deviation of
four months. The SES index of the children (34 boys and 39 girls) had a mean of 2.0 (sd=1.9).
The SES level of 75% of the children was low, 16% were below average and 9% were above
average or high. Approximately half the children had one or both parents born outside the
Netherlands, mainly in Surinam or the Antilles.
The LDT (Leiden Diagnostic Test; Schroots & Van Alphen de Veer, 1976) is a general
intelligence test for children in the age range four to eight years. The test has eight subtests,
some taken from other tests. During the research the performance subtest Block Patterns and
three verbal subtests (Repeating Sentences, Questions about a Story and Comprehension and
Insight) were administered. The standardized subtest scores have a mean of 100 and a standard
deviation of 15. Four verbal subtests from the RAKIT (Bleichrodt et al., 1984) were administered (Meaning of Words, Learning Names, Production of Ideas and Story Pictures). These are
all components of the verbal learning and fluency factor. The standardized subtest scores have a
mean of 15 and a standard deviation of 5. The WISC-R is the Dutch edition (Van Haasen et al.,
1986) of the American test with the same name (Wechsler, 1974). The scores for the performance IQ (PIQ), the verbal IQ (VIQ) and the total IQ (FSIQ) have a mean of 100 and a standard
deviation of 15. Two reading tests that had been developed by the CITO (Central Institute for
Test Development) were administered in the school year 1995/96. These were the Cito Three-Minute-Test for the level of Technical Reading and the Cito Test for Textual Reading.
The subtests of the LDT and the RAKIT were administered in 1991 (N=69) and in 1993
(N=73) in one session. The period between administration of the LDT/RAKIT and the SON-R
2½-7 varied in 1993 from several days up to several weeks. The WISC-R and the Test for
Technical Reading were administered at the beginning of the school year 95/96 (N=41); the
Test for Textual Reading was administered later that year (N=35).

Table 9.5
Correlations with Cognitive Tests Completed by Children at Low SES Schools Given Educational Priority (OVB-Schools)

Year                                               Crit. Test      SON-R 2½-7
adm.   Criterion Test                N     r       Mean (SD)       Mean (SD)
91     LDT
         Block patterns              69    .54     99.7 (13.7)     92.7 (15.4)
         Mean 3 verbal tests               .44     96.3 (11.0)
       RAKIT
         Mean 4 verbal tests               .61     12.7 ( 3.6)
93     LDT
         Block patterns              73    .66     95.8 (13.7)     92.0 (15.0)
         Mean 3 verbal tests               .54     98.6 (10.8)
       RAKIT
         Mean 4 verbal tests               .42     13.5 ( 3.5)
95     WISC-R
         Total IQ                    41    .74     90.5 (13.1)     92.2 (14.1)
         Performance IQ                    .73     91.5 (12.8)
         Verbal IQ                         .60     91.2 (13.6)
       CITO-test
         Technical reading                 .38     39.0 (21.6)
         Textual reading                   .52     16.4 (16.1)

the correlations have been corrected for the variance of the SON-IQ
the SON-R 2½-7 was administered in 1993

The correlations of the SON-IQ with the various test scores are presented in table 9.5. The
correlation with the performance subtest Block Patterns of the LDT, administered two years
earlier, was .54. When administered in the same period as the SON-R 2,-7, the correlation
increased to .66. The correlation with the three verbal subtests of the LDT also increased from
.44 to .54. It is noteworthy that the strong correlation of the SON-IQ with the four verbal
subtests of the RAKIT decreased from .61 to .42. The two subtests that had weaker correlations
with the SON-R 2½-7 in 1993 (Word Meaning and Production of Ideas) also had weaker
correlations with the LDT when administered in 1993.
The strongest correlation was found between the SON-IQ and the WISC-R, which was
administered two years later. The correlation with the total IQ was .74; the correlations with the
PIQ and the VIQ were .73 and .60 respectively. On average the SON-IQ score was slightly
higher than the IQ score on the WISC-R.
The correlation of the SON-IQ with the Test for Textual Reading was .52 and the correlation
with the Test for Technical Reading was .38.

9.4 CORRELATION WITH COGNITIVE TESTS IN SPECIAL GROUPS


In the framework of the validation research, the SON-R 2½-7 was administered at a number of
schools and institutes for children with special problems and handicaps. Information on the
children's scores on a number of cognitive tests that are frequently used in these groups was
requested from the schools. The tests were the Preschool SON, the SON-R 5½-17, the BOS, the
Stutsman, various versions of the Wechsler tests, the LDT and the RAKIT. Furthermore, information was requested concerning two tests for language development, the Reynell and the TvK.
If a test had been administered several times, the most recent score was used.
For each criterion test, an overview is presented in table 9.6 of the age distribution of the
children at the time of administration of the SON-R 2,-7, of various other background characteristics and of the specific groups the children came from. These groups are described in
chapter 7. Most of the children for whom other test results were known were four to six years
old at the time the SON-R 2,-7 was administered. The other test data, in the case of the younger
children, came mostly from the Preschool SON and the Reynell. In general, the criterion test
had been administered earlier, often many years earlier. The percentage of boys was relatively
high, and the average SES level was lower than in the norm population.
The correlation of the SON-R 2,-7 with the various criterion tests is presented in table 9.7.
The results for each test will be discussed separately.

Preschool SON
In the case of 188 children, the IQ scores were known on the predecessor of the SON-R
2,-7, the Preschool SON. More than half of this group were children with language/speech
and hearing problems. Additionally, a large number of children with a general developmental delay and/or a pervasive developmental disorder were tested with the Preschool SON.
The IQ scores of the deaf children that were based on the separate standardization for the
deaf, were transformed into IQ scores based on the standardization for the hearing. Data
from the Preschool SON were only used in the analysis if the test had been administered in
full.
The mean age at the time of administration of the Preschool SON was 3;10 years, the mean
age at the time of administration of the SON-R 2,-7 was 5;3 years. The period between the
administration of the tests was, on average, nearly a year and a half. In a few cases the interval
was more than four years. In the case of 95% of the children, the SON-R 2,-7 was administered
after the Preschool SON.
The correlation between the IQ scores on both tests was .65. This correlation increased
greatly as the age at which the Preschool SON was administered increased. In the age group up
to 3;5 years the correlation was .57 (N=60); in the age range 3;5 to 4;1 years the correlation was
.64 (N=64) and in the age range from 4;1 onwards the correlation was .77 (N=64). The interval
between the administration of the two tests may have influenced the increase in the correlations
with age. In the youngest group the average interval was 22 months and in the oldest group 10
months. A relatively large difference, 13 IQ points, was found between the mean scores of the
two tests. A substantial decrease in IQ scores can be expected in view of the interval of more
than 20 years between the two standardizations.
Table 9.6
Characteristics of the Children in the Special Groups to Whom a Criterion Test Was
Administered

                      P-SON/                        WPPSI/
                      SON-R                         WPPSI-R/
                      5½-17     BOS       Stutsman  WISC-R    LDT       RAKIT     Reynell   TvK
Total                 206       26        42        112       80        70        179       49
Age
  2 years             -         -         2         -         -         -         2         -
  3 years             21        4         5         2         1         -         39        -
  4 years             63        11        17        16        9         8         47        6
  5 years             73        6         8         42        20        23        60        21
  6 years             41        4         9         46        39        36        31        21
  7 years             8         1         1         6         11        3         -         1
Group
  Gen.Dev.Disorder    57        -         12        61        44        26        64        -
  Perv.Dev.Disorder   22        -         3         8         7         4         26        12
  Speech/lang.Disord. 58        9         -         4         20        24        74        28
  Hearing impaired    23        13        -         -         9         16        15        9
  Deaf                46        4         27        39        -         -         -         -
Sex
  Boys                140       11        30        86        55        45        128       33
  Girls               66        15        12        26        25        25        51        16
SES Index
  Mean                4.2       3.9       4.7       4.0       3.4       3.6       3.5       3.7
  (SD)                (2.6)     (2.1)     (2.9)     (2.5)     (2.4)     (2.5)     (2.0)     (2.3)
Country of birth
  Native Dutch        89%       95%       90%       94%       91%       97%       92%       96%
  Mixed               7%        5%        5%        5%        4%        2%        5%        2%
  Immigrant           4%        -         5%        1%        5%        2%        2%        2%

the age is the age at the time of administration of the SON-R 2½-7


SON-R 5½-17
The children at one institute for the deaf were not included in the analysis of the special groups,
because of the probability that the low scores of these children were the result of an examiner
effect (see section 7.7). Most of these children (N=18) were tested again three years later with
the SON-R 5½-17, the revision of the SON for older children. The mean age at the time of
administration of the SON-R 2½-7 was 5;5 years. The mean age at the time of administration of
the SON-R 5½-17 was 8;6 years. The correlation between the IQ scores was .66.
Table 9.7
Correlations with Criterion Tests in the Special Groups

                                          Scores                        Age          Interval
                                          criterion      SON-R 2½-7     (years)      (months)
Criterion Test                N     r     Mean (SD)      Mean (SD)      Mean (SD)    Mean (SD)
P-SON
  IQ-score                    188   .65    97.9 (16.4)    84.8 (18.2)   4.6 (0.7)    17.0 (12.6)
SON-R 5½-17
  Standard IQ                 18    .66   100.2 (20.4)    83.5 (14.5)   7.0 (0.8)    36.8 ( 7.0)
BOS 2-30
  Nonverbal scale             26    .50    95.5 (18.1)    84.1 (13.9)   3.5 (0.5)    35.4 (14.5)
STUTSMAN
  Total IQ                    42    .57   106.7 (21.6)    92.7 (18.7)   4.1 (0.6)    21.1 (15.3)
WPPSI-R
  Performance scale           19    .82   104.2 (15.6)    80.6 (13.2)   5.5 (0.8)     4.7 ( 2.8)
WPPSI
  Performance scale           20    .82   111.4 (12.6)   102.0 (11.2)   5.5 (0.5)    10.7 ( 5.8)
WPPSI
  Total IQ                    53    .60    87.9 (17.4)    83.0 (16.0)   5.5 (0.6)     8.2 ( 8.9)
  Verbal scale                      .49    87.7 (16.4)
  Performance scale                 .59    90.3 (18.8)
WISC-R
  Total IQ                    20    .62    85.9 (16.7)    82.1 (14.7)   6.6 (0.4)     2.4 ( 2.7)
  Verbal scale                      .47    91.8 (17.9)
  Performance scale                 .76    82.6 (14.9)
LDT
  Total IQ                    80    .58    85.0 (14.8)    81.6 (14.0)   5.9 (0.7)     8.4 ( 6.1)
RAKIT
  (Shortened version) IQ      40    .46    80.0 (16.6)    79.7 (15.0)   5.8 (0.6)     6.5 ( 5.0)
RAKIT
  Mean of 4 subtests          30    .64    14.9 ( 3.2)    93.8 (14.6)   5.9 (0.6)     8.5 ( 5.5)
REYNELL
  Language comp. A            179   .44     1.4 ( 1.2)    83.0 (17.5)   4.9 (1.0)     5.3 ( 6.2)
TvK
  Mean of 4 subtests          49    .53     3.4 ( 1.5)    86.1 (14.3)   5.9 (0.7)     3.5 ( 2.8)

the correlations have been corrected for the variance of the SON-IQ
the age is the mean age at the time of administration of the SON-R 2½-7 and the criterion test


BOS 2-30
The scores of 26 children on the nonverbal developmental index of the BOS 2-30 were known
(Bayley Scales of Infant Development; Van der Meulen & Smrkovsky, 1983, 1987). All the
children had a language/speech or hearing disorder. The mean age at the time of administration
of the BOS was 2;0 years. The administration of the SON-R 2,-7 took place between one and
five years later. The period between administrations was, on average, nearly three years. The
mean age at the time of administration of the SON-R 2,-7 was 4;11 years. The correlation
between the nonverbal developmental index of the BOS and the SON-IQ was .50.

STUTSMAN
The Stutsman Test (Stutsman, 1931) uses toys and utensils. The tasks to be performed are
different for each age group. The test was adapted for the Netherlands (Smulders, 1963).
However, the old American norms were maintained.
In this investigation, the test was mainly administered to deaf children and children with a
general developmental delay. The mean age at the time of administration of the Stutsman was
3;3 years and the mean age at the time of administration of the SON-R 2,-7 was 5;0 years. In 40
of the 42 cases the Stutsman was administered first. The correlation between the IQ scores on
the two tests was .57. The norms of the Stutsman are obsolete; the scores have a mean that is 14
points higher than the SON-IQ.

WPPSI-R
The performance scale of the WPPSI-R (Wechsler Preschool and Primary Scale of Intelligence
- Revised; Wechsler, 1989) was administered to 19 children at one institute for the deaf. This is
the institute that was not taken into account in the analysis of the results of deaf children,
because of an examiner effect on the administration of the SON-R 2,-7 (see section 7.7). At the
time the WPPSI-R was administered it had not been translated and standardized for the Netherlands. A translation done by the institute was used, and the directions for the performance
subtests were adapted for use with deaf children. The scores were based on American norms.
The SON-R 2,-7 was administered first to 8 children and the WPPSI-R was administered
first to 11 children. The mean age at the time of administration of the tests was 5;6 years. The
interval between the tests was, on average, 5 months. The correlation between the performance
IQ of the WPPSI-R and the SON-IQ was .82.

WPPSI
A Dutch manual of the WPPSI (Wechsler, 1967) in which the American norms are used, was
published in 1973 (Berger, Creuwels & Peters, 1973). In 1981 a Flemish adaptation of the test
was published with Flemish norms (Stinissen & Vander Steene, 1981). The test data for the
WPPSI do not always show clearly which directions and norms were used.
In the case of 20 deaf children the administration of the WPPSI was limited to the performance scale. The SON-R 2,-7 was administered first to six children and the WPPSI was administered first to 14 children. The mean age at the time of administration of the WPPSI was 5;3
years and of the SON-R 2,-7 5;8 years. The interval between the tests was, on average, 11
months. The correlation between the WPPSI PIQ and the SON-IQ was .82.
The WPPSI was administered in full to 53 children. These were nearly all children with a
developmental disorder. In 70% of the cases the WPPSI was administered first. The mean ages
at the time of administration of the WPPSI and the SON-R 2,-7 were 5;3 and 5;9 years
respectively. The interval between administration of the tests was, on average, 8 months. The
correlation with the total IQ of the WPPSI was .60. The correlations of the SON-IQ with the
verbal scale and the performance scale of the WPPSI were .49 and .59 respectively.

WISC-R
In the case of 20 children, scores were available on the WISC-R (Van Haasen et al., 1986), the
Dutch language version of the Wechsler Intelligence Scale for Children - Revised, (Wechsler,
1974), that has been standardized for the Netherlands. This test was administered mainly to
children with a general developmental delay or with pervasive developmental disorder and to a
few children with a speech or language disorder. The SON-R 2,-7 was administered first to 15
children. The mean ages at the time of administration of the SON-R 2,-7 and the WISC-R were
6;6 and 6;8 years respectively. The interval between administration was, on average, a little
more than two months.
The correlation with the WISC-R total IQ was .62. With the verbal scale the correlation was
.47 and with the performance scale .76. The mean score of the SON-IQ was more than 3 points
lower than the WISC-R total IQ. The score on the verbal scale of the WISC-R was 9 points
higher than the score on the performance scale; the mean score on the performance scale was
practically the same as the SON-IQ.
Correlations with the SON-IQ were also calculated for the combined data of the WPPSI, the
WPPSI-R and the WISC-R. As the norms differ, the mean scores of the tests were equated to 0
for each test combination. Subsequently the correlations were calculated for the combined
group. Using this procedure, the correlation of the SON-IQ with the performance scale of the
Wechsler tests could be calculated for 112 children; this was .69.
In the case of 73 children to whom the WPPSI and the WISC-R were administered in full, the
correlation with the total IQ was .62. The correlations of the SON-IQ with the verbal scale and
the performance scale for these children were .49 and .63 respectively.
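The pooling procedure described above can be sketched as follows. Within each test combination, both the SON-IQ and the criterion score are centered on their own group mean, so that differences in norms between the Wechsler versions do not affect the result; the centered scores are then combined and a single correlation is computed. The function names and the data are hypothetical, and the sketch leaves out any further correction the authors may have applied.

```python
import math
from statistics import mean


def center(values: list[float]) -> list[float]:
    """Subtract the group mean so that the group mean becomes 0."""
    m = mean(values)
    return [v - m for v in values]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two equally long lists."""
    dx, dy = center(x), center(y)
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    return sxy / math.sqrt(sxx * syy)


def pooled_centered_correlation(
        groups: list[tuple[list[float], list[float]]]) -> float:
    """Correlation over all groups after centering each group separately.

    Each group is a pair (son_iq_scores, criterion_scores) for one test
    combination.
    """
    xs: list[float] = []
    ys: list[float] = []
    for son_iq, criterion in groups:
        xs.extend(center(son_iq))
        ys.extend(center(criterion))
    return pearson(xs, ys)


# Hypothetical data: two criterion tests whose norms differ by 30 points.
group_a = ([80.0, 90.0, 100.0], [100.0, 110.0, 120.0])
group_b = ([80.0, 90.0, 100.0], [70.0, 80.0, 90.0])
pooled = pooled_centered_correlation([group_a, group_b])
naive = pearson(group_a[0] + group_b[0], group_a[1] + group_b[1])
print(round(pooled, 2), round(naive, 2))
```

In this constructed example the within-group relationship is perfect, so the centered pooled correlation is high, whereas simply stacking the raw scores gives a much lower value because the difference in norm level between the two criterion tests is treated as error.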

LDT
The LDT (Leiden Diagnostic Test; Schroots & Van Alphen de Veer, 1976) consists of eight subtests
which tap verbal and performance skills, and memory. The subtests are partially adapted subtests from other tests, including subtests of the WPPSI and the WISC. The test has been
standardized for the Netherlands.
The LDT was administered in full to 80 children, most of whom had a general developmental
delay or a speech or language disorder. In the case of 53 children, the LDT was administered
first. The mean ages at the time of administration of the LDT and the SON-R 2,-7 were 5;7 and
6;1 years respectively. The average interval between the tests was 8 months.
The correlation of the SON-IQ with the LDT IQ was .58. The correlation of the SON-IQ with
the mean score on three performance subtests (Block Patterns, Folding Papers and CopyTapping) was .67; the correlation with the mean score on two memory tests (Vocabulary Length
and Indicating Pictures) was .43 and the correlation with the mean score on three verbal tests
(Repeating Sentences, Questions about a Story and Comprehension and Insight) was .20. In the
case of children younger than 5;6 years at the time of administration of the LDT (N=38), the
correlation with the LDT IQ was .53; in children older than 5;6 it was .61. The correlation with
the performance tests of the LDT increased with age from .59 to .74.

RAKIT
The administration of the RAKIT (Revision of the Amsterdam Intelligence Test for Children;
Bleichrodt et al., 1984) takes so long that usually only a few subtests were administered. The
RAKIT was administered to all groups except the deaf children. In the case of 54% of the
children the SON-R 2,-7 was administered first.
The shortened version of the RAKIT was administered to 27 children and the test was
administered in full to 13 children. In this group of 40 children, the mean ages at the time of
administration of the RAKIT and the SON-R 2,-7 were 5;7 and 5;11 years respectively. The
period between the administrations was, on average, a good half year. The correlation of the
SON-IQ with the RAKIT IQ was .46; the mean scores were practically the same.
In the case of 30 other children, the administration of the RAKIT was limited to the first
four subtests (Figure Recognition, Exclusion, Memory and Word Meaning). The mean ages
at the time of administration of the RAKIT and the SON-R 2,-7 were 5;10 and 6;0 years
respectively. The period between administrations was, on average, a good 8 months. The
correlation between the SON-IQ and the mean standard score on the four subtests of the
RAKIT was .64.


REYNELL
In these research groups, the scores on the RDLS (Reynell Developmental Language Scales;
Reynell, 1977) relate to the Dutch translation by Bomers and Mugge (1985), which uses the old
English norms. In most cases only the subtest Language Comprehension A was administered.
The standardized scores have a mean of 0 and a standard deviation of 1.
The Reynell Test was administered to 179 children. The mean age at the time of administration of the Reynell was 4;10 years and the mean age at the time of administration of the SON-R
2,-7 was 5;0 years. The interval between tests was on average a little more than 5 months; in
52% of the cases the Reynell was administered first.
The correlation between the score on Language Comprehension and the SON-IQ was .44. In
the group of children with a general developmental delay or with a pervasive developmental
disorder (N=90), the correlation was .55; in the group of children with a speech or language
disorder, or with impaired hearing (N=89), the correlation was .35. A distinction was made in
both groups between the children who were younger than five years at the time of administration
of the Reynell and the older children. Among the children with general or pervasive
developmental problems, the correlation was .63 in the younger group (N=54) and .46 in the
older group (N=36). Among the children with speech or language disorders, or with
impaired hearing, the correlation was .24 in the younger group (N=44) and .49 in the older
group (N=45).

TvK
The scores of 49 children were known on at least three of the following four subtests of the TvK
(Language Tests for Children; Van Bon, 1982): Word-Form Production, Choice of Sentence
Structure, Choice of Vocabulary and Vocabulary Production.
The TvK was administered mainly to children with a speech or language disorder. Children
with a pervasive developmental disorder and hearing impaired children were also tested. In 84%
of the cases the TvK was administered after the SON-R 2,-7. The mean interval between the
tests was a little more than three months. The mean age at the time of administration of the
SON-R 2,-7 was 5;10 years and the mean age at the time of administration of the TvK was 6;0
years. The standardized scores on the TvK have a mean of 5 and a standard deviation of 2. The
correlation of the mean standard score on the subtests of the TvK with the SON-IQ was .53.

9.5 CORRELATION WITH THE WPPSI-R IN AUSTRALIA


Comparative research on the WPPSI-R (Wechsler, 1989) and the SON-R 2½-7 was carried out
in Victoria, Australia by Jo Jenkinson at Deakin University, in collaboration with Susan Roberts
of the Mental Health Research Institute, Shirley Dennehy of the Advisory Council for Children
with Impaired Hearing, and the University of Groningen (Brouwer, Koster & Veenstra, 1995;
Jenkinson, Roberts, Dennehy & Tellegen, 1996; Tellegen, 1997). The research was done with a
sample of 155 children (72 boys and 83 girls) with a mean age of 4;5 years (standard deviation
10 months). The sample consisted of children without specific problems or handicaps
(control group; N=59), children with impaired hearing, of whom 75% had a hearing loss of at
least 60 dB (N=59), and children with a developmental delay (N=37).
The SON-R 2½-7 and the WPPSI-R were administered in alternating order; the mean interval
between tests was 20 days. The SON-R 2½-7 was administered by Dutch students and the
WPPSI-R by a school psychologist who was, in most cases, associated with the institute where
the research was conducted. Only the performance subtests of the WPPSI-R were administered
to the hearing-impaired children and the children with a developmental delay. The entire
WPPSI-R was administered to the control group.
The mean scores, and the correlations of the SON-IQ with the performance scale (PIQ), the
verbal scale (VIQ) and the total score on the WPPSI-R (FSIQ), are presented in table 9.8.
American norms were used for the WPPSI-R and Dutch norms for the SON-R 2½-7. When
calculated over the total group, the correlation with the PIQ was .78; within the different groups
this correlation was .74 or .75. The correlation with the verbal IQ was clearly lower in the
control group (r=.54). The correlation with the full scale IQ of the WPPSI-R (r=.75) was slightly
higher in the control group than the correlation with the PIQ. On average, the scores on the
SON-R 2½-7 were five points lower than those on the PIQ.
The mean differences between the groups were very similar for the SON-IQ and the PIQ: the
difference between the hearing-disabled group and the control group was 13.3 for the SON-IQ
and 10.3 for the PIQ. For the group with a developmental delay the difference was 40.5 for the
SON-IQ and 38.5 for the PIQ.
Table 9.8
Correlations with the WPPSI-R in Australia

Mean and Standard Deviation
                  Entire          Control         Hearing         Developm.
                  group           group           impairment      delay
                  (N=155)         (N=59)          (N=59)          (N=37)
SON-IQ             94.2 (22.3)    108.9 (14.5)     95.6 (15.6)     68.4 (19.0)
WPPSI-R   PIQ      99.1 (21.8)    112.2 (13.5)    101.9 (17.2)     73.7 (17.5)
          VIQ                     109.1 (11.1)
          FSIQ                    112.2 (12.4)

Correlation with SON-IQ
                  Entire          Control         Hearing         Developm.
                  group           group           impairment      delay
WPPSI-R   PIQ      .78             .74             .74             .75
          VIQ                      .54
          FSIQ                     .75

the correlations have been corrected for the variance of the SON-IQ
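
The table note says the correlations were corrected for the variance of the SON-IQ, but the formula is not restated at this point. The sketch below assumes the usual correction for restriction or enlargement of range (Thorndike Case 2) and a reference standard deviation of 15; both are assumptions for illustration, not a statement of the authors' exact procedure. The function name and the input values are hypothetical.

```python
import math


def correct_for_variance(r, sd_sample, sd_reference=15.0):
    """Rescale a correlation to a reference SD (Thorndike Case 2 form).

    r            -- correlation observed in the sample
    sd_sample    -- SD of the SON-IQ in that sample
    sd_reference -- SD the correlation is rescaled to (assumed here: 15)
    """
    k = sd_reference / sd_sample
    return r * k / math.sqrt(1.0 - r * r + r * r * k * k)


# Hypothetical inputs, chosen only to show the direction of the adjustment.
more_varied = correct_for_variance(0.60, sd_sample=20.0)  # sample SD above 15
less_varied = correct_for_variance(0.60, sd_sample=10.0)  # sample SD below 15
print(round(more_varied, 2), round(less_varied, 2))
```

Under this assumption, a correlation obtained in a sample that is more heterogeneous than the norm population is adjusted downwards, and one obtained in a more homogeneous sample is adjusted upwards.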

9.6 CORRELATION WITH COGNITIVE TESTS IN WEST VIRGINIA, USA

In the USA, research on the relationship between the SON-R 2½-7 and several cognitive tests has
been done by Stephen O'Keefe at the West Virginia Graduate College. Students from the University of Groningen, who administered the SON-R 2½-7 to some of the children, participated in this
research (Ten Horn, 1996). The following tests were used: the WPPSI-R (Wechsler, 1989), the
K-ABC (Kaufman & Kaufman, 1983), the MSCA (McCarthy Scales of Children's Abilities;
McCarthy, 1972), the PPVT-R (Peabody Picture Vocabulary Test-Revised; Dunn & Dunn,
1981) and the PLS-3 (Preschool Language Scale-3; Zimmerman, Steiner & Pond, 1992).
Most of the children were between four and five years of age, were tested at school in
different places in West Virginia, and had no specific handicaps. Nearly all children were
Caucasian. The distribution according to sex and age at which the SON-R 2½-7 was administered is presented per test in table 9.9. In most cases, administration of the criterion test and the
SON-R 2½-7 took place shortly after each other, sometimes on the same day. The criterion test
was usually administered first. The PPVT-R was not administered within the framework of this
research; previously acquired data were made available by the school.
In table 9.10, the mean scores on the tests and the correlations with the SON-IQ are presented. American norms were used for the American tests and Dutch norms for the SON-R 2½-7.
The results are described per test.

WPPSI-R
The WPPSI-R was administered to 75 children whose mean age at the time the SON-R 2½-7
was administered was 5;1 years. The correlation of the SON-IQ with the total IQ (FSIQ) of the
WPPSI-R was .59; the correlations with the performance and verbal scales were .60 and .43
respectively. The mean score on the SON-IQ was more than two points lower than the FSIQ and
nearly four points lower than the PIQ.

K-ABC
The original American edition of the Kaufman Assessment Battery for Children differs in
several respects from the Dutch edition for young children (the GOS 2½-4½). The simultaneous
scale of the K-ABC consists of seven parts, and the sequential scale of six parts. However, the
number of subtests administered depends on age. The subtests of the simultaneous and sequential scales form the mental scale. A number of subtests of the mental scale, in which no verbal
abilities are required, form the nonverbal scale. The mean of all scale scores is 100 and the
standard deviation is 15.
The mean age of the 31 children to whom the K-ABC was administered was 4;7 years. The
SON-IQ had the highest correlation with the total Mental Score of the K-ABC, r=.66. The
correlation with the Sequential Scale (r=.29) was considerably lower than the correlation with
the Simultaneous Scale (r=.58). This corresponds with the results of Dutch research with the
GOS 2½-4½, but with the K-ABC, as with the GOS, the distribution of scores on the Sequential
Scale was considerably narrower than the distribution of scores on the Simultaneous Scale. The
correlation with the Achievement Scale was .58 and the correlation with the Nonverbal Score of
the mental scale was .61.

MSCA
The McCarthy Scales of Children's Abilities, published in the Netherlands as the MOS 2½-8½
(Van der Meulen & Smrkovsky, 1986), consists of eighteen subtests. The administration was limited to the subtests of the Verbal Scale, the Perceptual Performance Scale and the Quantitative
Scale, which, together, form the General Cognitive Index. The scale scores have a mean of 50 and
a standard deviation of 10. The general index has a mean of 100 and a standard deviation of 16.
The test was administered to 26 children with a mean age of 4;7 years. The correlation with
the General Cognitive Index of the MSCA was .61. The highest correlation among the scales was
with the Perceptual Performance Scale (r=.61). The correlation with the Verbal Scale was .48 and
the correlation with the Quantitative Scale was .40.

PPVT-R
The Peabody Picture Vocabulary Test requires the child to choose from four pictures the one that
best represents the meaning of a word that has been presented verbally. The standard score on
the test has a mean of 100 and a standard deviation of 15.
The PPVT-R scores of 29 children to whom the SON-R 2½-7 was administered were available
from the school. The mean age at the time of administration of the SON-R 2½-7 was 5;6 years.
The correlation of the Peabody Standard Score with the SON-IQ was .47.
Table 9.9
Age and Sex Distribution of the Children in the American Validation Research

                          Sex
Criterion Test     N      Boys    Girls
WPPSI-R            75     38      37
K-ABC              31     16      15
MSCA               26     12      14
PPVT-R             29     15      14
PLS-3              47     26      21

Age at time of admin. of SON-R


3 years 4 years 5 years 6 years
1

28
31
24
3
47

45

1
25

Table 9.10
Correlations with Criterion Tests in the American Research

                                                Scores
                                          Criterion       SON-R 2½-7     Age       Interv.
                                                                         (years)   (days)
Criterion Test                  N    r    Mean (SD)       Mean (SD)      Mean      Mean
WPPSI-R                         75                         94.5 (16.6)   5.1       14
  Full Scale IQ                      .59   96.8 (13.9)
  Performance IQ                     .60   98.3 (14.9)
  Verbal IQ                          .43   96.1 (13.0)
K-ABC                           31                         86.1 (20.9)   4.6       16
  Mental Processing Composite        .66   97.3 (16.0)
  Simultaneous Processing            .58   96.2 (19.3)
  Sequential Processing              .29   98.3 (13.3)
  Achievement Scale                  .58   96.0 (13.9)
  Nonverbal Scale                    .61   96.5 (15.5)
MSCA                            26                         95.0 (19.1)   4.6       13
  General Cognitive Index            .61  102.3 (19.3)
  Verbal Scale                       .48   50.8 (13.3)
  Perceptual-Perform. Scale          .61   52.2 (10.2)
  Quantitative Scale                 .40   49.9 (10.9)
PPVT-R                          29                         95.5 (15.3)   5.5
  Standard Score Equivalent          .47   95.7 (19.7)
PLS-3                           47                         91.4 (18.3)   4.6
  Total Language Score               .61  102.7 (19.8)
  Auditory Comprehension             .59  103.6 (18.8)
  Expressive Communication           .56  101.3 (18.6)

the age is the age at the time of administration of the SON-R 2½-7
the correlations have been corrected for the variance of the SON-IQ

PLS-3
The Preschool Language Scale-3 is a test for the receptive and expressive language ability of
young children. Separate scores are calculated for Auditory Comprehension and Expressive
Communication. Together they form the Total Language Ability Score. The three standardized
scores have a mean of 100 and a standard deviation of 15.
The test was administered to 47 children. The mean age at the time of administration of the
SON-R 2½-7 was 4;7 years. The correlation with the Total Score of the PLS-3 was .61. The
correlation with the Receptive Language Ability (r=.59) was slightly higher than the correlation
with the Expressive Language Ability (r=.56).

9.7 CORRELATION WITH THE BAS IN GREAT BRITAIN


A comparative research project on the BAS (British Ability Scales; Elliott, Murray & Pearson,
1979-82) and the SON-R 2½-7 was set up by Julie Dockrell of the Institute of Education,
University of London. The research was carried out in collaboration with the University of
Groningen. English students administered the BAS and Dutch students the SON-R 2½-7. The
tests were administered in alternating order to 58 children from the first class of different primary
schools, with the interval between tests varying from a few days to a few weeks. The first class
corresponds to group three in Dutch primary education. The mean age was 6;3 years with a
standard deviation of 3 months. The group consisted of 34 boys and 24 girls. The schools
selected children belonging to one of the following three groups: the control group (children
without specific problems or handicaps, N=20); the ESL group (English as a Second
Language, N=22) and the LD group (Learning Disabled, N=16).
The shortened version of the BAS was administered. This consists of four subtests (Naming
Vocabulary, Digit Recall, Similarities and Matrices), and was supplemented by two nonverbal
subtests (Block Design and Visual Recognition). In addition to the IQ score for the shortened
version and for the combination of six subtests, the mean scores for the three verbal tests (Naming
Vocabulary, Digit Recall and Similarities) and for the three nonverbal tests (Matrices, Block
Design and Visual Recognition) were also calculated. The IQ scores have a mean of 100 and a
standard deviation of 15; the subtest scores have a mean of 50 and a standard deviation of 10.
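
The criterion tests in this chapter report scores on different scales (for example a mean of 50 with a standard deviation of 10 for subtest scores, or a mean of 100 with a standard deviation of 16 for the General Cognitive Index of the MSCA). As a minimal sketch, a score can be re-expressed on the IQ scale through its z-score. The function name is hypothetical, and the linear conversion assumes that the norm populations behind the two scales are comparable, which the manual does not claim.

```python
def convert_score(score, from_mean, from_sd, to_mean=100.0, to_sd=15.0):
    """Re-express a standard score on another scale via its z-score."""
    z = (score - from_mean) / from_sd
    return to_mean + z * to_sd


# One standard deviation above the mean on three of the scales in this chapter:
print(convert_score(60, from_mean=50, from_sd=10))    # subtest score (50/10)
print(convert_score(116, from_mean=100, from_sd=16))  # general index (100/16)
print(convert_score(7, from_mean=5, from_sd=2))       # TvK standard score (5/2)
```

All three calls describe the same relative position, so each yields the same value on the IQ scale.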
In table 9.11, the mean scores for the entire group and for the different subgroups are
presented, together with the correlations of the scores on the BAS with the SON-IQ. The
correlation with the shortened version of the BAS was .80 in the entire group. When the two
nonverbal subtests were added to the shortened version of the BAS, the correlation increased to
.87. The correlation with the three nonverbal tests (r=.78) was higher than with the three verbal
tests (r=.71), but even the latter was high.
Within the three subgroups the correlations of the SON-IQ with the BAS IQ, based on six
subtests, and with the nonverbal tests, had comparably high values. In the control group, however, the correlations of the SON-IQ with the shortened version of the BAS, and with the three
verbal subtests, were clearly lower than in the other groups.
In the entire group, the IQ scores on the SON-R 2½-7 were, on average, 7 points lower than
on the shortened version of the BAS. The difference in IQ scores between the control group and
the ESL group was slightly less for the SON-R 2½-7 (20.8 points) than for the shortened form of
the BAS (23.9 points). When the two nonverbal tests were added to the BAS IQ, the difference
on the BAS between the two groups decreased to 19.5 points. The difference between the
control group and the LD group was 40.8 points for the SON-IQ and 42.8 points for the
shortened BAS. For the BAS IQ based on six subtests, the difference was 41.3 points.

Table 9.11
Correlations with the BAS in Great Britain

Mean and Standard Deviation
                             Entire         Control        English        Learning
                             group          group          2nd language   problems
                             (N=58)         (N=20)         (N=22)         (N=16)
SON-IQ                       83.6 (20.4)    102.7 (14.4)   81.9 (11.2)    61.9 (11.9)
BAS IQ (Shortened vers.)     90.6 (20.0)    111.5 (10.0)   87.6 (11.7)    68.7 ( 9.8)
BAS IQ (6 Subtests)          92.4 (18.8)    111.2 ( 8.7)   91.7 (10.8)    69.9 ( 8.8)
Mean of 3 verbal tests       44.1 ( 9.4)     54.3 ( 5.6)   41.4 ( 5.6)    35.0 ( 4.2)
Mean of 3 nonverbal tests    49.2 ( 9.8)     56.4 ( 7.0)   51.1 ( 6.9)    37.6 ( 4.8)

Correlation with the SON-IQ
                             Entire         Control        English        Learning
                             group          group          2nd language   problems
BAS IQ (Shortened vers.)     .80            .56            .76            .78
BAS IQ (6 Subtests)          .87            .83            .85            .87
Mean of 3 verbal tests       .71            .35            .60            .73
Mean of 3 nonverbal tests    .78            .69            .81            .81

the correlations have been corrected for the variance of the SON-IQ

9.8 OVERVIEW OF THE CORRELATIONS WITH THE CRITERION TESTS

In table 9.12 an overview is presented of the correlations in the various research projects of the
intelligence and (language) development tests with the SON-IQ. A distinction has been made
between:
- general intelligence measures, based on verbal as well as performance test sections,
- nonverbal measures, such as scores for performance intelligence, visual perception and nonverbal memory,
- verbal measures, such as the verbal section of the intelligence tests and more specific measures for verbal development and skills.
The 12 correlations with general intelligence measures varied from .54 to .87. The mean of the
correlations was .65. Half of the correlations ranged from .59 to .70. Two correlations with the
total IQ of the WPPSI-R (r=.75) and the total IQ of the WISC-R (r=.74), and the correlation with
the sum of the six subtests of the BAS (r=.87) were higher than .70.
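
The summary figures for the general intelligence measures can be checked directly against table 9.12. The short sketch below uses the twelve correlations as read from the "General" column of that table; the variable names are illustrative only.

```python
from statistics import mean

# Correlations of the SON-IQ with general intelligence measures (table 9.12):
# BOS, GOS, K-ABC, WPPSI/WISC-R, WISC-R, WPPSI-R (AU), WPPSI-R (US),
# MSCA, BAS (6 subtests), LDT, RAKIT (two studies).
general_r = [.59, .65, .66, .62, .74, .75, .59, .61, .87, .58, .60, .54]

print(len(general_r))            # number of correlations
print(min(general_r))            # lowest correlation
print(max(general_r))            # highest correlation
print(round(mean(general_r), 2)) # mean correlation
```

Note that this is a plain arithmetic mean of the coefficients, which is what the reported value of .65 corresponds to; no Fisher z transformation is applied.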
The 21 correlations with nonverbal (intelligence) measures ranged from .45 to .83 and had a
mean of .65. Fifty percent of the correlations ranged from .59 to .75. The correlations that were
higher than .75 refer, in two cases, to the performance IQ of the WPPSI-R (.77 and .83), to the
correlation with three performance subtests of the BAS (.78), to the correlation with the SON-R
5½-17 (.76) and to the retest with the SON-R 2½-7 (.79). Relatively low correlations were
found with the nonverbal version of the BOS (.50 and .53), with the Stutsman (r=.57), the
TONI-2 (r=.51) and the nonverbal memory of the TOMAL (r=.45).
The 19 correlations with measures for verbal development and verbal intelligence ranged
from .20 to .71 with a mean of .48. Half of the correlations fell between .45 and .54. The verbal
sections of intelligence tests (WISC-R, BAS) as well as specific language tests (TvK, PLS-3)
had relatively strong correlations.
In a few studies the correlation with the total score on the criterion test could be compared
with the correlation on the performance and the verbal scales (as in the case of the WPPSI-R, the
WISC-R, the MSCA, the BAS and the LDT). In all these cases the correlation with the performance scale of the criterion test was clearly stronger than with the verbal scale. In the case of the
WPPSI-R, the WISC-R and the MSCA the correlation with the total score was almost as high as
the correlation with the performance scale of these tests. This corresponds to the findings with
the SON-R 5½-17 (Tellegen, 1993). In the case of the BAS the correlation with the total score
was stronger than with the performance section as a result of the strong correlation with the
verbal section. The correlation with the total score in the case of the LDT, on the other hand,
decreased because of the very weak correlation with the verbal section. The two tests for which
a nonverbal score could be calculated by leaving out part of the test (BOS 2-30 and K-ABC) had
weaker correlations with the SON-IQ.
The correlations that were obtained support the convergent and divergent validity of the SON-R
2½-7. The correlations with general intelligence tests and nonverbal cognitive tests were
reasonably strong, whereas the correlations with verbal and memory tests were clearly weaker.
However, the level of the correlations with other intelligence measures was considerably
lower than the reliability of the test. This means that important differences may be found
between the scores on the SON-R 2½-7 and other intelligence tests. To a great extent this was
the result of the young age at which the children were tested and the occasionally very long
interval between the administrations of the tests, as well as of differences in the composition of
the tests and in the manner of administration between the SON-R 2½-7 and the criterion tests. In
general, performance on intelligence tests tends to be less stable as the age at which the children
are tested decreases and as the period between the test administrations increases (Bayley, 1949).
As described in sections 9.1 and 9.4, on the basis of various analyses, the correlation of the
SON-IQ with the criterion tests increased greatly as the age at which the tests were administered
increased. The facts that part of the research was done with children who were difficult to test
and that a shortened version of the criterion tests was often administered are also factors
contributing to the weakening of the correlations.

Table 9.12
Overview of the Correlations with the Criterion Tests

                                                      Intelligence/Development
Test              Country  Group             N      General    Nonverbal   Verbal      sec.
P-SON             NL       special groups    188               .65 IQ                  9.4
SON-R 2½-7        NL       stand.research    141               .79 IQ                  9.1
SON-R 5½-17       NL       special groups     18               .66 IQ                  9.4
SON-R 5½-17       NL       stand.research    119               .76 IQ                  9.1
Stutsman          NL       special groups     42               .57 IQ                  9.4
TONI-2            NL       prim.education    153               .51 IQ                  9.2
BOS 2-30          NL       stand.research     50    .59 MS     .53 Nonv.               9.1
BOS 2-30          NL       special groups     26               .50 Nonv.               9.4
K-ABC (GOS)       NL       stand.research    115    .65 GCI                            9.1
K-ABC             US       prim.education     31    .66 GCI    .61 Nonv.               9.6
WPPSI/WPPSI-R     NL       special groups     39               .83 PIQ                 9.4
WPPSI/WISC-R      NL       special groups     73    .62 FSIQ   .63 PIQ     .49 VIQ     9.4
WISC-R            NL       OVB-schools        41    .74 FSIQ   .73 PIQ     .60 VIQ     9.3
WPPSI-R           AU       special groups     96               .77 PIQ                 9.5
WPPSI-R           AU       prim.education     59    .75 FSIQ   .74 PIQ     .54 VIQ     9.5
WPPSI-R           US       prim.education     75    .59 FSIQ   .60 PIQ     .43 VIQ     9.6
MSCA              US       prim.education     26    .61        .61         .48         9.6
BAS (shortened)   GB       mixed group        58    .87 (6s)   .78 (3s)    .71 (3s)    9.7
LDT               NL       OVB-schools        71               .60 BP      .49 (3s)    9.3
LDT               NL       special groups     80    .58 IQ     .67 (3s)    .20 (3s)    9.4
RAKIT (short)     NL       stand.research    165    .60                                9.1
RAKIT (short)     NL       special groups     70    .54                                9.4
RAKIT             NL       OVB-schools        71                           .51 (4s)    9.3
DTVP-2            NL       prim.education    153               .73 GVP                 9.2
TOMAL             NL       prim.education    153               .45 NMI                 9.2
PPVT-R            US       prim.education     29                           .47         9.6
PLS-3             US       prim.education     47                           .61         9.6
TvK               NL       stand.research    108                           .59 (5s)    9.1
TvK               NL       special groups     49                           .53 (4s)    9.4
Reynell (old)     NL       special groups    179                           .44 LC      9.4
Reynell (new)     NL       stand.research    558                           .48 LC      9.1
Schlichting       NL       stand.research    558                           .35 SD      9.1
Schlichting       NL       stand.research    558                           .45 WD      9.1
Schlichting       NL       stand.research     56                           .54 Lex     9.1
Schlichting       NL       stand.research    241                           .27 AM      9.1

the correlations have been corrected for the variance of the SON-IQ
(3s) signifies score based on 3 subtests
NL (Netherlands); GB (Great Britain); US (United States of America); AU (Australia)
sec: the section in which the research has been described

Table 9.13
Difference in Scores between SON-IQ and PIQ of the WPPSI-R (N=230)

Frequency Distribution of the Absolute Difference in Scores
                    0-9     10-19    20-29    30-39    40-49    50-59
No correction       54%     34%      10%      0.4%     0.9%     0.4%
Correction mean     66%     25%       7%      0.4%     1.3%     0%

Children with Difference Score > 30
     Sex     Age     Country    SON-IQ    PIQ
A    boy     4;4     Aust.       79       130
B    girl    4;8     US          68       110
C    girl    4;11    US         124        86
D    girl    5;1     US         113        71

In order to illustrate the occasionally very large discrepancies between the scores on the SON-R
2½-7 and tests that are closely similar in content, we shall make a further comparison of the
differences in scores between the PIQ of the WPPSI-R and the SON-IQ. The comparison is
based on the results of the 155 children who were tested in Australia and the 75 children who
were tested in West Virginia with the WPPSI-R. In these research projects the interval between
the two test administrations was generally limited to a few weeks. In table 9.13 the frequency
distribution of the absolute differences between the PIQ and the SON-IQ is presented. These
differences were also calculated after first correcting for the difference in means, so that possible
discrepancies in standardization of the tests do not play a role; five points were deducted from
the PIQ for this.
After correcting for the means, the differences in scores for two thirds of the children were
slight (less than 10 points). For a quarter of the children, the differences ranged from 10 to 19
points. In the case of 9% of the children the differences were 20 points or more and for four
of these children the differences were quite extreme, i.e. 30 points or more. The scores of these
four children are presented in the second part of table 9.13.
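
The procedure behind table 9.13 can be sketched with the four extreme cases listed in its second part: take the absolute difference between SON-IQ and PIQ, optionally after deducting five points from the PIQ, and assign it to a 10-point class. The scores are those given in the table; the function and variable names are illustrative assumptions.

```python
from collections import Counter

# (SON-IQ, PIQ) of the four children in the second part of table 9.13
children = {"A": (79, 130), "B": (68, 110), "C": (124, 86), "D": (113, 71)}
MEAN_CORRECTION = 5  # points deducted from the PIQ, as described in the text


def abs_difference(son_iq, piq, correction=0):
    """Absolute difference between SON-IQ and the (optionally corrected) PIQ."""
    return abs(son_iq - (piq - correction))


def class_label(diff, width=10):
    """Label of the 10-point class a difference falls in, e.g. '40-49'."""
    lower = (diff // width) * width
    return f"{lower}-{lower + width - 1}"


uncorrected = Counter(class_label(abs_difference(s, p)) for s, p in children.values())
corrected = Counter(
    class_label(abs_difference(s, p, MEAN_CORRECTION)) for s, p in children.values()
)
print(dict(uncorrected))
print(dict(corrected))
```

With N=230, one child corresponds to roughly 0.4% and three children to roughly 1.3%, which is how these four cases appear in the upper classes of the frequency distribution.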
Two children, a boy with impaired hearing tested in Australia and a girl tested in West
Virginia, scored substantially lower on the SON-R 2½-7 than on the performance section of the
WPPSI-R. In the latter case the performance on the SON-R 2½-7 was possibly influenced
negatively by the fact that the child had been tested on the WPPSI-R earlier that day. Two girls,
both tested in West Virginia, scored substantially higher on the SON-R 2½-7 than on the
WPPSI-R. Neither of these girls functioned well socially and both were difficult to test.
These extreme cases, in which a child performed far below his or her potential on one of the
tests, had a strong negative influence on the correlations. In the Australian research the correlation increased from .78 to .80 if the deviating subject was left out of the calculation. In the
American research the correlation increased from .60 to .74 if the three children with deviating
scores were left out.
The examples show that extreme underperformance can occur with the SON-R 2½-7 as well
as with the WPPSI-R. This certainly also applies to other intelligence tests. In chapter 10 the
significance of this for diagnostic work with young children is examined.

9.9 DIFFERENCE IN CORRELATIONS BETWEEN THE PERFORMANCE SCALE AND THE REASONING SCALE

In order to evaluate the validity of the distinction between the Performance Scale (SON-PS) and
the Reasoning Scale (SON-RS) of the SON-R 2½-7, we examined whether consistent differences were found in the strength of the correlation of these scales with the other tests. The
comparison was limited to samples of at least 50 persons. As we are comparing correlations of
two scales within the same research group and not correlations between different research
groups, the correlations were not corrected for the variance of the test scores.
Table 9.14 presents the correlations of the Performance and the Reasoning Scales with the
criterion tests for which a difference of .10 or more was found between the correlations with the
total score, or a subscale. These correlations are printed in bold type. The country where the
research was carried out, the specific group of children, and the section in which the research is
described, are also shown in the table. The correlations with the LDT at the OVB-schools,
where the LDT was administered twice, were calculated as the mean of both correlations. In the
Australian research with the WPPSI-R, the children with impaired hearing were combined with
the children with learning problems when calculating the correlations.

Table 9.14
Correlations of the Performance Scale and the Reasoning Scale with Criterion Tests, for Cases
in which the Difference Between Correlations Was Greater Than .10 (printed in bold)

                                            r
Criterion Test                      N     PS     RS     Diff.   Country   Group            sec.
WPPSI/     FSIQ                     73    .54    .58    .04     NL        special groups   9.4
WISC-R     PIQ                            .59    .53    .06
           VIQ                            .39    .50    .11
WPPSI-R    FSIQ                     75    .59    .53    .06     US        prim.education   9.6
           PIQ                            .64    .51    .13
           VIQ                            .42    .42    .00
WPPSI-R    FSIQ                     59    .67    .62    .05     AU        control group    9.5
           PIQ                            .77    .53    .24
           VIQ                            .38    .53    .15
WPPSI-R    PIQ                      96    .90    .74    .16     AU        special groups   9.5
BAS        IQ Shortened vers.       58    .77    .86    .09     GB        entire group     9.7
           3 Nonverbal subt.              .82    .79    .03
           3 Verbal subtests              .68    .82    .14
LDT        Total IQ                 80    .47    .56    .09     NL        special groups   9.4
           3 Performance subt.            .66    .49    .17
           2 Memory tests                 .30    .47    .17
           3 Verbal subtests              .07    .32    .25
LDT        Block Patterns           71    .62    .45    .17     NL        OVB schools      9.3
           3 Verbal subtests              .39    .47    .08
RAKIT      IQ Shortened vers.      165    .58    .46    .12     NL        norm group       9.1
RAKIT      IQ Shortened vers.       70    .42    .52    .10     NL        special groups   9.4
DTVP-2     GVP Total score         153    .78    .52    .26     NL        prim.education   9.2
Reynell    Language compreh.       179    .36    .54    .18     NL        special groups   9.4

codes for the countries: NL (The Netherlands); GB (Great Britain); US (United States of America);
AU (Australia)

The Performance Scale of the SON-R 2½-7 clearly had a stronger correlation than the Reasoning Scale with:
- the performance scale of the Wechsler tests,
- the performance subtests of the LDT,
- the DTVP-2, the test for visual perception.
The Reasoning Scale of the SON-R 2½-7 clearly had a stronger correlation than the Performance Scale with:
- the verbal scale of the Wechsler tests,
- the verbal subtests of the BAS,
- the verbal subtests and the memory tests of the LDT,
- the Reynell Test for Language Comprehension.
The results in two research projects with the shortened version of the RAKIT were contradictory. In the case of the DTVP-2, the large difference in correlations was caused mainly by the
subtests of the scale for Visual Motor Integration; the difference here was .34. The difference
between the correlations with the scale for Motor Reduced Visual Perception was .14. In the
standardization research the difference in correlations with the Reynell Test for Language
Comprehension and the Schlichting Test for Language Production was slight. The difference in
two research projects with the TvK had a mean of -.08.
The results support the distinction that was made on the basis of the analysis of the internal
structure of the test (section 5.4). They indicate that two aspects of general intelligence are
represented in the SON-R 2½-7: on the one hand the performance perceptual tasks, related to
spatial understanding and visual motor skills, and on the other hand the tasks that require
abstract and concrete reasoning. These latter tasks have a stronger relationship with verbal
intelligence and language skills. Because of this, the SON-R 2½-7 is more versatile than a
nonverbal intelligence test that is limited to specific performance tasks.

9.10 DIFFERENCES IN MEAN SCORES ON THE TESTS


An important factor in interpreting and comparing test scores is the degree of comparability of
the norms of the different tests. This can be particularly problematic if a test is used in a country
that is not the country where the standardization was done. However, norms that are obtained in
the same country are also not always easily comparable. This could be because the norm
populations are defined differently (including/excluding children from special education; including/excluding immigrant children). It could also be the manner in which the scores of
children who complete none or only a part of the test are handled in the standardization. Also,
floor and ceiling effects that often occur in the extreme age ranges can complicate making
comparisons.
The size of the norm group influences the accuracy of the norms. Further, the model that is
chosen for the transformation of the raw scores into scaled scores determines the norms. If
norms are presented for a relatively wide age range, systematic distortions occur for the children
whose age does not correspond to the middle of the range. Furthermore, the question arises
whether the norms are applicable when only part of a test was administered (for instance a
shortened version, or only the performance sections). In this case, the manner of administration
differs from the manner in which the norm data were gathered.
An important and very difficult problem for diagnostics is the obsolescence of test norms. The
performance on intelligence tests by children of the same age increases by about 3 points every
10 years in Western countries (Lynn & Hampson, 1986; Flynn, 1987). However, this can differ
from country to country and from test to test. As a result of a general improvement in performance, the norms of a newly standardized test will be stricter, and the scores will be lower than
on tests standardized some time ago. A similar effect was observed in the Netherlands during the
revision of the WISC-R (Harinck & Schoorl, 1987) and the SON-R 5½-17 (Snijders, Tellegen &
Laros, 1989). In an American comparison of the WISC-III with the WISC-R (Wechsler, 1991), and
of the WPPSI-R with the WPPSI (Wechsler, 1989), the increase for the FSIQ was 3.4 points per 10
years (averaged over both tests); for the PIQ this was 4.3 points and for the VIQ 1.9 points.
In table 9.15 the mean scores on the SON-R 2½-7 and the most important criterion tests are
presented. When possible, the results of different research groups were combined (the section in
which the research is described is referred to in the table). Neither criterion tests that were
administered to fewer than 50 children, nor specific verbal tests are shown in this table. Furthermore, a distinction was made between criterion tests that were scored according to Dutch,
American and English norms.
The differences between the mean scores were also corrected for the interval between the
publication of the criterion test and the publication of the SON-R 2½-7 (1996). Unfortunately,
most test manuals do not give any information about the period in which the norm data were
gathered. If the interval between gathering the norm data and the publication of the test was
known to be much longer than three years, this was taken into account (in the case of the GOS
the interval was six years). In the absence of reliable data about the obsolescence of the norms in
relation to country and test, the strength of the correction was based on the aforementioned
American results of the WPPSI-R and the WISC-III. For each year between the publication of
the criterion test and the SON-R 2½-7, .34 point was deducted from the mean scores for general
intelligence measures and .43 point for performance and nonverbal measures.

Table 9.15
Comparison Between the Mean Test Scores of the SON-R 2½-7 and the Criterion Tests

                                       Mean                          Difference
Criterion Test                  N    SON-R   Crit.  Year Type     without     with   Section
                                                                      correction

Dutch Norms
P-SON    Total IQ             188     84.9    97.9    75    p       -13.0     -4.0   [9.4]
BOS      Mental Scale          50    103.0   100.5    83    g         2.4      6.8   [9.1]
         Nonverbal Scale       76     96.5    97.5    83    p        -1.0      4.6   [9.1/9.4]
GOS      Gen. Cogn. Index     115    102.9   104.4    93    g        -1.5       .5   [9.1]
RAKIT    Short. version IQ    205     98.0    97.9    84    g         0.0      4.1   [9.1/9.4]
LDT      Total IQ              80     81.6    85.0    76    g        -3.3      3.5   [9.4]
         Block patterns        71     92.3    97.8    76    p        -5.5      3.1   [9.3]
WISC-R   Total IQ              61     88.9    89.0    86    g         -.1      3.3   [9.3/9.4]
         Performance IQ        61     88.9    88.6    86    p          .3      4.6   [9.3/9.4]

American Norms
TONI-2   IQ Form A            153    102.4   103.5    90    p        -1.2      1.4   [9.2]
DTVP-2   Gen. Vis. Perc.      153    102.4   109.2    93    p        -6.8     -5.5   [9.2]
TOMAL    Nonv. Mem. Ind.      153    102.4    97.5    94    p         4.9      5.7   [9.2]
WPPSI-R  Total IQ             134    100.8   103.6    89    g        -2.7      -.3   [9.5/9.6]
         Performance IQ       230     94.3    98.8    89    p        -4.6     -1.6   [9.5/9.6]

English Norms
BAS      Short. version IQ     58     83.6    90.6    79    g        -7.1     -1.3   [9.7]

year: year of publication of the manual
type: g = general, p = performance/nonverbal

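
The correction described above can be illustrated with a short calculation. The following is a minimal sketch and not part of the original analysis: the function name, its arguments and the extra_years parameter are hypothetical. It applies the stated yearly rates (.34 point for general measures, .43 point for performance and nonverbal measures) to the difference between the mean SON-R 2½-7 score and the mean criterion score, counting the years between the publication of the criterion test and 1996.

```python
# Yearly norm-obsolescence rates quoted in the text (IQ points per year).
RATE_PER_YEAR = {"g": 0.34, "p": 0.43}  # g = general, p = performance/nonverbal

SON_R_YEAR = 1996  # publication year of the SON-R 2 1/2-7


def corrected_difference(son_mean, crit_mean, crit_year, test_type, extra_years=0):
    """Return (uncorrected, corrected) difference: SON-R mean minus criterion mean.

    The criterion mean is lowered by the yearly rate for every year between the
    publication of the criterion test and 1996, which raises the difference.
    extra_years is a hypothetical parameter for norm data known to be older
    than the usual three years at the time of publication.
    """
    years = SON_R_YEAR - crit_year + extra_years
    uncorrected = son_mean - crit_mean
    corrected = uncorrected + RATE_PER_YEAR[test_type] * years
    return round(uncorrected, 1), round(corrected, 1)


# Preschool SON (manual published in 1975, performance type), means from table 9.15.
print(corrected_difference(84.9, 97.9, 1975, "p"))

# GOS (1993, general type); norm data six instead of three years old.
print(corrected_difference(102.9, 104.4, 1993, "g", extra_years=3))
```

For the Preschool SON this reproduces, up to rounding, the differences of about 13 and 4 points in favour of the criterion test that are reported in table 9.15. Small deviations of .1 from other tabled values can occur because the table was presumably computed from unrounded means.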
When no correction was performed, the differences in means between the SON-R 2½-7 and
the criterion tests were slight for the tests that were standardized in the Netherlands after 1980.
The scores on the SON-R 2½-7, however, were considerably lower than scores on the Preschool
SON (published in 1975) and also clearly lower than the scores on the LDT (1976). After
correction for the year of publication, the scores on the SON-R 2½-7 were, in general, 3 to 4
points higher than scores on the other tests that were standardized in the Netherlands. However,
even after correction, the scores on the SON-R 2½-7 were 4 points lower than the scores on the
Preschool SON. This supports the impression gained from practical experience that the norms
of the Preschool SON were much too easy. The reason for the relatively large difference with
the mean scores on the BOS may be the fact that both tests were administered at two years of
age. A ceiling effect occurs on the BOS at this age and a floor effect occurs on the SON-R
2½-7.
The fact that the scores on the SON-R 2½-7, after correction, were generally higher than on
the other tests could mean that the increase in the intelligence scores of Dutch children in the
last ten to fifteen years is less than we have assumed on the basis of the American data. It could
also mean that a number of children from the special groups and the immigrant group, with
whom part of this research was carried out, profited more from the specific characteristics of the
SON-R 2½-7, such as the nonverbal character and the feedback. When comparing the SON-R
2½-7 with the most recently standardized test, the GOS, which was administered to 115
children during the standardization research, little difference was found in the mean scores.
When the tests with American norms were compared, the scores on the SON-R 2½-7 were lower
than the scores on the American tests (with the exception of the TOMAL). However, after
correction, no differences were found, on average, with the different tests. The difference with
the total score on the WPPSI-R was minimal. With the PIQ the difference was -1.6 and with the
IQ score on the TONI-2 the difference was 1.4. However, a large negative difference occurred
with the DTVP-2 and an equally large positive difference occurred with the TOMAL.
When the English norms for the BAS were used, a large difference, 7 points, was found.
After correction for obsolescence of the norms, this difference practically disappeared.
These results indicate that a strong similarity exists in the development of the (nonverbal)
intelligence of children in the Netherlands, the United States of America and Great Britain, and
that the Dutch age norms of the SON-R 2½-7 can be used in Western countries for a broad
assessment of intelligence. However, standardizations conducted on a national level remain
preferable, in order to arrive at more precise norms at the subtest level, and at a better
determination of the dispersion and the form of the score distributions.

9.11 COMPARISONS IN RELATION TO EXTERNAL CRITERIA


Analysis of the relationship between the scores on the SON-R 2½-7 and the criterion tests
shows the existence of a reasonable correspondence with general and nonverbal intelligence
tests. In order to increase insight into the aspects on which the SON-R 2½-7 differs from other
tests, a number of comparisons were made between the SON-R 2½-7 and other tests, in relation
to external criteria. The external criteria were the assessment by the examiner of the testability
of the child, background information such as the SES level and native country of the parents,
and the assessment by teachers and institutional staff of intelligence and language development.

In the comparisons, the correlations of the SON-R 2½-7 and of a number of criterion tests with
other variables were calculated. The comparison between the SON and the other test was
always based on the same group of children. As these correlations were examined within a
group, and not between groups, they were not corrected for the variance of the SON-IQ.

Evaluation of testability
As with the SON-R 2½-7, the children who completed the GOS 2½-4½ or the RAKIT in the
framework of the standardization research were evaluated by the examiner, after the test, on
motivation, concentration and understanding of the directions. Table 9.16 presents the
percentage of children who were given the evaluation 'good' on these aspects, for the
children who were evaluated on the SON and the GOS (N=107), and for the children who were
evaluated on the SON and the RAKIT (N=169).
The children were more frequently evaluated as being well motivated and well concentrated
during the administration of the SON-R 2½-7 than during the administration of the GOS and the
RAKIT. The difference in percentages varied from 10% to 18%. The evaluation of
comprehension of directions was also more often positive with the SON-R 2½-7; the difference
with both other tests was about 6%.
The percentages were lower in the comparison of the SON and the GOS than in the comparison of the SON and the RAKIT. This was the result of the younger ages at which the SON-GOS
combination was administered.
The results are an indication that the attractiveness and variety of the testing materials of the
SON-R 2½-7, the opportunity for the child to be active, the help and feedback given, the limits
on the administration of difficult items, the absence of the necessity to talk, and the extensive
directions, have been successful in allowing the children to do the test in the best possible
circumstances.

Table 9.16
Comparisons between Tests of the Evaluation of the Subject's Testability

                        Percentage of the children with evaluation 'good'
                  N     Motivation     Concentration     Compr. directions

SON-R 2½-7      107        79%              77%                79%
GOS 2½-4½       107        64%              62%                74%
Difference                 15%              15%                 5%

SON-R 2½-7      169        91%              85%                88%
RAKIT           169        81%              67%                81%
Difference                 10%              18%                 7%

Background variables
The correlations of a number of criterion tests with the SES index and with the distinction
between native Dutch and other subjects were compared with the correlations of the SON-R
2½-7 with these variables. The analyses were always carried out in the same group. Due to
missing values, small differences in numbers occur in the correlations with SES index and native
country; in table 9.17 the mean number is shown. Country of origin was dichotomised to form a
native Dutch group (children whose parents were both born in the Netherlands) versus a group
of children one or both of whose parents were born abroad. A positive correlation means that the
native Dutch children scored higher on the test. The comparisons were limited to the
standardization research and the research at primary schools. In table 9.17 the column headed
'Diff.' shows the difference between the correlation of the SON-R 2½-7 and the correlation of
the criterion test with the variable. A positive difference means that the SON-R 2½-7 had
a stronger correlation with the background variable.
Most of the comparisons indicated that the SON-R 2½-7 correlated less strongly with the
SES level of the parents than the other tests. In nine of the thirteen comparisons involving
absolute differences of .05 or more, the differences were negative. The correlation of the
SON-R 2½-7 with the SES index was considerably weaker (.10 or more) than those of the GOS,
the total IQ on the WISC-R, the DTVP (visual perception), the verbal subtests of the RAKIT, the
verbal scale of the WISC-R and the TvK (language test). On the other hand, the correlation of the
SON-R 2½-7 with the SES index was considerably higher (.10 or more) than those of the BOS,
the TOMAL (nonverbal memory) and the performance scale of the WISC-R.
Nearly all comparisons showed that the differences in performance between the native Dutch
and the immigrant children were smaller on the SON-R 2½-7 than on the other tests. This was
particularly so for the BOS, the GOS, and the DTVP, the total score on the WISC-R and the
verbal scale of the WISC-R, and for the language tests (the Reynell/Schlichting Test and the
TvK). Only on the TONI were the differences between the native Dutch and the immigrant
children clearly smaller than on the SON-R 2½-7.

Table 9.17
Comparisons Between Tests in Relation to Socioeconomic and Ethnic Background

                                       Correlation with          Correlation with
                                       SES index                 Dutch/Immigrant
                                       Crit.   SON-R             Crit.   SON-R
                                 N     test    2½-7    Diff.     test    2½-7    Diff.

Standardization Research
SON-R 5½-17                    118     .21     .24      .04      .11     .17      .06
BOS 2-30                        50     .12     .23      .11      .30     .03     -.27
GOS 2½-4½                      115     .54     .39     -.15      .17     .06     -.11
RAKIT (Short. version)         168     .48     .43     -.05      .16     .16      .00
REYNELL/SCHLICHTING
  Mean LC, SD and WD           557     .39     .34     -.05      .16     .04     -.12
TvK (Mean of 5 subt.)          108     .52     .40     -.11      .23     .05     -.18

2nd Year Schooling (5-6 years old)
TONI-2                         141     .39     .48      .09      .06     .24      .18
TOMAL                          141     .32     .48      .16      .16     .24      .07
DTVP-2                         141     .59     .48     -.11      .36     .24     -.12

OVB-Schools
LDT Block Patterns              65     .47     .49      .03      .06     .01     -.05
LDT (Verbal tests)              65     .57     .49     -.07      .07     .01     -.06
RAKIT (Verbal tests)            65     .59     .49     -.10      .07     .01     -.06
WISC-R                          40
  Total IQ                             .49     .38     -.11      .27     .13     -.14
  Verbal Scale                         .56     .38     -.18      .26     .13     -.13
  Performance Scale                    .28     .38      .10      .22     .13     -.09

The differences between the SON-R 2½-7 and the SON-R 5½-17 were slight for both
background variables.
These comparisons demonstrate that the performance on the SON-R 2½-7 is less dependent on
social and cultural differences than the performance on tests that (partially) require verbal
knowledge and skills, like general and verbal intelligence tests and language tests.

Evaluation of intelligence and language skills


Most of the school-aged children in the standardization research were evaluated by the teacher
as to intelligence and language development. A large number of children from the special
groups were also evaluated on these aspects by institute staff members. The correlations
between these evaluations and performances on the SON-R 2½-7 were discussed in sections 6.9
and 7.6. Here we will examine the extent to which the correlations of other tests with the
evaluation of intelligence and language development deviated from those of the SON-R 2½-7.
In table 9.18 the correlations of the SON-R 2½-7, and of the criterion tests, with the evaluation
of intelligence and language development are presented. The results on the performance scales
of the WPPSI and the WPPSI-R, when the verbal scale was not administered, are combined, as
are the results from complete administrations of the WPPSI and the WISC-R. The same was
done in the case of the RAKIT, where either the shortened form or the four subtests were
administered. In these cases the correlations were first calculated for each subgroup and
subsequently combined by weighting them in proportion to the number of persons in each
subgroup.
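
The combination of subgroup correlations described above amounts to a weighted mean. The following is a minimal sketch of that step, assuming a plain weighting of the correlations by subgroup size; the function name and the example values are hypothetical and are not taken from the study.

```python
def pooled_correlation(subgroups):
    """Combine subgroup correlations, weighting each by its number of persons.

    subgroups: list of (correlation, n) pairs, one pair per subgroup.
    """
    total_n = sum(n for _, n in subgroups)
    if total_n == 0:
        raise ValueError("at least one subgroup must contain persons")
    return sum(r * n for r, n in subgroups) / total_n


# Hypothetical subgroup values, for illustration only.
example = [(0.50, 40), (0.70, 20)]
print(round(pooled_correlation(example), 2))
```

The larger subgroup dominates the result, so the pooled value lies closer to .50 than to .70. An alternative would be to average Fisher z-transformed correlations, but the text gives no indication that this was done.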
Six of the correlations with the evaluation of intelligence showed an absolute difference
larger than .10. In four of these cases, the SON-R 2½-7 has a stronger correlation with the
evaluation of intelligence. This is the case with two language tests and the Stutsman. In the
special groups, the SON-R 2½-7 correlated more strongly with the evaluation of intelligence
than the RAKIT, but in the standardization research the association between the RAKIT and the
evaluation of intelligence was stronger. Also, the LDT had a higher correlation with the
evaluation of intelligence. The differences with the performance section of the LDT and the
Wechsler were slight. The SON-R 2½-7 had the same correlation as the SON-R 5½-17 with the
evaluation.
In general, the SON-R 2½-7 correlated more weakly with the evaluation of language
development than did the other tests. The six correlations with a difference greater than .10 all
refer to the total scores and to the verbal and memory section of the RAKIT, LDT and
WPPSI/WISC-R. In the special groups, however, the SON-R 2½-7 had a stronger correlation
than the RAKIT with the evaluation of language skills.
The fact that, in the standardization research as well as in the special groups, the correlation
of the SON-R 2½-7 with the evaluation of language development deviated little from the
correlations of the specific language tests, like the Reynell/Schlichting and the TvK, with this
evaluation, is noteworthy.
The comparisons with the other tests demonstrate that the SON-R 2½-7 correlates adequately
with the evaluations of intelligence and language development. The SON-R 2½-7 correlated
more strongly with the evaluation of intelligence than did language tests, while, surprisingly, the
correlation with the evaluation of language development was approximately the same as that of
the language tests. The correlations with the evaluation of intelligence and language
development were similar to those of the performance scale of other tests. Correlations with the
evaluation of intelligence tended to be a bit weaker than for general intelligence tests;
correlations with the evaluation of language development were clearly weaker.

Table 9.18
Comparisons Between Tests in Relation to Evaluation of Intelligence and Language Skills

                                       Correlation with          Correlation with
                                       evaluation of             evaluation of
                                       intelligence              language developm.
                                       Crit.   SON-R             Crit.   SON-R
                                 N     test    2½-7    Diff.     test    2½-7    Diff.

Standardization Research
SON-R 5½-17                    116     .47     .47      .00      .41     .34     -.07
RAKIT (Short. version)         158     .62     .42     -.20      .58     .43     -.14
REYNELL/SCHLICHTING
  Mean LC, SD and WD           285     .50     .47     -.03      .54     .49     -.05
TvK (5 subtests)                95     .56     .62      .06      .51     .56      .05

Special Groups
P-SON                          152     .75     .74      .00      .33     .35      .02
STUTSMAN                        40     .61     .79      .18      .64     .70      .06
WPPSI/WPPSI-R
  Performance Scale             39     .45     .49      .03      .41     .35     -.06
WPPSI/WISC-R                    68
  Total IQ                             .74     .65     -.09      .74     .55     -.19
  Verbal Scale                         .63     .65      .01      .73     .55     -.18
  Performance Scale                    .70     .65     -.05      .62     .55     -.06
LDT                             77
  Total IQ                             .69     .53     -.16      .49     .23     -.26
  Performance subtests                 .50     .53      .03      .19     .23      .05
  Memory subtests                      .55     .53     -.02      .39     .23     -.15
  Verbal subtests                      .52     .53      .01      .52     .23     -.29
RAKIT
  Shortened version or
  mean of 4 subtests            69     .49     .60      .11      .27     .41      .14
REYNELL
  Language compreh. A          141     .55     .75      .21      .29     .36      .08
TvK
  Mean of 4 subtests            49     .45     .72      .27      .29     .27     -.02

10 IMPLICATIONS OF THE RESEARCH FOR CLINICAL SITUATIONS

In the previous chapters a detailed description was given of the results of the research carried
out with the SON-R 2½-7 to date. A summary of important results is presented here. In the
summary, the following questions will be answered:
have the objectives of the revision been realized,
does the test provide a valid measurement of intelligence,
for whom is the test suitable,
how should the results be interpreted.

10.1 THE OBJECTIVES OF THE REVISION


The most important objectives in the revision of the Preschool SON were:
updating and improving the testing materials,
clarifying the directions,
determining accurate and differentiated norms,
increasing the reliability and generalizability,
limiting the duration of the test by an adaptive procedure,
realizing a good correspondence with the SON-R 5½-17.

Testing materials
New testing materials were developed, and existing material was completely renewed. The
number of items almost doubled and the number of subtests was increased from five, as in the
Preschool SON, to six. Our experience with the test suggests that the materials are attractive for
children and that the drawings and directions are clear. The storage system has been greatly
improved and the materials are very manageable and durable.

Directions
The description of the directions is much more detailed than in the previous version of the test.
This requires a greater effort from the examiner in learning how to administer the test. However,
it prevents the examiner from giving a personal interpretation of the directions, which would
result in the test not being administered in a standardized manner. The directions leave sufficient
room to adapt the administration to specific characteristics of the child.
The directions show clearly how to provide feedback and help. This is important as the way
in which directions are given differs from that of most intelligence tests. In comparison with the
Preschool SON, feedback and help are offered more consistently and are described in more
detail.

Norms
In the Preschool SON, age norms were given only for the total score on the test. The SON-R
2½-7 has norms at the level of the subtests, the scale scores (SON-PS and SON-RS), and the
total score (SON-IQ). Furthermore, the general norms are based on a large sample of 1124
children, instead of 500 children as with the Preschool SON. The statistical fitting procedure
used with the SON-R 2½-7 increases the accuracy of the norms still further. Weighting the
sample with respect to a number of variables related to intelligence (SES level, mother's country
of birth, and sex) prevents differences between age groups, with regard to these variables, from
influencing the accuracy of the norms.
The age range of the norms has been extended, for practical purposes, from 2;6 to 8;0 years.
The norms are rather precisely differentiated according to age. In the norm tables monthly norm
groups are distinguished, whereas the computer program bases the norms on the exact age.
Differentiated norms are very important for testing young children. For each age group in the
standardization research, the change in the IQ score that would result from using the norms for
children who were one month older was determined (table 10.1). For the two and
three-year-olds the difference was approximately 3 IQ points, for the four and five-year-olds it
was 2 IQ points and for the older children 1 IQ point. If three-monthly norm tables had been
used (as is the case in the WPPSI-R), then the administration of the test one day earlier or later
could result in a difference of 9 IQ points for a child on the border between two age groups. The
systematic deviation (upwards or downwards) on the borderline between two age groups of the
tables is then 4 to 5 points. In the Preschool SON, with age groups of half a year, these deviations
were even larger. By using monthly age groups, the systematic deviations for the youngest
children are at most 1 or 2 IQ points with the SON-R 2½-7.
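
The relation between the width of the norm groups and the size of these effects can be sketched as follows. This is a simplified illustration, not the procedure used to construct the norms: the function name is hypothetical, and it assumes that a norm table is exact at the middle of each age group and that the IQ score changes linearly with age within a group.

```python
def border_effects(change_per_month, table_width_months):
    """Return (jump, max_deviation) in IQ points for age-grouped norm tables.

    jump: difference in IQ when a child moves from one norm group to the next.
    max_deviation: largest systematic error, reached at the edge of a norm
    group, under the simplifying assumptions stated above.
    """
    jump = change_per_month * table_width_months
    return jump, jump / 2


for width in (1, 3, 6):  # monthly, three-monthly and half-yearly norm groups
    # About 3 IQ points per month, as reported for two and three-year-olds.
    jump, deviation = border_effects(3.0, width)
    print(f"{width}-month groups: jump {jump:.1f}, max deviation {deviation:.1f}")
```

With a change of about 3 IQ points per month, three-monthly groups give a jump of 9 points and a deviation of 4.5 points at the border, whereas monthly groups limit the deviation to 1.5 points, which agrees with the figures quoted in the text.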
The most precise results, certainly for the youngest age groups, are obtained with the computer program. Furthermore, the program makes an accurate calculation of the scaled test results
possible when the test has not been administered in full. Finally, the program can calculate the
reference age for the total scores. This can only be approximated using the tables.

Table 10.1
Mean Change in IQ Score Over a Period of One Month

Age          Diff.     Age          Diff.     Age          Diff.

2;3 years     3.9      4;3 years     2.3      6;3 years     1.4
2;9 years     3.3      4;9 years     2.0      6;9 years     1.1
3;3 years     2.9      5;3 years     1.8      7;3 years      .9
3;9 years     2.6      5;9 years     1.6


Reliability and generalizability


An important objective in the revision of the test was to improve the reliability and
generalizability. The mean reliability of the test increased from .82 to .90, and the generalizability
from .64 to .78. All age groups showed an improvement but the more extreme ages showed the
greatest improvement.
The objective for the revision was a reliability of .90 and a generalizability of .80. From three
years onwards the results correspond closely to these values. For the two-year-olds, however, the
reliability (.86) and the generalizability (.71) are lower than the goal we had set; for the
seven-year-olds the reliability (.92) and the generalizability (.82) are higher.
When calculating the reliability of the subtests, the problem arises that two types of
interdependence occur, on the one hand because of the adaptive procedure (which leads
to an overestimation of the reliability), and on the other hand because of the feedback (which
leads to an underestimation of the reliability). Such interdependence makes the calculation of
the reliability of the subtests and the total score less accurate. However, the mean value of .90
for the reliability seems to be a realistic estimate. Lower reliability for the SON-R 2½-7 than for
the SON-R 5½-17 (.93) was consistent with expectations based on the younger age at the time
of administration of the SON-R 2½-7, the smaller number of subtests, and the shorter duration
of the test administration.
However, the reliability of .90 is clearly higher than the generalizability of the total score of
the test (.78). This is also self-evident. If the reliability of the SON-R 2½-7 were the same as the
generalizability, this would imply that the subtests have no specific reliable variance and that a
uniform level of ability determines the performance on all test items. Research with the SON-R
5½-17, however, has shown that the proportion of specific reliable variance of the subtests
actually increases as the age of the children being examined decreases.

Adaptive procedure
The adaptive procedure, in which the entry and discontinuation rules are applied, was developed
to limit the duration of the administration of the test, and to improve the motivation of the
children. The administration of childish items, ones that are much too easy, as well as the
administration of items that are too difficult, has a demotivating effect. Especially for children
who are uncertain and often feel that they are failing, being confronted with tasks that are above
their level can be very frustrating. Because the administration of a subtest is discontinued after
a maximum of three mistakes, it was often possible to administer the test to children who were
otherwise difficult or impossible to test.
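
The effect of the entry and discontinuation rules can be sketched in a few lines. This is a strongly simplified illustration, not the procedure from the test directions: the function name, the fixed entry index and the example responses are hypothetical, and the actual rules for choosing the entry item and for scoring are described in the directions for the subtests.

```python
def administer_subtest(responses, entry_index=0, max_mistakes=3):
    """Simulate one subtest under an entry rule and a discontinuation rule.

    responses: list of booleans, True if the child would solve item i.
    entry_index: first item actually presented (easier items are skipped).
    Returns the indices of the items administered and the number of mistakes.
    """
    administered, mistakes = [], 0
    for i in range(entry_index, len(responses)):
        administered.append(i)
        if not responses[i]:
            mistakes += 1
            if mistakes == max_mistakes:
                break  # discontinuation rule: stop after the third mistake
    return administered, mistakes


# Hypothetical child: solves the six easiest items, then starts to fail.
child = [True] * 6 + [False, True, False, False, True, True]
items, mistakes = administer_subtest(child, entry_index=3)
print(len(items), "of", len(child), "items administered;", mistakes, "mistakes")
```

In this example only seven of the twelve items are presented: the three easiest items are skipped by the entry rule and the last two are never reached because the subtest stops at the third mistake.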
The mean duration of administration, in the different groups of children who have been
examined, was less than one hour. For very young children the duration of administration was
much shorter and for older children the duration of administration was somewhat related to the
level of their ability: children who performed relatively well completed more items and this
took more time.

Correspondence with the SON-R 5½-17
In the first two versions of the SON tests no distinction was made between a test for the younger
and a test for the older children. This distinction was first made in the construction of the
Preschool SON and the SSON. However, there were large differences between the Preschool
SON and the SSON in both content and manner of administration (see section 1.2). One
objective during the construction of the SON-R 2½-7 was to achieve a good correspondence
with the SON-R 5½-17, which was published in 1988. A strong similarity in content now exists
between the difficult items of the subtests Mosaics, Categories, Analogies and Situations of the
SON-R 2½-7 and the easy items of the corresponding subtests of the SON-R 5½-17. In both
tests an adaptive procedure is used and feedback is given. In the case of the SON-R 5½-17,
however, feedback is limited to indicating whether a solution is correct or incorrect. Both tests
also use highly differentiated norms that can be calculated with a computer program.
Strong similarities exist between the materials and the procedures used on the two tests. The
correlation between the SON-R 2½-7 and the SON-R 5½-17, with an interval of three to four
months between tests, was considerable (r=.76) and not much lower than the retest correlation
of the SON-R 2½-7 for children from 4;6 years onwards (r=.81). Sattler (1992) considers an
overlap in the content of tests such as the WPPSI-R and the WISC-III not very desirable,
because the tests can no longer be administered within a short period as independent tests. The
reason for the overlap of the age norms in the SON tests, however, is not to make retests within
a short period possible, but to offer a choice that is optimal with respect to the age, skills and
specific problems of the child.

10.2 THE VALIDITY OF THE TEST


The question as to the validity of the test is primarily a question of whether the SON-R 2½-7 is a
good and usable measure of intelligence. This question is especially important because the test's
nonverbal character limits the manner in which intelligence can be measured. However, the
question is also difficult to answer because intelligence is not an accurately defined and
demarcated concept. The research carried out does not, therefore, supply an unambiguous answer
as to the validity. It does, however, provide more insight into the positive aspects of the test and
into the limitations that must be taken into account in the interpretation of the results. In the
discussion of validity we make distinctions between the contents of the test, the relationship with
other indicators of intelligence, and the relationship with other cognitive and non-cognitive
variables.

Contents of the test


The subtests of the SON-R 2½-7 are all aimed at solving problems which require spatial insight
and the ability to reason abstractly and concretely. Performance depends less on acquired
knowledge than on the ability to discover methods and rules, and to apply these to new and
gradually more complex situations and materials. In this way the SON-R 2½-7 corresponds to
the definitions of intelligence as 'problem solving ability' and 'ability to learn', and so emphasizes
'fluid' intelligence rather than 'crystallized' intelligence (Cattell, 1971). However, this does not
mean that experience gained by the children will not influence their ability to solve the problems.
The performance on the test improves greatly with age. In the range 2;3 to 7;3 years, 86% of
the variance in the raw total score was explained by age. This means that the SON-R 2½-7 is
primarily a developmental test that registers large differences in cognitive development. This
also means that highly differentiated norms are required, and that high demands are made on the
test as the age variance is not relevant for the reliability and validity of the scaled scores.
The mean of the correlations between the scaled subtest scores of the SON-R 2½-7 was .36.
The correlations increased with age. The relatively low level of the correlations emphasizes the
importance of basing an evaluation of intelligence on a variety of test items. The common
variance of the subtests is determined mainly by one factor. Furthermore, a distinction can be
made between the more spatial, visual-motor, 'performance' tests (Mosaics, Puzzles and
Patterns) and the tests aimed at concrete and abstract reasoning (Categories, Analogies and
Situations). The correct solutions to these last tests are reasoned out and selected, whereas the
solutions of the performance tests are constructed.
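
As a rough plausibility check, the mean subtest correlation of .36 can be related to the generalizability of .78 reported in section 10.1. The sketch below assumes that the generalizability of the total score behaves like standardized coefficient alpha computed over six equally weighted subtests; this is an assumption made here for illustration, and the function name is hypothetical.

```python
def standardized_alpha(mean_r, k):
    """Standardized coefficient alpha for k parts with mean intercorrelation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Six subtests with a mean intercorrelation of .36, as reported in the text.
alpha = standardized_alpha(0.36, 6)
print(round(alpha, 2))
```

Under this assumption the result is about .77, close to the reported value of .78. It also illustrates why a total score based on six moderately correlated subtests is far more dependable than any single subtest.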

Congruent validity
The relationship with other indicators of intelligence was investigated by examining the
correlations with evaluations by other persons, and with performance on other intelligence tests.
The correlation between the SON-IQ and primary school teachers' evaluations of the
children's intelligence was .46. For children from special education programs and at medical
pre-school daycare centers, the SON-IQ had a correlation of .66 with the evaluation of the
cognitive development given with the referral to the school in question. The evaluation of
intelligence, generally given after the administration of the test for the children from these
groups, had a correlation of .68 with the SON-IQ. For the children with a language/speech/
hearing disorder, the correlation between the SON-IQ and the subsequent evaluation of
intelligence was .61. The fact that the correlation within mainstream primary education was lower
than within the special groups is not surprising. In the first years of primary education,
cognitive development is not studied systematically. However, children in the special groups are
given an extensive psychological examination at the time of admission, and subsequently,
intelligence tests are administered at regular intervals to follow their development.
In comparison with general, partially verbal intelligence tests, the correlations between the
SON-R 2½-7 and the evaluation of intelligence are slightly lower. In comparison with the
performance section of such tests and the Stutsman, the correlations are practically similar or
higher.
The mean correlation of the SON-R 2½-7 with general intelligence tests and with nonverbal
tests was .65. Approximately half the correlations with general intelligence tests lay between
.59 and .70 and approximately half the correlations with nonverbal (intelligence) tests lay
between .59 and .75.
Correlations higher than .70 were found with the total score on the WPPSI-R (.75), the
WISC-R (.74) and the shortened version of the BAS (.87); with the performance section of the
WPPSI(-R) and WISC-R (.73 and .83) and the BAS (.78); with the DTVP-2 (.73) and with the
SON-R 5½-17 (.76). These results indicate that a reasonably strong correlation existed between
the score on the SON-R 2½-7 and a large number of very diverse (nonverbal) intelligence tests.
However, they also indicate that the child's performance on the SON-R 2½-7 can differ greatly
from his or her performance on another test. These differences can be much larger than may be
expected solely on the basis of the reliability of the tests. Four different causes can be indicated:
differences in content and procedure between the tests,
fluctuations in performance,
stable changes,
limitations of the research.
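
The size of the differences that can be expected between two test scores follows directly from their correlation. The sketch below is an illustration under stated assumptions rather than an analysis from the research: it assumes that both scores are expressed on an IQ scale with a standard deviation of 15, and the function name is hypothetical.

```python
import math

IQ_SD = 15.0  # assumed standard deviation of both IQ scales


def sd_of_difference(correlation, sd=IQ_SD):
    """Standard deviation of the difference between two scores with equal SD."""
    return sd * math.sqrt(2 * (1 - correlation))


# If two tests differed only through measurement error, their correlation
# would be about as high as the reliability (.90).
print(round(sd_of_difference(0.90), 1))

# Mean correlation actually found with other intelligence tests (.65).
print(round(sd_of_difference(0.65), 1))
```

With a correlation of .90 the standard deviation of the difference scores is just under 7 IQ points, whereas with the observed mean correlation of .65 it is more than 12 points. Differences of 20 points or more between two tests are therefore not exceptional, which is the reason the causes listed above need to be considered.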
Differences in content
Large differences in content can exist between the SON-R 2½-7 and other intelligence tests.
Specifically verbal subtests are absent in the SON, as are memory tests and tests in which a
series of actions must be imitated, such as the sequential subtests of the K-ABC. Further, the
subtests of the SON-R 2½-7 do not have tempo characteristics, whereby simple tasks must be
completed as quickly as possible. In addition to these differences in the content of the items,
there are differences in the manner of administration that may influence the results: for instance,
the help and feedback given during the administration of the SON-R 2½-7 and the limited
number of mistakes before the test is discontinued.
Fluctuations in performance
Fluctuations in the performances of (young) children can also be an important cause of the
difference in scores. The retest correlation of the SON-R 2½-7, with an interval of three to four
months, was .79. This was clearly lower than the reliability of the test (.90), which is based on
the internal consistency. It is not plausible that large stable changes causing this difference
occur in such a short period. The retest correlation may have been influenced slightly
negatively by the fact that the learning effect that occurs with a retest is not the same for each
child. The study of differences in scores between the performance scale of the WPPSI-R and the
SON-IQ shows that large differences in performance may occur that cannot be explained solely
on the basis of content, stable changes, or errors of measurement. The relationship between the
performance on the different subtests of the test was less strong, and the correlations of the
SON-IQ with other test scores were lower, for the younger children. Problems with concentration
and motivation during the administration of the test occurred more often at younger ages. In the
age range for which the SON-R 2½-7 is intended (and especially for the youngest children),
fluctuations in performance that are difficult to predict will have to be taken into account. All
kinds of factors can influence performance: how much at ease a child is with a specific examiner,
something that happened on the day of the test administration, feelings of anticipation, physical
condition, such as tiredness or the onset of influenza, etc. Fluctuations in performance, not
related to characteristics of the subtests, can also occur during the test administration. This could
be due to factors like tiredness during the course of the test administration, physical discomfort,
or increasing motivation as the child begins to feel more at ease in the test situation.
Stable changes
More stable changes in ability may lead to differences in scores when a relatively large interval
occurs between the administration of the tests. The rate at which children develop is not the
same for everyone and will fluctuate. Various factors influence the cognitive development of
children. Large changes in the circumstances in which a child grows up, and important events
may slow development down or, alternately, remove impediments to development. In various
correlational research projects with the SON-R 2½-7, the interval between the administration of
the tests was more than one year and differences in rate of development definitely affected part
of the correlations negatively.
Limitations of the research
In the different phases of the research (administering the tests, scoring, calculating the age,
determining the scaled scores, recording and processing the data), mistakes can also be
made that influence the results. One example is the switching of subjects when matching the
data. This happened during the comparative research of the SON-R 2½-7 and the WPPSI-R
(Tellegen, 1997). In large scale research such mistakes can be, and are, made. Also, knowing the
specific conditions under which each test was administered, and evaluating whether each administration occurred according to the standard directions becomes more difficult. This problem occurred, for instance, with the test results supplied by the special schools. The extent to
which the standard test administration procedure may have had to be adapted to the specific
problems of different children is not known for these results.
The inaccuracy of norm tables can also be a source of differences between scores. However,
in the case of tests with very broad norm tables (half-yearly or yearly), this was corrected for
during this research. When comparing the test scores, poor correspondence between norms
because of obsolete norms or because the norm groups are not comparable, can lead to large
differences between scores. However, this has generally no effect on the correlations.
During this research, the administration of other tests was, for practical purposes, sometimes
limited to a shortened version, or, in connection with handicaps of the children, to the nonverbal
or performance section. This often limited the reliability and validity of the criterion tests and
also the strength of the correlations.
Conclusions
The limitations of the reliability of the test, the specific characteristics of the contents of the
SON-R 2½-7, and the instability of the test performances of young children seem to us to be the
most important causes of the differences in scores between the SON-R 2½-7 and other (nonverbal)
intelligence tests. These factors, which lead to lower correlations, play a smaller part
when children are older. When the influence of stable changes and the limitations of the
research are taken into account, it is realistic to take a value of approximately .70 for the
correlation of the SON-R 2½-7 with other intelligence tests as a point of departure, if the
interval between the administration of the tests is not longer than one year.
Based on this evaluation of the correlation with other intelligence tests, and the data on the
reliability (based on internal consistency) and the stability, the variance of the SON-IQ can be
roughly described as follows (see figure 10.1):
- 10% measurement error variance,
- 10% reliable unstable variance,
- 10% reliable test-specific variance,
- 70% stable reliable variance that is generalizable to other tests.
The last component, the variance that the SON-R 2½-7 has in common with other intelligence
tests administered at a different time, is the most relevant for the evaluation of intelligence, and
the value .70 can be seen as an indication of the validity. However, for very young children, the
validity will be lower due to the lower reliability of the test and greater instability; in older
children the validity will, correspondingly, be higher. The proportion of test-specific variance will,
of course, also depend on the extent to which the criterion tests correspond in content and
procedure with the SON-R 2½-7. The validity in this case is based on the correlation with a
(nonverbal) intelligence test administered at another time. However, if we could correlate the
scores on the SON-R 2½-7 with the ideal score based on a large number of other tests,
administered at different times within one year, then the correlation would equal √.70 and the
validity coefficient would be .84.
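The arithmetic behind this decomposition can be sketched in a few lines of Python. This is only an illustration and not part of the computer program supplied with the test; the function name variance_components and the variable names are hypothetical, and the inputs are the rounded coefficients given in the text and in figure 10.1 (reliability .90, stability .80, proportion of valid variance .70).

```python
import math

# Rounded coefficients taken from the text (assumed inputs, not new data).
reliability = 0.90  # internal consistency
stability = 0.80    # reliable variance that is also stable over time
valid = 0.70        # stable variance shared with other intelligence tests


def variance_components(reliability, stability, valid):
    """Split the total variance into the four components named in the text."""
    return {
        "measurement_error": 1.0 - reliability,
        "reliable_unstable": reliability - stability,
        "reliable_test_specific": stability - valid,
        "stable_generalizable": valid,
    }


components = variance_components(reliability, stability, valid)

# Correlation with an ideal criterion: the square root of the valid proportion.
validity_coefficient = math.sqrt(valid)

for name, share in components.items():
    print(f"{name}: {share:.0%}")
print(f"validity coefficient: {validity_coefficient:.2f}")
```

The four shares add up to the total variance, and the square root of .70 rounds to the validity coefficient of .84 mentioned above.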

Construct Validity
The objective during the development of the first SON test was to overcome the one-sidedness
of the existing performance tests, and to incorporate tasks related to abstract and concrete
reasoning in the nonverbal test. The aim of the SON-test was, and is, to measure general
intelligence as precisely as possible, within the limitations of a nonverbal administration. The
results of the factor analyses on the subtests of the SON-R 2½-7, carried out with very diverse
groups of children, in the Netherlands as well as abroad, support the distinction between three
performance tests (Mosaics, Puzzles and Patterns) and three reasoning tests (Categories,
Analogies and Situations). This is a relative distinction, as the largest part of the common
variance of the subtests can be reduced to one general factor.

Figure 10.1
The Components of the Variance of the SON-R 2½-7 IQ Score

Variance of measurement error    10%
Unstable variance                10%
Test-specific variance           10%
Stable generalizable variance    70%

reliability (.90)
stability (.80)
proportion of valid variance (.70)

The significance of the two scales is confirmed by the correlations with other tests. The
Performance Scale had a relatively strong correlation with the performance scale of the Wechsler
tests, the performance section of the LDT and with the DTVP-2. The Reasoning Scale had a
relatively strong correlation with the verbal section of the Wechsler tests, the verbal section of
the LDT and the BAS, and with the Reynell Test for Language Comprehension. These results
show that a broader domain of intelligence is measured by the SON-R 2½-7 than by tests that
consist exclusively of a performance section.
Memory
The SON-R 2½-7, like the SON-R 5½-17, has no specific memory tests. The correlations of the
SON-IQ with two memory tests of the LDT (.43), the TOMAL (.45), the sequential development
index of the K-ABC/GOS (.29 and .49), and with auditory memory in the Schlichting Test
(.27), were moderately positive. These relatively low correlations argue strongly in favor of
examining intelligence and memory separately. Incorporating a few memory subtests in an
intelligence test is too restricted a basis for a valid assessment of memory. In addition, the
interpretation of the intelligence score becomes more difficult because memory is a separate
factor.
Visual perception
The correlation of .73 with the DTVP-2, a test for visual perception, shows that visual
perception is strongly represented in the SON-R 2½-7. The Performance Scale in particular was
strongly related to the DTVP-2. Perception, in this context, is not passive seeing, but
comprises structuring, evaluating and comparing visual information.
Motor skills
The subtests of the Performance Scale of the SON-R 2½-7 in particular require visual-motor
skills. In primary education, the correlation of the SON-IQ with the teacher's evaluation of
motor development was low (.24). However, for children with a developmental delay or with a
language/speech/hearing disorder, these correlations were higher (.46 and .32).
Verbal skills
Knowing the relationship between the SON-R 2½-7 and verbal intelligence and language skills
is important if we are to be able to judge to what extent the domain of intelligence is restricted
by the nonverbal character of the test. However, verbal intelligence is not a clearly defined
concept. On the basis of factor-analytical research, a distinction is made by Kaufman (1975) in
the verbal scale of the WISC-R between the factors Verbal Comprehension and Freedom from
Distractibility. Further, the factor Verbal Comprehension includes quite diverse subtests, for
instance, Similarities, a verbal subtest of abstract reasoning, and Vocabulary, a test tapping
verbal knowledge. The skills required for the subtest Similarities belong to the intelligence
domain that the SON-R 2½-7 is intended to measure. The performance on a subtest like Vocabulary
is so dependent on the circumstances in which a child grows up that a nonverbal alternative for
this test is not feasible for the SON-R 2½-7. On the K-ABC, subtests which clearly tap
verbal knowledge are scored separately and are not included in the calculation of the mental
development index.
The fact that a precise distinction between intelligence, verbal intelligence and language
skills cannot be made is shown by the correlations of the SON-IQ with evaluations of intelligence and language skills. In the case of children in primary education, the correlation with the
evaluation of intelligence was .46 and the correlation with the evaluation of language development was only slightly lower, .44. However, clear differences in the correlations with these
evaluations were found for children with a developmental delay (.68 versus .48) and for children
with a language/speech/hearing disorder (.61 versus .31).
The correlations of the SON-IQ with the verbal section of intelligence tests, and with tests of
language skills and language development were in the order of .50. Taking into account the fact
that the SON-R 2½-7 can be administered completely without using any language, these
correlations are considerable. The Reasoning Scale of the SON-R 2½-7 contributed most to these
correlations.
Socio-economic differences
The SON-IQ had approximately the same association with the SES level of the parents as other
(nonverbal) intelligence tests. However, the correlation with SES level was less strong than for
language tests and the verbal section of general intelligence tests. In comparison with most
other tests, smaller differences were found between immigrant and native Dutch children when
using the SON-R 2½-7.
Conclusions
These results support the conclusion that the concept of intelligence measured by the SON-R
2½-7 corresponds broadly with what is considered to be general intelligence. The SON-R 2½-7
emphasises visual-motor and perceptual skills, spatial insight, and the ability to reason
abstractly and concretely. This corresponds with the factors Fluid Intelligence and Broad Visual
Perception of Carroll's classification (1993). Memory, knowledge, and language skills have an
indirect association with performance, but the measurement is not based on these skills. The test
is less dependent on socio-economic factors than are verbal tests, and can best be defined as a
nonverbal, general intelligence test with an emphasis on fluid intelligence and visual perception.

10.3 THE TARGET GROUPS


Over the years the SON test has developed from an intelligence test for deaf children to a
general nonverbal intelligence test that is especially suitable for children with communicative
handicaps, for example, children with language/speech/hearing disorders, autistic children, and
children raised with a different language or bilingually. The test is also highly suitable for
children who are difficult to test, who have learning problems, or a developmental delay. As
reference ages were also calculated in the standardization, the test can be used for older mentally deficient or mentally handicapped children and adults. The test is less suited to certain
categories of children, e.g., children with visual handicaps and children with serious motor
handicaps.

Communicative handicaps
The research with native Dutch children who were deaf but not multiply handicapped found that
the mean IQ score (97.9) deviated only slightly from the score of the hearing population. As
with the SON-R 5½-17, the lower scores in the group of deaf children related only to Categories
and Analogies, the subtests for abstract reasoning.
The mean score of the children with a language/speech and/or hearing disorder was
approximately 90. However, this group of children cannot be compared very well with the deaf
children. On the one hand, the group included children with multiple handicaps. On the other hand,
children with a language/speech/hearing disorder who were functioning well in regular
education were not included in the research group.
Administration of the SON-R 2½-7 appears to be quite possible for children with
communicative handicaps. Cooperation and comprehension of the directions were judged to be good by
the examiner for 80%-90% of these children. Motivation was judged to be good in approximately
70%. Problems with concentration were mentioned most frequently: concentration was rated as
moderate or fluctuating for about 40% of the children. The test could be administered in full to
practically all the children.
In the case of the above mentioned children a nonverbal test is necessary for a valid evaluation of the level of intelligence, as the delay in verbal development may result completely or
partially from this handicap, and bear little relation to other aspects of cognitive development.
In this group the SON-IQ correlated clearly with the evaluation of intelligence (r=.61), whereas
the correlation with the evaluation of language development was much lower (r=.31).

Developmental delay/disorder
The research on children with a developmental delay and developmental disorders was carried
out with children at schools for special education with a pre-school department, at medical preschool daycare centers, and with children with pervasive developmental disorders. With these
children, multiple social, emotional and behavioral problems often occur, as well as delays in
cognitive, verbal and motor development.
The mean SON-IQ for this group was approximately 80. A considerable delay was found
on all subtests. Large differences in scores were found within the group. Approximately 10%
of the children had a score close to 50, and slightly more than 10% had a score higher than
100. Performance on the test corresponded strongly with the diagnostic evaluation of cognitive development at the time of admittance to the school/institute (r=.66), and with the
evaluation of intelligence that was made later by other professionals involved with the
children (r=.68).
The children in this group were more difficult to test. Motivation, concentration, cooperation
or comprehension of the directions were more often rated as moderate or fluctuating by the
examiner than in the group of children with communicative handicaps.

Children who are difficult to test


In an ideal testing situation the children are well motivated to complete the test, they
comprehend the directions, and they work with concentration until they have finished the tasks. In practice
this may be different. In the case of (young) children, motivation and concentration cannot be
expected to be present beforehand. The testing situation, the materials and the interaction with
the examiner must be such that the child becomes interested, and the course of the administration should be structured in such a way that the interest is held.
The comparison of examiners' evaluations of the children's testability, after administration
of the GOS 2½-4½, the RAKIT and the SON-R 2½-7, showed that the children were better
motivated and concentrated during the administration of the SON-R 2½-7. They also understood
the directions better than with the two other, partially verbal, tests. A number of different
characteristics of the SON-R 2½-7 probably played a role here. The nonverbal character of the
test, whereby the child may, but does not have to, talk is attractive for children who are shy or
guarded toward adults. The help and feedback offered encourage the child to complete the
tasks. This lessens the feelings of failure, helps clarify the objectives of the tasks, and leads to a
more natural interaction between examiner and child. Strictly limiting the number of items
completed incorrectly also prevents the child from quickly becoming demotivated. Furthermore,
the SON-R 2½-7 has very varied test items, with which the child is constantly and
actively involved.
These qualities make the SON-R 2½-7 attractive for use with children who do not have a
specific communicative handicap, but whose social, emotional and behavioral problems may
interfere with the administration of a more traditional intelligence test.

The mentally handicapped


In the case of the mentally handicapped, the administration and interpretation of intelligence
tests are difficult, because a large discrepancy exists between their chronological age and the
level at which they function. When a test that is appropriate for their age is administered, the
scaled scores will often have the lowest value, making further differentiation according to level
impossible. However, when a test that corresponds to the level of the subject is chosen, for
example a test for young children, a reference age can be calculated, but not an IQ score or any
other standard score scaled for age. The administration of a test appropriate to the level of the
subject will often be preferred. The manner of administering the tasks, and the level of difficulty
of the tasks, will then correspond well to the abilities of the subject, and the administration will
be more motivating than the administration of a test at too high a level.
When they are no longer in a period of rapid development, the reference ages of mentally
handicapped subjects can be compared. However, if the research is carried out on mentally
handicapped children who are still developing, the comparison of the reference age must be
limited to persons of approximately the same age.
Using the SON-R 2½-7 and the SON-R 5½-17, research was carried out, on a limited scale,
with children and adults who were mentally handicapped (Wijnands, 1997). The correlation
between the reference age on the SON tests and the reference age based on various other tests,
including different versions of the Wechsler tests, the BOS 2-30 and the MSCA (n=26), was .79.
The positive points of the SON-R 2½-7, mentioned above in relation to children who are
difficult to test, appear to be very important as well when testing persons with a mental
handicap. These people often have a great fear of failure, and the help and feedback, and the
discontinuation after a limited number of mistakes, contribute to their motivation and enjoyment in
completing the test.

Immigrant children
Testing immigrant children with traditional intelligence tests can lead to an underestimation of
their cognitive potential. This occurs because no account is taken of the fact that lack of
knowledge of, and skill in the language of the examiner does not necessarily indicate that the
verbal capacities of these children are lower. A lower level of performance on the verbal section
can, but does not necessarily, indicate a lower level of intelligence. The performance on the
performance section of these tests can also be biased because the directions are usually given
verbally. Correlational comparisons between the SON-R 2½-7 and a number of other tests
showed that, in most cases, the differences between native Dutch and immigrant children were
smaller when the SON-R 2½-7 was used. On the SON-R 2½-7, the difference in IQ scores
between native Dutch and immigrant children was 7.5 IQ points, half a standard deviation.
Children with one parent born outside the Netherlands scored as high as native Dutch children.
Turkish and Moroccan children scored approximately 10 points higher on the SON-R 2½-7 than
comparable groups tested with the RAKIT, and 6 points higher than comparable groups tested
with the LEM.
The delay that was found in the group of immigrant children was comparable to the delay
found in native Dutch children with parents of the same SES level. Research on immigrant
children who participated in OPSTAP(JE) showed that, after a two-year stimulation program,
these children performed at the mean level of native Dutch children. However, selection for
participation in OPSTAP(JE) and/or the research may have contributed to these relatively good
performances.
There were no indications that the contents of the pictures in the different subtests caused extra
problems for the children with a different cultural background. We assume that depicting
children with a non-western appearance contributed to making the SON-R 2½-7 test materials
recognizable to these children.
All results indicate that the test can be used effectively with immigrant children. Of course,
an evaluation of these children's language skills, and of the extent of their knowledge of the
Dutch language, can also be important. However, this must not be confused with intelligence,
nor should language skills directly influence the evaluation of intelligence.

Visual handicaps
The SON-R 2½-7 has a strong visual orientation. All subtests use pictures. When vision is
greatly impaired and not compensated by glasses or other means, use of the SON-R 2½-7 must
be strongly discouraged. Adapted tests are available for these children (Dekker, 1987). Slight
limitations of vision will probably not be a problem. The pictures in the test are large and clear
and do not require the ability to discriminate small visual differences.

Motor handicaps
Various tasks of the SON-R 2½-7 require motor skills and eye-hand coordination. During the
construction of the test an effort was made to minimize the influence of this on the evaluation of
performance. In the subtest Patterns wide criteria for the evaluation of the drawings are used. In
the subtests Puzzles and Mosaics, frames are used to make it easier for young and poorly
coordinated children to perform the tasks well. Furthermore, the time limits, in as far as they are
applied, are broad and speed is not scored. In the case of children with more serious motor
handicaps, the possibility that these handicaps may influence performance negatively should be
considered. In chapter 11, possibilities for adapting the administration procedure to the child's
level of motor skill are discussed.

Use of the test in other countries


The nonverbal character of the test, and the availability of the manual in different languages,
mean that the test can easily be used in other countries. Research in Australia, Great Britain and
the United States of America has shown that the testing materials are very usable in these
countries. A problem occurred now and then with one or two items, for example, the example
item of the subtest Situations in which a rabbit in a cage is being fed. Having a rabbit as a pet in
Australia, unlike the Netherlands, is unusual. However, such small problems with the testing
materials will influence the validity of the test in these countries slightly, or not at all. If the test
is used in countries and cultures that differ greatly from the Netherlands, or more generally,
from Western countries, one should check whether the testing materials are sufficiently recognizable or need adapting.
Comparison of the scores on the SON-R 2½-7 according to Dutch norms, and the scores
on other tests according to English and American norms, has shown that the mean scores
differ only slightly when the period between the different standardizations is taken into
account. For countries that have a comparable socio-economic level to the Netherlands, the
Dutch norms of the SON-R 2½-7 can be used to evaluate intelligence. However, national
norms remain preferable to improve the standardization for the country in question. One
must keep in mind, however, that many of the national norms currently in use probably
produce greater distortions than the Dutch norms for the SON-R 2½-7. At least for the time
being, these are up to date.

Ages
The SON-R 2½-7 was originally intended for the age range 2½ to 7 years. However, the norms
of the tests were constructed for the age range 2 to 8 years. In the following section we will show
how the test can be used for a number of different age groups. The question when the SON-R
5½-17 should be preferred to the SON-R 2½-7 will also be discussed here.

2;0-2;5 years
The test is used experimentally. At this age considerable floor effects occur and reliability
and generalizability are low. The motivation and concentration of children at this age are
often insufficient to allow completion of the test. The test can be diagnostically interesting
when a child has a high score on a test with strong ceiling effects in this age group, for
instance, the BOS 2-30.
2;6-2;11 years
The test is usable in this age group. Moderate floor effects occur. Reliability and
generalizability are reasonable. However, difficulty coping with the test situation can be a problem,
especially for children with specific problems and handicaps.
3;0-5;5 years
In this age group the test can be used to good effect. Floor or ceiling effects rarely occur.
Reliability and generalizability are good.
5;5-5;11 years
The SON-R 5½-17 has also been standardized for this age group. However, the SON-R
2½-7 is more suitable because various subtests of the SON-R 5½-17 have a strong floor
effect at this age.
6;0-6;11 years
Both the SON-R 2½-7 and the SON-R 5½-17 are highly suitable for this age group. The
SON-R 2½-7 has slight ceiling effects. The use of the SON-R 5½-17 is preferable when
examining (highly) gifted children. For children with a cognitive delay and/or handicaps,
and for children who are difficult to test, use of the SON-R 2½-7 is preferable.
7;0-7;11 years
In general, use of the SON-R 5½-17 is preferable in this age group. The reliability and
generalizability of the SON-R 2½-7 are good, but ceiling effects clearly occur. This may not
be a problem for children with a below-average level. Administration of the SON-R 2½-7
can be attractive in this age group for children with handicaps, children with a cognitive
developmental delay, and for children who are difficult to test. These children may profit
from the help offered during the test and from the easy level of the first items.
From 8;0 years onwards
From this age on, no norms for the standard scores exist for the SON-R 2½-7. The SON-R
5½-17 has been standardized to the age of 17;0 years. For children 8 years and older, the
SON-R 2½-7 can be interesting when the level is so low that the administration of a test that
corresponds to the abilities of the subject is more suitable. For these children, accurate
determination of the reference age can be more informative than an extremely low IQ score.
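The age guidelines above can be summarized as a simple lookup. The following Python sketch is only an illustration: the function name recommend and the wording of the returned notes are hypothetical, the treatment of ages below 2;0 is an assumption (the text only states that the norms cover 2 to 8 years), and because the age 5;5 appears in two of the ranges above, it is assigned here to the 3;0-5;5 group. The notes condense the text and do not replace the clinical considerations described for each age group.

```python
def recommend(years, months):
    """Return a short usage note for an age given as years;months."""
    age = years * 12 + months  # age in whole months
    if age < 24:
        # Assumption: below 2;0 falls outside the norm range of 2 to 8 years.
        return "outside the norm range of the SON-R 2½-7"
    if age <= 29:  # 2;0-2;5
        return "SON-R 2½-7: experimental use, floor effects"
    if age <= 35:  # 2;6-2;11
        return "SON-R 2½-7: usable, moderate floor effects"
    if age <= 65:  # 3;0-5;5
        return "SON-R 2½-7: can be used to good effect"
    if age <= 71:  # up to 5;11
        return "SON-R 2½-7 more suitable than SON-R 5½-17"
    if age <= 83:  # 6;0-6;11
        return "both tests suitable; choice depends on the child"
    if age <= 95:  # 7;0-7;11
        return "SON-R 5½-17 generally preferable"
    # 8;0 and older: no standard-score norms for the SON-R 2½-7.
    return "SON-R 5½-17; SON-R 2½-7 for reference age only"


print(recommend(4, 0))
print(recommend(6, 3))
print(recommend(9, 0))
```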

10.4 THE INTERPRETATION OF THE SCORES


The most important function of the administration of a test like the SON-R 2½-7 is to provide
information about the level of the cognitive development of a child for diagnostic purposes, for
advice and assistance, and for an (interim) evaluation of the effect of treatment programs and
interventions. This means that the results of the test may have great consequences for the home
situation of the child and for his or her development. The effects can also be far-reaching for the
parents, and advice given or decisions taken in the process of diagnosis can have financial
consequences.
In general, administration of the test will not take place in isolation, but within a framework
of discussions with, and observation of, child and parent(s), and of information from schools or
family doctors. An intelligence test is also frequently administered in combination with other
developmental tests. The administration of the test will often take place as part of a cycle in
which formulating questions and gathering relevant information are alternated (Kievit & Tak,
1996).
The SON-R 2½-7 supplies information about performance on different levels (subtests, scale
scores and total scores), and in different ways (reference age, deviation scores, observations). In
the following section, the value of this information for the diagnostic process, and the risks that
exist when a single result on one test is interpreted as the level of intelligence, will be discussed.

Level of scores
The objective of the SON-R 2½-7 is to give an impression of the general intelligence level of the
child. Diverse subtests are used not to determine differences in performance among the subtests,
but rather because the influence of the specific characteristics of the subtests on the total score
decreases when the test is made up of several subtests. The accuracy of the IQ score is not
primarily judged by the reliability of the test, but by its generalizability. All the variance that is
specific to the subtests is considered irrelevant for the generalizability. The SON-IQ, with the
80% probability interval that is based on the generalizability should, in our opinion, be the basis
for the evaluation of the test results.
Subtest scores
The differences between the scores on the subtests have the lowest reliability and stability. The
retest research shows that differences between subtest scores are also unstable. When the differences between subtests are relatively large, the chance is greater that the order of the differences
is largely maintained. If one wants to interpret the differences between the subtest scores
further, one must therefore first determine whether the differences are relatively large. This can
be done by the computer program.
Although conclusions should not be drawn on the basis of the results at the subtest level,
evaluating the differences between subtest scores in relation to other information available
about the child, or to impressions gained during the administration of the test, may be
worthwhile. Such an evaluation may allow specific ideas to be developed about the child's strengths
and weaknesses, which can subsequently be examined further. The explorative use of the subtest
data can be of value when the intertest differences are sufficiently large.
Scale scores
The possibilities for using the scores on the Performance Scale and the Reasoning Scale are
greater than the possibilities of the subtest scores, but are still more limited than the possibilities
for the score on the SON-IQ. The scale scores are more reliable than the subtest scores, with a
mean reliability of .85 and a retest stability of .72. An important difference with respect to the subtest
scores is that the scale scores are based on several subtests. This means that generalizable
statements can be made on the basis of these scores. However, the correlation between the two
scale scores is rather high (.56), which means that the reliability of the difference between the
two scores is limited to .65. The stability of the difference score is even lower, i.e., .46. Before
interpreting differences between the two scores, one should certainly determine whether the
difference is significant. Both the norm tables and the computer program supply information on
this. General statements, for example about a possible difference between the development of
performance and reasoning ability, can be made only if the probability intervals of the two
scores do not overlap. This information is supplied when using the computer program.
The diagnostic possibilities of the two scale scores need to be studied in further detail. For
the time being, this information should be used exploratively.
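The limited reliability of the difference between the two scale scores follows from the figures given above. The sketch below assumes the classical formula for the reliability of a difference between two scores with equal variances; this formula is not stated in this section, and the function name is illustrative only. With the rounded values of .85 and .56 it yields about .66, close to the reported .65.

```python
def difference_reliability(mean_reliability: float, correlation: float) -> float:
    """Reliability of a difference score X - Y.

    Assumes X and Y have equal variances; mean_reliability is the mean of
    the two reliabilities and correlation is the correlation between X and Y.
    """
    return (mean_reliability - correlation) / (1.0 - correlation)


# Values quoted in the text for the Performance Scale and the Reasoning Scale.
r_diff = difference_reliability(mean_reliability=0.85, correlation=0.56)
print(round(r_diff, 2))
```

The higher the correlation between the two scales, the smaller the share of reliable variance that remains in their difference.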
IQ score
The SON-IQ, the scaled and standardized total score of the SON-R 2½-7, is the most usable,
generalizable, reliable, and stable result of the test. Combined with the 80% probability interval,
the SON-IQ gives a good indication of the level of intelligence of the child. The categories
shown in table 10.2 can be used to give a rough definition of the test result. The first column is
neutral and descriptive: it shows whether the child's performance on the test was high, low or
average. This classification is also used by the DTVP-2 and, with slightly different limits, by the
WPPSI-R. The two other classifications give a description of the level of intelligence of the
child in qualitative terms, related to the IQ score.

Table 10.2
Classification of IQ Scores and Intelligence Levels

IQ         Description           IQ         Description (1)      IQ         Description (2)
>130       Very high      2%     >130       Highly gifted        >129       Very superior
121-130    High           7%     121-130    Gifted               120-129    Superior
111-120    Above aver.   16%     111-120    Above average        110-119    High average
90-110     Average       50%     90-110     Average              90-109     Average
80-89      Below aver.   16%     80-89      Less gifted          80-89      Low average
70-79      Low            7%     60-79      Learning probl.      70-79      Borderline
<70        Very low       2%     <60        Learning disorder    <70        Mentally deficient

(1) classification by Struiksma and Geelhoed (1996)
(2) classification used with the Wechsler scales (Sattler, 1992)

The reference age


At the level of subtests, scale scores and total test results, the results on the test can also be
presented as a reference age. As for the standard scores, the reference age based on the total test
result is the most reliable, stable and generalizable age score, and can therefore best be used for
the evaluation of the results.
For children and adults older than 8 years, the reference age is the only standardized information available. For children up to 8 years old, the reference age can be used as additional
information for the SON-IQ.
Opinions differ as to the usefulness of reference ages (also called test ages or mental ages).
The reference age reflects an absolute level of performance, as is demonstrated by the fact that
the reference age can be estimated fairly accurately from the sum of the raw scores. Contrary to
the standard scores that represent relative levels of performance within an age group, the
calculation of the reference age does not depend on the age of the child. However, the interpretation of the reference age should be done in connection with the age at the time of the test
administration. An identical reference age of 4;3 years means that a child of 4;1 years has
performed as well on the test as a child of 5;2 years, but the psychological significance of the
test result is completely different for the two children.
An IQ score can be expected to be more or less stable during the course of a child's development. However, this is not the case for the reference age; as long as the child is developing, the
reference age will increase. As the development progresses much faster at a younger than at an
older age, the discrepancy between the reference age and the age at the time of administration of
the test will constantly increase for a specific child. For an IQ of 80, the discrepancy in months is
much smaller at the age of 4;0 years than at the age of 6;0 years. Furthermore, given a fixed age,
the discrepancy is much smaller with an IQ of 80 than with an IQ of 120. This means that reference ages, or discrepancies between reference age and age at the time of administration of the
test, are often difficult to compare and do not lend themselves very well to statistical analysis.
Another disadvantage of using the reference age is that, in contrast to the IQ score, no probability
interval is offered to indicate how (in)accurate the statement about the reference age is.
Despite these limitations, the reference ages can certainly be useful. The reference age
represents, in a very concrete way, how the child functions during the test, and this information
can be instructive when reporting the results, for instance, to the parents. Furthermore, the
reference age provides information about the level of tasks that the child comprehends, and this
can be used to show at which level learning materials or training can be given. One can,
naturally, not only depend on the reference age. When a 7-year-old child has a reference age of
3;5, this child is in a completely different situation, and has completely different learning
abilities, from a 3-year-old child with a reference age of 3;5 years. For that matter, a 3-year-old
with an IQ of 80 is of course not equal to a 7-year-old with an IQ of 80.
The reference age can best be described as "the child's performance on the test corresponds
to the mean performance of children of .. years old". This is better than saying "the child
functions at the level of a .. year-old". The latter formulation suggests, unjustly, that the
complete cognitive or mental level of the child is described by the test.

Evaluation by the examiner


During the research, children were evaluated on aspects of motivation, concentration, cooperation and comprehension of the directions, after the test had been administered. Especially in the
case of young children and children with developmental delays and/or disorders, problems
occurred regularly in these areas. These groups often performed less well, or even badly on the
test. Children who received negative evaluations but were able to complete the test were not
excluded from the description and analysis of the results. The information from such an evaluation
is important for the diagnosis. If one has the impression that the child was not well
motivated or could not concentrate, the question arises whether the test result provides a valid
indication of the child's intelligence. An important point here is whether problems occurred by chance during
this test administration, or whether they are characteristic of the child and occur in many
situations. The question may then arise whether treating the motivation and concentration
problems will, in the long run, lead to better test and learning performances.
When a child performs badly on a test and the evaluation of the various motivational and
concentration aspects is positive, one can have more confidence that the child has really shown
his or her ability level, and that the low score is not the result of the fact that he or she is difficult
to test.
The four evaluation categories used in the research and printed on the record form can be
taken as point of departure for the observation. Because the children are so actively involved
with the SON-R 2½-7, and because of the extensive interaction between the examiner and the
child, the test offers many opportunities for observation, and we expect users of the test to make
use of this.

Generalization of the test result


The SON-IQ shows how well the child has performed on the test. Based on the first description
in table 10.2, this performance can be classified as ranging from very high to very low.
Classification becomes more difficult when one wants to take the limitations of the test into
account, and to make a general statement about the intelligence level based on the performance.
Generalizing across subtests
The 80% probability interval that is always given with the IQ scores allows for two limitations:
namely, the unreliability of the test and the fact that part of the reliable variance is specific for
each subtest. The interval indicates where the IQ score is expected to lie if a large number of
comparable subtests were administered. This score would be almost completely reliable, and
the influence of specific characteristics of the subtests would then be negligible. This interval
has a width of about 18 points. Most descriptive categories of the IQ score in table 10.2 have a
width of 10 points. This means that the 80% interval of the IQ score embraces either two
categories, or one category and part of both of the adjacent categories.
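The relation between the width of the interval and the descriptive categories can be illustrated with a small sketch. It uses the first classification of table 10.2 and assumes, for simplicity, a symmetric interval of 9 points on either side of the IQ score; the intervals supplied by the norm tables and the computer program need not be symmetric, and the function names are illustrative only.

```python
# First (descriptive) classification of table 10.2:
# (lowest IQ score in the category, label), from high to low.
CATEGORIES = [
    (131, "Very high"),
    (121, "High"),
    (111, "Above aver."),
    (90, "Average"),
    (80, "Below aver."),
    (70, "Low"),
]


def classify(iq: int) -> str:
    """Return the descriptive category for a whole-number IQ score."""
    for lower_bound, label in CATEGORIES:
        if iq >= lower_bound:
            return label
    return "Very low"


def categories_spanned(iq: int, half_width: int = 9) -> list[str]:
    """Categories touched by the interval iq - half_width .. iq + half_width.

    The symmetric interval is an assumption made for this illustration.
    """
    labels: list[str] = []
    for score in range(iq - half_width, iq + half_width + 1):
        label = classify(score)
        if label not in labels:
            labels.append(label)
    return labels


print(categories_spanned(115))  # ['Average', 'Above aver.', 'High']
print(categories_spanned(85))   # ['Low', 'Below aver.', 'Average']
```

In both examples the interval covers one complete category and part of each adjacent category.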
Generalizing across time
The 80% interval of the IQ score takes no account of the stability of the test over a period of
several months. However, this is important for the evaluation of intelligence. The retest correlation
of the SON-R 2½-7 in a heterogeneous age group is as high as the generalizability coefficient.
Whether this is valid for all age groups is not known. However, it means that the 80%
probability interval can also be interpreted in another way, namely as the expected interval for
the hypothetical IQ score if we could administer these six subtests many times with an intervening period of several months. In this interpretation, allowance has been made, in the 80%
interval, for the unreliability as well as the instability, but no longer for the specific variance of the
subtests.
Generalizing across tests and time
The 80% interval of the SON-IQ, with which (approximately) two of the three limitations of a
single administration of the test can be taken into account in two different ways, is too wide,
in many practical situations, to support important decisions. If all three
aspects are taken into consideration (unreliability, instability and test-specific characteristics),
an assessment of the level of intelligence, based on the test result, can be made with even less
certainty. The real danger of drawing completely incorrect conclusions based on a single test
result for young children is demonstrated by the comparison of the scores on the SON-IQ with
the PIQ of the WPPSI-R (see section 9.8). In the case of four of the 230 children, a difference
of around 40 points occurred. In two cases the child had a low score on the SON-R 2½-7, and in
two cases on the WPPSI-R. If the evaluation is to be used to make important decisions, with
far-reaching consequences for the child and his or her surroundings, the administration of a single
intelligence test is unlikely to be sufficient. The risk that a distorted idea of the intelligence will
be formed, due to a combination of unreliability, fluctuations in the performance and specific
characteristics of the test, is too great.
Administration of several tests
Based on the research on the congruent validity of the SON-R 2½-7, the variance of the test has
been described as follows in section 10.2 (see figure 10.1):
- measurement error variance (10%)
- unstable variance (10%)
- test-specific variance (10%)
- valid generalizable variance (70%)
The proportion of valid generalizable variance is based on correlations of approximately .70
with other (nonverbal) intelligence tests. If we assume that there are other intelligence tests,
with similar variance compositions, and with correlations of .70 with each other and with the
SON-R 2½-7, then the composition of the variance of the mean score when two or three
different tests are administered can be calculated (see table 10.3). The assumption here is that
the interval between the test administrations is between several weeks and several months.
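The calculation can be sketched as follows. The sketch assumes that the valid variance is common to all tests, whereas the three undesired components are independent between tests, so that averaging divides each of them by the number of tests. This is one reading of the assumptions above, not necessarily the exact procedure used by the authors, and the names are illustrative only. It reproduces the percentages of table 10.3 for one, two and three tests to within rounding.

```python
from fractions import Fraction

# Variance shares of a single test, in percent (section 10.2).
SINGLE_TEST = {
    "measurement error": 10,
    "unstable": 10,
    "test-specific": 10,
    "valid": 70,
}


def composition_of_mean(n_tests: int) -> dict[str, Fraction]:
    """Variance shares of the mean score of n_tests tests.

    Assumption: valid variance is shared by all tests; the other components
    are independent between tests, so each is divided by n_tests.
    """
    parts = {
        name: Fraction(share) if name == "valid" else Fraction(share, n_tests)
        for name, share in SINGLE_TEST.items()
    }
    total = sum(parts.values())
    return {name: part / total for name, part in parts.items()}


for n in (1, 2, 3):
    shares = composition_of_mean(n)
    print(n, {name: f"{float(share):.1%}" for name, share in shares.items()})
```

Exact fractions are used so that the shares always add up to one and no rounding error enters before the final formatting.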
When two tests are administered, the share of the undesired sources of variance is reduced by
40%. The proportion of valid variance increases from 70% to 82%. When three tests are
administered, the share of the undesired sources of variance is reduced by 60%; the proportion of valid
variance now becomes 88%. For young children, the share of undesired variance is larger than
for the older children. Therefore, in the last part of table 10.3, an estimate has been made of the
components of the variance, when three tests are administered to children from 2 to 4 years of
age, and when two tests are administered to children from 5 to 7 years of age. This leads to an
estimate of 85% valid variance for both groups. The reliability of the mean score is .95 and the
stability is .90.

Table 10.3
Composition of the Variance When Several Tests Are Administered

                            SON-R 2½-7    Average of    Average of     2-4 years: 3 tests
                                          two tests     three tests    5-7 years: 2 tests
Variance of meas. error        10%            6%            4%                5%
Unstable variance              10%            6%            4%                5%
Specific variance              10%            6%            4%                5%
Valid variance                 70%           82%           88%               85%

When the scores on two or three tests are averaged, the dispersion becomes narrower. Table
10.4 shows how to correct the mean score for this. This correction is based on a standard
deviation of the mean score of 13.6.

Table 10.4
Correction of Mean IQ Score Based on Administration of Two or Three Tests

Mean   IQ     Mean   IQ     Mean   IQ     Mean   IQ     Mean   IQ     Mean   IQ     Mean   IQ
 50    45      65    61      80    78      95    94     110   111     125   128     140   144
 55    50      70    67      85    83     100   100     115   117     130   133     145   150
 60    56      75    72      90    89     105   106     120   122     135   139     150   155

Both the mean IQ score (Mean) and the newly standardized IQ score (IQ) are presented.
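The correction in table 10.4 can be approximated with a linear rescaling. The sketch below assumes an IQ scale with a mean of 100 and a standard deviation of 15 (not restated in this section) and rescales the mean score from the stated standard deviation of 13.6; the function name is illustrative only. Under these assumptions the rounded results agree with the entries of table 10.4.

```python
POPULATION_MEAN = 100
TARGET_SD = 15      # assumed standard deviation of the IQ scale
SD_OF_MEAN = 13.6   # standard deviation of the mean score, as stated in the text


def corrected_iq(mean_iq: float) -> int:
    """Re-standardize the mean of two or three IQ scores (linear rescaling, assumed)."""
    deviation = (mean_iq - POPULATION_MEAN) * TARGET_SD / SD_OF_MEAN
    return round(POPULATION_MEAN + deviation)


for mean_iq in (50, 70, 100, 130, 150):
    print(mean_iq, corrected_iq(mean_iq))
```

The correction leaves a mean score of 100 unchanged and moves other scores further from 100, by up to about 5 points at the extremes of the table.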
To calculate the mean of the scores, the norms of the different tests must be comparable. For
a number of reasons, including obsolescence of the test norms, this is often not the case. This
problem was discussed in section 9.10. In table 10.5 the expected obsolescence of the norms of
the SON-R 2½-7, based on an estimate of obsolescence of one IQ point per 3 years, is presented.
This means that, in theory, 3 years after the standardization 1 IQ point should be subtracted from
the IQ score. The same holds true for other intelligence tests.
Table 10.5
Obsolescence of the Norms of the SON-IQ

Year of administration      Year of administration      Year of administration
1996 - 1998: 1 point        2002 - 2004: 3 points       2008 - 2010: 5 points
1999 - 2001: 2 points       2005 - 2007: 4 points       2011 - 2013: 6 points

The obsolescence has been calculated from 1993/94, the year in which the standardization was carried
out. An obsolescence rate of 1 IQ point per three years was used.
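The entries of table 10.5 follow a simple rule, sketched below. The function names are illustrative only, and the sketch deliberately refuses years outside the range printed in the table rather than extrapolating.

```python
FIRST_YEAR, LAST_YEAR = 1996, 2013  # range covered by table 10.5


def obsolescence_points(year_of_administration: int) -> int:
    """IQ points to subtract according to table 10.5 (1 point per 3-year period)."""
    if not FIRST_YEAR <= year_of_administration <= LAST_YEAR:
        raise ValueError("year outside the range of table 10.5")
    return (year_of_administration - FIRST_YEAR) // 3 + 1


def adjusted_iq(iq: int, year_of_administration: int) -> int:
    """IQ score after subtracting the expected obsolescence of the norms."""
    return iq - obsolescence_points(year_of_administration)


print(obsolescence_points(2001))  # 2
print(adjusted_iq(100, 2005))     # 96
```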

An improved estimate of the level of intelligence can also be gained by administering the
test again. This also leads to higher reliability and improved stability. However, a retest
within a short period brings with it the problem of learning effects. Further, the specific
characteristics of the test will still influence the mean score. Administering a different test in
combination with the SON-R 2½-7 is a much more attractive alternative, because this
reduces the influence of various undesired sources of variance at the same time. Naturally
the alternative test must be suitable for the target group. For various groups of children with
whom the SON is used, only nonverbal tests, or the performance section of more general
tests will be considered as alternatives. Diversity in materials and method of testing is, as
far as possible, desirable. Furthermore, having different examiners administer the tests is
recommended.
Because the SON-R 2½-7 differs so clearly from many other tests in the method of administration and the materials used, it is very suitable to be administered as an extra test for children
to whom a (partially) verbal intelligence test can also be administered.
An IQ score based on two test administrations, and for young children preferably on three
test administrations, can be interpreted with much more confidence as the level of intelligence.
Such scores can be evaluated qualitatively according to the descriptions presented in the second
and third parts of table 10.2.

Other information
The question whether another test besides the SON-R 2½-7 should be administered depends
primarily on the consequences of an incorrect evaluation of the child. If these are not very
serious, and if the evaluation can easily be revised, a relatively large margin of uncertainty is
acceptable. The risk of an incorrect evaluation will also decrease if the result of the test can be
interpreted in combination with information from the parents, teachers and others concerned
with the child. The observations of the examiner may also give an indication of the desirability
of administering an extra test.
Manner of administration
A condition for the validity of the test result is that the test is administered in the correct manner
and according to the directions. Experience in administering tests is very important in this
respect, as is experience in interacting with young children and, if relevant, with children with
specific problems or handicaps. The administration of the test does not necessarily have to be
done by a psychologist. However, the interpretation and recording of the results remain the
domain of qualified experts.
An important aspect of the SON-R 2½-7 is its friendly approach to children and the interaction between examiner and child. This makes completing the test enjoyable, for both child and
examiner. However, this also means that the examiner is closely involved in the administration
and hence that the risks of examiner effects are greater. With one exception, systematic examiner effects were restricted to a few IQ points during our research. To reduce the risk of such
effects, it is advisable, in addition to closely following the directions, to be present on some
occasions when someone else administers the test and to allow someone else to be present when
administering the test oneself. If possible, comparing the results of different examiners now and
then is also useful.

10.5 CONCLUSIONS
The research has shown that the SON-R 2½-7 is a valid, reliable intelligence test that can be
used to good effect with children with problems and handicaps in language development and
communication, with children with a foreign language or bilingual background, with children
with a developmental delay and developmental disorders, and with mentally deficient children
and adults. When more traditional, verbal intelligence tests are used with these groups, the
evaluation of intelligence can be distorted by the language skills of the child. The test results of
the deaf children who were not multiply handicapped, and whose performance was almost equal
to that of the children in the norm group, demonstrate the importance of a nonverbal test
administration. The same holds true for the performance of the immigrant children. This was
much better than the performance on the traditional tests, and was comparable to the results of
the children in the norm group with a similar SES level.
Young children, in particular young children with a problem or handicap, are still frequently
difficult to test. In addition to the nonverbal character of the test, which allows, but does not
require, the child to speak, a child-oriented test situation is established by the help given by the
examiner, the attractiveness of the materials and the manner in which the child is actively
involved. Comparisons with two other tests showed that the children were more motivated and
concentrated with the SON-R 2½-7 and that, according to the examiners, they understood the
directions better.
The interaction between examiner and child offers extra opportunities for observation. However, the manner of administering the test does require the examiner to be thoroughly prepared
and to follow the directions.
The scores on the test correlated strongly with various evaluations of intelligence. The
performance of children with a developmental delay (in the Netherlands and in Australia) and
learning problems (in Great Britain) was low, as was expected. The correlations with other
nonverbal intelligence tests were reasonable. However, due to differences in content between
the tests and to fluctuations in the performances, the correlations were lower than would be
considered possible on the basis of the reliability of the test. The score on the test gives an
indication of the intelligence of the child; the score is not the level of intelligence. When
decisions with far-reaching consequences have to be made, the diagnosis should be based on the
administration of two or three intelligence tests.
The IQ score, for which the reference age can also be determined, is of prime importance for
the interpretation of the test results. The distinction between a Performance Scale and a
Reasoning Scale was supported by the Principal Components Analysis and by the patterns of
correlations with other tests. This is important because it, in turn, supports the multifaceted nature of
the concept of intelligence as it is measured with the SON-R 2½-7; however, the reliability of
the difference between the two scale scores is relatively low and of less practical importance.
The norms for the test scores are based on the exact age of the child, so avoiding systematic
distortions in the presentation of the results, and probability intervals are presented, allowing
the user to take the uncertainty about the results into account.
The difficulties which arise when testing young children, and the great diversity of problems
and handicaps of young children for which psychological assessments are requested, make it
extremely important that a number of well-constructed, standardized and validated intelligence
tests are available. The SON-R 2½-7 complies with these criteria.

11 GENERAL DIRECTIONS

In this chapter the general characteristics of the procedure for the test administration and
scoring are presented. In chapter 12 the directions for each separate subtest will be described.

11.1 PREPARATION
Before the test is administered for the first time, the examiner should become familiar with the
materials, the directions and scoring of the items. We strongly advise trying out the test a
number of times before using it. In our experience, administration of the test is not difficult. In
order to administer the test correctly the examiner must have a good command of the directions
so that he or she does not need to consult the manual during the administration. Learning to
administer the test is facilitated by observing a test administration or watching a video recording
of it.
If attention is not continually focused on the child, he or she can easily be distracted and
lose interest in the test. Specific characteristics of the administration of each subtest are
described on the record form so that these are always immediately available during the administration of the test.
A valid test administration of young children, certainly when they have problems and handicaps, requires a high level of expertise from the examiner. Experience in testing children is
essential. When a child has specific problems or handicaps, experience in interacting with these
children is desirable in order to be able to communicate easily with the child, and to deal with
any problems that may arise. Administration of the test is not restricted to psychologists
and (ortho)educationalists; experience in testing of, and interaction with young children is of
paramount importance. However, interpreting and reporting on the test results remains the
prerogative of experts.
The directions should be followed as closely as possible. Deviating from the directions may
influence the test results. In general, sufficient latitude is allowed in the directions for adapting
to the comprehension and skills of the individual child. Because of specific problems of a child,
e.g., motor handicaps, adapting the administration of the test may be necessary. This will be
discussed in section 11.6.

Set-up
The examiner sits at a table opposite the child. The table should not be too broad. Otherwise,
the examiner cannot easily help the child. The height of the table and chair should be
adjusted to the child. The child should be able to sit comfortably, to easily see what is on the
table and what the examiner does. Preferably, the examiner should sit so that the light falls
on his or her face.
Only the material the child needs at that moment should be on the table. The child works
on a large anthracite-colored mat. The mat stops the material sliding around, makes it easier
for the child to pick up items, and supplies a uniform background. The record form and the
material needed by the examiner are placed on another table, preferably outside the reach of
the child.

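[Figure: set-up of the table, showing the positions of the child, the test materials (bottom
edge towards the child, top edge towards the examiner, with left and right marked), the
examiner, the record form and the storage box.]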

When describing the directions, the left-right perspective is correct from the examiner's
position. The top-bottom perspective is correct when seen from the child's perspective. We have
chosen this top-bottom perspective because referring to the top of the test booklet as the bottom,
when it is lying the other way round according to the examiner's perspective, may be confusing.
The test booklets are presented so that the title page is facing the child. The page numbers
and numbers on the cards are always legible from the examiner's perspective. When studying
the directions, this test situation should be taken into account.
The examiner should always be sure that the child's view of the material is not blocked while
presenting materials, giving directions, or correcting answers. The examiner should consider
right or left handedness when placing materials on the table or giving them to the child.

Introduction
Before starting the test, the examiner should allow time for the child to get used to the setting.
The child should not have the impression that he or she has to achieve, but that he or she will be
playing with different materials.
The time needed to administer the SON-R 2½-7 varies from three to five quarters of an hour
(45 to 75 minutes). The entire test should, preferably, be administered in one sitting. The examiner can
allow a short break between subtests now and then, so that the child can have a drink or go to the
bathroom.

11.2 DIRECTIONS AND FEEDBACK


Verbal and nonverbal directions
The SON-R 2½-7 can be administered with and without the use of spoken language. The verbal
and nonverbal directions are always printed in columns next to each other. The sentences printed
in small capitals (CAPITALS) in the left-hand column represent the spoken text. The italicized
text (italic) in the right-hand column represents the nonverbal directions.
We have tried to make both types of directions equivalent. Therefore it is important that
no extra verbal information be given. When using verbal directions one should limit oneself
to the text in capitals and give no further explanation. Naming the pictures or the shape and
color of the blocks used in Analogies for example, is not allowed. When using nonverbal
directions, one should be careful not to add any extra information in gestures or facial
expressions. The directions may be repeated when the child does not understand what he or
she is expected to do.
The nonverbal directions are used for children who have problems understanding spoken
language. Most generally, a combination of the two is used in which the nonverbal directions are
accompanied by parts of the verbal directions depending on the capabilities of the child to
understand verbal directions.

When using nonverbal directions, the gesture for "together" is often used. This is done as
follows: move both hands together (slowly) as if to catch a large ball.

Help and feedback


After each item the examiner tells the child whether a solution is correct or incorrect. When a
child has made a mistake or when he or she is not able to complete the item, the examiner helps
and corrects the solution, while trying to actively involve the child. The purpose of feedback is
not to show the child what he or she cannot do, but to show him or her how to do the item
correctly. However, the item is only scored as being correct if the child has completed it
independently.
In principle, the examiner corrects an item made incorrectly while involving the child. The
mistake does not have to be corrected when the subtest is discontinued on the basis of the rules
for discontinuation.
The manner in which feedback should be given is described at the end of each part in the
subtest directions. Broadly, the feedback consists of the following reactions (verbal directions
on the left, nonverbal directions on the right):

Following a correct solution:
YES, THAT'S GOOD, or YES, THAT'S RIGHT,         Nod affirmatively: "yes", or use a
or GOOD, or use a similar phrase.               similar gesture.

When a child has made a mistake:
NO, IT'S NOT QUITE RIGHT                        Make a questioning gesture.
                                                Shake head: "no".

The examiner points to the picture, block, puzzle, or card. The examiner corrects the mistake,
when possible with the child.
LOOK, IF WE DO IT LIKE THIS, IT'S BETTER.       Point to the correction and nod
                                                affirmatively: "yes".

The examiner tries to involve the child in actively correcting the mistakes by letting him or her
perform the last activity. The examiner does not explain why the child's answer was wrong.

When the child does not react despite encouragement:
The examiner completes the item while trying to actively involve the child in the solution.

Extended and short directions


In the first part of the subtests no separate examples are shown, but an example is included in the
administration of the item. That is why extended directions are given for the first items in each
subtest. Once the purpose is clear, the examiner can switch to short directions and gradually
shorten them to:
NOW THIS ONE, or NEXT ONE                       Nod encouragingly.

When a child does not comprehend the directions, these may be repeated.
The second part of each subtest, with the exception of Patterns, is preceded by an example. This
example is always completed when the child reaches the items of the second part.
Every time the child has given an answer to an item, one must ascertain whether the child has
finished.
ARE YOU READY? or THAT'S IT? or READY?          Make a questioning gesture.

The child may immediately correct his or her answer him/herself. In such a case the examiner
should ask what the final answer is. Make sure that the child does not consider the question,
whether he or she is ready or whether the final answer has been given, to express doubt about the
correctness of the answer. Varying the questions might be advisable (for example: "show me the
correct picture again", or "which picture matches this best").
Wait with feedback until the answer is complete (this is very important when more than one
choice must be made).
Sometimes a child comments on slight differences in color between the testing material and the
pictures in the test booklets, or about the space remaining in the frame of Mosaics. Reassure
the child and tell him or her that it does not matter.

11.3 SCORING THE ITEMS


All items completed by the child are scored as being correct (1) or incorrect (0). An item is only
correct if it has been completed by the child independently and correctly. A time limit is used
in the second part of some of the subtests. When this is the case, items must be completed within
the time limit in order to be scored as being correct. For older children, the items at the
beginning of the subtest are skipped on the basis of the entry procedure (see section 11.4)
and are marked "+". These items count as correct in the total score of the subtest. When
a child refuses to do an item, this is recorded as a refusal and scored as being incorrect.

Time limits
In part II of the performance subtests (Mosaics, Puzzles and Patterns) a maximum amount of
time is allowed per item. The examiner uses a stopwatch for these items. The time limit is 2½
minutes. Experience has shown that items are hardly ever completed correctly after this amount
of time has passed. The examiner may stop earlier when the child clearly cannot finish the item
successfully. When the child is almost finished after 2½ minutes, the examiner allows the child
to finish the item.
The following situations can arise:
- When the child is finished before the time is up, the examiner scores the item as being either
  correct (1) or incorrect (0).
- When it is clear before the time is up that the child will not succeed, the examiner can offer
  help. The item is then scored as being incorrect (0).
- When the child is not finished and the time limit has been reached, the examiner can help. The
  item is scored as being incorrect (0).
- When the time limit has been reached and the child can finish the item independently in a
  short time, the child is allowed to do so. The item is scored as being incorrect (0).
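
The four situations above come down to one rule: a timed item earns a 1 only when the child finishes it correctly, without help, before the limit expires. A minimal Python sketch of that rule follows. The function name `score_timed_item` and its parameters are illustrative assumptions and not part of the test materials, and treating a finish at exactly the limit as "within" the limit is also an assumption.

```python
TIME_LIMIT_SECONDS = 150  # 2.5 minutes per item, the limit given in the text


def score_timed_item(solved_correctly: bool, helped: bool, seconds_used: float) -> int:
    """Return 1 (correct) or 0 (incorrect) for one timed part II item."""
    within_limit = seconds_used <= TIME_LIMIT_SECONDS
    # Help from the examiner or an overrun of the limit always leads to a 0,
    # even when the child goes on to finish the item.
    if solved_correctly and not helped and within_limit:
        return 1
    return 0


print(score_timed_item(True, False, 95))    # finished correctly in time
print(score_timed_item(True, False, 170))   # finished correctly, but too late
```

The sketch only records the outcome. Whether to stop early or to let the child finish remains the examiner's decision.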

Refusal
When the child does not wish to continue halfway through an item, the examiner encourages the
child to go on. When this has no effect, the examiner offers help. The item is then scored as
being incorrect (0).
When the child refuses to do an item in advance, or even to begin with a subtest, and
encouragement does not help, the examiner completes the item and tries to involve the child.
The item is then scored as being a refusal (-). The child is then encouraged to complete the next
item. The administration of the subtest is discontinued when two consecutive items have
been refused. The subtest can then not be used for the evaluation of the test performances or for
the calculation of the IQ score.
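
The check for two consecutive refusals can be sketched as below. This is an illustration only: the name `refused_out` is hypothetical, and the marks are assumed to be listed in the order in which the items were administered, with '-' standing for a refusal.

```python
from typing import List

REFUSED = "-"  # mark assumed here for a refused item


def refused_out(marks_in_order: List[str]) -> bool:
    """True when two consecutive administered items were refused."""
    pairs = zip(marks_in_order, marks_in_order[1:])
    return any(a == REFUSED and b == REFUSED for a, b in pairs)


# Two refusals in a row: the subtest is discontinued and left out of the IQ calculation.
print(refused_out(["1", "-", "-"]))
# Refusals separated by an attempted item do not trigger the rule.
print(refused_out(["-", "1", "-"]))
```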
When the child does not want to continue, a break may be called for, and in this extreme
situation changing the sequence of administration of the subtests may be considered. If, for
instance, the child does not want to continue doing Analogies, Patterns can be administered first,
followed by Situations. Patterns, during which the child draws, is more attractive to some
children than Situations, during which the child must make choices.

11.4 THE ADAPTIVE PROCEDURE


During the administration of the SON-R 2½-7 an adaptive procedure is used that aims at
limiting the administration to the items that correspond to the level of the child. Under the
entry procedure, which is based on age and school level, items that the child would in all
likelihood have solved correctly are not presented. Children appear to become demotivated and
uninterested when they have to complete too many items below their level. The discontinuation
rule spares the child from having to attempt too many items above his or her level. Items that
are too difficult for the child are frustrating and may easily lead to a refusal to continue, or to
behavior indicating that the child no longer cares what he or she does and has stopped paying
attention. Besides these aspects of motivation, the adaptive procedure aims at limiting the
duration of the test.

Entry procedure
The first item of a subtest to be completed depends on the age and level of the child. Based on
age and class in primary education, the following rule holds:
Entry-item 1: children of 2 or 3 years who have no school experience.
Entry-item 3: children of 4 and 5 years who are in their first or second year of school.
Entry-item 5: children of 6 years or older who are in their third or higher year of school.

When a discrepancy exists between the age of the child and the level in primary education, the
entry level corresponding to the lower level is chosen. A six-year-old who is still in his second
year of school will begin with entry-item 3. Children of 2 and 3 years always begin with
entry-item 1.
When a child is suspected of having a substantial cognitive developmental delay, the entry
level can be adapted. When the examiner has the impression that a five-year-old functions
at the level of a three-year-old child, the examiner will begin with entry-item 1.
However, when a child is suspected of having only a slight developmental delay (roughly
corresponding to an IQ of 85 to 100), beginning at a lower level than is suggested on the basis of
age and level at school is neither necessary nor desirable. When the child has a fear of failure or
is difficult to test for another reason, beginning at a lower level may be wise.
In the subtest directions, the administration procedure is always described starting with item 1.
At the end of the description of part I of each subtest, changes in the directions due to beginning
with entry-item 3 or 5 are described.
The skipped items are marked '+' on the record form. In the calculation of the subtest score
these items are counted as being correct.
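
The age and school-year rule for the entry item can be sketched as follows. This is a hedged illustration, not part of the test materials: the function name `entry_item` and the coding of `school_year` (0 for no school experience, 1 for the first year, and so on) are assumptions, as is the reading that a child of 4 or older without school experience falls under the lower-level rule and starts at item 1. The examiner's adjustments for a suspected developmental delay or fear of failure are not modeled.

```python
def entry_item(age_years: int, school_year: int) -> int:
    """Return the entry item (1, 3 or 5) suggested by age and school year."""
    # Level suggested by age alone.
    if age_years <= 3:
        by_age = 1
    elif age_years <= 5:
        by_age = 3
    else:
        by_age = 5

    # Level suggested by school experience alone (0 = not yet at school).
    if school_year <= 0:
        by_school = 1
    elif school_year <= 2:
        by_school = 3
    else:
        by_school = 5

    # On a discrepancy, the lower of the two levels is chosen.
    return min(by_age, by_school)


# The example from the text: a six-year-old still in the second year of school.
print(entry_item(6, 2))
```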

Rules for discontinuation


The following discontinuation rule holds for all subtests:
A) A SUBTEST IS DISCONTINUED WHEN A TOTAL OF THREE INCORRECT ANSWERS HAS BEEN GIVEN.
PAY ATTENTION: To reach the criterion of three incorrect answers it is not necessary that
the mistakes be consecutive.
In addition to A the following discontinuation rule applies to part II of the performance subtests
(Mosaics, Puzzles and Patterns):
B) ADMINISTRATION OF THE SUBTESTS MOSAICS, PUZZLES AND PATTERNS IS ALSO DISCONTINUED
WHEN TWO CONSECUTIVE ITEMS HAVE BEEN SCORED AS BEING INCORRECT IN PART II OF THESE SUBTESTS.
PAY ATTENTION: The time limit is also in effect for these items.

PAY ATTENTION: The number of mistakes includes the items completed incorrectly (score
0) as well as the items that were refused (score -).
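
Rules A and B can be sketched together as one check that is run after each item. This is only an illustration under stated assumptions: the function name `should_discontinue` is hypothetical, `part2_start` (the number of the first item of part II) is a hypothetical parameter because the item numbers are not given in this section, and a refusal is assumed to count as a mistake under rule B as well as under rule A. Going back to earlier items is not modeled.

```python
from typing import List, Optional

MISTAKES = {"0", "-"}  # incorrect and refused items both count as mistakes


def should_discontinue(marks: List[str], part2_start: Optional[int] = None) -> bool:
    """marks: record-form marks in item order, from item 1 to the last item given.

    part2_start: number of the first part II item for Mosaics, Puzzles and
    Patterns; leave as None for the other subtests, where only rule A applies.
    """
    # Rule A: three mistakes in total, consecutive or not.
    if sum(m in MISTAKES for m in marks) >= 3:
        return True
    # Rule B: two consecutive mistakes within part II.
    if part2_start is not None:
        part2 = marks[part2_start - 1:]
        return any(a in MISTAKES and b in MISTAKES for a, b in zip(part2, part2[1:]))
    return False


# Skipped items ('+') never count as mistakes.
print(should_discontinue(["+", "+", "1", "0", "1", "0", "1", "0"]))
```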
Examples for discontinuation rule A:
The subtest is discontinued when a total of three items have been scored as being incorrect. The
entry procedure does not affect the discontinuation rule.
The meaning of the scores is:
+  Item skipped   (based on the entry procedure; scored as being correct for the total score),
1  Item correct   (completed entirely, independently and within the time limit when in effect),
0  Item incorrect (completed incorrectly, not independently, incompletely or not within the
                  time limit),
-  Item refused   (scored as being incorrect for the total score).

[Record form: worked examples of discontinuation rule A for Mosaics, Categories, Puzzles,
Analogies, Situations and Patterns. The item marks did not survive in this copy.]


Examples for discontinuation rule B:


When two consecutive items of part II in the subtests Mosaics, Puzzles and Patterns have
been scored as being incorrect the subtest is also discontinued.

[Record form: worked examples of discontinuation rule B for Mosaics, Puzzles and Patterns.
The item marks did not survive in this copy.]

A special situation: starting and going back to previous items


The criteria for the entry-item and the construction of the various subtests are such that going
back to previous items should happen only occasionally. When this does occur, the following
rules, which are described separately for entry-item 3 and entry-item 5, apply.
Entry-item 3
A child is 4 or 5 years old and starts the subtest with item 3. When either item 3 or item 4 is
solved incorrectly, one goes straight back to item 1 and presents both items 1 and 2. When the
criterion for discontinuation has not been reached, one continues with the more difficult items
until the criterion has been reached.
When one has started the subtest with item 3 and has had to go back to item 1, the following
subtests are always started with item 1, and the entry procedure is abandoned.
A number of examples: the sequence in which the items have been presented is printed under the
item scores.

[Record form: examples for Puzzles and Analogies with entry-item 3. The item marks and the
sequence of presentation did not survive in this copy.]

Item 3, the first item to be completed has been solved incorrectly. One goes straight
back to item 1.

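[Record form: example for Situations with entry-item 3, subtest score 0. The item marks and the
sequence of presentation did not survive in this copy.]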


Entry-item 5
Children of 6 years and older start with item 5. When either item 5 or item 6 is scored as being
incorrect, one goes straight back to item 3 and does item 3 as well as item 4.
When items 3 and 4 are both scored as being correct, one goes on to the more difficult items
until the discontinuation criterion has been reached.
When either item 3 or 4, or both items 3 and 4 are scored as being incorrect, one goes back to
item 1 and does item 1 as well as item 2. When the discontinuation criterion has been met as
calculated from item 1 on, the score is calculated on the basis of the item at which the discontinuation criterion has been reached. When the discontinuation criterion has not yet been met, one
goes on to the more difficult items.
When the situation occurs that one has to go back in a subtest, the subsequent subtests are
started at the lowest entry level reached, i.e., at entry-item 3 or at entry-item 1.
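
The going-back rules for entry-items 3 and 5 can be sketched as one decision function. This is an illustration under assumptions, not the authors' procedure in code: the names `items_to_go_back_to` and `entry_for_following_subtests` are hypothetical, `marks` holds only the items actually administered (item number mapped to '1', '0' or '-'), and a refusal is assumed to count as an incorrect item for this purpose.

```python
from typing import Dict, List

MISTAKES = {"0", "-"}


def items_to_go_back_to(entry_item: int, marks: Dict[int, str]) -> List[int]:
    """Return the two earlier items to present next, or [] when no return is needed."""
    if not marks:
        return []
    lowest = min(marks)
    if lowest == 1:
        return []  # already at the start of the subtest
    pair = [lowest, lowest + 1]
    # A pair reached by going back is presented in full before deciding again.
    if lowest < entry_item and not all(i in marks for i in pair):
        return []
    if any(marks.get(i) in MISTAKES for i in pair):
        return [lowest - 2, lowest - 1]
    return []


def entry_for_following_subtests(marks: Dict[int, str]) -> int:
    """Later subtests start at the lowest entry level reached in this one."""
    return min(marks)


# Entry-item 5: item 5 correct, item 6 incorrect, so items 3 and 4 come next.
print(items_to_go_back_to(5, {5: "1", 6: "0"}))
```

After the returned items have been administered, the discontinuation criterion is evaluated from item 1 on, as described above.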

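[Record form: example for Mosaics with entry-item 5, subtest score 6. The item marks and the
sequence of presentation did not survive in this copy.]

Item 6 has been completed incorrectly, so items 3 and 4 are administered. Then one
continues with item 7 until a total of three items have been completed incorrectly.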

[Record form: example for Categories with entry-item 5, subtest score 5. The item marks and the
sequence of presentation did not survive in this copy.]

Item 5 has been completed incorrectly, so items 3 and 4 are administered. Because item
3 is completed incorrectly, items 1 and 2 are administered. The total number of mistakes
is still less than three, so one continues with item 6 until three mistakes have been made.

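[Record form: example for Puzzles with entry-item 5, subtest score 1. The item marks and the
sequence of presentation did not survive in this copy.]

After completing item 5 and subsequently items 3 and 4, three mistakes have been made.
Nevertheless, one still goes back to item 1.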

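[Record form: example for Analogies with entry-item 5. The item marks, the sequence of
presentation and the subtest score did not survive in this copy.]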

Pay attention: The discontinuation criterion was reached at item 4 of Analogies. Item
5, which was completed first, is no longer counted for the score.
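
The note above follows from counting mistakes in item-number order, starting at item 1, once one has gone back. A minimal sketch of that calculation follows. The function name `score_after_going_back` and the marks in the usage example are hypothetical (they are not the lost record form), and rule B for part II of the performance subtests is not modeled.

```python
from typing import Dict

MISTAKES = {"0", "-"}


def score_after_going_back(marks: Dict[int, str]) -> int:
    """Score a subtest from item 1 on, stopping at the item with the third mistake."""
    mistakes = 0
    correct = 0
    for item in sorted(marks):
        if marks[item] in MISTAKES:
            mistakes += 1
        else:
            correct += 1  # '1' and '+' both count as correct
        if mistakes == 3:
            break  # items beyond this point no longer count
    return correct


# Hypothetical case: items 5 and 6 were given first, then 3 and 4, then 1 and 2.
# The third mistake in item order falls on item 4, so item 5 is not counted.
marks = {1: "1", 2: "0", 3: "0", 4: "0", 5: "1", 6: "0"}
print(score_after_going_back(marks))
```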

11.5 THE SUBTEST SCORE


The score on the subtest equals the number of items completed correctly (1), plus the number of
items that were skipped at the beginning (+). However, the subtest score is easier to calculate by
taking the number of the last item that was administered and deducting the number of mistakes
(0) and refusals (-). The scores of a six-year-old child on the different subtests are shown below.
Various aspects of the adaptive procedure are also illustrated on this record form. If this is not
yet entirely clear, sections 11.3 and 11.4 should be studied again. The scores that are calculated
here are the raw subtest scores. Using the norm tables or the computer program, they may be
transformed into the scaled standard scores.
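
The two ways of calculating the raw subtest score give the same result, as the following sketch shows. The function names are hypothetical and the marks in the example are made up for illustration. The list is assumed to hold the record-form marks in item order, from item 1 up to the last item administered, with no gaps.

```python
from typing import List


def subtest_score(marks: List[str]) -> int:
    """Number of correct items (1) plus number of skipped items (+)."""
    return sum(m in ("1", "+") for m in marks)


def subtest_score_shortcut(marks: List[str]) -> int:
    """Number of the last administered item minus mistakes (0) and refusals (-)."""
    last_item = len(marks)
    return last_item - sum(m in ("0", "-") for m in marks)


# Made-up record: entry at item 5, three mistakes, last item administered is 11.
marks = ["+", "+", "+", "+", "1", "1", "0", "1", "0", "1", "0"]
print(subtest_score(marks), subtest_score_shortcut(marks))
```

The shortcut works because every item up to the last one administered carries exactly one of the four marks.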

[Record form of a six-year-old child, showing the item marks and raw scores for Mosaics,
Categories, Puzzles, Analogies, Situations and Patterns. The item marks did not survive in
this copy.]

The subtest Categories has not been scored in this example because the child refused to complete two consecutive items. This subtest is not used when calculating the IQ score.

11.6 ADAPTING THE DIRECTIONS


The directions of the SON-R 2½-7 are very flexible, which makes it possible to adapt the test to
the communicative skills and the age or cognitive level of the child. However, within this
broad framework, standardization of the method of administration is required and deviation is
undesirable. Undoubtedly other methods of administration are possible and may lead to a better
performance by the child. However, the standardization research was not carried out that way,
and a test result obtained in different circumstances cannot be interpreted using the norm tables.
Even so, special problems or handicaps may occur, which would make rigorous application of
the directions undesirable because the result of the test would then not be indicative of the
child's cognitive skills. This may be the case if a child has a motor handicap. In all sections of
the test the child is expected to actively do something, and when he or she is not physically able
to do so, the test result will not be valid.
Small-scale research has been conducted on the usefulness of the test for children with a motor
handicap (Van de Beek, 1995). This research has demonstrated that Patterns cannot be
administered correctly. Picking up and handling the other materials can also be problematic.
This can be obviated, for example, by always offering the cards one by one during the subtests
Situations and Categories, by putting the blocks on the table instead of having the subject take
them out of the box when doing Mosaics, by adding an extra non-slip layer under the mat, or by
allowing the child to give the examiner directions (this does assume good verbal skills).
Problems in handling the materials also make adapting the time limits desirable, and possibly
administering the test over a period of a few days. However, our experience is limited and the
diversity of motor handicaps is so large that giving set rules for administering the test to these
children is difficult. The examiner will have to discover what the limiting factors are and
whether, and in which manner, these can be compensated for. When one works mainly with motor
handicapped children, administering the test a few times to children who are not handicapped is
advisable. This way one can get a clear idea of problems that occur during the administration
that are specific to the handicap.
Adapting the manner of administration of the test can also be desirable when children are
very fidgety or find it hard to focus on the test (for example autistic children). In such a case,
sitting at a corner of the table may be preferable to sitting opposite the child, as one can then
draw the child's attention by touching him or her.
When one deviates from the standard directions during the administration of the test, this
should be mentioned on the record form so that others can take this into account when
interpreting the results.

REFERENCES

Akker, J. van den & Boecop, A. van (1976). Test voor visuele waarneming van Marianne Frostig.
Handleiding. Amsterdam: Swets & Zeitlinger.
Alexander, P.A., Willson, V.L., White, C.S., Fuqua, J.D., Clark, G.D., Wilson, A.F. & Kulowich, J.M.
(1989). Development of analogical reasoning in 4- and 5-year-old children. Cognitive Development, 4, 65-88.
APA (1987). Diagnostic and Statistical Manual of Mental Disorders, DSM III-R. Washington: American Psychiatric Association.
Bayley, N. (1949). Consistency and variability in the growth of intelligence from birth to eighteen
years. Journal of Genetic Psychology, 75, 165-196.
Bayley, N. (1969). Manual for the Bayley Scales of Infant Development. New York: The Psychological
Corporation.
Beek, C. van de (1995). De toepasbaarheid van de SON-R 2½-7 bij kinderen met een motorische
handicap. RU Groningen: intern verslag.
Berg, W. van den, Heide, L. van der, Kamminga, J., Meeder, S. & Paredes, M.G. de (1994). Slim
gezien! een vergelijking tussen de SON-R 2½-7 (intelligentietest) en de DTVP-2 (visuele
perceptietest). RU Groningen: intern verslag.
Berge, J.M.F. ten & Kiers, H.A.L. (1991). A numerical approach to the approximate and the exact
minimum rank of a covariance matrix. Psychometrika, 56, 309-315.
Berge, J.M.F. ten & Zegers, F.E. (1978). A series of lower bounds to the reliability of a test. Psychometrika, 4, 575-579.
Berger, H.J.Chr., Creuwels, J.M.P. & Peters, H.F.M. (1973). Nederlandse handleiding bij het gebruik
van Wechslers intelligentie-schaal voor kleuters, de W.P.P.S.I. Amsterdam: Swets & Zeitlinger.
Bleichrodt, N., Drenth, P.J.D., Zaal, J.N. & Resing, W.C.M. (1984). RAKIT Revisie Amsterdamse
Kinder Intelligentie Test. Instructie, normen, psychometrische gegevens. Lisse: Swets &
Zeitlinger.
Bleichrodt, N., Resing, W.C.M., Drenth, P.J.D. & Zaal, J.N. (1987). Intelligentie-meting bij kinderen.
Lisse: Swets & Zeitlinger.
Bollen, N. (1991). Cognitief aanvangsniveau jongste kleuters basisonderwijs. OVG Groningen: intern
verslag.
Bollen, N. (1996). De cognitieve ontwikkeling van kleuter tot achtjarige in het basisonderwijs. OVG
Groningen: intern verslag.
Bomers, A.J.A.M. & Mugge, A.M. (1985). Reynell Taalontwikkelingstest: Nederlandse instructie.
Nijmegen: Berkhout.
Bon, W.H.J. van (1982). TvK Taaltests voor Kinderen. Handleiding. Lisse: Swets & Zeitlinger.
Bracken, B.A. & McCallum, R.S. (1998). UNIT Universal Nonverbal Intelligence Test. Itasca, IL:
Riverside Publishing.
Brouwer, A., Koster, M. & Veenstra, B. (1995). Validation of the Snijders-Oomen test
(SON-R 2½-7) for Dutch and Australian children with disabilities. RU Groningen: intern
verslag.
Brown, L., Sherbenou, R.J. & Johnsen, S.K. (1982). Test of Nonverbal Intelligence. Austin, TX:
Pro-Ed.
Brown, L., Sherbenou, R.J. & Johnsen, S.K. (1990). TONI-2 Test of Nonverbal Intelligence.
Examiner's manual. Second Edition. Austin, TX: Pro-Ed.
Carroll, J.B. (1993). Human cognitive abilities. A survey of factor-analytic studies. Cambridge:
Cambridge University Press.


Cattell, R.B. (1971). Abilities; their structure, growth, and action. Boston: Houghton Mifflin.
CBS (1993). Centraal bureau voor de statistiek: Statistisch Jaarboek 1993. 's-Gravenhage: SDU/
uitgeverij.
CBS (1994). Centraal bureau voor de statistiek: de leefsituatie van de nederlandse bevolking 1993,
kerncijfers. 's-Gravenhage: SDU/uitgeverij.
Coultre-Martin, J.P. le, Wijnberg-Williams, B.J., Meulen, B.F. van der & Smrkovsky, M. (1988). BOS
2-30. Normen voor kinderen met een vermoede hoorstoornis of met een spraak- of taalstoornis.
Tijdschrift voor Orthopedagogiek, 27, 75-84.
Cronbach, L.J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16,
297-334.
Cronbach, L.J., Schönemann, P. & McKie, D. (1965). Alpha coefficients for stratified parallel tests.
Educational and Psychological Measurement, 25, 291-312.
Dekker, R. (1987). Intelligentie van visueel gehandicapte kinderen in de leeftijd van 6 tot 15 jaar.
Amsterdam: VU Uitgeverij.
Drenth, P.J.D. (1966). De psychologische test. Deventer: Van Loghum Slaterus.
Driesens, N., Horn, J. ten, Paro, I., Schoemaker, M. & Swartberg, D. (1994). De mogelijke samenhang
tussen twee niet-verbale intelligentietests: SON-R 2½-7 en de TONI-2. RU Groningen: intern
verslag.
Dunn, L.M. & Dunn, L.M. (1981). PPVT Peabody Picture Vocabulary Test-Revised. Manual for
Forms L and M. Circle Pines, MN: American Guidance Service.
Eldering, L. & Vedder, P. (1992). OPSTAP: een opstap naar meer schoolsucces? Amsterdam/Lisse:
Swets & Zeitlinger.
Eldik, M.C.M. van, Schlichting, J.E.P.T., Lutje Spelberg, H.C., Meulen, Sj. van der & Meulen, B.F.
van der (1995). Reynell Test voor Taalbegrip. Handleiding. Nijmegen: Berkhout.
Elliott, C.D., Murray, D.J. & Pearson, L.S. (1979-82). British ability scales: Manuals. Windsor:
National Foundation for Educational Research.
Elsjan, M., Kooi, M. van de, Kuiper, M., Raaijmakers, M. & Wensink, J. (1994). SON-R 2½-7 en
TOMAL: samenhang tussen een niet-verbale intelligentietest en een geheugentest. RU Groningen: intern verslag.
Evers, A., Vliet-Mulder, J.C. van & Laak, J. ter (1992). Documentatie van Tests en Testresearch in
Nederland. Assen: Van Gorcum.
Flynn, J.R. (1987). Massive IQ Gains in 14 Nations: What IQ tests Really Measure. Psychological
Bulletin, 2, 171-191.
Frostig, M., Lefever, D.W. & Whittlesey, J.R.B. (1966). Administration and scoring manual for the
Marianne Frostig Developmental Test of Visual Perception. Palo Alto, CA: Consulting Psychologists Press.
Goswami, U. (1991). Analogical reasoning: what develops? A review of research and theory. Child
Development, 62, 1-22.
Guilford, J.P. & Fruchter, B. (1978). Fundamental statistics in psychology and education (6th ed.).
New York: McGraw-Hill.
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10, 255-282.
Haan, N. de & Tellegen, P.J. (1986). De herziening van de schriftelijke taaltest voor doven. RU
Groningen: intern verslag.
Haasen, P.P. van, Bruyn, E.E.J. de, Pijl, Y.J., Poortinga, Y.H., Lutje Spelberg, H.C., Steene,
G. vander, Coetsier, P., Spoelders-Claes & Stinissen, J. (1986). WISC-R, Wechsler
Intelligence Scale for Children Revised. Nederlandstalige uitgave. Lisse: Swets & Zeitlinger.
Hambleton, R.K. & Swaminathan, H. (1985). Item response theory: Principles and applications.
Boston, MA: Kluwer-Nijhoff.
Hammill, D.D., Pearson, N.A. & Voress, J.K. (1993). DTVP-2 Developmental Test of Visual
Perception. Examiner's manual. Second Edition. Austin, TX: Pro-Ed.
Hammill, D.D., Pearson, N.A. & Wiederholt, J.D. (1996). CTONI Comprehensive Test of Nonverbal
Intelligence. Examiner's Manual. Austin, TX: Pro-Ed.
Harinck, F. & Schoorl, P. (1987). Wast vernieuwde WISC-R werkelijk witter? Kind en adolescent, 3,
109-118.


Harris, S.H. (1982). An evaluation of the Snijders-Oomen Nonverbal Intelligence Scale for Young
Children. Journal of Pediatric Psychology, 7, 3, 239-251.
Hessels, M.G.P. (1993). Leertest voor Etnische Minderheden. Theoretische en Empirische
Verantwoording. Rotterdam: RISBO.
Hofstee, W.K.B. (1990). Toepasbaarheid van psychologische tests bij allochtonen. Rapport van de
testscreeningscommissie ingesteld door het LBR in overleg met het NIP. Utrecht: Landelijk
Bureau Racismebestrijding.
Hofstee, W.K.B. & Tellegen, P.J. (1991). SON 2½-7, subsidie-aanvraag NWO 560-267-033.
Groningen: RUG Persoonlijkheids- en Onderwijspsychologie.
Horn, J. ten (1996). Amerikaanse validering van de Snijders-Oomen niet-verbale intelligentietest voor
jonge kinderen, de SON-R 2½-7. RU Groningen: intern verslag.
Jenkinson, J., Roberts, S., Dennehy, S. & Tellegen, P. (1996). Validation of the Snijders-Oomen
Nonverbal Intelligence Test-Revised 2½-7 for Australian Children with Disabilities. Journal
of Psychoeducational Assessment, 14, 276-286.
Kaufman, A.S. (1975). Factor Analysis of the WISC-R at 11 age levels between 6½ and 16½ years.
Journal of Consulting and Clinical Psychology, 43, 135-147.
Kaufman, A.S. & Kaufman, N.L. (1983). K-ABC Kaufman Assessment Battery for Children.
Interpretive Manual. Circle Pines, MN: American Guidance Service.
Kiers, H.A.L. (1990). SCA: een programma voor simultane component analyse. Groningen: IEC,
ProGamma.
Kiers, H.A.L. & ten Berge, J.M.F. (1989). Alternating least squares algorithms for simultaneous
components analysis with equal weight matrices in two or more populations. Psychometrika,
54, 467-473.
Kievit, Th. & Tak, J.A. (1996). De praktijk van de hulpverlening en het gebruik van de regulatieve
cyclus. In: Kievit, Th., Wit, J de, Groenendaal, J.H.A. & Tak, J.A. (eds.), Handboek psychodiagnostiek voor de hulpverlening aan kinderen. Utrecht: De Tijdstroom.
Laros, J.A. & Tellegen, P.J. (1991). Construction and validation of the SON-R 5½-17, the
Snijders-Oomen non-verbal intelligence test. Groningen: Wolters-Noordhoff.
Lienert, G.A. (1961). Testaufbau und Testanalyse. Weinheim: Verlag Julius Beltz.
Lombard, A.D. (1981). Success begins at Home. Educational Foundations of Pre-schoolers.
Massachusetts, Toronto: Lexington Books.
Lord, F.M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ:
Lawrence Erlbaum.
Lord, F.M. & Novick, M.R. (1968). Statistical Theories of Mental Test Scores. Reading, MA:
Addison-Wesley Publishing Company.
Lutje Spelberg, H.C. & Van der Meulen, Sj. (1990). Het meten van taalbegrip en taalproductie,
subsidie-aanvraag NWO 560-256-040. Groningen: RUG afd. Orthopedagogiek.
Lynn, R. (1994). Sex differences in intelligence and brain size: a paradox resolved. Personality and
Individual Differences, 17, 2, 257-271.
Lynn, R. & Hampson, S. (1986). The rise of national intelligence: evidence from Britain, Japan and the
U.S.A. Personality and Individual Differences, 1, 23-32.
McCarthy, D. (1972). Manual for the McCarthy Scales of Children's Abilities. San Antonio: The
Psychological Corporation.
Meulen, B.F. van der & Smrkovsky, M. (1983). BOS 2-30 Bayley Ontwikkelingsschalen. Handleiding.
Lisse: Swets & Zeitlinger.
Meulen, B.F. van der & Smrkovsky, M. (1986). MOS 2½-8½ McCarthy Ontwikkelingsschalen.
Handleiding. Lisse: Swets & Zeitlinger.
Meulen, B.F. van der & Smrkovsky, M. (1987). BOS 2-30 Bayley Ontwikkelingsschalen. Handleiding
bij de niet-verbale versie. Lisse: Swets & Zeitlinger.
Millsap, R.E. & Meredith, W.M. (1988). Component analysis in cross-sectional and longitudinal data.
Psychometrika, 53, 123-134.
Mislevy, R.J. & Bock, R.D. (1990). BILOG 3: Item Analysis and Test Scoring with Binary Logistic
Models. Mooresville, IN: Scientific Software.
Neutel, R.J., Meulen, B.F. van der & Lutje Spelberg, H.C. (1996). GOS 2½-4½ Groningse
OntwikkelingsSchalen. Handleiding. Lisse: Swets & Zeitlinger.


Nunnally, J.C. (1978). Psychometric Theory (2nd ed.). New York: McGraw-Hill.
Nunnally, J.C. & Bernstein, I.H. (1994). Psychometric Theory (3rd ed.). New York: McGraw-Hill.
Raven, J.C. (1962). Coloured Progressive Matrices. London: Lewis.
Rekveld, I. (1994). De cognitieve ontwikkeling van kleuters in het basisonderwijs. OVG
Groningen: Intern verslag.
Resing, W.C.M., Bleichrodt, N. & Drenth, P.J.D. (1986). Het gebruik van de RAKIT bij allochtoon
etnische groepen. Nederlands Tijdschrift voor de Psychologie, 41, 179-188.
Reynell, J.K. (1977). Reynell Developmental Language Scales. Windsor: NFER-Nelson.
Reynell, J.K. (1985). Reynell Developmental Language Scales, second revision. Windsor:
NFER-Nelson.
Reynolds, C.R. & Bigler, E.D. (1994). TOMAL Test of Memory and Learning. Examiner's manual.
Austin, TX: Pro-Ed.
Roelandt, Th., Roijen, J.H.M. & Veenman, J. (1992). Minderheden in Nederland: statistisch
vademecum 1992. 's-Gravenhage: SDU/uitgeverij.
Sattler, J.M. (1992). Assessment of Children. Revised and Updated Third Edition. San Diego, CA:
J.M. Sattler, Publisher, Inc.
Schlichting, J.E.P.T., Eldik, M.C.M. van, Lutje Spelberg, H.C., Meulen, Sj. van der & Meulen, B.F.
van der (1995). Schlichting Test voor Taalproduktie. Handleiding. Nijmegen: Berkhout.
Schroots, J.J.F. & Alphen de Veer, R.J. van (1976). LDT Leidse Diagnostische Test, deel 1
Handleiding. Amsterdam: Swets & Zeitlinger.
Sijtsma, K. (1993). Kaf en koren onder Nederlandse tests. De Psycholoog, 28, 12, 502-503.
Smulders, F.J.H. (1963). STUTSMAN intelligentietest voor kleuters. Nederlandstalige bewerking.
Nijmegen: Berkhout.
Snijders-Oomen, N. (1943). Intelligentieonderzoek van doofstomme kinderen. Nijmegen: Berkhout.
Snijders, J.Th. & Snijders-Oomen, N. (1958) eerste editie, (1970) tweede editie. Snijders-Oomen
niet-verbale intelligentieschaal SON-58. Groningen: Wolters-Noordhoff.
Snijders, J.Th. & Snijders-Oomen, N. (1976). Snijders-Oomen Non-verbal Intelligence Scale, SON
2½-7. Groningen: Tjeenk Willink BV.
Snijders, J.Th., Tellegen, P.J. & Laros, J.A. (1989). Snijders-Oomen non-verbal intelligence test,
SON-R 5½-17. Manual and research report. Groningen: Wolters-Noordhoff.
Snippe, M.D. (1996). Prestaties van kinderen met autisme en aan autisme verwante stoornissen op de
SON-R 2½-7. RU Groningen: Intern verslag.
SPSS Inc. (1990). SPSS/PC+ 4.0 Advanced Statistics. Chicago, Illinois: SPSS Inc.
Starren, J. (1975). SSON 7-17. De ontwikkeling van een nieuwe versie van de SON voor 7-17 jarigen.
Verantwoording en handleiding. Groningen: Wolters-Noordhoff.
Stinissen, J. & Steene, G. vander (1981). WPPSI Wechsler Preschool and Primary Scale of
Intelligence. Handleiding bij de Vlaamse aanpassing. Lisse: Swets & Zeitlinger.
Struiksma, A.J.C. & Geelhoed, J.W. (1996). Intelligentieonderzoek. In: Kievit, Th., Wit, J de, Groenendaal, J.H.A. & Tak, J.A. (eds.), Handboek psychodiagnostiek voor de hulpverlening aan
kinderen. Utrecht: De Tijdstroom.
Stutsman, R. (1931). Mental measurement of preschool children. Yonkers-on-Hudson, NY: World
Book.
Tellegen, P.J. (1993). A nonverbal alternative to the Wechsler Scales: The Snijders-Oomen Nonverbal
Intelligence Tests. In First Annual South Padre Island International Interdisciplinary
Conference on Cognitive Assessment of Children and Youth in School and Clinical Settings, A
Compendium of Proceedings. Fort Worth, TX: CyberSpace Publishing Corporation.
Tellegen, P. (1997). An Addition and Correction to the Jenkinson et al. (1996) Australian SON-R
2½-7 Validation Study. Journal of Psychoeducational Assessment, 15, 67-69.
Tellegen, P.J. & Laros, J.A. (1993a). The Snijders-Oomen Nonverbal Intelligence Tests: General
Intelligence Tests or Tests for Learning Potential? In: Hamers, J.H.M., Sijtsma, K. &
Ruijssenaars, A.J.J.M. (eds.), Learning Potential Assessment: Theoretical, Methodological and
Practical Issues. Amsterdam/Lisse: Swets & Zeitlinger.
Tellegen, P.J. & Laros, J.A. (1993b). The Construction and Validation of a Nonverbal Test of
Intelligence: The Revision of the Snijders-Oomen Tests. European Journal of Psychological
Assessment, Vol 9, 2, 147-157.


Tellegen, P.J., Winkel, M. & Wijnberg-Williams, B.J. (1997). Snijders-Oomen Nonverbal Intelligence
Test SON-R 2½-7. Manual. Lisse: Swets & Zeitlinger.
Tellegen, P.J., Wijnberg, B.J., Laros, J.A. & Winkel, M. (1992). Evaluatie van de SON 2½-7 ten
behoeve van de revisie. RU Groningen: intern verslag.
Verhelst, N.D. & Glas, C.A.W. (1995). Dynamic Generalizations of the Rasch Model. In: Fischer,
G.H. & Molenaar, I.W. (eds.), Rasch Models: Foundations, Recent Developments, and
Applications. New York: Springer-Verlag.
Warm, T.A. (1989). Weighted Likelihood Estimation of Ability in Item Response Theory.
Psychometrika, 54, 3, 427-450.
Wechsler, D. (1967). Manual for the Wechsler Preschool and Primary Scale of Intelligence. San
Antonio, TX: The Psychological Corporation.
Wechsler, D. (1974). Manual for the Wechsler Intelligence Scale for Children-Revised. San Antonio,
TX: The Psychological Corporation.
Wechsler, D. (1989). WPPSI-R, Wechsler Preschool and Primary Scale of Intelligence-Revised.
Manual. San Antonio, TX: The Psychological Corporation.
Wechsler, D. (1991). WISC-III Manual. San Antonio, TX: The Psychological Corporation.
Westerlaak, J.M. van, Kropman, J.A. & Collaris, J.W.M. (1975). Beroepenklapper. Nijmegen:
Instituut voor Toegepaste Sociologie (ITS).
Wijnands, A. (1997). De SON-R tests: verkennend onderzoek van de SON-R tests bij kinderen en
volwassenen met een verstandelijke handicap. RU Groningen: intern verslag.
Zimmerman, I.L., Steiner, V.G. & Pond, R.E. (1992). PLS-3 Preschool Language Scale-3. Examiner's
Manual. San Antonio, TX: The Psychological Corporation.
Zimowski, M.F., Muraki, E., Mislevy, R.J. & Bock, R.D. (1994). BIMAIN 2, Multiple-group IRT
Analysis and Test Maintenance for Binary Items. Chicago, IL: Scientific Software International.
