
INFORMATION TO USERS

This was produced from a copy of a document sent to us for microfilming. While the
most advanced technological means to photograph and reproduce this document
have been used, the quality is heavily dependent upon the quality of the material
submitted.
The following explanation of techniques is provided to help you understand
markings or notations which may appear on this reproduction.
1. The sign or "target" for pages apparently lacking from the document
photographed is "Missing Page(s)". If it was possible to obtain the missing
page(s) or section, they are spliced into the film along with adjacent pages.
This may have necessitated cutting through an image and duplicating
adjacent pages to assure you of complete continuity.
2. When an image on the film is obliterated with a round black mark it is an
indication that the film inspector noticed either blurred copy because of
movement during exposure, or duplicate copy. Unless we meant to delete
copyrighted materials that should not have been filmed, you will find a
good image of the page in the adjacent frame.
3. When a map, drawing or chart, etc., is part of the material being photographed, the photographer has followed a definite method in "sectioning" the material. It is customary to begin filming at the upper left hand corner of a large sheet and to continue from left to right in equal sections with small overlaps. If necessary, sectioning is continued again, beginning below the first row and continuing on until complete.
4. For any illustrations that cannot be reproduced satisfactorily by
xerography, photographic prints can be purchased at additional cost and
tipped into your xerographic copy. Requests can be made to our
Dissertations Customer Services Department.
5. Some pages in any document may have indistinct print. In all cases we
have filmed the best available copy.

University Microfilms International
300 N. Zeeb Road, Ann Arbor, MI 48106
18 Bedford Row, London WC1R 4EJ, England

8012402
OEHMKE, THERESA MARIA

THE DEVELOPMENT AND VALIDATION OF A TESTING INSTRUMENT
TO MEASURE PROBLEM SOLVING SKILLS OF CHILDREN IN GRADES
FIVE THROUGH EIGHT

The University of Iowa

PH.D.

1979

University Microfilms International
300 N. Zeeb Road, Ann Arbor, MI 48106
18 Bedford Row, London WC1R 4EJ, England

Copyright 1979
by
OEHMKE, THERESA MARIA

All Rights Reserved

THE DEVELOPMENT AND VALIDATION OF A
TESTING INSTRUMENT TO MEASURE PROBLEM SOLVING SKILLS
OF CHILDREN IN GRADES FIVE THROUGH EIGHT

by
Theresa Maria Oehmke

A thesis submitted in partial fulfillment of the


requirements for the degree of Doctor of Philosophy
in Education in the Graduate College of
The University of Iowa
December, 1979
Thesis supervisor: Professor Harold L. Schoen

Graduate College
The University of Iowa
Iowa City, Iowa

CERTIFICATE OF APPROVAL

PH.D. THESIS

This is to certify that the Ph.D. thesis of


Theresa Maria Oehmke
has been approved by the Examining Committee
for the thesis requirement for the Doctor of
Philosophy degree in Education
at the December, 1979 graduation.

Thesis committee:

Thesis supervisor

Member

Member

Member

DEDICATION
To

Bob and Jim


ACKNOWLEDGEMENT
For their help in the preparation of this thesis, I owe an expression of appreciation to a large number of persons, only a few of whom I shall mention by name.
To Professor Harold Schoen I extend my thanks for all the guidance,
motivation, direction and assistance he gave me from the initiation to
the completion of this study.
Special thanks are due George Immerzeel, Joan Duea, Earl Ockenga
and John Tarr and the personnel at the Malcolm Price Laboratory School
for their encouragement and support during several phases of this investigation.
A debt of gratitude to Professors H. D. Hoover and A. N. Hieronymus
is acknowledged for their assistance on some of the statistical details
of the testing procedure and for providing me with the opportunity to
collect some of the pertinent data in this study.
I would like to express my appreciation to William M. Smith for the
use of his expertise on the technical aspects of the use of the computer
on data processing.
Thanks is given to Ada Burns for her amazing ability to type what I
thought I had written.
Finally, I would like to thank my husband Bob, and son Jim, and all
my friends who were always ready with an encouraging word.


TABLE OF CONTENTS

                                                                  Page
LIST OF TABLES ................................................... vi

LIST OF FIGURES .................................................. vii

CHAPTER

  I. INTRODUCTION ................................................ 1
       Purpose ................................................... 3
       Overview of the Study ..................................... 3
       IPSP Test Development ..................................... 4
       IPSP Test Validation ...................................... 5
       Reliability ............................................... 6
       Content Validity .......................................... 8
       Concurrent Validity ....................................... 8
       Discriminant Validity ..................................... 9
       Operational Definitions Used in This Study ................ 10
       Overview .................................................. 11

 II. REVIEW OF THE LITERATURE .................................... 12
       Models of Problem Solving ................................. 13
       Testing the Ability to Solve Problems ..................... 19
       Summary ................................................... 22

III. DEVELOPMENT AND PROCEDURES .................................. 23
       Purpose of the Iowa Problem Solving Project ............... 23
       Development of the IPSP Test .............................. 24
       Design and Development of Interview Procedures ............ 28
       The Quantification Scale .................................. 30
         Development ............................................. 30
         Use of the Scale ........................................ 34
       The Pilot Interview Study ................................. 40
         Procedures .............................................. 40
         Pilot Sample ............................................ 41
         Interview Problems ...................................... 42
         The Interviews .......................................... 43
       The Final Interview Study ................................. 44
       ITBS and the IPSP Test .................................... 45
       IPSP Subtest Discrimination ............................... 49

 IV. DATA AND RESULTS ............................................ 50
       IPSP Test and Interview Data .............................. 50
         Pilot Interview Study ................................... 51
         Final Interview Study ................................... 53
       ITBS and the IPSP Test .................................... 55
         IPSP Test Administration ................................ 55
         ITBS Subtests ........................................... 60
         Correlations Between IPSP Subtests and ITBS Subtests .... 60
       IPSP Subtest Discrimination ............................... 62

  V. ANALYSES AND IMPLICATIONS ................................... 72
       Summary and Conclusions ................................... 72
         Phase 1: The Final Interview Study ...................... 73
         Phase 2: ITBS and the IPSP Test ......................... 73
         Phase 3: IPSP Subtest Discrimination .................... 74
       Limitations ............................................... 74
       Classroom Implications .................................... 76
       Implications for Research ................................. 81
       IPSP Test ................................................. 82

APPENDIX A. DESCRIPTION OF THE IOWA PROBLEM SOLVING PROJECT ...... 85

APPENDIX B. PHASES OF IPSP TEST DEVELOPMENT AND SUMMARY OF
            TEST FORMS ........................................... 94

APPENDIX C. INTERVIEW PROBLEMS USED IN THE PILOT STUDY ........... 98

APPENDIX D. IOWA TEST OF BASIC SKILLS RELIABILITY ANALYSIS,
            NATIONAL REPRESENTATIVE SAMPLE ....................... 106

APPENDIX E. FORMULA FOR ITBS CORRELATIONS CORRECTED FOR
            ATTENUATION .......................................... 108

APPENDIX F. ANALYSIS OF IPSP TEST RESULTS, OCTOBER 1978
            AND MARCH 1979 ADMINISTRATIONS ....................... 111

APPENDIX G. PILOT TEACHING STUDY ................................. 120

BIBLIOGRAPHY ..................................................... 125

LIST OF TABLES

Table                                                             Page

 1. Phases of Validation of the IPSP Test ........................ 27
 2. Analysis of IPSP Test with Interviews - Pilot Study .......... 52
 3. Analysis of IPSP Test with Interviews - Final Study .......... 54
 4. Reliability Analysis ......................................... 56
 5. Reliability Analysis ......................................... 57
 6. Reliability Analysis ......................................... 58
 7. Reliability Analysis ......................................... 59
 8. Correlations Between IPSP Subtests and Iowa Test of Basic
    Skills Tests ................................................. 61
 9. Reliability Analysis, Sample of Iowa Students, October 1978 .. 68
10. Reliability Analysis, Sample of Iowa Students, March 1979 .... 69
11. Correlations Corrected for Attenuation of October 1978 IPSP
    Subtests ..................................................... 70
12. Iowa Test of Basic Skills Reliability Analysis, National
    Representative Sample ........................................ 107

LIST OF FIGURES

Figure                                                            Page

 1. Steps and Component Skills of the IPSP Test Model ............ 5
 2. Interviewing Procedures ...................................... 31
 3. Quantification Scheme for the Components of the IPSP Test
    Model, Step 1: Getting to Know the Problem ................... 35
 4. Quantification Scheme for the Components of the IPSP Test
    Model, Step 3: Carrying Out the Plan ......................... 36
 5. Quantification Scheme for the Components of the IPSP Test
    Model, Step 4: Looking Back .................................. 37
 6. Schedule of Events for Pilot Interview Study ................. 42
 7. Schedule of Events for Final Interview Study ................. 46
 8. Square of Correlations Corrected for Attenuation Between
    IPSP Steps and ITBS Subtests, Form 563, Grade 5 .............. 63
 9. Square of Correlations Corrected for Attenuation Between
    IPSP Steps and ITBS Subtests, Form 563, Grade 6 .............. 64
10. Square of Correlations Corrected for Attenuation Between
    IPSP Steps and ITBS Subtests, Form 783, Grade 7 .............. 65
11. Square of Correlations Corrected for Attenuation Between
    IPSP Steps and ITBS Subtests, Form 582, Grade 8 .............. 66
12. Ann's IPSP Test Profile ...................................... 77
13. Dave's IPSP Test Profile ..................................... 78
14. Schedule for Pilot Study ..................................... 122

CHAPTER I

INTRODUCTION

The question of what processes are involved in problem solving and


more particularly in mathematical problem solving is one that has been
investigated for many years.

As one surveys the literature, one not only

becomes aware of the vast quantity of research that is being done but
also of the many and diverse methods used to study the problem solving
processes.
Studies range from simply observing individual students as they
solve problems to factor analysis of paper-pencil measures of problem
solving. A data-gathering method that has come into prominent use today
is the structured one-to-one interview.

In such cases, the researcher

observes the behavior of the subject as he "thinks aloud" while solving


selected problems. Typically the interviews are audio- or video-taped
and a protocol or check list of processes is completed for each student-problem pair. Other approaches include simulating the problem solving
processes of a human being on a computer and using the results to build
a problem solving theory.

In addition, results and methods of research-

ers investigating the cognitive processes involved in general problem


solving often have application for mathematical problem solving.
In an attempt to analyze and thereby further understand the problem

solving process researchers have proposed various multistep models.


One of the earliest was proposed by Wallas (1926).

Based on his own

experiences and his analyses of what others think they are doing when
they solve problems, Wallas suggested a four-step model: preparation,
incubation, illumination, and verification.

Drawing upon many years of

experience as a mathematician and a teacher, Polya (1957, 1962) proposed another four-step model: understand the problem, make a plan,
carry out the plan, and look back at the complete solution.

Restle and

Davis (1962) suggested that the problem solver goes through a number of
independent but sequential stages. The student solves a subproblem at
each stage, thereby allowing him to go on to the next step.

These and

other multistep models have appeared in the literature with varying


degrees of supporting empirical evidence.
If problem solving is a multistep process the question then
arises: would it be possible to measure a person's ability (or skill)
at each of the steps?

If there were a reliable testing instrument

which measured this ability, of what value would it be to the classroom


teacher?

Would the teacher be able to use such a test to help plan for

problem solving instruction?

Of course, the one-to-one interview strat-

egy is available to evaluate a child's problem solving processes but the


teacher does not have the time to conduct interviews with all the students in the classroom.

In addition, the interview has been criticized

for yielding results which are subjective and unreliable.


There is little in the literature concerning paper-pencil testing
instruments which purport to measure problem solving processes. Many
existing group administered tests contain problem solving or

application sections which consist of verbal problems. For example,


the Iowa Test of Basic Skills (Houghton Mifflin Co., 1974)

and the

Metropolitan Achievement Tests (Harcourt, Brace and Jovanovich, 1971)


contain subtests designed to measure problem solving ability.

However,

the information these tests provide, namely, the grade-level equivalent


or percentile rank of the student and class in a larger group, is derived from the number of correct answers. No attempt is made to measure the process that was used to arrive at the solution or to identify
specific skills which may be the source of the child's difficulties.
Purpose

As part of the Iowa Problem Solving Project (IPSP),¹ Schoen and


Oehmke developed a multiple choice paper-pencil test designed to provide individual and class profiles which illustrate the performance of
fifth through eighth graders at each of three steps of the problem solving process. A modified version of the four-step problem solving process model proposed by Polya served as the IPSP testing model.
The purpose of this study is the validation of this testing instrument called the Iowa Problem Solving Project Test (IPSP test).
Overview of the Study
The steps in the validation process are listed below.
1. Compute estimates of the reliability coefficients for the IPSP
test and its three subtests.

¹A three-year project directed by George Immerzeel of the University of Northern Iowa and funded under ISEA, Title IV, C.

2.

Judge the content validity of the IPSP test using the judg-

ments of mathematics educators and educational measurement specialists.


3. Judge the concurrent validity by measuring the relationship
between the IPSP subtest scores and a) the results of a think aloud
measurement procedure, and b) the mathematics concepts, mathematics
problem solving, reading comprehension, and graph skills subtest scores
of the Iowa Test of Basic Skills.
4.

Judge the discriminant validity of the IPSP test by analyzing

a matrix of correlation coefficients, corrected for attenuation,


between pairs of the three IPSP subtests and the four ITBS subtests.
IPSP Test Development
Several "non-standard" instruments have been developed to test
problem solving ability (Wearne, 1976; Zalewski, 1975; Lucas, 1972).
These will be described in Chapter two. However, the IPSP test appears
to be the first attempt to measure skills in a multistep model. A multiple choice format was chosen in order to maximize the potential for
broad, long-range impact on classroom practices. It was reasoned that
a) standardized machine-scored tests presently have a great influence on
curriculum and instruction, and b) in this format the IPSP test may have
some influence on standardized testing.

Furthermore, the machine scor-

ing capability is likely to increase the number of potential users.


After a good deal of "brainstorming" with members of the IPSP team, a
search of the problem solving literature and a detailed analysis of
each stage, the testing model as shown in Figure 1 was developed.

1. Get to Know the Problem
   A. Determine insufficient information
   B. Identify extraneous information
   C. Write a question for the problem setting

2. Choose What to Do from a List of Strategies

3. Do It
   A. Choose the necessary computation
   B. Estimate from a diagram
   C. Compute from a diagram
   D. Use a table
   E. Compute from an equation

4. Look Back
   A. Identify problems that can be solved in the same
      way as a given one
   B. Vary conditions in a given problem
   C. Check a solution with the conditions of the
      problem

Steps and Component Skills of the IPSP Test Model
Figure 1
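
For reference, the testing model in Figure 1 can be summarized as a simple data structure. The following Python encoding is illustrative only; the structure and names are editorial, not part of the original study:

    # Illustrative encoding of the IPSP testing model in Figure 1.
    # The field names and layout are editorial, not the study's.
    IPSP_MODEL = {
        1: ("Get to Know the Problem",
            ["Determine insufficient information",
             "Identify extraneous information",
             "Write a question for the problem setting"]),
        2: ("Choose What to Do from a List of Strategies", []),
        3: ("Do It",
            ["Choose the necessary computation",
             "Estimate from a diagram",
             "Compute from a diagram",
             "Use a table",
             "Compute from an equation"]),
        4: ("Look Back",
            ["Identify problems that can be solved in the same way",
             "Vary conditions in a given problem",
             "Check a solution with the conditions of the problem"]),
    }

    # Subtest scores are reported for steps 1, 3, and 4 only; as noted
    # below, no viable multiple-choice format was found for step 2.
    SUBTEST_STEPS = [1, 3, 4]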

Like standardized tests the IPSP test can be efficiently administered to large groups of students and machine scored with various norm
data easily obtainable.

In addition, subtest scores corresponding to

steps 1, 3, and 4 can be obtained. After nearly two years of effort,


no viable way to test skills in step 2 in a multiple-choice format was
found.
IPSP Test Validation
As an innovative measurement instrument containing many items with
untested structural characteristics and designed to measure constructs
never "before measured with paper-pencil instrument, the IPSP test
development called for much careful planning and formative evaluation.

Major questions concerned the validity of the test and whether, indeed, a reliable test with reliable subtests could be constructed. By utilizing
the Iowa Testing Program's tryout facilities it was possible to construct experimental test units, administer them to representative samples of Iowa fifth through eighth graders, and revise the units based
on the item analyses and test data. Also, over 100 students
were interviewed at various stages in the test development process as a
concurrent check on the test validity.

Over a three-year period the

test evolved into its present form.


For the present study estimates of reliability and of several
types of validity of the final form of the IPSP test were obtained:
(1) content validity, (2) concurrent validity, and (3) discriminant
validity. A general description of the procedures for each of the
above areas follows. A detailed description of the procedures and
findings are in later chapters.
Reliability
The concept of reliability refers to an estimation of the degree
of consistency of measurement.

Theoretically, if a test is reliable,

an individual should obtain the same, or almost the same, score on


repeated administrations. Many factors may affect a student's observed
score to make it different than the theoretically true score. These
"errors" of measurement may be caused by the test itself, i.e., a particular sample of items may be a "good" or "bad" sample; by attributes
of the person taking the test, i.e., motivation, attitude, health, test-wiseness; or by administrative conditions and procedures at the time of the test, i.e., noise, distractions, poor lighting, or lack of uniformity in giving test instructions. The most difficult factors to control
are the subject's attributes.
Reliability of a score for a testing sample is defined as the ratio of the variance of the true score to the variance of the observed score:

    r_tt = s_T^2 / s_X^2

where r_tt is the reliability of the test, s_T^2 is the variance of the true score, and s_X^2 is the variance of the obtained score. Reliability is also referred to in the literature as a correlation between scores on the same test or parallel tests.
One of the most commonly used methods of estimating reliability is the Cronbach alpha. If the data are in dichotomous form, this estimate is equivalent to the Kuder-Richardson 20 coefficient. It may be calculated by the following formula:

    alpha = (k / (k - 1)) * (1 - (sum of the s_i^2) / s_X^2)

where s_X^2 is the variance of the sum over the k items and s_i^2 is the variance of item i (the summed item variances equal k times the average item variance). An advantage of this coefficient is that it computes all ways a given test might be split and then gives a "best estimate" reliability. In this study, reliability coefficients were estimated using either the Cronbach alpha formula or a modified version of the KR-8, a formula in the series of twenty developed by Kuder and Richardson (1937).
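
To make the computation concrete, the following minimal sketch (in Python, with invented item scores, not data from this study) evaluates the alpha formula stated above; with 0/1 data the result also equals the KR-20 coefficient:

    # Minimal sketch of the Cronbach alpha formula stated above.
    # The item-score matrix is invented for illustration.
    def cronbach_alpha(scores):
        """scores: one row per examinee, each row a list of k item scores."""
        k = len(scores[0])  # number of items

        def variance(values):
            mean = sum(values) / len(values)
            return sum((v - mean) ** 2 for v in values) / len(values)

        # Sum of the item variances, s_i^2, taken across examinees.
        sum_item_var = sum(variance([row[i] for row in scores]) for i in range(k))
        # Variance of the total score, s_X^2.
        total_var = variance([sum(row) for row in scores])
        return (k / (k - 1)) * (1 - sum_item_var / total_var)

    # Dichotomous (0/1) data, so alpha here equals the KR-20 value.
    data = [[1, 1, 0, 1], [1, 0, 0, 1], [0, 0, 0, 1], [1, 1, 1, 1], [0, 1, 0, 0]]
    print(round(cronbach_alpha(data), 3))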

Content Validity
The basic content validity question is: Are the items in question
a representative sample of the construct or subject matter domain to be
measured?

A familiar example of a need for high content validity is

the typical classroom test. Here the individual's score on a sample of


items from a content domain is used to infer the student's achievement
level in the entire domain. Thus the assessment of content validity
involves careful judgment of the degree to which the selection of items
is appropriate and representative of the domain or construct to be
tested.
Content validity is usually assessed by the judgment of experts,
that is, subject matter and testing specialists. A limitation of this
method is that there are no clearly specified techniques or standards
for determining content validity.
The reaction of educational testing experts and mathematics educators to the IPSP testing model and sample test units was sought. The
testing experts were two faculty members in Educational Measurement at
The University of Iowa who also are authors of the Iowa Test of Basic
Skills.

The mathematics educators were the IPSP project team and ad-

visory board members, both composed of mathematics teachers in grade


five through graduate mathematics education.
Concurrent Validity
Concurrent validity refers to the degree to which a test correlates with an independent measure of the same behavior. It is the determining factor in
deciding whether a test can replace procedures that are either more

elaborate or require special techniques as in the case of the IPSP test


and the one-to-one interview method.

One way of demonstrating a test's

concurrent validity is to compare the test scores with some independent


measure which is presumed to measure the behavior in question.
In this study data from interviews and from the IPSP test were
gathered concurrently.

The interview measure consisted of presenting a

series of problems to a child and asking him to think aloud as he attempted to solve each problem.

The data were then obtained by coding

and analyzing the solution strategies from observations, audio-tapes,


and the subjects' written work. Thus one estimate of the concurrent
validity of the IPSP test is the correlation between the score in each
of the three IPSP subtests and corresponding data collected via the
think aloud procedure.
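
As a concrete illustration, the estimate just described is an ordinary Pearson correlation between paired scores. The sketch below uses invented numbers, not data from this study:

    # Pearson correlation between an IPSP subtest score and the
    # quantified think aloud rating for the same step. All values
    # below are hypothetical.
    def pearson_r(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sum((a - mx) ** 2 for a in x) ** 0.5
        sy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (sx * sy)

    ipsp_step1 = [9, 7, 10, 4, 6, 8]       # subtest scores for six subjects
    interview_step1 = [2, 2, 3, 1, 1, 3]   # interview ratings for the same step
    print(round(pearson_r(ipsp_step1, interview_step1), 2))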
Another estimate of concurrent validity was derived from the IPSP
test's relationship to several standardized achievement measures. The
criteria selected were four Iowa Test of Basic Skills subtests: mathematics concepts, mathematics problem solving, reading comprehension,
and graph skills.
Discriminant Validity
Campbell and Fiske (1959) state that validation usually proceeds
by convergent methods, i.e., independent measurement techniques designed to measure the same trait. However, for purposes of test interpretation, discriminant validity is also required; that is, not only
should a test be highly correlated with other tests purporting to measure the same trait but it should not correlate highly with tests that
measure distinctly different traits. In the case of the IPSP test, the discriminant validity issue refers to the degree to which the subtest
scores differ from each other and from scores on other similar tests.
Discriminant validity in this study is approached through the use
of matrices of correlations corrected for attenuation.

Intercorrelated

variables include steps 1, 3, and 4 of the IPSP test and the aforemen-

In particular, the three IPSP subtests

should not be highly correlated with each other nor with the ITBS
subtests.
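
Since the correction for attenuation is central to this analysis, a brief sketch may help. The standard (Spearman) correction divides the observed correlation by the geometric mean of the two reliabilities; the exact formula used for the ITBS correlations is given in Appendix E, and the numbers below are illustrative only:

    # Standard correction for attenuation: the observed correlation is
    # divided by the square root of the product of the two tests'
    # reliability coefficients. Inputs here are invented.
    def corrected_for_attenuation(r_xy, r_xx, r_yy):
        return r_xy / (r_xx * r_yy) ** 0.5

    # e.g., observed r = .45, one reliability .70, the other .85:
    print(round(corrected_for_attenuation(0.45, 0.70, 0.85), 2))   # about 0.58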

Operational Definitions Used in This Study


1.

Problem solving
To search consciously for some action appropriate to attaining a
clearly conceived, but not immediately attainable aim. To solve
a problem means to find such action (Polya, 1962).

2.

Cognitive processes
Actions of cognitive performance such as perceiving, remembering, thinking, and desiring that depend on the subject's performance capacities.

These processes are not directly observ-

able but are presumed to underlie a person's behavior when faced


with a cognitive task.
3.

Four-step problem solving test model


For purposes of this study Figure 1 defines the model. Demonstration of the skills in each of steps 1, 3, and 4 is taken as evidence of a student's ability in that step.

4. Problem solving processes


a) the cognitive processes presumed to underlie or guide the subject's choices of solution strategies;
b)

the observable actions or operations the subject actually


adopts to attempt to solve the problem.

Also called problem

solving solution strategies. It should be clear from the


context which meaning applies.
Overview
Previous approaches to problem solving research and attempts to
measure problem solving are discussed in Chapter II, Review of the
Literature.

Chapter III includes a discussion of the theoretical

framework and design of the IPSP test, procedures used to develop the
verbal problems, scoring methods for the think aloud interviews, and
data gathering procedures. The results are discussed in Chapter IV.
A summary of the validation results, implications for future research,
and implications for the teaching of mathematical problem solving are
discussed in Chapter V.


CHAPTER II

REVIEW OF THE LITERATURE

Since the purpose of this paper is the development and validation


of an instrument to test mathematical problem solving, first, various
models that mathematical educators, mathematicians, psychologists, and
other researchers have used to study the problem solving processes will
be discussed, and second, testing instruments developed by other researchers to measure problem solving processes will be summarized. The
reader who is interested in a more detailed discussion of heuristics,
strategies, task analysis, structures and other pertinent factors involved in the problem solving processes is referred to one or more of
the following reviews.
A collection of papers edited by Kleinmuntz (1966) includes discussions of particular aspects of problem solving research and theory.
Davis (1966) gives a comprehensive survey of research and theory relative to traditional learning, cognitive and Gestalt approaches to problem solving, and computer and mathematical models of problem solving.
Kilpatrick (1967) presents an extensive discussion of the definition of heuristics. He describes in detail the notion of problem solving as a search process, the thinking aloud technique, and other methods used to study problem solving processes. He also develops a
coding system for transcribing the problem solving protocol of subjects from audio-tape to paper as they use the think aloud technique.
Hollander's (1973) review focuses on studies related to word problems for students in grades three through eight. The review includes
studies carried out from 1922 to 1969 in seven categories: problem
analysis, computation, general reading ability, specific reading skills,
specificity of the problem statement, the problem situation, and language factors.
Webb (1975) notes that
...Because of the complexity of problem solving
processes and the number of variables associated
with problem solving, research in this area has
been too diverse to have any real consolidation...(p. 1)
His review focuses on studies that involve problem solving tasks and
strategies. These studies were conducted from 1967 to 1973 and involved students in grades three through eight.
Lucas (1972) discusses the nature of problem solving, the search
mode used by information processors, some formal models of the problem
solving process and some techniques used by earlier researchers to
externalize thought processes.
Models of Problem Solving
The attempt to describe the thought processes used in mathematical
problem solving is not a new quest.

For many years mathematicians have

sought to determine and understand, the thought processes they use in


discovering

new mathematics. Some accounts discuss in great detail

the individual thought processes used in formulating and proving mathematical conjectures.


The mathematician Henri Poincare (1914), in attempting "to see


what happens in the very soul of the mathematician," gives an explicit
account of his personal recollection of the discovery of the Fuchsian
functions. After his discussion he states:
...As regards my other researches, the account I
should give would be exactly similar, and the observations related by other mathematicians in the
inquiry of l'Enseignement Mathematique¹ would only
confirm them.
One is at once struck by these appearances of
sudden illumination, obvious indications of a long
course of previous unconscious work. The part played
by this unconscious work in mathematical discovery
seems to me indisputable, and we shall find traces
of it in other cases where it is less evident...(p. 55).
"The appearances of sudden illumination" recounted by Poincare was
also cited by Gauss in referring to a theorem on which he had worked for
years.

He states, "Like a sudden flash of lightning the riddle happened

to be solved."
"Sudden illumination" is the third stage of the psychologist
Wallas' (1926) model. In an attempt to analyze and thereby
further understand the problem solving process, Wallas observed accounts
of thought processes related to him by his students, colleagues and
friends.

It was the account of the German physicist Helmholtz that in-

spired Wallas to describe three stages in the formation of new thought:


preparation:

the first stage during which the problem is investigated


and all the facts are gathered,

incubation:

the second stage during which one rests from any conscious
thought about the problem at hand and/or consciously

¹A review which instituted an inquiry into the habits of mind and methods of work of mathematicians during the early 20th century.

thinks of another problem,
illumination: the third stage during which the idea and/or solution appears as a 'flash' or 'aha'.
Wallas added a fourth stage, verification, during which the validity of
the idea is tested, a stage which Helmholtz did not describe but which Poincare
vividly describes in his accounts.
Polya (1957,1962) also developed a four-step model for problem
solving.

His principal aim in advocating a heuristic approach to prob-

lem solving was to enable the student to ask questions which focus on
the essence of the problem. Drawing upon many years of experience as a
mathematician and a teacher of mathematics, he describes the following
four-step model:
(1) Understanding the problem:

The student tries to understand

the problem by looking at the data and asking the questions, Is it possible to satisfy the conditions of the problem?, Is there redundant or
insufficient data?
(2) Devising a plan: The student tries to find a connection between the data and the unknown; the student should eventually choose a
plan or strategy for the solution.
(3)

Carrying out the plan: The student carries out the plan,

checking each step along the way.


(4) Looking back:

The student examines the solution by checking

the results and/or arguments. The student also attempts to relate the
method or result to other problems.
In a discussion of Polya's model, Mayer (1977) points out that some of
Polya's ideas (restating the given and the goal) are examples of the

Gestalt idea of "restructuring."

He concludes that:

...while Polya gives many excellent intuitions


about how the restructuring event occurs and how
to encourage it, the concept is still a vague one
that has not been experimentally well studied...
(p. 67)
The earlier mathematicians, as well as Wallas and Polya, based
their accounts of the problem solving process on either introspection
or retrospection. That is, the problem solvers reported on their
thought processes as they worked (introspection) or they recalled these
thought processes after they had completed the problem (retrospection).
Kilpatrick (1967) finds difficulties with both of these methods. Of
introspection he states, "but psychologists soon were raising questions
about the nature and magnitude of the distortion introduced by requiring the subject to observe himself thinking."

His criticism of retro-

spection refers to Bloom and Broder (1950), who stated that the difficulties with retrospection lie in remembering all the steps in one's
thought processes, including errors and blind alleys, and in reproducing these steps without rearranging them into a more coherent, logical
order.
It appears that Claparede (1917, 1934) was the first to use a third
approach, the "think aloud" technique (Kilpatrick, 1967).

This technique

does not require the subjects to think and observe themselves thinking
at the same time.

The subjects are asked to vocalize their thought pro-

cesses as they are thinking.

Hence the subjects do not have to analyze

their thought processes nor are they required to have special training.
There are, however, potential difficulties with the think aloud technique: interference of speech with thinking, the lapse into silence when

the subject is deeply engrossed in thought, and the essential difference
between the verbalized solution and the one found silently.

Kilpatrick

(1967) summarizes the views of several authors (Rota, 1966; Brunk, Collister, Swift and Slayton, 1958; Gagne and Smith, 1962; Dansereau and
Gugg, 1966) concerning these difficulties and concludes that:
...The method of thinking aloud has the special
virtues of being both productive and easy to use.
If the subject understands what is wanted - that
he is not only to solve the problem but also to
tell how he goes about finding a solution - and if
the method is used with an awareness of its limitation, then one can obtain detailed information
about thought processes ...(p. 8).
One of the first attempts to systematically gather empirical evidence was by Duncker (1945) who studied the problem solving protocol of
subjects who were given a problem and asked to "think aloud." Two of
the problems that he used were the tumor problem:
...Given a human being with an inoperable stomach
tumor, and rays which destroy organic tissues at
sufficient intensity, by what procedure can one
free him of the tumor by these rays and at the
same time avoid destroying the healthy tissue which
surrounds it ?...(p. 1).
and the 13 problem:
...Why are all six place numbers of the form 276,276,
591,591, 112,112 divisible by 13?...(p. 31).
Duncker illustrated a typical solution protocol for the tumor problem with a flow chart and observed that the problem solving process
starts from a general solution, then progresses to a functional solution
and then to a specific solution.
In a more recent attempt to gather data empirically, Restle and
Davis (1962) developed a model which describes the subject as going

through sequential stages when solving a problem. Each stage is a subproblem with its own subgoal. Thus, the individual solves a sequence
of subproblems which then enables him to continue on to the next stage.
The model states that the number of stages, k, for any given problem can be determined by the square of the average time to solution, t, divided by the square of the standard deviation, s, of the time to solution, or

    k = t^2 / s^2.

They do not describe the stages, and assume that each is of equal difficulty.
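
A small numerical sketch of this estimate, with invented solution times, follows:

    # Restle and Davis stage estimate: k = t^2 / s^2, where t is the
    # mean time to solution and s its standard deviation. Times are
    # invented for illustration.
    times = [30.0, 60.0, 20.0, 70.0, 40.0, 50.0]   # seconds to solution
    mean_t = sum(x for x in times) / len(times)
    var_t = sum((x - mean_t) ** 2 for x in times) / len(times)
    k = mean_t ** 2 / var_t                        # estimated number of stages
    print(round(k, 1))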


Simon (1975) and his associates have also investigated thought
processes used in problem solving.

He used laboratory conditions to

observe human beings working on well-structured problems that the subjects find difficult but not unsolvable.

He states the following broad

characteristics of the information processing system which he uses to


describe human problem solving:
(1) serial processing, one process at a time;
(2) small short term memory capacity for only a few symbols; and
(3) infinite long term memory with fact retrieval but slow storage.
Simon further states that the solver always appears to search
sequentially, adding small successive accretions to his store of information about the problem and its solution.
These models and other multistep models (e.g., Dewey, 1933; Johnson, 1955; Gagne, 1962; Guilford, 1967; Kilpatrick, 1969; Post,
1968; Webb, 1974, 1977; Lester, 1975) have appeared in the literature
with varying degrees of supporting empirical evidence. Such models
suggest a number of questions about teaching and measuring problem
solving skills. If problem solving is a multistep

process, would it be possible to measure a person's ability at each
step? Would this information be more useful to a teacher than just the
single number right, percentile, or grade level equivalent score?
How can this type of evaluation be effected?

Can paper-pencil tests be

used or is a one-to-one interview assessment necessary?

A paper and

pencil test can be a very accurate and efficient evaluation instrument,


especially in the case of easily measured skills. However, the complex
process of problem solving is more difficult to evaluate.
Testing the Ability to Solve Problems
In contrast to both the wealth and diversity of research in the
problem solving process, there is a scarcity of instruments to measure
these processes. Johnson (1961) cites the dearth of instruments to measure the problem solving process in the National Council of Teachers of Mathematics yearbook on Evaluation in Mathematics:
...The committee would have liked to include material on the appraisal of mental processes in the
learning of mathematics. As teachers of mathematics, we are deeply concerned about developing
skill in productive thinking. Too often, many of
us find ourselves knowing little about the relations between the solutions given by our students
and the thought processes that led to these solutions. However, tests for appraising higher mental processes such as concept formation, problem
solving, and creative thinking in mathematics do
not exist... (pp. 2,3).
Recently, efforts have been made to develop instruments that measure problem solving processes.

Speedie, Treffinger, and Feldhusen

(1973) summarized what they and other authors (Ray, 1955; John, 1954;
Keisler, 1969) view as characteristics of a good problem solving process
test.

Three of these characteristics are:


1.

The test should yield a variety of continuous measures concerning the outcomes of the problem solving, the processes,
and the intellectual skills involved.

2.

The test should demonstrate both reliability and validity.

3. The test should be practical for group administration.


Many standardized group administered tests, such as the Iowa Test
of Basic Skills and the Metropolitan Achievement Tests, contain subtests
designed to measure problem solving ability. This measurement is obtained from a single score, i.e., the number of correct answers. No
attempt is made to decompose either the verbal problem itself, or the
solution, into component skills so that one can identify the specific
skill which may be the source of the student's difficulty.
Proudfit (1978) cites other limitations of standardized tests. She
lists eight tests that were examined by Charles and Moses (Webb & Moses,
1977) who concluded that most of the items did not measure problem solving processes but emphasized application of either previously learned
skills or algorithms.

In addition, Proudfit examined other problem solv-

ing instruments, e.g., the Purdue Elementary Problem Solving Inventory,


and concluded that they did not meet the previously stated criteria on
at least one of the three counts.
Post (1968), using a four step model (recognition, analysis, production, verification) designed a problem solving test which is in a
multiple-choice format.

However, the scores on the test are measures

of the end product and not of any particular step in his model. His
content validation procedure used the judgment of a panel of experts,
and his split half reliabilities ranged from .60 to .82.


Foster (1972) developed a problem solving test which had a multiple-choice format for some items and an open-ended format for other
items.

Some of the items are designed to measure process more than

product, but the test is not machine scorable.


Hiatt (1970) discusses the need for measuring mathematical problem
solving processes. He designed a "creative problem solving" test in
which points are awarded on the basis of approaches used, e.g., more
points are awarded if the student does some computation mentally.
Since this test is an open-ended format, it would be difficult to administer to large groups.
Wearne (1977) developed a test of problem solving behavior which
provides information about the child's mastery of the prerequisite mathematical concepts in the problems. Each item, called a super-item, is
composed of three parts: a comprehension question, an application question, and a problem solving question. The comprehension question
assesses the child's understanding of the problem setting. The application question assesses the child's understanding of the concept which
is presumed to be a prerequisite to solving the problem, and finally,
the problem is solved. However, it was difficult to substantiate the
assertion that the comprehension and application items are assessing
prerequisites of the problem solving items.
Proudfit (1978) has designed a problem solving process test in
which the child is given three problems, asked to select one, solve it,
and then answer a list of 12 questions before going on to the next problem. The questions refer to the child's solution processes. Two of the
questions are open ended and the other 10 are in a multiple-choice

format.

The test was administered to 100 fifth grade students but no

formal reliability or validity data were reported.


Zalewski (1974) developed a set of verbal problems which he administered to a group of seventh graders. His purpose was to predict the
results of problem solving process assessments using the think aloud
procedure and coding scheme from his paper and pencil test.

For the

interview test he used Lucas' point system which gave 1 point for
'Approach', 2 points for 'Plan', and 2 points for 'Result'. The written
test was scored using the number of correct answers. The correlation
between the rankings on the written tests and the interviews was .68,
below the criterion Zalewski had set prior to his study.
Summary
Many researchers, including psychologists, mathematicians and educators, have investigated the problem solving processes using multistep
models.

Their consensus is that the multistep model is a valid model

for the investigation of the problem solving processes. In addition,


not only is there a scarcity of instruments to evaluate these models,
but also a scarcity of instruments to measure problem solving processes
in a machine scorable format. The IPSP test is an instrument developed
to measure problem solving skills in a machine scorable format. The
validation of this test is the purpose of this study.


CHAPTER III

DEVELOPMENT AND PROCEDURES

Purpose of the Iowa Problem Solving Project


The Iowa Problem Solving Project (IPSP) is a three year project
directed by George Immerzeel of the University of Northern Iowa and
funded under ISEA, Title IV, C. Its primary purpose is to develop, evaluate, and disseminate materials to improve the mathematical problem
solving abilities of students in grades five through eight.
The approach advocated by the IPSP staff involves both the teaching of specific solution strategies and the solving of many interesting
verbal problems with increasing difficulty levels. Eight instructional
modules have been developed to help the student build a variety of
skills and strategies necessary to successfully solve verbal problems.
Each module consists of a booklet and a card deck.

The booklet provides

instructional activities aimed at developing a particular skill while


the 100-problem card deck provides practice in solving problems of varying difficulty levels which are especially designed for that skill. Two
IPSP members (Schoen and Oehmke) were assigned the task of developing a
measurement instrument based on the IPSP problem solving model. A more
complete description of the IPSP proposal, modules, and teaching strategies is given in Appendix A.


Development of the IPSP Test


After several meetings with the IPSP team and advisory board to
discuss the purpose, goals and philosophy of the IPSP test, Schoen and
Oehmke began the task of developing a problem solving process test. The
testing model which was used, after several revisions, is given in Chapter I (Figure 1). It should be noted that while there were frequent
consultations among the staff, the IPSP test and the IPSP modules were
designed and developed independently; the modules were developed at the
University of Northern Iowa and the testing instrument was developed at
The University of Iowa.
Three of the more important constraints that were imposed on the
testing instruments were:

(1) the format should be multiple-choice so

that the test can be machine scored, (2) the items should measure problem solving subskills and not just the ability to get a final answer,
and (3) the test should be based on the IPSP testing model as developed
by the IPSP team.
A search of the literature was completed to locate instruments
which measure problem solving processes and any items which were found
were classified according to the IPSP testing model.

Instruments were

found which were in an open-ended or multiple-choice format or some


combination of both (Kilpatrick, 1967; Post, 1968; Hiatt, 1970; Foster,
1972; Hollander, 1973; James, 1973; Zalewski, 1974; Wearne, 1976; Krutetski, 1976).

In many cases it was difficult to classify the items

according to the specific subskills of the IPSP testing model. Those


items that seemed to be most amenable to revision were selected and rewritten to conform to a specific step of the model.

In addition, many

items were written to test subskills in each category. The objective
was to build a large item bank to be used during the formative period.
Thus, valid items which satisfied item analysis and reliability criteria
in tryouts could be selected for inclusion in the IPSP test.
A first draft of the IPSP test was examined by two authors of the
Iowa Test of Basic Skills (ITBS) at The University of Iowa, by the IPSP
team, and by the IPSP Advisory Board.

The consensus was that the items

did measure the subskills in the IPSP testing model. However, there was
also a consensus that the items were "too wordy" and tended to be
"cute."

These suggestions, as well as those of several doctoral stu-

dents in mathematics education, were included in several revisions of


the first draft.
Each revised, open-ended item was then typed on a 3"X5" card and
presented to twenty-nine students in individual interviews. These students were in grades five through eight and were selected by their
teachers as representing a cross section of each of their classes. At
the beginning of each interview the student was told that these problems
might be different than any they had seen before, that s/he was not
being tested but that his/her thoughts and suggestions as s/he was working the problems were needed to improve the test items. The student
read each problem aloud; then talked aloud while solving the problem.
Paper and pencil were available if the student chose to use them.
The primary purpose of these interviews was to get the reaction of
the target audience to the items and to gather ideas for distractors.
It was the interviewers' intent to be passive but still attempt to
elicit as much information as possible from the students. Generally,


the students were extremely cooperative. Two of the brightest eighth


graders not only solved the "hardest" problems immediately, but analyzed
the items in great detail. Their comments were very informative, and
some of their suggestions were used as foils for the next revision. The
interviews were recorded on audio tape and analyzed according to reading
difficulties, concept difficulties, strategies used, computational
errors, and answers given. Using information obtained from these
interviews, the first draft of the IPSP test was revised.
Experimental units of multiple-choice items were developed and
administered at three different times with revisions following each tryout.

The last tryout prior to the project evaluation was administered

to representative samples of Iowa students in grades five through eight.


Appendix B contains a more complete description of the phases in
the development of the IPSP test along with a summary of formative data.
The study reported here focuses on the validation, rather than the development, of the IPSP test. This validation was completed in three
main phases: determining the relationship between the IPSP test and
data gathered from individual interviews, determining the relationship
between the IPSP subtests and several ITBS scales, and determining the
relationships between the IPSP subtests. The first phase is an assessment of concurrent validity, and the last two are forms of discriminant
validation using the Campbell and Fiske (1959) terminology. During the
validation several different forms of the IPSP test were used. These
are numbered and described in Table 1.
Phase one of the validation process was considered to be the most
important and the least straightforward.

Thus, most of the effort, as


Table 1

Phases of Validation of the IPSP Test

Test Date / Sample Tested           Test   Grade     Number of Items   Phase of
                                    Form   Level     Total (Subtest)   Validation

December, 1977                      561    5,6       40(12,4,12,12)    Pilot Interview Study; Pretest
Malcolm Price Lab School            562    5,6       40(12,4,12,12)    Pilot Interview Study; Posttest
                                    781    7,8       40(12,4,12,12)    Pilot Interview Study; Pretest
                                    782    7,8       40(12,4,12,12)    Pilot Interview Study; Posttest

January, 1978                       563    5,6       20(6,2,6,6)       Final Tryout Series
Representative Sample of            564    5,6       20(6,2,6,5)
Iowa Students                       581    5,6,7,8   20(6,2,6,6)
                                    582    5,6,7,8   20(6,2,5,7)
                                    783    7,8       20(6,2,6,6)
                                    784    7,8       20(4,2,7,7)

May, 1978                           561    5         40(12,4,12,12)    Final Interview Study
Two Fifth Grade Classes from an
Iowa City Elementary School

October, 1978                       565    5,6       30(10,0,10,10)    Summative Evaluation of
One Hundred Fifth and Sixth         785    7,8       30(10,0,10,10)    IPSP Project: Pretest
Grade Classes from Iowa Schools

well as the emphasis in this report, was placed on the relationship between the interview data and IPSP test scores.
Design and Development of Interview Procedures
A preliminary list of interviewing procedures was developed using
the experiences that were gained during the first round of interviews
described in the previous section. The purpose of the later interviews
was to discover the strategies the child uses to solve verbal problems.
Hence it was decided that the interviewer should not lead the child into
selecting a particular heuristic. Any questions asked by the interviewer to elicit more information should not lead the child. There were
also instances of nonverbal leading that occurred during the trial interviews. For example, a child unsure of which operation to use would
say, "I think I should multiply?" and then look at the interviewer's
face to get some sort of reaction. Whether the child in fact multiplied
or performed some other operation depended on the reaction of the interviewer.

Consequently a major reason for developing a set of written

interview procedures was to minimize the interviewer's influence on the


student's problem solving processes.
A first draft of the procedure was brief and contained such instructions to the interviewer as: Encourage the student to vocalize his
thinking as much as possible during the sample warm-up time; Try to record something on tape; Don't go any longer than 15 seconds without recording something on tape. After the list was developed it was tested
with two volunteer fifth and sixth graders. Revisions were made,
and these were tried with two volunteer students, a sixth and a

seventh grader.

In addition, two doctoral students in mathematics edu-

cation tried the procedure with several of their students.

Some of the

suggestions that were incorporated into a revised edition were: Supply


a pencil with the eraser cut off to prevent erasures; Rather than giving
the student one or more sheets of paper for all the problems, put one
problem on each sheet so that problem and computations are together;
Tell the students that you will not let them know whether their solution
is correct.
A difficulty that occurred with some frequency was that of the student becoming silent while using paper and pencil either to do arithmetic computations or while apparently thinking. If the interviewer
interrupted and asked the child to tell what s/he was thinking, the
reply would be, "I'll tell you as soon as I'm finished," or "Just a
minute, I'm thinking."

This difficulty was handled in several ways.

If

the child was in deep thought working on the problem, the interviewer
would ask a question about the overt observable behavior of the child,
e.g., "Are you doing some multiplication now?" or "Are you adding now?",
when it was obvious that the child was doing that particular computation
using paper and pencil. The child usually mumbled "yes" or shook his
or her head and went on with the arithmetic. This strategy was used as
an indicator on the tape to let the coder know what the student was
doing during that time.
A second strategy that worked very well was to make a comment similar to the following:

"That's fine, you're telling me what you're

thinking"; "You're doing fine, Sue, you're telling me what I want to


know"; Can you put into words what you're thinking?" Such statements

seemed to encourage the child to vocalize and give more details, yet
did not appear to lead the students into using a specific strategy. The
final form of the list of interview procedures was the result of five
tryouts and revisions. Figure 2 shows the final form used in this validation study.
The Quantification Scale
Development
Since one of the goals of this study was to assess the relationship
between the IPSP test and the results of the think aloud interviews, a
quantification code based on the IPSP testing model was needed to process the interview findings. Some related research was found.
Kilpatrick (1967) developed a coding scheme to analyze the protocols used by his subjects in a think aloud interview, but did not
attempt to quantify these protocols. Lucas (1972) used a modification
of Kilpatrick's coding scheme with calculus students. His five-point
scoring code is based on three categories: Approach (one point), Plan
(two points), and Result (two points).

The "Approach" phase represented

the subject's understanding of the problem, the "Plan" phase represented


the subject's attempt to find a path to obtain the answer, and the
"Result" phase was the subject's final answer.
Zalewski (1974) investigated the relationship between a paper-pencil test and interview results. He essentially followed the procedures established by Kilpatrick and Lucas with some modifications since
his subjects were seventh graders. The process score obtained by using
Lucas' scoring procedure provided a basis for ranking the subjects.
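
For clarity, Lucas' five-point code can be summarized as the following sketch (the phase names come from the description above; the sample ratings are invented):

    # Lucas' five-point process score: 1 point for Approach, 2 for
    # Plan, 2 for Result. The sample ratings are hypothetical.
    MAX_POINTS = {"Approach": 1, "Plan": 2, "Result": 2}

    def lucas_score(ratings):
        """ratings: points earned per phase, capped at each phase maximum."""
        return sum(min(ratings.get(phase, 0), cap)
                   for phase, cap in MAX_POINTS.items())

    # A subject who understood the problem and chose a workable plan
    # but reached only a partially correct result:
    print(lucas_score({"Approach": 1, "Plan": 2, "Result": 1}))   # 4 of 5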

Interviewing Procedures

1.  Problems should be typed one on a page, preferably placed at the
    side of the page so that the student can use the rest of the page
    for any computing, drawing diagrams, tables, or any type of thinking.

2.  Start the interview with 2 sample problems, thus allowing the student to become familiar with the routine and with the type of
    information the interviewer would like to find. At all times make
    a conscious effort to put the child at ease.

3.  Tell the student that no information will be given on whether the
    answer or strategies are correct since you want to get the best
    possible data.

4.  Do encourage the students to go on by making comments such as,
    "You're doing just fine." "That's good, you're telling me what
    you're thinking." "Go ahead." BUT DO NOT LEAD THE STUDENT INTO
    USING A STRATEGY.

5.  Don't go any longer than about 15-20 seconds without recording
    something on tape. EXCEPTION: If the student is doing computations or drawing a diagram or making tables, etc., make some sort
    of statement such as, "You're making a table," etc.

6.  Encourage the student to vocalize his thinking as much as possible.

7.  If a student falls silent while writing or drawing, prompt him by
    reading what he has written or ask him what he is doing. However,
    rule 5 takes precedence over rule 7.

8a. If a student doesn't answer, or doesn't make any comments about
    his thinking, wait about 15 seconds and ask, "Can you tell me what
    you are thinking?" Wait another 10 seconds or so and ask again.
    This time, "Are you trying to figure something out?" If nothing
    happens, call this IMPASSE. Now ask the question, "Would you like
    a hint or another problem?"

8b. If the child says yes, this would indicate to you that the rating
    part of the data gathering is over, but continue to get diagnostic data. This can be done by asking the student to identify
    the area that presented the trouble and why he had this trouble,
    e.g., didn't know method, lack of understanding of the problem,
    read problem incorrectly, etc.

8c. If the student says no, then allow more time and ask him if he
    would tell you what he is thinking or what method he is trying,
    or would he try to do his figuring on paper. Then repeat steps
    8a, 8b and 8c again.

9.  If the student is not trying to solve the problem get him on the
    right track, but only after IMPASSE.

10. For the first half of the problems observe the student. Does he
    have a habit of LOOKING BACK? If not, follow step 11.

11. If the student does NOT have the habit of LOOKING BACK, and has
    already been given the first half of the problems, then lead him
    on with prompts listed on the LOOKING BACK coding sheet, e.g.,
    "Did you check your answer with the conditions of the problem?"
    "Did you check your answer?" "How sure are you that your answer
    is correct?"

DON'T

1.  Do not allow the child to erase. Instruct him to make a line
    through the mistake.

2.  Do not give any tutoring or prompting until after the IMPASSE and
    then only if the child asks a question. However do use the procedure listed in step 8a.

3.  Do not summarize what the child has done. Try to get him/her to
    do it.

4.  Do not tell the student whether he is on the right track, or
    whether his answer is correct.

5.  Do not tell the student that you are going to use the strategies
    listed in steps 8a, 8b and 8c.

Figure 2

33

Written tests were administered to these same subjects, who were then ranked according to the number of correct answers. The correlation coefficient between the written tests and interviews was .68. Zalewski concluded that a higher correlation is necessary before the written test scores can be used as a substitute or predictor for interview results.
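The comparison Zalewski made reduces to a single correlation between two score lists for the same students. A minimal Python sketch of the computation, using invented placeholder scores rather than Zalewski's data:

    # Correlating paper-and-pencil scores with interview process scores.
    # Both score lists are hypothetical placeholders, not Zalewski's data.
    import numpy as np

    test_scores = np.array([12, 9, 15, 7, 11, 14, 8, 10])        # number correct
    process_scores = np.array([31, 22, 40, 18, 27, 35, 25, 24])  # process scores

    r = np.corrcoef(test_scores, process_scores)[0, 1]  # Pearson product-moment r
    print(f"r = {r:.2f}")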
Webb (1975) also used an adaptation of the coding system developed
by Kilpatrick and Lucas. He used the "Approach," "Plan," and "Result"
scoring system and obtained a frequency count from a check list of problem solving process variables.
From the preceding discussion it appeared that no three-step quantification scheme was available to investigate the relationships between interviews and IPSP test results. A first attempt at developing the scale was made using Kilpatrick's processing sequence with some modifications in order to follow the IPSP testing model. In trying to quantify these processing sequences the procedures became very cumbersome. A new attempt was made in which flow charts were designed for each step of the model. Again, when it came time to assign a number at the various branches, the instrument became unmanageable. Another attempt was made in which behavior in each step of the testing model was assigned three numbers, 0, 1, and 2, which were to serve as categories. In the 0 category would be those processes which were totally incorrect or a response such as "I don't know what to do." The 2 category would contain responses which were completely correct, and the 1 category would contain the intermediate responses. This new procedure was used with the audio tapes from the first interviews. It became immediately apparent that at least one more category was needed and that the categories were not explicit enough for each step of the model.
These revisions were made and the resulting instrument now had four categories, 0, 1, 2, 3, and more explicit descriptors under each category. This new instrument was used to process additional tapes and further revisions were made. At this stage the instrument was examined by the same two mathematics educators who were consulted on the interview form. Each category and its descriptors were thoroughly discussed. Step 4, the looking back step, presented the greatest difficulty.
If the subject gives an answer, appears to be mulling it over, goes back to the problem, reads it again, and then gives another answer, is this checking the answer or trying to understand the problem? It was decided to include this process under step 1 and to be more explicit with the descriptors under step 4.
After general agreement on the appropriateness of the scale was
reached, three raters quantified audio tapes of interviews using this
form. After a few minor additions the instrument was considered to be
in "final" form. As a final test, each rater analyzed the same three
interviews on audio tapes.
Use of the Scale
The final form of the quantification scale is given in Figures 3, 4, and 5. Behavior which involved reading, analyzing and understanding the problem was classified as step 1 behavior. Briefly, a score of 0 was assigned to a student who failed completely to understand a problem; 1 was assigned to a student whose analysis of the problem was incorrect

Quantification Scheme for the Components of the IPSP Test Model

Step 1: Getting to Know the Problem

0
- Says he doesn't understand the problem and makes no attempt at solution.
- Tries to solve the problem unaware that there is insufficient information and never starts a correct strategy.
- Fails to use data correctly in attempts at solution, e.g., uses all extraneous data to arrive at solution.
- Immediately tries to do some arithmetic operations using all the numbers in the problem without regard to a correct strategy.

1
- Says he doesn't understand the problem, rereads it, and tries to make a start but is unsuccessful (includes rephrasing, trying to understand what is unknown, what is given, or searching for a path).
- Makes a false start (recognizes it as such) but can't arrive at a correct strategy.
- Reads the problem, knows there are "too many numbers," but can't organize the proper data (extraneous data).
- Makes true statements about extraneous data but does not advance the solution.
- Rereads the problem, appears to know there is something missing, but cannot state what is wrong and makes a false start (missing data).
- Tries to solve the problem without regard to using data correctly. After a brief trial and error, realizes he is not using data correctly but cannot correct the situation.
- Tries to summarize data or repeat it in a different form but does not find a correct strategy.

2
- Makes a false start but eventually arrives at a correct strategy (includes trial and error).
- Tries to solve the problem and unconsciously makes up his own missing data but does not state that there is insufficient data. His solution strategy is correct for the data he provides.
- Tries to summarize data or repeat it in a different form which starts him out on the correct strategy, but later he gets off the track.
- States there is no solution because of insufficient data and attempts to modify the conditions but is unsuccessful.
- Uses a correct strategy but does not use data in its proper form (e.g., neglects units).
- Solves some of the cases involved in the problem but fails to consider all solutions.

3
- Makes any correct attempt to understand the problem by reading or rephrasing, i.e., trying to understand what is unknown, what is given, or searching for a path.
- Rereads the problem to assist in drawing figures, tables, equations, performing a check or introducing symbols. (It must be apparent that this is going to aid in understanding the problem and finding a correct strategy.)
- States a plan for an intermediate or final goal which is a correct strategy.
- Carries out exploratory manipulations which lead to a correct solution.
- States the problem can't be worked and tells what is needed to work it (modifies problem); states the reason why it can't be worked (insufficient data).
- States what data is not needed in the solution of the problem while stating a correct strategy (extraneous data).
- States the conditions and constraints of the problem correctly.
- Immediately starts out to work the problem and succeeds.

Figure 3

Quantification Scheme for the Components of the IPSP Test Model

Step 3: Carrying Out the Plan

0
- Any manipulations or computations that are done are incorrect.
- Strategy is set up correctly but he is not able to carry it out, e.g., cannot solve an equation.
- Suggests a plan but cannot carry it out.

1
- Does less than half the number of necessary computations correctly.
- Sets up an equation but cannot solve it completely; does simple operations like addition and subtraction.
- Tries to use a diagram, figure, or table but does the computation incorrectly.
- Uses successive approximation (systematic trial and error) and does the first step correctly but cannot carry it to the end.

2
- Does half or more of the necessary computations correctly.
- Sets up an equation but cannot solve it completely, in that s/he makes errors on the harder operations, e.g., multiplication, division, clearing fractions, etc.
- Sets up the problem incorrectly, but all computation is done correctly.
- Makes a mistake in copying a correct number but carries out the computations correctly.
- Makes an incorrect diagram, figure, or table but uses the numbers in the computation correctly.
- Uses successive approximation and does the first few steps correctly but bombs on the computations involved in the final step.
- Does the computation correctly but uses units incorrectly.

3
- Sets up the problem correctly and carries out the actual computations correctly.
- Uses the algorithm or equation correctly, e.g., manipulates all parts of the equation correctly.
- When using successive approximations (trial and error), uses information from the previous trial correctly, i.e., computes all these values correctly.
- Starts to execute the plan, makes a mistake (computationally), but finds the errors and corrects them.

Figure 4

Quantification Scheme for the Components of the IPSP Test Model

Step 4: Looking Back

0
- Makes no attempt to check the answer or the conditions of the problem.
- Says, "It's probably wrong," and makes no attempt to check the answer.
- Says s/he doesn't know and makes no attempt to correct it.

1
- Expresses uncertainty about the answer.
- Says it's probably wrong (or some version) and attempts to give a reason for his/her uncertainty.
- Makes an attempt to check the answer but is not successful enough to be convinced that it is right or wrong.
- Checks computations involved in the answer but does not check to see if the answer satisfies the conditions of the problem. Errors here should be major.

2
- Makes some attempt to check the answer or decide whether it is correct, but eventually gives up.
- Makes an attempt to check the answer by various methods (i.e., retraces steps, checks conditions of the problem, substitutes the answer) but cannot carry out the check completely.
- Makes an attempt to check the answer by various methods (i.e., retraces steps, checks conditions of the problem, substitutes the answer) but fails to detect the incorrect answer. Errors here should be minor.

3
- Attempts to check the values of an unknown or the validity of an argument.
- Tries to decide whether the answer makes sense (i.e., realistic, reasonable estimates).
- Checks that all pertinent data has been used.
- Suggests a new problem that can be solved in the same way.
- Successfully attempts to simplify the problem.
- Checks the solution by retracing steps or substitution.
- Checks that the solution satisfies the conditions of the problem.

Figure 5


but who understood some of its elements; 2 was assigned to a student whose analysis was correct except for a minor error such as reading data incorrectly; 3 was assigned to an entirely correct understanding of the problem which led to a valid solution strategy. A crucial point is that the step 1 score was not affected by errors in the application of a solution strategy, once that strategy was chosen. Errors in the application of a chosen strategy were reflected in the step 3 score.
Step 4 behavior consisted of student moves after a tentative solution was reached. Many students stopped as soon as they had an answer and were given a score of 0 for step 4. Briefly, a score of 1 was assigned if some uncertainty was expressed but no systematic check was made; 2 was assigned if a check was attempted but was either incorrect or incomplete; 3 was assigned if a valid check of the computation, conditions and/or reasonableness of the solution was carried out. Again, specific criteria were described for each numerical score, but an important point is that the step 4 score was not affected by any behavior preceding a tentative solution. An exception was that students were assigned 0 on step 4 if no tentative solution was reached.
The following two examples will illustrate the scoring scheme. Ann, a sixth grader, was presented with this problem:

A bag of XL-50 brand marbles contains 25 marbles and costs 19¢. How much will 125 marbles cost?

Ann read the problem aloud and this is the transcribed interview:

A: Uh . . . Oh boy . . . hm . . .
I: What are you doing now?
A: I'm trying to figure out how I'll do this. Either add or multiply . . . O.K. I'm going to multiply 125 marbles by 19¢. (multiplies) It comes out $11.25. That's not right.
I: What are you trying to find?
A: I'm trying to get the right answer.
I: But what answer?
A: What I should do with 25, 19, and 125, because I know with those numbers I have to do something.
(Silence)
(Rereads the problem)
(Silence)
A: I want to see if I multiplied wrong . . .
(Remultiplies but is still stumped)
Ann exhibited a behavior which occurred very frequently in student interviews. She tried to do some arithmetic operations using all the numbers in the problem. She lacked, or at least failed to use, analytic skills, essentially step 1 behavior. However, her computational skills and ability to use tables and diagrams, as illustrated in other problems, were good. On this problem, she was given a score of 0 for step 1; 3 for step 3; 0 for step 4. If she had made a computational error or misused an equation, she would have received a 0, 1, or 2 on step 3. A similar pattern emerged in her solution to other problems.
Dave, a fifth grader, is an example of a student who was able to understand most of the problem settings presented to him, but had difficulty carrying out his solution strategies. This is illustrated with the following example of a single-step problem:

Mr. Price earned $75 in each of 8 weeks. How much did he earn for all 8 weeks?

D: O.K. 75, 75, 75, . . . (adds eight 75's) O.K. 1, 2, 3, . . .
I: So you wrote eight 75's down, right?
D: Yes, O.K., that'd be . . . eight 5's would be 40. It'd be 0 and 4 on top. And eight 7's would be . . . O.K. let's see . . . Hmm. (Writes them down and adds them)
D: It'd be $5.60.
I: O.K. that's your answer then?
D: Yes.

Dave chose to add eight 75's, which was a correct strategy. However, he had difficulty in finding the sum. To make the computation easier, he correctly noted that 8 sevens is the same as 4 fourteens. He was scored a 3 on step 1, and 2 on step 3 on this problem. His score on step 4 was 0 since he did not exhibit any behavior in that category. Later, with prompting, Dave realized that he had left out the 8 fives, and he corrected himself.
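The coded scores lend themselves to a simple tabular representation. A minimal Python sketch; only the two sets of step scores shown above are taken from the text, and the aggregation across problems is illustrative:

    # Step scores (step 1, step 3, step 4) recorded per problem, as in
    # the examples above: Ann's marble problem and Dave's earnings problem.
    ann = {"marbles": {"step1": 0, "step3": 3, "step4": 0}}
    dave = {"earnings": {"step1": 3, "step3": 2, "step4": 0}}

    def step_totals(student):
        """Sum each step's scores across all problems a student worked."""
        totals = {"step1": 0, "step3": 0, "step4": 0}
        for scores in student.values():
            for step, value in scores.items():
                totals[step] += value
        return totals

    print(step_totals(ann))   # {'step1': 0, 'step3': 3, 'step4': 0}
    print(step_totals(dave))  # {'step1': 3, 'step3': 2, 'step4': 0}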

The Pilot Interview Study


Procedures
After a year of development, a pilot study was designed in which data from interviews and the IPSP test, in their developing forms, were gathered from the same sample of students in Fall, 1977. The year of development included a trial run in January, 1977, and one in March, 1977, in which four 20-item forms of the IPSP test for each run were administered to students in grades five through eight. The form of the IPSP test that was used in this pilot study was a revised one based on the data obtained from those trial runs.
Preparations for the pilot study included briefing the classroom
teachers and administrators of the school, setting up a schedule for
interviewing the students, and working out the logistics of the schedule. A final session was held which was attended by the IPSP staff, the
classroom teachers who would be involved in the study, the head of the
school's mathematics department, the principal, and involved counselors.
The form of the IPSP test that would be used was presented, and the team
discussed the purpose of the study and answered questions raised by the
school personnel.

Pilot Sample
All of the students in grades five through eight in the Malcolm Price Laboratory School, Cedar Falls, Iowa, were involved in this pilot study. The students were randomly divided into two groups across grade levels. Group one consisted of 99 students in grades five through eight. Group two consisted of 103 students from the same grade levels. However, within the groups, the fifth and sixth graders were administered one form of the test while the seventh and eighth graders completed another form of the test.
Concurrently with the above groupings, each

teacher was asked to divide each of their classes into an upper ability and a lower ability half and to select one "verbal" student from each half. This resulted in the selection of 32 students, four from each grade, to be involved in think aloud interviews. All of the interviews were conducted by the investigator and followed the interview form discussed in the previous section. The interview tapes were then coded according to the quantification code previously described. Students in group two completed the IPSP test after the interviews were completed. Correlation coefficients between interview and IPSP test scores were then computed. Figure 6 shows the time schedule for the study.

Schedule of Events for Pilot Interview Study

Date    Activity                                    Responsibility
11/28   paper and pencil test for group 1           classroom teacher
11/29   interview 4 students from each grade        the investigator
        in group 2 (room 4)
11/30   paper and pencil test for group 2           classroom teacher
        interview 4 students from each grade        the investigator
        in group 1 (room 4)

Figure 6

Interview Problems
One hundred open-ended verbal problems were developed for grades five through eight independently from those on the IPSP test. These problems were reviewed by members of the IPSP staff. Samples from the 100 problems were administered to six volunteer students in grades five through eight in think aloud interviews. Information obtained from these interviews and suggestions from the staff were used in revising some problems and eliminating others. A pool of 65 problems resulted. These problems were then classified into seven levels: level one containing simple one-step word problems and each succeeding level containing problems that were increasingly difficult in both the concepts and computations to be used. Each problem was typed on a half sheet of paper so the student could do any needed computations on that paper. These problems are included in Appendix C.

The Interviews
The investigator conducted all 32 interviews. Because the interviews were taking place during the regular school day, a rather brief time limit of 20 minutes per student was allotted. The first five minutes were used in talking to the student about the procedure to be used and in presenting two sample problems. Students were encouraged to talk but were not given any hints or told whether what they were doing was correct.
The student's responses to the sample problems were used by the interviewer to choose the difficulty level of the first problem to be presented during the interview. Succeeding problems were chosen from various prearranged levels depending on the student's performance on the previous problems. This procedure was used in an attempt to lower the student's frustration level and increase the student's verbalization of his thought processes by optimizing the match between the student's ability and the problem's difficulty.
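This adaptive choice of problem level can be illustrated with a simple move-up/move-down rule over the seven difficulty levels. The rule below is a hypothetical sketch, not the interviewer's actual judgment procedure:

    # Illustrative rule for choosing the next problem level (1-7): move up
    # a level after a solved problem, down after an unsolved one. The
    # study's actual choices were made by interviewer judgment.
    def next_level(current_level, solved, lowest=1, highest=7):
        if solved:
            return min(current_level + 1, highest)
        return max(current_level - 1, lowest)

    level = 3                        # level suggested by the sample problems
    for solved in [True, True, False]:
        level = next_level(level, solved)
    print(level)                     # 4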


The Final Interview Study


The experience gained in the pilot study was used as a basis for
revising the procedure for the final interview study: the next phase of
the investigation of the relationship between the IPSP test and the
think aloud interviews. Two major procedural changes were made. First,
all students were administered the same five verbal problems, and second, five doctoral students, including the investigator, and a professor
of mathematics education from The University of Iowa, conducted the
interviews. The interviewers were trained in the use of the interview
procedure previously described.
Fifty-five students from two fifth grade classes in an Iowa City
elementary school were interviewed and administered the IPSP test. To
prepare the students for the interviews the investigator visited the
two fifth grade classes the day before the interviewing started, talked
to the students about the purpose of the interviews and asked for their
cooperation while they were being interviewed. At that time a tape of an interview selected for the high quality of the student's reporting of his thought processes was played. This was followed by a classroom discussion of the comments that were made by this student in the interview recording. Some of the students in the class pointed out that the child had worked a problem incorrectly. It was explained that this was all right since the interviewers were more interested in what the child was thinking than in his working the problem correctly.
The procedure used was as follows. The first day, one fifth grade class completed the IPSP test; the second day, students from both fifth grade classes were asked to solve the same five verbal problems in a think aloud interview; and the third day, the other fifth grade class completed the IPSP test.
The IPSP tests were scored and the audio tapes of the think aloud interviews were coded using the quantification scale described at the beginning of this chapter. The investigator coded all 55 students. However, a doctoral student in mathematics education and a senior mathematics major also coded several student interviews and the percent of interrater agreement was computed. The time schedule and the five verbal problems for this investigation are shown in Figure 7.

ITBS and the IPSP Test
To further describe the IPSP test against more familiar measures, the relationship between the four scales of the ITBS and the three subtests of the IPSP test was investigated. The four scales of the ITBS (forms 5 & 6, levels 11 through 14) used in this study were: Test R, Reading Comprehension; Test W-2, Reading Graphs and Tables; Test M-l, Mathematics Concepts; and Test M-2, Mathematics Problem Solving. The three subtests of the IPSP test were: Step 1, Understanding the Problem; Step 3, Carrying Out the Strategy; Step 4, Looking Back. Since the IPSP steps have been described in previous sections, descriptions of the four ITBS subtests only are included here.
Test R: Reading Comprehension

The Reading Comprehension Test consists of selections varying in length from a few sentences to a full page. . . . For these reasons, the items in all levels of the tests place a premium on understanding and drawing inferences from the reading selections. . . . To score well on the last few selections, a pupil has to use all the skills generally associated with mature adult reading. (Hieronymus and Lindquist, 1974, p. 47)
Schedule of Events for Final Interview Study

Date           Activity                   Sample
May 8, 1978    IPSP Test Form 561         First fifth grade class
               administered
May 9, 1978    Think aloud interview      Both fifth grade classes
May 10, 1978   IPSP Test Form 561         Second fifth grade class
               administered

Problems Administered for Think Aloud Interview

A rock that weighs 30 pounds on earth weighs 5 pounds on the moon. How much does a rock that weighs 18 pounds on earth weigh on the moon?

Each hamburger at McDonald's weighs .15 of a pound before it is cooked. How much does the meat for 8 hamburgers weigh?

Eight men and 12 women weigh 3000 pounds. The women all weigh the same. Each man weighs 210 pounds. What is the weight of one woman?

You need to buy 10 grapefruit. They are sold 2 grapefruit for 25¢ or a bag of 10 grapefruit for $1.00. How much do you save by buying a bag of grapefruit?

How many pairs of socks at $1.31 a pair can you buy if you have $10? How much change will you have left?

Three Sample Problems from Test Form 561

You threw a baseball 5 meters farther than Tom did. You want to know how far your throw went. You could solve the problem if you knew:
1) Tom's throw was 5 meters shorter than yours.
2) A meter is a little more than a yard.
3) A baseball is 8 inches around.
4) Tom's throw was 34 meters.

Together you and I had $6.00. We spent a total of $3.20 for a record and some ice cream. We each took half of the money that was left. Which question below could be answered using this information?
1) How much did the ice cream cost?
2) How much did the record cost?
3) Could we buy another record at the same price?
4) How much money did each of us have left?

(Picture: the path of a football thrown from a starting point to where it landed, 40 yards away.)

I threw a football 40 yards. The picture above shows the path that the football followed. At its highest point, about how high was the throw above the ground?
1) 50 yards
2) 10 yards
3) 30 yards
4) 5 yards

Figure 7

Test W-2: Reading Graphs and Tables
Instruction in reading graphs and tables is concentrated in
mathematics books and the grade placement is not particularly rigid. A large share of this instruction is concerned
with reading traditional graph forms (bar graphs, line
graphs, and circle graphs), despite the fact that graphs
appearing in modern magazines, newspapers, and textbooks
are more likely to be pictographs.
At least five different graphs or tables are included
in the test for each level. . . . (Hieronymus and Lindquist, 1974, p. 52)
Test M-l: Mathematics Concepts
The changes in content, grade placement, and relative
emphasis upon various mathematics concepts in the current
forms of Test M-l reflect developments that have occurred
in the mathematics curriculum during the last decade. . . .
Subsequently, a questionnaire was sent to all school
systems participating in the Iowa Basic Skills Testing Program, asking each to identify the textbook series it was
then using in each year. . . . It was necessary to know the extent of use of each series to ensure that the test content would be representative of the mathematics curriculums in the majority of elementary schools. (Hieronymus and Lindquist, 1974, p. 54)
Test M-2: Mathematics Problem Solving
In Test M-2, competence in problem solving is tested in a
functional setting of challenging and practical problem situations. A conscientious effort has been made to include as many different number combinations as possible and to represent most frequently the specific number skills that have shown the highest incidence of error (higher decade facts, zero facts, etc.). . . . Computational skills are systematically tested in a functional setting. (Hieronymus and Lindquist, 1974, p. 54)
The IPSP tests were administered to a representative sample of
2017 Iowa students during January, 1978. The relationship between the
student's scores on the IPSP subtests and the October, 1977, administration of the ITBS was investigated using multiple regression techniques.


IPSP Subtest Discrimination


The "final forms" of the IPSP test consist of two forms for grades
5 and 6 (565 and 566), and two forms for grades 7 and 8 (785 and 786).
These forms are each 30-item tests with 10 items in each subtest. The
forms in each pair are of equivalent difficulty and very similar content
and they will be referred to as equivalent.

One set of the final forms

was administered to over 1000 students at each of grades 5, 6, 7, and 8


in October, 1978. The other set was administered to the same (except
for attrition, absentees, etc.) in March, 1979-

The October administra-

tion served as a pretest for the IPSP project evaluation and the March
administration was the posttest.
In order to determine the level of relationship among the subtests
in each IPSP test, raw correlations between pairs of subtests were computed.

These were then corrected for attenuation, i.e., the lowering

effect of the unreliability of the subtests.


CHAPTER IV

DATA AND RESULTS

The purpose of this chapter is to analyze and interpret the data gathered in the phases of the validation discussed in Chapter III. The relationship between the IPSP test and the interview data, both in the pilot interview study and the final interview study, is discussed in the first section. The relationship between the IPSP test and selected subtests of the ITBS is explored in the next section. In the final section, the discrimination among the IPSP subtests is examined.

IPSP Test and Interview Data


A procedure by which validity of a newly developed test may be judged is to administer it to an appropriate sample concurrently with an established test which is assumed to measure the desired skills. A strong relationship between the two sets of test scores would then indicate that the new test is valid. A variation of this procedure was necessary in this study since the IPSP test appears to be the first effort to develop a test to measure this set of skills. It was decided that the concurrent measure would be data from students as they thought aloud while solving open-ended verbal problems in one-to-one interviews.
A major problem that this procedure presented was that no reliable and valid technique for gathering quantitative data from individual interviews relative to the three steps in the IPSP test was available. Consequently a quantification scheme as described in Chapter III was developed. The scheme was used on tapes from both the Pilot and the Final Interview studies.

Pilot Interview Study


Since the 32 students that were selected in this study were from "average" classrooms, their mathematical ability varied. Thus the decision was made to present problems which varied according to the students' ability to verbalize and to react to the previous problem. Weaker students were presented with easier one-step verbal problems and problems which used graphics to display the necessary information. This method did increase the amount of verbalization and also seemed to give the weaker students more confidence in their problem solving ability. However, it also created difficulties with differentiating between good and poor problem solvers, since the weaker students were being presented simple one-step problems in order to encourage them to think aloud.
The correlation matrix display (Table 2) shows a poor relationship between the think aloud interview and the IPSP test (forms 561 and 781). Several reasons can account for this:
1. The previously stated one, i.e., the presentation of problems according to the students' ability to solve the previous ones.
2. The manner in which the sample of 32 students was chosen may have been biased.
3. Each student was given roughly 15 minutes to work problems; this allowed the number of problems the students worked to range from 6 to 15.

Table 2
Analysis of IPSP Test with Interviews (Pilot Study)

Correlation Coefficients

                           Interview
IPSP Test      Step 1    Step 3    Step 4
Step 1          .28       .23       .19
Step 3          .46       .30       .11
Step 4          .12       .01       .09

Reliability Coefficients

               Step 1    Step 3    Step 4
Form 561        .77       .72       .74
Form 781        .74       .72       .79
Inter-rater
Agreement*      .86       .95       .86

*Proportion of agreement based on a total of 18 problems worked by 22 students.

All of these constraints were removed in the final interview study in hopes of obtaining more satisfactory data.
Final Interview Study
The interview data were processed using the quantification scheme. Interrater agreement was computed between three raters: a mathematics education doctoral student, a senior mathematics major and the investigator. The proportion of agreement was 89% overall. The details are included in Table 3.
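Proportion of agreement is simply the share of coded ratings on which two raters assign the same 0-3 category. A minimal Python sketch with placeholder codes:

    # Inter-rater agreement: the fraction of ratings on which two raters
    # assign the same category. The code lists below are placeholders.
    def agreement(rater_a, rater_b):
        matches = sum(a == b for a, b in zip(rater_a, rater_b))
        return matches / len(rater_a)

    rater_a = [3, 2, 0, 1, 3, 2, 0, 0, 1, 2]
    rater_b = [3, 2, 0, 1, 2, 2, 0, 1, 1, 2]
    print(f"{agreement(rater_a, rater_b):.2f}")  # 0.80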


Since the quantification scheme was used in categorizing the behavior for the three steps of the IPSP test model, the data were not in dichotomous form. Thus, the usual methods of estimating reliability could not be applied. The problem was solved by using Lord and Novick's (1968, p. 204) sample analog of the Cronbach estimate for the measure of internal consistency of the five problems given to the 55 students in this study. The reliability coefficients and the relationship between the interview data and IPSP test results are displayed in Table 3.
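Coefficient alpha accommodates the 0-3 problem scores directly. The sketch below uses the familiar sample formula, alpha = (k/(k-1))(1 - (sum of item variances)/(variance of totals)), which may differ in detail from the Lord and Novick estimator used in the study; the score matrix is a made-up placeholder:

    # Cronbach's alpha for five problems scored 0-3 per student
    # (rows = students, columns = problems). Data are placeholders.
    import numpy as np

    scores = np.array([
        [3, 2, 3, 1, 2],
        [1, 0, 2, 0, 1],
        [2, 2, 3, 2, 3],
        [0, 1, 1, 0, 0],
        [3, 3, 2, 2, 3],
    ])

    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)       # variance of each problem
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of student totals
    alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
    print(f"alpha = {alpha:.2f}")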
Based on these findings there appears to be a rather strong relationship between the two measurement techniques in step 1 and step 3. The .13 correlation in step 4, the looking back step, verified observations that were made by the interviewers; i.e., this step was rarely observed in the think aloud interviews. This observation has also been made by other researchers (Kantowski, 1975; Kilpatrick, 1967). Even when the interviewer asked the student to look back at a problem and check the answer, the student would not do so on the next problem.

Table 3
Analysis of IPSP Test with Interviews (Final Study)

Correlation Coefficients

                           Interview
IPSP Test      Step 1    Step 3    Step 4
Step 1          .64       .64       .20
Step 3          .56       .55       .11
Step 4          .52       .50       .13

Reliability Coefficients

                          Step 1    Step 3    Step 4
Form 561                   .79       .74       .73
Interview Problems         .78       .93       .40
Inter-rater Agreement      .73       .93      1.00

Proportion of agreement based on 5 problems each from 15 students.

ITBS and the IPSP Test
IPSP Test Administration
Six experimental forms of the IPSP test were developed for administration in January, 1978: two equivalent forms for grades five and six, two equivalent forms for grades seven and eight, and two equivalent forms for grades five, six, seven, and eight. There was an average of six items in each of the three steps, with 18 as the total number of items. Students were allowed 30 minutes to complete the test. Each form of the test was administered to an average of 500 students in classes at each of the four grade levels. The samples for these forms of the IPSP test were generally representative of the Iowa school population, with a total of 2017 students involved. Each form was administered to every fourth child in each classroom in the sample.
Tables 4, 5, 6, and 7 show the reliability coefficients and other statistics for these test forms. The items from these forms were revised and used in the final forms of the IPSP test administered in 1978-79. In addition, four of the most reliable forms were chosen, one at each grade level, to examine their relationship to the four subtests of the ITBS. In particular, forms 563 for grades five and six and 783 and 582 for grades seven and eight, respectively, were used. While these forms are not equivalent to the final IPSP test forms, they are constructed from roughly the same proportion of item types. Their relationship to the ITBS subtests should certainly be a good approximation of the relationship of the final IPSP tests to the same subtests.

Table 4
Reliability Analysis, January, 1978

Form 563
Grade (N)    Step    Items   Mean    SD     SEm    r
5 (147)      1       6        3.56   1.61   1.07   .56
             3       6        4.33   1.44    .86   .64
             4       6        3.56   1.61   1.03   .59
             Total   18      11.46   3.79   1.74   .79
6 (140)      1       6        4.07   1.47   1.04   .50
             3       6        4.81   1.19    .81   .54
             4       6        4.26   1.57    .94   .64
             Total   18      13.14   3.26   1.66   .74

Form 564
5 (144)      1       7        4.01   1.63   1.11   .54
             3       6        3.71   1.46    .99   .54
             4       5        2.95   1.48    .94   .60
             Total   18      10.68   3.71   1.78   .77
6 (138)      1       7        4.45   1.67   1.06   .60
             3       6        4.33   1.27    .97   .42
             4       5        3.30   1.52    .87   .67
             Total   18      12.08   3.70   1.70   .79

Note: SD = standard deviation; SEm = standard error of measurement; r = Cronbach alpha reliability.
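The tabled standard errors of measurement are consistent with the usual relation SEm = SD * sqrt(1 - r); for example, for step 1 of Form 563 at grade 5, 1.61 * sqrt(1 - .56) = 1.07, as tabled. A quick computational check in Python:

    # Check of the tabled standard errors via SEm = SD * sqrt(1 - r),
    # using the Form 563, grade 5 rows of Table 4.
    from math import sqrt

    rows = [(1.61, .56, 1.07), (1.44, .64, .86), (1.61, .59, 1.03)]
    for sd, r, sem_tabled in rows:
        print(round(sd * sqrt(1 - r), 2), sem_tabled)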


Table 5
Reliability Analysis, January, 1978

Form 783
Grade (N)    Step    Items   Mean    SD     SEm    r
7 (116)      1       5        2.56   1.24   1.01   .34
             3       6        2.25   1.59   1.05   .56
             4       7        2.44   1.27   1.17   .15
             Total   18       7.25   3.26   1.87   .67
8 (121)      1       5        2.86   1.39    .95   .53
             3       6        2.61   1.79   1.06   .65
             4       7        2.53   1.46   1.16   .37
             Total   18       8.00   3.70   1.85   .75

Form 784
7 (119)      1       6        3.19   1.55   1.06   .53
             3       6        3.24   1.55   1.14   .46
             4       6        1.33   1.26    .95   .43
             Total   18       7.76   3.17   1.85   .66
8 (130)      1       6        3.56   1.46   1.08   .45
             3       6        3.59   1.56   1.09   .51
             4       6        1.78   1.33   1.06   .37
             Total   18       8.93   3.38   1.88   .69

Note: SD = standard deviation; SEm = standard error of measurement; r = Cronbach alpha reliability.

Table 6
Reliability Analysis, January, 1978

Form 581
Grade (N)    Step    Items   Mean    SD     SEm    r
5 (134)      1       6        2.57   1.47   1.13   .41
             3       5        2.82   1.33   1.01   .42
             4       7        3.16   1.54   1.14   .45
             Total   18       8.56   3.41   1.87   .70
6 (131)      1       6        3.18   1.59   1.10   .52
             3       5        3.08   1.11   1.05   .11
             4       7        3.96   1.74   1.10   .60
             Total   18      10.22   3.49   1.88   .71
7 (118)      1       6        3.49   1.53   1.09   .49
             3       5        3.33   1.23    .94   .41
             4       7        4.25   1.55   1.13   .47
             Total   18      11.07   3.45   1.83   .72
8 (144)      1       6        3.90   1.62   1.04   .59
             3       5        3.48   1.26    .91   .48
             4       7        4.71   1.78   1.02   .67
             Total   18      12.09   3.97   1.73   .81

Note: SD = standard deviation; SEm = standard error of measurement; r = Cronbach alpha reliability.


Table 7
Reliability Analysis, January, 1978

Form 582
Grade (N)    Step    Items   Mean    SD     SEm    r
5 (137)      1       6        2.84   1.51   1.08   .49
             3       6        3.25   1.46   1.12   .41
             4       6        2.61   1.26   1.16   .15
             Total   18       8.69   3.21   1.93   .64
6 (138)      1       6        3.51   1.50   1.05   .51
             3       6        4.04   1.48   1.03   .52
             4       6        3.03   1.28   1.12   .24
             Total   18      10.59   3.35   1.83   .70
7 (114)      1       6        3.92   1.63    .98   .64
             3       6        4.20   1.24   1.04   .29
             4       6        3.31   1.59   1.07   .55
             Total   18      11.43   3.67   1.80   .76
8 (144)      1       6        4.05   1.52    .95   .61
             3       6        4.68   1.33    .90   .54
             4       6        3.57   1.33   1.11   .30
             Total   18      12.31   3.38   1.72   .74

Note: SD = standard deviation; SEm = standard error of measurement; r = Cronbach alpha reliability.

ITBS Subtests
The four scales of the ITBS that were compared with the IPSP test were Test R, Reading Comprehension; Test W-2, Reading Graphs and Tables; Test M-l, Mathematics Concepts; and Test M-2, Mathematics Problem Solving. These subtests were described in Chapter 3. Reliability estimates for the scales, based on a nationally representative sample of students, are given in Appendix D (Hieronymus and Lindquist, 1974).

Correlations Between IPSP Subtests and ITBS Subtests

To further judge the discriminant validity and to help empirically describe the IPSP test's domain, the relationships between ITBS and IPSP subtests were examined using multiple regression techniques.

ITBS results were obtained from the October, 1977, administration of the ITBS to the same students who were administered the IPSP forms in January, 1978. Table 8 shows the simple correlations between the three steps of the IPSP test and the four subtests of the ITBS. To further illustrate these relationships, the squares of these correlations corrected for attenuation were plotted. A corrected correlation, r_c, between measures x and y is computed by the formula

    r_c = r_xy / sqrt(r_xx * r_yy)

where r_xy is the raw Pearson Product Moment correlation and r_xx and r_yy are estimates of the internal consistency reliabilities of x and y, respectively. The formula used to compute the reliability coefficients of the linear combinations of the ITBS subtest scores is derived in Appendix E.
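In Python the correction is a single line; the values below are illustrative only, not entries from the study's tables:

    # Correction of a raw correlation for attenuation:
    # r_c = r_xy / sqrt(r_xx * r_yy). Example values are illustrative.
    from math import sqrt

    def corrected_r(r_xy, r_xx, r_yy):
        return r_xy / sqrt(r_xx * r_yy)

    print(round(corrected_r(r_xy=.55, r_xx=.77, r_yy=.85), 2))  # 0.68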

Table 8
Correlations Between IPSP Subtests and Iowa Tests of Basic Skills Tests

Grade 5, Form 563 (N = 142)
            Reading   Graphs   Concepts   Problem Solving
Step 1       .55       .57      .51        .53
Step 3       .46       .45      .44        .41
Step 4       .63       .65      .63        .62
Total        .68       .64      .68        .65

Grade 6, Form 563 (N = 140)
Step 1       .54       .56      .55        .44
Step 3       .31       .37      .27        .39
Step 4       .62       .58      .58        .61
Total        .65       .66      .62        .63

Grade 7, Form 783 (N = 119)
Step 1       .50       .52      .50        .51
Step 3       .47       .43      .43        .43
Step 4       .34       .34      .38        .68
Total        .61       .60      .55        --

Grade 8, Form 582 (N = 144)
Step 1       .54       .62      .59        .59
Step 3       .39       .45      .51        .45
Step 4       .50       .54      .61        .48
Total        .56       .64      .68        .60
Figures 8 through 11 display this information using bar graphs.
Of the three steps, step 3 appears to hold the least relationship to the four ITBS subtests, while step 4 appears to hold the greatest relationship. Note, however, that this trend did not appear in grades seven and eight, as can be observed in Figures 10 and 11. Another observation is that, of the four ITBS subtests, reading comprehension consistently contributed the least to the multiple R-squared. Certainly the data provide a strong indication that the IPSP subtests measure skills which are different from those measured by any one ITBS subtest or any combination of them.
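The multiple R-squared values behind Figures 8 through 11 come from regressing each IPSP step score on the four ITBS subtest scores. A minimal sketch with randomly generated placeholder data, not the study's records:

    # Multiple R-squared from regressing an IPSP step score on the four
    # ITBS subtest scores; the data here are random placeholders.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    itbs = rng.normal(size=(n, 4))                   # R, W-2, M-1, M-2 scores
    step1 = itbs @ np.array([.4, .3, .5, .6]) + rng.normal(size=n)

    X = np.column_stack([np.ones(n), itbs])          # add an intercept column
    beta, *_ = np.linalg.lstsq(X, step1, rcond=None)
    resid = step1 - X @ beta
    r_squared = 1 - resid.var() / step1.var()
    print(f"R^2 = {r_squared:.2f}")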

IPSP Subtest Discrimination


One set of final forms of the IPSP test was administered during October, 1978, and the other set was administered in March, 1979. There are two levels of the IPSP test: one for grades five and six, the other for grades seven and eight. Each level consists of 30 items, with each of the three steps containing 10 items. Students were given 40 minutes to complete the test. The October administration served as a pretest in the final IPSP project evaluation and the March administration was the posttest. The sample was obtained from a pool of nearly 200 Iowa classrooms whose teachers volunteered to participate in the IPSP evaluation. In all, each form of the test was administered to over 1000 Iowa students at each of the four grade levels. The March sample was the same as the October sample except for student attrition, turnover and absenteeism.

Square of Correlations Corrected for Attenuation Between IPSP Steps and ITBS Subtests, Form 563, Grade 5. (Bar graph for Steps 1, 3, and 4; bars: a = Multiple R-squared, b = Reading Comprehension, c = Graph Skills, d = Mathematics Concepts, e = Mathematics Problem Solving.)
Figure 8

Square of Correlations Corrected for Attenuation Between IPSP Steps and ITBS Subtests, Form 563, Grade 6. (Bar graph; bars as in Figure 8.)
Figure 9

Square of Correlations Corrected for Attenuation Between IPSP Steps and ITBS Subtests, Form 783, Grade 7. (Bar graph; bars as in Figure 8.)
Figure 10

Square of Correlations Corrected for Attenuation Between IPSP Steps and ITBS Subtests, Form 582, Grade 8. (Bar graph; bars as in Figure 8.)
Figure 11

Between October and March, classes received one of three levels of problem solving instruction, a fact which may have affected the March data. Hence, we will use the data from the October administration in this section.
The means, standard deviations, and reliability coefficients are presented in Tables 9 and 10. A more complete presentation of the test data for both administration times is in Appendix F. The reliability coefficients for these forms of the IPSP test range from .67 to .78 for the individual steps and from .84 to .87 for the total test. These coefficients, particularly those for the total, are well within the acceptable estimates of internal consistency, especially for a rather short test.
In order to determine whether the IPSP subtests measure different underlying skills, Pearson Product Moment correlation coefficients were computed for each pair of subtests and corrected for attenuation. These results, shown in Table 11, can be viewed as a measure of the relationship between the IPSP subtests after adjustment for the lowering effect of their unreliability. While there is no standard for how low the correlations between subtests should be, a comparison was made between these corrected correlations and similar statistics computed for the Iowa Tests of Basic Skills data from a nationally representative sample of 2558 fifth graders. As shown in Table 11, the corrected correlations for the IPSP subtests range from .77 to .90; the corrected correlations for the four ITBS subtests range from .75 to .85. It seems reasonable to conclude that the IPSP subtest scores are no more highly related than are the ITBS tests of quite different content described in the previous paragraphs. A high relationship between scores on tests which measure such apparently different content is usually attributed to a general intelligence factor.

Table 9
Reliability Analysis, Sample of Iowa Students, October 1978

Form 565
Grade (N)     Step    Items   Mean    SD     r
5 (1215)      1       10       5.41   2.54   .77
              3       10       6.44   2.10   .72
              4       10       4.96   2.47   .78
              Total   30      16.81   6.22   .87
6 (1314)      1       10       6.62   2.44   .77
              3       10       7.23   1.87   .68
              4       10       5.99   2.36   .77
              Total   30      19.83   5.76   .86

Form 785
7 (1078)      1       10       6.16   2.43   .77
              3       10       5.86   2.10   .67
              4       10       5.37   2.18   .69
              Total   30      17.38   5.72   .84
8 (1101)      1       10       6.93   2.30   .77
              3       10       6.48   2.08   .68
              4       10       5.96   2.19   .70
              Total   30      19.38   5.57   .84

Note: SD = standard deviation; r = reliability coefficient.


Table 10
Reliability Analysis, Sample of Iowa Students, March 1979

Form 566
Grade (N)     Step    Items   Mean    SD     r
5 (1161)      1       10       6.40   2.16   .69
              3       10       7.04   1.62   .58
              4       10       5.50   2.22   .71
              Total   30      18.94   4.98   .81
6 (1184)      1       10       7.25   2.05   .71
              3       10       7.57   1.56   .59
              4       10       6.39   2.14   .71
              Total   30      21.20   4.75   .81

Form 786
7 (1024)      1       10       6.28   2.15   .70
              3       10       6.04   2.14   .66
              4       10       5.61   2.39   .73
              Total   30      17.93   5.67   .83
8 (910)       1       10       --     2.13   .72
              3       10       --     2.13   .68
              4       10       --     2.29   .72
              Total   30      19.31   5.55   .84

Note: SD = standard deviation; r = reliability coefficient.

Table 11
Correlations Corrected for Attenuation of October 1978 IPSP Subtests

IPSP Test
Grade   Form   N        S1-S3   S1-S4   S3-S4
5       565    1215      .86     .88     .80
6       565    1314      .90     .77     .82
7       785    1078      .80     .85     .82
8       785    1101      .83     .82     .77

ITBS Test (Grade 5, Level 11, N = 2558)
          W-2     M-1     M-2
R         .74     .80     .82
W-2               .75     .75
M-1                       .85



CHAPTER V

ANALYSES AND IMPLICATIONS


Summary and Conclusions

The IPSP test was designed and developed based on the four steps of a problem solving model: getting to know the problem, choosing what to do, doing it, and looking back. However, the second step, choosing what to do, was eventually eliminated. There are presently two equivalent forms of the IPSP test for grades five and six (565 and 566) and two equivalent forms for grades seven and eight (785 and 786). Each of the forms is in a multiple-choice, machine scorable format.
The main purpose of this study was to validate the IPSP test. First the test was shown to have a high degree of internal consistency. Estimates of reliability of forms 565 and 785 were computed by grade level using a modified KR-8 formula. The reliability coefficients ranged from .63 for a specific step to .86 for the entire test (based on a sample of over 1000 Iowa students at each grade level), well within the desired range for a test with ten items for each subtest and a total of 30 items. The first conclusion is based on these results.
C1: The IPSP test development effort illustrates that it is possible to construct a psychometrically sound test based on the three steps from the problem solving model.
The remainder of the validation was completed in three phases.
Phase 1: The Final Interview Study

An important goal of the IPSP test development was that the test results should be highly related to data collected via individual "think aloud" interviews. The strength of this relationship was measured by administering a form of the IPSP test to two classes of fifth graders and interviewing these same students in a think aloud format. The data collected via the interviews were processed using a quantification scheme which was developed specifically for this study. The correlation coefficients between scores attained by the two measurement techniques for step 1 and step 3 were about .60, but for step 4 a correlation of .13 was found. Conclusions two, three and four are based on the results of the final interview study.


C2: The interview quantification scheme developed for this study based on the IPSP testing model was objective and reliable.
C3: While the relationship between the IPSP scores and interview data for steps 1 and 3 was quite high, further work is needed before one set of data can be used to replace the other.
C4: Step 4 (looking back) behavior was rarely observed in the think aloud interviews.

Phase 2: ITBS and the IPSP Test

To describe the IPSP test against a more familiar measure, regression techniques were used to show the relationship between four scales of the ITBS (Reading Comprehension, Reading Graphs and Tables, Mathematics Concepts, and Mathematics Problem Solving) and the three steps of the IPSP test. Conclusions five and six are based on the findings in this phase of the study.
C5: Of the three subtests, step 3 appears to hold the least relationship to the four ITBS subtests, while step 4 appears to be most highly related in grades five and six. This trend did not appear in grades seven and eight.
C6: Of the four ITBS subtests, graphs and tables and mathematics concepts are most highly related to the IPSP subtests and reading comprehension is related the least.
Phase 3: IPSP Subtest Discrimination

The question of whether the three subtests of the IPSP test are measuring different skills was also addressed. Correlations corrected for unreliability were computed for each pair of subtest scores. The lower these corrected correlations are, the less related are the subtests. As a point of comparison, similar statistics were computed for the four subtests of the Iowa Tests of Basic Skills used in phase 2. Conclusions 7 and 8 are based on findings in phase 3.
C7: The IPSP test is testing skills and abilities not tested by the ITBS.
C8: The IPSP subtest scores were no more highly related than the ITBS subtests, which on the face of it have quite different content.
Limitations

There are several limitations of this study. In Chapter II it was pointed out that many researchers suggested that the problem solving process is a multistep process, but it is not clear how skills within each step are related to overall problem solving skills. That is, the three scores that the IPSP test yields are measures of skills when the problem solving process is broken down, but they may not be measures of how well these skills can be synthesized to solve a new problem. Likewise, the specific skills in the IPSP testing model within the three steps may not accurately reflect the complexity of the steps. Thus there may be other important skills which have been overlooked.
The constraint that the IPSP test be machine scorable could be considered a limitation in that light. The student must choose a constructed answer rather than construct one of his/her own. However, the advantage of easy administration may outweigh this limitation.
The students for the January, 1978, test comprised a nearly representative sample of Iowa students. Although the sample for the October, 1978, test was not representative, it was chosen from over 200 volunteer teachers with over 1000 students in each of the grades five through eight. Thus any statistics obtained from the IPSP test should be viewed in this light.
As previously mentioned, there are limitations in the use of the think aloud interviewing method for determining the processes students use to solve problems. The presence of an observer may change the behavior of the problem solver, who may also not accurately report his thinking. However, the think aloud approach seems to be the best method available for directly observing the problem solving process.
The strength of the relationship between the IPSP test and interview data is not only a function of the IPSP test but also of the particular quantification scheme used for the interviews. While the scheme in this study was carefully developed and yielded high interrater agreement and internal consistency estimates, other reasonable approaches may have given quite different correlations with the IPSP test.
Classroom Implications

Two implications of the IPSP test development for the teaching of problem solving will be discussed: the classroom use of the IPSP test and the use of questions in the form of those on the IPSP test in homework assignments and classroom tests.
First, the IPSP test development effort illustrates that it is possible to construct a psychometrically sound test based on the three steps from the problem solving model. Profiles of students' scores showing percentile ranks on the three steps of the problem solving model can be used as a diagnostic tool. In examining the profiles (Figures 12, 13) of Ann and Dave, the two children whose interviews were described in Chapter III, it can be seen that Ann is weakest in understanding the problem while Dave is comparatively weak at carrying out his solution strategies. Diagnostic assessments such as this would enable the teacher to provide special instruction to students or classes who are comparatively low in one of the steps. The IPSP problem solving modules are designed to facilitate such instruction (Immerzeel et al., 1977). Included in each module are questioning techniques and other classroom activities to develop each step of the problem solving model.

Ann's IPSP Test Profile (percentile ranks on problem solving steps 1, 3, 4, and IPSP total)
Figure 12

Dave's IPSP Test Profile (percentile ranks on problem solving steps 1, 3, 4, and IPSP total)
Figure 13
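Profiles like Figures 12 and 13 require only the percentile rank of each subtest score within a norm group. A minimal Python sketch with a made-up norm distribution:

    # Percentile rank of a student's step score within a norm group;
    # the norm-group scores below are made-up placeholders.
    def percentile_rank(score, norm_scores):
        below = sum(s < score for s in norm_scores)
        return 100.0 * below / len(norm_scores)

    norm_step1 = [2, 3, 4, 4, 5, 5, 6, 6, 7, 8, 8, 9, 9, 10, 10]
    print(round(percentile_rank(7, norm_step1)))  # 53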


The second implication concerns questioning strategies in problem
solving instruction. During the course of the test development and

validation, over one hundred students were observed individually as they solved problems. Their general lack of analytical skills became obvious. Many students viewed verbal problems as little more than algorithmic exercises, i.e., they either saw what to do immediately or they gave up entirely. They did not analyze or explore possible strategies for solving the problems, possibly because they were not aware that they should or because they thought the first answer had to be the correct one.
A frequently observed approach might be called trial and error with the four operations. The student would try a particular operation (e.g., addition) which involved all the numbers in the problem regardless of their pertinence to the solution. The student would then look at the answer and decide whether to choose another operation or accept this answer. Even when closely questioned, most of these students were unable to explain their reasoning. The most frequently heard responses were: "It (the answer) doesn't look right," or "I don't know" (why I chose that particular operation).


One way to discourage this approach is to ask questions calling for analytic or synthetic thought. Students should be asked to identify extraneous information in a problem setting, to vary conditions in a problem, or to write a question which could be solved using the given information. This focuses the students' attention on elements of the problem setting that they may not notice when problems are presented in the more standard way. It seems important, too, that these types of questions not only be used in class discussions, but also in homework sets and on tests. This will emphasize to the students the importance of developing these skills. The following are two examples of problem settings and questioning sequences that could easily be included on a worksheet or a test.
Example 1.
Tell which information is needed and which information is extraneous (not needed) to answer each question a to d.

Weekend Telephone Discount Rates
                   First minute    Each additional minute
Chicago            $.19            $.14
San Francisco      $.20            $.15

Last month, Alice made 6 calls to Chicago totalling 42 minutes, and 9 calls to San Francisco totalling 54 minutes. All the calls were made on the weekend.
a) On the average, how long was a call to Chicago?
b) What is the cost of an average call to Chicago?
c) What is the cost of an average call to San Francisco?
d) What is Alice's phone bill?
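For checking student work, the quantities in Example 1 can be computed directly. The sketch below assumes the rates as reconstructed above ($.19/$.14 for Chicago, $.20/$.15 for San Francisco) and that each call is billed one first minute plus additional minutes:

    # Worked answers for Example 1, under the assumptions stated above.
    def call_cost(minutes, first, additional):
        return first + (minutes - 1) * additional

    avg_chicago = 42 / 6                            # a) 7 minutes
    b = call_cost(7, .19, .14)                      # b) $1.03
    c = call_cost(54 // 9, .20, .15)                # c) $0.95
    bill = 6 * b + 9 * c                            # d) $14.73
    print(avg_chicago, round(b, 2), round(c, 2), round(bill, 2))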
Example 2.
A factory has two machines which make hamburger patties. Machine A produces 76 hamburger patties per minute and machine B produces 92 hamburger patties per minute. The total number of hamburger patties produced at the factory in one hour would be computed as follows:

There are 60 minutes in one hour,
76 X 60 = 4560 hamburger patties produced by machine A,
92 X 60 = 5520 hamburger patties produced by machine B,
4560 + 5520 = 10,080 hamburger patties produced in one hour.

a) Is the above solution correct?
b) Suppose machine A is speeded up to produce 85 hamburger patties per minute (instead of 76). What would be the total number of hamburger patties produced in one hour?
c) Suppose machine B is slowed down to produce 86 hamburger patties per minute (instead of 92). What would be the total number of hamburger patties produced in one hour?
d) Suppose machine A starts at 9:00 a.m. but machine B is broken and doesn't start until 9:30 a.m. What would be the total number of hamburger patties produced by 10:00 a.m.?
e) Suppose the hamburger patties produced in one hour are packaged in boxes holding 300 hamburger patties each. How many boxes of hamburger patties would be packaged?
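The variations in parts b through e are all rate-times-time arithmetic; a quick computational check:

    # Computational check of Example 2's variations (parts b-e).
    rate_a, rate_b = 76, 92                      # patties per minute

    print(85 * 60 + rate_b * 60)                 # b) 10,620 in one hour
    print(rate_a * 60 + 86 * 60)                 # c) 9,720 in one hour
    print(rate_a * 60 + rate_b * 30)             # d) 7,320 by 10:00 a.m.
    print((rate_a * 60 + rate_b * 60) // 300)    # e) 33 full boxes (180 left over)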
Questioning sequences such as these require the student to attend to the data and relationships in the problem. At times there may be easier ways to obtain a final answer, but sequences which require analytic and synthetic reasoning may indeed help develop general problem solving skills. As Einstein is supposed to have said:

. . . ability to see the problem is more important than the ability to solve it. . . . What is the value of teaching a person to be creative in mathematics if every question he is faced with in the final exam has one and only one right answer?

Implications for Research


Several observations were made during this study that suggest further psychometric investigations of the IPSP test and hypotheses for
experimental studies.


IPSP Test
1. A previously stated limitation of the IPSP test was that it
was administered to an Iowa population. To further verify the results
obtained in this study the IPSP test should be validated with other
samples of fifth through eighth graders. The IPSP test has also been
used with a group of remedial college mathematics students with reliable
results (Bellile, 1980). There may be some value in exploring the specific differences in performance across various samples.


2. Generative models of each of the steps in the testing model should be designed and developed for further use in diagnostic testing and for additional data on the problem solving process. Not only should
the numbers for the stems and foils be computer generated but also settings for items and foils. A student could be presented items with the
same stems but questions from the various steps of the IPSP model on a
computer in interactive mode. Alternatively, the items can be generated and printed directly from the computer. The interactive mode is
preferable since branches may be used to further explore the students'
thought processes.
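A minimal sketch of what such a generator might look like is given below. This is our illustration, not IPSP code: the project did not publish a generator, and the stem template and foil rules here are hypothetical, patterned after one of the interview problems in Appendix C. The numbers in a stem are drawn at random, and the foils are derived from predictable errors such as choosing the wrong operation:

    import random

    # A sketch of a generative item model (our illustration, not IPSP code).
    STEM = ("Judy baked {a} cookies and Jay baked {b} cookies. "
            "How many cookies did they bake altogether?")

    def generate_item():
        a, b = random.randint(12, 48), random.randint(12, 48)
        key = a + b
        # Foils modeled on observed errors: wrong operation, or one number only.
        foils = {abs(a - b), a * b, max(a, b)}
        foils.discard(key)  # guard against a foil colliding with the key
        return STEM.format(a=a, b=b), sorted(foils | {key}), key

    stem, options, key = generate_item()
    print(stem)
    print("options:", options, "key:", key)

In an interactive setting, the student's choice among the options could drive a branch to a follow-up question drawn from another step of the model.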
3. Factor analysis should be employed to determine characteristics
of those students who scored high on each individual subtest.
4. The predictive validity of the IPSP test as a diagnostic instrument should be studied. One approach would be an experimental study in which one group of students was taught those skills which the IPSP test indicated they lacked, while another group was taught general problem solving skills. Performance on a problem solving posttest would be the criterion. A pilot study with this design was run in December, 1978, with neutral results. The study is described in Appendix G. A refinement of the teaching methods with a longer treatment time should be incorporated in a follow-up study.
5. Several findings in this study have implications for problem
solving process research.
a) The infrequent occurrence of looking back behavior in the interviews is important. Other investigators report similar findings. For example, Kantowski (1975) noted, "When looking back strategies were used, they were for the most part, to check the solutions of problems which involved computations" (p. 108). The present investigator also observed that when students were asked to check their answers, most of them would check the computations they used, but only the brighter students would also check the solution strategy against the conditions of the problem. Some important research questions are: Is looking back important? Is it taught? Should it be taught? If so, how?
b) Step 1 and step 4 skills are very closely related. Perhaps looking back involves essentially the same processes as getting to know the problem initially. The unique aspects of the relationship between these skills should be investigated so that better teaching methods can be designed.

c) Reading comprehension is not a major correlate of problem solving skill, even of the specific skills tested by the IPSP test. To improve problem solving skill, we must look
to special types of reading, organizational and logical
skills in connection with understanding of mathematical
concepts.
d) It is interesting that step 3 skills were less closely related to the ITBS measures than were skills in the other steps. Perhaps estimation skills, use of tables, use of diagrams, etc., which underlie step 3, are less dependent on reading skills, reasoning skills, and knowledge of mathematical concepts than are the abilities underlying step 1 and step 4. In addition, the fact that this trend was more pronounced in grades 5 and 6 than in grades 7 and 8 may indicate a developmental effect (but, of course, may be an artifact of subtle test differences). At any rate, further research is needed to help understand this aspect of problem solving ability.


APPENDIX A

DESCRIPTION OF THE IOWA PROBLEM SOLVING PROJECT

THE IOWA PROBLEM-SOLVING PROJECT*


The importance of problem solving in mathematics is well established. Teachers and parents alike regard the ability of students to solve problems as one of the major objectives of schools.
The need for materials to teach problem solving is also well recognized. National tests point up a weakness in existing mathematics programs; students simply don't perform as well in problem solving as we'd like.
The Iowa Problem-Solving Project is developing materials to meet this need. Funded by a Title IV Grant, the Project authors are directing their efforts to upgrade students' and teachers' problem solving experiences in grades 5, 6, 7, and 8.
THE MODEL
The instructional model used in the Project has four steps:
Getting to Know the Problem
Choosing What to Do
Doing It
Looking Back
Students and teachers use the model as a frame of reference. The model provides a language to use in discussing and analyzing a problem. Each step in the model is crucial; each step has implications for teaching.

*Excerpted from A Proposal on First Year Funding Under P.L. 93-380, Title IV, Part C. George Immerzeel, Project Director.
What can the teacher do to help students get to know the problem? What can the teacher do to help students choose what to do? What can the teacher do to help students properly carry out their plan, that is, to do it? What can the teacher do to help students look back at what they've done in solving a problem and to extend their students' thinking to other problems?
The model provides a structure in which to operate. All problems, in some degree, are solved by progressing down a common path, the path to problem solving.
The Iowa Problem-Solving Project materials build in references to the model. Teachers and students are expected to use it.

THE CALCULATOR
Now that hand-held calculators are commonly available, they may be used to solve problems which previously were beyond the ability of many students. Problems can be used which are more like those which arise in everyday living: real-world problems. Freed from much of the computational drudgery, the student may focus on the problem solving process and those skills needed to solve problems.
Throughout the Iowa Problem-Solving Project students are expected to use a calculator whenever they wish. Many problems don't require much calculation; in some problems the calculations can be done mentally. Therefore the student must learn when the use of a calculator is appropriate.

The Iowa Problem-Solving Project has written one module specifically to introduce students to using calculators. A second, more advanced, module will be developed later. Throughout the other modules,
the calculator will be used as needed.
THE MODULES
In solving problems, a variety of tools, skills, and strategies are needed. The larger the students' kit of tools, the more problems they can successfully solve.
The Iowa Problem-Solving Project is developing six instructional units to build some of these skills. The units are:
Using Guesses to Solve Problems
Using Tables to Solve Problems
Using Resources to Solve Problems
Using Models to Solve Problems
Using Computation to Solve Problems
Using Calculator Codes to Solve Problems.
Each module consists of a booklet and card deck. The booklet, usually about 30 pages long, will provide experiences to develop a particular skill. The card deck, usually about 100 problems, will provide
practice in solving problems which are especially suited to that skill.
USING THE MATERIALS
The eight modules (2 for the calculator, 6 for the problem solving
skills) will be taught in grades 55 6, 7, and 8two modules each year.
Students having all eight modules will have a rich experience seldom
found in existing programs.

A module is expected to be taught in the mathematics period, replacing the usual instruction during its use. The skills booklets take approximately one week and involve some discussion with the teacher. The problem cards, however, are suited to individual, partnership, or small group work. Students are not expected to work on all 100 cards in a deck, but rather they will select those problems suited to their ability and interest. It is suggested that one week of practice in the problem deck also be taken from the regular mathematics period, after the work in the skills booklet has been completed. During the rest of the year, the card deck should be available to students to work on as they have time.
The first use of the modules will be in selected classrooms in Price Laboratory School in Cedar Falls. Following the pilot testing, the Project will revise the materials, readying them for a broader tryout.
The tryout of the materials will be conducted in Iowa schools (to get feedback from varied settings). Inservice work with teachers in the tryout schools will be conducted jointly by the Project team and representatives of Area Education Agencies.
Following the tryout, the materials will be revised and readied for wide distribution. The first distribution is planned for the fall of 1978.
EVALUATION OF THE PROJECT
An integral part of the Project is the development of instruments
to measure students' problem-solving skill. Existing instruments are generally narrow in scope, tapping only a few of the tools commonly used, and often interwoven with computation difficulties.
Concurrently with module development, the Project is building a problem-solving test, a test that is sensitive to the four steps in the model and that encompasses the variety of skills found in the Project.
The test development involves the tryout of many test items and
validation in the classroom using trained observers and teachers.


SAMPLE PROBLEM

What sum are you most likely to get when you roll two dice? Roll two dice 25 times and keep track of the sums.

GET TO KNOW THE PROBLEM: Will you add 2 numbers each time you roll the dice? What will you look for when you're finished rolling the dice?

CHOOSE WHAT TO DO: How will you keep track of the sums? Could you use a list? Could you use a graph?

DO IT: How many times do you roll the dice?

LOOK BACK: Could your answer be as large as 13? What would your result be if you rolled the dice 50 times?

IPSP CALENDAR

[Bar chart, not reproducible here: each activity below is plotted against a time line running from September 1976 through May 1979, with the try-out period marked in early 1978.]

Develop Instructional Model for Problem-Solving Process
Problem-Solving Process Handbook for Teachers
Using the Calculator with Whole Numbers
Using Guesses in Problem-Solving
Using Tables in Problem-Solving
Using Resources in Problem-Solving
Using the Calculator with Decimals
Using Models in Problem-Solving
Using Computation in Problem-Solving
Using Equations in Problem-Solving
Problem-Solving Instruments for Summative Evaluation
Study of Problem-Solving Project
FLOW CHART OF IPSP MODULE DEVELOPMENT

Write Problem Bank Examples -> Identify Entry Skills -> Write Objectives for Skills Booklet -> Complete Problem Bank -> Write Skills Booklet -> Pilot Module -> Evaluate (if revision is needed, revise the module and pilot again) -> Write Teachers' Guide -> Try-out Module -> Evaluate (if revision is needed, revise the module and try out again) -> Disseminate
94

APPENDIX B

PHASES OF IPSP TEST DEVELOPMENT AND SUMMARY OF TEST FORMS

95

APPENDIX B

PHASES OF IPSP TEST DEVELOPMENT AND SUMMARY OF TEST FORMS


Time Table for IPSP Project and Test Validation

September, 1976: Iowa Problem Solving Project funded.
October-December, 1976: IPSP test model developed; trial test items written for each step; item tryouts with individual students.
January, 1977: Version 1 of IPSP test in group tryout.
March, 1977: Version 2 (revised version 1) of IPSP test in group tryout.
January-September, 1977: Development of interview protocols and quantification scheme.
October, 1977: Iowa Test of Basic Skills administered to Iowa students, which includes the representative sample used in the validation.
December, 1977: Pilot interview study at Malcolm Price Lab School; forms 561 and 781 were administered as pretests and forms 562 and 782 as posttests to all students. Sixty students were selected for a mini-study to explore the utility of the IPSP test as an aid in instructional decision making. The quantification scheme and interview protocol were used in transcribing think-aloud interviews of 32 students.
January, 1978: Version 3 of IPSP test in group tryout with the Iowa representative sample.
January-April, 1978: Revisions of interview protocol and quantification scheme completed; evaluation design of IPSP project refined.
March-April, 1978: Pilot run for summative evaluation of IPSP; forms 561 and 781 administered as pretest while forms 562 and 782 were administered as posttest. Data for attitudes toward word problems and calculators were also collected.
April-May, 1978: Final interview study at an Iowa City elementary school; form 561 was administered to 55 students who were also interviewed in a think-aloud setting.
January-August, 1978: Analyze and summarize the validation data; develop final test forms.
October, 1978: Summative evaluation: administer IPSP test forms 565 and 785 as pretests, and pre-attitude scales for word problems and calculators.
October, 1978-March, 1979: Collect and analyze data obtained from October test; mail and collect monthly summary forms involved in treatment groups.
March, 1979: Summative evaluation: administer IPSP test forms 566 and 786 as posttests, and post-attitude scales for word problems and calculators.
April-September, 1979: Collect, analyze and summarize evaluation data.

FLOW CHART OF IPSP EVALUATION INSTRUMENTS DEVELOPMENT

Identify problem-solving behaviours to be measured -> Write trial test items -> Initiate validation procedure -> Interview tryouts -> Analyze data (revise as needed) -> Try out first version of test -> Analyze data (if revision is needed, revise) -> Try out second version of test -> Analyze data (if revision is needed, revise) -> Use in final forms

APPENDIX C

INTERVIEW PROBLEMS USED IN THE PILOT STUDY

What are 5 cookies plus 4 cookies?


8 more chairs are added to 6 chairs around a table. How many
chairs are there altogether?
Suppose you bought a coke for 20¢ and a bag of potato chips for 30¢. How much did you pay altogether?
There were 6 hotdogs in a basket. Trudy ate 2 of them. How many were left in the basket?

Suppose you had 10 pieces of candy and you gave away 4 of them.
How many pieces would you have left?
Judy baked 18 cookies and Jay baked 24 cookies for the homeroom
party. How many cookies did Judy and Jay bake altogether?

Mrs. Mason looked at the ad and decided to buy one loaf of apple bread and one loaf of rye bread. How much did she pay for the bread altogether?

[Ad: "DANISH APPLE BREAD, 1 loaf, 57¢; WHOLE WHEAT or RYE BREAD, 1 loaf, ...¢" — the rye bread price is illegible in the source.]

At the fair Art burst 4 balloons with his first set of darts and
6 balloons with his second set. How many balloons did Art burst
altogether?
Kelly spent 68 cents for lunch and has 21 cents left. How much
did she have to begin with?
At the beginning of the year the school library had 2768 books. At the end of the year 163 had not been returned. How many books were left?
Jerry had 18 baseball cards and David has 23 cards. Then David
gave 10 of his cards to Jerry. How many baseball cards do Jerry
and David have altogether?
Amy and Beth washed the 10 windows in their house. How many
windows did each girl wash?
How far does Jim run if he starts at home plate and runs all around the bases back to home plate again?

[Diagram: a baseball diamond showing home plate, first base, second base, and third base; the distance between bases is illegible in the source.]
Mr. Price earned $75 in each of 8 weeks. How much did he earn
for all 8 weeks?
Danny who is 11 years old saves all of his allowance. If he gets
75 cents a week how much will he have saved at the end of 4 weeks?
Jackie worked for 5 hours and earned a total of $8.75. What was
Jackie's average pay per hour?

Mike earned $2 for each car he washed. Saturday he washed 4 cars


in the morning and 3 cars in the afternoon. How much money did
he earn?

3.2. Jan arrived with $3.50 to spend at the fair. Will she be able to go on every ride once and still have $1.25 for bus fare home? [Figure: a list of the fair's rides, including a roller coaster, with the price of each ride; the prices are illegible in the source.]

3.3. Mark bought a flashlight and two batteries. The flashlight cost 89¢. How much did the batteries cost?

3.4. A new school has 2 classrooms for each of the grades. How many classrooms will there be for grades one through six?

3.5. Mrs. Harvey bought fifteen 13 cent stamps at the post office. How much did the 15 stamps cost?

3.6. In art class one day Mrs. White arranged the tables in 2 rows with 12 in each row. The next day she arranged the tables in 3 rows with 11 tables in each row. On which day did she have more tables in the room and how many more were there?

3.7. Suppose you have traded two big marbles for 4 little marbles. At that rate how many big marbles will you need for 16 little marbles?

4.1. At a school fair 25 candy apples were sold, but only 1/5 as many plain apples. How many apples were sold altogether?

4.2. Tom and Dick ate 5 hamburgers. How many hamburgers did each one eat?

4.3. Joe is saving money to buy a car which costs $475. He will have enough money to buy the car if he saves $235 more. How much has he saved?

4.4. Jenny was given $15 for a birthday present. She used the money to buy a calculator which had been reduced from its regular price of $12.99 to $7.82. How much did Jenny have left?

4.5. I bought 6 boxes of crackerjacks expecting to find one prize in each box. However, 3 boxes had double prizes in them. How many prizes did I get from all 6 boxes?

Susie has $2. How long can she afford to park her car?

PARKING RATES:
First half hour: 75¢
Second half hour: 50¢
Each additional half hour: 25¢

Jack wanted to buy a skateboard. When he showed this ad to his sister Janne, she decided to buy one also. How much did each pay for the skateboard if they shared the total cost equally?

HALF PRICE SALE!!! SKATEBOARDS: buy one for $24.99, order another one for ONLY $12.49!!!! Plus $4.55 for handling and postage on each order of 2 skateboards sent to the same address.

Doughnuts are sold for 15¢ each or 6 for 85¢. What is the largest number of doughnuts you can buy for $4.55?

If a whole number is divided by 6 what is the greatest value the remainder may have?

Ten boxes of apples each weighing 32 kilograms, and 5 boxes of pears were delivered to a store. How many kilograms of fruit were delivered?

Mr. Terry had $15.30 to buy some tickets to a game. He bought 3 adult tickets at $2.25 each and 5 children's tickets at 75¢ each. How much money did Mr. Terry have left?

A bag of sand weighs 5 pounds. How many bags are needed for 350 pounds of sand?

Your class would like to raise $520 by selling candy which is packed in 8 ounce boxes. How many boxes would the class have to sell to make that much money?

Everything weighs 6 times as much on earth as it does on the moon. A rock weighs 24 pounds on the moon. How much does it weigh on earth?

Mrs. Cain spent $10 on a shopping trip. She bought a scarf for $2.35 and 3 pairs of mittens with the remaining money. How much money did one pair of mittens cost?

Mr. Jones earns $23.75 a day. What is his monthly pay if he works 20 days a month at 6 hours each day?

What is the area of the shaded part of the figure at the right? [Figure: a shaded region; the only legible dimension is 3 mm.]

How many slices of bread would you have if you sliced a loaf in 19 different places?

There are books in two bookcases, an equal number on each shelf. In the second bookcase there are twice as many shelves, and twice as many books on each shelf. How many times more books are there in the second bookcase than in the first?

Gene plans to buy a new car. He thinks he will be driving it about 10,000 miles or 16,000 kilometers a year. About how much gas will he save if he buys an SR108 model rather than an ETD8 model? [The fuel-economy figures for the two models appeared in an accompanying table that is illegible in the source.]

An airplane flew 260 kilometers in the first hour and 30 kilometers more in the second hour than in the first. In the third hour it flew 250 kilometers less than in the first two hours together. How many kilometers did the airplane fly in the three hours?

In order to qualify for a race a driver must complete 4 laps and 3 of these laps must have a time of 55.8 seconds or less. Who qualified for the race?

Driver    lap 1   lap 2   lap 3   lap 4
Hoyt      55.9     --     56.8    56.0
Gunner    54.8     --     56.2    55.4
Jones     54.6     --     56.6    54.9
(the lap 2 times are illegible in the source)

On a test a student got 129 of the 169 problems correct. What percent did he get correct?

A sports special was on TV for 90 minutes. Seventeen minutes of that time was used for commercials. What percent of the time was used for commercials?

A ship traveled 240 kilometers in 12 hours. In how many hours will it travel 800 kilometers at that rate?

Jake answered this ad and got the job. The first week his sales totaled $210. How much did Jake earn that week?

WANTED - SALES CLERK!! Base Pay $55 per Week PLUS 30% Commission on Sales Over $150. Apply in Person.

Jo's math scores for the first 4 weeks were 84, 95, 89, and 88. What was her average score? The math teacher said anyone who had an average of 90 or better for the first 5 weeks would get an A. What is the lowest score Jo could make on her fifth test to get an A?

A rectangular lot is 120 feet by 78 feet. A swimming pool built on the lot is a rectangle 15 ft. by 17 ft. What is the area left after constructing the pool?

A drink was made of 200 liters of orange juice, 3 liters of orange coloring and 150 liters of water. What must it be labeled according to the chart at right?

LABEL                     % ORANGE JUICE
Orange Juice              100
Orange Juice Blend        70-95
Orange Juice Drink        30-70
Orange Drink              10-35
Orange Flavored Drink     less than 10

Two bicyclists each traveled a distance of 52.8 km. The first was on the road for 6 hours, and the second for 4 hours. How many kilometers per hour did each bicyclist travel?

As soon as a new car is sold its value starts to go down. During the first year a new car loses about 30% of its original price. During the second year it loses 20% of its value at the end of the first year. If a new car costs $5500, how much will it be worth at the end of the second year?

[Two computation fragments appear at this point in the source: (170 + 175 + 180) ÷ 3, and 889 + 887; the text of the problem they belonged to is illegible.]

Corn was being transported in trucks. The first truck carried 3 tons, the second carried 5 tons and the third carried half as much as the second truck. How many tons of corn were being carried in the first and second trucks together?

7.4. A frog fell to the bottom of a 30 foot deep well. Every day he managed to climb up 4 feet but every night he slipped back 3 feet. How many days did it take the frog to reach the top of the well?

7.5. On a main highway a Shell station is 18 miles from a Standard station. A Texaco station is to be built on the same highway. The Texaco station is to be built twice as far from the Shell station as it is from the Standard station. How far will the Texaco station be from the Shell station?

7.6. The floor of a room has to be painted. The room is 7 meters long and 3 meters high. How much does it cost to paint the floor if 1 square meter of paint costs 75 cents?

7.7. Twelve persons carried 12 loaves of bread. Each man carried 2 loaves, each woman carried 1/2 a loaf and each child carried 1/4 a loaf. How many men, women, and children were there?

7.8. There are two special rectangles. Their lengths and widths are whole numbers. For each rectangle the area and the perimeter is the same number. Find the lengths and widths of the rectangles.

7.9. What is the area of the shaded part of the figure at the right? [Figure illegible in the source.]

7.10. A farmer sowed a plot with oats and two more plots with wheat. He harvested [number illegible] times as much oats as he sowed and 8 times as much wheat. How much more wheat did he harvest than oats?

APPENDIX D

IOWA TEST OF BASIC SKILLS RELIABILITY ANALYSIS
NATIONAL REPRESENTATIVE SAMPLE
Table 12
Iowa Test of Basic Skills Reliability Analysis
National Representative Sample

Grade   Level     N     Test    Mean    S.D.   SE(m)     r
  5      11     2578     R      33.6   13.67    3.7     .93
                         W-2     9.1    4.24    2.1     .75
                         M-1    17.8    7.02    2.9     .82
                         M-2    12.5    5.34    2.4     .80
  6      12     2558     R      32.1   13.04    3.8     .92
                         W-2    10.3    4.30    2.4     .70
                         M-1    19.1    7.71    3.0     .85
                         M-2    12.3    5.65    2.5     .81
  7      13     2600     R      37.2   14.38    4.0     .92
                         W-2    12.6    5.19    2.3     .80
                         M-1    21.9    8.80    3.0     .88
                         M-2    13.1    5.90    2.5     .82
  8      14     2679     R      40.3   14.93    3.9     .93
                         W-2    13.6    5.38    2.3     .81
                         M-1    22.6    8.70    3.1     .88
                         M-2    14.2    5.61    2.5     .80

Note. S.D. = Standard Deviation; SE(m) = Standard Error of Measurement; r = Split-Halves Reliability.

Source: Hieronymus, A. N., and Lindquist, E. F. Manual for Administrators, Supervisors and Counselors: Iowa Test of Basic Skills. Boston: Houghton Mifflin Co., 1974.

APPENDIX E

FORMULA FOR ITBS CORRELATIONS CORRECTED FOR ATTENUATION

Assume we have a linear regression equation

$$y = \beta_R R + \beta_{W_2} W_2 + \beta_{M_1} M_1 + \beta_{M_2} M_2 + d$$

where $R$, $W_2$, $M_1$, $M_2$ are the scores given in the four Iowa Tests of Basic Skills: Reading, Graph Skills, Mathematics Concepts and Mathematics Problem Solving, respectively. Also, for each of these tests we have a standard deviation and reliability coefficient. Denote these by $\sigma_R, \rho_R$; $\sigma_{W_2}, \rho_{W_2}$; etc. We wish to determine the reliability coefficient $\rho$ of the set of scores $w = y - d$ in terms of the reliability coefficients $\rho_R$, $\rho_{W_2}$, $\rho_{M_1}$ and $\rho_{M_2}$.

From the procedure described on p. 200 of Lord & Novick (1968) we derive the formula:

$$\rho = \frac{\beta_R^2 \sigma_R^2 \rho_R + \beta_{W_2}^2 \sigma_{W_2}^2 \rho_{W_2} + \beta_{M_1}^2 \sigma_{M_1}^2 \rho_{M_1} + \beta_{M_2}^2 \sigma_{M_2}^2 \rho_{M_2}}{\beta_R^2 \sigma_R^2 + \beta_{W_2}^2 \sigma_{W_2}^2 + \beta_{M_1}^2 \sigma_{M_1}^2 + \beta_{M_2}^2 \sigma_{M_2}^2} \qquad (1)$$

Here we have used

$$\sigma_w^2 = \beta_R^2 \sigma_R^2 + \beta_{W_2}^2 \sigma_{W_2}^2 + \beta_{M_1}^2 \sigma_{M_1}^2 + \beta_{M_2}^2 \sigma_{M_2}^2 \qquad (2)$$

and have assumed that the $\beta$'s have been normalized so that their sum is 1. This last assumption will not affect the value of $\rho$ since it merely multiplies all the scores $y - d$ by a scaling factor $c$ whose square appears as a factor in both the numerator and denominator of equation (1).

The derivation of (2) is standard and is obtained as follows:

$$N\sigma_w^2 = \sum (w - \bar{w})^2 = \sum \Big( \sum_i \beta_i (w_i - \bar{w}_i) \Big)^2 = \sum_i \beta_i^2 \sum (w_i - \bar{w}_i)^2 + \sum_{i \neq j} \beta_i \beta_j \sum (w_i - \bar{w}_i)(w_j - \bar{w}_j) = N \sum_i \beta_i^2 \sigma_i^2,$$

where the cross-product terms drop out, $N$ is the total number of scores, $i$ and $j$ index the set $(R, W_2, M_1, M_2)$, $\bar{w}$ is the mean of the set of scores $y - d$, and $\bar{w}_i$ is the mean of the scores $w_i$.
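As a quick numerical check (ours, not part of the original appendix), equation (1) can be evaluated directly. The sketch below uses the level 14 values from Table 12 in Appendix D with equal, normalized weights as illustrative inputs; the actual weights would come from the regression described above.

    # A sketch of equation (1): reliability of the composite w = y - d.
    # The equal weights and the level 14 inputs are illustrative only.

    def composite_reliability(betas, sigmas, rhos):
        num = sum(b * b * s * s * r for b, s, r in zip(betas, sigmas, rhos))
        den = sum(b * b * s * s for b, s in zip(betas, sigmas))
        return num / den

    # ITBS level 14 (grade 8): R, W-2, M-1, M-2
    sigmas = [14.93, 5.38, 8.70, 5.61]
    rhos = [0.93, 0.81, 0.88, 0.80]
    betas = [0.25, 0.25, 0.25, 0.25]  # normalized so their sum is 1

    print(round(composite_reliability(betas, sigmas, rhos), 3))  # 0.898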

APPENDIX F

ANALYSIS OF IPSP TEST RESULTS
OCTOBER 1978 AND MARCH 1979 ADMINISTRATIONS
Test Form 565
This test was administered to a sample of Iowa fifth and
sixth graders on or about October 1, 1978. Pertinent results
by grade level are reported here.

                  Grade 5 (N=1215)               Grade 6 (N=1314)
Subtest       Mean    S.D.   Reliability     Mean    S.D.   Reliability
   1          5.41    2.54      0.77         6.62    2.44      0.77
   2          6.44    2.10      0.72         7.23    1.87      0.68
   3          4.96    2.47      0.78         5.99    2.36      0.77
 Total       16.31    6.22      0.87        19.83    5.76      0.86

Percentile Ranks

           Subtest 1        Subtest 2        Subtest 3
Raw
Score     5th     6th      5th     6th      5th     6th
 10        97      94       97      95       99      98
  9        89      80       88      81       95      90
  8        81      --       74      62       86      77
  7        71      51       58      42       76      62
  6        59      38       40      25       64      48
  5        46      27       25      13       51      34
  4        32      17       14       6       37      22
  3        19       9        6       3       25      13
  2        10       4        3       1       14       6
  1         4       1        1       1        6       2
  0         1       1        1       1        1       1

(one entry, subtest 1 for grade 6 at raw score 8, is illegible in the source)
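Percentile ranks like those above are conventionally obtained from the raw-score frequency distribution; a common convention counts everyone below a given score plus half of those at the score. The sketch below is ours, for illustration only (the frequencies are made up, and the source does not state which convention it used):

    # Percentile ranks from a raw-score frequency table (a sketch; the
    # frequencies are hypothetical, and the midpoint convention is assumed).

    def percentile_ranks(freq):
        """freq[s] = number of students with raw score s (0..max)."""
        n = sum(freq)
        ranks, below = [], 0
        for f in freq:
            ranks.append(round(100 * (below + 0.5 * f) / n))
            below += f
        return ranks

    freq = [10, 30, 60, 110, 160, 200, 220, 180, 130, 70, 45]  # scores 0..10
    print(percentile_ranks(freq))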

[Table (printed sideways in the source and illegible): 30-item total test percentile ranks, apparently for Test Form 565, grades 5 and 6.]

Test Form 566


This test was administered to a sample of Iowa fifth and
sixth graders on or about March 15, 1979. Pertinent results by
grade level are reported here.

                  Grade 5 (N=1161)               Grade 6 (N=1184)
Subtest       Mean    S.D.   Reliability     Mean    S.D.   Reliability
   1          6.40    2.16      0.69         7.25    2.05      0.71
   2          7.04    1.62      0.58         7.57    1.56      0.59
   3          5.50    2.22      0.71         6.39    2.14      0.71
 Total       18.94    4.98      0.81        21.20    4.75      0.81

Percentile Ranks

           Subtest 1        Subtest 2        Subtest 3
Raw
Score     5th     6th      5th     6th      5th     6th
 10        96      93       98      96       99      97
  9        87      78       88      81       94      89
  8        74      60       69      57       85      75
  7        58      41       46      33       72      58
  6        42      26       26      15       58      41
  5        26      15       13       6       42      26
  4        15       8        4       2       27      14
  3         7       3        1       1       15       7
  2         3       2        1       1        6       3
  1         1       1        1       1        2       1
  0         1       1        1       1        1       1

Test Form 566

30-Item Total Test Percentile Ranks

Raw Score   Grade 5   Grade 6
   30          99        99
   29          99        98
   28          98        95
   27          96        91
   26          92        85
   25          88        77
   24          83        69
   23          77        61
   22          70        53
   21          63        44
   20          56        36
   19          49        29
   18          42        23
   17          36        18
   16          30        14
   15          23        10
   14          17        --
   13          13        --

(entries for raw scores below 13, and the grade 6 entries at raw scores 14 and 13, are illegible in the source)

Test Form 785


This test was administered to a sample of Iowa seventh and
eighth graders on or about October 1, 1978. Pertinent results
by grade level are reported here.

                  Grade 7 (N=1078)               Grade 8 (N=1101)
Subtest       Mean    S.D.   Reliability     Mean    S.D.   Reliability
   1          6.16    2.43      0.77         6.93    2.30      0.77
   2          5.86    2.10      0.67         6.48    2.08      0.68
   3          5.37    2.18      0.69         5.96    2.19      0.70
 Total       17.38    5.72      0.84        19.38    5.57      0.84

Percentile Ranks

           Subtest 1        Subtest 2        Subtest 3
Raw
Score     7th     8th      7th     8th      7th     8th
 10        96      94       99      97       99      98
  9        86      79       94      88       95      92
  8        73      61       83      74       87      80
  7        59      45       68      56       74      65
  6        46      32       50      39       60      49
  5        33      21       34      25       45      34
  4        22      13       21      13       29      20
  3        13       7       11       6       16      10
  2         6       3        4       3       --      --
  1         1       1        1       1        2       1
  0         1       1        1       1        1       1

(two entries, subtest 3 at raw score 2, are illegible in the source)

Test Form 785

30-Item Total Test Percentile Ranks

Raw Score   Grade 7   Grade 8
   30          99        99
   29          99        99
   28          98        96
   27          97        93
   26          94        88
   25          90        83
   24          85        78
   23          80        71
   22          76        64
   21          70        57
   20          64        51
   19          59        44
   18          53        37
   17          47        32
   16          41        27
   15          35        23
   14          30        19
   13          25        15
   12          21        12
   11          16        --
   10          12        --

(entries for raw scores below 10, and the grade 8 entries at raw scores 11 and 10, are illegible in the source)

Test Form 786


This test was administered to a sample of Iowa seventh and
eighth graders on or about March 15, 1979. Pertinent results by
grade level are reported here.

                  Grade 7 (N=910)                Grade 8 (N=1024)
Subtest       Mean    S.D.   Reliability     Mean    S.D.   Reliability
   1          6.28    2.15      0.70         6.76    2.13      0.72
   2          6.04    2.14      0.66         6.60    2.13      0.68
   3          5.61    2.39      0.73         6.15    2.29      0.72
 Total       17.93    5.67      0.83        19.51    5.55      0.84

Percentile Ranks

           Subtest 1        Subtest 2        Subtest 3
Raw
Score     7th     8th      7th     8th      7th     8th
 10        97      95       98      97       98      97
  9        88      84       91      87       92      87
  8        75      68       79      70       81      76
  7        61      51       63      53       68      62
  6        44      35       48      36       55      46
  5        29      22       33      23       41      32
  4        16      12       19      13       28      20
  3         7       6       10       6       16      10
  2         3       2        3       3        7       4
  1         1       1        1       1        3       1
  0         1       1        1       1        1       1

[Table (printed sideways in the source and illegible): 30-item total test percentile ranks, apparently for Test Form 786, grades 7 and 8.]

APPENDIX G

PILOT TEACHING STUDY

A pilot teaching study was conducted in December, 1978, in conjunction with the pilot interview study. The schedule of activities is shown in Figure 14. After the IPSP tests were scored and the interviews were completed, the standard scores (mean 50; S.D. 10) of each student for each step of the IPSP test were computed. Those students whose score on step 1 (understanding the problem) was at least 10 points less than their score on either of the other two steps in the model were selected for treatment regardless of score achieved or grade level. A total of 60 students met this criterion and were randomly assigned to one of three treatment groups: group one was given a three-day treatment in which step 1 only was taught, group two was taught all four steps during the three days, and group three attended regular mathematics classes. Equivalent forms of the IPSP test were administered to all three groups after the treatment. Groups one and two were taught by two IPSP teachers while group three was taught by the regular classroom teacher.
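The selection rule is easy to state computationally. The sketch below is ours, for illustration only (the score values shown are hypothetical); it flags a student whenever the step 1 standard score trails either of the other step scores by 10 or more points:

    # Selection rule from the pilot study (a sketch; the example scores
    # are hypothetical, not actual student records).

    def standardize(raw_scores):
        """Convert a list of raw scores to standard scores (mean 50, S.D. 10)."""
        n = len(raw_scores)
        mean = sum(raw_scores) / n
        sd = (sum((x - mean) ** 2 for x in raw_scores) / n) ** 0.5
        return [50 + 10 * (x - mean) / sd for x in raw_scores]

    def selected(step1, step2, step3):
        """True if the step 1 standard score is at least 10 points below
        the score on either of the other two steps."""
        return step2 - step1 >= 10 or step3 - step1 >= 10

    print(selected(38, 52, 47))  # True: step 1 trails step 2 by 14 points
    print(selected(48, 52, 47))  # False: neither gap reaches 10 points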
The group means on the IPSP post-test were not significantly different. However, a number of improvements in the design and treatments could yield different results. In later discussions with the two participating teachers, many reasons were conjectured for the results.

Schedule for Pilot Study

Date: Activity (Responsibility)

11/28: Paper and pencil test for group 1 (classroom teacher).
11/29: Interview 4 students from each grade in group 2, room 4 (the investigator).
11/30: Paper and pencil test for group 2 (classroom teacher); interview 4 students from each grade in group 1, room 4 (the investigator).
12/1 thru 12/5: Score IPSP pre-test; choose 3 groups of 20 students each whose scores differ by at least 10 points between step 1 and other steps; randomly assign to Group I, treatment of all 4 steps; Group II, treatment of step 1 only; Group III, control (the investigator).
12/5: Call and inform teachers of group assignments (the investigator).
12/6, 12/7: Treatment started for 3-day treatment (involved teachers).
12/8: Last day of treatment, but school closed because of snow.
12/9: Snow day.
12/10, 12/11: Weekend.
12/12: Last day of treatment (involved teachers).
12/13: IPSP post-test administered (involved teachers).

Figure 14

1. It was not possible to give the treatment as scheduled. Tuesday and Wednesday went as scheduled, but because of snow the school was closed on Thursday (scheduled for the last day of the treatment) and Friday (scheduled for the post-test day). Since school was not in session Saturday and Sunday, the students were absent from school for four consecutive days. Thus the third day of treatment was like starting over again.
2. During this long weekend, the teacher who taught group two became ill and had to be replaced. Fortunately, her replacement was the director of the IPSP, who had been visiting the classes intermittently throughout the treatment and was well known by the students.
3. When the groups met on the first day and discovered that there were mixed grade levels, the seventh and eighth graders asked, "Why are we here? Did we blow the test?" The adjustment of the students to this kind of grouping probably was a detrimental factor.


4. Even though the participating teachers were acquainted with the students and were involved in the development of the modules which were being used, they found it difficult to teach such a diverse group: "The older and brighter students tended to be more aggressive."
5. Some of the students were disappointed that they had to miss such activities as basketball practice, science projects, and music lessons to attend these treatment sessions. Their negative attitude was another potential difficulty for groups one and two.
The teaching study should be carried out again with the following
suggested changes.
1. Give the treatment over a two-week (or longer) period.
2. Do not give the treatment across all grade levels, but combine fifth and sixth graders, and seventh and eighth graders.
3. Step 1 was chosen because this investigator thought that this was the most important step in the model. Perhaps studies which focus on the other steps should be undertaken also.

BIBLIOGRAPHY

Beldin, H. A study of selected arithmetic verbal problem solving skills among high and low achieving sixth grade children. Unpublished doctoral dissertation, Syracuse University, 1960.
Bellile, E. Mathematical problem solving processes of college students
whose lowest ACT scores are their mathematics scores. Dissertation
in progress, University of Iowa.
Bloom, B. S., & Broder, Lois J. Problem-solving processes of college
students. Supplemental Educational Monographs, No. 73. Chicago:
University of Chicago Press, 1950.
Brunk, L., Collister, E. G., Swift, C., & Stayton, S. A correlational study of two reasoning problems. Journal of Experimental Psychology, 1958, 55, 236-241.
Campbell, D. T., & Fiske, D. W. Convergent and discriminant validation
by the multitrait-multimethod matrix. Psychological Bulletin,
1959, 56, 81-105.
Chase, C. The position of certain variables in the prediction of problem solving in arithmetic. Journal of Educational Research, 1960, 54, 9-14.
Claparède, E. La psychologie de l'intelligence. Scientia, 1917, 22, 353-368.

Claparède, E. La genèse de l'hypothèse. Arch. de Psychol., 1934, 24, 1-154.

Covington, M. V., Crutchfield, R. I., & Davis, L. B. The productive thinking process, series I, general problem solving. Berkeley: Breselten Printing Co., 1966.
Cronbach, L. J. Coefficient alpha and the internal structure of tests.
Psychometrika, 1951, 16, 297-334.
Cronbach, L. J., & Snow, R. E. Individual differences in learning ability as a function of instructional variables. Final report.
Stanford: School of Education, 1970.


Crutchfield, R. S. Instructing the individual in creative thinking. In New approaches to individualized instruction. Princeton: Educational Testing Service, 1965, pp. 12-25.
Crutchfield, R. S. Creative thinking in children: Its teaching and
testing. In 0. G. Brim, Jr., R. S. Crutchfield, & W. H. Holtzman,
Intelligence: Perspectives 1965. New York: Harcourt Brace &
World, 1966, pp. 33-64.
Dansereau, D. F., & Gregg, L. W. An information processing analysis of mental multiplication. Psychonomic Science, 1966, 6, 71-72.

Davis, G. A. Current status of research and theory in human problem solving. Psychological Bulletin, 1966, 66, 36-54.
Davis, G. A., Manske, M. E., & Train, A. J. Training creative thinking.
Occasional Paper No. 6. Madison: Wisconsin Research and Development Center for Cognitive Learning, University of Wisconsin, Jan.
1967.
Dewey, J. How we think. Boston: Heath, 1933.

Duncker, K. On problem solving. Psychological Monographs, 1945, 58 (5, Whole No. 270).

Farnham-Diggory, S. Cognitive processes in education: A psychological preparation in teaching and curriculum development. Evanston, Ill.: Harper & Row, 1972.
Flaherty, E. G. The thinking aloud technique and problem solving ability. Journal for Research in Mathematics Education, 1975, 68,
223-225.
Foster, T. E. The effect of computer programming experiences on student
problem solving behaviors in eighth grade mathematics. Ph.D. dissertation, University of Wisconsin, Madison, 1972.
French, J. W. The relationship of problem-solving styles to the factor
composition of tests. Research Bulletin RB-63-15. Princeton:
Educational Testing Service, 1963.
French, J. W., Ekstrom, R. B., & Price, L. A. Kit of reference tests
for cognitive factors. Princeton: Educational Testing Service,
1963.
Gagné, R. M. The acquisition of knowledge. Psychological Review, 1962, 69(4), 355-365.

Gagné, R. M., & Smith, E. C. A study of the effects of verbalization on problem solving. Journal of Experimental Psychology, 1962, 63, 12-18.

Goldberg, D. J. The effects of training in heuristic methods on the
ability to write proofs in number theory. Unpublished doctoral
dissertation, Teachers College, Columbia University, 1973.
Guilford, J. P. The nature of human intelligence. New York: McGraw-Hill, 1967.

Green, B. F., Jr. Current trends in problem solving. In B. Kleinmuntz


(Ed.), Problem solving: Research, method and theory. New York:
Wiley, 1966, pp. 3-18.
Hadamard, J. The psychology of invention in the mathematical field.
New York: Dover, 1954.
Henderson, K. B., & Pingry, R. E. Problem solving in mathematics. In
H. F. Fehr (Ed.), The learning of mathematics: Its theory and
practice. Yearbook, National Council of Teachers of Mathematics,
1953, 21, 228-270.
Hiatt, A. A. Assessing mathematical thinking abilities of sixth, ninth
and twelfth grade students. Ph.D. dissertation, University of
California, Berkeley, 1970.
Hieronymus, A. N., & Lindquist, E. F. Manual for administrators, supervisors and counselors, levels edition, forms 5 & 6, Iowa Test of
Basic Skills. Boston: Houghton Mifflin Co., 1974.
Hollander, S. K. Strategies of selected sixth graders reading and working verbal arithmetic problems (Doctoral dissertation, Hempstead,
New York: Hofstra University, 1973). Abstract: Dissertation
Abstracts, 34, 6258A, No. 10, 1974.
Immerzeel, G. Iowa problem-solving project. A proposal for first-year
funding under P. L. 93-380, Title IV, Part C. Submitted to State
of Iowa, Department of Public Instruction, Planning, Research, and
Evaluation, 1976.
Iowa Test of Basic Skills. Teacher's manual forms 5 & 6, levels 9-14. University of Iowa, Iowa City, Iowa, 1971.

James, J. A. A study of the effects of problem solving strategies developed in teacher in-service workshops in fourth and fifth grade
children's achievement (Doctoral dissertation, Detroit, Michigan:
Wayne State University, 1972). Abstract: Dissertation Abstracts,
33A:6649-6650, June, 1973.
James, J. B. A comparison of performance of sixth-grade children in three arithmetic tasks: Typical textbook verbal problems; revised verbal problems including irrelevant data; and computational exercises. Ph.D. dissertation, Tuscaloosa: University of Alabama, 1967. Abstract: Dissertation Abstracts 28:2030B, No. 5, 1967.


Jerman, M. Instruction in problem solving and an analysis of structural


variables that can contribute to problem solving difficulty. Technical Report #180, Psychology and Education Series. Stanford:
Institute for Mathematical Studies in the Social Sciences, 1970.
Jerman, M. Individualized instruction in problem solving in elementary
school mathematics. Journal for Research in Mathematics Education,
1973, 4(1), 6-19.
John, E. R. Contributions to the study of the problem-solving process. Psychological Monographs, 1957, 71(18, Whole No. 447).
Johnson, D. A. Introduction. Evaluation in mathematics. Yearbook, National Council of Teachers of Mathematics, 1961, 26, 1-6.

Johnson, D. M. The psychology of thought and judgment. New York: Harper, 1955.

Kalmykova, Z. I. Analysis and synthesis as problem solving methods. In J. Kilpatrick, E. G. Begle, I. Wirszup, and J. W. Wilson (Eds.), Soviet studies in the psychology of learning and teaching mathematics (Vol. 13). Stanford: School Mathematics Study Group. (In press, 1974)
Kantowski, E. L. Processes involved in mathematical problem solving.
Doctoral dissertation, University of Georgia, 1974.
Keisler, E. R. Teaching children to solve problems: A research goal. Journal of Research and Development in Education, 1969, 3, 3-14.
Kilpatrick, J. Analyzing the solution of word problems in mathematics:
An exploratory study. Unpublished doctoral dissertation, Stanford
University, 1967.

Kilpatrick, J. Problem solving in mathematics. Review of Educational
Research, 1969, 39, 523-534.
Kilpatrick, J. Problem solving and creative behavior in mathematics. In J. W. Wilson and L. R. Carry (Eds.), Reviews of recent research in mathematics education. Studies in mathematics (Vol. 19). Stanford: School Mathematics Study Group, 1969.
Kimble, G. A. A letter to Art Mellon. In J. F. Voss (Ed.), Approaches to thought. Columbus, Ohio: Chas. A. Merrill Publ. Co., 1969, pp. 333-340.
Kleinmuntz, B. (Ed.). Problem solving: Research, method, and theory. New York: John Wiley and Sons, 1966.

Krutetskii, V. A. The psychology of mathematical abilities in schoolchildren. Translated from the Russian by Joan Teller. J. Kilpatrick and I. Wirszup (Eds.). Chicago: University of Chicago Press, 1976.
Kuder, G. F., & Richardson, M. W. The theory of the estimation of test reliability. Psychometrika, 1937, 2(3), 151-160.
Lester, F. K. Mathematical problem solving in the elementary school:
Some educational and psychological considerations. Paper prepared
for the Research Workshop on Problem Solving in Mathematics Education Center for the Study of Learning and Teaching Mathematics,
The University of Georgia, May, 1975.
Lord, F. M., & Novick, M. R. Statistical theories of mental test scores. Reading, Mass.: Addison-Wesley, 1968.
Lucas, J. F. An exploratory study in diagnostic teaching of elementary
calculus. Unpublished doctoral dissertation, University of Wisconsin, 1972.
Luchins, A. S. Mechanization in problem solving. Psychological Monographs, 1942, 54(6).

Lundsteen, S. W., & Michael, W. B. Validation of three tests of cognitive style in verbalization for third and sixth grades. Educational and Psychological Measurement, 1966, 26(2), 449-461.

Mayer, R. E. Thinking and problem solving: An introduction to human cognition and learning. Glenview, Illinois: Scott, Foresman and Company, 1977.

McKeachie, W. J., & Doyle, C. L. Psychology. Reading, Mass.: Addison-Wesley, 1970.

Mikhal'skii, K. A. The solution of complex arithmetic problems in auxiliary school. In J. Kilpatrick, E. G. Begle, I. Wirszup, & J. W. Wilson (Eds.), Soviet studies in the psychology of learning and teaching mathematics (Vol. 9). Stanford: School Mathematics Study Group, 1975.

National Collection of Research Instruments for Mathematical Problem Solving. Gerald Kulm (Ed.). Purdue University, 1976.
Newell, A., Shaw, J. C., & Simon, H. A. Empirical explorations with the logic theory machine: A case study in heuristics. In E. A. Feigenbaum & J. Feldman (Eds.), Computers and thought. New York: McGraw-Hill, 1963, pp. 109-133.

Newell, A., & Simon, H. A. Human problem solving. Englewood Cliffs: Prentice-Hall, 1972.

Paige, J. M., & Simon, H. A. Cognitive processes in solving algebra
word problems. In B. Kleinmuntz (Ed.), Problem solving. New York:
John Wiley and Sons, 1966, pp. 51-119.
Poincaré, H. Science and method. Translated by F. Maitland. New York: Charles Scribner's Sons, 1914.

Polya, G. How to solve it (2nd ed.). Garden City, New York: Doubleday, 1957.

Polya, G. Mathematical discovery: on understanding, learning and teaching problem solving (Vol. 1). New York: John Wiley and Sons, 1962.
Post, T. R. The effects of the presentation of a structure of the problem-solving process upon problem-solving ability in seventh grade
mathematics (Doctoral dissertation, Indiana University, 1967).
Abstract: Dissertation Abstracts, 1968, 28, 4545A, No. 11.
Proudfit, L. Measuring problem-solving processes in elementary children. Unpublished manuscript, Indiana University, Bloomington, 1979.

Ray, W. S. Complex tasks for use in human problem solving research. Psychological Bulletin, 1955, 52, 134-149.

Reitman, W. Some Soviet investigations of thinking, problem solving and
related areas. In R. A. Bauer (Ed.), Some views on Soviet psychology. Washington: American Psychological Association, 1962, pp.
29-61.
Restle, F., & Davis, J. H. Success and speed of problem solving by
individuals and groups. Psychological Review, 1962, 69, 520-536.
Riedesel, C. A. Problem solving: Some suggestions from research.
Arithmetic Teacher, 1969, 16, 54-58.
Scandura, J. M. Mathematical problem solving. American Mathematical Monthly, March 1974, 81, 273-280.
Shulman, L. S. Psychology and mathematics education. In E. G. Begle (Ed.), Mathematics education. Yearbook, National Society for the Study of Education, 1970, 69, 23-71.

Shulman, L. S., & Elstein, A. S. Studies of problem solving, judgment,
and decision making: Implications for educational research. In
F. Kerlinger (Ed.), Review of research in education 3. Itasca,
Illinois: Peacock Publ., 1975.


Simon, H. A. Learning with understanding. ERIC, Information Analysis Center for Science, Mathematics and Environmental Education, The Ohio State University, Columbus, June 1975.

Simon, H. A., & Newell, A. Human problem solving: The state of the theory in 1970. American Psychologist, 1971, 26, 145-159.
Speedie, S. M., Treffinger, D. J., & Feldhusen, J. F. Teaching problem-solving skills: Development of an instructional model based on
human abilities related to efficient problem solving. Report to
the U.S. Department of Health, Education, and Welfare, Office of
Education, Bureau of Research. Purdue University, August 1973.
Wallas, G.

The art of thought. New York: Harcourt, 1926.

Wearne, D. C. Development of a test of mathematical problem solving


which yields a comprehension, application, and problem solving
score (Tech. Rep. 407). Madison: Wisconsin Research and Development Center for Cognitive Learning, 1976.
Webb, N. L.
A review of the literature related to problem-solving
tasks and problem-solving strategies used by students in grades 4,
5 and 6. Unpublished manuscript, Indiana University, 1974.
Webb, N. L.
An exploration of mathematical problem-solving processes.
(Doctoral dissertation, Stanford University, 1975). Dissertation
Abstracts International, 1975, 36, 2689A. (University Microfilms
No. 75-25, 625)
Webb, N. L., & Moses, B. E. Developmental activities related to summative evaluation. Technical IV, Mathematical Problem Solving Project. Bloomington, Ind.: Mathematics Education Development Center, 1977Wertheimer, M.

Productive thinking. New York: Harper & Row, 1945-

Wickelgren, W. A. How to solve problems. San Francisco: W. H. Freeman


and Co., 1974.
Wilson, J. W. Generality of heuristics as an instructional variable. Unpublished doctoral dissertation, Stanford University, 1961.
Wilson, N. Objective tests and mathematical learning. Australian
Council for Educational Research, Frederick Street, Hawthorn,
Victoria, 3122. Sydney: Halstead Press, Jan. 1970.
Wittman, E. Matrix strategies in heuristics. International Journal of Mathematical Education in Science and Technology, May 1975, 6(2), 187-188.


Wittrock, M. C. Recent research in cognition applied to mathematics


learning. Mathematics Education Reports. ERIC, Information Analysis Center for Science, Mathematics and Environmental Education,
The Ohio State University, 1973. (Mimeographed)
Zalewski, D. I. An exploratory study to compare two performance measures: An interview coding scheme of mathematical problem solving and a written test (Tech. Rep. 306). Madison: Wisconsin Research
and Development Center for Cognitive Learning, 1974.
