Project 2 Reflection
Georgia Tech
Author Note
Project 1 approach
The approach in Project 1 was purely verbal. The verbal descriptions already supplied attributes
and an ontology as part of the problem set. This allowed us to find the differences between the
attributes in the verbal description of the problem, build a semantic network, and then arrive at the
correct answer. The agent uses the verbal description of the problem to identify the number of
objects in the source and destination figures:
1. If the shape is the same in both A and B, it sets the transformed-shape attribute of the
transform to same.
2. If there is one object in both the source and the destination and there were no objects
4. By iterating over the other attributes, it identifies which ones have changed and adds them
to the transform.
This transformation is used to create the Semantic Network and is stored with the original
A similar transform is produced between C and each of the answer choices using the same algorithm.
The transformation between C and each of the answer choices is compared with the original
transformation, and the answer choice whose transformation matches it is picked as the correct answer.
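The Project 1 procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the agent's code. Each figure is taken to be a list of attribute dicts, one per object, and objects are assumed to correspond by their position in the list. The helper names `describe_transform` and `pick_answer` are hypothetical.

```python
def describe_transform(source, target):
    """Describe how each object in `source` changes in `target`.

    Figures are lists of attribute dicts, one per object; objects are
    assumed to correspond by position (a simplifying assumption).
    """
    transform = []
    for index, attrs in enumerate(source):
        if index >= len(target):
            transform.append({"deleted": True})
            continue
        other = target[index]
        # Record only whether the shape stayed the same, not the shape names.
        changes = {"shape": "same" if attrs.get("shape") == other.get("shape") else "changed"}
        # Every other attribute that differs is recorded as (before, after).
        for key in sorted(set(attrs) | set(other)):
            if key != "shape" and attrs.get(key) != other.get(key):
                changes[key] = (attrs.get(key), other.get(key))
        transform.append(changes)
    # Objects that appear only in the target are recorded as added.
    for _ in range(len(source), len(target)):
        transform.append({"added": True})
    return transform


def pick_answer(a, b, c, choices):
    """Return the 1-based index of the first choice whose C->choice transform equals A->B."""
    original = describe_transform(a, b)
    for number, choice in enumerate(choices, start=1):
        if describe_transform(c, choice) == original:
            return number
    return -1


a = [{"shape": "square", "size": "large", "fill": "no"}]
b = [{"shape": "square", "size": "small", "fill": "no"}]
c = [{"shape": "circle", "size": "large", "fill": "no"}]
choices = [
    [{"shape": "circle", "size": "large", "fill": "no"}],
    [{"shape": "circle", "size": "small", "fill": "no"}],
    [{"shape": "square", "size": "small", "fill": "no"}],
]
answer = pick_answer(a, b, c, choices)  # 2
```

The sketch records only whether the shape stayed the same, not the concrete shape names. That choice lets the A-to-B transformation (square to square) match the C-to-answer transformation (circle to circle).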
The challenges of using the verbal approach in Project 1 were around interpreting the values of attributes such as:
Size: medium
CS7637 KBAI Spring 2017: Project 1 Reflection (jgeorge84) 3
Should we arbitrarily assign numerical values to the values of size? On the other hand, the verbal
approach made it easy to count the objects and to identify which ones were deleted in subsequent
figures.
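One way to answer the size question is to treat size as an ordinal scale. A change can then be compared by its direction and number of steps, rather than by its labels. The sketch below assumes a six-level vocabulary running from `very small` to `huge`. That list is an assumption and should be checked against the labels actually used in the problem files. The function names are hypothetical.

```python
# Assumed ordering of verbal size labels, from smallest to largest.
SIZE_ORDER = ["very small", "small", "medium", "large", "very large", "huge"]
SIZE_RANK = {label: rank for rank, label in enumerate(SIZE_ORDER)}


def size_change(before, after):
    """Return the signed number of steps between two verbal size labels."""
    return SIZE_RANK[after] - SIZE_RANK[before]


def same_size_change(a, b, c, d):
    """True if A->B and C->D change size by the same number of steps."""
    return size_change(a, b) == size_change(c, d)


grow_one_step = size_change("medium", "large")  # 1
```

Ranks avoid committing to arbitrary absolute numbers. Only the relative order of the labels matters, and that is the information the verbal description actually provides.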
The problem set in Project 1 required fewer computations because each problem was a 2x2 matrix.
The transformations in Project 1 were calculated along the horizontal and vertical axes but not the
diagonal. There were also fewer answer choices: 6 rather than the 8 in the current
problem set. In contrast, in Project 2 the RPM is a 3x3 matrix in which the transformations have to
be computed along the horizontal, vertical and diagonal axes and validated against the other
rows and columns and the answer choices before the correct answer is chosen.
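To make the difference in workload concrete, the sketch below lists the adjacent figure pairs that a 3x3 agent might compare. Cells are labelled A to H, with `?` marking the missing cell. Treating only the main diagonal (A to E to ?) as "diagonal" is an assumption made for illustration. The names `GRID` and `axis_pairs` are hypothetical.

```python
GRID = [["A", "B", "C"],
        ["D", "E", "F"],
        ["G", "H", "?"]]


def axis_pairs(grid):
    """Group adjacent cell pairs of a square matrix by axis."""
    n = len(grid)
    horizontal = [(grid[r][c], grid[r][c + 1]) for r in range(n) for c in range(n - 1)]
    vertical = [(grid[r][c], grid[r + 1][c]) for c in range(n) for r in range(n - 1)]
    # Main diagonal only (assumed): A->E and E->?.
    diagonal = [(grid[i][i], grid[i + 1][i + 1]) for i in range(n - 1)]
    return {"horizontal": horizontal, "vertical": vertical, "diagonal": diagonal}


pairs = axis_pairs(GRID)
# Pairs that involve the missing cell must be evaluated once per answer choice.
unknown = [pair for items in pairs.values() for pair in items if "?" in pair]
# unknown == [("H", "?"), ("F", "?"), ("E", "?")]
```

With 8 answer choices, that is 24 pair evaluations. A 2x2 problem compared along rows and columns only has two such pairs (B to ? and C to ?) evaluated against 6 choices, which gives 12.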
The agent for this phase uses a visual approach for reasoning. A purely visual approach
was chosen to validate the strategy for solving Raven's Progressive Matrices (RPM) in
subsequent phases. One approach is to extract information from the figures, build a semantic
network and apply the transformations to choose the correct answer. Another approach is the
Gestalt method, in which the figures are compared visually without extracting a representation
of them. A third is a hybrid approach. It uses the Gestalt method to solve some problems and
extracts features to solve others. As a last resort, when all else fails, it falls back on a
brute-force comparison of pixel counts. This strategy mirrors human cognition. We take a first
pass at a problem using superficial similarity, and we only dive deeper to extract and compare
particular features when necessary.
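The hybrid strategy can be sketched as a cascade that tries the cheapest comparison first. It falls back on more expensive comparisons only when the earlier ones do not commit to an answer. This is a minimal sketch, not the agent's implementation. It assumes figures are same-sized binary images stored as lists of rows, with 1 for a black pixel. The similarity measure, the 0.98 threshold and all function names are hypothetical.

```python
def similarity(x, y):
    """Fraction of pixel positions at which two same-sized binary images agree."""
    total = sum(len(row) for row in x)
    same = sum(p == q for rx, ry in zip(x, y) for p, q in zip(rx, ry))
    return same / total


def dark_pixels(image):
    """Number of black pixels in a binary image."""
    return sum(sum(row) for row in image)


def gestalt_strategy(g, h, choices, threshold=0.98):
    """If G and H look alike, pick the choice that looks most like H (or give up)."""
    if similarity(g, h) < threshold:
        return None
    best = max(range(len(choices)), key=lambda i: similarity(h, choices[i]))
    return best + 1 if similarity(h, choices[best]) >= threshold else None


def pixel_count_strategy(g, h, choices):
    """Last resort: extrapolate the dark-pixel count from G to H and pick the closest choice."""
    expected = dark_pixels(h) + (dark_pixels(h) - dark_pixels(g))
    best = min(range(len(choices)), key=lambda i: abs(dark_pixels(choices[i]) - expected))
    return best + 1


def solve(g, h, choices, strategies=(gestalt_strategy, pixel_count_strategy)):
    """Try each strategy in order and return the first answer one of them commits to."""
    for strategy in strategies:
        answer = strategy(g, h, choices)
        if answer is not None:
            return answer
    return -1


blank = [[0, 0], [0, 0]]
one = [[1, 0], [0, 0]]
two = [[1, 1], [0, 0]]
three = [[1, 1], [1, 0]]
gestalt_answer = solve(blank, blank, [one, blank])     # 2: G and H are identical
fallback_answer = solve(one, two, [one, blank, three])  # 3: falls through to pixel counts
```

A feature-based strategy, such as one that compares extracted shapes, would sit between the two functions in `strategies`. That would match the order described above.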
This approach worked well for 70% of the Basic problems and 60% of the Test and Challenge problems.
However, the agent fails on slightly more complicated problems in which objects move along the x or
y axis.
Two classes were added: RavensTransform holds the transformation for an object within a
RavensFigure, and RavensSemanticNetwork holds the transformation for each object in a RavensFigure.
A transformation step attribute was also added to the RavensTransform class to identify the
transformation.
Original-shape:
Transformed-shape:
Position:
Rotation:
Reflection:
Deleted:
StartCol:
StartRow:
Width:
Height:
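The two classes and the attributes listed above could be expressed as Python dataclasses. This is a sketch under assumptions, not the agent's actual class definitions. The field types, the defaults and the `transformation_step` field name are guesses based on the attribute names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RavensTransform:
    """Transformation of one object between two figures (field types are assumed)."""
    original_shape: str
    transformed_shape: str
    position: Optional[str] = None
    rotation: int = 0                   # assumed to be in degrees
    reflection: Optional[str] = None    # e.g. "horizontal" or "vertical" (assumed values)
    deleted: bool = False
    transformation_step: str = ""       # hypothetical name, e.g. "A->B"
    # Bounding box of the object in the figure image, used for shape extraction.
    start_col: int = 0
    start_row: int = 0
    width: int = 0
    height: int = 0


@dataclass
class RavensSemanticNetwork:
    """Holds the transformation for each object in a RavensFigure, keyed by object name."""
    transforms: Dict[str, RavensTransform] = field(default_factory=dict)

    def add(self, name: str, transform: RavensTransform) -> None:
        self.transforms[name] = transform


network = RavensSemanticNetwork()
network.add("Shape1", RavensTransform("circle", "circle", transformation_step="A->B",
                                      start_col=10, start_row=20, width=30, height=30))
```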
The last four attributes are used to extract shapes from the RavensFigure. The shape itself,
however, is not identified in this phase (and it is probably not necessary to identify it), as
long as the correspondence between the shapes in the figures can be identified. The current algorithm
extracts filled shapes. For shapes that are not filled, the inverse of the figure is taken to
Backward compatibility was not a significant consideration in the design of the visual
reasoning because the fundamental approach had changed. The program was run against the tests in
Basic, Test and Challenge Problem Set B to ensure that the same agent could solve them. The
biggest challenges faced were extracting shapes and finding the correspondence of shapes among
the figures.
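Shape extraction of this kind can be sketched as connected-component labelling on a binary image. For each component, the sketch records the StartCol, StartRow, Width and Height attributes listed above. This is a minimal sketch, not the agent's algorithm, and it makes several assumptions. The figure has already been thresholded into a list of rows of 0/1 values. It uses 4-connectivity and a breadth-first flood fill. For unfilled shapes, it inverts the image and discards components that touch the border, on the assumption that those are background. The names `invert` and `extract_shapes` are hypothetical.

```python
from collections import deque


def invert(image):
    """Swap foreground (1) and background (0) pixels."""
    return [[1 - p for p in row] for row in image]


def extract_shapes(image, drop_border=False):
    """Return (start_col, start_row, width, height) for each 4-connected component."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or seen[r][c]:
                continue
            # Flood-fill one component while tracking its extent.
            queue = deque([(r, c)])
            seen[r][c] = True
            min_r, max_r, min_c, max_c = r, r, c, c
            touches_border = False
            while queue:
                y, x = queue.popleft()
                min_r, max_r = min(min_r, y), max(max_r, y)
                min_c, max_c = min(min_c, x), max(max_c, x)
                if y in (0, rows - 1) or x in (0, cols - 1):
                    touches_border = True
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and image[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if drop_border and touches_border:
                continue  # assumed to be background in an inverted image
            boxes.append((min_c, min_r, max_c - min_c + 1, max_r - min_r + 1))
    return boxes


# An unfilled square drawn as a one-pixel outline.
outline = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
outline_box = extract_shapes(outline)                          # [(1, 1, 4, 4)]
interior_box = extract_shapes(invert(outline), drop_border=True)  # [(2, 2, 2, 2)]
```

In the inverted image, the recovered box covers the shape's interior rather than its outline. That may be enough for finding correspondences, since the text notes that the shape itself need not be identified.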
The agent uses the concept of Semantic Networks to solve the problems. Semantic
networks represent the objects as nodes and the relationships between the objects as links. Labels
on the links describe the relationships. In the case of a 3 x 3 Ravens Progressive
links and labels. The labels also identify the transformation that occurs for each object within a
RavensFigure. For each answer choice #, we represent H:# and use the knowledge
representation (Semantic Network) built from the previous transformations (T1 to T5) to choose the
correct answer for #. This method of problem solving is called Represent and Reason: the problem is
represented using Semantic Networks, and the correct answer is identified by reasoning over that
representation.
The agent uses the verbal description of the problem to identify the number of objects in
the source and destination for row 1 of a 2x2 matrix. Let us take Fig 2 as an example:
5. It extracts the shapes from A and B and forms a correspondence based on the shape
and its coordinates. The circle is named Shape1. A logicalXor indicates that the
shape has moved in B. The Euclidean distance between the two coordinates is
calculated.
transformation. The horizontal transformation has a higher weight than the vertical
transformation.
8. It performs Step 1 for A to E, identifying the diagonal transformation, which has the
least weight.
9. It then compares H with the answer choices and scores each one using the
transformations in Steps 1 to 4.
10. The answer choice with the highest score is set as the answer.
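The correspondence and weighted scoring in the steps above can be sketched as follows, under several assumptions. Each shape is reduced to one (x, y) coordinate, such as the centre of its bounding box. Shapes in two figures are paired by the assignment with the smallest total Euclidean distance. The sketch finds that assignment by brute force over permutations, which is practical for the handful of objects in an RPM cell. An answer choice scores an axis's weight when it continues the transformation observed along that axis. The weights 3, 2 and 1 are hypothetical, since the text only states their order. All function names are illustrative.

```python
import math
from itertools import permutations

# Hypothetical weights; only horizontal > vertical > diagonal comes from the text.
WEIGHTS = {"horizontal": 3, "vertical": 2, "diagonal": 1}


def correspond(source, target):
    """Pair shapes by the permutation with the smallest total Euclidean distance."""
    if len(source) != len(target):
        return None
    best = min(permutations(range(len(target))),
               key=lambda perm: sum(math.dist(source[i], target[j]) for i, j in enumerate(perm)))
    return [(source[i], target[j]) for i, j in enumerate(best)]


def displacements(source, target):
    """Describe a transformation as the sorted list of shape displacements."""
    pairs = correspond(source, target)
    if pairs is None:
        return None
    return sorted((tx - sx, ty - sy) for (sx, sy), (tx, ty) in pairs)


def score_choice(matrix, choice):
    """Weighted count of axes on which the choice continues the observed transformation."""
    checks = {
        "horizontal": (("G", "H"), "H"),  # G->H should match H->choice
        "vertical": (("C", "F"), "F"),    # C->F should match F->choice
        "diagonal": (("A", "E"), "E"),    # A->E should match E->choice
    }
    score = 0
    for axis, ((first, second), last) in checks.items():
        expected = displacements(matrix[first], matrix[second])
        if expected is not None and expected == displacements(matrix[last], choice):
            score += WEIGHTS[axis]
    return score


def pick_answer(matrix, choices):
    """Return the 1-based index of the highest-scoring choice."""
    scores = [score_choice(matrix, c) for c in choices]
    return scores.index(max(scores)) + 1


# One shape moving 10 units right per column and 10 units down per row.
matrix = {
    "A": [(0, 0)], "B": [(10, 0)], "C": [(20, 0)],
    "D": [(0, 10)], "E": [(10, 10)], "F": [(20, 10)],
    "G": [(0, 20)], "H": [(10, 20)],
}
answer = pick_answer(matrix, [[(10, 20)], [(20, 20)], [(20, 10)]])  # 2
```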
Refactoring for phase 3 includes correcting the overfits and moving createTransform and
The current version of the agent has difficulty with positional attributes and has given me some
insights into human cognition. As humans, we are able to identify the different objects and
as humans, we can see that the correct answer is 8. My agent fails in this case because it
is unable to identify cases where H and the answer choice are the same. It appears to pick the
first answer that has four white squares, without taking the orientation of the black squares into
consideration.
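The failure can be reproduced in miniature. A comparison that only counts squares cannot tell two figures apart when they contain the same squares in different positions. A pixel-wise XOR, like the logicalXor used in Step 5, does tell them apart. The 3x3 grids below are hypothetical stand-ins for H and an answer choice, with 1 marking a black square. The helper names are also hypothetical.

```python
def count_black(image):
    """Number of black squares, ignoring where they are."""
    return sum(sum(row) for row in image)


def xor_difference(x, y):
    """Number of positions where exactly one of the two binary images is black."""
    return sum(p ^ q for rx, ry in zip(x, y) for p, q in zip(rx, ry))


h = [[1, 0, 0],
     [0, 0, 0],
     [0, 0, 1]]
choice = [[0, 0, 1],
          [0, 0, 0],
          [1, 0, 0]]

same_count = count_black(h) == count_black(choice)  # True: both have two black squares
difference = xor_difference(h, choice)             # 4: the black squares are in different places
```

Identifying "H and the answer choice are the same" therefore needs a position-aware test, such as `xor_difference(h, choice) == 0`, rather than a count.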
Currently, the agent handles cases involving an affine transformation. It has trouble
identifying the solution when shapes overlap. Part of the reason could be that the
algorithm for extracting shapes is not robust enough; a contour-tracing algorithm might work better
in this case.
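As an illustration of the contour-tracing suggestion, the sketch below implements Moore-neighbour tracing on a binary image stored as a list of rows of 0/1 values. It traces only the outer boundary of the first shape found in row-major order. It stops when its first move is about to repeat, a stopping rule that also handles shapes one pixel thick, and the loop is bounded as a safeguard. The function name and the way it would plug into the agent's extraction step are assumptions.

```python
# Moore neighbourhood of a pixel in clockwise order (row index grows downward),
# starting from the west neighbour.
OFFSETS = [(0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1)]


def trace_contour(image):
    """Return the outer boundary (row, col) pixels of the first shape in row-major order."""
    rows, cols = len(image), len(image[0])

    def black(r, c):
        return 0 <= r < rows and 0 <= c < cols and image[r][c] == 1

    start = next(((r, c) for r in range(rows) for c in range(cols) if image[r][c] == 1), None)
    if start is None:
        return []

    def step(pixel, back):
        """Scan clockwise from backtrack direction `back` for the next black pixel.

        Returns that pixel and the direction, seen from it, of the white pixel
        just before it in the scan (the new backtrack), or (None, None) if the
        pixel is isolated.
        """
        for i in range(1, 9):
            d = (back + i) % 8
            nxt = (pixel[0] + OFFSETS[d][0], pixel[1] + OFFSETS[d][1])
            if black(*nxt):
                prev = (d - 1) % 8
                behind = (pixel[0] + OFFSETS[prev][0], pixel[1] + OFFSETS[prev][1])
                return nxt, OFFSETS.index((behind[0] - nxt[0], behind[1] - nxt[1]))
        return None, None

    # The west neighbour of `start` is white because the scan runs left to right.
    first_state = step(start, 0)
    if first_state[0] is None:
        return [start]
    contour = [start]
    state = first_state
    # There are at most 8 * rows * cols (pixel, backtrack) states, so bound the loop.
    for _ in range(8 * rows * cols):
        contour.append(state[0])
        state = step(*state)
        # Stop when the first move is about to be repeated.
        if state == first_state:
            break
    # The last pixel appended is the start pixel again; drop the duplicate.
    if contour[-1] == start:
        contour.pop()
    return contour


square = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
boundary = trace_contour(square)  # [(1, 1), (1, 2), (2, 2), (2, 1)]
```

Unlike the flood fill, which only yields a bounding box, the traced boundary keeps the outline's shape. That may make it easier to separate shapes whose boxes overlap.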
When there are three objects, it is difficult to find the correspondence, especially when two of
The behavior of the agent models human cognition to a large degree. The figure below from
Carpenter, Just and Shell (1990) gives a representation of human cognition based on experiments
The four boxes represent the four stages in the agent. Perceptual analysis is replaced by encoding
the verbal representation and finding correspondences. The createTransform method is analogous
compares the original transformation (the transformation across the first row) with the generated
transformation between H and each of the answer choices. Of course all of the objects and their
Apart from the structure of the agent being similar to human cognition, the agent also behaves
similarly, because we modeled it on our understanding of human cognition. That
said, it is remarkable how quickly the brain is able to visually identify the similarities and
differences between the figures in RPM. Using different weighting schemes will
allow agents to identify similarities or differences among images that may not be evident to
a human observer.
References
Goel, A. (2015). Geometry, Drawings, Visual Thinking and Imagery: Towards a Visual
Turing Test of Machine Intelligence, Proceedings of the 29th Association for the
Austin, Texas.
https://en.wikipedia.org/wiki/Raven's_Progressive_Matrices.
Carpenter, P., Just, M., and Shell, P. (1990). What one intelligence test measures: a theoretical
account of the processing in the Raven Progressive Matrix Test. Psychological Review, 97(3),
404-431.