Approach for Translating Mathematics Problems in Natural Language to
Specification Language COKB of Intelligent Education Software
Nhon Do 1, Hoai Phan Truong 2, Tuyen Trong Tran 3
1 Information Technology University, nhondv@uit.edu.vn
2 Economics and Law University, hoaiphan@gmail.com
3 Binh Duong University, tttuyen@live.com
Abstract - Research on intelligent systems for problem solving in
education now has many practical applications. A system based on the
ECOK model and a specification language can process a problem and
produce step-by-step guidance that fully explains its solution.
However, these systems limit users because both the data input and the
result output must be written in the specification language rather
than in natural language, and few users know how to write input in
this language. Therefore, to improve the ability to interact in
natural language, this paper proposes a new approach for translating
mathematics problems in natural language to the specification
language, and presents a model of a natural language translation
method associated with the knowledge base domain for effective
translation. In addition, this paper shows that this approach can be
applied in different knowledge domains.
Index Terms: Artificial intelligence, education software,
knowledge base system, automatic problem solving system.
1. INTRODUCTION
Many applications today, such as search and database management,
use a knowledge base, but most of them cannot yet accept data
input in natural language (NL). To take input for queries and
requests, such software provides an input interface that lets
users choose from given parameters or classified data. Other
applications use a language designed to specify data for a
particular application and knowledge domain, which is called
the specification language (SL). Some of these are intelligent
systems for problem solving (ISPS) in geometry [2] with
user-friendly solution explanations, which are built on a
rather complete knowledge base model called ECOKB [9].
However, both the problem input and the explanation output
are required to be in SL. Similarly, structured query language
(SQL), used to query databases, is generated from the data that
users select through interfaces. This limits how widely the
programs can be used.
978-1-4244-6936-9/10/$26.00 2010 IEEE
Today natural language processing is also applied in many
domains, such as document classification and machine
translation. However, general-purpose processing is not
effective when applied to a specific knowledge domain. Above
all, Vietnamese has too many specific characteristics for
results from research on other languages to be applied
directly. Among current Vietnamese translation methods, some
research associated with knowledge domains is in progress, but
it has not yet had much effect.
With these considerations, it is necessary to design a
translation method that can translate geometry problems from
NL to SL as input data for ISPS, and their solutions from SL
back to NL, so that these systems are easier for users to use.
This research aims to propose a general translation model
associated with a knowledge base domain, a model of knowledge
base organization, a knowledge processing method, and
processes for translating from NL into a general SL. These can
be applied to many other domains, for example, public
administration and education.
The available natural language processing methods are not
effective enough to be applied in every general case, and
geometry problem solving systems are specific to the
mathematics knowledge domain. Therefore, this research
experiments on analytic geometry problems (the ih grade
geometry in particular) to find the best approach, relying on
ECOKB's knowledge base model.
2. NATURAL VIETNAMESE PROCESSING AND
INTELLIGENT PROBLEM SOLVING
SYSTEMS
2.1 Natural Vietnamese processing
Current approaches to natural language processing have
produced results in practical applications, for example, in
translation and in text classification for searching.
However, Vietnamese processing is still limited in general use
across different domains. Here are some results from
Vietnamese processing:
2.1.1 Methods for word segmentation in Vietnamese
Word segmentation and classification are basic algorithms of
natural language processing that have been applied to
Vietnamese. Here are some methods:
- Maximum Matching method [in Chih-Hao Tsai, 2000]: its results
are not absolutely accurate because it depends heavily on
dictionaries.
- Transformation-based Learning (TBL), the word segmentation
model with a weighted finite state transducer (WFST), and
neural networks use a linguistically labeled corpus to learn
rules automatically. However, it is difficult to construct a
fully detailed corpus in Vietnamese because that takes a lot of
time and effort.
- Dynamic programming method: with a 51% probability of the
right word compared to a 65% probability of an approximate
word, this method is less effective than the others above.
- Internet and Genetics Algorithm-based Text Categorization for
Documents in Vietnamese (IGATEC) has lower accuracy and a slow
running time, and has not been tested on large data.
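To make the dictionary dependence of Maximum Matching concrete, the following is a minimal sketch of greedy forward maximum matching over Vietnamese syllables. The function name, the `max_len` limit and the toy dictionary are assumptions made for illustration; they are not taken from the cited work or from the system described in this paper.

```python
def maximum_matching(syllables, dictionary, max_len=4):
    """Greedy forward maximum matching over a list of syllables.

    At each position the longest dictionary entry is preferred;
    a single syllable is accepted as a fallback word.
    """
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first, then shrink the window.
        for size in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + size])
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy dictionary of multi-syllable geometry terms.
DICTIONARY = {"tam giác", "tam giác cân", "đường thẳng", "vuông góc"}

segmented = maximum_matching("cho tam giác cân ABC".split(), DICTIONARY)
print(segmented)
```

If "tam giác cân" were missing from the dictionary, the same input would be split into "tam giác" and "cân", which illustrates why the accuracy of this method is bounded by dictionary coverage.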
From a general view, the word-based method achieves a high
accuracy of over 95%, thanks to a large training corpus that
has been annotated accurately. However, the algorithm's output
depends entirely on this training corpus. Above all, the WFST
method is the best choice because the author's goal is to
segment words accurately for machine translation. These
methods, for which dictionaries or a training corpus are
essential, not only segment words accurately but also provide
tagged information for other purposes of part-of-speech (POS)
specification, for example, machine translation, spell
checking or synonym dictionaries. Therefore, the word-based
approach for machine translation has brought worthwhile
results in spite of its rather long training time, complex
installation and training corpus construction.
This research will use the most successful method of word
recognition, so building a good wordnet is essential. The
knowledge domain is restricted to geometry problems, so it is
not too difficult to make a dictionary for word segmentation
and POS tagging (WS-PT).
2.1.2 Text classification
Some works on text classification for text storage or
searching have achieved satisfactory results. Examples include
text classification in Vietnamese in [1], and text
classification for e-newspapers in [2] to set up a
multilingual information searching system with the phrase
segmentation technique in [5]. These works use word and
word-form statistics techniques for text classification and
apply them to social statistics.
2.1.3 Machine translation
Research on language translation has presented satisfactory
results; however, Vietnamese translation is still limited in
use. Experimentation with machine translation ability is also
presented in [9] in this research. Transferring knowledge from
NL to SL so that it can be processed on computers has produced
some first results in translating NL to SQL [6]. However, that
research is still in progress, so most other applications
today have to specify knowledge manually.
2.2 Intelligent systems for Problem solving
In [3], a system with a general "C-Object Solver" package for
the C-Object knowledge base (COKB) in MAPLE can automatically
solve problems on some kinds of geometric objects such as
triangles and quadrangles, together with the related knowledge
of deduction and manipulation performed on the C-Object model.
MAPLE, a powerful computer algebra software with programming
support for complex abstract data structures, is appropriate
for coding problem solving models and methods.
2.2.1 Problem solving process
The problem solving process works with C-Objects (CO) [3]
based on the COKB model. It also constructs the knowledge base
that the program needs for deduction and manipulation. This
knowledge base includes objects, object relations, facts,
operators and rules.
The general diagram of the problem solving process is
explained as follows: when a problem needs to be solved, it is
input to the system in SL, and the system then analyzes and
processes it. Problem analysis establishes the problem's model
of objects, attributes of interest, variables, events
(including events of manipulating relations, of object
classification ...) and the problem's goal. After that, the
problem solving module searches for an explanation based on
the available knowledge base. Finally, the explanation is
presented.
[Figure omitted: block diagram; legible labels include "Knowledge" and "Explanation"]
Fig 2.2.1 The general diagram of problem solving
process.
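The explanation search step can be pictured with a small forward-chaining loop that records each rule application as one step of the explanation. This is only an illustrative sketch under simplifying assumptions: facts are plain strings, rules are (premises, conclusion) pairs, and the rule set is hypothetical. It is not the deduction mechanism of the COKB model or of the "C-Object Solver" package.

```python
def forward_chain(facts, rules, goal):
    """Apply rules until the goal is derived.

    Returns the ordered list of (premises, conclusion) steps used,
    or None when the goal cannot be reached from the given facts.
    """
    known = set(facts)
    steps = []
    changed = True
    while goal not in known and changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                steps.append((premises, conclusion))
                changed = True
    return steps if goal in known else None


# Hypothetical rules, written as text for readability.
RULES = [
    (("AB = AC",), "triangle ABC is isosceles"),
    (("triangle ABC is isosceles",), "angle B = angle C"),
]

explanation = forward_chain({"AB = AC"}, RULES, "angle B = angle C")
for premises, conclusion in explanation:
    print(", ".join(premises), "=>", conclusion)
```

Recording the steps rather than only the final answer is what allows the result to be shown to the user as a step-by-step explanation.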
During the problem solving process, the problem solving module
produces the explanation automatically. However, analyzing the
problem and writing an explicit explanation in NL are done
manually, based on the users' knowledge. This research uses
the Intelligent Analytical Geometry Problem Solving System
(IAGPSS) as an experiment to study its SL and knowledge
representation model, and then to find a translation model and
organize the knowledge base domain for the purpose of
translating NL into the knowledge specification.
2.2.2 Plane Geometry problem solving system
As introduced above, the Plane Geometry problem solving system
is based on the C-Object knowledge base (COKB) model. The
diagram of the problem solving process with the C-Object
network model is the same as the one presented above for
C-Object solving (see Fig 2.2.1). Problems to be processed are
specified in a language that the system is able to process,
called the specification language. From the specification
input, the system finds the explanation for each step based on
the knowledge base model. The points of interest here are the
SL and the knowledge representation model, which assist the
translation process.
2.2.3 Specification Language
In [3], the specification of the problem input is presented
in two parts: (1) the hypothesis, which starts at
"begin_hypothesis" and ends at "end_hypothesis"; (2) the goal,
which starts at "begin_goal" and ends at "end_goal". This is
the SL constructed in the problem solving system. Its
structure is:
begin_hypothesis [ {<parameters>}, {<objects>}, {<facts>}, {<operators>} ] end_hypothesis
begin_goal [ <goals of problem> ] end_goal
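As a rough illustration of this two-part layout, the sketch below assembles a specification string from lists of already written SL items. The helper name `build_specification`, the exact spacing, and the "<goal>" placeholder are assumptions for illustration; the real system may expect a different concrete layout, and no goal syntax is assumed here.

```python
def build_specification(parameters, objects, facts, operators, goals):
    """Assemble the hypothesis and goal parts into one SL text.

    Each argument is a list of strings that are already valid SL items;
    this helper only arranges them in the two-part structure.
    """
    def group(items):
        # One brace-delimited group per hypothesis component.
        return "{" + ", ".join(items) + "}"

    hypothesis = ", ".join(
        group(part) for part in (parameters, objects, facts, operators)
    )
    return (
        "begin_hypothesis [ " + hypothesis + " ] end_hypothesis\n"
        "begin_goal [ " + ", ".join(goals) + " ] end_goal"
    )


spec = build_specification(
    parameters=[],
    objects=['[M,"DIEM"]', '[TAMGIAC[A,B,C],"TAMGIAC"]'],
    facts=[],
    operators=[],
    goals=["<goal>"],  # placeholder only
)
print(spec)
```

Keeping the four hypothesis groups as separate lists means a translator from NL can fill each group independently before the final text is produced.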
In the hypothesis, users have to specify the parameters,
objects, facts ... of the problem. The parameters that appear
in the problem are listed in the parameters part. The key
words for declarations in the objects part are the C-Object
names in the object knowledge base, such as "DIEM" (point),
"DOAN" (segment), "GOC" (angle), "TAMGIAC" (triangle),
"TAMGIACCAN" (isosceles triangle) ... in the Plane geometry KB
domain.
The syntax of object declaration is
[<object>, <object type>];
The object that is specified is an object name or a structured
type built on a list of base objects. For example, "cho điểm
M" (let M be a point) is specified as [M,"DIEM"], and "cho tam
giác ABC" (let ABC be a triangle) is specified as
[TAMGIAC[A,B,C],"TAMGIAC"], where A, B, C is the list of
points.
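The mapping from a declaration sentence to an object declaration can be illustrated with a toy pattern matcher for the two example sentences (written here with Vietnamese diacritics, "cho điểm M" and "cho tam giác ABC"). The regular expressions and the function name are assumptions for illustration only; this is not the translation model proposed in the paper, which relies on word segmentation and a domain knowledge base rather than fixed patterns.

```python
import re

# Hypothetical patterns for two declaration sentences; point names are
# assumed to be single uppercase Latin letters.
PATTERNS = [
    (
        re.compile(r"cho điểm ([A-Z])"),
        lambda m: f'[{m.group(1)},"DIEM"]',
    ),
    (
        re.compile(r"cho tam giác ([A-Z])([A-Z])([A-Z])"),
        lambda m: f'[TAMGIAC[{m.group(1)},{m.group(2)},{m.group(3)}],"TAMGIAC"]',
    ),
]


def translate_declaration(sentence):
    """Return the SL object declaration for a sentence, or None if unknown."""
    text = sentence.strip()
    for pattern, render in PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            return render(match)
    return None


print(translate_declaration("cho điểm M"))
print(translate_declaration("cho tam giác ABC"))
```

Fixed patterns like these break as soon as the wording varies, which is the motivation for a translation method tied to a dictionary and knowledge base of the domain.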
The facts representing relations, features or properties are
defined as:
[<relation name>, {<object list>}]
The key words of relations are written inside "[]", such as
"CAT", "VUONG", "VUONGGOC", "SONGSONG" ... For example: the
relation of
C'nh CD c