
Hindi-English Cross-Lingual Question-Answering System


New York University

We developed a cross-lingual, question-answering (CLQA) system for Hindi and English. It accepts questions
in English, finds candidate answers in Hindi newspapers, and translates the answer candidates into English
along with the context surrounding each answer. The system was developed as part of the surprise language
exercise (SLE) within the TIDES program.
Categories and Subject Descriptors: I.2.7 [Artificial Intelligence]: Natural Language Processing—Question-
answering systems
General Terms: Languages
Additional Key Words and Phrases: Hindi

We developed a cross-lingual question-answering (CLQA) system for Hindi and English.
It accepts questions in English, finds candidate answers in Hindi newspapers, and
translates the answer candidates into English along with the context surrounding each answer.
The idea of CLQA is relatively new, based on advances in question-answering
technologies [Voorhees 2000; Harabagiu et al. 2000; Ramakrishnan et al. 2003;
Ravichandran et al. 2003; Li et al. 2003; Moldovan et al. 2003]. An evaluation project for
multilingual language question-answering was conducted at about the same time as we
developed our English-Hindi system. In this evaluation track, part of the Cross-Language
Evaluation Forum (CLEF) [Magnini 2003a], questions were given in one of five
European languages--Italian, Spanish, Dutch, French, or German--and the answers found
in English texts. ITC-irst created an Italian-English system [Magnini 2003b]; the
University of Southern California / ISI created a Spanish-English system [Echihabi 2003];
DFKI created a German-English system [Neumann 2003]; and Carnegie Mellon
University, University of Limerick, and University of Montreal created French-English
systems [Rijke 2003; O’Gorman 2003].
Our system was developed as part of the surprise language exercise (SLE) within the
TIDES program. In this exercise, participants were expected to produce systems in a
month after being notified of the language selected. Plans were made to develop systems
for cross-lingual information retrieval (CLIR), machine translation (MT), summarization,
and named entity (NE) tagging.
This research was supported by the Defense Advanced Research Projects Agency (USA) under Grant N66001-
00-1-8917 from the Space and Naval Warfare Systems Center San Diego. This paper does not necessarily
reflect the position or the policy of the US Government.
Authors' address: Computer Science Department, New York University, 715 Broadway, 7th floor, New York,
NY 10003, USA.
© 2004 ACM 1073-0516/03/0900-0181 $5.00

ACM Transactions on Asian Language Information Processing, Vol. 2, No. 3, September 2003, Pages 181-192.
182 • S. Sekine and R. Grishman

It so happened that three of these components, CLIR, MT, and NE tagging, are key
components for constructing a question-answering system. We sought to demonstrate
that with these building blocks it is possible to construct a CLQA system in a very short
period of time.
We would like to emphasize that this was a team effort. It would never have
been possible without the data, tools, systems, and ideas provided by the participants in
the SLE; details of the contributions will be described later.
In the following, we describe the data and components we used in the system, the
system itself, the evaluation results, and the analysis of errors.
We used data, tools, and systems provided by other participants in the SLE, along with
some components we already had and some we had recently developed. We also
produced some data for NE, numeric expression dictionaries, and so on. In the next
section we present the system overview, followed by a description of each component.
2.1. System Overview
There are two kinds of components: those for tagging, in advance, the names and numeric
expressions in a Hindi newspaper (taggers) and those that comprise the main question-
answering system (system components).
The taggers are designed to tag possible answers in the text; they are applied in
advance to annotate the text. When a question to the QA system is analyzed, the type of
expected answer is determined. For example, if the question starts with "Who" the
expected answer type is a person or organization. If it starts with "When" the expected
answer type is "TIME". The types we used are the three NE types defined in the shared
NE task of the SLE (person, organization, and location) and 15 numeric types we had
developed. Once the type of expected answer is determined, the QA system tries to find
answer candidates in the text that match the type and relate to the question. All texts are
automatically tagged by the NE and numeric taggers in advance, so that answers may
easily be found at runtime.
The system structure is similar to that of monolingual QA systems [Voorhees and Tice
2000; Harabagiu et al. 2000; Sekine et al. 2002b]. The CLQA system consists of four
components: question examiner (QE), CLIR system, answer finder (AF), and MT system,
whereas most monolingual QA systems have three (QE, IR, and AF). The question is first
analyzed by the QE, which identifies the expected types of answer and the keywords. Then
the keywords are used by CLIR to retrieve relevant news articles in the Hindi newspaper.
Next, AF is used to extract answer candidates with confidence measures; finally, the
answer and context of the answer are translated back to English.
The user interface is web-based and written as a Perl CGI program. It receives the
question and displays the answer candidates. It has two user-selectable options: one
selects between two possible MT systems, ISI's MT system or word-to-word translation
using bilingual dictionaries; the other selects whether or not the processing information
should be displayed.
The overall system picture is shown in Figure 1.


Fig. 1. System overview.

2.2 Taggers
In a typical QA system (developed over several months or even years), the number of
word or phrase types annotated in the text may be in the dozens, or even 100 or more
[Hovy 2001; Sekine 2002; Mann 2002]. However, as our development time was so restricted,
we limited ourselves to the 3 NE types defined in the SLE's shared NE task and 15
numeric types.
2.2.1 NE Tagger
The common NE task for the SLE involved three types of names: names of people, names
of organizations, and names of locations. The specifications were developed by the
Linguistic Data Consortium, based on the NE task for the Message Understanding
Conferences. Using these specifications, about 600,000 words of data (principally BBC
news, but also samples of other sources) were labeled cooperatively by BBN, the
Linguistic Data Consortium, and our group at NYU. Two Hindi-speaking graduate
students did the annotation at NYU, using a tool provided by the Linguistic Data
Consortium.
The annotated data was used to train an HMM-based NE tagger. The general approach
to using an HMM for NE tagging was pioneered by BBN [Bikel et al. 1997]. We used a
relatively simple HMM, with 6 states for each type of name; unlike in the BBN system,
transition and emission probabilities were not conditioned on the prior state or token.
For tokens previously unseen at a given state, we backed off to an emission model based
on the final 3 characters of the token. When tested on a sample of 20 hand-annotated
Hindi BBC news reports, the tagger had a precision of 82% and recall of 74%.
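The suffix back-off described above can be illustrated with a small sketch. This is not the tagger used in the system: the class name, the unsmoothed relative-frequency estimates, and the example state name are assumptions made only for illustration, and the resulting values are not a normalized probability distribution.

```python
from collections import Counter, defaultdict


class SuffixBackoffEmission:
    """Toy emission model: use token counts when the token was seen at a
    state, otherwise fall back to counts of its final characters."""

    def __init__(self, suffix_len=3):
        self.suffix_len = suffix_len          # the paper uses the final 3 characters
        self.token_counts = defaultdict(Counter)
        self.suffix_counts = defaultdict(Counter)

    def observe(self, state, token):
        # Record one training occurrence of `token` emitted from `state`.
        self.token_counts[state][token] += 1
        self.suffix_counts[state][token[-self.suffix_len:]] += 1

    def prob(self, state, token):
        total = sum(self.token_counts[state].values())
        if total == 0:
            return 0.0
        seen = self.token_counts[state][token]
        if seen:
            return seen / total               # token seen at this state
        suffix = token[-self.suffix_len:]
        return self.suffix_counts[state][suffix] / total  # back-off estimate


model = SuffixBackoffEmission()
for name in ["Sharma", "Verma", "Singh"]:
    model.observe("PERSON", name)             # "PERSON" is a hypothetical state name
```

With these three training tokens, a previously unseen token ending in "rma" receives a non-zero back-off score, while an unseen token with an unseen suffix receives zero.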
2.2.2 Numeric Expression Tagger
The numeric expression tagger was developed by using simple pattern-matching; we
designed the 15 types shown in Figure 2.
The Hindi language uses both Arabic numerals and spelled-out numbers (like "one,"
"two," and "thirty"). We selected the 15 types based on an analysis of a KWIC list of
numbers, also referring to the Extended Named Entity definition [Sekine 2002a]. We
built the list of indicators of numeric expressions (e.g., "soldier" for the N_PEOPLE
type) from the KWIC list. The list contains 112 words and was compiled by a native
speaker of Hindi. Simple pattern-matching is done on the input sentences. The list of
patterns is shown in Figure 3.
This simple strategy works to some extent. Examples that the system could not tag
correctly include cases where a modifier exists before the INDICATOR (e.g., "10
American soldiers") or the numeric expression does not have an INDICATOR (e.g., "10
were dead"), and phrases with INDICATORs not included in our short lists.

Type              Description              Examples
AGE               Age of person            10 people, 2 soldiers
MONEY             Monetary expression      10 rupees, $200
PERCENT           Percent expression       10 percent, 2%
PERIOD            Length of time           10 hours, 2 days
PHYSICAL_EXTENT   Length                   10 meters, 2 inches
SPACE             Area                     10 acres, 2 hectares
SPEED             Speed                    10 mph
TEMPERATURE       Temperature              10 degrees Centigrade
TIME              Time                     June 30, 1998, 10AM
VOLUME            Volume                   10 cc
WEIGHT            Weight                   10 kilograms, 2 pounds
COUNTX            Number of things/events  10 missiles, 2 attacks
N_COUNTRY         Number of countries      10 countries
N_LOCATION        Number of locations      10 cities, 2 towns
N_PEOPLE          Number of people         10 people, 2 soldiers

Fig. 2. Fifteen numeric types.

digit+ INDICATOR → type of INDICATOR

[digit] MONTH [digit] → TIME
digit (1900-2009) → TIME

Fig. 3. Numeric patterns.
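
The three patterns of Figure 3 can be sketched as a small token-level matcher. This is a minimal illustration, not the tagger itself: the English indicator words stand in for the Hindi indicator list, the function name and the span format are assumptions, and optional digits around a month are treated as single tokens.

```python
# Hypothetical English stand-ins for the Hindi indicator list.
INDICATORS = {
    "rupees": "MONEY", "percent": "PERCENT", "hours": "PERIOD",
    "days": "PERIOD", "meters": "PHYSICAL_EXTENT", "kilograms": "WEIGHT",
    "soldiers": "N_PEOPLE", "people": "N_PEOPLE", "missiles": "COUNTX",
}
MONTHS = {
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
}


def tag_numeric(tokens):
    """Return (start, end, type) spans over `tokens`, with `end` exclusive."""
    spans = []
    i, n = 0, len(tokens)
    while i < n:
        low = tokens[i].lower()
        is_num = tokens[i].isdecimal()
        # Pattern: [digit] MONTH [digit] -> TIME
        if low in MONTHS or (is_num and i + 1 < n and tokens[i + 1].lower() in MONTHS):
            end = i + 1 if low in MONTHS else i + 2
            if end < n and tokens[end].isdecimal():
                end += 1
            spans.append((i, end, "TIME"))
            i = end
            continue
        if is_num:
            j = i
            while j < n and tokens[j].isdecimal():
                j += 1
            # Pattern: digit+ INDICATOR -> type of INDICATOR
            if j < n and tokens[j].lower() in INDICATORS:
                spans.append((i, j + 1, INDICATORS[tokens[j].lower()]))
                i = j + 1
                continue
            # Pattern: digit (1900-2009) -> TIME
            if 1900 <= int(tokens[i]) <= 2009:
                spans.append((i, i + 1, "TIME"))
        i += 1
    return spans
```

Under these assumptions, "10 soldiers died in 2003" yields an N_PEOPLE span and a TIME span, whereas "10 American soldiers" yields nothing, which is the modifier failure case noted in Section 2.2.2.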


The numeric tagger tagged the 5,557 newspaper articles in our corpus in a few
seconds. The recall and precision of the tagger, measured on a small corpus (149
instances of numeric expressions), are 35% and 52%, respectively. The limited
availability of native speakers made it difficult to improve or even debug the system.
2.3 System Components
The CLQA system consists of 4 components (QE, CLIR, AF, and MT), as described
above. We developed each component using existing programs or programs borrowed
from an outside system.
2.3.1 Question Examiner
The input to this component is the user's question in English. The program analyzes the
question, and outputs the expected type of answer (one or more of the 3 name types or 15
numeric expression types) and keywords in English, which will be used by the next
components, CLIR and AF.
The input is analyzed by a part-of-speech tagger and chunker; prepared key question
patterns are then applied to identify the expected answer type. From a question, all nouns,
adjectives, verbs, and adverbs except those in our stopword list are extracted as keywords.
No multiword phrases are used. Words are weighted for further use; proper nouns receive
the largest weight, nouns next, and the others are assigned the lowest score.
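
The behavior of the question examiner can be sketched as follows. This is only an illustration under stated assumptions: the input is taken to be already part-of-speech tagged with Penn Treebank-style tags, the stopword list is a toy one, the numeric weights 3, 2, and 1 are placeholders for the unspecified weights, and only the "Who" and "When" patterns come from the text.

```python
import re

# Toy stopword list (assumption; the actual list is not given in the paper).
STOPWORDS = {"is", "was", "did", "the", "of", "a", "an", "in", "to",
             "who", "when", "what", "which", "how", "many", "much"}

# Only the "Who" and "When" mappings are taken from the text.
TYPE_PATTERNS = [
    (re.compile(r"^who\b", re.IGNORECASE), ["PERSON", "ORGANIZATION"]),
    (re.compile(r"^when\b", re.IGNORECASE), ["TIME"]),
]


def expected_types(question):
    """Return the expected answer types, or an empty list if no pattern fires."""
    for pattern, types in TYPE_PATTERNS:
        if pattern.search(question):
            return types
    return []


def weighted_keywords(tagged_tokens):
    """Keep nouns, adjectives, verbs, and adverbs that are not stopwords.

    Proper nouns get the largest weight, other nouns the next, and the
    remaining content words the lowest (placeholder values 3, 2, 1).
    """
    keywords = {}
    for word, pos in tagged_tokens:
        if word.lower() in STOPWORDS:
            continue
        if pos.startswith("NNP"):
            keywords[word] = 3.0
        elif pos.startswith("NN"):
            keywords[word] = 2.0
        elif pos[:2] in ("JJ", "VB", "RB"):
            keywords[word] = 1.0
    return keywords


question = "When did India test Prithvi missile"
tagged = [("When", "WRB"), ("did", "VBD"), ("India", "NNP"),
          ("test", "VB"), ("Prithvi", "NNP"), ("missile", "NN")]
answer_types = expected_types(question)
keywords = weighted_keywords(tagged)
```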
2.3.2 CLIR System
Cross-lingual information retrieval is used to find the documents that might contain
answer candidates. The system looks up the keywords in an English-Hindi bilingual
dictionary (described in Section 2.5) and creates a word list of all possible Hindi
translations of the English keywords, each carrying the score assigned in QE; we did
not do any scoring in this translation phase. As we found that the reliability of
different dictionary sources varied greatly, a simple scoring method based on the
source dictionary might improve the CLIR performance. We chose word-based CLIR
based on suggestions in the SLE mailing list. We also followed the suggestion to use
IBM's large bilingual dictionary, created automatically from a bilingual corpus, although
we found that the quality of the dictionary did not seem as high as that of the other
dictionaries.
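
The keyword translation step can be sketched as a plain dictionary lookup in which every Hindi translation inherits the weight of its English keyword. The function name, the toy dictionary entries, and the choice to keep the larger weight when two keywords share a translation are assumptions for illustration.

```python
def translate_keywords(weighted_keywords, bilingual_dict):
    """Map weighted English keywords to weighted Hindi query terms.

    Returns the Hindi terms with their inherited weights and the list of
    English keywords that have no dictionary entry.
    """
    hindi_terms = {}
    untranslated = []
    for en_word, weight in weighted_keywords.items():
        translations = bilingual_dict.get(en_word.lower(), [])
        if not translations:
            untranslated.append(en_word)
            continue
        for hi_word in translations:
            # No re-scoring here: each translation keeps the QE weight.
            hindi_terms[hi_word] = max(weight, hindi_terms.get(hi_word, 0.0))
    return hindi_terms, untranslated


# Toy dictionary; real entries came from the resources listed in Figure 5.
toy_dict = {"india": ["भारत", "हिंदुस्तान"], "missile": ["मिसाइल"]}
terms, missing = translate_keywords(
    {"India": 3.0, "Prithvi": 3.0, "missile": 2.0}, toy_dict)
```

In this toy run "Prithvi" has no entry and is returned in the untranslated list, which mirrors the zero-translation problem analyzed in Section 3.4.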
2.3.3 Answer Finder
We used the top 20 newspaper articles retrieved by the CLIR component to find the
answer candidates. This number (20) was selected intuitively; optimization of the
number might improve the total performance. In those articles, the expressions that
matched the type found by the QE component were the candidates for the answer. We
scored those candidates based on their distance from Hindi keywords and keyword
weight. The score for a candidate expression is calculated by the formula shown in
equation (1).

\[ \mathrm{score}(e) = \sum_{k \in \mathrm{keywords}} \frac{c_1}{\mathrm{dist}(k, e) + c_2} \cdot \mathrm{keyword\_weight}(k) \cdot \mathrm{article\_score} \tag{1} \]



Here, c1 and c2 are constants; dist(k,e) is the distance in tokens between the candidate and
keyword; keyword_weight is the weight assigned by the question examiner; and
article_score is the score assigned by the CLIR system.
Because this scoring needs no linguistic analysis (e.g., morphological analysis or
dependency analysis), it was reasonable to try in the limited development time. Further
investigation of the benefits of such analyses, or of more sophisticated scoring, might
be important.
Note that even if more than one Hindi keyword corresponding to an English keyword
exists, only the closest Hindi keyword is used for the calculation; this results in preferring
the answer candidate with the larger number of keywords rather than a smaller number of
very close keywords.
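
The candidate scoring can be sketched as below, assuming equation (1) has the form c1 / (dist(k, e) + c2) multiplied by the keyword weight and the article score, summed over keywords. The function name, the data layout, and the default values of c1 and c2 are assumptions; the paper does not give the constants.

```python
def score_candidate(cand_pos, keyword_hits, article_score, c1=1.0, c2=1.0):
    """Score one answer candidate located at token position `cand_pos`.

    keyword_hits maps each English keyword to a pair
    (weight, positions of its Hindi translations in the article).
    """
    total = 0.0
    for weight, positions in keyword_hits.values():
        if not positions:
            continue  # keyword does not occur in this article
        # Only the closest Hindi translation of each English keyword counts.
        dist = min(abs(p - cand_pos) for p in positions)
        total += c1 / (dist + c2) * weight * article_score
    return total


hits = {
    "India": (3.0, [8, 30]),   # closest occurrence is 2 tokens away
    "missile": (2.0, [12]),    # 2 tokens away
    "Prithvi": (3.0, []),      # no translation found in the article
}
example_score = score_candidate(10, hits, article_score=0.5)
```

Because each English keyword contributes at most once, a candidate near several different keywords outscores one that is near many occurrences of a single keyword.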
2.3.4 The MT System
We now have answer candidates in Hindi and must translate them back into English. As
the answer candidates are noun phrases, we did not use automatic MT, but rather a
bilingual dictionary to produce the English answers. For the words not in the dictionary
(out of vocabulary words), we used Romanized Hindi words (produced by public-domain
tools). The system produced an English translation of the context of each answer, so that
users could verify whether the answer is correct or not. We provided two options: one is
to use ISI's machine translation (MT) server, and the other to use word-to-word
translation. ISI's MT system is also a product of SLE and was available to us as an MT
server. The word-to-word translation was just a look-up of the bilingual dictionary with
all possible translations listed for each word of the context. The out of vocabulary words
in the context were not translated using Romanization, as this may just confuse English
2.4 User Interface
The system is accessible via the Web. There are two pages; one to accept a question and
the other to display the answer candidates and context sentences (Figure 4). In the first
page, the user can also select two options; one for the choice of MT system (ISI's MT, or
word-to-word translation), and the other to display all information or answer candidates
only. The second option is for system development purposes, but is also useful for
understanding how the system works.
2.5 Data
We used several kinds of data: news reports in Hindi, several kinds of dictionaries,
NE-tagged text for training the NE tagger, and word lists for numeric expressions. As our
source of answers, we used Hindi BBC news text provided by MITRE. It consists of a
half-year (January to June 2003) of news reports, which MITRE downloaded from the
web site. The original data has many overlaps, as files were downloaded almost every day
without regard to duplication of articles. The duplicates were removed by simple
heuristics; the final corpus had 5,557 articles.
The bilingual dictionary was created from many sources (Figure 5). As noted in other
articles in this issue, encoding is one of the biggest problems in handling Hindi text.
Other data was available, but we sought to avoid this problem by using only those
resources that were easily convertible to UTF-8 encoding.


Fig. 4. User interface of the QA system (displaying answers).

IIIT's Lexicon
SPAWAR's TIMEXP list (version2)
UMASS's city name list
UMASS's stopword list
NYU's TIMEXP supplement dictionary
Sheffield's gazetteer
NYU's 1000-word lexicon missing in BBC (created by a native speaker)
CMU's NE list (corrected by a native speaker)
ISI's Location dictionary
IBM's dictionary derived from IBM's bi-text

Fig. 5. Resources for our dictionary.

Finally, the knowledge needed for the numeric expression tagger was mostly created
at NYU. Three kinds of knowledge were used: a list of spelled-out numeric expressions,
time expressions (created from SPAWAR's TIMEXP list and supplemented by our native
speaker), and the indicators of numeric expressions. As explained before, the
indicator list is the set of words that follow numbers and indicate the type of numeric
expression (for example, "soldier" for N_PEOPLE). Using a KWIC list, a native speaker
tagged the 150 most frequent words as indicators of numeric types.
We evaluated the CLQA system using questions created by a native speaker of Hindi.
The result indicates that the system works for some questions and that its accuracy is
acceptable, even though it was developed in one month.
3.1 Creating Questions and Answers
We asked a native Hindi speaker to read newspaper articles and create some questions for
the system. We knew that generating questions by inspecting the texts that the system
subsequently uses to find answers was likely to produce questions whose wording is very
similar to that of the corpus (and which are therefore easier for the system to process
correctly). However, mostly due to time constraints, we could not avoid this; i.e., we
did not have the time and resources to judge whether answers existed in the corpus for a
set of independently created questions. The types of questions were balanced by NE and
numeric types; we also suggested including India-specific events for some portion of the
questions. The native speaker gave the retrieval tool for newspaper articles some random
keywords, and when she found an interesting article, made up a question about it. There
were a total of 56 questions; the type distribution was person (19), organization (6),
location (4), age (2), time (7), weight (4), physical_extent (2), percent (2),
people (2), money (2), volume (1), speed (1), and countx (4).
3.2 Example Questions
Figure 7 shows some examples of questions, preceded by the rank of the correct answer
the system obtained (- indicates that the correct answer did not appear among the top 10
returned by the system for that question).
3.3 Evaluation Results
We ran the system for the 56 questions and evaluated the result. For 25 questions, the
system found the correct answer in English (including 3 cases where the answer was given
in Romanized form because no translation was found in the bilingual dictionary). Table I
shows the number of questions for which the correct answer was found at a given rank. The
distinction between the two columns (Hindi/English correct answers) is explained below.
The MRR (mean reciprocal rank) for the top 5 answers is 0.25 for the 56 questions.
Although the question’s level of difficulty is a very important factor, and it is a bit
dangerous to compare different evaluations, the MRR score is not much lower than that of
some monolingual QA systems. We believe that this suggests that cross-lingual question-
answering is a viable application.
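
As a worked check of the MRR figure, the sketch below computes the mean reciprocal rank over the top 5 ranks from the counts in Table I. The function name is an assumption, and the text does not say which column the reported figure is based on; with these counts, the Hindi column gives about 0.25 and the English column about 0.22.

```python
def mrr_at_k(rank_counts, num_questions, k=5):
    """Mean reciprocal rank, counting only answers found at rank <= k.

    rank_counts maps a rank to the number of questions whose correct
    answer was found at that rank; unanswered questions contribute 0.
    """
    reciprocal_sum = sum(count / rank
                         for rank, count in rank_counts.items() if rank <= k)
    return reciprocal_sum / num_questions


# Counts copied from Table I (56 questions in total).
HINDI = {1: 10, 2: 2, 3: 8, 4: 1, 5: 1, 6: 2, 7: 1, 8: 4, 9: 0, 10: 0}
ENGLISH = {1: 9, 2: 2, 3: 6, 4: 1, 5: 1, 6: 2, 7: 1, 8: 3, 9: 0, 10: 0}

mrr_hindi = mrr_at_k(HINDI, 56)
mrr_english = mrr_at_k(ENGLISH, 56)
```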
3.4 Analyses
We analyzed the cause of errors at each stage.
The question examiner tries to find the expected answer type and keywords. Table II
shows the accuracy of the type-finding process.


Rank Hindi English

1 10 9
2 2 2
3 8 6
4 1 1
5 1 1
6 2 2
7 1 1
8 4 3
9 0 0
10 0 0
total 29 25

Table I. Number of Questions and Correct Answer Ranking

1: Who is the prime minister of India
3: Who is the chief of the weapon inspection team sent to Iraq
1: Which company created a video game on Iraq war
1: When was modern Jewish nation established in Tel-Aviv
3: When was PLO established
-: When did India test Prithvi missile
-: When was Shimla Agreement signed between India and Pakistan
3: How much money was involved in Bali bomb attack
-: What was the age of Harivansh Rai Bachhan when he died
8: What percentage of shares are held by Indian government in Maruti company
1: What is the proposed length of the gas pipeline to be brought from Iran to India via
-: How many gallons of water is present in Amrit Lake at Amritsar in India
6: What is the weight of Indian rocket Rohini-560
-: What is the speed of Supersonic Concorde
-: How many times did Chris Everett win French Open
1: How many people have been feared dead due to flood in southern part of SriLanka
3: Which Indian player has broken the record for scoring maximum runs in World Cup
-: Which Indian political party was once headed by Farooq Abdullah

Fig. 7. Example questions.

Table II. Accuracy of Type-Finding

Correct 51
Incorrect 3
Can't find type 2


Out of 56 questions, the question examiner component found the wrong type for 3
questions and could not find any type for 2 questions. As the question examiner is a
component we had developed before, the accuracy is quite high, and we can say that the
overall result was not affected by this component. The questions for which the system
could not find the type start with "Which Indian cricketer has been cleared off …" and
"Which is the Headquarters of …". In both cases, the nouns "cricketer" and "Headquarter"
were not listed in the common noun dictionary to indicate the type. The questions for
which the system made the wrong guess are "How many gallons of water is present in
Amrit Lake at Amritsar in India," where the correct type is VOLUME but the system
found COUNTX, "How much does Pakistan's Shaheen missile weigh," where the correct
type is WEIGHT but the system found MONEY or VOLUME, and "How much tea was
exported to Iraq by India in year 2002," where the correct type is WEIGHT but the system
found MONEY or VOLUME.
We did not measure the accuracy of keyword extraction, as it is not easy to evaluate
separately; but we did evaluate the quality of keyword translation. Here, we used the
dictionary created from various sources. As a result, a single word sometimes has many
translations. For example, the English word "India" has 11 Hindi translations in the
bilingual dictionary, but only 3 are correct. We found that this over-generation is
mostly due to IBM's bilingual dictionary, which is generated automatically from
bilingual text. This phenomenon is more common for popular words. Although CLIR can
reduce the effect of such mistakes by combining the scores of multiple keywords, this
might cause some degradation in accuracy. From 56 questions, 268 English keywords (in
total) were produced (an average of 4.8 keywords per question). Of the 182 keywords
that have an entry in the dictionary, 105 have wrong translations, and the average
number of wrong translations is 3.4. Further investigation is needed to analyze the
effect of the wrong translations.
There are also 86 keywords that have no translation. Most are proper nouns, like
person, location, organization, or product names. This is clearly an important cause of
errors. For example, for the question "When did India test Prithvi missile," if "Prithvi"
can't be translated, there is little possibility that the system can extract the correct answer.
The relationship between the number of zero translations of keywords for each question
and the existence of the correct Hindi answer in the top 10 is shown in Table III.
It is clear from the table that if there is no translation for some keywords, accuracy is
much worse (38%; 14 out of 37) than when all keywords are translated (58%; 11 out of
19). We might need transliteration technology to improve this.
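
The two percentages quoted above follow directly from the counts in Table III. The short sketch below reproduces the arithmetic; the function and variable names are illustrative only.

```python
# Counts from Table III, indexed by the number of zero translations (0 to 5).
FOUND = [11, 7, 5, 2, 0, 0]
NOT_FOUND = [8, 9, 7, 6, 0, 1]


def found_rate(found, not_found, columns):
    """Return (questions answered, questions in group, fraction answered)."""
    hits = sum(found[c] for c in columns)
    total = hits + sum(not_found[c] for c in columns)
    return hits, total, hits / total


all_translated = found_rate(FOUND, NOT_FOUND, [0])            # 11 of 19
some_missing = found_rate(FOUND, NOT_FOUND, [1, 2, 3, 4, 5])  # 14 of 37
```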
We evaluated our CLIR system by a known-document retrieval test. The document
used to create the question is included in the top 20 retrieved documents 55% of the time.
Because the answer can be extracted from articles other than the original article, this
does not necessarily mean that 45% of the time the answer can't be found, although
obviously these are correlated.

Table III. Relationship Between Zero Translations and Existence of Correct Answers

Number of zero translations     0    1    2    3    4    5   Total
Correct answer found           11    7    5    2    0    0      25
Correct answer was not found    8    9    7    6    0    1      31

Table IV. Accuracy in Tagging Answers

           Tagged   Not-tagged   Total
NE             29            0      29
Numeric        17           10      27

Next, we evaluated NE and numeric tagger accuracy. For each question, we checked
whether the answer was correctly tagged in the original article used to create the
question. Table IV shows the result. Incidentally, there were no errors in NE tagging,
perhaps because the native speaker picked easy NE expressions; but further
investigation is needed.
Table I shows the number of questions for which the correct answer was found at a
given rank in Hindi and English. The decrease in the numbers from the Hindi column to
the English column is due to translation errors. There are four instances: two are
misspellings of English words in Romanized English. The other two are translation errors
in numeric expressions (“200 feet” and “200 dead” are not correctly translated into
English due to insufficient coverage in the bilingual dictionary).
We developed a cross-lingual question-answering (CLQA) system for Hindi and English
in one month. We believe that we showed that cross-lingual question-answering is viable
and that a basic system can be constructed quickly once other linguistic tools become
available. We evaluated the main sources of errors in the system, which suggested
several ways for improving the accuracy of the CLQA system.
This research could not have been completed without the contribution of many “Team
TIDES” researchers. We would like to thank the coordinator and all the participants in
the SLE. Most of the data was provided by other groups, and many ideas were suggested
on the Surprise Language mailing list. Contributors provided data (MITRE); MT (ISI);
dictionaries (LDC, IBM, CMU, SPAWAR, U. of Sheffield, ISI, BBN); tools (UMD,
Alias-I); know-how in CLIR (CMU); and NE tagger data (BBN, LDC). We also thank
Winston Lin and two Hindi speakers at NYU, Priyanka Mittal and Smita Shukla, for their
BIKEL, D. M., MILLER, S., SCHWARTZ, R., AND WEISCHEDEL, R. 1997. Nymble: A high-performance learning
name-finder. In Proceedings of the Applied Natural Language Processing Workshop.
ECHIHABI, A., OARD, D., MARCU, D., AND HERMJAKOB, U. 2003. Answering Spanish questions from English
documents. CLEF-2003 Workshop Homepage. http://clef.iei.pi.cnr.it:2002/
HARABAGIU, S., PASCA, M., AND MAIORANO, S. 2000. Experiments with open-domain textual question answers.
In Proceedings of the International Conference on Computational Linguistics (COLING 2000), 292-298.
HOVY, E., HERMJAKOB, U., AND RAVICHANDRAN, D. 2001. A question/answer typology with surface text
pattern. In Proceedings of the Human Language Technology Conference.
LI, W., SRIHARI, R., NIU, C., AND LI, X. 2003. Question answering on a case intensive corpus. In Proceedings
of the Workshop on Multilingual Summarization and Question Answering – Machine Learning and
Beyond (Sapporo, Japan).


M. 2003a. The multiple language question answering track at CLEF 2003. CLEF-2003 Workshop
Homepage. http://clef.iei.pi.cnr.it:2002/
MAGNINI, B. 2003b. Italian in the monolingual and cross-language QA tasks. CLEF-2003 Workshop Homepage.
MANN, G. 2002. Fine-grained proper noun ontologies for question answering. In SemaNet’02: Building and
Using Semantic Networks.
MOLDOVAN, D., CLARK, C., HARABAGIU, S., AND MAIORANO, S. 2003. COGEX: A logic prover for question
answering. HLT-NAACL 2003 (Edmonton, Canada), 166-172.
NEUMANN, D. 2003. A cross-language question/answering-system for German and English. CLEF-2003
Workshop Homepage. http://clef.iei.pi.cnr.it:2002/
O’GORMAN, A., GABBAY, I. AND SUTCLIFFE, R. E. F. 2003. French in the cross-language task. CLEF-2003
Workshop Homepage. http://clef.iei.pi.cnr.it:2002/
RAMAKRISHNAN, G., JADHAV, A., JOSHI, A., CHAKRABARTI, S., AND BHATTACHARYYA, P. 2003. Question
answering via Bayesian inference on lexical relations. In Proceedings of the Workshop on Multilingual
Summarization and Question Answering – Machine Learning and Beyond (Sapporo, Japan).
RAVICHANDRAN, D., HOVY, E., AND OCH, F. 2003. Statistical QA – classifier vs. re-ranker: What’s the
difference? In Proceedings of the Workshop on Multilingual Summarization and Question Answering –
Machine Learning and Beyond (Sapporo, Japan).
RIJKE, M. 2003. Off-line strategies for answering Dutch questions. CLEF-2003 Workshop Homepage.
SEKINE, S., SUDO, K., AND NOBATA, C. 2002a. Extended named entity hierarchy. In Proceedings of the
Language Resource and Evaluation Conference (LREC).
system, QAC question analysis and CRL QA data. In Proceedings of the NTCIR Workshop 3 Meeting
Question Answering Challenge (QAC-1), 79-86.
VOORHEES, E. AND TICE, D. 2000. Building a question-answering test collection. In Proceedings of the 23rd
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
ACM, New York, 192-199.
VOORHEES, E. 2002. Overview of TREC-9 question answering track. In Proceedings of the Text Retrieval
Conference 9.

Received August 2003; revised September 2003; accepted November 2003
