
Use of Patterns for Detection of Answer Strings
Soubbotin and Soubbotin

Essentials of Approach
A certain shift from deep text analysis and
NLP methods to surface techniques
Use of formulas describing the structure of
strings likely bearing certain semantic
information

Example
FBI Director Louis Freeh
A person represented by his/her first/last
names
A person occupies a post in an
organization

The formula
A word composed of capital letters
An item from a list of posts in an
organization
An item from a list of first names
A capitalized word
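
The four-part formula above can be sketched as a regular expression whose middle elements are drawn from word lists. This is only an illustration: the lists `POSTS` and `FIRST_NAMES`, the helper `alternation`, and the function `find_person_post_org` are hypothetical names, and the actual lists used by the authors are not given in the slides.

```python
import re

# Hypothetical, tiny word lists; the real system's lists are not in the slides.
POSTS = ["Director", "President", "Chairman", "Secretary"]
FIRST_NAMES = ["Louis", "John", "Mary", "George"]


def alternation(items):
    """Build a non-capturing regex alternation from a list of literal words."""
    return "(?:" + "|".join(re.escape(word) for word in items) + ")"


PERSON_POST_ORG = re.compile(
    r"\b(?P<org>[A-Z]{2,})\s+"                           # a word composed of capital letters
    r"(?P<post>" + alternation(POSTS) + r")\s+"          # an item from a list of posts
    r"(?P<first>" + alternation(FIRST_NAMES) + r")\s+"   # an item from a list of first names
    r"(?P<last>[A-Z][a-z]+)\b"                           # a capitalized word
)


def find_person_post_org(text):
    """Return one dict per string in `text` that matches the formula."""
    return [m.groupdict() for m in PERSON_POST_ORG.finditer(text)]


matches = find_person_post_org("FBI Director Louis Freeh spoke on Monday.")
```

Here `matches` holds a single dictionary with the organization, post, first name and last name of the example string.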

Patterns
Formulas of such kind were called
patterns
First used at TREC-10 QA track
Each pattern is characterized by a certain
generalized semantics

Steps (Overview)
Identify strings corresponding to a formula
Identify the question terms (types)
Check for expressions negating the
semantics of the found strings
Apply the set of formulas (for a particular
question type) to match the strings in
question-relevant passages

A Surface Approach
No need to distinguish linguistic entities
Formulas for strings look like regular
expressions
But patterns include elements referring to
lists of predefined words/phrases

Patterns and Question Types
Who is person X?
Who occupies post Y in organization Z?
A relationship is established between 2 or
more entities: person, post, organization, etc.

Where-question:
Suggest geographical items as answers
Construct formulas like: item from list of
cities/towns/counties, countries/states.
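
A formula of the kind "item from a list of cities" or "item from a list of countries" can be sketched as a lookup against a gazetteer. The lists `CITIES` and `COUNTRIES` and the functions `build_list_pattern` and `where_candidates` below are hypothetical and stand in for whatever lists the original system used.

```python
import re

# Hypothetical gazetteer entries, for illustration only.
CITIES = ["Caracas", "New York", "Paris"]
COUNTRIES = ["Venezuela", "France", "United States"]


def build_list_pattern(items):
    """Compile a regex that matches any item from the list as a whole word."""
    # Longest items first, so a multiword name is tried before any shorter prefix.
    ordered = sorted(items, key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(re.escape(i) for i in ordered) + r")\b")


CITY_RE = build_list_pattern(CITIES)
COUNTRY_RE = build_list_pattern(COUNTRIES)


def where_candidates(passage):
    """Return (kind, text) pairs for geographical items, in passage order."""
    found = []
    for kind, regex in (("city", CITY_RE), ("country", COUNTRY_RE)):
        for m in regex.finditer(passage):
            found.append((m.start(), kind, m.group()))
    return [(kind, text) for _, kind, text in sorted(found)]


candidates = where_candidates("The summit was held in Paris, France.")
```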

Examples

"In what year" questions
Find strings with a sequence of 4 digits
Questions regarding length, area, weight,
speed, etc.
Digits plus units of measurement
What is the area of Venezuela?
340,569 square miles (a simple pattern
match)
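
The two simple patterns above (a sequence of 4 digits, and digits plus a unit of measurement) can be sketched as follows. The unit list `UNITS` and the function names are assumptions made for the example; the slides do not list the units actually used.

```python
import re

# "In what year" questions: a standalone sequence of 4 digits.
YEAR_RE = re.compile(r"\b\d{4}\b")

# Hypothetical list of units of measurement.
UNITS = ["square miles", "square kilometers", "miles per hour",
         "miles", "kilometers", "pounds"]

# Longest units first, so "miles per hour" is tried before "miles".
_unit_alt = "|".join(re.escape(u) for u in sorted(UNITS, key=len, reverse=True))

# Digits (optionally with thousands separators or a decimal part) plus a unit.
MEASURE_RE = re.compile(
    r"\b(?P<number>\d{1,3}(?:,\d{3})+|\d+(?:\.\d+)?)\s+(?P<unit>" + _unit_alt + r")\b"
)


def find_years(text):
    """Return all 4-digit strings in `text`."""
    return YEAR_RE.findall(text)


def find_measures(text):
    """Return (number, unit) pairs found in `text`."""
    return [(m.group("number"), m.group("unit")) for m in MEASURE_RE.finditer(text)]


area = find_measures("Venezuela covers 340,569 square miles.")
```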

Complex Patterns
Strings expressing relationship between
several semantic entities
The more complex a pattern is, the higher
its reliability

Names and Dates

People Names
Items from first name list
Capitalized words
Specific name elements (bin, van, etc.)
Abbreviations like Sr. and Jr.

Dates
Prepositions, articles, digits, month names, commas,
dashes, brackets, phrases like early, in the period
of, years ago, B.C.
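
As a rough illustration of how the date elements listed above might combine, the sketch below joins a few alternatives (month name plus digits and a comma, "early" plus a year, "years ago", "B.C.") into one expression. The specific alternatives in `DATE_ALTERNATIVES` are assumptions chosen for the example and cover only a small part of what the slide enumerates.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Hypothetical date formulas built from the listed elements.
DATE_ALTERNATIVES = [
    r"(?:" + MONTHS + r")\s+\d{1,2},\s+\d{4}",   # month name, digits, comma: March 4, 1933
    r"(?:early|late)\s+\d{4}s?",                 # phrase plus year: early 1990s
    r"\d+\s+years\s+ago",                        # 50 years ago
    r"\d{1,4}\s+B\.C\.",                         # 300 B.C.
]

DATE_RE = re.compile(r"\b(?:" + "|".join(DATE_ALTERNATIVES) + r")")


def find_dates(text):
    """Return the date-like strings found in `text`, in order of appearance."""
    return [m.group() for m in DATE_RE.finditer(text)]


dates = find_dates("He took office on March 4, 1933; the city dates to 300 B.C.")
```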

Pattern-Matching Strings and Question Semantics

How question words are located relative to the pattern-matching
string (distance, left/right, position relative to
other matching strings, etc.)
Simplicity of a pattern's structure is
compensated by complexity of rules
Without applying heuristic rules, sufficiently
reliable results cannot be ensured
Rank assigned to question words/phrases and
score assigned to candidate answers
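
The slides do not give the heuristic rules or the scoring formula. Purely as an assumption, the sketch below scores a candidate answer by summing the rank of each question term found in the passage, divided by its token distance from the candidate string. The function `score_candidate` and the rank values are hypothetical.

```python
def score_candidate(tokens, span, term_ranks):
    """Score a candidate answer by the proximity of ranked question terms.

    tokens     -- list of lowercase passage tokens
    span       -- (start, end) token indexes of the candidate, end exclusive
    term_ranks -- dict mapping a question term to its rank (weight)
    """
    start, end = span
    score = 0.0
    for i, token in enumerate(tokens):
        if start <= i < end:
            continue  # skip tokens inside the candidate itself
        rank = term_ranks.get(token)
        if rank is None:
            continue  # not a question term
        # Distance in tokens to the nearest edge of the candidate (at least 1).
        distance = start - i if i < start else i - end + 1
        score += rank / distance
    return score


tokens = "the area of venezuela is 340,569 square miles".split()
score = score_candidate(tokens, (5, 8), {"venezuela": 2.0, "area": 1.0})
```

In this example "venezuela" is 2 tokens to the left of the candidate and "area" is 4 tokens to the left, so the score is 2.0/2 + 1.0/4 = 1.25.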

QA Process

Define question types for all questions
Order the questions with more reliable patterns
Form and rank queries from question terms
Modify queries (if score is below threshold)
Identify pattern-matching strings (apply complex
patterns first, then simple ones)
Check correlation between patterns and
question semantics
Identify exact answers and calculate their scores
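
The step of applying complex patterns before simple ones can be sketched as below. The two patterns, their names, and the reliability values are invented for the example; they only illustrate the idea that a more complex pattern is treated as more reliable.

```python
import re

# (name, compiled pattern, reliability), listed from complex to simple.
# All three fields are hypothetical values for illustration.
PATTERNS = [
    ("area_of_is",
     re.compile(r"area\s+of\s+[A-Z][a-z]+\s+is\s+(?P<answer>\d[\d,]*\s+square\s+miles)"),
     0.9),
    ("number_unit",
     re.compile(r"(?P<answer>\d[\d,]*\s+square\s+miles)"),
     0.4),
]


def extract_answers(passage, patterns=PATTERNS):
    """Apply patterns in order and return (reliability, answer, pattern name) tuples."""
    best = {}
    for name, regex, reliability in patterns:  # complex patterns come first
        for m in regex.finditer(passage):
            # Keep only the first (most reliable) pattern that produced this answer.
            best.setdefault(m.group("answer"), (reliability, name))
    return sorted(
        ((rel, answer, name) for answer, (rel, name) in best.items()),
        reverse=True,
    )


ranked = extract_answers(
    "The area of Venezuela is 340,569 square miles, "
    "while Guyana covers 83,000 square miles."
)
```

The string matched by the complex pattern is ranked first; the second string is found only by the simple pattern and receives the lower reliability.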

Analysis of Results

TREC 2002: confidence-weighted score = 0.691
271 right answers, 209 wrong answers, 148
no answer
First 29 correct answers belonged to question
types with highly reliable patterns
Incorrectly identified answer strings = 13.6%
(excluding NIL answers)
