Information Extraction - CS

INFORMATION
EXTRACTION
John Francis Olivo
BSCS 3-4
INFORMATION EXTRACTION
Input text → IE System → Relevant info from text
Information Extraction
• Skims the input text
• Creates database entries
• Objects and events
• Relations between objects and events
• Midway between IR and full-text parsers

Information Extraction Systems
Informally:
Get the gist of the text and organize it in a way useful to people
Formally:
Gather the semantic information out of documents, esp. web pages, that allows further inferences to be made
IE systems are considered “dumber” versions of full natural language understanding, because IE focuses on learning particular relations and producing a structured representation
AHA!
Process:
- Skim text
- Locate relation instances
- Get information
- Store in DB
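
The four steps above can be sketched in a few lines of Python. This is only an illustration, not a real IE system: the `works_for` relation, the `WORKS_FOR` pattern, the table layout, and the example sentences are all made up for this sketch.

```python
import re
import sqlite3

# Hypothetical pattern for one relation: "<Person> works for <Organization>".
WORKS_FOR = re.compile(
    r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)*) works for "
    r"(?P<org>[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*)"
)

def extract_to_db(text, conn):
    """Skim the text, locate relation instances, and store them as rows."""
    conn.execute("CREATE TABLE IF NOT EXISTS works_for (person TEXT, org TEXT)")
    # Locate relation instances and get the information out of each match.
    rows = [(m.group("person"), m.group("org")) for m in WORKS_FOR.finditer(text)]
    # Store in DB.
    conn.executemany("INSERT INTO works_for VALUES (?, ?)", rows)
    conn.commit()
    return rows

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
text = "Ana Cruz works for Acme Corp. Yesterday it rained. Ben Reyes works for Globex."
extract_to_db(text, conn)
stored = conn.execute("SELECT person, org FROM works_for ORDER BY person").fetchall()
print(stored)
```

The sentence about the weather is skipped entirely, which is the point of "skimming": only text that matches a relation of interest produces a database entry.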
Named Entity Recognition (NER)
As presented by Christopher Manning, NER is a very important subtask in information extraction
NER does the task of LOCATING and CLASSIFYING the entities (names) found in the text.
Sample text
The decision by the independent MP Andrew Wilkie to
withdraw his support for the minority Labor government
sounded dramatic but it should not further threaten its
stability. When, after the 2010 election, Wilkie, Rob
Oakeshott, Tony Windsor and the Greens agreed to
support Labor, they gave just two guarantees: confidence
and supply.
Sample text task: FIND ENTITIES
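The decision by the independent MP Andrew Wilkie to
withdraw his support for the minority Labor government
sounded dramatic but it should not further threaten its
stability. When, after the 2010 election, Wilkie, Rob
Oakeshott, Tony Windsor and the Greens agreed to
support Labor, they gave just two guarantees: confidence
and supply.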
Sample text task: CLASSIFY
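The decision by the independent MP Andrew Wilkie to
withdraw his support for the minority Labor government
sounded dramatic but it should not further threaten its
stability. When, after the 2010 election, Wilkie, Rob
Oakeshott, Tony Windsor and the Greens agreed to
support Labor, they gave just two guarantees: confidence
and supply.
KEY: Person, Date, Location, Organization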
Some uses:
Named entities can be indexed, linked off, etc.
Sentiment can be attributed to companies or products
A lot of IE relations are associations between named
entities
Standard approaches to IE (and NER)
1. Hand-written regular expressions
2. Using classifiers
Generative: Naïve Bayes
Discriminative: Maxent models
3. Sequence models
HMMs
CMMs/MEMMs
CRFs
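
As a rough illustration of approach 1 (hand-written regular expressions), the sketch below locates and classifies entities in the sample text. The three patterns and the tiny organization list are assumptions made for this example, not rules taken from the referenced material; classifier and sequence-model approaches would learn such decisions from labeled data instead.

```python
import re

SAMPLE = (
    "The decision by the independent MP Andrew Wilkie to withdraw his support "
    "for the minority Labor government sounded dramatic but it should not "
    "further threaten its stability. When, after the 2010 election, Wilkie, "
    "Rob Oakeshott, Tony Windsor and the Greens agreed to support Labor, they "
    "gave just two guarantees: confidence and supply."
)

# Hand-written rules (illustrative assumptions):
PATTERNS = [
    ("Date", re.compile(r"\b(?:19|20)\d{2}\b")),               # four-digit years
    ("Organization", re.compile(r"\b(?:Labor|Greens)\b")),     # tiny name list
    ("Person", re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")),    # two capitalized words
]

def find_entities(text):
    """LOCATE entity mentions with regexes and CLASSIFY each by its rule's label."""
    found = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.append((match.start(), match.group(), label))
    # Report entities in the order they appear in the text.
    return [(mention, label) for _, mention, label in sorted(found)]

entities = find_entities(SAMPLE)
for mention, label in entities:
    print(f"{label:13} {mention}")
```

Note that the lone surname "Wilkie" in the second sentence is missed, because the Person rule needs two capitalized words in a row. Gaps like this are typical of hand-written rules.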
Types of IE Systems
(a) Attribute-based IE system
(b) Relational-based IE system

Attribute-based IE systems
The whole text is treated as describing a single object
The task is to get the attributes of that object
Low-level information extraction

e.g. mail program extractions:
- time and date
- phone number
- events
- etc
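
A minimal sketch of attribute-based extraction for the mail example, assuming one regular expression per attribute. The date, time, and phone formats below and the sample message are illustrative assumptions; a real mail program would have to handle many more formats.

```python
import re

# One pattern per attribute of the single object (the mail message).
ATTRIBUTE_PATTERNS = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),            # e.g. 2025-03-14
    "time": re.compile(r"\b\d{1,2}:\d{2}\s?(?:AM|PM)\b"),    # e.g. 2:30 PM
    "phone": re.compile(r"\b\d{3}-\d{4}\b"),                 # e.g. 555-0123
}

def extract_attributes(message):
    """Return the first match for each attribute, or None when it is absent."""
    attributes = {}
    for name, pattern in ATTRIBUTE_PATTERNS.items():
        match = pattern.search(message)
        attributes[name] = match.group() if match else None
    return attributes

message = "Team meeting on 2025-03-14 at 2:30 PM. Call 555-0123 if you are late."
attributes = extract_attributes(message)
print(attributes)
```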
Relational-based IE systems
Extract objects and the relations between these objects
Built through CASCADED FINITE STATE TRANSDUCERS
A series of finite state automata, each one passing its output to the next

Steps in relational-based IE system:
1. Tokenization: same as lexical analysis; splits the text into basic words
2. Complex words handling: multiword expressions and proper names
3. Basic phrases/groups handling: Noun Groups and Verb Groups
4. Complex phrases/groups handling: basic groups are combined into complex phrases
5. Domain events: sequences of phrases from levels 3 and 4 are scanned for patterns of interest
6. Structure merging: semantic structures that describe the same entity or event are merged
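
The cascade idea can be sketched as a chain of small functions, each consuming the output of the previous one. This is a simplified stand-in, not a real finite-state transducer implementation: it covers only steps 1, 2, 3, and 5, and the word lists, event names, and example sentences are assumptions made for illustration.

```python
import re

VERB_WORDS = {"has", "have", "had", "acquired", "hired"}   # toy verb lexicon
EVENT_TYPES = {"acquired": "acquisition", "hired": "hiring"}

def tokenize(text):
    """Step 1: split the text into words, numbers, and punctuation marks."""
    return re.findall(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]", text)

def complex_words(tokens):
    """Step 2: merge runs of capitalized tokens into one proper-name token."""
    merged = []
    for tok in tokens:
        if merged and tok[0].isupper() and merged[-1][0].isupper():
            merged[-1] += " " + tok
        else:
            merged.append(tok)
    return merged

def basic_groups(tokens):
    """Step 3: chunk tokens into noun groups (NG) and verb groups (VG)."""
    groups = []
    for tok in tokens:
        if not tok[0].isalpha():
            kind = "PUNCT"
        elif tok.lower() in VERB_WORDS:
            kind = "VG"
        else:
            kind = "NG"
        if groups and kind != "PUNCT" and groups[-1][0] == kind:
            groups[-1] = (kind, groups[-1][1] + " " + tok)  # extend current group
        else:
            groups.append((kind, tok))
    return groups

def domain_events(groups):
    """Step 5: scan the group sequence for the pattern NG VG NG."""
    events = []
    for (k1, subj), (k2, verb), (k3, obj) in zip(groups, groups[1:], groups[2:]):
        head = verb.split()[-1].lower()  # last word of the verb group
        if (k1, k2, k3) == ("NG", "VG", "NG") and head in EVENT_TYPES:
            events.append({"event": EVENT_TYPES[head], "agent": subj, "object": obj})
    return events

text = "Acme Corp has acquired the small startup. Globex hired a new engineer."
events = domain_events(basic_groups(complex_words(tokenize(text))))
print(events)
```

Each stage only looks at the output of the stage before it, so a stage can be replaced or refined without touching the others.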
AHA!
Information extraction works well for a restricted domain in which it is possible to determine what subjects will be discussed and how they will be mentioned
References:
Russell, Stuart J., and Norvig, Peter. Artificial Intelligence: A Modern Approach, 2nd ed. Pearson Education Inc., 2003.
Manning, Christopher. Information Extraction and Named Entity Recognition. Stanford University.
Hobbs, Jerry R., and Riloff, Ellen. Handbook of Natural Language Processing. Information Sciences Institute.
