John Francis Olivo
BSCS 3-4
INFORMATION EXTRACTION
[Figure: Input text -> IE System -> Relevant info from text]
Information Extraction
• Skim text input
Formally:
Gathers semantic information from documents, especially web pages, that allows further inferences to be made.
IE is considered a “dumber” version of the goal of natural language understanding, because IE focuses narrowly on learning particular relations and producing a structured representation.
Process:
- Skim text
- Locate relation instances
- Get information
- Store in DB
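The four steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the "works for" relation, the regular expression, the `works_for` table, and the sample sentence are all hypothetical, and a real IE system would use far more robust matching than one hand-written pattern.

```python
import re
import sqlite3

# Hypothetical relation pattern: "<Person> works for <Organization>".
# Capitalized word runs stand in for real entity detection (an assumption).
WORKS_FOR = re.compile(
    r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)*) works for "
    r"(?P<org>[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*)"
)

def extract_relations(text):
    """Skim the text, locate relation instances, and pull out their fields."""
    return [(m.group("person"), m.group("org")) for m in WORKS_FOR.finditer(text)]

def store(relations, conn):
    """Store the extracted (person, org) pairs in a database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS works_for (person TEXT, org TEXT)")
    conn.executemany("INSERT INTO works_for VALUES (?, ?)", relations)
    conn.commit()

text = "Ana Cruz works for Acme Corp. Ben Reyes works for Globex."
relations = extract_relations(text)

conn = sqlite3.connect(":memory:")  # in-memory DB, for illustration only
store(relations, conn)
rows = conn.execute("SELECT person, org FROM works_for").fetchall()
```

Here `rows` holds one structured record per relation instance found in the text, which is the "structured representation" an IE system aims to produce.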
Named Entity Recognition (NER)
2. Using classifiers
Generative: Naïve Bayes
Discriminative: MaxEnt (maximum entropy) models
3. Sequence models
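HMMs (hidden Markov models)
CMMs/MEMMs (conditional Markov models / maximum entropy Markov models)
CRFs (conditional random fields)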
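To make the sequence-model idea concrete, here is a minimal sketch of Viterbi decoding for an HMM tagger. The two tags (`PER`, `O`), every probability, and the `unk` fallback for unseen words are made-up assumptions for illustration; a real tagger would estimate these from labeled data.

```python
def viterbi(tokens, states, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most probable tag sequence for a non-empty token list."""
    # best[t][s] = (probability of the best path ending in s at step t,
    #               previous state on that path)
    best = [{s: (start_p[s] * emit_p[s].get(tokens[0], unk), None) for s in states}]
    for tok in tokens[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            # Pick the predecessor state that maximizes the path probability.
            p, back = max((prev[r][0] * trans_p[r][s], r) for r in states)
            column[s] = (p * emit_p[s].get(tok, unk), back)
        best.append(column)
    # Follow back-pointers from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]

# Toy parameters (assumptions, not learned values).
states = ["PER", "O"]
start_p = {"PER": 0.3, "O": 0.7}
trans_p = {"PER": {"PER": 0.5, "O": 0.5}, "O": {"PER": 0.2, "O": 0.8}}
emit_p = {"PER": {"John": 0.5, "Olivo": 0.5}, "O": {"spoke": 0.5, "today": 0.5}}

tags = viterbi(["John", "Olivo", "spoke"], states, start_p, trans_p, emit_p)
```

With these toy numbers, `tags` labels the first two tokens as `PER` and the last as `O`. Unlike a per-token classifier, the tagger scores whole tag sequences, so the transition probabilities let neighboring labels influence each other.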
Types of IE Systems
5. Domain events: Sequences of phrases from level 3 (and 4) are scanned for patterns of interest
6. Structure merging: Semantic structures are merged
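Stages 5 and 6 can be sketched as follows. Everything here is a hypothetical illustration: the phrase labels `NG` (noun group) and `VG` (verb group), the single `NG VG NG` pattern, the field names, and the rule of merging records that share the same `agent` are assumptions, not the method of any particular system.

```python
def find_events(phrases):
    """Scan a sequence of (kind, text) phrases for the pattern NG VG NG."""
    events = []
    for i in range(len(phrases) - 2):
        (k1, subj), (k2, verb), (k3, obj) = phrases[i:i + 3]
        if (k1, k2, k3) == ("NG", "VG", "NG"):
            events.append({"agent": subj, "action": verb, "object": obj})
    return events

def merge_structures(structures, key="agent"):
    """Merge partial records that share the same value for `key`."""
    merged = {}
    for s in structures:
        target = merged.setdefault(s[key], {})
        for field, value in s.items():
            # On conflict, keep the first value seen (a simplifying assumption).
            target.setdefault(field, value)
    return list(merged.values())

# Phrases as they might arrive from earlier stages (hypothetical input).
phrases = [("NG", "Acme"), ("VG", "acquired"), ("NG", "Globex")]
events = find_events(phrases)

# A second partial record about the same entity, e.g. from another sentence.
partial = events + [{"agent": "Acme", "location": "Manila"}]
records = merge_structures(partial)
```

After merging, `records` contains a single record for Acme that combines the event fields with the location found elsewhere in the text.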
Russell, Stuart J., and Norvig, Peter. Artificial Intelligence: A Modern Approach 2nd Ed. Pearson Education Inc. 2003
Manning, Christopher. Information Extraction and Named Entity Recognition. Stanford University
Hobbs, Jerry R., and Riloff, Ellen. Handbook of Natural Language Processing. Information Sciences Institute