language processing
Mark Sanderson
Porto, 2000
Aims
• To provide an outline of the attempts made
at using NLP techniques in IR
Objectives
• At the end of this lecture you will be able to
– Outline a range of attempts to get NLP to work
with IR systems
– Idly speculate on why they failed
– Describe the successful use of NLP in a limited
domain
Why?
• Seems an obvious area of investigation
– Why is it not working?
Use of NLP
• Syntactic
– Parsing to identify phrases
– Full syntactic structure comparison
• Semantic
– Building an understanding of a document’s
content
• Discourse
– Exploiting document structure?
Syntactic
• Parsing to identify phrases
– The issues.
– Explain how it’s done (a bit).
– Is it worth it?
• Other possibilities
– Grammatical tagging
– Full syntactic structure comparison
• Explain how it’s done (a little bit).
• Show results.
Simple phrase identification
• High frequency terms could be good candidates.
– Why?
[Figure: parse tree with PP, NP and PP nodes]
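The idea that high-frequency terms make good phrase candidates can be sketched without a parser. The following is only an illustration, not the method from the lecture: it assumes a "phrase" is an adjacent word pair, and the function name `candidate_phrases` and the `min_count` threshold are hypothetical.

```python
import re
from collections import Counter

def candidate_phrases(documents, min_count=2):
    """Count adjacent word pairs and keep those seen at least min_count times."""
    counts = Counter()
    for text in documents:
        words = re.findall(r"[a-z]+", text.lower())
        # zip(words, words[1:]) yields each pair of neighbouring words
        counts.update(zip(words, words[1:]))
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Toy collection (made up for this example)
docs = [
    "information retrieval systems index documents",
    "natural language processing for information retrieval",
    "retrieval of information from documents",
]
phrases = candidate_phrases(docs)
print(phrases)
```

Note that "retrieval of information" in the third document is not counted with "information retrieval". Catching that kind of variation is one reason to consider parsing.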
• Method
– Parse sentences into tree structures
– When you get a phrase match
• Look at linking syntactic operator.
• Look at the residual tree structure that didn’t match
• Does not work
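A rough sketch of the comparison steps in the method above. It rests on several assumptions: parse trees are hand-written nested tuples rather than parser output, the "linking syntactic operator" is taken to be the preposition, and the residual is reduced to the unmatched content words. The tag set and the function names are hypothetical.

```python
CONTENT_TAGS = {"N", "ADJ", "V"}  # assumed tag set for content words

def leaves(tree):
    """Flatten a (label, child, ...) tuple tree into (tag, word) pairs."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(label, children[0])]
    out = []
    for child in children:
        out.extend(leaves(child))
    return out

def compare(query_tree, doc_tree):
    """Report matched words, operator agreement, and the unmatched residue."""
    q, d = leaves(query_tree), leaves(doc_tree)
    q_words = {w for tag, w in q if tag in CONTENT_TAGS}
    d_words = {w for tag, w in d if tag in CONTENT_TAGS}
    return {
        "matched": q_words & d_words,
        # linking operator: here simply the sequence of prepositions
        "same_operator": [w for t, w in q if t == "P"] == [w for t, w in d if t == "P"],
        # residual: document content words that the query did not cover
        "residual": d_words - q_words,
    }

# "retrieval of information" against "retrieval of medical information"
query = ("NP", ("N", "retrieval"),
         ("PP", ("P", "of"), ("NP", ("N", "information"))))
doc = ("NP", ("N", "retrieval"),
       ("PP", ("P", "of"), ("NP", ("ADJ", "medical"), ("N", "information"))))
result = compare(query, doc)
print(result)
```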
Semantic
• Disambiguation
– Given a word appearing in a certain context,
a disambiguator tells you which sense it has.
• IR system
– Index document collections by senses rather than
words
– Ask users which senses of the query words they mean
– Retrieve on senses
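A minimal sketch of indexing by senses rather than words, under stated assumptions. The sense inventory `SENSES`, its labels, and the clue-word overlap rule are all hypothetical stand-ins for a real disambiguator. The final lookup plays the part of a user who has said which sense of "bank" they mean.

```python
from collections import defaultdict

# Hypothetical sense inventory: word -> {sense label: context clue words}
SENSES = {
    "bank": {
        "bank/finance": {"money", "loan", "account"},
        "bank/river": {"river", "water", "fishing"},
    },
}

def disambiguate(word, context):
    """Pick the sense whose clue words overlap most with the context."""
    senses = SENSES.get(word)
    if not senses:
        return word  # words without listed senses are indexed as themselves
    return max(senses, key=lambda s: len(senses[s] & context))

def build_index(docs):
    """Build an inverted index keyed by sense label instead of surface word."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        words = text.lower().split()
        context = set(words)
        for w in words:
            index[disambiguate(w, context)].add(doc_id)
    return index

docs = {
    1: "the bank approved the loan",
    2: "fishing from the river bank",
}
index = build_index(docs)
hits = index["bank/finance"]  # the user chose the financial sense
print(sorted(hits))
```

A plain keyword index would return both documents for "bank". Whether separating them improves retrieval depends on how accurate the disambiguator is, which is the question the next slide raises.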
Disambiguation
• Does it work?
– No (well maybe)
• M. Sanderson, Word sense disambiguation and
information retrieval, in Proceedings of the 17th
ACM SIGIR Conference, Pages 142-151, 1994
• M. Sanderson & C.J. van Rijsbergen, The impact on
retrieval effectiveness of skewed frequency
distributions, in ACM Transactions on Information
Systems (TOIS) Vol. 17 No. 4, 1999, Pages 440-465.
Partial conclusions
• NLP has yet to prove itself in IR
– Agree
– D.D. Lewis & K. Sparck Jones (1996) Natural language
processing for information retrieval, in Communications
of the ACM (CACM), Vol. 39, No. 1, 92-101
– Sort of don’t agree
– A. Smeaton (1992) Progress in the application of natural
language processing to information retrieval tasks, in The
Computer Journal, Vol. 35, No. 3.
Mark’s idle speculation
• What people always think is going on
[Figure: Keywords, NLP]
Mark’s idle speculation
• What’s usually actually going on
[Figure: Keywords, NLP]
Areas where NLP does work
• Systems with the following ingredients.
– Collection documents cover small domain.
– Language use is limited in some manner.
– User queries cover tight subject area.
– Documents/queries very short
• Image captions
– LSI, pseudo-relevance feedback
– People willing to spend money getting NLP to
work
RIME & IOTA
• From Grenoble
– Y. Chiaramella & J. Nie (1990) A retrieval model based on
an extended modal logic and its application to the RIME
experimental approach, in Proceedings of the 13th SIGIR
conference, Pages 25-43
[Figure: ⇒ {[has-for-value], SGN}, t]
• Optimisation
– Scan for potential documents.
– Process them intensively.
• Evaluation?
– Not in that paper.
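The optimisation step above (scan for potential documents, then process them intensively) can be sketched as a two-stage pipeline. This is not RIME's actual implementation. The cheap stage here is a keyword overlap test, and `intensive_score` is a hypothetical placeholder for the costly NLP analysis.

```python
def cheap_scan(query_terms, docs):
    """Stage 1: keep documents sharing at least one term with the query."""
    return [doc_id for doc_id, text in docs.items()
            if query_terms & set(text.lower().split())]

def intensive_score(query_terms, text):
    """Stage 2 placeholder: stands in for expensive analysis of one candidate."""
    words = text.lower().split()
    return sum(words.count(t) for t in query_terms) / len(words)

def retrieve(query, docs):
    """Run the cheap filter first, then score only the surviving candidates."""
    terms = set(query.lower().split())
    candidates = cheap_scan(terms, docs)
    scored = [(intensive_score(terms, docs[d]), d) for d in candidates]
    return [d for score, d in sorted(scored, reverse=True)]

# Toy collection (made up for this example)
docs = {
    "a": "chest x-ray showing opacity in left lung",
    "b": "annual budget report",
    "c": "lung lung biopsy",
}
ranking = retrieve("lung opacity", docs)
print(ranking)
```

Document "b" never reaches the expensive stage, so the cost of the intensive processing grows with the number of candidates rather than with the size of the collection.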
Not unique
• SCISOR
– P.S. Jacobs & L.F. Rau (1990) SCISOR: Extracting
Information from On-line News, in Communications of
the ACM (CACM), Vol. 33, No. 11, 88-97
Why do they work?
• Because of the restrictions
– Small subject domain.
– Limited vocabulary.
– Restricted type of question.
• Compare with large scale IR system.
– Keywords are good enough.
– Long time to set up.
– Hard to adapt to new domain.
Anything else for NLP?
• Text Generation
– IR system explaining itself?
Conclusions
• By now, you should be able to
– Outline a range of attempts to get NLP to work
with IR systems
– Idly speculate on why they failed
– Describe the successful use of NLP in a limited
domain