
Norbert Fuhr

PROBABILISTIC MODELS IN INFORMATION RETRIEVAL
Introduction
➢ The intrinsic uncertainty of IR.
➢ Two approaches:
  Relevance models
  Proof-theoretic model
Relevance models
➢ A user assigns relevance judgments to documents w.r.t. his/her query.
➢ The IR system yields an approximation of the set of relevant documents.
➢ Some models: BIR model, BII model, DIA model, etc.
Relevance models
➢ Binary independence retrieval (BIR) model
  A document d_m is composed of a set of terms and represented as a vector.
➢ Assumptions:
  “Cluster hypothesis”: terms are distributed differently within relevant and non-relevant documents.
  A query q_k is also a set of terms.
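The BIR idea can be sketched in a few lines. In the usual formulation, each query term gets a log-odds weight built from the probability that the term occurs in a relevant document (p_i) and in a non-relevant document (q_i), and a document is scored by summing the weights of the query terms it contains. The term names and probability values below are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
import math

def bir_weight(p_i: float, q_i: float) -> float:
    """Log-odds weight of one term.

    p_i: probability that the term occurs in a relevant document
    q_i: probability that the term occurs in a non-relevant document
    """
    return math.log((p_i * (1.0 - q_i)) / (q_i * (1.0 - p_i)))

def bir_score(doc_terms, query_terms, p, q):
    """Sum the weights of the query terms that also occur in the document."""
    return sum(bir_weight(p[t], q[t]) for t in query_terms if t in doc_terms)

# Hypothetical term statistics (assumed values, for illustration only).
p = {"t1": 0.8, "t2": 0.6}
q = {"t1": 0.3, "t2": 0.4}
query = {"t1", "t2"}

# Documents keyed by their binary vector over (t1, t2).
docs = {
    (1, 1): {"t1", "t2"},
    (1, 0): {"t1"},
    (0, 1): {"t2"},
    (0, 0): set(),
}

# Rank document vectors by decreasing BIR score.
ranking = sorted(docs, key=lambda v: bir_score(docs[v], query, p, q), reverse=True)
print(ranking)
```

With these assumed statistics the term t1 discriminates more strongly than t2, so a document containing only t1 ranks above one containing only t2.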
Relevance models
➢ An example
  Ranking is (1,1), (1,0), (0,1), (0,0)
The probability ranking principle
➢ Let C be the cost of retrieving a relevant document and C̄ the cost of retrieving a non-relevant document.
➢ Retrieve the document for which the expected cost of retrieval is a minimum.
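A small sketch may help show why minimizing expected cost leads to ranking by probability of relevance. Writing cost_rel for the cost of retrieving a relevant document and cost_nonrel for the cost of retrieving a non-relevant one, the expected cost of retrieving a document with relevance probability P is cost_rel * P + cost_nonrel * (1 - P). The document names, probabilities, and cost values below are assumptions made for illustration.

```python
def expected_cost(p_rel: float, cost_rel: float, cost_nonrel: float) -> float:
    """Expected cost of retrieving a document whose probability of relevance is p_rel."""
    return cost_rel * p_rel + cost_nonrel * (1.0 - p_rel)

# Hypothetical relevance probabilities for three documents.
probs = {"d1": 0.2, "d2": 0.9, "d3": 0.5}

# Assumed costs: retrieving a non-relevant document is more expensive.
COST_REL, COST_NONREL = 1.0, 5.0

# Order by increasing expected cost, and separately by decreasing probability.
by_cost = sorted(probs, key=lambda d: expected_cost(probs[d], COST_REL, COST_NONREL))
by_prob = sorted(probs, key=lambda d: probs[d], reverse=True)

print(by_cost, by_prob)
```

As long as the cost for a relevant document is lower than the cost for a non-relevant one, the two orderings coincide.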
Proof-theoretic model
➢ IR is interpreted as uncertain inference.
➢ A generalization of deductive databases: queries and contents are treated as logical formulas.
  The query has to be proved from the formulas.
➢ A document is an answer to a query iff the logical formula is true.
Jane Cleland-Huang, Raffaella Settimi, Oussama BenKhadra, Eugenia Berezhanskaya, Selvia Christina

GOAL-CENTRIC TRACEABILITY FOR MANAGING NON-FUNCTIONAL REQUIREMENTS
➢ Non-Functional Requirements (NFRs) are difficult to trace:
  Global impact upon a software system
  Extensive network of interdependencies and trade-offs
➢ Goal-centric traceability (GCT) approach:
  NFRs are modeled as goals and operationalizations within a Softgoal Interdependency Graph (SIG).
  Dynamically establish traces from impacted functional design elements to elements in the SIG.
Softgoal Interdependency Graph
GCT Model
➢ Impact detection in GCT
  Documents
  Queries
  Index terms
➢ The relevance of a document d_j to a query q is pr(d_j | q)
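One way to compute such a relevance probability from documents, queries, and index terms is sketched below. It assumes a term-frequency formulation in which pr(d | q) is the sum over index terms t of pr(d | t) * pr(q, t), divided by pr(q). The specific estimates used here (relative term frequency for pr(d | t), and query term frequency scaled by the number of documents containing the term for pr(q, t)) are assumptions of this sketch, as are the function name and the sample texts.

```python
from collections import Counter

def relevance(query_terms, doc_terms, all_docs):
    """Estimate pr(d | q) = sum_t pr(d | t) * pr(q, t) / pr(q).

    query_terms, doc_terms: lists of index terms
    all_docs: list of term lists, one per document in the collection
    """
    q_tf = Counter(query_terms)
    d_tf = Counter(doc_terms)
    q_len = sum(q_tf.values())
    d_len = sum(d_tf.values())
    if q_len == 0 or d_len == 0:
        return 0.0

    pr_q = 0.0   # normalizing term pr(q)
    joint = 0.0  # sum of pr(d | t) * pr(q, t)
    for term, q_count in q_tf.items():
        n_t = sum(1 for d in all_docs if term in d)  # documents containing the term
        if n_t == 0:
            continue  # term is not an index term of any document
        pr_q_t = q_count / (n_t * q_len)  # rarer terms contribute more
        pr_d_t = d_tf[term] / d_len       # relative frequency of the term in the document
        pr_q += pr_q_t
        joint += pr_d_t * pr_q_t
    return joint / pr_q if pr_q > 0 else 0.0

# Hypothetical artifacts, already tokenized.
d1 = ["deicing", "truck", "schedule"]
d2 = ["truck", "maintenance", "schedule", "log"]
query = ["deicing", "schedule"]

score_d1 = relevance(query, d1, [d1, d2])
score_d2 = relevance(query, d2, [d1, d2])
print(score_d1, score_d2)
```

A candidate link would then be reported when the score exceeds some threshold, which is left open here.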
Jane Cleland-Huang, Raffaella Settimi, Chuan Duan, Xuchang Zou

UTILIZING SUPPORTING EVIDENCE TO IMPROVE DYNAMIC REQUIREMENTS TRACEABILITY
Introduction
➢ Current work
  Recall level close to 90%
  Precision from 10% to 45%
➢ Target:
  Maintain recall level of at least 90%
  Precision of at least 20%
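For reference, the two metrics can be computed as follows. Recall is the fraction of the correct links that were retrieved, and precision is the fraction of the retrieved links that are correct. The link sets below are made up to reproduce the target levels and are not data from the study.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved links against the correct links."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 10 correct links, 45 candidate links, 9 of them correct.
relevant = {("R", i) for i in range(10)}
retrieved = {("R", i) for i in range(9)} | {("X", i) for i in range(36)}

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

In this made-up case the analyst would have to inspect 45 candidate links to find 9 of the 10 correct ones.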
Introduction
➢ Three strategies to improve the performance of dynamic requirements traceability:
  Hierarchical modeling
  Logical clustering of artifacts
  Semi-automated pruning of the probabilistic network
Enhancement strategies
Motivation Example
Hierarchical Enhancement
➢ R3's label is “De-icing”.
➢ Using the hierarchical information from R3, R5 describes a de-icing service.
➢ Similarly, C4 describes a truck maintenance service.
➢ The link between C4 and R5 is not correct!
Hierarchical Enhancement
➢ Solution:
  Build a directed acyclic graph (DAG) to represent the direct relationships between artifacts.
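One possible way to use such a graph is to let each artifact inherit the terms of its ancestors with a reduced weight, so that a child requirement carries the context of its parent heading. The sketch below is an assumption about how this could be done, not the formula used in the presented work; the decay factor and the term lists are hypothetical, while R3 and R5 are the artifact labels from the motivating example.

```python
from collections import Counter

def enriched_terms(artifact, parents, terms, decay=0.5):
    """Combine an artifact's own term counts with down-weighted counts of its ancestors.

    parents: dict mapping an artifact to the list of its direct parents in the DAG
    terms:   dict mapping an artifact to its list of index terms
    decay:   weight multiplier applied once per level of ancestry (assumed value)
    """
    combined = Counter(terms[artifact])
    frontier = list(parents.get(artifact, []))
    seen = set()
    factor = decay
    while frontier:
        next_frontier = []
        for ancestor in frontier:
            if ancestor in seen:
                continue  # an ancestor reachable by several paths counts once
            seen.add(ancestor)
            for term, count in Counter(terms[ancestor]).items():
                combined[term] += factor * count
            next_frontier.extend(parents.get(ancestor, []))
        frontier = next_frontier
        factor *= decay
    return combined

# R5 sits under the heading R3 ("De-icing"); term lists are illustrative.
parents = {"R5": ["R3"], "R3": []}
terms = {"R3": ["de-icing"], "R5": ["schedule", "service"]}

r5_terms = enriched_terms("R5", parents, terms)
print(r5_terms)
```

With the inherited term, a query about truck maintenance schedules shares less of its weighted vocabulary with R5 than it would on the word "schedule" alone.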
➢ Results
Clustering Enhancements
➢ Links tend to occur in clusters:
  q <-> d_j (a sibling of d) => higher prob. that q <-> d
  d <-> q_i (a sibling of q) => higher prob. that d <-> q
➢ Take the relationships between sibling artifacts into account.
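The clustering idea can be illustrated with a simple score adjustment: the base score of a candidate link is raised in proportion to the average score between the same query and the document's siblings. This is only an illustrative scheme under assumed names and an assumed boost factor, not the adjustment formula used in the presented work.

```python
def cluster_adjusted(scores, siblings, query, doc, boost=0.2):
    """Raise a link score using the scores between the query and the document's siblings.

    scores:   dict mapping (query, document) to a base probability score
    siblings: dict mapping a document to the list of its sibling documents
    boost:    how strongly sibling evidence counts (assumed value)
    """
    base = scores.get((query, doc), 0.0)
    sibling_scores = [scores.get((query, s), 0.0) for s in siblings.get(doc, [])]
    if not sibling_scores:
        return base  # no cluster evidence available
    support = sum(sibling_scores) / len(sibling_scores)
    return min(1.0, base + boost * support)

# Hypothetical base scores: d1 scores low, but its siblings d2 and d3 score high.
scores = {("q", "d1"): 0.3, ("q", "d2"): 0.8, ("q", "d3"): 0.6}
siblings = {"d1": ["d2", "d3"]}

adjusted = cluster_adjusted(scores, siblings, "q", "d1")
print(adjusted)
```

The same adjustment can be applied symmetrically on the query side, using the siblings of q.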
Clustering Enhancements
➢ Solution
Clustering Enhancements
➢ Evaluation
Graph Pruning Enhancement
➢ Observation:
  The word “schedule” is used for both de-icing schedules and truck maintenance schedules.
  A query containing “schedule” returns artifacts from both domains, which lowers precision.
Graph Pruning Enhancement
➢ Solution:
  Use the initial decisions made by the analyst to place constraints and improve precision in “problematic” areas.
  Rules for placing constraints:
  1. One or more links between two groups are all rejected by an analyst.
  2. The basic retrieval algorithm generated candidate links between the two groups.
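The two rules can be sketched as a filter over candidate links. A pair of groups is constrained when the retrieval step produced candidate links between them and every link the analyst has evaluated between them was rejected; remaining candidates between constrained groups are then dropped. The data structures, the function name, and the artifacts R6 and C7 are hypothetical; R5 and C4 come from the motivating example.

```python
def pruned_candidates(candidates, group_of, decisions):
    """Drop candidate links between groups whose evaluated links were all rejected.

    candidates: list of (query, document) links produced by the retrieval step
    group_of:   dict mapping each artifact to its group
    decisions:  dict mapping an evaluated link to True (accepted) or False (rejected)
    """
    candidate_set = set(candidates)
    votes_by_pair = {}
    for link, accepted in decisions.items():
        if link not in candidate_set:
            continue  # rule 2: only links generated by the retrieval algorithm count
        q, d = link
        votes_by_pair.setdefault((group_of[q], group_of[d]), []).append(accepted)

    # Rule 1: at least one evaluated link, and all of them rejected.
    constrained = {pair for pair, votes in votes_by_pair.items() if not any(votes)}

    return [
        (q, d) for (q, d) in candidates
        if (group_of[q], group_of[d]) not in constrained
    ]

group_of = {
    "R5": "de-icing", "R6": "de-icing",
    "C4": "truck maintenance", "C7": "de-icing operations",
}
candidates = [("R5", "C4"), ("R6", "C4"), ("R5", "C7")]
decisions = {("R5", "C4"): False}  # the analyst rejected this link

remaining = pruned_candidates(candidates, group_of, decisions)
print(remaining)
```

Here a single rejection removes both candidate links between the de-icing requirements and the truck maintenance group, which is where the shared word "schedule" caused false matches.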
Graph Pruning Enhancement
➢ Evaluation
