
UNIT-IV

CLASSIFICATION AND CLUSTERING

Text Classification and Naive Bayes


Vector Space Classification
Support vector machines and Machine
learning on documents.
Flat Clustering
Hierarchical Clustering
Matrix decompositions and latent semantic
indexing
Fusion and Meta learning
Text Classification

General classes are usually referred to as topics, and the
classification task is then called text classification,
text categorization, or topic classification.
Introduction to Information Retrieval

Document Classification

Test Data: "planning language proof intelligence"

Classes, grouped by area:
(AI): ML, Planning
(Programming): Semantics, Garb.Coll.
(HCI): Multimedia, GUI

Training Data:
ML: learning, intelligence, algorithm, reinforcement, network...
Planning: planning, temporal, reasoning, plan, language...
Semantics: programming, semantics, language, proof...
Garb.Coll.: garbage, collection, memory, optimization, region...
Multimedia: ...
GUI: ...
Ch. 13

Classification Methods (1)


Manual classification
Used by the original Yahoo! Directory
Looksmart, about.com, ODP, PubMed
Accurate when job is done by experts
Consistent when the problem size and team are
small
Difficult and expensive to scale
Means we need automatic classification methods for
big problems
Ch. 13

Classification Methods (2)


Hand-coded rule-based classifiers
One technique used by news agencies, intelligence
agencies, etc.
Widely deployed in government and enterprise
Vendors provide an IDE for writing such rules
Sec. 13.1

Classification Methods (3):


Supervised learning
Given:
A document d
A fixed set of classes:
C = {c1, c2, …, cJ}
A training set D of documents each with a label in C
Determine:
A learning method or algorithm which will enable us
to learn a classifier γ
For a test document d, we assign it the class
γ(d) ∈ C
Ch. 13

Classification Methods (3)


Supervised learning
Naive Bayes (simple, common) see video
k-Nearest Neighbors (simple, powerful)
Support-vector machines (new, generally more
powerful)
plus many other methods
No free lunch: requires hand-classified training data
But data can be built up (and refined) by amateurs
Many commercial systems use a mixture of
methods

The bag of words representation


I love this movie! It's sweet, but with
satirical humor. The dialogue is great
and the adventure scenes are fun... It
manages to be whimsical and romantic
while laughing at the conventions of the
fairy tale genre. I would recommend it to
just about anyone. I've seen it several
times, and I'm always happy to see it
again whenever I have a friend who
hasn't seen it yet.

great 2
love 2
recommend 1
laugh 1
happy 1
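
A bag of words can be sketched in a few lines. The sketch below assumes a very simple tokenizer (lowercasing plus a regular expression) and no stemming, so it would count "laughing" rather than "laugh"; the function name `bag_of_words` and the shortened review text are illustrative only.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

# Shortened, illustrative review text (not the full review from the slide).
review = "I love this movie! It's sweet, but with satirical humor. The dialogue is great."
counts = bag_of_words(review)

# Word order is discarded; only the per-word counts remain.
for word in ("great", "love", "movie"):
    print(word, counts[word])
```

A word that never occurs simply has count 0, which is why `Counter` is a convenient container here.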
Sec. 13.6

Evaluating Categorization
Evaluation must be done on test data that are
independent of the training data
Sometimes use cross-validation (averaging results
over multiple training and test splits of the overall
data)
Easy to get good performance on a test set
that was available to the learner during
training (e.g., just memorize the test set)
Sec. 13.6

Evaluating Categorization
Measures: precision, recall, F1, classification
accuracy
Classification accuracy: r/n where n is the
total number of test docs and r is the number
of test docs correctly classified
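
The measures above can be sketched as follows. The label lists are made-up toy data, and the helper names are illustrative; precision, recall and F1 are computed for one target class at a time.

```python
def accuracy(gold, predicted):
    """r/n: fraction of test docs whose predicted class equals the gold class."""
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)

def precision_recall_f1(gold, predicted, target):
    """Precision, recall and F1 for a single target class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if p == target and g == target)
    fp = sum(1 for g, p in pairs if p == target and g != target)
    fn = sum(1 for g, p in pairs if p != target and g == target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical gold labels and classifier output for five test docs.
gold = ["AI", "AI", "HCI", "AI", "HCI"]
pred = ["AI", "HCI", "HCI", "AI", "AI"]

acc = accuracy(gold, pred)
p, r, f1 = precision_recall_f1(gold, pred, "AI")
print(acc, p, r, f1)
```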
Naive Bayes Classifier
Bayesian classification is both a supervised learning method and a statistical
method for classification. It assumes an underlying probabilistic model, which
lets us capture uncertainty about the model in a principled way by
determining probabilities of the outcomes. It can solve diagnostic and predictive
problems.
This classification is named after Thomas Bayes, who proposed Bayes'
theorem.
Bayesian classification provides practical learning algorithms in which prior
knowledge and observed data can be combined.
Let's say we have data on 1000 pieces of fruit. Each fruit is a Banana, an Orange or
some Other fruit, and we know 3 features of each fruit: whether it's long or
not, sweet or not, and yellow or not, as displayed in the table below.
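
To make the fruit example concrete, here is a minimal Naive Bayes sketch. The counts below are hypothetical placeholders chosen to sum to 1000 fruits; they are not the values from the deck's table. The score for each class is the prior P(class) times the product of P(feature | class), with no smoothing.

```python
# Hypothetical counts (assumption, not the source's table): for each class,
# the number of fruits and how many of them are long, sweet, and yellow.
counts = {
    "Banana": {"total": 500, "long": 400, "sweet": 350, "yellow": 450},
    "Orange": {"total": 300, "long": 0, "sweet": 150, "yellow": 300},
    "Other": {"total": 200, "long": 100, "sweet": 150, "yellow": 50},
}
n = sum(c["total"] for c in counts.values())

def score(cls, features):
    """Unnormalized posterior: P(class) * product of P(feature | class)."""
    c = counts[cls]
    s = c["total"] / n
    for f in features:
        s *= c[f] / c["total"]
    return s

# Classify a fruit that is long, sweet and yellow.
features = ["long", "sweet", "yellow"]
scores = {cls: score(cls, features) for cls in counts}
best = max(scores, key=scores.get)
print(scores, best)
```

Because no smoothing is applied, a single zero count (here, no long oranges) drives that class's score to zero.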
An Example of Text
Classification with Naive Bayes
Vector Space Classification
Support vector machines and Machine
learning on documents.
Flat and Hierarchical Clustering
Document clustering
Motivations
Document representations
Success criteria
Clustering algorithms
Partitional
Hierarchical
Ch. 16

What is clustering?
Clustering: the process of grouping a set of
objects into classes of similar objects
Documents within a cluster should be similar.
Documents from different clusters should be
dissimilar.
The commonest form of unsupervised learning
Unsupervised learning = learning from raw data, as
opposed to supervised data where a classification of
examples is given
A common and important task that finds many
applications in IR and other places
Ch. 16

A data set with clear cluster structure

How would you design an algorithm for finding
the three clusters in this case?
Sec. 16.1

Applications of clustering in IR
Whole corpus analysis/navigation
Better user interface: search without typing
For improving recall in search applications
Better search results (like pseudo RF)
For better navigation of search results
Effective user recall will be higher
For speeding up vector space retrieval
Cluster-based retrieval gives faster search
Sec. 16.1

Scatter/Gather: Cutting, Karger, and Pedersen


Sec. 16.2

Issues for clustering


Representation for clustering
Document representation
Vector space? Normalization?
Centroids aren't length normalized
Need a notion of similarity/distance
How many clusters?
Fixed a priori?
Completely data driven?
Avoid trivial clusters - too large or small
If a cluster's too large, then for navigation purposes you've
wasted an extra user click without whittling down the set of
documents much.
Clustering Algorithms
Flat algorithms
Usually start with a random (partial) partitioning
Refine it iteratively
K means clustering
(Model based clustering)
Hierarchical algorithms
Bottom-up, agglomerative
(Top-down, divisive)
Hard vs. soft clustering
Hard clustering: Each document belongs to exactly one cluster
More common and easier to do
Soft clustering: A document can belong to more than one
cluster.
Makes more sense for applications like creating browsable
hierarchies
You may want to put a pair of sneakers in two clusters: (i) sports
apparel and (ii) shoes
You can only do that with a soft clustering approach.
We won't do soft clustering today. See IIR 16.5, 18
Partitioning Algorithms
Partitioning method: Construct a partition of n
documents into a set of K clusters
Given: a set of documents and the number K
Find: a partition of K clusters that optimizes the
chosen partitioning criterion
Finding the globally optimal partition
Intractable for many objective functions
It would require exhaustively enumerating all partitions
Effective heuristic methods: K-means and K-medoids
algorithms
Sec. 16.4

K-Means
Assumes documents are real-valued vectors.
Clusters based on centroids (aka the center of
gravity or mean) of points in a cluster, c:
$\mu(c) = \frac{1}{|c|} \sum_{x \in c} x$

Reassignment of instances to clusters is based


on distance to the current cluster centroids.
(Or one can equivalently phrase it in terms of
similarities)
Sec. 16.4

K-Means Algorithm
Select K random docs {s1, s2, …, sK} as seeds.
Until clustering converges (or other stopping criterion):
    For each doc di:
        Assign di to the cluster cj such that dist(xi, sj) is minimal.
    (Next, update the seeds to the centroid of each cluster)
    For each cluster cj:
        sj = μ(cj)
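
The loop above can be sketched in plain Python. This is a minimal illustration, assuming squared Euclidean distance, convergence defined as "assignments stop changing", and made-up 2-D points; the names `kmeans`, `dist2` and `centroid` are illustrative.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(docs, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    seeds = rng.sample(docs, k)          # select K random docs as seeds
    assignment = None
    for _ in range(max_iter):
        # Assign each doc to the cluster with the closest seed.
        new_assignment = [min(range(k), key=lambda j: dist2(d, seeds[j])) for d in docs]
        if new_assignment == assignment:  # converged: nothing moved
            break
        assignment = new_assignment
        # Update each seed to the centroid of its cluster.
        for j in range(k):
            members = [d for d, a in zip(docs, assignment) if a == j]
            if members:
                seeds[j] = centroid(members)
    return assignment, seeds

# Two well-separated groups of hypothetical 2-D "documents".
docs = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
assignment, centroids = kmeans(docs, k=2)
print(assignment, centroids)
```

Each pass costs O(KNM) for the reassignment step, which matches the per-iteration cost usually quoted for K-means.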
Sec. 16.4

K Means Example (K=2)
Pick seeds
Reassign clusters
Compute centroids
Reassign clusters
Compute centroids
Reassign clusters
Converged!
Sec. 16.4

Time Complexity
Computing distance between two docs is
O(M) where M is the dimensionality of the
vectors.
Reassigning clusters: O(KN) distance
computations, or O(KNM).
Computing centroids: Each doc gets added
once to some centroid: O(NM).
Assume these two steps are each done once
for I iterations: O(IKNM).
Sec. 16.4

K-means issues, variations, etc.


Recomputing the centroid after every
assignment (rather than after all points are
reassigned) can improve speed of convergence
of K-means
Assumes clusters are spherical in vector space
Sensitive to coordinate changes, weighting etc.
Disjoint and exhaustive
Doesn't have a notion of outliers by default
But can add outlier filtering
STEPS K -Means
Number of clusters K is given
Partition n docs into predetermined number of
clusters
Finding the right number of clusters is part of
the problem
Given docs, partition into an appropriate number of
subsets.
E.g., for query results - ideal value of K not known up
front - though UI may impose limits.
Can usually take an algorithm for one flavor and
convert to the other.
K not specified in advance
Say, the results of a query.
Solve an optimization problem: penalize having
lots of clusters
application dependent, e.g., compressed summary
of search results list.
Tradeoff between having more clusters (better
focus within each cluster) and having too many
clusters
K not specified in advance
Given a clustering, define the Benefit for a
doc to be the cosine similarity to its
centroid
Define the Total Benefit to be the sum of
the individual doc Benefits.

Why is there always a clustering of Total Benefit n?
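
One way to explore the question is to compute Total Benefit directly. The sketch below uses made-up 2-D document vectors and illustrative helper names; it compares putting every doc in its own cluster (each doc is then its own centroid, with cosine 1) against a single cluster.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def centroid(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def total_benefit(clusters):
    """Sum over all docs of the cosine similarity to their cluster centroid."""
    total = 0.0
    for cluster in clusters:
        c = centroid(cluster)
        total += sum(cosine(d, c) for d in cluster)
    return total

docs = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # hypothetical doc vectors
singletons = [[d] for d in docs]               # n clusters of one doc each
one_cluster = [docs]                           # everything in one cluster

tb_singletons = total_benefit(singletons)
tb_one = total_benefit(one_cluster)
print(tb_singletons, tb_one)
```

This is why a benefit-only objective is not enough when K is free: without a penalty on the number of clusters, the trivial all-singletons clustering always reaches the maximum.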


Ch. 17

Hierarchical Clustering
Build a tree-based hierarchical taxonomy
(dendrogram) from a set of documents.
animal

vertebrate invertebrate

fish reptile amphib. mammal worm insect crustacean

One approach: recursive application of a
partitional clustering algorithm.
Dendrogram: Hierarchical Clustering

Clustering obtained
by cutting the
dendrogram at a
desired level: each
connected
component forms a
cluster.

Sec. 17.1

Hierarchical Agglomerative Clustering (HAC)
Starts with each doc in a separate cluster
then repeatedly joins the closest pair of
clusters, until there is only one cluster.
The history of merging forms a binary tree
or hierarchy.

Note: the resulting clusters are still hard and induce a partition
Sec. 17.2

Closest pair of clusters


Many variants to defining closest pair of clusters
Single-link
Similarity of the most cosine-similar (single-link)
Complete-link
Similarity of the furthest points, the least cosine-
similar
Centroid
Clusters whose centroids (centers of gravity) are the
most cosine-similar
Average-link
Average cosine between pairs of elements
Sec. 17.2

Single Link Agglomerative Clustering


Use maximum similarity of pairs:

$sim(c_i, c_j) = \max_{x \in c_i,\, y \in c_j} sim(x, y)$

Can result in straggly (long and thin) clusters
due to chaining effect.
After merging ci and cj, the similarity of the
resulting cluster to another cluster, ck, is:

$sim((c_i \cup c_j), c_k) = \max(sim(c_i, c_k), sim(c_j, c_k))$


Sec. 17.2

Single Link Example


Sec. 17.2

Complete Link
Use minimum similarity of pairs:
$sim(c_i, c_j) = \min_{x \in c_i,\, y \in c_j} sim(x, y)$
Makes tighter, spherical clusters that are typically
preferable.
After merging ci and cj, the similarity of the resulting cluster
to another cluster, ck, is:

$sim((c_i \cup c_j), c_k) = \min(sim(c_i, c_k), sim(c_j, c_k))$
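
The difference between single-link and complete-link merging can be seen in a small sketch. It is a naive implementation that recomputes cluster similarities from all pairs at every step, and it assumes 1-D points with similarity defined as negative distance; both the data and the function names are illustrative.

```python
def cluster_sim(ci, cj, sim, linkage):
    """Single-link: max pairwise similarity; complete-link: min pairwise similarity."""
    pair_sims = [sim(x, y) for x in ci for y in cj]
    return max(pair_sims) if linkage == "single" else min(pair_sims)

def hac(docs, sim, linkage="single", num_clusters=1):
    clusters = [[d] for d in docs]           # start with each doc in its own cluster
    while len(clusters) > num_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cluster_sim(clusters[i], clusters[j], sim, linkage)
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best                        # closest pair of clusters
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters

def neg_distance(x, y):
    return -abs(x - y)

points = [1.0, 2.0, 4.0, 6.5]   # hypothetical 1-D "documents"
single = hac(points, neg_distance, linkage="single", num_clusters=2)
complete = hac(points, neg_distance, linkage="complete", num_clusters=2)
print(sorted(sorted(c) for c in single))
print(sorted(sorted(c) for c in complete))
```

On these points single-link chains 4.0 onto the {1.0, 2.0} cluster, while complete-link prefers the more compact pair {4.0, 6.5}.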


Sec. 17.2

Complete Link Example


Sec. 17.2.1

Computational Complexity
In the first iteration, all HAC methods need to compute
similarity of all pairs of N initial instances, which is
O(N^2).
In each of the subsequent N - 2 merging iterations,
compute the distance between the most recently
created cluster and all other existing clusters.
In order to maintain an overall O(N^2) performance,
computing similarity to each other cluster must be
done in constant time.
Often O(N^3) if done naively or O(N^2 log N) if done more
cleverly
Sec. 16.3

What Is A Good Clustering?


Internal criterion: A good clustering will produce
high quality clusters in which:
the intra-class (that is, intra-cluster) similarity is
high
the inter-class similarity is low
The measured quality of a clustering depends on
both the document representation and the
similarity measure used
Sec. 16.3

Purity example

Cluster I Cluster II Cluster III

Cluster I: Purity = 1/6 (max(5, 1, 0)) = 5/6

Cluster II: Purity = 1/6 (max(1, 4, 1)) = 4/6

Cluster III: Purity = 1/5 (max(2, 0, 3)) = 3/5
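
The per-cluster purities above, and an overall purity weighted by cluster size, can be computed from the class counts in each cluster. The overall figure is not stated on the slide; it follows from the same counts under the usual definition (sum of majority counts divided by the total number of docs).

```python
# Class counts per cluster, taken from the max(...) arguments above.
cluster_class_counts = {
    "I": [5, 1, 0],
    "II": [1, 4, 1],
    "III": [2, 0, 3],
}

def cluster_purity(counts):
    """Share of the cluster taken by its majority class."""
    return max(counts) / sum(counts)

def overall_purity(table):
    """Sum of majority-class counts over all clusters, divided by the total doc count."""
    total = sum(sum(c) for c in table.values())
    return sum(max(c) for c in table.values()) / total

for name, counts in cluster_class_counts.items():
    print(name, cluster_purity(counts))
purity = overall_purity(cluster_class_counts)
print(purity)
```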


Sec. 16.3

Rand Index measures between pair decisions. Here RI = 0.68

Number of points                  | Same cluster in clustering | Different clusters in clustering
Same class in ground truth        | 20                         | 24
Different classes in ground truth | 20                         | 72
Sec. 16.3

Rand index and Cluster F-measure

$RI = \frac{A + D}{A + B + C + D}$
Compare with standard Precision and Recall:
$P = \frac{A}{A + B}$, $R = \frac{A}{A + C}$
People also define and use a cluster F-
measure, which is probably a better measure.
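
As a worked check, the pair counts from the Rand index table can be plugged into these formulas. The mapping used here is an assumption based on the precision and recall formulas: A = same class and same cluster (true positives), B = different classes but same cluster (false positives), C = same class but different clusters (false negatives), D = different classes and different clusters (true negatives).

```python
def rand_index(a, b, c, d):
    """Fraction of pair decisions that are correct."""
    return (a + d) / (a + b + c + d)

def pair_precision_recall(a, b, c):
    return a / (a + b), a / (a + c)

def f_measure(precision, recall, beta=1.0):
    """Cluster F-measure over pair decisions; beta > 1 weights recall more."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Pair counts read from the table (mapping as described in the lead-in).
A, B, C, D = 20, 20, 24, 72

ri = rand_index(A, B, C, D)
precision, recall = pair_precision_recall(A, B, C)
f1 = f_measure(precision, recall)
print(round(ri, 2), precision, round(recall, 3), round(f1, 3))
```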
Sec. 18.1

Matrix-vector multiplication

S has eigenvalues 30, 20, 1 with
corresponding eigenvectors v1, v2, v3.

On each eigenvector, S acts as a multiple of the identity
matrix: but as a different multiple on each.
Any vector x can be viewed as a combination of
the eigenvectors, say x = 2v1 + 4v2 + 6v3
Semantic Indexing
Sec. 18.1

Matrix-vector multiplication
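Thus a matrix-vector multiplication such as Sx
(S, x as in the previous slide) can be rewritten
in terms of the eigenvalues/vectors:

$Sx = S(2v_1 + 4v_2 + 6v_3) = 2Sv_1 + 4Sv_2 + 6Sv_3 = 2\lambda_1 v_1 + 4\lambda_2 v_2 + 6\lambda_3 v_3 = 60v_1 + 80v_2 + 6v_3$

Even though x is an arbitrary vector, the
action of S on x is determined by the
eigenvalues/vectors.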
Sec. 18.1

Matrix-vector multiplication
Suggestion: the effect of small eigenvalues is
small.
If we ignored the smallest eigenvalue (1), then
instead of

$60v_1 + 80v_2 + 6v_3$

we would get

$60v_1 + 80v_2$

These vectors are similar (in cosine similarity,
etc.)
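
The effect can be checked numerically. The sketch below assumes the simplest matrix with eigenvalues 30, 20, 1, namely a diagonal S whose eigenvectors are the standard basis vectors; that choice is an assumption made for illustration, and the helper names are hypothetical.

```python
import math

eigenvalues = [30.0, 20.0, 1.0]
# Assumed eigenvectors: the standard basis (so S is diagonal).
eigenvectors = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
S = [[30.0, 0.0, 0.0], [0.0, 20.0, 0.0], [0.0, 0.0, 1.0]]
coeffs = [2.0, 4.0, 6.0]          # x = 2*v1 + 4*v2 + 6*v3

def combine(weights, vectors):
    """Weighted sum of vectors."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def cosine(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b)))

x = combine(coeffs, eigenvectors)
Sx_direct = matvec(S, x)
# Same product via the eigen expansion: each coefficient is scaled by its eigenvalue.
Sx_eigen = combine([c * lam for c, lam in zip(coeffs, eigenvalues)], eigenvectors)
# Ignore the smallest eigenvalue: keep only the first two terms.
Sx_truncated = combine([c * lam for c, lam in zip(coeffs[:2], eigenvalues[:2])], eigenvectors[:2])

similarity = cosine(Sx_direct, Sx_truncated)
print(Sx_direct, Sx_truncated, similarity)
```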
Sec. 18.1

Eigenvalues & Eigenvectors


For symmetric matrices, eigenvectors for distinct
eigenvalues are orthogonal

All eigenvalues of a real symmetric matrix are real.

All eigenvalues of a positive semidefinite matrix
are non-negative
Sec. 18.1

Example
Let Real, symmetric.

Then

The eigenvalues are 1 and 3 (nonnegative, real).


The eigenvectors are orthogonal (and real):
Plug in these values and
solve for eigenvectors.
Sec. 18.1

Eigen/diagonal Decomposition
Let $S \in \mathbb{R}^{m \times m}$ be a square matrix with m
linearly independent eigenvectors (a non-
defective matrix)
Theorem: There exists an eigen decomposition
$S = U \Lambda U^{-1}$, where $\Lambda$ is diagonal
(cf. matrix diagonalization theorem)
The decomposition is unique for distinct eigenvalues

Columns of U are the eigenvectors of S
Diagonal elements of $\Lambda$ are eigenvalues of S
Sec. 18.1

Diagonal decomposition: why/how

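Let U have the eigenvectors as columns: $U = [v_1 \; \cdots \; v_m]$

Then, SU can be written
$SU = S[v_1 \; \cdots \; v_m] = [\lambda_1 v_1 \; \cdots \; \lambda_m v_m] = [v_1 \; \cdots \; v_m]\,\mathrm{diag}(\lambda_1, \ldots, \lambda_m)$

Thus $SU = U\Lambda$, or $U^{-1}SU = \Lambda$

And $S = U \Lambda U^{-1}$.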
Sec. 18.1

Diagonal decomposition - example

Recall

The eigenvectors and form

Inverting, we have

Recall $UU^{-1} = I$.

Then, $S = U \Lambda U^{-1}$
Sec. 18.1

Example continued
Let's divide U (and multiply $U^{-1}$) by

Then, $S = Q \Lambda Q^T$, with $Q^{-1} = Q^T$

Why? Stay tuned


Sec. 18.1

Symmetric Eigen Decomposition


If $S \in \mathbb{R}^{m \times m}$ is a symmetric matrix:
Theorem: There exists a (unique) eigen
decomposition $S = Q \Lambda Q^T$
where Q is orthogonal:
$Q^{-1} = Q^T$
Columns of Q are normalized eigenvectors
Columns are orthogonal.
(everything is real)
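
Both forms of the decomposition can be verified by hand on a 2×2 case. The matrix S below is an assumption: it is one real symmetric matrix whose eigenvalues are 1 and 3, with eigenvectors (1, -1) and (1, 1); the deck's own example matrix is not reproduced in the text. The helpers are plain-Python and illustrative.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

S = [[2.0, 1.0], [1.0, 2.0]]        # assumed symmetric example, eigenvalues 1 and 3
U = [[1.0, 1.0], [-1.0, 1.0]]       # columns are the eigenvectors (1, -1) and (1, 1)
Lam = [[1.0, 0.0], [0.0, 3.0]]      # eigenvalues on the diagonal
U_inv = [[0.5, -0.5], [0.5, 0.5]]   # inverse of U

# Diagonal decomposition: S = U * Lam * U^-1
S_from_U = matmul(matmul(U, Lam), U_inv)

# Symmetric case: normalize the eigenvectors to get an orthogonal Q with Q^-1 = Q^T.
r = 1.0 / math.sqrt(2.0)
Q = [[r * u for u in row] for row in U]
S_from_Q = matmul(matmul(Q, Lam), transpose(Q))
QtQ = matmul(transpose(Q), Q)       # should be (numerically) the identity

print(S_from_U, S_from_Q, QtQ)
```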
Sec. 18.1

Exercise
Examine the symmetric eigen decomposition,
if any, for each of the following matrices:

Similarity Clustering
We can compute the similarity between two
document vector representations xi and xj by $x_i x_j^T$
Let $X = [x_1 \ldots x_N]$
Then $XX^T$ is a matrix of similarities
$XX^T$ is symmetric
So $XX^T = Q \Lambda Q^T$
So we can decompose this similarity space into a
set of orthonormal basis vectors (given in Q)
scaled by the eigenvalues in $\Lambda$
This leads to PCA (Principal Components Analysis)

Sec. 18.2

Singular Value Decomposition


For an M × N matrix A of rank r there exists a
factorization (Singular Value Decomposition = SVD)
as follows:

$A = U \Sigma V^T$

U is M × M, Σ is M × N, V is N × N

(Not proven here.)


Sec. 18.2

Singular Value Decomposition

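$A = U \Sigma V^T$, where U is M × M, Σ is M × N, V is N × N

$AA^T = Q \Lambda Q^T$
$AA^T = (U \Sigma V^T)(U \Sigma V^T)^T = (U \Sigma V^T)(V \Sigma U^T) = U \Sigma^2 U^T$
The columns of U are orthogonal eigenvectors of $AA^T$.
The columns of V are orthogonal eigenvectors of $A^T A$.
Eigenvalues $\lambda_1 \ldots \lambda_r$ of $AA^T$ are the eigenvalues of $A^T A$.

Singular values: $\sigma_i = \sqrt{\lambda_i}$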
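
These relationships can be checked numerically with NumPy. The 3×2 matrix A below is a hypothetical example chosen for illustration; `np.linalg.svd` returns the singular values in decreasing order, and `np.linalg.eigvalsh` returns eigenvalues of a symmetric matrix in increasing order, hence the reversal.

```python
import numpy as np

# Hypothetical 3x2 matrix (M=3, N=2), used only for illustration.
A = np.array([[1.0, -1.0],
              [0.0, 1.0],
              [1.0, 0.0]])

# Full SVD: U is MxM, Vt is NxN, s holds the singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the MxN matrix Sigma with the singular values on its diagonal.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
reconstructed = U @ Sigma @ Vt

# Eigenvalues of A^T A, sorted in decreasing order to match s.
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]

print(s, eigvals)
```

The squared singular values should agree with the eigenvalues of A^T A, and U Σ V^T should reproduce A.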
Sec. 18.2

Singular Value Decomposition


Illustration of SVD dimensions and sparseness
Sec. 18.2

SVD example

Let

Thus M=3, N=2. Its SVD is

Typically, the singular values are arranged in decreasing order.


Sec. 18.3

Reduced SVD

If we retain only k singular values, and set the
rest to 0, then we don't need the matrix parts
in color
Then Σ is k×k, U is M×k, V^T is k×N, and A_k is
M×N
This is referred to as the reduced SVD
It is the convenient (space-saving) and usual
form for computational applications
It's what Matlab gives you
Sec. 18.3

SVD Low-rank approximation


Whereas the term-doc matrix A may have
M=50000, N=10 million (and rank close to
50000)
We can construct an approximation A_100 with
rank 100.
Of all rank 100 matrices, it would have the lowest
Frobenius error.
Great ... but why would we?
Answer: Latent Semantic Indexing
C. Eckart, G. Young, The approximation of a matrix by another of lower rank.
Psychometrika, 1, 211-218, 1936.
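
A low-rank approximation via truncated SVD can be sketched with NumPy. The matrix here is a small random stand-in for a term-doc matrix, and `low_rank_approx` is an illustrative helper name. By the Eckart-Young result cited above, the Frobenius error of the rank-k approximation equals the square root of the sum of the squared discarded singular values.

```python
import numpy as np

def low_rank_approx(A, k):
    """Keep the k largest singular values and the matching singular vectors."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Small random matrix standing in for a term-doc matrix (illustrative only).
rng = np.random.default_rng(0)
A = rng.random((6, 5))

k = 2
A_k = low_rank_approx(A, k)

# Frobenius error of the approximation versus the discarded singular values.
error = np.linalg.norm(A - A_k, "fro")
s = np.linalg.svd(A, compute_uv=False)
expected_error = np.sqrt(np.sum(s[k:] ** 2))

print(np.linalg.matrix_rank(A_k), error, expected_error)
```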

Latent Semantic Indexing via the SVD
Sec. 18.4

What it is
From term-doc matrix A, we compute the
approximation Ak.
There is a row for each term and a column
for each doc in Ak
Thus docs live in a space of k << r
dimensions
These dimensions are not the original axes
But why?

Vector Space Model: Pros


Automatic selection of index terms
Partial matching of queries and documents (dealing
with the case where no document contains all search terms)
Ranking according to similarity score (dealing with
large result sets)
Term weighting schemes (improves retrieval performance)
Various extensions
Document clustering
Relevance feedback (modifying query vector)
Geometric foundation

Problems with Lexical Semantics


Ambiguity and association in natural language
Polysemy: Words often have a multitude of
meanings and different types of usage (more
severe in very heterogeneous collections).
The vector space model is unable to discriminate
between different meanings of the same word.

Problems with Lexical Semantics


Synonymy: Different terms may have an
identical or a similar meaning (weaker:
words indicating the same topic).
No associations between words are
made in the vector space
representation.

Polysemy and Context


Document similarity on single word level:
polysemy and context

[Figure: the word "saturn" in two contexts]
meaning 1: ring, jupiter, space, voyager, planet, ...
meaning 2: car, company, dodge, ford, ...

Contribution to similarity if used in the 1st meaning, but not if
used in the 2nd
Sec. 18.4

Latent Semantic Indexing (LSI)


Perform a low-rank approximation of
document-term matrix (typical rank 100-300)
General idea
Map documents (and terms) to a low-
dimensional representation.
Design a mapping such that the low-dimensional
space reflects semantic associations (latent
semantic space).
Compute document similarity based on the inner
product in this latent semantic space
Sec. 18.4

Goals of LSI
LSI takes documents that are semantically similar
(= talk about the same topics), but are not similar
in the vector space (because they use different
words) and re-represents them in a reduced
vector space in which they have higher similarity.

Similar terms map to similar location in low
dimensional space
Noise reduction by dimension reduction
Sec. 18.4

Latent Semantic Analysis


Latent semantic space: illustrating example

courtesy of Susan Dumais


Sec. 18.4

Performing the maps


Each row and column of A gets mapped into
the k-dimensional LSI space, by the SVD.
Claim this is not only the mapping with the
best (Frobenius error) approximation to A, but
in fact improves retrieval.
A query q is also mapped into this space, by

Query NOT a sparse vector.
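
A minimal sketch of the mapping, using NumPy. The query formula on the slide is not reproduced in the text, so this sketch assumes the fold-in used in IIR Sec. 18.4, q_k = Σ_k^{-1} U_k^T q. The small binary term-document matrix, the query, and the helper names are all illustrative.

```python
import numpy as np

# Illustrative binary term-document matrix: 5 terms (rows) x 6 docs (columns).
C = np.array([
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1],
], dtype=float)

k = 2
U, s, Vt = np.linalg.svd(C, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]   # column j of Vt_k represents doc j

def fold_in(q):
    """Map a term-space vector into the k-dimensional LSI space (assumed formula)."""
    return (U_k.T @ q) / s_k

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# A query containing the first and third terms; it is sparse in term space,
# but its LSI representation q_k is generally dense.
query = np.array([1, 0, 1, 0, 0], dtype=float)
q_k = fold_in(query)
scores = [cosine(q_k, Vt_k[:, j]) for j in range(C.shape[1])]
print(q_k, scores)
```

Folding in a column of C gives back that document's own LSI coordinates, which is a useful sanity check on the mapping.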



LSA Example
A simple example term-document matrix
(binary)

LSA Example
Example of C = UΣV^T: The matrix U

LSA Example
Example of C = UΣV^T: The matrix Σ

LSA Example
Example of C = UΣV^T: The matrix V^T

LSA Example: Reducing the dimension

Original matrix C vs. reduced C2 = UΣ2V^T

Why the reduced dimension matrix is better
Similarity of d2 and d3 in the original space: 0.
Similarity of d2 and d3 in the reduced space:
0.52 × 0.28 + 0.36 × 0.16 + 0.72 × 0.36 + 0.12 × 0.20
+ (-0.39) × (-0.08) ≈ 0.52

Typically, LSA increases recall and hurts precision

Simplistic picture
Topic 1

Topic 2

Topic 3
Data Fusion
Outline
What is data fusion?
Why use data fusion?
Previous work
Components of data fusion
System selection
Bias concept
Data fusion methods
Experiments
Conclusion

Data Fusion
Merging the retrieval results of multiple
systems.
A data fusion algorithm accepts two or more
ranked lists and merges these lists into a single
ranked list with the aim of providing better
effectiveness than all systems used for data
fusion.

Why use data fusion?
Combining evidence from different
systems leads to performance
improvement
Use data fusion to achieve better
performance than the individual
systems involved in the process.
Example metasearch systems
www.dogpile.com
www.copernic.com

Why use data fusion?
Same idea is also used for different query
representations
Fuse the results of different query
representations for the same request and
obtain better results
Measuring relative performance of IR systems
such as web search engines is essential
Use data fusion for finding pseudo relevant
documents and use these for automatic
ranking of retrieval systems

Components of data fusion
1. DB/search engine selector
Select systems to fuse
2. Query dispatcher
Submit queries to selected search engines
3. Document selector
Select documents to fuse
4. Result merger
Merge selected document results

Ranking retrieval systems

System selection methods
1. Best: certain percentage of top performing
systems used
2. Normal: all systems to be ranked are used
3. Bias: certain percentage of systems that
behave differently from the norm (majority
of all systems) are used

Calculating bias of a system
Similarity value

$s(v, w) = \frac{\sum_i v_i w_i}{\sqrt{\left(\sum_i v_i^2\right)\left(\sum_i w_i^2\right)}}$

v: vector of norm
w: vector of retrieval system

Bias of a system

$B(v, w) = 1 - s(v, w)$
Example of calculating bias

2 systems: A and B
7 documents: a, b, c, d, e, f, g
ith row is the result for ith query
XA=(3, 3, 3, 2, 1, 0, 0) XB=(0, 2, 3, 0, 2, 3, 2)

norm vector X = XA+XB = (3, 5, 6, 2, 3, 3, 2)

s(XA, X) = 49 / (32 × 96)^(1/2) = 0.8841
Bias(A) = 1 - 0.8841 = 0.1159

s(XB, X) = 47 / (30 × 96)^(1/2) = 0.8758
Bias(B) = 1 - 0.8758 = 0.1242
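
The calculation above can be reproduced directly from the frequency vectors XA and XB. The function names are illustrative; the similarity is the cosine between a system's vector and the norm vector.

```python
import math

def similarity(v, w):
    """Cosine similarity between the norm vector v and a system vector w."""
    dot = sum(a * b for a, b in zip(v, w))
    return dot / math.sqrt(sum(a * a for a in v) * sum(b * b for b in w))

def bias(norm_vector, system_vector):
    return 1.0 - similarity(norm_vector, system_vector)

# Document frequency vectors for systems A and B over documents a..g.
XA = [3, 3, 3, 2, 1, 0, 0]
XB = [0, 2, 3, 0, 2, 3, 2]
X = [a + b for a, b in zip(XA, XB)]   # norm vector: element-wise sum

bias_a = bias(X, XA)
bias_b = bias(X, XB)
print(X, round(bias_a, 4), round(bias_b, 4))
```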

Bias calculation with order
Order is important because users usually just look
at the documents of higher rank.

2 systems: A and B
7 documents: a, b, c, d, e, f, g
ith row is the result for ith query

Increment the frequency count of a document by m/i instead of 1,
where m is the number of positions and i is the position of the document.

m=4
XA=(10, 8, 4, 2, 1, 0, 0); XB=(0, 8, 22/3, 0, 2, 8/3, 7/3)
Bias(A)=0.0087; Bias(B)=0.1226

Data fusion methods
1. Similarity value models
CombMIN, CombMAX, CombMED,
CombSUM, CombANZ, CombMNZ

2. Rank based models


Rank position (reciprocal rank) method
Borda count method
Condorcet method
Logistic regression model

Similarity value methods

CombMIN - choose min of similarity values
CombMAX - choose max of similarity values
CombMED - take median of similarity values
CombSUM - sum of similarity values
CombANZ - CombSUM / # non-zero similarity values
CombMNZ - CombSUM * # non-zero similarity values
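
The six combination rules can be sketched for a single document as follows. The input is the list of similarity values the document received from each system; this sketch assumes that a system which did not retrieve the document contributes 0.0, and the scores used are made up.

```python
from statistics import median

def comb_scores(sims):
    """Fused scores for one document from its per-system similarity values."""
    nonzero = sum(1 for s in sims if s > 0)
    total = sum(sims)
    return {
        "CombMIN": min(sims),
        "CombMAX": max(sims),
        "CombMED": median(sims),
        "CombSUM": total,
        "CombANZ": total / nonzero if nonzero else 0.0,
        "CombMNZ": total * nonzero,
    }

# Hypothetical document scored by three systems; the second did not retrieve it.
fused = comb_scores([0.8, 0.0, 0.4])
for name, value in fused.items():
    print(name, round(value, 3))
```

CombMNZ rewards documents retrieved by many systems, while CombANZ averages only over the systems that actually returned the document.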

Rank position method
Merge documents using only rank positions
Rank score of document i (j: system index)
$r(d_i) = \frac{1}{\sum_j 1 / pos(d_{ij})}$
If a system j has not ranked document i at all,
skip it.

Rank position example
4 systems: A, B, C, D
documents: a, b, c, d, e, f, g
Query results:
A={a,b,c,d}, B={a,d,b,e},
C={c,a,f,e}, D={b,g,e,f}
r(a) = 1/(1 + 1 + 1/2) = 0.4
r(b) = 1/(1/2 + 1/3 + 1) = 0.55
Final ranking of documents (a lower r ranks higher):
(most relev) a > b > c > e > d > f > g (least relev)
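
The rank position scores for this example can be computed with a short sketch. It follows the formula given for r: sum the reciprocal positions over the systems that ranked the document, then take the reciprocal of that sum, so a smaller score means a better merged rank. The function name is illustrative.

```python
def rank_position_scores(rankings):
    """r(d) = 1 / sum_j (1 / pos_j(d)), skipping systems that did not rank d."""
    reciprocal_sums = {}
    for ranked in rankings.values():
        for pos, doc in enumerate(ranked, start=1):
            reciprocal_sums[doc] = reciprocal_sums.get(doc, 0.0) + 1.0 / pos
    return {doc: 1.0 / total for doc, total in reciprocal_sums.items()}

rankings = {
    "A": ["a", "b", "c", "d"],
    "B": ["a", "d", "b", "e"],
    "C": ["c", "a", "f", "e"],
    "D": ["b", "g", "e", "f"],
}

scores = rank_position_scores(rankings)
merged = sorted(scores, key=scores.get)   # ascending: best document first
for doc in merged:
    print(doc, round(scores[doc], 3))
```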

Borda Count method
Based on democratic election strategies.
The highest ranked document in a system gets
n Borda points and each subsequent gets one
point less where n is the number of total
retrieved documents by all systems.

Borda Count example
3 systems: A, B, C
Query results:
A={a,c,b,d}, B={b,c,a,e}, C={c,a,b,e}
5 distinct docs retrieved: a, b, c, d, e. So, n=5.
BC(a)=BCA(a)+BCB(a)+BCC(a)=5+3+4=12
BC(b)=BCA(b)+BCB(b)+BCC(b)=3+5+3=11
Final ranking of documents:
(most relevant) c > a > b > e > d (least relevant)
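
The Borda totals can be reproduced with the sketch below. One detail is an assumption not stated on the slide: documents a system did not retrieve share that system's leftover points equally (with 5 documents and 4 ranked, the single unranked document gets 1 point). The function name is illustrative.

```python
def borda(rankings):
    """Sum Borda points per document over all systems."""
    all_docs = sorted({d for ranked in rankings.values() for d in ranked})
    n = len(all_docs)
    totals = {d: 0.0 for d in all_docs}
    for ranked in rankings.values():
        # Top document gets n points, each following one gets one point less.
        for pos, doc in enumerate(ranked):
            totals[doc] += n - pos
        # Assumption: unranked documents share the remaining points equally.
        unranked = [d for d in all_docs if d not in ranked]
        if unranked:
            leftover = sum(range(1, n - len(ranked) + 1))
            for d in unranked:
                totals[d] += leftover / len(unranked)
    return totals

rankings = {
    "A": ["a", "c", "b", "d"],
    "B": ["b", "c", "a", "e"],
    "C": ["c", "a", "b", "e"],
}

totals = borda(rankings)
merged = sorted(totals, key=totals.get, reverse=True)
print(totals, merged)
```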

Condorcet method
Also, based on democratic election strategies.
Majoritarian method
The winner is the document which beats each of
the other documents in a pairwise comparison.

Condorcet example
3 candidate documents: a, b, c
5 systems: A, B, C, D, E
A: a>b>c - B:a>c>b - C:a>b=c - D:b>a - E:c>a

Pairwise comparison (win, lose, tie):

    |    a    |    b    |    c
a   |    -    | 4, 1, 0 | 4, 1, 0
b   | 1, 4, 0 |    -    | 2, 2, 1
c   | 1, 4, 0 | 2, 2, 1 |    -

Pairwise winners:

    | Win | Lose | Tie
a   |  2  |  0   |  0
b   |  0  |  1   |  1
c   |  0  |  1   |  1

Final ranking of documents
a > b = c
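
The pairwise counts can be reproduced with a short sketch. Each system's ranking is written as a list of tiers so that ties such as b=c can be expressed. Two conventions are assumptions inferred from the counts in the example: a ranked document beats a document the system did not rank, and a system that ranked neither document casts no vote.

```python
def tier_of(ranking, doc):
    """Index of the tier containing doc, or None if the system did not rank it."""
    for idx, tier in enumerate(ranking):
        if doc in tier:
            return idx
    return None

def pairwise(rankings, x, y):
    """(win, lose, tie) counts for x against y over all systems."""
    win = lose = tie = 0
    for ranking in rankings.values():
        tx, ty = tier_of(ranking, x), tier_of(ranking, y)
        if tx is None and ty is None:
            continue                      # assumption: no vote from this system
        if ty is None or (tx is not None and tx < ty):
            win += 1
        elif tx is None or ty < tx:
            lose += 1
        else:
            tie += 1
    return win, lose, tie

def winners_table(rankings, docs):
    """For each doc: [pairwise wins, pairwise losses, pairwise ties]."""
    table = {d: [0, 0, 0] for d in docs}
    for x in docs:
        for y in docs:
            if x == y:
                continue
            win, lose, _ = pairwise(rankings, x, y)
            if win > lose:
                table[x][0] += 1
            elif lose > win:
                table[x][1] += 1
            else:
                table[x][2] += 1
    return table

rankings = {
    "A": [["a"], ["b"], ["c"]],
    "B": [["a"], ["c"], ["b"]],
    "C": [["a"], ["b", "c"]],
    "D": [["b"], ["a"]],
    "E": [["c"], ["a"]],
}

table = winners_table(rankings, ["a", "b", "c"])
print(pairwise(rankings, "a", "b"), pairwise(rankings, "b", "c"), table)
```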
Experiments
Turkish Text Retrieval System will be used
All Milliyet articles from 2001 to 2005
80 different system ranked results
8 matching methods
10 stemming functions
72 queries for each system
4 approaches for the experiments
Experiments
First Approach
Mean average precision of the merged system
is significantly greater than that of all the individual
systems
Second Approach
Find the data fusion method that gives the highest
mean average precision value

Experiments
Third Approach
Find the best stemming method in terms of mean
average precision values
Fourth Approach
See the effect of system selection methods

Conclusion
Data Fusion is an active research area
We will use several data fusion techniques on
the now famous Milliyet database and
compare their relative merits
We will also use TREC data for testing if
possible
We will hopefully find some novel approaches
in addition to existing methods
